1. Introduction
Carbon dioxide (CO2) is the most significant greenhouse gas influencing the global climate. Although an individual CO2 molecule traps less heat than a molecule of methane or some other greenhouse gases, its high atmospheric concentration makes it the largest contributor to the greenhouse effect. Since the Industrial Revolution, human activities have substantially increased greenhouse gas emissions, which are now the main driver of global warming [1,2]. Climate change has led to more frequent extreme weather events, increased pressure on ecosystems, accelerated glacier melt, and rising sea levels that threaten coastal regions. Establishing an accurate greenhouse gas monitoring and research system is therefore essential for tracking emission sources and trends and for providing a scientific basis for response strategies [3,4].
Currently, CO2 concentration monitoring has evolved into an integrated system that combines ground-based observations, remote sensing, and numerical modeling [5]. However, each method has limitations. Ground-based observations offer high accuracy but limited spatial coverage, while remote sensing provides broad coverage but is susceptible to atmospheric interference [6]. Atmospheric chemistry models, by contrast, integrate observational data with simulations of physical processes and can therefore achieve higher spatial resolution and capture the distribution of CO2 at different altitudes [7,8]. Atmospheric chemistry models fall mainly into global-scale and regional-scale categories. WRF-Chem, the configuration of the Weather Research and Forecasting (WRF) model that couples chemical processes online, has been widely applied in recent years [9,10,11,12,13]. Ballav et al. [14] used WRF-Chem to analyze the drivers of CO2 concentration variations across different time scales. They found that diurnal variations are mainly influenced by terrestrial carbon fluxes and boundary layer height, while synoptic-scale changes are driven primarily by the spatial distribution of surface fluxes and by wind direction. Liu et al. [15] simulated atmospheric CO2 concentrations in the Beijing-Tianjin-Hebei region in 2015 using WRF-Chem. Comparison with GOSAT satellite data showed that the model systematically overestimated concentrations by 2–3 ppm, possibly because of an overestimation of tropospheric CO2 in the model. Dong et al. [16] coupled WRF-Chem with the Vegetation Photosynthesis and Respiration Model (VPRM) to study atmospheric CO2 concentrations across China from 2016 to 2018. They found that spatiotemporal variations were mainly influenced by anthropogenic emissions, while seasonal fluctuations were governed primarily by terrestrial fluxes and the CO2 background field. Seo et al. [17] simulated atmospheric CO2 concentrations over East Asia from 2009 to 2018 using WRF-Chem at a spatial resolution of 9 km and, from these simulations, constructed a comprehensive dataset that includes multi-source emissions and biospheric CO2 fluxes. In summary, owing to advantages such as the online coupling of meteorological fields and chemical processes and an adjustable spatial resolution, WRF-Chem has been widely applied in regional atmospheric CO2 simulation studies. In East Asia and China in particular, a series of refined modeling studies validated against multi-level observations have demonstrated that the model can reliably capture the spatiotemporal distribution of CO2 concentrations.
To simulate CO2 concentrations more accurately, data assimilation techniques that integrate observational data with model simulations have emerged [18,19,20,21]. In recent years, data assimilation studies based on WRF-Chem have substantially improved the accuracy of regional carbon concentration estimates. Seo et al. [22] used WRF-Chem with three-dimensional variational data assimilation (3DVAR) to investigate the impact of meteorological data assimilation on high-resolution CO2 simulations in East Asia. Subsequently, Seo et al. [23] applied the coupled WRF-Chem/DART system with the Ensemble Adjustment Kalman Filter (EAKF) to assimilate surface CO2 observations, simulated atmospheric CO2 concentrations over East Asia in January (winter) and July (summer) 2019, and systematically analyzed the spatiotemporal distribution of CO2 in the two seasons. Zhang et al. [24] extended the Data Assimilation Research Testbed (DART), integrated it with WRF-Chem, and used the EAKF method to assimilate OCO-2 satellite XCO2 data to estimate atmospheric CO2 concentrations in the Midwestern United States. Building on Zhang et al. [24], Jin et al. [25] coupled OCO-2 observations with the WRF-Chem/DART atmospheric transport model to assimilate CO2 concentrations and invert CO2 fluxes in Lisbon, Portugal. However, existing studies still face significant challenges. Whether they rely on surface observations or on XCO2 data from a single satellite platform (e.g., OCO-2), the assimilation data sources currently available have limited spatiotemporal coverage, are severely scarce, and are often spatially discontinuous and temporally sparse. Furthermore, the specific application scenarios and practical utility of CO2 concentration datasets generated by such assimilation methods, and the scientific value of the related results for CO2 emission research, require further exploration.
Emission inventories are critical input data for air quality modeling and pollution control, and their accuracy directly affects the effectiveness of numerical forecasts and the design of emission reduction strategies [26,27]. The “bottom-up” approach is currently the mainstream method for constructing emission inventories. It aggregates activity data and emission factors by sector and process to estimate annual or monthly total emissions at regional or national scales [28]. However, when these macro-level totals are disaggregated into the hourly gridded data required by models, significant uncertainties arise from insufficient energy statistics, the limited representativeness of emission factors, and imperfect temporal profiles [29]. These uncertainties are a key source of error in air quality models. To overcome the inherent limitations of the “bottom-up” approach, “top-down” data assimilation techniques have increasingly been applied in recent years to optimize emission inventories of atmospheric pollutants. These methods use chemical transport models as a bridge between multi-source observational data (e.g., ground-based monitoring and satellite remote sensing) and model simulations; by establishing the response relationship between concentrations and emissions, they dynamically constrain and optimize emission fluxes [30,31]. This study improves upon existing emission calculation methods for atmospheric pollutants and extends their application to atmospheric CO2, achieving a more accurate inversion of regional carbon emissions.
Therefore, this study employs the WRF-Chem regional atmospheric chemistry model to assimilate multi-platform satellite-derived CO2 concentration data, establishing an assimilation system that optimizes CO2 concentrations and emission sources simultaneously. The system updates the optimization hourly, which improves the spatiotemporal continuity of the assimilation results, and by integrating multi-source data it reduces the uncertainties associated with assimilating a single observational dataset. Furthermore, the system converts the concentration changes produced during assimilation into emission fluxes, enabling the correction of existing carbon emission inventories and providing a new technical approach for carbon emission monitoring. In this paper, we first used WRF-Chem to simulate the CO2 concentration distribution for December 2019 as a baseline experiment. We then conducted two parallel assimilation experiments (3DVAR and EAKF), assimilating multi-source satellite observations into the model at 6 h intervals to reconstruct the regional CO2 concentration field. From this reconstruction, we inverted the systematic discrepancies between simulated and assimilated concentrations to derive CO2 emission errors, which were then used to dynamically optimize and correct the existing emission inventory. The system not only improves the accuracy of regional CO2 concentration simulations but also provides methodological support for dynamically assessing and optimizing carbon emission inventories.
3. Results
3.1. Sensitivity Experiments
3.1.1. Evaluation of Hourly Carbon Emission Correction Through Assimilation
To evaluate how long the effect of each initial-field assimilation persists in the CO2 emission correction, a sensitivity experiment on initial-field assimilation was conducted before the formal experiments. The model was spun up for 60 h starting from 0000 UTC on 1 December 2019 to establish a reasonable atmospheric initial field. Then, starting from 1200 UTC on 3 December 2019, fused multi-source satellite CO2 concentration data were assimilated every 6 h, and a 12 h simulation was run after each assimilation. The hourly post-assimilation results were compared with the control experiment without assimilation, and the corresponding emission corrections were calculated using Equations (5) and (6) to quantify the temporal evolution of the assimilation effect.
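The exact form of Equations (5) and (6) is defined elsewhere in the paper. Purely as an illustration, the sketch below assumes a single-box mass balance consistent with the assumptions listed in the Discussion (negligible chemistry within the window and CO2 confined to the grid cell within the boundary layer): the assimilation-induced mole-fraction change is converted to moles of CO2 per unit area over the boundary-layer column and divided by the elapsed time. The function name, the ideal-gas air density, and the use of PBLH as the mixing depth are assumptions, not the paper's formulation.

```python
import numpy as np

R_GAS = 8.314        # universal gas constant, J mol^-1 K^-1
M2_PER_KM2 = 1.0e6   # square metres per square kilometre

def emission_correction(dx_ppm, pblh_m, pressure_pa, temp_k, dt_h):
    """Hypothetical single-box estimate of an emission correction.

    dx_ppm      : assimilated minus control CO2 mole fraction (ppm)
    pblh_m      : boundary-layer height used as the mixing depth (m)
    pressure_pa : mean boundary-layer pressure (Pa)
    temp_k      : mean boundary-layer temperature (K)
    dt_h        : time over which the difference accumulated (h)
    Returns mol km^-2 h^-1; positive values mean the prior emissions were too low.
    Water vapour is ignored, so total air density stands in for dry air.
    """
    dx = np.asarray(dx_ppm, dtype=float) * 1.0e-6                     # ppm -> mol/mol
    n_air = np.asarray(pressure_pa, dtype=float) / (
        R_GAS * np.asarray(temp_k, dtype=float))                      # air, mol m^-3
    column = dx * n_air * np.asarray(pblh_m, dtype=float)             # mol CO2 per m^2
    return column * M2_PER_KM2 / dt_h                                 # mol km^-2 h^-1

# 1 ppm excess in a 300 m boundary layer accumulated over 6 h
print(round(float(emission_correction(1.0, 300.0, 101325.0, 273.15, 6.0))))  # 2231
```

Because the inputs are converted with `np.asarray`, the same function works element-wise on gridded fields of concentration differences and PBLH.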
Figure 3a–e show the spatial distribution of the mean emission corrections at 3 h, 6 h, 9 h, and 12 h after assimilation and of the mean over 0–12 h after assimilation, respectively.
Figure 3f shows time series of the domain-averaged CO2 concentration difference before and after assimilation and of the domain-averaged emission correction. The results indicate that the prior inventory underestimated carbon emissions in most of central and eastern China and overestimated them in regions such as Yunnan and Guizhou. The magnitude of the emission correction decreased gradually with time after assimilation, consistent with the weakening of the assimilation effect. The mean correction over 0–12 h most closely resembled the distribution at 6 h after assimilation, and the correction remained relatively stable around 6 h, indicating that the 6 h post-assimilation state is representative of the assimilation effect.
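The comparison between the 0–12 h mean and the individual lead times is described qualitatively. One hypothetical way to make it quantitative is to compute the spatial RMSE between each lead-time correction map and the all-lead mean and pick the smallest. The example uses synthetic maps whose amplitude decays linearly over 12 h; under that assumption the mean equals the midpoint, so 6 h is selected by construction. Real decay need not be linear, and this criterion is not necessarily the one used in the study.

```python
import numpy as np

def most_representative_lead(corrections):
    """Find the lead time whose correction map best matches the all-lead mean.

    corrections : dict {lead time in h: 2-D emission-correction map}, all maps
                  on the same grid. Spatial RMSE is the distance measure; this
                  criterion is an illustrative choice.
    Returns (best lead time, dict of RMSE per lead time).
    """
    leads = sorted(corrections)
    mean_map = np.mean([corrections[t] for t in leads], axis=0)
    scores = {
        t: float(np.sqrt(np.mean((corrections[t] - mean_map) ** 2)))
        for t in leads
    }
    return min(scores, key=scores.get), scores

# Synthetic example: a fixed pattern whose amplitude decays linearly over 12 h
pattern = np.array([[1.0, -0.5], [2.0, 0.3]])
synthetic = {t: (12 - t) / 12 * pattern for t in range(13)}
best, _ = most_representative_lead(synthetic)
print(best)  # 6
```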
3.1.2. Evaluation of Hourly Carbon Concentration Through Simulations
To better evaluate the distinct impacts of initial field assimilation and the application of a new emission inventory on CO2 concentration simulations, we conducted sensitivity experiments using the optimized emission inventory. Starting from 1200 UTC on 3 December, simulation experiments (SIM_N) using this inventory were conducted every 6 h, with each run lasting 12 h. To assess the optimization effect, the SIM_N experiments were compared with assimilation experiments (ASS) that incorporated multi-source satellite remote sensing fused data and simulation experiments using the MEIC inventory (SIM), with a focus on analyzing differences in simulated CO2 concentrations over every 12 h period.
Figure 4 shows the monthly mean spatial distribution of CO2 concentrations from the ASS experiment at 3, 6, 9, and 12 h after assimilation, together with the differences from the SIM and SIM_N experiments at the corresponding times. The CO2 concentrations simulated by SIM_N deviate less from the ASS experiment than those simulated by SIM. As the time since assimilation increases, the deviations of SIM and SIM_N from ASS become more similar, further reflecting the gradual weakening of the assimilation effect from 6 to 12 h.
As shown in Table 4, the SIM_N experiment outperforms the SIM experiment in all evaluation metrics for CO2 concentration, demonstrating the benefit of the updated emission source for simulation performance. The bias of SIM_N is lower than that of SIM at all lead times and remains consistently below the mean bias of SIM: the 3 h forecast bias decreased from 1.67 ppm to 0.33 ppm (a reduction of 80.2%), and the 6 h forecast bias decreased from 1.53 ppm to 0.39 ppm (a reduction of 74.6%). This indicates that the new emission source mitigates the systematic overestimation. The RMSE of SIM_N also improves: the 3 h forecast RMSE decreased from 3.10 ppm to 1.09 ppm (a reduction of 64.8%), and the 6 h forecast RMSE decreased from 2.83 ppm to 0.99 ppm (a reduction of 65.0%).
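The bias, RMSE, and percentage reductions in Table 4 follow standard definitions. A minimal sketch is shown below, assuming the simulated and reference concentrations are already co-located arrays; the reference field (for example the ASS analysis) is an assumption here, since the text does not restate it. The printed values reproduce two of the quoted reductions from the Table 4 numbers.

```python
import numpy as np

def bias(sim, ref):
    """Mean bias (sim - ref), in the units of the inputs (ppm here)."""
    return float(np.mean(np.asarray(sim, float) - np.asarray(ref, float)))

def rmse(sim, ref):
    """Root-mean-square error, in the units of the inputs."""
    diff = np.asarray(sim, float) - np.asarray(ref, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def percent_reduction(before, after):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (before - after) / before

# Table 4 values quoted in the text (ppm)
print(round(percent_reduction(1.67, 0.33), 1))  # 3 h bias: 80.2
print(round(percent_reduction(3.10, 1.09), 1))  # 3 h RMSE: 64.8
```

Small differences in the last digit (for example 74.5% versus 74.6% for the 6 h bias) are expected when reductions are recomputed from rounded table values rather than the unrounded metrics.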
In summary, the assimilation results and the optimized emissions at 1800 UTC perform relatively well and tend to stabilize, confirming that the 6 h post-assimilation state is the most suitable. Based on the two sensitivity experiments, the 6 h post-assimilation results were therefore selected for the subsequent calculation of carbon emission corrections.
3.2. Spatial Changes in Emissions
Figure 5 displays the spatial distribution of the optimized mean emissions for December 2019 and their differences from the MEIC emission inventory. Compared with the MEIC emissions (Figure 1b), both optimized emission fields (Figure 5a,b) exhibit spatial patterns largely consistent with MEIC, with similar high- and low-value areas, indicating that the spatial distributions of the optimized inventories are reasonable. The differences between the optimized inventories and MEIC (Figure 5c,d) show only minor adjustments over the northeastern and western regions, with MEIC slightly overestimating emissions in western China and in the coastal areas of Shandong and Jiangsu provinces. In contrast, MEIC slightly underestimates emissions in the economically developed eastern regions, such as NEC, BTH, YRD, and PRD, where the optimized emissions increase noticeably. Although some local discrepancies exist, the differences in most areas remain within ±3 × 10⁴ mol km⁻² h⁻¹, indicating high overall consistency in the optimization results.
The optimized CO2 emissions for China and its major regions in December 2019 are presented in Table 5. Compared with the MEIC baseline, both the 3DVAR and EAKF methods increase national CO2 emissions, by 13.06% and 7.51%, respectively, suggesting that anthropogenic emission activity during the simulation period was generally higher than the inventory estimates. Spatially, the emission changes are markedly heterogeneous across regions.
In the economically developed and industrially intensive YRD region, the CO2 emissions estimated by both optimization methods are close to the MEIC inventory and only slightly higher than the baseline. This suggests that the inventory is relatively representative in this region, which may be related to ongoing industrial restructuring and continuously implemented emission reduction measures. The BTH and PRD regions show moderate increases of 2% to 10%, potentially linked to increased winter heating demand and the continued high intensity of industrial production.
In the NEC region, emission growth is more pronounced, reaching 12.62% with the 3DVAR method. Major cities in the region, such as Harbin, Changchun, and Shenyang, show marked increases in emissions, while changes in the surrounding suburban and rural areas are relatively minor. This spatial disparity may stem from increased winter coal heating demand and the urban energy consumption structure. The CC region exhibits the largest absolute increase in emissions, with the 3DVAR inversion 20.08% higher than MEIC and EAKF 8.23% higher. As a traditional industrial base, this region has a high emission intensity, and its changes may reflect both the level of industrial production and the regulatory effects of environmental policies.
The XJ region is the only area with a marked decline in emissions, exceeding 10% in the EAKF results. This is consistent with the sparse emission sources and low intensity of human activity in the region, and suggests that the optimization algorithms capture the weaker emission signals of remote areas better than the prior inventory.
In terms of methods, both optimization schemes change emissions in the same direction in most regions. However, the 3DVAR estimates are generally higher than those of EAKF, with particularly large differences in Northeast China and Central China. This reflects differences in how the two assimilation algorithms handle the prior error structure and observational information: 3DVAR may respond more sensitively to emissions in strong source regions, whereas EAKF provides more stable constraints.
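The difference in how the two schemes treat prior error can be illustrated with a scalar toy problem (hypothetical values, not the configuration used in this study). 3DVAR weights the innovation with a fixed background-error variance, whereas the EAKF estimates the prior variance from the ensemble spread and then deterministically shifts and contracts each member so that the ensemble matches the Kalman posterior mean and variance. With a tight ensemble, the EAKF increment is smaller.

```python
import numpy as np

def var3d_update(xb, y, b_var, r_var):
    """Scalar 3DVAR analysis with a static background-error variance b_var."""
    gain = b_var / (b_var + r_var)
    return xb + gain * (y - xb)

def eakf_update(ens, y, r_var):
    """Scalar Ensemble Adjustment Kalman Filter update.

    The prior variance comes from the ensemble itself (flow-dependent); members
    are shifted and contracted so the posterior ensemble has the Kalman mean
    and variance, without perturbing the observation.
    """
    ens = np.asarray(ens, dtype=float)
    mean_b = ens.mean()
    var_b = ens.var(ddof=1)
    var_a = var_b * r_var / (var_b + r_var)
    mean_a = var_a * (mean_b / var_b + y / r_var)
    return mean_a + np.sqrt(var_a / var_b) * (ens - mean_b)

# Toy comparison (values hypothetical): same prior mean, same observation
print(var3d_update(410.0, 412.0, b_var=1.0, r_var=1.0))   # 411.0
post = eakf_update([409.5, 410.0, 410.5], 412.0, r_var=1.0)
print(round(post.mean(), 2))                               # 410.4
```

In the usage example the static variance (1.0 ppm²) is larger than the ensemble variance (0.25 ppm²), so 3DVAR moves the analysis further toward the observation, which is one mechanism consistent with the generally higher 3DVAR estimates.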
3.3. Temporal Changes in Emissions
Figure 6 illustrates the daily mean and hourly mean variations of CO2 emissions in China from 1 to 31 December 2019. Note that the MEIC dataset provides only monthly total emissions and does not resolve daily or hourly fluctuations; daily emissions within a month are therefore assumed to be constant. For daily mean emissions (Figure 6a), the optimized means from the 3DVAR and EAKF methods are 38.29 Mt/day and 36.41 Mt/day, respectively, both higher than the MEIC baseline of 33.86 Mt/day. Between 22 and 31 December, both optimized results reached their monthly peaks, 42.35 Mt/day for 3DVAR and 38.54 Mt/day for EAKF. Several factors may contribute to this increase. First, anthropogenic emissions in December were elevated by the combination of heating demand and industrial activity in northern China [52]. Second, stagnant meteorological conditions during this period weakened atmospheric dispersion and led to CO2 accumulation; in the inversion system, this suppressed vertical mixing is partly interpreted as an increase in “effective emissions.” Third, if the prior inventory systematically underestimates emissions, the assimilation progressively corrects this bias over time, further contributing to the rising trend in estimated emissions.
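Gridded fluxes in mol km⁻² h⁻¹ (the unit of Figure 5) can be aggregated to national or regional totals in Mt per day. A minimal sketch, assuming 24 hourly flux fields on a grid with known cell areas and a CO2 molar mass of 44.01 g mol⁻¹; function names are hypothetical and any regional masking is left out.

```python
import numpy as np

M_CO2 = 44.01  # molar mass of CO2, g mol^-1

def daily_total_mt(flux_mol_km2_h, cell_area_km2):
    """Convert gridded hourly fluxes to a daily total in Mt CO2.

    flux_mol_km2_h : array (24, ny, nx) of hourly fluxes, mol km^-2 h^-1
    cell_area_km2  : scalar or (ny, nx) array of grid-cell areas, km^2
    """
    flux = np.asarray(flux_mol_km2_h, dtype=float)
    moles = np.sum(flux * cell_area_km2)   # each hourly value covers 1 h
    return moles * M_CO2 / 1.0e12          # g -> Mt (1 Mt = 1e12 g)

def relative_change_pct(optimized, prior):
    """Percentage change of an optimized total relative to the prior."""
    return 100.0 * (optimized - prior) / prior

flux = np.full((24, 1, 1), 1.0e4)                      # one 27 km cell, constant flux
print(round(daily_total_mt(flux, 27.0 * 27.0), 4))     # 0.0077
print(round(relative_change_pct(38.29, 33.86), 2))     # 13.08
```

Applied to the daily means quoted above, the relative change for 3DVAR is about 13.1%, in line with the national increment reported in Table 5.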
On an hourly scale (Figure 6b), the MEIC data exhibit a typical bimodal structure, with peaks at 0100 UTC (0900 Beijing Time) and 0900 UTC (1700 Beijing Time) corresponding to the morning and evening rush hours, respectively. This pattern agrees with existing understanding of the temporal characteristics of anthropogenic emissions. However, both optimized results are substantially higher than MEIC during the first peak and lower than MEIC during the second. This may be because factory emissions and morning traffic overlap in the morning, raising actual emissions, whereas social activity declines sharply after the evening peak, rapidly reducing emission intensity. The optimized emissions are generally higher between 0100 and 0400 UTC (0900 to 1200 Beijing Time) and lower between 1600 and 2200 UTC (0000 to 0600 Beijing Time). This indicates sustained strong industrial and transportation emissions from morning to noon, whereas only baseline industrial emissions remain from late night to early morning, reflecting a marked reduction in activity.
Figure 7 further illustrates the hourly variation of optimized emissions across six major regions during the study period, revealing distinct differences among regions. In northern China (the NEC and BTH regions), both optimization approaches retain a bimodal structure, although the second peak is less pronounced. Both optimized results are markedly higher than MEIC during 0000–0600 UTC, markedly lower during 0600–1200 UTC, and close to MEIC between 1200 and 2300 UTC. During 0000–1200 UTC, the 3DVAR results involve smaller adjustments and agree more closely with MEIC, whereas EAKF deviates more.
The XJ region displays the smallest hourly variation amplitude in optimized emissions among all regions. Except for slightly higher 3DVAR values during 1300–2100 UTC, both optimized results remain below the MEIC inventory throughout the day. This suggests relatively stable emission levels in Xinjiang, potentially attributable to its consistent industrial structure, minimal diurnal fluctuations in human activity, and simpler energy consumption patterns.
In the CC region, both optimized emission trajectories show broadly synchronized temporal variations, remaining significantly higher than MEIC during 0000–0900 UTC and moderately elevated during 1200–2300 UTC. Within the YRD region, EAKF results exceed 3DVAR values from 0000 to 0800 UTC, while the reverse pattern occurs during 0900–1200 UTC. The PRD region shows a single emission peak at 0400 UTC for EAKF compared to 0200 UTC for 3DVAR, with both methods exhibiting gradual declines post-peak. These regional disparities in diurnal emission characteristics likely stem from varying emission source structures, industrial activity patterns, and meteorological dispersion conditions across different areas.
3.4. Evaluation of Posterior Emission Source Simulation Performance
To assess how much the posterior emission sources improve CO2 concentration simulations in different regions, this study compares the diurnal variations of simulated CO2 concentrations from three experiments (Sim_MEIC, Sim_3DVAR, and Sim_EAKF) with observations for December 2019. Simulated surface-layer concentrations (lowest model level) from WRF-Chem were compared with in situ measurements from the WDCGG stations WLG and HKO, as shown in Figure 8. The results show that the posterior emissions derived through data assimilation improve the accuracy of CO2 concentration simulations, although the degree of improvement varies by region.
At the WLG station, observed CO2 concentrations fluctuate slightly within 412–418 ppm, consistent with its role as a global background station minimally influenced by local anthropogenic emissions. Sim_MEIC captures the general background level but exhibits a systematic positive bias of approximately +3 ppm relative to observations, particularly in early December, and fails to reproduce several minor dips observed in mid-to-late December. Both assimilation experiments reduce this systematic overestimation. Sim_3DVAR shows the smallest discrepancy from observations and agrees closely with the measurements after 25 December, whereas Sim_EAKF retains a slight overestimation, indicating that its performance here is inferior to that of Sim_3DVAR.
At the HKO station, observed concentrations oscillate with amplitudes exceeding 30 ppm, reflecting an urban site influenced by anthropogenic activities such as morning and evening traffic peaks and weekday-weekend variations. Potential marine-source emissions near the station add further uncertainty to the simulations, resulting in substantial biases in the baseline experiment. The Sim_MEIC baseline performed worst, systematically underestimating concentrations and missing almost all major peaks. This indicates that the prior MEIC inventory underestimates both the intensity of anthropogenic CO2 emissions in this region and their diurnal variation. Sim_3DVAR improves on Sim_MEIC, raising the simulated concentration levels and capturing the main variations, but its peaks are still too small and lag the observations; for instance, the simulated peaks on 10 and 22 December lag the observed ones by half a day to a full day. Compared with Sim_3DVAR, Sim_EAKF yields lower concentrations and reproduces peak features less well.
WRF-Chem XCO2 was derived by applying pressure-weighted averaging to the simulated vertical CO2 profiles and was compared with ground-based observations from the Hefei and Xianghe stations of the TCCON network, as shown in Figure 9. At Hefei station, simulated concentrations fluctuated between approximately 409 ppm and 420 ppm, a total variation of about 11 ppm. The limited observational data there show that Sim_3DVAR performed similarly to Sim_MEIC, matching it on 16 and 26 December but underestimating XCO2 during other periods.
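A minimal sketch of the pressure-weighted averaging step, assuming layer-mean CO2 mole fractions and interface pressures ordered from the surface upward. Each layer is weighted by its pressure thickness, i.e. by the mass of air it contains. TCCON comparisons often additionally apply the retrieval averaging kernel and a priori profile; this sketch omits them, and the text does not state whether that was done here.

```python
import numpy as np

def pressure_weighted_xco2(co2_ppm, p_edges_pa):
    """Column-average CO2 from a model profile by pressure weighting.

    co2_ppm    : (nz,) layer-mean CO2 mole fractions, bottom to top
    p_edges_pa : (nz + 1,) pressure at layer interfaces, bottom to top
    Averaging kernels and water-vapour corrections are ignored.
    """
    x = np.asarray(co2_ppm, dtype=float)
    dp = -np.diff(np.asarray(p_edges_pa, dtype=float))  # positive layer thickness
    if x.shape != dp.shape:
        raise ValueError("need one more interface pressure than layers")
    return float(np.sum(x * dp) / np.sum(dp))

# Two layers: a thin enhanced surface layer and a thick background layer
print(pressure_weighted_xco2([420.0, 410.0], [100000.0, 80000.0, 0.0]))  # 412.0
```

The example shows why XCO2 varies less than surface concentrations: a 10 ppm surface enhancement confined to the lowest fifth of the column raises the column average by only 2 ppm.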
At Xianghe station, observed XCO2 fluctuated between approximately 407 ppm and 416 ppm, a total amplitude of about 9 ppm, reflecting a site influenced by both regional anthropogenic activity and natural processes. Sim_3DVAR improved on Sim_MEIC by partially correcting its systematic low bias, giving better agreement with observations, as seen on 8–9 and 22 December, when it captured the concentration variations. Sim_EAKF performed best at Xianghe, with its simulated curve agreeing most closely with observations throughout the study period; it matched the concentration levels and also captured multiple short-term fluctuations, such as those from 1 to 6 December.
We also compared the three experiments (Sim_MEIC, Sim_3DVAR, and Sim_EAKF) with the CAMS global greenhouse gas reanalysis (EGG4), as shown in Figure 10. Compared with Sim_MEIC, both Sim_3DVAR and Sim_EAKF substantially improved the accuracy of the simulated CO2 concentrations: the bias decreased from 0.566 ppm to 0.140 ppm and 0.169 ppm, respectively, and the RMSE decreased from 1.177 ppm to 0.599 ppm and 0.626 ppm, reductions of approximately 49% and 47%. Both the 3DVAR- and EAKF-based methods therefore reduced systematic bias and random error, supporting the effectiveness of data assimilation for improving the accuracy of carbon emission inversion.
4. Discussion
This study developed an atmospheric CO2 source inversion method based on the WRF-Chem model coupled with 3DVAR/EAKF assimilation frameworks, which enables hourly optimization of emission inventories by integrating multi-source satellite observations of XCO2. However, the method relies on two critical assumptions. First, the effect of CO2 chemical reactions on sources and sinks is negligible within a 1 h window. Because CO2 reacts slowly under cloud-free and precipitation-free conditions, this requirement can be satisfied by excluding cloudy or precipitating areas and periods. Second, when the wind speed remains below 4 m·s⁻¹ and the divergence exceeds 10⁻⁴ s⁻¹ within one hour, CO2 diffusion is assumed to be confined to the grid cell within the boundary layer. This assumption is reasonably accurate under stable boundary layer conditions, where CO2 mainly diffuses within the grid cell. However, when the boundary layer is deep and unstable, WRF-Chem exhibits biases in simulating the boundary layer height and upper-level winds are typically stronger, so CO2 may diffuse excessively or be advected beyond the grid cell. This could lead to an underestimation of emissions and reduce inversion accuracy.
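The two assumptions translate into a screening step that decides where the hourly inversion is applied. A minimal sketch, using the thresholds and comparison directions exactly as stated above (wind speed below 4 m·s⁻¹, divergence exceeding 10⁻⁴ s⁻¹, no cloud or precipitation); the function name, the per-hour evaluation, and treating any nonzero cloud fraction or precipitation as disqualifying are assumptions.

```python
import numpy as np

def inversion_mask(wind_speed, divergence, cloud_fraction, precip,
                   max_wind=4.0, div_threshold=1.0e-4):
    """Hypothetical mask of cells/hours where the 1 h box assumptions are used.

    wind_speed     : m s^-1
    divergence     : s^-1 (comparison direction follows the text as written)
    cloud_fraction : 0-1; any nonzero value is excluded (assumption)
    precip         : precipitation amount; any nonzero value is excluded
    """
    wind_speed = np.asarray(wind_speed, dtype=float)
    divergence = np.asarray(divergence, dtype=float)
    clear = (np.asarray(cloud_fraction) == 0) & (np.asarray(precip) == 0)
    return (wind_speed < max_wind) & (divergence > div_threshold) & clear

mask = inversion_mask(wind_speed=[2.0, 5.0, 2.0, 2.0],
                      divergence=[2e-4, 2e-4, 0.0, 2e-4],
                      cloud_fraction=[0.0, 0.0, 0.0, 0.3],
                      precip=[0.0, 0.0, 0.0, 0.0])
print(mask.tolist())  # [True, False, False, False]
```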
We compared the original and corrected MEIC emission inventories for December 2019 with the EDGAR emission inventory, as shown in Figure 11. The MEIC inventory (Figure 1b) and the EDGAR inventory (Figure 11a) differ in their spatial characteristics: EDGAR shows higher emissions in urban high-emission areas and lower emissions in surrounding regions, whereas MEIC shows the opposite, with less pronounced urban high-emission zones and more emissions distributed in peripheral areas. In terms of overall emission levels, as shown in Figure 11b, EDGAR is generally higher than MEIC. Comparing the Sim_3DVAR and Sim_EAKF results with EDGAR shows that both correction methods reduce the RMSE: the 3DVAR method reduces RMSE by 9% and increases R by 12%, while the EAKF method performs better, reducing RMSE by 56% and improving R by 39%. These results highlight the differences in the spatial distribution of carbon emissions between MEIC and EDGAR and show that data assimilation brings the MEIC inventory closer to EDGAR, with EAKF exhibiting the stronger correction capability.
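The RMSE and correlation coefficient R against EDGAR are spatial scores between two gridded maps. A minimal sketch, assuming both inventories have already been regridded to the same grid and units (the regridding step is not described here) and that cells outside the domain are stored as NaN; the function name is hypothetical.

```python
import numpy as np

def spatial_scores(candidate, reference):
    """RMSE and Pearson R between two emission maps on the same grid.

    Cells that are NaN in either map (e.g. outside the domain) are excluded.
    """
    a = np.asarray(candidate, dtype=float).ravel()
    b = np.asarray(reference, dtype=float).ravel()
    ok = np.isfinite(a) & np.isfinite(b)
    a, b = a[ok], b[ok]
    rmse = float(np.sqrt(np.mean((a - b) ** 2)))
    r = float(np.corrcoef(a, b)[0, 1])
    return rmse, r

ref = np.array([[1.0, 2.0], [3.0, np.nan]])
rmse_2x, r_2x = spatial_scores(2.0 * ref, ref)
```

Doubling a field leaves R at 1 while the RMSE grows, which is why the two metrics are reported together: R measures agreement in spatial pattern, and RMSE also penalizes differences in overall emission level.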
We analyzed the mean planetary boundary layer height (PBLH) over the study area for different periods and plotted the mean diurnal PBLH curve, as shown in Figure 12. From 1200 to 2300 UTC, the PBLH was relatively low, generally 100–300 m, indicating a relatively stable boundary layer. This stability favors the accumulation of near-surface pollutants, and the simulation results are less affected by vertical diffusion. In contrast, from 0300 to 0800 UTC, the PBLH increased markedly and the boundary layer became unstable; the stronger vertical diffusion diluted near-surface CO2 concentrations, which may weaken the differences between simulations with different emission scenarios. We therefore consider the simulation results for 1200–2300 UTC more reliable, with lower uncertainty in the emission inversion. During 0300–0800 UTC, by contrast, intense turbulent mixing in the unstable boundary layer leaves considerable uncertainty in the current emission estimates. Future research should incorporate more observational data and improved vertical mixing parameterization schemes.
In the WRF-Chem simulation of CO2 concentrations, this study used the MEIC emission inventory at its original spatial resolution of 0.25° (approximately 27 km). To minimize the uncertainty introduced by the emission inventory, the model grid resolution was also set to 27 km, avoiding potential biases from spatial resampling of the emission data. In addition, MEIC provides monthly emission data, and the hourly allocation factors recommended by the MEIC development team were used to convert them to an hourly scale. This process inevitably introduces some error. Given the considerable differences in emission characteristics across regions, developing region-specific hourly factors in the future should further improve the accuracy of simulated CO2 concentrations.
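The conversion from monthly totals to hourly emissions can be sketched as an even split over days (matching the constant-daily assumption described for MEIC) followed by normalized hourly weights. The MEIC team's actual factors are not given in this section; the bimodal weights in the usage example, with peaks at 0100 and 0900 UTC echoing Figure 6b, are placeholders, and the function name is hypothetical.

```python
import calendar

import numpy as np

def monthly_to_hourly(monthly_total, hourly_factors, year, month):
    """Spread a monthly emission total evenly over days, then over hours.

    monthly_total  : emission total for the month (any amount unit)
    hourly_factors : 24 relative weights for hours 0-23 (need not sum to 1)
    Returns an array of shape (ndays, 24) whose sum equals monthly_total.
    """
    w = np.asarray(hourly_factors, dtype=float)
    if w.shape != (24,) or np.any(w < 0) or w.sum() == 0:
        raise ValueError("need 24 non-negative hourly factors")
    ndays = calendar.monthrange(year, month)[1]   # 31 for December 2019
    daily = monthly_total / ndays                 # equal daily totals
    return np.tile(daily * w / w.sum(), (ndays, 1))

w = np.ones(24)
w[[1, 9]] = 2.0                                   # placeholder peaks (UTC hours)
hourly = monthly_to_hourly(31.0, w, 2019, 12)
print(hourly.shape)                               # (31, 24)
```

Region-specific factors, as suggested above, would simply replace the single weight vector with one vector per region or grid cell.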
Widely used global carbon emission inventories (e.g., EDGAR and ODIAC) provide CO2 emission data at limited spatial resolution, while the regional inventories commonly applied in Asia (e.g., MIX and MEIC) offer regional products but still require improvements in temporal resolution. In practice, real emission data available for validation are extremely limited. To establish a reliable reference, this study therefore adopts the assimilated CO2 concentration field as the optimized result and derives carbon emissions from this posterior concentration field by inversion, using them as a proxy for the “true” emissions. This approach has limitations. First, the inversion results are affected by model errors, the assimilation algorithm, and the representativeness of the observations, all of which introduce inherent uncertainty. Second, the lack of independent real emission data for validation limits the reliability of the inversion results as a benchmark. Future efforts should incorporate more ground-based observations and higher-resolution remote sensing data to further improve the accuracy of emission estimates.
Prior to WRF-Chem assimilation, we applied rigorous quality control to the multi-source fused XCO2 dataset of Jin et al. [33], excluding all data points with uncertainties exceeding 3 ppm. Although this improved the accuracy and reliability of the dataset, residual assimilation errors remain and stem mainly from regionally heterogeneous uncertainties in the dataset's construction. First, variable regional observation conditions interfere with satellite CO2 spectral detection and leave residual retrieval errors that uncertainty-weighted fusion cannot fully eliminate. Second, TCCON validation stations are unevenly distributed worldwide, which weakens uncertainty quantification in data-sparse regions; there, the fused XCO2 uncertainties depend more on model parameterization than on ground-truth constraints and therefore carry uncalibrated errors. Third, the 30-day temporal smoothing and CarbonTracker-based gap filling add further uncertainty in sparsely observed regions, where most values are model-simulated and inherit biases from CarbonTracker's representation of regional CO2 transport and source-sink processes. Fourth, the Maximum Likelihood Estimation and Optimal Interpolation fusion algorithms cannot reconcile divergent satellite retrievals in complex terrain and land-ocean transition zones. These uncertainties propagate into the assimilation system and may lead to spatially correlated biases in the optimized fluxes.
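The 3 ppm screening step can be expressed as a simple filter. In this sketch the record layout (`XCO2Obs` with latitude, longitude, XCO2, and a 1-sigma uncertainty in ppm) is hypothetical, because the variable names of the fused dataset of Jin et al. [33] are not given here. Reusing the reported uncertainty as the observation-error variance is likewise an assumed, though common, choice rather than the study's documented configuration.

```python
from dataclasses import dataclass

MAX_UNCERTAINTY_PPM = 3.0  # screening threshold stated in the text


@dataclass
class XCO2Obs:
    lat: float
    lon: float
    xco2_ppm: float
    uncertainty_ppm: float  # reported 1-sigma uncertainty (assumed field)


def quality_control(obs, max_unc=MAX_UNCERTAINTY_PPM):
    """Drop observations whose uncertainty exceeds max_unc (ppm).

    NaN uncertainties compare False and are therefore dropped as well.
    """
    return [o for o in obs if o.uncertainty_ppm <= max_unc]


def obs_error_variances(obs):
    """Diagonal of R in ppm^2, assuming uncorrelated errors (an assumption)."""
    return [o.uncertainty_ppm ** 2 for o in obs]


# Illustrative records, not values from the fused dataset.
obs = [
    XCO2Obs(30.0, 114.0, 410.2, 1.2),
    XCO2Obs(31.0, 115.0, 412.8, 3.0),
    XCO2Obs(40.0, 90.0, 405.1, 4.5),
]
kept = quality_control(obs)
print(len(kept))  # 2
```

A diagonal R ignores the spatially correlated errors discussed above; representing them would require off-diagonal terms that this sketch does not attempt.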
The optimization of the anthropogenic carbon emission inventory in this study rests on the premise that short-term variations in regional atmospheric CO2 concentrations are driven primarily by anthropogenic emissions. Non-anthropogenic sources, namely biogenic fluxes simulated by the VPRM, ocean fluxes based on JMA data, and biomass burning emissions (negligible during the non-fire season), are treated as deterministic background or secondary contributors. Their magnitudes are far smaller than those of anthropogenic emissions, and their potential errors have a statistically and physically negligible impact on the inversion results. Consequently, the concentration discrepancies between model and observations identified by the assimilation system can be attributed to spatiotemporal errors in the anthropogenic emission inventory rather than to uncertainties in natural fluxes, which ensures that the inversion results reflect anthropogenic activity.
Additional uncertainty may arise from the processing of the MEIC inventory, mainly from the spatiotemporal redistribution performed by the meic2wrf tool. The tool uses empirical parameters to set the emission height distribution coefficients and temporal allocation factors for five major anthropogenic sectors, including power and industry. Although the default parameters suit national averages or typical regions, they may not fully capture local industrial structures and energy consumption patterns in some study areas. Moreover, they do not account for dynamic factors such as seasonal and meteorological variations, which can influence the spatiotemporal characteristics of emissions. Consequently, the allocated emissions may deviate slightly from actual conditions. Future research could refine this process through regionalized and dynamically adjusted parameters.
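To make the role of the emission height distribution coefficients concrete, the sketch below splits sector emissions in one grid cell across the lowest model layers. The sector names assume the five MEIC sectors (power, industry, residential, transportation, and agriculture), and the coefficients are illustrative placeholders, not the meic2wrf defaults; the code does not call meic2wrf. Each profile is normalized so that the column total is conserved.

```python
# Hypothetical vertical distribution coefficients: fraction of each sector's
# emissions placed in each of the lowest five model layers (bottom first).
# Illustrative only; these are NOT the meic2wrf default values.
VERTICAL_PROFILES = {
    "power":          [0.0, 0.1, 0.3, 0.4, 0.2],
    "industry":       [0.5, 0.3, 0.2, 0.0, 0.0],
    "residential":    [1.0, 0.0, 0.0, 0.0, 0.0],
    "transportation": [1.0, 0.0, 0.0, 0.0, 0.0],
    "agriculture":    [1.0, 0.0, 0.0, 0.0, 0.0],
}


def allocate_vertically(sector_emissions, profiles=VERTICAL_PROFILES):
    """Return per-layer emissions summed over sectors for one grid cell."""
    n_layers = len(next(iter(profiles.values())))
    layers = [0.0] * n_layers
    for sector, amount in sector_emissions.items():
        coeffs = profiles[sector]
        total = sum(coeffs)
        for k, c in enumerate(coeffs):
            layers[k] += amount * c / total  # normalize so mass is conserved
    return layers


layers = allocate_vertically({"power": 10.0, "industry": 10.0})
print([round(v, 6) for v in layers])  # [5.0, 4.0, 5.0, 4.0, 2.0]
```

A regionalized scheme of the kind proposed above would replace the single `VERTICAL_PROFILES` table with one per region or per season.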
The initial and boundary conditions used in this study are derived from CarbonTracker 2022 (CT2022). According to a recent systematic validation over China by Ruan et al. [53], CT2022 agrees well with ground-based TCCON XCO2 observations (RMSE = 1.78 ppm, R = 0.92) and performs robustly against multiple satellite products (GOSAT/GOSAT-2, OCO-2/OCO-3). This indicates that CT2022 is highly accurate over China and suitable as the background field for this study. Nevertheless, every background field carries inherent uncertainty, and CT2022 may still exhibit regional or seasonal systematic biases. Within our inversion framework, such large-scale systematic concentration biases are partially attributed to, and corrected through, the surface fluxes during cost-function minimization, which reduces their direct impact on the final inversion results. The discrepancies between simulated and observed concentrations in this study primarily reflect uncertainties in the prior anthropogenic emission inventory rather than significant biases in the CT2022 background field, and optimizing the fluxes through assimilation effectively corrects the concentration biases caused by inaccuracies in the prior emissions.
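For reference, the generic form of the 3DVAR cost function is shown below, where $\mathbf{x}$ is the control vector (assumed here to contain the emission fluxes being optimized), $\mathbf{x}_b$ its prior, $\mathbf{B}$ the prior error covariance, $\mathbf{y}$ the observations, $H$ the observation operator (the WRF-Chem simulation sampled at the observation locations), and $\mathbf{R}$ the observation error covariance. This is the textbook form and not necessarily the exact formulation used in this study.

$$
J(\mathbf{x}) = \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}-\mathbf{x}_b) + \tfrac{1}{2}\,\left[\mathbf{y}-H(\mathbf{x})\right]^{\mathrm{T}}\,\mathbf{R}^{-1}\,\left[\mathbf{y}-H(\mathbf{x})\right]
$$

If the simulated concentrations carry a background bias $\mathbf{b}$ inherited from the initial and boundary conditions, so that $H(\mathbf{x})$ becomes $H(\mathbf{x}) + \mathbf{b}$, and $\mathbf{x}$ contains no explicit bias term, the minimizer can reduce the second term only by adjusting the fluxes. Part of the systematic bias is therefore absorbed into the optimized surface fluxes, while the first term limits how far they move from the prior.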
5. Conclusions
Based on the WRF-Chem model, this study developed a multi-source data assimilation system that integrates multi-platform satellite observations to simultaneously optimize regional CO2 concentration fields and emission fluxes. Simulation and assimilation experiments over China for December 2019 show that the system produced improved CO2 concentration fields. Furthermore, emission errors were inverted from the concentration discrepancies, enabling dynamic correction and optimization of the carbon emission inventory.
We evaluated the performance of the 3DVAR and EAKF methods in optimizing CO2 emissions and improving forecast accuracy over China during December 2019. Both methods used identical WRF-Chem configurations and assimilated hourly surface CO2 observations from the multi-source satellite-fused CO2 concentration dataset. The results show that 3DVAR and EAKF produced optimized emissions with similar spatiotemporal distributions across most regions of China, demonstrating that both methods effectively reduce the uncertainty of the prior emission inventory. Relative to the MEIC inventory, the optimized emissions increased by 13.6% (3DVAR) and 5.1% (EAKF). Emissions increased nationwide except in Xinjiang, where December 2019 emissions decreased by 3.24 Mt under 3DVAR and 7.99 Mt under EAKF, while central China showed increases of 74.5 Mt and 30.52 Mt, respectively.
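For readers less familiar with EAKF, the sketch below shows the standard serial ensemble adjustment Kalman filter update for a single scalar observation. The ensemble of the observed quantity is shifted toward the observation and its spread is contracted deterministically, and the resulting increments are regressed onto each state variable through the ensemble covariance. It is a minimal, assumed illustration: the state vector, covariance localization, and inflation settings used in this study are not represented, and the function name is hypothetical.

```python
import numpy as np


def eakf_update(x_ens, y_ens, y_obs, obs_var):
    """Serial EAKF update for one scalar observation (generic textbook form).

    x_ens: (n_state, n_members) prior ensemble of state variables
    y_ens: (n_members,) prior ensemble of the observed quantity, H(x_i)
    y_obs: observed value; obs_var: observation-error variance
    Returns the posterior state ensemble; no localization or inflation.
    """
    y_mean = y_ens.mean()
    prior_var = y_ens.var(ddof=1)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (y_mean / prior_var + y_obs / obs_var)
    # Deterministic adjustment: move the mean, contract the spread.
    y_post = post_mean + np.sqrt(post_var / prior_var) * (y_ens - y_mean)
    dy = y_post - y_ens  # observation-space increments per member
    # Regress the increments onto each state variable (sample covariance).
    x_anom = x_ens - x_ens.mean(axis=1, keepdims=True)
    cov_xy = x_anom @ (y_ens - y_mean) / (y_ens.size - 1)
    return x_ens + np.outer(cov_xy / prior_var, dy)


# Toy check: the state variable is observed directly, so H is the identity.
x = np.array([[1.0, 2.0, 3.0, 4.0]])
post = eakf_update(x, x[0].copy(), y_obs=4.5, obs_var=5.0 / 3.0)
print(round(float(post.mean()), 6))          # 3.5
print(round(float(post[0].var(ddof=1)), 6))  # 0.833333
```

With equal prior and observation variances, the posterior mean lands halfway between the prior mean (2.5) and the observation (4.5), and the posterior variance halves, which is what the toy check reproduces. Unlike 3DVAR, no explicit B matrix is formed; the ensemble supplies the covariances.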
By designing three simulation scenarios (using the prior, 3DVAR-optimized, and EAKF-optimized emissions), this study evaluated how emission optimization improves CO2 forecasting. Validation against ground-based observations from TCCON and WDCGG stations showed that the optimized emission inventories improved simulation performance. Evaluation against the EGG4 dataset showed that both 3DVAR and EAKF reduced systematic biases and random errors in the simulated CO2 concentrations, cutting bias by approximately 75% and RMSE by approximately 49%.
Comparison with the EDGAR dataset shows that both 3DVAR and EAKF optimization improve on the MEIC-based simulation, with EAKF performing better: it reduced RMSE by 56% and increased the correlation coefficient by 39%, improving the representation of regional carbon emissions. The CO2 emission assimilation algorithm developed in this study offers an effective tool for regional carbon monitoring. Future applications could extend it to other climate zones and to multi-pollutant emission optimization, including aerosols, nitrogen oxides, ammonia, and methane, toward a dynamic emission inventory system with broader coverage and higher accuracy.
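The bias, RMSE, and correlation statistics quoted in these conclusions can be computed as below. This sketch assumes paired simulated and reference values and that the percentage changes are relative to the prior-emission simulation; the helper names are hypothetical, and the study's exact definitions (for example, whether bias reduction is computed on the absolute bias) are not stated in this section.

```python
import math


def bias(sim, ref):
    """Mean error (simulated minus reference)."""
    return sum(s - r for s, r in zip(sim, ref)) / len(ref)


def rmse(sim, ref):
    """Root-mean-square error."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sim, ref)) / len(ref))


def pearson_r(sim, ref):
    """Pearson correlation coefficient."""
    n = len(ref)
    ms, mr = sum(sim) / n, sum(ref) / n
    cov = sum((s - ms) * (r - mr) for s, r in zip(sim, ref))
    var_s = sum((s - ms) ** 2 for s in sim)
    var_r = sum((r - mr) ** 2 for r in ref)
    return cov / math.sqrt(var_s * var_r)


def percent_reduction(prior, posterior):
    """Relative reduction (%) in the magnitude of an error metric."""
    return 100.0 * (abs(prior) - abs(posterior)) / abs(prior)


def percent_increase(prior, posterior):
    """Relative increase (%) of a skill score such as R."""
    return 100.0 * (posterior - prior) / abs(prior)


print(bias([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))       # 1.0
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
print(percent_reduction(4.0, 1.0))                  # 75.0
```

Relative changes are used here because a "39% increase" in R is most naturally read that way; if absolute differences in R were intended, `percent_increase` would not apply.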