Spatiotemporal Modeling of Carbon Fluxes over Complex Underlying Surfaces along the North Shore of Hangzhou Bay

Abstract: Urban areas contribute to over 80% of carbon dioxide emissions, and considerable efforts are being undertaken to characterize spatiotemporal


Introduction
Transportation-related human activities are responsible for 80% of carbon dioxide (CO2) emissions in densely populated urban areas [1], making these areas a significant source of atmospheric carbon. Changes in urban environments resulting from human activities mainly involve the combustion of fossil fuels and the rapid increase in impervious surfaces [2,3]. The rapid increase in urban impervious surface area has altered the biogeochemical processes associated with soil-atmosphere gas exchange within urban ecosystems and their response to global climate change. An urban impervious surface not only weakens surface transpiration and evaporation but also interferes with the exchange of water and heat between the surface and the atmosphere [4].
As cities continue to evolve and the challenges posed by climate change escalate, eddy covariance (EC) observation systems in urban areas have emerged as a crucial tool for understanding urban environmental change. EC-based research on carbon, water, and energy exchange between complex urban underlying surfaces and the atmosphere has mainly focused on dynamic characteristics, influencing factors, and flux footprint determination. Lietzke et al. [5] presented a comprehensive overview of recent studies on urban CO2 emissions, in which the longest time series reported spans six years [6,7]. Furthermore, Schmutz et al. [8] conducted a decade-long examination of an urban ecosystem, offering insights into CO2 flux and CO2 molar concentration from observations at the University of Basel in Switzerland.
Since the 20th century, there has been increasing recognition of the predictive power of machine learning algorithms for a variety of problems, such as determining the relative importance of environmental factors affecting the mechanisms of carbon exchange [9], elucidating the nonlinear processes of carbon exchange between the surface and the atmosphere [10], upscaling carbon flux [11], combining different process-based models to reduce predictive uncertainty [12], and interpolating missing data on carbon flux, energy flux, and climate variables based on observations from global flux towers [13]. Existing studies have mainly focused on the simulation of carbon flux in natural ecosystems. The simulation and prediction of carbon flux remain challenging, especially for highly heterogeneous urban underlying surfaces. Tramontana et al. [14] conducted a comprehensive cross-validation analysis of the spatial prediction of various carbon and energy fluxes, highlighting the particular difficulty of predicting net ecosystem exchange (NEE). Meyer [15] proposed a sequential feature selection algorithm based on spatial cross-validation to remove spatially autocorrelated predictors. However, challenges persist in upscaling, with the quality of EC data being a significant obstacle. Wang et al. [16] used the random forest (RF) model to assess the relationship between land use and carbon emissions. Their results showed that retail and residential land categories accounted for a large proportion of carbon emissions, and that emissions from terrace houses were higher than those from other residential building categories. Reitz et al. [17] used the RF model to predict daily CO2 flux at 250 m spatial resolution in the Ruhr catchment area of western Germany from 2010 to 2018; although the model underestimated the variance of CO2 flux, it accurately reflected the average value. Thus, spatial prediction was more difficult than time series prediction.
Previous studies on the observation and simulation of carbon flux mostly focused on natural ecosystems with relatively simple underlying surface types, such as forest, grassland, and farmland ecosystems [18–20]. However, because of the high heterogeneity of complex urban underlying surfaces and intensive human activities, it is difficult to use physiological models to estimate the variation and spatial distribution of carbon flux in urban areas. At the same time, activities in urban areas increase the uncertainty of carbon flux simulation. A data-driven model is therefore an effective way to simulate carbon flux over complex urban underlying surfaces. Although some researchers have used machine learning to study the interpolation and upscaling of carbon flux in urban areas, the results were not very good [16,17]. Most of these studies considered only meteorological and vegetation factors in the model. Some studies did recognize the impact of land use type on the spatial distribution of carbon flux, but they considered only different ecosystem types and did not examine the spatial distribution of carbon flux across different land uses within the complex urban underlying surface.
In summary, previous research has not produced a suitable model for studying the spatiotemporal distribution of CO2 flux in urban areas. Hence, our objectives are the following: (1) to use four machine learning models to simulate the long time series of carbon flux over complex urban underlying surfaces and to evaluate model performance; and (2) to use the best-fitting model to upscale the carbon flux to Fengxian District, Shanghai.

Site Description
The study area is located in Fengxian District, Shanghai, along the north bank of Hangzhou Bay in the middle and lower reaches of the Yangtze River in China (Figure 1). The study area is flat and located in the mid-temperate zone geographically, with a subtropical monsoon climate. The prevailing wind direction is southeast, and the climate is humid and mild [21]. The annual average temperature is around 16.1 °C, with an annual rainfall of approximately 1191.5 mm. Moreover, the annual frost-free period lasts for about 225 days [22]. The underlying surface in the study area is complex and fragmented, including forests, grasslands, farmlands, water bodies, and buildings. Each land use type collectively influences the carbon cycle within the study area [22].

Measurements and Data Processing
The eddy covariance flux observation tower (EC tower; 121°30′38.96″ E, 30°50′32.26″ N) is located in Fengxian Bay University City, Shanghai. CO2 flux was measured using an open-path eddy covariance (OPEC) system, which consisted of an open-path, fast-response infrared gas analyzer (Model LI-7500, Li-Cor Inc., Lincoln, NE, USA) to monitor the densities of CO2 and H2O, and a 3D sonic anemometer (Model WindMaster Pro, Gill Instruments Ltd., Lymington, UK) to measure the fluctuations of three-dimensional wind speed and virtual temperature [21]. The EC measurement height was 20 m. The raw data were recorded and saved to a data logger (Model CR3000, Campbell Scientific Inc., North Logan, UT, USA) (Figure 1). Micrometeorological measurement systems monitored multiple parameters, including relative humidity and air temperature (HMP-45C, Vaisala Inc., Helsinki, Finland), wind speed (AR-100, Vector Instruments, Weymouth, UK), and photosynthetically active radiation (PAR) (Li-190SB, Li-Cor Inc., Lincoln, NE, USA) at 15 m above the ground surface, and soil temperature (Ts) (CS615-L, Campbell Scientific, Logan, UT, USA) at 5 cm below the soil surface. The meteorological data were recorded at 30 min intervals using a data logger (Model CR3000, Campbell Scientific).
In this study, we selected five years of flux and meteorological data (2011, 2012, 2017, 2018, and 2019). We used EddyPro 7.1.1 software (Li-Cor, Lincoln, NE, USA) to calculate CO2 flux at 30 min intervals. EddyPro 7.1.1 was also employed for various data treatments, including coordinate (axis) rotation, frequency response correction, Webb-Pearman-Leuning (WPL) density correction, spike (wild point) removal, and assignment of flux data quality flags (where 0 represents the best quality, 1 medium quality, and 2 the worst quality) [23–25]. We then discarded flux records within 1 h before and after rain, records for which the friction velocity (u*) was less than 0.15 m/s, and records with a quality flag of 2. A negative CO2 flux indicates that the ecosystem as a whole was absorbing CO2, and a positive flux indicates that it was releasing CO2. After data quality control, we gap-filled the CO2 flux using the REddyProc package in R.
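The three screening rules described above can be sketched as follows. This is a minimal illustration rather than the authors' processing code: the record layout (`time`, `flux`, `ustar`, `qc`, `rain_mm`), the function name, and the choice to treat the 1 h window as inclusive are all assumptions made for the example.

```python
from datetime import datetime, timedelta

USTAR_MIN = 0.15                   # m/s, friction-velocity threshold from the text
RAIN_WINDOW = timedelta(hours=1)   # exclusion window before and after rain


def screen_flux(records):
    """Return the half-hourly records that pass the three screening rules.

    Each record is a dict with hypothetical keys: 'time' (datetime),
    'flux' (CO2 flux), 'ustar' (m/s), 'qc' (0, 1 or 2) and 'rain_mm'.
    """
    rain_times = [r["time"] for r in records if r["rain_mm"] > 0]
    kept = []
    for r in records:
        # Rule 1: drop records within 1 h of any rainy half hour.
        near_rain = any(abs(r["time"] - t) <= RAIN_WINDOW for t in rain_times)
        # Rule 2: drop low-turbulence records; Rule 3: drop quality flag 2.
        if near_rain or r["ustar"] < USTAR_MIN or r["qc"] == 2:
            continue
        kept.append(r)
    return kept
```

The screened series would then be passed to a gap-filling step such as the one provided by REddyProc.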

Daily Scale Urban Impervious Surface Area Data
The Kljun flux footprint model is a parameterization based on scaling (dimensional) analysis developed by Kljun et al. [26] that can be used to calculate the crosswind-integrated footprint function of a flux source area. Using this model, we calculated the daily flux source area and obtained its two-dimensional coordinates and a two-dimensional plan view. Finally, the daily land use/land cover information within the flux source area was obtained. The footprints were computed in MATLAB R2015b.
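Once a daily footprint has been gridded, one plausible way to turn it into a daily impervious surface value is a footprint-weighted share of impervious cells. The sketch below illustrates that idea only; the weighting scheme, the grid layout, and the class code `1` for impervious cells are assumptions, and the authors may instead have used a simple area share inside the footprint contour.

```python
def impervious_fraction(weights, landcover, impervious_code=1):
    """Footprint-weighted share of impervious cells.

    weights   : 2D list of non-negative footprint weights
    landcover : 2D list of integer class codes on the same grid
    """
    total = 0.0
    impervious = 0.0
    for w_row, lc_row in zip(weights, landcover):
        for w, lc in zip(w_row, lc_row):
            total += w
            if lc == impervious_code:
                impervious += w
    if total == 0:
        raise ValueError("footprint weights sum to zero")
    return impervious / total
```

For example, a 2 x 2 footprint with weights 0.1, 0.2, 0.3, 0.4 in which the first and last cells are impervious gives a weighted impervious share of 0.5.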

Grid Data
The environmental factors affecting carbon flux at a single site differ somewhat from those affecting regional carbon flux. In this study, drawing on the results of Papale [11], Chen [27], Li [28], and Liu [29], we used six factors (land use, atmospheric temperature, soil moisture, air relative humidity, precipitation, and photosynthetically active radiation) to upscale the regional carbon flux. Explanatory variables were compiled from various sources and had different spatial and temporal resolutions, as shown in Table 1. The time span of all data is 2011–2019.

Statistical Analysis
The Lindeman-Merenda-Gold (LMG) method [30] was used to quantify the relative contribution of each factor to the daily changes of carbon flux over complex underlying surfaces. We performed the analysis using the "relaimpo" package in R, which contains variance decomposition methods for multiple linear regression models. The LMG method estimates the relative importance of each variable by decomposing the model R² into nonnegative contributions attributed to each variable, and obtains the LMG value by averaging the sequential R² increments over all possible orderings of the variables, as follows:

$$\mathrm{LMG}(x_k) = \frac{1}{p!} \sum_{q} \mathrm{seq}R^2\left(\{x_k\} \mid S_k(q)\right)$$

where p is the number of independent variables and the sum runs over all p! orderings q; q(k) is the position at which the independent variable x_k enters the multiple linear regression model; S_k(q) is the set of independent variables entered before x_k in ordering q; seqR²({x_k} | S_k(q)) is the increment in R² when x_k enters a model that already contains the set S_k(q); and LMG(x_k) is the average increase in R² caused by the independent variable x_k.
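The averaging over orderings can be made concrete with a small standard-library sketch. It is not the "relaimpo" implementation: the helper names are hypothetical, the least-squares fit uses plain normal equations without any safeguard against collinear predictors, and every ordering is enumerated explicitly, which is only practical for a handful of predictors (five predictors give 120 orderings).

```python
from itertools import permutations


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus intercept."""
    n = len(y)
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    if not columns:
        return 0.0
    design = [[1.0] * n] + [list(c) for c in columns]  # one list per regressor
    xtx = [[sum(p * q for p, q in zip(u, v)) for v in design] for u in design]
    xty = [sum(p * q for p, q in zip(u, y)) for u in design]
    beta = _solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, design)) for i in range(n)]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def lmg(predictors, y):
    """Average sequential R^2 increment of each predictor over all orderings."""
    names = list(predictors)
    shares = dict.fromkeys(names, 0.0)
    orders = list(permutations(names))
    for order in orders:
        entered, r2_prev = [], 0.0
        for name in order:
            entered.append(name)
            r2_now = r_squared([predictors[k] for k in entered], y)
            shares[name] += r2_now - r2_prev  # increment when `name` enters
            r2_prev = r2_now
    return {k: v / len(orders) for k, v in shares.items()}
```

Because the increments telescope within each ordering, the LMG shares add up to the R² of the full model, which is what allows them to be reported as percentage contributions.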

Machine Learning and Model Evaluation
Long short-term memory (LSTM) is a recurrent neural network with feedback connections between neurons [31]. It was designed to learn long-term dependencies and can overcome the shortcomings of conventional recurrent neural networks (RNNs) [32]. We used the backpropagation algorithm with gradient descent to calculate the weights and bias terms in the training phase so as to minimize the objective function across time. Each LSTM unit includes four parts: an input gate, an output gate, a forget gate, and a memory cell. The LSTM uses these memory cells to control the impact of historical information on current information, which ultimately enables the model to retain and transmit information.
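For reference, the four parts of an LSTM unit are usually written as the updates below. This is the common textbook formulation and not necessarily the exact variant used in this study; the symbols (weight matrices W and U, biases b, logistic function σ, and element-wise product ⊙) are introduced here for illustration only.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden output)}
\end{aligned}
```

The forget gate f_t determines how much of the previous cell state c_{t-1} is carried forward, which is the mechanism by which historical information influences the current prediction.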
The artificial neural network (ANN) refers to a feedforward neural network implemented in PyTorch [33]. The architecture of the network comprises three primary parts: the input layer, the hidden layer (middle layer), and the output layer. Typically, only one input layer and one output layer are considered, whereas the number of hidden layers varies among studies. In this study, all runs used a single hidden layer. The parameter values played a decisive role in the performance of the ANN model during the training, verification, and generalization phases. We employed the trial-and-error method to determine the number of nodes in the hidden layer, based on the squared error between the output value and the observed value of the training network. The trained model was then applied in the model validation and testing (prediction) stages. In this study, the number of nodes in the hidden layer was 128.
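A single-hidden-layer network of this kind can be summarized in one expression. The form below is a generic sketch: the activation function φ and the input dimension d are not stated in the text and are assumptions of this illustration; only the hidden width of 128 comes from the description above.

```latex
\hat{y} = w_2^{\top}\,\phi\!\left(W_1 x + b_1\right) + b_2,
\qquad x \in \mathbb{R}^{d},\quad W_1 \in \mathbb{R}^{128 \times d},\quad
b_1, w_2 \in \mathbb{R}^{128},\quad b_2 \in \mathbb{R}
```

Training adjusts W_1, b_1, w_2, and b_2 to minimize the squared error between the predicted and observed flux.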
The RF model is a tree-based ensemble method used for high-dimensional regression, in which the forest is built from multiple decision trees [34,35]. In regression problems, the basic units of RF are regression trees. Each regression tree is constructed from a random sample of the initial data, wherein a random subset of "m" attributes is used to choose the attributes carrying the most information. RF generates a ranking of the most important attributes based on the cumulative importance of the node partitions in each tree. The regression trees are independent of each other. Each node in a regression tree randomly selects a subset of the feature variables and then picks the optimal variable from this subset to split the branch. The final estimate is the average of the estimates from all regression trees. RF is relatively robust to outliers and can avoid overfitting when dealing with high-dimensional features [36].
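The averaging step can be written compactly. In the expression below, B (the number of trees) and T_b (the prediction of the b-th regression tree) are notation introduced here for illustration; the text does not report the number of trees used.

```latex
\hat{f}_{\mathrm{RF}}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)
```

Because each tree is grown on a different bootstrap sample with randomly restricted split candidates, averaging reduces the variance of the individual trees.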
The support vector machine (SVM) method is known for its strong generalization ability [37,38]. When developing an SVM model, it is particularly important to select an appropriate kernel function. In our study, we compared and evaluated several kernel functions on the dataset to ensure the accuracy of the predictions. The radial basis function (RBF) kernel yielded the best performance for our SVM model. After determining the kernel function, we considered the other parameters that influence the SVM model's simulation ability, namely the insensitive loss coefficient, the error penalty factor, and the kernel function parameters. In this study, the insensitive loss coefficient was set to 0.01 by default. We used grid search to determine the kernel function parameters.
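The two quantities named above have standard definitions in support vector regression, shown here as a reminder. The symbol γ for the RBF kernel parameter is an assumption of this sketch; the text does not report the values selected by the grid search.

```latex
L_{\varepsilon}\bigl(y, f(x)\bigr) = \max\bigl(0,\; |y - f(x)| - \varepsilon\bigr),
\qquad \varepsilon = 0.01

K(x, x') = \exp\bigl(-\gamma \,\lVert x - x' \rVert^{2}\bigr)
```

Residuals smaller than ε incur no loss, so with ε = 0.01 only deviations larger than 0.01 flux units contribute to the objective.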
Model accuracy was assessed using the root mean square error (RMSE), mean absolute error (MAE), and R-squared value (R²). R² reflects the overall simulation performance of the model, RMSE indicates the general quality of the simulation, and MAE measures the average deviation of the simulation results [39]. A model was considered highly accurate when R² approached 1 and both RMSE and MAE approached 0.
The experiments were run on Windows 10 using Python 3.8.5, with Jupyter Notebook 6.4 and Spyder 3.5 as the development environments.
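The three metrics can be computed as in the short sketch below. R² is taken here as the coefficient of determination; the text does not state whether this definition or the squared Pearson correlation was used, so that choice is an assumption, as are the function names.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

RMSE penalizes large errors more heavily than MAE, so reporting both helps distinguish a few large misses from a uniform small bias.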

Important Driving Factors of Carbon Flux in a Long Time Series
To identify the key environmental variables influencing the long-term carbon fluxes at the site, we used the LMG method to analyze the contributions of urban impervious surface area (IMS), atmospheric temperature (Ta), soil temperature at 10 cm depth (Ts_10 cm), air relative humidity (RH), and net radiation (Rn) to the changes in daily average carbon flux. The results are given in Table 2. Within the observation scope of the flux tower, each variable had a distinct effect on the fluctuations of daily average carbon flux within the ecosystem. Among these factors, air temperature emerged as the primary driver of the variation in daily average carbon flux, explaining 61.22% of the observed variation in daily average CO2 flux. Ts_10 cm and Rn contributed 13.12% and 13.30%, respectively. The impervious surface area had a moderate influence, accounting for 8.23% of the variability in daily average CO2 flux. Ta, RH, Ts_10 cm, Rn, and IMS all had a significant influence on the change in carbon flux. Therefore, we used these five factors as inputs to simulate the long time series of carbon flux.

Evaluation of Model Performance for Long-Term CO2 Flux
In the assessment of daily average CO2 flux prediction for complex urban underlying surfaces using the SVM, LSTM, ANN, and RF models, notable differences were observed in the agreement between predicted and observed values. Despite this, most predictions clustered closely around the 1:1 line, suggesting that these machine learning models could effectively predict daily average CO2 flux for complex urban environments (Figure 2). The performance metrics varied considerably among the models. The RF model exhibited the lowest RMSE (0.293 µmol·m⁻²·s⁻¹) and the highest R² (0.852), indicating superior performance compared to the other models. The SVM, LSTM, and ANN models had RMSE values of 0.413 µmol·m⁻²·s⁻¹, 0.461 µmol·m⁻²·s⁻¹, and 0.438 µmol·m⁻²·s⁻¹, respectively, all higher than that of the RF model. With similar RMSE values, the LSTM model achieved a higher R² (0.830) than both the SVM model (0.702) and the ANN model (0.688). This suggests that the predictive performance of the LSTM model surpassed that of the SVM and ANN models.

Effects of Impervious Surface Area in Simulating CO2 Flux
In this study, we introduced land use change as a factor, specifically impervious surface area, and compared the simulation performance of the carbon flux model with and without this factor (Figure 4). When impervious surface area was included in training (Figure 4), the RF model accounted for 84.46% of the daily average CO2 flux variation over the complex underlying urban surface. The resulting RMSE was 0.2808 µmol·m⁻²·s⁻¹, with an MAE of 0.1462 µmol·m⁻²·s⁻¹, and the predicted and observed values were generally distributed near the 1:1 line. In contrast, without impervious surface area in its training, the RF model accounted for 83.74% of the daily average CO2 flux variation over the complex underlying urban surface (Figure 4). The RMSE was 0.2872 µmol·m⁻²·s⁻¹, and the MAE was 0.147 µmol·m⁻²·s⁻¹. Based on this comparison, we conclude that including impervious surface area as an input factor improved the model's performance and the accuracy of the simulation results.

Interannual Variation of CO2 Flux Spatial Distribution
This study used an RF model to simulate the spatial distribution patterns of carbon fluxes in Fengxian District of Shanghai for the years 2011, 2012, 2017, 2018, and 2019. Additionally, a simple linear regression method was employed to quantify the interannual variations in the spatial patterns of carbon fluxes in Fengxian District (Figure 5). Over the five years, the interannual variability of annual CO2 fluxes in Fengxian District exhibited very distinct characteristics: more than 90% of the region experienced a decrease in annual CO2 flux values, with the maximum reduction rate reaching 17.83%. These reductions were primarily concentrated in the eastern Shanghai Harbor Industrial Zone, western Nanqiao Town, and Zhuanghang Town of Fengxian District. Conversely, the areas where the annual CO2 flux values increased were fewer and more scattered, primarily located in the eastern part of Haiwan Town, with a growth rate of up to 6.74%. Combined with the land use changes in Fengxian District, it was found that the areas with decreased CO2 flux values generally corresponded to regions with denser vegetation, while areas with increased CO2 flux values were associated with increased land for construction purposes [40].
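A cell-by-cell linear trend of this kind can be sketched as below. The code fits an ordinary least-squares slope of annual flux against calendar year for every grid cell, using the five unevenly spaced study years. The nested-list grid layout and the function names are assumptions, and the text does not specify how the percentage rates were derived from the slopes, so only the slope itself is computed.

```python
YEARS = [2011, 2012, 2017, 2018, 2019]  # study years from the text


def ols_slope(years, values):
    """Least-squares slope of values against years (flux units per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


def trend_map(stack):
    """Slope for every cell; stack[k][i][j] is the flux of cell (i, j) in YEARS[k]."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[ols_slope(YEARS, [layer[i][j] for layer in stack])
             for j in range(cols)]
            for i in range(rows)]
```

A negative slope marks a cell whose annual CO2 flux declined over the period, and a positive slope marks an increase.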

Discussion
In this study, we adopted three traditional machine learning models (ANN, SVM, RF) and one deep learning model (LSTM). Our modeled results are within a reasonable range and highlight the importance of impervious surface area in simulating CO2 flux over complex urban underlying surfaces. Machine learning models can automatically learn complex nonlinear relationships from input data [41,42]. Given our limited understanding of the complicated physical, chemical, and biological interactions in the carbon cycle within urban ecosystems, there is significant uncertainty in the simulation of carbon flux [14]. Machine learning models enable us to quantify the spatial and temporal dynamics of carbon flux more precisely on regional and global scales. Relatively few studies have examined the time series of urban CO2 flux [43,44]. Schmidt et al. [43] employed ANN and RBF to observe and simulate CO2 flux. In a six-week observational simulation conducted 65 m above an urban community in Münster, Germany, they achieved an R² of 0.67. Järvi et al. [44] predicted CO2 flux with an ANN using five years of data from Helsinki, Finland, achieving an R² of up to 0.4; the findings of Menzer et al. [13] showed that the ANN algorithm provided similar performance in urban environments (with R² values ranging from 0.60 to 0.86). The performance of various machine learning models in estimating NEE also varied. The R² of the ANN in estimating CO2 flux in this study was 0.67, consistent with the results in Münster, while the RF model in this study attained an R² of 0.85. Zeng et al. [45] used the RF model to estimate the carbon flux of global terrestrial ecosystems based on eddy covariance data, achieving favorable outcomes (with R² values of 0.97 for gross primary production (GPP), 0.96 for ecosystem respiration (RECO), and 0.94 for NEE when combining all training data).
Comparing previous studies with our results, we found that the performance of ANN in simulating CO2 flux over complex urban surfaces differed among studies, which might be related to the input factors or the length of the time series [46,47]. The RF model has shown excellent simulation performance in many studies, which is consistent with the results of this study.
In the simulation and prediction of carbon fluxes in the study area, quantifying the actual contributions of land use and land cover changes to carbon fluxes [48,49] is subject to significant uncertainty. One of the reasons is the lack of core data that disaggregate these fluxes into individual grids [49]. Furthermore, limited data or the exclusion of certain processes (such as tree felling and conversion of land for cultivation) may lead to the underestimation of carbon dioxide emissions or transfers resulting from land use changes [49–52]. Most studies establish regional-scale simulation models for carbon fluxes based on data from individual sites to construct machine learning models applied to those sites [53–55]. However, using data from multiple flux sites to establish a generalized simulation model is effective in addressing this issue. Such a generalized model can also be used to infer carbon fluxes at meteorological stations and provide additional observational datasets for studies on flux changes across multiple regions. The RF model is considered a reasonable and suitable method for simulating CO2 fluxes from site to regional scales. Firstly, as a machine learning algorithm, the RF model aggregates the outputs of multiple regression trees to capture the features of the data, effectively enhancing the accuracy of flux estimates [56,57]. Secondly, by extracting the multivariate functional relationships between observed data and explanatory variables, the RF model can integrate data from different sources and simplify complex processes, addressing nonlinear issues in ecosystems [58].
This study proposed a model for simulating the spatiotemporal variation of carbon fluxes over complex urban underlying surfaces, leveraging four machine learning algorithms for the first time. The model performed well in handling flux data while avoiding the computation of intricate parameters and ecological processes. By using footprint models to quantify the area of impervious surfaces at a daily scale, the model achieved enhanced accuracy in simulating the spatiotemporal variability of carbon flux in Fengxian District, Shanghai, thereby reducing the uncertainty associated with spatiotemporal simulation of carbon flux in the region. Nevertheless, this study has several uncertainties and limitations. Firstly, it did not quantify the errors generated during scale transformation of the data. Secondly, because of the opaque nature of machine learning models, it was difficult to fully understand how the models shaped the prediction results during training. Additionally, variable resampling employed only linear interpolation, without considering other resampling methods. Hence, future research should focus on exploring appropriate methods to mitigate the scale effects caused by scale conversion.

Conclusions
In this study, we employed machine learning algorithms to simulate the spatiotemporal fluctuations of carbon flux over the complex underlying surfaces of Fengxian District, located on the northern bank of Hangzhou Bay, by integrating diverse sources of observation data. The primary findings are summarized as follows. The four machine learning models used in this study can accurately simulate the long-term carbon flux over the complex underlying surfaces, with the RF model exhibiting the highest simulation performance. This study concludes that the RF model can accurately simulate the carbon flux of complex urban underlying surfaces and confirms the significant role of impervious surface area in predicting the spatiotemporal variation of carbon flux on such surfaces. The innovation of this paper is the derivation of daily-scale impervious surface area based on the Kljun model, and our results demonstrate that incorporating the daily impervious surface area index improves the accuracy of long-term carbon flux simulations over the complex urban underlying surface. This not only reduces the uncertainty in modeling carbon cycling in terrestrial ecosystems but also broadens the variety of models for the carbon cycling of terrestrial ecosystems.

Figure 1. (a) An image of Shanghai; (b) a distribution map of streets and towns in Fengxian District of Shanghai; and (c) a land use map of the underlying surface for the eddy covariance tower area in 2019.

Figure 2. Comparison of daily mean CO2 flux predicted by different machine learning models and observed daily mean CO2 flux. SVM: support vector machine; LSTM: long short-term memory network; ANN: artificial neural network; RF: random forest. The red line is the fitted line, and the black line is the 1:1 line.

3.3. Simulation of Regional Carbon Flux Distribution Based on the RF Model
Based on the findings from the preceding section, we used the RF model to simulate the annual CO2 flux at a spatial resolution of 500 m for Fengxian District in 2011, 2012, 2017, 2018, and 2019 (Figure 3). The RF model exhibited strong simulation capability, with an R² of 0.8446, an RMSE of 0.2808 µmol·m⁻²·s⁻¹, and an MAE of 0.1462 µmol·m⁻²·s⁻¹. Fengxian District acted as a net CO2 source on average between 2011 and 2019, with an average net CO2 exchange of 1.02 g·m⁻²·d⁻¹. The annual average CO2 flux was lower in the western region (mean CO2 flux of 0.79 g·m⁻²·d⁻¹) than in the eastern region (mean CO2 flux of 1.25 g·m⁻²·d⁻¹), and lower in the northern region (mean CO2 flux of 0.87 g·m⁻²·d⁻¹) than in the southern region (mean CO2 flux of 1.22 g·m⁻²·d⁻¹), with a gradual increase in CO2 flux from west to east. In the western region, the CO2 flux showed a circular pattern centered on Nanqiao Town and Fengxian District Modern Agricultural Park, with values increasing outward in a concentric manner. Furthermore, the distribution characteristics and range of CO2 flux varied significantly from year to year.
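The three evaluation metrics reported above can be computed as in the following minimal sketch. It assumes that R² denotes the coefficient of determination (the text does not state which definition was used), and the function name and the four-value example series are hypothetical rather than data from this study.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, MAE) for paired observed and predicted values.

    R2 is computed here as 1 - SS_res / SS_tot, which is an assumption
    about the definition used in the source.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Illustrative values only, in the same flux unit for both series.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
r2, rmse, mae = regression_metrics(obs, pred)
```

RMSE and MAE carry the unit of the flux itself, so they are only comparable across models evaluated on the same series.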

Figure 3. Spatial distribution of annual average CO2 flux in Fengxian District of Shanghai along the north bank of Hangzhou Bay in 2011, 2012, 2017, 2018, and 2019.

Figure 4. Comparison of the predicted daily average carbon flux with the observed daily average carbon flux: y1 is the CO2 flux predicted by RF with impervious surface input; y2 is the CO2 flux predicted by RF without impervious surface input.

3.5. Interannual Variation of CO2 Flux Spatial Distribution
This study utilized an RF model to simulate the spatial distribution patterns of carbon fluxes in Fengxian District of Shanghai for the years 2011, 2012, 2017, 2018, and 2019.

(2) The RF model can accurately portray the spatiotemporal distribution characteristics of carbon flux in Fengxian District, Shanghai. (3) Spatial heterogeneity in carbon flux was evident in Fengxian District on the north bank of Hangzhou Bay: the carbon flux in the western region was lower than that in the eastern region, with a gradual increase from west to east within the district. (4) When simulating the spatiotemporal carbon flux of complex underlying surfaces using machine learning algorithms, incorporating the impervious surface area index marginally improved the accuracy of long-term carbon flux simulations. Spatially, regions with larger impervious surface areas exhibited higher carbon flux values, indicating a strong correlation between carbon flux distribution and land use patterns. Consequently, the impervious surface area index serves as a relatively significant indicator for simulating carbon flux at the spatial scale.
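One simple way to check the spatial relationship noted in finding (4) is a Pearson correlation between per-cell impervious surface fraction and simulated flux. The sketch below is only an illustration under that assumption; the function name and the four sample cells are hypothetical, and the study does not report that this particular statistic was computed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical grid cells: impervious fraction and flux (illustrative only).
impervious_fraction = [0.1, 0.3, 0.5, 0.7]
cell_flux = [0.8, 0.9, 1.2, 1.3]
r = pearson_r(impervious_fraction, cell_flux)
```

A value of r close to 1 would be consistent with higher flux in more built-up cells, although correlation alone does not separate the effect of impervious cover from other land use factors.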

Table 1. Predictor variables used for model simulation.

Table 2. Contribution of influence factors to daily average carbon flux.