A 1 km Global Carbon Flux Dataset Using In Situ Measurements and Deep Learning

: Global carbon ﬂuxes describe the carbon exchange between land and atmosphere. However, already available global carbon ﬂuxes datasets have not been adjusted by the available site data and deep learning tools. In this work, a global carbon ﬂuxes dataset (named as GCFD) of gross primary productivity (GPP), terrestrial ecosystem respiration (RECO), and net ecosystem exchange (NEE) has been developed via a deep learning based convolutional neural network (CNN) model. The dataset has a spatial resolution of 1 km at three time steps per month from January 1999 to June 2020. Flux measurements were used as a training target while remote sensing of vegetation conditions and meteorological data were used as predictors. The results showed that CNN could outperform other commonly used machine learning methods such as random forest (RF) and artiﬁcial neural network (ANN) by leading to satisfactory performance with R 2 values of the validation stage as 0.82, 0.72 and 0.62 for GPP, RECO, and NEE modelling, respectively. Thus, CNN trained using reanalysis meteorological data and remote sensing data was chosen to produce the global dataset. GCFD showed higher accuracy and more spatial details than some other global carbon ﬂux datasets with reasonable spatial pattern and temporal variation. GCFD is also in accordance with vegetation conditions detected by remote sensing. Owing to the obtained results, GCFD can be a useful reference for various meteorological and ecological analyses and modelling, especially when high resolution carbon ﬂux maps are required.


Introduction
Carbon dioxide in the atmosphere is an important factor influencing climate [1,2].Carbon dioxide exchanged between the atmosphere and land, commonly referred to as carbon flux, is a major contributor to global climate change and warming [3,4].Gross primary production (GPP), terrestrial ecosystem respiration (RECO), and net ecosystem exchange (NEE) are the most common carbon flux variables used to measure the carbon exchange process between the land and the atmosphere [5,6].Continuous measurements of these data are provided by eddy covariance flux observation towers around the world [7].FLUXNET is an example of the global observation network [8].However, despite the large number of sites, the measured values at these sites can only represent the regional carbon fluxes in the absence of global distribution information [9][10][11][12].Thus, it is necessary to estimate spatial-temporal continuous distribution of global carbon flux.
In order to make effective estimates of carbon flux in various regions of the earth, reliable upscaling methods should be adopted and applied [13].There are several kinds of technologies available to calculate or estimate global carbon fluxes [14].Data-driven machine learning methods are crucial in these approaches.These methods typically make use of satellite remote sensing records, meteorological data, and other datasets related to the carbon flux.Several studies and projects used various machine learning models to predict carbon fluxes.The most widely used of such projects is FLUXCOM (http://www.fluxcom.org/,accessed on 31 January 2023) [15].The model tree ensemble method employing remote sensing data and a meteorological dataset was the benchmark approach for FLUXNET upscaling [16].The datasets obtained can be used to assess land surface process models and the state of the biosphere [17].The recent initiative further developed models based on machine learning algorithms in carbon flux prediction.Tramontana et al. [18] evaluated the random forest model's performance in various regions for various target variables.Their study used 11 machine learning methods in four categories, including tree method, kernel function method, neural network, and regression splines, and compared the outputs with results of the existing models [9,19].Using the same models, Jun et al. [20] predicted the global gridded data of energy flux, including radiation, latent heat, and sensible heat.Other studies also used some machine learning methods for carbon flux estimation.Among these methods, random forest and some other similar tree models showed acceptable outcomes for predicting carbon flux.The prediction results using random forest showed that it outperformed MODIS products in several locations [21].Xiao et al. [22] used a modified regression tree to predict the net ecosystem carbon exchange in the US.Cubist regression tree model was used to estimate global SIF (solar-induced chlorophyll fluorescence) and GPP based on MODIS remote sensing data and meteorological reanalysis data [23].The tree ensemble method was applied to develop a dataset nationwide over China [24].Zeng et al. [25] used the random forest model to upscale GPP, RECO, and NEE data of FLUXNET2015 to obtain global carbon flux products from 1999 to 2019.In addition to tree models, the Neural Network model has also been used to estimate global GPP with remotely sensed SIF and other radiation and meteorological data as predictors [26].The Support Vector Regression model could also reproduce GPP and NEE at site level and spatial level in Asia [27].The potential for deep learning applications in environmental modelling is growing [28].For example, Yu et al. used four deep learning models, including deep neural network, convolutional neural network, back propagation neural network, and recurrent neural network to downscale GLASS GPP and NPP data to a higher resolution [29].
Nowadays, there are still some research gaps regarding models and data utilized for prediction of carbon fluxes.First, few studies have paid attention to the application of deep learning on upscaling one carbon flux in order to derive global products, despite the fact that they outperform conventional machine learning in many cases, though many studies have focused on estimating carbon flux by conventional machine learning methods.Therefore, in our research, the convolutional neural network (CNN), as a robust deep learning model, was applied for prediction of the carbon flux in order to make a comparison with other conventional machine learning methods.Second, available site data are not fully utilized in previous studies [9,19,25].In our study, in addition to the FLUXNET dataset, two more site datasets (i.e., FLUXNET-CH4 [30] and Drought-2018 [31]) with updated data were added to the training target set, which can enhance the representativeness of the site data, and thus lead to a higher quality of the carbon flux estimation.Last but not least, current global carbon flux products are at relative coarse resolutions with limited precision.The widely used FLUXCOM has a resolution of 0.5 degree, while Zeng et al. [25] provides a product at 0.1 degree.In our study, we aim to produce a global carbon flux dataset at a much higher resolution, i.e., 30 arc second (~1 km), which can meet the demand of research that requires more spatial details of carbon fluxes.The purpose of this study was to develop a global carbon flux dataset (GCFD) with 1 km resolution using remote sensing and meteorological variables.The main research questions were as follows: (1) Can CNN as a deep learning model achieve better performance than conventional machine learning models such as ANN and RF? (2) Are the spatial patterns and temporal variations of carbon fluxes estimated by CNN reasonable at the resolution of 1 km?
The remaining structure of the paper is organized as follows.Section 2 describes the target variables and predictors, models, experiments, model evaluation, and data production.Section 3 gives the validation results, timeseries diagrams, spatial distributions, and variable relation analysis.Sections 4 and 5 present the discussion and conclusions, respectively.

Materials and Methods
This section introduces data sources used, models for predicting carbon flux, experiment design, and evaluation indicators.The framework of this study is shown in Figure 1.

Target Carbon Fluxes and Predictors
The predicted target variables for the models were three carbon fluxes, namely, GPP, RECO, and NEE from flux tower sites.There are mainly two categories of predictors used in this research: remote sensing and meteorological variables.Remote sensing variables include fraction of absorbed photosynthetically active radiation (FAPAR), leaf area index (LAI), minimum LAI of each year (LAI_MIN), maximum LAI of each year (LAI_MAX), and the number of LAI exceeding the average of LAI_MAX and LAI_MIN in each year (LAI_COUNT).The meteorological variables include 2 m temperature (TA), surface solar radiation downward (SW_IN), latent heat flux (LE), sensible heat flux (H), soil temperature (TS) and soil water content (SWC).These predictors were selected for the current modelling according to previous studies [25].The remote sensing variables were used as indicators of vegetation conditions.Zeng et al. proved that there is a good statistical relationship between carbon fluxes and FAPAR (or LAI) [25].In addition, LAI_MIN, LAI_MAX, and LAI_COUNT can also represent different plant functional types to some extent.The meteorological variables were used as environmental condition indicators that affect plants' activities.

Flux Tower Site Data
Three datasets, FLUXNET2015 [8], FLUXNET-CH4 [30], and Drought-2018 [31] were used as site datasets in this study, which include meteorological data and carbon flux variables.The FLUXNET2015 Tier 2, the FLUXNET-CH4, and Drought-2018 datasets contain data from 212, 80, and 52 sites, respectively.Since the FLUXNET-CH4 and Drought-2018 datasets have been produced more recently, only the new versions of data were kept for duplicated sites from different datasets.After removing duplicated sites, 280 sites around the world remained.The locations of these sites are shown in Figure 1.In this study, daily values of three carbon flux variables were extracted from these sites.The time coverages of these sites are different, varying from 1996 to 2018.Data after 1999 were used in the study to keep consistent with the time range of available remote sensing data.In order to keep consistent with the time interval of remote sensing data, site data were aggregated into a 10-day time step.Since GPP and RECO are always positive, we removed the negative values of GPP and RECO.We also removed the unrealistic continuous constant values of site measurements at some timestamps.Missing values were deleted for the machine learning modelling.

Remote Sensing Data
The remote sensing data including FAPAR and LAI are from Copernicus Global Land Service (CGLS) version 2 dataset [32] with 1 km resolution, covering from January 1999 to June 2020.The interval of the data is 10 days: the first 10 days of a month, the 11th to 20th days of the month, and after the 20th day of the month.The LAI_MAX, LAI_MIN, and LAI_COUNT variables were calculated according to LAI, and they were annual scale.For every year in our 10-day fitting, we applied the same values of these variables.Remote sensing data were extracted according to the location of each flux site.That is, if a site is located in a grid of remote sensing data, the corresponding values are extracted.

Reanalysis Data
The reanalysis data for the meteorological variables were retrieved from the ERA5-Land dataset (Land component of the fifth generation of European Reanalysis) with the spatial resolution of 0.1 • × 0.1 • [33].Data from January 1999 to June 2020 were used in this study.In accordance with remote sensing data, the original data with a three-hour time step were aggregated to the 10-day average.We processed the ERA5 data into 1 km resolution using the nearest neighbor method.

Models for Carbon Flux Prediction
The CNN model [34] was used in this study as a robust deep learning approach to predict carbon fluxes.For comparison, two conventional machine learning models were also applied, including ANN [35] and RF [36].The main components of CNN include convolution layer and pooling layer.The convolution layer projects the features of the input data to the next layer through convolution operation, while the pooling layer reduces the number of parameters and computation by down sampling the input data.After the above processing, the data are processed and outflow through the fully connected layers.The basic structure of ANN is a perceptron, which is a kind of artificial neuron.It first accepts input and calculates weighted sum, and then produces output using activation functions.A neural network consists of several layers of perceptron, including input layer, fully connected layer, and output layer.In regression problems, the number of neurons in the last output layer of ANN and CNN is the number of output dimensions.In our study, three neurons were considered in the output layer to predict the values of GPP, RECO, and NEE.RF is an ensemble of multiple decision trees.It uses the bagging method to randomly select a feature subset from all available features to train different decision trees at each time (iteration), and it finally takes the average of the predicted results of all the decision trees as the final predicted value.

Experiment Design
Experiments were designed for comparison among three machine learning algorithms, including CNN, ANN, and RF.The grid search method was used to determine the optimum parameters for each model.We specified the search range of each parameter based on the default setting of the software and some previous studies about machine learning.For the random forest model, we considered a search range of total tree numbers within 100 to 900 with an interval of 100, and we searched for minimum samples at the leaf node from 1 to 10.According to the above searching scheme, we set the total tree numbers and the minimum samples at a leaf node as 500 and 5, respectively.For ANN and CNN models, the searching range of the number of neurons of each dense layer was from 10 to 90 with an interval of 10.The number of dense layers was searched in the range of 1 to 5. The learning rate was searched among 0.1, 0.01, and 0.001.For the ANN model, the number of dense layers, the number of neurons, and the learning rate were set as 3, 90, and 0.01, respectively.For CNN modelling, a 1-dimensional convolutional layer followed by 3 dense layers with 70 neurons and the learning rate of 0.01 were used.The simulation setting of the parameters in this part was described in Table 1.

Model Evaluation and Data Production
The modelling performance was evaluated by the year-to-year and station-to-station validations.In the year-to-year validation, the data from 1999 to 2017 in each station were used as a training dataset, and to evaluate the accuracy of the models on a temporal scale, the generated carbon fluxes by three models were compared with the observed carbon flux towers data in 2018.In the station-to-station validation, randomly selected data from 90% of the stations from 1999 to 2018 were applied for training, and the data from the rest stations were used to evaluate accuracy of the three models on a spatial scale.This process was repeated 10 times in order to make use of every station to evaluate the models.Finally, we computed the averaged results of the above 10 evaluations.
In this study, the coefficient of determination (R 2 ), root mean square error (RMSE), and bias were used to evaluate modelling performance as: where y i and ŷi are the observed and predicted values, respectively, and y is the averaged value of observations.
Due to the evaluation of three models (see Sections 3.1 and 3.2), the CNN model was chosen to produce the global dataset (GCFD).GCFD is provided with two resolutions, i.e., 30 arc second (~1 km) and 0.1 degree (~9 km) 3 times per month, covering January 1999 to June 2020, which is in accordance with the CGLS version 2 dataset used as predictors.The three carbon fluxes were set to zero for the LAI values less than 0.1.

Station-to-Station and Year-to-Year Cross Validation Results
In order to compare different machine learning models, three ML models (i.e., RF, ANN and CNN) were used to carry out the validation.
Figure 2 shows the station-to-station cross-validation results of each model.Two interesting aspects could be observed according to the obtained results.First, there were significant differences among the performance of the models.CNN led to the best performance for all three carbon fluxes, while RF had the worst.Meanwhile, ANN was slightly worse than CNN.Second, according to the obtained R 2 , all models generally achieved the best performance in predicting GPP, followed by RECO and NEE.Though NEE had slightly higher RMSE than RECO for ANN and CNN, GPP generally obtained the largest RMSE in both experiments, followed by RECO and NEE.This could be due to the wider range of GPP data compared to the RECO and NEE.As the time range of the used data for training was from 1999 to 2017, the data in 2018 were left out for the year-to-year validation purpose.The validation results are shown in Figure 4.The two interesting aspects could be highlighted in station-to-station validation, as also observed in the year-to-year validation.However, models generally had worse performance in the year-to-year validation than those in the station-to-station one, especially for the prediction of NEE by RF and ANN.This indicated that the predictability at the space scale was better than that at the time scale, especially for NEE.All models had similar RMSE around 1.9 and 2.3 g C m −2 d −1 .Nevertheless, according to R 2 , CNN could still be considered as the best model in general for all three fluxes.In summary, CNN outperformed the other two machine learning models according to the results of the two validation scenarios.

Prediction of Time Series at Single Sites
To further compare the prediction results, the trained models were used to make prediction at each site using the remote sensing records and reanalysis meteorological data for the location of the site.After prediction, the bias between predictions and observations was calculated (Figure 5).The median biases of all three carbon fluxes were close to zero in the experiments, which means that the models remained unbiased during the prediction.In order to evaluate the modelling performance, three ML models (RF, ANN, and CNN) were used to predict the time series of carbon fluxes for four selected sites in different climate zones.Figure 6 shows the time series of GPP of these sites, including observed and predicted values by the models.CNN could lead to the best predictions at all sites, except that it had slightly lower R 2 (0.78) at NL-Loo than RF (0.83).All three models showed a similar trend in the predicted time series, but CNN tends to be closer to the observations.For the low values, all three models could make reliable estimates in most cases.However, for some peaks, RF and ANN showed significant overestimates, especially for ANN at FI-Hyy and CH-Dav (see red circles in Figure 6).

Multi-Site Averaged Time Series
In order to further explore the temporal variation of each carbon flux, data from all stations were analysed together.Here, predictions from all sites at each timestamp in GCFD were averaged for the period of 1999 to 2018. Figure 7a shows the annual average changes predicted by the proposed model and the observed values.Moreover, according to the Koppen Climate Classification [37], the annual variation of the carbon fluxes in different climate zones is shown in Figure 7b-h.The predictions of annual changes were quite consistent with the observation from carbon flux towers in all climate zones.It shows that the variations of GPP and RECO were similar, which coincides with recent studies [38].Meanwhile, the variations of NEE were relatively small in all climate zones, except in tropical and subtropical climate.GPP showed a significant increase in the dry climate zone in the last two decades, which can be due, in the main, to the CO 2 fertilization effect, as reported by previous studies (Gonsamo et al., 2021).On the other hand, GPP has been decreasing in tropical zones since 2005.This possibly happened due to a higher vapour pressure deficit in the warmer climate over the tropics, which could lead to a suppression of photosynthesis [39].The comparison among different global carbon fluxes is discussed in Section 4.2.

Spatial Patterns
Figure 8 shows the annual mean values of GPP, RECO, and NEE from 1999 to June 2020 in the GCFD.Generally, the spatial patterns of the three carbon fluxes were reasonable according to the physical knowledge about the process.As can be seen from Figure 8, GPP predicted by the CNN model showed relatively high values near the equator (mainly rainforest).There were also relatively high values in some places such as central America and the southern part of Asia.This pattern of GPP distribution is consistent with other studies [14,40].According to the Koppen classification, most areas with high GPP have an equatorial fully humid or monsoonal climate.These kinds of climate provide adequate conditions of temperature and humidity for photosynthesis, so the GPP values are higher than other areas out of the tropical climates [41].There is usually a positive correlation between GPP and temperature and precipitation parameters [42].Recent studies have also revealed that temperature is the dominant factor affecting GPP at a global scale, while water availability is more important at the local scale, especially at low latitudes [43].Therefore, in some humid areas of lower latitudes such as the southeast part of China, eastern Australia, the southern part of South America, and the southern US, there were also high GPP values, as shown by Figure 8.The spatial pattern of RECO was generally consistent with that of GPP.That is, higher GPP corresponds to RECO, which is consistent with studies that indicated that high RECO values usually appear with high temperature [44,45].As recent studies have showed, RECO had the highest values in the tropical area, followed by southeast China, southeast US, and western Europe, and low values in cold and dry areas at high latitude [46,47].Regarding NEE, the high values (carbon source) appeared in the high latitudes, while low values (carbon sink) appeared in the tropical rainforest regions.This is consistent with the previous studies showing that NEE is strongly related to air temperature [48].Recent studies also pointed to low NEE values in Asia and the tropics [49].In cold and arid areas, previous studies reported that carbon usually releases into the atmosphere [50].

Main Variables Affecting Carbon Fluxes
Through correlation analysis, major factors affecting carbon fluxes could be determined among the existing predictors.Figure S1 shows the correlation coefficient matrix, which calculates all correlation coefficients among the 11 predictors and 3 target variables in this study.As can be seen from Figure S1, FAPAR, LAI, LE, and TA had the greatest influence on the three carbon flux variables, where the absolute values of correlation coefficients between these predictors and GPP (or RECO) were all above 0.5.The high correlation of remote sensing predictors and carbon fluxes is consistent with previous studies [51 -53].
In order to further analyse the influence of predictors on the carbon flux variables, the feature importance of random forest was used to determine and rank the important variables.Figure 9 shows the importance of each predictor by RF in Experiment 2, in which the sum of the importance values of all predictors is one.The result shows that LAI, FAPAR, and LE had larger feature importance than other variables when predicting GPP and RECO, while the five remote sensing predictors were the most important in predicting NEE, which is consistent with the result of correlation coefficients.However, air temperature was not important in the RF models compared to the correlation analysis result.This may be due to that the RF importance can reflect a nonlinear relationship, while the correlation coefficient measures only linear relationship between parameters.

Comparison with Different Global Carbon Flux Products
Here, the GCFD is compared with the predictions provided by Zeng et al. [24] (hereafter referred to as Zeng et al.) and FLUXCOM [9, 19,25].Because FLUXCOM data have been provided at monthly scale, all three data sets were processed into a monthly scale for comparison.Using data from all sites and their corresponding grid data for comparison, the provided results in Figure S2 were obtained.It shows that GCFD led to the best performance, but significant underestimations and overestimations were observed for the other two datasets.FLUXCOM and Zeng et al. showed quite similar levels of performance according to R 2 and RMSE values, while GCFD got higher R 2 and lower RMSE values for three fluxes.For example, compared with FLUXCOM, GCFD improved the R 2 by 58%, 66%, and 168% for GPP, RECO, and NEE, respectively.The above validation may not be so fair because FLUXCOM and Zeng et al. [24] only used FLUXNET2015, while GCFD used two more site datasets, i.e., FLUXNET-CH4 and Drought-2018 Flux.So, further independent comparison should be done by comparing the two datasets with the results of stationto-station and year-to-year cross validations of CNN in Experiment 2. According to the station-to-station cross validation, CNN in Experiment 2 still showed higher R 2 and lower RMSE values than the other two datasets.For example, compared with FLUXCOM, CNN improved the R 2 by 47%, 40%, and 100% for GPP, RECO, and NEE, respectively.According to the year-to-year cross validation, CNN led to higher R 2 for GPP (improved by 30%), RECO (improved by 1%), and NEE (improved by 42%), and similar RMSE with the other two datasets (Figure S3).The above results imply that GCFD could lead to higher accuracy than FLUXCOM and Zeng et al. [24], though it may have drawbacks in predicting the unseen period by the CNN model.The extrapolation for the unseen period by the model may produce poorer results compared with the interpolation for the unseen locations.Thus, it should be noted that the data in GCFD from 2019 to 2020 may have lower quality than those from 1999 to 2018 because in situ observations from 1999 to 2018 were used to derive the GCFD.
The global distribution of GPP, RECO, and NEE as well as their distribution in an area with oceanic climate from the three products on 10 July 2010 are presented in Figure 10, Figures S4 and S5.As can be seen, the GCFD with a spatial resolution of 1 km has more detailed information than other products, and it captures the spatial contiguity of carbon fluxes well.Meanwhile, the spatial patterns of GPP and RECO are consistent with the distribution of the most two important predictors with high resolution (LAI and FAPAR).The spatial distribution of the three carbon fluxes in continental climate zones, dry climate zones, and subtropical climate zones are also presented to further illustrate the advantages of the GCFD (Figures S6-S14).In summary, GCFD can better reflect the detailed spatial distribution of global carbon fluxes in accordance with vegetation conditions than Zeng et al. and FLUXCOM.
As shown in Figure 7, GCFD could better represent the annual changes of global carbon fluxes than Zeng et al. and FLUXCOM.For all three carbon fluxes, FLUXCOM had little annual change in all climate zones, which is a major defect of this product.Though Zeng et al. could reflect annual changes to some extent, it failed to reflect the increasing trend of GPP and RECO in dry climate zones and the decreasing trend in tropical climate zones, but the GCFD did.For the global average of all sites, GCFD had a slight overestimation of GPP and RECO compared with the observations, while Zeng et al. had a slight underestimation.However, FLUXCOM showed significant underestimation.This may be due to the resolution discrepancy of these products; because grids with larger size tend to have lower values of GPP and NEE in reflecting the average state.This phenomenon can also be observed in the regional maps of carbon fluxes (Figures 10 and S4-S14).This further emphasizes the necessity for deriving a high-resolution carbon flux dataset for appropriately reflecting the spatial heterogeneity.

Uneven Distribution of Sites
This study used data from 280 sites around the world.As shown in Figure 1, it is easy to see that in Europe and North America, the number and density of sites are significantly higher than in other parts of the world.Therefore, a high proportion of data from Europe and North America were used in the training step of the modelling.This could lead to better predictions in Europe and North America, but greater errors in other regions.Recent reviews of carbon flux products also pointed out that the density of sites is higher in temperate area than in tropical area [14].Therefore, it is necessary to find a better way to solve the problem of uneven distribution of sites, especially making more flux data available for the areas with sparse sites.

Selection of Predictors and Models
To predict carbon flux, this study used two remote sensing variables and six meteorological parameters and compared the performance of three different machine learning algorithms.After exploring the relationship between variables, the results showed that remote sensing variables can strongly impact on the carbon flux variations.Therefore, selecting and incorporating more predictors, especially remote-sensing-based variables related to vegetation, is crucial for machine learning based modelling of carbon fluxes.Moreover, the deep-learning based CNN model was used in this study.According to the results, the overall performance of the deep learning model was better than that of conventional machine learning methods.Therefore, we suggest that other deep learning based models such as ConvLSTM (Convolutional Long Short-Term Memory) [54] should be explored, and the stability and suitability of the modelling for carbon flux prediction should be improved.This study also found that LAI is the dominant factor for the prediction of carbon flux variables.Now, studies suggest that the relation between LAI and GPP is not so consistent in different land cover types, with a strong relationship in arid areas and a weak relationship in humid areas [55].Therefore, further investigation is needed to see how GPP can respond to variables of greenness, such as LAI and NDVI, in different vegetation types.Recent studies also showed that the increasing trend of CO 2 concentration has a positive effect on GPP trend [56], and, therefore, its incorporation in the proposed machine learning modelling can be investigated in future studies.
In addition, the spatial and temporal scales of the remote sensing data, meteorological data, and carbon flux data used in this study were incompatible.In order to match the time scale of remote sensing data, the 10-day average of the other two types of data were calculated, which may result in bias if the original time series contains missing values.On the other hand, the spatial scale of the remote sensing data is 1 km, while the resolution is 0.1 • × 0.1 • for the meteorological data.So, spatial interpolating was conducted before carbon flux estimation.In this process, the nearest value was used when interpolating the ERA5-Land data into a 1 km scale.This may reduce some detailed information and produce uncertainty in the 1 km carbon flux dataset.

Conclusions
Based on in situ carbon flux measurements, meteorological records, and remote sensing data, this paper evaluated the performance of multiple machine learning models, including CNN, ANN, and RF, in generating the global carbon flux dataset (GCFD) with three carbon fluxes: GPP, RECO, and NEE.According to the station-to-station and yearto-year validations, CNN performed better than RF and ANN models, and performed better in spatial interpolation than in temporal extrapolation.Thus, CNN was chosen to produce GCFD.GCFD achieved a satisfactory performance with R 2 values of 0.82, 0.72, and 0.62 for GPP, RECO, and NEE, respectively, and with RMSE values of 1.54, 1.27, and 1.31 g C m −2 d −1 , respectively.Compared with GPP and RECO, NEE has relative low accuracy, especially for its temporal change, which should be examined in the application of GCFD.GCFD showed reasonable spatial and temporal variations, and in accordance with the vegetation conditions, including LAI and FAPAR.GCFD outperformed two other machine learning based products, including Zeng et al. and FLUXCOM, in providing more accurate spatial distribution with higher resolution and more accurate annual changes.GCFD has made a great contribution to carbon flux modelling by providing a high spatial resolution dataset using a deep learning model.With a spatial resolution up to 1 km, GCFD can be a useful complement to existing model-based and satellite-based datasets for various meteorological and ecological analyses and modelling, especially for the practical applications that require high resolution carbon flux maps.In the future, GCFD can be updated with more site observations and more predictors affecting carbon fluxes, such as CO 2 concentration, for longer time coverage.The application of deep learning tools in predicting carbon fluxes could also be an important field of study.

Figure 1 .
Figure 1.The framework of this study.The red dots are sites in the training set and the blue dots are site in the test set.

Figure 2 .
Figure 2. The coefficient of determination (a, R 2 ) and root mean square error (b, RMSE) for station-tostation cross-validation of three models.RF: random forest; ANN: artificial neural network; CNN: convolutional neural network.Furthermore, Figure 3 shows the scatter plots of the station-to-station cross-validation for the best model, i.e., CNN.CNN led to R 2 values of 0.82, 0.72, and 0.62.However, GPP had a larger RMSE (1.54 g C m −2 d −1 ) than RECO (1.27 g C m −2 d −1 ) and NEE (1.31 g C m −2 d −1).In the experiments, the scattered points were close to the 1:1 line and no significant underestimation or overestimation was observed.

Figure 3 .
Figure 3. Scatter plots of predicted and observed values of GPP, RECO, and NEE of station-to-station cross-validation for CNN model; (a-c) are GPP, RECO, and NEE results, respectively.

Figure 4 .
Figure 4.The R 2 (a) and RMSE (b) results of year-to-year validation by leaving out data in 2018.

Figure 5 .
Figure 5.The prediction bias of (a) GPP, (b) RECO, and (c) NEE from all sites using RF, ANN, and CNN.

Figure 6 .
Figure 6.Timeseries of GPP (gC m −2 d −1 ) observed by FLUXNET and predicted by three models at 4 sites, including (a) ZA-Kru in subtropic, (b) NL-Loo in oceanic climate, (c) FI-Hyy in continental climate, and (d) CH-Dav in polar climate.Red circles indicate significant overestimation.

Figure 7 .
Figure 7. Annual timeseries of GPP, RECO and NEE for all sites in the global and in seven climate zones.The number of sites in each area is given in brackets.

Figure 9 .
Figure 9. Bar chart of feature importance of random forest model: GPP (a), RECO (b), and NEE (c).

Figure 10 .
Figure 10.GPP maps of the global and an oceanic climate area from different products, FAPAR maps, and LAI maps on 10 July 2010.The resolution is 1 km for GCFD, 0.5 degree for FLUXCOM, and 0.1 degree for Zeng et al.
: The correlation coefficient matrix between the variables; Figure S2: Scatter plot of GPP, RECO and NEE observed data and predicted data by GCFD (Figure a, b and c), Zeng et al (Figure d, e and f) and FLUXCOM (Figure g, h and i); Figure S3: Scatter plot of GPP, RECO and NEE observed data and predicted data by CNN model at monthly scale in station-to-station validation (Figure a, b and c) and year-to-year validation (Figure d, e and f); Figure S4: RECO maps of the global and oceanic climate area from different products on 10th July 2010; Figure S5: NEE maps of the global and oceanic climate area from different products on 10th July 2010; Figure S6: GPP maps of the global and a continental climate area from different products, FAPAR maps and LAI maps on 10th July 2010; Figure S7: RECO maps of the global and a continental climate area from different products on 10th July 2010; Figure S8: NEE maps of the global and a continental climate area from different products on 10th July 2010; Figure S9: GPP maps of the global and a dry climate area from different products, FAPAR maps and LAI maps on 10th July 2010; Figure S10: RECO maps of the global and a dry climate area from different products on 10th July 2010; Figure S11: NEE maps of the global and a dry climate area from different products on 10th July 2010; Figure S12: GPP maps of the global and a subtropical climate area from different products, FAPAR maps and LAI maps on 10th July 2010; Figure S13: RECO maps of the global and a subtropical climate area from different products on 10th July 2010; Figure S14: NEE maps of the global and a subtropical climate area from different products on 10th July 2010.

Table 1 .
The simulation setting.