Article

Machine Learning-Based Prediction of Ecosystem-Scale CO2 Flux Measurements

1 Department of Mathematics and Statistics, Northern Arizona University, Flagstaff, AZ 86011, USA
2 Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ 86011, USA
3 Department of Biology, Northern Arizona University, Flagstaff, AZ 86011, USA
4 School of Informatics, Computing & Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
* Authors to whom correspondence should be addressed.
Land 2025, 14(1), 124; https://doi.org/10.3390/land14010124
Submission received: 13 December 2024 / Revised: 31 December 2024 / Accepted: 6 January 2025 / Published: 9 January 2025
(This article belongs to the Section Landscape Ecology)

Abstract

AmeriFlux is a network of hundreds of sites across the contiguous United States providing tower-based, ecosystem-scale carbon dioxide flux measurements at 30 min temporal resolution. While geographically wide-ranging, the network has suffered from multiple issues over its existence, including towers regularly ceasing operation for extended periods and a lack of standardization of measurements between sites. In this study, we use machine learning algorithms to predict CO2 flux measurements at NEON sites (a subset of AmeriFlux sites), creating a model to gap-fill measurements when sites are down or to replace measurements when they are incorrect. Machine learning algorithms also have the ability to generalize to new sites, potentially even those without a flux tower. We compared the performance of seven machine learning algorithms using 35 environmental drivers and site-specific variables as predictors. We found that Extreme Gradient Boosting (XGBoost) consistently produced the most accurate predictions (Root Mean Squared Error of 1.81 μmol m−2 s−1, R2 of 0.86). The model showed excellent performance when tested on sites that are ecologically similar to other sites (the Mid-Atlantic, New England, and the Rocky Mountains), but poorer performance at sites with fewer ecological similarities to other sites in the data (Pacific Northwest, Florida, and Puerto Rico). The results show strong potential for machine learning-based models to make more skillful predictions than state-of-the-art process-based models, estimating the multi-year mean carbon balance to within an error of ±50 gC m−2 y−1 for 29 of our 44 test sites. These results have significant implications for accurately predicting the carbon flux or gap-filling an extended outage at any AmeriFlux site, and for quantifying carbon flux in support of natural climate solutions.

1. Introduction

Tower-based, ecosystem-scale CO2 flux measurements quantify the turbulent exchange of CO2 (FCO2, measured in μmol m−2 s−1) between the land surface and the atmosphere. Plainly, FCO2 measures how much CO2 is moving into or out of an ecosystem, per unit area and per unit time. During daytime hours, most ecosystems are a strong sink for CO2 (negative FCO2, following the micrometeorological sign convention) as they remove CO2 from the atmosphere through the process of photosynthesis. By comparison, during the night, ecosystems are generally a moderate source of CO2 (positive FCO2), as they release CO2 back into the atmosphere through the process of respiration. FCO2 is measured using a method known as eddy covariance (EC) [1]. Eddy covariance measurements are continuous in time (24 h a day, 7 days a week, 365 days a year) and are generally reported at an hourly or half-hourly temporal resolution. Global networks of eddy covariance flux towers collect in situ carbon flux measurements, providing information on photosynthesis dynamics across different ecosystems and under various environmental conditions.
Currently, FCO2 is measured at hundreds of research sites across the United States, with 385 of these sites being members of the AmeriFlux network [2,3]. While these measurements run continuously at high frequency (e.g., at 5 Hz), practical limitations such as technical failures, instrument malfunction, and the necessity of filtering out data collected under low-turbulence conditions have led to extended gaps in the collected data. Moreover, attempts to standardize the measurements across sites within the AmeriFlux network have had little success, meaning that when measurements are available, they may be more or less reliable than those from another site. Together, these issues compromise the validity even of the fluxes that are measured and reported without incident.
These are the issues we tackle in this paper, developing a consistent, robust, and explainable method for quantifying FCO2. To do so, we experimented with seven machine learning algorithms and 37 explanatory variables (environmental ‘drivers’) to make predictions of the half-hourly, daily, and annual FCO2 in 19 different ecosystems across the continental United States.
Machine learning algorithms are modern, computationally intensive statistical modeling techniques that learn from data to discover generalizable predictive patterns. The term includes many families of algorithms, including neural networks, tree-based models, and generalized linear models; however, the no free lunch theorem dictates that no model will outperform another in all applications [4]. Machine learning has found application in a large number of fields, including environmental monitoring; for an overview, see [5].
The primary contributions of this paper can be summarized as follows:
  • A wide-ranging comparison of many common machine learning methods for predicting tower-based FCO2;
  • The discovery of a generalizable machine learning-based model that can predict FCO2 to within 1.81 μmol m−2 s−1 of tower-based measurements;
  • An open-source gap-filled FCO2 dataset covering 44 unique sites for free use by other researchers in the climate science community;
  • An open-source code repository for reproducibility and wider implementation.
The remainder of this paper is organized as follows: Section 2 describes the background and purpose of the AmeriFlux network and explains the importance of quantifying carbon fluxes to the theory of Natural Climate Solutions; Section 3 discusses the state of existing work using machine learning to model CO2 and other flux measurements; Section 4 details the data, algorithms, and structure of our experiments; the results are presented in Section 5 and analyzed further in Section 6; finally, we draw conclusions, discuss limitations, and suggest future directions in Section 7.

2. Background Information

2.1. The AmeriFlux Network

The driving motivation for the establishment of AmeriFlux almost three decades ago was to measure the carbon balance of different ecosystems, and more specifically, to better understand the distribution of CO2 sinks and sources across the continent [6]. While this is an ambitious goal, from this perspective, the sampling provided by AmeriFlux is woefully inadequate—assuming that all 385 AmeriFlux sites are currently active (which they often are not), this coverage equates to approximately 1 flux measurement site every 25,000 km2.
Therefore, extrapolation and upscaling from individual sites to fine resolutions and regional and continental scales must be carried out using either process-based or statistical-based models. The former approach is attractive because these simulation models are based on state-of-the-art understanding of how the carbon cycle works. However, parameterization and initial conditions remain outstanding challenges, and past model validation efforts have highlighted serious model errors. By comparison, the latter approach is unattractive because many of these statistical approaches are essentially black boxes from which it is impossible to verify process-level representation. Standardization of inputs for statistical models is also a challenge, and, to the best of our knowledge, the validation of model predictions has generally not been conducted against independent datasets.
An extensive model–data comparison project of over 20 ecosystem models conducted under the North American Carbon Program found that process-based models generally performed poorly in representing site-level carbon flux dynamics across sites with varying land cover. Specifically, substantial model errors in representing FCO2 were found at annual, seasonal, and diurnal time scales [7,8]; models misrepresented the inter-annual variability in observed CO2 uptake [9]; models did not properly represent phenological transitions in spring or fall [10]; and models could not predict photosynthetic uptake within the uncertainty of observations [11]. These results lead to valid questions about the viability of using process-based models to evaluate natural climate solution strategies (discussed later in Section 6).
Statistical-based upscaling of FCO2 began about two decades ago with the pioneering work of Papale et al. [12]. They used an artificial neural network, trained with CO2 flux data from 16 measurement sites in Europe to calibrate a simulation model to predict CO2 fluxes of European forests at 1 km resolution. Several years later, Xiao et al. [13] calibrated a modified regression tree model to FCO2 measurements across the AmeriFlux network, using satellite observed greenness indicators, such as vegetation indices, leaf area index, and fraction of observed photosynthetically active radiation [14]. The sophistication of these kinds of upscaling efforts has matured over the last 15 years. The current state of the art is probably defined by the FLUXCOM project [15], which uses satellite remote sensing and gridded meteorological products to calibrate a model trained on FCO2 measurements from sites around the world.
However, a challenge with past efforts to upscale site-level measurements is the lack of standardization in measurement protocols across sites. For example, across the AmeriFlux network, the choice of instrument setup and configuration, and even the details of flux data processing and corrections (which are critically important), may be different for each site. Furthermore, key instrumentation principles (e.g., open vs. closed path gas analyzer or sonic anemometer geometry), installation protocols (e.g., depth profiles of soil temperature and moisture measurements), measured and calibrated quantities (gravimetric vs. volumetric soil water content vs. soil water potential), and even units (hPa vs. kPa for vapor pressure deficit—easily converted, but also easily incorrectly reported or interpreted) are not consistent across sites. In particular, this lack of consistency of site variables across sites is a major barrier for any predictive modeling methods that use machine learning techniques.
The aforementioned inconsistency and variation in the AmeriFlux network’s data largely stem from its design as a “coalition of the willing”, where sites are set up and monitored by a large number of researchers (and consequently, research interests). Fortunately, FCO2 data from the 47 long-term research sites operated within the National Ecological Observatory Network (NEON) are also contributed to the AmeriFlux data archive. NEON was specifically established to “collect long-term open access ecological data to better understand how U.S. ecosystems are changing” [16], and implicit in this mission statement is the need for standardization of measurement protocols and techniques across sites. This standardization opens up the possibility to use a machine learning algorithm to predict site-level FCO2 without relying on gridded or reanalysis products as is necessary when using sites from AmeriFlux as a whole. Thus, the network of NEON sites represents an opportunity to train models on observational data across numerous sites which might be viewed as analogous to a model emulator [17]. The key difference is that this model is trained on real observations rather than the output of a simulation model.
NEON sites are strategically located, following a clustering algorithm to identify and group distinct regions of vegetation, landforms, and ecosystem dynamics into 20 different domains, as shown in Figure 1. Within each domain, at one or more monitoring sites, standardized measurements of environmental drivers (weather, solar radiation, etc.) are conducted along with ecosystem-level measurements of FCO2 and other quantities measured by eddy covariance (e.g., sensible and latent heat fluxes).
As discussed in Section 4, our experiments leverage the reliability, consistency, and high quality of the data collected at NEON-based AmeriFlux sites to train machine learning models.

2.2. Natural Climate Solutions

Accurately quantifying the terrestrial–atmospheric exchange of carbon is vital to assessing the impact of environmental management projects and policies at all scales. Hemes et al. [18] argue that ecosystem-scale CO2 flux measurements can play an important role in developing and evaluating climate mitigation strategies at the global level, while Hollinger et al. [19] noted the value of CO2 flux measurements for quantifying the magnitude of carbon storage, on an annual basis, by a single evergreen forest. Both of these articles also highlight the value of accurate CO2 flux measurements in the context of a theory known as Natural Climate Solutions (NCS).
NCS is a framework for adapting existing theory and knowledge of ecosystem science to mitigate the impact of anthropogenic climate change. It focuses on deliberate actions to manage, restore, and otherwise conserve ecosystems to increase the quantity of CO2 that they remove from the atmosphere and store in slow-turnover carbon reservoirs, such as soil or woody biomass. While the role of terrestrial ecosystems in the global carbon cycle has been relatively well understood for decades [20,21], the theory of NCS was first defined in 2017 in a presentation by a group of scientists and practitioners at the Proceedings of the National Academy of Sciences [22]. Since this time, support for natural climate solutions has gradually gained momentum [23,24] (potentially because efforts to reduce fossil fuel emissions have not yet been successful).
A recent work [25] defined the five foundational principles of NCS, with Principle 4 reading: ‘There are multiple potential NCS actions that can occur in a given landscape and quantifying the overall magnitude of opportunity can help to focus efforts on the actions that can offer the largest mitigation returns. However, appropriate accounting is required to ensure that NCS potential is consistently and clearly quantified’. The authors argue that accurate carbon dioxide flux estimation is essential to the implementation and adoption of NCS, as it will help optimize the actions taken by governments and other land managers. The machine learning-based predictive modeling conducted in this paper aims to contribute to this important need.

3. Literature Review

The prediction of CO2 is an important task in environmental sciences, as rising levels of atmospheric CO2 are the primary cause of climate change [26]. However, it is also a difficult predictive modeling problem; relatively few studies have been conducted in this space, and even fewer have used modern machine learning methods.
The majority of existing work using machine learning to predict CO2 is focused on estimating the atmospheric CO2 concentration at various geographical scales (as opposed to our focus of predicting CO2 flux). Alomar et al. [27] used extreme learning machines (a variation of feed-forward neural networks) to accurately predict CO2 concentration at a single site in Hawaii, while Hou et al. [28] used XGBoost to predict emissions in geographical regions of China. Conversely, Fang et al. [29] used Gaussian processes, and Mardani et al. [30] used a multi-stage neural network technique, to predict national-level carbon emissions. A policy-based need for this modeling is evident in Zhang et al. [31], where a genetic algorithm was used to assess the impact of China’s ecological zones on its CO2 emissions. Finally, Baareh et al. [32] took a time series forecasting approach to model CO2 emissions with a neural network.
Model choice, hyperparameter tuning, and variable choice are always difficult in machine learning-based work, which is why Hamrani et al. [33] compared nine different machine learning algorithms to predict CO2 for their specific agricultural sites, and Durmanov et al. [34] examined the key enablers of greenhouse gas emissions.
When considering the prediction of eddy covariance carbon dioxide flux with machine learning, the existing research has either produced a low-accuracy model or a model that cannot generalize beyond the experimental site(s). For example, Tramontana et al. [35] predicted carbon dioxide and energy fluxes across global FLUXNET sites with four different algorithms, but the R2 for net ecosystem exchange of CO2 was less than 0.5. Alternatively, multiple works [36,37,38,39] all demonstrate high-accuracy results using machine learning models to predict carbon dioxide flux at a single site or a few experimental sites.
While most gap-filling techniques are process-based, Zhou et al. [40] used a variation of a random forest model to gap-fill extra long periods of missing values in carbon, heat, and energy fluxes. For further reading, a survey on using machine learning to predict various air pollutants, including CO2, is presented in [41].

4. Methods

Our experiments compared the performance of seven machine learning algorithms to predict half-hourly FCO2 measurements collected between 1 January 2016 and 30 June 2022 (Data were first accessed 1 June 2023). The experimental details are all provided in the following section and the code for the experiments is available at: https://github.com/jsl339/AmeriFlux.

4.1. Data

There are 47 NEON core terrestrial sites located across the U.S. and Puerto Rico, which strategically represent a range of vegetation, climate, and ecosystems divided into 20 different ecological domains as shown in Figure 1. Our experiments used data collected at 44 sites, as three sites—Marvin Klemme Range Research Station (OAES), Mountain Lake Biological Station (MLBS), and Puu Makaala Natural Area Reserve (PUUM)—were removed from the analysis due to inconsistencies in predictor variables, missing flux measurements, and errors arising during preprocessing. With these sites removed, our 44 sites represented 19 out of the 20 ecological domains (see [42] for general information about the data product).
We preprocessed the data using the R package REddyproc, as is the standard approach for gap-filling and u* filtering of carbon flux values. We used the U50 threshold to filter our u* values.
Table 1 shows a general explanation and summary statistics for the environmental drivers that we used as feature variables to learn our models. The data were sourced from 3 locations: AmeriFlux, the Phenocam Network, and MODIS satellite imagery [6,43,44].
Each site was assigned both a primary and secondary vegetation type from the following categories:
  • Agricultural (AG);
  • Deciduous Broadleaf (DB);
  • Evergreen Broadleaf (EB);
  • Evergreen Needleleaf (EN);
  • Grassland (GR);
  • Shrub (SH);
  • Tundra (TN).
After preprocessing, our final dataset consisted of 961,340 observations unevenly divided among the 44 NEON sites.

4.2. Experimental Design

We compared the predictive performance of seven machine learning algorithms (explained in Section 4.3 below) in two experimental scenarios. The experiments employ cross-validation, a common tool in machine learning experiments to ensure generalizability of the results. It does this by ‘holding out’ some data (called a ‘fold’), training the model without it, and testing on the held-out data. This gives an unbiased estimate of how the model would have performed on unseen data. The process is then repeated and averaged for robustness. When the data are split at random into k folds, this is referred to as k-fold cross-validation.
In the first experimental scenario, we performed 10-fold cross-validation on the data. This means that the data were randomly divided into 10 folds, with each containing approximately 10% of the data. The models were then trained using 9 folds (90% of the available data) and tested on the remaining fold. This process was repeated so that each fold was used in the training set 9 times and appeared as the test set once (see Figure 2a for an illustrated explanation). The performance of each algorithm was reported as the average across the 10 different runs. We note that the data were divided into the same 10 folds for each predictive algorithm.
K-fold cross-validation is a common technique in the testing and comparison of machine learning algorithms as it removes selection bias (whether deliberate or not), and demonstrates the ability of the models to generalize to unseen data [45].
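As a concrete sketch of the k-fold procedure described above, the following toy example splits a dataset into 10 folds and averages the held-out RMSE. The data and the least-squares fit are invented stand-ins for the study's actual flux dataset and models, used only to illustrate the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

def kfold_indices(n, k=10):
    """Shuffle the row indices and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

# Toy table standing in for the half-hourly driver/flux observations
X = rng.normal(size=(1000, 5))
true_coef = rng.normal(size=5)
y = X @ true_coef + rng.normal(scale=0.1, size=1000)

rmses = []
for fold in kfold_indices(len(y)):
    train = np.setdiff1d(np.arange(len(y)), fold)
    # A least-squares fit stands in for any of the seven compared models
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    pred = X[fold] @ coef
    rmses.append(np.sqrt(np.mean((y[fold] - pred) ** 2)))

mean_rmse = float(np.mean(rmses))  # performance averaged over the 10 runs
```

Because every observation appears in exactly one test fold, the averaged score reflects performance on data the model never saw during that fit.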
In the second experimental scenario, which we will refer to as leave-one-site-out cross-validation (L1SO CV), we began by partitioning the data by site, resulting in 44 uneven groups of data. We then employed a similar process to scenario one, where the models were trained on all but one group and tested on the remaining group (an example is shown in Figure 2b). This was repeated so that each site was used as the test data once, and therefore, the stated performance metrics are the average of the 44 models fitted and tested.
The L1SO CV experiments present an inherently more difficult problem than the prior scenario as a predictive model significantly benefits from learning from data belonging to the test site. These experiments were included to replicate a situation where a site has no prior carbon flux recordings, i.e., it could be a new site or the instrumentation might not be functioning correctly. In addition, this experimental setup also tests whether we might be able to make a minimal set of measurements at a site with lower standardization in measurement protocols in order to predict the FCO2. This would be helpful for carbon accounting purposes and nature-based carbon solutions, and also to enable a benchmark for land surface model simulations and checking existing datasets.
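The site-wise split can be sketched in the same style. The five "sites", their offsets, and the least-squares stand-in model below are hypothetical, but they illustrate why L1SO error exceeds 10-fold error: a held-out site can differ in ways the training drivers do not capture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 5 "sites" of 200 rows each; sites differ by an offset that
# the drivers do not capture, mimicking unobserved site-specific conditions
sites = np.repeat(np.arange(5), 200)
X = rng.normal(size=(1000, 4))
y = X @ np.ones(4) + 0.5 * sites + rng.normal(scale=0.1, size=1000)

site_rmse = {}
for s in np.unique(sites):
    held_out = sites == s                  # this site plays the "new site"
    coef, *_ = np.linalg.lstsq(X[~held_out], y[~held_out], rcond=None)
    pred = X[held_out] @ coef
    site_rmse[int(s)] = float(np.sqrt(np.mean((y[held_out] - pred) ** 2)))

mean_l1so_rmse = sum(site_rmse.values()) / len(site_rmse)
```

Sites whose offset is far from the training sites' behavior score markedly worse, mirroring the per-site spread reported in Section 5.2.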
The performance of each model was assessed using two evaluation metrics: Root Mean Squared Error (RMSE) and the Coefficient of Determination ($R^2$). The RMSE is the square root of the average of the squared prediction errors over all of the data in the test set. Specifically,

$$\mathrm{RMSE}(\hat{y}, y) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ is the measured (true) value and $\hat{y}_i$ is the predicted value for a test set of size N. Due to the squared component of the metric, the RMSE is sensitive to large errors in any of the individual predictions.
The $R^2$ evaluation metric is a measure of the goodness-of-fit of the linear model found by regressing the predicted values against the true values. It is calculated as

$$R^2(\hat{y}, y) = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

In contrast to RMSE, $R^2$ is not sensitive to large errors in any of the individual predictions, as it measures the amount of total variance accounted for by the predictions. When used together, these metrics complement each other and provide a more comprehensive picture of the performance of the algorithms. Each metric can then be analyzed on half-hourly, daily, and annual time scales. Ensuring accurate model predictions on an annual scale is important for reliable carbon accounting. However, it is also critical to evaluate model performance at finer temporal resolutions, such as half-hourly and daily scales, to ensure that our models produce accurate annual predictions for scientifically sound reasons. In order to produce meaningful predictions of annual sums of FCO2 for each test site, we must first use models optimized in the 10-fold experimental setting to fill in missing FCO2 values for each site before making predictions per site in the L1SO experimental setting.
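Both metrics follow directly from their definitions; a minimal sketch with made-up example values:

```python
import numpy as np

def rmse(y_pred, y_true):
    """Square root of the mean squared prediction error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_pred, y_true):
    """One minus the ratio of residual to total sum of squares."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical measured and predicted flux values
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
```

Note that $R^2$ as defined here can be negative when predictions are worse than simply predicting the mean of the observations.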

4.3. Machine Learning Models

We compared the performance of seven different machine learning models for predicting carbon dioxide flux, as follows:
  • Linear Regression (all predictors): This is a linear model including all of the variables using the maximum likelihood estimates for the coefficients. Linear regression assumes a linear relationship between the predictors and the response variable, which is unlikely in complex modeling problems, but does provide a baseline for the comparison of the performance of other models.
  • Stepwise Linear Regression: This model began by testing for the most significant single variable in a linear regression model, and then iteratively added variables and tested for greatest improvement. A threshold number of selection variables was set to 15 for this forward selection technique. In this way, we simplify the basic linear regression model to find feature variables with greater importance for linear prediction.
  • Decision Tree: A decision tree is a model based on recursively splitting the data on values of the predictor variables to maximize the difference between the resulting groups of observations. Decision trees are most effective on problems where there is a non-linear relationship between the predictors and the response variable [46,47]. The optimal tree depth, found through cross-validation, was 10.
  • Random Forest: A random forest model [48] is a bagged ensemble of decision trees. The algorithm creates an uncorrelated forest of decision trees by using random subsets of features in each tree. When predicting a regression variable with a random forest model, the overall prediction is the average of the results of each of its constituent trees.
  • Extreme Gradient Boosting (XGBoost): The XGBoost model [49] is a boosted ensemble of n underfit decision tree models. In practice, a decision tree is fit to the data and the errors in prediction are measured. Next, a second decision tree is fit to the errors of the first tree. Then, a third decision tree is fit to the errors of the second tree, and we continue until we have n trees in our ensemble. The optimal number of trees in our ensemble was found to be 2000. We also set the number of rounds for early stopping to 50, and we used a learning rate of 0.05, a max depth of 10, a subsample ratio of 0.5, and a subsample ratio of columns for each node of 0.45. Finally, we used the histogram-optimized approximate greedy algorithm for tree construction to optimize our XGBoost model. All hyperparameters were optimized through 10-fold cross-validation using an exhaustive grid search.
  • Neural Network (single layer): A neural network is the sum of weighted non-linear functions of the predictor variables. This model is a single-layer neural network, with 256 neurons in the hidden layer, and uses a feed-forward architecture with ReLU activation. Early stopping was implemented to prevent model over-fitting, and training was performed with a data loader with a batch size of 128. The learning rate was set to 0.0003, and the best performance was achieved with no weight decay using the Adam optimizer. For more information on the mathematics of neural networks, see [50,51].
  • Deep Neural Network: The model uses the same mathematical structure as the single-layer neural network, but increases the number of hidden layers to 3, each consisting of 256 neurons. Compared to the single-layer neural network, the increased depth of the model increases the number of parameters to learn, meaning the model is capable of modeling more complex relationships, but also takes longer to learn from the data.
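The boosting idea described for XGBoost above can be made concrete with a toy numpy sketch using one-split "stump" trees. This is a simplified illustration of fitting each new tree to the errors of the ensemble so far, not XGBoost itself; the data, stump learner, and hyperparameter values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_stump(x, y):
    """A one-split 'decision tree': choose the threshold minimizing SSE."""
    best = None
    for t in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda xs: np.where(xs <= t, lo, hi)

def boost(x, y, n_trees=200, lr=0.1):
    """Each new stump is fit to the errors left by the ensemble so far."""
    preds = np.zeros_like(y)
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(x, y - preds)    # fit the current residuals
        preds += lr * stump(x)
        stumps.append(stump)
    return lambda xs: lr * sum(s(xs) for s in stumps)

x = rng.uniform(-3, 3, 500)
y = np.sin(x)                              # smooth nonlinear target
model = boost(x, y)
fit_rmse = float(np.sqrt(np.mean((model(x) - y) ** 2)))
```

The ensemble of shrunken stumps captures the nonlinear target far better than any single stump can, which is the mechanism behind XGBoost's strong performance on this problem.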

5. Results

5.1. Results of 10-Fold Cross-Validation

The results for fitting each model and testing on each fold of the 10-fold cross-validation experiments are shown in Table 2 (RMSE). The XGBoost model and the deep neural network were the only two models with an RMSE less than 2 μmol m−2 s−1. The strength of these models suggests that there are non-linearities in the relationships between the environmental drivers and FCO2. It is important to note that the XGBoost model outperformed our deep neural network at each stage throughout model development, and in addition, the XGBoost model requires significantly less training time than either neural network.
After determining the optimal algorithm, we used the trained XGBoost model to gap-fill all of the missing values for each of the 44 sites. The resulting dataset, consisting of 4,068,459 observations, is freely available at https://zenodo.org/records/10719776 for use by other researchers in the climate science community.

5.2. L1SO Cross-Validation Results

The results for fitting each model and testing on each site of the leave-one-site-out cross-validation experiments are shown in Table 3 (RMSE) and Table 4 (R2). Again, the XGBoost model was superior to all others, with a mean prediction RMSE of 2.45 μmol m−2 s−1. This is 35% greater than the RMSE of the same model in the 10-fold cross-validation experiments, demonstrating the substantial information the model gains from seeing data from the test site in the training set (as is the case in the 10-fold experiments).
The results also varied greatly between test sites, from an RMSE of 0.66 μmol m−2 s−1 up to 6.22 μmol m−2 s−1. The model performed best on Toolik (TOOL), as well as on other sites with Tundra as the primary vegetation—Barrow Environmental Observatory (BARR), Healy (HEAL), and Niwot Ridge Mountain Research Station (NIWO)—suggesting that the environmental drivers for these sites are highly similar. Another explanation for the lower model RMSE across sites with Tundra primary vegetation is that these sites generally experience smaller-magnitude fluxes. Random errors scale with flux magnitude, so it is almost inevitable that sites with higher-magnitude fluxes will have somewhat larger model–data mismatch.
The model performed worst on Lajas Experimental Station (LAJA), which is one of two sites in Puerto Rico, and together these two represent the only two sites with an evergreen broadleaf primary vegetation type. While we cannot separate the domain and primary vegetation effects here, we can say that our training data, which are mostly from mainland United States, does not generalize well when predicting FCO2 in vastly different climates and ecosystems.
A map of the average RMSE per domain is shown in Figure 3.

5.3. XGBoost Feature Importance

The XGBoost algorithm has a built-in method for calculating the importance of each feature to the model’s performance. For our regression problem, we can quantify the increase in accuracy due to the split using the reduction in sum of squared errors resulting from that node, weight this value by the number of observations being split, and then attribute that value to the feature being split on. We then sum the values across the tree by iterating through all nodes that are not leaf nodes. Finally, we average the values over each feature over all of the trees in the model. Consequently, a greater value of importance tells us that a feature variable is better at splitting the data and therefore more useful to our model’s predictive ability. A plot of the twenty most important features for prediction is shown in Figure 4.
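The computation described above can be sketched directly. The tree structures, feature names, and numbers below are hypothetical, chosen only to illustrate the weighting and averaging steps; real XGBoost implementations expose equivalent gain-based importances through their own APIs.

```python
# Each internal node records the feature it splits on, the reduction in
# sum of squared errors the split achieved, and how many observations
# reached the node (all numbers are invented for illustration)
tree_1 = [
    {"feature": "EVI",           "sse_reduction": 40.0, "n_obs": 100},
    {"feature": "net_radiation", "sse_reduction": 25.0, "n_obs": 60},
    {"feature": "EVI",           "sse_reduction": 10.0, "n_obs": 40},
]
tree_2 = [
    {"feature": "net_radiation", "sse_reduction": 20.0, "n_obs": 100},
    {"feature": "air_temp",      "sse_reduction": 12.0, "n_obs": 55},
]

def gain_importance(trees):
    """Weight each split's SSE reduction by the observations it splits,
    sum per feature within each tree, then average across all trees."""
    totals = {}
    for tree in trees:
        for node in tree:
            key = node["feature"]
            totals[key] = totals.get(key, 0.0) + node["sse_reduction"] * node["n_obs"]
    return {f: v / len(trees) for f, v in totals.items()}

importance = gain_importance([tree_1, tree_2])
```

A feature that produces large, well-populated splits across many trees accumulates a high importance score, which is how the ranking in Figure 4 is derived.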
There are two input features that are noticeably more important to the model than the others: EVI and net radiation. This is interesting because these are not measurements taken through site-level instrumentation, which suggests that we can learn a lot about the FCO2 of a site just by knowing its vegetation greenness and radiation environment. Furthermore, six of the ten most important variables are continuous measurement variables, as opposed to the domain or vegetation categorical variables, meaning the model should generalize more easily to any new sites of interest.

6. Discussion

6.1. Comparison of 10-Fold and L1SO Experimental Results

By making predictions on each site in both 10-fold and L1SO contexts, we are able to gain a greater understanding of model performance across the 44 NEON sites. We partitioned our model’s 10-fold RMSE by site and treated a site’s average RMSE value as the irreducible error, that is, error that can be attributed to variability in the dataset, measurement errors, and the error inherent in using a model to predict a biological process. From there, we compare this irreducible error to the average RMSE values for each site obtained through the L1SO CV experiments, thereby obtaining an estimate of the amount of error attributable to testing on an ‘unseen’ site, which we call the L1SO remainder. A visualization of the baseline error and its corresponding L1SO remainder for each site, ordered by ecological domain, is shown in Figure 5.
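The L1SO remainder calculation amounts to a per-site subtraction, sketched below. The RMSE values are hypothetical placeholders, not the paper's actual per-site numbers:

```python
def l1so_remainder(rmse_10fold, rmse_l1so):
    """Subtract each site's 10-fold (irreducible baseline) RMSE from its
    leave-one-site-out RMSE, estimating the extra error attributable to
    predicting on an unseen site."""
    return {site: rmse_l1so[site] - rmse_10fold[site] for site in rmse_10fold}

# Illustrative values only (umol m-2 s-1); site codes echo the paper.
baseline = {"TOOL": 0.60, "LAJA": 2.10}
l1so     = {"TOOL": 0.66, "LAJA": 6.22}

rem = l1so_remainder(baseline, l1so)
# rem["TOOL"] ~ 0.06 (easy to generalize to); rem["LAJA"] ~ 4.12 (hard)
```

Sites with a remainder above a chosen threshold (0.85 in the paper's analysis) are then flagged as difficult to predict without their own data in the training set.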
The L1SO remainder gives us a reasonable way to identify sites that are difficult to predict without having that site’s data available in the training set. We identified five NEON terrestrial sites that had an L1SO remainder greater than 0.85. These sites are Guanica Forest (GUAN), Lajas Experimental Station (LAJA), LBJ National Grassland (CLBJ), Disney Wilderness Preserve (DSNY), and Wind River Experimental Forest (WREF). There are several reasons why these sites in particular may be difficult for a model in an L1SO scenario. Firstly, Guanica Forest and Lajas Experimental Station are the only two sites in Puerto Rico and in ecological domain 4. In addition, these two sites are the only two whose primary vegetation type is evergreen broadleaf (EB).
Wind River Experimental Forest is a site in Washington state, located in an old-growth forest of very tall trees, with a pronounced summer dry-down that restricts FCO2. Overall, Wind River Experimental Forest is a very unusual site in comparison with the other NEON terrestrial sites.
For each of these five sites, we created a time series of predicted FCO2 values and actual FCO2 values reported both in half-hourly increments, and aggregated as an average for each day of the year, as well as a scatter plot of predicted FCO2 vs. actual FCO2 for analysis. We then compared these results to sites with the same primary vegetation types for which our model had superior performance. In the case of Guanica Forest and Lajas Experimental Station, since there were no other sites with the same primary vegetation type, both sites are included in Figure 6.
Steigerwaldt Land Services (STEI), Dakota Coteau Field School (DCFS), and Delta Junction (DEJU) were used as comparison for our other three sites representing primary vegetation types of DB, GR, and EN, respectively. These comparisons are found in Figure 7, Figure 8 and Figure 9.
Note that in most cases, we observed large systematic errors in model performance for our five sites with the greatest L1SO remainder values. For example, when considering scatter plots of predicted vs. observed FCO2 for LAJA and WREF, the slope of predicted vs. observed FCO2 is less than 1. At CLBJ, the magnitude of summertime uptake is under-predicted. At DSNY, the seasonality is represented well but there is a consistent offset of several μmolm−2s−1, with predicted values higher than the measured values. By comparison, at STEI, DCFS, and DEJU, the magnitude and timing of predicted FCO2 is much better.
What is interesting from this analysis is that even on sites with relatively high L1SO remainder, our model seems to perform well on average at predicting patterns and dips in daily average FCO2. It appears that most of the errors associated with sites with large L1SO remainder can be attributed to the model being too conservative in its predictions, that is, it predicts values closer to zero than the true measured flux values. As seen in the right column of plots in Figure 7, Figure 8 and Figure 9, sites of the same primary vegetation type where our model had stronger performance generally have fewer large positive and negative flux values. This makes sense: our model learns to minimize prediction error, and since each error is squared, predictions of very large positive or negative values would, on average, be more heavily penalized. A good example of this can be seen in the half-hourly time series for Lajas Experimental Station in Figure 6. This site has a mix of large positive and negative observed flux values, and our model rarely made large positive or negative predictions. Compare this to a site like Delta Junction in Figure 8. Here, there are a number of large negative observed flux values, but not as many large positive values. Spikes in the negative direction are less erratic, and model predictions, as a result, more closely track the measured flux values. When looking at scatter plots of predicted flux vs. observed flux, one can see that the results for sites in the right-hand column are more tightly clustered around the 1:1 line, resulting in higher R2 scores.
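The conservativeness argument can be made concrete with a toy calculation: under squared-error loss, the single prediction that minimizes the loss for a set of observations is their mean, so a mix of large positive and negative fluxes pulls the optimal prediction toward zero. The flux values below are hypothetical:

```python
def mse(pred, obs):
    """Mean squared error of a constant prediction against observations."""
    return sum((pred - y) ** 2 for y in obs) / len(obs)

# Hypothetical half-hourly fluxes mixing large uptake and release
# (umol m-2 s-1), loosely in the spirit of a LAJA-like site:
obs = [-12.0, -10.0, 9.0, 11.0, 2.0]

# The MSE-minimizing constant prediction is the mean, which is near zero:
mean_pred = sum(obs) / len(obs)   # 0.0 here

# Predicting at either extreme incurs a far larger squared-error penalty:
loss_mean = mse(mean_pred, obs)
loss_neg  = mse(-12.0, obs)
loss_pos  = mse(11.0, obs)
```

Since `loss_mean` is smaller than both `loss_neg` and `loss_pos`, a model trained under squared error has no incentive to commit to extreme flux values it cannot reliably distinguish, which is consistent with the conservative predictions observed at LAJA.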

6.2. Relevance of L1SO Predictions for Unseen Sites

Historically, process-based models have been considered the “gold standard” for predicting ecosystem CO2 fluxes. However, past model–data evaluation studies have shown that although process-based models can often predict daily or sub-daily fluxes that agree reasonably well with measured values, model performance on longer time scales (seasonal, annual, and inter-annual) is often quite poor [7,9,52]. Models that cannot accurately predict ecosystem carbon budgets on annual and inter-annual time scales are not likely to be useful for carbon accounting purposes or for developing strategies for nature-based climate solutions. This suggests that alternatives to process-based models are needed. While machine learning-based models have been used for flux upscaling for almost two decades [12,13,15], these analyses have generally attempted to extrapolate from individual sites to regions and continents using only remotely sensed variables as drivers. While this strategy is intuitively appealing, it is unable to leverage the site-level characteristics that are undoubtedly relevant for making fine-scale predictions. Indeed, basic ecosystem theory suggests that without accounting for these site-level characteristics such as disturbance and land use history, it is impossible to predict ecosystem carbon balance. Notably, we found that site characteristics related to vegetation type, as well as to soils, were identified as among the most important features for predicting FCO2. However, remotely sensed variables from MODIS such as EVI and NDVI were found to be more important than site-level vegetation indices (e.g., GCC, RCC) derived from PhenoCam imagery. We can hypothesize that while PhenoCam imagery can provide phenological information at a fine spatial and temporal scale, it may be subject to issues related to the mismatch of footprints with eddy covariance flux measurements. 
In the case of heterogeneous landscapes, MODIS vegetation indices with larger spatial coverage may actually be more representative of seasonal variations in vegetation dynamics within the flux tower footprint.
Finally, we note that although site-level meteorological and environmental drivers (e.g., air temperature, relative humidity or VPD, soil temperature, soil moisture, and precipitation) were not ranked highly in terms of feature importance, this is not to say that these variables do not matter. Rather, it is likely that, in the context of variation in FCO2 from the Arctic to the Tropics, from winter to summer, and from day to night, the additional information contributed by these variables explains only a small amount of the half-hourly variation in FCO2, although it may contribute greatly to improved estimates of annual FCO2.
A persistent challenge in estimating site-level carbon balance via FCO2 measurements has always been that small but selectively systematic measurement errors in 30 min data can accumulate to large errors in annual integrals [10]. In our machine learning approach, selectively systematic prediction errors could occur if important meteorological or environmental variables were not accounted for as covariates. Omission of these variables might do little to impact the R2 calculated on 30 min values but could seriously impact annual flux integrals. Adoption of model optimization criteria that place more weight on reducing selectively systematic bias (which might not even show up when bias is calculated over a multi-year dataset) and improving predictive power on annual and multi-year time scales could be important for further improving the application of machine learning methods to carbon accounting and nature-based climate solutions.

6.3. Leveraging Site-Level Data When Standardized Model Inputs Are Not Available

Our feature importance plot (Figure 4) shows that, in spite of our assertion that site-level data are critical for correctly predicting ecosystem carbon balance, much of the information needed to predict half-hourly FCO2 actually comes from variables that are already available from gridded land cover maps (i.e., vegetation type classifications), satellite data products that characterize phenology (i.e., EVI, NDVI), and basic energy balance data that are also widely available as satellite data products (e.g., net radiation). This suggests that there is the potential for leveraging the much greater abundance of AmeriFlux towers (for which site-level measurements are not standardized, but still useful), together with key remotely sensed data products to generate an initial map of ecosystem carbon balance. This initial map, when fused with elements of the analysis presented here, could lead to a hybrid data product that leverages the sampling intensity of AmeriFlux and the standardized sampling of NEON. The development of a data fusion platform such as that which is described here is beyond the scope of the present analysis, but it is potentially an exciting direction to be pursued in future research.

6.4. Annual Carbon Sums

For most sites, we managed to obtain low RMSE and high R2 for predicting the measured half-hourly FCO2, even in the L1SO analysis (Table 3 and Table 4). However, in the context of carbon accounting and nature-based climate solutions, it is more important to know the overall carbon balance on an annual time scale. That is, we want to answer the question of how much carbon (if any) the ecosystem is removing from the atmosphere and putting into biomass and soil carbon on an annual basis. This carbon balance reflects the balance between plant photosynthesis (carbon uptake or negative flux) and ecosystem respiration (carbon release or positive flux). It is a challenge for models, either process-based or data-driven, to correctly determine the overall carbon balance because of the opposing nature of these processes on different time scales. For example, in most ecosystems, there is a strong seasonal pattern of carbon uptake during the growing season and release during the dormant season. During the growing season, there is also a diurnal pattern of carbon uptake during the day and release during the night. Annually, the difference between photosynthesis and respiration is much smaller (0–30%) than the flux associated with either of these two key processes.
A model that predicts the annual carbon balance for an unknown site would be extremely valuable if it successfully estimated the multi-year mean carbon balance. The model would be even more useful if it successfully represented the inter-annual variability in carbon balance. State-of-the-art process-based models have generally failed to meet either of these targets [9]. Our results show that across all vegetation types, annual sums predicted in the L1SO analysis performed surprisingly well at achieving the first target (see Table 5). For 29 out of 44 sites (66%), the L1SO-predicted multi-year mean carbon balance was within ±50 gCm−2y−1 of the “true” value estimated by gap-filling missing values in the CV analysis. This is quite remarkable given that the total uncertainty on the annual carbon balance, derived from gap-filled FCO2 measurements, is typically estimated to be about ±50 gCm−2y−1 [53]. However, for 7 of 44 sites (16%), the deviation between the L1SO-predicted multi-year mean and the “true” value was greater than 150 gCm−2y−1. Three of these were deciduous broadleaf forest sites, one was an evergreen needleleaf forest site, and one was a grassland site. We suspect that factors such as land use history and disturbance, which were not included in our model, might explain these deviations.
Annual sums predicted in the L1SO analysis also performed reasonably well in representing the “true” inter-annual variability estimated from gap-filled time series. At more than a quarter of sites (12 of 44, 27%), the correlation of L1SO-predicted annual sums and the gap-filled annual sums was greater than 0.75, while for almost half of the sites (21 of 44, 47%), the correlation was greater than 0.50. While these results are based on at most 5 years of data per site, they point to the enormous potential of machine learning to predict not only the long-term carbon balance of an unknown site, but even the inter-annual variation in that carbon balance. By comparison, it has been known for more than a decade that even the most sophisticated process-based models are unable to capture this inter-annual variability [54,55,56], despite accurately capturing the dynamics of “fast” processes operating on time scales of hours to days.
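The inter-annual-variability assessment reduces to a per-site Pearson correlation between L1SO-predicted and gap-filled annual sums, followed by counting sites above the 0.75 and 0.50 thresholds. A minimal sketch, with hypothetical site codes and annual sums:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Up to 5 annual sums per site (hypothetical values, gC m-2 y-1):
gapfilled = {"SITE_A": [-210, -180, -250, -190, -230],
             "SITE_B": [-40, -60, -30, -55, -45]}
predicted = {"SITE_A": [-200, -175, -240, -200, -225],
             "SITE_B": [-60, -35, -50, -30, -58]}

r_by_site = {s: pearson_r(gapfilled[s], predicted[s]) for s in gapfilled}
n_above_075 = sum(r > 0.75 for r in r_by_site.values())
# SITE_A tracks the gap-filled variability closely (r > 0.75);
# SITE_B does not, so only one site clears the threshold here.
```

With at most 5 years per site, these correlations rest on very few points, which is one reason the paper treats them as indicative of potential rather than as definitive skill estimates.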

7. Conclusions

In this paper, we showed the ability of machine learning-based models to make skillful predictions of tower-based CO2 flux measurements. Specifically, we found that an XGBoost model trained on 37 environmental drivers, from 44 AmeriFlux sites, can predict FCO2 at an unseen site to within an average error of 2.45 μmolm−2s−1. Furthermore, this error drops significantly—down to as little as 0.66 μmolm−2s−1—when a site in the training data has similar ecological characteristics to the unseen site. This suggests that, with strategic placement of instrumentation to record future data, there is the potential to predict most locations of interest with high accuracy. Our research underscores the importance of integrating advanced modeling techniques into carbon accounting frameworks, enabling more accurate quantification of carbon sequestration potential and guiding the implementation of effective natural climate mitigation strategies.
While our results are a significant step forward for quantifying carbon fluxes, we note that this work, like all machine learning-based modeling, is limited by the quality and quantity of training data—in this case, the tower-based flux measurements. The predictive performance of our model is generally lower for unique ecosystems, such as those in Washington and Puerto Rico. Further work would be required to fine-tune the model if predicting one of these sites with high accuracy is of specific interest (for example, using domain adaptation techniques [57,58] or increasing the data quantity in these regions).
NEON flux measurements, and those from AmeriFlux more generally, have substantial uncertainties [53]. For the most part, however, the CO2 flux measurement community holds that theoretically based corrections largely eliminate the systematic biases in measurements, and that the random errors then average out over long time scales. Unfortunately, validating the accuracy of these measurements is a challenge because of the numerous possible pathways by which CO2 can be removed from, stored in, and returned to the atmosphere. Despite the lack of complete standardization across AmeriFlux sites, it is widely believed that tower-based measurements of FCO2 provide the most accurate and informative estimates of ecosystem carbon uptake and storage. Importantly, interpretation of annual FCO2 is also possible in a cross-site context, whereas it is not so straightforward to compare biometric forest inventory measurements with estimates of grassland productivity based on biomass clipping, or estimates of agricultural productivity based on crop yield. For this reason, the ability of our ML-based model to successfully predict across-site variation in annual FCO2 integrals, and within-site inter-annual variation in those integrals, represents an important step forward in coast-to-coast mapping of ecosystem carbon balances at fine spatial resolution, and in the application of these carbon balance estimates to implementing natural climate solutions.
Finally, we note that our model is currently designed for site-level analysis, and that there are many broader sources of CO2 emissions and sinks (including power generation, transportation, and land use land cover change [59,60,61]). Each year, an ensemble of dynamic global vegetation models embedded in Earth system models is synthesized to estimate the overall ‘land sink’ [62,63]. We anticipate that this is a challenge that can be met in future years with more data and better ML techniques.

Author Contributions

Conceptualization, A.D.R. and B.L.; methodology, A.D.R. and B.L.; validation, J.U., J.L. and B.L.; formal analysis, J.U. and J.L.; resources, D.B., Y.L. and A.D.R.; data curation, J.U., J.L., D.B. and Y.L.; writing—original draft preparation, J.U., A.D.R. and B.L.; writing—review and editing, J.U., J.L., D.B., Y.L., A.D.R. and B.L.; project administration, A.D.R. and B.L.; funding acquisition, A.D.R. All authors have read and agreed to the published version of the manuscript.

Funding

This project was funded by NSF awards 1702697 and 2105828.

Data Availability Statement

All processed data and code are at https://github.com/jsl339/AmeriFlux. The gap-filled dataset has been made available for download at https://zenodo.org/records/10719776.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Baldocchi, D.D. How eddy covariance flux measurements have contributed to our understanding of Global Change Biology. Glob. Change Biol. 2020, 26, 242–260. [Google Scholar] [CrossRef] [PubMed]
  2. Novick, K.A.; Biederman, J.; Desai, A.; Litvak, M.; Moore, D.J.; Scott, R.; Torn, M. The AmeriFlux network: A coalition of the willing. Agric. For. Meteorol. 2018, 249, 444–456. [Google Scholar] [CrossRef]
  3. Chu, H.; Christianson, D.S.; Cheah, Y.W.; Pastorello, G.; O’Brien, F.; Geden, J.; Ngo, S.T.; Hollowgrass, R.; Leibowitz, K.; Beekwilder, N.F.; et al. AmeriFlux BASE data pipeline to support network growth and data sharing. Sci. Data 2023, 10, 614. [Google Scholar] [CrossRef] [PubMed]
  4. Wolpert, D. The Lack of A Priori Distinctions Between Learning Algorithms. Neural Comput. 1996, 8, 1341–1390. [Google Scholar] [CrossRef]
  5. Hino, M.; Benami, E.; Brooks, N. Machine learning for environmental monitoring. Nat. Sustain. 2018, 1, 583–588. [Google Scholar] [CrossRef]
  6. United States Department of Energy. AmeriFlux Management Project. 2023. Available online: https://ameriflux.lbl.gov/ (accessed on 1 June 2023).
  7. Dietze, M.C.; Vargas, R.; Richardson, A.D.; Stoy, P.C.; Barr, A.G.; Anderson, R.S.; Arain, M.A.; Baker, I.T.; Black, T.A.; Chen, J.M.; et al. Characterizing the performance of ecosystem models across time scales: A spectral analysis of the North American Carbon Program site-level synthesis. J. Geophys. Res. Biogeosciences 2011, 116. [Google Scholar] [CrossRef]
  8. Schwalm, C.R.; Williams, C.A.; Schaefer, K.; Anderson, R.; Arain, M.A.; Baker, I.; Barr, A.; Black, T.A.; Chen, G.; Chen, J.M.; et al. A model-data intercomparison of CO2 exchange across North America: Results from the North American Carbon Program site synthesis. J. Geophys. Res. Biogeosci. 2010, 115. [Google Scholar] [CrossRef]
  9. Keenan, T.; Baker, I.; Barr, A.; Ciais, P.; Davis, K.; Dietze, M.; Dragoni, D.; Gough, C.M.; Grant, R.; Hollinger, D.; et al. Terrestrial biosphere model performance for inter-annual variability of land-atmosphere CO2 exchange. Glob. Change Biol. 2012, 18, 1971–1987. [Google Scholar] [CrossRef]
  10. Richardson, A.D.; Anderson, R.S.; Arain, M.A.; Barr, A.G.; Bohrer, G.; Chen, G.; Chen, J.M.; Ciais, P.; Davis, K.J.; Desai, A.R.; et al. Terrestrial biosphere models need better representation of vegetation phenology: Results from the North American Carbon Program Site Synthesis. Glob. Change Biol. 2012, 18, 566–584. [Google Scholar] [CrossRef]
  11. Schaefer, K.; Schwalm, C.R.; Williams, C.; Arain, M.A.; Barr, A.; Chen, J.M.; Davis, K.J.; Dimitrov, D.; Hilton, T.W.; Hollinger, D.Y.; et al. A model-data comparison of gross primary productivity: Results from the North American Carbon Program site synthesis. J. Geophys. Res. Biogeosci. 2012, 117. [Google Scholar] [CrossRef]
  12. Papale, D.; Valentini, R. A new assessment of European forests carbon exchanges by eddy fluxes and artificial neural network spatialization. Glob. Change Biol. 2003, 9, 525–535. [Google Scholar] [CrossRef]
  13. Xiao, J.; Zhuang, Q.; Baldocchi, D.D.; Law, B.E.; Richardson, A.D.; Chen, J.; Oren, R.; Starr, G.; Noormets, A.; Ma, S.; et al. Estimation of net ecosystem carbon exchange for the conterminous United States by combining MODIS and AmeriFlux data. Agric. For. Meteorol. 2008, 148, 1827–1847. [Google Scholar] [CrossRef]
  14. Kang, Y.; Gaber, M.; Bassiouni, M.; Lu, X.; Keenan, T. CEDAR-GPP: Spatiotemporally upscaled estimates of gross primary productivity incorporating CO2 fertilization. Earth Syst. Sci. Data Discuss. 2023, 2023, 1–51. [Google Scholar]
  15. Jung, M.; Schwalm, C.; Migliavacca, M.; Walther, S.; Camps-Valls, G.; Koirala, S.; Anthoni, P.; Besnard, S.; Bodesheim, P.; Carvalhais, N.; et al. Scaling carbon fluxes from eddy covariance sites to globe: Synthesis and evaluation of the FLUXCOM approach. Biogeosciences 2020, 17, 1343–1365. [Google Scholar] [CrossRef]
  16. Battelle. National Science Foundation’s National Ecological Observatory Network (NEON). 2024. Available online: https://www.neonscience.org/ (accessed on 1 June 2023).
  17. Fer, I.; Kelly, R.; Moorcroft, P.R.; Richardson, A.D.; Cowdery, E.M.; Dietze, M.C. Linking big models to big data: Efficient ecosystem model calibration through Bayesian model emulation. Biogeosciences 2018, 15, 5801–5830. [Google Scholar] [CrossRef]
  18. Hemes, K.S.; Runkle, B.R.; Novick, K.A.; Baldocchi, D.D.; Field, C.B. An ecosystem-scale flux measurement strategy to assess natural climate solutions. Environ. Sci. Technol. 2021, 55, 3494–3504. [Google Scholar] [CrossRef]
  19. Hollinger, D.; Davidson, E.; Fraver, S.; Hughes, H.; Lee, J.; Richardson, A.; Savage, K.; Sihi, D.; Teets, A. Multi-decadal carbon cycle measurements indicate resistance to external drivers of change at the Howland forest AmeriFlux site. J. Geophys. Res. Biogeosciences 2021, 126, e2021JG006276. [Google Scholar] [CrossRef]
  20. Wofsy, S.C.; Harris, R.C. The North American Carbon Program 2002. Technical Report, The Global Carbon Project. 2002. Available online: https://www.globalcarbonproject.org/global/pdf/thenorthamericancprogram2002.pdf (accessed on 1 June 2023).
  21. Schimel, D.S.; House, J.I.; Hibbard, K.A.; Bousquet, P.; Ciais, P.; Peylin, P.; Braswell, B.H.; Apps, M.J.; Baker, D.; Bondeau, A.; et al. Recent patterns and mechanisms of carbon exchange by terrestrial ecosystems. Nature 2001, 414, 169–172. [Google Scholar] [CrossRef]
  22. Griscom, B.W.; Adams, J.; Ellis, P.W.; Houghton, R.A.; Lomax, G.; Miteva, D.A.; Schlesinger, W.H.; Shoch, D.; Siikamäki, J.V.; Smith, P.; et al. Natural climate solutions. Proc. Natl. Acad. Sci. USA 2017, 114, 11645–11650. [Google Scholar] [CrossRef]
  23. Fargione, J.E.; Bassett, S.; Boucher, T.; Bridgham, S.D.; Conant, R.T.; Cook-Patton, S.C.; Ellis, P.W.; Falcucci, A.; Fourqurean, J.W.; Gopalakrishna, T.; et al. Natural climate solutions for the United States. Sci. Adv. 2018, 4, eaat1869. [Google Scholar] [CrossRef]
  24. Bossio, D.; Cook-Patton, S.; Ellis, P.; Fargione, J.; Sanderman, J.; Smith, P.; Wood, S.; Zomer, R.; Von Unger, M.; Emmer, I.; et al. The role of soil carbon in natural climate solutions. Nat. Sustain. 2020, 3, 391–398. [Google Scholar] [CrossRef]
  25. Ellis, P.W.; Page, A.M.; Wood, S.; Fargione, J.; Masuda, Y.J.; Carrasco Denney, V.; Moore, C.; Kroeger, T.; Griscom, B.; Sanderman, J.; et al. The principles of natural climate solutions. Nat. Commun. 2024, 15, 547. [Google Scholar] [CrossRef] [PubMed]
  26. Lee, H.; Calvin, K.; Dasgupta, D.; Krinner, G.; Mukherji, A.; Thorne, P.; Trisos, C.; Romero, J.; Aldunce, P.; Barret, K.; et al. IPCC, 2023: Climate Change 2023: Synthesis Report, Summary for Policymakers. Contribution of Working Groups I, II and III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Core Writing Team, Lee, H., Romero, J., Eds.; Technical report; Intergovernmental Panel on Climate Change (IPCC): Geneva, Switzerland, 2023. [Google Scholar]
  27. AlOmar, M.K.; Hameed, M.M.; Al-Ansari, N.; Razali, S.F.M.; AlSaadi, M.A. Short-, medium-, and long-term prediction of carbon dioxide emissions using wavelet-enhanced extreme learning machine. Civ. Eng. J. 2023, 9, 815–834. [Google Scholar] [CrossRef]
  28. Hou, Y.; Liu, S. Predictive Modeling and Validation of Carbon Emissions from China’s Coastal Construction Industry: A BO-XGBoost Ensemble Approach. Sustainability 2024, 16, 4215. [Google Scholar] [CrossRef]
  29. Fang, D.; Zhang, X.; Yu, Q.; Jin, T.C.; Tian, L. A novel method for carbon dioxide emission forecasting based on improved Gaussian processes regression. J. Clean. Prod. 2018, 173, 143–150. [Google Scholar] [CrossRef]
  30. Mardani, A.; Liao, H.; Nilashi, M.; Alrasheedi, M.; Cavallaro, F. A multi-stage method to predict carbon dioxide emissions using dimensionality reduction, clustering, and machine learning techniques. J. Clean. Prod. 2020, 275, 122942. [Google Scholar] [CrossRef]
  31. Zhang, Y.; Fu, B. Impact of China’s establishment of ecological civilization pilot zones on carbon dioxide emissions. J. Environ. Manag. 2023, 325, 116652. [Google Scholar] [CrossRef]
  32. Baareh, A.K. Solving the Carbon Dioxide Emission Estimation Problem: An Artificial Neural Network Model. J. Softw. Eng. Appl. 2013, 6, 338–342. [Google Scholar] [CrossRef]
  33. Hamrani, A.; Akbarzadeh, A.; Madramootoo, C.A. Machine learning for predicting greenhouse gas emissions from agricultural soils. Sci. Total Environ. 2020, 741, 140338. [Google Scholar] [CrossRef]
  34. Durmanov, A.; Saidaxmedova, N.; Mamatkulov, M.; Rakhimova, K.; Askarov, N.; Khamrayeva, S.; Mukhtorov, A.; Khodjimukhamedova, S.; Madumarov, T.; Kurbanova, K. Sustainable growth of greenhouses: Investigating key enablers and impacts. Emerg. Sci. J. 2023, 7, 1674–1690. [Google Scholar] [CrossRef]
  35. Tramontana, G.; Jung, M.; Schwalm, C.R.; Ichii, K.; Camps-Valls, G.; Ráduly, B.; Reichstein, M.; Arain, M.A.; Cescatti, A.; Kiely, G.; et al. Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms. Biogeosciences 2016, 13, 4291–4313. [Google Scholar] [CrossRef]
  36. Dou, X.; Yang, Y.; Luo, J. Estimating forest carbon fluxes using machine learning techniques based on eddy covariance measurements. Sustainability 2018, 10, 203. [Google Scholar] [CrossRef]
  37. Vais, A.; Mikhaylov, P.; Popova, V.; Nepovinnykh, A.; Nemich, V.; Andronova, A.; Mamedova, S. Carbon sequestration dynamics in urban-adjacent forests: A 50-year analysis. Civ. Eng. J. 2023, 9, 2205–2220. [Google Scholar] [CrossRef]
  38. Zhao, J.; Lange, H.; Meissner, H. Estimating Carbon Sink Strength of Norway Spruce Forests Using Machine Learning. Forests 2022, 13, 1721. [Google Scholar] [CrossRef]
  39. Safaei-Farouji, M.; Thanh, H.V.; Dai, Z.; Mehbodniya, A.; Rahimi, M.; Ashraf, U.; Radwan, A.E. Exploring the power of machine learning to predict carbon dioxide trapping efficiency in saline aquifers for carbon geological storage project. J. Clean. Prod. 2022, 372, 133778. [Google Scholar] [CrossRef]
  40. Zhu, S.; Clement, R.; McCalmont, J.; Davies, C.A.; Hill, T. Stable gap-filling for longer eddy covariance data gaps: A globally validated machine-learning approach for carbon dioxide, water, and energy fluxes. Agric. For. Meteorol. 2022, 314, 108777. [Google Scholar] [CrossRef]
  41. Madan, T.; Sagar, S.; Virmani, D. Air Quality Prediction using Machine Learning Algorithms –A Review. In Proceedings of the 2020 2nd International Conference on Advances in Computing, Communication Control and Networking, Greater Noida, India, 18–19 December 2020; pp. 140–145. [Google Scholar]
  42. National Ecological Observatory Network (NEON). Bundled Data Products—Eddy Covariance (DP4.00200.001). 2024. Available online: https://data.neonscience.org/data-products/DP4.00200.001 (accessed on 1 June 2023).
  43. Richardson, A.D.; Hufkens, K.; Milliman, T.; Aubrecht, D.M.; Chen, M.; Gray, J.M.; Johnston, M.R.; Keenan, T.F.; Klosterman, S.T.; Kosmala, M.; et al. Tracking vegetation phenology across diverse North American biomes using PhenoCam imagery. Sci. Data 2018, 5, 1–24. [Google Scholar] [CrossRef]
  44. Seyednasrollah, B.; Young, A.M.; Hufkens, K.; Milliman, T.; Friedl, M.A.; Frolking, S.; Richardson, A.D. Tracking vegetation phenology across diverse biomes using Version 2.0 of the PhenoCam Dataset. Sci. Data 2019, 6, 222. [Google Scholar]
  45. Rodriguez, J.D.; Perez, A.; Lozano, J.A. Sensitivity analysis of k-fold cross validation in prediction error estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 569–575. [Google Scholar] [CrossRef]
  46. Nie, F.; Zhu, W.; Li, X. Decision Tree SVM: An extension of linear SVM for non-linear classification. Neurocomputing 2020, 401, 153–159. [Google Scholar] [CrossRef]
  47. Vanli, N.D.; Sayin, M.O.; Mohaghegh, M.; Ozkan, H.; Kozat, S.S. Nonlinear regression via incremental decision trees. Pattern Recognit. 2019, 86, 1–13. [Google Scholar] [CrossRef]
  48. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  49. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  50. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
  51. Mahabbati, A.; Beringer, J.; Leopold, M.; McHugh, I.; Cleverly, J.; Isaac, P.; Izady, A. A comparison of gap-filling algorithms for eddy covariance fluxes and their drivers. Geosci. Instrum. Methods Data Syst. 2021, 10, 123–140. [Google Scholar] [CrossRef]
  52. Stoy, P.C.; Dietze, M.C.; Richardson, A.D.; Vargas, R.; Barr, A.G.; Anderson, R.S.; Arain, M.A.; Baker, I.T.; Black, T.A.; Chen, J.M.; et al. Evaluating the agreement between measurements and models of net ecosystem exchange at different times and timescales using wavelet coherence: An example using data from the North American Carbon Program Site-Level Interim Synthesis. Biogeosciences 2013, 10, 6893–6909. [Google Scholar] [CrossRef]
  53. Richardson, A.D.; Aubinet, M.; Barr, A.G.; Hollinger, D.Y.; Ibrom, A.; Lasslop, G.; Reichstein, M. Uncertainty Quantification. In Eddy Covariance: A Practical Guide to Measurement and Data Analysis; Aubinet, M., Vesala, T., Papale, D., Eds.; Springer: Dordrecht, The Netherlands, 2012; pp. 173–209. [Google Scholar]
  54. Braswell, B.H.; Sacks, W.J.; Linder, E.; Schimel, D.S. Estimating diurnal to annual ecosystem parameters by synthesis of a carbon flux model with eddy covariance net ecosystem exchange observations. Glob. Change Biol. 2005, 11, 335–355. [Google Scholar] [CrossRef]
  55. Siqueira, M.; Katul, G.G.; Sampson, D.; Stoy, P.C.; Juang, J.Y.; McCarthy, H.R.; Oren, R. Multiscale model intercomparisons of CO2 and H2O exchange rates in a maturing southeastern US pine forest. Glob. Change Biol. 2006, 12, 1189–1207. [Google Scholar] [CrossRef]
  56. Ricciuto, D.M.; Davis, K.J.; Keller, K. A Bayesian calibration of a simple carbon cycle model: The role of observations in estimating and reducing uncertainty. Glob. Biogeochem. Cycles 2008, 22. [Google Scholar] [CrossRef]
  57. Lucas, B.; Pelletier, C.; Schmidt, D.; Webb, G.I.; Petitjean, F. Unsupervised Domain Adaptation Techniques for Classification of Satellite Image Time Series. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Virtual Event, 26 September–2 October 2020; pp. 1074–1077. [Google Scholar]
  58. Lucas, B.; Pelletier, C.; Inglada, J.; Schmidt, D.; Webb, G.; Petitjean, F. Exploring data quantity requirements for Domain Adaptation in the classification of satellite image time series. In Proceedings of the MultiTemp 2019, 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images, Shanghai, China, 5–7 August 2019; Bovolo, F., Liu, S., Eds.; IEEE, Institute of Electrical and Electronics Engineers: Piscataway Township, NJ, USA, 2019. [Google Scholar]
  59. Ou, Y.; Zheng, J.; Liang, Y.; Bao, Z. When green transportation backfires: High-speed rail’s impact on transport-sector carbon emissions from 315 Chinese cities. Sustain. Cities Soc. 2024, 114, 105770. [Google Scholar] [CrossRef]
  60. Ou, Y.; Bao, Z.; Ng, S.T.; Song, W.; Chen, K. Land-use carbon emissions and built environment characteristics: A city-level quantitative analysis in emerging economies. Land Use Policy 2024, 137, 107019. [Google Scholar] [CrossRef]
  61. Huang, C.; Xu, N. Quantifying urban expansion from 1985 to 2018 in large cities worldwide. Geocarto Int. 2022, 37, 18356–18371. [Google Scholar] [CrossRef]
  62. Friedlingstein, P.; O’Sullivan, M.; Jones, M.W.; Andrew, R.M.; Hauck, J.; Landschützer, P.; Le Quéré, C.; Li, H.; Luijkx, I.T.; Olsen, A.; et al. Global Carbon Budget 2024. Earth Syst. Sci. Data 2024, 2024, 1–133. [Google Scholar] [CrossRef]
  63. Friedlingstein, P.; O’Sullivan, M.; Jones, M.W.; Andrew, R.M.; Bakker, D.C.E.; Hauck, J.; Landschützer, P.; Le Quéré, C.; Luijkx, I.T.; Peters, G.P.; et al. Global Carbon Budget 2023. Earth Syst. Sci. Data 2023, 15, 5301–5369. [Google Scholar] [CrossRef]
Figure 1. A map of the NEON core terrestrial sites and their locations within the 19 ecological domains.
Figure 2. A visual explanation of the cross-validation techniques used in our experiments. (a) The data split into 10 random ‘folds’ with the model being trained on 9 folds and the tenth held out for testing. This process is repeated 10 times and the results are averaged. (b) The data stratified by site; in this case, the model is trained on an all-but-one site, and this is held out for testing. This is repeated until each site has been held out as the test site exactly once.
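The two schemes in the Figure 2 caption map directly onto scikit-learn's splitters; the sketch below illustrates them on synthetic stand-in data (the sites, drivers, and FCO2 proxy here are hypothetical, not the NEON records).

```python
# Sketch of the two cross-validation schemes in Figure 2, on synthetic data.
import numpy as np
from sklearn.model_selection import KFold, LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))           # 120 half-hourly records, 5 drivers
y = 2 * X[:, 0] + rng.normal(size=120)  # stand-in for FCO2
site = np.repeat(np.arange(4), 30)      # 4 sites, 30 records each

# (a) 10-fold CV: random folds, records from all sites mixed together.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
n_folds = sum(1 for _ in kf.split(X))   # 10 train/test splits

# (b) Leave-one-site-out: each site is held out for testing exactly once.
logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=site):
    assert len(set(site[test_idx])) == 1  # test fold is a single site
```

The key difference is that in (b) no record from the test site ever appears in training, which is why it is the harder, more realistic test of generalization to new sites.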
Figure 3. The average RMSE (μmol m−2 s−1) per domain for the leave-one-site-out experiments.
Figure 4. The twenty most important features of our XGBoost model.
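A ranking like Figure 4 comes from the fitted model's feature importances. This is a minimal sketch on synthetic data, with scikit-learn's `GradientBoostingRegressor` standing in for XGBoost; the feature names and the dataset (constructed so PPFD dominates) are illustrative assumptions, not the authors' pipeline.

```python
# Hedged sketch: ranking features by importance from a fitted boosting model.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
names = ["PPFD", "TAIR", "VPD", "NETRAD", "SWC_1_1_1"]
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=300)  # PPFD dominates by construction

model = GradientBoostingRegressor(random_state=2).fit(X, y)
ranked = sorted(zip(names, model.feature_importances_),
                key=lambda kv: -kv[1])
print(ranked[0][0])  # prints "PPFD", the dominant driver
```

With the real XGBoost library the equivalent attribute is also `feature_importances_` (or `get_booster().get_score()` for gain/cover breakdowns).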
Figure 5. Visualization of XGBoost L1SO RMSE remainder organized by domain number (shown as a prefix to the site code).
Figure 6. Time series and scatter plots of FCO2 prediction error for sites with EB primary vegetation type.
Figure 7. Comparison of FCO2 prediction error for sites with DB primary vegetation type. Site with poor model performance (CLBJ) is on the left and site with better model performance (STEI) on the right.
Figure 8. Comparison of FCO2 prediction error for sites with GR primary vegetation type. Site with poor model performance (DSNY) is on the left and site with better model performance (DCFS) on the right.
Figure 9. Comparison of sites with EN primary vegetation type. Site with poor model performance (WREF) is on the left and site with better model performance (DEJU) on the right.
Table 1. Environmental drivers (feature variables) used as input to our machine learning models to predict carbon dioxide flux.
| Variable | Description (Units) | Source |
|---|---|---|
| DOY | Day of Year (as percentage of a year) | AmeriFlux/NEON |
| HOUR | Hour of Day (as percentage of a day) | AmeriFlux/NEON |
| TS_1_1_1 | Soil Temperature Depth 1 (degrees C) | AmeriFlux/NEON |
| TS_1_2_1 | Soil Temperature Depth 2 (degrees C) | AmeriFlux/NEON |
| PPFD | Photosynthetic Photon Flux Density (μmol photon m−2 s−1) | AmeriFlux/NEON |
| TAIR | Air Temperature (degrees C) | AmeriFlux/NEON |
| VPD | Vapor Pressure Deficit (hPa) | AmeriFlux/NEON |
| SWC_1_1_1 | Soil Water Content (as percentage of volume) | AmeriFlux/NEON |
| PPFD_OUT | Photosynthetic Photon Flux Density, Outgoing (μmol photon m−2 s−1) | AmeriFlux/NEON |
| PPFD_BC_IN_1_1_1 | Photosynthetic Photon Flux Density, Below Canopy Incoming (μmol photon m−2 s−1) | AmeriFlux/NEON |
| RH | Relative Humidity (percentage) | AmeriFlux/NEON |
| NETRAD | Net Radiation (W m−2) | AmeriFlux/NEON |
| USTAR | Friction Velocity (m s−1) | AmeriFlux/NEON |
| GCC_50 | Green Chromatic Coordinate, median (dimensionless) | Phenocam |
| RCC_50 | Red Chromatic Coordinate, median (dimensionless) | Phenocam |
| MAT_DAYMET | Mean Annual Temperature (degrees C) | DAYMET |
| MAP_DAYMET | Mean Annual Precipitation (mm) | DAYMET |
| PVEG | Primary Vegetation Type (categorical) | Phenocam |
| SVEG | Secondary Vegetation Type (categorical) | Phenocam |
| LW_OUT | Longwave Radiation, Outgoing (W m−2) | AmeriFlux/NEON |
| DAILY PRECIPITATION | Daily Precipitation (mm) | AmeriFlux/NEON |
| PRCP1WEEK | Cumulative Precipitation, 1 Week (mm) | AmeriFlux/NEON |
| PRCP2WEEK | Cumulative Precipitation, 2 Week (mm) | AmeriFlux/NEON |
| NDVI | Normalized Difference Vegetation Index (dimensionless) | MODIS |
| EVI | Enhanced Vegetation Index (dimensionless) | MODIS |
| LAT | Latitude (decimal degrees) | Phenocam |
| LON | Longitude (decimal degrees) | Phenocam |
| ELEV | Elevation (meters) | Phenocam |
| DOMAIN | NEON Field Site Domain (categorical) | Phenocam |
| organic_C | Total Organic Carbon Stock in Soil Profile (g C m−2) | AmeriFlux/NEON |
| total_N | Total Nitrogen Stock in Soil Profile (g C m−2) | AmeriFlux/NEON |
| O_thickness | Total Thickness of Organic Horizon (cm) | AmeriFlux/NEON |
| A_pH | pH of A Horizon (dimensionless) | AmeriFlux/NEON |
| A_sand | Texture of A Horizon (% Sand) | AmeriFlux/NEON |
| A_silt | Texture of A Horizon (% Silt) | AmeriFlux/NEON |
| A_clay | Texture of A Horizon (% Clay) | AmeriFlux/NEON |
| A_BD | Bulk Density of A Horizon (g cm−3) | AmeriFlux/NEON |
Table 2. Comparison of the RMSE and R2 in predicting FCO2 using seven machine learning models in a 10-fold cross-validation experimental setting (values shown are the average values across the 10 validation folds).
| | Linear Reg | Stepwise | Decision Tree | Random Forest | XGB | NN 1-Layer | NN Deeper |
|---|---|---|---|---|---|---|---|
| RMSE | 3.49 | 3.58 | 2.39 | 2.26 | 1.81 | 2.06 | 1.91 |
| R2 | 0.48 | 0.46 | 0.76 | 0.77 | 0.86 | 0.82 | 0.85 |
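The Table 2 evaluation reduces to averaging fold-wise RMSE and R2 over a 10-fold split. The sketch below shows the pattern on synthetic data, with scikit-learn's `GradientBoostingRegressor` standing in for the tuned XGBoost model; data, model settings, and seeds are illustrative assumptions, not the authors' pipeline.

```python
# Sketch: average RMSE and R2 across 10 CV folds, as in Table 2.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)  # toy target

rmses, r2s = [], []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=1).split(X):
    model = GradientBoostingRegressor(random_state=1)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    rmses.append(mean_squared_error(y[test_idx], pred) ** 0.5)
    r2s.append(r2_score(y[test_idx], pred))

print(f"RMSE={np.mean(rmses):.2f}, R2={np.mean(r2s):.2f}")
```

The leave-one-site-out results in Tables 3 and 4 follow the same pattern, except that the folds are defined by site rather than at random.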
Table 3. A comparison of the RMSE (μmol m−2 s−1) in predicting FCO2 using seven machine learning models in a stratified leave-one-site-out cross-validation experimental setting.
| Test Set | Site Code | Site Name | Primary Veg Type | Linear Reg | Stepwise | Decision Tree | Random Forest | XGB | NN 1-Layer | NN Deeper |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PR-xGU | Guanica Forest (GUAN) | EB | 4.83 | 4.47 | 5.83 | 5.32 | 3.49 | 5.95 | 6.48 |
| 2 | PR-xLA | Lajas Experimental Station (LAJA) | EB | 7.52 | 6.99 | 7.60 | 6.68 | 6.22 | 6.02 | 6.60 |
| 3 | US-xAB | Abby Road (ABBY) | EN | 7.25 | 4.45 | 4.72 | 3.86 | 3.43 | 3.55 | 3.66 |
| 4 | US-xBA | Barrow Environmental Observatory (BARR) | TN | 135.35 | 1.30 | 1.51 | 1.49 | 0.86 | 2.91 | 0.89 |
| 5 | US-xBL | Blandy Experimental Farm (BLAN) | DB | 4.10 | 3.96 | 2.77 | 2.69 | 2.62 | 2.89 | 2.98 |
| 6 | US-xBN | Caribou Creek—Poker Flats Watershed (BONA) | EN | 14.61 | 2.41 | 2.12 | 2.01 | 1.93 | 2.70 | 1.92 |
| 7 | US-xBR | Bartlett Experimental Forest (BART) | DB | 5.21 | 4.41 | 3.33 | 3.06 | 2.77 | 3.13 | 3.06 |
| 8 | US-xCL | LBJ National Grassland (CLBJ) | DB | 5.19 | 4.17 | 4.38 | 4.16 | 3.88 | 4.11 | 3.31 |
| 9 | US-xCP | Central Plains Experimental Range (CPER) | GR | 4.24 | 2.47 | 1.38 | 1.29 | 1.22 | 1.60 | 1.48 |
| 10 | US-xDC | Dakota Coteau Field School (DCFS) | GR | 20.35 | 2.70 | 1.79 | 1.70 | 1.61 | 1.64 | 1.74 |
| 11 | US-xDJ | Delta Junction (DEJU) | EN | 5.52 | 2.28 | 2.05 | 1.64 | 1.44 | 1.56 | 1.44 |
| 12 | US-xDL | Dead Lake (DELA) | DB | 9.86 | 5.29 | 4.36 | 4.21 | 3.84 | 4.23 | 4.26 |
| 13 | US-xDS | Disney Wilderness Preserve (DSNY) | GR | 10.21 | 3.03 | 3.64 | 3.25 | 3.33 | 2.67 | 3.35 |
| 14 | US-xGR | Great Smoky Mountains National Park, Twin Creeks (GRSM) | DB | 6.51 | 6.06 | 4.21 | 3.99 | 3.87 | 4.12 | 3.94 |
| 15 | US-xHA | Harvard Forest (HARV) | DB | 5.24 | 4.50 | 3.05 | 2.91 | 2.60 | 2.73 | 2.92 |
| 16 | US-xHE | Healy (HEAL) | TN | 5.03 | 1.72 | 2.00 | 1.65 | 1.15 | 1.77 | 1.17 |
| 17 | US-xJE | Jones Ecological Research Center (JERC) | DB | 6.07 | 4.37 | 3.75 | 3.46 | 3.19 | 3.43 | 3.41 |
| 18 | US-xJR | Jornada LTER (JORN) | GR | 2.56 | 1.79 | 1.25 | 1.23 | 1.17 | 1.76 | 1.26 |
| 19 | US-xKA | Konza Prairie Biological Station - Relocatable (KONA) | AG | 6.57 | 3.64 | 3.02 | 2.95 | 2.61 | 3.05 | 3.56 |
| 20 | US-xKZ | Konza Prairie Biological Station (KONZ) | GR | 6.88 | 3.57 | 2.60 | 2.23 | 2.21 | 2.06 | 2.16 |
| 21 | US-xLE | Lenoir Landing (LENO) | DB | 6.83 | 5.27 | 4.92 | 4.53 | 4.32 | 4.25 | 4.19 |
| 22 | US-xMB | Moab (MOAB) | GR | 8.63 | 1.86 | 0.73 | 0.71 | 0.68 | 1.54 | 0.68 |
| 23 | US-xNG | Northern Great Plains Research Laboratory (NOGP) | GR | 5.07 | 2.29 | 1.67 | 1.59 | 1.46 | 1.55 | 1.96 |
| 24 | US-xNQ | Onaqui-Ault (ONAQ) | SH | 4.01 | 1.73 | 1.17 | 1.11 | 1.05 | 1.90 | 1.21 |
| 25 | US-xNW | Niwot Ridge Mountain Research Station (NIWO) | TN | 9.63 | 1.46 | 0.85 | 0.80 | 0.74 | 1.86 | 1.76 |
| 26 | US-xRM | Rocky Mountain National Park, CASTNET (RMNP) | EN | 8.49 | 3.18 | 2.70 | 2.31 | 1.92 | 2.45 | 1.94 |
| 27 | US-xRN | Oak Ridge National Lab (ORNL) | DB | 5.75 | 5.11 | 4.43 | 4.22 | 3.68 | 3.92 | 3.61 |
| 28 | US-xSB | Ordway-Swisher Biological Station (OSBS) | EN | 7.77 | 3.40 | 3.06 | 2.78 | 2.63 | 3.17 | 3.08 |
| 29 | US-xSC | Smithsonian Conservation Biology Institute (SCBI) | DB | 4.53 | 4.11 | 3.36 | 3.00 | 2.86 | 3.12 | 2.98 |
| 30 | US-xSE | Smithsonian Environmental Research Center (SERC) | DB | 6.79 | 4.62 | 3.40 | 3.21 | 3.08 | 3.35 | 3.32 |
| 31 | US-xSJ | San Joaquin Experimental Range (SJER) | EN | 5.13 | 4.23 | 3.23 | 3.11 | 3.02 | 3.23 | 3.81 |
| 32 | US-xSL | North Sterling, CO (STER) | AG | 6.10 | 2.40 | 2.00 | 1.93 | 1.83 | 1.90 | 2.08 |
| 33 | US-xSP | Soaproot Saddle (SOAP) | EN | 3.57 | 3.58 | 4.16 | 3.86 | 2.50 | 2.78 | 2.67 |
| 34 | US-xSR | Santa Rita Experimental Range (SRER) | SH | 3.22 | 2.19 | 4.23 | 3.63 | 1.18 | 2.42 | 1.12 |
| 35 | US-xST | Steigerwaldt Land Services (STEI) | DB | 3.96 | 4.06 | 2.44 | 2.10 | 1.91 | 2.34 | 1.78 |
| 36 | US-xTA | Talladega National Forest (TALL) | EN | 5.36 | 5.16 | 4.53 | 4.33 | 3.34 | 3.77 | 3.98 |
| 37 | US-xTE | Lower Teakettle (TEAK) | EN | 6.11 | 3.07 | 2.99 | 2.93 | 2.53 | 2.48 | 2.95 |
| 38 | US-xTL | Toolik (TOOL) | TN | 134.54 | 1.44 | 1.24 | 0.79 | 0.66 | 2.12 | 0.96 |
| 39 | US-xTR | Treehaven (TREE) | DB | 5.13 | 3.89 | 2.41 | 2.35 | 2.12 | 2.61 | 2.21 |
| 40 | US-xUK | The University of Kansas Field Station (UKFS) | DB | 5.16 | 4.12 | 3.20 | 3.06 | 2.92 | 3.56 | 2.92 |
| 41 | US-xUN | University of Notre Dame Environmental Research Center (UNDE) | DB | 3.79 | 3.81 | 2.51 | 2.47 | 2.11 | 2.53 | 1.92 |
| 42 | US-xWD | Woodworth (WOOD) | GR | 5.16 | 2.21 | 1.77 | 1.61 | 1.49 | 1.52 | 1.70 |
| 43 | US-xWR | Wind River Experimental Forest (WREF) | EN | 7.53 | 5.31 | 5.89 | 5.82 | 4.67 | 4.92 | 4.68 |
| 44 | US-xYE | Yellowstone Northern Range (Frog Rock) (YELL) | EN | 5.05 | 2.49 | 2.10 | 2.05 | 1.61 | 1.71 | 1.74 |
| | | AVERAGE | | 12.28 | 3.51 | 3.05 | 2.82 | 2.45 | 2.88 | 2.70 |
Table 4. A comparison of the R2 in predicting FCO2 using seven machine learning models in a stratified leave-one-site-out cross-validation experimental setting.
| Test Set | Site Code | Site Name | Primary Veg Type | Linear Reg | Stepwise | Decision Tree | Random Forest | XGBoost | NN (1-Layer) | NN (Deep) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PR-xGU | Guanica Forest (GUAN) | EB | 0.07 | 0.21 | −0.35 | −0.12 | 0.52 | −0.40 | −0.67 |
| 2 | PR-xLA | Lajas Experimental Station (LAJA) | EB | 0.31 | 0.40 | 0.29 | 0.45 | 0.53 | 0.56 | 0.47 |
| 3 | US-xAB | Abby Road (ABBY) | EN | −0.37 | 0.48 | 0.42 | 0.61 | 0.69 | 0.67 | 0.65 |
| 4 | US-xBA | Barrow Environmental Observatory (BARR) | TN | −16,320.00 | −0.51 | −1.03 | −0.97 | 0.34 | −6.54 | 0.29 |
| 5 | US-xBL | Blandy Experimental Farm (BLAN) | DB | 0.54 | 0.57 | 0.79 | 0.80 | 0.81 | 0.77 | 0.76 |
| 6 | US-xBN | Caribou Creek—Poker Flats Watershed (BONA) | EN | −33.28 | 0.07 | 0.28 | 0.35 | 0.40 | −0.17 | 0.41 |
| 7 | US-xBR | Bartlett Experimental Forest (BART) | DB | 0.34 | 0.53 | 0.73 | 0.77 | 0.81 | 0.76 | 0.77 |
| 8 | US-xCL | LBJ National Grassland (CLBJ) | DB | 0.35 | 0.58 | 0.54 | 0.58 | 0.64 | 0.59 | 0.74 |
| 9 | US-xCP | Central Plains Experimental Range (CPER) | GR | −4.44 | −0.85 | 0.42 | 0.50 | 0.55 | 0.22 | 0.33 |
| 10 | US-xDC | Dakota Coteau Field School (DCFS) | GR | −28.15 | 0.49 | 0.78 | 0.80 | 0.82 | 0.81 | 0.79 |
| 11 | US-xDJ | Delta Junction (DEJU) | EN | −3.89 | 0.17 | 0.32 | 0.57 | 0.67 | 0.61 | 0.67 |
| 12 | US-xDL | Dead Lake (DELA) | DB | −0.89 | 0.46 | 0.63 | 0.66 | 0.71 | 0.65 | 0.65 |
| 13 | US-xDS | Disney Wilderness Preserve (DSNY) | GR | −3.07 | 0.64 | 0.48 | 0.59 | 0.57 | 0.72 | 0.56 |
| 14 | US-xGR | Great Smoky Mountains National Park, Twin Creeks (GRSM) | DB | 0.39 | 0.48 | 0.75 | 0.77 | 0.79 | 0.76 | 0.78 |
| 15 | US-xHA | Harvard Forest (HARV) | DB | 0.31 | 0.49 | 0.77 | 0.79 | 0.83 | 0.81 | 0.79 |
| 16 | US-xHE | Healy (HEAL) | TN | −4.45 | 0.36 | 0.14 | 0.41 | 0.72 | 0.33 | 0.71 |
| 17 | US-xJE | Jones Ecological Research Center (JERC) | DB | 0.19 | 0.58 | 0.69 | 0.74 | 0.78 | 0.74 | 0.75 |
| 18 | US-xJR | Jornada LTER (JORN) | GR | −2.75 | −0.85 | 0.11 | 0.13 | 0.21 | −0.77 | 0.09 |
| 19 | US-xKA | Konza Prairie Biological Station - Relocatable (KONA) | AG | −1.33 | 0.28 | 0.51 | 0.53 | 0.63 | 0.50 | 0.31 |
| 20 | US-xKZ | Konza Prairie Biological Station (KONZ) | GR | −0.85 | 0.50 | 0.74 | 0.81 | 0.81 | 0.83 | 0.82 |
| 21 | US-xLE | Lenoir Landing (LENO) | DB | 0.19 | 0.52 | 0.58 | 0.64 | 0.67 | 0.69 | 0.69 |
| 22 | US-xMB | Moab (MOAB) | GR | −145.46 | −5.79 | −0.05 | 0.01 | 0.09 | −3.66 | 0.09 |
| 23 | US-xNG | Northern Great Plains Research Laboratory (NOGP) | GR | −2.17 | 0.36 | 0.66 | 0.69 | 0.74 | 0.71 | 0.52 |
| 24 | US-xNQ | Onaqui-Ault (ONAQ) | SH | −7.30 | −0.54 | 0.29 | 0.37 | 0.43 | −0.87 | 0.25 |
| 25 | US-xNW | Niwot Ridge Mountain Research Station (NIWO) | TN | −120.13 | −1.77 | 0.05 | 0.17 | 0.28 | −3.53 | −3.04 |
| 26 | US-xRM | Rocky Mountain National Park, CASTNET (RMNP) | EN | −5.45 | 0.09 | 0.35 | 0.52 | 0.67 | 0.46 | 0.66 |
| 27 | US-xRN | Oak Ridge National Lab (ORNL) | DB | 0.25 | 0.41 | 0.56 | 0.60 | 0.69 | 0.65 | 0.71 |
| 28 | US-xSB | Ordway-Swisher Biological Station (OSBS) | EN | −1.39 | 0.54 | 0.63 | 0.69 | 0.73 | 0.60 | 0.62 |
| 29 | US-xSC | Smithsonian Conservation Biology Institute (SCBI) | DB | 0.42 | 0.52 | 0.68 | 0.74 | 0.77 | 0.72 | 0.75 |
| 30 | US-xSE | Smithsonian Environmental Research Center (SERC) | DB | −0.01 | 0.53 | 0.75 | 0.77 | 0.79 | 0.75 | 0.76 |
| 31 | US-xSJ | San Joaquin Experimental Range (SJER) | EN | −0.51 | −0.03 | 0.40 | 0.44 | 0.47 | 0.40 | 0.17 |
| 32 | US-xSL | North Sterling, CO (STER) | AG | −4.83 | 0.10 | 0.38 | 0.42 | 0.47 | 0.44 | 0.32 |
| 33 | US-xSP | Soaproot Saddle (SOAP) | EN | −0.98 | −0.98 | −1.68 | −1.31 | 0.03 | −0.19 | −0.10 |
| 34 | US-xSR | Santa Rita Experimental Range (SRER) | SH | −7.73 | −3.04 | −14.04 | −10.11 | −0.18 | −3.93 | −0.06 |
| 35 | US-xST | Steigerwaldt Land Services (STEI) | DB | 0.53 | 0.50 | 0.82 | 0.87 | 0.89 | 0.83 | 0.90 |
| 36 | US-xTA | Talladega National Forest (TALL) | EN | 0.39 | 0.44 | 0.57 | 0.60 | 0.76 | 0.70 | 0.66 |
| 37 | US-xTE | Lower Teakettle (TEAK) | EN | −2.27 | 0.17 | 0.22 | 0.25 | 0.44 | 0.46 | 0.24 |
| 38 | US-xTL | Toolik (TOOL) | TN | −12,181.30 | −0.40 | −0.03 | 0.58 | 0.71 | −2.01 | 0.38 |
| 39 | US-xTR | Treehaven (TREE) | DB | 0.24 | 0.57 | 0.83 | 0.84 | 0.87 | 0.80 | 0.86 |
| 40 | US-xUK | The University of Kansas Field Station (UKFS) | DB | 0.24 | 0.52 | 0.71 | 0.73 | 0.76 | 0.64 | 0.76 |
| 41 | US-xUN | University of Notre Dame Environmental Research Center (UNDE) | DB | 0.56 | 0.55 | 0.81 | 0.81 | 0.86 | 0.80 | 0.89 |
| 42 | US-xWD | Woodworth (WOOD) | GR | −2.01 | 0.45 | 0.65 | 0.71 | 0.75 | 0.74 | 0.67 |
| 43 | US-xWR | Wind River Experimental Forest (WREF) | EN | −0.65 | 0.18 | −0.01 | 0.02 | 0.37 | 0.30 | 0.36 |
| 44 | US-xYE | Yellowstone Northern Range (Frog Rock) (YELL) | EN | −2.28 | 0.20 | 0.43 | 0.46 | 0.67 | 0.62 | 0.61 |
| | | AVERAGE | | −656.42 | −0.02 | 0.06 | 0.23 | 0.60 | −0.01 | 0.44 |
Table 5. Table of mean bias and correlation coefficient (r) using L1SO-predicted annual carbon sums and 10-fold projections of annual carbon sums.
| Primary Vegetation | Site | Mean Bias | r |
|---|---|---|---|
| AG | US-xSL | −15.80 | 0.58 |
| | US-xKA | 4.73 | 0.22 |
| | AVERAGE | −5.53 ± 10.27 | 0.40 ± 0.18 |
| DB | US-xSC | −60.82 | |
| | US-xLE | 134.12 | |
| | US-xJE | 76.42 | 0.68 |
| | US-xHA | −46.71 | 0.32 |
| | US-xGR | 20.46 | 0.05 |
| | US-xRN | −67.14 | 0.82 |
| | US-xDL | 55.57 | −0.56 |
| | US-xST | 21.37 | 0.75 |
| | US-xSE | 17.43 | −0.36 |
| | US-xCL | 170.94 | −0.78 |
| | US-xBR | 114.38 | 0.85 |
| | US-xTR | −3.15 | 0.96 |
| | US-xBL | 135.44 | 0.019 |
| | US-xUK | 1.47 | 0.98 |
| | US-xUN | 4.95 | −0.44 |
| | AVERAGE | 38.32 ± 71.72 | 0.25 ± 0.64 |
| EB | PR-xLA | 140.70 | |
| | PR-xGU | 31.93 | |
| | AVERAGE | 86.32 ± 54.38 | |
| EN | US-xSB | 121.02 | −0.20 |
| | US-xSP | −44.92 | 0.48 |
| | US-xTA | −68.90 | 0.32 |
| | US-xTE | 47.40 | −0.35 |
| | US-xSJ | −15.44 | −0.65 |
| | US-xRM | −48.55 | −0.67 |
| | US-xYE | 4.31 | 0.57 |
| | US-xDJ | 20.68 | 0.24 |
| | US-xWR | −10.31 | −0.92 |
| | US-xAB | 52.29 | 0.09 |
| | US-xBN | −18.25 | 0.55 |
| | AVERAGE | 3.57 ± 51.99 | −0.05 ± 0.51 |
| GR | US-xWD | −5.97 | 0.62 |
| | US-xCP | −9.58 | 0.52 |
| | US-xDC | 21.99 | 0.88 |
| | US-xMB | 17.88 | 0.98 |
| | US-xDS | 230.26 | −0.90 |
| | US-xJR | 28.82 | 0.63 |
| | US-xKZ | 19.34 | −0.62 |
| | US-xNG | 34.74 | 0.85 |
| | AVERAGE | 42.18 ± 72.57 | 0.37 ± 0.67 |
| SH | US-xSR | −61.37 | 0.99 |
| | US-xNQ | 63.50 | 0.99 |
| | AVERAGE | 1.07 ± 62.44 | 0.99 ± 0.01 |
| TN | US-xNW | −28.12 | 0.95 |
| | US-xHE | −12.39 | 0.76 |
| | US-xTL | −22.12 | 0.54 |
| | US-xBA | −29.72 | 0.81 |
| | AVERAGE | −23.09 ± 6.80 | 0.77 ± 0.15 |
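The Table 5 statistics reduce to the mean of predicted-minus-observed annual carbon sums (bias) and the Pearson correlation between the two series. A toy sketch, with all numbers invented purely for illustration:

```python
# Sketch: mean bias and Pearson r over annual carbon sums (toy values).
import numpy as np

obs  = np.array([-120.0, -80.0, 15.0, 60.0])  # observed annual sums (g C m-2 y-1)
pred = np.array([-110.0, -95.0, 30.0, 55.0])  # model-predicted annual sums

mean_bias = float(np.mean(pred - obs))        # positive = model overestimates
r = float(np.corrcoef(obs, pred)[0, 1])       # Pearson correlation coefficient
print(round(mean_bias, 2), round(r, 2))       # prints: 1.25 0.99
```

Note that a near-zero mean bias can mask large compensating errors, which is why Table 5 reports the spread (±) alongside the mean for each vegetation class.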
Share and Cite

Uyekawa, J.; Leland, J.; Bergl, D.; Liu, Y.; Richardson, A.D.; Lucas, B. Machine Learning-Based Prediction of Ecosystem-Scale CO2 Flux Measurements. Land 2025, 14, 124. https://doi.org/10.3390/land14010124