Open Access Feature Paper Article

Peer-Review Record

Machine Learning-Based Prediction of Ecosystem-Scale CO₂ Flux Measurements

Land 2025, 14(1), 124; https://doi.org/10.3390/land14010124

by Jeffrey Uyekawa ¹, John Leland ¹, Darby Bergl ²,³, Yujie Liu ²,⁴, Andrew D. Richardson ²,⁴,* and Benjamin Lucas ¹,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 13 December 2024 / Revised: 31 December 2024 / Accepted: 6 January 2025 / Published: 9 January 2025

(This article belongs to the Section Landscape Ecology)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Title: Machine learning-based prediction of ecosystem-scale CO2 flux measurements.

ARTICLE CENTRAL POINT

This manuscript compares seven machine learning algorithms for predicting CO₂ flux measurements, using several (up to 35) environmental drivers and site-specific variables as predictors. The seven machine learning algorithms have been used to predict FCO₂ measurements collected every 30 minutes between January 1, 2016 and June 30, 2022. Data sets collected at 44 sites located across the U.S. and Puerto Rico have been analyzed. These 44 sites are thought to be representative of 19 out of 20 biomes.

Both the experimental data sets and the computational work are very impressive compared with other studies based on machine learning predictions. Estimates of CO₂ fluxes and reliable calculation of CO₂ are increasingly important in the context of studies related to climate change.

OBJECTIVES

The objectives of this work are aligned with this aim and have been formulated as follows:

- Compare various common machine learning methods for predicting tower-based FCO₂

- Shed light on a generalized machine learning-based model that can predict FCO₂ below a reasonable threshold based on tower-based measurements.

- Create an open source gap-filled FCO₂ dataset covering 44 unique sites for free use in the climate science community.

- Establish an open source code repository for reproducibility and wider implementation.

NOVELTY

This study is also novel, as it focuses on predicting CO₂ fluxes, whereas most existing work using machine learning to predict CO₂ is devoted to estimating the atmospheric CO₂ concentration.

GENERAL COMMENTS

The manuscript is well organized and it is very well written.

The introduction mainly addresses FCO₂ and presents the network of eddy covariance flux towers of the AmeriFlux Network. Then, the objectives are clearly stated.

Next, a background information section describes the AmeriFlux network more thoroughly, including information about data quality and consistency and the theory of Natural Climate Solutions.

A section is devoted to literature review related to previous work in machine learning for prediction of CO₂. This is very pertinent.

The methodology is robust and the description is adequate. The section of Methods includes subsections about data sets, experimental design and the seven machine learning models used.

The results are clearly presented. They indicate that eXtreme Gradient Boosting (XGBoost) consistently produced the most accurate predictions. Again, the results focus on predicting CO₂ fluxes and not CO₂ concentrations.

The Discussion section is very long, as it thoroughly compares the models and assesses the reliability of the results obtained under the various conditions studied. Please consider moving part of the content of the Discussion section to the Results section.

Conclusions are in accordance with the main results and they are linked with the objectives stated in the Introduction section.

The figures and tables are necessary. They contain important and impressive information and are well discussed in the text.

All the references are relevant and about one half of the references are from the last five years.

In summary, this is an interesting manuscript that brings significant and novel scientific contributions, and it can be accepted for publication in this journal with minor corrections.

MINOR SUGGESTIONS

1) Please consider enhancing the Introduction section by adding the basics of machine learning. This would be very helpful, in spite of the fact that the manuscript contains a section devoted to a literature review of machine learning and climate change.

2) Table 1. Please add units to the listed variables, where appropriate. For example, soil and air temperature, etc.

3) Table 4 does not belong in the Conclusion section. Please move this table above.

4) References should be in the format requested by the Land Journal.

Author Response

Comment 1: Please consider enhancing the Introduction section by adding the basics of machine learning. This would be very helpful, in spite of the fact that the manuscript contains a section devoted to a literature review of machine learning and climate change.

Response: We thank the reviewer for the suggestion and, in response, have added a machine learning paragraph to the introduction.

Comment 2: Table 1. Please add units to the listed variables, where appropriate. For example, soil and air temperature, etc.

Response: We have added the units to Table 1.

Comment 3: Table 4 does not belong in the Conclusion section. Please move this table above.

Response: We have moved Table 5 to be above the conclusion (Table 4 is referenced in section 5.2 and located on page 11, so we have assumed the reviewer meant Table 5).

Comment 4: References should be in the format requested by the Land Journal.

Response: This has been corrected.

Reviewer 2 Report

Comments and Suggestions for Authors

Dear all,

After reading the article entitled "Machine learning-based prediction of ecosystem-scale CO2 flux measurements", I have the following comments:

1- The analyzed article discusses the use of machine learning to predict ecosystem-scale CO2 fluxes, highlighting the XGBoost method as the most effective for filling data gaps in network measurements such as AmeriFlux. The article addresses a major problem: the inconsistency and interruption in CO2 flux measurements at AmeriFlux towers. Creating models that fill data gaps is essential for carbon accounting and nature-based solutions. The comparative analysis of seven machine learning algorithms enriches the field, demonstrating the superiority of XGBoost in terms of RMSE and R².

2- However, there is insufficient generalization in distinct ecosystems: the models have shown lower performance in ecologically unique regions, such as the Pacific Northwest and Puerto Rico. This limits the global applicability of the method.

3- I believe there is a systematic bias: the analysis indicates that the models tend to underestimate larger fluxes, reflecting a conservative bias in the predictions.
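
A conservative bias of the kind described in point 3 can be checked with two simple diagnostics, sketched below. The flux values are made up for illustration and are not taken from the manuscript, and the function names and the magnitude threshold are hypothetical. A regression slope of predicted on observed values below 1 suggests predictions are compressed toward the mean, and a negative mean magnitude error in the large-flux bin suggests that large fluxes are underestimated.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def mean_magnitude_error(obs, pred, threshold):
    """Mean of |pred| - |obs| for small and large observed magnitudes."""
    small = [abs(p) - abs(o) for o, p in zip(obs, pred) if abs(o) < threshold]
    large = [abs(p) - abs(o) for o, p in zip(obs, pred) if abs(o) >= threshold]
    return sum(small) / len(small), sum(large) / len(large)

# Made-up fluxes (negative = uptake, positive = release), illustration only
observed = [-20.0, -10.0, -2.0, 1.0, 3.0, 8.0]
predicted = [-15.0, -8.5, -2.1, 1.1, 2.9, 6.5]

slope = ols_slope(observed, predicted)
small_err, large_err = mean_magnitude_error(observed, predicted, threshold=5.0)
print(f"slope={slope:.2f}, small-bin error={small_err:.2f}, large-bin error={large_err:.2f}")
```

With these illustrative numbers the small fluxes are predicted almost exactly, while the large ones are pulled toward zero, which is the pattern the reviewer describes.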

My Suggestions for Improvement:

1- Increase Data Diversity: Include more sites with unique ecological characteristics to improve model generalization.

2- Explore Hybrids: Combine machine learning with process-based models to capture causal relationships and complex ecological patterns.

3- Focus on Interannual Variability: Investigate how models can predict not only annual averages, but also deviations due to extreme weather events.

These actions may generate additional figures and analyses, but will greatly increase the quality of the research.

Author Response

Comment 1: Increase Data Diversity: Include more sites with unique ecological characteristics to improve model generalization.

Response: We thank the reviewer for the suggestion and agree that adding data from more sites with unique characteristics would improve generalization. However, we have included all sites in the NEON network in this work. While AmeriFlux sites (and FLUXNET sites more generally) could provide additional data, their measurements are not standardized, so we cannot be assured of their consistency. Additionally, the ancillary datasets we have used as predictors, such as soil carbon and other ecological site characteristics, would not be readily available for those sites, as they are provided through the NEON network.

We have included a paragraph in the conclusion suggesting that future work might be able to improve model performance in poorly represented/unique ecosystems such as Washington and Puerto Rico.

Comment 2: Explore Hybrids: Combine machine learning with process-based models to capture causal relationships and complex ecological patterns.

Response: We agree with the reviewer that this is a great idea and we would like to pursue this in future work. At present, this is far beyond the scope of this analysis.

Comment 3: Focus on Interannual Variability: Investigate how models can predict not only annual averages, but also deviations due to extreme weather events.

Response: We agree with the reviewer on this and note that our manuscript already addresses this in the paragraph starting on line 532. In summary, we agree that our model is not perfect but it is as good as, if not better than, current process-based models!

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors,

The manuscript “Machine learning-based prediction of ecosystem-scale CO2 flux measurements” has been reviewed, and my comments are as follows.

1 In the abstract, the advantages of the machine learning methods mentioned in the title need to be emphasized.

2 Regarding the performance assessment, this study used RMSE and R². I think an indicator of relative error is also recommended for accuracy evaluation.

3 For Figure 6, I think labels indicating the order of the sub-figures should be added to improve readability. In addition, for the fifth and sixth sub-figures, some indicators, such as RMSE and R², should also be added, as well as for Figure 7 and other similar figures.

4 I think the XGBoost Feature Importance needs a more detailed explanation to enable readers to better understand the features and models.

5 In the conclusion section, the authors need to further elaborate on potential limitations of their adopted methods in distinguishing the sources of carbon emissions. CO2 emissions are generated from various sources including transportation, land use and land cover changes, and power generation. Therefore, source apportionment is also important in analyzing the flux of CO2. The following could be used as references to highlight the variety of carbon emission sources: doi.org/10.1016/j.scs.2024.105770; doi.org/10.1080/10106049.2022.2142957; doi.org/10.1016/j.landusepol.2023.107019.

6 In general, many figures in this manuscript need substantial improvement. Many details are missing, resulting in poor readability.

Author Response

Comment 1: In the abstract, the advantages of the machine learning methods mentioned in the title need to be emphasized.

Response: We thank the reviewer for this suggestion and have edited the abstract accordingly.

Comment 2: Regarding the performance assessment, this study used RMSE and R². I think an indicator of relative error is also recommended for accuracy evaluation.

Response: We respectfully disagree with this suggestion. Since FCO2 can be positive or negative and is sometimes close to zero, a relative error metric would be skewed.
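
The concern about relative error can be illustrated with a small sketch. The flux values below are made up for illustration and are not taken from the manuscript, and the function names are hypothetical. Because a relative error divides each residual by the observed value, a single observation close to zero can dominate the average even when its absolute error is small, while RMSE and R² are unaffected by this.

```python
import math

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def mean_abs_relative_error(obs, pred):
    """Mean of |obs - pred| / |obs|; undefined when an observation is exactly 0."""
    return sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

# Made-up fluxes (negative = uptake, positive = release), illustration only.
# The third observation is close to zero.
observed = [-12.0, -6.0, 0.01, 3.0, 5.0]
predicted = [-11.0, -6.5, 0.51, 2.5, 5.5]

# Same data with the near-zero observation removed, for comparison
observed_nz = [o for o in observed if abs(o) > 1.0]
predicted_nz = [p for o, p in zip(observed, predicted) if abs(o) > 1.0]

print("RMSE:", rmse(observed, predicted))
print("R^2:", r_squared(observed, predicted))
print("Relative error, all points:", mean_abs_relative_error(observed, predicted))
print("Relative error, near-zero point removed:",
      mean_abs_relative_error(observed_nz, predicted_nz))
```

In this example every absolute error is at most 1, yet the near-zero observation alone contributes a relative error of about 50 (5000%), so the averaged relative error says little about overall model quality.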

Comment 3: For Figure 6, I think labels indicating the order of the sub-figures should be added to improve readability. In addition, for the fifth and sixth sub-figures, some indicators, such as RMSE and R², should also be added, as well as for Figure 7 and other similar figures.

Response: We thank the reviewer for this suggestion and have edited Figures 5, 6, and 7.

Comment 4: I think the XGBoost Feature Importance needs a more detailed explanation to enable readers to better understand the features and models.

Response: We thank the reviewer for this suggestion. Section 5.3 now includes an explanation of how the XGBoost feature importance is calculated and what it means.
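
As a rough illustration of what a tree-based importance score measures, the sketch below computes split-count and gain-based importances from a list of tree splits. The split records, feature names, and gain values are hypothetical, and this record does not state which importance type the manuscript reports, so this is not a description of the authors' exact calculation. In the XGBoost library itself, comparable scores are exposed through `Booster.get_score(importance_type=...)`, with options that include `weight`, `gain`, and `total_gain`.

```python
from collections import defaultdict

def importance_from_splits(splits):
    """Summarize per-feature importance from (feature, gain) split records.

    Returns three dicts: number of splits per feature ("weight"),
    summed gain per feature ("total gain"), and total gain
    normalized so that the scores sum to 1.
    """
    weight = defaultdict(int)
    total_gain = defaultdict(float)
    for feature, gain in splits:
        weight[feature] += 1
        total_gain[feature] += gain
    gain_sum = sum(total_gain.values())
    normalized = {f: g / gain_sum for f, g in total_gain.items()}
    return dict(weight), dict(total_gain), normalized

# Hypothetical splits collected across all trees of a boosted ensemble;
# names and gain values are illustrative only
splits = [
    ("shortwave_radiation", 40.0),
    ("air_temperature", 10.0),
    ("shortwave_radiation", 30.0),
    ("soil_temperature", 5.0),
    ("air_temperature", 15.0),
]

weight, total_gain, normalized = importance_from_splits(splits)
ranking = sorted(normalized, key=normalized.get, reverse=True)
print(ranking)
```

A feature ranks highly under the gain-based score when splitting on it reduces the training loss substantially, which is different from simply being used in many splits.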

Comment 5: In the conclusion section, the authors need to further elaborate on potential limitations of their adopted methods in distinguishing the sources of carbon emissions. CO2 emissions are generated from various sources including transportation, land use and land cover changes, and power generation. Therefore, source apportionment is also important in analyzing the flux of CO2. The following could be used as references to highlight the variety of carbon emission sources: doi.org/10.1016/j.scs.2024.105770; doi.org/10.1080/10106049.2022.2142957; doi.org/10.1016/j.landusepol.2023.107019.

Response: We have broadened the conclusion section of the paper and included the references recommended by the reviewer.

Comment 6: In general, many figures in this manuscript need substantial improvement. Many details are missing, resulting in poor readability.

Response: We thank the reviewer for this suggestion and have edited some of the Figures in the manuscript.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Dear all,

I believe there has been significant progress in the content of the manuscript. I recommend approval.
