Article

An XGBoost-Based Machine Learning Approach to Simulate Carbon Metrics for Forest Harvest Planning

1 FORAC Research Consortium, Université Laval, Quebec, QC G1V 0A6, Canada
2 Department of Wood and Forest Sciences, Pavillon Abitibi-Price, Université Laval, 2405, rue de la Terrasse, Quebec, QC G1V 0A6, Canada
3 Interuniversity Research Centre on Enterprise Networks, Logistics and Transportation (CIRRELT), Québec, QC G1V 0A6, Canada
4 Bureau du Forestier en Chef, Ministère des Ressources Naturelles et Forêts, Quebec, QC G1P 3W8, Canada
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(12), 5454; https://doi.org/10.3390/su17125454
Submission received: 21 April 2025 / Revised: 17 May 2025 / Accepted: 5 June 2025 / Published: 13 June 2025

Abstract

It has become increasingly important to incorporate carbon metrics into the forest harvest planning process. The Generic Carbon Budget Model (GCBM) is a well-recognized tool for evaluating the potential impact of management decisions on carbon sequestration and storage, and it supports sustainable forest management planning. GCBM is effective for carbon budgeting and for estimating carbon metrics. However, its computational complexity makes it difficult to integrate into forest planning with multiple scenarios. This study therefore proposes using machine learning algorithms to emulate GCBM output more rapidly. XGBoost was implemented to estimate the carbon pools and net ecosystem productivity (NEP) of managed forests in Quebec, and polynomial regression was implemented as a validation benchmark. Datasets of 7.56 million and 13.53 million samples were compiled for NEP and carbon pool forecasting, respectively. The results indicate that XGBoost accurately replicated the output of the GCBM model for both NEP forecasting (R2 = 0.883) and carbon pool estimation (R2 = 0.967 for aboveground biomass). Machine learning approaches are comparatively faster, but GCBM remains the more accurate option. Hence, the choice between machine learning and GCBM should be dictated by the specific objectives and constraints of the project.

1. Introduction

Sustainable management of large tracts of public forests requires management plans based on the three pillars of sustainability: social, economic, and environmental [1]. A top-down hierarchical planning framework is generally adopted for this purpose [2]. This framework consists of three distinct levels: strategic, tactical, and operational. Long-term strategic planning determines the annual allowable cut (AAC), which is generally derived using a linear programming optimization model. This model is aspatial and determines the maximum sustainable volume of wood, by tree species, that can be harvested in the long term [3]. Tactical planning spatially disaggregates the volume targets set at the strategic level while considering the three pillars of sustainability [4]. This process determines the precise location of the cutblocks available for harvest during different time periods. Further down the hierarchy, the output from the tactical level is used to develop operational plans that provide a detailed harvesting schedule to meet industrial demand [5].
It has become increasingly important to incorporate forest carbon budgeting into forest management planning, given the imminent threat posed by climate change [6]. Forests are believed to play a substantial role in mitigating climate change [7]. Forest planners and researchers therefore need to track past and present carbon flux and stock dynamics to develop sound land-use policies [8,9]. Net ecosystem productivity (NEP) is an indicator of carbon flux within an ecosystem during a given period [10]. A negative NEP indicates that the ecosystem acts as a source of CO2, while a positive value often signifies that it acts as a CO2 sink [11,12]. Understanding carbon storage within the forest ecosystem is equally important, because forests can store significant amounts of carbon in woody biomass (living and dead), litter and soil [13]. Carbon budget modelling is often the preferred tool for understanding carbon stocks under different management scenarios [8,14], as well as carbon flux dynamics. For such applications, the Carbon Budget Model of the Canadian Forest Sector (CBM-CFS) has been commonly used [8,15,16,17]. Its recent version, CBM-CFS3, implements a Tier 3 Good Practice Guidance standard for carbon budget modelling [8,9]. As a result, CBM-CFS3 is widely used for forest carbon accounting at the operational scale [14,18]. Typically, this model allows forest practitioners to estimate carbon metrics during the calculation of AAC. However, CBM-CFS provides spatial carbon budgeting only for a limited area. This is a major shortcoming, because forest management planning is conducted over large tracts of forest. The Generic Carbon Budget Model (GCBM) uses the same basic foundations as CBM-CFS3. In addition to being spatially explicit, GCBM is fully capable of operating over a larger forest territory [19,20].
In forest management planning, considering carbon metrics in the spatial allocation phase is a major challenge [21]. Spatial allocation is a challenging numerical problem even when only volume is allocated [22,23]. Mixed integer programming is commonly used, and it has computational limitations for large combinatorial problems [22,24]. Including GCBM in this already complex process would make the problem worse, because the time required to obtain output is too long for practitioners. As a result, carbon metrics cannot be evaluated when selecting an optimal plan among different scenarios. We hypothesize that machine learning algorithms can help overcome this challenge by rapidly estimating the carbon metrics of alternative plans, which can then be compared in spatial forest planning. The standalone GCBM model has been successfully implemented in various studies [10,25], but its intensive computational cost remains an issue. To the best of our knowledge, this work is among the first to operationalize machine learning (ML) training on GCBM outputs at a regional scale.
Machine learning approaches have already been successfully applied to forest structure prediction [26] and carbon mapping [27]. Machine learning algorithms can provide accurate estimates within a practical time frame and with much lower computational requirements [26,27,28]. The broad aim of this research is therefore to reduce reliance on computationally complex approaches, such as GCBM, by using machine learning algorithms that provide near-instantaneous output, so that multiple forest planning scenarios can be evaluated rapidly. The specific objective of this study is to evaluate the capacity of machine learning algorithms to estimate net ecosystem productivity (NEP) and carbon pools in the context of spatial forest planning.

2. Materials and Methods

The methodology adopted in this study is summarized in Figure 1. First, the AAC for each study area was calculated using forest resource inventory data and a linear programming (LP) optimization model. The Forest Management Tool (FMT) [29] was used to generate spatial output from the aspatial solution produced by the LP model. Next, the GCBM model was used to estimate NEP and the carbon pool for the spatial solution. The carbon pool includes aboveground biomass, belowground biomass, deadwood, litter and soil carbon content. The output generated by GCBM was used to train XGBoost (Extreme Gradient Boosting), a machine learning algorithm. Polynomial regression was also carried out as an additional validation benchmark. NEP and the carbon pool were then predicted with the trained models to evaluate their performance. The next subsection describes the study area. It is followed by descriptions of the AAC and carbon calculations, the data preparation methods, and the model building and evaluation procedures.

2.1. Study Area

This study was conducted in the province of Quebec, Canada, where 12 management units (MUs) were selected from the following administrative regions: Côte-Nord, Saguenay–Lac-Saint-Jean, Capitale-Nationale and Nord-du-Québec (Figure 2). All 12 MUs are predominantly situated within the Canadian boreal forest and consist mostly of conifer tree species. The total area of each MU and its managed and unmanaged forest areas are presented in Table 1.

2.2. AAC and Carbon Calculations

First, the AAC was determined using an LP Model II formulation in the Woodstock forest modelling system [30]. The AAC calculation yields the maximum volume, by species, that can be harvested in the MU per period, so it disregards the spatial aspect. FMT was used to spatialize the disturbances and forest inventory at a 14.4 ha resolution. FMT is an open-source, object-oriented C++ library. It was used through Python (version 3.12) for the spatial interpretation of the Woodstock output.
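The sketch below shows the kind of optimization behind an AAC calculation. It solves a deliberately tiny even-flow harvest problem with SciPy's `linprog`. It is a single-harvest toy and not Woodstock's Model II formulation. The two strata, three periods, yields and areas are hypothetical. The only constraints are the area available in each stratum and a common minimum volume H that must be harvested in every period, and H is maximized.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical yields (m3/ha) obtained if stratum s is harvested in period p
YIELDS = np.array([[100.0, 120.0, 140.0],   # stratum A
                   [ 80.0, 110.0, 150.0]])  # stratum B
AREAS = np.array([50.0, 70.0])              # hypothetical area (ha) of each stratum


def max_even_flow(yields, areas):
    """Maximize the volume H that can be harvested in every period.

    Decision variables: x[s, p] (ha of stratum s cut in period p), followed by H.
    """
    n_strata, n_periods = yields.shape
    n_x = n_strata * n_periods
    c = np.zeros(n_x + 1)
    c[-1] = -1.0  # linprog minimizes, so minimize -H

    a_ub, b_ub = [], []
    # Area constraint: sum_p x[s, p] <= areas[s]
    for s in range(n_strata):
        row = np.zeros(n_x + 1)
        row[s * n_periods:(s + 1) * n_periods] = 1.0
        a_ub.append(row)
        b_ub.append(areas[s])
    # Even-flow constraint: H - sum_s yields[s, p] * x[s, p] <= 0 for every period
    for p in range(n_periods):
        row = np.zeros(n_x + 1)
        for s in range(n_strata):
            row[s * n_periods + p] = -yields[s, p]
        row[-1] = 1.0
        a_ub.append(row)
        b_ub.append(0.0)

    res = linprog(c, A_ub=np.array(a_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * (n_x + 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    plan = res.x[:n_x].reshape(n_strata, n_periods)
    return res.x[-1], plan


if __name__ == "__main__":
    h, plan = max_even_flow(YIELDS, AREAS)
    print(f"Sustainable volume per period: {h:.1f} m3")
    print("Hectares harvested (strata x periods):\n", plan.round(1))
```

Real AAC models also track regeneration, multiple species volumes and many additional constraints. The toy only illustrates why the result is aspatial: it says how much of each stratum to cut in each period, not where.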
Next, GCBM was used for the carbon calculations, as it is designed specifically to evaluate the carbon stocks and fluxes of a forest [20]. The spatially explicit plan generated by Woodstock and FMT was input into GCBM along with yield curves, forest inventory data and historical disturbance information. From these inputs, GCBM simulated growth-related changes in the carbon stock. GCBM also accounts for aboveground and belowground biomass, deadwood, litter and soil carbon according to the forest management practice adopted. It explicitly models biomass mortality dynamics by accounting for the transfer from living to dead biomass [31]. It also accounts for carbon emissions from the decomposition of dead biomass and for direct emissions to the atmosphere caused by fire disturbances. The variables required to run GCBM are listed in Appendix A (Table A1). The next step was to train machine learning algorithms to partly substitute for GCBM. As mentioned earlier, a significant limitation of GCBM is its high computational requirement. This makes it difficult for forest practitioners to evaluate multiple scenarios.
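The pool-transfer logic described above can be illustrated with a two-pool toy model. This conceptual sketch is not GCBM's implementation, and all rates and stock values are hypothetical. Living biomass gains net primary productivity (NPP) each year, and a fixed fraction dies and moves to a single dead organic matter pool. That pool decays at a fixed rate, releasing heterotrophic respiration (Rh). NEP is computed as NPP minus Rh, so positive values indicate a sink and negative values a source, matching the sign convention used in the Introduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pools:
    living: float  # t C/ha in living biomass (above- and belowground lumped together)
    dead: float    # t C/ha in dead organic matter (deadwood, litter and soil lumped together)


def step(pools: Pools, npp: float, mortality_rate: float = 0.03,
         decay_rate: float = 0.05) -> tuple[Pools, float]:
    """Advance the toy model by one year; return the new pools and that year's NEP."""
    mortality = pools.living * mortality_rate  # transfer from living to dead biomass
    rh = pools.dead * decay_rate               # heterotrophic respiration from decomposition
    new_pools = Pools(living=pools.living + npp - mortality,
                      dead=pools.dead + mortality - rh)
    nep = npp - rh                             # > 0: carbon sink, < 0: carbon source
    return new_pools, nep


# Usage with two hypothetical stands
growing = Pools(living=80.0, dead=40.0)     # well-stocked, growing stand
harvested = Pools(living=5.0, dead=150.0)   # recently cut stand with a large dead/soil stock
_, nep_growing = step(growing, npp=4.0)     # 4.0 - 40 * 0.05 = 2.0 t C/ha/yr (sink)
_, nep_harvested = step(harvested, npp=1.0) # 1.0 - 150 * 0.05 = -6.5 t C/ha/yr (source)
print(nep_growing, nep_harvested)
```

The second stand shows why harvested strata can report negative NEP. Low regrowth cannot offset decomposition of the dead carbon left on site.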

2.3. Data Preparation and Dimensionality Reduction

This section describes how the inputs to the machine learning algorithms were prepared. The independent variables for the machine learning algorithms were the same as those used in GCBM (Table A1). These variables were chosen because GCBM is a state-of-the-art spatial forest carbon model that considers the relevant inputs for its operation [10]. A total of 7.56 million samples were compiled for NEP forecasting and 13.53 million samples for carbon pool prediction. For validation purposes, the predictions of the machine learning algorithms were compared to the output of GCBM. The objective was to predict both NEP and the carbon pools, so two different preprocessing approaches were applied. For NEP forecasting, two forest strata were selected because NEP measures the carbon flux between two periods. For the carbon pool, a single stratum was used because it measures the amount of carbon stored at a given period. The selection of strata was followed by a data cleaning phase, in which duplicates and null (NaN) values were removed from the dependent and independent variables. After this step, approximately 6.77 million samples with 18 independent variables remained for NEP forecasting. For carbon pool forecasting, 13.52 million samples with 11 independent variables remained. The cleaned datasets remained large enough to be considered adequate for training and testing the machine learning models.
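A minimal pandas sketch of the cleaning step follows. It assumes the GCBM inputs and outputs have been exported to one table with one row per sample. The column names are hypothetical placeholders, not the actual variable names of Table A1.

```python
import numpy as np
import pandas as pd


def clean_samples(df: pd.DataFrame, feature_cols: list[str],
                  target_cols: list[str]) -> pd.DataFrame:
    """Keep the model variables, then drop rows with NaN values and exact duplicates."""
    out = df[feature_cols + target_cols]
    out = out.dropna()            # remove null / NaN entries in any model variable
    out = out.drop_duplicates()   # remove duplicated samples
    return out.reset_index(drop=True)


# Usage with a tiny hypothetical table
raw = pd.DataFrame({
    "age": [10, 10, 35, np.nan, 60],
    "site_index": [12.0, 12.0, 15.0, 14.0, None],
    "nep": [-1.2, -1.2, 0.8, 0.5, 1.1],
})
clean = clean_samples(raw, ["age", "site_index"], ["nep"])
print(len(clean))  # 2: one duplicate and two incomplete rows were removed
```

For tables of several million rows, the same two calls apply unchanged. Memory rather than logic is then the main concern.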
The numbers of independent variables were large (18 for NEP and 11 for carbon pools), so Principal Component Analysis (PCA) was applied as a dimensionality reduction technique. PCA simplifies large datasets by transforming them into a lower-dimensional space while retaining most of the original information. It does so by identifying new axes, called principal components. The first component captures the maximum variance in the data. Each subsequent component captures the maximum remaining variance while being orthogonal (uncorrelated) to the preceding ones [32].
The data were first standardized to zero mean and unit variance using the StandardScaler class from the sklearn.preprocessing module. PCA was then applied via the PCA class from sklearn.decomposition. Five principal components were retained for the carbon pool and six for NEP, as they together explained about 85% of the variance of the original data. The transformed principal components were used for all downstream analyses. The dataset was then randomly split into training (80%) and testing (20%) sets using scikit-learn with a fixed random seed (random_state = 42). The same procedure was adopted for both NEP and carbon pool forecasting.
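The preprocessing chain described above can be sketched with scikit-learn. Passing a float to `PCA(n_components=...)` keeps the smallest number of components whose cumulative explained variance reaches that fraction. This is one way to apply the 85% criterion. The study reports retaining 5 and 6 components, which may instead have been fixed manually. The synthetic matrix stands in for the real inputs. The sketch follows the order given in the text, with scaling and PCA before the split. Fitting the scaler and PCA on the training portion only would keep test information out of the transformation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def reduce_and_split(X, y, variance_target=0.85, seed=42):
    """Standardize X, project it onto principal components, then split 80/20."""
    X_std = StandardScaler().fit_transform(X)                   # zero mean, unit variance
    pca = PCA(n_components=variance_target, svd_solver="full")  # keep PCs up to 85% variance
    X_pc = pca.fit_transform(X_std)
    X_train, X_test, y_train, y_test = train_test_split(
        X_pc, y, test_size=0.2, random_state=seed)
    return pca, X_train, X_test, y_train, y_test


# Usage: 18 correlated synthetic features driven by 5 hidden factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
X = latent @ rng.normal(size=(5, 18)) + 0.1 * rng.normal(size=(1000, 18))
y = latent[:, 0] + 0.5 * latent[:, 1]
pca, X_tr, X_te, y_tr, y_te = reduce_and_split(X, y)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```

The retained components are orthonormal (`pca.components_ @ pca.components_.T` is the identity). This is the orthogonality property described in the previous paragraph.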

2.4. Model Selection

In this study, Extreme Gradient Boosting (XGBoost) was used to predict the carbon pool and NEP. Polynomial regression was also used to predict these dependent variables as a validation benchmark. Both regression techniques are described in the subsections below.

2.4.1. Polynomial Regression

As described by Maulud and Abdulazeez [33], polynomial regression is a special type of multiple linear regression that fits a curvilinear relationship between the independent and dependent variables. It is given as:
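$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n + \varepsilon$$
where $y$ is the predicted outcome (NEP or a carbon pool component), $\beta_0$ is the intercept, $\beta_1, \ldots, \beta_n$ are the coefficients of the polynomial terms of the explanatory variable $x$ (from Table A1), $n$ is the polynomial degree, and $\varepsilon$ is the error term.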
A major drawback of polynomial regression is its sensitivity to outliers in the dataset, which can reduce model performance. Nevertheless, polynomial regression is used here as a benchmark against which the XGBoost predictions are compared.
To build the polynomial regression model, the dataset was first split into training (80%) and testing (20%) sets. To determine the optimal model complexity, a systematic grid search was conducted over polynomial degrees from 1 to 4. The model was regularized with RidgeCV using alpha values of 0.01, 0.1, 1, 10 and 100. The combination of polynomial degree and regularization parameter that yielded the highest R2 value was selected as the best model.
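A sketch of this degree search follows, assuming a scikit-learn pipeline of `PolynomialFeatures` and `RidgeCV`. `RidgeCV` picks α from the given list internally, using efficient leave-one-out cross-validation when `cv` is not set. The outer loop tries degrees 1 to 4. The text does not state on which data the R2 used to pick the winner was computed, so a held-out validation split is assumed here. The `StandardScaler` step is also an assumption, added so that the ridge penalty treats all polynomial terms on a common scale. With several inputs, `PolynomialFeatures` also creates interaction terms, so the number of features grows quickly with degree.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

ALPHAS = [0.01, 0.1, 1, 10, 100]


def search_polynomial_degree(X, y, degrees=range(1, 5), seed=42):
    """Return (best_degree, best_r2, best_model) over the candidate degrees."""
    X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.2, random_state=seed)
    best = (None, -np.inf, None)
    for degree in degrees:
        model = make_pipeline(
            PolynomialFeatures(degree=degree, include_bias=False),
            StandardScaler(),       # assumed step, not stated in the text
            RidgeCV(alphas=ALPHAS), # selects alpha from ALPHAS internally
        )
        model.fit(X_fit, y_fit)
        r2 = r2_score(y_val, model.predict(X_val))
        if r2 > best[1]:
            best = (degree, r2, model)
    return best


# Usage on synthetic data with a cubic term
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = 1 + X[:, 0] - 0.5 * X[:, 0] ** 3 + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)
best_degree, best_r2, best_model = search_polynomial_degree(X, y)
print(best_degree, round(best_r2, 3))
```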

2.4.2. XGBoost

XGBoost is a powerful machine learning algorithm based on the gradient boosting framework, and it belongs to the family of ensemble learning methods. It builds a series of decision trees, each of which attempts to correct the errors of the preceding ones. It incorporates regularization, which helps prevent overfitting, and parallelization, which enables parallel computation during tree construction. These features, together with its speed relative to many other gradient boosting implementations, make it well suited to large-scale datasets. Nevertheless, XGBoost can still overfit if not properly tuned, particularly with noisy data [34].
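The idea that each tree corrects its predecessors can be illustrated with a from-scratch gradient boosting loop for squared error. The loop uses scikit-learn's `DecisionTreeRegressor` as the weak learner. This is a simplified illustration of the general technique, not XGBoost itself. XGBoost additionally uses second-order gradient information, L1/L2 penalties on leaf weights, a minimum split gain, and histogram-based, parallel split finding.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class TinyGradientBoosting:
    """Gradient boosting for squared error: each tree is fit to the current residuals."""

    def __init__(self, n_rounds=100, learning_rate=0.1, max_depth=3):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.max_depth = max_depth

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.base_ = y.mean()                  # start from a constant prediction
        pred = np.full(y.shape, self.base_)
        self.trees_ = []
        for _ in range(self.n_rounds):
            residuals = y - pred               # negative gradient of 0.5 * (y - pred) ** 2
            tree = DecisionTreeRegressor(max_depth=self.max_depth, random_state=0)
            tree.fit(X, residuals)
            pred += self.learning_rate * tree.predict(X)  # shrink each correction
            self.trees_.append(tree)
        return self

    def predict(self, X):
        pred = np.full(len(X), self.base_)
        for tree in self.trees_:
            pred += self.learning_rate * tree.predict(X)
        return pred


# Usage on a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(400, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)
model = TinyGradientBoosting().fit(X, y)
mse_boosted = np.mean((model.predict(X) - y) ** 2)
mse_constant = np.mean((y.mean() - y) ** 2)
print(mse_boosted, mse_constant)
```

The learning rate shrinks each tree's contribution. Together with shallow trees, this is one of the levers that keeps boosting from overfitting noisy data.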

2.5. Model Building

XGBoost regression was implemented using the XGBRegressor class from the xgboost Python package. To accelerate model training, the histogram-based tree construction algorithm was used in conjunction with GPU acceleration via CUDA.
For hyperparameter tuning, a Bayesian optimization approach was employed using the BayesSearchCV utility from the scikit-optimize library. It searched for the best parameters within the ranges listed in Table 2. Bayesian optimization uses a probabilistic model to decide where in the parameter space to sample next. It focuses on regions that are more likely to yield performance improvements, based on prior observations. Because it avoids wasting resources on unpromising combinations, it is sample-efficient and faster than exhaustive grid search [35].
Three-fold cross-validation was used for internal performance validation. Because XGBoost is tree-based, it is not sensitive to feature scale: trees divide the data using feature thresholds, not distance or magnitude. Transformations such as standardization and normalization are therefore not required [36]. PCA, which did require scaled data, had already been applied, so no further transformation was carried out for XGBoost.
Machine learning models are susceptible to overfitting and to noisy data, so several precautions were taken. First, PCA as a dimensionality reduction step helped reduce feature collinearity and noise, which in turn improved generalization. Second, as mentioned in the description of the XGBoost model, its ensemble structure and decision tree-based approach make it comparatively less sensitive to noisy or unscaled features, because splits are based on thresholds rather than distances.

2.6. Model Evaluation Criteria

Three metrics were used to evaluate overall model performance: the coefficient of determination (R2), mean absolute error (MAE), and root mean squared error (RMSE). R2 is an extensively used evaluation metric for a variety of applications [37]. It typically ranges between 0 and 1, and a model is considered effective if its value is close to 1. (R2 can be negative when a model predicts worse than the mean of the observations.) RMSE and MAE measure the deviation of the modelled output from the observed value. Both range between 0 and ∞, and the model with values closest to 0 is regarded as the best-performing. Among these three indicators, R2 was the main metric used to assess the tested models. The results obtained with the testing dataset depend on its distribution. There is no guarantee that a dataset with a completely different distribution would yield the same results. All processing was carried out in Python within the Anaconda environment using the scikit-learn (sklearn) package.
To assess the robustness of the model across different data partitions, a repeated subsampling method was implemented using the ShuffleSplit class from the scikit-learn library. Ten independent train–test splits (80–20 ratio) were generated. For each split, the machine learning model was retrained using the optimal hyperparameters previously identified through Bayesian optimization and then evaluated using the performance metrics.
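The evaluation protocol can be sketched as follows. The three metrics are computed with scikit-learn, with RMSE taken as the square root of the mean squared error. `ShuffleSplit` draws 10 independent 80–20 partitions, and a fresh model is refit on each. `make_model` is a hypothetical factory that, in this study's setting, would return an XGBRegressor configured with the tuned hyperparameters. In the usage example, a plain linear model stands in for it so the sketch runs with scikit-learn alone.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import ShuffleSplit


def evaluate(y_true, y_pred):
    """Return R2, RMSE and MAE for one set of predictions."""
    return {
        "R2": r2_score(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAE": mean_absolute_error(y_true, y_pred),
    }


def repeated_subsampling(make_model, X, y, n_splits=10, test_size=0.2, seed=42):
    """Refit a fresh model on each random split; return (mean, std) of each metric."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
    scores = []
    for train_idx, test_idx in splitter.split(X):
        model = make_model().fit(X[train_idx], y[train_idx])
        scores.append(evaluate(y[test_idx], model.predict(X[test_idx])))
    return {name: (np.mean([s[name] for s in scores]), np.std([s[name] for s in scores]))
            for name in scores[0]}


# Usage with a linear stand-in model on synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3 * rng.normal(size=500)
summary = repeated_subsampling(LinearRegression, X, y)
for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.3f} ± {std:.3f}")
```

A small standard deviation across the 10 splits is the signal of stability that Table A2 reports.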

3. Results

3.1. GCBM Predictions

Figure 3 shows the GCBM predictions of the different carbon pool components and of net ecosystem productivity for the 12 management units. GCBM required approximately 3.5 min per million simulations to predict NEP and the carbon pools, where each prediction represents a forest stratum. This time may vary by ±1 min depending on the composition of the forest strata used. The violin plots show the descriptive statistics of the results from 2023 to 2158, broken down into 5-year periods. The width of each violin reflects the distribution density of the data, with wider sections indicating a higher frequency of observations. The central box plot conveys measures of central tendency and variability. Symbols and colors have been arbitrarily assigned to differentiate between MUs. GCBM predicted the annual average carbon stock (tonne ha−1) of aboveground biomass, belowground biomass, deadwood, litter and soil carbon. The figure also shows the annual average carbon flux, i.e., net ecosystem productivity (tonne CO2e ha−1 year−1). Each subplot shows noticeable variability in the carbon pools and NEP. This is because our data contain diverse forests with different age classes, even- and uneven-aged management practices, climatic differences and disturbances. Most of the carbon was stored on or below the ground (belowground biomass, deadwood, litter and soil). Nevertheless, aboveground biomass also holds an important proportion of the carbon. Figure 3 also shows site-wise variability in the carbon pools and flux (NEP), with high variability among the strata of each MU during most of the period (2023–2158). For instance, most strata show negative NEP values, meaning that they act as carbon sources.
However, these emitting strata, which can be attributed to harvesting and other silvicultural treatments, represent only a small portion of the area of each MU. A high proportion of the area in each MU is growing, which most likely explains why the MUs act as carbon sinks overall.
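As a rough back-of-the-envelope check, the reported rate can be converted into total run times. This assumes that GCBM run time scales linearly with the number of strata, which was not measured here, and reads the ±1 min variation as applying per million. The dataset sizes are those compiled in this study. The batch of 100 runs is a hypothetical illustration of multi-scenario planning.

```python
def gcbm_minutes(n_strata, minutes_per_million=3.5):
    """Estimated GCBM run time in minutes, assuming linear scaling with stratum count."""
    return minutes_per_million * n_strata / 1e6


for label, n in [("carbon pool dataset", 13.53e6), ("NEP dataset", 7.56e6)]:
    low, mid, high = (gcbm_minutes(n, rate) for rate in (2.5, 3.5, 4.5))
    print(f"{label}: ~{mid:.1f} min (range {low:.1f}-{high:.1f} min)")

# Hypothetical batch of 100 runs, each the size of the carbon pool dataset
print(f"100 runs: ~{100 * gcbm_minutes(13.53e6) / 60:.0f} h")
```

Under these assumptions, evaluating many alternative plans with GCBM quickly reaches days of computation. This is the bottleneck the machine learning models are meant to relieve.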

3.2. Descriptive Statistics

The descriptive statistics for NEP are shown in Table 3, and those for the carbon pool in Table 4. All variables were continuous.

3.3. Machine Learning Results

Simulations were run on a computer with an Intel Core i9-10900K processor, 64 GB of RAM and an NVIDIA GeForce RTX 3080 graphics card. On average, training the XGBoost model for one dependent variable took approximately 35 min. Once the Bayesian optimization process had found the best training parameters (Table 5), predicting NEP and the carbon pools took approximately 45 s. Polynomial regression was the worst-performing model, but its computational requirement was negligible: training took only about one minute on average.
For NEP, XGBoost outperformed polynomial regression on all metrics (R2, RMSE and MAE; Table 6). It achieved a higher R2 (0.883) along with lower RMSE and MAE values.
The carbon pool includes aboveground biomass, belowground biomass, deadwood, litter and soil carbon content. It is therefore important to assess whether the ML models predict each of these components well. As with NEP, polynomial regression performed worse than XGBoost when predicting the carbon contents, especially for the deadwood component (R2 = 0.496) (Figure 4). Deadwood was also the carbon pool component that XGBoost predicted least well (R2 = 0.872) (Figure 4), although XGBoost still performed substantially better than polynomial regression. Aboveground and belowground biomass were the only two variables that yielded acceptable R2 values under polynomial regression. XGBoost showed the highest predictive accuracy for aboveground biomass (R2 = 0.968), followed by belowground biomass, litter, soil and deadwood. Overall, XGBoost was the more accurate predictive model as measured by R2 (Figure 4), RMSE (Figure 5) and MAE (Figure 6). For polynomial regression, the best degree was 4 in all instances.
Appendix A (Table A2) reports the results of the ShuffleSplit method, which assesses the stability of the model across different data partitions. The results are summarized as the mean and standard deviation of the evaluation metrics across the 10 runs. The standard deviations were low, indicating that the model's predictive performance was consistent across data partitions. Polynomial regression trained quickly (in less than one minute) despite the large dataset. However, its predictions showed considerable variability and low accuracy and reliability across NEP and the carbon pool components. In contrast, XGBoost predicted both NEP and the carbon pool components with high accuracy. XGBoost requires a relatively longer initial training time, but this remains within acceptable computational limits.

4. Discussion

To investigate the carbon fluxes and stock dynamics of managed forests in Quebec, 12 management units were chosen and analysed using GCBM, a robust tool for carbon budgeting [10,19,20]. As expected, we observed site-wise variability in carbon pools and NEP across these MUs. Several past studies have described how forest age, structure, management, climate and disturbances cause variability in carbon storage and flux [7,10,14,38,39], as is the case in these MUs. Moreover, our results (Figure 3) indicate that soil stored a significant proportion of the carbon, while deadwood stored the lowest amount. This is consistent with other studies showing that boreal forests store a significant proportion of their carbon in soil rather than in vegetation [40,41]. This can be attributed to reduced organic carbon decomposition rates, particularly in northern regions with colder climates [40]. The geographical distribution of the MUs and the management practices adopted also contribute to the variability in carbon content among the individual carbon pool components. These factors create differences in forest composition and structure, which in turn have a substantial influence on soil carbon properties [42].
Based on the GCBM results, MU-3771, located in Capitale-Nationale, exhibited a greater mean carbon stock (Figure 3) than all other management units. As shown in Figure 2, MU-3771 is situated in the southernmost region, where a favourable climate supports mixedwood forests. Such diverse forests are often able to capture a significant proportion of carbon [43]. For instance, [44] compared the soil carbon sequestration potential of temperate and boreal forest tree species and concluded that mixed forests had a larger potential to store soil carbon than spruce forests. The northern forests, with less diverse tree species, also showed variability in their carbon pools. Several past studies have highlighted the link between species composition and carbon capture, which helps explain the variability in carbon pools across our management units. We also evaluated the NEP values derived using GCBM. Most strata had negative NEP values. This can be explained by logging activities in the area [45] or by the late successional stage of the forests [46,47]. Over the long term (>100 years), the MUs are mostly viewed as carbon sinks [31]. Carbon source strata make up most of our data, but they represent only a small proportion of the area of each MU.
Likewise, the results indicate that the XGBoost model effectively replicated the output of the GCBM model. Its R2 values demonstrate high accuracy and generalization capacity. XGBoost's advantage over standard polynomial regression can largely be attributed to three strengths. It handles large datasets efficiently, it captures non-linear feature interactions better, and its integrated regularization mechanisms mitigate overfitting [34]. Among the predicted variables, aboveground and belowground biomass carbon were predicted with the highest accuracy (R2 > 0.96). Litter and soil carbon also showed high predictive performance, with R2 values around 0.90. NEP and deadwood showed relatively lower predictive performance than the other variables. Nevertheless, these deviations are minor, and the overall performance remained within acceptable limits. This variation is likely due to the inherent variability within the individual datasets and to measurement uncertainty across the individual components.
A critical factor in achieving high model performance is the careful selection and tuning of hyperparameters, because the accuracy of a machine learning model depends heavily on them. The hyperparameter space was explored to a practical extent, but the search was constrained by the computational burden of the large dataset. A wider search could potentially have improved model accuracy further, but time and hardware limits prevented it. Hence, computational efficiency and model accuracy must be balanced.
Regarding computational efficiency, other machine learning algorithms were also considered in the pilot phase, particularly random forest and artificial neural networks (ANNs). Both were ultimately discarded because they handled the large datasets less efficiently. In preliminary comparisons, training an ANN for a single dependent variable took more than 4 h, compared with an average of 30 min for XGBoost. Random forest took more than 7 h to train. In addition, ANNs offer high representational capacity, but they are sensitive to data transformations such as scaling and normalization and require extensive training time. XGBoost was therefore selected during the pilot phase as the model offering the best balance of accuracy and efficiency.
XGBoost appears best suited to scenarios that require rapid decision-making. Examples include generating multiple forest management scenarios in tactical planning, conducting preliminary carbon assessments, and supporting stakeholder engagement through scenario comparison. Its speed and scalability make it particularly advantageous when hundreds or thousands of alternative management scenarios must be evaluated quickly. However, XGBoost is not a replacement for GCBM in contexts that demand high precision or a mechanistic understanding of carbon dynamics. Such contexts include regulatory reporting, policy evaluation, carbon credit verification and long-term national carbon accounting. GCBM remains more suitable when detailed representations of ecological processes, forest succession, and disturbance interactions are critical. We therefore recommend a hybrid approach: XGBoost rapidly screens the options, and GCBM is applied to a selected set of high-priority scenarios for more robust carbon evaluation.
The objective of this study was to achieve the maximum possible accuracy within an acceptable time frame. The models presented in this study rely exclusively on data from management units in Quebec. The XGBoost model developed here showed high predictive accuracy within Quebec's boreal forest context, but its applicability to other forest regions remains to be evaluated. The model was trained using region-specific variables, forest inventory data, disturbance regimes and climatic conditions unique to Quebec's boreal forest. Applying it to other provinces or countries with different ecological characteristics, forest compositions or management practices could therefore reduce its accuracy. Future work should test how well the model transfers to other ecological contexts, through retraining or transfer learning with local data.

5. Conclusions

The Generic Carbon Budget Model (GCBM) is a robust and widely used tool for simulating carbon stocks and fluxes in forest ecosystems. However, its high computational demand is a significant barrier to rapid scenario assessment, which limits its practical integration into dynamic forest management planning workflows. To address this limitation, we investigated the potential of machine learning, specifically XGBoost, to predict key carbon metrics. Our results show that XGBoost can replicate the output of GCBM with high accuracy while drastically reducing computation time. The trained model achieved R2 values of 0.883 for NEP and 0.967 for aboveground biomass carbon. After training, it predicted millions of outputs in less than a minute. Polynomial regression, used as a benchmark, consistently underperformed compared to XGBoost, which further supports the suitability of XGBoost for this task.
Based on these results, we propose that machine learning models such as XGBoost are suitable for tasks involving the rapid evaluation of multiple forest management scenarios. GCBM, in contrast, remains better suited to applications that demand high accuracy. A hybrid approach, in which a machine learning model filters a wide set of management options and GCBM is then applied to a smaller subset, could offer a promising pathway for balancing speed and precision in carbon-informed forest planning.
In practice, our findings suggest that machine learning can be effectively integrated into existing spatial planning workflows to enable faster carbon evaluation without significantly compromising accuracy. However, care must be taken when applying the trained models outside their original context, because model accuracy is tied to the structure and conditions of the training dataset. For future work, we recommend developing streamlined methods to integrate ML models such as XGBoost directly into forest planning tools. In addition, more extensive benchmarking against other machine learning algorithms and application of the model to forest regions outside Quebec would further test its generalizability and operational utility.

Author Contributions

Conceptualization, A.M., L.L., G.C. and J.-F.C.; Data curation, G.C., R.T. and J.-F.C.; Formal analysis, B.S. and A.M.; Funding acquisition, L.L. and J.-F.C.; Methodology, B.S., A.M. and G.C.; Project administration, L.L.; Software, B.S. and A.M.; Supervision, L.L. and S.G.; Validation, B.S., A.M., G.C. and R.T.; Writing—original draft, S.G.; Writing—review and editing, B.S., L.L. and S.G. All authors have read and agreed to the published version of the manuscript.

Funding

Funding for this research was obtained from the Ministère des Ressources Naturelles et Forêts, Gouvernement du Québec, Canada.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The underlying data for the study were obtained from the Bureau du forestier en chef, Ministère des Ressources Naturelles et Forêts, Québec, Canada. The authors do not hold the right to publish the data in raw form. Readers may contact the Office of the Chief Forester directly (bureau@fec.gouv.qc.ca) for information regarding data access.

Acknowledgments

The authors wish to acknowledge Achut Parajuli for his contributions to an earlier draft of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AAC: Annual Allowable Cut
NEP: Net Ecosystem Productivity
CO₂: Carbon Dioxide
CBM-CFS: Carbon Budget Model of the Canadian Forest Sector
CBM-CFS3: Carbon Budget Model of the Canadian Forest Sector, version 3
GCBM: Generic Carbon Budget Model
ML: Machine Learning
FMT: Forest Management Tool
LP: Linear Programming
MU: Management Unit
PCA: Principal Component Analysis
R²: Coefficient of Determination
RMSE: Root Mean Squared Error
MAE: Mean Absolute Error
XGBoost: Extreme Gradient Boosting
ANN: Artificial Neural Network
MUs: Management Units

Appendix A

Table A1. List of independent variables essential to run GCBM and machine learning algorithms for prediction of carbon pool and NEP.
| S.N. | Net Ecosystem Productivity (NEP) variables | Acronym |
|---|---|---|
| 1 | Time since the last disturbance of the source development (5-year periods) | S1_distance |
| 2 | Last source disturbance | S1_disturbance |
| 3 | Time since the penultimate disturbance of the source development (5-year periods) | S2_distance |
| 4 | Penultimate source disturbance | S2_disturbance |
| 5 | Time since the disturbance of the source development (5-year periods) | S3_distance |
| 6 | Source disturbance | S3_disturbance |
| 7 | Source development age (5-year periods) | Source_age |
| 8 | Volume of intolerant hardwood per hectare of source development (m³ ha⁻¹) | Source_YV_G_GFI |
| 9 | Volume of tolerant hardwood per hectare of source development (m³ ha⁻¹) | Source_YV_G_GFT |
| 10 | Volume of softwood per hectare of source development (m³ ha⁻¹) | Source_YV_G_GR |
| 11 | Volume of hardwood per hectare of source development (m³ ha⁻¹) | Source_YV_G_GF |
| 12 | Time since the disturbance that occurred after the source period (5-year periods) | Target_distance |
| 13 | Type of disturbance that occurred between the source and target periods | Target_disturbance |
| 14 | Target development age (5-year periods) | Target_age |
| 15 | Volume of intolerant hardwood per hectare of target development (m³ ha⁻¹) | Target_YV_G_GFI |
| 16 | Volume of tolerant hardwood per hectare of target development (m³ ha⁻¹) | Target_YV_G_GFT |
| 17 | Volume of softwood per hectare of target development (m³ ha⁻¹) | Target_YV_G_GR |
| 18 | Volume of hardwood per hectare of target development (m³ ha⁻¹) | Target_YV_G_GF |

| S.N. | Carbon pool variables | Acronym |
|---|---|---|
| 1 | Time since the last disturbance of the source development (5-year periods) | S1_distance |
| 2 | Last source disturbance | S1_disturbance |
| 3 | Time since the penultimate disturbance of the source development (5-year periods) | S2_distance |
| 4 | Penultimate source disturbance | S2_disturbance |
| 5 | Time since the disturbance of the source development (5-year periods) | S3_distance |
| 6 | Source disturbance | S3_disturbance |
| 7 | Source development age (5-year periods) | Age |
| 8 | Volume of intolerant hardwood per hectare (m³ ha⁻¹) | YV_G_GFI |
| 9 | Volume of tolerant hardwood per hectare (m³ ha⁻¹) | YV_G_GFT |
| 10 | Volume of softwood per hectare (m³ ha⁻¹) | YV_G_GR |
| 11 | Volume of hardwood per hectare (m³ ha⁻¹) | YV_G_GF |
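Each NEP sample pairs a source state with a target state of the same development, giving 18 inputs. When a tree ensemble such as XGBoost receives unnamed arrays, it identifies features by column position, so the order used at prediction time must match the order used in training. The guard below is a sketch under that assumption; the function name and the dictionary input format are hypothetical, and only the column names come from Table A1.

```python
from typing import Dict, List, Sequence

# Column order for the NEP model, transcribed from Table A1.
NEP_FEATURES = (
    "S1_distance", "S1_disturbance",
    "S2_distance", "S2_disturbance",
    "S3_distance", "S3_disturbance",
    "Source_age",
    "Source_YV_G_GFI", "Source_YV_G_GFT", "Source_YV_G_GR", "Source_YV_G_GF",
    "Target_distance", "Target_disturbance", "Target_age",
    "Target_YV_G_GFI", "Target_YV_G_GFT", "Target_YV_G_GR", "Target_YV_G_GF",
)


def to_feature_row(record: Dict[str, float], columns: Sequence[str] = NEP_FEATURES) -> List[float]:
    """Return the record's values in a fixed column order, failing loudly
    if any required variable is missing."""
    missing = [c for c in columns if c not in record]
    if missing:
        raise KeyError(f"missing variables: {missing}")
    return [float(record[c]) for c in columns]


# Usage with a dummy record whose values simply encode each column's position.
record = {name: float(i) for i, name in enumerate(NEP_FEATURES)}
print(to_feature_row(record)[:3])  # [0.0, 1.0, 2.0]
```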
Table A2. Shuffle-split cross-validation results (mean ± standard deviation) over ten runs for R², RMSE, and MAE across target variables.
| Target variable | R² (mean ± std) | RMSE (mean ± std) | MAE (mean ± std) |
|---|---|---|---|
| NEP | 0.8825 ± 0.0004 | 0.4289 ± 0.0007 | 0.1956 ± 0.0002 |
| Aboveground biomass | 0.9684 ± 0.0001 | 4.9520 ± 0.0065 | 1.8947 ± 0.0031 |
| Belowground biomass | 0.9645 ± 0.0001 | 1.2230 ± 0.0015 | 0.4871 ± 0.0005 |
| Deadwood | 0.8712 ± 0.0009 | 2.6144 ± 0.0082 | 1.3259 ± 0.0014 |
| Litter | 0.9178 ± 0.0002 | 6.0740 ± 0.0085 | 3.4792 ± 0.0042 |
| Soil carbon | 0.8970 ± 0.0003 | 10.3506 ± 0.0128 | 5.8932 ± 0.0053 |
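Table A2 reports the mean ± standard deviation of each metric over ten shuffle-split runs. A minimal sketch of that protocol follows. The test fraction, the seed, and the use of the sample standard deviation (n − 1 denominator) are assumptions, since they are not stated in the table; scikit-learn's `ShuffleSplit` implements the same kind of splitting.

```python
import random
import statistics
from typing import Callable, List, Tuple


def shuffle_split_scores(
    n_samples: int,
    n_splits: int,
    test_fraction: float,
    evaluate: Callable[[List[int], List[int]], float],
    seed: int = 0,
) -> Tuple[float, float]:
    """Repeat random train/test splits and return (mean, std) of the scores.

    `evaluate(train_idx, test_idx)` is expected to fit a model on the
    training indices and return a metric (e.g. R²) on the test indices.
    """
    if n_splits < 2:
        raise ValueError("need at least two splits to estimate a standard deviation")
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * n_samples))
    indices = list(range(n_samples))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # a fresh random permutation each run
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        scores.append(evaluate(train_idx, test_idx))
    return statistics.mean(scores), statistics.stdev(scores)


# Toy usage: the "metric" is just the test-set size, so the spread is zero.
mean, std = shuffle_split_scores(100, 10, 0.2, lambda tr, te: float(len(te)))
print(mean, std)  # 20.0 0.0
```

The very small standard deviations in Table A2 are what one would expect when each test split still contains a large number of samples.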

References

1. Purvis, B.; Mao, Y.; Robinson, D. Three Pillars of Sustainability: In Search of Conceptual Origins. Sustain. Sci. 2019, 14, 681–695.
2. Bettinger, P.; Boston, K.; Siry, J.P.; Grebner, D.L. Forest Management and Planning; Academic Press: Cambridge, MA, USA, 2016.
3. Gautam, S.; LeBel, L.; Beaudoin, D. A Hierarchical Planning System to Assess the Impact of Operational-Level Flexibility on Long-Term Wood Supply. Can. J. For. Res. 2017, 47, 424–432.
4. Church, R.L.; Murray, A.T.; Barber, K.H. Forest Planning at the Tactical Level. Ann. Oper. Res. 2000, 95, 3–18.
5. Rodriguez, L.C.E.; Pasalodos-Tato, M.; Diaz-Balteiro, L.; McTague, J.P. The Importance of Industrial Forest Plantations. In The Management of Industrial Forest Plantations; Borges, J.G., Diaz-Balteiro, L., McDill, M.E., Rodriguez, L.C.E., Eds.; Managing Forest Ecosystems; Springer: Dordrecht, Netherlands, 2014; Volume 33, pp. 3–26. ISBN 978-94-017-8898-4.
6. Dong, L.; Bettinger, P.; Liu, Z.; Qin, H. Spatial Forest Harvest Scheduling for Areas Involving Carbon and Timber Management Goals. Forests 2015, 6, 1362–1379.
7. Collalti, A.; Thornton, P.E.; Cescatti, A.; Rita, A.; Borghetti, M.; Nolè, A.; Trotta, C.; Ciais, P.; Matteucci, G. The Sensitivity of the Forest Carbon Budget Shifts Across Processes Along with Stand Development and Climate Change. Ecol. Appl. 2019, 29, e01837.
8. Wang, W.; Peng, C.; Larocque, G.R. Modeling Forest Carbon Budgets Toward Ecological Forest Management: Challenges and Future Directions. In Ecological Forest Management Handbook; CRC Press: Boca Raton, FL, USA, 2016; ISBN 978-0-429-18878-7.
9. Kurz, W.A.; Dymond, C.C.; White, T.M.; Stinson, G.; Shaw, C.H.; Rampley, G.J.; Smyth, C.; Simpson, B.N.; Neilson, E.T.; Trofymow, J.A.; et al. CBM-CFS3: A Model of Carbon-Dynamics in Forestry and Land-Use Change Implementing IPCC Standards. Ecol. Model. 2009, 220, 480–504.
10. Shaw, C.H.; Rodrigue, S.; Voicu, M.F.; Latifovic, R.; Pouliot, D.; Hayne, S.; Fellows, M.; Kurz, W.A. Cumulative Effects of Natural and Anthropogenic Disturbances on the Forest Carbon Balance in the Oil Sands Region of Alberta, Canada; A Pilot Study (1985–2012). Carbon Balance Manag. 2021, 16, 3.
11. Fu, Z.; Stoy, P.C.; Luo, Y.; Chen, J.; Sun, J.; Montagnani, L.; Wohlfahrt, G.; Rahman, A.F.; Rambal, S.; Bernhofer, C.; et al. Climate Controls over the Net Carbon Uptake Period and Amplitude of Net Ecosystem Production in Temperate and Boreal Ecosystems. Agric. For. Meteorol. 2017, 243, 9–18.
12. Janisch, J.E.; Harmon, M.E. Successional Changes in Live and Dead Wood Carbon Stores: Implications for Net Ecosystem Productivity. Tree Physiol. 2002, 22, 77–89.
13. Sharrow, S.H.; Ismail, S. Carbon and Nitrogen Storage in Agroforests, Tree Plantations, and Pastures in Western Oregon, USA. Agrofor. Syst. 2004, 60, 123–130.
14. Pilli, R.; Kull, S.J.; Blujdea, V.N.B.; Grassi, G. The Carbon Budget Model of the Canadian Forest Sector (CBM-CFS3): Customization of the Archive Index Database for European Union Countries. Ann. For. Sci. 2018, 75, 71.
15. Kurz, W.A.; Apps, M.J. The Carbon Budget of Canadian Forests: A Sensitivity Analysis of Changes in Disturbance Regimes, Growth Rates, and Decomposition Rates. Environ. Pollut. 1994, 83, 55–61.
16. Kurz, W.A.; Shaw, C.H.; Boisvenue, C.; Stinson, G.; Metsaranta, J.; Leckie, D.; Dyk, A.; Smyth, C.; Neilson, E.T. Carbon in Canada’s Boreal Forest—A Synthesis. Environ. Rev. 2013, 21, 260–292.
17. Smiley, B.P.; Trofymow, J.A.; Niemann, K.O. Spatially-Explicit Reconstruction of 100 Years of Forest Land Use and Disturbance on a Coastal British Columbia Douglas-Fir-Dominated Landscape: Implications for Future Watershed-Scale Carbon Stock Recovery. Appl. Geogr. 2016, 74, 109–122.
18. Heffner, J.; Steenberg, J.; Leblon, B. Comparison Between Empirical Models and the CBM-CFS3 Carbon Budget Model to Predict Carbon Stocks and Yields in Nova Scotia Forests. Forests 2021, 12, 1235.
19. Böttcher, H.; Freibauer, A.; Obersteiner, M.; Schulze, E.-D. Uncertainty Analysis of Climate Change Mitigation Options in the Forestry Sector Using a Generic Carbon Budget Model. Ecol. Model. 2008, 213, 45–62.
20. Magnus, G.K.; Celanowicz, E.; Voicu, M.; Hafer, M.; Metsaranta, J.M.; Dyk, A.; Kurz, W.A. Growing Our Future: Assessing the Outcome of Afforestation Programs in Ontario, Canada. For. Chron. 2021, 97, 179–190.
21. Dong, L.; Lu, W.; Liu, Z. Developing Alternative Forest Spatial Management Plans When Carbon and Timber Values Are Considered: A Real Case from Northeastern China. Ecol. Model. 2018, 385, 45–57.
22. Bettinger, P.; Graetz, D.; Boston, K.; Sessions, J.; Chung, W. Eight Heuristic Planning Techniques Applied to Three Increasingly Difficult Wildlife Planning Problems. Silva Fenn. 2002, 36, 561–584.
23. Martín-Fernández, S.; García-Abril, A. Optimisation of Spatial Allocation of Forestry Activities Within a Forest Stand. Comput. Electron. Agric. 2005, 49, 159–174.
24. Troncoso, J.; D’Amours, S.; Flisberg, P.; Rönnqvist, M.; Weintraub, A. A Mixed Integer Programming Model to Evaluate Integrating Strategies in the Forest Value Chain—A Case Study in the Chilean Forest Industry. Can. J. For. Res. 2015, 45, 937–949.
25. Ko, Y.; Song, C.; Fellows, M.; Kim, M.; Hong, M.; Kurz, W.A.; Metsaranta, J.; Son, J.; Lee, W.-K. Generic Carbon Budget Model for Assessing National Carbon Dynamics Toward Carbon Neutrality: A Case Study of Republic of Korea. Forests 2024, 15, 877.
26. Zhao, Q.; Yu, S.; Zhao, F.; Tian, L.; Zhao, Z. Comparison of Machine Learning Algorithms for Forest Parameter Estimations and Application for Forest Quality Assessments. For. Ecol. Manag. 2019, 434, 224–234.
27. Mascaro, J.; Asner, G.P.; Knapp, D.E.; Kennedy-Bowdoin, T.; Martin, R.E.; Anderson, C.; Higgins, M.; Chadwick, K.D. A Tale of Two “Forests”: Random Forest Machine Learning Aids Tropical Forest Carbon Mapping. PLoS ONE 2014, 9, e85993.
28. Parajuli, A.; Nadeau, D.F.; Anctil, F.; Parent, A.-C.; Bouchard, B.; Girard, M.; Jutras, S. Exploring the Spatiotemporal Variability of the Snow Water Equivalent in a Small Boreal Forest Catchment Through Observation and Modelling. Hydrol. Process. 2020, 34, 2628–2644.
29. Cyr, G.; Forest, B.; Hardy, C. FMT (Forest Management Tool); Bureau du Forestier en chef du Québec: Roberval, QC, Canada, 2019.
30. Woodstock Forest Management Software, Remsoft Inc.: Fredericton, NB, Canada, 1997.
31. Bilan Provincial du Carbone Forestier—Période 2023–2025. Bureau du Forestier en chef du Québec: Roberval, QC, Canada, 2022. Available online: https://forestierenchef.gouv.qc.ca/wp-content/uploads/rap-00629-rapport-sur-levaluation-du-carbone-des-unites-damenagement-4.0.2.pdf (accessed on 22 July 2024).
32. Abdi, H.; Williams, L.J. Principal Component Analysis. WIREs Comput. Stat. 2010, 2, 433–459.
33. Maulud, D.H.; Abdulazeez, A.M. A Review on Linear Regression Comprehensive in Machine Learning. J. Appl. Sci. Technol. Trends 2020, 1, 140–147.
34. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
35. Candelieri, A. A Gentle Introduction to Bayesian Optimization. In Proceedings of the 2021 Winter Simulation Conference (WSC), Phoenix, AZ, USA, 13–15 December 2021; pp. 1–16.
36. Mahmud Sujon, K.; Binti Hassan, R.; Tusnia Towshi, Z.; Othman, M.A.; Abdus Samad, M.; Choi, K. When to Use Standardization and Normalization: Empirical Evidence from Machine Learning Models and XAI. IEEE Access 2024, 12, 135300–135314.
37. Avdeef, A. Do You Know Your R2? ADMET DMPK 2021, 9, 69–74.
38. Wilcox, B.P. Transformative Ecosystem Change and Ecohydrology: Ushering in a New Era for Watershed Management. Ecohydrology 2010, 3, 126–130.
39. Von Haden, A.C.; Dornbush, M.E. Ecosystem Carbon Pools, Fluxes, and Balances Within Mature Tallgrass Prairie Restorations. Restor. Ecol. 2017, 25, 549–558.
40. Deluca, T.H.; Boisvenue, C. Boreal Forest Soil Carbon: Distribution, Function and Modelling. Forestry 2012, 85, 161–184.
41. Bradshaw, C.J.A.; Warkentin, I.G. Global Estimates of Boreal Forest Carbon Stocks and Flux. Glob. Planet. Chang. 2015, 128, 24–30.
42. Laganière, J.; Paré, D.; Bergeron, Y.; Chen, H.Y.H.; Brassard, B.W.; Cavard, X. Stability of Soil Carbon Stocks Varies with Forest Composition in the Canadian Boreal Biome. Ecosystems 2013, 16, 852–865.
43. Osuri, A.M.; Gopal, A.; Raman, T.R.S.; DeFries, R.; Cook-Patton, S.C.; Naeem, S. Greater Stability of Carbon Capture in Species-Rich Natural Forests Compared to Species-Poor Plantations. Environ. Res. Lett. 2020, 15, 034011.
44. Vesterdal, L.; Clarke, N.; Sigurdsson, B.D.; Gundersen, P. Do Tree Species Influence Soil Carbon Stocks in Temperate and Boreal Forests? For. Ecol. Manag. 2013, 309, 4–18.
45. Howard, E.A.; Gower, S.T.; Foley, J.A.; Kucharik, C.J. Effects of Logging on Carbon Dynamics of a Jack Pine Forest in Saskatchewan, Canada. Glob. Chang. Biol. 2004, 10, 1267–1284.
46. Liu, P.; Black, T.A.; Jassal, R.S.; Zha, T.; Nesic, Z.; Barr, A.G.; Helgason, W.D.; Jia, X.; Tian, Y.; Stephens, J.J.; et al. Divergent Long-term Trends and Interannual Variation in Ecosystem Resource Use Efficiencies of a Southern Boreal Old Black Spruce Forest 1999–2017. Glob. Chang. Biol. 2019, 25, 3056–3069.
47. Launiainen, S.; Katul, G.G.; Leppä, K.; Kolari, P.; Aslan, T.; Grönholm, T.; Korhonen, L.; Mammarella, I.; Vesala, T. Does Growing Atmospheric CO2 Explain Increasing Carbon Sink in a Boreal Coniferous Forest? Glob. Chang. Biol. 2022, 28, 2910–2929.
Figure 1. Schematic of the methodology adopted in this study.
Figure 2. Study area consisting of 12 management units within the boreal forest of Quebec, Canada.
Figure 3. Violin plots showcasing the variability in carbon pools and NEP predicted by GCBM for the period of 2023 to 2158.
Figure 4. R² value comparison between XGBoost and polynomial regression for the carbon pool components.
Figure 5. RMSE value comparison between XGBoost and polynomial regression for the carbon pool components.
Figure 6. MAE value comparison between XGBoost and polynomial regression for the carbon pool components.
Table 1. List of managed, unmanaged and total areas within the management units selected for the study.
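| Site | Management unit | Area designated for forest management (ha) | Designated (%) | Area excluded from forest management (ha) | Excluded (%) | Total area (ha) |
|---|---|---|---|---|---|---|
| Nord-du-Québec | 2661 | 298,200 | 60 | 201,470 | 40 | 499,670 |
| | 2663 | 99,390 | 34 | 193,240 | 66 | 292,630 |
| | 2664 | 311,440 | 81 | 73,120 | 19 | 384,560 |
| | 2665 | 258,960 | 87 | 39,250 | 13 | 298,210 |
| | 2666 | 161,600 | 86 | 26,410 | 14 | 188,010 |
| | 8663 | 40,400 | 22 | 141,760 | 78 | 182,160 |
| | 8664 | 102,280 | 65 | 54,350 | 35 | 156,630 |
| | 8762 | 265,940 | 87 | 38,310 | 13 | 304,250 |
| | 8764 | 249,620 | 90 | 26,230 | 10 | 275,850 |
| Saguenay-Lac Saint-Jean | 2751 | 819,670 | 83 | 170,920 | 17 | 990,590 |
| Capitale-Nationale | 3771 | 214,810 | 73 | 78,370 | 27 | 293,180 |
| Côte Nord | 9751 | 909,490 | 73 | 335,100 | 27 | 1,244,590 |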
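As a quick arithmetic check of Table 1, every row satisfies designated + excluded = total, and across the 12 management units about 3.73 million of 5.11 million ha (roughly 73%) is designated for forest management. The short script below reproduces that check; the row values are transcribed from the table, and the function name is illustrative.

```python
# (management unit, designated ha, excluded ha, total ha), transcribed from Table 1
TABLE_1 = [
    ("2661", 298_200, 201_470, 499_670),
    ("2663", 99_390, 193_240, 292_630),
    ("2664", 311_440, 73_120, 384_560),
    ("2665", 258_960, 39_250, 298_210),
    ("2666", 161_600, 26_410, 188_010),
    ("8663", 40_400, 141_760, 182_160),
    ("8664", 102_280, 54_350, 156_630),
    ("8762", 265_940, 38_310, 304_250),
    ("8764", 249_620, 26_230, 275_850),
    ("2751", 819_670, 170_920, 990_590),
    ("3771", 214_810, 78_370, 293_180),
    ("9751", 909_490, 335_100, 1_244_590),
]


def summarize(rows):
    """Check that each row adds up, then return study-wide totals."""
    for mu, designated, excluded, total in rows:
        if designated + excluded != total:
            raise ValueError(f"MU {mu}: {designated} + {excluded} != {total}")
    designated = sum(r[1] for r in rows)
    total = sum(r[3] for r in rows)
    return designated, total, 100.0 * designated / total


designated, total, share = summarize(TABLE_1)
print(f"{designated:,} of {total:,} ha designated ({share:.1f}%)")
# 3,731,800 of 5,110,330 ha designated (73.0%)
```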
Table 2. Hyperparameter search space used for Bayesian optimization of the XGBoost model.
| Hyperparameter | Range |
|---|---|
| n_estimators | 100 to 500 |
| max_depth | 3 to 15 |
| learning_rate | 0.01 to 0.3 (log-uniform) |
| subsample | 0.6 to 1.0 |
| colsample_bytree | 0.6 to 1.0 |
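The study tuned these ranges with Bayesian optimization. The sketch below encodes Table 2 as a search space and, to stay dependency-free, swaps in plain random search as the search strategy; a Bayesian optimizer would replace the independent draws in `sample_config` with proposals guided by a model of earlier trial results. Treating the integer ranges as inclusive, the function names, and the toy objective are assumptions for illustration.

```python
import math
import random
from typing import Callable, Dict, Tuple

# Bounds transcribed from Table 2; "int" ranges are treated as inclusive (assumption).
SEARCH_SPACE = {
    "n_estimators": (100, 500, "int"),
    "max_depth": (3, 15, "int"),
    "learning_rate": (0.01, 0.3, "log"),
    "subsample": (0.6, 1.0, "float"),
    "colsample_bytree": (0.6, 1.0, "float"),
}


def sample_config(rng: random.Random) -> Dict[str, float]:
    """Draw one configuration from the search space."""
    config = {}
    for name, (low, high, kind) in SEARCH_SPACE.items():
        if kind == "int":
            config[name] = rng.randint(low, high)  # inclusive on both ends
        elif kind == "log":
            # Log-uniform: uniform in log space, so 0.01-0.03 is as likely as 0.1-0.3.
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            config[name] = rng.uniform(low, high)
    return config


def random_search(
    objective: Callable[[Dict[str, float]], float], n_trials: int, seed: int = 0
) -> Tuple[Dict[str, float], float]:
    """Return the best (config, score) found, where higher scores are better."""
    rng = random.Random(seed)
    best_cfg, best_score = None, -math.inf
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = objective(cfg)  # in practice: train XGBoost, return validation R²
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Toy objective that prefers a learning rate near 0.1.
best, score = random_search(lambda c: -abs(c["learning_rate"] - 0.1), n_trials=50)
print(round(best["learning_rate"], 3), best["max_depth"])
```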
Table 3. Descriptive statistics of the variables for NEP estimation.
| SN | Variable | Description | Unit | Mean | Variance |
|---|---|---|---|---|---|
| 1 | s1_distance | Time since last source disturbance | 5-year period | 13.5609 | 163.9410 |
| 2 | s1_disturbance | Last source disturbance | 5-year period | 3.6453 | 17.6253 |
| 3 | s2_distance | Time since penultimate source disturbance | 5-year period | 17.1715 | 296.9884 |
| 4 | s2_disturbance | Penultimate source disturbance | 5-year period | 9.2516 | 43.8843 |
| 5 | s3_distance | Time since source disturbance | 5-year period | 8.5932 | 254.3553 |
| 6 | s3_disturbance | Source disturbance | 5-year period | 14.0496 | 29.2366 |
| 7 | source_age | Age of source development | 5-year period | 13.6820 | 170.1568 |
| 8 | source_YV_G_GFI | Volume of intolerant hardwood per hectare | m³/ha | 11.5353 | 624.1651 |
| 9 | source_YV_G_GFT | Volume of tolerant hardwood per hectare | m³/ha | 6.8136 | 836.8290 |
| 10 | source_YV_G_GR | Volume of softwood per hectare | m³/ha | 43.4437 | 3144.2515 |
| 11 | source_YV_G_GF | Volume of hardwood per hectare | m³/ha | 18.3489 | 1626.6555 |
| 12 | target_distance | Time since disturbance at target | 5-year period | 0.9383 | 0.0579 |
| 13 | target_disturbance | Most recent disturbance at target | 5-year period | 15.1268 | 12.0259 |
| 14 | target_age | Age of target stand | 5-year period | 13.6366 | 169.9266 |
| 15 | target_YV_G_GFI | Target: intolerant hardwood volume | m³/ha | 11.4001 | 602.9208 |
| 16 | target_YV_G_GFT | Target: tolerant hardwood volume | m³/ha | 6.7364 | 824.7986 |
| 17 | target_YV_G_GR | Target: softwood volume | m³/ha | 43.3258 | 3219.2981 |
| 18 | target_YV_G_GF | Target: hardwood volume | m³/ha | 18.1365 | 1591.1015 |
| 19 | NEP * | Net ecosystem productivity | tonne CO₂e ha⁻¹ year⁻¹ | 0.1540 | 1.5648 |
* Dependent variable.
Table 4. Descriptive statistics of the variables for carbon pool estimation.
| SN | Variable | Description | Unit | Mean | Variance |
|---|---|---|---|---|---|
| 1 | s1_distance | Time since last source disturbance | 5-year period | 13.5207 | 164.7908 |
| 2 | s1_disturbance | Last source disturbance | 5-year period | 3.5666 | 17.3322 |
| 3 | s2_distance | Time since penultimate source disturbance | 5-year period | 17.7052 | 292.3556 |
| 4 | s2_disturbance | Penultimate source disturbance | 5-year period | 9.0615 | 43.7595 |
| 5 | s3_distance | Time since source disturbance | 5-year period | 9.3789 | 259.8389 |
| 6 | s3_disturbance | Source disturbance | 5-year period | 13.9239 | 30.1450 |
| 7 | age | Age of source development | 5-year period | 13.6564 | 169.9665 |
| 8 | YV_G_GFI | Volume of intolerant hardwood per hectare | m³/ha | 11.4687 | 613.4764 |
| 9 | YV_G_GFT | Volume of tolerant hardwood per hectare | m³/ha | 6.7712 | 830.0655 |
| 10 | YV_G_GR | Volume of softwood per hectare | m³/ha | 43.3799 | 3183.7146 |
| 11 | YV_G_GF | Volume of hardwood per hectare | m³/ha | 18.2399 | 1608.0060 |
| 12 | AG_Biomass_C * | Aboveground biomass | tonne ha⁻¹ | 30.955 | 774.117 |
| 13 | BG_Biomass_C * | Belowground biomass | tonne ha⁻¹ | 8.008 | 42.046 |
| 14 | Deadwood_C * | Deadwood | tonne ha⁻¹ | 11.322 | 53.018 |
| 15 | Litter_C * | Litter | tonne ha⁻¹ | 43.506 | 449.052 |
| 16 | Soil_C * | Soil carbon | tonne ha⁻¹ | 79.639 | 1039.982 |
* Dependent variables.
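The reported metrics can be cross-checked by hand. Since R² = 1 − SS_res/SS_tot, dividing both sums by the number of samples n gives R² = 1 − RMSE²/Var(y), where Var(y) uses the divide-by-n (population) form. Combining the RMSE values of Table A2 with the target variances in Tables 3 and 4 therefore approximately reproduces the R² values of Table A2, to within about 0.001 for every target (for NEP, 1 − 0.4289²/1.5648 ≈ 0.8824 versus 0.8825 reported). The agreement is only approximate because the tabulated variances describe the full dataset rather than each test split, and the population-variance form is an assumption.

```python
# (RMSE and mean R² from Table A2, target variance from Table 3 or Table 4)
TARGETS = {
    "NEP": (0.4289, 0.8825, 1.5648),
    "Aboveground biomass": (4.9520, 0.9684, 774.117),
    "Belowground biomass": (1.2230, 0.9645, 42.046),
    "Deadwood": (2.6144, 0.8712, 53.018),
    "Litter": (6.0740, 0.9178, 449.052),
    "Soil carbon": (10.3506, 0.8970, 1039.982),
}


def implied_r2(rmse: float, variance: float) -> float:
    """R² implied by an RMSE when SS_tot / n is approximated by the target variance."""
    return 1.0 - rmse**2 / variance


for name, (err, reported_r2, variance) in TARGETS.items():
    print(f"{name:<20} implied {implied_r2(err, variance):.4f}  reported {reported_r2:.4f}")
```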
Table 5. Best set of hyperparameters obtained after Bayesian optimization of the XGBoost model.
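| Target | colsample_bytree | learning_rate | max_depth | n_estimators | subsample |
|---|---|---|---|---|---|
| NEP | 1.0 | 0.1462 | 15 | 424 | 1.0 |
| AGB | 1.0 | 0.1752 | 15 | 500 | 1.0 |
| Belowground | 1.0 | 0.2987 | 15 | 500 | 1.0 |
| Deadwood | 1.0 | 0.3000 | 15 | 299 | 1.0 |
| Litter | 1.0 | 0.3000 | 15 | 346 | 0.8966 |
| Soil carbon | 1.0 | 0.3000 | 15 | 346 | 0.8966 |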
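The tuned values in Table 5 correspond to keyword arguments of `XGBRegressor`, the scikit-learn-style regressor in the `xgboost` Python package. The sketch below assumes that wrapper was used. Settings not reported in the table (objective, random seed, tree method) are left at the library defaults, which may differ from the study's configuration, and `make_model` is a hypothetical helper.

```python
# Best hyperparameters transcribed from Table 5, one dict per target.
BEST_PARAMS = {
    "NEP": dict(colsample_bytree=1.0, learning_rate=0.1462, max_depth=15, n_estimators=424, subsample=1.0),
    "AGB": dict(colsample_bytree=1.0, learning_rate=0.1752, max_depth=15, n_estimators=500, subsample=1.0),
    "Belowground": dict(colsample_bytree=1.0, learning_rate=0.2987, max_depth=15, n_estimators=500, subsample=1.0),
    "Deadwood": dict(colsample_bytree=1.0, learning_rate=0.3000, max_depth=15, n_estimators=299, subsample=1.0),
    "Litter": dict(colsample_bytree=1.0, learning_rate=0.3000, max_depth=15, n_estimators=346, subsample=0.8966),
    "Soil carbon": dict(colsample_bytree=1.0, learning_rate=0.3000, max_depth=15, n_estimators=346, subsample=0.8966),
}

try:
    from xgboost import XGBRegressor  # optional dependency
except ImportError:
    XGBRegressor = None


def make_model(target: str):
    """Build an XGBRegressor for one target; unlisted settings stay at library defaults."""
    if XGBRegressor is None:
        raise RuntimeError("xgboost is not installed")
    return XGBRegressor(**BEST_PARAMS[target])
```

Keeping the values in one dictionary makes it easy to confirm that every tuned value lies inside the Table 2 search space before training.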
Table 6. Performance evaluation metrics comparison between XGBoost and the polynomial regression model for prediction of NEP.
| Metric | XGBoost | Polynomial regression |
|---|---|---|
| R² | 0.883 | 0.678 |
| RMSE | 0.428 | 0.708 |
| MAE | 0.196 | 0.471 |
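Table 6 and Table A2 rely on three standard regression metrics. For reference, their usual definitions are sketched below in plain Python. They agree with scikit-learn's `r2_score`, the square root of `mean_squared_error`, and `mean_absolute_error`, although the implementation used in the study is not stated here.

```python
import math
from typing import Sequence


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(round(r2(y_true, y_pred), 4), rmse(y_true, y_pred), mae(y_true, y_pred))  # 0.8 0.5 0.25
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE and weights occasional large deviations more heavily.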
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Subedi, B.; Morneau, A.; LeBel, L.; Gautam, S.; Cyr, G.; Tremblay, R.; Carle, J.-F. An XGBoost-Based Machine Learning Approach to Simulate Carbon Metrics for Forest Harvest Planning. Sustainability 2025, 17, 5454. https://doi.org/10.3390/su17125454

AMA Style

Subedi B, Morneau A, LeBel L, Gautam S, Cyr G, Tremblay R, Carle J-F. An XGBoost-Based Machine Learning Approach to Simulate Carbon Metrics for Forest Harvest Planning. Sustainability. 2025; 17(12):5454. https://doi.org/10.3390/su17125454

Chicago/Turabian Style

Subedi, Bibek, Alexandre Morneau, Luc LeBel, Shuva Gautam, Guillaume Cyr, Roxanne Tremblay, and Jean-François Carle. 2025. "An XGBoost-Based Machine Learning Approach to Simulate Carbon Metrics for Forest Harvest Planning" Sustainability 17, no. 12: 5454. https://doi.org/10.3390/su17125454

APA Style

Subedi, B., Morneau, A., LeBel, L., Gautam, S., Cyr, G., Tremblay, R., & Carle, J.-F. (2025). An XGBoost-Based Machine Learning Approach to Simulate Carbon Metrics for Forest Harvest Planning. Sustainability, 17(12), 5454. https://doi.org/10.3390/su17125454
