Hybrid Machine Learning Models for Predicting Gross CO2e Balance in Polish Forest Stands: A Tool for Sustainable Forest Carbon Assessment in the Circular Economy
Abstract
1. Introduction
2. Materials and Methods
2.1. Research Design and Analytical Framework
2.2. Research Material
2.2.1. Habitat and Stand Variables
2.2.2. Biometric and Structural Variables
2.2.3. Carbon Variables
2.2.4. Operational and Circular Variables
2.2.5. Environmental Variables and Life Cycle Indicators
2.3. Data Preprocessing
2.4. Development of Machine Learning Algorithms
2.4.1. Neural Network and Hybrid Models
2.4.2. Advanced Gradient Models
2.4.3. Tree-Based and Forest-Based Models
2.5. The Optimization and Validation Process
2.6. Evaluation Criteria
2.7. Explainability of Variables Using SHAP
2.8. Software
3. Results and Discussion
3.1. Model Evaluation Metrics
3.2. Actual and Predicted Values
3.3. Overfitting Diagnostics
3.4. Learning Curve
3.5. Explainability of Variables by SHAP
4. Conclusions
- The gross CO2e balance can be predicted with high accuracy based on variables available in forestry practice;
- The most useful models were those capable of capturing non-linear relationships and interactions between variables;
- Simpler single-tree models proved more susceptible to overfitting and less stable on unseen data;
- SHAP interpretation confirmed the dominant role of structural features in shaping the gross CO2e balance;
- Reliable assessment of the gross CO2e balance requires simultaneous consideration of biological, habitat, operational, and product components;
- Further development of models should include external validation, greater regional differentiation, and integration of remote sensing data and dynamic environmental indicators.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest











| Group | Variable Name | Variable Status |
|---|---|---|
| Habitat and stand variables | Shortened forest site type | Input |
| | Dominant tree stand type or mixed system | Input |
| | Management variant | Input |
| | Main species | Input |
| | Bonitation | Input |
| | Sample area [ha] (data from a forester) | Input |
| | Macroregion | Input |
| | Voivodeship | Input |
| | Year of observation | Input |
| | Pine share [%] | Input |
| | Oak share [%] | Input |
| | Beech share [%] | Input |
| | Birch share [%] | Input |
| | Spruce share [%] | Input |
| | Fir share [%] | Input |
| | Alder share [%] | Input |
| Biometric and structural variables | Age [years] | Input |
| | Number of trees (estimated data) [per ha] | Input |
| | Height (estimated data) [m] | Input |
| | Volume [m3/ha] | Not used |
| | Dry mass [t/ha] | Not used |
| | Deadwood [m3/ha] | Input |
| | Dry biomass left [per ha] | Input |
| | Share of remaining biomass [%] | Input |
| Carbon variables | AGB [MgC/ha] | Not used |
| | BGB [MgC/ha] | Not used |
| | C SOC 0–30 [MgC/ha] | Input |
| | SOC [MgC/ha/year] | Input |
| | C HWP [MgC/ha] | Input |
| | Gross CO2e [t/ha] | Output |
| | Net CO2e [t/ha] | Not used |
| | CI95 [t/ha] | Not used |
| Operational and circular variables | Emissions from harvesting [tCO2e/ha] | Input |
| | Emissions related to the transportation of wood or biomass [tCO2e/ha] | Input |
| | Emissions from processing [tCO2e/ha] | Input |
| | Avoided emissions [tCO2e/ha] | Input |
| | Harvesting [m3/ha/year] | Input |
| Environmental and life cycle variables | Annual precipitation [mm] | Input |
| | Annual temperature [°C] | Input |
| | Water retention index | Input |
| | Biodiversity index of water retention | Input |

| Variable | Definition | Unit | System Boundary |
|---|---|---|---|
| Emissions from harvesting | Emissions generated during felling, pruning, cross-cutting, skidding, and forwarding | tCO2e/ha | Forest operations within stand boundary |
| Emissions from transport | Emissions related to timber or biomass transport from stand to first point of utilization | tCO2e/ha | From stand to yard/sawmill/plant gate |
| Emissions from processing | Emissions generated during industrial wood transformation | tCO2e/ha | Gate-to-gate or cradle-to-gate, depending on product pathway |
| Avoided emissions | Emissions avoided through substitution of high-emission materials or fuels | tCO2e/ha | Relative to defined reference scenario |
| Carbon stored in harvested wood products | Carbon retained in wood products after harvest | MgC/ha | Post-harvest product pool |
| LCA GWP | Global warming potential per unit volume of raw material or product | kgCO2e/m3 | According to adopted LCA scope |
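
The variables above mix two unit bases: carbon pools are given in MgC/ha, emission flows in tCO2e/ha, and LCA GWP per cubic metre of material. As a minimal sketch of how such quantities can be brought to a common tCO2e/ha basis, the snippet below applies the molar-mass ratio 44/12 to convert carbon to CO2 and scales a per-m3 GWP by a volume per hectare. This section does not state the authors' own procedure, so the function names and all numeric inputs are hypothetical.

```python
# Illustrative unit conversions for carbon accounting (not the authors' code).

# Molar mass ratio CO2 : C = 44 : 12, so 1 Mg of carbon is about 3.667 t of CO2.
C_TO_CO2 = 44.0 / 12.0


def mgc_to_tco2e(carbon_mgc_per_ha: float) -> float:
    """Convert a carbon stock in MgC/ha to tCO2e/ha (1 Mg equals 1 t)."""
    return carbon_mgc_per_ha * C_TO_CO2


def gwp_to_tco2e_per_ha(gwp_kg_per_m3: float, volume_m3_per_ha: float) -> float:
    """Scale a GWP in kgCO2e/m3 by a volume in m3/ha and return tCO2e/ha."""
    return gwp_kg_per_m3 * volume_m3_per_ha / 1000.0


# Hypothetical inputs, not values from the study.
hwp_carbon = 12.0   # carbon stored in harvested wood products, MgC/ha
gwp = 50.0          # LCA GWP, kgCO2e/m3
harvested = 40.0    # harvested volume, m3/ha

print(round(mgc_to_tco2e(hwp_carbon), 3))             # 44.0
print(round(gwp_to_tco2e_per_ha(gwp, harvested), 3))  # 2.0
```

Keeping the conversions in small functions makes the unit of every intermediate value explicit, which matters when pools (stocks) and flows (emissions) enter the same balance.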

| Model Name | Hyperparameter | Tested Values |
|---|---|---|
| Random Forest | max_depth | 5, 6, 7 |
| | n_estimators | 200, 500 |
| | max_features | sqrt, log2, 0.3, 0.5 |
| | min_samples_leaf | 1, 2, 4, 8 |
| | min_samples_split | 2, 5, 10 |
| | bootstrap | True, False |
| Decision Tree | max_depth | 3, 5, 10, 20 |
| | min_samples_leaf | 2, 5, 10 |
| | min_samples_split | 1, 2, 4 |
| | ccp_alpha | 0.0, 0.01, 0.5 |
| AdaBoost | n_estimators | 100, 200, 500 |
| | learning_rate | 0.01, 0.05, 0.1, 0.5, 1.0 |
| | loss | linear, square, exponential |
| | estimator | DecisionTreeRegressor(max_depth = 3), DecisionTreeRegressor(max_depth = 5), DecisionTreeRegressor(max_depth = 7) |
| Bagging | n_estimators | 50, 100, 200, 500 |
| | max_samples | 0.5, 0.7, 0.8, 1.0 |
| | max_features | 0.5, 0.7, 0.8, 1.0 |
| | bootstrap | True, False |
| | bootstrap_features | True, False |
| | estimator | DecisionTreeRegressor(max_depth = 5), DecisionTreeRegressor(max_depth = 10) |
| XGBoost | n_estimators | 100, 200, 500 |
| | max_depth | 3, 5, 7 |
| | learning_rate | 0.01, 0.05, 0.1 |
| | subsample | 0.7, 0.8, 1.0 |
| | colsample_bytree | 0.7, 0.8, 1.0 |
| | min_child_weight | 1, 3, 5, 10 |
| | reg_alpha | 0, 0.01, 0.1, 1.0 |
| | reg_lambda | 0.5, 1.0, 2.0, 5.0 |
| | gamma | 0, 0.1, 0.5, 1.0 |
| MLPRegressor | hidden_layer_sizes | (50,), (100,) |
| | activation | relu |
| | solver | sgd, adam |
| | alpha | 0.001, 0.01 |
| | learning_rate_init | 0.00005, 0.0001, 0.001 |
| Gradient Boosting | n_estimators | 100, 200, 500 |
| | max_depth | 3, 4, 5 |
| | learning_rate | 0.01, 0.05, 0.1 |
| | subsample | 0.7, 0.8, 1.0 |
| | n_iter_no_change | 10, 20, 50 |
| Voting Regressor | estimators | reg1 = RandomForestRegressor(n_estimators = 100, random_state = 42); reg2 = BaggingRegressor(n_estimators = 100, random_state = 42); reg3 = xgb.XGBRegressor(random_state = 42) |
| CatBoost | iterations | 500 |
| | depth | 4, 6, 8 |
| | learning_rate | 0.01, 0.05, 0.1 |
| | l2_leaf_reg | 1, 3, 5 |
| Stacking Regressor | rf__n_estimators | 100, 200 |
| | rf__max_depth | 3, 5 |
| | xgb__max_depth | 3, 5 |
| | xgb__learning_rate | 0.05, 0.1 |

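
To give a sense of the search space these grids define, the sketch below enumerates the Random Forest grid from the table using only the standard library and counts its combinations. It is an illustration under stated assumptions: this section does not say whether the search was exhaustive or randomized, the parameter names follow the scikit-learn spelling (so the depth limit is written `max_depth`), and `RF_GRID` and `expand_grid` are hypothetical names.

```python
from itertools import product

# Random Forest grid as listed in the table (scikit-learn parameter names).
RF_GRID = {
    "max_depth": [5, 6, 7],
    "n_estimators": [200, 500],
    "max_features": ["sqrt", "log2", 0.3, 0.5],
    "min_samples_leaf": [1, 2, 4, 8],
    "min_samples_split": [2, 5, 10],
    "bootstrap": [True, False],
}


def expand_grid(grid):
    """Yield one dict per combination of the listed hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = list(expand_grid(RF_GRID))
print(len(candidates))  # 3 * 2 * 4 * 4 * 3 * 2 = 576
print(candidates[0])    # first combination: the first value of every list
```

With k-fold cross-validation each candidate is fitted k times, so an exhaustive search over this grid alone would need 576 × k model fits. A dictionary of this shape is also the form that scikit-learn's `GridSearchCV` accepts as `param_grid`.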

| Model | R-Squared (R2) | MSE | MAE | RMSE | CV(RMSE) [%] |
|---|---|---|---|---|---|
| Random Forest | 0.892 | 48,047.770 | 126.606 | 219.198 | 29.767 |
| Decision Tree | 0.806 | 86,737.838 | 184.000 | 294.513 | 39.994 |
| AdaBoost | 0.912 | 39,114.962 | 116.812 | 197.775 | 26.857 |
| Bagging | 0.880 | 53,480.906 | 123.644 | 231.259 | 31.405 |
| XGBoost | 0.933 | 29,985.796 | 100.751 | 173.164 | 23.515 |
| MLPRegressor | 0.940 | 26,721.980 | 104.901 | 163.469 | 22.199 |
| Gradient Boosting | 0.936 | 28,713.472 | 96.430 | 169.451 | 23.011 |
| Voting Regressor | 0.894 | 47,486.900 | 114.352 | 217.915 | 29.592 |
| CatBoost | 0.948 | 23,177.672 | 91.913 | 152.242 | 20.674 |
| Stacking Regressor | 0.934 | 29,462.084 | 105.886 | 171.645 | 23.309 |
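
The columns of this table are linked by simple identities: RMSE is the square root of MSE, and CV(RMSE) is RMSE divided by the mean of the observed values, expressed in percent. The following sketch computes all five metrics in plain Python under their standard definitions. The `regression_metrics` helper and the toy data are hypothetical; the authors' own implementation is not shown in this section.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MSE, MAE, RMSE and CV(RMSE) in percent for paired sequences.

    Assumes y_true is not constant and has a non-zero mean.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    mean_true = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": rmse,
        "cv_rmse_pct": 100.0 * rmse / mean_true,
    }


# Toy values (hypothetical, in t/ha), not data from the study.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

The same identities can be used to sanity-check the table. For CatBoost, the square root of 23,177.672 is about 152.24, matching the reported RMSE, and 152.242 / 0.20674 ≈ 736 t/ha. The other rows imply the same value (for example, 219.198 / 0.29767 ≈ 736 for Random Forest), which is consistent with all models being scored against one set of observed values.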
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Przybył, K.; Pilarska, A.A.; Pilarski, K. Hybrid Machine Learning Models for Predicting Gross CO2e Balance in Polish Forest Stands: A Tool for Sustainable Forest Carbon Assessment in the Circular Economy. Sustainability 2026, 18, 6366. https://doi.org/10.3390/su18126366
