Comparing Machine Learning and Statistical Models for Remote Sensing-Based Forest Aboveground Biomass Estimations
Abstract
1. Introduction
2. Data and Method
2.1. Study Area
2.2. Data
2.2.1. Forest Inventory and Analysis (FIA) Data
2.2.2. Aboveground Biomass Calculation
2.2.3. Remote Sensing Variables
2.3. Model Training and Testing
2.3.1. Random Forest Regression
2.3.2. Support Vector Machine
2.3.3. Recursive Feature Elimination
2.3.4. Multiple Linear Regression
2.3.5. Treatments for Multicollinearity and Selecting Variables for MLR
2.3.6. Comparison of Model Performance
2.3.7. Building the Multiple Linear Regression Ensemble
3. Results
3.1. Identification of Optimized Random Forest Regression
3.2. Variable Selection and Support Vector Machine Model
3.3. Treating Multicollinearity of Explanatory Variables
3.4. Stepwise Regression
3.5. Best Subset Regression Method
3.6. Selecting the Best Regression Model
3.7. Linear Regression Ensemble
3.8. Model Comparison
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Pan, Y.; Birdsey, R.A.; Fang, J.; Houghton, R.; Kauppi, P.E.; Kurz, W.A.; Phillips, O.L.; Shvidenko, A.; Lewis, S.L.; Canadell, J.G.; et al. A large and persistent carbon sink in the world’s forests. Science 2011, 333, 988–993.
- Forest Service, U.S. Department of Agriculture. Forest Carbon Status and Trends; Forest Service, U.S. Department of Agriculture: Fort Collins, CO, USA, 2021. Available online: https://research.fs.usda.gov/understory/forest-carbon-status-and-trends (accessed on 17 September 2024).
- Johnson, K.D.; Birdsey, R.; Cole, J.; Swatantran, A.; O’Neil-Dunne, J.; Dubayah, R.; Lister, A. Integrating LIDAR and forest inventories to fill the trees outside forests data gap. Environ. Monit. Assess. 2015, 187, 623.
- Raihan, A.; Begum, R.A.; Mohd Said, M.N.; Abdullah, S.M.S. A review of emission reduction potential and cost savings through forest carbon sequestration. Asian J. Water Environ. Pollut. 2019, 16, 1–7.
- Tompalski, P.; Wulder, M.A.; White, J.C.; Hermosilla, T.; Riofrío, J.; Kurz, W.A. Developing aboveground biomass yield curves for dominant boreal tree species from time series remote sensing data. For. Ecol. Manag. 2024, 561, 121894.
- Goodale, C.L.; Apps, M.J.; Birdsey, R.A.; Field, C.B.; Heath, L.S.; Houghton, R.A.; Jenkins, J.C.; Kohlmaier, G.H.; Kurz, W.; Liu, S.; et al. Forest carbon sinks in the Northern Hemisphere. Ecol. Appl. 2002, 12, 891–899.
- Houghton, R.A. Aboveground forest biomass and the global carbon balance. Glob. Change Biol. 2005, 11, 945–958.
- Johnson, L.K.; Mahoney, M.J.; Bevilacqua, E.; Stehman, S.V.; Domke, G.M.; Beier, C.M. Fine-resolution landscape-scale biomass mapping using a spatiotemporal patchwork of LiDAR coverages. Int. J. Appl. Earth Obs. Geoinf. 2022, 114, 103059.
- Kershaw, J.A., Jr.; Ducey, M.J.; Beers, T.W.; Husch, B. Forest Mensuration; John Wiley & Sons: Hoboken, NJ, USA, 2016.
- Torre-Tojal, L.; Bastarrika, A.; Boyano, A.; Lopez-Guede, J.M.; Grana, M. Above-ground biomass estimation from LiDAR data using random forest algorithms. J. Comput. Sci. 2022, 58, 101517.
- Chojnacky, D.C.; Heath, L.S.; Jenkins, J.C. Updated generalized biomass equations for North American tree species. Forestry 2014, 87, 129–151.
- Jenkins, J.C.; Chojnacky, D.C.; Heath, L.S.; Birdsey, R.A. National-scale biomass estimators for United States tree species. For. Sci. 2003, 49, 12–35.
- Blackard, J.A.; Finco, M.V.; Helmer, E.H.; Holden, G.R.; Hoppus, M.L.; Jacobs, D.M.; Lister, A.J.; Moisen, G.G.; Nelson, M.D.; Riemann, R.; et al. Mapping US forest biomass using nationwide forest inventory data and moderate resolution information. Remote Sens. Environ. 2008, 112, 1658–1677.
- Sheridan, R.D.; Popescu, S.C.; Gatziolis, D.; Morgan, C.L.; Ku, N.W. Modeling forest aboveground biomass and volume using airborne LiDAR metrics and forest inventory and analysis data in the Pacific Northwest. Remote Sens. 2014, 7, 229–255.
- Hudak, A.T.; Fekety, P.A.; Kane, V.R.; Kennedy, R.E.; Filippelli, S.K.; Falkowski, M.J.; Tinkham, W.T.; Smith, A.M.; Crookston, N.L.; Domke, G.M.; et al. A carbon monitoring system for mapping regional, annual aboveground biomass across the northwestern USA. Environ. Res. Lett. 2020, 15, 095003.
- Tang, H.; Ma, L.; Lister, A.J.; O’Neil-Dunne, J.; Lu, J.; Lamb, R.; Dubayah, R.O.; Hurtt, G.C. Lidar Derived Biomass, Canopy Height, and Cover for New England Region, USA, 2015; ORNL DAAC: Oak Ridge, TN, USA, 2021.
- Johnson, K.D.; Birdsey, R.; Finley, A.O.; Swantaran, A.; Dubayah, R.; Wayson, C.; Riemann, R. Integrating Forest inventory and analysis data into a LIDAR-based carbon monitoring system. Carbon Balance Manag. 2014, 9, 3.
- Hayashi, M.; Saigusa, N.; Yamagata, Y.; Hirano, T. Regional forest biomass estimation using ICESat/GLAS spaceborne LiDAR over Borneo. Carbon Manag. 2015, 6, 19–33.
- Zheng, D.; Heath, L.S.; Ducey, M.J. Spatial distribution of forest aboveground biomass estimated from remote sensing and forest inventory data in New England, USA. J. Appl. Remote Sens. 2008, 2, 021502.
- Csillik, O.; Kumar, P.; Mascaro, J.; O’Shea, T.; Asner, G.P. Monitoring tropical forest carbon stocks and emissions using Planet satellite data. Sci. Rep. 2019, 9, 17831.
- Naik, P.; Dalponte, M.; Bruzzone, L. Prediction of forest aboveground biomass using multitemporal multispectral remote sensing data. Remote Sens. 2021, 13, 1282.
- Nandy, S.; Srinet, R.; Padalia, H. Mapping forest height and aboveground biomass by integrating ICESat-2, Sentinel-1 and Sentinel-2 data using Random Forest algorithm in northwest Himalayan foothills of India. Geophys. Res. Lett. 2021, 48, e2021GL093799.
- Urbazaev, M.; Thiel, C.; Cremer, F.; Dubayah, R.; Migliavacca, M.; Reichstein, M.; Schmullius, C. Estimation of forest aboveground biomass and uncertainties by integration of field measurements, airborne LiDAR, and SAR and optical satellite data in Mexico. Carbon Balance Manag. 2018, 13, 5.
- Mancini, F.; Castagnetti, C.; Rossi, P.; Dubbini, M.; Fazio, N.L.; Perrotti, M.; Lollino, P. An integrated procedure to assess the stability of coastal rocky cliffs: From UAV close-range photogrammetry to geomechanical finite element modeling. Remote Sens. 2017, 9, 1235.
- Gao, Y.; Lu, D.; Li, G.; Wang, G.; Chen, Q.; Liu, L.; Li, D. Comparative analysis of modeling algorithms for forest aboveground biomass estimation in a subtropical region. Remote Sens. 2018, 10, 627.
- Li, C.; Li, Y.; Li, M. Improving forest aboveground biomass (AGB) estimation by incorporating crown density and using Landsat 8 OLI images of a subtropical forest in Western Hunan in Central China. Forests 2019, 10, 104.
- Shrestha, N. Detecting multicollinearity in regression analysis. Am. J. Appl. Math. Stat. 2020, 8, 39–42.
- Akinwande, M.O.; Dikko, H.G.; Samson, A. Variance inflation factor: As a condition for the inclusion of suppressor variable (s) in regression analysis. Open J. Stat. 2015, 5, 754.
- Chen, H.; Qin, Z.; Zhai, D.L.; Ou, G.; Li, X.; Zhao, G.; Fan, J.; Zhao, C.; Xu, H. Mapping Forest Aboveground Biomass with MODIS and Fengyun-3C VIRR Imageries in Yunnan Province, Southwest China Using Linear Regression, K-Nearest Neighbor and Random Forest. Remote Sens. 2022, 14, 5456.
- Zhang, Z. Variable selection with stepwise and best subset approaches. Ann. Transl. Med. 2016, 4, 136.
- Osborne, J.W.; Waters, E. Four assumptions of multiple regression that researchers should always test. Pract. Assess. Res. Eval. 2019, 8, 2.
- Chan, J.Y.L.; Leow, S.M.H.; Bea, K.T.; Cheng, W.K.; Phoong, S.W.; Hong, Z.W.; Chen, Y.L. Mitigating the multicollinearity problem and its machine learning approach: A review. Mathematics 2022, 10, 1283.
- Bifet, A.; de Francisci Morales, G.; Read, J.; Holmes, G.; Pfahringer, B. Efficient online evaluation of big data stream classifiers. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; ACM: New York, NY, USA, 2015; pp. 59–68.
- Gomes, H.M.; Bifet, A.; Read, J.; Barddal, J.P.; Enembreck, F.; Pfharinger, B.; Holmes, G.; Abdessalem, T. Adaptive random forests for evolving data stream classification. Mach. Learn. 2017, 106, 1469–1495.
- Guo, Y.; Li, Z.; Zhang, X.; Chen, E.X.; Bai, L.; Tian, X.; He, Q.; Feng, Q.; Li, W. Optimal support vector machines for forest above-ground biomass estimation from multisource remote sensing data. In Proceedings of the 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, Germany, 22–27 July 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 6388–6391.
- Sivasankar, T.; Lone, J.M.; Sarma, K.K.; Qadir, A.; Raju, P.L.N. Estimation of above ground biomass using support vector. Vietnam. J. Earth Sci. 2013, 41, 95–104.
- Li, T.; Jiang, Z.; Le Treut, H.; Li, L.; Zhao, L.; Ge, L. Machine learning to optimize climate projection over China with multi-model ensemble simulations. Environ. Res. Lett. 2021, 16, 094028.
- Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259.
- Dormann, C.F.; Calabrese, J.M.; Guillera-Arroita, G.; Matechou, E.; Bahn, V.; Bartoń, K.; Beale, C.M.; Ciuti, S.; Elith, J.; Gerstner, K.; et al. Model averaging in ecology: A review of Bayesian, information-theoretic, and tactical approaches for predictive inference. Ecol. Monogr. 2018, 88, 485–504.
- Butler, B.J. Forests of Connecticut, 2016; Resource Update FS-130; Northern Research Station, Forest Service, U.S. Department of Agriculture: Newtown Square, PA, USA, 2017; 4p.
- Connecticut Environmental Conditions Online. Orthophotography and Lidar Download. 2016. Available online: https://maps.cteco.uconn.edu/data/flight2016/ (accessed on 15 February 2022).
- Connecticut Department of Energy and Environmental Protection (DEEP). Connecticut 2016 High Resolution Land Cover (NOAA CCAP); Connecticut Department of Energy and Environmental Protection (DEEP): Hartford, CT, USA, 2016. Available online: https://geodata.ct.gov/maps/CTECO::ct-2016-high-res-land-cover-noaa-ccap/about (accessed on 18 August 2022).
- Burrill, E.A.; DiTommaso, A.M.; Turner, J.A.; Pugh, S.A.; Menlove, J.; Christiansen, G.; Perry, C.J.; Conkling, B.L. The Forest Inventory and Analysis Database: Database Description and User Guide Version 9.0.1 for Phase 2; Forest Service, U.S. Department of Agriculture: Fort Collins, CO, USA, 2021; 1026p.
- Michael, H.; Lister, A. The status of accurately locating forest inventory and analysis plots using the Global Positioning System. In Proceedings of the Seventh Annual Forest Inventory and Analysis Symposium, Portland, OR, USA, 3–6 October 2005; Volume 36, pp. 179–184.
- Duncanson, L.; Huang, W.; Johnson, K.; Swatantran, A.; McRoberts, R.E.; Dubayah, R. Implications of allometric model selection for county-level biomass mapping. Carbon Balance Manag. 2017, 12, 18.
- Woudenberg, S.W.; Conkling, B.L.; O’Connell, B.M.; LaPoint, E.B.; Turner, J.A.; Waddell, K.L. The Forest Inventory and Analysis Database: Database Description and User’s Manual Version 4.0 for Phase 2; Rocky Mountain Research Station, Forest Service, United States Department of Agriculture: Fort Collins, CO, USA, 2010; p. 336.
- Fang, G.; Yu, H.; Fang, L.; Zheng, X. Synergistic Use of Sentinel-1 and Sentinel-2 Based on Different Preprocessing for Predicting Forest Aboveground Biomass. Forests 2023, 14, 1615.
- Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Goulden, T. State-wide forest canopy height and aboveground biomass map for New York with 10 m resolution, integrating GEDI, Sentinel-1, and Sentinel-2 data. Ecol. Inform. 2024, 79, 102404.
- Dewi, C.; Chen, R.C. Random forest and support vector machine on features selection for regression analysis. Int. J. Innov. Comput. Inf. Control 2019, 15, 2027–2037.
- Zhang, L.; Zheng, X.; Pang, Q.; Zhou, W. Fast Gaussian kernel support vector machine recursive feature elimination algorithm. Appl. Intell. 2021, 51, 9001–9014.
- Zhu, L.; O’Dwyer, J.P.; Chang, V.S.; Granda, C.B.; Holtzapple, M.T. Multiple linear regression model for predicting biomass digestibility from structural features. Bioresour. Technol. 2010, 101, 4971–4979.
- Li, Y.; Andersen, H.E.; McGaughey, R. A comparison of statistical methods for estimating forest biomass from light detection and ranging data. West. J. Appl. For. 2008, 23, 223–231.
- Yamashita, T.; Yamashita, K.; Kamimura, R. A stepwise AIC method for variable selection in linear regression. Commun. Stat. Theory Methods 2007, 36, 2395–2403.
- Lamahewage, S.H.G.; Witharana, C.; Riemann, R.; Fahey, R.; Worthley, T. Aboveground biomass estimation using multimodal remote sensing observations and machine learning in mixed temperate forest. Sci. Rep. 2025, 15, 31120.
- He, Q.; Chen, E.; An, R.; Li, Y. Above-ground biomass and biomass components estimation using LiDAR data in a coniferous forest. Forests 2013, 4, 984–1002.
- Véga, C.; Vepakomma, U.; Morel, J.; Bader, J.L.; Rajashekar, G.; Jha, C.S.; Ferêt, J.; Proisy, C.; Pélissier, R.; Dadhwal, V.K. Aboveground-biomass estimation of a complex tropical forest in India using LiDAR. Remote Sens. 2015, 7, 10607–10625.
- Ehlers, D.; Wang, C.; Coulston, J.; Zhang, Y.; Pavelsky, T.; Frankenberg, E.; Woodcock, C.; Song, C. Mapping Forest Aboveground Biomass Using Multisource Remotely Sensed Data. Remote Sens. 2022, 14, 1115.
- Qin, H.; Zhou, W.; Yao, Y.; Wang, W. Estimating aboveground carbon stock at the scale of individual trees in subtropical forests using UAV LiDAR and hyperspectral data. Remote Sens. 2021, 13, 4969.
- Haralick, R.M.; Shanmugam, K.; Dinstein, I.H. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 1973, SMC–3, 610–621.
- Bright, B.C.; Hicke, J.A.; Hudak, A.T. Estimating aboveground carbon stocks of a forest affected by mountain pine beetle in Idaho using lidar and multispectral imagery. Remote Sens. Environ. 2012, 124, 270–281.
- Shao, Z.; Zhang, L.; Wang, L. Stacked Sparse Autoencoder Modeling Using the Synergy of Airborne LiDAR and Satellite Optical and SAR Data to Map Forest Above-Ground Biomass. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 5569–5582.
- Pandit, S.; Tsuyuki, S.; Dube, T. Estimating above-ground biomass in sub-tropical buffer zone community forests, Nepal, using Sentinel-2 data. Remote Sens. 2018, 10, 601.
- Moradi, F.; Darvishsefat, A.A.; Pourrahmati, M.R.; Deljouei, A.; Borz, S.A. Estimating aboveground biomass in dense Hyrcanian forests by the use of Sentinel-2 data. Forests 2022, 13, 104.
- Shi, Y.; Wang, Z.; Liu, L.; Li, C.; Peng, D.; Xiao, P. Improving Estimation of Woody Aboveground Biomass of Sparse Mixed Forest over Dryland Ecosystem by Combining Landsat-8, GaoFen-2, and UAV Imagery. Remote Sens. 2021, 13, 4859.
- Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
- Mohammadpour, P.; Viegas, D.X.; Viegas, C. Vegetation mapping with random forest using Sentinel-2 and GLCM texture feature—A case study for Lousã region, Portugal. Remote Sens. 2022, 14, 4585.
- De Castilho, C.V.; Magnusson, W.E.; de Araújo, R.N.O.; Luizao, R.C.; Luizao, F.J.; Lima, A.P.; Higuchi, N. Variation in aboveground tree live biomass in a central Amazonian Forest: Effects of soil and topography. For. Ecol. Manag. 2006, 234, 85–96.
- Parent, J.R.; Gold, A.J.; Vogler, E.; Lowder, K.A. Guiding decisions on the future of dams: A GIS database characterizing ecological and social considerations of Dam decisions. J. Environ. Manag. 2024, 351, 119683.
- Næsset, E. Estimating above-ground biomass in young forests with airborne laser scanning. Int. J. Remote Sens. 2011, 32, 473–501.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Han, S.; Williamson, B.D.; Fong, Y. Improving random forest predictions in small datasets from two-phase sampling designs. BMC Med. Inform. Decis. Mak. 2021, 21, 322.
- Gregorutti, B.; Michel, B.; Saint-Pierre, P. Correlation and variable importance in random forests. Stat. Comput. 2017, 27, 659–678.
- Breiman, L. Classification and Regression Trees; Routledge: London, UK, 2017.
- Qiu, A.; Yang, Y.; Wang, D.; Xu, S.; Wang, X. Exploring parameter selection for carbon monitoring based on Landsat-8 imagery of the aboveground forest biomass on Mount Tai. Eur. J. Remote Sens. 2020, 53 (Suppl. S1), 4–15.
- Patle, A.; Chouhan, D.S. SVM kernel functions for classification. In Proceedings of the 2013 International Conference on Advances in Technology and Engineering (ICATE), Mumbai, India, 23–25 January 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1–9.
- Rasel, S.M.; Chang, H.C.; Ralph, T.J.; Saintilan, N.; Diti, I.J. Application of feature selection methods and machine learning algorithms for saltmarsh biomass estimation using Worldview-2 imagery. Geocarto Int. 2021, 36, 1075–1099.
- Marill, K.A. Advanced statistics: Linear regression, part II: Multiple linear regression. Acad. Emerg. Med. 2004, 11, 94–102.
- Ott, R.L.; Longnecker, M.T. An Introduction to Statistical Methods and Data Analysis; Cengage Learning: Boston, MA, USA, 2015.
- Hofmann, M.; Gatu, C.; Kontoghiorghes, E.J.; Colubi, A.; Zeileis, A. Lmsubsets: Exact variable-subset selection in linear regression for R. J. Stat. Softw. 2020, 93, 1–21.
- Fisher, R.; Wilson, S.K.; Sin, T.M.; Lee, A.C.; Langlois, T.J. A simple function for full-subsets multiple regression in ecology with R. Ecol. Evol. 2018, 8, 6104–6113.
- Lumley, T.; Lumley, M.T. Package ‘Leaps’. Regression Subset Selection. Thomas Lumley Based on Fortran Code by Alan Miller. 2013. Available online: http://CRAN.R-project.org/package=leaps (accessed on 18 March 2018).
- Liu, K.; Wang, J.; Zeng, W.; Song, J. Comparison and evaluation of three methods for estimating forest above ground biomass using TM and GLAS data. Remote Sens. 2017, 9, 341.
- Powell, S.L.; Cohen, W.B.; Healey, S.P.; Kennedy, R.E.; Moisen, G.G.; Pierce, K.B.; Ohmann, J.L. Quantification of live aboveground forest biomass dynamics with Landsat time-series and field inventory data: A comparison of empirical modeling approaches. Remote Sens. Environ. 2010, 114, 1053–1068.
- Fassnacht, F.E.; Hartig, F.; Latifi, H.; Berger, C.; Hernández, J.; Corvalán, P.; Koch, B. Importance of sample size, data type and prediction method for remote sensing-based estimations of aboveground forest biomass. Remote Sens. Environ. 2014, 154, 102–114.
- Qiu, X.; Fu, D.; Fu, Z. Feature selection of atmospheric corrosion data based on SVM-RFE method. Adv. Comput. Sci. Its Appl. 2013, 2, 443–448.
- Chakrabarti, A.; Ghosh, J.K. AIC, BIC and recent advances in model selection. Philos. Stat. 2011, 7, 583–605.
- Hodkinson, I.D.; Coulson, S.J.; Webb, N.R.; Block, W.; Strathdee, A.T.; Bale, J.S.; Worland, M.R. Temperature and the biomass of flying midges (Diptera: Chironomidae) in the high Arctic. Oikos 1996, 75, 241–248.
- Morgan, J.A.; Tatar, J.F. Calculation of the residual sum of squares for all possible regressions. Technometrics 1972, 14, 317–325.
- Bui, Q.T.; Pham, Q.T.; Pham, V.M.; Tran, V.T.; Nguyen, D.H.; Nguyen, Q.H.; Nguyen, H.D.; Do, N.T.; Vu, V.M. Hybrid machine learning models for aboveground biomass estimations. Ecol. Inform. 2024, 79, 102421.
- Luo, P.; Liao, J.; Shen, G. Combining spectral and texture features for estimating leaf area index and biomass of maize using Sentinel-1/2, and Landsat-8 data. IEEE Access 2020, 8, 53614–53626.
- Hu, T.; Su, Y.; Xue, B.; Liu, J.; Zhao, X.; Fang, J.; Guo, Q. Mapping global forest aboveground biomass with spaceborne LiDAR, optical imagery, and forest inventory data. Remote Sens. 2016, 8, 565.
- Genuer, R.; Poggi, J.M.; Tuleau-Malot, C. Variable selection using random forests. Pattern Recognit. Lett. 2010, 31, 2225–2236.
- Yıldırım, H. The multicollinearity effect on the performance of machine learning algorithms: Case examples in healthcare modelling. Acad. Platf. J. Eng. Smart Syst. 2024, 12, 68–80.
- Barzani, A.R.; Pahlavani, P.; Ghorbanzadeh, O.; Gholamnia, K.; Ghamisi, P. Evaluating the Impact of Recursive Feature Elimination on Machine Learning Models for Predicting Forest Fire-Prone Zones. Fire 2024, 7, 440.
- Lin, X.; Yang, F.; Zhou, L.; Yin, P.; Kong, H.; Xing, W.; Lu, X.; Jia, L.; Wang, Q.; Xu, G. A support vector machine-recursive feature elimination feature selection method based on artificial contrast variables and mutual information. J. Chromatogr. B 2012, 910, 149–155.
- Paterlini, S.; Minerva, T. Regression model selection using genetic algorithms. In Proceedings of the 11th WSEAS International Conference on Neural Networks and 11th WSEAS International Conference on Evolutionary Computing and 11th WSEAS International Conference on Fuzzy Systems, Iasi, Romania, 13–15 June 2010; World Scientific and Engineering Academy and Society (WSEAS): Newark, NJ, USA, 2010; pp. 19–27.
- Wan-Mohd-Jaafar, W.S.; Woodhouse, I.H.; Silva, C.A.; Omar, H.; Hudak, A.T. Modelling individual tree aboveground biomass using discrete return LiDAR in lowland dipterocarp forest of Malaysia. J. Trop. For. Sci. 2017, 29, 465–484.
- Paetzold, R.L. Multicollinearity and the use of regression analyses in discrimination litigation. Behav. Sci. Law 1992, 10, 207–228.
- Dai, S.; Zheng, X.; Gao, L.; Xu, C.; Zuo, S.; Chen, Q.; Wei, X.; Ren, Y. Improving plot-level model of forest biomass: A combined approach using machine learning with spatial statistics. Forests 2021, 12, 1663.
- Song, J.; Liu, X.; Adingo, S.; Guo, Y.; Li, Q. A Comparative Analysis of Remote Sensing Estimation of Aboveground Biomass in Boreal Forests Using Machine Learning Modeling and Environmental Data. Sustainability 2024, 16, 7232.
- Zhang, L.; Yin, X.; Wang, Y.; Chen, J. Aboveground Biomass Mapping in SemiArid Forests by Integrating Airborne LiDAR with Sentinel-1 and Sentinel-2 Time-Series Data. Remote Sens. 2024, 16, 3241.
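
Analysis | № | Variable | Label | Literature |
---|---|---|---|---|
Section 01: Light Detection and Ranging (LiDAR) variables, 2016 | | | | |
DEM | 1 | Slope from DEM | slope | [55,56] |
FUSION Height Bins | 2 | 0–5 m | hb_0_5 | [14] |
FUSION Height Bins | 3 | 5–10 m | hb_5_10 | |
FUSION Height Bins | 4 | 10–15 m | hb_10_15 | |
FUSION Height Bins | 5 | 15–20 m | hb_15_20 | |
FUSION Height Bins | 6 | 20–25 m | hb_20_25 | |
FUSION Height Bins | 7 | >25 m | hb_more25 | |
FUSION Densities | 8 | 0–5 m | dens_0_5 | [14] |
FUSION Densities | 9 | 5–10 m | dens_5_10 | |
FUSION Densities | 10 | 10–15 m | dens_10_15 | |
FUSION Densities | 11 | 15–20 m | dens_15_20 | |
FUSION Densities | 12 | 20–25 m | dens_20_25 | |
FUSION Densities | 13 | >25 m | dens_more25 | |
Height Percentile (subplot-level data) | 14 | 50th Percentile | p50th | [3,15,18,57] |
Height Percentile (subplot-level data) | 15 | 60th Percentile | p60th | |
Height Percentile (subplot-level data) | 16 | 70th Percentile | p70th | |
Height Percentile (subplot-level data) | 17 | 75th Percentile | p75th | |
Height Percentile (subplot-level data) | 18 | 80th Percentile | p80th | |
Height Percentile (subplot-level data) | 19 | 85th Percentile | p85th | |
Height Percentile (subplot-level data) | 20 | 90th Percentile | p90th | |
Height Percentile (subplot-level data) | 21 | 95th Percentile | p95th | |
Height Percentile (subplot-level data) | 22 | 99th Percentile | p99th | |
Height Percentile (subplot-level data) | 23 | Kurtosis | lida_kurt | |
Height Percentile (subplot-level data) | 24 | Median | lida_med | |
Height Percentile (subplot-level data) | 25 | Mean | lida_mn | |
Height Percentile (subplot-level data) | 26 | Skewness | lida_skw | |
Height Percentile (subplot-level data) | 27 | Standard deviation | lida_stdv | |
Canopy Height Model | 28 | Canopy Height (CHM) | cnpy_ht | |
FUSION/LDV LiDAR Processing Point Cloud Metrics | 29 | Maximum Elevation | elv_max | [14] |
FUSION/LDV LiDAR Processing Point Cloud Metrics | 30 | Mean Elevation | elv_mean | |
FUSION/LDV LiDAR Processing Point Cloud Metrics | 31 | Canopy Relief Ratio | crr | |
Section 02: NAIP summertime imagery derived variables, 2016 | | | | |
Principal Component Analysis | 32 | Principal Component 1 | pca_1 | [58] |
Principal Component Analysis | 33 | Principal Component 2 | pca_2 | |
Principal Component Analysis | 34 | Principal Component 3 | pca_3 | |
Principal Component Analysis | 35 | Principal Component 4 | pca_4 | |
Green Co-occurrence | 36 | Mean | green_layer1 | [20,57,59] |
Green Co-occurrence | 37 | Variance | green_layer2 | |
Green Co-occurrence | 38 | Homogeneity | green_layer3 | |
Green Co-occurrence | 39 | Contrast | green_layer4 | |
Green Co-occurrence | 40 | Entropy | green_layer5 | |
Green Co-occurrence | 41 | Dissimilarity | green_layer6 | |
Green Co-occurrence | 42 | Second Moment | green_layer7 | |
Green Co-occurrence | 43 | Correlation | green_layer8 | |
Red Co-occurrence matrix | 44 | Mean | nir_layer1 | [20,57,59] |
Red Co-occurrence matrix | 45 | Variance | nir_layer2 | |
Red Co-occurrence matrix | 46 | Homogeneity | nir_layer3 | |
Red Co-occurrence matrix | 47 | Contrast | nir_layer4 | |
Red Co-occurrence matrix | 48 | Dissimilarity | nir_layer5 | |
Red Co-occurrence matrix | 49 | Entropy | nir_layer6 | |
Red Co-occurrence matrix | 50 | Second Moment | nir_layer7 | |
Red Co-occurrence matrix | 51 | Correlation | nir_layer8 | |
Vegetation Indices | 52 | NDVI | naip_ndvi | [57,60,61] |
Vegetation Indices | 53 | SAVI | naip_savi | [62] |
Vegetation Indices | 54 | GNDVI | naip_gndvi | [63] |
Spectral Bands | 55 | NIR Band | naip_nir | [64] |
Spectral Bands | 56 | Red Band | naip_red | [64] |
Section 03: Sentinel-2 leaf-off image variables, 2016 | | | | |
Vegetation Indices | 57 | EVI | sen_evi | [65] |
Vegetation Indices | 58 | GNDVI | sen_gndvi | [22] |
Vegetation Indices | 59 | NDVI | sen_ndvi | [22] |
Vegetation Indices | 60 | SAVI | sen_savi | [65] |
Vegetation Indices | 61 | NDMI | sen_ndmi | [64] |
Vegetation Indices | 62 | NDRE | sen_ndre | [62,64] |
Sentinel-2 Spectral Response | 63 | SWIR | sen_swir | [62,66] |
Sentinel-2 Spectral Response | 64 | NIR | sen_nir | [62] |
Sentinel-2 Spectral Response | 65 | RED | sen_red | [62] |
Soil | 66 | Soil type | soil_type | [67] |
CT CCAP | 67 | Landcover type | cover_type | [68] |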

Hyperparameter | Start | End | Step Size |
---|---|---|---|
ntrees | 300 | 500 | 50 |
mtry | 1 | 67 | 1 |
nodesize | 1 | 10 | 2 |
max_depth | 1 | 100 | 20 |
min_samples_leaf | 10 | 30 | 2 |
min_samples_split | 10 | 30 | 2 |
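
The start, end, and step values in the hyperparameter table above define a search grid. The sketch below, in plain Python, expands each range and enumerates candidate settings. It assumes that the end value is inclusive and that every combination counts as a candidate; the table states neither, and the names `PARAM_RANGES`, `grid_values`, and `iter_candidates` are illustrative only.

```python
from itertools import product
from math import prod

# (start, end, step) for each hyperparameter, copied from the table above.
PARAM_RANGES = {
    "ntrees": (300, 500, 50),
    "mtry": (1, 67, 1),
    "nodesize": (1, 10, 2),
    "max_depth": (1, 100, 20),
    "min_samples_leaf": (10, 30, 2),
    "min_samples_split": (10, 30, 2),
}


def grid_values(start, end, step):
    """Return start, start + step, ... without exceeding end (end assumed inclusive)."""
    return list(range(start, end + 1, step))


# Expanded list of values to try for each hyperparameter.
GRID = {name: grid_values(*bounds) for name, bounds in PARAM_RANGES.items()}


def iter_candidates(grid):
    """Yield one dict of hyperparameter settings per point of the full grid."""
    names = list(grid)
    for combo in product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


# Size of the full grid under the assumptions stated above.
n_candidates = prod(len(values) for values in GRID.values())

# First candidate: every hyperparameter at its start value.
first_candidate = next(iter_candidates(GRID))
```

Each yielded dictionary could be handed to whichever random forest implementation is in use; how these parameter names map onto a particular library is left open here. A full grid of this size is large, so a random subsample of `iter_candidates` may be the more practical reading of the table.
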
№ | Variable | VIF |
---|---|---|
1 | slope | 1.528 |
2 | pca_4 | 2.023 |
3 | pca_2 | 2.239 |
4 | hb_15_20 | 2.289 |
5 | cnpy_ht | 2.329 |
6 | sen_swir | 2.431 |
7 | dens_5_10 | 2.531 |
8 | dens_more25 | 2.569 |
9 | hb_0_5 | 2.711 |
10 | dens_10_15 | 3.032 |
11 | hb_20_25 | 3.041 |
12 | sen_red | 3.172 |
13 | pca_3 | 3.384 |
14 | green_layer4 | 3.461 |
15 | lida_skw | 3.653 |
16 | nir_layer7 | 4.084 |
17 | elv_max | 4.113 |
18 | lida_kurt | 4.505 |
19 | nir_layer4 | 4.507 |
20 | green_layer2 | 4.546 |
21 | pca_1 | 4.795 |
22 | green_layer7 | 5.124 |
23 | lida_mn | 5.273 |
24 | sen_evi | 5.752 |
25 | sen_ndre | 5.951 |
26 | naip_gndvi | 6.071 |
27 | sen_ndmi | 6.789 |
28 | green_layer8 | 6.808 |
29 | lida_med | 6.937 |
30 | sen_gndvi | 7.029 |
31 | lida_stdv | 8.551 |
32 | nir_layer8 | 9.141 |
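
The VIF values above can be read through the standard definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all other predictors. The sketch below inverts that definition to show how much of a variable's variance the other predictors explain. It assumes the table uses this textbook definition; the function names are hypothetical.

```python
def vif_from_r2(r2):
    """VIF of a predictor whose regression on the other predictors has R^2 = r2."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2)


def r2_from_vif(vif):
    """Invert the definition: share of a predictor's variance explained by the others."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be below 1")
    return 1.0 - 1.0 / vif


# Lowest and highest VIF values reported in the table above.
reported_vif = {"slope": 1.528, "nir_layer8": 9.141}

# Implied auxiliary R^2 for each of the two variables.
implied_r2 = {name: r2_from_vif(vif) for name, vif in reported_vif.items()}
```

Under this definition, the largest value in the table (9.141 for nir_layer8) corresponds to roughly 89% of that variable's variance being explained by the other retained predictors, and the smallest (1.528 for slope) to roughly 35%.
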

№ | Explanatory Variable | Number of Models (of 8) Including the Variable |
---|---|---|
1 | pca_1 | 1 |
2 | pca_3 | 1 |
3 | green_layer4 | 3 |
4 | green_layer8 | 7 |
5 | sen_evi | 5 |
6 | sen_gndvi | 1 |
7 | sen_ndre | 5 |
8 | sen_swir | 2 |
9 | lida_skw | 3 |
10 | lida_stdv | 8 |

Model | F-Statistic | p-Value | Residual Standard Error (Mg ha⁻¹) | RMSE (Mg ha⁻¹) | R-Squared (Training) | R-Squared (Testing) |
---|---|---|---|---|---|---|
Stepwise | 9.408 | 1.777 × 10⁻⁷ | 21.51 | 23.53 | 0.3015 | 0.1508 |
Best Subset | 6.982 | 2.296 × 10⁻⁷ | 21.12 | 22.59 | 0.3451 | 0.2172 |

Model Type | MSE ((Mg ha⁻¹)²) | RMSE (Mg ha⁻¹) | R² | Number of Variables |
---|---|---|---|---|
RF | 739.46 | 27.19 | 0.41 | 28 |
SVM | 1034.59 | 32.17 | 0.10 | 44 |
MLR | 510.27 | 22.59 | 0.22 | 8 |
Ensemble | 123.85 | 11.13 | 0.79 | Ensemble of RF, SVM, and MLR |
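
The comparison table above reports MSE, RMSE, and R² for each model. As a minimal sketch of how such metrics relate, the code below computes all three from paired observed and predicted values and checks that each reported RMSE is the square root of the reported MSE. It assumes R² is defined as 1 − SS_res / SS_tot, which the table does not state; `regression_metrics` and `REPORTED` are illustrative names.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return (MSE, RMSE, R^2) for paired observed and predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    mse = ss_res / n
    # Assumed definition: R^2 = 1 - SS_res / SS_tot.
    return mse, sqrt(mse), 1.0 - ss_res / ss_tot


# (MSE, RMSE) pairs as reported in the table above.
REPORTED = {
    "RF": (739.46, 27.19),
    "SVM": (1034.59, 32.17),
    "MLR": (510.27, 22.59),
    "Ensemble": (123.85, 11.13),
}

# True where the reported RMSE matches sqrt(reported MSE) to within rounding.
consistent = {
    model: abs(sqrt(mse) - rmse) < 0.01 for model, (mse, rmse) in REPORTED.items()
}
```

All four reported pairs agree to within rounding. Note that R² can differ from the squared correlation between observed and predicted values, so the definition matters when comparing against other studies.
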
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lamahewage, S.H.G.; Witharana, C.; Riemann, R.; Fahey, R.; Worthley, T. Comparing Machine Learning and Statistical Models for Remote Sensing-Based Forest Aboveground Biomass Estimations. Forests 2025, 16, 1430. https://doi.org/10.3390/f16091430