Assessing the Utility of Satellite Embedding Features for Biomass Prediction in Subtropical Forests with Machine Learning
Highlights
- Machine learning models trained on embedding features derived from the Google AlphaEarth Foundations dataset demonstrated strong predictive capability for forest biomass, particularly in broad-leaved and coniferous forests, as validated through five-fold cross-validation. These models successfully captured large-scale spatial patterns of forest biomass distribution.
- Spatially predicted biomass maps revealed clear landscape-scale gradients: lower biomass values occurred predominantly in fragmented forest patches and at forest edges near urbanized areas, while higher biomass was concentrated within continuous, intact forest regions relatively distant from human disturbance.
- Embedding-based remote sensing models offer an effective and efficient framework for monitoring biomass dynamics in Yunhe Forestry Station, especially in regions where field surveys or the acquisition of multi-source remote sensing data are constrained by terrain accessibility or logistical limitations.
- By leveraging embeddings that integrate information from diverse Earth observation sensors, this study demonstrates an alternative, scalable methodology for forest biomass estimation, highlighting the potential of representation learning to advance large-area forest carbon assessment and management.
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Data Preparation
2.2.1. Field Inventory and Calculation of Biomass of Sample Plots
2.2.2. Collection of Embedding Dataset
2.3. Machine Learning Algorithms
2.3.1. Random Forest
2.3.2. Support Vector Regression
2.3.3. Multi-Layer Perceptron Neural Network
2.3.4. Gaussian Process Regression
2.4. K-Fold Cross-Validation and Model Evaluation
2.5. Predictor Importance and Spatial Applicability Analysis
3. Results
3.1. Model Training and Validation
3.2. Total Biomass Prediction in Yunhe Forestry Station
4. Discussion
4.1. Capability of Embedding Dataset for Predicting Forest Biomass
4.2. Managing Forests Using Spatial Biomass Predictions
4.3. Limitations
5. Conclusions
- Satellite embedding datasets derived from Google AlphaEarth Foundations can effectively predict forest biomass, demonstrating performance comparable to that reported in studies using conventional optical, SAR, or LiDAR data.
- The spatially explicit biomass maps generated in this study provide valuable information for forest monitoring and management, supporting large-scale decision-making, particularly for identifying biomass patterns in near-urban forests and conserving high-biomass, continuous forests.
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Pan, Y.; Birdsey, R.A.; Fang, J.; Houghton, R.; Kauppi, P.E.; Kurz, W.A.; Phillips, O.L.; Shvidenko, A.; Lewis, S.L.; Canadell, J.G.; et al. A Large and Persistent Carbon Sink in the World’s Forests. Science 2011, 333, 988–993.
2. Houghton, R.A. Aboveground Forest Biomass and the Global Carbon Balance. Glob. Change Biol. 2005, 11, 945–958.
3. Vashum, K.T.; Jayakumar, S. Methods to estimate above-ground biomass and carbon stock in natural forests—A review. J. Ecosyst. Ecography 2012, 2, 1–7.
4. Pan, Y.; Birdsey, R.A.; Phillips, O.L.; Jackson, R.B. The Structure, Distribution, and Biomass of the World’s Forests. Annu. Rev. Ecol. Evol. Syst. 2013, 44, 593–622.
5. Zhao, F.; Guo, Q.; Kelly, M. Allometric equation choice impacts lidar-based forest biomass estimates: A case study from the Sierra National Forest, CA. Agric. For. Meteorol. 2012, 165, 64–72.
6. Abdul-Hamid, H.; Mohamad-Ismail, F.-N.; Mohamed, J.; Samdin, Z.; Abiri, R.; Tuan-Ibrahim, T.-M.; Mohammad, L.-S.; Jalil, A.-M.; Naji, H.-R. Allometric Equation for Aboveground Biomass Estimation of Mixed Mature Mangrove Forest. Forests 2022, 13, 325.
7. Viana, H.; Aranha, J.; Lopes, D.; Cohen, W.B. Estimation of crown biomass of Pinus pinaster stands and shrubland above-ground biomass using forest inventory data, remotely sensed imagery and spatial prediction models. Ecol. Model. 2012, 226, 22–35.
8. Yan, X.; Li, J.; Smith, A.R.; Yang, D.; Ma, T.; Su, Y.; Shao, J. Evaluation of machine learning methods and multi-source remote sensing data combinations to construct forest above-ground biomass models. Int. J. Digit. Earth 2023, 16, 4471–4491.
9. Su, Y.; Guo, Q.; Jin, S.; Guan, H.; Sun, X.; Ma, Q.; Hu, T.; Wang, R.; Li, Y. The Development and Evaluation of a Backpack LiDAR System for Accurate and Efficient Forest Inventory. IEEE Geosci. Remote Sens. Lett. 2021, 18, 1660–1664.
10. Brovkina, O.; Novotny, J.; Cienciala, E.; Zemek, F.; Russ, R. Mapping forest aboveground biomass using airborne hyperspectral and LiDAR data in the mountainous conditions of Central Europe. Ecol. Eng. 2017, 100, 219–230.
11. Fahey, C.; Choi, D.; Wang, J.; Domke, G.M.; Edwards, J.D.; Fei, S.; Kivlin, S.N.; LaRue, E.A.; McCormick, M.K.; McShea, W.J.; et al. Canopy complexity drives positive effects of tree diversity on productivity in two tree diversity experiments. Ecology 2025, 106, e4500.
12. Pelletier, F.; Cardille, J.A.; Wulder, M.A.; White, J.C.; Hermosilla, T. Inter- and intra-year forest change detection and monitoring of aboveground biomass dynamics using Sentinel-2 and Landsat. Remote Sens. Environ. 2024, 301, 113931.
13. Hyde, P.; Nelson, R.; Kimes, D.; Levine, E. Exploring LiDAR–RaDAR synergy—Predicting aboveground biomass in a southwestern ponderosa pine forest using LiDAR, SAR and InSAR. Remote Sens. Environ. 2007, 106, 28–38.
14. Foody, G.M.; Boyd, D.S.; Cutler, M.E.J. Predictive relations of tropical forest biomass from Landsat TM data and their transferability between regions. Remote Sens. Environ. 2003, 85, 463–474.
15. Ghasemi, N.; Sahebi, M.R.; Mohammadzadeh, A. Biomass Estimation of a Temperate Deciduous Forest Using Wavelet Analysis. IEEE Trans. Geosci. Remote Sens. 2013, 51, 765–776.
16. Vafaei, S.; Soosani, J.; Adeli, K.; Fadaei, H.; Naghavi, H.; Pham, T.D.; Tien Bui, D. Improving Accuracy Estimation of Forest Aboveground Biomass Based on Incorporation of ALOS-2 PALSAR-2 and Sentinel-2A Imagery and Machine Learning: A Case Study of the Hyrcanian Forest Area (Iran). Remote Sens. 2018, 10, 172.
17. Duncanson, L.; Neuenschwander, A.; Hancock, S.; Thomas, N.; Fatoyinbo, T.; Simard, M.; Silva, C.A.; Armston, J.; Luthcke, S.B.; Hofton, M.; et al. Biomass estimation from simulated GEDI, ICESat-2 and NISAR across environmental gradients in Sonoma County, California. Remote Sens. Environ. 2020, 242, 111779.
18. Qin, S.; Wang, H.; Rogers, C.; Bermúdez, J.; Lourenço, R.B.; Zhang, J.; Li, X.; Chau, J.; Tompalski, P.; Gonsamo, A. Aboveground biomass mapping of Canada with SAR and optical satellite observations aided by active learning. ISPRS J. Photogramm. Remote Sens. 2025, 226, 204–220.
19. Brown, C.; Kazmierski, M.; Pasquarella, V.; Rucklidge, W.; Samsikova, M.; Zhang, C.; Shelhamer, E.; Lahera, E.; Wiles, O.; Ilyushchenko, S. AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data. arXiv 2025, arXiv:2507.22291.
20. Seydi, S.T. Deep Learning-Based Burned Area Mapping Using Bi-Temporal Siamese Networks and AlphaEarth Foundation Datasets. arXiv 2025, arXiv:2509.07852.
21. Alvarez, C.I.; Ulloa Vaca, C.A.; Echeverria Llumipanta, N.A. Machine Learning for Urban Air Quality Prediction Using Google AlphaEarth Foundations Satellite Embeddings: A Case Study of Quito, Ecuador. Remote Sens. 2025, 17, 3472.
22. Ehlers, D.; Wang, C.; Coulston, J.; Zhang, Y.; Pavelsky, T.; Frankenberg, E.; Woodcock, C.; Song, C. Mapping Forest Aboveground Biomass Using Multisource Remotely Sensed Data. Remote Sens. 2022, 14, 1115.
23. Huang, H.; Liu, C.; Wang, X.; Zhou, X.; Gong, P. Integration of multi-resource remotely sensed data and allometric models for forest aboveground biomass estimation in China. Remote Sens. Environ. 2019, 221, 225–234.
24. Liu, J.; Coomes, D.A.; Gibson, L.; Hu, G.; Liu, J.; Luo, Y.; Wu, C.; Yu, M. Forest fragmentation in China and its effect on biodiversity. Biol. Rev. 2019, 94, 1636–1657.
25. Yuan, W.G.; Jiang, B.; Ge, Y.J.; Zhu, J.R.; Shen, A.H. Study on Biomass Model of Key Ecological Forest in Zhejiang Province. J. Zhejiang For. Sci. Technol. 2009, 29, 1–5.
26. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
27. Cutler, A.; Cutler, D.R.; Stevens, J.R. Random Forests. In Ensemble Machine Learning: Methods and Applications; Zhang, C., Ma, Y., Eds.; Springer: New York, NY, USA, 2012; pp. 157–175.
28. Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22.
29. Chaofan, W.; Huanhuan, S.; Aihua, S.; Jinsong, D.; Muye, G.; Jinxia, Z.; Hongwei, X.; Ke, W. Comparison of machine-learning methods for above-ground biomass estimation based on Landsat imagery. J. Appl. Remote Sens. 2016, 10, 035010.
30. Karatzoglou, A.; Smola, A.; Hornik, K.; Zeileis, A. kernlab—An S4 Package for Kernel Methods in R. J. Stat. Softw. 2004, 11, 1–20.
31. Sharifi, A.; Amini, J.; Tateishi, R. Estimation of Forest Biomass Using Multivariate Relevance Vector Regression. Photogramm. Eng. Remote Sens. 2016, 82, 41–49.
32. Zhang, L.; Zhang, X.; Shao, Z.; Jiang, W.; Gao, H. Integrating Sentinel-1 and 2 with LiDAR data to estimate aboveground biomass of subtropical forests in northeast Guangdong, China. Int. J. Digit. Earth 2023, 16, 158–182.
33. Bergmeir, C.; Benítez, J.M. Neural Networks in R Using the Stuttgart Neural Network Simulator: RSNNS. J. Stat. Softw. 2012, 46, 1–26.
34. Xie, R.; Darvishzadeh, R.; Skidmore, A.K.; Heurich, M.; Holzwarth, S.; Gara, T.W.; Reusen, I. Mapping leaf area index in a mixed temperate forest using Fenix airborne hyperspectral data and Gaussian processes regression. Int. J. Appl. Earth Obs. Geoinf. 2021, 95, 102242.
35. Abebe, G.; Tadesse, T.; Gessesse, B. Estimating Leaf Area Index and biomass of sugarcane based on Gaussian process regression using Landsat 8 and Sentinel 1A observations. Int. J. Image Data Fusion 2023, 14, 58–88.
36. Ploton, P.; Mortier, F.; Réjou-Méchain, M.; Barbier, N.; Picard, N.; Rossi, V.; Dormann, C.; Cornu, G.; Viennois, G.; Bayol, N.; et al. Spatial validation reveals poor predictive performance of large-scale ecological mapping models. Nat. Commun. 2020, 11, 4540.
37. Meyer, H.; Reudenbach, C.; Wöllauer, S.; Nauss, T. Importance of spatial predictor variable selection in machine learning applications—Moving from data reproduction to spatial prediction. Ecol. Model. 2019, 411, 108815.
38. Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Softw. 2008, 28, 1–26.
39. Lundberg, S.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
40. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2025.
41. Jian, K.; Lu, D.; Li, G. Modeling Forest Carbon Stock Based on Sample Plots and UAV Lidar Data from Multiple Sites and Examining Its Vertical Characteristics in Wuyishan National Park. Remote Sens. 2025, 17, 377.
42. Muro, J.; Linstädter, A.; Magdon, P.; Wöllauer, S.; Männer, F.A.; Schwarz, L.-M.; Ghazaryan, G.; Schultz, J.; Malenovský, Z.; Dubovyk, O. Predicting plant biomass and species richness in temperate grasslands across regions, time, and land management with remote sensing and deep learning. Remote Sens. Environ. 2022, 282, 113262.
43. Schmit, J.P.; Johnson, L.R.; Baker, M.; Darling, L.; Fahey, R.; Locke, D.H.; Morzillo, A.T.; Sonti, N.F.; Trammell, T.L.E.; Aronson, M.F.J.; et al. The influence of urban and agricultural landscape contexts on forest diversity and structure across ecoregions. Ecosphere 2025, 16, e70188.
44. Pretzsch, H. Canopy space filling and tree crown morphology in mixed-species stands compared with monocultures. For. Ecol. Manag. 2014, 327, 251–264.
45. Li, C.; Li, M.; Iizuka, K.; Liu, J.; Chen, K.; Li, Y. Effects of Forest Canopy Structure on Forest Aboveground Biomass Estimation Using Landsat Imagery. IEEE Access 2021, 9, 5285–5295.
46. Yang, Q.; Su, Y.; Hu, T.; Jin, S.; Liu, X.; Niu, C.; Liu, Z.; Kelly, M.; Wei, J.; Guo, Q. Allometry-based estimation of forest aboveground biomass combining LiDAR canopy height attributes and optical spectral indexes. For. Ecosyst. 2022, 9, 100059.
47. Yang, G.; Crowther, T.W.; Lauber, T.; Zohner, C.M.; Smith, G.R. A globally consistent negative effect of edge on aboveground forest biomass. Nat. Ecol. Evol. 2025, 9, 2036–2045.





| Species Group | Stem Biomass Equation | Foliage Biomass Equation | Root Biomass Equation | Reference |
|---|---|---|---|---|
| Pine | $0.0600\,H^{0.7934}D^{1.8005}$ | $0.1377\,D^{1.4872}L^{0.4052}$ | $0.0417\,H^{-0.0780}D^{2.2618}$ | [25] |
| Fir | $0.0647\,H^{0.8959}D^{1.4880}$ | $0.0971\,D^{1.7814}L^{0.0346}$ | $0.0617\,H^{-0.10374}D^{2.115}$ | [25] |
| Hard-wood and broad-leaves 1 | $0.0560\,H^{0.8099}D^{1.8140}$ | $0.0980\,D^{1.6481}L^{0.4610}$ | $0.0549\,H^{0.1068}D^{2.0953}$ | [25] |
| Hard-wood and broad-leaves 2 | $0.0803\,H^{0.7815}D^{1.8056}$ | $0.2860\,D^{1.0968}L^{0.945}$ | $0.2470\,H^{0.1745}D^{1.7954}$ | [25] |
| Soft-wood and broad-leaves | $0.0444\,H^{0.7197}D^{1.7095}$ | $0.0856\,D^{1.22657}L^{0.397}$ | $0.0459\,H^{0.1067}D^{2.0247}$ | [25] |
| Moso bamboos | $0.0398\,H^{0.5778}D^{1.8540}$ | $0.280\,D^{0.8357}L^{0.2740}$ | $0.371\,H^{0.1357}D^{0.9817}$ | [25] |
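
As a worked illustration of how the equations in the table above are applied, the following Python sketch evaluates the three pine equations for a single tree. It assumes that H is tree height, D is diameter at breast height, and L is crown length; neither these meanings nor the units are defined in this excerpt. The function names and the example tree dimensions are hypothetical, and the "total" is simply the sum of the three components listed here.

```python
# Coefficients transcribed from the Pine row of the table above.
# Each component has the power-law form a * X1**b * X2**c.
PINE = {
    "stem":    {"a": 0.0600, "H": 0.7934,  "D": 1.8005},
    "foliage": {"a": 0.1377, "D": 1.4872,  "L": 0.4052},
    "root":    {"a": 0.0417, "H": -0.0780, "D": 2.2618},
}


def component_biomass(coeffs, **dims):
    """Evaluate a * prod(dimension ** exponent) for one biomass component."""
    value = coeffs["a"]
    for name, exponent in coeffs.items():
        if name == "a":
            continue
        value *= dims[name] ** exponent
    return value


def tree_biomass(table, H, D, L):
    """Return each component plus the sum of the listed components."""
    parts = {
        component: component_biomass(coeffs, H=H, D=D, L=L)
        for component, coeffs in table.items()
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    # Hypothetical tree; the values are for illustration only.
    for component, value in tree_biomass(PINE, H=12.0, D=18.0, L=5.0).items():
        print(f"{component}: {value:.2f}")
```

The other species groups can be handled by adding one coefficient dictionary per table row; plot-level biomass would then be the per-tree totals summed and scaled by plot area.
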

| Forest Type | Number of Samples | Range (t/ha) | Mean (t/ha) | Standard Deviation (t/ha) |
|---|---|---|---|---|
| Broad-leaved forest | 26 | 40.0–125.8 | 84.4 | 23.03 |
| Mixed forest | 38 | 44.8–189.8 | 106.7 | 47.52 |
| Coniferous forest | 25 | 17.2–163.9 | 76.7 | 41.38 |
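
The per-type statistics above can be combined into a single mean and standard deviation for all 89 plots, which is useful when reading errors reported for the pooled data set. The Python sketch below does this with the usual decomposition of the total sum of squares into within-group and between-group parts. It assumes the tabulated standard deviations are sample standard deviations (divisor n − 1), which the excerpt does not state; the function name is hypothetical.

```python
import math

# (number of plots, mean t/ha, standard deviation t/ha) from the table above.
GROUPS = {
    "Broad-leaved forest": (26, 84.4, 23.03),
    "Mixed forest": (38, 106.7, 47.52),
    "Coniferous forest": (25, 76.7, 41.38),
}


def pooled_mean_sd(groups):
    """Combine per-group (n, mean, sd) into an overall n, mean, and sd."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total
    # Total sum of squares = within-group part + between-group part.
    within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())
    between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
    pooled_sd = math.sqrt((within + between) / (n_total - 1))
    return n_total, grand_mean, pooled_sd


if __name__ == "__main__":
    n, mean, sd = pooled_mean_sd(GROUPS)
    print(f"n = {n}, mean = {mean:.1f} t/ha, sd = {sd:.1f} t/ha")
```

A pooled standard deviation obtained this way gives a simple baseline: a model that always predicts the overall mean would have an RMSE close to it.
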

| Forest Type | Machine Learning Method | RMSE (t/ha) (mean ± SD) | MAE (t/ha) (mean ± SD) | R² (mean ± SD) |
|---|---|---|---|---|
| All forests | GPR | 41.90 ± 4.01 | 34.59 ± 2.94 | 0.06 ± 0.07 |
| All forests | SVR | 42.48 ± 4.19 | 34.34 ± 2.95 | 0.06 ± 0.06 |
| All forests | RF | 42.53 ± 4.15 | 34.87 ± 3.20 | 0.06 ± 0.09 |
| All forests | MLPNN | 45.02 ± 9.42 | 36.49 ± 8.50 | 0.06 ± 0.08 |
| Broad-leaved forest | SVR | 20.87 ± 5.45 | 16.61 ± 4.27 | 0.33 ± 0.27 |
| Broad-leaved forest | GPR | 21.46 ± 5.63 | 16.93 ± 4.60 | 0.32 ± 0.27 |
| Broad-leaved forest | RF | 22.41 ± 5.24 | 17.85 ± 4.36 | 0.24 ± 0.26 |
| Broad-leaved forest | MLPNN | 24.76 ± 6.53 | 20.38 ± 5.96 | 0.26 ± 0.24 |
| Mixed forest | SVR | 47.40 ± 5.37 | 40.58 ± 5.41 | 0.13 ± 0.16 |
| Mixed forest | RF | 48.75 ± 5.58 | 42.01 ± 5.88 | 0.13 ± 0.16 |
| Mixed forest | GPR | 49.05 ± 5.84 | 42.50 ± 6.28 | 0.13 ± 0.18 |
| Mixed forest | MLPNN | 51.00 ± 11.86 | 43.85 ± 11.23 | 0.14 ± 0.15 |
| Coniferous forest | RF | 34.27 ± 7.34 | 29.79 ± 6.73 | 0.48 ± 0.25 |
| Coniferous forest | SVR | 35.53 ± 9.67 | 29.93 ± 8.06 | 0.37 ± 0.26 |
| Coniferous forest | GPR | 35.73 ± 8.38 | 30.82 ± 7.38 | 0.42 ± 0.29 |
| Coniferous forest | MLPNN | 45.61 ± 9.02 | 37.97 ± 7.55 | 0.46 ± 0.22 |
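
To make the "mean ± SD" entries above concrete, the following Python sketch shows one way such figures can arise from five-fold cross-validation: each fold is held out once, RMSE, MAE, and R² are computed on it, and the fold scores are summarised. This is not the authors' code. The data are synthetic, a k-nearest-neighbour regressor stands in for the RF, SVR, MLPNN, and GPR models, the 64-value feature vectors are an assumption about the embedding length, and R² is taken as the coefficient of determination because the excerpt does not say which definition was used. All function names are hypothetical.

```python
import math
import random
import statistics


def rmse(truth, pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def mae(truth, pred):
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def r2(truth, pred):
    # Coefficient of determination: 1 - SS_res / SS_tot (assumed definition).
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, k=3):
    """Stand-in regressor: mean target of the k nearest training plots."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return sum(train_y[i] for i in order[:k]) / k


def cross_validate(features, target, k=5, seed=0):
    """Return {metric: (mean, sd)} over the k held-out folds."""
    scores = {"rmse": [], "mae": [], "r2": []}
    for fold in k_fold_indices(len(features), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(features)) if i not in held_out]
        train_x = [features[i] for i in train]
        train_y = [target[i] for i in train]
        pred = [knn_predict(train_x, train_y, features[j]) for j in fold]
        truth = [target[j] for j in fold]
        scores["rmse"].append(rmse(truth, pred))
        scores["mae"].append(mae(truth, pred))
        scores["r2"].append(r2(truth, pred))
    return {m: (statistics.mean(v), statistics.stdev(v)) for m, v in scores.items()}


def make_synthetic_plots(n=89, dims=64, seed=42):
    """Synthetic stand-in for plot embeddings and biomass (not real data)."""
    rng = random.Random(seed)
    features = [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(n)]
    target = [100 + 20 * row[0] + rng.gauss(0, 10) for row in features]
    return features, target


if __name__ == "__main__":
    X, y = make_synthetic_plots()
    for metric, (mean, sd) in cross_validate(X, y).items():
        print(f"{metric}: {mean:.2f} ± {sd:.2f}")
```

With fewer than 40 plots per forest type, each held-out fold contains only a handful of plots, which is one reason fold-to-fold standard deviations of R² can be large relative to the mean.
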
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Jin, C.; Jiang, X.; Wen, L.; Wu, C.; Xu, X.; Jiao, J. Assessing the Utility of Satellite Embedding Features for Biomass Prediction in Subtropical Forests with Machine Learning. Remote Sens. 2026, 18, 436. https://doi.org/10.3390/rs18030436

