# Above-Ground Biomass Prediction for Croplands at a Sub-Meter Resolution Using UAV–LiDAR and Machine Learning Methods

## Abstract

m$^{2}$). An extremely randomized trees (ERT) regressor was selected for the regression analysis, based on its predictive performance for the first year’s growing season. The model was retrained using previously identified hyperparameters to predict the AGB of the crops in the second year. The ERT performed AGB estimation using height and reflectance metrics from LiDAR-derived point cloud data and achieved a prediction performance of ${R}^{2}$ = 0.48 at a spatial resolution of 0.35 m$^{2}$. The prediction performance improved significantly when adjacent predictions were aggregated (${R}^{2}$ = 0.71 and ${R}^{2}$ = 0.93 at spatial resolutions of 1 m$^{2}$ and 2 m$^{2}$, respectively), because the individual errors averaged out and the aggregated predictions converged to the reference biomass values. The AGB prediction results were examined as a function of predictor type, training set size, sampling resolution, phenology, and canopy density. The results demonstrated that, when combined with ML regression methods, the UAV–LiDAR method could be used to provide accurate real-time AGB prediction for crop fields at a high resolution, thereby providing a way to map their biochemical constituents.

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Study Area

m$^{2}$). The ploughing layer (30 cm deep) sits on a sandy loam with pebble inclusions that are around 3–5 cm in diameter. The water table lies 5.5 ± 1 m below ground. The two crops are shown in Figure 2c,d.

#### 2.2. Instrumentation, Flight Parameters, and Field Measurements

m$^{2}$), and field coverage. The UAV–LiDAR surveys followed the calendar and frequency of the AGB sampling campaigns (Figure 2a,b). The time difference between a UAV–LiDAR survey and the corresponding AGB collection was at most 48 h, which occurred in case of adverse weather conditions.

#### 2.3. LiDAR Data Processing

#### 2.3.1. Point Cloud Scene Generation

#### 2.3.2. Point Cloud Scene Processing

- The metrics of height were the mean, median, standard deviation, variance, skewness, and kurtosis;
- The metrics of reflectance were the mean and standard deviation.
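The two groups of metrics listed above can be computed per sample from the heights and reflectance values of its LiDAR returns. The following is a minimal sketch that uses only the Python standard library. The function name, the dictionary keys, the input values, and the use of population (rather than sample) moments are assumptions made for illustration and are not taken from the study.

```python
import math
import statistics


def height_reflectance_metrics(heights, reflectances):
    """Summarise one sample's LiDAR returns as eight predictors.

    heights: per-return height above ground (m).
    reflectances: per-return reflectance values.
    Population moments (division by n) are an assumption of this sketch.
    """
    n = len(heights)
    mean_h = statistics.fmean(heights)
    var_h = statistics.pvariance(heights, mu=mean_h)
    std_h = math.sqrt(var_h)
    # Standardised third and fourth central moments
    # (skewness and non-excess kurtosis).
    if std_h > 0:
        skew_h = sum((h - mean_h) ** 3 for h in heights) / (n * std_h ** 3)
        kurt_h = sum((h - mean_h) ** 4 for h in heights) / (n * std_h ** 4)
    else:
        skew_h = kurt_h = float("nan")
    return {
        "h_mean": mean_h,
        "h_median": statistics.median(heights),
        "h_std": std_h,
        "h_var": var_h,
        "h_skew": skew_h,
        "h_kurt": kurt_h,
        "r_mean": statistics.fmean(reflectances),
        "r_std": statistics.pstdev(reflectances),
    }


# Hypothetical returns for a single sample (made-up values).
metrics = height_reflectance_metrics(
    heights=[0.10, 0.42, 0.55, 0.61, 0.70, 0.74],
    reflectances=[12.0, 18.0, 20.0, 22.0, 25.0, 23.0],
)
```

In this toy sample, the single low return at 0.10 m pulls the height distribution to the left, so the skewness is negative.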

#### 2.3.3. Data Post-Processing

#### 2.4. Datasets

#### 2.5. ML Model Training and Evaluation

#### 2.6. Description of the Selected ML Model

#### 2.7. Generation of AGB Prediction Maps

m$^{2}$ resolution. These prediction feature maps were used as the input for the trained regression model in order to produce the AGB prediction maps (Figure 8).
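The mapping step can be pictured as gridding the LiDAR returns, computing features per grid cell, and passing each cell's features to the trained regressor. The sketch below is a simplified illustration and not the pipeline used in the study. The function `predict_agb_map`, the `cell_size` parameter, the two-feature vector, and the placeholder `toy_model` are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import fmean


def predict_agb_map(points, cell_size, predict):
    """Rasterise LiDAR returns and apply a regression model cell by cell.

    points: iterable of (x, y, height, reflectance) tuples.
    cell_size: grid spacing in metres (hypothetical parameter).
    predict: callable mapping a feature list to AGB in g/m^2; it stands in
             for the trained regressor.
    """
    cells = defaultdict(list)
    for x, y, h, r in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((h, r))

    agb_map = {}
    for key, returns in cells.items():
        heights = [h for h, _ in returns]
        reflectances = [r for _, r in returns]
        # Two features only, for brevity; a real predictor set is longer.
        features = [fmean(heights), fmean(reflectances)]
        agb_map[key] = predict(features)
    return agb_map


def toy_model(features):
    # Placeholder linear rule, NOT the trained model from the study.
    mean_height, _mean_reflectance = features
    return 1500.0 * mean_height


# Made-up returns: two fall in cell (0, 0) and one in cell (1, 0).
points = [(0.1, 0.1, 0.4, 20.0), (0.2, 0.3, 0.6, 22.0), (0.7, 0.1, 0.8, 25.0)]
agb_map = predict_agb_map(points, cell_size=0.5, predict=toy_model)
```

Keeping the model behind a plain callable means the gridding code does not depend on any particular regression library.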

#### 2.8. Uncertainty of AGB Field-Based Measurements

## 3. Results

#### 3.1. Model Selection

#### 3.2. AGB Prediction at a Sub-Meter Resolution

The AGB predictions at a 0.35 m$^{2}$ spatial resolution that were produced by the ERT model using the barley and wheat testing datasets are shown in Figure 10a and Figure 11a, respectively. In both cases, the results were compared to those of the linear regression model, which was fitted to the same datasets (i.e., fitted to 70% of the 2021 data and used to predict the remaining 30%). The predictive performance of this baseline model is shown in Figure 10b and Figure 11b for comparison.
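The baseline comparison relies on a 70%/30% split followed by a linear fit. As a rough sketch of that procedure, the code below splits synthetic samples and fits an ordinary least-squares line with a single predictor. The single predictor (mean canopy height), the random shuffling, the helper names, and all numbers are assumptions for illustration. The stratification used in the study is not reproduced here.

```python
import random
from statistics import fmean


def split_70_30(samples, seed=0):
    """Shuffle and split samples into 70% training and 30% testing subsets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic (mean height in m, AGB in g/m^2) pairs; the values are made up.
rng = random.Random(42)
samples = []
for _ in range(100):
    h = rng.uniform(0.1, 1.0)
    samples.append((h, 1500.0 * h + rng.gauss(0.0, 100.0)))

train, test = split_70_30(samples)
intercept, slope = fit_line([h for h, _ in train], [a for _, a in train])
test_pred = [intercept + slope * h for h, _ in test]
```

The held-out predictions in `test_pred` can then be scored with the same metrics as the ERT model, so that both models are judged on identical samples.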

#### 3.3. Aggregated AGB Predictions

m$^{2}$; wheat: −18.5 g/m$^{2}$). The residuals were independent and distributed around zero, so they were partially canceled out when averaged over a larger area, which caused the $\overline{{R}^{2}}$ score to converge to 1 as the number of aggregate predictions increased (Figure 12b and Figure 13b). Thus, the prediction could be improved by coarsening the spatial resolution (Table 4). This technique is known as the spatial averaging of errors [87].
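The effect of spatial averaging can be reproduced with a small simulation. The sketch below is a toy setting with synthetic numbers, not data from the study. It assumes that adjacent samples share a common plot-level AGB, so that the spread of the reference values survives aggregation while the independent, zero-mean errors shrink. If the reference values of adjacent samples were themselves independent, their variance would shrink at the same rate as the errors and ${R}^{2}$ would not improve.

```python
import random
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def aggregate(values, k):
    """Average consecutive groups of k values (adjacent samples)."""
    return [fmean(values[i:i + k]) for i in range(0, len(values) - k + 1, k)]


rng = random.Random(0)
k = 6  # samples per aggregate (hypothetical choice)
truth, pred = [], []
for _ in range(300):
    # Adjacent samples share a plot-level AGB; all numbers are synthetic.
    plot_agb = rng.uniform(200.0, 2000.0)
    for _ in range(k):
        t = plot_agb + rng.gauss(0.0, 100.0)
        truth.append(t)
        pred.append(t + rng.gauss(0.0, 400.0))  # independent zero-mean error

r2_single = r2_score(truth, pred)
r2_aggregated = r2_score(aggregate(truth, k), aggregate(pred, k))
```

Under these assumptions the error variance of an aggregate of `k` predictions is roughly the single-sample error variance divided by `k`, which is why `r2_aggregated` exceeds `r2_single`.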

m$^{2}$ resolution for both the barley and wheat datasets improved significantly compared to the respective AGB predictions at a 0.35 m$^{2}$ resolution (Table 4 and Figure 12b and Figure 13b).

m$^{2}$ and 2 m$^{2}$ of spatial resolution (i.e., three and six aggregate samples, respectively), reaching optimal predictions (${R}^{2}\approx 0.95$) from 3 m$^{2}$ onward for the barley crops (Figure 12b) and from 4 m$^{2}$ onward for the wheat crops (Figure 13b). At 2 m$^{2}$ of spatial resolution, the ${R}^{2}$ reached 0.93 and 0.89 for the barley and wheat testing datasets, respectively (Table 4 and Figure 12b.1 and Figure 13b.1).

## 4. Discussion

#### 4.1. AGB Prediction

m$^{2}$) [53], Ma et al. (12 m$^{2}$) [56] and Zha et al. (160–300 m$^{2}$) [54].

m$^{2}$ resolution when predicting AGB for the same crop species that was used for training and when predicting AGB for a different crop species, respectively. The ERT regression model achieved valid predictions in both cases, and its predictions outperformed those of the linear regression model that was used as the comparative baseline (Figure 10b and Figure 11b). However, as expected, the prediction results for a different crop species (i.e., training the model with the barley dataset and predicting the AGB for wheat crops) were not as accurate as those for the same species. The higher accuracy of the predictions for the same crop species also meant that the aggregated prediction required fewer aggregated samples to converge in the barley set (Figure 12b) than in the wheat set (Figure 13b). The performance reduction when training and predicting for different species could be attributed to (i) the different plant-level AGB distributions of the two crop species, as captured by the AGB labels (Figure 2), and (ii) the different morphology of the canopy structures, as portrayed by the LiDAR-derived predictors. These differences between the two crops were retained in the datasets, which caused a dataset shift effect [88]. This shift challenged the accuracy of the AGB predictions, since the joint distributions of the predictors and the target AGB values differed between the training and testing phases.
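One simple way to check whether a predictor is distributed differently in the training and testing sets is the two-sample Kolmogorov-Smirnov statistic, i.e., the largest gap between the two empirical cumulative distribution functions. The sketch below is an illustrative diagnostic and is not described in the study. The function name and the height values are hypothetical, and the statistic only examines one marginal distribution at a time rather than the joint distribution of predictors and targets.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The ECDFs are step functions, so checking every data point suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Made-up mean canopy heights (m) for a training and a testing crop.
train_heights = [0.55, 0.60, 0.62, 0.66, 0.70, 0.71]
test_heights = [0.72, 0.78, 0.80, 0.85, 0.90, 0.95]
shift = ks_statistic(train_heights, test_heights)
```

A value near 0 indicates similar distributions, while a value near 1 (as in this deliberately disjoint example) indicates that the model is being asked to predict outside the range it was trained on.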

#### 4.2. Aggregation of AGB Predictions

m$^{2}$), while the 1 m$^{2}$ resolution was found to be the optimal trade-off between spatial resolution and prediction performance for capturing local variations in AGB (Table 4 and Figure 12b.1 and Figure 13b.1). In terms of the sampling resolution, the datasets that were composed of smaller samples (i.e., 0.5 m × 0.34 m) showed lower correlations between the height-related predictors and AGB than the same datasets after the aggregation of two or three samples (1 m × 0.34 m and 1.5 m × 0.34 m, respectively). The AGB samples with a very high spatial resolution (e.g., 0.175 m$^{2}$) could suffer from poor PCD representation (i.e., low counts of LiDAR returns), thereby compromising the reliability of the statistics that were extracted. Additionally, the datasets that had small sample sizes presented higher variance in the AGB ground truth values (i.e., the measurements at a 0.175 m$^{2}$ resolution produced noisier labels).

#### 4.3. Applied ML Methods

## 5. Conclusions

m$^{2}$. However, aggregating the individual predictions showed that the prediction performance could be increased significantly by coarsening the spatial resolution, because the prediction errors were statistically independent. At a spatial resolution of 2 m$^{2}$, the regression performance achieved ${R}^{2}$ = 0.93, RMSE = 300 g/m$^{2}$, MAE = 266 g/m$^{2}$, and MAPE = 13% when the training and testing datasets corresponded to the same crop species. The aggregated AGB predictions also achieved reasonable results at a 2 m$^{2}$ spatial resolution when the model was trained on one crop type and tested on another: ${R}^{2}$ = 0.89, RMSE = 400 g/m$^{2}$, MAE = 351 g/m$^{2}$, and MAPE = 16%. This slight reduction in the prediction performance was explained by the differences between (i) the canopy structures and (ii) the plant-level biomass distributions of the two crop species under consideration. We encourage the continuation of protocolized field-based data collection campaigns, as well as UAV–LiDAR surveying, as a means to gain valuable data that are representative of crops under different environmental conditions, in order to develop AGB regression models whose predictions are even more robust to inter-annual and inter-species variations.
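For reference, the four scores reported above follow their usual definitions. The sketch below computes them from paired reference and predicted values. The function name is hypothetical and the example numbers are made up, so they do not reproduce the values reported in this study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MAPE (in percent) for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when a reference value is zero.
        "mape": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Made-up AGB values in g/m^2, not data from the study.
scores = regression_metrics(
    y_true=[500.0, 1000.0, 1500.0, 2000.0],
    y_pred=[600.0, 900.0, 1600.0, 1900.0],
)
```

RMSE and MAE share the units of the target (here g/m$^{2}$), whereas ${R}^{2}$ and MAPE are unitless, which makes the latter two easier to compare across crops with different biomass ranges.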

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Definition |
|---|---|
| AGB | Above-Ground Biomass |
| dGPS | Differential Global Positioning System |
| DTM | Digital Terrain Model |
| ERT | Extremely Randomized Trees |
| FoV | Field of View |
| ICOS | Integrated Carbon Observation System |
| LiDAR | Light Detection and Ranging |
| MAE | Mean Absolute Error |
| MAPE | Mean Absolute Percentage Error |
| ML | Machine Learning |
| PCD | Point Cloud Data |
| RMSE | Root Mean Square Error |
| RS | Remote Sensing |
| RTK | Real-Time Kinematic |
| SAR | Synthetic Aperture Radar |
| UAV | Unstaffed Aerial Vehicle |

## Appendix A

m$^{2}$ each (see Figure A1). This extra study contrasted with the main field study, for which sub-meter samples were taken exclusively. To calculate the variance, each of the 18 sub-meter samples was upscaled to g/m$^{2}$ and then subtracted from the total AGB in the 1 m$^{2}$ that was measured at each location. Due to oven capacity limitations, this analysis was conducted with wet AGB, assuming that the variations in water content within 1 m$^{2}$ would be negligible. As expected, a higher variance was found in the datasets that were composed of samples with higher spatial resolutions (i.e., the 2021 datasets) and, within those, the variance was substantially higher for the sparser crop (i.e., wheat) than for the more homogeneous crop (i.e., barley).
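The upscaling and differencing step can be sketched as follows. The masses, the sample area, and the sign convention (upscaled sample minus reference, so that the reference sits at 0 g/m$^{2}$) are assumptions for illustration only.

```python
def upscaled_deviations(sample_masses_g, sample_area_m2, reference_g_per_m2):
    """Upscale each sample mass to g/m^2 and express it relative to the
    AGB measured over the full 1 m^2 reference area."""
    return [m / sample_area_m2 - reference_g_per_m2 for m in sample_masses_g]


# Hypothetical wet masses (g) of three adjacent samples that tile 1 m^2.
masses = [700.0, 800.0, 900.0]
area = 1.0 / 3.0               # m^2 per sample (assumed)
reference = sum(masses) / 1.0  # total wet AGB over the 1 m^2, in g/m^2
deviations = upscaled_deviations(masses, area, reference)
```

With equal-area samples that tile the reference area, the deviations sum to zero, so their spread directly reflects the within-plot variability of the AGB measurements.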

**Figure A1.** The variance of measurements of wet AGB value per sample (upscaled to g/m$^{2}$) with respect to the total wet AGB value in an area of 1 m$^{2}$ (which was set as a reference at 0 g/m$^{2}$): (**a**) the 2020 barley dataset (sample size: 1 × 0.35 m$^{2}$); (**b**) the 2021 wheat and barley datasets (sample size: 0.5 × 0.35 m$^{2}$). In both (**a**,**b**), the solid lines represent the estimation of the kernel probability density.

## References

1. Maimaitijiang, M.; Sagan, V.; Sidike, P.; Daloye, A.M.; Erkbol, H.; Fritschi, F.B. Crop monitoring using satellite/UAV data fusion and machine learning. Remote Sens. **2020**, 12, 1357.
2. Gebbers, R.; Adamchuk, V.I. Precision agriculture and food security. Science **2010**, 327, 828–831.
3. Isbell, F.; Adler, P.R.; Eisenhauer, N.; Fornara, D.; Kimmel, K.; Kremen, C.; Letourneau, D.K.; Liebman, M.; Polley, H.W.; Quijas, S.; et al. Benefits of increasing plant diversity in sustainable agroecosystems. J. Ecol. **2017**, 105, 871–879.
4. Lambin, E.F.; Meyfroidt, P. Global land use change, economic globalization, and the looming land scarcity. Proc. Natl. Acad. Sci. USA **2011**, 108, 3465–3472.
5. Challinor, A.J.; Ewert, F.; Arnold, S.; Simelton, E.; Fraser, E. Crops and climate change: Progress, trends, and challenges in simulating impacts and informing adaptation. J. Exp. Bot. **2009**, 60, 2775–2789.
6. Wang, N.; Wang, E.; Wang, J.; Zhang, J.; Zheng, B.; Huang, Y.; Tan, M. Modelling maize phenology, biomass growth and yield under contrasting temperature conditions. Agric. For. Meteorol. **2018**, 250, 319–329.
7. Raza, A.; Razzaq, A.; Mehmood, S.S.; Zou, X.; Zhang, X.; Lv, Y.; Xu, J. Impact of climate change on crops adaptation and strategies to tackle its outcome: A review. Plants **2019**, 8, 34.
8. Deryng, D.; Elliott, J.; Folberth, C.; Müller, C.; Pugh, T.A.; Boote, K.J.; Conway, D.; Ruane, A.C.; Gerten, D.; Jones, J.W.; et al. Regional disparities in the beneficial effects of rising CO2 concentrations on crop water productivity. Nat. Clim. Chang. **2016**, 6, 786–790.
9. Wang, X.; Zhao, C.; Müller, C.; Wang, C.; Ciais, P.; Janssens, I.; Peñuelas, J.; Asseng, S.; Li, T.; Elliott, J.; et al. Emergent constraint on crop yield response to warmer temperature from field experiments. Nat. Sustain. **2020**, 3, 908–916.
10. Jägermeyr, J.; Müller, C.; Ruane, A.C.; Elliott, J.; Balkovic, J.; Castillo, O.; Faye, B.; Foster, I.; Folberth, C.; Franke, J.A.; et al. Climate impacts on global agriculture emerge earlier in new generation of climate and crop models. Nat. Food **2021**, 2, 873–885.
11. Tully, K.; Ryals, R. Nutrient cycling in agroecosystems: Balancing food and environmental objectives. Agroecol. Sustain. Food Syst. **2017**, 41, 761–798.
12. Abalos, D.; van Groenigen, J.W.; Philippot, L.; Lubbers, I.M.; De Deyn, G.B. Plant trait-based approaches to improve nitrogen cycling in agroecosystems. J. Appl. Ecol. **2019**, 56, 2454–2466.
13. EIT-Food. More Crops Consituents Sensing; EIT-Food: Leuven, Belgium, 2022.
14. Weih, M.; Hamnér, K.; Pourazari, F. Analyzing plant nutrient uptake and utilization efficiencies: Comparison between crops and approaches. Plant Soil **2018**, 430, 7–21.
15. Kumar, L.; Mutanga, O. Remote sensing of above-ground biomass. Remote Sens. **2017**, 9, 935.
16. Huete, A.; Liu, H.; Batchily, K.; Van Leeuwen, W. A comparison of vegetation indices over a global set of TM images for EOS-MODIS. Remote Sens. Environ. **1997**, 59, 440–451.
17. Luckman, A.; Baker, J.; Honzák, M.; Lucas, R. Tropical forest biomass density estimation using JERS-1 SAR: Seasonal variation, confidence limits, and application to image mosaics. Remote Sens. Environ. **1998**, 63, 126–139.
18. Hoekman, D.; Quiñones, M. Land cover type and biomass classification using AirSAR data for evaluation of monitoring scenarios in the Colombian Amazon. IEEE Trans. Geosci. Remote Sens. **2000**, 38, 685–696.
19. Attarchi, S.; Gloaguen, R. Improving the estimation of above ground biomass using dual polarimetric PALSAR and ETM+ data in the Hyrcanian mountain forest (Iran). Remote Sens. **2014**, 6, 3693–3715.
20. Joshi, N.P.; Mitchard, E.T.; Schumacher, J.; Johannsen, V.K.; Saatchi, S.; Fensholt, R. L-band SAR backscatter related to forest cover, height and aboveground biomass at multiple spatial scales across Denmark. Remote Sens. **2015**, 7, 4442–4472.
21. Vaglio Laurin, G.; Pirotti, F.; Callegari, M.; Chen, Q.; Cuozzo, G.; Lingua, E.; Notarnicola, C.; Papale, D. Potential of ALOS2 and NDVI to estimate forest above-ground biomass, and comparison with lidar-derived estimates. Remote Sens. **2016**, 9, 18.
22. Viergever, K.M. Establishing the Sensitivity of Synthetic Aperture Radar to Above-Ground Biomass in Wooded Savannas. Ph.D. Thesis, The University of Edinburgh, Edinburgh, UK, 2008.
23. Michelakis, D.; Stuart, N.; Lopez, G.; Linares, V.; Woodhouse, I.H. Local-scale mapping of biomass in tropical lowland pine savannas using ALOS PALSAR. Forests **2014**, 5, 2377–2399.
24. Houborg, R.; McCabe, M.F. High-Resolution NDVI from planet’s constellation of earth observing nano-satellites: A new data source for precision agriculture. Remote Sens. **2016**, 8, 768.
25. Deng, L.; Mao, Z.; Li, X.; Hu, Z.; Duan, F.; Yan, Y. UAV-based multispectral remote sensing for precision agriculture: A comparison between different cameras. ISPRS J. Photogramm. Remote Sens. **2018**, 146, 124–136.
26. Bastin, J.F.; Barbier, N.; Couteron, P.; Adams, B.; Shapiro, A.; Bogaert, J.; De Cannière, C. Aboveground biomass mapping of African forest mosaics using canopy texture analysis: Toward a regional approach. Ecol. Appl. **2014**, 24, 1984–2001.
27. Ploton, P.; Barbier, N.; Couteron, P.; Antin, C.; Ayyappan, N.; Balachandran, N.; Barathan, N.; Bastin, J.F.; Chuyong, G.; Dauby, G.; et al. Toward a general tropical forest biomass prediction model from very high resolution optical satellite images. Remote Sens. Environ. **2017**, 200, 140–153.
28. Hlatshwayo, S.T.; Mutanga, O.; Lottering, R.T.; Kiala, Z.; Ismail, R. Mapping forest aboveground biomass in the reforested Buffelsdraai landfill site using texture combinations computed from SPOT-6 pan-sharpened imagery. Int. J. Appl. Earth Obs. Geoinf. **2019**, 74, 65–77.
29. Yue, J.; Yang, G.; Tian, Q.; Feng, H.; Xu, K.; Zhou, C. Estimate of winter-wheat above-ground biomass based on UAV ultrahigh-ground-resolution image textures and vegetation indices. ISPRS J. Photogramm. Remote Sens. **2019**, 150, 226–244.
30. Saatchi, S.; Marlier, M.; Chazdon, R.L.; Clark, D.B.; Russell, A.E. Impact of spatial variability of tropical forest structure on radar estimation of aboveground biomass. Remote Sens. Environ. **2011**, 115, 2836–2849.
31. Zolkos, S.G.; Goetz, S.J.; Dubayah, R. A meta-analysis of terrestrial aboveground biomass estimation using lidar remote sensing. Remote Sens. Environ. **2013**, 128, 289–298.
32. Calders, K.; Adams, J.; Armston, J.; Bartholomeus, H.; Bauwens, S.; Bentley, L.P.; Chave, J.; Danson, F.M.; Demol, M.; Disney, M.; et al. Terrestrial laser scanning in forest ecology: Expanding the horizon. Remote Sens. Environ. **2020**, 251, 112102.
33. Bates, J.S.; Montzka, C.; Schmidt, M.; Jonard, F. Estimating canopy density parameters time-series for winter wheat using UAS Mounted LiDAR. Remote Sens. **2021**, 13, 710.
34. Ferraz, A.; Saatchi, S.; Mallet, C.; Meyer, V. Lidar detection of individual tree size in tropical forests. Remote Sens. Environ. **2016**, 183, 318–333.
35. Morsdorf, F.; Eck, C.; Zgraggen, C.; Imbach, B.; Schneider, F.D.; Kükenbrink, D. UAV-based LiDAR acquisition for the derivation of high-resolution forest and ground information. Lead. Edge **2017**, 36, 566–570.
36. Schneider, F.D.; Morsdorf, F.; Schmid, B.; Petchey, O.L.; Hueni, A.; Schimel, D.S.; Schaepman, M.E. Mapping functional diversity from remotely sensed morphological and physiological forest traits. Nat. Commun. **2017**, 8, 1441.
37. Schneider, F.D.; Kükenbrink, D.; Schaepman, M.E.; Schimel, D.S.; Morsdorf, F. Quantifying 3D structure and occlusion in dense tropical and temperate forests using close-range LiDAR. Agric. For. Meteorol. **2019**, 268, 249–257.
38. Kükenbrink, D.; Schneider, F.D.; Schmid, B.; Gastellu-Etchegorry, J.P.; Schaepman, M.E.; Morsdorf, F. Modelling of three-dimensional, diurnal light extinction in two contrasting forests. Agric. For. Meteorol. **2021**, 296, 108230.
39. Jin, X.; Kumar, L.; Li, Z.; Xu, X.; Yang, G.; Wang, J. Estimation of winter wheat biomass and yield by combining the aquacrop model and field hyperspectral data. Remote Sens. **2016**, 8, 972.
40. Gastellu-Etchegorry, J.P.; Lauret, N.; Yin, T.; Landier, L.; Kallel, A.; Malenovskỳ, Z.; Al Bitar, A.; Aval, J.; Benhmida, S.; Qi, J.; et al. DART: Recent advances in remote sensing data modeling with atmosphere, polarization, and chlorophyll fluorescence. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. **2017**, 10, 2640–2649.
41. Demol, M.; Calders, K.; Verbeeck, H.; Gielen, B. Forest above-ground volume assessments with terrestrial laser scanning: A ground-truth validation experiment in temperate, managed forests. Ann. Bot. **2021**, 128, 805–819.
42. Sofonia, J.; Shendryk, Y.; Phinn, S.; Roelfsema, C.; Kendoul, F.; Skocaj, D. Monitoring sugarcane growth response to varying nitrogen application rates: A comparison of UAV SLAM LiDAR and photogrammetry. Int. J. Appl. Earth Obs. Geoinf. **2019**, 82, 101878.
43. Longfei, Z.; Xiaohe, G.; Shu, C.; Guijun, Y.; Meiyan, S.; Quian, S. Analysis of Plant Height Changes of Lodged Maize Using UAV-LiDAR Data. Agriculture **2020**, 10, 146.
44. Trepekli, K.; Friborg, T. Deriving Aerodynamic Roughness Length at Ultra-High Resolution in Agricultural Areas Using UAV-Borne LiDAR. Remote Sens. **2021**, 13, 3538.
45. Bendig, J.; Yu, K.; Aasen, H.; Bolten, A.; Bennertz, S.; Broscheit, J.; Gnyp, M.L.; Bareth, G. Combining UAV-based plant height from crop surface models, visible, and near infrared vegetation indices for biomass monitoring in barley. Int. J. Appl. Earth Obs. Geoinf. **2015**, 39, 79–87.
46. Yang, G.; Liu, J.; Zhao, C.; Li, Z.; Huang, Y.; Yu, H.; Xu, B.; Yang, X.; Zhu, D.; Zhang, X.; et al. Unmanned aerial vehicle remote sensing for field-based crop phenotyping: Current status and perspectives. Front. Plant Sci. **2017**, 8, 1111.
47. Lu, N.; Zhou, J.; Han, Z.; Li, D.; Cao, Q.; Yao, X.; Tian, Y.; Zhu, Y.; Cao, W.; Cheng, T. Improved estimation of aboveground biomass in wheat from RGB imagery and point cloud data acquired with a low-cost unmanned aerial vehicle system. Plant Methods **2019**, 15, 17.
48. Pan, L.; Liu, L.; Condon, A.G.; Estavillo, G.M.; Coe, R.A.; Bull, G.; Stone, E.A.; Petersson, L.; Rolland, V. Biomass Prediction With 3D Point Clouds From LiDAR. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 1330–1340.
49. Oehmcke, S.; Li, L.; Revenga, J.; Nord-Larsen, T.; Trepekli, K.; Gieseke, F.; Igel, C. Deep Learning Based 3D Point Cloud Regression for Estimating Forest Biomass. arXiv **2021**, arXiv:2112.11335.
50. Forrester, D.I.; Tachauer, I.H.H.; Annighoefer, P.; Barbeito, I.; Pretzsch, H.; Ruiz-Peinado, R.; Stark, H.; Vacchiano, G.; Zlatanov, T.; Chakraborty, T.; et al. Generalized biomass and leaf area allometric equations for European tree species incorporating stand structure, tree age and climate. For. Ecol. Manag. **2017**, 396, 160–175.
51. Herold, A.; Zell, J.; Rohner, B.; Didion, M.; Thürig, E.; Rösler, E. State and change of forest resources. In Swiss National Forest Inventory–Methods and Models of the Fourth Assessment; Springer: Berlin/Heidelberg, Germany, 2019; pp. 205–230.
52. Shendryk, Y.; Sofonia, J.; Garrard, R.; Rist, Y.; Skocaj, D.; Thorburn, P. Fine-scale prediction of biomass and leaf nitrogen content in sugarcane using UAV LiDAR and multispectral imaging. Int. J. Appl. Earth Obs. Geoinf. **2020**, 92, 102177.
53. Han, L.; Yang, G.; Dai, H.; Xu, B.; Yang, H.; Feng, H.; Li, Z.; Yang, X. Modeling maize above-ground biomass based on machine learning approaches using UAV remote-sensing data. Plant Methods **2019**, 15, 10.
54. Zha, H.; Miao, Y.; Wang, T.; Li, Y.; Zhang, J.; Sun, W.; Feng, Z.; Kusnierek, K. Improving unmanned aerial vehicle remote sensing-based rice nitrogen nutrition index prediction with machine learning. Remote Sens. **2020**, 12, 215.
55. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Beier, C.M.; Klimkowski, D.J.; Volk, T.A. comparison of machine and deep learning methods to estimate shrub willow biomass from UAS imagery. Can. J. Remote Sens. **2021**, 47, 209–227.
56. Ma, J.; Li, Y.; Chen, Y.; Du, K.; Zheng, F.; Zhang, L.; Sun, Z. Estimating above ground biomass of winter wheat at early growth stages using digital images and deep convolutional neural network. Eur. J. Agron. **2019**, 103, 117–129.
57. Danish Ministry of Environment, Government of Denmark. Order on the Use of Fertilisers by Agriculture for the 2020/2021 Planning Period. Available online: https://www.retsinformation.dk/eli/lta/2020/1166 (accessed on 25 October 2021).
58. Jensen, R.; Herbst, M.; Friborg, T. Direct and indirect controls of the interannual variability in atmospheric CO2 exchange of three contrasting ecosystems in Denmark. Agric. For. Meteorol. **2017**, 233, 12–31.
59. Davidson, L.; Mills, J.; Haynes, I.; Augarde, C.; Bryan, P.; Douglas, M. Airborne to UAS LiDAR: An analysis of UAS LiDAR ground control targets. In Proceedings of the ISPRS Geospatial Week 2019, Enschede, The Netherlands, 10–14 June 2019.
60. Jutzi, B.; Eberle, B.; Stilla, U. Estimation and measurement of backscattered signals from pulsed laser radar. In Image and Signal Processing for Remote Sensing VIII; SPIE: New York, NY, USA, 2003; Volume 4885, pp. 256–267.
61. Gielen, B.; Acosta, M.; Altimir, N.; Buchmann, N.; Cescatti, A.; Ceschia, E.; Fleck, S.; Hortnagal, L.; Klumpp, K.; Kolari, P.; et al. Ancillary vegetation measurements at ICOS ecosystem stations. Int. Agrophys. **2018**, 32, 645–664.
62. Sechidis, K.; Tsoumakas, G.; Vlahavas, I. On the stratification of multi-label data. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2011; pp. 145–158.
63. Meier, U. Growth Stages of Mono-and Dicotyledonous Plants; Blackwell Wissenschafts-Verlag: Berlin, Germany, 1997.
64. Kuester, T.; Spengler, D.; Barczi, J.F.; Segl, K.; Hostert, P.; Kaufmann, H. Simulation of multitemporal and hyperspectral vegetation canopy bidirectional reflectance using detailed virtual 3-D canopy models. IEEE Trans. Geosci. Remote Sens. **2013**, 52, 2096–2108.
65. Hartigan, J.A. Clustering Algorithms; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1975.
66. Bock, H.H. Clustering methods: A history of k-means algorithms. In Selected Contributions in Data Analysis and Classification; Springer: Berlin/Heidelberg, Germany, 2007; pp. 161–172.
67. Owen, A.B. A robust hybrid of lasso and ridge regression. Contemp. Math. **2007**, 443, 59–72.
68. Huber, P.J. Robust statistics. In International Encyclopedia of Statistical Science; Springer: Berlin/Heidelberg, Germany, 2011; pp. 1248–1251.
69. Morsdorf, F.; Meier, E.; Kötz, B.; Itten, K.I.; Dobbertin, M.; Allgöwer, B. LIDAR-based geometric reconstruction of boreal type forest stands at single tree level for forest and wildland fire management. Remote Sens. Environ. **2004**, 92, 353–362.
70. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. **2006**, 63, 3–42.
71. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K. Xgboost: Extreme gradient boosting. R Package Version 0.4-2 **2015**, 1, 1–4.
72. Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing **2020**, 415, 295–316.
73. Feurer, M.; Hutter, F. Hyperparameter optimization. In Automated Machine Learning; Springer: Cham, Switzerland, 2019; pp. 3–33.
74. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. **2001**, 29, 1189–1232.
75. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
76. Breiman, L. Random forests. Mach. Learn. **2001**, 45, 5–32.
77. Vosselman, G. Slope based filtering of laser altimetry data. Int. Arch. Photogramm. Remote Sens. **2000**, 33, 935–942.
78. Zhang, W.; Qi, J.; Wan, P.; Wang, H.; Xie, D.; Wang, X.; Yan, G. An easy-to-use airborne LiDAR data filtering method based on cloth simulation. Remote Sens. **2016**, 8, 501.
79. Zhao, X.; Guo, Q.; Su, Y.; Xue, B. Improved progressive TIN densification filtering algorithm for airborne LiDAR data in forested areas. ISPRS J. Photogramm. Remote Sens. **2016**, 117, 79–91.
80. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013.
81. GreenValley International, Ltd. LiDAR360; GreenValley International, Ltd.: Berkeley, CA, USA, 2021.
82. Longley, P.A.; Goodchild, M.F.; Maguire, D.J.; Rhind, D.W. Geographic Information Systems and Science; John Wiley & Sons: Hoboken, NJ, USA, 2005.
83. Burrough, P.A.; McDonnell, R.A.; Lloyd, C.D. Principles of Geographical Information Systems; Oxford University Press: Oxford, UK, 2015.
84. Beutel, A.; Mølhave, T.; Agarwal, P.K. Natural neighbor interpolation based grid DEM construction using a GPU. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 172–181.
85. Walter, J.D.; Edwards, J.; McDonald, G.; Kuchel, H. Estimating biomass and canopy height with LiDAR for field crop breeding. Front. Plant Sci. **2019**, 10, 1145.
86. Chen, Q.; Laurin, G.V.; Valentini, R. Uncertainty of remotely sensed aboveground biomass over an African tropical forest: Propagating errors from trees to plots to pixels. Remote Sens. Environ. **2015**, 160, 134–143.
87. Goetz, S.; Dubayah, R. Advances in remote sensing technology and implications for measuring and monitoring forest carbon stocks and change. Carbon Manag. **2011**, 2, 231–244.
88. Quiñonero-Candela, J.; Sugiyama, M.; Schwaighofer, A.; Lawrence, N.D. Dataset Shift in Machine Learning; MIT Press: Cambridge, MA, USA, 2008.
89. Genuer, R.; Poggi, J.M.; Tuleau-Malot, C. Variable selection using random forests. Pattern Recognit. Lett. **2010**, 31, 2225–2236.
90. Toloşi, L.; Lengauer, T. Classification with correlated features: Unreliability of feature ranking and solutions. Bioinformatics **2011**, 27, 1986–1994.
91. Gregorutti, B.; Michel, B.; Saint-Pierre, P. Correlation and variable importance in random forests. Stat. Comput. **2017**, 27, 659–678.
92. Zhang, H.; Nettleton, D.; Zhu, Z. Regression-Enhanced Random Forests. In Statistics Conference Proceedings; Presentations and Posters; 2017; Volume 9. Available online: https://dr.lib.iastate.edu/entities/publication/8c7c1d24-a466-4e37-a5c0-7f7405fa867e (accessed on 13 June 2022).

**Figure 1.** The location of the study site (⋆) in Mid-Jutland (DK). The inset shows a top-down view of the field site and the surrounding area. Source: www.icos-cp.eu (accessed on: 8 August 2021) and Google Earth Engine.

**Figure 2.** The crop development and canopy structure: (**a**,**b**) the AGB development during the 2020 (barley) and 2021 (wheat) growing seasons, respectively, with an indication of the dates of the AGB sampling events (see Table 1). The shaded area covers ± the standard deviation; (**c**,**d**) the crop structure at the maturity stage of the barley and wheat, respectively.

**Figure 3.** Instrumentation: (**a**) the UAV (DJI Matrice 600) that was used in this study with the mounted LiDAR system; (**b**) the LiDAR system (Nano M8, LidarSwiss GmbH).

**Figure 4.** The spatial distribution of the AGB sampling locations. Each color indicates one of the original datasets: **red** represents the barley samples that were collected in 2020 (i.e., $\mathit{barley}_{20}$); **blue** represents the wheat samples that were collected in 2021 (i.e., $\mathit{wheat}_{21}$); **white** represents the barley samples that were collected in 2021 (i.e., $\mathit{barley}_{21}$).

**Figure 5.** The point cloud data (PCD) scenes (the crops are shown at the maturity stage and the PCD scenes are colored by elevation): (**a**) the barley field in 2020; (**b**) the wheat field in 2021. In both (**a**,**b**), the upper panels show the cross-sectional views of the PCD, with a buffer depth of 0.5 m. The x, y, and z axes indicate easting, northing, and elevation, respectively. The PCD porosity was higher in (**b**) than in (**a**).

**Figure 6.** The generation of the augmented datasets: (**a**) the partitioning of the individual LiDAR samples into three adjacent parts; (**b**) the three original adjacent AGB samples (individual size: 0.175 m^{2}); (**c**) the shaded area represents each augmented sample (size: 0.35 m^{2}–0.52 m^{2}). The two augmented datasets were produced by sampling (with replacement) the individual original samples and adding either two or three of them together to produce a new instance. Each instance that was produced in this way was assigned the mean AGB value of the individual samples, while the LiDAR-derived features were re-calculated from all of the LiDAR returns contained in the individual samples.
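The augmentation described in the Figure 6 caption can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the `augment` function, the dictionary keys, and the two example features (mean and maximum return height) are hypothetical, and the sketch assumes that "with replacement" means an original sample may be reused across different augmented instances while the parts within a single instance are distinct.

```python
import random
from statistics import mean


def augment(samples, n_new, seed=0):
    """Build augmented instances by pooling two or three original samples.

    samples: list of dicts with "agb" (g/m^2) and "heights" (LiDAR return
    heights in m). Both keys are hypothetical names used for illustration.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_new):
        k = rng.choice([2, 3])  # combine either two or three samples
        # Assumption: parts are distinct within one instance, but the same
        # original sample can appear again in later instances.
        parts = rng.sample(samples, k)
        pooled = [h for p in parts for h in p["heights"]]
        augmented.append({
            "n_parts": k,
            "agb": mean(p["agb"] for p in parts),  # mean AGB of the parts
            # Features are recomputed from all pooled LiDAR returns.
            "mean_height": mean(pooled),
            "max_height": max(pooled),
        })
    return augmented


# Made-up original samples (0.175 m^2 each in the paper's setup).
originals = [
    {"agb": 410.0, "heights": [0.42, 0.55, 0.61]},
    {"agb": 520.0, "heights": [0.58, 0.66]},
    {"agb": 365.0, "heights": [0.35, 0.47, 0.52, 0.49]},
    {"agb": 480.0, "heights": [0.51, 0.63, 0.70]},
]
new_instances = augment(originals, n_new=5)
```

Recomputing the features from the pooled returns, rather than averaging per-sample features, keeps order statistics such as the maximum height consistent with the larger footprint.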

**Figure 7.** The data processing pipeline, model training, and the evaluation of the predictions: (**a**) the 2020 barley dataset was split into training and validation sets. The instances in both sets were stratified according to the mean AGB values from the training distribution. A cross-validated grid search was conducted to optimize the hyperparameters for model selection; (**b**) the ML model with the best performance was fitted to a new dataset (i.e., 70% of either the 2021 barley or wheat datasets) and the final prediction performance was evaluated using the remaining test set (i.e., 30% of the 2021 datasets).

**Figure 8.** The processing pipeline from the input PCD scene to the output AGB prediction map (in g/m^{2} at a 1 m^{2} resolution): (**a**) the PCD scene processing, including binary classification and digital terrain model (DTM) generation via the interpolation of ground returns; (**b**) a normalized point cloud with height values that were relative to the ground was used to produce the prediction feature maps for the metrics of height and reflectance; (**c**) the predictors were input into the trained ML regression model to produce the AGB prediction maps. The example AGB map corresponds to the barley field on 8 July 2021.

**Figure 9.** A comparison of the regression performances of the considered models using the training and validation datasets: (**a**) $\overline{{R}^{2}}$; (**b**) $\overline{RMSE}$; (**c**) $\overline{MAPE}$; (**d**) $\overline{MAE}$. The blue and green horizontal lines represent the performance of the linear regression baseline model using the training and validation sets, respectively. The overlined scores represent the mean values of 10 randomized executions.

**Figure 10.** The AGB predictions for the barley crops at a 0.35 m^{2} resolution that were produced by the ERT model compared to those that were produced by the linear model (baseline): (**a**) the regression performance of the ERT model using the testing dataset (${R}^{2}$ = 0.48; RMSE = 207 g/m^{2}; MAE = 162 g/m^{2}; MAPE = 42%); (**b**) the regression performance of the linear model using the testing dataset (${R}^{2}$ = 0.1; RMSE = 302 g/m^{2}; MAE = 247 g/m^{2}; MAPE = 34%).

**Figure 11.** The AGB predictions for the wheat crops at a 0.35 m^{2} resolution that were produced by the ERT model compared to those that were produced by the linear model (baseline): (**a**) the regression performance of the ERT model using the testing dataset (${R}^{2}$ = 0.20; RMSE = 288 g/m^{2}; MAE = 216 g/m^{2}; MAPE = 23%); (**b**) the regression performance of the linear model (baseline) using the testing dataset (${R}^{2}$ = 0.14; RMSE = 304 g/m^{2}; MAE = 254 g/m^{2}; MAPE = 33%).

**Figure 12.** An analysis of the aggregated predictions using the barley testing dataset: (**a**) the residual distribution had a mean value approaching zero (i.e., 2.2 g/m^{2} in the testing set, where N = 57); (**b**) the $\overline{{R}^{2}}$ score converged to 1 as the number of aggregated samples increased. At every step along the x-axis, the green solid line shows the mean of 100 repetitions ($\overline{{R}^{2}}$ = 0.71 at a 1 m^{2} spatial resolution). The light gray line shows the worst performance in each iteration, while the shaded area covers the confidence interval (i.e., ± the standard deviation); (**b.1**) a scatter plot of the predicted AGB values vs. the AGB field measurements at a 1 m^{2} spatial resolution.
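The effect shown in panel (b), where the score rises as more predictions are aggregated, can be illustrated with simulated data. The sketch below is not the authors' analysis: the AGB levels, noise magnitudes, and the `aggregated_r2` helper are hypothetical, and it assumes that the samples being averaged share a common underlying AGB level (for example, the same sampling date) and that prediction errors are zero-mean and independent.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def aggregated_r2(groups, n, rng):
    """R^2 after averaging n randomly chosen samples within each group."""
    agg_true, agg_pred = [], []
    for pairs in groups:
        pairs = pairs[:]          # copy so the caller's order is untouched
        rng.shuffle(pairs)
        for i in range(0, len(pairs) - n + 1, n):
            block = pairs[i:i + n]
            agg_true.append(mean(t for t, _ in block))
            agg_pred.append(mean(p for _, p in block))
    return r2_score(agg_true, agg_pred)


rng = random.Random(0)
groups = []
for level in range(100, 1700, 200):   # 8 hypothetical AGB levels in g/m^2
    pairs = []
    for _ in range(12):               # 12 simulated samples per level
        truth = level + rng.gauss(0, 100)
        pairs.append((truth, truth + rng.gauss(0, 200)))  # zero-mean error
    groups.append(pairs)

# Mean R^2 over 100 random groupings for each aggregation size.
mean_r2 = {
    n: mean(aggregated_r2(groups, n, rng) for _ in range(100))
    for n in (1, 3, 6)
}
```

Under these assumptions the averaged errors shrink roughly in proportion to the number of aggregated samples, while the variation between AGB levels is preserved, so `mean_r2` increases with `n`.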

**Figure 13.** An analysis of the aggregated predictions using the wheat testing dataset: (**a**) the residual distribution had a mean value approaching zero with a slight overestimation (i.e., −18.5 g/m^{2} in the testing set, where N = 183), which represented a systematic error of 2% for the average wheat sample weight; (**b**) the $\overline{{R}^{2}}$ score converged to 1 as the number of aggregated samples increased. At every step along the x-axis, the green solid line shows the mean of 100 executions ($\overline{{R}^{2}}$ = 0.58 at a 1 m^{2} spatial resolution). The light gray line shows the worst performance in each iteration, while the shaded area covers the confidence interval (i.e., ± the standard deviation); (**b.1**) a scatter plot of the predicted AGB values vs. the AGB field measurements at a 1 m^{2} spatial resolution.

**Table 1.** A description of the datasets. The $barle{y}_{20}$ dataset was used for the training and validation phases, while the $barle{y}_{21,aug.}$ and $whea{t}_{21,aug.}$ datasets were used to test the prediction results.

Growing Season | Dataset | Number of Samples | Sample Size |
---|---|---|---|
2020 | $barle{y}_{20}$ | 104 | 1 m × 0.35 m |
2021 | $barle{y}_{21}$ | 142 | 0.5 m × 0.35 m |
2021 | $barle{y}_{21,aug.}$ | 188 | (1–1.5) m × 0.35 m |
2021 | $whea{t}_{21}$ | 455 | 0.5 m × 0.35 m |
2021 | $whea{t}_{21,aug.}$ | 609 | (1–1.5) m × 0.35 m |

**Table 2.** A description of the models that were evaluated. All models were implemented using standard Python libraries.

Regression Model | Family | Description | Implementation |
---|---|---|---|
Extremely Randomized Trees (ERT) | Tree-Based Ensemble | Ensemble of decision trees (parallel setup) [70] in which the output is the average of individual predictions | scikit-learn |
XGBoost | Boosting | Gradient boosting method that is based on stage-wise additive expansions [74,75] | xgboost |
Huber | Linear | Regularized linear regression that is robust to outliers [67,68] | scikit-learn |
Linear Regression (Baseline) | Linear | Ordinary least squares linear regression | scikit-learn |

Regression Model | Hyperparameters Included in Cross-Validation ${}^{\mathbf{a}}$ | Total Hyperparameters ${}^{\mathbf{b}}$ |
---|---|---|
Extremely Randomized Trees (ERT) | Criterion {mae; mse}, max. depth (None; 1, …, 9), bootstrap {True; False}, max. features {log2; sqrt} | 17 |
XGBoost | Booster {gbtree; gblinear; dart}, step size shrinkage (0.1, …, 0.5), learning rate (0.01, …, 0.1), L1 regularization (0, …, 0.5) | 29 |
Huber | Epsilon (1.1, …, 1.75), alpha ($5 \cdot {10}^{-5}$, …, ${10}^{-3}$), fit intercept {True; False}, tolerance (${10}^{-6}$, …, ${10}^{-4}$) | 6 |
Linear Regression (Baseline) | Fit intercept {True; False} | 1 |

^{a} The hyperparameters that were included in the cross-validation grid search for parameter selection (values in curly brackets show parameter sets and those in round brackets show the ranges of the search); ^{b} the tunable hyperparameters that were considered for each model in the scikit-learn or xgboost Python libraries.

Testing Dataset | Spatial Resolution (m^{2}) | ${\mathit{R}}^{2}$ | RMSE (g/m^{2}) | MAE (g/m^{2}) | MAPE (%) |
---|---|---|---|---|---|
Barley Testing Dataset | 0.35 | 0.48 | 207 | 162 | 42.0 |
Barley Testing Dataset | 1.00 | 0.71 | 232 | 214 | 23.0 |
Barley Testing Dataset | 2.00 | 0.93 | 300 | 266 | 13.0 |
Wheat Testing Dataset | 0.35 | 0.20 | 288 | 216 | 23.7 |
Wheat Testing Dataset | 1.00 | 0.58 | 284 | 264 | 19.7 |
Wheat Testing Dataset | 2.00 | 0.89 | 400 | 351 | 16.0 |
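For reference, the four scores reported in the table above follow the standard definitions of the coefficient of determination, root mean squared error, mean absolute error, and mean absolute percentage error. The sketch below computes them in plain Python; the function name `regression_metrics` and the example values are made up for illustration and are not taken from the study.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE, and MAPE (in %) for paired observations."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when a true value is zero.
        "mape": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Made-up AGB values in g/m^2.
scores = regression_metrics(
    [100.0, 200.0, 300.0, 400.0],
    [110.0, 190.0, 330.0, 380.0],
)
```

Because MAPE normalizes each error by the measured value, it can move in the opposite direction to RMSE and MAE when the measured values differ in magnitude.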

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Revenga, J.C.; Trepekli, K.; Oehmcke, S.; Jensen, R.; Li, L.; Igel, C.; Gieseke, F.C.; Friborg, T.
Above-Ground Biomass Prediction for Croplands at a Sub-Meter Resolution Using UAV–LiDAR and Machine Learning Methods. *Remote Sens.* **2022**, *14*, 3912.
https://doi.org/10.3390/rs14163912
