Article

Can Environmental Analysis Algorithms Be Improved by Data Fusion and Soil Removal for UAV-Based Buffel Grass Biomass Prediction?

by Wagner Martins dos Santos 1, Alexandre Maniçoba da Rosa Ferraz Jardim 2,*, Lady Daiane Costa de Sousa Martins 1, Márcia Bruna Marim de Moura 3, Elania Freire da Silva 4, Luciana Sandra Bastos de Souza 3, Alan Cezar Bezerra 5, José Raliuson Inácio Silva 5, Ênio Farias de França e Silva 1, João L. M. P. de Lima 6, Leonor Patricia Cerdeira Morellato 2 and Thieres George Freire da Silva 1,5,*
1 Department of Agricultural Engineering, Federal Rural University of Pernambuco, Recife 52171-900, Pernambuco, Brazil
2 Department of Biodiversity, Institute of Biosciences, São Paulo State University—UNESP, Rio Claro 13506-900, São Paulo, Brazil
3 Postgraduate Program in Biodiversity and Conservation, Academic Unit of Serra Talhada, Federal Rural University of Pernambuco, Serra Talhada 56909-535, Pernambuco, Brazil
4 Department of Agricultural and Forestry Sciences, Federal Rural University of the Semi-Arid, Mossoró 59625-900, Rio Grande do Norte, Brazil
5 Postgraduate Program in Plant Production, Academic Unit of Serra Talhada, Federal Rural University of Pernambuco, Serra Talhada 56909-535, Pernambuco, Brazil
6 MARE—Marine and Environmental Sciences Centre, ARNET—Aquatic Research Network, Department of Civil Engineering, Faculty of Sciences and Technology, University of Coimbra, Rua Luís Reis Santos, Pólo II, 3030-788 Coimbra, Portugal
* Authors to whom correspondence should be addressed.
Drones 2026, 10(1), 61; https://doi.org/10.3390/drones10010061
Submission received: 20 November 2025 / Revised: 6 January 2026 / Accepted: 13 January 2026 / Published: 15 January 2026

Highlights

What are the main findings?
  • Removing soil pixels consistently improved the performance of biomass prediction models.
  • The model with the highest accuracy (CatBoost) was obtained using only RGB sensor data and the Boruta feature selection method.
What are the implications of the main findings?
  • Vegetation cover area is a dominant predictor, suggesting that structural canopy metrics are more informative than complex spectral indices for forage biomass estimation.

Abstract

The growing demand for sustainable livestock systems requires efficient methods for monitoring forage biomass. This study evaluated spectral (RGB and multispectral), textural (GLCM), and area attributes derived from unmanned aerial vehicle (UAV) imagery to predict buffelgrass (Cenchrus ciliaris L.) biomass, also testing the effect of soil pixel removal. A comprehensive machine learning pipeline (12 algorithms and 6 feature selection methods) was applied to 14 data combinations. Our results demonstrated that soil removal consistently improved the performance of the applied models. Multispectral (MSI) sensors were the most robust individually, whereas textural (GLCM) attributes did not contribute significantly. Although the MSI and RGB data combination proved complementary, the model with the highest accuracy was obtained with CatBoost using only RGB information after Boruta feature selection, achieving a CCC of 0.83, RMSE of 0.214 kg, and R2 of 0.81 in the test set. The most important variable was vegetation cover area (19.94%), surpassing spectral indices. We conclude that integrating RGB UAVs with robust processing can generate accessible and effective tools for forage monitoring. This approach can support pasture management by optimizing stocking rates, enhancing natural resource efficiency, and supporting data-driven decisions in precision silvopastoral systems.

1. Introduction

The growing demand for more productive, efficient, and sustainable agricultural systems has driven the adoption of advanced crop monitoring and management technologies [1,2]. In this context, the non-destructive estimation of forage biomass plays a strategic role, as it provides essential information on productivity, growth dynamics, and the carrying capacity of livestock systems [3,4]. Accurate biomass quantification is a crucial component of pasture management programs, enabling the optimization of stocking rates, enhancing natural resource use efficiency, and supporting data-driven precision agriculture practices [4,5].
In recent decades, advances in remote sensing and unmanned aerial vehicles (UAVs) have considerably advanced the monitoring of plant biomass [6,7]. The use of UAV-mounted sensors capable of capturing images with high spatial and temporal resolution offers an effective alternative to traditional methods, which are costly, destructive, and often unfeasible on a large scale [6,8,9]. In this context, vegetation indices derived from spectral imagery (particularly those based on visible and near-infrared bands) have been widely employed to infer the biophysical and biochemical parameters of crops, including biomass, leaf area, and chlorophyll content [9,10].
In the literature, it is well-established that spectral attributes obtained through remote sensing are directly related to vegetative vigor and biomass accumulation [5,9]. For example, previous studies have demonstrated a strong correlation between vegetation indices (VIs) such as the Normalized Difference Vegetation Index (NDVI) [9], Normalized Green-Red Difference Index (NGRDI), and Visible Atmospherically Resistant Index (VARI) [11], and the productivity of forage grasses. In addition to spectral attributes, structural and textural features extracted from high-resolution imagery have emerged as valuable complementary indicators, capturing spatial variations associated with canopy architecture and vegetation distribution [12,13,14]. Textural analysis based on the Gray-Level Co-occurrence Matrix (GLCM), for instance, has been successfully employed to enhance land cover class discrimination and estimate structural parameters in various agricultural systems [14,15].
Furthermore, methodological strategies have been proposed to mitigate background interference, particularly associated with exposed soil, which can compromise the vegetation’s spectral response in both homogeneous and heterogeneous agricultural scenes [16,17]. In such heterogeneous conditions, the coexistence of multiple plant species, variable canopy densities, crop residues, shadows, and soil exposure increases spectral variability and makes vegetation–background separation more challenging [16]. Within this context, the use of thresholds applied to vegetation indices (VIs) stands out as a simple and operationally efficient technique that enables a more precise isolation of the canopy signal and has shown consistent results across different agricultural settings [17,18,19], especially when combined with the vegetation cover fraction [18,20]. Other approaches, such as object-based segmentation, supervised classification, and deep-learning methods, have also been applied to heterogeneous scenes; however, threshold-based procedures remain attractive due to their transparency, low computational cost, and ease of implementation, while still providing competitive performance in practical biomass-estimation workflows [10,16,21,22].
Despite recent advances in the integration of remote sensing, UAVs, and machine learning for estimating biophysical parameters in agricultural crops, their application to biomass quantification in forage species remains relatively underexplored, particularly regarding the number and diversity of species studied [10]. This gap is especially critical in the case of buffelgrass (Cenchrus ciliaris L.), an important species for livestock production in semiarid regions worldwide [23]. Owing to its drought tolerance, buffelgrass exhibits high competitive capacity, good productivity, and adaptability to diverse environmental conditions (e.g., soil types, altitudes, and rainfall ranging from 250 to 2700 mm), as well as a strategic role in the sustainability of production systems [11,23,24]. Therefore, studies that investigate the modeling of biomass in this and other forage grasses based on spectral and textural attributes represent a necessary advancement, with the potential to significantly expand the use of precision agriculture tools in forage crop management [7,10].
However, the optimized application of these technologies presents several challenges. First, although the superiority of multispectral sensors (MSI) is often assumed, RGB (Red, Green, Blue) sensors—when combined with advanced processing and variable selection techniques—can achieve comparable accuracy [25,26]. Second, the relative contribution of different sources of information, including spectral (RGB and MSI), textural (GLCM), and spatial (vegetation cover area) data, must be systematically compared [27]. Finally, a critical limitation of many approaches lies in the inclusion of a large number of predictors (e.g., dozens of VIs), which makes variable selection a fundamental step in the modeling process [28,29].
Accordingly, the objective of this study was to evaluate the accuracy of different combinations of spectral, textural, and area-based attributes derived from UAV imagery for predicting the biomass of buffelgrass, both with and without soil removal. Our hypothesis was that integrating these different sources of information, together with the use of variable selection techniques, would significantly improve the performance of predictive models compared to using each group of predictors in isolation. To achieve this, we implemented a comprehensive machine learning pipeline that tested multiple combinations of datasets and algorithms, as well as different feature selection methods. This approach not only enabled a direct comparison of the predictive potential of spectral, textural, and area-based information but also allowed for the assessment of the effect of soil removal. Consequently, this study provides an accessible and effective framework for forage monitoring, with significant potential to support data-driven decisions and optimize pasture management.

2. Materials and Methods

2.1. Study Area

The study was conducted at the International Reference Center for Agrometeorological Studies in Cactus and Other Forage Crops, located at the Universidade Federal Rural de Pernambuco/Unidade Acadêmica de Serra Talhada (UFRPE/UAST), in Serra Talhada, Pernambuco, Brazil (7.95458° S, 38.29509° W; Figure 1). The region is characterized by air temperatures ranging from 20.1 to 32.9 °C, an average annual rainfall of 642 mm, and a mean evapotranspiration demand of 1800 mm per year [30].
The experimental plots (6 × 4 m) consisted of areas previously established with irrigated buffelgrass (Cenchrus ciliaris L.). The experimental site is characterized by a Cambissolo Háplico Ta Eutrófico [31], classified as an Inceptisol according to the Soil Taxonomy classification [32]. The physical and chemical characterization of the 0.0–0.40 m layer indicated a pH in water of 6.90 and a soil texture composed of 77% sand, 10% silt, and 13% clay. The soil presented a sum of bases of 6.4 cmolc dm−3, with individual concentrations of K+ of 0.61 cmolc dm−3, Na+ of 0.04 cmolc dm−3, Ca2+ of 4.5 cmolc dm−3, and Mg2+ of 1.20 cmolc dm−3, a cation exchange capacity of 7.5 cmolc dm−3, and a base saturation of 85%. Furthermore, the bulk and particle densities were 1.45 and 2.57 g cm−3, respectively.
Irrigation management was based on reference evapotranspiration (ET0), which was calculated using the Penman–Monteith (Equation (1)) approach. Water was supplied through a drip irrigation system operating at 100 kPa, with emitters spaced 0.20 m apart and an average discharge of 1.57 L h−1, presenting an application uniformity coefficient of 92%. Irrigation events were scheduled in the morning on Mondays, Wednesdays, and Fridays. Irrigation depths were applied at 50, 75, 100, and 125% of ET0. When rainfall exceeded the irrigation depth required, water application was suspended, ensuring full replacement of ET0. The irrigation water originated from a nearby artesian well and was classified as C3S1, with electrical conductivity of approximately 1.62 dS m−1, sodium and potassium concentrations of 168.66 and 28.17 mg L−1, respectively, and pH of 6.84, indicating high salinity and low sodium hazard [33]. The relationship between rainfall and the 100% ET0 irrigation depth applied during the experimental period is presented in the Supplementary Material (Figure S1).
$$\mathrm{ET_0} = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,\mu_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,\mu_2)} \quad (1)$$
where ET0 is the reference evapotranspiration (mm day−1), Rn is the net radiation at the crop surface (MJ m−2 day−1), G is the soil heat flux (MJ m−2 day−1), T is the daily average air temperature at 2 m height above ground (°C), μ2 is the wind speed at 2 m height (m s−1), es is the saturation vapor pressure (kPa), ea is the actual vapor pressure (kPa), Δ is the slope of the vapor pressure versus temperature curve (kPa °C−1), and γ is the psychrometric constant (kPa °C−1).
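For readers reproducing this step, the sketch below implements Equation (1) directly from the definitions above. It is a minimal illustration in Python; the input values in the example are hypothetical and do not come from the experiment.

```python
# Minimal sketch of the FAO-56 Penman-Monteith ET0 computation (Equation (1)).
# Variable names follow the definitions above; delta (slope of the vapor
# pressure curve) and gamma (psychrometric constant) are assumed to be supplied.

def et0_penman_monteith(delta, rn, g, gamma, t, u2, es, ea):
    """Reference evapotranspiration (mm day-1)."""
    numerator = 0.408 * delta * (rn - g) + gamma * (900.0 / (t + 273.0)) * u2 * (es - ea)
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator

# Hypothetical daily values for a semiarid site:
# delta = 0.25 kPa/degC, Rn = 18 MJ/m2/day, G = 0, gamma = 0.066 kPa/degC,
# T = 28 degC, u2 = 2 m/s, es = 3.8 kPa, ea = 1.9 kPa
print(round(et0_penman_monteith(0.25, 18.0, 0.0, 0.066, 28.0, 2.0, 3.8, 1.9), 2))
```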
Each plot was composed of four cultivation rows, but only the two central rows were considered for sampling to minimize border effects. Furthermore, each central row was subdivided into three equal segments (2 m each), resulting in the experimental plots (Figure 2). Thus, a total of six sampled units were obtained per plot, corresponding to the three segments of each of the two central rows. In cases where the row length was insufficient due to plant growth, subdivision was performed into two parts while maintaining representative sampling. Harvesting was carried out in full for each experimental unit, and the plant material was immediately weighed in the field to obtain fresh biomass (kg).
Two sampling events were conducted: the first on 5 January 2025 (90 experimental plots, 72-day cycle, starting on 25 October 2024), and the second on 3 June 2025 (94 experimental plots, 53-day cycle, starting on 11 April 2025). The average values for each treatment and date are described in Table 1.

2.2. Aerial Imaging and Orthomosaic Generation

Aerial images were acquired on the day preceding biomass collection, specifically on 6 January 2025, and 3 June 2025 (Figure 1). Image acquisition was performed using a Mavic 3M UAV (DJI, Shenzhen, China) equipped with an RGB sensor (4/3 CMOS: 20 MP, 5280 × 3956 px) and four multispectral (MSI) sensors (1/2.8 CMOS: 5 MP, 2592 × 1944 px). The MSI bands included Green (G): 560 ± 16 nm, Red (R): 650 ± 16 nm, Red Edge (RE): 730 ± 16 nm, and Near-Infrared (NIR): 860 ± 26 nm. The UAV also featured a GNSS system (GPS + Galileo + BeiDou + GLONASS) and an RTK module (horizontal accuracy: 1 cm + 1 ppm; vertical accuracy: 1.5 cm + 1 ppm).
In this study, flight planning was conducted using the DJI Pilot 2 application (v.10.1.0.30), with flight parameters set at an altitude of 30 m, 80% front and side overlap, and a speed of 2 m s−1. Flights were performed between 11:00 a.m. and 12:00 p.m. under clear skies and uniform lighting conditions. Ten ground control points (GCPs) were used for positional correction of the images. Radiometric calibration of the MSI bands was carried out using the integrated solar sensor. Orthomosaic generation, along with all geometric and radiometric correction steps, was performed in Agisoft Metashape software (v.2.1.2, Agisoft LLC, St. Petersburg, Russia) (Table S1). The final spatial resolution of the orthomosaics was approximately 0.8 cm for RGB and 1 cm for MSI.

Removal of Soil and Vegetation Cover Fraction

The RGB and MSI orthomosaics were imported into QGIS software (v.3.28), and polygons corresponding to each experimental unit were manually delineated based on the boundaries of the planting rows (Figure 1e and Figure 2). For each polygon, the total area (m2, areatotal), vegetated area (m2, areacrop), and bare soil area (m2, areasoil) were calculated based on pixel classification. The separation between vegetation and soil was performed using predefined spectral index thresholds: NGRDI < −0.05 for RGB images and NDVI < 0.35 for MSI images. These threshold values were established through detailed visual inspection of the imagery. Including the total area allowed for representation of the effective size of each plot, acknowledging that buffelgrass grows without physical constraints imposed by the previously established planting rows.
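A minimal sketch of this threshold-based soil masking is given below, assuming the band arrays for one experimental-unit polygon have already been read into memory (e.g., with rasterio) and using the thresholds reported above (NGRDI < −0.05 for RGB, NDVI < 0.35 for MSI). The function and variable names are illustrative, not the code used in the study.

```python
# Threshold-based soil masking and area computation for one plot polygon.
# pixel_area is the orthomosaic pixel size in m2; areas are returned in m2.
import numpy as np

def soil_mask_rgb(red, green, pixel_area, threshold=-0.05):
    ngrdi = (green - red) / (green + red + 1e-9)   # NGRDI per pixel
    soil = ngrdi < threshold                       # True where bare soil
    area_total = ngrdi.size * pixel_area
    area_soil = soil.sum() * pixel_area
    area_crop = area_total - area_soil
    return soil, area_total, area_crop, area_soil

def soil_mask_msi(red, nir, pixel_area, threshold=0.35):
    ndvi = (nir - red) / (nir + red + 1e-9)        # NDVI per pixel
    soil = ndvi < threshold
    return (soil, ndvi.size * pixel_area,
            (~soil).sum() * pixel_area, soil.sum() * pixel_area)
```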

2.3. Vegetation Indices

A total of 34 vegetation indices were calculated, including 18 derived from RGB bands and 16 from MSI bands (Table 2). Some indices were adapted by band equivalence, allowing their calculation using either MSI or RGB bands. The mean value of each index was extracted for every delineated polygon, both with and without prior removal of soil pixels, to assess the effect of soil interference on the predictive models. All computations were performed in R software (v.4.3.2) [34], using the raster (v.3.6-32) [35], terra (v.1.8-70) [36], sf (v.1.0-21) [37], and FieldImageR (v.0.6.0) [38] packages.
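Although this extraction was performed in R, the same per-polygon averaging can be sketched in Python, assuming the plot polygons are stored in a vector file and the index has been written to a single-band float GeoTIFF; the file names below are hypothetical.

```python
# Per-polygon mean of a vegetation-index raster (illustrative Python version
# of the per-polygon extraction; the study used R with raster/terra/sf).
import numpy as np
import geopandas as gpd
import rasterio
from rasterio.mask import mask

plots = gpd.read_file("plots.gpkg")            # hypothetical polygon layer
with rasterio.open("ngrdi.tif") as src:        # hypothetical float index raster
    means = []
    for geom in plots.geometry:
        clipped, _ = mask(src, [geom], crop=True, nodata=np.nan, filled=True)
        means.append(float(np.nanmean(clipped)))   # mean index inside the polygon
plots["ngrdi_mean"] = means
```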

2.4. Textural Analysis Using the Gray-Level Co-Occurrence Matrix (GLCM)

Textural metrics were derived from the RGB orthomosaics using the Gray-Level Co-occurrence Matrix (GLCM) approach [57]. The matrices were computed with a one-pixel displacement in four directions (i.e., 0°, 45°, 90°, and 135°) within an 11 × 11 moving window. The mean across directions was then calculated to ensure rotational invariance. Prior to analysis, original images with 256 gray levels were quantized into 32 gray levels to reduce computational cost and noise. The extracted metrics included contrast (CO), mean (ME), entropy (ENT), correlation (COR), variance (VAR), angular second moment (ASM), and maximum probability (MaxProb) (Table 3). Mean values of these metrics were extracted per polygon, both with and without the exclusion of soil pixels, to ensure comparability between the two scenarios. All analyses were performed in Python (v.3.10.18) [58], using the rasterio (v.1.4.3) [59], numpy (v.1.26.4) [60], and numba (v.0.61.0) [61] libraries.
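The sketch below illustrates the GLCM computation for a single 11 × 11 window using scikit-image rather than the custom numba implementation used in the study; it follows the settings described above (one-pixel displacement, four directions averaged, 32 gray levels) and should be read as an illustration, not the authors' code.

```python
# GLCM metrics for one 11x11 window (illustrative scikit-image version).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_window_metrics(window_256, levels=32):
    # Quantize 256 gray levels to 32 to reduce noise and cost, as in the text.
    window = (window_256.astype(np.uint16) * levels // 256).astype(np.uint8)
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]      # 0, 45, 90, 135 deg
    glcm = graycomatrix(window, distances=[1], angles=angles,
                        levels=levels, symmetric=True, normed=True)
    glcm = glcm.mean(axis=3, keepdims=True)                # rotational invariance
    p = glcm[:, :, 0, 0]
    i, _ = np.indices(p.shape)
    mu = (i * p).sum()                                     # GLCM mean
    return {
        "CO": float(graycoprops(glcm, "contrast")[0, 0]),
        "ME": float(mu),
        "ENT": float(-(p[p > 0] * np.log(p[p > 0])).sum()),
        "COR": float(graycoprops(glcm, "correlation")[0, 0]),
        "VAR": float((((i - mu) ** 2) * p).sum()),
        "ASM": float(graycoprops(glcm, "ASM")[0, 0]),
        "MaxProb": float(p.max()),
    }
```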

2.5. Construction of the Datasets

Based on the extracted attributes, datasets were constructed for biomass modeling. Each dataset consistently incorporated area information (areatotal, areacrop and areasoil) associated with each polygon. Seven primary combinations of data sources were created: (1) RGB indices and bands, (2) MSI indices and bands, (3) GLCM metrics, (4) RGB + GLCM, (5) RGB + MSI, (6) MSI + GLCM, and (7) RGB + MSI + GLCM. Each combination was produced in two versions: (i) with soil pixel removal and (ii) without soil pixel removal. This resulted in a total of 14 distinct datasets, enabling the evaluation of the impact of soil presence, different information sources, and their combinations on biomass modeling (Table 4).
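The assembly of the 14 datasets can be summarized with the hypothetical helper below, which assumes per-polygon feature tables already exist for each source and soil scenario; it only illustrates the combinatorial structure (7 source combinations × 2 soil scenarios), with the area variables attached to every dataset as described above.

```python
# Illustrative assembly of the 14 datasets (names and keys are hypothetical).
import pandas as pd

def build_datasets(features, area):
    """features: dict like {('RGB', 'rsoil'): DataFrame, ...}; area: DataFrame
    with areatotal, areacrop, and areasoil per polygon."""
    combos = [("RGB",), ("MSI",), ("GLCM",),
              ("RGB", "GLCM"), ("RGB", "MSI"), ("MSI", "GLCM"),
              ("RGB", "MSI", "GLCM")]
    datasets = {}
    for soil in ("rsoil", "with_soil"):
        for combo in combos:
            parts = [features[(src, soil)] for src in combo]
            datasets[("+".join(combo), soil)] = pd.concat([area] + parts, axis=1)
    return datasets   # 7 combinations x 2 soil scenarios = 14 datasets
```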

2.6. Predictive Modeling of Biomass Using Machine Learning

The analytical pipeline was structured to ensure reproducibility, standardization, and comparability across models. Initially, the 14 datasets derived from combinations of predictors (Table 4) were partitioned into training (70%) and testing (30%) subsets. Stratified sampling ensured that the distribution of the response variable was preserved in both subsets. To prevent data leakage, feature scaling was performed using Min-Max normalization. The normalization parameters (minimum and maximum values) were estimated exclusively from the training set and subsequently applied to scale both the training and testing subsets. The test subset remained isolated throughout the entire training and optimization process, being used exclusively for the final generalization assessment.
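A minimal sketch of this partitioning and scaling protocol is shown below. The study was implemented in R (tidymodels); the Python/scikit-learn version here is only an illustration, and the binning of the continuous response into strata is an assumption about how the stratified split can be reproduced.

```python
# 70/30 split stratified on binned biomass, with Min-Max parameters
# fitted on the training subset only to avoid data leakage.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def split_and_scale(X, y, test_size=0.30, seed=42):
    bins = pd.qcut(y, q=5, labels=False)        # strata for a continuous response
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=bins, random_state=seed)
    scaler = MinMaxScaler().fit(X_tr)           # parameters from training data only
    return scaler.transform(X_tr), scaler.transform(X_te), y_tr, y_te
```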
Twelve regression algorithms were evaluated, representing different modeling paradigms: regularized linear regressions (Lasso, Ridge, Elastic Net); instance-based methods such as K-Nearest Neighbors (KNN); Support Vector Machines with Radial Basis Function (RBF-SVM) and Linear (L-SVM) kernels; and tree-based and ensemble methods, including Decision Tree (DT), Random Forest (RF), Extra Trees (ET), Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGBM), and CatBoost. This diversity allowed for the exploration of both linear and non-linear relationships between the predictors and biomass.
To mitigate the performance bias associated with default parameters and ensure an equitable comparison among the algorithms, we performed a hyperparameter optimization process using k-fold cross-validation (k = 5) on the training set. As the hyperparameter optimization process can be computationally costly, particularly due to the large number of models, a small space of 20 parameter combinations (Table 5) was selected, obtained through a maximum entropy space-filling design, to balance computational efficiency and search space coverage. The combination that minimized the mean Root Mean Square Error (RMSE) in the cross-validation was selected. The final model for each algorithm was then fitted using the entire training set.
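The tuning protocol can be sketched as below for one algorithm. A random search over 20 candidates stands in for the maximum entropy space-filling design used in the study, and the parameter ranges shown are illustrative assumptions, not the values of Table 5.

```python
# 20 candidate hyperparameter combinations evaluated by 5-fold cross-validated
# RMSE on the training set (random search as a stand-in for the space-filling design).
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

param_space = {"n_estimators": randint(100, 1000),   # illustrative ranges
               "max_depth": randint(3, 20),
               "min_samples_leaf": randint(1, 10)}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_space,
    n_iter=20,                                  # 20 candidate combinations
    cv=5,                                       # 5-fold cross-validation
    scoring="neg_root_mean_squared_error",      # select by minimum RMSE
    random_state=42)
# search.fit(X_train, y_train); with refit=True (the default), the best
# combination is then refitted on the entire training set.
```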

2.7. Feature Selection

Given the high number of features extracted from the images, a feature selection step was incorporated to mitigate multicollinearity and reduce dimensionality, seeking to identify an optimized subset of predictor variables. Six feature selection methods were compared, divided into three groups: filter, wrapper, and embedded:
  • Filter methods:
(1)
Correlation-based Elimination (Corr): Features with an absolute Pearson correlation > 0.8 were grouped using hierarchical clustering, retaining from each group only the feature most correlated with the response variable. In case of a tie, the secondary criterion was the feature with the highest variance, as it contains more information (a sketch of this procedure is given after this list).
(2)
RReliefF Algorithm: A method that estimates attribute quality based on its ability to discriminate neighboring instances [62,63]. As RReliefF does not have a pre-defined threshold, features with a normalized importance > 40% of the maximum observed were selected after preliminary analyses in this study. Although this threshold was arbitrarily defined, we believe that setting it iteratively would deviate from the method’s intent.
  • Wrapper methods:
(3)
Recursive Feature Elimination (RFE): A method that iteratively removes features until the best subset of variables is identified [64]. RF was used as the base model for evaluation with 5-fold cross-validation, choosing the subset that minimized the RMSE (root mean square error).
(4)
Boruta: A method built upon RF, which compares the importance of real features with “shadow features” (randomly permuted copies of the original features) [65]. Only features with statistical significance (p-value < 0.01) were retained.
  • Embedded:
(5)
Lasso Regression: Through L1 penalization (i.e., the sum of the absolute values of the coefficients), the coefficients of less relevant features are shrunk to zero [66].
(6)
Ridge Regression: Through the L2 penalty (i.e., the sum of the squared coefficients), it reduces the coefficients to be close to zero [66]. Similarly to RReliefF, we defined a threshold of 40% of the maximum value for the normalized coefficients. For both Lasso and Ridge, 5-fold cross-validation was employed.
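As referenced in item (1), the correlation-based filter can be sketched as follows; this is an illustrative Python re-implementation under the stated rules (grouping at |r| > 0.8, correlation with the response as the primary criterion, variance as the tie-break), not the exact code used in the study.

```python
# Correlation-based feature elimination via hierarchical clustering.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def correlation_filter(X, y, r_max=0.8):
    """X: DataFrame of predictors; y: response vector. Returns kept columns."""
    corr = X.corr().abs()
    dist = 1.0 - corr                                     # correlation distance
    clusters = fcluster(
        linkage(squareform(dist.values, checks=False), method="average"),
        t=1.0 - r_max, criterion="distance")              # groups with |r| > r_max
    keep = []
    for c in np.unique(clusters):
        members = X.columns[clusters == c]
        score = X[members].corrwith(pd.Series(y, index=X.index)).abs()
        best = score[score == score.max()].index
        # Ties resolved by the highest-variance feature, as described above.
        keep.append(X[best].var().idxmax() if len(best) > 1 else best[0])
    return keep
```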

2.8. Evaluation and Selection of Integration Methods, Models, and Features

The evaluation process was based on the seven previously defined dataset combinations. A structured stepwise analytical strategy was adopted (Figure 3):
  • In Step 1, the individual data sources (RGB, MSI, and GLCM) were modeled both with and without soil pixel removal. This procedure allowed the assessment of whether soil removal improved predictive performance and, consequently, the selection of the most appropriate scenario regarding this source of interference.
  • In Step 2, only the datasets retained from Step 1 were subjected to feature selection. Different feature selection techniques (filter, wrapper, and embedded) were applied, followed by new modeling runs. This step enabled the identification of the feature selection method that most effectively optimized the predictive performance for the selected scenario.
  • In Step 3, the combinations of data sources were evaluated (RGB + GLCM, RGB + MSI, MSI + GLCM and RGB + GLCM + MSI). Two groups of combined datasets were constructed: one obtained directly from the original datasets without feature selection, and another obtained from the individual datasets already optimized in Step 2 through feature selection. Both groups of combinations were modeled and compared using machine learning algorithms. This approach enabled the simultaneous assessment of the effect of data fusion and the prior feature selection step, supporting the identification of the most appropriate dataset combination and the most accurate predictive models for biomass estimation.
Figure 3. Flowchart of the selection process for integration methods, modeling, and features. GLCM: Gray-Level Co-occurrence Matrix, MSI: Multispectral, RGB: Red, Green, and Blue, rsoil: soil removed.
All processing steps were performed strictly using the training set and internal cross-validation. The test subset remained completely isolated throughout the entire modeling and optimization workflow, being used exclusively for the final evaluation of the models’ generalization performance.
To quantify predictive accuracy, metrics widely recognized in the regression modeling literature were adopted. Lin’s concordance correlation coefficient [67] (CCC, Equation (2)) was used to simultaneously assess precision and accuracy, providing a robust measure of agreement between observed and estimated values. The mean absolute error (MAE, Equation (3)) and the root mean square error (RMSE, Equation (4)) were employed to measure the magnitude of deviations at different scales, allowing for the characterization of both mean discrepancies and the stricter penalization of large errors. Additionally, the mean absolute percentage error (MAPE, Equation (5)) was considered as a relative metric, enabling the interpretation of model performance in terms proportional to the variability of the observed data.
$$\mathrm{CCC} = \frac{2\rho\,\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 + (\mu_1 - \mu_2)^2} \quad (2)$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \quad (3)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (4)$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \quad (5)$$
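Equations (2)–(5) translate directly into a few lines of code; the snippet below is a Python illustration (the study used R) operating on vectors of observed and predicted biomass.

```python
# Agreement and error metrics of Equations (2)-(5).
import numpy as np

def ccc(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    rho = np.corrcoef(obs, pred)[0, 1]
    s1, s2 = obs.std(), pred.std()
    return 2 * rho * s1 * s2 / (s1**2 + s2**2 + (obs.mean() - pred.mean())**2)

def mae(obs, pred):
    return float(np.mean(np.abs(np.asarray(obs) - np.asarray(pred))))

def rmse(obs, pred):
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2)))

def mape(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(100 * np.mean(np.abs((obs - pred) / obs)))
```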
All metrics were initially calculated to ensure a comprehensive evaluation. In certain situations, however, only the RMSE is presented, as it expresses the magnitude of deviations in the same unit as biomass and penalizes large errors more severely. Since the other indicators (CCC, MAE, and MAPE) exhibited similar behavior, the selection of RMSE as the primary metric in these cases avoided redundancy and ensured clarity in the presentation without compromising the consistency of the analysis. All models, selection methods, and evaluations were performed in the R environment (v.4.3.2), using the glmnet [68], caret [69], CORElearn [70], and Boruta [65] packages, as well as tidymodels [71] (a collection of packages for modeling and machine learning). The ggplot2 [72] package was used for the analysis visualizations.

3. Results

The GLCM-based models exhibited the highest RMSEs with the KNN, LGBM, DT, and XGBoost algorithms, with values ranging between 0.25 and 0.32. For the other models, GLCM had results closer to the other scenarios (RGB, MSI), although its performance was still inferior (Figure 4). In contrast, the MSI and RGB predictors achieved the lowest RMSE values with tree-based models, such as RF and ET, with errors near 0.18 and 0.22, respectively. Overall, the highest RMSE values were observed for XGB and DT, whereas RF and ET showed the lowest values. The linear regression methods Ridge, Lasso, and Elastic Net also demonstrated promising results, with RMSE values below 0.22. A slight difference was observed in the GLCM data, where the SVM, Ridge, and Lasso algorithms performed better. The MAE, MAPE, and CCC analyses followed a similar pattern, with a greater difference among the MAPE values (Figure 4). The MSI predictor set yielded the highest CCC values overall and most consistently, with the majority of models reaching values between 0.75 and 0.88, reinforcing this dataset as the most promising. Analyzing the distribution of the metrics, the removal of soil (datasets marked with the subscript rsoil) showed a trend of performance improvement for most combinations of models and predictors. These tendencies were confirmed by the summary analysis presented in the Supplementary Table (Table S2).
The feature selection analysis revealed distinct patterns among the algorithms and the evaluated datasets. In the GLCMrsoil dataset, the selection methods showed greater stability in dimensionality reduction, with selection rates varying between 20% (RFE, RReliefF) and 40% (Lasso and Boruta), demonstrating the ability to identify a reduced subset of relevant features. The ME variable was the most consistently selected GLCM metric (identified by the majority of algorithms), followed by the area variables (areacrop, areasoil, and areatotal). Conversely, the CO, ENT, MaxProb, and VAR features were not selected by any algorithm, indicating lower relevance for the predictive models (Figure 5).
For the MSIrsoil dataset, the algorithms exhibited greater difficulty in selecting smaller sets of representative features, which may indicate lower redundancy among the indices used. The RFE method demonstrated the smallest reduction (4.3%), followed by Boruta (17.4%) and Lasso (34.8%) (Figure 5). Meanwhile, RReliefF (78.3%), Corr (73.9%), and Ridge (69.6%) had the highest reduction rates. Furthermore, high agreement was observed among the methods in identifying relevant vegetation indices, with RVI, MCARI, and GNDVI standing out, followed by areacrop and area, as the features most frequently selected across the six evaluated algorithms. Unlike GLCM and RGB, each feature was selected by at least one algorithm. Likewise, this was the only case where a feature (RVI) was selected by all algorithms.
In the RGBrsoil dataset, the algorithms also showed stability in dimensionality reduction. With the exception of Boruta, which reduced the set by only 37.5% (almost half of the second-lowest reduction, obtained by Lasso at 66.7%), the other methods performed substantial reductions (Figure 5). RReliefF (87.5%) and Corr (83.3%) had the greatest feature reduction, followed by RFE (79.2%), Ridge (70.8%), and Lasso (66.7%) (Figure 5). The features ExR, WI, R, HUE, GRRI, and G, in addition to areatotal and areacrop, stood out in the selection methods, with areatotal being particularly notable as it was selected by five algorithms.
The application of feature selection methods resulted in variations in model performance, with both increases and decreases in RMSE observed, depending on the combination of the dataset, selection method, and machine learning algorithm (Figure 6). For the GLCMrsoil dataset, we observed a consistent increase in error across all algorithms when Correlation and RReliefF selection were used, with values exceeding 0.05 in several cases and reaching up to 0.10 for RReliefF with GLCM. In contrast, the Boruta, RFE, Lasso, and Ridge methods predominantly resulted in decreases or small negative variations, notably below 0.03 in some models.
In the MSIrsoil dataset, the Boruta method led to error reductions across different algorithms. However, selection by Correlation, RReliefF, and Ridge produced frequent increases in error, especially for the Correlation method, with positive relative differences in RMSE (ΔRMSE) in almost all models (Figure 6). RFE, on the other hand, yielded smaller fluctuations, with variations concentrated near zero and negligible positive increments, except for the reduction in XGB and the increase in LGBM.
For the RGBrsoil dataset, selection by Boruta predominantly led to error decreases, with reductions observed in all evaluated algorithms, reaching values greater than 0.02 (Figure 6). The use of Correlation and RReliefF resulted in consistent increases, following the pattern of the other datasets. The Lasso and RFE methods predominantly showed reductions, with negative variations distributed among the different algorithms, generally remaining close to zero. Overall, the patterns indicated that the magnitude of the variations was more pronounced in the GLCMrsoil and RGBrsoil datasets, whereas MSI represented the greatest challenge for the selection methods, with infrequent performance gains.
Similarly to the simple datasets (GLCM, MSI, and RGB), applying feature selection methods resulted in varied impacts on model performance for the combined datasets when compared to the same combination without applying any selection, as measured by ΔRMSE (Figure 7a). The Boruta and RFE methods predominantly promoted a decrease in RMSE for most of the evaluated algorithms, regardless of the dataset combination. The most substantial reductions in error were observed for the XGB, CatBoost, and RBF-SVM models, reaching a maximum reduction of 0.042 for RBF-SVM in the GLCMrsoil + RGBrsoil combination with RFE. In contrast, feature selection via Lasso showed mixed behavior, resulting in slight RMSE increases for some models and decreases for others. Overall, RFE presented better results when combinations were used, showing mostly reductions or less significant increases in RMSE. Consequently, the combined models based on RFE were retained.
The evaluation of different dataset combinations using RFE (Figure 7b) revealed that using combinations of subsets generally resulted in lower or very similar RMSE compared to using all available sets (All). Furthermore, the GLCMrsoil + RGBrsoil combination showed the best results in linear and hyperplane-based models, as well as DT. Meanwhile, MSIrsoil + RGBrsoil performed better for random tree-based and gradient boosting models. The GLCMrsoil + MSIrsoil combination was the least prominent among the two-source combinations.
Using the GLCMrsoil predictors as a baseline (Figure 8a), the incorporation of a single additional data source promoted ΔRMSE reductions in most machine learning algorithms, with reductions exceeding 0.04. The GLCMrsoil + RGBrsoil combination demonstrated more consistent ΔRMSE reductions, particularly for non-linear models. The largest reductions were observed for XGB (0.065; GLCMrsoil + MSIrsoil) and CatBoost (0.055; All).
Employing MSIrsoil as the reference (Figure 8b), the variations in RMSE were less pronounced, with some improvements and some deteriorations across the models. The most notable combination was MSIrsoil with RGBrsoil, which showed more consistent reductions for non-linear models. Meanwhile, the combination of all datasets (All) presented the smallest RMSE reduction, at 0.019 for LGBM. For the linear and SVM models, all combinations involving MSIrsoil increased the error relative to the MSIrsoil baseline.
Based on the RGBrsoil predictors (Figure 8c), using the complete set of features (All) or only MSIrsoil resulted in performance improvements for some algorithms (CatBoost, LGBM, RF, and ET), with ΔRMSE reductions of approximately 0.025. In contrast, the GLCMrsoil + RGBrsoil combination showed reductions in some specific models but increases in most. In general, combining GLCM or MSI with RGBrsoil positively influenced the results of non-linear models that used only GLCM or MSI.
The comparative analysis among the different predictor sets allowed for the identification of the three most promising scenarios for the final modeling stage (Figure 7 and Figure 8). Here, the isolated MSIrsoil set stood out as the best overall performing dataset, presenting more robust and consistent results across the evaluated algorithms, with lower errors and higher concordance values. The MSIrsoil + RGBrsoil + RFE combination proved to be the most stable, promoting consistent gains, especially in tree-based models and boosting techniques, indicating strong complementarity between multispectral and visible-spectral information. Finally, RGBrsoil + Boruta emerged as a strategic alternative for specific scenarios, particularly in linear and SVM models. Therefore, these three scenarios were selected for the model evaluation and selection, as well as for the assessment of feature influences.
The results showed that the lowest RMSE values were obtained by the RF (0.1857) and CatBoost (0.1858) models using RGBrsoil, followed by CatBoost with MSIrsoil (0.1879) and ET with RGBrsoil (0.1895) (Figure 9a). Among the linear models, Lasso presented an RMSE of 0.1908, close to the values of Ridge (0.1909) and Elastic Net (0.1927), both with MSIrsoil + RGBrsoil. The highest errors were observed mainly for DT, with MSIrsoil + RGBrsoil (0.2365), MSIrsoil (0.2471), and RGBrsoil (0.2725). For the MAE metric, the lowest values were recorded for CatBoost with RGBrsoil (0.1458), Lasso with MSIrsoil + RGBrsoil (0.1470), and Ridge with the same combination (0.1487) (Figure 9b). Ensemble models, such as RF and ET, also showed low absolute errors with RGBrsoil (0.1489 and 0.1514, respectively). In contrast, XGB and DT presented the highest MAE, with 0.1945 for MSIrsoil and 0.2077 for RGBrsoil, respectively.
The MAPE values indicated the lowest percentage error for CatBoost (0.2047) and KNN (0.2098) with RGBrsoil, and CatBoost with MSIrsoil (0.2146). Random Forest with RGBrsoil showed a MAPE of 0.2151, close to KNN with MSIrsoil (0.2188) (Figure 9c). The highest values were obtained by L-SVM with 0.2751 for RGBrsoil and LGBM with 0.2869 for MSIrsoil + RGBrsoil. In the CCC analysis, the highest values were observed for CatBoost with RGBrsoil (0.8799) and with MSIrsoil (0.8796), followed by L-SVM with MSIrsoil + RGBrsoil (0.8766). Furthermore, DT again stood out among the worst-performing models, with 0.8120 for MSIrsoil + RGBrsoil, 0.7983 for MSIrsoil, and 0.7635 for RGBrsoil (Figure 9d).
Overall, the CatBoost model stood out as the most consistent across all evaluated metrics, simultaneously exhibiting low RMSE, MAE, and MAPE values, as well as the highest concordance coefficients. Regarding the predictor sets, although the use of RGBrsoil selected by Boruta stood out among the best performers, the results were found to vary according to the interaction between models and predictors, showing proximity in the obtained values and suggesting that different combinations can achieve competitive performance.
The detailed performance of the CatBoost model, which stood out in the comparative analysis, revealed high performance (Figure 10). In the training phase, the model achieved a CCC of 0.98, with an RMSE of 0.086, an MAE of 0.069, and a MAPE of 0.127. For the test dataset, the performance resulted in a CCC of 0.83, RMSE of 0.214, MAE of 0.161, and MAPE of 0.258. The scatter plot indicated a distribution of predicted and observed values close to the identity line for both datasets. The feature importance analysis (Figure 10b) identified the areacrop variable as the most significant predictor, contributing 19.94% to the model, followed by areatotal with 13.83%. The remaining features showed lower contributions. Among them, ExR registered 7.76% importance, while WI, VARI, B, G, and R showed similar values, ranging from 6.41% to 6.04%. The features with the lowest predictive power were GRRI (2.96%), EVI2green (3.41%), NGRDI (3.88%), and BI (3.88%).

4. Discussion

In this study, we demonstrated the high capacity of machine learning models, powered by UAV data, to predict buffelgrass biomass. The results indicated that combining RGB sensors with robust preprocessing and feature selection strategies can generate models with accuracy comparable to, or even exceeding, that of more complex multispectral data [8,13,73,74] (Figure 9), demonstrating the viability of more accessible and scalable agricultural monitoring methodologies. The close performance and interchangeable rankings of the algorithms across the different metrics and evaluations suggest that even simpler approaches, when properly tuned, can rival more complex algorithms, highlighting that the processing steps are as important as the models themselves [75,76]. In particular, the success of CatBoost (Figure 9 and Figure 10) lies in its ability to optimize gradients to build robust models that capture feature interactions [77,78]. This superior performance, in alignment with other studies in the agricultural field, solidifies the standing of gradient boosting and decision tree-based algorithms [14,79,80,81,82].
However, although CatBoost achieved good absolute predictive performance, its apparent superiority should be interpreted with caution, as part of the gain resulted from overfitting (RMSEtrain = 0.086; RMSEtest = 0.214, Figure 10). Stronger regularization strategies, such as restricting tree depth, increasing L2 penalization, reducing the learning rate, and implementing early stopping based on an independent validation set, represent appropriate avenues to mitigate this effect and will be considered in future developments [83]. More broadly, the results reinforce that predictive performance depends not only on the choice of the algorithm but also on adequate control of model complexity and rigorous validation procedures [83,84]. In addition, the small margin separating CatBoost from competing algorithms (Figure 9) indicates that these models are also likely to benefit from the aforementioned regularization and validation strategies.
The consistent improvement in model performance after soil exclusion (Figure 4) demonstrates that the soil background acts as spectral noise in sparse vegetation canopies [85,86,87]. Simple segmentation based on vegetation index thresholds (e.g., NGRDI and NDVI) proved to be a fundamental step, refining the relationship between the predictors and biomass [17]. Based on these findings, we reinforce the need for data extraction approaches that isolate the vegetation signal. Future research should focus on validating these models under more diverse conditions and investigating automated methods for spectral thresholding to further refine the pre-processing stage, such as Rana et al. [22] and Sarathamani et al. [16].
Although texture analysis is frequently employed to enhance prediction models [12,88,89], the comparison among data sources revealed that GLCM-derived attributes generally did not contribute to modeling biomass (Figure 4, Figure 7 and Figure 8). The main hypothesis for this divergence is that the mean of the textural parameters underestimated the canopy’s structural complexity, a characteristic that proved to be better represented by spectral data. Furthermore, the use of a single window and an average across directions may have obscured relevant information. Therefore, it is suggested that the predictive capacity of GLCM could be enhanced by extracting various alternative statistical aggregation metrics (e.g., maximum, minimum, standard deviation) [90] or by adopting deep learning architectures that process the raw image data, overcoming the limitations of a single summary metric [91]. The statistical summary of the GLCM metrics exhibited consistently high homogeneity and low variance (e.g., low mean contrast and a restricted entropy range) for buffelgrass cover, indicating a rapid saturation of textural information under dense cover conditions, considering the methods adopted in this study. This limited spatial variability at centimeter-scale resolution prevented the textural features employed from effectively capturing the structural differences that are critical for accurate biomass estimation (Table S3).
In contrast, the combination of RGB and multispectral (MSI) information resulted in performance improvements in specific scenarios (Figure 7 and Figure 8), highlighting the contribution of the RE and NIR bands in capturing features complementary to visible spectrum data [92,93,94]. The stability observed in the combined MSIrsoil + RGBrsoil model (Figure 9) reinforces the value of sensor integration, even though the final model with the highest accuracy was based exclusively on RGB predictors. Studies conducted on oilseed rape [95], oat [29], and different grass species [96] indicate that fusing RGB data and multispectral characteristics can increase precision in estimating biomass and productivity. However, this gain is highly context-dependent, varying according to the cultivated species, the analyzed target, and environmental conditions [25,97]. Thus, the evaluation of this integration’s potential must be performed on a case-by-case basis, as demonstrated in this study by the proximity of the obtained metric values (Figure 9). For forage managers who prioritize low equipment cost, faster processing time, and high efficiency in large-scale monitoring, the RGB-only workflow is clearly the preferred choice. Conversely, for research scenarios or high-value precision agriculture applications that require the highest assurance of model stability under different conditions, the combination of MSIrsoil + RGBrsoil (although more computationally demanding) is justified.
Wrapper-type methods (RFE and Boruta) and Lasso (embedded) showed superior performance compared to filter methods (Correlation and RReliefF) and Ridge (embedded), highlighting the relevance of strategies that automatically eliminate features (Figure 6 and Figure 7). The absence of predefined thresholds in Correlation, RReliefF, and Ridge limits their evaluation capacity and reduces precision in feature selection. In contrast, Boruta and RFE, by estimating feature importance within the context of the modeling algorithm itself, identified more informative and non-redundant subsets. This approach resulted in reduced computational complexity and increased model robustness.
However, the effectiveness of feature selection proved to be strongly dependent on the nature of the input variables. Whereas in the GLCMrsoil and RGBrsoil datasets, high redundancy allowed methods like RFE and Boruta to identify subsets with information gain or without significant informational loss (Figure 6), in the MSIrsoil dataset, dimensionality reduction was more limited (Figure 5), reflecting the complementarity of the chosen multispectral indices. In this scenario, feature selection proves to be an essential step in defining the best balance between reducing the number of attributes and preserving the model’s predictive capacity [10,28,29]. For example, Zhang et al. [29] also highlighted the importance of feature selection for ML models in estimating oat biomass, finding the best combination with RFE and a multilayer perceptron. Although filter methods, such as those based on correlation [28], are also employed in different contexts, model-integrated procedures exhibit greater robustness and constitute a more reliable starting point for defining the feature set [98].
The dominance of the areacrop (vegetation area) variable as the most important predictor in the final model (Figure 10b)—surpassing all vegetation indices and spectral bands—and its predominance in the feature selection methods (Figure 5) indicate that, for buffelgrass under the evaluated conditions, the vegetation cover fraction constitutes highly significant information, especially when associated with spectral data. This is particularly true in conjunction with the total available areatotal, which was the second most important variable, characterizing a relationship between the available area and the area occupied by the crop. The estimate is complemented by spectral data, capable of capturing variations in canopy physiology and structure [10,94,99]. This finding has practical implications, suggesting that simpler models focused on the precise estimation of cover fraction can offer excellent cost-effectiveness in pasture monitoring, particularly for biomass [10,99,100,101]. This finding reinforces the fundamental importance of simple structural variables, indicating that spectral complexity is not the only path to greater accuracy in biomass estimation [101]. The high importance of the variables areacrop and areatotal resulted, in part, from the experimental design since fresh biomass was measured at the whole-plot level. The use of absolute areas as predictors constitutes a limitation, as it introduces dependence between available area and total biomass, constraining extrapolation to larger operational scales. In practical applications, this constraint may be mitigated by previously estimating the fraction of vegetation cover through automatic image segmentation into regular grids or management zones, over which biomass is subsequently modeled.
Conducting a comprehensive comparative analysis, incorporating multiple algorithms, data sources, and feature selection methods, proved essential for critically evaluating the obtained results. This approach reduced the risk of biases associated with reliance on a single model or preprocessing technique, offering a robust basis for interpreting the observed patterns. The consistent performance across different algorithms confirmed that trends—such as the superiority of soil removal or the efficacy of certain feature selection methods—were not the result of model-specific behaviors but rather reflected generalizable responses from the dataset. These findings demonstrate that predictive performance emerges from the integration of data quality, preprocessing efficacy, and the appropriateness of the employed algorithms, reinforcing the interdependent nature of the modeling workflow components (Figure 4, Figure 6 and Figure 9).
Despite the promising results, we acknowledge the limitations of this study. The study was conducted at a single experimental site, and model transferability to other regions (with different soil types and management conditions) needs to be validated [102]. The thresholds for soil removal were defined by visual inspection and were effective for the present dataset; however, this procedure may introduce subjectivity, and the absence of a sensitivity analysis of these cut-offs remains a limitation of the workflow. Future work should evaluate data-driven procedures, such as automated optimization methods, to define thresholds with greater robustness under different conditions. For application in heterogeneous environments, soil-normalization procedures based on soil-line concepts or local radiometric calibration can mitigate background effects on vegetation indices, whereas prioritizing highly informative structural variables, such as cultivated area and canopy area, may improve model stability relative to raw spectral bands under variable atmospheric and calibration conditions. Further research should also validate the models across broader spatial and temporal domains, integrate 3D structural metrics derived from Structure-from-Motion photogrammetry, and explore deep learning approaches, such as Convolutional Neural Networks, capable of jointly learning spectral and textural features from raw imagery to advance biomass estimation accuracy.

5. Conclusions

In conclusion, our study establishes a robust workflow for estimating buffelgrass biomass using UAV data and machine learning. The removal of soil pixels, for example, consistently improved model performance, reducing errors and increasing concordance for spectral data. In the evaluation of isolated data sources, multispectral (MSI) predictors proved to be the most robust, while textural (GLCM) attributes did not contribute significantly to the modeling. Although the combination of MSI and RGB data demonstrated strong complementarity in specific models, we have shown that RGB sensors, when associated with careful processing and feature selection, can produce high-precision results. The prominence of area calculations (areatotal, areacrop and areasoil), surpassing the contribution of spectral and textural indices, demonstrates that the optimization of the analytical pipeline can be more decisive than sensor complexity. These findings pave the way for the development of low-cost tools for precision pasture management, with the potential to optimize animal production and promote sustainability in agricultural systems in semiarid regions.
Although the results are promising, the study was conducted at a single experimental site, indicating that model transferability needs to be validated under more diverse conditions. To translate these findings, future research should focus on integrating 3D structural information, such as canopy height obtained from photogrammetry, which can capture vertical biomass variability not fully represented by 2D data, as well as on exploring deep learning architectures to extract spectro-textural features directly from the images, further refining the estimation accuracy.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/drones10010061/s1. Figure S1: Daily rainfall and irrigation depths applied at 100% of reference evapotranspiration (ET0) during the experimental period. Vertical dashed lines represent the beginning and end of each cycle (25 October 2024–5 January 2025; 11 April 2025–3 June 2025); Table S1: Agisoft Metashape processing template (v.2.1.2, Agisoft LLC, St. Petersburg, Russia); Table S2: Mean variation and percentage change in metrics for soil removal, including the number of models that showed an increase or decrease; Table S3: Descriptive statistics of Gray-Level Co-occurrence Matrix (GLCM) metrics.

Author Contributions

Conceptualization, W.M.d.S., A.M.d.R.F.J. and T.G.F.d.S.; Methodology, W.M.d.S., A.M.d.R.F.J., L.D.C.d.S.M., M.B.M.d.M., E.F.d.S., L.S.B.d.S., A.C.B., J.R.I.S., Ê.F.d.F.e.S., J.L.M.P.d.L., L.P.C.M. and T.G.F.d.S.; Software, W.M.d.S., A.M.d.R.F.J., A.C.B. and J.R.I.S.; Validation, W.M.d.S., M.B.M.d.M., L.S.B.d.S. and T.G.F.d.S.; Formal analysis, W.M.d.S., A.M.d.R.F.J., L.D.C.d.S.M., M.B.M.d.M., E.F.d.S., L.S.B.d.S., A.C.B., J.R.I.S., Ê.F.d.F.e.S., J.L.M.P.d.L., L.P.C.M. and T.G.F.d.S.; Investigation, W.M.d.S., A.M.d.R.F.J. and T.G.F.d.S.; Resources, A.M.d.R.F.J. and T.G.F.d.S.; Data curation, W.M.d.S., A.M.d.R.F.J., L.D.C.d.S.M., M.B.M.d.M., E.F.d.S., L.S.B.d.S., A.C.B., J.R.I.S., J.L.M.P.d.L., L.P.C.M. and T.G.F.d.S.; Writing—original draft preparation, W.M.d.S., L.D.C.d.S.M.; Writing—review and editing, W.M.d.S., A.M.d.R.F.J., L.D.C.d.S.M., M.B.M.d.M., E.F.d.S., L.S.B.d.S., A.C.B., J.R.I.S., Ê.F.d.F.e.S., J.L.M.P.d.L., L.P.C.M. and T.G.F.d.S.; Visualization, W.M.d.S., A.M.d.R.F.J., L.D.C.d.S.M., M.B.M.d.M., E.F.d.S., L.S.B.d.S., A.C.B., J.R.I.S., Ê.F.d.F.e.S., J.L.M.P.d.L., L.P.C.M. and T.G.F.d.S.; Supervision, A.M.d.R.F.J., A.C.B. and T.G.F.d.S.; Project administration, A.M.d.R.F.J., L.S.B.d.S. and T.G.F.d.S.; Funding acquisition, A.M.d.R.F.J., L.S.B.d.S., J.L.M.P.d.L., L.P.C.M. and T.G.F.d.S. All authors have read and agreed to the published version of the manuscript.

Funding

We are grateful to the Research Support Foundation of the State of Pernambuco (FACEPE; APQ-0428-5.01/21, APQ-1449-5.01/22, APQ-0783-5.01/22, APQ-2105-5.01/24); the National Council for Scientific and Technological Development (CNPq; 309558/2021-2, 402622/2021-9); the Coordination for the Improvement of Higher Education Personnel (CAPES; Finance Code 001); and the São Paulo Research Foundation (FAPESP; 2025/19074-1) for financial support and research grants.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

We sincerely acknowledge the Postgraduate Program in Agricultural Engineering (PGEA), the University of Coimbra (UC), Portugal, and the Federal Rural University of Pernambuco (UFRPE), Brazil, for their support and the provision of equipment used in this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sanyaolu, M.; Sadowski, A. The Role of Precision Agriculture Technologies in Enhancing Sustainable Agriculture. Sustainability 2024, 16, 6668. [Google Scholar] [CrossRef]
  2. Qin, A.; Ning, D. Developments, Applications, and Innovations in Agricultural Sciences and Biotechnologies. Appl. Sci. 2025, 15, 4381. [Google Scholar] [CrossRef]
  3. Fernandes, P.B.; dos Santos, C.A.; Gurgel, A.L.C.; Gonçalves, L.F.; Fonseca, N.N.; Moura, R.B.; Costa, K.A.d.P.; Paim, T.d.P. Non-Destructive Methods Used to Determine Forage Mass and Nutritional Condition in Tropical Pastures. AgriEngineering 2023, 5, 1614–1629. [Google Scholar] [CrossRef]
  4. Urquizo, J.; Ccopi, D.; Ortega, K.; Castañeda, I.; Patricio, S.; Passuni, J.; Figueroa, D.; Enriquez, L.; Ore, Z.; Pizarro, S. Estimation of Forage Biomass in Oat (Avena sativa) Using Agronomic Variables through UAV Multispectral Imaging. Remote Sens. 2024, 16, 3720. [Google Scholar] [CrossRef]
  5. Subhashree, S.N.; Igathinathane, C.; Hendrickson, J.; Archer, D.; Liebig, M.; Halvorson, J.; Kronberg, S.; Toledo, D.; Sedivec, K. Evaluating Remote Sensing Resolutions and Machine Learning Methods for Biomass Yield Prediction in Northern Great Plains Pastures. Agriculture 2025, 15, 505. [Google Scholar] [CrossRef]
  6. Avneri, A.; Aharon, S.; Brook, A.; Atsmon, G.; Smirnov, E.; Sadeh, R.; Abbo, S.; Peleg, Z.; Herrmann, I.; Bonfil, D.J.; et al. UAS-Based Imaging for Prediction of Chickpea Crop Biophysical Parameters and Yield. Comput. Electron. Agric. 2023, 205, 107581. [Google Scholar] [CrossRef]
  7. Wang, T.; Liu, Y.; Wang, M.; Fan, Q.; Tian, H.; Qiao, X.; Li, Y. Applications of UAS in Crop Biomass Monitoring: A Review. Front. Plant Sci. 2021, 12, 616689. [Google Scholar] [CrossRef]
  8. Grüner, E.; Astor, T.; Wachendorf, M. Biomass Prediction of Heterogeneous Temperate Grasslands Using an SfM Approach Based on UAV Imaging. Agronomy 2019, 9, 54. [Google Scholar] [CrossRef]
  9. Ferreira, F.M.; Leite, R.V.; Malikouski, R.G.; Peixoto, M.A.; Bernardeli, A.; Alves, R.S.; de Magalhães Júnior, W.C.P.; Andrade, R.G.; Bhering, L.L.; Machado, J.C. Bioenergy Elephant Grass Genotype Selection Leveraged by Spatial Modeling of Conventional and High-Throughput Phenotyping Data. J. Clean. Prod. 2022, 363, 132286. [Google Scholar] [CrossRef]
  10. Santos, W.M.d.; Martins, L.D.C.d.S.; Bezerra, A.C.; Souza, L.S.B.d.; Jardim, A.M.d.R.F.; Silva, M.V.d.; Souza, C.A.A.d.; Silva, T.G.F.d. Use of Unmanned Aerial Vehicles for Monitoring Pastures and Forages in Agricultural Sciences: A Systematic Review. Drones 2024, 8, 585. [Google Scholar] [CrossRef]
  11. Santos, W.M.d.; Costa, C.d.J.P.; Medeiros, M.L.d.S.; Jardim, A.M.d.R.F.; Cunha, M.V.d.; Dubeux Junior, J.C.B.; Jaramillo, D.M.; Bezerra, A.C.; Souza, E.J.O.d. Can Unmanned Aerial Vehicle Images Be Used to Estimate Forage Production Parameters in Agroforestry Systems in the Caatinga? Appl. Sci. 2024, 14, 4896. [Google Scholar] [CrossRef]
  12. Freitas, R.G.; Pereira, F.R.S.; Dos Reis, A.A.; Magalhães, P.S.G.; Figueiredo, G.K.D.A.; do Amaral, L.R. Estimating Pasture Aboveground Biomass under an Integrated Crop-Livestock System Based on Spectral and Texture Measures Derived from UAV Images. Comput. Electron. Agric. 2022, 198, 107122. [Google Scholar] [CrossRef]
  13. Wang, X.; Yan, S.; Wang, W.; Liubing, Y.; Li, M.; Yu, Z.; Chang, S.; Hou, F. Monitoring Leaf Area Index of the Sown Mixture Pasture through UAV Multispectral Image and Texture Characteristics. Comput. Electron. Agric. 2023, 214, 108333. [Google Scholar] [CrossRef]
  14. Li, S.; Xiang, Y.; Jin, M.; Tang, Z.; Sun, T.; Liu, X.; Huang, X.; Li, Z.; Zhang, F. Estimation of Potato Leaf Area Index and Aboveground Biomass Based on a New Texture Index Constructed from Unmanned Aerial Vehicles Multispectral Images. J. Soil Sci. Plant Nutr. 2025, 25, 7092–7107. [Google Scholar] [CrossRef]
  15. Deng, H.; Zhang, W.; Zheng, X.; Zhang, H. Crop Classification Combining Object-Oriented Method and Random Forest Model Using Unmanned Aerial Vehicle (UAV) Multispectral Image. Agriculture 2024, 14, 548. [Google Scholar] [CrossRef]
  16. Sarathamani, A.P.; Kumar, A.; Singh, R.P. Optimizing Harvested Paddy Field Classification: Leveraging Combined Local Convolution and Individual Sample as Mean Training Approach. J. Indian Soc. Remote Sens. 2025, 53, 1–23. [Google Scholar] [CrossRef]
  17. Xu, C.; Zeng, Y.; Zheng, Z.; Zhao, D.; Liu, W.; Ma, Z.; Wu, B. Assessing the Impact of Soil on Species Diversity Estimation Based on UAV Imaging Spectroscopy in a Natural Alpine Steppe. Remote Sens. 2022, 14, 671. [Google Scholar] [CrossRef]
  18. Psiroukis, V.; Papadopoulos, G.; Kasimati, A.; Tsoulias, N.; Fountas, S. Cotton Growth Modelling Using UAS-Derived DSM and RGB Imagery. Remote Sens. 2023, 15, 1214. [Google Scholar] [CrossRef]
  19. Zhang, C.; Zhu, Q.; Fu, Z.; Yuan, C.; Geng, M.; Meng, R. Estimation of Aboveground Biomass of Chinese Milk Vetch Based on UAV Multi-Source Map Fusion. Remote Sens. 2025, 17, 699. [Google Scholar] [CrossRef]
  20. Xu, X.; Fan, L.; Li, Z.; Meng, Y.; Feng, H.; Yang, H.; Xu, B. Estimating Leaf Nitrogen Content in Corn Based on Information Fusion of Multiple-Sensor Imagery from UAV. Remote Sens. 2021, 13, 340. [Google Scholar] [CrossRef]
  21. Agrawal, J.; Arafat, M.Y. Transforming Farming: A Review of AI-Powered UAV Technologies in Precision Agriculture. Drones 2024, 8, 664. [Google Scholar] [CrossRef]
  22. Rana, S.; Gerbino, S.; Carillo, P. Study of Spectral Overlap and Heterogeneity in Agriculture Based on Soft Classification Techniques. MethodsX 2025, 14, 103114. [Google Scholar] [CrossRef] [PubMed]
  23. Negawo, A.T.; Assefa, Y.; Hanson, J.; Abdena, A.; Muktar, M.S.; Habte, E.; Sartie, A.M.; Jones, C.S. Genotyping-By-Sequencing Reveals Population Structure and Genetic Diversity of a Buffelgrass (Cenchrus ciliaris L.) Collection. Diversity 2020, 12, 88. [Google Scholar] [CrossRef]
  24. Santos, A.R.M.d.; de Morais, J.E.F.; Salvador, K.R.d.S.; de Souza, C.A.A.; Araújo Júnior, G.d.N.; Alves, C.P.; Jardim, A.M.d.R.F.; da Silva, M.J.; de Carvalho, F.G.; dos Santos, A.S.L.; et al. Environmental Seasonality Affects the Growth, Yield and Economic Viability of Irrigated Forage Species in Dry Regions. Irrig. Drain. 2025, 74, 1609–1637. [Google Scholar] [CrossRef]
  25. Herzig, P.; Borrmann, P.; Knauer, U.; Klück, H.C.; Kilias, D.; Seiffert, U.; Pillen, K.; Maurer, A. Evaluation of Rgb and Multispectral Unmanned Aerial Vehicle (Uav) Imagery for High-Throughput Phenotyping and Yield Prediction in Barley Breeding. Remote Sens. 2021, 13, 2670. [Google Scholar] [CrossRef]
  26. Prey, L.; Hanemann, A.; Ramgraber, L.; Seidl-Schulz, J.; Noack, P.O. UAV-Based Estimation of Grain Yield for Plant Breeding: Applied Strategies for Optimizing the Use of Sensors, Vegetation Indices, Growth Stages, and Machine Learning Algorithms. Remote Sens. 2022, 14, 6345. [Google Scholar] [CrossRef]
  27. Yu, X.; Lu, D.; Jiang, X.; Li, G.; Chen, Y.; Li, D.; Chen, E. Examining the Roles of Spectral, Spatial, and Topographic Features in Improving Land-Cover and Forest Classifications in a Subtropical Region. Remote Sens. 2020, 12, 2907. [Google Scholar] [CrossRef]
  28. Whitmire, C.D.; Vance, J.M.; Rasheed, H.K.; Missaoui, A.; Rasheed, K.M.; Maier, F.W. Using Machine Learning and Feature Selection for Alfalfa Yield Prediction. AI 2021, 2, 71–88. [Google Scholar] [CrossRef]
  29. Zhang, P.; Lu, B.; Ge, J.; Wang, X.; Yang, Y.; Shang, J.; La, Z.; Zang, H.; Zeng, Z. Using UAV-Based Multispectral and RGB Imagery to Monitor above-Ground Biomass of Oat-Based Diversified Cropping. Eur. J. Agron. 2025, 162, 127422. [Google Scholar] [CrossRef]
  30. Alvares, C.A.; Stape, J.L.; Sentelhas, P.C.; de Moraes Gonçalves, J.L.; Sparovek, G. Köppen’s Climate Classification Map for Brazil. Meteorol. Z. 2013, 22, 711–728. [Google Scholar] [CrossRef]
  31. Santos, H.G.d. Sistema Brasileiro de Classificação de Solos; Embrapa Solos: Rio de Janeiro, Brazil, 2018. [Google Scholar]
  32. Soil Survey Staff. Keys to Soil Taxonomy, 13th ed.; USDA Natural Resources Conservation Service: Washington, DC, USA, 2022. [Google Scholar]
  33. McGeorge, W.T. Diagnosis and Improvement of Saline and Alkaline Soils. Soil Sci. Soc. Am. J. 1954, 18, 348. [Google Scholar] [CrossRef]
  34. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2025. [Google Scholar]
  35. Hijmans, R.J. Raster: Geographic Data Analysis and Modeling. 2022. Available online: http://r.meteo.uni.wroc.pl/web/packages/raster/raster.pdf (accessed on 10 September 2025).
  36. Hijmans, R.J. Terra: Spatial Data Analysis. 2025. Available online: https://cran.r-project.org/web/packages/terra/terra.pdf (accessed on 10 September 2025).
  37. Pebesma, E. Simple Features for R: Standardized Support for Spatial Vector Data. R J. 2018, 10, 439–446. [Google Scholar] [CrossRef]
  38. Matias, F.I.; Caraza-Harter, M.V.; Endelman, J.B. FIELDimageR: An R Package to Analyze Orthomosaic Images from Agricultural Field Trials. Plant Phenome J. 2020, 3, e20005. [Google Scholar] [CrossRef]
  39. Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring Vegetation Systems in the Great Plains with ERTS. NASA Spec. Publ. 1973, 351, 309. [Google Scholar]
  40. Gitelson, A.A.; Merzlyak, M.N. Remote Sensing of Chlorophyll Concentration in Higher Plant Leaves. Adv. Space Res. 1998, 22, 689–692. [Google Scholar] [CrossRef]
  41. Widjaja Putra, B.T.; Soni, P. Enhanced Broadband Greenness in Assessing Chlorophyll a and b, Carotenoid, and Nitrogen in Robusta Coffee Plantations Using a Digital Camera. Precis. Agric. 2018, 19, 238–256. [Google Scholar] [CrossRef]
  42. Jordan, C.F. Derivation of Leaf-Area Index from Quality of Light on the Forest Floor. Ecology 1969, 50, 663–666. [Google Scholar] [CrossRef]
  43. Gitelson, A.; Kaufman, Y.; Merzlyak, M.N. Use of a Green Channel in Remote Sensing of Global Vegetation from EOS-MODIS. Remote Sens. Environ. 1996, 58, 289–298. [Google Scholar] [CrossRef]
  44. Daughtry, C.S.T.; Walthall, C.L.; Kim, M.S.; De Colstoun, E.B.; McMurtrey, J.E. Estimating Corn Leaf Chlorophyll Concentration from Leaf and Canopy Reflectance. Remote Sens. Environ. 2000, 74, 229–239. [Google Scholar] [CrossRef]
  45. Broge, N.H.; Leblanc, E. Comparing Prediction Power and Stability of Broadband and Hyperspectral Vegetation Indices for Estimation of Green Leaf Area Index and Canopy Chlorophyll Density. Remote Sens. Environ. 2001, 76, 156–172. [Google Scholar] [CrossRef]
  46. Baptista, G.M.d.M. Aplicação Do Índice de Vegetação Por Profundidade de Feição Espectral (SFDVI-Spectral Feature Depth Vegetation Index) Em Dados RapidEye. In Proceedings of the XVII Simpósio Brasileiro de Sensoriamento Remoto—SBSR. INPE, João Pessoa, Brazil, 25–29 April 2015; pp. 2277–2284. [Google Scholar]
  47. Huete, A.R. A Soil-Adjusted Vegetation Index (SAVI). Remote Sens. Environ. 1988, 25, 295–309. [Google Scholar] [CrossRef]
  48. Sripada, R.P.; Heiniger, R.W.; White, J.G.; Meijer, A.D. Aerial Color Infrared Photography for Determining Early In-Season Nitrogen Requirements in Corn. Agron. J. 2006, 98, 968–977. [Google Scholar] [CrossRef]
  49. Tucker, C.J. Red and Photographic Infrared Linear Combinations for Monitoring Vegetation. Remote Sens. Environ. 1979, 8, 127–150. [Google Scholar] [CrossRef]
  50. Maimaitijiang, M.; Sagan, V.; Sidike, P.; Maimaitiyiming, M.; Hartling, S.; Peterson, K.T.; Maw, M.J.W.; Shakoor, N.; Mockler, T.; Fritschi, F.B. Vegetation Index Weighted Canopy Volume Model (CVMVI) for Soybean Biomass Estimation from Unmanned Aerial System-Based RGB Imagery. ISPRS J. Photogramm. Remote Sens. 2019, 151, 27–41. [Google Scholar] [CrossRef]
  51. Woebbecke, D.M.; Meyer, G.E.; Von Bargen, K.; Mortensen, D.A. Color Indices for Weed Identification Under Various Soil, Residue, and Lighting Conditions. Trans. ASAE 1995, 38, 259–269. [Google Scholar] [CrossRef]
  52. Louhaichi, M.; Borman, M.M.; Johnson, D.E. Spatially Located Platform and Aerial Photography for Documentation of Grazing Impacts on Wheat. Geocarto Int. 2001, 16, 65–70. [Google Scholar] [CrossRef]
  53. Peñuelas, J.; Gamon, J.A.; Fredeen, A.L.; Merino, J.; Field, C.B. Reflectance Indices Associated with Physiological Changes in Nitrogen- and Water-Limited Sunflower Leaves. Remote Sens. Environ. 1994, 48, 135–146. [Google Scholar] [CrossRef]
  54. Gitelson, A.A.; Stark, R.; Grits, U.; Rundquist, D.; Kaufman, Y.; Derry, D. Vegetation and Soil Lines in Visible Spectral Space: A Concept and Technique for Remote Estimation of Vegetation Fraction. Int. J. Remote Sens. 2002, 23, 2537–2562. [Google Scholar] [CrossRef]
  55. Ahmad, I.S.; Reid, J.F. Evaluation of Colour Representations for Maize Images. J. Agric. Eng. Res. 1996, 63, 185–195. [Google Scholar] [CrossRef]
  56. Richardson, A.J.; Weigand, C.L. Distinguishing Vegetation from Soil Background Information. Photogramm. Eng. Remote Sens. 1977, 43, 1541–1552. [Google Scholar]
  57. Haralick, R.M.; Dinstein, I.; Shanmugam, K. Textural Features for Image Classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621. [Google Scholar] [CrossRef]
  58. Python Software Foundation. Python Language Reference, Version 3.10.18; Network Theory Ltd.: Godalming, UK, 2025.
  59. Gillies, S.; et al. Rasterio: Geospatial Raster I/O for Python Programmers; GitHub: San Francisco, CA, USA, 2013. [Google Scholar]
  60. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array Programming with NumPy. Nature 2020, 585, 357–362. [Google Scholar] [CrossRef] [PubMed]
  61. Lam, S.K.; Pitrou, A.; Seibert, S. Numba: A LLVM-Based Python JIT Compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, Austin, TX, USA, 15 November 2015; Association for Computing Machinery: New York, NY, USA, 2015. [Google Scholar]
  62. Kira, K.; Rendell, L.A. The Feature Selection Problem: Traditional Methods and a New Algorithm. In Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, CA, USA, 12–16 July 1992; AAAI Press: Washington, DC, USA, 1992; pp. 129–134. [Google Scholar]
  63. Robnik-Sikonja, M.; Kononenko, I. An Adaptation of Relief for Attribute Estimation in Regression. In Proceedings of the Fourteenth International Conference on Machine Learning, Nashville, TN, USA, 8–12 July 1997; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1997; pp. 296–304. [Google Scholar]
  64. Niquini, F.G.F.; Branches, A.M.B.; Costa, J.F.C.L.; Moreira, G.d.C.; Schneider, C.L.; Araújo, F.C.d.; Capponi, L.N. Recursive Feature Elimination and Neural Networks Applied to the Forecast of Mass and Metallurgical Recoveries in A Brazilian Phosphate Mine. Minerals 2023, 13, 748. [Google Scholar] [CrossRef]
  65. Kursa, M.B.; Rudnicki, W.R. Feature Selection with the Boruta Package. J. Stat. Softw. 2010, 36, 1–13. [Google Scholar] [CrossRef]
  66. Almutiri, T.M.; Alomar, K.H.; Alganmi, N.A. Integrating Multi-Omics Using Bayesian Ridge Regression with Iterative Similarity Bagging. Appl. Sci. 2024, 14, 5660. [Google Scholar] [CrossRef]
  67. Lin, L.I.-K. A Concordance Correlation Coefficient to Evaluate Reproducibility. Biometrics 1989, 45, 255. [Google Scholar] [CrossRef]
  68. Friedman, J.H.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010, 33, 1–22. [Google Scholar] [CrossRef]
  69. Kuhn, M. Caret: Classification and Regression Training. 2022. Available online: https://cran.r-project.org/web/packages/caret/caret.pdf (accessed on 10 September 2025).
  70. Robnik-Sikonja, M.; Savicky, P. CORElearn: Classification, Regression and Feature Evaluation. 2024. Available online: https://za.mirrors.cicku.me/CRAN/web/packages/CORElearn/CORElearn.pdf (accessed on 10 September 2025).
  71. Kuhn, M.; Wickham, H. Tidymodels: A Collection of Packages for Modeling and Machine Learning Using Tidyverse Principles. 2020. Available online: https://CRAN.R-project.org/package=tidymodels (accessed on 10 September 2025).
  72. Wickham, H. Ggplot2: Elegant Graphics for Data Analysis, 3rd ed.; Springer: New York, NY, USA, 2016; ISBN 978-3-319-24277-4. [Google Scholar]
  73. Gracia-Romero, A.; Kefauver, S.C.; Vergara-Díaz, O.; Zaman-Allah, M.A.; Prasanna, B.M.; Cairns, J.E.; Araus, J.L. Comparative Performance of Ground vs. Aerially Assessed Rgb and Multispectral Indices for Early-Growth Evaluation of Maize Performance under Phosphorus Fertilization. Front. Plant Sci. 2017, 8, 309121. [Google Scholar] [CrossRef]
  74. Niu, Y.; Han, W.; Zhang, H.; Zhang, L.; Chen, H. Estimating Maize Plant Height Using a Crop Surface Model Constructed from UAV RGB Images. Biosyst. Eng. 2024, 241, 56–67. [Google Scholar] [CrossRef]
  75. Shin, J.; Moon, H.; Chun, C.J.; Sim, T.; Kim, E.; Lee, S. Enhanced Data Processing and Machine Learning Techniques for Energy Consumption Forecasting. Electronics 2024, 13, 3885. [Google Scholar] [CrossRef]
  76. Kamencay, P.; Hockicko, P.; Hudec, R. Sensors Data Processing Using Machine Learning. Sensors 2024, 24, 1694. [Google Scholar] [CrossRef]
  77. Hancock, J.T.; Khoshgoftaar, T.M. CatBoost for Big Data: An Interdisciplinary Review. J. Big Data 2020, 7, 94. [Google Scholar] [CrossRef] [PubMed]
  78. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A Comparative Analysis of Gradient Boosting Algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967. [Google Scholar] [CrossRef]
  79. Yasaswy, M.Y.S.K.; Manimegalai, T.; Somasundaram, J. Crop Yield Prediction in Agriculture Using Gradient Boosting Algorithm Compared with Random Forest. In Proceedings of the 2022 International Conference on Cyber Resilience (ICCR), Dubai, United Arab Emirates, 6–7 October 2022. [Google Scholar] [CrossRef]
  80. Pradeep, G.; Rayen, T.D.V.; Pushpalatha, A.; Rani, P.K. Effective Crop Yield Prediction Using Gradient Boosting to Improve Agricultural Outcomes. In Proceedings of the 2023 International Conference on Networking and Communications (ICNWC), Chennai, India, 5–6 April 2023. [Google Scholar] [CrossRef]
  81. Zhai, W.; Li, C.; Fei, S.; Liu, Y.; Ding, F.; Cheng, Q.; Chen, Z. CatBoost Algorithm for Estimating Maize Above-Ground Biomass Using Unmanned Aerial Vehicle-Based Multi-Source Sensor Data and SPAD Values. Comput. Electron. Agric. 2023, 214, 108306. [Google Scholar] [CrossRef]
  82. Tunca, E.; Köksal, E.S.; Akay, H.; Öztürk, E.; Taner, S. Novel Machine Learning Framework for High-Resolution Sorghum Biomass Estimation Using Multi-Temporal UAV Imagery. Int. J. Environ. Sci. Technol. 2025, 22, 13673–13688. [Google Scholar] [CrossRef]
  83. Raiaan, M.A.K.; Sakib, S.; Fahad, N.M.; Mamun, A.A.; Rahman, M.A.; Shatabda, S.; Mukta, M.S.H. A Systematic Review of Hyperparameter Optimization Techniques in Convolutional Neural Networks. Decis. Anal. J. 2024, 11, 100470. [Google Scholar] [CrossRef]
  84. Xu, Y.; Goodacre, R. On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning. J. Anal. Test. 2018, 2, 249–262. [Google Scholar] [CrossRef]
  85. Prudnikova, E.; Savin, I.; Vindeker, G.; Grubina, P.; Shishkonakova, E.; Sharychev, D. Influence of Soil Background on Spectral Reflectance of Winter Wheat Crop Canopy. Remote Sens. 2019, 11, 1932. [Google Scholar] [CrossRef]
  86. Almeida-Ñauñay, A.F.; Tarquis, A.M.; López-Herrera, J.; Pérez-Martín, E.; Pancorbo, J.L.; Raya-Sereno, M.D.; Quemada, M. Optimization of Soil Background Removal to Improve the Prediction of Wheat Traits with UAV Imagery. Comput. Electron. Agric. 2023, 205, 107559. [Google Scholar] [CrossRef]
  87. Riquelme, L.; Duncan, D.H.; Rumpff, L.; Vesk, P.A. Using Remote Sensing to Estimate Understorey Biomass in Semi-Arid Woodlands of South-Eastern Australia. Remote Sens. 2022, 14, 2358. [Google Scholar] [CrossRef]
  88. Pan, T.; Ye, H.; Zhang, X.; Liao, X.; Wang, D.; Bayin, D.; Safarov, M.; Okhonniyozov, M.; Majid, G. Estimating Aboveground Biomass of Grassland in Central Asia Mountainous Areas Using Unmanned Aerial Vehicle Vegetation Indices and Image Textures—A Case Study of Typical Grassland in Tajikistan. Environ. Sustain. Indic. 2024, 22, 100345. [Google Scholar] [CrossRef]
  89. Dos Reis, A.A.; Werner, J.P.S.; Silva, B.C.; Figueiredo, G.K.D.A.; Antunes, J.F.G.; Esquerdo, J.C.D.M.; Coutinho, A.C.; Lamparelli, R.A.C.; Rocha, J.V.; Magalhães, P.S.G. Monitoring Pasture Aboveground Biomass and Canopy Height in an Integrated Crop–Livestock System Using Textural Information from PlanetScope Imagery. Remote Sens. 2020, 12, 2534. [Google Scholar] [CrossRef]
  90. Dheepak, G.; Christaline, J.A.; Vaishali, D. Brain Tumor Classification: A Novel Approach Integrating GLCM, LBP and Composite Features. Front. Oncol. 2023, 13, 1248452. [Google Scholar] [CrossRef]
  91. Dritsas, E.; Trigka, M. Remote Sensing and Geospatial Analysis in the Big Data Era: A Survey. Remote Sens. 2025, 17, 550. [Google Scholar] [CrossRef]
  92. Cundill, S.L.; van der Werff, H.M.A.; van der Meijde, M. Adjusting Spectral Indices for Spectral Response Function Differences of Very High Spatial Resolution Sensors Simulated from Field Spectra. Sensors 2015, 15, 6221–6240. [Google Scholar] [CrossRef] [PubMed]
  93. Wei, H.E.; Grafton, M.; Bretherton, M.; Irwin, M.; Sandoval, E. Evaluation of Point Hyperspectral Reflectance and Multivariate Regression Models for Grapevine Water Status Estimation. Remote Sens. 2021, 13, 3198. [Google Scholar] [CrossRef]
  94. Marques, P.; Pádua, L.; Sousa, J.J.; Fernandes-Silva, A. Advancements in Remote Sensing Imagery Applications for Precision Management in Olive Growing: A Systematic Review. Remote Sens. 2024, 16, 1324. [Google Scholar] [CrossRef]
  95. Zhu, H.; Liang, S.; Lin, C.; He, Y.; Xu, J.L. Using Multi-Sensor Data Fusion Techniques and Machine Learning Algorithms for Improving UAV-Based Yield Prediction of Oilseed Rape. Drones 2024, 8, 642. [Google Scholar] [CrossRef]
  96. Hernandez, A.; Jensen, K.; Larson, S.; Larsen, R.; Rigby, C.; Johnson, B.; Spickermann, C.; Sinton, S. Using Unmanned Aerial Vehicles and Multispectral Sensors to Model Forage Yield for Grasses of Semiarid Landscapes. Grasses 2024, 3, 84–109. [Google Scholar] [CrossRef]
  97. Ochiai, S.; Kamada, E.; Sugiura, R. Comparative Analysis of RGB and Multispectral UAV Image Data for Leaf Area Index Estimation of Sweet Potato. Smart Agric. Technol. 2024, 9, 100579. [Google Scholar] [CrossRef]
  98. Prastyo, P.H.; Ardiyanto, I.; Hidayat, R. A Review of Feature Selection Techniques in Sentiment Analysis Using Filter, Wrapper, or Hybrid Methods. In Proceedings of the 2020 6th International Conference on Science and Technology (ICST), Yogyakarta, Indonesia, 7–8 September 2020. [Google Scholar] [CrossRef]
  99. Schaefer, M.T.; Lamb, D.W.; Ozdogan, M.; Baghdadi, N.; Thenkabail, P.S. A Combination of Plant NDVI and LiDAR Measurements Improve the Estimation of Pasture Biomass in Tall Fescue (Festuca arundinacea Var. Fletcher). Remote Sens. 2016, 8, 109. [Google Scholar] [CrossRef]
  100. Théau, J.; Lauzier-Hudon, É.; Aubé, L.; Devillers, N. Estimation of Forage Biomass and Vegetation Cover in Grasslands Using UAV Imagery. PLoS ONE 2021, 16, e0245784. [Google Scholar] [CrossRef]
  101. Poley, L.G.; McDermid, G.J. A Systematic Review of the Factors Influencing the Estimation of Vegetation Aboveground Biomass Using Unmanned Aerial Systems. Remote Sens. 2020, 12, 1052. [Google Scholar] [CrossRef]
  102. Punalekar, S.M.; Verhoef, A.; Quaife, T.L.; Humphries, D.; Bermingham, L.; Reynolds, C.K. Application of Sentinel-2A Data for Pasture Biomass Monitoring Using a Physically Based Radiative Transfer Model. Remote Sens. Environ. 2018, 218, 207–220. [Google Scholar] [CrossRef]
Figure 1. Details of the study area location (7.95458° S, 38.29509° W). (a) Brazil within South America. (b) The state of Pernambuco, Brazil. Aerial images from (c) 4 January 2025, and (d) 2 June 2025. (e) Overview of the sampled Cenchrus ciliaris L. (buffelgrass) plots.
Figure 2. Sampling of fresh biomass from Cenchrus ciliaris L. (buffelgrass) plots in the present study. Each plot consisted of four rows, of which only the two central rows were considered for sampling. Each of these two rows was subdivided into three equal segments, resulting in six sampled units per plot, from which fresh biomass was harvested and weighed (kg).
Figure 4. Performance of machine learning models applied to different data sources (Gray-Level Co-occurrence Matrix-GLCM, Multispectral-MSI, and Red, Green, and Blue-RGB) and their variations with soil pixel removal (rsoil), evaluated by RMSE (root mean square error), MAE (mean absolute error), MAPE (mean absolute percentage error), and CCC (Lin’s concordance correlation coefficient) indicators in the prediction of fresh biomass (kg). In the radial charts, superior model performance is represented by polygons closer to the center for error metrics (RMSE, MAE, and MAPE) and closer to the outer edges for the CCC.
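For readers who want to reproduce the evaluation metrics summarized in Figures 4 and 9, the snippet below is a minimal Python sketch of RMSE, MAE, MAPE, and Lin’s concordance correlation coefficient (CCC) computed from observed and predicted biomass; the input vectors are hypothetical and are not the study data.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, MAPE (%) and Lin's CCC for observed vs. predicted biomass."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)

    # Lin's concordance correlation coefficient (Lin, 1989)
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()          # population variances
    cov = np.mean((y_true - mu_t) * (y_pred - mu_p))   # population covariance
    ccc = float(2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2))

    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "CCC": ccc}

# Example with hypothetical fresh biomass values (kg)
obs = [0.43, 0.78, 0.72, 0.90, 0.84, 0.99, 0.96, 0.88]
pred = [0.50, 0.70, 0.75, 0.85, 0.80, 1.05, 0.90, 0.92]
print(regression_metrics(obs, pred))
```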
Figure 5. Selection and frequency of spectral and textural features by different feature selection methods. (a) Breakdown of selected versus non-selected features. (b) Proportion of selection relative to the total available features per feature set. (c) Frequency of feature selection. Where rsoil: soil removed, CO: Contrast, ME: Mean, ENT: Entropy, COR: Correlation, VAR: Variance, ASM: Angular Second Moment, MaxProb: Maximum Probability, Corr: Correlation-based Elimination, and RFE: Recursive Feature Elimination.
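As an illustration of the wrapper-style selection compared in Figures 5 and 6, the following sketch runs Recursive Feature Elimination (RFE) with a random forest in scikit-learn. The study’s pipeline was implemented in R (caret, CORElearn, Boruta, tidymodels), so this Python version on synthetic data only mirrors the general procedure; the feature matrix, target, and number of retained features are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# Hypothetical predictor matrix (e.g., indices, bands, GLCM metrics) and biomass target
rng = np.random.default_rng(7)
X = rng.normal(size=(120, 20))
y = 0.6 * X[:, 0] - 0.3 * X[:, 5] + rng.normal(scale=0.2, size=120)

# Recursively drop the least important predictor until 8 remain
selector = RFE(
    estimator=RandomForestRegressor(n_estimators=300, random_state=7),
    n_features_to_select=8,
    step=1,
)
selector.fit(X, y)
print("selected feature indices:", np.where(selector.support_)[0])
```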
Figure 6. Performance (RMSE) of different feature selection methods applied to three feature groups combined with distinct machine learning models in the prediction of fresh biomass (kg). (a) Model performance for each selection method compared to using all attributes (gray area). (b) ΔRMSE (RMSEselection algorithm − RMSEAll features) of models after feature selection. Negative values indicate an error reduction. Where rsoil: soil removed, Corr: Correlation-based Elimination, and RFE: Recursive Feature Elimination.
Figure 7. Variations in relative root mean square error (ΔRMSE) across different combinations of predictor sets and feature selection methods in the prediction of fresh biomass (kg). (a) Comparison of ΔRMSE (RMSEwithout selection − RMSEwith selection) of the combination with and without feature selection. (b) Comparison of ΔRMSE (RMSEleft − RMSEright) among the different possible combinations using RFE (Recursive Feature Elimination). Where rsoil: soil removed, and All: RGBrsoil + MSIrsoil + GLCMrsoil.
Figure 8. Change in root mean square error (ΔRMSE) as a function of the isolated use of GLCMrsoil, MSIrsoil, and RGBrsoil after applying Boruta, and their combinations (All (RGBrsoil + MSIrsoil + GLCMrsoil), GLCMrsoil, MSIrsoil, RGBrsoil, GLCMrsoil + MSIrsoil, GLCMrsoil + RGBrsoil, and MSIrsoil + RGBrsoil) after applying RFE (Recursive Feature Elimination) for different machine learning models in the prediction of fresh biomass (kg). (a) Isolated use of GLCM and its combinations. (b) Isolated use of MSI and its combinations. (c) Isolated use of RGB and its combinations.
Figure 9. Performance metrics for the machine learning models based on different predictor sets (MSIrsoil, MSIrsoil + RGBrsoil − RFE, and RGBrsoil − Boruta) in the prediction of fresh biomass (kg). (a) Root Mean Square Error (RMSE), (b) Mean Absolute Error (MAE), (c) Mean Absolute Percentage Error (MAPE), and (d) Concordance Correlation Coefficient (CCC).
Figure 10. (a) The predictive model performance and (b) relative feature importance. The red dashed line represents the reference line for prediction (1:1).
Table 1. Average values for each treatment and date.

Date | Irrigation Depths (% ET0) | Biomass (kg) per Treatment | Biomass (kg) per Date
5 January 2025 | 50 | 0.43 | 0.71
5 January 2025 | 75 | 0.78 |
5 January 2025 | 100 | 0.72 |
5 January 2025 | 125 | 0.90 |
3 June 2025 | 50 | 0.84 | 0.92
3 June 2025 | 75 | 0.99 |
3 June 2025 | 100 | 0.96 |
3 June 2025 | 125 | 0.88 |
Irrigation management was based on reference evapotranspiration (ET0).
Table 2. Vegetation indices and equations used.

Sensor | Index | Abbr. | Equation | Reference
MSI | Normalized Difference Vegetation Index | NDVI | (NIR − R)/(NIR + R) | [39]
MSI | Normalized difference red edge | NDRE | (NIR − RE)/(NIR + RE) | [40]
MSI | Normalized NIR index | NNIR | NIR/(NIR + RE + G) | [41]
MSI | Ratio Vegetation Index | RVI | NIR/R | [42]
MSI | Green Normalized Difference Vegetation Index | GNDVI | (NIR − G)/(NIR + G) | [43]
MSI | Modified Chlorophyll Absorption in Reflectance Index | MCARI | ((RE − R) − 0.2(RE − G)) × (RE/R) | [44]
MSI | MERIS total chlorophyll index | MTCI | (NIR − RE)/(RE − R) | [12]
MSI | Triangular Vegetation Index | TVI | 0.5(120(NIR − G) − 200(R − G)) | [45]
MSI | Spectral Feature Depth Vegetation Index | SFDVI | ((NIR + G)/2) − ((R + RE)/2) | [46]
MSI | Soil adjusted vegetation index | SAVI | (NIR − R)(1 + 0.5)/(NIR + R + 0.5) | [47]
MSI | Green optimal soil adjusted vegetation index | GOSAVI | (1 + 0.16) × (NIR − G)/(NIR + G + 0.16) | [48]
MSI | Normalized green red difference index | NGRDIM | (G − R)/(G + R) | [49]
MSI | Green-red ratio index | GRRIM | G/R | [50]
MSI | Optimized Soil Adjusted Vegetation Index-Green | OSAVIgreenM | (1.5(G − R))/((G + R) + 0.16) | [41]
MSI | Enhanced Vegetation Index 2-Green | EVI2greenM | (2.5(G − R))/(G + 2.4R + 1) | [41]
MSI | Excess Red Vegetation Index | ExRM | 1.4R − G | [51]
RGB | Normalized green red difference index | NGRDI | (G − R)/(G + R) | [49]
RGB | Green-red ratio index | GRRI | G/R | [50]
RGB | Optimized Soil Adjusted Vegetation Index-Green | OSAVIgreen | (1.5(G − R))/((G + R) + 0.16) | [41]
RGB | Enhanced Vegetation Index 2-Green | EVI2green | (2.5(G − R))/(G + 2.4R + 1) | [41]
RGB | Excess Red Vegetation Index | ExR | 1.4R − G | [51]
RGB | Excess Green vegetation index | ExG | 2G − R − B | [51]
RGB | Excess Blue Vegetation Index | ExB | 1.4B − G | [51]
RGB | Normalized Difference Vegetation Index RGB | NDVIrgb | ((G + B) − R)/((G + B) + R) | [41]
RGB | Green leaf index | GLI | (2G − R − B)/(2G + R + B) | [52]
RGB | Normalized pigment chlorophyll ratio index | NPCI | (R − B)/(R + B) | [53]
RGB | Visible atmospherically resistant index | VARI | (G − R)/(G + R − B) | [54]
RGB | Woebbecke Index | WI | (G − B)/(R − B) | [51]
RGB | Normalized Red | Rn | R/(R + G + B) | [50]
RGB | Normalized Green | Gn | G/(R + G + B) | [50]
RGB | Normalized Blue | Bn | B/(R + G + B) | [50]
RGB | Color Intensity Index | INT | (R + G + B)/3 | [55]
RGB | Brightness Index | BI | ((R^2 + G^2 + B^2)/3)^0.5 | [56]
RGB | Overall Hue Index | HUE | atan(2(B − G − R)/(3^0.5 (G − R))) | [38]
Abbr.: Abbreviation; RGB index—R: Red, B: Blue, G: Green; MSI index—G: Green (560 ± 16 nm), R: Red (650 ± 16 nm), NIR: Near-infrared (860 ± 26 nm), and RE: Red edge (730 ± 16 nm).
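The indices in Table 2 are simple band-arithmetic operations. As a minimal sketch, the Python snippet below computes NDVI, NGRDI, and ExG from reflectance arrays, assuming the orthomosaic bands have already been read into NumPy arrays; the band values here are randomly generated placeholders, not data from the study.

```python
import numpy as np

def safe_divide(num, den):
    """Element-wise division that returns NaN where the denominator is zero."""
    out = np.full_like(np.asarray(num, dtype=float), np.nan)
    np.divide(num, den, out=out, where=den != 0)
    return out

def ndvi(nir, red):
    return safe_divide(nir - red, nir + red)       # NDVI (Table 2, [39])

def ngrdi(green, red):
    return safe_divide(green - red, green + red)   # NGRDI (Table 2, [49])

def exg(red, green, blue):
    return 2 * green - red - blue                  # Excess Green (Table 2, [51])

# Hypothetical reflectance arrays standing in for orthomosaic bands
rng = np.random.default_rng(0)
red, green, blue, nir = (rng.uniform(0.01, 0.6, (4, 4)) for _ in range(4))
print(ndvi(nir, red).round(3))
print(ngrdi(green, red).round(3))
print(exg(red, green, blue).round(3))
```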
Table 3. Description of the GLCM metrics.

Metric | Description
Contrast (CO) | $\sum_{i,j=0}^{N-1} P_{i,j}\,(i - j)^2$
Mean (ME) | $\sum_{i,j=0}^{N-1} i \times P_{i,j}$
Entropy (ENT) | $\sum_{i,j=0}^{N-1} P_{i,j}\,(-\ln P_{i,j})$
Correlation (COR) | $\sum_{i,j=0}^{N-1} P_{i,j}\,\frac{(i - \mu_i)(j - \mu_j)}{\sqrt{\sigma_i^2 \sigma_j^2}}$
Variance (VAR) | $\sum_{i,j=0}^{N-1} P_{i,j}\,(i - \mu_i)^2$
Angular Second Moment (ASM) | $\sum_{i,j=0}^{N-1} P_{i,j}^2$
Maximum Probability (MaxProb) | $\max_{i,j} P_{i,j}$
P_{i,j}: normalized co-occurrence matrix entry, approximating the probability that values i and j occur in adjacent pixels within the defined window; i: value of a target pixel; j: value of the neighbor of pixel i; N: number of rows or columns.
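To make the Table 3 definitions concrete, the sketch below builds a normalized co-occurrence matrix for horizontal neighbors of a small quantized patch and evaluates the seven metrics with NumPy. The window, offset, and number of gray levels are illustrative assumptions and do not reproduce the study’s texture-extraction settings.

```python
import numpy as np

def glcm_metrics(gray, levels=8):
    """Normalized GLCM for the (0, 1) offset and the Table 3 metrics."""
    # Quantize the image to `levels` gray levels
    q = np.floor(gray / gray.max() * (levels - 1)).astype(int)

    # Co-occurrence counts for each pixel and its right-hand neighbor
    P = np.zeros((levels, levels), dtype=float)
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        P[a, b] += 1
    P /= P.sum()                                    # normalize to probabilities

    i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    var_i, var_j = ((i - mu_i) ** 2 * P).sum(), ((j - mu_j) ** 2 * P).sum()

    nz = P > 0                                      # avoid log(0) in the entropy term
    return {
        "CO": ((i - j) ** 2 * P).sum(),
        "ME": mu_i,
        "ENT": -(P[nz] * np.log(P[nz])).sum(),
        "COR": (P * (i - mu_i) * (j - mu_j)).sum() / np.sqrt(var_i * var_j),
        "VAR": var_i,
        "ASM": (P ** 2).sum(),
        "MaxProb": P.max(),
    }

# Tiny hypothetical gray-level patch
patch = np.array([[0, 0, 1, 1], [0, 2, 2, 3], [3, 3, 2, 1], [1, 0, 0, 2]], dtype=float)
print(glcm_metrics(patch, levels=4))
```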
Table 4. Description of the datasets constructed for biomass modeling.

No. | Data Source | RGB Indices and Bands | MSI Indices and Bands | GLCM Metrics | Soil Pixel Removal
1 | RGB | Yes | No | No | No
2 | RGBrsoil | Yes | No | No | Yes
3 | MSI | No | Yes | No | No
4 | MSIrsoil | No | Yes | No | Yes
5 | GLCM | No | No | Yes | No
6 | GLCMrsoil | No | No | Yes | Yes
7 | RGB + GLCM | Yes | No | Yes | No
8 | RGBrsoil + GLCMrsoil | Yes | No | Yes | Yes
9 | RGB + MSI | Yes | Yes | No | No
10 | RGBrsoil + MSIrsoil | Yes | Yes | No | Yes
11 | MSI + GLCM | No | Yes | Yes | No
12 | MSIrsoil + GLCMrsoil | No | Yes | Yes | Yes
13 | RGB + MSI + GLCM | Yes | Yes | Yes | No
14 | RGBrsoil + MSIrsoil + GLCMrsoil | Yes | Yes | Yes | Yes
GLCM: Gray-Level Co-occurrence Matrix, MSI: Multispectral, RGB: Red, Green, and Blue, rsoil: soil removed.
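As a hedged illustration of the soil pixel removal step that distinguishes the rsoil datasets in Table 4, the sketch below masks non-vegetation pixels with a simple Excess Green (ExG) threshold before plot-level statistics are computed. The threshold, band dictionary, and segmentation criterion are assumptions for demonstration only and do not reproduce the exact masking procedure of the study.

```python
import numpy as np

def remove_soil(bands, exg_threshold=0.05):
    """Mask pixels classified as soil before computing plot-level statistics.

    `bands` is a dict of 'R', 'G', 'B' reflectance arrays; the ExG threshold
    is purely illustrative, not the criterion used in the study.
    """
    exg = 2 * bands["G"] - bands["R"] - bands["B"]
    vegetation = exg > exg_threshold                  # True where the pixel is kept
    masked = {k: np.where(vegetation, v, np.nan) for k, v in bands.items()}
    cover_fraction = vegetation.mean()                # proxy for vegetation cover area
    return masked, cover_fraction

# Hypothetical 5 x 5 RGB reflectance patch
rng = np.random.default_rng(1)
bands = {k: rng.uniform(0, 1, (5, 5)) for k in ("R", "G", "B")}
masked_bands, cover = remove_soil(bands)
print(f"vegetation cover fraction: {cover:.2f}")
```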
Table 5. Tuned hyperparameters.

Model | Engine (Package) | Hyperparameters
Ridge | glmnet | penalty
Lasso | glmnet | penalty
Elastic Net | glmnet | penalty, mixture
SVM Linear | kernlab | cost, margin
SVM Kernel RBF | kernlab | cost, rbf_sigma
K-Nearest Neighbors | kknn | neighbors
Random Forest | ranger | trees
Extra Trees | ranger | trees
Decision Trees | rpart | cost_complexity
XGBoost | xgboost | trees, learn_rate
LightGBM | lightgbm | trees, learn_rate
CatBoost | catboost | trees, learn_rate
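The hyperparameters in Table 5 were tuned in R (tidymodels engines). As a rough Python analogue, the sketch below grid-searches the two CatBoost hyperparameters listed in the table (number of trees and learning rate) with cross-validation on synthetic data; the grid values, data split, and scoring choice are illustrative assumptions, not the study’s configuration.

```python
import numpy as np
from catboost import CatBoostRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical feature matrix (e.g., RGB indices + cover area) and biomass target (kg)
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 8))
y = 0.5 + 0.3 * X[:, 0] + rng.normal(scale=0.1, size=120)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Grid over the two CatBoost hyperparameters listed in Table 5 (trees, learn_rate)
grid = {"iterations": [200, 500, 1000], "learning_rate": [0.01, 0.05, 0.1]}
search = GridSearchCV(
    CatBoostRegressor(verbose=0, random_seed=42),
    param_grid=grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test RMSE:", np.sqrt(np.mean((search.predict(X_test) - y_test) ** 2)))
```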
