1. Introduction
Sugarcane (Saccharum officinarum L.) plays a major role in Brazilian agriculture because of its large contribution to biofuel and sugar production and the growing importance of sugarcane bagasse in new sectors, such as bioelectricity generation and the supply of renewable raw material (Vandenberghe et al. [1]). Brazil has been the world’s largest sugarcane producer for decades (Cursi et al. [2]), and a key reason for this success has been the fast and continuous adoption of novel technologies by the farmers and industries involved in sugarcane production over recent decades.
According to the National Supply Agency (CONAB), a Brazilian governmental agency, sugarcane production in the 2022/23 agricultural season reached 598.3 million tons, 3.9% higher than in the previous season (2021/22), despite severe droughts and heat in recent years. Brazil therefore offers a favorable agronomic setting for developing technologies applied to this crop.
Because sugarcane covers large areas in Brazil, evaluating crop performance in real time with traditional monitoring tools is challenging. Approaches based on aerial images obtained by satellites or by remotely piloted aircraft systems (RPAS) may therefore be a viable option for surveying vast sugarcane-cultivated areas rapidly and inexpensively. Variations in the canopies of cultivated areas are typically tracked in aerial images by using vegetation indices calculated from wavelength band data collected at different stages of crop development, at different locations, and in fields managed under different agricultural practices. In this way, slight variations in growth conditions can be identified, compared, and correlated by applying mathematical models. These vegetation indices are mathematical combinations of spectral band data from proximal, suborbital, and orbital sensors and are related to the quantity and developmental stage of the vegetation in the area where the spectral measurement was collected (Hoffman and Todd [3]; Simoes et al. [4]).
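The two indices used later in this study, NDVI and VARI, are simple band combinations: NDVI = (NIR − Red)/(NIR + Red) and VARI = (Green − Red)/(Green + Red − Blue). The Python sketch below computes them per pixel with NumPy, assuming the bands are already co-registered reflectance arrays of the same shape. The function names, the example reflectance values, and the choice to return NaN where a denominator is zero are illustrative assumptions, not the image-processing pipeline used in this study.

```python
import numpy as np


def _safe_ratio(num, den):
    """Element-wise num / den, with NaN wherever den == 0 (assumed convention)."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    out = np.full(np.broadcast(num, den).shape, np.nan)
    np.divide(num, den, out=out, where=den != 0)
    return out


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return _safe_ratio(nir - red, nir + red)


def vari(green, red, blue):
    """Visible atmospherically resistant index: (G - R) / (G + R - B)."""
    g, r, b = (np.asarray(band, dtype=float) for band in (green, red, blue))
    return _safe_ratio(g - r, g + r - b)


# Hypothetical 2 x 2 reflectance patch (values for illustration only)
nir = np.array([[0.45, 0.30], [0.50, 0.20]])
red = np.array([[0.05, 0.10], [0.05, 0.20]])
green = np.array([[0.12, 0.10], [0.09, 0.20]])
blue = np.array([[0.03, 0.05], [0.04, 0.10]])

ndvi_map = ndvi(nir, red)
vari_map = vari(green, red, blue)
# One simple plot-level summary is the mean over the plot's pixels
print(np.nanmean(ndvi_map), np.nanmean(vari_map))
```

VARI uses only visible bands, so it can be computed from an ordinary RGB camera, whereas NDVI requires a near-infrared band.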
For this reason, vegetation indices have been increasingly used to characterize biophysical parameters, such as leaf area and green biomass, which indicate the presence and condition of vegetation at a given time because of their strong correlation with absorbed solar radiation [5]. These indices can therefore differentiate vegetated from non-vegetated areas and identify stages of the phenological cycle of a specific crop [6]. Ref. [7] used the normalized difference vegetation index (NDVI) to determine the effect of soil chemical attributes, soil type, and rainfall on cotton yield and yield variability, while [8] evaluated water potential and NDVI variability in four plant species growing in a saline semiarid ecosystem in southern Sonora, Mexico. For sugarcane yield estimation, Ref. [9] developed a model that uses RGB-based vegetation indices, a digital elevation model, and a digital surface model obtained from a camera mounted on an RPAS; according to these authors, the model achieved high accuracy in sugarcane fields in Thailand. Furthermore, [10] used vegetation indices obtained from multispectral cameras mounted on an RPAS to estimate sugarcane yield at the planting-row level at three different growth stages in fields located in Brisbane, Australia. Using a generalized linear model, these authors found that the best predictions were obtained during the growth stage.
Classical linear regression models are widely used in many fields of study (Cho et al. [11]; Liu et al. [12]; Chaboun et al. [13]) and can quantify relationships among variables of interest. However, some datasets must be treated more carefully, for example, when explanatory variables have a nonlinear effect on the response variable. In such cases, a semiparametric model may be more appropriate.
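In generic notation (assumed here for illustration, not the exact specification used in this study), the difference between a parametric and a semiparametric predictor can be written as

$$
g(\mu_i) = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}
\qquad \text{versus} \qquad
g(\mu_i) = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \sum_{k=1}^{q} f_k(t_{ik}),
$$

where $\mu_i = E(Y_i)$, $g$ is a link function, the $x_{ij}$ enter linearly through the coefficients $\beta_j$, and the $f_k$ are unspecified smooth functions of the covariates $t_{ik}$, typically estimated with penalized splines. For a positive, right-skewed response such as yield, one could, for example, keep categorical factors in the linear part and let a vegetation index enter through a smooth function.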
Indeed, when not all covariates have a linear effect and the response depends on some explanatory variables through an arbitrary functional form, approaches beyond parametric regression should be considered. Ref. [14] proposed semiparametric generalized linear models in which the linear predictor is of the additive semiparametric type. Ref. [15] proposed the generalized additive model (GAM), which combines the features of generalized linear models with those of additive models, while [16] combined nonparametric and parametric regression into a semiparametric regression. In addition, Ref. [17] developed the generalized additive model for location, scale, and shape (GAMLSS), which is flexible enough to model all parameters of the response distribution simultaneously as functions of explanatory variables. Several authors have applied semiparametric models in this context: Ref. [18] showed the advantages of GAMLSS for modeling and interpreting possible nonlinear climatic effects on the growth of eucalyptus trees; Ref. [19] proposed additive semiparametric models for symmetric distributions; and Ref. [20] used semiparametric and stochastic frontier models to estimate the efficiency of maize production by smallholders in Zimbabwe.
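The idea these approaches share, a parametric part plus a nonparametric smooth, can be sketched with a partially linear model y = Xβ + f(t) + ε fitted by penalized least squares. The Python sketch below is a simplified stand-in, not the inverse Gaussian semiparametric model developed in this study: it assumes Gaussian errors and an identity link, represents f as a linear spline with knots at quantiles of t, and applies a ridge penalty only to the knot coefficients. The function names, the number of knots, and the fixed penalty `lam` are illustrative assumptions; in practice the penalty is usually selected by criteria such as GCV or REML.

```python
import numpy as np


def truncated_line_basis(t, knots):
    """Columns [1, t, (t - k1)_+, ..., (t - kK)_+] for a penalized linear spline."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t] + [np.clip(t - k, 0.0, None) for k in knots]
    return np.column_stack(cols)


def fit_partially_linear(y, X, t, n_knots=20, lam=0.1):
    """Fit y = X @ beta + f(t) + error by penalized least squares.

    X must not contain an intercept column (the spline basis already has one).
    Only the knot coefficients are penalized, so beta and the overall linear
    trend of f are estimated without shrinkage.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)  # (n, p) parametric design
    knots = np.quantile(t, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    B = truncated_line_basis(t, knots)
    C = np.hstack([X, B])
    p = X.shape[1]
    # Diagonal penalty: zero for beta, intercept, and slope; lam for knot terms
    penalty = np.zeros(C.shape[1])
    penalty[p + 2:] = lam
    theta = np.linalg.solve(C.T @ C + np.diag(penalty), C.T @ y)
    beta = theta[:p]
    f_hat = B @ theta[p:]  # fitted smooth component (includes the intercept)
    return beta, f_hat


# Synthetic example (hypothetical data): linear effect of x, nonlinear effect of t
rng = np.random.default_rng(1)
n = 300
t = rng.uniform(0.0, 1.0, n)
x = rng.normal(size=n)
y = 1.5 * x + np.sin(2 * np.pi * t) + rng.normal(scale=0.1, size=n)
beta, f_hat = fit_partially_linear(y, x, t)
print(beta)
```

Here the coefficient of x keeps its usual linear interpretation, while the sinusoidal effect of t is recovered by the spline without specifying its form in advance.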
Against this background, the present study aimed to develop a statistical model based on data from two field experiments (Field A and Field B) cultivated with four commercial sugarcane varieties, as described in detail in Section 2. Data from multispectral images collected by RPAS at the peak of sugarcane development were compared with agronomic data measured directly in these two experimental fields to correlate vegetation indices with sugarcane production. The model was then validated using yield data from commercial sugarcane fields and vegetation indices calculated from the Sentinel-2 time series available on the Google Earth Engine platform. The hypothesis tested in this study is that vegetation indices (VIs) can serve as predictors of sugarcane yield in a model based on semiparametric statistics.
The main highlights of this study are: (1) RGB and multispectral images of sugarcane fields obtained by drones or satellites are an accurate and fast source of data for evaluating sugarcane growth and biomass accumulation; (2) fitting semiparametric regression models to vegetation indices allows accurate prediction of sugarcane traits such as yield; (3) remotely collected aerial imagery can complement or even replace laborious, expensive, and time-consuming field practices; and (4) the sugarcane yield prediction model produced predicted values that closely matched the observed values.
5. Conclusions
The inverse Gaussian (IG) semiparametric regression model proved suitable for the experimental data (Section 3), explaining 73.8% of the observed data according to the generalized R-squared and 92.1% of the validation data obtained from commercial fields (Section 4). For model development, the predictor variables were “fields”, “varieties”, “NDVI”, and “VARI”; the CTC1007 and RB966928 varieties exhibited higher productivity in Field A than in Field B, whereas the same effect was not observed for the CV7870 and CV0618 varieties. This indicates the effectiveness of this type of approach in correlating data obtained from different sources to explain the same phenomenon, in this case, sugarcane productivity. For model validation, the predictor variables were “Location”, “Area”, “NDVI”, and “VARI”, and productivity was higher at Locations A and B than at the other locations. In both scenarios (development in Section 3 and validation in Section 4), the results of the fitted IG semiparametric regression model agreed with the respective descriptive data analyses. The model was also versatile in elucidating the effects of the categorical predictors (“Locations” or “Experimental Fields”) and the continuous predictors (“NDVI” and “VARI”) on the response variable, production in tons of cane per hectare (TCH), and was considerably easier to apply than multiple linear regression models, whose parameter estimates only indicate whether an effect is positive or negative.
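The conclusions report a generalized R-squared without restating its definition. One common likelihood-based definition, assumed for this sketch, is the Cox–Snell measure R² = 1 − exp{−(2/n)(ℓ_model − ℓ_null)}, where ℓ_model and ℓ_null are the maximized log-likelihoods of the fitted model and of an intercept-only model on n observations. The Python function below only evaluates that formula; its name and the log-likelihood values in the example are hypothetical.

```python
import math


def cox_snell_r2(loglik_model: float, loglik_null: float, n: int) -> float:
    """Cox-Snell generalized R^2 = 1 - exp(-(2/n) * (loglik_model - loglik_null)).

    Equivalent to 1 - (L_null / L_model) ** (2 / n) on the likelihood scale.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - math.exp(-2.0 * (loglik_model - loglik_null) / n)


# Hypothetical log-likelihoods for a fitted model and an intercept-only model
print(cox_snell_r2(loglik_model=-100.0, loglik_null=-150.0, n=100))  # about 0.632 (= 1 - exp(-1))
```

Because it is based on likelihoods rather than squared errors, this measure applies to non-Gaussian responses such as the inverse Gaussian, and it equals 0 when the fitted model is no better than the intercept-only model.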
Residual analysis confirmed that the IG semiparametric regression model fitted the data well; that is, the proposed approach satisfactorily modeled datasets containing continuous predictor variables, including vegetation indices such as NDVI and VARI.
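The type of residual analyzed is not specified in this section. For an inverse Gaussian response, one common diagnostic (used, for example, in the GAMLSS framework) is the normalized quantile residual r_i = Φ⁻¹{F(y_i; μ̂_i, λ̂)}, where F is the fitted IG cumulative distribution function and Φ⁻¹ is the standard normal quantile function; if the model is adequate, the r_i behave approximately like standard normal draws. The Python sketch below assumes the (μ, λ) parameterization with mean μ and shape λ (variance μ³/λ); the function names and the simulated parameter values are hypothetical, and in a real analysis the fitted μ̂_i and λ̂ would come from the model.

```python
import math
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal, used for the inverse CDF


def _phi(z: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ig_cdf(y: float, mu: float, lam: float) -> float:
    """CDF of the inverse Gaussian with mean mu and shape lam (y > 0).

    Note: exp(2 * lam / mu) overflows when 2 * lam / mu exceeds about 709.
    """
    s = math.sqrt(lam / y)
    return _phi(s * (y / mu - 1.0)) + math.exp(2.0 * lam / mu) * _phi(-s * (y / mu + 1.0))


def quantile_residuals(ys, mus, lam: float, eps: float = 1e-12):
    """Normalized quantile residuals Phi^{-1}(F(y_i; mu_i, lam))."""
    res = []
    for y, mu in zip(ys, mus):
        # Clip away from 0 and 1 so the inverse normal CDF stays finite
        p = min(max(ig_cdf(y, mu, lam), eps), 1.0 - eps)
        res.append(_STD_NORMAL.inv_cdf(p))
    return np.array(res)


# Sanity check on simulated data where the IG model is correct by construction
rng = np.random.default_rng(0)
mu_true, lam_true = 80.0, 2275.0  # hypothetical values on a TCH-like scale
y_sim = rng.wald(mu_true, lam_true, size=20000)  # NumPy's Wald = inverse Gaussian
r = quantile_residuals(y_sim, [mu_true] * len(y_sim), lam_true)
print(r.mean(), r.std())  # expected to be near 0 and 1, respectively
```

Plotting such residuals against fitted values, or in a normal Q-Q plot, is the usual way to judge whether the fitted distribution is adequate.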
To assess the statistical performance of the IG semiparametric regression model, it was also compared with the standard multiple linear regression model (the modeling approach usually applied with the inverse Gaussian distribution), which does not include the smoothing function. The proposed semiparametric model performed better than the standard model.
Assessing field productivity parameters with conventional methods of direct field sampling can be inaccurate, tiresome, and time-consuming. This study showed that spectral indices such as NDVI and VARI, which are easily extracted from aerial images collected by drones or satellites, provide a feasible alternative to support or even replace manually measured parameters. In addition, the results of model validation with on-farm data indicate that such tools can be valuable for farmers' day-to-day operations, improving the agronomic efficiency of their crops.
In summary, this research shows that vegetation indices (VIs) derived from images captured by RPAS or satellites can be used to predict sugarcane yield reliably.