Sugarcane Yield Mapping Using High-Resolution Imagery Data and Machine Learning Technique

Abstract: Yield maps provide essential information to guide precision agriculture (PA) practices. Yet, on-board yield monitoring for sugarcane can be challenging. At the same time, orbital images have been widely used for indirect yield estimation for many crops, such as wheat, corn, and rice, but not for sugarcane. Therefore, the objective of this study is to explore the potential of multi-temporal imagery data as an alternative for sugarcane yield mapping. The study was based on developing predictive sugarcane yield models integrating time-series orbital imaging and a machine learning technique. A commercial sugarcane site was selected, and Sentinel-2 images were acquired from the beginning of the ratoon sprouting until harvesting of two consecutive cropping seasons. The predictive yield models RF (Random Forest) and MLR (Multiple Linear Regression) were developed using orbital images and yield maps generated by a commercial sensor-system during harvesting. Original yield data were filtered and interpolated to the same spatial resolution as the orbital images. The entire dataset was divided into training and testing datasets. Spectral bands, especially the near-infrared at the tillering crop stage, contributed more to predicting sugarcane yield than derived spectral vegetation indices. The Root Mean Squared Error (RMSE) obtained for the RF regression based on multiple spectral bands was 4.63 Mg ha⁻¹ with an R² of 0.70 for the testing dataset. Overall, the RF regression performed better than the MLR in predicting sugarcane yield.


Introduction
Remote sensing (RS) is a potential source of data for site-specific crop monitoring, providing spatial and temporal information. Orbital images are commonly used in agriculture to identify spectral variations resulting from soil and crop characteristics at a large scale, supporting diagnostics for agronomical crop parameters and helping farmers to make better management decisions. For example, over the years, orbital images have been used to delimit management zones for annual crops [1], monitor within-field yield variability for many crops such as corn [2] and cotton [3], map vineyard variability [4], plan the wheat harvest [5], develop crop growth models [6], and map grassland biomass [7,8], among others. Some of the main limitations related to orbital images are the lack of ground truth data (calibration) and the measurement accuracy of the agronomical variables [9]. Furthermore, the empirical models to predict agronomical parameters based on spectral information may have spatial and temporal restrictions for application across different fields and seasons [10-13].
For the sugarcane crop, the assessment of spatial variability is challenging due to the limited number of adapted solutions, mainly for yield mapping. Yield maps are essential to better understand the within-field variability, to delimit management zones, and to improve site-specific management strategies [10,14,15]. Sugarcane yield maps are usually obtained from data collected directly by monitors on harvesters, which present some limitations, such as the required calibration, the accessibility of data processing for farmers or users, and a lack of knowledge to manage data from more than one harvester [16]. One additional challenge for sugarcane yield mapping based on yield monitors, compared to grains, is the high-resolution data (due to the slow traveling speed of the harvester and narrow row spacing) combined with high biomass variability and noise from the yield monitor system. These limitations have driven the interest in RS techniques to monitor sugarcane yield.
Several studies reported the strength of the linear correlation between image-based vegetation indices (VIs) and sugarcane yield forecasts [17-19], but none of these were applied using yield data generated by harvesters at the field level. No technique is widely adopted yet for generating yield maps in sugarcane from imagery information, and the current practice of relating biomass estimated by VIs to stalk yield is not consistent regarding spatial and temporal resolutions. Other strategies for sugarcane yield estimation have relied on mathematical models for regional scales that consider meteorological and crop management variables at a low spatial resolution [20], which is insufficient for precision agriculture (PA) purposes. The sugarcane crop has an important economic role in Brazilian agribusiness, with about 9 M ha, especially in São Paulo state, which accounts for 53.70% of national sugarcane production (Agricultural Economics Institute-IEA, 2019) and where this study was developed.
One of the initial research studies regarding spectral patterns of reflectance for sugarcane was conducted by Simões et al. [21], who found strong correlations between the red spectral band (648 nm) and production parameters (leaf area index, LAI, and total sugarcane biomass) until the end of the vegetative stage. Abdel-Rahman et al. [22] listed different applications of RS techniques for sugarcane, such as disease detection, crop health status, and nutrition scouting by identifying patterns in spectral data throughout its phenological stages and orbital images. Lisboa et al. [23] found a relationship between the Normalized Difference Vegetation Index (NDVI) and the concentration of leaf-tissue nutrients when monitoring sugarcane yield, and they observed changes in canopy light reflectance according to the concentration of leaf-tissue nutrients. Rahman and Robson [24] developed a time-series approach through orbital images from the Sentinel-2 satellite to estimate sugarcane yield at the individual block level. Results showed that the maximum Green Normalized Difference Vegetation Index (GNDVI) values across the block were most correlated (r = 0.93) with the average actual yield reported by the sugar mill. Abdel-Rahman et al. [25] investigated RF (Random Forest) predictive models considering VIs from Landsat TM to study sugarcane yield under rainfed and irrigated conditions for two sugarcane varieties. The results suggested a strong relationship for predicting sugarcane yield, mainly under irrigated conditions. Despite the satisfactory prediction accuracy, the field data collection was performed using manual harvesting (average yield values per plot). As such, site-specific yield estimation is still a research gap to bridge.
Another common feature of previous research is the reliance on a single image at a given crop stage to investigate the relationship between spectral VIs and yield. Given the relatively higher temporal resolution of available imagery (for example, the revisit frequency of the Sentinel-2 constellation is five days), and the potential of novel analytic tools to handle many predictor variables (see below for commentary on Machine Learning techniques), a time-series approach for sugarcane yield prediction is worthwhile to investigate. Machine Learning (ML) techniques have been shown to provide higher prediction accuracy compared to traditional statistical analyses as well as to identify dataset patterns [26-28]. These models are commonly subjected to testing and training processes, depending on the complexity of the dataset. The potential of ML techniques, such as the RF and ANN (Artificial Neural Network), for agronomical data was verified by Yuan et al. [29] to estimate the LAI of soybeans. The authors concluded that RF, which is a non-parametric method, provides more accurate estimates when sample plots and variation are relatively large. Yue et al. [30] and Han et al. [31] also reported the application of RF and ANN to assess the above-ground biomass of wheat and maize, respectively. These studies verified that the RF algorithm provided better results than other ML techniques. Schwalbert et al. [32] evaluated the contribution of weather variables to estimate corn yield based on RS data and the RF algorithm, which resulted in a mean absolute error (MAE) of about 0.89 Mg ha⁻¹.
Thus, considering the previous experiences, the purpose of this investigation was to assess spectral bands and derived-VIs from orbital images to develop predictive sugarcane yield models based on RF and MLR (Multiple Linear Regression) algorithms. The specific objectives were (i) to develop a yield prediction model based on time-series analysis of Sentinel-2 images and (ii) to compare the accuracy of MLR and RF regression methods.

Study Site
The study used a commercial site of 56.4 ha (Figure 1), composed of four fields, located in the municipality of Botucatu, São Paulo, Brazil (22°41′42.8″ S; 48°16′54.0″ W; 480 m) during the 2017/2018 and 2019/2020 sugarcane growing seasons. The climate in the study region is mesothermal, Cwa (humid subtropical), which includes drought in the winter (from June to September) and rain from November to April. The average annual rainfall in the municipality is 1433 mm. The relative air humidity is 71% with an annual average temperature of 23 °C. All fields of the study site had the same variety of sugarcane, SP83-2847 (4th ratoon), planted with a row spacing of 1.5 m in Argisol [33], and were mechanically harvested in the late season (the wet time of the year, October and November). The previous harvesting dates (10/14/2017 and 11/11/2018) were considered as a reference to determine the days after cutting (DAC) associated with the date of each orbital image and related to the phenological crop stage.
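The DAC calculation described above is a simple date difference. The following is a minimal Python sketch of that step, not the authors' code (the study itself used R). The function name `days_after_cutting` and the example image date are hypothetical; the year of the image is an assumption made only for illustration.

```python
from datetime import date


def days_after_cutting(previous_harvest: date, image_date: date) -> int:
    """Return the days elapsed between the previous harvest and an image."""
    dac = (image_date - previous_harvest).days
    if dac < 0:
        raise ValueError("image date precedes the previous harvest")
    return dac


# Previous harvest date reported for the site (written month/day/year in the text).
previous_harvest = date(2017, 10, 14)

# Hypothetical acquisition date; the year 2018 is assumed for illustration.
image_date = date(2018, 8, 13)

print(days_after_cutting(previous_harvest, image_date))
```

Expressing every image as a DAC value is what allows images from different seasons to be assigned to the same phenological stage.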

Imagery Data
Fifty-four orbital images from Sentinel-2 were used, covering the 2018/2019 and 2019/2020 sugarcane growing seasons. This satellite constellation uses a multi-spectral instrument with 13 spectral channels and a 290-km swath width. All images were downloaded from the United States Geological Survey (USGS) via the Earth Explorer (earthexplorer.usgs.gov). Images with low cloud cover (<1%) were selected and clipped to the region of interest. All images were projected on the WGS/UTM zone 22S. An internal buffer of 5 m was applied to each field boundary using a Quantum Geographic Information System (QGIS 2.18.26) [34] to ensure that the spectral data corresponded to the sugarcane plants. The original orbital images were submitted to atmospheric correction before each VI calculation using the free open-source Semi-Automatic Classification plugin [34]. This was done to convert the digital numbers to reflectance data, considering the DOS1 (Dark Object Subtraction) atmospheric correction methodology [36]. Five spectral bands from Sentinel-2 were considered in this study (Table 1). The most common VIs cited in the literature for crop yield monitoring were calculated. Four VIs were evaluated (Table 2) throughout the sugarcane development cycle: the NDVI [37], Normalized Difference Red-Edge Index (NDRE) [38], GNDVI [39], and Wide Dynamic Range Vegetation Index (WDRVI) [40]. As suggested by Abrahão et al. [41] and Maresma et al. [42], the value 0.1 was adopted for the weighting coefficient (a) of WDRVI.

Vegetation Index | Equation | Authors
Normalized Difference Vegetation Index | NDVI = (NIR − Red)/(NIR + Red) | Rouse et al. [37]
Normalized Difference Red-Edge Index | NDRE = (NIR − Red-edge)/(NIR + Red-edge) | Barnes et al. [38]
Green Normalized Difference Vegetation Index | GNDVI = (NIR − Green)/(NIR + Green) | Gitelson et al. [39]
Wide Dynamic Range Vegetation Index | WDRVI = (a × NIR − Red)/(a × NIR + Red) | Gitelson [40]
Red: reflectance of the red region (630-685 nm). Red-edge: reflectance in the transition region (690-730 nm). NIR: reflectance in the near-infrared region (760-1500 nm). Green: reflectance of the green region (542-578 nm). a: weighting coefficient (0.1).

Table 3 summarizes the dates of the orbital images and the respective phenological stage of sugarcane according to Matsuoka and Stolf [43]. Each phenological stage was identified according to the DAC to standardize the variables for data processing of both seasons. No images for the initial growing stage were found without cloud cover. All images with low cloud cover (<1%) collected during the sugarcane development cycle were considered for evaluating the potential of free orbital images to assess crop yield.
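The four index equations above can be written as short functions. The sketch below is an illustration in Python (the study computed the indices in a GIS workflow and analyzed them in R), and the reflectance values in the example are hypothetical, not data from the study.

```python
def normalized_difference(a: float, b: float) -> float:
    """Generic (a - b) / (a + b) ratio shared by all four indices."""
    total = a + b
    if total == 0:
        raise ValueError("sum of reflectances is zero")
    return (a - b) / total


def ndvi(nir: float, red: float) -> float:
    return normalized_difference(nir, red)


def ndre(nir: float, red_edge: float) -> float:
    return normalized_difference(nir, red_edge)


def gndvi(nir: float, green: float) -> float:
    return normalized_difference(nir, green)


def wdrvi(nir: float, red: float, a: float = 0.1) -> float:
    # The weighting coefficient a scales NIR before the ratio is taken;
    # a = 0.1 is the value adopted in the text.
    return normalized_difference(a * nir, red)


# Hypothetical surface reflectances for a single pixel.
nir, red, red_edge, green = 0.45, 0.05, 0.20, 0.08
print(ndvi(nir, red), ndre(nir, red_edge), gndvi(nir, green), wdrvi(nir, red))
```

With a = 1, WDRVI reduces to NDVI, which is a quick way to check an implementation.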

Yield Data and Predictive Models
The sugarcane yield maps used as a reference for modeling were generated for both sugarcane growing seasons by an on-board commercial sensor-system (Solinftec, Araçatuba, São Paulo, Brazil) that measures the difference in hydraulic pressure of the harvester chopper calibrated to the real total yield. A Global Navigation Satellite System (GNSS) receiver with Real-Time Kinematic (RTK) differential correction signal (GPS L1/L2 + Glonass) was installed on the harvesters for georeferencing the data every five seconds (0.2 Hz). The base station was fixed at a georeferenced point at the sugar mill, which is commonly used for other agricultural operations. Two harvesters generated the data for each season with an average speed of 1.6 m s⁻¹. The machine cuts one row at a time, following an auto-steering file previously available from mechanical planting.
The original data were converted to sugarcane yield based on the weight distribution of the haulage and filtered according to the methodology described by Maldaner and Molin [44]. The removal of a significant number of points is expected because of the error introduced in the yield data by the sugarcane flow stabilization time and the elevator time [44]. Although filtering yield data is a common practice in PA for any crop, issues related to high spatial density and data noise make adequate filtering of yield monitor data especially important for sugarcane, and a significant amount of raw data may be deleted as a result [44]. The semivariogram and the interpolation of the data were carried out using Vesper 1.6 software [45] with the ordinary kriging method (spatial resolution of 10.0 m × 10.0 m). The input parameters for the variogram calculation of both datasets were 30 lags, 50% lag tolerance, and a maximum distance of 200 m, which were considered the best adjustment parameters for semivariogram calculation.
Because the availability of orbital images is uneven from year to year, the average value of the orbital imagery data that composed each phenological stage was used. For example, in season 2017/2018, the value set to the ripening stage 1 (R1) was the average value of the images from "08/13," "08/18," "08/23," and "08/28." Thus, as the four fields present the same predictor variables (imagery data), the datasets from both seasons were grouped. For developing the predictive yield models, the spectral bands, VIs, and sugarcane yield data from each field were combined into an entire dataset. The observations from the entire dataset were randomly divided into training (2/3) and testing (1/3) datasets considering both cropping seasons.
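The two steps described above, averaging the images within a phenological stage and splitting the observations into 2/3 training and 1/3 testing, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names and the reflectance values are hypothetical, and a seed of 123 in Python does not reproduce the random stream of R's "set.seed".

```python
import random
from statistics import mean


def stage_composite(values_by_date, stage_dates):
    """Average one pixel's values over the image dates that form a stage."""
    return mean(values_by_date[d] for d in stage_dates)


def split_train_test(n_obs, train_fraction=2 / 3, seed=123):
    """Shuffle observation indices and split them into training and testing."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(n_obs * train_fraction)
    return indices[:n_train], indices[n_train:]


# Hypothetical NIR reflectance of one pixel on the four R1 dates named above.
nir_by_date = {"08/13": 0.40, "08/18": 0.42, "08/23": 0.44, "08/28": 0.46}
r1_nir = stage_composite(nir_by_date, ["08/13", "08/18", "08/23", "08/28"])

train_idx, test_idx = split_train_test(n_obs=9)
print(r1_nir, len(train_idx), len(test_idx))
```

Averaging by stage gives every season the same set of predictor columns even when the number of cloud-free images differs between years.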
All statistical analyses and yield prediction models were performed using the R 3.5.5 software [46] with the built-in function "lm" for the MLR model, and the function "randomForest" from the randomForest package [47] for RF regression. To make the results reproducible, the function "set.seed" was used with the value 123 before fitting the predictive models.
The predictive yield model based on MLR was fitted for each type of predictor variable (spectral bands, GNDVI, NDVI, NDRE, and WDRVI). Aiming to improve MLR prediction accuracy, variable selection was carried out based on the p-value: predictor variables with a p-value > 0.05 were removed, and the process was repeated until all remaining predictors had a p-value < 0.05. To find the p-value, the built-in function "summary" from R 3.5.5 software [46] was used. An additional exploratory analysis was the spatial correlation between the VIs for each period of the sugarcane development cycle and yield data using Pearson's correlation coefficient (r). The synthesis of the methodology is shown in Figure 2.
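Pearson's correlation coefficient used in the exploratory analysis can be computed directly from its definition. The sketch below is a plain Python illustration (the study used R), with the hypothetical function name `pearson_r` and made-up example values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical VI values and yields (Mg per hectare) for five pixels.
vi = [0.61, 0.65, 0.70, 0.72, 0.80]
yield_mg_ha = [78.0, 85.0, 83.0, 90.0, 95.0]
print(pearson_r(vi, yield_mg_ha))
```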
For the RF regression predictions, it was necessary to fine-tune two parameters: (a) ntree, the number of trees whose results are averaged, and (b) mtry, the number of predictor variables sampled as candidates at each split. To find the best ntree value, the prediction error rate for a range of 0 to 500 was calculated, and the one that presented the lowest error before stabilization was selected ("elbow rule") [48]. For this study, the setting 100 for ntree was chosen for both the spectral bands and VIs models. For mtry, values from 2 to 45 (the maximum number of predictor variables, excluding those where variance was close to zero) were tested for the spectral bands model, with 22 being used. For the VIs models, values from 2 to 9 were tested. In both cases, the selected mtry was the one that presented the lowest RMSE. For both models (RF and MLR), the independent variables were the reflectance values of the spectral bands (totaling 45 spectral band variables, 5 bands times 9 dates) and derived VIs (totaling 36 VI variables, 4 VIs times 9 dates). The observed sugarcane yield (measured by the on-board sensor-system) was the dependent variable to be predicted. All models were compared considering the R², RMSE, and MAE.
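The two selection rules described above can be sketched independently of any particular RF implementation. The Python sketch below is one possible reading, not the authors' procedure: `elbow_ntree` interprets "before stabilization" as the first ntree after which the relative error reduction falls below a tolerance (the 1% tolerance is an assumption), and `best_mtry` picks the candidate with the lowest RMSE. All error values are hypothetical.

```python
def elbow_ntree(error_by_ntree, tolerance=0.01):
    """Return the smallest ntree after which the error gain is below tolerance."""
    ntrees = sorted(error_by_ntree)
    for current, following in zip(ntrees, ntrees[1:]):
        current_error = error_by_ntree[current]
        if current_error == 0:
            return current  # the error cannot improve any further
        relative_gain = (current_error - error_by_ntree[following]) / current_error
        if relative_gain < tolerance:
            return current
    return ntrees[-1]  # no stabilization found within the tested range


def best_mtry(rmse_by_mtry):
    """Return the mtry candidate with the lowest RMSE."""
    return min(rmse_by_mtry, key=rmse_by_mtry.get)


# Hypothetical error curve (ntree -> prediction error) and mtry search results.
error_curve = {25: 6.0, 50: 5.2, 100: 4.8, 200: 4.79, 500: 4.78}
mtry_results = {2: 5.3, 12: 4.9, 22: 4.7, 45: 4.8}

print(elbow_ntree(error_curve), best_mtry(mtry_results))
```

In practice the two dictionaries would be filled by fitting the forest at each candidate setting and recording its error on held-out or out-of-bag data.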

Yield Data and Statistical Analyses
The descriptive statistics of the original and filtered yield data are presented in Table 4. Original data ranged from 0.86 to 501.23 Mg ha⁻¹ for the 2018/2019 crop season. Despite the exclusion of more than 50% of the original data, which was expected due to the local and global filtering [44], the number of points per hectare was 255, thus meeting the requirement of at least 23 samples ha⁻¹ according to the calculated grid (Table 5). Table 5 summarizes the best-adjusted variogram model used for interpolating yield data, with an RMSE of 0.53 and 1.41 Mg ha⁻¹ for the 2018/2019 and 2019/2020 sugarcane growing seasons, respectively. An exploratory analysis of the dataset from both crop seasons indicated that the linear correlation between VIs and sugarcane yield did not present values greater than 0.50. Such results negatively influence the MLR prediction accuracy since there is no strong linear relationship between dependent and independent variables. That is the main reason for combining multiple imagery dates (time-series) and RF regression as an alternative to perform yield prediction.

Selection of Predictor Variables
For the model developed from spectral bands using MLR, 45 variables were originally considered (five spectral bands times nine dates). As a result of the variable selection considering the p-value < 0.05, the fitted model relied on 22 predictor variables. Among the evaluated VIs, the variables were selected according to the crop stage and the corresponding p-value. For example, the predictor variable "NDVI_R2," relating to the ripening crop stage, was removed for not being significant to the MLR. The most important variables for predicting sugarcane yield with RF regression (Figure 3), for each group of variables, were the NIR spectral band (B8) and NDRE at the "T2" crop stage, NDVI and WDRVI at the "T3" crop stage (all relating to the tillering crop stage), and GNDVI at "R1," relating to the ripening crop stage. For the spectral bands, Figure 3A shows that RF regression is most sensitive to three predictor variables (T2_B8, T2_B5, and R1_B8). This also suggests that the green (B3) and red (B4) spectral bands were less relevant for modeling the sugarcane yield. Li et al. [50] suggested the use of the red-edge spectral band as an alternative to reduce the influence of the soil background at the early stages of wheat, as well as saturation at later phenological stages of the crop. A similar approach was proposed by Cui et al. [51] and Sun et al. [52].

Accuracy Assessment
Table 6 summarizes the relative error and the accuracy of the predictive models (MLR and RF) for the different predictor variables and datasets. RF regression showed better accuracy, according to the RMSE and R² values, for the testing dataset. Considering the test dataset, the RF regression had RMSE values ranging from 4.63 to 5.47 Mg ha⁻¹, which were lower than those of the MLR model (RMSE closer to 6.0 Mg ha⁻¹). Other studies reported RF prediction as more accurate than MLR [2,53].
Generally, all VIs showed a similar level of accuracy for predicting sugarcane yield considering the test dataset, with an RMSE of about 5.40 Mg ha⁻¹ (Table 6). The spectral bands were more effective (the highest R² and the lowest RMSE) in estimating sugarcane yield for both (RF and MLR) models. In other words, better results were found when using the spectral bands directly rather than calculating VIs (Table 6). This result has implications for the computational processing of the orbital images, because using the spectral bands directly is less time-consuming than normalizing the reflectance values into indices. Figure 4 shows the comparison between the observed yield map of the 2018/2019 crop season (Figure 4A) and the predicted yield maps from spectral bands or using the NDRE (the best performing VI according to Table 6), based on either RF regression or MLR. These maps corroborate the previous results regarding the better performance of spectral bands and RF. Figure 4D,E represent the MLR prediction considering NDRE and spectral bands, respectively, which showed less similarity with the real yield map (Figure 4A). Similar results were found for the 2019/2020 season (Figure 5).
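The three metrics used to compare the models can be computed from observed and predicted yields as sketched below. This is a Python illustration with hypothetical values. It assumes R² is the coefficient of determination, 1 − SS_res/SS_tot; the text does not state which definition was used, and the squared Pearson correlation is a common alternative.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when observed values are constant")
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted yields (Mg per hectare) for three pixels.
observed = [80.0, 90.0, 100.0]
predicted = [82.0, 88.0, 101.0]
print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

RMSE and MAE are in the units of yield, so they can be read directly against the 4.63 to 6.0 Mg ha⁻¹ range reported above, while R² is unitless.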

Discussion
This study presents a first approach to explore a multi-temporal satellite-based method to determine sugarcane yield at the field level. It was possible to use 54 orbital images that were merged according to their phenological stage, as shown in Table 3, generating nine new variables. These variables had values of five spectral bands and four VIs with a spatial resolution of 10 m × 10 m, and they composed the data used to assess the potential of the MLR and RF regression methods to estimate sugarcane yield. In addition, the type of input predictors for both models was evaluated to fit the predictive models using high-resolution data. The study demonstrates the potential of using spectral bands as an alternative approach to the common crop yield forecast based on VIs [23,54-56]. The spatial variability of the fields was mapped for the range of yield data values across all fields over the sugarcane growing seasons. The association of high-resolution imagery data and the ML technique showed satisfactory results for estimating and mapping the sugarcane yield, which can be applied to support PA practices and to provide a better understanding of the within-field spatial variability over the season.
The time-series analysis of orbital images enabled the development of a methodology based on the ML technique to advance the understanding of the spatio-temporal variability of sugarcane fields. It helps to improve the decision-making process, as imagery data can provide relevant information regarding yield potential over the season. Previously, Dubey et al. [56] developed a time-series analysis for VIs to forecast sugarcane yield based on empirical models with RMSE values ranging from 7.0 to 17.0 Mg ha⁻¹, but such an analysis was conducted at the district level.
In this study, the fact that individual correlations between predicting and target variables were low, combined with the clear autocorrelation in predicting variables, supports the use of a multivariate input (time-series) through a robust ML algorithm (the RF). It was expected that linear approaches (MLR) that predict yield based on VIs would present worse accuracy performance than non-linear models (e.g., RF regression). Zhao et al. [57] applied MLR with spectral bands and simple linear regression with NDVI to predict sugarcane yield. They found that MLR did not improve accuracy when compared to the simple linear regression. Thus, they proceeded using the simple linear regression and found R² values ranging from 0.22 to 0.41. Comparing their result with this study, the MLR presented lower R² than those found in Zhao et al. [57], even using a greater number of predictor variables (22 instead of 8). Despite the higher R² found in the MLR model, the RF regression is the model that outperformed the linear approach, highlighting its suitability to be used to support yield prediction models based on orbital images.
NDVI, GNDVI, NDRE, and WDRVI all presented similar accuracy for estimating the sugarcane yield. These results, except for NDRE, corroborate those of Zhao et al. [57], who reported no significant difference among GNDVI, NDVI, and WDRVI for predicting sugarcane crop growth and yield. However, their approach was based on linear regression models only, whereas this study tested both linear and non-linear models and found that the non-linear (RF regression) model performed better (higher R², lower RMSE and MAE), independent of the type of predictor variables. Fine-tuning of the model should include environmental variables and agricultural management practices because they can potentially improve the accuracy of predictive models. Additionally, this study considered a single site over two years, while the ideal condition to refine the modeling should involve multiple seasons and several commercial fields.
Future research on RS tools for crop monitoring should also aim at near real-time diagnostics of crop status to support local interventions and optimize agricultural production. The advantages of RS applications for agricultural systems include a closer integration of machinery operation with field conditions, which can be assessed with greater accuracy than with traditional methods (low-scale and labor-intensive).

Conclusions
Integrating multi-temporal imagery data from Sentinel-2 and the RF regression method enabled the development of predictive yield models for commercial sugarcane fields. The use of spectral bands outperformed derived VIs. In addition, the RF regression showed greater accuracy (lower RMSE and higher R²) when compared with the MLR. Overall, spatial patterns were successfully verified at the field level using high-resolution RS data with time-series analysis and the ML technique.