Sugarcane Yield Mapping Using High-Resolution Imagery Data and Machine Learning Technique

Canata, Tatiana Fernanda; Wei, Marcelo Chan Fu; Maldaner, Leonardo Felipe; Molin, José Paulo

doi:10.3390/rs13020232


by Tatiana Fernanda Canata *, Marcelo Chan Fu Wei, Leonardo Felipe Maldaner and José Paulo Molin

Department of Biosystems Engineering, “Luiz de Queiroz” College of Agriculture (ESALQ), University of Sao Paulo (USP), 11 Padua Dias Avenue, Piracicaba 13418-900, Brazil

* Author to whom correspondence should be addressed.

Remote Sens. 2021, 13(2), 232; https://doi.org/10.3390/rs13020232

Submission received: 15 December 2020 / Revised: 7 January 2021 / Accepted: 8 January 2021 / Published: 12 January 2021

(This article belongs to the Special Issue Digital Agriculture with Remote Sensing)


Abstract

Yield maps provide essential information to guide precision agriculture (PA) practices. Yet, on-board yield monitoring for sugarcane can be challenging. At the same time, orbital images have been widely used for indirect yield estimation for many crops, such as wheat, corn, and rice, but not for sugarcane. Therefore, the objective of this study is to explore the potential of multi-temporal imagery data as an alternative for sugarcane yield mapping. The study was based on developing predictive sugarcane yield models integrating time-series orbital imaging and a machine learning technique. A commercial sugarcane site was selected, and Sentinel-2 images were acquired from the beginning of the ratoon sprouting until harvesting for two consecutive cropping seasons. The predictive yield models, Random Forest (RF) and Multiple Linear Regression (MLR), were developed using orbital images and yield maps generated by a commercial sensor-system during harvesting. Original yield data were filtered and interpolated to the same spatial resolution as the orbital images. The entire dataset was divided into training and testing datasets. Spectral bands, especially the near-infrared at the tillering crop stage, contributed more to predicting sugarcane yield than the derived spectral vegetation indices. The Root Mean Squared Error (RMSE) obtained for the RF regression based on multiple spectral bands was 4.63 Mg ha⁻¹ with an R² of 0.70 for the testing dataset. Overall, the RF regression performed better than the MLR in predicting sugarcane yield.

Keywords: orbital images; precision agriculture; remote sensing; vegetation index


1. Introduction

Remote sensing (RS) is a potential source of data for site-specific crop monitoring, providing spatial and temporal information. Orbital images are commonly used in agriculture to identify spectral variations resulting from soil and crop characteristics at a large scale, supporting diagnostics for agronomical crop parameters and helping farmers to make better management decisions. For example, over the years, orbital images have been used to delimit management zones for annual crops [1], monitor within-field yield variability for crops such as corn [2] and cotton [3], map vineyard variability [4], plan the wheat harvest [5], develop crop growth models [6], and map grassland biomass [7,8], among others. Some of the main limitations related to orbital images are the lack of ground truth data (calibration) and the measurement accuracy of the agronomical variables [9]. Furthermore, the empirical models to predict agronomical parameters based on spectral information may have spatial and temporal restrictions for application across different fields and seasons [10,11,12,13].

For the sugarcane crop, the assessment of spatial variability is challenging because few solutions have been adapted to it, mainly for yield mapping. Yield maps are essential to better understand the within-field variability, to delimit management zones, and to improve site-specific management strategies [10,14,15]. Sugarcane yield maps are usually obtained from data collected directly by monitors on harvesters, which present some limitations, such as the required calibration, the accessibility of data processing for farmers or users, and a lack of knowledge to manage data from more than one harvester [16]. One additional challenge for sugarcane yield mapping based on yield monitors, compared to grains, is the high-resolution data (due to the slow traveling speed of the harvester and narrow row spacing) combined with high biomass variability and noise from the yield monitor system. These limitations have driven interest in RS techniques to monitor sugarcane yield.

Several studies reported the strength of linear correlation between image-based vegetation indices (VIs) and sugarcane yield forecasts [17,18,19], but none of these used yield data generated by harvesters at the field level. No technique is widely adopted yet for generating sugarcane yield maps from imagery information, and the current practice of relating biomass estimated by VIs to stalk yield is not consistent regarding spatial and temporal resolutions. Other strategies for sugarcane yield estimation have relied on mathematical models for regional scales that consider meteorological and crop management variables at a low spatial resolution [20], which is insufficient for precision agriculture (PA) purposes. The sugarcane crop plays an important economic role in Brazilian agribusiness, with about 9 M ha, especially in São Paulo state, which accounts for 53.70% of national sugarcane production (Agricultural Economics Institute, IEA, 2019) and is where this study was developed.

One of the initial research studies regarding spectral patterns of reflectance for sugarcane was conducted by Simões et al. [21], who found strong correlations between the red spectral band (648 nm) and production parameters (leaf area index, LAI, and total sugarcane biomass) until the end of the vegetative stage. Abdel-Rahman et al. [22] listed different applications of RS techniques for sugarcane, such as disease detection, crop health status, and nutrition scouting by identifying patterns in spectral data throughout its phenological stages and orbital images. Lisboa et al. [23] found a relationship between the Normalized Difference Vegetation Index (NDVI) and the concentration of leaf-tissue nutrients when monitoring sugarcane yield, and they observed changes in canopy light reflectance according to the concentration of leaf-tissue nutrients. Rahman and Robson [24] developed a time-series approach using orbital images from the Sentinel-2 satellite to estimate sugarcane yield at the individual block level. Their results showed that the maximum Green Normalized Difference Vegetation Index (GNDVI) values across the block were most correlated (r = 0.93) with the average actual yield reported by the sugar mill. Abdel-Rahman et al. [25] investigated Random Forest (RF) predictive models considering VIs from Landsat TM to study sugarcane yield under rainfed and irrigated conditions for two sugarcane varieties. The results suggested a strong relationship for predicting sugarcane yield, mainly under irrigated conditions. Despite the satisfactory prediction accuracy, the field data collection was performed using manual harvesting (average yield values per plot). As such, site-specific yield estimation is still a gap to bridge in research.

Another common feature of previous research is the reliance on a single image at a given crop stage to investigate the relationship between spectral VIs and yield. Given the relatively high temporal resolution of available imagery (for example, the revisit frequency of the Sentinel-2 constellation is five days), and the potential of novel analytic tools to handle many predictor variables (see below for commentary on Machine Learning techniques), a time-series approach for sugarcane yield prediction is worth investigating. Machine Learning (ML) techniques have been shown to provide higher prediction accuracy than traditional statistical analyses, as well as to identify dataset patterns [26,27,28]. These models are commonly subjected to training and testing processes, depending on the complexity of the dataset. The potential of ML techniques, such as the RF and ANN (Artificial Neural Network), for agronomical data was verified by Yuan et al. [29] to estimate the LAI of soybeans. The authors concluded that RF, which is a non-parametric method, provides more accurate estimates when sample plots and variation are relatively large. Yue et al. [30] and Han et al. [31] also reported the application of RF and ANN to assess the above-ground biomass of wheat and maize, respectively. These studies verified that the RF algorithm provided better results than other ML techniques. Schwalbert et al. [32] evaluated the contribution of weather variables to estimate corn yield based on RS data and the RF algorithm, which resulted in a mean absolute error (MAE) of about 0.89 Mg ha⁻¹.

Thus, considering the previous experiences, the purpose of this investigation was to assess spectral bands and derived-VIs from orbital images to develop predictive sugarcane yield models based on RF and MLR (Multiple Linear Regression) algorithms. The specific objectives were (i) to develop a yield prediction model based on time-series analysis of Sentinel-2 images and (ii) to compare the accuracy of MLR and RF regression methods.

2. Materials and Methods

2.1. Study Site

The study used a commercial site of 56.4 ha (Figure 1), composed of four fields, located in the municipality of Botucatu, São Paulo, Brazil (22°41′42.8″ S; 48°16′54.0″ W; 480 m) during the 2018/2019 and 2019/2020 sugarcane growing seasons. The climate in the study region is mesothermal, Cwa (humid subtropical), which includes drought in the winter (from June to September) and rain from November to April. The average annual rainfall in the municipality is 1433 mm. The relative air humidity is 71%, with an annual average temperature of 23 °C. All fields of the study site had the same variety of sugarcane, SP83-2847 (4th ratoon), planted with a row spacing of 1.5 m in Argisol [33], and the crop was mechanically harvested in the late season (wet time of the year, October and November). The previous harvesting dates (10/14/2017 and 11/11/2018) were used as the reference to determine the days after cutting (DAC) associated with the date of each orbital image and related to the phenological crop stage.

2.2. Imagery Data

Fifty-four orbital images from Sentinel-2 were used, covering the 2018/2019 and 2019/2020 sugarcane growing seasons. This satellite constellation uses a multi-spectral instrument with 13 spectral channels and a 290-km swath width. All images were downloaded from the United States Geological Survey (USGS) via the Earth Explorer (earthexplorer.usgs.gov). Images with low cloud cover (<1%) were selected and clipped to the region of interest. All images were projected to WGS 84/UTM zone 22S. An internal buffer of 5 m was applied to each field boundary using the Quantum Geographic Information System (QGIS 2.18.26) [34] to ensure that the spectral data corresponded to the sugarcane plants. The original orbital images were submitted to atmospheric correction before each VI calculation using the free open source plugin Semi-Automatic Classification 5.4.2 [35] on QGIS 2.18.26 [34]. This was done to convert the digital numbers to reflectance data, considering the DOS1 (Dark Object Subtraction) atmospheric correction methodology [36]. Five spectral bands from Sentinel-2 were considered in this study (Table 1). The VIs most commonly cited in the literature for crop yield monitoring were calculated. Four VIs were evaluated (Table 2) throughout the sugarcane development cycle: the NDVI [37], the Normalized Difference Red-Edge Index (NDRE) [38], the GNDVI [39], and the Wide Dynamic Range Vegetation Index (WDRVI) [40]. As suggested by Abrahão et al. [41] and Maresma et al. [42], the value 0.1 was adopted for the weighting coefficient (a) of the WDRVI.
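
As an illustration of how the four VIs can be derived from reflectance, the following minimal Python sketch uses the commonly cited definitions of NDVI, NDRE, GNDVI, and WDRVI from the literature. It is not the authors' processing chain (which was run in QGIS), and the reflectance values in the example are hypothetical; the only value taken from the text is the weighting coefficient a = 0.1.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), the form shared by NDVI, NDRE and GNDVI."""
    return (a - b) / (a + b)


def vegetation_indices(green: float, red: float, red_edge: float,
                       nir: float, alpha: float = 0.1) -> dict:
    """Compute the four VIs for one pixel from surface reflectance (0 to 1).

    alpha is the WDRVI weighting coefficient (0.1 in the text).
    """
    return {
        "NDVI": normalized_difference(nir, red),
        "NDRE": normalized_difference(nir, red_edge),
        "GNDVI": normalized_difference(nir, green),
        # WDRVI down-weights the NIR term to delay saturation at high biomass.
        "WDRVI": (alpha * nir - red) / (alpha * nir + red),
    }


# Hypothetical reflectance values for a single vegetated pixel.
pixel = vegetation_indices(green=0.08, red=0.05, red_edge=0.15, nir=0.45)
for name, value in pixel.items():
    print(f"{name}: {value:.3f}")
```

The same function could be applied element-wise to whole raster bands; a per-pixel form is used here only to keep the sketch dependency-free.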

Table 3 summarizes the dates of orbital images taken and the respective phenological stage of sugarcane according to Matsuoka and Stolf [43]. Each phenological stage was identified according to the DAC to standardize the variables for data processing of both seasons. No images for the initial growing stage were found without cloud cover. All images with low cloud cover (<1%) collected during the sugarcane development cycle were considered for evaluating the potential of free orbital images to assess crop yield.
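
The DAC used to align both seasons is simply the number of days between the previous harvest and the image acquisition. A minimal sketch is given below; the harvest date comes from Section 2.1, whereas the image date (and in particular its year) is an assumption made for illustration only.

```python
from datetime import date


def days_after_cutting(previous_harvest: date, image_date: date) -> int:
    """Number of days elapsed between the previous harvest and the image."""
    return (image_date - previous_harvest).days


# Previous harvest of the first season (10/14/2017, from the text).
harvest = date(2017, 10, 14)
# Assumed acquisition date of one image, used only as an example.
image = date(2018, 8, 13)

print(days_after_cutting(harvest, image))
```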

2.3. Yield Data and Predictive Models

The sugarcane yield maps used as a reference for modeling were generated for both sugarcane growing seasons by an on-board commercial sensor-system (Solinftec, Araçatuba, São Paulo, Brazil) that measures the difference in hydraulic pressure of the harvester chopper, calibrated to the real total yield. A Global Navigation Satellite System (GNSS) receiver with Real-Time Kinematic (RTK) differential correction signal (GPS L1/L2 + Glonass) was installed on the harvesters for georeferencing the data every five seconds (0.2 Hz). The base station was fixed at a georeferenced point at the sugar mill, which is commonly used for other agricultural operations. Two harvesters generated the data for each season with an average speed of 1.6 m s⁻¹. Each machine cuts one row at a time, following an auto-steering file previously available from mechanical planting.
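
The logging interval, the average speed, and the row spacing together give a rough idea of the raw point density. The arithmetic below is only an approximation under stated assumptions (constant speed, every row logged exactly once, no gaps or overlaps) and is not a figure reported by the authors.

```python
# Values taken from the text.
speed_m_s = 1.6          # average harvester speed
log_interval_s = 5.0     # one georeferenced record every five seconds
row_spacing_m = 1.5      # one row harvested per pass

# Assumption: constant speed, so consecutive records are evenly spaced.
point_spacing_m = speed_m_s * log_interval_s
area_per_point_m2 = point_spacing_m * row_spacing_m
raw_points_per_ha = 10_000 / area_per_point_m2

print(f"spacing along the row: {point_spacing_m:.1f} m")
print(f"approximate raw density: {raw_points_per_ha:.0f} points per ha")
```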

The original data were converted to sugarcane yield based on the weight distribution of the haulage and filtered according to the methodology described by Maldaner and Molin [44]. The removal of a significant number of points is expected because of the errors in the yield data associated with the sugarcane flow stabilization time and the elevator time [44]. Although filtering yield data is a common practice in PA for any crop, the high spatial density and noise of yield monitor data make adequate filtering especially important for sugarcane, and a significant amount of raw data may be deleted as a result [44]. The semivariogram and the interpolation of the data were carried out using Vesper 1.6 software [45] with the ordinary kriging method (spatial resolution of 10.0 m × 10.0 m). The input parameters for the variogram calculation of both datasets were 30 lags, 50% lag tolerance, and a maximum distance of 200 m, which were considered the best adjustment parameters for semivariogram calculation.
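
To make the variogram settings concrete, the sketch below computes an empirical semivariogram with the same three parameters (number of lags, lag tolerance, maximum distance). It is a naive pairwise implementation written for illustration; the way Vesper defines lag tolerance internally may differ, and the tolerance convention used here (a fraction of the lag width around each lag centre) is an assumption.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, n_lags=30, max_dist=200.0, tolerance=0.5):
    """Return {lag_centre_m: semivariance} for points given as (x, y, z).

    Lag centres are multiples of max_dist / n_lags. A pair contributes to a
    lag when its separation lies within tolerance * lag_width of the centre.
    """
    lag_width = max_dist / n_lags
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            d = math.hypot(xi - xj, yi - yj)
            k = round(d / lag_width)  # index of the nearest lag centre
            if k < 1 or k > n_lags:
                continue
            if abs(d - k * lag_width) > tolerance * lag_width:
                continue
            sums[k] += 0.5 * (zi - zj) ** 2
            counts[k] += 1
    return {k * lag_width: sums[k] / counts[k] for k in sorted(counts)}


# Hypothetical transect: yield increases by 2 Mg/ha for every metre travelled.
transect = [(0.0, 0.0, 0.0), (10.0, 0.0, 20.0), (20.0, 0.0, 40.0), (30.0, 0.0, 60.0)]
print(empirical_semivariogram(transect, n_lags=4, max_dist=40.0))
```

A theoretical model (for example spherical or exponential) would then be fitted to these values before kriging; that step is omitted here.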

Because the availability of orbital images was uneven from year to year, the average value of the orbital imagery data that composed each phenological stage was used. For example, in season 2018/2019, the value assigned to the ripening stage 1 (R1) was the average value of the images from “08/13,” “08/18,” “08/23,” and “08/28.” Thus, as the four fields present the same predictor variables (imagery data), the datasets from both seasons were grouped. For developing the predictive yield models, the spectral bands, VIs, and sugarcane yield data from each field were combined into a single dataset. The observations in this dataset were randomly divided into the training (2/3) and testing (1/3) datasets, considering both cropping seasons.
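
The two steps described above, averaging the images of each phenological stage and randomly splitting the observations, can be sketched as follows. The reflectance values are hypothetical, and Python's random generator is used purely for illustration; it does not reproduce the partition obtained in R with the same seed.

```python
import random
from statistics import mean


def average_by_stage(image_values: dict, stage_of_date: dict) -> dict:
    """Average per-date values of one pixel into one value per stage."""
    grouped = {}
    for label, value in image_values.items():
        grouped.setdefault(stage_of_date[label], []).append(value)
    return {stage: mean(values) for stage, values in grouped.items()}


def split_train_test(rows, train_fraction=2 / 3, seed=123):
    """Shuffle the rows reproducibly and split them into train and test."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical NIR reflectance of one pixel on the four R1 image dates.
nir = {"08/13": 0.40, "08/18": 0.42, "08/23": 0.44, "08/28": 0.46}
stage = {label: "R1" for label in nir}
print(average_by_stage(nir, stage))

train, test = split_train_test(range(9))
print(len(train), len(test))
```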

All statistical analyses and yield prediction models were performed using the R 3.5.5 software [46], with the built-in function “lm” for the MLR model and the function “randomForest” from the randomForest package [47] for RF regression. To make the results reproducible, the function “set.seed” was called with the value 123 before fitting the predictive models.

The predictive yield model based on MLR was fitted for each type of predictor variables (spectral bands, GNDVI, NDVI, NDRE, and WDRVI). Aiming to improve MLR prediction accuracy, the variable selection was carried out by relying on the p-value < 0.05, and that process was conducted until all the predictor variables that had a p-value > 0.05 were eliminated. To find the p-value, the built-in function “summary” from R 3.5.5 software [46] was used. An additional exploratory analysis was the spatial correlation between the VIs for each period of the sugarcane development cycle and yield data using Pearson’s correlation coefficient (r). The synthesis of the methodology is shown in Figure 2.
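
The p-value-based variable selection can be outlined as a backward elimination loop. The sketch below is a generic Python outline and not the authors' R script: `fit_pvalues` is a hypothetical callback standing in for refitting the MLR and reading the p-values from its summary, the removal of one variable per iteration is an assumption (the text does not state it), and the p-values in the stub are invented.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    predictors: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    threshold: float = 0.05,
) -> List[str]:
    """Refit and drop the least significant predictor until all pass."""
    selected = list(predictors)
    while selected:
        pvalues = fit_pvalues(selected)  # refit with the current subset
        worst = max(selected, key=lambda name: pvalues[name])
        if pvalues[worst] <= threshold:
            break  # every remaining predictor is significant
        selected.remove(worst)
    return selected


def stub_fit(names: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for an MLR fit, returning invented p-values."""
    table = {"T2_B8": 0.001, "R1_B8": 0.010, "NDVI_R2": 0.400, "T1_B3": 0.200}
    return {name: table[name] for name in names}


kept = backward_eliminate(["T2_B8", "R1_B8", "NDVI_R2", "T1_B3"], stub_fit)
print(kept)
```

In a real fit the p-values change after each removal, which is why the model is refitted inside the loop rather than filtered once.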

For the RF regression predictions, it was necessary to fine-tune two parameters: (a) ntree, the number of trees whose results are averaged, and (b) mtry, the number of predictors sampled as candidates at each split. To find the best ntree value, the prediction error rate for a range of 0 to 500 trees was calculated, and the value that presented the lowest error before stabilization was selected (“elbow rule”) [48]. For this study, an ntree of 100 was chosen for both the spectral bands and VIs models. For mtry, values from 2 to 45 (the maximum number of predictor variables, excluding those whose variance was close to zero) were tested for the spectral bands model, with 22 being used. For the VIs models, values from 2 to 9 were tested. In both cases, the selected mtry was the one that presented the lowest RMSE. For both models (RF and MLR), the independent variables were the reflectance values of the spectral bands (totaling 45 spectral band variables, 5 bands times 9 dates) and the derived VIs (totaling 36 VI variables, 4 VIs times 9 dates). The observed sugarcane yield (measured by the on-board sensor-system) was the dependent variable to be predicted. All models were compared considering the R², RMSE, and MAE.
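
For reference, the three comparison metrics can be written in a few lines. This is a minimal sketch with hypothetical yield values; R² is computed here as 1 − SS_res/SS_tot, which is an assumption, since the text does not state which definition was used (some tools report the squared correlation instead).

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean absolute error, in the units of the observations."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted yields (Mg/ha) for three pixels.
obs = [60.0, 70.0, 80.0]
pred = [62.0, 69.0, 77.0]
print(f"RMSE={rmse(obs, pred):.2f}  MAE={mae(obs, pred):.2f}  R2={r_squared(obs, pred):.2f}")
```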

3. Results

3.1. Yield Data and Statistical Analyses

The descriptive statistics of the original and filtered yield data are presented in Table 4. Original data ranged from 0.86 to 501.23 Mg ha⁻¹ for the 2018/2019 crop season, and from 10.39 to 498.21 Mg ha⁻¹ for the 2019/2020 crop season. The coefficient of variation (CV) of the filtered data was 11% for the 2018/2019 crop season and 13% for the subsequent season. These results highlight the need to exclude discrepant values from the original data to better understand the actual spatial variability of the fields. To that end, data filtering was carried out, which decreased the standard deviation (SD) and the average yield. As a result of the filtering process, the minimum and maximum values of sugarcane yield for the 2018/2019 crop season were 36 and 86.4 Mg ha⁻¹, respectively, which are more realistic. The mean yield for the 2019/2020 crop season was about 71 Mg ha⁻¹. The SD of the filtered yield data was 7.06 Mg ha⁻¹ for the 2018/2019 crop season and 9.55 Mg ha⁻¹ for the 2019/2020 crop season.
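
The reported CV values can be cross-checked against the SD and mean given above, since CV = 100 × SD / mean. The short sketch below uses only numbers from the text; the mean yield for 2018/2019 is not stated in this section, so the value derived here is an inference from the rounded CV, not a reported figure.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Coefficient of variation in percent."""
    return 100.0 * sd / mean


# 2019/2020: SD of 9.55 Mg/ha and a mean of about 71 Mg/ha (from the text).
cv_2019_2020 = coefficient_of_variation(sd=9.55, mean=71.0)

# 2018/2019: SD of 7.06 Mg/ha and a CV of 11 % imply a mean of roughly
# 64 Mg/ha. This is inferred from rounded values, so it is approximate.
implied_mean_2018_2019 = 7.06 / 0.11

print(f"CV 2019/2020: {cv_2019_2020:.1f} %")
print(f"implied mean 2018/2019: {implied_mean_2018_2019:.0f} Mg/ha")
```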

Despite the exclusion of more than 50% of the original data, which was expected due to the local and global filtering [44], the number of points per hectare was 255, thus meeting the requirement of at least 23 samples ha⁻¹ according to the calculated grid (Table 5). Table 5 summarizes the best-adjusted variogram model used for interpolating yield data, with an RMSE of 0.53 and 1.41 Mg ha⁻¹ for the 2018/2019 and 2019/2020 sugarcane growing seasons, respectively.

An exploratory analysis of the dataset from both crop seasons indicated that the linear correlation between VIs and sugarcane yield did not present values greater than 0.50. Such results will negatively influence the MLR prediction accuracy since there is no strong linear relationship between dependent and independent variables. That is the main reason for combining multiple imagery dates (time-series) and RF regression as an alternative to perform yield prediction.
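
Pearson's correlation coefficient, used in this exploratory analysis, can be computed as in the sketch below. The paired values are hypothetical and serve only to show the calculation between a VI at one crop stage and the interpolated yield at the same pixels.

```python
import math


def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired samples: a VI value and a yield value per pixel.
vi = [1.0, 2.0, 3.0, 4.0, 5.0]
yield_mg_ha = [2.0, 1.0, 4.0, 3.0, 5.0]
print(round(pearson_r(vi, yield_mg_ha), 2))
```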

3.2. Selection of Predictor Variables

For the model developed from spectral bands using MLR, 45 variables were originally considered (five spectral bands times nine dates). After variable selection based on a p-value < 0.05, the fitted model relied on 22 predictor variables. Among the evaluated VIs, the variables were selected according to the crop stage and the corresponding p-value. For example, the predictor variable “NDVI_R2,” relating to the ripening crop stage, was removed for not being significant to the MLR. The most important variables for predicting sugarcane yield with RF regression (Figure 3), for each group of variables, were the NIR spectral band (B8) and NDRE at the “T2” crop stage and NDVI and WDRVI at the “T3” crop stage, all relating to the tillering crop stage, and GNDVI at “R1”, relating to the ripening crop stage. For the spectral bands, Figure 3A shows that RF regression is most sensitive to three predictor variables (T2_B8, T2_B5, and R1_B8). This also suggests that the green (B3) and red (B4) spectral bands were less relevant for modeling the sugarcane yield. Li et al. [50] suggested the use of the red-edge spectral band as an alternative to reduce the influence of the soil background at the early stages of wheat, as well as saturation at later phenological stages of the crop. A similar approach was proposed by Cui et al. [51] and Sun et al. [52].

3.3. Accuracy Assessment

Table 6 summarizes the relative error and the accuracy of the predictive models (MLR and RF) for different predictor variables and dataset. RF regression showed a better accuracy, according to the RMSE and R² values, for the testing dataset. Considering the test dataset, the RF regression had RMSE values ranging from 4.63 to 5.47 Mg ha⁻¹, which was lower than the MLR model (RMSE closer to 6.0 Mg ha⁻¹). Other studies reported RF prediction as more accurate than MLR [2,53].

Generally, all VIs showed a similar level of accuracy for predicting sugarcane yield considering the test dataset with an RMSE of about 5.40 Mg ha⁻¹ (Table 6). The spectral bands were more effective (the highest R² and the lowest RMSE) in estimating sugarcane yield for both (RF and MLR) models. In other words, better results were found when using the spectral bands directly rather than calculating VIs (Table 6). This result should influence the computational processing of the orbital images because it is less time-consuming to determine spectral bands when compared to the normalization of the reflectance values.

Figure 4 shows the comparison between the observed yield map of the 2018/2019 crop season (Figure 4A) and the predicted yield maps from spectral bands or from the NDRE (the best performing VI according to Table 6), based on either RF regression or MLR. These maps corroborate the previous results regarding the better performance of spectral bands and RF. Figure 4D,E represent the MLR predictions considering NDRE and spectral bands, respectively, which showed less similarity with the real yield map (Figure 4A). Similar results were found for the 2019/2020 season (Figure 5).

4. Discussion

This study presents a first approach to exploring a multi-temporal satellite-based method to determine sugarcane yield at the field level. It was possible to use 54 orbital images that were merged according to their phenological stage, as shown in Table 3, generating nine new variables. These variables held the values of five spectral bands and four VIs with a spatial resolution of 10 m × 10 m and composed the data used to infer the potential of the MLR and RF regression methods to estimate sugarcane yield. In addition, the type of input predictors for both models was evaluated to fit the predictive models using high-resolution data. The study demonstrates the potential of using spectral bands as an alternative approach to the common crop yield forecast based on VIs [23,54,55,56]. The spatial variability of the fields was mapped for the range of yield data values across all fields over the sugarcane growing seasons. The association of high-resolution imagery data and an ML technique showed satisfactory results for estimating and mapping the sugarcane yield, which can be applied to support PA practices and to provide a better understanding of the within-field spatial variability over the season.

The time-series analysis of orbital images enabled the development of a methodology based on the ML technique to advance on understanding the spatio-temporal variability of sugarcane fields. It helps to improve the decision-making process, as imagery data can provide relevant information regarding yield potential over the season. Previously, Dubey et al. [56] developed a time-series analysis for VIs to forecast sugarcane yield based on empirical models with RMSE values ranging from 7.0 to 17.0 Mg ha⁻¹, but such an analysis was conducted at the district level.

In this study, the fact that individual correlations between predictor and target variables were low, combined with the clear autocorrelation in the predictor variables, supports the use of a multivariate input (time-series) through a robust ML algorithm (the RF). It was expected that linear approaches (MLR) that predict yield based on VIs would be less accurate than non-linear models (e.g., RF regression). Zhao et al. [57] applied MLR with spectral bands and simple linear regression with NDVI to predict sugarcane yield. They found that MLR did not improve accuracy when compared to the simple linear regression. Thus, they proceeded with the simple linear regression and found R² values ranging from 0.22 to 0.41. In comparison, the MLR in this study presented a higher R² than those found by Zhao et al. [57], using a greater number of predictor variables (22 instead of 8). Despite the higher R² found for the MLR model, the RF regression outperformed the linear approach, highlighting its suitability for supporting yield prediction models based on orbital images.

Comparing NDVI, GNDVI, NDRE, and WDRVI, they all presented similar accuracy results for estimating the sugarcane yield. These results, except for NDRE, corroborate the results in Zhao et al. [57] that reported no significant difference among the use of GNDVI, NDVI, and WDRVI for predicting sugarcane crop growth and yield. However, their approach based on linear regression models was different from the one in this study that tested both linear and non-linear models, as it was noted that the non-linear (RF regression) model performed better (higher R², lower RMSE, and MAE), independent of the type of predictor variables. Fine tuning of the model should include environmental variables and agricultural management practices because they can potentially improve the accuracy of predictive models. Additionally, this study considered a two-year single site study, while the ideal condition to refine the modeling should involve multiple seasons and several commercial fields.

Future research on RS tools for crop monitoring should also aim at near real-time diagnostics of crop status to support local interventions and optimize agricultural production. The advantages of RS applications for agricultural systems involve advancing on integrating machinery operation, according to the field conditions, which can be assessed with greater accuracy than traditional methods (low-scale and labor-intensive).

5. Conclusions

Integrating multi-temporal imagery data from Sentinel-2 and the RF regression method enabled the development of predictive yield models for commercial sugarcane fields. The use of spectral bands outperformed derived VIs. In addition, the RF regression showed greater accuracy (lower RMSE and higher R²) than the MLR. Overall, spatial patterns were successfully verified at the field level using high-resolution RS data with time-series analysis and the ML technique.

Author Contributions

Conceptualization, T.F.C. and J.P.M. Methodology, T.F.C., M.C.F.W. and L.F.M. Software, T.F.C., M.C.F.W. and L.F.M. Formal Analysis, T.F.C. and M.C.F.W. Writing—Original Draft Preparation, T.F.C., M.C.F.W., L.F.M. and J.P.M. Writing—Review & Editing, T.F.C., M.C.F.W., L.F.M. and J.P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors thank Solinftec and the São Manoel sugar mill for their operational support. This study was financed in part by scholarships from the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES–Brazil–Finance Code 001). We would also like to thank Michael James Stablein for his translation services and review of this text. We are grateful for the suggestions of the reviewers and editors.

Conflicts of Interest

The authors declare no conflict of interest.

References

Damian, J.M.; Pias, O.H.D.C.; Cherubin, M.R.; da Fonseca, A.Z.; Fornari, E.Z.; Santi, A.L. Applying the NDVI from satellite images in delimiting management zones for annual crops. Sci. Agricoia 2020, 77, e20180055. [Google Scholar] [CrossRef]
Kayad, A.; Sozzi, M.; Gatto, S.; Marinello, F.; Pirotti, F. Monitoring within-field variability of corn yield using Sentinel-2 and machine learning techniques. Remote Sens. 2019, 11, 2873. [Google Scholar] [CrossRef]
Baio, F.H.R.; Neves, D.C.; Campos, C.N.S.; Teodoro, P.E. Relationship between cotton productivity and variability of NDVI obtained by Landsat images. Biosci. J. 2018, 34, 197–205. [Google Scholar] [CrossRef]
Khaliq, A.; Comba, L.; Biglia, A.; Aimonino, D.R.; Chiaberge, M.; Gay, P. Comparison of satellite and UAV-based multispectral imagery for vineyard variability assessment. Remote Sens. 2019, 11, 436. [Google Scholar] [CrossRef]
Taghizadeh, S.; Navid, H.; Adiban, R.; Maghsodi, Y. Harvest chronological planning using a method based on satellite-derived vegetation indices and artificial neural networks. Span. J. Agric. Res. 2019, 17, 206. [Google Scholar] [CrossRef]
Levitan, N.; Gross, B. Utilizing collocated crop growth model simulations to train agronomic satellite retrieval algorithms. Remote Sens. 2018, 10, 1968. [Google Scholar] [CrossRef]
Cisneros, A.; Fiorio, P.; Menezes, P.; Pasqualotto, N.; Wittenberghe, V.S.; Bayma, G.; Furlan Nogueira, S. Mapping Productivity and Essential Biophysical Parameters of Cultivated Tropical Grasslands from Sentinel-2 Imagery. Agronomy 2020, 10, 711.
Sibanda, M.; Mutanga, O.; Rouget, M. Examining the potential of Sentinel-2 MSI spectral resolution in quantifying above ground biomass across different fertilizer treatments. ISPRS J. Photogramm. Remote Sens. 2015, 110, 55–65.
Lobell, D.B.; Thau, D.; Seifert, C.; Engle, E.; Little, B. A scalable satellite-based crop yield mapper. Remote Sens. Environ. 2015, 164, 324–333.
Jeffries, G.R.; Griffin, T.S.; Fleisher, D.H.; Naumova, E.N.; Koch, M.; Wardlow, B.D. Mapping sub-field maize yields in Nebraska, USA by combining remote sensing imagery, crop simulation models, and machine learning. Precis. Agric. 2020, 21, 678–694.
Schwalbert, R.A.; Amado, T.J.C.; Nieto, L.; Varela, S.; Corassa, G.M.; Horbe, T.A.N.; Rice, C.W.; Peralta, N.R.; Ciampitti, I.A. Forecasting maize yield at field scale based on high-resolution satellite imagery. Biosyst. Eng. 2018, 171, 179–192.
Lobell, D.B. The use of satellite data for crop yield gap analysis. Field Crop. Res. 2013, 143, 56–64.
Colaço, A.F.; Bramley, R. Site–Year Characteristics Have a Critical Impact on Crop Sensor Calibrations for Nitrogen Recommendations. Agron. J. 2019, 111, 2047–2059.
Bramley, R.G.V.; Ouzman, J.; Gobbett, D.L. Regional scale application of the precision agriculture thought process to promote improved fertilizer management in the Australian sugar industry. Precis. Agric. 2019, 20, 362–378.
Momin, M.A.; Grift, T.E.; Valente, D.S.; Hansen, A.C. Sugarcane yield mapping based on vehicle tracking. Precis. Agric. 2019, 20, 896–910.
Fulton, J.P.; Port, K. Precision agriculture data management. In Precision Agriculture Basics; Shannon, D.K., Clay, D.E., Kitchen, N.R., Eds.; ASA, CSSA, SSSA: Madison, WI, USA, 2018.
Rahman, M.M.; Robson, A.J. A novel approach for sugarcane yield prediction using Landsat time series imagery: A case study on Bundaberg region. Adv. Remote Sens. 2016, 5, 93–102.
Mulianga, B.; Bégué, A.; Simoes, M.; Todoroff, P. Forecasting Regional Sugarcane Yield Based on Time Integral and Spatial Aggregation of MODIS NDVI. Remote Sens. 2013, 5, 2184–2199.
Bégué, A.; Lebourgeois, V.; Bappel, E.; Todoroff, P.; Pellegrino, A.; Baillarin, F.; Siegmund, B. Spatio-temporal variability of sugarcane fields and recommendations for yield forecast using NDVI. Int. J. Remote Sens. 2010, 31, 5391–5407.
Hammer, R.G.; Sentelhas, P.C.; Mariano, J.C.Q. Sugarcane yield prediction through data mining and crop simulation models. Sugar Tech. 2019, 22, 216–225.
Simões, M.D.S.; Rocha, J.V.; Lamparelli, R.A.C. Spectral variables, growth analysis and yield of sugarcane. Sci. Agric. 2005, 62, 199–207.
Abdel-Rahman, E.M.; Ahmed, F.B. The application of remote sensing techniques to sugarcane (Saccharum spp. hybrid) production: A review of the literature. Int. J. Remote Sens. 2008, 29, 3753–3767.
Lisboa, I.P.; Damian, J.M.; Cherubin, M.R.; Barros, P.P.S.; Fiorio, P.R.; Cerri, C.C.; Cerri, C.E.P. Prediction of sugarcane yield based on NDVI and concentration of leaf-tissue nutrients in fields managed with straw removal. Agronomy 2018, 8, 196.
Rahman, M.M.; Robson, A.J. Integrating Landsat-8 and Sentinel-2 Time Series Data for Yield Prediction of Sugarcane Crops at the Block Level. Remote Sens. 2020, 12, 1313.
Abdel-Rahman, E.M.; Ahmed, F.B.; Riyad, I. Random forest regression for sugarcane yield prediction based on Landsat TM derived spectral parameters. In Sugarcane: Production, Cultivation and Uses; Nova Science Publishers Inc.: Hauppauge, NY, USA, 2012; Chapter 10.
Hunt, M.L.; Blackburn, G.A.; Carrasco, L.; Redhead, J.W.; Rowland, C.S. High resolution wheat yield mapping using Sentinel-2. Remote Sens. Environ. 2019, 233, 111410.
Jeong, J.H.; Resop, J.P.; Mueller, N.D.; Fleisher, D.H.; Yun, K.; Butler, E.E.; Timlin, D.J.; Shim, K.M.; Gerber, J.S.; Reddy, V.R.; et al. Random forests for global and regional crop yield predictions. PLoS ONE 2016, 11, e0156571.
Hochachka, W.M.; Caruana, R.; Fink, D.; Munson, A.; Riedewald, D.; Sorokina, D.; Kelling, S. Data-mining discovery of pattern and process in ecological systems. J. Wildl. Manag. 2007, 71, 2427–2437.
Yuan, H.; Yang, G.; Li, C.; Wang, Y.; Liu, J.; Yu, H.; Feng, H.; Xu, B.; Zhao, X.; Yang, X. Retrieving soybean leaf area index from unmanned aerial vehicle hyperspectral remote sensing: Analysis of RF, ANN, and SVM regression models. Remote Sens. 2017, 9, 309.
Yue, J.; Feng, H.; Yang, G.; Li, Z. A comparison of regression techniques for estimation of above-ground winter wheat biomass using near-surface spectroscopy. Remote Sens. 2018, 10, 66.
Han, L.; Yang, G.; Dai, H.; Xu, B.; Yang, H.; Feng, H.; Li, Z.; Yang, X. Modeling maize above-ground biomass based on machine learning approaches using UAV remote-sensing data. Plant Methods 2019, 15, 1–19.
Schwalbert, R.A.; Amado, T.J.C.; Nieto, L.; Corassa, G.M.; Rice, C.W.; Peralta, N.R.; Schauberger, B.; Gornott, C.; Ciampitti, I.A. Mid-season county-level corn yield forecast for US Corn Belt integrating satellite imagery and weather variables. Crop Sci. 2020, 60.
EMBRAPA—Empresa Brasileira de Pesquisa Agropecuária. Sistema Brasileiro de Classificação de Solos, 3rd ed.; Empresa Brasileira de Pesquisa Agropecuária (Embrapa): Brasília, Brazil, 2013; p. 353.
QGIS Development Team. QGIS Geographic Information System. Open Source Geospatial Foundation Project. 2018. Available online: http://qgis.osgeo.org (accessed on 10 January 2021).
Congedo, L. Semi-Automatic Classification Plugin Documentation. Release 2016, 4, 29.
Chavez, P., Jr. Image-Based Atmospheric Corrections—Revisited and Improved. Photogramm. Eng. Remote Sens. 1996, 62, 1025–1036.
Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring vegetation systems in the great plains with ERTS. In Proceedings of the Earth Resources Technology Satellite—1 Symposium, Washington, DC, USA, 10–14 December 1974; pp. 309–317.
Barnes, E.M.; Clarke, T.R.; Richards, S.E.; Colaizzi, P.D.; Haberland, J.; Kostrzewski, M.; Waller, P.; Choi, C.; Riley, E.; Thompson, T.; et al. Coincident detection of crop water stress, nitrogen status and canopy density using ground-based multispectral data. In Proceedings of the 5th International Conference on Precision Agriculture, Bloomington, MN, USA, 16–19 July 2000.
Gitelson, A.A.; Kaufman, Y.J.; Merzlyak, M.N. Use of a green channel in remote sensing of global vegetation from EOS-MODIS. Remote Sens. Environ. 1996, 58, 289–298.
Gitelson, A.A. Wide Dynamic Range Vegetation Index for Remote Quantification of Crop Biophysical Characteristics. J. Plant Physiol. 2004, 161, 165–173.
Abrahão, S.A.; Pinto, F.D.A.D.C.; Queiroz, D.M.D.; Santos, N.T.; Gleriani, J.M.; Alves, E.A. Índices de vegetação de base espectral para discriminar doses de nitrogênio em capim-tanzânia. Rev. Bras. Zootec. 2009, 38, 1637–1644.
Maresma, Á.; Ariza, M.; Martínez, E.; Lloveras, J.; Martínez-Casasnovas, J.A. Analysis of Vegetation Indices to Determine Nitrogen Application and Yield Prediction in Maize (Zea mays L.) from a Standard UAV Service. Remote Sens. 2018, 10, 368.
Matsuoka, S.; Stolf, R. Sugarcane tillering and ratooning: Key factors for a profitable cropping. In Sugarcane: Production, Cultivation and Uses; Gonçalves, J.F., Correia, K.D., Eds.; Nova Science Publishers: New York, NY, USA, 2012; Volume 5, pp. 137–157.
Maldaner, L.F.; Molin, J.P. Data processing within rows for sugarcane yield mapping. Sci. Agric. 2020, 77, e20180391.
Minasny, B.; McBratney, A.B.; Whelan, B.M. VESPER Version 1.62; Australian Centre for Precision Agriculture, McMillan Building A05, The University of Sydney: Sydney, Australia, 2005.
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2018.
Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22.
Tracy, T.; Fu, Y.; Roy, I.; Jonas, E.; Glendenning, P. Towards Machine Learning on the Automata Processor. In High Performance Computing; Kunkel, J., Balaji, P., Dongarra, J., Eds.; Springer: Cham, Switzerland, 2016; Volume 9697, pp. 200–218.
Ripley, B.D. Spatial Statistics; John Wiley & Sons: New York, NY, USA, 1981; Chapter 3.
Li, W.; Jiang, J.; Guo, T.; Zhou, M.; Tang, Y.; Wang, Y.; Zhang, Y.; Cheng, T.; Zhu, Y.; Cao, W.; et al. Generating Red-Edge Images at 3 M Spatial Resolution by Fusing Sentinel-2 and Planet Satellite Products. Remote Sens. 2019, 11, 1422.
Cui, Z.; Kerekes, J.P. Potential of Red Edge Spectral Bands in Future Landsat Satellites on Agroecosystem Canopy Green Leaf Area Index Retrieval. Remote Sens. 2018, 10, 1458.
Sun, Y.; Qin, Q.; Ren, H.; Zhang, T.; Chen, S. Red-Edge Band Vegetation Indices for Leaf Area Index Estimation from Sentinel-2/MSI Imagery. IEEE Trans. Geosci. Remote Sens. 2019, 58, 826–840.
Wei, M.C.F.; Maldaner, L.F.; Ottoni, P.M.N.; Molin, J.P. Carrot Yield Mapping: A Precision Agriculture Approach Based on Machine Learning. Artif. Intell. 2020, 1, 229–241.
Venancio, L.P.; Filgueiras, R.; Cunha, F.F.D.; Silva, F.C.S.D.; Santos, R.A.D.; Mantovani, E.C. Mapping of corn phenological stages using NDVI from OLI and MODIS sensors. Semin. Ciênc. Agrar. 2020, 41, 1517–1534.
Morel, J.; Bégué, A.; Todoroff, P.; Martiné, J.F.; Lebourgeois, V.; Petit, M. Coupling a sugarcane crop model with the remotely sensed time series of fIPAR to optimise the yield estimation. Eur. J. Agron. 2014, 61, 60–68.
Dubey, S.K.; Gavli, A.S.; Yadav, S.K.; Sehgal, S.; Ray, S.S. Remote Sensing-Based Yield Forecasting for Sugarcane (Saccharum officinarum L.) Crop in India. J. Indian Soc. Remote Sens. 2018, 46, 1823–1833.
Zhao, D.; Gordon, V.S.; Comstock, J.C.; Glynn, N.C.; Johnson, R.M. Assessment of Sugarcane Yield Potential across Large Numbers of Genotypes using Canopy Reflectance Measurements. Crop Sci. 2016, 56, 1747–1759.

Figure 1. Location of the sugarcane fields at the study site in Brazil.

Figure 2. Flow chart of the methodology for data acquisition and processing.

Figure 3. The most important variables in decreasing order according to the Mean Squared Error based on the Random Forest (RF) regression applied for the predictor variables: Spectral bands (A). Green Normalized Difference Vegetation Index—GNDVI (B). Normalized Difference Red-Edge—NDRE (C). Normalized Difference Vegetation Index—NDVI (D). Wide Dynamic Range Vegetation Index—WDRVI (E).

Figure 4. Observed sugarcane yield of 2018/2019 season (A). Predictive yield models based on Random Forest (RF) using: NDRE (Normalized Difference Red Edge) (B) and spectral bands (C). Predictive yield models based on Multiple Linear Regression (MLR) using: NDRE (Normalized Difference Red Edge) (D) and spectral bands (E).

Figure 5. Observed sugarcane yield of 2019/2020 season (A). Predictive yield models based on Random Forest (RF) using: NDRE (Normalized Difference Red Edge) (B) and spectral bands (C). Predictive yield models based on Multiple Linear Regression (MLR) using: NDRE (Normalized Difference Red Edge) (D) and spectral bands (E).

Table 1. Specifications of the spectral bands from Sentinel-2.

Spectral Bands	Central Wavelength (nm)	Spatial Resolution (m)	Temporal Resolution (Days)	Radiometric Resolution (Bits)
B2 Blue	490	10	5	12
B3 Green	560	10	5	12
B4 Red	665	10	5	12
B8 NIR	842	10	5	12
B5 Red-Edge	705	20	5	12

NIR: Near-infrared.

Table 2. Vegetation indices considered in this study.

Vegetation Index	Equation	Authors
Normalized Difference Vegetation Index	NDVI = (NIR − Red)/(NIR + Red)	Rouse et al. [37]
Normalized Difference Red-Edge Index	NDRE = (NIR − Red-edge)/(NIR + Red-edge)	Barnes et al. [38]
Green Normalized Difference Vegetation Index	GNDVI = (NIR − Green)/(NIR + Green)	Gitelson et al. [39]
Wide Dynamic Range Vegetation Index	WDRVI = (a × NIR − Red)/(a × NIR + Red)	Gitelson [40]

Red: reflectance in the red region (630 nm–685 nm). Red-edge: reflectance in the transition region (690 nm–730 nm). NIR: reflectance in the near-infrared region (760 nm–1500 nm). Green: reflectance in the green region (542 nm–578 nm). a: weighting coefficient (0.1).
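The four equations in Table 2 can be sketched in a few lines of Python. This is only an illustration: the function names and the sample reflectance values are assumptions made for the example, not part of the study's processing chain, and only the weighting coefficient a = 0.1 comes from the table note.

```python
def normalized_difference(first: float, second: float) -> float:
    """Generic normalized difference: (first - second) / (first + second)."""
    return (first - second) / (first + second)


def ndvi(nir: float, red: float) -> float:
    # NDVI = (NIR - Red) / (NIR + Red)
    return normalized_difference(nir, red)


def ndre(nir: float, red_edge: float) -> float:
    # NDRE = (NIR - Red-edge) / (NIR + Red-edge)
    return normalized_difference(nir, red_edge)


def gndvi(nir: float, green: float) -> float:
    # GNDVI = (NIR - Green) / (NIR + Green)
    return normalized_difference(nir, green)


def wdrvi(nir: float, red: float, a: float = 0.1) -> float:
    # WDRVI = (a * NIR - Red) / (a * NIR + Red), with a = 0.1 as in Table 2
    return (a * nir - red) / (a * nir + red)


# Hypothetical surface reflectance values for one pixel (illustrative only).
pixel = {"green": 0.08, "red": 0.05, "red_edge": 0.20, "nir": 0.45}

indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "NDRE": ndre(pixel["nir"], pixel["red_edge"]),
    "GNDVI": gndvi(pixel["nir"], pixel["green"]),
    "WDRVI": wdrvi(pixel["nir"], pixel["red"]),
}

for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

With a = 0.1 the NIR term is down-weighted, so WDRVI can be negative for a pixel whose NDVI is strongly positive.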

Table 3. Dates of the orbital images and respective phenological stages of sugarcane.

ID	DAC	Orbital Image Dates in 2018 (Month/Day)	Orbital Image Dates in 2019 (Month/Day)	Phenological Stage
I1	30	NA	NA	Initial
I2	60	NA	NA	Initial
T1	90	02/04, 02/09, 02/24	02/09	Tillering
T2	120	03/06, 03/11, 03/16, 03/21	03/06, 03/26, 03/31	Tillering
T3	150	04/05, 04/20, 04/25, 04/30	04/20, 04/25	Tillering
D1	180	05/20, 05/30	05/05, 05/30	Development
D2	210	06/19, 06/29	06/09, 06/14, 06/24, 06/29	Development
D3	240	07/04, 07/09, 07/14, 07/19, 07/24, 07/29	07/09, 07/14, 07/24	Development
R1	270	08/13, 08/18, 08/23, 08/28	08/08, 08/18, 08/23, 08/28	Ripening
R2	300	09/07, 09/22	09/07, 09/12, 09/17	Ripening
R3	330	10/12, 10/22	10/02, 10/12, 10/17	Ripening
M	360	NA	NA	Maturation

ID: identification. DAC: days after cutting. NA: no data available.

Table 4. Descriptive statistics of the sugarcane yield data considering all fields.

Season	Dataset	n	Minimum (Mg ha⁻¹)	Median (Mg ha⁻¹)	Mean (Mg ha⁻¹)	Maximum (Mg ha⁻¹)	SD (Mg ha⁻¹)	CV (%)
2018/2019	Original	53,759	0.86	70.85	71.20	501.23	23.53	33.04
2018/2019	Filtered	16,202	36.03	64.43	64.31	86.35	7.06	10.98
2019/2020	Original	67,716	10.39	72.52	112.90	498.21	81.96	72.60
2019/2020	Filtered	28,247	42.74	65.82	70.92	107.90	9.55	13.47

n: number of samples. SD: standard deviation. CV: coefficient of variation.
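As a quick consistency check on Table 4, the coefficient of variation can be recomputed from the reported mean and standard deviation, assuming the usual definition CV = 100 × SD / mean. The sketch below uses the rounded values from the table, so differences of about 0.01 in the last digit are expected; the variable and function names are illustrative.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Coefficient of variation in percent, assuming CV = 100 * SD / mean."""
    return 100.0 * sd / mean


# (season, dataset, mean, SD, reported CV) copied from Table 4.
rows = [
    ("2018/2019", "Original", 71.20, 23.53, 33.04),
    ("2018/2019", "Filtered", 64.31, 7.06, 10.98),
    ("2019/2020", "Original", 112.90, 81.96, 72.60),
    ("2019/2020", "Filtered", 70.92, 9.55, 13.47),
]

recomputed = [coefficient_of_variation(sd, mean) for _, _, mean, sd, _ in rows]

for (season, dataset, _, _, reported), cv in zip(rows, recomputed):
    print(f"{season} {dataset}: recomputed {cv:.2f} %, reported {reported:.2f} %")
```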

Table 5. Fitted model and semi-variogram variables used to interpolate sugarcane yield data.

Season	n	Model	Range (m)	Sill	Nugget	RMSE (Mg ha⁻¹)	Calc. Grid (Samples ha⁻¹)
2018/2019	5616	Exponential	42.04	33.79	12.95	0.53	23
2019/2020	5686	Exponential	49.30	23.54	5.19	1.41	16

n: number of samples. RMSE: root mean squared error. Calc. grid [49]: calculated grid = 10,000/(0.5 × Range)², in samples ha⁻¹ with the range in m.
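The calculated grid values in Table 5 can be reproduced from the semi-variogram range, reading the footnote formula as 10,000 m² per hectare divided by the square of half the range. That reading is an interpretation of the footnote that matches both reported values; the function name is hypothetical.

```python
def calculated_grid(range_m: float) -> float:
    """Sampling density (samples per ha) for a grid spacing of half the range.

    Assumes one sample per (0.5 * range) x (0.5 * range) cell and
    10,000 m^2 per hectare.
    """
    spacing_m = 0.5 * range_m
    return 10_000.0 / spacing_m ** 2


# Semi-variogram ranges (m) from Table 5.
ranges = {"2018/2019": 42.04, "2019/2020": 49.30}

densities = {season: calculated_grid(r) for season, r in ranges.items()}

for season, density in densities.items():
    print(f"{season}: {density:.2f} samples/ha (rounded: {round(density)})")
```

Rounding the results to whole samples gives the 23 and 16 samples ha⁻¹ listed in the table.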

Table 6. Results of Random Forest and Multiple Linear Regression algorithms to predict the sugarcane yield.

Variables	Dataset	RF RMSE	RF R²	RF MAE	MLR RMSE	MLR R²	MLR MAE
Spectral bands	Training	1.95	0.96	1.42	6.10	0.48	4.73
	Testing	4.63	0.70	3.46	6.11	0.47	4.67
	Entire	3.13	0.87	2.11	6.10	0.47	4.71
GNDVI	Training	2.44	0.93	1.83	6.35	0.44	4.92
	Testing	5.47	0.57	4.21	6.14	0.46	4.79
	Entire	3.76	0.81	2.64	6.28	0.44	4.87
NDRE	Training	2.39	0.94	1.79	6.36	0.43	4.93
	Testing	5.30	0.60	4.06	6.18	0.45	4.82
	Entire	3.65	0.82	2.56	6.30	0.44	4.89
NDVI	Training	2.42	0.93	1.81	6.39	0.43	4.93
	Testing	5.39	0.58	4.18	6.21	0.45	4.83
	Entire	3.71	0.81	2.62	6.33	0.43	4.90
WDRVI	Training	2.41	0.94	1.81	6.36	0.43	4.93
	Testing	5.43	0.58	4.20	6.18	0.45	4.81
	Entire	3.73	0.81	2.62	6.30	0.44	4.89

RF: Random Forest. MLR: Multiple Linear Regression. RMSE: root mean squared error (Mg ha⁻¹). R²: coefficient of determination. MAE: mean absolute error (Mg ha⁻¹).
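For readers who want to recompute the three accuracy metrics in Table 6, a minimal Python sketch follows. It assumes the common definitions, with R² taken as 1 − SS_res/SS_tot; the source does not state which R² definition was used, and the observed and predicted yields below are made-up values for illustration only.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical yields in Mg/ha (not data from the study).
observed_yield = [60.0, 65.0, 70.0, 75.0]
predicted_yield = [62.0, 64.0, 69.0, 77.0]

print(f"RMSE: {rmse(observed_yield, predicted_yield):.2f} Mg/ha")
print(f"MAE:  {mae(observed_yield, predicted_yield):.2f} Mg/ha")
print(f"R2:   {r_squared(observed_yield, predicted_yield):.2f}")
```

RMSE and MAE carry the units of the yield data, while R² is dimensionless. Comparing training and testing rows with the same metric, as Table 6 does, is what exposes the gap between fitted and held-out performance.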

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Canata, T.F.; Wei, M.C.F.; Maldaner, L.F.; Molin, J.P. Sugarcane Yield Mapping Using High-Resolution Imagery Data and Machine Learning Technique. Remote Sens. 2021, 13, 232. https://doi.org/10.3390/rs13020232
