Forecasting Yield and Lignocellulosic Composition of Energy Cane Using Unmanned Aerial Systems

: Crop monitoring and appropriate agricultural management practices of elite germplasm will enhance bioenergy’s e ﬃ ciency. Unmanned aerial systems (UAS) may be a useful tool for this purpose. The objective of this study was to assess the use of UAS with true color and multispectral imagery to predict the yield and total cellulosic content (TCC) of newly created energy cane germplasm. A trial was established in the growing season of 2016 at the Texas A&M AgriLife Research Center in Weslaco, Texas, where 15 energy cane elite lines and three checks were grown on experimental plots, arranged in a complete block design and replicated four times. Four ﬂights were executed at di ﬀ erent growth stages in 2018, at the ﬁrst ratoon crop, using two multi-rotor UAS: the DJI Phantom 4 Pro equipped with RGB camera and the DJI Matrice 100, equipped with multispectral sensor (SlantRange 3p). Canopy cover, canopy height, NDVI (Normalized Di ﬀ erence Vegetation Index), and ExG (Excess Green Index) were extracted from the images and used to perform a stepwise regression to obtain the yield and TCC models. The results showed a good agreement between the predicted and the measured yields ( R 2 = 0.88); however, a low coe ﬃ cient of determination was found between the predicted and the observed TCC ( R 2 = 0.30). This study demonstrated the potential application of UAS to estimate energy cane yield with high accuracy, enabling plant breeders to phenotype larger populations and make selections with higher conﬁdence.


Introduction
When producing biofuels from dedicated bioenergy crops, maintaining high yields in low input conditions is a priority if global environmental change and increase in world population are considered [1]. Benefits from bioenergy crops can be expanded by deploying varieties adapted for growth on marginal or degraded lands. Promoting high yielding bioenergy crops with positive attributes for water use and soil impact will also expand bioenergy benefits, not to mention the production of bioenergy in land that makes a small contribution to food production.
Perennial C 4 plant species such as energy sorghums (Sorghum bicolor), sugarcanes, and energy canes (Saccharum spp.) are promising feedstock species for the South Central and Southern U.S. regions, where favorable weather conditions would allow maximum biomass production rates. The combination of high productivity (~20 dry tons per acre), resulting from the C 4 photosynthesis, with high light, water, and nitrogen use efficiency, drought tolerance, and wide adaptation, make them well suited for marginal lands [2]. Because these species are perennial, they can also be ratooned (harvested and allowed to re-sprout from the roots or rhizomes, in the case of Saccharum spp.) for several years before replanting is necessary [3]. These species provide other environmental benefits compared to traditional row crops, including extensive root systems, which can increase nutrient capture, improve soil quality, sequester carbon, reduce erosion, and increase growth rate.
Of these energy crops, sugarcane is extremely high yielding, producing large quantities of biomass. Nonetheless, minimal efforts have been employed to develop varieties under low-input management specifically. The plant has tremendous yield potential but is restricted to subtropical and tropical regions. However, because modern sugarcane varieties have been derived from a hybridization process involving S. officinarum and the wild cane S. spontaneum [4], which has drought and cold resistance [5], the creation of energy cane adapted to low input conditions is possible.
Sugarcane genetic breeding is a long process that takes between 10 and 12 years involving hybridization crosses and field selection of new genotypes [6]. Since the crop is highly polyploid and open-pollinated, the hybridization step generates hundreds of thousands of different genotypes to be selected. Because the species is clonally propagated, the biggest challenge for breeders is to identify, among those genotypes, the very few individuals that will produce higher yields consistently over the years and across different environments, a process that may generate an enormous amount of phenotypic data to be analyzed.
In the past, sugarcane breeding programs were looking for higher productivity of sugar, but now some breeding programs are also focused on high yield of fiber [7]. Different from conventional sugarcane (Saccharum spp.), energy cane is selected more for fiber than sucrose composition [8]. Therefore, as energy cane breeding programs work to develop a high-yield, low-input production system, two important goals for novel germplasm are maximizing productivity and optimizing composition.
Plants have developed different responses to abiotic stresses, resulting in different traits for avoidance or tolerance [9]. The incorporation of such traits in plant breeding may be facilitated through phenotyping protocols [10]. High-throughput phenotyping is particularly important in studies of tolerance to abiotic stresses, such as drought. The highly complex responses of plants to drought require the dissection of such responses into a series of component traits, which can be measured most efficiently and accurately with non-destructive image technologies [11]. The use of one of these technologies, red, green, blue (RGB) digital cameras has allowed estimating green biomass via specifically developed indices [12].
Crop yield is commonly estimated with manual surveys or by establishing the relationship between agronomic factors or climatic factors and crop yield based on statistical analysis methods [13]. However, several observations and sampling of experimental plots are required to determine the parameters of a yield prediction model [14]. New approaches have been used for crop yield prediction. You et al. (2017) [15] forecasted soybean yield in the United States at county-level with deep learning models such as convolutional neural networks (CNN) and long-short term memory (LSTM) networks. Their models performed better than traditional remote sensing methods with a 30% reduction of root mean square error (RMSE). Similarly, Khaki et al. (2020) [16] proposed a model to predict corn and soybean yield across the Corn Belt in the United States using convolutional neural networks (CNNs) and recurrent neural networks (RNNs) based on environmental data and management practices.
The model outperformed other methods tested, achieving a RMSE between 9% and 8% of their corresponding average yields. Some disadvantages of these methods are the lack of specific regression relationships, and the labor-intensive calculation process, which substantially restrict their efficiency and implementation.
Unmanned aerial systems equipped with various sensors are being used for non-destructive high-throughput phenotyping [14]. Digital and multispectral cameras are frequently used sensors. Some of the applications of these sensors for field-based phenotyping include biomass estimation, canopy surface modeling, and crop height estimation [17][18][19]. Therefore, unmanned aerial systems have been used to predict yield with plant physiological parameters and vegetation indices [14]. Sanches et al. (2017) [20] predicted sugarcane yield in Brazil using the LAI (Leaf Area Index) and GRVI (Green-Red Vegetation Index). Their results showed that GRVI estimated yield (R 2 = 0.69) with higher accuracy than LAI (R 2 = 0.34), but when both indices were combined, the yield was estimated with greater precision (R 2 = 0.79). However, the authors suggested for future studies the incorporation of plant height and additional indices to improve the results. Chea et al. (2020) [21] developed prediction models for Brix, Pol, fiber, and CCS (Commercial Cane Sugar) value using six vegetation indices (GNDVI, NDVI, RVI, CIgreen, CIrededge, and SRPIb). Their findings indicate that CIrededge is correlated with pol (R 2 = 0.77) and CCS (R 2 = 0.68), independent of variety, whereas brix models depend on the variety and need different vegetation indices. A weak correlation (R 2 = 0.35-0.50) was found between fiber content with the six vegetation indices.
Even though the use of UAS to predict yield and composition of sugarcane has been evaluated, information is missing about the utilization of this technology for energy cane. Additionally, most studies have been focused on the commercial or industrial benefits that UAS can provide. However, this research proposes to use UAS as a tool that facilitates the acquisition of phenotypic information. For this reason, the objective of this study is to assess the use of UAS with true color and multispectral imagery to predict the yield and total cellulosic content (TCC) of newly created energy cane germplasm.

Study Site
The study was conducted on an experimental farm at the Texas A&M AgriLife Research and Extension Center in Weslaco, Texas (26 •

Imagery Acquisition and Processing
Two multi-rotor UAS were used to acquire the data, a DJI Phantom 4 Pro and a DJI Matrice 100 (SZ DJI Technology Co., Ltd., Shenzen, China) ( Table 1). The Phantom was equipped with an RGB sensor with a resolution of 20 megapixels, and 1" CMOS (Complementary Metal Oxide

Imagery Acquisition and Processing
Two multi-rotor UAS were used to acquire the data, a DJI Phantom 4 Pro and a DJI Matrice 100 (SZ DJI Technology Co., Ltd., Shenzen, China) ( Table 1). The Phantom was equipped with an RGB sensor with a resolution of 20 megapixels, and 1" CMOS (Complementary Metal Oxide Semiconductor) detector. The Matrice was equipped with a SlantRange 3p multispectral sensor (SlantRange Inc., San Diego, CA, USA) and an ambient illumination sensor that can be used to help calibrate images when sunlight conditions are changing at the time of data collection. The SlantRange 3p sensor has a spatial resolution of 4.8 cm/pixel at 120 m above ground level (AGL). Images were acquired on July 17, September 18, November 14, and December 19 of 2018, corresponding to 273, 210, 153, and 118 days before harvesting, respectively. The RGB images were collected at 20 m AGL and 80% overlap and sidelap, whereas the multispectral images were collected at 30 m AGL and 70% overlap and sidelap. Flights were conducted between 10:00 AM and 12:00 PM with wind speed less than 8 km/h to avoid image distortion. For georeferencing purposes, eight ground control points (GCPs) were placed uniformly in the study area. The GCPs were surveyed twice with a differential dual frequency, post processed kinematic (PPK) GPS system, collecting data at 20 Hz (V-map Air model, Micro Aerial Projects L.L.C., Gainesville, FL, USA).
SlantView (SlantRange Inc., San Diego, CA, USA) software was used to export radiometrically calibrated multispectral images for further analysis. Both RGB and multispectral images were processed in Agisoft Metashape Professional software (Agisoft LLC, St. Petersburg, Russia) to generate the orthomosaics and digital surface models (DSMs).

Canopy Height
Point cloud datasets were imported into Quick Terrain Modeler (Applied Imagery, Chevy Chase, MD, USA) software to generate the canopy height models (CHMs). The ground surface was estimated with the AGL Analyst tool, and a 10 m grid sampling distance was selected. The digital terrain model (DTM) created was subtracted from the digital surface model (DSM) to obtain the CHMs. Additionally, the generated CHMs were processed in ENVI (Harris Geospatial Solutions Inc., Broomfield, CO, USA) to set the negative values to zero. CHMs were imported into ArcGIS 10.6.1 (ESRI, Redlands, CA, USA) in which average canopy height (CH) per plot was extracted.

Canopy Cover
RGB orthomosaic images were converted into binary images with the canopeo algorithm [22] in QGIS (Open Source Geospatial Foundation Project), where zero designates non-green canopy pixels, and one denotes green canopy pixels. The zonal statistics tool was used to compute the total number of pixels and the sum of green canopy pixels within each plot. Then, percentage canopy cover (CC) was calculated as the ratio of green canopy pixels to the total number of pixels using Equation (1).

Normalized Difference Vegetation Index (NDVI)
The Normalized Difference Vegetation Index (NDVI) was calculated in ArcGIS 10.6.1 based on the multispectral orthomosaic images. In the raster calculator, the near infrared (NIR) and red bands were selected to generate the index (Equation (2) [23]). Then, average NDVI values per plot were extracted using the zonal statistics as a table tool. This index is reported to be well correlated with biomass and has been used to describe crop phenology in tomatoes [24,25].

Excess Green Index (ExG)
To calculate the Excess Green Index (ExG), the RGB orthomosaic images were imported into ArcGIS 10.6.1. The raster calculator tool was used to create the ExG expression by selecting the green, red, and blue bands (Equation (3) [26]). Average ExG values per plot were extracted with the zonal statistics as a table tool.
where g, r, and b are the normalized spectral components, according to: R, G and B denote the values of the red, green, and blue bands, respectively.

Cell Wall Composition Analysis
Before harvesting, one sample of 10 stalks per variety was collected per replicate for composition analyses. In the laboratory, the chemical composition of the energy cane bagasse was determined (cellulose, hemicellulose, and lignin) with near infrared spectroscopy [27].

Harvest Data
At the end of the season the trial was harvested with a Cameco sugarcane harvester on April 16, 2019, and the plot weights were measured with a small capacity weigh wagon (3-Ton Weigh Wagon, CAMECO) instrumented with three load cells [28]. Then, the yield was calculated in megagrams per hectare (Mg ha −1 ).

Data Analysis
The data were analyzed in JMP 14 software (SAS Institute Inc., Cary, NC, USA) to identify outliers with the quantile range outlier's method. A tail quantile value of 0.25 was defined, which means that the interquartile range is between the 0.25 and 0.75 quantiles of the data. Then a multiplier (Q) of 1.5 was selected to identify values as outliers. An outlier was calculated as any value more than Q times the interquartile range from the lower and upper quantiles.
A multivariate analysis was performed in JMP 14 software to find the pairwise and higher relationships among the CC, CH, NDVI, ExG, yield, and total cellulosic content (TCC). This analysis allowed summarization of the linear relationships between flights to check for data consistency. A correlation probability report was generated, which showed the p-values that correspond to a test of the null hypothesis that the true correlation between the variables is zero.

Yield and Total Cellulosic Content Models
A stepwise regression approach with a k-fold cross validation was implemented in JMP 14 to obtain the yield and TCC models with the data from the varieties Ho02−113, TH16−19, and TH16-25. Six folds were selected to split the data into groups of equal number of observations. The stopping rule adopted was Max K-Fold RSquare, which attempted to find a model to maximize the coefficient of determination for the validation set. Stepwise multiple linear regression is a commonly implemented empirical statistical method for high-throughput field phenotyping [29]. It is used to improve the prediction performance of the models by eliminating unnecessary factors and selecting significant factors [30].

Statistics
After an initial analysis, no outliers were found in the variables CC, CH, NDVI, and ExG obtained from the UAS images, and in the observed yield and TCC ( Table 2). The highest mean CC, CH, and ExG values were observed for the second flight (85.74, 3.73, and 0.23, respectively), while the lowest values of CC and ExG were for the fourth flight; the lowest mean CH was observed during the first flight. Similarly, the highest mean NDVI values were obtained for the third flight (0.61), whereas the lowest NDVI values were for the fourth flight.  The canopy cover variability was greater during the third flight, evidenced by the negatively skewed distribution (Figure 2a). Canopy height variability was higher during the first, third, and fourth flights, which presented a negatively skewed distribution for the first flight and a positively skewed distribution for the third and fourth flights (Figure 2b). NDVI showed higher variation in the fourth flight with a positively skewed distribution (Figure 2c). ExG variability was observed in the first flight with positively skewed distribution and the third flight with a negatively skewed distribution (Figure 2d).

Relationship Analysis
The correlations among the independent variables were analyzed between flights (Table 3). CC showed high temporal consistency between the first and the fourth flight (r = 0.73); however, a weak correlation was found between the other flights. Positive relationships associated with height were consistent throughout the evaluation period, being the lowest correlation between the first and fourth flight (r = 0.73), whereas the highest correlation was between the first and second flight (r = 0.93). A strong positive relationship was found between NDVI values of the first and second flight (r = 0.76), and between the first and fourth flight (r = 0.60); nevertheless, for the remaining flights, weak positive or negative correlations were observed. The highest ExG correlation was between the first and fourth flight (r = 0.63), followed by the correlation between the first and second flight (r = 0.39), whereas weak positive relationships were found for the remaining flights.
From each flight were identified the variables with a moderate or strong correlation with yield and TCC. For the first flight, CC, CH, and NDVI were positively related to yield, whereas CC and ExG were associated with TCC. For the second flight, CC, CH, and NDVI were positively correlated to yield; CC, CH, and ExG were related to TCC. For the third flight, the variables associated with yield were CH and NDVI: the first variable was positively correlated, and the second negatively correlated. In contrast, CH, NDVI, and ExG were related to TCC: CH was positively correlated and NDVI and ExG negatively correlated. For the fourth flight, all the variables were moderately related to yield; CC, CH, and ExG were associated with TCC.

Relationship Analysis
The correlations among the independent variables were analyzed between flights (Table 3). CC showed high temporal consistency between the first and the fourth flight (r = 0.73); however, a weak correlation was found between the other flights. Positive relationships associated with height were consistent throughout the evaluation period, being the lowest correlation between the first and fourth flight (r = 0.73), whereas the highest correlation was between the first and second flight (r = 0.93). A strong positive relationship was found between NDVI values of the first and second flight (r = 0.76), and between the first and fourth flight (r = 0.60); nevertheless, for the remaining flights, weak positive or negative correlations were observed. The highest ExG correlation was between the first and fourth flight (r = 0.63), followed by the correlation between the first and second flight (r = 0.39), whereas weak positive relationships were found for the remaining flights.
From each flight were identified the variables with a moderate or strong correlation with yield and TCC. For the first flight, CC, CH, and NDVI were positively related to yield, whereas CC and ExG were associated with TCC. For the second flight, CC, CH, and NDVI were positively correlated to yield; CC, CH, and ExG were related to TCC. For the third flight, the variables associated with yield were CH and NDVI: the first variable was positively correlated, and the second negatively correlated. In contrast, CH, NDVI, and ExG were related to TCC: CH was positively correlated and NDVI and ExG negatively correlated. For the fourth flight, all the variables were moderately related to yield; CC, CH, and ExG were associated with TCC.

Energy Cane Yield and TCC Models
The yield models were obtained for the four flights after performing a stepwise regression, and the coefficient of determination (R 2 ), p-value, and RMSE were used to evaluate the model's performance ( Table 4). The variables that most influenced energy cane yield were NDVI and canopy height. The models for the first and third flight showed a good coefficient of determination to estimate energy cane yield. The lowest R 2 was for the fourth flight, while the highest R 2 corresponded to the third flight (Table 4).
In the same way, total cellulosic content (TCC) models for the fourth flights presented different coefficients of determination, p-values, and RMSE (Table 5). Nonetheless, since the variables were not significantly correlated to TCC, then low R 2 were obtained for all the flights. The relationship between observed and predicted yield is presented in Figure 3. This figure shows that the best relationship between observed and predicted yield is given by the model for the third flight with a coefficient of determination of 0.88 (Figure 3c). In contrast, the lowest relationship of yields is found in the model for the fourth flight ( Figure 3d).
The relationship between observed TCC and predicted TCC is showed in Figure 4, where the best agreement between the observed and predicted TCC is provided by the model for the third flight with a coefficient of determination of 0.30 (Figure 4c). On the other hand, the lowest relationship of TCC is observed in the model for the second flight (Figure 4d). The relationship between observed TCC and predicted TCC is showed in Figure 4, where the best agreement between the observed and predicted TCC is provided by the model for the third flight with a coefficient of determination of 0.30 (Figure 4c). On the other hand, the lowest relationship of TCC is observed in the model for the second flight (Figure 4d).

Discussion
True color and multispectral imagery can be used to extract plant measurements and vegetation indices to estimate yield. However, it is necessary to know the best time of data collection to make

Discussion
True color and multispectral imagery can be used to extract plant measurements and vegetation indices to estimate yield. However, it is necessary to know the best time of data collection to make the predictions, so there is a need for assessing the accuracy of the models generated for each of the flights. A previous study by Sanches et al. (2018) [20] found that the inflection point of biomass accumulation by the crop is a useful reference in the estimation of sugarcane yield with UAS images.
NDVI has been used in several studies of yield prediction, and it has shown good results [12,[31][32][33]. In this study, it was significantly or moderately correlated to yield for the first (r = 0.84) and second (r = 0.55) assessments. Nevertheless, the higher correlation was observed during the first flight, which indicates that this crop stage can be used to collect UAS imagery for energy cane yield prediction.
Sugarcane height is highly correlated with biomass and nitrogen uptake [34]. Additionally, height can be an indicator of yield and other parameters since it is highly influenced by soil, total sugar, leaf nitrogen, temperature, and light intensity [35,36]. Nevertheless, when sugarcane plants attain a certain height, they start to lodge, and environmental factors such as wind can cause plant breakage [35]. In this experiment, plant height was also affected by wind, and for the last two assessments, lodging was evident. The yield models for the third and fourth flights contain canopy height as one of their predictors; however, since the measurements after lodging are not representative, then these models could have some limitations estimating energy cane yield.
Canopy cover is related to crop growth and water use [37]; for this reason, it is an important parameter in crop monitoring. Energy cane canopy cover did not play an essential role in yield prediction; it was moderately correlated with yield for first assessment (r = 0.32) and the fourth assessment (r = 0.44). However, it was strongly correlated with yield for the second assessment (r = 0.70). These results are due mainly to the thresholds used to separate green vegetation from non-vegetation according to the canopeo algorithm [22].
ExG is useful for discriminating between green and non-green vegetation [26]. This index has been used coupled with crop classified mean heights to predict corn grain yield, and it showed good results [38]. Opposite to these findings, in the present study, the mean ExG values for the first, second, and third assessments were not significantly correlated to energy cane yield; it only was significantly correlated to yield for the fourth flight (r = 0.60).
In this study, a stepwise regression analysis was implemented to obtain the yield and TCC models, and the accuracy was assessed by means R 2 , p-value, and RMSE. Energy cane yield was satisfactorily estimated by two of the models, which were built with UAS data collected 273 and 153 days before harvest. The predictor for the first model is NDVI, which implies that the acquisition of multispectral imagery is required, whereas, for the second model, the predictors are CC, CH, and ExG, which can be extracted from RGB images. The performance of these models was also satisfactory since RMSE ranged from 6.  [39], ranging from 7.20 to 11.0 Mg ha −1 . Nevertheless, the performance of the model for the fourth flight was not satisfactory, since it was 11.39 Mg ha −1 .
The previous findings may apply to other locations if the environmental conditions at the time of the data collection are appropriate. It can also be applied to other crops such as sorghum [40], maize [41], wheat [42], and rice [32]. Furthermore, other variables that may be included in future yield prediction models could be abiotic stresses, such as temperature, precipitation, and solar radiation, which would increase the importance of this technology as a valuable tool for the breeding of stress tolerant varieties [43].
TCC was not successfully estimated by the models created, and the main reason for this was the lack of correlation of TCC with the variables used. The coefficient of determination ranged from 0.21 to 0.30. These results agree with those reported by Chea et al. (2020) [21], who found a weak correlation (R 2 = 0.35-0.50) between fiber content with GNDVI (Green Normalized Difference Vegetation Index), NDVI (Normalized Difference Vegetation Index), RVI (Ratio Vegetation Index), CIgreen (Green Chlorophyll Index), CIrededge (Red Edge Chlorophyll Index), and SRPIb (Simple Ratio Pigment Index). Moreover, Roberts et al. (2011) [44] highlighted that lignocellulosic content indices rely on short-wave-infrared (SWIR) wavelengths, and wavelengths from 1500 to 1800 nm and 2000 to 2350 nm, which compare reflectance at an absorbing wavelength to a non-absorbing wavelength. In this case, hyperspectral imagery was not available to calculate indices such as CAI (Cellulose Absorption Index) [45] or NDLI (Normalized Difference Lignin Index) [46], so this could be a limiting factor to predict TCC in energy cane.

Conclusions
This study assessed the utilization of UAS in the prediction of yield and composition of energy cane. Crop parameters and vegetation indices (NDVI and ExG) were extracted from visible and multispectral imagery to generate the models. The yield was satisfactorily estimated by two of the models created, the first model with data collected 273 days before harvest (R 2 = 0.88) and the second with data collected 153 days before harvest (R 2 = 0.71). TCC was not estimated satisfactorily, and the highest coefficient of determination was 0.30. This study demonstrated the potential application of UAS to estimate energy cane yield with high accuracy, enabling plant breeders to phenotype larger populations and make selections with higher confidence. Further investigation is required for TCC prediction considering the use of hyperspectral imagery.