Prediction of Ratoon Sugarcane Family Yield and Selection Using Remote Imagery

Remote sensing techniques and the use of Unmanned Aerial Systems (UAS) have simplified the estimation of yield and plant health in many crops. Family selection in sugarcane breeding programs relies on weighed plots at harvest, which is a labor-intensive process. In this study, we utilized UAS-based remote sensing imagery of plant-cane and first ratoon crops to estimate family yields for a second ratoon crop. Multiple families from the commercial breeding program were planted in a randomized complete block design by family. Standard red, green, and blue imagery was acquired with a commercially available UAS equipped with a Red–Green–Blue (RGB) camera. Color indices using the CIELab color space model were estimated from the imagery for each plot. The cane was mechanically harvested with a sugarcane combine harvester and plot weights (kg) were obtained with a field wagon equipped with load cells. Stepwise regression, correlations, and variance inflation factors were used to identify the best multiple linear regression model to estimate the second ratoon cane yield (kg). A multiple regression model that included family and five different color indices produced a significant R² of 0.88. This indicates that it is possible to make family selection predictions of cane weight without collecting plot weights. The adoption of this technology has the potential to decrease labor requirements and increase breeding efficiency.


Introduction
Sugarcane is an economically important crop in Louisiana [1]. To increase profitability, sugarcane varieties are constantly being improved by the United States Department of Agriculture-Agricultural Research Service's (USDA-ARS) Louisiana sugarcane variety development program, located in Houma, LA. Sugarcane variety development is costly, requiring approximately 12 years of testing to release a cultivar. For this reason, methods are constantly being evaluated to increase efficiency. An important part of this program is the selection of sugarcane seedlings, in which 70 to 80 thousand seedlings are evaluated over a short period by brief visual inspection, a process that demands significant time and labor. As these seedlings are individually un-replicated, selection is potentially biased due to spatial arrangement and microenvironments. Family selection, which involves the selection of seedling families instead of individuals, is based on data from replicated family plots. This procedure is more efficient because fewer poor performing individuals with low heritability traits are introduced into the program [2], but family selection requires gathering data such as weighed plot yields from seedling plots. Family selection in sugarcane at the seedling stage is widely practiced around the world, in places such as Australia [3][4][5][6][7] and the United States [8][9][10][11].
Spectral imaging has been used in several crop species, including sugarcane, to evaluate yield, nutritional status, and crop health [12][13][14][15][16], and can be used to identify spectral traits linked to yield for selection [17]. If the image is acquired aerially, this method also has the advantage of covering large areas easily. These spectral techniques could be used as a tool for evaluating sugarcane seedlings by identifying soil and environmental patterns that may affect selection, and, since spectral analysis can be linked to genetic yield traits, it may also be used for selection. This could give breeders another tool to examine yield parameters quickly without harvest data. Due to land and resource constraints, later ratoon crops (second and later) are not grown, and data are not measured in the Houma seedling stage. If we could predict second ratoon family performance and yield parameters from plant cane and first ratoon images, then the efficiency and speed of this breeding program could be increased.
Remote sensing utilizing specialized sensors has become an important component of crop monitoring and high throughput phenotyping [18,19]. Several indices have been developed that successfully utilize data from both multispectral and RGB imagery as a low-cost alternative to predict biomass [19,20]. The RGB image indices derived from models of Hue-Intensity-Saturation (HIS), International Commission on Illumination L*a*b* (CIELab), and L*u*v* (CIELuv) cylindrical coordinate representations of colors [18,21,22,23] have been useful in predicting crop yield and, in some cases, have been more accurate than multispectral methods [22,24,25]. RGB indices have been correlated with biomass in other crops [19,20]; therefore, if similar results are obtained with sugarcane seedlings, then this procedure may be used to identify sugarcane seedlings with high biomass. The objectives of this study were to (1) determine if remote sensing imagery acquired by UAS could accurately and efficiently evaluate the seedling family performance of cane yield in Houma and (2) determine if second ratoon cane yield could be estimated from plant cane and first ratoon images. To the authors' knowledge, this is the first use of RGB remote imagery to estimate yield for seedling selection in Louisiana.

Materials and Methods
Experiments were conducted on land near Houma, LA (29°38′33.2″ N, 90°51′34.5″ W) (Figure 1). This area is classified as humid subtropical with a Köppen classification of Cfa. The soil at this site is classified as a Cancienne silt loam (fine-silty, mixed, superactive, nonacid, hyperthermic Fluvaquentic Epiaquepts). This land has been used to grow sugarcane for more than 50 years. Seedlings from 23 diverse crosses (families) including 26 unique parents and the check cultivar HoCP 96-540 [26] were selected and planted in a randomized complete block design with three replications (Table 1). This procedure is similar to test plots used to evaluate seedling performance at the Louisiana State University sugarcane breeding program [27]. Each family was planted in two rows on raised beds spaced 1.8 m apart, with 19 plants spaced 40 cm apart, and family plots were separated by 1.2 m. Fields were fertilized and kept weed free using the standard herbicide and cultivation methods of the area [28]. Red-Green-Blue (RGB) images were taken with a Phantom 4 drone equipped with a 1/2.3-inch CMOS 12.5 MP camera (SZ DJI Technology Co., Ltd., Nanshan, China) in plant cane at a distance of 91.4 m (3.52 cm/px) on 2 September 2016, and in first and second ratoon at 45.7 m (1.76 cm/px) on 25 September 2017 and 2 July 2018, respectively.
Cane was harvested in 2016 and 2018 using a chopper harvester and plot weights were taken utilizing a single-axle, high-dump wagon equipped with electronic load cells [29]. Each family plot had two rows per replication and each row was weighed separately. The data were analyzed using a mixed model in SAS Proc Mixed [30], where plot weight was considered the dependent variable, and family, year harvested, and their interaction as fixed effects. Replication and row harvested were treated as random effects. Significant differences between families within each year were estimated by calculating significant differences between means (Table 1).
The Breedpix program [31] measures various color indices using the CIELab color space model [32], from low cost RGB (red-green-blue) remote sensing imagery that can be used to correlate and predict various yield and agronomic traits such as biomass. The indices utilized in this study included the green area (GA), which is the proportion of green pixels in an image [31]. The greener area (GGA) excludes yellow pixels that correlate with senescent leaves [31].
These indices generally correlate to green biomass and are used to calculate the Crop Senescence Index (CSI), which is the scaled ratio between yellow and green vegetation pixels calculated using the following formula: CSI = 100 × (GA − GGA)/GA [21]. This index is correlated with leaf senescence. The Normalized Green-Red Difference Index (NGRDI) [33] and the Triangular Greenness Index (TGI) [34], with their standard deviations, were also calculated using the Breedpix program to evaluate their usefulness in predicting yield. The NGRDI was developed to estimate the vegetation fraction, or the area covered with vegetation. The TGI is affected only by the chlorophyll content in the leaves and can be used to estimate N requirements. NGRDI is calculated as follows:
$$\mathrm{NGRDI} = \frac{R_g - R_r}{R_g + R_r}$$
TGI is calculated as follows:
$$\mathrm{TGI} = -0.5\,\left[(\lambda_r - \lambda_b)(R_r - R_g) - (\lambda_r - \lambda_g)(R_r - R_b)\right]$$
where $R_g$, $R_r$, and $R_b$ represent the reflectances of the green, red, and blue bands, respectively, and $\lambda_r$, $\lambda_b$, and $\lambda_g$ represent the center wavelengths for the red, blue, and green bands, respectively. Other more general CIELab values calculated include lightness, which represents the range from black to white with pure black having a value of zero and pure white having a value of ten [35]. The range from green to red is represented by the a* index, and the range from blue to yellow is represented by the b* index. The v* component is the scale from blue to green. The u* index is the scale from blue to red [19,32]. Hue is a description of color in the form of hue angles in the a*b* plane of the CIELab color space and varies from 0° to 360°, where 0° is red, 60° is yellow, 120° is green, and 180° is cyan [19].
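The index definitions above can be sketched in a few lines of Python. This is a minimal illustration and not the Breedpix implementation: the function names are hypothetical, the band-center wavelengths (670, 550, and 480 nm) are assumed typical values rather than values reported in this study, and the inputs are assumed to be reflectances or normalized pixel intensities.

```python
import numpy as np

# Assumed band-center wavelengths in nm (illustrative, not from this study).
LAMBDA_R, LAMBDA_G, LAMBDA_B = 670.0, 550.0, 480.0


def ngrdi(r, g):
    """Normalized Green-Red Difference Index: (Rg - Rr) / (Rg + Rr)."""
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    denom = g + r
    # Return 0 where both bands are zero to avoid dividing by zero.
    return np.divide(g - r, denom, out=np.zeros_like(denom), where=denom != 0)


def tgi(r, g, b):
    """Triangular Greenness Index from red, green, and blue band values."""
    r, g, b = (np.asarray(x, dtype=float) for x in (r, g, b))
    return -0.5 * ((LAMBDA_R - LAMBDA_B) * (r - g)
                   - (LAMBDA_R - LAMBDA_G) * (r - b))


def csi(ga, gga):
    """Crop Senescence Index: 100 * (GA - GGA) / GA, with GA > 0."""
    return 100.0 * (ga - gga) / ga
```

Applied per pixel to a cropped plot image, the plot-level mean and standard deviation of `ngrdi` and `tgi` would correspond to the kind of summary values used in the analysis; `csi` takes the plot-level GA and GGA fractions directly.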
Plots in the acquired imagery were manually identified and cropped individually into 72 × 408 pixel segments using the GIMP image editor [36]. These images were run through the Breedpix Maize scanner program [31] in ImageJ [37]. The means for each variable were calculated. Raw data were run through SAS [30] using the Meta Macro [38]. This was used to calculate heritability, defined as the proportion of the phenotypic or trait variability that is due to genetic causes [39], and the trait phenotypic correlations between years. The arithmetic means by family of these variables were then entered into PAST (PAleontological STatistics) software [40] for Pearson correlations. SAS Proc GLMSELECT [30] was used to identify the best multiple regression model for plant weight estimation using stepwise regression with Schwarz's Bayesian Criterion (SBC) selection and an adjusted R² as a stopping criterion. For plant cane and first ratoon models, the set of selected variables was manually modified to add the spectral variables that contributed the most to the regression model and to drop variables with high cross correlation and a Variance Inflation Factor (VIF) greater than 10, as estimated using Proc Reg [30]. A VIF greater than 10 is an indicator that multicollinearity is high [30].
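To make the VIF screening step concrete, the following sketch computes the VIF of each predictor as 1/(1 − R²), where R² comes from regressing that predictor on all the others, and then drops predictors until none exceeds the threshold of 10. It is an automated simplification written with NumPy, not the SAS Proc Reg workflow or the manual variable selection used in this study, and the function names are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = plots, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def drop_collinear(X, names, threshold=10.0):
    """Repeatedly drop the predictor with the largest VIF above the threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = variance_inflation_factors(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names
```

Dropping one variable at a time matters because removing a single collinear predictor usually lowers the VIFs of the variables it was correlated with.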

Family Selection Yield Performance
To compare the family plot yield within and between ratoons, a mixed model analysis was performed and all fixed effects in the model were significant, including family, crop year, and crop year × family interactions (Table 2). The strongest effect was crop year, which had a much higher F value than the other effects, and the family × crop year effect was smaller than the family effect. This shows the large effect of crop year on yield. Even though the interaction of crop year and family was the smallest effect, it was significant, indicating that crop year affects the yield of genotypes differently. This makes the accurate selection of ratoon performance based on plant cane performance difficult. The highest yielding family in the plant cane was also one of the highest in the second ratoon: family CPX14-0705, whose parents were HoCP 04-852 and Ho 12-630 (Table 1). The family CPX15-0167, whose parents were Ho 13-756 and L 09-099, was one of the highest yielding in the plant cane but had moderate yield in the second ratoon. The family CPX15-0105 was among the lowest yielding in the plant cane but was not significantly different from the highest in the second ratoon (Table 1). This indicates that selecting the highest yielding families in the plant cane will not necessarily select the highest yielding second ratoon family, which is consistent with the findings of [41]. In the plant cane, the highest yielding family was not significantly different from the check cultivar HoCP 96-540 and 15 other families (Table 1). Of these, several parents were repeated multiple times, including HoCP 04-852 (4 times), HoCP 04-838 [42] (3), and HoCP 01-517 (3). Among these, HoCP 04-852 was crossed with HoCP 04-838 and HoCP 01-517 one time each, and these families were among the highest group in the second ratoon as well (Table 1).
In the second ratoon, 14 families were not significantly different from the check, and these included some of the same commonly repeated parents of the high yielding plant cane families, including HoCP 04-852 (5 times), HoCP 04-838 (3), and HoCP 01-517 (4). Between the plant cane and the second ratoon, there were 11 families in common. This indicates that the selection for cane yield in the plant cane will select and identify some high-yielding families in the second ratoon, but it will also include some low-yielding families in the second ratoon and discard some potentially valuable crosses. If family selections were made in the plant cane, five poor ratooning families would have been included and two good ratooning families would have been discarded. Among the poor yielding families, L 12-201 was used twice. L 12-201 was only used in these crosses; therefore, it is difficult to fully estimate its crossing performance, but one of the male parents, Ho 11-529, seems to have poor family performance that could have lowered the yield (Table 1). Ho 11-529 was also crossed with the good parent Ho 11-512 in family CP14-0332, which had a poor plot weight in the second ratoon. The genotype Ho 11-512 was a parent of four of the highest yielding families in the second ratoon. This indicates that Ho 11-529 produces poor progeny, even with good parents, and is not a good parent for high cane yield or ratooning ability. In Australia, the practice was to select 30 to 40 percent of the families based on plant cane weight [2]. However, if this were performed with the families in this study based on plant cane yield, 10 crosses would have been selected, but these would have included three lower yielding second ratoon crosses and discarded six high yielding second ratoon families, including the highest yielding family in the second ratoon. There was only a 0.36 correlation for family plot weight between the plant cane and the second ratoon.
This correlation is similar to that found by Skinner [43], who also found a low phenotypic correlation (0.38) for cane weight between three row seedlings in plant cane and a first ratoon evaluation trial in Australia.
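The comparison of early and late selections can be expressed as a short routine. The sketch below ranks families by a fixed top fraction in each crop and counts the families that an early selection would keep, wrongly include, or wrongly discard. It is a simplified stand-in: several of the counts reported above rest on mean-separation groupings rather than a fixed fraction, and the function and key names are hypothetical.

```python
import numpy as np


def top_fraction(values, fraction):
    """Return the index set of the highest `fraction` of the values."""
    values = np.asarray(values, dtype=float)
    k = max(1, int(round(fraction * values.size)))
    order = np.argsort(-values, kind="stable")  # descending order
    return set(order[:k].tolist())


def selection_agreement(early, late, fraction=0.4):
    """Compare families selected on an early crop with the best in a later crop."""
    early_sel = top_fraction(early, fraction)
    late_sel = top_fraction(late, fraction)
    return {
        "pearson_r": float(np.corrcoef(early, late)[0, 1]),
        "selected": len(early_sel),
        "kept_good": len(early_sel & late_sel),
        "included_poor": len(early_sel - late_sel),
        "discarded_good": len(late_sel - early_sel),
    }
```

Here `early` and `late` would hold family mean plot weights for the plant cane and the second ratoon, in the same family order.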

Family Selection Based on Remote Sensing
There were several RGB indices that had moderate correlations between the plant cane and the second ratoon; these included u*, TGI, and CSI measured in plant cane (Tables 3 and 4). These also had better correlations with the second ratoon plot weight than the plant cane plot weight. In the first ratoon, the indices b*, v*, lightness, intensity, hue, GGA, and NGRDI had high correlations between years and high correlations with the second ratoon plot weight (Tables 4 and 5). The best second ratoon indices that had good correlations to cane yield and high heritability were a*, u*, v*, lightness, intensity, and TGI (Tables 4 and 5). These indices also had high heritability in plant cane. Our results indicate that there was little relationship between the plot weights of the plant cane and the second ratoon at this location. The heritability of plot weight was low in the plant cane (0.20) but much higher in the second ratoon (0.65). This indicates that selection would be better if made in the second ratoon. This conclusion is further supported by our finding of more significant differences in cane yield among crosses in the second ratoon compared with the plant cane (Table 1). However, due to land constraints, the Houma sugarcane breeding program is not currently able to select crosses in the second ratoon. One possibility would be to plant test crosses to determine yield performance that could be estimated by using remote sensing methods. Casadesús and Villegas [31] also found positive significant correlations between hue, a*, u*, GA, and GGA and dry biomass in barley, wheat, and triticale, which they attributed to canopy coverage and color. The indices GA and GGA relate to green biomass and CSI relates to senescing leaves [44]; since GA and CSI correlated with cane yield, the number of green leaves appears to be a factor in yield prediction.
Similarly, because the NGRDI, which is related to vegetation fraction (the area covered with vegetation), was correlated with cane plot weight, vegetation fraction is also related to plot yield. Overall, our results indicate that greenness, a possible indication of plant health, and vegetation and canopy coverage are the important traits being visualized by remote sensing and being correlated with biomass.
To take into account the potential benefits of combining indices to improve yield predictions, multiple regression was utilized to combine independent factors. Most variables had large significant cross correlations with each other; therefore, only the variables with low variance inflation factors (<10) were included. If only the variables from the plant cane were utilized to predict the second ratoon cane yield, the best model in 2016 included hue, lightness, b*, and family (Table 6). These spectral variables also had high heritabilities and gave an R² of 0.46, which is better than the R² of 0.13 for the plant cane harvested plot weight. The best fitting variables with low variance inflation factors from the first ratoon were family, hue, and the standard deviation for TGI. This model had a better R² (0.58) with the second ratoon plot weight than any of the plant cane variables. The best fitting model using the second ratoon variables included intensity, CSI, saturation, and GGA along with family, and produced a higher R² (0.82) with the second ratoon plot weight. A higher correlation utilizing the second ratoon traits is to be expected, as these were measured during the same harvest season. The most useful regression equation that combined the variables from every year with low variance inflation factors included plant cane hue; first ratoon TGI SD; second ratoon intensity, CSI, and GGA; and family as a numerical variable. This model had an R² of 0.88 for predicting the second ratoon yield.
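For readers who want to reproduce this kind of model comparison outside SAS, the sketch below fits an ordinary least squares model and reports R² and adjusted R². It is an illustrative NumPy version under stated assumptions, not the Proc GLMSELECT procedure used here: the function name is hypothetical, and each column of `X` is assumed to hold one predictor (for example, a color index, or a numeric family code as in the combined model).

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + X @ b by least squares; return (coef, R^2, adjusted R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Prepend a column of ones so coef[0] is the intercept.
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    ss_res = float(((y - fitted) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes the number of predictors p.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef, r2, adj_r2
```

Fitting the same response with different predictor sets (plant cane only, first ratoon only, or all years combined) and comparing the adjusted R² values mirrors the comparison summarized in Table 6.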
The TGI, CSI, and GGA indices were important to most models, and they can be related to specific traits of the plant. The GGA index is related to the proportion of green pixels in the image, excluding the yellow pixels (which correspond to senescing leaves). The CSI is also correlated with crop senescence and indicates that a high proportion of green leaves that are not senescing is related to high biomass or cane weight. The TGI is only affected by chlorophyll content, which indicates that chlorophyll content affects cane weight. As the spectral indices measure many different spectral and physiological traits and have higher heritability, they were able to predict the second ratoon plot weight better than the plant cane plot weight, a trait that also had very low heritability in comparison to other plant cane traits. The R² of 0.88 indicates that an accurate selection for cane yield could be made using spectral variables without the need to collect plot weights. Our results indicate that spectral indices more accurately predicted the second ratoon family yield than the plant cane weighed plots and could predict better ratooning families. Therefore, the spectral indices we identified could save resources by reducing the number of years required to predict second-ratoon cane yields and to predict families with high ratoon cane yields. Since remote sensing imagery is non-destructive, it could save the time and labor of harvesting the field and could be performed immediately before selection.

Conclusions
Currently, family selection based on plant cane weighed plot data is not accurate for selecting second ratoon family cane yield. Resource limitations also prevent the collection of additional data that could potentially increase the overall accuracy of this method. In this study we utilized a UAV to collect RGB imagery from breeding plots to determine if it could be used to reduce resources and improve our selection accuracy in sugarcane family selection. Several spectral variables correlated to seedling cane weight and varied by ratoon, with some important ones being hue, saturation, intensity, GGA, CSI, and the standard deviations for TGI and NGRDI. The color indices of the CIELab color space model calculated from RGB remote sensing imagery improved the accuracy in the prediction of second ratoon seedling family plot weights. These results indicate that the efficiency and accuracy of sugarcane family selection could be increased by incorporating remote imagery acquired with a UAV. Future studies will be conducted to see if seedling specific traits, such as stalk diameter and height and subjective ratings, correlate with aerial spectral indices.