Multi-Temporal Predictive Modelling of Sorghum Biomass Using UAV-Based Hyperspectral and LiDAR Data

: High-throughput phenotyping using high spatial, spectral, and temporal resolution remote sensing (RS) data has become a critical part of the plant breeding chain focused on reducing the time and cost of the selection process for the “best” genotypes with respect to the trait(s) of interest. In this paper, the potential of accurate and reliable sorghum biomass prediction using visible and near infrared (VNIR) and short-wave infrared (SWIR) hyperspectral data as well as light detection and ranging (LiDAR) data acquired by sensors mounted on UAV platforms is investigated. Predictive models are developed using classical regression-based machine learning methods for nine experiments conducted during the 2017 and 2018 growing seasons at the Agronomy Center for Research and Education (ACRE) at Purdue University, Indiana, USA. The impact of the regression method, data source, timing of RS and ﬁeld-based biomass reference data acquisition, and the number of samples on the prediction results are investigated. R 2 values for end-of-season biomass ranged from 0.64 to 0.89 for di ﬀ erent experiments when features from all the data sources were included. Geometry-based features derived from the LiDAR point cloud to characterize plant structure and chemistry-based features extracted from hyperspectral data provided the most accurate predictions. Evaluation of the impact of the time of data acquisition during the growing season on the prediction results indicated that although the most accurate and reliable predictions of ﬁnal biomass were achieved using remotely sensed data from mid-season to end-of-season, predictions in mid-season provided adequate results to di ﬀ erentiate between promising varieties for selection. The analysis of variance (ANOVA) of the accuracies of the predictive models showed that both the data source and regression method are important factors for a reliable prediction; however, the data source was more important with 69% signiﬁcance, versus 28% signiﬁcance for the regression method.


Introduction
Biomass yield is an important trait of biofuel crops such as sorghum, as it is a key factor in determining the amount of biofuel that can be produced. With recent advances in science and technology surrounding genotyping, it has become possible to create numerous genotypes of a plant [1] and then select the genotypes with the maximum biomass production. However, traditional methods of biomass measurement involving labor-intensive and time-consuming destructive sampling do not meet the requirements for timely evaluation of the genotypes in large-scale breeding programs. Recently, remote sensing (RS) data have been explored for estimation of many phenotypic traits,

Ground Reference Data
Details of the experiment trials including sowing date, which differed by approximately one week, and harvest dates are provided in Table 1. For the HyCal-17 and HyCal-18 panels, biomass data were destructively collected multiple times during each growing season. For all other experiments, the biomass data were collected only once at the end of each growing season using a two-row combine harvester. The weight of the shredded plant material of each plot was considered as the fresh biomass value for that plot. After harvesting, around 500 g of the shredded plant material was used to determine the moisture content of each plot by measuring the fresh weight and dry weight (after drying the plant materials).

Ground Reference Data
Details of the experiment trials including sowing date, which differed by approximately one week, and harvest dates are provided in Table 1. For the HyCal-17 and HyCal-18 panels, biomass data were destructively collected multiple times during each growing season. For all other experiments, the biomass data were collected only once at the end of each growing season using a two-row combine harvester. The weight of the shredded plant material of each plot was considered as the fresh biomass value for that plot. After harvesting, around 500 g of the shredded plant material was used to determine the moisture content of each plot by measuring the fresh weight and dry weight (after drying the plant materials).  HyCal-18 panels, respectively, during the growing seasons. In 2017, the biomass data on 27 June, 17 July, and 7 August were collected by hand harvesting one meter sections of three rows from each plot. Harvesting for all other dates was performed with a two-row combine harvester. These figures indicate that the genotypes have similar biomass early in the season but differ at the end of the season. Figure 2c shows the distribution of the end of the season biomass of the nine trials over both years:  Figure 2a,b shows the fresh biomass distribution of the HyCal-17 and HyCal-18 panels, respectively, during the growing seasons. In 2017, the biomass data on June 27th, July 17th, and August 7th were collected by hand harvesting one meter sections of three rows from each plot. Harvesting for all other dates was performed with a two-row combine harvester. These figures indicate that the genotypes have similar biomass early in the season but differ at the end of the season. Figure 2c shows the distribution of the end of the season biomass of the nine trials over both years: 1) the InCal-17 and SbDiv-17 are similar, 2) the HyCal-17 and HyCal- 18

Remote Sensing Data
This study includes RGB, hyperspectral, and LiDAR remote sensing data collected by custom designed UAV platforms. All remote sensing data acquisition platforms were flown with global navigation satellite system/inertial navigation system (GNSS/INS) units for direct georeferencing. The description of the sensors used in this study is provided in Table 2. RGB data for this study were

Remote Sensing Data
This study includes RGB, hyperspectral, and LiDAR remote sensing data collected by custom designed UAV platforms. All remote sensing data acquisition platforms were flown with global navigation satellite system/inertial navigation system (GNSS/INS) units for direct georeferencing. The description of the sensors used in this study is provided in Table 2. RGB data for this study were collected using a Sony Alpha ILCE-7R RGB camera delivering high-resolution UAV-based aerial imagery. LiDAR data were collected with a Velodyne VLP-16 3D LiDAR sensor operating in the strongest return mode providing an average point cloud density of 750 points per m 2 . Both the RGB Remote Sens. 2020, 12, 3587 6 of 35 camera and the VLP-16 sensor are mounted on a DJI Matrice 600 Pro (M600P) platform. Spatial and temporal system calibration for the datasets used in this study were conducted using the approaches described in [45] and [46], respectively. Additionally, the georeferenced orthomosaics were generated using the structure from motion strategies introduced in [47,48]. Visible near infrared (VNIR) and short wave infrared (SWIR) hyperspectral data were collected with two Headwall Photonics push-broom scanners. In 2017, the VNIR sensor was flown at an altitude of 60 m with a 12 mm Schneider lens, resulting in a ground sampling distance (GSD) of~4 cm. An 8 mm lens was used in 2018, and the flying height was 40 m to maintain the GSD at~4 cm and accommodate the field of view of other sensors on the platform. In both years, the SWIR sensor was flown with a 25 mm lens at 40 m, resulting in approximately a 4 cm GSD. In 2018, the VNIR and SWIR sensors were integrated and flown together on a single UAV platform. A rigorous boresight calibration process described by Habib et al. [49] was applied, yielding simultaneously collected co-aligned VNIR and SWIR data. Similar to [50], all the hyperspectral data were converted to reflectance using the empirical line method to relate the spectra collected from the UAV to data acquired by an SVC 1024i field spectrometer over the calibration targets placed in the field for each acquisition. The data acquired by the sensors in both 2017 and 2018 are listed in Table A2 in Appendix A.

Feature Extraction
As discussed earlier, it is important to extract features that are related to the specific trait of interest and are preferably not redundant. In this study, both traditional and new candidate features focused on the relevance to biomass prediction were extracted from rows 2 and 3 of the 4-or 12-row plots to minimize the border effect. For each acquired data set (listed in Table A2 in Appendix A), we extracted the features described in the following sections.

•
Spectral Reflectance From the Hyperspectral Imaging (HSI) data, the average reflectance values of the plots were calculated from rows 2 and 3 of each plot after masking the shadow and soil pixels.

• Vegetation Indices
Vegetation Indices (VIs) obtained from HSI data have been widely used in different applications, as they are computationally simple and representative of the relevant chemically interpretable absorption and reflectance features in the spectrum. In this study, 13 vegetation indices, listed in Table A3 in Appendix A were extracted and used in the predictive models.  NIR bands are particularly important for representing plant physiology but are subject to the time  during the growing season and environmental conditions. The area under the spectral curve for a given range from λ a to λ b is defined as Intg(λ a , λ b ) = λ b λ a S(λ)dλ, where S(λ) is the reflectance at λ nm. Using different ranges of spectral values, six features were extracted from each HSI spectrum as listed in Table A4 in Appendix A. •

Derivative Features
The spectral derivatives, which quantify slope, curvature, and higher-order aspects of reflectance spectra, can be useful by revealing spectral features that may not be apparent in reflectance data alone [51]. For example, the "red-edge" position (between 680 nm and 750 nm) in crop reflectance data can be easily identified in the derivative spectra, and has been related to crop biomass [52]. Feng et al. analyzed 20 spectral derivative features near the red edge area to estimate wheat leaf nitrogen concentration [53]. In this study, the polished spectra were calculated using a Savitzky-Golay filter [54], then the first derivative (FDR) and second derivative (SDR) of the spectra were extracted. From FDR and SDR, 11 features were extracted and used in the additional analysis as described in Table A5 in Appendix A. These features were selected at wavelengths where spectra of the varieties differed and were also uncorrelated.

LiDAR-Based Features
The 3D structural characteristics of the plants in a plot can be described using various features extracted from LiDAR data. The digital terrain model (DTM) was derived from LiDAR data acquired before the emergence of the plants in each field by interpolating the LiDAR point cloud into a regular grid (8 × 8 cm in this study) using the nearest neighbour interpolation method. The DTM represents the bare earth height information and is assumed to be constant throughout the growing season. For each point cloud data acquired throughout the season, the height of points in the was estimated by subtracting the DTM from the "z" coordinate of each point. Then, the following features were extracted from the point cloud of each plot: •

Height Percentile
To capture the vertical distribution of the LiDAR points in each plot, the 30th, 50th, 75th, 90th, and 95th percentile height values from the point cloud of each plot were calculated.

Canopy Volume
To estimate volume related characteristics of the canopy in each plot, a grid with cells of size 8 × 8 cm was assigned to each plot, and then the associated height was calculated from the points located in each cell, multiplied by the size of the cell to estimate the volume of the canopy within each cell. The aggregate "volume" in each plot is referred to as the volume of the vegetation within a plot. The height of each cell in this study was calculated as the average of the height of the lowest point and the height of the highest point in each cell.

Canopy Cover
Canopy cover can be estimated from LiDAR data as the ratio of above-ground points (or, canopy points) to the total number of LiDAR points in a given area. The following approach was used in this study for canopy cover estimation for each plot. First, the field is divided into grid cells of a user-defined dimensions (8 × 8 cm in this study, consistent with the canopy volume calculation). Then, for each grid cell, the LiDAR points are split into two groups, canopy points and bare earth based on their height using a user-defined threshold. The points above the threshold are considered as Remote Sens. 2020, 12, 3587 8 of 35 canopy points. The canopy cover is estimated as the ratio of the number of canopy points to the total number of LiDAR points in each cell. The average of the canopy cover estimated for the cells located in each plot is assigned as the canopy cover for that plot. In this study, candidate threshold values were 0.1, 0.2, 0.3, 0.4, 0.5, and 0.75 multiplied by the 95th percentile height of each plot, resulting in six height-dependent canopy cover related features. •

Height Statistics
The spatial distribution of the height of the LiDAR points in each plot can also be represented using statistical moments of the distribution. These statistics can also be included as candidate input features for predictive models.

Regression-Based Modeling Approaches
Common regression-based approaches such as partial least squares regression (PLSR), support vector regression (SVR), and random forests (RF) are widely utilized to build predictive models with remote sensing based inputs. PLSR reduces a potentially large number of measured collinear input variables to a few uncorrelated latent variables while seeking to explain the maximum multi-dimensional variance of the dependent variable via a linear model. PLSR has been investigated for developing predictive models to estimate leaf biochemical and biophysical properties [55], chlorophyll content [56], carotenoid content [57], relative water content [58], protein, lignin, and cellulose [59], leaf nitrogen content [60], LAI [61], and biomass [34].
SVR is a supervised non-parametric regression technique, and therefore, no assumptions regarding the underlying data model are required. The original feature space is transformed into a higher dimensional space [62], with the goal of finding a hyperplane to predict the training data set. The optimal values of the kernel function parameters were obtained in this study by a general k-fold cross-validation in a grid search.
Random Forest (RF) modeling is an ensemble learning technique which uses a large set of classification and regression trees (CART) to predict the variable of interest [63]. In random forest regression, each tree is built by randomly choosing a set of variables and a subset of training samples with replacement. The selected samples are used for training, and the remaining observations are used in an internal cross-validation process to determine the performance of the RF model. The outputs of all trees are aggregated to produce a final prediction. A review of RF modeling in remote sensing applications is available in [64]. Similar to SVR, the parameter optimization was accomplished by a general k-fold cross-validation in a grid search method. A grid search method was used to select the best hyperparameters for each model. Table A6 lists the candidate parameters that were tested for each method in the grid search process. The Anaconda Distribution of Python version 3.7 with the Scikit-learn library [65] was used for conducting grid search and developing regression models.

Statistical Analysis
The one-way analysis of variance (ANOVA) is used to determine whether there is a significant difference among the groups of data. If there is a significant difference, then an honest significant difference (HSD) Tukey test (with α = 0.05) can be conducted to determine which groups are significantly different from each other. We also use two-way ANOVA, to evaluate combinations of several variables or factors to identify those that have a significant effect on the estimates [66]. Prior to these statistical tests, the normality and homoscedasticity were confirmed by visually inspecting the variables. The statsmodels library [67] was used for data preparation and statistical analysis.

Time Series of Biomass Data
In the 2017 and 2018 growing seasons, the destructive biomass data were collected multiple times (approximately every month) from the hybrid calibration panels. Figure 3 shows the fresh weight and moisture content of the 18 sorghum varieties planted in the HyCal-17 and HyCal-18 experiments. The moisture content increases at the beginning of the season until it reaches its maximum at 50 to 60 days after sowing. The fresh weight of the plants also increases rapidly at the beginning of the season, while at the end of the season, it decreases as the plants senesce. Among the varieties shown in Figure 3, those that are photoperiod sensitive ("Sordan Headless" and "Trudan Headless") did not flower in the environment in which these experiments were conducted, and continued to add plant material until the end of the season.
Remote Sens. 2020, 12, x 9 of 35 Headless") did not flower in the environment in which these experiments were conducted, and continued to add plant material until the end of the season. Each variety has four replicates in the hybrid Calibration panels, providing adequate sample data to compare the relationship across the varieties and associated changes during the growing season. For each date that the biomass data were collected, an ANOVA test was conducted on the fresh biomass, and if it indicated variability among the varieties was highly significant, a Tukey's multi-comparison test was performed. Figure 4 shows the results of Tukey's pairwise multicomparison test (α = 0.05) for the fresh biomass data collected in the 2017 and 2018 growing seasons. In general, at the beginning of each season, only a few varieties were significantly different, while the variability among the varieties at the end of each growing season was greater. From Figure 4 it is also clear that the two photoperiod sensitive varieties (varieties 16 and 18) were significantly different from the other varieties at the end of both 2017 and 2018 growing seasons.  Each variety has four replicates in the hybrid Calibration panels, providing adequate sample data to compare the relationship across the varieties and associated changes during the growing season. For each date that the biomass data were collected, an ANOVA test was conducted on the fresh biomass, and if it indicated variability among the varieties was highly significant, a Tukey's multi-comparison test was performed. Figure 4 shows the results of Tukey's pairwise multi-comparison test (α = 0.05) for the fresh biomass data collected in the 2017 and 2018 growing seasons. In general, at the beginning of each season, only a few varieties were significantly different, while the variability among the varieties at the end of each growing season was greater. From Figure 4 it is also clear that the two photoperiod sensitive varieties (varieties 16 and 18) were significantly different from the other varieties at the end of both 2017 and 2018 growing seasons.   Figure A4 in Appendix B for the variance plot). On that day, the signatures of the 18 varieties were very similar in the visible range of the spectrum, but there was more variability in the NIR portion of spectrum that may reflect variation in biochemical features, including lignin type and composition associated with the brown-midrib (bmr) traits. Figure 6 shows the reflectance of one of the varieties from June to September which shows there is very little change in the visible range of the spectrum, and especially in the blue and green bands, while the reflectance in NIR bands changes from about 35% to 60% on average (see Figure A5 in Appendix B for the variance plot). The maximum reflectance values were observed in the range of 750-850 nm on 3 July (DAS = 56). One of the reasons for changes in the reflectance for sorghum is the appearance of the panicles, which emerge a few days before the flowering date (the date on which 50% of panicles in a plot are flowered). Field notes indicate that the flowering date for "Trudan 8" was 10 July 2018.

Time Series of Remote Sensing Data
RS spectral signatures varied both across phenotypes and during the growing season. Figure 5 shows the average spectra of the 18 varieties of sorghum in the HyCal-18 panel on 18 July 2018 (day after sowing (DAS) = 71) (see Figure A4 in Appendix B for the variance plot). On that day, the signatures of the 18 varieties were very similar in the visible range of the spectrum, but there was more variability in the NIR portion of spectrum that may reflect variation in biochemical features, including lignin type and composition associated with the brown-midrib (bmr) traits. Figure 6 shows the reflectance of one of the varieties from June to September which shows there is very little change in the visible range of the spectrum, and especially in the blue and green bands, while the reflectance in NIR bands changes from about 35% to 60% on average (see Figure A5 in Appendix B for the variance plot). The maximum reflectance values were observed in the range of 750-850 nm on 3 July (DAS = 56). One of the reasons for changes in the reflectance for sorghum is the appearance of the panicles, which emerge a few days before the flowering date (the date on which 50% of panicles in a plot are flowered). Field notes indicate that the flowering date for "Trudan 8" was 10 July 2018. Figure 7 shows the Digital Surface Model (DSM) generated from the LiDAR point cloud for the SbDivTc-18 panel from multiple dates in the 2018 growing season. From Figure 7, the plots are more similar at the beginning of the season, with greater differences later in the season. Figure 8 shows the point cloud data for two plots (rows 2 and 3) of the HyCal-18 experiment for multiple dates in the 2018 growing season. "341 × 10" is a dwarf grain sorghum variety (Figure 3), while "Trudan Headless" is a photoperiod sensitive forage sorghum with high biomass accumulation.
in the visible range of the spectrum, and especially in the blue and green bands, while the reflectance in NIR bands changes from about 35% to 60% on average (see Figure A5 in Appendix B for the variance plot). The maximum reflectance values were observed in the range of 750-850 nm on 3 July (DAS = 56). One of the reasons for changes in the reflectance for sorghum is the appearance of the panicles, which emerge a few days before the flowering date (the date on which 50% of panicles in a plot are flowered). Field notes indicate that the flowering date for "Trudan 8" was 10 July 2018.       SbBAP-17 experiment also has the maximum average height at the end of the season, as it includes many plots of photoperiod sensitive genotypes that do not flower. The InCal-17, InCal-18, and SbDiv-17 include similar inbred varieties; thus, they have a very similar average height (also the lowest height values among the experiments). As was noted earlier, a histogram of the height of the points from the LiDAR point cloud provides information about the distribution of different height values in a plot. This genotype dependent information may be discriminating in predictive models. Figure A3 in Appendix B shows the histograms for the dwarf grain sorghum 341x10 and the photoperiod sensitive Trudan Headless varieties in the HyCal-17 experiment.
The multi-temporal and cross-correlations during the growing season can be useful for screening for redundant features. Figure 10 shows the correlation matrices for the OSAVI and LiDAR-based canopy cover calculated using the combined data acquired over the HyCal-17 experiment in the 2017 growing season. It illustrates the rapid changes at the beginning of the season, especially prior to the flowering time (second week of July for this experiment) which is associated with the rapid growth of the plants. From Figure 10, OSAVI changed more than the canopy cover during the early season, and end of season OSAVI values have lower inter-temporal correlation compared to the correlation between the canopy cover values on corresponding dates.

Average height
Standard deviation of height  This genotype dependent information may be discriminating in predictive models. Figure A3 in Appendix B shows the histograms for the dwarf grain sorghum 341 × 10 and the photoperiod sensitive Trudan Headless varieties in the HyCal-17 experiment.
Remote Sens. 2020, 12, x 12 of 35 "341 × 10" "Trudan Headless" Figure 8. Point cloud data for rows two and three of two plots of the HyCal-18 experiment for multiple dates in the 2018 growing season. "341 × 10" and "Trudan Headless" achieve maximum height of 1.4 m and 3.2 m, respectively. Figure 9 shows the mean and standard deviation of the height of all the plots for all the experiments in the 2017 and 2018 growing seasons extracted from the LiDAR point clouds, providing structural characteristics of the varieties planted in each experiment. For both HyCal-17 and HyCal-18, the height increases as the headless varieties continue to grow until the end of the season. The SbBAP-17 experiment also has the maximum average height at the end of the season, as it includes many plots of photoperiod sensitive genotypes that do not flower. The InCal-17, InCal-18, and SbDiv-17 include similar inbred varieties; thus, they have a very similar average height (also the lowest height values among the experiments). As was noted earlier, a histogram of the height of the points from the LiDAR point cloud provides information about the distribution of different height values in a plot. This genotype dependent information may be discriminating in predictive models. Figure A3 in Appendix B shows the histograms for the dwarf grain sorghum 341x10 and the photoperiod sensitive Trudan Headless varieties in the HyCal-17 experiment.
The multi-temporal and cross-correlations during the growing season can be useful for screening for redundant features. Figure 10 shows the correlation matrices for the OSAVI and LiDAR-based canopy cover calculated using the combined data acquired over the HyCal-17 experiment in the 2017 growing season. It illustrates the rapid changes at the beginning of the season, especially prior to the flowering time (second week of July for this experiment) which is associated with the rapid growth of the plants. From Figure 10, OSAVI changed more than the canopy cover during the early season, and end of season OSAVI values have lower inter-temporal correlation compared to the correlation between the canopy cover values on corresponding dates.

Average height
Standard deviation of height Similar to the last section, Figure 11 and Figure 12 show the results of the multiple-comparison Tukey's test for the OSAVI and volume features for the HyCal-17 and HyCal-18 experiments on multiple dates during the 2017 and 2018 growing seasons. These results are consistent with the results of Tukey's test conducted on biomass data in the previous section, which greater variability among the varieties at the end of each growing season. The multi-temporal and cross-correlations during the growing season can be useful for screening for redundant features. Figure 10 shows the correlation matrices for the OSAVI and LiDAR-based canopy cover calculated using the combined data acquired over the HyCal-17 experiment in the 2017 growing season. It illustrates the rapid changes at the beginning of the season, especially prior to the flowering time (second week of July for this experiment) which is associated with the rapid growth of the plants. From Figure 10, OSAVI changed more than the canopy cover during the early season, and end of season OSAVI values have lower inter-temporal correlation compared to the correlation between the canopy cover values on corresponding dates.

Relationship Between Features and Biomass
In this section, the relationship between the biomass data and RS features, as well as the change in their relationship during the growing season, is discussed. Figure 13 shows one feature from

Relationship Between Features and Biomass
In this section, the relationship between the biomass data and RS features, as well as the change in their relationship during the growing season, is discussed. Figure 13 shows one feature from

Relationship Between Features and Biomass
In this section, the relationship between the biomass data and RS features, as well as the change in their relationship during the growing season, is discussed. Figure 13 shows one feature from

Relationship between Features and Biomass
In this section, the relationship between the biomass data and RS features, as well as the change in their relationship during the growing season, is discussed. Figure 13 shows one feature from LiDAR (90th percentile height of the plants), one feature from hyperspectral data (Intg_NIR1), and the biomass data for the dwarf grain sorghum 341 × 10 and the photoperiod sensitive Trudan Headless varieties, both from the HyCal-17 and HyCal-18 experiments (one with low and one with high biomass values). To compare the data from the two years at the same stage of growth, the data are plotted versus growing degree day (GDD), a heat index calculated from temperature data for each day [68]. At GDD of 2100, the biomass in 2018 for both varieties was slightly higher than in 2017 (as noted in Table 1, the HyCal-18 was planted two weeks earlier than HyCal-17). The height data, Intg_NIR1, and biomass data follow the same pattern of change over time for each variety in both growing seasons. The height for the photoperiod sensitive variety ("Sordan Headless") always increases, while other varieties stop growing around flowering time (GDD of 1500); the Intg_NIR1 increases rapidly earlier in the season, and then gradually decreases at around GDD of 1500 until the end of the season; the biomass continues to increase, and especially for the photoperiod sensitive variety. Inter-annual differences also inherently include the impact of the timing and quantity of rainfall.
Remote Sens. 2020, 12, x 14 of 35 LiDAR (90th percentile height of the plants), one feature from hyperspectral data (Intg_NIR1), and the biomass data for the dwarf grain sorghum 341x10 and the photoperiod sensitive Trudan Headless varieties, both from the HyCal-17 and HyCal-18 experiments (one with low and one with high biomass values). To compare the data from the two years at the same stage of growth, the data are plotted versus growing degree day (GDD), a heat index calculated from temperature data for each day [68]. At GDD of 2100, the biomass in 2018 for both varieties was slightly higher than in 2017 (as noted in Table 1, the HyCal-18 was planted two weeks earlier than HyCal-17). The height data, Intg_NIR1, and biomass data follow the same pattern of change over time for each variety in both growing seasons. The height for the photoperiod sensitive variety ("Sordan Headless") always increases, while other varieties stop growing around flowering time (GDD of 1500); the Intg_NIR1 increases rapidly earlier in the season, and then gradually decreases at around GDD of 1500 until the end of the season; the biomass continues to increase, and especially for the photoperiod sensitive variety. Inter-annual differences also inherently include the impact of the timing and quantity of rainfall. For each feature extracted from the RS data, the simple prediction potential (R 2 ) and associated changes during a season were investigated using linear regression-based models for the end of season biomass prediction. Robust features should be applicable across the varieties, at least for common experiments. Figure 14 and Figure 15 show the R 2 values of the models for each feature extracted from LiDAR and hyperspectral VNIR data, at four stages of growth and for all nine experiments conducted in the 2017 and 2018 growing seasons. From Figure 14, the 30th percentile height and volume features provided the highest R 2 values for predicting the end of season biomass among the LiDAR-based features for both HyCal-17 and HyCal-18 experiments, as the varieties in those experiments were more diverse in their structural characteristics, providing strong potential for biomass prediction using geometric-related features. The R 2 values for different LiDAR-based features in the InCal-17, InCal-18, and SbDiv-17 experiments are very similar, which is consistent with Figure 9; they all have the lowest average height and lowest variability in height compared to the other experiments, resulting in lower R 2 values for these experiments compared to the experiments with hybrid cultivars. For both InCal-17 and InCal-18 experiments, the highest R 2 values were obtained from feature #5 (coefficient of variation of height), which is representative of the distribution of the points in the canopy point cloud. The R 2 values of the models developed for the SbDivTc-18 and InCalTc-18 are lower than the HyCal-17 and HyCal-18 experiments, but the same features (features #1, #2, and #5) provided the maximum R 2 for all of these experiments, which include hybrid cultivars. The SbBAP-17 also includes hybrid cultivars; however, the R 2 values for all the features are lower compared to all other experiments, mainly because the last LiDAR data were For each feature extracted from the RS data, the simple prediction potential (R 2 ) and associated changes during a season were investigated using linear regression-based models for the end of season biomass prediction. Robust features should be applicable across the varieties, at least for common experiments. Figures 14 and 15 show the R 2 values of the models for each feature extracted from LiDAR and hyperspectral VNIR data, at four stages of growth and for all nine experiments conducted in the 2017 and 2018 growing seasons. From Figure 14, the 30th percentile height and volume features provided the highest R 2 values for predicting the end of season biomass among the LiDAR-based features for both HyCal-17 and HyCal-18 experiments, as the varieties in those experiments were more diverse in their structural characteristics, providing strong potential for biomass prediction using geometric-related features. The R 2 values for different LiDAR-based features in the InCal-17, InCal-18, and SbDiv-17 experiments are very similar, which is consistent with Figure 9; they all have the lowest average height and lowest variability in height compared to the other experiments, resulting in lower R 2 values for these experiments compared to the experiments with hybrid cultivars. For both InCal-17 and InCal-18 experiments, the highest R 2 values were obtained from feature #5 (coefficient of variation of height), which is representative of the distribution of the points in the canopy point cloud. The R 2 values of the models developed for the SbDivTc-18 and InCalTc-18 are lower than the HyCal-17 and HyCal-18 experiments, but the same features (features #1, #2, and #5) provided the maximum R 2 for all of these experiments, which include hybrid cultivars. The SbBAP-17 also includes hybrid cultivars; however, the R 2 values for all the features are lower compared to all other experiments, mainly because the last LiDAR data were collected on August 30th, and included many photoperiod sensitive cultivars which grew until the final biomass data were collected at the harvest (28 September). Other varieties did not grow during this time, which impacted the biomass-height relationships. Generally, for the experiments with the hybrid cultivars (refer to Table 1), the late season data sets provided the highest R 2 , while for the experiments that included inbred cultivars, the data sets of GDDs yielded the lowest R 2 values.
Remote Sens. 2020, 12, x 15 of 35 collected on August 30th, and included many photoperiod sensitive cultivars which grew until the final biomass data were collected at the harvest (September 28th). Other varieties did not grow during this time, which impacted the biomass-height relationships. Generally, for the experiments with the hybrid cultivars (refer to Table 1), the late season data sets provided the highest R 2 , while for the experiments that included inbred cultivars, the data sets of GDDs yielded the lowest R 2 values. For the hyperspectral features shown in Figure 15, the highest R 2 values are associated with InCal-17 and InCal-18, collected on ~80 DAS, while for other experiments, the dataset of ~95 DAS yielded the highest R 2 values. Moreover, the same pattern for R 2 values for the features of the InCal-17, InCal-18, and SbDiv-17 was observed. For these panels, the R 2 is generally higher than the panels that include hybrid cultivars.  For the hyperspectral features shown in Figure 15, the highest R 2 values are associated with InCal-17 and InCal-18, collected on~80 DAS, while for other experiments, the dataset of~95 DAS yielded the highest R 2 values. Moreover, the same pattern for R 2 values for the features of the InCal-17, InCal-18, and SbDiv-17 was observed. For these panels, the R 2 is generally higher than the panels that include hybrid cultivars.
Given similar trends of the regression models shown in Figures 14 and 15, the models were developed across all the experiments for each of the features, and all the available dates to investigate the potential for using a common set of features for all experiments and times for the multiple input predictive models. The average R 2 for each feature, from all the dates and all the experiments is provided in Figure 16, which shows volume, 30th percentile height, OSAVI, FDR-min, and NDWI features had the highest average R 2 from LiDAR, VNIR, and SWIR data sets, respectively. Linear regression models were also developed for the individual band values from both hyperspectral VNIR and SWIR data. The average and maximum R 2 for each band from all the dates and all the experiments are shown in Figure 17, which shows that the area of spectrum between 750 and 1100 nm provided the highest R 2 for the linear regression models. While the R 2 values for some experiments and some dates for the bands in 2000-2300 nm are relatively high (30-60%), the average R 2 values are much lower in comparison to the 750-1100 nm range. Given similar trends of the regression models shown in Figure 14 and Figure 15, the models were developed across all the experiments for each of the features, and all the available dates to investigate the potential for using a common set of features for all experiments and times for the multiple input predictive models. The average R 2 for each feature, from all the dates and all the experiments is provided in Figure 16, which shows volume, 30 th percentile height, OSAVI, FDR-min, and NDWI features had the highest average R 2 from LiDAR, VNIR, and SWIR data sets, respectively. Linear regression models were also developed for the individual band values from both hyperspectral VNIR and SWIR data. The average and maximum R 2 for each band from all the dates and all the experiments are shown in Figure 17, which shows that the area of spectrum between 750 and 1100 nm provided the highest R 2 for the linear regression models. While the R 2 values for some experiments and some dates for the bands in 2000-2300 nm are relatively high (30-60%), the average R 2 values are much lower in comparison to the 750-1100 nm range. Given similar trends of the regression models shown in Figure 14 and Figure 15, the models were developed across all the experiments for each of the features, and all the available dates to investigate the potential for using a common set of features for all experiments and times for the multiple input predictive models. The average R 2 for each feature, from all the dates and all the experiments is provided in Figure 16, which shows volume, 30 th percentile height, OSAVI, FDR-min, and NDWI features had the highest average R 2 from LiDAR, VNIR, and SWIR data sets, respectively. Linear regression models were also developed for the individual band values from both hyperspectral VNIR and SWIR data. The average and maximum R 2 for each band from all the dates and all the experiments are shown in Figure 17, which shows that the area of spectrum between 750 and 1100 nm provided the highest R 2 for the linear regression models. While the R 2 values for some experiments and some dates for the bands in 2000-2300 nm are relatively high (30-60%), the average R 2 values are much lower in comparison to the 750-1100 nm range.

Biomass Predictive Models
In this section, the results related to the impact of different regression methods, the time of biomass sampling and remote sensing data acquisition, and the number of samples on the prediction results are provided.

Impact of the Data Source and Regression Method on the Prediction Results
To evaluate the performance of different regression-based modeling approaches, PLSR, SVR, and RF were implemented for end of season biomass prediction using the LiDAR and hyperspectral data collected in each growing season. Figure 18 shows the R 2 values of the predictions relative to the reference data for all the experiments, using six data sources (LiDAR, VNIR, SWIR, and combinations), and the three methods. For each prediction with a data source, all available data sets over the whole season were used for training and validation of the models, where two thirds of the sample data (or a maximum of 200 samples) were randomly selected 100 times for the training of the algorithm, and the remaining samples were used for cross validation via the hold-out method. For all the experiments except the SNitTs-18, all the replicates of a variety were assigned to either the training or test sets to avoid any impact from the number of replicates on the prediction results. SNitTs-18, however, included only four varieties, and a different number of replicates for each variety; thus, the training and test sets were assigned randomly from the plots regardless of their varieties for this experiment. Potential reasons for differences in predictions include: i) Diversity in the samples: the regression models are better able to learn the pattern in the data when the samples are more diverse. ii) Number of data samples: the larger the number of data points in an experiment, the higher accuracies are typically achieved for the prediction. iii) Similarity between the training and test data sets; if the training and test data sets are very different, then overfitting can occur for the training data set, resulting in decreased accuracy of the predictions. Note that this can happen when the number of data samples is limited, which causes unlike training and test sets, even when the samples are selected randomly. Additionally, if there is a significant range of biomass values in one experiment, there is more chance to have dissimilar training and test sets.

Biomass Predictive Models
In this section, the results related to the impact of different regression methods, the time of biomass sampling and remote sensing data acquisition, and the number of samples on the prediction results are provided.

Impact of the Data Source and Regression Method on the Prediction Results
To evaluate the performance of different regression-based modeling approaches, PLSR, SVR, and RF were implemented for end of season biomass prediction using the LiDAR and hyperspectral data collected in each growing season. Figure 18 shows the R 2 values of the predictions relative to the reference data for all the experiments, using six data sources (LiDAR, VNIR, SWIR, and combinations), and the three methods. For each prediction with a data source, all available data sets over the whole season were used for training and validation of the models, where two thirds of the sample data (or a maximum of 200 samples) were randomly selected 100 times for the training of the algorithm, and the remaining samples were used for cross validation via the hold-out method. For all the experiments except the SNitTs-18, all the replicates of a variety were assigned to either the training or test sets to avoid any impact from the number of replicates on the prediction results. SNitTs-18, however, included only four varieties, and a different number of replicates for each variety; thus, the training and test sets were assigned randomly from the plots regardless of their varieties for this experiment. Potential reasons for differences in predictions include: (i) Diversity in the samples: the regression models are better able to learn the pattern in the data when the samples are more diverse. (ii) Number of data samples: the larger the number of data points in an experiment, the higher accuracies are typically achieved for the prediction. (iii) Similarity between the training and test data sets; if the training and test data sets are very different, then overfitting can occur for the training data set, resulting in decreased accuracy of the predictions. Note that this can happen when the number of data samples is limited, which causes unlike training and test sets, even when the samples are selected randomly. Additionally, if there is a significant range of biomass values in one experiment, there is more chance to have dissimilar training and test sets.  Figure 18 shows that the highest accuracy of end of season biomass prediction using all combinations of data sources was achieved for the SNitTs-18 experiment. There were two nitrogen treatments in this experiment; half of the plots in this experiment were fertilized with 250 kg/ha nitrogen while the other half were not fertilized, causing high and low biomass values for the plots, high diversity in the reflectance data from the hyperspectral images, as well as high diversity in geometric-based features extracted from LiDAR point cloud (reason i). As was noted, samples were assigned to the training and test sets for this experiment differently from the other experiments, causing multiple samples of each variety to be assigned to both the training and test sets, resulting in increased similarity in the two sets (reason iii).
The highest accuracy of prediction using LiDAR features as the sole input was obtained for the HyCal-17 and HyCal-18 experiments, which include hybrid cultivars that are more diverse in structural characteristics compared to the inbred cultivars; thus, the regression model can distinguish and relate the LiDAR-based features to the biomass data (reason i), which is consistent with the  Figure 18 shows that the highest accuracy of end of season biomass prediction using all combinations of data sources was achieved for the SNitTs-18 experiment. There were two nitrogen treatments in this experiment; half of the plots in this experiment were fertilized with 250 kg/ha nitrogen while the other half were not fertilized, causing high and low biomass values for the plots, high diversity in the reflectance data from the hyperspectral images, as well as high diversity in geometric-based features extracted from LiDAR point cloud (reason i). As was noted, samples were assigned to the training and test sets for this experiment differently from the other experiments, causing multiple samples of each variety to be assigned to both the training and test sets, resulting in increased similarity in the two sets (reason iii).
The highest accuracy of prediction using LiDAR features as the sole input was obtained for the HyCal-17 and HyCal-18 experiments, which include hybrid cultivars that are more diverse in structural characteristics compared to the inbred cultivars; thus, the regression model can distinguish and relate the LiDAR-based features to the biomass data (reason i), which is consistent with the results in Figure 14.
In general, the predictions are more accurate for the experiments that include hybrid cultivars. As was shown in Figure 9, the InCal-17, InCal-18, and SbDiv-17 have the smallest standard deviation in the LiDAR-based height, indicating that the associated varieties have similar structural characteristics.
The predictions for the SbDiv-17 are more accurate than the predictions for the InCal-17 as more samples were available for the training set, 200 for SbDiv-17, and 80 for InCal-17 (reason ii). For the SbBAP-17, the prediction accuracies are lower than most of the other experiments. This experiment included varieties that were highly diverse in terms of structural characteristics (Figure 9), also had a much larger range compared to the other experiments. This resulted in dissimilar samples in the training and test sets (reason iii). Figure 19 shows a box plot for the fresh biomass data for all the experiments, and the SbBAP-17 experiment had the greatest range of biomass values among the experiments, the lowest accuracies for the predictions, while SNitTs-18 experiment has the smallest range and the highest R 2 values for the predictions.
Remote Sens. 2020, 12, x 19 of 35 The predictions for the SbDiv-17 are more accurate than the predictions for the InCal-17 as more samples were available for the training set, 200 for SbDiv-17, and 80 for InCal-17 (reason ii). For the SbBAP-17, the prediction accuracies are lower than most of the other experiments. This experiment included varieties that were highly diverse in terms of structural characteristics (Figure 9), also had a much larger range compared to the other experiments. This resulted in dissimilar samples in the training and test sets (reason iii). Figure 19 shows a box plot for the fresh biomass data for all the experiments, and the SbBAP-17 experiment had the greatest range of biomass values among the experiments, the lowest accuracies for the predictions, while SNitTs-18 experiment has the smallest range and the highest R 2 values for the predictions. For HyCal-17 and HyCal-18 experiments, PLSR, SVR, and RF models were developed using all three data sources and leave-one-out cross validation strategy, where in each fold, one variety was assigned as test and the other 17 varieties were included in the training set. The results are shown in Figure 20. For both experiments, the SVR method provided the highest R 2 values for the predictions. For HyCal-17, all three regression methods underestimated the value of the biomass for one of the photoperiod varieties (which also had the maximum biomass, as noted previously); however, the RF model had the lowest accuracy. RF for both years resulted the lowest R 2 as a result of overfitting as was discussed earlier. All three methods resulted in predictions with lower accuracies for the experiment in 2018 compared to 2017, which could be because the end of season biomass data were measured at an earlier date in 2018, when all the plants had not reached full maturity. The predictions for the SbDiv-17 are more accurate than the predictions for the InCal-17 as more samples were available for the training set, 200 for SbDiv-17, and 80 for InCal-17 (reason ii). For the SbBAP-17, the prediction accuracies are lower than most of the other experiments. This experiment included varieties that were highly diverse in terms of structural characteristics (Figure 9), also had a much larger range compared to the other experiments. This resulted in dissimilar samples in the training and test sets (reason iii). Figure 19 shows a box plot for the fresh biomass data for all the experiments, and the SbBAP-17 experiment had the greatest range of biomass values among the experiments, the lowest accuracies for the predictions, while SNitTs-18 experiment has the smallest range and the highest R 2 values for the predictions.

Predictions in Time
To evaluate the capability of remotely sensed data for predicting biomass through the growing season, SVR models were developed for six dates in the 2017 and 2018 growing season for the HyCal-17 and HyCal-18 experiments. The R 2 values of the predictions relative to the reference data are shown in Figure 21. For each prediction, all the VNIR hyperspectral and LiDAR data collected prior to the date of biomass measurement were used in the SVR models. The R 2 values of the predictions at the beginning of the season were lower compared to the end of the season, especially when using only VNIR features. Early season growth is focused on the production of biomass from stalks and leaves, while mid-season development is related to flowering and early development of panicles. Plant structural characteristics do not change significantly after flowering in the mid-season, while spectral characteristics change significantly especially during flowering with the emergence of the panicles.

Multi-Temporal Predictions of End-of-Season Biomass
It is highly desirable to predict the end-of-season biomass as early as possible during the growing season to avoid unnecessary investment of phenotyping resources in non-productive varieties. The SbBAP-17, SbDiv-17, and SbDiv-18 were chosen to conduct the evaluations in this section as they had an adequate number of samples as well as RS data points and included both hybrid and inbred cultivars. Figure 22 shows the accuracy of end-of-season biomass predictions for these experiments using hyperspectral VNIR, SWIR, and LiDAR data from each date individually, and in combination with the earlier dates. For both SbBAP-17 and SbDiv-17 experiments, the earliest data set yielded very low prediction accuracies. For SbBAP-17, the best results when using features from individual sensors for VNIR and SWIR data sets were achieved from Sep. 10th with R 2 = 0.60

Predictions in Time
To evaluate the capability of remotely sensed data for predicting biomass through the growing season, SVR models were developed for six dates in the 2017 and 2018 growing season for the HyCal-17 and HyCal-18 experiments. The R 2 values of the predictions relative to the reference data are shown in Figure 21. For each prediction, all the VNIR hyperspectral and LiDAR data collected prior to the date of biomass measurement were used in the SVR models. The R 2 values of the predictions at the beginning of the season were lower compared to the end of the season, especially when using only VNIR features. Early season growth is focused on the production of biomass from stalks and leaves, while mid-season development is related to flowering and early development of panicles. Plant structural characteristics do not change significantly after flowering in the mid-season, while spectral characteristics change significantly especially during flowering with the emergence of the panicles.

Predictions in Time
To evaluate the capability of remotely sensed data for predicting biomass through the growing season, SVR models were developed for six dates in the 2017 and 2018 growing season for the HyCal-17 and HyCal-18 experiments. The R 2 values of the predictions relative to the reference data are shown in Figure 21. For each prediction, all the VNIR hyperspectral and LiDAR data collected prior to the date of biomass measurement were used in the SVR models. The R 2 values of the predictions at the beginning of the season were lower compared to the end of the season, especially when using only VNIR features. Early season growth is focused on the production of biomass from stalks and leaves, while mid-season development is related to flowering and early development of panicles. Plant structural characteristics do not change significantly after flowering in the mid-season, while spectral characteristics change significantly especially during flowering with the emergence of the panicles.

Multi-Temporal Predictions of End-of-Season Biomass
It is highly desirable to predict the end-of-season biomass as early as possible during the growing season to avoid unnecessary investment of phenotyping resources in non-productive varieties. The SbBAP-17, SbDiv-17, and SbDiv-18 were chosen to conduct the evaluations in this section as they had an adequate number of samples as well as RS data points and included both hybrid and inbred cultivars. Figure 22 shows the accuracy of end-of-season biomass predictions for these experiments using hyperspectral VNIR, SWIR, and LiDAR data from each date individually, and in combination with the earlier dates. For both SbBAP-17 and SbDiv-17 experiments, the earliest data set yielded very low prediction accuracies. For SbBAP-17, the best results when using features from individual sensors for VNIR and SWIR data sets were achieved from Sep. 10th with R 2 = 0.60

Multi-Temporal Predictions of End-of-Season Biomass
It is highly desirable to predict the end-of-season biomass as early as possible during the growing season to avoid unnecessary investment of phenotyping resources in non-productive varieties. The SbBAP-17, SbDiv-17, and SbDiv-18 were chosen to conduct the evaluations in this section as they had an adequate number of samples as well as RS data points and included both hybrid and inbred cultivars. Figure 22 shows the accuracy of end-of-season biomass predictions for these experiments using hyperspectral VNIR, SWIR, and LiDAR data from each date individually, and in combination with the earlier dates. For both SbBAP-17 and SbDiv-17 experiments, the earliest data set yielded very low prediction accuracies. For SbBAP-17, the best results when using features from individual sensors for VNIR and SWIR data sets were achieved from 10 September with R 2 = 0.60 and R 2 = 0.54, respectively. Based on LiDAR data, 23 August resulted in the highest values, with R 2 = 0.46. For the SbDiv-17 experiment, the combined VNIR and LiDAR data sets of 25 July and 2 August provided the highest accuracy of using individual data sets. For SbDivTc-18, the data inputs from 11 July resulted an R 2 of 0.75, which indicates the July data sets have good potential for biomass predictions. For all three experiments, the best results were obtained when features from all the hyperspectral and LiDAR data sets from the whole season were used, resulting in R 2 of 0.63, 0.75, and 0.78 for the SbBAP-17, SbDiv-17, and SbDiv-18 experiments, respectively. Although the best results were obtained using the whole season RS data, the models developed using middle season data (DAS of~60 to 80) were also able to provide comparable accuracies.
Remote Sens. 2020, 12, x 21 of 35 and R 2 = 0.54, respectively. Based on LiDAR data, Aug. 23rd resulted in the highest values, with R 2 = 0.46. For the SbDiv-17 experiment, the combined VNIR and LiDAR data sets of July 25th and August 2nd provided the highest accuracy of using individual data sets. For SbDivTc-18, the data inputs from July 11th resulted an R 2 of 0.75, which indicates the July data sets have good potential for biomass predictions. For all three experiments, the best results were obtained when features from all the hyperspectral and LiDAR data sets from the whole season were used, resulting in R 2 of 0.63, 0.75, and 0.78 for the SbBAP-17, SbDiv-17, and SbDiv-18 experiments, respectively. Although the best results were obtained using the whole season RS data, the models developed using middle season data (DAS of ~60 to 80) were also able to provide comparable accuracies.

Impact of the Number of Features and Samples on Biomass Prediction
As noted earlier, measuring biomass in the field is time-consuming and expensive; however, it is still required for training the regression models. Generally, the greater the number of samples for training, the more accurate the predictions. To evaluate the impact of the number of samples in training set, the SVR and PLSR models are developed for end-of-season biomass prediction using all the data sources and various numbers of samples in the training set. Each model was trained with a specific number of samples, and the process was repeated 100 times, each time with a different, but same sized set of randomly selected samples. The rest of the available samples were assigned to the testing set. Figure 23 shows the median and standard deviation of the R 2 values of developed models for some of the experimental trials. The R 2 of predictions for both SVR and PLSR models increase as the number of training samples increases. However, the accuracy of PLSR models is higher than SVR models when a smaller number of training samples is used. The rate of increase of R 2 with the respective increases in the number of training samples is higher for SVR compared to PLSR; thus, when the maximum of available samples is used in training for experiments (e.g., for SbDivTc-18), the SVR models had higher R 2 values. For all the experiments, the standard deviation of the R 2 values decreases as the number of training samples increases, showing more reliable (repeatable) prediction models are developed when more samples are available for training, as expected. However, for some experiments such as HyCal-17, the standard deviation of R 2 decreases initially, reaches a minimum, then increases. This is attributed to the small total number of samples: using more samples in the training set implies a smaller number of samples is available in the test set.
Remote Sens. 2020, 12, x 22 of 35 As noted earlier, measuring biomass in the field is time-consuming and expensive; however, it is still required for training the regression models. Generally, the greater the number of samples for training, the more accurate the predictions. To evaluate the impact of the number of samples in training set, the SVR and PLSR models are developed for end-of-season biomass prediction using all the data sources and various numbers of samples in the training set. Each model was trained with a specific number of samples, and the process was repeated 100 times, each time with a different, but same sized set of randomly selected samples. The rest of the available samples were assigned to the testing set. Figure 1 shows the median and standard deviation of the R 2 values of developed models for some of the experimental trials. The R 2 of predictions for both SVR and PLSR models increase as the number of training samples increases. However, the accuracy of PLSR models is higher than SVR models when a smaller number of training samples is used. The rate of increase of R 2 with the respective increases in the number of training samples is higher for SVR compared to PLSR; thus, when the maximum of available samples is used in training for experiments (e.g. for SbDivTc-18), the SVR models had higher R 2 values. For all the experiments, the standard deviation of the R 2 values decreases as the number of training samples increases, showing more reliable (repeatable) prediction models are developed when more samples are available for training, as expected. However, for some experiments such as HyCal-17, the standard deviation of R 2 decreases initially, reaches a minimum, then increases. This is attributed to the small total number of samples: using more samples in the training set implies a smaller number of samples is available in the test set.

Discussion
The predictions results with respect to the diversity, number of samples, and similarity between test and training sets for all the experiments are summarized in Error! Reference source not found.3. Table 3. Summary of the prediction results of the experimental trials.

InCalTc-18
SbDivTc-18 SNitTs Figure 23. Impact of the number of samples on R 2 of the predictive models using SVR and PLSR models.

Discussion
The predictions results with respect to the diversity, number of samples, and similarity between test and training sets for all the experiments are summarized in Table 3. In general, for the experiments with hybrid cultivars, RF models had lowest prediction accuracies among the three methods, which is related to the fact that there was more dissimilarity between the training and test sets in both RS and biomass data among the hybrid cultivars compared to those that were inbred, and RF models can be overfitted to the training data set; thus, they may not provide as accurate predictions as SVR and PLSR. For the InCal-18, however, RF yielded the highest accuracies for most of the data sources. For the experiments with a sample size of 200 (SbDiv-17, SbBAP-17, and SbDivTc-18), the SVR models provided the most accurate results, while for the experiments with a lower number of data samples, PLSR provided the highest prediction accuracies.
A summary of the prediction results for various data sources and regression methods is provided in Figure 24, where the R 2 values of the nine experiments are shown in a box plot (RMSE values are shown in Figure A6 in Appendix B). For the LiDAR data, the RF method provided slightly higher median accuracies than PLSR and SVR, with lower variability in R 2 values (more reliability). When VNIR data was the only input, PLSR yielded more accurate results, which is similar to the results obtained in [10] yield prediction of potatoes using VNIR hyperspectral data. For all other data sources, SVR yielded a higher median R 2 . For SWIR and VNIR combined with SWIR sources, SVR provided more reliable results, while for VNIR, VNIR combined with LiDAR, as well as a combination of all data sources, PLSR provided more reliable results. Also, the SVR models provided the maximum R 2 for all the data sources. In general, for the experiments with hybrid cultivars, RF models had lowest prediction accuracies among the three methods, which is related to the fact that there was more dissimilarity between the training and test sets in both RS and biomass data among the hybrid cultivars compared to those that were inbred, and RF models can be overfitted to the training data set; thus, they may not provide as accurate predictions as SVR and PLSR. For the InCal-18, however, RF yielded the highest accuracies for most of the data sources. For the experiments with a sample size of 200 (SbDiv-17, SbBAP-17, and SbDivTc-18), the SVR models provided the most accurate results, while for the experiments with a lower number of data samples, PLSR provided the highest prediction accuracies.
A summary of the prediction results for various data sources and regression methods is provided in Figure 24, where the R 2 values of the nine experiments are shown in a box plot (RMSE values are shown in Figure A6 in Appendix B). For the LiDAR data, the RF method provided slightly higher median accuracies than PLSR and SVR, with lower variability in R 2 values (more reliability). When VNIR data was the only input, PLSR yielded more accurate results, which is similar to the results obtained in [10] yield prediction of potatoes using VNIR hyperspectral data. For all other data sources, SVR yielded a higher median R 2 . For SWIR and VNIR combined with SWIR sources, SVR provided more reliable results, while for VNIR, VNIR combined with LiDAR, as well as a combination of all data sources, PLSR provided more reliable results. Also, the SVR models provided the maximum R 2 for all the data sources. Similar to the study by Almeida et al. [69], an ANOVA test was performed on the prediction results to determine the impact of the method and data source (e.g. sensor-based features) on the prediction results. The ANOVA results provided in Table 4 show that the data source is the cause of 69% of the variation in the prediction results, 28% for the regression method, and 3% for the interaction their interaction. These results are consistent with those of [33,69]. This indicates that the data source should also be considered to determine the regression method in the design of similar experiments. A similar ANOVA test was performed on the R 2 values for six experiments, including Similar to the study by Almeida et al. [69], an ANOVA test was performed on the prediction results to determine the impact of the method and data source (e.g., sensor-based features) on the prediction results. The ANOVA results provided in Table 4 show that the data source is the cause of 69% of the variation in the prediction results, 28% for the regression method, and 3% for the interaction their interaction. These results are consistent with those of [33,69]. This indicates that the data source should also be considered to determine the regression method in the design of similar experiments. A similar ANOVA test was performed on the R 2 values for six experiments, including HyCal-17, HyCal-18, InCal-17, InCal-18, SbDiv-17, and SbDiv-18, of which three include hybrid cultivars, and three include inbred cultivars. On this test, the cultivar type (hybrid or inbred) was considered as the third factor. Recall that the sorghum hybrid cultivars are more variable in their characteristics than inbreds in terms of biomass and structural characteristics. The results of this test in Table 5 indicate that the data source also has the highest contribution to the variation in the predictions (44%). The regression method is the cause of 26% of the variation in the prediction results, and less than 1% is attributed to the cultivar type. However, the interaction between the regression method and cultivar is responsible for 24% of the variation, suggesting that the cultivar type is another important factor to consider when determining the regression method for developing predictive models. HyCal-17, HyCal-18, InCal-17, InCal-18, SbDiv-17, and SbDiv-18, of which three include hybrid cultivars, and three include inbred cultivars. On this test, the cultivar type (hybrid or inbred) was considered as the third factor. Recall that the sorghum hybrid cultivars are more variable in their characteristics than inbreds in terms of biomass and structural characteristics. The results of this test in Table 5 indicate that the data source also has the highest contribution to the variation in the predictions (44%). The regression method is the cause of 26% of the variation in the prediction results, and less than 1% is attributed to the cultivar type. However, the interaction between the regression method and cultivar is responsible for 24% of the variation, suggesting that the cultivar type is another important factor to consider when determining the regression method for developing predictive models.

Recommendations for Biomass Prediction
In this section, we summarize the findings of the tests conducted in this paper in the format of recommendations to achieve reliable biomass prediction. The recommendations are based on the data, the genotypes, and the location where the study was conducted; thus, they might not be generalizable.
• Regression Model Both PLSR and SVR models provided more accurate predictions than RF. SVR generally provided more accurate predictions, however, PLSR is preferable when the number of sample data points is very limited (less than 50 samples) as well as when high variability in the biomass data is expected.
• Data Source ANOVA analysis on the prediction results showed that the data source is the most important factor on determining the accuracy of the predictions. Considering individual sensors, the features extracted from VNIR provided the most accurate predictions. However, adding the geometric based features extracted from the LiDAR data to the models improved the accuracy of the predictions significantly, which is consistent with previous studies for biomass prediction in forest environments [44,69]. We recommend acquiring data from both VNIR and LiDAR sensors, if possible, for the most reliable biomass prediction.   HyCal-17, HyCal-18, InCal-17, InCal-18, SbDiv-17, and SbDiv-18, of which three include hybrid cultivars, and three include inbred cultivars. On this test, the cultivar type (hybrid or inbred) was considered as the third factor. Recall that the sorghum hybrid cultivars are more variable in their characteristics than inbreds in terms of biomass and structural characteristics. The results of this test in Table 5 indicate that the data source also has the highest contribution to the variation in the predictions (44%). The regression method is the cause of 26% of the variation in the prediction results, and less than 1% is attributed to the cultivar type. However, the interaction between the regression method and cultivar is responsible for 24% of the variation, suggesting that the cultivar type is another important factor to consider when determining the regression method for developing predictive models.

Recommendations for Biomass Prediction
In this section, we summarize the findings of the tests conducted in this paper in the format of recommendations to achieve reliable biomass prediction. The recommendations are based on the data, the genotypes, and the location where the study was conducted; thus, they might not be generalizable.
• Regression Model Both PLSR and SVR models provided more accurate predictions than RF. SVR generally provided more accurate predictions, however, PLSR is preferable when the number of sample data points is very limited (less than 50 samples) as well as when high variability in the biomass data is expected.
• Data Source ANOVA analysis on the prediction results showed that the data source is the most important factor on determining the accuracy of the predictions. Considering individual sensors, the features

Recommendations for Biomass Prediction
In this section, we summarize the findings of the tests conducted in this paper in the format of recommendations to achieve reliable biomass prediction. The recommendations are based on the data, the genotypes, and the location where the study was conducted; thus, they might not be generalizable.

Regression Model
Both PLSR and SVR models provided more accurate predictions than RF. SVR generally provided more accurate predictions, however, PLSR is preferable when the number of sample data points is very limited (less than 50 samples) as well as when high variability in the biomass data is expected.

•
Data Source ANOVA analysis on the prediction results showed that the data source is the most important factor on determining the accuracy of the predictions. Considering individual sensors, the features extracted from VNIR provided the most accurate predictions. However, adding the geometric based features extracted from the LiDAR data to the models improved the accuracy of the predictions significantly, which is consistent with previous studies for biomass prediction in forest environments [44,69]. We recommend acquiring data from both VNIR and LiDAR sensors, if possible, for the most reliable biomass prediction. •

Flight Time and Frequency
The analysis on correlation between the data sets captured throughout the season showed that changes occur more rapidly earlier in the season (a result of fast-growing plants), while the multi-temporal data captured later in season are more similar. This implies that frequent data collection at the end of season is not required. Moreover, the most accurate end of season biomass prediction was achieved using the data captured around 60-80 DAS, which can be considered as an important time for collecting RS data. Zhou et al. [70] obtained similar results, but for rice yield prediction using UAV-based multispectral and digital imagery. •

Required Reference Data
Based on the results provided in Figure 24, we recommend collecting least 50 samples. If it is expected to have high variability in the biomass data associated with the varieties in the experiments, more samples would be required.

Conclusions
In this paper, we explored the potential for reliable prediction of sorghum biomass using multi-temporal hyperspectral and LiDAR data acquired by sensors mounted on UAV platforms. We developed prediction models using three nonlinear regression models for nine experiments conducted in the 2017 and 2018 growing seasons at the Agronomy Center for Research and Education (ACRE) at Purdue University. Experiments included multiple sorghum varieties with different sample sizes, providing an opportunity for multiple statistical tests and models. Based on the experiments conducted in this study, nitrogen and photosynthesis related features extracted from hyperspectral data and geometric based features derived from the LiDAR data provided reliable and accurate prediction of biomass. The 750-1100 nm range of the spectrum provided the most relevant information for biomass prediction.
Both data source and regression method are important factors for a reliable prediction; however, the ANOVA results show that the data source was more important with 69% significance, versus 28% significance for the regression method. The number of samples in training set for the prediction is an important factor for determining the accuracy of the predictions. Generally, the PLSR method provided more accurate prediction models when the number of samples in training was limited. With increasing samples, the rate of increase in the accuracy of the SVR models was higher than PLSR.
We also evaluated the prediction models with respect to the time of the RS data acquisition and the time of harvest. The end-of-season biomass predictions were more reliable and accurate than the mid-season predictions, as more varieties in the field were at the same stage of growth. With respect to the remote sensing data, the best results were obtained using the RS data from the whole season; however, the models developed using mid-season data (DAS of~60 to 80) were also able to provide comparably accurate results, which were useful for early screening of varieties.

Conflicts of Interest:
The authors declare no conflict of interest.