Development and Validation of a Model to Combine NDVI and Plant Height for High-Throughput Phenotyping of Herbage Yield in a Perennial Ryegrass Breeding Program

: Sensor-based phenotyping technologies may o ﬀ er a non-destructive, high-throughput and e ﬃ cient assessment of herbage yield (HY) to replace current ine ﬃ cient phenotyping methods. This paper assesses the feasibility of combining normalised di ﬀ erence vegetative index (NDVI) from multispectral imaging and ultrasonic sonar estimates of plant height to estimate HY of single plants in a large perennial ryegrass breeding program. For sensor calibration, fresh HY (FHY) and dry HY (DHY) were acquired destructively, and plant height was measured at four dates each in 2017 and 2018 from a selected subset of 480 plants. Global multiple linear regression models based on K-fold and random split cross-validation methods were used to evaluate the relationship between observed vs. predicted HY. The coe ﬃ cient of determination (R 2 ) = 0.67–0.68 and a root mean square error (RMSE) between 5.43–7.60 g was obtained for the validation of predicted vs. observed DHY. The mean absolute error (MAE) and mean percentage error (MPE) ranged between 3.59–5.44 g and 22–28%, respectively. For the FHY, R 2 values ranged from 0.63 to 0.70, with an RMSE between 23.50 and 33 g, MAE between 15.11 and 24.34 g and MPE between ~22% and 31%. Combining NDVI and plant height is a robust method to enable high-throughput phenotyping of herbage yield in perennial ryegrass breeding programs.


Introduction
Perennial ryegrass (Lolium perenne L.) is one of the most important temperate forage crops in Australia for dairy, meat and wool production and it makes a significant contribution to Australia's grazing industries valued at $8 billion/annum [1]. Herbage yield (HY) improvement is one of the primary ryegrass breeding targets, yet it is not easy to improve because phenotyping herbage yield is laborious, costly and must be performed multiple times within a year and across multiple years [2][3][4]. The development of new cultivars may take up to 15 years, where repeated phenotyping of herbage yield and related traits is required at every harvest. However, performing measurements using studies have been performed that apply sensors in combination to determine the HY estimations of single plants. In this study, we develop various cross-validation models using NDVI from UAS-based multispectral imaging and plant height from the ground-based vehicle to predict fresh herbage yield (FHY) and dry herbage yield (DHY) in large perennial ryegrass breeding program.
The overall objective of the study was to establish herbage yield estimation models for individual perennial ryegrass plants by using data derived from non-destructive methods including plant height, NDVI and a fusion of plant height and NDVI for eight harvests during the 2017 and 2018 growing seasons. The model predictions were compared with manual measurements to validate them as predictors of HY for individual plants.

Experimental Setup
The experiment was carried out on a field at Agriculture Victoria Research's (AVR)-Hamilton Centre (37. Table 1. Two phenotyping systems, ground and UAS-based platforms were used for data collection. A ground-based phenotyping system was used to collect plant height measurements from single plants whereas the UAV used a multispectral imaging system to extract individual plant's NDVI values. The selected plants were assessed for height and cut for fresh and dry weight on the following day. During 2018, aerial images were collected via a RedEdge-M (RedEdge-M, MicaSense Inc., Seattle, WA, USA) multispectral camera aboard a DJI M100 (DJI Technology Co., Shenzhen, China) UAS to collect images at 20 m flight elevation. New UAS and camera system was used in this season due to the reason AVR decided to upgrade the carrying capacity of the UAS system at Hamilton Centre in the year 2018. Flights were conducted using the Pix4D capture software. Reflectance values from calibration tarps (Tetracam INC. Chatsworth, CA, USA) were used to correct the reflectance values of the images. The multispectral imagery was collected with 1.2 megapixels. The details of UAS data acquisition methods and processing accuracies are listed in Table 2. During 2018, aerial images were collected via a RedEdge-M (RedEdge-M, MicaSense Inc., Seattle, WA, USA) multispectral camera aboard a DJI M100 (DJI Technology Co., Shenzhen, China) UAS to collect images at 20 m flight elevation. New UAS and camera system was used in this season due to the reason AVR decided to upgrade the carrying capacity of the UAS system at Hamilton Centre in the year 2018. Flights were conducted using the Pix4D capture software. Reflectance values from calibration tarps (Tetracam INC. Chatsworth, CA, USA) were used to correct the reflectance values of the images. The multispectral imagery was collected with 1.2 megapixels. The details of UAS data acquisition methods and processing accuracies are listed in Table 2.   (https://www.qgis.org/en) software environment. This was performed by manually projecting polygons on to each sampled single plant in the coordinate reference system of the trial. For the 2018 data TIFF files of index values from individual bands were loaded to eCognition software (eCognition Developer 9, Trimble, Munich, Germany). In the eCognition workflow, calibration of TIFF index values was performed based on the extrapolation of digital numbers from the calibration tarps of known reflectance. Next, the calculation of VI, manual segmentation, and classification of VIs were performed, followed by NDVI value extractions.   Pix4D, Lausanne, Switzerland) (https://pix4d.com) was used to perform geolocation of captured images by generating orthomosaics. The initial main steps in this process included importing ground control points (GCPs), aligning and reoptimizing of images. The next step included building dense point clouds, building orthomosaic, performing radiometric calibration (2017 data) and calculating NDVI [21]. Nine GCPs were distributed across the field trial site for increasing accuracy and noise reduction ( Figure 1). Accurate geographical locations of GCPs were obtained using a Trimble RTK GNSS receiver (Navcom SF-3040, Nav Com Technology Inc., Torrance, CA, USA), with a horizontal/vertical accuracy of 0.01 m + 0.5 ppm/0.02 m + 1 ppm respectively. The values of the geometric root mean square errors of the GCPs and ground sample distances (GSD) of the processed data are indicated in Table 2 were used to evaluate the accuracy of the orthomosaic images.

Data Collection Platforms
For the 2017 data, single plants NDVI value was calculated from the orthophotos using zonal statistics in the QGIS 2.18.2 (QGIS Development Team, 2017, Raleigh, NC, USA) (https://www.qgis.org/en) software environment. This was performed by manually projecting polygons on to each sampled single plant in the coordinate reference system of the trial.
For the 2018 data TIFF files of index values from individual bands were loaded to eCognition software (eCognition Developer 9, Trimble, Munich, Germany). In the eCognition workflow, calibration of TIFF index values was performed based on the extrapolation of digital numbers from the calibration tarps of known reflectance. Next, the calculation of VI, manual segmentation, and classification of VIs were performed, followed by NDVI value extractions.

PhenoRover System Field Deployments
The PhenoRover system is a ground-based platform consisting of a Polaris Ranger 6 × 6 side-by-side vehicle system (Polaris Industries Inc., Medina, MN, USA) and a range of integrated sensors (Figure 2b). The PhenoRover phenotyping platform was designed for collecting pasture height data, and the development was completed at Hamilton in 2016. The PhenoRover was deployed at the beginning of 2018 and proceeded to collect data at each three-leaf stage harvest at an average speed of 1.4 ms −1 . Six ultrasonic sonar sensors (UNDK 30U6103/S14, Baumer group, Frauenfeld, Switzerland) were installed onto a sensor boom, which was 1.45 m wide custom-made steel rigidly welded to the PhenoRover front. The six ultrasonic sonar sensors in total collected height data in three rows and two columns of plants on each pass of the PhenoRover over the field trial. All ultrasonic sonar sensors were mounted at 0.6 m above the ground and arranged in three rows of 0.6 m apart in a nadir view. The ultrasonic sonars have a high repeatability accuracy of 0.0005 m. The ultrasonic sonar sensors have a measuring range from 0.1 to 1.0 m, with a sonic frequency of 240 KHz taking measurements at 10 Hz. This allows for height measurement to be captured every 0.05-0.10 m when driving at 1.4 ms −1 . In addition to the ultrasonic sensors, one Global Navigation Satellite System (GNSS) antennae (AG25, Trimble, Westminster, CO, USA) was installed at the rooftop of the PhenoRover system and connected to a real-time kinematics GNSS (RTK-GNSS) receiver (FMX Integrated Display, Trimble, Westminster, CA, USA) for generating georeferenced and geolocated sensor data. Moreover, to record the ultrasonic sonar measurements a Campbell Scientific CR3000 datalogger (Campbell Scientific, Inc., Logan, UT, USA) was fitted on the back of the PhenoRover system.

Plant Height Data Extraction from the Ultrasonic Sonar
Ultrasonic sonar sensors measurements were processed using the methodology of Thompson et al. [35] and Wang et al. [36] (Figure 3). Two main steps are followed for analysing the data: (1) georeferencing of sensor and RTK-GNSS data through projecting to the Universal Transverse Mercator (UTM) coordinate system. The UTM position of each sensor's measurement was calculated. The field site is in the UTM zone 54S; and (2) matching ultrasonic sensor data to individual plant information using QGIS to annotate sensor data to each plant's boundaries. Geo-referenced individual plants data were overlayed on pre-defined polygon shapefiles of the selected individual plants in QGIS. Polygons with ID numbers were manually drawn around the individual plants. The data processing and extraction were performed through the processing steps of the HTP geo-processer plug-in. This completes the processing of plant height, displaying an attribute

PhenoRover System Field Deployments
The PhenoRover system is a ground-based platform consisting of a Polaris Ranger 6 × 6 side-byside vehicle system (Polaris Industries Inc., Medina, MN, USA) and a range of integrated sensors ( Figure 2b). The PhenoRover phenotyping platform was designed for collecting pasture height data, by the growth stage of the single plants, in which the 2-3 leaf stage was considered as a standard simulated grazing stage [37]. Thirty-two plants from the third row of the sampled plot were selected for plant height and herbage yield measurement. The destructive measurements of FHY and DHY of single plants were obtained by cutting plants at the 0.05 m above-ground level. FHY was measured from individual plants, and all samples were subsequently dried in an oven for 48 h at 60 • C for the determination of DHY. Additionally, individual plant height was measured using a standard ruler from the base of each plant to the highest upright standing leaf.

Estimation Models of HY through Sensors Data Combination
Data from the ground and aerial-based platforms were combined to obtain the most suitable model for individual plants' herbage yield estimation. Firstly, UAS-based NDVI data extracted from each plant through the process in Figure 3 were modelled to estimate HY at every seasonal harvest. Secondly, manually measured plant height and PhenoRover-based plant height data extracted from individual plants through the process in Figure 3 were compared. Thirdly, the manually measured and PhenoRover-based plant height data were modelled to estimate HY. To improve the estimation of accuracy of FHY and DHY, combining the spectral and structural information may be necessary [12,15,31]. In this study, different multiplicative combination methods of NDVI and plant height were assessed, including NDVI × plant height, NDVI × plant height × plant height and, NDVIsq × plant height (NDVIsq_PH). NDVIsq_PH showed the best combination to estimate FHY and DHY with higher accuracy.

Statistical Analysis
The relationship of herbage yields with plant height and NDVI were analysed using general linear regression models ( Figure 3). The measurement taken at the individual plant level was the unit of analyses. Residuals versus fitted values plots were examined to determine the need for data transformation to ensure the normality of residuals with constant variance. When deemed necessary, data were logarithmically transformed before the final analyses. All analyses were performed in R version 3.5.2 (R Core Team 2018, Vienna, Austria).
On some data collection dates (all of 2017 and one day in 2018), plant heights from ultrasonic sonar was not available due to technical issues with the GPS of the PhenoRover system. For these dates, manually measured plant height with a standard ruler was used. For the testing, the estimation of plant height using ultrasonic sonar, a linear regression relationship with the manually measured plant height was performed on three collection dates of 2018.
A parsimonious model for FHY/DHY was developed by using variables such as NDVI values, manually measured plant height (PH)/ plant height from ultrasonic sonar sensors, the product of NDVI square and plant height (NDVIsq_PH) and seasons. The parsimonious model selected was: The predictive accuracy of the model was determined by splitting data into calibrations/training and validation/test sets, which is often referred to as cross-validation. Cross-validation can be carried out in many ways, but we focused on two methods (1) k-fold cross-validation and (2) random split of the dataset. In K-fold cross-validation, we randomly split the data into k-folds (k = 2, 5, 10 and 20) and K-1 folds (sets) of data used as calibration/training set and model parameters were estimated using this data. Using these model parameters, we predicted the FHY/DHY for the remaining one-fold (set) of the data. The cross-validation process is then repeated k times, with each of the k-fold (sets) used exactly once as the validation/test data.
In random split approach, we randomly split data into 60%/40%, 70%/30% and 80%/20%, where 60%, 70% and 80% of the data were used as calibration/training and 40%, 30% and 20% of the data were used as validation/test set, respectively. Each of the random split strategies was carried out 10,000 times, and results averaged.
The model accuracy was measured using coefficient of determination (R 2 ), root mean squared error (RMSE), mean absolute error (MAE), and mean percentage error (MPE) [21]. The R 2 is the correlation coefficient between measured and predicted herbage yield. RMSE is the square root of the mean squared difference between observed herbage yield and predicted herbage yield. MAE is the mean of the absolute difference between observed herbage yield and predicted herbage yield and mean percentage error is the mean of the error percentage ((|observed herbage yield-predicted herbage yield|/observed herbage yield) × 100).

Correlation between Manual and Sonar Plant Height
A strong correlation (R 2 = 0.76) was observed between manual and ultrasonic sonar plant height ( Figure 5). Plant height measurement using ultrasonic sonar slightly overestimated height, mainly when plants are small, which may result from the uneven surface of the measured field trial. The slope of the regression line was close to one, and the RMSE was 2.98 cm, and this implies that the use

Correlation between Manual and Sonar Plant Height
A strong correlation (R 2 = 0.76) was observed between manual and ultrasonic sonar plant height ( Figure 5). Plant height measurement using ultrasonic sonar slightly overestimated height, mainly when plants are small, which may result from the uneven surface of the measured field trial. The slope of the regression line was close to one, and the RMSE was 2.98 cm, and this implies that the use of ultrasonic sonar is useful to estimate plant heights of space-planted perennial ryegrass single plants. of ultrasonic sonar is useful to estimate plant heights of space-planted perennial ryegrass single plants.

Individual Plant Herbage Yield Model Evaluation
The R 2 for linear regression models of log-transformed DHY estimation from NDVI ranges between 0.35 and 0.73 (Figure 6a,c). The variation in accuracy of seasonal HY estimation indicates that NDVI as an estimator can be limited by saturation at high seasonal HY accumulation. A linear relation between DHY and plant height showed variable R 2, which was between 0.15 and 0.65 ( Figure  6b,d). Lower estimation accuracy of the mid and late-spring of 2018 (R 2 = 0.12-0.15). Figure 7a,b showed seasonal linear regression models of DHY estimation from NDVIsq_PH that improved the R 2 averagely with up to 10% in 2017 and 24% in 2018 compared to plant height and NDVI alone. Looking at the combined seasonal estimation models, yearly estimation values of DHY from NDVIsq_PH were consistent with the individual season models (R 2 = 0.67) (Figure 8a,b), indicating the possibility of applying the global model of HY estimation instead of specific (seasonal) models.

Individual Plant Herbage Yield Model Evaluation
The R 2 for linear regression models of log-transformed DHY estimation from NDVI ranges between 0.35 and 0.73 (Figure 6a,c). The variation in accuracy of seasonal HY estimation indicates that NDVI as an estimator can be limited by saturation at high seasonal HY accumulation. A linear relation between DHY and plant height showed variable R 2, which was between 0.15 and 0.65 (Figure 6b,d). Lower estimation accuracy of the mid and late-spring of 2018 (R 2 = 0.12-0.15). Figure 7a,b showed seasonal linear regression models of DHY estimation from NDVIsq_PH that improved the R 2 averagely with up to 10% in 2017 and 24% in 2018 compared to plant height and NDVI alone. Looking at the combined seasonal estimation models, yearly estimation values of DHY from NDVIsq_PH were consistent with the individual season models (R 2 = 0.67) (Figure 8a Tables 3 and 4 show the details of cross-validations (CVs) of multiple linear regression models to estimate FHY and DHY yield, respectively. Looking at Table 3 Tables 3 and 4 show the details of cross-validations (CVs) of multiple linear regression models to estimate FHY and DHY yield, respectively. Looking at Table 3 Tables 3 and 4 show the details of cross-validations (CVs) of multiple linear regression models to estimate FHY and DHY yield, respectively. Looking at Table 3 The MPE% of 2018 K-fold CV was higher than that of 2017. Similarly, random splits of the yearly dataset into 10,000 simulated repetitions of 60%/40%, 70%/30%, and 80%/20% indicated similar R 2 values (0.64-0.69) like the K-fold CV in both experimental years. The RMSE and MAE values of the 10,000 random splits were similar to the K-fold within the year but significantly higher in 2017 than 2018. In the DHY estimation, K-fold and random split CV showed relatively similar accuracy for the R 2 values (0.66-0.67) of the 2017 and 2018. The RMSE, MAE, and MPE values from the K-fold and random split CV were in the range of 7.53-7.60 g,~5.44 g and~22%, respectively, for the 2017 dataset (Table 4). This indicated K-fold CV (at different values of K) and the and 10,000 simulated random splits of 60%/40%, 70%/30% and 80%/20% showed no sign of variation in these values irrespective of the sample size. Similarly, RMSE, MAE, and MPE values of the 2018 dataset were 5.44-5.52 g, 3.59-3.67 g and 28%, respectively. The RMSE and MAE of the CV values were generally smaller in 2018 than the 2017 dataset, whereas the MPE was reverse.

Herbage Yield Prediction
The scatterplot in Figure 9 shows an example of the graphic representation of observed vs. the predicted DHY. The DHY validation shows the data points generally closer and slightly deviated from the 1:1 line, indicating DHY prediction methods may be used to accurately estimate HY of individual ryegrass plants.

Discussion
Current, HY phenotyping is time-consuming and relies mainly on visual scoring. HTP platforms that would improve the speed and accuracy of HY phenotyping have, so far, only been applied at the plot level [20,29], and no one has conducted phenotyping at the individual plant level except on trees [38][39][40]. Various sensor-based assessments of phenotyping at the individual plant level is required to implement molecular breeding techniques, like GS. However, the implementation of GS implies cross-validation on the evaluation of the genetic and phenotypic breeding values of individuals for accurate selection of superior genotypes [41,42]. GS, in perennial ryegrass, has currently followed the trend of evaluating the HY of sward plots of cultivars and breeding lines [43,44]. The phenotypic selection at individual plants level is required to explore the full potential of GS programs. Sensorbased phenotyping technologies offer a non-destructive, high-throughput and efficient assessment of HY replacing current inefficient phenotyping methods [5]. In this study, we explained the potentials of sensor-based HTP technologies to estimate FHY and DHY through the development and cross-validation of various prediction models.
The seasonal HY distribution in the 2017 and 2018 showed that HY from the 2017 harvest dates was higher compared to 2018 in all seasons. The notable decline in dry matter yield as the year progressed may result from the drought stress and repetitive cutting [45].
Plant height can be obtained using manual and ultrasonic sonar-based measurements [26][27][28]46]. Automation of plant height measurements using sensors offers an excellent advantage for assessing large breeding populations rapidly and objectively [5]. A high correlation was observed between plant height from a standard ruler and an ultrasonic sonar sensor. Similar results of R 2 (0.56-0.7) from UAS-derived grassland height and RMSE of 3.50 cm from LiDAR-derived wheat height were reported. However, other reports achieved higher R 2 (0.90-0.98) applying ground-based LiDAR on

Discussion
Current, HY phenotyping is time-consuming and relies mainly on visual scoring. HTP platforms that would improve the speed and accuracy of HY phenotyping have, so far, only been applied at the plot level [20,29], and no one has conducted phenotyping at the individual plant level except on trees [38][39][40]. Various sensor-based assessments of phenotyping at the individual plant level is required to implement molecular breeding techniques, like GS. However, the implementation of GS implies cross-validation on the evaluation of the genetic and phenotypic breeding values of individuals for accurate selection of superior genotypes [41,42]. GS, in perennial ryegrass, has currently followed the trend of evaluating the HY of sward plots of cultivars and breeding lines [43,44]. The phenotypic selection at individual plants level is required to explore the full potential of GS programs. Sensor-based phenotyping technologies offer a non-destructive, high-throughput and efficient assessment of HY replacing current inefficient phenotyping methods [5]. In this study, we explained the potentials of sensor-based HTP technologies to estimate FHY and DHY through the development and cross-validation of various prediction models.
The seasonal HY distribution in the 2017 and 2018 showed that HY from the 2017 harvest dates was higher compared to 2018 in all seasons. The notable decline in dry matter yield as the year progressed may result from the drought stress and repetitive cutting [45].
Plant height can be obtained using manual and ultrasonic sonar-based measurements [26][27][28]46]. Automation of plant height measurements using sensors offers an excellent advantage for assessing large breeding populations rapidly and objectively [5]. A high correlation was observed between plant height from a standard ruler and an ultrasonic sonar sensor. Similar results of R 2 (0.56-0.7) from UAS-derived grassland height and RMSE of 3.50 cm from LiDAR-derived wheat height were reported. However, other reports achieved higher R 2 (0.90-0.98) applying ground-based LiDAR on wheat and cotton, aerial based-RGB (R 2 = 0.87-0.98) on barley [23,24] and ground-based sonar sensors (R 2 = 0.86) on cotton [47]. Most of these studies conducted at the plot level were of the canopy structure of plants growing as a sward plot and may be different from space-planted single plants.
The correlation between log-transformed seasonal DHY vs. NDVI of individual ryegrass plants was moderate, with R 2 ranging between 0.36 and 0.73 across seasons. The significant variations of R 2 (up to 37%) of estimation accuracies occur among the seasons; mainly, it could result from the effect of saturation at the late growth stage [19]. In previous sward studies, similar R 2 values (0.38-0.56) of NDVI vs. DHY were reported in alfalfa, Bermuda grass and tall fescue [12,28,48]. However, R 2 values in these studies were slightly lower compared to studies on grain crops, including wheat, barley, and maize [49][50][51]. This could be related to the vigour of canopy growth habit difference between forage and grain crops, where NDVI could be influenced [52].
Plant height is another widely used structural and physical parameter used to estimate the biomass of various crops [20,23,53,54]. Previous studies on grasslands indicated moderate to high correlation (R 2 = 0.59-0.81) on predicting dry herbage yield from plant height measurements [45]. In the current study, the R 2 of plant height vs. log-transformed DHY was more variable (R 2 = 0.12-0.65) than NDVI across seasons. Moreover, correlations were slightly lower compared to previous results in the literature [13,55]. Specifically, lower accuracies were obtained from the spring 2018 experiments due to early flower initiation from drought stresses. In this study, we believe the surface of these small plants may have reflected weak sound echoes, which made it difficult for the ultrasonic sonar to detect accurate distance from the top canopy of the plant. Furthermore, when the plant canopy is a mixture of leaves and sparse reproductive tillers there is a greater variability in height and the surface that interacts with the sensor. Similar results of low accuracies were reported to detect thin spikes of wheat at the time of maturity compared to clustered young leaves [26].
The NDVI values explain the spectral properties of the plant canopy, whereas plant height is related to the vertical structure and growth rate of a plant [53,56]. Combining the physical and spectral parameters was successful in grasping spectral and structural information and improved the regression estimation of aboveground biomass [20]. The results show combining NDVI and plant height (NDVIsq_PH) data exhibited up to 10-24% improvement in prediction accuracy of DHY than using either NDVI or plant height alone. Schaefer and Lamb [12] investigated the combination of NDVI and LiDAR plant height to improve the estimation accuracy of tall fescue HY by up to 20%. However, this study made on single measurement experiment without looking at combined information of changes in plant performance for multiple growing seasons.
The validation of the model performance of observed vs. predicted values of FHY and DHY showed similar accuracies across all the K-fold (k = 2, 5, 10, and 20) and random split (60/40%, 70/30% and 80/20%) with R 2 = 0.63-0.70 across the two seasons dataset. We excluded FHY results in some of our results since data points deviated similarly from the 1:1 line, indicating that either FHY or DHY prediction methods will be enough to estimate HY of individual ryegrass plants. However, FHY results were included in the cross-validation results to match up with some of the results of the whole GSS result with only FHY datasets. Our cross-validation results were consistent with the results of [15], who applied random split CV to estimate fresh and dry biomass of barley from the fusion of NDVI and plant height. The authors obtained (R 2 = 0.67-0.79) from linear regression. The R 2 , RMSE, MAE and MPE of yearly regression models performed similarly irrespective of the CV type used, suggesting that models developed are stable and reliable across all K-fold and 10,000 repetitive random splits. However, regression models from previous studies showed variability in model performance at different random split [15,23,24]. The reason for this was, a small number of samples (n = 48-194) were used in these studies, and experiments were performed with not more than three harvests compared to the current study (n = 1626-1900) with a total of eight seasonal harvests in two years. Several other studies indicated combined VIs (other than NDVI) and plant height performed better for biomass estimation than VIs alone [13,15,20,57].
In the current study, year-for-year RMSE, MAE and MPE values of the CV showed variations in values. For instance, the RMSE and MAE values of the 2017 dataset were more significantly higher than the 2018 dataset, suggesting the plants from the 2017 harvest dates were larger with larger FHY and DHY values. It was impossible to compare the RMSE and MAE values from our results with the literature as there is no work done on single plants in the previous research where works focused on measurements of kg per plot or in a given area (square metre/hectare). Contrary to the RMSE and MAE, the MPE values from the 2017 dataset were slightly smaller compared to the 2018 dataset.
We found a high correlation by combining NDVI and plant height to predict DHY of individual ryegrass plants at the 2-3 leaf growth stage. This was proved by cross-validation using multiple linear regression model, and we still need to apply more non-linear regression methods that include machine learning techniques with more datasets from larger study samples may improve the prediction models further and be able to rank genotypes non-destructively and objectively.

Conclusions
This study is the first to integrate various sensors on aerial and ground-based phenotyping platforms that implement combining spectral and structural information to estimate HY of space-planted perennial ryegrass in the context of the large-scale breeding program. The results of this study demonstrate that the best prediction of DHY of space-planted perennial ryegrass single plants come from the multiplicative combination of NDVI and Plant height (NDVIsq_PH). The K-fold and random split CV findings imply that the combination of NDVI and plant height improved prediction accuracy over the use of NDVI and plant height alone. This yielded an accuracy of the coefficient determinations of HY estimation more than 0.63 and RMSE for FHY, and DHY was less than 33 g/plant and 8 g/plant, respectively. The application of various K-fold and random split methods do not change the prediction accuracy. We propose this non-destructive and rapid HY phenotyping technique to replace the current slow and time-consuming HY yield phenotyping methods. Therefore, this method can be assessed for effective estimation of thousands of individual plants and develop prediction equations that can be used for phenotypic ranking of genotypes for HY.
Author Contributions: P.B., J.W., K.S. and G.S. conceived the experiment. A.G. performed the experiment. A.G. and P.B. organised and processed the PhenoRover system and UAS data. K.G. and A.G. developed the R-programmed script and finalised the data analysis. A.G. wrote the paper. P.B., J.W. and K.S. contributed to the overall experimental design and supervision. All authors assisted in editing and improving the paper.
Funding: This research was funded by Agriculture Victoria and Dairy Australia through the DairyBio initiative and The University of Melbourne.