Machine Learning Algorithms to Predict Forage Nutritive Value of In Situ Perennial Ryegrass Plants Using Hyperspectral Canopy Reflectance Data

Nutritive value (NV) of forage is too time-consuming and expensive to measure routinely in targeted breeding programs. Non-destructive spectroscopy has the potential to measure NV quickly and cheaply but requires an intermediate modelling step to interpret the spectral data. Cubist, a machine learning technique new to forage analysis, was used to analyse canopy spectra to predict seven NV parameters: dry matter (DM), acid detergent fibre (ADF), ash, neutral detergent fibre (NDF), in vivo dry matter digestibility (IVDMD), water-soluble carbohydrates (WSC), and crude protein (CP). Perennial ryegrass (Lolium perenne) was used as the test crop. Independent validation of the developed models gave R² values between 0.49 and 0.82 and Lin's concordance values between 0.68 and 0.89. Informative wavelengths for the predictive models were identified for the seven NV parameters. These wavelengths included regions of the electromagnetic spectrum that are usually excluded because of high background variation. However, these regions contain important information, and using them to obtain meaningful signals within the background variation is an advantage for accurate models. Non-destructive field spectroscopy, together with the predictive models, was deployed in the field to measure the NV of individual ryegrass plants. A significant reduction in labour was observed. The associated increase in speed and reduction in cost now make targeting NV in commercial breeding programs feasible.


Introduction
Using hyperspectral sensors in crop research is increasingly common for complex traits that multispectral sensors have failed to describe, because these sensors capture a large amount of information without the need for destructive harvesting [1,2]. Non-destructive measurement removes many time-consuming and costly steps from data capture and analysis. As a result, it is an appealing option for phenotyping, particularly for quantitative traits, the improvement of which [...]. SVM has been used to predict nitrogen uptake, dry matter, and crude protein in grass and clover forage with R² between 0.90 and 0.98 [17]. Another commonly used machine learning algorithm is Random Forest regression, which builds many (often thousands of) regression trees and averages their outputs to predict the dependent variable [20]. This technique has been applied to predict NDF, acid detergent fibre (ADF), and lignin in tropical forage grasses [20].
Cubist is an alternative machine learning technique based on decision trees: the data are partitioned into groups with similar spectral signals and attributes, and a hierarchy of rules determines the partitions [28]. Decision trees work well for simple discrete classification but less well for continuous measures. To address this problem, Cubist uses decision trees that, instead of ending in a binary decision, end in a regression equation [28,29]. Each rule takes the form of a Boolean condition, an action for when the condition is true, and an alternative action for when it is not (if[], then[], else[]). These rules divide the data into similar classes, which can then be analysed more easily with linear regression [29]. Cubist has been shown to be an accurate alternative to PLSR and is well suited to the analysis of hyperspectral datasets [30]. Cubist models have been used successfully in other areas of agriculture and in soil science; however, Cubist has not previously been tried for predicting NV in forage plants from a field-based breeding nursery [30,31]. Cubist models also report the wavelengths used and their percentage of usage in prediction, which makes this technique less of a "black box" than other machine learning options.
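To make the "tree that ends in a regression equation" idea concrete, the sketch below fits a single if/then/else rule: it searches every feature and threshold, fits an ordinary least-squares model on each side of the split, and keeps the split with the smallest total squared error. This is a deliberately minimal model-tree sketch, not Cubist itself; the real algorithm grows and prunes a full rule set, smooths the leaf models, and can add committees and nearest-neighbour corrections (in R, the full algorithm is available in the Cubist package). All function names here are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear model with an intercept; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def fit_single_rule(X, y, min_cases=5):
    """Try every feature j and threshold t; keep the split whose two leaf
    regressions give the smallest total squared error."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            if left.sum() < min_cases or (~left).sum() < min_cases:
                continue
            sse, models = 0.0, []
            for mask in (left, ~left):
                coef = fit_linear(X[mask], y[mask])
                sse += float(np.sum((y[mask] - predict_linear(coef, X[mask])) ** 2))
                models.append(coef)
            if best is None or sse < best[0]:
                best = (sse, j, t, models[0], models[1])
    _, j, t, coef_true, coef_false = best
    return j, t, coef_true, coef_false


def predict_rule(rule, X):
    j, t, coef_true, coef_false = rule
    # if x_j <= t then [regression A] else [regression B]
    return np.where(X[:, j] <= t, predict_linear(coef_true, X), predict_linear(coef_false, X))
```

On data whose response follows one line below a threshold and another line above it, a single split with a regression in each branch recovers the piecewise relationship exactly, which a plain classification tree (constant leaves) cannot do.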
The aims of this study were to: (i) use data mining techniques to extract biophysical parameters of perennial ryegrass from hyperspectral canopy data; (ii) identify specific wavelengths important for modelling NV parameters in perennial ryegrass; (iii) evaluate the predictive ability of Cubist models to analyse NV parameters with an independent dataset; (iv) assess the advantages of the machine learning approach for data analysis as well as potential limiting factors; and (v) demonstrate the use of the developed predictive models to analyse NV parameters from the canopy spectra of a large study population, comprising 2880 spectra (960 plants measured at four harvests).

Study Site
All samples used in this study are from a perennial ryegrass field trial in Hamilton, Victoria, Australia (37.819440 S, 142.062171 E). Fifty experimental varieties of perennial ryegrass were grown as plots of 96 individual plants, with ten replicates of each plot. Spectral measurements from 960 of these plants were collected at four harvest dates over the course of the growing season. At each harvest, a subset of 128 plants was cut immediately after scanning, dried at 60 °C for 48 h, and ground through a 1 mm screen for laboratory-based NIR analysis. Seven nutritive value (NV) parameters were analysed using a Foss XDS® analyser: ash, crude protein (CP), in vivo dry matter digestibility (IVDMD), neutral detergent fibre (NDF), acid detergent fibre (ADF), water-soluble carbohydrates (WSC), and dry matter percentage (DM). Sixty-five plants were discarded from analysis because they had died between measurements or had too little biomass for laboratory analysis. A set of 156 data points from previous harvests of the trial was included to expand the calibration, giving a total of 605 plants. Convex hull and Mahalanobis distance were then used to identify spectral outliers, which were removed if their H value exceeded 0.6; sixty-five samples were excluded in this way [32]. In total, 540 samples with both laboratory results and spectra were used to build and test the NV predictive models. The seven NV parameters were then predicted from 2880 scanned spectra (960 plants measured four times over the growing season).
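The Mahalanobis part of the outlier screen can be sketched as follows: spectra are projected onto a few principal components, and each sample's squared Mahalanobis distance from the centre of the score cloud is computed. The text does not say how its H value is scaled, so this sketch assumes one common convention (squared distance divided by the number of components, which averages close to 1) and leaves the cut-off as a free parameter; the paper's 0.6 threshold only applies under its own scaling. The convex hull step is not shown, and all names are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """Scores of mean-centred spectra on the leading principal components."""
    centred = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(centred, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def h_distance(X, n_components=5):
    """Squared Mahalanobis distance of each sample from the centre of the PCA
    score cloud, divided by the number of components (assumed scaling, which
    makes the average close to 1)."""
    T = pca_scores(X, n_components)
    cov_inv = np.linalg.inv(np.cov(T, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", T, cov_inv, T)
    return d2 / n_components


def keep_mask(X, threshold, n_components=5):
    """True for samples whose H value does not exceed the threshold."""
    return h_distance(X, n_components) <= threshold
```

Working in principal-component space avoids inverting a covariance matrix over hundreds of highly collinear wavelengths, which would be singular or numerically unstable.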

Spectra Collection
The canopy spectra of 960 plants were collected at each harvest date using an ASD® FieldSpec 4 Hi-Res spectroradiometer (Boulder, CO, USA) with a 10° lens and scrambler. Spectra across the visible and near-infrared range (350 nm to 2500 nm) were recorded (Figure 1). For each sample, the spectrum was measured 50 times and averaged. The spectrometer was calibrated after measuring each plot of 96 plants, approximately every 20 min. A light shield was used to reduce background spectral signals from the environment. The shield consisted of a 56 cm tall cylindrical plastic bin with a diameter of 45 cm, painted inside with matte black paint (Black 2.0©), and fitted with three tungsten halogen lights with a spectral range of 300-2500 nm [22]. The light shield was equipped with a sensor holder that ensured the sensor was always perpendicular to the ground and 56 cm from the sample, creating a field of view of 79 cm². A full description of the light shield is provided in Smith et al. 2019. The light shield and halogen lights were used instead of sunlight as the source of irradiance because this method was shown to be more successful for creating predictive models [21,22].
Remote Sens. 2020, 12, 928

Spectra Data Pre-Processing
The software used for pre-processing and model development was R version 3.5.3. Reflectance spectra were trimmed to 400-2450 nm to remove the regions at the ends of the sensor range, which contain a lot of background variation. The spectra were then subsampled to every 5th wavelength. This choice balanced reducing the dimensionality of the data, to prevent overfitting, against retaining enough spectral resolution that important information is not lost; because hyperspectral reflectance data are highly autocorrelated, spectral variance captured at 1 nm resolution should still be present at 5 nm. To improve the signal-to-noise ratio, Savitzky-Golay smoothing was applied with an interval width of 11 nm [33]. To reduce the impact of light scattering, the standard normal variate (SNV) scatter correction was used, which centres and scales each spectrum by its own mean and standard deviation [32].
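The pre-processing chain above (trim, subsample, smooth, SNV) can be sketched in a few lines. The authors worked in R; this Python version is only an illustration, and several details are assumptions: the polynomial order (2) is not stated, the 11 nm smoothing width is treated as an 11-point window on the subsampled grid (the text does not say how the width maps onto 5 nm spacing), and edges are padded by repeating the end values. Function names are hypothetical.

```python
import numpy as np


def savgol_weights(window: int, polyorder: int) -> np.ndarray:
    """Weights of a centred Savitzky-Golay smoothing window (odd length)."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Local polynomial design matrix; row 0 of its pseudo-inverse gives the
    # fitted polynomial value at the window centre (offset 0).
    design = np.vander(offsets, polyorder + 1, increasing=True)
    return np.linalg.pinv(design)[0]


def savgol_smooth(spectra: np.ndarray, window: int = 11, polyorder: int = 2) -> np.ndarray:
    """Smooth each row; edges are padded by repeating the end values."""
    w = savgol_weights(window, polyorder)
    half = window // 2
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    # Smoothing weights are symmetric, so convolution equals correlation here.
    return np.array([np.convolve(row, w, mode="valid") for row in padded])


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum by its own mean and SD."""
    mean = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / sd


def preprocess(wavelengths, spectra, lo=400, hi=2450, step=5, window=11, polyorder=2):
    """Trim to [lo, hi] nm, keep every `step`-th band, smooth, then apply SNV."""
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    wl = wavelengths[keep][::step]
    x = spectra[:, keep][:, ::step]
    return wl, snv(savgol_smooth(x, window, polyorder))
```

For a 1 nm grid from 350 to 2500 nm, this leaves 411 bands from 400 to 2450 nm. SNV is applied last so that each processed spectrum has zero mean and unit standard deviation.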

Splitting Data as Model Calibration and Validation
R version 3.5.3 was also used for data splitting. Conditioned Latin hypercube sampling was used to split the samples with corresponding laboratory results into a calibration set (75%) and a validation set (25%) [32]. The calibration set included 405 samples and the validation set included 135 samples. A processing example of the Cubist model is available upon request from the first author.
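Conditioned Latin hypercube sampling picks a subset so that, for every input variable, each of n equal-probability strata of the population contains about one selected sample. The sketch below implements a simplified version of that idea with a simulated-annealing swap search. Assumptions: it uses only the stratum-occupancy term of the cLHS objective (the full method also matches correlations and handles categorical variables), it is meant to run on a few principal-component scores of the spectra rather than all wavelengths, and the function name and annealing settings are hypothetical. In R, the full method is available in the clhs package; this is not that implementation.

```python
import numpy as np


def clhs_select(Z, n, n_iter=5000, temp=1.0, cooling=0.999, seed=0):
    """Simplified conditioned Latin hypercube sampling (stratum term only).

    Z: (samples x variables) array, e.g. a few principal-component scores.
    n: number of samples to select (e.g. the calibration set size); n < len(Z).
    """
    rng = np.random.default_rng(seed)
    N, k = Z.shape
    # n equal-probability strata per variable, from the population quantiles
    edges = np.quantile(Z, np.linspace(0, 1, n + 1), axis=0)
    strata = np.column_stack(
        [np.searchsorted(edges[1:-1, j], Z[:, j], side="right") for j in range(k)]
    )

    def cost(idx):
        # Ideal Latin hypercube: exactly one selected sample per stratum per variable
        return sum(np.abs(np.bincount(strata[idx, j], minlength=n) - 1).sum() for j in range(k))

    chosen = rng.choice(N, size=n, replace=False)
    current = cost(chosen)
    best, best_cost = chosen.copy(), current
    for _ in range(n_iter):
        candidate = chosen.copy()
        pool = np.setdiff1d(np.arange(N), chosen)
        candidate[rng.integers(n)] = rng.choice(pool)
        new = cost(candidate)
        # Simulated annealing: accept improvements, occasionally accept worse swaps
        if new <= current or rng.random() < np.exp((current - new) / temp):
            chosen, current = candidate, new
            if current < best_cost:
                best, best_cost = chosen.copy(), current
        temp *= cooling
    return np.sort(best)
```

For a 540-sample library, `calibration = clhs_select(scores, 405)` followed by `validation = np.setdiff1d(np.arange(540), calibration)` would give a 405/135 split of the kind described, where `scores` is a hypothetical matrix of principal-component scores.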

Spectral Model Development
Models were developed with Cubist algorithms using pre-processed spectra from the 405 calibration samples (Figure 2).

Model Validation
The model predictions and observed laboratory results were compared, and several validation indices were derived to determine model performance: the mean square error (MSE), which combines the bias and variance of the prediction errors; the root mean square error (RMSE), which depicts model accuracy; Lin's concordance correlation coefficient (LCC); and the coefficient of determination (R²) [34].
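The validation indices can be computed as below. Lin's concordance coefficient is $\rho_c = 2 s_{xy} / (s_x^2 + s_y^2 + (\bar{x} - \bar{y})^2)$, computed here with population (ddof = 0) moments throughout. The text does not specify how R² was computed; this sketch assumes the squared Pearson correlation between observed and predicted values, which is one common choice. The function name is hypothetical.

```python
import numpy as np


def validation_stats(observed, predicted):
    """MSE, RMSE, Lin's concordance (LCC) and R^2 for observed vs predicted values."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    mse = float(np.mean(err ** 2))
    # Lin's concordance: covariance over the sum of variances plus squared mean shift
    s_xy = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    lcc = 2 * s_xy / (obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2)
    # Assumed definition: squared Pearson correlation of observed and predicted
    r = np.corrcoef(obs, pred)[0, 1]
    return {"MSE": mse, "RMSE": float(np.sqrt(mse)), "LCC": float(lcc), "R2": float(r ** 2)}
```

The example of predictions that are all shifted by +1 shows why LCC is reported alongside R²: R² stays at 1 because the points are perfectly correlated, while LCC drops (to 2.5/3.5 ≈ 0.71 for observed values 1 to 4) because the predictions do not fall on the 1:1 line.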

Model Prediction of Nutritive Value (NV)
Once the predictive ability of the models had been assessed, the best models were used to predict the NV parameters for all canopy reflectance scans (2880 spectra from 960 plants).

Model Variable Usage and Importance
Cubist provides wavelength usage statistics, which give the percentage of times a wavelength was used either in a rule condition or in a linear model [28]. Because a rule's conditions include every split on the path leading to it, the usage of a wavelength counts its use in the current split and in any split above it, as well as in the regression models created at each split [28].
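One plausible way to tally such usage percentages is sketched below: each rule records which training cases it covers, which wavelengths appear in its conditions, and which appear in its regression model; a wavelength's usage is then the percentage of training cases covered by at least one rule that uses it. This bookkeeping is an assumption for illustration and may differ in detail from Cubist's own; the `Rule` structure and function name are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Rule:
    covered: np.ndarray        # boolean mask over training cases
    condition_vars: set        # wavelengths tested in the rule's conditions
    model_vars: set            # wavelengths with a term in the rule's regression


def usage_percent(rules, n_cases):
    """Return {wavelength: (% cases via conditions, % cases via models)}."""
    names = sorted(set().union(*(r.condition_vars | r.model_vars for r in rules)))
    out = {}
    for name in names:
        cond = np.zeros(n_cases, dtype=bool)
        model = np.zeros(n_cases, dtype=bool)
        for r in rules:
            if name in r.condition_vars:
                cond |= r.covered
            if name in r.model_vars:
                model |= r.covered
        out[name] = (100 * cond.mean(), 100 * model.mean())
    return out
```

With two rules that each cover half of four cases and both split on 1450 nm, that wavelength reaches 100% condition usage, while wavelengths appearing in only one rule's regression reach 50% model usage.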

Cubist Model Comparison to Partial Least Squares Regression (PLSR) Model
To assess the advantages of the machine learning approach for data analysis, the process was compared with a previously validated traditional approach, partial least squares regression (PLSR). We have previously explored the use of non-destructive spectroscopy to assess NV in forage using PLSR as the intermediate modelling step. That study had very similar methodology, except that the sample size was much smaller and the predictive models were developed with PLSR using the software WinISI®. To compare the predictive ability of Cubist with that of PLSR, the spectra of all 109 samples used in the earlier study were run through the Cubist models to predict the seven NV parameters. Because the earlier PLSR models were developed with far fewer samples, the PLSR models were redeveloped using the same calibration set of 405 samples used for the Cubist models, to make the comparison fairer. The predictions from the Cubist and PLSR models were then compared with the laboratory results for the 109 samples.
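For readers unfamiliar with the PLSR baseline, the sketch below is a plain NIPALS PLS1 regression (one response at a time), the kind of single global linear model that Cubist was compared against. It is not WinISI's modified PLS, nor R's pls package; the function names and the choice of NIPALS are assumptions for illustration. The regression vector is recovered as $\beta = W (P^\top W)^{-1} q$.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                  # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                     # latent score
        tt = t @ t
        p = Xr.T @ t / tt              # X loading
        c = yr @ t / tt                # y loading
        Xr = Xr - np.outer(t, p)       # deflate X and y
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef
```

With as many components as (linearly independent) predictors, PLS1 reduces to ordinary least squares; in practice far fewer components are used on spectra, which is how PLSR copes with collinear wavelengths.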

Descriptive Statistics and Evaluation of Model Performances for Key Nutritive Traits
After accumulating a library of spectra and corresponding laboratory results, Cubist models were created for the seven NV parameters ADF, ash, NDF, CP, IVDMD, WSC, and DM. For the calibration results, the models for all parameters showed moderate to good predictive ability, with R² between 0.60 and 0.82 and LCC between 0.73 and 0.89 (Table 1). In the independent validation, R² ranged from 0.66 to 0.82 and LCC from 0.82 to 0.89 for all parameters except WSC; the WSC model showed the lowest predictive ability, with R² of 0.49 and LCC of 0.68 (Table 1). The variability of the NV parameters in the samples used to build and validate the models is shown in Table 2. The range of predicted NV values (minimum to maximum) was slightly broader than that of the laboratory values, showing that the models were able to extrapolate (Table 2). The spectra of perennial ryegrass are highly variable. This variation comes from many sources: each plant differs in leaf structure, water content, and other biophysical parameters, all of which contribute to the spectral signature [35]. The changing environmental conditions over the growing season also increase spectral variability. Although the spectra are highly variable, the NV parameters do not span a wide range of values (Table 2). This illustrates the challenge of finding spectral responses to biophysical parameters, as these are often less prominent than signals relating to environmental components and the three-dimensional structure of the plant.
Table 1 statistics: mean square error (MSE), root mean square error (RMSE), Lin's concordance correlation coefficient (LCC), and the coefficient of determination (R²), for acid detergent fibre (ADF), ash, dry matter (DM), crude protein (CP), in vivo dry matter digestibility (IVDMD), neutral detergent fibre (NDF), and water-soluble carbohydrates (WSC).
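The range comparison behind Table 2 can be sketched as a summary of the laboratory values and the predicted values, plus the share of predictions that fall outside the laboratory range (i.e., extrapolations). This is an illustrative check only; the function names are hypothetical and no values from the study are reproduced.

```python
import numpy as np


def summarise(values):
    """Minimum, mean and maximum of a set of NV values."""
    v = np.asarray(values, dtype=float)
    return {"min": float(v.min()), "mean": float(v.mean()), "max": float(v.max())}


def share_outside_range(lab_values, predicted):
    """Fraction of predictions below the lab minimum or above the lab maximum."""
    lab = np.asarray(lab_values, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    outside = (pred < lab.min()) | (pred > lab.max())
    return float(outside.mean())
```

A non-zero share indicates extrapolation beyond the calibration range; such predictions deserve extra caution, since the local regressions were never fitted on values that extreme.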

Application of Models for High-Throughput NV Prediction
Comparing the time taken to analyse a single sample with the lab-based and field-based approaches required calculating the average time per sample for each method. The time for lab-based spectroscopy combined identifying and hand-cutting the plants, oven-drying the samples at 60 °C for 48 h, grinding them to a fine powder in a mechanical grinder with a 1 mm screen, and scanning all samples in a lab-based spectrometer. The total time was then divided by the number of samples measured, giving an average of 15 min per sample. The time for field-based spectroscopy combined identifying plants, measuring the reflectance spectra, and running the spectra through the predictive models. Divided by the number of samples analysed, this averaged 30 s per sample, making the field-based approach 30 times faster than the lab-based approach.
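The per-sample averages above imply the following arithmetic. The totals for all 2880 spectra assume that the per-sample averages scale linearly with sample number, which the text does not state explicitly.

```latex
\frac{t_{\text{lab}}}{t_{\text{field}}}
  = \frac{15\ \text{min}}{30\ \text{s}}
  = \frac{900\ \text{s}}{30\ \text{s}}
  = 30

2880 \times 15\ \text{min} = 43\,200\ \text{min} = 720\ \text{h},
\qquad
2880 \times 30\ \text{s} = 86\,400\ \text{s} = 24\ \text{h}
```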

Key Model Drivers for Prediction
The Cubist models produce a list of variable importance, which combines wavelength usage in the rule conditions for data splitting with wavelength usage in the regression models [28]. The percentage usage of wavelengths in this study is given in the Supplementary Materials (Table S1).

Cubist Model Comparison to PLSR Model
The predictions from the Cubist models were compared with the laboratory results, and the PLSR predictions were likewise compared with the laboratory results.
When compared with laboratory results, the NV predictions from Cubist showed consistently stronger regressions than those from PLSR models built with the same data set (Figure 3). The samples used in this comparison came from the same field trial but were measured in an earlier year than all of the samples used in model calibration, suggesting that the Cubist models are robust enough to cover multiple years of analysis.

Data Mining Techniques to Extract Biophysical Parameters of Perennial Ryegrass
This study demonstrates that it is possible to predict NV parameters in large populations of perennial ryegrass grown in natural, outdoor conditions. In validation with samples not included in their calibration, the Cubist models showed moderate to strong predictive statistics for all parameters, with R² between 0.49 and 0.82 and LCC between 0.68 and 0.89 (Table 1). The minimum, maximum, and average values of each parameter were calculated both for the 540 samples with laboratory results and for the 2880 predicted values (Table 2). The predictive models covered the range of NV values included in the calibration and could also extrapolate to higher or lower values where necessary.
As a pipeline for selecting high-NV plants for breeding, this system will be rapid and cost-effective once the initial work of developing the models is complete; however, the initial cost of the equipment and of laboratory analysis for the calibration may still be prohibitive. Portable spectrometers are comparable in cost to lab-based systems and cover a similar range and resolution of wavelengths. The software used to analyse the spectra is open source.

Identify Specific Wavelengths Important for Modelling NV Parameters in Perennial Ryegrass
An advantage of the Cubist model is that it provides the percentage of use for the wavelengths it utilises [30]. This identifies the most important wavelengths for each parameter and, collectively, for NV in ryegrass. The selected wavelengths often came from biophysically meaningful regions of the spectrum, which suggests that the models will be robust for use in other field trials [36]. By routinely identifying the wavelengths important to modelling NV, the parsimonious wavelengths for each parameter can be identified. Knowing the important wavelengths for each parameter, along with their percentage of usage, could support further refinement of the models and the design of sensors with reduced range and resolution. For instance, this information could be used to develop a cheaper, lighter sensor that captures only the parsimonious wavelengths for forage NV. This would have the added advantage of reducing data dimensionality by removing unnecessary wavelengths, diminishing the number of redundant variables in the models [37]. Additionally, Cubist variable importance (percent usage) could potentially be used to develop customised multispectral cameras that capture spectral images of samples at the NV parsimonious wavelengths.
The key model drivers for prediction were varied and ranged across the entire measured spectrum, from the visible range to the long-wave near infrared (for the wavelengths identified, see Supplementary Materials, Table S1). Further work is needed to single out the parsimonious wavelengths for all NV parameters, ensuring that the selected wavelengths are related to chemical bonds within the targeted biophysical parameters; this would help reduce the inclusion of spectral noise in the predictive models and build on previous studies that identified wavelengths important in NV prediction [38][39][40]. The wavelengths identified in this study showed many similarities with wavelengths previously identified in forage studies. For ADF, some of the most important wavelengths for prediction are related to aromatic and aliphatic C-H stretches and to O-H stretches and deformations, which are all found in lignin, cellulose, and hemicellulose [40,41]. Other important wavelengths have previously been identified for ADF in models using stepwise multiple linear regression (SMLR) or MPLS [38][39][40][42][43].
Ash can be more difficult to analyse because the inorganic fraction is often not measured directly; instead, an organic molecule that correlates with the inorganic component is measured. Wavelengths in the visible range of the spectrum likely relate to chlorophyll, whereas wavelengths within the NIR region have been associated with C-H stretches in starch molecules and C-H bends in lignin [39]. Some of the important wavelengths in the ash model have previously been identified by SMLR as important for predicting ash [38]. For IVDMD, some of the key wavelengths have previously been linked to digestibility or are very similar to wavelengths identified by PCA and SMLR analysis of IVDMD in grass silage [39,44,45]. For NDF, important wavelengths were related to the O-H stretch in lignin, O-H deformations in starch, and N-H bends associated with protein. Some of these wavelengths have previously been related specifically to NDF, which is known to correlate with IVDMD in forage [39,41]. Wavelengths in the visible range had previously been identified as important in MLR equations for determining NDF [43].
Some of the wavelengths most important for predicting CP were within the visible range and are related to chlorophyll electron transitions; this may be due to the high protein content associated with chlorophyll [39,46,47]. Predictive wavelengths from the NIR region likely relate to N-H asymmetry in protein and to the second overtone of N-H bends in protein [39,46,47]. Wavelengths in the NIR region between 1475 and 1575 nm are often utilised in protein analysis and relate to the amide I and amide II regions of the spectrum; in this case, these may correspond to the 1545 nm and 1550 nm wavelengths [48]. Wavelengths previously identified by SMLR as important for predicting CP, or nitrogen, in forage were also identified in this analysis [38,49].
Unsurprisingly, many of the wavelengths most important for predicting WSC have been associated with O-H stretches and deformations in sugar and starch [39,46,49]. For DM, some of the most important wavelengths are associated with absorption by the C-H bond in oil molecules, though it is unclear how this may relate to DM. Many other wavelengths selected for DM have been linked to C-H stretches, CH2 bends, and deformations associated with cellulose, sugar, and starch [39,44].

Evaluation of the Predictive Ability of Models Created Using Cubist to Analyze NV Parameters from an Independent Data Set
Splitting the collected samples made it possible to test whether the models could predict samples that had not been included in model training (Figure 1). The Cubist models consistently produced results with stronger correlation to laboratory results than PLSR models for a data set of samples harvested in a different year and from different cultivars of perennial ryegrass (Figure 3). This success is likely due to the machine learning algorithm first separating the data into sets of similar samples. Studies of complex traits often find that a machine learning approach sometimes produces models with better predictive ability than PLSR and sometimes makes no difference. This discrepancy is thought to relate to the type of non-linear relationship targeted and to the quality of the training data [50]. Both machine learning and traditional chemometric techniques have advantages and limitations: PLSR is adversely affected by outliers, whilst machine learning can be prone to overfitting [18]. When seeking the optimal modelling solution for complex traits, for removing high background variation, or both, it is necessary to trial a wide variety of methods and techniques alongside the conventional approaches.

Advantages of the Data Mining Approach for NV Analysis as well as Potential Limiting Factors
Traditional chemometric approaches often include stepwise multiple linear regression (SMLR), PCA, and PLSR [51]. An advantage of SMLR is that it can draw on the entire hyperspectral range; unfortunately, the multicollinearity and spectral overlap of biophysical parameters make SMLR inappropriate for hyperspectral analysis of forage [51]. There is a danger in using too many wavelengths, as the increase in dimensionality causes what is known as the Hughes phenomenon, which diminishes the effectiveness of classifiers [52,53]. PCA and PLSR are often used together in spectral analysis, with PCA used to reduce the dimensionality of the hyperspectral data so that the PLSR model is less prone to overfitting [38]. PLSR and modified PLSR are useful for multivariate regression, explaining the relationship between multiple independent variables and dependent variables [38].
The Cubist model combines decision tree modelling and linear regression in one process: the tree first separates the data into spectrally similar subsets, which makes it more accurate to fit a regression equation to each subset than to fit the data to one global model equation [28]. Another advantage of the Cubist model is its ability to use the entire spectrum rather than discarding regions of high background variation. Areas of high variability, such as the water bands, are often removed from analysis; our previous study found that removing water bands from the PLSR regression produced more accurate models [22]. Hydrogen-oxygen bonds in water show high variation in intensity and wavelength frequency due to the shifting and bending of the molecule [54]. Temperature dramatically changes the absorbance and reflectance of energy in spectral water bands [55]. In laboratory conditions, this phenomenon can be minimised by maintaining a standard temperature, but this is not possible in field conditions [56]. Removing water from the plant tissue can make other spectral features easier to identify, as the complex signal of water molecules can overshadow other biochemical signals [57]. When analysing field spectra, reflectance values between 1800 nm and 1939 nm and between 2430 nm and 2500 nm show high levels of noise associated with water vapour and are often omitted from linear calibration strategies [38]. However, these regions can contain important information relating to biophysical parameters [58]. With the introduction of aquaphotomics, proposed in 2006 by the School of Bio-measurement at Kobe University, Japan, the practice of removing wavelengths relating to water has been called into question [58]. In living tissue, water is the medium in which all other molecules are suspended; the structure of the water molecules responds to the presence of other molecules, and this in turn changes the reflectance spectrum of the water [58].
The differences in spectra associated with water structure may be identified with machine learning techniques, which may overcome the problems of high spectral variation in these regions and contribute to more accurate models for NV in living tissue [23,55].
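The band exclusion that a linear calibration strategy would typically apply can be expressed as a simple wavelength mask over the two noisy regions quoted above. In the Cubist workflow described here these regions were retained (only the ends of the sensor range were trimmed), so this sketch illustrates the conventional alternative rather than the authors' pipeline; the constant and function names are hypothetical.

```python
import numpy as np

# Water-vapour regions of field spectra quoted in the text [38], in nm (inclusive)
WATER_BANDS_NM = ((1800, 1939), (2430, 2500))


def water_band_mask(wavelengths, bands=WATER_BANDS_NM):
    """True for wavelengths outside every listed band, i.e. the bands a linear
    calibration would keep."""
    wl = np.asarray(wavelengths, dtype=float)
    keep = np.ones(wl.shape, dtype=bool)
    for lo, hi in bands:
        keep &= ~((wl >= lo) & (wl <= hi))
    return keep
```

On the 5 nm grid from 400 to 2450 nm (411 bands), the mask drops 33 bands (1800 to 1935 nm and 2430 to 2450 nm) and keeps 378.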
Including a range of NV reference values in the calibration helps to improve the robustness of the models; therefore, the data used in this study were sampled across different seasons to increase the variability of NV results [18]. Selecting appropriate data for model calibration is important to ensure that the sample population is accurately represented by the calibration set, especially for heterogeneous, compositionally complex samples [18]. Simple random splitting of the data does not guarantee appropriate selection, so conditioned Latin hypercube sampling was used [32]. This approach selects calibration samples through multidimensional consideration of the wavelengths, which is important for developing robust models. Conditioned Latin hypercube sampling aims to match the calibration data set to the distribution of the population, so that the calibration data set captures the variability exhibited across the spectra of all samples.

Conclusions
This study demonstrates that it is possible to measure large numbers of individual plants in field conditions by capturing canopy spectra with a portable spectrometer and light shield. The results show that data mining techniques are effective for predicting NV (Table 1) and suggest a pipeline for large-scale NV analysis in the field (Figure 2). The Cubist models were able to extract biophysical parameters of perennial ryegrass growing in a natural, outdoor setting without disturbing the plants. The throughput achieved in sampling a large data set of plants would be useful for selecting individual plants in an NV improvement program. With continued use and extension of the data available to the models, further refinements will be possible and greater accuracy is expected. Overfitting should be mitigated by the anticipated larger data sets, and further by identifying informative bands for modelling nutritive value parameters and ensuring that these bands are attributable to logical, biophysical parameters rather than background variation [52]. This method of analysis made it possible to derive NV results for 2880 samples of perennial ryegrass thirty times as quickly as an analysis of this scale would normally take. Using this protocol to predict forage NV during crossing selection would make targeted breeding for high-nutrition forage possible.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/6/928/s1, Table S1: The wavelengths (nm) identified by Cubist as important for predicting NV parameters. Each wavelength used in prediction is listed along with the parameter it was linked to, its percent usage in the Cubist model, possible biophysical reasons for the wavelength being useful, and references to studies that have also used it. Parameters listed are acid detergent fibre (ADF), ash, dry matter (DM), crude protein (CP), in vivo dry matter digestibility (IVDMD), neutral detergent fibre (NDF), and water-soluble carbohydrates (WSC). References [59][60][61] are cited in the Supplementary Materials.