Prospects for Measurement of Dry Matter Yield in Forage Breeding Programs Using Sensor Technologies

Increasing the yield of perennial forage crops remains a crucial factor underpinning the profitability of grazing industries, and is therefore a priority for breeding programs. Breeding for high dry matter yield (DMY) in forage crops is likely to be enhanced by the development of genomic selection (GS) strategies. However, realising the full potential of GS will require an increase in the amount of phenotypic data and the rate at which it is collected. Phenotyping therefore remains a critical bottleneck in the implementation of GS in forage species. Assessments of DMY in forage crop breeding include visual scores, sample clipping and mowing of plots, which are often costly and time-consuming. New ground- and aerial-based platforms equipped with advanced sensors offer opportunities for fast, nondestructive and low-cost, high-throughput phenotyping (HTP) of plant growth, development and yield in a field environment. The workflow of image acquisition, processing and analysis is reviewed. The “big data” challenges, proposed storage and management techniques, development of advanced statistical tools and methods for incorporating HTP into forage breeding systems are also reviewed. Initial results where these techniques have been applied to forages have been promising, but further research and development are required to adapt them to forage breeding situations, particularly with respect to the management of large data sets and the integration of information from spaced plants to sward plots. Nevertheless, realising the potential of sensor technologies combined with GS would lead to greater rates of genetic gain in forages.


Introduction
Increasing the dry matter yield (DMY) of perennial forages remains a crucial factor underpinning the profitability of grazing industries [1], and is therefore a priority outcome for forage breeding programs. The rate of genetic gain in forage crops is lower (0.25-0.7% per year) than in the main cereal crops (1.3% per year) [2-5]. DMY of perennial forages has high genotypic and phenotypic variability and requires repeated phenotypic assessment, a challenge which is exacerbated by the shortcomings of current phenotypic assessment methodology [6,7]. A typical forage breeding program requires yield to be estimated multiple times throughout the year and on plants sown in various configurations, such as individual plants, rows and sward plots.
Current phenotyping methods lack the resolution for the precise discrimination of genetic effects when the differences between genotypes are small relative to those due to environmental effects. For example, measuring a trait of interest such as DMY is challenging for the following reasons: (1) plants sampled from sections of experimental plots may not be representative, given the genetic variability that can appear among genotypes; (2) mowing entire plots is destructive and time-consuming; (3) harvesting samples, weighing them to determine fresh weight, oven-drying them to determine dry weight and transporting them all add cost. Thus, the development of robust and precise in-field forage phenotyping has the potential to replace destructive and costly phenotyping techniques and to give breeders access to traits that were previously too complex to assess easily [4,13]. The estimation of forage yield is similar to the estimation of biomass, yield components and morphological functions of cereal crops [14,15]. Developing robust and precise high-throughput phenotyping (HTP) methods that can increase the number of samples and record accurate data throughout the life cycle of the crop therefore needs to be a priority for forage breeding.
On the other hand, the development of next-generation sequencing technologies for forage species [16-18], with associated reductions in the cost of genotyping, has enabled the development of genomic selection (GS) strategies for perennial forage species [8,19-21], with the potential to increase the rate of genetic gain and reduce the cost per unit of gain [21]. It is also evident that DMY improvement in ryegrass is achieved not only through genetic factors but also through improved management techniques that allow the expression of the plant's genetic potential [22]. However, realising the full potential of genomic selection will require an increase in the amount and rate of collection of phenotypic data, and thus phenotyping remains a critical bottleneck and impediment to the implementation of GS in forage species such as perennial ryegrass [21,23].
Limited attention to developing field-based, high-throughput phenotyping of forage crops has restrained the progress of '-omics' technologies for forage improvement [24]. However, a review [25] indicated the potential, opportunities and challenges of sensor-based HTP tools for improving forage DMY, quality, and drought and disease tolerance. Other authors indicated the possibility of using sensor-based HTP to determine perennial ryegrass DMY and the plant base area of regrowth after cutting [26,27]. These methods have been validated in experiments with small populations but have not been optimised to screen larger field trials. Recently, various sensors, vehicle platforms, data loggers and real-time kinematic global navigation satellite systems (RTK-GNSS), along with processing pipelines, have been used to capture phenotypic data in the field from single plants, rows and sward plots of 270,000 perennial ryegrass genotypes [12]. Most of these have been demonstrated in the context of breeding programs, indicating the potential of utilising sensor-based technology for DMY phenotyping in forage improvement programs.
This review is structured in four sections. Firstly, we review the status of phenotyping forage DMY. Secondly, we discuss the advantages of advanced sensors deployed on ground and aerial platforms for evaluating forage plant traits. The HTP platforms involve the deployment of proximal sensors with high-resolution imaging technologies accompanied by automation and precision to measure plant performance in large populations established as individual plants, rows, plots or swards. Thirdly, we review the "big data" challenges resulting from HTP, storage methods and management. Finally, we summarise the workflow of image processing and analysis as well as a method for modelling data postprocessing techniques.

Current Phenotyping Status of Forage Dry Matter Yield and Yield Components
Forage DMY is expressed as the amount of dried biomass obtained after clipping the fresh forage and removing its moisture [28]. Methods of assessing forage DMY can be grouped into direct and indirect measurement techniques. Direct methods, which include sample clipping and mowing plots, are widely used in breeding trials as a standard method to evaluate and phenotype individual plants, rows and sward plots in forage crops [10,11,29-31]. Indirect methods include the rising plate meter, ruler, sward stick, visual scoring and others, which are used to predict DMY, plant growth rate, plant height, number of tillers, leaf dimension and above-ground crown volume/density [6,32-34].
In the indirect methods of estimating DMY, calibration is made by developing empirical relationships between indirectly measured and actual sample values. Most indirect methods use inexpensive instruments and are nondestructive, and are thus widely applied to estimate DMY in various pasture simulation and grazing experiments [34-38]. Although these techniques are often easy to apply, estimates of DMY are inconsistent across species. For instance, plant density measured using a plate meter showed a low correlation (r = 0.21 to 0.41) with biomass across four different seasons in semiarid grassland [39], whereas plant height measured using a plate meter showed a high correlation (r = 0.79 to 0.94) with DMY for tall fescue pasture [40].
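The calibration step described above can be illustrated with a minimal sketch, assuming a simple linear relationship between a plate-meter reading and clipped DMY. The function name and the paired values are hypothetical and are not taken from any of the cited studies; real calibrations may need to be species- and season-specific.

```python
from math import sqrt


def fit_calibration(indirect, measured):
    """Fit measured ~ slope * indirect + intercept by least squares.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(indirect)
    if n != len(measured) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(indirect) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in indirect)
    syy = sum((y - mean_y) ** 2 for y in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(indirect, measured))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)


# Hypothetical calibration cuts: plate-meter height (cm) vs clipped DMY (kg/ha).
heights_cm = [5.0, 8.0, 10.0, 12.0, 15.0]
dmy_kg_ha = [950.0, 1450.0, 1900.0, 2350.0, 2850.0]

slope, intercept, r = fit_calibration(heights_cm, dmy_kg_ha)

# Predict DMY for a new, unclipped plot from its plate-meter reading of 11 cm.
predicted_dmy = slope * 11.0 + intercept
print(f"slope={slope:.1f} intercept={intercept:.1f} r={r:.3f} predicted={predicted_dmy:.0f}")
```

The correlation returned here is the same kind of r value quoted for the plate-meter studies, which is why a calibration fitted in one species or season should be re-checked before it is reused in another.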

Sensors
The typical sensors used for ground-based, high-throughput phenotyping (GB-HTP) include passive sensors (e.g., red-green-blue, hyperspectral, fluorescence and thermal) [40-43] and active sensors (e.g., GreenSeeker, ultrasonic sonar and LiDAR laser scanner [44-46], among others) [4,47]. Active sensors possess a light- or sound-emitting unit, whereas passive sensors use ambient sunlight as a light source. As a result, active sensors are independent of differences in radiation and can operate in cloudy weather conditions. Active sensors often possess a narrow field of view (FOV), whereas passive sensors are less limited in FOV, depending on the distance of the light source from the target objects. Unlike active sensors, images obtained from passive sensors require radiometric calibration, which adds steps to data processing.

Visible Digital Imaging
Visible digital imaging systems are mainly red-green-blue (RGB) sensors that use wavelengths ranging from 400 to 700 nm to capture two-dimensional (2D) images in the red (630-690 nm), green (510-580 nm) and blue (450-495 nm) bands [40,48]. RGB images require adjustment and correction of image brightness, as well as 3D structure reconstruction, followed by geometric and radiometric calibrations to minimise image variability and to capture information from overlapping plant parts. There have been several reports of the estimation of DMY from RGB imaging systems. (1) RGB images can be analysed with commercial software to count the number of plant pixels and compare it with the actual weight measured [26]. In this experiment, image analysis algorithms were developed in MATLAB to estimate DMY, with significant correlations of r = 0.74 and r = 0.93 with visual scores and measured DMY, respectively. RGB image analysis methods were also used to estimate plant morphological parameters, such as the base area and tiller number of individual plants, to understand the growth rate between two cuts of a perennial ryegrass field trial [27]. (2) Images from RGB cameras can be used to calculate a vegetation index (VI) and correlate it with measured DMY. VIs are spectral transformations of two or more bands designed to give a qualitative or quantitative measure of vegetation properties [49]. Limited work has been done so far on extracting VIs from RGB images, but results have shown potential for estimating biomass [50]. (3) High-resolution digital surface models generated from RGB images through height maps can also be used to estimate DMY from plant height correlations. The retrieval of plant height from RGB images is a relatively new method of automated height measurement [51]. Overlapping images acquired by an unmanned aerial system (UAS) can be merged and processed using structure-from-motion algorithms to create a 3D point cloud using commercial software (i.e., Pix4D Mapper Pro, Lausanne, Switzerland, http://pix4d.com) [52]. The resulting point clouds are mostly saved as TIFF/CSV files so that digital surface model (DSM) information is available for export. A study describing the automation of perennial ryegrass height estimation using the DSM technique indicated a high correlation (R² = 0.54-0.63) between a 'herbometre' (a tool similar to a rising plate meter [38]) and the DSM, offering an alternative option for automating ryegrass plant height estimation [53].
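As an illustration of how plant height might be derived from a DSM, the sketch below subtracts a ground elevation grid from the surface elevation grid and summarises each plot by a high percentile. This is an assumption made for illustration: the cited studies do not state how ground level was obtained, and the grids, the `dtm` ground model, the plot boundaries and the 90th percentile are all hypothetical.

```python
import math


def canopy_height_model(dsm, dtm):
    """Per-cell canopy height (m): surface elevation minus ground elevation, floored at 0."""
    if len(dsm) != len(dtm) or any(len(s) != len(g) for s, g in zip(dsm, dtm)):
        raise ValueError("DSM and DTM grids must have the same shape")
    return [[max(s - g, 0.0) for s, g in zip(srow, grow)]
            for srow, grow in zip(dsm, dtm)]


def plot_height(chm, rows, cols, percentile=90):
    """Summarise canopy height inside a rectangular plot (nearest-rank percentile)."""
    cells = sorted(h for row in chm[rows] for h in row[cols])
    if not cells:
        raise ValueError("plot contains no cells")
    rank = max(1, math.ceil(percentile / 100 * len(cells)))
    return cells[rank - 1]


# Hypothetical 2 x 4 elevation grids (m above datum); the ground is flat at 100 m.
dsm = [[100.10, 100.25, 100.05, 100.30],
       [100.20, 100.15, 100.00, 100.35]]
dtm = [[100.0] * 4, [100.0] * 4]

chm = canopy_height_model(dsm, dtm)
left_plot = plot_height(chm, slice(0, 2), slice(0, 2))   # columns 0-1
right_plot = plot_height(chm, slice(0, 2), slice(2, 4))  # columns 2-3
print(f"left plot: {left_plot:.2f} m, right plot: {right_plot:.2f} m")
```

A high percentile is used here rather than the maximum so that a single noisy cell does not set the plot height; the appropriate summary statistic would need to be validated against a reference such as the rising plate meter.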

Multispectral and Hyperspectral Imaging
Multispectral and hyperspectral imaging technologies are the most common HTP tools and may be deployed on ground or aerial platforms. Multispectral imaging captures image data from a few specific narrow spectral bands (usually fewer than 10). Some multispectral imaging systems can capture information in the RGB and near-infrared spectrum simultaneously. Hyperspectral imaging captures data from a broader range of spectral regions (400-2500 nm, usually more than 10 narrow bands). The use of continuous bands can be beneficial in using high-resolution images to describe the crop canopy in a more discrete and detailed manner.
The use of hyperspectral imaging systems to collect data across a broad range of wavelengths could have the advantage of refining old spectral indices and developing new VIs that may achieve higher correlations with morphological traits and DMY. Hyperspectral imaging systems capture images using either push-broom or snapshot sensors. Push-broom systems use a 2D detector array that records the full spectrum simultaneously while scanning spatial lines over time. Push-broom sensors collect spectral information on moving platforms in the form of scanned images or multiple points of the fibre optic spectrometer. The drawback of this sensor is that it must move along a platform (e.g., a conveyor belt), and the movement of the platform may cause noise. Snapshot sensors are nonscanning sensors in which all the data for an object can be captured in a single 3D data cube integration [54]. Snapshot sensors can avoid artefacts and noise since no movement is required while capturing the spectrum. Snapshot techniques, on the other hand, may not filter information, and thus image noise could be included. Generally, hyperspectral imaging systems are expensive and require technical knowledge to process the data. Therefore, applying machine learning and artificial intelligence to utilise all the spectral information, rather than simple VIs, could help to refine new VIs that achieve higher correlations with morphological traits.
Several VIs from multispectral and hyperspectral imaging systems have been calculated and used for nondestructive estimation of plant biomass [55-57]. For instance, the normalised difference vegetation index (NDVI) quantifies vegetation density, greenness and health as the ratio of the difference between reflectance in the near-infrared and red regions to the sum of reflectance in the red and near-infrared regions [58]. The potential and limitations of NDVI for nondestructive forage DMY estimation are known [59-63]. In particular, at high canopy density NDVI may saturate and become less able to discriminate differences in biomass yield [64-67]. This limitation can be alleviated in several ways. (1) Narrowband vegetation indices, such as those based on red-edge spectral reflectance, can estimate biomass better than NDVI alone, with less saturation at high biomass levels [68]. (2) NDVI can be combined with other biophysical parameters (e.g., plant height) to overcome saturation and achieve accurate estimation [67,69-71]. The green-red normalised difference vegetation index (GRNVI) has also been shown to correlate positively with the biomass of corn, alfalfa and soybeans [72]. (3) There is increasing interest in using hyperspectral data to overcome the effect of saturation at higher DMY. Several narrow-band spectral signatures obtained from hyperspectral images may have the potential to reveal subtle variations in reflected energy so that canopy differences can be detected. This may lead to the introduction of new indices that overcome the issues associated with the saturation of NDVI in estimating DMY. The application of machine learning (ML) algorithms to refine and validate new VIs that perform well at any growth stage may make a great contribution in this regard [73].
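The NDVI definition above, (NIR - red) / (NIR + red), can be written as a short sketch. The reflectance values below are hypothetical and are chosen only to illustrate the saturation behaviour described in the text: the index rises steeply from a sparse to a dense canopy and then changes little as the canopy becomes denser still.

```python
def ndvi(nir, red):
    """Normalised difference vegetation index from NIR and red reflectance (0-1)."""
    total = nir + red
    if total == 0:
        raise ValueError("NDVI is undefined when NIR + red reflectance is zero")
    return (nir - red) / total


# Hypothetical plot-mean reflectances as (NIR, red); not measured values.
canopies = {
    "sparse": (0.30, 0.10),
    "dense": (0.50, 0.04),
    "very dense": (0.55, 0.03),
}

ndvi_by_canopy = {name: ndvi(nir, red) for name, (nir, red) in canopies.items()}
for name, value in ndvi_by_canopy.items():
    print(f"{name}: NDVI = {value:.3f}")
```

With these illustrative numbers the step from dense to very dense is much smaller than the step from sparse to dense, which is the pattern that makes NDVI alone a weak discriminator at high biomass.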

Ultrasonic Sonar
Ultrasonic sonar, also called ultrasonic sonar height (USH) [74], measures the distance from a sensor to the plant canopy by detecting the time delay of sound echoes returning to the sensor from plant surfaces. Ultrasonic sonar enables a significant improvement in sampling time and sample size compared to the manual measurement of canopy height. USH has been used for height measurement in cotton [75,76], blueberry [77,78], wheat [70,79], barley [38] and pasture grasses and legumes [35,80]. The use of ultrasonic sensors for plant height estimation may, however, be affected by the angle of divergence of the sound beam (i.e., the field of view becomes larger as the distance between the sensor and the targeted canopy increases, and smaller as that distance decreases). This results in reduced measurement accuracy. Similarly, the use of sonar estimates of height for biomass prediction may be affected by canopy structure [81], depending on the number and dimensions of the single leaves or groups of leaves that generate echoes [82].
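The time-of-flight principle and the effect of beam divergence can be sketched as follows. This is a simplified illustration, not the processing used by any specific sensor: the speed of sound uses a common linear approximation for air, and the mounting height, echo delay and 15 degree beam angle are hypothetical.

```python
import math


def speed_of_sound(air_temp_c):
    """Approximate speed of sound in air (m/s); a common linear approximation."""
    return 331.3 + 0.606 * air_temp_c


def canopy_height_from_echo(echo_delay_s, sensor_height_m, air_temp_c=20.0):
    """Canopy height = sensor mounting height minus the one-way echo distance."""
    distance_m = speed_of_sound(air_temp_c) * echo_delay_s / 2.0  # out and back
    return max(sensor_height_m - distance_m, 0.0)


def footprint_diameter(distance_m, beam_angle_deg):
    """Diameter of the sensed area for a conical beam with the given full angle."""
    return 2.0 * distance_m * math.tan(math.radians(beam_angle_deg) / 2.0)


# Hypothetical reading: sensor mounted 1.0 m above the soil, echo returns after 5 ms.
height_m = canopy_height_from_echo(0.005, sensor_height_m=1.0)

# The sensed footprint grows in proportion to the sensor-to-canopy distance.
footprint_near = footprint_diameter(1.0, beam_angle_deg=15.0)
footprint_far = footprint_diameter(2.0, beam_angle_deg=15.0)
print(f"height={height_m:.3f} m, footprint at 1 m={footprint_near:.2f} m, at 2 m={footprint_far:.2f} m")
```

Because the footprint doubles when the distance doubles, a sensor mounted high above a short canopy averages echoes over a wider area, which is one way the divergence effect described above can reduce accuracy.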

Light Detection and Ranging (LiDAR)
Light Detection and Ranging (LiDAR) uses laser light to measure the shortest distance from the LiDAR scanner (light source) to the target object by analysing the time of emission and the time at which the reflected light is detected [83,84]. LiDAR technology provides an option to construct 3D point cloud data for filtering and segmenting canopy volume as well as for modelling plant architecture [85-87]. In recent years, LiDAR sensors have been used by researchers for the high-resolution and accurate high-throughput phenotyping of a range of forest and cereal crops. LiDAR can rapidly estimate canopy height, width, volume and other structural parameters [45,88,89]. The application of LiDAR covers a diverse range of crops, from fruit trees and forests [90-94] to field crops and pasture grasses [95-98]. In perennial ryegrass, ground-based LiDAR was used to measure variation among 12 cultivars with high accuracy of fresh and dry biomass estimation (R² = 0.76-0.78) [99]. However, only 12 cultivars in 30 rows were evaluated in that study using a ground-based platform, and the simultaneous measurement of thousands of plots and cultivars will require the development of aerial LiDAR systems [100].
LiDAR was used in forage management systems with Miscanthus giganteus Keng under static and dynamic modes to measure plant height [101]. The authors reported that height measured using LiDAR agreed with manual measurements, with an error of 4.2% for the static mode and 3.8% for the dynamic mode. Plant height measured using LiDAR also correlates well with biomass for some forage crops (e.g., tall fescue), indicating the potential application of LiDAR to estimate biomass per unit area [69]. LiDAR measurements can also be combined with data from other sensors to improve biomass estimation; for instance, plant height measured using LiDAR can be combined with NDVI (height × NDVI) [69].
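Two quantities mentioned above can be sketched in a few lines: the percentage error of sensor heights against manual heights, and the combined height × NDVI predictor. The error definition (mean absolute percentage error) is an assumption, since the cited study's exact formula is not given here, and all values are hypothetical.

```python
def mean_percent_error(sensor_heights, manual_heights):
    """Mean absolute error of sensor heights as a percentage of manual heights."""
    if len(sensor_heights) != len(manual_heights) or not manual_heights:
        raise ValueError("need equal-length, non-empty height lists")
    errors = [abs(s - m) / m for s, m in zip(sensor_heights, manual_heights)]
    return 100.0 * sum(errors) / len(errors)


def height_ndvi_index(height_m, ndvi):
    """Combined structural and spectral predictor: height multiplied by NDVI."""
    return height_m * ndvi


# Hypothetical LiDAR and manual plant heights (m) for three plants.
lidar_heights = [1.02, 1.96, 2.55]
manual_heights = [1.00, 2.00, 2.50]
error_pct = mean_percent_error(lidar_heights, manual_heights)

# Two hypothetical plots with the same (saturated) NDVI but different heights.
short_plot = height_ndvi_index(0.20, 0.85)
tall_plot = height_ndvi_index(0.40, 0.85)
print(f"error={error_pct:.1f}%, short={short_plot:.2f}, tall={tall_plot:.2f}")
```

The second part shows why the product can help: when NDVI no longer separates two dense plots, the height term still does.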
LiDAR is an existing technology that can fulfil the current phenotyping demands of perennial ryegrass and other forage breeding programs, but its expense limits its application. Installations can cost up to several thousand dollars (though this is decreasing), depending on the complexity, performance parameters [102] and technical demands [4,103] of the technology. Most LiDAR systems cannot measure plant physiological attributes. Therefore, combining georeferenced LiDAR data with data from other sensor types that measure physiological properties may be required to evaluate overall plant growth performance in the field. However, it is sometimes challenging to combine data from different sensors with various temporal and spatial resolutions and units of measurement [104].

Ground-Based Platforms
A platform is a facility or physical carrier on which proximal remote-sensing sensors are mounted for the acquisition of phenomic data (Figure 2). Platforms are important for increasing and accelerating the capture of phenotypic information so that genotypic correlations can be made under diverse environmental conditions. Over recent years, several ground-based, high-throughput phenotyping (GB-HTP) platforms (both automated and semiautomated) have been developed and deployed for phenotyping various crops, including maize, wheat, Bermuda grass and lucerne [68,103,105]. These platforms carry sensors mounted on bicycles, robots and other vehicles (e.g., tractors and all-terrain vehicles). The vehicles often deploy multiple sensors that enable data capture from plots, rows and individual plants [46,106,107]. Moreover, GB-HTPs typically have a higher payload, which allows many sensors to be carried with relatively little additional data postprocessing [46,106,107]. However, GB-HTPs require a longer time for data capture than unmanned aerial systems. The vehicles can cause soil compaction through repeated traversing. Environmental conditions (such as temperature, light and wind) can also vary from the start to the end of the data capture period, which makes thermal imaging systems inapplicable. There is also less precision in the output data due to speed variations, and data cannot be collected from waterlogged and rough surfaces.

Aerial Platforms
Aerial platforms include various UAS types carrying different sensors, including digital cameras, multispectral and hyperspectral imaging systems and LiDAR (Figure 3) (for detail on the integration of sensors with UASs, see Tables 2 and 3 in the reviews by [108] and [109], respectively). UASs have the potential to rapidly measure ground cover, plant height, biomass and leaf area index [110-115]. Satellites are another aerial platform that can be used for phenotyping, with the advantage of covering large areas at a time. However, satellites mostly have lower temporal and spatial resolution than UASs. It is also challenging to obtain satellite imagery at frequent time intervals without cloud interference. Furthermore, UASs cost less to buy and deploy than high-resolution satellites cost to buy and launch [116]. UASs fly at a lower altitude than satellites (even < 100 m), allowing reasonable resolution, and can cover larger areas in a limited timeframe than GB-HTP. UASs are affordable and available for repeated data collection without impacting the soil or plants [117,118].

Limitations related to UAS phenotyping include operating safety regulations that limit payload and mode of operation [107]. Some countries have restrictive flight operation rules regarding payload and flight specifications, and this limits the application of UASs [116]. As a result, some researchers still prefer to invest in ground vehicles to increase the payload, even though these are usually more expensive than UASs. Recently, however, UAS aviation regulation authorities have started to relax their rules. For instance, the Australian Civil Aviation Safety Authority (CASA) recently reduced licensing costs and introduced less strict legal and operational conditions for UASs [119].
In summary, the development of GB-HTP and aerial-based phenotyping platforms for forage breeding is promising and can provide reliable, nondestructive, cost- and time-efficient phenotyping methods for economically important traits. However, further research is required before these methods become routine. Calibration models need to be built that are robust across different environmental conditions and seasons to ensure applicability in multiharvest, multienvironment breeding programs. Moreover, sensors generate large amounts of data, and current processing methods often require manual interfacing and interaction from skilled operators [109,120]. The cost of data processing and analysis also needs to be considered when developing phenotyping platforms, and the automation of data extraction, analysis and interpretation is likely to aid the routine application of these methods in commercial programs. Image processing, data management techniques, statistical analysis and the integration of sensor data are covered in the following sections.

Image Processing
Sensor-based phenotyping platforms capture images through the interaction of electromagnetic radiation with the plant organ. Reflectance values vary with the wavelength of the radiation and the chemical composition of the plants, and much of the captured data requires calibration, validation and a standardised data-processing pipeline to produce quantitative phenotypic values. However, noise and variation in sensor position during image capture can make raw images difficult to standardise. Thus, sensor calibration before data collection and image correction during preprocessing influence the final accuracy of the quantitative phenotypic data. In addition, the captured and preprocessed image data require specialised image-processing software and machine-learning algorithms, which makes it challenging to put a pipeline in place and extract biologically meaningful quantitative data easily [116]. Many general image-processing software and hardware solutions are available for phenotyping images from GB-HTP [121-124]. Most image-processing software requires a processing workflow that includes image segmentation, classification, image calibration (geometric and radiometric) and feature extraction in the form of geographic information system (GIS) deliverable data.

Geometric Calibrations
Geometric distortion of the original camera images is common in remotely sensed data due to altitude differences between the camera and the target position on the ground. The traditional geometric calibration method is based on ground control points (GCPs) at the experimental site. GCPs are visible marks on the ground and are mostly positioned with high-precision RTK global positioning systems [125,126]. Geometric calibration with the manual matching of GCPs can be time-consuming but correlates well with onboard RTK positioning [127]. It is also possible to perform geometric calibration without GCPs, using repeated observation and data from low-precision positioning and orientation systems [108,127]. The limitation of this method is that errors may be high, and precision can be reduced in small field trials or in measurements on individual plants.
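One simple way to quantify the positional error discussed above is to compare the surveyed coordinates of GCPs with the coordinates at which the same marks appear in the georeferenced orthomosaic. The sketch below computes the horizontal root mean square error; the function name and coordinates are hypothetical, and it assumes both sets are expressed in the same projected coordinate system in metres.

```python
from math import hypot, sqrt


def gcp_rmse(surveyed_xy, mapped_xy):
    """Horizontal RMSE (m) between surveyed GCPs and their positions in the orthomosaic."""
    if len(surveyed_xy) != len(mapped_xy) or not surveyed_xy:
        raise ValueError("need equal-length, non-empty coordinate lists")
    squared = [hypot(mx - sx, my - sy) ** 2
               for (sx, sy), (mx, my) in zip(surveyed_xy, mapped_xy)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical GCPs: RTK-surveyed positions and where they appear in the mosaic (m).
surveyed = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0)]
mapped = [(0.03, 0.04), (10.0, 0.0), (9.94, 5.08)]

rmse_m = gcp_rmse(surveyed, mapped)
print(f"horizontal RMSE = {rmse_m * 100:.1f} cm")
```

Whether a given error is acceptable depends on the size of the experimental unit: an error of a few centimetres matters far more when extracting values for individual spaced plants than for large sward plots.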

Radiometric Calibrations
Sensor-captured images require calibration and correction to minimise variation due to changes in brightness, cloudiness or the temperature of the reflecting surface during imaging. This correction involves matching a calibration reflectance panel (e.g., MicaSense Inc., Seattle, WA, USA) that has different reflectance percentages to the orthomosaic images [117]. Several approaches to the use of radiometric calibration panels have been applied, depending on the data acquisition and extraction methods [117,126,128,129], to calibrate acquired images to known standard reflectance values, in contrast to the well-adopted methodology used with satellite imagery [130].
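A common way to use panels of known reflectance is an empirical line fit, sketched below under stated assumptions: a linear relationship between raw pixel values (digital numbers) and reflectance is fitted from the panel pixels and then applied to every pixel in the same band. The panel values and function names are hypothetical, and this is not necessarily the procedure used in the cited studies.

```python
def fit_empirical_line(panel_dn, panel_reflectance):
    """Least-squares fit of reflectance = gain * DN + offset from calibration panels."""
    n = len(panel_dn)
    if n != len(panel_reflectance) or n < 2:
        raise ValueError("need at least two panels with known reflectance")
    mean_dn = sum(panel_dn) / n
    mean_rf = sum(panel_reflectance) / n
    sxx = sum((d - mean_dn) ** 2 for d in panel_dn)
    if sxx == 0:
        raise ValueError("panel digital numbers must differ")
    sxy = sum((d - mean_dn) * (r - mean_rf)
              for d, r in zip(panel_dn, panel_reflectance))
    gain = sxy / sxx
    return gain, mean_rf - gain * mean_dn


def to_reflectance(dn, gain, offset):
    """Convert a raw digital number to reflectance, clipped to the range 0-1."""
    return min(max(gain * dn + offset, 0.0), 1.0)


# Hypothetical panels: mean DN in the image and the panel's known reflectance.
panel_dn = [20.0, 80.0, 200.0]
panel_reflectance = [0.05, 0.20, 0.50]

gain, offset = fit_empirical_line(panel_dn, panel_reflectance)
canopy_reflectance = to_reflectance(120.0, gain, offset)
print(f"gain={gain:.4f}, offset={offset:.4f}, canopy reflectance={canopy_reflectance:.2f}")
```

The fit has to be repeated per band and per flight, because illumination changes between flights alter the relationship between digital numbers and reflectance.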

Segmentation
Image processing starts with segmentation and classification of raw images into background (soil, debris and so forth) and foreground (mainly plants and plant parts) ([131], Figure 4). Segmentation involves training and validating software to partition digital images, so that it can group pixels into plants and background based on geometry, texture, intensity and colour [131,132]. Alternatively, raw images can be transformed into another domain, such as hue, saturation, value (HSV), or processed with Canny edge detection and the Zhang-Suen thinning algorithm to segment plants from the soil and to recover plant structure and geometry [133,134]. Image-processing software (e.g., Pix4D, eCognition Developer 9, Trimble, Munich, Germany) is used to create georeferenced orthomosaic photos and then to erase the background with an object identification application so that only the plant features remain.
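The HSV route can be sketched with a simple threshold rule: a pixel is labelled as plant when its hue falls in a green range and it is sufficiently saturated and bright. The thresholds below are illustrative assumptions that would need tuning per sensor and lighting condition, and the function names are hypothetical; trained classifiers as described above would replace this rule in practice.

```python
import colorsys


def is_plant(rgb, hue_range=(0.17, 0.45), min_sat=0.25, min_val=0.15):
    """Label one 8-bit RGB pixel as plant (True) or background (False)."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, sat, val = colorsys.rgb_to_hsv(r, g, b)
    return hue_range[0] <= hue <= hue_range[1] and sat >= min_sat and val >= min_val


def segment(image):
    """Return a binary mask (1 = plant, 0 = background) for a nested list of RGB pixels."""
    return [[1 if is_plant(pixel) else 0 for pixel in row] for row in image]


def cover_fraction(mask):
    """Fraction of pixels labelled as plant."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Tiny assumed image: green leaf, brown soil, grey debris, dark green leaf.
image = [
    [(60, 140, 50), (120, 90, 60)],
    [(128, 128, 128), (20, 60, 25)],
]

mask = segment(image)
cover = cover_fraction(mask)
```

The resulting mask can be passed to later steps so that indices are computed over plant pixels only.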

Feature Extraction
Quantitative extraction of target features from processed images involves object identification, image colour combination, geometry and texture, as well as local and global 3D point cloud creation [135]. The extracted features are mainly used to identify plant organs and to characterise them. Feature extraction produces a series of secondary traits, such as plant biophysical indices, which can be obtained from the orthomosaic image as point files (shapefile and TIFF files) and are used for the prediction of primary traits. Plant indices can then be calculated and processed for single plants, rows or plots based on user-defined georeferenced areas, allowing quantitative measurement of various biophysical traits (Figure 4).
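The per-plot calculation of a plant index can be sketched as follows, using the normalised difference vegetation index (NDVI) as one example of a biophysical index. The sketch assumes calibrated near-infrared and red reflectance rasters, a plant mask from segmentation, and plots given as pixel bounding boxes rather than georeferenced polygons; all names and values are hypothetical.

```python
def ndvi(nir, red):
    """Normalised difference vegetation index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def plot_mean_index(nir_band, red_band, mask, plot):
    """Mean NDVI over the plant pixels inside one plot.

    plot is (row_start, row_end, col_start, col_end), with end indices exclusive.
    Returns None when the plot contains no plant pixels.
    """
    row_start, row_end, col_start, col_end = plot
    values = [
        ndvi(nir_band[r][c], red_band[r][c])
        for r in range(row_start, row_end)
        for c in range(col_start, col_end)
        if mask[r][c]
    ]
    return sum(values) / len(values) if values else None


# Assumed 2 x 4 reflectance rasters and plant mask covering two adjacent plots.
nir_band = [[0.5, 0.5, 0.4, 0.3], [0.5, 0.3, 0.4, 0.3]]
red_band = [[0.1, 0.1, 0.1, 0.2], [0.1, 0.2, 0.1, 0.2]]
mask = [[1, 1, 1, 0], [1, 0, 1, 0]]
plots = {"P1": (0, 2, 0, 2), "P2": (0, 2, 2, 4)}

plot_index = {
    name: plot_mean_index(nir_band, red_band, mask, bounds)
    for name, bounds in plots.items()
}
```

The same loop applies to single plants or rows by changing the user-defined areas, which is how one orthomosaic can yield secondary traits at several levels.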

"Big Data" Storage, Management and Challenges
High-throughput phenotyping produces large volumes of data (big data) through sensor-based imaging and scanning. Data are considered 'big' when they become hard to process even with existing powerful processing tools [136]. Most data from large field trials incorporate multisensor images and scans amounting to hundreds or thousands of gigabytes. It is therefore sometimes challenging to remove noise from, process and analyse big data at the same time. The effectiveness of big data processing and analysis can be improved by developing pipelines and scripts, in combination with commercially available software packages, that automate quantitative data extraction within a shorter time frame. The main problem with big data processing comes from calibration effects (internal sensor calibration, deployment calibration and environmental noise calibration). One option is to be selective about the information computed from sensors, avoiding the low efficiency caused by non-trait-related spectral ranges. The other option is to move away from simple indices and have the software build predictive models using all of the captured information.
To meet the demand for proper big data acquisition, storage and backup for reanalysis, global data management and support tools are critical [137]. For instance, global data management and support tools like the "Minimum Information About a Plant Phenotyping Experiment" (MIAPPE) can be developed to store, protect and allow the retrieval of data [138,139]. Data stored in such systems can also be freely downloaded by scientists for validation, further analysis, the design of new experiments and decision-making. Scientists, private funders and the public sector must agree on the standards and essential requirements that determine which phenotypic information is included in these systems. Funding and institutionalisation of a global, sensor-based phenotypic data management system will therefore help scientists to connect advances in genomics with the new phenotypic metadata and accelerate breeding [41,46,140].

Statistical Modeling
Applying HTP platforms to phenotype breeding trials at large scale presents challenges in data processing, handling and storage, as well as in the application of suitable statistical methods and their interpretation. Statistical and mathematical methods are required to analyse data from HTP platforms. The choice of method ranges from simple regression and partial least squares to advanced ML tools, depending on the objective and the hypothesis being tested. Several studies have applied regression models to quantify the relationship between measured biomass and spectral reflectance data [50,141,142]. These studies used regression models with a limited number of features, and noise from atmospheric effects, temporal effects and sensors limits the interpretation of the data. The models also tend to describe the relationship at a specific harvest rather than to serve as predictive models. ML techniques have evolved to deal with the noise in remotely sensed data [143]. ML involves developing quantitative predictive models, splitting data sets into training, validation and testing subsets, and selecting appropriate mathematical models; the resulting predictions can be used for ranking, visualising and screening forage genotypes (Figure 4).
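The splitting step and the simplest model in this range can be sketched together: records are shuffled and divided into training, validation and testing subsets, a single-feature linear regression is fitted on the training subset, and its error is reported on the others. The data below are synthetic values invented purely for illustration (a made-up sensor feature and a made-up DMY response), and the function names are hypothetical.

```python
import random


def split_dataset(records, fractions=(0.6, 0.2), seed=0):
    """Shuffle records and split them into training, validation and testing subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_val = round(len(shuffled) * fractions[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def fit_line(pairs):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(pairs, slope, intercept):
    """Root-mean-square error of the fitted line on a set of (x, y) pairs."""
    sq = sum((y - (slope * x + intercept)) ** 2 for x, y in pairs)
    return (sq / len(pairs)) ** 0.5


# Synthetic records: (sensor feature, measured DMY), with a small deterministic disturbance.
records = [(float(i), 2.0 * i + 1.0 + 0.1 * ((i % 3) - 1)) for i in range(20)]

train, validation, test = split_dataset(records)
slope, intercept = fit_line(train)
validation_rmse = rmse(validation, slope, intercept)
test_rmse = rmse(test, slope, intercept)
```

In a fuller workflow the validation subset would be used to choose between candidate models or features, with the testing subset held back for a final, unbiased error estimate.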
ML is a relatively new data analysis method in agriculture [144], with promising results and vast potential. Few studies have used ML algorithms to estimate forage grass biomass from proximal remote sensing. For example, the performance of support vector machine (SVM) and partial least squares regression (PLSR) was evaluated for grassland biomass estimation using data derived from a field spectrometer [145]. The authors suggested PLSR as the most accurate model for estimating biomass. Another experiment compared multiple linear regression (MLR) and random forest (RF) for estimating grass biomass from plant height models, RGB and VI features [146]. Both techniques provided accurate estimates. ML has not yet been validated for determining the biomass of forage species in a breeding program.
However, several cereal crop breeding programs have used ML algorithms to predict plant growth, plant height, biomass and leaf counts [147][148][149][150]. A review of current applications of machine learning tools in plant science [151] indicated the significance of SVM and artificial neural networks in identifying stress-tolerant genotypes. Deep neural networks and ML have been used to accurately estimate vegetation indices for high-throughput phenotyping of wheat using aerial imaging [126]. ML is therefore an exciting technique that can be widely applied in forage data analysis for the integration, interpretation and further quantification of phenotypic traits in large-population field trials, particularly those that may be required for genomic selection.

Concluding Remarks
Current forage DMY phenotyping involves manual measurement of samples, mowing of plots and visual scoring. These methods are destructive, costly, time-consuming or subjective. Recent developments in advanced sensor technologies are shifting phenotyping in a rapid, large-scale and more accurate direction. Advanced field phenomics involves the deployment of proximal sensors and imaging technologies in the field, with high-resolution, precise measurement of large populations at the level of individual plants, rows and plots. Images captured by proximal sensors produce multidimensional big data sets that require calibration, validation and analysis to translate them into biologically meaningful measurements. Advanced phenotyping technologies are now capable of assessing large populations in a robust and cost-efficient manner [152]. Sensor-based HTP of forage crops now attracts large investments from governments and nongovernment organisations with the long-term objective of increasing the profitability of the pasture-based dairy and meat industries [12]. Therefore, the application of sensor-based forage phenotyping, the management of the "big data" that the sensors generate and the analysis and standardisation of those data require greater focus from scientists in the area. Sensor-based phenotyping can reduce the cost and increase the speed of data collection, allowing phenotyping at an unprecedented scale and with high resolution. It will enable genomic selection and accelerate genetic gain in forage breeding.

Figure 1. An aerial image of a perennial ryegrass genomic subsection field trial comprising 48,000 individual plants at Hamilton Centre, Victoria, Australia.

Figure 2. The ground-based phenotyping platform, named "Phenorover", assembled at Hamilton Centre, AVR, for field phenotyping of perennial ryegrass. (A) Semiautomated ground vehicle named Phenorover; (B) list of deployed sensors on Phenorover.

Figure 3. The aerial-based unmanned aerial system (UAS) phenotyping platform at Hamilton Centre, AVR, for field phenotyping of perennial ryegrass. (A) List of various semiautomated UASs; (B) list of sensors deployed on UASs.

Figure 4. A diagram of the workflow for image acquisition, processing and analysis. The image- and data-processing platform (in the phenomics lab) requires data and image capture from an experimental field. Images and data obtained can be preprocessed to remove noise and artefacts and stored on a server for further processing. Images can then be identified, segmented and classified from the background and nontarget objects using software. After processing of the large data sets, machine learning is used to develop predictive models as the framework for data analysis, and the analysed data can be used to rank, visualise and screen at individual, row and sward plot levels.
