DNA-Based Assessment of Genetic Diversity in Grassland Plant Species: Challenges, Approaches, and Applications

Grasslands are widespread, multi-species ecosystems that provide many valuable services. Plant genetic diversity (i.e., the diversity within species) is closely linked to ecosystem functioning in grasslands and constitutes an important reservoir of genetic resources that can be used to breed improved cultivars of forage grass and legume species. Assessing genetic diversity in grassland plant species is demanding due to the large number of different species and the level of resolution needed. However, recent methodological advances could help in tackling this challenge at a larger scale. In this review, we outline the methods that can be used to measure genetic diversity in plants, highlighting their strengths and limitations for genetic diversity assessments of grassland plant species, with a special focus on forage plants. Such methods can be categorized into DNA fragment, hybridization array, and high-throughput sequencing (HTS) methods, and they differ in terms of resolution, throughput, and multiplexing potential. Special attention is given to HTS approaches (i.e., plastid genome skimming, whole genome re-sequencing, reduced representation libraries, sequence capture, and amplicon sequencing), because they enable unprecedented large-scale assessments of genetic diversity in non-model organisms with complex genomes, such as forage grasses and legumes. As no single method may be suited for all kinds of purposes, we also provide practical perspectives for genetic diversity analyses in forage breeding and genetic resource conservation efforts.


Introduction
Grasslands are among the most widespread land ecosystems of the planet, covering more than 50 million square kilometres, roughly 40% of the land surface [1]. In addition to their fundamental role as sources of roughage for ruminant livestock, grasslands provide important ecosystem services related to water (e.g., water provisioning and water flow regulation), climate (e.g., carbon sequestration and storage), soil quality (e.g., erosion prevention), biodiversity (e.g., habitats for pollinating insects) and cultural heritage [2]. They can be categorized according to their management intensity into natural, semi-natural, and improved grasslands. Natural grasslands are dominated by species-rich, herbaceous vegetation and developed with no or minimal human intervention (i.e., prevention of woody overgrowth caused by natural succession). Semi-natural grasslands depend on minimal human management, such as moderate grazing and occasional mowing [3,4]. Improved grasslands, on the other hand, are usually intensively managed and tend to harbour only high yielding forage crop cultivars [2]. Over the last century, the surface area covered by natural and semi-natural grassland has shrunk to make place for arable land or, in the case of some semi-natural grasslands, due to the abandonment of cultivation [2,5]. Demographic growth and changing dietary preferences are driving [...] available, we describe exemplary studies on grasses and legumes, the two major grassland plant families used as forages. We finish by providing practical perspectives on using large-scale genetic diversity monitoring to inform forage breeding, improve genetic resource conservation efforts, and deepen the study of grassland ecology.

DNA Fragment Size Analysis
Genetic diversity assessments based on DNA fragment analysis rely on the size variation among DNA fragments generated from different individuals or populations. Such fragments can be produced by restriction enzyme digestion (e.g., restriction fragment length polymorphism, RFLP [20]), by random PCR amplification (e.g., random amplified polymorphic DNA, RAPD [21]), or by a combination of restriction enzyme digestion, adapter ligation and PCR amplification (e.g., amplified fragment length polymorphism, AFLP [22]). Another approach to generate polymorphic DNA fragments is using primers that target general sequence patterns of coding regions (e.g., sequence-related amplified polymorphism, SRAP [23]). Polymorphic DNA fragments can also be generated by amplifying repetitive DNA, such as retrotransposons (e.g., inter-primer binding site amplification, iPBS [24]) or microsatellites (e.g., simple sequence repeats, SSR [25]), or by using the SSR sequences themselves as primers to amplify the sequence between two SSR (e.g., inter simple sequence repeats, ISSR [26]).
DNA fragment analysis methods can be divided into dominant and co-dominant marker systems, depending on whether only one (dominant) or all (co-dominant) alleles of a locus can be detected. An important advantage of dominant marker systems, such as AFLP, RAPD, iPBS, SRAP, and ISSR, is that markers can be produced at low cost and with no sequence information of the target species needed. This makes them the systems of choice for many grassland plant species where genomic sequence information is scarce. Although a major drawback of dominant DNA marker systems is that they are not able to detect heterozygotes, they have been successfully used to study the diversity of grassland plant species, such as orchardgrass (Dactylis glomerata L.), perennial ryegrass (Lolium perenne L.), Italian ryegrass (L. multiflorum Lam.), tall oat-grass (Arrhenatherum elatius L.), tall fescue (Festuca arundinacea Schreb., syn. Lolium arundinaceum and Schedonorus arundinaceus), meadow fescue (F. pratensis Huds., syn. Lolium pratense and Schedonorus pratensis), Kentucky bluegrass (Poa pratensis L.), bird's-foot trefoil (Lotus corniculatus L.), red clover (Trifolium pratense L.) and white clover (T. repens L.; Table 1). Co-dominant marker systems, such as SSR, can detect heterozygotes and enable diversity analyses based on allele frequencies. Developing SSR requires significant input and a priori sequence information. However, once obtained, SSR primer sets can be easily standardized and used by multiple laboratories. SSR are highly polymorphic, and usually, 10 to 30 SSR primer combinations are used to assess the diversity of a grassland plant population (Table 1). Genetic diversity estimates obtained with SSR are generally higher than those obtained with dominant marker systems [42]. SSR-based genetic diversity analyses have been reported for orchardgrass, perennial and Italian ryegrass, tall and meadow fescue, timothy (Phleum pratense L.), and white clover (Table 1).
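The practical difference between the two marker classes can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the function names and the example data are hypothetical, and the dominant-marker estimate assumes a diploid population in Hardy-Weinberg equilibrium, an assumption that may not hold for polyploid or strongly structured forage populations. With co-dominant data, allele frequencies are counted directly; with dominant data, the frequency of the band-absent allele has to be inferred from the fraction of plants lacking the band.

```python
from collections import Counter
from math import sqrt


def expected_heterozygosity(genotypes):
    """Expected heterozygosity (1 - sum of squared allele frequencies)
    at one co-dominant locus, e.g. an SSR.

    genotypes: list of tuples, one tuple of observed alleles per plant.
    """
    allele_counts = Counter(a for genotype in genotypes for a in genotype)
    total = sum(allele_counts.values())
    return 1.0 - sum((n / total) ** 2 for n in allele_counts.values())


def null_allele_frequency(band_present):
    """Frequency of the band-absent allele at one dominant locus.

    Assumes a diploid population in Hardy-Weinberg equilibrium, so that
    the fraction of plants without the band equals q squared.
    band_present: list of booleans, one per plant.
    """
    absent = sum(1 for present in band_present if not present)
    return sqrt(absent / len(band_present))


# Hypothetical SSR fragment sizes (bp) for four diploid plants.
ssr_locus = [(150, 154), (150, 150), (154, 158), (150, 158)]
print(expected_heterozygosity(ssr_locus))

# Hypothetical AFLP band scored in 16 plants, absent in 4 of them.
aflp_band = [True] * 12 + [False] * 4
print(null_allele_frequency(aflp_band))
```

The dominant-marker estimate depends entirely on the equilibrium assumption, which is one reason why diversity estimates from the two marker classes are not directly comparable.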
Most economically relevant grassland plants are outbreeding, i.e., they are unable to self-pollinate. Consequently, grassland plant populations tend to be highly genetically diverse, and multiple plants are needed to obtain representative samples of the original population. Sample size is dictated by the frequency of the alleles to be detected. For outbreeding species, the frequency of a given allele in a sample follows a binomial distribution [68]. Thus, to sample alleles that are present in the original population with a frequency of >10%, approximately 30 plants would be necessary at a probability of p = 0.95 [68]. Mainly due to practical limitations, DNA fragment-based studies usually analysed 10 to 30 plants per population, which gave them enough power to detect alleles with a frequency of 25% to 10%, respectively. In tall fescue [69] and Kentucky bluegrass, as few as 16 or three plants per population, respectively, have been found sufficient to detect genetic differentiation among populations [54,58]. Such results suggest that frequent alleles are driving genetic differentiation among genetically distant populations. More plants may be necessary to assess genetic diversity and differentiation in closely related grassland plant populations and to detect rare alleles [68,70]. A useful guideline could be using samples of at least 60 plants, the recommended sample size to characterize new ryegrass cultivars [71]. The number of analysed fragments may also influence genetic diversity estimations. In principle, the more loci are analysed, the more likely it is that polymorphisms will be detected. However, as the number of polymorphisms reaches saturation, each additional fragment stops adding information. For example, 120 AFLP fragments were found sufficient to characterize genetic diversity in 15 grassland species [72].
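The relationship between allele frequency and sample size can be sketched in a few lines of Python. This is a simplified illustration, not the exact procedure of [68]: it assumes that each sampled plant is an independent draw that carries the allele with a probability equal to its frequency, so that the chance of observing the allele at least once in n plants is 1 - (1 - f)^n. The function names are hypothetical.

```python
import math


def detection_probability(freq, n_plants):
    """Probability of seeing an allele of frequency `freq` at least once
    in `n_plants` independent draws (simplified binomial model)."""
    return 1.0 - (1.0 - freq) ** n_plants


def min_plants(freq, confidence=0.95):
    """Smallest number of plants for which the detection probability
    reaches `confidence` under the same simplified model."""
    if not 0.0 < freq < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("freq and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))


for f in (0.25, 0.10):
    print(f, min_plants(f))
```

Under these assumptions the sketch returns 29 plants for an allele frequency of 10% and 11 plants for 25%, in line with the approximate figures of 30 and 10 plants given above.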
An important limitation of DNA fragment-based methods is throughput. Throughput is defined as the amount of data that is produced simultaneously by a method. Typically, 5-10 polymorphic RAPD fragments or 20-30 AFLP can be reliably scored per primer combination used, while SSR primers yield from one to as many fragments (i.e., alleles) as the ploidy number of the species being analysed. Producing consistent fragment patterns across samples and laboratories is a critical issue for RAPD, and to some degree, for the other dominant marker systems [73]. While reproducibility is not a significant issue for SSR, SSR primers are usually species-specific and can normally be applied only on a single or a few related species (e.g., the Lolium-Festuca complex) [74]. Another limitation of fragment-based methods is their limited multiplexing potential, which reflects the maximum number of plants or markers that can be analysed in a single run. DNA fragments are usually detected by gel or capillary electrophoresis on instruments with up to 96 capillaries, in each of which up to five differently labelled samples can be analysed simultaneously. Bulked DNA approaches can further increase the multiplexing level by up to twenty plants per sample [33,45,56,62].
Despite the above-mentioned limitations, DNA fragment analysis methods offer a cost-effective alternative for genetic diversity assessments. Such methods are flexible enough to be applied in basic laboratory settings, for example, in field or breeding stations. They are also useful pilot-study approaches that set the guidelines for higher throughput analyses, which can detect genetic differentiation at a higher resolution.

Hybridization Array Methods
DNA hybridization arrays are high throughput methods that rely on the hybridization of labelled target DNA to a matrix of probes, which are bound to a solid surface. After removing non-hybridized target DNA, successful hybridizations are visualized by fluorescence or chemiluminescence-based methods. DNA hybridization arrays have a considerably larger throughput than fragment-based methods, allowing many thousands of loci to be assessed simultaneously. They have been widely applied in crop species to assess gene expression, for gene mapping or for analysing genomic loci related to quantitative traits (i.e., Quantitative Trait Loci, QTL) [75]. Hybridization array-based methods, such as SNP arrays and the Diversity Array Technology (DArT, [76,77]), have been used to measure genetic variation in grassland plant populations (Table 2). Other hybridization-based, sequencing-independent methods, such as competitive allele-specific PCR (KASP®) [78] or TaqMan® assays, have been mainly used as genotyping tools for trait selection studies and have limited potential for diversity analyses [79].

SNP Arrays
Single nucleotide polymorphisms (SNPs) are co-dominantly inherited mutations that occur at the single-base level in coding or non-coding regions of the genome [73,80]. SNPs are the most common kind of polymorphism and are identified by comparing the sequences of multiple plants from a population. They can be used to study the genetic diversity that is linked to gene function and phenotypic traits (i.e., functional genetic diversity), which may have direct applications in plant breeding [79].
SNP arrays consist of a set of SNP-containing oligonucleotides that are bound to a surface in a grid. The throughput of SNP arrays can reach hundreds of thousands of SNPs [79]. However, SNP arrays are available for only a few grassland plant species, mainly due to their high development costs. For example, an array with 2185 informative SNPs was developed and used to assess the genetic diversification of 90 ecotype accessions of perennial ryegrass from different European geographic regions [81]. Partitioning of the genetic variation showed the greatest amount of diversity to be attributed to individuals within accessions. In addition, the study found a strong differentiation of allele frequencies according to geographical location [81]. The same array was used to show that amenity and forage cultivars of perennial ryegrass are genetically differentiated from each other and that both represent a limited amount of the genetic diversity of European perennial ryegrass ecotypes [82]. In alfalfa, a 7476-SNP array was used to assess genetic differentiation of 280 genotypes, representing four cultivated subspecies of different ploidy levels, as well as related species. The study found strong differentiation according to alfalfa subspecies, ploidy level, and fall dormancy level [83].
An important limitation of SNP arrays for within-species diversity assessments is that they target a fixed set of variants, which may not be useful for applications where rare alleles are to be detected [75]. Such a bias towards frequent alleles is a kind of ascertainment bias and can be overcome to some extent by increasing the number of SNPs on an array [84]. A reference genome can also be used to map SNPs and produce haplotypes, thus reducing the effect of ascertainment bias. Such a strategy was used to analyze the genetic relationships of 4506 wheat accessions from all over the world with a 280,226-SNP array, revealing three large gene pools that group most of the wheat diversity: an Eastern European and Mediterranean pool, an Asian pool, and a Western European pool [85]. Nevertheless, high-density SNP arrays depend on expensive oligonucleotide synthesis procedures, which makes them accessible only for major crop plants and substantially limits their use in grassland species where comprehensive genome information is often lacking.

Diversity Arrays (DArT)
In contrast to SNP arrays, diversity arrays (DArT) do not require a priori sequence information, which greatly reduces their development costs. DArT arrays are produced using "genomic representations" of the populations under analysis. Such genomic representations are generated by cutting the pooled DNA of multiple plants with restriction enzymes and then enriching certain fragments using selective PCR primers. The fragments are then cloned into a plasmid library and subsequently tethered to a surface (e.g., microscope slides) [76].
DArT arrays are particularly useful for organisms in which SNP information is scarce. For example, a DArT array based on 7-40 genotypes each of five species of the Festuca-Lolium complex produced 3884 polymorphic markers and was used to analyse the genetic relationships among a group of 184 samples of two ryegrass species (L. perenne and L. multiflorum) and three fescues (F. pratensis, F. arundinacea, and F. glaucescens Hegetschw.) [86]. The Festuca-Lolium DArT array (DArTFest) was able to differentiate each species and also capture the genetic diversity in each of those species, finding that fescues have lower levels of polymorphism than ryegrasses, which is in accordance with DNA fragment-based studies [32,42]. Low levels of genetic diversity were also found in turf-type F. arundinacea using the same array [87]. The DArTFest array has also been used for genetic mapping and to study the genetic constitution of Festuca-Lolium hybrids [88,89]. In crop plants, diversity arrays have been widely used to study genetic diversity in barley, sorghum and wheat, three cereal species with complex genomes [90,91,92].
Major drawbacks of diversity arrays include locus redundancy and ascertainment bias. Diversity arrays are made with DNA probes of several hundred base pairs in length [84,93]. Such probes are not sequence-characterized and usually show some degree of redundancy (i.e., the same fragment or portions of it are represented more than once) [84]. Ascertainment bias is also present in diversity arrays since they are produced from a reduced sample of the total diversity of a species and still can miss rare alleles, which can impact diversity estimates [84].

Assessing Genetic Diversity with Sequencing-Based Methods
Over the past two decades, DNA sequencing technologies have greatly increased in throughput. The output of high-throughput sequencing (HTS) technologies now reaches more than one terabase (10^12 bp) per run [94], a large difference when compared to the ~10^5 bases of a 96-capillary Sanger sequencing run. In turn, the costs to obtain a megabase (10^6 bp) of a sequence have declined from several thousand US dollars in the early 2000s to less than ten cents since 2011 [95]. Such technological advances have enabled new approaches to assess the genetic diversity of plant populations with sequence-level precision. HTS technologies allow for larger multiplexing levels, which greatly increases the number of plants that are assessed simultaneously, especially when combined with bulked DNA approaches. Furthermore, ascertainment bias is reduced with HTS approaches, as SNPs are directly discovered on the sequences obtained, which enables rare SNPs to be identified.

Table 2. Selected applications of hybridization array and high throughput sequencing (HTS) approaches to assess genetic diversity in plants.

DArT; 1920 probes; whole-genome profiling. The DNA from 33 cultivars and two wild accessions was analysed. The DNA from the two wild accessions and two Southeast Asian landraces were underrepresented on the array, and they were genetically differentiated from the rest of the cultivars, which clustered according to expected relationships.

[...] hybridum samples proved to be a mosaic of B. distachyon and B. stacei. [101]

Whole genome re-sequencing; ~4 million SNPs; genetic diversity and pan-genome analysis. 54 lines of B. distachyon were resequenced at a median coverage of 94×. The resulting pangenome was 58% larger (430 Mb vs. 272 Mb) and contained 40% more genes. Core genes were related to essential cell functions, while non-core genes were related to environmental adaptations.

SNP array; 9277; genetic diversity and population structure analysis. Principal component analysis of 280 alfalfa (Medicago sativa L.) genotypes revealed grouping according to species and ploidy level. Tetraploids had higher heterozygosity levels. [83]

Sequence capture; 50 genes; phylogenetic analysis. The sequences of 50 genes were recovered from samples of four Medicago spp. The alignment of such genes revealed known phylogenetic relationships between the Medicago spp. and an outgroup. [103]

Northern switchgrass (Panicum virgatum L.); sequence capture; 1,590,653 SNPs; genetic diversity and population structure analysis. 537 individuals from 45 upland and 21 lowland populations were genotyped. Population structure analysis revealed seven groups that match geographical origin and up- or lowland distribution. [104]

Orchid (Cypripedium macranthos var. rebunense); amplicon sequencing; ~1000 SNPs; genetic diversity analysis. Eight samples from two populations of a Japanese orchid were genotyped with 16 MIG-seq primers. A principal component analysis of 209 SNPs was able to differentiate the two populations. Remarkably, the MIG-seq primers were also functional in a wide spectrum of organisms, including a pine tree, bamboo, mushroom, a copepod, a snail, a sea cucumber, and a lizard.

Plastid Genome Skimming
In plant cells, DNA is mainly contained in the nucleus. However, organelles such as mitochondria or plastids also contain DNA that codes for proteins involved in processes to maintain essential cell functions or photosynthesis. In contrast to the nuclear genome, plastid genomes are circular molecules of ~150 kb that harbour ~100-120 genes [106]. The genes in plastid genomes are under high selective pressure and do not usually undergo sexual recombination, so they tend to accumulate mutations at a slower rate than the nuclear genome and are therefore highly conserved among plants, with few exceptions [106]. The high conservation of plastid genes favours their use as phylogenetic markers and for species identification (Table 2; [106]). Furthermore, organellar genomes are overrepresented several times in a cell when compared to the nuclear genome. Such overrepresentation of organelle genomes is the basis of plastid genome skimming, the recovery of plastid genomes from shotgun sequencing of plant DNA (Figure 1, green arrows).

Figure 1. Schematic representation of high-throughput-sequencing methods for genetic diversity assessments. Complexity reduction methods are highlighted in the grey oval. The plus signs indicate the stages where library preparation is usually performed (e.g., Illumina adapter and indexing incorporation). Reduced representation libraries (red arrows) are produced by digesting the starting DNA material with restriction enzymes. Amplicon sequencing (orange arrows) begins by amplifying the targeted loci from the starting DNA. Priming sites are shown in grey. In the particular case of amplicon sequencing, library preparation can start either during PCR (i.e., using adapter-bound primers) or after PCR (i.e., through blunt-end ligation of the adapters). Sequence capture (blue arrows) starts by hybridizing modified RNA or DNA probes (yellow hexagons) to DNA fragments containing target loci; the hybridized DNA fragments are then pulled using magnetic beads that bind to the RNA or DNA probes. Whole genome re-sequencing (WGR; purple arrows) is performed by sequencing random fragments of the starting DNA. Plastid genome skimming (green arrows) recovers the plastid genome sequences from the data produced by, for example, sequence capture or WGR.
The analysis of entire plastid genomes has been boosted by plastid genome skimming approaches based on HTS. The availability of plastid genomes has, in turn, enhanced phylogenetic resolution in many groups of plants, including the Pooideae, a group that includes all major cereal crops and forage grasses [107]. Plastid genomes also enhance species-level taxonomic identifications and can be used to produce the information needed to develop SSR [106,108]. In the model grass species Brachypodium distachyon (L.) Beauv., a total of 298 polymorphic sites and 32 haplotypes were found in 53 accessions, allowing for genetic differentiation among major clades [100]. To our knowledge, the utility of plastid genome skimming for genetic diversity assessments in forage species has not yet been thoroughly assessed.

Whole Genome Re-Sequencing
As sequencing costs per megabase keep dropping, analyzing genetic variation at the whole-genome level is becoming a reality. Once a high-quality reference genome is available for a species, additional genomes can be resequenced at a shallower depth and mapped to the reference genome, so that millions of SNPs can be called on them [109]. Whole genome re-sequencing (WGR; purple arrows in Figure 1) was applied to assess the genomic diversity of Arabidopsis halleri L., an outcrossing plant species, using a bulked DNA approach also known as Pool-Seq [97], in which the DNA from multiple genotypes is equimolarly pooled and sequenced [110,111]. Genome-wide SNPs had a good correlation with microsatellite allele richness, although SNPs are more evenly distributed and are likely to show a lower ascertainment bias. Such an even distribution of SNPs makes it possible that a few thousand randomly chosen SNPs can still reliably estimate genetic diversity and differentiation patterns [97].
An important feature of WGR is that it enables the study of pangenomes, i.e., the set of genes that exist in a species. A typical reference genome is sequenced from a single individual and, thus, contains a very narrow set of the total genomic variation that can be found in a species. Pangenome sequencing aims at complementing the reference genome of a species with genome sequences from additional individuals of the same or closely related species, sequenced at an intermediate coverage [112]. Pangenome analyses greatly increase the discovery of associations between agronomical traits and previously uncharacterized genes [112]. Furthermore, pangenomics also allows the characterization of other kinds of genomic variation, such as genomic duplications, deletions, or translocations [112]. For example, in wheat, the core genome (i.e., the genes shared by all specimens sequenced) of 18 wheat cultivars spanned 81,070 ± 1631 genes, while its pangenome spanned 140,500 ± 102 genes [113]. The variable fraction of the pangenome was enriched in genes related to agronomically relevant traits. In addition, 2.87 million SNPs were identified in the pangenome but not in the Chinese Spring wheat reference genome. In B. distachyon, 45% of the pangenome of 54 lines were not represented in the reference genome [102].
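The distinction between core genome and pangenome can be expressed with simple set operations. The following Python sketch is only an illustration under stated assumptions: the gene presence calls are taken as given (in practice they depend on assembly, annotation, and clustering choices), and the accession names, gene identifiers, and function name are hypothetical.

```python
def pangenome_summary(gene_sets):
    """Split gene content into pangenome, core, and variable fractions.

    gene_sets: dict mapping an accession name to the set of gene IDs
    called as present in that accession.
    """
    if not gene_sets:
        raise ValueError("at least one accession is required")
    sets = list(gene_sets.values())
    pan = set().union(*sets)          # genes found in any accession
    core = set.intersection(*sets)    # genes shared by all accessions
    return {"pan": pan, "core": core, "variable": pan - core}


# Hypothetical presence calls for three accessions.
accessions = {
    "line_A": {"g1", "g2", "g3"},
    "line_B": {"g1", "g2"},
    "line_C": {"g1", "g4"},
}
summary = pangenome_summary(accessions)
print(len(summary["pan"]), len(summary["core"]), len(summary["variable"]))
```

Because the core fraction can only shrink and the pangenome can only grow as accessions are added, both estimates depend on how many and which individuals are sequenced.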
Pangenome approaches would be particularly useful to assess genetic diversity in grassland plant species, which are highly diverse and are distributed throughout contrasting geographical regions. Although the genomes of grassland plant species are large and will require a large amount of sequencing capacity, some reference genomes are already available for such species (e.g., red clover [114,115], perennial ryegrass [116], and Italian ryegrass [117]), which can help in assembling additional genomes with low coverage data. Furthermore, pangenomic information is becoming available for crop species with similarly large genomes, such as wheat, thanks to the joint efforts of international consortia (e.g., the International Wheat Genome Sequencing Consortium, [118]).

Reduced Representation Libraries
Reducing the complexity of an entire genome to a set of loci is a useful strategy to assess genetic diversity, particularly for species with a large genome size like many forage grasses and legumes. Such complexity reduction may be achieved through reduced representation libraries (RRLs; red arrows in Figure 1). RRLs are produced in a similar way as the genomic representations used for DArT (see above), except that, instead of being cloned into a plasmid library, the resulting fragments are used to produce next-generation sequencing libraries by PCR amplification. Such fragments, also known as restriction associated DNA-tags, are short sequences that are adjacent to a restriction enzyme cutting site. RRLs are at the core of methods, such as RADseq (restriction-site associated DNA sequencing [119]) or GBS (genotyping-by-sequencing [120]), among many others, which can produce tens of thousands of SNPs at very low cost [121].
In crop species, RRLs have mainly been used for marker-trait association analyses and genotyping [79]; however, applications of RRLs to study the genetic diversity of plant populations have been reported as well (Table 2). For example, GBS has been applied to assess the genetic diversity of natural populations of the self-compatible, model grass species B. distachyon [101]. Individual GBS libraries were produced from accessions of B. distachyon from different European countries, resulting in >50,000 high quality SNPs, which grouped the accessions into three genetically differentiated populations. Pooled GBS, an adaptation of GBS based on bulked DNA samples, has been used to assess segregation distortion in barley by pooling 76 genotypes to construct a single GBS library [98]. The 76 samples were also used to generate individual GBS libraries for validation, and the SNP allele frequencies obtained with the pooled samples were an accurate estimate of the frequencies obtained with individual samples. However, pooling of equal amounts of DNA or leaf material and sufficient sequencing depth are crucial for achieving consistent results [98].
Since no previous sequence information is needed to generate RRLs, they are suitable to analyse non-model species with a high density of SNPs. For example, pooled GBS was applied to monitor temporal changes of genetic diversity in experimental populations of perennial ryegrass by using multiple pools of 40 plants to construct GBS libraries [96]. Over the course of four years, swards sown with cultivar mixtures of perennial ryegrass changed their initial composition, while no considerable changes were detected in single-cultivar swards. Pooled GBS was also used to generate Genome Wide Allele Frequency Fingerprints (GWAFFs) in eight cultivars of perennial ryegrass [122]. The GWAFFs, which were based on the frequencies of >30,000 SNPs, successfully captured the genetic differentiation among the cultivars and always clustered together the replicates of the same cultivar [122]. Another remarkable pooled GBS application enabled the genotyping of 995 F2 families of perennial ryegrass, each one composed of 200-500 plants [123]. After filtering the SNPs with low and high allele frequencies (0.02 and 0.98, respectively), 728,359 high-quality SNPs were obtained and used to assess the heritability of crown rust resistance and heading date, finding that the most accurate estimates of heritability were produced with the SNPs with the highest sequencing depth [123].
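How allele frequencies are derived from pooled libraries can be sketched as follows. This Python example is a simplified illustration and not the pipeline of any of the cited studies: it assumes that read counts per allele are already available for each SNP, that the fraction of reads carrying an allele approximates its frequency in the pool, and that SNPs are kept only above a minimum read depth. The 0.02 and 0.98 frequency thresholds follow the text above, whereas the depth threshold of 20 reads, the inclusive boundaries, the function names, and the data are assumptions made for illustration.

```python
def pooled_allele_frequencies(read_counts, min_depth=20,
                              min_freq=0.02, max_freq=0.98):
    """Estimate alternative-allele frequencies from pooled read counts.

    read_counts: dict mapping a SNP ID to (reference_reads, alternative_reads).
    SNPs below `min_depth` or outside [min_freq, max_freq] are discarded.
    """
    kept = {}
    for snp, (ref_reads, alt_reads) in read_counts.items():
        depth = ref_reads + alt_reads
        if depth < min_depth:
            continue  # too few reads for a stable frequency estimate
        freq = alt_reads / depth
        if min_freq <= freq <= max_freq:
            kept[snp] = freq
    return kept


def mean_expected_heterozygosity(freqs):
    """Average of 2p(1-p) over biallelic SNPs, as a simple diversity index."""
    return sum(2 * p * (1 - p) for p in freqs.values()) / len(freqs)


# Hypothetical read counts for four SNPs in one pool.
pool = {"snp1": (50, 50), "snp2": (99, 1), "snp3": (5, 5), "snp4": (30, 10)}
freqs = pooled_allele_frequencies(pool)
print(sorted(freqs), mean_expected_heterozygosity(freqs))
```

In this toy data set, snp2 is removed by the frequency filter and snp3 by the depth filter, which shows how both thresholds shape the final SNP set.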
An important limitation of RRLs is the high prevalence of missing data [124]. A missing data point arises when a given SNP is not found in all samples, which complicates downstream analyses. High rates of missing data in RRLs lead to underestimation of the true genetic diversity of a population [125,126]. Nevertheless, the reproducibility of SNP calls can be enhanced by increasing sequencing depth and applying stringent SNP filtering [122]. Missing data can also be imputed by substituting missing values with an approximation [127]. Furthermore, although a certain ascertainment bias is also associated with RRLs, it is lower than with other methods, such as DArT arrays [84].
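The two remedies mentioned above, filtering and imputation, can be illustrated with a minimal Python sketch. It is not the method of [127]: it uses the simplest possible approximation, replacing each missing value with the mean of the observed values for that SNP, and the 20% missing-data threshold, the function names, and the genotype matrix are assumptions chosen for illustration.

```python
def missing_rate_per_snp(matrix):
    """Fraction of samples with a missing value (None) for each SNP.

    matrix: list of rows (one per sample), each a list of genotype
    scores or allele frequencies, with None marking missing data.
    """
    n_samples = len(matrix)
    n_snps = len(matrix[0])
    return [sum(row[j] is None for row in matrix) / n_samples
            for j in range(n_snps)]


def filter_and_impute(matrix, max_missing=0.2):
    """Drop SNPs above `max_missing`, then mean-impute the remaining gaps."""
    rates = missing_rate_per_snp(matrix)
    # Keep SNPs within the threshold that have at least one observed value.
    keep = [j for j, rate in enumerate(rates)
            if rate <= max_missing and rate < 1.0]
    means = {}
    for j in keep:
        observed = [row[j] for row in matrix if row[j] is not None]
        means[j] = sum(observed) / len(observed)
    return [[row[j] if row[j] is not None else means[j] for j in keep]
            for row in matrix]


# Hypothetical genotype scores (0, 1, 2) for five samples and three SNPs.
genotypes = [
    [0, 1, None],
    [2, None, None],
    [1, 1, None],
    [1, 1, 2],
    [1, 1, None],
]
print(missing_rate_per_snp(genotypes))
print(filter_and_impute(genotypes))
```

Mean imputation pulls imputed samples towards the population average and can therefore itself reduce the apparent diversity, so the filtering threshold and the imputation method should be chosen with the downstream analysis in mind.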

Sequence Capture
Sequence capture (blue arrows in Figure 1) is another complexity reduction method in which a set of short, biotinylated RNA or DNA molecules are used as probes (or baits) to hybridize with the sample DNA. After hybridization, the conjugates are pulled down with streptavidin-coated beads, resuspended and sequenced. Although also relying on the hybridization of probes and target DNA, sequence capture differs from hybridization arrays in that sequence capture hybridization occurs in an aqueous solution instead of over a surface and polymorphisms are detected by DNA sequencing [128].
Sequence capture approaches are highly customizable. Probes can be directed to specific parts of the genome (e.g., exons, intergenic regions, centromeres), so that different components of genetic diversity can be analyzed (i.e., neutral and non-neutral genetic diversity or chromosomal rearrangements). They can also be designed with enough sequence degeneracy to capture phylogenetically diverse samples [129]. Furthermore, the use of conserved orthologous genes and ultra-conserved noncoding elements as targets is a way to overcome the lack of sequence information for orphan species [130]. Sequence capture approaches commonly result in lower amounts of missing loci than RRLs [131] and have been successfully applied to study the intraspecific diversity of northern switchgrass (Panicum virgatum L.), Sabal spp. palms and Medicago spp. (Table 2).
Combining sequence capture with RRLs has been reported to reduce the amount of missing data on the latter [132,133]. In one study, RAD libraries were subjected to sequence capture, using baits to enrich a subset of 500 pre-defined RAD-tags in 288 fish samples. With this approach >16× coverage was achieved in all the target loci and samples [133]. A similar outcome was obtained by following a modified approach, achieving 99.8% recovery of 964 target loci at >4× coverage for over 90% of 96 tree samples [132].
A commonly cited disadvantage of sequence capture approaches is that they are relatively expensive [131]. However, library pooling strategies, proper bait design, and in-house bait production can help to reduce costs per sequence capture reaction [134,135]. Furthermore, reducing costs may be possible by using standardized sets of baits with proven efficiency across a wide taxonomic spectrum.

Amplicon Sequencing
Once a polymorphic locus is identified within a species, primers can be designed to target its flanking regions to amplify and sequence it in individual samples. Amplicon sequencing is a complexity reduction method that has been applied to assess genetic diversity in various plant species (Table 2; orange arrows in Figure 1). Amplicon sequencing is a low-cost alternative for genetic diversity assessments since only standard PCR reagents are needed. However, the method also needs a discovery step to find the genomic loci that are suitable for PCR. For example, sequence capture was used as a discovery tool for conserved genomic loci in five genotypes of 16 species of forage grasses and legumes [136]. Such sequences were then used to identify loci showing large within-species variation and to design multi-species primers. Primer sequences used for other SNP-probing methods, such as the TaqMan™ assay, can also be used to generate within-species variable amplicons [137]. Under certain conditions, publicly available gene sequences can also serve as templates for primer design. For example, the sequences of the flowering time habit locus VrnH1 from two barley lines were obtained from a public repository and used for primer design [138]. In the same study, a single reference sequence of the grain protein gene HvNAM1 was also used for primer design since that gene is known to be polymorphic in lines with differing grain protein content [138]. Still, a discovery phase may not always be necessary for primer design. ISSR can be amplified from a wide range of species by using microsatellite repeats as primers. ISSR-sequencing or MIG-seq (multiplexed ISSR genotyping by sequencing) can produce ~1000 SNPs per sample, which proved to be enough to assess genetic differentiation in populations of an orchid species [105].
The capacity of PCR-based methods can be further enhanced by multiplexing more than one pair of primers in the same reaction. Such an approach is commonly called "multiplex PCR" and was applied, for example, as a genotyping tool for 192 loci on 2068 DNA samples of the fish Oncorhynchus mykiss in a single Illumina HiSeq 1500 run [137]. A multiplex PCR of >3000 primer pairs yielded a set of 2655 reproducible SNPs on two pooled samples of fugu fish (Takifugu rubripes), each with the same 326 individually tagged fish [139].

Outlook for Genetic Diversity Assessments in Grassland Plant Species
A wide spectrum of tools is available to assess genetic diversity in plant populations. Such tools differ in terms of resolution (i.e., the fraction of the genome that is analysed), throughput (i.e., the amount of data that is produced simultaneously by a method), and multiplexing potential (i.e., the number of plants or markers that can be assessed simultaneously). Choosing a suitable method requires a careful balance between such methodological features and the exact purpose of the assessment. Adequate sampling of plant populations is also key to reducing ascertainment bias in genetic diversity estimations.
Genetic diversity analysis can assist forage breeders in many aspects. For example, it can be used to ensure sufficient diversity in the breeding gene pool, since low levels of genetic diversity may decrease the probability of further genetic gain [17]. Genetic diversity analysis can also be applied to select genetically diverse parents to optimize heterosis [140]. Many studies have demonstrated that the genetic differentiation between geographically distant ecotypes can be addressed with DNA fragment analysis. About 15-20 SSR or 2-10 AFLP primer combinations can produce enough polymorphic fragments to capture a meaningful proportion of the genetic diversity of a forage plant population (Table 1). Nevertheless, DNA fragment analysis has a rather low throughput (i.e., usually <20 polymorphic fragments per sample) and a low multiplexing potential (i.e., pools of 20-30 plants). In contrast, low-resolution, high-throughput sequencing approaches (e.g., amplicon sequencing) can be applied to characterize 100-1000 SNPs in highly multiplexed samples (i.e., pools of >100 plants). Such a method could, therefore, be useful as a cost-efficient, routine tool to assess genetic differentiation in the starting germplasm for forage breeding.
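As an illustration of how SNP data from pooled samples could be turned into a measure of genetic differentiation, the following Python sketch estimates allele frequencies from read counts and computes a simple G_ST-style ratio, (H_T - H_S) / H_T, between two pools. All function names are hypothetical, the two pools are weighted equally, and no correction is made for pool size or sequencing depth, so this is a simplified sketch rather than the estimator used in any of the cited studies.

```python
from typing import Sequence


def allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Estimate the alternate-allele frequency of a pool from read counts."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def differentiation(freqs_a: Sequence[float], freqs_b: Sequence[float]) -> float:
    """G_ST-style differentiation between two pools across several SNPs.

    H_S is the mean within-pool expected heterozygosity and H_T the
    expected heterozygosity at the mean allele frequency; both are summed
    over SNPs before taking the ratio (H_T - H_S) / H_T.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both pools need the same SNPs")
    h_s = 0.0
    h_t = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        h_s += (expected_heterozygosity(p_a) + expected_heterozygosity(p_b)) / 2.0
        h_t += expected_heterozygosity((p_a + p_b) / 2.0)
    # Without any polymorphism the ratio is undefined; report 0.0 by convention.
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical read counts (alternate, total) for two SNPs in two pools.
    pool_a = [allele_frequency(20, 100), allele_frequency(50, 100)]
    pool_b = [allele_frequency(80, 100), allele_frequency(50, 100)]
    print(differentiation(pool_a, pool_b))
```

Values near 0 indicate that the two pools share similar allele frequencies, whereas values approaching 1 indicate that they are fixed for different alleles.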
Genetic diversity assessments based on amplicon sequencing could be applied, for example, as a monitoring tool for breeding in mixtures. Mixture breeding aims at leveraging biodiversity effects and the interactions of multiple cultivars to increase yields and the resilience of grassland systems [141]. Keeping track of the genetic diversity when assembling and testing mixtures could greatly facilitate evaluating their performance [141]. In addition, such diversity assessments could also assist strategies to breed for several species in mixtures simultaneously to optimize interaction traits essential in multi-species grasslands [11].
Cultivar registration is another area where genetic diversity assessments could support traditional phenotypic assessments. Plant breeders need to characterize new varieties in terms of distinctness, uniformity, and stability (DUS). However, ensuring the distinctness of new cultivars of outcrossing forage species is particularly difficult. Within-cultivar variation is usually high, and alleles can be widely spread among cultivars, which leads to lower genetic differentiation between cultivars. DNA fragment-based genetic diversity analyses often detect low levels of differentiation between gene pools and cultivars from diverse origins [142]. High-resolution, high-throughput approaches (e.g., RRLs), in contrast, can produce tens of thousands of SNPs on multiplexed samples, which could be used to generate GWAFFs to thoroughly characterize a novel forage variety at a low cost [143].
Large-scale genetic diversity assessments can also help in setting up and monitoring genetic resource conservation programs. Valuable genetic resources are at risk of disappearing as grassland intensification rates increase across some of the grassland biodiversity hotspots of the globe [144-146]. Programs to protect such genetic resources are in place on every continent, either in the form of protected areas (i.e., in situ conservation) or in large collections of germplasm (i.e., ex situ conservation). Genetic diversity information would provide an insight into the effectiveness of such conservation programs. In particular, HTS-based genetic diversity assessments (e.g., WGR, RRLs, sequence capture, and some amplicon sequencing approaches) could become informative and cost-effective tools to manage in situ and ex situ genetic resource conservation efforts, as they can be applied without much previous information [17]. WGR and pangenomic approaches could be especially relevant for assessing and incorporating the diversity of wild relatives of forage crops [147]. WGR approaches may add more genomic context information to SNPs, making it possible to detect variations in linkage disequilibrium (e.g., selective sweeps). Adding more genomic context information to SNPs could, in principle, also increase the power of genome-wide association studies [148].
In conclusion, there are many promising approaches to tackle the challenge of assessing the genetic diversity of grassland plant species. There is no one-size-fits-all solution for all kinds of assessments: the methods differ in their resolution, throughput and multiplexing potential, and cost optimization will ultimately depend on the scope and purpose of the assessment. HTS approaches (e.g., RRLs, sequence capture, amplicon sequencing, and whole genome re-sequencing) hold great potential for routine genetic diversity assessments, as long as they can provide information on multiple plants at a low cost. Further increases in sequencing throughput could reduce analysis costs and enable routine monitoring programs that are currently prohibitively expensive [17]. Such large-scale monitoring programs will ultimately be of great value for forage breeding and genetic resource conservation efforts.