Developing Heterotic Groups for Successful Hybrid Breeding in Perennial Ryegrass

Perennial ryegrass (Lolium perenne L.), an important forage grass species in temperate regions, is genetically improved by population breeding. Although valued for their broad genetic base, the resulting synthetic varieties only partially exploit heterosis. Hybrid breeding offers opportunities to fix beneficial heterotic patterns more effectively and, hence, to increase the yield potential. A suspected bottleneck in the production of perennial ryegrass hybrids is the genetic intermixture of existing germplasm, impeding the definition of heterotic groups. In this study, selected parental populations of a diploid and tetraploid cytoplasmic male sterility (CMS)-based hybrid breeding program were characterized using genotyping-by-sequencing (GBS). Hybrid populations, derived from 26 parental combinations of the tetraploid breeding program, were tested for yield performance and compared to synthetic varieties at five sites over four growing seasons. The hybrids significantly outperformed the synthetics by 4.15% on average for total dry matter yield. Additionally, GBS revealed the existence of sub-populations within the tetraploid CMS germplasm. This sub-population structure represents the untapped potential that could be exploited for heterosis to further increase biomass yields. Here, we show that CMS hybrids generate substantial yield gains in perennial ryegrass and provide a method to further improve hybrid breeding, using GBS to select for heterotic groups.


Introduction
In temperate regions, perennial ryegrass (Lolium perenne L.) is the predominantly grown forage grass, supplying a major fraction of the roughage for beef and milk production [1]. It is valued for its high fodder quality and persistency but lacks reliable tolerance to cold, heat, and drought stress [1,2]. The development of hardy tetraploid varieties has mitigated some of these problems [2]. However, further improvement is needed as perennial ryegrass yields have recently suffered from weather extremes, damaging pastures across central Europe [3,4].
For allogamous forage grasses, the most widely used breeding strategy to produce cultivars is synthetic breeding [5,6]. In perennial ryegrass, too, this strategy has proven successful for cultivar development and for meeting the distinctiveness, uniformity, and stability (DUS) criteria [5]. At the beginning of each breeding cycle, 8-16 plants are selected for a population cross.
population, and one tetraploid pollinator population. For the analysis of the genetic structure, 60 individuals of each of the four populations were randomly selected.
The yield trials consisted of 26 tetraploid hybrid populations, derived from 26 parental combinations of the tetraploid CMS hybrid breeding program. The 26 tetraploid parental populations (nine CMS and 17 pollinator populations, among them, the two used for genetic structure analysis) were crossed following a company-specific matrix scheme to produce F1-hybrid seeds. F1-hybrids were harvested from the CMS populations. Nine tetraploid synthetic varieties were used as standards within the yield trials. The populations, as well as yield trial information, were anonymized according to the common practice of the owner.

Genotyping-by-Sequencing
Three 1 cm pieces of young leaf tissue were harvested from each plant. DNA was extracted using a 96-well plate KingFisher Flex Purification System (Thermo Fisher Scientific, Waltham, MA, USA) with the Mag-Bind® Plant DNA DS 96 Kit (Omega Bio-tek, Inc., Norcross, GA, USA). The quality of the isolated DNA was assessed on a 1% agarose gel, and the DNA was quantified with a NanoDrop 8000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).
One 192-plex and one 60-plex GBS library of the 240 individuals were sequenced using 126 bp single-end reads on two lanes of an Illumina HiSeq2500 platform at the Functional Genomics Center Zurich, Switzerland. Reads were demultiplexed using sabre (https://github.com/najoshi/sabre), with no mismatches allowed. The reads were then aligned to the draft genome of perennial ryegrass [27] using Bowtie2 (v2.3.2) [28]. The resulting alignment files were sorted and compressed before variant calling using SAMtools (v0.1.19), with the number of high-quality bases for each observed allele (DPR) reported [29].
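To illustrate what demultiplexing with no mismatches allowed implies, the following minimal Python sketch assigns a read to a sample only if the read starts with exactly one of the sample barcodes. It is not the implementation used by sabre; the function name `demultiplex_exact`, the barcode sequences, and the sample names are hypothetical, and the sketch assumes that no barcode is a prefix of another.

```python
def demultiplex_exact(reads, barcode_to_sample):
    """Assign reads to samples by exact barcode prefix match (no mismatches).

    Assumes no barcode is a prefix of another barcode.
    Returns (bins, unassigned); the barcode is trimmed from assigned reads.
    """
    bins = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched exactly, so the read stays unassigned.
            unassigned.append(read)
    return bins, unassigned


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "plant_1", "TGCA": "plant_2"}
reads = ["ACGTGGGGAAAA", "TGCATTTTCCCC", "ACGAGGGGAAAA"]
bins, unassigned = demultiplex_exact(reads, barcodes)
```

In this toy input, the third read differs from the first barcode at a single position and is therefore not assigned to any sample.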

Principal Component Analysis, Hierarchical Clustering, and Permutations
The data generated by GBS were analyzed with an in-house R script. In brief, after filtering, samples with >10,000 single nucleotide polymorphisms (SNPs) were retained using the 'vcfR' package [30]. To account for differing ploidy levels, polymorphic loci were converted to allele frequencies (0-1). Because missing values are common in GBS data, genotypes were compared using pairwise-complete observations. Comparisons were measured using the Pearson correlation coefficient ('cor' function). The resulting correlation matrix was then converted to a Euclidean-based distance matrix ('dist' function) and subjected to agglomerative hierarchical clustering ('hclust' function) using the 'ward.D' method. Hierarchical clustering was displayed using the 'factoextra' R package ('fviz_dend' function). The distance matrix was subjected to principal component analysis (PCA, 'prcomp' function) and permutation analysis. For each permutation, 5-55 individuals were selected randomly; the SNP frequencies within each subset were then correlated with those of the whole population, as described above, using the 'cor' function. Each permutation step was subject to 1000 iterations. Differences within and between populations were tested by ANOVA ('aov' function), followed by a post hoc test ('TukeyHSD' function; 0.95 confidence interval).
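The first steps of this workflow (allele frequencies, pairwise-complete Pearson correlation, and the Euclidean distance matrix) can be sketched as follows. The analysis itself was carried out in R; this Python version uses only the standard library and is an illustration under stated assumptions, not the in-house script. In particular, it assumes that allele frequencies are derived from per-allele read depths, and all function names and read depths are hypothetical. Clustering and PCA are omitted.

```python
import math


def allele_frequency(ref_depth, alt_depth):
    """Alternative-allele frequency (0-1) from per-allele read depths.

    Returns None when no reads cover the locus (missing value).
    """
    total = ref_depth + alt_depth
    return None if total == 0 else alt_depth / total


def pairwise_complete_pearson(x, y):
    """Pearson correlation using only loci observed in both samples."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """Sample-by-sample matrix of pairwise-complete correlations."""
    return [[pairwise_complete_pearson(a, b) for b in profiles] for a in profiles]


def euclidean_distance_matrix(rows):
    """Euclidean distance between every pair of rows.

    Assumes every entry is defined (no None values).
    """
    return [
        [math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b))) for b in rows]
        for a in rows
    ]


# Hypothetical (reference, alternative) read depths: three plants, four loci.
depths = {
    "plant_a": [(10, 0), (5, 5), (0, 8), (0, 0)],  # last locus is missing
    "plant_b": [(12, 0), (6, 6), (0, 9), (9, 3)],
    "plant_c": [(0, 7), (4, 4), (11, 0), (2, 6)],
}
profiles = [[allele_frequency(r, a) for r, a in loci] for loci in depths.values()]
corr = correlation_matrix(profiles)
dist = euclidean_distance_matrix(corr)
```

Here the distance is taken between rows of the correlation matrix, which mirrors the default behaviour of the R 'dist' function when it is given a matrix. Plants with similar correlation profiles (plant_a and plant_b in the toy data) end up close together.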

Yield Trials
The 26 tetraploid hybrid populations were tested in five independent field trials sown and harvested in the years 2014-2016, 2015-2017, and 2016-2017, containing two, four, and four tetraploid synthetics as standards, respectively. The standards were recently released varieties and were continuously exchanged in sets of two. Two trials had limitations in space and duration, which resulted in a reduced number of available plots and the use of only two standards for the two harvesting years. Each trial was organized as a fully randomized block design, with each entry in triplicate. Each trial contained a unique set of F1-hybrids derived from crosses of the previous year.
Plot size was 7. Average population performance, measured as yield (Y), was estimated using a linear model with population (P) and environment (trial and year) (E) as predictors with an intercept (i), as shown below. Each location per year was considered independent.
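A model consistent with this description can be written as follows. The form is an assumption based on the predictors named above: effects are taken to be purely additive, and the indices p (population) and e (environment) as well as the residual term are introduced here for illustration only.

```latex
Y_{pe} = i + P_p + E_e + \varepsilon_{pe}
```

Here Y_{pe} is the dry matter yield of population p in environment e, i is the intercept, P_p is the effect of population p, E_e is the effect of environment e (a trial-by-year combination), and \varepsilon_{pe} is the residual.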
The prediction was used to assess the performance of hybrid populations and synthetic varieties across the trial sites and years. As a null hypothesis, no difference between hybrids and synthetic varieties was assumed. The null hypothesis was tested by ANOVA.

Genetic Structure of Selected Parental Populations from a CMS-Based Hybrid Breeding Program
Out of the 240 genotyped individuals, 215 met the filtering criteria of at least 10,000 SNP calls across all polymorphic sites. The hierarchical clustering of these 215 individuals revealed five subgroups (Figure 1a). Within those subgroups, the 55 diploid pollinators, the 51 diploid CMS plants, and the 40 tetraploid pollinators formed separate clusters. However, the tetraploid CMS individuals were split between two distinct clusters with 39 and 16 individuals, respectively. Fourteen individuals did not cluster into their respective groups: one diploid CMS plant clustered with the tetraploid pollinators, one diploid pollinator clustered with the tetraploid CMS plants, nine tetraploid pollinators clustered with the tetraploid CMS plants, and three tetraploid CMS plants clustered with the tetraploid pollinators.
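The counts above can be cross-checked with a short tally. The sketch assumes that the cluster sizes quoted in the text exclude the 14 individuals that clustered outside their own population, which is consistent with the total of 215 retained individuals; the variable names are illustrative.

```python
# Individuals that clustered with their own population (counts from the text).
own_cluster = {
    "diploid pollinator": 55,
    "diploid CMS": 51,
    "tetraploid pollinator": 40,
    "tetraploid CMS": 39 + 16,  # split across two clusters
}

# Individuals that clustered elsewhere, keyed by their population of origin.
misplaced = {
    "diploid pollinator": 1,
    "diploid CMS": 1,
    "tetraploid pollinator": 9,
    "tetraploid CMS": 3,
}

# Retained individuals per population after filtering.
retained = {pop: own_cluster[pop] + misplaced[pop] for pop in own_cluster}
total_retained = sum(retained.values())

# Individuals removed by the SNP-call filter (240 were genotyped).
excluded = 240 - total_retained
```

Under this assumption, each population retains between 49 and 58 of its 60 genotyped individuals, and 25 individuals are excluded by the filter.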
The SNP-based distance matrix was used for PCA, with the first three components explaining 30.96, 50.15, and 63.69% cumulative variance, respectively (Figure 1b, Figure S1). Individuals of the diploid CMS (yellow), diploid pollinator (grey), and tetraploid pollinator (blue) populations clustered in distinct groups. Similar to hierarchical clustering, individuals of the tetraploid CMS population (green) formed a sub-population shared with the tetraploid pollinator.

Sample Numbers Required to Represent the Allelic Composition of Breeding Populations
In order to determine the number of individuals required to represent the allelic composition of the populations, 5-55 individuals of the diploid and tetraploid CMS and pollinator populations were randomly selected for permutation tests (Figure 2). With increasing sample numbers, all populations showed a significant increase in the saturation of the proportion of allelic frequencies. Only the increase from 50 to 55 samples of the diploid CMS population was not significantly different (p = 0.2585). Diploid and tetraploid CMS populations (Figure 2a) showed a significant difference (p = 0.0744) in the proportion of allelic frequencies for up to 45 samples. Within the pollinator populations (Figure 2b), diploids and tetraploids were similar using 20 samples and more (p = 0.9435). In addition, the variance of each permutation subset decreased with an increasing sample number.
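The subsampling logic behind this permutation test can be sketched in Python with simulated data. This is a minimal illustration, not the R script used for the analysis: the population is simulated, subset sizes are assumed to increase in steps of five, and only 50 iterations per size are run here (instead of 1000) to keep the example fast. All names are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_frequencies(individuals):
    """Mean allele frequency per SNP across a list of individuals."""
    n = len(individuals)
    return [sum(column) / n for column in zip(*individuals)]


def subsample_correlations(population, sizes, iterations, rng):
    """Correlate subset allele frequencies with whole-population frequencies."""
    whole = mean_frequencies(population)
    return {
        size: [
            pearson(mean_frequencies(rng.sample(population, size)), whole)
            for _ in range(iterations)
        ]
        for size in sizes
    }


# Simulated tetraploid population (hypothetical data): 60 plants, 100 SNPs.
rng = random.Random(1)
snp_freqs = [rng.uniform(0.05, 0.95) for _ in range(100)]
population = [
    [sum(rng.random() < p for _ in range(4)) / 4 for p in snp_freqs]
    for _ in range(60)
]

results = subsample_correlations(population, range(5, 60, 5), 50, rng)
mean_corr = {size: statistics.mean(vals) for size, vals in results.items()}
var_corr = {size: statistics.variance(vals) for size, vals in results.items()}
```

In such a simulation, the mean correlation rises and its variance across iterations shrinks as the subset size grows, which is the saturation pattern described above.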

Performance of Tetraploid Perennial Ryegrass Hybrids and Synthetic Varieties
In the linear model used to describe the yield performance, the estimated dry matter yield (dt ha⁻¹) was significantly (p < 0.001) correlated to the observed dry matter yield (R² = 0.96) (Figure 3a). Based on the estimated yield, ANOVA revealed significant differences between hybrids and synthetics (p = 0.0155). The average dry matter yield performance was 429.5 dt ha⁻¹ and 411.7 dt ha⁻¹ for hybrids and synthetics, respectively (Figure 3b). This was equivalent to a 4.15% increased average performance of hybrids.
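The relative difference can be recomputed from the two reported means. The sketch below uses the rounded values given above, so the result need not match the reported 4.15% exactly; the text does not state which reference value or which unrounded estimates were used, and both choices of denominator are shown for that reason.

```python
# Mean dry matter yields reported in the text (dt per ha, rounded).
hybrid_mean = 429.5
synthetic_mean = 411.7

# Absolute yield advantage of the hybrids.
difference = hybrid_mean - synthetic_mean

# Relative advantage, expressed against each of the two possible references.
gain_vs_synthetics = 100 * difference / synthetic_mean
gain_vs_hybrids = 100 * difference / hybrid_mean

print(f"difference: {difference:.1f} dt/ha")
print(f"relative to synthetics: {gain_vs_synthetics:.2f}%")
print(f"relative to hybrids: {gain_vs_hybrids:.2f}%")
```

With the rounded means, the difference of 17.8 dt ha⁻¹ corresponds to roughly 4.3% of the synthetic mean and roughly 4.1% of the hybrid mean.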

Improved Yield Performance Through Hybrid Breeding
The tetraploid perennial ryegrass hybrids described here outperformed tetraploid synthetic varieties by 4.15% on average for total dry matter yield. Considering the annual genetic gain for biomass yield in perennial ryegrass, this yield increase equates to seven to ten years of breeding progress [12,31]. However, only the hybrid performance was assessed but not heterosis, as knowledge of parental performance is absent. In addition, the hybrids were generated by crossing CMS populations with highly heterozygous pollinator populations. This is in contrast to most hybrid breeding strategies, where parental inbred lines are crossed to maximize heterozygosity and heterosis in the F1 hybrid.
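The equivalence between the yield advantage and years of breeding progress implies a range for the annual genetic gain. The following sketch derives that range from the figures in the text, assuming a linear (non-compounding) gain per year; the resulting rates are implied by this arithmetic and are not quoted from the cited studies.

```python
# Figures taken from the text.
yield_gain_percent = 4.15
years_low, years_high = 7, 10

# Annual gain implied if the advantage equals 7 or 10 years of progress,
# assuming the gain accumulates linearly.
implied_gain_upper = yield_gain_percent / years_low
implied_gain_lower = yield_gain_percent / years_high

print(f"implied annual gain: {implied_gain_lower:.2f}-{implied_gain_upper:.2f}% per year")
```

Under this assumption, the statement corresponds to an annual genetic gain of roughly 0.4-0.6% per year.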
The efficient development of inbred lines in perennial ryegrass is hampered by an effective self-incompatibility (SI) mechanism, which prevents seed production from selfings. Additionally, SI and subsequent outcrossing have allowed for the accumulation of deleterious alleles in the evolutionary history of Lolium spp., leading to severe inbreeding depression in selfed breeding material. Overcoming SI, and hence the barrier to inbreeding, would not only make it possible to increase the degree of homozygosity in parental plant material but also to purge deleterious alleles from breeding germplasm through recurrent cycles of self-pollination. This could lead to the development of superior inbred lines to be used as parents in hybrid breeding schemes, unlocking the potential to further increase biomass yields in F1 hybrids [5,32].

Identification of Heterotic Groups is Key for Successful and Efficient Hybrid Breeding in Perennial Ryegrass
Genotyping-by-sequencing allowed us to identify heterotic groups within the diploid and tetraploid parental breeding populations. This is a major improvement compared to unstructured germplasm, as identification of heterotic groups is not only pivotal to fully exploit the potential of hybrid breeding but also to improve the efficiency of CMS-based hybrid breeding strategies [17,25,33]. Using CMS, it is possible to test the general combining abilities of selected populations within the breeding material [34]. In practice, however, only a fraction of the possible parental combinations can be evaluated in each cycle [17]. The factors constraining testcross combinations include technical (seed production, space) and biological (non-overlapping flowering times) limitations [17,35]. Thus, the availability of marker data within and between heterotic groups will be helpful to allocate the often limited resources of a breeding program to the most promising parental combinations for testcross evaluation.
Previous studies have presented methods where genotyping has been performed on pooled samples to infer population allele frequencies [26,36,37]. These approaches have been successful in distinguishing varieties, aiming to support breeders' rights and to improve variety registration. However, they have failed to identify the detailed structure within population-based germplasm and to support the development of heterotic groups [26,38]. In contrast, the individual-based genotyping strategy described here not only allows precise germplasm fingerprinting, but it also provides a solid basis to develop and further diverge heterotic groups using marker data of individual plants.
With the highly multiplexed GBS protocol applied here, genotyping of up to 1536 individuals is possible with a single run on an Illumina HiSeq2500 instrument. Additional flexibility and throughput of DNA sequencing can be realized using the novel Illumina NovaSeq 6000 or similar systems, further consolidating GBS as a time- and cost-effective tool to genotype single individuals. By doing so, breeders can assign individuals of elite populations into distinct heterotic groups. For this purpose, genotyping 20-45 individuals is sufficient to capture, on average, 99% of the genetic variation and to detect concealed heterotic groups within a population. As the development of any heterotic group requires a minimum of 8-16 plants, genotyping of at least 60 individuals is recommended to form new breeding populations. Diversification and development of heterotic groups could potentially lead to similar success in hybrid breeding, as seen in maize, canola, rye, and other crops [19-21,26].
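As a simple planning aid, the run capacity and the recommended sample size given above translate into a number of populations per sequencing run. The sketch assumes that every population is genotyped with exactly 60 individuals and that no wells are reserved for controls; the variable names are illustrative.

```python
# Figures taken from the text.
run_capacity = 1536          # individuals per HiSeq2500 run
plants_per_population = 60   # recommended number of genotyped individuals

# Whole populations that fit into one run, and the capacity left over.
populations_per_run = run_capacity // plants_per_population
unused_capacity = run_capacity - populations_per_run * plants_per_population

print(populations_per_run, unused_capacity)
```

Under these assumptions, 25 populations can be characterized in a single run, with capacity for 36 further individuals remaining.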

Evidence for Sub-Populations within Heterotic Groups as Revealed by Genotyping-by-Sequencing
Interestingly, the proportion of allelic frequencies within the diploid and tetraploid pollinator populations appears to be higher when compared to their respective CMS populations. There are a number of potential reasons for this difference in the genetic make-up of the pollinator and the CMS populations. For example, the CMS trait has been introgressed from a single founder plant. This could have influenced the selection of alleles that are necessary to maintain the male sterility within CMS populations. Additionally, CMS populations have been subject to seed propagation by backcrossing to their maintainer populations. This could have driven the emergence of sub-populations and amplified their diversity based on the allelic constitution at SI loci. Finally, SI could also have played a role in maintaining high diversity by frequency-dependent selection in pollinator populations: the phenotypic selection of plants with high seed yield may have favored pollen contaminations with novel SI alleles [5].
In the context of hybrid breeding, overlap between heterotic groups increases the degree of homozygosity in the resulting hybrids and consequently impedes hybrid performance. Our observations in the tetraploid CMS germplasm described here illustrate the need to genotype single plants to develop and maintain breeding populations. An exhaustive analysis of populations from the CMS hybrid breeding program beyond the four characterized in our study could identify more populations harboring multiple groups, thus providing a method to identify complementary germplasm and to develop heterotic groups by directed crosses. Both are key to successful hybrid breeding for perennial ryegrass [20].

Conclusions
Averaged across multiple evaluation sites and years, the biomass yield performance of CMS-based population hybrids in tetraploid perennial ryegrass was significantly higher compared to currently used synthetic cultivars. Detailed analysis of the genetic structure of parental populations using molecular tools will help to realize the untapped potential to further increase biomass yields. Genotyping-by-sequencing is a useful tool to characterize diploid and tetraploid breeding populations of perennial ryegrass and possibly other crops in order to identify population structures. Through the continuous development of populations towards distinct heterotic groups, the potential of heterosis can be exploited more efficiently.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2073-4395/10/9/1410/s1, Figure S1: Cumulative proportion of explained variance of the principal component analysis (PCA) used to describe the genetic structure of selected diploid and tetraploid parental populations from a cytoplasmic male sterility (CMS)-based hybrid breeding program of perennial ryegrass (Lolium perenne L.).