Assessment of Genetic Diversity in Faba Bean Based on Single Nucleotide Polymorphism

Assessment of genetic diversity is important for characterising crop plant collections and detecting valuable trait variation for use in breeding programs. A collection of faba bean (Vicia faba L.) genotypes was evaluated for intra- and inter-population diversity using a set of 768 genome-wide distributed single nucleotide polymorphism (SNP) markers, of which 657 obtained successful amplification and detected polymorphisms. Gene diversity and polymorphism information content (PIC) values ranged from 0.022 to 0.500 and from 0.023 to 0.375, with averages of 0.363 and 0.287, respectively. The genetic structure of the germplasm collection was analysed and a neighbour-joining (NJ) dendrogram was constructed. The faba bean accessions grouped into two major groups, with several additional smaller subgroups, predominantly on the basis of geographical origin. These results were further supported by principal coordinate analysis (PCoA), deriving two major groupings which were differentiated on the basis of site of origin and pedigree relationships. In general, high levels of heterozygosity were observed, presumably due to the partially allogamous nature of the species. The results will facilitate targeted crossing strategies in future faba bean breeding programs in order to achieve genetic gain.


Introduction
Faba bean (Vicia faba L.) is a major grain legume crop, grown for high seed protein content (approximately 25%-30%) and superior biomass, ranking as the fourth most internationally important cool-season legume after peas, chickpeas and lentils [1]. Although the exact origin is unknown, it is generally accepted that faba bean was one of the earliest food legumes to be domesticated, and has probably been cultivated since the Neolithic period [2]. Both botanical and molecular genetic data suggest that the wild progenitor of contemporary faba bean is yet to be discovered, or has become extinct [3]. As for other legume crops, faba bean plays a critical role in crop rotations and effective nitrogen fixation for the purpose of soil improvement [4]. Faba bean (Vicia faba L.) is a diploid (2n = 2x = 12), partially allogamous plant species [5], with a very large nuclear genome (c. 13 Gbp diploid [6,7]). Large genomes pose substantial challenges to effective development and implementation of molecular genetic markers for genomics-assisted breeding.
More than 43,500 faba bean accessions are conserved within 37 global collections [1,8]. The largest collection of faba bean accessions (>9,000 accessions) is located at the International Center for Agricultural Research in the Dry Areas (ICARDA) in Syria, followed by the Chinese Academy of Agricultural Sciences (CAAS) in China (>5,200 accessions). Faba bean has long attracted the interest of numerous taxonomists and evolutionists due to many unknown evolutionary aspects [9]. The species is divided into distinct groups based on seed size, ranging from small-seeded minor beans (0.2-0.8 g per seed) to medium-seeded equina beans and the large-seeded major beans (greater than approximately 1.0 g and up to 2.6 g per seed), which have become known as distinct botanical groups [10].
The key factor for successful crop improvement is a continued supply of genetic diversity in breeding programs, including new or improved variability for target traits. Plant genetic resources have always played a major role in providing sources of resistance to biotic and abiotic stresses, and it is therefore critical to manage, conserve and evaluate such materials. Although management and utilisation of large-scale diversity in collections provide major challenges to germplasm curators and crop breeders, it is equally important to characterise and understand genetic diversity among plant resources for effective use in breeding programs [11].
Molecular genetic markers represent a powerful tool for characterisation of germplasm collections. Different types of marker systems have been used to characterise genetic diversity in various crop species, including faba bean. Genetic diversity among 28 inbred lines (European and Mediterranean lines) was assessed using random amplified polymorphic DNA (RAPD) markers [9], identifying three major clusters comprising European large-seeded, European small-seeded and Mediterranean germplasm. Amplified fragment length polymorphism (AFLP) analysis was used to determine genetic diversity among inbred lines derived from elite cultivars in some earlier studies [10,12], while in another study the authors used target region amplification polymorphism (TRAP) markers to assess genetic diversity and relationships between faba bean germplasm entries [13]. Mediterranean landraces that were highly diverse for morpho-agronomic traits were analysed using inter-simple sequence repeat (ISSR) markers, revealing substantial diversity. ISSR markers were also used to assess genetic diversity and relationships within globally distributed faba bean germplasm [14]. The authors concluded that accessions from North China showed the highest genetic diversity, while accessions from central China displayed a low level of diversity, and accessions from Europe were genetically closer to those from North Africa. Recently, large numbers of simple sequence repeat (SSR) markers have been identified and characterised from faba bean [15–17], and those derived from expressed sequence tags (ESTs) were used to understand the genetic relationships between 32 genotypes, permitting definition of four distinct clusters based on geographical origins [17].
Until recently, SSRs have been considered as the marker system of choice for the majority of applications. However, recent advances in sequencing and genotyping technologies now permit generation of large sets of single nucleotide polymorphism (SNP) markers from relatively understudied crop species such as faba bean at an acceptable level of cost. As a consequence, SNPs have become more widely used due to high abundance and capacity to be multiplex-formatted for high-throughput genotyping. In addition, SNP discovery from transcribed regions of the genome provides the basis to establish a direct link between sequence polymorphism and putative functional variation.
In the present study, a selection was made of 45 faba bean lines from North Africa, China, Ecuador, Europe and Australia that represent Australian cultivars, as well as the major parents of the Australian faba bean breeding program. Multiple genotypes from each of the faba bean lines were genotyped with EST-derived SNP markers to assess genetic diversity, which was then related to geographical location of origin and pedigree structure, providing support for the design of future breeding populations.

Plant Material and DNA Extraction
A total of 45 accessions of faba bean (Vicia faba L.) that originated from different geographical locations and represented major Australian cultivars and donors of major traits of interest to the breeding program were obtained from the Pulse Breeding Australia faba bean breeding program, University of Adelaide, New South Wales Department of Primary Industries and University of Sydney, Australia (Table 1). Two seeds from each accession were sown into potting mix, and young leaf tissue was harvested from each plant and stored immediately in a 96-well microtube plate. Total genomic DNA was isolated after grinding with the MM 300 Mixer Mill system (Retsch, Germany). DNA extraction was performed using the DNeasy 96 plant mini kit (QIAGEN, Hilden, Germany). DNA was suspended in 1 × TE buffer and further diluted to approximately 50 ng/μL prior to SNP genotyping.

SNP Genotyping
A sub-set of 768 SNPs was selected on the basis of informative data from previous SNP discovery and genetic linkage mapping experiments [18]. All of these SNPs met the criteria of possessing sufficient sequence information both 5' and 3' to the target locus, and absence of other known sequence variants in their vicinity. The designability score calculated for each SNP locus by Illumina Inc. (San Diego, CA, USA) was higher than 0.6, predicting high assay conversion rates. A total of 250 ng of genomic DNA from each plant genotype was used for amplification, after which PCR products were hybridised to bead chips via the address sequence for detection on an Illumina iSCAN Reader. On the basis of obtained fluorescence, data for allele calls were viewed graphically as a scatter plot for each marker assay using GenomeStudio software v2011.1 with a GenCall threshold of 0.20.
Table 1. Details of breeding history, major characteristic traits and origin of the cultivars and germplasm evaluated in the diversity study. The countries given as the origin of the source germplasm are listed in decreasing order of contribution to the pedigree of the test genotype. The origin of source germplasm reflects the original source, when known, rather than the location of the donor.

Genetic Diversity and Population Structure Analysis
SNP scoring data was analysed using the NTSYSpc 2.1 package [19]. Basic statistics were calculated using the genetic analysis package PowerMarker (ver. 3.23 [20]) for diversity measurements at each locus, including the total number of alleles (NA), allele frequency, major allele frequency (MAF) (allele with the highest frequency), accession-specific alleles, gene diversity (GD), and polymorphism information content (PIC). Genetic distance between each pair of accessions was calculated using the equation from Nei [21]. Based on genetic similarity, a dendrogram was constructed by application of the unweighted pair group method with arithmetic average (UPGMA) cluster analysis using DARwin v5 [22].
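For orientation, the two locus-level statistics named above can be sketched in a few lines of Python. The sketch assumes the commonly used definitions, GD = 1 − Σp_i² and PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², where p_i is the frequency of allele i. That these are the exact estimators applied by PowerMarker in this study is an assumption (small-sample corrections, for instance, are ignored), and the function names are purely illustrative.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content for one locus.

    Assumed form: gene diversity minus the sum over allele pairs
    of 2 * p_i^2 * p_j^2.
    """
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - cross


if __name__ == "__main__":
    # A biallelic SNP with equal allele frequencies (illustrative input).
    balanced = [0.5, 0.5]
    print(gene_diversity(balanced), pic(balanced))
```

Under these definitions a biallelic SNP with equal allele frequencies gives GD = 0.5 and PIC = 0.375, which are the upper limits attainable for a two-allele marker.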
Genetic structure of the germplasm collection was analysed by performing principal coordinate analysis (PCoA) implemented in the GenAlEx 6.41 package [23], based on standardised covariance of genetic distance calculated for codominant markers.
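The general idea of PCoA can be illustrated with a minimal pure-Python sketch. It follows the classical (Gower) procedure: squared distances are double-centred and the leading eigenvectors of the resulting matrix, scaled by the square root of their eigenvalues, give the sample coordinates. This is not GenAlEx's implementation, and its standardised-covariance option is not reproduced. The eigenvectors are obtained by simple power iteration with deflation, which assumes that the leading eigenvalues are positive and well separated. All function names are hypothetical.

```python
import math


def double_centre(dist):
    """Gower-centre the matrix of -0.5 * d^2 values."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]          # a is symmetric: row means = column means
    grand = sum(row) / n
    return [[a[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def top_eigenpair(b, iters=1000):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(b)
    v = [float(i + 1) for i in range(n)]   # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                    # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(x * y for x, y in zip(v, bv)), v


def pcoa(dist, n_axes=2):
    """Return per-sample coordinates and the proportion of variation per axis."""
    b = double_centre(dist)
    n = len(b)
    total = sum(b[i][i] for i in range(n))  # trace = sum of all eigenvalues
    coords = [[] for _ in range(n)]
    explained = []
    for _ in range(n_axes):
        lam, v = top_eigenpair(b)
        explained.append(lam / total if total > 0 else 0.0)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(v[i] * scale)
        # Deflate so that the next pass finds the following axis.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords, explained
```

Calling `pcoa(matrix, n_axes=2)` on a symmetric distance matrix returns two coordinates per sample, together with the share of total variation carried by each axis, which is the quantity usually quoted as the percentage of variance explained.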

SNP Polymorphism
A sub-set of 768 genome-wide distributed SNPs was used to assess genetic diversity both within and between faba bean accessions. Two individual plants from each accession were genotyped in order to assess levels of intra-population genetic diversity. Of the total, 657 SNPs (85.5%) obtained successful amplification and detected polymorphisms, while the remaining 111 assays either failed or produced inconsistent results. Only a small percentage (~3%) of the marker data was missing for any individual accession. These results are comparable to a similar study of diversity analysis in faba bean based on use of 80 SNP markers, of which 67 (83.5%) obtained successful amplification [24]. Similarly, SNP-based studies of germplasm characterisation in other crop species obtained comparable results for amplification efficiency (91% in grape [25]; 74% in winter wheat [26]).
In order to define both intra- and inter-population diversity, SNP polymorphism was measured in terms of the gene diversity and PIC value, values ranging from 0.022 (SNP_50000835) to 0.500 (SNP_50001459) and from 0.023 (SNP_50000835) to 0.375 (SNP_50001459), with averages of 0.363 and 0.287, respectively. Major allele frequencies ranged up to 0.989 (SNP_50000835) with an average of 0.716. A total of 15% of the SNP markers exhibited MAFs greater than 0.900. The highest level of heterozygosity (0.989) was detected at SNP_50000841 and SNP_50000709, followed by SNP_50002454, SNP_50002393, SNP_50001315, SNP_50000803, SNP_50000804, SNP_50000805, SNP_50000270, SNP_50000396, SNP_50000114 and SNP_50000129 (0.978). Despite the high levels of heterozygosity obtained for some of the SNPs, these markers were found to be highly informative in the current study, as well as to segregate in an associated linkage mapping study [18]. Heterozygosity was lowest at locus SNP_50000835 (0.023), followed by SNP_5000029 (0.025) and SNP_50000669 and SNP_50002436 (0.033) (Supplementary 1). A high level of heterozygosity was observed within faba bean accessions in most cases, as predicted by the partially allogamous nature of the species [27–29].
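As a simple illustration of how the per-locus values above are obtained, the following Python sketch computes major allele frequency and observed heterozygosity from diploid genotype calls. The input encoding (two-letter strings such as 'AA' or 'AB', with None for a missing call) and the function name are assumptions made for the example, not the format used in the study.

```python
from collections import Counter


def locus_summary(calls):
    """Return (major allele frequency, observed heterozygosity) for one SNP.

    `calls` holds diploid genotypes such as 'AA', 'AB' or 'BB';
    None marks a missing call and is excluded from both statistics.
    """
    scored = [c for c in calls if c is not None]
    if not scored:
        raise ValueError("no scored genotypes at this locus")
    alleles = Counter(a for c in scored for a in c)
    major_allele_freq = max(alleles.values()) / (2 * len(scored))
    observed_het = sum(1 for c in scored if c[0] != c[1]) / len(scored)
    return major_allele_freq, observed_het
```

If all 90 plants (two from each of 45 accessions) were scored at a locus, 89 heterozygous calls would correspond to an observed heterozygosity of 89/90, or approximately 0.989.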

Genetic Diversity Analysis
Genetic distance between studied genotypes was quantified using Nei's metric [21]. The appropriateness of use for specific similarity coefficients has been often debated [30]. In the present study, in which two genotypic samples were obtained from each accession under study, the Nei GD metric (in which pair-wise comparisons are made and a similarity matrix is constructed) was selected (Supplementary 2). The highest value of GD was recorded between genotypes Ascot_2 and Icarus_1 (0.176). The closest pair of genotypes was 624#8103 and 391#5148 (Nei's coefficient = 0.00133), both of which are rust-resistant selections undertaken in Australia [31] from accessions that originate from the Maghreb in north-western Africa. A large amount of intra-population variation was also observed for the majority of the accessions, the exception being 1248-4, for which the value of Nei's coefficient (genetic distance) between the two samples was 0.0, the two biological replicates obtained from this accession hence being genetically identical at the assayed loci. A neighbour-joining phylogram was generated (Figure 1, Supplementary 3) based on UPGMA-derived data. The faba bean accessions grouped into two main groups (G-I and G-II). G-I consisted of 16 accessions (6 accessions from Ecuador; 8 from Australia; and one accession that originated from the ICARDA breeding program). G-II consisted of 29 accessions and could be further sub-divided into three sub-groups (G-II-A, G-II-B and G-II-C). G-II-A consisted of 4 accessions that originated from China, 7 accessions from North Africa, 4 from Australia and one from Europe. G-II-B consisted of 2 accessions from Australia, and G-II-C consisted of another two accessions from Australia. The large number of sub-clusters identified in this study indicates high genetic variability related to diverse collection sites and may prove useful for exploitation of trait variability in faba bean improvement.
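The pair-wise distance calculation can be sketched as follows. The sketch assumes Nei's (1972) standard genetic distance, D = −ln(J_xy / √(J_x J_y)), applied to two individual plants whose within-sample allele frequencies at each SNP are 0, 0.5 or 1. Which of Nei's distance measures was actually applied is not specified in the text, so this choice, the genotype encoding and the function names are all assumptions.

```python
import math


def allele_freqs(call):
    """Within-sample allele frequencies for one diploid SNP call, e.g. 'AB'."""
    return {a: call.count(a) / 2 for a in set(call)}


def nei_distance(sample_x, sample_y):
    """Nei's (1972) standard distance over loci scored in both samples."""
    jx = jy = jxy = 0.0
    for cx, cy in zip(sample_x, sample_y):
        if cx is None or cy is None:       # skip loci with a missing call
            continue
        fx, fy = allele_freqs(cx), allele_freqs(cy)
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        jxy += sum(p * fy.get(a, 0.0) for a, p in fx.items())
    if jxy == 0.0:
        return math.inf                    # no shared alleles (or no shared loci)
    return -math.log(jxy / math.sqrt(jx * jy))
```

Two samples with identical calls at every scored locus give a distance of zero under this definition, which is the situation described for the two plants of accession 1248-4.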
All of the Australian cultivars and breeding lines in G-I, with the exception of Doza, were either selected from germplasm derived from Ecuador, or include Ecuadorean germplasm in their pedigrees. In addition, Acc 683 (which was obtained from ICARDA) has Ecuadorean parentage. Germplasm from Ecuador, and in particular the BPL 710 population from which Icarus was selected, has been reported to display superior levels of resistance to chocolate spot (Botrytis fabae Sardiña) [32–34], and has consequently been used extensively in the Australian faba bean breeding program. A number of diversity studies have been reported for faba bean, for example comparing European and Mediterranean germplasm [9], Chinese and globally distributed germplasm [27,35], and European, North African and Asian germplasm [10], but none have included germplasm from Ecuador. The present study indicates a significant divergence from other accessions of both Ecuadorean germplasm and breeding lines that include a significant contribution from this germplasm. This observation suggests that more intensive utilisation of Ecuadorean germplasm could allow broadening of diversity in faba bean breeding programs. Faba bean is of Old World origin, and was introduced to Latin America post-Columbus. Consequently, the apparent divergence observed here warrants further investigation, both of other Latin American faba bean populations to establish the extent of variation in this region, and of comparisons between Latin American and Western European faba bean accessions, the latter presumably being the source of Latin American introductions.
The majority of G-II germplasm originated from the Mediterranean Basin or Middle East, the major exception being a sub-cluster within G-II-A, comprising four accessions from China. Acc 1714/1, which originated from the spring-sown cultivation region of Gansu, was separated from the other three lines, which originated from the sub-tropical and temperate regions of southern China. Chinese germplasm has been reported to be highly distinct from other germplasm [14,27,35], consistent with the result for G-II when considered in isolation. However, when G-I is also taken into account, considerably greater diversity among faba bean germplasm is apparent than implied by the prior comparisons of Chinese and global germplasm.

Relationships between Genetic Diversity and Breeding History
The progression in Australian faba bean breeding, and relationship between cultivars, is apparent in the dendrogram structure depicted in Figure 1. The breeding program has progressed through three phases. The initial phase consisted of introduction and evaluation, and resulted in the release of the first cultivar (Fiord) in 1980, while Icarus, Manafest and Aquadulce also fit in this category. Faba bean cultivars and accessions are generally heterogeneous for many traits, a consequence of the partially allogamous mating system, and this enabled a second phase based on selection within populations. Considerable heterogeneity has been observed within many populations for reaction to ascochyta blight (Ascochyta fabae) [36] and selection within cultivars has resulted in resistant cultivars such as Ascot (selected from Fiord), Farah (selected from Fiesta) and PBA Kareema (selected from Aquadulce) [37–39]. In all cases, original and derived cultivars are grouped together in the dendrogram. The third phase of breeding activities has been based on hybridisation followed by selection in segregating populations. PBA Rana (breeding line AF01006-1) resulted from a single backcross between Manafest as the recurrent parent and Acc 611 as the donor of resistance to ascochyta blight, and, as expected, PBA Rana is grouped with Manafest. The progeny of more complex or diverse crosses generally occupy positions intermediate to the parents in the dendrogram.
A high level of genetic diversity in breeders' germplasm is a key to successful crop improvement. An earlier study concluded that genetically diverse parents improve the efficiency of linkage map construction and identification of linked markers for traits of interest [40]. Molecular genetic marker analysis has clarified the structure of genetic diversity in a broad range of crops. Recent technological developments have made whole-genome sequence polymorphism and gene-targeted surveys possible, casting light on population dynamics and the impact of selection during domestication. Germplasm description has hence conferred analytical power for resolution of the genetic basis of trait variation and adaptation in major crops such as cereals, chickpea, grapevine, cacao, or banana [41].

Population Structure Analysis
The genetic structure of the entire germplasm collection was analysed using PCoA. PCoA based on SNP allele frequencies revealed a clear differentiation between faba bean genotypes. The first and second axes explained 31.85% and 16.16% of the total variance, and predominantly separated faba bean genotypes on the basis of geographical origin and pedigree relationships (Figure 2). Two major clusters were identified; cluster 1 only contained Icarus, while cluster 2 contained all other genotypes. Cluster 2 further contained small sub-clusters grouping according to geographical origin or pedigree relationships. PCoA also confirmed the results obtained from Nei's GD estimates, Ascot and Icarus being most divergent. Nura, which is the progeny of an Icarus × Ascot cross, was at the mid-point between the two parents on the first axis.

Figure 2. Principal coordinate analysis (PCoA) bi-plot generated from genetic distance calculations obtained using the GenAlEx package.

Table 1. Cont. Accession number in The University of Adelaide faba bean collection; ILB (International Legume Bean) and BPL (Bean Pure Line): Accession number in International Center for Agricultural Research in the Dry Areas (ICARDA) collections; Seed size: approximate weight per 100 seeds when grown in field plots in southern Australia; Adaptation: good adaptation in specific regions of Australia; Asco res: Ascochyta blight resistant; Choc spot res: Chocolate spot resistant; Cero res: Cercospora leaf spot resistant; BLRV res: Bean leaf roll virus resistant; BYMV res: Bean Yellow Mosaic Virus resistant; Salt tol: Salt tolerant; Low v/cv: Low vicine/convicine.