Genotyping-By-Sequencing Reveals Population Structure and Genetic Diversity of a Buffelgrass (Cenchrus ciliaris L.) Collection

Abstract: Buffelgrass (Cenchrus ciliaris L.) is an important forage grass widely grown across the world with many good characteristics including high biomass yield, drought tolerance, and adaptability to a wide range of soil conditions and agro-ecologies. Two hundred and five buffelgrass accessions from diverse origins, conserved as part of the in-trust collection in the ILRI genebank, were analyzed by genotyping-by-sequencing using the DArTseq platform. The genotyping generated 234,581 single nucleotide polymorphism (SNP) markers, with polymorphic information content (PIC) values ranging from 0.005 to 0.5, and the short sequences of the markers were aligned with foxtail millet (Setaria italica) as a reference genome to generate genomic map positions of the markers. One thousand informative SNP markers, representing a broad coverage of the reference genome and with an average PIC value of 0.35, were selected for population structure and diversity analyses. The population structure analysis suggested two main groups, while the hierarchical clustering showed up to eight clusters in the collection. A representative core collection containing 20% of the accessions in the collection, with germplasm from 10 African countries and Oman, was developed. In general, the study revealed the presence of considerable genetic diversity and richness in the collection and a core collection that could be used for further analysis for specific traits of interest.


Introduction
The availability of sufficient quantity and quality feed resources is a key factor underpinning sustainable livestock production, particularly with the current trend of climate change [1] and to meet the ever-increasing demand for livestock products. To address these challenges, the promotion of new options of climate resilient forages from the collections held in genebanks is considered crucial. Thus, generating knowledge to support a greater understanding of the genetic resources of forage crops and promoting their use can contribute to the sustainable development goals of 'no poverty', 'zero hunger', and ensuring 'healthy lives'.
Buffelgrass (Cenchrus ciliaris L., Poaceae) is an important warm-season perennial (C4) forage grass [2] widely grown in the tropics, subtropics, and warm temperate areas of the world [3,4]. It is widely distributed, but native to Africa, the Middle East, Western Asia, and Europe [3,4]. Over the course of forage cultivation, it has been introduced to the USA, Mexico, Colombia, Nicaragua, El Salvador, Honduras, Brazil, Bolivia, Panama, Venezuela, and Australia, where several cultivars have been developed [3,4].
Buffelgrass is a climate-resilient species adapted to diverse soil characteristics, altitudes (sea level to 2000 m), and agro-ecological conditions [3]. Known for its good pasture production across a wide range of environments, it can produce up to 24 tons/ha/yr of good quality forage [10]. It is one of the species most tolerant of drought and high-temperature stress, and can grow in areas with annual rainfall as low as 250 mm and as high as 2670 mm [3]. Some genotypes have been shown to tolerate cold temperatures [11,12]. It can be grazed directly, is capable of recovering from heavy grazing, and can be made into hay and stored for use during the feed shortage seasons of the year [10]. It is a deep-rooted species that responds rapidly to rain and plays a crucial role in soil conservation [13]. The combination of these characteristics makes buffelgrass a forage of choice in smallholder farming systems.
The forage genebank at the International Livestock Research Institute (ILRI) maintains a large collection of buffelgrass, both as live plants in field genebanks and as seeds in cold storage. The germplasm was collected from different parts of Africa, India, Yemen, and Oman [14]. Previous phenotypic studies using subsets of the collection revealed the presence of large variation for agro-morphological traits [15,16]. Similarly, wide agro-morphological variation has been recorded in buffelgrass germplasm from South Africa [17], Pakistan [18], Tunisia [19], and other countries [20]. Considerable genetic diversity was also reported in buffelgrass germplasm maintained in the United States Department of Agriculture-Agricultural Research Service (USDA-ARS) genetic resources conservation unit and in germplasm collected from different provenances of Tunisia using AFLP molecular markers [21,22]. Together, these reports suggest the presence of wide genetic diversity in the different collections that could be used for the selection of climate-resilient lines and to develop novel varieties to address future unforeseen production constraints.
Despite the phenotypic diversity, there is little information on genetic diversity in the collection based on molecular characterization. Developing our understanding of the genetic diversity contained in the collection and how this relates to and potentially complements other collections of this species will contribute to the enhanced use, conservation, and improvement of buffelgrass germplasm globally. Therefore, the aim of this study was to characterize the buffelgrass collection maintained in ILRI's genebank and to develop a core collection containing most of the genetic diversity using the molecular approach of genotyping-by-sequencing (GBS) of the DArTseq platform.

Materials
Two hundred and five accessions of buffelgrass held in the ILRI genebank were used in the study. The collection contains germplasm collected from different parts of Africa, Asia, and the Middle East (Figure 1, Table S1), and its accessions show a diverse range of morphological and agronomic characteristics [15,16].

DNA Extraction
Leaf samples were collected from plants maintained in the Zwai field genebank (7.899966, 38.734574) in the Oromia region, Ethiopia. DNA was extracted from freeze-dried leaf samples using a DNeasy Plant Mini kit (Cat No./ID: 69106) according to the manufacturer's instructions. The DNA quantity and quality were checked using a DeNovix DS-11 spectrophotometer. DNA samples were diluted to a concentration of 50-100 ng/μL, and 25 μL of each diluted DNA sample was aliquoted into 96-well fully skirted plates, packed, and shipped for genotyping.

Genotyping
GBS was performed on the DArTseq platform at Diversity Array, Canberra, Australia. The single nucleotide polymorphism (SNP) markers were generated according to the DArTseq protocol as described elsewhere [23]. The marker fragments were aligned with the Setaria italica reference genome [24] and the genome-wide distribution of the markers was visualized using the R package Synbreed [25]. Setaria italica was selected as the reference based on a phylogenetic tree analysis of species in the Poaceae family for which a whole genome sequence is available in the literature [26]. The basic chromosome number and the subfamily were also taken into account in selecting the reference genome. Cenchrus ciliaris and Setaria italica both belong to the subfamily Panicoideae (Poaceae) and share a basic chromosome number of x = 9.

Data Analysis
The genotyping data were analyzed using several statistical software packages. The percentage of missing data and polymorphic information content (PIC) were calculated in Microsoft Excel. The PIC value was calculated using the formula $\mathrm{PIC} = 1 - \sum_i X_i^2$, where $X_i$ is the frequency of the $i$-th allele of the SNP marker [27]. Markers with known genomic positions, ≤20% missing data, and a PIC value of ≥0.2 were selected for population structure and genetic diversity analyses. The DAPC function of the R package adegenet [28] was used to determine the optimal number of groups and assign individual accessions to the different groups, as well as to determine each marker's contribution to the diversity in the collection. The Euclidean distance matrix and hierarchical clustering were calculated using the dist() and hclust() functions of R statistical software [29]. The R packages dendextend [30] and factoextra [31] were used to visualize the phylogenetic relationship and a principal component analysis of the population, respectively. Analysis of molecular variance (AMOVA) was conducted to determine the contribution of among- and within-cluster variation to the total variation using GenAlex 6.5 [32]. The STRUCTURE software [33,34] was used to analyze population structure as described elsewhere [35], with modifications as follows: the burn-in time and number of iterations were each set to 30,000, with three repetitions testing the probability of K = 2 to 20 subpopulations. The results of the run were uploaded to the software "Structure harvester" [36] and the optimal number of subpopulations was determined by the Evanno method [37].

[Figure: number of accessions by country of origin.]
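The marker filter described above can be illustrated with a short sketch. It is not the spreadsheet calculation used in the study: the genotype encoding (0, 1, or 2 copies of the alternate allele, with None for a missing call), the diploid-style allele counting, and the function names are all assumptions made for illustration only.

```python
from typing import Optional, Sequence

# Assumed encoding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_fraction(calls: Sequence[Genotype]) -> float:
    """Fraction of accessions without a genotype call for one marker."""
    return sum(g is None for g in calls) / len(calls)


def pic(calls: Sequence[Genotype]) -> float:
    """PIC = 1 - sum(X_i^2) over the two alleles of a biallelic SNP."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # diploid-style counting (assumption)
    ref_freq = 1.0 - alt_freq
    return 1.0 - (ref_freq ** 2 + alt_freq ** 2)


def passes_filter(calls: Sequence[Genotype], mapped: bool,
                  max_missing: float = 0.20, min_pic: float = 0.2) -> bool:
    """Apply the three criteria: known position, <=20% missing, PIC >= 0.2."""
    return mapped and missing_fraction(calls) <= max_missing and pic(calls) >= min_pic


if __name__ == "__main__":
    balanced = [0, 2, 1, 1, None, 0, 2, 1, 0, 2]  # both alleles at frequency 0.5
    rare = [0] * 9 + [1]                          # alternate allele at frequency 0.05
    print(pic(balanced), passes_filter(balanced, mapped=True))
    print(pic(rare), passes_filter(rare, mapped=True))
```

With two alleles the formula reaches its maximum of 0.5 when both allele frequencies equal 0.5, which agrees with the upper end of the PIC range reported for the markers.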

Core Collection Development
The R package Core Hunter v.3.2.1 [38] was used to select a subset of accessions broadly representing the genetic diversity held in the collection. Genotyping data of 1000 informative SNP markers identified during diversity analysis of the collection were used for core collection development. To assess the representation of the core collection, analysis of molecular variance (AMOVA) and principal coordinate analysis visualization were performed using GenAlex 6.5 [32].
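The idea of picking a small, mutually distant subset can be sketched as follows. This is explicitly not Core Hunter's algorithm, which optimizes its own diversity measures with a different search strategy; the greedy farthest-point rule, the Euclidean distance, the function names, and the toy genotypes are all assumptions chosen to keep the example short.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two marker profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_core(genotypes: Dict[str, Sequence[float]], fraction: float = 0.20) -> List[str]:
    """Greedy max-min selection: a simple stand-in, not Core Hunter itself."""
    names = sorted(genotypes)
    target = max(1, round(fraction * len(names)))
    # Seed with the accession that is farthest, in total, from all others.
    seed = max(names, key=lambda n: sum(euclidean(genotypes[n], genotypes[m]) for m in names))
    core = [seed]
    while len(core) < target:
        remaining = [n for n in names if n not in core]
        # Add the accession whose nearest core member is farthest away.
        best = max(remaining,
                   key=lambda n: min(euclidean(genotypes[n], genotypes[c]) for c in core))
        core.append(best)
    return core


if __name__ == "__main__":
    # Hypothetical accessions forming two genetic groups (A and B).
    toy = {
        "A1": (0, 0, 0), "A2": (0, 0, 1), "A3": (0, 1, 0), "A4": (1, 0, 0), "A5": (0, 0, 0),
        "B1": (2, 2, 2), "B2": (2, 2, 1), "B3": (2, 1, 2), "B4": (1, 2, 2), "B5": (2, 2, 2),
    }
    print(greedy_core(toy, fraction=0.20))  # two accessions, one from each group
```

A 20% sampling fraction applied to 205 accessions gives a target of 41 accessions under this rounding rule.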

Informativeness and Diversity of the SNP Markers
A total of 234,581 SNP markers were generated for the 205 buffelgrass accessions. The PIC value of the markers ranged from 0.005 to 0.5 (Figure 2), and 65,361 SNP markers had a PIC value of ≥0.2. The missing data percentage ranged from 1% to 92% per SNP marker and 42% to 81% per accession, with an average of 65.5%. Approximately 1% of the markers (2163) had no missing data, while 4.4% of the markers (10,318) had up to 20% missing data. Figure 3 shows the genome-wide distribution of the SNP markers on the Setaria italica reference genome [24]. Around 12% (28,459) of the markers mapped onto the different chromosomes and scaffolds. The largest number of markers mapped onto chromosome 9 (5677 SNP markers), followed by chromosomes 5 (4274 markers), 2 (3597 markers), and 3 (3526 markers). The lowest numbers of markers mapped onto chromosomes 8 (1173 markers) and 6 (1855 markers). A few markers (94 SNPs) mapped onto different scaffolds. Nearly 88% of the markers (206,122) could not be mapped onto the Setaria italica reference genome.
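The proportions quoted above can be recomputed from the reported counts with a few lines of arithmetic. The counts are taken from the text; the dictionary labels and the helper name are illustrative.

```python
TOTAL_MARKERS = 234_581  # SNP markers generated for the 205 accessions

# Marker counts as reported in the text; labels are illustrative.
counts = {
    "mapped to S. italica": 28_459,
    "not mapped": 206_122,
    "PIC >= 0.2": 65_361,
    "no missing data": 2_163,
    "<= 20% missing data": 10_318,
}


def share(count: int, total: int = TOTAL_MARKERS) -> float:
    """Percentage of all generated markers represented by `count`."""
    return 100.0 * count / total


if __name__ == "__main__":
    for label, count in counts.items():
        print(f"{label:>22}: {count:>7,} ({share(count):.1f}%)")
```

The mapped and unmapped counts add up to the total number of markers, so the two shares are complementary.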

Population Structure and Genetic Diversity of the Buffelgrass Collection
To assess the genetic diversity and population structure of the collection, markers that mapped onto the reference genome and had ≤20% missing data and polymorphic information content (PIC) values of ≥0.2 were selected. From the 1641 SNP markers that passed the selection criteria, the top 1000 markers contributing to the diversity and clustering were selected using the R package adegenet [28] for in-depth analyses of the collection (population structure and diversity). The average PIC value of the selected markers was 0.35. Figure 4 shows the genome-wide distribution of the selected 1000 SNP markers. The hierarchical cluster analysis grouped the collection into eight clusters (Figure 5a) with further subclusters, and the accessions were assigned to the clusters with clear cluster membership (Figure 5b,c). The cluster plot using the first two components, which explained 22.9% of the total variation, was consistent with the hierarchical clustering (Figure 5d). AMOVA was used to estimate the components of total genetic variation (Table 1). The AMOVA result showed that the among- and within-cluster diversity explained 38% and 62% of the total variation, respectively. In addition, population structure was analyzed with the STRUCTURE software [33,34], with the highest delta K (∆K) [37] at K = 2 suggesting the presence of two main groups in the collection (Figure 5e,f). There was a second peak at K = 11, indicating further subgrouping of the collection.

Table 1. Analysis of genetic differentiation among and within clusters of buffelgrass collection by analysis of molecular variance (AMOVA).
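The ∆K statistic used to choose K can be sketched as below. The sketch follows one common way of computing it, using the mean log-likelihood per K across replicate runs and dividing the absolute second difference by the standard deviation at K; whether this matches the exact computation in the software used here is an assumption. The log-likelihood values are hypothetical and are not results from this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(ln_prob: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using per-K means of L."""
    avg = {k: mean(values) for k, values in ln_prob.items()}
    delta: Dict[int, float] = {}
    for k in sorted(ln_prob):
        if k - 1 not in avg or k + 1 not in avg:
            continue  # undefined at the smallest and largest K tested
        spread = stdev(ln_prob[k])
        if spread == 0:
            continue  # undefined when replicate runs are identical
        delta[k] = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1]) / spread
    return delta


if __name__ == "__main__":
    # Hypothetical ln P(D) values, three replicate runs per K (not study data).
    runs = {
        1: [-9000.0, -9002.0, -8998.0],
        2: [-8000.0, -8001.0, -7999.0],
        3: [-7900.0, -7905.0, -7895.0],
        4: [-7850.0, -7860.0, -7840.0],
        5: [-7820.0, -7840.0, -7800.0],
    }
    dk = evanno_delta_k(runs)
    print(dk, "best K:", max(dk, key=dk.get))
```

Because the statistic depends on the likelihoods at K − 1 and K + 1, it is undefined at the two ends of the tested range.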

Core Collection Development
Core collection development was undertaken using the R package Core Hunter [38]. The core collection contained 41 accessions, representing the different clusters of the collection (Figure 6). Table 2 shows the list of accessions constituting the core collection, which originated from 11 different countries: 11 accessions from Tanzania, 4 from Botswana, 6 each from Kenya and Republic of South Africa, 2 from Namibia, 5 from Ethiopia, 2 from Uganda, and 1 accession each from Oman, Somalia, Djibouti, and Niger. One accession of unknown origin (19380) was also included in the core collection. In terms of clusters, the largest number of accessions came from cluster II (14 accessions), followed by clusters I, III, V, and VI (6 accessions each). The fewest accessions came from clusters IV, VII, and VIII (one accession each). The AMOVA result showed that there was no significant difference between the developed core collection and the rest of the germplasm, and that the 'within population' differences between accessions contributed almost all of the total genetic variation, indicating that the developed core collection represented the overall collection well (Table 3).

Table 3. Result of the AMOVA between the core collection and the rest of the population.
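As a consistency check, the composition of the core collection can be tallied from the per-country counts given above. The counts come from the text; the variable names are illustrative.

```python
COLLECTION_SIZE = 205  # accessions in the full collection

# Core accessions per country of origin, as listed in the text.
core_by_country = {
    "Tanzania": 11, "Botswana": 4, "Kenya": 6, "Republic of South Africa": 6,
    "Namibia": 2, "Ethiopia": 5, "Uganda": 2,
    "Oman": 1, "Somalia": 1, "Djibouti": 1, "Niger": 1,
}
UNKNOWN_ORIGIN = 1  # accession 19380

core_size = sum(core_by_country.values()) + UNKNOWN_ORIGIN
core_fraction = core_size / COLLECTION_SIZE
african_countries = [c for c in core_by_country if c != "Oman"]

if __name__ == "__main__":
    print(f"core size: {core_size}")
    print(f"countries: {len(core_by_country)} ({len(african_countries)} African)")
    print(f"sampling fraction: {core_fraction:.0%}")
```

The tally reproduces the 41 accessions, the 11 countries (10 African countries plus Oman), and the 20% sampling fraction stated in the text.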

Population Structure and Genetic Diversity of the Buffelgrass Collection
Understanding the genetic relationships and population structure of a collection is very important for enhancing the conservation and utilization of the genetic resources. In this study, 205 accessions of buffelgrass from the ILRI forage genebank were studied by genotyping-by-sequencing, and a large number of SNP markers were generated from the collection. The short sequences of the markers were mapped onto the reference genome of Setaria italica. However, only a small percentage of the generated markers (12%) could be aligned with the reference genome. A subset of genome-wide representative markers was selected for population structure and diversity analyses and core collection development. Diversity analysis revealed the presence of substantial genetic variation in the collection. In the hierarchical cluster analysis, the collection was grouped into eight clusters. Further subclustering was observed in five of the clusters (I, II, III, V, and VI). Cluster II contained the largest number of accessions (52), with their origins traced to different countries, while cluster VII contained the fewest accessions (6), originating from India, Yemen, and Zambia. Similarly, the population structure analysis indicated the presence of two main clusters in the collection. The first cluster contained 100 accessions that originated from various African countries (97 accessions), Oman (2 accessions), and Yemen (1 accession), as well as five accessions of unknown origin. All accessions from clusters II, IV, VII, and VIII of the hierarchical grouping were contained in this cluster of the STRUCTURE analysis. The second cluster contained 105 accessions originating from Africa (101 accessions) and India (3 accessions), and one accession of unknown origin. All accessions from cluster III and most of the accessions from clusters I, V, and VI of the hierarchical grouping were contained in this second cluster of the STRUCTURE analysis.
This is the first report of diversity information on the collection using advanced molecular marker technologies. The observed clustering showed the presence of a wide range of genetic diversity in the collection. The result from this study is in line with previous reports, which documented the genetic diversity of the collection using agro-morphological variables [15,16]. However, Jorge et al. [15] reported the lack of clear clustering of the collection based on agro-morphological variables, which may be due to the limited polymorphism of agro-morphological variables compared to molecular markers [39,40] and the continuous nature of the variables [14]. The presence of wide genetic diversity using AFLP markers was also reported in pentaploid buffelgrass germplasm held in the USDA National Germplasm System [21] and among germplasm collected from different provinces of Tunisia [22]. In addition, the phenotypic polymorphism in buffelgrass germplasm from South Africa [17], Pakistan [18], and Tunisia [19] also revealed a wide genetic basis of the buffelgrass genetic resources.

GBS Data Revealed a Lack of Genetic Differentiation among Germplasm from Diverse Origins
The studied collection contains 199 accessions of known origin from 19 countries (16 countries in Africa, India, Oman, and Yemen) and 6 accessions of unknown origin. However, the observed clustering and population stratification did not follow the geographical origins of the genetic resources (Figure 7). Genotypes from the same country of origin were scattered among the different clusters. For example, germplasm from Tanzania was found in all eight clusters, while germplasm from Botswana, Ethiopia, India, Kenya, Namibia, Somalia, Uganda, South Africa, and Zimbabwe was distributed in at least three of the clusters (Table S2). This is supported by the weak Mantel correlation coefficient (r = 0.206, p-value = 0.0001) between the genotypic distance and geographic distance (Figure S1). A similar result was reported in buffelgrass using AFLP markers [21,22]. A lack of direct correlation between genotypic clustering and spatial distribution was also reported for pasture and roadside populations of buffelgrass in Mexico [41]. This could be explained by the historical movement of genetic resources across countries. According to Marshall and colleagues [3], there has been extensive intercontinental dispersal of buffelgrass for the pastoral industries since the early 1900s. This could be one of the reasons for the lack of correlation between genetic differentiation and geographical background.
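A Mantel test of the kind reported above correlates the off-diagonal entries of two distance matrices and assesses significance by permuting the rows and columns of one of them. The sketch below is a minimal, one-sided version with assumed function names, a fixed random seed, and toy distances; it is not the software or the settings used in the study.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    """Off-diagonal upper-triangle entries after reordering rows and columns."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic: Matrix, geographic: Matrix,
           permutations: int = 999, seed: int = 1) -> Tuple[float, float]:
    """Return (r, one-sided permutation p-value) for two distance matrices."""
    identity = list(range(len(genetic)))
    geo_vec = upper_triangle(geographic, identity)
    observed = pearson(upper_triangle(genetic, identity), geo_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel accessions in the genetic matrix only
        if pearson(upper_triangle(genetic, order), geo_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Toy example: six sites on a line, genetic distance equal to geographic distance.
    pos = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    dist = [[abs(a - b) for b in pos] for a in pos]
    print(mantel(dist, dist))
```

With this p-value convention the smallest attainable value is 1 / (permutations + 1), so a reported p-value of 0.0001 would correspond to 9999 permutations under the same convention.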

The GBS Data could be Used for Gap Filling in the Collection
A large number of buffelgrass genetic resources are held by different centres (Table S3). Of these centres, ILRI's forage genebank maintains geographically diverse germplasm collected from different countries in Eastern, Western, and Southern Africa and Asia. Despite the observed genetic diversity, the collection lacks germplasm from some of the regions of origin (Northern Africa, Asia, and Europe) and/or from regions where the species has been naturalized over time (South and North American countries and Australia) [3,4]. Those geographical areas may contain a different set of polymorphisms to add to the existing collection. This is supported by recent reports using geographic information, cytological techniques, and molecular markers. Based on the analysis of geographic information, it was indicated that germplasm from dry environments was underrepresented in the ILRI collection [15]. Buffelgrass is a polymorphic grass with different ploidy levels (tetraploid, pentaploid, hexaploid, and aneuploid), with tetraploids being the most common, followed by pentaploids [11]. However, within the germplasm collected from different provinces of Tunisia, hexaploids were the most frequent under warmer climatic conditions, while tetraploids were dominant in humid areas [2]. Septaploids were also reported in germplasm from Australia and South Africa [11]. Though there is no information on the ploidy of accessions in the ILRI collection, germplasm from countries like Tunisia, where hexaploids are common, is not represented in the collection. In line with these reports, there are gaps that could be filled by germplasm exchange and/or collection of new germplasm from underrepresented areas. This would enhance the chance of capturing unique materials that will widen the diversity in the collection. Molecular toolboxes, such as high-throughput SNP genotyping, could help in the identification of gaps and in making decisions about what to add to the existing collection.
ILRI has experience of marker-assisted identification of gaps and germplasm acquisition, as demonstrated in the Napier grass collection, which was used to enhance the diversity of the existing collection [42]. A similar approach could be used to fill some of the identified gaps in the buffelgrass collection.
Another challenge in genebank management is the potential of holding duplicate genotypes and/or closely related genotypes, and the associated cost incurred to regenerate and curate them. A recent report of a substantial number of duplicates, both within and across genebank collections of Aegilops tauschii [43], provides good evidence of the issues associated with the presence of duplicates in genebank collections. The molecular tools described here could also be used to identify and eradicate duplicate materials held in genebank collections.

Core Collection Establishment
One of the goals of genotyping and genomic studies is to enhance the use and conservation of germplasm in genebanks. To this end, the current GBS data were used to develop a core collection (subset), a proportion of the collection that contains most of the genetic diversity and richness of the collection [44,45], that can be used as an entry point to the full range of germplasm available in the collection. Accordingly, a core collection containing 20% of accessions in the entire collection was developed using the R package Core Hunter [38]. The core collection contained germplasm from 10 African countries and Oman. Despite the lack of relationship between the hierarchical clustering and geographical origins of the germplasm, the developed core collection is a good representation of geographic diversity in the collection. The different clusters from the structure analysis were also represented in the core collection. The AMOVA result also confirmed the representativeness of the core collection, with no significant difference between the core collection and the rest of the population in terms of genetic diversity. The developed core collection and/or subset could be used to enhance the germplasm utilization and for multilocational evaluation for some traits of interest.

Conclusions
A collection of buffelgrass, a high-value forage grass, was assessed in this study using the GBS approach. A large number of SNP markers were generated for the genetic analysis of buffelgrass and its related species. Population structure and genetic diversity analyses using 1000 informative SNP markers revealed the presence of a substantial amount of genetic diversity in the collection. A core collection containing 20% of the collection, with germplasm from diverse geographical backgrounds, was identified, which could enhance the use of the buffelgrass genetic resources held in the ILRI genebank. In general, the generated information, together with the developed core collection, could be used to select genotypes with diverse genetic makeup for further evaluation in multilocation trials, as well as for further genomic studies, such as those that aim to develop our understanding of the molecular basis of drought tolerance in buffelgrass.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Table S1: Passport data of buffelgrass collection, Table S2: Number of accessions in different clusters according to their country of origin, Table S3: Buffelgrass collections and number of accessions held by centres across the world, Figure S1: Mantel correlation analysis of genetic and geographical distance of the buffelgrass accessions.
Funding: This research was funded by the Genebank platform "use module" and Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ), "attributed funding".