Analysis of Genetic Diversity and Population Structure of Pigeonpea [Cajanus cajan (L.) Millsp] Accessions Using SSR Markers

Understanding the genetic diversity present among crop genotypes is essential for the efficient utilization of germplasm in genetic improvement. The present study aimed to evaluate the genetic diversity and population structure of 48 pigeonpea genotypes from four populations collected from diverse sources. The 48 pigeonpea entries were genotyped using 33 polymorphic simple sequence repeat (SSR) markers to assess molecular genetic diversity and genetic relatedness. The informative marker combinations revealed a total of 155 alleles at 33 loci, with an average of 4.78 alleles detected per marker and a mean polymorphic information content (PIC) value of 0.46. Model-based population structure analysis grouped the germplasm into two subpopulations. The analysis of molecular variance (AMOVA) revealed that 53.3% of the genetic variation existed within individuals. Relatively low population differentiation was recorded among the test populations, as indicated by the mean fixation index (Fst) value of 0.032. Cluster analysis grouped the Tanzanian pigeonpea germplasm collection into three major clusters. The clustering pattern revealed a lack of relationship between geographic origin and genetic diversity. This study provides a foundation for the selection of parental material for genetic improvement.


Introduction
Pigeonpea (Cajanus cajan [L.] Millsp) is an important legume crop widely cultivated in diverse environments in Africa and Asia [1]. Pigeonpea belongs to the family Leguminosae and sub-family Papilionoideae. Cajanus cajan is the only domesticated species in the subtribe Cajaninae. The crop can produce an economic yield under low-moisture conditions, which makes it important in dry areas [2]. Pigeonpea is under-researched compared to other common legumes. In Tanzania, the crop is grown in several regions as a food and cash crop [3,4]. Tanzania is one of the top six global exporters of pigeonpea to the Asian market.
Pigeonpea is considered to be a valuable food and commercial crop with diverse uses. The crop improves soil physical properties and the yield of associated intercrops while simultaneously yielding marketable grain. Its thick stems are mainly used as fuel wood and roofing material [5,6]. Moreover, it provides an important source of protein for low-resource farmers who cannot afford animal protein [7]. In Australia, pigeonpea is used to reduce population of the cotton bollworm

DNA Extraction
Fresh leaf material from 10- to 14-day-old plants of each of the 48 pigeonpea entries was ground to a fine powder in liquid nitrogen, and genomic DNA was extracted following the cetyl trimethyl ammonium bromide (CTAB) method described by Mace et al. [21], with some modification. For all samples, DNA quality was determined by electrophoresis on a 0.8% (w/v) agarose gel stained with 5 µL/100 mL Gel ® (Biotium Inc., Hayward, CA, USA), while DNA quantity was determined by spectrophotometry (Nanodrop© 100, Thermo Scientific, Wilmington, DE, USA). The samples were then diluted to a final concentration of 30 ng µL⁻¹. The DNA samples were analysed at the ICRISAT laboratory, Centre of Excellence in Genomics, India.
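The dilution to a 30 ng µL⁻¹ working concentration follows the usual C1·V1 = C2·V2 relationship. The sketch below is only an illustration: the stock concentration (150 ng µL⁻¹) and the final volume (100 µL) are hypothetical values that do not come from the study, and the function name is invented for this example.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (stock volume, diluent volume) in µL using C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


# Hypothetical sample: 150 ng/µL stock, diluted to 30 ng/µL in 100 µL.
stock_ul, diluent_ul = dilution_volumes(150.0, 30.0, 100.0)
print(f"stock: {stock_ul:.1f} µL, diluent: {diluent_ul:.1f} µL")
```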

Polymerase Chain Reaction (PCR) Amplification
Of the 35 SSR markers tested, 33 were polymorphic and were used in this study (Table 2). The markers were selected based on sequence information and have been used in different genomic studies of pigeonpea [22]. The PCR amplification was optimized and conducted in a 12.5 µL reaction volume containing 1× PCR buffer; 1 unit of Taq DNA polymerase; 0.2 mM each of dATP, dGTP and dTTP; 3 mM MgCl2; 0.1 µM each of the respective forward and reverse primers; and 40 ng of genomic DNA. The PCR amplification was carried out in a Bioer XP Thermal Cycler (Hangzhou Bioer Technologies, Hangzhou, China). The thermal cycling conditions were as follows: initial denaturation at 94 °C for 5 min, followed by 35 cycles of denaturation (94 °C) for 1 min, annealing (56-72 °C) for 1 min and primer extension (72 °C) for 1 min, and a final extension at 72 °C for 20 min. The amplification products were analysed by electrophoresis on a 2.8% agarose gel, stained with ethidium bromide and photographed under short-wavelength ultraviolet (UV) light in a gel documentation system. A 100 bp DNA ladder (MBI Fermentas, Baden Wurttemberg, Germany) was used as a fragment size standard.

Data Analysis
The allele size was estimated visually for all 48 pigeonpea genotypes. The band size of the amplified products was determined by comparison with a 100 bp DNA ladder (MBI Fermentas, Baden Wurttemberg, Germany). The SSR bands scored in the pigeonpea genotypes were analysed in GenAlex version 6.5 [23] to assess genetic diversity. Statistical parameters, namely the total number of alleles per locus (Na), number of effective alleles per locus (Ne), Shannon's information index (I), observed heterozygosity (Ho), gene diversity (He), unbiased expected heterozygosity (uHe) and fixation index (F), were determined using the protocol of Nei and Li [24]. The polymorphic information content (PIC) value was calculated for each SSR locus as PIC = 1 − Σ pᵢ², where pᵢ is the frequency of the ith allele. An analysis of molecular variance (AMOVA) was performed to test the degree of differentiation among and within the sources of collection of the pigeonpea genotypes.
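As an illustration of how the per-locus parameters listed above relate to each other, the following minimal sketch computes them from diploid genotype calls at a single locus. It assumes the standard definitions as implemented in GenAlEx (Ne = 1/Σpᵢ², I = −Σpᵢ ln pᵢ, He = 1 − Σpᵢ², uHe = 2N/(2N − 1)·He, F = 1 − Ho/He) together with the PIC formula given in the text. The function name and the four example genotypes (allele sizes in bp) are hypothetical and are not data from this study.

```python
from collections import Counter
from math import log


def locus_stats(genotypes):
    """Diversity statistics for one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_sq = sum(p * p for p in freqs)

    ho = sum(1 for a, b in genotypes if a != b) / n   # observed heterozygosity
    he = 1 - sum_sq                                   # gene diversity
    return {
        "Na": len(counts),                            # number of alleles
        "Ne": 1 / sum_sq,                             # effective alleles
        "I": -sum(p * log(p) for p in freqs),         # Shannon's index
        "Ho": ho,
        "He": he,
        "uHe": (2 * n / (2 * n - 1)) * he,            # unbiased He
        "F": 1 - ho / he if he > 0 else float("nan"), # fixation index
        "PIC": 1 - sum_sq,                            # PIC as defined in the text
    }


# Hypothetical genotype calls for four individuals at one SSR locus.
example = [(150, 150), (150, 154), (154, 154), (150, 150)]
for name, value in locus_stats(example).items():
    print(f"{name}: {value:.3f}")
```

Under the formula given in the text, PIC is numerically identical to He, so the two columns should agree for every locus when both are computed from the same allele frequencies.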
The population structure of the 48 pigeonpea genotypes was established using the Bayesian clustering method in STRUCTURE version 2.3.4 [25]. The burn-in period and the number of Markov Chain Monte Carlo (MCMC) replications were each set at 10,000 iterations [26]. To obtain an accurate estimate of the number of populations, 20 runs were performed for each K-value (assumed number of subpopulations), ranging from 1 to 10. Further, Delta K (∆K) values were calculated and the appropriate K value was estimated by implementing the method of Evanno et al. [26] using the STRUCTURE Harvester program [27]. Relatedness was estimated from the genetic dissimilarity coefficients, and the dendrogram was drawn using the unweighted pair group method with arithmetic mean (UPGMA) in DARwin 6.0 [28].
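The ∆K statistic of Evanno et al. can be sketched as follows. This is a simplified illustration, not the STRUCTURE Harvester code: it assumes ∆K(K) = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the mean ln P(D|K) over replicate runs and sd is the standard deviation across those runs. The log-likelihood values below are invented for the example and are not results from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of ln P(D|K) values from replicate runs.
    """
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # delta K is undefined at the ends of the tested range
        second_diff = abs(mean(lnp_by_k[k + 1])
                          - 2 * mean(lnp_by_k[k])
                          + mean(lnp_by_k[k - 1]))
        spread = stdev(lnp_by_k[k])
        if spread > 0:
            delta[k] = second_diff / spread
    return delta


# Hypothetical replicate log-likelihoods (three runs per K for brevity).
runs = {
    1: [-3000, -3002, -2998],
    2: [-2800, -2801, -2799],
    3: [-2780, -2790, -2770],
    4: [-2775, -2795, -2755],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, "best K:", best_k)
```

Because ∆K needs both neighbouring K values, it cannot be evaluated at K = 1 or at the largest K tested.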

Genetic Polymorphism of Simple Sequence Repeat (SSR) Markers
The sizes of amplified polymorphic DNA fragments (bands) ranged from 127 to 280 bp (Table 2). Genetic diversity parameters, such as number of alleles per locus (Na), number of effective alleles per locus (Ne), Shannon's information index (I), observed (Ho) and expected (He) heterozygosity, unbiased expected heterozygosity (uHe), fixation index (F) and polymorphic information content (PIC), are presented in Table 3. A total of 158 alleles were amplified among the 48 pigeonpea genotypes, and the number of alleles scored for the 33 loci ranged from 2 to 11 with an average of 4.78. The lowest numbers of alleles per locus were detected for the markers CcM0484, CcM0594, CcM0673, CcM0785, CcM1045, CcM1251, CcM2379 and CcM2505. The maximum number of alleles (11) was detected at the CcM0246 and CcM0443 loci. The markers CcM2379 and CcM0246 had the lowest and highest numbers of effective alleles, 1.02 and 6.14, respectively. The PIC value of the SSR markers, which is a measure of allele diversity at a locus, ranged from 0.04 to 0.84 with an average of 0.44. The observed variation in PIC values in this study could be attributed to genotypic differences in the pigeonpea material used. Eight SSR loci (CcM0246, CcM0381, CcM0443, CcM0492, CcM0721, CcM0974, CcM2704 and CcM2895) exhibited PIC values higher than 0.70, indicating their usefulness in discriminating genotypes. The observed heterozygosity values ranged from 0.0 to 0.69 with an average of 0.26. The gene diversity ranged from 0.04 to 0.85, with a mean of 0.45 (Table 3). Markers CcM2379 and CcM0246 had the lowest and highest genetic diversities, respectively. The inbreeding coefficient (FIS) ranged from 0.03 to 0.93 with a mean of 0.39. Two loci (CcM0698 and CcM2379) showed an FIS value of 1, suggesting that the alleles at these loci are fixed, i.e., have reached 100% homozygosity; these loci were therefore excluded from the analysis. However, six loci (CcM0484, CcM0444, CcM2004, CcM2044, CcM2505 and CcM2049) showed very low FIS values, indicating that genotypes at these loci are predominantly heterozygous.

Genetic Relationship among 48 Pigeonpea Genotypes Based on Source of Collection
The genetic parameters of the populations studied, based on source of collection, are presented in Table 4. Among the four populations investigated, the mean values of observed alleles (Na) and effective alleles (Ne) were 3.52 and 2.30, respectively. Popn 1 (Improved genotypes-ICRISAT) recorded the lowest values of Na

Population Structure of 48 Pigeonpea Accessions
Based on the ∆K value, the population structure analysis of the 48 pigeonpea genotypes grouped the population into two subpopulations, with the maximum of the ad hoc measure ∆K found at K = 2 (Figure 1). Membership of each genotype in a particular sub-cluster was based on at least 70% ancestry. Cluster 2 had the largest membership with 78% of the population, while the smallest was Cluster 1 with only 22% (Table 5). Sub-population 1 comprised genotypes from landraces (54.5%) and improved cultivars (45.5%), mainly of long maturity duration, and sub-population 2 consisted of genotypes from landraces (64.9%) and improved cultivars (35.1%), mainly of medium maturity duration.
Figure 1. Graph of estimated membership fraction for K = 2. The maximum of the ad hoc measure ∆K determined by STRUCTURE Harvester was found at K = 2, which indicated that the four populations can be grouped into two subgroups.
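The 70% ancestry rule used for assigning genotypes to sub-clusters can be expressed as a simple threshold on the membership coefficients (Q values) reported by STRUCTURE. The sketch below is a hypothetical illustration: the genotype labels and Q values are made up, and treating genotypes below the threshold as admixed is an assumption about how such cases would be handled.

```python
def assign_subpopulation(q_values, threshold=0.70):
    """Return the 1-based subpopulation index whose ancestry fraction
    reaches the threshold, or None if the genotype is treated as admixed."""
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return best + 1 if q_values[best] >= threshold else None


# Hypothetical membership coefficients for K = 2.
q_matrix = {
    "G1": [0.92, 0.08],
    "G2": [0.15, 0.85],
    "G3": [0.55, 0.45],  # below 70% in both groups
}
assignments = {name: assign_subpopulation(q) for name, q in q_matrix.items()}
print(assignments)
```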

Genetic Cluster Analysis of 48 Pigeonpea Accessions
The UPGMA cluster analysis based on genetic dissimilarity using the neighbour-joining method in DARwin 5.0 grouped the 48 pigeonpea genotypes into three genetic clusters (Figure 2). The three genetic clusters identified were not consistent with the two subpopulations obtained from the structure analysis. Cluster I was the only cluster made up exclusively of accessions from a single source; it contained only 1 accession (a landrace from NPGRC). Cluster II contained 17 accessions (7 landraces from NPGRC, 3 landraces from farmers, 4 released cultivars from TARI and 3 breeding lines from ICRISAT). Cluster III contained 30 accessions (9 breeding lines from ICRISAT, 12 landraces from NPGRC, 3 released cultivars from TARI and 6 landraces from farmers). The genotypes with the most distinct genetic make-up were ICEAP 00040, ICEAP 00932, Bangili, Babati White, ICEAP 00557, ICEAP 00554, TZA 253, TZA 5596 and TZA 2466; these could be considered for future breeding.
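For readers unfamiliar with UPGMA, the following minimal sketch shows the agglomeration rule on a small dissimilarity matrix: the two closest clusters are merged, and the distance from the new cluster to every other cluster is the size-weighted average of the old distances. It is a plain-Python illustration, not the DARwin implementation, and the four genotype labels and the dissimilarity values are hypothetical.

```python
def upgma(labels, dist):
    """Return the list of merges as (members_a, members_b, merge_distance).

    labels: list of names; dist: symmetric matrix (list of lists) of
    pairwise dissimilarities in the same order as labels.
    """
    n = len(labels)
    clusters = {i: [label] for i, label in enumerate(labels)}
    d = {frozenset((i, j)): dist[i][j]
         for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        pair = min(d, key=d.get)            # closest pair of clusters
        a, b = sorted(pair)
        distance = d.pop(pair)
        members_a, members_b = clusters.pop(a), clusters.pop(b)
        na, nb = len(members_a), len(members_b)
        for c in clusters:                  # update distances to the new cluster
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (na * dac + nb * dbc) / (na + nb)
        clusters[next_id] = members_a + members_b
        merges.append((members_a, members_b, distance))
        next_id += 1
    return merges


# Hypothetical dissimilarities among four genotypes.
names = ["G1", "G2", "G3", "G4"]
matrix = [
    [0.0, 0.2, 0.6, 0.8],
    [0.2, 0.0, 0.6, 0.8],
    [0.6, 0.6, 0.0, 0.4],
    [0.8, 0.8, 0.4, 0.0],
]
for left, right, distance in upgma(names, matrix):
    print(left, "+", right, f"at {distance:.2f}")
```

In this toy example G1 and G2 join first, then G3 and G4, and the two pairs merge last; cutting the resulting tree at a chosen distance gives the cluster membership.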

Analysis of Molecular Variance (AMOVA)
AMOVA was performed on the model-based populations (Table 6). The results of AMOVA revealed that the majority of variance occurred within individuals and accounted for 53.3% of the total variation, whereas 3.2% and 43.5% of the variation were attributed to differences among populations and among individuals, respectively. Calculation of Wright's [29] F statistics at all SSR loci revealed that FIS was 0.449 and FIT was 0.466. The mean fixation index (FST) for the polymorphic loci across all accessions was 0.0318, which implies low genetic differentiation across genetic subgroups. The haploid Nm was very high (7.6), indicating high gene exchange among populations (Table 6). These results demonstrated that genetic differentiation among subpopulations was low, while variation within subpopulations was high.
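The reported F statistics and Nm can be cross-checked against the AMOVA variance percentages. The sketch below assumes the usual AMOVA-based definitions as used in GenAlEx: FST is the among-population share of total variance, FIS is the among-individual share of the within-population variance, FIT is the combined among-population and among-individual share, and Nm = (1/FST − 1)/4. The function name is illustrative, and small differences from the reported values are expected because the inputs are rounded percentages.

```python
def f_statistics(among_pops, among_indiv, within_indiv):
    """Derive FST, FIS, FIT and Nm from AMOVA variance components
    (absolute components or percentages)."""
    total = among_pops + among_indiv + within_indiv
    fst = among_pops / total
    fis = among_indiv / (among_indiv + within_indiv)
    fit = (among_pops + among_indiv) / total
    nm = (1 / fst - 1) / 4
    return {"FST": fst, "FIS": fis, "FIT": fit, "Nm": nm}


# Percentages of variation reported in the text.
stats = f_statistics(among_pops=3.2, among_indiv=43.5, within_indiv=53.3)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")

# Wright's identity: (1 - FIT) = (1 - FIS) * (1 - FST)
lhs = 1 - stats["FIT"]
rhs = (1 - stats["FIS"]) * (1 - stats["FST"])
print(f"identity check: {lhs:.4f} vs {rhs:.4f}")
```

With these inputs, FST comes out at 0.032, FIS at about 0.449, FIT at about 0.467 and Nm at about 7.6, in line with the values reported above.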

Discussion
The average PIC value (0.46) recorded in the current study was similar to the PIC value (0.49) reported by Sousa et al. [18], lower than the mean PIC (0.57) reported by Zavinon et al. [19], but higher than the average PIC value of 0.37 reported by Bohra et al. [30]. Similarly, Singh et al. [31] observed a PIC value of 0.515. In addition, Singh et al. [32], using SSR markers to assess the diversity of 40 genotypes including four wild relatives, observed a high mean PIC value of 0.523. However, the PIC value obtained in this study is greater than that reported by Njung'e et al. [17] for pigeonpea.
A low PIC value of 0.18 and a high PIC value of 0.90 were reported by Walunjkar et al. [33] using SSR and random amplified polymorphic DNA (RAPD) markers, respectively. Rani et al. [34], using 10 RAPD and 10 ISSR markers for 42 genotypes, reported high PIC values of 0.73 and 0.77, respectively. Using genotyping-by-sequencing (GBS) and single nucleotide polymorphisms (SNPs), a lower PIC value of 0.25 was reported [19]. The low levels of polymorphism reported in this study are in agreement with the findings of several researchers [35][36][37], who reported low polymorphism in cultivated pigeonpea genotypes. Nevertheless, the SSR markers used in this study confirmed the existence of genetic variation in the pigeonpea germplasm.
Genetic improvement of crops depends on the amount of genetic variation among the breeding material. Previous studies of genetic diversity in Brazilian pigeonpea genotypes by Sousa et al. [18] and in Malawian germplasm by Njung'e et al. [17], both using SSR markers, reported higher average numbers of alleles of 5.1 and 5.58, respectively. An earlier study of Kenyan and Indian pigeonpea accessions by Songok et al. [38], using 88 genotypes and six SSR markers, also reported a high average number of alleles per locus. This suggests that the genotypes studied in Kenya, Malawi and Brazil have a higher diversity than those from Tanzania. The low diversity of pigeonpea in Tanzania revealed by the SSR markers may be attributed to the low number of genotypes used in this study and to the lack of wild relatives conserved in the gene bank. The low average number of alleles observed could also be attributed to the fact that the genotypes were collected from a relatively narrow geographical area.
The average gene diversity (He), a measure of genetic diversity, was low in the present study compared to previous work such as [38], which reported higher genetic diversity in Indian genotypes than in East African germplasm. Wasike et al. [39] investigated Asian and African accessions using AFLP markers and found that Indian accessions are more diverse than African accessions. These findings and several other reports suggest that India is the centre of origin and domestication of pigeonpea and that East Africa is the secondary centre of genetic diversity. In the present study, the observed heterozygosity of the genotypes was low. Comparable results were obtained in a microsatellite-based study of 77 accessions from Brazil using 43 SSR markers [18]. The low level of observed heterozygosity is most likely attributable to farmers' selection pressure, which might have reduced polymorphism in the populations.
The population structure analysis based on STRUCTURE revealed the presence of two subpopulations among the 48 pigeonpea accessions collected from a subset of Tanzanian and Kenyan germplasm. This result supports the earlier findings of Bohra et al. [30], who reported two subpopulations among 94 tested pigeonpea genotypes. By contrast, Sousa et al. [18] reported four subpopulations in South American pigeonpea accessions.
In the present study, the variance components showed that there is more genetic variation among individuals within populations (43.5%) than among populations (3.2%). Variation within individuals was as high as 53.3% of the total. The current study is in agreement with the findings of previous publications on pigeonpea [40,41], which reported a higher percentage of variation within groups than among populations and among individuals. Similarly, Bohra et al. [30] reported higher genetic diversity within the population (89%) than among individuals (11%). The value of Fst was 0.032, indicating little differentiation among populations. The fixation index (Fst) obtained in the current study was lower than those of Kassa et al. [42] and Kumar et al. [40], who reported 0.949 and 0.17, respectively.
The dendrogram based on SSR markers revealed three major clusters. This indicates the existence of genetic diversity in the germplasm evaluated in this study. Therefore, this germplasm would serve as a valuable source of diverse parents for plant breeders seeking to improve the existing commercial cultivars. In this study, distinct clustering was not observed on the basis of geographical origin or maturity duration. A similar pattern was reported by Petchiammal et al. [43], who found that 21 pigeonpea genotypes, including 3 wild relatives, with the same morphological features were grouped into different clusters. By contrast, Singh et al. [44] and Songok et al. [38] reported grouping of pigeonpea genotypes on a geographical basis. Also, Bohra et al. [30] reported that genotypes of long maturity duration grouped within the same cluster, similar to previous findings [31,45].

Conclusions
This study demonstrates the presence of a considerable level of genetic diversity among pigeonpea genotypes in Tanzania. This will serve as basic information for breeders, giving them options to develop, through selection and breeding, new and more productive varieties that are adapted to changing environments. This germplasm could also be used for mapping population studies and for producing a core collection, and could facilitate the identification of useful traits. Additionally, enlarging the sampling area would make it possible to search for additional rare alleles.