Genetic Diversity and Population Structure of a Large USDA Sesame Collection

Seay, Damien; Szczepanek, Aaron; De La Fuente, Gerald N.; Votava, Eric; Abdel-Haleem, Hussein

doi:10.3390/plants13131765


by Damien Seay ¹,†, Aaron Szczepanek ¹,†, Gerald N. De La Fuente ², Eric Votava ² and Hussein Abdel-Haleem ¹,*

¹ US Arid Land Agricultural Research Center, USDA ARS, Maricopa, AZ 85138, USA

² Sesaco Corporation, 5405 Bandera Rd. San Antonio, TX 78238, USA

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Plants 2024, 13(13), 1765; https://doi.org/10.3390/plants13131765

Submission received: 21 May 2024 / Revised: 11 June 2024 / Accepted: 24 June 2024 / Published: 26 June 2024

(This article belongs to the Section Plant Genetic Resources)


Abstract

Sesame (Sesamum indicum L.) is one of the oldest domesticated crops and is used for its oil and protein in many parts of the world. To build genomic resources that could be used to improve sesame productivity and stress responses, this study used a USDA sesame germplasm collection of 501 accessions originating from 36 countries. The panel was genotyped using genotyping-by-sequencing (GBS) technology to explore its genetic diversity, its population structure, and the relatedness among its accessions. A total of 24,735 high-quality single-nucleotide polymorphism (SNP) markers were identified across the 13 chromosomes, an average of approximately 1900 SNPs per chromosome, with an average polymorphism information content (PIC) value of 0.267. The marker polymorphism and heterozygosity estimators indicated that the identified SNPs will be useful in future genetic studies and breeding activities. The population structure, principal component analysis (PCA), and unrooted neighbor-joining phylogenetic tree analyses identified two distinct subpopulations, indicating wide genetic diversity within the USDA sesame collection. Analysis of molecular variance (AMOVA) revealed that 29.5% of the variation in this population was due to differences between the subpopulations, while 57.5% was due to variation among the accessions within the subpopulations. These results showed the degree of differentiation between the two subpopulations as well as within each subpopulation. The high fixation index (F_ST) between the subpopulations indicates wide genetic diversity and high genetic differentiation among and within the identified subpopulations. The linkage disequilibrium (LD) decay averaged 161 Kbp for the whole sesame genome and ranged from 123 Kbp on chromosome LG05 to 168 Kbp on chromosome LG09. These findings could explain the complications of linkage drag among traits during selection. The selected accessions and genotyped SNPs provide tools to enhance genetic gain in sesame breeding programs through molecular approaches.

Keywords:

Sesamum indicum L.; genotyping-by-sequencing (GBS); next-generation sequencing; single-nucleotide polymorphism (SNP)

1. Introduction

Sesame, Sesamum indicum L., is a 5000-year-old domesticated crop [1]. In many countries, sesame is one of the main industrial crops used for its seed oil and protein. Sesame seed’s oil content reaches 50% of the seed weight, which is higher than that of cotton (15%), soybean (20%), canola (40%), and sunflower (40%). Furthermore, oleic (40%) and linoleic (40%) acids are the prominent fatty acids in sesame oil, making it an ideal healthy fat source in nutritional applications [2]. That said, while oils account for the majority of the seed weight, it is important to note that sesame seed proteins also contribute to 28% of the seed weight [3].

The first step in the genetic enhancement of a crop is to understand its genetic diversity. Advancements in next-generation sequencing technologies have paved the way for a deeper understanding of the genetic relationships within crops. These technologies expedite the discovery of genes and alleles controlling traits of interest via genome-wide association studies (GWAS), genetic mapping, and omics approaches such as transcriptomics, metabolomics, and lipidomics, and they support the selection of parental accessions for use in conventional plant breeding. As one of the important oil crops, sesame's germplasm and accessions have been characterized both phenotypically [4,5,6,7,8,9] and genetically [10,11,12], indicating the possibility of improving sesame for high productivity and biotic and abiotic stress tolerance [5,13,14]. Sesame's genetic diversity has been assayed using different types of biochemical [15] and molecular markers, including random amplified polymorphic DNA (RAPD) [16,17,18], simple sequence repeats (SSR) [4,5,19,20], inter-simple sequence repeats (ISSR) [11,21], amplified fragment length polymorphism (AFLP) [7,22], and sequence-related amplified polymorphism (SRAP) [23]. Next-generation sequencing (NGS) technologies have reduced the time, effort, and cost required to genotype thousands of genotypes, which has made single-nucleotide polymorphism (SNP) markers, biallelic markers that are uniformly distributed throughout genomes, the most common marker type for genome-wide studies [24]. SNP markers have been used to explore the genetic diversity of Mediterranean [12] and Ethiopian [25] sesame core collections. Genotyping-by-sequencing (GBS) technology [26] is one of the NGS high-throughput genotyping technologies and has been used for SNP discovery and genotyping in several crops, such as Brassica rapa L. [27], Brassica carinata A. Braun [28], Brassica juncea L. [29], Camelina sativa [30], Olea europaea [31], Glycine max [32], Citrullus lanatus [33], and Triticum aestivum [34].

Our current study used GBS technology to genotype the USDA sesame collection and characterize its genetic diversity and population structure. Specifically, the objectives of our study were to (1) discover SNP markers in the USDA sesame diversity panel; (2) characterize the genetic diversity and population structure of the panel; and (3) characterize the genetic differentiation between and within its subpopulations.

2. Results and Discussion

2.1. Single-Nucleotide Polymorphism Markers Coverage and Polymorphism Analyses

Besides playing a crucial role in the evolution of plant populations, the degree of genetic diversity has an important role in improving economic crops through the integration of new traits/alleles/genes into the genetic pool. It is a fundamental step in studying the genetic relationships among and within crop populations to improve their productivity and responses to environments. To reach this goal, phenotypic and genetic diversity studies were conducted to explore the variations in sesame populations [4,5,6,10,12,13,14,15,16,17,18,19,20,21,22,23,25,35,36,37,38]. In this study, we employed NGS-GBS technology to gain a deep understanding of the USDA sesame collection and study the genetic relationships among its accessions. The advantages of using GBS technology to discover SNP markers include genome-wide coverage, a lower error rate, and increased cost effectiveness [39,40]. The USDA sesame collection consists of 501 accessions collected from 36 countries in Asia (245 accessions), Africa (119 accessions), Europe (77 accessions), North America (42 accessions), and South America (18 accessions) (Figure 1, Supplementary Table S1). Previous genetic and cytological studies suggest that the center of origin for sesame is the African continent and Indian subcontinent [41], with some cultivated sesame growing in Asia being able to hybridize with S. indicum to produce fertile seeds [1]. Refs. [39,40] indicated that the wide distribution of sesame accessions over a wide area of the world could be due to material exchange around the trade routes.

The sequencing of the USDA sesame accessions using an NGS-GBS pipeline resulted in 3,277,732,037 raw reads. After trimming off the barcodes and truncating bad sequences, those reads were filtered down to 2,962,283,105 demultiplexed reads, where 881,504 tags were mapped to the aligned chromosome using the Sesamum indicum genome sequence (genome assembly: ASM2616843v1) [42]. The GBS pipeline identified 24,735 high-quality bi-allelic SNPs after rejecting SNPs with an MAF < 0.05, a missing rate > 20%, and a heterozygous proportion <25%. The 24,735 SNP markers were uniformly distributed across the 13 sesame chromosomes (Figure 2, Table 1).
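The MAF and missing-rate filters described above can be illustrated with a minimal sketch. The study itself applied these filters with TASSEL and vcftools; the NumPy function below, its name, and the 0/1/2 genotype coding with NaN for missing calls are assumptions made only for illustration, and the heterozygosity filter is not reproduced here.

```python
import numpy as np

def filter_snps(genotypes, min_maf=0.05, max_missing=0.20):
    """Return a boolean mask of SNPs (rows) passing the MAF and missing-rate filters.

    genotypes: 2-D array (SNPs x accessions) holding the count of the
    alternate allele (0, 1, 2), with np.nan marking a missing call.
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=1)
    missing_rate = 1.0 - n_called / g.shape[1]
    # Alternate-allele frequency among the called diploid genotypes.
    alt_count = np.nansum(g, axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        alt_freq = np.where(n_called > 0, alt_count / (2.0 * n_called), np.nan)
    # The minor allele is whichever of the two alleles is rarer.
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return (maf >= min_maf) & (missing_rate <= max_missing)

# Toy panel: 4 SNPs x 5 accessions (hypothetical values).
toy = [
    [0, 1, 2, 1, 0],            # MAF 0.4, no missing calls
    [0, 0, 0, 0, 0],            # monomorphic
    [np.nan, np.nan, 1, 2, 0],  # 40% missing calls
    [2, 2, 2, 2, 1],            # MAF 0.1, no missing calls
]
mask = filter_snps(toy)
```

In this toy panel only the first and last SNPs are retained, because the second is monomorphic and the third exceeds the 20% missing-rate limit.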

In general, there was an increase in polymorphism near the telomeres compared to the centromeres (Figure 2). The average number of SNPs per chromosome was 1903, with chromosome LG03 having the largest number of SNPs (3118), while chromosome LG13 had the fewest mapped SNPs (1198). There was an SNP mapped every 12.76 Kbp, and that coverage ranged from 16.28 Kbp on chromosome LG13 to 9.25 Kbp on chromosome LG12 (Table 1).

The average SNP density was 81.73 SNPs per Mbp across the chromosomes (Table 1), ranging from 53 SNPs/Mbp on chromosome LG02 to 108 SNPs/Mbp on chromosome LG12. The different polymorphism rates among the chromosomes could be related to selection pressure during domestication and breeding and/or to differences in recombination and mutation rates [28,43,44,45]. The polymorphism rate is also affected by the type of substitution mutation, which can be classified as a transition or a transversion. Of the substitution SNPs mapped to the sesame genome in the current study, 58.4% were transitions and 41.6% were transversions (Table 2), which is similar to what has been observed in Mediterranean sesame [12], Brassica rapa L. [46,47], B. napus L. [48], Brassica juncea L. [29], and Camelina sativa L. [30]. One possible explanation for the higher frequency of transitions relative to transversions is the greater stability of transition mutations during natural selection and, consequently, during exposure to selective pressures [30,46,49].
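To make the transition/transversion distinction concrete, the following sketch classifies a substitution by whether both alleles belong to the same chemical class (purine to purine or pyrimidine to pyrimidine is a transition; anything else is a transversion). The function names and the example SNPs are hypothetical and are not taken from the study's data.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def substitution_class(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    # A transition stays within one chemical class (A<->G or C<->T).
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

def count_substitutions(snps):
    """Tally transitions and transversions over (ref, alt) pairs."""
    return Counter(substitution_class(ref, alt) for ref, alt in snps)

# Hypothetical example alleles.
counts = count_substitutions([("A", "G"), ("C", "T"), ("A", "C")])
```

With the proportions reported above (58.4% transitions and 41.6% transversions), the transition-to-transversion ratio is about 1.4.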

Marker polymorphism and heterozygosity estimates are used to determine the usefulness of markers in molecular breeding programs. These estimators are also used to understand the causes of diversity changes due to mutations and natural and artificial selection forces [50]. For example, a high expected heterozygosity (He), the proportion of heterozygous genotypes expected under Hardy–Weinberg equilibrium [51,52], indicates greater genetic adaptability and wider population genetic diversity. The current results indicated that the USDA sesame panel has considerable genetic diversity in terms of He, with He values ranging from 0.088 to 0.500 and averaging 0.332. Approximately 1.3% of the genotyped SNPs had a He ≤ 0.100, while 58% had a He ≥ 0.300 (Figure 3A). These values were slightly lower than the He values previously reported in sesame [5,38] using a smaller number of SSR markers. These discrepancies in the He values may be explained by the effects of the population size and diversity as well as the molecular marker types and allele numbers per locus. For example, the He value was 0.283 for rutabaga (B. napus) accessions collected from Nordic countries [53], while it was 0.43 for a more diverse group collected from 19 countries in North Africa, Europe, North America, Asia, and New Zealand [48]. Moreover, the He estimate was 0.559 for 48 African sesame genotypes genotyped with 33 SSR markers and 0.11 for 293 African accessions genotyped with 6473 SNP markers [25].
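As a worked illustration of the expected heterozygosity, the sketch below uses the standard form He = 1 − Σ p_i², where p_i is the frequency of allele i at a locus. The function name and the example frequencies are hypothetical; the sketch shows why a biallelic SNP cannot exceed He = 0.5, which matches the upper bound of the range reported above.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity under Hardy-Weinberg equilibrium: 1 - sum(p_i^2)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

# A biallelic SNP reaches its maximum when both alleles are equally common.
he_balanced = expected_heterozygosity([0.5, 0.5])
# A SNP with a rare minor allele carries much less expected heterozygosity.
he_rare = expected_heterozygosity([0.95, 0.05])
```

For a biallelic marker the expression reduces to 2p(1 − p), so the two calls above give 0.5 and 0.095, respectively.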

The polymorphism information content (PIC), which estimates marker polymorphism and informativeness, is a useful tool in linkage analysis and molecular breeding studies [50,54]. The PIC values depend on the marker type, number of alleles, and position of markers on a given chromosome as well as the origin of the population. For example, the PIC values ranged from 0.18 to 0.81 when Brassica juncea accessions were genotyped using SSR and RAPD markers, respectively [55], and reached 0.95 [56] with different sets of those markers. A similar pattern was observed in Brassica napus when using RAPD and SSR markers [57,58,59,60] compared to SNP markers [45,48,53]. In the current study, using 501 sesame accessions collected from 36 countries and genotyped with 24,735 SNP markers, the average PIC value was 0.267 (Figure 3B). Previously published PIC values for African-origin sesame accessions ranged from 0.12 using SNP markers [25] to 0.51 using SSR markers [38]. Taking into account that SNP markers are biallelic, which reduces the overall PIC values [61], 59.2% and 30.2% of the SNP markers identified in the current study were considered highly and moderately informative, respectively (Figure 3B). The informativeness of the SNP markers identified using NGS-GBS technology in the current study could lay the foundation for in-depth genome-wide association studies (GWAS) as well as genetic mapping studies to discover alleles/genes related to sesame productivity and response to stresses.
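The effect of biallelic markers on the PIC can be seen from the formula of Botstein et al., PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². The following sketch is a direct transcription of that formula; the function name and the example frequencies are hypothetical.

```python
from itertools import combinations

def polymorphism_information_content(allele_freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction for matings that are uninformative despite heterozygosity.
    correction = sum(2.0 * p * p * q * q for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - correction

# A biallelic SNP with equally common alleles gives the largest possible value.
pic_biallelic_max = polymorphism_information_content([0.5, 0.5])
# A four-allele marker (for example an SSR) with equal frequencies scores higher.
pic_four_alleles = polymorphism_information_content([0.25, 0.25, 0.25, 0.25])
```

A biallelic SNP therefore cannot exceed a PIC of 0.375, whereas multi-allelic markers such as SSRs can approach 1, which is one reason SNP-based PIC values tend to be lower than SSR-based ones.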

2.2. Analysis of Population Structure

A population structure analysis was conducted to understand the genetic relationships among the panel of 501 sesame accessions. Accounting for the population structure of sesame helps to reduce false-positive associations when conducting GWAS analyses [62]. Further, the classification of subpopulations within the panel can assist with the identification of diverse parental genotypes to be used in future genetic and breeding activities. The marginal likelihood maximized by the fastStructure software was used to group the USDA sesame panel into subpopulations over the range K = 1–10. The MedMedK, MedMeanK, MaxMedK, and MaxMeanK methods [63] (Figure 4A) of the Structure Selector software [64] indicated that the USDA sesame panel was grouped into two distinct subpopulations (Figure 4B; Supplementary Table S1). The number of sesame subpopulations identified in previous studies varied based on the population size, the origin of the genotypes, and the number and type of molecular markers used. For example, previous research has grouped sesame populations into two [7,10,37,38], three [5,12,19,65], or four subpopulations [4,25]. The current results indicated that sesame subpopulation 1 included 256 accessions that were mainly collected from Asia (133 accessions), of which 78 were collected from eastern Asia, while subpopulation 2 consisted of 245 accessions, with 121 collected from Asia (108 from the Indian subcontinent) and 93 collected from Africa (65 from Sudan) (Supplementary Table S1). It is believed that the origins of cultivated sesame can be traced back to two separate regions, Africa and/or the Indian subcontinent [1,41], from which the worldwide spread of sesame likely occurred via human migration and trade [12]. This could explain why there are no clear distinguishing features when comparing sesame from different countries.

To confirm the results of the fastStructure analysis and explore the genetic relatedness among the USDA sesame collection, principal component analysis (PCA) and neighbor-joining (NJ) tree analyses were conducted using the 501 accessions and 24,735 genotyped SNPs. In agreement with the fastStructure results, the phylogenetic analysis showed two major clades with minor displacements of the mixtures (Figure 4C). The clustering patterns in the PCA clearly distinguished two distinct groups (Figure 5A), echoing the fastStructure results. To gain a more in-depth understanding of the relationships among the sesame accessions, the two clusters were reclassified based on their country/continent of origin (Figure 5B). The PCA with respect to the accessions' origin revealed four subclusters: one derived from the Indian subcontinent, a second from eastern and western Africa, a third from eastern Europe/eastern Asia/North America/South America (Pacific Ocean), and a fourth from southern Europe/western Asia/northern Africa (Mediterranean and Middle East region). Again, these findings support the notion that the modern-day sesame distribution has largely been influenced by historic trade routes and human migration (Figure 5B).
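The way a PCA separates accessions into groups can be sketched with a singular value decomposition of a centered genotype matrix. The study ran its PCA in TASSEL; the NumPy function below is a generic stand-in, and the function name, the 0/1/2 genotype coding, the mean imputation of missing calls, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an accessions x SNPs matrix coded 0/1/2 (np.nan = missing).

    Returns the principal component scores and the fraction of variance
    explained by each retained component.
    """
    g = np.asarray(genotypes, dtype=float)
    # Replace missing calls with the per-SNP mean (a simple assumption).
    col_mean = np.nanmean(g, axis=0)
    g = np.where(np.isnan(g), col_mean, g)
    centered = g - g.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Toy panel: the first three accessions resemble each other, as do the last three.
toy = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [2, 2, 2, 1],
    [2, 2, 1, 2],
    [2, 1, 2, 2],
]
scores, explained = genotype_pca(toy)
```

In this toy example the first principal component places the two groups on opposite sides of zero, which is the same kind of separation that Figure 5A shows for the two subpopulations.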

2.3. Analysis of Molecular Variance (AMOVA) and Genetic Diversity Indices

An analysis of molecular variance (AMOVA) was conducted to explain the relationship between the two sesame subpopulations and estimate their differentiation. The AMOVA results indicated that 29.5% of the variation in the studied sesame collection was among the subpopulations, 57.5% of the variation was among the accessions within each subpopulation, and the remainder (13%) was attributed to the variation within the accessions (Table 3). The fixation index (F_ST) measures the degree of differentiation among populations, which are considered very strongly differentiated when F_ST values are >0.25 [27,66]. Our results showed an F_ST estimate of 0.867, which indicates a very strong differentiation between the two sesame subpopulations.

2.4. Linkage Disequilibrium

Understanding the linkage disequilibrium (LD) pattern, i.e., the nonrandom association among alleles at different loci [67,68], can help with exploring linkage drag, i.e., the unintentional co-selection of undesirable linked genes [69]. Closely linked polymorphic SNPs create strong LD and haplotype blocks, in which recombination is reduced, and thus affect the resolution of gene identification studies [28,70,71,72]. Moreover, the LD can help explain the genetic diversity, as it is affected by factors that change the genetic diversity, such as natural selection, population bottlenecks, genetic drift, inbreeding, inversions and gene conversion, the recombination rate, and the mating system [73,74].

The pairwise r², which is the square of the correlation between alleles at two loci in the same gamete, was used to study the LD in the USDA sesame collection mapped with 24,735 SNPs across the 13 chromosomes. The LD decay, at an r² threshold of 0.20, averaged 160.69 Kbp across the whole sesame genome for the whole population (Figure 6, Table 4). Previous studies have reported different values for the LD decay [10,37,75], depending on the number and origin of sesame accessions and/or the number of SNPs and marker density. In the current study, the LD decay value varied among the subpopulations and chromosomes (Table 4). The LD decay differed between subpopulation 1 (166.5 Kbp) and subpopulation 2 (143.8 Kbp) (Figure 6, Table 4). Further, the LD decay varied across the 13 sesame chromosomes, from 123.3 Kbp on chromosome LG05 to 167.7 Kbp on chromosome LG09 (Table 4). The observed variation in the LD decay could be related to differences in polymorphism, including both the number and positions of SNPs, as well as diversity among the sesame chromosomes (Figure 2, Table 1). Differences in the LD among chromosomes have previously been reported in other crops. For example, variations in the LD values of Brassica chromosomes and sub-genomes were reported to be the result of increased gene conservation, large segmental structural variation [76], and/or ecogeographical adaptation and artificial selection for important traits [77]; for instance, selection for flowering time and/or seed quality [78] can reduce genetic diversity and increase the LD and the size of haplotype blocks. These findings are likely applicable to many domesticated crops, including sesame, and highlight mechanisms by which the LD variation in the current study can be explained.
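The definition of r² given above can be written out for a pair of biallelic loci as r² = D² / (p_A (1 − p_A) p_B (1 − p_B)), where D = p_AB − p_A p_B. The sketch below computes it from phased haplotypes coded 0/1; the function name and the example haplotypes are hypothetical, and the study itself obtained r² with the PopLDdecay software.

```python
def haplotype_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    haplotypes: iterable of (allele_at_locus_A, allele_at_locus_B) pairs,
    each allele coded 0 or 1.
    """
    haps = list(haplotypes)
    n = len(haps)
    p_a = sum(a for a, _ in haps) / n
    p_b = sum(b for _, b in haps) / n
    p_ab = sum(1 for a, b in haps if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom

# Alleles always travel together: complete LD.
r2_linked = haplotype_r2([(1, 1), (1, 1), (0, 0), (0, 0)])
# All four haplotypes equally common: linkage equilibrium.
r2_unlinked = haplotype_r2([(1, 1), (1, 0), (0, 1), (0, 0)])
```

The two toy cases mark the extremes of the scale, r² = 1 for complete LD and r² = 0 for independent loci.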

2.5. Applications of High-Throughput Genotyping of Sesame Accessions

Recent advancements in genomics approaches, such as next-generation sequencing, high-throughput genotyping, and omics technologies, allow plant breeders to explore the genetic diversity within crop populations and, thus, rapidly accelerate plant breeding for germplasm improvement through identifying favorable gene/alleles in specific genotypes. Additionally, the low-cost nature of these methodologies and their overall efficiency have led to the development of advanced, accessible, high-throughput sequencing technologies (HTS) [79]. One of the HTS approaches is genotyping-by-sequencing (GBS) technology, which has facilitated the discovery of hundreds of thousands of SNP markers at a low cost in plant populations [80]. GBS technology was used to characterize the genetic diversity of crops such as Camelina sativa [30] and Brassica juncea [29], and the identified SNPs have subsequently been used for gene discovery [81,82,83]. As such, these technologies were used to genotype the panel of 501 sesame accessions in this study, collected from different parts of the world and maintained by the USDA-NPGS system, with 24,735 SNP markers. These markers span the entire Sesamum indicum genome and illustrate the wide genetic diversity of the panel. Moving forward, the genotyped sesame accessions and the mapped SNP markers will be used to design and conduct allele/gene discovery studies using a GWAS approach, the results of which could lead to the discovery of candidate genes controlling the seed yield, oil content, fatty acid compositions, and tolerances to abiotic (drought, high heat, salt, and flood) and biotic (disease and insect) stresses. Further, phenotyping these accessions may also lead to the identification of genetically diverse accessions with traits of interest. 
Of these, the genotyped accessions will be used as parents in sesame breeding programs using innovative technologies such as marker-assisted selection and genomic selection, speed breeding, and doubled-haploid methodologies to rapidly increase sesame productivity and stability under different environments.

3. Materials and Methods

3.1. Plant Materials

The current study assessed a large USDA sesame collection of 501 accessions, including old landraces, old varieties, and breeding lines, that were collected from 36 countries in Asia, Africa, Europe, and North and South America (Figure 1, Supplementary Table S1). The collection is maintained at the USDA-ARS Plant Genetic Resources Conservation Unit (PGRCU), Georgia, USA. The panel was planted in the greenhouse at the U.S. Arid-Land Agricultural Research Center (ALARC, USDA-ARS) at Maricopa, Arizona, USA, during the summer of 2022. Fresh tissues were collected from one-month-old plants and stored at −80 °C for further analyses.

3.2. DNA Extraction and Genotyping-by-Sequencing (GBS)

DNA was extracted according to Muthulakshmi et al. [84] with modifications. Briefly, frozen leaf tissue from each accession (~0.10 g) was lyophilized, and grinding buffer (100 mM Tris-HCl, 5 mM EDTA, 0.35 M sorbitol, 2% PVP, 1% v/v β-ME) was added to the tissue, which was homogenized using stainless-steel beads and a Geno/Grinder 2010 device (SPEX SamplePrep, Metuchen, NJ, USA). The homogenized tissue was centrifuged, and the polyphenol-/saccharide-containing supernatant was removed. This process was performed two additional times, until the supernatant was no longer viscous, to ensure the removal of contaminants that could interfere with downstream applications. The remaining steps of the modified CTAB extraction protocol were conducted as described, with DNA being eluted in TE buffer and stored at −20 °C until further use. DNA quality and concentrations were determined using a NanoDrop Spectrophotometer (Thermo Scientific, Boston, MA, USA). Additionally, DNA integrity was determined via BamHI-HF (New England Biolabs, Ipswich, MA, USA) restriction digests, the bands from which were visualized and assessed by 1% agarose gel electrophoresis. To optimize the GBS library preparation, eight DNA samples were chosen at random to generate GBS libraries with each of the following enzymes or enzyme pairs: ApeKI, PstI/MspI, PstI/BfaI, NsiI/MspI, and NsiI/BfaI. The libraries were run on an Agilent Tapestation 4200 device (Agilent Technologies, Santa Clara, CA, USA) to observe the fragment sizing and profile. ApeKI was chosen based on its smooth profile and concise fragment size range. The DNA samples were digested with the ApeKI restriction enzyme to prepare the GBS library [26,27,80]. Library preparation and Illumina sequencing were carried out by the University of Wisconsin Bioinformatics Resource Center (UWBRC) using a NovaSeq X Plus 2 × 150 sequencer.

3.3. GBS Sequencing and Genotyping Pipeline Analyses

Sequencing and genotyping analyses were carried out according to the UWBRC pipelines and Luo et al. [30] and Abdel-Haleem et al. [29]. Briefly, the raw sequence data, in Fastq formatted files, were trimmed to remove any sequencing adaptors and low-quality bases using skewer software [85]. The pre-processed raw Fastq files were analyzed using the TASSEL v5.0 GBS v2 pipeline [86]. The trimmed GBS sequencing data were converted into a unique tag database using the GBSSeqToTagDBPlugin to trim off the barcodes and truncate the sequences. The GBS tags were exported from the database in fastq format using the TagExportToFastqPlugin and were aligned to the Sesamum indicum genome (genome assembly: ASM2616843v1; https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_026168435.1/, accessed on 15 May 2024) using Bowtie 2 [87]. Sequence Alignment Map (SAM)-formatted files were used to import the alignments to the GBS database using the SAMToGBSdbPlugin to determine the potential positions of tags against the reference genome. Identified SNPs were called from the imported alignments using the DiscoverySNPCallerPluginV2. The SNPQualityProfilerPlugin was used to calculate the coverage, depth, and genotypic statistics for alignments in the database. The ProductionSNPCallerPluginV2 was used to convert the SNP data from the fastq format to a Variant Call Format (VCF) file. The mapped SNPs were filtered to keep only biallelic sites with, at most, 20% missing data using vcftools [88].

3.4. Population Genetic Analyses

3.4.1. Marker Polymorphism Analyses

The VCF files were converted to HAPMAP format using the TASSEL export feature. The resulting SNPs were further filtered to remove SNPs with a minor allele frequency (MAF) < 0.05. The number of alleles, allele frequencies, and MAF for each SNP were calculated using the geno summary function of TASSEL v5.0 GUI [89] (www.maizegenetics.net, accessed on 15 May 2024). The expected heterozygosity (He), expressed as the expected proportion of heterozygous genotypes under Hardy–Weinberg equilibrium, was calculated following Nei's equation [51], and the polymorphism information content (PIC) was calculated following Botstein et al. [50].

3.4.2. Analysis of Population Structure

The population structure was estimated using the fastStructure software [90], which uses a variational Bayesian framework to infer population structure in large SNP data sets. fastStructure was run with the “simple prior” option and the remaining default parameters. The number of subpopulations (K) was set to 1–10, and the best number of subpopulations was selected using the “chooseK.py” script to maximize the marginal likelihood [90]. The MedMedK, MedMeanK, MaxMedK, and MaxMeanK methods [63] of the Structure Selector software [64] were used to identify the subtle clustering patterns and optimal subpopulation number. The admixture proportions of each sesame accession, estimated by fastStructure, were visualized using the Pophelper 2.3.1 software [91].

Principal component analysis (PCA) was carried out using the PCA function of TASSEL v5.0 GUI and plotted using the R package ggplot2 (http://www.r-project.org, accessed on 15 May 2024). An unrooted neighbor-joining phylogenetic tree was constructed using the TASSEL software and visualized using the Interactive Tree Of Life (iTOL) (https://itol.embl.de, accessed on 15 May 2024).

3.5. Analysis of Molecular Variance (AMOVA) and Genetic Diversity Indices

The two subpopulations defined by the fastStructure and Structure Selector software were used to conduct an analysis of molecular variance (AMOVA) and to calculate the population fixation index (F_ST) using the Arlequin v.3.52 software [92]. The F_ST index measures the amount of genetic variance that can be explained by population structure based on Wright's F-statistics [93,94], where F_ST ranges from 0 (no differentiation between subpopulations) to 1 (complete differentiation between subpopulations) [95].
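To illustrate the two ends of the F_ST scale, the sketch below computes a single-locus F_ST for two subpopulations from allele frequencies as (H_T − H_S) / H_T, where H_S is the size-weighted mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity of the pooled population. This is not the AMOVA-based estimator implemented in Arlequin; the function name and the allele frequencies are hypothetical, and only the subpopulation sizes (256 and 245 accessions) come from this study.

```python
def single_locus_fst(p1, p2, n1=256, n2=245):
    """Wright's F_ST at one biallelic locus for two subpopulations.

    p1, p2: frequency of the same allele in subpopulations 1 and 2.
    n1, n2: subpopulation sizes, used as weights.
    """
    w1 = n1 / (n1 + n2)
    w2 = n2 / (n1 + n2)
    p_bar = w1 * p1 + w2 * p2
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0:
        return float("nan")  # locus is monomorphic overall
    h_within = w1 * 2.0 * p1 * (1.0 - p1) + w2 * 2.0 * p2 * (1.0 - p2)
    return (h_total - h_within) / h_total

fst_identical = single_locus_fst(0.3, 0.3)  # same frequencies in both groups
fst_fixed = single_locus_fst(1.0, 0.0)      # alternative alleles fixed
fst_partial = single_locus_fst(0.8, 0.2)    # intermediate differentiation
```

Identical allele frequencies give F_ST = 0, fixation of alternative alleles gives F_ST = 1, and intermediate cases fall between the two.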

3.6. Linkage Disequilibrium (LD)

The linkage disequilibrium (LD) values for the whole population, each subpopulation, and each chromosome were estimated by calculating the squared allele frequency correlation coefficient (r²) between each pair of SNP markers across the genome using the PopLDdecay software [96]. The r² values were plotted against the corresponding physical distances in kilobase pairs (Kbp). A threshold of r² = 0.20, representing the 95th percentile of the r² values between unlinked markers, was used to declare the LD decay.
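One simple way to read an LD decay distance off such a plot is to average r² within distance bins and report the first bin whose mean falls below the threshold. The sketch below follows that idea; it is not the PopLDdecay algorithm, and the function name, the 10 Kbp bin width, and the example values are assumptions made for illustration.

```python
import numpy as np

def ld_decay_distance(distances_kbp, r2_values, bin_width_kbp=10.0, threshold=0.20):
    """Midpoint (Kbp) of the first distance bin whose mean r^2 is below threshold.

    Returns None when the mean r^2 never drops below the threshold.
    """
    d = np.asarray(distances_kbp, dtype=float)
    r2 = np.asarray(r2_values, dtype=float)
    bins = np.floor(d / bin_width_kbp).astype(int)
    # np.unique returns sorted bin indices, so bins are visited by distance.
    for b in np.unique(bins):
        if r2[bins == b].mean() < threshold:
            return (b + 0.5) * bin_width_kbp
    return None

# Hypothetical SNP pairs: r^2 falls as the physical distance grows.
distances = [5, 15, 25, 35]
r2 = [0.80, 0.50, 0.15, 0.10]
decay = ld_decay_distance(distances, r2)
```

In practice the bin width trades resolution against noise, since narrow bins contain few SNP pairs and give unstable means.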

4. Conclusions

The rapid advancements in next-generation sequencing technologies reduce the cost, time, and effort required to develop and utilize high-throughput genotyping pipelines. Using GBS technology, the current study genotyped a panel of 501 sesame accessions with 24,735 SNP markers to explore their genetic diversity and population structure. The 24,735 high-quality SNP markers covered the 13 chromosomes, with an average of approximately 1900 SNPs per chromosome, a PIC value of 0.267, and a He value of 0.332, thus providing sufficient marker information for further studies. The population structure, PCA, and phylogenetic tree analyses identified two distinct subpopulations. The variations in polymorphism, genetic diversity indices, and LD decay patterns indicate that directed selection and geographical adaptation may have affected the formation and differentiation within natural sesame populations at the chromosomal and, consequently, genome-wide level. This information can be used in future breeding efforts, where the genotyped panel characterized with SNP markers is a valuable resource for allele/gene identification using genome-wide association studies (GWAS), ultimately providing a tool to enhance genetic gain in sesame breeding programs using innovative breeding methodologies such as marker-assisted selection and genomic selection.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants13131765/s1, Figure S1: Subpopulation structure based on K = 1–10 of fastStructure software analysis; Table S1: Country and continent of origin of the 501 USDA sesame accessions used in this study, and subpopulation structure based on fastStructure software analysis.

Author Contributions

Conceptualization, H.A.-H.; methodology, D.S.; software, A.S.; validation, H.A.-H.; formal analysis, H.A.-H. and A.S.; investigation, H.A.-H., D.S. and A.S.; resources, H.A.-H.; data curation, H.A.-H., D.S. and A.S.; writing—original draft preparation, H.A.-H.; writing—review and editing, H.A.-H., D.S., A.S., G.N.D.L.F. and E.V.; visualization, H.A.-H. and A.S.; supervision, H.A.-H.; project administration, H.A.-H.; funding acquisition, H.A.-H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the United States Department of Agriculture-Agricultural Research Service (USDA-ARS) 2020-21410-008-00D.

Data Availability Statement

Data are available upon request to the corresponding author.

Acknowledgments

The authors utilized the University of Wisconsin-Madison Biotechnology Center's DNA Sequencing Facility (research resource identifier: RRID:SCR_017759) to generate and sequence the GBS libraries. The UWBC is a licensed service provider for internal and external clients, providing GBS services under license from Keygene N.V., which owns patents and patent applications protecting its sequence-based genotyping technologies.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The mentioning of trade names or commercial products in this publication is solely for providing specific information and does not imply recommendation or endorsement by the United States Department of Agriculture. The USDA is an equal opportunity provider and employer.

Figure 1. Geographical distribution of the origin of the 501 accessions of sesame used in this study.

Figure 2. Genomic distributions and marker densities of 24,735 single-nucleotide polymorphisms (SNPs) across the 13 chromosomes of Sesamum indicum. Centromere is suggested by white blocks.

Figure 3. Marker polymorphism and heterozygosity estimations of 24,735 single-nucleotide polymorphism (SNP) markers. (A) Expected heterozygosity (He) and (B) polymorphic information content (PIC).

Figure 4. Population structure of the 501 USDA sesame accessions: (A) inferred clusters obtained using the Puechmaille method and StructureSelector software; (B) estimated population structure based on K = 2 using fastStructure software; and (C) the neighbor-joining phylogenetic tree based on the genetic distance matrix.

Figure 5. Principal component analysis (PCA) based on (A) genetic distance of the two clustered subpopulations and (B) genetic distance among accessions and their origin.

Figure 6. Linkage disequilibrium (LD) decay of the 501 sesame accessions. The r² values are plotted against physical distance for the whole panel and for the two subpopulations identified by fastStructure software.

Table 1. Genomic distribution and SNP marker statistics of the 24,735 SNPs mapped on the sesame genome using the USDA sesame accessions.

Chromosome *	Length (Mbp) *	No. of SNPs	Marker Density (Kbp/SNP) **	Marker Density (SNP/Mbp) **
LG01	23.75	2389	9.94	100.60
LG02	23.37	1240	18.85	53.05
LG03	31.44	3118	10.08	99.18
LG04	21.23	1466	14.48	69.07
LG05	20.96	1434	14.62	68.40
LG06	27.99	2467	11.35	88.13
LG07	16.12	1259	12.80	78.11
LG08	31.99	2342	13.66	73.22
LG09	26.74	2336	11.45	87.36
LG10	22.21	1705	13.03	76.76
LG11	17.33	1716	10.10	99.01
LG12	19.10	2065	9.25	108.11
LG13	19.50	1198	16.28	61.44
Total	301.73	24,735	-	-
Average	-	1903	12.76	81.73

* Chromosome names and sizes according to the genome assembly (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_026168435.1/, accessed on 15 May 2024). ** Marker density expressed as the average distance between two adjacent SNPs (Kbp) and as the number of SNPs per 1 Mbp.
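The two marker-density columns in Table 1 follow directly from chromosome length and SNP count. The Python sketch below recomputes them from the tabulated values. The variable and function names are illustrative, and small last-digit differences from the table can arise because the tabulated lengths are rounded to 0.01 Mbp.

```python
# Chromosome length (Mbp) and SNP count, copied from Table 1.
TABLE1 = {
    "LG01": (23.75, 2389), "LG02": (23.37, 1240), "LG03": (31.44, 3118),
    "LG04": (21.23, 1466), "LG05": (20.96, 1434), "LG06": (27.99, 2467),
    "LG07": (16.12, 1259), "LG08": (31.99, 2342), "LG09": (26.74, 2336),
    "LG10": (22.21, 1705), "LG11": (17.33, 1716), "LG12": (19.10, 2065),
    "LG13": (19.50, 1198),
}


def marker_density(length_mbp: float, n_snps: int) -> tuple[float, float]:
    """Return (average Kbp between SNPs, SNPs per Mbp) for one chromosome."""
    kbp_per_snp = length_mbp * 1000.0 / n_snps
    snps_per_mbp = n_snps / length_mbp
    return kbp_per_snp, snps_per_mbp


TOTAL_SNPS = sum(n for _, n in TABLE1.values())
MEAN_SNPS_PER_CHROMOSOME = TOTAL_SNPS / len(TABLE1)

if __name__ == "__main__":
    for name, (length, n) in TABLE1.items():
        kbp, per_mbp = marker_density(length, n)
        print(f"{name}\t{kbp:.2f} Kbp/SNP\t{per_mbp:.2f} SNP/Mbp")
    print(f"Total SNPs: {TOTAL_SNPS}; mean per chromosome: {MEAN_SNPS_PER_CHROMOSOME:.0f}")
```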

Table 2. Proportions of transition and transversion SNPs across the sesame genome in the USDA sesame accessions.

SNP Type	Transitions		Transversions
Substitution	A/G	C/T	A/T	A/C	G/T	G/C
Number of SNPs	7145	7292	2609	2555	2529	2605
Proportion of SNPs	0.289	0.295	0.105	0.103	0.102	0.105
Total (proportion)	0.584		0.416
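The class totals in Table 2, and the transition-to-transversion (Ts/Tv) ratio they imply, can be recomputed from the per-substitution counts. The following Python sketch does so. The names are illustrative, and the Ts/Tv ratio is derived here from the table rather than being a value reported in the table itself.

```python
# SNP counts per substitution type, copied from Table 2.
SNP_COUNTS = {"A/G": 7145, "C/T": 7292, "A/T": 2609, "A/C": 2555, "G/T": 2529, "G/C": 2605}

# Purine<->purine (A/G) and pyrimidine<->pyrimidine (C/T) changes are transitions.
TRANSITIONS = {"A/G", "C/T"}


def ts_tv_summary(counts: dict[str, int]) -> dict[str, float]:
    """Summarize transition and transversion shares and the Ts/Tv ratio."""
    total = sum(counts.values())
    ts = sum(n for kind, n in counts.items() if kind in TRANSITIONS)
    tv = total - ts
    return {
        "total": total,
        "ts_fraction": ts / total,
        "tv_fraction": tv / total,
        "ts_tv_ratio": ts / tv,
    }


if __name__ == "__main__":
    for key, value in ts_tv_summary(SNP_COUNTS).items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```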

Table 3. Analysis of molecular variance (AMOVA) among and within USDA sesame population.

Source	df	Sum of Squares	Variance Components	Variation %
Among subpopulations	1	155,958.45	308.77	29.50
Among accessions within subpopulations	499	667,180.45	599.25	57.26
Within accessions	501	69,405.00	138.53	13.24
Total	-	892,543.90	1046.56	-
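The percentage column of Table 3 is each variance component divided by the total variance. The Python sketch below reproduces those percentages and, under the usual AMOVA convention that F_ST is the among-subpopulation share of the total variance, the corresponding fixation index. The names are illustrative, and the convention is an assumption about how the AMOVA output was summarized, not a description of the software's internals.

```python
# Variance components copied from Table 3.
VARIANCE_COMPONENTS = {
    "among_subpopulations": 308.77,
    "among_accessions_within_subpopulations": 599.25,
    "within_accessions": 138.53,
}


def amova_summary(components: dict[str, float]) -> tuple[float, dict[str, float], float]:
    """Return (total variance, percentage per source, F_ST)."""
    total = sum(components.values())
    percentages = {source: 100.0 * value / total for source, value in components.items()}
    # Assumed convention: F_ST = among-subpopulation variance / total variance.
    fst = components["among_subpopulations"] / total
    return total, percentages, fst


if __name__ == "__main__":
    total, percentages, fst = amova_summary(VARIANCE_COMPONENTS)
    print(f"Total variance: {total:.2f}")
    for source, pct in percentages.items():
        print(f"{source}: {pct:.2f}%")
    print(f"F_ST (assumed definition): {fst:.3f}")
```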

Table 4. Linkage disequilibrium (LD) decay distance (Kbp) at r < 0.15 for the USDA sesame accessions.

Chromosome	LD decay distance (Kbp)
LG01	148.77
LG02	132.90
LG03	148.88
LG04	131.75
LG05	123.26
LG06	151.14
LG07	149.29
LG08	161.50
LG09	167.65
LG10	133.18
LG11	146.04
LG12	159.01
LG13	148.82
Whole population	160.69
Subpopulation 1	166.45
Subpopulation 2	143.78
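An LD decay distance of this kind is the physical distance at which the average LD statistic falls below a chosen cutoff. The Python sketch below shows one simple way to read that distance off a binned decay curve by linear interpolation. It is an illustration only: the function name, the interpolation scheme, and the toy curve are assumptions, and the sketch uses neither the study's data nor the procedure of the LD software used by the authors.

```python
from typing import Optional, Sequence


def ld_decay_distance(
    distances_kbp: Sequence[float],
    mean_ld: Sequence[float],
    threshold: float = 0.15,
) -> Optional[float]:
    """Distance (Kbp) at which a binned LD curve first drops below `threshold`.

    `distances_kbp` must be sorted in ascending order. Returns None if the
    curve never crosses the threshold within the supplied range.
    """
    points = list(zip(distances_kbp, mean_ld))
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if r0 >= threshold > r1:
            # Linear interpolation between the two bins that bracket the cutoff.
            return d0 + (r0 - threshold) * (d1 - d0) / (r0 - r1)
    return None


# Toy decay curve (hypothetical, NOT data from this study): LD = 0.5 / (1 + d / 50).
TOY_DISTANCES = [0.0, 50.0, 100.0, 150.0, 200.0]
TOY_LD = [0.5 / (1.0 + d / 50.0) for d in TOY_DISTANCES]

if __name__ == "__main__":
    print(ld_decay_distance(TOY_DISTANCES, TOY_LD))
```

With finer distance bins, the interpolated value approaches the true crossing point of the underlying curve, so the bin width sets the resolution of the reported decay distance.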

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
