Genetic Diversity and Collection Structure Studies of Sesame (Sesamum indicum L.) Accessions Across Ethiopian Research Centers

Gelashe, Feyisa Bejiga; Ndeve, Arsénio D.; Menamo, Temesgen M.; Gandhi, Harish; Chiulele, Rogério M.

doi:10.3390/genes17030300

by Feyisa Bejiga Gelashe ^1,2, Arsénio D. Ndeve ^1, Temesgen M. Menamo ^2,*, Harish Gandhi ^3 and Rogério M. Chiulele ^1,4

^1 Department of Crop Production, Faculty of Agronomy and Forestry Engineering, Eduardo Mondlane University, 3453 Avenida Julius Nyerere, Maputo P.O. Box 257, Mozambique

^2 Department of Plant Science and Horticulture, College of Agriculture and Veterinary Medicine, Jimma University, Jimma P.O. Box 307, Ethiopia

^3 Dryland Legumes and Cereals Program, International Maize and Wheat Improvement Center (CIMMYT), ICRAF House, United Nations Avenue, Nairobi P.O. Box 1041-00621, Kenya

^4 Centre of Excellence in Agri-Food Systems and Nutrition (CE-AFSN), Eduardo Mondlane University, 5° Andar, Edificio da Reitoria, Praça 25 de Junho, Maputo P.O. Box 257, Mozambique

^* Author to whom correspondence should be addressed.

Genes 2026, 17(3), 300; https://doi.org/10.3390/genes17030300

Submission received: 26 January 2026 / Revised: 13 February 2026 / Accepted: 16 February 2026 / Published: 28 February 2026

(This article belongs to the Section Plant Genetics and Genomics)

Abstract

Background/Objectives: Despite the crop’s economic importance, the genome-wide genetic diversity of sesame germplasm conserved in the national ex situ collection of Ethiopia, a proposed center of origin, remains inadequately characterized. This study assessed genome-wide genetic diversity and population structure in 188 sesame accessions from six Ethiopian Agricultural Research Centers. Methods: Accessions were genotyped with DArTSeq-based SNP markers, and structure was examined using STRUCTURE, PCA, DAPC, neighbor-joining clustering, pairwise FST, and AMOVA. Results: After quality filtering, 5163 high-quality markers were retained from the original set of 12,302 SNPs. Mean expected heterozygosity (He = 0.201) exceeded observed heterozygosity (Ho = 0.193), consistent with sesame’s predominantly self-pollinating nature. The SNPs showed a transition/transversion ratio of 1.17:1 and an uneven distribution across 16 linkage groups. STRUCTURE, PCA, DAPC, and neighbor-joining cluster analyses revealed a clear hierarchical population structure with distinct clusters and varying admixture. Accessions from Assosa (AARC) and Bako (BARC) were genetically uniform, whereas Werer (WARC) and Gambella (GaARC) were major diversity reservoirs, exhibiting high heterozygosity and gene diversity. Pairwise FST values ranged from 0.001 to 0.356, and AMOVA indicated that 30–43% of variation occurred among collections and 57–70% within collections, highlighting substantial intra-collection diversity. Conclusions: The findings identify specific research centers as key sources of genetic variation for breeding, conservation, and association mapping aimed at improving agronomic and adaptive traits in the Ethiopian sesame gene pool.

Keywords:

Sesamum indicum L.; genetic diversity; collection structure; GBS; DArTSeq; SNP markers; cluster analysis; PCA; collection diversity

1. Introduction

Sesame (Sesamum indicum L.) is a self-pollinating crop belonging to the family Pedaliaceae. Its origin and evolution have long been debated [1,2], with proposed centers including India, the Middle East, and sub-Saharan Africa, including Ethiopia [3,4,5]. However, limited archeological evidence and few experimental studies within the Sesamum genus make its evolutionary history difficult to resolve [6]. Together, these competing claims make it difficult to pinpoint the crop’s precise origin. Cytogenetically, sesame species are grouped into three classes based on chromosome number: diploid S. indicum and S. alatum (2n = 2x = 26), diploid S. latifolium (2n = 2x = 32), and allotetraploid S. radiatum (2n = 4x = 64) [7]. A total of 36 sesame species have been reported [8], and the presence of indigenous species in Ethiopia, including S. alatum and S. indicum (2n = 26) and S. latifolium (2n = 32), supports the country as a center of origin for sesame [9].

Sesame is widely cultivated in tropical regions of the world, with major production in Sudan, India, Myanmar, China, Tanzania, Nigeria, Burkina Faso, Chad, the Central African Republic, and Ethiopia [10,11]. It is valued for its high-quality edible oil (~55%) [12,13] and substantial protein content (18–25%) [14]. In sub-Saharan Africa, sesame has transitioned from a marginal crop to a major export commodity, though smallholder farmers predominantly produce it [15]. In Ethiopia, sesame is widely grown in central and northern regions [16] and is the country’s largest oilseed export, valued at USD 307 million, ranking second only to coffee among agricultural exports [17]. Due to its seed color, size, sweet flavor, natural aroma, and production mostly under organic farming systems, Ethiopian sesame seed is highly favored in premium markets [18]. Despite its economic importance and suitability for genetic studies, genetic improvement of sesame has lagged behind that of other major oilseed crops, highlighting the need for more efficient breeding strategies.

Conventional breeding methods have long been used in oilseed crops to develop new genotypes with desirable traits. However, these approaches are slow and labor-intensive, making them insufficient to meet the growing global demand for oilseeds in the face of an increasing population and declining agricultural resources [19]. Consequently, molecular breeding techniques, which utilize molecular markers, have become valuable tools for characterizing and evaluating genetic diversity both between species and among populations. In sesame, understanding and harnessing genetic diversity is essential for enhancing global productivity. Recent studies have explored sesame genetic diversity using DNA-based markers across various global collections [20,21,22,23]. These markers are particularly useful and reliable, as they remain stable under different environmental conditions [24].

Understanding the extent and pattern of genetic diversity within gene pools is essential for plant breeders to develop improved varieties with desirable traits [25]. In sesame, genetic diversity has been widely assessed using morphological, biochemical, and molecular markers worldwide [26,27,28,29,30,31,32,33,34]. However, studies on Ethiopian sesame germplasm remain limited, with only a few reports based on a limited number of accessions [35], and very few molecular marker-based studies on the local sesame collections. Globally, sesame genetic diversity has been examined using various molecular markers, including amplified fragment length polymorphism [36], sequence-related amplified polymorphism [37], inter-simple sequence repeat [34], simple sequence repeat [38], expressed sequence tag [33], and insertion–deletion markers [39].

Recent advances in molecular genetics have led to the emergence of single-nucleotide polymorphism (SNP) markers. SNP markers are precise, cost-effective, and high-throughput tools with several advantages over earlier markers [40,41]. They are abundant, stable, genome-wide, and efficiently assayed using automated genotyping platforms. Among available methods, genotyping-by-sequencing (GBS) is currently the most widely used approach for SNP discovery in plants [42] and yields robust marker datasets ranging from tens to thousands of SNPs, in contrast to earlier SNP arrays [43]. GBS has been widely applied to assess genetic diversity and population structure in many crops, including sesame [44,45]. However, the genetic diversity and population structure of sesame germplasm conserved in Ethiopian institutional collections remain poorly understood at the genome-wide level. Previous studies have been limited by small sample sizes or few molecular markers, leaving gaps in our knowledge of variation among accessions and across research centers, as well as the impact of curation and regeneration practices. This study, therefore, aims to characterize the genome-wide diversity of 188 Ethiopian sesame accessions using SNP markers derived from the DArTSeq GBS protocol, resolve collection diversity and genetic relationships, and inform germplasm management and breeding, including the identification of diverse accessions for conservation and crop improvement.

2. Materials and Methods

2.1. Plant Materials

This study evaluated 188 genetically diverse sesame accessions obtained from six Ethiopian agricultural research centers (Table S1): 21 accessions from Assosa Agricultural Research Center (AARC), 28 from Bako Agricultural Research Center (BARC), 12 from Gambella Agricultural Research Center (GaARC), 36 from Gondar Agricultural Research Center (GARC), 21 from Pawe Agricultural Research Center (PARC), and 70 from Werer Agricultural Research Center (WARC). The centers had collected these materials from various regions across the country, representing a diverse range of agroecologies.

It is important to note that the accessions analyzed in this study represent ex situ germplasm collections rather than natural biological populations. The genetic structure observed herein is a product of both historical evolutionary processes and human-mediated factors. Specifically, the diversity within these research center groupings has been shaped by the following:

Purposeful Sampling: Institutional priorities for specific traits or geographic regions.
Management Bottlenecks: Potential loss of rare alleles during repeated cycles of seed increase and regeneration.
Managed Mating: The use of closed mating systems during germplasm maintenance.
Germplasm Exchange: The historical movement and sharing of accessions between research centers.

Consequently, the clusters identified via STRUCTURE and DAPC are interpreted as genetic groups reflecting the curation history and provenance of the accessions, rather than independent evolutionary units driven solely by migration–drift–selection dynamics.

2.2. Genomic DNA Extraction and Sequencing

The 188 sesame accessions were grown under greenhouse conditions at the Horticulture and Plant Science Department, College of Agriculture and Veterinary Medicine, Jimma University (JUCAVM), Ethiopia, in July 2025 (Figure 1). Four seeds from each accession were planted in a seedling tray. Fresh, young, healthy leaf tissue was collected from the four fifteen-day-old seedlings of each accession and placed in two 96-well sample collection plates. The leaf tissue samples were stored at −80 °C until lyophilization and were then dried in a freeze-dryer (Alpha 1-2 LD-plus, Osterode am Harz, Germany). Afterwards, the dried samples were shipped to SEQART AFRICA at the International Livestock Research Institute (BecA-ILRI) Hub in Nairobi, Kenya, for genotyping.

Genomic DNA was extracted from leaf tissue using the NucleoMag Plant DNA extraction kit, yielding DNA concentrations ranging from 50 to 100 ng/μL. The quality and quantity of the extracted DNA were assessed on 0.8% agarose gels. Libraries were constructed following the DArTSeq complexity reduction protocol [46], which involved digesting genomic DNA with PstI and MseI enzymes, ligating barcoded and common adapters, and amplifying the adapter-ligated fragments via PCR. Sequencing was performed as single-end reads over 138 cycles on the NovaSeq X platform, following a quality check, using GBS as described by [42] and DArTseq™ technology at the SEQART AFRICA genotyping lab (https://www.seqart.net, accessed 12 December 2025). DArTseq markers were scored with DArTsoft14, an in-house marker scoring pipeline. Two types of DArTseq markers were scored: SilicoDArT and SNP markers. For specific analyses requiring genomic representation in binary form, both marker types were scored as the presence (1) or absence (0) of the corresponding restriction fragment. Otherwise, SNP markers were treated as codominant allelic/genotypic data; conversion to binary format was strictly limited to analyses where it was methodologically required and was not applied to downstream estimates of genetic diversity or to population structure inference. Additionally, to ascertain chromosome positions, both SilicoDArT and SNP markers were aligned to the reference genomes of Chrom_Sesame.

2.3. SNP Calling and Data Filtering

The DArTSeq genotype data were filtered and analyzed using the R package dartR [47] in R v4.5.2. Single-nucleotide polymorphism (SNP) markers with more than 10% missing data, a minor allele frequency (MAF) below 5%, or an unknown genomic position were removed before further analysis. The SNP markers generated were aligned to the sesame reference genome Zhongzhi No. 13 [13] to determine their positions along the 16 sesame chromosomes. The following criteria were used to filter the data: markers with a minor allele frequency > 5% and a call rate > 95% were kept, while non-informative monomorphic markers were discarded. VCFtools v0.1.13 [48] was used for marker filtration. The dartR package in R was used to calculate polymorphic information content (PIC), reproducibility, and call rate in order to examine the characteristics and distribution of the markers along the 16 sesame chromosomes [47]. The proportions of transition (Ts) and transversion (Tv) mutations underlying the observed polymorphism were also determined using the same package.
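The filtering logic described above can be illustrated with a minimal Python sketch. It is not the dartR or VCFtools workflow used in the study; it only mirrors the stated thresholds (missing data < 10%, MAF > 5%, monomorphic markers discarded). The genotype encoding (0, 1, 2 alternate-allele counts, `None` for a missing call), the function names, and the toy markers are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Genotype calls per accession, coded as alternate-allele counts (0, 1, 2);
# None marks a missing call. This encoding is an assumption for illustration.
Calls = List[Optional[int]]


def missing_rate(calls: Calls) -> float:
    """Fraction of accessions without a genotype call at this marker."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls: Calls) -> float:
    """Frequency of the rarer allele among the non-missing diploid calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_markers(markers: Dict[str, Calls],
                   max_missing: float = 0.10,
                   min_maf: float = 0.05) -> Dict[str, Calls]:
    """Keep markers with missing data < max_missing and MAF > min_maf.

    Monomorphic markers have MAF = 0 and are therefore dropped as well.
    """
    return {
        name: calls
        for name, calls in markers.items()
        if missing_rate(calls) < max_missing
        and minor_allele_frequency(calls) > min_maf
    }


# Hypothetical toy data: three markers scored on ten accessions.
toy_markers = {
    "snp_ok": [0, 0, 1, 2, 2, 0, 1, 0, 2, 0],
    "snp_monomorphic": [0] * 10,
    "snp_sparse": [0, None, None, 2, 1, 0, 2, 0, 1, 0],
}
kept = filter_markers(toy_markers)
print(sorted(kept))
```

In this toy set only `snp_ok` passes: `snp_monomorphic` carries no minor allele, and `snp_sparse` has 20% missing calls.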

2.4. Genetic Relationship Visualization and Diversity Analysis

SNP marker summary statistics and genetic diversity analyses, including minor allele frequency, were performed using TASSEL (v5.2.52) [49]. The adegenet package in R was used to calculate observed and expected heterozygosity [50], while call rate and marker reproducibility were estimated using the dartR package in R [51]. Genetic structure was evaluated using a Bayesian model-based clustering approach implemented in STRUCTURE (v2.3.4) [52]. Relationships among individuals were examined by computing a pairwise genetic distance matrix based on Euclidean distances in R. An NJ cluster was then constructed using the hclust function and exported in Newick format via the ape package for visualization and annotation in the Interactive Tree of Life (iTOL) platform (v6.5.2) (https://itol.embl.de/, accessed 12 December 2025) [53]. We emphasize that this analysis represents a genetic relationship visualization rather than a phylogenetic reconstruction, as it is applied to within-species diversity where reticulate events (hybridization and admixture) are expected.
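For readers less familiar with the heterozygosity statistics mentioned above, the following Python sketch shows how observed (Ho) and expected (He) heterozygosity can be computed for a single biallelic SNP, together with the inbreeding coefficient F = 1 − Ho/He. It is a simplified illustration rather than the adegenet implementation: no small-sample correction is applied, and the genotype encoding, function names, and example locus are assumptions.

```python
from typing import List, Optional

# Alternate-allele counts per accession (0, 1, 2); None marks a missing call.
Calls = List[Optional[int]]


def observed_heterozygosity(calls: Calls) -> float:
    """Ho: proportion of non-missing accessions that are heterozygous."""
    observed = [g for g in calls if g is not None]
    return sum(g == 1 for g in observed) / len(observed)


def expected_heterozygosity(calls: Calls) -> float:
    """He under Hardy-Weinberg proportions: 2pq = 1 - p^2 - q^2."""
    observed = [g for g in calls if g is not None]
    p = sum(observed) / (2 * len(observed))
    return 2 * p * (1 - p)


def inbreeding_coefficient(calls: Calls) -> float:
    """F = 1 - Ho/He; positive values indicate a heterozygote deficit."""
    he = expected_heterozygosity(calls)
    if he == 0:
        return 0.0  # monomorphic locus: F is undefined, reported as 0 here
    return 1 - observed_heterozygosity(calls) / he


# Hypothetical locus in a mostly selfing sample: one heterozygote out of ten.
locus = [0, 0, 0, 2, 2, 2, 1, 0, 2, 0]
ho = observed_heterozygosity(locus)
he = expected_heterozygosity(locus)
f = inbreeding_coefficient(locus)
print(round(ho, 3), round(he, 3), round(f, 3))
```

A strongly selfing sample such as this hypothetical one shows Ho well below He, which is the direction of the heterozygote deficit reported for the full panel.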

2.5. Collection Diversity and Cluster Analysis

Filtered SNPs were used to investigate collection diversity within the germplasm using a Bayesian clustering approach implemented in STRUCTURE [54]. Binary files generated from the VCF data were further subjected to admixture analysis using the adegenet package in R [55]. The optimal number of genetic clusters (K) was determined through k-means clustering by testing K values ranging from 2 to 6, with competing solutions evaluated using the Bayesian Information Criterion (BIC) [56]. Based on admixture results, accessions with membership coefficients (q-values) ≥ 60% were assigned to specific groups, while those with q-values < 60% were classified as admixed [57].
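The 60% membership rule described above can be sketched as follows. The function name, the cluster labels, and the example q-vectors are hypothetical; only the threshold comes from the text.

```python
from typing import Sequence


def assign_group(q_values: Sequence[float], threshold: float = 0.60) -> str:
    """Assign an accession to its best cluster, or label it as admixed.

    q_values holds one membership coefficient per cluster (summing to ~1).
    An accession is assigned only if its largest coefficient is >= threshold.
    """
    best = max(range(len(q_values)), key=lambda k: q_values[k])
    if q_values[best] >= threshold:
        return f"cluster_{best + 1}"
    return "admixed"


# Hypothetical membership coefficients for three accessions at K = 3.
examples = {
    "acc_A": [0.85, 0.10, 0.05],
    "acc_B": [0.45, 0.40, 0.15],
    "acc_C": [0.30, 0.60, 0.10],
}
assignments = {name: assign_group(q) for name, q in examples.items()}
print(assignments)
```

Note that an accession sitting exactly at the threshold (`acc_C`) is assigned rather than treated as admixed, following the "≥ 60%" wording.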

Principal component analysis (PCA) was performed in TASSEL v5.2.60, and scatter plots of sesame accessions were generated using the first two principal components (PC1 and PC2). Genetic diversity parameters, including private alleles, private SNPs, expected heterozygosity (He), and observed heterozygosity (Ho), were calculated for the identified subpopulations. To validate the model-based collection diversity inferred by STRUCTURE using a model-free approach, discriminant analysis of principal components (DAPC) was conducted with the adegenet package [58] in R v3.5.0 [50]. DAPC was applied to confirm the best-fitting number of clusters among the sesame accessions. This multivariate method combines sequential k-means clustering with model selection to infer and describe genetic structure, with the optimal K identified as the point beyond which further increases result in negligible changes in BIC values [59].
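The BIC-based choice of K described above, taking the point beyond which BIC barely changes, can be expressed as a simple rule. The sketch below is one possible formalization and not the procedure implemented in adegenet: the tolerance value and the BIC values are invented for illustration, and in practice the elbow is often judged visually.

```python
from typing import Dict


def choose_k(bic_by_k: Dict[int, float], tolerance: float) -> int:
    """Return the smallest K after which BIC improves by less than tolerance.

    bic_by_k maps each tested K to its BIC (lower is better). If every step
    still improves BIC by at least `tolerance`, the largest tested K is used.
    """
    ks = sorted(bic_by_k)
    for current, following in zip(ks, ks[1:]):
        improvement = bic_by_k[current] - bic_by_k[following]
        if improvement < tolerance:
            return current
    return ks[-1]


# Hypothetical BIC curve that flattens after K = 3.
bic_values = {1: 900.0, 2: 820.0, 3: 760.0, 4: 757.0, 5: 756.0}
best_k = choose_k(bic_values, tolerance=10.0)
print(best_k)
```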

2.6. Analysis of Molecular Variance (AMOVA)

The genetic structure of the population was assessed using analysis of molecular variance (AMOVA) [60] implemented in the poppr package in R v2.2.4 [61,62]. Following the approach described by [63], AMOVA was applied to partition total genetic variation into components attributable to differences among populations and within populations.

3. Results

3.1. SNP Markers Summary

In total, 12,302 SNP markers were initially obtained from the 188 sesame accessions. Applying the filtering criteria (minor allele frequency (MAF) > 5% and missing data < 10%) removed 7139 markers (58%), and the remaining dataset was subsequently imputed. The final set of 5163 high-quality SNP markers (42%) was retained for downstream genetic diversity analyses. The SNPs were distributed across all 16 chromosomes but showed clear heterogeneity in density (Figure 2A): using a 1 Mb sliding window, SNP density ranged from 0 to 54 SNPs/Mb. The total number of SNPs also varied markedly across the sixteen chromosomes (Figure 2B). LG3, LG6, and LG8 contained the highest numbers of markers, each exceeding 500 SNPs, while LG13, LG14, and LG16 had the lowest counts (<150 SNPs). SNP density per megabase showed a slightly different trend, with LG12 exhibiting the highest density (>40 SNPs/Mb) despite having a moderate total SNP count. Moderate densities (20–30 SNPs/Mb) were recorded on LG1, LG3, LG6, LG8, LG11, and LG15, whereas LG7, LG13, LG14, and LG16 showed the lowest densities.
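The per-megabase density figures above come from counting SNPs in 1 Mb windows along each chromosome. A minimal sketch of such a count is given below; as a simplification it uses non-overlapping windows rather than a true sliding window, and the positions and chromosome length are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def snp_density(positions_bp: Sequence[int],
                chrom_length_bp: int,
                window_bp: int = 1_000_000) -> List[int]:
    """Count SNPs in consecutive, non-overlapping windows along a chromosome.

    Positions are base-pair coordinates smaller than chrom_length_bp.
    The last window may be shorter than window_bp.
    """
    n_windows = -(-chrom_length_bp // window_bp)  # ceiling division
    counts = Counter(pos // window_bp for pos in positions_bp)
    return [counts.get(i, 0) for i in range(n_windows)]


# Hypothetical SNP positions on a 3.5 Mb chromosome.
positions = [150_000, 900_000, 1_200_000, 3_050_000]
density = snp_density(positions, chrom_length_bp=3_500_000)
print(density)
```

Windows with a count of zero, like the third one here, correspond to the low-polymorphism stretches visible in a density plot.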

In the analyzed sesame genomes, transition-type SNPs were more frequent (54%) than transversion-type SNPs (46%), resulting in a transition-to-transversion (Ts/Tv) ratio of 1.17:1 (2788/2375) (Figure 2C). The two transitions (A/G and C/T) were more frequent than the four transversions (A/C, G/T, C/G, and A/T), with C/T having the highest frequency (31%). Among the transversions, C/G and A/T (12% each) were slightly more frequent than A/C and G/T (11% each), which had the lowest frequencies among the six allele combinations (Figure 2C).
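The Ts/Tv classification above follows from the chemistry of the bases: A/G and C/T substitutions are transitions, and all other single-base changes are transversions. The short sketch below classifies SNPs accordingly and reproduces the reported ratio from the counts given in the text (2788 transitions, 2375 transversions); the function names and the toy SNP list are assumptions.

```python
from typing import Iterable, Tuple

# Purine<->purine (A/G) and pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref: str, alt: str) -> bool:
    """True for A/G or C/T substitutions, in either direction."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def ts_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transition to transversion SNPs in a list of (ref, alt)."""
    snp_list = list(snps)
    ts = sum(is_transition(ref, alt) for ref, alt in snp_list)
    tv = len(snp_list) - ts
    if tv == 0:
        raise ValueError("no transversions: ratio is undefined")
    return ts / tv


# Hypothetical toy list: three transitions and two transversions.
toy_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")]
toy_ratio = ts_tv_ratio(toy_snps)

# Counts reported in the text: 2788 transitions and 2375 transversions.
reported_ratio = round(2788 / 2375, 2)
print(toy_ratio, reported_ratio)
```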

3.2. Genetic Relationship Visualization Analysis

Neighbor-joining (NJ) tree-based visualizations revealed clear genetic relationships and differentiation among the 188 sesame accessions collected from the six research centers (AARC, BARC, GaARC, GARC, PARC, and WARC) (Figure 3A–C). The accessions formed three major genetic clusters, each further divided into multiple well-supported subgroups. This hierarchical clustering pattern indicates substantial genetic diversity within the panel. Although there was some degree of mixing among locations, several subclusters showed a strong tendency for samples from the same research center to group together. For example, accessions from WARC and GARC formed several distinct and compact subclusters, suggesting closer genetic relatedness within these groups. In contrast, accessions from AARC and BARC were more widely dispersed across the tree, reflecting greater within-location diversity or shared ancestry with materials from other centers.

3.3. Genetic Diversity

The number of accessions, private alleles, private SNPs, allelic diversity, nucleotide diversity, heterozygosity, gene diversity, and fixation index for each research center’s collection are shown in Table 1. The Werer Agricultural Research Center (WARC) collection showed higher observed heterozygosity (Ho = 0.292), and the GaARC collection higher gene diversity (He = 0.325), than the other collections, while the BARC and AARC collections had the lowest heterozygosity values (Ho = 0.131 and 0.138, respectively). The GaARC and PARC collections possessed comparable levels of genetic diversity. In terms of total gene diversity, the GaARC collection (He = 0.325) was the most diverse, followed by the PARC (He = 0.282), BARC (He = 0.197), WARC (He = 0.185), GARC (He = 0.119), and AARC (He = 0.102) collections. A similar trend was obtained for the fixation index, for which the GaARC collection had the highest value (Fst = 0.291) and the AARC collection the lowest (Fst = −0.123) (Table 1).

The pairwise genetic differentiation (FST) values revealed variable genetic structuring among sesame collections from different research centers (Figure 4). Low to moderate differentiation predominated, suggesting substantial gene flow or shared ancestry, likely driven by germplasm exchange, against the background of sesame’s predominantly self-pollinating nature. Very low differentiation was observed for GaARC–PARC (FST = 0.001) and BARC–AARC (0.015), as well as for AARC–GaARC (0.031), BARC–GaARC (0.046), and AARC–PARC (0.043), reflecting strong genetic similarity and overlap among these collections. In contrast, moderate differentiation was mainly associated with WARC: the highest FST values were observed for BARC–WARC (0.356) and AARC–WARC (0.327), indicating greater genetic divergence that may reflect geographic isolation, distinct selection histories, or limited germplasm exchange. Moderate differentiation was also evident for GARC–BARC (0.285) and GARC–AARC (0.250) (Figure 4).
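To make the meaning of a pairwise FST value concrete, the sketch below computes a simplified Nei-style estimate, FST = (HT − HS)/HT, from the allele frequencies of two collections. This is an assumption made for illustration: the estimator actually used by the study's software is not stated in this section and may differ (for example, by correcting for sample size). The function name and the frequencies are hypothetical.

```python
from typing import Sequence


def pairwise_fst(freqs_a: Sequence[float], freqs_b: Sequence[float]) -> float:
    """Simplified multi-locus FST = (HT - HS) / HT for two populations.

    freqs_a and freqs_b hold the frequency of the same allele at each
    biallelic locus. HS is the mean within-population expected
    heterozygosity; HT is the expected heterozygosity of the pooled
    (equally weighted) populations. Sums are taken over loci before
    forming the ratio.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same loci")
    hs_sum = 0.0
    ht_sum = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        hs_sum += (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
        p_bar = (pa + pb) / 2
        ht_sum += 2 * p_bar * (1 - p_bar)
    if ht_sum == 0:
        return 0.0  # no variation at any locus
    return (ht_sum - hs_sum) / ht_sum


# Hypothetical frequencies: identical pools versus strongly diverged pools.
fst_similar = pairwise_fst([0.5, 0.2], [0.5, 0.2])
fst_diverged = pairwise_fst([0.1], [0.9])
print(round(fst_similar, 3), round(fst_diverged, 3))
```

Collections with the same allele frequencies give a value of zero, while collections fixed for nearly opposite alleles approach one; the reported range of 0.001 to 0.356 lies between these extremes.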

3.4. Principal Component Analysis (PCA)

The principal component analysis (PCA) revealed clear genetic structuring among the sesame accessions collected from the six research centers (AARC, BARC, GaARC, GARC, PARC, and WARC) (Figure 5). PC1 (34.52%) and PC2 (8.25%) together explained 42.77% of the total genetic variation. Accessions from WARC formed a distinct and well-separated cluster along the positive axis of PC1, indicating substantial divergence from the other populations. GARC also showed a wide dispersion, forming a partially separated cluster mainly along PC2, suggesting high within-population variability. In contrast, accessions from AARC, BARC, GaARC, and PARC clustered closely near the origin, reflecting relatively lower genetic differentiation and higher similarity among these regions. The overall pattern highlights strong genetic divergence in WARC and GARC collections, while the remaining populations share more overlapping genetic backgrounds.

3.5. Discriminant Analysis of Principal Components (DAPC)

The DAPC clearly separated the sesame accessions into distinct genetic clusters corresponding to their geographical origins (Figure 6). The DAPC scatterplot and membership vectors show that WARC forms a well-defined and strongly differentiated cluster, indicating pronounced genetic divergence from the other accessions. GARC also separates clearly, with long membership vectors suggesting high within-population variability and strong discriminatory power of the discriminant functions. In contrast, accessions from AARC, BARC, GaARC, and PARC exhibit substantial overlap and compact clustering near the center, reflecting their relatively similar genetic backgrounds and low differentiation. Overall, the DAPC results confirm strong genetic structure driven mainly by the divergence of the WARC and GARC collections, while the remaining four collections share highly overlapping genetic compositions.

The DAPC based on the optimal number of genetic clusters (K = 3) revealed a clear separation of the sesame accessions into three well-defined groups (Figure 7). Cluster 1 (green) formed a broad and widely dispersed group, indicating high within-cluster variability. Cluster 2 (orange) was tightly packed, suggesting relatively low internal diversity and strong genetic cohesion. Cluster 3 (purple) appeared as a small and completely isolated cluster, reflecting a highly distinct genetic lineage separated from the other two groups. The strong separation among the three clusters highlights pronounced genetic structuring within the entire collection and confirms the presence of at least three major genetically differentiated groups in the germplasm.

3.6. Collection Diversity Analysis

The model-based Bayesian cluster analysis in STRUCTURE visualized the genetic structure of the research centers’ collection of sesame accessions. The STRUCTURE analysis (k = 5) grouped the 188 sesame accessions from AARC, BARC, GaARC, GARC, PARC, and WARC into five genetic clusters (Figure 8A). Three clusters (pop_1, pop_2, and pop_3) showed mostly pure ancestry, representing well-differentiated gene pools likely maintained within specific centers. In contrast, pop_4 and pop_5 displayed strong admixture, indicating extensive gene flow and germplasm exchange among centers, particularly those with active breeding and seed-sharing systems.

The multi-K STRUCTURE analysis (K = 2 to K = 6) showed clear differences in genetic composition and admixture levels among sesame accessions collected from the six research centers (Figure 8B). At K = 2, AARC and BARC formed almost completely pure clusters, indicating genetically uniform and less-admixed gene pools, whereas GaARC and PARC showed mixed ancestry and WARC displayed strong admixture, suggesting greater genetic heterogeneity and historical gene flow (Table 2). At K = 3, AARC and BARC remained homogeneous, each dominated by a single genetic lineage; GaARC and PARC began to split into two or more subgroups, and WARC continued to show high admixture, reflecting a more diverse germplasm base. At K = 4, AARC and BARC still clustered tightly with minimal substructure, while GaARC, GARC, and PARC showed increasing subdivision, indicating the presence of multiple genetic backgrounds within these centers; WARC again formed complex, highly admixed clusters, reinforcing its status as the most genetically diverse collection. At K = 5, AARC and BARC retained predominantly single-ancestry profiles, reflecting conserved local breeding lines; GaARC and GARC showed moderate admixture, indicating germplasm exchange or shared ancestry with neighboring centers; and PARC and especially WARC exhibited strong multi-ancestry patterns, confirming broad genetic mixing. At K = 6, the overall pattern stabilized: AARC and BARC remained genetically distinct with very low admixture; GaARC, GARC, and PARC showed intermediate levels of structure and admixture, forming several subgroups; and WARC again displayed the highest admixture, representing the most diverse and genetically mixed gene pool. Across all K values, AARC and BARC maintained highly homogeneous and distinct genetic clusters, GaARC, GARC, and PARC showed moderate admixture and substructure, and WARC consistently exhibited the highest admixture and genetic diversity, indicating extensive germplasm mixing and a broad ancestral base.

3.7. Molecular Variance (AMOVA)

The analysis of molecular variance (AMOVA) showed that, when accessions were grouped by research center collection, 70% of the total genetic variation occurred within accession groups and 30% among groups (Table 3). The among-group differentiation was significant, with a PhiPT value of 0.298 (p = 0.001), indicating measurable structuring among accession groups alongside considerable gene flow or shared ancestry among institutional collections. When accessions were grouped according to the clusters inferred from the collection diversity analysis, AMOVA attributed a higher proportion of variation to differences among groups (43%), with 57% of the variation within groups (Supplementary Table S3). Because these groups were defined by the clustering algorithms themselves, this second AMOVA quantifies the magnitude of the differentiation they identified rather than serving as independent proof of biological truth. This grouping also showed significant genetic differentiation, with a PhiPT value of 0.429 (p = 0.001), indicating stronger differentiation under structure-based grouping than under grouping by institutional collection. Thus, the inferred genetic clusters explain genetic differentiation better than sampling location does, reflecting underlying genetic subdivisions not strictly aligned with geography.
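The link between the variance percentages and the PhiPT values reported above can be checked with a short calculation, since PhiPT is the among-group variance component divided by the total variance. The sketch below uses the rounded percentages from the text, so it only approximates the reported values; the function name is an assumption.

```python
def phi_pt(var_among: float, var_within: float) -> float:
    """PhiPT = among-group variance / (among-group + within-group variance)."""
    total = var_among + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_among / total


# Rounded variance percentages from the two AMOVA groupings in the text.
phi_by_center = phi_pt(var_among=30, var_within=70)
phi_by_cluster = phi_pt(var_among=43, var_within=57)
print(round(phi_by_center, 3), round(phi_by_cluster, 3))
```

The results (0.30 and 0.43) are close to the reported PhiPT values of 0.298 and 0.429; the small differences are consistent with the percentages having been rounded.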

4. Discussion

Sesame is a predominantly self-pollinating oilseed crop, though occasional outcrossing occurs and is exploited in breeding programs. Predominant selfing leads to low heterozygosity, high homozygosity within accessions, strong linkage disequilibrium, and pronounced differentiation among accessions [64,65]. Consequently, elevated FST values observed in this study are interpreted primarily as reflecting restricted effective recombination and fixation within accession groups, rather than ongoing gene flow among research center collections. Occasional outcrossing, combined with historical germplasm exchange and regeneration practices, introduces admixture and contributes to diversity within accession groups [66]. Overall, the observed patterns of uniformity, admixture, and divergence are shaped by selfing, rare hybridization, and ex situ management, providing essential context for interpreting heterozygosity, differentiation metrics (including FST), and clustering in these managed germplasm collections.

4.1. SNP Markers Density and Genome-Wide Variation Patterns

The high filtering rate, in which 58% of raw SNP markers were removed to yield 5163 high-quality SNPs (42%), is consistent with the findings of [67] and aligns with previous sesame studies using similar genotyping-by-sequencing approaches, which typically retain 40–55% of raw markers after stringent quality control [68]. The observed deficit of heterozygotes (mean He = 0.201 > Ho = 0.193) is indicative of self-pollination and has been consistently documented across diverse sesame germplasm panels [69,70,71]. In line with earlier findings of heterogeneous SNP density in sesame [68,70], genome-wide SNPs were unevenly distributed across the 16 chromosomes, with hotspots on LG6, LG8, LG12, and LG15, and extended low-polymorphism regions on LG4, LG7, LG10, and LG13. The longest chromosomes (LG1–LG8) typically had more SNPs and the shorter chromosomes (LG9–LG16) fewer; however, LG12 showed the highest density per megabase despite a moderate total SNP count. This pattern provides adequate genomic resolution for downstream diversity and association analyses. Furthermore, the predominance of transition mutations (Ts/Tv = 1.17:1), particularly C/T transitions (31%), mirrors the known mutational bias in plants due to spontaneous cytosine deamination and matches spectra from recent resequencing studies [72]. This consistency in the mutation spectrum underscores the biological validity of the filtered SNP set. Taken together, the heterozygosity patterns, mutation spectrum, and uneven SNP distribution show that the dataset is reliable for thorough genomic analyses and captures the inherent genomic variation of sesame.

4.2. Unequal Distribution of Genetic Diversity of Sesame Accessions Across Ethiopian Research Centers

Our findings reveal significant heterogeneity in genetic variation and clear accession stratification, offering valuable insights for conservation and breeding strategies. The model-based clustering and NJ cluster analysis confirmed substantial genetic differentiation among accessions from the six centers, forming three major genetic cluster groups. While the tree provides a clear hierarchical view of the collection’s structure, it is a simplified representation of the underlying genetic relationships. Given the history of breeding and germplasm exchange in these research centers, some degree of reticulated ancestry is likely, which may not be fully captured by a strictly bifurcating model. This hierarchical genetic structure is consistent with patterns observed in other crops, where geographic and institutional isolation can drive divergence [73]. The tendency for accessions from specific centers (notably, GaARC WARC and GARC) to form distinct, compact subclusters indicates strong genetic identity and potentially limited gene flow with external sources. In contrast, the wider dispersion of AARC, PARC, and BARC accessions across the tree suggests either a historically diverse founding population or more extensive exchange and introgression with materials from other centers, use of similar breeding materials, or localized adaptation, a phenomenon documented in sesame collections in China and India [74,75]. The presence of mixed-center clusters further suggests historical gene flow and sharing of germplasm among research centers, contributing to the moderate genetic differentiation observed. From a conservation perspective, the distinct clustering and unique lineages associated with GaARC, WARC, and GARC highlight these collections as priority targets for both in situ and ex situ conservation, as they harbor rare alleles and divergent accessions that may be lost if not properly maintained. 
For genetic improvement, the clear genetic divergence among clusters provides valuable opportunities for parental selection, as crosses between genetically distant accessions, particularly those from GaARC or WARC and more uniform populations, could enhance heterosis and broaden the genetic base of Ethiopian sesame breeding programs.
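The idea of choosing parents from genetically distant accessions can be sketched in a few lines of Python. This is an illustration only, not the pipeline used in the study: it assumes biallelic SNP genotypes coded as 0, 1, or 2 copies of the alternate allele (with `None` for missing calls), uses a simple allele-sharing style distance, and the accession labels and genotypes are hypothetical.

```python
from itertools import combinations

def allele_sharing_distance(g1, g2):
    """Mean absolute difference in alternate-allele counts, scaled to 0..1.

    g1, g2: genotype lists coded 0/1/2, with None for missing calls.
    Loci missing in either accession are skipped.
    """
    diffs = [abs(a - b) / 2 for a, b in zip(g1, g2)
             if a is not None and b is not None]
    if not diffs:
        raise ValueError("no shared called loci")
    return sum(diffs) / len(diffs)

def most_distant_pairs(genotypes, top=3):
    """Rank accession pairs from most to least genetically distant."""
    scored = [
        (allele_sharing_distance(genotypes[a], genotypes[b]), a, b)
        for a, b in combinations(sorted(genotypes), 2)
    ]
    # Largest distance first; names break ties so the order is deterministic.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:top]

# Hypothetical toy panel: 4 accessions x 6 SNPs
toy = {
    "W-01": [2, 2, 0, 0, 2, 1],
    "A-01": [0, 0, 2, 2, 0, 0],
    "A-02": [0, 0, 2, 2, 0, None],
    "G-01": [2, 0, 0, 2, 1, 1],
}
pairs = most_distant_pairs(toy, top=2)
for dist, a, b in pairs:
    print(f"{a} x {b}: distance = {dist:.3f}")
```

In practice, genetic distance would be only one criterion; candidate crosses would also be screened for agronomic performance and adaptation.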

4.3. Genetic Diversity Metrics Among Research Center-Based Sesame Accessions

Our analyses reveal a structured yet interconnected landscape of genetic diversity among sesame accessions maintained across Ethiopian research centers. Diversity metrics identified GaARC as a major reservoir of variation, exhibiting the highest gene diversity (He = 0.325) and fixation index (Fst = 0.291) (Table 1), along with numerous private alleles, indicating unique genetic resources valuable for breeding and adaptation. PARC also showed high allelic and nucleotide diversity, emphasizing its importance for broadening the breeding pool. WARC displayed comparatively high heterozygosity (Ho = 0.292) and moderate-to-high differentiation from other centers, reflecting both substantial internal variation and genetic distinctiveness. In contrast, AARC and BARC consistently exhibited lower heterozygosity and gene diversity (Ho = 0.131–0.138, He = 0.102–0.197) (Table 1), suggesting a narrower genetic base potentially shaped by historical selection, founder effects, or genetic erosion, which may increase vulnerability to environmental stresses [76].
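To make the heterozygosity statistics above concrete, the following minimal Python sketch computes Ho and He for biallelic SNPs. It assumes genotypes coded as 0, 1, or 2 copies of the alternate allele (`None` for missing calls) and uses the simple 2pq form of He without a small-sample correction; the function names and toy genotypes are hypothetical and are not taken from the study's data or software.

```python
from typing import Optional, Sequence, Tuple

def heterozygosity(genotypes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (Ho, He) for one biallelic SNP.

    Ho is the fraction of heterozygous calls; He is 2pq computed from the
    alternate-allele frequency p (no sample-size correction).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    n = len(called)
    ho = sum(1 for g in called if g == 1) / n
    p = sum(called) / (2 * n)  # alternate-allele frequency
    he = 2 * p * (1 - p)
    return ho, he

def mean_heterozygosity(matrix):
    """Average Ho and He across SNPs (rows = SNPs, columns = individuals)."""
    pairs = [heterozygosity(row) for row in matrix]
    k = len(pairs)
    return sum(h for h, _ in pairs) / k, sum(e for _, e in pairs) / k

# Hypothetical toy data: 3 SNPs x 6 individuals
toy = [
    [0, 0, 2, 2, 1, 0],
    [0, 2, 2, 2, None, 0],
    [0, 0, 0, 0, 0, 0],
]
ho, he = mean_heterozygosity(toy)
print(f"Ho = {ho:.3f}, He = {he:.3f}")
```

A mean Ho well below He, as in this toy panel, is the pattern expected under predominant self-pollination.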

Consistent with these patterns, pairwise differentiation estimates indicated that most collections are weakly differentiated, with very low FST values among AARC, BARC, GaARC, and PARC (e.g., GaARC–PARC = 0.001; BARC–AARC = 0.015) (Figure 4), reflecting shared germplasm sources, frequent seed exchange, and comparable breeding histories [13]. In contrast, WARC and, to a lesser extent, GARC showed considerably higher differentiation from these centers (e.g., WARC–BARC = 0.356; WARC–AARC = 0.327; GARC–BARC = 0.285; GARC–AARC = 0.250) (Figure 4), highlighting their potential as sources of novel alleles for traits such as drought tolerance and disease resistance [77].
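As an illustration of how a pairwise FST value relates to allele frequencies, the sketch below implements a simple Nei-style estimator, (Ht - Hs) / Ht summed over loci, for two populations with equal weights. This is an assumption made for illustration; the estimator actually used to produce Figure 4 may differ, and the frequencies shown are hypothetical.

```python
def nei_fst(freqs_a, freqs_b):
    """Nei-style FST between two populations from alternate-allele frequencies.

    Uses the ratio of sums across loci, sum(Ht - Hs) / sum(Ht), with equal
    population weights and no sample-size correction.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency lists must cover the same loci")
    num = 0.0
    den = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        hs = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2  # mean within-population He
        p_bar = (pa + pb) / 2
        ht = 2 * p_bar * (1 - p_bar)                      # He of the pooled population
        num += ht - hs
        den += ht
    if den == 0:
        return 0.0  # every locus is monomorphic in both populations
    return num / den

# Hypothetical allele frequencies at three SNPs
similar = nei_fst([0.10, 0.50, 0.30], [0.12, 0.48, 0.33])
divergent = nei_fst([0.10, 0.50, 0.30], [0.90, 0.05, 0.80])
identical = nei_fst([0.2, 0.7], [0.2, 0.7])
print(f"similar: {similar:.4f}, divergent: {divergent:.4f}")
```

Populations with nearly identical allele frequencies give values close to zero, while populations with strongly contrasting frequencies give much larger values.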

Population structure analyses supported these observations. STRUCTURE analysis (optimal K = 5) partitioned the 188 accessions into five genetic clusters, ranging from relatively homogeneous clusters with limited admixture to highly admixed groups. Principal component analysis separated more divergent collections, such as WARC, from a core cluster of genetically similar centers (AARC, BARC, GaARC, PARC), while Neighbor-Joining tree visualization highlighted both highly similar clusters among core centers and distinct lineages in more differentiated collections. Clusters with predominantly pure ancestry indicate the maintenance of locally adapted gene pools with restricted gene flow, whereas the extensive admixture observed—particularly in WARC—reflects its historical role as an active germplasm hub engaged in systematic collection and exchange [75,78,79].

The AMOVA results further reinforced these patterns. Moderate but significant differentiation among collection sites (PhiPT = 0.298, p = 0.001) (Table 3), with 70% of the total variation residing within populations, aligns with the partial overlap of accessions from different centers observed in NJ clustering. This overlap likely reflects shared ancestry and seed exchange among institutes, a pattern commonly observed in self-pollinated crops with regional germplasm movement [80,81]. When accessions were grouped according to inferred genetic clusters, the proportion of variation among clusters increased to 43% (PhiPT = 0.429, p = 0.001) (Table S3), indicating stronger differentiation and confirming the distinct groups identified in STRUCTURE analyses with reduced admixture. These results demonstrate that genetic differentiation in this panel is better explained by underlying genomic structure than by geographic origin alone, consistent with previous diversity and GWAS panels [59].
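The link between PhiPT and the percentages quoted above follows from the usual definition of the statistic as the among-group share of the total estimated variance. The worked form below is a sketch; the symbols for the estimated variance components (EV in the AMOVA tables) are notation introduced here for illustration.

```latex
\begin{align}
\Phi_{PT} &= \frac{\hat{\sigma}^{2}_{\mathrm{among}}}
                  {\hat{\sigma}^{2}_{\mathrm{among}} + \hat{\sigma}^{2}_{\mathrm{within}}} \\
\text{By research center:}\quad \Phi_{PT} &= 0.298
  \;\Rightarrow\; \approx 30\%\ \text{among},\ \approx 70\%\ \text{within} \\
\text{By inferred genetic cluster:}\quad \Phi_{PT} &= 0.429
  \;\Rightarrow\; \approx 43\%\ \text{among},\ \approx 57\%\ \text{within}
\end{align}
```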

Overall, the concordance among diversity metrics, pairwise differentiation, NJ, STRUCTURE, and AMOVA reveals a robust and hierarchical genetic structure. Most Ethiopian sesame collections are genetically interconnected, reflecting shared germplasm and seed exchange, while WARC and, to a lesser extent, GARC represent more differentiated gene pools harboring unique alleles. This structure, characterized by substantial within-group diversity alongside distinct genetic clusters, provides a strong foundation for association mapping, parent selection, and breeding programs, and supports the long-term sustainability of sesame improvement efforts [80,82].

The clear genetic divergence of WARC, as a distinct cluster in PCA space, identifies it as a critical reservoir of diversity. Its broad ancestral base likely results from the amalgamation of materials from multiple origins. Conversely, the tight, homogeneous clustering of AARC and BARC accessions near the PCA origin, alongside their pure ancestry, suggests that these centers maintain genetically uniform breeding lines or landraces. While such purity is valuable for preserving specific traits, it also indicates a narrow genetic base, which can limit adaptive potential and is a common concern in structured breeding programs [76]. The intermediate and substructured nature of GaARC, GARC, and PARC reflects a more moderate history of exchange. These centers show evidence of both distinct subgroups and admixture, suggesting they possess internal diversity while also sharing ancestry with neighboring pools. The minimal differentiation among AARC, BARC, GaARC, and PARC in the PCA supports this interpretation of shared genetic background within a core group, a pattern similarly observed in regional sesame populations in South Asia [74]. The wide dispersion of GARC along PC2 further highlights significant within-population variation, another key component of overall diversity.

The Discriminant Analysis of Principal Components (DAPC) provides robust, complementary evidence of pronounced genetic structure within the sesame germplasm collection, effectively discriminating among populations based on their geographic and institutional origins. The separation of accessions into distinct clusters validates and refines the patterns identified by PCA, highlighting specific centers as drivers of major genetic divergence. The pronounced isolation of the WARC population in DAPC space, forming a well-defined and distant cluster, underscores its status as a highly divergent genetic pool. This strong differentiation is consistent with findings in other crops where specific germplasm hubs, often located in regions of secondary diversity or characterized by intensive historical collection efforts, accumulate and preserve unique alleles, leading to significant genetic distinctness [13]. Similarly, the separation and long membership vectors of the GARC population indicate not only differentiation from other groups but also considerable within-population variability. This pattern suggests that GARC’s germplasm may comprise several sub-lineages or admixed individuals, indicating collections that have incorporated materials from diverse sources, thereby increasing their value for breeding [83].

In contrast, the substantial overlap and compact clustering of accessions from AARC, BARC, GaARC, and PARC near the DAPC centroid reflect a shared genetic background and low inter-population differentiation. This genetic homogeneity among centers is indicative of common ancestral origins, historical seed exchange within a regional network, or parallel selection for similar adaptive traits, a phenomenon frequently observed in adjacent agricultural zones [79].

The DAPC result confirming K = 3 as an optimal clustering level reveals a simplified but meaningful hierarchy: one broad, variable group (Cluster 1), one cohesive, uniform group (Cluster 2), and a small, isolated lineage (Cluster 3). This tripartite structure suggests that, beyond the specific institutional origins, the entire collection is stratified into major gene pools representing (1) a diverse, admixed background, (2) a conserved, pure lineage, and (3) a unique genetic resource. The tight, isolated nature of Cluster 3 is particularly noteworthy. Such distinct, small clusters often represent rare landraces, breeding lines with unique ancestries, or relics of older varietal groups. Their preservation is critical, as they may harbor novel alleles for stress tolerance or quality traits that have been lost from mainstream breeding pools, a concern raised in assessments of genetic erosion in crops like sesame and rice [84]. Conversely, the broad dispersion of Cluster 1 aligns with the high-admixture profiles seen in WARC and parts of GARC, reinforcing their role as repositories of genetic variance.

In conclusion, the DAPC analysis captures the major axes of genetic differentiation within the collection, identifying WARC and GARC as key divergent populations and revealing a core group of genetically similar centers. The resolution into three primary genetic clusters provides a practical and powerful lens through which to manage, conserve, and utilize this germplasm. By aligning breeding strategies with this inherent genetic architecture, we can more efficiently harness the full spectrum of diversity to enhance the resilience and productivity of sesame.

5. Conclusions

This study provides a comprehensive genome-wide assessment of genetic diversity and population structure in 188 Ethiopian sesame accessions using high-quality DArTSeq-based SNP markers. Four seedlings per accession were sampled to ensure successful DNA extraction and to represent each accession, rather than to quantify within-accession genetic variation. Consequently, the analyses were designed to infer broader patterns of genetic diversity and structure among accessions and ex situ collections rather than fine-scale intra-accession variation. The large number of informative SNPs distributed across all 16 linkage groups revealed substantial genetic variation and a heterogeneous genomic landscape, confirming the robustness of the dataset for genetic analysis of accession groups. Overall, expected heterozygosity exceeded observed heterozygosity, reflecting the predominantly self-pollinating nature of sesame, while the prevalence of transition-type mutations and uneven SNP density further highlighted the intrinsic genomic characteristics of the crop.

Consistent results from STRUCTURE, PCA, DAPC, NJ cluster analysis, pairwise FST, and AMOVA demonstrated a clear but complex population structure shaped by both historical gene flow and institutional germplasm management. Certain research centers, particularly WARC and GARC, emerged as highly diverse and genetically distinct reservoirs, whereas AARC and BARC showed relatively homogeneous and narrowly based gene pools. In STRUCTURE, AARC and BARC appear highly homogeneous, with predominantly pure ancestry across multiple K values, whereas the same collections are widely dispersed in the neighbor-joining (NJ) tree. This difference reflects methodological contrasts rather than a true inconsistency. STRUCTURE emphasizes shared ancestry and between-cluster differentiation, which may obscure fine-scale variation within collections, while the NJ tree is based on pairwise genetic distances and is sensitive to subtle variation and admixture. Consequently, the dispersion of AARC and BARC accessions in the NJ tree reflects intra-collection diversity, whereas STRUCTURE highlights their shared genetic background, rendering the two approaches complementary. The AMOVA results confirmed that most genetic variation resides within populations, yet structure-based grouping explained a greater proportion of among-population variance than geographic origin alone, underscoring the importance of underlying genomic structure over simple spatial classification.

Moreover, the genetic diversity and structure of sesame collections across Ethiopian research centers revealed extensive germplasm exchange and shared curation histories, as indicated by weak genetic structure and significant genetic overlap among centers. These patterns show that institutional origin alone does not define separate genetic pools, highlighting the need for diversity-based organization of collections. The observed moderately differentiated and highly admixed groups provide a solid foundation for developing compact core sets that maximize allelic diversity while minimizing redundancy, facilitating efficient parental selection and pre-breeding. Collectively, these findings demonstrate the strategic value of divergent and admixed populations for crop improvement, supporting efficient germplasm conservation and serving as a foundation for designing structured parental panels specifically for use in association mapping or the development of multi-parent populations. Integrating this genomic information with multi-environment phenotypic data will enable the identification of trait-associated loci and accelerate marker-assisted and genomic selection for yield stability, drought tolerance, and other economically important traits, thereby enhancing sesame breeding programs and strengthening long-term food and economic security.
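One simple way to think about a core set that maximizes allelic diversity while minimizing redundancy is a greedy selection that repeatedly adds the accession contributing the most alleles not yet captured. The Python sketch below illustrates this idea under stated assumptions: it is not a method used in this study, the accession names and allele labels are hypothetical, and dedicated core-collection tools use more refined criteria.

```python
def greedy_core_set(alleles_by_accession, size):
    """Greedily pick up to `size` accessions that capture the most distinct alleles.

    alleles_by_accession: dict mapping accession name -> set of allele labels.
    Returns (chosen accession names in pick order, set of captured alleles).
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    remaining = dict(alleles_by_accession)
    captured = set()
    chosen = []
    while remaining and len(chosen) < size:
        # Accession adding the most new alleles; iterating over sorted names
        # means ties go to the alphabetically first accession.
        best = max(sorted(remaining),
                   key=lambda name: len(remaining[name] - captured))
        chosen.append(best)
        captured |= remaining.pop(best)
    return chosen, captured

# Hypothetical toy panel: allele labels are "SNP:allele"
toy = {
    "acc1": {"s1:A", "s2:C", "s3:G"},
    "acc2": {"s1:A", "s2:C", "s3:G"},  # genetic duplicate of acc1
    "acc3": {"s1:T", "s2:C", "s3:G"},
    "acc4": {"s1:A", "s2:T", "s3:A"},
}
core, alleles = greedy_core_set(toy, 3)
print(core, len(alleles))
```

In this toy panel the duplicate accession is never selected, because it adds no allele that the first pick has not already captured.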

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes17030300/s1, Table S1: List of sesame accessions with their collection research centers, Table S2: Genetic diversity parameters across chromosomes, and Table S3: Analysis of molecular variance among and within sesame accession groups according to the K-group.

Author Contributions

Conceptualization, F.B.G., A.D.N., T.M.M. and H.G.; methodology, F.B.G., A.D.N., T.M.M. and H.G.; software, F.B.G., A.D.N. and T.M.M.; validation, F.B.G., A.D.N., T.M.M., H.G. and R.M.C.; formal analysis, F.B.G. and T.M.M.; investigation, F.B.G.; resources, R.M.C. and H.G.; data curation, F.B.G.; writing—original draft preparation, F.B.G.; writing—review and editing, F.B.G., A.D.N., T.M.M., H.G. and R.M.C.; visualization, F.B.G. and T.M.M.; supervision, A.D.N., T.M.M. and H.G.; project administration, A.D.N., T.M.M., H.G. and R.M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research work and article processing charges were supported through a scholarship awarded to Feyisa Bejiga Gelashe by the Centre of Excellence in Agri-Food Systems and Nutrition (CE-AFSN) of Eduardo Mondlane University, Mozambique, under the World Bank—African Center of Excellence (ACE) II Funded Project grant number E089-MZ and International Maize and Wheat Improvement Center (CIMMYT)—Vision for Adapted Crops & Soils (VACS) Capacity Building Project Initiative.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The study was part of the M.Sc. research work of the first author (F.B.G.), and he is grateful to Jimma University for giving him study leave. He would also like to express his sincere gratitude to the Centre of Excellence in Agri-Food Systems and Nutrition (CE-AFSN) for providing a supportive and enabling research environment during his M.Sc. studies at Eduardo Mondlane University. The authors also acknowledge the Assosa, Bako, Gambella, Gondar, Pawe, and Werer Agricultural Research Centers in Ethiopia for generously providing the sesame accessions and improved varieties used in this study, as well as the SEQART AFRICA genotyping platform for its high-quality genotyping services. During the preparation of this manuscript, GenAI (Gemini) tools were employed to enhance the language and correct grammatical errors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AARC	Assosa Agricultural Research Center
BARC	Bako Agricultural Research Center
GaARC	Gambella Agricultural Research Center
GARC	Gondar Agricultural Research Center
PARC	Pawe Agricultural Research Center
WARC	Werer Agricultural Research Center
Ho	Observed heterozygosity
He	Expected heterozygosity
Ht	Overall gene diversity
Dst	Gene diversity among samples
Dst	Locus-Specific Differentiation
Fis	Inbreeding Coefficient
H′	Shannon diversity index
Ng	Number of genotypes
Pa	Private alleles
Ps	Private SNPs
Ad	Allelic diversity
Nd	Nucleotide diversity
He′	Gene diversity
Ho′	Heterozygosity
Fst	Fixation index across the population
Df	Degrees of freedom
SS	Sum of squares
MS	Mean square
EV	Estimated variance
PV	Percentage variance
PhiPT	Genetic differentiation among populations
Phi’PT	Standardized genetic differentiation
PhiPT max	Maximum possible population differentiation

References

Gormley, I.C.; Bedigian, D.; Olmstead, R.G. Phylogeny of pedaliaceae and martyniaceae and the placement of Trapella in Plantaginaceae s. l. Syst. Bot. 2015, 40, 259–268. [Google Scholar] [CrossRef]
Bedigian, D. Systematics and evolution in Sesamum L. (Pedaliaceae), part 1: Evidence regarding the origin of sesame and its closest relatives. Webbia 2015, 70, 1–42. [Google Scholar] [CrossRef]
De Candolle, A. Origin of Cultivated Plants; Kegan Paul, French Co.: London, UK, 1884; Reprinted by Noble Offset Printers: New York, NY, USA, 1959. [Google Scholar]
Vavilov, N.I. Studies on the origin of cultivated plants. Bull. Appl. Bot. 1926, 16, 1–248, (In Russian, English Summary). [Google Scholar]
Watt, G. Dictionary of the Economic Products of India, Govt; India Central Printing Office: Calcutta, India, 1893; Volume 6, pp. 502–541. [Google Scholar]
Angamuthu, M.; Govindasamy, S.; Kasirajan, S.; Langyan, S.; Rangan, P.; Pradheep, K. Origin and History of Sesame and Its Uses; Springer: Berlin/Heidelberg, Germany, 2025; pp. 1–19. [Google Scholar] [CrossRef]
Miao, H.; Wang, L.; Qu, L.; Liu, H.; Sun, Y.; Le, M.; Wang, Q.; Wei, S.; Zheng, Y.; Lin, W.; et al. Genomic evolution and insights into agronomic trait innovations of Sesamum species. Plant Commun. 2024, 5, 100729. [Google Scholar] [CrossRef]
Kobayashi, T. Cytogenetics of sesame (Sesamum indicum L.). In Chromosome Engineering in Plants: Genetics, Breeding, Evolution. Part B; Tsuchiya, T., Gupta, P.K., Eds.; Elsevier: Amsterdam, The Netherlands, 1991; pp. 581–592. [Google Scholar]
Teklu, D.H.; Shimelis, H.; Tesfaye, A.; Mashilo, J.; Zhang, X.; Zhang, Y.; Dossa, K.; Shayanowako, A.I.T. Genetic variability and population structure of Ethiopian sesame (Sesamum indicum L.) germplasm assessed through phenotypic traits and simple sequence repeats markers. Plants 2021, 10, 1129. [Google Scholar] [CrossRef]
Wei, P.; Zhao, F.; Wang, Z.; Wang, Q.; Chai, X.; Hou, G.; Meng, Q. Sesame (Sesamum indicum L.): A Comprehensive Review of Nutritional Value, Phytochemical Composition, Health Benefits, Development of Food, and Industrial Applications. Nutrients 2022, 14, 4079. [Google Scholar] [CrossRef]
Food and Agriculture Organization Corporate Statistical Database (FAOSTAT). Crops and Livestock Products. 2024. Available online: https://www.fao.org/faostat/en/#data/QCL (accessed on 1 December 2025).
Pathak, N.; Rai, A.K.; Kumari, R.; Thapa, A.; Bhat, K.V. Sesame Crop: An Underexploited Oilseed Holds Tremendous Potential for Enhanced Food Value. Agric. Sci. 2014, 5, 519–529. [Google Scholar] [CrossRef]
Wang, L.; Yu, S.; Tong, C.; Zhao, Y.; Liu, Y.; Song, C.; Zhang, Y.; Zhang, X.; Wang, Y.; Hua, W.; et al. Genome Sequencing of the High Oil Crop Sesame Provides Insight into Oil Biosynthesis. Genome Biol. 2014, 15, R39. [Google Scholar] [CrossRef] [PubMed]
Maurya, R.; Singh, S.; Babu, Y.S.; Khan, F.N.; Nawade, B.; Vishwakarma, H.; Kumar, A.; Yadav, R.; Jalli, R.; Angamuthu, M.; et al. Molecular Diversity Studies and Core Development in Sesame Germplasm (Sesamum indicum L.) Using SSR Markers. Plant Mol. Biol. Rep. 2025, 43, 180–196. [Google Scholar] [CrossRef]
Gildemacher, P.; Audet-Bélanger, G.; Mangnus, E.; Van De Pol, F.; Tiombiano, D.; Sanogo, K. Sesame Sector Development Lessons Learned in Burkina Faso and Mali; Royal Tropical Institute (KIT): Amsterdam, The Netherlands, 2015. [Google Scholar]
Abate, M.; Mekbib, F. Assessment of Genetic Diversity in Ethiopian Sesame (Sesamum indicum L.) Germplasm using Random Amplified Polymorphic DNA (RAPD) Markers. J. Adv. Agric. 2015, 5, 639–649. [Google Scholar] [CrossRef]
Muthoni, J.; Shimelis, H. Production of minor tropical oil crops in Africa: Case of sesame (Sesamum indicum L.). Aust. J. Crop Sci. 2025, 19, 816–829. [Google Scholar] [CrossRef]
Girma, T.K.; Worku, Y.; Bachewe, F.; Asnake, W.; Abate, G. Scoping study on Ethiopian sesame value chain. In Rethinking Food Markets; International Food Policy Research Institute: Washington, DC, USA, 2022. [Google Scholar]
Weldemichael, M.Y.; Gebremedhn, H.M. Research advances and prospects of molecular markers in sesame: A review. Plant Biotechnol. Rep. 2023, 17, 585–603. [Google Scholar] [CrossRef]
Gulhan, E.A.; Taskin, M.; Turgut, K. Analysis of Genetic Diversity in Turkish Sesame (Sesamum indicum L.) Populations Using RAPD Markers. Genet. Resour. Crop Evol. 2004, 51, 599–607. [Google Scholar] [CrossRef]
Laurentin, H.; Karlovsky, P. AFLP fingerprinting of sesame (Sesamum indicum L.) cultivars: Identification, genetic relationship and comparison of AFLP informativeness parameters. Genet. Resour. Crop Evol. 2007, 54, 1437–1446. [Google Scholar] [CrossRef]
Laurentin, H.; Ratzinger, A.; Karlovsky, P. Relationship between metabolic and genomic diversity in sesame (Sesamum indicum L.). BMC Genom. 2008, 9, 250. [Google Scholar] [CrossRef]
Pham, T.D.; Bui, T.M.; Werlemark, G.; Bui, T.C.; Merker, A.; Carlsson, A. A study of genetic diversity of sesame (Sesamum indicum L.) in Vietnam and Cambodia estimated by RAPD markers. Genet. Resour. Crop Evol. 2009, 56, 679–690. [Google Scholar] [CrossRef]
Ferdinandez, Y.S.N.; Somers, D.J.; Coulman, B.E. Estimating the genetic relationship of hybrid bromegrass to smooth bromegrass and meadow bromegrass using RAPD markers. Plant Breed. 2001, 120, 149–153. [Google Scholar] [CrossRef]
Govindaraj, M.; Vetriventhan, M.; Srinivasan, M. Importance of genetic diversity assessment in crop plants and its recent advances: An overview of its analytical perspectives. Genet. Res. Int. 2015, 2015, 431487. [Google Scholar] [CrossRef]
Akbar, F.; Rabbani, M.A.; Shahid Masood, M.; Shinwari, Z.K. Genetic diversity of sesame (Sesamum indicum L.) germplasm from Pakistan using RAPD markers. Pak. J. Bot. 2011, 43, 2153–2160. [Google Scholar]
Nyongesa, B.O.; Were, B.A.; Gudu, S.; Dangasuk, O.G.; Onkware, A.O. Genetic diversity in cultivated sesame (Sesamum indicum L.) and related wild species in East Africa. J. Crop Sci. Biotechnol. 2013, 16, 9–15. [Google Scholar] [CrossRef]
Cho, Y.I.; Park, J.H.; Lee, C.W.; Ra, W.H.; Chung, J.W.; Lee, J.R.; Ma, K.H.; Lee, S.Y.; Lee, K.S.; Lee, M.C.; et al. Evaluation of the genetic diversity and population structure of sesame (Sesamum indicum L.) using microsatellite markers. Genes Genom. 2011, 33, 187–195. [Google Scholar] [CrossRef]
Furat, S.; Uzun, B. The use of agro-morphological characters for the assessment of genetic diversity in sesame (Sesamum indicum L.). Plant Omics 2010, 3, 85–91. [Google Scholar]
Sharma, S.N.; Kumar, V.; Mathur, S. Comparative Analysis of RAPD and ISSR Markers for Characterization of Sesame (Sesamum indicum L.) Genotypes. J. Plant Biochem. Biotechnol. 2009, 18, 37–43. [Google Scholar] [CrossRef]
Tabatabaei, I.; Pazouki, L.; Bihamta, M.R.; Mansoori, S.; Javaran, J.; Niinemets, Ü. Genetic variation among Iranian sesame (Sesamum indicum L.) accessions vis-à-vis exotic genotypes on the basis of morpho-physiological traits and RAPD markers. Aust. J. Crop Sci. 2011, 5, 1396–1407. [Google Scholar]
Uzun, B.; Çaǧ, M.I. Identification of molecular markers linked to determinate growth habit in sesame. Euphytica 2009, 166, 379–384. [Google Scholar] [CrossRef]
Zhang, H.; Wei, L.; Miao, H.; Zhang, T.; Wang, C. Development and validation of genic-SSR markers in sesame by RNA-seq. BMC Genom. 2012, 13, 316. [Google Scholar] [CrossRef]
Kumar, H.; Kaur, G.; Banga, S. Molecular Characterization and Assessment of Genetic Diversity in Sesame (Sesamum indicum L.) Germplasm Collection Using ISSR Markers. J. Crop Improv. 2012, 26, 540–557. [Google Scholar] [CrossRef]
Gebremichael, D.E.; Parzies, H.K. Genetic variability among landraces of sesame in Ethiopia. Afr. Crop Sci. J. 2010, 19, 1–13. [Google Scholar] [CrossRef]
Laurentin, H.E.; Karlovsky, P. Genetic relationship and diversity in a sesame (Sesamum indicum L.) germplasm collection using amplified fragment length polymorphism (AFLP). BMC Genet. 2006, 7, 10. [Google Scholar] [CrossRef] [PubMed]
Zhang, Y.X.; Zhang, X.R.; Hua, W.; Wang, L.H.; Che, Z. Analysis of genetic diversity among indigenous landraces from sesame (Sesamum indicum L.) core collection in China as revealed by SRAP and SSR markers. Genes Genom. 2010, 32, 207–215. [Google Scholar] [CrossRef]
Zhang, Y.; Zhang, X.; Che, Z.; Wang, L.; Wei, W.; Li, D. Genetic diversity assessment of sesame core collection in China by phenotype and molecular markers and extraction of a mini-core collection. BMC Genet. 2012, 13, 102. [Google Scholar] [CrossRef]
Wu, K.; Yang, M.; Liu, H.; Tao, Y.; Mei, J.; Zhao, Y. Genetic analysis and molecular characterization of Chinese sesame (Sesamum indicum L.) cultivars using Insertion-Deletion (InDel) and Simple Sequence Repeat (SSR) markers. BMC Genet. 2014, 15, 35. [Google Scholar] [CrossRef] [PubMed]
Antoni, R. Applications of Single-Nucleotide Polymorphisms in Crop Genetics. Curr. Opin. Plant Biol. 2002, 5, 94–100. [Google Scholar] [CrossRef]
Basak, M.; Uzun, B.; Yol, E. Genetic diversity and population structure of the Mediterranean sesame core collection with use of genome-wide SNPs developed by double digest RAD-Seq. PLoS ONE 2019, 14, e0223757. [Google Scholar] [CrossRef] [PubMed]
Elshire, R.J.; Glaubitz, J.C.; Sun, Q.; Poland, J.A.; Kawamoto, K.; Buckler, E.S.; Mitchell, S.E. A robust, simple genotyping-by-sequencing (GBS) approach for high-diversity species. PLoS ONE 2011, 6, e19379. [Google Scholar] [CrossRef]
Bajgain, P.; Rouse, M.N.; Anderson, J.A. Comparing genotyping-by-sequencing and single nucleotide polymorphism chip genotyping for quantitative trait loci mapping in wheat. Crop Sci. 2016, 56, 232–248. [Google Scholar] [CrossRef]
Xiong, H.; Shi, A.; Mou, B.; Qin, J.; Motes, D.; Lu, W.; Ma, J.; Weng, Y.; Yang, W.; Wu, D. Genetic diversity and population structure of cowpea (Vigna unguiculata L. Walp). PLoS ONE 2016, 11, e0160941. [Google Scholar] [CrossRef]
Fatokun, C.A.; Tarawali, S.A.; Singh, B.B.; Kormawa, P.M.; Tamò, M. Challenges and Opportunities for Enhancing Sustainable Cowpea Production; International Institute of Tropical Agriculture (IITA): Ibadan, Nigeria, 2018. [Google Scholar]
Kilian, A.; Wenzl, P.; Huttner, E.; Carling, J.; Xia, L.; Blois, H.; Caig, V.; Heller-Uszynska, K.; Jaccoud, D.; Hopper, C.; et al. Diversity arrays technology: A generic genome profiling technology on open platforms. Methods Mol. Biol. 2012, 888, 67–89. [Google Scholar] [CrossRef]
Gruber, B.; Unmack, P.J.; Berry, O.F.; Georges, A. Dartr: An R package to facilitate analysis of SNP data generated from reduced representation genome sequencing. Mol. Ecol. Resour. 2018, 18, 691–699. [Google Scholar] [CrossRef]
Danecek, P.; Auton, A.; Abecasis, G.; Albers, C.A.; Banks, E.; DePristo, M.A.; Handsaker, R.E.; Lunter, G.; Marth, G.T.; Sherry, S.T.; et al. The variant call format and VCF-tools. Bioinformatics 2011, 27, 2156–2158. [Google Scholar] [CrossRef]
Bradbury, P.J.; Zhang, Z.; Kroon, D.E.; Casstevens, T.M.; Ramdoss, Y.; Buckler, E.S. TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 2007, 23, 2633–2635. [Google Scholar] [CrossRef]
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021. [Google Scholar]
Jombart, T.; Ahmed, I. Adegenet 1.3-1: New tools for the analysis of genome-wide SNP data. Bioinformatics 2011, 27, 3070–3071. [Google Scholar] [CrossRef]
Evanno, G.; Regnaut, S.; Goudet, J. Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study. Mol. Ecol. 2005, 14, 2611–2620. [Google Scholar] [CrossRef] [PubMed]
Letunic, I.; Bork, P. Interactive tree of life (iTOL) v5: An online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021, 49, W293–W296. [Google Scholar] [CrossRef]
Porras-Hurtado, L.; Ruiz, Y.; Santos, C.; Phillips, C.; Carracedo, Á.; Lareu, M.V. An overview of STRUCTURE: Applications, parameter settings, and supporting software. Front. Genet. 2013, 4, 98. [Google Scholar] [CrossRef]
Jombart, T.; Collins, C. A Tutorial for Discriminant Analysis of Principal Components (DAPC) Using Adegenet 2.1.6; Imperial College: London, UK, 2022. [Google Scholar]
Neath, A.A.; Cavanaugh, J.E. The Bayesian information criterion: Background, derivation, and applications. WIREs Comput. Stat. 2012, 4, 199–203. [Google Scholar] [CrossRef]
Salazar, E.; González, M.; Araya, C.; Mejía, N.; Carrasco, B. Genetic diversity and intra-racial structure of Chilean Choclero corn (Zea mays L.) germplasm revealed by simple sequence repeat markers (SSRs). Sci. Hortic. 2017, 225, 620–629. [Google Scholar] [CrossRef]
Jombart, T. Adegenet: A R package for the multivariate analysis of genetic markers. Bioinformatics 2008, 24, 1403–1405. [Google Scholar] [CrossRef] [PubMed]
Jombart, T.; Devillard, S.; Balloux, F. Discriminant analysis of principal components: A new method for the analysis of genetically structured populations. BMC Genet. 2010, 11, 94. [Google Scholar] [CrossRef]
Excoffier, L.; Smouse, P.E.; Quattro, J.M. Analysis of Molecular Variance Inferred from Metric Distances Among DNA Haplotypes: Application to Human Mitochondrial DNA Restriction Data. Genetics 1992, 131, 479–491. [Google Scholar] [CrossRef]
Kamvar, Z.N.; Tabima, J.F.; Grünwald, N.J. Poppr: An R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ 2014, 2014, e281. [Google Scholar] [CrossRef]
Kamvar, Z.N.; Brooks, J.C.; Grünwald, N.J. Novel R tools for analysis of genome-wide population genetic data with emphasis on clonality. Front. Genet. 2015, 6, 208.
Nei, M. Genetic distance between populations. Am. Nat. 1972, 106, 283–292.
Hamrick, J.L.; Godt, M.W. Effects of life history traits on genetic diversity in plant species. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 1996, 351, 1291–1298.
Nordborg, M. Linkage disequilibrium, gene trees and selfing: An ancestral recombination graph with partial self-fertilization. Genetics 2000, 154, 923–929.
Van Hintum, T.J.; Brown, A.H.; Spillane, C. Core Collections of Plant Genetic Resources; Bioversity International: Rome, Italy, 2000.
Dossa, K.; Diouf, D.; Wang, L.; Wei, X.; Zhang, Y.; Niang, M.; Fonceka, D.; Yu, J.; Mmadi, M.A.; Yehouessi, L.W.; et al. The emerging oilseed crop Sesamum indicum enters the ‘Omics’ era. Front. Plant Sci. 2017, 8, 1154.
Wang, L.; Xia, Q.; Zhang, Y.; Zhu, X.; Zhu, X.; Li, D.; Ni, X.; Gao, Y.; Xiang, H.; Wei, X.; et al. Updated sesame genome assembly and fine mapping of plant height and seed coat color QTLs using a new high-density genetic map. BMC Genom. 2016, 17, 31.
Wei, X.; Wang, L.; Zhang, Y.; Qi, X.; Wang, X.; Ding, X.; Zhang, J.; Zhang, X. Development of simple sequence repeat (SSR) markers of sesame (Sesamum indicum L.) from a genome survey. Molecules 2014, 19, 5150–5162.
Dossa, K.; Wei, X.; Zhang, Y.; Fonceka, D.; Yang, W.; Diouf, D.; Liao, B.; Cissé, N.; Zhang, X. Analysis of genetic diversity and population structure of sesame accessions from Africa and Asia as major centers of its cultivation. Genes 2016, 7, 14.
Dossa, K.; Li, D.; Zhou, R.; Yu, J.; Wang, L.; Zhang, Y.; You, J.; Liu, A.; Mmadi, M.A.; Fonceka, D.; et al. The genetic basis of drought tolerance in the high oil crop Sesamum indicum. Plant Biotechnol. J. 2019, 17, 1788–1803.
Cui, C.; Liu, Y.; Liu, Y.; Cui, X.; Sun, Z.; Du, Z.; Wu, K.; Jiang, X.; Mei, H.; Zheng, Y. Genome-wide association study of seed coat color in sesame (Sesamum indicum L.). PLoS ONE 2021, 16, e0251526.
Rosenberg, N.A.; Pritchard, J.K.; Weber, J.L.; Cann, H.M.; Kidd, K.K.; Zhivotovsky, L.A.; Feldman, M.W. Genetic structure of human populations. Science 2002, 298, 2381–2385.
Dixit, A.; Jin, M.H.; Chung, J.W.; Yu, J.W.; Chung, H.K.; Ma, K.H.; Park, Y.J.; Cho, E.G. Development of polymorphic microsatellite markers in sesame (Sesamum indicum L.). Mol. Ecol. Notes 2005, 5, 736–738.
Zhang, H.; Miao, H.; Wang, L.; Qu, L.; Liu, H.; Wang, Q.; Yue, M. Genome sequencing of the important oilseed crop Sesamum indicum L. Genome Biol. 2013, 14, 401.
Allendorf, F.W.; Luikart, G.; Aitken, S.N. Conservation and the Genetics of Populations, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2013.
Hajjar, R.; Hodgkin, T. The use of wild relatives in crop improvement: A survey of developments over the last 20 years. Euphytica 2007, 156, 1–13.
Gouesnard, B.; Diaw, Y.; Gay, L.; Ronfort, J.; David, J. Seed management and selection in ancient maize landraces from the French Pyrenees: Ethnobotanical survey and selection experiment. Genet. Resour. Crop Evol. 2025, 72, 10375–10395.
Morrell, P.L.; Buckler, E.S.; Ross-Ibarra, J. Crop genomics: Advances and applications. Nat. Rev. Genet. 2011, 13, 85–96.
Eltaher, S.; Sallam, A.; Belamkar, V.; Emara, H.A.; Nower, A.A.; Salem, K.F.M.; Poland, J.; Baenziger, P.S. Genetic diversity and population structure of F3:6 Nebraska Winter wheat genotypes using genotyping-by-sequencing. Front. Genet. 2018, 9, 76.
Pritchard, J.K.; Stephens, M.; Donnelly, P. Inference of Population Structure Using Multilocus Genotype Data. Genetics 2000, 155, 945–959.
Olukolu, B.A.; Mayes, S.; Stadler, F.; Ng, N.Q.; Fawole, I.; Dominique, D.; Azam-Ali, S.N.; Abbott, A.G.; Kole, C. Genetic diversity in Bambara groundnut (Vigna subterranea (L.) Verdc.) as revealed by phenotypic descriptors and DArT marker analysis. Genet. Resour. Crop Evol. 2012, 59, 347–358.
Van De Wouw, M.; Kik, C.; Van Hintum, T.; Van Treuren, R.; Visser, B. Genetic erosion in crops: Concept, research results and challenges. Plant Genet. Resour. Charact. Util. 2010, 8, 1–15.
Khazaei, M.R.; Ahmadi, S.; Saghafian, B.; Zahabiyoun, B. A new daily weather generator to preserve extremes and low-frequency variability. Clim. Change 2013, 119, 631–645.

Figure 1. A map showing the collection areas of sesame accessions across Ethiopian research centers.

Figure 2. SNP marker summary. (A) Distribution of SNP markers within 1 Mb windows across sixteen linkage groups, (B) number and density of SNP markers in each linkage group, and (C) SNP mutation types identified among 5163 SNP markers used in analysis of sesame accessions.

Figure 3. Cluster analysis using DArTseq SNP markers to visualize genetic relationships among 188 sesame accessions. (A) Neighbor-joining (NJ) tree; (B) network of accessions; and (C) network of accessions sourced from institutes. Colors indicate the source research centers.

Figure 4. Pairwise genetic differentiation index (FST) values among the sesame collections of the research centers (some numbers are shown in red only to contrast with the background for legibility).

Figure 5. Principal component analysis showing clustering among the geographical regions.

Figure 6. DAPC analysis showing the relationship between the geographic regions, with each color representing one cluster.

Figure 7. Genetic networks for all genetic groups, with node sizes indicating genetic relationships between different accessions.

Figure 8. Collection diversity: (A) estimated collection diversity of sesame accessions assessed by STRUCTURE at k = 5, where each color represents one cluster, and (B) collection diversity generated by the ADMIXTURE model among 188 sesame accessions (k = 2, top, to k = 6, bottom). Each vertical bar represents an accession, partitioned into up to k colored segments. Red represents Pop_1, light green Pop_2, green Pop_3, light blue Pop_4, pink Pop_5, and blue-green Pop_6.

Table 1. Genetic parameter estimates based on 5163 SNPs among sesame subpopulations.

Populations	Ng	Pa	Ps	Ad	Nd	Ho′	He′	Fst
Based on Research Center Collections
AARC	21	0	0	1.404	0.2048	0.131	0.102	−0.123
BARC	28	0	0	1.743	0.3921	0.166	0.197	0.138
GaARC	12	113	20	1.930	0.6477	0.223	0.325	0.291
GARC	35	6	1	1.480	0.2385	0.138	0.119	−0.066
PARC	21	152	24	1.882	0.5644	0.292	0.282	0.002
WARC	71	0	0	1.741	0.3726	0.207	0.185	−0.065
Mean		45.2	7.5	1.7	0.4	0.2	0.2	0.03
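
The Mean row of Table 1 can be checked with a short sketch. It assumes the means are unweighted averages across the six research centers (not weighted by Ng); the names `TABLE1`, `COLUMNS`, and `column_means` are illustrative and do not come from the article.

```python
# Per-center rows transcribed from Table 1: (Pa, Ps, Ad, Nd, Ho', He', Fst).
COLUMNS = ["Pa", "Ps", "Ad", "Nd", "Ho'", "He'", "Fst"]
TABLE1 = {
    "AARC":  (0,   0,  1.404, 0.2048, 0.131, 0.102, -0.123),
    "BARC":  (0,   0,  1.743, 0.3921, 0.166, 0.197,  0.138),
    "GaARC": (113, 20, 1.930, 0.6477, 0.223, 0.325,  0.291),
    "GARC":  (6,   1,  1.480, 0.2385, 0.138, 0.119, -0.066),
    "PARC":  (152, 24, 1.882, 0.5644, 0.292, 0.282,  0.002),
    "WARC":  (0,   0,  1.741, 0.3726, 0.207, 0.185, -0.065),
}


def column_means(table):
    """Unweighted mean of each column across research centers (assumption)."""
    n = len(table)
    return {
        name: sum(row[i] for row in table.values()) / n
        for i, name in enumerate(COLUMNS)
    }


means = column_means(TABLE1)
for name, value in means.items():
    print(f"{name}: {value:.4f}")
```

Under this assumption, the computed means round to the values printed in the Mean row (45.2, 7.5, 1.7, 0.4, 0.2, 0.2, and 0.03).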

Table 2. Membership of each predefined accession group in each of the clusters obtained at the best k (k = 5). Values are numbers of accessions; the bottom row gives percentages of all 188 accessions.

Research Center Collections	Number of Accessions	Admixed Individuals	Cluster I	Cluster II	Cluster III	Cluster IV	Cluster V
STRUCTURE
AARC	21	0	21	0	0	0	0
BARC	28	2	26	0	0	0	0
GaARC	12	2	8	0	0	2	0
GARC	36	13	5	11	3	4	0
PARC	21	7	14	0	0	0	0
WARC	70	35	0	0	0	17	19
Total	188	31%	39%	6%	2%	12%	10%
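
The percentages in the bottom row of Table 2 can be recomputed from the per-center counts. The sketch below is a minimal check, assuming each percentage is a column total divided by all 188 accessions; the names `TABLE2`, `LABELS`, and `mismatched` are illustrative.

```python
# Rows transcribed from Table 2:
# center: (number of accessions, admixed, Cluster I, II, III, IV, V)
TABLE2 = {
    "AARC":  (21, 0,  21, 0,  0, 0,  0),
    "BARC":  (28, 2,  26, 0,  0, 0,  0),
    "GaARC": (12, 2,  8,  0,  0, 2,  0),
    "GARC":  (36, 13, 5,  11, 3, 4,  0),
    "PARC":  (21, 7,  14, 0,  0, 0,  0),
    "WARC":  (70, 35, 0,  0,  0, 17, 19),
}
TOTAL = 188  # total number of accessions stated in the table
LABELS = ["Admixed", "I", "II", "III", "IV", "V"]

# Column totals for the admixed group and the five clusters.
column_totals = [
    sum(row[i + 1] for row in TABLE2.values()) for i in range(len(LABELS))
]
# Assumption: percentages are taken over all 188 accessions.
percentages = [round(100 * total / TOTAL) for total in column_totals]

# Centers whose category counts do not add up to the stated collection size.
mismatched = [c for c, row in TABLE2.items() if sum(row[1:]) != row[0]]

print(dict(zip(LABELS, percentages)))
print("Row sums differing from stated size:", mismatched)
```

Under this assumption the column totals reproduce the printed percentages (31, 39, 6, 2, 12, and 10). The check also shows that the WARC row as printed sums to 71 accessions against a stated 70; the sketch only reports this and does not attempt to correct it.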

Table 3. Analysis of molecular variance among and within sesame accession groups.

Source of Variation	Df	SS	MS	EV	PV	PhiPT	PhiPT Max	Phi’PT	p-Value
Among accession groups	5	46,887.540	9377.508	298.304	30%
Within accession groups	182	127,753.864	701.944	701.944	70%
Total	187	174,641.404		1000.248	100%	0.298	0.730	0.409	0.001
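
The entries of Table 3 are linked by simple arithmetic, which the following sketch walks through. It assumes the standard one-way AMOVA variance-component estimator with an effective group size n0, and it assumes the group sizes listed in Table 2 were the ones used; the article does not state either, and all variable names are illustrative.

```python
# Values transcribed from Table 3.
SS_AMONG, DF_AMONG = 46887.540, 5
SS_WITHIN, DF_WITHIN = 127753.864, 182
PHI_PT_MAX = 0.730
# Assumption: group sizes as listed in Table 2 (AARC, BARC, GaARC, GARC, PARC, WARC).
GROUP_SIZES = [21, 28, 12, 36, 21, 70]

# Mean squares: sum of squares divided by degrees of freedom.
ms_among = SS_AMONG / DF_AMONG
ms_within = SS_WITHIN / DF_WITHIN

# Effective group size for an unbalanced one-way design.
n_total = sum(GROUP_SIZES)
n0 = (n_total - sum(n * n for n in GROUP_SIZES) / n_total) / (len(GROUP_SIZES) - 1)

# Estimated variance components (EV column).
ev_within = ms_within
ev_among = (ms_among - ms_within) / n0

# PhiPT is the among-group share of total variance; Phi'PT rescales it by its maximum.
phi_pt = ev_among / (ev_among + ev_within)
phi_pt_std = phi_pt / PHI_PT_MAX

print(f"MS among = {ms_among:.3f}, MS within = {ms_within:.3f}")
print(f"n0 = {n0:.3f}, EV among = {ev_among:.3f}")
print(f"PhiPT = {phi_pt:.3f}, Phi'PT = {phi_pt_std:.3f}")
```

Under these assumptions the sketch recovers the tabulated mean squares, an among-group variance component of about 298.30, PhiPT of about 0.298 (the 30% among-group share), and a standardized Phi'PT of about 0.409.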
