Genetic Characterization and Population Structure of Mozambique’s Sesame (Sesamum indicum L.) Accessions Using DArTseq-Derived SNP Markers

Muteti, Winfred Nthamo; Chiulele, Rogerio Marcos; Abincha, Wilfred

doi:10.3390/genes17050528


by Winfred Nthamo Muteti ^1,2, Rogerio Marcos Chiulele ^1,2,* and Wilfred Abincha ^3

^1 Department of Crop Production, Faculty of Agronomy and Forest Engineering, Eduardo Mondlane University, 3453 Avenida Julius Nyerere, Maputo P.O. Box 257, Mozambique

^2 Centre of Excellence in Agri-Food Systems and Nutrition (CE-AFSN), Eduardo Mondlane University, 5 Andar Edificio da Reitorio, Praca 25 de Junho, Maputo P.O. Box 257, Mozambique

^3 Department of Research and Development, Kentegra Biotechnology Holdings LLC, Nairobi P.O. Box 566-00502, Kenya

^* Author to whom correspondence should be addressed.

Genes 2026, 17(5), 528; https://doi.org/10.3390/genes17050528

Submission received: 6 April 2026 / Revised: 23 April 2026 / Accepted: 27 April 2026 / Published: 29 April 2026

(This article belongs to the Special Issue 5Gs in Crop Genetic and Genomic Improvement: 2025–2026)


Abstract

Background/Objective: Sesame (Sesamum indicum L.) is a nutritionally and economically important oilseed crop that is grown predominantly by smallholder farmers in Mozambique. However, its breeding process is constrained by a limited understanding of the genetic diversity in sesame germplasm. Therefore, this study determined the genetic diversity and population structure of a panel of 109 sesame accessions from Instituto de Investigação Agrária de Mocambique (IIAM) using DArTseq SNPs. Methods: The generated 14,763 SNPs were filtered, retaining 11,502 high-quality SNPs for this study. Results: Overall genetic diversity was moderate (mean He = 0.30, Ho = 0.30, MAF = 0.21, PIC = 0.25). Population structure analysis using sparse non-negative matrix factorization identified eight subpopulations, consistent with principal component analysis implemented via the Latent factor mixed model. Discriminant analysis of principal components (DAPC) and Ward’s hierarchical clustering based on Nei’s distance resolved the same eight clusters, although DAPC revealed overlap among clusters, consistent with extensive admixture. Analysis of molecular variance showed that 85.85% of total molecular variation was within subpopulations and 14.15% among the subpopulations. Pairwise fixation indices (ranging from 0.02 to 0.10) identified divergent subpopulations 7 and 1 as suitable candidates for hybridization. Within subpopulations, observed heterozygosity exceeded expected heterozygosity, likely reflecting residual heterozygosity in sesame landraces, admixture, reverse Wahlund effect and scoring of paralogs as heterozygous SNPs. Conclusions: Overall, this study provided insights into sesame’s genetic diversity in Mozambique, contributing to germplasm conservation and informed parental selection.

Keywords:

Sesamum indicum; DArTseq; SNPs; population structure; genetic diversity; next-generation sequencing (NGS); genotyping-by-sequencing (GBS); Mozambique; cluster analysis; DAPC

1. Introduction

Sesame (Sesamum indicum L.) is one of the world’s oldest cultivated oilseed crops, with a diploid genome (2n = 26) and a cultivation history of more than 5000 years [1]. The crop belongs to the family Pedaliaceae and is grown predominantly in tropical and subtropical regions [2]. Sesame seed is valued for its high oil content (50–60%) and 18–25% protein content [3]. The unique lignans, such as sesamin, sesamolin, and sesamol, contribute to high oxidative stability and provide pharmaceutical benefits, including anti-hypertensive and anti-cancer effects [4]. Globally, sesame is a source of income for millions of smallholder farmers in arid and semi-arid regions, with global annual production exceeding six million metric tons. In recent years, sub-Saharan Africa has emerged as a major producer, accounting for approximately 60% of global sesame production [5].

In Mozambique, sesame, locally known as Gergelim, is the country’s fourth most valuable agricultural export, after sugar, pigeon peas, and tobacco [6]. The export quantity has grown consistently over the last five years, from 58,784 tonnes in 2019 to 128,433 tonnes in 2023 [5,7]. Sesame is primarily grown in the northern and central regions, including Nampula, Cabo Delgado, Zambézia, Tete, and Sofala, with over 588,000 producers [6]. However, the average yield obtained by small-scale farmers in Mozambique is low (541 kg/ha) relative to a potential of 1500 kg/ha [8]. The low yields can be attributed to abiotic and biotic stresses, seed recycling, poor agronomic practices, and varietal limitations [9,10]. To bridge the productivity gap, understanding and exploiting sesame’s genetic diversity is essential for improvement programs.

Genetic diversity can be assessed through morphological, biochemical, and molecular markers. While morphological markers provide preliminary information, molecular markers report environmentally independent genomic variation at higher resolution [11]. Different molecular markers have been used to assess genetic diversity in sesame, including RAPDs [12,13], SSRs [14,15], AFLPs [16,17], and SRAPs [18,19]. However, the advancement of next-generation sequencing (NGS) has revolutionized the study of sesame genetic resources, making SNPs the preferred marker choice in genetic diversity studies due to their abundance, precision, and genome-wide distribution [20].

NGS applications in sesame include whole-genome resequencing (WGS), used to detect high-quality SNPs in Saudi and exotic germplasm [21]; genotyping-by-sequencing (GBS), used by [22] to characterize 501 accessions in the USDA collection; and diversity array technology sequencing (DArTseq), applied to evaluate 300 Ethiopian accessions [23]. DArTseq digests genomic DNA with restriction enzymes and then uses high-throughput sequencing to identify markers. This method produces dominant and co-dominant markers (silicoDArT and SNPs, respectively) [24]. DArTseq is a widely used genotyping-by-sequencing platform due to its efficiency, low cost, and speed [25]. DArT technology has been applied to mustard [26], soybean [27], and sweet potato [28], among other crops. However, there are few studies on its application in sesame, and in particular on Mozambican sesame germplasm.

The present study was therefore designed to assess genetic diversity and the population structure of a diverse panel of sesame accessions from Mozambique using SNP markers generated by DArTseq. The findings provide a foundation for germplasm conservation and informed parental selection.

2. Materials and Methods

2.1. Plant Materials

A panel of 109 sesame accessions, comprising landraces and cultivars, was sourced from Instituto de Investigação Agrária de Mocambique (IIAM), Nampula, Mozambique, and was used in this study (see Supplementary Materials).

2.2. Sample Collection, DNA Extraction, and Genotyping

For genotyping, seed samples from the 109 sesame accessions were sent to SEQART Africa, Nairobi, Kenya, where four seeds from each accession were planted in a seedling tray. After 15 days, fresh healthy tissue was collected from each of the four seedlings using a leaf puncher and pooled. DNA was isolated from the leaf tissue with a NucleoMag Plant DNA extraction kit at concentrations ranging from 50 to 100 ng/µL. The quantity and quality of the genomic DNA were assessed on a 0.8% agarose gel. The DArTseq complexity-reduction method [29] was used to construct the libraries. The protocol involved digesting genomic DNA with a mixture of PstI and MseI enzymes, ligating barcoded and common adapters, and PCR amplifying adapter-ligated fragments. Sequencing was performed on the NovaSeq X platform as single-end reads over 138 cycles. GBS was used for quality control [30] via DArTseq™ technology. DArTseq markers (SilicoDArT and SNP) were scored using DArTsoft14, an in-house marker-scoring pipeline. The markers were aligned to the S. indicum genome (genome assembly: ASM2616843v1; https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_026168435.1/, accessed on 28 December 2025) [31].

2.3. Quality Control and Genetic Diversity Analysis

Quality control of the DArTseq-generated SNPs was carried out using the snpReady package [33] in R statistical software (Version 4.5.1) [32]. Markers with a minor allele frequency (MAF) < 0.05 or a call rate < 0.95 were removed. Missing genotypes were imputed using the Wright method. The snpReady package was also used to compute the overall diversity estimates: observed heterozygosity (Ho), expected heterozygosity (He)/gene diversity (GD), polymorphic information content (PIC), and the inbreeding coefficient (FIS). The numbers of transition (Ts) and transversion (Tv) mutations were also computed. Pairwise fixation indices (FST) and intra-cluster and inter-cluster diversity statistics were determined using the dartR package [34].
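The filtering step described above can be illustrated with a minimal sketch. This is not the snpReady implementation; it only shows what the two thresholds mean. It assumes genotypes are coded as 0, 1, or 2 copies of the alternate allele with `None` for a missing call, and the marker names and genotype values are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of accessions with a non-missing call at one marker."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF at one marker, computed over the non-missing diploid calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def keep_marker(calls: Sequence[Genotype],
                min_maf: float = 0.05,
                min_call_rate: float = 0.95) -> bool:
    """A marker is retained only if it passes both thresholds."""
    return (call_rate(calls) >= min_call_rate
            and minor_allele_frequency(calls) >= min_maf)


# Hypothetical toy markers scored on ten accessions.
markers: Dict[str, List[Genotype]] = {
    "snp_a": [0, 1, 2, 1, 0, 2, 1, 0, 2, 1],        # polymorphic, fully called
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic, MAF = 0
    "snp_c": [0, 1, None, None, 2, 1, 0, 2, 1, 0],  # call rate = 0.8
}
retained = [name for name, calls in markers.items() if keep_marker(calls)]
```

In this toy set only `snp_a` passes: `snp_b` fails the MAF threshold and `snp_c` fails the call-rate threshold.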

2.4. Population Structure Analysis

The population structure of the panel was inferred with the sparse non-negative matrix factorization (snmf) model implemented via the landscape and ecological associations (LEA) package in R [35]. The optimal number (K) of ancestral populations from the snmf model was determined using a cross-entropy criterion, with K varying from 1 to 10 and 10 independent runs per K. The optimal K was selected as the value that minimized the cross-entropy criterion [36]. Principal component analysis performed within LEA was used to confirm the number of genetic clusters.
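The selection rule for K can be sketched as follows. The sketch assumes that each K is summarized by its best (lowest) run before the values of K are compared, which is one common choice; the text does not state how the 10 runs per K were summarized. The cross-entropy values are hypothetical, and `select_k` is an illustrative helper, not a function of the LEA package.

```python
from typing import Dict, List


def select_k(cross_entropy_runs: Dict[int, List[float]]) -> int:
    """Return the K whose best (lowest) run has the smallest cross-entropy."""
    best_per_k = {k: min(runs) for k, runs in cross_entropy_runs.items()}
    return min(best_per_k, key=lambda k: best_per_k[k])


# Hypothetical cross-entropy values for K = 1..4 with three runs each.
toy_runs = {
    1: [0.80, 0.81, 0.80],
    2: [0.72, 0.71, 0.73],
    3: [0.66, 0.65, 0.67],
    4: [0.68, 0.69, 0.67],
}
best_k = select_k(toy_runs)
```

With these toy values the criterion is minimized at K = 3; in the study the same rule was applied over K = 1 to 10.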

2.5. Analysis of Molecular Variance

Analysis of molecular variance (AMOVA) was carried out to assess the genetic differentiation using the poppr package [37].

2.6. Hierarchical Clustering and Discriminant Analysis of Principal Components (DAPC)

To explore the genetic relationships among the accessions, discriminant analysis of principal components (DAPC) and cluster analysis were conducted. DAPC was carried out using the adegenet package [39], retaining the 10 best PCs identified via stratified cross-validation. Agglomerative hierarchical clustering was performed using Ward’s minimum-variance method and Nei’s distance (Ward, D). The clusters were visualized using a dendrogram in R [38].

3. Results

3.1. SNP Marker Characterization

A total of 14,763 DArTseq SNPs were generated, of which 12,230 were aligned to the S. indicum genome (genome assembly: ASM2616843v1; https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_026168435.1/, accessed on 28 December 2025) and the remainder mapped to scaffolds or had unknown positions (Table 1, Figure 1). Removing markers with MAF < 0.05 or a call rate < 0.95 yielded 11,502 high-quality SNPs for this study. The aligned markers were distributed across the 13 chromosomes with an average of 941 SNPs/chromosome. LG3 had the highest number of SNPs (1356), while LG13 had the lowest (596) (Table 1). The average SNP density was 40.4 SNPs/Mbp across the chromosomes, ranging from 30.56 SNPs/Mbp on LG13 to 47.49 SNPs/Mbp on LG12. The transition-to-transversion (Ts/Tv) ratio was 2.47, with transitions accounting for 71.2% (8706) of the total mutations and transversions accounting for 28.8% (3524) (Figure 2).
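The Ts/Tv figures above follow directly from the reported counts. The sketch below classifies a substitution as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion, and then recomputes the ratio and percentages from the counts given in the text. The function names are illustrative and are not taken from any package used in the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions; every other SNP is a transversion."""
    pair = {ref.upper(), alt.upper()}
    return pair == PURINES or pair == PYRIMIDINES


def ts_tv_summary(n_transitions: int, n_transversions: int):
    """Return (Ts/Tv ratio, % transitions, % transversions)."""
    total = n_transitions + n_transversions
    return (n_transitions / n_transversions,
            100.0 * n_transitions / total,
            100.0 * n_transversions / total)


# Counts reported in the text: 8706 transitions and 3524 transversions.
ratio, pct_ts, pct_tv = ts_tv_summary(8706, 3524)
```

Rounded, these give a ratio of 2.47 with 71.2% transitions and 28.8% transversions, and the two counts sum to the 12,230 aligned SNPs.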

3.2. Genetic Diversity

The MAF of the 11,502 SNP markers ranged from 0.05 to 0.5 with a mean of 0.21, while the PIC ranged between 0.09 and 0.38 with a mean of 0.25 (Table 2). The mean genetic diversity (GD/He) was 0.30, and the observed heterozygosity was also 0.30. Within-cluster genetic diversity varied across the eight subpopulations, with higher observed heterozygosity than expected heterozygosity (Table 3). Cluster 6 (n = 29) had the highest number of polymorphic loci (10,161), followed by cluster 8 (n = 15, 9800 loci) and cluster 1 (n = 30, 9142 loci). Cluster 7 (n = 3) had the fewest polymorphic loci (6026). All subpopulations had negative inbreeding coefficients (FIS ranging from −0.28 to −0.36), while the whole panel had an FIS of 0.01.

3.3. Population Structure and Genetic Relationships

The snmf model grouped the panel into eight subpopulations (K = 8), the value of K that yielded the lowest cross-entropy (0.6025) (Figure 3). The hierarchical dendrogram also grouped the genotypes into eight clusters (Figure 4). This result was supported by the PCA scree plot, which started to plateau (forming an elbow) at the eighth PC. The largest cluster, cluster 1, comprised 30 accessions, followed by cluster 6 (29 accessions). Cluster 7 was the smallest, with three accessions.

3.4. Discriminant Analysis of Principal Components (DAPC) and Admixture Analysis

Discriminant analysis of principal components (DAPC) visualized the genetic relationships of the accessions (Figure 5). The first two principal components captured the majority of the between-group variance. Clusters 2 and 3 were distinct from clusters 1, 5, and 6. Cluster 8 partially overlapped with cluster 6. Cluster 4 was clearly differentiated with minimal overlap with other clusters. Clusters 1, 5, and 6 also showed partial overlap; cluster 7 was the most differentiated, with minimal internal variance. The ancestry matrix (Figure 6) revealed substantial admixture levels across the panel. Few individuals showed pure assignment to a single ancestral lineage.

3.5. AMOVA

The analysis of molecular variance (AMOVA) showed that the majority of the genetic variation (85.85%) was within the subpopulations, while 14.15% was among the subpopulations (Table 4). Pairwise FST values (Table 5) ranged from low to moderate, i.e., from 0.02 (subpopulations 8 and 6, and subpopulations 8 and 5) to 0.10 (subpopulations 7 and 1). Subpopulation 7 showed the highest genetic divergence from the other subpopulations (0.05–0.10).

4. Discussion

Genetic diversity is essential for conservation and for the efficient use of germplasm in breeding programs [23]. Therefore, understanding the extent of genetic diversity and population structure provides useful information for strategic breeding and conservation efforts. Genetic characterization of sesame has been conducted using various molecular markers; however, genomic resources remain limited. This study investigated the extent of genetic diversity and population structure in 109 sesame accessions from Mozambique using genome-wide DArTseq single-nucleotide polymorphism (SNP) markers.

4.1. SNP Marker Coverages and Polymorphism

The 11,502 high-quality DArTseq SNPs generated in this study provided genome-wide coverage for investigating and characterizing the genetic diversity of this panel. The SNPs were well distributed across the 13 chromosomes; however, density varied within chromosomes. High-density regions on LG12 and LG9 may reflect higher recombination rates, whereas low-density regions on LG13 suggest lower polymorphism, possibly due to suppressed recombination at centromeres [40]. Similar non-uniform distributions have been reported in sesame [21,41]. Substitution mutations are grouped into transitions and transversions, and their relative frequency affects polymorphism rates. The predominance of transitions over transversions (Figure 2) could be attributed to the purine-pyrimidine stability of transitions during natural selection [42]. Similar dominance has been reported in Brassica juncea L. [43] and Camelina sativa L. [44].

4.2. Genetic Diversity of 109 Sesame Accessions

The overall genetic diversity (He = 0.30, Ho = 0.30, MAF = 0.21) in this study indicates a moderately broad genetic base among the 109 accessions. This agrees with earlier studies reporting He = 0.3 in 41 sesame accessions [21] and He = 0.332 in 501 sesame accessions [22], suggesting that sesame harbors substantial diversity. Polymorphism information content (PIC) estimates marker polymorphism and marker informativeness [45]. The mean PIC value of 0.25 indicates that the markers were moderately informative [46]. Previous studies in sesame have reported values ranging from 0.12 to 0.28 using SNPs and from 0.35 to 0.63 using SSRs [21,22,23,47,48]. These differences can be attributed to population origin and marker type: SNPs are biallelic and therefore have a lower maximum PIC, whereas SSRs are multiallelic and can reach higher PIC values. Further, chromosomal regions under selective pressure may alter allele frequency distributions and the number of alleles [49].
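The ceiling on PIC for biallelic markers can be checked numerically. The sketch below assumes the standard definitions He = 1 − Σp_i² and PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j²; it is an illustration of those formulas, not the snpReady code, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Gene diversity: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """PIC = He - sum over allele pairs i<j of 2 * p_i^2 * p_j^2."""
    correction = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - correction


# Biallelic SNP at the two ends of the MAF range used in this study.
pic_at_maf_005 = pic([0.05, 0.95])   # about 0.0905
pic_at_maf_050 = pic([0.50, 0.50])   # 0.375, the biallelic maximum

# A four-allele marker with equal frequencies, as an SSR-like contrast.
pic_four_alleles = pic([0.25, 0.25, 0.25, 0.25])
```

Under these definitions a SNP with MAF between 0.05 and 0.5 has a PIC between about 0.09 and 0.375, which matches the range reported in Section 3.2, while a marker with four equally frequent alleles already exceeds 0.70.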

Population structure analysis provides insight into the genetic relatedness of individuals and is therefore useful for identifying diverse parents for hybridization. The panel in this study was grouped into eight subpopulations (K = 8) despite extensive admixture, suggesting that although the panel shares a common ancestral history, it still contains distinct subpopulations. Using the cross-entropy criterion in our structure analysis, K = 8 reached the absolute lowest cross-entropy value across K = 1–10 with 10 repetitions per K (Figure 3). Following the standard protocol for the snmf algorithm, the K value that minimizes this criterion (i.e., the lowest cross-entropy) is typically chosen as the most supported model under the sNMF framework for representing the population’s true genetic structure. This was further complemented by PCA implemented in the lfmm model. Previous studies in sesame have reported four [23], three [50,51,52], and two [21,53] subpopulations. The varying number of subpopulations may be attributed to the origin of accessions, population size, and the type of molecular marker used. The high admixture levels observed indicate historical seed-exchange practices, both locally and internationally, regeneration practices, and potential outcrossing in sesame [54]. Our findings align with earlier studies that reported varying degrees of admixture. For example, [51] using STRUCTURE and a membership threshold of <0.50 identified 7 accessions out of 95 as admixed, and [55] using a membership threshold of <0.60 identified 35 genotypes out of 129 as admixed.

All eight subpopulations showed negative FIS values (−0.28 to −0.36), indicating a systematic excess of heterozygotes, which is unexpected in a self-pollinated crop. This pattern can be attributed to a combination of factors. The first is the residual heterozygosity of the landraces in the panel. Although sesame is predominantly self-pollinated, occasional outcrossing occurs at varying rates. For example, [47] reported outcrossing rates of approximately 45% and 46.4% in Ethiopian sesame landraces and cultivars, respectively, suggesting that the material consisted of line mixtures and segregants from outcrossing events. Under mixed-mating conditions, loci under balancing or weak selection could remain unfixed for many generations. The second is admixture: the admixture analysis (Figure 6) suggests gene flow among lineages, and accessions of mixed ancestry may have contributed to heterozygosity. The third is the scoring of paralogous loci as heterozygous SNPs, a known limitation of reduced-representation platforms. According to [56], loci with Ho substantially exceeding the Hardy–Weinberg equilibrium (HWE) expectation may represent collapsed paralogs. At the population level, Ho was equal to He; however, within each of the eight subpopulations, Ho was greater than He. Within subpopulations, heterozygous landrace genotypes may have inflated Ho. At the population level, in contrast, pooling all 109 diverse samples increased the overall allelic richness (reverse Wahlund effect). This raised He to match Ho, creating the appearance of balance at the broader scale while masking the heterozygote excess within subpopulations. These findings collectively reflect the management of the panel and the complex domestication history of sesame, where human migration and trade facilitated germplasm exchange across geographical boundaries [57].
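The link between heterozygote excess and a negative FIS can be made concrete with a short sketch. It assumes the usual definition FIS = 1 − Ho/He; the within-subpopulation Ho and He values used below are hypothetical, chosen only to reproduce an FIS of −0.28, and are not taken from Table 3.

```python
def inbreeding_coefficient(ho: float, he: float) -> float:
    """FIS = 1 - Ho/He; negative when observed heterozygosity exceeds expected."""
    if he <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - ho / he


# Panel-level values reported in the text (rounded): Ho = He = 0.30.
fis_panel = inbreeding_coefficient(0.30, 0.30)

# Hypothetical subpopulation in which Ho is 28% higher than He.
fis_subpop = inbreeding_coefficient(0.32, 0.25)
```

With the rounded panel-level values the coefficient is 0, close to the reported 0.01, which was presumably computed from unrounded estimates. An FIS of −0.28 corresponds to Ho being 1.28 times He.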

The DAPC supported the genetic relationships of the accessions with the first two axes explaining most of the variation. The DAPC partitioned the accessions in the panel into eight clusters, confirming the presence of distinct clusters. Overlapping clusters 1, 5, 6, and 8 in the DAPC supported the admixture patterns observed among the accessions, while differentiated cluster 7 suggested single-ancestry lineages consistent with the near-pure ancestry in the admixture plot. Cluster analysis further clearly separated the accessions into distinct clades, confirming the genetic diversity revealed by the SNP markers. At the same time, the assignment of accessions across the branches reflected the admixture patterns shaping the panel’s genetic structure. Cluster 1 was separated from other clusters by the largest distance, while clusters 2 through 8 were broadly complex. Clusters 7 and 8 revealed moderate similarity. The clustering pattern was consistent with genetic exchange or shared breeding history.

The higher variance within subpopulations (85.85%) than among subpopulations (14.15%) suggests that the accessions within subpopulations harbor a broad genetic base, providing a wide range of phenotypic traits useful for targeted selection. Consistent with our findings, previous studies on sesame have reported substantial genetic diversity within populations. Analysis by [58] using SSR markers reported 92% of variation within populations, while [55,59] identified similar levels of within-population variation of 85.84% and 87.1%, respectively.

Further, the pairwise fixation indices (FST) quantified the differentiation between the subpopulations. Pairwise FST values < 0.05 indicate low differentiation, while values between 0.05 and 0.15 suggest moderate differentiation [60]. Our findings revealed that the subpopulations showed low to moderate differentiation (Table 5). For example, the low fixation index (0.02) between subpopulations 5 and 8 suggests that the accessions in those subpopulations are genetically similar, which is consistent with the overlap observed in the DAPC. In contrast, the moderate differentiation between subpopulations 1 and 7 (FST = 0.10) suggests that these subpopulations are genetically distinct. Accessions from those divergent subpopulations may be useful for exploring transgressive segregation for key agronomic traits.
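The interpretation thresholds above can be written as a small helper. This is only a sketch of the cut-offs cited in the text (< 0.05 low, 0.05 to 0.15 moderate); the label for values above 0.15 is an assumption, since the text does not name that class, and `classify_fst` is a hypothetical function name.

```python
def classify_fst(fst: float,
                 low_cutoff: float = 0.05,
                 moderate_cutoff: float = 0.15) -> str:
    """Label a pairwise FST value using the cut-offs cited in the text."""
    if fst < low_cutoff:
        return "low"
    if fst <= moderate_cutoff:
        return "moderate"
    # Assumed label: the text does not describe values above 0.15.
    return "above moderate"


# Pairwise values reported in the text.
label_5_vs_8 = classify_fst(0.02)   # subpopulations 5 and 8
label_1_vs_7 = classify_fst(0.10)   # subpopulations 1 and 7
```

Applied to the reported values, the pair 5 and 8 falls in the low class and the pair 1 and 7 in the moderate class.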

5. Conclusions

The present study provided a genome-wide assessment of genetic diversity and population structure of a diverse panel of 109 sesame accessions from Mozambique using DArTseq SNP markers. The panel exhibited moderate overall genetic diversity (He = 0.30, MAF = 0.21), which is useful for potential breeding programs. Eight subpopulations were identified, indicating detectable genetic structure within the Mozambican germplasm. The high admixture levels indicate that while the accessions can form distinct clusters, they share an evolutionary history likely facilitated by historical germplasm exchange or localized gene flow. The excess heterozygosity within subpopulations likely reflects residual heterozygosity in sesame landraces, admixture effects, and potential scoring of paralogs as heterozygous SNPs. The high within-subpopulation molecular variance (85.85%) suggests useful selectable variation exists within as well as among subpopulations. The pairwise fixation indices identified moderately differentiated accessions in subpopulations 7 and 1 as potential candidates for hybridization. Overall, these findings provide a foundation for informed parental selection in future sesame breeding programs in Mozambique.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/genes17050528/s1, Table S1: List of Sesame Genotypes and their identified clusters.

Author Contributions

Conceptualization, W.N.M. and R.M.C.; methodology, W.N.M. and R.M.C.; software, W.N.M. and W.A.; validation, W.N.M., R.M.C. and W.A.; formal analysis, W.N.M. and W.A.; investigation, W.N.M.; resources, R.M.C.; data curation, W.N.M.; writing—original draft preparation, W.N.M.; writing—review and editing, W.N.M., R.M.C. and W.A.; visualization, W.N.M. and W.A.; supervision, R.M.C.; project administration, W.N.M. and R.M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Centre of Excellence in Agri-Food Systems and Nutrition (CE-AFSN) of Eduardo Mondlane University, Mozambique (Project grant number E089-MZ), and the International Maize and Wheat Improvement Center (CIMMYT)—Vision for Adapted Crops & Soils (VACS) Capacity Building Project Initiative.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are available in the article and Supplementary Materials. Further inquiries should be directed to the corresponding author.

Acknowledgments

The authors would like to thank the Centre of Excellence in Agri-Food Systems and Nutrition (CE-AFSN), Eduardo Mondlane University, and the Vision for Adapted Crops & Soils (VACS) Capacity Building Project Initiative for supporting the postgraduate education of W.N.M. and financing this research. The authors also thank the Instituto de Investigação Agrária de Mocambique (IIAM), Nampula, for providing the accessions used in the study, the Tanzania Agricultural Research Institute (TARI), Naliendele, for hosting the research, and SEQART Africa for their genotyping services. Grammarly was used to correct grammatical errors.

Conflicts of Interest

Author Wilfred Abincha was employed by the company Kentegra Biotechnology Holdings LLC. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

RAPD	Random amplified polymorphic DNA
SSRs	Simple sequence repeats
ISSRs	Inter-simple sequence repeats
AFLPs	Amplified fragment length polymorphisms
SRAPs	Sequence-related amplified polymorphisms
SNPs	Single-nucleotide polymorphisms
DArTseq	Diversity array technology sequencing
USDA	United States Department of Agriculture
WGS	Whole-genome resequencing
MAF	Minor allele frequency
He	Expected heterozygosity
GD	Genetic diversity
Ho	Observed heterozygosity
FST	Fixation index
FIS	Coefficient of inbreeding
AMOVA	Analysis of molecular variance
DAPC	Discriminant analysis of principal components
Va	Additive variance
Vd	Dominant variance
Ne	Effective population size
Pop	Population
n.ind	Number of individuals
Polyloc	Polymorphic loci
Monoloc	Monomorphic loci
A/G	Adenine/Guanine
C/T	Cytosine/Thymine
PIC	Polymorphic information content
Df	Degree of freedom
NA	Not Applicable

References

1. Zhang, H.; Li, C.C.; Miao, H.; Xiong, S. Insights from the Complete Chloroplast Genome into the Evolution of Sesamum indicum L. PLoS ONE 2013, 8, e80508.
2. Bedigian, D.; Harlan, J.R. Evidence for Cultivation of Sesame in the Ancient World. Econ. Bot. 1986, 40, 137–154.
3. Uzun, B.; Arslun, Ç.; Furat, S. Variation in Fatty Acid Compositions, Oil Content and Oil Yield in a Germplasm Collection of Sesame (Sesamum indicum L.). J. Am. Oil Chem. Soc. 2008, 85, 1135–1142.
4. Mostashari, P.; Mousavi, A. Sesame Seeds: A Nutrient-Rich Superfood. Foods 2024, 13, 1153.
5. FAOSTAT. Crops and Livestock Products. 2025. Available online: https://www.fao.org/faostat/en/#data/QCL (accessed on 13 March 2026).
6. USAID. Feed the Future Mozambique Promoting Innovative and Resilient Agriculture Market Systems Activity (FTF Premier); USAID: Washington, DC, USA, 2023.
7. Sanni, G.B.T.A.; Ezin, V.; Odjo, T.; Ahanchede, A. Unlocking the Promise of Sesame (Sesamum indicum L.) Industry in Africa: Economics, Opportunities, Obstacles, and Future Horizons. CABI Agric. Biosci. 2026, 7, 0022.
8. Instituto do Algodão e Oleaginosas (IAOM). Produção de Gergelim Em Moçambique; Instituto do Algodão e Oleaginosas (IAOM): Maputo, Mozambique, 2024.
9. Muthoni, J.; Shimelis, H. Production of Minor Tropical Oil Crops in Africa: Case of Sesame (Sesamum indicum L.). Aust. J. Crop Sci. 2025, 19, 816–829.
10. Galli, M. SOMICA-Value-Chain-Analysis-and-Development-January-2019-1; CEFA: Kansas City, MO, USA, 2019.
11. Hasan, N.; Choudhary, S.; Naaz, N.; Sharma, N.; Laskar, R.A. Recent Advancements in Molecular Marker-Assisted Selection and Applications in Plant Breeding Programmes. J. Genet. Eng. Biotechnol. 2021, 19, 128.
12. Ercan, A.G.; Taskin, M.; Turgut, K. Analysis of Genetic Diversity in Turkish Sesame (Sesamum indicum L.) Populations Using RAPD Markers. Genet. Resour. Crop Evol. 2004, 51, 599–607.
13. Pham, T.D.; Bui, T.M.; Werlemark, G.; Bui, T.C.; Merker, A.; Carlsson, A.S. A Study of Genetic Diversity of Sesame (Sesamum indicum L.) in Vietnam and Cambodia Estimated by RAPD Markers. Genet. Resour. Crop Evol. 2009, 56, 679–690.
14. Kumar, V.; Sharma, S.N. Comparative Potential of Phenotypic, ISSR and SSR Markers for Characterization of Sesame (Sesamum indicum L.) Varieties from India. J. Crop Sci. Biotechnol. 2011, 14, 163–171.
15. Parsaeian, M.; Mirlohi, A.; Saeidi, G. Study of Genetic Variation in Sesame (Sesamum indicum L.) Using Agro-Morphological Traits and ISSR Markers. Russ. J. Genet. 2011, 47, 314–321.
16. Ali, G.M.; Yasumoto, S.; Seki-Katsuta, M. Assessment of Genetic Diversity in Sesame (Sesamum indicum L.) Detected by Amplified Fragment Length Polymorphism Markers. Electron. J. Biotechnol. 2007, 10, 12–23.
17. Laurentin, H.E.; Karlovsky, P. Genetic Relationship and Diversity in a Sesame (Sesamum indicum L.) Germplasm Collection Using Amplified Fragment Length Polymorphism (AFLP). BMC Genet. 2006, 7, 10.
18. Zhang, Y.; Zhang, X.; Hua, W.; Wang, L.; Che, Z. Analysis of Genetic Diversity among Indigenous Landraces from Sesame (Sesamum indicum L.) Core Collection in China as Revealed by SRAP and SSR Markers. Genes Genom. 2010, 32, 207–215.
19. Zhang, Y.; Zhang, X.; Che, Z.; Wang, L.; Wei, W.; Li, D. Genetic Diversity Assessment of Sesame Core Collection in China by Phenotype and Molecular Markers and Extraction of a Mini-Core Collection. BMC Genet. 2012, 13, 102.
20. Hussain, H.; Nisar, M. Assessment of Plant Genetic Variations Using Molecular Markers: A Review. J. Appl. Biol. Biotechnol. 2020, 8, 99–109.
21. Almarri, N.B.; Alsofuani, N.M.; Osman, S.; Ahmad, S.; Ibrahim, M.A.; Mostafa, S.M.; Al-Haidar, O.A.; Alomran, S.M.; Alrashidi, I.; Tariq, R. Robust Biodiversity Assessment and DNA Fingerprinting of Saudi and Exotic Sesame Germplasm Using Whole Genome Resequencing. Genet. Resour. Crop Evol. 2026, 73, 91.
22. Seay, D.; Szczepanek, A.; De La Fuente, G.N.; Votava, E.; Abdel-Haleem, H. Genetic Diversity and Population Structure of a Large USDA Sesame Collection. Plants 2024, 13, 1765.
23. Tesfaye, T.; Tesfaye, K.; Keneni, G.; Ziyomo, C.; Alemu, T. Genetic Diversity of Sesame (Sesamum indicum L) Using High Throughput Diversity Array Technology. J. Crop Sci. Biotechnol. 2022, 25, 359–371.
24. Mukhebi, D.; Gachanja, P.; Karan, D.; Kamau, B.M.; King’ori, P.W.; Juma, B.S.; Mbinda, W.M. DArTseq-Based silicoDArT and SNP Markers Reveal the Genetic Diversity and Population Structure of Kenyan Cashew (Anacardium occidentale L.) Landraces. PLoS ONE 2025, 20, e0313850.
25. Deres, D.; Feyissa, T. Concepts and Applications of Diversity Array Technology (DArT) Markers for Crop Improvement. J. Crop Improv. 2023, 37, 913–933.
26. Ambaw, Y.D.; Abitea, A.G.; Olango, T.M. Genetic Diversity and Population Structure in Ethiopian Mustard (Brassica carinata A. Braun) Revealed by High-Density DArTSeq SNP Genotyping. BMC Genom. 2025, 26, 354.
27. Silue, T.; Agre, P.A.; Olasanmi, B.; Adewumi, A.S.; Adejumobi, I.I.; Abebe, A.T. Genetic Diversity and Population Structure of Soybean (Glycine max (L.) Merril) Germplasm. PLoS ONE 2025, 20, e0312079.
28. Mahaman Mourtala, I.Z.; Gouda, A.C.; Baina, D.; Maxwell, N.I.I.; Adje, C.O.; Baragé, M.; Happiness, O.O. Genetic Diversity and Population Structure Studies of West African Sweetpotato [Ipomoea batatas (L.) Lam] Collection Using DArTseq. PLoS ONE 2025, 20, e0312384.
29. Kilian, A.; Wenzl, P.; Huttner, E.; Carling, J.; Xia, L.; Blois, H.; Caig, V.; Heller-Uszynska, K.; Jaccoud, D.; Hopper, C.; et al. Diversity Arrays Technology: A Generic Genome Profiling Technology on Open Platforms. In Data Production and Analysis in Population Genomics: Methods and Protocols; Pompanon, F., Bonin, A., Eds.; Humana Press: Totowa, NJ, USA, 2012; pp. 67–89.
30. Elshire, R.J.; Glaubitz, J.C.; Sun, Q.; Poland, J.A.; Kawamoto, K.; Buckler, E.S.; Mitchell, S.E. A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLoS ONE 2011, 6, e19379.
31. Wang, M.; Huang, J.; Liu, S.; Liu, X.; Li, R.; Luo, J.; Fu, Z. Improved Assembly and Annotation of the Sesame Genome. DNA Res. Int. J. Rapid Publ. Rep. Genes Genomes 2022, 29, dsac041.
32. R Core Team. R: The R Project for Statistical Computing. Available online: https://www.r-project.org/ (accessed on 14 March 2026).
33. Granato, I.S.C.; Galli, G.; de Oliveira Couto, E.G.; e Souza, M.B.; Mendonça, L.F.; Fritsche-Neto, R. snpReady: A Tool to Assist Breeders in Genomic Analysis. Mol. Breed. 2018, 38, 102.
Gruber, B.; Unmack, P.J.; Berry, O.F.; Georges, A. Dartr: An r Package to Facilitate Analysis of SNP Data Generated from Reduced Representation Genome Sequencing. Mol. Ecol. Resour. 2018, 18, 691–699. [Google Scholar] [CrossRef]
Frichot, E.; François, O. LEA: An R Package for Landscape and Ecological Association Studies. Methods Ecol. Evol. 2015, 6, 925–929. [Google Scholar] [CrossRef]
Alexander, D.H.; Lange, K. Enhancements to the ADMIXTURE Algorithm for Individual Ancestry Estimation. BMC Bioinform. 2011, 12, 246. [Google Scholar] [CrossRef]
Kamvar, Z.N.; Tabima, J.F.; Grünwald, N.J. Poppr: An R Package for Genetic Analysis of Populations with Clonal, Partially Clonal, and/or Sexual Reproduction. PeerJ 2014, 2, e281. [Google Scholar] [CrossRef]
Galili, T. Dendextend: An R Package for Visualizing, Adjusting and Comparing Trees of Hierarchical Clustering. Bioinformatics 2015, 31, 3718–3720. [Google Scholar] [CrossRef] [PubMed]
Jombart, T.; Devillard, S.; Balloux, F. Discriminant Analysis of Principal Components: A New Method for the Analysis of Genetically Structured Populations. BMC Genet. 2010, 11, 94. [Google Scholar] [CrossRef]
Khedikar, Y.; Clarke, W.E.; Chen, L.; Higgins, E.E.; Kagale, S.; Koh, C.S.; Bennett, R.; Parkin, I.A.P. Narrow Genetic Base Shapes Population Structure and Linkage Disequilibrium in an Industrial Oilseed Crop, Brassica carinata A. Braun. Sci. Rep. 2020, 10, 12629. [Google Scholar] [CrossRef]
Gelashe, F.B.; Ndeve, A.D.; Menamo, T.M.; Gandhi, H.; Chiulele, R.M. Genetic Diversity and Collection Structure Studies of Sesame (Sesamum indicum L.) Accessions Across Ethiopian Research Centers. Genes 2026, 17, 300. [Google Scholar] [CrossRef] [PubMed]
Ossowski, S.; Schneeberger, K.; Lucas-Lledó, J.I.; Warthmann, N.; Clark, R.M.; Shaw, R.G.; Weigel, D.; Lynch, M. The Rate and Molecular Spectrum of Spontaneous Mutations in Arabidopsis Thaliana. Science 2010, 327, 92–94. [Google Scholar] [CrossRef]
Abdel-Haleem, H.; Luo, Z.; Szczepanek, A. Genetic Diversity and Population Structure of the USDA Collection of Brassica juncea L. Ind. Crops Prod. 2022, 187, 115379. [Google Scholar] [CrossRef]
Luo, Z.; Brock, J.; Dyer, J.M.; Kutchan, T.; Schachtman, D.; Augustin, M.; Ge, Y.; Fahlgren, N.; Abdel-Haleem, H. Genetic Diversity and Population Structure of a Camelina sativa Spring Panel. Front. Plant Sci. 2019, 10, 184. [Google Scholar] [CrossRef]
Guo, X.; Elston, R.C. Linkage Information Content of Polymorphic Genetic Markers. Hum. Hered. 1999, 49, 112–118. [Google Scholar] [CrossRef]
Serrote, C.M.L.; Reiniger, L.R.S.; Silva, K.B.; dos Santos Rabaiolli, S.M.; Stefanel, C.M. Determining the Polymorphism Information Content of a Molecular Marker. Gene 2020, 726, 144175. [Google Scholar] [CrossRef] [PubMed]
Gebremichael, D.; Parzies, H. Genetic Variability among Landraces of Sesame in Ethiopia. Afr. Crop Sci. J. 2011, 19, 1–13. [Google Scholar] [CrossRef][Green Version]
Berhe, M.; You, J.; Dossa, K.; Abera, F.A.; Adjei, E.A.; Zhang, Y.; Wang, L. Large Scale Genetic Landscape and Population Structure of Ethiopian Sesame (Sesamum indicum L.) Germplasm Revealed through Molecular Marker Analysis. Oil Crop Sci. 2023, 8, 266–277. [Google Scholar] [CrossRef]
Leśniowska-Nowak, J.; Bednarek, P.T.; Czapla, K.; Nowak, M.; Niedziela, A. Effect of Chromosomal Localization of NGS-Based Markers on Their Applicability for Analyzing Genetic Variation and Population Structure of Hexaploid Triticale. Int. J. Mol. Sci. 2024, 25, 9568. [Google Scholar] [CrossRef]
Wang, Z.; Zhou, F.; Tang, X.; Yang, Y.; Zhou, T.; Liu, H. Morphology and SSR Markers-Based Genetic Diversity Analysis of Sesame (Sesamum indicum L.) Cultivars Released in China. Agriculture 2023, 13, 1885. [Google Scholar] [CrossRef]
Basak, M.; Uzun, B.; Yol, E. Genetic Diversity and Population Structure of the Mediterranean Sesame Core Collection with Use of Genome-Wide SNPs Developed by Double Digest RAD-Seq. PLoS ONE 2019, 14, e0223757. [Google Scholar] [CrossRef] [PubMed]
Cho, Y.-I.; Park, J.-H.; Lee, C.-W.; Ra, W.-H.; Chung, J.-W.; Lee, J.-R.; Ma, K.-H.; Lee, S.-Y.; Lee, K.-S.; Lee, M.-C.; et al. Evaluation of the Genetic Diversity and Population Structure of Sesame (Sesamum indicum L.) Using Microsatellite Markers. Genes Genom. 2011, 33, 187–195. [Google Scholar] [CrossRef]
Cui, C.; Mei, H.; Liu, Y.; Zhang, H.; Zheng, Y. Genetic Diversity, Population Structure, and Linkage Disequilibrium of an Association-Mapping Panel Revealed by Genome-Wide SNP Markers in Sesame. Front. Plant Sci. 2017, 8, 1189. [Google Scholar]
van Hintum, T.J.L.; Brown, A.H.D.; Spillane, C. Core Collections of Plant Genetic Resources; Bioversity International: Rome, Italy, 2000; ISBN 978-92-9043-454-2. [Google Scholar]
Asekova, S.; Kulkarni, K.P.; Oh, K.W.; Lee, M.-H.; Oh, E.; Kim, J.-I.; Yeo, U.-S.; Pae, S.-B.; Ha, T.J.; Kim, S.U. Analysis of Molecular Variance and Population Structure of Sesame (Sesamum indicum L.) Genotypes Using Simple Sequence Repeat Markers. Plant Breed. Biotechnol. 2018, 6, 321–336. [Google Scholar] [CrossRef]
McKinney, G.J.; Waples, R.K.; Seeb, L.W.; Seeb, J.E. Paralogs Are Revealed by Proportion of Heterozygotes and Deviations in Read Ratios in Genotyping-by-Sequencing Data from Natural Populations. Mol. Ecol. Resour. 2017, 17, 656–669. [Google Scholar] [CrossRef]
Ashri, A.; Singh, R.J. Sesame (Sesamum indicum L.). In Genetic Resources, Chromosome Engineering, and Crop Improvement; CRC Press: Boca Raton, FL, USA, 2006; Volume 4, pp. 231–289. [Google Scholar]
Azon, C.F.; Fassinou Hotegni, N.V.; Adjé, C.O.; Gnanglè, L.S.; Benjamin, E.; Mhuruyengwe, R.L.; Salaou, A.M.; Houdegbe, A.C.; Sogbohossou, D.O.; Sedah, P.; et al. Molecular Diversity and Agronomic Performance of Sesame ( Sesamum indicum ) Cultivars in Benin: Local Cultivars and Lines Introduced From China. Plant-Environ. Interact. 2024, 5, e70024. [Google Scholar] [CrossRef]
Surapaneni, M.; Yepuri, V.; Vemireddy, L.R.; Ghanta, A.; Siddiq, E.A. Development and Characterization of Microsatellite Markers in Indian Sesame (Sesamum indicum L.). Mol. Breed. 2014, 34, 1185–1200. [Google Scholar] [CrossRef]
Mohammadi, S.A.; Prasanna, B.M. Analysis of Genetic Diversity in Crop Plants—Salient Statistical Tools and Considerations. Crop Sci. 2003, 43, 1235–1248. [Google Scholar] [CrossRef]

Figure 1. Distribution of SNP markers across the 13 sesame linkage groups within 1 Mb windows.

Figure 2. SNP mutation spectrum identified among the 11,502 SNPs used in the analysis of 109 sesame accessions.

Figure 3. Population structure inferred from the cross-entropy plot and the PCA scree plot, based on 11,502 sesame SNPs.

Figure 4. Dendrogram showing the 8 subpopulations in 109 sesame accessions based on 11,502 SNPs, with the numbers and colors representing each subpopulation.

Figure 5. DAPC scatterplot of 8 subpopulations of 109 sesame accessions based on 11,502 SNPs, with each color representing a subpopulation.

Figure 6. Genetic admixture proportions across 8 subpopulations of 109 sesame accessions based on 11,502 SNPs. The x-axis represents individual accessions, and each color represents a subpopulation.

Table 1. Genomic distribution of 14,763 DArTseq SNPs across sesame chromosomes.

Chromosome	Length (Mbp)	No. of SNPs	Marker Distance (kbp)	SNPs per Mbp
LG1	23.75	1044	22.75	43.96
LG2	23.37	815	28.67	34.87
LG3	31.44	1356	23.19	43.13
LG4	21.23	797	26.64	37.54
LG5	20.96	657	31.9	31.35
LG6	27.99	1220	22.94	43.59
LG7	16.12	716	22.51	44.42
LG8	31.99	1252	25.55	39.14
LG9	26.74	1249	21.41	46.71
LG10	22.21	825	26.92	37.15
LG11	17.33	796	21.77	45.93
LG12	19.1	907	21.06	47.49
LG13	19.5	596	32.72	30.56
Scaffolds	-	49	-	-
Unknown	-	2484	-	-
Total	301.73	14,763	328.03	525.84
Average	-	941	25.2	40.4

Chromosome names and lengths follow the genome assembly GCA_026168435.1, https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_026168435.1/ (accessed on 17 March 2026).
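The two derived columns of Table 1 appear to follow directly from chromosome length and SNP count. The sketch below is illustrative only: it assumes that SNP density is count divided by length and that marker distance is length divided by count (the table does not state the formulas), and the function names are hypothetical. Under those assumptions it reproduces the tabulated values for the three rows copied here.

```python
# Chromosome length (Mbp) and SNP count, copied from three rows of Table 1.
table1_rows = {
    "LG1": (23.75, 1044),
    "LG5": (20.96, 657),
    "LG13": (19.50, 596),
}


def snps_per_mbp(length_mbp: float, n_snps: int) -> float:
    """SNP density, assumed here to be count / length."""
    return n_snps / length_mbp


def marker_distance_kbp(length_mbp: float, n_snps: int) -> float:
    """Average spacing between markers, assumed here to be length / count."""
    return length_mbp * 1000.0 / n_snps


# Map each linkage group to (SNPs per Mbp, marker distance in kbp).
summary = {
    lg: (snps_per_mbp(length, count), marker_distance_kbp(length, count))
    for lg, (length, count) in table1_rows.items()
}

for lg, (density, distance) in summary.items():
    print(f"{lg}: {density:.2f} SNPs/Mbp, {distance:.2f} kbp between markers")
```

Because the two quantities are reciprocal up to a unit conversion, the linkage groups with the lowest density (LG5, LG13) are also those with the widest marker spacing.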

Table 2. Genetic diversity parameters of the sesame population based on 11,502 SNPs.

Parameter	Mean	Lower	Upper
He/GD	0.30	0.10	0.50
PIC	0.25	0.09	0.38
MAF	0.21	0.05	0.50
Ho	0.30	0.08	0.55
FIS	0.01	−0.83	0.72
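The lower and upper bounds in Table 2 are linked: for a biallelic SNP, both expected heterozygosity and PIC are determined by the minor allele frequency. The sketch below uses the standard textbook expressions He = 2pq and PIC = 1 − (p² + q²) − 2p²q². It is an assumption that the software used in the study applies exactly these forms, and the function names are hypothetical. Evaluated at the MAF bounds of 0.05 and 0.50, the formulas give values that round to the reported He and PIC bounds.

```python
def expected_heterozygosity(maf: float) -> float:
    """He for a biallelic locus: 2pq, with p the minor allele frequency."""
    p, q = maf, 1.0 - maf
    return 2.0 * p * q


def pic_biallelic(maf: float) -> float:
    """Polymorphism information content for a biallelic locus."""
    p, q = maf, 1.0 - maf
    return 1.0 - (p ** 2 + q ** 2) - 2.0 * (p ** 2) * (q ** 2)


# MAF bounds taken from Table 2.
maf_lower, maf_upper = 0.05, 0.50

he_bounds = (expected_heterozygosity(maf_lower), expected_heterozygosity(maf_upper))
pic_bounds = (pic_biallelic(maf_lower), pic_biallelic(maf_upper))

print(f"He at MAF bounds:  {he_bounds[0]:.4f} to {he_bounds[1]:.4f}")
print(f"PIC at MAF bounds: {pic_bounds[0]:.4f} to {pic_bounds[1]:.4f}")
```

Both functions increase monotonically with MAF on the interval from 0 to 0.5, which is why the extremes of He and PIC coincide with the extremes of MAF. For a biallelic marker PIC cannot exceed 0.375, so the upper bound in the table is the theoretical maximum.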

Table 3. Subpopulation diversity estimates.

Subpopulation	No. of Individuals	Polymorphic Loci	Monomorphic Loci	Ho	He	FIS
6	29	10,161	1341	0.37	0.26	−0.3
2	13	8408	3094	0.37	0.24	−0.36
1	30	9142	2360	0.37	0.24	−0.35
8	15	9800	1702	0.38	0.26	−0.28
4	8	8794	2708	0.37	0.25	−0.29
3	5	7704	3798	0.38	0.24	−0.31
7	3	6026	5476	0.37	0.22	−0.35
5	6	8189	3313	0.38	0.25	−0.3

Table 4. Analysis of molecular variance among and within the 8 sesame subpopulations.

Source of Variance	Df	Sum Sq	Mean Sq	% Variance	p-Value
Among subpopulations	7	90,349.87	12,907.13	14.15
Within subpopulations	101	423,713.39	4195.182	85.85
Total	108	514,063.27	4759.845	100	<0.001
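The percentages of variance in Table 4 can be recovered from the mean squares and the subpopulation sizes listed in Table 3. The sketch below assumes a one-level AMOVA with the usual method-of-moments estimators, in which the within-group component equals the within-group mean square and the among-group component is the difference in mean squares divided by an effective group size n0. This is not necessarily how the authors' software computes it internally, and the function name is hypothetical.

```python
# Subpopulation sizes from Table 3 (they sum to the 109 accessions).
group_sizes = [29, 13, 30, 15, 8, 5, 3, 6]

# Mean squares from Table 4.
ms_among = 12907.13
ms_within = 4195.182


def amova_one_level(ms_among: float, ms_within: float, sizes: list) -> dict:
    """Variance components for a single grouping level (method of moments)."""
    n_total = sum(sizes)
    n_groups = len(sizes)
    # Effective group size, which corrects for unequal subpopulation sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_groups - 1)
    var_within = ms_within
    var_among = (ms_among - ms_within) / n0
    var_total = var_among + var_within
    return {
        "n0": n0,
        "var_among": var_among,
        "var_within": var_within,
        "pct_among": 100.0 * var_among / var_total,
        "pct_within": 100.0 * var_within / var_total,
        "phi_st": var_among / var_total,
    }


result = amova_one_level(ms_among, ms_within, group_sizes)
print(f"n0 = {result['n0']:.2f}")
print(f"among: {result['pct_among']:.2f}%  within: {result['pct_within']:.2f}%")
```

Under these assumptions the among-group share is also the overall Phi-ST, so the 14.15% figure can be read as a global differentiation estimate of about 0.14.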

Table 5. Pairwise fixation indices (FST) among the 8 sesame subpopulations.

	Pop 6	Pop 2	Pop 1	Pop 8	Pop 4	Pop 3	Pop 7
Pop 2	0.04	NA	NA	NA	NA	NA	NA
Pop 1	0.03	0.08	NA	NA	NA	NA	NA
Pop 8	0.02	0.03	0.05	NA	NA	NA	NA
Pop 4	0.03	0.07	0.04	0.04	NA	NA	NA
Pop 3	0.04	0.04	0.07	0.03	0.06	NA	NA
Pop 7	0.06	0.07	0.10	0.05	0.08	0.06	NA
Pop 5	0.03	0.05	0.05	0.02	0.05	0.04	0.07

NA represents Not Applicable.
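The lower triangle of Table 5 can be scanned programmatically to find the most and least differentiated pairs, which is how candidate crossing parents such as subpopulations 7 and 1 stand out. The sketch below is a minimal illustration with the values transcribed by hand from the table; the variable names are hypothetical.

```python
# Row and column order as printed in Table 5.
order = ["Pop 6", "Pop 2", "Pop 1", "Pop 8", "Pop 4", "Pop 3", "Pop 7", "Pop 5"]

# Lower triangle of the pairwise matrix; row i holds values against order[0..i-1].
lower_triangle = [
    [],
    [0.04],
    [0.03, 0.08],
    [0.02, 0.03, 0.05],
    [0.03, 0.07, 0.04, 0.04],
    [0.04, 0.04, 0.07, 0.03, 0.06],
    [0.06, 0.07, 0.10, 0.05, 0.08, 0.06],
    [0.03, 0.05, 0.05, 0.02, 0.05, 0.04, 0.07],
]

# Flatten into a {(row_pop, col_pop): value} mapping.
pairs = {
    (order[i], order[j]): value
    for i, row in enumerate(lower_triangle)
    for j, value in enumerate(row)
}

most_divergent = max(pairs, key=pairs.get)
smallest = min(pairs.values())
least_divergent = [pair for pair, value in pairs.items() if value == smallest]

print("Most divergent pair:", most_divergent, pairs[most_divergent])
print("Least divergent pairs:", least_divergent, smallest)
```

With 8 subpopulations there are 28 pairwise comparisons. The single largest value belongs to the Pop 7 and Pop 1 pair, while two pairs share the smallest value of 0.02.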

