Genetic Variability and Population Structure of Ethiopian Sesame (Sesamum indicum L.) Germplasm Assessed through Phenotypic Traits and Simple Sequence Repeats Markers

Ethiopia is one of the centers of genetic diversity of sesame (Sesamum indicum L.). The sesame genetic resources present in the country should be explored for local, regional, and international genetic improvement programs to develop high-performing and market-preferred varieties. The objective of this study was to determine the extent of genetic variation among 100 diverse cultivated sesame germplasm collections of Ethiopia using phenotypic traits and simple sequence repeat (SSR) markers, in order to select distinct and complementary genotypes for breeding. The 100 sesame entries were evaluated in the field at two locations in Ethiopia for agro-morphological traits and seed oil content using a 10 × 10 lattice design with two replications. The test genotypes were profiled with 27 polymorphic SSR markers at the Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences. Analysis of variance revealed a significant (p ≤ 0.05) entry × environment interaction for plant height, internode length, number of secondary branches, and grain yield. Genotypes such as Hirhir Kebabo Hairless-9, Setit-3, Orofalc ACC-2, Hirhir Humera Sel-6, ABX = 2-01-2, and Setit-1 recorded grain yields of >0.73 t ha−1 and also performed well for oil yield per hectare. Grain yield was positively and significantly (p < 0.01) correlated with oil yield (r = 0.99), indicating that the two traits can be selected for simultaneously in sesame yield improvement. The SSR markers revealed mean gene diversity and polymorphic information content values of 0.30 and 0.25, respectively, showing that the tested sesame accessions were genetically diverse. Cluster analysis resolved the accessions into two groups, while population structure analysis revealed four major heterotic groups, enabling selection and subsequent crossing to develop breeding populations for cultivar development.
Based on phenotypic and genomic divergence, the following superior and complementary genotypes were selected: Hirhir Humera Sel-6, Setit-3, Hirhir Kebabo Hairless Sel-4, Hirhir Nigara 1st Sel-1, Humera-1, and Hirhir Kebabo Early Sel-1 (from cluster II-a), and Hirhir Kebabo Hairless-9, NN-0029(2), NN0068-2, and Bawnji Fiyel Kolet (from cluster II-b). The selected genotypes will serve as parents in the local breeding program in Ethiopia.

Sesame is Ethiopia's second most valuable export crop after coffee (Coffea arabica L.) and a major contributor to the country's gross domestic product [10]. In 2018, the area allocated to sesame production in Ethiopia was 294,819.49 ha, approximately 39.4% of the total estimated area allocated to oil crops [11]. Globally, Ethiopia ranks eighth in sesame production, with an annual output of 301,302 tons, after Sudan (981,000 tons), Myanmar (768,858 tons), India (746,000 tons), Nigeria (572,761 tons), Tanzania (561,103 tons), China (433,386 tons), and China Mainland (431,500 tons) [12].
Ethiopia is one of the centers of origin and diversity of cultivated sesame and its allied species. The Ethiopian Biodiversity Institute (EBI) maintains one of the most extensive collections of sesame genetic resources in Africa, with about 5000 genetically diverse sesame accessions conserved [13]. This germplasm pool can provide unique economic traits and gene combinations for global sesame improvement. However, the genetic resources maintained at the EBI have yet to be explored by local, regional, and international sesame improvement programs to develop high-performing and market-preferred varieties. Ethiopia's mean sesame yield is 0.68 t ha−1. This is low compared with mean yields of 1 t ha−1 in sub-Saharan Africa and 1.29 t ha−1 in Egypt [11,12]. Among other constraints, the low productivity is attributable to a lack of improved, high-yielding varieties and the use of traditional production technologies. Landrace varieties are the main source of seed for sesame cultivation in Ethiopia. Landraces are inherently low yielding and prone to capsule shattering, which reduces productivity. However, they are highly valued for intrinsic farmer-preferred attributes such as unique taste and aroma. They are also adapted to the marginal growing conditions that often characterize low-input farming systems [8,14].
Sesame genetic resources maintained at the EBI can be explored for new sources of useful genetic variation in economic traits. These include grain yield and yield components, resistance to diseases and insect pests, tolerance to abiotic stresses, tolerance to capsule shattering, and nutritional quality. Such exploration will help identify desirable and complementary parents and support gene discovery. Hence, rigorous phenotyping and genotyping can establish the genetic polymorphism in the germplasm pool and classify heterotic groups for ideotype breeding.
Previous studies have reported considerable phenotypic variation for agronomic and quality traits in Ethiopia's sesame genetic resources [15][16][17][18]. However, these studies did not fully represent the landrace collections from various parts of Ethiopia. Hence, a comprehensive assessment of the genetic diversity of Ethiopian sesame is needed. Such an assessment should use a larger number of accessions that represent the diverse germplasm resources sampled from various regions, and should combine phenotypic traits with effective molecular markers.
Several molecular markers are widely used in genetic diversity analysis of crop genetic resources. These include amplified fragment length polymorphism (AFLP), restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), microsatellite or simple sequence repeat (SSR), and single nucleotide polymorphism (SNP) markers. SSRs, or microsatellites, have been commonly used in genetic variation studies of sesame [19][20][21][22]. SSRs are preferred because they detect high levels of polymorphism, are highly reproducible, and are abundant across the genome [20,21]. Moreover, SSR markers are co-dominant and can detect multiple alleles per locus [23]. Wei et al. [20] and Asekova et al. [21] assessed the genetic diversity and population structure of sesame genetic resources sampled from China and Korea using 44 and 23 SSRs, respectively. They reported two and three major heterotic groups among the Chinese and Korean collections, respectively. The level of genetic diversity varies among germplasm populations and environmental conditions, so each set of populations should be assessed in the target production environment for selection and genetic grouping. Therefore, the objectives of this study were to determine the extent of genetic variation among 100 diverse sesame germplasm collections of Ethiopia using phenotypic traits and simple sequence repeat markers. A further objective was to select and recommend distinct and complementary parents for direct production, breeding, and conservation.

Plant Materials
The study used a mini-core collection of 100 sesame entries originally collected from the Amhara, Tigray, Afar, Oromia, and Gambela regions of Ethiopia. The test genotypes were obtained from the sesame and groundnut breeding program of the Werer Agricultural Research Centre of the Ethiopian Institute of Agricultural Research (EIAR). The collection comprised 95 accessions, one landrace (farmer variety), and four released varieties. The landrace variety "Hirhir" is widely cultivated by farmers in the study areas. The four released varieties (Setit-1, Setit-2, Setit-3, and Humera-1) were developed by the Humera Agricultural Research Center (HuARC) through mass selection among local germplasm collections. Details of the germplasm collections used in the study are summarized in Table 1.

Experimental Design and Trial Management
The experiment was conducted under field conditions at each site using a 10 × 10 simple lattice design with two replications. Each entry was planted in a four-row plot 4 m long, with inter-row and intra-row spacings of 0.4 m and 0.1 m, respectively. The trials were managed following standard agronomic practices for sesame production [24].
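The simple lattice layout described above can be sketched as follows. This is an illustrative construction only, not the randomization the authors used. Entries are arranged in a hypothetical 10 × 10 array. The rows form the incomplete blocks of replication I and the columns form those of replication II. Entries within blocks, and blocks within replications, are then randomized. The function name `simple_lattice` and the seed value are assumptions.

```python
import random


def simple_lattice(k=10, seed=None):
    """Return two replications of a k x k simple lattice as lists of blocks.

    Illustrative sketch: entries 1..k*k are placed in a k x k array; the rows
    give the incomplete blocks of replication I and the columns those of
    replication II (the classic X and Y groupings).
    """
    rng = random.Random(seed)
    grid = [[r * k + c + 1 for c in range(k)] for r in range(k)]
    rep1 = [list(row) for row in grid]                          # blocks = rows
    rep2 = [[grid[r][c] for r in range(k)] for c in range(k)]   # blocks = columns

    layout = []
    for blocks in (rep1, rep2):
        # Shuffle entries within each block, then the block order within the replication
        blocks = [rng.sample(block, len(block)) for block in blocks]
        rng.shuffle(blocks)
        layout.append(blocks)
    return layout


rep_1, rep_2 = simple_lattice(k=10, seed=2021)
print(len(rep_1), len(rep_1[0]))  # number of blocks and entries per block
```

Any row and any column share exactly one entry. Every pair of entries therefore occurs together in a block at most once, which is what allows intra-block adjustment of entry means.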

Phenotypic Data Collection
Data were collected on quantitative and qualitative traits. The following traits were recorded on 10 randomly selected and tagged plants during plant growth and at harvest: plant height, internode length, number of primary branches per plant, number of secondary branches per plant, number of capsules per plant, number of seeds per capsule, stem height to the first branch, and distance from the lowest branch to the first capsule. Plant height (PH) was measured from the base to the tip of the plant. Stem height from the base to the first branch (SHB) was measured from the base of the plant to the first emerged primary branch. Internode length (INL) was measured between two consecutive nodes in the middle of the plant. The number of primary branches per plant (NPB) was counted on the main stem, while the number of secondary branches per plant (NSB) was counted on the primary branches. The distance from the base of the lowest branch to the first capsule (DFLBC) was measured, in cm, from the lowest primary branch to the first emerged capsule on the main stem.
Days to flowering (DF) was recorded as the number of days from planting to the date when 50% of the plants had flowered. Days to maturity (DM) was recorded as the number of days from planting to the date when 75% of the plants reached physiological maturity. The number of capsules per plant (NCPP) was counted at harvest, and the number of seeds per capsule (NSPP) was determined from a composite of three capsules per plant. Thousand-seed weight (TSW) was measured on a random sample of 1000 seeds of each entry. Grain yield (GYH) was recorded in grams per plot and converted to tons per hectare (t ha−1).
Oil content was determined in Wuhan, China, by near-infrared spectroscopy (NIRS) using a FOSS DS2500 analyser (Hillerød, Denmark). Oil yield per hectare (OYH) was calculated in tons per hectare as grain yield multiplied by oil content (%) divided by 100.
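The unit conversions in the two paragraphs above can be sketched as below. The harvested area per plot is not reported. The sketch therefore assumes, hypothetically, the gross plot area of four rows × 0.4 m × 4 m = 6.4 m². If only the central rows were harvested, the net area should be used instead. The function names are illustrative.

```python
# Hypothetical gross plot area: 4 rows x 0.4 m inter-row spacing x 4 m row length
PLOT_AREA_M2 = 4 * 0.4 * 4.0  # 6.4 m^2 (assumption; net harvested area not reported)


def grain_yield_t_per_ha(grams_per_plot, plot_area_m2=PLOT_AREA_M2):
    """Convert plot grain weight (g) to t/ha: 1 ha = 10,000 m^2 and 1 t = 1,000,000 g."""
    return grams_per_plot / plot_area_m2 * 10_000 / 1_000_000


def oil_yield_t_per_ha(grain_t_per_ha, oil_content_percent):
    """Oil yield = grain yield x oil content (%) / 100."""
    return grain_t_per_ha * oil_content_percent / 100


gy = grain_yield_t_per_ha(640)   # 640 g from 6.4 m^2 corresponds to 1.0 t/ha
oy = oil_yield_t_per_ha(gy, 50)  # 50% oil content gives 0.5 t/ha of oil
print(round(gy, 3), round(oy, 3))
```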

Phenotypic Data Analysis
The phenotypic data were subjected to analysis of variance (ANOVA) using the alpha-lattice and general linear model (GLM) procedures of SAS version 9.4 [25]. A combined ANOVA across the two locations was performed after testing the homogeneity of error variances with Bartlett's test. Means of accessions were compared using Tukey's honestly significant difference (HSD) test at the 5% significance level (Table 4). Phenotypic correlations among traits were computed in R version 4.0 [26] to determine the magnitude of associations among the studied traits. Principal component analysis was also performed in R version 4.0 [26].
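Before the two locations are pooled, Bartlett's test compares the error variances of the individual-site analyses. A minimal sketch of the statistic follows, computed from each site's error mean square and error degrees of freedom. The mean squares are hypothetical. The 81 error degrees of freedom are an assumption, corresponding to the intra-block error of a 10 × 10 simple lattice. For two sites, the statistic is compared with the χ² critical value for 1 degree of freedom (3.841 at α = 0.05).

```python
import math


def bartlett_statistic(error_ms, error_df):
    """Bartlett's chi-square statistic for homogeneity of k error variances.

    error_ms: error mean squares from the separate analyses (one per site)
    error_df: corresponding error degrees of freedom
    The statistic is approximately chi-square with k - 1 degrees of freedom.
    """
    k = len(error_ms)
    total_df = sum(error_df)
    pooled = sum(d * s for d, s in zip(error_df, error_ms)) / total_df
    numerator = total_df * math.log(pooled) - sum(
        d * math.log(s) for d, s in zip(error_df, error_ms))
    correction = 1 + (sum(1 / d for d in error_df) - 1 / total_df) / (3 * (k - 1))
    return numerator / correction


CHI2_CRIT_1DF_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05

# Hypothetical error mean squares for grain yield at the two sites
stat = bartlett_statistic([0.010, 0.012], [81, 81])
print(f"Bartlett chi-square = {stat:.3f}; homogeneous: {stat < CHI2_CRIT_1DF_05}")
```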

DNA Extraction, Primer Selection, Polymerase Chain Reaction, and Electrophoresis
The 100 sesame entries (Table 1) were grown at the Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences (OCRI-CAAS), China. Ten seeds per entry were sown in a plastic tray in a growth room. Three two-week-old seedlings were randomly selected from each entry, and fresh young leaves were collected and ground in liquid nitrogen for DNA extraction. DNA was extracted following the cetyltrimethylammonium bromide (CTAB) method. Approximately 200 mg of ground tissue was mixed with 500 µL of CTAB buffer and incubated in a water bath at 65 °C four times for 10 min. The mixture was then centrifuged at 12,000 rpm for 10 min at 4 °C. The supernatant was transferred to new 5 mL microtubes, and 400 µL of chloroform:isoamyl alcohol (24:1) was added and mixed gently. After centrifugation at 12,000 rpm for 10 min at 4 °C, the supernatant was transferred to new 5 mL microtubes. Then 400 µL of isopropanol was added and mixed gently, and the tubes were kept at −20 °C for 30 min before centrifugation at 12,000 rpm for 10 min at 4 °C. The precipitated DNA was washed three times with 75% ethanol. The resulting pellet was dried under vacuum and dissolved in 100 µL of double-distilled water (ddH2O). DNA concentrations were measured using a Quantus™ Fluorometer (Promega Corporation, Madison, USA). SSR primers designed from 13 linkage groups were used in the subsequent experiments. The 27 primers were selected from 160 candidates for three reasons: their higher polymorphic information content, their suitability for discriminating sesame genotypes, and the clear and informative amplicon profiles they produced in earlier sesame genetic analyses [27].
The polymerase chain reaction (PCR) conditions were as follows. Each reaction was carried out in a 20 µL volume containing 25 ng of DNA, 4 µmol of forward primer, 4 µmol of reverse primer, 1 × buffer, 0.25 mmol of dNTPs, and 0.80 U of Taq polymerase. Each amplification cycle comprised denaturation at 94 °C for 1 min, primer annealing at 45.2–53 °C for 1 min, and elongation at 72 °C for 1 min. After 34 cycles, the reaction was terminated with a final extension at 72 °C for 10 min.

Genotypic Data Analysis
Fragment sizes were determined using an ABI 3730 automatic sequencer. Data were analysed with GeneMarker v2.2.0, using peak detection thresholds ranging from a minimum intensity of 500 to a maximum intensity of 30,000. Bands amplified by the 27 primers were sized based on these thresholds and scored as 1 for presence and 0 for absence. Genetic parameters, including major allele frequency (MAF), observed heterozygosity (Ho), expected heterozygosity (He), and polymorphic information content (PIC), were calculated using PowerMarker v3.2. Cluster analysis was carried out in R version 4.0 [26] using the neighbor-joining (NJ) algorithm and the unweighted pair group method with arithmetic mean (UPGMA).
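The per-locus statistics named above follow standard formulas. The sketch below computes them from co-dominant genotype calls (pairs of allele sizes) rather than from the 1/0 band matrix described above. It therefore illustrates the definitions rather than reproducing the PowerMarker workflow. The statistics are major allele frequency, observed heterozygosity, Nei's unbiased gene diversity, and PIC following Botstein et al. The allele sizes in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_summary(genotypes):
    """Summary statistics for one SSR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual (diploid calls).
    """
    n_ind = len(genotypes)
    alleles = [a for pair in genotypes for a in pair]
    n_alleles = len(alleles)  # 2 * n_ind
    freqs = [count / n_alleles for count in Counter(alleles).values()]

    homozygosity = sum(p * p for p in freqs)
    # Nei's unbiased gene diversity: 2N / (2N - 1) * (1 - sum p^2)
    gene_diversity = n_alleles / (n_alleles - 1) * (1 - homozygosity)
    # PIC (Botstein et al.): 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2
    pic = 1 - homozygosity - sum(2 * (p ** 2) * (q ** 2) for p, q in combinations(freqs, 2))
    observed_het = sum(1 for a, b in genotypes if a != b) / n_ind

    return {"MAF": max(freqs), "Ho": observed_het, "He": gene_diversity, "PIC": pic}


# Hypothetical bi-allelic locus scored on four individuals
stats = locus_summary([(150, 152), (150, 152), (150, 150), (152, 152)])
print({name: round(value, 3) for name, value in stats.items()})
```

For a bi-allelic locus, PIC cannot exceed 0.375, which is reached when both alleles have a frequency of 0.5. This bound is consistent with the maximum PIC of 0.37 reported for these markers.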
The population structure of the 100 sesame accessions was investigated with the Bayesian clustering method implemented in STRUCTURE version 2.3.4 [28]. Both the burn-in period and the number of Markov chain Monte Carlo (MCMC) iterations were set to 20,000 [29]. To estimate the number of populations accurately, ten runs were performed for each K value (assumed number of subpopulations) from 1 to 10. ΔK values were then calculated, and the most likely K was determined with the ΔK method of [29] using CLUMPAK. Principal coordinate analysis in DARwin version 6 was also used to infer the genetic structure of the genotypes.
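The ΔK criterion above selects the K at which the log-likelihood curve changes most sharply. The sketch below assumes consecutive K values and replicate lnP(D) values taken from STRUCTURE output. It uses the commonly implemented mean-based form of the second-order rate of change, divided by the standard deviation of lnP(D) at that K. The lnP(D) values in the example are invented for illustration and are not results of this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)).

    lnp_by_k maps each K (consecutive integers) to the lnP(D) values of its
    replicate STRUCTURE runs. Delta K is undefined for the smallest and largest K.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


# Invented lnP(D) values (three replicate runs per K) purely for illustration
runs = {k: [m - 1, m, m + 1] for k, m in
        {1: -6000, 2: -5500, 3: -5200, 4: -4900, 5: -4880, 6: -4870}.items()}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, "best K =", best_k)
```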

Genetic Variation and Mean Performance of Sesame Accessions
Combined ANOVA revealed a significant (p ≤ 0.05) entry × environment interaction for plant height, internode length, number of primary branches, number of secondary branches, distance from the base of the lowest branch to the first capsule, and grain yield per hectare (Table 3). Entries differed significantly (p ≤ 0.05) for days to 50% flowering, days to 75% maturity, plant height, internode length, number of secondary branches, number of seeds per capsule, distance from the base of the lowest branch to the first capsule, and grain yield per hectare. Based on grain yield, the top 10 and the bottom five accessions are summarized in Table 4. The mean grain yield across locations was 0.48 t ha−1, and the mean thousand-seed weight was 2.9 g. The highest grain yield was recorded for entries such as Hirhir Kebabo Hairless-9 (1.01 t ha−1), Setit

Correlations of Yield and Yield Components
Phenotypic correlation coefficients for the studied traits are presented in Table 5. Grain yield was significantly and positively correlated with oil yield (r = 0.99; p < 0.01). Significant and positive correlations were also observed between grain yield and internode length (r = 0.35; p < 0.01), number of secondary branches (r = 0.21; p < 0.01), number of capsules per plant (r = 0.18; p < 0.01), number of seeds per capsule (r = 0.17; p < 0.01), stem height from base to 1st branch (r = 0.16; p < 0.01), and thousand-seed weight (r = 0.23; p < 0.01).
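The section does not state which correlation coefficient was used. Assuming the coefficients in Table 5 are Pearson product-moment correlations, each r can be tested against zero with a t statistic on n − 2 degrees of freedom. The sketch below shows both calculations. The sample size behind Table 5 is not reported here, so the n in the example is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical example: r = 0.5 from n = 11 paired observations
print(round(t_for_r(0.5, 11), 4))
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

The same r becomes more significant as n grows. A weak correlation such as r = 0.16 can therefore reach p < 0.01 when many plots are analysed, even though it accounts for under 3% of the variation in grain yield (r² ≈ 0.026).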

Principal Component Analysis
Principal component analysis (PCA) was performed to determine each trait's contribution to the overall observed variation, and a scree plot was generated to visualize the principal components. Four principal components had eigenvalues >1, of which PC1 and PC2 explained the largest proportions of the total variance (Figure 1). PC1 explained 19.9% of the total variation, with oil yield per hectare (OYH) and grain yield (GYH) contributing most to it. PC2 accounted for 15.9% of the total variation, with DM, DF, DFLBC, and NPB as the most influential traits.
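The PCA workflow above can be sketched with NumPy as follows. It covers standardized traits, retention of components with eigenvalues >1, and the proportion of variance per component. This is an illustration under stated assumptions, not the authors' R script. It performs PCA on the trait correlation matrix, which is equivalent to PCA on standardized traits, and uses simulated data.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = entries, columns = traits).

    Returns eigenvalues (descending), eigenvectors, proportion of variance
    explained, and the number of components with eigenvalue > 1 (Kaiser criterion).
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)          # trait-by-trait correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()       # eigenvalues sum to the number of traits
    n_retained = int(np.sum(eigvals > 1.0))
    return eigvals, eigvecs, explained, n_retained


# Simulated data: 100 entries x 6 traits, with two strongly related traits
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)   # a tightly correlated pair, like GYH and OYH
eigvals, eigvecs, explained, n_retained = pca_correlation(X)
print(np.round(explained * 100, 1), "retained:", n_retained)
```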

Genetic Polymorphism of the SSR Markers
The summary statistics for the SSR markers are presented in Table 6. The major allele frequency per locus ranged from 0.52 to 0.96, with a mean of 0.78. Observed heterozygosity varied from 0.08 to 0.96, with a mean of 0.43. The unbiased expected heterozygosity (gene diversity) of the markers ranged from 0.08 to 0.50, with a mean of 0.30. PIC values ranged from 0.07 (markers ID0041, ID0175, and ZMM2818) to 0.37 (ZMM3261 and ZMM1189), with a grand mean of 0.25.

Genotypes assigned to population IV were also early maturing, with taller plants. Some accessions in this group also had a high number of seeds per capsule (Setit-3 and NN-0020), higher thousand-seed weight (3.4 g), higher seed and oil yields (0.84 and 0.40 t ha−1, respectively), and higher oil content (54.7%). To develop new breeding populations with desirable economic traits, crosses could be made between selected parents. Hence, accessions Orofalc ACC-2 (population II), Hirhir Filwha Large Seeded (population III), and Setit-3 and Hirhir Humera Sel-6 (population IV) are ideal candidates with complementary traits for production and further breeding. However, the principal coordinate analysis assigned the 100 genotypes to admixed groups with an inconclusive structure (Figure 2c).

Cluster Analysis of 100 Sesame Accessions
The cluster analysis of the 100 sesame genotypes resolved two clusters, each further partitioned into two sub-clusters (Figure 3). Cluster I consisted of 49 accessions and one improved variety, sourced from Amhara (37 accessions), Tigray (5 accessions and one improved variety), Afar (6 accessions), and Oromia (1 accession). Cluster II contained 50 diverse genotypes, of which 28 accessions were from Amhara, and 13 accessions, one landrace, and three improved varieties were from Tigray,

Genotypic Variation and Mean Performance for Seed and Oil Yields, and Yield-Component Traits
Assessment of genetic diversity among crop genetic resources is essential to identify candidate accessions possessing desirable traits, including yield and quality attributes. The current study evaluated the genetic variation present among 100 accessions of sesame through rigorous field phenotyping and polymorphic SSR markers as a preliminary step to select genetically complementary parental accessions for breeding.
The test genotypes showed significant (p ≤ 0.05) variation for grain yield and yield components (Table 3). This suggests that the germplasm pool carries useful phenotypic variation for sesame improvement through hybridization and selection. The test genotypes were sourced from five historically sesame-growing regions in Ethiopia. Given the long history of agriculture and sesame production in the collection areas, the test genotypes have likely adapted and evolved under local conditions through natural selection. This may have caused the genetic differentiation observed among the accessions for grain and oil yields and important yield-contributing agronomic traits. For example, the present study identified sesame genotypes such as Hirhir Kebabo Hairless-9 and Setit-3 with high grain yields (>0.8 t ha−1) and high oil yields (0.40 t ha−1). These genotypes, locally referred to as Humera types, are known for their distinctive aroma and taste [30]. Their grain yields exceeded the national mean of 0.68 t ha−1 currently obtained in Ethiopia with traditional varieties.

Traits Associations
Sesame seed and oil yields are low in Ethiopia due to a lack of high-yielding varieties. This results in low financial returns for producers and processors across the sesame value chain. To improve selection response and genetic gains for economic traits, sesame improvement programs may target highly heritable yield-contributing traits associated with seed and oil yields. The strong positive correlation between seed and oil yields among the studied genotypes implies that both traits could be improved simultaneously in the present population. In contrast, grain yield showed weak correlations with yield-related traits, including internode length, number of secondary branches, number of capsules per plant, stem height from base to first branch, and thousand-seed weight. These weak correlations suggest that indirect selection through these traits would give a low response in grain yield.
Similarly, oil yield showed weak correlations with internode length, number of secondary branches per plant, and thousand-seed weight, implying a low response in oil yield to selection on these traits. Oil content was poorly associated with agro-morphological traits, which hinders indirect selection for oil content. Despite these weak associations, the present study revealed wide phenotypic variation among the studied sesame populations for several traits. This variation is valuable for future phenotypic analysis, selection, and sesame improvement in Ethiopia. Moreover, the assessed germplasm was diverse for seed and oil yields and oil content. This diversity aided the identification of genotypes with high seed and oil yields, such as Hirhir Kebabo Hairless-9, Setit-3, Orofalc, Hirhir Humera Sel-6, and Setit-1, as useful germplasm for designing and developing improved cultivars. Furthermore, genotypes with relatively higher oil content, including Hirhir Humera Sel-6, Setit-1, ACC 205-180, and Orofalc ACC-2, are suitable candidates for developing new breeding populations with higher oil yield and oil content.
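The link between weak trait associations and low indirect selection response can be made explicit with the standard quantitative-genetics expression for the correlated response. The expression is shown here only as an illustration. It requires heritabilities and an additive genetic correlation, which are not reported in this section, so the phenotypic correlations above serve only as a rough guide to its magnitude.

```latex
\mathrm{CR}_Y = i \, h_X \, h_Y \, r_A \, \sigma_{P_Y}
```

Here, i is the standardized selection intensity applied to the selected trait X, and h_X and h_Y are the square roots of the narrow-sense heritabilities of X and the target trait Y (e.g. grain or oil yield). The term r_A is the additive genetic correlation between X and Y, and σ_{P_Y} is the phenotypic standard deviation of Y. When r_A is small, CR_Y stays small regardless of how intensely X is selected.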
The traits that contributed most to the first two PCs (OYH and GYH on PC1; DM, DF, DFLBC, and NPB on PC2) are likely to be important for selection. Nevertheless, 53.4% of the total variation was not accounted for by the retained principal components, probably because of the limited number of test locations used in the study. Hence, the test accessions need to be assessed across multiple environments and with effective molecular markers to complement the phenotypic data.

Genetic Diversity and Population Structure of Sesame Germplasm Based on SSR Markers
SSR markers are useful genomic resources for complementing phenotypic data in effective selection. The present study recorded a mean major allele frequency per locus of 0.78 (Table 6). This was much higher than the values of 0.41 and 0.17 reported by [21,31], who used 23 and 21 SSRs on 129 Korean and 25 Ghanaian sesame genotypes, respectively. Variation in allele frequency is attributable to genotypic differences and to the number of SSR markers used in the analysis [32][33][34]. The mean observed heterozygosity of 0.43 in the present study is lower than the value of 0.56 reported by [31] for 25 sesame genotypes assessed with 21 SSR markers. It is higher than the values of 0.23, 0.01, and 0.12 reported by [19,21,22] for 50, 129, and 36 sesame genotypes assessed with 10, 23, and 10 SSR markers, respectively. The mean expected heterozygosity recorded in the present study (He = 0.30; Table 7) was lower than the values of 0.72 and 0.34 reported by [21,22]. Those studies evaluated 129 and 36 sesame accessions with 23 and 10 SSR markers, respectively. The moderate heterozygosity recorded in the present study suggests that the Ethiopian sesame populations carry considerable genetic variation for selection.
Population structure analysis confirmed this genetic variability, revealing four distinct populations comprising genotypes collected from different regions of Ethiopia. Most of the released varieties (Humera-1, Setit-1, and Setit-3) were grouped in subpopulation IV. Wei et al. [20] and Asekova et al. [21] reported two and three populations among 94 Chinese and 129 Korean sesame accessions using 44 and 23 SSR markers, respectively. Population I, comprising accessions collected from the Amhara, Tigray, Afar, and Oromia regions, had a higher fixation index of 0.39. This suggests greater genetic differentiation, which may reflect limited gene flow among these accessions. Conversely, population III, comprising accessions sourced from the Amhara and Tigray regions, had a low fixation index, indicating low differentiation. This may be due to gene flow through germplasm exchange between collection sources. Such exchange of planting material regardless of geographical distance may account for the low degree of differentiation observed in some sesame populations in the current study.
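The relationship between gene flow and differentiation discussed above can be illustrated with Nei's G_ST, which measures the share of total gene diversity that lies between subpopulations. This is a different estimator from the per-cluster F values reported by STRUCTURE, which are estimated within its Bayesian model. The sketch is only a generic, single-locus illustration with hypothetical, equally weighted allele frequencies.

```python
def nei_gst(subpop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus.

    subpop_freqs: list of dicts mapping allele -> frequency, one dict per
    subpopulation; subpopulations are weighted equally.
    """
    n = len(subpop_freqs)
    alleles = set().union(*subpop_freqs)
    # Mean within-subpopulation gene diversity
    h_s = sum(1 - sum(p * p for p in f.values()) for f in subpop_freqs) / n
    # Gene diversity of the pooled population from mean allele frequencies
    p_bar = [sum(f.get(a, 0.0) for f in subpop_freqs) / n for a in alleles]
    h_t = 1 - sum(p * p for p in p_bar)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Similar allele frequencies, as expected under high gene flow, give G_ST near 0
similar = nei_gst([{"A": 0.6, "B": 0.4}, {"A": 0.5, "B": 0.5}])
# Groups fixed for different alleles are fully differentiated (G_ST = 1)
fixed = nei_gst([{"A": 1.0}, {"B": 1.0}])
print(round(similar, 3), round(fixed, 3))
```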
Cluster analysis identified two major clusters and four sub-clusters, revealing genetic variation among the assessed sesame entries (Figure 3). Asekova et al. [21] similarly grouped 129 sesame genotypes into two clusters using 23 SSR markers. In the present study, the clustering pattern did not correspond to the predefined grouping by collection region. This may be because genotypes gathered from different regions belong to the same gene pool or share ancestry [35]. Conversely, William et al. [36] reported that genetic dissimilarity among test genotypes could arise from diverse ancestral origins, high gene flow through cross-pollination, and possible gene or chromosomal mutations. In this study, some genotypes collected from different regions were grouped in the same cluster. Examples are Hirhir Kebabo Hairless Sel-6 (Tigray) with Gojam Azene (Yohans Sel-1) (Amhara), and ACC-NS-007(2) (Oromia) with GA-002(3) (Gambela); these pairs were found in clusters I and II. In agreement with the current study, Zhang et al. [37] reported that geographical separation did not affect genetic distance among 24 sesame genotypes. Ganesamurthy et al. [38] likewise reported that geographical separation does not affect the genetic differentiation of germplasm. Therefore, geographical origin is not necessarily a key indicator of genetic diversity among germplasm collections. The exchange of genetic materials among farmers and traders in the regions contributes to high gene flow and a lack of genetic differentiation. Barnaud et al. [39] suggested that farmers' selection and management practices affect genetic diversity patterns.
To develop new breeding populations with desirable agronomic traits, especially high grain and oil yields, crosses could be made between distantly related and complementary genotypes selected from different clusters. For instance, the following entries were selected for improved grain and oil yields: Setit-3, Orofalc ACC-2, Hirhir Humera Sel-6, ACC-NS-007(2), Hirhir Kebabo Hairless-9, and ACC 205-180. These genotypes are located in sub-clusters II-a and II-b, both of which contained candidates with excellent grain and oil yields.
In conclusion, the current study determined the extent of genetic variation among 100 diverse sesame germplasm collections of Ethiopia using phenotypic traits and simple sequence repeat (SSR) markers to select distinct and complementary parents for breeding. The test genotypes exhibited significant phenotypic variation for key agronomic traits, including grain yield, oil content, and oil yield, underpinned by their genetic diversity. Model-based population structure analysis differentiated the genotypes into four major populations. The moderate heterozygosity and fixation indices among the accessions suggest that the accessions form distinct heterotic groups desirable for breeding. Based on wide genetic divergence, the following genotypes were selected for use in future sesame breeding programs. From sub-cluster II-a: Hirhir Humera Sel-6, Setit-3, Hirhir Kebabo Hairless Sel-4, Hirhir Nigara 1st Sel-1, Humera-1, Orofalc ACC-2, and Hirhir Kebabo Early Sel-1. From sub-cluster II-b: Hirhir Kebabo Hairless-9, NN-0029(2), NN0068-2, Hirhir Filwha Large Seeded, and Bawnji Fiyel Kolet. Progenies should be developed among the selected parents and evaluated in the field with combining ability analysis to establish heterotic groups for sesame pre-breeding.

Institutional Review Board Statement:
The field studies were conducted using sesame genetic resources in compliance with the guidelines of the Ethiopian Institute of Agricultural Research (EIAR). The sesame genetic resources were kindly supplied by the Ethiopian Biodiversity Institute (EBI) for exclusive use in this research.

Informed Consent Statement: Not applicable.
Data Availability Statement: All data generated or analysed during this study are included in this published article.