Investigating Genetic Diversity and Population Structure in Rice Breeding from Association Mapping of 116 Accessions Using 64 Polymorphic SSR Markers

Abstract: Genetic variability plays a crucial role in rice breeding programs. Through association mapping, it provides a pool of superior alleles governing better agronomic and quality characters. Before associations between alleles and traits can be established, the population structure and the genetic relationships among rice lines must be understood. In the present investigation, the genetic diversity and population structure of 116 rice accessions were studied using 64 polymorphic SSR markers to understand the genetic relatedness and diversity among them. Genotyping with the SSR markers revealed a total of 225 alleles, with an average PIC value of 0.755. Population structure analysis, using both model- and distance-based approaches, classified the germplasm lines into three distinct subgroups. AMOVA showed that 11% of the total variation could be attributed to differences between groups, while the remaining 89% was likely due to differences within groups. This study suggests that population structure and genetic relatedness should be considered when establishing marker-trait associations for association mapping with a core collection of germplasm lines.


Introduction
Rice (Oryza sativa L.) is an important staple crop grown in more than 100 countries, primarily in Asia, and consumed by more than half of the global population to fulfil calorific requirements [1,2]. The consumption of rice is expected to reach approximately 800-900 million tons (mt) by 2025, compared with the current production of 516 mt on a milled rice basis [3]. The gains in production and productivity, attributed to the rich and accessible genetic diversity of Indian germplasm, have reached a saturation level and are stagnating. Targeted traits now need to be explored thoroughly on the basis of genetic principles so that they can be manipulated to improve yield, resistance to biotic and abiotic stresses, and quality parameters. Quantitative trait loci (QTL) mapping, through either linkage mapping or LD mapping, is a widely used approach for identifying the genetic basis of important agronomic traits. To ensure global food security by developing improved rice cultivars with better tolerance to diseases and abiotic stresses such as drought, flooding, and salinity, specific traits need to be mapped and utilized in breeding programs [4].
The effects of climate change on the Earth's surface and atmosphere include increased temperatures, uneven precipitation, floods, and submergence [5]. Several quantitative trait loci (QTLs) for submergence tolerance have been identified in different populations [6][7][8][9][10][11]. Boosting rice yield while maintaining excellent quality requires a careful process that accounts for the constantly growing population and the changing climate, with abiotic factors such as drought, salinity, temperature, and pollution reducing rice productivity. Breeders are interested in genetically enhancing rice productivity while retaining desirable nutritional quality attributes [12]. The availability of genetic variability, and the awareness of its importance, play crucial roles in every genetic improvement program, both for ensuring responsible use and for selecting effective breeding tactics [13]. The extent of genetic variability and the heritability of the advantageous traits determine the overall effectiveness of a breeding program. The diverse gene pool of rice accessions gives breeders the chance to pick out desired features and combine them in novel ways.
Numerous methods are available to examine genetic diversity at both the genotypic and phenotypic levels. One of the best ways to examine genotypic diversity in rice is the use of molecular markers. These markers can identify significant differences among accessions at the DNA level, making them an effective tool for characterizing accessions and their genetic make-up. Among several techniques, SSR is one of the most popular, effective, and reasonably priced for the genetic characterization of germplasm. SSR markers are known for their co-dominant and locus-specific nature, as well as their high level of allelic diversity, relative abundance of polymorphism, and wide distribution across the genome. Consequently, SSR markers have proven to be effective in establishing genetic links [14,15]. Because they are multi-allelic and highly polymorphic, SSR markers can capture a broad spectrum of genetic diversity even when used in small numbers. These markers are commonly used to investigate genetic variation among closely related rice accessions [16].
To ensure accurate association mapping in a population, it is crucial to ascertain the population structure. This step reduces type I and II errors resulting from uneven allele frequency distribution between subgroups, which may lead to false associations between molecular markers and the trait of interest [17]. Recent efforts have been made to define the population structure in various crops, including rice, using diverse germplasm lines, including the development of core collections from national and international collections [18][19][20][21][22][23]. Previous studies utilized SSR markers alone [19,24-27] or in conjunction with SNP markers [28,29] for similar investigations. The present study aims to evaluate the genetic variation and examine the population structure of 116 rice germplasm accessions, including local landraces, improved varieties, and exotic lines from diverse origins. This study will provide insights into the relatedness of individuals based on this genetic information and will help classify genotypes by similarity patterns in the panel of rice genotypes for marker-trait associations, specifically for submergence tolerance and other agronomic traits.

Plant Material and DNA Extraction
In this study, a collection of 116 rice genotypes was utilized. The experimental work was conducted at the Crop Physiology Experimental Plot, while molecular analysis was performed at the PG Lab, Department of Plant Molecular Biology and Genetic Engineering, Acharya Narendra Dev University of Agriculture and Technology, Ayodhya, Uttar Pradesh, India. For the molecular studies, leaves of one-month-old plants were collected, and total genomic DNA was isolated using the CTAB method [30]. Briefly, the leaf samples were ground in liquid nitrogen and mixed with pre-heated 2% extraction buffer (20 mM EDTA, 1.5 M NaCl, 100 mM Tris-HCl, 2% CTAB, and 1% β-mercaptoethanol). The mixture was treated with chloroform:isoamyl alcohol (25:1), 100 mg/mL RNase, and 70% ethanol. Subsequently, it was incubated in a water bath at 65 °C for 45 min with gentle shaking in between. The resulting pellet was dissolved in 1X TE buffer. The quality of the extracted genomic DNA was assessed on a 0.8% agarose gel, and the DNA was quantified using a NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE, USA). The DNA was then diluted to 20 ng/µL in TE buffer for PCR amplification.
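The dilution to the 20 ng/µL working concentration follows the usual C1V1 = C2V2 relationship. The sketch below is illustrative only: the stock concentration of 250 ng/µL and the final volume of 100 µL are assumed values, not measurements from this study, and the function name is hypothetical.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Solve C1 * V1 = C2 * V2 for the stock volume V1 and the buffer volume."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_volume_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    buffer_volume_ul = final_volume_ul - stock_volume_ul
    return stock_volume_ul, buffer_volume_ul


# Assumed spectrophotometer reading of 250 ng/µL, diluted to 20 ng/µL in 100 µL.
stock_ul, buffer_ul = dilution_volumes(250.0, 20.0, 100.0)
print(f"{stock_ul:.1f} µL stock + {buffer_ul:.1f} µL TE buffer")
```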

SSR Genotyping and Data Analysis
To investigate rice diversity, a set of 64 highly polymorphic SSR primers was selected from the website "https://archive.gramene.org/markers/microsat/50ssr.html (accessed on 8 July 2023)". To assess the amplification and suitability of each primer for genotyping the remaining accessions, four genomic DNA samples were initially amplified using 30 SSR primers. PCR amplification was conducted in a 10 µL reaction volume consisting of 20 ng DNA, 1X PCR master mix (GeNei Labs, Bengaluru, India), and 5 pmol each of the forward and reverse primers. Amplification was carried out in a C1000 thermal cycler (Bio-Rad Laboratories Inc., Berkeley, CA, USA) under the following conditions: pre-denaturation at 95 °C for 5 min, followed by 39 cycles of denaturation at 95 °C for 30 s, annealing at 53-58 °C (specific to each primer) for 45 s, and extension at 72 °C for 1 min, with a final extension at 72 °C for 10 min. A standard molecular weight size marker, the 100 bp DNA ladder (GeNei Labs, Bengaluru, India), was used to determine the size of the most intensely amplified bands for each microsatellite marker, based on the estimated product size listed on the GRAMENE website.
An allele score was assigned based on the presence of an allele of a given size in each germplasm sample. The presence of an allele was scored as 1 and its absence as 0, and the scores were manually rechecked. The SSR genotyping results were recorded both as allele sizes and as a binary (0-1) matrix. The allelic data were analyzed using PowerMarker software to calculate various genetic parameters, including the polymorphic information content (PIC) value, major allele frequency, number of alleles per locus, and gene diversity (expected heterozygosity) [31]. Using DARwin software (version 6.0.021), a distance matrix based on the Jaccard similarity coefficient was calculated from the binary data matrix [32]. The resulting distance matrix was used to build a neighbor-joining dendrogram with 1000 bootstraps. Genetic relatedness and diversity estimates were analyzed using average pairwise divergence (π) and segregating sites through the unweighted pair group method with arithmetic mean (UPGMA) in TASSEL 5.0 software.
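The per-locus statistics and the Jaccard-based dissimilarity described above can be sketched in a few lines. This is a minimal illustration using the textbook formulas for gene diversity (He = 1 - Σp²) and PIC; it is not the PowerMarker or DARwin implementation, whose estimators may include corrections not shown here. The allele sizes and band vectors are hypothetical.

```python
from collections import Counter


def allele_frequencies(calls):
    """Relative frequency of each allele among non-missing calls (None = missing)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        raise ValueError("no non-missing allele calls")
    counts = Counter(observed)
    return {allele: n / len(observed) for allele, n in counts.items()}


def gene_diversity(freqs):
    """Expected heterozygosity: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    cross = sum(2 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    return 1.0 - sum(x * x for x in p) - cross


def jaccard_dissimilarity(a, b):
    """1 - shared bands / bands present in either accession, for 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical allele sizes (bp) at one SSR locus for ten accessions.
locus_calls = [150, 150, 150, 150, 150, 160, 160, 160, 170, 170]
freqs = allele_frequencies(locus_calls)
major_allele_frequency = max(freqs.values())
he = gene_diversity(freqs)
pic_value = pic(freqs)

# Hypothetical binary band scores for two accessions across five alleles.
distance = jaccard_dissimilarity([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
print(major_allele_frequency, round(he, 4), round(pic_value, 4), distance)
```

PIC is never larger than gene diversity for the same locus, because the cross term is subtracted from He.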

Genetic Variability
Genetic variability was analyzed for agronomically important traits, namely, seedling vigor (SV), days to 50% flowering (DFF), plant height (PH), panicle length (PL), number of spikelets per panicle (SPP), biological yield per plant (BYP), and harvest index (HI%). Data were collected through field observations throughout the growth cycle of the rice plants, and measurements followed standard agronomic practices to ensure the reliability of the experiment. For instance, SV was assessed based on the overall health and robustness of the seedlings during the early stages of growth. DFF was recorded as the number of days from planting to when 50% of the plants began flowering. Similarly, PH, PL, SPP, BYP, and HI were all measured according to established protocols. These measurements were taken from multiple plants to ensure representative sampling and an accurate assessment of genetic variability.

Structure Analysis
The software STRUCTURE v 2.3.3 was employed to conduct Bayesian clustering and determine the number of subpopulations within the accessions, following the method of Pritchard et al. [17]. An admixture model with independent allele frequencies was used for the STRUCTURE analysis. The number of assumed populations (K) was varied from 2 to 10, and, for each K value, three independent runs were performed. Each run consisted of a burn-in period of 30,000 followed by 100,000 iterations. The optimal value of K was determined using the Delta K statistic and L(K), as described by Evanno et al. [33], and analyzed using STRUCTURE HARVESTER [34]. GenAlEx 6.5 was used to compute various genetic parameters, including the number of observed alleles (Na), and to perform the analysis of molecular variance (AMOVA) [35][36][37].
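The Delta K statistic can be illustrated with a short sketch. It assumes one common reading of the Evanno formula, in which the absolute second-order difference of the mean L(K) across replicate runs is divided by the standard deviation of L(K). The log-likelihood values below are invented for illustration and are not output from this study; the function name is hypothetical.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Delta K per K from a dict mapping K to the L(K) values of replicate runs.

    Delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    It is undefined for the smallest and largest K tested.
    """
    ks = sorted(log_likelihoods)
    result = {}
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        if k - prev_k != 1 or next_k - k != 1:
            continue  # need consecutive K values for the second difference
        sd = stdev(log_likelihoods[k])
        if sd == 0:
            continue  # avoid division by zero when replicate runs are identical
        second_diff = abs(mean(log_likelihoods[next_k])
                          - 2 * mean(log_likelihoods[k])
                          + mean(log_likelihoods[prev_k]))
        result[k] = second_diff / sd
    return result


# Invented L(K) values for three replicate runs at each K (illustration only).
runs = {
    2: [-5200, -5210, -5190],
    3: [-4800, -4805, -4795],
    4: [-4700, -4720, -4680],
    5: [-4650, -4690, -4610],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

With only three runs per K, the standard deviation in the denominator is estimated from very few values, so the height of the peak should be read with some caution.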

Allelic Diversity and SSR Marker Informativeness
A total of 116 rice germplasm lines were genotyped using 64 SSR (microsatellite) markers, resulting in the identification of 225 alleles (Table 1). Among these alleles, 5% were classified as rare, with an allele frequency of less than 5%. The number of alleles per locus ranged from two to eight, with an average of 3.57 alleles per locus. The loci RM154 and RM7200 had the highest number of detected alleles (eight), while a group of markers, including RM422, RM1807, RM510, RM121, RM427, RM7, RM118, RM408, RM284, RM433, RGNMS3189, RM415, RM277, HVSSR12-43, and HVSSR12-44, exhibited the lowest number (two). The average polymorphic information content (PIC) value, which represents the relative informativeness of each marker, was 0.747 in this study. Landraces showed the highest genetic diversity, with PIC values ranging from 0.495 for RM162 to 0.984 for RGNMS3228.
The expected heterozygosity or gene diversity (He), calculated according to reference [38], ranged from 0.017 (RM408) to 0.868 (RM7200), with an average value of 0.421 (Table 1). Figure 1 presents summary statistics of these parameters, with allelic diversity for each marker ranging from two to eight. Markers with a higher number of alleles indicate greater genetic variability within the rice accessions. Mean, minimum, and maximum values are calculated only for numeric features. Mode indicates the most common value for numeric or categorical features of the analyzed parameters. Dispersion indicates the coefficient of variation for numeric features and entropy for categorical features.
Crops 2024, 4

Chromosomal Distribution and Molecular Weight Analysis of SSR Markers
A scatter plot was made showing the relationship between chromosome numbers and the maximum and minimum molecular weights (Figure 2). Color-coded regions on the plot align with chromosome projections, as well as markers (Figure 2A,B) and SSR motifs (Figure 2C,D) for each chromosome. An inset in the figure presents a legend. Each marker is associated with specific information, including its name, chromosome location, SSR motif, and minimum and maximum molecular weights. The SSR motifs are diverse and consist of various repeats, such as (GA), (CTG), (GATA), (TCAC), (AG), (AT), and others. The color-coded regions on each chromosome show the variation in length and composition of these motifs, which contributes to the observed genetic diversity.

Genetic Variability
A wide range of variability was observed among all the traits studied. The magnitude of the phenotypic coefficient of variation (PCV) was higher than that of the genotypic coefficient of variation (GCV) for all the traits (Table 2). Seed vigor (26.98%) and panicle length (21.76%) showed high magnitudes of PCV (>20%). Biological yield per plant (19.82%), plant height (12.70%), and harvest index (10.36%) showed moderate magnitudes of PCV. These traits also had similar magnitudes of GCV. Days to 50% flowering and spikelets per panicle exhibited low magnitudes of both PCV and GCV (<10%).
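The PCV and GCV classes used above can be illustrated with a short sketch. The variance components are derived here from ANOVA mean squares under the assumption of a replicated design with r replications, which the text does not specify; the mean squares, the number of replications, and the grand mean are invented values, and the 10% and 20% cut-offs follow the low, moderate, and high classes used in this section.

```python
from math import sqrt


def coefficients_of_variation(ms_genotype, ms_error, replications, grand_mean):
    """Return (GCV, PCV) in percent from ANOVA mean squares.

    Assumes genotypic variance = (MS_genotype - MS_error) / r and
    phenotypic variance = genotypic variance + MS_error.
    """
    var_g = max((ms_genotype - ms_error) / replications, 0.0)
    var_p = var_g + ms_error
    gcv = 100 * sqrt(var_g) / grand_mean
    pcv = 100 * sqrt(var_p) / grand_mean
    return gcv, pcv


def classify(cv_percent):
    """Low (<10%), moderate (10-20%), or high (>20%) magnitude."""
    if cv_percent > 20:
        return "high"
    if cv_percent >= 10:
        return "moderate"
    return "low"


# Invented mean squares for one trait (illustration only).
gcv, pcv = coefficients_of_variation(ms_genotype=58.0, ms_error=10.0,
                                     replications=3, grand_mean=40.0)
print(round(gcv, 2), round(pcv, 2), classify(gcv), classify(pcv))
```

Because the phenotypic variance adds the error variance to the genotypic variance, PCV is never smaller than GCV under these formulas; a narrow gap between the two suggests a limited environmental contribution.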

Distinct Subgroup Identification through Population Structure
The population structure of the 116 germplasm lines was assessed through a Bayesian-based approach. This analysis involved estimating membership fractions for a range of values of "k", spanning from 1 to 9 (Figure 1). The log likelihood obtained from the structure analysis pointed to an optimal value of three (k = 3). Similarly, the ad hoc measure Δk peaked at k = 3. This peak indicated the presence of three distinct subgroups within the population, designated as SG1, SG2, and SG3.
Subsequently, based on the membership fractions, accessions with a probability of 80% or higher were allocated to their respective subgroups, while those with lower probabilities were classified as admixtures (Figure 3; Supplementary Table S1). SG1 was composed of 23 accessions, primarily Indian landraces and varieties, while SG2 included 32 accessions of non-Indian origin. SG3 comprised 40 accessions, and 21 accessions were classified as admixtures. In SG1, the majority belonged to the Indica subtype, while SG2 was predominantly represented by the Japonica group. Upon increasing the number of subgroups from two to five, the accessions within both SG1 and SG2 were further subdivided into sub-subgroups (Supplementary Table S1). As SG1 mainly comprised 23 accessions of Indian origin, an independent STRUCTURE analysis was run on it; Δk peaked at k = 3, indicating the presence of three sub-subgroups within SG1 (Figure 3). This clustering was attributed to differentiation in the origin and seasonal patterns of the rice varieties.
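The 80% assignment rule can be expressed as a small function. The accession names and membership fractions below are hypothetical and serve only to show the rule; the subgroup counts are those reported in the text.

```python
def assign_subgroup(memberships, threshold=0.80):
    """Return the subgroup with the largest membership fraction,
    or "admixture" when that fraction is below the threshold."""
    best = max(memberships, key=memberships.get)
    return best if memberships[best] >= threshold else "admixture"


# Hypothetical membership fractions (Q values) for three accessions.
panel = {
    "ACC-1": {"SG1": 0.93, "SG2": 0.04, "SG3": 0.03},
    "ACC-2": {"SG1": 0.10, "SG2": 0.85, "SG3": 0.05},
    "ACC-3": {"SG1": 0.45, "SG2": 0.35, "SG3": 0.20},
}
assignments = {name: assign_subgroup(q) for name, q in panel.items()}
print(assignments)

# Counts reported in the text: they account for all 116 accessions.
reported_counts = {"SG1": 23, "SG2": 32, "SG3": 40, "admixture": 21}
print(sum(reported_counts.values()))
```

Lowering the threshold moves accessions from the admixture class into subgroups, so the number of admixed lines depends directly on this choice.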

Genetic Relatedness and Diversity Assessment
The analysis categorized the 116 rice accessions into three distinct groups, i.e., Group I, Group II, and Group III, accommodating 42, 36, and 38 genotypes, respectively. Group I in the UPGMA tree comprised a mix of indigenous and agronomically improved varieties, while Groups II and III primarily consisted of exotic accessions. Subgrouping within the UPGMA tree revealed that accessions in each group formed smaller subgroups based on their origin and type. Landraces and varieties were predominantly clustered in the upper branches of the tree, whereas exotic accessions clustered in the lower branches (Figure 4).

Principal Coordinate Analysis (PCoA)
Principal coordinate analysis (PCoA) was employed to characterize the different germplasm subgroup sets. The two-dimensional and three-dimensional scatter plots for all 116 accessions showed that the first three PCoA axes accounted for 5.81%, 5%, and 3.92% of the genetic variation among populations, respectively (Figure 5). Both classification methods also showed a high level of similarity in clustering the genotypes.

Genetic Differentiation Analysis
The analysis of molecular variance (AMOVA) and pair-wise comparisons of the subgroups identified from population structure demonstrated significant genetic differentiation among the subgroups. The results revealed that 10% of the total variation was attributed to differences among populations, whereas 79% was due to variation among individuals. The remaining 11% of the total variation was observed within individuals (Table 3). Wright's F-statistics for all SSR loci indicated an FIS value of 0.879 and an FIT value of 0.890. Additionally, the FST calculated for the polymorphic loci across all accessions was 0.096, suggesting a moderate level of genetic differentiation (Table 3).
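The reported fixation indices and AMOVA percentages can be cross-checked against each other. The sketch below assumes the usual hierarchical relationship (1 - FIT) = (1 - FIS)(1 - FST) and the corresponding partition of the total variance, which the text does not state explicitly; the function names are hypothetical.

```python
# Fixation indices as reported in the text.
F_IS, F_ST, F_IT = 0.879, 0.096, 0.890


def expected_fit(f_is, f_st):
    """F_IT implied by F_IS and F_ST: 1 - (1 - F_IS) * (1 - F_ST)."""
    return 1 - (1 - f_is) * (1 - f_st)


def variance_partition(f_st, f_it):
    """Fractions of total variance among populations, among individuals
    within populations, and within individuals."""
    return {
        "among populations": f_st,
        "among individuals": f_it - f_st,
        "within individuals": 1 - f_it,
    }


fit_check = expected_fit(F_IS, F_ST)
shares = {level: round(100 * value)
          for level, value in variance_partition(F_ST, F_IT).items()}
print(round(fit_check, 3), shares)
```

Under this assumption the reported FIS and FST imply an FIT of about 0.891, within rounding of the reported 0.890, and the implied variance shares match the 10%, 79%, and 11% given in the AMOVA summary.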

Discussion
Genetic diversity plays a pivotal role and serves as a crucial resource in crop improvement and breeding programs. Populations with higher genetic variation are of immense importance for broadening the genetic base in breeding endeavors [39,40]. In this study, 116 rice accessions, encompassing landraces, released varieties, and advanced breeding lines developed for diverse agronomic traits, including some therapeutic attributes, were investigated. This population is significant because it represents traditional landraces cultivated in Uttar Pradesh, the largest state of India. Molecular markers, such as microsatellites or SNPs, are essential tools for dissecting the genetic diversity among different rice varieties, races, and exotic accessions, offering valuable insights for rice breeding programs [41].
Variability in traits is of great importance as it provides insight into the potential for selection and improvement. In this study, the magnitude of the phenotypic coefficient of variation (PCV) was higher than the respective genotypic coefficient of variation (GCV) for all the traits, indicating the influence of environmental factors. Among the traits, seed vigor and panicle length showed higher variation, which may be exploited as a selection parameter for further yield improvement. Other traits, such as biological yield per plant, plant height, and harvest index, showed moderate PCV and GCV values, indicating a combination of genetic and environmental influences. These findings are in conformity with earlier reports in rice [42,43].
The genetic structure and diversity among germplasm lines can be assessed by STRUCTURE analysis with molecular markers such as microsatellite or SNP markers. This approach provides valuable insights into the genetic architecture of the population, shedding light on the relationships among individuals or groups within the germplasm collection [14,44]. The genetic diversity of the studied accessions was evaluated from SSR genotypic data using both model-based and distance-based clustering approaches. With 64 polymorphic markers, a total of 225 alleles were identified across the 116 rice accessions. The number of alleles per locus ranged from two to eight, with an average of 3.57 alleles per locus. These findings are in accordance with previous reports on alleles per locus, polymorphic information content, and gene diversity in rice [28,45]. The average number of alleles observed in this study (3.57 alleles/locus) is aligned with other studies showing 3.88 alleles/locus, 3.9 alleles/locus [19], and 3.57 alleles per locus [28]. The mean polymorphic information content (PIC) obtained from screening with 19 InDel markers was found to be 0.440, similar to other reports ranging from 0.285 in a sample of 300 accessions from a global rice collection [19,29,46,47] to 0.544 observed in a rice core collection comprising samples from various countries. At a global scale, the most diverse panels exhibit gene diversity values within the range of 0.5 to 0.7 [48]. The average PIC value was 0.747, with a wide range across individual markers, from 0.495 for RM162 to the highest value of 0.984 for RGNMS3228, which amplified eight alleles. These findings suggest that the diversity panel of 116 germplasm lines in our study captures a significant portion of the genetic diversity found in major rice-growing regions across Asia. Furthermore, the identification of a substantial number of rare alleles in this investigation underscores their role in bolstering the overall genetic diversity of the population.
Among the subgroups identified, SG1 was predominantly composed of Indica accessions and SG2 primarily consisted of Japonica accessions. Both subgroups made a substantial contribution to the overall population diversity. Although the population encompasses landraces, varieties, and breeding lines, the primary source of molecular diversity stems from the landraces. The model-based approach using STRUCTURE has been extensively applied to investigate population structure in rice [19,25,29,47,49,50]. The assignment of genotypes to different subgroups varies among research groups with the ancestry threshold used, with values of 60-80% reported in similar studies [29,51]. In our study, adopting a stringent threshold of 80% ancestry resulted in only 21 genotypes being classified as admixtures. Population structure analyses across diverse rice panels have revealed the presence of two to eight subpopulations in rice [25,50]. In the current rice diversity panel of 116 accessions, 23 were assigned to SG1 based on maximum membership probabilities. SG1 is predominantly composed of landraces and varieties of Indian origin. On the other hand, SG2 and SG3 encompassed 32 and 40 accessions, respectively, primarily consisting of non-Indian exotic accessions. This Indica-Japonica stratification mirrors findings from prior research [29,52]. These results imply that the presence of three subgroups may be due to different ecological environments. Indica and Japonica accessions seem to have undergone independent evolutionary trajectories. This study, enriched with a substantial number of traditional landraces from the Crop Research Centre, Masodha, ANDUA&T, Ayodhya, sheds light on the relationship between Indian germplasm and exotic accessions. It underscores that germplasm lines vary with their ecological niches, highlighting a heightened level of genetic diversity within this population.
The clustering analysis categorized the accessions into three groups, with 42 genotypes in group I, and 36 and 38 genotypes in groups II and III, respectively. The two classification methods used in the clustering analysis showed a notable degree of similarity in grouping the genotypes. These findings corroborate earlier studies indicating that the Indica group possesses higher genetic diversity than Japonica accessions [53,54]. This is consistent with the fact that this subgroup primarily comprises Indica accessions, and with the Indica subpopulation encompassing the largest rice-growing region, characterized by diverse environments, ecological conditions, and soil types.
The outcome of the model-based analysis was in concordance with the clustering pattern observed in both the neighbor-joining tree and the principal coordinate analysis. The first three principal coordinates accounted for 5.81%, 5%, and 3.92% of the molecular variance, mirroring a similar trend observed in two population subgroups. Wright's F-statistics calculated across all loci revealed a deviation from Hardy-Weinberg equilibrium within the population, indicating notable molecular variation. The FST results indicated a moderate degree of divergence between subgroups within the population. Moreover, the high FIT, measured at the subgroup level across the entire population, suggested an absence of equilibrium among the groups, likely attributable to the inbreeding nature of rice. This study highlights numerous underexplored landraces from Uttar Pradesh that are extensively cultivated by farmers across various regions of the largest state of India. The genetic diversity within this population is shaped by its ecological and evolutionary history, with varieties adapted to a wide array of ecosystems and diverse eco-geographical conditions. In establishing a core collection for association studies, a two-step approach was adopted [29,55], which involved determining the population structure followed by sampling based on the relatedness of the accessions. Accessions exhibiting high genetic relatedness were considered for elimination in order to curate a core collection with diverse representation. All 116 accessions can be effectively utilized for genome-wide or candidate gene-specific association mapping, facilitating the linkage between genotypic and phenotypic variation.

Conclusions
This study emphasizes the crucial role of genetic diversity in crop improvement, exemplified by a comprehensive analysis of 116 rice accessions. The SSR markers facilitated the assessment of genetic diversity, revealing 225 alleles across 64 polymorphic markers. The average number of alleles per locus (3.57) and gene diversity (0.421) suggested the presence of a broad genetic base in this collection. This diversity panel captures a significant portion of the genetic diversity in major rice-growing regions across Asia. The stratification into Indica and Japonica subgroups, with landraces as the primary contributors to diversity, underscores their significance. The findings from the structure analysis were consistent with the results of the clustering based on the neighbor-joining tree and the principal coordinate analysis, which distributed the population into three distinct subgroups. Clustering and genetic metrics further confirm the complexity of genetic dynamics in the population. As India is considered one of the mega diversity centers for rice, this research offers valuable insights into the genetic diversity of the rice accessions included in this study. The establishment of a core collection for association studies provides a vital resource for future research in rice improvement. These findings can be used to guide various approaches, such as association analysis, the development of classical mapping populations, the selection of parental lines in breeding programs, and hybrid development, to harness the natural genetic variation present within this population. In summary, this study contributes to advancing rice breeding and genetic research towards the creation and utilization of new genetic variability for further improvement in yield, accompanied by climate resilience and better quality traits.

Figure 1. The statistics of the selected features of different parameters to inspect and find interesting features in the gene diversity data set.

Figure 2. Scatter plot visualization of chromosome markers and SSR motifs with exploratory analysis and data visualization enhancements. Color-coded regions on the plot align with chromosome projections, as well as markers (A,B) and SSR motifs (C,D) for each chromosome. (A,C) exhibit minimum molecular weight and (B,D) exhibit maximum molecular weight.

Figure 3. Population structure of 116 accessions in subgroup 1 and membership probability of assigning genotypes of subgroup 1 (k = 3).

Figure 5. Principal coordinates of 116 accessions based on 64 SSR loci. Coord 1 and Coord 2 represent first and second coordinates, respectively.

Table 1. Details of SSR loci used for genotyping in the 116 rice accessions and their genetic diversity parameters.

Table 2. Genetic variability among studied traits.

Table 3. Summary of AMOVA between groups and accessions and fixation indices using Fst values.