Estimation of the Genetic Diversity and Population Structure of Thailand’s Rice Landraces Using SNP Markers

Abstract: Rice is a staple food for more than half of the world’s population. Modern rice varieties have been developed for high yield and quality; however, this has come with a substantial loss of genetic diversity. Genetically dynamic landraces could offer valuable genetic resources for rice improvement. In this study, the genetic diversity and population structure of 365 accessions of lowland and upland landraces from four populations from different geographical regions of Thailand were investigated using 75 SNP markers. Clustering analyses using maximum likelihood, Principal Coordinate Analysis (PCoA), and Discriminant Analysis of Principal Components (DAPC) grouped these landraces into two main groups, corresponding to indica and japonica. The indica group was further divided into two subgroups according to the DAPC and STRUCTURE analyses (K = 3). The analysis of molecular variance (AMOVA) revealed that 91% of the variation was distributed among individuals within populations, suggesting a high degree of genetic variation among rice accessions within the populations. Pairwise FST showed the greatest genetic differentiation between the northeastern and southern populations and the smallest genetic differentiation between the northern and northeastern populations. Isolation-by-distance analysis based on a Mantel test indicated a significant relationship between the genetic distance and geographic distance among the Thai rice landraces. The results from this study provide insight into the genetic diversity of Thai rice germplasm, which will enhance germplasm characterization, conservation, and utilization in rice genetics and breeding.


Introduction
Rice (Oryza sativa L.) is one of the most important cereal grains in the world and serves as a staple food for more than half of the world's population [1]. Rice is grown in more than 100 countries, with 90% of global production coming from Asia [2]. It has been estimated that about 120,000 distinct rice varieties exist in the world [1]. Southeast Asia is important as the main source of rice germplasm and of rice diversity [3]. Asian cultivated rice (O. sativa L.) was domesticated from the wild rice species O. nivara and O. rufipogon [1] and can be assigned to either the subspecies indica or japonica depending on a variety of physiological and morphological characteristics [4]. The subspecies indica is primarily grown in tropical regions, while the subspecies japonica is grown in both the subtropical and temperate regions of East Asia [5].
Modern rice varieties have been developed for both high yield and high quality [6]. However, these varieties exhibit a loss of genetic diversity, which affects their ability to adapt to changing environments [7]. Landraces, on the other hand, have been found to be genetically dynamic and in equilibrium with both the environment and pathogens; thus, they could provide valuable genetic resources for crop improvement [8]. Variation among landraces has therefore become of interest for conservation and utilization [9]. However, landraces have declined in popularity, and many are gradually being replaced by improved varieties. There is an urgent need to conserve the landraces that are rapidly declining as high-yield varieties become predominant.
Studies on genetic diversity and population structure are critical for characterizing the genetic relationships among germplasm accessions. Knowledge of the genetic variation among populations and their genetic relationships aids conservation and parental selection in crop improvement programs [10]. Populations with a high level of genetic variation are a valuable resource for broadening the genetic base, as they enable the identification of superior alleles for several traits [11]. Numerous techniques can be used to assess genetic variation in rice so that gene bank accessions can be understood and used. Initially, isozyme markers were used to evaluate rice genetic diversity [12]; however, these have now been replaced with DNA markers, such as simple sequence repeats (SSRs, also known as microsatellites) [13,14] and single nucleotide polymorphisms (SNPs) [15][16][17][18].
Thailand is one of the countries with a significant number of indigenous rice varieties and landraces, which can serve as a valuable genetic resource for future crop improvement to satisfy the ever-increasing demand for food production. Over 17,000 traditional landraces have been conserved in the National Rice Gene Bank collection. To effectively utilize this germplasm, an assessment and classification of its diversity is necessary. While studies on genetic diversity are available for many rice collections around the world [19,20], those for Thai rice germplasm have been conducted on limited sets of accessions [21][22][23][24][25]. Previous studies on the genetic structure of Thai rice using indel and simple sequence repeat (SSR) markers revealed two distinct groups of accessions, corresponding to the indica and japonica groups [23,24]. However, the genetic structures of lowland and upland landraces from different geographical regions of Thailand are still largely unknown.
In this study, we used a set of SNP markers to investigate the genetic diversity and population structure of a collection of lowland and upland landraces, consisting of 365 accessions obtained from four geographical regions of Thailand. The results from this study provide an insight into the genetic diversity of the Thai rice germplasm, which will enhance germplasm characterization, conservation, and utilization in rice genetics and breeding.

Plant Materials
A collection of 365 accessions of Thai rice landraces (Oryza sativa L.) was obtained from the National Rice Gene Bank of Thailand. According to the germplasm registry, this collection is composed of 169 and 196 accessions of upland and lowland rice, respectively, which were obtained from different parts of Thailand. Among the upland rice, there were 46, 38, 40, and 45 accessions from northern (N), northeastern (NE), central (C), and southern (S) populations, respectively. Among the lowland rice, there were 50, 48, 49, and 49 from the N, NE, C, and S populations, respectively (Table S1).

DNA Extraction and SNP Genotyping
Total genomic DNA was extracted from young leaves using a DNA Trap I kit (DNA Technology Laboratory, Thailand). The DNA was quantified using a NanoDrop 8000 Spectrophotometer (Thermo Fisher Scientific, DE, USA) and diluted to a concentration of 5 ng/µL. Genotyping was performed with 102 SNP markers using Kompetitive Allele-Specific PCR (KASP) and the SNPline genotyping system following standard KASP protocols (LGC Genomics, Teddington, UK). The SNP marker set (102 markers) was kindly provided by the Rice Science Center, Kasetsart University, Thailand (unpublished data).

Data Management and Analysis
The input for all analyses was the genotype data from the polymorphic SNP markers. The genetic distance was calculated using Nei's standard distance [26], and the phylogenetic tree was constructed in MEGA X [27] using the maximum-likelihood method [28].
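As a minimal sketch of the distance measure named above, Nei's standard genetic distance can be written as D = -ln(Jxy / sqrt(Jx * Jy)), where Jx, Jy, and Jxy are the locus-averaged sums of squared or cross-multiplied allele frequencies. The function name and the toy frequencies below are illustrative assumptions and not taken from the study, which computed distances with dedicated software; the sketch also works at the population level, whereas the study compared individual accessions.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two populations.

    freqs_x, freqs_y: one entry per locus; each entry is a list of allele
    frequencies at that locus, with alleles in the same order in both inputs.
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("both populations need the same, non-zero number of loci")
    n_loci = len(freqs_x)
    # Average homozygosity-like terms over loci
    j_x = sum(sum(p * p for p in locus) for locus in freqs_x) / n_loci
    j_y = sum(sum(q * q for q in locus) for locus in freqs_y) / n_loci
    j_xy = sum(
        sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(freqs_x, freqs_y)
    ) / n_loci
    if j_xy == 0:
        return math.inf  # no shared alleles at any locus
    identity = j_xy / math.sqrt(j_x * j_y)
    return -math.log(identity)


# Hypothetical allele frequencies at two biallelic SNP loci
pop_a = [[0.9, 0.1], [0.5, 0.5]]
pop_b = [[0.1, 0.9], [0.5, 0.5]]
distance_ab = nei_standard_distance(pop_a, pop_b)
```

The distance is zero for populations with identical allele frequencies and grows without bound as the populations stop sharing alleles.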
The major allele frequency (MAF), number of alleles per locus (Na), and gene diversity per locus (He) as well as the polymorphism information content (PIC) value were estimated using PowerMarker version 3.25 [29]. The SNP data of the 72 representatives of 3,000 rice accessions corresponding to the positions of our SNP markers were obtained from the Rice SNP-Seek Database [30] and used to construct a phylogenetic tree referring to five groups of rice (O. sativa), i.e., indica, tropical japonica, temperate japonica, aus, and aromatic (Table S2; Figure S1).
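The per-locus statistics listed above follow standard formulas: gene diversity is He = 1 - Σp_i², and PIC subtracts an additional term, Σ_{i<j} 2·p_i²·p_j². The sketch below is a plain illustration of those formulas with hypothetical function names; PowerMarker may apply sample-size corrections that are not reproduced here.

```python
def gene_diversity(freqs):
    """Expected heterozygosity (He) from the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross_term += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - homozygosity - cross_term


# A biallelic SNP is most informative when both alleles are equally frequent
he_max = gene_diversity([0.5, 0.5])                      # 0.5
pic_max = polymorphism_information_content([0.5, 0.5])   # 0.375
```

Because a biallelic marker cannot exceed He = 0.5 and PIC = 0.375, SNP panels necessarily show lower per-locus PIC values than multi-allelic markers such as SSRs.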
Principal Coordinate Analysis (PCoA), STRUCTURE analysis, and Discriminant Analysis of Principal Components (DAPC) were used to investigate the patterns of population structure. PCoA was performed using DARwin 6.0.021 [31], and the principal coordinates were plotted using ggplot2 [32]. The STRUCTURE analysis was performed using the Bayesian model-based clustering algorithm implemented in STRUCTURE version 2.3.4 [33], with the admixture model and correlated allele frequencies. Three independent replicates were run for each number of genetic clusters (K = 1-8), using a burn-in period of 100,000 and a run length of 100,000 iterations. LnP(D) values were derived for each K and plotted to find the plateau, and ∆K was computed following [34]. The final population structure was summarized using STRUCTURE HARVESTER [35]. DAPC was performed with the adegenet package [36]. The "find.clusters" function was used to identify clusters (k), and the optimal k value was determined according to the Bayesian Information Criterion (BIC). Then, the "dapc" function was used to verify the classification quality. GenAlEx V6.5 [37] was used to compute the pairwise fixation index (FST) among all pairs of populations in order to investigate the population differentiation. The same tool was used to perform the analysis of molecular variance (AMOVA) to estimate the variance components of the populations. Mantel tests were conducted using the R package vegan [38].
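The ∆K statistic used to choose K can be illustrated with a short sketch. It assumes one common reading of the method in [34]: the absolute second-order difference of the mean LnP(D) across replicates, divided by the standard deviation of LnP(D) at that K. The function name and all LnP(D) values are invented for illustration and are not results from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from {K: [LnP(D) of each replicate run]}.

    Delta K is undefined for the smallest and largest K tested and for any K
    whose replicates have zero standard deviation; those are skipped.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta_k[k] = second_difference / spread
    return delta_k


# Hypothetical LnP(D) values from three replicate runs per K
toy_runs = {
    1: [-9000.0, -9010.0, -8990.0],
    2: [-7000.0, -7005.0, -6995.0],
    3: [-6000.0, -6002.0, -5998.0],
    4: [-5900.0, -5950.0, -5850.0],
    5: [-5850.0, -5900.0, -5800.0],
}
toy_delta_k = evanno_delta_k(toy_runs)
best_k = max(toy_delta_k, key=toy_delta_k.get)
```

In this toy input the likelihood rises steeply up to K = 3 and then flattens while the replicate variance grows, so ∆K peaks at K = 3.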

Genetic Variability of 365 Rice Accessions Based on SNP Markers
A collection of 365 Thai rice accessions was used in this study. These included rice landraces from four geographical regions of Thailand representing different ecosystems: upland (169 accessions) and lowland (196 accessions) (Figure 1, Table S1). To study the genetic variability among these rice accessions, a total of 102 KASP markers were initially used to genotype the 365 rice accessions (Table S3). After the removal of SNP markers that had more than 10% missing genotypes or a minor allele frequency lower than 5%, a total of 75 SNP markers (73.5%) were retained (Table 1). These 75 SNP markers were distributed over the 12 chromosomes, with the highest number (11 markers) on chromosome 4 and the lowest number (2 markers) on chromosome 10 (Figure 2). The polymorphic information content (PIC) ranged from 0.11 to 0.37, with an average of 0.26, and the gene diversity, also known as the expected heterozygosity (He), varied from 0.12 to 0.49 (Table 1). The observed heterozygosity varied from 0 to 0.02. All SNP markers were biallelic. The major allele frequency at each locus reached up to 0.94. Most of the markers showed variable allele frequencies among the different geographical populations (Figure S2).
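The marker filtering step described above (missing genotypes above 10% or minor allele frequency below 5%) can be sketched as follows. The genotype coding (0, 1, or 2 copies of the alternate allele, None for a missing call) and the function name are assumptions made for this illustration; the software actually used for filtering is not stated in the text.

```python
def passes_marker_qc(genotypes, max_missing=0.10, min_maf=0.05):
    """Return True if a biallelic marker passes the missing-data and MAF filters.

    genotypes: one call per accession, coded as the number of alternate
    alleles (0, 1, or 2), or None when the call is missing.
    """
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    missing_rate = (len(genotypes) - len(called)) / len(genotypes)
    if not called or missing_rate > max_missing:
        return False
    alt_frequency = sum(called) / (2 * len(called))
    minor_allele_frequency = min(alt_frequency, 1.0 - alt_frequency)
    return minor_allele_frequency >= min_maf


# Hypothetical markers scored on 20 accessions
well_behaved = [0] * 9 + [2] * 10 + [None]        # 5% missing, common alleles
too_sparse = [0] * 9 + [2] * 8 + [None] * 3       # 15% missing
too_rare = [0] * 19 + [1]                         # minor allele frequency 2.5%
kept = [m for m in (well_behaved, too_sparse, too_rare) if passes_marker_qc(m)]
```

With the thresholds reported above, retaining 75 of 102 markers corresponds to the stated 73.5%.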

Population Structure of Thai Landraces
To understand the structure of the overall 365 rice accessions among the four different geographical regions, three different approaches, i.e., STRUCTURE, DAPC, and PCoA, were applied. All analyses were performed based on the genotypes identified by the 75 SNP markers. In STRUCTURE, the number of clusters was estimated based on the ∆K method [34] and the plateau criterion [33]. The results suggested the best grouping number at K = 3 based on the ∆K and the mean log-likelihood (LnP(D)) curve (Figure 3). This suggests that the 365 rice landraces can be grouped into three subpopulations, referred to as Groups I-III (Figure 3). In each group, accessions with a score higher than 0.80 were assigned to a pure group, while those with a score lower than 0.80 were considered admixed. We found that the accessions assigned to Groups I and II were all indica landraces (Table S4). Group I consisted of 168 accessions, of which 45.23% were landraces from the southern (S) population and 33.33% were landraces from the central (C) population. Within this cluster, 126 accessions contained pure genotypes, and 42 accessions exhibited an admixed ancestry.
Group II consisted of 165 accessions, 81.21% of which were landraces from the northern (N) and the northeastern (NE) populations. Within this group, 126 accessions contained pure genotypes, and 39 accessions exhibited an admixed ancestry. Group III contained 32 accessions, all of which were japonica landraces, and most of which were upland rice from the N and S populations (Table S4). None of the accessions in this group contained admixed ancestry.
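The pure versus admixed assignment described above can be expressed as a simple rule on the membership coefficients of each accession. The sketch below is illustrative only: the function name and the membership values are hypothetical, and treating a score of exactly 0.80 as admixed is an assumption, since the text does not specify that boundary case.

```python
def assign_group(membership, threshold=0.80):
    """Assign an accession to its best-supported group.

    membership: membership coefficients for Groups I, II, III (in that order).
    Returns (group_number, status), where status is "pure" when the largest
    coefficient exceeds the threshold and "admixed" otherwise.
    """
    best_index = max(range(len(membership)), key=lambda i: membership[i])
    status = "pure" if membership[best_index] > threshold else "admixed"
    return best_index + 1, status


# Hypothetical membership coefficients for three accessions
examples = {
    "accession_a": [0.95, 0.03, 0.02],
    "accession_b": [0.55, 0.40, 0.05],
    "accession_c": [0.01, 0.01, 0.98],
}
assignments = {name: assign_group(q) for name, q in examples.items()}
```

Under this rule an admixed accession is still counted in the group with its largest coefficient, which is consistent with the group sizes above being reported as the sum of pure and admixed accessions.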
The number of subpopulations and the subpopulation attributions yielded by DAPC were similar to those revealed with STRUCTURE (Figure 4). The three groups were clearly defined as two closely related indica subgroups (Groups 1 and 2) and a distantly related japonica group (Group 3) (Figure 4; Table S4). Similarly, the PCoA result revealed two main clusters corresponding to the two groups, indica and japonica. However, subgroups among the indica landraces were not clearly defined (Figure 5; Table S4).

Genetic Distance and Phylogeny of the 365 Accessions
To examine the genetic relationships among the 365 rice landraces, we calculated the genetic distance based on Nei's dissimilarity. As a result, two major clusters of these rice landraces, corresponding to the indica and japonica groups, were revealed (Figure S3). The phylogenetic tree constructed using the maximum likelihood method clustered these landraces into two major groups, corresponding to indica and japonica, similar to the groups identified by DAPC and PCoA (Figure 6). Within the indica group, two major clusters (namely Indica I and II) were further observed. The Indica I subgroup contained 176 accessions, 74 and 61 of which were rice landraces from the S population and the C population, respectively. Most of the rice landraces in this group were non-glutinous rice (Table S4). The Indica II subgroup contained 148 accessions, 122 of which were rice landraces from the N and NE populations. The majority of the rice landraces in this group (72.29%) were glutinous rice. Landraces from the S population were rare in this group (Table S4). There was also a small intermediate group between the two indica subgroups. This intermediate indica group contained nine rice landraces from the C, N, and NE populations (Table S4). Within the japonica group, two clusters (namely Japonica I and II) were indicated. The Japonica I subgroup predominantly contained rice accessions from the N population (10/14 accessions), while the Japonica II subgroup predominantly contained accessions from the S population (15/18 accessions).

While both upland and lowland rice landraces were found in all indica subgroups, the upland rice landraces were predominantly included in the japonica group (Table S4).
The overall genetic diversity of the 365 rice accessions was moderate, as revealed by the mean gene diversity (0.33) and mean PIC (0.26) (Table 2). Similar values of gene diversity (0.28-0.32) and PIC (0.23-0.25) were found among the four geographical populations (Table 2). However, the mean genetic diversity of the japonica group (0.09) was much lower than that of the two indica subgroups (0.27 and 0.24 for Indica I and II, respectively) (Table 3).

Genetic Differentiation, AMOVA, and Isolation-by-Distance Analyses
We further quantified the genetic differentiation (FST) between each pair of geographical populations using Wright's F-statistics [39]. In practice, an FST of 0.00-0.05 indicates low differentiation, 0.05-0.15 indicates moderate differentiation, and an FST of >0.15 indicates a high level of differentiation [40]. Pairwise estimates of the FST values between the geographical populations revealed low to moderate genetic differentiation, ranging from 0.016 to 0.078 (Table 4). The largest genetic differentiation was detected between the NE and S populations (0.078), and the lowest was detected between the N and NE populations (0.016). The levels of genetic differentiation between the S and N populations and between the S and C populations were also moderate, as indicated by the FST values of 0.069 and 0.052, respectively. The analysis of molecular variance (AMOVA) based on the geographical regions revealed significant variation (p < 0.001) among and within the subpopulation groups corresponding to geographic regions (Table 5). Differences within the geographical regions contributed approximately 91% of the total genetic variation, far more than differences among regions, which accounted for only 9% (Table 5). Isolation by distance (IBD) was also tested with a Mantel test comparing the genetic distances (FST) and geographical distances. A significant positive relationship between the geographical and genetic distances (R² = 0.026, p < 0.001) was observed in the entire population based on the SNP data (Figure 7). A similar relationship was also observed between the genetic distances (FST) and several geographic and climatic factors, i.e., latitude, elevation, temperature, and rainfall, among the 365 accessions (Figures S4-S7).
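The logic of a Mantel test can be sketched in a few lines: correlate the upper triangles of the two distance matrices, then build a null distribution by permuting the rows and columns of one matrix together. The study used the R package vegan; the pure-Python version below is an illustrative stand-in with hypothetical function names and toy distances, not the authors' implementation.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabelling rows/columns."""
    n = len(matrix)
    idx = list(range(n)) if order is None else order
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test; returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    flat_a = upper_triangle(dist_a)
    r_observed = pearson(flat_a, upper_triangle(dist_b))
    hits = 0
    for _ in range(permutations):
        order = list(range(len(dist_b)))
        rng.shuffle(order)  # permute rows and columns of dist_b together
        if pearson(flat_a, upper_triangle(dist_b, order)) >= r_observed:
            hits += 1
    return r_observed, (hits + 1) / (permutations + 1)


# Toy example: six sites on a line, genetic distance proportional to geography
sites = [0, 1, 2, 3, 4, 5]
geographic = [[abs(a - b) for b in sites] for a in sites]
genetic = [[0.01 * abs(a - b) for b in sites] for a in sites]
r_toy, p_toy = mantel_test(geographic, genetic)
```

Permuting whole rows and columns, rather than individual matrix cells, preserves the dependence among distances that share a site, which is why a standard correlation test is not used here.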

Discussion
This work provides a comprehensive and systematic analysis of the genetic variation in a large collection of Thai rice landraces. The panel consisted of 365 accessions, which included 169 and 196 accessions of upland and lowland rice, respectively. These accessions were representatives of landraces obtained from four geographic origins: northern (96 accessions from 9 provinces), northeastern (86 accessions from 17 provinces), central (89 accessions from 15 provinces), and southern Thailand (94 accessions from 14 provinces). Based on the 75 SNP markers, the PIC values for all SNPs were less than 0.5, with an average PIC value of 0.26, suggesting that all the SNPs were moderately or slightly informative markers. This may be due to the biallelic nature of the SNPs [41].
The different approaches (STRUCTURE, PCoA, and DAPC) used to analyze the genetic structure of the 365 rice accessions provided complementary information. The results showed good consistency between STRUCTURE and DAPC. The population structure analysis using these approaches suggested that the Thai rice landraces could be divided into two main subpopulations, corresponding to the indica and japonica groups, consistent with previous reports [21,23,24]. However, the majority of accessions present in our panel were indica rice. The genetic diversity was higher for the indica groups than for the japonica group.
While both the upland and lowland landraces were distributed among the indica groups, most landraces in the japonica group were upland rice. In Thailand, indica is the dominant rice type and is normally grown in the lowland ecosystem. On the other hand, japonica rice is rarely found in mainstream rice production in Thailand and is instead mostly concentrated in regions such as the northern highlands [42].
Based on the rice collection used in this study, japonica rice appeared to separate into two subgroups, one specific to the northern population and another specific to the southern population. It is possible that the two subgroups of Thai japonica rice arose in the two regions from different origins. There were also some rice accessions from the central population belonging to the Japonica I subgroup (Figure 6; Table S4). These rice accessions may have the same origin as those in the northern population and may have been exchanged by farmers between the two regions. It is also worth determining whether the two classes of japonica have different physiological properties.
FST measures the amount of genetic variance that can be explained by the population structure based on Wright's F-statistics [39]. An FST value of 0 indicates no differentiation between the subpopulations, while a value of 1 indicates complete differentiation. An FST value greater than 0.15 can be considered as significant in differentiating populations [43]. In the present study, no significant divergence was found among the four regional populations (Table 4). This coincided with the AMOVA results (Table 5), where the vast majority of the total variation (91%) was accounted for by within-subpopulation variation, while only 9% of the total variation was accounted for by among-subpopulation variation. Assuming that germplasm is exchanged among regions, the clustering of landraces independently of their origin in the maximum-likelihood phylogenetic tree and the high molecular variance among accessions within populations revealed by AMOVA both point to human influence on the dissemination of seeds among populations. This is also supported by the isolation-by-distance analysis, in which only a low level of correlation was detected between the genetic distance (FST) and geographic distance for the Thai rice landraces.
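The 0-to-1 scale of FST described above can be illustrated with a heterozygosity-based estimate, FST = (HT - HS) / HT, for two populations at biallelic loci. This is a simplified sketch with equal population weights and hypothetical function names; GenAlEx, which was used in this study, may compute FST differently, so the numbers from this sketch are not expected to reproduce Table 4. The classification thresholds are the ones quoted in the Results.

```python
def fst_two_populations(freqs_1, freqs_2):
    """Heterozygosity-based FST for two equally weighted populations.

    freqs_1, freqs_2: frequency of the same allele at each biallelic locus.
    Uses the ratio of sums over loci: (sum HT - sum HS) / sum HT.
    """
    h_s_total = 0.0
    h_t_total = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        # Mean expected heterozygosity within the two populations
        h_s_total += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        # Expected heterozygosity of the pooled population
        p_bar = (p1 + p2) / 2
        h_t_total += 2 * p_bar * (1 - p_bar)
    if h_t_total == 0:
        return 0.0  # all loci monomorphic: no variation to partition
    return (h_t_total - h_s_total) / h_t_total


def differentiation_level(fst):
    """Classify FST as low (<0.05), moderate (0.05-0.15), or high (>0.15)."""
    if fst < 0.05:
        return "low"
    if fst <= 0.15:
        return "moderate"
    return "high"


# Limiting cases: identical populations and populations fixed for different alleles
fst_identical = fst_two_populations([0.5, 0.3], [0.5, 0.3])   # 0.0
fst_fixed = fst_two_populations([1.0], [0.0])                 # 1.0
```

Applied to the pairwise values reported in the Results, this classification labels 0.016 (N versus NE) as low and 0.078 (NE versus S) as moderate.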

Conclusions
In this study, the genetic diversity and population structure of 365 Thai rice landraces were investigated using a set of 75 SNP markers. Two main subpopulations, corresponding to indica and japonica, were clearly identified across the whole panel of these rice accessions. Among the indica group, two further subgroups were also indicated. The genetic differentiation among the four geographic populations was not high, which may be due to the occurrence of genetic exchange between populations.
This study provides a detailed understanding of the genetic structure and diversity of Thai landraces, which is crucial for the efficient utilization of rice genetic resources and for developing suitable conservation strategies. This will aid in building a comprehensive collection of landraces in terms of genetic diversity, which is fundamental for other studies, such as genome-wide association studies (GWAS).

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/agronomy11050995/s1, Figure S1: The maximum-likelihood tree of the 72 representatives of 3000 rice genomes (highlighted in different colors) and the 365 Thai landrace accessions, Figure S2: Heatmap hierarchical clustering of the genetic distance calculated based on the dissimilarity matrix of 365 rice accessions, Figure S3: Allele frequency of the 75 SNP markers in each geographical population, Figure S4: Correlation of the latitude distance and genetic distance (pairwise FST) among the 365 rice accessions, Figure S5: Correlation of the pairwise elevation difference (in meters) and genetic distance (pairwise FST) among the 365 rice accessions, Figure S6: Correlation of the pairwise difference of average temperature in a year (in degrees Celsius) and genetic distance (pairwise FST) among the 365 rice accessions, Figure S7: Correlation of the pairwise difference of average rainfall in a year (in millimeters) and genetic distance (pairwise FST) among the 365 rice accessions, Table S1: List of rice accessions used in the study, Table S2: List of 72 representatives of 3000 rice genomes assigned to five genetic groups, Table S3: List of 102 SNP markers used for genotyping, and Table S4: List of 365 rice accessions used in the study.