Genetic Analysis of a Collection of Rice Germplasm (Oryza sativa L.) through High-Density SNP Array Provides Useful Information for Further Breeding Practices

Traditional breeding strategies mainly focus on evaluating trait performance but pay less attention to the changing genetic background. A comprehensive understanding of the genetic diversity of germplasm is crucial for the deliberate improvement of specific traits. A set of 154 highland rice varieties was collected as the initial genetic resource in our breeding program to improve pathogen resistance and eating and cooking qualities. These varieties were analyzed using a whole-genome SNP array and were clustered into three groups. Further analysis revealed that the favorable alleles of pathogen resistance genes are mostly absent in the collected varieties. However, most varieties possess favorable alleles of Waxy (Wx) and ALKALI DEGENERATION (ALK), which enhance eating and cooking qualities. Moreover, only about one fifth of all varieties harbor the favorable allele of the fragrance gene Betaine aldehyde dehydrogenase (BADH2). Together, these results give an overall view of the genetic constitution of the target traits and provide useful information for future genetic improvement in breeding practices.


Introduction
Rice (Oryza sativa L.) is one of the most important staple crops, feeding approximately half of the world's population [1]. It has been domesticated and adapted to different cultivation regions with diverse environmental conditions [2–4]. The Yunnan province, located in southwestern China, is one of the major rice-producing areas, where rice terraces are the characteristic local mode of cultivation. The cultivation region extends vertically, from elevations of a few hundred meters up to >2500 m [5]. Moreover, there are 26 ethnic minorities in the Yunnan province. Different groups maintain and grow their own rice landraces for highly specific uses (e.g., ethnic dietary customs, medicinal uses, festivals, and religious ceremonies) [6]. Therefore, the local rice varieties, adapted to diverse environmental conditions or used for specific demands, provide an excellent opportunity to study the genetic diversity of local rice.
Because of their high density and the variety of available detection methods, single nucleotide polymorphism (SNP) markers have gradually become the ideal markers for genetic discovery research and molecular breeding. The investigation of the rice genome has revealed millions of SNPs [7–10]. Recent progress in next-generation sequencing technology has boosted the establishment of SNP marker-based platforms for genotyping [11,12]. SNP array technology provides a low-cost and efficient method to analyze the genotypes of multiple loci at the whole-genome scale [13–16]. Furthermore, SNPs from cloned genes provide useful information for breeding applications [17–19].
Genes 2022, 13, 830
In recent years, consumer preferences have shifted towards rice with better eating and cooking qualities. Although each country or region has a particular preference for rice with a specific combination of quality traits, a common preference across most countries in Asia is a soft texture of freshly cooked rice that is maintained during storage [20]. This trait is highly related to the amount of amylose in the grain, which is controlled by two major loci, Waxy (Wx) and ALKALI DEGENERATION (ALK)/starch synthase IIa (SsIIa) [21–23]. Previous studies have linked genetic variation within these loci to changes in phenotype. For example, Wxb in Nipponbare is a weak allele conferring a moderate amount of amylose accumulation and better quality [24], and ALK in Minghui 63 is an unfavorable allele that increases the gelatinization temperature and results in a lower eating quality [21]. Furthermore, fragrant rice is gaining widespread popularity among consumers worldwide. The recessive allele of Betaine aldehyde dehydrogenase (BADH2), badh2.1, is associated with rice fragrance [25]. An SNP mutation within badh2.1 has been used as a functional nucleotide polymorphism for both breeding and further research.
In this research, a total of 154 cultivated rice varieties were genotyped with 44,263 high-quality SNP markers distributed across all 12 chromosomes. The genetic structure of these varieties was analyzed. Using the SNP markers located in cloned and characterized genes [26], especially the genes related to pathogen resistance and amylose content, the genetic basis of biotic stress resistance and rice eating quality was evaluated. Our results not only reveal the complex genetic diversity of rice varieties in Yunnan, but also provide useful information for future breeding practices.

Plant Materials and Growing Conditions
A total of 154 local rice germplasm accessions from the Yunnan province, China, together with three wild rice accessions (Oryza rufipogon, Oryza meyeriana and Oryza officinalis) and Nipponbare as the control variety, were analyzed in this study. All the materials were planted in an experimental plot located in Songming County, Kunming City, Yunnan province, China (102.41° E, 25.05° N). The sowing date was 17 March 2019, and the transplanting dates were between 14 and 16 May.

DNA Isolation
Flag leaves of each variety were collected for DNA isolation. DNA was extracted from the frozen leaf tissue of each variety using the CTAB (cetyltrimethylammonium bromide) method [27].

SNP Array Analysis and Phylogenetic Tree Construction
A total of 44,263 high-quality SNP markers (GenTrain score > 0.5, missing rate < 20%, heterozygous rate < 0.05, and minor allele frequency (MAF) > 0.05) were obtained after genotyping a germplasm collection of 4726 cultivated rice accessions. The distance between any two lines was measured as the proportion of polymorphic SNPs. The phylogenetic tree was constructed with an improved version of the neighbor-joining algorithm [28] using the ape package in R [29]. A Manhattan plot was generated with a custom R script.

Background Analysis
The genotypes of rice varieties were visualized using a custom program written in R. The program translated the SNP data into color bars scaled to the length of each chromosome. Chromosome fragments with homozygous SNP genotypes are shown as empty bars with a green border, whereas those with heterozygous SNP genotypes are shown as solid blue lines or bars.
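The step that turns per-SNP genotypes into chromosome fragments can be sketched as a run-length grouping of consecutive SNPs with the same zygosity. The following Python sketch is only an illustration of that idea, not the R program used here; the two-letter genotype strings, the function names and the example coordinates are hypothetical.

```python
from itertools import groupby


def genotype_segments(positions, calls):
    """Collapse consecutive SNPs of equal zygosity into (start, end, state) runs.

    positions: sorted SNP coordinates (bp) on one chromosome.
    calls: two-letter genotypes such as "AA" or "AG"; None marks a missing call.
    """
    states = []
    for pos, call in zip(positions, calls):
        if call is None:
            continue  # skip missing calls rather than guess their state
        states.append((pos, "hom" if call[0] == call[1] else "het"))
    segments = []
    for state, run in groupby(states, key=lambda item: item[1]):
        run = list(run)
        segments.append((run[0][0], run[-1][0], state))
    return segments


def heterozygous_fraction(calls):
    """Share of called SNPs that are heterozygous."""
    called = [c for c in calls if c is not None]
    if not called:
        return 0.0
    return sum(c[0] != c[1] for c in called) / len(called)


# Hypothetical SNPs on one chromosome of one variety.
positions = [1000, 5000, 9000, 15000, 21000, 30000]
calls = ["AA", "GG", "AG", "CT", None, "TT"]

segments = genotype_segments(positions, calls)
het_share = heterozygous_fraction(calls)
print(segments, het_share)
```

Each returned run could then be drawn as an empty or solid bar, and the heterozygous share gives a single summary value per variety.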

Functional Analysis of Target Genes
The SNP array platform was designed to cover the causal SNP(s) of most cloned rice genes according to published information [26]. To determine the genotype of a specific gene, the genotype of a causal SNP, or the haplotype represented by a combination of SNPs, was compared with a standard genotype represented by the indicated variety. A favorable allele of a gene is defined here as an allele that either confers disease resistance or enhances the eating and cooking qualities.
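The comparison of a tested variety with a representative variety can be sketched as follows. This is a simplified illustration under stated assumptions: the SNP identifiers and genotype calls are hypothetical, missing calls are ignored, and a variety is scored as carrying the reference haplotype only when every compared SNP matches (`min_match=1.0`). The matching rule actually applied in this study is not specified at this level of detail.

```python
def carries_reference_haplotype(variety_calls, reference_calls, min_match=1.0):
    """Compare a variety with a reference haplotype over the SNPs of one gene.

    Returns True or False, or None when no SNP of the gene was called.
    """
    compared = 0
    matched = 0
    for snp_id, ref_call in reference_calls.items():
        call = variety_calls.get(snp_id)
        if call is None:
            continue  # assumption: missing calls are not counted
        compared += 1
        matched += call == ref_call
    if compared == 0:
        return None
    return matched / compared >= min_match


# Hypothetical three-SNP haplotype of a representative (favorable) variety.
reference = {"snp_a": "GG", "snp_b": "TT", "snp_c": "CC"}

same = carries_reference_haplotype(
    {"snp_a": "GG", "snp_b": "TT", "snp_c": "CC"}, reference)
different = carries_reference_haplotype(
    {"snp_a": "GG", "snp_b": "AA", "snp_c": "CC"}, reference)
partial = carries_reference_haplotype(
    {"snp_a": "GG", "snp_c": "CC"}, reference)
uncalled = carries_reference_haplotype({}, reference)
print(same, different, partial, uncalled)
```

A gene typed by a single causal SNP is the special case in which `reference_calls` holds one entry.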

The Characteristics of Whole-Genome SNP Array
An SNP array platform was designed with a total of 44,263 high-quality SNP markers to cover the entire rice genome. These SNPs are almost evenly distributed on the 12 chromosomes, with an average of 11 SNP markers per 100 kb (Figure 1A). The gaps between adjacent SNPs were calculated (Figure 1B); most are shorter than 25 kb. Nearly a third of the gaps are shorter than 2 kb, and about 83.7% are shorter than 20 kb.
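Marker density and the distribution of gaps between adjacent SNPs are simple summary statistics. The Python sketch below shows how they could be computed; the function names and the toy coordinates are assumptions, and the numbers are not those of the array described here.

```python
def snp_density_per_100kb(n_snps, covered_length_bp):
    """Average number of SNPs per 100 kb of covered sequence."""
    return n_snps / (covered_length_bp / 100_000)


def adjacent_gaps(positions_by_chrom):
    """Distances (bp) between neighbouring SNPs, computed within each chromosome."""
    gaps = []
    for positions in positions_by_chrom.values():
        ordered = sorted(positions)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return gaps


def fraction_shorter_than(gaps, threshold_bp):
    """Share of gaps strictly shorter than the threshold."""
    return sum(g < threshold_bp for g in gaps) / len(gaps)


# Toy SNP coordinates on two chromosomes.
positions_by_chrom = {
    "chr1": [1_000, 2_500, 30_000, 31_000],
    "chr2": [5_000, 60_000],
}

gaps = adjacent_gaps(positions_by_chrom)
share_under_2kb = fraction_shorter_than(gaps, 2_000)
density = snp_density_per_100kb(6, 200_000)
print(gaps, share_under_2kb, density)
```

Gaps are taken within chromosomes only, so the last SNP of one chromosome and the first SNP of the next never form a pair.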

The Genetic Structure of the Local Varieties in Yunnan
The collection, comprising 154 local varieties together with three wild rice accessions (Oryza rufipogon, Oryza officinalis and Oryza meyeriana) and the control variety Nipponbare, was subjected to whole-genome analysis with an SNP array. Based on the genotypes of the SNP markers, these varieties were clustered into three groups, named groups I, II and III (Figure 2). The wild rice accessions and two other varieties, Fengdao 11 (No. 78) and Fengdao 15 (No. 80), were grouped together and composed group I. Group II, which comprised five local varieties, together with the varieties of group I, was genetically distinct from the other varieties in group III. A total of 94.2% of all tested varieties make up group III, which includes Nipponbare.
In the course of breeding, several elite varieties from other cultivation regions were introduced into the Yunnan province. Some of these varieties, such as Dozikara-1, Yuanbai, Linyou 22 and Zhikegu, were grouped together with different local varieties, which implies that genetic resources from these varieties were introduced into local varieties during the breeding process (Figure 2). However, another set of introduced elite varieties, comprising Lvmi 3, Lvmi 1, SMGR 4, Lvmi 2, Nanjing 5055 and Xuebai 2, clustered together, which suggests that they share a similar genetic background that has not been well used in the breeding process (Figure 2).

The Background Analysis of the Local Varieties in Yunnan
The genetic background of the parent lines in breeding crosses provides very important information for further breeding practices. Taking advantage of the whole-genome SNP array, the backgrounds of the 154 local varieties, the three wild rice accessions and the control plant Nipponbare were analyzed. Wild rice accessions are expected to exhibit heterogeneous backgrounds because of their outcrossing character [30]. Consistent with this, a large portion of the chromosome fragments in our wild rice accessions (Oryza officinalis, Oryza meyeriana and Oryza rufipogon) showed heterozygous genotypes, although to different extents (Figure 3A-C). Both the control plant Nipponbare and another example, Hexi 39, showed homozygous backgrounds (Figure 3D,E), although some small chromosome fragments display a heterozygous status. Another type of variety, exemplified by Lvmi 3, contains 1-5% heterozygous chromosome fragments (Figure 3F). Additionally, some varieties, such as Yinguang and especially Fengdao 11, show a large portion of heterozygous fragments (Figure 3G,H).
Figure 2. [...] and introduced varieties, respectively. The control variety Nipponbare is indicated with a blue line. Yunhui 290, a classical indica inbred variety, is indicated with a purple line.

Analysis of the Genes Involved in Rice Pathogen Infections
The whole-genome SNP array was designed to cover most of the functional markers located in or near cloned genes that are reported as important regulators of biotic stress resistance. For genes covered by more than one SNP marker, haplotypes were used to compare the genotypes of the tested varieties with representative alleles. For example, the haplotypes of the Bph6, Bph9, Bph14, Bph15, Bph18 and Bph26 genes are represented by 26, 26, 29, 56, 30 and 24 SNPs, respectively (Table 1). For genes with less nucleotide diversity, such as Rymv1, Pi63, Pizt and STV11, the functional SNP was used to distinguish functional from non-functional alleles. Overall, the frequency of favorable alleles in the local varieties differs between genes conferring resistance to different pathogens. The favorable alleles of genes related to brown planthopper resistance are almost absent in the tested varieties: only one variety (0.6%) shares the functional allele of Bph18 with the representative variety (Table 1). For genes conferring rice blast resistance, the favorable alleles of Pi1, Pid3 and Pita are the most common, but account for only 32.2%, 12.0% and 12.0% of all varieties, respectively (Table 1). The proportions of the favorable alleles of Pi2, Pi5, Pia, Pid2, Pikh and Pi63 are each less than 10%. Additionally, the favorable alleles of Pi9, Pigm and Pizt are completely absent in all varieties. Five genes, xa13, Xa21, Xa23, xa5 and Xa7, are commonly used for rice bacterial leaf blight resistance. However, only 20.3% of all varieties share a favorable allele with the representative variety (Table 1).
To further understand the genetic background of the collection with regard to biotic stress resistance, the number of favorable alleles possessed by each variety was counted. About half of the varieties contain no favorable alleles of genes conferring rice blast resistance (Figure 4A), and more than half contain no favorable alleles of genes conferring rice bacterial leaf blight resistance (Figure 4B). About 20% of the varieties harbor one favorable allele of the bacterial leaf blight resistance genes, and no variety contains two or more. For genes conferring rice blast resistance, 37.3% of the varieties harbor one favorable allele, and 16.4% of the varieties harbor more than one, but at most four, favorable alleles (Figure 4A).
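Counting favorable alleles per variety and summarizing the counts as proportions can be sketched in a few lines of Python. The gene names below are taken from the text, but the variety labels (V1 to V4) and the presence/absence calls are invented for illustration and do not reproduce Table 1 or Figure 4.

```python
from collections import Counter


def favorable_allele_counts(calls_by_variety, genes):
    """Number of genes in `genes` for which each variety carries a favorable allele."""
    return {
        variety: sum(bool(calls.get(gene)) for gene in genes)
        for variety, calls in calls_by_variety.items()
    }


def count_distribution(counts):
    """Proportion of varieties carrying 0, 1, 2, ... favorable alleles."""
    tally = Counter(counts.values())
    total = len(counts)
    return {k: tally[k] / total for k in sorted(tally)}


blast_genes = ["Pi1", "Pid3", "Pita"]

# Hypothetical calls: True means the favorable allele is present;
# a gene absent from the dictionary is treated as not favorable.
calls_by_variety = {
    "V1": {"Pi1": True, "Pid3": False, "Pita": False},
    "V2": {"Pi1": False, "Pid3": False, "Pita": False},
    "V3": {"Pi1": True, "Pita": True},
    "V4": {},
}

counts = favorable_allele_counts(calls_by_variety, blast_genes)
shares = count_distribution(counts)
print(counts, shares)
```

Treating an uncalled gene as not favorable is a conservative choice; a stricter analysis might exclude such varieties from the denominator.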

Analysis of the Genes Involved in Rice Eating and Cooking Qualities
Breeding varieties with better eating and cooking qualities was another purpose of this breeding program. Thus, we analyzed the genotypes of the key genes Wx and ALK, which regulate amylose content, as well as BADH2, which controls rice fragrance. Interestingly, 94.2% and 68.8% of our varieties possess the favorable alleles of Wx and ALK, respectively (Table 2). However, only one fifth of all varieties harbor the favorable allele of the fragrance gene BADH2. Further analysis revealed that the varieties containing the favorable alleles of both Wx and ALK account for only 35.1%. These results provide an overall view of the genetic constitution of the target traits, which may assist further breeding to combine favorable alleles.
Table 2. Functional analysis of the major genes affecting the eating and cooking qualities in the collection of rice varieties.

Discussion
The collection of varieties was clustered into three groups (Figure 2). Group II contained five varieties, including Yunhui 290, which is a classical indica inbred variety (https://ricedata.cn/variety/varis/606611.htm (accessed on 30 April 2022)). This result suggests that the other four varieties in group II, which share a common background with Yunhui 290, may represent indica varieties. Likewise, the varieties of group III, which grouped together with Nipponbare, may represent japonica varieties. Introduced varieties, such as Dozikara-1, Yuanbai and Linyou 22, are clustered in group III, but in different subgroups. This result implies that these introduced varieties have been well utilized and introgressed into local varieties, thereby creating several new varieties. However, the varieties Lvmi 3, Lvmi 1, SMGR 4, Lvmi 2, Nanjing 5055 and Xuebai 2 were clustered together, which suggests that these introduced varieties share a highly similar background and have not been well utilized by local breeders.
The background analysis showed that the wild accessions Oryza rufipogon, Oryza officinalis and Oryza meyeriana contain different levels of heterozygous fragments in their genomes (Figure 3), which suggests that these accessions remain wild in nature, although they have been collected and maintained by researchers for several years. The background analysis also showed that several tested varieties, such as Lvmi 3, possess only a small number of heterozygous fragments. This result not only gives an intuitive view of the genetic background of the tested varieties, but also demonstrates the usefulness of the SNP array in breeding. In fact, several studies have shown that whole-genome arrays can be used effectively for the deliberate selection of target genes [13,31].
Thanks to advances in rice functional genomics, a series of genes regulating biotic stress resistance have been cloned, such as Bph genes for brown planthopper resistance [32], Pi genes for rice blast resistance [33] and Xa (xa) genes for rice bacterial leaf blight resistance [34]. Interestingly, our results show that the favorable alleles of biotic stress resistance genes are poorly retained in the local varieties of the Yunnan province. The favorable alleles of the Bph genes are almost absent in the tested varieties, except for one variety that contains the favorable allele of Bph18 (Table 1). For genes related to rice bacterial leaf blight resistance, 20.3% of the varieties contain the favorable allele of Xa21, and no variety harbors the favorable allele of xa13, Xa23, xa5 or Xa7. These results suggest that rice breeding during the last few decades pursued a high grain yield and paid less attention to biotic stress resistance. Nowadays, beyond achieving a high grain yield, new varieties should be adapted to growing conditions with fewer pesticides to meet the goals of sustainable agriculture. Thus, our results could provide essential information for selecting both candidate parents and favorable alleles of target genes in future breeding.

Conclusions
To better understand the genetic background of rice varieties cultivated in the Yunnan province, China, a collection of 154 varieties from that region was subjected to genetic background analysis with a high-density SNP array. The analysis revealed complex genetic diversity, represented by distinct backgrounds and a high percentage of homozygous genomic fragments. Further analysis showed that the favorable alleles of pathogen resistance genes occur at a relatively low frequency in this collection. However, most varieties possess the favorable alleles of the Wx and ALK genes, which confer better eating and cooking qualities. These results provide useful information for the future genetic improvement of rice in breeding practices.