Genetic Basis Dissection for Eating and Cooking Qualities of Japonica Rice in Northeast China

: The japonica rice in Northeast China is famous because of its high quality. Eating and cooking qualities (ECQs) are the most important factors that determine cooked rice quality. However, the genetic basis of ECQ of japonica varieties in Northeast China needs further study. In this study, 200 japonica varieties that are widely distributed in Northeast China were collected to evaluate the physicochemical indices of grain ECQs. The distribution of each trait was concentrated without large variations. Correlation analysis indicated that gel consistency (GC) had a signiﬁcantly negative correlation with gelatinization temperature (GT). By integrating various analyses including kinship calculation, principal component analysis (PCA), linkage disequilibrium (LD) analysis, and original parent investigation, we found that the japonica varieties in Northeast China exhibited a narrow genetic basis. An association study for grain ECQs was performed and eight quantitative trait loci (QTLs) were detected. ALK was the major locus that regulated GT and also signiﬁcantly a ﬀ ecting GC. Through the linkage disequilibrium (LD) and expression pattern analysis, one possible candidate gene (LOC_Os02g29980) was predicted and required further research for validation. Additionally, a di ﬀ erent allele of Wx was identiﬁed in the variety CH4126, and ALK was not ﬁxed in these japonica varieties. These results further elucidate the genetic basis of ECQs of japonica varieties in Northeast China and provide local breeders some assistance for improving ECQs of rice grain in rice breeding.


Introduction
Rice (Oryza sativa L.) is a staple food and feeds more than half of the world's population [1]. In China, rice production has increased almost every year. Because of the increasing rice yield and living standard for customers, more and more rice breeders and consumers have been focusing on rice grain quality, which includes grain milling, appearance, eating and cooking qualities, and nutritional quality. Eating and cooking qualities (ECQs) are the most important factors that determine cooked rice quality. Amylose content (AC), gel consistency (GC), and gelatinization temperature (GT) are the key physicochemical properties of ECQs.
Starch is the major component of rice grain which usually contains two types of starch, i.e., amylose and amylopectin [2]. AC is considered to be the most important factor among rice ECQ. The rice grains with high AC are usually dry, separate, and firm when cooked, whereas rice grains with low AC are soft, glossy, and cohesive [3]. Rice grains with diverse AC meet various demands of consumers [4]. GC is the measure of rice starch gel strength and is another property that reflects firmness and stickiness of cooled cooked rice. Therefore, it has been used as a key character to determine the texture property of

Genotyping, Kinship, and LD Decay Analysis
All 200 japonica rice varieties were genotyped for 90K SNPs using a high-density rice array [24]. Only 21,308 SNPs were selected for further study due to the following filter criterion: missing data ratio (MDR) <0.2 and markers with frequency of minor allele (MAF) >0.05 (Supplementary Table S2). The kinship matrix was calculated using TASSEL software [25]. The linkage disequilibrium (LD) coefficient was calculated using the PLINK software and the LD decay plot was drawn using Microsoft Excel software.

Genome-Wide Associated Mapping
The genome-wide associated study (GWAS) was performed to detect the association between phenotype and genotype using TASSEL software [25]. A mixed linear model was used for the associated mapping. Kinship and principal components analysis (PCA) were used to correct the GWAS results. Among these, the PCA was calculated using EIGENSOFT [26]. To obtain independent association signals, multiple SNPs passing the threshold on the same chromosome were clustered as one association locus, and the SNP with the minimum P-value in a cluster was considered as the lead SNP. The threshold value was determined as follows: P threshold = 1/N (1) where N indicates the number of SNPs used in GWAS and P threshold indicates the threshold of the P-value. In our study, to simplicity, the -log 10 (P) = 4 was used for the threshold value Equation (1).

Candidate Gene Analysis
According to the GWAS results, we calculated the LD decay to discover the candidate regions of the significant loci using the Haploview software (Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA) [27]. The expression pattern of the genes in the candidate region was obtained from the RNA-seq database in the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/).

Distributions of AC, GC, and GT in the Japonica Population
Except for GT, all traits showed continuous distributions and large variations in the japonica population ( Figure 1). For GC, the value of every variety ranged from 45 to 80 mm, and mostly focused on the length between 50 and 65 mm ( Figure 1B). The AC ranged from 14% to 20% and more than half of varieties harbored the AC of 16%~18% ( Figure 1C). Therefore, most of the varieties exhibited low AC (12% to 20%) and medium GC (41 to 60 mm). In addition, the distribution of GT was discontinuous. The value of GT was focused on the degree 4~5 and 6~7, with the largest portion focused on degree 6~7 ( Figure 1A). Except for GT, all traits showed continuous distributions and large variations in the japonica population ( Figure 1). For GC, the value of every variety ranged from 45 to 80 mm, and mostly focused on the length between 50 and 65 mm ( Figure 1B). The AC ranged from 14% to 20% and more than half of varieties harbored the AC of 16%~18% ( Figure 1C). Therefore, most of the varieties exhibited low AC (12% to 20%) and medium GC (41 to 60 mm). In addition, the distribution of GT was discontinuous. The value of GT was focused on the degree 4~5 and 6~7, with the largest portion focused on degree 6~7 ( Figure 1A).

Geographical Distributions of Rice Varieties
We found that more than half of the varieties were from Liaoning province (LN), whereas only 52 and 20 varieties were from Heilongjiang province (HLJ) and Jilin province (JL), respectively ( Figure 2A). To investigate the regional differences of these traits, we compared every trait among the three provinces and found that the AC in the varieties planted in the north was lower than that of those varieties grown in the south. Moreover, the GC of the varieties from LN was softer than that of those varieties from HLJ and JL. In addition, there was no regularity in the GT ( Figure 2B-D).

Geographical Distributions of Rice Varieties
We found that more than half of the varieties were from Liaoning province (LN), whereas only 52 and 20 varieties were from Heilongjiang province (HLJ) and Jilin province (JL), respectively ( Figure 2A). To investigate the regional differences of these traits, we compared every trait among the three provinces and found that the AC in the varieties planted in the north was lower than that of those varieties grown in the south. Moreover, the GC of the varieties from LN was softer than that of those varieties from HLJ and JL. In addition, there was no regularity in the GT ( Figure 2B-D).

Correlation Analysis of AC, GC, and GT
AC, GC, and GT collectively affect the ECQ in rice grain. The correlation analysis showed that AC exhibited a significant positive correlation to GC with the correlation coefficient of 0.3. The highest negative correlation was observed between GT and GC, and the correlation coefficient reached up to -0.42. In addition, there was no significant correlation between GT and AC, and the correlation coefficient was -0.1 ( Figure 3).

Correlation Analysis of AC, GC, and GT
AC, GC, and GT collectively affect the ECQ in rice grain. The correlation analysis showed that AC exhibited a significant positive correlation to GC with the correlation coefficient of 0.3. The highest negative correlation was observed between GT and GC, and the correlation coefficient reached up to -0.42. In addition, there was no significant correlation between GT and AC, and the correlation coefficient was -0.1 ( Figure 3).

Correlation Analysis of AC, GC, and GT
AC, GC, and GT collectively affect the ECQ in rice grain. The correlation analysis showed that AC exhibited a significant positive correlation to GC with the correlation coefficient of 0.3. The highest negative correlation was observed between GT and GC, and the correlation coefficient reached up to -0.42. In addition, there was no significant correlation between GT and AC, and the correlation coefficient was -0.1 ( Figure 3).

The Japonica Varieties in Northeast China Displayed Narrow Genetic Basis
On the basis of the genotype of the total 200 varieties, the pairwise relative kinship value was analyzed ( Figure 4A). Only 0.6% of the values were less than 0.1, and 22.1% of the values ranged from 0.1 to 0.5. Moreover, 77.3% of the values were larger than 0.5 ( Figure 4B). Together, these results suggest that there was high relatedness among the japonica varieties in Northeast China. Thus, we consequently investigated the genealogy of all varieties, and the original parent of these varieties focused on four varieties which included Liaogeng5, Liaogeng326, Songgeng3, and Hejiang20 ( Figure 4C). Furthermore, Liaogeng5 was widely used as a parent for 79% of all the varieties, which was the possible reason for the high kinship between each of the two varieties. In addition, the linkage disequilibrium (LD), for all the japonica varieties, was then analyzed. The LD decay rate was measured as the chromosomal distance at which the average pairwise correlation coefficient dropped to half the value of the maximum r 2 ( Figure 4D). The result showed that the genome-wide LD decay rate was approximately estimated at 1 Mb where the r 2 dropped to 0.34. The PCA analysis was also performed to study the population structure. We were not able to divide the population into some obvious subgroups with the different distributions along the two eigenvectors. PC1 and PC2 accounted for 10.3% and 10.0% of the genetic variation, respectively. The top ten PCs only represented 48.9% of the genetic variation (Supplementary Figure S1). In total, it indicated that the japonica varieties in Northeast China displayed a narrow genetic basis and resulted in long-range LD and high relatedness.

The Japonica Varieties in Northeast China Displayed Narrow Genetic Basis
On the basis of the genotype of the total 200 varieties, the pairwise relative kinship value was analyzed ( Figure 4A). Only 0.6% of the values were less than 0.1, and 22.1% of the values ranged from 0.1 to 0.5. Moreover, 77.3% of the values were larger than 0.5 ( Figure 4B). Together, these results suggest that there was high relatedness among the japonica varieties in Northeast China. Thus, we consequently investigated the genealogy of all varieties, and the original parent of these varieties focused on four varieties which included Liaogeng5, Liaogeng326, Songgeng3, and Hejiang20 ( Figure 4C). Furthermore, Liaogeng5 was widely used as a parent for 79% of all the varieties, which was the possible reason for the high kinship between each of the two varieties. In addition, the linkage disequilibrium (LD), for all the japonica varieties, was then analyzed. The LD decay rate was measured as the chromosomal distance at which the average pairwise correlation coefficient dropped to half the value of the maximum r 2 ( Figure 4D). The result showed that the genome-wide LD decay rate was approximately estimated at 1 Mb where the r 2 dropped to 0.34. The PCA analysis was also performed to study the population structure. We were not able to divide the population into some obvious subgroups with the different distributions along the two eigenvectors. PC1 and PC2 accounted for 10.3% and 10.0% of the genetic variation, respectively. The top ten PCs only represented 48.9% of the genetic variation (Supplementary Figure S1). In total, it indicated that the japonica varieties in Northeast China displayed a narrow genetic basis and resulted in long-range LD and high relatedness.

GWAS for AC, GC, and GT
Association mapping was performed under a mixed linear model with the kinship matrix and the top three PCs as covariates. In total, 130 significant SNPs containing only eight regions associated with the three traits were detected at the threshold of four across all 12 chromosomes ( Figure 5 and Table 1). For GC, eight SNP loci were detected on chromosome 6. These eight loci were close to each other, therefore, only one region, named qGC6, was found near the position 6.7 Mb, in the associated analysis for GC, which accounted for 21% of the phenotypic variance ( Figure 5B). Two significant SNPs for AC were found at the position of 8.30 Mb on chromosome 11 and 6.84 Mb on chromosome 7, respectively. The two SNP loci, namely qAC11 and qAC7, accounted for 7.38% and 12.9% of the phenotypic variance, respectively ( Figure 5C). In addition, 120 significant SNPs were detected in the association mapping for GT. Most SNPs were clustered together and only five regions, namely qGT1, qGT2-1, qGT2-2, qGT6-1, and qGT6-2, were obtained. qGT1, located on chromosome 1 at the position of 15.2 Mb, accounted for 13.7% of the phenotypic variance with the -log 10 (P) value of 5.38. qGT2-1 and qGT2-2 were the two loci on chromosome 2, which explained 18.9% and 9.7% of the phenotypic variance with the -log 10 (P) value of 8.77 and 4.03, respectively. qGT6-2 was at the position of 20.1 Mb on chromosome 6 with the lower value in the associated mapping for GT, explaining 9% of the phenotypic variance. qGT6-1, which had a highest-peak SNP, located on chromosome 6 near the position 6.7 Mb, explained 29.7% of the phenotypic variance ( Figure 5A). Interestingly, qGT6-1 was overlapped with the region of qGC6.

GWAS for AC, GC, and GT
Association mapping was performed under a mixed linear model with the kinship matrix and the top three PCs as covariates. In total, 130 significant SNPs containing only eight regions associated with the three traits were detected at the threshold of four across all 12 chromosomes ( Figure 5 and Table 1). For GC, eight SNP loci were detected on chromosome 6. These eight loci were close to each other, therefore, only one region, named qGC6, was found near the position 6.7 Mb, in the associated analysis for GC, which accounted for 21% of the phenotypic variance ( Figure 5B). Two significant SNPs for AC were found at the position of 8.30 Mb on chromosome 11 and 6.84 Mb on chromosome 7, respectively. The two SNP loci, namely qAC11 and qAC7, accounted for 7.38% and 12.9% of the phenotypic variance, respectively ( Figure 5C). In addition, 120 significant SNPs were detected in the association mapping for GT. Most SNPs were clustered together and only five regions, namely qGT1, qGT2-1, qGT2-2, qGT6-1, and qGT6-2, were obtained. qGT1, located on chromosome 1 at the position of 15.2 Mb, accounted for 13.7% of the phenotypic variance with the -log10(P) value of 5.38. qGT2-1 and qGT2-2 were the two loci on chromosome 2, which explained 18.9% and 9.7% of the phenotypic variance with the -log10(P) value of 8.77 and 4.03, respectively. qGT6-2 was at the position of 20.1 Mb on chromosome 6 with the lower value in the associated mapping for GT, explaining 9% of the phenotypic variance. qGT6-1, which had a highest-peak SNP, located on chromosome 6 near the position 6.7 Mb, explained 29.7% of the phenotypic variance ( Figure 5A). Interestingly, qGT6-1 was overlapped with the region of qGC6.    We compared the significant SNP loci detected in this study with the quantitative trait loci (QTLs)/genes reported previously (Table 1). Wx was the major QTL for AC, but in our study, it was not detected in the GWAS for AC. However, ALK, the major gene for gelatinization temperature, overlapped the region of qGC6/qGT6-1, which regulated both the GC and GT in rice grain. In the intervals of the other QTLs, there was no overlap as compared with the QTLs previously reported to be associated with rice ECQs.

Candidate Gene Analysis
In our study, only eight QTLs were detected, and candidate gene analysis was performed on the two major important QTLs, qAC11 and qGT2-1 ( Figure 6). For the region of qAC11, according to the LD decay analysis, a total 1.02 Mb region ranging from 7.78 to 8.80 Mb on chromosome 11 ( Figure 6B) was identified as the candidate region which contained more than 150 genes. It seems that it is difficult to determine which genes are useful from the large number of genes in the candidate region. A similar result was found for qGT2-1 in the region of 17.51 to 18.44 Mb on chromosome 2, and an approximately 928 kb region was considered to be a candidate region using the LD decay analysis ( Figure 6A).
Mostly, the genes regulate the ECQ in rice grain which is always highly expressed in rice endosperm, such as Wx and ALK ( Figure 7E-F). We screened all of the genes located in the candidate regions using the RNA-seq data published by the Rice Genome Annotation Project and selected 33 genes that were expressed in rice endosperm (Supplementary Table S3). On the basis of the expression pattern analysis, four genes including LOC_Os02g29980, LOC_Os02g30300, LOC_Os02g30470, and LOC_Os02g30600 were relatively higher expressed in rice seed and endosperm and lower expressed in shoots and leaf ( Figure 7A-D). However, there was no possible gene selected in the candidate region for qAC11. Except for LOC_Os02g30600, the other three genes harbored the highest expression in rice grain endosperm. Additionally, according to the gene function annotation, both LOC_Os02g30470, and LOC_Os02g30600 encode the expressed proteins with unknown function, LOC_Os02g29980 encodes an X8 domain containing protein, and LOC_Os02g30300 encodes a MYND finger family protein.
Further analysis of sequences of the four genes using BLAST, revealed that the amino acid sequence of LOC_Os02g29980 was highly similar to the 1,3-beta-glucosidase, which could affect the starch synthesis, and LOC_Os02g30300 was homologous to F-box protein. Previous reports have demonstrated that the F-box protein was involved in various physiological and biochemical pathways including panicle and flower development, heading data, plant hormone, and abiotic stress [28]; however, none were directly related to the starch synthesis and rice quality. Additionally, the expression pattern of LOC_Os02g29980 was most similar to those of Wx and ALK. Therefore, it is possible that LOC_Os02g29980 can regulate GT in rice, but further research is required for validation.

Analysis of ALK and Wx in the Japonica Varieties
According to our GWAS results, the major gene for AC, Wx, was not detected. It was abnormal and strange in the QTL analysis or GWAS for AC, on the basis of many previous reports. Twenty varieties with highest AC and lower AC were selected, respectively, and their Wx gene was sequenced using the primer in Supplementary Table S4. The results showed that all varieties, except CH4126, exhibited the same sequence in Wx gene. And the sequence analysis revealed that the Wx gene in all sequenced varieties, including CH4126, harbored the allele of Wx b [29]. Interestingly, the variety CH4126 had the largest AC value which reached up to 26.8%, whereas the values of AC in all other varieties were less than 20%. The sequencing analysis in CH4126 showed that it had 21 variations in the Wx as compared with other sequenced varieties ( Figure 8A, Supplementary Table  S5). Among these variations, only two variations (snp3 and snp4) were located on the exon. Snp4 was a nonsynonymous mutation at the position of 3377 bp that resulted in amino acid change from proline to serine, which could be the reason for the high AC in CH4126.

Analysis of ALK and Wx in the Japonica Varieties
According to our GWAS results, the major gene for AC, Wx, was not detected. It was abnormal and strange in the QTL analysis or GWAS for AC, on the basis of many previous reports. Twenty varieties with highest AC and lower AC were selected, respectively, and their Wx gene was sequenced using the primer in Supplementary Table S4. The results showed that all varieties, except CH4126, exhibited the same sequence in Wx gene. And the sequence analysis revealed that the Wx gene in all sequenced varieties, including CH4126, harbored the allele of Wx b [29]. Interestingly, the variety CH4126 had the largest AC value which reached up to 26.8%, whereas the values of AC in all other varieties were less than 20%. The sequencing analysis in CH4126 showed that it had 21 variations in the Wx as compared with other sequenced varieties ( Figure 8A, Supplementary Table S5). Among these variations, only two variations (snp3 and snp4) were located on the exon. Snp4 was a nonsynonymous mutation at the position of 3377 bp that resulted in amino acid change from proline to serine, which could be the reason for the high AC in CH4126.
variety CH4126 had the largest AC value which reached up to 26.8%, whereas the values of AC in all other varieties were less than 20%. The sequencing analysis in CH4126 showed that it had 21 variations in the Wx as compared with other sequenced varieties ( Figure 8A, Supplementary Table  S5). Among these variations, only two variations (snp3 and snp4) were located on the exon. Snp4 was a nonsynonymous mutation at the position of 3377 bp that resulted in amino acid change from proline to serine, which could be the reason for the high AC in CH4126. qGT6-1/qGC6 was the largest significant locus in our GWAS results, overlapping the previously reported gene, ALK, which was the major gene controlling the gelatinization temperature. Sequencing analysis demonstrated that there were many variations between varieties with high GT and low GT ( Figure 8B, Supplementary Table S6). Five variations were located on the exon of ALK, and three of them could lead to the protein change. These variations in ALK were similar to a previous study. Together with the results of GWAS and the sequencing analysis of ALK, it suggested that the gelatinization temperature of japonica varieties in Northeast China was mainly controlled by ALK.

Discussion
The Northeast Plain is the largest plain in China, and the rice grown in this area accounts for a significant portion of the total rice production in China [30]. High-quality rice is conducive to the commercialization of rice. Hence, with increasing rice yield, excellent rice quality has been one of the aims of breeders in Northeast China. Moreover, breeding is the process of artificial selection, which leads to low genetic diversity in the varieties planted in one region. Generally, ECQs are invisible characters and not easily selected as compared with other visible traits. In our study, we statistically analyzed the AC, GC, and GT of the 200 varieties in Northeast China. The distribution of these traits showed that the values of each trait assembled in a narrow range without a large variation. Among these traits, GC relatively displayed the largest variation in the 200 japonica varieties. We suggest that GC is potentially difficult to select in Northern China which could be due to the deficiency of the major control gene. In addition, a modern breeding program always involves genetically improving the elite varieties used before. In total, 98% of the varieties used one or more of the four varieties including Liaogeng5, Liaogeng326, Songgeng3, and Hejiang20 as their original parents. Because of this, it is easy to explain the close kinship and long-range LD among these varieties. Therefore, the genetic basis of the varieties in Northeast China is narrow, which could be the reason for the production bottleneck in rice breeding. In order to resolve this bottleneck, we suggest introducing other excellent genetic germplasm and broadening the genetic basis of the japonica varieties in Northeast China.
A previous study showed that japonica had the Wx b allele, which is consistent with our results [29]. Wx was the only major effect gene in the rice germplasm controlling the AC in rice grain. It leads to the similar AC in most of the varieties. Similar to Wx, ALK is the only major effect gene regulating GT. In our study, about 11% of the varieties exhibited low GT, which means that ALK was not fixed in the japonica varieties in Northeast China. The process of indica-japonica differentiation could result in differentiation of Wx in japonica and indica, while ALK had no obvious difference between the two subspecies. Thus, ALK was subjected to human selection for rice breeding in Northeast China, due to the high GT in most japonica varieties there.
Fortunately, we found that the variety CH4126 had the highest AC of 26.8%. By sequencing Wx, we discovered that the functional locus of the Wx b allele still existed. In addition, many other variations, including a nonsynonymous mutation, were detected. Interestingly, the nonsynonymous mutation in CH4126 was the rare allele of Wx and it had been reported that the mutation was not significantly associate with AC [31]. However, in our study, the variation in Wx could be the main reason for the high AC in CH4126. Of course, the results need further research for validation. These results could provide a new genetic resource to dissect the genetic mechanism of Wx regulating the AC in rice grain.
The GWAS results detected eight QTLs of ECQs; only one QTL for GC that overlapped with qGT6-1 was detected, the functional gene in the region was ALK. Thus, ALK is an important gene regulating the GC in the japonica varieties in Northeast China. Correlation analysis showed that the highest correlation was detected between GC and GT, which verified the pleiotropism of ALK to some extent. The result was different from a previous report [8], which found that ALK had a minor effect on GC. In our study, ALK had a significant effect on GC, possibly due to the different genetic population. Except for ALK, other QTLs were new and did not overlap with the previously reported genes and QTLs. Due to the minor effects of these QTLs and long-range of LD, it was difficult to find the candidate gene. However, the genes regulating the ECQs are often highly expressed in endosperm, such as Wx and ALK, and the number of these genes is not very high. Therefore, screening the expression pattern of all genes in the candidate region is a good way for candidate gene determination.
GWAS is useful for genetic dissection of complex traits in rice and has been widely proven by many studies [19,32,33]. Furthermore, many novel genes have been identified using GWAS [34][35][36]. However, GWAS still has some deficiencies. Population structure is a very important factor that affects GWAS results and can produce false positive results. Its influence can be weakened by using the mixed linear model method in GWAS. In our study, 200 japonica varieties were collected from Northeast China. These collections exhibited a low-level population structure which was suitable for GWAS and reduced false positive results, but the close kinship among the population lead to long range LD, which made it difficult to determine the key genes.
The results in our study should be helpful not only for understanding the genetic basis of japonica varieties in Northeast China, but also for providing a theoretical basis of ECQs. This is a benefit to ECQ improvement for rice breeding in Northeast China.

Conclusions
In our study, 200 japonica varieties widely distributed in Northeast China were collected to evaluate the physicochemical indices of grain ECQs. We found that the japonica varieties displayed a narrow genetic basis in Northeast China. Through the association study, we found that ALK was the major locus that regulating GT, and also significantly affecting GC, which was consistent with the results in the correlation analysis. In addition, one possible candidate gene (LOC_Os02g29980) was finally predicted. Additionally, two major genes for rice ECQs (ALK and Wx) in the japonica population were also studied. These results further contribute to the understanding of the genetic basis of ECQs of japonica varieties in Northeast China and provide breeders, in the area, some assistance for improving ECQs of rice grain in rice breeding.

Supplementary Materials:
The following are available online http://www.mdpi.com/2073-4395/10/3/423/s1, Table S1: Information for 200 japonica varieties from Northeast China, Table S2: The genotype of 200 japonica varieties in Northeast China, Table S3: Presentation of the candidate genes that expressed in rice endosperm, Table S4: Primers for sequencing in this study, Table S5: Gene diversity in Wx between CH4126 and other sequenced varieties, Table S6: Gene diversity in ALK between low and high GT varieties, Figure S1: Principal component analysis of 200 japonica varieties.
Author Contributions: Y.Y. and X.W. conceived and designed research; Y.Y. and X.X. conducted experiments; M.Z., Q.X., Y.F., and X.Y. performed the phenotypic identification; Y.Y., H.Y., and Y.W. analyzed the data; Y.Y. and X.X. wrote the manuscript; X.W. and M.Z. helped to revise the manuscript. All authors have read and agreed to the published version of the manuscript.