Genomic Variation Underlying the Breeding Selection of Quinoa Varieties Longli-4 and CA3-1 in China

Quinoa (Chenopodium quinoa) is a well-known climate-resilient crop and has been introduced into multiple marginal lands across the world, including China, to improve food security and/or balanced nutrient supplies. Conventional breeding has been widely applied in the selection and breeding of quinoa varieties in China since 1980s; however, few studies have been implemented on the genetic variances among different varieties developed by diversity breeding objectives. In this study, the phenotypic and genetic differences between two varieties (Longli-4 and CA3-1) from China were systematically analyzed. A total of 407,651 and 2,731,411 single nucleotide polymorphisms (SNPs) and 212,724 and 587,935 small insertion and deletion (INDELs) were detected for Longli-4 and CA3-1, respectively, when compared with the reference genome of PI614886. The SNPs/INDELs were unevenly distributed across each chromosome for both varieties. There were 143,996 SNPs and 83,410 INDELs shared between Longli-4 and CA3-1, accounting for 4% of the total variances. The variation was then screened based on the SNP effects. There were 818 and 73 genes with the variety-specific non-synonymous and stop-gain variation in Longli-4, whereas there were 13,701 and 733 genes in CA3-1. Specifically, 3501 genes with the non-synonymous variation and 74 genes with the stop-gain variation were found in both Longli-4 and CA3-1. These results suggest that convergent selection occurred during the different breeding processes. A set of candidate genes related to agronomic traits and domestication were further selected to detect the genetic divergence in detail in the two varieties. Only one domestication gene was identified having Longli-4-specific stop-gain variation. Twelve candidate genes related to betalain (1), flowering (4), seed size (2), domestication (1), and saponin (4) were identified having CA3-1-specific stop-gain variation. Interestingly, one seed size gene homologous of CKX1 (cytokinin oxidase/dehydrogenase 1) had the stop-gain variation in both varieties. This research will therefore provide guidance for the molecular-assisted breeding in quinoa.


Introduction
Quinoa (Chenopodium quinoa) is an allotetraploid plant, belonging to the subfamily Chenopodioideae in Amaranthaceae [1,2]. Historical documents and archaeological evidence suggest that quinoa was already a traditional food for the native people of the Andes as early as 7000 years ago [3,4]. Due to its excellent tolerance to diverse environments and well-balanced nutritional values, quinoa has been characterized as a resilient crop [4][5][6] and has also been prized as a "whole grain" for human beings [7,8]. Since the year 2013 was named "the International Year of Quinoa" by the United Nations, the number of countries and regions cultivating quinoa have expanded substantially, reaching up to 100 genes/pathways are convergently selected in Longli-4 and CA3-1? Finally, the outcrossing rate of quinoa in different environments is in the range of 1.5-17.36% [4,33], and thus, maintaining the genetic homogeneity of the quinoa cultivars is difficult. To what extent are the homogeneity values maintained in Longli-4 and CA3-1 under different cultivation processes? With the release of quinoa genome information [19,34,35], it is possible to provide a broad perspective on the genetic variation of each variety. This manuscript focused on Longli-4 and CA3-1, and the genetic variation against the reference line PI614886 [19] was analyzed in each variety. The selection bias in the genome and individual genes was presented between Longli-4 and CA3-1. In the end, the variation patterns of some candidate genes involved in agronomic traits and domestication were reported, and the homogeneity of each variety was measured by several molecular markers.

Phenotype Comparison between Longli-4 and CA3-1
Longli-4 and CA3-1 were sown in the growth chamber and a significant difference was mainly observed in mature grains ( Figure 1). The colors of Longli-4 and CA3-1 grains were light-yellow and dark, respectively. The thousand-grain weight (TGW) and grain diameter of Longli-4 were significantly lower than those of CA3-1, although at different statistical levels. The stem of Longli-4 plant was in green color, while the CA3-1 was in purple (Supplemental Figure S1A). The internode lengths of the Longli-4 and CA3-1 varied differently in the top, middle, and basal stems, but the whole plant heights were similar in these two varieties (Supplemental Figure S1B-I). Likewise, no obvious difference was observed in lateral branch number, stem diameter, leaf area, panicle length, and panicle diameter (Supplemental Figure S1J-M,P-R). The top inflorescences exhibited different types within and between varieties, partly due to the variation in pedicel length, but the flowering time was similar between Longli-4 and CA3-1 (Supplemental Figure S1M), which is consistent with previous field assays [31]. The hermaphrodite flowers in both varieties contained five anthers (Supplemental Figure S1N). During the ripening process, the color of the sepals, which enclosed the grain, changed from green to yellow in Longli-4, while for CA3-1, the color did not change until the grain was ready to harvest, and the grain had a dark red or purple color on its surface (Supplemental Figure S1O). weakness resistance, prematurity and high yield potential [31]. Is it possible that variances in orthologous genes/pathways are convergently selected in Longli-4 and CA3-1? Finally, the outcrossing rate of quinoa in different environments is in the range of 1.5-17.36% [4,33], and thus, maintaining the genetic homogeneity of the quinoa cultivars is difficult.
To what extent are the homogeneity values maintained in Longli-4 and CA3-1 under different cultivation processes? With the release of quinoa genome information [19,34,35], it is possible to provide a broad perspective on the genetic variation of each variety. This manuscript focused on Longli-4 and CA3-1, and the genetic variation against the reference line PI614886 [19] was analyzed in each variety. The selection bias in the genome and individual genes was presented between Longli-4 and CA3-1. In the end, the variation patterns of some candidate genes involved in agronomic traits and domestication were reported, and the homogeneity of each variety was measured by several molecular markers.

Phenotype Comparison between Longli-4 and CA3-1
Longli-4 and CA3-1 were sown in the growth chamber and a significant difference was mainly observed in mature grains ( Figure 1). The colors of Longli-4 and CA3-1 grains were light-yellow and dark, respectively. The thousand-grain weight (TGW) and grain diameter of Longli-4 were significantly lower than those of CA3-1, although at different statistical levels. The stem of Longli-4 plant was in green color, while the CA3-1 was in purple (Supplemental Figure S1A). The internode lengths of the Longli-4 and CA3-1 varied differently in the top, middle, and basal stems, but the whole plant heights were similar in these two varieties (Supplemental Figure S1B-I). Likewise, no obvious difference was observed in lateral branch number, stem diameter, leaf area, panicle length, and panicle diameter (Supplemental Figure S1J-M, P-R). The top inflorescences exhibited different types within and between varieties, partly due to the variation in pedicel length, but the flowering time was similar between Longli-4 and CA3-1 (Supplemental Figure S1M), which is consistent with previous field assays [31]. The hermaphrodite flowers in both varieties contained five anthers (Supplemental Figure S1N). During the ripening process, the color of the sepals, which enclosed the grain, changed from green to yellow in Longli-4, while for CA3-1, the color did not change until the grain was ready to harvest, and the grain had a dark red or purple color on its surface (Supplemental Figure S1O).

Statistics of Genomic Variances in Longli-4 and CA3-1 against the Reference Genome (PI614886)
A total of 3,149,050 single nucleotide polymorphisms (SNPs) and 711,178 small insertion and deletion (INDELs) were found when both Longli-4 and CA3-1 were included for variance calling. After filtering out the SNPs and INDELs that were not mounted to the chromosome, a total of 2,970,656 SNPs and 674,628 INDELs were obtained ( Figure 2). To identify the genomic variation in each variety, the SNP and INDEL datasets were screened individually for Longli-4 and CA3-1. After filtering the variances with no read support

Statistics of Genomic Variances in Longli-4 and CA3-1 against the Reference Genome (PI614886)
A total of 3,149,050 single nucleotide polymorphisms (SNPs) and 711,178 small insertion and deletion (INDELs) were found when both Longli-4 and CA3-1 were included for variance calling. After filtering out the SNPs and INDELs that were not mounted to the chromosome, a total of 2,970,656 SNPs and 674,628 INDELs were obtained ( Figure 2). To identify the genomic variation in each variety, the SNP and INDEL datasets were screened individually for Longli-4 and CA3-1. After filtering the variances with no read support (e.g., 'N') and with the same genotype compared with the reference line PI614886, a total of 407,651 SNPs and 212,724 INDELs were detected in Longli-4. The variation numbers of CA3-1 were greatly higher than those in Longli-4, and 2,731,411 SNPs and 587,935 INDELs were found, suggesting that Longli-4 is more closely related to PI614886. The distribution patterns of SNPs and INDELs were drawn on each chromosome based on the position information ( Figure 3). The total SNP numbers from B sub-genome (Chr01, 03, 05, 06, 09, 11, 16, 17, and 18) were higher than those from A sub-genome (Chr02, 04, 07, 08, 10, 12, 13, 14, and 15) in both varieties ( Figure 3A,C and Supplemental Table S1). The SNPs were not evenly distributed on each chromosome, and the hot regions of the SNP occurrence were totally different in both varieties ( Figure 3A,C). For instance, Chr04 from the A sub-genome had the hottest region in Longli-4, whereas in CA3-1, it was Chr11 from the B sub-genome. Partly due to the shortest chromosome characteristics in the A and B sub-genomes, Chr13 and Chr09 were the chromosomes containing the least numbers but the highest densities of SNPs in both varieties (Supplemental Table S1). Similarly, the INDEL numbers from the B sub-genome were higher than those from the A sub-genome in both varieties, but the numbers of INDELs per 10 kb window on each chromosome were two and five in Longli-4 and CA3-1, respectively ( Figure 3B,D and Supplemental Table S1). The INDEL hot region occurred on Chr04 in Longli-4 but on Chr07 in CA3-1 ( Figure 3B,D).  Figure 3). The total SNP numbers from B sub-genome (Chr01, 03, 05, 06, 09, 11, 16, 17, and 18) were higher than those from A sub-genome (Chr02, 04, 07, 08, 10, 12, 13, 14, and 15) in both varieties ( Figure 3A,C and Supplemental Table S1). The SNPs were not evenly distributed on each chromosome, and the hot regions of the SNP occurrence were totally different in both varieties ( Figure 3A,C). For instance, Chr04 from the A subgenome had the hottest region in Longli-4, whereas in CA3-1, it was Chr11 from the B sub-genome. Partly due to the shortest chromosome characteristics in the A and B subgenomes, Chr13 and Chr09 were the chromosomes containing the least numbers but the highest densities of SNPs in both varieties (Supplemental Table S1). Similarly, the INDEL numbers from the B sub-genome were higher than those from the A sub-genome in both varieties, but the numbers of INDELs per 10 kb window on each chromosome were two and five in Longli-4 and CA3-1, respectively ( Figure 3B,D and Supplemental Table S1).
The INDEL hot region occurred on Chr04 in Longli-4 but on Chr07 in CA3-1 ( Figure 3B,D).     Figure 3). The total SNP numbers from B sub-genome (Chr01, 03, 05, 06, 09, 11, 16, 17, and 18) were higher than those from A sub-genome (Chr02, 04, 07, 08, 10, 12, 13, 14, and 15) in both varieties ( Figure 3A,C and Supplemental Table S1). The SNPs were not evenly distributed on each chromosome, and the hot regions of the SNP occurrence were totally different in both varieties ( Figure 3A,C). For instance, Chr04 from the A subgenome had the hottest region in Longli-4, whereas in CA3-1, it was Chr11 from the B sub-genome. Partly due to the shortest chromosome characteristics in the A and B subgenomes, Chr13 and Chr09 were the chromosomes containing the least numbers but the highest densities of SNPs in both varieties (Supplemental Table S1). Similarly, the INDEL numbers from the B sub-genome were higher than those from the A sub-genome in both varieties, but the numbers of INDELs per 10 kb window on each chromosome were two and five in Longli-4 and CA3-1, respectively ( Figure 3B,D and Supplemental Table S1).
The INDEL hot region occurred on Chr04 in Longli-4 but on Chr07 in CA3-1 ( Figure 3B,D).   Generally, transitions are defined as the substitution between two purines (A/G) or two pyrimidines (C/T), whereas the interchanges of purines with pyrimidines (G/T, A/T, A/C, G/C) are assigned as transversions. The main SNPs were transitions, accounting for 59.27% and 59.77% of the total variations, which were obviously higher than the transversions in Longli-4 and CA3-1, respectively ( Figure 4A,B and Supplemental Figure S2A,B). For the transitions, the proportion of C/T SNPs was higher than that of A/G in both varieties. For the transversions, the relative abundance values of G/T, A/T, and A/C SNPs were similar, whereas the G/C SNPs accounted for the lowest proportion of the total SNPs in both varieties ( Figure 4A,B and Supplemental Figure S2A Generally, transitions are defined as the substitution between two purines (A/G) or two pyrimidines (C/T), whereas the interchanges of purines with pyrimidines (G/T, A/T, A/C, G/C) are assigned as transversions. The main SNPs were transitions, accounting for 59.27% and 59.77% of the total variations, which were obviously higher than the transversions in Longli-4 and CA3-1, respectively ( Figure 4A,B and Supplemental Figure S2A,B). For the transitions, the proportion of C/T SNPs was higher than that of A/G in both varieties. For the transversions, the relative abundance values of G/T, A/T, and A/C SNPs were similar, whereas the G/C SNPs accounted for the lowest proportion of the total SNPs in both varieties ( Figure 4A,B and Supplemental Figure S2A

Genetic Divergence between Longli-4 and CA3-1
Among the total obtained 2,970,656 SNPs, 2,354,907 SNPs were detected in both Longli-4 and CA3-1, whereas the SNPs with missing information in Longli-4 and CA3-1 were 531,233 and 84,516, respectively (Figures 2A and 5A). Similarly, 483,974 out of 674,628 INDELs were identified in both varieties, and the numbers with no read support in Longli-4 and CA3-1 were 149,083 and 41,571, respectively ( Figures 2B and 5B). The SNPs and INDELs can be further classified into different categories when the genomic information of each variation from the reference line was included. For instance, there were 143,996 SNPs having the same nucleotide bases in both varieties but different against the reference bases at the same positions, suggesting that these SNPs are shared in Longli-4 and CA3-1 (hereafter named 'shared SNPs', Figure 5A). For the rest of the SNPs (hereafter named 'pairwise SNPs'), approximately 91% of SNPs (2,004,623) had the same nucleotide bases in Longli-4 and PI614886 but with different bases in CA3-1. In contrast, this kind of SNPs only accounted for 6.17% (136,574) in CA3-1. These results further imply that CA3-1 is in a distant relationship with both Longli-4 and the reference line. Additionally, there were 206,288 SNPs in Longli-4 and 2,074,337 SNPs in CA3-1 having different nucleotide bases in both varieties, and in these variety-specific SNPs, transitions (A/G and C/T) accounted for approximately 60% in Longli-4 and CA3-1 ( Figures 5A and 6A,D and Supplemental Figure S3A,D). The same procedure was conducted for INDEL classification and similar results were generated ( Figure 5B). In Longli-4, 75.32% of INDELs (302,360) had the same genotypes with the reference, but only 8.59% in CA3-1. Among the variety-specific INDELs (98,204 vs. 365,158), mononucleotide insertion and deletion were the predominant types. Mono-and di-nucleotide deletions were two to three times as greater as that of insertions in Longli-4. In CA3-1, mono-and di-nucleotide deletions were also higher than that of insertions, but to a lesser extent ( Figure 6B,E and Supplemental Figure S3B groups based on annotation, and relative abundance values of seven groups on each chromosom in Longli-4 (E,G) and CA3-1 (F,H) are shown as a stacked bar. The chromosomes from the A and sub-genomes of quinoa are shown as those in Figure 3.

Genetic Divergence between Longli-4 and CA3-1
Among the total obtained 2,970,656 SNPs, 2,354,907 SNPs were detected in bot Longli-4 and CA3-1, whereas the SNPs with missing information in Longli-4 and CA3were 531,233 and 84,516, respectively (Figures 2A, 5A). Similarly, 483,974 out of 674,62 INDELs were identified in both varieties, and the numbers with no read support i Longli-4 and CA3-1 were 149,083 and 41,571, respectively ( Figures 2B, 5B). The SNPs an INDELs can be further classified into different categories when the genomic informatio of each variation from the reference line was included. For instance, there were 143,99 SNPs having the same nucleotide bases in both varieties but different against the referenc bases at the same positions, suggesting that these SNPs are shared in Longli-4 and CA3-(hereafter named 'shared SNPs', Figure 5A). For the rest of the SNPs (hereafter name 'pairwise SNPs'), approximately 91% of SNPs (2,004,623) had the same nucleotide base in Longli-4 and PI614886 but with different bases in CA3-1. In contrast, this kind of SNP only accounted for 6.17%    on transitions and transversions. The same category method is also used in (D) for CA3-1. (B,E) Percentages of the 13 INDEL groups on each chromosome in Longli-4 (B) and CA3-1 (E). The group of '0′ in (B) represents the relative abundance of INDELs (302,360) with the same genotypes in Longli-4 and the reference line as that in Figure 5B, and the other twelve groups are classified based on the length of insertions and deletions. The same groups are also used in (E) for CA3-1. (C,F) SNPs (C) and INDELs (F) are classified into seven groups based on annotation information, and relative abundance values of all groups on each chromosome are shown as a stacked bar. The chromosomes from the A and B sub-genomes of quinoa are shown as those in Figure 3.  Figure 8C). Meanwhile, GO enrichment analyses were carried out for the genes with stop-gain variations in three categories. Only two enriched GO term including 'DNA integration' (GO:0015074, p = 1.85 × 10 −6 ) and 'nucleic acid binding' (GO:0003676, p = 0.0004) were found in category two ( Figure 8D). 0.0015), 'transferase activity, transferring phosphorus-containing groups' (GO:0016772, p = 0.0035), and 'oxidoreductase activity, acting on single donors with incorporation of molecular oxygen, incorporation of two atoms of oxygen' (GO:0016702, p = 0.0043), were enriched for the third type of genes ( Figure 8C). Meanwhile, GO enrichment analyses were carried out for the genes with stop-gain variations in three categories. Only two enriched GO term including 'DNA integration' (GO:0015074, p = 1.85 × 10 −6 ) and 'nucleic acid binding' (GO:0003676, p = 0.0004) were found in category two ( Figure 8D).

Variation of Candidate Genes Relevant to Vital Agronomic Traits and Domestication
To explore the genetic changes underlying some important agronomic traits in Longli-4 and CA3-1 during breeding, 7 MEP (2-C-methyl-D -erythritol 4-phosphate) pathway-, 9 flavonoid-, 72 betalain-, 352 flowering-, 226 seed size-, 151 domestication-, and 34 saponin biosynthesis-related genes from the model plant Arabidopsis thaliana and various staple crops as well as other plants, were used as seeds to identify the homologous genes in quinoa. As a result, out of the 18,020 non-synonymous genes, only 414 homologous genes were obtained by homologous sequence alignment. Among which, 1 betalain gene, 7 genes for flowering, 3 genes for seed size, and 1 gene for domestication had the Longli-4-specific non-synonymous variations, while 6 MEP-pathway genes, 6 flavonoid genes, 8 betalain genes, 153 flowering genes, 73 seed size genes, 47 domestication genes, and 13 saponin genes had the CA3-1-specific non-synonymous mutations. Furthermore, 1 MEP-pathway gene, 2 flavonoid genes, 2 betalain genes, 40 flowering genes, 20 seed size genes, 13 domestication genes, and 3 saponin genes having non-synonymous variations, were shared by Longli-4 and CA3-1, and only one seed size gene homologous of CKX1 (cytokinin oxidase/dehydrogenase 1) was found with stop-gain mutations in both varieties (Table 1 and Supplemental Table S2).

Heterozygosity Rates of Two Varieties
The homogeneity during the individual selection process is dependent on natural mutation, hybridization, and cultivation methods. In this study, Longli-4-3 and CA3-1-3 were individually harvested from the Haiyuan plots [36], and the descendants were selected for genome sequencing. Based on the SNPs and INDELs detected separately in two varieties (Figure 3), heterozygosity rates were measured according to the whole chromosomes. The whole average heterozygosity of Longli-4 calculated based on SNPs was 16.73%, ranging from 11.63% to 25.76%, while for INDELs, it was 16.34%, ranging from 13.58% to 20.06%. However, in CA3-1, the heterozygosity rates were 57.84% (30.70-92.72%) and 52.01% (31.23-81.58%) according to SNPs and INDELs (Figure 9). Similar results were also obtained from the comparisons within each sub-genome (Supplemental Figure S5). These results indicate that the Longli-4 has achieved higher homogeneity values than CA3-1. Since only one seedling for each variety was used for seq rates could be overestimated. Individual plants harvested fr ferent cultivation methods were therefore used to determine dividual heterozygosity was calculated by the frequency of offsprings using INDEL markers from nine chromosomes (Ta ure S6). Longli-4-1, Longli-4-2, CA3-1-1 and CA3-1-2 were fr heterozygosity values were 0-4.26%, 0%, 0%, and 0-2.08%, r individuals from Haiyuan county showed higher heterozyg values of Longli-4-3 and CA3-1-3 can reach up to 28.26% and markers were also used to test the heterozygosity of the com Results with 98 seedlings of Longli-4-4 revealed that the heter 0 to 5.21%. These results suggest that cultivation methods stro of each variety, and adequate spaces are necessary to main lower level. However, three genotypes (i.e., AA, AB, and BB) w ers of Chr15, Chr16, and Chr17 in Longli-4-3 (Supplementa Since only one seedling for each variety was used for sequencing, the heterozygosity rates could be overestimated. Individual plants harvested from different places with different cultivation methods were therefore used to determine the heterozygosity rates. Individual heterozygosity was calculated by the frequency of heterozygous loci within 48 offsprings using INDEL markers from nine chromosomes ( Table 2 and Supplemental Figure S6). Longli-4-1, Longli-4-2, CA3-1-1 and CA3-1-2 were from Tianzhu county, and the heterozygosity values were 0-4.26%, 0%, 0%, and 0-2.08%, respectively. On the contrary, individuals from Haiyuan county showed higher heterozygosity rates, and the highest values of Longli-4-3 and CA3-1-3 can reach up to 28.26% and 25%, respectively. The same markers were also used to test the heterozygosity of the commercial grains of Longli-4-4. Results with 98 seedlings of Longli-4-4 revealed that the heterozygosity rates ranged from 0 to 5.21%. These results suggest that cultivation methods strongly affect the homogeneity of each variety, and adequate spaces are necessary to maintain the heterozygosity at a lower level. However, three genotypes (i.e., AA, AB, and BB) were detected with the markers of Chr15, Chr16, and Chr17 in Longli-4-3 (Supplemental Figure S6), and the genetic effect in parental lines could also contribute to the high heterozygosity rates of Longli-4 variety.

Discussion
Analyzing the genomic variation pattern of quinoa is very important to understand its phenotypic variation and will provide guidance for quinoa breeding. In this study, higher numbers of SNPs and INDELs were detected in CA3-1, implying that CA3-1 is more distantly related to the reference line compared with the Longli-4. While the numbers and the distribution patterns of SNPs varied significantly in two varieties, the main variation patterns of the SNPs were the same (C/T or G/A), suggesting that the transition variation is the most frequent base substitution during species evolution and variety development. The same phenomenon was also discovered in other species, such as Arabidopsis [37] and rice [38].
Considering various factors including environmental adaptability and breeding goals, germplasm resources may maintain different genetic variation under different selection pressures [39]. Accordingly, the shared SNPs between Longli-4 and CA3-1 only account for 6%, which is far less than the 94% pairwise SNPs ( Figure 5A). A total of 13,701 nonsynonymous genes were attributed to the CA3-1 specific mutations and the enriched GO term 'flavonoid biosynthetic process' (GO:0009813) may provide some evidence for the difference in seed and stem color between Longli-4 and CA3-1 [31]. Moreover, crop yield is largely determined by the efficiency of photosynthesis [40,41]. For the 3501 shared nonsynonymous genes, the related elements in the process of photosynthesis were significantly enriched ( Figure 8C). This result is largely correlated with the relatively high yields of Longli-4 and CA3-1 [31], and is consistent with the need for high yield in the breeding programs [40,42,43].
To further explore the genetic differences between CA3-1 and Longli-4 in the breeding process, sets of candidate genes related to some important agronomic traits were selected. Among them, 13 genes with the variety-specific stop-gain variation may contribute to the phenotypic differences between Longli-4 and CA3-1 (Supplemental Table S2). For instance, only one domestication gene (AUR62007281) is mutated to obtain a premature stop codon in Longli-4, and its sequence identity against the Ramosa1 (Ra1) gene is 33.6%. Loss of function mutation in maize Ra1 gene results in extra and longer inflorescence branches and heterologous expression of the Ra1 gene in Arabidopsis increases the size of reproductive organs [44][45][46]. However, for the specific stop-gain variations of the CA3-1,the AUR62035617 gene with 68.6% sequence identity value against Waxy 1 (WX1) was identified. WX1 is an important agronomic trait-related gene playing a vital role in grain quality, especially for the amylose content of the grains [47]. In the rice, wx1 mutant produces the opaque, semitranslucent, or transparent grains [47].
Seed size is a crucial agronomic trait for breeding and directly affects grain yield [36]. AUR62037832 and AUR62003488, with 57.6% and 57.4% sequence identity values against the TTG2 and GS9, respectively, were identified for the CA3-1-specific stop-gain variation. The TTG2 (Transparent Testa Glabra2), a WRKY family transcription factor, can promote cell expansion in maternal tissues and thereby increase the seed size [48]. GS9 (Grain Shape Gene on Chromosome 9) plays an important role in seed formation, mainly controlling the seed shape, and the gs9 null mutant produces the slender grains [49]. The divergence in these two genes may underly the yield difference between Longli-4 and CA3-1 [31]. Interestingly, AUR62026114 was found to contain the stop-gain variants in both varieties.
This gene is the homolog of the CKX1 gene, which negatively regulate the number of grains by degrading the cytokinins [50]. Therefore, CKX1 should be the common target selected in both Longli-4 and CA3-1 breeding processes to increase the yield values.
Generally, plant pigments, flavonoids/anthocyanins, betalains, and carotenoids, are essential for the colorful appearance of plant organs [51,52]. Previous studies have revealed that the WRKY44 transcription factors can regulate the expression of the cytochrome P450-like1 and thereby control the betalain biosynthesis [53,54]. AUR62037832, a homolog of WRKY44, had a premature stop codon in CA3-1. The mutation of AUR62037832 may result in the different colors of the seed and stem between Longli-4 and CA3-1.
Flowering is an essential agronomic trait to be considered during the breeding process. The Unusual floral organs (UFO) is important for maintaining the normal development of floral [55]. The cytochrome P450 monooxygenases CDP/CYP90A1, is a key rate-limiting enzyme for the biosynthesis of brassinosteroids [56,57], which is involved in pollen tube development and promotion of flowering [58,59]. Delay of germination 1 (DOG1) acts in coordination with the abscisic acid (ABA) to delay seed germination [60] and is also essential for the flowering time regulation [61]. The AUR62036157, AUR62004597, and AUR62006898 are homologous genes of UFO, CDP/CYP90A1, and DOG1 in quinoa with the sequence identity values of 62.3%, 60.3%, and 28.9%. The stop-gain variants found in these three genes might affect the inflorescence morphology and flowering time of CA3-1, but the real effects remain to be confirmed.
High saponin content in the outer layer of quinoa grains makes it a major limiting factor for quinoa as a table food [6,62]. Four saponin-related genes, AUR62025671, AUR62031838, AUR62025695, and AUR62025696, were identified to contain the CA3-1specific stop-gain variations. These four gene were predicted to encode the beta-amyrin synthase (BAS) proteins, which are the key enzymes in saponin synthesis pathway [19]. These genes' mutations may influence the saponin accumulation, and will be the key candidates for further investigation of saponin components in CA3-1 grains.
In this study, there were many other variances possibly leading to the genetic divergence between Longli-4 and CA3-1 (Supplemental Table S2). These candidate genes may have important reference significance for further quinoa breeding. Since only a single individual from each of the varieties was sequenced, the number of candidate genes may be overestimated due to the difference in genetic background itself. More individual plants from different places with different cultivation methods are required to validate the candidate genes in future experiment.
In conclusion, conventional breeding has been widely used to develop quinoa cultivars [25][26][27][28][29][30][31][32]. However, mass selection, individual selection, and hybridization are highly time costed and can only select the plants with obvious phenotypic divergence. Markerassisted breeding can greatly accelerate the breeding process through genotype screening at very early growth stage. The genetic variations between Longli-4 and CA3-1 presented in this study may provide guidance for the molecular-assisted breeding of quinoa in China.

Plant Materials
To measure the heterozygosity rates of Longli-4 and CA3-1, individual plants from different sites with different cultivation methods were selected. For example, a total of 150 tested varieties including Longli-4 and CA3-1 were sown in Tianzhu county, Gansu province, by the Gansu Academy of Agricultural Sciences in 2019. Each variety had four separate rows with a distance of 40 cm, and each row was approximately 20 m. For each variety, five plants with similar phenotypes were harvested individually, and two of them (e.g., Longli-4-1 and -2 for Longli-4, and CA3-1-1 and -2 for CA3-1) were randomly selected for the following experiments. Longli-4-3 and CA3-1-3 came from the Agricultural Technology Extension and Service Center of Haiyuan County. Plot tests with Longli-4, CA3-1 and the other five varieties were conducted in the Haiyuan dryland in 2019 [31]. Each plot was 2.25 m 2 (1.5 m × 1.5 m) with five rows, and the distance between the two plots was 50 cm. For each variety, four replicated plots were used to evaluate the agronomic performance, and one single plant (e.g., Longli-4-3 for Longli-4, and CA3-1-3 for CA3-1) was randomly sampled for the tests in this study. Approximately 50 seeds from each of the selected plants were germinated on soaked filter papers, and the genomic DNA was extracted from each of the 48 seedlings individually. Since Longli-4 has been a commercial variety and the identity of seed multiplication is supposed to be homogeneity. The commercial seeds of Longli-4 were also included to test the heterozygosity rate, and 96 seedlings named Longli-4-4 were sampled for DNA extraction.

Phenotype Comparison
Seeds of Longli-4-2 and CA3-1-1 were sown in nutrient soil (Pindstrup, Denmark) in a PVC pot with a diameter of 5 cm and a depth of 10 cm, and water was supplied twice a week. The growth condition of the chamber was 14 h light/10 h dark. Four-month-old plants were used to measure the plant height, internode length, lateral branch number, stem diameter, leaf area, main panicle length, and main panicle diameter. The data were drawn in boxplots using the ggplot2 R package [63].

Genome Resequencing
Seeds of Longli-4-3 and CA3-1-3 were sowed into the pots filled with a mixture of nutrient-soil and loess (1:1), 5 cm in diameter, 10 cm in height, and then transferred into a common garden. Young leaves from individual plants in the common garden were sampled and stored in liquid nitrogen for DNA extraction. The sequencing libraries were constructed according to the standard Illumina protocol, and the paired-end sequencing was performed using the Illumina platform at Biomarker Technologies (Beijing, China). A total of 74,878,963 and 80,381,530 clean reads were obtained with Q30 values of 93.84% and 94.56% for Longli-4 and CA3-1, respectively. The clean reads were aligned back to the quinoa reference genome (PI614886) [19] for subsequent variation analyses. The properly mapped ratios were 97.47% and 96.12% for Longli-4 and CA3-1, respectively.

SNP and INDEL Calling and Statistical Analyses
After filtering the redundant reads by Picard (http://sourceforge.net/projects/picard/; version 1.94 (accessed on 24 August 2020)), SNP and INDEL calling were performed using GATK (version 3.8) with default setting [64]. Variations in Longli-4 and CA3-1, which cannot be mounted into the 18 pseudochromosomes, were filtered out. SNPs and INDELs in Longli-4 and CA3-1 against PI614886 were independently analyzed using RStudio, and the distribution patterns of SNPs or INDELs on each chromosome were presented by the CMplot R package [65]. Statistics of the number and the relative abundance of SNPs and INDELs were measured based on variation types and genomic positions using do [66] and data.table [67] R packages, and the ggplot2 R package [63] was employed to draw the stacked bar charts.
To identify the genetic variation between Longli-4 and CA3-1, the whole SNPs and INDELs were categorized into three groups based on the position information of the reference genome. The SNPs and INDELs classified into Longli-4-and CA3-1-specific groups were excluded from the following analyses since the information on these variances was lacking in one of the tested varieties. SNPs and INDELs which were at the same genome positions and detected in both varieties, were retained for the variation comparison between Longli-4 and CA3-1. The numbers of SNPs and INDELs showed the same and different variations between Longli-4 and CA3-1 were counted using do [66] and data.table [67] R packages, and results were shown in venn.plots by ggvenn R package. Among them, the variation numbers with the same genotypes against the reference genome were highlighted for each variety in SNP and INDEL venn.plots.

GO Enrichment Analyses and Variation in Candidate Genes
The SNP/INDEL effect was annotated using SnpEff [68] according to the reference genome annotation. For each variety, the genes with non-synonymous variation and stopgain mutations were identified, and GO enrichment analyses with a threshold of 0.01 were then performed using clusterProfiler package of R [69]. Results were shown in histograms using the ggplot2 package of R [63].

Heterozygosity
Homozygous or heterozygous SNPs and INDELs were identified using RStudio. According to the filtered SNP and INDEL results, the heterozygosity rates of Longli-4 and CA3-1 were calculated independently, and the boxplots of results were generated by the ggplot2 package of R.
One-week-old seedlings were used for the evaluation of the heterozygosity rates in Longli-4 and CA3-1 lines, and DNA was extracted with a simple method. Briefly, a single seedling was put into a 2 mL microcentrifuge tube with two beads. After being frozen in liquid nitrogen, the material was ground into a fine powder using a Retsch Mixer Mill MM 400 (Germany). The powder was then suspended with 150 µL Tris-EDTA buffer solution and incubated at 90 • C for 10 min. The tube was subsequently centrifuged at 10,000 r/min for 2 min, and the supernatant was pipetted as the template for the following PCR assays. Forty-eight seedlings were sampled for Longli-4-1, -2, -3 and CA3-1-1, -2, -3, while for the commercial line Longli-4-4, ninety-six seedlings were used. Nine INDEL markers were developed to test the heterozygosity rate of each line and the genotype of each seedling was identified based on the length and number of PCR bands.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/ijms232214030/s1, Figure S1: Phenotype comparisons of two quinoa varieties; Figure S2: Stacked bar charts of variation numbers in each of Longli-4 and CA3-1; Figure S3: Stacked bar charts of numbers of the pairwise SNPs and INDELs in Longli-4 and CA3-1; Figure S4: The shared SNPs and INDELs in Longli-4 and CA3-1; Figure S5: Boxplots of the heterozygosity values of SNP and INDEL in two sub-genomes; Figure S6: Genotyping using nine INDEL markers; Table S1: The distribution of SNPs or INDELs; Table S2: The non-synonymous and stop-gain genes caused by the variations in Longli-4, CA3-1 or both of them.
Author Contributions: Conceptualization, P.Z.; validation, X.L. and R.R.; formal analysis, X.L., R.R. and P.Z.; writing-original draft preparation, X.L. and P.Z.; writing-review and editing, P.Z. and G.C.; funding acquisition, P.Z. and G.C. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
The data presented in this study are openly available in NCBI with the BioSample number of SAMN31693738 for Longli-4 and SAMN31693739 for CA3-1.