Integrating Genome-Wide Association Study with Transcriptomic Analysis to Predict Candidate Genes Controlling Storage Root Flesh Color in Sweet Potato

: Sweet potato is a hexaploid heterozygote with a complex genetic background, self-pollination infertility, and cross incompatibility, which makes genetic linkage analysis quite difﬁcult. Genome-wide association studies (GWAS) provide a new strategy for gene mapping and cloning in sweet potato. Storage root ﬂesh color (SRFC) is an important sensory evaluation, which correlates with storage root ﬂesh composition, such as starch, anthocyanin, and carotenoid. We performed GWAS using SRFC data of 300 accessions and 567,828 single nucleotide polymorphism (SNP) markers. Furthermore, we analyzed transcriptome data of different SRFC varieties, and conducted real-time quantitative PCR (qRT-PCR) to measure the expression level of the candidate gene in purple and non-purple ﬂeshed sweet potato genotypes. The results showed that ﬁve unique SNPs were signiﬁcantly ( − log 10 P > 7) associated with SRFC. Based on these trait-associated SNPs, four candidate genes, g55964 ( IbF3 (cid:48) H ), g17506 ( IbBAG2-like ), g25206 ( IbUGT-73D1-like ), and g58377 ( IbVQ25-isoform X2 ) were identiﬁed. Expression proﬁles derived from transcriptome data and qRT-PCR analyses showed that the expression of g55964 in purple-ﬂeshed sweet potato was signiﬁcantly ( p < 0.01) higher than that of non-purple ﬂeshed sweet potato. By combining the GWAS, transcriptomic analysis and qRT-PCR, we inferred that g55964 is the key gene related to purple formation of storage root in sweet potato. Our results lay the foundation for accelerating sweet potato genetic improvement of anthocyanin through marker-assisted selection.


GWAS Analysis
To increase the number of markers that can be tested for association with a trait, we used Beagle to perform genotype imputation [33]. GWAS was performed using the Essicient Mixed-Model Association expedited (EMMAX) software package (http://csg. sph.umich.edu//kang/emmax/download/index (accessed on 10 November 2021)) [34]. Kinship (K) matrix was also calculated using the EMMAX. The accession phenotypes were sorted using sort_pheno.pl. A mixed linear model (MLM) in EMMAX was used to test the associations. K-matrix and Q-matrix as covariates were used in the MLM to avoid spurious association [35]. The GWAS results were visually examined using Manhattan plots and quantile-quantile (QQ) plots which were drawn using R package. The p-value was calculated for each SNP and −log 10 P > 7 was defined as the suggestive threshold and genome-wide control threshold.

RNA-Seq Analysis
The transcriptomic data of root 'Xiangshu99 (white-fleshed) and 'Zhezi No1 (purplefleshed) was found by downloading the raw sequence data from the NCBI Short Read Archive (No. PRJNA721067) [36]; the transcriptomic data of root in 'Beniharuka (BH)' (yellow-fleshed) cultivar and its 'white-fleshed mutants (WH)' was found by downloading the raw sequence data from the NCBI Short Read Archive (No. PRJDB10052) [37]. Raw data were first processed with Fastpsoftware (https://github.com/OpenGene/fastp (accessed on 20 November 2021)) to remove adaptor sequences and low-quality reads. The clean reads were mapped to the sweet potato reference genome (https://sweetpotao.com/ (accessed on 20 November 2021)) using HISAT (http://ccb.jhu.edu/software/hisat2/index.shtml (accessed on 20 November 2021)) software [38]. Htseq-count was used to quantify the expression of known genes [39]. An mRNA was considered as a DE mRNA via the DESeq2 R package (http://bioconductor.org/packages/stats/bioc/DESeq2/ (accessed on 20 November 2021)) when it exhibited a two-fold or higher expression change and its FDR was below 0.05 in the comparisons [40]. The GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) annotation of mRNAs were conducted using EggNOG (Evolutionary Genealogy of Genes: Non-supervised Orthologous Groups), then we utilized enrichGO_pip.R and enrichKEGG_pip.R scripts to conduct GO and KEGG enrichment analyses for DE mRNAs, p < 0.05.

qRT-PCR
qRT-PCR analyses were carried out to determine the reliability of the RNA-seq results for expression profile analysis. All primers were designed according to the mRNA sequences and were synthesized commercially in the company Tianyi Huiyuan, Wuhan (Table S2). qRT-PCR for mRNAs were carried out in a 10 µL system: 1.0 µL cDNA product, 5 µL 2× qPCR mix, 3.2 µL nucleotide-free water and 0.4 µL 2.5 µM for each of the forward and reverse primers. The reactions were incubated in a BIO-RAD CFX96 for 30 s at 95 • C, followed by 40 cycles of 5 s at 95 • C, then 30 s at 60 • C. Three replications were used for all reactions and β-actin served as the endogenous reference gene. The 2 -∆∆CT method was employed to analyze the relative fold changes of genes [41].

Statistical Analysis
A total of 567,828 SNPs across 15 linkage groups (LG) remained after quality filtering. The 567,828 SNPs were subsequently used for further analysis. ADMIXTURE was employed to investigate population structure based on the maximum-likelihood method and generated a Q matrix [42]. A phylogenetic tree was constructed using MEGA7 based on the neighbor-joining method, visually edited using FigTree software (http://tree.bio. ed.ac.uk/software/fig-tree/(accessed on 10 November 2021)). A principal component analysis (PCA) was performed with PLINK software (www.cog-genomics.org/plink2 (accessed on 10 November 2021); v1.9). PCA plot was generated by drawing PCA.R. Linkage disequilibrium (LD) analysis was performed using PopLDdecay [43]. LD was estimated as squared allele frequency correlation (R 2 ), and only p-value ≤ 0.01 for each pair of loci were considered significant. LD plot was generated by perl Plot_MultiPop.pl.

SLAF-Sequencing of 300 Sweet Potato Accessions
In our previous study, we constructed SLAF-seq library of 300 sweet potato accessions using SLAF-seq technology and obtained a total of 498.14 Mb reads [44]. In this study, we mapped the raw data to the sweet potato reference genome with BWA, the rate ranged from 6.46% to 95.11%, with an average rate of 92.61% (Table S3). After filtering, we identified 567,828 high confidence SNPs (Table S4) (missing data < 50%, minor allele frequency (MAF) > 5%) which were used for subsequent analyses. The SNPs ranged from 29,393 (LG10) to 50,397 (LG2), and were distributed across 15 LGs of sweet potato (Figure 1)  analysis (PCA) found that the 300 sweet potato accessions were clustered into 4 groups (K = 4), namely groups 1, 2, 3, and 4 ( Figure 2c). Group 1, 2, 3 and 4 contain 69, 71, 83, and 77 accessions, respectively. PCA clustering showed that the accessions of the core set were distributed into two clusters: landraces and modern cultivars. Principal component 1 (PC1) represents 15.51% of the total variation, while PC2 represents 12.0% of the total variation ( Figure 2d). LD analysis showed that the delay rate of landraces was faster than that of modern cultivars (Figure 2e), suggesting a high frequency of genetic recombination in landraces, further supporting the richer genetic diversity of landraces.  LGs of sweet potato.

Population Structure and Genomic Variation among the 300 Sweet Potato Accessions
The 300 sweet potato accessions included 76 landraces and 224 modern cultivars, which corresponded to roughly their geographical distributions in Asia and North America, respectively ( Figure 2a). Neighbor-joining cluster analysis showed that the 300 accessions were divided into seven groups ( Figure 2b); the seven groups contained 63, 23, 24, 21, 66, 56, 47 accessions, respectively. The population structure analysis and principal component analysis (PCA) found that the 300 sweet potato accessions were clustered into 4 groups (K = 4), namely groups 1, 2, 3, and 4 ( Figure 2c). Group 1, 2, 3 and 4 contain 69, 71, 83, and 77 accessions, respectively. PCA clustering showed that the accessions of the core set were distributed into two clusters: landraces and modern cultivars. Principal component 1 (PC1) represents 15.51% of the total variation, while PC2 represents 12.0% of the total variation ( Figure 2d). LD analysis showed that the delay rate of landraces was faster than that of modern cultivars (Figure 2e), suggesting a high frequency of genetic recombination in landraces, further supporting the richer genetic diversity of landraces.

GWAS for Identifying Significant SNPs Associated with SRFC
SRFC is one of the most important traits associated with the quality of sweet potato. There is a correlation between the color and chemical composition of sweet potato flesh. We conducted GWAS to identify significant SNP associated with SRFC by utilizing the 567,828 SNP markers (Table S5). GWAS results were visually examined by using Manhattan plots ( Figure 3a) and quantile-quantile plots ( Figure 3b). The line shows the threshold for genome-wide significance (−log 10 P > 7). A total of 5 significant SNPs associated with SRFC were identified (Table 2).
Interestingly, g55964, F3 H, was identified within the 4460,608-4466,687 bp in LG14, where 1 SNP (LG14: 467,066) was strongly associated with SRFC. GWAS analysis revealed that the SNP was located at~379 bp upstream of the start codon of the F3 H which is involved in anthocyanins biosynthesis.
g17506, BAG2-like, was identified within the 5866,570-5867,826 bp in LG5, where one SNP (LG5: 5,867,009) was strongly associated with SRFC. GWAS analysis discovered that the SNP was located in the intron region of the BAG2-like.
g25206, UGT-73D1-like, was identified within the 31,266,628-31,269,316 bp in LG6, where two SNPs (LG6: 31,267,101 and LG6: 31,267,107) were strongly associated with SRFC. GWAS analysis found that the two SNPs were located in the UTR3 of the UGT-73A-like.
g58377, VQ25 isoform X2, was identified within the 21,793,980-21,794,619 bp in LG14, where one SNP (LG14: 21,791,164) was strongly associated with SRFC. GWAS analysis showed that the SNP was located at~2816 bp upstream of VQ motif-containing protein 25 isoform X2.  where two SNPs (LG6: 31,267,101 and LG6: 31,267,107) were strongly associated with SRFC. GWAS analysis found that the two SNPs were located in the UTR3 of the UGT-73A-like. g58377, VQ25 isoform X2, was identified within the 21,793,980-21,794,619 bp in LG14, where one SNP (LG14: 21,791,164) was strongly associated with SRFC. GWAS analysis showed that the SNP was located at ~2816 bp upstream of VQ motif-containing protein 25 isoform X2.

Differentially Expressed Gene between White-Fleshed and Purple-Fleshed Sweet Potato
To study the key genes that regulate purple formation of root in sweet potato, we analyzed the transcriptomic data of root in 'Xiangshu99 (white-fleshed) and 'Zhezi No1 (purple-fleshed). In the transcriptomic analysis, the expression levels of the genes were shown in Table S6a. Compared with 'Xiangshu99 , 442 upregulated and 117 downregulated DEGs in 'Zhezi No1'were detected (Table S6b). GO enrichment analysis was conducted to further understand the function of the identified DEGs. A total of 139 (Table S7a) and 64 (Table S7b) terms were significantly enriched in the upregulated and downregulated genes, with p < 0.05, respectively. The 20 most enriched GO terms in the upregulated and downregulated genes are shown in Figure 4. The "flavonoid biosynthetic process" (GO: 0009813) and "flavonoid metabolic process" (GO: 0009812) were the most abundant upregulated DEGs (Figure 4a). The predominantly enriched pathways in the downregulated DEGs were the "Phenylpropanoid biosynthetic process" (GO: 0009699) and "Suberin biosynthetic process" (GO: 0010345) (Figure 4b). The KEGG enrichment analysis found that a total of five (Table S8a) and five (Table S8b) terms were significantly enriched in the upregulated and downregulated genes, with p < 0.05, respectively. The "Flavonoid biosynthesis" (ko00941), and "Anthocyanin biosynthesis" (ko00942) were the most abundant in upregulated DEGs (Figure 5a). The predominantly enriched pathways in the downregulated DEGs were involved in the "Protein processing in endoplasmic reticulum" (ko04141), whereas none of the metabolic pathways were significantly enriched (Figure 5b). The results of GO and KEGG enrichment implied that the flavonoid anthocyanin biosynthetic-related genes regulate the purple formation of roots in sweet potato. "Flavonoid biosynthesis" (ko00941), and "Anthocyanin biosynthesis" (ko00942) were the most abundant in upregulated DEGs (Figure 5a). The predominantly enriched pathways in the downregulated DEGs were involved in the "Protein processing in endoplasmic reticulum" (ko04141), whereas none of the metabolic pathways were significantly enriched (Figure 5b). The results of GO and KEGG enrichment implied that the flavonoid anthocyanin biosynthetic-related genes regulate the purple formation of roots in sweet potato.

Differentially Expressed Gene between White Fleshed and Yellow-Fleshed Sweet Potato
To identify the key genes involved in yellow-fleshed, we analyzed the transcriptomic data of the root in 'Beniharuka (BH)' (yellow-fleshed) cultivar and its 'white-fleshed mutants (WH)'. In the transcriptomic analysis, the expression levels of the genes were shown in Table S9a. Compared with 'WH', a total of 108 upregulated and 97 downregulated DEGs were detected in 'Beniharuka' (Table S9b). GO enrichment analysis was conducted to further understand the function of these DEGs, and a total of 75 (Table  S10a) and 56 (Table S10b) terms were significantly enriched in the upregulated and downregulated genes, with a p < 0.05, respectively. The 20 most enriched GO terms in the upregulated and downregulated genes were shown in Figure 6. Among the GO terms in upregulated genes, "abscisic acid transport" (GO: 0080168), "isoprenoid transport" (GO: 0046864), and "terpenoid transporter" (GO: 0046865) were significantly enriched in upregulated genes (Figure 6a). The predominantly enriched pathways in the downregulated DEGs were reported for "cadmium ion transport" (GO: 001569) (Figure 6b). The

Differentially Expressed Gene between White Fleshed and Yellow-Fleshed Sweet Potato
To identify the key genes involved in yellow-fleshed, we analyzed the transcriptomic data of the root in 'Beniharuka (BH)' (yellow-fleshed) cultivar and its 'white-fleshed mutants (WH)'. In the transcriptomic analysis, the expression levels of the genes were shown in Table S9a. Compared with 'WH', a total of 108 upregulated and 97 downregulated DEGs were detected in 'Beniharuka' (Table S9b). GO enrichment analysis was conducted to further understand the function of these DEGs, and a total of 75 (Table S10a) and 56 (Table S10b) terms were significantly enriched in the upregulated and downregulated genes, with a p < 0.05, respectively. The 20 most enriched GO terms in the upregulated and downregulated genes were shown in Figure 6. Among the GO terms in upregulated genes, "abscisic acid transport" (GO: 0080168), "isoprenoid transport" (GO: 0046864), and "terpenoid transporter" (GO: 0046865) were significantly enriched in upregulated genes (Figure 6a). The predominantly enriched pathways in the downregulated DEGs were reported for "cadmium ion transport" (GO: 001569) (Figure 6b). The KEGG enrichment analysis found that a total of 16 (Table S11a) and one (Table S11b) terms were significantly enriched in the upregulated and downregulated genes, with p < 0.05, respectively. The most abundant upregulated DEGs were observed for "Glycolysis/Gluconeogenesis" (ko00010) (Figure 7a). The enriched pathways in the downregulated DEGs were involved in "Isoquinoline alkaloid biosynthesis" (ko00950) (Figure 7b).The results of GO enrichment implied that the terpenes transporter-related genes regulate the yellow formation of sweet potato root.

Exploring Candidate Genes for SRFC by RNA-Seq
According to the result of RNA-seq, a total of 35 GO terms were found to be associated with the two candidate genes, and four KEGG pathways were found to be associated with two candidate genes (Tables 3 and 4). GO analysis showed that g55964 belonged to "secondary metabolic process", "secondary metabolite biosynthetic process", "metabolic process", and "biosynthetic process", whereas g25206 belonged to "glucosyltransferase activity" and "UDP-glycosyltransferase activity". KEGG analysis suggested that g55964 and g25206 were involved in the "biosynthesis of secondary metabolites", signifying the involvement of g55964 and g25206 in the biosynthesis of metabolites in sweet potato.

Candidate Genes Expression Analysis
We quantified the expression of four candidate genes in the flesh of sweet potato varieties that varied in color by analyzing the transcriptomic data ( Figure 6, Table 5). The result displayed that only g55964 displayed significant differential expression between 'Zhezi No1 and 'Xiangshu 99 . We found that the expression of g55964 in 'Zhezi No1 was 46.8-fold higher than that of 'Xiangshu99 (Table S6b), and all candidate genes had no expression in 'WH' and 'BH' (Figure 8). We performed qRT-PCR to quantify the relative expression of g55964 among eight varieties. The result of qRT-PCR showed that the expression of g55964 in purple-fleshed sweet potato was significantly (p < 0.01) higher than that of non-purple fleshed sweet potato ( Figure 9, Table S12).

Genetic Variation among the 300 Accessions
In our previous study, we sequenced and characterized the genetic diversities of 300 natural accessions of sweet potato by SLAF-seq, most of which were from major sweet

Genetic Variation among the 300 Accessions
In our previous study, we sequenced and characterized the genetic diversities of 300 natural accessions of sweet potato by SLAF-seq, most of which were from major sweet

Genetic Variation among the 300 Accessions
In our previous study, we sequenced and characterized the genetic diversities of 300 natural accessions of sweet potato by SLAF-seq, most of which were from major sweet potato production regions (i.e., America, Africa, China, Japan, and Korea) [28]. In this study, population-genetics analyses of the SNPs were generated by mapping reads to hexaploid sweet potato reference genomes, the 'Taizhong6 . Based on population structure and PCA, 300 sweet potato accessions were clustered into four groups. The faster LD decay rate of landraces was faster than that of modern cultivars, suggesting that the genetic diversity of landraces was higher than modern cultivars. The result was consistent with He et al. [45], who reported that sweet potato landraces possess high genetic diversity and that the genetic diversity of modern cultivars is low.

GWAS Analysis of SRFC in Sweet Potato
In order to mine the candidate genes for SRFC, we recorded the SRFC of 300 sweet potato accessions, and we used the SNPs site to conduct GWAS of SRFC. We identified five potential SNPs which were associated with SRFC (−log 10 P > 7), and linked to four candidate genes. g55964 was annotated as F3 H, catalyzing naringenin and dihydokaempferol, and generated eriodictyol and dihydroquercetin which are important intermediates in the biosynthesis of anthocyanins and procyanidins [46]. We found that g25206 was annotated as UGT-73D1-like. It has been reported that uridine diphosphate glycosyltransferase (UGT) regulates the bioactivity, solubility, and transportation of receptors within the cells through modulating glycosyl transfer reaction [47]. UGT can catalyze flavonols and anthocyanins into corresponding glycosides [48,49]. g17506 was annotated as BAG2-like. Previously, it was reported that BCL2-associated athanogene (BAG) family members can regulate plant stress and development [50]. To the best of our knowledge, the BAG family has not been reported to be involved in metabolite synthesis. The g58377 was annotated as VQ25-isoform X2. The VQ motif-containing proteins (VQ) can interact with the WRKY transcription factor through the conserved VQ domain and plays an important role in the signal pathway mediated by the WRKY transcription factor [51,52]. The WRKY transcription factor can control the secondary metabolites synthesis, such as phenols, terpenes, and alkaloids [53]. Among them, anthocyanins and carotenoids are the two main pigments in sweet potato; it is not clear whether VQ can interact with WRKY transcription factors to regulate the synthesis of anthocyanins and carotenoids. Whether BAG and VQ are involved in metabolite synthesis remained to be studied.

Functional Analysis of DEGs in Different SRFC Varieties
The GO and KEGG enrichment result showed that flavonoid and anthocyanin biosyntheticrelated genes regulated the purple formation of the root in sweet potato. Li et al. also reported that flavonoid and anthocyanin biosynthesis were related to the flesh color of storage root [54]. The g55694 was annotated into the flavonoid biosynthesis pathway, and its expression in purplefleshed sweet potato was much higher than that of non-purple fleshed sweet potato. Thus we speculated that g55694 was the one of the key genes involved in the purple formation of the root in sweet potato.

Quantitative Trait Loci Analysis of Flesh Color in Sweet Potato
Previously, some studies have reported that QTLs significantly associated with flesh color in sweet potatoes are located on LG3 and LG12. Zhang et al. [55] performed GWAS for flesh color in purple sweet potato and identified one QTL for IbMYB1-2 on LG12. Gemenet et al. [56] identified two major QTLs that affect β-carotene and flesh color; they are phytoene synthase and Orange gene located within QTLs on LG3 and LG12, respectively. Yan et al. [57] identified ten QTLs for flesh color, skin color, and anthocyanin content on LG12 based on the linkages map. In this study, we identified one QTL for flesh color on LG14, the QTL was linked to F3 H which was involved in flavonoid biosynthesis.
Anthocyanins and carotenes are responsible for sweet potato storage root colors, with complex components, which are affected by various environmental factors. Thus, further research is required to dissect the genomic mechanisms controlling different types of anthocyanins and carotenes.

Conclusions
We recorded SRFC of 300 sweet potato accessions and conducted GWAS in sweet potato for SRFC. The study identified five unique SNPs which were significantly associated with SRFC trait, and linked to four genes. The transcriptome analyses showed that anthocyanin is responsible for the formation of purple roots in sweet potato, while carotenoid is responsible for the formation of yellow roots. We also employed GWAS and transcriptomic analyses to identify one QTL for flesh color on LG14, and the QTL was linked to g55964 which was involved in flavonoid biosynthesis.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/agronomy12050991/s1. Table S1: The information of 300 accessions; Table S2: The primer sequencing information of genes for qRT-PCR; TableS3: The mapped rate of 300 accessions' raw data to the sweet potato reference genome; Table S4: The information of 567,828 SNPs; Table S5: The result of GWAS for identifying significant SNP associated with SRFC; Table S6a: The expression level of genes in 'Xiangshu99' and 'Zhezi No1'; Table S6b: The list of DEGs between 'Xiangshu99' and 'Zhezi No1'; Table S7a: The GO enrichment analysis of up-regulated DEGs between 'Xiangshu99' and 'Zhezi No1'; Table S7b: The GO enrichment analysis of down-regulated DEGs between 'Xiangshu99' and 'Zhezi No1'; Table S8a: The KEGG enrichment analysis of upregulated DEGs between 'Xiangshu99' and 'Zhezi No1'; Table S8b: The KEGG enrichment analysis of down-regulated DEGs between 'Xiangshu99' and 'Zhezi No1'; Table S9a: The expression level of genes in 'WH' and 'BH'; Table S9b: The list of DEGs between 'WH' and 'BH'; Table S10a: The GO enrichment analysis of up-regulated DEGs between 'WH' and 'BH'; Table S10b: The GO enrichment analysis of down-regulated DEGs between 'WH' and 'BH'; Table S11a: The KEGG enrichment analysis of up-regulated DEGs between 'WH' and 'BH'; Table S11b: The KEGG enrichment analysis of down-regulated DEGs between 'WH' and 'BH'; Table S12: The relative expression of g55964 in eight varieties.
Author Contributions: C.J. and X.Y. designed the study. Y.L. and R.P. analyzed the data and prepared the manuscript. W.Z., L.W., J.L., S.C. and X.J. provided manuscript revision services. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.