Comparative Metabolome and Transcriptome Analysis of Anthocyanin Biosynthesis in White and Pink Petals of Cotton (Gossypium hirsutum L.)

Upland cotton (Gossypium hirsutum L.) is one of the important fiber crops. Cotton flowers usually appear white (or cream-colored) without colored spots at the petal base, and turn pink on the next day after flowering. In this study, using a mutant showing pink petals with crimson spots at their base, we conducted comparative metabolome and transcriptome analyses to investigate the molecular mechanism of coloration in cotton flowers. Metabolic profiling showed that cyanidin-3-O-glucoside and glycosidic derivatives of pelargonidins and peonidins are the main pigments responsible for the coloration of the pink petals of the mutant. A total of 2443 genes differentially expressed (DEGs) between the white and pink petals were identified by RNA-sequencing. Many DEGs are structural genes and regulatory genes of the anthocyanin biosynthesis pathway. Among them, MYB21, UGT88F3, GSTF12, and VPS32.3 showed significant association with the accumulation of cyanidin-3-O-glucoside in the pink petals. Taken together, our study preliminarily revealed the metabolites responsible for the pink petals and the key genes regulating the biosynthesis and accumulation of anthocyanins in the pink petals. The results provide new insights into the biochemical and molecular mechanism underlying anthocyanin biosynthesis in upland cotton.


Introduction
As one of the visible traits of plants, color is of great significance to the growth and development of plants. Many studies have shown that carotenoids, flavonoids, and betalains are the main pigments in plant color formation [1]. Particularly, flavonoids have attracted more attention because they are widely distributed secondary metabolites in plants, and contribute to tissue coloring, plant environmental adaptation, fruit development, and even human health [2]. Anthocyanins are water-soluble pigments belonging to the flavonoid family that mainly contribute to the red/pink/blue/purple coloration of plants, often forming complex color patterns, such as spots, stripes, and veins, in different organs of plants, particularly in flowers [1,3,4]. Flower coloration is one of the most attractive traits of ornamental plants, mainly determined by the type and content of six different anthocyanidins. Peonidin and cyanidin contribute to colors from pink to red; pelargonin contributes to brick red; and delphinidin, malvidin, and petunidin contribute to the colors between purple and blue [5][6][7].
To date, anthocyanin biosynthesis, one branch of the flavonoid biosynthesis pathway, has been almost completely elucidated in many plant species, such as arabidopsis (Arabidopsis thaliana), maize (Zea mays), apple (Malus domestica), tomato (Solanum lycopersicum), grapevine (Vitis vinifera) and potato (Solanum tuberosum) [8][9][10][11]. Firstly, phenylalanine, as the precursor of anthocyanin biosynthesis, is catalyzed by phenylalanine resource for exploring the mechanism of anthocyanin biosynthesis and accumulation in upland cotton. To elucidate the changes in metabolites and transcriptional regulation of the anthocyanin biosynthetic pathways in the PF mutant, using the white flower cotton cultivar X74 as a control, we compared their metabolome and transcriptome. We identified metabolites unique to the pink petals and uncovered the key structural genes and regulatory factors associated with the biosynthesis and accumulation of anthocyanins in the pink petals. . Different letters represented statistically significant differences (one-way ANOVA, p < 0.05). Scale bar: 1 cm. PA, PB, WA, and WB represent the spot region and non-spot region of PF and X74 petals, respectively.

Analysis of Anthocyanin Composition in Petals of X74 and PF
The total contents of anthocyanins in the petals of PF (PA and PB) and X74 (WA and WB) ( Figure 1A-D) were measured by a spectrophotometer. The total anthocyanins were significantly higher in PA and PB, particularly in PA, compared to WA and WB, in which only a very small amount was detected ( Figure 1E). To further explore the metabolic mechanism of the PF flowers, we analyzed the anthocyanin metabolites in PA, PB, WA, and WB ( Figure 2C). In total, 38 anthocyanin metabolites were identified (Table S2), and 9 anthocyanin metabolites are unique to the pink petals ( Figure 2B). In addition, 4 unique anthocyanin metabolites were found in the spot region (PA) of the pink petals compared with the non-spot region (PB), but their concentrations are relatively low. More anthocyanin metabolites were detected in PA (36) and PB (33) than in WA (23) and WB (24). Quercetin-3-O-glucoside and delphinidin are the main differential metabolites between the spotted (PA and WA) and non-spotted (PB and WB) petal regions. The main difference between pink (PA and PB) and white (WA and WB) petals is in cyanidin and procyanidin metabolites. It is worth noting that the content of cyanidin-3-O-glucoside in PA and PB is 743-and 310-fold higher than that in WA and WB, respectively, and that the concentration of cyanidin-3-O-glucoside is higher in PA than in PB ( Figure 2C and Table  S2), indicating that cyanidin-3-O-glucoside might be crucial for the pink color formation. and PF petals. All data are shown as mean ± SE (n = 3). Different letters represented statistically significant differences (one-way ANOVA, p < 0.05). Scale bar: 1 cm. PA, PB, WA, and WB represent the spot region and non-spot region of PF and X74 petals, respectively.

Analysis of Anthocyanin Composition in Petals of X74 and PF
The total contents of anthocyanins in the petals of PF (PA and PB) and X74 (WA and WB) ( Figure 1A-D) were measured by a spectrophotometer. The total anthocyanins were significantly higher in PA and PB, particularly in PA, compared to WA and WB, in which only a very small amount was detected ( Figure 1E). To further explore the metabolic mechanism of the PF flowers, we analyzed the anthocyanin metabolites in PA, PB, WA, and WB ( Figure 2C). In total, 38 anthocyanin metabolites were identified (Table S2), and 9 anthocyanin metabolites are unique to the pink petals ( Figure 2B). In addition, 4 unique anthocyanin metabolites were found in the spot region (PA) of the pink petals compared with the non-spot region (PB), but their concentrations are relatively low. More anthocyanin metabolites were detected in PA (36) and PB (33) than in WA (23) and WB (24). Quercetin-3-O-glucoside and delphinidin are the main differential metabolites between the spotted (PA and WA) and non-spotted (PB and WB) petal regions. The main difference between pink (PA and PB) and white (WA and WB) petals is in cyanidin and procyanidin metabolites. It is worth noting that the content of cyanidin-3-O-glucoside in PA and PB is 743-and 310-fold higher than that in WA and WB, respectively, and that the concentration of cyanidin-3-Oglucoside is higher in PA than in PB ( Figure 2C and Table S2), indicating that cyanidin-3-Oglucoside might be crucial for the pink color formation. Intensity values were adjusted by log transformation and then normalized. PA, PB, WA, and WB represent the spot region and non-spot region of PF and X74 petals, respectively. PA1/PA2/PA3, PB1/PB2/PB3, WA1/WA2/WA3, and WB1/WB2/WB3 represents three biological replicates for each sample, respectively.

Overview of the RNA Sequencing Data
To explore the molecular mechanism related to the pink coloration of the PF petals, we performed RNA-seq using samples collected from the WA, WB, PA, and PB regions ( Figures 1B,D and 2A). Clean reads were obtained after filtering out low-quality reads. The Q20 and Q30 of each sample were greater than 97.4% and 92.63%, respectively. The GC content of the reads was 42.97-43.42%. Between 96.54% and 97.04% of the clean reads were mapped to the upland cotton genome (Gossypium hirsutum, ZJU-improved_v2.1_a1). Among the mapped reads, the uniquely mapped reads accounted for 92.06-92.95% (Table  S3). The Pearson correlation coefficient and principal component analysis of the samples based on the FPKM (fragments per kilobase of transcript per million mapped reads) values showed that the biological replicates exhibited similar expression patterns, indicating the high reliability of the sequencing data ( Figure S2). To further confirm the accuracy of the transcriptomic data, we selected 11 differentially expressed genes of the anthocyanin biosynthesis pathway for qRT-PCR verification. Although there were slight differences between the results of the RNA-seq and qRT-PCR, the overall trend was consistent ( Figure S3).

Analysis of Functional Enrichment of Differentially Expressed Genes
To identify the DEGs (based on FDR ≤ 0.05 and fold change ≥ 2) involved in PF flower coloration, based on FPKM values, we did four pairwise comparisons. A total of 9243, 7640, 1719, and 1642 DEGs were identified in the pairwise comparison of PB_vs_PA, WB_vs_WA, WA_vs_PA, and WB_vs_PB, respectively. Interestingly, the number of DEGs between the spotted and non-spotted regions of petals (PB_vs_PA and WB_vs_WA) was much greater than that between the same part of petals of different genotypes (WA_vs_PA

Overview of the RNA Sequencing Data
To explore the molecular mechanism related to the pink coloration of the PF petals, we performed RNA-seq using samples collected from the WA, WB, PA, and PB regions ( Figures 1B,D and 2A). Clean reads were obtained after filtering out low-quality reads. The Q20 and Q30 of each sample were greater than 97.4% and 92.63%, respectively. The GC content of the reads was 42.97-43.42%. Between 96.54% and 97.04% of the clean reads were mapped to the upland cotton genome (Gossypium hirsutum, ZJU-improved_v2.1_a1). Among the mapped reads, the uniquely mapped reads accounted for 92.06-92.95% (Table S3). The Pearson correlation coefficient and principal component analysis of the samples based on the FPKM (fragments per kilobase of transcript per million mapped reads) values showed that the biological replicates exhibited similar expression patterns, indicating the high reliability of the sequencing data ( Figure S2). To further confirm the accuracy of the transcriptomic data, we selected 11 differentially expressed genes of the anthocyanin biosynthesis pathway for qRT-PCR verification. Although there were slight differences between the results of the RNA-seq and qRT-PCR, the overall trend was consistent ( Figure S3).

Analysis of Functional Enrichment of Differentially Expressed Genes
To identify the DEGs (based on FDR ≤ 0.05 and fold change ≥ 2) involved in PF flower coloration, based on FPKM values, we did four pairwise comparisons. A total of 9243, 7640, 1719, and 1642 DEGs were identified in the pairwise comparison of PB_vs._PA, WB_vs._WA, WA_vs._PA, and WB_vs._PB, respectively. Interestingly, the number of DEGs between the spotted and non-spotted regions of petals (PB_vs._PA and WB_vs._WA) was much greater than that between the same part of petals of different genotypes (WA_vs._PA and WB_vs._PB) ( Figure 3A). A Venn-diagram illustrates that 5395 and 669 genes were unique in different petal tissues of the same genotype (PB_vs._PA and WB_vs._WA) and the same petal tissue of different genotypes (WA_vs._PA and WB_vs._PB), respectively ( Figure 3B). Accordingly, hierarchical clustering analysis also showed a clear difference in the expression patterns of the DEGs between different genotypes and tissues ( Figure 3C). and WB_vs_PB) ( Figure 3A). A Venn-diagram illustrates that 5395 and 669 genes were unique in different petal tissues of the same genotype (PB_vs_PA and WB_vs_WA) and the same petal tissue of different genotypes (WA_vs_PA and WB_vs_PB), respectively ( Figure 3B). Accordingly, hierarchical clustering analysis also showed a clear difference in the expression patterns of the DEGs between different genotypes and tissues ( Figure  3C). We assigned all the DEGs of different comparison groups to three major GO categories, including cellular components (CC), molecular functions (MF), and biological processes (BP) ( Table S4). The top CC sub-categories in WB_vs_WA and PB_vs_PA were 'integral component of plasma membrane', 'kinesin complex', and 'extracellular region part'. The DEGs in WA_vs_PA and WB_vs_PB were not significantly enriched in CC terms ( Figure 4A and Table S4). Of the MF categories, 'DNA/RNA polymerase activity', 'endonuclease activity', and 'ribonuclease activity' were found to be enriched in WA_vs_PA and WB_vs_PB. The DEGs from WB_vs_WA and PB_vs_PA were enriched with 'CoA synthase activity', 'transporter activity', and 'glycosyl transferase activity'. The BP terms of 'DNA integration', 'histone ubiquitination', and 'regulation of cell division' were enriched in WA_vs_PA and WB_vs_PB comparisons ( Figure 4A and Table S4). The GO terms 'cuticle development', 'phenylpropanoid biosynthetic process', and 'drug transmembrane transport' were the top ones in WB_vs_WA and PB_vs_PA comparisons. In addition, genes associated with sugar metabolic, starch metabolic, lignin, flavonoid, and sterol biosynthetic process were also found to be enriched in WB_vs_WA and PB_vs_PA (Table S4).
All the DEGs from different comparison groups were further subjected to KEGG pathway enrichment analysis (Table S5). It showed that a high number of genes were associated with 'Metabolic pathways' and 'Secondary metabolite biosynthesis', followed by 'Plant hormone signal transduction', 'MAPK signaling pathway-plant' and 'carbohydrate metabolism', including 'Carbon metabolism', 'Starch and sucrose metabolism', and 'Fructose and mannose metabolism' ( Figure 4B and Table S5). In We assigned all the DEGs of different comparison groups to three major GO categories, including cellular components (CC), molecular functions (MF), and biological processes (BP) ( Table S4). The top CC sub-categories in WB_vs._WA and PB_vs._PA were 'integral component of plasma membrane', 'kinesin complex', and 'extracellular region part'. The DEGs in WA_vs._PA and WB_vs._PB were not significantly enriched in CC terms ( Figure 4A and Table S4). Of the MF categories, 'DNA/RNA polymerase activity', 'endonuclease activity', and 'ribonuclease activity' were found to be enriched in WA_vs._PA and WB_vs._PB. The DEGs from WB_vs._WA and PB_vs._PA were enriched with 'CoA synthase activity', 'transporter activity', and 'glycosyl transferase activity'. The BP terms of 'DNA integration', 'histone ubiquitination', and 'regulation of cell division' were enriched in WA_vs._PA and WB_vs._PB comparisons ( Figure 4A and Table S4). The GO terms 'cuticle development', 'phenylpropanoid biosynthetic process', and 'drug transmembrane transport' were the top ones in WB_vs._WA and PB_vs._PA comparisons. In addition, genes associated with sugar metabolic, starch metabolic, lignin, flavonoid, and sterol biosynthetic process were also found to be enriched in WB_vs._WA and PB_vs._PA (Table S4). addition, the DEGs identified in PB_vs_PA (spot vs. non-spot region) was significantly enriched with the pathways of 'fatty acid elongation', 'ABC transporters' and 'Glutathione metabolism'. Traditional strategies for gene expression analysis have focused on identifying individual genes that exhibit differences between two states of interest. Moreover, most determine whether a group of differentially expressed genes is enriched for a pathway or ontology term (such as GO and KEGG) by using overlap statistics such as the cumulative hypergeometric distribution. Although useful, due to the defect of threshold screening of GO and KEGG enrichment analysis, some genes with insignificant expression difference but important biological significance will be ignored. By contrast, GSEA considers all the genes in the experiment, not only those genes that were above any cutoff in terms of foldchange or significance. We thus further investigated the key genes correlated with flower pigmentation by GSEA (Gene Set Enrichment Analysis) using a KEGG-based list (196 gene sets). The up-regulated gene sets from all comparisons are provided in Table S6. Many of the top 20 pathways from each comparison overlapped between PB_vs_PA and WB_vs_WA as well as between WA_vs_PA and WB_vs_PB ( Figure 5A). Consistent with the KEGG enrichment analysis results, 'Glycolysis/Gluconeogenesis', 'Pentose phosphate pathway', 'Fructose and mannose metabolism', 'MAPK signaling pathway-plant', and 'Plant hormone signal transduction' were mainly related to the higher expression gene sets ( Figures 4B and 5B-E). In addition, the 'Fatty acid elongation', 'ABC transporters', and 'Glutathione metabolism' pathways were significantly enriched in KEGG analysis did not appear in the GSEA analysis. By contrast, 'Anthocyanin biosynthesis' in PB_vs_PA All the DEGs from different comparison groups were further subjected to KEGG pathway enrichment analysis (Table S5). It showed that a high number of genes were associated with 'Metabolic pathways' and 'Secondary metabolite biosynthesis', followed by 'Plant hormone signal transduction', 'MAPK signaling pathway-plant' and 'carbohydrate metabolism', including 'Carbon metabolism', 'Starch and sucrose metabolism', and 'Fructose and mannose metabolism' ( Figure 4B and Table S5). In addition, the DEGs identified in PB_vs._PA (spot vs. non-spot region) was significantly enriched with the pathways of 'fatty acid elongation', 'ABC transporters' and 'Glutathione metabolism'.
Traditional strategies for gene expression analysis have focused on identifying individual genes that exhibit differences between two states of interest. Moreover, most determine whether a group of differentially expressed genes is enriched for a pathway or ontology term (such as GO and KEGG) by using overlap statistics such as the cumulative hypergeometric distribution. Although useful, due to the defect of threshold screening of GO and KEGG enrichment analysis, some genes with insignificant expression difference but important biological significance will be ignored. By contrast, GSEA considers all the genes in the experiment, not only those genes that were above any cutoff in terms of fold-change or significance. We thus further investigated the key genes correlated with flower pigmentation by GSEA (Gene Set Enrichment Analysis) using a KEGG-based list (196 gene sets). The up-regulated gene sets from all comparisons are provided in Table S6. Many of the top 20 pathways from each comparison overlapped between PB_vs._PA and WB_vs._WA as well as between WA_vs._PA and WB_vs._PB ( Figure 5A). Consistent with the KEGG enrichment analysis results, 'Glycolysis/Gluconeogenesis', 'Pentose phosphate pathway', 'Fructose and mannose metabolism', 'MAPK signaling pathway-plant', and 'Plant hormone signal transduction' were mainly related to the higher expression gene sets ( Figures 4B and 5B-E). In addition, the 'Fatty acid elongation', 'ABC transporters', and 'Glutathione metabolism' pathways were significantly enriched in KEGG analysis did not appear in the GSEA analysis. By contrast, 'Anthocyanin biosynthesis' in PB_vs._PA and WB_vs._WA and 'Plant-pathogen interaction' in WA_vs._PA and WB_vs._PB were only found in the GSEA analysis ( Figure 5B-E and WB_vs_WA and 'Plant-pathogen interaction' in WA_vs_PA and WB_vs_PB were only found in the GSEA analysis ( Figure 5B-E).

Analysis of DEGs Related to Biosynthesis and Transport of Anthocyanidins
Given the increase in the content of anthocyanins in pink flowers of the PF mutant, we conducted a detailed analysis of the genes involved in the phenylpropanoid, flavonoid, flavonol and flavone, and anthocyanidin biosynthesis pathways. A total of 80 DEGs potentially involved in anthocyanin biosynthesis were found, including GhPAL, Gh4CL, GhCHS, GhCHI, GhFLS, GhF3H, GhF3 H, GhF3 5 H, GhANS, GhLAR, GhANR, and GhUFGTs ( Figure 6 and Table S7). The PCC (Pearson's correlation coefficient) between the expression level of these DEGs and the DEMs (differentially expressed metabolites) was calculated. It showed that Cyanidin-3-O-glucoside biosynthesis was positively and negatively regulated by 12 and 7 DEGs, respectively (|PCC| ≥ 0.6, Table S8). The 12 positive regulators included 7 UGT (UDP-glycosyltransferase) genes, 2 PAL genes, 2 LAR genes, and 1 ANR gene. Among the 7 UGTs, the expression level of GH_D11G3364, namely GhUGT88F3, showed a significant positive correlation with the Cyanidin-3-O-glucoside content (PCC > 0.9, Table S8). Some metabolites (such as Cyanidin-3-O-galactoside, Pelargonidin-3-O-glucoside, and Peonidin-3-O-galactoside) with the same concentration change trend as Cyanidin-3-O-glucoside might also be involved in the formation of pink color of petals (Table S8) change trend as Cyanidin-3-O-glucoside might also be involved in the formation of pink color of petals (Table S8). After synthesis, anthocyanins are transported from the cytosol to the vacuole for storage, which is a key step in the coloration of tissues or organs. We identified four genes annotated as GSTF from the 'Glutathione metabolism' pathway in KEGG analysis (Table  S7). Notably, we found that GhGSTF12 (GH_A07G0814 and GH_D07G0816), previously reported to be involved in anthocyanin accumulation in cotton [41], was one of the four genes. We identified many differentially expressed ABC (ATP-binding cassette) genes, After synthesis, anthocyanins are transported from the cytosol to the vacuole for storage, which is a key step in the coloration of tissues or organs. We identified four genes annotated as GSTF from the 'Glutathione metabolism' pathway in KEGG analysis (Table S7). Notably, we found that GhGSTF12 (GH_A07G0814 and GH_D07G0816), previously reported to be involved in anthocyanin accumulation in cotton [41], was one of the four genes. We identified many differentially expressed ABC (ATP-binding cassette) genes, such as ABCA (1 genes), ABCB (17 genes), ABCC (14 genes), and ABCE (34 genes), most of which were highly up-regulated in WA and PA (Table S7). We also identified some Multidrug and toxic compound extrusion proteins (MATEs) and Vesicle-associated membrane proteins (SNAREs) from DEGs significantly enriched in 'drug transmembrane transport' and 'SNARE binding'/'SNARE complex', respectively (Tables S4 and S7). Among these transporters, the expression level of GhGSTF12 (GH_A07G0814 and GH_D07G0816), GhABCC12 (GH_A12G0446), and GhABCC4 (GH_A12G1481) showed a significant positive correlation with the Cyanidin-3-O-glucoside content in different samples (PCC > 0.9, Table S9).
Transcription factors play an essential role in anthocyanin biosynthesis and transport. Of all the DEGs (12,516 genes from all four comparisons), a total of 1048 genes were identified as putative TFs and regulators. They were categorized into 76 TF families (Table S10). Among them, the majority belongs to the AP2/ERF-ERF family, followed by bHLH, MYB, bZIP, C2H2, NAC, HB-HD-ZIP, WRKY, and GRAS families ( Figure S4A). By calculating the Pearson's correlation coefficient (PCC) between the FPKM of these TFs, and the content of Cyanidin-3-O-glucoside, 25 key TFs (|PCC| ≥ 0.9) with a potential role in flower coloration were identified, including 14 negative regulators and 11 positive regulators (Table S11). The 11 positive regulators related to accumulation of Cyanidin-3-O-glucoside include one MYB, one C3H, two ZIP, one B3, one IWS1, one GARP-ARR-B, one OFP, one C2C2-Dof, and two AP2/ERF genes ( Figure S4B). Interestingly, MYB21 (GH_D12G1622), the only MYB among the positive regulators, is not one of those (GH_A07G0850 and GH_D07G0852) that have been reported to be the responsible genes for the red leaf phenotype of Rs and R1 [40,42].

Co-Expression Analysis of the Genes Related to the Anthocyanidin Biosynthesis Pathway
In order to investigate the gene network regulating anthocyanin metabolism in the PF mutant, the 5940 non-redundant DEGs were subjected to WGCNA. These DEGs were clustered into eight modules ( Figure 7A,B). Seven representative anthocyanin metabolites were used as traits in the gene module-trait relationship analysis. The brown module had a significant positive correlation with the content of five metabolites, including Cyanidin-3-O-glucoside, Pelargonidin-3-O-glucoside, Peonidin-3-O-galactoside, Procyanidin B2, and Procyanidin C1, whereas the yellow module was significantly negatively correlated with these five metabolites ( Figure 7B). Delphinidin-3-O-glucoside and Quercetin-3-O-glucoside were significantly positively correlated with the turquoise module, but negatively correlated with the green, blue, and black modules ( Figure 7B). The blue and green modules showed a positive relationship with Cyanidin 3-O-glucoside, Pelargonidin-3-O-glucoside, and Peonidin-3-O-galactoside ( Figure 7B).
In order to find the key regulatory genes from the turquoise, yellow, brown, and green modules, we filtered out the hub genes from these modules based on PCC between individual genes and the content of the seven representative metabolites. The 78 hub genes selected based on |PCC| ≥ 0.95 were constructed into two regulatory networks, each associated with two or four of the seven metabolites, and no gene was associated with Peonidin-3-O-galactoside ( Figure 7C). Delphinidin-3-O-glucoside and Cyanidin-3-O-glucoside seemed to be regulated by more genes than Quercetin-3-O-glucoside, Pelargonidin-3-O-glucoside, Procyanidin B2, and Procyanidin C1. Among these regulatory genes, GhGSTF12 (GH_D07G0816) was significantly positively correlated with Cyanidin-3-O-glucoside and Pelargonidin-3-O-glucoside, while GhUGT88F3 (GH_D11G3364) and GhVPS32.2 (GH_A13G2237) were positively correlated only with Cyanidin-3-O-glucoside, the main contributor to the pink petal phenotype (Figures 6 and 7C). In order to find the key regulatory genes from the turquoise, yellow, brown, and green modules, we filtered out the hub genes from these modules based on PCC between individual genes and the content of the seven representative metabolites. The 78 hub genes selected based on |PCC| ≥ 0.95 were constructed into two regulatory networks, each associated with two or four of the seven metabolites, and no gene was associated with Peonidin-3-O-galactoside ( Figure 7C). Delphinidin-3-O-glucoside and Cyanidin-3-

Cyanidin-3-O-Glucoside Is the Main Contributor of the Pink Color Observed in the PF Mutant
For most plants, flavonoids, especially anthocyanins, are the main pigments in the coloration of their organs or tissues. For example, the colors seen on the skin of citrus, peach, pear, and eggplant [26,35,45,46], the flesh of apples, strawberries, and tomatoes [47][48][49], and the flowers of various ornamental plants such as petunia, peony, and chrysanthemum [50][51][52] are all resulted from tissue-specific accumulation of anthocyanins. Studies have shown that the primary shade of flower color (from red to blue violet) is mainly determined by six main anthocyanin metabolites (cyanidins, delphinidins, pelargonins, peonidins, malvidins, and petunidins) [53]. But, the proportion of anthocyanin metabolites and their biosynthetic genes can be significantly different in different plant species, so it is necessary to conduct personalized analysis. In this study, we found that the total anthocyanin content of petals was much higher in the PF mutant than in X74, particularly in the colored spots observed at the petal base ( Figure 1E). A total of 38 different anthocyanin metabolites were identified in the petals of PF and X74 (Figure 2). The anthocyanins often exist in the form of monoglycosides (occurring at the 3-position), probably to maintain stability. Consistent with previous studies, quercetin, as one of the background colors of upland cotton petals, is the most abundant flavonoid metabolite in both white and pink petals [37]. Of the six major anthocyanin metabolites, cyanidins, delphinidins, pelargonins, and peonidins were detected in PF and X74 petals (Table S2). The content of Cyanidin-3-O-glucoside is significantly higher in the pink petals than in the white petals, and is considered to be the main contributor for the color variance of X74 and PF petals, consistent with previous studies in different plant species [44,[52][53][54]. The white (or cream) flower of upland cotton turns red on the next day after flowering, which was also proved to be due to the sharp increase in cyanidins. This leads us to speculate that the pink petal color of PF is due to early activation of the Cyanidin-3-O-glucoside biosynthesis pathway, although whether it shares the activation mechanism involved in petal color change after flowering remains to be further studied. Furthermore, some monoglycoside derivatives of cyanidins, pelargonins, and peonidins (such as Cyanidin-3-O-sambubioside, Cyanidin-3-rutinoside, Peonidin-3-O-galactoside, and Pelargonidin-3-O-glucoside) were not found or were present in very low levels in the white petals compared with the pink petals, implying that these anthocyanins may play fine-tuning roles in deepening the petal color of PF ( Figure 2). Interestingly, delphinidins has a high concentration in both white and pink petals, implying that delphinidins have little to do with the pink color of PF petals; this could be because: (1) delphinidins have not been transferred to vacuoles, an essential step for tissue coloration; and/or (2) the pH in the petal vacuole is not suitable for coloring by delphinidins.

Key Genes Responsible for Biosynthesis and Accumulation of Anthocyanins in PF Petals
As a branch of the flavonoid biosynthesis pathway, anthocyanin biosynthesis has been deeply studied in many plant species [55]. Consistent with the published results, many structural genes involved in anthocyanin biosynthesis, such as PAL, C4H, 4CL, CHS, CHI, F3H, FLS, F3 H, F3 5 H, ANS, and UFGT, were differentially expressed between the white petals and the pink petals. Interestingly, we found that the expression level of most structural genes was the highest in PB, followed by WB, PA, and WA, implying that they are related to the biosynthesis of quercetin and delphinidins ( Figure 6 and Table S12). DFR, the key gene in the anthocyanin synthesis pathway, was highly expressed in both white and pink petals but is not a DEG, and may contribute to accumulation of cyanidins, pelargonins, peonidins, and delphinidins in both white and pink flowers. In addition, the 78 genes selected from WGCNA were mainly associated with Cyanidin-3-O-glucoside and Delphinidin-3-O-glucoside ( Figure 7C). The gene network and association analysis also suggest that the formation of pink petals is due to the activation of Cyanidin-3-O-glucoside synthesis pathway. In the Cyanidin-3-O-glucoside regulatory network, GSTF12, UGT88F3, and VPS32.3 were highly positively correlated with Cyanidin-3-O-glucoside. Among them, the expression level of GSTF12, which has been confirmed by our previous study to be involved in the accumulation of anthocyanins in upland cotton, is consistent with the color difference of white and pink petals [41]. As the last structural gene of anthocyanin biosynthesis, UGTs (UFGTs or 3GTs) are responsible for catalyzing the glycosylation of anthocyanins and causing the diversity of anthocyanin metabolites in Arabidopsis [56]. The expression trend of UGT88F3 in petal tissues suggests that it may be responsible for the glycosylation of Cyanidin-3-O-glucoside in pink petals, even if its expression level is not high. Furthermore, VPS32.3, described as vacuolar protein sorting-associated protein, may play an important role in transport of anthocyanins according to the AVI model [12,14].
Anthocyanin biosynthesis is directly regulated by the MBW complex composed of R2R3-MYB, bHLH, and WD40 [13]. A high number of DEGs were found to be MYB and bHLH genes ( Figure S4 and Table S10). Based on the integration analysis of transcriptome and metabolome, the expression level of MYB21 (GH_D12G1622) showed a trend consistent with the content of Cyanidin-3-O-glucoside in the white and pink petals (Table S11 and Figure S4B). Conversely, the expression level of two bHLH (GH_A09G2156 and GH_A13G0532) genes in white petals was higher than in pink petals, which was contrary to the total anthocyanin content in white and pink petals (Table S11 and Figure S4B). Whether the two bHLHs interact with MYB21 to regulate anthocyanin synthesis in pink petals remains to be further verified. While GhPAP1A (GH_A07G0850) and GhPAP1D (GH_D07G0852) have been previously identified as the key MYB TFs activating anthocyanin biosynthesis in red leaf mutants R1 and Rs [40,42], both genes were not differentially expressed between the white and pink petals, although their expression levels in the nonspotted regions of both white and pink petals were much higher than that in the spotted regions (Table S12), which leads us to speculate that the high expression level of most structural genes involved in anthocyanin biosynthesis in the non-spotted regions of the petal may be a result of the high expression of GhPAP1A or GhPAP1D in the non-spotted petal regions ( Figure 6).

Outlook of Anthocyanin Biosynthesis in Upland Cotton
Cotton fiber is the main source of natural fiber in the textile industry. Naturally Colored Cotton (NCC) has attracted much attention as an environment-friendly resource. The molecular mechanism of the biosynthesis and accumulation of pigments in NCC fiber is largely unknown, although previous studies have shown that the production of brown cotton and green cotton is related to proanthocyanidins (PAs) (or their derivatives) and caffeic acid (or their derivatives), respectively [57][58][59]. It is worth noting that the biosynthesis of PAs and anthocyanins shares most of the structural and regulatory genes [57,60]. Activation of the anthocyanin biosynthesis pathway is often accompanied by an increase in the expression level of the genes (such as LAR and ANR) related to proanthocyanidin biosynthesis ( Figure 6), and the subsequent increase in proanthocyanidin content [61]. Therefore, the identification of genes related to the biosynthesis and accumulation of anthocyanin is of great significance to understanding the mechanism underlying cotton fiber coloration.
Although fiber is the main product of cotton, its yield is far lower than that of cotton flower [37]. Therefore, cotton flower, which is rich in flavonoids, sugars, and other metabolites, has great potential in many aspects. For example, compared with the artificially synthetic anthocyanins, natural anthocyanins with strong antioxidant and free radicalscavenging properties are more favored by consumers. Cotton petals rich in anthocyanins may thus become a new source of natural anthocyanins. In addition, using heterosis to improve yield is one of the major breeding strategies of cotton. The utilization of male sterility is undoubtedly an effective method to produce hybrid seeds to reduce the cost and ensure the purity of varieties. Success of large-scale hybrid seed production depends on the efficient and economic reproduction of male sterile female parents. Introducing pigment deposition-related genes into hemizygous colored maintainer lines by genetic engineering (including fertility restoration genes), and crossing such maintainer lines to male sterile lines can produce 100% male sterile offspring by removing the colored seeds or seedlings [62,63]. This has been applied in maize, rice, and tomato [64][65][66]. In addition, anthocyanins not only protect flowers against UV-light, but also form particular patterns (such as petal spots, stripes, and patches) of flower colors to attract pollinators and thereby improve female reproductivity [44,67]. Therefore, based on the preference of pollinators for specific flower colors or particular color patterns, it may be possible to guide pollinators to conduct 'point-to-point' cross-pollination by manipulating the petal color or pigment distribution in male sterile lines and maintainer lines, so as to further reduce the production cost of artificial pollination. We hope our results will provide a foundation for applying the above conceptions in cotton cultivation.

Plant Materials and Sampling
Pink Flower (PF, Gossypium hirsutum L.) is a natural mutant showing a pink flower with crimson spots at the base of the petals (Figures S1 and 1B). XinLuZao 74 (X74, Gossypium hirsutum L.) is a white flower cultivar. All plants used in this study were grown in the Teaching Experimental Farm of Shihezi University (Shihezi, China) under natural light.
We collected petal samples from the crimson spot (PA) and non-spot areas (PB) of the PF mutant on the day before flowering (Figures 1B and 2A). The corresponding petal tissues (WA and WB) were sampled from the white flowers of X74 (Figures 1D and 2A). After sampling, all materials were immediately frozen in liquid nitrogen and stored at −80 • C until used.

Extraction and Measurement of Total Anthocyanins
According to a previous study [41], flower tissues were weighed (0.1 g), ground into powder in liquid nitrogen, and resuspended in 1 mL acidic methanol (1% hydrochloric acid, w/v) at room temperature for 12 h in dark. After centrifugation at 12,000× g rpm for 10 min at 4 • C, 1 mL of the supernatant was added to 4 mL of acidic methanol. The absorbance of the solution was measured with a U-5100 UV/VIS spectrophotometer (Shimadzu, Kyoto, Japan) at 530 nm, 620 nm, and 650 nm. The anthocyanin content was calculated with the following formula:

Qualitative and Quantitative Analysis of Anthocyanins
The quantitative analysis of anthocyanins was performed by Metware Biotechnology Co., Ltd. (Wuhan, China) based on the AB Sciex QTRAP 6500 LC-MS/MS platform. The petal samples were freeze-dried, ground into powder (30 Hz, 1.5 min), and stored at −80 • C until needed. An amount of 50 mg of the powder was weighted and extracted with 0.5 mL methanol/water/hydrochloric acid (500:500:1, v/v/v). Then, the extract was vortexed for 5 min and treated by ultrasound for 5 min, and then centrifuged at 12,000× g under 4 • C for 3 min. The residue was re-extracted one more time by repeating the above steps under the same conditions. The supernatants were collected, and filtrated through a membrane filter (0.22 µm, Anpel) before LC-MS/MS analysis.
Linear ion trap (LIT) and triple quadrupole (QQQ) scans were acquired on a triple quadrupole-linear ion trap mass spectrometer (QTRAP), QTRAP ® 6500+ LC-MS/MS System, equipped with an ESI Turbo Ion-Spray interface, operating in positive ion mode, and controlled by Analyst 1.6.3 software (Sciex). The ESI source operation parameters were as follows: ion source, ESI+; source temperature, 550 • C; ion spray voltage (IS), 5500 V; curtain gas (CUR) pressure, 35 psi. Anthocyanins were analyzed using scheduled multiple reaction monitoring (MRM). Data acquisitions were performed using Analyst 1.6.3 software (Sciex). Multiquant 3.0.3 software (Sciex) was used to quantify all metabolites. Mass spectrometer parameters including the declustering potentials (DP) and collision energies (CE) for individual MRM transitions were established with further DP and CE optimization. A specific set of MRM transitions were monitored for each period according to the metabolites eluted within this period.

RNA Extraction and Sequencing
The petal samples with three biological replicates collected from X74 and PF were sent to Beijing Novogene bioinformatics technology Co., Ltd. for RNA-Seq. A total amount of 1 µg RNA per sample was used as input material for the RNA-seq library preparations. Sequencing libraries were generated using NEBNext ® UltraTM RNA Library Prep Kit for Illumina ® (NEB, San Diego, CA, USA) by following the manufacturer's recommendations, and index codes were added to attribute sequences to each sample. The integrity and purity of the RNA samples were determined by 1% agarose gel electrophoresis, NanoPhotometer Spectrophotometer, and Agilent 2100 Bioanalyzer. The clustering of the index-coded samples was performed on a cBot Cluster Generation System using TruSeq PE Cluster Kit v3-cBot-HS (Illumia, San Diego, CA, USA) according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina Hiseq platform and 125-150 bp paired-end reads were generated.

Transcriptome Data Processing
Fastp v0.19.3 was used to filter the raw data, mainly to remove reads with adapters. Then, clean reads were obtained by removing reads containing adapter and ploy-N and low-quality reads. At the same time, the Q20, Q30, and GC contents of the clean data were calculated. All subsequent analyses were based on clean reads. The reference genome, Gossypium hirsutum (AD1) 'TM-1' genome ZJU-improved_v2.1_a1 [47], and its annotation files were downloaded from CottonGen (http://www.cottongen.org (accessed on 18 November 2021)). After indexing by using HISAT v2.1.0, the reference genome was used in alignment of the clean reads. FeatureCounts v1.6.2 was used to count the number of reads mapped to each gene, and to calculate the FPKM (the number of fragments per kilobase of transcript sequence per millions base pairs sequenced), which was used to characterize the abundance of gene transcripts [68].

Differential Gene Expression and Pathway Analysis
DESeq2 v1.22.1 was used to analyze the differential expression of genes between two groups, and the p value was corrected using the Benjamini & Hochberg method. The corrected p ≤ 0.05 and |log 2 (foldchange)| ≥ 1 were used as the thresholds for significantly differential expression. GO and KEGG enrichment analyses and visualization of DEGs were further implemented by employing the clusterProfiler R package and Sangerbox (http://vip.sangerbox.com/ (accessed on 22 November 2021)), respectively. The online iTAK was used to predict transcription factors (TFs) from all DEGs [69].

Gene Set Enrichment Analysis (GSEA)
All the expressed genes, regardless of whether or not they were differentially expressed from different comparison groups, were used for GSEA analysis, which was performed using the GSEA version 4.1.0 software with default parameters. Gene set enrichment analysis (GSEA) sorts all genes in the comparison group according to the multiple of difference between groups, and then analyzes the up and down regulation of the whole set according to the sorted results. The enrichment score for each gene set is then calculated using the entire ranked list, which reflects how the genes for each set are distributed in the ranked list. Normalized enriched score (NES) was determined for each gene set, which defines the degree of enrichment. The significantly enriched gene set was selected based on normalized enrichment score (NES) > 1 and false discovery rate (FDR) q-value ≤ 0.25.

Gene Network Construction and Visualization
A total of 5940 non-redundant DEGs from the four comparisons, WA_vs._PA, WB_vs._PB, PB_vs._PA, and WB_vs._WA, were selected for co-expression network analysis via the weighted gene co-expression network analysis (WGCNA) tools from the ImageGP platform (http://www.ehbio.com/Cloud_Platform/front/#/ (accessed on 10 December 2021)). The contents of seven representative metabolites (Procyanidin B2, Procyanidin C1, Cyanidin-3-Oglucoside, Delphinidin-3-O-glucoside, Pelargonidin-3-O-glucoside, Peonidin-3-O-galactoside, and Quercetin-3-O-glucoside) were used as the traits for WGCNA, and modules were obtained through WGCNA analysis with the default settings. Co-expression networks were constructed using the WGCNA (v1.29) package in R (Langfelder and Horvath, 2008). The modules were obtained using the automatic network construction function blockwise-Modules with the default settings, except that the power was set to 18, TOM Type was signed, minModuleSize was set to 25, Deep split was set to 2, and mergeCutHeight was set to 0.25. The eigengene value was calculated for each module and used to test the association with each trait (metabolites). The networks were visualized using Cytoscape (v.3.0.0).

Quantitative Real-Time PCR Analysis
Each sample was ground to powder in liquid nitrogen and 0.1 g power was used to extract total RNA using an RNA extraction kit (Huayueyang Biotechnology Inc., Beijing, China) by following the specifications of the kit. The cDNA was synthesized by using an EASYspin Plus Plant RNA Kit (Aidlab Biotechnologies Co., Ltd.; Beijing, China) from 1 µg of total RNA, according to the manufacturer's instructions. Three biological and three technical replicates for each reaction were analyzed on a LightCycler ® 480 instrument (Roche, Switzerland) with a first step of 95 • C for 5 min, followed by 45 cycles of 95 • C for 10 s, 60 • C for 15 s, and 72 • C for 20 s. The gene expression levels were calculated with the 2 −∆∆cT method. The cotton GhUBQ7 gene (DQ116441) was amplified as an internal control gene. All primers are listed in Table S1.

Conclusions
In this study, a total of 38 anthocyanin metabolites were identified in the white and pink petals. Among them, Quercetin-3-O-glucoside and Delphinidin-3-O-glucoside are abundant in both white and pink petals. Cyanidin-3-O-glucoside, with a markedly higher abundance in pink petals, may be the major contributor to the formation of pink petals. Other glycosidic derivatives of cyanidins, pelargonidins, and peonidins that were not detected in white petals may also contribute to the intensity of pink color. The combined analysis of RNA-seq and metabolome revealed that the expression of MYB21, UGT88F3, GSTF12, and VPS32.3 was associated with the accumulation of Cyanidin-3-O-glucoside in petals. In addition, many of the 78 genes identified through WGCNA were correlated with Cyanidin-3-O-glucoside and Delphinidin-3-O-glucoside. While the relationship between these genes and anthocyanin biosynthesis needs further investigation, we hope that our work has contributed to the understanding of anthocyanin biosynthesis in upland cotton.

Data Availability Statement:
The original contributions presented in the study are publicly available. This data can be found here: National Center for Biotechnology Information (NCBI) BioProject database under accession number PRJNA825010 (Release date: 1 October 2022).

Conflicts of Interest:
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.