Transcriptomics and Genomics Analysis Uncover the Differentially Expressed Chlorophyll and Carotenoid-Related Genes in Celery

Celery (Apium graveolens L.), a plant from Apiaceae, is one of the most important vegetables and is grown worldwide. Carotenoids can capture light energy and transfer it to chlorophyll, which plays a central role in photosynthesis. Here, by performing transcriptomics and genomics analysis, we identified and conducted a comprehensive analysis of chlorophyll and carotenoid-related genes in celery and six representative species. Significantly, different contents and gene expression patterns were found among three celery varieties. In total, 237 and 290 chlorophyll and carotenoid-related genes were identified in seven species. No notable gene expansion of chlorophyll biosynthesis was detected in examined species. However, the gene encoding ζ-carotene desaturase (ZDS) enzyme in carotenoid was expanded in celery. Comparative genomics and RNA-seq analyses revealed 16 and 5 key genes, respectively, regulating chlorophyll and carotenoid. An intriguing finding is that chlorophyll and carotenoid-related genes were coordinately regulated by transcriptional factors, which could be distinctively classified into positive- and negative-regulation groups. Six CONSTANS (CO)-like transcription factors co-regulated chlorophyll and carotenoid-related genes were identified in celery. In conclusion, this study provides new insights into the regulation of chlorophyll and carotenoid by transcription factors.


Introduction
As the main pigment, chlorophyll is located on the thylakoid membrane and plays a central role in light absorption during photosynthesis [1,2]. Actually, chlorophyll is one of the most abundant biological and organic molecules and is central to carbon cycling by solar energy capture and photosynthetic carbon fixation [3,4]. Furthermore, chlorophyll has many other functions, such as delaying aging, producing vitamins, detoxification, and anti-allergy [5,6].
Carotenoids are mainly found in leaves, flowers, fruits and roots of plants [7]. It can attract insects and birds, which help plants spread pollen and seeds [8]. Carotenoids, as antenna pigments, capture light energy and transfer it to chlorophyll, promoting the photochemical process in photosynthesis [9,10]. In addition, carotenoids protect photosynthetic machinery from harmful oxidative light damage caused by strong light [11].

Comparative Analysis of Chlorophyll and Carotenoid Content in Three Varieties of Celery
Here, the present research is based on three celery varieties with different-colored petioles, including green, white, and red. Firstly, we measured the chlorophyll and carotenoid content among three varieties. The average chlorophyll content in green celery variety (0.0453 mg/g) was 5.4 and 9.6 times more than white (0.0083 mg/g) and red (0.0047 mg/g) varieties, respectively ( Figure 1a and Table S1). Specifically, the chlorophyll a and chlorophyll b content in the green variety were all significantly higher than those in white or red varieties (p-value < 0.01) (Figure 1a and Table S2). However, no significant difference was found between white and red varieties for chlorophyll and carotenoid content. Carotenoids, first discovered in carrot [36,39], were 6.8 and 9.8 times (p-value < 0.01) more abundant in green celery (0.0116 mg/g) than in white (0.0016 mg/g) and red (0.0011 mg/g) varieties, respectively (Figure 1b, Tables S1 and S2). Similarly, no significant difference was found between white and red varieties. Therefore, these three celery varieties were excellent materials for us to explore the regulation of gene expression related to chlorophyll and carotenoid.

Exploring the Gene Expression Pattern among Three Varieties of Celery
We conducted the expression pattern analysis of these three varieties using the transcriptomic datasets obtained by our laboratory. Based on the expression values of wholegenome genes of three celery varieties, clustering analysis was employed using STEM software. The genes were classified into 16 distinct profiles, and each profile represented a group of genes with the same expression pattern (Figure 1c). There were the most genes (1256) in profile 11, followed by profiles 1 (1060) and 5 (888). Among them, five profiles were significantly enriched (p-value < 0.05), and further divided into 2 clusters (profiles 1 and 4; profiles 11, 12, and 15) with different expression trend according to the background color ( Figure 1c).
Based on the differences in chlorophyll and carotenoid content among the three varieties, we managed to find profiles that had consistent expression patterns in both white and red celery, while they were not consistent with those in green celery. We found that the gene expression in the green variety was lower than that of the red and white varieties in profiles 1, 5, and 6 ( Figure 1c). On the contrary, the gene expression in green was higher than that of red and white varieties in profiles 9, 10, and 14. Therefore, we hypothesized that the genes in these six profiles might be related to the synthesis of celery chlorophyll and carotenoid.

Identification of Differentially Expressed Genes and Functional Enrichment Analysis among Three Varieties
In order to more accurately identify the genes related to chlorophyll and carotenoid, we analyzed the differentially expressed genes (DEGs) among the three varieties. Here, we attempted to identify the DEGs between green vs red celery and green vs white celery, and these genes were not differentially expressed between red and white celery according to the differences in chlorophyll and carotenoid content among the three varieties. By that standard, we detected 646 significantly up-regulated genes and 106 down-regulated genes in green celery compared with red and green varieties (Figure 1d,e).
Furthermore, we conducted the functional enrichment analysis of these DEGs using GO and KEGG databases. The results indicated these 646 up-regulated genes in the green variety were significantly enriched (q-value < 0.05) in 126 terms, including 57 'biological process', 29 'cellular component', and 40 'molecular function' categories ( Figure 2a, Table S3, Figure S1). However, there was no significantly enriched GO term for the down-regulated genes in the green variety ( Figure S2). Among the top 20 enriched terms, the terms involved in the biological process mainly were photosynthesis, including light reaction and harvesting. For the cellular component, most terms belonged to thylakoid, photosystem, photosynthetic membrane, and thylakoid membrane. For the molecular function, two terms were involved in the structural molecule and oxidoreductase activity. This indicated that most of the enriched GO terms were related to photosynthesis, indicating that most of these genes up-regulated in green celery were related to photosynthesis. These results were also consistent with the high content of chlorophyll in the green celery variety.
Similarly, we obtained 13 enriched pathways for the up-regulated genes in the green variety using the KEGG database, and no enriched pathway was detected for the downregulated genes in the green variety ( Figure 2b and Table S4). Among these enriched pathways, most of them were also related to photosynthesis and chlorophyll metabolism, which was consistent with the GO enrichment analysis. In addition, the carotenoid pathway was enriched (q-value < 0.05) for the up-regulated genes in green celery, which was consistent with the content of carotenoid in the three celery varieties.

Comprehensive Identification of Genes Involved in Chlorophyll and Carotenoid
To better explore and comprehensive comparative analysis of genes involved in chlorophyll and carotenoid pathways, we also selected six other representative eudicot species, including two other Apiaceae (coriander and carrot), one Araliaceae (ginseng: Panax ginseng) [40], one Asteraceae (lettuce: Lactuca sativa) [41], and two Rosids (the botanical model Arabidopsis thaliana, and the genome structure model Vitis vinifera) ( Figure 3). All species used except Vitis vinifera (grape) and Arabidopsis belonged to the Euasterids.
The chlorophyll biosynthesis pathway is completed by a series of enzymatic reactions (Tables S5 and S6). According to the KEGG database, the sequences of 27 Arabidopsis genes contributing to the formation of chlorophyll a and chlorophyll b were used to identify homologous genes in celery and other plants. In total, 27, 32, 25, 77, 27, and 22 chlorophyll biosynthesis genes were identified from the whole genomes of celery, coriander, carrot, ginseng, lettuce, and grape, respectively ( Figure 3, Table S6). Similarly, we collected 18 Arabidopsis genes involved in carotenoid according to the KEGG pathway, and they further were used as seeds to identify homologous genes in celery and other five species (Figure 3; Tables S7 and S8). The most carotenoid-related genes were identified in ginseng (72), followed by coriander (44), celery (42), carrot (37), grape (37), and lettuce (34).

Phylogenetic and Gene Duplication Analyses
To further explore the evolution of chlorophyll biosynthesis genes, 210 homologous genes encoding 15 enzymes were identified in celery and the other 5 species according to the Arabidopsis (Figure 3; Table S6). Based on the number of these genes, we conducted the cluster analysis among seven species, and two groups (I and II) were detected (Figure 4a). In group II, we found there were notably more genes in ginseng than other six species.
Considering shared and specific whole-genome duplication/triplication (Figure 3), assuming no gene loss after their divergence from the last common ancestor, the ratio of gene number among grape, Arabidopsis, lettuce, ginseng, carrot, coriander, and celery should be 1:4:3:6:4:4:4 ( Figure 3). However, it turned out that in fact most genes did not reach this ratio in multiple species, and most of these genes were lost to varying degrees. This might be due to the loss of genes during evolution after genome duplication. Interestingly, the genes encoding GSA-AM and POR enzymes were expanded in ginseng compared with the grape according to the heatmap and phylogenetic analyses (Figures 4b  and S3). There were the most genes encoding several enzymes in ginseng, such as GluTR, CHLH/I/D, GSA-AM, POR, UROD, and so on (Figures 4a,b and S3). Comparatively, there was no notable gene expansion in celery and other examined species for chlorophyll biosynthesis genes. According to the KEGG pathway, we mainly investigated 18 enzymes involved in the carotenoid. In total, 266 homologous genes encoding these enzymes were identified in celery and the other 5 species according to the Arabidopsis (Figure 3; Table S8). Two groups (I and II) were also obtained according to the cluster analysis of the number of these genes (Figure 4c). Compared to the Arabidopsis, there were more genes encoding each enzyme in other species, especially for the ginseng in group II. Similar to the chlorophyll biosynthesis genes, each species experienced some degree of gene loss after genome duplication. Interestingly, the genes encoding the ZDS enzyme were expanded in celery compared with the grape, while it was contracted in the other five species. There were the most genes encoding AAO3 enzyme in coriander, ABA2 in grape, PSY and ZEP genes in ginseng (Figures 4c,d  and S4). Compared to other enzymes, there were more genes encoding NCED than other enzymes in almost species (Figure 4d).

Expression Analysis of Chlorophyll and Carotenoid Biosynthesis Genes in Celery
Glutamyl tRNA reductase (GluTR) is the first critical rate-limiting reaction in the chlorophyll synthesis pathway. Among 27 chlorophyll biosynthesis genes identified in celery according to the KEGG database, 17 of them were differentially expressed among leaf, root, and petiole tissues (Figure 5a,b and Table S9). Moreover, 16 were all expressed as significantly higher in the green variety than in the red/white varieties. Therefore, these 16 DEGs might play key roles in the synthesis of high chlorophyll content in green celery (Figure 5b). Furthermore, we found that these key DEGs encoded 11 enzymes based on the chlorophyll biosynthesis pathway (Figure 5a and Table S6). The regulatory pathway showed that each node has one or more gene copies in the seven species. There are three genes encoding Glutamate-1-semialdehyde-2,1-aminomutase (GSA-AM) in celery, more than in the other five species except for ginseng (Figure 5a). Among all 42 carotenoid-related genes identified in celery, 16 genes were differentially expressed among leaf, root, and petiole tissues (Figure 6a,b and Table S10). Five DEGs (Ag4G01350.1, Ag8G00262.1, Ag5G02913.1, Ag3G02818.1, and Ag5G00282.1) were all expressed significantly higher in the green variety than in the red/white varieties (Figure 6b,c and Table S10). Therefore, these DEGs might play important roles in the synthesis of high carotenoid content in green celery. Based on the carotenoid pathway, the DEG Ag4G01350.1 encoded the phytoene synthase (PSY), which condensed two GGPP molecules to produce phytoene, the first compound in the carotenoid pathway (Figure 6a,c). The DEG Ag8G00262.1 encoded the lycopene epsilon-cyclase (LYCE), which catalyzed the synthesis of δ-carotene from lycopene. The δ-carotene was further formed into α-carotene by the action of lycopene beta-cyclase (LYCB). The other three differentially expressed genes encoded a cytochrome P450 family member (CYP97A3), non-photochemical quenching 1 (NPQ1) and zeaxanthin epoxidase (ZEP), and these enzymes were mainly involved in the synthesis of downstream carotenoid-related substances. Therefore, we believe that the first two genes, Ag4G01350.1 and Ag8G00262.1, played a very important role in the accumulation of carotene in green celery. (a) Carotenoid-related genes identification in celery and other 6 species. The notation '1-1-1-1-1-1-1' indicates that one gene was identified in Arabidopsis, celery, coriander, carrot, lettuce, grape, and ginseng, respectively. Gene expression was detected in the 3 different tissues (root, leaf, and petiole) and 3 different colors' varieties of celery. The orange and purple colors indicate high expression levels in different tissues and varieties, respectively. Asterisks represent the up-DEGs (red), down-DEGs (blue), and non-DEGs (black). The yellow arrow represents the key carotenoid-related DEGs between green and red/white varieties. (b) The Venn diagrams show the DEGs among three tissues (root, leaf, and petiole) and three varieties (green, red, and white). (c) The bar plot of expression level (FPKM) of five key carotenoid-related genes in three varieties.
Although the corresponding encoding genes of these two enzymes were not significantly differentially expressed between the three cultivars of celery, the genes encoding ZDS enzymes were significantly expanded in celery. A total of five ZDS genes were detected in celery, which was more than that in the other six species (Figure 6a). Only two ZDS genes were found in the other two Apiaceae species (carrot and coriander). Based on the phylogenetic tree, we found that three ZDS genes (Ag4G00806.1, Ag10G01744.1, AgUnG00992.1) in celery had no corresponding genes in carrot and coriander, suggesting that they might be duplicated genes produced during genome duplication. Therefore, the above findings showed that the ZDS enzyme might play important an role in celery carotenoid.

Regulatory Networks Construction for Chlorophyll and Carotenoid-Related Genes with Transcription Factors
We constructed regulatory networks between TFs and the 16 key chlorophyll-related DEGs based on Pearson correlation coefficients (PCCs). All the differentially expressed TFs (DETs) between green celery and the other two colored varieties were used, with 20 DETs and 15 (out of 16) key chlorophyll-related DEGs together forming this network (Figure 7a). The DETs in the network were classified into 10 categories, of which CO-like ones were the most (6), followed by MYB, NBS, ABI3/VP1 (RAV), and so on (Figure 7b and Table S11). More network connections between DEGs and DETs show positive (194 with PCC > 0.9) than negative regulation (64 with PCC < −0.9) (Table S12). Twenty DETs fell into two distinct groups, including thirteen positive-and seven negative-regulation groups (Figure 7a). Interestingly, some homologous DETs, such as six CO-like TFs, five of them performed positive regulation of chlorophyll biosynthesis, while the Ag9G02215 was just reverse (Figure 7a). An inferred regulatory network involves 18 DETs and 5 key carotenoid-related DEGs (Figure 7c). The DETs could be classified into 10 categories, of which CO-like was the most (6), followed by MYB, NBS and ABI3/VP1 (RAV) (Figure 7b, Table S11). Intriguingly, just like the chlorophyll network, the carotenoid synthesis network DETs formed 11 positiveregulation and 7 negative-regulation groups (Figure 7c). Further, the carotenoid DETs are all contained in the chlorophyll DETs (Figures 6 and 7; Table S11), implicating the same positive or negative regulation and showing highly coordinated regulation of the two pigments by the same set of transcriptional factors. Some homologous DETs, e.g., Ag7G01301 and Ag9G02215, respectively perform positive and negative regulation of carotenoid, coordinated to that in the chlorophyll biosynthesis.

Discussion
Chlorophyll is an important pigment involved in plant photosynthesis. It captures light energy and drives electrons to transfer to the reaction center [42,43]. The mutation of chlorophyll biosynthesis genes can lead to the loss or decrease in related enzyme activities, resulting in the decrease in chlorophyll content and the etiolation of leaves. Finally, it will lead to a decrease in photosynthetic efficiency, slow growth and even death [44,45]. In higher plants, chlorophyll mainly includes chlorophyll a and chlorophyll b, and most enzymes required for chlorophyll biosynthesis have been detected [46,47]. Twenty-seven genes encoding these enzymes have been cloned in Arabidopsis, providing an important reference for the identification of chlorophyll synthesis genes in the other species [48].
Inhibition of carotenoid or chlorophyll synthesis to inhibit plant photosynthesis is one of the action mechanisms of herbicides [49,50]. Carotenoid or chlorophyll biosynthesis inhibitors can cause albino leaves and even lead to death. The main reason for this phenomenon is the inhibition of carotenoid or chlorophyll biosynthesis in plants [3,51]. By transplanting genes related to chloroplast and carotenoid synthesis into plants, it is possible to make plants resistant to different types of herbicides [10]. Therefore, genes identified in this study are of great value for breeding new herbicide-resistant plants or developing new herbicides.
Transcription factors (TFs) play important roles in regulating carotenoid and chlorophyllrelated genes [52,53]. For example, the down-regulation of SlMYB72 played an important role in changing the expression level of genes related to chlorophyll, carotenoid and flavonoid in Solanum lycopersicum [54]. In this study, we constructed regulatory networks between differentially expressed TFs (DETs) and the key chlorophyll-and carotenoid-related DEGs. Most DETs in the network were CO-like, MYB, NBS, ABI3/VP1 (RAV), and they performed positive or negative regulation of chlorophyll and carotenoid-related genes in celery.
An intriguing finding from the present study is that chlorophyll and carotenoids are regulated by mostly shared but differentially expressed TFs (DETs) in the same regulatory pattern. Carotenoids are integral constituents of the thylakoid membrane and are usually associated intimately with both antenna and reaction center pigment proteins [10]. The light absorbed by carotenoids is transferred to chlorophyll for photosynthesis; because of this role, they are called accessory pigments. Carotenoids are tetraterpenes and the phytol side chain of chlorophyll is terpene-derived, therefore, they have some common precursors such as geranylgeranyl diphosphate (GGPP) [18,55]. Here, we found that the DETs, while regulating carotenoid or chlorophyll, could be distinctively classified into two groups, with one group always performing positive regulation and the other negative regulation.
A high-quality chromosome-level celery reference genome and transcriptome, together with genomic data for carrot and coriander, provides abundant resources for both fundamental and applied research into Apiaceae plants [36][37][38]. Two sequential WGDs leading to the formation of celery and other Apiaceae genomes exerted a great impact on gene regulatory networks and may have contributed to the diversification of the Apiaceae [35].
In this study, the molecular basis of celery chlorophyll and carotenoid regulatory networks were comprehensively analyzed in combination with transcriptome analysis.
Several key genes related to these pathways were identified in celery and their expression was explored between different tissues and varieties, laying a solid foundation for dissecting genetic mechanisms associated with photosynthesis.

Detection of Chlorophyll and Carotenoid Content
In the study, a total of 0.5 g stems of three varieties (green celery, red celery and white celery) were taken for the determination of chlorophyll and carotenoid content, and each celery variety was repeated three times. Then, 5 mL of 95% ethanol was used for extraction at 4 • C. The content of chlorophyll and carotenoid was measured by spectrophotometry from the petiole of three celery varieties.

Chlorophyll and Carotenoid-Related Genes Identification
Firstly, the genes of Arabidopsis related to chlorophyll and carotenoid were retrieved according to the KEGG database [58]. Then, these genes were used to identify homologous genes in celery and other plants, including coriander, carrot, ginseng, lettuce, and grape, using the Blastp program with the parameters as e-value < 1 × 10 −5 , identify > 50%, score > 100 according to the previous reports [59,60].

Phylogenetic Tree Construction
The phylogenetic trees were constructed using the amino acid sequences of the chlorophyll and carotenoid genes from celery and six other species by MEGA X software [61]. The neighbor-joining (NJ) method was used with 1000 bootstrap replicates. Evolutionary distance was calculated using the Poisson correction model. The evolutionary tree was further beautified using the iTOL program [62].

Gene Expression Data Collection and Substance Content Measurement
The transcriptomic datasets were obtained from our laboratory, which contained three celery samples with different-colored petiole varieties, including green celery 'Ventura', white celery 'Baiqin', and red celery 'Hongqin'. In addition, the RNA-seq dataset from three tissues (root, petiole, and leaf) of green celery 'Ventura' were used for gene expression analysis. All the samples had three biological replicates, and all the datasets were available download from the publicly accessible database with the accession numbers CRA001996 and CRA001997 according to the previous report [56]. Furthermore, we measured the content of chlorophyll and carotenoid using spectrophotometry from the petiole of these three celery genotypes for comparative analysis.

Gene Expression Quantity, Cluster, and Enrichment Analysis
The Cufflinks program was used for calculating gene expression with default parameters, and the expression was normalized as the expected number of Fragments Per Kilobase of transcript sequence per Millions base pairs sequenced (FPKM) [63]. The DEGs analyses were conducted using DESeq2 software with p-adj < 0.05 and the absolute value of fold-change >2 [64]. The hierarchical clustering heat map and Venn diagram were visualized using TBtools software (v1.09876) [65]. The expression patterns of genes in different varieties or tissues were obtained using the STEM software (v1.3.13) [66]. The number of model profiles was set to 50, and other settings used the default parameters. The GO and KEGG enrichment analysis was performed using the OmicShare tools (http://www.omicshare.com/tools (accessed on 26 July 2022)), and q-values less than 0.05 were thought to be significantly enriched in this study.

Interaction Network Construction
All the transcription factors (TFs) were identified using the Pfam database (http: //pfam.sanger.ac.uk/ (accessed on 26 July 2022)) with E-value <1 × 10 −4 according to the previous reports [67][68][69]. Pearson's correlation coefficients (PCCs) between TFs and DEGs were calculated using in-house Perl scripts according to the gene expression value of three different color's celery varieties. The positive and negative regulatory relationships were estimated as the PCC > 0.90 and PCC < −0.90, respectively [70,71]. The interaction networks between pathway-related genes and TFs were constructed using Cytoscape software (https://cytoscape.org/ (accessed on 26 July 2022)).

Data Availability Statement:
The datasets analyzed during the current study are available in the Beijing Institute of Genomics (BIG) Data Center under accession numbers CRA001993, CRA001996, CRA001997 that are publicly accessible at http://bigd.big.ac.cn/gsa (accessed on 26 July 2022). The assembled celery genome and related dataset also can be downloaded from our Apiaceae genome database (http://cgdb.bio2db.com (accessed on 26 July 2022)). All materials and related data in this study are provided in the supplementary files. Other data are available upon request to the corresponding author.

Conflicts of Interest:
The authors declare no competing interests.