Transcriptome-wide identification and quantification of Caffeoylquinic acid biosynthesis pathway and prediction of their putative BAHDs gene complex in A. spathulifolius

The phenylpropanoid pathway (PPP) is a major secondary metabolite pathway that helps plants overcome biotic and abiotic stress and produces various by-products that promote human health. One of its by-products, caffeoylquinic acid (CQA), is a soluble phenolic compound present in many angiosperms. Hydroxycinnamate-CoA shikimate/quinate transferase, a BAHDs superfamily enzyme, plays a significant role in CQA biosynthesis and accumulation. This study carried out a transcriptome-wide identification of candidate genes spanning phenylpropanoid to CQA biosynthesis in A. spathulifolius flowers and leaves. Transcriptomic analyses of the flowers and leaves showed differential expression of the unigenes involved in PPP and CQA biosynthesis. An analysis of the PPP unigenes revealed extensive duplication of the key enzyme PAL (120 unigenes in leaves and 76 in flowers), of the gene encoding C3’H (169 unigenes in leaves and 140 in flowers), and of 4CL (41 unigenes in leaves and 27 in flowers). In addition, C4H was represented by 12 unigenes in the leaves of A. spathulifolius and four in the flowers. Characterization of the BAHDs superfamily identified 82 members in leaves and 72 in flowers. Among them, phylogenetic analysis showed that five unigenes encoded HQT and three encoded HCT in A. spathulifolius. Three HQT unigenes were common to both leaves and flowers, whereas the other two were specific to leaves. HQT was upregulated in flowers, whereas HCT was expressed strongly in the leaves of A. spathulifolius. Overall, 4CL, C4H, and HQT are expressed strongly in flowers, and caffeic acid and HCT show higher expression in leaves. Therefore, CQA biosynthesis occurs in the flowers of A. spathulifolius rather than in the leaves.


Introduction
Plant secondary metabolites (PSMs) are a group of organic compounds that help protect plants against biotic and abiotic stress [1][2][3][4]. PSMs are not essential to normal life but help plants cope with stressful environments [5,6]. Secondary metabolites can be divided into three types: terpenoids, polyketides, and phenylpropanoids [7,8]. Phenylpropanoid (PP) metabolism, which produces an enormous number of compounds from intermediates of the shikimate pathway, is present in bacteria, fungi, and plants but absent in animals [9]. These phenolic compounds assist the plant defense system against insects and fungi [10]. The main PP metabolites are products of phenylalanine (Phe) and are precursors of many metabolites, such as flavonoids, tannins, lignins, and phenylpropanoids [11]. The shikimate pathway network connects the carbon metabolism and the AAA (Aromatic for elucidating the response metabolites of chlorogenic acid or caffeoylquinic acid biosynthesis in A. spathulifolius and an important source of ornamental plants to useful drug discovery.

Assembly and gene annotation
The assembled flower reads produced 146,337 unigenes with an average contig length of 811.58 bp. The assembled transcripts had an N50 of 1,279 bp and an overall alignment rate of 91.71% to the flower paired-end reads. The average GC content was 38.09%. The leaf transcriptome was reported previously [44]. BUSCO analysis, based on tblastn alignments against the eudicots database, estimated the assembly completeness at 91.4% (leaf) and 91.7% (flower) (Figure S1), indicating that the unique transcripts are of good quality.
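The summary statistics quoted above (number of unigenes, mean contig length, N50, and GC content) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function names (`n50`, `gc_percent`, `assembly_summary`) and the toy sequences are hypothetical, and the GC content is computed here as a pooled value over all bases, which is an assumption, since the text does not state whether the reported average is pooled or per contig.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def gc_percent(sequences):
    """Return the pooled GC content (%) over all bases (assumed definition)."""
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    total = sum(len(s) for s in sequences)
    return 100.0 * gc / total if total else 0.0


def assembly_summary(sequences):
    """Collect the basic assembly statistics for a list of contig sequences."""
    lengths = [len(s) for s in sequences]
    return {
        "n_contigs": len(lengths),
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "n50": n50(lengths),
        "gc_percent": gc_percent(sequences),
    }


# Hypothetical toy contigs, for illustration only (not data from this study).
toy_contigs = ["ATGC" * 5, "AATT" * 3, "GGCC", "AT"]
summary = assembly_summary(toy_contigs)
```

For a real assembly, the sequences would be read from the Trinity FASTA output rather than typed in as a list. The N50 is always at least as informative as the mean here because it weights long contigs by the bases they contribute, which is why both values are usually reported together.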
For the A. spathulifolius flower, 65,129 unigenes were annotated against the Nr database, 48,896 against the KEGG database, and 70,019 against the Pfam database (Table 1). The PP pathway involved 1,128 unigenes in flowers and 1,287 unigenes in leaves. In the biological process category, the Gene Ontology terms assigned 40.9% to the phenylpropanoid metabolic process, 31.8% to the response to wounding, 22.7% to the lignin biosynthetic process, and 18.2% to the cinnamic acid biosynthetic process (Table 2).

BAHDs protein Phylogenies
The unrooted phylogenetic trees were constructed from a complete ORF and with the

Quantification of PPP unigenes
Primers for the PP candidate genes identified in a previous study of the A. spathulifolius leaf transcriptome were designed for reverse transcription (RT) to confirm the quality of the isoforms produced by the assembled transcripts (Figure S2). The total assembled

Discussion
Phenylpropanoid biosynthesis produces various secondary metabolites, most of which have beneficial effects on human health [45]. These metabolites are of interest regarding diabetes, obesity, cancer, and cardiovascular disease that are a significant burden on the world  The results suggest that PSM increases the diversity of higher plants, including A. spathulifolius. Overall, these results revealed substantial CQA biosynthesis in A. spathulifolius that could benefit human health.

RNA Isolation, cDNA Library Construction, and Illumina Sequencing
Total RNA was extracted from the whole flower of Aster spathulifolius using the protocol reported by Bretial et al. [62]. cDNA library construction and RNA sequencing were performed at the Genomics Macrogen Laboratory (South Korea). The TruSeq method was used to make short fragments of mRNA, which were then used as templates for the cDNA library. All short fragments were linked to the sequencing adapter and sequenced as paired-end (PE) reads using Illumina sequencing.
For the leaf transcriptome, the material from a previous study (SRR10724565) [44] was used. The flower RNA sequences were submitted to the NCBI SRA database under accession no. SRR14001926.

De novo assembly and functional annotation
The raw FASTQ reads were subjected to quality control [63]. Low-quality reads (quality value ≤ 30%, Q20) were removed using Trimmomatic [64]. The clean reads were assembled into unique transcripts using Trinity with a k-mer size of 25 through its Inchworm, Chrysalis, and Butterfly stages [65]. Finally, the unique transcripts were checked for coding function annotations using the Trinotate and TrinotateWeb pipeline, as described in a previous study [66]. For gene annotation, all assembled unigenes were aligned to the Swiss-Prot protein database. An amplicon size and primer structure of seven candidate unigenes and random