Isolation and Characterization of the Genes Involved in the Berberine Synthesis Pathway in Asian Blue Cohosh, Caulophyllum robustum

Caulophyllum robustum, commonly named Asian blue cohosh, is a perennial herb in the family Berberidaceae. It has traditionally been used for folk medicine in China. We isolated berberine from the leaves, stem, roots, and fruits of C. robustum, and this is the first report on berberine in this species. Transcriptome analysis was conducted for the characterization of berberine biosynthesis genes in C. robustum, in which, all the genes for berberine biosynthesis were identified. From 40,094 transcripts, using gene ontology (GO) analysis, 26,750 transcripts were assigned their functions in the categories of biological process, molecular function, and cellular component. In the analysis of genes expressed in different tissues, the numbers of genes in the categories of intrinsic component of membrane and transferase activity were up-regulated in leaves versus stem. The berberine synthesis genes in C. robustum were characterized by phylogenetic analysis with corresponding genes from other berberine-producing species. The co-existence of genes from different plant families in the deepest branch subclade implies that the differentiation of berberine synthesis genes occurred early in the evolution of berberine-producing plants. Furthermore, the copy number increment of the berberine synthesis genes was detected at the species level.


Introduction
Caulophyllum is a small genus of perennial herbs in the family Berberidaceae. The genus contains only three species that are grown in the wild in eastern Asia and North America. Of the three species identified, C. thalictroides and C. giganteum are natives in eastern North America, and C. robustum is native to Northeast Asia, including China, Korea, and Japan [1]. The common name for the Caulophyllum species is blue cohosh, from the North American natives who used it for the treatment of amenorrhea, dysmenorrhea, and rheumatic symptoms [2]. The extracts are still used by midwives to aid labor in the United States. Root powders or liquid extracts from blue cohosh are sold as dietary supplements

Berberine Extraction and Quantification
Tissues were collected three times during [15][16][17][18][19] June 2021. The freeze-dried tissues were ground with a mortar and pestle, and berberine was extracted from the fine powder. The tissues were freeze-dried with 70% ethanol (1 g/10 mL) in the dark for 24 h at room temperature. After dilution to 10,000 mg/L, the berberine content was measured using a Shimadzu Prominence HPLC system (Shimadzu, Kyoto, Japan) equipped with a diode array UV-vis detector for monitoring at 280 nm. A C18 column (250 × 4.6 mm, 5 µm, Waters, Ireland) was used for separation by binary gradient elution using solvent A (water containing 0.1% formic acid) and solvent B (acetonitrile containing 0.1% formic acid). The flow rate was set at 1.0 mL/min as follows: 0 min, 0% B; 5 min, 0% B; 10 min, 30% B; 25 min, 40% B; 30 min, 50% B; 35 min, 0% B; and 40 min, 0% B. The injection volume was 10 µL and the column temperature was 40 • C.
One-way Analysis of Variance (ANOVA) and Duncan's multiple range tests were used to identify significant differences (p < 0.05), using Statistical Product and Service Solutions (SPSS ver. 26, Chicago, IL, USA).

RNA Extraction and Transcriptome Sequencing
Transcriptome sequencing was conducted on the fruit, leaves, and stem of C. robustum. Total cellular RNAs were extracted from fresh tissues with the Hybrid-RTM kit (Genes All Biotechnology Co., Seoul, Republic of Korea). A cDNA library was constructed using the TruSeq RNA Sample Prep Kit v2 (Illumina). Paired-end sequencing was carried out with the Illumina HiSeq 4000 at Macrogen, Inc. (Seoul, Republic of Korea). Low-quality reads (Phred score < 20), short reads (<50 bp), empty nucleotides (N at the end of reads), and adapter sequences were removed using Trimmomatic software [18]. Quality control before and after trimming was performed using FastQC software (https://www.bioinformatics. babrham.ac.uk/projects/fastqc/, accessed on 12 December 2022).

Functional Annotations
The transcriptome sequences were blasted on the known public databases InterProScan in the European Bioinformatics Institute (EBI) and NR in NCBI using the Basic Local Alignment Search Tool (BLAST) program with the cut-off parameters e-value 1e −4 and >70% similarity. The candidate transcripts with InterProScan and BLASTP hits were sorted to import to the Blast2GO suite 6.0 (https://www.blast2go.com/, accessed on 12 December 2022, BioBam Bioinformatics SL, Valencia, Spain). Gene ontology (GO) analyses were then performed.

Isolation of Berberine Synthesis Genes
Berberine synthesis genes were reported in Berberis koreana [21]. We used the protein sequences of the berberine synthesis genes in B. koreana for query in the transcriptome sequences in C. robustum using the NBLAST program [22]. The genes isolated were blasted again in NCBI database to be sure their functions. We also isolated the corresponding sequences from the NCBI genome databases of those plant species listed in Table 1.

Phylogenetic Analysis
Phylogenetic analysis was performed using the protein sequences. The protein sequences were aligned using ClustalW software and the phylogenetic tree was constructed with the neighbor-joining method using MEGA v5.0 with 1000 bootstraps. A maximum likelihood phylogenetic tree was built using MEGA X (v10.2.4) [23].

Berberine Contents
We quantified the berberine content in the leaves, stem, roots, and fruit of C. robustum ( Figure 1). The berberine content in leaf tissue was more than 10 times higher than in the other three tissues, that is, 58.51 ± 3.39 ug/g in the leaf, 5.83 ± 0.24 ug/g in the stem, 2.60 ± 1.33 ug/g in the root, and 3.23 ± 0.09 ug/g in the fruit per 1 g of tissue.  [19]. Short reads were mapped to the assembly using the Burrows-Wheeler Aligner (BWA) (https://github.com/lh3/bwa, accessed on 12 December 2022) [20]. The fragments per kilobase per million (FPKM) was calculated to select fragments higher than 1 FPKM in value to be used as a reference sequence.

Functional Annotations
The transcriptome sequences were blasted on the known public databases Inter-ProScan in the European Bioinformatics Institute (EBI) and NR in NCBI using the Basic Local Alignment Search Tool (BLAST) program with the cut-off parameters e-value 1e −4 and >70% similarity. The candidate transcripts with InterProScan and BLASTP hits were sorted to import to the Blast2GO suite 6.0 (https://www.blast2go.com/, accessed on 12 December 2022, BioBam Bioinformatics SL, Valencia, Spain). Gene ontology (GO) analyses were then performed.

Isolation of Berberine Synthesis Genes
Berberine synthesis genes were reported in Berberis koreana [21]. We used the protein sequences of the berberine synthesis genes in B. koreana for query in the transcriptome sequences in C. robustum using the NBLAST program [22]. The genes isolated were blasted again in NCBI database to be sure their functions. We also isolated the corresponding sequences from the NCBI genome databases of those plant species listed in Table 1.

Phylogenetic Analysis
Phylogenetic analysis was performed using the protein sequences. The protein sequences were aligned using ClustalW software and the phylogenetic tree was constructed with the neighbor-joining method using MEGA v5.0 with 1000 bootstraps. A maximum likelihood phylogenetic tree was built using MEGA X (v10.2.4) [23].

Berberine Contents
We quantified the berberine content in the leaves, stem, roots, and fruit of C. robustum ( Figure 1). The berberine content in leaf tissue was more than 10 times higher than in the other three tissues, that is, 58.51 ± 3.39 ug/g in the leaf, 5.83 ± 0.24 ug/g in the stem, 2.60 ± 1.33 ug/g in the root, and 3.23 ± 0.09 ug/g in the fruit per 1 g of tissue.

Figure 1.
Berberine content in the four organs of C. robustum. Data followed by a different letter indicate significant differences between samples according to one-way ANOVA/Duncan test (p < 0.05).

Transcriptome Sequencing
We obtained approximately 1.97 × 10 8 reads from the three transcriptome libraries (Supplementary Table S1). The percentage of clean reads from the raw reads was approximately 91.7% in the three libraries. Briefly, the number of clean reads was approximately 5.3 × 10 7 in the fruit, 6.4 × 10 7 in the leaves, and 6.7 × 10 7 in the stem of C. robustum. From the cleaned read data, the number of mapped reads were 3.7 × 10 7 , 4.6 × 10 7 , and 4.3 × 10 7 in the fruit, leaves, and stem, respectively (Supplementary Table S2).
The de novo assembly was carried out by pooling the three library results (fruit, leaf, stem). (Supplementary Table S3). The number of contigs obtained was 252,149, which covers 1.97 × 10 8 bp. After cleaning by removing redundancy, the number of contigs was 41,094, which covers 5.4 × 10 7 bp. The average length of the contigs was 783 bp before cleaning and 1338 bp after redundancy cleaning.

Functional Annotation and Gene Ontology (GO)
The functions of the obtained transcripts were analyzed by mapping against the InterProScan and NR databases. Of the 41,094 transcripts, 24,840 (60.4%) and 30,210 (73.5%) were mapped against the InterProScan and NR databases, respectively ( Figure 2A). In the NR database, the highest matching was with Macleaya cordata (3884), Aquiegia coerulea (3113), and Tetracentron sinense (2277), followed by Coptis chinensis, Thalictrum thalictroides, and Nelumbo nucifera (lotus) ( Figure 2B). Berberine content in the four organs of C. robustum. Data followed by a different letter indicate significant differences between samples according to one-way ANOVA/Duncan test (p < 0.05).

Transcriptome Sequencing
We obtained approximately 1.97 × 10 8 reads from the three transcriptome libraries (Supplementary Table S1). The percentage of clean reads from the raw reads was approximately 91.7% in the three libraries. Briefly, the number of clean reads was approximately 5.3 × 10 7 in the fruit, 6.4 × 10 7 in the leaves, and 6.7 × 10 7 in the stem of C. robustum. From the cleaned read data, the number of mapped reads were 3.7 × 10 7 , 4.6 × 10 7 , and 4.3 × 10 7 in the fruit, leaves, and stem, respectively (Supplementary Table S2).
The de novo assembly was carried out by pooling the three library results (fruit, leaf, stem). (Supplementary Table S3). The number of contigs obtained was 252,149, which covers 1.97 × 10 8 bp. After cleaning by removing redundancy, the number of contigs was 41,094, which covers 5.4 × 10 7 bp. The average length of the contigs was 783 bp before cleaning and 1338 bp after redundancy cleaning.

Functional Annotation and Gene Ontology (GO)
The functions of the obtained transcripts were analyzed by mapping against the In-terProScan and NR databases. Of the 41,094 transcripts, 24,840 (60.4%) and 30,210 (73.5%) were mapped against the InterProScan and NR databases, respectively ( Figure 2A). In the NR database, the highest matching was with Macleaya cordata (3884), Aquiegia coerulea (3113), and Tetracentron sinense (2277), followed by Coptis chinensis, Thalictrum thalictroides, and Nelumbo nucifera (lotus) ( Figure 2B).  In the GO analysis, 26,750 transcripts were assigned into the three major classification categories: biological process, molecular function, and cellular component, in which, 16,224, 18,160, and 15,282 transcripts were of biological process, molecular function, and cellular component, respectively ( Figure 3).

Differences of the Genes Expressed among Leaves, Stem, and Fruit in C. robustum
Of the 41,094 transcripts obtained in C. robustum, the numbers of transcripts expressed in the leaves, fruit, and stem were 37,108, 35,597, and 36,608, respectively (Supplementary Table S4). The expression was analyzed using a vis-à-vis comparison between the tissues (Table 2). In a comparison between the fruit and leaves, 761 genes were upregulated and 1135 genes were down-regulated in the fruit compared with the leaves. In leaves versus stem, the number of up-regulated genes in the leaves was higher than in the stem. GO analysis of the differentially expressed genes in the tissues revealed that the genes related to the membrane and the intrinsic component of the membrane categories

Differences of the Genes Expressed among Leaves, Stem, and Fruit in C. robustum
Of the 41,094 transcripts obtained in C. robustum, the numbers of transcripts expressed in the leaves, fruit, and stem were 37,108, 35,597, and 36,608, respectively (Supplementary Table S4). The expression was analyzed using a vis-à-vis comparison between the tissues (Table 2). In a comparison between the fruit and leaves, 761 genes were up-regulated and 1135 genes were down-regulated in the fruit compared with the leaves. In leaves versus stem, the number of up-regulated genes in the leaves was higher than in the stem. GO analysis of the differentially expressed genes in the tissues revealed that the genes related to the membrane and the intrinsic component of the membrane categories were the highest in number in both up-and downregulation ( Figure 4). In leaves versus stem, 326 genes and 105 genes were up-and downregulated, respectively, in the leaves, in which, a high number of genes in the categories of intrinsic component of membrane and transferase activity were up-regulated, and the genes related to response to stress and plasma membrane were down-regulated ( Figure 4). In fruit versus stem, 536 and 768 genes were up-and down-regulated in the leaves, respectively ( Table 2). Of the up-regulated genes, the intrinsic component of membrane genes showed the highest peak. Of the down-regulated genes, the number of annotated genes was extensive, but the down-regulated genes were mostly related to metabolite synthesis, including the BIA metabolic process and the BIA biosynthetic process ( Figure 4).    Figure 5 shows the berberine synthesis pathway and Table 3 lists the number of berberine synthesis genes in six berberine-producing medicinal plants mined from the NCBI database. Protein sequences of the berberine synthesis enzymes of the six plants are shown in Supplementary Table S1. We found the protein sequences of most of the genes in the six species, except NCS and STOX in P. nudicaule. The number of copies of the ber-  Figure 5 shows the berberine synthesis pathway and Table 3 lists the number of berberine synthesis genes in six berberine-producing medicinal plants mined from the NCBI database. Protein sequences of the berberine synthesis enzymes of the six plants are shown in Supplementary Table S1. We found the protein sequences of most of the genes in the six species, except NCS and STOX in P. nudicaule. The number of copies of the berberine synthesis genes ranged from one to twelve, depending on the genes and species. Protein sequences of the berberine synthesis genes of the Table 3 are in the Supplementary Table S5. 4OMT 2.1.      NCS is an important enzyme in BIA synthesis because it catalyzes the condensation reaction of dopamine and 4-hydroxyphenyl acetaldehyde to yield (S)-norcoclaurine, which is the first common precursor in the synthesis of myriad BIA molecules. The copy numbers of the NCS were ten, eleven, five, six, and one in C. robustum, B. koreana, N. nucifera, C. chinesis, and M. cordata, respectively. We could not find the NCS matching sequences in the transcriptome database for P. nudicaule. Phylogenetic analysis was carried out with these NCSs, while the NCS from M. cordata was excluded because it was a partial sequence with only 81 amino acids ( Figure 6A). The 32 NCSs from four species were separated into two groups: I and II. The B. koreana NCS_2 was out-grouped from the others. Of the ten NCSs from C. robustum, nine were placed in group I and one NCS was in group II. Group I contained NCSs from Ranunculaceae and Berberidaceae, in which the NCSs from a species formed mostly separated subclades. Group II contained NCSs from Berberidaceae and Nelumbonaceae. Unlike the NCSs in group I, the NCSs from different species were mixed in subclades. Berberine bridge enzyme (BBE) is the key rate-limiting enzyme in the berberine synthesis pathway [2,24]. We found four copies in the transcriptomes of C. robustum and two to five in other berberine-producing plants (Table 3). There were five copies in the transcriptomes of B. koreana, which belongs to the same Berberidaceae plant family as Caulophyllum. We analyzed the phylogenetic relationships among the BBEs listed in Table 3 as well as a few more BBEs from other species that were mined from the NCBI protein database ( Figure 6B). Short protein sequences were excluded from the analysis so that the phylogenetic analysis included 32 BBEs of 11 species from three families. The analyzed BBEs were divided into two groups, I and II, which were further divided into four subgroups. In each subgroup, the BBEs from different plant families were co-existed, except for subgroup I-III, which contained BBEs from B. koreana out-grouped to the two sister subgroups I-I and I-II.

Phylogenetic Analysis of Methyltransferases and CYP450 Monooxygenases
There were four methyltransferases (6OMT, CNMT, 4OMT, and SOMT) and two CYPs (CYP80B3 and CYP719A21) in the berberine synthesis pathway. Phylogenetic analysis was carried out with the four methyltransferases and two CYP subfamilies.
The analyzed 58 methyltransferases from 10 plant species were divided into five groups (Figure 7). The CNMTs were placed into a group V that was an out-clade to the Omethyltransferases. Three 4OMTs of N. nucifera formed a group IV that was an out-clade to groups I (6OMTs and 4OMTs), II (6OMTs), and III (OMT of A. thaliana and 4OMTs); while CNMTs and SOMTs were clustered in their own subclade, the 4OMTs and 6OMTs split into two groups. The 4OMTs split into two groups, in which the three 4OMTs from  Berberine bridge enzyme (BBE) is the key rate-limiting enzyme in the berberine synthesis pathway [2,24]. We found four copies in the transcriptomes of C. robustum and two to five in other berberine-producing plants (Table 3). There were five copies in the transcriptomes of B. koreana, which belongs to the same Berberidaceae plant family as Caulophyllum. We analyzed the phylogenetic relationships among the BBEs listed in Table 3 as well as a few more BBEs from other species that were mined from the NCBI Plants 2023, 12, 1483 9 of 18 protein database ( Figure 6B). Short protein sequences were excluded from the analysis so that the phylogenetic analysis included 32 BBEs of 11 species from three families. The analyzed BBEs were divided into two groups, I and II, which were further divided into four subgroups. In each subgroup, the BBEs from different plant families were co-existed, except for subgroup I-III, which contained BBEs from B. koreana out-grouped to the two sister subgroups I-I and I-II.

Phylogenetic Analysis of Methyltransferases and CYP450 Monooxygenases
There were four methyltransferases (6OMT, CNMT, 4OMT, and SOMT) and two CYPs (CYP80B3 and CYP719A21) in the berberine synthesis pathway. Phylogenetic analysis was carried out with the four methyltransferases and two CYP subfamilies.
The analyzed 58 methyltransferases from 10 plant species were divided into five groups (Figure 7). The CNMTs were placed into a group V that was an out-clade to the O-methyltransferases. Three 4OMTs of N. nucifera formed a group IV that was an out-clade to groups I (6OMTs and 4OMTs), II (6OMTs), and III (OMT of A. thaliana and 4OMTs); while CNMTs and SOMTs were clustered in their own subclade, the 4OMTs and 6OMTs split into two groups. The 4OMTs split into two groups, in which the three 4OMTs from N. nucifera formed a single cluster that was out-grouped to the other O-methyltransferases. A. thaliana OMT clustered with the SOMT clade (group IV) that was tied with two sister clades, in which one clade consisted of 6OMTs and 4OMTs (group I) and another clade comprised only 6OMTs (group II). In each clade of the deep branch, the CNMT and O-methyltransferases consisted of enzymes from different species from different families. In 6OMT in group I, for instance, the 6OMTs were from C. robustum (Berberidaceae, Ranunculales), B. koreana (Berberidaceae, Ranunculales), C. chinensis (Ranunculaceae, Ranunculales), M. cordata (Papaveraceae, Ranunculales), P. papaver (Papaveraceae, Ranunculales), and N. nucifera (Nelumbonaceae, Proteales). Because the 6OMTs split into two groups, multiple sequence analysis was carried out with 6OMTs from C. robustum and B. koreana in Berberidaceae (Figure 7). There were six and seven copies of 6OMTs in C. robustum and B. koreana, respectively. Polypeptide sequence conservation of group I 6OMTs was higher than that of group II 6OMTs, including the S-adenosyl-L-methionine binding (conserved region I) and metal binding (conserved region IV) sites (Supplementary Figure S1).
There were 184 CYP genes that fell into 31 families of CYP monooxydases in the transcriptomes of C. robustum (Table 4). The family was identified by the NCBI nomenclature. The CYP71 and CYP72 had predominantly high number of genes with 27 and 22, respectively. CYP78 amd CYP94 had 10 and 14 genes, respectively. In each gene, CYP71A1 and CYP72A had 13 copies and 8 copies, respectively, and the rest genes had variable number of copies from one to six. Of the 184 CYP genes of C. robustum, 21 genes were found in the transcriptome of B. koreana [21]. Table 4. CYP monooxygenase families in C. robustum.  Of the various CYP enzymes, CYP80 and CYP719 are involved in the berberine synthesis and we found single gene in both CYP genes in the transcriptomes of C. robustum. Thus, we carried out phylogenetic analysis with the CYP80 and CYP719 enzymes from berberine producing plants except of the Combretum micranthum. The phylogenetic analysis revealed that CYP80 and CYP719 were distinguished by forming separate clades ( Figure 8). As expected, the CYP80 and CYP719 of C. robustum showed closest relationship by forming a deepest branch, subclade I-II in group I and IV-III in group IV, respectively. There were 184 CYP genes that fell into 31 families of CYP monooxydases in the transcriptomes of C. robustum ( Table 4). The family was identified by the NCBI nomenclature.  berberine producing plants except of the Combretum micranthum. The phylogenetic analysis revealed that CYP80 and CYP719 were distinguished by forming separate clades (Figure 8). As expected, the CYP80 and CYP719 of C. robustum showed closest relationship by forming a deepest branch, subclade I-II in group I and IV-III in group IV, respectively.

Discussion
C. robustum (Asian blue cohosh) has long been traditionally used in folk medicine in China [6,7]. It is still used in the United States by midwives to aid labor [3]. Xia et al. listed 22 phytochemicals identified in root and rhizome extracts from the Caulophyllum species; however, berberine was not included in the list [6]. Berberine is a major bioactive compound among diverse benzylisoquinoline alkaloids (BIAs) in the species in Berberidaceae [25]. Caulophyllum is a small genus in the Berberidaceae family [1], and this is the first report on berberine in the Caulophyllum species. We found that the leaves contained considerably more berberine than other tissues. However, the plant parts used in folk medicine are mostly roots, rhizomes, and stems [4,7,8]. Thus, C. robustum leaf may be a good source for isolating berberine for the pharmaceutical industry.

Discussion
C. robustum (Asian blue cohosh) has long been traditionally used in folk medicine in China [6,7]. It is still used in the United States by midwives to aid labor [3]. Xia et al. listed 22 phytochemicals identified in root and rhizome extracts from the Caulophyllum species; however, berberine was not included in the list [6]. Berberine is a major bioactive compound among diverse benzylisoquinoline alkaloids (BIAs) in the species in Berberidaceae [25].
Caulophyllum is a small genus in the Berberidaceae family [1], and this is the first report on berberine in the Caulophyllum species. We found that the leaves contained considerably more berberine than other tissues. However, the plant parts used in folk medicine are mostly roots, rhizomes, and stems [4,7,8]. Thus, C. robustum leaf may be a good source for isolating berberine for the pharmaceutical industry.
The transcriptome mapping frequency was 64.4% and 73.5% of the InterProScan and NR databases in our results. These values are higher than the 69.9% NR matching in the transcriptomes of Berberis koreana [21], 46.3% in Coptis teeta, and 42.64% in Coptis chinensis [26]. In the NR database, the top six species showing over 1000 matches are known to produce BIA alkaloids [21,[26][27][28], except Tetracentron sinense. BIA alkaloids are limited in some plant families, such as Berberidaceae, Papaveraceae, and Ranunculaceae in Ranunculales [14] and Nelumbonaceae [15]. The five species showing the highest matching belong to these limited plant families. There have been no reports on the BIAs in T. sinense, which is a basal eudicot species lacking a vessel element [29] in the family Trochodendraceae in Trochodendrales. Roy et al. reported the highest NR matching between the transcriptomes of B. koreana and Nelumbo nucifera (lotus) [21]; whereas in our results, the matching between the transcriptomes of C. robustum and N. nucifera was lower than other species in Berberidaceae, Papaveraceae, and Ranunculaceae. Significant sequence homologies were reported between the transcriptomes of Coptis deltoidea and N. nucifera [30], and 1653 transcripts were still matched between N. nucifera and C. robustum in our results. Many bioactive components, including BIAs, have been found in N. nucifera, and all parts of the lotus plant have been used in Chinese traditional medicine [31,32].
In our analysis, the GO classification of 65.1% was higher than the 53.1% in B. koreana [21], but lower than those of the Coptis species, with 77.8% in C. deltoidea [30], and 70.5% in C. chinensis [26]. In the biological process category, most of the functions were related to metabolic processes. In the transcripts classified as biological processes, most of them were related to metabolic processes in which two groups with over 8000 transcripts were involved in organic substance metabolic processes and nitrogen compound metabolic processes. Alkaloids are natural organic cyclic compounds that contain at least one nitrogen atom [33]. In the molecular function category, the top two groups were organic cyclic compound binding and heterocyclic compound binding, which is reasonable as all alkaloids are heterocyclic and organic cyclic compounds. The reason that oxidoreductase activity and transferase activity groups were also found in high numbers could be related to the fact that the biosynthetic pathway for the BIA molecules was mediated by several methyltransferases and CYPs [21,34]. The GO analysis of the genes expressed among different tissues supports the finding of a higher berberine content in C. robustum leaf than in the stem, fruit, or root. In the comparison between the fruit and leaves, the up-regulated genes in the leaves included transferase and oxidoreductase activities. In the comparison between the leaves and stem, the upregulated genes in the leaves were directly related to functioning in the BIA metabolic process, BIA biosynthetic process, oxidoreductase activity, and monooxydase activity, although the numbers were less than 10 in each subcategory (Figure 4).
BIA biosynthesis starts with the condensation of dopamine and 4-hydroxyphenylacetaldehyde, both derived from L-tyrosine, to become (S)-norcoclaurine, which then undergoes enzyme-mediated stepwise reactions including oxidation, reduction, methylation, acetylation, and decarboxylation, resulting in structural changes in the molecules [35]. The biosynthetic steps and enzymes have been well elucidated in the Coptis species [26], Berberis species [21,36], and other plants [34]. We identified all biosynthetic enzymes for berberine synthesis in the transcriptomes of C. robustum. The number of copies of genes encoding each enzyme was variable, with the highest copy of 10 in NSC. We also isolated berberine synthesis genes from the transcriptomes in the NCBI database in other berberine-producing plants, which revealed that the copy numbers of each enzyme-encoding gene were highly variable in each species. The copy numbers of the berberine synthesis genes were also variable among the three closely related species in the genus Berberis [36]. In contrast to our results, half of the genes identified in the BIA synthesis were identical copies between Coptis teeta and C. chinensis [26].
NCS is a key enzyme for the biosynthesis of all BIA alkaloids as it catalyzes the condensation of dopamine and 4-hydroxyphenylacetaldehyde (4-HPAA) to form (S)-norcoclaurine, which is the entry molecule for BIA synthesis [34,36,37]. C. robustum has ten copies of NCS, which is the highest number among the BIA synthesis enzymes in C. robustum. In phylogenetic analysis, the group II NCSs might be ancestral to the group I NCSs because group II consists of families from Berberidaceae and Nelumbonaceae. The Nelumbonaceae belongs to the order of Proteales, whereas Ranunculaceae and Berberidaceae are in the order of Ranunculales. In addition, the NCSs in the deep branch clade were strictly with a single species in group I, but not in group II. NCSs from Thalictrum flavum, Papaver somniferum, and Coptis japonica shared significant sequence similarity, suggesting that NCS might be a member of the pathogenesis-related 10/Bet V1 protein family [37]. Thus, functional differentiation may be ensued by limited sequence modification. In a phylogenetic analysis of those NCSs with the known pathogenesis-related 10/Bet V1 protein, the (S)-norcoclaurine synthesis enzymes showed clustering together, so the authors suggested a monophyletic origin for NCSs. Our results showing the co-existence of NCSs from different plant families also support the monophyletic origin theory.
The berberine bridge enzyme (BBE) has a synonym (S)-reticuline oxidase as it catalyzes the conversion of (S)-reticuline to (S)-scoulerine by an oxidative ring closure conversion of (S)-reticuline by C-C bond, referred to as the berberine bridge [38]. As (S)-scoulerine is a common intermediate molecule for the biosynthesis of sanguinarine, berberine, and noscapine [34], the conversion of (S)-reticuline to (S)-scoulerine is the rate-limiting step for BIA production in many BIA-producing plants [28]. There were four copies of BBE in the C. robustum transcriptomes. In the phylogenetic study, thirty-two BBE enzymes from eleven species in three families were split into two groups, which were further divided into four subgroups, in which the BBEs from different plant families co-existed. BBEs from a single genus grouped together at the deepest subclade in most cases, but odd clustering at the deepest cluster was observed, such as C. robustum G48747T and B. stolonifera AAD17487 in subgroup II-I, and B. lycium AWH63007.1 and P. somniferum in subgroup II-II with robust bootstraps. BBE-like enzymes are a multigene family of FAD-linked oxidase (pfam08031) [39] present in plants, fungi, and bacteria [38]. In plants, the BBElike proteins were identified in wide range of plant families as mediating the production of alkaloids [38]. BBEs from different plant families co-existed in the subgroups in our phylogenetic analysis. Thus, the BBEs in the current study represent the monophyletic enzymes in the BIA synthesis pathway among the diverse BBE-like proteins.
A diverse group of methyltransferases mediate the production of naturally produced small molecules by adding methyl groups to S, N, O, or C atoms [40]. Various Oand N-methyltransferases were identified in the production of alkaloids in plants [41]. For berberine synthesis, one N-methyltransferase (CNMT) and three O-methyltransferases (4OMT, 6OMT, SOMT) are known to be involved in berberine synthesis in the Berberis species [21]. In the current study, we identified one copy of CNMT, four copies of 4OMT, six copies of 6OMT, and three copies of SOMT in the transcriptomes of C. robustum. In the phylogenetic analysis, the cluster of each methyltransferase and CNMT and the coexistence of methyltransferases from different species in the deep branch subclade indicated that each methyltransferase was monophyletic, evolving independently from each other after diverging from ancestral methyltransferase. Liu et al. demonstrated the whole gene duplication and diversification of protoberberine synthesis genes in C. chinensis from the whole sequence data [40], which can account for the divergence of the 6OMT group I and 6OMT group II in our result. The two subgroups of 6OMTs may have derived from the duplication of the 6OMT encoding gene. The duplication might have occurred in a very early stage of plant evolution as a result of the coexistence of N. nucifera (Proteales, Nelumbonaceae) and C. chinensis (Ranunculales, Ranunculaceae), along with C. robustum and B. koreana (Ranunculales, Berberidaceae) in the 6OMT group II clade. The 6OMT group I also constituted enzymes from different plant families from different orders. Ibrahim et al. identified five conserved sequence motifs in O-methyltransferases [42]. We identified these conserved motifs in the enzymes of both 6OMT groups, but each group showed slightly different sequences in each motif, alluding to the fact that they may have undergone some sequence diversification, but conserved the sequences' functional motifs, such as motif I for S-adenosyl L-methionine and motif IV for metal binding.
Cytochrome P450 (CYPs) are a monooxygenase superfamily containing heme as a cofactor [43]. CYP genes are estimated to comprise approximately 1% of the plant genome [44,45] and plant CYPs are well archived in the Plant CYP website (http://drnelson. uthsc.edu/biblioD.html, accessed on 12 December 2022). CYPs are also known to be the key drivers of alkaloid chemical diversification in plants [35,45]. We identified 184 CYP genes in the transcriptomes of C. robustum. Nelson and Schuler [46] presented CYP gene copies in 15 sequenced/representative plant genomes, including eight flowering plant species, such as lotus, Arabidopsis, papaya, grape, poplar, soybean, rice, and Brachypodium. Among those eight flowering plant species, only lotus (172 copies) and papaya (142 copies) had a lower number of CYP gene copies than our C. robustum result. The number of CYP copies in C. robustum was lower than that in the Berberis species, which belongs to Berberidaceae as Caulophyllum. The CYP copy numbers were 257 in Berberis koreana, 255 in B. thunbergii, and 253 in B. amurensis [36]. Nelson [47] grouped 16,000 plant CYPs into 277 families on the basis of 40% or higher sequence sharing so that we also adopted his suggestion on the identification of CYP families. In C. robustum, we identified 56 CYP families and the Berberis species had 31 families [36], in which 18 families shared between the Caulophyllum and Berberis species. The 31 CYP families in the C. robustum were further classified into 56 subfamilies, in which, only 16 subfamilies were shared with the Berberis species, implying that CYP differentiation was evident in subfamilies, and occurred even within Berberidaceae.
Among the plant CYP families, CYP80, CYP82, and CYP719 were known to be involved in berberine synthesis [34,48]. CYP80 and CYP719 were present, but CYP82 was not identified in the transcriptomes of C. robustum or the three Berberis species [36]. CYP80B3 mediates the conversion of (S)-N-methylcoclaurine to (S)-3 -hydroxy-N-methylcoclaurine, which is an intermediate BIA molecule in the synthesis of diverse morphine, sanguinarine, noscapine, and berberine [34]. The subfamily enzymes of CYP719 are involved diverse BIAs including berberine. However, CYP82 is not in the pathway for berberine synthesis. Thus, the Caulophyllum and Berberis species may not carry the CYP82 gene. Berberine is a major BIA molecule in the genera Berberis [16,36] and Caulophyllum [currnt study]. Roy et al. demonstrated that blocking the pathway leading to other BIA molecules results in the effective biosynthesis of berberine in B. koreana [21]. We identified one copy each of CYP80B3 and CYP719A21 in the transcriptomes of C. robustum, whereas B. koreana had five copies of CYP80 and two copies of CYP719. The berberine content in C. robustum leaf (99.95 ± 0.5 µg/g) is about five times higher than that in B. koreana leaf (19.2 ± 0.5 µg/g). Thus, the single copies of CYP80 and CYP719 may be sufficient to synthesize berberine in C. robustum, but this may require further study.
In the phylogenetic analysis, the 19 CYP80s consisted of CYP80B1, CYP80B2, and CYP80B3 subfamilies. The 21 CYP719s consisted of CYP719A1, CYP719A2, CYP719A6, CYP719A7, and CYP719A21 subfamilies. The CYP80 and CYP719 families were distinguished by their sequences, as shown by the formation into two separate clades. However, the subfamily assignment within the family was not unequivocal as sequences are highly similar among the members within a family. Many of these CYP enzymes were not annotated into subfamilies in the NCBI database; therefore, we carried out a BLAST analysis individually with the unidentified subfamily CYP enzymes. We then assigned the subfamily sequence sharing of 40% or higher in the BLAST analysis using the suggestion of Nelson [47]. However, we were unable to assign the P. nudicaule CYP719 into a subfamily, as it did not have a CYP719 subfamily sequence matching higher than 40%. CYP80 and CYP719 were specifically involved in the biosynthesis of berberine, and those species used in the phylogenetic analysis were all berberine-producing medicinal plants, except C. micranthum, which has been used as a medicinal plant and contains various phytochemicals, including alkaloids [49]. However, its alkaloids were not further characterized. Our result showed that most of the CYP80 and CYP719 subfamilies were differentiated within plant families, except the CYP members in the II-I and III-I groups. Moreover, the CYPs from the same species were placed in the deepest branch clade, implying that CYP gene duplication occurred within species to increase the copy numbers. Gene duplication is important for gene family differentiation, and many genes involved in the synthesis of secondary metabolites are present in gene families [21].

Summary
Caulophyllum species have traditionally been used in herbal medicine by native Americans. The roots and rhizomes of C. robustum, Asian blue cohosh, have also been widely used in folk medicine in China. Various phytochemicals have been reported as being present in this plant, but berberine has not been identified until current study. We found high berberine content in C. robustum leaves; therefore, we carried out a transcriptome analysis to characterize the berberine biosynthesis genes in C. robustum. We obtained 41,094 transcripts that were annotated and characterized by GO analysis. Vis-à-vis comparison of the number of genes expressed in leaf, stem, and fruit was conducted with GO annotation, in which, more genes were expressed in leaf compared to fruit and stem. We isolated all the genes involved in the berberine synthesis pathway in the transcriptomes of C. robustum, and the compared gene copies in six other berberine-producing plants. Phylogenetic analysis was then carried out with NCS, BBE, methyltransferases, CYP80 and CYP719. In the analyses of NCS and BBE enzymes, different species were mixed in subclades of their phylogenetic tree to support the monophyletic origins of both enzymes. There were four methyltransferases, 6OMT, CNMT, 4OMT, and SOMT, involved in berberine synthesis. In the phylogenetic analysis, the CNMTs and SOMTs were clustered in their own subclade, and the 4OMTs and 6OMTs split into two groups, which means that the duplication of 4OMTs and 6OMTs occurred early in the evolution of the plant. We identified 184 CYPs, in which, 132 CYPs were classified into 31 families, with the rest not classified. CYP80 and CYP719 are involved in berberine synthesis, and we identified one copy each of CYP80 and CYP719. The phylogenetic analysis revealed that the differentiation into subfamilies in both CYP80 and CYP719 occurred mostly at the plant family level. The phylogenetic tree also revealed that the copy number increment occurred at the species level.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/plants12071483/s1. Figure S1. Sequence comparison of the group 1 and group II of 6OMTs of C. robustum and B. koreana. The conserved motif sequences are highlighted with red. The motifs I and IV are S-adenosyl-L-methionine binding (conserved region I) and metal binding (conserved region IV) sites, respectively. Table S1. Number of reads and read length before and after processed. Table S2. Mapping Information of the clean reads. Table S3. De novo assembly results. Table S4. Number of expressed genes in different tissues. Table S5. Protein sequences of the berberine synthesis genes in berberine producing plants.