Comparative Transcriptomics for Genes Related to Berberine and Berbamine Biosynthesis in Berberidaceae

Berberine and berbamine are bioactive compounds of benzylisoquinoline alkaloids (BIAs) present in Berberis species. The contents of berbamine are 20 times higher than berberine in leaf tissues in three closely related species: Berberis koreana, B. thunbergii and B. amurensis. This is the first report on the quantification of berberine compared to the berbamine in the Berberis species. Comparative transcriptome analyses were carried out with mRNAs from the leaf tissues of the three-species. The comparison of the transcriptomes of B. thunbergii and B. amurensis to those of B. koreana, B. thunbergii showed a consistently higher number of differentially expressed genes than B. amurensis in KEGG and DEG analyses. All genes encoding enzymes involved in berberine synthesis were identified and their expressions were variable among the three species. There was a single copy of CYP80A/berbamunine synthase in B. koreana. Methyltransferases and cytochrome P450 mono-oxidases (CYPs) are key enzymes for BIA biosynthesis. The current report contains the copy numbers and other genomic characteristics of the methyltransferases and CYPs in Berberis species. Thus, the contents of the current research are valuable for molecular characterization for the medicinal utilization of the Berberis species.


Introduction
Berberidaceae is an herbaceous plant family that consists of 700 species of 18 genera [1,2]. The family includes pharmacologically important genera (i.e., Berberis, Caulophyllum, Mahonia) producing bioactive compounds such as alkaloids, terpenoids, flavonoids, sterols, anthocyanins, and carotenoids [3,4]. Berberidaceae are commonly known as the barberry family because the majority of the species are in the genus Berberis, in which about 500 species are included [5]. Berberis species are distributed worldwide in temperate and subtropical regions of the northern hemisphere [6]. Plants in the Berberis species have been utilized for folk medicine for a long time. Extracts from many parts of the organs have been used for treatments in wide range of cardiovascular, metabolic, hepatic, and renal disorders [7]. The bioactive compounds in Berberis species are benzylisoquinoline alkaloids (BIAs). BIAs are restricted to certain plant families such as Magnoliaceae, Ranunculaceae, Papaveraceae, and Berberidaceae in Ranunculales [8] and Fabaceae in Fabales [9].

RNA Extraction, Illumina Sequencing, Data Processing and Assembly
All samples were collected in August 2021 when plants grow most vigorously in Chuncheon, South Korea. Leaf tissues were collected from three Berberis species (B. koreana, B. amurensis, and B. thunbergii) and immediately frozen in liquid nitrogen. Three biological replicates for each leaf species were prepared. RNA extraction was performed with Hybrid-RTM kit (Gene All Biotechnology Co., Seoul, Korea) with the manufacturer's protocol. RNA quality and quantity were checked using the A2100 Bio system (Agilent Technologies Inc., Santa Clara, CA, USA). Then, only RNA with a concentration of 1-10 µg and an RNA integrity number (RIN) above 8.0 was used for library construction for sequencing.
A cDNA library was constructed using the TruSeq RNA Sample Prep Kit v2 (Illumina). Paired-end sequencing was performed with Illumina HiSeq 2500 at the National Instrumentation Centre for Environmental Management (NICEM) at Seoul National University, South Korea. Low-quality reads with Phred score < 20, short reads (<50 bp), empty nucleotides (N at the end of reads), and adapter sequences were trimmed using Trimmomatic software [30]. Quality control before and after trimming was performed using FastQC [31] (https://www.bioinformatics.babra ham.ac.uk/projects/fastqc/; access date, 11 May 2022).

De Novo Assembly
Obtained sequences were assembled using the de novo RNA-Seq Assembly Pipeline (DRAP) protocol [32]. Only high quality clean reads were used in the assembly using Oases program. The first step for assembly using Oases (v 0.2.09) [33] was implemented with K mers sizes of 25, 31, 37, 43, and 49, and the assembled fragments were merged. Then, the chimeric contigs were deleted and we removed duplicated sequences, insertions, and deletion and polyA/T tails. The second compaction employed with the protocol of CD-HIT-EST v4.8.1 to order the contigs by length and remove all the included ones given identity and coverage thresholds [34]. The coding region was then finally confirmed using TransDecoder (v3.0.1) [35]. Short reads were mapped to the assembly using Burrows-Wheeler Aligner (BWA) [36]. Fragments Per Kilobase per Million (FPKM) was calculated to select fragments of higher than 1 FPKM value to be used as reference sequence. HISAT2 v2.1.0 program [37] was used for mapping the Berberis reads with the assembled de novo reference sequences.

DEG
The transcriptome data sets of B. thurnbergii and B. amurensis were mapped against the transcriptomes of B. koreana for differentially expressed gene (DEG) analysis. Transcript read counts were calculated with StringTie v1.3.4d [38]. DEG analysis was performed using DESeq v1.30.0 [39]. In DESeq, read count is normalized through size factor and scatter plot, and significant DEG results are organized through Log2Fold Change value. Then, p value was calculated for post normalization in which we set p-value < 0.05 and log2FC abs1 for significance.

Functional Annotations and Phylogenetic Analysis
For functional annotation of genes obtained through differential expression analysis, a homology search was performed using a known protein database by Basic Local Alignment Search Tool (BLAST) analysis [40,41]. The cut-off parameters were E-value 10 −4 and >70% similarity with the protein sequences in UniProt, TAIR, and NCBI databases. Finally, all candidate transcripts with Pfam domains and BLASTP hits were retained and imported to Blast 2GO suite 5.2 (BioBam Bioinformatics SL, Valencia, Spain). Then, gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed. The retrieved GO terms were classified into three main categories: biological process (BP), cellular component (CC), and molecular function (MF). Unique enzyme commission (EC) numbers were assigned to transcripts and used to map the KEGG biochemical pathways. The specific composition of the GO terms was calculated and presented in a bar chart according to the percentage.
Phylogenetic analysis was performed using the protein sequences of genes involved in berberine biosynthesis. The protein sequences were aligned with the same homologs of closely related families, and a maximum-likelihood phylogenetic tree was built using MEGA X (v.10.2.4) [42]. The protein sequences were aligned by ClustalW software (v2.1) and the phylogenetic tree was constructed by neighbor-joining method using MEGA ver. 5.0 software with 1000 bootstraps.

DEG
The transcriptome data sets of B. thurnbergii and B. amurensis were mapped against the transcriptomes of B. koreana for differentially expressed gene (DEG) analysis. Transcript read counts were calculated with StringTie v1.3.4d [38]. DEG analysis was performed using DESeq v1.30.0 [39]. In DESeq, read count is normalized through size factor and scatter plot, and significant DEG results are organized through Log2Fold Change value. Then, p value was calculated for post normalization in which we set p-value < 0.05 and log2FC abs1 for significance.

Functional Annotations and Phylogenetic Analysis
For functional annotation of genes obtained through differential expression analysis, a homology search was performed using a known protein database by Basic Local Alignment Search Tool (BLAST) analysis [40,41]. The cut-off parameters were E-value 10 -4 and >70% similarity with the protein sequences in UniProt, TAIR, and NCBI databases. Finally, all candidate transcripts with Pfam domains and BLASTP hits were retained and imported to Blast 2GO suite 5.2 (BioBam Bioinformatics SL, Valencia, Spain). Then, gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed. The retrieved GO terms were classified into three main categories: biological process (BP), cellular component (CC), and molecular function (MF). Unique enzyme commission (EC) numbers were assigned to transcripts and used to map the KEGG biochemical pathways. The specific composition of the GO terms was calculated and presented in a bar chart according to the percentage.
Phylogenetic analysis was performed using the protein sequences of genes involved in berberine biosynthesis. The protein sequences were aligned with the same homologs of closely related families, and a maximum-likelihood phylogenetic tree was built using MEGA X (v.10.2.4) [42]. The protein sequences were aligned by ClustalW software (v2.1) and the phylogenetic tree was constructed by neighbor-joining method using MEGA ver. 5.0 software with 1000 bootstraps.

mRNA Sequence Analysis and De Novo Assembly
Cellular mRNAs from leaf tissues were sequenced using Illumina paired-end sequencing technology. RNA-Seq libraries were made in triplicate in each sample. Raw reads were generated in each species in the range from 35 million to 80 million reads ( Table 1). The average error rate of the libraries was as low as average Q20 > 96%. About 92% of the reads were retained in each species after removal of adaptor sequences, short reads (<50 nt), and low quality sequences (Table 1). Then, clean reads from all samples were pooled to perform de novo assembly using DRAP [32].

Functional Annotations and Classifications
For functional annotation of the transcripts, we queried the sequences of transcripts against public databases using BLASTX. As a result, the numbers of transcripts of Berberis species matched with the annotated transcripts in the NCBI database were 37,176 in Nr, 17,252 in GO, 4729 in KEGG, 24,364 in Pfam, and 23,466 in Interpro (Supplementary  Table S1). GO terms using the Blast2Go program (BioBam Bioinformatics SL, Valencia, Spain) describe the annotated genes, gene products and sequences in controlled vocabularies into three subcategories; biological process (BP), cellular components (CC) and molecular function (MF). In B. koreana 20,806 were assigned to BP, 20,779 to CC and 20,821 to MF. The classification patterns of the other two Berberis species are similar to that of the B. koreana (Supplementary Table S1).
KEGG pathway analysis was carried out to seek the biological interpretations of the transcripts. Of the 41,464 unigenes, 4730 were able to be placed in 337 unique KEGG pathways in the three Berberis species (Supplementary Table S2). The number of annotated unigenes was 3804 in B. koreana, 3823 in B. thunbergii, and 3785 in B. amurensis. The unigenes were categorized into two major KEGG categories, Metabolism and Genetic Information Processing (Supplementary Table S2).

Differentially Expressed Genes and Classification
Differentially expressed genes (DEGs) among the Berberis species were identified by read counts using String Tie [38] and the results were normalized through size factor and scatter plot using DESeq [39] (Supplementary Figure S2). Because transcripts of B. koreana were reported previously [30],

Functional Annotations and Classifications
For functional annotation of the transcripts, we queried the sequences of transcripts against public databases using BLASTX. As a result, the numbers of transcripts of Berberis species matched with the annotated transcripts in the NCBI database were 37,176 in Nr, 17,252 in GO, 4729 in KEGG, 24,364 in Pfam, and 23,466 in Interpro (Supplementary Table  S1). GO terms using the Blast2Go program (BioBam Bioinformatics SL, Valencia, Spain) describe the annotated genes, gene products and sequences in controlled vocabularies into three subcategories; biological process (BP), cellular components (CC) and molecular function (MF). In B. koreana 20,806 were assigned to BP, 20,779 to CC and 20,821 to MF. The classification patterns of the other two Berberis species are similar to that of the B. koreana (Supplementary Table S1).
KEGG pathway analysis was carried out to seek the biological interpretations of the transcripts. Of the 41,464 unigenes, 4730 were able to be placed in 337 unique KEGG pathways in the three Berberis species (Supplementary Table S2). The number of annotated unigenes was 3804 in B. koreana, 3823 in B. thunbergii, and 3785 in B. amurensis. The unigenes were categorized into two major KEGG categories, Metabolism and Genetic Information Processing (Supplementary Table S2).

Differentially Expressed Genes and Classification
Differentially expressed genes (DEGs) among the Berberis species were identified by read counts using String Tie [38] and the results were normalized through size factor and scatter plot using DESeq [39]   In the KEGG annotations among the DEGs, a higher number of unigenes differentially expressed in B. thunbergii was observed compared to B. amurensis (Figure 3). In addition, the order of unigenes abundance in each pathway was different between the two species.  In the KEGG annotations among the DEGs, a higher number of unigenes differentially expressed in B. thunbergii was observed compared to B. amurensis (Figure 3). In addition, the order of unigenes abundance in each pathway was different between the two species. For instance, the "pentose and glucoronate interconversions" pathway was enriched with the highest number of differentially expressed unigenes B. thunbergii (38 unigenes), but in B. amurensis the number was low (17 unigenes) (Figure 3). The exact numbers of upregulated and downregulated unigenes in both species are shown in Supplementary Figures S5 and S6.

Genes Encoding Enzymes for BIAs
Berberine biosynthesis begins with combining dopamine and 4-hydrophenylacetaldehyde, which are converted to (S)-norcoclaurine by (S)-norcoclaurine synthase (NCS) [30]. Then, (S)-norcoclaurine is converted to various BIA molecules by consequential steps that are mediated by several methyltransferases and oxidases. Table 5 shows the expression values (normalized read counts) of the unigenes of enzymes involved in berberine synthesis. The number of paralog copies of B. koreana are shown in brackets in the column of B. koreana from the results of Roy et al. [30], but these values are not shown in the B. thunbergii and B. amurensis because long-read cDNA transcriptomes are not available for them. Of the three species, B. thunbergii had highest berberine contents in leaf, followed by B. amurensis and B. koreana ( Figure 1). However, the expressions of the berberine synthesis genes were not related to the berberine contents. For instance, expression values

Genes Encoding Enzymes for BIAs
Berberine biosynthesis begins with combining dopamine and 4-hydrophenylacetaldehyde, which are converted to (S)-norcoclaurine by (S)-norcoclaurine synthase (NCS) [30]. Then, (S)-norcoclaurine is converted to various BIA molecules by consequential steps that are mediated by several methyltransferases and oxidases. Table 5 shows the expression values (normalized read counts) of the unigenes of enzymes involved in berberine synthesis. The number of paralog copies of B. koreana are shown in brackets in the column of B. koreana from the results of Roy et al. [30], but these values are not shown in the B. thunbergii and B. amurensis because long-read cDNA transcriptomes are not available for them. Of the three species, B. thunbergii had highest berberine contents in leaf, followed by B. amurensis and B. koreana ( Figure 1). However, the expressions of the berberine synthesis genes were not related to the berberine contents. For instance, expression values NMCH/CYP80B3 was in the order of B. koreana, B. amurensis, and B. koreana. There were eight steps from (S)-norcoclaurine to berberine in the berberine biosynthetic pathway [11]. Of the eight steps, four steps were mediated by methyltransferases, and four steps by oxidases. In the current study, unigenes encoding for those eight BIA biosynthesizing enzymes were all identified and expressions of these genes were differed among the three Berberis species ( Figure 4A, Table 5, Supplementary Table S3). For instance, (S)-N-methylcoclaurine 3 -hydroxylase/N-methylcoclaurine 3 -monooxygenase (NMCH/CYP80B3), and 3 -hydroxy-N-methylcoclaurine 4 -O-methyltransferase (4-OMT) were found to be expressed highest in B. amurensis, while TyrAT (Tyrosine aminotransferase) and (S)-scoulerine 9-O-methyltransferase (SOMT) were expressed highest in B. amurensis. BBE (Berberine bridge enzyme/reticuline oxidase), canadine synthase enzyme (CYP719A21/CAS) and Tyrosine decarboxylase (TYDC) were expressed highest in B. koreana (Table 5).

CYPs and Methyltransferases among Three Berberis Species
Cytochrome P450 mono-oxidases (CYPs) and methyltransferases are key enzymes in BIA synthesis. We identified 95 CYP gene families among the three species in our transcriptome data.
The number of CYP copies was 257 in B. koreana, 255 in B. thunbergii, and 253 in B. amurensis (Supplementary Tables S4 and S5). In the berberine biosynthesis pathway, CYP80B3 and CYP719A were involved, and we identified 11 copies in CYP80B3 and 6 copies in CYP719A of these CYPs in our transcriptome data (Supplementary Table S4). The CYP80A1 is responsible for the synthesis berbamunine to lead berbamine synthesis. There was a single copy of the CYP80A1 in B. koreana. From the DEG analysis, we observed that the unigenes expressing CYP719A were upregulated in B. koreana and downregulated in B. thunbergii, and those annotated as CYP80B3 were mostly downregulated in B. koreana but upregulated in B. thunbergii ( Figure 4B).
Our study annotated 298 unigenes as methyltransferases in 23 families (Supplemen- The berbamine biosynthetic pathway has not been fully elucidated yet. It was suggested that berbamunine was a precursor for berbamine [43]. Berbamunine is a molecule derived from (S)-N-methylcoclaurine by CYP80A1/berbamunine synthase [11]. We found a single copy of the CYP80A1 in the full length cDNA transcripts in B. Koreana [30].

CYPs and Methyltransferases among Three Berberis Species
Cytochrome P450 mono-oxidases (CYPs) and methyltransferases are key enzymes in BIA synthesis. We identified 95 CYP gene families among the three species in our transcriptome data.  (Supplementary Tables S4 and S5). In the berberine biosynthesis pathway, CYP80B3 and CYP719A were involved, and we identified 11 copies in CYP80B3 and 6 copies in CYP719A of these CYPs in our transcriptome data (Supplementary Table S4). The CYP80A1 is responsible for the synthesis berbamunine to lead berbamine synthesis. There was a single copy of the CYP80A1 in B. koreana. From the DEG analysis, we observed that the unigenes expressing CYP719A were upregulated in B. koreana and downregulated in B. thunbergii, and those annotated as CYP80B3 were mostly downregulated in B. koreana but upregulated in B. thunbergii ( Figure 4B).

Phylogenetic Analysis of the Methyltransferases and CYP Oxidases
Protein sequences from the longest ORF of the four methyltransferases (6OMT, 4OMT, SOMT, and CNMT) and three CYPs (CYP719A, CYP80B3, and CYPA1) were employed for phylogenetic analysis ( Figure 5). The methyltransferase tree was divided into two broad clusters, one with 6OMT and 4OMT and the other with SOMT and CNMT. Out of all Berberis methyltransferases, only SOMT was grouped together with orthologs of other species, appearing very close to Thalictrum flavum subsp glaucum and Coptis chinensis and C. japonica ( Figure 5A). Other Berberis methyltransferase (6OMT, 4OMT, and CNMT) were tied with CNMTs in other species ( Figure 5A). and C. japonica ( Figure 5A). Other Berberis methyltransferase (6OMT, 4OMT, and CNMT) were tied with CNMTs in other species ( Figure 5A). The CYP phylogenetic tree revealed two clusters, one for CYP80 and another for CYP719A ( Figure 5B). In the CYP80 cluster, the Berberis CYP80A was out-grouped to other CYP80B subfamilies in which the two CYP80B from Berberis species formed a deep clade with Thalictrum thalictroides. In the CYP719 subfamilies, the Berberis CYP719 was outgrouped to other CYP719s from the genera Thalictrum, Coptis, and Paperver.

Discussion
The species in the genus Berberis have been utilized for folk medicine for a long time worldwide. We carried out comparative transcriptome analyses on the three East Asian endemic Berberis species, B. koreana, B. amurensis, and B. thunbergii, to elucidate the genes involved in benzylisoquinoline alkaloids (BIAs) and bis-benzylisoquinoline alkaloid (bisBIA). Berberine and berbamine are major bioactive compounds in the Berberis species [3,12]. We quantified contents of the berberine and berbamine from leaf tissues of the three species. Berbamine contents were 20-fold higher than berberine in the three Berberis species. While the biosynthetic pathway for berberine was well characterized [16,30], the berbamine biosynthesis pathway has not been elucidated [43]. Total extracts from Berberis species have been used for a long time in folk medicine around the world and berberine The CYP phylogenetic tree revealed two clusters, one for CYP80 and another for CYP719A ( Figure 5B). In the CYP80 cluster, the Berberis CYP80A was out-grouped to other CYP80B subfamilies in which the two CYP80B from Berberis species formed a deep clade with Thalictrum thalictroides. In the CYP719 subfamilies, the Berberis CYP719 was out-grouped to other CYP719s from the genera Thalictrum, Coptis, and Paperver.

Discussion
The species in the genus Berberis have been utilized for folk medicine for a long time worldwide. We carried out comparative transcriptome analyses on the three East Asian endemic Berberis species, B. koreana, B. amurensis, and B. thunbergii, to elucidate the genes involved in benzylisoquinoline alkaloids (BIAs) and bis-benzylisoquinoline alkaloid (bisBIA). Berberine and berbamine are major bioactive compounds in the Berberis species [3,12]. We quantified contents of the berberine and berbamine from leaf tissues of the three species. Berbamine contents were 20-fold higher than berberine in the three Berberis species. While the biosynthetic pathway for berberine was well characterized [16,30], the berbamine biosynthesis pathway has not been elucidated [43]. Total extracts from Berberis species have been used for a long time in folk medicine around the world and berberine was the main alkaloid that has been utilized in the pharmaceutical industry. The results from current study showed higher contents of berbamine than berberine. Thus, functional studies on the pharmacological effects of the berbamine in Berberis species are required.
GO annotation was made in 17,252 unigenes, which is approximately 41% of the total 41,764 unigenes, whereas 4729 unigenes were annotated in the KEGG analysis, which was 11.32% of the total number. Our previous study in B. koreana annotated a similar number (16,756) of unigenes into GO classification, but a significantly higher number (12,362) of unigenes in KEGG categories [30]. The major difference can be attributed to the large number of novel unannotated unigenes found in the two related species B. thunbergii and B. amurensis. We identified approximately the same number of unigenes among the three species, but B. thunbergii showed a higher number of differentially expressed unigenes than B. amurensis in comparison with B. koreana in DEG analysis (Figures 2 and 3). However, it would be too rash to relate the DEG results with the contents of berberine and berbamine because the expression of unigenes involved BIA synthesis was variable among the three species (Table 5). Prior to this study, we identified all genes encoding the enzymes in berberine biosynthesis in B. koreana and noted that branching steps to lead other alkaloids were effectively blocked by lacking the appropriate genes [30]. We demonstrated expression differences of the genes in berberine synthesis among the three Berberis species. Evolutionary pressures shape the gene expression by the relative importance of evolutionary changes in regulatory genetic and epigenetic mechanisms [44]. Thus, each Berberis species might have undergone different histories in adapting to local habitats.
CYP450s are involved in wide array of plant metabolisms leading to the synthesis of fatty acids, lignin, plant hormones, and secondary metabolites [45]. Of the many CYP families, two main families were known to be involved in the BIA alkaloids: CYP80 and CYP719 [11]. We identified 95 CYP gene families, including CYP80 and CYP719 in the transcriptomes of the three species. Three CYP80 subfamilies (A, B, and G) were known in the BIA metabolism. CYP80B [(S)-N-methylcoclaurine 3 -hydroxylase. N-methylcoclaurine 4 -O-methyltransferase, NMCH] catalyzes the conversion of (S)-N-methylcoclaurine to (S)-3 hydroxyl-N-methylcoclaurine leading to the berberine synthesis [42]. In our results, CYP80B3 showed highest level of expression counted by normalized read counts among the unigenes encoding berberine synthesis enzymes (Table 5). B. amurensis showed the highest level of expression, which was followed by B. thunbergii and B. koreana, which is an interesting result because the content of berberine in B. thunbergii was the highest among the three species. The CYP80G subfamily mediates the reaction of conversion of N-methylcoclaurine to lirinidine of aporphine in lotus [46]. The CYP80A (berbamunine synthase) subfamily mediates the reaction to convert the (S)-N-methylcoclaurine to berbamunine [43,47]. Berbamunine was the first dimer in the biosynthesis of bisBIAs caused by feeding the analysis of 14 C-labeled tyrosine, tyramine, and several chiral 1-benzyl-1,2,3,4-tetrahydroisoquinolines in the cell culture of Berberis stolonifera [23]. Kraus and Kutchan (1995) [48] demonstrated this by heterologous expression of a cDNA encoding berbamunine synthase (CYP80) from a cell suspension culture of B. stolonifera, but they did not classify the CYP80 into subfamilies. In our results, the content of berbamine is about 20 times higher than berberine in the three Berberis species, but unigenes for en-coding CYP80A/berbamunine synthase were not found in transcriptome data. Instead, we identified one copy of CYP80A/berbamunine synthase in the full-length cDNA data set in B. koreana which was generated by PacBio sequencing protocol [30]. In Berberis species, berbamine is a sole bisBIA alkaloids reported, whereas several BIA alkaloids (i.e., including berberine, palmatine, jatrorrhizine, tetrahydropalmatin, and berbamunine) were known [49], thus the expression level of the other enzyme genes in the berberine biosynthesis pathway might be higher than CYP80A/berbamunine synthase.
There are two types of bisBIAs, noncyclic bisBIAs and cyclic bisBIAs. The berbamunine is a noncyclic BIA, whereas the berbamine is a cyclic BIA. The berbamunine may be an intermediate molecule leading to berbamine by forming an aryl bond between C8-O-C7 and C3 -O-C3 in berbamunine. However, the enzyme in charge of the conversion berbamunine to berbamine is not known yet [43]. Although it is a prerequisite to quantify berbamunine in B. koreana, B. thunbergii, and B. amurensis, we could not identify the berbamunine in our samples in HPLC analysis because the standard berbamunine molecule was not available in our analysis.
CYP719 subfamily members catalyze the formation of methylenedioxy bridge in roemerine (CYP719A1) lotus [9] and (S)-canadine (Canadine synthase, CYP719A21) in Berberis [30] and Coptis [21]. The expression levels CYP719A1 were much lower than expression levels of CYP80B3 in the three Berberis species in our results. We identified three copies of CYP719A1 in our previous study [30] and three more copies were identified in B. koreana in the current study. Thus, six copies of CYP719A1 were also present in B. thunbergii and B. amurensis (Supplementary Table S4). The number of copies of CYP719 in Berberis seemed to be higher than other BIA-producing plants, such as a single copy in lotus [9,50] and two copies in C. deltoidea [21]. The phylogenetic analysis divided the CYPs into two clusters, where Berberis CYP719A formed a sub cluster with A. mexicana and P. somniferum. Berberis CYP80 was formed into two sub-clusters where all other species formed one group, and Berberis CYP80 singly branched out. The N. nucifera appeared as an outgroup and a primordial sequence for both CYPs (CYP80B3 and CYP719A), which is in congruence with other studies [9,30].
We identified 24 methyltransferase families in which the norcoclaurine 6-Omethyltransferase (6OMT) had the highest number of copies with 13~15 copies, followed by S-coclaurine-N-methyltransferase (CNMT) with 9~10 copies in each Berberis species. The copies of these two transcripts in Berberis species were higher than the copies in other BIA producing species such as C. deltoidea with five copies of 6OMT and three copies of CNMT [21] and lotus with three copies of 6OMT and two copies of CNMT [9]. Methyltransferases catalyze biochemical reactions for biosynthesis and modifications of bioactive molecules in plants. There are several types of methyltransferases, depending on the substrate atom that accepts the methyl group; O, N, C, or S [8]. (S)-norcoclaurine, the most immediate BIA molecule, transformed to other BIA molecules by sequential methylation and oxygenation [16]. Methylation occurs at four sites at C6, C7, C4 , and N2 in benzylisoquinoline. In the berberine biosynthesis pathway, three O-methyltransferases and one N-methyltransferase are involved such as 6OMT, CNMT, 3 -hydroxy-N-methylcoclaurine 4 -O-methyltransferase (4OMT), (S)-scoulerine 7-O-methyltransferase (SOMT) [46]. Roy et al. (2021) [30] reported that the copy numbers of 6OMT, 4OMT, SOMT, and CNMT were two, two, one, and four, respectively, in B. koreana according to PacBio full-length cDNA analysis. In the current comparative transcriptome analysis, we identified additional copies to be 14 in 6OMT, 2 in 4OMT, 2 in SOMT, and 9 in CNMT in B. koreana. The copy numbers of 4OMT and SOMT were invariable among the three Berberis species, but 6OMT and CNMT were variable in numbers such that 6OMT was 14, 13, and 15 in B. koreana, B thunbergii, and B. amurensis, respectively. The copy numbers of CNMT were 9, 10, and 10 in B. koreana, B thunbergii, and B. amurensis, respectively. The phylogenetic analysis represented the Berberis methyltransferase (6OMT, 4OMT and CNMT) into a single clade together with CNMT of Sinopodophyllum hexadendrum and one sub cluster of four species ( Figure 5A) indicating that the Berberis methyltransferase were not phylogenetically related to previously reported 6OMT and 4OMT genes. The current SOMT clade corroborated our previous findings [30] as it was tied closely to T. thalictroides, C. japonica, and C. chinensis.
In conclusion, the genus Berberis contains many medicinal plants in which the major bioactive molecules are bisbenzylisoquinoline (BIAs). Among the diverse BIAs, berberine is the main molecule that is the most biologically active BIA compound distributed in almost all Berberis species. Berbamine is a bisBIA molecule on Berberis species, but it has not been well studied yet compared to the wealth of information on berberine. We quantified both berberine and berbamine in three closely related Berberis species, B. koreana, B. thunbergii, and B. amurensis, which are endemic plants in Far East Asia. Unexpectedly, berbamine contents were almost 20 times higher than the berberine in all three species. This is the first report on the berbamine quantification compared to berberine in Berberis species. We identified all genes encoding the berberine synthesis from transcriptomes of the three species. For berbamine synthesis, we identified a single copy of the berbamunine synthase gene from PacBio sequencing database. Comparative genomic analyses were carried out on the transcriptomes of the three species, which revealed differential expression and copies of the BIA synthesis genes among the three species. Thus, the contents of the current research will be valuable for molecular characterization for the medicinal utilization of Berberis species.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/plants11202676/s1, Figure S1: HPLC Standard 100mg/L berberine (red) and berbamine (green) RT(retention time), followed by berbamine and berberine RT for each sample., Figure S2: Heat map and volcano plot of all differentially expressed genes (red upregulated and blue down regulated)., Figure S3: Venn diagram showing total number of differentially expressing genes in B. koreana, B. amurensis and B. thunbergii., Figure S4: Functional categories of (A) upregulated DEG genes and (B) down regulated genes in the Gene Ontology. Figure Table S1: Total unigenes classification in gene ontology categories, Table S2: KEGG annotation of unigenes, Table S3: List of unigenes encoding genes for berberine synthesis pathway, Table S4: Number of CYP copies in each CYP gene families in three Berberis species.

Informed Consent Statement: Not applicable.
Data Availability Statement: All data were submitted to the National Center for Biotechnology Information (NCBI). The de novo assembly data was registered under SRA SUB10636706 with BioProject ID PRJNA814432. The raw RNA sequencing BioProject ID was PRJNA856352 for B. amurensis, PRJNA856402 for B. koreana, and PRJNA856417 for B. thunbergii.