Comparative Analysis and Phylogenetic Implications of Plastomes of Five Genera in Subfamily Amyridoideae (Rutaceae)

: In the most recent classiﬁcation of Rutaceae, Amyridoideae is the largest and most diverse subfamily. In Amyridoideae, the genera Phellodendron , Tetradium , Toddalia and Zanthoxylum were proposed as “proto-Rutaceae”due to substantial phytochemical similarities. In this study, we investigated the plastome varia-tions in eight species representing these four genera and Melicope. All plastomes exhibited a typical quadripartite structure with four regions (LSC, SSC, IRa and IRb). The whole chloroplast genome size ranged from 158,383 bp to 159,014 bp and the gene number ranged from 115 to 116. By comparative analyses, we found that there were structural variations at the LSC/IR and SSC/IR borders of the plastomes in the ﬁve genera, especially in Melicope . Three most divergent regions ( trnH-psbA , trnE-trnT and psaB ) were found from the LSC region, which had great potential for developing effective genetic markers. In addition, we conducted a phylogenomic analyses of the “proto-Rutaceae”and related taxa with plastomes data from 36 species. Our results showed that (1) Phellodendron , Tetradium , Toddalia and Zanthoxylum were conﬁrmed as close relatives and grouped together as the ‘proto-Rutaceae’, (2) Phellodendron was sister to Tetradium , and Toddalia was deeply nested within Zanthoxylum , and (3) Toddalia asiatica was closely related to the Southwest Paciﬁc and East Asian species of Zanthoxylum , and Melicope pteleifolia was more closely related to Acronychia than it is to Tetradium . This study provided new insights into the plastome structural varia-tions in subfamily Amyridoideae, and demonstrated that the plastomes data were sufﬁciently robust to explore implications of the phylogeny for the previous phylogenetic hypotheses. rapid nucleotide substitution at the species level, indicating great potential as barcode markers for plant identiﬁcation and further phylogenetic analysis in Amyridoideae.


Introduction
Chloroplasts (cp) are one of the essential organelles in plant cells. The chloroplast genomes (plastomes) are usually circular DNA molecules, composed of a large single copy (LSC) region and a small single copy (SSC) region that are separated by a pair of inverted repeats (IRs) regions [1]. However, many plastomes have revealed considerable variation within and between plant species in terms of both sequence and structural variation, which is valuable for understanding of the climatic adaptation of plants [2]. In addition, due to its simple structure, highly conserved sequence, and maternal inheritance characteristics, the plastome has long been a focus of research in plant molecular evolution and systematics [3]. In recent years, the whole chloroplast genomes have made significant contributions to phylogenetic studies of many plant families, and have been suggested to give higher resolution of evolutionary relationships within phylogenetic clades, compared with small numbers of plastid or nuclear markers [2,[4][5][6][7].

Taxon Sampling, DNA Extraction and Sequencing
Five samples of Phellodendron chinense, Tetradium daniellii (Benn.) T.G. Hartley, and Toddalia asiatica were collected in China from Tianquan of Sichuan, Changping of Beijing, and Gushan of Fujian, respectively. Voucher specimens were deposited at the Museum of Beijing Forestry University (Table S1). Total genomic DNA was extracted from silica-dried leaves of the three species using CTAB method [24].
Average 150 bp pair-end reads were performed on an Illumina HiSeq 4000 platform at Novogene Biotech Co. (http://www.novogene.com (accessed on 1 February 2021), China). The filtered reads were used for de novo assembly with Geneious Prime [25]. Then the generated contigs were concatenated into larger contigs based on the published plastomes of P. amurense (NC_035551.1), T. ruticarpum (MT134114) and Z. pinnatum (MN968553). By using the Repeat Finder plugin of Geneious, the IR regions were determined and copied in reverse order to complete the chloroplast sequences. These sequences were annotated by the Unix Program Plann [26]. After checking the annotated circular structures, we used the Organellar Genome DRAW tool [27] to generate the illustrations of circular plasmids.

Codon Usage and Characterization of Repeat Sequences
Apart from the three newly assembled plastomes, we downloaded the plastomes of other five Amyridoideae species from GenBank (Table S1) for comparative analysis. Then we used CodonW1.4.2 program [28] to calculate relative synonymous codon usage (RSCU) value of these eight plastomes. The R package ggplot2 was used to visualize the RSCU values. RSCU = 1 indicated that codon had no use preference, the codons with RSCU value >1 were defined as high frequency codons whereas the reverse was true for RSCU values less than 1.0 [29].
Simple sequence repeats (SSRs) were produced by MISA [30]. The thresholds of mononucleotides, dinucleotides, trinucleotides, tetranucleotides, pentanucleotides, and hexanucleotides repeats were 8 repeat units, 5 repeat units, 4 repeat units, 3 repeat units, 3 repeat units and 3 repeat units, respectively. Repeat sequences were detected by the online REPuter software tool [31], including forward, reverse, complement and palindromic repeats, with minimum repeat size of 30 bp and 90% or more sequence identity.

Genome Comparison
We analyzed base compsositions of the eight plastomes and calculated the GC content by Geneious. To illustrate the IR expansion and contraction, we specifically compared the IR/SC boundary regions by IRscope [32]. In order to compare the similarity, we used the MAFFT [33] to align chloroplast genome sequences. The mVISTA [34], an online tool, was used to evaluate similarity by visualizing the results with the annotation of P. chinense as the reference. We made a further analysis in the LAGAN mode, which can produce true multiple alignments regardless of whether or not inversions were contained [35]. To identify the nucleotide diversity (Pi), a sliding window analysis was conducted using DnaSP [36]. And we used Microsoft Excel 2011 to convert results into charts for observation.

Phylogenetic Analysis
A total of 36 complete plastomes (Tables S1 and S2) from 15 genera of Rutaceae (four plastomes newly generated and 32 plastomes obtained from GenBank) were included for phylogenomic analyses. Three species of Simaroubaceae and one species of Meliaceae (plastomes obtained from GenBank) were defined as outgroups. The alignment of these sequences was performed in MAFFT [33], then the nonhomologous sites were deleted for phylogenetic trees. Bayesian inference (BI) was executed with MrBayes v3.2.3 [37] in the GTR + G + I model. Maximum likelihood trees (ML) were conducted by IQtree [38] and 1000 bootstrap replicates were test to evaluate the branch support values. The proteincoding genes of the 36 sequences were also extracted to construct phylogenetic trees using the same methods as the complete chloroplast genome sequences.
the same methods as the complete chloroplast genome sequences.

Structure and Size of Plastomes
Three newly assembled chloroplast genomes (Phellodendron chinense, daniellii and Toddalia asiatica) and five published chloroplast genomes (P. Tetradium ruticarpum (A.Juss.) T.G.Hartley, and Melicope pteleifolia (Champ. T.G.Hartley, Zanthoxylum calcicola Huang and Z. piperitum (L.) DC.) obta GenBank, representing five genera in subfamily Amyridoideae, were comparative analyses. All plastomes of these five genera exhibited a similar qu structure (Figure 1), containing a large single copy (LSC) region (85,124 bp-86 small single copy (SSC) region (17,526 bp-18,610 bp), and a pair of inverted re (26,999 bp-27,644 bp), respectively. The whole genome size of the eight spec from 158,383 bp (T. daniellii) to 159,014 bp (T. asiatica) ( Table 1), which were si slightly smaller than the reported plastomes of some genera of subfamily Aur Rutaceae [39][40][41].   The DNA GC content is an important indicator of species affinity. The total GC contents of the five genera chloroplast genomes varied from 38.3% to 38.6%, which was obviously lower than AT contents. The average GC contents of SSC region, LSC region and IR regions were 33.3%, 36.7%, and 42.8%, respectively, indicating that the rRNA and tRNA, which occupy mainly in IR regions, prefer to use bases G and C. The total gene content and order were similar across the five genera plastomes. There were 116 genes in Phellodendron and Zanthoxylum, while 115 genes in Tetradium, Toddalia and Melicope. The number of protein-coding genes in Melicope was 80, while other four genera had 81 protein-coding genes. All the genera had 30 tRNA genes and 4 rRNA genes (Table 1).

Codon Preference Analysis
Genes in the closely related species generally show a similar codon use pattern. Through the analysis of codon preference, we can better understand the evolution of species [42]. Based on the protein-coding sequences, the frequency of codon usage was estimated for the five genera plastomes. In total, the protein-coding genes were composed of 26,943-27,731 codons, which encoding 21 amino acids ( Figure S1). Among them, the most frequent amino acid was Leucine (11.98-15.20%) and the least frequent coded amino acid was Cysteine (1.11-1.84%).
In our analysis, RSCU values of 30 codons were greater than 1, most of them ended with A or U and only five codons ended with G, which indicated that the preferred codons tended to be A/U ending. As shown in Table 2, the effective number of codons (Nc) value ranged from 51.09 to 57.46. When the Nc value >35, there is relatively weak codon usage bias [43]. The content of each base on the third synonymous codon (T 3s , C 3s , A 3s , and G 3s ) should have A = T and C = G when there is no variation or the selection effect on the bias in two complementary strands, and the usage pattern was completely caused by mutation [44]. As can be seen from Table 2, the use frequency between T+C and A+G was unbalanced, which indicated that the use pattern of codons in these five genera was also influenced by other factors in addition to mutation. This pattern was similar to that reported for other Rutaceae plastomes such as Citrus [45].

IR Contraction and Expansion
The contraction and expansion of IR region at the borders can affect the size difference between chloroplast genomes and play important roles in evolution [46]. Although the IR regions of the five genera were relatively conserved, structural variations were found in different degrees, especially in Melicope (Figure 2). The IR regions of the eight plastomes ranged from 26,999 bp (P. amurense) to 27,644 bp (Z. piperitum), among which rps3, rpl22, rps19, ndhF, ycf1, trnN, rpl2 and trnH were located at the junction of the LSC/IR and SSC/IR borders. For the LSC/IRb region, the gene rpl22 in four genera crossed the junction and extended into the IRb region, with a length from 39 bp to 246 bp. In Melicope, however, the gene rpl22 was completely located in the IRb region, and the gene rps3 crossed the junction and extended 429 bp into the IRb region. For the IRb/SSC region, the gene ndhF was located near the junction or crossed the junction and extended into the IRb regions of T. daniellii (6 bp) and Z. piperitum (23 bp). All eight plastomes had similar SSC/IRa borders, and the junction was all crossed by the gene ycf1 with a length from 1069 bp to 1704 bp. For the IRa/LSC region, the junction was within the region between the genes rps19 and trnH in four genera, while in Melicope, the junction was within the region between the genes rpl22 and trnH. In addition, it was found that the gene rpl2 was missing in both the IRa and IRb regions of Melicope.
Melicope, however, the gene rpl22 was completely located in the IRb region, and th rps3 crossed the junction and extended 429 bp into the IRb region. For the IRb/SSC the gene ndhF was located near the junction or crossed the junction and extended i IRb regions of T. daniellii (6 bp) and Z. piperitum (23 bp). All eight plastomes had SSC/IRa borders, and the junction was all crossed by the gene ycf1 with a length fro bp to 1704 bp. For the IRa/LSC region, the junction was within the region betw genes rps19 and trnH in four genera, while in Melicope, the junction was within the between the genes rpl22 and trnH. In addition, it was found that the gene rpl2 was m in both the IRa and IRb regions of Melicope.

Repeat and SSR Analysis
SSRs were an important type of repeated sequence in genomes that are particularly useful molecular markers in genetic diversity research [47]. From Figure 3A, we found that, in all eight species, the palindromic type had the most repeats, and the forward type ranked second. Reverse and complement repeats were much fewer, and even were not found in Z. piperitum and M. pteleifolia. Most SSRs were located in the LSC region (71%), followed by SSC (16%) and IR regions (13%) ( Figure 3B). We identified 199, 197, 69, 74, 72,79, 65 and 62 SSRs in P. amurense, P. chinense, T. daniellii, T. ruticarpum, Z. calcicola, Z. piperitum, M. pteleifolia and T.asiatica, respectively ( Figure 3C). A mononucleotide repeat unit (A/T) was most abundant in all species, accounting for 86.29-97.30%. This was followed by dinucleotide repeat unit (AT/AT) and (AG/CT) ( Figure 3C,D). It is noteworthy that Phellodendron plastome not only had most SSRs, the trinucleotide repeat units (AAT/ATT and AAG/CTT) and the polynucleotide repeats were only found in this genus. unit (A/T) was most abundant in all species, accounting for 86.29%-97.30%. This was followed by dinucleotide repeat unit (AT/AT) and (AG/CT) ( Figure 3C,D). It is noteworthy that Phellodendron plastome not only had most SSRs, the trinucleotide repeat units (AAT/ATT and AAG/CTT) and the polynucleotide repeats were only found in this genus.

Sequence Divergence Analysis
To characterize genome divergence, multiple sequence alignments were performed between the eight plastomes ( Figure 4). The comparison demonstrated that the eight plastomes had the high consistency in genes arrangement, especially in the same genus. Most of sequence variations occurred in the non-coding sequences, which indicated that

Sequence Divergence Analysis
To characterize genome divergence, multiple sequence alignments were performed between the eight plastomes ( Figure 4). The comparison demonstrated that the eight plastomes had the high consistency in genes arrangement, especially in the same genus. Most of sequence variations occurred in the non-coding sequences, which indicated that the coding regions were more conserved than non-coding regions. Particularly, the IR regions were much less divergent than the LSC and SSC regions. We also calculated the nucleotide variability (Pi) to show divergence at the sequence level between the eight plastomes ( Figure 5). It was found that the Pi values ranged from 0 to 0.08833 with a mean of 0.02086 in the LSC region, from 0 to 0.07306 with an average of 0.01944 in the SSC regions, and from 0 to 0.04125 with a mean of 0.00539 in the IR regions, indicating that IR regions had much lower variability than the LSC and SSC regions. Three most divergent hotspots, trnH-psbA, trnE-trnT and psaB regions, were observed, which exhibited significantly higher Pi values (>0.08) and were all located in the LSC region. These divergent regions may be undergoing rapid nucleotide substitution at the species level, indicating great potential as barcode markers for plant identification and further phylogenetic analysis in Amyridoideae. of 0.02086 in the LSC region, from 0 to 0.07306 with an average of 0.01944 in the SSC regions, and from 0 to 0.04125 with a mean of 0.00539 in the IR regions, indicating that IR regions had much lower variability than the LSC and SSC regions. Three most divergent hotspots, trnH-psbA, trnE-trnT and psaB regions, were observed, which exhibited significantly higher Pi values (>0.08) and were all located in the LSC region. These divergent regions may be undergoing rapid nucleotide substitution at the species level, indicating great potential as barcode markers for plant identification and further phylogenetic analysis in Amyridoideae.

Phylogenetic Analysis
The BI and ML analyses based on both the whole plastomes and protein-coding genes recovered exactly the same tree topology ( Figure 6). The phylogenetic trees displayed higher supports at all nodes that provided robust phylogenetic relationships of the selected taxa in Rutaceae. The results showed that the sampled species of Phellodendron, Tetradium, Toddalia, and Zanthoxylum were resolved as close relatives and grouped together as a distinct clade, which is consistent with previous phytochemical and phylogenetic studies of the "proto-Rutaceae" [13,14,21]. Moreover, our results showed that these four genera of "proto-Rutaceae"formed two subclades, which were sister to each other.
The first subclade consisted of Phellodendron and Tetradium. Although Phellodendron and Tetradium differ radically in fruit structure (the former is dehiscent follicle and the latter is fleshy drupe), Hartley (1981) [18] found that it was impossible to distinguish the two genera using vegetative or staminate material. Previous molecular studies also revealed that these two genera were closely related [18,21,23]. The sister-group relationship of Phellodendron and Tetradium was also supported by our plastomes data (PP/BS: 1/100). In this subclade, the two sampled species of Tetradium, T. ruticarpum (A.Juss.) T.G.Hartley. and T. daniellii (Benn.) T.G.Hartley, were well supported as sister species (PP/BS: 1/100). P. chinense and P. amurense, the only two species of Phellodendron, formed another strongly supported monophyletic group (PP/BS: 1/100). Unfortunately, as part of 'Proto-Rutaceae', Fagaropsis was not included in our analysis, but it most likely belongs to this subclade too. Inclusion of the plastomes data of this genus in future studies will help enhance our understanding of the 'proto-Rutaceae'. The second subclade was composed of Toddalia and Zanthoxylum. It was interesting that, in 'Proto-Rutaceae', although all four genera produce 1-BTIQ alkaloids, only Toddalia and Zanthoxylum synthesise anthranilate-derived alkaloids and coumarins, and there was a co-occurrence of furanocoumarins and furoquinolines in Toddalia and Zanthoxylum, but not in Phellodendron [13,15]. Thus, the proximity of these two genera was supported by both the chemical characteristics and the molecular data. In previous phylogenetic studies, Toddalia might be sister to Zanthoxylum, or nested with it [20,21]. Furthermore, Appelhans et al. (2018) [20] has proposed to change T. asiatica as Z. asiaticum (L.) Appelhans, Groppo & J.Wen, comb. nov. Our results based on plastomes data clearly showed that T. asiatica was deeply nested within Zanthoxylum. Therefore, we confirmed that Toddalia was part of the clade of Zanthoxylum, and suggested that the two genera should be merged. In Appelhans et al.'s (2018) [20] study, Toddalia was resolved as sister to the African and Malagasy species of Zanthoxylum, however, in another study it was found to be sister to a main Asian clade [48]. In our plastomes analyses, T. asiatica was not closely related to the African and Malagasy species of Z. paniculatum Balf.f. and Z. madagascariense Baker, but was sister to the clade formed by the Southwest Pacific species

Phylogenetic Analysis
The BI and ML analyses based on both the whole plastomes and protein-coding genes recovered exactly the same tree topology ( Figure 6). The phylogenetic trees displayed higher supports at all nodes that provided robust phylogenetic relationships of the selected taxa in Rutaceae. The results showed that the sampled species of Phellodendron, Tetradium, Toddalia, and Zanthoxylum were resolved as close relatives and grouped together as a distinct clade, which is consistent with previous phytochemical and phylogenetic studies of the "proto-Rutaceae" [13,14,21]. Moreover, our results showed that these four genera of "proto-Rutaceae"formed two subclades, which were sister to each other.
The first subclade consisted of Phellodendron and Tetradium. Although Phellodendron and Tetradium differ radically in fruit structure (the former is dehiscent follicle and the latter is fleshy drupe), Hartley (1981) [18] found that it was impossible to distinguish the two genera using vegetative or staminate material. Previous molecular studies also revealed that these two genera were closely related [18,21,23]. The sister-group relationship of Phellodendron and Tetradium was also supported by our plastomes data (PP/BS: 1/100). In this subclade, the two sampled species of Tetradium, T. ruticarpum (A.Juss.) T.G.Hartley. and T. daniellii (Benn.) T.G.Hartley, were well supported as sister species (PP/BS: 1/100). P. chinense and P. amurense, the only two species of Phellodendron, formed another strongly supported monophyletic group (PP/BS: 1/100). Unfortunately, as part of 'Proto-Rutaceae', Fagaropsis was not included in our analysis, but it most likely belongs to this subclade too. Inclusion of the plastomes data of this genus in future studies will help enhance our understanding of the 'proto-Rutaceae'. The second subclade was composed of Toddalia and Zanthoxylum. It was interesting that, in 'Proto-Rutaceae', although all four genera produce 1-BTIQ alkaloids, only Toddalia and Zanthoxylum synthesise anthranilate-derived alkaloids and coumarins, and there was a co-occurrence of furanocoumarins and furoquinolines in Toddalia and Zanthoxylum, but not in Phellodendron [13,15]. Thus, the proximity of these two genera was supported by both the chemical characteristics and the molecular data. In previous phylogenetic studies, Toddalia might be sister to Zanthoxylum, or nested with it [20,21]. Furthermore, Appelhans et al. (2018) [20] has proposed to change T. asiatica as Z. asiaticum (L.) Appelhans, Groppo & J.Wen, comb. nov. Our results based on plastomes data clearly showed that T. asiatica was deeply nested within Zanthoxylum. Therefore, we confirmed that Toddalia was part of the clade of Zanthoxylum, and suggested that the two genera should be merged. In Appelhans et al.'s (2018) [20] study, Toddalia was resolved as sister to the African and Malagasy species of Zanthoxylum, however, in another study it was found to be sister to a main Asian clade [48]. In our plastomes analyses, T. asiatica was not closely related to the African and Malagasy species of Z. paniculatum Balf.f. and Z. madagascariense Baker, but was sister to the clade formed by the Southwest Pacific species Z. pinnatum (J.R.Forst. & G.Forst.) Oliv. and the East Asian species of Z. schinifolium Siebold & Zucc., Z. oxyphyllum Edgew., and Z. calcicola Huang (PP/BS: 1/100).
The inferred phylogeny based on plastomes data also lent support to the division of Euodia sensu lato by Hartley (1981) [18]. Melicope pteleifolia (Champ. ex Benth.) T.G.Hartley, a South-East Asian species with trifoliolate (occasionally unifoliate) leaves, was tradition-ally regarded as belonging to Euodia [49]. In the present study, the results showed that M. pteleifolia was not close to T. ruticarpum and T. daniellii, both of them have odd-pinnately compound leaves and formerly belonged to Euodia [50], but was clustered with A. pedunculata (L.) Miq., a species that has unifoliolate leaves (PP/BS: 1/100). They formed a strongly supported sister group and appeared basal to the clade of the 'proto-Rutaceae'.
The genera of the "proto-Rutaceae"are characterized by having 1-BTIQ alkaloids. Waterman (2007) [51] hypothesized that the "proto-Rutaceae"might represent an early branching clade within Rutaceae. However, the molecular studies showed that it was not the basal in the family [11,52,53]. In morphology, instead of fruit types, the phyllotaxis, habit and aculeate protuberances on branches are of taxonomic value in the "proto-Rutaceae" [20]. Phellodendron, Tetradium and Fagaropsis have opposite leaves and do not have any aculeus or spines on branches, while Zanthoxylum and Toddalia have alternate leaves and are aculeate [8]. Having established good molecular evidence for the clade 'proto-Rutaceae' as a natural group, the presence of 1-BTIQ alkaloids can be interpreted as the synapomorphy for these genera of the group. Considering that the "proto-Rutaceae"is not a formal taxon name, we supported Ling et al.'s (2009) [23] suggestion that it may be possible to group these five genera in a re-circumscribed tribe Zanthoxyleae Dumort. in the future.
The inferred phylogeny based on plastomes data also lent support to the division of Euodia sensu lato by Hartley (1981) [18]. Melicope pteleifolia (Champ. ex Benth.) T.G.Hartley, a South-East Asian species with trifoliolate (occasionally unifoliate) leaves, was traditionally regarded as belonging to Euodia [49]. In the present study, the results showed that M. pteleifolia was not close to T. ruticarpum and T. daniellii, both of them have odd-pinnately compound leaves and formerly belonged to Euodia [50], but was clustered with A. pedunculata (L.) Miq., a species that has unifoliolate leaves (PP/BS: 1/100). They formed a strongly supported sister group and appeared basal to the clade of the 'proto-Rutaceae'.
The genera of the "proto-Rutaceae"are characterized by having 1-BTIQ alkaloids. Waterman (2007) [51] hypothesized that the "proto-Rutaceae"might represent an early branching clade within Rutaceae. However, the molecular studies showed that it was not the basal in the family [11,52,53]. In morphology, instead of fruit types, the phyllotaxis, habit and aculeate protuberances on branches are of taxonomic value in the "proto-Rutaceae" [20]. Phellodendron, Tetradium and Fagaropsis have opposite leaves and do not have any aculeus or spines on branches, while Zanthoxylum and Toddalia have alternate leaves and are aculeate [8]. Having established good molecular evidence for the clade 'proto-Rutaceae' as a natural group, the presence of 1-BTIQ alkaloids can be interpreted as the synapomorphy for these genera of the group. Considering that the "proto-Rutaceae"is not a formal taxon name, we supported Ling et al.'s (2009) [23] suggestion that it may be possible to group these five genera in a re-circumscribed tribe Zanthoxyleae Dumort. in the future.

Conclusions
In this study, we investigated plastome variations in eight species representing five genera of the subfamily Amyridoideae, Rutaceae. We found that the plastomes of these five genera are relatively conserved in terms of genome structure and gene content and order.