Comparative Analysis of the Complete Chloroplast Genomes of Eight Ficus Species and Insights into the Phylogenetic Relationships of Ficus

The genus Ficus is an evergreen plant, the most numerous species in the family Moraceae, and is often used as a food and pharmacy source. The phylogenetic relationships of the genus Ficus have been debated for many years due to the overlapping phenotypic characters and morphological similarities between the genera. In this study, the eight Ficus species (Ficus altissima, Ficus auriculata, Ficus benjamina, Ficus curtipes, Ficus heteromorpha, Ficus lyrata, Ficus microcarpa, and Ficus virens) complete chloroplast (cp) genomes were successfully sequenced and phylogenetic analyses were made with other Ficus species. The result showed that the eight Ficus cp genomes ranged from 160,333 bp (F. heteromorpha) to 160,772 bp (F. curtipes), with a typical quadripartite structure. It was found that the eight Ficus cp genomes had similar genome structures, containing 127 unique genes. The cp genomes of the eight Ficus species contained 89–104 SSR loci, which were dominated by mono-nucleotides repeats. Moreover, we identified eight hypervariable regions (trnS-GCU_trnG-UCC, trnT-GGU_psbD, trnV-UAC_trnM-CAU, clpP_psbB, ndhF_trnL-UAG, trnL-UAG_ccsA, ndhD_psaC, and ycf1). Phylogenetic analyses have shown that the subgenus Ficus and subgenus Synoecia exhibit close affinities and based on the results, we prefer to merge the subgenus Synoecia into the subgenus Ficus. At the same time, new insights into the subgeneric classification of the Ficus macrophylla were provided. Overall, these results provide useful data for further studies on the molecular identification, phylogeny, species identification and population genetics of speciation in the Ficus genus.


Introduction
The genus Ficus is the most numerous species of evergreen plants in the family Moraceae and is mainly found in subtropical and tropical areas [1]. For a long time, numerous Ficus species have been utilized as sources for food and pharmacy because of their rich nutritional content [2]. The genus Ficus is one of the largest woody genera of angiosperms, and the subdivision of the genus has been a concern of taxonomists [3]. The present relatively complete taxonomic system for the genus Ficus was proposed by Berg [4], which divided the genus Ficus into six subgenera (subgenus Urostigma, subgenus Pharmacosycea, subgenus Sycomorus, subgenus Ficus, subgenus Sycidium, and subgenus Synoecia) based on previous molecular phylogenetic analyses and morphological characters. However, the use of morphological and molecular markers for the identification of the Ficus genus are controversial or limited due to overlapping phenotypic characteristics or morphological similarities among the Ficus genus [3,[5][6][7]. Therefore, the provision of additionalgenomic information is imperative for the understanding of the Ficus genus, and for the safe and effective utilization of Ficus.
The chloroplast (cp) is a unique organelle in the plant cell that not only plays a vital part in photosynthesis, but also participates in the synthesis of other organisms [8][9][10]. The cp genomes in plants are uniparentally inherited and consists of a highly conserved genomic structure a small single-copy (SSC), a couple of inverse-repeat (IR) regions, and a large single-copy (LSC), which also has a large number of variable loci [11,12]. Therefore, cp genomes have often been used to research the evolution, interspecific divergence, adaptive population history and phylogeny of related species [12][13][14]. With the emergence of the whole-genome sequencing era, the whole cp genomes have been extensively used in phylogenetic differentiation studies, species taxonomic identification and genetic breeding [15][16][17][18][19]. Although the cp genomes of some genus of Ficus have been reported previously [20][21][22][23][24][25][26][27], the eight Ficus species studied here have not been reported. As such, we compared the cp genome structures of eight Ficus species to provide a useful genomic resource for Ficus species.
In our study, the complete cp genomes of eight Ficus species were sequenced, which possess economically significant values (Ficus altissima, Ficus auriculata, Ficus benjamina, Ficus curtipes, Ficus heteromorpha, Ficus lyrata, Ficus microcarpa, and Ficus virens). The structural features and sequence differences of the eight Ficus species were then compared and analyzed, while the phylogeny of the eight Ficus species was inferred from the reported cp genomes of the Ficus genus. The results of this study can provide guidance for the subsequent conservation and utilization of Ficus species.

Plant Materials and DNA Extraction
Eight taxa, F. altissima, F. auriculata, F. benjamina, F. curtipes, F. heteromorpha, F. lyrata, F. microcarpa, and F. virens, represented the genus Ficus of the family Moraceae. We collected young, fresh, and non-diseased leaves from the adult plants of the target species, which were then frozen into liquid nitrogen. In addition, the voucher specimens were preserved in the herbarium of Southwest Forestry University, Kunming, Yunnan, China (Table S1). DNA extraction was performed using a modified CTAB method [28].

Genome Sequencing, Assembly and Annotation
Sequencing was performed by Annoroad Gene Technology (Beijing, China) to generate libraries with an average insert size of 400 bp from total DNA on an Illumina-based platform (Illumina Novaseq 6000). Then, the raw data obtained by sequencing were assembled using GetOrganelle v1.6.0 software with Ficus religiosa as the reference [29]. The assembled sequences were annotated with Geneious prime using F. religiosa as a reference and manually adjusted for start and stop codons. The genome mapping of the annotated Ficus cp genomes was conducted using OGdraw online [30]. The GenBank accession numbers of the eight Ficus species are shown in Table S1.

Simple Sequence Repeats (SSR) and Repetitive Sequence Analysis
Identification of SSRs was performed using MISA [31], and the specific parameter settings were as in Zhao et al. [32]. The forward (F), reverse (R), complementary (C), and palindromic (P) oligonucleotide repeats were determined by the REPuter program [33], and then the tandem repeats were subdivided using the network-based Tandem Repeat Finding [34], and the REPuter and Tandem Repeat specific parameters were set as in Yang et al. [35].

Genome Comparison and Divergent Hotspots Identification
Expansion or contraction of the IR region was investigated and visualized by IR scope [36]. The alignment of the eight Ficus cp genome sequences of species was performed using MAFFT v7 [37], and the completed sequences were subsequently compared using the Shuffle-LAGAN mode in mVISTA [38]. The nucleotide variability (Pi) of the cp genome was analyzed using DnaSp v5.10 [39], with the following settings: a step size of 200 bp and a window length of 800 bp.

Non-Synonymous (Ka) and Synonymous (Ks) Substitution Rate Analysis
First, the selected protein-coding genes (length > 300 bp) were paired in MAFFT. Then, the Ka/Ks value of the screened genes was calculated using the KaKs_calculator [40] with reference to Ivanova et al. [41].

Phylogenetic Analysis
To estimate phylogenetic relationships of eight Ficus species in the genus Ficus, cp genomes of 34 taxa were compared (Table S2), including 31 Ficus species, and 3 Flacourtiaceae species (Flacourtia indica, Homalium ceylanicum and Poliothyrsis sinensis) were set as outgroups. The sequence alignment of 34 cp genomic matrices was made using MAFFT v7 software. In order to obtain robust phylogenetic relationships, we constructed phylogenetic trees for the genus Ficus using the maximum likelihood (ML) method. The optimal nucleotide substitution model (K3Pu + F + R5) was selected using ModelFinder [42]. ML analyses was performed using IQ-tree 1.5.5 [43] under ultrafast bootstrap (1000) and the partition model (partitioned analysis with mixed data) [44]. The ML tree was visualized by using FigTree v1.4.0.  (Table S1).

Genomic Characteristics of Chloroplast
We annotated a total of 127 unique genes, containing 83 protein-coding genes, 8 rRNAs and 36 tRNAs, in eight Ficus species showing similar genomic structures. Nineteen of the genes contained one intron and three genes contained two introns (Table 1). In all eight Ficus species, their cp genomes were AT-rich, with 64.1% of the genomes made up of A/T nucleotides. The GC content of the IR regions, LSC and SSC regions, proteincoding sequences, tRNA, and rRNA among the eight Ficus cp genomes was similar, with 42.4-42.7%, 32.7%, 37.1-37.2%, 52.9% and 55.4% GC content, respectively (Table S3).

Category
Gene Groups Gene Names Number

IR Constriction and Expansion
We performed a comparative analysis of the contraction and expansion of the IR/SC region boundary in eight Ficus species (Figure 3). The genes at the junctions of the eight Ficus species included rps19, ndhF, rpl22, rpl2, psbA, trnH, and ycf1. The ndhF gene of F. curtipes, F. heteromorpha, and F. lyrata was located exclusively in the SSC region, ranging from 3 to 657 bp away from the boundary of JSB. The ndhF gene of F. altissima, F. auriculata, F. benjamina, F. microcarpa, and F. virens was mainly located in the SSC region, but a small proportion of the ndhF gene spanned the junction 15 to 521 bp into the IRb region. The rps19 gene of F. curtipes, F. lyrata, and F. virens was located in the LSC region, which for F. auriculata was located in the IRb region, while the gene in F. altissima, F. benjamina, F. heteromorpha and F. microcarpa spans the LSC and IRb boundary. We also observed that the ycf1 gene in all eight Ficus cp genomes spanned the SSC and IRa junctions, but the majority were located in the SSC region, and the length of the ycf1 gene in the IRa was within the range of 1008-1493 bp. In the eight Ficus cp genomes, the trnH, psbA, and rpl22 genes were located exclusively in the LSC region, and the distance between the trnH gene and JLA ranged from 46 to 169 bp.

Sequence Divergence Analysis
To determine the sequence differences among the eight Ficus cp genomes, we used Ficus religiosa as a reference genome and compared them using mVISTA (Figure 4). It is apparent from Figure 4 that the eight Ficus cp genomes were highly conserved, but regions of divergence among intergenic regions were also found, such as trnT-UGU_trnL-UAA, rps16_trnQ-UUG, trnT-GGU_psbD, petA_psbJ rpoB_trnC-GCA, petN_psbM, atpB_rbcL, and rpl32_trnL-UAG. Subsequently, we calculated the nucleotide variability (Pi), as the Pi value allows for a better assessment of the degree of sequence divergence ( Figure 5). From the results, the degree of divergence in the LSC and SSC regions exceeded that of the IR

Sequence Divergence Analysis
To determine the sequence differences among the eight Ficus cp genomes, we used Ficus religiosa as a reference genome and compared them using mVISTA (Figure 4). It is apparent from Figure 4 that the eight Ficus cp genomes were highly conserved, but regions of divergence among intergenic regions were also found, such as trnT-UGU_trnL-UAA, rps16_trnQ-UUG, trnT-GGU_psbD, petA_psbJ rpoB_trnC-GCA, petN_psbM, atpB_rbcL, and rpl32_trnL-UAG. Subsequently, we calculated the nucleotide variability (Pi), as the Pi value allows for a better assessment of the degree of sequence divergence ( Figure 5). From the results, the degree of divergence in the LSC and SSC regions exceeded that of the IR region, and the average value of Pi in the eight Ficus species was approximately 0.0033. Then, Pi > 0.0128 was identified as the hotspot region, and a total of eight high variance regions were found: trnS-GCU_trnG-UCC, trnT-GGU_psbD, trnV-UAC_trnM-CAU, clpP_psbB, ndhF_trnL-UAG, trnL-UAG_ccsA, ndhD_psaC, and ycf1, where the ndhF_trnL-UAG region has the highest Pi value (0.0202). The identification of these highly variable regions offers invaluable information for the evolution of markers in Ficus.

Non-Synonymous (Ka) and Synonymous (Ks) Substitution Rate Analysis
We calculated the Ka/Ks ratios of the eight Ficus chloroplast genomes using F. religiosa as the reference with a total of 53 protein genes reselected based on 300 bp length ( Figure 6; Table S6). The Ka/Ks ratio results indicate that gene-specificity is the majority, and the remaining small portion is region-specific (the IR, SSC, and LSC regions showed comparable values). The Ka range of eight Ficus chloroplast genomes was from 0 to 2.6149, and the Ks values ranged from 0 to 5.0893. The highest Ka and Ks was for the ycf1 gene between F. altissima and F. religiosa (ka = 2.6149, ks = 5.0893), while the values of Ka and Ks for a large number of genes were equal to 0. The average Ka/Ks ratio for the 53 protein genes was 0.1211 and the average Ka/Ks value for ycf1 was 1.0476, with only the ycf1 gene having a Ka/Ks value greater than 1 among the genes tested.

Non-Synonymous (Ka) and Synonymous (Ks) Substitution Rate Analysis
We calculated the Ka/Ks ratios of the eight Ficus chloroplast genomes using F. religiosa as the reference with a total of 53 protein genes reselected based on 300 bp length ( Figure 6; Table S6). The Ka/Ks ratio results indicate that gene-specificity is the majority, and the remaining small portion is region-specific (the IR, SSC, and LSC regions showed comparable values). The Ka range of eight Ficus chloroplast genomes was from 0 to 2.6149, and the Ks values ranged from 0 to 5.0893. The highest Ka and Ks was for the ycf1 gene between F. altissima and F. religiosa (ka = 2.6149, ks = 5.0893), while the values of Ka and Ks for a large number of genes were equal to 0. The average Ka/Ks ratio for the 53 protein genes was 0.1211 and the average Ka/Ks value for ycf1 was 1.0476, with only the ycf1 gene having a Ka/Ks value greater than 1 among the genes tested.

Non-Synonymous (Ka) and Synonymous (Ks) Substitution Rate Analysis
We calculated the Ka/Ks ratios of the eight Ficus chloroplast genomes using F. religiosa as the reference with a total of 53 protein genes reselected based on 300 bp length ( Figure 6; Table S6). The Ka/Ks ratio results indicate that gene-specificity is the majority, and the remaining small portion is region-specific (the IR, SSC, and LSC regions showed comparable values). The Ka range of eight Ficus chloroplast genomes was from 0 to 2.6149, and the Ks values ranged from 0 to 5.0893. The highest Ka and Ks was for the ycf1 gene between F. altissima and F. religiosa (ka = 2.6149, ks = 5.0893), while the values of Ka and Ks for a large number of genes were equal to 0. The average Ka/Ks ratio for the 53 protein genes was 0.1211 and the average Ka/Ks value for ycf1 was 1.0476, with only the ycf1 gene having a Ka/Ks value greater than 1 among the genes tested.

Phylogenomic Analysis
The phylogenetic relationships of the genus Ficus were inferred by maximum likelihood, and the support obtained by this method was high and it strongly supported the monophyly of the genus Ficus. The phylogenetic tree shows that the genus Ficus is divided into five main branches (Figure 7). Cluster I contained eight species of the subgenus Urostigma. Seven species of the subgenus Sycomorus formed a monophyletic cluster with high support. Cluster III contained two species of the subgenus Pharmacosycea and Ficus lyrata. Cluster IV contained three species of the subgenus Synoecia and four species of the subgenus Ficus. Six species of the subgenus Ficus formed Cluster V.
Life 2022, 12, x FOR PEER REVIEW 10 of 16

Phylogenomic Analysis
The phylogenetic relationships of the genus Ficus were inferred by maximum likelihood, and the support obtained by this method was high and it strongly supported the monophyly of the genus Ficus. The phylogenetic tree shows that the genus Ficus is divided into five main branches (Figure 7). Cluster I contained eight species of the subgenus Urostigma. Seven species of the subgenus Sycomorus formed a monophyletic cluster with high support. Cluster III contained two species of the subgenus Pharmacosycea and Ficus lyrata. Cluster Ⅳ contained three species of the subgenus Synoecia and four species of the subgenus Ficus. Six species of the subgenus Ficus formed Cluster Ⅴ.

Chloroplast Genomic DNA Structures
In the present study, we sequenced the eight Ficus species cp genomes, which laid the foundation for a comprehensive comparison of the cp genome sequences of Moraceae. The eight Ficus species reported in this study differ somewhat in chloroplast size, but in general they are roughly similar in chloroplast genome length to those reported so far for Ficus species, with all being around 160,000 bp in length [8,27]. The structure of the eight Ficus cp genomes, like most angiosperms, also has a typical quadripartite structure [8,[45][46][47]. Moreover, we found that the content of the eight Ficus cp genomes was dominated by A/T, up to 64.1%, and this result was similar to that in other plants [48][49][50]. The lower G/C content of the Ficus genomes may be related to its own spontaneous mutations [51]. We also found that the GC content of LSC and SSC were lower than the IR regions, which were common in Ficus species [27,52,53], and this may be so due to the existence of rRNA and tRNA genes in the IR region [54].

Identification of SSRs and Repeat Sequences
Repeated sequences have been shown to be associated with the rearrangement, evolution and divergence of cp genome sequences, as well as having an important function

Chloroplast Genomic DNA Structures
In the present study, we sequenced the eight Ficus species cp genomes, which laid the foundation for a comprehensive comparison of the cp genome sequences of Moraceae. The eight Ficus species reported in this study differ somewhat in chloroplast size, but in general they are roughly similar in chloroplast genome length to those reported so far for Ficus species, with all being around 160,000 bp in length [8,27]. The structure of the eight Ficus cp genomes, like most angiosperms, also has a typical quadripartite structure [8,[45][46][47]. Moreover, we found that the content of the eight Ficus cp genomes was dominated by A/T, up to 64.1%, and this result was similar to that in other plants [48][49][50]. The lower G/C content of the Ficus genomes may be related to its own spontaneous mutations [51]. We also found that the GC content of LSC and SSC were lower than the IR regions, which were common in Ficus species [27,52,53], and this may be so due to the existence of rRNA and tRNA genes in the IR region [54].

Identification of SSRs and Repeat Sequences
Repeated sequences have been shown to be associated with the rearrangement, evolution and divergence of cp genome sequences, as well as having an important function in the phylogeny [12,55,56]. The number and distribution of the four repeats in the eight Ficus cp genomes, and the distribution of the repeats were similar in the cp genomes of Ficus, which was strongly related to the high conservation within the Ficus species. The cpSSR markers have been frequently used for intraspecific identification and genetic evolutionary analysis because of their high mutation rate [57][58][59]. The results of this study indicated that the mono-nucleotide repeats were the richer repeat type in the eight Ficus species, with the A/T being the enriched repeat motif, similar to other higher plant studies [13,60], indicating that the higher plants were dominated by low-level repeat motif [61]. The SSRs and repeats detected in this study were important for the subsequent phylogenetic studies and taxonomy of the genus Ficus, and even the family Moraceae.

IR Constriction and Expansion
Scientists have noticed that the expansion and contraction of the IR regions affects the size of the chloroplasts [56,62]. Moreover, it may cause border genes to enter the IR or SC [63]. In our study, most of the genes (rpl22, rpl2, ycf1, trnH, and psbA) in the eight Ficus species were identical at the IR/SC boundary positions, and differed only in length, further suggesting that the Ficus species are closely related and have highly conserved cp genomes. The cp genomes of monocots and dicots have distinct differences in gene arrangement; for example, the trnH gene is located in the LSC region in dicots, whereas it is located in the IR region in monocots [64]. Interestingly, we found that the rps19 gene of F. auriculata was located exclusively in the IRb region, which would be very rare in dicotyledons [65][66][67], and this may be the case because of expansion from the IR region to the LSC region.

Sequence Divergence Analysis
A comparative analysis of hypervariable regions in plant cp genomes can better identify mutation hotspots and provide a basis for genetic diversity analysis [68,69]. In this study, to identify variable regions, we calculated the percentage of variable characters (coding and non-coding regions) in eight Ficus species cp genomes. The results of the study were similar to those of angiosperms, where the non-coding regions were much more variable than the coding regions [70][71][72]. At the same time, we also found that the SC region of the eight Ficus species in this study were more variable than the IR region, which was consistent with previous results [73,74]. In previous studies, a large number of hypervariable regions were used as DNA barcodes for other plants [75][76][77]. We identified eight hypervariable regions in the Ficus cp genomes (trnS-GCU_trnG-UCC, trnT-GGU_psbD, trnV-UAC_trnM-CAU, clpP_psbB, ndhF_trnL-UAG, trnL-UAG_ccsA, ndhD_psaC, and ycf1). The discovery of these hypervariable regions could provide vast amounts of information for the development of molecular markers for phylogenetic analyses.

Non-Synonymous (Ka) and Synonymous (Ks) Substitution Rate Analysis
The Ka/Ks reveal selection pressure on protein-coding genes and they have a key role in evolutionary studies [78]. Non-synonymous nucleotide substitutions occurred with a lower frequency than synonymous substitutions in most genes, subject to purifying selection [65]. Thus, the ratio of Ka/Ks > 1 indicates probable positive selection; Ka/Ks < 1 indicates purifying selection, while Ka/Ks values of nearly 1 indicate neutral evolution [79]. The results of this study showed that a great many genes had Ka and Ks values that are equal to 0, indicating that these eight Ficus chloroplast genomes are relatively conservative. However, the highest Ka and Ks values were for the gene between F. altissima and F. religiosa (ka = 2.6149, ks = 5.0893), suggesting that it is more variable in F. altissima. The ycf1 gene was under positive selection in the Ficus cp genome, and so the ycf1 gene may have an important role in the adaptation of Ficus species to different environments [80]. However, it cannot be excluded that it may be a pseudogene, as the gene often becomes pseudogenic in angiosperm cp genomes [81]. Therefore, further studies are needed to determine whether the ycf1 gene is in a positive selection state in the Ficus cp genome. The Ka/Ks values were less than 1 for all genes except ycf1, which indicated that these cp genomes underwent extensive purifying selection.

Phylogenomic Analysis
The cp genome was the most vital genetic resource for inferring plant evolutionary relationships and was well suited for analyzing comparative plant relatedness [82,83]. In this study, in order to determine the phylogenetic relationships of eight Ficus genus, phylogenetic trees were structured using ML methods. The findings indicated that all Ficus species clustered in the one clade with high values, a finding that is consistent with the phylogeny inferred by previous work using cpDNA fragments [8].
According to the results of the phylogenetic analysis, species of the three subgenera (subgenus Sycomorus, subgenus Synoecia, and subgenus Pharmacosycea) were all monophyletic taxa and had strong support values, suggesting that the cp genome was suited for resolving the phylogeny of the Ficus. Berg [4] has pointed out the existence of some morphological similarities between the subgenus Ficus and the subgenus Synoecia. The results of the present study showed that the subgenus Ficus and the subgenus Synoecia are clustered into one branch, indicating a close relationship between the two subgenera. Therefore, the taxonomic treatment of the subgenus Synoecia and subgenus Ficus requires further study.
The results of Rønsted [84] suggest that the subgenus Urostigma belongs to a nonmonophyletic origin, and we obtained similar results. Interestingly, in the present study, Ficus lyrata showed a close relationship with the subgenus Pharmacosycea, and so further research is needed on the subgenus classification of F. lyrata.
Based on morphological traits [85] as well as molecular data [6], it was previously suggested that Ficus tikoua should be classified as subgenus Sycomorus and our phylogenetic analysis provides strong evidence for this classification. Subgenus Urostigma and subgenus Sycomorus formed a clade, indicating that the two subgenera are closely related. Zhang [86] points out that F. auriculata and F. beipeiensis should be classified as the same species, but our study results clearly do not support this classification. In our result, F. auriculata and F. beipeiensis were still somewhat different, despite being more closely related. Our results show that the F. formosana, F. heteromorpha, F. erecta and F. pandurate of the Ficus gasparriniana-F. heteromorpha complex of sect. Ficus cluster together, a result that is consistent with those obtained in previous phylogenetic studies based on SSR markers [87].

Conclusions
In our study, we assembled and analyzed the cp genomes of eight Ficus species. By comparing these cp genomes, the results show that the structures of these cp genomes are similar to those of most angiosperm genomes, which exhibit a typical tetragonal structure. Additionally, we obtained valuable genetic resources, including SSRs, highly variable loci, and repetitive sequences. Phylogenetic analysis shows that all Ficus species were clustered in the same clade. The present study provides a clearer phylogenetic framework for the taxonomy of the genus Ficus, and it will help in the species identification of these species, and in genetic diversity studies. Author Contributions: Data curation, wrote the manuscript, methodology, software, X.X. and J.P.; resources, L.Y. and A.D.; conceptualization, software, X.Z.; supervision, funding acquisition, D.W. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.