Complete Chloroplast Genome Sequence of Fortunella venosa (Champ. ex Benth.) C.C.Huang (Rutaceae): Comparative Analysis, Phylogenetic Relationships, and Robust Support for Its Status as an Independent Species

Abstract: Fortunella venosa (Rutaceae) is an endangered species endemic to China, and its taxonomic status has been controversial. The genus Fortunella contains a variety of economically important plants with high food, medicinal, and ornamental value. However, the placement of the genus Fortunella within the genus Citrus has led to controversy over its taxonomy and systematics. In the present research, the chloroplast genome of F. venosa was sequenced using second-generation sequencing, and its structure and phylogenetic relationships were analyzed. The results showed that the chloroplast genome of F. venosa was 160,265 bp in size, with a typical angiosperm quadripartite circular structure containing a large single copy region (LSC) (87,597 bp), a small single copy region (SSC) (18,732 bp), and a pair of inverted repeat regions (IRa/IRb) (26,968 bp each). There were 134 predicted genes in the chloroplast genome, including 89 protein-coding genes, 8 rRNAs, and 37 tRNAs. The GC content of the whole chloroplast genome was 38.4%, with the IR regions (43.0%) having a higher GC content than the LSC and SSC regions. No rearrangements were present in the chloroplast genome; however, the IR regions showed obvious contraction and expansion. A total of 108 simple sequence repeats (SSRs) were present in the entire chloroplast genome, and nucleotide polymorphism was high in the LSC and SSC regions. In addition, codon usage was biased, and the coding regions were more conserved than the non-coding regions. Phylogenetic analysis showed that species of Fortunella are nested within the genus Citrus and that the independent species status of F. venosa, which differs significantly from F. japonica, is robustly supported. These findings will help in the development of DNA barcodes that can be useful in the study of the systematics and evolution of the genus Fortunella and the family Rutaceae.


Introduction
The origin of the chloroplast (cp) can be traced back more than one billion years to a cyanobacterial endosymbiosis [1][2][3]. It is an organelle commonly found in the cytoplasmic matrix that carries out photosynthesis, hence sustaining life on Earth [4,5]. The chloroplast is a semi-autonomous organelle with its own genetic material, but some of its proteins are encoded in the nuclear genome [6]. The chloroplast genome of angiosperms is mostly a double-stranded circular structure containing four parts: a large single copy (LSC) region, a small single copy (SSC) region, and a pair of inverted repeat (IRa/IRb) regions with the same sequence in opposite directions [7,8].

Repeat Sequence and Codon Usage
Dispersed repeats (forward, reverse, complementary, and palindromic repeat sequences) in the complete chloroplast genome sequence were analyzed using the REPuter online program (https://bibiserv.cebitec.uni-bielefeld.de/reputer, accessed on 7 April 2021). Parameters were set to a minimum repeat length of 30 bp and a similarity between repeats of >90% [43]. Tandem repeats were detected using the Tandem Repeats Finder online tool (https://tandem.bu.edu/trf/trf.html, accessed on 7 April 2021) with parameters set to the system default values [44]. Presently, there are nine complete chloroplast genome sequences of the family Rutaceae available in the GenBank database, including Fortunella japonica (MN495932), Citrus aurantifolia (KJ865401), C. aurantium (MT702983), C. hongheensis (MT880607), C. cavaleriei (MT880606), C. limon (MT880608), C. maxima (MN782007), C. medica (MT106673), and C. sinensis (DQ864733). The microsatellite identification tool (MISA) [45] was used to detect the simple sequence repeats (SSRs) in the chloroplast genome sequences of F. venosa and the nine species mentioned above. The parameters were set as follows: no fewer than 10 repeat units for mononucleotides, no fewer than 5 for dinucleotides, no fewer than 4 for trinucleotides, and no fewer than 3 for tetranucleotides, pentanucleotides, and hexanucleotides [46]. The type, quantity, and distribution pattern of the SSRs were compared and analyzed. The CDS regions were extracted from the plastome sequence using the Geneious software, and all CDS were concatenated using the web-based sequence manipulation toolbox (http://www.detaibio.com/sms2/index.html, accessed on 7 April 2021) [47]. The MEGA6 software was used to determine relative synonymous codon usage (RSCU) within the chloroplast genome [48].
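The SSR thresholds described above can be illustrated with a short Python sketch. This is not MISA itself and makes no claim to reproduce its output exactly; it only shows how perfect repeats of 1-6 bp motifs can be located with regular expressions under the stated minimum repeat counts. The function names (`find_ssrs`, `is_primitive`) and the example sequence are hypothetical, and compound SSRs are not handled.

```python
import re

# Minimum number of repeat units per motif length, following the thresholds
# stated in the text (mono >= 10, di >= 5, tri >= 4, tetra/penta/hexa >= 3).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples; start is 0-based."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # The lookahead lets every position be tested as a possible start, so a
        # discarded non-primitive match cannot hide a real SSR that overlaps it.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (k, n - 1))
        end_prev = 0
        for m in pattern.finditer(seq):
            start, full, motif = m.start(), m.group(1), m.group(2)
            if start < end_prev or not is_primitive(motif):
                continue
            hits.append((start, motif, len(full) // k))
            end_prev = start + len(full)
    return sorted(hits)


# Hypothetical toy sequence containing one mono-, one di-, and one trinucleotide SSR.
example = "GC" + "A" * 10 + "GC" + "AT" * 5 + "GC" + "AAG" * 4 + "C"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

Filtering out non-primitive motifs keeps a run such as AAAAAAAAAA from being reported a second time as a dinucleotide repeat of AA.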

Comparative Genome Analysis and Sequence Divergence
Using Fortunella venosa as a reference, the divergence within the ten chloroplast genomes was analyzed using the mVISTA tool [49,50]. The sequences used were those of F. venosa and nine other Rutaceae species: F. japonica (MN495932), Citrus aurantifolia (KJ865401), C. aurantium (MT702983), C. hongheensis (MT880607), C. cavaleriei (MT880606), C. limon (MT880608), C. maxima (MN782007), C. medica (MT106673), and C. sinensis (DQ864733). To analyze rearrangements and inversions within the boundary regions of F. venosa, the Mauve plug-in in Geneious 8 (Biomatters Ltd., Auckland, New Zealand) was used. The IRscope software (IRscope.shinyapps.io/Chloroplot/) [51] was used to analyze the expansion and contraction of the IR boundaries of the 10 representative species and to compare the differences among them. DnaSP v.5.0 [52] was used to calculate nucleotide polymorphism (Pi), with a window length of 600 bp and a step size of 200 bp. The results were used to construct a line chart of polymorphic sites and to find fragments with high polymorphism among the chloroplast genomes.
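The sliding-window calculation of Pi can be sketched as follows. This is a minimal illustration and not DnaSP's implementation: it assumes an already aligned set of sequences of equal length, positions windows by alignment column, and skips any column containing a gap or ambiguous base (an assumed complete-deletion rule). The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def window_pi(seqs, start, length):
    """Mean pairwise difference per site over columns [start, start + length)."""
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    diff_sum, valid_sites = 0.0, 0
    for col in range(start, min(start + length, len(seqs[0]))):
        column = [s[col].upper() for s in seqs]
        if any(base not in "ACGT" for base in column):
            continue  # assumption: drop gapped or ambiguous columns
        valid_sites += 1
        diffs = sum(a != b for a, b in combinations(column, 2))
        diff_sum += diffs / n_pairs
    return diff_sum / valid_sites if valid_sites else 0.0


def sliding_pi(seqs, window=600, step=200):
    """Return (0-based window midpoint, Pi) pairs along the alignment."""
    aln_len = len(seqs[0])
    return [
        (s + window // 2, window_pi(seqs, s, window))
        for s in range(0, max(aln_len - window, 0) + 1, step)
    ]


# Hypothetical toy alignment: three sequences, one variable site out of four.
toy = ["AAAA", "AAAT", "AAAT"]
print(sliding_pi(toy, window=4, step=2))
```

With the defaults, `sliding_pi(alignment)` uses the 600 bp window and 200 bp step given in the text; peaks in the returned values mark candidate highly polymorphic fragments.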

Adaptive Evolution and Substitution Rate
To analyze the rate of evolutionary change in the chloroplast genome of Fortunella venosa, the CDS sequences were extracted using Geneious with Citrus aurantifolia as the reference. The protein-coding sequences of the 10 Rutaceae species were extracted using PhyloSuite [53] and aligned with MAFFT, with stop codons removed automatically. PhyloSuite was used to construct the maximum likelihood phylogenetic tree. GTR was selected as the best-fit model, and no outgroup was specified. A total of 1000 bootstrap replicates were performed to construct the unrooted phylogenetic tree. The PAML file and Newick file were imported into EasyCodeML for selective pressure analysis. Using the PAML v4.7 package in the EasyCodeML software [54,55], positive selection pressure, the non-synonymous (dN) and synonymous (dS) substitution rates, and their ratio (ω = dN/dS) were evaluated for the plastomes of the 10 Rutaceae species. The site-specific models in the software (M0 vs. M3, M1a vs. M2a, M7 vs. M8, and M8a vs. M8) were compared. To evaluate the adaptive evolution of chloroplast genes, the likelihood ratio test (LRT) and ω were used to analyze the selection pressure on protein-coding genes in the 10 species.
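The likelihood ratio test used to compare nested site models can be written out as a short calculation: the statistic is 2(lnL_alt - lnL_null), compared against a chi-square distribution. The sketch below uses only the Python standard library. The degrees of freedom listed are the values conventionally used for these model pairs and are stated here as an assumption, and the log-likelihood values in the example are hypothetical, not results from this study.

```python
import math

# Conventional degrees of freedom for the site-model comparisons (assumption;
# M8a vs. M8 is often also tested against a 50:50 mixture distribution).
DEGREES_OF_FREEDOM = {
    "M0 vs. M3": 4,
    "M1a vs. M2a": 2,
    "M7 vs. M8": 2,
    "M8a vs. M8": 1,
}


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed form for df = 1 or even df."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("only df = 1 or even df are handled in this sketch")


def lrt(lnl_null, lnl_alt, df):
    """Return (test statistic, p-value) for a pair of nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for one gene under M7 (null) and M8 (alternative).
stat, p_value = lrt(-1000.0, -995.0, DEGREES_OF_FREEDOM["M7 vs. M8"])
print(stat, p_value)
```

A small p-value favours the alternative model that allows a class of sites with ω > 1; the sites themselves are then identified from posterior probabilities (NEB or BEB).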

Phylogeny
To determine the phylogenetic position and relationships of Fortunella venosa, a phylogenetic tree was reconstructed from 28 chloroplast genome sequences, with those of the other species downloaded from the NCBI database and Melia azedarach Linn. as the outgroup. The outgroup was chosen according to the current APG IV system of classification (http://www.mobot.org/MOBOT/research/APweb/, accessed on 24 July 2021) and the tree of life phylogeny (https://treeoflife.kew.org/tree-of-life, accessed on 24 July 2021). These 28 chloroplast genome sequences include Fortunella (2 species), Citrus (9 species), Zanthoxylum (11 species), Glycosmis (2 species), Micromelum (1 species), Clausena (1 species), Murraya (1 species), and Melia (1 species); detailed information is summarized in File S1. The sequences were aligned using MAFFT as integrated in PhyloSuite [53]. ModelFinder was used to find the best-fit model for phylogenetic tree reconstruction, and 1000 bootstrap replicates were performed for the Maximum Likelihood (ML) analysis. To check the accuracy of the tree built from the entire chloroplast genome, a second tree was constructed from the protein-coding genes (CDS). The CDS tree was constructed using the ML, PhyML, and BI methods with the 76 genes shared among the 28 species. Geneious was used to extract the CDS sequences, which were then concatenated using PhyloSuite. The sequences were aligned using MAFFT, and ModelFinder was used to find the best-fit model for both the BI and the IQ-TREE phylogenies. The total length of the CDS alignment data set was 22,688 amino acids. The reconstructed tree was visualized using FigTree version 1.4.4 [56].

Analysis of Chloroplast Genome Structure
Paired-end sequencing yielded a total of 8,768,734 reads of 150 bp in length, of which 3,244,455 reads were used for chloroplast genome assembly, accounting for 37% of the total. The base coverage of the reads used to assemble the chloroplast genome was 793.65×. The chloroplast genome of Fortunella venosa has been submitted to GenBank at the National Center for Biotechnology Information (NCBI) (accession No. MZ457935). The complete chloroplast genome of F. venosa was 160,265 bp in size. The plastome has a typical circular quadripartite structure consisting of a large single copy region (LSC, 87,597 bp), a small single copy region (SSC, 18,732 bp), and two inverted repeat regions (IRa and IRb, 26,968 bp each) (Figure 1 and Table 1). A total of 134 functional genes, including 89 protein-coding genes (CDS), 8 ribosomal RNA (rRNA) genes, and 37 transfer RNA (tRNA) genes, were detected in the F. venosa cpDNA (Table 1). The LSC region contains 62 CDS and 22 tRNAs, whereas the SSC region contains 12 CDS and 1 tRNA. The IR regions contain 18 CDS, 14 tRNAs, and 8 rRNAs (Figure 1). The total GC content of the F. venosa chloroplast genome was 38.4%. The IR regions had the highest GC content (43.0%), while the LSC and SSC regions had 36.7% and 33.2%, respectively. The total lengths of the protein-coding regions, tRNAs, and rRNAs were 79,983 bp, 2792 bp, and 9044 bp, respectively, accounting for 49.9%, 1.7%, and 5.6% of the total length of the chloroplast genome. There were 21 genes duplicated in the IR regions, including 10 protein-coding genes (rpl22, rps19, rpl2, rpl23, ycf2, ycf15, ndhB, rps7, rps12, ycf68), seven tRNA genes (trnI-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA-UGC, trnR-ACG, and trnN-GUU), and four rRNA genes (rrn16, rrn23, rrn4.5, and rrn5).
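The region lengths and GC values reported above can be cross-checked with simple arithmetic: the four regions should sum to the genome size, and the length-weighted mean of the regional GC contents should reproduce the whole-genome GC content. The sketch below is illustrative only and is not part of the authors' pipeline; `gc_content` is a hypothetical helper showing how the percentage would be computed from a sequence string.

```python
def gc_content(seq):
    """GC percentage of a nucleotide sequence (hypothetical helper)."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# (length in bp, GC %) for each region, as reported in the text.
regions = {
    "LSC": (87597, 36.7),
    "SSC": (18732, 33.2),
    "IRa": (26968, 43.0),
    "IRb": (26968, 43.0),
}

total_length = sum(length for length, _ in regions.values())
weighted_gc = sum(length * gc for length, gc in regions.values()) / total_length

# Share of the genome occupied by protein-coding, tRNA, and rRNA sequence.
shares = {
    name: 100.0 * bp / total_length
    for name, bp in {"CDS": 79983, "tRNA": 2792, "rRNA": 9044}.items()
}

print(total_length, round(weighted_gc, 1))
print({name: round(value, 1) for name, value in shares.items()})
```

The weighted mean comes out close to the reported whole-genome value because the two IR copies, which carry the GC-rich rRNA genes, make up about a third of the genome.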
There were 17 genes with introns: 15 genes had one intron (rps16, trnG-UCC, atpF, rpoC1, trnL-UAA, trnV-UAC, petB, petD, rpl16, trnI-GAU, trnA-UGC, ndhA, rps12, ndhB, rpl2), while two genes (ycf3, clpP) had two introns. Chloroplasts have maintained an autonomous genome that encodes important proteins required for photosynthesis and various housekeeping functions. According to their function, the genes can be divided into four categories, as shown in Table 2. Chloroplast genomes of different species vary in length, GC content, and even evolutionary rate. A comparison of the chloroplast genomes of the ten Rutaceae species is shown in Table 1.
Table 2. Genes present and functional gene categories in the F. venosa chloroplast genome.

Repeat Sequence Analysis
A total of 50 long repetitive sequences were detected in the chloroplast genome of Fortunella venosa by REPuter, including 22 forward repeats (F), 7 reverse repeats (R), 19 palindromic repeats (P), and two complementary repeats (C). In all the species, forward repeats were the most abundant, followed by palindromic repeats, and complementary repeats were the least abundant (Figure 2). Most of the repeat sites were located in the non-coding regions of the LSC, and some were located in rpoB, psaB, trnV-UAC, trnS-GCU, and trnL-UUA. Six repeat sites were located in the IR regions and two in the SSC region. Most of the repeat sequences were 30-40 bp in length; the longest was a 54 bp palindromic repeat located between trnH-GUG and psbA in the LSC region.
A total of 37 tandem repeats were detected by Tandem Repeats Finder, three of which were longer than 30 bp, while the others were between 1 bp and 26 bp. Of the repeat units, 20 contained mismatches and 10 had indels.

SSR Analysis
In this study, a total of 108 SSR loci were detected in the chloroplast genome of Fortunella venosa. Among them, 74 were mononucleotides, five were dinucleotides, 15 were trinucleotides, 11 were tetranucleotides, two were pentanucleotides, and one was a hexanucleotide (Figure 3). Most of these SSR loci (74.1%) were located in the LSC region. The results are basically consistent with those of the other nine species (Figures 4 and 5). Furthermore, 88.9% of the 108 SSRs were located in non-coding regions, and the remaining 11.1%, in coding regions, were located in the LSC region.
Comparison of repeated sequences in ten Rutaceae chloroplast genomes (type and abundance of long repetitive sequences). Note: C represents complementary repeats, F represents forward repeats, R represents reverse repeats, and P represents palindromic repeats.

Codon Usage Analysis
A total of 89 CDS of the chloroplast genome of Fortunella venosa were used to estimate relative synonymous codon usage. A total of 26,699 codons were detected, of which 2844 (10.65%) encoded leucine and 315 (1.18%) encoded cysteine, the most and the least abundant amino acids in the chloroplast genome of F. venosa, respectively. The most used codon was AUU (1071), encoding isoleucine, and the least used codon was AUG (1), encoding methionine. The analysis of relative synonymous codon usage (RSCU) in the plastome showed that codon usage was biased, with 30 codons having RSCU > 1. Three codons, AUG (methionine), UCC (serine), and UGG (tryptophan), showed no usage bias (RSCU = 1.00). Among the three stop codons, usage was biased towards UAA (RSCU > 1.00). The relative synonymous codon usage of F. venosa is shown in Table 3.
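For reference, RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following sketch computes it from a list of in-frame coding sequences using the standard genetic code; it is not the MEGA6 implementation used in the study, and the function name `rscu` and the toy sequence are hypothetical. Because methionine and tryptophan are each encoded by a single codon, their RSCU is always 1 by definition.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for every codon of each amino acid that is observed."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values


# Hypothetical toy CDS: Met, Leu (TTA), Leu (TTA), Leu (TTG), Trp, stop (TAA).
toy_values = rscu(["ATGTTATTATTGTGGTAA"])
print(toy_values["TTA"], toy_values["TTG"], toy_values["ATG"])
```

Values above 1 mark codons used more often than expected under equal usage, and values below 1 mark under-used codons.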

Comparative Genome and Sequence Divergence Analysis
In general, the sequence sizes of these species are similar, ranging from 159,893 bp to 160,996 bp in length. As shown in Figure 6, the sequence similarity is very high, indicating that the chloroplast genome is highly conserved, with no translocation or inversion of genes (see File S2). In the 10 plastomes, the IR regions were more conserved than the LSC and SSC regions. Similarly, the coding regions were more conserved than the non-coding regions. The relatively variable non-coding regions include trnA(GUG)-psbA, psbL-trnG(UGG), petN-psbM, psbE-trnM(CAU), trnL(UAA)-trnF(GAA), and ndhC-trnV(UAC). These regions may undergo rapid nucleotide substitution at the species level, indicating their potential value as molecular markers in phylogenetic analysis and plant identification (Figure 6).
Forests 2021, 12, x FOR PEER REVIEW
Figure 6. DNA sequence comparison of the ten species of Rutaceae. VISTA-based identity plot showing sequence identity among ten Rutaceae species using Fortunella venosa as a reference.
Although the chloroplast genomes are generally conserved in length and gene structure, they still show significant differences in the IR/SC boundary regions (Figure 7). The genes at the borders include rps3, rpl22, ndhF, ycf1, and trnH. The expansion and contraction of the border regions were analyzed for the 10 species. For example, the rpl22 gene in Citrus aurantium, C. cavaleriei, C. hongheensis, C. limon, and C. sinensis is located in the IRb region at a distance of 7 bp, 6 bp, 7 bp, 7 bp, and 7 bp from the boundary, respectively. In the other species, rpl22 spans the LSC and IRb regions. The situation at the LSC/IRa boundary also differs: the rpl22 gene is missing there in C. maxima, C. medica, and Fortunella venosa. The ndhF gene crosses the border between IRb and SSC, with only 2 bp in IRb and 2200 bp in SSC in C. medica, compared with 31 bp and 2201 bp in the other species. The trnH gene at the IRa/LSC border lies within the LSC, at a distance of 2-65 bp from the border. Ycf1 was lost at the IRb/SSC boundary in C. medica and F. venosa. The length of ycf1 at the boundary between SSC and IRa ranges from 5490 bp to 5505 bp. These results indicate contraction and expansion of the IR regions, which can be used in the study of species-specific gene loci.
The DnaSP v.5.0 results showed that the regions with high nucleotide polymorphism were the LSC and SSC regions, which was basically consistent with the mVISTA results (Figure 6). The highly polymorphic loci were trnG-GCC, trnfM-UAA, ndhJ, rpl2, rpl23, trnL-CAU, ccsA, ndhD, ycf1, trnN-GUU, and trnR-AGG, with Pi values above 0.01 (Figure 8). The highest Pi value, 0.01563, was recorded for the genes rpl2 and rpl23. Data on specific nucleotide polymorphisms are provided in File S3.

Adaptive Evolution Analysis
The dN/dS value can be used to measure the evolutionary rate of a specific gene [57]. It is the ratio of the nonsynonymous substitution rate (dN) to the synonymous substitution rate (dS) (ω = dN/dS). In selection pressure analysis, ω > 1 indicates positive selection, ω = 1 indicates neutral selection, and ω < 1 indicates purifying selection. In this study, EasyCodeML identified M7 vs. M8 as the most suitable model comparison. A total of 344 positively selected sites were detected in 79 CDSs of the ten species (see File S4). Among them, Naive Empirical Bayes (NEB) detected 54 sites in 15 genes, accounting for 18.99% of the total number of genes; rpoC2 had the largest number of sites (27). Bayes Empirical Bayes (BEB) detected 290 positively selected sites in 53 genes, accounting for 67.09% of the total number of genes, and rpoC2 again had the most sites (57). In NEB, the photosynthesis-related gene ndhI (2) and the self-replication genes rpoC2 (8), rps2 (1), and rps18 (1) had sites with posterior probability (p) > 0.99. In BEB, the photosynthesis-related genes ndhB (1), ndhI (2), and psbZ (1) and the self-replication genes rpoC2 (8), rps18 (1), rps19 (1), and rps2 (1) had sites with p > 0.99 (Table 4). These results suggest that the 10 species were under strong positive selection pressure, with faster nucleotide substitution rates and strong adaptive variation to their environment.

Phylogenetic Analysis
The CDS phylogenetic tree is shown in Figure 9. Zanthoxylum was clustered into one branch. Glycosmis, Micromelum, Clausena, and Murraya were closely related and formed a single clade. Fortunella venosa and F. japonica were clustered together and were closely related to the genus Citrus. The two Fortunella species were found to be closely related, and the genera Fortunella and Citrus were likewise closely related. The whole chloroplast genome tree is shown in Figure 10. In this tree, more than 95% of the branches have a support value of 100%, which supports a close relationship among the species. However, one Citrus branch has a support value of 55.6%, and its phylogenetic status needs further study. Overall, the relationships are basically consistent with those reconstructed from the CDS (Figure 9).
The chloroplast genome sequence provides an important resource for phylogenetic research. To reach more detailed phylogenetic conclusions, more complete chloroplast genomes of Fortunella are needed. As F. venosa is a highly primitive member of this genus, the characteristics of its complete chloroplast genome are indispensable and will subsequently be used in phylogenetic studies of Citrus taxa.

Discussion
In this study, we analyzed the complete chloroplast genome of Fortunella venosa and performed a comparative study across 10 Rutaceae species. The chloroplast genome of Fortunella venosa is a circular structure with a size of 160,265 bp, which is similar to the size reported for other related species [58,59]. All 10 complete chloroplast genomes of the Rutaceae species displayed attributes similar to those of other angiosperm chloroplast genomes, with a quadripartite structure including the LSC, the SSC, and a pair of inverted repeats (IRa and IRb). Although there were no genomic rearrangements and gene order was highly conserved, the chloroplast genomes differed in size, ranging from 160,229 bp to 160,265 bp in the genus Fortunella and from 159,893 bp to 160,996 bp in the genus Citrus, suggesting some small genetic differences among the genomes. Previous studies have reported that gene loss [60], variation in the inverted repeat regions [61], and variation in the intergenic spacer regions [62] are the three fundamental causes of variation in chloroplast genome size in plants. Additionally, chloroplast gene length has also been associated with genome size [63].
Repetitive sequences play an important role in genome rearrangement and are very helpful for phylogenetic studies [64]. In addition, analysis of various chloroplast genomes has shown that repetitive sequences are essential for the study of indels and substitutions [65]. All ten Rutaceae species had reverse, complementary, forward, and palindromic repeats, which varied in number among the species. The number of repeats found within the chloroplast genomes indicates that Fortunella venosa and F. japonica are more similar to each other than to the Citrus species. Studies have linked sequence variations and genome rearrangements to slipped-strand mispairing and improper recombination of repeat sequences [66]. Simple sequence repeats (SSRs), also known as microsatellites, are repeated DNA sequences with motif lengths of about 1-6 bp that are widely distributed across the chloroplast genome [67]. The inheritance of cpDNA in higher plants is mostly maternal, and the structure of cpDNA is relatively conserved and simple; hence cpDNA SSRs are widely used as molecular markers in variety identification and other molecular studies [68]. For example, SSR analysis is important for DNA markers used in population genetic and evolutionary studies [69][70][71]. In this study, we analyzed the SSRs in the chloroplast genomes. Six categories of perfect SSRs (mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide repeats) were detected in the chloroplast genomes of the ten species. In recent years, growing evidence has shown that the repetitive structure of genomic DNA is not only important in plant molecular research [72] but also widely used in the study of the population genetics of species [73][74][75]. SSRs have the advantages of a high mutation rate, site specificity, and multiple alleles [76,77], and can therefore be used for genetic diversity analysis [78,79].
Relative synonymous codon usage (RSCU) values in chloroplast genomes have been shown to result from mutation and selection [80,81], which are crucial in the study of the evolution of organisms. RSCU > 1 indicates a preference for the codon, RSCU < 1 indicates low usage of the codon, and RSCU = 1 indicates no preference [82]. Codon usage was biased, with 30 codons having RSCU > 1. Three codons, AUG (methionine), UCC (serine), and UGG (tryptophan), showed no usage bias (RSCU = 1.00). Among the three stop codons, usage was biased towards UAA (RSCU > 1.00). This is basically consistent with reports on other chloroplast genomes in Rutaceae [58,59].
Comparative analysis of the 10 plastomes showed that the IR regions were more conserved than the LSC and SSC regions. Similarly, the coding regions were more conserved than the non-coding regions. This is a common phenomenon in most angiosperms [83]. The size and position of genes at the boundary regions vary among the species, especially for genes such as rpl22, ndhF, and ycf1. These changes in the sizes and positions of genes alter the size of the genome; hence there is variation in length and gene number among the species. Expansion and contraction at the borders of the IR regions are considered important determinants of chloroplast genome size and play a vital role in its evolution [84]. Nucleotide diversity among the 10 Rutaceae genomes was calculated, giving an average nucleotide diversity of 0.00252 (Supplementary File S4). This was higher than in previous studies that compared species-level and interspecific nucleotide diversity [85]. Most of the nucleotide diversity sites occurred in the LSC and SSC regions, with the highest peaks at rpl2/rpl23/trnL-CAU (0.016) and ycf1/trnN-GUU/trnR-AGG (0.015). This shows that levels of nucleotide diversity are low throughout the chloroplast genome.
The genus Fortunella includes four species of "kumquats" from eastern Asia (China, Hong Kong, and the Malay Peninsula). It is traditionally separated from Citrus by quantitative characters, namely 3-7 (versus 8-18) locules in the ovary with two (versus 4-12) ovules per locule, and by smaller fruits. In other vegetative, floral, and fruit characters, Fortunella is quite similar to Citrus, including the polyadelphous androecium with numerous stamens cohering in bundles, a character more commonly found in Citrus subgenus Citrus. The results of this study (Figures 9 and 10) indicate that Fortunella Swingle and Citrus L. are closely related, but do not support the incorporation of F. venosa into C. japonica. Complete chloroplast genomes have been shown to provide informative sites for resolving the phylogenetic relationships of plants, and have also proved effective for differentiation at lower taxonomic levels [86]. The ML, BI, and PhyML trees all showed very high support in our study.
This study shows that F. venosa should be treated as an independent species, which differs significantly from F. japonica in morphology (habit, leaf type, fruit size, etc.). F. venosa is a small shrub, usually no more than 1 m tall (the shortest mature plant is 0.28 m high); the leaves are simple, with no joint between the petiole and the blade; the leaves are usually small, 2-4 (−7) cm long and 1-2 cm wide, with a wedge-shaped base; the petiole is short, 1-3 (−5) mm long; the flowers are solitary in the leaf axils, with petals 3-5 mm long; the ovary has 2-3 locules; the fruit is spherical or elliptical, 6-8 mm in diameter, and orange-red when mature. F. japonica, on the other hand, is a small tree or tree 2 to 8.5 m tall, and the main stem is usually slender; the leaf is a single leaflet with a joint between the petiole and the blade; the leaf is larger, 4-9 cm long and 1.5-3.5 cm wide, with a broadly wedge-shaped base; the petiole is clearly longer, 6-10 (−15) mm long; the flowers are solitary or in clusters of 2-3 in the leaf axils, with petals 6-8 mm long; the ovary has 4-6 locules; the fruit is larger, spherical, 1.5-2.5 cm in diameter, and orange-yellow to orange-red when ripe (Figure 11). Because of the significant morphological differences between F. venosa and F. japonica, the two are easy to distinguish in the wild. The molecular results obtained in this study provide strong support for the independent taxonomic status of F. venosa. In this paper, we continue to use F. japonica and F. venosa as the scientific names, following the Flora of China, for clarity of discussion. In addition, none of the research to date based on morphology, palynology, or molecular biology supports the incorporation of F. venosa into C. japonica, indicating that the two species are independent.

Conclusions
This paper reports the first complete chloroplast genome sequence of Fortunella venosa. It provides more detailed and complete information, laying a foundation for species identification in the genus Fortunella and for the analysis of genetic differences at the individual level. In Rutaceae, Fortunella is phylogenetically related to Citrus, but the interspecific relationships are complicated. This study confirmed that the molecular phylogeny supports F. venosa as an independent species. Hence, the chloroplast genome provides an important basis for the study of systematic classification. To better resolve the systematic classification of Fortunella, more cpDNA sequence information for the genus is needed. Furthermore, the variation among the chloroplast genomes of Fortunella and Citrus species provides a means of distinguishing the species in future studies. The study of chloroplast genes is of great significance in revealing the mechanisms and metabolic regulation of plant photosynthesis. At the same time, in-depth study of the chloroplast genome helps to clarify the mutual regulation between the nuclear and chloroplast genomes and, because the chloroplast is a semi-autonomous organelle, also benefits research on energy conversion.