The Complete Chloroplast Genome Sequence of Laportea bulbifera (Sieb. et Zucc.) Wedd. and Comparative Analysis with Its Congeneric Species

Laportea bulbifera (L. bulbifera) is an important medicinal plant of Chinese ethnic minorities, with high economic and medicinal value. However, the medicinal materials of the genus Laportea are prone to misidentification because the source plants have similar morphological characteristics. Thus, it is crucial to discover molecular markers that allow these species to be identified precisely for their exploitation and conservation. Here, we report detailed information on the complete chloroplast (cp) genome of L. bulbifera. The cp genome of L. bulbifera is 150,005 bp long and contains 126 genes, including 37 tRNA genes and 81 protein-coding genes. Repeat analysis showed that palindromic repeats were the most frequent type. In addition, 39 SSRs were identified, the majority of which were mononucleotide adenine/thymine (A/T) repeats. Furthermore, we compared L. bulbifera with eight published Laportea plastomes to explore highly polymorphic molecular markers. The analysis identified four hypervariable regions: rps16, ycf1, trnC-GCA, and trnG-GCC. According to the phylogenetic analysis, L. bulbifera was most closely related to Laportea canadensis (L. canadensis), and the molecular clock analysis suggested that the species originated about 1.8216 Mya. Overall, this study provides a more comprehensive analysis of the evolution of L. bulbifera from the perspective of phylogeny and intrageneric molecular variation in the genus Laportea, which provides a scientific basis for further identification, taxonomic, and evolutionary studies of the genus.


Introduction
Chloroplasts (cp) are organelles unique to green plants; they carry out photosynthesis and provide the energy necessary for plant growth and development [1]. The chloroplast genome is highly conserved in genome size, structure, gene content, and organization [2]. It is considered one of the indispensable tools for phylogenetic analysis and molecular identification. At present, numerous studies have used plastid information to explore the phylogenetic relationships, origin and evolution, and patterns and rates of nucleotide substitution among land plants [2][3][4]. These studies have demonstrated that angiosperms differ significantly in genome size, genome structure, and gene substitution rates. Consequently, it is of interest to use plastid data to understand the evolutionary relationships of plants within the same genus and to analyze their differences from a biological viewpoint.
The genus Laportea is an essential group in the Urticaceae family, comprising around 28 species that are mainly distributed in tropical and temperate regions, including Asia, North and South America, Africa, and the Pacific islands [5][6][7]. They are mostly perennial online site (https://www.cloudtutu.com/, accessed on 8 July 2022). In addition, the GC contents of each gene and plastome were calculated by using CGView Server [24].

Repeat Sequences, Codon Usage and RNA Editing Sites Analysis in L. bulbifera Chloroplast Genome
The SSRs were detected by using MIcroSAtellite (MISA), with parameter settings following Beier et al. [25]. We used the REPuter online platform to identify forward, reverse, palindromic, and complement repeats [26]. In addition, we used PhyloSuite v1.2.2 to extract the protein-coding sequences (CDS), and then obtained the relative synonymous codon usage (RSCU) values of L. bulbifera with CodonW version 1.4.2 [27,28]. The RSCU was calculated as the ratio of a codon's observed frequency to the frequency expected if all synonymous codons for the same amino acid were used equally. An RSCU value lower than 1 indicates that the codon is used less frequently than expected; conversely, a value higher than 1 indicates that the codon is used more frequently than expected [29]. Additionally, the possible RNA editing sites in the CDS of the cp genome were detected by using the Predictive RNA Editor for Plants (PREP), with a threshold value of 0.8 [30].
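The RSCU calculation can be illustrated with a minimal Python sketch. It is not the CodonW implementation used in this study; it assumes the standard genetic code, a list of in-frame CDS strings, and a hypothetical function name `rscu`. Unlike some tools, it also reports single-codon families (Met, Trp) and stop codons, for which the value is trivially 1 or simply reflects stop-codon choice.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for the codons of amino acids seen in cds_list.

    RSCU = observed codon count / (family total / number of synonymous codons).
    """
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    # Toy CDS: three TTT and one TTC codon, both encoding phenylalanine.
    toy = rscu(["TTTTTTTTTTTC"])
    print(toy["TTT"], toy["TTC"])  # 1.5 0.5
```

In this toy input, TTT is used more often than expected under equal usage (RSCU above 1) and TTC less often (RSCU below 1), which matches the interpretation given above.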

Genome Comparison
To identify interspecific variation, mVISTA was used to compare the plastomes of nine Laportea species [31]. Then, sliding window analysis was performed using DnaSP v6.0 to calculate the nucleotide diversity (Pi), with parameter settings following the method of Rozas et al. [31,32]. Finally, the IR boundaries in these genomes were visualized using the IRscope online tool.
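As an illustration of what the sliding window analysis computes, the following Python sketch estimates nucleotide diversity (Pi) as the mean number of pairwise differences per site within each window of an alignment. It is not DnaSP itself: the function names are hypothetical, the window length and step size are placeholders (the values used in this study are not stated in this section), and columns containing gaps or ambiguous bases are simply skipped.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site over columns with only A/C/G/T."""
    columns = [col for col in zip(*seqs) if all(b in "ACGT" for b in col)]
    if not columns or len(seqs) < 2:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


def sliding_pi(seqs, window, step):
    """Return [(0-based window midpoint, Pi)] along an aligned set of sequences."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window].upper() for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    # Toy alignment of three sequences; the second window is the variable one.
    alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
    for midpoint, pi in sliding_pi(alignment, window=4, step=4):
        print(midpoint, round(pi, 4))
```

Windows with high Pi relative to the rest of the genome are the candidates for hypervariable regions; the threshold for calling a region hypervariable is a separate choice that this sketch does not make.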

Molecular Clock Analyses
The divergence time of Laportea was estimated with the Bayesian method implemented in BEAST (version 1.10.1). Fossil information from Zhengyia shennongensis (35 Mya) and Girardinia suborbiculata (13 Mya) was used for calibration, under an uncorrelated log-normal relaxed clock model [39][40][41]. First, the combined nuclear genes and chloroplast fragments were loaded into BEAUti to generate an ".xml" file that can be imported into BEAST, and the ages of the corresponding branches were calibrated with the ages of the fossils. Then, the tree prior was set to Yule speciation, and the analysis was run for 20,000,000 generations, saving a tree every 1000 generations. The "trees" file generated by BEAST was imported into TreeAnnotator, and the posterior probability limit was set to 0.5 to generate the maximum clade credibility (MCC) tree [42]. Finally, the MCC tree was imported into FigTree (v1.4.3) to display the results.

Analysis of Repeat Sequences and Codon Usage
Repeat analysis of the L. bulbifera plastome detected 39 SSRs with lengths ranging from 17 to 38 bp. These comprised 34 mononucleotide repeats (A/T), accounting for 87.20%, and just two dinucleotide repeats (AT). Furthermore, four different forms of interspersed repeats were detected: 20 palindromic repeats (17-38 bp), 16 forward repeats (18-37 bp), 10 reverse repeats (17-20 bp), and three complement repeats (18 bp) (Figure 2a and Table S1). Subsequently, we calculated the codon usage frequency (RSCU values) from the protein-coding gene sequences (Table S2). The result showed that 32 codons had RSCU values greater than 1 and that all of them except UUG and GGG ended with A/U (Figure 2b).
In total, 54 RNA editing sites were identified in 12 genes of the L. bulbifera cp genome. The ndhB gene had the most editing sites, while some CDSs had only one editing site (such as atpA, atpF, clpP, and matK). Furthermore, almost all editing sites underwent a conversion from cytosine to uracil (C-U) at the first or second codon position, and no editing sites were found at the third codon position. Among the resulting amino acid changes, the conversion of serine (S) to leucine (L) was the most frequent. Other predicted editing events involved phenylalanine (F), histidine (H), tyrosine (Y), methionine (M), proline (P), and valine (V) (Table S3).
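To make the SSR search concrete, here is a minimal Python sketch of a perfect-repeat scan in the spirit of MISA. It is not the MISA program, and the minimum repeat thresholds in `MIN_REPEATS` are assumptions chosen for illustration, not the parameter settings used in this study. The function names are hypothetical, and compound or interrupted SSRs are not handled.

```python
# Assumed thresholds: motif length -> minimum number of repeats (illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, repeat count) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and not is_compound(motif):
                # Extend the run while the next k bases repeat the motif.
                j = i + k
                while seq[j:j + k] == motif:
                    j += k
                repeats = (j - i) // k
                if repeats >= n:
                    hits.append((i + 1, motif, repeats))
                    i = j  # continue after this SSR
                    continue
            i += 1
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence with one poly-A run and one (AT)5 repeat.
    toy = "GC" + "A" * 10 + "GC" + "AT" * 5 + "GC"
    print(find_ssrs(toy))  # [(3, 'A', 10), (15, 'AT', 5)]
```

Counting the hits by motif length gives the kind of summary reported above (numbers of mononucleotide and dinucleotide repeats), although the exact counts depend on the thresholds chosen.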
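The effect of a C-to-U edit on the encoded amino acid can be checked with a short Python sketch. It assumes the standard genetic code and uses a hypothetical helper, `c_to_u_edits`; it only enumerates possible edits within one codon and does not reproduce the PREP prediction, which relies on comparison with homologous proteins.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def c_to_u_edits(codon):
    """List (codon position, amino acid before, amino acid after) for each
    single C-to-U edit that changes the encoded amino acid."""
    codon = codon.upper().replace("U", "T")  # work in DNA notation
    before = CODON_TABLE[codon]
    changes = []
    for pos, base in enumerate(codon, start=1):
        if base == "C":
            edited = codon[:pos - 1] + "T" + codon[pos:]
            after = CODON_TABLE[edited]
            if after != before:
                changes.append((pos, before, after))
    return changes


if __name__ == "__main__":
    print(c_to_u_edits("UCA"))  # [(2, 'S', 'L')]  serine to leucine
    print(c_to_u_edits("CCA"))  # [(1, 'P', 'S'), (2, 'P', 'L')]
    print(c_to_u_edits("UUC"))  # []  third-position edit is synonymous
```

In the standard genetic code, a C-to-U change at the third codon position never alters the amino acid, which is one plausible reason why a predictor that works from amino acid changes reports no third-position sites.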

Sliding Window Analysis
To better understand the genetic diversity, sliding window analysis with the DnaSP program was used to identify highly variable regions in the Laportea cp genomes.

Boundaries of IR
The expansion and contraction of IR boundaries, which are common in cp genomes, can reflect length diversity and evolutionary events in plastid genomes [48]. Here, we compared the sizes and junctions of the LSC, SSC, and IR regions among Laportea species. The length of the IR regions varied from 24,955 to 27,435 bp, and several genes spanned or lay close to the boundaries between the IR and SC regions (Figure 4), including rps19, rpl2, rpl22, ycf1, ndhF, and trnH. The rps19 gene spanned the LSC/IRb border in most species, with 6 bp in the LSC region and 51-131 bp in the IRb region; the exceptions were Laportea ovalifolia (L. ovalifolia), Laportea cuspidata (L. cuspidata), and L. aestuans, in which rps19 was situated within the IRb or the LSC region. The ndhF gene spanned the junction of the IRb and SSC regions in all species except Laportea grossa, in which it was located within the SSC region; it extended 2197-2250 bp into the SSC region and 2-80 bp into the IRb region. The complete ycf1 gene spanned the boundary between SSC and IRa, with 2529-5322 bp in the SSC region and 195-3035 bp in the IRa region. The trnH gene was generally situated near the IRb/LSC interface, at a distance of 1-24 bp from the boundary.

Gene Selective Pressure Analysis
The dN/dS ratio can provide insight into the evolution of DNA sequences by revealing diversifying selection among related species [49][50][51][52][53]. In this study, the majority of the 62 shared genes had dN/dS ratios smaller than 1 (Figure S2), which suggests that they are under purifying selection. Five genes (clpP, psaI, rps15, ycf1, and ycf2) had higher dN/dS ratios, ranging from 0.6 to 1.0. These genes may therefore have some potential for positive selection, which may accelerate the future evolution of Laportea species (Figure 5). In general, the low substitution rates of Laportea plastids indicate that the PCGs are highly conserved.

Phylogenetic Analysis and Divergence Time Estimation
Four phylogenetic trees (ML, UPGMA, ME, and NJ) of Urticaceae were constructed based on the 115 CDSs. The trees had high support values overall, and the taxonomy of species in the genus Laportea appeared unresolved, with most of them scattered among Urera, Poikilospermum, Girardinia, and Naporea. Specifically, the Laportea species clustered into four to five branches. In both the ML and ME trees, L. bulbifera, L. canadensis, and L. medogensis formed one branch; L. mooreana, L. ovalifolia, L. aestuans, and L. grossa formed another; and L. decumana and L. cuspidata each formed a separate branch. In the NJ and UPGMA trees, L. grossa was placed as a separate lineage (Figure 6 and Figure S4). The relationships recovered for Urticaceae in this study are generally consistent with previous studies [10,17]. Subsequently, to explore the affinities within Laportea, we constructed four phylogenetic trees for the genus. All nodes had the highest bootstrap support, and Laportea could be grouped into two clades: one clade included four species (L. grossa, L. mooreana, L. ovalifolia, and L. aestuans), whereas the other included the remaining species. Phylogenetic analysis at both the family and genus levels indicated that L. bulbifera was closely related to L. canadensis (Figure 7). Furthermore, the divergence time of each internal node of the phylogenetic tree was estimated in BEAST, calibrated with fossil record data of Zhengyia shennongensis and Girardinia suborbiculata, to further infer the historical origin of Laportea species. The Laportea genus was estimated to have diverged about 157.3396 million years ago (Mya), with L. bulbifera splitting at about 1.8216 Mya (Figure 8). These divergence time estimates may contribute to future research on the Laportea genus.

Discussion
The Urticaceae family is diverse, with approximately 54 genera and ~2600 species worldwide, and is divided into six groups overall, including Boehmerieae, Cecropiaceae, Elatostemateae, Forsskaoleae, Parietarieae, and Urticeae [54]. Molecular studies to date support Urticaceae as a well-supported clade, but the majority of this research has focused on the family or tribe level, with relatively few investigations of congeneric species [55]. Laportea is a genus of the tribe Urticeae, and phylogenetic analysis demonstrates that Laportea comprises multiple lineages scattered among the Urera and Poikilospermum clades [8,56]. In this study, we reconstructed the phylogeny of the Urticaceae family, and the results were consistent with those of previous studies, showing that Laportea was mostly dispersed among Urera, Poikilospermum, Girardinia, and Naporea. However, to date, no studies have targeted the phylogenetic relationships and intrageneric differences of this genus.
Here, we first sequenced, assembled, and annotated the chloroplast genome of L. bulbifera. Then, the complete plastid data of eight Laportea species were downloaded from NCBI and aligned and concatenated using the MAFFT online service. Subsequently, the genetic differences, highly variable regions, and evolutionary history of Laportea were analyzed using bioinformatics tools such as the mVISTA and IRscope online tools and the MEGA and DnaSP software. The results indicated that the plastomes of Laportea, with sizes ranging from 149,149 bp to 161,930 bp, exhibited the quadripartite structure typical of angiosperms. The cp genome had an uneven distribution of GC content, with the IR regions having a higher GC content than the LSC and SSC regions. A possible reason for this phenomenon is that the IR regions are enriched in GC-rich ribosomal RNA (rRNA) and transfer RNA (tRNA) genes. At the same time, the greater conservation of the IR regions compared to the SC regions may also be related to this GC imbalance [57,58]. In addition, previous studies have demonstrated that changes in IR/SC junctions are thought to be one of the main drivers of the size diversity of cp genomes in higher plants [59,60]. Changes in the length of the cp genome may be mostly caused by the contraction and expansion of IR border regions [48]. Genes located at the borders may shift into the IR or SC regions as the IR boundaries expand or contract (e.g., rps19, ycf1, and trnH shown in Figure 4). The result is consistent with previous findings, both indicating that the cp genomes of species within the same family or genus are highly conserved [17].
SSRs are a type of genetic marker that can reveal information about an individual and are composed of tandem repeats of motifs 1-6 nucleotides long [61,62]. The SSR analysis revealed that mononucleotide repeats were the most abundant SSR type in the L. bulbifera chloroplast genome, most of which were poly T and A. This is consistent with earlier research indicating that mononucleotides are the most abundant type of SSRs and that a majority of these loci are located in the noncoding regions, as in most angiosperms [63][64][65][66]. All in all, the SSR resource established here will be beneficial for evolutionary and ecological studies of Laportea.
This study is the first to compare Laportea plastid genomes and to estimate dN/dS ratios, using the mVISTA online tool and the PAML v4.9 program, in order to reveal the interspecific diversity of plastid genomes in Laportea. The noncoding regions of the Laportea plastids displayed higher polymorphism than the coding regions, as in most angiosperms [67,68] (Figure 5 and Figure S2). The hypervariability analysis is also consistent with this viewpoint. In addition, the results suggest four hypervariable regions, rps16, trnC-GCA, trnG-GCC, and ycf1, as potential molecular markers for Laportea. Subsequently, to verify whether the above-mentioned genes can be used as molecular markers, we extracted these protein-coding genes and reconstructed the phylogenetic tree to explore their taxonomic relationship. The results are consistent with the tree previously constructed from 78 protein-coding genes: the species of Laportea again clustered into two branches, and L. bulbifera and L. canadensis again had the closest relationship (Figure S3). Thus, we can reasonably speculate that genes such as ycf1, rpoC2, and clpP may serve as identification markers for the evolution of Laportea. This is also consistent with previous studies in Urticaceae that reported the ycf1 locus as a highly variable gene, which has critical implications for the effective identification and wise utilization of medicinal taxa in this genus [69,70]. However, this study only offers speculation from the perspective of phylogenetic analysis, and, owing to the limited sample, specific molecular experimental verification is left to subsequent studies.
Notably, L. canadensis is a clonal, monoecious, perennial herb common in North American wetland and floodplain forests and has now been introduced in most countries [71]. At present, some studies have demonstrated that L. canadensis has good efficacy in the treatment of skin diseases, similar to stinging nettle [7]. L. bulbifera is used clinically mainly for analgesic and anti-inflammatory purposes and for the treatment of rheumatic diseases. However, the similar morphological characteristics of the two plants and their close kinship hinder their clinical medicinal identification to some extent. Thus, mining molecular markers that distinguish the two is crucial for solving the problem of medicinal confusion within the genus Laportea. There are relatively few studies on the medicinal components and pharmacologically active substances of Laportea. Here, we suggest that scholars consider L. bulbifera as a pioneer in bioprospecting to further promote the clinical medicinal development of this genus.
The complete cp genome can provide a wealth of resources for inferring phylogenetic and evolutionary relationships [72]. The phylogenetic relationship of Laportea has long been controversial. Most of the previous reports focused on a single cp genome [73,74], or analyzed the phylogeny from the perspective of the entire Urticaceae [17], but there was no specific plastid genome analysis of Laportea. In this study, the ethnomedicine L. bulbifera was taken as an example to interpret the composition of its cp genome from the perspective of plant evolution, and to explore its genetic relationships and taxonomic identification markers. The results show that the phylogenetic tree has high support at each node, which lends confidence to the inferred phylogenetic relationships and lays a foundation for resolving the controversial issues of the taxonomic status and evolutionary relationships of Laportea.

Conclusions
To search for molecular markers that resolve the confusion within the genus Laportea and to further rationalize the exploitation of the ethnomedicine L. bulbifera, we sequenced and reported the plastid genome of L. bulbifera, reconstructed intrageneric relationships, analyzed codon usage, compared the sequence divergence of this genus, and estimated divergence times for the first time. Overall, the chloroplast genome of L. bulbifera contains 39 SSRs, 54 RNA editing sites, and 44,746 codons. The plastomes of Laportea species exhibit the typical circular quadripartite structure, similar to most angiosperms, with sizes ranging from 149,149 bp to 161,930 bp, and the characteristics of the L. bulbifera plastome are similar to those of other species of the genus. The genome comparison revealed that rps16, trnC-GCA, trnG-GCC, and ycf1 can be considered potential molecular markers for Laportea. The phylogenetic analyses show that L. bulbifera and L. canadensis are most closely related, with L. bulbifera originating about 1.8216 Mya. Moreover, three genes with large differences and eight intergenic spacer regions were detected, which also lays a theoretical foundation for the identification of Laportea plants. This research provides valuable cp genome resources for L. bulbifera and has important implications for investigating the evolution of the Laportea genus and for the discrimination of medicinal products.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes13122230/s1, Table S1. Characteristics of the dispersed repeat sequences in the chloroplast genome; Table S2. Codon usage in this chloroplast genome; Table S3. Predicted RNA editing sites in the L. bulbifera chloroplast genome; Figure S1. Sequence alignment of Laportea chloroplast genomes performed using the mVISTA program with L. aestuans as a reference; Figure S2. The dN/dS values of each plastid gene in the Laportea species, shown as box plots; Figure S3. The phylogenetic tree of species from Laportea based on the nucleotide sequences of 8 CDSs using the maximum likelihood (ML) method; Figure S4. The phylogenetic trees of species from Urticaceae based on 115 CDSs using the minimum evolution (ME, Figure S4a), the unweighted pair group method with arithmetic mean (UPGMA, Figure S4b), and the neighbor-joining (NJ, Figure S4c) methods.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.