Complete Chloroplast Genome of Hygrophila polysperma (Acanthaceae): Insights into Its Genetic Features and Phylogenetic Relationships

Chin, Li-Xuan; Huang, Qiurui; Fan, Qinglang; Tan, Haibo; Li, Yuping; Peng, Caixia; Deng, Yunfei; Li, Yongqing

doi:10.3390/horticulturae11101240


by Li-Xuan Chin ¹,², Qiurui Huang ³, Qinglang Fan ³, Haibo Tan ¹,², Yuping Li ¹,², Caixia Peng ¹,², Yunfei Deng ¹,² and Yongqing Li ¹,²,*

¹ State Key Laboratory of Plant Diversity and Specialty Crops, Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China

² University of Chinese Academy of Sciences, Beijing 100049, China

³ Huamei International School, Guangzhou 510520, China

* Author to whom correspondence should be addressed.

Horticulturae 2025, 11(10), 1240; https://doi.org/10.3390/horticulturae11101240

Submission received: 6 August 2025 / Revised: 6 October 2025 / Accepted: 8 October 2025 / Published: 14 October 2025

(This article belongs to the Special Issue Horticultural Plant Genomics and Quantitative Genetics)


Abstract

Hygrophila polysperma is an amphibious plant of the family Acanthaceae. Here, we report its first complete chloroplast (cp) genome. The complete cp genome is 146,675 bp in length with a GC content of 38.3%. The genome contains 130 genes, including 86 protein-coding genes, 36 tRNA genes, and 8 rRNA genes. Simple sequence repeat (SSR) analysis found 30 SSRs, 24 of which are located in the large single-copy region. Nucleotide diversity analysis identified the six most divergent regions (trnS-GCU, psaA-pafI, psaI-pafII, ycf2, rpl32, and ycf1) among three closely related species: H. polysperma, H. ringens, and Asteracantha longifolia. A phylogenetic tree of H. polysperma and 30 related species was constructed based on the common coding sequences of the cp genomes and showed that H. polysperma is most closely related to H. ringens (both belong to subtribe Hygrophilinae) and that, together, they form a clade that is sister to A. longifolia. This study provides a basis for systematic and evolutionary studies as well as the development of molecular markers for species identification and genetic breeding.

Keywords:

Hygrophila polysperma; amphibious plant; chloroplast genome; comparative genome analysis; phylogenetic relationship

1. Introduction

Hygrophila is a genus within Acanthaceae, one of the largest angiosperm families. Hygrophila polysperma (Roxb.) T. Anderson, commonly known as Miramar weed [1,2], grows as an annual herbaceous plant. The species was found in Jiangxi province in mainland China, where it was first reported as a terrestrial plant [3]. By contrast, the submerged form of this species was introduced to Florida, U.S., in the 1950s as an ornamental aquatic plant by the aquarium plant industry [4]. H. polysperma grows in the form of a shrub and is categorized as an invasive species due to its fast growth and rapid spread [4,5,6].

H. polysperma was first documented in Florida as an aquatic ornamental used in the aquarium trade, whereas the sample collected in this study was grown under terrestrial conditions. This is possible because of its amphibious nature, which allows it to grow in submerged, emergent, and terrestrial habitats. Likewise, Hygrophila ringens (H. ringens) and Hygrophila difformis (H. difformis), other species of the same genus, are typically submerged macrophytes of tropical and subtropical rivers and streams and occupy primarily wet tropical biomes [7,8]. In addition, the generic name “Hygrophila” itself derives from the Greek hygro- (wet, moist) and -phila (loving), reflecting the historical association with aquatic habitats and its initial dissemination by aquarium enthusiasts [9].

H. polysperma reproduces by seed; the seeds are borne in polyspermous, thin-walled, elliptical capsules [9]. The leaves are 2–2.2 cm long, elliptic-lanceolate, obtuse, and all linear with inconspicuous rounded teeth (Figure 1). The flowers are purple, with a 2-toothed upper lip and a 3-lobed lower lip whose lobes are nearly equal and rounded. However, the leaves of submerged H. polysperma are generally larger, growing up to 8 cm long and 2 cm wide [4].

The chloroplast carries its own genome, which records information on the variation and speciation of a plant lineage. The chloroplast genome (cp genome) is conserved and contains a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeat regions (IRs) in opposite orientations, forming a quadripartite structure. These properties make the cp genome useful for plant systematic and evolutionary studies, as well as for the development of molecular markers for species identification and genetic breeding [10]. Here, we report the complete cp genome of H. polysperma. The characteristics of the genome, including gene content, repeat sequences, codon usage, IR contraction and expansion, and comparative genome structure, were summarized, and the phylogenetic relationship between H. polysperma and 30 other species was analyzed based on coding sequences (CDSs) from complete cp genomes. This study aims to provide a scientific basis for the development of molecular markers for species identification, genetic breeding, and molecular evolution studies.

2. Materials and Methods

2.1. Plant Materials, DNA Extraction, and Plastid Genome Sequencing

Fresh plant samples were collected from the research field of the South China Botanical Garden in Guangzhou, Guangdong province, China; the plants had been grown from the seeds of wild H. polysperma under terrestrial conditions. A voucher specimen was deposited at the herbarium of South China Botanical Garden under accession number IBSC1050922. The aerial parts of the fresh plants were collected for DNA extraction and genome sequencing. Total genomic DNA was extracted using the Magen HiPure Plant DNA extraction kit (Magen Biotech Co., Shanghai, China) and amplified by polymerase chain reaction (PCR). The purity and quality of the PCR product were then checked by DNA gel electrophoresis (1 × TAE agarose gel). The qualified DNA was sent to Bio & Data Biotech. Inc., Guangzhou, China, for sequencing on the BGISEQ-500 platform according to the BGISEQ-500 standard protocol. The sequencing workflow comprised DNA sample quality detection, ultrasonic fragmentation of the DNA, construction of the sequencing library from the DNA fragments, library quality detection, and genome sequencing.

2.2. Cp Genome Assembly and Annotation

After BGISEQ-500 sequencing, the raw signal data were converted into nucleotide sequences by base calling. The raw reads were stored in FASTQ format. They were then quality-filtered by removing primer sequences, capping ends, and low-quality data to obtain clean data. To reduce the complexity of sequence assembly, fast gapped-read alignment was carried out with Bowtie 2, and the three CDSs with the highest number of aligned reads were selected as the starting sequences (SSeqs) for cp genome assembly [11]. Sequences with the highest number of aligned reads are mostly conserved and are less likely to contain structural variations or errors. The core assembly module of NOVOPlasty was used to assemble the reads into cp genome contigs [12]. CAP3 was used to join multiple contigs into a complete cp genome, and the starting position of the circular cp genome was manually adjusted. In this way, the complete cp genome of H. polysperma was obtained for this study.

We annotated the complete cp genome of H. polysperma using the GeSeq online tool (https://chlorobox.mpimp-golm.mpg.de/geseq.html, accessed on 20 February 2025) with default parameters [13]. The detailed circular cp genome map was drawn with OrganellarGenomeDRAW (OGDRAW) [14]. Detailed genome statistics, including GC content and its distribution, were determined using Unipro UGENE v50.0 [15].

2.3. Repeat Sequences and Codon Usage Analysis

We detected simple sequence repeats (SSRs) using the MISA-web online platform (https://webblast.ipk-gatersleben.de/misa/index.php?action=1, accessed on 20 March 2025) [16] with minimum repeat thresholds of 10, 6, 5, 4, 4, and 4 for mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide motifs, respectively. The SSRs were then compared with the result obtained by running the script provided by [17] on GitHub (https://github.com/Xwb7533/CPStools, accessed on 21 March 2025) to obtain the complete SSR information. Forward (F), reverse (R), complementary (C), and palindromic (P) repeats were analyzed with the REPuter online programme (https://bibiserv.cebitec.uni-bielefeld.de/reputer, accessed on 21 March 2025) with a minimal repeat size of 30 bp and a Hamming distance of 3. Tandem repeats were analyzed using the online Tandem Repeats Finder programme (https://tandem.bu.edu/trf/home, accessed on 21 March 2025) [18] with alignment parameters of 2, 7, and 7 for match, mismatch, and indel, respectively. To identify preferred codons, we conducted relative synonymous codon usage (RSCU) analysis using the RSCU illustrator tool provided on the Genepioneer platform (http://genepioneer.com/, accessed on 21 March 2025).
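As an illustration of how such SSR thresholds operate, the following minimal Python sketch scans a sequence for perfect repeats, treating the values 10, 6, 5, 4, 4, and 4 given above as minimum repeat counts for motifs of 1 to 6 bp (an assumption about how the thresholds are applied). It is not the MISA algorithm: compound SSRs and ambiguous bases are not handled, and the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical.

```python
import re

# Assumed minimum repeat counts per motif length (mono- to hexanucleotide).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples; start is 1-based."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-bp motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        pos = 0
        while pos < len(seq):
            m = pattern.match(seq, pos)
            if m and is_primitive(m.group(1)):
                hits.append((m.start() + 1, m.group(1), len(m.group(0)) // k))
                pos = m.end()  # continue after the reported repeat
            else:
                pos += 1
    return sorted(hits)


# Toy sequence: an (A)12 run and an (AT)6 run separated by "GC".
toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GG"
print(find_ssrs(toy))
```

On the toy sequence the sketch reports the mononucleotide run and the dinucleotide run; shorter runs such as (ACGT)3 fall below the thresholds and are ignored.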

2.4. Inverted-Repeats Contraction and Expansion

IR boundaries among the cp genome sequences of different species were compared using the CPJSdraw tool provided on the Genepioneer platform, which visualizes the LSC-IR and SSC-IR boundaries [19].

2.5. Comparative Analysis of Genome Structure

A list of every gene position in the cp genome was compiled as input for online mVISTA analysis. VISTA is a comprehensive suite of programmes and databases for comparative analysis of genomic sequences, and mVISTA is one of its online tools for aligning and comparing sequences from multiple species. The mVISTA online programme (https://genome.lbl.gov/vista/mvista/submit.shtml, accessed on 21 March 2025) was used in LAGAN mode to align the obtained cp genome with those of four other species of the Acanthaceae family [20]. The complete genome sequences were downloaded from NCBI (accession numbers: PQ436353, NC_080349, NC_082429, and NC_080332).

Then, we conducted nucleotide diversity (Pi) analysis using DNA Sequence Polymorphism software (DnaSP) v6.12 to identify candidate molecular markers for phylogenetic analysis. The Pi analysis was performed with a sliding window with a window length of 600 bp and a step size of 200 bp.
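The sliding-window calculation can be sketched in a few lines of Python. The sketch below assumes that Pi is the mean proportion of differing sites over all sequence pairs and that alignment columns containing gaps are skipped; it is a simplified illustration, not the DnaSP implementation, and the function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites; columns with gaps are skipped."""
    columns = [col for col in zip(*seqs) if "-" not in col]
    pairs = list(combinations(range(len(seqs)), 2))
    if not columns or not pairs:
        return 0.0
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


def sliding_pi(seqs, window=600, step=200):
    """Return (window midpoint, Pi) for each window along an alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Toy alignment of three sequences, scanned with a 2-bp window and 1-bp step.
toy = ["AAAA", "AAAT", "AATT"]
for midpoint, pi in sliding_pi(toy, window=2, step=1):
    print(midpoint, round(pi, 3))
```

With real data, `seqs` would hold the aligned cp genomes as equal-length strings, and the default window of 600 bp and step of 200 bp would match the settings stated above.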

2.6. Phylogenetic Analysis

The phylogenetic tree was built from the complete cp genomes of 30 other species that showed the highest percentage of identical sequences to H. polysperma. The complete cp genomes were downloaded from GenBank at NCBI (Table S1). The CDSs shared among the cp genomes of all 31 species were identified using the script provided by [17]. The common CDS nucleotide sequences of all 31 cp genomes were aligned using the Clustal Omega online platform (https://www.ebi.ac.uk/jdispatcher/msa, accessed on 24 March 2025) to obtain a multiple-sequence alignment file. Phylogenetic analysis was performed by maximum likelihood (ML) on the nucleotide alignment of 66 common CDSs using MEGA11 [21]. ML inference used a general time-reversible model with a gamma distribution of substitution rates among sites (GTR+G), and branch support was assessed by bootstrap analysis with 1000 replicates. The percentage of trees in which the associated taxa clustered together is shown next to the branches. The initial tree for the heuristic search was obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach and then selecting the topology with the superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.2505)). The final dataset contained a total of 53,140 positions.

3. Results

3.1. Species Identification

The fresh plant sample was collected in the South China Botanical Garden and initially identified as H. polysperma based on phenotype. To further confirm the species identity, sequencing was carried out; the cp genome showed the highest sequence similarity to an accession labelled Hygrophila sp. (accession number: OQ984266), which is identified only to genus level. Therefore, the sequence of the psbA-trnH intergenic spacer from this species was selected and queried against GenBank for species characterization. The psbA-trnH intergenic spacer proved highly similar to that of H. polysperma (accession number: KP744280), with 98.2% identity.

3.2. General Characterization of Genome

The cp genome of H. polysperma has a circular structure with a length of 146,675 bp, comprising an LSC region of 90,570 bp (1–90,570 bp), an SSC region of 17,699 bp (109,774–127,472 bp), and a pair of IRs of 19,203 bp each (90,571–109,773 bp for inverted-repeat A and 127,473–146,675 bp for inverted-repeat B) (Figure 2), forming a typical quadripartite organization. A total of 130 genes were annotated in the cp genome of H. polysperma, consisting of 86 CDSs, 8 rRNAs, and 36 tRNAs (Table 1). There are 22 genes that contain introns in the H. polysperma cp genome. The 130 genes can be grouped into four categories: photosynthesis-related genes (6 gene groups), self-replication genes (5 gene groups), other genes (7 gene groups), and genes with unknown function (1 gene group) (Table 2).
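The region coordinates reported above can be cross-checked with a short Python sketch. The coordinates are taken from the text; the labels IRa and IRb follow the order in which the text lists inverted-repeat A and B, and the helper names are hypothetical.

```python
# Region coordinates (1-based, inclusive) as reported in the text.
REGIONS = {
    "LSC": (1, 90_570),
    "IRa": (90_571, 109_773),
    "SSC": (109_774, 127_472),
    "IRb": (127_473, 146_675),
}
GENOME_LENGTH = 146_675


def region_lengths(regions):
    """Length of each region from its inclusive start and end coordinates."""
    return {name: end - start + 1 for name, (start, end) in regions.items()}


def tiles_genome(regions, total_length):
    """True if the regions cover 1..total_length with no gaps or overlaps."""
    expected_start = 1
    for start, end in sorted(regions.values()):
        if start != expected_start:
            return False
        expected_start = end + 1
    return expected_start == total_length + 1


lengths = region_lengths(REGIONS)
print(lengths)
print(sum(lengths.values()) == GENOME_LENGTH, tiles_genome(REGIONS, GENOME_LENGTH))
```

The four region lengths (90,570 + 19,203 + 17,699 + 19,203 bp) sum to the reported genome length of 146,675 bp, and the coordinates tile the circular map without gaps or overlaps.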

The overall GC content of the cp genome is 38.3%, with 38.4% in CDSs, 55.5% in rRNA, and 33.5% in non-coding regions (Table 3). The bases A, T, C, and G account for 30.6%, 31.1%, 19.5%, and 18.8% of the H. polysperma cp genome, respectively. The GC content is higher in the IRs (45.2%) than in the LSC (36.4%) and SSC (32.7%) regions. This is attributable to the presence of GC-rich rRNA genes in the IRs (Figure 3), a pattern observed in the majority of angiosperms [22,23]. The overall length of the CDS region is 74,493 bp; 68 protein-coding genes are located in the LSC region, 11 in the SSC region, and 10 in the IRs. At 38.4%, the GC content of the CDSs is slightly higher than that of the overall cp genome.
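For readers who wish to reproduce such figures, GC content is simply the share of G and C among the counted bases. The minimal Python sketch below (the function name `gc_content` is hypothetical, and ambiguous bases are assumed to be excluded from the denominator) also confirms that the reported C and G percentages add up to the reported overall GC content.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Base composition (%) as reported in the text for the whole cp genome.
REPORTED = {"A": 30.6, "T": 31.1, "C": 19.5, "G": 18.8}
reported_gc = REPORTED["G"] + REPORTED["C"]

print(round(reported_gc, 1))   # overall GC content implied by the base composition
print(gc_content("ATGC"))      # toy sequence with two of four bases being G or C
```

Applied to the LSC, SSC, and IR subsequences separately, the same function would give region-level values comparable to those listed above.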

When the complete cp genome assembly was queried with BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome, accessed on 20 March 2025), the sequence was found to be closely related to that of another Hygrophila species, H. ringens. Thus, the cp genome of H. ringens was selected for comparison with H. polysperma in the analyses below.

3.3. General Comparison of Cp Genome with Another Closely Related Species

The cp genome sequences of H. polysperma and H. ringens were aligned and compared. In the cp genome of H. polysperma, the genes consisting of 2 exons are trnH-GUG, trnK-UUU, rps16, trnG-UCC, atpF, rpoC1, trnL-UAA, trnV-UAC, petB, petD, rpl16, rpl2, ndhB, trnI-GAU, trnA-UGC, and ndhA, and those consisting of 3 exons are rps12, pafI, and clpP1. In the cp genome of H. polysperma, no significant similarity was detected for the genes ycf3, ycf4, clpP, psbN, and ycf68 when compared with H. ringens, and no homologs were detected for the introns of ycf3. However, a single gene, pafI, which contains an extra intron of trnI, was identified.

The ycf3 gene, which encodes a thylakoid protein, was previously reported as a conserved gene in cyanobacteria, algae, and plants; it acts as a photosystem I assembly factor and is essential for the stable accumulation of the complex at the post-transcriptional level [24,25,26]. Similarly, the ycf4 gene encodes a nonessential assembly factor for photosystem I in higher plants [27]. However, the pafI and pafII genes detected in the H. polysperma cp genome are also part of the assembly and maintenance of the photosystem I complex [28]. These photosynthesis-related genes, pafI and pafII, were also previously reported in P. cyrtonema and P. odoratum from the Liliaceae family [29] and C. gileadensis from the Burseraceae family [28], consistent with the lack of detectable homology to the ycf3 and ycf4 genes. Similarly, ycf68 is also a photosynthesis-related gene, but its main function is not fully understood [30,31]. The clpP gene, which encodes the Clp protease proteolytic subunit, is present in the H. ringens chloroplast genome, whereas H. polysperma carries the clpP1 (caseinolytic protease P1) gene, which encodes ATP-dependent Clp protease proteolytic subunit 1. The clpP gene was previously reported in the green alga C. reinhardtii as an important chloroplast gene involved in the cytochrome b₆f complex, and the ClpP1 protein is part of the Clp system as a catalytic subunit of the ATP-dependent ClpP protease [32,33,34]. The basic structure of the Clp machinery is conserved throughout evolution, as the Clp system plays a central role in plastid development and function [35,36]. Furthermore, no homology was detected for the psbN gene (which encodes photosystem II protein N), whereas the pbf1 (prolamin-box binding factor1) gene, encoding photosystem biogenesis factor 1, was identified in H. polysperma. The psbN gene was reported to be required for the early assembly and repair of photosystem II in tobacco, N. tabacum [37]. Expression of the psbN gene affects the processing of psbT-psbH intercistronic RNA [38], but the effect of the pbf1 gene located within psbT-psbH is unknown, as pbf1 was previously reported to play a role in transcription regulation in maize seed [39]. Thus, the pbf1 gene may perform a different function related to chloroplast-specific processes.

In addition, the ndhA gene in the H. polysperma cp genome, which encodes NADH dehydrogenase subunit A and is located in the SSC region, was found to contain the largest intron (1077 bp).

We also found that Asteracantha longifolia (A. longifolia) shares a higher percentage of identical sequences with H. polysperma than H. ringens does. Therefore, the genome sequences of H. polysperma and A. longifolia were compared as well. Based on this comparison, no homologs were detected in H. polysperma for the ycf3, ycf4, clpP, and psbN genes, or for the rrn16S, ycf68, rrn23S, rrn4.5S, and rrn5S genes, although all of these were identified in A. longifolia.

3.4. Repeat Sequences

As protein-coding regions and conserved gene sequences of the cp genome can be used as tools for phylogenetic analysis and domestication studies [40], the repeat and intron content was evaluated in this study. SSRs, also known as microsatellites, are tandemly repeated DNA motifs of 1–6 base pairs that serve as molecular markers of genetic variation; they are located in non-coding sequence regions, and their impact varies depending on their location in the gene sequence [41]. A total of 30 SSRs were found in the cp genome of H. polysperma (Table 4). In terms of distribution, the majority of the SSRs (24) were found in the LSC region, whereas 2 SSRs were found in the IRs and 4 SSRs were located in the SSC region. By motif type, there are 26 mono-, 3 di-, and 1 tetranucleotide SSRs, and the set includes 2 compound SSRs.

The repeats were categorized as tandem, palindromic, forward, reverse, and complementary repeats. The analyses with the REPuter and MISA-web programmes yielded a total of 30 tandem, 13 palindromic, 12 forward, 3 reverse, and 2 complementary repeats. Tandem repeats range from 10 to 17 bp, palindromic repeats from 30 to 44 bp, forward repeats from 30 to 42 bp, and reverse repeats from 30 to 37 bp, and the two complementary repeats are 15 and 102 bp long.

There are 22 genes containing introns, of which 14 are CDSs and 8 are tRNAs (8 of these are duplicated in the IR). Most of these genes contain only a single intron, whereas pafI and clpP1 each contain the usual complement of two introns. Thirteen intron-containing genes are located in the LSC region, while 4 duplicated intron-containing genes are distributed within the IRs. Only 1 intron-containing gene lies in the SSC region.

3.5. Codon Usage Analysis

An RSCU chart was constructed to evaluate codon preference in the selected species. An RSCU value greater than 1.0 indicates that a codon is used more often than expected among its synonymous codons. RSCU analysis of the cp genome was carried out for H. polysperma and the species with the highest percentages of identical cp genome sequences, namely H. ringens, A. longifolia, Ruellia elegans (R. elegans), and Ruellia speciosa (R. speciosa). We identified 30,190 codons in the entire H. polysperma cp genome. After eliminating CDSs shorter than 300 bp (to minimize non-adaptive biases), 50 CDSs were kept for RSCU analysis, with a total merged CDS length of 59,334 bp. The leucine codon TTA was the most used, and AGC and TAC were the least used. Arginine, leucine, and serine, each encoded by six synonymous codons, are the most broadly represented amino acids among the 64 codons presented (Figure 4). ATG (methionine, the start codon) and TGG (tryptophan), each the only codon for its amino acid, showed no bias (RSCU = 1).

Codon usage preference is important for studying the origin and evolution of green plants and has been reported for various organisms [42]. Of the 32 codons with RSCU values greater than 1, 29 end with A or T. This indicates that the H. polysperma cp genome has a strong preference for codons ending with A/T bases. High RSCU values (>1) are often attributed to mutation bias and natural selection pressure [8].
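The RSCU statistic itself is straightforward to compute: for each amino acid, the count of a codon is divided by the mean count of all its synonymous codons. The Python sketch below is a minimal illustration, not the Genepioneer tool; it assumes the standard amino acid assignments (which the plastid code shares), excludes stop codons, and uses hypothetical names throughout.

```python
from collections import Counter
from itertools import product

# Standard codon-to-amino-acid assignments in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for every sense codon whose amino acid occurs in the data."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    # Group synonymous codons by amino acid, leaving out stop codons.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the data
        for c in codons:
            # observed count divided by the mean count across the synonymous family
            values[c] = counts[c] * len(codons) / total
    return values


# Toy CDS: start codon, three TTA and one CTG leucine codons, tryptophan, stop.
toy = "ATG" + "TTA" * 3 + "CTG" + "TGG" + "TAA"
values = rscu([toy])
print(values["TTA"], values["CTG"], values["ATG"], values["TGG"])
```

Because methionine and tryptophan each have a single codon, their RSCU is always 1, in line with the absence of bias noted for ATG and TGG; counting how many codons with RSCU > 1 end in A or T would reproduce the A/T-ending summary above.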

3.6. IR Contraction and Expansion in the cp Genome

Owing to the circular quadripartite structure of the cp genome, there are four boundaries among the LSC-IRb-SSC-IRa regions. Differences in the IR (IRa and IRb) sequences reflect variation among species, as the IRs are the most conserved regions of the cp genome, and the expansion and contraction of IR boundaries are suggested to be involved in cp genome size differences [43]. The length of the IRs among the two Hygrophila species and A. longifolia ranges from 19,141 bp to 19,203 bp (Figure 5). The IR-SC boundary regions of H. polysperma are similar to those of H. ringens, except for the presence of ycf2 genes in the IRb region of H. polysperma.

The IR boundaries of H. polysperma were then compared with those of its four closest relatives by sequence similarity based on the BLAST results. The gene content of the H. polysperma IRs is conserved with respect to H. ringens, whereas it is not consistent with that of the other species. ycf2 genes were found in both IRs of all 5 species, but in the LSC-IRb region the gene is significantly shorter than that of Ruellia sp. while being longer than those of H. ringens and A. longifolia. Similarly, the trnH gene in the IRb region of Hygrophila sp. is longer than those of the other species, a phenomenon that occurred only in Hygrophila sp. Therefore, the position of the trnH gene, which lies exactly at the IRa-LSC junction in Hygrophila sp., is conserved. In contrast, the trnH gene of the other species is located entirely within the LSC region without elongation.

Genes that cross junction borders in a similar way in all 5 cp genomes include ycf1, which crosses the IR-SSC border; the ycf1 gene has been reported to contribute to cp genome analysis in higher plants [44]. The ndhF gene also spans the IR-SSC junction in all five cp genomes, displaying high similarity.

3.7. Comparative Analysis of cp Genome Structure

A comparison among the five species was performed with the online software mVISTA to visualize alignment differences together with annotation information and to reveal the conservation of the cp genome sequences, as previously carried out with other species [43]. The cp genomes were compared using H. polysperma as the reference. The Ruellia species (R. elegans and R. speciosa) showed a large missing part of the alignment between 110,000 and 127,000 bp, a region that is highly conserved in both Hygrophila species and A. longifolia (Figure 6). In addition, the conserved non-coding sequences (CNSs) shown in mVISTA exhibited more nucleotide divergence than the coding regions. The majority of the coding regions among the five species are relatively conserved, except ycf1, ndhF, rpl32, trnL-UAG, ccsA, ndhD, psaC, ndhE, ndhG, ndhI, ndhA, ndhH, and rps15, which showed less than 50% similarity in both Ruellia species. These results provide detailed information for the development of molecular markers for phylogenetic analysis and plant species identification.

3.8. Nucleotide Diversity (Pi)

Nucleotide diversity (Pi) analysis was performed on only three cp genomes (H. polysperma, H. ringens, and A. longifolia), since a substantial part of the SSC region was missing for the Ruellia species in the mVISTA alignment. The Pi values, computed in DnaSP v6.12 with a window length of 600 bp and a step size of 200 bp, varied from 0 to 0.1. The Pi profile identifies the most divergent regions of the sequence. Six apparent peaks (trnS-GCU, psaA-pafI, psaI-pafII, ycf2, rpl32, and ycf1) were observed across the cp genome, of which the trnS-GCU and ycf2 regions form the highest peaks (Pi > 0.08) (Figure 7). Four peaks are located in the LSC region (trnS-GCU, psaA-pafI, psaI-pafII, and ycf2), whereas two peaks belong to the SSC region (rpl32 and ycf1). The Pi values of the LSC and SSC regions are markedly higher than those of the IRs, as the IRs are duplicated within the circular genome and are generally more conserved than the LSC and SSC regions.

3.9. Phylogenetic Analysis

A phylogenetic tree was constructed from the complete cp genomes of 30 other species that, according to BLAST, showed the highest percentage of identical sequences with H. polysperma (Table S1). Multiple sequence alignment of 66 common CDSs for these 31 species revealed the phylogenetic relationships and evolutionary position of H. polysperma. The phylogenetic tree was constructed by maximum likelihood (ML) analysis, with almost all nodes receiving 100% bootstrap support.

The phylogenetic tree with the highest log likelihood (−181,279.27) is shown (Figure 8). According to the result, 15 of the cp genomes belong to the Acanthaceae family, the same family as H. polysperma. Acanthaceae is one of the most taxonomically diverse, geographically widespread, and morphologically and ecologically variable lineages in angiosperms [45]. The genus Hygrophila belongs to the subtribe Hygrophilinae within the tribe Ruellieae of the Acanthaceae family; Ruellieae includes 10 subtribes comprising various genera [9,45,46,47,48].

The other 15 cp genomes belong to four other plant families (Apocynaceae, Gesneriaceae, Malvaceae, and Rubiaceae). There are 5 genera belonging to the Acanthaceae family (Hygrophila, Echinacanthus, Lepidagathis, Ruellia, and Strobilanthes), 1 genus from Apocynaceae (Wrightia), 3 genera from Gesneriaceae (Oreocharis, Petrocodon, and Primulina), 1 from Malvaceae (Durio), and 1 from Rubiaceae (Gardenia). The species sharing the same branch as H. polysperma is H. ringens, with 100% bootstrap support. This subgroup is closely connected to A. longifolia and Echinacanthus attenuatus in this analysis. This entire group is sister to Strobilanthes tonkinensis and four species from the Ruellia genus. The analysis robustly supported the species selected from the genera Barleria, Lepidagathis, Echinacanthus, Strobilanthes, Hygrophila, Asteracantha, Ruellia, Blepharis, and Acanthus as forming the Acanthaceae clade, with species from the genera Blepharis and Acanthus as the first-diverging lineage in this clade. This clade is sister to the genera from Rubiaceae and Apocynaceae. In turn, the whole group comprising Acanthaceae, Rubiaceae, and Apocynaceae is sister to the group of Gesneriaceae and Malvaceae, supported with 100% ML bootstrap.

4. Discussion

While H. polysperma is known to be amphibious, the sequenced individual was grown under terrestrial conditions. The fresh plant sample was collected and initially identified as H. polysperma based on its morphological traits. The sample was then subjected to sequencing. The cp genome sequence showed the highest similarity with an accession labelled Hygrophila sp. (accession number: OQ984266). Subsequently, the psbA-trnH intergenic spacer confirmed the sample as H. polysperma (accession number: KP744280) with 98.2% identity. This illustrates plant species identification based on both morphological traits and the cp genome.

Based on the results, the cp genome of H. polysperma has a typical circular quadripartite structure similar to that of most angiosperms. Its size differs slightly from those of congeners and other closely related species because of differences in IR contraction and expansion. In particular, an expansion of the trnH gene at the LSC-IR junction of the Hygrophila sp. cp genomes was observed (Figure 5), which differs from the situation in Asteracantha and Ruellia.

From the general comparison of cp genome sequences within the genus, no homologs were detected for the ycf3, ycf4, clpP, and ycf68 genes in the H. polysperma cp genome compared with H. ringens. Although homologs were not detected in our analysis, future studies employing more sensitive detection methods or examining potential pseudogenes could clarify whether these represent true gene losses in H. polysperma. Variation in the ycf3, ycf4, and ycf68 genes has been suggested to reflect evolutionary adaptation and specialized chloroplast functions in response to the evolutionary pressures and ecological niches of a plant species [29]. The lack of detectable sequence similarity to annotated clpP genes, together with the presence of clpP1 in the H. polysperma cp genome, suggests that clpP1, as part of the Clp system, may harbour an RNA editing site in this species, since the clpP gene was previously found to have one RNA editing site [49].

Repetitive sequences are widely dispersed in the cp genome and play an important role in genetic diversity and evolution [50]. In this study, 30 SSRs were found in the H. polysperma cp genome (Table 4). Based on the overall SSR and Pi results (Table 4 and Figure 7), the psaA-pafI intergenic region, which is located in the LSC region and shows high nucleotide diversity in our Pi analysis, may serve as a candidate DNA molecular marker for species identification. A previous article stated that ycf1 is the second-largest gene in the chloroplast genome and is crucial for plant viability [43]. The ycf1 gene was proposed to be the most variable site in the chloroplast genome and thus to serve as a DNA molecular marker for land plants [44]; consistent with this, the ycf1 gene located in the SSC region of the H. polysperma cp genome exhibits high Pi. In addition, the ndhD gene can also be treated as a genic marker of a high-variation region for plant molecular identification [43]. Thus, the complete chloroplast genome sequence provides data for selecting suitable DNA markers and for accurate species identification.

This study reconstructed a well-supported phylogenetic relationship of H. polysperma within the Acanthaceae family among species with similar cp genome sequences that share the same CDSs (Figure 8). Reflecting the taxonomic diversity of Acanthaceae, roughly half of the 31 species in the phylogenetic tree belong to the Acanthaceae family, whereas the remaining 15 species come from four other plant families (Apocynaceae, Gesneriaceae, Malvaceae, and Rubiaceae). The latest revised classification recognizes 10 subtribes within Ruellieae: Erantheminae (5 genera), Dinteracanthinae (1 genus), Ruelliinae (5 genera), Trichantherinae (6 genera), Strobilanthinae (1 genus), Hygrophilinae (2 genera), Petalidiinae (6 genera), Mcdadeinae (1 genus), Phaulopsidinae (1 genus), and Mimulopsidinae (4 genera). Hygrophilinae comprises two genera, Brillantaisia and Hygrophila, owing to the nonmonophyletic nature of the genus Hygrophila [9,45]. However, no Brillantaisia species appeared among the taxa retrieved by cp genome sequence similarity; H. ringens was the only other member of the genus Hygrophila. This suggests that the phylogenetic placement of Hygrophila within Acanthaceae should be verified with multiple lines of evidence in addition to cp genome comparison and morphological traits. Notably, only 1 of the 30 other species belongs to the same genus as H. polysperma in the cp genome sequence similarity comparison, even though the genus Hygrophila contains numerous species, such as H. difformis, which is also an amphibious plant [8]. Many other Hygrophila species, including H. difformis, show minimal or no identical sequences when compared with H. polysperma. It is likely that the similarities shared between these Hygrophila species account for their distinctive characteristics.

5. Conclusions

In this study, we assembled the complete cp genome of H. polysperma. The cp genome exhibits the typical quadripartite architecture and is highly conserved; duplicated genes flank the contracted IR boundaries, and the single-copy regions (LSC and SSC) are intact. These features make the genome a robust reference for developing molecular markers to identify H. polysperma and to guide its genetic breeding. H. polysperma is most closely related to H. ringens, and together they form a clade that is sister to A. longifolia; all three belong to the family Acanthaceae (subtribe Hygrophilinae). However, Brillantaisia (the other genus in Hygrophilinae) was absent from our analysis based on cp genome sequence similarity, which identified only H. ringens from the genus Hygrophila. Collectively, our results provide the first genomic resource for H. polysperma, expand the available intraspecific variation data, and facilitate future comparative studies.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/horticulturae11101240/s1, Table S1: Data source of species selected in phylogenetic analysis.

Author Contributions

L.-X.C.: contributed to formal analysis, preparation of figures and writing—original draft preparation; Q.H. and Q.F.: resource collection; H.T., Y.L. (Yuping Li), C.P. and Y.D.: species identification, photograph, supervision and validation; Y.L. (Yongqing Li): investigation, conceptualization, writing—review and editing of manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Biological Resources Programme, Chinese Academy of Sciences (KFJ-BRP-007-017).

Data Availability Statement

The GenBank record for the H. polysperma cp genome is available at NCBI under accession number PV915200.

Acknowledgments

During the preparation of this manuscript, the authors used Kimi K2 and Deepseek V3.1 for the purposes of language editing. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kew Royal Botanic Gardens. Hygrophila polysperma (Roxb.) T. Anderson. Available online: https://powo.science.kew.org/taxon/urn:lsid:ipni.org:names:49756-1#distributions (accessed on 18 April 2023).
2. NCBI. Hygrophila polysperma (Miramar-Weed). Available online: https://pubchem.ncbi.nlm.nih.gov/taxonomy/Hygrophila-polysperma (accessed on 30 March 2025).
3. Qiu, X.; Xie, Y. New Records of Acanthaceae to Jiangxi Province. Jiangxi Sci. 2020, 38, 643–644.
4. Mukherjee, A. Prospects for Classical Biological Control of the Aquatic Invasive Weed Hygrophila polysperma (Acanthaceae). Ph.D. Thesis, University of Florida, Gainesville, FL, USA, 2011.
5. Doyle, R.D.; Francis, M.D.; Smart, R.M. Interference competition between Ludwigia repens and Hygrophila polysperma: Two morphologically similar aquatic plant species. Aquat. Bot. 2003, 77, 223–234.
6. Mukherjee, A.; Williams, D.; Gitzendanner, M.A.; Overholt, W.A.; Cuda, J.P. Microsatellite and chloroplast DNA diversity of the invasive aquatic weed Hygrophila polysperma in native and invasive ranges. Aquat. Bot. 2016, 129, 55–61.
7. Kew Royal Botanic Gardens. Hygrophila ringens. Available online: https://powo.science.kew.org/taxon/urn:lsid:ipni.org:names:60454721-2 (accessed on 30 March 2025).
8. Li, G.; Zhao, X.; Yang, J.; Hu, S.; Ponnu, J.; Kimura, S.; Hwang, I.; Torii, K.U.; Hou, H. Water wisteria genome reveals environmental adaptation and heterophylly regulation in amphibious plants. Plant Cell Environ. 2024, 47, 4720–4740.
9. Tripp, E.; Daniel, T.; Fatimah, S.; McDade, L. Phylogenetic relationships within Ruellieae (Acanthaceae) and a revised classification. Int. J. Plant Sci. 2013, 174, 97–137.
10. Jiao, H.; Chen, Q.; Xiong, C.; Wang, H.; Ran, K.; Dong, R.; Dong, X.; Guan, Q.; Wei, S. Chloroplast genome profiling and phylogenetic insights of the “Qixiadaxiangshui” pear (Pyrus bretschneideri Rehd.1). Horticulturae 2024, 10, 744.
11. Langmead, B.; Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 2012, 9, 357–359.
12. Dierckxsens, N.; Mardulyn, P.; Smits, G. NOVOPlasty: De novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 2017, 45, e18.
13. Tillich, M.; Lehwark, P.; Pellizzer, T.; Ulbricht-Jones, E.S.; Fischer, A.; Bock, R.; Greiner, S. GeSeq – Versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017, 45, W6–W11.
14. Greiner, S.; Lehwark, P.; Bock, R. OrganellarGenomeDRAW (OGDRAW) Version 1.3.1: Expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res. 2019, 47, W59–W64.
15. Okonechnikov, K.; Golosova, O.; Fursov, M. Unipro UGENE: A unified bioinformatics toolkit. Bioinformatics 2012, 28, 1166–1167.
16. Beier, S.; Thiel, T.; Münch, T.; Scholz, U.; Mascher, M. MISA-Web: A web server for microsatellite prediction. Bioinformatics 2017, 33, 2583–2585.
17. Huang, L.; Yu, H.; Wang, Z.; Xu, W. CPStools: A package for analyzing chloroplast genome sequences. iMetaOmics 2024, 1, e25.
18. Benson, G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 1999, 27, 573–580.
19. Li, H.; Guo, Q.; Xu, L.; Gao, H.; Liu, L.; Zhou, X. CPJSdraw: Analysis and visualization of junction sites of chloroplast genomes. PeerJ 2023, 11, e15326.
20. Mayor, C.; Brudno, M.; Schwartz, J.R.; Poliakov, A.; Rubin, E.M.; Frazer, K.A.; Pachter, L.S.; Dubchak, I. VISTA: Visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 2000, 16, 1046–1047.
21. Tamura, K.; Stecher, G.; Kumar, S. MEGA11: Molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 2021, 38, 3022–3027.
22. Xue, S.; Shi, T.; Luo, W.; Ni, X.; Iqbal, S.; Ni, Z.; Huang, X.; Yao, D.; Shen, Z.; Gao, Z. Comparative analysis of the complete chloroplast genome among Prunus mume, P. armeniaca, and P. salicina. Hortic. Res. 2019, 6, 89.
23. Raman, G.; Park, K.T.; Kim, J.-H.; Park, S. Characteristics of the completed chloroplast genome sequence of Xanthium spinosum: Comparative analyses, identification of mutational hotspots and phylogenetic implications. BMC Genom. 2020, 21, 855.
24. Boudreau, E.; Takahashi, Y.; Lemieux, C.; Turmel, M.; Rochaix, J. The chloroplast ycf3 and ycf4 open reading frames of Chlamydomonas reinhardtii are required for the accumulation of the photosystem I complex. EMBO J. 1997, 16, 6095–6104.
25. Naver, H.; Boudreau, E.; Rochaix, J.D. Functional studies of ycf3: Its role in assembly of photosystem I and interactions with some of its subunits. Plant Cell 2001, 13, 2731–2745.
26. Wang, X.; Yang, Z.; Zhang, Y.; Zhou, W.; Zhang, A.; Lu, C. Pentatricopeptide repeat protein PHOTOSYSTEM I BIOGENESIS FACTOR2 is required for splicing of ycf3. J. Integr. Plant Biol. 2020, 62, 1741–1761.
27. Krech, K.; Ruf, S.; Masduki, F.F.; Thiele, W.; Bednarczyk, D.; Albus, C.A.; Tiller, N.; Hasse, C.; Schöttler, M.A.; Bock, R. The plastid genome-encoded ycf4 protein functions as a nonessential assembly factor for photosystem I in higher plants. Plant Physiol. 2012, 159, 579–591.
28. Al-Andal, A. Comparative analysis of chloroplast genomes in Commiphora gileadensis from Saudi Arabia and Oman reveal evolutionary genetic divergence. Cogent Food Agric. 2025, 11, 2477796.
29. Yan, M.; Dong, S.; Gong, Q.; Xu, Q.; Ge, Y. Comparative chloroplast genome analysis of four Polygonatum species insights into DNA barcoding, evolution, and phylogeny. Sci. Rep. 2023, 13, 16495.
30. Wu, M.; Lan, S.; Cai, B.; Chen, S.; Chen, H.; Zhou, S. The complete chloroplast genome of Guadua angustifolia and comparative analyses of neotropical-paleotropical bamboos. PLoS ONE 2015, 10, e0143792.
31. Gao, L.-Z.; Liu, Y.-L.; Zhang, D.; Li, W.; Gao, J.; Liu, Y.; Li, K.; Shi, C.; Zhao, Y.; Zhao, Y.-J.; et al. Evolution of Oryza chloroplast genomes promoted adaptation to diverse ecological habitats. Commun. Biol. 2019, 2, 278.
32. Majeran, W.; Wollman, F.A.; Vallon, O. Evidence for a role of clpP in the degradation of the chloroplast cytochrome b₆f complex. Plant Cell 2000, 12, 137–149.
33. Kuroda, H.; Maliga, P. The plastid clpP1 protease gene is essential for plant development. Nature 2003, 425, 86–89.
34. Ramundo, S.; Rochaix, J.-D. Chloroplast unfolded protein response, a new plastid stress signaling pathway? Plant Signal. Behav. 2014, 9, e972874.
35. Shikanai, T.; Shimizu, K.; Ueda, K.; Nishimura, Y.; Kuroiwa, T.; Hashimoto, T. The chloroplast clpP gene, encoding a proteolytic subunit of ATP-dependent protease, is indispensable for chloroplast development in Tobacco. Plant Cell Physiol. 2001, 42, 264–273.
36. Nishimura, K.; van Wijk, K.J. Organization, function and substrates of the essential Clp protease system in plastids. Biochim. Biophys. Acta BBA Bioenerg. 2015, 1847, 915–930.
37. Torabi, S.; Umate, P.; Manavski, N.; Plöchinger, M.; Kleinknecht, L.; Bogireddi, H.; Herrmann, R.G.; Wanner, G.; Schröder, W.P.; Meurer, J. psbN is required for assembly of the photosystem II reaction center in Nicotiana tabacum. Plant Cell 2014, 26, 1183–1199.
38. Plöchinger, M.; Schwenkert, S.; von Sydow, L.; Schröder, W.P.; Meurer, J. Functional update of the auxiliary proteins psbW, psbY, HCF136, psbN, terC and ALB3 in maintenance and assembly of PSII. Front. Plant Sci. 2016, 7, 423.
39. Lang, Z.; Wills, D.M.; Lemmon, Z.H.; Shannon, L.M.; Bukowski, R.; Wu, Y.; Messing, J.; Doebley, J.F. Defining the role of Prolamin-Box Binding Factor1 gene during maize domestication. J. Hered. 2014, 105, 576–582.
40. Daniell, H.; Lin, C.-S.; Yu, M.; Chang, W.-J. Chloroplast genomes: Diversity, evolution, and applications in genetic engineering. Genome Biol. 2016, 17, 134.
41. Yao, X.; Ye, Q.; Kang, M.; Huang, H. Microsatellite analysis reveals interpopulation differentiation and gene flow in the endangered tree Changiostyrax dolichocarpa (Styracaceae) with fragmented distribution in central China. New Phytol. 2007, 176, 472–480.
42. Yao, H.; Xu, Y.; Lan, Y.; Xiang, D.; Jiao, P.; Xu, H.; Qiao, D.; Cao, Y. Codon usage bias analysis in the chloroplast genomes of diatoms within the family Thalassiosiraceae and Skeletonemataceae. Plant Mol. Biol. Rep. 2025.
43. Huang, Y.; Li, J.; Yang, Z.; An, W.; Xie, C.; Liu, S.; Zheng, X. Comprehensive analysis of complete chloroplast genome and phylogenetic aspects of ten Ficus species. BMC Plant Biol. 2022, 22, 253.
44. Dong, W.; Xu, C.; Li, C.; Sun, J.; Zuo, Y.; Shi, S.; Cheng, T.; Guo, J.; Zhou, S. ycf1, the most promising plastid DNA barcode of land plants. Sci. Rep. 2015, 5, 8348.
45. Manzitto-Tripp, E.A.; Darbyshire, I.; Daniel, T.F.; Kiel, C.A.; McDade, L.A. Revised classification of Acanthaceae and worldwide dichotomous keys. TAXON 2022, 71, 103–153.
46. Wood, J.R.I. Nees, Arnott and some forgotten Acanthaceae types from Asia. Edinb. J. Bot. 1994, 51, 103–115.
47. McDade, L.; Moody, M. Phylogenetic relationships among Acanthaceae: Evidence from noncoding trnL-trnF chloroplast DNA sequences. Am. J. Bot. 1999, 86, 70–80.
48. McDade, L.; Daniel, T.; Masta, S.; Riley, K. Phylogenetic relationships within the tribe Justicieae (Acanthaceae): Evidence from molecular sequences, morphology, and cytology. Ann. Mo. Bot. Gard. 2000, 87, 435.
49. Zhang, J.; Liu, H.; Xu, W.; Wan, X.; Zhu, K. Comparative analysis of chloroplast genome of Lonicera japonica cv. Damaohua. Open Life Sci. 2024, 19, 20220984.
50. Shukla, N.; Kuntal, H.; Shanker, A.; Sharma, S.N. Mining and analysis of simple sequence repeats in the chloroplast genomes of genus Vigna. Biotechnol. Res. Innov. 2018, 2, 9–18.

Figure 1. Morphological characteristics of H. polysperma. (a) The phenotype of terrestrial H. polysperma; (b) morphological characteristics of seed capsules.

Figure 2. Chloroplast genome of H. polysperma. Genes with introns are marked with asterisks (*).

Figure 3. GC content and gene distribution of each region of the H. polysperma cp genome. The yellow region refers to the LSC region, the blue regions refer to IRs, and the green region refers to the SSC region. The coloured arrows represent the gene distribution in the cp genome.

Figure 4. Relative Synonymous Codon Usage (RSCU) analysis for 20 amino acids and termination codons (Ter) among the cp genomes of five species. Within each column of the bar graph, the codon contents of the CDSs of H. polysperma, H. ringens, A. longifolia, R. elegans, and R. speciosa are arranged from left to right. Each amino acid is represented by codons in the boxes below the bar graph, with the box colour corresponding to the respective segment of the column.
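
RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 indicate preferred codons. The Python sketch below is a hypothetical illustration of that definition, not the tool used to produce Figure 4; the `rscu` function and the toy sequence in the usage note are assumptions, and the codon table is the standard genetic code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks termination codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence (DNA alphabet).

    RSCU = observed count * number of synonymous codons / total count of
    that amino acid. Amino acids absent from the sequence are omitted.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for aa in set(CODON_TABLE.values()):
        synonymous = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in synonymous)
        if total == 0:
            continue
        for codon in synonymous:
            values[codon] = counts[codon] * len(synonymous) / total
    return values
```

As a toy example, `rscu("GCTGCTGCTGCC")` contains four alanine codons (three GCT, one GCC), so GCT receives an RSCU of 3.0, GCC 1.0, and the unused GCA and GCG 0.0.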

Figure 5. Comparison of the LSC, SSC, and IR boundaries of the H. polysperma cp genome with those of four other species. The LSC, SSC, and IRs are represented with different colours. JLB, JSB, JSA, and JLA represent the junction sites between the respective regions of the genome sequence. Gene names are displayed in boxes.

Figure 6. Comparative genome analysis among five species through mVISTA diagram. Light grey arrows and dark grey arrows above the alignments represent genes with their orientation and distribution. Exons are shown in purple, UTR represents 5′- and 3′-untranslated regions (turquoise rectangle), CNS represents a conserved non-coding sequence (red), the size of the sequences is shown on the x-axes, and the percent identities (50–100%) are shown on the y-axes.

Figure 7. Comparative analysis of nucleotide diversity (Pi) among H. polysperma, H. ringens, and A. longifolia, presented in a sliding window (window length: 600 bp; step size: 200 bp). The X-axis denotes the midpoint of each window; the Y-axis denotes the nucleotide diversity in each window. LSC represents large single copy; SSC represents small single copy; IR represents inverted-repeat.

Figure 8. The phylogenetic relationship of H. polysperma with another 30 species based on cp genome similarity. The phylogenetic tree was constructed with common protein coding genes among 31 species using the ML method, with bootstrap values displayed at each branch node. H. polysperma is highlighted in red.

Table 1. Summary of cp genome features of H. polysperma.

Genome Features	H. polysperma
Genome size (bp)	146,675
LSC size (bp)	90,570
SSC size (bp)	17,699
IR size (bp)	19,203
GC content (%)	38.3
No. of genes	130
No. of PCGs	86
No. of tRNA	36
No. of rRNA	8

Notes: LSC represents large single copy; SSC represents small single copy; IR represents inverted-repeat; GC represents guanine-cytosine; PCG represents protein coding gene.

Table 2. List of genes annotated in the cp genome of H. polysperma.

Category	Gene Group	Gene Name	Quantity
Photosynthesis	Subunits of photosystem I	psaB, psaA, psaI, psaJ, psaC	5
	Subunits of photosystem II	psbA, psbK, psbI, psbM, psbD, psbC, psbZ, psbJ, psbL, psbF, psbE, psbB, psbT, psbH	14
	Subunits of cytochrome b/f complex	petN, petA, petL, petG, petB, petD	6
	NADH dehydrogenase subunit	ndhJ, ndhK, ndhC, ndhB (×2), ndhD, ndhF, ndhE, ndhG, ndhI, ndhA, ndhH	12
	Subunits of ATP synthase	atpA, atpF, atpH, atpI, atpE, atpB	6
	Large subunits of rubisco	rbcL	1
Self-replication	Proteins of large ribosomal subunit	rpl33, rpl20, rpl36, rpl14, rpl16, rpl22, rpl2, rpl23, rpl32	9
	Proteins of small ribosomal subunit	rps12 (×2), rps16, rps2, rps14, rps4, rps18, rps11, rps8, rps3, rps19, rps7 (×2), rps15	14
	Subunits of RNA polymerase	rpoC2, rpoC1, rpoB, rpoA	4
	Ribosomal RNAs	rrn16 (×2), rrn23 (×2), rrn4.5 (×2), rrn5 (×2)	8
	Transfer RNAs	trnH-GUG, trnK-UUU, trnQ-UUG, trnS-GCU, trnG-UCC, trnR-UCU, trnC-GCA, trnD-GUC, trnY-GUA, trnE-UUC, trnT-GGU, trnS-UGA, trnG-GCC, trnfM-CAU, trnS-GGA, trnT-UGU, trnL-UAA, trnF-GAA, trnV-UAC, trnC-ACA, trnM-CAU, trnW-CCA, trnP-UGG, trnI-CAU, trnL-CAA (×2), trnV-GAC (×2), trnI-GAU (×2), trnA-UGC (×2), trnR-ACG (×2), trnN-GUU (×2), trnL-UAG	37
Other genes	Maturase	matK	1
	Protease	clpP1	1
	Acetyl-CoA carboxylase	accD	1
	c-type cytochrome synthesis gene	ccsA	1
	Translation initiation factor	infA	1
	Envelope membrane carbon uptake protein	cemA	1
	Other	pafI, pafII, pbf1	3
Genes of unknown function	Conserved hypothetical chloroplast ORF	ycf1 (×2), ycf2 (×2), ycf15 (×2)	6

Notes: Genes marked with the sign (×2) refer to duplicated genes.

Table 3. Summary of nucleotide composition in each region of the H. polysperma cp genome.

Region	Length (bp)	A (%)	T (%)	C (%)	G (%)	GC (%)
LSC	90,570	31.3	32.2	18.7	17.7	36.4
SSC	17,699	33.8	33.5	17.1	15.5	32.7
IRa	19,203	27.8	27.0	21.4	23.8	45.2
IRb	19,203	27.0	27.8	23.8	21.4	45.2
Total genome	146,675	30.6	31.1	19.5	18.8	38.3
CDS	74,493	30.2	31.4	17.9	20.5	38.4

Notes: LSC represents large single copy; SSC represents small single copy; IR represents inverted-repeat; CDS represents coding sequence.
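
The figures in Table 3 can be cross-checked with a few lines of arithmetic: the four regions should sum to the total genome length, and the length-weighted mean of the regional GC contents should reproduce the genome-wide value. The Python sketch below is only an illustrative consistency check using the values printed in the table; the variable names are hypothetical.

```python
# Region lengths (bp) and GC contents (%) as printed in Table 3.
regions = {
    "LSC": (90_570, 36.4),
    "SSC": (17_699, 32.7),
    "IRa": (19_203, 45.2),
    "IRb": (19_203, 45.2),
}

# The quadripartite genome is LSC + SSC + two IR copies.
total_length = sum(length for length, _ in regions.values())

# Length-weighted mean GC content across the four regions.
weighted_gc = sum(length * gc for length, gc in regions.values()) / total_length

print(total_length)           # 146675, matching the reported genome size
print(round(weighted_gc, 1))  # 38.3, matching the reported total GC content
```

Because the regional percentages are themselves rounded to one decimal place, agreement is expected only at that precision.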

Table 4. SSRs identified in the H. polysperma cp genome.

No.	Repeat unit	Repeat count	Start	End	Location	Location type
1	TA	6	4,631	4,642	trnK-UUU_1-rps16_2	IGS
2	C	15	4,651	4,665	trnK-UUU_1-rps16_2	IGS
3	A	11	6,152	6,162	rps16_1-trnQ-UUG	IGS
4	TA	6	7,039	7,050	trnQ-UUG-psbK	IGS
5	A	12	12,283	12,294	atpF_2-atpF_1	Intron
6	T	10	12,375	12,384	atpF_2-atpF_1	Intron
7	T	11	13,082	13,092	atpF_1-atpH	IGS
8	A	10	16,360	16,369	rps2-rpoC2	IGS
9	TCAA	4	29,279	29,294	petN-psbM	IGS
10	T	10	30,457	30,466	psbM-trnD-GUC	IGS
11	T	12	35,749	35,760	psbC-trnS-UGA	IGS
12	A	11	36,529	36,539	psbZ-trnG-GCC	IGS
13	A	10	36,998	37,007	trnG-GCC-trnfM-CAU	IGS
14	TA	6	42,637	42,648	psaA-pafI_3	IGS
15	A	11	44,614	44,624	pafI_2-pafI_1	Intron
16	T	11	70,950	70,960	clpP1_3-clpP1_2	Intron
17	A	15	71,099	71,113	clpP1_3-clpP1_2	Intron
18	A	11	71,587	71,597	clpP1_2-clpP1_1	Intron
19	A	11	77,197	77,207	petD_1-petD_2	Intron
20	T	10	78,563	78,572	rpoA	Gene
21	A	17	81,500	81,516	rpl14-rpl16_2	IGS
22	T	11	86,724	86,734	ycf2	Gene
23	A	10	86,898	86,907	ycf2	Gene
24	T	11	87,248	87,258	ycf2	Gene
25	G	11	103,922	103,932	trnA-UGC_1-trnA-UGC_2	Intron
26	T	10	112,269	112,278	ndhF-rpl32	IGS
27	T	11	124,973	124,983	ycf1	Gene
28	T	11	125,270	125,280	ycf1	Gene
29	T	10	125,571	125,580	ycf1	Gene
30	C	11	133,314	133,324	trnA-UGC_2-trnA-UGC_1	Intron

Notes: IGS represents intergenic sequences.
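
The start coordinates in Table 4 can be mapped onto the four genome regions using the region sizes from Table 1. The Python sketch below is a hypothetical illustration: it assumes that coordinates start at the beginning of the LSC and that the regions occur in the order LSC, IRb, SSC, IRa, which is not stated explicitly in the tables. Under that assumption, 24 of the 30 SSRs fall in the LSC, in agreement with the count reported in the abstract.

```python
from collections import Counter

# SSR start positions copied from Table 4 (Nos. 1-30).
ssr_starts = [
    4631, 4651, 6152, 7039, 12283, 12375, 13082, 16360, 29279, 30457,
    35749, 36529, 36998, 42637, 44614, 70950, 71099, 71587, 77197, 78563,
    81500, 86724, 86898, 87248, 103922, 112269, 124973, 125270, 125571,
    133314,
]

# ASSUMPTION: region order and a coordinate origin at the start of the LSC.
# Sizes (bp) are taken from Table 1.
region_sizes = [("LSC", 90_570), ("IRb", 19_203), ("SSC", 17_699), ("IRa", 19_203)]


def region_of(position):
    """Return the name of the region containing a 1-based position."""
    end = 0
    for name, size in region_sizes:
        end += size
        if position <= end:
            return name
    raise ValueError(f"position {position} lies beyond the genome length")


ssr_by_region = Counter(region_of(start) for start in ssr_starts)
print(dict(ssr_by_region))  # {'LSC': 24, 'IRb': 1, 'SSC': 4, 'IRa': 1}
```

The two trnA-UGC intron repeats (Nos. 25 and 30) land in opposite IR copies at mirrored coordinates, which is consistent with the assumed region order.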

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
