Plastid Genome of Equisetum xylochaetum from the Atacama Desert, Chile and the Relationships of Equisetum Based on Frequently Used Plastid Genes and Network Analysis

The modern pteridophyte genus Equisetum is the only survivor of Sphenopsida, an ancient clade known from the Devonian. This genus, of nearly worldwide distribution, comprises approximately 15 extant species. However, genomic information is limited. In this study, we assembled the complete chloroplast genome of the giant species Equisetum xylochaetum from a metagenomic sequence and compared the plastid genome structure and protein-coding regions with information available for two other Equisetum species using network analysis. Equisetum chloroplast genomes showed conserved traits of quadripartite structure, gene content, and gene order. Phylogenetic analysis based on plastome protein-coding regions corroborated previous reports that Equisetum is monophyletic, and that E. xylochaetum is more closely related to E. hyemale than to E. arvense. Single-gene phylogenetic estimation and haplotype analysis showed that E. xylochaetum belonged to the subgenus Hippochaete. Single-gene haplotype analysis revealed that E. arvense, E. hyemale, E. myriochaetum, and E. variegatum resolved more than one haplotype per species, suggesting the presence of a high diversity or a high mutation rate of the corresponding nucleotide sequence. Sequences from E. bogotense appeared as a distinct group of haplotypes representing the subgenus Paramochaete that diverged from Hippochaete and Equisetum. In addition, the taxa that were frequently located at the joint region of the map were E. scirpoides and E. pratense, suggesting the presence of some plastome characters among the Equiseum subgenera.


Introduction
Equisetum L. is a genus of vascular plants that represents ancient Sphenopsida, a long-enduring clade known from fossils of the Devonian and later ages and, therefore, is considered useful in understanding the evolution of vascular plants. This genus is comprised of approximately 15 extant species, with a nearly worldwide distribution [1,2]. Previous studies have examined the evolutionary relationships among stem and crown Equisetum species using both morphology and genomic data. However, because morphology can vary as the result of hybridization and climate differences, molecular approaches have become popular. Recent studies have indicated three subgenera, including the primitive subgenus Paramochaete, and the later diverging subgenera Equisetum and Hippochaete. However, such relationships were estimated from relatively few plastid genes, e.g., rbcL, rps4, and trnL-F e.g., [3][4][5][6].
Among reported pteridophyte plastome sequences, only three were from Equisetum species: one from E. hyemale [7] and two from US and Korean E. arvense [8,9], which were placed in subgenera Equisetum and Hippochaete, respectively e.g., [3][4][5][6]. These reports revealed plastome variation among Equisetum species. The two E. arvense genomes differed by 417 bp and the E. hyemale genome was about 1.5 kbp smaller than that of E. arvense. In addition, rpl16 of E. arvense had an intron that is not present in E. hyemale [7][8][9]. These observations indicate that additional chloroplast genomes would be useful in evaluating evolutionary trends in this long-enduring genus.
Previous Equisetum plastome information was obtained using PCR amplification or from organelle-enriched DNA. The advancement of sequencing technologies and computational techniques allowed us to obtain complete organelle genome sequences from the shotgun metagenomic data we have archived for E. xylochaetum, presenting an additional technical option for obtaining Equisetum plastid genomes. Therefore, in this study, we assembled the complete plastid genome of E. xylochaetum, a giant species endemic to the Atacama Desert of Chile, South America, and used the information obtained to explore the evolution of Equisetum plastid genomes and the phylogenetic relationship of Equisetum species, as well as to determine whether the phylogeny estimated by using the popular plastid conserved regions was congruent with the haplotype mapping results of the corresponding sequences. Results showed that we successfully constructed the de novo plastid genome of E. xylochaetum using the shotgun metagenomic data. Phylogenetic estimations and comparison of Equisetum plastid genomes showed that E. xylochaetum was in the subgenus Hippochaete and that the Equisetum plastid genomes from subgenera Hippochaete and Equisetum were conserved in terms of genome structure, gene content, and gene order. Furthermore, results from TCS haplotype mapping showed that some of the taxa had a higher level of nucleotide diversity and some of the taxa shared common nucleotide haplotypes. Therefore, more conserved nucleotide regions and complete plastid genomes are needed for a better understanding of the evolutionary relationships of Equisetum.
A comparison of Equisetum plastid genomes showed a collinear relationship, forming only one syntenic block in whole genome alignment. The genomes had similar genome size, % GC, gene content, gene length, and had identical gene order (Tables 1 and 2). The proteincoding regions of E. xylochaetum plastid genes were subjected to purifying selection when compared against the corresponding protein-coding genes of E. hyemale and E. arvense. Protein-coding regions of Equisetum species were similar in size, ranging between having the same length in atpB , E, F, H, I, clpP, infA, ndhB, C, D, E, H, I, petB, D, G, L, N,  psaA, B, C, I, J, M, psbA, B, C, D, E, F, H, I, J, K, L, M, N, Z, rbcL, rpl14, 16, 22, 23, 32, 33, 36, rps2, 3, 4, 7, 8, 11, 12, 14, 15, 18, and 19 to having a 195 bp or 64 amino acids difference in accD. These genes have a similar number and position of introns except for the presence of 753 bp of intron in rpl16 in E. arvense. The percentage of the identical nucleotide of the aligned sites ranged from 88.7 percent in matK to 99.2 percent in psbJ, while the percentage of the identical derived amino acid of the aligned sites ranged from 73.9 percent in atpF to 100 percent in atpH, ndhE, petB, D, G, N, psaJ, psbA, E, F, I, J, L, Z, rpl36, and rps2 (Table 1). Phylogenetic estimation of Equisetum using plastome protein-coding sequences suggested that the known complete plastid genomes of Equisetum species formed a monophyletic clade of the two subgenera, Hippochaete and Equisetum. The newly assembled E. xylochaetum plastome indicates placement within Hippochaete with E. hyemale ( Figure 2).      Single-gene ML phylogenetic analysis of atpB, matK, rpoB, rps4, and trnL-F resolved the known subgenera of Equisetum, including Paramochaete, Hippochaete, and Equisetum (Figures 3-7). The majority of the Equisetum species were resolved with ML bootstrap values of at least 50. However, the monophyly of some Equisetum species could not be resolved. The monophyly of E. arvense and E. variegatum was not resolved in the matK tree, the monophyly of E. bogotense, E. laevigatum, E. myriochaetum, E. hyemale, and E. giganteum was not resolved in the rps4 tree, and the monophyly of E. hyemale, E. praealtum, E. ramosissimum, E. trachyodon, and E. xylochaetum was not resolved in the trnL-F tree. All hybrid taxa were phylogenetically placed within the clade consisting of the majority of their maternal parent, if the monophyly of the taxa was absent. In the case of the rps4 tree, these hybrids included Equisetum x fontqueri isolate 26093 located within the clade of E. telmateia, Equisetum x litorale isolates 41084 and 41085 with E. arvense, Equisetum x schaffneri isolates 40813 and 40824 with E. giganteum, and Equisetum x schaffneri isolate 40814 with E. myriochaetum. For trnL-F, the hybrid taxa Equisetum x ferrissii (AY226113) located in the clade with E. laevigatum, Equisetum x litorale isolates 41084 and 41085 with E. arvense, and Equisetum x schaffneri isolate 40814 with E. myriochaetum.    TCS haplotype network analyses using atpB, matK, and rpoB resolved distinct clades representing each of the Equisetum subgenera. At the species level, haplotype networks constructed using atpB and rpoB showed one haplotype for each Equisetum species. In contrast, maps of matK, rps4, and trnL-F resolved more than one haplotype for some species and resolved some haplotypes that consisted of more than one species. For matK, there was more than one haplotype for E. arvense and E. hyemale and there was one haplotype that consisted of sequences from E. arvense and E. variegatum (Figure 4).
Haplotype maps of rps4 and trnL-F seemed to be more complex compared to those of atpB, matK, and rpoB. In the map of rps4 (Figure 6), we observed 10 haplotypes, of which, haplotypes 1-3 of E. bogotense appeared as a distinct group representing subgenus Paramochaete. A few haplotypes consisted of only one Equisetum species, which were haplotype 4 for E. palustre, haplotype 5 for E. diffusum, and haplotype 8 for E. scirpoides. The hybrid taxa were embedded within the same haplotypes as their maternal taxa. These included Equisetum x fontqueri isolate 26093 that was in haplotype 6 with E. telmateia, Equisetum x litorale isolates 41084 and 41085 in haplotype 7 with E. arvense, Equisetum x schaffneri isolates 40813 and 40824 in haplotype 9 with E. giganteum, and Equisetum x schaffneri isolate 40814 in haplotype 9 with E. myriochaetum. Some haplotypes consisted of many plant species, i.e., haplotype 7 and 9, where the majority of Equisetum and Hippochaete were placed together, respectively. Interestingly, a rps4 sequence from E. hyemale grouped with other sequences of that species but also was present as a unique haplotype, as haplotype 10 with E. praealtum isolate 41501.
Some Equisetum species were resolved as more than a single haplotype. E. hyemale isolate 20201 was resolved as a unique haplotype 14 while E. hyemale isolate 1273o was located in haplotype 10 with E. variegatum. For E. variegatum, in addition to its member in haplotype 10, E. variegatum isolates 40820 and 40823 were resolved as additional unique haplotypes 11 and 12. In addition, E. myriochaetum isolate 40826 was present as haplotype 15, while most members were located in haplotype 13.    Most of the hybrid taxa in the trnL-F map were placed in the same haplotypes as their maternal taxa. Equisetum x ferrissii (AY226113) was located in haplotype 16 with its maternal taxon, E. laevigatum, Equisetum x ferrissii in haplotype 13 with the majority of E. hyemale, Equisetum x litorale isolates 41084 and 41085 in haplotype 8 with E. hyemale, Equisetum x schaffneri isolates 40813 and 40824 in haplotype 13 with E. giganteum, and Equisetum x schaffneri isolate 40814 in haplotype 13 with the majority of E. myriochaetum.

Discussion
In this study, we assembled the complete plastid genome of E. xylochaetum from shotgun metagenomes of E. xylochaetum sampled from two Atacama Desert locales exhibiting different degrees of disturbance. Results showed that the plastid genomes constructed from these two E. xylochaetum metagenome accessions were identical, suggesting that the Equisetum samples were from the same Equisetum population. Comparison of nucleotide, and their derived protein, sequences of this newly assembled E. xylochaetum plastid genome to those of E. hyemale and E. arvense showed that the Equisetum plastid genomes were highly conserved in terms of structure and function, even though the two subgenera (Hippochaete and Equisetum) might have diverged as early as 135 Mya during the early Cretaceous [4,6]. All the plastid protein-coding sequences were subjected to purifying selection, with genes of the same type having identical nucleotide percentages and having nucleotide identity ranging from 88.7-99.2 percent. The only major difference in gene structure was presence of the intron in E. arvense rpl16. To determine where and when the intron of rp116 originated in the Equisetum lineage, more Equisetum rpl16 sequences or complete plastid genomes are required.
In a broad sense, the phylogenetic positions of Equisetum species inferred by using all protein-coding sequences along with their derived proteins and the single-gene analysis present in this study were congruent with results from previous studies that used a singlegene approach [11] or a combination of multi-genes and morphological characters e.g., [4][5][6], where Equisetum formed monophyletic clades of each subgenus and placed E. xylochaetum in Hippochaete. Despite the presence of the high conservation level of Equisetum plastid genes, it was surprising to us that the single-gene phylogenetic approach was not sufficient to resolve relationships among Equisetum taxa, especially those closely related taxa placed in subgenus Hippochaete, e.g., E. giganteum, E. variegatum, and E. hyemale. Therefore, it is evident that more Equisetum plastid genomes, plus additional molecular information from other genetic compartments, are needed.
The addition of haplotype mapping provided in this study enhanced the understanding of how plastid genes from each taxon are related. In general, the haplotype maps reflected the relationship resolved from phylogenetic estimation using the corresponding nucleotide regions. Even so, these new maps aid the visualisation of how these plastome nucleotide data were interrelated to each other at the level of isolate, species, and subgenus. The presence of only one shared distinct haplotype of an Equisetum species, though its samples were collected from different locales, suggested a high conservation level of the corresponding genes within its plastid genomes. On the other hand, the presence of more than one haplotype at the specific level suggested the presence of nucleotide diversity, indicating the need to further examine the populations of E. arvense, E. bogotense, E. hyemale, E. variegatum, and E. myriochaetum. In addition, the presence of a haplotype consisting of more than one Equisetum species, e.g., haplotypes 8 and 9 of the rps4 map and haplotypes 8 and 13 of the trnL-F map, suggested that these conserved regions alone were not sufficient for studying the relationship and diversity of Equisetum taxa. These findings emphasize the need for more Equisetum plastid genomes.
The presence of distinct haplotype(s) in the early-diverging species E. bogotense in rps4 and trnL-F suggested that these plastid sequences might not represent the ancestral characters of Equisetum. Instead, these E. bogotense samples may only represent the survival representatives of the extinct members that also evolved during the course of time. In contrast, according to the rps4 and trnL-F maps, the taxa that frequently occurred at the junction region between each subgenus were E. scirpoides and E. pratense, suggesting that these taxa might be particularly helpful for understanding how the Equisetum subgenera diverged.

Materials and Methods
Nucleotide data for Equisetum xylochaetum Mett. were obtained from GenBank BioProject PRJNA555713 [12], generated by metagenomic shotgun sequencing of the microbiome of giant Equisetum xylochaetum sampled from two streambed locales in the Atacama Desert of northern Chile that differed in the degree of human disturbance. The two raw data sets, separately archived in accessions SRX6486516 and SRX6486517, each represented pooled replicate DNA extractions from both above-ground green and below-ground non-green tissues. To obtain the complete chloroplast genome of E. xylochaetum, metagenomic sequences were trimmed using Trimmomatic v. 0.39 [13] using the parameter sliding window:4:30. Next, the trimmed sequences from the two raw data sets were independently assembled using MEGAHIT ver. 1.2.9 [14] with the parameter "bubble-level equal to 0" in order to prevent the merging of sequences that were highly similar, e.g., sequences from closely related species or sequences that display single nucleotide polymorphisms. Each assembly yielded a contig of the complete plastid genome of E. xylochaetum, and these two contigs were identical in sequence. To validate the assembly, we calculated the coverage of the plastid genomes using the methods described in Satjarak and Graham [15]. One of the two contigs, which had the mean coverage of 706 fold, was then selected for annotation of protein-coding genes using proteins inferred from E. arvense [8,9] and E. hyemale [7] as references. The tRNAs and rRNA genes were annotated using tRNAscan-SE On-line [16] and the RNAmmer 1.2 Server [17], respectively. The complete plastid genome of E. xylochaetum was deposited in GenBank under accession number MW282958. A representative plant specimen has been deposited at the University of Concepción herbarium under accession number CONC-CH 6005.
We compared the plastid genome of E. xylochaetum obtained from this study to other complete Equisetum plastid genomes, including E. hyemale (KC117177), E. arvense from the US (GU191334), and E. arvense from Korea (JN968380). We examined general characteristics of the genomes, including the genome size, %GC, the gene content, gene length, gene order, and polymorphism of nucleotides within coding regions and their derived proteins. To consider nucleotide polymorphisms, we aligned the protein-coding sequences (Table 1) using Geneious translation alignment: global alignment with free end gap, standard genetic code, and Identity (1.0/0.0) cost matrix (Geneious ver. 9.1.3; https://www.geneious.com; accessed in 31 January 2022).
The mode of evolution of protein-coding regions was performed using the method described in Mekvipad and Satjarak [18]. For the polymorphism of protein sequences, we aligned the derived protein sequences using MAFFT alignment: auto algorithm and Blosum62 scoring matrix [19]. To investigate the relationship of E. xylochaetum and other Equisetum complete chloroplast genomes, Psilotum nudum (NC_003386.1) was used as an outgroup. The protein-coding sequences and protein sequences (Table 1) were similarly aligned, trimmed using Trimal ver. 1.2 [20], and concatenated. The nucleotide data matrix was 60,987 bp and the protein data matrix consisted of 19,435 amino acids. Phylogenetic relationships were estimated using maximum likelihood and Bayesian frameworks as described in Satjarak and Graham [15].
To investigate whether the Equisetum relationship resolved from frequently-used nucleotide sequences reported in previous studies exhibited grades or evolutionary intermediates, we performed haplotype network analysis of selected, frequently-used Equisetum conserved regions. These included atpB, matK, rpoB, rps4, and trnL-F (Table 3). To prepare the data matrices, the conserved nucleotide regions were extracted from the complete plastid genomes and from DNA sequences from other published studies (Table 3). Next, the data were aligned, and the phylogenetic trees were estimated using the methods described above. The haplotype network analysis was calculated using (Templeton, Crandall, and Sing; TCS) [21] and visualized in PopArt v1.7 [22].

Conclusions
In summary, our study demonstrated that metagenomic data can be a useful way to obtain plastid genomes. The comparison of the de novo plastid genome of E. xylochaetum with other reported Equisetum plastomes showed a high degree of conservation in terms of structure, gene content, gene order, and nucleotide polymorphisms. Even so, this new plastid genome provided additional information about the evolution and diversity of Equisetum, e.g., the presence of an intron in rpl16. Haplotype analyses of the selected conserved nucleotides showed that some Equisetum species were distantly related to other taxa, inferred from the presence of distinct haplotypes. Many of the taxa appeared as shared haplotypes, suggesting that the molecular data we currently have might not be sufficient for a full understanding of the evolutionary relationship of Equisetum and that more Equisetum plastid genomes are needed.