Complete Mitochondrial Genome of Suwallia teleckojensis (Plecoptera: Chloroperlidae) and Implications for the Higher Phylogeny of Stoneflies

Stoneflies comprise an ancient group of insects, but the phylogenetic position of Plecoptera and phylogenetic relations within Plecoptera have long been controversial, and more molecular data is required to reconstruct precise phylogeny. Herein, we present the complete mitogenome of a stonefly, Suwallia teleckojensis, which is 16146 bp in length and consists of 13 protein-coding genes (PCGs), 2 ribosomal RNAs (rRNAs), 22 transfer RNAs (tRNAs) and a control region (CR). Most PCGs initiate with the standard start codon ATN. However, ND5 and ND1 started with GTG and TTG. Typical termination codons TAA and TAG were found in eleven PCGs, and the remaining two PCGs (COII and ND5) have incomplete termination codons. All transfer RNA genes (tRNAs) have the classic cloverleaf secondary structures, with the exception of tRNASer(AGN), which lacks the dihydrouridine (DHU) arm. Secondary structures of the two ribosomal RNAs were shown referring to previous models. A large tandem repeat region, two potential stem-loop (SL) structures, Poly N structure (2 poly-A, 1 poly-T and 1 poly-C), and four conserved sequence blocks (CSBs) were detected in the control region. Finally, both maximum likelihood (ML) and Bayesian inference (BI) analyses suggested that the Capniidae was monophyletic, and the other five stonefly families form a monophyletic group. In this study, S. teleckojensis was closely related to Sweltsa longistyla, and Chloroperlidae and Perlidae were herein supported to be a sister group.


Introduction
In metazoans, the mitochondrial genome (mitogenome) is usually a circular, double-stranded molecule, ranging in size from 13 to 16 kb [1,2]. It contains a remarkably conserved set of 37 genes, i.e., 13 protein coding genes (PCGs), 2 ribosomal RNA (rRNA), and 22 transfer (tRNA) genes [1,2]. Additionally, an A + T-rich region is known as the non-coding region or the control region (CR), which is involved in the initiation of transcription and replication [2]. Because of their abundance, small size, fast rate of evolution, and low levels of sequence recombination, mitochondrial genomes are increasingly applied to comparative and molecular evolution, phylogenetic studies, and population genetics [3].
The Plecoptera (stoneflies) is a small order of hemimetabolous insects. It is comprised of 16 families and includes about 3900 described species worldwide [4,5]. Chloroperlidae includes more than 200 species mainly in the Holarctic region [5]. The chloroperlid genus Suwallia Ricker, 1943 is primarily distributed in North America, Japan, Russia, and Mongolia [6,7]. Recently, four species of the chloroperlid genus Suwallia, S. teleckojensis, S. decolorata, S. talalajensis, and S. wolongshana are reported from China [8][9][10].   The gene order of S. teleckojensis is identical to Drosophila melanogaster Meigen, which is considered to be the typical arrangement of the inset mitochondrial genes [1]. All genes have similar locations and strands to those of other published stoneflies. The arrangement of mitochondrial genes of S. teleckojensis is the same as S. longistyla [31] and should be conservative in all sequenced stoneflies.
The nucleotide composition of the mitogenome was biased toward A and T, with 66.7% of A + Figure 1. Map of the mitochondrial genome of S. teleckojensis. Direction of gene transcription is indicated by the arrows. Protein-coding genes (PCGs) are shown as blue arrows, rRNA genes as purple arrows, tRNA genes as red arrows, and control region (CR) as gray arrows. tRNA genes are labeled according to single-letter IUPAC-IUB abbreviations (L1: UUR, L2: CUN, S1: AGN, S2: UCN). The GC content is plotted using a black sliding window, as the deviation from the average GC content of the entire sequence. GC skew is plotted as the deviation from the average GC skew of the entire sequence.

Protein-Coding Genes and Codon Usage
The total length of all 13 PCGs of S. teleckojensis was 11,244 bp. All but two of the PCGs in S. teleckojensis utilize the conventional translational start codons for invertebrate mitochondrial DNA (mtDNA). For example, one PCG (COI) contained an ATT codon, two PCGs (ND3 and ND6) contained ATC codons, and eight PCGs (ND2, COII, ATP8, ATP6, COIII, ND4, ND4L, and CytB) initiated with ATG codons. Two exceptions were ND5 and ND1, which used GTG and TTG as a start codon, respectively, as reported for most other Plecoptera [22,30]. Eleven PCGs used the typical termination codons TAA and TAG in S. teleckojensis, while only two PCGs (COII and ND5) stopped with the incomplete termination signal T. The stop codons TAA and TAG always had an overlap comprising several nucleotides with the downstream tRNAs (Table 2), which was supposed to act as a "backup" to prevent translation read-through if the transcripts were not properly cleaved [43]. The presence of incomplete stop codons is a feature shared with many arthropod mitochondrial genomes [2] and these truncated stop codons were assumed to be completed post-transcriptionally by the polyadenylation of mature mRNA [44].
The relative synonymous codon usage (RSCU) reflects the influence of strongly biased codon usage [46]. The wide base compositional biases of the genome for AT was well reflected in the codon usage. There is a strong bias toward AT-rich codons with the four most prevalent codons in S. teleckojensis in the order: TTA (Leu), ATT (Ile), TTT (Phe), and ATA (Met) ( Table 4). The four most prevalent codons in S. teleckojensis were similar to those in Capnia zijinshana [28]. Molecular processes, such as translational selection efficiency and accuracy, may influence the codon usage. They apparently have a stronger influence in organisms with rapid growth rates [47].

Transfer RNAs
Twenty-two typical tRNAs in the arthropod mitogenomes were found in S. teleckojensis and their respective secondary structures are shown in Figure 2. All of tRNA genes could be folded as typical cloverleaf structures except for tRNA Ser(AGN) , in which its dihydrourine (DHU) could only form a loop (8 bp) with the DHU stem loss. In general, this phenomenon is a common condition in metazoan mtDNAs [48].
The length of S. teleckojensis tRNAs ranged from 65 to 71 bp (Table 2), similar to that of S. longistyla [31]. Like most of reported insect tRNAs, the lengths of the aminoacyl (AA) stem (7 bp), the anticodon (AC) stem (5 bp), and the AC loop (7 bp) of S. teleckojensis were invariable ( Figure 2). The lengths of DHU arms and TΨC arms were more variable, ranging from 3 to 5 bp. The length of the AC stems was always conservative except that tRNA Ser(AGN) had the longest base pairing of all 22 tRNAs.
A total of 38 unmatched base pairs were identified in the stems of 17 different tRNAs. Compared with other stoneflies, most of the mismatched nucleotides were G-U pairs (29 base pairs), which can form a weak bond in tRNAs and non-canonical pairs in tRNA secondary structures [49]. These contained G-U pairs in amino acid arms (9 bp), DHU arms (7 bp), anticodon arms (10 bp), and TΨC arms (3 bp). The other unmatched base pairs were two U-U pairs in the AC stem of tRNA Ala and tRNA Tyr , three A-C pairs in the TΨC stem of tRNA Leu(UUR) and in the AA stem of tRNA Arg and tRNA Ser(AGN) , three A-A pairs in the AA stem of tRNA Gly , tRNA Leu(UUR) and tRNA Ser(UCN) , and one U-C pair in the AA stem of tRNA Met (Figure 2).

Ribosomal RNAs
Like other insect mitogenomes, the large and small rRNA subunits (lrRNA and srRNA) in S. teleckojensis were located at tRNA Leu(CUN) , tRNA Val , and tRNA Val -the control region, respectively ( Figure 1 and Table 2). The lengths of lrRNA and srRNA were 1329 and 800 bp, respectively.
The secondary structure of srRNA contained three domains (Figure 4). Similar to S. spinicercia, domain I was more variable than domains II and III, whereas Helix 1399 was the most conserved region [22]. In addition, S. teleckojensis possessed more nucleotides in Helix1068, Helix1074, and Helix 577.
( Figure 1 and Table 2). The lengths of lrRNA and srRNA were 1329 and 800 bp, respectively.
The secondary structures of both lrRNA and srRNA were drawn based on other insect models [22,50]. The secondary structure of lrRNA of S. teleckojensis largely resembled previously published structures for Styloperla spinicercia (Linnaeus, 1763) (Insecta: Plecoptera: Styloperlidae) [22]. It consisted of 5 structural domains (I, II, IV-VI) with domain III absent, which is a typical trait in arthropods (Figure 3) [51]. Compared with S. spinicercia, domains I, II, and IV were variable, whereas 5 helices (H2455, H2507, H2520, H2547, and H2588) within domain V had the highest similarity [22]. The secondary structure of srRNA contained three domains (Figure 4). Similar to S. spinicercia, domain I was more variable than domains II and III, whereas Helix 1399 was the most conserved region [22]. In addition, S. teleckojensis possessed more nucleotides in Helix1068, Helix1074, and Helix 577.
We also compared the lrRNA and srRNA of S. teleckojensis with S. longistyla. The secondary structure of lrRNA and srRNA had high similarity (pair-wise sequence identity was 86.77% and 90.10%, respectively) and they should be conservative in the family Chloroperlidae.

The Control Region
Previously, the A + T-rich region, also called the control region (CR) was reported to contain elements essential to the initiation of replication and transcription [52]. The A + T-rich region of the S. teleckojensis mitogenome was 1249 bp in size, possessed an A + T content of 79.1%, and mapped between the srRNA and tRNA Met gene cluster (Figure 1). The control region in the S. teleckojensis We also compared the lrRNA and srRNA of S. teleckojensis with S. longistyla. The secondary structure of lrRNA and srRNA had high similarity (pair-wise sequence identity was 86.77% and 90.10%, respectively) and they should be conservative in the family Chloroperlidae.

The Control Region
Previously, the A + T-rich region, also called the control region (CR) was reported to contain elements essential to the initiation of replication and transcription [52]. The A + T-rich region of the S. teleckojensis mitogenome was 1249 bp in size, possessed an A + T content of 79.1%, and mapped between the srRNA and tRNA Met gene cluster (Figure 1). The control region in the S. teleckojensis mitogenome was longer than most other stoneflies and the A + T content was slightly lower than that of S. longistyla (80.1%). The following structural elements were summarized in the control region of S. teleckojensis mitogenome: (1) a leading sequence (736 bp, containing SL1 and poly-T) adjacent to srRNA, (2) a tandemly repeated sequence block consisting of six complete and one incomplete tandem repeat units (7 bp Figure 5B). There was also a 9 bp poly-T stretch (15,472) near the 5′ end of the CR. In the remainder of the control region, like the S. longistyla control region, we also found lots of polynucleotide repeats (≥7 bp) including 2 poly-A, 1 poly-T, and 1 poly-C, which is unusual for a mitogenome control region [31]. When compared with the CRs of the six other stonefly species, four conserved blocks were identified in S. teleckojensis: CSB1 (15,764), CSB2 (15,794), CSB3 (15,824), and CSB4 (15,847-15,900) ( Figure 6). These CBSs ranged in size from 18 to 54 bp, and their sequence identity among species was generally over 50%. The function of these conserved blocks is unclear. Further study on the mechanistic basis of mtDNA replication is warranted. Two Stem-loop (SL) structures were predicted in the CR: SL1 (15,504) and SL2 (15,790). The proposed stem-loop (SL) structures with a 3 flanking G(A) n T motif were detected after SL1 and SL2, but the 5 flanking TATA motif was only detected after SL2 ( Figure 5B). There was also a 9 bp poly-T stretch (15,472) near the 5 end of the CR. In the remainder of the control region, like the S. longistyla control region, we also found lots of polynucleotide repeats (≥7 bp) including 2 poly-A, 1 poly-T, and 1 poly-C, which is unusual for a mitogenome control region [31]. When compared with the CRs of the six other stonefly species, four conserved blocks were identified in S. teleckojensis: CSB1 (15,764), CSB2 (15,794), CSB3 (15,824), and CSB4 (15,847-15,900) ( Figure 6). These CBSs ranged in size from 18 to 54 bp, and their sequence identity among species was generally over 50%. The function of these conserved blocks is unclear. Further study on the mechanistic basis of mtDNA replication is warranted. Sci. 2018, 19, x FOR PEER REVIEW 12 of 18 Figure 6. The alignment of conserved structural elements of CRs among stoneflies. Sequence identity among species was indicated by colored boxes. CSB1-4 indicates four conserved sequence blocks in the CRs. Blue color means 100% sequnce identity. Pink color means 75% sequnce identity. Green color means 50% sequnce identity.

Phylogenetic Relationship
Phylogenetic analyses were performed using nucleotide sequences of 13 PCGs from 16 Plecoptera species, and two Ephemeroptera species were included as the outgroup (Table 1). Bayesian (BI) and maximum likelihood (ML) analyses generated the same tree topologies based on the PCG13 (including 13 PCGs) and PCGR (including 13 PCGs and 2 rRNAs) matrices (Figures 7 and  A1). The resultant tree from the BI and ML analyses using PCG13 and PCGR datasets showed strong support for a monophyletic Capniidae, and the other five stonefly families formed a monophyletic group. In this monophyletic group, S. teleckojensis was closely related to S. longistyla (with bootstrap value of 92/100 and posterior probability of 0.98/1.00 in ML tree and BI tree, respectively), Chloroperlidae and Perlidae were herein corroborated to be a sister group (with bootstrap value of 75/67 and posterior probability of 1.00/1.00 in ML tree and BI tree, respectively). The position of Chloroperlidae and Perlidae is very different from previous studies [20,21,24,25,30]. Instead, our results support the traditional morphology-based classification [26,27]. This phenomenon may result from the limited mitogenomic data, especially for Chloroperlidae species.
Currently, the position of Pteronarcyidae, Styloperlidae, and Peltoperlidae in Pteronarcyoidea has been resolved based on morphology [19,27], but this relationship has not been well supported by molecular data. Two previous studies concluded that Pteronarcyidae was not a sister group of Peltoperlidae [23,25] and Chen et al. (2016) indicated that there was a relationship of Styloperlidae (Pteronarcyidae and Peltoperlidae) [24]. However, the phylogenetic relationship among the three families in Pteronarcyoidea in our study was very different. In this study, Pteronarcyidae was recovered as a sister-group to Styloperlidae and Peltoperlidae with posterior probability of 0.85/0.98 on the tree generated by BI. ML analyses generated the same tree topologies with BI analyses, but the bootstrap value is 31/46 (Figures 7 and A1). These results were generally identical to the recent study made by our previous studies [22] and morphology studies [19,27]. The limited Peltoperlid Figure 6. The alignment of conserved structural elements of CRs among stoneflies. Sequence identity among species was indicated by colored boxes. CSB1-4 indicates four conserved sequence blocks in the CRs. Blue color means 100% sequnce identity. Pink color means 75% sequnce identity. Green color means 50% sequnce identity.

Phylogenetic Relationship
Phylogenetic analyses were performed using nucleotide sequences of 13 PCGs from 16 Plecoptera species, and two Ephemeroptera species were included as the outgroup (Table 1). Bayesian (BI) and maximum likelihood (ML) analyses generated the same tree topologies based on the PCG13 (including 13 PCGs) and PCGR (including 13 PCGs and 2 rRNAs) matrices (Figures 7 and A1). The resultant tree from the BI and ML analyses using PCG13 and PCGR datasets showed strong support for a monophyletic Capniidae, and the other five stonefly families formed a monophyletic group. In this monophyletic group, S. teleckojensis was closely related to S. longistyla (with bootstrap value of 92/100 and posterior probability of 0.98/1.00 in ML tree and BI tree, respectively), Chloroperlidae and Perlidae were herein corroborated to be a sister group (with bootstrap value of 75/67 and posterior probability of 1.00/1.00 in ML tree and BI tree, respectively). The position of Chloroperlidae and Perlidae is very different from previous studies [20,21,24,25,30]. Instead, our results support the traditional morphology-based classification [26,27]. This phenomenon may result from the limited mitogenomic data, especially for Chloroperlidae species.
Currently, the position of Pteronarcyidae, Styloperlidae, and Peltoperlidae in Pteronarcyoidea has been resolved based on morphology [19,27], but this relationship has not been well supported by molecular data. Two previous studies concluded that Pteronarcyidae was not a sister group of Peltoperlidae [23,25] and Chen et al. (2016) indicated that there was a relationship of Styloperlidae (Pteronarcyidae and Peltoperlidae) [24]. However, the phylogenetic relationship among the three families in Pteronarcyoidea in our study was very different. In this study, Pteronarcyidae was recovered as a sister-group to Styloperlidae and Peltoperlidae with posterior probability of 0.85/0.98 on the tree generated by BI. ML analyses generated the same tree topologies with BI analyses, but the bootstrap value is 31/46 (Figures 7 and A1). These results were generally identical to the recent study made by our previous studies [22] and morphology studies [19,27]. The limited Peltoperlid mitogenomes may result in two clades with low support values and ambiguous relationships among the three families. mitogenomes may result in two clades with low support values and ambiguous relationships among the three families. The basal position of Capniidae is confirmed and the relationship between Chloroperlidae and Perlidae is supported by BI and ML analyses, but the relationship among Pteronarcyidae, Styloperlidae, and Peltoperlidae remains to be studied in the future. Sampling across more taxonomic levels is very useful and important to gain detailed insights into this problem.

Specimens and DNA Extraction
A single adult sample of S. teleckojensis was used in this study. It was collected from Hanma National Nature Reserve of Inner Mongolia Autonomous Region, China in 2015 with other specimens by Weihai Li. Specimens were preserved in 100% ethanol in the field, and then kept in a −20 °C freezer. The thoracic muscle of the specimen was used for extraction of total genomic DNA using the QIAamp DNA Blood Mini Kit (Qiagen, Germany) and stored at −20 °C until needed. Vouchers consisting of the remaining stoneflies (No. VHem-0021) were deposited in the Entomological Museum of Henan Institute of Science and Technology (HIST), Henan Province, China.

Genome Sequencing, Assembly, and Annotation
The mitogenomes were amplified and sequenced as described in previous studies [22,30,53,54]. The complete mitogenome of S. teleckojensis has been deposited in GenBank with accession number MF198253. Sequence reads were assembled into contigs with BioEdit version 7.0.5.3 [55]. tRNA genes were identified by ARWEN with default settings [56]. Two rRNA and all PCG genes were identified by BLAST searches in NCBI (Available online: http://www.ncbi.nlm.nih.gov), and then confirmed by alignment with homologous genes from other published stonefly mitogenomes. The nucleotide composition and codon usage of PCGs were calculated with MEGA 5.0 [57]. Strand asymmetry was measured using the formulas [58]: The basal position of Capniidae is confirmed and the relationship between Chloroperlidae and Perlidae is supported by BI and ML analyses, but the relationship among Pteronarcyidae, Styloperlidae, and Peltoperlidae remains to be studied in the future. Sampling across more taxonomic levels is very useful and important to gain detailed insights into this problem.

Specimens and DNA Extraction
A single adult sample of S. teleckojensis was used in this study. It was collected from Hanma National Nature Reserve of Inner Mongolia Autonomous Region, China in 2015 with other specimens by Weihai Li. Specimens were preserved in 100% ethanol in the field, and then kept in a −20 • C freezer. The thoracic muscle of the specimen was used for extraction of total genomic DNA using the QIAamp DNA Blood Mini Kit (Qiagen, Germany) and stored at −20 • C until needed. Vouchers consisting of the remaining stoneflies (No. VHem-0021) were deposited in the Entomological Museum of Henan Institute of Science and Technology (HIST), Henan Province, China.

Genome Sequencing, Assembly, and Annotation
The mitogenomes were amplified and sequenced as described in previous studies [22,30,53,54]. The complete mitogenome of S. teleckojensis has been deposited in GenBank with accession number MF198253. Sequence reads were assembled into contigs with BioEdit version 7.0.5.3 [55]. tRNA genes were identified by ARWEN with default settings [56]. Two rRNA and all PCG genes were identified by BLAST searches in NCBI (Available online: http://www.ncbi.nlm.nih.gov), and then confirmed by alignment with homologous genes from other published stonefly mitogenomes. The nucleotide composition and codon usage of PCGs were calculated with MEGA 5.0 [57]. Strand asymmetry was measured using the formulas [58]:

Phylogenetic Analysis
Phylogenetic analysis was performed based on 16 complete or nearly complete mitogenomes of stoneflies from GenBank ( Table 1). The MAFFT algorithm within the TranslatorX online platform was used to align each PCGs individually [60]. Before back-translating to nucleotides, poorly aligned sites were removed from the protein alignment using GBlocks in the TranslatorX with default settings. We also aligned each rRNA gene individually using the MAFFT v7.0 online server with the G-INS-I strategy [61]. GBlocks v0.91b was used to filter the ambiguous positions in the alignment of rRNAs [62].
The datasets were assembled for our phylogenetic analyses: (1) the "PCG13 matrix" (total of 11,229 bp), including 13 PCGs; (2) the "PCGR matrix" (total of 12,986 bp), including 13 PCGs and 2rRNAs. We selected the best-fit model of nucleotide sequences of each gene by using jModelTest 0.1.1 according to the Akaike Information Criterion (AIC) [63], and the result is shown in Table 5. Bayesian analyses were run with PCG13 and PCGR datasets which were partitioned by gene. The nucleotide matrices were used to reconstruct phylogenetic trees via Bayesian inference (BI) and maximum likelihood (ML) using MrBayes 3.2.6 and RAxML-HPC2 8.1.11, respectively [64,65]. For Bayesian analyses, we conducted two simultaneous runs for 10 million generations. Each set was sampled every 1000 generations with a burn-in rate of 25%. Stationarity was examined by Tracer v.1.5, and was considered to be reached when the ESS (estimated sample size) value was above 200 [66]. For ML analyses of nucleotide sequences, bootstrapping analyses with 1000 replicates were performed with the fast ML method implemented in RAxML using the GTRGAMMA model. Table 5. Best evolutionary models selected by jModelTest.

Gene
Best Model