Unique Duplication of trnN in Odontoptilum angulatum (Lepidoptera: Pyrginae) and Phylogeny within Hesperiidae

Simple Summary Pyrginae is one of the major groups of the Hesperiidae, and has some particular characteristics. The annotated complete mitogenome from this subfamily is reported here. The gene order of the new mitogenome with the duplication of trnN differs from the typical Lepidoptera-specific arrangement and is unique to Hesperiidae. The presence of a pseudo gene in the mt genome of Odontoptilum angulatum supports the duplication of trnN, following the TDRL model. Comparison of the newly generated mitogenome of Odontoptilum angulatum to all available mitochondrial genomes of other rearranged Pyrginae species revealed that the condition of Odontoptilum angulatum is of independent origin. Therefore, we hypothesize that the gene block trnN–trnS1–trnE is the hot spot of gene rearrangement in the Tagiadini of Pyrginae. Phylogenetic analyses based on 13 protein-coding genes and entire RNA genes of mitogenomes show the monophyly of Pyrginae. Abstract To explore the variation and relationship between gene rearrangement and phylogenetic effectiveness of mitogenomes among lineages of the diversification of the tribe Tagiadini in the subfamily Pyrginae, we sequenced the complete mitogenome of Odontoptilum angulatum. The genome is 15,361 bp with the typical 37 genes, a large AT-rich region and an additional trnN (trnN2), which is completely identical to trnN (sequence similarity: 100%). The gene order differs from the typical Lepidoptera-specific arrangement and is unique to Hesperiidae. The presence of a “pseudo-trnS1” in the non-coding region between trnN1 and trnN2 supports the hypothesis that the presence of an extra trnN can be explained by the tandem duplication-random loss (TDRL) model. Regarding the phylogenetic analyses, we found that the dataset comprising all 37 genes produced the highest node support, as well as a monophyly of Pyrginae, indicating that the inclusion of RNAs improves the phylogenetic signal. Relationships among the subfamilies in Hesperiidae were also in general agreement with the results of previous studies. The monophyly of Tagiadini is strongly supported. Our study provides a new orientation for application of compositional and mutational biases of mitogenomes in phylogenetic analysis of Tagiadini and even all Hesperiidae based on larger taxon sampling in the future.


Introduction
Due to unique characteristics of the insect mitochondrion, such as strict maternal mode of inheritance, conservative gene components and a comparatively fast rate of evolution [1][2][3], mitochondrial genomes have become a significant molecular marker due to their maternal inheritance, fast evolutionary rate and highly conserved gene content compared to nuclear genes [4], and have been used widely in studies of genetic evolution, classification and identification, and indicators of possible phylogenic relationships for many taxonomic groups including Lepidopteran insects [5][6][7]. Generally, the insect Insects 2021, 12, 348 3 of 14 to the lab, stored in a −20 • C freezer in the Entomological Museum of the Northwest A&F University, Yangling, Shaanxi Province, China. All adult specimens were identified based on morphological characteristics [37] and were confirmed with cox1 barcoding alignment in the BOLD database [40]. The thoracic muscle from a single specimen was used to extract genomic DNA following the manufacturer's instructions (EasyPureR Genomic DNA Kit, TRAN, TransGen, Beijing, China).

Sequencing
Complete mitochondrial genome was determined using NGS by Biomarker Technologies Corporation (Beijing, China.). After the samples were sequenced in both directions, approximately 1 GB of data was obtained. The raw paired-end clean reads were assembled under medium-low sensitivity with trim sequence by employing the mitogenome of Tagiades vajuna (Hesperiidae; Pyrginae; GenBank: KX865091) [39] as a reference (Table 1) in Geneious 11.0.5. The generated consensus sequence and the reference sequence were aligned used MAFFT (within Geneious 11.0.5) for annotation. All PCGs were determined by finding the ORFs by employing condon Table 5 (the invertebrate mitochondrial genetic code). The tRNAs, including the duplicated tRNA (trnN), and rRNAs (12S and 16S) were found using the MITOS Web Server [41]. According to the MITOS predictions, the secondary structures of tRNAs were manually plotted using Adobe Illustrator CC2020. The mitogenomic circular map was drawn using Organellar Genome DRAW (OGDRAW) [42]. Finally, in order to ensure the accuracy of annotation, all genes were visually inspected via alignments in Geneious against the reference mitogenome. The base composition, AT skew, CG skew and data used to plot RSCU (relative synonymous codon usage) figures were all calculated using PhyloSuite v 1.2.2 [43]. The mitogenome sequence was registered in GenBank as MW381783.

Phylogenetic Analysis
For the phylogenetic analysis, we chose a total of 33 mitogenome sequences of known Hesperiidae that are publicly available (including the newly sequenced mt genome of O. angulatum) ( Table 1) as ingroups and the mitogenome of four species in Papilionidae were selected as outgroups.
Statistics for the basic characteristics of the mitochondrial genome were produced by MitoTool [62]. The extraction of PCGs and RNAs were carried out by PhyloSuite v 1.2.2. All 13 PCGs were aligned in batches with MAFFT integrated into BioSuite [63], based on the codon-alignment mode. tRNAs and rRNAs were aligned using the Q-INS-i algorithm in MAFFT v.7 online service [64]. Poorly aligned regions in the alignments were removed using Gblocks v0.91b [65].
To compare the phylogenetic signal information of the different dataset combinations, 3 datasets were used: protein-coding genes (PCGs), removal of third codon position of protein-coding genes + whole RNA genes (PCG12RT), and protein-coding genes + entire RNA genes (PCGRT). We chose the GTR + I + G model of evolution based on the three datasets by Bayesian information criterion (BIC) as estimated in jModelTest 2.1.7 [66]. Phylogenetic analysis was performed employing the best-fit model, using maximum likelihood (ML) and Bayesian inference (BI). The ML analysis were conducted using RAxML GUI [67], with an ML + rapid bootstrap (BS) algorithm with 1000 replicates. The BI analysis was implemented using MrBayes 3.2.6 [68] with default settings and 6 × 10 6 MCMC generations. The convergence of the independent runs was indicated by average standard deviation of split frequencies < 0.01, estimated sample size > 200, and potential scale reduction factor ≈ 1.

Mitochondrial Genome Organization
The mitogenome of O. angulatum (15,361 bp) is a single, covalently closed circular double-stranded DNA molecule ( Figure 1) composed of 38 coding genes (13 PCGs, 23 tRNA genes, and two rRNA genes), and a major non-coding A + T-rich region (replication origin site) [69]. Compared with the mitogenomes of other Hesperiidae (range from 15,267 bp (Potanthus flavus) to 15,769 bp (Heteropterus morpheus)), it is medium-sized. In addition to the 37 typical genes in arthropod mitochondrion, an additional trnN (named trnN2) was found in this genome.
The minority strand (N-strand) encodes 14 genes, 4 PCGs (ND1, ND4, ND4L, and ND5), 8 tRNAs (trnQ, trnC, trnY, trnF, trnH, trnP, trnL2, and trnV), and 2 rRNA genes (the large rRNA subunit (lrRNA) and small rRNA subunit (srRNA)), whereas the other 24 genes were transcribed from the majority strand (J-strand). All 13 PCGs were initiated by the start codon ATN (8 by ATG, 4 by ATT and 1 by ATA). Nine PCGs ended with a complete TAA termination codon. It is worth mentioning that, differing from the use of CGA in cox1 initiation as has been reported before for Hesperiidae [70], cox1 of O. angulatum starts with normal ATG. In addition to cox1, cox2, nad5 and nad4, all of which use an incomplete stop codon T-, the remaining PCGs all use complete TAA stop codons. Seven gene overlaps were observed, ranging from 1 to 24 bp in length. With the exception of the control region, we identified 17 non-coding regions (NCRs) comprising a total of 261 bp with the second longest being 44 bp between trnN1 and trnN2. The lrRNA and srRNA were 1380 bp and 761 bp, respectively ( Table 2).
to the 37 typical genes in arthropod mitochondrion, an additional trnN (named trnN2) was found in this genome. The minority strand (N-strand) encodes 14 genes, 4 PCGs (ND1, ND4, ND4L, and ND5), 8 tRNAs (trnQ, trnC, trnY, trnF, trnH, trnP, trnL2, and trnV), and 2 rRNA genes (the large rRNA subunit (lrRNA) and small rRNA subunit (srRNA)), whereas the other 24 genes were transcribed from the majority strand (J-strand). All 13 PCGs were initiated by the start codon ATN (8 by ATG, 4 by ATT and 1 by ATA). Nine PCGs ended with a complete TAA termination codon. It is worth mentioning that, differing from the use of CGA in cox1 initiation as has been reported before for Hesperiidae [70], cox1 of O. angulatum starts with normal ATG. In addition to cox1, cox2, nad5 and nad4, all of which use an incomplete stop codon T-, the remaining PCGs all use complete TAA stop codons. Seven gene overlaps were observed, ranging from 1 to 24 bp in length. With the exception of the The A/T nucleotide composition is 81.2% (excluding the control region) in O. angulatum, indicating a strong A/T bias ( Table 3). The PCGs have the lowest AT content (79.8%) and the A + T-rich region has the highest (95.4%), as in all previously-sequenced mitogenomes of skippers [57]. Among the 13 PCGs, the A + T content of the third codon (94%) was much higher than the first (74.8%) and second positions (70.7%). The mitogenome exhibits obvious negative GC-skews (−0.217) and insignificant negative AT-skews (−0.015) ( Table 3). The mitogenome-wide AT bias was well-documented in the codon usage, and the Insects 2021, 12, 348 6 of 14 RSCU (relative synonymous codon usage) values indicated a preference for NNU and NNA codons in skipper mitogenomes, which has been observed before [38,45]. Furthermore, Figure 2 also indicates that the most frequently used codons are UUU (Phe), UUA (Leu), AUU (Ile), AUA (Met), and AAU (Asn). In the whole genome of O. angulatum, besides the common 22 tRNAs, an additional trnN was found. We found a unique rearrangement that differs from previous studies in Lepidoptera [71,72]. Moreover, the sequences of trnN1 and trnN2 were absolutely identical (sequence similarity: 100%). Twenty-three tRNA (62 bp to 71 bp) could be folded into the typical structure of cloverleaf secondary ( Figure S1). Only trnS (AGN) lacked the DHU stem ( Figure S1), and this phenomenon probably evolved very early in the Metazoa [73] and is the ancestral state in butterflies. The trnN1 and trnN2 have a secondary structure of standard tRNA genes [74,75] and a completely identical anticodon; we surmise that both of them had identical functions. The A/T nucleotide composition is 81.2% (excluding the control region) in O. angulatum, indicating a strong A/T bias ( Table 3). The PCGs have the lowest AT content (79.8%) and the A + T-rich region has the highest (95.4%), as in all previously-sequenced mitogenomes of skippers [57]. Among the 13 PCGs, the A + T content of the third codon (94%) was much higher than the first (74.8%) and second positions (70.7%). The mitogenome exhibits obvious negative GC-skews (−0.217) and insignificant negative AT-skews (−0.015) ( Table 3). The mitogenome-wide AT bias was well-documented in the codon usage, and the RSCU (relative synonymous codon usage) values indicated a preference for NNU and NNA codons in skipper mitogenomes, which has been observed before [38,45]. Furthermore, Figure 2 also indicates that the most frequently used codons are UUU (Phe), UUA (Leu), AUU (Ile), AUA (Met), and AAU (Asn). In the whole genome of O. angulatum, besides the common 22 tRNAs, an additional trnN was found. We found a unique rearrangement that differs from previous studies in Lepidoptera [71,72]. Moreover, the sequences of trnN1 and trnN2 were absolutely identical (sequence similarity: 100%). Twenty-three tRNA (62 bp to 71 bp) could be folded into the typical structure of cloverleaf secondary ( Figure S1). Only trnS (AGN) lacked the DHU stem ( Figure S1), and this phenomenon probably evolved very early in the Metazoa [73] and is the ancestral state in butterflies. The trnN1 and trnN2 have a secondary structure of

Non-Coding Regions (NCR) and a Pseudo Gene
The control region, located between rrnS and trnM and thought to play a controlling role in the transcription process [76], was the longest non-coding region in the mitogenome of O. angulatum. Its length and A + T content are 287 bp and 95.4%, respectively (Table 3). Compared with three other skippers from Tagiadini, the sequence analysis result of the control region showed that they all have conserved structures, including variable-length poly-T stretches (17-19 bp), several runs of microsatellite-like A/T sequences following a motif ATTTA, and an interrupted poly-A stretch directly upstream of trnM. Moreover, the motif ATAGA close to the 5 -end of the rrnS is the origin of the minority strand replication in Lepidopteran mitogenomes ( Figure 3) [69,77]. In addition to the control region, another long NCR (44 bp) was found between trnN1 and trnN2. In this NCR, a 37 bp region was identified, which shares 100% similarity with the homologous sequences of trnS1. Therefore, we defined this region as pseudo-trnS1 ( Figure 4A). In addition, we found that the pseudo-trnS1 had a 7 bp gene fragment (ATAATAT), which is the intergenic nucleotides between trnN2 and trnS1 in front of it in NCR.

Non-Coding Regions (NCR) and a Pseudo Gene
The control region, located between rrnS and trnM and thought to play a controlling role in the transcription process [76], was the longest non-coding region in the mitogenome of O. angulatum. Its length and A + T content are 287 bp and 95.4%, respectively (Table 3). Compared with three other skippers from Tagiadini, the sequence analysis result of the control region showed that they all have conserved structures, including variable-length poly-T stretches (17-19 bp), several runs of microsatellite-like A/T sequences following a motif ATTTA, and an interrupted poly-A stretch directly upstream of trnM. Moreover, the motif ATAGA close to the 5′-end of the rrnS is the origin of the minority strand replication in Lepidopteran mitogenomes ( Figure 3) [69,77]. In addition to the control region, another long NCR (44 bp) was found between trnN1 and trnN2. In this NCR, a 37 bp region was identified, which shares 100% similarity with the homologous sequences of trnS1. Therefore, we defined this region as pseudo-trnS1 ( Figure 4A). In addition, we found that the pseudo-trnS1 had a 7 bp gene fragment (ATAATAT), which is the intergenic nucleotides between trnN2 and trnS1 in front of it in NCR.

Non-Coding Regions (NCR) and a Pseudo Gene
The control region, located between rrnS and trnM and thought to play a controlling role in the transcription process [76], was the longest non-coding region in the mitogenome of O. angulatum. Its length and A + T content are 287 bp and 95.4%, respectively (Table 3). Compared with three other skippers from Tagiadini, the sequence analysis result of the control region showed that they all have conserved structures, including variable-length poly-T stretches (17-19 bp), several runs of microsatellite-like A/T sequences following a motif ATTTA, and an interrupted poly-A stretch directly upstream of trnM. Moreover, the motif ATAGA close to the 5′-end of the rrnS is the origin of the minority strand replication in Lepidopteran mitogenomes ( Figure 3) [69,77]. In addition to the control region, another long NCR (44 bp) was found between trnN1 and trnN2. In this NCR, a 37 bp region was identified, which shares 100% similarity with the homologous sequences of trnS1. Therefore, we defined this region as pseudo-trnS1 ( Figure 4A). In addition, we found that the pseudo-trnS1 had a 7 bp gene fragment (ATAATAT), which is the intergenic nucleotides between trnN2 and trnS1 in front of it in NCR.   The tandem duplication-random loss (TDRL) model is the most widely accepted mechanism for explaining mitochondrial gene rearrangement. In the TDRL model, some of the mitochondrial genes produce multiple gene repeats due to a contiguous segment of DNA duplication. Subsequently, the accumulation of mutations within multiple duplications eventually deactivates one of the genes at random, and the selective pressure to shrink the genome leads to the elimination of nonfunctional genes [30]. In this process, the randomly lost and nonfunctional genes can become pseudo genes. [78,79]. Pseudo genes may eventually disappear completely from the genome due to the selective pressure of shrinking the genome, causing the gene order to be different from the typical gene arrangement (such as Drosophila yakuba) [80].
In a previous study, the extra tRNA genes were also found in other Tagiadini species. The trnS1 duplication in Ctenoptilum vasava (Hesperiidae: Pyrginae) (-N-S1a-S1b-E-) [38] and the tandem duplication of the gene block trnS1-trnE in Tagiades vajuna (Hesperiidae: Pyrginae) (-N-S1a -Ea -S1b -Eb -) [39] further indicates the independent origin of the duplicated trnS1: -N-S1-E-→ -N-S1a-S1b-E-in C. vasava, -N-S1-E-→ -N-S1a -Ea -S1b -Eb -in Tagiades vajuna and -N-S1-E-→ -Na-Nb-S1-E-in O. angulatum. Therefore, the Insects 2021, 12, 348 9 of 14 condition of O. angulatum is of independent origin. However, pseudo genes were not found in the vicinity of duplicated tRNA in both cases. The gene order of the O. angulatum mitogenome differs from the typical Lepidoptera-specific arrangement and is unique not only in Hesperiidae but also in Lepidoptera (Figure 1). The presence of pseudo-trnS1 and an upstream 7 bp gene fragment in the mitogenome of O. angulatum supported the hypothesis that duplication of trnN obeyed the TDRL model. This current genomic rearrangement likely occurred due to the tandem duplication of the gene block trnN-trnS1, forming the structure of trnN-trnS1-trnN-trnS1 and then the subsequent random loss of trnS1 in the first copy, causing the current arrangement trnNa-trnNb-trnS1 and a non-coding region including a 37-bp pseudo-trnS1 ( Figure 4B). The translocation of trnN and trnS1 in the E. montanus (Pyrginae: Erynnini) mitogenome could also be explained through the TDRL model through a trnN-trnS1 tandem duplication [51]. Therefore, we hypothesize that the gene block trnN-trnS1-trnE is the hot spot of gene rearrangement in the Tagiadini of Pyrginae.

Phylogenetic Analyses
The six phylogenetic trees (3 datasets × 2 methods) show topologies that are nearly congruent with most branches receiving strong support. The topologies generated by the PCGRT dataset have a higher support rate than PCG and PCG12RT, which indicates that the RNA genes have more phylogenetic resolution [81]. Since the phylogenetic topologies obtained by both methods (ML and BI) using the PCGRT dataset are concordant, we only showed the ML tree ( Figure 5, all remaining dendrograms are shown in supplementary data: PCGRT dataset with the method of the BI in Figure S2, PCG dataset with the method of the ML and BI in Figures S3 and S4, and PCG12RT dataset with the method of ML and BI in Figures S5 and S6). In general, the phylogenomic relationships recovered in our analysis are nearly identical to the most recent mitochondrial phylogenomic studies [39,54,57]. Nevertheless, Pyrginae and Eudaminae show different phylogenetic relationships in the 6 phylograms: Pyrginae was polyphyletic by Eudaminae in both analyses of the PCG datase and PCG12RT datasets, but monophyletic in the BI and ML analysis of the PCGRT dataset (nodal support value: BS = 58, BPP = 0.796). Nowadays, based on the study of the mitogenome data and united mitogenome/nuclear genome datasets, the monophyly of Pyrginae still remains unclear [82]. For the sake of resolving the problem of the taxonomic status of Pyrginae, a denser taxonomic sampling of mt genomes, more complete transcriptome or genomics data, and better linkage between morphological features and molecular data is required.
As expected, O. angulatum clustered with the other three Tagiadini species, T. tethys, C. vasava, T. tethys and T. vajuna in all 6 phylograms produced. Moreover, a consistent In general, the phylogenomic relationships recovered in our analysis are nearly identical to the most recent mitochondrial phylogenomic studies [39,54,57]. Nevertheless, Pyrginae and Eudaminae show different phylogenetic relationships in the 6 phylograms: Pyrginae was polyphyletic by Eudaminae in both analyses of the PCG datase and PCG12RT datasets, but monophyletic in the BI and ML analysis of the PCGRT dataset (nodal support value: BS = 58, BPP = 0.796). Nowadays, based on the study of the mitogenome data and united mitogenome/nuclear genome datasets, the monophyly of Pyrginae still remains unclear [82]. For the sake of resolving the problem of the taxonomic status of Pyrginae, a denser taxonomic sampling of mt genomes, more complete transcriptome or genomics data, and better linkage between morphological features and molecular data is required.
As expected, O. angulatum clustered with the other three Tagiadini species, T. tethys, C. vasava, T. tethys and T. vajuna in all 6 phylograms produced. Moreover, a consistent topology was obtained: (((T. tethys + T. vajuna) + C. vasava) + O. angulatum). The gene rearrangement in mitogenomes was expected to provide valuable information for the reconstruction of molecular phylogeny. However, it seems that majority of gene rearrangements could be observed in the tribe Tagiadini within the family Hesperiidae. Theoretically, it is beneficial to use mt genome rearrangement as a phylogenetic marker because rearrangements of the mt genome appear to be unique and rare events, which are stable once they have occurred, and the rearrangement genes are homologous. The synapomorphy of gene rearrangement supported an insect-crustacean clade and further study of the arrangements of the mt genome will help to understand and to improve the higher-level taxonomy and systematics [83,84].

Conclusions
We sequenced the mitogenome of O. angulatum and found that the duplication of trnN was unique among all the characterized mitogenomes in Lepidoptera. The duplication of trnN obeyed the TDRL model, where one of the trnS1 lost some parts and became a pseudo-gene after the tandem-duplication of the element trnN1-trnS1. Comparing with the duplication pattern of closely related branch, we suggest that the trnN duplication of O. angulatum probably has an independent origin in Tagiadini. We compared the results from different datasets and methods to reconstruct the phylogenetic reconstruction of the family Hesperiidae, and suggest that the topologies generated by the PCGRT dataset had a higher node support rate.