Mitochondrial Genomes of Hestina persimilis and Hestinalis nama (Lepidoptera, Nymphalidae): Genome Description and Phylogenetic Implications

Simple Summary In this study, the mitogenomes of Hestina persimilis and Hestinalis nama were obtained via Sanger sequencing. Comparison with other mitogenomes of Apaturinae butterflies shows that the mitogenomes of Hestina persimilis and Hestinalis nama are highly conserved. The phylogenetic trees built from mitogenomic data show that the relationships among Nymphalidae are similar to those found in previous studies. Hestinalis nama is separated from Hestina and is closely related to Apatura, with which it forms a monophyletic clade.

Abstract In this study, the complete mitochondrial genomes (mitogenomes) of Hestina persimilis and Hestinalis nama (Nymphalidae: Apaturinae) were acquired. The mitogenomes of H. persimilis and H. nama are 15,252 bp and 15,208 bp in length, respectively. These two mitogenomes have the typical composition, including 37 genes and a control region. The start codons of the protein-coding genes (PCGs) in the two mitogenomes follow the typical pattern ATN, except for cox1, which starts with CGA. Twenty-one tRNA genes show the typical cloverleaf structure; however, trnS1(AGN) lacks the dihydrouridine (DHU) stem. The secondary structures of rrnL and rrnS of the two species were predicted, and several new stem loops were found near the 5′ end of the rrnL secondary structure. Based on comparative genomic analysis, four similar conserved structures were found in the control regions of the two mitogenomes. Phylogenetic analyses were performed on mitogenomes of Nymphalidae. The phylogenetic trees show that the relationships among Nymphalidae are generally identical to those in previous studies, as follows: Libytheinae\Danainae + ((Calinaginae + Satyrinae) + Danainae\Libytheinae + ((Heliconiinae + Limenitidinae) + (Nymphalinae + (Apaturinae + Biblidinae)))). Hestinalis nama is separated from Hestina and is closely related to Apatura, with which it forms a monophyletic clade.


Introduction
Hestina persimilis and Hestinalis nama belong to the subfamily Apaturinae of the lepidopteran family Nymphalidae and are mainly distributed in the Palaearctic and Oriental regions. The adults inhabit mountainous broad-leaved forests and feed on tree sap and on water in wetlands. The larvae have been reported as pests of their host plants in the Ulmaceae. At present, only one species of the genus Hestinalis (H. nama) and three species of the genus Hestina (Hestina persimilis, Hestina assimilis and Hestina nicevillei) are known from China. Hestinalis nama was originally described as Diadema nama by Doubleday in 1844 [1]. In subsequent studies, however, later scholars placed it in the genus Hestina [2,3]. The classification of butterflies is mainly based on the characteristics of the external genitalia and wing veins. Morphologically, including in genitalic structure, Hestina and Hestinalis are easily separable [4]. In recent studies, Hestinalis has been treated as a distinct genus [5]. Although most modern literature separates the two genera, some authors, such as Wu and Hsu, still treat them as one [6]. In this paper, phylogenetic analysis shows that Hestinalis nama is separated from Hestina. Therefore, we also treat them as separate genera.
Mitogenome fragments have been extensively used in phylogenetic analyses of butterflies and moths, particularly the cox1 gene, which is primarily used as a DNA barcode for animals [7][8][9][10][11]. In the BOLD system [12], Lepidoptera account for the largest amount of sequence data. However, the phylogenetic relationships at different taxonomic levels are still controversial [13][14][15]. It has been proposed that complete mitochondrial genomes might provide more genetic information than a single gene fragment [16][17][18]. Therefore, sequencing more mitogenomes might improve our understanding of evolution and phylogeny at different taxonomic levels in Lepidoptera. Furthermore, the mitogenome has been widely used in studies of population genetic structure, genetic drift and phylogenetics because of its maternal inheritance, small genome size (15-20 kb in length) and rapid rate of evolution [19,20]. To date, only one complete mitogenome (H. assimilis) has been sequenced from the genus Hestina; no mitogenome is available for the other Hestina species or for Hestinalis nama, which restricts our understanding of evolution in Nymphalidae at the genomic level. In this study, the mitogenomes of H. persimilis and H. nama were obtained, with the aim of: (1) providing a comparative analysis of Apaturinae mitogenomes, including nucleotide composition, codon usage, gene arrangement, predicted tRNA and rRNA secondary structures and novel features of the control region, and (2) reconstructing the phylogenetic relationships among subfamilies in Nymphalidae based on mitogenomic data.

Sampling and DNA Sequencing
Specimens of H. persimilis and H. nama were collected from the Sichuan and Yunnan Provinces of China in 2010. The specimens were prepared and then identified morphologically. The hind leg from one side of each sample was preserved in absolute ethanol and stored in a −20 °C freezer at the College of Plant Protection, Shanxi Agricultural University, Taiyuan, China.
The DNA extraction kit and primers [21] (Table S1) were produced by Shanghai Major Biomedical Technology Co., Ltd. (Shanghai, China). Each PCR amplification was carried out in a 25 µL reaction containing 0.5 µL each of the upstream and downstream primers, 12.5 µL of PCR Master Mix, 3 µL of DNA template and 8.5 µL of ddH2O. The amplification conditions were as follows: initial denaturation at 94 °C for 2 min; 35 cycles of denaturation at 94 °C for 1 min, annealing at 53 °C for 45 s and extension at 72 °C for 1 min; and a final extension step at 72 °C for 4 min. PCR products were checked by 1% agarose gel electrophoresis. All the gene fragments were sent to Shanghai Major Biomedical Technology for sequencing.
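The reaction recipe and cycling programme above lend themselves to a quick arithmetic check. The following Python sketch is only an illustration: the dictionary keys and function names are hypothetical, and the run-time estimate counts programmed hold times only, ignoring temperature ramping.

```python
# Per-reaction volumes in microlitres, as stated in the text.
REACTION_UL = {
    "upstream primer": 0.5,
    "downstream primer": 0.5,
    "PCR Master Mix": 12.5,
    "DNA template": 3.0,
    "ddH2O": 8.5,
}


def total_volume(components):
    """Sum the component volumes of one reaction (µL)."""
    return sum(components.values())


def cycling_seconds(initial=120, cycles=35, denature=60, anneal=45,
                    extend=60, final=240):
    """Programmed hold time of the PCR run in seconds (ramping ignored)."""
    return initial + cycles * (denature + anneal + extend) + final


if __name__ == "__main__":
    print(total_volume(REACTION_UL))   # 25.0 µL, matching the stated reaction volume
    print(cycling_seconds() / 60)      # 102.25 minutes of programmed hold time
```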

Annotation and Analysis of Mitochondrial DNA
The original sequence fragments were assembled with SeqMan (Steve ShearDown, 1998-2001 version reserved by DNASTAR Inc., Madison, WI, USA) to obtain the complete mitogenomes. The secondary structures of the tRNA genes were predicted with the tRNAscan-SE Search Server (http://lowelab.ucsc.edu/tRNAscan-SE/; accessed on 28 June 2021) [22]. Putative tRNA genes that could not be found by tRNAscan-SE, including trnH and trnS1(AGN), were confirmed by comparison with the homologous genes of other Apaturinae species. PCGs and rRNA genes were identified with the MITOS webserver using the invertebrate genetic code [23]. The nucleotide composition and codon usage of PCGs were calculated with MEGA-X [24]. Tandem repeats in the control regions were identified using the Tandem Repeats Finder online software (http://tandem.bu.edu/trf/trf.html; accessed on 10 May 2020) [25]. The mitogenomes of H. persimilis and H. nama were uploaded to GenBank under the accession numbers MT110153 and MT110154.

Phylogenetic Analysis
Phylogenetic analysis was performed on a dataset of 13 PCGs from 54 complete or nearly complete mitogenomes of Nymphalidae, with two Papilionidae species selected as outgroups (Table S2). The PCGs of all 56 mitogenomes were aligned with MEGA-X. The optimal partitioning schemes and substitution models were selected with PartitionFinder v2 (Tables S3 and S4) [26][27][28]. The maximum likelihood (ML) and Bayesian inference (BI) analyses were conducted through the online CIPRES Science Gateway [29]. The ML analysis was performed with RAxML-HPC2 on XSEDE [30], with the GTRGAMMA model applied to all partitions. Bootstrap values were estimated with 1000 replicates. The BI analyses were carried out with two independent Markov chain Monte Carlo (MCMC) chains, each run for 1,000,000 generations and sampled every 1000 generations.
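To make the partitioning step more concrete, the sketch below generates starting data blocks for a concatenated PCG alignment by splitting every gene into its three codon positions, which would give 39 blocks for 13 PCGs. This scheme is an assumption for illustration; the starting partitions actually used are those listed in Table S3. The function names and the gene lengths in the example are hypothetical.

```python
def codon_position_blocks(gene_lengths):
    """Build codon-position data blocks for a concatenated alignment.

    gene_lengths: list of (gene name, aligned length in bp) in concatenation
    order. Returns (block name, start, end, step) tuples with 1-based,
    inclusive coordinates.
    """
    blocks = []
    offset = 0
    for name, length in gene_lengths:
        if length % 3 != 0:
            raise ValueError(f"{name}: aligned length {length} is not a multiple of 3")
        for pos in (1, 2, 3):
            blocks.append((f"{name}_pos{pos}", offset + pos, offset + length, 3))
        offset += length
    return blocks


def format_block(block):
    """Render one block in the 'name = start-end\\step;' data-block style."""
    name, start, end, step = block
    return f"{name} = {start}-{end}\\{step};"


if __name__ == "__main__":
    # Toy lengths only; real aligned gene lengths would be used in practice.
    for block in codon_position_blocks([("nad2", 9), ("cox1", 12)]):
        print(format_block(block))
```

Starting from the finest sensible division lets the model-selection step merge blocks that evolve similarly, rather than forcing a single model onto all sites.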

Mitogenomes Organization
The complete mitogenomes of H. persimilis and H. nama are 15,252 and 15,208 bp in length, respectively. They share the same gene content, order and arrangement as most other lepidopterans, including 13 PCGs, 22 tRNAs and 2 rRNAs (rrnL and rrnS) (Figures 1 and 2). Each mitogenome is a circular, double-stranded molecule. The heavy strand (H-strand) encodes most of the genes (9 PCGs and 14 tRNAs), while the light strand (L-strand) carries the remaining genes (4 PCGs, 8 tRNAs and 2 rRNAs), as shown in Tables 1 and 2. In addition, the nucleotide composition of both species is AT-biased, as in other lepidopterans. The AT contents of the mitogenomes of H. persimilis and H. nama are 80.9% and 79.2%, respectively (Tables 3 and 4). This pronounced AT bias (Table S5) is generally believed to be related to the evolutionary origin of mitochondria [31].
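As an illustration of how such composition statistics are obtained, the following Python sketch computes the AT content of a sequence. It also reports AT skew, (A − T)/(A + T), and GC skew, (G − C)/(G + C), which are conventional measures of compositional bias; they are added here as an assumption, since the text reports only AT content. The function name is hypothetical, and ambiguous bases are simply ignored.

```python
from collections import Counter


def base_composition(seq):
    """Return AT content (%) and AT/GC skew for a nucleotide sequence."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    n = sum(counts.values())
    if n == 0:
        raise ValueError("sequence contains no unambiguous bases")
    a, c, g, t = (counts[b] for b in "ACGT")
    return {
        "AT_percent": 100.0 * (a + t) / n,
        "AT_skew": (a - t) / (a + t) if a + t else 0.0,
        "GC_skew": (g - c) / (g + c) if g + c else 0.0,
    }


if __name__ == "__main__":
    # Toy sequence, not taken from either mitogenome.
    print(base_composition("AAATGC"))
```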

Protein Coding Genes and Codon Usage
Orthologous genes in the two mitogenomes have similar start and stop codons. Most PCGs start with the typical initiation codon ATN, but cox1 starts with CGA. The putative start codon CGA in the cox1 gene is a common feature of most sequenced lepidopterans, although a few species start with ATG, ATT, ATA or TTG. While most PCGs end with the stop codon TAA or TAG, a truncated stop codon T is also detected in cox2 and nad4. It has been proposed that truncated stop codons can be completed by polyadenylation, as has also been found in other insect mitogenomes [32].
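The start and stop codon patterns described above can be checked programmatically. The sketch below is a minimal illustration and assumes that each PCG is supplied on its coding strand from the first base of the start codon to the last annotated base; the function names are hypothetical. A truncated T or TA is padded with A residues, mimicking completion to TAA by polyadenylation.

```python
COMPLETE_STOPS = ("TAA", "TAG")
TRUNCATED_STOPS = ("T", "TA")


def classify_pcg(seq):
    """Return (start codon, start type, stop, stop type) for one PCG."""
    seq = seq.upper()
    start = seq[:3]
    start_type = "ATN" if start.startswith("AT") else "non-ATN"
    remainder = len(seq) % 3
    if remainder == 0:
        stop = seq[-3:]
        stop_type = "complete" if stop in COMPLETE_STOPS else "none"
    else:
        # Leftover bases after the last full codon form the truncated stop.
        stop = seq[-remainder:]
        stop_type = "truncated" if stop in TRUNCATED_STOPS else "none"
    return start, start_type, stop, stop_type


def complete_stop(stop):
    """Pad a truncated stop (T or TA) with A residues to give TAA."""
    return stop + "A" * (3 - len(stop))


if __name__ == "__main__":
    # Toy sequences: a cox1-like CGA start, and a gene ending in a single T.
    print(classify_pcg("CGAAAAGGATAA"))
    print(classify_pcg("ATGAAAGGAT"), complete_stop("T"))
```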
Relative synonymous codon usage (RSCU) directly reflects codon usage preference (Table S6). The most frequently used codons are all composed of A and/or T. Conversely, some GC-rich codons are seldom utilized in the Apaturinae species. For example, the codons UCG and CCG are not used in H. persimilis, while CUG, GUC and CCG are absent in Sasakia charonda. This phenomenon is common among lepidopterans [33,34] and indicates that the GC content of genes is closely related to codon preference [35,36].
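For readers unfamiliar with the measure, RSCU for a codon is its observed count multiplied by the number of synonymous codons for that amino acid and divided by the total count of all codons for that amino acid; a value above 1 indicates preferential use. The Python sketch below is one way to compute it under the invertebrate mitochondrial genetic code (NCBI translation table 5). It is not the MEGA-X implementation; the function name is hypothetical, and codons are handled in DNA form, so UCG in the text corresponds to TCG here.

```python
from collections import Counter, defaultdict
from itertools import product

# Invertebrate mitochondrial code (NCBI table 5), codons ordered T, C, A, G.
AA_TABLE5 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
CODON_TO_AA = dict(zip(CODONS, AA_TABLE5))


def rscu(cds):
    """Return {codon: RSCU} for a coding sequence; stop codons are excluded."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # drop any trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


if __name__ == "__main__":
    # Toy sequence with three leucine codons: TTA, TTA, TTG.
    result = rscu("TTATTATTG")
    print(result["TTA"], result["TTG"], result["CTG"])
```

In the toy example the six-fold leucine family is used three times, so TTA scores 2 × 6 / 3 = 4.0, TTG scores 2.0 and the unused CTG scores 0.0.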

Transfer RNAs and Ribosomal RNAs
All 22 tRNAs typical of lepidopteran mitogenomes are found in the two mitogenomes. Most tRNA genes fold into the classical cloverleaf secondary structure, except for trnS1(AGN), in which the DHU arm forms a simple loop; this is considered a typical feature of metazoan mitogenomes (Figure 5) [37]. Additionally, the anticodon stem of trnS1(AGN) may be shortened because of base mismatches in some insect mitogenomes [38]. Previous studies showed that not only trnS1(AGN) but also some other tRNAs, such as trnS2(UCN) and trnG, lack a DHU or TΨC arm [39]. Structures with a missing DHU arm and base mismatches are thermodynamically unstable, which suggests that a DHU arm might not really exist. Accordingly, this special structure of trnS1(AGN) still needs further investigation. In addition, it has been shown that some isoforms of tRNAs can be found in control regions or in some PCGs on the L-strand of mitogenomes. These tRNA isoforms can also fold into cloverleaf structures. However, it is not clear whether their functions are similar to those of tRNAs [40].
The rrnL gene is found between trnL(CUN) and trnV, while the rrnS gene is located between trnV and the control region. The rrnL genes of the H. persimilis and H. nama mitogenomes are 1334 bp (AT content 84.29%) and 1327 bp (AT content 83.44%) in length, respectively. The rrnS genes of H. persimilis and H. nama are 776 bp (AT content 85.03%) and 774 bp (AT content 84.39%), respectively. The secondary structure of rrnL comprises six structural domains, of which domain III is absent in arthropods (Figure 6). The rrnS genes include three structural domains (Figure 7). The secondary structures of both rrnL and rrnS of the two species are roughly similar to those of other insects, such as Amata emma, Apis mellifera [41], Grapholita molesta [42] and Manduca sexta [43]. The microsatellite sequence (TA)n is not found in rrnL or rrnS, although it exists in other insects (e.g., Choristoneura longicellana). There are several new stem loops near the 5′ end of the rrnL secondary structure, and these loops have not been found in other insects.

Figure legend: dashes, black dots and circles indicate Watson-Crick base pairings, G-U bonds and U-U, A-A, A-C and A-G bonds, respectively.

Intergenic and Overlapping Regions
It has been proposed that mitogenomes tend to be highly economized in size by eliminating or reducing intergenic spacers [44]. However, excluding the control region, 12 intergenic spacers (1 to 91 bp, 150 bp in total) are found in H. persimilis, and 11 intergenic spacers (1 to 69 bp, 118 bp in total) are found in H. nama. It has been reported that lepidopteran mitogenomes usually have two typical and relatively conserved intergenic spacers. The longer one is located between the trnQ and nad2 genes, with a length of 91 and 69 bp in H. persimilis and H. nama, respectively. Previous studies found that the nucleotide sequences of the trnQ-nad2 spacer and the nad2 gene are highly similar, and it has been inferred that the trnQ-nad2 spacer may have originated from the nad2 gene [45]. The shorter spacer is located between the trnS2(UCN) and nad1 genes, with a length of 22 and 13 bp in H. persimilis and H. nama, respectively, and contains the conserved sequence ATACTAA.
Compared with the intergenic spacers, the overlapping regions are more conserved [46]. Fourteen overlapping regions (1 to 26 bp, 66 bp in total) are found in H. persimilis, and fourteen overlapping regions (1 to 8 bp, 42 bp in total) are found in H. nama. The atp8 and atp6 genes overlap at the ATGATAA motif in the two mitogenomes, which has also been reported in many other lepidopterans [47].
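Counts of intergenic spacers and overlaps such as those reported above follow directly from the annotated gene coordinates. The sketch below illustrates the arithmetic for a linearized annotation: for two neighbouring genes, the gap is the start of the second minus the end of the first minus one, so a positive value is a spacer and a negative value is an overlap. The function names and the toy coordinates are hypothetical and are not taken from the annotation tables; the control region is assumed to have been excluded beforehand, and the junction that wraps around the circular genome is not handled.

```python
def junctions(genes):
    """Gap between each pair of neighbouring genes.

    genes: list of (name, start, end), 1-based inclusive, sorted by start.
    Returns (left gene, right gene, gap); gap > 0 is an intergenic spacer,
    gap < 0 is an overlap and gap == 0 means the genes are directly adjacent.
    """
    return [(n1, n2, s2 - e1 - 1)
            for (n1, _, e1), (n2, s2, _) in zip(genes, genes[1:])]


def summarize(genes):
    """Count spacers and overlaps and total their lengths in bp."""
    gaps = [gap for _, _, gap in junctions(genes)]
    spacers = [g for g in gaps if g > 0]
    overlaps = [-g for g in gaps if g < 0]
    return {
        "spacers": len(spacers),
        "spacer_bp": sum(spacers),
        "overlaps": len(overlaps),
        "overlap_bp": sum(overlaps),
    }


if __name__ == "__main__":
    # Toy coordinates chosen to give a 91 bp spacer and a 7 bp overlap.
    toy = [
        ("trnQ", 1, 69),
        ("nad2", 161, 1174),
        ("trnW", 1173, 1240),
        ("atp8", 1241, 1402),
        ("atp6", 1396, 2073),
    ]
    print(junctions(toy))
    print(summarize(toy))
```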

Putative Control Regions
The control region, also known as the A + T-rich region or D-loop, is always the largest intergenic spacer in animal mitogenomes and is considered the region where replication is initiated [48]. The control regions of the two mitogenomes (376 bp in H. persimilis and 390 bp in H. nama) are located between rrnS and trnM. Their AT content is also the highest in the mitogenomes (91.23% in H. persimilis and 88.72% in H. nama). Previous studies indicated that the control region is the segment with the fastest evolutionary rate and can be used as an important molecular marker in animal population genetics.
There are generally four conserved structures in the control region: an ATAGA motif located downstream of rrnS and followed by a 19 bp poly-T stretch; a poly-A stretch (9 bp in H. persimilis and 6 bp in H. nama) upstream of trnM (Figures 8 and 9); microsatellite-like repeat regions ((AT)10 in H. persimilis and (AT)6 in H. nama); and repeated sequences (23 bp in H. persimilis and 25 bp in H. nama). All these features are generally considered to be related to the transcription or replication of mitogenomes [49]. Although the location of the replication origin differs among holometabolous insects (including lepidopterans), it always lies after a poly-T stretch (about 10-20 bp) (Figure 10) [50]. Accordingly, the poly-T stretch may be involved in recognition of the replication origin [51].
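Conserved elements of this kind can be located with simple pattern matching. The sketch below is a rough illustration rather than the procedure used in this study: the regular expressions, the minimum run lengths and the function name are assumptions, the ATAGA motif is assumed to be immediately followed by the poly-T stretch, and the sequence is assumed to be oriented so that the motifs read as written. Longer tandem repeats, such as the 23 bp and 25 bp units, are better detected with a dedicated tool such as Tandem Repeats Finder.

```python
import re

# Assumed thresholds: poly-T of at least 10 bp, (AT)n with n >= 5, poly-A of at least 6 bp.
MOTIFS = {
    "ATAGA + poly-T": re.compile(r"ATAGAT{10,}"),
    "(AT)n microsatellite": re.compile(r"(?:AT){5,}"),
    "poly-A": re.compile(r"A{6,}"),
}


def scan_control_region(seq):
    """Return (motif name, start, end, matched text), 1-based inclusive, sorted by start."""
    seq = seq.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(seq):
            hits.append((name, m.start() + 1, m.end(), m.group(0)))
    return sorted(hits, key=lambda h: h[1])


if __name__ == "__main__":
    # Synthetic sequence built to contain each element once; not real data.
    toy = "GG" + "ATAGA" + "T" * 19 + "CC" + "AT" * 10 + "GG" + "A" * 9
    for name, start, end, text in scan_control_region(toy):
        print(name, start, end, len(text))
```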

Conclusions
The mitogenomes of H. persimilis and H. nama were obtained using Sanger sequencing. Comparison with other mitogenomes of Apaturinae butterflies shows that the mitogenomes are highly conserved, sharing the same gene order, gene location, codon usage, nucleotide composition and AT-biased pattern. The secondary structures of rrnL and rrnS of the two species are roughly similar to those of other lepidopterans. Although the control regions vary greatly in length, their structure has changed little and includes four basic conserved regions. The topologies of the phylogenetic trees are generally identical to those of other studies. Hestinalis nama is not grouped with Hestina and is closely related to Apatura, which is consistent with earlier studies.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/insects12080754/s1, Figure S1: phylogenetic relationships by BI analysis based on the 13 PCGs. Figure S2: phylogenetic relationships by ML analysis based on the 13 PCGs. Table S1: sequences of 22 pairs of primers. Table S2: list of species used to construct the phylogenetic tree. Table S3: the starting partitions used to initiate the PartitionFinder analysis. Table S4: evolutionary models from partition strategies start scheme used in the phylogenetic analysis. Table S5: bias of base composition of protein-coding genes in the two mitogenomes. Table S6: