Next-Generation Sequencing of Four Mitochondrial Genomes of Dolichovespula (Hymenoptera: Vespidae) with a Phylogenetic Analysis and Divergence Time Estimation of Vespidae

Simple Summary Mitochondria are vital organelles found in most eukaryotes that are involved in energy metabolism. The mitochondrial genome (mtgenome) is the genetic material of mitochondria, which is responsible for the coding and translation of proteins needed by organelles. By studying the mtgenomes of insects, we can better comprehend the phylogenetic relationships and molecular evolution of insects. The mtgenomes of four wasps are described in this paper, along with comparative genomic analyses. There are gene rearrangement events and 37 genes in each of the four mtgenomes. The subfamily Stenogastrinae is the sister group to the remaining Vespidae family, and according to our concluding analysis, the genus Vespa is more closely linked to the genus Vespula than to the genus Dolichovespula. We provide new evidence for the two-origin hypothesis of eusociality in the Vespidae. This work will aid us in future research on species evolution and phylogeny in the family Vespidae. Abstract The wasp genus Dolichovespula (Hymenoptera: Vespidae: Vespinae) is a eusocial wasp group. Due to the taxonomic and phylogenetic issues with the family Vespidae, more genetic data should be gathered to provide efficient approaches for precise molecular identification. For this work, we used next-generation sequencing (also known as high-throughput sequencing) to sequence the mitochondrial genomes (mtgenomes) of four Dolichovespula species, viz. D. flora, D. lama, D. saxonica, and D. xanthicincta 16,064 bp, 16,011 bp, 15,682 bp, and 15,941 bp in length, respectively. The mitochondrial genes of the four species are rearranged. The A + T content of each mtgenome is more than 80%, with a control region (A + T-rich region), 13 protein-coding genes (PCGs), 22 tRNA genes, and two rRNA genes. There are 7 to 11 more genes on the majority strands than on the minority strands. Using Bayesian inference and Maximum-Likelihood methodologies as well as data from other species available on GenBank, phylogenetic trees and relationship assessments in the genus Dolichovespula and the family Vespidae were generated. The two fossil-based calibration dates were used to estimate the origin of eusociality and the divergence time of clades in the family Vespidae. The divergence times indicate that the latest common ancestor of the family Vespidae appeared around 106 million years ago (Ma). The subfamily Stenogastrinae diverged from other Vespidae at about 99 Ma, the subfamily Eumeninae at around 95 Ma, and the subfamily Polistinae and Vespinae diverged at approximately 42 Ma. The genus Dolichovespula is thought to have originated around 25 Ma. The origin and distribution pattern of the genus Dolichovespula are briefly discussed.


Introduction
The long-cheeked yellow jacket genus Dolichovespula consists of 19 species distributed in Europe/Asia, North Africa, and North America [1][2][3][4][5]. Archer tentatively divided Dolichovespula into seven species groups and discussed their distribution in detail [1,2]. However, China, with one of the most speciose faunae of the world, was not well researched at that time. Ten known species belonging to all seven groups as defined by Archer occur in China. More collection information has been added with the development of our study on Chinese vespids in recent years. The nests and all three castes of D. flora and D. stigma were discovered, and D. baileyi Archer, 1987 is now considered to be synonymous with D. stigma Lee, 1986 [4,6]. The additional data, as important evidence, improved group division [5,7]. There are still two high-altitude species, D. lama and D. xanthicincta, with their nests and male data lacking. More data, especially molecular data, is expected to support further research.
The normal insect mitochondrial genome (mtgenome) measures around 16 kb, is double-stranded and closed-circular, and typically contains 13 protein-coding genes (PCGs), 22 tRNA genes, two rRNA genes, and a control region (A + T-rich region) [8,9]. The mtgenome is a suitable molecular marker for molecular evolution, phylogenetic relationships, and species identification due to its maternal inheritance, short coalescence time, rapid mutation rate, and conservative gene components [10][11][12][13]. Frequent gene rearrangements and a high A + T content are common characteristics of the mtgenomes of Hymenoptera [14,15]. To date, 56 annotated mtgenomes representing 34 species of Vespidae are available in the GenBank. Among them, only one species of genus Dolichovespula (D. panda) has been annotated [9]. Three additional Dolichovespula (i.e., D. sylvestris, D. saxonica, and D. media) mtgenomes are found in GenBank. However, no annotated data are available for these three mtgenomes.
In this research, we sequenced the mtgenomes of Dolichovespula flora, D. lama, D. saxonica, and D. xanthicincta via high-throughput sequencing and annotated them. General features of the mtgenomes were described. The phylogenetic trees in the genus Dolichovespula and in the family Vespidae were constructed and the divergence times of the family Vespidae were estimated.

Sample Gathering and DNA Isolation
Supplementary Table S1 provides specifics regarding sample gathering. After identification, each specimen was stored in a sealed box and put in a −20 • C freezer. According to the manufacturer's instructions, mitochondrial DNA was extracted from the legs of each wasp specimen using the DNeasy tissue kit (Qiagen, Hilden, Germany). The entomological collections of the Beijing Academy of Agriculture and Forestry Sciences housed the voucher DNA. The DNA concentration was quantified using Qubit 3.0 (Invitrogen, Life Technologies, Carlsbad, CA, USA).

Mitochondrial Genome Sequencing and Assembly
The mtgenomic sequences of the four Dolichovespula species were obtained by highthroughput sequencing (GenBank: OP250139 D. flora, OP250140 D. lama, OP250141 D. saxonica, and OP250142 D. xanthicincta). A string of additional NNNs was inserted to indicate that there are gaps in our submissions and that the genome sequences are incomplete. The libraries were sequenced on Illumina Hiseq 2500 with the strategy of 250 paired-ends by BerryGenomics Company (Beijing, China), and constructed on the Illumina TruSeq@ DNA PCR-Free HT Kit with target insert size 500 bp. Trimmomatic version 0.36 [16] was used to delete adapter sequences and trim low-quality bases. Considering the short reads generated in this research, the presumed mitochondrial targets were ascertained by BLAST searching from the raw reads. A Perl script (FastqExtract.pl accessed on 11 March 2021) was used to obtain the targeted mitochondrial reads which were assembled into contigs under IDBA UD version 1.1.1 and Celera Assembler version 8.3rc2 [17]. Geneious version 8.0 was used to conduct the de novo assembly of the mitochondrial contigs [18].

Mitochondrial Genome Annotation and Analysis
The sequences were annotated online by Mitos Web Server [19] with the parameters Genetic Code = "5 Invertebrate" and Reference = "RefSeq 63 Metazoa". The mtgenomic maps of the four Dolichovespula species were generated by OrganellarGenomeDRAW [20]. The PCGs were determined by finding the open reading frames (ORFs) and then performing BLAST searches in GenBank. The tRNA gene loci and secondary structure were predicted using the tRNAscan-SE Search Server version 1.21 [21]. The rRNA genes and A + T-rich regions were identified by their adjacent genes. The nucleotide composition was automatically displayed in the Geneious software. AT-and GC-skews were computed as follows: AT-skew = (A − T%)/(A + T%) and GC-skew = (G − C%)/(G + C%) [22]. The start/stop codon usage of 13 protein-coding genes was calculated by CodonW (written by John Peden, University of Nottingham, Nottingham, UK). The gene arrangements were compared with the putative ancestral arrangement of the insect mitochondrial genome [23].

Phylogenetic Analysis
Combining the available data published on NCBI, sequences of 13 PCGs and two rRNAs (rrnL and rrnS) of 38 vespids species were used for phylogenetic analyses with an additional two species of Formicidae as outgroups (Table S2). Sequences of five mitochondrial genes (cob, cox1, cox2, rrnL, and rrnS) of 14 Dolichovespula species were also used to construct the phylogenetic trees of the genus Dolichovespula (Table S3). Supplementary  Tables S2 and S3 show detailed information for the species included in this study.
The PCGs of each species were translated into amino acids first, and then aligned using ClustalW with the default parameters implemented in MEGA 7.0 [24]. The two rRNA genes were aligned by MAFFTv7.0 online server (the Q-INS-i strategy is selected in iterative refinement methods because it considers the secondary structure of rRNA) [25]. The ambiguous positions in all alignments were deleted using Gblocks v0.91b [26]. Then, these aligned sequences were concatenated using SequenceMatrix v1.7 [27]. The best partitioning strategy and substitution models for each dataset were determined by PartitionFinder 2.1.1 with the following settings: branch lengths as linked; model election as AICc with the greedy algorithm [28]. Due to the predefined partitions of configuration files, the PCG123 dataset was determined to be 39 data blocks (three codon positions of 13 PCGs were separated), and the PCG123R (PCG123 and rRNAs) dataset was determined to be 41 data blocks (39 data blocks for 13 PCGs and 2 data blocks for rrnL and rrnS, respectively) in the phylogenetic analysis of Vespidae. The partitioning schemes and models are shown in Tables S4 and S5. In the same way, the Dolichovespula dataset was determined to be 11 data blocks (9 data blocks for the 3 PCGs and 2 data blocks for rrnL and rrnS, respectively). The partitioning schemes and models are shown in Tables S6 and S7. The PCG123 dataset and PCG123R dataset were used in follow-up phylogenetic analyses.
The phylogenetic trees were generated using the Bayesian inference (BI) method by MrBayes v3.2.2 [29]. Four independent Markov chains were run for 10 million metropoliscoupled generations. Samples were taken every 1000 generations, with the first 25% thrown away as burn-in. Maximum-likelihood (ML) analysis was performed on 1000 repetitions using RAxML based on the fast likelihood method [30]. The CIPRES Science Gateway V3.3 was used to operate RAxML [31].

Divergence Time Estimation
The divergence time of Vespidae was estimated with BEAST v1.10.4 [32] based on the available data (PCG123R dataset). An uncorrelated lognormal relaxed clock model was employed. The best partitioning strategy and substitution models were determined using PartitionFinder 2.1.1 in the same way as the MrBayes analyses mentioned above, and considering the evaluation models usable in BEAST. The models of clock and tree except that of substitution were linked among partitions. The tree prior was a Yule process with a random starting tree.
The divergence time of the family Vespidae was estimated based on the two fossilbased calibration dates viz., Solenopsis richteri and Myrmica scabrinodis as a taxon set being used to calibrate node age prior under a normal prior distribution with a standard deviation of 0.5 and a mean of 93.4 [33], Abispa ephippium and Orancistrocerus aterrimus as another set being used to calibrate the divergence time with a standard deviation of 0.5 and a mean of 92 [34]. For the test runs with MCMC (Markov Chain Monte Carlo) the length was set to 500 million generations and sampling was undertaken every 10,000 generations.
A Tracer v1.7.2 with 10% burn-in was used to test the convergence of the chains to the stationary distribution. A TreeAnnotator v1.10.4 with 20% burn-in and a posterior probability limit of 50% was used to generate the dated maximum clade credibility tree. The final tree was visualized and edited in FigTree v1.4.4.

General Features of the Genomes
The four mtgenomes we sequenced are partial genomes mainly because of the A + T-rich region variations and the deletion of a few PCGs sequences. A string of NNNs was inserted representing the gaps in the sequences submitted to GenBank. However, to reflect the relative reality of the mtgenomes, this paper does not consider the inserted NNNs when analyzing the four mitochondrial genome sequences.
The mtgenome of each sequenced species is a double-strand of circular molecular DNA consisting of 37 genes (13 PCGs, 22 tRNA genes, and two rRNA genes) and a partial control region ( Figure 1). The majority strands (J-strands) of mtgenomes of D. flora and D. saxonica each have 23 genes, and the minority strands (N-strands) each contain the remaining 14 genes. The J-strands of the mtgenomes of D. lama and D. xanthicincta have 22 and 24 genes, respectively, and the N-strands contain the remaining 15 and 13 genes, respectively. The base composition of the mtgenomes is revealed using the AT-skew, GC-skew, and A + T content ( Table 1).
The assembled mtgenomes of the four species, viz. Dolichovespula flora, D. lama, D. saxonica, and D. xanthicincta, were 16,064 bp, 16,011 bp, 15,682 bp, and 15,941 bp long in turn, and their A + T content covered about 82.78%, 82.85%, 83.05%, and 82.21% of the genome, respectively. Each mtgenome has several intergenic regions and gene overlap regions which may increase the stability of the mitochondrial structures [35].

Protein-Coding Genes and Codon Usage
Total length of 13 PCGs in Dolichovespula flora were 10,831 bp, accounting for 67.42% of the whole mtgenome. The gene nad2 is only 672 bp long, shorter than in the other vespid species (Table S8), possibly because we do not have the complete sequence of the nad2. Compared with the complete mtgenome of Dolichovespula panda, there were estimated gaps (marked as NNNs) about 378 bp long in nad2 and 57 bp long in nad6. The A + T content of all 13 PCGs was 80.31% (Table 1). The PCG with the highest A + T content was nad2 (86.31%) and the lowest was cox1 (73.51%) ( Table 2). Three PCGs (nad3, nad5, nad1) started from the ATA codon, six genes (cox1, atp6, cox3, nad4, nad6, cob) from ATG, and four genes (nad2, cox2, atp8, nad4l) from ATT. The PCGs ended with a typical stop codon TAA, except for cox1 and nad1, which ended with the incomplete termination codon TA. All 13 PCGs in Dolichovespula lama were 11,204 bp long, accounting for 69.98% of the whole mtgenome. Compared with D. panda, there were estimated gaps 54 bp long in nad6. The A + T content of all 13 PCGs was 81.03% (Table 1). The PCG with the highest A + T content was nad2 (87.84%) and the lowest was cox1 (72.74%) ( Table 2). Three PCGs (nad3, nad4l, nad1) were initiated from the ATA codon, six genes (cox1, atp6, cox3, nad4, nad6, cob) from ATG, and four genes (nad2, cox2, atp8, nad5) from ATT. The PCGs ended with a typical termination codon TAA, except for cox3, which ended with the incomplete stop codon TA.
The 13 PCGs in Dolichovespula saxonica were 10,528 bp in length, accounting for 67.13% of the whole mtgenome. Possibly because we do not have the complete sequence, length of the cox1 gene is only 848 bp, which is much shorter than in the other vespid species (Table S8). Compared with D. panda, there were estimated gaps 687 bp long in cox1 and 51 bp long in nad6. The A + T content of all 13 PCGs was 80.70% (Table 1). The PCG with the highest A + T content was nad2 (88.61%) and the lowest was cox1 (73.30%) ( Table 2). One PCG (nad1) started from the ATA codon, seven genes (nad2, cox1, atp6, cox3, nad4, nad6, cob) from ATG, and five genes (cox2, atp8, nad3, nad5, nad41) from ATT. The PCGs ended with a typical stop codon TAA, except that cox1 and cox3 ended with the incomplete TA. The 13 PCGs in Dolichovespula xanthicincta were 11,222 bp in length, accounting for 70.40% of the whole mtgenome. Compared with D. panda, there were estimated gaps 24 bp long in nad6. The A + T content of all 13 PCGs was 79.96% (Table 1). The PCG with the highest A + T content was nad4 (85.38%) and the lowest was cox1 (74.14%) ( Table 2). Three PCGs (nad3, nad4l, nad1) started from the ATA codon, six genes (cox1, cox3, atp6, nad4, nad6, cob) from ATG, and four genes (nad2, cox2, atp8, nad5) from ATT. PCG cox3 ended with the incomplete stop codon TA, nad1 and nad4 ended with the termination codon TAG, and the remaining genes ended with a typical termination codon TAA.
The values of codon usage in the mtgenomes of the four Dolichovespula species show a remarkable bias toward A and T nucleotides. PCGs nad2, nad6, and atp8 have the highest levels of A + T content while the cox1 has the lowest level (Table 2). Among the 13 PCGs of the four mtgenomes, there were three types of start codon (ATT, ATA, and ATG) and three types of termination codon (TAA, TAG, and TA). The frequencies of Phe, Leu, and Ile were noticeably higher than those of other amino acids in the four sequenced mtgenomes, and the most frequent codons for the three amino acids were TTT (Phe), TTA (Leu), and ATT (Ile). Relative synonymous codon usage (RSCU) in the mitochondrial genomes of the Dolichovespula species is shown in Table S9 and Figures S1-S4.
In addition, 38 vespids were selected for phylogenetic tree construction with their start codons and stop codons compared in Figure 2.

tRNA and rRNA Genes
Each mtgenome of the four Dolichovespula species contained 22 tRNA genes and two rRNA genes. Among them, 13 tRNA genes were found on the J-strand and the remaining nine were on the N-strand in the mtgenomes of D. lama. Fourteen tRNA genes were located on the J-strand and 8 tRNA genes on the N-strand in the mtgenomes of D. flora, D. saxonica, and D. xanthicincta. All tRNAs formed a typical secondary structure (cloverleaf structure) except trnH lacked the TΨC loop ( Figures S5-S8). The initiation of tRNA of D. xanthicincta arrangement as trnY-trnM-trnI-trnQ was different from the other three species as trnY-trnI-trnM-trnQ.
Two rRNAs of the four Dolichovespula species were located on the N-strand. The rrnS was located between trnV and the A + T-rich region, while the rrnL was located upstream at rrnS, between nad1 and trnV.
The total length of 22 tRNA genes in Dolichovespula flora, D. lama, D. saxonica and D. xanthicincta was 1492 bp, 1497 bp, 1495 bp and 1485 bp, in turn. Each tRNA gene was between 62 bp and 73 bp long (Table 1). Interestingly, the shortest tRNA gene of each species was trnS1 with a length of 63 bp except for D. flora, which had a length of 62 bp. However, the longest gene was not the same ( Table 2). In the four species, the tRNA gene with the lowest A + T content was trnK (about 76%). The gene with the highest A + T content was trnV in D. lama (95.52%), D. saxonica (92.54%) and D. xanthicincta (95.59%), but trnE (93.94%) in D. flora ( Table 2).
Three gene rearrangement events occurred in the four Dolichovespula species (Figure S9), viz. the translocation of trnY to upstream of trnI (as in Vespa mandarinia and Dolichovespula panda), translocation of trnL1 to the region between trnS2 and nad1(as in Vespa, Polistes, and D. panda), and shuffling of trnQ and trnM (as in D. panda) except D. xanthicincta shuffling of trnQ, trnM, and trnI. There were additional events different from each other as follows in D. flora: shuffling of trnE and trnS1, the position and orientation of trnN and trnF were changed; in D. lama: shuffling of trnE and trnN, local inversion of trnS1; in D. saxonica: shuffling of trnE and trnS1, remote inversion of trnN and trnF; in D. xanthicincta: shuffling of trnE and trnS1, the position and orientation of trnN and trnF were changed.

Phylogenetic Relationships
We analyzed the datasets mentioned above (PCG123, PCG123R for Vespidae and five selected genes for Dolichovespula) by BI and ML methods and obtained the relevant phylogenetic trees. The phylogenetic trees (Figures 3 and 4) produced by the PCG123R and PCG123 datasets revealed the phylogeny of 38 species of Vespidae. The BI-and ML trees generated by the five genes datasets represented the phylogeny of 14 species of Dolichovsepula ( Figure 5).
However, the topologies of phylogenetic trees derived from the two datasets were different within the genus Dolichovespula (Figures 3 and 4).   Based on five mitochondrial genes (cob, cox1, cox2, rrnL and rrnS) of 14 species of Dolichovespula and two species of Polistes as outgroup, two phylogenetic trees (BI and ML) were constructed ( Figure 5). In both trees, the five species i.e., D. panda, D. lama, D. saxonica, D. xanthicincta and D. flora, clustered at the base (except the outgroup); two sister species D. maculata and D. media combined with D. sylvestris constituted a clade, and the relationships as (D. arctica + D. adulterina) + D. omissa were strongly supported. Compared with the species groups divided by Archer [2,7], the species belonging to adulterina group and lama group were well clustered. However, the position of D. flora was poorly supported in the maculata group. The cladogram of the BI tree was different from that of the ML tree on the position of D. arenaria with D. albida + D. pacifica. Considering the data confidentiality level, the relationships indicated by the BI tree were more accurate than those indicated by the ML tree.

Estimated Time of Divergence
The PCG123R dataset on the 38 species of Vespidae was selected to estimate the time of divergence. As shown in the cladogram, the most recent common ancestor of Vespidae diverged around 106 Ma. Sequentially, the subsocial subfamily  Based on five mitochondrial genes (cob, cox1, cox2, rrnL and rrnS) of 14 species of Dolichovespula and two species of Polistes as outgroup, two phylogenetic trees (BI and ML) were constructed ( Figure 5). In both trees, the five species i.e., D. panda, D. lama, D. saxonica, D. xanthicincta and D. flora, clustered at the base (except the outgroup); two sister species D. maculata and D. media combined with D. sylvestris constituted a clade, and the relationships as (D. arctica + D. adulterina) + D. omissa were strongly supported. Compared with the species groups divided by Archer [2,7], the species belonging to adulterina group and lama group were well clustered. However, the position of D. flora was poorly supported in the maculata group. The cladogram of the BI tree was different from that of the

Estimated Time of Divergence
The PCG123R dataset on the 38 species of Vespidae was selected to estimate the time of divergence. As shown in the cladogram, the most recent common ancestor of Vespidae diverged around 106 Ma. Sequentially, the subsocial subfamily Stenogastrinae was separated from the other subfamilies of Vespidae at about 99 Ma. The mainly solitary subfamily Eumeninae diverged around 95 Ma. The social subfamily Polistinae divided with the eusocial subfamily Vespidae about 42 Ma and the origin of the genus Dolichovespula was estimated at around 25 Ma (Figure 3).

Discussion
The wasp family Vespidae (Hymenoptera), as model taxa in understanding the evolution from solitary to social habit, consists of more than 5000 species [4]. Based on morphological and behavioral characteristics and then combining them with available molecular sequence data, Carpenter [45,46] supported monophyly of the social subfamilies with phylogenetic relationships as Euparagiinae + (Masarinae + (Eumeninae + (Stenogastrinae + (Polistinae + Vespinae)))). A single origin of eusociality and a view that Vespidae contains six subfamilies has been widely supported, but later Schmitz and Moritz argued that Polistinae + Vespinae had a closer relationship with Eumeninae than Stenogastrinae based on cladistic analyses of molecular data (nuclear 28S rDNA and mitochondrial 16S rDNA) [47]. This suggests that eusociality evolved twice. Hines et al. inferred the same opinion based on four nuclear encoded genes (18S and 28S rDNA, abdominal-A, and RNA polymerase II) [48]. Since Pickett and Carpenter [43] reaffirmed their view of one origin of eusociality based on the largest taxon sample including molecular data, and the largest phenotypic character dataset ever compiled, there are still strongly opposed voices based on transcriptome and other molecular studies [34,[49][50][51]. Recently, Piekarski et al. divided the family Vespidae into eight subfamilies based on the Maximum-Likelihood tree constructed from 235 loci selected in 163 Vespidae taxa [51]. Two subfamilies Gayellinae and Zethinae sunk by Carpenter were resurrected by Piekarski et al. [45,46,51]. Interestingly, Piekarski et al. showed the phylogenetic relations as Stenogastrinae ((Masarinae + (Gayellinae + Masarinae)) + (Eumeninae + (Zethinae + (Polistinae + Vespidae)))). In other words, the subsocial subfamily Stenogastrinae appeared first, followed by the emergence of the solitary subfamilies, such as Eumeninae. Finally, the eusocial subfamilies Polistinae and Vespinae were separated from other Vespidae. The results lead to the rejection of the single origin of eusociality hypothesis and strongly support the two-origin hypothesis of eusociality in Vespidae. Obviously, the molecular and phenotypic evidence are still in conflict. Piekarski et al. claim that the molecular evidence supports the dual origin, while the single origin hypothesis is more convincing when all available evidence is considered [52]. For a group of 5000 described species, broader sampling and more new information especially molecular data and analyses are expected to help solve the problem. In this study, we added mt DNA of four Dolichovespula species and compared it to that of all other known species. As might have been expected, our results corroborated the dual origin hypothesis.
The eusocial subfamily Vespinae consists of four genera, known as hornets (Provespa and Vespa) and yellowjackets (Vespula and Dolichovespula), for which phylogenetic relationships remain controversial [3,53]. Based on morphological and behavioral and/or molecular data, some analyses support that yellowjackets (Vespula + Dolichovespula) are a clade sister to Provespa, placing Vespa as a clade sister to the remaining vespine genera [41,44,54]. However, Pickett and Carpenter found Provespa + Vespa as a clade sister to Vespula + Dolichovespula [43]. By analyzing the amino acid sequences of antigen 5, Pantera et al. drew a neighbor-joining dendrogram, and found that Vespula is closer to Vespa than to Dolichovespula [55]. Inversely, Lopez-Osorio et al., using transcriptomic (RNA-seq) data, found that Dolichovespula is more closely related to Vespa than to Vespula, therefore challenging the prevailing hypothesis that yellowjacket (Vespula + Dolichovespula) was a clade [53]. In recent years, some scholars [9,39,40] based on mitochondrial genome research speculated a sister group relationship between Dolichovespula and a clade formed by Vespa and Vespula which supported the conclusion of Pantera et al. [55]. In this study, we confirmed the former conclusion of Pantera et al. [55].
By examining the distribution pattern and phylogenetic research on extant species we might uncover some clues concerning the time of evolution of bios. Hymenoptera was speculated to have started to diversify about 281 Ma by analyzing 3256 protein-coding genes in 173 insect species [49]. The fossil record shows that Vespidae evolved at least by the Early Cretaceous around 120-65 Ma [56]. Perrard et al. estimated that the minimum age of the Eumeninae is approximately 90 Ma [57]. Tan et al. mentioned that the origin of the genus Zethus could be estimated to be between 90 and 80 Ma, allowing for the Gondwanic distribution pattern [5]. In this study, the estimated time of origin for Vespidae, Eumeninae and Vespinae agreed with the former research [5,56,57]. This is the first estimate of the time of origin of the genus Dolichovespula (approximately 25 Ma) and the largest sample of molecular data for Dolichovespula used for phylogenetic analysis.
It is assumed that the diversification center of Vespinae lies in the mountainous regions in the subtropics and warm temperate zone of eastern Asia. The area from the eastern Himalaya to southern China harbors the largest number of hornet species, and fits for the Beringian distribution pattern [58][59][60][61]. Tan et al. supposed that Mt. Qin and Ba are the most speciose areas and contain the central area of Vespinae [3]. The genus Dolichovespula, D. panda located at the base of the life tree, is found only in China (Sichuan, Shaanxi and Ningxia). D. lama, D. flora and D. xanthicincta are distributed from China to its neighboring countries (i.e., India and Nepal for D. lama; Myanmar and Korea for D. flora; Bhutan and Myanmar for D. xanthicincta). D. maculata, D. arctica, D. arenaria and D. albida are mainly distributed in North America (USA and Canada) and the other remaining congeners are widely distributed in Eurasia. In phylogeny analysis, we can conclude that the species in China and its surrounding countries are older than those in other regions. Our findings agreed with the hypothesis that the subfamily Vespinae originated in subtropical, warm eastern Asia and that it had a Beringian distribution pattern.

Conclusions
This study reports the mtgenomes of four Dolichovespula species. Among them, the mtgenomes of D. flora, D. lama, and D. xanthicincta are sequenced for the first time. The annotated mitochondrial genomic sequences of Vespidae in GenBank (including our study) suggest that Stenogastrinae is the oldest subfamily within Vespidae, and is located at the base of the phylogenetic trees, which supports the two-origin hypothesis of eusociality in Vespidae. We concluded that the relationships among genera were (Vespa + Vespula) + Dolichovespula in the subfamily Vespinae. In addition, the results suggest that Dolichovespula species in China and its adjacent countries are older than those in other regions and support the hypothesis that Vespinae originated in subtropical, warm eastern Asia and that it had a Beringian distribution pattern.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/ani12213004/s1, Figure S1: Relative synonymous codon usage (RSCU) for PCGs of the mtgenomes of Dolichovespula flora. Figure S2: Relative synonymous codon usage (RSCU) for PCGs of the mtgenomes of Dolichovespula lama. Figure S3: Relative synonymous codon usage (RSCU) for PCGs of the mtgenomes of Dolichovespula saxonica. Figure S4: Relative synonymous codon usage (RSCU) for PCGs of the mtgenomes of Dolichovespula xanthicincta. Figure S5: Cloverleaf structure of 22 tRNAs in the mitochondrial genome of Dolichovespula flora. Figure S6: Cloverleaf structure of 22 tRNAs in the mitochondrial genome of Dolichovespula lama. Figure S7: Cloverleaf structure of 22 tRNAs in the mitochondrial genome of Dolichovespula saxonica. Figure S8: Cloverleaf structure of 22 tRNAs in the mitochondrial genome of Dolichovespula xanthicincta. Figure S9: Mitochondrial genome organization of Vespidae referenced with the ancestral insect mtgenomes. The underlined symbols are located on the N-strand and others on the J-strand. The white, green, yellow and pink blocks denote tRNAs, rRNAs, PCGs and control regions, respectively. The red font means rearranged genes. Table S1: Collection information of Dolichovespula newly sequenced in this study. Table S2: Mtgenomes of Vespidae in GenBank. Table S3: Dolichovespula and outgroup species included in our study, and corresponding Genbank accession numbers.  Institutional Review Board Statement: Not applicable for studies not involving humans or animals.

Informed Consent Statement: Not applicable.
Data Availability Statement: The data underlying this article are available in the GenBank Nucleotide Database at https://www.ncbi.nlm.nih.gov/genbank/ (accessed on 16 December 2021), and can be accessed with accession numbers OP250139, OP250140, OP250141, OP250142.

Conflicts of Interest:
The authors declare no conflict of interest.