Mitogenomic Comparison of the Mole Crickets Gryllotalpidae with the Phylogenetic Implications (Orthoptera: Ensifera)

Simple Summary We sequenced the complete mitochondrial genomes (mitogenomes) of Gryllotalpa henana Cai & Niu, 1998 and the Chinese G. orientalis Burmeister, 1838 for the first time, and reconstructed the mitogenomic phylogeny of the infraorder Gryllidea. The results show that the two new mitogenomes are double-stranded circular molecules with a typical gene complement, gene arrangement and base composition, the same as those of other gryllotalpids and ancestral insects. Tandem repeats of the control region were discovered in Gryllotalpidae for the first time. Considering both the high nucleotide divergence and the elevated ratio of Ka/Ks, the genes nad2 and nad6 may be evaluated as potential markers for future phylogeny and species delimitation in Gryllotalpidae. The results of phylogenetic analyses provide supports for the mitogenomic and transcriptomic trees, but partially contradict those of the multilocus phylogenies. Abstract Owing to limited molecular data, the phylogenetic position of the family Gryllotalpidae is still controversial in the infraorder Gryllidea. Mitochondrial genome (mitogenome) plays a crucial role in reconstructing phylogenetic relationships and revealing the molecular evolution of insects. However, only four mitogenomes have been reported in Gryllotalpidae to date. Herein, we obtained the first mitogenomes of Gryllotalpa henana Cai & Niu, 1998 and the Chinese G. orientalis Burmeister, 1838, made a detailed comparison of all mitogenomes available in Gryllotalpidae and reconstructed the phylogeny of Gryllidea based on mitogenomes using Bayesian inference (BI) and maximum likelihood (ML) methods. The results show that the complete mitogenome sequences of G. henana (15,504 bp) and G. orientalis (15,497 bp) are conserved, both exhibiting the double-stranded circular structure, typical gene content and the ancestral insect gene arrangement. The complete mitogenome of G. henana exhibits the lowest average AT content ever detected in Gryllotalpidae, and even Gryllidea. The gene nad2 of both species has atypical initiation codon GTG. All tRNAs exhibit typical clover-leaf structure, except for trnS1 lacking the dihydrouridine (DHU) arm. A potential stem–loop structure, containing a (T)n(TC)2(T)n sequence, is detected in the control region of all gryllotalpids investigated and is likely related to the replication initiation of the minority strand. The phylogenetic analyses recover the six families of Gryllidea as Gryllotalpidae + (Myrmecophilidae + (Mogoplistidae + (Trigonidiidae + (Phalangopsidae + Gryllidae)))), similar to the trees based on transcriptomic and mitogenomic data. However, the trees are slightly different from the multilocus phylogenies, which show the sister-group relationship of Gryllotalpidae and Myrmecophilidae. The contradictions between mitogenomic and multilocus trees are briefly discussed.


Introduction
The mitochondrial genomes (or mitogenomes) of insects are double-stranded circular molecules with lengths ranging from approximately 15 kb to 20 kb, and generally comprise 37 genes with 13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNAs), two ribosome RNA genes (rRNAs) and a non-coding control region (CR) [1]. Mitogenomes are one of the most information-rich characteristics, and are useful in phylogeny, evolutionary a step size of 25 bp were performed to estimate the sequence diversity for each independent PCG, using DnaSP 6.0. Genetic distances based on 13 PCGs were estimated using MEGA 7.0 with Kimura-2-parameter (K2P) [30]. AT-content (the proportion of A + T out of the total) was used to assess the overall composition of the double-stranded molecule [31]. Strand asymmetry was calculated according to the formula: AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C) [32]. The AT-content, AT-skew and GC-skew were graphically plotted by Origin 2018 (OriginLab Corp., Northampton, MA, USA). The Pi values were graphically plotted by CorelDRAW 2020 (Corel Corp., Ottawa, ON, Canada). The genetic distance and Ka/Ks ratios were graphically plotted by Microsoft Excel spreadsheet.

Phylogenetic Analyses
Twenty-eight species from six families of Gryllidea were chosen as the ingroup, and four species in Tettigoniidea and one species in Schizodactyloidea were selected as outgroups. The detailed information of species used in phylogenetic analyses were listed in Table 1. Statistics for the basic characteristics of the mitogenome and the extraction of PCGs and rRNAs were produced by PhyloSuite. The alignment of all 13 PCGs was conducted in batches with MAFFT integrated into PhyloSuite with codon alignment mode setting [33,34]. Two rRNAs were aligned using the Q-INS-i algorithm incorporated into MAFFT-with-extensions software (http://mafft.cbrc.jp/alignment/server/, accessed on 29 March 2022) [33]. Ambiguous sites of alignments of all genes were manually removed, and the modified alignments were concatenated using PhyloSuite [34].
Phylogenetic analyses were conducted using four different datasets: (1) P123: 13 PCGs (10,899 bp), (2) P123R: 13 PCGs + 2 rRNAs (13,550 bp), (3) P12: 13 PCGs excluding the third codon position (7266 bp), (4) P12R: 13 PCGs excluding the third codon position + 2 rRNAs (9917 bp). Phylogenetic trees were reconstructed using Bayesian inference (BI) and maximum likelihood (ML) analyses, with partition strategies for analyzing mitogenome data according to Leavitt [35]. The best-fit partition schemes and models for BI analyses were inferred using PartitionFinder 2 [36] integrated into PhyloSuite [34], and are shown in Supplementary Table S1. BI trees were conducted using MrBayes 3.2.6 [37] with 10 million MCMC generations, sampling every 1000 generations. The convergence was considered to be reached when the average standard deviation of the split frequencies was lower than 0.01. The first 25% were discarded as "burn-in", and the remaining samples were used to generate the majority consensus trees and estimate the posterior probabilities (PPs). The best-fit substitution models for ML analyses were selected by ModelFinder [38], and shown in Supplementary Table S2. ML trees were reconstructed using IQ-TREE integrated into PhyloSuite under Ultrafast bootstrap. Bootstrap supports (BSs) were evaluated with 1000 replicates.
Two separated features, base proportion (AT-content) and strand asymmetry (ATand GC-skew), are used to assess the base compositional bias of mitogenomes [31,32]. The AT-content of the whole mitogenomes ranges from 66.4% in G. henana to 72.2% in G. pluvialis (Figure 2A, Table 2), indicating that the overall composition is biased towards A and T in Gryllotalpidae. The CR in all six species exhibits a higher value of the AT-content (74.9-81.1%), followed successively by tRNAs (72.3-74. Table 2). The entire mitogenomes of all gryllotalpids exhibit the typical skew pattern of insects with positive AT-skew (0.042-0.072) and negative GC-skew (−0.451-−0.295), indicating that the majority strand of mitogenomes is biased in favor of A and C ( Figure 2B,C, Table 2). The skew patterns of the four genomic regions are conserved in Gryllotalpa, and exhibit negative AT-and GC-skew in PCGs, positive AT-and Two separated features, base proportion (AT-content) and strand asymmetry (ATand GC-skew), are used to assess the base compositional bias of mitogenomes [31,32]. The AT-content of the whole mitogenomes ranges from 66.4% in G. henana to 72.2% in G. pluvialis ( Figure 2A, Table 2), indicating that the overall composition is biased towards A and T in Gryllotalpidae. The CR in all six species exhibits a higher value of the AT-content (74.9-81.1%), followed successively by tRNAs (72.3-74. Table 2). The entire mitogenomes of all gryllotalpids exhibit the typical skew pattern of insects with positive AT-skew (0.042-0.072) and negative GC-skew (−0.451-−0.295), indicating that the majority strand of mitogenomes is biased in favor of A and C ( Figure 2B,C, Table 2

Protein-Coding Genes and Codon Usage
The concatenated sequence of the PCGs is 11,097 bp in G. henana and 11,109 bp in the Chinese G. orientalis, accounting for 71.6% and 71.7% of their whole mitogenomes, respectively (Tables 2 and 3). The 13 PCGs of the two new mitogenomes, similar to those of the other four gryllotalpids, contain two ATPase subunits (atp6 and atp8), three cytochrome c oxidase subunits (cox1-3), one cytochrome b gene (cytb), and seven NADH dehydrogenase subunits (nad1-6 and nad4L) ( Figure 1, Table 3). The lengths of the 13 PCGs range from 156 bp of atp8 to 1723 bp of nad5 in both mitogenomes newly sequenced. The shortest atp8 and longest nad5, also found in other four gryllotalpids, are common features in metazoan mitogenomes [52,53].
All PCGs of the two new mitogenomes have the typical initiation codon (ATN), except for nad2 starting with GTG ( Table 3). The atypical initiation codon is also present in the mitogenomes of two other mole crickets (the Korean G. orientalis and G. unispina) and two katydids (Kuwayamaea brachyptera Gorochov & Kang, 2002 and Ruidocollaris obscura Liu & Jin, 1999) [12,13,54]. The termination codons are relatively conserved in Gryllotalpidae. Most of them are complete triplet bases TAA/TAG, and others are incomplete T/TA immediately followed by or partially overlapped with a tRNA gene. Incomplete stop codons are fairly common in the orthopteran mitogenomes and can be converted into a potential stop codon via polyadenylation to TAA [18,39,40,55]. The results of RSCU analyses show that the PCGs exhibit strong biases toward the nucleotides A and U in the codon usage. The four most frequent codons (UUU/Phe, UUA/Leu2, AUU/Ile, and AUA/Met) are the same in Gryllotalpidae, and all composed wholly of A or U (Figure 3, Supplementary Table S3). The codons ending with A/U occur more frequently than that with G/C, suggesting that the AU composition at third position of codons positively influences the nucleotide AT (or AU) bias of the PCGs in Gryllotalpidae.

Transfer and Ribosomal RNA Genes
The 22 tRNAs of the two new mitogenomes are scattered around the circular DNA molecule, and are arranged identically in order and direction (Figure 1). The tRNAs of gryllotalpids retain the ancestral gene order [10][11][12][13], whereas multiple patterns of tRNA rearrangements have been detected in many other ensiferans [39,48] and most caeliferans [35]. All tRNAs exhibit typical clover-leaf structure, except for trnS1 (Figure 4). The dihydrouridine (DHU) arm of trnS1 forms a simple loop as in many other metazoans including gryllotalpids [10][11][12][13]52,56]. The length of tRNAs varies from 62 bp (trnC) to  (Table 2), both within the variation range in Gryllotalpidae. The trnG gene of gryllotalpids generally exhibits the lowest nucleotide substitutions, while trnL1, trnW and trnY genes tend to be more variable among 22 tRNA genes ( Figure 4). All tRNAs in the mitogenomes of Gryllotalpidae possess invariable length of 7 bp for both the acceptor stem and the anticodon loop. The length of anticodon stem is relatively conservative, varying from 4 bp in trnK and trnM to 5 bp in the rest of tRNAs. Most of the size variations among tRNAs stemmed from the length variation in DHU and TψC arms, within which the size of loops (all 3-10 bp) is more variable than that of stems (all 3-5 bp).
both within the variation range in Gryllotalpidae. The trnG gene of gryllotalpids generally exhibits the lowest nucleotide substitutions, while trnL1, trnW and trnY genes tend to be more variable among 22 tRNA genes ( Figure 4). All tRNAs in the mitogenomes of Gryllotalpidae possess invariable length of 7 bp for both the acceptor stem and the anticodon loop. The length of anticodon stem is relatively conservative, varying from 4 bp in trnK and trnM to 5 bp in the rest of tRNAs. Most of the size variations among tRNAs stemmed from the length variation in DHU and TψC arms, within which the size of loops (all 3-10 bp) is more variable than that of stems (all 3-5 bp).
The tRNAs of G. henana possess a total of 36 unmatched base pairs, including 31 GU mismatches in most tRNAs, three AC mismatches in the anticodon stem of trnS1 and trnW and the TψC stem of trnN, and two UU mismatches in the acceptor stem of trnD and the anticodon stem of trnA (Figure 4, Supplementary Table S4). A total of 30 mismatches were detected in the Chinese G. orientalis. Twenty-seven of them are GU pairs, two are UU mismatches in the DHU stem of trnC and the acceptor stem of trnL1, and one is AA pair in the anticodon arm of trnS1. The mismatch number in the Chinese G. orientalis is lower than that in the Korean one (36 mismatches) (Supplementary Table S4), suggesting that the mitogenomes are differentiated intraspecifically.  Table S4), suggesting that the mitogenomes are differentiated intraspecifically.
The two rRNA genes (rrnL and rrnS) are located in the conserved positions as in mitogenomes of other gryllotalpids [10][11][12][13]. rrnL is present between trnL1 and trnV, while rrnS between trnV and the CR (Figure 1). The two genes rrnL and rrnS are 1214 and 733 bp long in G. henana, and 1236 and 730 bp long in the Chinese G. orientalis, respectively ( Table 3). The lengths range from 1214 to 1247 bp for rrnL, and from 719 to 783 bp for rrnS in Gryllotalpidae. The AT content of rRNAs is 67.1% in G. henana and 73.2% in the Chinese G. orientalis. The value of AT content is lower in G. henana than those in the other gryllotalpids and many other orthopterans [12,48,[57][58][59][60][61][62][63].

Intergenic Spacers and Gene Overlaps
In G. henana, intergenic spacers are distributed in 12 regions and range in size from 1 to 71 bp with a total of 162 bp (Table 3). Eleven intergenic spacers exist in the mitogenome of the Chinese G. orientalis, ranging from 1 to 25 bp and adding up to 73 bp. The largest has 71 bp located between rrnL and trnV in G. henana, whereas there are 25 bp located between trnS2 and nad1 in the Chinese G. orientalis. Two identical intergenic spacers were detected in the mitogenomes of all gryllotalpids. One is between nad4L and trnT (2 bp), and the other is between nad1 and trnL1 (1 bp). In most cases, the intergenic spacers consist of only 1 or 2 bp.
The gene overlaps of G. henana are distributed in 13 locations with a total of 52 bp, whereas those of the Chinese G. orientalis are in 10 locations with a total of 34 bp (Table 3). The longest gene overlap is 17 bp between trnL1 and rrnL in G. henana, and 8 bp between trnW and trnC in the Chinese G. orientalis. All six gryllotalpids have five identical overlapping regions, including trnK-trnD (1 bp), trnE-trnF (2 bp), trnI-trnQ (3 bp), nad4-nad4L (7 bp) and trnW-trnC (8 bp). In general, the variability of gene overlaps is lower than that of intergenic spacers.

Control Region
The CR, also called AT-rich region, is located in the conserved position between rrnS and trnI ( Figure 1, Table 2). The AT-content of this region is 81.1% in G. henana and 77.3% in the Chinese G. orientalis. In all six gryllotalpids, the Korean G. orientalis shows the lowest AT content of 74.9%, whereas G. henana exhibits the highest 81.1%. The CRs of Gryllotalpidae show low variations in lengths, which range from 863 bp in G. henana to 920 bp in the Korean G. orientalis. The low variations of CR in length are likely attributed to the lacking of conspicuous repeats, which are often found in other insects [6,54,[64][65][66]. Two kinds of short repeats were detected in Gryllotalpidae for the first time (Supplementary Table S5). One is the microsatellite (TA) n element found in G. henana and the Chinese G. orientalis. The other recognized in G. pluvalis is the duplicated tandem repeat, containing 18 bp (ATATAATTAAATATTTAA) with 2.3 copies. A potential stem-loop structure, containing (T) n (TC) 2 (T) n sequences, was detected in the CR near the trnI gene of G. henana and the Chinese G. orientalis, same as the findings in other gryllotalpids ( Figure S1). Similar structures were also found in many crickets of Gryllidea [46,67], and likely related to replication initiation of the N-strand [68].

Genetic Diversity and Selective Constraints
Sliding window analyses exhibit the estimations of nucleotide diversity (Pi) for each PCG of the six mitogenomes ( Figure 5A, Supplementary Table S6). The gene atp8 has the highest nucleotide diversity (Pi = 0.

Phylogenetic Analyses
The phylogenetic trees based on the four datasets (P12, P12R, P123 and P123R) are highly consistent, except for the positions of the Korean G. orientalis and Velarifictorus hemelytrus (Saussure, 1877) ( Figure 6, Supplementary Figures S2-S4). For the same dataset, the nodal support values in BI trees are generally higher than those in ML trees. For the same inference method (BI or ML), different data combinations slightly affected the topology and support values. The P123 trees are markedly more resolved, and have overall higher supports at nodes than the others. The ingroup topologies between BI and ML trees are identical based on the P123 and P123R datasets, but are inconsistent based on the P12 and P12R datasets, indicating that the inclusion of the third codon positions make topologies more stable in both ML and BI trees. The nodal supports of phylogenetic trees based on P123 dataset are higher than that of P123R dataset. A similar situation was observed Ka/Ks ratio (ω) is an important indicator for detecting molecular adaptation correlated to the biological evolution [69,70]. The Ka/Ks ratios of 13 PCGs are all lower than 1 in all mitogenomes of Gryllotalpidae ( Figure 5B, Supplementary Table S6), indicating that these PCGs are evolving under purifying selection and suitable for phylogenetic reconstructions in Gryllotalpidae. The Ka/Ks of atp8 (ω = 0.393), nad2 (ω = 0.266) and nad6 (ω = 0.192) are much higher than those of other PCGs, suggesting that the former three genes experience more relaxed evolutionary constraints and retain more non-synonymous mutations. The gene cox1 exhibits the lowest Ka/Ks ratio (ω = 0.041) implying the greatest evolutionary limitation on cox1 among 13 PCG genes. The strong evolutionary constraints (ω << 1) of mitochondrial PCGs suggest that the deleterious mutations are eliminated by purifying selection to maintain highly conserved genes that encode core subunits of the respiratory chain complexes [10,71].
The species of Gryllotalpidae are similar in external morphology but exhibit complicated variations in genitalia, leading to taxonomic difficulties based solely on morphological characters [21,22]. Designing species-specific markers is crucial for resolving such problems. The cox1 gene has long been used as a universal DNA marker for species identification in insects [72][73][74][75], but is the most conservative protein coding gene in mitogenomes of gryllotalpids. Considering both the high nucleotide divergence and the elevated ratio of Ka/Ks, the genes nad2 and nad6 may be evaluated as potential markers for species delimitation in Gryllotalpidae.

Phylogenetic Analyses
The phylogenetic trees based on the four datasets (P12, P12R, P123 and P123R) are highly consistent, except for the positions of the Korean G. orientalis and Velarifictorus hemelytrus (Saussure, 1877) ( Figure 6, Supplementary Figures S2-S4). For the same dataset, the nodal support values in BI trees are generally higher than those in ML trees. For the same inference method (BI or ML), different data combinations slightly affected the topology and support values. The P123 trees are markedly more resolved, and have overall higher supports at nodes than the others. The ingroup topologies between BI and ML trees are identical based on the P123 and P123R datasets, but are inconsistent based on the P12 and P12R datasets, indicating that the inclusion of the third codon positions make topologies more stable in both ML and BI trees. The nodal supports of phylogenetic trees based on P123 dataset are higher than that of P123R dataset. A similar situation was observed between P12 and P12R trees, reflecting that the exclusion of the rRNA genes can improve branch supports of phylogenetic trees. The monophyly of the infraorder Gryllidea was well supported by all datasets with high nodal supports (PPs = 1; BSs = 100), and consistent with the results proposed by Chintauan-Marquier et al. [17], Zhou et al. [48], Chang et al. [10], Song et al. [8] and Sanno et al. [18].
The monophyletic Grylloidea was confirmed and the relationships within this superfamily were present as Mogoplistidae + (Trigonidiidae + (Phalangopsidae + Gryllidae)). This finding corroborates the generally accepted classification schemes [15] as well as mostly recent studies [8,17,18,39,40,76]. The monophyly of the superfamily Gryllotalpoidea, however, was rejected in the present study. Gryllotalpidae formed the sister taxon to the clade of Myrmecophilidae + Grylloidea rather than solely to Myrmecophilidae. This result is similar to the mitogenome-based trees [8,18], but conflicts with the multilocus-based phylogeny proposed by Chintauan-Marquier et al. [17], which is adopted prevalently as a reference classification. Mitogenomes may experience selective pressures in some insects with peculiar ecological and morphological traits [77,78]. The small and wingless crickets in Myrmecophilidae inhabit subterranean ant nests of low oxygen levels [79,80], whereas mole crickets have larger sizes and short wings, and usually hide in horizontal burrows near the soil surface [81]. The positively selective sites associated with hypoxic adaptability were identified in the cox1 genes of Myrmecophilidae, but were failed to be detected in those of Gryllotalpidae [18], suggesting that the mitogenomes of Myrmecophilidae and Gryllotalpidae have different evolutionary properties. Therefore, we speculated that the contradictions between mitogenomic and multilocus trees are partially attributed to the evolutionary differences of mitogenomes of the two families. In addition, the inconsistent trees may also be influenced by the lack of nuclear genes, which are important for reconstructing deep-level phylogenetic relationships [82][83][84]. The present investigation improved the resolution of the phylogram by Sanno et al. [18], although more species and markers are necessary for future studies.
In Gryllotalpidae, G. henana first split from the remaining gryllotalpids (BSs = 100, PPs = 1) ( Figure 6; Supplementary Figures S2-S4). Interestingly, in the second clade, the two specimens of G. orientalis were failed to be clustered in one branch. The Korean G. orientalis was clustered with the clade of Gryllotalpa sp. + G. unispina based on P123 and P123R datasets ( Figure 6, Supplementary Figure S2), but was placed with the clade of the Chinese G. orientalis + G. pluvialis based on P12 and P12R datasets ( Supplementary Figures S3 and S4).
Moreover, the K2P genetic distance of the two specimens of G. orientalis (0.145) is relatively high compared with the interspecific distances of Gryllotalpa (0.022-0.321) (Supplementary Table S7). We speculate that the so-called G. orientalis in China is likely a new species, and further morphological and biological evidences are needed to confirm this inference.  The monophyletic Grylloidea was confirmed and the relationships within this superfamily were present as Mogoplistidae + (Trigonidiidae + (Phalangopsidae + Gryllidae)). This finding corroborates the generally accepted classification schemes [15] as well as mostly recent studies [8,17,18,39,40,76]. The monophyly of the superfamily Gryllotalpoidea, however, was rejected in the present study. Gryllotalpidae formed the sister taxon to the clade of Myrmecophilidae + Grylloidea rather than solely to Myrmecophilidae. This result is similar to the mitogenome-based trees [8,18], but conflicts with the multilocusbased phylogeny proposed by Chintauan-Marquier et al. [17], which is adopted prevalently as a reference classification. Mitogenomes may experience selective pressures in some insects with peculiar ecological and morphological traits [77,78]. The small and wingless crickets in Myrmecophilidae inhabit subterranean ant nests of low oxygen levels [79,80], whereas mole crickets have larger sizes and short wings, and usually hide in horizontal burrows near the soil surface [81]. The positively selective sites associated with hypoxic adaptability were identified in the cox1 genes of Myrmecophilidae, but were failed to be detected in those of Gryllotalpidae [18], suggesting that the mitogenomes of Myrmecophilidae and Gryllotalpidae have different evolutionary properties. Therefore, we speculated that the contradictions between mitogenomic and multilocus trees are partially attributed to the evolutionary differences of mitogenomes of the two families. In addition, the inconsistent trees may also be influenced by the lack of nuclear genes, which Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/insects13100919/s1, Table S1: The best partitioning schemes and models for the Bayesian inference (BI) method in different datasets selected by PartitionFinder. Table S2: The best partitioning schemes and models for the Maximum likelihood (ML) method in different datasets selected by ModelFinder. Table S3: The count and relative synonymous codon usage (RSCU) of six species in Gryllotalpidae. Table S4: The numbers of mismatched base pairs in the secondary structure of the tRNAs in the six species of Gryllotalpidae. Table S5: Tandem repeat regions in the control region of six species in Gryllotalpidae. Table S6: Nucleotide diversity (Pi), non-synonymous substitutions rates (Ka), synonymous substitutions rates (Ks), Ka/Ks ratio and genetic distance of six species in Gryllotalpidae. Table S7: The K2P genetic distances in Gryllotalpidae. Figure S1: Location and structure of the potential stem-loops in Gryllotalpidae. (A) The location of the predicted stem-loops in the mitogenome. (B) The structures of potential stem-loops. Figure S2