Complete Mitogenome and Phylogenetic Analyses of Galerita orientalis Schmidt-Goebel, 1846 (Insecta: Coleoptera: Carabidae: Galeritini)

The genus Galerita Fabricius, 1801 belongs to the tribe Galeritini of the family Carabidae. In this study, the complete mitochondrial genome (GenBank: ON920164.1) of G. orientalis was newly sequenced, annotated, and characterized. The mitogenome is a circular DNA molecule of 16,137 bp with an AT content of 78.79%, and it comprises 37 typical genes and one control region. The 13 protein-coding genes are initiated with a typical ATN (Met) start codon, except for nad1, which starts with TTG, and all are terminated with a typical TAN stop codon. All 22 tRNAs can fold into a typical cloverleaf structure, including trnS1-GCU, which lacks the DHU stem in other mitogenomes of the subfamily Harpalinae. Both rrnS and rrnL contain many helices. A conserved poly-T stretch (19 bp) and seven tandem repeats are observed in the control region. A phylogenetic analysis indicated that the genus Galerita is an independent lineage. The complete mitogenome of G. orientalis will contribute to further studies on the molecular basis of the classification and phylogeny of Harpalinae, and of Carabidae more broadly.


Introduction
Carabidae is one of the largest families in the order Coleoptera, comprising more than 32,000 species worldwide [1,2], and its members are known as carabid beetles or ground beetles [1,3]. Harpalinae is the largest subfamily of Carabidae and contains more than 19,000 species [1,4]. These beetles live in diverse habitats, have diverse morphological forms, and exhibit a variety of unusual lifestyles [5]. Therefore, the study of phylogeny is critical for understanding the evolution and diversity of tribes, genera, and species within the Harpalinae [1,5]. The monophyly of Harpalinae is supported by morphological characteristics, chemical defensive secretions, chromosome number in males, and molecular sequence data obtained from the 18S rDNA gene [1,5]. However, the boundaries of these character states do not exactly match Harpalinae [1,5]. The nominotypical subgenus of the pantropical genus Galerita Fabricius, 1801 is represented in all zoogeographic regions except the Australian region [6–9], and includes more than 110 species [10], which belong to the tribe Galeritini of the subfamily Harpalinae. The taxonomy of the tribe Galeritini has been developed based on adult and larval features [6–11]. Higher-level phylogenetic relationships within Harpalinae, including the tribe Galeritini, have been investigated using DNA-sequence datasets obtained from nuclear genes (the 28S rDNA and wingless nuclear protein-coding genes) [1,5,12,13]. Galerita is a sister group to ctenodactylines in the Zuphiitae clade, based on the 28S rDNA gene and wingless data [13].
Galerita orientalis Schmidt-Göbel, 1846 (Figure 1) is widely distributed in continental Asia, Japan, and the Greater and Lesser Sunda Islands [9], and was revised by Reichardt (1965) [8,9]. However, genetic and mitogenomic information on G. orientalis remains limited. As of August 2022, only three partial coding sequences of cox1 from G. orientalis had been published in GenBank. Insect mitochondrial genomes (mitogenomes) are closed, circular, double-stranded DNA molecules with lengths ranging from 15 to 19 kb that contain 37 typical genes, including 13 protein-coding genes (PCGs), 2 rRNAs, and 22 tRNAs, together with a control region (CR). Mitogenomes play an important role in the molecular phylogeny of insects [14]. Next-generation sequencing (NGS) is an important and effective strategy for mitogenome assembly and the phylogenetic analysis of Harpalinae [3,15–17].
In the present study, we sequence and characterize the G. orientalis mitogenome using NGS. Furthermore, we construct phylogenetic trees based on the mitogenomes of 30 species of the family Carabidae and two outgroup species, which will help to clarify the phylogenetic position of G. orientalis in the subfamily Harpalinae. These results will be useful for reconstructing the phylogenetic relationships within Carabidae in the future.

Animal Materials, DNA Extraction, and Illumina Sequencing
Samples of adult G. orientalis were collected from Jingziguan Town (E 111.026°, N 33.244°), Xichuan County, Nanyang City, Henan Province, China, on 9 June 2022. Genomic DNA (gDNA) was extracted from these samples using the Qiagen DNeasy Blood and Tissue Extraction kit (Qiagen, Germantown, MD, USA). The purity and concentration of the obtained gDNA were tested using a NanoPhotometer® spectrophotometer (Implen, Calabasas, CA, USA) and a Qubit® 2.0 fluorometer (Life Technologies, Carlsbad, CA, USA), respectively [18,19]. Sequencing libraries for the quality-checked gDNA were generated using a TrueLib DNA Library Rapid Prep Kit for Illumina sequencing (Illumina, Inc., San ...

Phylogenetic Analysis
In this study, the phylogenetic analysis included sequences from the mitogenomes of 30 Carabidae species and 2 outgroup species (Lepisma saccharina and Corydidarum magnifica) (Table 1). The concatenated sequences of the 13 protein-coding genes from these mitogenomes were used to reconstruct the phylogenetic relationships of Carabidae using PhyloSuite version 1.2.2 [35] with MAFFT version 7 [36], MACSE version 2.03 [37], Gblocks 0.91b [38], ModelFinder [39], MrBayes version 3.2.7 [40], and IQ-TREE 1.6.12 [41]. Nucleotide sequences of the 13 PCGs were aligned using MAFFT version 7 and MACSE version 2.03, both with default parameters. Ambiguously aligned fragments of the alignments of the 13 PCGs were removed using Gblocks 0.91b [38] with default parameters. The nucleotide sequences were used to construct phylogenetic trees using two methods: Bayesian inference (BI) using MrBayes 3.2.7 and maximum likelihood (ML) using IQ-TREE 1.6.12. According to the Bayesian information criterion (BIC) scores, a GTR + F [state frequencies, fixed (empirical)] + I [proportion of invariable sites, uniformly distributed on the interval (0.00, 1.00)] + G4 (γ-distributed rate variation, four categories) model was selected as the best-fit partition model (edge-unlinked) for the BI of nucleotide sequences using PhyloSuite version 1.2.2 with ModelFinder. For IQ-TREE, because the partition mode was selected, the best-fit partition model was calculated automatically before the phylogenetic trees were constructed. In the BI analysis, 2 runs of 2,000,000 generations were conducted for each matrix, and the initial 25% was discarded as burn-in; the two runs yielded the same topology, with an average standard deviation of split frequencies of 0.008349 (<0.01). In the ML analysis, node support values were assessed using 5000 bootstrap resampling replicates.
The resulting phylogenetic trees were visualized using the Interactive Tree of Life (iTOL) (https://itol.embl.de/ (accessed on 23 October 2022)) [42].
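The concatenation of the 13 PCG alignments into a single matrix, which PhyloSuite performs internally, can be illustrated with a minimal sketch. The function name `concatenate_alignments`, the toy taxa, and the toy sequences are hypothetical and are not taken from the study; the sketch only shows how per-gene alignments might be joined while recording the partition boundaries needed for a partitioned model.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    gene_alignments: Dict[str, Dict[str, str]],
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Join per-gene alignments into one supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Returns the concatenated sequences and 1-based partition ranges.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions = []
    start = 1
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for t in taxa:
            # A taxon that lacks this gene is padded with gap characters.
            pieces[t].append(aln.get(t, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {t: "".join(p) for t, p in pieces.items()}, partitions


# Toy, made-up alignments (hypothetical; not data from this study).
toy = {
    "cox1": {"taxon_A": "ATGGCA", "taxon_B": "ATGGCT"},
    "nad1": {"taxon_A": "TTGAAA", "taxon_B": "TTGAAG"},
}
supermatrix, partitions = concatenate_alignments(toy)
print(partitions)
```

The partition list is what allows a separate substitution model to be fitted to each gene in the concatenated matrix.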

Sequencing, Quality Control, and Mitogenome Organization and Base Composition of G. orientalis
Approximately 43.38 Gb of high-quality clean reads were obtained using the fastp software [6] from approximately 51.89 Gb of raw reads generated from a 300 bp insert library on the Illumina NovaSeq 6000 platform for the G. orientalis mitogenome assembly. The Q20, Q30, and GC contents of the clean reads were 98.37%, 93.88%, and 35.88%, respectively (Table 2). These high-quality clean short reads (0.39% of which were from the mitogenome) defined the mitogenome of G. orientalis (ON920164.1) with 100% coverage at a high average read depth (10,425×). The mitogenome consisted of a typical, single, circular DNA molecule 16,137 bp in length. This length fell between those of Ha. discrepans (16,027 bp) and Ab. parallelepipedus (17,701 bp) in the subfamily Harpalinae (Table 1).
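As a rough plausibility check on the reported read depth, the mean depth can be approximated as the number of mitochondrial bases divided by the genome length. The sketch below assumes that the 0.39% fraction of reads can be applied directly to the clean base count, which is a simplification of ours and not a step described by the authors; the function name `estimated_mean_depth` is hypothetical.

```python
def estimated_mean_depth(clean_bases: float, mito_fraction: float,
                         genome_length: int) -> float:
    """Approximate mean depth as mitochondrial bases / genome length."""
    return clean_bases * mito_fraction / genome_length


# Values reported in the text: 43.38 Gb clean data, 0.39% mitochondrial
# reads, 16,137 bp mitogenome.
depth = estimated_mean_depth(43.38e9, 0.0039, 16_137)
print(f"estimated mean depth: {depth:.0f}x")
```

Under this assumption, the estimate is of the same order of magnitude (about 10^4) as the reported depth of 10,425×; the small difference is expected given rounding of the reported percentages.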
The mitogenome of G. orientalis contained 40.85% A, 37.94% T, 8.46% G, and 12.75% C, showing an obvious AT bias with an AT content of 78.79%. The AT content of the G. orientalis mitogenome was slightly lower than those of Ha. sinicus [16], Am. communis [17], S. pumicatus [17], and Ha. pensylvanicus [17]. The AT- and GC-skews of the major strand of the G. orientalis mitogenome were 0.037 and −0.202, respectively, indicating a major-strand compositional bias characterized by a slight excess of A over T nucleotides and a strong excess of C over G nucleotides. A similar bias is generally observed in the mitogenomes of other members of the subfamily Harpalinae, including Am. aulica [3] and Ha. pensylvanicus [17].
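The skew values can be reproduced from the reported base composition. The sketch below assumes the conventional definitions AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C), which the text does not state explicitly; the function name `at_gc_skew` is ours.

```python
def at_gc_skew(a: float, t: float, g: float, c: float) -> tuple:
    """Return (AT-skew, GC-skew) from base counts or percentages."""
    at_skew = (a - t) / (a + t)
    gc_skew = (g - c) / (g + c)
    return at_skew, gc_skew


# Base composition (%) of the G. orientalis mitogenome as reported above.
A, T, G, C = 40.85, 37.94, 8.46, 12.75
at_content = A + T
at_skew, gc_skew = at_gc_skew(A, T, G, C)
print(round(at_content, 2), round(at_skew, 3), round(gc_skew, 3))
# -> 78.79 0.037 -0.202
```

A positive AT-skew indicates more A than T, and a negative GC-skew indicates more C than G, which matches the interpretation given in the text.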

Protein-Coding Genes
Nine of the 13 PCGs were encoded on the majority strand (cox1, cox2, cox3, atp6, atp8, nad2, nad3, nad6, and cob), and four on the minority strand (nad1, nad4, nad4l, and nad5) (Figure 2 and Table 3). All PCGs had a typical ATN (Met) start codon, except for nad1, which started with TTG: 1 PCG (nad2) initiated with ATA, 5 PCGs (cox1, nad3, nad5, nad4l, and nad6) with ATT, 5 PCGs (cox2, atp6, cox3, nad4, and cob) with ATG, and 1 PCG (atp8) with ATC. The start codons in the G. orientalis mitogenome were consistent with those in the Ha. sinicus mitogenome. In other mitogenomes of the subfamily Harpalinae, CGA and TAT are also used as start codons [3,17]. All 13 PCGs contained a typical TAN stop codon: 3 PCGs (nad3, cob, and nad1) terminated with TAG, 6 PCGs (nad2, atp8, atp6, cox3, nad4l, and nad6) with TAA, and 4 PCGs (cox1, cox2, nad5, and nad4) with an incomplete stop codon (T), which is completed by the addition of A nucleotides at the 3′ end of the encoded mRNA. Most of the stop codons of the genes in the G. orientalis mitogenome were identical to those in the Ha. sinicus and Am. aulica mitogenomes. Other types of stop codons are present in the mitogenomes of the subfamily Harpalinae, such as the incomplete stop codon A, CTA, and TTA [16,17]. The diversity of the start and stop codons reflects the evolutionary diversity of species and makes it difficult to determine the start and stop positions of PCGs.
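The start and stop codon assignments listed above can be tabulated to confirm that they account for all 13 PCGs. The sketch below simply encodes the gene lists from the text; the helper `complete_stop`, which pads an incomplete stop codon with A to mimic 3′ polyadenylation, is an illustrative simplification of ours.

```python
from collections import Counter

# Start codons per gene, as listed in the text.
start_codons = {
    "nad2": "ATA",
    "cox1": "ATT", "nad3": "ATT", "nad5": "ATT", "nad4l": "ATT", "nad6": "ATT",
    "cox2": "ATG", "atp6": "ATG", "cox3": "ATG", "nad4": "ATG", "cob": "ATG",
    "atp8": "ATC",
    "nad1": "TTG",
}

# Stop codons per gene, as listed in the text ("T" = incomplete stop codon).
stop_codons = {
    "nad3": "TAG", "cob": "TAG", "nad1": "TAG",
    "nad2": "TAA", "atp8": "TAA", "atp6": "TAA",
    "cox3": "TAA", "nad4l": "TAA", "nad6": "TAA",
    "cox1": "T", "cox2": "T", "nad5": "T", "nad4": "T",
}


def complete_stop(codon: str) -> str:
    """Pad an incomplete stop codon with A (illustrative simplification)."""
    return codon + "A" * (3 - len(codon))


start_tally = Counter(start_codons.values())
stop_tally = Counter(stop_codons.values())
completed = {gene: complete_stop(c) for gene, c in stop_codons.items()}
print(start_tally, stop_tally)
```

After padding, every stop codon is either TAA or TAG, which is what the term "TAN stop codon" covers in this mitogenome.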
The relative synonymous codon usage (RSCU) analysis of the 13 PCGs, which comprised 3714 codons excluding the start and stop codons, showed codon usage bias in the G. orientalis mitogenome (Figure 3 and Supplementary Table S1). Among the amino acids (Supplementary Table S1), Leu was the most frequent (576), followed by Ile (381) and Phe (363). Among the codons, UUA (434) for Leu was the most frequently used, followed by AUU (356) for Ile and UUU (323) for Phe. All 13 PCGs showed biased usage of the A and T nucleotides (Supplementary Table S2).
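RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below applies this standard definition to Phe, using the counts given above; the UUC count (363 − 323 = 40) is derived by us on the assumption that Phe is encoded only by UUU and UUC, and the function name `rscu` is hypothetical.

```python
def rscu(family_counts: dict) -> dict:
    """RSCU for one synonymous codon family.

    RSCU = codon count * number of synonymous codons / family total.
    """
    total = sum(family_counts.values())
    n_codons = len(family_counts)
    if total == 0:
        return {codon: 0.0 for codon in family_counts}
    return {codon: count * n_codons / total
            for codon, count in family_counts.items()}


# Phe: UUU count is reported in the text; the UUC count is derived as
# 363 - 323 (our assumption that Phe has exactly two codons).
phe_counts = {"UUU": 323, "UUC": 363 - 323}
phe_rscu = rscu(phe_counts)
print({codon: round(value, 2) for codon, value in phe_rscu.items()})
```

An RSCU value above 1 means that a codon is used more often than expected under equal usage; here the A/T-ending codon UUU is strongly preferred, in line with the AT bias of the mitogenome.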

Transfer and Ribosomal RNA Genes
The 22 typical tRNA genes were interspersed among the PCGs. Fourteen of the 22 tRNAs were on the majority strand, and eight were on the minority strand (Figure 2 and Table 3). The lengths of the 22 tRNAs ranged from 61 bp (trnA-UGC) to 75 bp (trnQ-UUG and trnW-UCA) (Tables 3 and S2), and all of them had a typical cloverleaf secondary structure (Figure 4). Most of the secondary structures of tRNAs in the G. orientalis mitogenome were consistent with those in the mitogenomes of Ha. sinicus and Ha. pensylvanicus. In other mitogenomes of the subfamily Harpalinae, trnS1-GCU lacked a DHU stem, which was replaced by a simple loop [3,16,17]. However, the trnS1-GCU of the G. orientalis mitogenome had a 2 bp DHU stem (Figure 4). The diversity of the secondary structures of tRNAs reflects the evolutionary diversity of the species. The length of the anticodon stems of the tRNAs ranged from 3 bp (trnT-UGU) to 8 bp (trnC-GCA) (Figure 4). The length of the DHU stem ranged from 2 bp (trnY-GUA and trnS1-GCU) to 5 bp (trnM-CAU and trnK-CUU) (Figure 4), and most DHU stems were 3-4 bp long. The length of the TΨC stem ranged from 3 bp (trnC-GCA and trnV-UAC) to 6 bp (trnL1-UAG) (Figure 4), and most TΨC stems were 4-5 bp long. Three types of mismatched base pairs were found in the tRNAs: U-U base pairs, A-G base pairs, and non-canonical G-U base pairs (Figure 4). The amino acid acceptor stem of trnC-GCA has U-U base pairs, the amino acid acceptor stem of trnW-UCA has A-G base pairs, and the anticodon stems of trnW-UCA and trnD-GUC and the TΨC stem of trnS1-GCU have G-U base pairs (Figure 4).
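The three pair types mentioned above can be distinguished programmatically once the two strands of a stem are known. The sketch below is a minimal illustration; the function names and the toy stem are hypothetical, and the stems are not taken from Figure 4.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(base1: str, base2: str) -> str:
    """Classify one RNA base pair; DNA-style T is treated as U."""
    pair = (base1.upper().replace("T", "U"), base2.upper().replace("T", "U"))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "G-U wobble"
    return "mismatch"


def classify_stem(strand5: str, strand3: str) -> list:
    """Pair two stem strands, both written 5'->3'.

    The first base of strand5 pairs with the last base of strand3.
    """
    if len(strand5) != len(strand3):
        raise ValueError("stem strands must have equal length")
    return [classify_pair(a, b) for a, b in zip(strand5, reversed(strand3))]


# Toy stem (hypothetical): G-C, C-G, then a U-U mismatch.
toy_stem = classify_stem("GCU", "UGC")
print(toy_stem)
```

With this scheme, U-U and A-G pairs are reported as mismatches, whereas G-U pairs are reported separately as wobble pairs, mirroring the distinction made in the text.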

Control Region
The CR, also called the AT-rich region, is 1327 bp in length with an AT content of 88.62% and is located between the rrnS and trnI-GAU genes (Figure 2 and Supplementary Table S2). Within the subfamily Harpalinae, the AT content of the CR of G. orientalis was slightly higher than that of Ha. sinicus and lower than that of Ha. pensylvanicus. The AT- and GC-skews of the CR on the majority strand were 0.005 and −0.125, respectively, indicating a compositional bias characterized by a slight excess of A over T nucleotides and a strong excess of C over G nucleotides. A similar bias is generally observed in the mitogenomes of other members of the subfamily Harpalinae, such as Ha. pensylvanicus [17]. The inferred secondary structure of the CR (Supplementary Figure S3) contained more than 20 stem-loop structures. A conserved poly-T stretch (19 bp) and seven tandem repeats (TRs) were observed in the CR of the G. orientalis mitogenome (Figure 5 and Supplementary Table S3). The total length of the TRs was 238 bp, accounting for 17.94% of the CR. Four of the TRs (TR3-TR6) overlapped with each other (Figure 5 and Supplementary Table S3). The region containing TR3-TR6 was the most A + T-enriched part of the CR: it spanned positions 15,383 to 15,471, a total of 89 bp composed entirely of A and T bases. TR2 was a variable but typical microsatellite-like element, (TA)24, flanked on its left by the conserved poly-T stretch (19 bp) (Figure 5).
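Simple elements such as a poly-T stretch or a (TA)n microsatellite can be located with regular expressions, and the proportions quoted above follow from basic arithmetic. The sketch below is illustrative only: the function names, the length thresholds, and the toy sequence are our assumptions, and this is not the tandem repeat detection method used in the study.

```python
import re


def find_poly_t(seq: str, min_len: int = 10) -> list:
    """Return 1-based (start, end) of poly-T runs of at least min_len."""
    pattern = r"T{%d,}" % min_len
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, seq.upper())]


def find_ta_repeats(seq: str, min_units: int = 5) -> list:
    """Return 1-based (start, end, units) of (TA)n runs with n >= min_units."""
    pattern = r"(?:TA){%d,}" % min_units
    return [(m.start() + 1, m.end(), (m.end() - m.start()) // 2)
            for m in re.finditer(pattern, seq.upper())]


# Toy sequence (hypothetical): a 19 bp poly-T, one spacer base, then (TA)24.
# The spacer keeps the first T of the TA run out of the poly-T match.
toy_cr = "GC" + "T" * 19 + "G" + "TA" * 24 + "CG"
poly_t_hits = find_poly_t(toy_cr)
ta_hits = find_ta_repeats(toy_cr)

# Arithmetic from the text: TR share of the CR and length of the AT-only region.
tr_share = round(238 / 1327 * 100, 2)
at_only_length = 15_471 - 15_383 + 1
print(poly_t_hits, ta_hits, tr_share, at_only_length)
```

The inclusive coordinate arithmetic (end − start + 1) is what yields 89 bp for the region spanning positions 15,383 to 15,471.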

Phylogenetic Analysis
The phylogenetic analyses in this study were based on the nucleotide sequences of the 13 PCGs obtained from 32 mitogenomes (Figures 6 and 7). A total of 11,136 alignment positions were retained by Gblocks [38] from the 11,556 alignment positions of the 13 PCGs (Supplementary Table S4). The BI tree provided markedly higher support values than the ML tree for the same dataset, particularly for branches involving relationships within the subfamily Harpalinae. In the ML tree, very low bootstrap support values (21 and 36) were observed for the subfamily Harpalinae, which is consistent with the result of a previous phylogenetic study [3]. In the BI and ML trees, nine genera of the subfamily Harpalinae were clustered as (((((((Pterostichus + Stomis) + Orthomus) + Amara) + Harpalus) + (Abax + Craspedophorus)) + Hexagonia) + Galerita) (Figures 6 and 7), which was similar to the results of previous phylogenetic studies [1,3,5,12,16]. At the genus level, Galerita is an independent lineage in the topology of the BI and ML trees (Figures 6 and 7). The genus Galerita is closely related to the genus Trichognathus in the tribe Galeritini [10,12], and the tribes Galeritini and Dryptini are sister groups [10]. However, the mitogenomes of these related taxa have not yet been reported. Determining an accurate phylogenetic position within the genus Galerita requires additional mitogenome sequences. At the subfamily level of the BI and ML trees, the positions of Brachininae and Omophroninae were unstable. Nevertheless, this study provides a molecular basis for the classification and phylogeny of the family Carabidae, especially the subfamily Harpalinae.
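The clustering written above in "+" notation can be parsed into a nested structure to make the branching order explicit. The sketch below is a small illustrative parser with hypothetical function names; it only restates the topology given in the text and does not reproduce branch lengths or support values.

```python
def parse_clustering(text: str):
    """Parse a '((A + B) + C)' style string into nested tuples."""
    pos = 0

    def parse_group():
        nonlocal pos
        children, name = [], ""
        while pos < len(text):
            ch = text[pos]
            pos += 1
            if ch == "(":
                children.append(parse_group())
            elif ch in "+)":
                if name.strip():
                    children.append(name.strip())
                name = ""
                if ch == ")":
                    return tuple(children)
            else:
                name += ch
        if name.strip():
            children.append(name.strip())
        return tuple(children)

    top = parse_group()
    return top[0] if len(top) == 1 else top


def leaves(node) -> list:
    """List the genus names under a node, in order."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def to_newick(node) -> str:
    """Write a nested tuple as a Newick string without the final semicolon."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


CLUSTERING = ("(((((((Pterostichus + Stomis) + Orthomus) + Amara) + Harpalus)"
              " + (Abax + Craspedophorus)) + Hexagonia) + Galerita)")
tree = parse_clustering(CLUSTERING)
print(to_newick(tree) + ";")
print(len(leaves(tree)), "genera; sister to", tree[1], "->", leaves(tree[0]))
```

In this representation, Galerita is one of the two children of the root of the Harpalinae clade, and the other child contains the remaining eight genera, which is the sense in which Galerita forms an independent lineage in these trees.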
Genes 2022, 13, 2199

Conclusions
The genus Galerita contains more than 110 species [10], and its taxonomy is relatively disorganized owing to simple morphological characteristics [9] and the absence of mitogenome-based molecular phylogenetic evidence. Therefore, we newly assembled the G. orientalis mitogenome in the present study. The G. orientalis mitogenome and the previously reported mitogenomes of the subfamily Harpalinae showed similar structural characteristics and nucleotide compositions, each containing 13 PCGs, 22 tRNAs, 2 rRNAs, and a control region. The 13 protein-coding genes were initiated with a typical ATN (Met) start codon, except for nad1, which starts with TTG, and all were terminated with a typical TAN stop codon. All 22 tRNAs could fold into a typical cloverleaf structure, including trnS1-GCU, which lacks the DHU arm in other mitogenomes of the subfamily Harpalinae. Both rrnS and rrnL contained many helices. A conserved poly-T stretch (19 bp) and seven TRs were observed in the CR. The phylogenetic analysis indicated that the genus Galerita was an independent lineage. Considering the diversity of the family Carabidae and the limitations of the current mitogenome information, resolving an accurate phylogeny within the genus Galerita will require additional mitogenomes.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes13122199/s1, Table S1: Codon number and RSCU of PCGs of G. orientalis mitogenome; Table S2: Composition and skewness of genes and CR of G. orientalis mitogenome; Table S3: Tandem repeats of CR of G. orientalis mitogenome; Table S4: Alignment results using Gblocks; Figure S1: Predicted secondary structure of rrnL of G. orientalis mitogenome; Figure S2: Predicted secondary structure of rrnS of G. orientalis mitogenome; Figure S3: Predicted secondary structure of control region of G. orientalis mitogenome.
Data Availability Statement: The following information was supplied regarding the deposition of DNA sequences: The raw data can be obtained from the Sequence Read Archive at NCBI under accession number SRR20727398. The associated BioProject and BioSample numbers are PRJNA864596 and SAMN30075025, respectively.