Complete Mitochondrial Genome of Three Species of the Genus Microtus (Arvicolinae, Rodentia)

Simple Summary
The analysis of mitochondrial genomes can help to answer interesting questions about the biology of some species. This is the case for vole species, which are characterized by unusual genetic features related to sex chromosomes and by variation in their karyotypes. In this work, we describe the mitochondrial genomes of three emblematic vole species, demonstrating that they are highly conserved in the organization of their protein-coding, transfer RNA and ribosomal RNA genes. In addition, we performed a detailed analysis of their control regions, identifying several domains and conserved boxes related to mitochondrial DNA regulation. Finally, a phylogenetic analysis with the mitochondrial DNA confirmed the established phylogeny of the analyzed voles.

Abstract
The 65 species of the genus Microtus have unusual sex-related genetic features and a high rate of karyotype variation. However, only nine complete mitogenomes for these species are currently available. We describe the complete mitogenome sequences of three Microtus species, which vary in length from 16,295 bp to 16,331 bp and contain 13 protein-coding genes (PCGs), two ribosomal RNA genes, 22 transfer RNA genes and a control region. The length of the 13 PCGs and the coded proteins is the same in all three species, and the start and stop codons are conserved. The non-coding regions include the L-strand origin of replication, with the same sequence of 35 bp, and the control region, which varies between 896 bp and 930 bp in length. The control region includes three domains (Domains I, II and III) with extended termination-associated sequences (ETAS-1 and ETAS-2) in Domain I. Domain II and Domain III include five (CSB-B, C, D, E and F) and three (CSB-1, CSB-2 and CSB-3) conserved sequence blocks, respectively. Phylogenetic reconstructions using the mitochondrial genomes of all the available Microtus species and one representative species from another genus of the Arvicolinae subfamily reproduced the established phylogenetic relationships for all the Arvicolinae genera that were analyzed.

Complete mitogenomes were aligned using ClustalW, and the phylogenetic relationships were reconstructed using Bayesian inference (BI) as implemented in MrBayes v. 3.1 [64]. Runs of two million generations were conducted. Trees were sampled every 1000 generations with a burn-in of 25%. The best-fit nucleotide substitution model, that is, the one with the lowest BIC (Bayesian Information Criterion) value, was GTR + G + I, as determined using MEGA version X [48].
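As a rough illustration of what these run settings imply, the sketch below counts how many sampled trees remain after burn-in. It is a simplified calculation, not the MrBayes implementation: the function name is hypothetical, and the count ignores the starting tree at generation 0 and assumes a single run.

```python
def retained_samples(generations: int, sample_every: int, burnin_fraction: float):
    """Return (sampled, discarded, retained) tree counts for one MCMC run.

    Simplifying assumption: the initial tree at generation 0 is not counted.
    """
    sampled = generations // sample_every          # trees written to the sample
    discarded = int(sampled * burnin_fraction)     # trees dropped as burn-in
    return sampled, discarded, sampled - discarded


# Settings reported in the text: 2,000,000 generations, sampling every 1000, 25% burn-in.
sampled, discarded, retained = retained_samples(2_000_000, 1000, 0.25)
print(f"sampled={sampled}, burn-in={discarded}, retained={retained}")
```

Under these assumptions, the posterior probabilities would be summarized from the retained trees only.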

Gene Organization
The complete mitogenomes of M. cabrerae (MN058077), M. chrotorrhinus (MN058078) and M. thomasi (MN058079) analyzed here were 16,331 bp, 16,297 bp and 16,295 bp in length, respectively. These values are similar to those of the mitogenomes of other species from this genus, which range between 16,283 bp (M. rossiaemeridionalis) and 16,312 bp (M. kikuchii) [1,60]. They are also comparable in size to those from other species of the Arvicolinae subfamily, such as Proedromys liangshanensis (16,296 bp) [57] and Neodon forresti (16,397 bp; GenBank accession number: KU891252.1). Together, these results confirm that the mitochondrial genomes in the Arvicolinae subfamily are very similar in size.
The mitogenomes from M. cabrerae, M. chrotorrhinus and M. thomasi include a control region (D-loop) and a conserved set of 37 vertebrate mitochondrial genes, with 13 protein-coding genes (PCGs), 22 tRNA genes and two rRNA genes (12S rRNA and 16S rRNA) (Table 1). As expected, the organization and structures of these three mitogenomes were identical to those described for other Microtus and mammal species (Figure 1). Twelve PCGs, 14 tRNAs and two rRNAs are located on the heavy strand, while Nd6 and eight tRNAs are found on the light strand. The D-loop is located between the tRNA-Pro and tRNA-Phe genes [37,38,57]. The percentage of identity observed in pairwise comparisons of these three complete mitogenomes varied between 86.34% (M. thomasi-M. cabrerae) and 87.55% (M. thomasi-M. chrotorrhinus). These results fall within the range of identity values that we calculated for comparisons between the available Microtus mitogenomes (85.50-98.75%).
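A percentage of identity of this kind can be computed from two aligned sequences as sketched below. This is a minimal example under stated assumptions: the text does not say how alignment gaps were handled, so the choice here to skip columns containing a gap is an assumption, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical positions between two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy alignment (not real mitogenome data): 9 comparable columns, 8 identical.
print(round(percent_identity("ACGT-ACGTA", "ACGTTACCTA"), 2))
```

Other conventions, such as counting gap columns as mismatches, would give slightly different values for the same alignment.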

Nucleotide Composition
We identified a bias towards A and T nucleotides, which is commonly reported in mammalian mitogenome sequences [35,37]. The A+T compositions of the H-strands are 58.19%, 59.27% and 59.77% in the M. cabrerae, M. chrotorrhinus and M. thomasi mitogenomes, respectively. The 13 mitochondrial PCGs are AT-biased, with an A+T content ranging from 54.34-56.63% for Cox3 to 64.71-66.67% for Atp8. The control region, the two rRNA genes and the 22 tRNAs are also AT-biased in all three species (Table 2).
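The A+T content reported above is the share of A and T among the nucleotides of a sequence. A minimal sketch of that calculation follows; the function name and the toy sequence are hypothetical, and ignoring ambiguous bases such as N is an assumption rather than something stated in the text.

```python
def at_content(seq: str) -> float:
    """Percentage of A and T among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())  # ambiguous symbols (e.g. N) are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["A"] + counts["T"]) / total


# Toy sequence (not real data): 4 A, 4 T, 1 G, 1 C.
print(at_content("ATGAAATTTC"))
```

Applied to one strand of a full mitogenome, or to a single gene, the same function gives values comparable to those in Table 2.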

Protein-Coding Genes and Codon Usage
The 13 mitochondrial PCGs from the three analyzed species are 11,390 bp in length (11,358 bp in codons and 32 bp in stop codons) and encode 3786 amino acids (Table 2). The three described mitogenomes also contain some overlapping nucleotides and gaps between PCGs or between PCGs and tRNAs (Table 1). The longest overlap of 43 bp is located between the Atp8 and Atp6 genes (Table 1).
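The length figures above can be checked with simple arithmetic, using the stop codons reported in this section: three genes end in an incomplete T- (1 bp each), one in TA- (2 bp) and the remaining nine in a complete stop codon (3 bp each). The variable names below are illustrative only.

```python
AMINO_ACIDS = 3786  # total residues encoded by the 13 PCGs

# stop-codon type -> (number of genes, bases contributed per gene)
STOP_CODONS = {
    "T-": (3, 1),        # Nd1, Cox3, Nd4
    "TA-": (1, 2),       # Atp6
    "complete": (9, 3),  # TAA or TAG in the remaining nine PCGs
}

coding_bp = AMINO_ACIDS * 3
stop_bp = sum(genes * bases for genes, bases in STOP_CODONS.values())
total_bp = coding_bp + stop_bp

print(coding_bp, stop_bp, total_bp)
```

The totals reproduce the 11,358 bp in codons, 32 bp in stop codons and 11,390 bp overall. This is the sum of the individual gene lengths, so overlapping nucleotides between adjacent genes are counted once per gene.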
Twelve mitochondrial PCGs use exactly the same start codon for translation initiation in all three species: GTG for Nd1, ATT for Nd2 and Nd3, and ATG for the other nine PCGs. Only the Nd5 gene shows variation in the start codon: ATA in M. cabrerae and M. thomasi, and ATT in M. chrotorrhinus. Similarly, 12 PCGs use exactly the same stop codons for translation termination in all three species: two incomplete ones (T- for Nd1, Cox3 and Nd4; TA- for Atp6) and two complete ones (TAG for Nd6, TAA for the rest of the PCGs). However, Nd5 uses TAA in M. cabrerae and M. thomasi, and TAG in M. chrotorrhinus (Table 2).
The most abundant start and stop codons were ATG and TAA, respectively, a finding that agrees with previous work on other mammal mitogenomes [30,31,35,37,65,66]. Incomplete stop codons (T- or TA-), like those used in three PCGs (Nd1, Cox3 and Nd4), are commonly observed in metazoan mitogenomes. They might be completed by polyadenylation of the 3′-end of the mRNA after transcription, giving rise to the complete functional TAA stop codon [67,68].
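The completion of an incomplete stop codon by polyadenylation can be pictured as padding the truncated codon with adenines until it is three bases long. The sketch below is only a string-level illustration of that idea; the function name is hypothetical.

```python
def complete_stop_codon(incomplete: str) -> str:
    """Pad an incomplete stop codon ('T-' or 'TA-') with A residues to three bases,
    mimicking the effect of adding a poly(A) tail to the mRNA 3' end."""
    stem = incomplete.rstrip("-").upper()
    if stem not in ("T", "TA"):
        raise ValueError("expected 'T-' or 'TA-'")
    return stem + "A" * (3 - len(stem))


for codon in ("T-", "TA-"):
    print(codon, "->", complete_stop_codon(codon))
```

Both incomplete forms end up as TAA, which matches the mechanism described in the text.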
The length of each of the 13 PCGs is the same in all three species and, consequently, so is the length of the coded proteins. The percentages of nucleotide identity range between 82.03%, observed when comparing the M. cabrerae and M. thomasi Nd2 gene sequences, and 89.71%, obtained when comparing the M. chrotorrhinus and M. thomasi Atp8 gene sequences (Table 2).

rRNAs, tRNA Genes and Non-Coding Regions
The tRNA-Val gene is located between the 12S and 16S rRNA genes. The rRNA genes are flanked by tRNA-Phe and tRNA-Leu(UUR) (Table 1; Figure 1).
Non-coding regions are important during replication and for the maintenance of the mitogenomes. They include the L-strand origin of replication (OL), intergenic spacers and the control region [69]. The three Microtus mitogenomes analyzed here have identical OL sequences of 35 bp. This region is located between tRNA-Asn and tRNA-Cys in the WANCY region, a cluster of five tRNA genes (tRNA-Trp, tRNA-Ala, tRNA-Asn, tRNA-Cys and tRNA-Tyr). The same organization is present in other Microtus and Arvicolinae species, as well as in most mammal species [31,37,38,57]. The OL from the mitogenomes of other Microtus species varies in length from 34 to 40 bp, while in most Arvicolinae species it is 34 bp in length and is highly conserved (Figure 2). Intergenic spacers were also found in the mitogenomes, with sizes in the range 1-11 bp (Table 1).
The control region includes three domains (Domains I, II and III). In Domain I, the extended termination-associated sequences (ETAS-1 and ETAS-2) were identified (Figure 3). The ETAS-1 sequence is better conserved than the ETAS-2 sequence in these species: pairwise comparisons of ETAS-1 sequences showed a similarity of 89.83-96.6%, while the ETAS-2 sequence similarity was 67.3-76.9%. The conserved sequence blocks CSB-1, CSB-2 and CSB-3 [70,71] were identified within Domain III. CSB-1 is the best-conserved block, with similarity values of 66.7-84% when comparing the three Microtus species with each other. CSB-2 and CSB-3 are less well conserved, with similarities of 52.6-68.4% and 54.1-68.0%, respectively (Figure 3). No repetitive DNA sequences were found to be present between CSB-1 and CSB-2 on the Microtus D-loop, as occurs in other mammal species [35]. Five other conserved sequence blocks (CSB-B, C, D, E and F) were identified in the central Domain II [72], all of which are well conserved, with nucleotide similarities of 78.95-100% (Figure 3).

Phylogenetic Analysis
The phylogenetic positions of the three analyzed species were assessed using Bayesian inference (Figure 4). All Microtus species are clustered in a well-supported clade that also includes Neodon and Lasiopodomys species. The Alexandromys species are grouped in the same branch as the genera Neodon and Lasiopodomys, which have been considered as subgenera of the genus Microtus, although their phylogenetic relationships are not clearly established [20,24]. Proedromys is also close to this Microtus clade, since it appears grouped with it in a well-supported node (posterior probability = 1). The Microtus-Proedromys group is also associated with the clade of Myodes and Eothenomys (posterior probability = 0.93). Finally, Dicrostonyx and Ondatra show a basal position. These results agree with the previously established phylogenetic relationships for these genera [73-76].
A number of conclusions can be drawn for the Microtus species analyzed. The two species from the Terricola group included here, M. (Terricola) thomasi and M. (Terricola) subterraneus, are closely associated with M. arvalis and M. rossiaemeridionalis. The close phylogenetic relationship between the Terricola and Microtus subgenera has been previously reported [20,29]. Although the two subgenera Aulacomys and Pedomys are not resolved, the three North American species, M. chrotorrhinus, M. ochrogaster and M. richardsoni, are grouped together. These Nearctic species fall within the phylogeny of Microtus, in line with previous studies [20], rather than being basal, as recently reported [29]. M. cabrerae and M. agrestis are grouped, which could support their inclusion in the subgenus Agricola, as has been previously proposed [20]. Ostensibly, M. cabrerae and M. agrestis share certain unusual genetic features, including the presence of giant sex chromosomes, which could be regarded as additional evidence of their phylogenetic proximity. However, several studies have clearly demonstrated that these enlarged sex chromosomes arose and evolved independently in the genus Microtus [4], and hence, their presence is not a robust criterion for the inclusion of M. cabrerae and M. agrestis in the same subgenus. A previous mitochondrial phylogenetic reconstruction obtained similar results, with M. cabrerae and M. agrestis clustered in the same clade, but the level of genetic divergence indicated that the two species could be considered as members of two different subgenera (Agricola and Iberomys) [29]. The genus Iberomys, which is based on the description of archaic morphological characters, has been proposed, with Microtus cabrerae as its only species [77,78]; however, no support is obtained for this genus [20].
The phylogenetic results obtained here with mitochondrial data need to be validated with nuclear markers because of the limitations inherent to mitogenomes, such as maternal inheritance, accelerated rates of substitution, introgression, effective population size and neutrality [79]. However, sequencing and characterization of mitogenomes from other species of the genus Microtus and closely related taxa from the subfamily Arvicolinae will improve our understanding of the phylogenetic relationships of this rodent group and hence help to resolve some of the issues that remain open.

Conclusions
The complete mitogenomes of three Microtus species are described. Our results demonstrate that these mitogenomes have the organization and characteristics of the previously described mitogenomes of voles and other mammalian species, and contain 13 protein-coding genes (PCGs), two ribosomal RNA genes, 22 transfer RNA genes and a control region. We identified the conserved domains and the conserved sequence blocks of the control region. Phylogenetic reconstructions reproduced the established phylogenetic relationships for all the Arvicolinae genera that were analyzed. Our results could be useful in future studies on the identification and phylogeny of Arvicolidae species, especially of the genus Microtus.

Funding: This work was funded by the Consejería de Innovación, Ciencia y Empresa of the Junta de Andalucía (Group: RNM924), and by the Universidad de Jaén (as part of the program Plan de Apoyo a la Investigación 2019-2020, Acción 1). Michail Rovatsos was supported by the Charles University Research Centre program (204069).