Characterization of Five New Earthworm Mitogenomes (Oligochaeta: Lumbricidae): Mitochondrial Phylogeny of Lumbricidae

: Identiﬁcation based on conventional morphological characteristics is typically difﬁcult and time-consuming. The development of molecular techniques provides a novel strategy that relies on speciﬁc mitochondrial gene fragments to conduct authentication. For this study, ﬁve newly sequenced partial mitogenomes of earthworms ( Bimastos parvus , Dendrobaena octaedra , Eisenia andrei , Eisenia nordenskioldi , and Octolasion tyrtaeum ) with lengths ranging from 14,977 to 15,715 were presented. Each mitogenome possessed a putative control region that resided between tRNA-Arg and tRNA-His. All of the PCGs were under negative selection according to the value of Ka/Ks. The phylogenetic trees supported the classiﬁcation of Eisenia and Lumbricus ; however, the trees based on cox1 did not. Through various comparisons, it was determined that cox1 fragments might be more suitable for molecular identiﬁcation. These results lay the foundation for further phylogenetic studies on Lumbricidae.


Introduction
Earthworms are one of the most dominant and widely distributed invertebrates in soils. They can significantly alter their habitats through the formation of soil aggregates by drilling soil and accelerating microbial activities, with profound effects on both belowground and aboveground biota [1,2]. Despite their importance for ecosystems, significant difficulties remain in the identification of earthworms [3]. This is due to the limited number of external features available for identification, as well as the considerable plasticity of these features; thus, species identification often relies on internal structures [1,3,4]. Moreover, since many of the morphological features used for identification exist only in adults, the identification of juveniles is impossible in many species [4,5]. As classification and species identification are the basis of most ecological studies, identification limitations have greatly hindered progress in the ecological and community research on earthworms.
Sequence data are important not only for species identification but also for phylogenetic analysis. Due to the soft bodies of earthworms, the fossil record in this area is sparse [6]; thus, it is difficult to identify their ancestry and evolutionary characteristics, with Lumbricidae being no exception. Although Lumbricidae are one of the most widely studied and distributed groups of earthworms, their taxonomy is far from settled [3]. Because of the lack of useful taxonomic features, many morphologically similar species are classified as a single species with various morphotypes or species complexes that include various taxa of uncertain taxonomic categories [3,7,8]. Consequently, earthworm identification based on mitochondrial genomes (mitogenome) might provide a solution to this issue [9]. The mitogenome is a highly conserved, typically double-stranded, circular molecule that Diversity 2021, 13, 580 2 of 8 is one of the most widely used tools in taxonomy, population genetics, and evolutionary biology [10][11][12].
For this study, we sequenced 5 new partial mitogenomes from 4 genera of Lumbricidae and compared their 13 protein-coding genes (PCGs), 2 ribosomal RNAs (rRNAs), and 22 transfer RNAs (tRNAs) with other Lumbricidae species. Due to the large number of published mitogenomes of the Eisenia genus, we selected two species of the Eisenia genus to clarify its phylogenetic relationships. We aimed to elucidate the phylogenetic relationships of Lumbricidae to identify the most suitable gene sequence for species identification.

DNA Extraction and Sequencing
Genomic DNA (gDNA) was isolated from tissue using a FastPure Cell/Tissue DNA Isolation Mini Kit (Vazyme™, Nanjing, China) according to the manufacturer's protocols. High-quality gDNAs were then stored at −20 • C for future experiments. The DNA samples were sent to Genepioneer Biotechnologies (Nanjing, China) for library construction and sequencing using an Illumina HiSeq2500 high-throughput sequencing platform [14]. Raw data were filtered to remove joint sequences and low-quality reads to obtain high-quality, clean data.

Data Analysis
The rate of nonsynonymous substitutions (Ka), rate of synonymous substitutions (Ks), ratio of Ka/Ks, and p-distance were determined using MEGA X for the five earthworm species [19]. Although mitogenome DNA is inherited as a single molecule, phylogenetic inferences based on different portions of the genome may differ. To verify the obtained phylogenies, we constructed several datasets extracted from the mitogenomes: (a) The concatenated set of base sequences of 13 PCGs from 13 species; (b) 13 single PCGs from 13 species.
To determine the phylogenetic relationships of Lumbricidae, phylogenetic trees were constructed using Bayesian inference (BI) and maximum likelihood (ML) based on these two types of datasets, respectively. One species (Pontoscolex corethrurus) was considered as an outgroup (Table 2). Phylogenetic analysis was performed using the PhyloSuite v1.2.2 [20]. For dataset a), the best fit model of BI was selected by ModelFinder, while the model of ML was determined by automatic partitioning (Table 3). For dataset b), the best fit models of BI for cox1 and cox2 were GTR+F+I+G4, whereas for nd5 and nd3 they were HKY+F+I+G4; meanwhile, the best fit models of ML for cox1 and cox2 were GTR+F+I+G4, whereas for nd5 and nd3 they were HKY+F+I+G4 and TN+F+I+G4, respectively. The BI analyses were performed using MrBayes 3.2.6 and were run for one million generations, with tree sampling being carried out every 1000 generations with a burn-in of 25% trees. The phylogenetic trees were viewed and edited using EvolView (available at https://www. evolgenius.info/) [21].

Characterization of Mitogenomes
The partial mitogenome sequences of B. parvus, D. octaedra, E. andrei, E. nordenskioldi, and O. tyrtaeum were linear DNA molecules that were 14,977, 15,152, 15,194, 15,656, and 15,715 bp in length, respectively. Among these earthworms, O. tyrtaeum had the smallest mitogenome (14,977 bp), while D. octaedra had the largest (15,715 bp). A base composition analysis indicated that five mitogenome sequences were biased toward A and T, with an A + T content of 66.57% in B. parvus, 66.91% in D. octaedra, 62.70% in E. andrei, 62.92% in E. nordenskioldi, and 64.75% in O. tyrtaeum. The five mitogenomes were submitted to GenBank (accession numbers: MZ857199, MZ857197, MZ857198, MZ857200, and MZ857201) ( Table 2). The mitogenomes of these five species possessed the typical gene composition that has been hypothesized for most earthworms, including 13 PCGs, 22 tRNA genes, and two rRNA genes (12S and 16S) [25,27]. For these five mitogenomes, there was a 417 to 1093 bp repetitive sequence between tRNA-Arg and tRNA-His, which was suspected to be a putative control region (CR) that also appeared in the mitogenomes of other Lumbricidae species [22][23][24].
All of the genes were encoded on a single DNA strand, where different parts of the genome had almost no length variation but a different substitution rate (Figure 1). For the five mitogenomes, the total PCG lengths were 11,080, 11,078, 11,075, 11,077, and 11,075 bp, respectively. The majority of the PCGs of these mitogenomes began with the conventional initiation codon ATG, as seen in other animals, except for cox1, which began with the initiation codon ATT. There were three termination codons in the five mitogenome Diversity 2021, 13, 580 4 of 8 sequences: TAA, TA-, and T--. For all five mitogenomes, the occurrence frequency of the termination codon TAA was higher than that of the other two termination codons.
All of the genes were encoded on a single DNA strand, where different parts of the genome had almost no length variation but a different substitution rate (Figure 1). For the five mitogenomes, the total PCG lengths were 11,080, 11,078, 11,075, 11,077, and 11,075 bp, respectively. The majority of the PCGs of these mitogenomes began with the conventional initiation codon ATG, as seen in other animals, except for cox1, which began with the initiation codon ATT. There were three termination codons in the five mitogenome sequences: TAA, TA-, and T--. For all five mitogenomes, the occurrence frequency of the termination codon TAA was higher than that of the other two termination codons.

Variation among PCGs
The Ka, Ks, and their ratio (Ka/Ks, sometimes termed dN/dS) are commonly employed to aid in understanding the direction of evolution and its selective strength in a coding sequence [28][29][30]. Ka/Ks > 1 indicates a positive selection, Ka/Ks < 1 indicates a negative selection, and Ka/Ks ≈ 1 indicates a neutral evolution. According to Figure 1, the Ka/Ks values of all the PCGs were less than 0.35, with cox1 having the lowest evolutionary

Variation among PCGs
The Ka, Ks, and their ratio (Ka/Ks, sometimes termed dN/dS) are commonly employed to aid in understanding the direction of evolution and its selective strength in a coding sequence [28][29][30]. Ka/Ks > 1 indicates a positive selection, Ka/Ks < 1 indicates a negative selection, and Ka/Ks ≈ 1 indicates a neutral evolution. According to Figure 1 From what little data we have so far, cox1 was the most slowly evolving gene, while the short atp8 gene had the highest substitution rate, with p-distance being the lowest and the highest on the nucleotide level, respectively.

A Mitochondrial Genome Phylogeny of Lumbricidae
Because of the limited mitogenome sequences of Lumbricidae, we included only 12 species from 6 genera of Lumbricidae for the phylogenetic analyses. One earthworm in Glossoscolecidae (P. corethrurus) was selected as an outgroup to root the phylogenetic trees, which was implemented to understand the evolutionary relationships within the Lumbricidae family. According to the phylogenetic trees (Figure 2), all of the species in both Eisenia and Lumbricus were clustered with species of the same genus.
rate. The PCGs of the five assessed species evolved under negative selection (purify selection) as a whole.
From what little data we have so far, cox1 was the most slowly evolving gene, while the short atp8 gene had the highest substitution rate, with p-distance being the lowest and the highest on the nucleotide level, respectively.

A Mitochondrial Genome Phylogeny of Lumbricidae
Because of the limited mitogenome sequences of Lumbricidae, we included only 12 species from 6 genera of Lumbricidae for the phylogenetic analyses. One earthworm in Glossoscolecidae (P. corethrurus) was selected as an outgroup to root the phylogenetic trees, which was implemented to understand the evolutionary relationships within the Lumbricidae family. According to the phylogenetic trees (Figure 2), all of the species in both Eisenia and Lumbricus were clustered with species of the same genus. The Bimastos genus was closely related to the Eisenia genus. We also used 13 single PCGs to build trees, where the results of the phylogenetic trees based on cox1, cox2, nd5, and nd3 were roughly similar to those for the phylogenetic tree based on PCGs. The cox1based phylogenetic tree reflected that the Eisenia species were not completely clustered together, and the cox1-based and nd3-based phylogenetic trees reflected that the Bimastos genus was closely related to the Eisenia genus. Our results revealed that mitogenomic sequences are effective molecular markers for investigating the systematic relationships within Lumbricidae; however, the data covered only six genera, which meant that they were still limited. The sequencing mitogenomes of additional taxa might assist with the further investigation of the ongoing controversial taxonomic issues, while resolving the higher-level phylogeny within Lumbricidae. The accumulation of mitogenomic sequences may provide further evolutionary and phylogenetic clues about Lumbricidae in the future.

Discussion
As mentioned above, there is an unknown region between tRNA-Arg and tRNA-His in most published mitogenomes of Lumbricidae species with a length of several hundred bp; thus, we speculated that this region may be a CR. However, since different species and even different individuals of the same species have considerable differences in their CR [26,31,32], we are not yet able to verify this deduction. The Bimastos genus was closely related to the Eisenia genus. We also used 13 single PCGs to build trees, where the results of the phylogenetic trees based on cox1, cox2, nd5, and nd3 were roughly similar to those for the phylogenetic tree based on PCGs. The cox1based phylogenetic tree reflected that the Eisenia species were not completely clustered together, and the cox1-based and nd3-based phylogenetic trees reflected that the Bimastos genus was closely related to the Eisenia genus. Our results revealed that mitogenomic sequences are effective molecular markers for investigating the systematic relationships within Lumbricidae; however, the data covered only six genera, which meant that they were still limited. The sequencing mitogenomes of additional taxa might assist with the further investigation of the ongoing controversial taxonomic issues, while resolving the higher-level phylogeny within Lumbricidae. The accumulation of mitogenomic sequences may provide further evolutionary and phylogenetic clues about Lumbricidae in the future.

Discussion
As mentioned above, there is an unknown region between tRNA-Arg and tRNA-His in most published mitogenomes of Lumbricidae species with a length of several hundred bp; thus, we speculated that this region may be a CR. However, since different species and even different individuals of the same species have considerable differences in their CR [26,31,32], we are not yet able to verify this deduction.
According to Figures 1 and 3, the Ka/Ks values and p-distances of cox1, cox2, cox3, cytb, atp6, nd5, and nd3 were relatively stable and the results of the phylogenetic trees based on cox1, cox2, nd5, and nd3 roughly reflected the phylogenetic relationships within the Lumbricidae families. However, considering the suitability of selecting a gene for molecular identification (in addition to reflecting the internal phylogenetic relationships of the Lumbricidae family), the length of sequence fragments should be long enough, and the sequence of published fragments should be plentiful enough [1]. The number of Lumbricidae sequences in GenBank (NCBI, available at https://www.ncbi.nlm.nih.gov/) is currently many times larger in cox1 than in other PCGs. Therefore, after comprehensive consideration, the cox1 fragment may be more suitable for molecular identification, which is consistent with the findings of other studies [1,33]. The cox2 fragment has the potential to construct DNA barcodes; however, more cox2 gene fragments of the species need to be supplemented. cytb, atp6, nd5, and nd3 were relatively stable and the results of the phylogenetic trees based on cox1, cox2, nd5, and nd3 roughly reflected the phylogenetic relationships within the Lumbricidae families. However, considering the suitability of selecting a gene for molecular identification (in addition to reflecting the internal phylogenetic relationships of the Lumbricidae family), the length of sequence fragments should be long enough, and the sequence of published fragments should be plentiful enough [1]. The number of Lumbricidae sequences in GenBank (NCBI, available at https://www.ncbi.nlm.nih.gov/) is currently many times larger in cox1 than in other PCGs. Therefore, after comprehensive consideration, the cox1 fragment may be more suitable for molecular identification, which is consistent with the findings of other studies [1,33]. The cox2 fragment has the potential to construct DNA barcodes; however, more cox2 gene fragments of the species need to be supplemented. Our results were limited by the restricted taxonomic sampling and/or the low phylogenetic signal of the chosen genes, consistent with similar studies that revealed systematic instability within the family (e.g., many genera do not form monophyletic assemblages) and emphasized the need for classification reflecting evolutionary relationships [34]. Nevertheless, our results complemented the database of mitochondrial genes in Lumbricidae species and demonstrated that mitogenome sequences are effective molecular markers for the study of systematic relationships within Lumbricidae. However, the data covered only 12 species, and the number of acquisitions of each sample was not vast. Furthermore, we did not explore whether RNA fragments might be used for molecular identification, meaning that they remain limited. Despite there being no evidence of speciation in the nuclear markers, large mitochondrial genetic distances have been observed within several Lumbricidae species [35,36]. However, considering that the rate of variation of the mitochondrial genome sequence is higher than that of the nuclear genome sequence, the detection ability of inter-species differentiation is stronger [37] and there is little difference between the clustering of the mitochondrial genome tree and the clustering of the nuclear genome tree [38]. In this paper, we mainly considered the mitochondrial genome, and the nuclear genome was not considered. We will shift our focus to nuclear genomes in future studies. Sequencing the mitogenomes of additional Our results were limited by the restricted taxonomic sampling and/or the low phylogenetic signal of the chosen genes, consistent with similar studies that revealed systematic instability within the family (e.g., many genera do not form monophyletic assemblages) and emphasized the need for classification reflecting evolutionary relationships [34]. Nevertheless, our results complemented the database of mitochondrial genes in Lumbricidae species and demonstrated that mitogenome sequences are effective molecular markers for the study of systematic relationships within Lumbricidae. However, the data covered only 12 species, and the number of acquisitions of each sample was not vast. Furthermore, we did not explore whether RNA fragments might be used for molecular identification, meaning that they remain limited. Despite there being no evidence of speciation in the nuclear markers, large mitochondrial genetic distances have been observed within several Lumbricidae species [35,36]. However, considering that the rate of variation of the mitochondrial genome sequence is higher than that of the nuclear genome sequence, the detection ability of inter-species differentiation is stronger [37] and there is little difference between the clustering of the mitochondrial genome tree and the clustering of the nuclear genome tree [38]. In this paper, we mainly considered the mitochondrial genome, and the nuclear genome was not considered. We will shift our focus to nuclear genomes in future studies. Sequencing the mitogenomes of additional taxa may help to further investigate current controversial taxonomic problems and resolve the higher-level phylogeny within Lumbricidae. The improvement of these deficiencies will be addressed in future research.

Conclusions
In contrast to traditional morphological identification, molecular identification is simple, rapid, and accurate [39][40][41]. In this study, we constructed a total of 30 phylogenetic trees based on two types of datasets to clarify phylogenetic relationships within the Lumbricidae family while finding suitable gene fragments for molecular identification.