Perspectives in Earthworm Molecular Phylogeny: Recent Advances in Lumbricoidea and Standing Questions

Abstract: Earthworm systematics have been limited by the small number of taxonomically informative morphological characters and high levels of homoplasy in this group. However, molecular phylogenetic techniques have yielded significant improvements in earthworm taxonomy in the last 15 years. Several different approaches based on the use of different molecular markers, sequencing techniques, and compromises between specimen/taxon coverage and phylogenetic information have recently emerged (DNA barcoding, multigene phylogenetics, mitochondrial genome analysis, transcriptome analysis, targeted enrichment methods, and reduced representation techniques), providing solutions to different evolutionary questions regarding European earthworms. Molecular phylogenetics have led to significant advances being made in Lumbricidae systematics, such as the redefinition or discovery of new genera (Galiciandrilus, Compostelandrilus, Vindoboscolex, Castellodrilus), delimitation and revision of previously existing genera (Kritodrilus, Eophila, Zophoscolex, Bimastos), and changes to the status of subspecific taxa (such as the Allolobophora chaetophora complex). These approaches have enabled the identification of problems that can be resolved by molecular phylogenetics, including the revision of Aporrectodea, Allolobophora, Helodrilus, and Dendrobaena, as well as the examination of small taxa such as Perelia, Eumenescolex, and Iberoscolex. Similar advances have been made with the family Hormogastridae, in which integrative systematics have contributed to the description of several new species, including the delimitation of (formerly) cryptic species. At the family level, integrative systematics have provided a new genus system that better reflects the diversity and biogeography of these earthworms, and phylogenetic comparative methods provide insight into earthworm macroevolution.
Despite these achievements, further research should be performed on the Tyrrhenian cryptic complexes, which are of special eco-evolutionary interest. These examples highlight the potential value of applying molecular phylogenetic techniques to other earthworm families, which are very diverse and occupy different terrestrial habitats across the world. The systematic implementation of such approaches should be encouraged among the different expert groups worldwide, with emphasis on collaboration and cooperation.

Author Contributions: Conceptualization, D.F.M., T.D., J.D. and M.N.; investigation, D.F.M., T.D., J.D. and M.N.; resources, D.F.M., T.D., J.D. and M.N.; writing (original draft preparation), D.F.M.; writing (review and editing), D.F.M., T.D., J.D. and M.N.; supervision, T.D., J.D. and M.N.; project administration, D.F.M., T.D., J.D. and M.N.; funding acquisition, D.F.M., T.D., J.D. and M.N. All authors have read and agreed to the published version of the manuscript.


Introduction
Earthworm systematics have been hampered since their inception by the limited number of morphological characters offered by the soft-bodied and conserved body plan of these organisms, many of which are symplesiomorphic or homoplasious [1]. This is reflected in an unstable taxonomy, even at higher taxonomic levels. For instance, at least seven different genus systems were proposed for Lumbricidae throughout the 20th century [2][3][4][5][6][7][8], and the family system for Megascolecoidea is still subject to change [9].
Although COI barcoding is a useful tool, it has been found to be limited for resolving phylogenetic relationships above the species level [18]. Other mitochondrial (such as COII, 16S, 12S, ND1) and nuclear molecular markers (such as 28S or 18S) showed similar limitations when used individually [19,20]. Their combination in multigene or multilocus phylogenetic analyses (Figure 1) has improved the ability of phylogenetic analysis to recover closely related species within monophyletic clades [21][22][23], even revealing some relationships above the genus (or even family) level. Nonetheless, in some cases of ancient divergence or rapid radiation, this approach has been shown to lack power.
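The combination of individual markers into a multigene data set can be illustrated with a minimal sketch. It assumes that each marker has already been aligned separately; the function name, the toy taxa, and the toy sequences are hypothetical and are not taken from any of the cited studies. Taxa lacking a marker are padded with missing-data symbols, and the partition boundaries are recorded so that each marker could later receive its own substitution model.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    genes: Dict[str, Dict[str, str]], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-marker alignments into one supermatrix.

    `genes` maps marker name -> {taxon: aligned sequence}.
    Returns the supermatrix and 1-based (marker, start, end) partitions.
    """
    # Every taxon present in at least one marker gets a row in the supermatrix.
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    partitions = []
    start = 1
    for name, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {name} is not aligned (unequal lengths)")
        length = lengths.pop()
        for taxon in taxa:
            # Pad taxa that were not sequenced for this marker.
            supermatrix[taxon] += aln.get(taxon, missing * length)
        partitions.append((name, start, start + length - 1))
        start += length
    return supermatrix, partitions


# Hypothetical toy data: three taxa, one of them lacking the 28S marker.
toy = {
    "COI": {"sp_A": "ATGC", "sp_B": "ATGA", "sp_C": "ATGG"},
    "28S": {"sp_A": "GGC", "sp_B": "GGT"},
}
matrix, parts = concatenate_alignments(toy)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(parts)
```

Recording the partitions rather than only the concatenated string keeps open the option of analysing each marker under a separate model, which is the usual reason for preferring a partitioned supermatrix over a single unpartitioned alignment.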
As a source of several molecular markers with the added value of establishing gene order, mitochondrial genome analysis represents an interesting alternative to multigene phylogenetic analysis [24,25]. Despite its advantages, such as increased resolution above the species level owing to the larger number of markers and rare rearrangements, mitochondrial genome analysis overlooks nuclear information, which could constitute a problem in cases of incomplete lineage sorting or heteroplasmy [26,27].
The obvious solution to the limitations presented by these methods would lie in representing the whole genome of a species within the phylogenetic analyses. As whole-genome sequencing (WGS) is unaffordable in several phylogenetic applications (particularly when wide taxon coverage is required), transcriptomics appears to be the best alternative. Two studies have demonstrated the potential application of transcriptomics to earthworm molecular phylogenetics [28,29]; phylogenetic resolution at the family level and above was significantly increased, but at the cost of generating very complex bioinformatic pipelines for selecting the most informative orthologous genes. In addition, genes evolving at different rates can potentially support different topologies [28]. Another possible disadvantage of this methodology is the requirement for freshly preserved specimens, as well as the need for special preservation protocols and more careful handling of RNA (relative to DNA) during the process.
Some of the most recent additions to the molecular phylogenetics toolbox include targeted enrichment methods, namely ultra-conserved elements (UCE [30]) and anchored hybrid enrichment (AHE [31]) (Figure 1). These techniques rely on previously existing genomes as the starting point for loci selection and probe design, chosen based on conservation and uniqueness through a sliding window approach [31]; this enables the capture of a consistent marker data set for all the taxa studied, while avoiding phylogenetically misleading parts of the genome (e.g., paralogs and pseudogenes) [32]. These techniques are an efficient, inexpensive way of sequencing hundreds (e.g., 609 in [27]) of orthologs, which in the case of AHE provide phylogenetic signal at both deep and shallow scale analyses [31]. They have displayed the ability to resolve problematic nodes in other animals [33,34]. The latter technique has been successfully applied to clitellates [35] and earthworms [27] and is currently being implemented in the Lumbricidae (Rafinesque-Schmaltz, 1815) by D.F. Marchán and collaborators.
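The idea of scanning reference genomes with a sliding window to find conserved candidate loci can be sketched as follows. This is only an illustration of the conservation criterion: the scoring rule (frequency of the most common base per column), the window width, and the threshold are assumptions made here, and the uniqueness filter used in actual probe design [31] is not modelled.

```python
from collections import Counter
from typing import List, Tuple


def column_conservation(alignment: List[str]) -> List[float]:
    """Fraction of sequences sharing the most common base, per column."""
    return [
        Counter(column).most_common(1)[0][1] / len(column)
        for column in zip(*alignment)
    ]


def conserved_windows(
    alignment: List[str], width: int, threshold: float
) -> List[Tuple[int, float]]:
    """Return (start, mean conservation) for windows meeting the threshold."""
    scores = column_conservation(alignment)
    hits = []
    for start in range(len(scores) - width + 1):
        window_mean = sum(scores[start:start + width]) / width
        if window_mean >= threshold:
            hits.append((start, window_mean))
    return hits


# Hypothetical alignment of four reference genomes over a 10 bp region:
# columns 0-3 and 5-7 are invariant, column 4 is 75% conserved,
# and columns 8-9 are fully variable.
toy = ["ACGTACGTAA", "ACGTACGTCC", "ACGTACGTGG", "ACGTTCGTTT"]
strict_hits = conserved_windows(toy, width=4, threshold=1.0)
relaxed_hits = conserved_windows(toy, width=4, threshold=0.9)
print(strict_hits)
print([start for start, _ in relaxed_hits])
```

Lowering the threshold admits windows that overlap the partly variable column, which mirrors the practical trade-off between probe conservation and the amount of flanking variation captured.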
All the aforementioned phylogenomic approaches are suitable for reconstructing phylogenetic relationships above the species level, but owing to the cost and the level of genetic variability they capture, they are not suitable for phylogeographic studies, in which several representatives per locality are required. Reduced representation techniques such as RAD-seq [36] and GBS [37] have been developed for this purpose (Figure 1). Both approaches share the same methodological basis, in which restriction enzymes are used to cut the genomic DNA, and the resulting fragments are sorted by size and sequenced. After bioinformatic treatment, these yield thousands of single nucleotide polymorphisms (SNPs), which are suitable for population genetics, selection signature analysis and cryptic lineage delimitation. For some examples of these approaches in earthworms, see [38][39][40][41].
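The shared basis of these reduced representation techniques (cutting with a restriction enzyme, then keeping fragments within a size range) can be mimicked in silico. The sketch below is a simplification under stated assumptions: it treats a single DNA strand, uses the EcoRI recognition site GAATTC (cut after the first G) purely as an example enzyme, and the toy sequence and size limits are hypothetical rather than taken from the cited protocols.

```python
from typing import List


def digest(sequence: str, site: str, cut_offset: int) -> List[str]:
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the position within the recognition site where the
    strand is cut (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    previous_cut = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[previous_cut:cut])
        previous_cut = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[previous_cut:])
    return fragments


def size_select(fragments: List[str], min_len: int, max_len: int) -> List[str]:
    """Keep only fragments whose length lies within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Hypothetical toy genome containing two EcoRI sites.
toy_genome = "AAA" + "GAATTC" + "C" * 7 + "GAATTC" + "TT"
fragments = digest(toy_genome, site="GAATTC", cut_offset=1)
selected = size_select(fragments, min_len=5, max_len=10)
print(fragments)
print(selected)
```

Because the same enzyme cuts at the same sites in every individual, the size-selected fragments represent a repeatable subset of the genome, which is what makes the resulting SNPs comparable across many specimens per locality.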
Time-calibrated phylogenies can provide valuable information about divergence times and can be used to test hypotheses about historical biogeography and environment-trait evolution correlations. Currently, Bayesian methods, such as the one implemented in BEAST [42], allow a wide range of substitution models (which can be different and independent for different sets of sites), flexible model specification, and choice of priors on parameters. Thus, these methods allow the estimation of time-calibrated trees that are better adjusted to the molecular data analyzed and the calibrations provided. In the case of earthworms, calibration is the weak point of divergence time estimation; as earthworms are soft-bodied invertebrates, no direct body fossils exist that could be used for calibration. Trace fossils (such as cocoons and galleries) and closely related annelid fossils have been implemented as a compromise [43], but the vast temporal scale and deep genetic divergence between the outgroups and the ingroup resulted in wide confidence intervals, which require cautious interpretation of results. Paleogeographic events and their correlation with splits between sister taxa have been implemented as an alternative to fossils [44,45], yet this approach has been criticized as relying on the assumption of vicariance; divergence between taxa being older than the paleogeographic event cannot be ruled out, resulting in divergence time estimates that are in practice only minimum estimates. External substitution rates (obtained from previous analyses) should be used with caution, as substitution rates for the same gene change between taxa even within the same family [22]; they often come from vicariance-based analyses [46], but they have also been obtained from more robust fossil-based analyses [47,48].
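The sensitivity of divergence times to a borrowed substitution rate can be shown with a deliberately simple strict-clock calculation. This is not the relaxed-clock Bayesian approach implemented in BEAST; it only assumes that two sister lineages accumulate substitutions at the same constant rate, so that age = distance / (2 × rate). The distance and rate values are hypothetical and chosen for illustration only.

```python
def strict_clock_age(pairwise_distance: float, rate_per_lineage: float) -> float:
    """Divergence time under a strict clock.

    pairwise_distance: substitutions per site between two sister taxa.
    rate_per_lineage: substitutions per site per lineage per million years.
    The factor 2 accounts for change accumulating along both lineages.
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2.0 * rate_per_lineage)


# Hypothetical values: one observed distance, three candidate external rates.
distance = 0.10
candidate_rates = (0.005, 0.01, 0.02)
ages = {rate: strict_clock_age(distance, rate) for rate in candidate_rates}
for rate, age in ages.items():
    print(f"rate={rate} subs/site/Myr -> age={age:.1f} Myr")
```

Under these assumptions the estimated age scales inversely with the rate, so halving or doubling an externally obtained rate doubles or halves the inferred divergence time, which is why rates transferred between taxa call for caution.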
Besides the limitation of calibration sources, the choice of molecular technique can improve divergence time estimation; genomes, transcriptomes, and targeted enrichment methods provide hundreds or thousands of loci among which it is possible to choose the most suitable ones (clock-like or tree-like [49]).
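One way to picture the selection of clock-like loci is to rank them by how evenly their root-to-tip distances are distributed across taxa. The sketch below uses the coefficient of variation of root-to-tip distances as a proxy for clock-likeness; this criterion, the function names, and the toy values are assumptions made for illustration and are not necessarily the measure used in [49].

```python
from statistics import mean, pstdev
from typing import Dict, List


def root_to_tip_cv(distances: Dict[str, float]) -> float:
    """Coefficient of variation of root-to-tip distances for one locus.

    A value near zero means all tips are roughly equidistant from the
    root, as expected if the locus evolves in a clock-like manner.
    """
    values = list(distances.values())
    return pstdev(values) / mean(values)


def rank_clocklike(loci: Dict[str, Dict[str, float]]) -> List[str]:
    """Order loci from most to least clock-like (ascending CV)."""
    return sorted(loci, key=lambda name: root_to_tip_cv(loci[name]))


# Hypothetical root-to-tip distances (substitutions per site) per locus.
toy_loci = {
    "locus_2": {"sp_A": 0.05, "sp_B": 0.10, "sp_C": 0.30},
    "locus_1": {"sp_A": 0.10, "sp_B": 0.10, "sp_C": 0.10},
}
ranking = rank_clocklike(toy_loci)
print(ranking)
```

With hundreds or thousands of loci available, keeping only the top of such a ranking is a simple way to trade data volume for a better fit to the clock assumptions of the dating analysis.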
Within this framework, other rare or endemic genera which were absent or underrepresented in the aforementioned study have gradually been added in recent molecular phylogenetic studies (Figure 2). For example, [23] These three genera have been recently found to be related to Kritodrilus (Bouché 1972) (whose type species was included in molecular analyses for the first time (Marchán et al. 2021a)) and surprisingly, to the newly described Central European genus Vindoboscolex [48]. The unlikely phylogenetic relationship of those isolated genera has provided some new insight into the early evolution of the Lumbricidae; an early branching clade would have occupied a wide area ranging from Galicia to Hungary and, subsequently, been fragmented into relict, geographically restricted clades.
The inclusion of generotypes (type species of a genus) in molecular phylogenetic analyses provided other significant advances (Figure 2), such as restricting the genus Eophila (Rosa 1893) to the closest relatives of Eophila tellinii (Rosa 1888) (Eophila gestroi (Cognetti de Martiis 1905) and Eophila crodabepis (Paoletti et al. 2016)), which ended the history of the genus as a taxonomic wastebasket [47]. Similarly, the inclusion of Zophoscolex atlanticus (Bouché 1969) together with several other species previously attributed to Zophoscolex (Qiu and Bouché 1998) restricted this genus to the French species, while the Iberian species were found to constitute the separate genus Castellodrilus [51].
Ref. [50] demonstrated the close relationship between the poorly known French species Allolobophora chaetophora (Bouché 1972) and Helodrilus cortezi (Qiu and Bouché 1998), which formed a well-supported clade separated from other representatives of their (previously considered) congeneric species. Avelona ligra (Bouché 1969) was shown to be the sister species of All. chaetophora by [29], but with a very restricted number of Lumbricidae taxa included in the analysis. Ref. [52] included representatives of most of the previously described subspecies of All. chaetophora, He. musicus (Qiu and Bouché 1998) and Av. ligra in a multigene phylogenetic analysis, showing that He. cortezi, He. musicus, and All. chaetophora constitute a genus-level clade, Gatesona, which also includes the former subspecies of All. chaetophora now elevated to the species level (Figure 2). At the same time, Gatesona was retrieved as the sister clade to Avelona; together they constitute a French lineage with most of its diversity restricted to the Massif Central.

Based on wide taxonomic sampling and multigene phylogenetic analysis, [53] recovered the cosmopolitan genera Dendrodrilus (Omodeo 1956) and Allolobophoridella (Mrsic 1990) within the same clade as the North American endemic species of Bimastos, indicating the synonymy of the former with the latter and their ancestral origin in North America together with their sister taxon Eisenoides (Gates 1969). In addition, the morphologically similar Healyella (Omodeo and Rota 1989) and Spermophorodrilus (Bouché 1975) were found to be phylogenetically unrelated and nested within a Dendrobaena sensu lato clade.

Remaining Questions
All these advances stress both the suitability of molecular phylogenetic approaches and the need for their application to the remaining systematic questions within the family.
Among the most glaring systematic issues are the taxonomic wastebaskets identified by [50] as non-monophyletic.
The first of these is Aporrectodea, which includes several of the most widespread and common lumbricids, such as Aporrectodea caliginosa (Savigny 1826), Aporrectodea trapezoides (Dugès 1828), and Aporrectodea rosea (Savigny 1826). Ap. trapezoides (the generotype), Ap. caliginosa, and other species included by Bouché (1972) within Nicodrilus (Bouché 1972) appeared to form a well-supported lineage [21] clearly unrelated to Ap. rosea, Ap. georgii (Michaelsen 1890), or Ap. jassyensis (Michaelsen 1891) (Figure 2), all of which had been assigned to the genus Koinodrilus (Qiu and Bouché 1998). However, the latter species behave as rogue taxa in phylogenetic analyses; this means that their positions in phylogenetic trees are unstable and cannot be resolved with certainty. The addition of further representatives of Koinodrilus could allow the phylogenetic relationships of these taxa to stabilize; alternatively, the application of other molecular phylogenetic techniques with greater resolution at deeper nodes, such as AHE, could provide a solution to this problem. As the Aporrectodea species unrelated to Ap. trapezoides (including, but not restricted to, Koinodrilus) cannot be considered to belong to a monophyletic Aporrectodea, they must be placed within newly defined genera once their phylogenetic relationships are finally resolved.
Allolobophora represents a similar case. The type species, Allolobophora chlorotica (Savigny 1826), appears to form a monophyletic clade with other green-pigmented lumbricids such as Allolobophora dubiosa (Orley 1881) and Allolobophora molleri (Rosa 1889) [50]. This clade appears to be unrelated to several Carpatho-Balkanic species (such as Allolobophora mehadiensis (Rosa 1895), Allolobophora robusta (Rosa 1895), and Allolobophora sturanyi dacica (Pop 1938)), which have previously been assigned to different genera (such as Serbiona and Karpatodinariona). Interestingly, the latter species appeared to be more closely related to Cernosvitovia [50] (Figure 2). The inclusion of more representatives of those Carpatho-Balkanic clades will be necessary to confirm the hypothesis that these "Allolobophora" forms belong to a redefined Cernosvitovia.
Some representatives of both Aporrectodea and Allolobophora present an additional challenge: deep species-level lineages or cryptic species have been identified within Ap. trapezoides and Ap. caliginosa [21,54], Ap. rosea [55] and All. chlorotica [13,56]. In these cases, although the molecular phylogenetic evidence already exists, the taxonomy is lagging. The difficulty in describing morphologically cryptic lineages as species may explain why such work has not already been performed (but see [40,57] for alternative approaches to this task), in addition to the difficulty of including holotypes (or topotypes) of previously described taxa in order to assign valid names to these genetic lineages. Although daunting, this task must be undertaken; the aforementioned species are amongst the most abundant in anthropogenic habitats, such as crops and orchards, and all agroecological, ecotoxicological, and applied research targeting them should be accurately assigned to the relevant species, not to a loosely related complex of genetic lineages.
The same problem affects the genera Helodrilus and Dendrobaena. Some representatives of both have been included in molecular phylogenetic analyses [50,53,58,59], revealing that they consist of several unrelated genus-level clades (Figure 2). Although generotypes have been included in both cases, further representatives of their wide taxonomic diversity in Eastern Europe remain to be studied, and the molecular markers must be standardized in order to combine the fragmentary data. In the case of Dendrobaena, a name has already been proposed for species more closely related to Dendrobaena byblica (Rosa 1893) than to Dendrobaena octaedra (Savigny 1826): Omodeoia (Kvavadze 1994). Hence, if other species assigned to Omodeoia were recovered together with D. byblica in a monophyletic clade, there would be molecular support for division of the Dendrobaena species complex into Dendrobaena and Omodeoia. Interestingly, the newly described genus Phylomontanus (Bozorgui et al. 2019) appears to be closely related to D. byblica (Figure 2); a more comprehensive molecular phylogenetics study would enable testing whether this constitutes a junior synonym of Omodeoia or if it is a third lineage of "Dendrobaena-like" earthworms. In the case of Helodrilus, no replacement name appears to be available for species not closely related to Helodrilus oculatus (Hoffmeister 1895) (the type species). Thus, new genus names will need to be proposed for the other unrelated clades.
In addition to the most conspicuous genera, other smaller ones still pose important challenges for molecular phylogenetics. For example, Perelia (Easton 1983) includes several species of the Middle East and Central Asia, with a few Eastern Europe representatives. The phylogenetic relationships of these species between each other and with other lumbricid genera are currently unknown; in addition, one species was proposed to belong to the new genus Rhiphaeodrilus (Csuzdi and Pavlicek 2005), based on nephridia morphology. Incorporation of these species in a molecular phylogenetic framework would not only be helpful for the systematic revision of the Lumbricidae, but also for historical biogeographic reconstructions, as very few endemic representatives from beyond Eastern Europe are currently available. Eumenescolex (Qiu and Bouché 1998) is a poorly known genus of Western Mediterranean earthworms, with a strongly disjunct distribution (France, Italy, Spain), which is otherwise consistent with the geological history of the region. Members of the genus have been suggested to be related to Scherotheca [60], but they differ strikingly in body size and lifestyle from most species, other than the Corsican endemics. Molecular phylogenetics data on these elusive earthworms could provide more information on the intriguing role of Corsica, Sardinia, and Southern France in the evolution of lumbricids before they drifted apart in the Oligocene [61]. Scherotheca itself is a diverse genus, for which relatively little molecular phylogenetics data is available (but see [60] for four additional representatives, including a newly described one). In this case, molecular phylogenetics would serve as a starting point for determining their ancestral range (Spain, mainland France, or Corsica) and morphological radiation towards a giant anecic phenotype. 
The case of Eiseniona (Omodeo 1956) and Iberoscolex (Qiu and Bouché 1998) is also worthy of mention: several Iberian species were described as belonging to the Eastern European genus Eiseniona (which has also been considered a synonym of Aporrectodea), but they were assigned by Qiu and Bouché (1998) to the genus Iberoscolex. The addition of the generotypes Iberoscolex microepigeus (Qiu and Bouché 1998) and Eiseniona handlirschi (Rosa 1897) to the representatives previously included in molecular phylogenetics analyses (Domínguez et al. 2015) will enable a statement to be made regarding the validity of these controversial genera [62].

Advances
The application of molecular phylogenetics to Hormogastridae (Michaelsen 1900), a comparatively small but well-represented family in the Western Mediterranean, led to a drastic change in its systematics.
Integrative systematics (combining molecular phylogenetics and new morphological characters) led to the description of nine new species in six years [63][64][65][66], which constitutes an increase of 43%. Furthermore, molecular phylogenetic approaches uncovered high levels of cryptic speciation within the Hormogastridae [15]; the geographically restricted Hormogaster elisae (Álvarez 1977) complex (recently redescribed as the genus Carpetania Marchán et al. 2018) became an ideal model for studying the phylogeography of cryptic lineages [43] and the evolutionary processes involved in their diversification [39], culminating in the integrative description of the three component species [40] (Figure 3).
The reconstruction of an explicit phylogenetic framework through multigene methods enabled the application of the phylogenetic comparative method (rarely used in invertebrates) to study macroevolutionary patterns in Hormogastridae; in this way, the origin and radiation of a key evolutionary innovation (the multilamellar typhlosole) was linked to increasing body weight, with soil characteristics as secondary evolutionary pressures [67].
The systematics of the family were revealed to be in dire need of revision by Novo et al. [22,44], as the genus Hormogaster (Rosa 1887) appeared to include at least four independent, genus-level clades (later described as Hormogaster, Boucheona, and other genera). Transcriptomics analysis led to the inclusion of the highly divergent genera Ailoscolex (Bouché 1969) (previously thought to belong to a different family) and Hemigastrodrilus (Bouché 1970) as the earliest branching clades of the family [28]. However, it was the integration of morphological characters (including new ones [68]) and molecular markers that finally led to the proposal of a revised genus system for Hormogastridae [69] (Figure 3). The new systematics of this family revealed a more complex outline of the diversity and biogeography of its genera, opening the door to future evolutionary and ecological research.

Figure 3. Bayesian inference of the phylogenetic relationships of Hormogastridae based on a combined morphological and molecular matrix (concatenated sequence of COI-16S-tRNAs-28S-H3 markers) [66,69]. The inset shows the species delimitation of the Carpetania elisae complex based on a maximum likelihood analysis of the SNP dataset obtained with GBS (modified from [39,40]).

Remaining Questions
Ref. [70] revealed two cryptic lineage complexes within the Tyrrhenian Hormogaster redii (Rosa 1887) and Hormogaster samnitica (Cognetti de Martiis, 1914); in a similar way to the former Hormogaster elisae complex, these species-level genetic lineages could not be easily delimited using multigene phylogenetic inference (Figure 3). The application of reduced representation techniques such as GBS and RAD-seq, together with increased sampling effort across their ranges, would probably enable integrative species delimitation and description within those complexes. This is not a minor issue; with some of the widest ranges among Hormogastridae and occupying a diverse range of habitats, H. redii and H. samnitica appear to possess adaptive and colonizing potential far beyond their hormogastrid kin. More precise knowledge of their systematics will enable these species to be used as evolutionary models to apply molecular evolution approaches, such as those implemented by [66,71].

Remaining Questions of Lumbricoidea
As outlined above, extensive advances have been made in knowledge and understanding of the main families of Lumbricoidea (Lumbricidae and Hormogastridae). The two other, most basal families (Lutodrilidae and Criodrilidae) are both much less diverse (with one and three species, respectively) and less explored by molecular phylogenetics approaches. Yet, their phylogenetic position and distribution (Nearctic and Palearctic, respectively) make them important for understanding the origin and paleobiogeography of this earthworm superfamily. It is well known that earthworm evolution closely reflects paleogeographic events [29,72], and the opening of the Atlantic Ocean would be expected to have determined the first stages of their diversification. Likewise, the formation of the Mediterranean Sea and the uplift of the Pyrenees and the Alps appear to have been key for the diversification of Hormogastridae and Lumbricidae. The origin of both can be found in the terrane that consisted of the Northeastern Iberian Peninsula, Southern France, Corsica, and Sardinia [48,70]. Their later eastward expansion, which led to the vast diversification of genera such as Dendrobaena, Eisenia, or Octodrilus, as well as the colonization of North America by the genera Eisenoides (Gates 1969) and Bimastos, are comparatively poorly understood. In order to complete this relevant part of the evolutionary puzzle of Lumbricoidea, more accurate paleogeographic reconstructions of the Tertiary period and more complete sampling of the eastern Eurasian genera (including Perelia) are necessary.

Other Families: Remaining Questions
While important work has been conducted in other earthworm families (for example Rhinodrilidae (Benham 1890) [27,73], Megascolecidae (Rosa 1891) [41,45,74,75]), several families have received very little attention relative to the large percentage of earthworm diversity and occupied land masses they represent. Africa, South America, and North America display the strongest deficit of molecular phylogenetic research. Although molecular phylogenetic studies exist for most of those families, they often constitute isolated and unconnected attempts to answer very specific evolutionary questions.
Far from intending to diminish the value of those contributions, this work is meant to encourage earthworm experts to systematically apply available molecular phylogenetic tools in order to establish robust genus systems, spatio-temporal evolutionary frameworks (through biogeographical reconstructions and time-calibrated phylogenies), cryptic diversity assessment, and comparative phylogeographies. In cases where molecular biology facilities are not available, international collaboration should provide satisfactory solutions. However, the most advanced molecular phylogenetic tools will be moot if there are no taxonomists with expertise in the target taxa to guide them; the decline in the number of earthworm taxonomists in the last few decades is an alarming problem that affects all continents alike. Thus, the priority for active earthworm taxonomists should be to train a younger generation of researchers who could integrate knowledge of the intricate systematics of earthworms with the use of the rich molecular toolbox.

Conflicts of Interest:
The authors declare no conflict of interest.