Evaluating the Conservation Status of a North-Western Iberian Earthworm (Compostelandrilus cyaneus) with Insight into Its Genetic Diversity and Ecological Preferences

In spite of the high conservation value of soil fauna, the evaluation of their conservation status has usually been neglected. This is more evident for earthworms, one of the most important ecosystem service providers in temperate habitats but rarely the subject of conservation research. These studies have not been developed in Western Europe, which comprises high diversity and several early-branching, relic genera. One potentially menaced representative of this fauna is Compostelandrilus cyaneus; this risk can be assessed by implementing potential distribution modeling and genetic diversity monitoring to their known populations. Genetic barcoding was performed in representatives of four populations (three of them newly sampled) in order to estimate genetic diversity and population genetics parameters. Ensemble species distribution models were built by combining several algorithms and using the five more relevant bioclimatic and soil variables as predictors. A large amount of genetic diversity was found in a small area of less than 20 km2, with populations located in less managed, better-preserved habitats showing higher genetic variability than populations isolated from natural habitats and surrounded by anthropic habitats. Potential distribution appears to be strongly restricted at a regional scale, and suitable habitats within the extent of occurrence appear fragmented and relatively limited. In addition, the main variables determining the ecological niche of C. cyaneus suggests a vulnerability to climate change and increasing soil compaction. Based on this knowledge, this species was assessed as Critically Endangered following the IUCN Red List of Threatened Species criteria, and some potential conservation actions are suggested.


Introduction
The conservation value of soil fauna has historically been disregarded, despite the ecological importance and high diversity of these animals [1,2]. It has been suggested that failure to preserve soil biodiversity could have drastic economic consequences [3] as well as affect ecosystem functioning and diversity [4].
Earthworms (Crassiclitellata, Annelida) constitute a special case within soil fauna in regard to their conservation value. The instrumental value of earthworms encompasses both direct and indirect economic benefits. The indirect economic value of earthworms as one of the most important ecosystem service providers in temperate habitats is widely known [5]: their activity modifies the physico-chemical properties of soil [6] and promotes microbial, animal, and vegetal diversity [7]. This may become even more relevant under global climate change and in the face of global food security risks [8,9]. Their direct economic benefits have less weight but are still important. Although direct consumption of earthworms is limited to some indigenous populations, "earthworm flour" is increasingly used as a protein source in human and animal nutrition [10]. Earthworms are also used as fish bait and in vermicomposting, both of which are economically important [11].
Even though comprehensive, large-scale works have been published on global patterns of earthworm diversity distribution [12], research on their conservation status is strikingly scarce relative to studies on other soil invertebrates (e.g., termites [13], Collembola [14], mites [15], myriapods [16]). Previous research of this type has mainly been carried out in the Balkans, New Zealand, and Australia. Stojanovic et al. [17][18][19] evaluated the rich endemic Lumbricidae fauna of the Balkans Peninsula, aiming to make up for the absence of red lists of earthworms in southeastern European countries. It was done at different scales: species level [17], genus level [18], and country-wide [19]. Going one step beyond scale-wise, Buckley et al. [20] published a systematized evaluation of the conservation status of New Zealand earthworms, establishing a template for further research in other countries and biogeographic regions. Due to its special status as a flagship species, the Giant Gippsland earthworm Megascolides australis has been the subject of particularly detailed conservation research [21,22], leading to a fully-fledged National Recovery Plan [23]. The basic biology, habitat preferences, and population structure of this species are fairly well known [24]. Preliminary genetic diversity assessment has also been conducted [25], although not publicly available.
The native earthworm fauna of Western Europe (belonging to the families Lumbricidae and Hormogastridae) is of great interest from evolutionary and ecological perspectives. The Iberian Peninsula (and Balearic Islands), southern France, Corsica, and Sardinia possess high diversity and endemicity and can be considered earthworm biodiversity hotspots. In addition, these terranes, which constituted a continuous landmass from the Cretaceous to the Oligocene-Miocene [26], have been suggested to be the center of origin of both sister families due to the presence of several early-branching genera [27,28]. Two of these genera, Galiciandrilus Domínguez, Aira, Porto, Díaz Cosín & Pérez-Losada 2017 and Compostelandrilus Domínguez, Aira, Porto, Díaz Cosín & Pérez-Losada 2017, are isolated in the NW Iberian Peninsula but appear to be closely related to Kritodrilus Bouché 1972 and Vindoboscolex Marchán 2021 in a surprisingly disjunct distribution [29]. This suggests that these taxa are relics of a formerly widely distributed lineage whose range may have contracted due to climate changes or replacement by other earthworm genera. The distribution of species of Galiciandrilus and Compostelandrilus appears to be extremely patchy and associated with habitats such as cork oak and holm oak forests, which are not widespread in the NW Iberian Peninsula. To add to the intrinsic conservation value of these species, each has highly divergent morphological characters, stressing the importance of each of them as reservoirs of the past morphological radiation of the aforementioned lineages.
C. cyaneus (Briones and Diaz Cosin, 1993) is the easternmost species of this complex, known from a single location in León (104 km from its closest relative). The surrounding area has been greatly modified by different types of agricultural land use, and the original habitats of this species have been reduced to hills and slopes unsuitable for crops. The species is, therefore, a priority target for conservation evaluation.
Potential distribution modeling/niche characterization [30][31][32] and genetic diversity monitoring [33] have been established as important tools for evaluating the conservation status of vertebrates. Ecological niche modeling has been used successfully on earthworms; in spite of their peculiarities (difficulty of sampling, patchy distributions), MaxEnt [34][35][36][37], Random Forests [38], and other algorithms have achieved high predictive power when using large-scale variables to predict the earthworms' potential distribution. Implementing these approaches to earthworm biodiversity conservation would facilitate the necessary research on this key component of soil fauna.
The aims of this study are (i) to characterize the distribution and niche of C. cyaneus, (ii) to investigate the genetic diversity of known populations of this species, and (iii) to use this information for evaluating the conservation status of the species, establishing a template for other endemic earthworm species.

Specimens and Sampling
Specimens of C. cyaneus were collected in a sampling survey carried out in León, northwestern Spain, in November 2020. León is geologically heterogeneous, with three differentiated units: the Cantabrian mountain range to the north, the Bierzo depression and Galaico-Leonés mountain range to the west, and a Tertiary-Quaternary sedimentary basin in the rest of the province-the Meseta (plateau) [39]. In the Meseta (the main unit), the average elevation is between 700 and 1000 m, and the relief is near-horizontal, with elevated, softly undulated plains interrupted by fluvial valleys [40]. The general climate is Continental Mediterranean, with high thermal amplitude, long winters and irregular precipitations [41].
Thirty-one points were selected along two West-East transects and one North-South transect (connecting the former) in the area between the known populations of Compostelandrilus. Sampling points were spaced as regularly as possible (between 3 and 8 km), and natural-to-seminatural habitats were preferred (Table 1). Habitat type and soil characteristics of the locations where C. cyaneus was found are as follows: The north population (N) was sampled in a land mosaic of pastures with sclerophyllous vegetation and arable land in the close vicinity of the town of Villarmún. Soil characteristics: loam soil, coarse sand: 0%, fine sand: 47 The west population (W) was sampled within bushy pastures in an isolated hillock surrounded by non-irrigated arable land. Soil characteristics: clayey-sandy-loam soil, coarse sand: 0%, fine sand: 56.83%, coarse silt: 5.22%, fine silt: 9.15%, clay: 28.80%, pHH: 8.24, organic matter content: 1.45%, nitrogen content: 0.13%, C/N: 10.94.
Earthworms were collected by digging and hand-sorting and were then rinsed with water and fixed in 96% ethanol. Sampling was qualitative, and a standardized sampling effort of one hour of digging by four researchers was applied to every location. The sampling and handling of specimens followed ethical considerations and conformed to Directive 2010/63/EU. The species were identified from the external and internal morphological characters analyzed by Domínguez et al. [42].

DNA Isolation and Sequencing
After morphological identification, up to 10 (preferentially mature) specimens of each population were chosen for further analysis. Total genomic DNA was extracted from ventral integument samples of approximately 5 mm × 5 mm using the DNeasy Blood & Tissue Kit (Qiagen). Cytochrome oxidase C subunit 1 (COI) was amplified by polymerase chain reaction (PCR), with the primers and conditions described by Pérez-Losada et al. [43]. PCR products were purified and sequenced by the C.A.C.T.I Genomics service (University of Vigo, Vigo, Spain).
DNA sequences obtained in this study are available in Genbank, and the accession numbers are shown in Table 1.

Phylogenetic Analyses and Population Genetics
Sequences from the other species of the genus Compostelandrilus (menciae and bercianus) generated in Domínguez et al. [42] were retrieved from Genbank and used as a reference for the phylogenetic analysis.
Sequences were aligned using MAFFT v.7 [44], with default settings. After trimming, the obtained sequences had a length of 606 base pairs. The best-fit evolutionary model was selected using jModelTest v. 2.1.3 [45] by applying the Akaike information criterion (AIC) [46], Bayesian information criterion (BIC) [47], and the Decision Theory method (DT). GTR + I + G was selected as the best-fit evolutionary model.
Maximum Likelihood phylogenetic inference was performed using RAxML-NG [48] in the CIPRES Science Gateway V. 3.3 platform, from 10 random starting trees and 1000 rapid bootstrap replicates. A suitable starting tree for the time-calibrated phylogenetic inference was generated by using the chronopl function in the R package ape v5.2. to convert the Maximum Likelihood tree into an ultrametric tree by non-parametric rate smoothing (NPRS). A relative calibration (root age = 1) was implemented: this approach provides an approximate visualization of the relative age of the clades without relying on external calibrations or assumptions of vicariance.
A uniform distribution with an initial value = 0.002 (ranging from 0.00005 to 0.02) was specified through the ucld.mean parameter, and a uniform distribution with an initial value = 0.10 (ranging from 0 to 10) was specified for the ucld.stdev parameter. A relative calibration of 1 (mean = 1, standard deviation = 0.05) was implemented as a normal prior for the root of the tree. Fifty million generations were specified for the Monte-Carlo Markov chain, and sampling was conducted every 5000th generation. The log file was visualized in Log Tracer v. 1.7 [49] to check for convergence and effective sampling sizes over 100. The final tree was generated using TreeAnnotator v.1.10.4. [50] with a burn-in of 2000 trees.
All layers were aggregated to the smallest resolution possible (100 m × 100 m).

Variable Selection
In order to select the most suitable predictor variables, a Boosted Regression Tree analysis was performed following a modified version of the script by Irving et al. [54] in R. After identifying the variables of the highest relative importance, correlation analysis was performed, and variables with correlation values above 0.7 were discarded. A total of five variables were chosen for the final dataset: Ecological niche models were obtained using the R package 'SSDM' [55] with standard parameters. Ensemble species distribution models (ESDMs) were built by combining the algorithms ('MAXENT', 'GLM', 'CTA', 'MARS', 'SVM', 'GBM', 'GAM' and 'ANN'), producing kappa values greater than 0.5, with 5 repetitions for each algorithm. Both a presence-absence analysis and a presence-only analysis with randomly generated pseudoabsences were performed and were compared according to the evaluation parameters provided by package 'SSDM' by default (area under the curve-AUC, omission rate, sensitivity, specificity, proportion of correctly predicted occurrences, and Kappa).
Extent of occurrence (EOO) and area of occupancy (AOO) were calculated in QGIS 3.16.3. The area of suitable habitat was determined according to the dominant land use. The latter was obtained from SIGPAC (Agricultural Plot Geographic Information System) for 2012 and 2020 for calculation of potential habitat loss.

Sampled Populations
All of the individuals studied possessed internal and external morphological characters consistent with the original description [56] and with that of Domínguez et al. [42].

Phylogenetic Analyses and Population Genetics
The four populations of C. cyaneus were recovered as a sister clade to C. bercianus and C. menciae by the phylogenetic analysis ( Figure 1). The four populations were not recovered as monophyletic: the N, S, and W populations were nested within paraphyletic population E, divided into three deep lineages. In addition, population W was nested within paraphyletic population S (Figure 1). The estimated relative age for the different lineages within population E was older than the split between the known populations of C. bercianus and C. menciae, while the split between populations N and S showed a similar estimated age to the latter. The split of population W within population S appeared, proportionately, very recent. Uncorrected average pairwise genetic (UAPG) distance based on the COI sequences between the four populations ranged between 1.08% and 5.78% ( Table 2). The lowest value corresponded to the divergence between the W and S populations, while the highest values corresponded to the divergence between the E and the S-W populations. Intrapopulation UAPG divergence ranged between 0.04% and 0.96%. The haplotype network ( Figure 2) displayed complex genetic structures within populations S and E, while they were significantly simpler for populations N and W. Populations E, N, and S were separated by 22, 23, and 15 mutational steps, respectively, while populations S and W were separated by a single mutational step. Haplotypic diversity (H) ( Table 3) was high for populations S and E (0.644-0.822) and low for populations W and N (0.250-0.378). Nucleotide diversity (π) was low for all populations (0.00099-0.00957). The H-π relationships for populations S and E (high H and low π) corresponded to high demographic expansion from small effective populations. For populations W and N, H-π relationships (low-low) matched the expectation for recent bottlenecks or founder effects.
The geographical representation of the predicted suitability values of the presenceabsence model is shown in Figure 3. The predicted highly suitable areas were narrower than in the presence-only model ( Figure S1) and restricted to a narrow southeastern band in the province of León. The relative contributions of the predictor variables to each model are shown in Table  S1. Temperature Seasonality, Precipitation Seasonality, and Soil Bulk Density were the three most influential variables for both models.
The extent of occurrence (EOO) (the polygon delimiting the known populations) covered an area of 17.48 km 2 . The area of suitable habitat, according to the SIGPAC dominant land-use types (pastures, pastures with bushes, pastures with trees; Figure 3c), was 4.2 km 2 in 2012 and 4.5 km 2 in 2020, i.e., 24% and 26% of the EOO. These areas can be considered estimates of the area of occupancy (AOO). There was no estimated loss of suitable habitat in the 2012-2020 period.

Genetic Diversity
The information obtained about the genetic diversity of the known populations of C. cyaneus can be used to infer the long-term viability and connectivity of the populations and their relationship with the environment.
For such a small area (less than 20 km 2 ), known populations of C. cyaneus harbor a large amount of genetic diversity. Populations E and S, located in less managed, better-preserved habitats, showed the highest amount of genetic variability; on the other hand, populations W and N, which were isolated from natural habitats and surrounded by anthropic habitats (nonirrigated arable land), showed significantly lower genetic diversity. Together with population genetic parameters, this suggests that the latter populations colonized their current habitats from the southeastern populations and became isolated by land-use changes and a reduction in suitable habitat. The reduction in genetic diversity may be explained by low connectivity with other populations and lower habitat suitability [57][58][59][60]. If these factors lead to inbreeding depression, the populations may be at a higher risk of extinction [61].
Genetic diversity appeared uncorrelated with the five environmental variables selected for ecological niche modeling. It could have been expected that, at least for the three most influential variables (seasonality of temperature and precipitation and soil bulk density), genetic diversity would be higher for certain preferred values of these variables; yet, such an effect was not observed. It is possible that the geographical distance between the populations is too small for them to display significant environmental heterogeneity. Another possibility is that a single molecular marker is not enough to capture the effect of such an environmental pressure on genetic diversity; landscape genomics analyses (such as in [62]) could help to test this hypothesis.

Ecological Niche and Distribution
The potential distribution of C. cyaneus appears to be strongly restricted at a regional scale, corresponding to roughly 10% of the area of the province of León.
Even without taking land use into account (as Corine Land Cover was excluded in the selection step), suitable habitat within the EOO appears fragmented and relatively limited. Considering the main variables determining the ecological niche of C. cyaneus (seasonality of temperature and precipitation and soil bulk density), climate change and increasing soil compaction may have historically modified the area of suitable habitat for this species and may further modify it in the future. These are two of the main threats to soils and to agriculture sustainability [63,64].
Suitable habitat, as inferred from the preferred types of land use (pastures with or without bushes or trees), also shows a high degree of fragmentation, with large gaps of unfavorable habitat between patches. While preferred habitats do not appear to have decreased in the last 8 years, they are evidently scarce in the area, as arable land and reforested woodland appear to be the dominant types of land use.
Due to the characteristics of their habitat, ecological preferences and the fine-scale distribution of soil fauna are generally poorly known when compared with more conspicuous animals; in fact, soils have been described as the "third biotic frontier" (after oceanic abysses and tropical forest canopies) [65]. Ecological niche modeling provides a useful tool for opening such a black box. The identification of the most relevant variables for the potential distribution of soil species can serve as a starting point for experimental work on the interaction between environmental variables and life-history traits. In addition, delimiting potentially highly suitable habitats should facilitate the rigorous sampling of these animals at a large scale by discarding a priori wide unsuitable areas and focusing the sampling effort on those most likely to contain the target species [31]. This is important in order to make quantitative sampling across the range of an earthworm species viable. Although qualitative sampling with standardized effort is suitable for the evaluation of earthworm populations, quantitative sampling should be the preferred choice: it unlocks information such as abundance and population density, which can be compared between populations and correlated to habitat characteristics and large-scale environmental variables [66].

Conservation Status Evaluation for C. cyaneus
The following assessment roughly follows the structure and format of the IUCN Red List of Threatened Species. Country occurrence: Native: Spain

Distribution Map
See maps in Figures 2 and 3.

Population
Four populations of this species have been discovered. Current population trend: unknown. Population genetics suggest that two of the known populations display low genetic diversity and may have suffered recent bottlenecks.

Habitat and Ecology
The species has been found in natural pastures with sclerophyllous bushes and/or trees (Quercus ilex rotundifolia), usually restricted to isolated hillocks and slopes when the habitat is surrounded by arable land. It has not been found in reforested woods composed of pines or other non-native, commercial species. Other earthworm species are rarely found within the same patches as C. cyaneus, the most frequent being Aporrectodea trapezoides (Dugès, 1828) and Aporrectodea rosea (Savigny, 1826).
C. cyaneus is a large endogeic earthworm (Figure 4a). This ecological category is usually associated with poor active dispersal abilities (as these earthworms move slowly and rarely crawl over the surface) and k-like life-history strategies (long time to reach maturity, low number of offspring, high longevity). C cyaneus deposits a large volume of casts on the soil surface: this is very likely to modify the resistance of soils to erosion by surface runoff. The casts appear to be later colonized by unidentified lichens (Figure 4b).

Systems: terrestrial Use and trade
The species is not utilized in any known form.

Threats
The small area inhabited by this species has been greatly modified by agricultural land use, and populations are usually immersed in mosaics of natural and anthropogenic habitats. An increase in the area dedicated to crops or reforested woods would reduce the suitable habitat for these earthworms, further reduce the connectivity between populations, and increase putative competition with cosmopolitan earthworms.

Conservation Actions
No conservation actions are currently in place for this species as the populations are not included within any protected areas. Further sampling within and around the EOO should be conducted to confirm the extent of the area and to provide further information about the preferred habitats of the species. A citizen science program (possibly including local farmers) could be developed, owing to the easy recognition of the surface casts of this species (Figure 4c). This would enable the monitoring of populations. A plan limiting the establishment of non-native tree plantations and arable land within the habitats of the known populations and encouraging the creation of corridors of pastures with sclerophyllous bushes and trees between the known populations should be considered priority actions by the relevant authorities.

Implications for Earthworm Conservation
Even though different earthworm species show significant differences in ecological preferences [67] and their response to habitat alteration and other human-mediated impacts [68], some implications about potential threats and possible conservation actions could be extrapolated to related or ecologically similar earthworm taxa.
The preservation of patches of original, unmodified habitat appears to be key to narrowly distributed, endogeic earthworms such as C. cyaneus as they act as genetic diversity reservoirs from which the recolonization of surrounding habitats could be eventually achieved. To avoid isolation between those patches and recently established populations that are in genetic drift, with reduced fitness and eventual extinction, connectivity between populations should be encouraged. For that, corridors of native vegetation should be preserved or restored. The importance of hedgerows and grass field margins for invertebrate diversity [69] and soil functioning [70] is well known, but their putative function as genetic flow enablers could give them additional value for soil fauna conservation. The lower bulk density in relation to other land uses [70] should make them highly suitable habitats for earthworm species that share similar ecological preferences to C. cyaneus.
Even though native, endemic species can be occasionally found in unfavorable land uses such as arable lands, they are usually outcompeted by exotic or cosmopolitan earthworms in such highly-productive soils [71], hindering their recolonization by the former. However, in soils with lower availability of resources, native earthworms may prevent cosmopolitan earthworms from further expanding their range [71]. This dynamic further highlights the importance of land-use mosaics, including grass field margins and hedgerows.

Conclusions
Genetic diversity estimation based on COI molecular barcoding and ecological niche modeling based on large-scale and soil predictor variables has provided useful insight into the conservation status of a narrowly distributed, endemic earthworm from the highly diverse Iberian Peninsula. The more vulnerable populations were identified based on low genetic diversity and habitat isolation. In addition, threats for its populations were inferred from the most influential environmental variables. This approach appears promising in terms of facilitating research on earthworm conservation evaluation and compensating for the lack of attention to this key element of soil ecosystems in conservation biology.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/genes13020337/s1, Table S1: Predictive model parameters and relative contributions of the predictor variables. AUC: area under the curve, prop.correct: proportion of correctly predicted occurrences, TEMPSEA: temperature seasonality, PRSEA: precipitation seasonality, MTWQ: mean temperature of wettest quarter, SOILW: soil water content, SOILBD: soil bulk density. Figure   Data Availability Statement: The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. Molecular barcoding sequences are available in Genbank under accession numbers MZ614758-MZ614795.