In or Out of the Checklist? DNA Barcoding and Distribution Modelling Unveil a New Species of Crocidura Shrew for Italy

The genus Crocidura (Eulipotyphla, Soricidae) is the most speciose genus amongst mammals, i.e., it includes the highest number of species. Different species are distinguished by skull morphology, which often prevents the identification of individuals in the field and limits research on these species’ ecology and biology. We combined species distribution models and molecular analyses to assess the distribution of cryptic Crocidura shrews in Italy, confirming the occurrence of the greater white-toothed shrew Crocidura russula in the northwest of the country. The molecular identification ascertained the species’ presence in two distinct Italian regions. Accordingly, species distribution modelling highlighted the occurrence of areas suitable for C. russula in the westernmost part of northern Italy. Our results confirm the role of Italy as a mammal hotspot in the Mediterranean; additionally, they also show the need to include C. russula in Italian faunal checklists. To conclude, we highlight the usefulness of combining different approaches to explore the presence of cryptic species outside their known ranges. Since the similar, smaller C. suaveolens may be displaced by the larger C. russula through competitive exclusion, the latter might be the species actually present where C. suaveolens had been reported previously. A comprehensive and detailed survey is therefore required to assess the current distribution of these species.


Introduction
Field identification at the species level may be problematic and unreliable for rodents and other small mammals, especially when species belonging to the same genus share similar morphology Table 1. Italian Crocidura specimens analysed in our work, including those resembling C. russula. Specimens of C. suaveolens, C. leucodon and C. pachyura were included for comparison with C. russula. Specimen details and GenBank accession numbers are reported for each sample. a.n., accession number. provided the first evidence of the presence of C. russula on the Italian territory and reviewed, on such a basis, the currently known distribution of Crocidura shrews occurring in Italy. To conclude, using species distribution modelling, we provide evidence for the potential occurrence of C. russula in Italy and for areas of potential overlap with C. suaveolens.

DNA Barcoding
Four individuals showing morphological features that match those reported for C. russula [20] were killed by domestic cats in northwest Italy, where the presence of this species was suspected [21]. Specimens of other Italian species (C. suaveolens, C. leucodon and C. pachyura) were accidentally found dead by the authors or other collaborators (see Acknowledgements: Table 1; Figure 1). Table 1. Italian Crocidura specimens analysed in our work, including those resembling C. russula. Specimens of C. suaveolens, C. leucodon and C. pachyura were included for comparison with C. russula. Specimen details and GenBank accession numbers are reported for each sample. a.n., accession number.  Total DNA was extracted from the Crocidura tail samples collected in Italy (Table 1) by using the DNeasy Blood & Tissue Kit (Qiagen, Milan, Italy) following the manufacturer's instructions. Purified DNA concentrations and sample quality were estimated fluorometrically with a NanoDrop™1000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). To produce DNA barcoding sequences for the morphologically identified Crocidura species (i.e., C. suaveolens, C. leucodon and C. pachyura) and identify the four presumed C. russula samples, the standard DNA barcode 5′-end region of the mitochondrial COI gene (658 bp) was amplified and sequenced using the universal primers Total DNA was extracted from the Crocidura tail samples collected in Italy (Table 1) by using the DNeasy Blood & Tissue Kit (Qiagen, Milan, Italy) following the manufacturer's instructions. Purified DNA concentrations and sample quality were estimated fluorometrically with a NanoDrop™1000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). To produce DNA barcoding sequences for the morphologically identified Crocidura species (i.e., C. suaveolens, C. leucodon and C. pachyura) and identify the four presumed C. russula samples, the standard DNA barcode 5 -end region of the mitochondrial COI gene (658 bp) was amplified and sequenced using the universal primers LCO1490 and HCO2918 [32], with the thermal profile described by [33] and chemical reaction Diversity 2020, 12, 380 4 of 12 parameters in [34]. After sequencing, primer-nucleotide-sequence trimming, and a sequence-quality check, the presence of stop codons was verified through the online tool EMBOSS Transeq (http://www.ebi.ac.uk/Tools/st/emboss_transeq/). This analysis was conducted to exclude the amplification of COI pseudogenes that could affect the identification results or the discovery of divergent intraspecific lineages [10]. Sequence data were submitted to the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL-EBI) and assigned to the accession numbers provided in Table 1.
To confirm the taxonomic assignment of the presumed Crocidura russula samples from northwest Italy, the obtained sequences were first queried against the BOLD Systems (IDS tool) database [35]. Second, to assess the genetic divergence among these specimens and other Italian and western Palearctic Crocidura species, we assembled a multiple alignment for the COI fragment including the sequences obtained in this study and the available sequences from BOLD (Table 1). Multiple sequence alignments were produced using MAFFT online (https://mafft.cbrc.jp/alignment/server [36]), set to default parameters. Due to the different lengths of the available sequences, the alignment was trimmed to the same final lengths. Genetic K2P distances were calculated using MEGA X [37].

Species Distribution Modelling
We collected occurrence records (with photos validated through an expert-based approach) for the period 2007-2020 from multiple data sources: iNaturalist (www.inaturalist.org: downloaded on 10 August 2020), Ornitho (www.ornitho.it: downloaded on 19 August 2020), the Facebook group on Italian Mammals (https://www.facebook.com/groups/mammiferi/, updated up to 12 August 2020) and the authors' personal records. Validated data from iNaturalist are also included in the GBIF (Global Biodiversity Information Facility) platform, the same used by the International Union for Conservation of Nature (IUCN) to assess the range of animal species. Several of them were also included in previous work on the genetics/skull morphology of Italian shrews [12,21,26,[29][30][31] or stored in Italian museums (Museo di Storia Naturale di Firenze "La Specola"; Museo di Storia Regionale della Maremma). Several others were directly captured, and the body measurements appeared to be within the variability of just one of the two species in sympatric areas [20]. We kept, for analyses, occurrences from areas including sympatric populations of both C. russula and C. suaveolens only if the identification was deemed reliable [31], as well as occurrences from areas where previous genetic work confirmed the presence of only one of these species [18,19,24,26,38]. Occurrences that could not be unambiguously assigned to one species were excluded. From the resulting dataset, we removed all records with a spatial accuracy coarser than 1 km.
We obtained 117 reliable records of C. russula and 138 of C. suaveolens. We modelled the potential distribution of the two Crocidura species as a function of climate, topography and land-cover within 1 km × 1 km cells of a grid superimposed on central and southern Europe. We selected bioclimatic variables, downloaded from the CHELSA database [39], at a 30 arc-second resolution. Topographic data were derived from the European Digital Elevation Model with a 25 m resolution (EU-DEM v1.0, made available by the European Environment Agency). We calculated the slope (average slope over the cell in degrees) and solar radiation (global radiation calculated for 21 March, 21 June and 22 September, averaged over time and cell) in GRASS [40]. Land cover variables were expressed as the respective proportional cover within the cell and derived from the Corine Land Cover [41]. Some land use variables, which were likely to have similar importance for the target species based on their ecology, were merged together to reduce the number of predictors and increase their potential importance. This process led to the following categories: urban (all artificial surfaces expect for green urban areas), complex crops (closely associated crops of different types or crops interspersed with natural vegetation), inland water (watercourses and waterbodies), salt water (coastal lagoons, estuaries and seas), bare ground or sparse vegetation (all habitats with no or sparse vegetation) (see Table S1 in Supplementary Material for details). Our occurrence data were thus opportunistic and non-systematic. Therefore, we adopted a maximum entropy approach, as implemented in Maxent, as this method outperforms others when non-standardized data are used [42,43]. Maxent works with an occurrence-background framework, and it is essential to place background points, to mirror the environmental conditions that had been sampled with data collection (i.e., to avoid the inclusion into the background of conditions where the target species had not been found just because such conditions were not sampled) [44]. In our study, we did not have any information about the sampling efforts, so we restricted the background to the areas containing occurrence records by creating a 20 km buffer around the occurrence records, within which we randomly placed 30,000 points. These resulted in 28,452 actual background points after overlap with environmental layers. To build models as robust as possible, for each species, we subdivided the occurrence data into four spatially independent partitions using the checkerboard 2 method in ENMeval [45]. We thus used the subset containing data from three partitions to train the model and the fourth to test it on spatially independent data. We used only linear and quadratic features and selected the regularization multiplier on the basis of the Akaike's Information Criterion, corrected for a small sample size (AIC c ). We did this first for a "basic" full model, which was then used for the selection between correlated variables: among pairs of correlated predictors (r > 0.7), we kept only the one included in the most supported model based on AUC (Area Under the Curve of the Receiver Operating Characteristic) values, using SDMtune [46]. We then removed all variables with Lambda = 0, suggesting no noticeable effect on species occurrence. Finally, we performed model tuning, selecting-according to the AICc value-the regularization multiplier, the features to be used (linear and/or quadratic), the number of iterations and the inclusion/exclusion of environmental variables (starting from the removal of the variable with the lowest value of permutation importance [47]). The values selected by the tuning procedure were the following ones: C. russula: regularization multiplier: 1, features: linear and quadratic, number of iterations: 540; C. suaveolens: regularization multiplier: 0.5, features: linear and quadratic, number of iterations: 360. We then calculated, for each species, the AUC and TSS (True Skill Statistic) values over the training and test datasets, and the omission rates at selected thresholds (10th percentile and minimum training presence) for the test data to ensure that models were not subject to overfitting.

DNA Barcoding Identification
The mitochondrial COI-5 end region was successfully sequenced for all the Crocidura samples collected in Italy, and no stop codons were found. The four presumed C. russula shared the same haplotype, and the BOLD-IDS search returned a 100% maximum identity match with 13 German reference specimens and one Belgian reference specimen of C. russula. The identity of Italian specimens of C. suaveolens and C. leucodon was also confirmed, whereas for C. pachyura, the DNA barcode sequence produced in this study was the first one ever published for this species.
The multiple alignments encompassing all the publicly available DNA barcode sequences of western Palearctic Crocidura species contained neither stop codons nor indels, and after trimming, these were 511 bp long. The K2P-distance values confirmed the marked genetic divergence of C. russula from the other Crocidura species, supporting the unambiguous identification of the northwest Italian samples ( Table 2).

Species Distribution Modelling
For both C. russula and C. suaveolens, the distribution models appeared sufficiently robust. The model's performances were similar for the training and test datasets for C. russula (TSS of 0.55 vs. 0.57, and AUC of 0.84 vs. 0.83, for the training and test datasets, respectively; N = 88 and 29), as well as for C. suaveolens (TSS of 0.52 vs. 0.5, and AUC of 0.81 vs. 0.79, for the training and test datasets, respectively; N = 101 and 37). The omission rates confirmed the lack of overfitting, these being very close (or slightly lower than, in the case of 10th percentile for C. russula) to the expected values.
The environmental suitability for both species depended on several variables (Table 3) and was positively associated with medium-high values of urban cover and negatively so with arable land. The most important factors for C. suaveolens were BIO4 (temperature seasonality) and BIO6 (the minimum temperature of the coldest month), which revealed a potential association of the species with climates characterized by high seasonality and relatively mild winters. The suitability for C. russula was affected first by the urban cover and then by BIO7 (the temperature annual range) and BIO10 (the mean temperature of the warmest quarter), suggesting the avoidance of climates with extreme temperature differences between seasons and of temperature extremes during the summer. The suitability for C. suaveolens was favoured by partial (quadratic) cover of broadleaved forest and wetlands, whereas C. russula likely preferred vineyards and orchards, avoiding bare ground or sparse vegetation and a high cover of broadleaved forest.
The potential distribution of habitats suitable for C. russula (Figure 2) highlighted the occurrence of suitable environments in northwest Italian regions. These areas perfectly overlapped with the records of the species and also included some nearby areas (see the inset in Figure 2). Italy showed a high suitability for C. suaveolens (Figure 3). All the confirmed occurrences were included in areas suitable for both species in Italy. The spatial variation of the environmental suitability for both species is shown in Figures S1 and S2, in the Supplementary Material. The potential distribution of habitats suitable for C. russula (Figure 2) highlighted the occurrence of suitable environments in northwest Italian regions. These areas perfectly overlapped with the records of the species and also included some nearby areas (see the inset in Figure 2). Italy showed a high suitability for C. suaveolens (Figure 3). All the confirmed occurrences were included in areas suitable for both species in Italy. The spatial variation of the environmental suitability for both species is shown in Figures S1 and S2, in the Supplementary Material.  White colour denotes unsuitable areas, whereas suitable areas are shown by a red gradient from poorly suitable sites in light red to highly suitable ones in dark red. The inset shows the habitat suitability in detail for eastern France and western Italy.

Discussion
In our work, we provided the first genetic evidence of the presence C. russula in Italy, near the border with France (Nervia river valley) and in western Piedmont (province of Cuneo), thus confirming previous suggestions by other authors based on skull morphology and owl pellet analyses [21,29,30]. Old distributional data, on which basis C. russula was thought to be present across the whole country [15,48], are, in fact, misleading because, at that time, the taxon also included C. suaveolens and C. pachyura [49], because of the high overlap of most of the morphological features and measurements among these taxa (Table 4: [20,21]). Further studies should be conducted to define the current range of C. russula in Italy, at least in the northwestern regions (Piedmont, Liguria and Lunigiana in northwestern Tuscany [21]). The species-environment relationships suggested both similarities and divergences in the ecology of the two shrew species we examined. The suitability for both species increases almost linearly with the urban cover (with a decrease in suitability for complete urban cover) and decreases with arable land. Habitat suitability for C. suaveolens was positively influenced by moderate cover of

Discussion
In our work, we provided the first genetic evidence of the presence C. russula in Italy, near the border with France (Nervia river valley) and in western Piedmont (province of Cuneo), thus confirming previous suggestions by other authors based on skull morphology and owl pellet analyses [21,29,30]. Old distributional data, on which basis C. russula was thought to be present across the whole country [15,48], are, in fact, misleading because, at that time, the taxon also included C. suaveolens and C. pachyura [49], because of the high overlap of most of the morphological features and measurements among these taxa (Table 4: [20,21]). Further studies should be conducted to define the current range of C. russula in Italy, at least in the northwestern regions (Piedmont, Liguria and Lunigiana in northwestern Tuscany [21]). Table 4. Body measurements of Crocidura species occurring in Italy and neighbouring countries [20,21].

Species
Body- Tail  The species-environment relationships suggested both similarities and divergences in the ecology of the two shrew species we examined. The suitability for both species increases almost linearly with the urban cover (with a decrease in suitability for complete urban cover) and decreases with arable land. Habitat suitability for C. suaveolens was positively influenced by moderate cover of broadleaved forest and wetlands, in agreement with other authors [31], and was especially associated with areas Diversity 2020, 12, 380 9 of 12 characterized by high seasonality in temperature and not reaching excessively low temperatures in winters. The suitability for C. russula, instead, is negatively affected by extreme temperature differences between seasons and temperature extremes in summer. Such differences between species suggest that C. suaveolens is more specialized for Mediterranean and continental, relatively warm, climates than C. russula, whereas the latter could be adapted to more oceanic climates.
Assessing the presence of small-sized, elusive mammals may be a difficult task, and cryptic species may remain long undetected [6]. Distinguishing the occurrences of C. russula and C. suaveolens is challenging because skulls are needed to fulfil the task. Apart from the skull analysis of sacrificed specimens, examining skulls from owl pellets may provide an answer [31], but in those cases, the exact location is uncertain due to the predator's mobility (up to several kilometres between roosts/nests and feeding sites [50][51][52]). In this work, we provided evidence that DNA barcoding is a reliable litmus test for identifying Crocidura shrews where they are sympatric or close by [10].
The potential distributions depicted by our models matched fairly well with the known patterns of occurrence. Very few occurrence records of C. suaveolens are available for eastern Europe, whereas most of this area was depicted by the model as highly suitable for the species, coherently with the known distribution [14,38]. The model for C. russula identified a higher suitability for western Europe, in agreement with the known occurrence pattern [53]. The British Isles were also identified as potentially suitable for the species but were not included within the species range according to IUCN maps, as the occurrence of C. russula in Ireland was confirmed after the IUCN assessment [54,55]. Interestingly, the model detected a continuous high suitability for C. russula from southeast France to northwest Italy. This finding further reinforces the point we make that some populations of this species occur in the northwest corner of Italy: suitable habitats are predicted to occur in sites with genetically confirmed occurrence records and in nearby areas, where new investigations might lead to the discovery of other populations.
Not only do our results provide strong evidence of the occurrence of a to-date-overlooked species in Italy but also highlight the potential of applying an integrated approach in the study of cryptic species for shedding light on their distribution and ecology, both fundamental tools for local and global conservation actions. Finally, our study further emphasizes the importance of the Mediterranean basin and of Italy in particular as biodiversity hotspots. Accordingly, several other vertebrate taxa show similar high speciation rates in the region, occur in sympatry or in close proximity in Italy [56][57][58][59], and thus deserve further research efforts in the future to properly assess their distribution and conservation status.