Identifying Past Remains of Morphologically Similar Vole Species Using Molar Shapes

Accurate species identification in fossil remains is a complex task but is a key component for developing good inferences on many, if not all, fundamental questions in macroecology and macroevolution. In the Quaternary, arvicolines are very abundant remains in archeological and paleontological sites in Western Europe and their identification is often based on the first lower molar. The common vole Microtus arvalis (Pallas, 1778) and the field vole Microtus agrestis (Linnaeus, 1761) are commonly found in those deposits. These two species are genetically and ecologically divergent. Nonetheless, their lower molars, on which species identification is done, exhibit a large morphological variation that can potentially lead to some confusion and misinterpretation. Moreover, molecular data suggest that present-day M. agrestis populations are a complex of divergent lineages, some of them being recognized nowadays as valid species. On the basis of extant populations representing a large part of the present-day geographical distribution of these two species, we developed a classification model based on geometric morphometrics of the first lower molar. Our statistical model was then applied on four fossil sites selected to evaluate the relevance of taxonomic determination found in species lists. The model using landmarks describing the overall shape of the first lower molar classifies the two species with the smallest prediction error together with very high individual posterior probabilities. The obtained classification is much better than those arising from shapes of any specific molar part such as the anterior loop, asymmetry or peculiar triangle shape. Discrepancies with expert classification on fossils suggest that existing faunal lists should always be considered cautiously for these two species. Our morphometric model provides a first step towards a rationalized way of revising past collections and expertise for future small mammal assemblages. It will thus help us better understand the paleobiogeographical expansion of these two key species in Quaternary faunas.


Introduction
Developing good inferences from fossil remains, such as phylogeographic, biostratigraphical, bioclimatic or any evolutionary or ecological analysis based on fossil biodiversity, relies on an accurate taxonomic identification of fossil remains, whenever possible by assessing finer levels, such as species or sub-species.Such a postulate depends on the sensitivity of classification criteria and their actual variations within and between species.Investigations into the variation of morphological structures are thus one of the essential bases of taxonomic work in order to avoid an overly typological description and the likelihood of splitting the individuals into invalid species.
In the Quaternary, arvicoline remains are very abundant in archeological and paleontological sites in Western Europe and their identification is often based on first lower molars.The common vole (Microtus arvalis Pallas, 1778) and the field vole (Microtus agrestis Linnaeus, 1761) are commonly found in these Quaternary deposits.Today, they show a quite similar geographical distribution, M. agrestis being slightly more widespread, reaching higher latitudes such as Nordic areas [1].Both species are mainly found in grasslands, where they dig underground galleries.Nonetheless, they occur in distinct habitats.M. agrestis is more often found in ungrazed grasslands, fallow-land and clear-felled areas from sea level to an elevation over 2000 m.It can also occupy forest plantations, woodlands, dunes and rocky habitats.M. arvalis mostly occurs in well drained open landscapes, typically grasslands, pastures, dry meadows within forest zones and alpine meadows up to 3000 m.In short grass habitats, it can burrow its galleries deeper than its counterpart (e.g., [2][3][4]).
Despite a large phylogenetic distance between the two species [5] and their belonging to two distinct clades within Microtus that have probably colonized Western Europe in separate waves [6], these two are quite similar in their outer appearance.Their molars look alike and certain identification from fossil remains or owl pellets is difficult leading to the denomination "Microtus arvalis-agrestis" in many studies of small-mammal assemblages.An abundant literature has been produced to propose morphological and morphometric ways to differentiate these two rodents on bone and dental remains (e.g., [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21]).Until now, the shape of the second upper molar (M2) has proved to be the most reliable character to separate the species [1], the one from M. agrestis being constituted by a supplementary loop [22].However, this character is not fully decisive.Some populations do not present this character on M2 as observed in Danish island field voles [23].Due to high abundances of Microtini fossil remains, the first lower molar is preferentially chosen, as this molar has been considered to carry relevant taxonomic information in arvicolines.Two criteria have been generally recognized on this tooth: the first one corresponds to the presence of a supplementary T9 on the lingual side.It is specific to M. agrestis, but not systematic, its presence can vary at least between 10% and 40% depending on geographical populations and also through time (SM personal obs.).The second criterion is the asymmetry between lingual and buccal triangles, the ones of M. arvalis being approximately the same size, whereas those of M. agrestis present a higher asymmetry [11].Several authors have established an index measuring this asymmetry between triangles T4 and T5 [12,19,24,25].Some other authors have also measured this asymmetry on triangles T2 and T3 [20,26].Nadachowksi [12] initially coupled this asymmetry with individual size to provide a "fairly satisfactory basis for distinguishing between the two species of voles".This index was recently used to explore temporal variations of M. agrestis [19,27].
These past and current variations question the reliability of these criteria and how confident researchers can be in using such tools for species identification.One major limit may come from the non-evaluation of these criteria on a broad geographical scale.Indeed, it has been clearly noted that these two taxa present an important variation both within (e.g., [9,23,28,29]) and among populations (e.g., [1,12,30]).Consequently, the criteria used can provide variable quality in the classification process, which is likely correct on typical individuals, but less accurate on untypical ones.
Recent studies have demonstrated that present-day populations of M. agrestis s.l. are structured as a complex of divergent and geographically homogeneous lineages, with some of these lineages being recognized nowadays as valid species, with Microtus lavernedii found in Mediterranean Europe and Microtus rozianus in Portugal [6,31,32].These species are nonetheless considered as morphologically indistinguishable from M. agrestis s.s.[31] and some introgression was found between M. agrestis and M. lavernedii [33].Divergences between these taxa are dated from 70,000 BP for M. rozianus and from 18,500 BP for M. lavernedii [32].Similar divergent lineages may have existed earlier in the Quaternary within the agrestis species complex but identifying them from their fossil remains is a challenge.
To understand key-events in the past distribution and evolutionary history of these species from fossil remains, it is crucial to identify these species with precision and accuracy to remove any doubt and confusion from fauna lists.This study used geometric morphometrics to describe and quantify shapes of the first lower molar of M. arvalis and M. agrestis.More than individuals coming from throughout Europe were analyzed to build a classification model incorporating a broad intraspecific variation.We also evaluated previously described criteria to define their range of confidence.Then, we applied the model to some fossil samples as a test to evaluate the potential of erroneous identification in historical faunal lists.

Modern Samples
A total of 1372 first molars were collected to constitute the modern reference framework.Since an important geographical variation exists among M. agrestis and M. arvalis populations (e.g., [1]), the studied tooth samples came from a large area of Western Europe (Figure 1), with localities distributed across large latitudinal and longitudinal ranges.Each population has a sample size ranging from 15 to 92.The M. agrestis s.l.sample is constituted by 970 molars coming from 21 localities, whereas the M. arvalis sample is constituted by 402 from 13 localities (Table 1).A few localities were merged to represent a regional sample, such as Ledesma and Bocigas in Spain (Table 1).This aggregation level was not further considered in the analyses and had no influence.Among field vole populations, some localities correspond to the newly described field vole species M. lavernedii (Table 1).Some other populations are located in the French Alps (Isère and Alpes de Haute Provence) and were considered as uncertain as they fall in an undocumented geographical area in molecular studies [31,32].The locality in Maine-et-Loire may correspond to the north border of the distribution [34] and was also considered as uncertain.The locality of Cheserex (Vaud, Switzerland) is closed to the contact zone documented in the Jura [33] and again it was considered as uncertain.Those four localities were not incorporated into the model training but were used as additional data.
All specimens came from collections of Natural History museums or research laboratories.The full list of accession numbers is provided in Table S1.They all came from old trapping sessions and micromammal specialists did their taxonomic identifications at the time on external criteria only.

Fossil Material Remains
The fossil sample is constituted by 166 first lower molars coming from four archaeological sites (Figure 1 and Table 1).These samples were selected to investigate contrasted taxonomic situations to evaluate the potential confidence in published species lists from excavated sites (Supplementary Materials, Section S1).For two sites (Blénien and Peyrazet), species identification was done using traditional morphological criteria and checking the presence of the typical supplementary loop on the second upper molars characteristic of M. agrestis.The teeth come from levels 4 and 5 in Peyzaret (from 13,835 to 16,601 cal BP, [35][36][37], and from level 10/11 in Blénien dating most likely from the beginning of the Holocene [38]).Based on classical criteria, these two fossil cases provided teeth unambiguously attributed at 100% to M. agrestis in Blénien and M. arvalis in Peyrazet.
For the two other sites (units C3 and C2 of Lazaret and level VII of Arma delle Manie), historical identifications by [39] were used for comparison.These sites are older than the first two: about Marine Isotopic Stage (MIS) 6 for Lazaret and MIS 4-5 for Arma delle Manie.Identifications were based on the length of M1, the T4 tilt and the ratio T4/T5, as well as the observation of a few upper M2 with a supplementary loop, clearly attesting the presence of M. agrestis within deposits.

Landmark Schemes and Digitization
The 1538 teeth were first drawn under a Wild M8 binocular (Leica, Wetzlar, Germany) coupled with a camera lucida Wild 308700 (Leica) by only one of us, standardizing the elongation axis of the tooth and its occlusal plane by optimizing the focus along it.The drawings were then scanned using a 600 dpi resolution.We automated the 2D landmarking by first standardizing the molar orientation according to the manual marking of four landmarks (in white in Figure 2) following Brunet-Lecomte [25].Those four landmarks were used for two successive orientation standardizations enabling the automatic detection of fourteen additional landmarks defined as extreme tips of salient and re-entrant triangles and loop tips.All landmarked teeth were further checked for gross error.
Hence, four landmark schemes and two biometric measurements were defined to describe molar shape or the shape of a peculiar part of the tooth (Figure 2): the first one uses 18 landmarks (thereafter FL) capturing the whole molar shape.The second scheme uses the same landmarks except the 13th and a set of 20 semilandmarks on the anterior loop (SL).The third describes only the anterior loop using the same 20 semilandmarks plus landmarks 16 and 10 at the tip of T6 and T7 triangles.Several authors have recognized the T2 shape as a criterion for discriminating the two species (e.g., [11]).Therefore, we uses 20 semi-landmark dataset (T2) plus landmarks 19 and 21 to describe the T2 in more detail.Finally, two additional measurements derived from landmarks were also used: the tooth length was defined as the Euclidean distance from landmarks 1 and 13, and the L8L18 ratio was computed to characterize molar asymmetry.Even if not strictly identical, this ratio is very similar to the LT4LT5 index proposed by Nadachowski [12].
a 600 dpi resolution.We automated the 2D landmarking by first standardizing the molar orientation according to the manual marking of four landmarks (in white in Figure 2) following Brunet-Lecomte [25].Those four landmarks were used for two successive orientation standardizations enabling the automatic detection of fourteen additional landmarks defined as extreme tips of salient and reentrant triangles and loop tips.All landmarked teeth were further checked for gross error.

Geometric Morphometrics
Landmark data (FL, SL, Anterior Loop (AL) and T2 datasets) were superimposed using a partial Generalized Procrustes Analysis (GPA) to remove all non-shape components (position, size and orientation) by means of rigid transformations [40].When necessary, semilandmarks were allowed to slide by minimizing the bending energy [41].The space defined by the Procrustes coordinates was then projected onto a Euclidean tangent space (the tangency point being the mean shape).Procrustes fits were performed with geomorph version 3.0.3[42].As tangent space coordinates are rank-deficient variables, we rotated the data using principal component analysis (PCA) and removed axes with null eigenvalues.
As tangent space variation is dependent on the geometry of the mean shape, fossils were not used in the different Procrustes superimpositions to remove any influence of those data on the definition of the mean shape.Fossils were then superimposed onto the present-day mean shape and projected onto its tangent space.Fossil tangent coordinates were then rotated according to the eigenvectors from the present-day PCA.Thus, they were treated as supplementary data and they did not interfere with the building of our model for species identification.

Statistical Classification
Shape differences between the two species were analyzed using linear discriminant analysis (LDA).The quality of the model produced was evaluated using leave-one-out cross validation (LOOCV).This process estimates the prediction error on samples independently from individuals used to build the model by removing the well-known inflation of the prediction error when the same individuals are used on both model training and test (e.g., [16] on arvicoline teeth).Another interest of LOOCV compared to k-fold CV stays on a sample size of the training set almost identical to the complete sample providing an unbiased estimate of the expected prediction error [43].However, this estimate has a high variance and all models are very correlated because the training sets are almost completely similar.It results that this LOOCV error could be a poor estimate of the error conditional on a given training set [43].The purpose being to ultimately develop a statistical expert model for classifying Microtini species from this specific training set, this subtle problem is likely limited here.
The approach delineated by Evin and coauthors [44] was applied to evaluate different potential bias in classification based on tooth shape.Hence, we evaluated the effect of the number of included variables by training a model and its LOOC validation from one to p principal components (PCs), p corresponding to the dimensionality of the shape space (i.e., the number of non-null eigenvalues: 32, 68, 41 and 40 PCs for the four landmark datasets).The imbalance of sample sizes may bias the model towards one or the other species.Therefore, we also trained the model 100 times on random balanced samples of each species, the chosen sample size being the smallest one between the two species.Furthermore, we explored whether the expectation of 0.5 error rate was met when a model was trained on a random reshuffling of individuals to the two species (simulating a null model where the classification would be only due to chance).The randomization was done 100 times on balanced samples of the two species.Finally, we also looked at the quality of the classification process based on the posterior probabilities of each individual to belong to their chosen class.
All analyses were performed in R v3.3.2 (R Core Team 2016).All data and R codes are available on request.

Morphospaces
In the tooth shape analyses (FL, SL, AL and T2), the two first PCs account, respectively, for 32%, 38%, 55% and 65% of the total shape variation (Figure 3 and Figure S1).Most of the variation (~90%) is recovered with, respectively, 16, 17, 6 and 5 PCs.In the four cases, PC1 tends to hold an important part of the interspecific variation and to separate the two species, with the least overlap for the FL analysis and the largest one for the AL (Figure 3A,C).For the FL, shape changes associated to PC1 concern mainly behavior of lingual salient angles (LSAs), buccal salient angles (BSAs) and lingual reentrant angles (LRAs).In comparison to the mean shape, the LSAs tend to extend horizontally onto the lingual side, whereas LRAs tend to go the opposite way.On buccal triangles, BSAs tend to mainly migrate vertically (Figure 3A).All these tendencies reflect the labio-lingual asymmetry in M. agrestis compared to M. arvalis.For AL results, we can observe that M. agrestis samples occupy all the morphospace on the first two PCs, whereas M. arvalis has a more restricted distribution.The overlap in this plane is mainly related to the M. agrestis shape that impinges on the M. arvalis morphospace.Modifications are mainly observed on LRA 5, which tends to draw a T9 triangle.The SL analysis, which combines FL and AL datasets, presents an intermediate overlapping in comparison with FL and AL results.Although there is an overlapping on T2 triangle results, distinction between the two species can be observed mainly due to the extreme part of the salient triangle tending to move upwards and the reentrant part moving downwards in M. agrestis.The T2 triangle of this last species tends to be finally less ovoid and more salient than M. arvalis one.
Despite M. arvalis having on average a smaller M1 than M. agrestis, no clear discrimination has been observed on tooth length for the two species.For the L8L18 ratio, a bimodal distribution clearly shows the distinction between M. arvalis (range: 0.86-1.75)and M. agrestis (range: 1.13-2.35),with, however, a sizable overlapping part.Despite M. arvalis having on average a smaller M1 than M. agrestis, no clear discrimination has been observed on tooth length for the two species.For the L8L18 ratio, a bimodal distribution clearly shows the distinction between M. arvalis (range: 0.86-1.75)and M. agrestis (range: 1.13-2.35),with, however, a sizable overlapping part.

The Arvalis-Agrestis Differentiation
No bias in relation to the predictor dimensionality was found as the prediction error decreases and reaches an asymptote after including a few PCs but never increases again (Figure 4a, see [44] for such bias).The large sample size of our model even in relation to the large dimensionality of the shape spaces is likely the reason for that nice behavior.A very small bias in prediction error (<1% for most) was found for shape data (FL, SL and T2 models) in relation to the imbalance of samples (Figure 4B).Prediction error on the asymmetry ratio and AL semilandmarks are a little more biased (being still <2%), but imbalance has a profound effect of tooth length prediction (11%).On random samples, all models recover the 50/50 expectation (Figure 4).
No bias in relation to the predictor dimensionality was found as the prediction error decrea reaches an asymptote after including a few PCs but never increases again (Figure 4a, see [44] h bias).The large sample size of our model even in relation to the large dimensionality of pe spaces is likely the reason for that nice behavior.A very small bias in prediction error (<1% st) was found for shape data (FL, SL and T2 models) in relation to the imbalance of samples (Fig .Prediction error on the asymmetry ratio and AL semilandmarks are a little more biased (be <2%), but imbalance has a profound effect of tooth length prediction (11%).On random samp odels recover the 50/50 expectation (Figure 4).The model based on the 18 fixed landmarks (FL) yields to the lowest prediction error of 1.9% (Table 2).Moreover, the individuals are almost all classified (96%) with a very high confidence (Figure 5A and Table 2).Whereas the SL model is almost as performant as FL (Figure 5B), shapes of specific parts (AL and T2) and local asymmetry (L8L18 ratio) do not perform so well (Figure 5C,D,F) with a higher prediction error (about 8 to 10% of error) and a strong decrease in model quality (i.e., with a higher percentage of individuals with a posterior probability smaller than 0.9).As expected, considering the large species overlap (Figure 3), tooth length is not performing very well with a quarter of individuals misclassified (Figure 5E and Table 2).
The shape changes associated to first linear discriminant functions (Figure 6) are very similar to those observed on PC1 (Figure 3).The FL first linear discriminant function (LD1) expresses the increase of tooth asymmetry between buccal and lingual triangles in M. agrestis.The AL LD1 expresses its differentiation with a new triangle, and T2 LD1 defines a more acute triangle in M. agrestis.The overlaps on LD1 between the two species are much less important with FL and SL than for the four other models.
those observed on PC1 (Figure 3).The FL first linear discriminant function (LD1) expresses the increase of tooth asymmetry between buccal and lingual triangles in M. agrestis.The AL LD1 expresses its differentiation with a new triangle, and T2 LD1 defines a more acute triangle in M. agrestis.The overlaps on LD1 between the two species are much less important with FL and SL than for the four other models.   1 The prediction error is based on leave-on-out cross-validation; 2 The percentage of individuals classified with a posterior probability higher than 0.9.

Identifying Variants within the Field Vole
Similar to the results observed on the identification of M. agrestis s.l. and M. arvalis, the FL scheme presents the best results when used to discriminate M. agrestis s.s. and M. lavernedii (Figure S2).However, the effect of sample imbalance is more pronounced here than before, probably because M. lavernedii is represented only by 157 individuals compared to the 638 M. agrestis s.s.When the model is trained on balanced samples, the FL error rate is 17.97 ± 1.71%.It should be noted that the T2 model provides the second-best model with an error rate of 22.91 ± 1.71%.Other models based on the anterior loop, the asymmetry index or the tooth length provide almost random results.
When subdividing the reclassification by locality (Table S2), it appears that northern populations show almost perfect classification as M. agrestis s.s., whereas southern populations in the zone of distribution of M. lavernedii or at the border present in some proportion variants classified as M. lavernedii.For the uncertain locality on the northern border of lavernedii distribution (Maine-et-Loire), 18% of the individuals were classified as M. lavernedii.For uncertain populations in the French Alps located in the discontinuity of the lavernedii distribution, 14% (Alpes de Haute Provence) and 25% (Isère) of the individuals were assigned to M. lavernedii.
Shape changes associated to the discriminant functions show that the slight differentiation between M. lavernedii and M. agrestis is mainly on the triangle shapes (Figure 7).Looking specifically at the T2, which has almost the same prediction error as the FL, the lavernedii triangle appears less sharp than the agrestis one.

Identifying Variants within the Field Vole
Similar to the results observed on the identification of M. agrestis s.l. and M. arvalis, the FL scheme presents the best results when used to discriminate M. agrestis s.s. and M. lavernedii (Figure S2).However, the effect of sample imbalance is more pronounced here than before, probably because M. lavernedii is represented only by 157 individuals compared to the 638 M. agrestis s.s.When the model is trained on balanced samples, the FL error rate is 17.97 ± 1.71%.It should be noted that the T2 model provides the second-best model with an error rate of 22.91 ± 1.71%.Other models based on the anterior loop, the asymmetry index or the tooth length provide almost random results.
When subdividing the reclassification by locality (Table S2), it appears that northern populations show almost perfect classification as M. agrestis s.s., whereas southern populations in the zone of distribution of M. lavernedii or at the border present in some proportion variants classified as M. lavernedii.For the uncertain locality on the northern border of lavernedii distribution (Maine-et-Loire), 18% of the individuals were classified as M. lavernedii.For uncertain populations in the French Alps located in the discontinuity of the lavernedii distribution, 14% (Alpes de Haute Provence) and 25% (Isère) of the individuals were assigned to M. lavernedii.
Shape changes associated to the discriminant functions show that the slight differentiation between M. lavernedii and M. agrestis is mainly on the triangle shapes (Figure 7).Looking specifically at the T2, which has almost the same prediction error as the FL, the lavernedii triangle appears less sharp than the agrestis one.
Figure 7. Distribution of M. agrestis and M. lavernedii samples on the FL and T2 linear discriminant functions.The shapes associated to the LD1 between the two species have been drawn.The gray curve represents the mean shape and the black curve represents the shape associated to a positive change along LD1 from the mean shape.

Statistical Fossil Identification
Past variations appear in the range of the modern intraspecific variations of the two species (Figure 8).Species identification of fossil samples was predicted based on model training on the complete sample sizes.Arma delle Manie and Lazaret samples were identified by [39] while Blénien and Peyrazet individuals by one of us.The sites of Blénien and Peyrazet show good to perfect agreement between the expert and model-based classifications, whereas the two other sites show relatively poor matches (Table 3).In Arma delle Manie and Lazaret, our best performing models (FL and SL) are in opposition with the identification by Abassi, some M. agrestis individuals being reevaluated as M. arvalis.Only the tooth length corroborates expert identifications.For Blénien, all indexes are congruent with expert identifications, except for a few individuals in the case of specific indexes (AL, T2 and ratio).In Peyrazet, a few individuals have been attributed to M. arvalis instead of M. agrestis according to the model.It is noticeable that the specific indexes (AL, T2 and ratio) suggested a larger proportion of M. agrestis than FL and SL approaches.Moreover, if the tooth length was used alone as discriminant factor, almost all individuals would have been misclassified as M. agrestis.
Figure 7. Distribution of M. agrestis and M. lavernedii samples on the FL and T2 linear discriminant functions.The shapes associated to the LD1 between the two species have been drawn.The gray curve represents the mean shape and the black curve represents the shape associated to a positive change along LD1 from the mean shape.

Statistical Fossil Identification
Past variations appear in the range of the modern intraspecific variations of the two species (Figure 8).Species identification of fossil samples was predicted based on model training on the complete sample sizes.Arma delle Manie and Lazaret samples were identified by [39] while Blénien and Peyrazet individuals by one of us.The sites of Blénien and Peyrazet show good to perfect agreement between the expert and model-based classifications, whereas the two other sites show relatively poor matches (Table 3).In Arma delle Manie and Lazaret, our best performing models (FL and SL) are in opposition with the identification by Abassi, some M. agrestis individuals being re-evaluated as M. arvalis.Only the tooth length corroborates expert identifications.For Blénien, all indexes are congruent with expert identifications, except for a few individuals in the case of specific indexes (AL, T2 and ratio).In Peyrazet, a few individuals have been attributed to M. arvalis instead of M. agrestis according to the model.It is noticeable that the specific indexes (AL, T2 and ratio) suggested a larger proportion of M. agrestis than FL and SL approaches.Moreover, if the tooth length was used alone as discriminant factor, almost all individuals would have been misclassified as M. agrestis.
Table 3. Classification results on fossil samples (Arma delle Manie, Lazaret, Blénien and Peyrazet) between expert identifications, geometric models (Fixed Landmarks (FL), Free Landmarks plus Semilandmarks (SL), Anterior Loop (AL), Triangle 2 (T2)) and biometric measurements (Tooth length and L8L18 ratio).In brackets, values having posterior probabilities higher than 0.9 are given.The quality of classification of individuals and the proportion of individuals classified with a posterior probability higher than 0.9 are highly variable depending on the model (Figure 5).In all cases, tooth length presents the lowest values.It should be noted that the proportion of individuals with a posterior probability between 0.5 and 0.9 is much higher than our expectation from the training data, and that, even for the best performing models (FL and SL).

Expert
These models present relatively similar classification results, although, in the case of Peyrazet, the SL approach tends to identify slightly more M. agrestis.It is noteworthy that seven out the nine M. agrestis identified by FL method in Peyrazet show posterior probabilities ranging from 0.58 to 0.91, and therefore might not be very typical M. agrestis.For this site, where no upper M2 characteristic of M. agrestis has been found (Supplementary Materials, Section S1), L8L18 ratio and AL semilandmarks suggest nonetheless a sizable presence of M. agrestis.

Molar Shape Variations and Discriminant Models
Geometric morphometrics has been able to describe and quantify intra-and interspecific shape variation of the first lower molar of these two divergent but morphologically close species.This subtle quantification has allowed evaluating the reliability of historical taxonomic criteria.Describing the whole shape by a set of landmarks (FL) or with, in addition, semilandmarks on the anterior loop (SL) appear to be the best approaches in the modern reference framework, with the lowest prediction errors (Table 2).In the taxonomic literature, one of the main criteria usually retained on the first lower molar to distinguish M. arvalis from M. agrestis was the asymmetry between buccal and lingual triangles (e.g., [11]).This observation later led to many asymmetry-derived indexes (e.g., [12]).Indeed, none of the formulated criteria are completely convincing, because they are based either on typological descriptions or restrained samples.The FL approach reflects a large part of this asymmetry as can be observed with the shape changes between the two species.Using landmarks integrates the asymmetry on all triangles and not just the T4-T5 kept for the ratio.Even if relatively good taxonomic estimates could be obtained with the L8L18 ratio, a significant overlap could nonetheless blur the analyses leading to inaccuracies.It appears clearly that additional species-specific variation is present on the M1 and is captured by landmarks.
Other criteria (AL and T2) present larger prediction errors (7-11%) and lower percentages of individuals classified with a posterior probability higher than 0.9, which could barely reach 50% for the asymmetric index.Both M. arvalis and M. agrestis show large variations of their AL shapes.Whereas the differentiation of the anterior loop with a supplementary cusp is strictly related to M. agrestis, the risk is higher for intermediate forms, which are present in both M. agrestis and M. arvalis.Species identification by an expert might be biased towards one or the other (Table 3).Over-identification towards a specific species can occur following the criteria used.From our corpus of individuals, tooth length is a poor discriminator between these two species.To summarize, the single use of AL, T2 shapes and L8L18 ratio to separate these two species definitely presents a higher prediction error than using all these criteria within the complete description of the tooth with landmarks (FL and SL).Most likely, AL and T2 shapes could also include population variations related to their geography and phylogenetic origins.The risk may even be greater with fossil materials, where population turnovers were operated through time as observed with DNA analyses (e.g., [45][46][47][48]).
The FL approach has the advantage over the semilandmarks that it could be used even on slightly broken specimens.Specifically, in the agrestis-arvalis discrimination, the semilandmarks on the anterior loop do not provide additional information and therefore most of the diagnostic characters are embedded in the fixed landmarks.The simplicity of use of the FL scheme and its performance make it the method of choice for this species identification both for present-day and fossil individuals.
Among M. agrestis s.l., some new species are now recognized [6,31,32,34].However, the current view is that they could not be recognized based on their morphology [34].Applying a discriminant model between samples in the geographical ranges of M. agrestis and M. lavernedii leads nonetheless to some promising results.A few characters on triangles appear to vary between the two species such as a less sharp T2 on M. lavernedii than with the typical M. agrestis.In the range of M. lavernedii, the results are more mixed than in northern localities.Such mixed results may come partly because of a bias towards M. agrestis in relation to the strong imbalance of the sample sizes.Besides, these mixed attributions are in agreement with our current understanding of the contact zone between these two field vole species, a contact zone presumed to spread from the French Alps to the northwestern French coast [31,32] and in western Switzerland where some introgression has been observed for mitochondrial DNA [33].Despite removing the uncertain populations supposed to be in this contact zone (Maine-et-Loire, South Jura, French Alps), some training samples may still have been erroneously attributed to M. lavernedii.The presence of such errors or possible hybrids tends to reduce the differences between the species in the model and to negatively bias its error rate.
A trend in the percentage of lavernedii attribution can be observed in western Switzerland and eastern France.In the northern Jura and the Vosges, the model recognized almost no M. lavernedii, whereas in the south Jura and south Valais some mixed attributions are observed and in Sion (Valais) an almost complete attribution of the samples to M. lavernedii is observed.Contrary to what was previously thought [31], some morphometric differentiation can be observed between the field vole taxa despite a very recent divergence, less than 20,000 BP [32].Similar diverging lineages may have repeatedly occurred earlier in the Quaternary within the field vole group and geometric morphometrics seems able to capture the subtle morphological changes that may be used to track ancient faunal turnovers and refuge zones.

Fossil Variations and Their Implications
Microtini with arvaloïdes and agrestoïdes morphology appear at least since the Lower Pleistocene [49] and teeth with Microtus arvalis/agrestis morphology are regularly observed through the Middle Pleistocene (e.g., [50]).However, the question of the appearance of M. agrestis species remains unsolved.The oldest remains of M. agrestis came from Miesenheim I in Germany and were clearly attested from the extra loop on upper M2 [51].Then, many other European sites attested its presence dated around the Mindel-Riss interglacial (Holstein-MIS 11), as Orgnac 3 and Saint-Estève in France [11,52], Uppony 1 in Hungary [53] and Atapuerca in Spain [54].Finally, these two species are still often listed for fossil assemblages as "Microtus arvalis/agrestis" or "Microtus agrestis/arvalis" during these 400 ka of species history.Their historical geographical distribution is thus highly difficult to perceive and their use as palaeoenvironmental indicators is impeded.
Results obtained on our fossil site selection highlight the identification difficulties between these two species following the used criteria.The case of Peyrazet shows different possible interpretations.Indeed, based on the upper M2 analyzed, no individual with the M. agrestis supplementary loop has been recognized.Nonetheless, at least two individuals with a posterior probability larger than 0.97 have been recognized by the FL LDA.This observation suggests that we have some evidence of the rare presence of M. agrestis in the studied levels, although the hypothesis that they are extreme forms of M. arvalis, very close to its field counterpart, could not be totally ruled out.This result confirms the specificity of the lower levels of Peyrazet, M. agrestis becoming a common element of the small fauna communities in upper levels [36].At Arma delle Manie and Lazaret, in which upper M2 attested the presence of M. agrestis, our results demonstrate clearly the occurrence of both species.On the contrary, in the case of Blénien, no M. arvalis has been attested, the posterior probability being over 0.99 for all the 22 teeth, confirming the particularity of this level of being constituted only by M. agrestis.
Past variations appear mainly in the range of the modern intraspecific variations but posterior probabilities of fossil samples were not as good as the ones of present-day samples.Indeed, despite the good model performance and the large geographical variation included, classification of fossil material should always be discussed with caution.The model is based on current living populations, recording the present-day variation from the several lineages existing in the two species in Europe since the last Glacial [55], lineages that represent their ongoing diversifications [6].Past populations of voles and lemmings in Western Europe appear to have repeatedly faced continental scale turnovers (e.g., [46,56,57]).In such a scheme, present-day lineages have recently diverged and might not reflect past diversity.The model records only part of the variation these species could have had.Despite this cautionary tale, using probabilistic statements will definitely improve past species attributions.Moreover, a probabilistic framework could be integrated furthermore using weighted averages instead of hard classification thresholding in all analyses, for example in the exploration of fossil variations though time and space.
Finally, geometric morphometrics suitably describes the variation of vole molars and thus could help to resolve difficult issues in species identification and should be used in priority over other indexes.This precise estimation of intra-and inter-specific variations of extinct species will prevent in turn the over-identification inherent in the typological approaches.Accurate species identification of small mammals from their teeth remains a crucial step for a deeper understanding of the evolution of these taxa and of the ecosystems during the last million years.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2571-550X/1/3/0020/s1, Section S1: Collections and accession numbers of all samples; description of sites, Table S1: Accession numbers, collections and species identification in collection, Figure S1: Percentage of variance explained by each principal component given the landmark scheme, Figure S2: Model prediction error for the four landmark schemes given the number of PCs used, Table S2: Agrestis-lavernedii attribution using the FL model.

Figure 1 .
Figure 1.Map of Western Europe showing the present-day localities used in the model training for classification of the two species.White circles locate samples of Microtus arvalis and blue circles samples of Microtus agrestis.The location of the four fossil samples included to evaluate historical species list are shown in yellow squares.Numbers from 1 to 4 correspond to Peyrazet (Lot), Blénien Cave (Alsace), Lazaret (Alpes-Maritime) and Arma delle Manie (Liguria, Italy), respectively.

Figure 1 .
Figure 1.Map of Western Europe showing the present-day localities used in the model training for classification of the two species.White circles locate samples of Microtus arvalis and blue circles samples of Microtus agrestis.The location of the four fossil samples included to evaluate historical species list are shown in yellow squares.Numbers from 1 to 4 correspond to Peyrazet (Lot), Blénien Cave (Alsace), Lazaret (Alpes-Maritime) and Arma delle Manie (Liguria, Italy), respectively.

Figure 3 .
Figure 3. Morphospaces (as PC1-PC2 planes) of modern samples with main patterns of shape variation for the four landmark schemes and histograms for the two biometric measurements: (A) Fixed Landmarks (FL); (B) Free Landmarks plus Semilandmarks (SL); (C) Anterior Loop (AL); and (D) Triangle 2 (T2).Two histograms of modern samples for: (E) Tooth Length; and (F) L8L18 ratio.PC: principal component.Shape changes from the mean shape associated to PC1 have been drawn for each model.

Figure 3 .
Figure 3. Morphospaces (as PC1-PC2 planes) of modern samples with main patterns of shape variation for the four landmark schemes and histograms for the two biometric measurements: (A) Fixed Landmarks (FL); (B) Free Landmarks plus Semilandmarks (SL); (C) Anterior Loop (AL); and (D) Triangle 2 (T2).Two histograms of modern samples for: (E) Tooth Length; and (F) L8L18 ratio.PC: principal component.Shape changes from the mean shape associated to PC1 have been drawn for each model.

Figure 4 .
Figure 4. (A) Variations of the model prediction error for the four landmark schemes with the number of PCs used.Continuous colored lines correspond to the prediction error observed for the initial sample.Associated colored dashed lines correspond to the prediction error for the set of 100 balanced samples randomly generated from the initial sample (for clarity purpose, only the mean error prediction is reported).The grey line corresponds to the mean prediction error for the set of 100 balanced samples with random reassignment of species (null model of correct classification only due to chance).The associated 95% confidence interval (CI) is figured in light grey.(B) Model prediction error for the four landmark schemes (considering all PCs) and the two biometric measurements.Colored dots correspond to the observed prediction error.Associated black dots correspond to the mean prediction error plus their 95% CI for the set of 100 balanced samples.The prediction error for a null model of correct classification only due to chance is figured by grey dots (mean prediction error) and associated grey segments (95% CI).See Figure3and text for abbreviations.

Figure 4 .
Figure 4. (A) Variations of the model prediction error for the four landmark schemes with the number of PCs used.Continuous colored lines correspond to the prediction error observed for the initial sample.Associated colored dashed lines correspond to the prediction error for the set of 100 balanced samples randomly generated from the initial sample (for clarity purpose, only the mean error prediction is reported).The grey line corresponds to the mean prediction error for the set of 100 balanced samples with random reassignment of species (null model of correct classification only due to chance).The associated 95% confidence interval (CI) is figured in light grey.(B) Model prediction error for the four landmark schemes (considering all PCs) and the two biometric measurements.Colored dots correspond to the observed prediction error.Associated black dots correspond to the mean prediction error plus their 95% CI for the set of 100 balanced samples.The prediction error for a null model of correct classification only due to chance is figured by grey dots (mean prediction error) and associated grey segments (95% CI).See Figure 3 and text for abbreviations.

Figure 5 .
Figure 5. Distributions of posteriori probabilities of species identification for modern and fossil samples based on: (A) FL; (B) SL; (C) AL; (D) T2; (E) Tooth length; and (F) L8L18 ratio.The numbers at the dot vertical line correspond to the percentages of specimens with a posteriori probability higher than 0.9.

Figure 5 .
Figure 5. Distributions of posteriori probabilities of species identification for modern and fossil samples based on: (A) FL; (B) SL; (C) AL; (D) T2; (E) Tooth length; and (F) L8L18 ratio.The numbers at the dot vertical line correspond to the percentages of specimens with a posteriori probability higher than 0.9.

Figure 6 .
Figure 6.Distribution of modern samples on the six linear discriminant functions.See Figure 3 for abbreviations.For FL, AL, SL and T2 models, mean shape deformation from the first linear discriminant function (LD1) between M. arvalis and M. agrestis have been drawn.

Figure 6 .
Figure 6.Distribution of modern samples on the six linear discriminant functions.See Figure 3 for abbreviations.For FL, AL, SL and T2 models, mean shape deformation from the first linear discriminant function (LD1) between M. arvalis and M. agrestis have been drawn.

Table 1 .
List of modern and fossil sites reported with the number of analyzed teeth.

Table 1 .
Cont.The denomination M. agrestis/uncertain refers to samples identified as M. agrestis in the agrestis-arvalis analysis and as uncertain samples in the agrestis-lavernedii analysis.See text for details.

Table 2 .
Prediction error and prediction quality (posterior probability > 0.9) of each model.

Table 2 .
Prediction error and prediction quality (posterior probability > 0.9) of each model.