Soil-Related Predictors for Distribution Modelling of Four European Crayﬁsh Species

: One of the most critical challenges in species distribution modelling is testing and validating various digitally derived environmental predictors (e.g., remote-sensing variables, topographic variables) by ﬁeld data. Therefore, here we aimed to explore the value of soil properties in the spatial distribution of four European indigenous crayﬁsh species. A database with 473 presence and absence locations in Romania for Austropotamobius bihariensis , A . torrentium , Astacus astacus and Pontastacus leptodactylus was used in relation to eight digitalised soil properties. Using random forest modelling, we found a preference for dense soils with lower coarse fragments content together with deeper sediment cover and higher clay values for A . astacus and P . leptodactylus . These descriptors trigger the need for cohesive soil river banks as the microenvironment for building their burrows. Conversely, species that can use banks with higher coarse fragments content, the highland species A . bihariensis and A . torrentium , prefer soils with slightly thinner sediment cover and lower density while not inﬂuenced by clay/sand content. Of all species, A . astacus was found related with higher erosive soils. The value of these soil-related digital descriptors may reside in the improvement of approaches in crayﬁsh species distribution modelling to gain adequate conservation measures.


Introduction
Under climate change, drought and flash flood episodes intensify in frequency and severity, invariable leading to disturbed aquatic fauna [1]. To survive, a crayfish population needs the appropriate quality of water [2,3] and a particular microenvironment required for sheltering by burrowing, an activity closely related to the soil structure in the river banks [4][5][6].
Species distribution modelling relies on quality environmental data tested and validated as predictors by consistent field information [7][8][9]. From those, digitally derived environmental data (e.g., remote-sensing variables, topographic variables) are the most valuable because they can be easily computed and applied at large scales enlarging the perspective of scientific approaches [10,11]. In parallel with increasing the computation capabilities, the availability of remote-sensing based datasets is also increasing [12][13][14]. Being a surrogate for describing the interactions between the species and ecosystem assemblage [15,16], digitally derived environmental data rely on acquiring accurate raw field distributional data because they are essential to validate the quality of a set of predictors for a given species.
Europe has little crayfish taxa diversity, with six from over the 540 species worldwide [17]. Burrowing behaviour is highly important in crayfish life, three ecological types

Soil Data
We used as soil data the most recent version of the Soil Grids system at 250 m spati

Soil Data
We used as soil data the most recent version of the Soil Grids system at 250 m spatial resolution. Soil Grids provides global predictions, using machine learning methods, for numerous soil properties at seven standard depths (0, 5, 15, 30, 60, 100, and 200 cm), based on ca. 150,000 soil profiles used for training and a stack of 158 soil covariates (topography, vegetation, climate and lithology maps) [49]. From the available soil properties, we selected the most appropriate ones which could be related to the presence/absence of the crayfish: absolute depth to bedrock in cm (related to soil thickness), bulk density (fine earth) in kg/m 3 , sand (50-2000 µm), silt (2-50 µm), clay (0-2 µm) content mass fraction in %, coarse fragments volumetric in %. Due to the large extent of the study area and to avoid issues related to the difference in soil thickness between field locations, we used the average value for soil properties across the seven depths.
Along these, we used two descriptors related to soil erosion. The Revised Universal Soil Loss Equation (RUSLE), at a resolution of 100 m, is the most detailed assessment of soil erosion by water for the European Union, calculated based on higher resolution peer-reviewed inputs of rainfall, soil, topography, land use, and management from the year 2010 [50][51][52], with values expressing soil loss in t/ha.yr. The second descriptor, soil erodibility factor (K), is a component of RUSLE and illustrates the specific contribution of soil in the total risk estimation. The combination of various factors leads to data assumed to be related to soil stability to erosion, integrating information in K factor on organic matter content, soil texture, soil structure, permeability, coarse fragments, and stone content [53].
For each crayfish species, we created a database recording the following information for each field location: presence/absence of crayfish and CPUE as dependent variables and the values of the soil-related descriptors as independent variables (absolute depth to bedrock, bulk density, sand, silt, clay, and coarse fragments content, soil erosion by water, soil erodibility factor).

Variable Importance Analysis
Quantification of the influence of environmental variables on the dependent variables (e.g., crayfish presence/absence) is a critical issue in many applied approaches [54]. In our case, quantification of the influence of soil-related descriptors helps to understand and elucidate the causes explaining the spatial variation of four crayfish species.
To quantify the effect of soil-related predictors on the spatial variation of crayfish species, we used the random forests (RF) method. RF is an effective modelling and predictive statistical technique based on the combinations of classification trees [55,56]. Each tree is constructed based on an independent and random selection of prediction variables and samples, therefore the final model is much more robust regarding input data noise [55]. The generalisation of the error of all classification trees depends on the prediction power of each tree and the correlation between them. This method has high prediction power, is not sensitive to overfitting, and does not influence the data [57]. For example, it was used by [58], who developed BIOMOD, a computer platform to model species distributions, enabling the quantification of species-environment relationships or projecting species distributions into different environmental conditions. The authors of [59] used RF for modelling the predator (Lontra provocax) and prey (crustaceans) distributions by relating a set of environmental predictors to species occurrence records.
Two algorithms for calculating the importance of independent variables are implemented in the R software [60] through the random Forest package. The most widely used is the mean decrease in accuracy (MDA) which calculates the importance of the variable by removing it from the model, calculating the model's accuracy based on unused observations for tree creation (approximately 1/3). Therefore, the higher the MDA value, the more important is the variable on the variance of the dependent variable [61].
The random Forest method in the R program was applied on each database using 100 bootstrapped models with 500 trees. The model with the average accuracy among the 100 random repetitions was retained, resulting in a list of soil-related descriptors, for each

Species Distribution Maps
It is known that a carefully selected subset of relevant predictors generally performs better than using all available predictors for RF prediction mapping. For each crayfish species, a subset was selected from the pool of all 8 soil-related factors based on an MDA threshold of 1%. Therefore, species distribution mapping, as the presence/absence category (with a threshold of 50% for presence probability), was conducted only using those soilrelated descriptors that add at least 1% to the model's overall accuracy. All soil-related descriptors were used for distribution mapping of A. astacus and P. leptodactylus, six out of eight were used for A. torrentium species (except the absolute depth to bedrock and soil erodibility), while only four soil-related descriptors were used for the A. bihariensis species (absolute depth to bedrock, coarse fragments, bulk density, and soil erodibility).
The same parameters were used within the RF method for species distribution mapping, as for the variable importance analysis, retaining the average map among the 100 bootstrapped models (with 500 trees). The other RF parameters (number of variables randomly sampled, minimum and maximum size of terminal nodes, etc.) were kept as default values.

Results Validation
To validate the results of RF variable importance analysis with field data, we conducted a bivariate correlation analysis, using IBM SPSS Statistics, between the CPUE and each independent variable. We expect that soil variables suggested as important for the spatial distribution modelling of crayfish species correlate better with CPUE than the other variables.
The final distribution maps were assessed using the standard measures of overall accuracy (OA) and area under the curve (AUC), based on the 70%/30% rule. For model training, 70% randomly selected point locations were used, while the remaining 30% was used for the model validation. Overall accuracy is the ratio between correctly classified cases and the total number of cases [62]. AUC is a widely used metric for model performance evaluation and assessing the discriminatory capacity of species distribution models [63]. Generally, an AUC value of 0.5 shows no discrimination between categories, 0.7-0.8 is considered acceptable, while values higher than 0.8 are considered excellent.

Austropotamobius bihariensis
Only four out of the eight soil-related descriptors recorded an MDA value higher than 1% and were consequently used for the A. bihariensis species distribution mapping. The A. bihariensis species distribution seems to be best explained by the absolute depth to bedrock, recording an MDA of 17.03%. At the same time, the summary statistics show that the species seems to prefer thinner sediment cover (average of 15.6 m for presence compared to 19.03 m for absence) and probably thinner soils ( Table 1). The second most important is coarse fragment content which only decreased the model's accuracy by 4.2% (Figure 2a). The statistics show a higher coarse fragments content for presence locations (average of 12.3%) than for absence locations (9%), suggesting that A. bihariensis prefers soils with a higher volume of coarse fragments (Table 1). Bulk density and soil erodibility factor decreased the model's accuracy by only 2.5% (Figure 2a), suggesting that the A. bihariensis species prefer soils with slightly lower density and less erosion.    The higher correlations with CPUE confirmed the RF variable importance analysis, with absolute depth to bedrock recording an R-value of −0.60, coarse fragment content of 0.27, bulk density of −0.17, and soil erodibility factor of −0.17 (Table 2). The A. bihariensis prediction map is the second most accurate among the four species, showing an overall accuracy of 69% and an excellent AUC value of 0.94 (Table 3). The habitat of this species is located on rivers crossing highlands, mainly mountainous landforms. The prediction based on soil-related factors shows the highest probability across mountainous landforms. However, as biogeographical investigations show, this species is only found in the Apuseni Mountains area [25]. Our results show that the A. bihariensis species prefer thinner, less dense soils with a higher volume of coarse fragments and less prone to erosion (Figure 2b).

Austropotamobius torrentium
While the absolute depth to bedrock and soil erodibility were significant for A. bihariensis, our results show that these descriptors play no role in the distribution of the A. torrentium species. The distribution of A. torrentium species is best explained by bulk density (MDA of 13.1%) and coarse fragments content (MDA of 10.5%) (Figure 3a). The summary statistics show no significant differences between the average of presence and absence locations regarding bulk density and slightly higher content of coarse fragments within the presence locations (Table 1). Soil texture plays a significant role for A. torrentium distribution showing for silt content an MDA value of 10.7%, for clay content 4.9%, and sand content 4.5% (Figure 3a), however with no significant average differences between presence and absence locations. Soil erosion by water seems to be an important factor, with a notable difference between presence (5.4 t/ha.yr) and absence (6.6 t/ha.yr) locations (Table 1), however, with an MDA value of only 5.2% (Figure 3a). The low values of the correlation coefficient, recorded by all soil-related factors, confirmed to some extent the difficulty of finding specific soil-related habitat conditions for this species and the lower accuracy of the model (Table 2).
lower accuracy values suggest that A. torrentium species di explained by other environmental factors (Figure 3b).

Astacus astacus
All soil-related descriptors recorded MDA values high quently used for distribution mapping. The most important i silt, sand, and clay content, reducing the model accuracy by spectively (Figure 4a). However, the summary statistics show cations have only slightly higher clay content and slightly sence locations (Table 1). A significant contribution is made which decreased the model's accuracy by 9.1% (Figure 4a). The A. torrentium prediction map recorded an overall accuracy of 67% and an acceptable AUC value of 0.72 (Table 3). As variable importance analysis showed also, these lower accuracy values suggest that A. torrentium species distribution is probably better explained by other environmental factors (Figure 3b).

Astacus astacus
All soil-related descriptors recorded MDA values higher than 1% and were subsequently used for distribution mapping. The most important is soil texture, represented by silt, sand, and clay content, reducing the model accuracy by 10.2%, 10.1%, and 10%, respectively (Figure 4a). However, the summary statistics show that A. astacus presence locations have only slightly higher clay content and slightly lower sand content than absence locations (Table 1). A significant contribution is made by coarse fragments content  1% (Figure 4a). The statistics show a lower coarse fragments content for presence locations (average of 9.4%) than for absence locations (11.2%), suggesting that A. astacus prefers soils with a lower volume of coarse fragments (Table 1). Absolute depth to bedrock is also an essential soil-related factor for A. astacus species, decreasing the accuracy by 7.4% (Figure 4a), with no significant average differences between presence and absence locations. Bulk density, soil erosion by water, and soil erodibility factor decreased the accuracy of the model by 5.2%, 4.3%, and 3.3%, respectively (Figure 4a), the species preferring slightly denser, but higher erosive soils, with a higher average value within the presence locations (8 t/ha.yr) compared to the absence locations (7.3 t/ha.yr) ( Table 1). These results are consistent with the correlation analysis, the identified relationships between CPUE and soil-related descriptors being weaker but statistically significant: clay content (R-value 0.18), sand content (R-value −0.13), coarse fragments content (R-value −0.14), bulk density (R-value 0.16), and soil erodibility factor (R-value 0.18) ( Table 2).

2021, 13, x FOR PEER REVIEW
distribution is explained to a lesser extent by soil-related desc weak, suggest that A. astacus species prefer more likely sligh soils, with slightly higher clay content but lower sand conten ments (Figure 4b).  The A. astacus prediction map recorded an overall accuracy of 66% and an acceptable AUC value of 0.74 (Table 3). As variable importance analysis showed also, the species distribution is explained to a lesser extent by soil-related descriptors. Our results, however weak, suggest that A. astacus species prefer more likely slightly denser but higher erosive soils, with slightly higher clay content but lower sand content and volume of coarse fragments (Figure 4b).

Pontastacus leptodactylus
The presence of P. leptodactylus species is explained by all soil-related descriptors, used therefore for distribution mapping. The most important is coarse fragment content which decreased the model's accuracy by 23.9% (Figure 5a). The statistics show a much higher coarse fragments content for absence locations (average of 11.06%) than for presence locations (3.7%), suggesting that P. leptodactylus prefers soils with a lower volume of coarse fragments ( Table 1). Absolute depth to bedrock is also a vital soil property for P. leptodactylus species, which decreased the model's accuracy by 16.4% (Figure 5a). The summary statistics show that the species prefers deeper sediment cover (average of 27.4 m for presence compared to 19.6 m for absence) and probably deeper soils (Table 1). Bulk density and soil erodibility factor decreased the model's accuracy by 12.4% and 12.1%, respectively (Figure 5a). The P. leptodactylus species prefers denser soils, with an average of 1416.7 kg/m 3 for presence locations and 1381.1 kg/m 3 within absence locations. Soil texture represented by clay, silt, and sand content shows lower importance for the distribution of P. leptodactylus species, reducing the accuracy by 7.9, 6.9 and 7.3%, respectively (Figure 5a). However, the summary statistics show that P. leptodactylus presence locations have a higher clay content (29.2% compared to 25.4% for absence) and a lower sand content (30.1% compared to 35.4%) ( Table 1). Soil erosion by water is the least important soil-related descriptor. However a lower average value within the presence locations (4.1 t/ha.yr) than the absence locations (7.2 t/ha.yr) suggests that the P. leptodactylus species prefer more stable soils, less prone to erosion ( Table 1). The correlation analysis with field data highly validates the RF variable importance results. The highest correlation coefficients were recorded by the same soil-related factors, coarse fragment content (R-value of −0.46), absolute depth to bedrock (R-value of 0.44), bulk density (R-value of 0.22), clay content (R-value of 0.31), and sand content (R-value of −0.30) ( Table 2).
The P. leptodactylus distribution prediction map is the most accurate among the four species, recording an overall accuracy of 94% and an excellent AUC value of 0.94 (Table 3). As field data show, the habitat of this species is located on rivers crossing lowlands, mainly plain topography, the prediction based on soil-related descriptors showing the highest probability across this type of landforms. The results show that P. leptodactylus species is more likely to live in deeper, denser, and more stable soils, less prone to erosion, with a lower volume of coarse fragments, higher clay content, and lower sand content (Figure 5b). highest probability across this type of landforms. The result species is more likely to live in deeper, denser, and more stabl with a lower volume of coarse fragments, higher clay conten (Figure 5b).

Discussion
Prediction of crayfish distributions based on digitally d still needs to use validated geospatial environmental descri usually relied on classical general variables such as altitude, te land use/land cover [64]. Specifically designed and tested va is a need for scientific approaches to provide reliable config crayfish distribution modelling. Even if there are available, w consisting of numerous environmental variables specifically bodies [13], testing and validating their relevance to differen tions is required.
For crayfish, distribution modelling methods are valua

Discussion
Prediction of crayfish distributions based on digitally derived environmental data still needs to use validated geospatial environmental descriptors. Previous approaches usually relied on classical general variables such as altitude, temperature, precipitation or land use/land cover [64]. Specifically designed and tested variables are scarce, and there is a need for scientific approaches to provide reliable configurations for best results in crayfish distribution modelling. Even if there are available, worldwide coverage datasets consisting of numerous environmental variables specifically designed to describe water bodies [13], testing and validating their relevance to different crayfish species distributions is required.
For crayfish, distribution modelling methods are valuable instruments for understanding their biogeographical patterns and depicting past and current aspects relevant for both scientific knowledge and appropriate conservation [31]. Here we enlarged the array of digitally derived environmental descriptors by analysing the distribution of four European indigenous crayfish species in relation to the soil-related digitalised characteristics.
From the four tested species, the set of soil descriptors analyses performed best in the distribution modelling of P. leptodactylus, a crayfish species environmentally characteristic for lowland areas, large or medium rivers and lakes [65]. The prediction based on soil-related descriptors expectedly showed the highest probability across these types of landforms. The coarse fragment content, the most critical soil descriptor found by statistical analyses, show a much higher coarse fragments content for absence locations suggesting that P. leptodactylus prefers soils with a lower volume of coarse fragments. Absolute depth to bedrock was also found significant; the species prefer deeper sediment cover and deeper soils. The species also prefers denser soils, represented by higher clay and lower sand content. Soil erosion by water is the least important soil-related descriptor, this variable suggests that the P. leptodactylus species prefer more stable soils, less susceptible to erosion. These descriptors highlight the need of this species for cohesive soil river banks as the microenvironment for building their burrows [5].
The most important predictors for A. astacus proved to be soil texture. Presence locations had slightly higher clay and slightly lower sand content compared to absence locations. A significant influence was proved for lower coarse fragments content and absolute depth to bedrock, suggesting the species occurs in denser and higher erosive soils. These results are consistent with the previous study testing the relevance of bedrock substrate stability [29].
The two investigated Austropotamobius species revealed differentiation in ecological preferences related to soil descriptors, A. bihariensis showing preferences for the absolute depth to bedrock and soil erodibility, with no contribution in the distribution of A. torrentium for these two descriptors. The latter species distribution was found to be shaped by the bulk density and coarse fragments content. Soil texture plays a significant role for A. torrentium distribution, showing relevance for silt, clay, sand content. It is also tolerant to soil erosion by water, whereas A. bihariensis seems to prefer soils with slightly lower density and less erosion. Austropotamobius species are primarily known to prefer stone substrate, usually building their burrows behind solid rocks [29,66].
Moving forward, all these soil descriptors depicted and characterised the four investigated crayfish in relation to one of the most relevant components of their environment besides water, the substrate. Invariable, a river traverses a section of the landscape covered more or less by soil, the soil being the main factor contributing to the bank's structure, supporting and supported by riparian vegetation. The river bankfs ecosystem provides the best environment for crayfish, which are sheltering by actively digging burrows. Built by the crayfish themselves, the burrows are critical structures [46,67], offering protection against overheating [68,69], predators [37,70], also preventing the drift caused by mechanical pressure generated by occasional flash-floods [6,29]. Therefore, the knowledge, understanding and management of habitats populated with protected crayfish species must consider the aspect of soil (substrate) properties, not only in the vicinity of a site but also in a buffer zone upstream of the river basin. For instance, deforestation generates a dramatic negative impact upon crayfish resident populations. The addition of coarse particulate organic matter through logging affects water quality or even disturbs the entire food web in a stream [71]. These activities are also weakening the integrity of the rivers banks. Similarly, the channelisation and other anthropogenic stream-course maintenance operations lead to a high imbalance over the benthic communities [72,73], the crayfish being one of the first negatively affected taxa.
The value of validated soil-related digital descriptors describing the four investigated European crayfish species distributions may reside in improving different approaches in crayfish species distribution modelling. These could be used to design adequate conservation measures by assessing the best local habitat and appropriate inter-populations corridors. Invasive crayfish species may have different abilities to use these soil structures and could benefit from climate change [39]. Thus, the availability of good distributional predictors can be valuable for investigating invasion dynamics, including the carried cray-fish plague pathogen Aphanomyces astaci with the increasing availability of spatial data associated with pathogen strains and virulence [74,75].
The natural or artificial barriers existing (or placed) on a river course proved to be a functional control of upstream spreading of the invasive crayfish species [76]. On the other side, such structures can be detrimental to native crayfish populations by reducing suitable habitats and migration routes and creating suitable habitat pockets for introduced species [77]. Even if expensive, hitherto, the construction of these barriers seems to be the most promising method to protect native crayfish from the almost unstoppable colonisation of invasive species.