Uncertainty in Various Habitat Suitability Models and Its Impact on Habitat Suitability Estimates for Fish

Species distribution models (SDMs) are extensively used to project habitat suitability of species in stream ecological studies. Owing to complex sources of uncertainty, such models may yield projections with varying degrees of uncertainty. To better understand projected spatial distributions and the variability between habitat suitability projections, this study uses five SDMs that are based on the outputs of a two-dimensional hydraulic model to project the suitability of habitats and to evaluate the degree of variability originating from both differing model types and the split-sample procedure. The habitat suitability index (HSI) of each species is based on two stream flow variables, including current velocity (V), water depth (D), as well as the heterogeneity of these flow conditions as quantified by the information entropy of V and D. The six SDM approaches used to project fish abundance, as represented by HSI, included two stochastic models: the generalized linear model (GLM) and the generalized additive model (GAM); as well as three machine learning models: the support vector machine (SVM), random forest (RF) and the artificial neural network (ANN), and an ensemble model (where the latter is the average of the preceding five models). The target species Sicyopterus japonicas was found to prefer habitats with high current velocities. The relationship between mesohabitat diversity and fish abundance was indicated by the trends in information entropy and weighted usable area (WUA) over the study area. This study proposes a method for quantifying habitat suitability, and for assessing the uncertainties in HSI and WUA that are 4089 introduced by the various SDMs and samples. This study also demonstrated both the merits of the ensemble modeling approach and the necessity of addressing model uncertainty.


Introduction
Recognizing that the structure and functions of a stream ecosystem are significantly influenced by flow conditions, previous studies have provided valuable insights into the ecological modeling and assessment of stream habitats [1].Physical hydraulic models, such as the Physical Habitat Simulation system (PHABSIM) and the River 2D model [2], are useful tools for simulating changes in water depth and velocity at scales that are relevant to stream habitats [3][4][5].The proficiency of the River 2D model at simulating water depth, current velocity and other riverbed characteristics, such as substrate and cover, means that it is a particularly powerful tool for projecting fish habitat suitability indices (HSI).In addition, it is important to estimate the corresponding weighted usable area (WUA), and further to project potential distributions of fish species in rivers [3].HSI has been widely used as an index of habitat suitability for species in stream ecological studies [6].Mesohabitats and stream biota have been shown to be associated with distinct combinations of current velocity and water depth [7].Some studies have elucidated the correlation between current velocity and depth [7,8], as well as their combined effects in ecological modeling and assessment [9].Additionally, these two variables are used in species distribution models (SDMs) to estimate HSI values for fish species.
To better understand the relationship between environmental variables and species occurrence, various SDMs have been developed and used in stream ecology; some notable examples, from previous work, include the generalized linear model (GLM) [10], the generalized additive model (GAM) [10,11], the random forest model (RF) [12,13], the support vector machine (SVM) [12], the Fuzzy model [14,15], the artificial neural network model (ANN) [10,12], and a model based on probabilistic neural networks (PNN) [16].However, no consensus has been reached on an optimal model that is applicable in all relevant situations and suitable for all species since each SDM has its own unique structure and merit [17].Therefore, the variations among models must be considered; otherwise, the selection of an inappropriate SDM may result in mistaken projections and increased uncertainty.
Uncertainty analysis is particularly important when making habitat-suitability projections [18,19].When an optimal SDM has not yet been identified for a given species and or geographic extent and spatial resolution, blindly making decisions based on the result of a single SDM, without considering the uncertainty inherent in alternative models, may result in suboptimal management planning decisions [20].Uncertainty analysis is important for two main reasons [18].First, understanding of the degree of variability associated with model types can help engineers and decision-makers know where improvements can be made; second, such an understanding can indicate the level of confidence associated with SDM projections.If simulated results obtained using various models with acceptable performance are significantly different from one other, the overall result can be regarded as having high variability (low confidence).Ensemble modeling, where projections of species distributions are generated using several statistical or machine learning techniques together instead of using any single modeling technique, that is, the integration of multiple SDM outputs, is recognized as an important technique in modeling habitat suitability when truly optimal models have not been identified [19,21,22].
In addition to the two dominant variables that are accounted for by HSI, which are current velocity and water depth, spatial pattern and heterogeneity in physical conditions can also affect biological processes.In stream ecosystems, mesoscale interactions between channel morphology [23,24] and discharge [25] generate a heterogeneous hydraulic environment and spatial pattern that provide various in-stream habitats for freshwater biota [26,27].Greater stream habitat heterogeneity frequently results in greater diversity of aquatic species [23,28] and the improved success and survival of individual species [24,29].Habitat heterogeneity in rivers can be characterized by the relative proportion of mesohabitats (composition), as well as their spatial arrangement (configuration) and temporal dynamics (variation with discharge) [27,30].Although the characterization of hydromorphological conditions at an ecologically relevant scale is made possible by quantifying the hydraulic environment of streams via empirical data and an increasing number of models to choose from [27], the quantification of both uncertainty and stream habitat heterogeneity is not so straight forward [12,24].Additionally, few methods for quantifying variability in SDM outputs are available, not to mention methods for quantifying habitat heterogeneity and relating it to aquatic species ecology [12,24].
This study quantifies the relationship between fish abundance and spatial patterns of mesohabitats using six SDMs, which are the general linear model (GLM), the general additive model (GAM), the random forest model (RF), the support vector machine model (SVM), the artificial neural network model (ANN), and an ensemble model (which yields the average of the outputs of the preceding five SDMs).The objectives of this study are as follows: (1) quantify the relationship between fish abundance (represented by HSI) and flow conditions using SDMs and calculate the weighted usable area (WUA) for the focal species; (2) quantify in-stream spatial heterogeneity of flow conditions and of HSI using information entropy, and (3) quantify the variability of HSI values, WUA locations and the entropy of HSI originating from discrepancies between outputs of various SDMs and the ensemble model approach.

Materials and Methods
This study focuses on the relationship between fish abundance and mesohabitat spatial patterns.Specifically, fish occurrence data and the spatial patterns of mesohabitats (based on current velocity V and water depth D data) in the Datuan Stream in northern Taiwan were used in this study.The area was surveyed in the summer of 2008.There were 63 samples collected using backpack electrofishing.The equipment had an effective radial range of 2 m.Water turbulence meant that this method of sampling was most appropriate, due to the difficult in directly observing Sicyopterus japonicas via snorkeling.Stream bank observation was also difficult to perform due to the lack of suitable observation points (i.e., bridges, and other ecological engineering structures).The study area in Datuan Stream was approximately 300 m 2 .Consistent with previous studies, Sicyopterus japonicas, an amphidromous gobiid, was the focal species, as it has often been identified as a habitat quality indicator species [5,[31][32][33].Lin et al. (2011) [5,32] have described the study area and the method for collecting data in more detail.Figure 1 presents the study area and sampling locations as well as the number of Sicyopterus japonicas observed.

Hydraulic Model and Model Units
River 2D was used to generate spatial maps of current velocity (V) and water depth (D) across the study area.The spatial resolution of V and D is 4 m 2 .River 2D is a free-source two-dimensional depth-averaged finite element hydrodynamic model that is used in fish habitat evaluation studies [2].During the flow condition simulation, geo-referenced topographic measurements and boundary conditions, including flow rate as well as upstream and downstream water levels, were used as input data.The terrain model was based on a triangular network of elements (cells).The sizes of the substrates in the study area were homogeneous and were neglected in estimating HSI.
The model was calibrated by adjusting the bed roughness height to fit the measurements of current velocity (V) and water depth (D) in spring 2008; it was validated using V and D measurements for summer 2008.V and D were classified using the classification criterion that was proposed by Lin et al. (2011) [32].Both the projected current velocity and water depth across the study area were inputted into the species distribution models.

Ensemble Modeling of Species Distribution and Model Evaluations
The SDMs considered in this study include GLM, GAM, RF, SVM and ANN, which are widely applied to model species distributions [19].The GLM represents a flexible extension of linear models that connect species abundance to the driving factors of interest using a log link function [34].Poisson regression is an example of a generalized linear model.The "stat" package of the R software was used to project the abundance of fish at each location using GLM [35].The GAM is an extension of the general linear models that relate species abundances to the driving factors that are considered to have a polynomial form based on a Poisson distribution with a log link function [36].The "gam" package of the R software was used to project the fish abundance at each location using GAM [37].In order to further analyze the species abundance model fit and test the assumptions of the Poisson process, the chi-square test of goodness of fit was applied and the p-value was determined for both GLM and GAM.Random forests (RF) refer to an ensemble machine learning method for classification in which many decision trees are trained using bootstrap replicates of a complete data set [38].The mean projection of the individual trees is the output.The only assumption of the RF is high representativeness of sampled data.The "random Forest" package of the R software was used to project fish abundance at each location of interest using Random forests [39].
SVM assigns each location to one of the classes y = 0, 1, 2, …, n (where n represents the number of species presences) [8]; the conditions of locations which are more conducive to species are assumed to correlate with a higher number of species presences and vice-versa.A hyperplane which maximizes the margin between the training points associated with the classes, based on the conditions, i.e., driving factors of interest, is obtained via convex quadratic programming [8].The "e1071" package of the R software was used to project the abundance of fish at each location of interest using SVM [40].ANNs are non-linear machine learning tools in which a number of neurons are generated to connect species abundance and driving factors, using a back-propagation algorithm [41].The "nne" package of the R software was used to project the abundance of fish at each location of interest using ANN [42].There are no assumptions for either machine learning approaches, SVM and ANN.Finally, the results of the ensemble model were defined as the mean of the results of the five SDMs-GLM, GAM, RF, SVM and ANN.
The number of S. japonicas present in each modeling unit was estimated using the five species distribution models (GLM, GAM, RF, SVM and ANN), and the ensemble model (averaging the five SDMs) in terms of two explanatory variables, V and D. V and D generated using River 2D ranged from 0.05 to 1.00 m/s and from 0.05 to 1.25 m, respectively.The measured data were randomly divided into a training set and a validation set, which comprised 70% and 30% of the data, respectively.This split-sample procedure was performed 1000 times per SDM, yielding a total of 6000 outputs.A projection of each species distribution was generated based on the training set.The sampling error was quantified by calculating the sample variance in each training set with a finite population correction factor.Akaike information criterions (AICs) for GLM, GAM, SVM and ANN were calculated, which can be used to deal with the trade-off between the goodness of fit of the model and the complexity of the model [43].The predictive performance of the models was evaluated in terms of both the root mean square error (RMSE) between the measurements (the 30% validation data) and the projections.Moreover, the standard errors of RMSE were set to the standard deviation of RMSE.Kullback-Leibler divergence (KL) between modeled and observed fish abundance was also derived for each model.This divergence can be used to quantify the information lost by calculating the relative entropy between simulated abundance and observations [43].

Weighted Usable Area
To obtain the composite habitat suitability for fish, incorporating both V and D, the composite suitability index (HSI, ranging from zero to unity) was calculated for each cell.The V and D values were obtained via River 2D simulations.Specifically, HSI and WUA values were calculated as follows [44]: ) [ ] where ( ) simulated HSI values as functions of V and D at i th cell; = i WUA weighted useable habitat area in i th cell; and = i A water surface area in i th cell.The WUA values were calculated as a basis of discussing the total potential useable habitat areas and their various discharges.
A nonparametric Kruskal-Wallis test and a multiple comparison test were performed to determine whether the median WUA derived from the outputs of the six SDMs differed significantly.

Spatial Heterogeneity of Flow Conditions and Habitat Suitability
In this study, the spatial heterogeneity of flow conditions and HSI were quantified by calculating the information entropy [12,45]: where ncat is the number of parameter categories considered, and pi is the proportional frequency of the considered parameter in category i.In this case, the entropies of V, D, V + D and HSI were evaluated for each SDM.V ranged from 0.05 to 1.25 m/s and was classed into 12 categories, each with a range of 0.1 m/s.Similarly, D was grouped into ten categories from 0.05 to 1.05 m, defined in 0.1 m increments.HSI was grouped into ten categories from 0 to 1, defined in 0.1 increments.The range of values of information entropy is from zero to one.An entropy value of close to 1 indicates that the considered parameter is distributed uniformly across all categories, indicating high spatial heterogeneity.A value of zero indicates that the considered parameter is concentrated in a single category, indicating a low spatial heterogeneity.
In order to better compare spatial heterogeneity of habitat suitability between models, the mean, standard deviation, median, min, max, range, skewness and kurtosis of information entropy were calculated based on 1000 realizations of each model.Skewness is a measure of the asymmetry of a probability distribution about its mean; in this case, the probability distribution represented information entropy as defined above.To be more specific, if the variables in the distribution are concentrated on the right of the figure, the distribution is said to be left-skewed with a negative value of skewness, indicating that most of the spatial heterogeneity of HSI among 1000 realizations is higher than the mean.Kurtosis is a measure of peakedness of a probability distribution relative to a normal distribution.More specifically, high kurtosis (positive value) represents a sharp peak near the mean of the distribution.In contrast, data sets with low kurtosis (negative value) tend to have a flat top near the mean instead of a distinct peak, indicating the spatial heterogeneity of HSI tends to distribute uniformly.

Quantifying Variability in Habitat Suitability Index for Each Model
Two sources of uncertainty were considered-that derived from each SDM and that derived from the split-sample procedure with 70% training data and 30% validation data.To elucidate the contributions of these two sources to the variability in HSI, a general linear model is used to quantify their relative contributions by calculating the ratio between the deviation explained by one source and the null deviation [19,46].Moreover, to evaluate how the degree of similarity between outputs of different models, a principal component analysis (PCA) and Pearson correlation tests were performed on the projections.If the outputs of the models are entirely consistent with each other, then the first principal component (PC1) that is derived from the PCA would explain 100% of the variation.The proportion of the variation that is explained by PC1 can be defined as the degree of variability among model projections.Therefore, PC1 can be regarded as a consensus axis across the projections of species abundance derived from six models [22,47].So the other extreme example of model consensus would be the situation where the projections show no similarity whatsoever; i.e., they were infinitely inconsistent with each other.In this case, the consensus axis would explain only 16.7% (= 100%/6) of the variance [48].Accordingly, the variability among projections can be quantified as the inverse of the proportion of variation that is explained by PC1.

River 2D Simulations of Flow Conditions
Maps of current velocity and water depth during spring and summer 2008 were simulated by the River 2D model (Figure 2a,b).The flow rate was set to 0.5 m 3 /s, which was the average of a nearby flow meter.This model was calibrated and validated with the field measurements of flow conditions and its simulation performance are shown in terms of Coefficient of Determination (R 2 ) and Nash-Sutcliffe efficiency coefficient (Eff) [5].For current velocity, the calibration projection showed an R 2 = 0.93 and an Eff = 0.92; the validation dataset showed an R 2 = 0.85 and an Eff = 0.84.For water depth, the calibration projection had an R 2 = 0.98, and an Eff = 0.98, and the validation projection had an R 2 = 0.95, and an Eff = 0.96.
In general, the current velocity decreased and water depth increased from upstream to downstream (Figure 2a,b).The undercut slope of the right bank showed flows with higher velocities (up to 0.97 m/s) and deeper depths (up to 1.2 m), except for the upstream boundary where vegetation grows in gravelly soils.In contrast, the slip-off slope of the left bank resulted in deposition caused by slow, shallow flows.The spatial patterns of V and D were affected directly by the topography of the riverbed.In Figure 2a, the upstream area consisted of areas with high velocity predominantly with few pools in the middle section of the reach close to the right bank where some boulders aggregated.The flows close to gravel-based substrate riverbanks, which were covered with vegetation resulted in low velocity and low depth; other areas in the main channel showed high velocity and low depth.In the downstream section, current velocity was slower and both low and high water depth appeared.

HSI Obtained from SDMs and Ensemble Models
The p-value of the chi-square goodness of fit test for GLM and GAM were 0.381 and 0.468, respectively.The degree of freedom for GLM and GAM were 57 and 54, respectively.The predictive performance of each SDM in terms of the validation data and the AIC of GLM, GAM, SVM and ANN based on the training data were demonstrated in Table 1.The average RMSEs between the observation and simulation for the six models are 1.505-1.551.The means (standard errors) of AIC for GLM, GAM, SVM and ANN are 72.445(3.814), 74.313 (3.846), 63.661 (7.407) and 34.805 (5.555), respectively.The average KL between the observed abundance and the simulated results ranged from 0.232 to 0.250.The standard errors of KL for the six models are 0.002-0.003.The model that performed best among the six SDMs, in terms of RMSE and KL, is the ensemble model, which has a RMSE of 1.505 and a KL of 0.232.In contrast, RF underperformed, of which RMSE and KL are 1.562 and 0.247, respectively.
Figure 3 shows the average and standard deviation of HSI projections from 1000 realizations of all six SDM approaches.In this study, the areas with relatively high HSI (more than 0.7) are defined as suitable habitats for S. japonicas.Sites with high average HSI are located in the southwestern part of the study area, regardless of the SDM approach.Similarly, the standard deviation of HSI across all models is concentrated in the southwestern sites except for the SVM outputs in which some of the sites in the north also have high standard deviation of HSI.In terms of driving factor evaluation, current velocity showed a consistent trend (Figure 4).In fact, all model results consistently identified a velocity greater than 0.4 m/s to be highly suitable.The notable differences between SDM algorithms did however become apparent when preferred water depth ranges were taken into account.
For instance, while GLM, RF and SVM identified a low water depth (0-0.6 m) as suitable for the focal species, the GAM identified sites with greater water depths (0.8-1.2 m) as high suitability factors.In the case of ANN and ensemble model outputs, there was no clear trend in suitability dependencies on water depth.Therefore, the spatial divergences, which exist between models, are presumably a result of discrepancies between preferred water depths, whereas spatial convergence of model outputs are probably due to similarities in current velocity weighting.By visual inspection of Figure 4, SVM appears to diverge the most; for instance, while GLM, GAM, ANN, and the ensemble model generated smooth general suitability distributions, the SVM model produced outputs, which seem to be far more specific and complicated.This is apparent from the highly nonlinear pattern observed in Figure 4d.To be more specific, GLM evaluated a velocity greater than 0.2 m/s and a depth lower than 0.6 m to be suitable.The GAM-based surface indicated suitable sites with velocity greater than 0.4 m/s and depth larger than 1 m.In RF, a velocity greater than 0.4 m/s and a depth between 0.2 and 0.4 m were evaluated as suitable.SVM-derived HSI distributions showed high habitat suitability in velocities greater than 0.2 m/s and in depths between 0.15 and 0.45 m; this heightened specificity resulted in a far more fragmented projection of suitable habitats, since sites in the immediate vicinity of highly suitable areas tended to have velocities and depths outside of the suitable range.In ANN, the HSI distributions displayed high habitat suitability in velocities greater than 0.3 irrespective of water depths.The distribution from the ensemble model integrated all of the HSI patterns from the five above-mentioned SDMs.It demonstrated two central conditions for high habitat suitability around 0.3 m and 1.1 m in depth, both with a velocity greater than 0.3 m/s.As for the standard deviations of HSI, in GLM, ANN and the ensemble model, the standard deviations follow the trends of average HSI surfaces; that is, in general, the sites with high suitability corresponded to greater standard deviations of HSI.The exception to this was the GAM-based HSI distributions, which indicated high standard deviation in velocities greater than 0.3 m/s and in depths between 0.2 and 0.4 m.Moreover, in RF, the sites with low standard deviations were located in velocity greater 0.3 m/s and in depth between 0.2 and 0.4 m.In SVM, the low standard deviation corresponded to the high HSI values.

Spatial Heterogeneity of Flow Conditions and of HSI
The information entropies of current velocity and water depth are 0.62 and 0.60, respectively, which indicates moderate spatial heterogeneity.An entropy value of current velocity and depth close to 0.5 represents intermediate spatial heterogeneity of flow conditions over the river.In addition, Figure 6 demonstrates the density distribution of information entropy of HSI in 1000 realizations derived from six different SDMs, including GLM, GAM, RF, SVM, ANN, and the ensemble models.In comparison to the results of the other models, the entropy level of the GAM outputs reveals a far more scattered distribution.In contrast, the distribution of the entropy in RF, SVM and the ensemble models were more consistent.Table 2 shows the entropies of HSI in GLM, GAM, RF, SVM ANN, and the ensemble models ranging from 0.18 to 0.89, 0.03 to 0.89, 0.45 to 0.80, 0.46 to 0.84, 0.29 to 0.85 and 0.04 to 0.83, respectively.The average entropy in GLM was the greatest (0.74) one among all models, while that in ANN was the lowest (0.59).The entropy in GAM and RF had the largest (0.23) and smallest (0.05) standard deviations, respectively.In addition, skewness of the entropy in RF and GLM were greatest (-0.08) and lowest (-1.67), respectively.Furthermore, kurtosis of the entropy in GLM and ANN were the greatest (3.68) and lowest (-0.46), respectively.

Variability in Maps of HSI and WUA
The results obtained using the SVM differed the most from the other five models, as quantified by the Spearman correlations, which were 0.70, 0.83, 0.84, 0.77 and 0.87 (Table 3).The remaining five models yielded more mutually consistent results, with mutual correlations of 0.85 to 0.96 (Table 3).In addition to the Spearman correlation, Figure 7 shows the PCA values of HSI that were derived using the six models, which show that the component loadings obtained using SVM, PC1 and PC2, differ distinctly from those obtained using the other models.In contrast, the ensemble model was highly

Usefulness of River 2D in Simulating Flow Conditions
Data on current velocity V and water depth D were collected from fall 2007 to summer 2008 and the spatial patterns of mesohabitats, based on species data collected over a good portion of the study area, were studied further.A 2D numerical hydrodynamic model, River 2D, was used to generate spatial maps of current velocity (V) and water depth (D) over the entire study area.The accuracy of the simulation depended strongly on the quality of bed topography data [49].In this study, the simulated results showed strong agreement with the empirical data, supporting River 2D's applicability to small scale analysis [12].Based on the accuracy of simulated flows, the suitability of the habitat for S. japonicas can be evaluated with confidence over the entire study area.

SDM-Determined Suitability of Habitat with Respect to Spatial Heterogeneity
A higher AIC represents greater amount of information lost, or the overfitting of a model (too many parameters considered) based on the training dataset.Shearer et al. (2015) found that GAMs based on high-flow rivers overestimated the effects of flow allocation relative to those based on low-flow rivers [11].In this study, GLM and GAM yielded higher AICs but used fewer parameters, indicating the projections derived from both models lead to greater amounts of information loss in comparison with SVM and ANN.The numbers of parameters for GLM, GAM, ANN and SVM were 5, 8, 9 and 50, respectively.In the case of the two artificial intelligence models, these parameter numbers are dependent on the number of weights in ANN [50] and the number of margin support vectors in SVM [51].The parameter numbers also suggest that both GLM and GAM suffered from greater information loss than did the other models; while there is a high chance of over fitting in SVM because it included far more parameters than any other model.ANN yielded projections with relatively low RMSE and KL between the predicted and the measured number of species compared to the other models.Considering RMSE simultaneously measures both bias (accuracy) and precision (variation) of the projections; and KL values based on the validation dataset can be used to assess the differences between the distribution of simulated abundance and empirical measurements, this result indicates the merit of the nonlinear transfer function in ANN [52].Despite this, in terms of the RMSE and KL, the ensemble model outperformed all other single-model approaches considered in this study.Accordingly, the ensemble model has potential to improve upon the predictive performance of all single-model SDMs [22,53].Schweizer et al. (2007) and Ayllón et al. (2009) emphasized the mutual dependency between flow velocity and depth [7,9].In this study, HSI was obtained using six species distribution models, which only considered V and D throughout simulations.Based on the criterion of Lin et al. (2011) [5] for classifying mesohabitats, the preferred mesohabitats of S. japonicas, those with high HSI values, were found to be riffles and runs, i.e., those associated with high current velocities [31] presumably due to the fact that mesohabitats with higher current velocities resulted in greater aeration rates, providing higher concentrations of oxygen, and consequently increased abundance of algae and invertebrates on which to feed [54].The information entropy of current velocity and water depth captured the diversity and richness of the microhabitats available to the focal species.Moreover, seasonal variations in discharge could also be considered as a driving factor of species abundance as the velocity and depth may vary between seasons (Figure 8).Specifically, as discharge increases so too does current velocity.Such temporal variability may very well carry considerable ecological implications, as fluctuations in conditions and resources affect fish richness.Also, the spatial heterogeneity of the habitats that were identified using each model was calculated.The spatial heterogeneities of V, D and habitat quality are important measurements, since higher spatial heterogeneity of habitats can provide a greater number of environmental conditions [12].Previous studies have shown that greater stream habitat heterogeneity does in fact increase aquatic species diversity [23,28] and the survival of some species [24,25,29] by providing a larger variety of refuges and other functions.

Variability in HSI That Originates from Various Sources of Uncertainty
Variability in SDM-based HSI is attributable to several sources, including inaccuracies in survey data (current velocity, water depth or species occurrence) [55,56]), the used SDM [57,58], and the parameters of the ecological models [46].In this study, two sources of uncertainty in HSI, the type of model and the random selection of 70% of the data set as training data, were considered.The random sample splitting explained about 37.0% of the variation between the 6000 outputs (6 SDMs × 1000 iterations), while model structure explained about 63.0%, indicating that the uncertainty that originated in sampling of data had less of an effect on the variability of the projected HSI but was still responsible for one third of the variability The variability in HSI that was introduced by the sample data is attributable to deviations introduced by the collection of data concerning fish occurrences and the small sample size [18].Point electrofishing was performed while moving upstream to minimize time spent in water [5,32].Since these data were collected manually, some untraceable errors may have arisen during their collection.This error and the small sample size may be responsible for a certain level of variability in the projection of HSI.The spatial discrepancy of the simulated HSI is further characterized by comparing the outputs of the various SDMs used herein.Variation in model structure explained almost two thirds of the variability in the generated HSI surfaces.GLM, RF and SVM determined that S. japonicas preferred habitats with high current velocity and low water depth, while GAM and the ensemble model found that high current velocity and high water depth were preferred; the ANN-derived HSI indicated that water depth was unimportant.Additionally, the total potential usable area for S. japonicas, WUA, was obtained from observations of habitat suitability and represents habitat availability for a target species [59].HSI dominated WUA, a higher WUA suggested greater suitability of the habitat for the species of interest.In this study, the means and standard deviations of WUA values across the six models clearly differed, despite similar model performance, in terms of RMSE and KL.Therefore, in practice, decision-makers should be aware of the differences between projections caused by model choice [12].This study assessed the variability of HSI outputs arising from different SDMs and the split-sample procedure at a small spatial scale, it is felt that the findings can help decision makers better understand uncertainty.

Conclusions
In order to better understand the habitat preferences of S. japonicas within the Datuan tributary, six different SDM approaches were used to project habitat suitability in terms of current velocity (V) and water depth (D) as projected by a two-dimensional hydraulic model, River 2D.Variability between projections was then analyzed.Information entropy analysis of velocity and depth successfully indicated spatial heterogeneity of habitats.Variation in model structure explained approximately two thirds of the variability in generated HSI surfaces based on the outputs of the six models, while almost one third of HSI variability originated from split-sample procedure.S. japonicas was found to prefer habitats with high current velocity regardless of SDM approach.Differing results of suitable water depth ranges revealed the strong effect differing SDMs have on simulated distributions.The ensemble model appears to be a powerful approach, which captures a wide range of variability originating from different model types, and we strongly recommend its use during the modeling process.This study also underscores the importance of addressing uncertainties in species distribution models.

Figure 1 .
Figure 1.(a) Study site; (b) sample locations and number of Sicyopterus japonicas observed; and (c) Sicyopterus japonicas observations and corresponding stream velocity and depth.

Figure 2 .
Figure 2. Maps of (a) current velocity (m/s) and (b) depth (m) over the study area.Note: flow direction: South to North.

Figure 3 .
Figure 3. Maps of average Habitat suitability index (HSI) derived from (a) GLM; (b) GAM; (c) RF; (d) SVM and (e) ANN; as well as (f) the ensemble model, and maps of standard deviation of HSI derived from (g) GLM; (h) GAM; (i) RF; (j) SVM and (k) ANN; as well as (l) the ensemble model.

Figure 5 .
Figure 5. Scatter plots between observed and simulated fish abundance: (a) GLM; (b) GAM; (c) RF; (d) SVM; (e) ANN and (f) the ensemble model.Note: X-axis represents observed count ranging from 0 to 5, and Y-axis represents simulated count ranging from 0 to 5.

Figure 8 .
Figure 8. Maps of current velocity (a-c) and water depth (d-f) when the quantity of flow is 0.5, 0.26 and 0.12 m 3 /s.

Table 1 .
Mean ± standard error of validation root mean square error, Akaike information criterion (AIC) and Kullback-Leibler divergence (KL) in 1000 realizations as well as the partial explanations power.* from velocity and depth in the training model.

Table 2 .
Descriptive statistics for information entropy of habitat suitability in six models in 1000 realizations.