# Impacts of Species Misidentification on Species Distribution Modeling with Presence-Only Data

## Abstract

## 1. Introduction

## 2. Material and Methods

**Figure 1.** Study area and species data used: (**a**) island of São Miguel in the Azores, Portugal; and (**b**) location of the Cyathea cooperi (orange squares) and C. medullaris (blue dots) presences recorded in São Miguel between September 2011 and May 2012. Note: grey background represents the island relief as bright tones for high altitude and insolation and dark tones for opposite conditions.

#### 2.1. Real Data

[…]$^{-1}$, increasing with altitude and from east to west [34,35]. São Miguel is the largest island (745 km²), and its highest peak stands 1103 m high in the East (Figure 1b).

#### 2.2. Simulated Data

[…]$_{i}$) was defined as the product of those two responses, which may be expressed as $S_i = P_i T_i$.

[…]$_{P}$, $\sigma_P$) and T ($\mu_T$, $\sigma_T$) were defined, respectively, equal to the mean and standard deviation of the precipitation and mean maximum temperature of the warmest quarter (introduced above) of the whole archipelago of the Azores. Thus, $\mu_P = 314.77$, $\sigma_P = 162.90$, $\mu_T = 21.83$, and $\sigma_T = 1.79$.
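As an illustration of the suitability definition $S_i = P_i T_i$, the following minimal Python sketch evaluates the product of two normal-curve responses using the parameter values reported above. It assumes that each response curve is scaled so that its optimum equals 1 (the text only states that normal curves were used); the function names `gaussian_response` and `suitability` are hypothetical.

```python
import math

# Parameters reported in the text (mean and standard deviation for the Azores).
MU_P, SIGMA_P = 314.77, 162.90   # precipitation of the warmest quarter
MU_T, SIGMA_T = 21.83, 1.79      # mean maximum temperature of the warmest quarter


def gaussian_response(x, mu, sigma):
    """Bell-shaped response to one variable.

    Assumption: the curve is scaled so that the optimum (x == mu) equals 1.
    """
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def suitability(p, t):
    """Environmental suitability S_i = P_i * T_i for a single location."""
    return gaussian_response(p, MU_P, SIGMA_P) * gaussian_response(t, MU_T, SIGMA_T)


# Both responses at their optimum.
optimum = suitability(MU_P, MU_T)
# Precipitation one standard deviation above the mean, temperature at the optimum.
one_sd_off = suitability(MU_P + SIGMA_P, MU_T)
```

Under this scaling, suitability lies between 0 and 1, which is consistent with the later use of $1 - S_i$ as a weight for absences.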

[…]$^{-3}$, $9.4 \times 10^{-3}$, $34.6 \times 10^{-3}$, $95.7 \times 10^{-3}$, and $138.8 \times 10^{-3}$, respectively.

**Figure 2.** The ecological niche of the simulated target and contaminating species in five scenarios (ellipses show the 95% probability regions). The niche of the target species was defined as the multiplicative interaction of precipitation (P) and temperature (T). The response of the target species to P and T was defined using normal curves where the mean μ and standard deviation σ of P ($\mu_P$, $\sigma_P$) and T ($\mu_T$, $\sigma_T$) are 314.77 and 162.90, and 21.83 and 1.79, respectively. For the simulated contaminating species, μ and σ of P and T varied in standard deviation units in relation to the simulated target species as shown in each scenario (for example, the normal curve of P in scenario III used a mean value of $\mu_P + \sigma_P$ and a standard deviation of $2\sigma_P$).

[…]$_{i}$) as weights. Thus, locations with high suitability for the species had a higher chance of being selected and, hence, the samples indicated ecological preferences of the species. In addition, the same strategy was used to create a testing sample for the purpose of estimating the prediction accuracy of the MaxEnt results while using simulated data. However, here there was the opportunity of generating data for absences as well as presences, allowing the calculation of standard accuracy measures such as the area under the receiver operating characteristic curve (AUC) [40]. A total of 1000 presences and 1000 absences were generated in the same manner outlined above, but using the inverse of environmental suitability ($1 - S_i$) as weights for generating the absences (i.e., locations with low environmental suitability for the species had a higher chance of being selected).
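The weighted selection of presences and absences described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: it assumes sampling with replacement from a list of cell suitabilities, and the function name `sample_presences_absences` and the five-cell surface `S` are hypothetical.

```python
import random


def sample_presences_absences(suitability, n, seed=0):
    """Draw n presence and n absence cell indices.

    Presences are weighted by S_i and absences by 1 - S_i.
    Assumption: cells are drawn with replacement.
    """
    rng = random.Random(seed)
    cells = range(len(suitability))
    presences = rng.choices(cells, weights=suitability, k=n)
    absences = rng.choices(cells, weights=[1.0 - s for s in suitability], k=n)
    return presences, absences


# Hypothetical suitability surface flattened to five cells.
S = [0.0, 0.1, 0.5, 0.9, 1.0]
presences, absences = sample_presences_absences(S, n=1000)

# Mean suitability at the sampled locations: high for presences, low for absences.
mean_presence = sum(S[i] for i in presences) / len(presences)
mean_absence = sum(S[i] for i in absences) / len(absences)
```

A cell with zero weight is never drawn, so the cell with $S_i = 0$ cannot appear among the presences and the cell with $S_i = 1$ cannot appear among the absences.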

#### 2.3. Modeling Procedures

| Modeling setup | Real data | Simulated data |
|---|---|---|
| Target species | Cyathea cooperi (351 records) | Target species (1000 records) |
| Contaminating scenarios | 1 | 5 (I, II, III, IV and V) |
| Contaminating species | Cyathea medullaris (32 records) | Contaminating species of scenarios I–V (1000 records each) |
| Misidentification rates r | 7 (0, 1, 2, 4, 8, 16 and 100%) | 8 (0, 1, 2, 4, 8, 16, 32 and 100%) |
| Bootstrapping replicates | 250 * | 500 |
| Bootstrap sample size | 200 * | 200 |
| SDM outputs produced | 7 | 36 (7 settings of r (i.e., r > 0) times 5 scenarios plus one model without contamination (i.e., r = 0)) |
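To make the setup in the table concrete, the sketch below builds one contaminated bootstrap sample and counts the simulated model outputs. It rests on an assumption that is not confirmed by the text: that a misidentification rate r means a fraction r of the 200 records in each bootstrap sample is drawn from the contaminating species and the rest from the target species, both with replacement. The function `contaminated_bootstrap` and the record tuples are hypothetical.

```python
import random


def contaminated_bootstrap(target, contaminant, r, size=200, seed=None):
    """Build one bootstrap sample with a fraction r of misidentified records.

    Assumption: round(r * size) records come from the contaminating species
    and the remainder from the target species, all drawn with replacement.
    """
    rng = random.Random(seed)
    n_contaminant = round(r * size)
    sample = rng.choices(contaminant, k=n_contaminant)
    sample += rng.choices(target, k=size - n_contaminant)
    rng.shuffle(sample)
    return sample


# Placeholder records sized as in the real data column of the table.
target = [("target", i) for i in range(351)]
contaminant = [("contaminant", i) for i in range(32)]

sample = contaminated_bootstrap(target, contaminant, r=0.16, seed=1)
n_misidentified = sum(1 for label, _ in sample if label == "contaminant")

# Number of simulated outputs: every r > 0 in each of 5 scenarios, plus r = 0 once.
simulated_rates = [0.0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 1.0]
n_outputs = 5 * sum(1 for r in simulated_rates if r > 0) + 1
```

With r = 16% and a sample size of 200, 32 records are contaminating, and the count of simulated outputs reproduces the 36 listed in the table.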

#### 2.4. Analyses

**Table 2.** Summary of the analyses performed to examine the impacts of misidentification on the MaxEnt predictions.

| Analysis | Purpose | Methods |
|---|---|---|
| A | Understanding whether species misidentification caused a contraction or expansion of the predicted distribution of the target species. | Comparison of the estimates of the probability density function of the predictions produced with and without contaminated data. Estimates were obtained with the kernel method, namely the Gaussian smoothing kernel [51]. |
| B | Identifying the direction of the contraction or expansion effects identified in Analysis A, namely whether the contaminating data shifted the MaxEnt predictions of the species presence towards the distribution of the contaminating species. | Calculation of Schoener’s D comparing the MaxEnt predictions produced using contaminated data (1% ≤ r ≤ 32%) to both the predictions produced using the gold standard data set (r = 0%) and the contaminating species (r = 100%). |
| C | Assessing the magnitude of the effects identified in the previous two analyses. | Number of pixels for which the MaxEnt predictions produced using contaminated data and the gold standard data set differed significantly (i.e., the 95% confidence intervals of the SDM predictions did not overlap). |
| D | Assessing the potential impacts of species misidentification on practical applications that use SDM-based analysis to identify priority areas, such as in management. | Identification of the omission and commission errors committed using contaminated data while defining priority areas (i.e., the MaxEnt predictions that ranked in the top decile). |
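Analyses C and D in the table can be illustrated with a short Python sketch. It is a simplified reading, not the authors' implementation: confidence intervals are taken as given `(lower, upper)` pairs per pixel, priority areas are the pixels ranking in the top 10% of predictions, omission is taken to mean a gold standard priority pixel missed with contaminated data, and commission the reverse. All function names are hypothetical.

```python
def count_significant_changes(gold_ci, contaminated_ci):
    """Analysis C: count pixels whose 95% confidence intervals do not overlap."""
    changed = 0
    for (lo_g, hi_g), (lo_c, hi_c) in zip(gold_ci, contaminated_ci):
        overlap = lo_g <= hi_c and lo_c <= hi_g
        if not overlap:
            changed += 1
    return changed


def top_decile(predictions):
    """Return the set of pixel indices whose prediction ranks in the top 10%."""
    n_top = max(1, len(predictions) // 10)
    ranked = sorted(range(len(predictions)), key=lambda i: predictions[i], reverse=True)
    return set(ranked[:n_top])


def priority_errors(gold, contaminated):
    """Analysis D: omission and commission errors in the priority areas."""
    reference = top_decile(gold)
    candidate = top_decile(contaminated)
    omission = reference - candidate      # priority pixels lost
    commission = candidate - reference    # pixels wrongly promoted
    return omission, commission


# Hypothetical intervals for three pixels.
gold_ci = [(0.1, 0.2), (0.4, 0.6), (0.7, 0.9)]
contaminated_ci = [(0.15, 0.25), (0.65, 0.8), (0.2, 0.3)]
n_changed = count_significant_changes(gold_ci, contaminated_ci)

# Hypothetical predictions for twenty pixels; contamination promotes pixel 0.
gold = [i / 20 for i in range(20)]
contaminated = list(gold)
contaminated[0] = 5.0
omission, commission = priority_errors(gold, contaminated)
```

In this toy case two of the three pixels change significantly, pixel 18 is omitted from the priority areas, and pixel 0 is committed in its place.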

[…]$_{Xi} - p_{Yi}|$, where $p_{Xi}$ and $p_{Yi}$ are the normalized prediction values on location i of the SDM outputs being compared. Thus, Schoener’s D ranges between 0 and 1 and provides a measure of the similarity of two modeling outputs in the geographic space.
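For reference, Schoener’s D is commonly written as $D = 1 - \frac{1}{2}\sum_i |p_{Xi} - p_{Yi}|$, with each output normalized to sum to 1 over all locations [53,54]. The sketch below is a minimal Python version under that standard definition; the function name `schoeners_d` and the toy inputs are hypothetical, and the raw prediction surfaces are assumed to be flattened to lists of non-negative values.

```python
def schoeners_d(x, y):
    """Schoener's D between two prediction surfaces of equal length.

    Each surface is first normalized so that its values sum to 1.
    """
    total_x, total_y = sum(x), sum(y)
    p_x = [v / total_x for v in x]
    p_y = [v / total_y for v in y]
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p_x, p_y))


identical = schoeners_d([0.2, 0.5, 0.9], [0.2, 0.5, 0.9])   # same surface
rescaled = schoeners_d([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])    # same shape, different scale
disjoint = schoeners_d([1.0, 0.0], [0.0, 1.0])              # no shared locations
```

Because of the normalization, two outputs that differ only by a constant factor are treated as identical (D = 1), while outputs with no overlap in geographic space give D = 0.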

## 3. Results

**Figure 3.** Gaussian kernel density estimates of the MaxEnt predictions: (**a**) real data; (**b**) simulated data in scenario I; (**c**) simulated data in scenario II; (**d**) simulated data in scenario III; (**e**) simulated data in scenario IV; and (**f**) simulated data in scenario V.

**Figure 4.** Similarity (Schoener’s D) between the MaxEnt outputs produced with and without contaminated data: (**a**) comparison between the contaminated outputs (1% ≤ r ≤ 32%) and those produced using the gold standard data set (r = 0%); and (**b**) comparison between the contaminated outputs (1% ≤ r ≤ 32%) and those produced using the contaminating species (r = 100%).

**Figure 5.** Examples of predictions of presence of the real and simulated species produced by MaxEnt: (**a**) predictions of presence of C. cooperi produced using the gold standard data set (r = 0%); (**b**) predictions of presence of C. cooperi produced using contaminated data (r = 16%); (**c**) predictions of presence of the contaminating species C. medullaris (r = 100%); (**d**) predictions of presence of the simulated target species using the gold standard data set (r = 0%); (**e**) predictions of presence of the simulated target species using contaminated data in scenario III (r = 16%); and (**f**) predictions of presence of the simulated contaminating species in scenario III (r = 100%). Notes: greyscale from black (high prediction values) to white (low prediction values). Black arrows in parts (b) and (c) highlight an area where the influence of the distribution of the contaminating species over that of the target species is particularly noticeable when using contaminated data.

**Figure 6.** Magnitude of the effects caused by species misidentification on the MaxEnt predictions: (**a**) proportion of the predictions that differed significantly at the various misidentification rates for the real and simulated species; (**b**) location of the predictions that changed significantly (in black) for C. cooperi using the misidentification rate of 1%; (**c**) location of the predictions that changed significantly (in black) for C. cooperi using the misidentification rate of 4%; and (**d**) location of the predictions that changed significantly (in black) for C. cooperi using the misidentification rate of 16%.

**Figure 7.** Omission and commission errors committed by MaxEnt while defining priority areas using contaminated data: (**a**) omission or commission error committed at the various misidentification rates for the real and simulated species; (**b**) location of the omission and commission errors committed for C. cooperi using the misidentification rate of 1%; (**c**) location of the omission and commission errors committed for C. cooperi using the misidentification rate of 4%; and (**d**) location of the omission and commission errors committed for C. cooperi using the misidentification rate of 16%. Note: The priority areas that were correctly defined using contaminated data are shown in black in parts (b), (c) and (d).

**Figure 8.** AUC values of the MaxEnt outputs produced using the simulated data as a function of the misidentification error rate.

## 4. Discussion

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Guisan, A.; Thuiller, W. Predicting species distribution: Offering more than simple habitat models. Ecol. Lett. **2005**, 8, 993–1009.
2. Elith, J.; Leathwick, J.R. Species distribution models: Ecological explanation and prediction across space and time. Annu. Rev. Ecol. Evolut. Syst. **2009**, 40, 677–697.
3. Guisan, A.; Zimmermann, N.E. Predictive habitat distribution models in ecology. Ecol. Model. **2000**, 135, 147–186.
4. Barry, S.; Elith, J. Error and uncertainty in habitat models. J. Appl. Ecol. **2006**, 43, 413–423.
5. Yañez-Arenas, C.; Guevara, R.; Martínez-Meyer, E.; Mandujano, S.; Lobo, J.M. Predicting species’ abundances from occurrence data: Effects of sample size and bias. Ecol. Model. **2014**, 294, 36–41.
6. Moudrý, V.; Šímová, P. Influence of positional accuracy, sample size and scale on modelling species distributions: A review. Int. J. Geogr. Inf. Sci. **2012**, 26, 2083–2095.
7. Syfert, M.M.; Smith, M.J.; Coomes, D.A. The effects of sampling bias and model complexity on the predictive performance of MaxEnt species distribution models. PLoS ONE **2013**, 8.
8. Hortal, J.; Lobo, J.M.; Jiménez-Valverde, A. Limitations of biodiversity databases: Case study on seed-plant diversity in Tenerife, Canary Islands. Conserv. Biol. **2007**, 21, 853–863.
9. Guisan, A.; Graham, C.H.; Elith, J.; Huettmann, F.; The NCEAS Species Distribution Modelling Group. Sensitivity of predictive species distribution models to change in grain size. Divers. Distrib. **2007**, 13, 332–340.
10. Graham, C.H.; Ferrier, S.; Huettmann, F.; Moritz, C.; Peterson, A.T. New developments in museum-based informatics and applications in biodiversity analysis. Trend. Ecol. Evol. **2004**, 19, 497–503.
11. Phillips, S.J.; Anderson, R.P.; Schapire, R.E. Maximum entropy modeling of species geographic distributions. Ecol. Model. **2006**, 190, 231–259.
12. Hanberry, B.B.; He, H.S.; Dey, D.C. Sample sizes and model comparison metrics for species distribution models. Ecol. Model. **2012**, 227, 29–33.
13. Hefley, T.J.; Baasch, D.M.; Tyre, A.J.; Blankenship, E.E. Correction of location errors for presence-only species distribution models. Method. Ecol. Evolut. **2014**, 5, 207–214.
14. Kramer-Schadt, S.; Niedballa, J.; Pilgrim, J.D.; Schröder, B.; Lindenborn, J.; Reinfelder, V.; Stillfried, M.; Heckmann, I.; Scharf, A.K.; Augeri, D.M.; et al. The importance of correcting for sampling bias in MaxEnt species distribution models. Divers. Distrib. **2013**, 19, 1366–1379.
15. Boria, R.A.; Olson, L.E.; Goodman, S.M.; Anderson, R.P. Spatial filtering to reduce sampling bias can improve the performance of ecological niche models. Ecol. Model. **2014**, 275, 73–77.
16. Varela, S.; Anderson, R.P.; García-Valdés, R.; Fernández-González, F. Environmental filters reduce the effects of sampling bias and improve predictions of ecological niche models. Ecography **2014**, 37, 1084–1091.
17. Dormann, C.F.; McPherson, J.M.; Araújo, M.B.; Bivand, R.; Bolliger, J.; Carl, G.; Davies, R.G.; Hirzel, A.; Jetz, W.; Kissling, W.D.; et al. Methods to account for spatial autocorrelation in the analysis of species distributional data: A review. Ecography **2007**, 30, 609–628.
18. Santika, T.; Hutchinson, M.F. The effect of species response form on species distribution model prediction and inference. Ecol. Model. **2009**, 220, 2365–2379.
19. Václavík, T.; Kupfer, J.A.; Meentemeyer, R.K. Accounting for multi-scale spatial autocorrelation improves performance of invasive species distribution modelling (iSDM). J. Biogeogr. **2012**, 39, 42–55.
20. de Oliveira, G.; Rangel, T.F.; Lima-Ribeiro, M.S.; Terribile, L.C.; Diniz-Filho, J.A.F. Evaluating, partitioning, and mapping the spatial autocorrelation component in ecological niche modeling: A new approach based on environmentally equidistant records. Ecography **2014**, 37, 637–647.
21. Meyer, C.B.; Thuiller, W. Accuracy of resource selection functions across spatial scales. Divers. Distrib. **2006**, 12, 288–297.
22. Fernandes, R.; Vicente, J.; Georges, D.; Alves, P.; Thuiller, W.; Honrado, J. A novel downscaling approach to predict plant invasions and improve local conservation actions. Biol. Invasion. **2014**, 16, 2577–2590.
23. Alldredge, M.W.; Pacifici, K.; Simons, T.R.; Pollock, K.H. A novel field evaluation of the effectiveness of distance and independent observer sampling to estimate aural avian detection probabilities. J. Appl. Ecol. **2008**, 45, 1349–1356.
24. Bailey, L.L.; MacKenzie, D.I.; Nichols, J.D. Advances and applications of occupancy models. Method. Ecol. Evolut. **2013**, 5, 1269–1279.
25. Archaux, F.; Gosselin, F.; Bergès, L.; Chevalier, R. Effects of sampling time, species richness and observer on the exhaustiveness of plant censuses. J. Veg. Sci. **2006**, 17, 299–306.
26. Tillett, B.J.; Field, I.C.; Bradshaw, C.J.A.; Johnson, G.; Buckworth, R.C.; Meekan, M.G.; Ovenden, J.R. Accuracy of species identification by fisheries observers in a north Australian shark fishery. Fish. Res. **2012**, 127–128, 109–115.
27. Hull, J.M.; Fish, A.M.; Keane, J.J.; Mori, S.R.; Sacks, B.N.; Hull, A.C. Estimation of species identification error: Implications for raptor migration counts and trend estimation. J. Wildl. Manag. **2010**, 74, 1326–1334.
28. Shea, C.P.; Peterson, J.T.; Wisniewski, J.M.; Johnson, N.A. Misidentification of freshwater mussel species (Bivalvia: Unionidae): Contributing factors, management implications, and potential solutions. J. North. Am. Benthol. Soc. **2011**, 30, 446–458.
29. Meier, R.; Dikow, T. Significance of specimen databases from taxonomic revisions for estimating and mapping the global species diversity of invertebrates and repatriating reliable specimen data. Conserv. Biol. **2004**, 18, 478–488.
30. Scott, W.A.; Hallam, C. Assessing species misidentification rates through quality assurance of vegetation monitoring. Plant Ecol. **2003**, 165, 101–115.
31. Ensing, D.J.; Moffat, C.E.; Pither, J. Taxonomic identification errors generate misleading ecological niche model predictions of an invasive hawkweed. Botany **2012**, 91, 137–147.
32. Molinari-Jobin, A.; Kéry, M.; Marboutin, E.; Molinari, P.; Koren, I.; Fuxjäger, C.; Breitenmoser-Würsten, C.; Wölfl, S.; Fasel, M.; Kos, I.; et al. Monitoring in the presence of species misidentification: The case of the Eurasian lynx in the Alps. Anim. Conserv. **2012**, 15, 266–273.
33. Beerkircher, L.; Arocha, F.; Barse, A.; Prince, E.; Restrepo, V.; Serafy, J.; Shivji, M. Effects of species misidentification on population assessment of overfished white marlin Tetrapturus albidus and roundscale spearfish T. georgii. Endanger. Species Res. **2009**, 9, 81–90.
34. Silva, L.; Ojeda-Land, E.; Rodríguez-Luengo, J.L. Invasive Terrestrial Flora and Fauna of Macaronesia. Top 100 in Azores, Madeira and Canaries. ARENA: Ponta Delgada, Portugal, 2008.
35. Borges, P.A.V.; Amorim, I.R.; Cunha, R.; Gabriel, R.; Martins, A.F.; Silva, L.; Costa, A.; Vieira, V. Azores. In Encyclopedia of Islands; Gillespie, R.G., Clague, D.A., Eds.; University of California Press: Oakland, CA, USA, 2009; pp. 70–75.
36. Robinson, R.C.; Sheffield, E.; Sharpe, J.M. Problem ferns: Their impact and management. In Fern Ecology; Mehltreter, K., Walker, L.R., Sharpe, J.M., Eds.; Cambridge University Press: Cambridge, UK, 2010; pp. 255–322.
37. Azevedo, E.B.; Pereira, L.S.; Itier, B. Modelling the local climate in island environments: Water balance applications. Agric. Water Manag. **1999**, 40, 393–403.
38. Projectos Climaat e Climarcost. Available online: www.climaat.angra.uac.pt (accessed on 10 November 2015).
39. Theodoridis, S.; Koutroumbas, K. Pattern Recognition, 4th ed.; Academic Press: Burlington, MA, USA, 2009.
40. Fielding, A.H.; Bell, J.F. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. **1997**, 24, 38–49.
41. Maxent Software for Species Habitat Modeling. Available online: http://www.cs.princeton.edu/~schapire/maxent (accessed on 10 November 2015).
42. Elith, J.; Phillips, S.J.; Hastie, T.; Dudík, M.; Chee, Y.E.; Yates, C.J. A statistical explanation of MaxEnt for ecologists. Divers. Distrib. **2011**, 17, 43–57.
43. Phillips, S.J.; Dudík, M. Modeling of species distributions with Maxent: New extensions and a comprehensive evaluation. Ecography **2008**, 31, 161–175.
44. Royle, J.A.; Chandler, R.B.; Yackulic, C.; Nichols, J.D. Likelihood analysis of species occurrence probability from presence-only data for modelling species distributions. Method. Ecol. Evolut. **2012**, 3, 545–554.
45. Phillips, S.J.; Elith, J. On estimating probability of presence from use-availability or presence-background data. Ecology **2013**, 94, 1409–1419.
46. Yackulic, C.B.; Chandler, R.; Zipkin, E.F.; Royle, J.A.; Nichols, J.D.; Campbell Grant, E.H.; Veran, S. Presence-only modelling using Maxent: When can we trust the inferences? Method. Ecol. Evolut. **2013**, 4, 236–243.
47. Merow, C.; Smith, M.J.; Silander, J.A. A practical guide to Maxent for modeling species’ distributions: What it does, and why inputs and settings matter. Ecography **2013**, 36, 1058–1069.
48. Strubbe, D.; Matthysen, E. Predicting the potential distribution of invasive ring-necked parakeets Psittacula krameri in Northern Belgium using an ecological niche modelling approach. Biol. Invasion. **2009**, 11, 497–513.
49. Costa, H.; Aranda, S.C.; Lourenço, P.; Medeiros, V.; de Azevedo, E.B.; Silva, L. Predicting successful replacement of forest invaders by native species using species distribution models: The case of Pittosporum undulatum and Morella faya in the Azores. For. Ecol. Manag. **2012**, 279, 90–96.
50. Freeman, E.A.; Moisen, G. PresenceAbsence: An R package for presence absence analysis. J. Stat. Softw. **2008**, 23, 1–31.
51. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman and Hall: London, UK, 1986; Volume 26.
52. Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer: New York, NY, USA, 2009; p. 212.
53. Schoener, T.W. The Anolis lizards of Bimini: Resource partitioning in a complex fauna. Ecology **1968**, 49, 704–726.
54. Warren, D.L.; Glor, R.E.; Turelli, M.; Funk, D. Environmental niche equivalency versus conservatism: Quantitative approaches to niche evolution. Evolution **2008**, 62, 2868–2883.
55. Crall, A.; Newman, G.; Jarnevich, C.; Stohlgren, T.; Waller, D.; Graham, J. Improving and integrating data on invasive species collected by citizen scientists. Biol. Invasion. **2010**, 12, 3419–3428.
56. Fitzpatrick, M.C.; Preisser, E.L.; Ellison, A.M.; Elkinton, J.S. Observer bias and the detection of low-density populations. Ecol. Appl. **2009**, 19, 1673–1679.
57. Soley-Guardia, M.; Radosavljevic, A.; Rivera, J.L.; Anderson, R.P. The effect of spatially marginal localities in modelling species niches and distributions. J. Biogeogr. **2014**, 41, 1390–1401.
58. Václavík, T.; Meentemeyer, R.K. Equilibrium or not? Modelling potential distribution of invasive species in different stages of invasion. Divers. Distrib. **2012**, 18, 73–83.
59. Broennimann, O.; Guisan, A. Predicting current and future biological invasions: Both native and invaded ranges matter. Biol. Lett. **2008**, 4, 585–589.
60. Jiménez-Valverde, A.; Peterson, A.T.; Soberón, J.; Overton, J.M.; Aragón, P.; Lobo, J.M. Use of niche models in invasive species risk assessments. Biol. Invasion. **2011**, 13, 2785–2797.
61. Lobo, J.M. More complex distribution models or more representative data? Biodivers. Inform. **2008**, 5, 14–19.
62. Cayuela, L.; Golicher, D.J.; Newton, A.C.; Kolb, M.; de Alburquerque, F.S.; Arets, E.J.M.M.; Alkemade, J.R.M.; Pérez, A.M. Species distribution modeling in the tropics: Problems, potentialities, and the role of biological data for effective species conservation. Trop. Conserv. Sci. **2009**, 2, 319–352.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Costa, H.; Foody, G.M.; Jiménez, S.; Silva, L.
Impacts of Species Misidentification on Species Distribution Modeling with Presence-Only Data. *ISPRS Int. J. Geo-Inf.* **2015**, *4*, 2496-2518.
https://doi.org/10.3390/ijgi4042496
