Open Access Article
Environments 2017, 4(4), 81

Identifying Reliable Opportunistic Data for Species Distribution Modeling: A Benchmark Data Optimization Approach

Department of Bioenvironmental Systems Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
Geographic Information Technology Co., 4F.No. 310, Sec. 4, Zhongxiao E. Rd., Taipei 10694, Taiwan
Author to whom correspondence should be addressed.
Received: 29 September 2017 / Revised: 4 November 2017 / Accepted: 7 November 2017 / Published: 14 November 2017


Abstract: This study aims to increase the quantity of species occurrence data by integrating opportunistic data with Global Biodiversity Information Facility (GBIF) benchmark data via a novel optimization technique. The optimization method uses Natural Language Processing (NLP) and a simulated annealing (SA) algorithm to maximize the average likelihood of species occurrence in maximum entropy presence-only species distribution models (SDMs). We applied the Kruskal–Wallis test to assess differences in the corresponding environmental variables and habitat suitability indices (HSI) among three datasets: GBIF data, Facebook (FB) data, and optimally selected FB data. To quantify uncertainty in SDM predictions and the efficacy of the proposed optimization procedure, we used a bootstrapping approach to generate 1000 subsets from five different datasets: (1) GBIF; (2) FB; (3) GBIF plus FB; (4) GBIF plus optimally selected FB; and (5) GBIF plus randomly selected FB. We compared the performance of the simulated species distributions based on each of these subsets using the area under the curve (AUC) of the receiver operating characteristic (ROC). We also performed a correlation analysis between the average benchmark-based SDM outputs and the average dataset-based SDM outputs. Median AUCs of SDMs based on the dataset that combined benchmark GBIF data with optimally selected FB data were generally higher than the AUCs of the other datasets, indicating that the optimization procedure is effective. Our results suggest that the proposed approach increases both the quality and the quantity of data by effectively extracting opportunistic data from large unstructured datasets with respect to benchmark data.
Keywords: optimal data selection; data combination; opportunistic data; species modeling
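
The selection step described in the abstract can be illustrated with a minimal simulated annealing sketch. It is not the authors' implementation: the function names, the cooling schedule, and the toy objective are assumptions made for illustration only. In the study, the objective is the average likelihood of species occurrence from a maximum entropy SDM; here a hypothetical stand-in averages fixed suitability scores over the benchmark records plus whichever opportunistic records are currently selected.

```python
import math
import random
from typing import Callable, List, Tuple


def anneal_subset(
    n_candidates: int,
    objective: Callable[[List[bool]], float],
    n_iter: int = 5000,
    t0: float = 1.0,
    cooling: float = 0.999,
    seed: int = 0,
) -> Tuple[List[bool], float]:
    """Search for an inclusion mask over candidate records that maximizes `objective`.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_candidates)]
    score = objective(mask)
    best_mask, best_score = mask[:], score
    temp = t0
    for _ in range(n_iter):
        i = rng.randrange(n_candidates)
        mask[i] = not mask[i]  # propose adding or dropping one record
        new_score = objective(mask)
        delta = new_score - score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            score = new_score
            if score > best_score:
                best_mask, best_score = mask[:], score
        else:
            mask[i] = not mask[i]  # reject: undo the flip
        temp *= cooling  # geometric cooling
    return best_mask, best_score


# Hypothetical suitability scores: three benchmark records and six
# opportunistic candidates. A real objective would refit the SDM instead.
BENCHMARK = [0.8, 0.7, 0.9]
CANDIDATES = [0.95, 0.10, 0.88, 0.20, 0.92, 0.05]


def toy_objective(mask: List[bool]) -> float:
    """Mean score over benchmark records plus the selected candidates."""
    chosen = BENCHMARK + [s for s, keep in zip(CANDIDATES, mask) if keep]
    return sum(chosen) / len(chosen)


if __name__ == "__main__":
    selected, value = anneal_subset(len(CANDIDATES), toy_objective)
    print(selected, round(value, 4))
```

The sketch returns the best mask visited rather than the final state, since the chain can still drift away from the optimum at low but nonzero temperature. With a real SDM objective, each evaluation requires refitting the model, so the iteration budget would likely need to be far smaller than in this toy case.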

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Lin, Y.-P.; Lin, W.-C.; Lien, W.-Y.; Anthony, J.; Petway, J.R. Identifying Reliable Opportunistic Data for Species Distribution Modeling: A Benchmark Data Optimization Approach. Environments 2017, 4, 81.

Environments, EISSN 2076-3298, published by MDPI AG, Basel, Switzerland