Identifying Reliable Opportunistic Data for Species Distribution Modeling: A Benchmark Data Optimization Approach
Abstract: The purpose of this study is to increase the number of usable species occurrence records by integrating opportunistic data with Global Biodiversity Information Facility (GBIF) benchmark data via a novel optimization technique. The optimization method uses Natural Language Processing (NLP) and a simulated annealing (SA) algorithm to maximize the average likelihood of species occurrence in maximum entropy presence-only species distribution models (SDMs). We applied the Kruskal–Wallis test to assess differences in the corresponding environmental variables and habitat suitability indices (HSI) among datasets: GBIF data, Facebook (FB) data, and optimally selected FB data. To quantify uncertainty in SDM predictions and the efficacy of the proposed optimization procedure, we used a bootstrapping approach to generate 1000 subsets from each of five datasets: (1) GBIF; (2) FB; (3) GBIF plus FB; (4) GBIF plus optimally selected FB; and (5) GBIF plus randomly selected FB. We compared the performance of the species distributions simulated from each of these subsets using the area under the curve (AUC) of the receiver operating characteristic (ROC). We also performed a correlation analysis between the average benchmark-based SDM outputs and the average outputs of the SDMs based on each dataset. Median AUCs of SDMs based on the dataset that combined benchmark GBIF data with optimally selected FB data were generally higher than those of the other datasets, indicating the effectiveness of the optimization procedure. Our results suggest that the proposed approach increases both the quality and the quantity of data by effectively extracting opportunistic data from large unstructured datasets with respect to benchmark data.
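The selection step described in the abstract can be illustrated with a minimal simulated annealing sketch. This is not the authors' implementation: the function `anneal_subset`, its parameters, and the swap-one-record move are assumptions made for illustration, and `objective` is a hypothetical stand-in for the average occurrence likelihood predicted by an SDM fitted on benchmark data plus the selected opportunistic records. The toy objective below simply averages made-up per-record scores.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def anneal_subset(
    n_candidates: int,
    k: int,
    objective: Callable[[Sequence[int]], float],
    steps: int = 2000,
    t0: float = 1.0,
    cooling: float = 0.995,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Search for k of n_candidates record indices that maximize objective.

    All names and defaults here are illustrative assumptions.
    """
    if not 0 < k < n_candidates:
        raise ValueError("k must satisfy 0 < k < n_candidates")
    rng = random.Random(seed)
    selected = rng.sample(range(n_candidates), k)
    chosen = set(selected)
    unselected = [i for i in range(n_candidates) if i not in chosen]
    current = objective(selected)
    best_subset, best = list(selected), current
    temperature = t0
    for _ in range(steps):
        # Propose swapping one selected record with one unselected record.
        a = rng.randrange(k)
        b = rng.randrange(len(unselected))
        selected[a], unselected[b] = unselected[b], selected[a]
        candidate = objective(selected)
        delta = candidate - current
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
            if current > best:
                best_subset, best = list(selected), current
        else:
            # Rejected: undo the swap.
            selected[a], unselected[b] = unselected[b], selected[a]
        temperature *= cooling  # geometric cooling schedule
    return sorted(best_subset), best


# Toy stand-in objective: mean of made-up per-record suitability scores.
toy_rng = random.Random(42)
toy_scores = [toy_rng.random() for _ in range(30)]


def mean_score(subset: Sequence[int]) -> float:
    return sum(toy_scores[i] for i in subset) / len(subset)


subset, value = anneal_subset(len(toy_scores), k=10, objective=mean_score)
print(subset, round(value, 3))
```

In a real workflow the objective would refit or re-evaluate the SDM for each proposed subset, which is far more expensive than this toy function, so the number of steps and the cooling rate would need tuning.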
Lin, Y.-P.; Lin, W.-C.; Lien, W.-Y.; Anthony, J.; Petway, J.R. Identifying Reliable Opportunistic Data for Species Distribution Modeling: A Benchmark Data Optimization Approach. Environments 2017, 4, 81.