Open Access Data Descriptor
CHASE-PL—Future Hydrology Data Set: Projections of Water Balance and Streamflow for the Vistula and Odra Basins, Poland
Data 2017, 2(2), 14; doi:10.3390/data2020014 -
Abstract
There is considerable concern that the water resources of the Central and Eastern European region can be adversely affected by climate change. Projections of future water balance and streamflow conditions can be obtained by forcing hydrological models with the output from climate models. In this study, we employed the SWAT hydrological model driven with an ensemble of nine bias-corrected EURO-CORDEX climate simulations to generate future hydrological projections for the Vistula and Odra basins in two future horizons (2024–2050 and 2074–2100) under two Representative Concentration Pathways (RCPs). The data set consists of three parts: (1) model inputs; (2) raw model outputs; (3) aggregated model outputs. The first part allows users to reproduce the outputs or to create new ones. The second part contains the time series of 10 variables simulated by SWAT: precipitation, snow melt, potential evapotranspiration, actual evapotranspiration, soil water content, percolation, surface runoff, baseflow, water yield and streamflow. The third part consists of the multi-model ensemble statistics of the relative changes in mean seasonal and annual variables, provided in a GIS format. The data set should be of interest to climate impact scientists, water managers and water-sector policy makers. It should be noted, however, that the projections included in this data set are associated with the high uncertainties explained in this data descriptor paper. Full article
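The aggregated outputs in part (3) are multi-model ensemble statistics of relative changes. A minimal sketch of how such statistics can be derived from simulated time series — the function names, the median/range choice, and the numbers below are illustrative, not the dataset's exact conventions:

```python
def relative_change(baseline, future):
    """Percent change of the future-horizon mean relative to the baseline mean."""
    b = sum(baseline) / len(baseline)
    f = sum(future) / len(future)
    return 100.0 * (f - b) / b

def ensemble_stats(changes):
    """Median, minimum and maximum relative change across the model ensemble."""
    s = sorted(changes)
    n = len(s)
    median = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    return median, s[0], s[-1]

# hypothetical relative changes (%) in mean annual streamflow from nine models
changes = [-8.0, -3.0, 0.0, 2.0, 4.0, 5.0, 7.0, 11.0, 15.0]
median, low, high = ensemble_stats(changes)
```

Reporting the ensemble spread alongside the median is one simple way to convey the high uncertainty the descriptor warns about.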
Open Access Data Descriptor
Open Access Article Processing Charges (OA APC) Longitudinal Study 2016 Dataset
Data 2017, 2(2), 13; doi:10.3390/data2020013 -
Abstract
This article documents the Open Access Article Processing Charges (OA APC) Main 2016 dataset. This dataset was developed as part of a longitudinal study of the minority (about a third) of fully open access journals that use the APC business model. APC data for 2016, 2015, 2014, and 2013 are primarily obtained from publishers’ websites, a process that requires analytic skill, as many publishers offer a diverse range of pricing options, including multiple currencies and/or differential pricing by article type, length or work involved, and/or discounts for author contributions to editing or to the society publisher, or based on perceived ability to pay. This version of the dataset draws heavily on the work of Walt Crawford and includes his entire 2011–2015 dataset; in particular, Crawford’s work has made it possible to confirm “no publication fee” status for a large number of journals. DOAJ metadata for 2016 and 2014 and a 2010 APC sample provided by Solomon and Björk are part of the dataset. The inclusion of DOAJ metadata and of article counts by Crawford and by Solomon and Björk provides a basis for studies of factors such as journal size, subject, or country of publication that might be worth testing for correlation with business model and/or APC size. Full article
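A minimal sketch of the kind of summary statistics such a dataset supports — the field name `apc`, the no-fee convention (0), and the figures are invented for illustration:

```python
def apc_summary(journals):
    """Share of no-fee journals and the median APC among fee-charging ones.

    `journals` is a list of dicts with a numeric 'apc' field (0 = no publication fee).
    """
    fees = sorted(j["apc"] for j in journals if j["apc"] > 0)
    no_fee_share = sum(1 for j in journals if j["apc"] == 0) / len(journals)
    n = len(fees)
    median = fees[n // 2] if n % 2 else 0.5 * (fees[n // 2 - 1] + fees[n // 2])
    return no_fee_share, median

# toy records: two no-fee journals, three fee-charging ones (USD)
share, median_apc = apc_summary([
    {"apc": 0}, {"apc": 0}, {"apc": 1500}, {"apc": 2000}, {"apc": 1000},
])
```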
Open Access Data Descriptor
Ecological and Functional Traits in 99 Bird Species over a Large-Scale Gradient in Germany
Data 2017, 2(2), 12; doi:10.3390/data2020012 -
Abstract
A gap still exists in published data on the variation of morphological and ecological traits of common bird species over a large area. To diminish this knowledge gap, we report here average values of 24 ecological and functional traits for 99 bird species from three sites of the Biodiversity Exploratories in Germany. We present our own data on morphological and ecological traits of 28 common bird species and provide additional measurements for further species from published studies. This is a unique data set from live birds, which has not been published before and is available in the presented coverage neither from museums nor from any other collection. Dataset: available as the supplementary file. Dataset license: CC-BY Full article
Open Access Erratum
Erratum: Morrison, H., et al. Open Access Article Processing Charges (OA APC) Longitudinal Study 2015 Preliminary Dataset
Data 2017, 2(1), 11; doi:10.3390/data2010011 -
Abstract The authors wish to make the following corrections to their paper [...] Full article
Open Access Data Descriptor
Herbarium of the Pontifical Catholic University of Paraná (HUCP), Curitiba, Southern Brazil
Data 2017, 2(1), 10; doi:10.3390/data2010010 -
Abstract
The main objective of this paper is to present the herbarium of the Pontifical Catholic University of Paraná (HUCP) and its collection. The history of the HUCP began in the middle of the 1970s with the foundation of the Biology Museum, which gathered both botanical and zoological specimens. In April 1979, the collections were separated and the HUCP was founded with preserved specimens of algae (green, red, and brown), fungi, and embryophytes. As of October 2016, the collection encompasses nearly 25,000 specimens from 4934 species, 1609 genera, and 297 families. Most of the specimens come from the state of Paraná, but there are also specimens from many other Brazilian states and from other countries, mainly from South America (Chile, Argentina, Uruguay, Paraguay, and Colombia) but also from other parts of the world (Cuba, USA, Spain, Germany, China, and Australia). Our collection includes 42 fungi, 258 gymnosperms, 299 bryophytes, 2809 pteridophytes, 3158 algae, 17,832 angiosperms, and a single type specimen, of Mimosa tucumensis Barneby ex Ribas, M. Morales & Santos-Silva (Fabaceae). We also offer botanical education and education-for-sustainability programs for basic and high school students, as well as training for teachers. Full article
Open Access Article
The Effectiveness of Geographical Data in Multi-Criteria Evaluation of Landscape Services †
Data 2017, 2(1), 9; doi:10.3390/data2010009 -
Abstract
The aim of the paper is to map and evaluate the state of the multifunctional landscape of the municipality of Naples (Italy) and its surroundings through a Spatial Decision Support System (SDSS) combining a geographic information system (GIS) with a multi-criteria method, the analytic hierarchy process (AHP). We conceive a knowledge-mapping-evaluation (KME) framework in order to investigate the landscape as a complex system. The proposed methodology focuses on data gathering and processing. Both authoritative and unofficial sources, e.g., volunteered geographic information (VGI), are useful for enhancing the information flow whenever quality assurance is performed. The maps of spatial criteria are thus useful for problem structuring and prioritization, considering the availability of context-aware data. Finally, the identification of landscape services (LS) and ecosystem services (ES) can improve decision-making processes within a multi-stakeholder perspective involving the evaluation of trade-offs. The results show multi-criteria choropleth maps of the LS and ES with the density of services, their spatial distribution, and the surrounding benefits. Full article
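The AHP step can be sketched as follows: criteria are compared pairwise on Saaty's 1–9 scale, and a priority vector is derived — here via the common row geometric-mean approximation rather than the principal eigenvector, with invented criteria and judgments:

```python
def ahp_weights(pairwise):
    """Priority vector from a pairwise comparison matrix (row geometric-mean method)."""
    n = len(pairwise)
    gm = []
    for row in pairwise:
        product = 1.0
        for value in row:
            product *= value
        gm.append(product ** (1.0 / n))
    total = sum(gm)
    return [g / total for g in gm]

# hypothetical judgment: "ecological value" is moderately (3x) more important
# than "recreation potential"; the matrix is reciprocal by construction
weights = ahp_weights([[1.0, 3.0],
                       [1.0 / 3.0, 1.0]])
```

For a consistent matrix the geometric-mean weights coincide with the eigenvector solution, which is why it is a popular shortcut in GIS multi-criteria workflows.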
Open Access Data Descriptor
Data on Healthy Food Accessibility in Amsterdam, The Netherlands
Data 2017, 2(1), 7; doi:10.3390/data2010007 -
Abstract
This data descriptor introduces data on healthy food supplied by supermarkets in the city of Amsterdam, The Netherlands. In addition to two neighborhood variables (i.e., share of autochthons and average housing values), the data comprises three street network-based accessibility measures derived from analyses using a geographic information system. Data are provided on a spatial micro-scale utilizing grid cells with a spatial resolution of 100 m. We explain how the data were collected and pre-processed, and how alternative analyses can be set up. To illustrate the use of the data, an example is provided using the R programming language. Full article
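The published example uses R and street-network distances; to illustrate the kind of accessibility measure involved, the sketch below substitutes straight-line distance and invented coordinates (a real analysis would replace `math.hypot` with a network shortest-path query):

```python
import math

def nearest_supermarket(cell, supermarkets):
    """Distance from a grid-cell centroid to the closest supermarket,
    in the same units as the coordinates (here: metres)."""
    return min(math.hypot(cell[0] - sx, cell[1] - sy) for sx, sy in supermarkets)

# hypothetical 100 m grid-cell centroid and two supermarket locations (metres)
d = nearest_supermarket((50, 50), [(350, 450), (1050, 50)])
```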
Open Access Article
An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data
Data 2017, 2(1), 8; doi:10.3390/data2010008 -
Abstract
Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models. Full article
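Of the four methods compared, mean imputation is the simplest; k-nearest neighbors fills a missing field from the most similar complete records. Toy sketches of both, with `None` marking missing values (not the paper's implementation):

```python
def mean_impute(column):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def knn_impute(complete_rows, target_row, missing_idx, k=2):
    """Fill one missing field with the mean of that field over the k complete
    rows closest to the target on the observed fields (toy k-NN imputer)."""
    def dist(row):
        return sum((row[i] - target_row[i]) ** 2
                   for i in range(len(row)) if i != missing_idx)
    nearest = sorted(complete_rows, key=dist)[:k]
    return sum(row[missing_idx] for row in nearest) / k
```

In practice one would use a library imputer (e.g., scikit-learn's `KNNImputer`) rather than hand-rolled code, but the logic is the same.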
Open Access Technical Note
Determination of Concentration of the Aqueous Lithium–Bromide Solution in a Vapour Absorption Refrigeration System by Measurement of Electrical Conductivity and Temperature
Data 2017, 2(1), 6; doi:10.3390/data2010006 -
Abstract
Lithium–bromide/water (LiBr/water) pairs are widely used as the working medium in vapour absorption refrigeration systems, where the maximum expected temperature and LiBr mass concentration in solution are usually 95 °C and 65%, respectively. Unfortunately, published data on the electrical conductivity of aqueous lithium–bromide solution are few and contradictory. The objective of this paper is to develop an empirical equation for determining the concentration of the aqueous lithium–bromide solution during the operation of a vapour absorption refrigeration system when the electrical conductivity and temperature of the solution are known. The present study experimentally investigated the electrical conductivity of aqueous lithium–bromide solution at temperatures ranging from 25 °C to 95 °C and concentrations ranging from 45% to 65% by mass, using a submersion toroidal conductivity sensor connected to a conductivity meter. The results of the tests have shown this method to be an accurate and efficient way to determine the concentration of aqueous lithium–bromide solution in a vapour absorption refrigeration system. Full article
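The general shape of such an empirical calibration can be sketched as an ordinary least-squares fit of X = a0 + a1·κ + a2·T to (conductivity, temperature, concentration) samples. The functional form, the coefficient values, and the data below are synthetic illustrations, not the paper's measured equation:

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_concentration(samples):
    """Least-squares coefficients (a0, a1, a2) of X = a0 + a1*kappa + a2*T
    from (kappa, T, X) samples, via the normal equations."""
    rows = [[1.0, k, t] for k, t, _ in samples]
    y = [x for _, _, x in samples]
    m = len(rows)
    AtA = [[sum(rows[i][p] * rows[i][q] for i in range(m)) for q in range(3)]
           for p in range(3)]
    Aty = [sum(rows[i][p] * y[i] for i in range(m)) for p in range(3)]
    return solve(AtA, Aty)

# synthetic samples generated from X = 20 + 0.5*kappa - 0.1*T
samples = [(k, t, 20 + 0.5 * k - 0.1 * t)
           for k in (100.0, 200.0, 300.0) for t in (25.0, 60.0, 95.0)]
a0, a1, a2 = fit_concentration(samples)
```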
Open Access Article
Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure
Data 2017, 2(1), 5; doi:10.3390/data2010005 -
Abstract
The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their abilities to summarize the observed data. We extend a Bayesian Rule Learning (BRL-GSS) algorithm, previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters and therefore the number of rules are combinatorial in the number of predictor variables in the model. We relax these global constraints to learn a more expressive local structure with BRL-LSS. BRL-LSS entails a more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design the BRL-LSS with the same worst-case time-complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using Area Under the ROC curve (AUC) and Accuracy. We measure model parsimony performance by noting the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS and the state-of-the-art C4.5 decision tree algorithm, across 10-fold cross-validation using ten microarray gene-expression diagnostic datasets. In these experiments, we observe that BRL-LSS is similar to BRL-GSS in terms of predictive performance, while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. 
We also conduct a feasibility study to demonstrate the general applicability of our BRL methods on the newer RNA sequencing gene-expression data. Full article
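The IF-THEN rulesets that BRL infers can be applied as an ordered rule list: the first rule whose conditions all match determines the class. A minimal sketch, with gene names and discretized levels invented for illustration:

```python
def apply_rules(rules, sample, default):
    """Return the class label of the first rule whose conditions all match;
    `rules` is an ordered list of (conditions-dict, label) pairs."""
    for conditions, label in rules:
        if all(sample.get(feature) == value for feature, value in conditions.items()):
            return label
    return default

# hypothetical ruleset over discretized expression levels
rules = [({"GENE_A": "high", "GENE_B": "low"}, "tumor"),
         ({"GENE_A": "low"}, "normal")]
label = apply_rules(rules, {"GENE_A": "high", "GENE_B": "low"}, "normal")
```

Parsimony in this representation is simply the number of rules and the number of distinct variables they mention — the quantities the paper reports.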
Open Access Editorial
Acknowledgement to Reviewers of Data in 2016
Data 2017, 2(1), 4; doi:10.3390/data2010004 -
Abstract The editors of Data would like to express their sincere gratitude to the following reviewers for assessing manuscripts in 2016.[...] Full article
Open Access Data Descriptor
Scanned Image Data from 3D-Printed Specimens Using Fused Deposition Modeling
Data 2017, 2(1), 3; doi:10.3390/data2010003 -
Abstract
This dataset provides high-resolution 2D scans of 3D-printed test objects (dog-bone), derived from EN ISO 527-2:2012. The specimens are scanned at resolutions from 600 dpi to 4800 dpi utilising a Konica-Minolta bizHub 42 and a Canon LiDE 210 scanner. The specimens are created to research the influence of the infill-pattern orientation and the print orientation on the geometrical fidelity and the structural strength. The specimens are printed on a MakerBot Replicator 2X 3D-printer using yellow (ABS 1.75 mm Yellow, REC, Moscow, Russia) and purple ABS plastic (ABS 1.75 mm Pink Lion&Fox, Hamburg, Germany). The dataset consists of at least one scan per specimen with the measured dimensional characteristics. For this, software is created and described within this work. Specimens from this dataset are scanned either on blank white paper or on white paper with blue millimetre marking. The printing experiment contains a number of failed prints. Specimens that did not fulfil the expected geometry are scanned separately and are of lower quality due to the inability to scan objects with a non-flat surface. For a number of specimens, sensor data are acquired during the printing process. This dataset consists of 193 specimen scans in PNG format of 127 objects, each with unadjusted raw graphical data and a corresponding annotated, post-processed image. Annotated data include the detected object, its geometrical characteristics and file information. Computer-extracted geometrical information is supplied for the images where automated geometrical feature extraction is possible. Full article
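Automated extraction of a dimensional characteristic from a scan can be sketched as reducing a binary mask (1 = specimen pixel) to a width in millimetres using the known scan resolution. The mask below is a toy stand-in for a real scan, not the paper's software:

```python
def max_width_mm(mask, dpi):
    """Widest row of foreground pixels across the mask, converted to millimetres
    (1 inch = 25.4 mm)."""
    widest_px = max(sum(row) for row in mask)
    return widest_px / dpi * 25.4

# toy 600 dpi "scan": the widest row is 600 pixels, i.e. exactly one inch
mask = [[1] * 300, [1] * 600, [1] * 450]
width = max_width_mm(mask, dpi=600)
```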
Open Access Article
How to Make Sense of Team Sport Data: From Acquisition to Data Modeling and Research Aspects
Data 2017, 2(1), 2; doi:10.3390/data2010002 -
Abstract
Automatic and interactive data analysis is instrumental in making use of increasing amounts of complex data. Owing to novel sensor modalities, the analysis of data generated in professional team sport leagues such as soccer, baseball, and basketball has recently become a focus of attention, with potentially high commercial and research interest. The analysis of team ball games can serve many goals, e.g., in coaching to understand the effects of strategies and tactics, or to derive insights for improving performance. It is also often decisive for trainers and analysts to understand why a certain movement of a player or group of players happened, and what the respective influencing factors were. We consider team sport as group movement including collaboration and competition of individuals following specific rule sets. Analyzing team sports is a challenging problem, as it involves the joint understanding of heterogeneous data perspectives, including high-dimensional, video, and movement data, as well as consideration of team behavior and the rules (constraints) of the particular team sport. We identify important components of team sport data, exemplified by the soccer case, and explain how to analyze team sport data in general. We identify challenges arising when facing these data sets, and we propose a multi-facet view and analysis including pattern detection, context-aware analysis, and visual explanation. We also present applicable methods and technologies covering the heterogeneous aspects of team sport data. Full article
Open Access Data Descriptor
Description of a Database Containing Wrist PPG Signals Recorded during Physical Exercise with Both Accelerometer and Gyroscope Measures of Motion
Data 2017, 2(1), 1; doi:10.3390/data2010001 -
Abstract
Wearable heart rate sensors such as those found in smartwatches are commonly based upon Photoplethysmography (PPG) which shines a light into the wrist and measures the amount of light reflected back. This method works well for stationary subjects, but in exercise situations, PPG signals are heavily corrupted by motion artifacts. The presence of these artifacts necessitates the creation of signal processing algorithms for removing the motion interference and allowing the true heart related information to be extracted from the PPG trace during exercise. Here, we describe a new publicly available database of PPG signals collected during exercise for the creation and validation of signal processing algorithms extracting heart rate and heart rate variability from PPG signals. PPG signals from the wrist are recorded together with chest electrocardiography (ECG) to allow a reference/comparison heart rate to be found, and the temporal alignment between the two signal sets is estimated from the signal timestamps. The new database differs from previously available public databases because it includes wrist PPG recorded during walking, running, easy bike riding and hard bike riding. It also provides estimates of the wrist movement recorded using a 3-axis low-noise accelerometer, a 3-axis wide-range accelerometer, and a 3-axis gyroscope. The inclusion of gyroscopic information allows, for the first time, separation of acceleration due to gravity and acceleration due to true motion of the sensor. The hypothesis is that the improved motion information provided could assist in the development of algorithms with better PPG motion artifact removal performance. Full article
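Separating gravity from true sensor motion is the key benefit of the added gyroscope. A much simpler accelerometer-only baseline — not the database authors' method — low-pass filters the signal to estimate gravity and subtracts it; the filter constant and the data below are illustrative:

```python
def separate_gravity(acc_samples, alpha=0.9):
    """Exponential low-pass estimate of gravity per 3-axis sample; the
    remainder approximates linear (motion) acceleration."""
    gravity = list(acc_samples[0])
    motion = []
    for sample in acc_samples:
        gravity = [alpha * g + (1 - alpha) * a for g, a in zip(gravity, sample)]
        motion.append([a - g for a, g in zip(sample, gravity)])
    return gravity, motion

# a stationary wrist: constant 1 g on the z-axis, so no true motion remains
gravity, motion = separate_gravity([[0.0, 0.0, 9.81]] * 5)
```

This baseline fails exactly when the wrist rotates quickly — which is where gyroscope-informed separation, as recorded in this database, is expected to help.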
Open Access Review
Standardization and Quality Control in Data Collection and Assessment of Threatened Plant Species
Data 2016, 1(3), 20; doi:10.3390/data1030020 -
Abstract
Informative data collection is important in the identification and conservation of rare plant species. Data sets generated by many small-scale studies may be integrated into large, distributed databases, and statistical tools are being developed to extract meaningful information from such databases. A diversity of field methodologies may be employed across smaller studies, however, resulting in a lack of standardization and quality control, which makes integration more difficult. Here, we present a case study of the population-level monitoring of two threatened plant species with contrasting life history traits that require different field sampling methodologies: the limestone glade bladderpod, Physaria filiformis, and the western prairie fringed orchid, Plantanthera praeclara. Although different data collection methodologies are necessary for these species based on population sizes and plant morphology, the resulting data allow for similar inferences. Different sample designs may frequently be necessary for rare plant sampling, yet still provide comparable data. Various sources of uncertainty may be associated with data collection (e.g., random sampling error, methodological imprecision, observer error), and should always be quantified if possible and included in data sets, and described in metadata. Ancillary data (e.g., abundance of other plants, physical environment, weather/climate) may be valuable and the most relevant variables may be determined by natural history or empirical studies. Once data are collected, standard operating procedures should be established to prevent errors in data entry. Best practices for data archiving should be followed, and data should be made available for other scientists to use. Efforts to standardize data collection and control data quality, particularly in small-scale field studies, are imperative to future cross-study comparisons, meta-analyses, and systematic reviews. Full article
Open Access Article
Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations
Data 2016, 1(3), 19; doi:10.3390/data1030019 -
Abstract
Human microbiome data from genomic sequencing technologies are fast accumulating, giving us insights into the bacterial taxa that contribute to health and disease. The predictive modeling of such microbiota count data for the classification of human infection by parasitic worms, such as helminths, can help in detection and management across global populations. Real-world datasets of microbiome experiments are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens that are analyzed. This feature of microbiome data poses the challenge of needing more observations for accurate predictive modeling, and has previously been dealt with using different methods of feature reduction. To our knowledge, integrative methods such as transfer learning have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge of different but related datasets. One way of incorporating this knowledge is by using a meaningful mapping among the features of these datasets. In this paper, we claim that this mapping exists among the members of each individual cluster, grouped based on the phylogenetic dependency among taxa and their association with the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space suffer no performance deterioration for the given classification task. We test our hypothesis using classification models that detect helminth infection in microbiota of human fecal samples obtained from Indonesia and Liberia. In our experiments, we first learn binary classifiers for helminth infection detection using Naive Bayes, Support Vector Machine, Multilayer Perceptron, and Random Forest methods.
In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods, to test the validity of the achieved groupings. We observed a 6% to 23% and 7% to 26% performance improvement based on the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Balanced Accuracy (Bacc) measures, respectively, over 10 runs of 10-fold cross-validation. These results show that using phylogenetic dependency for grouping our microbiota data actually results in a noticeable improvement in classification performance for helminth infection detection. These promising results from this feasibility study demonstrate that methods such as SMART-scan can be utilized in the future for knowledge transfer from different but related microbiome datasets by phylogenetically-related functional mapping, to enable novel integrative biomarker discovery. Full article
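The grouping step can be sketched as aggregating per-taxon counts into per-cluster features before classification; the cluster memberships and counts below are invented, not SMART-scan output:

```python
def group_counts(sample, clusters):
    """Collapse per-taxon counts into one total per phylogenetic cluster."""
    return {name: sum(sample.get(taxon, 0) for taxon in taxa)
            for name, taxa in clusters.items()}

# hypothetical clusters of phylogenetically related taxa
clusters = {"cluster_1": ["Prevotella", "Bacteroides"],
            "cluster_2": ["Lactobacillus"]}
features = group_counts({"Prevotella": 120, "Bacteroides": 30, "Lactobacillus": 5},
                        clusters)
```

Classifiers are then trained on the (much denser) cluster-level features instead of the sparse per-taxon counts.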
Open Access Article
The Land Surface Temperature Synergistic Processor in BEAM: A Prototype towards Sentinel-3
Data 2016, 1(3), 18; doi:10.3390/data1030018 -
Abstract
Land Surface Temperature (LST) is one of the key parameters in the physics of land-surface processes on regional and global scales, combining the results of all surface–atmosphere interactions and energy fluxes between the surface and the atmosphere. With the advent of the European Space Agency (ESA) Sentinel-3 (S3) satellite, accurate LST retrieval methodologies are being developed by exploiting the synergy between the Ocean and Land Colour Instrument (OLCI) and the Sea and Land Surface Temperature Radiometer (SLSTR). In this paper we explain the implementation in the Basic ENVISAT Toolbox for (A)ATSR and MERIS (BEAM) of one LST algorithm developed in the framework of the Synergistic Use of The Sentinel Missions For Estimating And Monitoring Land Surface Temperature (SEN4LST) project, and its use. The LST algorithm is based on the split-window technique with an explicit dependence on the surface emissivity. The performance of the methodology is assessed using MEdium Resolution Imaging Spectrometer/Advanced Along-Track Scanning Radiometer (MERIS/AATSR) pairs, instruments with characteristics similar to those of OLCI and SLSTR, respectively. The LST retrievals were validated against in situ data measured over one year (2011) at three test sites, and inter-compared with the standard AATSR level-2 product, with satisfactory results. The algorithm is implemented in BEAM using the MERIS/AATSR Synergy Toolbox as a basis. Specific details about the processor validation can be found in the validation report of the SEN4LST project. Full article
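A generic split-window form — with an explicit emissivity term, as the abstract describes — can be sketched as below. The coefficients and brightness temperatures are illustrative placeholders, not the SEN4LST values:

```python
def split_window_lst(t11, t12, emissivity, coeffs):
    """Generic split-window LST (kelvin) from brightness temperatures at
    roughly 11 and 12 micrometres, with an explicit emissivity correction."""
    c0, c1, c2, c3 = coeffs
    dt = t11 - t12
    return t11 + c1 * dt + c2 * dt ** 2 + c0 + c3 * (1.0 - emissivity)

# illustrative coefficients and brightness temperatures
lst = split_window_lst(t11=300.0, t12=298.0, emissivity=0.98,
                       coeffs=(0.5, 1.8, 0.05, 50.0))
```

The 11–12 µm difference carries the atmospheric water-vapour correction, while the emissivity term adjusts for the surface itself.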
Open Access Data Descriptor
Land Cover Data for the Mississippi–Alabama Barrier Islands, 2010–2011
Data 2016, 1(3), 16; doi:10.3390/data1030016 -
Abstract
Land cover on the Mississippi–Alabama barrier islands was surveyed in 2010–2011 as part of continuing research on island geomorphic and vegetation dynamics following the 2005 impact of Hurricane Katrina. Results of the survey include sub-meter GPS location, a listing of dominant vegetation species and field photographs recorded at 375 sampling locations distributed among Cat, West Ship, East Ship, Horn, Sand, Petit Bois and Dauphin Islands. The survey was conducted in a period of intensive remote sensing data acquisition over the northern Gulf of Mexico by federal, state and commercial organizations in response to the 2010 Macondo Well (Deepwater Horizon) oil spill. The data are useful in providing ground reference information for thematic classification of remotely-sensed imagery, and a record of land cover which may be used in future research. Full article
Open Access Data Descriptor
SNiPhunter: A SNP-Based Search Engine
Data 2016, 1(3), 17; doi:10.3390/data1030017 -
Abstract
Procuring biomedical literature is a time-consuming process. The genomic sciences software solution described here indexes literature from PubMed Central’s open access initiative and makes it available as a web application and through an application programming interface (API). The purpose of this tertiary data artifact—called SNiPhunter—is to assist researchers in finding articles relevant to a reference single nucleotide polymorphism (SNP) identifier of interest. A novel feature of this NoSQL (not only structured query language) database search engine is that it returns results to the user ordered by the number of times a refSNP appears in an article, thereby allowing the user to make a quantitative estimate of the relevance of an article. Queries can also be launched using author-defined keywords. Additional features include a variant call format (VCF) file parser and a multiple-query file upload service. The software implementation in this project relied on Python and the Node.js interpreter, as well as third-party libraries retrieved from GitHub. Full article
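The core ranking idea — order articles by how often a refSNP identifier occurs — can be sketched with a regular expression; the article identifiers and texts below are invented:

```python
import re

RSID = re.compile(r"\brs\d+\b")

def rank_articles(articles, rsid):
    """Sort (article_id, count) pairs by occurrences of `rsid`, descending."""
    counts = [(a["id"], sum(1 for m in RSID.findall(a["text"]) if m == rsid))
              for a in articles]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)

articles = [
    {"id": "PMC1", "text": "rs123 was associated; rs123 replicated; rs999 was not."},
    {"id": "PMC2", "text": "We also genotyped rs123."},
]
ranking = rank_articles(articles, "rs123")
```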
Open Access Data Descriptor
Technical Guidelines to Extract and Analyze VGI from Different Platforms
Data 2016, 1(3), 15; doi:10.3390/data1030015 -
Abstract
A growing number of Volunteered Geographic Information (VGI) and social media platforms have been continuously increasing in size, providing massive amounts of georeferenced data in many forms, including textual information, photographs, and geoinformation. These georeferenced data have either been actively contributed (e.g., adding data to OpenStreetMap (OSM) or Mapillary) or collected in a more passive fashion by enabling geolocation whilst using an online platform (e.g., Twitter, Instagram, or Flickr). The benefit of scraping and streaming these data in stand-alone applications is evident; however, it is difficult for many users to script and scrape the diverse types of these data. On 14 June 2016, a pre-conference workshop called “LINK-VGI: LINKing and analyzing VGI across different platforms” was held at the AGILE 2016 conference in Helsinki, Finland. The workshop provided an opportunity for interested researchers to share ideas and findings on cross-platform data contributions. One portion of the workshop was dedicated to a hands-on session, in which the basics of spatial data access through selected Application Programming Interfaces (APIs) and the extraction of summary statistics from the results were illustrated. This paper presents the content of the hands-on session, including the scripts and guidelines for extracting VGI data. Researchers, planners, and interested end-users can benefit from this paper when developing their own applications for any region of the world. Full article
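The hands-on flavour of the session — query an API, then extract summary statistics — can be sketched on a simplified, invented GeoJSON-style response, sidestepping the live request (real scripts would fetch the FeatureCollection from a platform API first):

```python
def summarize_features(collection):
    """Count features per 'amenity' tag in a GeoJSON-like FeatureCollection."""
    counts = {}
    for feature in collection["features"]:
        tag = feature["properties"].get("amenity", "unknown")
        counts[tag] = counts.get(tag, 0) + 1
    return counts

# toy stand-in for an API response
response = {"features": [
    {"properties": {"amenity": "cafe"}},
    {"properties": {"amenity": "cafe"}},
    {"properties": {}},
]}
stats = summarize_features(response)
```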