Open Access Erratum
Erratum: Morrison, H., et al. Open Access Article Processing Charges (OA APC) Longitudinal Study 2015 Preliminary Dataset
Data 2017, 2(1), 11; doi:10.3390/data2010011
Abstract The authors wish to make the following corrections to their paper [...] Full article
Open Access Data Descriptor
Herbarium of the Pontifical Catholic University of Paraná (HUCP), Curitiba, Southern Brazil
Data 2017, 2(1), 10; doi:10.3390/data2010010
Abstract
The main objective of this paper is to present the herbarium of the Pontifical Catholic University of Paraná (HUCP) and its collection. The history of the HUCP begins in the mid-1970s with the foundation of the Biology Museum, which gathered both botanical and zoological specimens. In April 1979, the collections were separated and the HUCP was founded with preserved specimens of algae (green, red, and brown), fungi, and embryophytes. As of October 2016, the collection encompasses nearly 25,000 specimens from 4934 species, 1609 genera, and 297 families. Most of the specimens come from the state of Paraná, but there are also specimens from many other Brazilian states and other countries, mainly in South America (Chile, Argentina, Uruguay, Paraguay, and Colombia) but also from other parts of the world (Cuba, USA, Spain, Germany, China, and Australia). Our collection includes 42 fungi, 258 gymnosperms, 299 bryophytes, 2809 pteridophytes, 3158 algae, 17,832 angiosperms, and only one type specimen, Mimosa tucumensis Barneby ex Ribas, M. Morales & Santos-Silva (Fabaceae). We also run botanical education and education-for-sustainability programs for primary and high school students, as well as teacher training. Full article

Open Access Article
The Effectiveness of Geographical Data in Multi-Criteria Evaluation of Landscape Services †
Data 2017, 2(1), 9; doi:10.3390/data2010009
Abstract
The aim of this paper is to map and evaluate the state of the multifunctional landscape of the municipality of Naples (Italy) and its surroundings through a Spatial Decision Support System (SDSS) that combines a geographic information system (GIS) with a multi-criteria method, the analytic hierarchy process (AHP). We conceive a knowledge-mapping-evaluation (KME) framework in order to investigate the landscape as a complex system. The proposed methodology centres on data gathering and processing. Both authoritative and unofficial sources, e.g., volunteered geographic information (VGI), are useful for enhancing the information flow whenever quality assurance is performed. The maps of spatial criteria thus support problem structuring and prioritization by considering the availability of context-aware data. Finally, the identification of landscape services (LS) and ecosystem services (ES) can improve decision-making processes within a multi-stakeholder perspective involving the evaluation of trade-offs. The results are multi-criteria choropleth maps of LS and ES showing the density of services, their spatial distribution, and the surrounding benefits. Full article
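As a brief illustration of the AHP step (a generic textbook computation, not the authors' implementation), criterion weights are the normalized principal eigenvector of a pairwise comparison matrix; the matrix values below are invented:

```python
import numpy as np

# Pairwise comparison matrix for three hypothetical criteria on the
# Saaty 1-9 scale (values are illustrative, not taken from the paper).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

# AHP weights: the normalized principal eigenvector of A.
eigvals, eigvecs = np.linalg.eig(A)
principal = eigvecs[:, np.argmax(eigvals.real)].real
weights = principal / principal.sum()

# Consistency ratio CR = ((lambda_max - n) / (n - 1)) / RI; RI = 0.58 for n = 3.
n = A.shape[0]
cr = ((eigvals.real.max() - n) / (n - 1)) / 0.58
print(weights, cr)   # weights sum to 1; CR < 0.1 is conventionally acceptable
```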

Open Access Data Descriptor
Data on Healthy Food Accessibility in Amsterdam, The Netherlands
Data 2017, 2(1), 7; doi:10.3390/data2010007
Abstract
This data descriptor introduces data on healthy food supplied by supermarkets in the city of Amsterdam, The Netherlands. In addition to two neighborhood variables (i.e., share of autochthons and average housing values), the data comprise three street network-based accessibility measures derived from analyses using a geographic information system. Data are provided on a spatial micro-scale using grid cells with a spatial resolution of 100 m. We explain how the data were collected and pre-processed, and how alternative analyses can be set up. To illustrate the use of the data, an example is provided using the R programming language. Full article
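The published example uses R; a loose Python analogue of the kind of analysis the descriptor suggests might look as follows. The file and column names are hypothetical (the dataset documentation gives the real ones):

```python
import pandas as pd

# Hypothetical file and column names -- the published dataset documents
# the actual ones; this only sketches the kind of analysis described.
grid = pd.read_csv("amsterdam_grid_100m.csv")

# Relate one street-network accessibility measure to a neighborhood variable.
print(grid[["dist_nearest_supermarket", "avg_housing_value"]].corr())

# Mean accessibility by quartile of the share of autochthons.
grid["autochthon_q"] = pd.qcut(grid["share_autochthons"], 4)
print(grid.groupby("autochthon_q", observed=True)["dist_nearest_supermarket"].mean())
```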

Open Access Article
An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data
Data 2017, 2(1), 8; doi:10.3390/data2010008
Abstract
Many clinical research datasets have a large percentage of missing values, which directly impacts their usefulness in yielding high-accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models. Full article
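As a minimal sketch of two of the four imputation strategies discussed (mean and k-nearest neighbors), the snippet below uses synthetic data and scikit-learn as a stand-in for the authors' tooling; the clinical dataset itself is not public:

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

# Synthetic stand-in data with values missing completely at random.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
mask = rng.random(X.shape) < 0.3            # 30% missing
X_missing = np.where(mask, np.nan, X)

# Compare imputers by reconstruction error on the held-out true values.
for imputer in (SimpleImputer(strategy="mean"), KNNImputer(n_neighbors=5)):
    X_imputed = imputer.fit_transform(X_missing)
    rmse = np.sqrt(np.mean((X_imputed[mask] - X[mask]) ** 2))
    print(type(imputer).__name__, round(rmse, 3))
```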

Open Access Technical Note
Determination of Concentration of the Aqueous Lithium–Bromide Solution in a Vapour Absorption Refrigeration System by Measurement of Electrical Conductivity and Temperature
Data 2017, 2(1), 6; doi:10.3390/data2010006
Abstract
Lithium–bromide/water (LiBr/water) pairs are widely used as the working medium in vapour absorption refrigeration systems, where the maximum expected temperature and LiBr mass concentration in solution are usually 95 °C and 65%, respectively. Unfortunately, published data on the electrical conductivity of aqueous lithium–bromide solution are scarce and contradictory. The objective of this paper is to develop an empirical equation for determining the concentration of the aqueous lithium–bromide solution during the operation of the vapour absorption refrigeration system when the electrical conductivity and temperature of the solution are known. The present study experimentally investigated the electrical conductivity of aqueous lithium–bromide solution at temperatures in the range from 25 °C to 95 °C and concentrations in the range from 45% to 65% by mass, using a submersion toroidal conductivity sensor connected to a conductivity meter. The results of the tests have shown this method to be an accurate and efficient way to determine the concentration of aqueous lithium–bromide solution in the vapour absorption refrigeration system. Full article
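To illustrate the kind of empirical equation the note develops, one can fit a low-order surface X = f(κ, T) to calibration measurements by least squares. The calibration points below are invented placeholders, not the paper's data, and the functional form is one plausible choice:

```python
import numpy as np

# Invented placeholder calibration points (kappa in mS/cm, T in deg C,
# X in mass %); the paper fits its equation to measured data instead.
kappa = np.array([180., 200., 220., 240., 260., 280.])
T     = np.array([25.,  40.,  55.,  70.,  85.,  95.])
X     = np.array([45.,  49.,  53.,  57.,  61.,  65.])

# Least-squares fit of a low-order empirical surface X = f(kappa, T).
A = np.column_stack([np.ones_like(kappa), kappa, T, kappa * T, kappa**2])
coef, *_ = np.linalg.lstsq(A, X, rcond=None)

def concentration(k, t):
    """LiBr mass concentration (%) from conductivity k and temperature t."""
    return coef @ np.array([1.0, k, t, k * t, k**2])

print(concentration(230.0, 60.0))
```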

Open Access Article
Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure
Data 2017, 2(1), 5; doi:10.3390/data2010005
Abstract
The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their abilities to summarize the observed data. We extend a Bayesian Rule Learning algorithm with global structure search (BRL-GSS), previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters, and therefore the number of rules, is combinatorial in the number of predictor variables in the model. We relax these global constraints to learn a more expressive local structure with BRL-LSS. BRL-LSS entails a more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design BRL-LSS with the same worst-case time-complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using the area under the ROC curve (AUC) and accuracy. We measure model parsimony by noting the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS and the state-of-the-art C4.5 decision tree algorithm, using 10-fold cross-validation on ten microarray gene-expression diagnostic datasets. In these experiments, we observe that BRL-LSS is similar to BRL-GSS in terms of predictive performance, while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. We also conduct a feasibility study to demonstrate the general applicability of our BRL methods to newer RNA sequencing gene-expression data. Full article
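BRL itself is not reproduced here; as a rough stand-in for the paper's evaluation protocol (10-fold AUC plus a rule-count parsimony measure), the sketch below uses CART (scikit-learn's C4.5-like learner) on a bundled dataset, where each leaf corresponds to one IF-THEN rule:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# CART on a bundled dataset stands in for the paper's classifiers and
# microarray data. Each leaf corresponds to one IF-THEN rule, so the
# leaf count is a crude parsimony measure.
X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)

auc = cross_val_score(tree, X, y, cv=10, scoring="roc_auc").mean()
tree.fit(X, y)
print(f"mean 10-fold AUC: {auc:.3f}, rules (leaves): {tree.get_n_leaves()}")
```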

Open Access Editorial
Acknowledgement to Reviewers of Data in 2016
Data 2017, 2(1), 4; doi:10.3390/data2010004
Abstract The editors of Data would like to express their sincere gratitude to the following reviewers for assessing manuscripts in 2016. [...] Full article
Open Access Data Descriptor
Scanned Image Data from 3D-Printed Specimens Using Fused Deposition Modeling
Data 2017, 2(1), 3; doi:10.3390/data2010003
Abstract
This dataset provides high-resolution 2D scans of 3D-printed test objects (dog-bone specimens) derived from EN ISO 527-2:2012. The specimens are scanned at resolutions from 600 dpi to 4800 dpi using Konica-Minolta bizHub 42 and Canon LiDE 210 scanners. The specimens were created to research the influence of the infill-pattern orientation and the print orientation on geometrical fidelity and structural strength. The specimens are printed on a MakerBot Replicator 2X 3D printer using yellow (ABS 1.75 mm Yellow, REC, Moscow, Russia) and purple ABS plastic (ABS 1.75 mm Pink Lion&Fox, Hamburg, Germany). The dataset consists of at least one scan per specimen together with the measured dimensional characteristics. For this, software was created and is described within this work. Specimens from this dataset are scanned either on blank white paper or on white paper with blue millimetre marking. The printing experiment contains a number of failed prints. Specimens that did not fulfil the expected geometry are scanned separately and are of lower quality due to the inability to scan objects with a non-flat surface. For a number of printed specimens, sensor data were acquired during the printing process. This dataset consists of 193 specimen scans in PNG format of 127 objects, each with unadjusted raw graphical data and a corresponding annotated post-processed image. Annotated data include the detected object, its geometrical characteristics, and file information. Computer-extracted geometrical information is supplied for the images where automated geometrical feature extraction is possible. Full article
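Not the authors' measurement software, but the dpi arithmetic any such measurement rests on: at 600 dpi one pixel spans 25.4/600 ≈ 0.042 mm, so pixel counts in the scans convert directly to millimetres:

```python
def pixels_to_mm(pixels: int, dpi: int) -> float:
    """Physical length in mm of a run of pixels scanned at the given dpi."""
    return pixels * 25.4 / dpi

print(pixels_to_mm(2360, 600))    # ~99.91 mm
print(pixels_to_mm(18880, 4800))  # same physical length at 4800 dpi
```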

Open Access Article
How to Make Sense of Team Sport Data: From Acquisition to Data Modeling and Research Aspects
Data 2017, 2(1), 2; doi:10.3390/data2010002
Abstract
Automatic and interactive data analysis is instrumental in making use of increasing amounts of complex data. Owing to novel sensor modalities, the analysis of data generated in professional team sport leagues such as soccer, baseball, and basketball has recently come into focus, with potentially high commercial and research interest. The analysis of team ball games can serve many goals, e.g., in coaching to understand the effects of strategies and tactics, or to derive insights for improving performance. Also, it is often crucial for trainers and analysts to understand why a certain movement of a player or group of players happened, and what the respective influencing factors are. We consider team sport as group movement including collaboration and competition of individuals following specific rule sets. Analyzing team sports is a challenging problem as it involves joint understanding of heterogeneous data perspectives, including high-dimensional, video, and movement data, as well as consideration of team behavior and the rules (constraints) of the particular team sport. We identify important components of team sport data, exemplified by the soccer case, and explain how to analyze team sport data in general. We identify challenges arising when facing these data sets and propose a multi-facet view and analysis including pattern detection, context-aware analysis, and visual explanation. We also present applicable methods and technologies covering the heterogeneous aspects of team sport data. Full article

Open Access Data Descriptor
Description of a Database Containing Wrist PPG Signals Recorded during Physical Exercise with Both Accelerometer and Gyroscope Measures of Motion
Data 2017, 2(1), 1; doi:10.3390/data2010001
Abstract
Wearable heart rate sensors such as those found in smartwatches are commonly based upon photoplethysmography (PPG), which shines a light into the wrist and measures the amount of light reflected back. This method works well for stationary subjects, but in exercise situations, PPG signals are heavily corrupted by motion artifacts. The presence of these artifacts necessitates the creation of signal processing algorithms for removing the motion interference and allowing the true heart-related information to be extracted from the PPG trace during exercise. Here, we describe a new publicly available database of PPG signals collected during exercise for the creation and validation of signal processing algorithms for extracting heart rate and heart rate variability from PPG signals. PPG signals from the wrist are recorded together with chest electrocardiography (ECG) to allow a reference/comparison heart rate to be found, and the temporal alignment between the two signal sets is estimated from the signal timestamps. The new database differs from previously available public databases because it includes wrist PPG recorded during walking, running, easy bike riding and hard bike riding. It also provides estimates of the wrist movement recorded using a 3-axis low-noise accelerometer, a 3-axis wide-range accelerometer, and a 3-axis gyroscope. The inclusion of gyroscopic information allows, for the first time, separation of acceleration due to gravity and acceleration due to true motion of the sensor. The hypothesis is that the improved motion information provided could assist in the development of algorithms with better PPG motion artifact removal performance. Full article
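A gyroscope-free baseline for the gravity/motion separation mentioned above is a simple low-pass filter on the accelerometer trace; a minimal sketch on synthetic data follows (the gyroscope-based approach this database enables would instead track sensor orientation explicitly):

```python
import numpy as np

def split_gravity(acc, alpha=0.98):
    """Split an N x 3 accelerometer trace into a slowly varying gravity
    estimate and residual motion with a first-order low-pass filter."""
    gravity = np.empty_like(acc)
    gravity[0] = acc[0]
    for i in range(1, len(acc)):
        gravity[i] = alpha * gravity[i - 1] + (1 - alpha) * acc[i]
    return gravity, acc - gravity

# Synthetic trace: noise around a constant 1 g on the z-axis.
acc = np.random.default_rng(0).normal(0.0, 1.0, (1000, 3)) + np.array([0.0, 0.0, 9.81])
gravity, motion = split_gravity(acc)
print(gravity[-1], motion[-1])
```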

Open Access Review
Standardization and Quality Control in Data Collection and Assessment of Threatened Plant Species
Data 2016, 1(3), 20; doi:10.3390/data1030020
Abstract
Informative data collection is important in the identification and conservation of rare plant species. Data sets generated by many small-scale studies may be integrated into large, distributed databases, and statistical tools are being developed to extract meaningful information from such databases. A diversity of field methodologies may be employed across smaller studies, however, resulting in a lack of standardization and quality control, which makes integration more difficult. Here, we present a case study of the population-level monitoring of two threatened plant species with contrasting life history traits that require different field sampling methodologies: the limestone glade bladderpod, Physaria filiformis, and the western prairie fringed orchid, Plantanthera praeclara. Although different data collection methodologies are necessary for these species based on population sizes and plant morphology, the resulting data allow for similar inferences. Different sample designs may frequently be necessary for rare plant sampling, yet still provide comparable data. Various sources of uncertainty may be associated with data collection (e.g., random sampling error, methodological imprecision, observer error), and should always be quantified if possible and included in data sets, and described in metadata. Ancillary data (e.g., abundance of other plants, physical environment, weather/climate) may be valuable and the most relevant variables may be determined by natural history or empirical studies. Once data are collected, standard operating procedures should be established to prevent errors in data entry. Best practices for data archiving should be followed, and data should be made available for other scientists to use. Efforts to standardize data collection and control data quality, particularly in small-scale field studies, are imperative to future cross-study comparisons, meta-analyses, and systematic reviews. Full article

Open Access Article
Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations
Data 2016, 1(3), 19; doi:10.3390/data1030019
Abstract
Human microbiome data from genomic sequencing technologies are fast accumulating, giving us insights into bacterial taxa that contribute to health and disease. The predictive modeling of such microbiota count data for the classification of human infection by parasitic worms, such as helminths, can help in detection and management across global populations. Real-world datasets of microbiome experiments are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens that are analyzed. This feature of microbiome data produces the challenge of needing more observations for accurate predictive modeling, and has been dealt with previously using different methods of feature reduction. To our knowledge, integrative methods, such as transfer learning, have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge of different but related datasets. One way of incorporating this knowledge is by using a meaningful mapping among features of these datasets. In this paper, we claim that this mapping would exist among members of each individual cluster, grouped based on phylogenetic dependency among taxa and their association to the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space result in no performance deterioration for the given classification task. We test our hypothesis by using classification models that detect helminth infection in microbiota of human fecal samples obtained in Indonesia and Liberia. In our experiments, we first learn binary classifiers for helminth infection detection by using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods. In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods, to test the validity of the achieved groupings. We observed a 6% to 23% and 7% to 26% performance improvement based on the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Balanced Accuracy (Bacc) measures, respectively, over 10 runs of 10-fold cross-validation. These results show that using phylogenetic dependency for grouping our microbiota data results in a noticeable improvement in classification performance for helminth infection detection. These promising results from this feasibility study demonstrate that methods such as SMART-scan can be utilized in the future for knowledge transfer from different but related microbiome datasets by phylogenetically-related functional mapping, to enable novel integrative biomarker discovery. Full article
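As a schematic illustration of the grouped-feature idea only: the sketch below uses synthetic counts and random cluster assignments as a stand-in for SMART-scan's phylogeny-based grouping, then compares cross-validated AUC on raw versus grouped features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
counts = rng.poisson(0.3, size=(120, 300))   # sparse synthetic taxa counts
labels = rng.integers(0, 2, size=120)        # infected / uninfected (synthetic)

# Stand-in for SMART-scan output: each taxon assigned to one of 30 clusters;
# grouped features are the summed counts within each cluster.
cluster = rng.integers(0, 30, size=300)
grouped = np.zeros((120, 30))
for c in range(30):
    grouped[:, c] = counts[:, cluster == c].sum(axis=1)

for name, X in [("raw taxa", counts), ("grouped", grouped)]:
    auc = cross_val_score(RandomForestClassifier(random_state=0), X, labels,
                          cv=10, scoring="roc_auc").mean()
    print(name, round(auc, 3))
```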

Open Access Article
The Land Surface Temperature Synergistic Processor in BEAM: A Prototype towards Sentinel-3
Data 2016, 1(3), 18; doi:10.3390/data1030018
Abstract
Land Surface Temperature (LST) is one of the key parameters in the physics of land-surface processes on regional and global scales, combining the results of all surface-atmosphere interactions and energy fluxes between the surface and the atmosphere. With the advent of the European Space Agency (ESA) Sentinel-3 (S3) satellite, accurate LST retrieval methodologies are being developed by exploiting the synergy between the Ocean and Land Colour Instrument (OLCI) and the Sea and Land Surface Temperature Radiometer (SLSTR). In this paper we explain the implementation in the Basic ENVISAT Toolbox for (A)ATSR and MERIS (BEAM), and the use, of one LST algorithm developed in the framework of the Synergistic Use of the Sentinel Missions for Estimating and Monitoring Land Surface Temperature (SEN4LST) project. The LST algorithm is based on the split-window technique with an explicit dependence on surface emissivity. The performance of the methodology is assessed using MEdium Resolution Imaging Spectrometer/Advanced Along-Track Scanning Radiometer (MERIS/AATSR) pairs, instruments with characteristics similar to those of OLCI and SLSTR, respectively. The LST retrievals were validated against in situ data measured over one year (2011) at three test sites, and inter-compared with the standard AATSR level-2 product, with satisfactory results. The algorithm is implemented in BEAM using the MERIS/AATSR Synergy Toolbox as a basis. Specific details about the processor validation can be found in the validation report of the SEN4LST project. Full article
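For orientation, a generic emissivity-dependent split-window retrieval has the form sketched below; the c0..c6 values here are placeholders only, since the SEN4LST coefficients are derived by the project and not reproduced in this listing:

```python
# Generic emissivity-dependent split-window form (Sobrino-style), with
# placeholder coefficients -- not the SEN4LST values.
def split_window_lst(t11, t12, emis_mean, emis_diff, water_vapour,
                     c=(0.0, 1.0, 0.3, 50.0, -5.0, -100.0, 20.0)):
    dt = t11 - t12
    return (t11 + c[1] * dt + c[2] * dt**2 + c[0]
            + (c[3] + c[4] * water_vapour) * (1.0 - emis_mean)
            + (c[5] + c[6] * water_vapour) * emis_diff)

# Brightness temperatures in K, emissivity mean/difference, water vapour in g/cm^2.
print(split_window_lst(295.0, 293.5, emis_mean=0.98, emis_diff=0.005,
                       water_vapour=2.0))
```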

Open Access Data Descriptor
Land Cover Data for the Mississippi–Alabama Barrier Islands, 2010–2011
Data 2016, 1(3), 16; doi:10.3390/data1030016
Abstract
Land cover on the Mississippi–Alabama barrier islands was surveyed in 2010–2011 as part of continuing research on island geomorphic and vegetation dynamics following the 2005 impact of Hurricane Katrina. Results of the survey include sub-meter GPS location, a listing of dominant vegetation species and field photographs recorded at 375 sampling locations distributed among Cat, West Ship, East Ship, Horn, Sand, Petit Bois and Dauphin Islands. The survey was conducted in a period of intensive remote sensing data acquisition over the northern Gulf of Mexico by federal, state and commercial organizations in response to the 2010 Macondo Well (Deepwater Horizon) oil spill. The data are useful in providing ground reference information for thematic classification of remotely-sensed imagery, and a record of land cover which may be used in future research. Full article

Open Access Data Descriptor
SNiPhunter: A SNP-Based Search Engine
Data 2016, 1(3), 17; doi:10.3390/data1030017
Abstract
Procuring biomedical literature is a time-consuming process. The genomic sciences software solution described here indexes literature from PubMed Central's open access initiative, and makes it available as a web application and through an application programming interface (API). The purpose of this tertiary data artifact, called SNiPhunter, is to assist researchers in finding articles relevant to a reference single nucleotide polymorphism (SNP) identifier of interest. A novel feature of this NoSQL (not only structured query language) database search engine is that it returns results to the user ordered according to the number of times a refSNP has appeared in an article, thereby allowing the user to make a quantitative estimate of the relevance of an article. Queries can also be launched using author-defined keywords. Additional features include a variant call format (VCF) file parser and a multiple query file upload service. Software implementation in this project relied on Python and the NodeJS interpreter, as well as third-party libraries retrieved from GitHub. Full article
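SNiPhunter's own code lives in the linked project repositories; the toy sketch below only illustrates its core ranking idea, counting refSNP mentions per article and ordering results by mention count, over a tiny in-memory corpus with made-up PMC identifiers:

```python
import re
from collections import Counter

# Toy in-memory corpus with invented identifiers; SNiPhunter itself
# indexes PubMed Central's open access subset in a NoSQL store.
articles = {
    "PMC0000001": "We genotyped rs1042522 twice; rs1042522 was significant.",
    "PMC0000002": "Variants rs1042522 and rs4680 were examined once each.",
}

query = "rs1042522"
hits = Counter({pmcid: len(re.findall(rf"\b{query}\b", text))
                for pmcid, text in articles.items()})

# Rank articles by how often the queried refSNP appears.
for pmcid, n in hits.most_common():
    if n:
        print(pmcid, n)
```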

Open Access Data Descriptor
Technical Guidelines to Extract and Analyze VGI from Different Platforms
Data 2016, 1(3), 15; doi:10.3390/data1030015
Abstract
A growing number of Volunteered Geographic Information (VGI) and social media platforms provide massive amounts of georeferenced data in many forms, including textual information, photographs, and geoinformation. These georeferenced data have either been actively contributed (e.g., adding data to OpenStreetMap (OSM) or Mapillary) or collected in a more passive fashion by enabling geolocation whilst using an online platform (e.g., Twitter, Instagram, or Flickr). The benefit of scraping and streaming these data in stand-alone applications is evident; however, it is difficult for many users to script and scrape these diverse types of data. On 14 June 2016, a pre-conference workshop was held at the AGILE 2016 conference in Helsinki, Finland, called “LINK-VGI: LINKing and analyzing VGI across different platforms”. The workshop provided an opportunity for interested researchers to share ideas and findings on cross-platform data contributions. One portion of the workshop was dedicated to a hands-on session. In this session, the basics of spatial data access through selected Application Programming Interfaces (APIs) and the extraction of summary statistics from the results were illustrated. This paper presents the content of the hands-on session, including the scripts and guidelines for extracting VGI data. Researchers, planners, and interested end-users can benefit from this paper in developing their own applications for any region of the world. Full article
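The workshop's own scripts are the authoritative reference; as one illustrative example of this kind of API access, the snippet below pulls OSM cafe nodes for a small bounding box in central Helsinki through the public Overpass API and computes a trivial summary statistic:

```python
import requests

# Cafes in central Helsinki from OpenStreetMap via the public Overpass API.
query = """
[out:json][timeout:25];
node["amenity"="cafe"](60.16,24.92,60.18,24.96);
out body;
"""
resp = requests.post("https://overpass-api.de/api/interpreter",
                     data={"data": query}, timeout=30)
resp.raise_for_status()
elements = resp.json()["elements"]

# Summary statistic: how many cafes carry a name tag.
named = sum(1 for e in elements if "name" in e.get("tags", {}))
print(f"{len(elements)} cafes, {named} with a name tag")
```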

Open Access Data Descriptor
688,112 Statistical Results: Content Mining Psychology Articles for Statistical Test Results
Data 2016, 1(3), 14; doi:10.3390/data1030014
Abstract
In this data deposit, I describe a dataset that is the result of content mining 167,318 published articles for statistical test results reported according to the standards prescribed by the American Psychological Association (APA). Articles published by the APA, Springer, Sage, and Taylor & Francis were included (mining from Wiley and Elsevier was actively blocked). As a result of this content mining, 688,112 results from 50,845 articles were extracted. In order to provide a comprehensive set of data, the statistical results are supplemented with metadata from the article they originate from. The dataset is provided in a comma separated file (CSV) in long-format. For each of the 688,112 results, 20 variables are included, of which seven are article metadata and 13 pertain to the individual statistical results (e.g., reported and recalculated p-value). A five-pronged approach was taken to generate the dataset: (i) collect journal lists; (ii) spider journal pages for articles; (iii) download articles; (iv) add article metadata; and (v) mine articles for statistical results. All materials, scripts, etc. are available at https://github.com/chartgerink/2016statcheck_data and preserved at http://dx.doi.org/10.5281/zenodo.59818. Full article
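To illustrate the dataset's "recalculated p-value" column: a p-value can be recomputed from a reported test statistic and its degrees of freedom. A minimal sketch for a hypothetical reported result "t(48) = 2.35, p < .05" (not the deposit's statcheck pipeline itself):

```python
from scipy import stats

# Recompute a two-sided p-value from a reported t statistic and df.
t_value, df = 2.35, 48
p_recalculated = 2 * stats.t.sf(abs(t_value), df)
print(round(p_recalculated, 4))   # ~0.023, consistent with "p < .05"
```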

Open Access Data Descriptor
A New Integrated High-Latitude Thermal Laboratory for the Characterization of Land Surface Processes in Alaska’s Arctic and Boreal Regions
Data 2016, 1(2), 13; doi:10.3390/data1020013
Abstract
Alaska’s Arctic and boreal regions, largely dominated by tundra and boreal forest, are witnessing unprecedented changes in response to climate warming. However, the intensity of feedbacks between the hydrosphere and vegetation changes are not yet well quantified in Arctic regions. This lends considerable uncertainty to the prediction of how much, how fast, and where Arctic and boreal hydrology and ecology will change. With a very sparse network of observations (meteorological, flux towers, etc.) in the Alaskan Arctic and boreal regions, remote sensing is the only technology capable of providing the necessary quantitative measurements of land–atmosphere exchanges of water and energy at regional scales in an economically feasible way. Over the last decades, the University of Alaska Fairbanks (UAF) has become the research hub for high-latitude research. UAF’s newly-established Hyperspectral Imaging Laboratory (HyLab) currently provides multiplatform data acquisition, processing, and analysis capabilities spanning microscale laboratory measurements to macroscale analysis of satellite imagery. The specific emphasis is on acquiring and processing satellite and airborne thermal imagery, one of the most important sources of input data in models for the derivation of surface energy fluxes. In this work, we present a synergistic modeling framework that combines multiplatform remote sensing data and calibration/validation (CAL/VAL) activities for the retrieval of land surface temperature (LST). The LST Arctic Dataset will contribute to ecological modeling efforts to help unravel seasonal and spatio-temporal variability in land surface processes and vegetation biophysical properties in Alaska’s Arctic and boreal regions. This dataset will be expanded to other Alaskan Arctic regions, and is expected to have more than 500 images spanning from 1984 to 2012. Full article

Open Access Data Descriptor
A Spectral Emissivity Library of Spoil Substrates
Data 2016, 1(2), 12; doi:10.3390/data1020012
Abstract
Post-mining sites have a significant impact on surrounding ecosystems. Afforestation can restore these ecosystems, but its success and speed depend on the properties of the excavated spoil substrates. Thermal infrared remote sensing brings advantages to the mapping and classification of spoil substrates, enabling the determination of their properties. A library of spoil substrates containing spectral emissivities and chemical properties can facilitate remote sensing activities. This study presents a spectral library of the emissivities of spoil substrates extracted from brown coal mining sites in the Czech Republic. Extracted samples were homogenized by drying and sieving. The spectral emissivity of each sample was determined by a spectral smoothing algorithm applied to data measured by a Fourier transform infrared (FTIR) spectrometer. A set of chemical parameters (pH, conductivity, Na, K, Al, Fe, loss on ignition and polyphenol content) and toxicity were determined for each sample as well. The spectral library presented in this paper also offers geographical coordinates for the locations where samples were obtained. The presented data are unique in nature and can serve many remote sensing activities in the longwave infrared electromagnetic spectrum. Full article