Open Access Data Descriptor
Chlamydospore Specific Proteins of Candida albicans
Data 2017, 2(3), 26; doi:10.3390/data2030026 (registering DOI)
Abstract
The polymorphic yeast Candida albicans forms thick-walled structures called chlamydospores in order to survive under adverse conditions. We present proteomic profile changes occurring during chlamydospore formation. Chlamydospores were induced by inoculating C. albicans cells (grown for 48 h) on rice extract and semisolid agar containing Tween 80 (1%), and the cultures were overlaid with a polyethene sheet to induce microaerophilic conditions at 30 °C. Proteins extracted from chlamydospores and hyphae (producing chlamydospores) were identified by LC-MS/MS analysis. The present datasets include proteomic data (SWATH spectral libraries) of chlamydospores and yeast phase cells, as well as the methodologies and tools used for data generation. Further analysis is expected to provide an opportunity to understand modulations in metabolic processes, molecular architecture (i.e., cell wall, membrane, and cytoskeleton) and stress response pathways that lead to chlamydospore formation and thus facilitate survival of C. albicans under adverse conditions. Full article

Open Access Data Descriptor
Thermodynamic Data of Fusarium oxysporum Grown on Different Substrates in Gold Mine Wastewater
Data 2017, 2(3), 24; doi:10.3390/data2030024
Abstract
The necessity for sustainable process development has led to an upsurge in bio-based processes, thereby placing a higher demand on the use of suitable microorganisms. Similarly, thermodynamics is a veritable tool that can predict the behavior of any material under well-defined conditions. Thermodynamic data of Fusarium oxysporum used in the bioremediation of gold mine wastewater, for a process supported with different carbon sources, were investigated. The data were obtained using a Discovery DSC® (TA Instruments, Inc., New Castle, DE, USA) equipped with modulated Differential Scanning Calorimeter (MDSC™) software. The data revealed minimal differences in the physical properties of the F. oxysporum used, indicating that the utilisation of agro-waste for microbial proliferation in wastewater treatment is as feasible as the use of refined carbon sources. The data will be helpful for developing environmentally benign process strategies, especially for environmental engineering applications. Full article
Open Access Data Descriptor
A Database of Weekly Sea Ice Parcel Tracks Derived from Lagrangian Motion Data with Ancillary Data Products
Data 2017, 2(3), 25; doi:10.3390/data2030025
Abstract
Arctic sea ice has been on the decline over the past several decades, and multi-year sea ice has decreased significantly in its areal share of the overall sea ice cover. Changes in several key variables such as radiative balances, albedo, ice surface temperature, and ice thickness have driven much of the decline, but the motion of sea ice makes studying the effects of these variables on individual parcels difficult. Previous studies have observed changes in the means of these variables and their impacts on sea ice concentration, but an accessible database of Lagrangian tracked data is not yet available for study. In order to address this, a database has been developed at the University of Colorado Boulder that performs Lagrangian tracking on individual sea ice parcels and saves coincident ancillary thermodynamic and dynamic variables for each parcel on a weekly timescale. Full article

Open Access Data Descriptor
Overview of German Additive Manufacturing Companies
Data 2017, 2(3), 23; doi:10.3390/data2030023
Abstract
This data descriptor presents a curated list of companies involved in additive manufacturing in Germany. The companies fall into various categories, such as 3D printing providers, hardware manufacturers, and software developers and vendors. The list was compiled through literature and Internet-based research, drawing on a number of resources, such as the Bundesanzeiger (Federal Gazette), the Registergerichte (Register Courts), the companies' own websites and a B2B marketplace (Wer liefert Was?). The aim of compiling this list is to provide information to researchers on the current situation of 3D printing in Germany. Full article

Open Access Data Descriptor
A High Resolution Dataset of Drought Indices for Spain
Data 2017, 2(3), 22; doi:10.3390/data2030022
Abstract
Drought indices are essential metrics for quantifying drought severity and identifying possible changes in the frequency and duration of drought hazards. In this study, we developed a new high spatial resolution dataset of drought indices covering all of Spain. The dataset includes seven drought indices, spans the period 1961–2014, and has a spatial resolution of 1.1 km and a weekly temporal resolution. A web portal has been created to enable download and visualization of the data. The data can be downloaded as single gridded points for each drought index, but the entire drought index dataset can also be downloaded in netCDF4 format. The dataset will be updated for complete years as the raw meteorological data become available. Full article

Open Access Article
Using Semantic Web Technologies to Query and Manage Information within Federated Cyber-Infrastructures
Data 2017, 2(3), 21; doi:10.3390/data2030021
Abstract
A standardized descriptive ontology supports efficient querying and manipulation of data from heterogeneous sources across boundaries of distributed infrastructures, particularly in federated environments. In this article, we present the Open-Multinet (OMN) set of ontologies, which were designed specifically for this purpose as well as to support management of life-cycles of infrastructure resources. We present their initial application in Future Internet testbeds, their use for representing and requesting available resources, and our experimental performance evaluation of the ontologies in terms of querying and translation times. Our results highlight the value and applicability of Semantic Web technologies in managing resources of federated cyber-infrastructures. Full article

Open Access Article
Open Source Fundamental Industry Classification
Data 2017, 2(2), 20; doi:10.3390/data2020020
Abstract
We provide complete source code for building a fundamental industry classification based on publicly available and freely downloadable data. We compare various fundamental industry classifications by running a horse race of short-horizon trading signals (alphas) utilizing open source heterotic risk models (https://ssrn.com/abstract=2600798) built using such industry classifications. Our source code includes various stand-alone and portable modules, e.g., for downloading and parsing web data. Full article

Open Access Data Descriptor
Four Datasets Derived from an Archive of Personal Homepages (1995–2009)
Data 2017, 2(2), 19; doi:10.3390/data2020019
Abstract
While data from social media are easily accessible, understanding how individuals expressed themselves on the Internet in its initial years of public availability (the mid-to-late 1990s) has proved difficult. In this data deposit, I describe how archival data from Geocities homepages were retrieved and processed to remove non-text data, then further refined to create separate datasets, each of which provides unique insights into modes of personal expression on the early Internet. The present paper describes four datasets, all of which were derived from a larger collection of personal websites: (1) a large corpus of raw text data from Geocities personal homepages, (2) a linguistic analysis of basic psychological properties of the same Geocities pages, using an open-source implementation of the Linguistic Inquiry and Word Count (LIWC), (3) a dataset of links between homepages (suitable for network analysis), and (4) a manifest dataset summarizing the size and last update date for each file in the dataset. Data from over 378,000 Geocities pages are included. In addition to providing a detailed description of how these datasets were created, I describe how they might be utilized in future research. Full article

Open Access Data Descriptor
Towards Automatic Bird Detection: An Annotated and Segmented Acoustic Dataset of Seven Picidae Species
Data 2017, 2(2), 18; doi:10.3390/data2020018
Abstract
Analysing behavioural patterns of bird species in a certain region enables researchers to recognize forthcoming changes in environment, ecology, and population. Ornithologists spend many hours observing and recording birds in their natural habitat to compare different audio samples and extract valuable insights. This manual process is typically undertaken by highly experienced birders who identify every species and its associated type of sound. In recent years, some public repositories hosting labelled acoustic samples from different bird species have emerged, yielding appealing datasets that computer scientists can use to test the accuracy of their machine learning algorithms and to assist ornithologists in the time-consuming process of analyzing audio data. The performance of these algorithms is currently limited because the acoustic samples in these datasets combine fragments containing only environmental noise with fragments containing the bird sound (i.e., the computer confuses environmental sound with the bird sound). Therefore, the purpose of this paper is to release a dataset lasting more than 4984 s that contains differentiated samples of (1) bird sounds and (2) environmental sounds. This data descriptor releases the processed audio samples, originally obtained from the Xeno-Canto repository, of the seven known species of the Picidae family inhabiting the Iberian Peninsula, which are good indicators of habitat quality and have significant value from the point of view of environmental conservation. Full article

Open Access Data Descriptor
Transcriptome Dataset of Soybean (Glycine max) Grown under Phosphorus-Deficient and -Sufficient Conditions
Data 2017, 2(2), 17; doi:10.3390/data2020017
Abstract
This data descriptor introduces the transcriptome dataset of the low-phosphorus tolerant soybean (Glycine max) variety NN94-156 under phosphorus-deficient and -sufficient conditions. The data comprise transcriptome datasets (four libraries) acquired from roots and leaves of soybean plants challenged with low phosphorus, which allows further analysis of whether a systemic tolerance response to low-phosphorus stress occurred. We describe in detail how the plants were prepared and treated and how the data were generated and pre-processed. Further analyses of these data would help to improve our understanding of the molecular mechanisms of low-phosphorus stress in soybean. Full article
Open Access Data Descriptor
Long-Term Land Cover Data for the Lower Peninsula of Michigan, 2010–2050
Data 2017, 2(2), 16; doi:10.3390/data2020016
Abstract
Land cover data are often used to examine the impacts of landscape alterations on the environment from the local to global scale. Although various agencies produce land cover data at various spatial scales, data are still limited at the regional scale over extended timescales. This is a critical data gap since decision-makers often use future and long-term land cover maps to develop effective policies for sustainable environmental systems. As a result, land change science incorporates common data mining tools to create future land cover maps that extend over long timescales. This study applied a well-known land cover change model, the Land Transformation Model (LTM), to produce urbanization maps for the Lower Peninsula of Michigan in the United States from 2010 to 2050 at five-year intervals. Long-term urbanization data for the Lower Peninsula of Michigan can be used in various environmental studies, such as assessing the impact of future urbanization on climate change, water quality, food security and biodiversity. Full article

Open Access Article
Demonstration Study: A Protocol to Combine Online Tools and Databases for Identifying Potentially Repurposable Drugs
Data 2017, 2(2), 15; doi:10.3390/data2020015
Abstract
Traditional methods for the discovery and development of new drugs can be very time-consuming and expensive because they include several stages, such as compound identification and pre-clinical and clinical trials, before the drug is approved by the U.S. Food and Drug Administration (FDA). Therefore, drug repurposing, namely using currently FDA-approved drugs as therapeutics for diseases other than those for which they were originally prescribed, is emerging as a faster and more cost-effective alternative to current drug discovery methods. In this paper, we describe a three-step in silico protocol for analyzing transcriptomics data using online databases and bioinformatics tools to identify potentially repurposable drugs. The efficacy of this protocol was evaluated by comparing its predictions with the findings of two case studies of recently reported repurposed drugs: the HIV drug zidovudine for the treatment of dry age-related macular degeneration and the antidepressant imipramine for small-cell lung carcinoma. The proposed protocol successfully identified the published findings, thus demonstrating the efficacy of this method. In addition, it yielded several novel predictions that have not yet been published, including the finding that imipramine could potentially treat Severe Acute Respiratory Syndrome (SARS), a disease that currently does not have any treatment or vaccine. Since this in silico protocol is simple to use and does not require advanced computer skills, we believe any motivated participant with access to these databases and tools would be able to apply it to large datasets to identify other potentially repurposable drugs in the future. Full article

Open Access Data Descriptor
CHASE-PL—Future Hydrology Data Set: Projections of Water Balance and Streamflow for the Vistula and Odra Basins, Poland
Data 2017, 2(2), 14; doi:10.3390/data2020014
Abstract
There is considerable concern that the water resources of the Central and Eastern Europe region can be adversely affected by climate change. Projections of future water balance and streamflow conditions can be obtained by forcing hydrological models with the output from climate models. In this study, we employed the SWAT hydrological model driven with an ensemble of nine bias-corrected EURO-CORDEX climate simulations to generate future hydrological projections for the Vistula and Odra basins in two future horizons (2024–2050 and 2074–2100) under two Representative Concentration Pathways (RCPs). The data set consists of three parts: (1) model inputs; (2) raw model outputs; (3) aggregated model outputs. The first part allows users to reproduce the outputs or to create new ones. The second contains the time series of 10 variables simulated by SWAT: precipitation, snow melt, potential evapotranspiration, actual evapotranspiration, soil water content, percolation, surface runoff, baseflow, water yield and streamflow. The third consists of the multi-model ensemble statistics of the relative changes in mean seasonal and annual variables, provided in a GIS format. The data set should be of interest to climate impact scientists, water managers and water-sector policy makers. It should be noted, however, that the projections included in this data set are associated with high uncertainties, which are explained in this data descriptor paper. Full article

Open Access Data Descriptor
Open Access Article Processing Charges (OA APC) Longitudinal Study 2016 Dataset
Data 2017, 2(2), 13; doi:10.3390/data2020013
Abstract
This article documents Open access article processing charges (OA APC) Main 2016. This dataset was developed as part of a longitudinal study of the minority (about a third) of the fully open access journals that use the APC business model. APC data for 2016, 2015, 2014, and 2013 are primarily obtained from publishers’ websites, a process that requires analytic skill, as many publishers offer a diverse range of pricing options, including multiple currencies and/or differential pricing by article type, length or work involved and/or discounts for author contributions to editing or the society publisher or based on perceived ability to pay. This version of the dataset draws heavily from the work of Walt Crawford, and includes his entire 2011–2015 dataset; in particular, Crawford’s work has made it possible to confirm “no publication fee” status for a large number of journals. DOAJ metadata for 2016 and 2014 and a 2010 APC sample provided by Solomon and Björk are part of the dataset. Inclusion of DOAJ metadata and article counts by Crawford and by Solomon and Björk provides a basis for studies of factors such as journal size, subject, or country of publication that might be worth testing for correlation with business model and/or APC size. Full article
Open Access Data Descriptor
Ecological and Functional Traits in 99 Bird Species over a Large-Scale Gradient in Germany
Data 2017, 2(2), 12; doi:10.3390/data2020012
Abstract
A gap still exists in published data on the variation of morphological and ecological traits for common bird species over a large area. To narrow this knowledge gap, we report here average values of 24 ecological and functional traits for 99 bird species from three sites of the Biodiversity Exploratories in Germany. We present our own data on morphological and ecological traits of 28 common bird species and provide additional measurements for further species from published studies. This is a unique data set from live birds, which has not been published before and is not available in the presented coverage from museums or any other collection. Dataset: available as the supplementary file. Dataset license: CC-BY Full article

Open Access Erratum
Erratum: Morrison, H., et al. Open Access Article Processing Charges (OA APC) Longitudinal Study 2015 Preliminary Dataset
Data 2017, 2(1), 11; doi:10.3390/data2010011
Abstract
The authors wish to make the following corrections to their paper [...] Full article
Open Access Data Descriptor
Herbarium of the Pontifical Catholic University of Paraná (HUCP), Curitiba, Southern Brazil
Data 2017, 2(1), 10; doi:10.3390/data2010010
Abstract
The main objective of this paper is to present the herbarium of the Pontifical Catholic University of Paraná (HUCP) and its collection. The history of the HUCP began in the middle of the 1970s with the foundation of the Biology Museum, which gathered both botanical and zoological specimens. In April 1979 the collections were separated and the HUCP was founded with preserved specimens of algae (green, red, and brown), fungi, and embryophytes. As of October 2016, the collection encompasses nearly 25,000 specimens from 4934 species, 1609 genera, and 297 families. Most of the specimens come from the state of Paraná, but there are also specimens from many other Brazilian states and from other countries, mainly in South America (Chile, Argentina, Uruguay, Paraguay, and Colombia) but also in other parts of the world (Cuba, USA, Spain, Germany, China, and Australia). Our collection includes 42 fungi, 258 gymnosperms, 299 bryophytes, 2809 pteridophytes, 3158 algae, 17,832 angiosperms, and only one type of Mimosa (Mimosa tucumensis Barneby ex Ribas, M. Morales & Santos-Silva; Fabaceae). We also run botanical education and education for sustainability programs for basic and high school students, as well as training for teachers. Full article

Open Access Article
The Effectiveness of Geographical Data in Multi-Criteria Evaluation of Landscape Services †
Data 2017, 2(1), 9; doi:10.3390/data2010009
Abstract
The aim of the paper is to map and evaluate the state of the multifunctional landscape of the municipality of Naples (Italy) and its surroundings through a spatial decision-making support system (SDSS) that combines a geographic information system (GIS) with a multi-criteria method, the analytic hierarchy process (AHP). We conceive a knowledge-mapping-evaluation (KME) framework in order to investigate the landscape as a complex system. The proposed methodology focuses on data gathering and processing. Both authoritative and unofficial sources, e.g., volunteered geographical information (VGI), are useful tools to enhance the information flow, provided that quality assurance is performed. The maps of spatial criteria are useful for problem structuring and prioritization, taking into account the availability of context-aware data. Finally, the identification of landscape services (LS) and ecosystem services (ES) can improve decision-making processes within a multi-stakeholder perspective that involves the evaluation of trade-offs. The results show multi-criteria choropleth maps of the LS and ES with the density of services, the spatial distribution, and the surrounding benefits. Full article

Open Access Data Descriptor
Data on Healthy Food Accessibility in Amsterdam, The Netherlands
Data 2017, 2(1), 7; doi:10.3390/data2010007
Abstract
This data descriptor introduces data on healthy food supplied by supermarkets in the city of Amsterdam, The Netherlands. In addition to two neighborhood variables (i.e., share of autochthons and average housing values), the data comprise three street network-based accessibility measures derived from analyses using a geographic information system. Data are provided on a spatial micro-scale utilizing grid cells with a spatial resolution of 100 m. We explain how the data were collected and pre-processed, and how alternative analyses can be set up. To illustrate the use of the data, an example is provided using the R programming language. Full article

Open Access Article
An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data
Data 2017, 2(1), 8; doi:10.3390/data2010008
Abstract
Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models. Full article
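For readers unfamiliar with two of the imputation methods named in this abstract, the following is a minimal sketch of mean imputation and k-nearest-neighbour imputation on a toy table. It is not the authors' implementation: the function names `mean_impute` and `knn_impute`, the toy values, the use of `None` for a missing entry, and the distance rule (root-mean-square difference over the columns observed in both rows) are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def mean_impute(rows):
    """Return a copy of rows with each None replaced by its column mean.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    col_means = [
        mean([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)
    ]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]


def knn_impute(rows, k=2):
    """Return a copy of rows with each None replaced by the mean of that
    column over the k nearest rows in which the column is observed."""

    def distance(a, b):
        # Compare only the columns that are observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                donors = [
                    (distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                ]
                donors.sort(key=lambda d: d[0])
                new_row[j] = mean([val for _, val in donors[:k]])
        filled.append(new_row)
    return filled


# Hypothetical toy data: two features, one missing entry in the second row.
toy = [
    [1.0, 10.0],
    [2.0, None],
    [3.0, 30.0],
    [10.0, 100.0],
]
mean_filled = mean_impute(toy)
knn_filled = knn_impute(toy, k=2)
```

In this toy case the column mean is pulled upward by the outlying fourth row, whereas the neighbour-based estimate uses only the two rows closest to the incomplete one, which illustrates why the choice of imputation method can matter for sparse data.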