Data, Volume 2, Issue 2 (June 2017) – 9 articles

Article
Open Source Fundamental Industry Classification
by Zura Kakushadze and Willie Yu
Data 2017, 2(2), 20; https://doi.org/10.3390/data2020020 - 17 Jun 2017
Cited by 3 | Viewed by 8129
Abstract
We provide complete source code for building a fundamental industry classification based on publicly available and freely downloadable data. We compare various fundamental industry classifications by running a horserace of short-horizon trading signals (alphas) utilizing open source heterotic risk models (https://ssrn.com/abstract=2600798) built using such industry classifications. Our source code includes various stand-alone and portable modules, e.g., for downloading and parsing web data.

Data Descriptor
Four Datasets Derived from an Archive of Personal Homepages (1995–2009)
by Sean C. Rife
Data 2017, 2(2), 19; https://doi.org/10.3390/data2020019 - 13 Jun 2017
Cited by 10 | Viewed by 3916
Abstract
While data from social media are easily accessible, understanding how individuals expressed themselves on the Internet in its initial years of public availability (the mid-to-late 1990s) has proved difficult. In this data deposit, I describe how archival data from Geocities homepages were retrieved and processed to remove non-text data, then further refined to create separate datasets, each of which provides unique insights into modes of personal expression on the early Internet. The present paper describes four datasets, all of which were derived from a larger collection of personal websites: (1) a large corpus of raw text data from Geocities personal homepages, (2) a linguistic analysis of basic psychological properties of the same Geocities pages, using an open-source implementation of the Linguistic Inquiry and Word Count (LIWC), (3) a dataset of links between homepages (suitable for network analysis), and (4) a manifest dataset summarizing the size and last update date for each file in the dataset. Data from over 378,000 Geocities pages are included. In addition to providing a detailed description of how these datasets were created, I describe how they might be utilized in future research.

Data Descriptor
Towards Automatic Bird Detection: An Annotated and Segmented Acoustic Dataset of Seven Picidae Species
by Ester Vidaña-Vila, Joan Navarro and Rosa Ma Alsina-Pagès
Data 2017, 2(2), 18; https://doi.org/10.3390/data2020018 - 16 May 2017
Cited by 7 | Viewed by 7630
Abstract
Analysing behavioural patterns of bird species in a certain region enables researchers to recognize forthcoming changes in environment, ecology, and population. Ornithologists spend many hours observing and recording birds in their natural habitat to compare different audio samples and extract valuable insights. This manual process is typically undertaken by highly experienced birders who identify every species and its associated type of sound. In recent years, some public repositories hosting labelled acoustic samples from different bird species have emerged, which has resulted in appealing datasets that computer scientists can use to test the accuracy of their machine learning algorithms and assist ornithologists in the time-consuming process of analysing audio data. Current limitations in the performance of these algorithms come from the fact that the acoustic samples of these datasets combine fragments with only environmental noise and fragments with the bird sound (i.e., the computer confuses environmental sound with the bird sound). Therefore, the purpose of this paper is to release a dataset lasting more than 4984 s that contains differentiated samples of (1) bird sounds and (2) environmental sounds. This data descriptor releases the processed audio samples, originally obtained from the Xeno-Canto repository, from the seven known species of the Picidae family inhabiting the Iberian Peninsula, which are good indicators of habitat quality and have significant value from an environmental conservation point of view.

Data Descriptor
Transcriptome Dataset of Soybean (Glycine max) Grown under Phosphorus-Deficient and -Sufficient Conditions
by Hengyou Zhang, Shanshan Chu and Dan Zhang
Data 2017, 2(2), 17; https://doi.org/10.3390/data2020017 - 16 May 2017
Cited by 10 | Viewed by 3994
Abstract
This data descriptor introduces the dataset of the transcriptome of the low-phosphorus-tolerant soybean (Glycine max) variety NN94-156 under phosphorus-deficient and -sufficient conditions. The data comprise transcriptome datasets (four libraries) acquired from roots and leaves of soybean plants challenged with low phosphorus, which allows further analysis of whether a systemic tolerance response to low-phosphorus stress occurred. We describe in detail how the plants were prepared and treated and how the data were generated and pre-processed. Further analyses of these data would help improve our understanding of the molecular mechanisms of low-phosphorus stress in soybean.
Data Descriptor
Long-Term Land Cover Data for the Lower Peninsula of Michigan, 2010–2050
by Amin Tayyebi, Samuel J. Smidt and Bryan C. Pijanowski
Data 2017, 2(2), 16; https://doi.org/10.3390/data2020016 - 05 May 2017
Cited by 4 | Viewed by 4793
Abstract
Land cover data are often used to examine the impacts of landscape alterations on the environment from the local to global scale. Although various agencies produce land cover data at various spatial scales, data are still limited at the regional scale over extended timescales. This is a critical data gap since decision-makers often use future and long-term land cover maps to develop effective policies for sustainable environmental systems. As a result, land change science incorporates common data mining tools to create future land cover maps that extend over long timescales. This study applied a well-known land cover change model, the Land Transformation Model (LTM), to produce urbanization maps for the Lower Peninsula of Michigan in the United States from 2010 to 2050 at five-year intervals. Long-term urbanization data for the Lower Peninsula of Michigan can be used in various environmental studies, such as assessing the impact of future urbanization on climate change, water quality, food security and biodiversity.
(This article belongs to the Special Issue Geospatial Data)

Article
Demonstration Study: A Protocol to Combine Online Tools and Databases for Identifying Potentially Repurposable Drugs
by Aditi Chattopadhyay and Madhavi K. Ganapathiraju
Data 2017, 2(2), 15; https://doi.org/10.3390/data2020015 - 04 May 2017
Cited by 3 | Viewed by 6319
Abstract
Traditional methods for the discovery and development of new drugs can be very time-consuming and expensive because they include several stages, such as compound identification and pre-clinical and clinical trials, before a drug is approved by the U.S. Food and Drug Administration (FDA). Therefore, drug repurposing, namely using currently FDA-approved drugs as therapeutics for diseases other than those for which they were originally prescribed, is emerging as a faster and more cost-effective alternative to current drug discovery methods. In this paper, we describe a three-step in silico protocol for analyzing transcriptomics data using online databases and bioinformatics tools to identify potentially repurposable drugs. The efficacy of this protocol was evaluated by comparing its predictions with the findings of two case studies of recently reported repurposed drugs: the HIV drug zidovudine for the treatment of dry age-related macular degeneration and the antidepressant imipramine for small-cell lung carcinoma. The proposed protocol successfully identified the published findings, thus demonstrating the efficacy of this method. It also yielded several novel predictions that have not yet been published, including the finding that imipramine could potentially treat Severe Acute Respiratory Syndrome (SARS), a disease that currently does not have any treatment or vaccine. Since this in silico protocol is simple to use and does not require advanced computer skills, we believe any motivated participant with access to these databases and tools would be able to apply it to large datasets to identify other potentially repurposable drugs in the future.
(This article belongs to the Special Issue Biomedical Informatics)

Data Descriptor
CHASE-PL—Future Hydrology Data Set: Projections of Water Balance and Streamflow for the Vistula and Odra Basins, Poland
by Mikołaj Piniewski, Mateusz Szcześniak and Ignacy Kardel
Data 2017, 2(2), 14; https://doi.org/10.3390/data2020014 - 26 Apr 2017
Cited by 8 | Viewed by 6330
Abstract
There is considerable concern that the water resources of the Central and Eastern Europe region could be adversely affected by climate change. Projections of future water balance and streamflow conditions can be obtained by forcing hydrological models with the output from climate models. In this study, we employed the SWAT hydrological model driven with an ensemble of nine bias-corrected EURO-CORDEX climate simulations to generate future hydrological projections for the Vistula and Odra basins in two future horizons (2024–2050 and 2074–2100) under two Representative Concentration Pathways (RCPs). The data set consists of three parts: (1) model inputs; (2) raw model outputs; (3) aggregated model outputs. The first allows users to reproduce the outputs or create new ones. The second contains time series of 10 variables simulated by SWAT: precipitation, snow melt, potential evapotranspiration, actual evapotranspiration, soil water content, percolation, surface runoff, baseflow, water yield and streamflow. The third consists of the multi-model ensemble statistics of the relative changes in mean seasonal and annual variables, developed in a GIS format. The data set should be of interest to climate impact scientists, water managers and water-sector policy makers. Note, however, that the projections included in this data set are associated with high uncertainties, which are explained in this data descriptor paper.
(This article belongs to the Special Issue Open Data and Robust & Reliable GIScience)

Data Descriptor
Open Access Article Processing Charges (OA APC) Longitudinal Study 2016 Dataset
by Heather Morrison, Widlyne Brutus, Myriam Dumais-Desrosiers, Tanoh Laurent Kakou, Katherine Laprade, Salah Merhi, Arbia Ouerghi, Jihane Salhab, Victoria Volkanova and Sara Wheatley
Data 2017, 2(2), 13; https://doi.org/10.3390/data2020013 - 08 Apr 2017
Cited by 3 | Viewed by 7232
Abstract
This article documents Open access article processing charges (OA APC) Main 2016. This dataset was developed as part of a longitudinal study of the minority (about a third) of the fully open access journals that use the APC business model. APC data for 2016, 2015, 2014, and 2013 are primarily obtained from publishers’ websites, a process that requires analytic skill because many publishers offer a diverse range of pricing options, including multiple currencies, differential pricing by article type, length or work involved, and discounts for author contributions to editing or the society publisher or based on perceived ability to pay. This version of the dataset draws heavily from the work of Walt Crawford and includes his entire 2011–2015 dataset; in particular, Crawford’s work has made it possible to confirm “no publication fee” status for a large number of journals. DOAJ metadata for 2016 and 2014 and a 2010 APC sample provided by Solomon and Björk are part of the dataset. Inclusion of DOAJ metadata and article counts by Crawford and by Solomon and Björk provides a basis for studies of factors such as journal size, subject, or country of publication that might be worth testing for correlation with business model and/or APC size.
Data Descriptor
Ecological and Functional Traits in 99 Bird Species over a Large-Scale Gradient in Germany
by Swen C. Renner and Willem van Hoesel
Data 2017, 2(2), 12; https://doi.org/10.3390/data2020012 - 31 Mar 2017
Cited by 5 | Viewed by 5555
Abstract
A gap still exists in published data on variation of morphological and ecological traits for common bird species over a large area. To narrow this knowledge gap, we report here average values for 24 ecological and functional traits of 99 bird species from three sites in Germany from the Biodiversity Exploratories. We present our own data on morphological and ecological traits of 28 common bird species and provide additional measurements for further species from published studies. This is a unique data set from live birds, which has not been published and is available neither from museums nor from any other collection with the coverage presented here. Dataset: available as the supplementary file. Dataset license: CC-BY
(This article belongs to the Special Issue Biodiversity and Species Traits)
