Data, Volume 5, Issue 2 (June 2020) – 29 articles

Cover Story: An increasing number of chemicals, such as pharmaceuticals, pesticides, and synthetic hormones, are in daily use worldwide. In the environment, chemicals can adversely affect biological populations and communities and, in turn, related ecosystem functions. Standartox is a database and tool that collects ecotoxicological test information to support the evaluation of environmental effects and risks of chemicals. Standartox cleans and harmonizes these data and subsequently provides access to functions that allow the data to be filtered and aggregated according to the user’s requirements. Large amounts of toxicity data on chemicals are currently scattered among various resources and are cumbersome to process. Standartox steadily incorporates new ecotoxicity data and aims at facilitating data access.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open it.
25 pages, 15956 KiB  
Data Descriptor
A Probabilistic Bag-to-Class Approach to Multiple-Instance Learning
by Kajsa Møllersen, Jon Yngve Hardeberg and Fred Godtliebsen
Data 2020, 5(2), 56; https://doi.org/10.3390/data5020056 - 26 Jun 2020
Cited by 2 | Viewed by 3250
Abstract
Multi-instance (MI) learning is a branch of machine learning, where each object (bag) consists of multiple feature vectors (instances)—for example, an image consisting of multiple patches and their corresponding feature vectors. In MI classification, each bag in the training set has a class label, but the instances are unlabeled. The instances are most commonly regarded as a set of points in a multi-dimensional space. Alternatively, instances are viewed as realizations of random vectors with corresponding probability distribution, where the bag is the distribution, not the realizations. By introducing the probability distribution space to bag-level classification problems, dissimilarities between probability distributions (divergences) can be applied. The bag-to-bag Kullback–Leibler information is asymptotically the best classifier, but the typical sparseness of MI training sets is an obstacle. We introduce bag-to-class divergence to MI learning, emphasizing the hierarchical nature of the random vectors that makes bags from the same class different. We propose two properties for bag-to-class divergences, and an additional property for sparse training sets, and propose a dissimilarity measure that fulfils them. Its performance is demonstrated on synthetic and real data. The probability distribution space is valid for MI learning, both for the theoretical analysis and applications. Full article
(This article belongs to the Special Issue Machine Learning in Image Analysis and Pattern Recognition)
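
As a concrete illustration of the bag-to-class idea summarized above, the sketch below estimates a bag-to-class Kullback–Leibler divergence with Gaussian kernel density estimates. It is a minimal sketch only: the KDE-plus-Monte-Carlo estimator, the feature dimension, and the sample sizes are assumptions, and this is not the dissimilarity measure proposed in the paper.

import numpy as np
from scipy.stats import gaussian_kde

def bag_to_class_kl(bag, class_instances, n_samples=2000):
    """Monte Carlo estimate of KL(p_bag || p_class); rows of the inputs are instances."""
    p_bag = gaussian_kde(bag.T)                 # density of this bag's instances
    p_class = gaussian_kde(class_instances.T)   # density of the pooled class instances
    x = p_bag.resample(n_samples, seed=0)       # samples drawn from the bag density
    log_ratio = np.log(p_bag(x) + 1e-12) - np.log(p_class(x) + 1e-12)
    return float(np.mean(log_ratio))

# Toy usage: a 40-instance bag with 3 features vs. a 500-instance class pool.
rng = np.random.default_rng(0)
bag = rng.normal(0.0, 1.0, size=(40, 3))
class_pool = rng.normal(0.2, 1.1, size=(500, 3))
print(bag_to_class_kl(bag, class_pool))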

11 pages, 3296 KiB  
Data Descriptor
A Database for the Radio Frequency Fingerprinting of Bluetooth Devices
by Emre Uzundurukan, Yaser Dalveren and Ali Kara
Data 2020, 5(2), 55; https://doi.org/10.3390/data5020055 - 21 Jun 2020
Cited by 35 | Viewed by 6720
Abstract
Radio frequency fingerprinting (RFF) is a promising physical layer protection technique which can be used to defend wireless networks from malicious attacks. It is based on the use of the distinctive features of the physical waveforms (signals) transmitted from wireless devices in order to classify authorized users. The most important requirement to develop an RFF method is the existence of a precise, robust, and extensive database of the emitted signals. In this context, this paper introduces a database consisting of Bluetooth (BT) signals collected at different sampling rates from 27 different smartphones (six manufacturers with several models for each). Firstly, the data acquisition system to create the database is described in detail. Then, the two well-known methods based on transient BT signals are experimentally tested by using the provided data to check their solidity. The results show that the created database may be useful for many researchers working on the development of the RFF of BT devices. Full article
(This article belongs to the Special Issue Data from Smartphones and Wearables)
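
The following sketch hints at how such a signal database might be used downstream: it extracts a few simple envelope features from the transient (turn-on) part of a burst. The onset threshold, the 2-microsecond window, and the feature set are illustrative assumptions, not the methods tested in the paper.

import numpy as np
from scipy.signal import hilbert

def transient_features(signal, fs):
    """Small feature vector from the transient (turn-on) part of a real-valued burst."""
    envelope = np.abs(hilbert(signal))                        # instantaneous amplitude
    onset = int(np.argmax(envelope > 0.1 * envelope.max()))   # crude transient start
    transient = envelope[onset:onset + int(2e-6 * fs)]        # first ~2 microseconds
    return np.array([
        transient.mean(),          # mean envelope level
        transient.std(),           # envelope variability
        np.trapz(transient) / fs,  # transient energy proxy
        np.argmax(transient) / fs, # time from onset to envelope peak
    ])

# Toy burst sampled at 250 MS/s: a carrier with a 1-microsecond amplitude ramp.
fs = 250e6
t = np.arange(0, 5e-6, 1 / fs)
burst = np.sin(2 * np.pi * 5e6 * t) * np.minimum(t / 1e-6, 1.0)
print(transient_features(burst, fs))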

15 pages, 6388 KiB  
Data Descriptor
Emissions from Swine Manure Treated with Current Products for Mitigation of Odors and Reduction of NH3, H2S, VOC, and GHG Emissions
by Baitong Chen, Jacek A. Koziel, Chumki Banik, Hantian Ma, Myeongseong Lee, Jisoo Wi, Zhanibek Meiirkhanuly, Daniel S. Andersen, Andrzej Białowiec and David B. Parker
Data 2020, 5(2), 54; https://doi.org/10.3390/data5020054 - 18 Jun 2020
Cited by 14 | Viewed by 4401
Abstract
Odor and gaseous emissions from the swine industry are of concern for the wellbeing of humans and livestock. Additives applied to the swine manure surface are popular marketed products for addressing this problem, being relatively inexpensive and easy for farmers to use. However, there are no scientific data evaluating the effectiveness of many of these products. We evaluated 12 manure additive products that are currently being marketed on their effectiveness in mitigating odor and gaseous emissions from swine manure. We used a pilot-scale system simulating the storage of swine manure with controlled ventilation of the headspace and periodic addition of manure. This dataset contains measured concentrations and estimated emissions of target gases in the headspace above treated and untreated swine manure. These include ammonia (NH3), hydrogen sulfide (H2S), greenhouse gases (CO2, CH4, and N2O), volatile organic compounds (VOC), and odor. The experiment to test each manure additive product lasted two months; measurements of NH3 and H2S were completed twice a week, and the others were conducted weekly. The manure for each test was collected from three different farms in central Iowa to provide the necessary variety in stored swine manure properties. This dataset is useful for further analyses of gaseous emissions from swine manure under simulated storage conditions and for performance comparison of marketed products for the mitigation of gaseous emissions. Ultimately, swine farmers, the regulatory community, and the public need scientific data to inform decisions about the usefulness of manure additives. Full article
(This article belongs to the Special Issue Big Data for Sustainable Development)

15 pages, 2172 KiB  
Article
Charge Recombination Kinetics of Bacterial Photosynthetic Reaction Centres Reconstituted in Liposomes: Deterministic Versus Stochastic Approach
by Emiliano Altamura, Paola Albanese, Pasquale Stano, Massimo Trotta, Francesco Milano and Fabio Mavelli
Data 2020, 5(2), 53; https://doi.org/10.3390/data5020053 - 12 Jun 2020
Cited by 3 | Viewed by 2305
Abstract
In this theoretical work, we analyse the kinetics of the charge recombination reaction after light excitation of Reaction Centres extracted from the photosynthetic bacterium Rhodobacter sphaeroides and reconstituted in small unilamellar phospholipid vesicles. Due to the compartmentalized nature of liposomes, vesicles may exhibit a random distribution of both ubiquinone molecules and Reaction Centre protein complexes, which can produce significant deviations of the local concentrations from the average expected values. Moreover, since the amount of reacting species is very low in compartmentalized lipid systems, the stochastic approach is more suitable for unveiling deviations of the average time behaviour of vesicles from the deterministic time evolution. Full article
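
To make the deterministic-versus-stochastic comparison concrete, here is a toy sketch of a single first-order recombination step in one vesicle, solved in closed form (mean-field) and simulated with the Gillespie algorithm. The rate constant and the number of reaction centres per vesicle are arbitrary placeholders, and the paper's full reaction scheme is richer than this.

import numpy as np

K_REC = 1.0   # recombination rate constant (1/s), assumed value
N0 = 20       # charge-separated reaction centres per vesicle, assumed value

def deterministic(t, n0=N0, k=K_REC):
    """Mean-field solution of dN/dt = -k N."""
    return n0 * np.exp(-k * t)

def gillespie(n0=N0, k=K_REC, t_end=5.0, rng=np.random.default_rng(1)):
    """One stochastic trajectory of the same first-order decay."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while n > 0 and t < t_end:
        t += rng.exponential(1.0 / (k * n))  # waiting time to the next event
        n -= 1                               # one charge-separated pair recombines
        times.append(t)
        counts.append(n)
    return np.array(times), np.array(counts)

times, counts = gillespie()
print(counts[:5])
print(deterministic(times[:5]))   # compare against the deterministic curve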

11 pages, 2510 KiB  
Data Descriptor
Large-Scale Dataset for Radio Frequency-Based Device-Free Crowd Estimation
by Abdil Kaya, Stijn Denis, Ben Bellekens, Maarten Weyn and Rafael Berkvens
Data 2020, 5(2), 52; https://doi.org/10.3390/data5020052 - 09 Jun 2020
Cited by 8 | Viewed by 3902
Abstract
Organisers of events attracting many people have the important task of ensuring the safety of the crowd on their venue premises. Measuring the size of the crowd is a critical first step, but it is often challenging because of occlusions, noise, and the dynamics of the crowd. We have been working on a passive Radio Frequency (RF) sensing technique for crowd size estimation, and we now present three datasets of measurements collected at the Tomorrowland music festival in environments containing thousands of people. All datasets have reference data, based either on payment transactions or on an access control system, and we provide an example analysis script. We hope that future analyses can provide added value for crowd safety experts. Full article

19 pages, 1955 KiB  
Review
An Interdisciplinary Review of Camera Image Collection and Analysis Techniques, with Considerations for Environmental Conservation Social Science
by Coleman L. Little, Elizabeth E. Perry, Jessica P. Fefer, Matthew T. J. Brownlee and Ryan L. Sharp
Data 2020, 5(2), 51; https://doi.org/10.3390/data5020051 - 06 Jun 2020
Cited by 8 | Viewed by 3110
Abstract
Camera-based data collection and image analysis are integral methods in many research disciplines. However, few studies are specifically dedicated to trends in these methods or opportunities for interdisciplinary learning. In this systematic literature review, we analyze published sources (n = 391) to synthesize camera use patterns and image collection and analysis techniques across research disciplines. We frame this inquiry with interdisciplinary learning theory to identify cross-disciplinary approaches and guiding principles. Within this, we explicitly focus on trends within and applicability to environmental conservation social science (ECSS). We suggest six guiding principles for standardized, collaborative approaches to camera usage and image analysis in research. Our analysis suggests that ECSS may offer inspiration for novel combinations of data collection, standardization tactics, and detailed presentations of findings and limitations. ECSS can correspondingly incorporate more image analysis tactics from other disciplines, especially in regard to automated image coding of pertinent attributes. Full article
(This article belongs to the Special Issue Machine Learning in Image Analysis and Pattern Recognition)

9 pages, 2065 KiB  
Article
Data Wrangling in Database Systems: Purging of Dirty Data
by Otmane Azeroual
Data 2020, 5(2), 50; https://doi.org/10.3390/data5020050 - 05 Jun 2020
Cited by 19 | Viewed by 6036
Abstract
Researchers need to be able to integrate ever-increasing amounts of data into their institutional databases, regardless of the source, format, or size of the data. It is then necessary to use this increasing diversity of data to derive greater value for their organization. The processing of electronic data plays a central role in modern society. Data constitute a fundamental part of operational processes in companies and scientific organizations, and they form the basis for decisions. Poor data quality can negatively affect decisions and have a negative impact on results, so the quality of the data is crucial. This is where data wrangling, sometimes referred to as data munging or data crunching, comes in: finding dirty data and transforming and cleaning them. The aim of data wrangling is to prepare large amounts of raw data in their original state so that they can be used for further analysis steps. Only then can knowledge be obtained that may bring added value. This paper shows how the data wrangling process works and how it can be used in database systems to clean up data from heterogeneous data sources during their acquisition and integration. Full article
(This article belongs to the Special Issue Challenges in Business Intelligence)
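
A minimal, hypothetical data-wrangling step of the kind described above might look as follows in pandas: incomplete records are dropped, names and years are harmonized, implausible values are purged, and duplicates are removed. The column names and records are invented for illustration.

import pandas as pd

raw = pd.DataFrame({
    "author": [" Miller, A.", "Miller, A.", "SMITH, B.", None],
    "year":   ["2019", "2019", "19", "2020"],
    "title":  ["Data cleaning", "Data cleaning", "RIS quality", "Wrangling"],
})

clean = (
    raw
    .dropna(subset=["author"])                      # remove incomplete records
    .assign(
        author=lambda d: d["author"].str.strip().str.title(),   # harmonize names
        year=lambda d: pd.to_numeric(d["year"], errors="coerce"),
    )
    .query("year >= 1900")                          # purge implausible years
    .drop_duplicates(subset=["author", "title"])    # remove exact duplicates
)
print(clean)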

13 pages, 1247 KiB  
Data Descriptor
Responses of Germination to Light and to Far-Red Radiation—Can they be Predicted from Diaspores Size?
by Luís Silva Dias, Elsa Ganhão and Alexandra Soveral Dias
Data 2020, 5(2), 49; https://doi.org/10.3390/data5020049 - 21 May 2020
Cited by 1 | Viewed by 2199
Abstract
This paper presents an update of a dataset of seed volumes previously released online and combines it with published data of the photoblastic response of germination of fruits or seeds (light or dark conditions), and of the effects of enhanced far-red radiation on germination. Some evidence was found to support that germination in larger diaspores might be indifferent to light or dark conditions. Similarly, germination in smaller diaspores might be inhibited by far-red radiation. However, the length, width, thickness, volume, shape, type of diaspore, or relative amplitude of volume is essentially useless to predict photoblastic responses or the effects of far-red radiation on germination of diaspores. Full article

8 pages, 446 KiB  
Data Descriptor
Low-Temperature Pyrolysis of Municipal Solid Waste Components and Refuse-Derived Fuel—Process Efficiency and Fuel Properties of Carbonized Solid Fuel
by Kacper Świechowski, Ewa Syguła, Jacek A. Koziel, Paweł Stępień, Szymon Kugler, Piotr Manczarski and Andrzej Białowiec
Data 2020, 5(2), 48; https://doi.org/10.3390/data5020048 - 21 May 2020
Cited by 16 | Viewed by 3874
Abstract
New technologies to valorize refuse-derived fuels (RDFs) will be required in the near future due to emerging trends of (1) the cement industry’s demands for high-quality alternative fuels and (2) the decreasing calorific value of the fuels derived from municipal solid waste (MSW) and currently used in cement/incineration plants. Low-temperature pyrolysis can increase the calorific value of processed material, leading to the production of value-added carbonized solid fuel (CSF). This dataset summarizes the key properties of MSW-derived CSF. Pyrolysis experiments were completed using eight types of organic waste and their two RDF mixtures. Organic waste represented common morphological groups of MSW, i.e., cartons, fabrics, kitchen waste, paper, plastic, rubber, PAP/AL/PE composite packaging (multi-material packaging also known as Tetra Pak cartons), and wood. The pyrolysis was conducted at temperatures ranging from 300 to 500 °C (20 °C intervals), with a retention (process) time of 20 to 60 min (20 min intervals). The mass yield, energy densification ratio, and energy yield were determined to characterize the pyrolysis process efficiency. The raw materials and produced CSF were tested with proximate analyses (moisture content, organic matter content, ash content, and combustible part content) and with ultimate analyses (elemental composition C, H, N, S) and high heating value (HHV). Additionally, differential scanning calorimetry (DSC) and thermogravimetric analyses (TGA) of the pyrolysis process were performed. The dataset documents the changes in fuel properties of RDF resulting from low-temperature pyrolysis as a function of the pyrolysis conditions and feedstock type. The greatest HHV improvements were observed for fabrics (up to 65%), PAP/AL/PE composite packaging (up to 56%), and wood (up to 46%). Full article
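
The three efficiency indicators named in the abstract follow their standard definitions: mass yield is the ratio of CSF mass to feedstock mass, the energy densification ratio is the ratio of the corresponding HHVs, and energy yield is their product. A small sketch with made-up numbers:

def mass_yield(m_csf_g, m_feedstock_g):
    """MY = mass of carbonized solid fuel / mass of raw feedstock."""
    return m_csf_g / m_feedstock_g

def energy_densification_ratio(hhv_csf, hhv_feedstock):
    """EDR = HHV of CSF / HHV of raw feedstock."""
    return hhv_csf / hhv_feedstock

def energy_yield(m_csf_g, m_feedstock_g, hhv_csf, hhv_feedstock):
    """EY = MY * EDR."""
    return mass_yield(m_csf_g, m_feedstock_g) * \
           energy_densification_ratio(hhv_csf, hhv_feedstock)

# Example: 100 g of fabric waste carbonized to 45 g of CSF, with the HHV
# rising from 18.0 to 29.7 MJ/kg (a ~65% improvement); all numbers invented.
print(energy_yield(45, 100, 29.7, 18.0))   # ≈ 0.74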

5 pages, 1946 KiB  
Data Descriptor
Data from Experimental Analysis of the Performance and Load Cycling of a Polymer Electrolyte Membrane Fuel Cell
by Andrea Ramírez-Cruzado, Blanca Ramírez-Peña, Rosario Vélez-García, Alfredo Iranzo and José Guerra
Data 2020, 5(2), 47; https://doi.org/10.3390/data5020047 - 20 May 2020
Cited by 3 | Viewed by 3037
Abstract
Fuel cells are electrochemical devices that convert the chemical energy stored in fuels (hydrogen for polymer electrolyte membrane (PEM) fuel cells) directly into electricity with high efficiency. Fuel cells are already commercially used in different applications, and significant research efforts are being carried out to further improve their performance and durability and to reduce costs. Experimental testing of fuel cells is a fundamental research activity used to assess all the issues indicated above. The current work presents original data corresponding to the experimental analysis of the performance of a 50 cm2 PEM fuel cell, including experimental results from a load cycling dedicated test. The experimental data were acquired using a dedicated test bench following the harmonized testing protocols defined by the Joint Research Centre (JRC) of the European Commission for automotive applications. With the presented dataset, we aim to provide a transparent collection of experimental data from PEM fuel cell testing that can contribute to enhanced reusability for further research. Full article

16 pages, 1920 KiB  
Data Descriptor
Standartox: Standardizing Toxicity Data
by Andreas Scharmüller, Verena C. Schreiner and Ralf B. Schäfer
Data 2020, 5(2), 46; https://doi.org/10.3390/data5020046 - 16 May 2020
Cited by 14 | Viewed by 4902
Abstract
An increasing number of chemicals such as pharmaceuticals, pesticides and synthetic hormones are in daily use all over the world. In the environment, chemicals can adversely affect populations and communities and in turn related ecosystem functions. To evaluate the risks from chemicals for ecosystems, data on their toxicity, which are typically produced in standardized ecotoxicological laboratory tests, are required. The results from ecotoxicological tests are compiled in (meta-)databases such as the United States Environmental Protection Agency (EPA) ECOTOXicology Knowledgebase (ECOTOX). However, for many chemicals, multiple ecotoxicity data are available for the same test organism. These can vary strongly, thereby causing uncertainty in related analyses. Given that most current databases lack aggregation steps or are confined to specific chemicals, we developed Standartox, a tool and database that continuously incorporates the ever-growing number of test results in an automated process workflow that ultimately leads to a single aggregated data point for a specific chemical–organism test combination, representing the toxicity of a chemical. Standartox can be accessed through a web application and an R package. Full article
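
One plausible aggregation step of the kind Standartox performs is sketched below: repeated EC50 values for the same chemical–organism combination are collapsed to a geometric mean. The rule, column names, and test values are illustrative assumptions rather than the package's exact workflow.

import numpy as np
import pandas as pd

tests = pd.DataFrame({
    "cas":      ["1912-24-9"] * 3 + ["298-46-4"] * 2,   # atrazine, carbamazepine
    "taxon":    ["Daphnia magna"] * 5,
    "ec50_ugl": [720.0, 1100.0, 950.0, 13800.0, 9100.0],  # made-up test results
})

aggregated = (
    tests.groupby(["cas", "taxon"])["ec50_ugl"]
    .apply(lambda x: np.exp(np.log(x).mean()))   # geometric mean per combination
    .rename("ec50_ugl_agg")
    .reset_index()
)
print(aggregated)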

8 pages, 2280 KiB  
Data Descriptor
An Open Access Data Set Highlighting Aggregation of Dyes on Metal Oxides
by Vishwesh Venkatraman and Lethesh Kallidanthiyil Chellappan
Data 2020, 5(2), 45; https://doi.org/10.3390/data5020045 - 13 May 2020
Cited by 5 | Viewed by 2553
Abstract
The adsorption of a dye to a metal oxide surface such as TiO2, NiO and ZnO leads to deprotonation and often undesirable aggregation of dye molecules, which in turn impacts the photophysical properties of the dye. While controlled aggregation is useful for some applications, it can result in lower performance for dye-sensitized solar cells. To understand this phenomenon better, we have conducted an extensive search of the literature and identified over 4000 records of absorption spectra in solution and after adsorption onto metal oxide. The total data set comprises over 3500 unique compounds, with observed absorption maxima in solution and after adsorption on the semiconductor electrode. This data may serve to provide further insight into the structure-property relationships governing dye-aggregation behaviour. Full article
(This article belongs to the Special Issue Machine Learning and Materials Informatics)

26 pages, 5969 KiB  
Article
An Optimum Tea Fermentation Detection Model Based on Deep Convolutional Neural Networks
by Gibson Kimutai, Alexander Ngenzi, Rutabayiro Ngoga Said, Ambrose Kiprop and Anna Förster
Data 2020, 5(2), 44; https://doi.org/10.3390/data5020044 - 30 Apr 2020
Cited by 14 | Viewed by 5732
Abstract
Tea is one of the most popular beverages in the world, and its processing involves a number of steps, including fermentation. Tea fermentation is the most important step in determining the quality of tea. Currently, optimum fermentation is detected by tasters who monitor the change in color of the tea and taste and smell it as fermentation progresses. These manual methods are not accurate and consequently lead to a compromise in the quality of tea. This study proposes a deep learning model dubbed TeaNet based on Convolutional Neural Networks (CNN). The input data to TeaNet are images from the tea fermentation and Labelme datasets. We compared the performance of TeaNet with other standard machine learning techniques: Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), and Naive Bayes (NB). TeaNet was superior in the classification tasks compared to the other machine learning techniques. However, we will confirm the stability of TeaNet in the classification tasks in our future studies when we deploy it in a tea factory in Kenya. The research also released a tea fermentation dataset that is available for use by the community. Full article
(This article belongs to the Special Issue Machine Learning in Image Analysis and Pattern Recognition)
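
For orientation, a minimal image-classification CNN of the kind discussed is sketched below in Keras. This is not the published TeaNet architecture; the layer sizes, input resolution, and the three class labels are assumptions for illustration.

import tensorflow as tf

NUM_CLASSES = 3   # e.g., under-fermented / fermented / over-fermented (assumed labels)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),          # RGB tea images, assumed size
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, validation_split=0.2, epochs=10)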

13 pages, 1391 KiB  
Article
Guidelines for a Standardized Filesystem Layout for Scientific Data
by Florian Spreckelsen, Baltasar Rüchardt, Jan Lebert, Stefan Luther, Ulrich Parlitz and Alexander Schlemmer
Data 2020, 5(2), 43; https://doi.org/10.3390/data5020043 - 24 Apr 2020
Cited by 2 | Viewed by 4553
Abstract
Storing scientific data on the filesystem in a meaningful and transparent way is no trivial task. In particular, when the data have to be accessed after their originator has left the lab, the importance of a standardized filesystem layout cannot be overestimated. It is desirable to have a structure that allows for the unique categorization of all kinds of data, from experimental results to publications. To find widespread acceptance, the data have to be accessible to a broad variety of workflows, e.g., via a graphical user interface as well as via the command line. Furthermore, the inclusion of already existing data has to be as simple as possible. We propose a three-level layout to organize and store scientific data that incorporates the full chain of scientific data management, from data acquisition to analysis to publications. Metadata are saved in a standardized way and connect original data to analyses and publications as well as to their originators. A simple software tool to check a file structure for compliance with the proposed structure is presented. Full article
(This article belongs to the Special Issue Data Quality and Data Access for Research)
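
A compliance check in the spirit of the proposed tool could be as simple as the sketch below, which walks a tree of the form <category>/<project>/<YYYY-MM-DD_label>/. The category vocabulary and naming rule here are placeholders; the paper defines its own layout and metadata conventions.

import re
from pathlib import Path

CATEGORIES = {"ExperimentalData", "DataAnalysis", "Publications"}   # assumed names
DATE_DIR = re.compile(r"^\d{4}-\d{2}-\d{2}(_.+)?$")                 # assumed rule

def check_layout(root):
    """Return a list of deviations from the sketched three-level layout."""
    problems = []
    for category in Path(root).iterdir():
        if not category.is_dir():
            continue
        if category.name not in CATEGORIES:
            problems.append(f"unexpected top-level folder: {category.name}")
            continue
        for project in (p for p in category.iterdir() if p.is_dir()):
            for record in (p for p in project.iterdir() if p.is_dir()):
                if not DATE_DIR.match(record.name):
                    problems.append(f"bad third-level name: {record}")
    return problems

print(check_layout("."))   # an empty list means the tree matches the sketch's rules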

18 pages, 15131 KiB  
Data Descriptor
Changes in the Building Stock of Da Nang between 2015 and 2017
by Andreas Braun, Gebhard Warth, Felix Bachofer, Tram Thi Quynh Bui, Hao Tran and Volker Hochschild
Data 2020, 5(2), 42; https://doi.org/10.3390/data5020042 - 23 Apr 2020
Cited by 3 | Viewed by 3527
Abstract
This descriptor introduces a novel dataset, which contains the number and types of buildings in the city of Da Nang in Central Vietnam. The buildings were classified into nine distinct types and initially extracted from a satellite image of the year 2015. Secondly, changes were identified based on a visual interpretation of an image of the year 2017, so that new buildings, demolished buildings and building upgrades can be quantitatively analyzed. The data was aggregated by administrative wards and a hexagonal grid with a diameter of 250 m to protect personal rights and to avoid the misuse of a single building’s information. The dataset shows an increase of 19,391 buildings between October 2015 and August 2017, with a variety of interesting spatial patterns. The center of the city is mostly dominated by building changes and upgrades, while most of the new buildings were constructed within a distance of five to six kilometers from the city center. Full article

24 pages, 1156 KiB  
Article
A Multi-Factor Analysis of Forecasting Methods: A Study on the M4 Competition
by Pantelis Agathangelou, Demetris Trihinas and Ioannis Katakis
Data 2020, 5(2), 41; https://doi.org/10.3390/data5020041 - 22 Apr 2020
Cited by 2 | Viewed by 3429
Abstract
As forecasting becomes more and more appreciated in situations and activities of everyday life that involve prediction and risk assessment, more methods and solutions make their appearance in this exciting arena of uncertainty. However, less is known about what makes a promising or a poor forecast. In this article, we provide a multi-factor analysis on the forecasting methods that participated and stood out in the M4 competition, by focusing on Error (predictive performance), Correlation (among different methods), and Complexity (computational performance). The main goal of this study is to recognize the key elements of the contemporary forecasting methods, reveal what made them excel in the M4 competition, and eventually provide insights towards better understanding the forecasting task. Full article
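
The three factors can be made concrete with a toy example: Error via the M4 sMAPE definition, Correlation between the errors of two simple methods, and Complexity as a crude runtime proxy. The naive and drift forecasters below stand in for the competition's far more sophisticated entries.

import time
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE as used in the M4 competition (in percent)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 200.0 * np.mean(np.abs(forecast - actual) / (np.abs(actual) + np.abs(forecast)))

def naive_forecast(history, horizon):
    return np.repeat(history[-1], horizon)            # repeat the last observation

def drift_forecast(history, horizon):
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + slope * np.arange(1, horizon + 1)

history = np.array([10.0, 12.0, 13.0, 15.0, 16.0, 18.0])
actual = np.array([19.0, 20.0, 22.0])                 # held-out future values (invented)

start = time.perf_counter()
f_naive = naive_forecast(history, 3)
f_drift = drift_forecast(history, 3)
elapsed = time.perf_counter() - start                 # crude Complexity proxy

print("Error (sMAPE):", smape(actual, f_naive), smape(actual, f_drift))
print("Correlation of errors:", np.corrcoef(f_naive - actual, f_drift - actual)[0, 1])
print("Complexity (seconds):", elapsed)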

7 pages, 1338 KiB  
Data Descriptor
The Fluctuation of Process Gasses Especially of Carbon Monoxide during Aerobic Biostabilization of an Organic Fraction of Municipal Solid Waste under Different Technological Regimes
by Sylwia Stegenta-Dąbrowska, Jakub Rogosz, Przemysław Bukowski, Marcin Dębowski, Peter F. Randerson, Jerzy Bieniek and Andrzej Białowiec
Data 2020, 5(2), 40; https://doi.org/10.3390/data5020040 - 19 Apr 2020
Cited by 2 | Viewed by 2121
Abstract
Carbon monoxide (CO) is an air pollutant commonly formed during natural and anthropogenic processes involving incomplete combustion. Much less is known about biological CO production during the decomposition of the organic fraction (OF), especially originating from municipal solid waste (MSW), e.g., during the aerobic biostabilization (AB) process. In this dataset, we summarized the temperature and the content of process gases (including rarely reported carbon monoxide, CO) generated inside full-scale AB of an organic fraction of municipal solid waste (OFMSW) reactor. The objective of the study was to present the data of the fluctuation of CO content as well as that of O2, CO2, and CH4 in process gas within the waste pile, during the AB of the OFMSW. The OFMSW was aerobically biostabilized in six reactors, in which the technological regimes of AB were dependent on process duration (42–69 days), waste mass (391.02–702.38 Mg), the intensity of waste aeration (4.4–10.7 m3·Mg−1·h−1), reactor design (membrane-covered reactor or membrane-covered reactor with sidewalls) and thermal conditions in the reactor (20.2–77.0 °C). The variations in the degree of waste aeration (O2 content), temperature, and fluctuation of CO, CO2, and CH4 content during the weekly measurement intervals were summarized. Despite a high O2 content in all reactors and stable thermal conditions, the presence of CO in process gas was observed, which suggests that ensuring optimum conditions for the process is not sufficient for CO emissions to be mitigated. In the analyzed experiment, CO concentration was highly variable over the duration of the process, ranging from a few to over 1,500 ppm. The highest concentration of CO was observed between the second and fifth weeks of the test. The reactor B2 was the source of the highest CO production and average highest temperature. This study suggests that the highest CO productions occur at the highest temperature, which is why the authors believe that CO production has thermochemical foundations. Full article
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)

18 pages, 3398 KiB  
Article
An On-Demand Service for Managing and Analyzing Arctic Sea Ice High Spatial Resolution Imagery
by Dexuan Sha, Xin Miao, Mengchao Xu, Chaowei Yang, Hongjie Xie, Alberto M. Mestas-Nuñez, Yun Li, Qian Liu and Jingchao Yang
Data 2020, 5(2), 39; https://doi.org/10.3390/data5020039 - 17 Apr 2020
Cited by 2 | Viewed by 3560
Abstract
Sea ice acts as both an indicator and an amplifier of climate change. High spatial resolution (HSR) imagery is an important data source in Arctic sea ice research for extracting sea ice physical parameters and calibrating/validating climate models. HSR images are difficult to process and manage due to their large data volume, heterogeneous data sources, and complex spatiotemporal distributions. In this paper, an Arctic Cyberinfrastructure (ArcCI) module is developed that allows reliable and efficient on-demand image batch processing on the web. For this module, available associated datasets are collected and presented through an open data portal. The ArcCI module offers an architecture based on cloud computing and big data components for HSR sea ice images, including functionalities of (1) data acquisition through File Transfer Protocol (FTP) transfer, front-end uploading, and physical transfer; (2) data storage based on the Hadoop distributed file system and a mature operational relational database; (3) distributed image processing, including object-based image classification and parameter extraction of sea ice features; and (4) 3D visualization of the dynamic spatiotemporal distribution of extracted parameters with flexible statistical charts. Arctic researchers can search and find Arctic sea ice HSR images and relevant metadata in the open data portal, obtain extracted ice parameters, and conduct visual analytics interactively. Users with large numbers of images can leverage the service to process their images in a high-performance manner on the cloud and to manage and analyze the results in one place. The ArcCI module will assist domain scientists in investigating polar sea ice and can be easily transferred to other HSR image processing research projects. Full article

12 pages, 2789 KiB  
Data Descriptor
Bioinformatics Analysis Identifying Key Biomarkers in Bladder Cancer
by Chuan Zhang, Mandy Berndt-Paetz and Jochen Neuhaus
Data 2020, 5(2), 38; https://doi.org/10.3390/data5020038 - 16 Apr 2020
Cited by 3 | Viewed by 3315
Abstract
Our goal was to find new diagnostic and prognostic biomarkers in bladder cancer (BCa) and to predict molecular mechanisms and processes involved in BCa development and progression. Notably, data collection is an unavoidable and time-consuming step, and identifying complementary results requires considerable literature retrieval. Here, we provide detailed information on the datasets used, the study design, and the data mining. We analyzed differentially expressed genes (DEGs) in the different datasets and retrieved the most important hub genes. We report the meta-data of the population, such as gender, race, and tumor stage, and the expression levels of the hub genes. We include comprehensive information about the gene ontology (GO) enrichment analyses and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. We also retrieved information about the up- and down-regulation of genes. All in all, the presented datasets can be used to evaluate potential biomarkers and to predict the performance of different preclinical biomarkers in BCa. Full article
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics)

19 pages, 6997 KiB  
Article
U-Net Segmented Adjacent Angle Detection (USAAD) for Automatic Analysis of Corneal Nerve Structures
by Philip Mehrgardt, Seid Miad Zandavi, Simon K. Poon, Juno Kim, Maria Markoulli and Matloob Khushi
Data 2020, 5(2), 37; https://doi.org/10.3390/data5020037 - 14 Apr 2020
Cited by 13 | Viewed by 4827
Abstract
Measurement of corneal nerve tortuosity is associated with dry eye disease, diabetic retinopathy, and a range of other conditions. However, clinicians measure tortuosity on very different grading scales that are inherently subjective. Using in vivo confocal microscopy, 253 images of corneal nerves were captured and manually labelled by two researchers with tortuosity measurements ranging on a scale from 0.1 to 1.0. Tortuosity was estimated computationally by extracting a binarised nerve structure utilising a previously published method. A novel U-Net segmented adjacent angle detection (USAAD) method was developed by training a U-Net with a series of back feeding processed images and nerve structure vectorizations. Angles between all vectors and segments were measured and used for training and predicting tortuosity measured by human labelling. Despite the disagreement among clinicians on tortuosity labelling measures, the optimised grading measurement was significantly correlated with our USAAD angle measurements. We identified the nerve interval lengths that optimised the correlation of tortuosity estimates with human grading. We also show the merit of our proposed method with respect to other baseline methods that provide a single estimate of tortuosity. The real benefit of USAAD in future will be to provide comprehensive structural information about variations in nerve orientation for potential use as a clinical measure of the presence of disease and its progression. Full article
(This article belongs to the Special Issue Data-Driven Healthcare Tasks: Tools, Frameworks, and Techniques)
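
A much-simplified, angle-based tortuosity estimate is sketched below for a hand-specified centreline: it averages the absolute turning angle between consecutive segments. USAAD itself derives these angles from U-Net segmentations, so this is only an illustration of the geometric idea.

import numpy as np

def turning_angle_tortuosity(points):
    """Mean angle (radians) between consecutive segments of a polyline."""
    pts = np.asarray(points, dtype=float)
    vecs = np.diff(pts, axis=0)                                  # segment vectors
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)    # unit directions
    cosines = np.clip(np.sum(unit[:-1] * unit[1:], axis=1), -1.0, 1.0)
    return float(np.mean(np.arccos(cosines)))

straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
wavy = [(0, 0), (1, 0.6), (2, -0.6), (3, 0.6)]
print(turning_angle_tortuosity(straight))   # ~0 for a straight nerve trace
print(turning_angle_tortuosity(wavy))       # larger for a tortuous nerve trace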

13 pages, 7943 KiB  
Data Descriptor
METER.AC: Live Open Access Atmospheric Monitoring Data for Bulgaria with High Spatiotemporal Resolution
by Atanas Terziyski, Stoyan Tenev, Vedrin Jeliazkov, Nina Jeliazkova and Nikolay Kochev
Data 2020, 5(2), 36; https://doi.org/10.3390/data5020036 - 08 Apr 2020
Cited by 9 | Viewed by 4598
Abstract
Detailed atmospheric monitoring data are notoriously difficult to obtain for some geographic regions, while they are of paramount importance in scientific research, forecasting, emergency response, policy making, etc. We describe a continuously updated dataset, METER.AC, consisting of raw measurements of atmospheric pressure, temperature, relative humidity, particulate matter, and background radiation in about 100 locations in Bulgaria, as well as some derived values such as sea-level atmospheric pressure, dew/frost point, and hourly trends. The measurements are performed by low-power maintenance-free nodes with common hardware and software, which are specifically designed and optimized for this purpose. The time resolution of the measurements is 5 min. The short-term aim is to deploy at least one node per 100 km2, while uniformly covering altitudes between 0 and 3000 m asl with a special emphasis on remote mountainous areas. A full history of all raw measurements (non-aggregated in time and space) is publicly available, starting from September 2018. We describe the basic technical characteristics of our in-house developed equipment, data organization, and communication protocols as well as present some use case examples. The METER.AC network relies on the paradigm of the Internet of Things (IoT), by collecting data from various gauges. A guiding principle in this work is the provision of findable, accessible, interoperable, and reusable (FAIR) data. The dataset is in the public domain, and it provides resources and tools enabling citizen science development in the context of sustainable development. Full article
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)
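
Two of the derived quantities mentioned above can be approximated with standard formulas, the Magnus formula for dew point and the barometric reduction to sea-level pressure; the network's own processing may use different constants. A sketch with invented station values:

import math

def dew_point_c(temp_c, rel_humidity_pct):
    """Dew point (°C) from the Magnus approximation over water."""
    a, b = 17.62, 243.12                      # commonly used Magnus coefficients
    gamma = math.log(rel_humidity_pct / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

def sea_level_pressure_hpa(pressure_hpa, altitude_m, temp_c):
    """Station pressure reduced to sea level with the standard barometric formula."""
    return pressure_hpa * (1 - 0.0065 * altitude_m /
                           (temp_c + 0.0065 * altitude_m + 273.15)) ** -5.257

# A hypothetical station at 1500 m asl reporting 845 hPa, 8 °C, 72% relative humidity.
print(round(dew_point_c(8.0, 72.0), 1))
print(round(sea_level_pressure_hpa(845.0, 1500.0, 8.0), 1))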

13 pages, 2714 KiB  
Article
Data Quality as a Critical Success Factor for User Acceptance of Research Information Systems
by Otmane Azeroual, Gunter Saake, Mohammad Abuosba and Joachim Schöpfel
Data 2020, 5(2), 35; https://doi.org/10.3390/data5020035 - 06 Apr 2020
Cited by 12 | Viewed by 3955
Abstract
In this paper, the influence of data quality on the success of user acceptance of research information systems (RIS) is investigated. To date, little research has been done on this topic and no dedicated studies have been carried out; so far, work has focused only on the importance of data quality in RIS, the investigation of its dimensions, and techniques for measuring and improving data quality in RIS (such as data profiling, data cleansing, data wrangling, and text data mining). With this work, we try to answer the question of the impact of data quality on the success of RIS user acceptance. User acceptance of an RIS ultimately determines whether research institutions keep their system or replace it with a new one. The result is a statement about the extent to which data quality influences the success of users’ acceptance of RIS. Full article
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)

5 pages, 374 KiB  
Data Descriptor
Player Heart Rate Responses and Pony External Load Measures during 16-Goal Polo
by Russ Best
Data 2020, 5(2), 34; https://doi.org/10.3390/data5020034 - 02 Apr 2020
Cited by 5 | Viewed by 2784
Abstract
This dataset provides information pertaining to the spatiotemporal stresses experienced by Polo ponies in play and the cardiovascular responses to these demands by Polo players, during 16-goal Polo. Data were collected by player-worn GPS units and paired heart rate monitors, across a New Zealand Polo season. The dataset comprises observations from 160 chukkas of Open Polo, and is presented as per chukka per game (curated) and in per effort per player (raw) formats. Data for distance, speed, and high intensity metrics are presented and are further categorised into five equine-based speed zones, in accordance with previous literature. The purpose of this dataset is to provide a detailed quantification of the load experienced by Polo players and their ponies at the highest domestic performance level in New Zealand, as well as advancing the scope of previous Polo literature that has employed GPS or heart rate monitoring technologies. This dataset may be of interest to equine scientists and trainers, veterinary practitioners, and sports scientists. An exemplar template is provided to facilitate the adoption of this data collection approach by other practitioners. Full article
(This article belongs to the Special Issue Data from Smartphones and Wearables)

24 pages, 4752 KiB  
Article
Multiple Regression Analysis and Frequent Itemset Mining of Electronic Medical Records: A Visual Analytics Approach Using VISA_M3R3
by Sheikh S. Abdullah, Neda Rostamzadeh, Kamran Sedig, Amit X. Garg and Eric McArthur
Data 2020, 5(2), 33; https://doi.org/10.3390/data5020033 - 29 Mar 2020
Cited by 11 | Viewed by 4667
Abstract
Medication-induced acute kidney injury (AKI) is a well-known problem in clinical medicine. This paper reports the first development of a visual analytics (VA) system that examines how different medications associate with AKI. In this paper, we introduce and describe VISA_M3R3, a VA system designed to assist healthcare researchers in identifying medications and medication combinations that associate with a higher risk of AKI using electronic medical records (EMRs). By integrating multiple regression models, frequent itemset mining, data visualization, and human-data interaction mechanisms, VISA_M3R3 allows users to explore complex relationships between medications and AKI in such a way that would be difficult or sometimes even impossible without the help of a VA system. Through an analysis of 595 medications using VISA_M3R3, we have identified 55 AKI-inducing medications, 24,212 frequent medication groups, and 78 medication groups that are associated with AKI. The purpose of this paper is to demonstrate the usefulness of VISA_M3R3 in the investigation of medication-induced AKI in particular and other clinical problems in general. Furthermore, this research highlights what needs to be considered in the future when designing VA systems that are intended to support gaining novel and deep insights into massive existing EMRs. Full article
(This article belongs to the Special Issue Data Quality and Data Access for Research)
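
The frequent-itemset side of the analysis can be illustrated with a toy support count over hypothetical per-patient medication sets, as below. VISA_M3R3 couples this kind of mining with regression models and interactive visualization, which is far beyond this sketch.

from collections import Counter
from itertools import combinations

# Hypothetical per-patient medication records (invented for illustration).
records = [
    {"ibuprofen", "lisinopril", "furosemide"},
    {"ibuprofen", "lisinopril"},
    {"lisinopril", "furosemide"},
    {"ibuprofen", "furosemide", "lisinopril"},
]

MIN_SUPPORT = 0.5   # keep pairs appearing in at least half of the records

pair_counts = Counter(
    pair for meds in records for pair in combinations(sorted(meds), 2)
)
frequent_pairs = {
    pair: count / len(records)
    for pair, count in pair_counts.items()
    if count / len(records) >= MIN_SUPPORT
}
print(frequent_pairs)   # medication pairs with their support values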

12 pages, 2184 KiB  
Data Descriptor
Data-Sets for Indoor Photovoltaic Behavior in Low Lighting Conditions
by Mojtaba Masoudinejad
Data 2020, 5(2), 32; https://doi.org/10.3390/data5020032 - 28 Mar 2020
Cited by 5 | Viewed by 2901
Abstract
Analysis of voltage–current behavior of photovoltaic modules is a critical part of their modeling. Parameter identification of these models demands data from them, measured in realistic environments. In spite of advancement in modeling methodologies under solar lighting, few analyses have been focused on indoor photovoltaics. Lack of accurate and reproducible data as a major challenge in this field is addressed here. A high accuracy measurement setup for evaluation and analysis of indoor photovoltaic modules is explained. By use of this system, different modules are measured under diverse environmental conditions. These measurements are structured in data-sets that can be used for either analysis of physical environment effects and modeling or development of specific parameter identification methods in low light intensity conditions. Full article
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)
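
Parameter identification for such data is commonly based on the single-diode model, I = Iph − I0·(exp((V + I·Rs)/(n·Ns·Vt)) − 1) − (V + I·Rs)/Rsh. The sketch below evaluates it by fixed-point iteration with placeholder parameter values chosen for a low-light indoor module; it is not fitted to the published dataset.

import numpy as np

K_B, Q = 1.380649e-23, 1.602176634e-19   # Boltzmann constant, elementary charge

def single_diode_current(v, i_ph, i_0, r_s, r_sh, n, cells, temp_k=298.15):
    """Solve the implicit single-diode equation for I by simple fixed-point iteration."""
    v_t = K_B * temp_k / Q                # thermal voltage
    i = i_ph                              # initial guess: photocurrent
    for _ in range(200):
        i = (i_ph
             - i_0 * (np.exp((v + i * r_s) / (n * cells * v_t)) - 1.0)
             - (v + i * r_s) / r_sh)
    return i

# Low-light example: a small 5-cell indoor module with a 120 microamp photocurrent (assumed).
for v in (0.0, 1.0, 2.0):
    print(v, single_diode_current(v, i_ph=120e-6, i_0=1e-9, r_s=10.0,
                                  r_sh=2e5, n=1.5, cells=5))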

17 pages, 1355 KiB  
Data Descriptor
Monthly Entomological Inoculation Rate Data for Studying the Seasonality of Malaria Transmission in Africa
by Edmund I. Yamba, Adrian M. Tompkins, Andreas H. Fink, Volker Ermert, Mbouna D. Amelie, Leonard K. Amekudzi and Olivier J. T. Briët
Data 2020, 5(2), 31; https://doi.org/10.3390/data5020031 - 27 Mar 2020
Cited by 4 | Viewed by 4236
Abstract
A comprehensive literature review was conducted to create a new database of 197 field surveys of monthly malaria Entomological Inoculation Rates (EIR), a metric of malaria transmission intensity. All field studies provide data at a monthly temporal resolution and have a duration of at least one year in order to study the seasonality of the disease. For inclusion, data collection methodologies adhered to a specific standard and the location and timing of the measurements were documented. Auxiliary information on the population and hydrological setting were also included. The database includes measurements that cover West and Central Africa and the period from 1945 to 2011, and hence facilitates analysis of interannual transmission variability over broad regions. Full article
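
For readers unfamiliar with the metric: EIR is the human biting rate multiplied by the sporozoite rate, and annual EIR is the sum of the monthly values. The sketch below applies this standard definition to invented monthly numbers.

monthly = [
    # (bites per person per night, fraction of mosquitoes carrying sporozoites) - invented
    (2.1, 0.030), (3.4, 0.025), (6.0, 0.040), (8.2, 0.055),
    (5.5, 0.050), (2.0, 0.020), (0.8, 0.010), (0.5, 0.008),
    (0.9, 0.012), (1.5, 0.020), (2.8, 0.030), (2.4, 0.028),
]
DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

monthly_eir = [hbr * sr * days for (hbr, sr), days in zip(monthly, DAYS)]
annual_eir = sum(monthly_eir)            # infective bites per person per year
print([round(e, 2) for e in monthly_eir])
print(round(annual_eir, 1))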

10 pages, 1342 KiB  
Article
Influence of Information Quality via Implemented German RCD Standard in Research Information Systems
by Otmane Azeroual, Joachim Schöpfel and Dragan Ivanovic
Data 2020, 5(2), 30; https://doi.org/10.3390/data5020030 - 27 Mar 2020
Cited by 2 | Viewed by 3028
Abstract
With the steady increase in the number of data sources to be stored and processed by higher education and research institutions, it has become necessary to develop research information systems (RIS), which store this research information in the long term and make it accessible for further use, such as reporting and evaluation processes, institutional decision making, and the presentation of research performance. In order to retain control while integrating research information from heterogeneous internal and external data sources and disparate interfaces into a RIS, and to maximize the benefits of the research information, ensuring data quality in the RIS is critical. To facilitate a common understanding of the research information collected and to harmonize data collection processes, various standardization initiatives have emerged in recent decades. These standards support the use of research information in RIS and enable compatibility and interoperability between different information systems. This paper examines the process of securing data quality in RIS and the impact of research information standards on data quality in RIS. We focus on the recently developed German Research Core Dataset standard as a case of application. Full article
(This article belongs to the Special Issue Data Quality and Data Access for Research)

14 pages, 2267 KiB  
Article
Research Data Sharing in Spain: Exploring Determinants, Practices, and Perceptions
by Rafael Aleixandre-Benavent, Antonio Vidal-Infer, Adolfo Alonso-Arroyo, Fernanda Peset and Antonia Ferrer Sapena
Data 2020, 5(2), 29; https://doi.org/10.3390/data5020029 - 27 Mar 2020
Cited by 10 | Viewed by 4096
Abstract
This work provides an overview of a Spanish survey on research data, which was carried out within the framework of the Datasea project at the beginning of 2015. It falls under the Sustainable Development Goals (Goal 9), which include support for research. The purpose of the study was to identify the habits and current experiences of Spanish researchers in the health sciences in relation to the management and sharing of raw research data. Method: An electronic questionnaire composed of 40 questions divided into three blocks was designed. The three sections contained questions on the following aspects: (A) personal information; (B) creation and reuse of data; and (C) preservation of data. The questionnaire was sent by email to a list of universities in Spain to be distributed among their researchers and professors. A total of 1063 researchers completed the questionnaire. More than half of the respondents (54.9%) lacked a data management plan; nearly a quarter had storage systems for the research group; 81.5% used personal computers to store data; “contact with colleagues” was the most frequent means used to locate and access other researchers’ data; and nearly 60% of researchers stated that their data were available to the research group and collaborating colleagues. The main fears about sharing were legal questions (47.9%), misuse or misinterpretation of data (42.7%), and loss of authorship (28.7%). The results allow us to understand the state of data sharing among Spanish researchers and can serve as a basis to identify the needs of researchers to share data, optimize existing infrastructure, and promote data sharing among those who do not yet practice it. Full article
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)

7 pages, 536 KiB  
Article
The Emergency Medicine Facing the Challenge of Open Science
by Andrea Sixto-Costoya, Rafael Aleixandre-Benavent, Rut Lucas-Domínguez and Antonio Vidal-Infer
Data 2020, 5(2), 28; https://doi.org/10.3390/data5020028 - 25 Mar 2020
Cited by 8 | Viewed by 2798
Abstract
(1) Background: The availability of research datasets can strengthen and facilitate research processes. This is especially relevant in the emergency medicine field, due to the importance of providing immediate care in critical situations, as the current Coronavirus (COVID-19) pandemic is showing the scientific community. This work aims to show which emergency medicine journals indexed in Journal Citation Reports (JCR) currently meet data sharing criteria. (2) Methods: This study analyzes the editorial policies regarding data deposit of the journals in the emergency medicine category of the JCR and evaluates the supplementary material of the articles published in these journals that have been deposited in the PubMed Central repository. (3) Results: It was observed that 19 of the 24 journals in the emergency medicine category of Journal Citation Reports are also present in PubMed Central (PMC), yielding a total of 5983 articles. Of these, only 9.4% of the articles contain supplemental material. Although second-quartile journals of the JCR emergency medicine category contribute quantitatively more articles to PMC, the journals most involved in depositing supplemental material belong to the first quartile; the most common file format is PDF, followed by text documents. (4) Conclusion: This study reveals that data sharing remains an incipient practice in the emergency medicine field, as researchers still face barriers to participating in data sharing. It is therefore necessary to promote dynamics that improve this practice both qualitatively (the quality and format of datasets) and quantitatively (the quantity of datasets in absolute terms) in research. Full article
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)
