
Data, Volume 4, Issue 3 (September 2019) – 41 articles

Cover Story: Satellite earth observation is gaining increasing recognition as a technology for addressing global environmental challenges. Due to large global programs such as Landsat and Copernicus, the availability of free data has increased significantly in recent years. In response, there has been a drive to find more efficient ways to process data at scale and facilitate access to derived insights. Two concepts addressing this challenge are data cubes as a technical solution for efficiently scaling computations and analysis-ready data (ARD) for defining quality standards and ensuring traceability with extensive metadata. This study compared different processing schemes for converting standard synthetic aperture radar (SAR) image datasets into radiometrically terrain corrected (RTC) analysis-ready products. The goal is to assist in the practical implementation of defined ARD standards, enabling routine analyses [...]
6 pages, 1015 KiB  
Data Descriptor
Video Recordings of Male Face and Neck Movements for Facial Recognition and Other Purposes
by Collin Gros and Jeremy Straub
Data 2019, 4(3), 130; https://doi.org/10.3390/data4030130 - 06 Sep 2019
Viewed by 4214
Abstract
Facial recognition is made more difficult by unusual facial positions and movement. However, for many applications, the ability to accurately recognize moving subjects with movement-distorted facial features is required. This dataset includes videos of multiple subjects, taken under multiple lighting brightness and temperature conditions, which can be used to train and evaluate the performance of facial recognition systems. Full article

15 pages, 2106 KiB  
Article
Predicting High-Risk Prostate Cancer Using Machine Learning Methods
by Henry Barlow, Shunqi Mao and Matloob Khushi
Data 2019, 4(3), 129; https://doi.org/10.3390/data4030129 - 02 Sep 2019
Cited by 38 | Viewed by 8225
Abstract
Prostate cancer can be low- or high-risk to the patient’s health. Current screening on the basis of prostate-specific antigen (PSA) levels has a tendency towards both false positives and false negatives, both of which have negative consequences. We obtained a dataset of 35,875 patients from the screening arm of the National Cancer Institute’s Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. We segmented the data into instances without prostate cancer, instances with low-risk prostate cancer, and instances with high-risk prostate cancer. We developed a pipeline to deal with imbalanced data and proposed algorithms to perform preprocessing on such datasets. We evaluated the accuracy of various machine learning algorithms in predicting high-risk prostate cancer. The proposed pipeline achieves an accuracy of 91.5% using standard scaling, the SVMSMOTE sampling method, and AdaBoost for machine learning. We then evaluated the contribution of rate of change of PSA, age, BMI, and filtration by race to this model’s accuracy. We identified that including the rate of change of PSA and age in our model increased the area under the curve (AUC) of the model by 6.8%, whereas BMI and race had a minimal effect. Full article

11 pages, 4065 KiB  
Data Descriptor
Flights of a Multirotor UAS with Structural Faults: Failures on Composite Propeller(s)
by Srikanth Gururajan, Kyle Mitchell and William Ebel
Data 2019, 4(3), 128; https://doi.org/10.3390/data4030128 - 28 Aug 2019
Cited by 4 | Viewed by 3298
Abstract
Data acquired from several flights of a custom-fabricated Hexacopter Unmanned Aerial System (UAS) with composite structure (carbon fiber arms and central hub) and composite (carbon fiber) propellers are described in this article. The Hexacopter was assembled from a commercially available kit (Tarot 690) and flown in manual and autonomous modes. Takeoffs and landings were under manual control and the bulk of the flight tests was conducted with the Hexacopter in a “position hold” mode. All flights were flown within the UAS flight cage at Parks College of Engineering, Aviation and Technology at Saint Louis University for approximately 5 min each. Several failure conditions (different types, artificially induced) on the composite (carbon fiber) propellers were tested, including failures on up to two propellers. The dataset described in this article contains flight data from the onboard flight controller (Pixhawk) as well as three accelerometers, each with three axes, mounted on the arms of the Hexacopter UAS. The data are included as supplemental material. Full article

9 pages, 382 KiB  
Data Descriptor
NILMPEds: A Performance Evaluation Dataset for Event Detection Algorithms in Non-Intrusive Load Monitoring
by Lucas Pereira
Data 2019, 4(3), 127; https://doi.org/10.3390/data4030127 - 24 Aug 2019
Cited by 8 | Viewed by 3775
Abstract
Datasets are important for researchers to build models and test how these perform, as well as to reproduce research experiments from others. This data paper presents the NILM Performance Evaluation dataset (NILMPEds), which is aimed primarily at research reproducibility in the field of non-intrusive load monitoring. This initial release of NILMPEds is dedicated to event detection algorithms and comprises ground-truth data for four test datasets, the specification of 47,950 event detection models, the power events returned by each model in the four test datasets, and the performance of each individual model according to 31 performance metrics. Full article

11 pages, 3235 KiB  
Article
A Novel Ensemble Neuro-Fuzzy Model for Financial Time Series Forecasting
by Alexander Vlasenko, Nataliia Vlasenko, Olena Vynokurova, Yevgeniy Bodyanskiy and Dmytro Peleshko
Data 2019, 4(3), 126; https://doi.org/10.3390/data4030126 - 23 Aug 2019
Cited by 17 | Viewed by 3382
Abstract
Neuro-fuzzy models have a proven record of successful application in finance. Forecasting future values is a crucial element of successful decision making in trading. In this paper, a novel ensemble neuro-fuzzy model is proposed to overcome the limitations of, and improve on, the previously successfully applied five-layer multidimensional Gaussian neuro-fuzzy model and its learning. The proposed solution allows skipping the error-prone hyperparameter selection process and shows better accuracy on real-life financial data. Full article
(This article belongs to the Special Issue Data Analysis for Financial Markets)

10 pages, 365 KiB  
Data Descriptor
Google Web and Image Search Visibility Data for Online Store
by Artur Strzelecki
Data 2019, 4(3), 125; https://doi.org/10.3390/data4030125 - 22 Aug 2019
Cited by 13 | Viewed by 6709
Abstract
This data descriptor describes Google search engine visibility data. The visibility of a domain name in a search engine comes from search engine optimization and can be evaluated based on four data metrics and five data dimensions. The data metrics are the following: clicks volume (1), impressions volume (2), click-through ratio (3), and ranking position (4). The data dimensions are as follows: queries entered into the search engine that trigger results with the researched domain name (1), page URLs from the researched domain that are available in the search engine results page (2), country of origin of search engine visitors (3), type of device used for the search (4), and date of the search (5). Search engine visibility data were obtained from the Google Search Console for an international online store, which is visible in 240 countries and territories, for a period of 15 months. The data contain 123 K clicks and 4.86 M impressions for the web search and 22 K clicks and 9.07 M impressions for the image search. The proposed method for obtaining data can be applied in any other area, not only in the e-commerce industry. Full article
(This article belongs to the Special Issue Data Analysis for Financial Markets)

16 pages, 3264 KiB  
Data Descriptor
A Dataset of Students’ Mental Health and Help-Seeking Behaviors in a Multicultural Environment
by Minh-Hoang Nguyen, Manh-Toan Ho, Quynh-Yen T. Nguyen and Quan-Hoang Vuong
Data 2019, 4(3), 124; https://doi.org/10.3390/data4030124 - 21 Aug 2019
Cited by 19 | Viewed by 30008
Abstract
University students, especially international students, face a higher risk of mental health problems than the general population. However, the literature regarding the prevalence and determinants of mental health problems as well as help-seeking behaviors of international and domestic students in Japan seems to be limited. This dataset contains 268 records of depression, acculturative stress, social connectedness, and help-seeking behaviors reported by international and domestic students at an international university in Japan. One of the main findings that can be drawn from this dataset is how the levels of social connectedness and acculturative stress are predictive of the reported depression among international as well as domestic students. The dataset is expected to provide reliable materials for further cross-cultural public health studies and policy-making in higher education. Full article
(This article belongs to the Special Issue Big Data and Digital Health)

12 pages, 1284 KiB  
Technical Note
dsCleaner: A Python Library to Clean, Preprocess and Convert Non-Intrusive Load Monitoring Datasets
by Manuel Pereira, Nuno Velosa and Lucas Pereira
Data 2019, 4(3), 123; https://doi.org/10.3390/data4030123 - 12 Aug 2019
Cited by 10 | Viewed by 5024
Abstract
Datasets play a vital role in data science and machine learning research as they serve as the basis for the development, evaluation, and benchmarking of new algorithms. Non-Intrusive Load Monitoring is one of the fields that has been benefiting from the recent increase in the number of publicly available datasets. However, there is a lack of consensus concerning how datasets should be made available to the community, resulting in considerable structural differences between the publicly available datasets. This technical note presents the DSCleaner, a Python library to clean, preprocess, and convert time series datasets to a standard file format. Two application examples using real-world datasets are also presented to show the technical validity of the proposed library. Full article

11 pages, 3182 KiB  
Data Descriptor
Sea Ice Climate Normals for Seasonal Ice Monitoring of Arctic and Sub-Regions
by Ge Peng, Anthony Arguez, Walter N. Meier, Freja Vamborg, Jake Crouch and Philip Jones
Data 2019, 4(3), 122; https://doi.org/10.3390/data4030122 - 10 Aug 2019
Cited by 5 | Viewed by 5564
Abstract
The climate normal, that is, the latest three full-decade average, of Arctic sea ice parameters is useful for baselining the sea ice state. A baseline ice state on both regional and local scales is important for monitoring how the current regional and local states depart from their normal to understand the vulnerability of marine and sea ice-based ecosystems to the changing climate conditions. Combined with up-to-date observations and reliable projections, normals are essential to business strategic planning, climate adaptation and risk mitigation. In this paper, monthly and annual climate normals of sea ice parameters (concentration, area, and extent) of the whole Arctic Ocean and 15 regional divisions are derived for the period of 1981–2010 using monthly satellite sea ice concentration estimates from a climate data record (CDR) produced by NOAA and the National Snow and Ice Data Center (NSIDC). Basic descriptions and characteristics of the normals are provided. Empirical Orthogonal Function (EOF) analysis has been utilized to describe spatial modes of sea ice concentration variability and how the corresponding principal components change over time. To provide users with basic information on data product accuracy and uncertainty, the climate normal values of Arctic sea ice extents (SIE) are compared with those of other products, including a product from NSIDC and two products from the Copernicus Climate Change Service (C3S). The SIE differences between different products are in the range of 2.3–4.5% of the CDR SIE mean. Additionally, data uncertainty estimates are represented by using the range (the difference between the maximum and minimum), standard deviation, 10th and 90th percentiles, and the first, second, and third quartile distribution of all monthly values, a distinct feature of these sea ice normal products. Full article
(This article belongs to the Special Issue Open Data and Robust & Reliable GIScience)

20 pages, 590 KiB  
Article
Aspect Extraction from Bangla Reviews Through Stacked Auto-Encoders
by Matteo Bodini
Data 2019, 4(3), 121; https://doi.org/10.3390/data4030121 - 09 Aug 2019
Cited by 7 | Viewed by 4087
Abstract
Interactions between online users have grown steadily in recent years, due to the latest developments of the web. People share online comments, opinions, and reviews about many topics. Aspect extraction is the automatic process of understanding the topic (the aspect) of such comments, and it has attracted great interest from both commercial and academic points of view. For instance, reviews available in webshops (like eBay, Amazon, Aliexpress, etc.) can help customers in purchasing products, and automatic analysis of reviews would be useful, as it is sometimes almost impossible to read all the available ones. In recent years, aspect extraction in the Bangla language has been regarded as a task of growing importance. In the previous literature, a few methods have been introduced to classify Bangla texts according to the aspect they focus on. This kind of research is limited mainly due to the lack of publicly available datasets for aspect extraction in the Bangla language. We take into account the only two publicly available datasets, recently published, collected for the task of aspect extraction in the Bangla language. Then, we introduce several classification methods based on stacked auto-encoders, which as far as we know have never been exploited for aspect extraction in Bangla, and we achieve better aspect classification performance than the state-of-the-art: the experiments show an average improvement of 0.17, 0.31, and 0.30 (across the two datasets) in precision, recall, and F1-score, respectively, over the values reported in the state-of-the-art works that tackled the problem. Full article

8 pages, 4219 KiB  
Data Descriptor
Satellite-Based Reconstruction of the Volcanic Deposits during the December 2015 Etna Eruption
by Gaetana Ganci, Annalisa Cappello, Giuseppe Bilotta, Claudia Corradino and Ciro Del Negro
Data 2019, 4(3), 120; https://doi.org/10.3390/data4030120 - 08 Aug 2019
Cited by 13 | Viewed by 2715
Abstract
Satellite-derived data, including an estimation of the eruption rate, proximal volcanic deposits and lava flow morphometric parameters (area, maximum length, thickness, and volume) are provided for the eruption that occurred at Mt Etna on 6–8 December 2015. This eruption took place at the New Southeast Crater (NSEC), the youngest of the summit craters of Etna, shortly after a sequence of four violent paroxysmal events took place in 65 h (3–5 December) at “Voragine”, the oldest summit crater. Multispectral SEVIRI images at 15 min sampling time have been used to compute time-averaged eruption rate curves, while tri-stereo Pléiades images, at 50 cm spatial resolution, provided the pre-eruptive topography and topographic changes due to volcanic deposits. In addition to the two types of satellite data, other parameters have been inferred, such as probable vesicularity and pyroclastic deposits. Full article

15 pages, 2706 KiB  
Article
Gifted and Talented Services for EFL Learners in China: A Step-by-Step Guide to Propensity Score Matching Analysis in R
by Shifang Tang, Fuhui Tong and Xiuhong Lu
Data 2019, 4(3), 119; https://doi.org/10.3390/data4030119 - 03 Aug 2019
Cited by 2 | Viewed by 3438
Abstract
We sought to quantify the effectiveness of a gifted and talented (GT) program provided to university students who demonstrated a talent for learning English as a foreign language (EFL) in China. To do so, we used propensity score matching (PSM) techniques to analyze data collected from a tier-1 university where an English talent (ET) program was provided. Specifically, we provided (a) a step-by-step guide of PSM analysis using the R analytical package, (b) the code for PSM analysis and visualization, and (c) the final analysis of baseline equivalence and treatment effect based on the matching sample. Collectively, the results of descriptive statistics, visualization, and baseline equivalence indicate that PSM is an effective matching technique for generating an unbiased counterfactual analysis. Moreover, the ET program yields a statistically significant, positive effect on ET students’ English language proficiency. Full article

16 pages, 2542 KiB  
Data Descriptor
A Rainfall Data Intercomparison Dataset of RADKLIM, RADOLAN, and Rain Gauge Data for Germany
by Jennifer Kreklow, Björn Tetzlaff, Gerald Kuhnt and Benjamin Burkhard
Data 2019, 4(3), 118; https://doi.org/10.3390/data4030118 - 02 Aug 2019
Cited by 14 | Viewed by 5366
Abstract
Quantitative precipitation estimates (QPE) derived from weather radars provide spatially and temporally highly resolved rainfall data. However, they are also subject to systematic and random bias and various potential uncertainties and therefore require thorough quality checks before usage. The dataset described in this paper is a collection of precipitation statistics calculated from the hourly nationwide German RADKLIM and RADOLAN QPEs provided by the German Weather Service (Deutscher Wetterdienst (DWD)), which were combined with rainfall statistics derived from rain gauge data for intercomparison. Moreover, additional information on parameters that can potentially influence radar data quality, such as the height above sea level, information on wind energy plants and the distance to the nearest radar station, was included in the dataset. The resulting two point shapefiles are readable with all common GIS and constitute a spatially highly resolved rainfall statistics geodataset for the period 2006 to 2017, which can be used for statistical rainfall analyses or for the derivation of model inputs. Furthermore, the publication of this data collection has the potential to benefit other users who intend to use precipitation data for any purpose in Germany, helping them identify the rainfall dataset that is best suited for their application by a straightforward comparison of three rainfall datasets without any tedious data processing and georeferencing. Full article

10 pages, 2455 KiB  
Article
Paving the Way towards an Armenian Data Cube
by Shushanik Asmaryan, Vahagn Muradyan, Garegin Tepanosyan, Azatuhi Hovsepyan, Armen Saghatelyan, Hrachya Astsatryan, Hayk Grigoryan, Rita Abrahamyan, Yaniss Guigoz and Gregory Giuliani
Data 2019, 4(3), 117; https://doi.org/10.3390/data4030117 - 02 Aug 2019
Cited by 28 | Viewed by 5235
Abstract
Environmental issues are becoming an increasing global concern because of the continuous pressure on natural resources. Earth observations (EO), which include both satellite/UAV and in-situ data, can provide robust monitoring for various environmental concerns. Realizing the full information potential of EO data requires innovative tools that minimize the time and scientific knowledge needed to access, prepare and analyze a large volume of data. The EO Data Cube (DC) is a new paradigm aiming to realize this potential. The article presents the Swiss-Armenian joint initiative on the deployment of an Armenian DC, which is anchored on the best practices of the Swiss model. The Armenian DC is a complete and up-to-date archive of EO data (e.g., Landsat 5, 7, 8, Sentinel-2) that benefits from Switzerland’s expertise in implementing the Swiss DC. The use case of confirming the delineation of Lake Sevan using the McFeeters band ratio algorithm is discussed. The validation shows that the results are sufficiently reliable. The transfer of the necessary knowledge from Switzerland to Armenia for developing and implementing the first version of an Armenian DC should be considered as a first step of a permanent collaboration for paving the way towards continuous remote environmental monitoring in Armenia. Full article
(This article belongs to the Special Issue Earth Observation Data Cubes)

10 pages, 2762 KiB  
Data Descriptor
A High-Resolution Map of Singapore’s Terrestrial Ecosystems
by Leon Yan-Feng Gaw, Alex Thiam Koon Yee and Daniel Rex Richards
Data 2019, 4(3), 116; https://doi.org/10.3390/data4030116 - 01 Aug 2019
Cited by 53 | Viewed by 23231
Abstract
The natural and semi-natural areas within cities provide important refuges for biodiversity, as well as many benefits to people. To study urban ecology and quantify the benefits of urban ecosystems, we need to understand the spatial extent and configuration of different types of vegetated cover within a city. It is challenging to map urban ecosystems because they are typically small and highly fragmented, and thus require high-resolution satellite images. This article describes a new high-resolution map of land cover for the tropical city-state of Singapore. We used images from WorldView and QuickBird satellites, and classified these images using random forest machine learning and supplementary datasets into 12 terrestrial land classes. Close to 50% of Singapore’s land cover is vegetated while freshwater fills about 6%, and the rest is bare or built up. The overall accuracy of the map was 79% and the class-specific errors are described in detail. Tropical regions such as Singapore have a lot of cloud cover year-round, complicating the process of mapping using satellite imagery. The land cover map provided here will have applications for urban biodiversity studies, ecosystem service quantification, and natural capital assessment. Full article

12 pages, 2595 KiB  
Article
Dynamic Data Citation Service—Subset Tool for Operational Data Management
by Chris Schubert, Georg Seyerl and Katharina Sack
Data 2019, 4(3), 115; https://doi.org/10.3390/data4030115 - 01 Aug 2019
Cited by 2 | Viewed by 3880
Abstract
In earth observation and climatological sciences, data and their data services grow daily across a large spatial extent owing to the high coverage rate of satellite sensors and model calculations, as well as continuous meteorological in situ observations. To reuse such data, especially data fragments and their data services, in a collaborative and reproducible manner by citing the original source, data analysts (e.g., researchers or impact modelers) need a way to identify the exact version, precise time information, parameters, and names of the dataset used. A manual process would make the citation of data fragments as a subset of an entire dataset rather complex and imprecise. Data in climate research are in most cases multidimensional, structured grid data that can change partially over time. The citation of such evolving content requires the approach of “dynamic data citation”. The applied approach is based on associating queries with persistent identifiers. These queries contain the subsetting parameters, e.g., the spatial coordinates of the desired study area or the time frame with a start and end date, which are automatically included in the metadata of the newly generated subset and thus represent the information about the data history, the data provenance, which has to be established in data repository ecosystems. The Research Data Alliance Data Citation Working Group (RDA Data Citation WG) summarized the scientific status quo as well as the state of the art from existing citation and data management concepts and developed the scalable dynamic data citation methodology of evolving data. The Data Centre at the Climate Change Centre Austria (CCCA) has implemented the given recommendations and, since 2017, has offered an operational service for dynamic data citation of climate scenario data.
Recognizing that this topic depends heavily on bibliographic citation research that is still under discussion, the CCCA service on Dynamic Data Citation focused on issues specific to the climate domain, such as data characteristics, formats, software environment, and usage behavior. Beyond sharing the experience gained, the current effort will address the scalability of the implementation, e.g., towards the potential of an Open Data Cube solution. Full article
(This article belongs to the Special Issue Earth Observation Data Cubes)

8 pages, 4499 KiB  
Data Descriptor
A New Multi-Temporal Forest Cover Classification for the Xingu River Basin, Brazil
by Margaret Kalacska, Oliver Lucanus, Leandro Sousa and J. Pablo Arroyo-Mora
Data 2019, 4(3), 114; https://doi.org/10.3390/data4030114 - 01 Aug 2019
Cited by 5 | Viewed by 3223
Abstract
We describe a new multi-temporal classification for forest/non-forest classes for a 1.3 million square kilometer area encompassing the Xingu River basin, Brazil. This region is well known for its exceptionally high biodiversity, especially in terms of the ichthyofauna, with approximately 600 known species, 10% of which are endemic to the river basin. Global and regional scale datasets do not adequately capture the rapidly changing land cover in this region. Accurate forest cover and forest cover change data are important for understanding the anthropogenic pressures on the aquatic ecosystems. We developed the new classifications with a minimum mapping unit of 0.8 ha from cloud free mosaics of Landsat TM5 and OLI 8 imagery in Google Earth Engine using a classification and regression tree (CART) aided by field photographs for the selection of training and validation points. Full article

23 pages, 4465 KiB  
Article
Paving the Way to Increased Interoperability of Earth Observations Data Cubes
by Gregory Giuliani, Joan Masó, Paolo Mazzetti, Stefano Nativi and Alaitz Zabala
Data 2019, 4(3), 113; https://doi.org/10.3390/data4030113 - 30 Jul 2019
Cited by 36 | Viewed by 6651
Abstract
Earth observations data cubes (EODCs) are a paradigm transforming the way users interact with large spatio-temporal Earth observation (EO) data. The paradigm enhances connections between data, applications and users, facilitating management, access and use of analysis ready data (ARD). The ambition is to allow users to harness big EO data at minimum cost and effort. This significant interest is illustrated by the various implementations that exist. The novelty of the approach has resulted in different innovative solutions and in the lack of a commonly agreed definition of EODC. Consequently, their interoperability has been recognized as a major challenge for the global change and Earth system science domains. The objectives of this paper are to prevent EODCs from becoming silos of information, to present how interoperability can be enabled using widely adopted geospatial standards, and to contribute to the debate on enhanced interoperability of EODCs. We demonstrate how standards can be used, profiled and enriched to pave the way to increased interoperability of EODCs and can help deliver and leverage the power of EO data by building efficient discovery, access and processing services. Full article
(This article belongs to the Special Issue Earth Observation Data Cubes)

10 pages, 1765 KiB  
Article
Catastrophic Household Expenditure for Healthcare in Turkey: Clustering Analysis of Categorical Data
by Onur Dogan, Gizem Kaya, Aycan Kaya and Hidayet Beyhan
Data 2019, 4(3), 112; https://doi.org/10.3390/data4030112 - 29 Jul 2019
Cited by 2 | Viewed by 3982
Abstract
The amount of health expenditure at the household level is one of the most basic indicators of development in countries. In many countries, health expenditure increases relative to national income. Out-of-pocket health spending that exceeds income, or is too high relative to it, signals economic distress that lowers the standard of living; this is called catastrophic health expenditure. Catastrophic expenditure may be affected by many factors, such as household type, property status, smoking and alcohol consumption habits, being active in sports, and having private health insurance. The study aims to investigate households with respect to catastrophic health expenditure using the clustering method. Clustering makes it possible to see the main similarities and differences between the groups. The results show that there are significant and interesting differences between the five groups. C4 households earn more but spend less on health problems, at a rate of 3.10%, because people who exercise regularly have fewer health problems. A household in cluster C5, consisting of one adult who is a landlord and three people in total (mother or father and two children), earns more and spends more on health than households in other clusters. C1 households, elementary families with three children who do not pay rent although they are not landlords, have the highest catastrophic health expenditure. Households in C3 have an average health expenditure rate of 3.83%, which is higher than in other clusters. Households in the cluster C2 make the most catastrophic health expenditure. Full article
(This article belongs to the Special Issue Data-Driven Healthcare Tasks: Tools, Frameworks, and Techniques)

11 pages, 2655 KiB  
Data Descriptor
TIRF Microscope Image Sequences of Fluorescent IgE-FcεRI Receptor Complexes inside a FcεRI-Centric Synapse in RBL-2H3 Cells
by Rachel Drawbond and Kathrin Spendier
Data 2019, 4(3), 111; https://doi.org/10.3390/data4030111 - 28 Jul 2019
Cited by 2 | Viewed by 3615
Abstract
Total internal reflection fluorescence (TIRF) microscope image sequences are commonly used to study receptors in live cells. The dataset presented herein facilitates the study of the IgE-FcεRI receptor signaling complex (IgE-RC) in rat basophilic leukemia (RBL-2H3) cells coming into contact with a supported lipid bilayer with 25 mol% N-dinitrophenyl-aminocaproyl phosphatidylethanolamine, modeling an immunological synapse. TIRF microscopy was used to image IgE-RCs within this FcεRI-centric synapse by loading RBL-2H3 cells with fluorescent anti-dinitrophenyl (anti-DNP) immunoglobulin E (IgE) in suspension for 24 h. Fluorescent anti-DNP IgE (IgE488) concentrations of this suspension increased from 10% to 100% and corresponding non-fluorescent anti-DNP IgE concentrations decreased from 90% to 0%. After the removal of unbound anti-DNP IgE, multiple image sequences were taken for each of these ten conditions. Prior to imaging, anti-DNP IgE-primed RBL-2H3 cells were either kept for a few minutes, for about 30 min, or for about one hour in Hanks buffer. The dataset contains 482 RBL-2H3 model synapse image stacks, dark images to correct for background intensity, and TIRF illumination profile images to correct for non-uniform TIRF illumination. After background subtraction, non-uniform illumination correction, and conversion of pixel units from analog-to-digital units to photo electrons, the average pixel intensity was calculated. The average pixel intensity within FcεRI-centric synapses for all three Hanks buffer conditions increased linearly at a rate of 0.42 ± 0.02 photo electrons per pixel per % IgE488 in suspension. RBL-2H3 cell degranulation was tested by detecting β-hexosaminidase activity. Prolonged RBL-2H3 cell exposure to Hanks buffer inhibited exocytosis in RBL-2H3 cells. Full article
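The correction sequence named in this abstract (background subtraction, non-uniform illumination correction, conversion from analog-to-digital units to photo electrons, then averaging) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `mean_photoelectrons`, the `gain_e_per_adu` parameter, and the choice to normalize the illumination profile to a mean of 1 are all assumptions made for the example.

```python
from statistics import mean


def mean_photoelectrons(raw, dark, profile, gain_e_per_adu):
    """Average corrected pixel intensity in photo electrons.

    raw, dark, profile: 2D lists of equal shape (hypothetical inputs).
    gain_e_per_adu: assumed camera gain in photo electrons per ADU.
    """
    # Normalize the illumination profile so that its mean is 1 (assumption).
    profile_mean = mean(p for row in profile for p in row)

    corrected = []
    for raw_row, dark_row, prof_row in zip(raw, dark, profile):
        for r, d, p in zip(raw_row, dark_row, prof_row):
            signal = r - d                   # background subtraction
            signal /= p / profile_mean       # non-uniform illumination correction
            corrected.append(signal * gain_e_per_adu)  # ADU -> photo electrons

    return mean(corrected)
```

With real data, the dark images and TIRF illumination profile images shipped in the dataset would take the place of `dark` and `profile`; the gain value would have to come from the camera calibration.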

17 pages, 617 KiB  
Review
Reinforcement Learning in Financial Markets
by Terry Lingze Meng and Matloob Khushi
Data 2019, 4(3), 110; https://doi.org/10.3390/data4030110 - 28 Jul 2019
Cited by 73 | Viewed by 15661
Abstract
Recently there has been an exponential increase in the use of artificial intelligence for trading in financial markets such as stock and forex. Reinforcement learning has become of particular interest to financial traders ever since the program AlphaGo defeated the strongest human contemporary Go board game player Lee Sedol in 2016. We systematically reviewed all recent stock/forex prediction or trading articles that used reinforcement learning as their primary machine learning method. All reviewed articles had some unrealistic assumptions such as no transaction costs, no liquidity issues and no bid or ask spread issues. Transaction costs had significant impacts on the profitability of the reinforcement learning algorithms compared with the baseline algorithms tested. Despite showing statistically significant profitability when reinforcement learning was used in comparison with baseline models in many studies, some showed no meaningful level of profitability, in particular with large changes in the price pattern between the system training and testing data. Furthermore, few performance comparisons between reinforcement learning and other sophisticated machine/deep learning models were provided. The impact of transaction costs, including the bid/ask spread on profitability has also been assessed. In conclusion, reinforcement learning in stock/forex trading is still in its early development and further research is needed to make it a reliable method in this domain. Full article
(This article belongs to the Special Issue Data Analysis for Financial Markets)

12 pages, 1100 KiB  
Article
Prediction of Fault Fix Time Transition in Large-Scale Open Source Project Data
by Hironobu Sone, Yoshinobu Tamura and Shigeru Yamada
Data 2019, 4(3), 109; https://doi.org/10.3390/data4030109 - 27 Jul 2019
Cited by 3 | Viewed by 2485
Abstract
Open source software (OSS) programs are adopted as embedded systems regarding their server usage, due to their quick delivery, cost reduction, and standardization of systems. Many OSS programs are developed using the peculiar style known as the bazaar method, in which faults are detected and fixed by developers around the world, and the result is then reflected in the next release. Furthermore, the fix time of faults tends to be shorter as the development of the OSS progresses. However, several large-scale open source projects encounter the problem that fault fixing takes much time because the fault corrector cannot handle many fault reports. Therefore, OSS users and project managers need to know the stability degree of open source projects by determining the fault fix time. In this paper, we predict the transition of the fix time in large-scale open source projects. To make the prediction, we use the software reliability growth model based on the Wiener process considering that the fault fix time in open source projects changes depending on various factors such as the fault reporting time and the assignees to fix the faults. In addition, we discuss the assumption that fault fix time data depend on the prediction of the transition in fault fixing time. Full article

12 pages, 1029 KiB  
Article
Urban Mobility Demand Profiles: Time Series for Cars and Bike-Sharing Use as a Resource for Transport and Energy Modeling
by Michel Noussan, Giovanni Carioni, Francesco Davide Sanvito and Emanuela Colombo
Data 2019, 4(3), 108; https://doi.org/10.3390/data4030108 - 26 Jul 2019
Cited by 10 | Viewed by 3800
Abstract
The transport sector is currently facing a significant transition, with strong drivers including decarbonization and digitalization trends, especially in urban passenger transport. The availability of monitoring data is at the basis of the development of optimization models supporting an enhanced urban mobility, with multiple benefits including lower pollutants and CO2 emissions, lower energy consumption, better transport management and land space use. This paper presents two datasets that represent time series with a high temporal resolution (five-minute time step) both for vehicles and bike sharing use in the city of Turin, located in Northern Italy. These high-resolution profiles have been obtained by the collection and elaboration of available online resources providing live information on traffic monitoring and bike sharing docking stations. The data are provided for the entire year 2018, and they represent an interesting basis for the evaluation of seasonal and daily variability patterns in urban mobility. These data may be used for different applications, ranging from the chronological distribution of mobility demand, to the estimation of passenger transport flows for the development of transport models in urban contexts. Moreover, traffic profiles are at the basis for the modeling of electric vehicles charging strategies and their interaction with the power grid. Full article

10 pages, 1145 KiB  
Data Descriptor
Internal Seed Structure of Alpine Plants and Extreme Cold Exposure
by Ganesh K. Jaganathan and Sarah E. Dalrymple
Data 2019, 4(3), 107; https://doi.org/10.3390/data4030107 - 24 Jul 2019
Cited by 1 | Viewed by 3383
Abstract
Cold tolerance in seeds is not well understood compared to mechanisms in aboveground plant tissue but is crucial to understanding how plant populations persist in extreme cold conditions. Counter-intuitively, the ability of seeds to survive extreme cold may become more important in the future due to climate change projections. This is due to the loss of the insulating snow bed resulting in the actual temperatures experienced at soil surface level being much colder than without snow cover. Seed survival in extremely low temperatures is conferred by mechanisms that can be divided into freezing avoidance and freezing tolerance depending on the location of ice crystal formation within the seed. We present a dataset of alpine angiosperm species with seed mass and seed structure defined as endospermic and non-endospermic. This is presented alongside the locations of temperature minima per species which can be used to examine the extent to which different seed structures are associated with snow cover. We hope that the dataset can be used by others to demonstrate if certain seed structures and sizes are associated with snow cover, and if so, would they be negatively impacted by the loss of snow resulting from climate change. Full article

8 pages, 868 KiB  
Data Descriptor
Scots Pine Seedlings Growth Dynamics Data Reveals Properties for the Future Proof of Seed Coat Color Grading Conjecture
by Arthur Novikov, Vladan Ivetić, Tatyana Novikova and Evgeniy Petrishchev
Data 2019, 4(3), 106; https://doi.org/10.3390/data4030106 - 23 Jul 2019
Cited by 12 | Viewed by 3031
Abstract
Seed coat color grading conjecture is also known as Pravdin’s conjecture. To verify the conjecture, we established a long-term field experiment. This data set included unique empirical data of Scots pine (Pinus sylvestris L.) container-grown seedlings produced from different seed color grades, outplanted on a post fire site in the Voronezh region, Russia. Variables were provided for 10 rows of 90 samples in each row. These data contribute to our understanding of seed germination and seedlings growth dynamics from size and color gradings of seeds. This structure is the future basis of the Forest Reproductive Material Library (FRMLib) and will be used for assisted migration and forest seed transfer. Full article

9 pages, 625 KiB  
Data Descriptor
Building Stock and Building Typology of Kigali, Rwanda
by Felix Bachofer, Andreas Braun, Florian Adamietz, Sally Murray, Pablo d’Angelo, Edward Kyazze, Abias Philippe Mumuhire and Jonathan Bower
Data 2019, 4(3), 105; https://doi.org/10.3390/data4030105 - 21 Jul 2019
Cited by 14 | Viewed by 5858
Abstract
This study uses very high-resolution Pléiades imagery for the densely built-up central part of the City of Kigali for the year 2015 in order to derive urban morphology data on building footprints, building archetypes and building heights. Aerial images of the study area from 2008–2009 were used in combination with the 2015 dataset to create a change monitoring dataset on a single building basis. A semi-automated approach was chosen which combined an object-based image analysis with an expert-based revision. The result is a geospatial dataset that detects 165,625 buildings for 2008–2009 and 211,458 for 2015. The dataset includes information on the type of changes between the two dates. Analysis of this geospatial dataset can be used for a range of research applications in economics and the social sciences, as well as a range of policy applications in urban planning and municipal finance administration. Full article

10 pages, 2649 KiB  
Data Descriptor
Towards the Fulfillment of a Knowledge Gap: Wood Densities for Species of the Subtropical Atlantic Forest
by Laio Zimermann Oliveira, Heitor Felippe Uller, Aline Renata Klitzke, Jackson Roberto Eleotério and Alexander Christian Vibrans
Data 2019, 4(3), 104; https://doi.org/10.3390/data4030104 - 20 Jul 2019
Cited by 16 | Viewed by 4208
Abstract
Wood density (ρ) is a trait involved in forest biomass estimates, forest ecology, prediction of stand stability, wood science, and engineering. Regardless of its importance, data on ρ are scarce for a substantial number of species of the vast Atlantic Forest phytogeographic domain. Given that, the present paper describes a dataset composed of three data tables: (i) determinations of ρ (kg m⁻³) for 153 species growing in three forest types within the subtropical Atlantic Forest, based on wood samples collected throughout the state of Santa Catarina, southern Brazil; (ii) a list of 719 tree/shrub species observed by a state-level forest inventory and a ρ value assigned to each one of them based on local determinations and on a global database; (iii) the means and standard deviations of ρ for 477 permanent sample plots located in the subtropical Atlantic Forest, covering ∼95,000 km². The mean ρ over the 153 sampled species is 538.6 kg m⁻³ (standard deviation = 120.5 kg m⁻³), and the mean ρ per sample plot, considering the three forest types, is 525.0 kg m⁻³ (standard error = 1.8 kg m⁻³). The described dataset has potential to underpin studies on forest biomass, forest ecology, alternative uses of timber resources, as well as to enlarge the coverage of global datasets. Full article
(This article belongs to the Special Issue Forest Monitoring Systems and Assessments at Multiple Scales)

8 pages, 355 KiB  
Data Descriptor
Correlations between Environmental Factors and Milk Production of Holstein Cows
by Roman Mylostyvyi and Olexandr Chernenko
Data 2019, 4(3), 103; https://doi.org/10.3390/data4030103 - 19 Jul 2019
Cited by 23 | Viewed by 7370
Abstract
Global climate change is a challenge for dairy farming. In this regard, identifying reliable correlations between environmental parameters and animals’ physiological responses is a starting point for the mathematical modeling of their effects on the future welfare and milk production of cows. The aim of the study was to examine the relationship between environmental parameters and the milk production of cows in the hot period. Archival data from the Ukrainian Hydrometeorological Center were used to study the state of insolation conditions (IC), wind direction (WD), wind strength (WS), air temperature (AT), and relative humidity (RH). The temperature–humidity index (THI) (Kibler, 1964) and temperature–humidity index in the hangar-type cowshed (THICHT) (Mylostyvyi et al., 2019) served as integral indicators of the state of the cowshed’s microclimate. The daily milk yield (DMY), yield of milk fat (MF) and milk protein (MP), and percentage of milk fat (PMF) and protein (PMP) were recorded by the DairyComp 305 herd management system (VAS, USA). Statistical data processing was performed using the mathematical functions of Microsoft Excel (Microsoft Inc.) and Statistica 10 (StatSoft Inc.). There was a weak correlation between IC and DMY at r = −0.2, between RH and DMY at r = +0.4, and between RH and MF at r = +0.2. Correlations between DMY, MF, MP, and WS ranged from r = −0.2 to 0.4. Correlations between DMY, MF, MP, and AT ranged from r = −0.2 to 0.5 (p < 0.05). The effects of weather factors on animal productivity will be the subject of further research. Full article
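For readers unfamiliar with the temperature–humidity index, the sketch below shows a form of the formula that is commonly attributed to Kibler (1964). The abstract itself does not state the formula, so the expression, the function name `thi_kibler`, and the unit conventions (air temperature in °C, relative humidity in percent) are assumptions for illustration only.

```python
def thi_kibler(air_temp_c, rel_humidity_pct):
    """Temperature-humidity index in the form commonly attributed to Kibler (1964).

    air_temp_c: air temperature (AT) in degrees Celsius.
    rel_humidity_pct: relative humidity (RH) in percent (0-100).
    Assumed form: THI = 1.8*AT - (1 - RH/100)*(AT - 14.3) + 32
    """
    rh_fraction = rel_humidity_pct / 100.0
    return 1.8 * air_temp_c - (1.0 - rh_fraction) * (air_temp_c - 14.3) + 32.0
```

At full saturation (RH = 100%) the humidity term vanishes and the index reduces to the air temperature expressed in degrees Fahrenheit.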

19 pages, 10075 KiB  
Article
Semantic Earth Observation Data Cubes
by Hannah Augustin, Martin Sudmanns, Dirk Tiede, Stefan Lang and Andrea Baraldi
Data 2019, 4(3), 102; https://doi.org/10.3390/data4030102 - 17 Jul 2019
Cited by 31 | Viewed by 6968
Abstract
There is an increasing amount of free and open Earth observation (EO) data, yet more information is not necessarily being generated from them at the same rate despite high information potential. The main challenge in the big EO analysis domain is producing information from EO data, because numerical, sensory data have no semantic meaning; they lack semantics. We are introducing the concept of a semantic EO data cube as an advancement of state-of-the-art EO data cubes. We define a semantic EO data cube as a spatio-temporal data cube containing EO data, where for each observation at least one nominal (i.e., categorical) interpretation is available and can be queried in the same instance. Here we clarify and share our definition of semantic EO data cubes, demonstrating how they enable different possibilities for data retrieval, semantic queries based on EO data content and semantically enabled analysis. Semantic EO data cubes are the foundation for EO data expert systems, where new information can be inferred automatically in a machine-based way using semantic queries that humans understand. We argue that semantic EO data cubes are better positioned to handle current and upcoming big EO data challenges than non-semantic EO data cubes, while facilitating an ever-diversifying user-base to produce their own information and harness the immense potential of big EO data. Full article
(This article belongs to the Special Issue Earth Observation Data Cubes)

23 pages, 4815 KiB  
Article
Feedforward Neural Network-Based Architecture for Predicting Emotions from Speech
by Mihai Gavrilescu and Nicolae Vizireanu
Data 2019, 4(3), 101; https://doi.org/10.3390/data4030101 - 15 Jul 2019
Cited by 7 | Viewed by 3962
Abstract
We propose a novel feedforward neural network (FFNN)-based speech emotion recognition system built on three layers: A base layer where a set of speech features are evaluated and classified; a middle layer where a speech matrix is built based on the classification scores computed in the base layer; a top layer where an FFNN- and a rule-based classifier are used to analyze the speech matrix and output the predicted emotion. The system offers 80.75% accuracy for predicting the six basic emotions and surpasses other state-of-the-art methods when tested on emotion-stimulated utterances. The method is robust and the fastest in the literature, computing a stable prediction in less than 78 s and proving attractive for replacing questionnaire-based methods and for real-time use. A set of correlations between several speech features (intensity contour, speech rate, pause rate, and short-time energy) and the evaluated emotions is determined, which enhances previous similar studies that have not analyzed these speech features. Using these correlations to improve the system leads to a 6% increase in accuracy. The proposed system can be used to improve human–computer interfaces, in computer-mediated education systems, for accident prevention, and for predicting mental disorders and physical diseases. Full article
