Table of Contents

Data, Volume 4, Issue 1 (March 2019)

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
Displaying articles 1-31
Open Access | Data Descriptor | Immunomics Datasets and Tools: To Identify Potential Epitope Segments for Designing Chimeric Vaccine Candidate to Cervix Papilloma
Data 2019, 4(1), 31; https://doi.org/10.3390/data4010031 (registering DOI)
Received: 6 December 2018 / Revised: 11 February 2019 / Accepted: 12 February 2019 / Published: 15 February 2019
PDF Full-text (2398 KB) | HTML Full-text | XML Full-text
Abstract
Immunomics tools and databases play an important role in the design of prophylactic or therapeutic vaccines against pathogenic bacteria and viruses. Therefore, we aimed to illustrate the different immunological databases and web servers used to design a chimeric vaccine candidate against human cervix papilloma. Initially, major histocompatibility complex class I and II epitopes that induce cellular immunity were predicted from the L2 protein of the papilloma 58 strain using the IEDB, NetMHC, and Tepi tools. Then, the overlapping segments from this analysis were used to calculate efficiency in interferon-gamma and humoral immunity production. In addition, the allergenicity, antigenicity, cross-reactivity with human proteomes, and epitope conservancy of elite segments were determined. The chimeric vaccine candidate (SGD58) was constructed with two different overlapping peptide segments (23–36 and 29–42), adjuvants (flagellin and RS09), two Th epitopes, and amino acid linkers. Homology modeling showed that SGD58 has 88.6% of its residues in favored regions of the Ramachandran plot. Protein–protein docking with Swarm Dock shows that the complex of SGD58 with its receptor has a binding energy of −54.74 kcal/mol, with more than 20 interacting residues. The docked complex is stable over 100 ns of molecular dynamics simulation. Further, the coding sequences of SGD58 also show elevated gene expression in E. coli. In conclusion, SGD58 may serve as a vaccine candidate against cervix papilloma. This study also provides insight into vaccine design against different pathogenic microbes. Full article

Open Access | Data Descriptor | The Historical Small Smart City Protocol (HISMACITY): Toward an Intelligent Tool Using Geo Big Data for the Sustainable Management of Minor Historical Assets
Received: 30 November 2018 / Revised: 7 February 2019 / Accepted: 7 February 2019 / Published: 13 February 2019
Viewed by 98 | PDF Full-text (5287 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
This research reports the ongoing design of the HISMACITY (Historical Small Smart City) Protocol, a planning tool with a certification system. The tool is designed for small municipalities in Europe. Through the award-winning certification system, the Protocol supports the fulfillment of best practices. Such practices can enhance town attractiveness. It also counteracts excessive land use that results from urban growth, and reduces demographic decline in internal areas of each country. The research methodology is grounded on building a dynamic dataset using geo big data, local data, and mobile data via information communications technology (ICT), and real-time data through sensors. The tool aims to build algorithms to calculate indicators that measure quality standards of integrated interventions. The aim is to reach specific goals within defined priority areas of the Historical Small Smart City Protocol. Being highly adaptive, the framework follows urban responsive design principles based on weighted suitability models that can be calibrated by changing the input data and the weights of the linear combination formula. The results highlight varying framework data, including the tool’s development procedures and practicality. Full article
(This article belongs to the Special Issue Big Data Challenges in Smart Cities)

Open Access | Data Descriptor | Spatial Distribution of Wind Turbines, Photovoltaic Field Systems, Bioenergy, and River Hydro Power Plants in Germany
Received: 7 December 2018 / Revised: 25 January 2019 / Accepted: 28 January 2019 / Published: 11 February 2019
Viewed by 145 | PDF Full-text (4784 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
The expansion of renewable energy technologies, accompanied by an increasingly decentralized supply structure, raises many research questions regarding the structure, dimension, and impacts of the electricity supply network. In this context, information on renewable energy plants, particularly their spatial distribution and key parameters—e.g., installed capacity, total size, and required space—are more and more important for public decision makers and different scientific domains, such as energy system analysis and impact assessment. The dataset described in this paper covers the spatial distribution, installed capacity, and commissioning year of wind turbines, photovoltaic field systems, and bio- and river hydro power plants in Germany. Collected from different online sources and authorities, the data have been thoroughly cross-checked, cleaned, and merged to generate validated and complete datasets. The paper concludes with notes on the practical use of the dataset in an environmental impact monitoring framework and other potential research or policy settings. Full article

Open Access | Article | Vehicular Ad Hoc Network (VANET) Connectivity Analysis of a Highway Toll Plaza
Received: 13 January 2019 / Revised: 20 January 2019 / Accepted: 30 January 2019 / Published: 10 February 2019
Viewed by 169 | PDF Full-text (1339 KB)
Abstract
The aim of this paper was to study issues of network connectivity in vehicular ad hoc networks (VANETs) to avoid traffic congestion at a toll plaza. An analytical model was developed for highway scenarios where the traffic congestion could have the vehicles reduce their speed instead of blocking the flow of traffic. In this model, nearby vehicles must be informed when traffic congestion occurs before reaching the toll plaza so they can reduce their speed in order to avoid traffic congestion. Once they have crossed the toll plaza they can travel on at their normal speed. The road was divided into two or three sub-segments to help analyze the performance of connectivity. The proposed analytical model considered various parameters that might disturb the connectivity probability, including traveling speed, communication range of vehicles, vehicle arrival rate, and road length. The simulation results matched those of the analytical model, which showed the analytical model developed in this paper is effective. Full article
Open Access | Data Descriptor | A Uniform In Vitro Efficacy Dataset to Guide Antimicrobial Peptide Design
Received: 16 December 2018 / Accepted: 29 January 2019 / Published: 10 February 2019
Viewed by 201 | PDF Full-text (1335 KB) | Supplementary Files
Abstract
Antimicrobial peptides are ubiquitous molecules that form the innate immune system of organisms across all kingdoms of life. Despite their prevalence and early origins, they continue to remain potent natural antimicrobial agents. Antimicrobial peptides are therefore promising drug candidates in the face of overwhelming multi-drug resistance to conventional antibiotics. Over the past few decades, thousands of antimicrobial peptides have been characterized in vitro, and their efficacy data are now available in a multitude of public databases. Computational antimicrobial peptide design attempts typically use such data. However, utilizing heterogenous data aggregated from different sources presents significant drawbacks. In this report, we present a uniform dataset containing 20 antimicrobial peptides assayed against 30 organisms of Gram-negative, Gram-positive, mycobacterial, and fungal origin. We also present circular dichroism spectra for all antimicrobial peptides. We draw simple inferences from this data, and we discuss what characteristics are essential for antimicrobial peptide efficacy. We expect our uniform dataset to be useful for future projects involving computational antimicrobial peptide design. Full article
Open Access | Data Descriptor | A Dataset for Comparing Mirrored and Non-Mirrored Male Bust Images for Facial Recognition
Received: 13 January 2019 / Revised: 30 January 2019 / Accepted: 5 February 2019 / Published: 8 February 2019
Viewed by 162 | PDF Full-text (1114 KB)
Abstract
Facial recognition, like other types of human recognition, has found uses in identification, security, and learning about behavior, among others. Because of the high cost of data collection for training purposes, logistical challenges, and other impediments, mirroring images has frequently been used to increase the size of data sets. However, while these larger data sets have been shown to be beneficial, their level of benefit relative to collecting additional similar data has not been assessed. This paper presents a data set collected and prepared for this and related research purposes. The data set includes both non-occluded and occluded data for mirroring assessment. Full article
Open Access | Article | Innovating Metrics for Smarter, Responsive Cities
Received: 31 January 2019 / Revised: 31 January 2019 / Accepted: 2 February 2019 / Published: 6 February 2019
Viewed by 158 | PDF Full-text (1352 KB)
Abstract
This paper explores the emerging and evolving landscape for metrics in smart cities in relation to big data challenges. Based on a review of the research literature, the problem of “synthetic quantitative indicators” along with concerns for “measuring urban realities” and “making metrics meaningful” are identified. In response, the purpose of this paper is to advance the need for innovating metrics for smarter, more interactive and responsive cities in addressing and mitigating algorithmic-related challenges on the one hand, and concerns associated with involving people more meaningfully on the other hand. As such, the constructs of awareness, learning, openness, and engagement are employed in this study. Using an exploratory case study approach, the research design for this work includes the use of multiple methods of data collection including survey and interviews. Employing a combination of content analysis for qualitative data and descriptive statistics for quantitative data, the main findings of this work support the need for rethinking and innovating metrics. As such, the main conclusion of this paper highlights the potential for developing new pathways and spaces for involving people more directly, knowingly, and meaningfully in addressing big and small data challenges for the innovating of urban metrics. Full article
(This article belongs to the Special Issue Big Data Challenges in Smart Cities)
Open Access | Data Descriptor | Dataset for Scheduling Strategies for Microgrids Coupled with Natural Gas Networks
Received: 30 December 2018 / Revised: 1 February 2019 / Accepted: 2 February 2019 / Published: 5 February 2019
Viewed by 142 | PDF Full-text (215 KB)
Abstract
Datasets are significant for researchers to test the functionality of their proposed strategies for the microgrid dispatch. This article presents a dataset to help researchers in testing their algorithms related to the dispatch problem of microgrids coupled with natural gas networks. This preliminary release of a microgrid dispatch dataset contains data related to microgrid components (like solar PV, wind turbine, fuel cell and batteries) and natural gas network elements connected with the microgrid (e.g., micro gas turbine). It also includes the data associated with the authors’ proposed scheduling strategy and its dispatch results. The provided dataset can be used to reproduce the authors’ proposed strategy. The presented dataset further can be used for comparisons of other researchers’ proposed strategies. These comparisons will make a strategy’s features more evident. Full article
Open Access | Article | Data Preprocessing for Evaluation of Recommendation Models in E-Commerce
Received: 10 December 2018 / Revised: 18 January 2019 / Accepted: 28 January 2019 / Published: 31 January 2019
Viewed by 230 | PDF Full-text (3472 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
E-commerce businesses employ recommender models to assist in identifying a personalized set of products for each visitor. To accurately assess the recommendations’ influence on customer clicks and buys, three target areas—customer behavior, data collection, user-interface—will be explored for possible sources of erroneous data. Varied customer behavior misrepresents the recommendations’ true influence on a customer due to the presence of B2B interactions and outlier customers. Non-parametric statistical procedures for outlier removal are delineated and other strategies are investigated to account for the effect of a large percentage of new customers or high bounce rates. Subsequently, in data collection we identify probable misleading interactions in the raw data, propose a robust method of tracking unique visitors, and accurately attributing the buy influence for combo products. Lastly, user-interface issues discuss the possible problems caused due to the recommendation widget’s positioning on the e-commerce website and the stringent conditions that should be imposed when utilizing data from the product listing page. This collective methodology results in an exact and valid estimation of the customer’s interactions influenced by the recommendation model in the context of standard industry metrics, such as Click-through rates, Buy-through rates, and Conversion revenue. Full article
(This article belongs to the Special Issue Data Analysis for Financial Markets)

Open Access | Data Descriptor | Biogenic Volatiles Emitted from Four Cold-Hardy Grape Cultivars During Ripening
Received: 29 December 2018 / Revised: 18 January 2019 / Accepted: 24 January 2019 / Published: 31 January 2019
Cited by 1 | Viewed by 241 | PDF Full-text (3748 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
In this research dataset, we summarize for the first time volatile organic compounds (VOCs) emitted in vivo from ripening wine grapes. We studied four cold-hardy cultivars grown in the Midwestern U.S.: St. Croix, Frontenac, Marquette, and La Crescent. These cultivars have gained popularity among local growers and winemakers, but still very little is known about their performance compared with long-established V. vinifera grapes. Volatiles were collected using two novel approaches: biogenic emissions from grape clusters on a vine and single grape berries. A third approach was headspace collection of volatiles from crushed grapes. Solid-phase microextraction (SPME) was used to collect volatiles. Vacuum-assisted SPME was used in the case of single grape berry. Collected VOCs were analyzed using separation and identification on a gas chromatograph mass spectrometer (GC-MS). More than 120 VOCs were identified using mass spectral libraries. The dataset provides evidence that detecting biogenic emissions from growing grapes is feasible. The dataset provides a record of temporal and spatial variability of VOCs, many of which could potentially impart aroma and flavor in the wine. The number of VOCs detected followed the order from single berry (the least) to crushed berry (the most). Thus, more information for potential use in harvesting in order to obtain the desired flavor is found in data from crushed grapes. Full article

Open Access | Editorial | Special Issue on Astrophysics & Geophysics: Research and Applications
Received: 16 January 2019 / Revised: 23 January 2019 / Accepted: 24 January 2019 / Published: 26 January 2019
Viewed by 293 | PDF Full-text (155 KB) | HTML Full-text | XML Full-text
Abstract
The earth’s layers and space are media permanently exposed to the influences of numerous perturbations characterized by time- and space-dependent intensity. For this reason, the detection of astrophysical and terrestrial events and their influences, as well as the development and application of various models, must be based on observational data. The aim of this Special Issue, “Astrophysics & Geophysics: Research and Applications” in Data, is to engage a wide community of scientists to reorganize and expand current knowledge in this field. This Special Issue contains five articles, which include a wide range of topics such as big data in astrophysics and geophysics, data processing, visualization and acquisition, Earth observational data, remote sensing, etc. We hope that the topic of this Special Issue of Data will be of continued interest and we look forward to seeing progress in this field. Full article
Open Access | Article | Comparison of Micro-Census Results for Magarya Ward, Wurno Local Government Area of Sokoto State, Nigeria, with Other Sources of Denominator Data
Received: 3 December 2018 / Revised: 16 January 2019 / Accepted: 19 January 2019 / Published: 25 January 2019
Viewed by 220 | PDF Full-text (1695 KB)
Abstract
Routine immunization coverage in Nigeria is suboptimal. In the northwestern state of Sokoto, an independent population-based survey for 2016 found immunization coverage with the third dose of Pentavalent vaccine to be 3%, whereas administrative coverage in 2016 was reported to be 69%. One possibility driving this large discrepancy is that administrative coverage is calculated using an under-estimated target population. Official population projections from the 2006 Census are based on state-specific standard population growth rates. Immunization target population estimates from other sources have not been independently validated. We conducted a micro-census in Magarya ward, Wurno Local Government Area of Sokoto state to obtain an accurate count of the total population living in the ward, and to compare these results with other sources of denominator data. We developed a precise micro-plan using satellite imagery, and used the navigation tool EpiSample v1 in the field to guide teams to each building, without duplications or omissions. The particular characteristics of the selected ward underscore the importance of using standardized shape files to draw precise boundaries for enumeration micro-plans. While the use of this methodology did not resolve the discrepancy between independent and administrative vaccination coverage rates, a simplified application can better define the target population for routine immunization services and estimate the number of children still unprotected from vaccine-preventable diseases. Full article
Open Access | Article | Gaussian Mixture and Kernel Density-Based Hybrid Model for Volatility Behavior Extraction From Public Financial Data
Received: 7 December 2018 / Revised: 22 January 2019 / Accepted: 22 January 2019 / Published: 24 January 2019
Viewed by 200 | PDF Full-text (988 KB) | HTML Full-text | XML Full-text
Abstract
This paper presents a hybrid clustering model for foreign exchange market volatility clustering. The proposed model is built using a Gaussian Mixture Model, and inference is done using an Expectation Maximization algorithm. A one-dimensional kernel density estimator is used to build a probability density based on all historical observations. This allows us to evaluate the probability of the behavior of each symbol of interest. The computational results show that the approach is able to pinpoint risky and safe hours to trade a given currency pair. Full article
(This article belongs to the Special Issue Data Analysis for Financial Markets)

Open Access | Article | Towards Identifying Author Confidence in Biomedical Articles
Received: 6 November 2018 / Revised: 16 January 2019 / Accepted: 17 January 2019 / Published: 21 January 2019
Viewed by 213 | PDF Full-text (4058 KB) | HTML Full-text | XML Full-text
Abstract
In an era where the volume of medical literature is increasing daily, researchers in the biomedical and clinical areas have joined efforts with language engineers to analyze the large amount of biomedical and molecular biology literature (such as PubMed), patient data, or health records. With such a huge number of reports, evaluating their impact has long stopped being a trivial task. In this context, this paper introduces a non-scientific factor that represents an important element in gaining acceptance of claims. We postulate that the confidence an author has in expressing their work plays an important role in shaping the first impression that influences the reader’s perception of the paper. The results discussed in this paper are based on a series of experiments that were run using data from the open archives initiative (OAI) corpus, which provides interoperability standards to facilitate effective dissemination of the content. This method may be useful to its direct beneficiaries (i.e., authors engaged in medical or academic research), as well as to researchers in the fields of biomedical text mining (BioNLP) and NLP. Full article
(This article belongs to the Special Issue Curative Power of Medical Data)

Open Access | Article | Statistical Modeling of Trivariate Static Systems: Isotonic Models
Received: 27 November 2018 / Revised: 9 January 2019 / Accepted: 17 January 2019 / Published: 21 January 2019
Viewed by 181 | PDF Full-text (3975 KB) | HTML Full-text | XML Full-text
Abstract
This paper presents an improved version of a statistical trivariate modeling algorithm introduced in a short Letter by the first author. This paper recalls the fundamental concepts behind the proposed algorithm, evidences its criticalities and illustrates a number of improvements which lead to a functioning modeling algorithm. The present paper also illustrates the features of the improved statistical modeling algorithm through a comprehensive set of numerical experiments performed on four synthetic and five natural datasets. The obtained results confirm that the proposed algorithm is able to model the considered synthetic and the natural datasets faithfully. Full article

Open Access | Article | Data Governance and Sovereignty in Urban Data Spaces Based on Standardized ICT Reference Architectures
Received: 30 November 2018 / Revised: 15 January 2019 / Accepted: 15 January 2019 / Published: 18 January 2019
Viewed by 390 | PDF Full-text (915 KB) | HTML Full-text | XML Full-text
Abstract
European cities and communities (and beyond) require a structured overview and a set of tools as to achieve a sustainable transformation towards smarter cities/municipalities, thereby leveraging on the enormous potential of the emerging data driven economy. This paper presents the results of a recent study that was conducted with a number of German municipalities/cities. Based on the obtained and briefly presented recommendations emerging from the study, the authors propose the concept of an Urban Data Space (UDS), which facilitates an eco-system for data exchange and added value creation thereby utilizing the various types of data within a smart city/municipality. Looking at an Urban Data Space from within a German context and considering the current situation and developments in German municipalities, this paper proposes a reasonable classification of urban data that allows the relation of various data types to legal aspects, and to conduct solid considerations regarding technical implementation designs and decisions. Furthermore, the Urban Data Space is described/analyzed in detail, and relevant stakeholders are identified, as well as corresponding technical artifacts are introduced. The authors propose to setup Urban Data Spaces based on emerging standards from the area of ICT reference architectures for Smart Cities, such as DIN SPEC 91357 “Open Urban Platform” and EIP SCC. In the course of this, the paper walks the reader through the construction of a UDS based on the above-mentioned architectures and outlines all the goals, recommendations and potentials, which an Urban Data Space can reveal to a municipality/city. Finally, we aim at deriving the proposed concepts in a way that they have the potential to be part of the required set of tools towards the sustainable transformation of German and European cities in the direction of smarter urban environments, based on utilizing the hidden potential of digitalization and efficient interoperable data exchange. 
Full article
(This article belongs to the Special Issue Big Data Challenges in Smart Cities)

Open Access | Article | Machine-Learning Models for Sales Time Series Forecasting
Received: 3 November 2018 / Revised: 9 January 2019 / Accepted: 14 January 2019 / Published: 18 January 2019
Viewed by 206 | PDF Full-text (1585 KB) | HTML Full-text | XML Full-text
Abstract
In this paper, we study the usage of machine-learning models for sales predictive analytics. The main goal of this paper is to consider main approaches and case studies of using machine learning for sales forecasting. The effect of machine-learning generalization has been considered. This effect can be used to make sales predictions when there is a small amount of historical data for specific sales time series in the case when a new product or store is launched. A stacking approach for building regression ensemble of single models has been studied. The results show that using stacking techniques, we can improve the performance of predictive models for sales time series forecasting. Full article
(This article belongs to the Special Issue Data Stream Mining and Processing)

Open Access | Data Descriptor | Electroencephalograms during Mental Arithmetic Task Performance
Received: 18 December 2018 / Revised: 13 January 2019 / Accepted: 16 January 2019 / Published: 18 January 2019
Viewed by 258 | PDF Full-text (588 KB) | HTML Full-text | XML Full-text
Abstract
This work has been carried out to support the investigation of the electroencephalogram (EEG) Fourier power spectral, coherence, and detrended fluctuation characteristics during performance of mental tasks. To this aim, the presented dataset contains International 10/20 system EEG recordings from subjects under mental cognitive workload (performing mental serial subtraction) and the corresponding reference background EEGs. Based on the subtraction task performance (number of subtractions and accuracy of the result), the subjects were divided into good counters and bad counters (for whom the mental task required excessive effort). The data were recorded from 36 healthy volunteers of matched age, all of whom are students of the Educational and Scientific Centre “Institute of Biology and Medicine”, National Taras Shevchenko University of Kyiv (Ukraine); the recordings are available through the Physiobank platform. The dataset can be used by the neuroscience research community studying brain dynamics during cognitive workload. Full article
(This article belongs to the Special Issue Big Data and Digital Health)

Open Access Article Improving Urban Population Distribution Models with Very-High Resolution Satellite Information
Received: 5 December 2018 / Revised: 7 January 2019 / Accepted: 14 January 2019 / Published: 16 January 2019
Abstract
Built-up layers derived from medium resolution (MR) satellite information have proven their contribution to dasymetric mapping, but suffer from important limitations when working at the intra-urban level, mainly due to their difficulty in capturing the whole range of variation in terms of built-up densities. In this regard, very-high resolution (VHR) remote sensing is known for its ability to better capture small variations in built-up densities and to derive detailed urban land use, which argues in favor of its use when mapping urban populations. In this paper, we compare the added value of various combinations of VHR data sets, compared to an MR one. A top-down dasymetric mapping strategy is applied to reallocate population counts from administrative units into a regular 100 × 100 m grid, according to different weighting layers. These weighting layers are created from MR and/or VHR input data, using simple built-up proportion or reallocation “weights”, obtained from a set of multiple ancillary data used to train a Random Forest regression model. The results reveal that (1) a built-up mask derived from VHR can improve the accuracy of the reallocation by roughly 13%, compared to MR; (2) using VHR land-use information alone results in lower accuracy than using an MR built-up mask; and (3) there is a clear complementarity between VHR land cover and land use. Full article
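The top-down reallocation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest weighting layer (a built-up proportion per grid cell) and is not the authors' implementation; the function name `reallocate_population`, the cell identifiers, and the numbers are all hypothetical. Each cell receives a share of the administrative unit's population equal to its weight divided by the sum of weights in that unit.

```python
def reallocate_population(unit_population, cell_weights):
    """Split one administrative unit's population over its grid cells
    in proportion to each cell's weight (top-down dasymetric mapping)."""
    if not cell_weights:
        return {}
    total_weight = sum(cell_weights.values())
    if total_weight <= 0:
        # No usable weights (assumption for this sketch): fall back to an even split.
        share = unit_population / len(cell_weights)
        return {cell: share for cell in cell_weights}
    return {
        cell: unit_population * weight / total_weight
        for cell, weight in cell_weights.items()
    }


# Hypothetical unit of 1000 inhabitants covering four 100 x 100 m cells,
# weighted by built-up proportion in percent.
weights = {"cell_a": 60, "cell_b": 30, "cell_c": 10, "cell_d": 0}
grid = reallocate_population(1000, weights)
```

By construction the cell values sum back to the unit's census count, so the method preserves totals per administrative unit whatever weighting layer is used.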

Open Access Data Descriptor BLE RSS Measurements Dataset for Research on Accurate Indoor Positioning
Received: 5 December 2018 / Revised: 2 January 2019 / Accepted: 8 January 2019 / Published: 12 January 2019
Abstract
RSS-based indoor positioning is a consolidated research field for which several techniques have been proposed. Among them, Bluetooth Low Energy (BLE) beacons are a popular option for practical applications. This paper presents a new BLE RSS database that was created to aid in the development of new BLE RSS-based positioning methods and to encourage their reproducibility and comparability. The measurements were collected in two university zones: an area among bookshelves in a library and an area of an office space. Each zone had its own batch of deployed iBKS 105 beacons, configured to broadcast advertisements every 200 ms. The collection in the library zone was performed using three Android smartphones of different brands and models, with beacons broadcasting at −12 dBm transmission power, while in the other zone the collection was performed using one of those smartphones with beacons configured to advertise at −4 dBm, −12 dBm and −20 dBm transmission powers. Supporting materials and scripts are provided along with the database, which annotate the BLE readings, provide details on the collection, the environment, and the BLE beacon deployments, ease the database usage, and introduce the reader to BLE RSS-based positioning and its challenges. The BLE RSS database and its supporting materials are available at the Zenodo repository under the open-source MIT license. Full article
(This article belongs to the Special Issue Wireless Localization: Tracking and Navigation Data Set)

Open Access Data Descriptor UAV-Based 3D Point Clouds of Freshwater Fish Habitats, Xingu River Basin, Brazil
Received: 9 December 2018 / Revised: 31 December 2018 / Accepted: 7 January 2019 / Published: 10 January 2019
Abstract
Dense 3D point clouds were generated from Structure-from-Motion Multiview Stereo (SFM-MVS) photogrammetry for five representative freshwater fish habitats in the Xingu river basin, Brazil. The models were constructed from Unmanned Aerial Vehicle (UAV) photographs collected in 2016 and 2017. The Xingu River is one of the primary tributaries of the Amazon River. It is known for its exceptionally high aquatic biodiversity. The dense 3D point clouds were generated in the dry season when large areas of aquatic substrate are exposed due to the low water level. The point clouds were generated at ground sampling distances of 1.20–2.38 cm. These data are useful for studying the habitat characteristics and complexity of several fish species in a spatially explicit manner, such as calculation of metrics including rugosity and the Minkowski–Bouligand fractal dimension (3D complexity). From these dense 3D point clouds, substrate complexity can be determined more comprehensively than from conventional arbitrary cross sections. Full article
(This article belongs to the Special Issue Open Data and Robust & Reliable GIScience)

Open Access Editorial Acknowledgement to Reviewers of Data in 2018
Published: 10 January 2019
Abstract
Rigorous peer-review is the corner-stone of high-quality academic publishing [...] Full article

Open Access Article A Cluster Graph Approach to Land Cover Classification Boosting
Received: 30 October 2018 / Revised: 3 January 2019 / Accepted: 4 January 2019 / Published: 10 January 2019
Abstract
When it comes to land cover classification, the process of deriving the land classes is complex due to possible errors in algorithms, spatio-temporal heterogeneity of the Earth observation data, variation in availability and quality of reference data, or a combination of these. This article proposes a probabilistic graphical model approach, in the form of a cluster graph, to boost geospatial classifications and produce a more accurate and robust classification and uncertainty product. Cluster graphs can be characterized as a means of reasoning about geospatial data such as land cover classifications by considering the effects of spatial distribution, and inter-class dependencies in a computationally efficient manner. To assess the capabilities of our proposed cluster graph boosting approach, we apply it to the field of land cover classification. We make use of existing land cover products (GlobeLand30, CORINE Land Cover) along with data from Volunteered Geographic Information (VGI), namely OpenStreetMap (OSM), to generate a boosted land cover classification and the respective uncertainty estimates. Our approach combines qualitative and quantitative components through the application of our probabilistic graphical model and subjective expert judgments. Evaluating our approach on a test region in Garmisch-Partenkirchen, Germany, our approach was able to boost the overall land cover classification accuracy by 1.4% when compared to an independent reference land cover dataset. Our approach was shown to be robust and was able to produce a diverse, feasible and spatially consistent land cover classification in areas of incomplete and conflicting evidence. On an independent validation scene, we demonstrated that our cluster graph boosting approach was generalizable even when initialized with poor prior assumptions. Full article
(This article belongs to the Special Issue Geospatial Crowdsourced Data - Validation and Classification)

Open Access Article Spatiotemporal Analysis of Urban Mobility Using Aggregate Mobile Phone Derived Presence and Demographic Data: A Case Study in the City of Rome, Italy
Received: 11 October 2018 / Revised: 18 December 2018 / Accepted: 29 December 2018 / Published: 8 January 2019
Abstract
Urban mobility is known to have a relevant impact on work-related car accidents, especially during commuting. It is characterized by highly dynamic spatial–temporal variability. There are open questions about the size of this phenomenon; its spatial, temporal, and demographic characteristics; and its driving mechanisms. A case study is presented here for the city of Rome, Italy. High-resolution population presence and demographic data, derived from mobile phone traffic, were used. Hourly profiles of a defined mobility factor (NPM) were calculated for a gridded domain during working days and cluster analyzed to obtain mean diurnal NPM mobility patterns. Age distributions of the population were calculated from demographic data to gain insight into the type of population involved in mobility, and spatially linked with the mobility patterns. Census data about production units and their employees were related to the classified NPM mobility patterns. Seven different NPM mobility patterns were identified and mapped over the study area. The mobility slightly deviates from the census-based demography (0.15 on average, in a range of 0 to 1). The number of employees per 100 inhabitants was found to be the main driving mechanism of mobility. Finally, contributions of people employed in different economic macrocategories were assigned to each mobility time-pattern. Results provide a deeper knowledge of urban dynamics and their driving mechanisms in Rome. Full article

Open Access Review Neural Networks in Big Data and Web Search
Received: 4 November 2018 / Revised: 24 December 2018 / Accepted: 24 December 2018 / Published: 30 December 2018
Abstract
As digitalization is gradually transforming reality into Big Data, Web search engines and recommender systems are fundamental user experience interfaces to make the generated Big Data within the Web as visible or invisible information to Web users. In addition to the challenge of crawling and indexing information within the enormous size and scale of the Internet, e-commerce customers and general Web users should not stay confident that the products suggested or results displayed are either complete or relevant to their search aspirations due to the commercial background of the search service. The economic priority of Web-related businesses requires a higher rank on Web snippets or product suggestions in order to receive additional customers. On the other hand, web search engine and recommender system revenue is obtained from advertisements and pay-per-click. The essential user experience is the self-assurance that the results provided are relevant and exhaustive. This survey paper presents a review of neural networks in Big Data and web search that covers web search engines, ranking algorithms, citation analysis and recommender systems. The use of artificial intelligence (AI) based on neural networks and deep learning in learning relevance and ranking is also analyzed, including its utilization in Big Data analysis and semantic applications. Finally, the random neural network is presented with its practical applications to reasoning approaches for knowledge extraction. Full article
(This article belongs to the Special Issue Semantics in the Deep: Semantic Analytics for Big Data)

Open Access Review Using Twitter for Public Health Surveillance from Monitoring and Prediction to Public Response
Received: 14 December 2018 / Revised: 21 December 2018 / Accepted: 22 December 2018 / Published: 29 December 2018
Abstract
Twitter is a social media platform where over 500 million people worldwide publish their ideas and discuss diverse topics, including their health conditions and public health events. Twitter has proved to be an important source of health-related information on the Internet, given the amount of information that is shared by both citizens and official sources. Twitter provides researchers with a real-time source of public health information on a global scale, and can be very important in public health research. Classifying Twitter data into topics or categories is helpful to better understand how users react and communicate. A literature review is presented on mining Twitter data or similar short-text datasets for public health applications. Each method is analyzed for ways to use Twitter data in public health surveillance. Papers in which Twitter content was classified according to users or tweets for better surveillance of public health were selected for review. Only papers published between 2010 and 2017 were considered. The reviewed publications are distinguished by the methods that were used to categorize the Twitter content in different ways. While comparing studies is difficult due to the number of different methods that have been used for applying Twitter and interpreting data, this state-of-the-art review demonstrates the vast potential of utilizing Twitter for public health surveillance purposes. Full article
(This article belongs to the Special Issue Big Data and Digital Health)

Open Access Article Machine Learning in Classification Time Series with Fractal Properties
Received: 13 November 2018 / Revised: 18 December 2018 / Accepted: 23 December 2018 / Published: 28 December 2018
Abstract
The article presents a novel method of fractal time series classification by meta-algorithms based on decision trees. The classification objects are fractal time series. For modeling, binomial stochastic cascade processes are chosen. Each class that was singled out unites model time series with the same fractal properties. Numerical experiments demonstrate that the best results are obtained by the random forest method with regression trees. A comparative analysis of the classification approaches based on the random forest method and of traditional estimation of the degree of self-similarity is performed. The results show the advantage of machine learning methods over traditional time series evaluation. The results were used for detecting distributed denial-of-service (DDoS) attacks and demonstrated a high probability of detection. Full article
(This article belongs to the Special Issue Data Stream Mining and Processing)

Open Access Article The Model and Training Algorithm of Compact Drone Autonomous Visual Navigation System
Received: 4 November 2018 / Revised: 20 December 2018 / Accepted: 22 December 2018 / Published: 28 December 2018
Abstract
Trainable visual navigation systems based on deep learning demonstrate potential for robustness to onboard camera parameters and challenging environments. However, a deep model requires substantial computational resources and large labelled training sets for successful training. Implementation of autonomous navigation and training-based fast adaptation to a new environment for a compact drone is a complicated task. The article describes an original model and training algorithms adapted to a limited volume of labelled training data and constrained computational resources. This model consists of a convolutional neural network for visual feature extraction, an extreme-learning machine for estimating the position displacement and a boosted information-extreme classifier for obstacle prediction. Unsupervised training of the convolution filters with a growing sparse-coding neural gas algorithm is proposed, together with supervised learning algorithms that construct the decision rules, with a simulated annealing search algorithm used for fine-tuning. The use of a complex criterion for parameter optimization of the feature extractor model is considered. The resulting approach performs better trajectory reconstruction than the well-known ORB-SLAM. In particular, for sequence 7 from the KITTI dataset, the translation error is reduced by nearly 65.6% at a frame rate of 10 frames per second. Besides, testing on the independent TUM sequence shot outdoors produces a translation error not exceeding 6% and a rotation error not exceeding 3.68 degrees per 100 m. Testing was carried out on the Raspberry Pi 3+ single-board computer. Full article
(This article belongs to the Special Issue Data Stream Mining and Processing)

Open Access Data Descriptor Human Male Body Images from Multiple Perspectives with Multiple Lighting Settings
Received: 28 November 2018 / Revised: 20 December 2018 / Accepted: 20 December 2018 / Published: 23 December 2018
Abstract
There are multiple technological ways to identify humans and verify claimed identities. The dataset presented herein facilitates work on hard and soft biometric human identification and identity verification. It is comprised of full-body images of multiple fully clothed males from a constrained age range. The images have been taken from multiple perspectives with varied lighting brightness and temperature. Full article

Open Access Data Descriptor A Mobile Air Pollution Monitoring Data Set
Received: 8 November 2018 / Revised: 14 December 2018 / Accepted: 17 December 2018 / Published: 22 December 2018
Abstract
Air pollution was observed in Hamilton, Ontario, Canada using monitors installed in a mobile platform from November 2005 up to November 2016. The dataset is an aggregation of several project-specific monitoring days, which attempted to quantify air pollution spatial variation under varying conditions or in specific regions. Pollutants observed included carbon monoxide, nitric oxide, nitrogen dioxide, total nitrogen oxides, ground-level ozone, particulate matter concentrations for size cuts of 10 µm, 2.5 µm and 1 µm, and sulfur dioxide. Observations were collected over 114 days, which occurred in varying seasons and months. During sampling, the mobile platform travelled at an average speed of 27 km/h. The samples were collected as one-minute integrated samples and are prepared as line-segments, which include an offset for instrument response time. Sampling occurred on major freeways, highways, arterial and residential roads. This dataset is shared in hopes of supporting research on how to best utilize air pollution observations obtained with mobile air pollution platforms, which is a growing technique in the field of urban air pollution monitoring. We conclude with limitations in the data capture technique and recommendations for future mobile monitoring studies. Full article

Data EISSN 2306-5729 Published by MDPI AG, Basel, Switzerland