Special Issue "Big Data"

A special issue of Entropy (ISSN 1099-4300).

Deadline for manuscript submissions: closed (30 October 2013)

Special Issue Editor

Guest Editor
Dr. Nikunj C. Oza

NASA Ames Research Center, Moffett Field, CA 94035, USA
Interests: data mining; machine learning; ensemble learning methods; online learning; anomaly detection; applications of machine learning and data mining

Special Issue Information

Dear Colleagues,

"Big data" refers to datasets that are so large that conventional database management and data analysis tools are insufficient to work with them. Big data has become a bigger-than-ever problem for many reasons. Data storage is rapidly becoming cheaper in terms of cost per unit of storage, thereby making appealing the prospect of saving all collected data. Computer processing is becoming more powerful and cheaper, and computer memory is also becoming cheaper, thereby making processing such data increasingly practical. The number of deployed sensors is growing rapidly. For example, there are a greater number of Earth-Observing Satellites than ever before, collecting many terabytes of data per day. Engineered systems have increasing sensing of their environment as well as of the systems themselves for integrated vehicle health management. The internet has greatly added to the volume and heterogeneity of data available---the world-wide web contains an enormous volume of text, images, videos, and connections between these. Many complex processes that we desire to understand generate these data. We desire methods that go in the reverse direction---from big data to an understanding of these complex processes---how they work, when and how they display anomalous behavior, and other insights. Data mining is a field---brought about through the combination of machine learning, statistics, and database management---that seeks to develop such methods. This special issue seeks comprehensive reviews or research articles in the area of entropy and information theory methods for big data. Research articles may describe theoretical and/or algorithmic developments.

Dr. Nikunj C. Oza
Guest Editor

Submission

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. Papers will be published continuously (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are refereed through a peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed Open Access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs).


Keywords

  • big data
  • analytics
  • data mining
  • predictive analytics
  • knowledge discovery
  • classification
  • regression
  • anomaly detection
  • clustering

Published Papers (5 papers)

Research

Open Access Article: Fast Feature Selection in a GPU Cluster Using the Delta Test
Entropy 2014, 16(2), 854-869; doi:10.3390/e16020854
Received: 13 October 2013 / Revised: 10 January 2014 / Accepted: 28 January 2014 / Published: 13 February 2014
Cited by 6
Abstract
Feature or variable selection remains an unsolved problem, because evaluating the entire solution space is infeasible. Several heuristic algorithms have been proposed so far, with successful results. However, these algorithms were not designed for very large datasets, whose memory and time demands make their execution impossible. This paper presents an implementation of a genetic algorithm that has been parallelized using the classical island approach and that also uses graphics processing units to speed up the computation of the fitness function. Special attention has been paid to the population evaluation, as well as to the migration operator of the parallel genetic algorithm (GA), which is not usually considered significant although, as the experiments show, it is crucial for obtaining robust results.
(This article belongs to the Special Issue Big Data)
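
For readers unfamiliar with the Delta Test, the following is a minimal sequential sketch of such a fitness function, assuming NumPy and SciPy; it illustrates the general nearest-neighbor noise-variance estimator, not the paper's GPU-cluster implementation, and all names are ours.

    import numpy as np
    from scipy.spatial import cKDTree

    def delta_test(X, y, mask):
        """Noise-variance estimate for the feature subset selected by `mask`.

        delta = (1 / 2N) * sum_i (y[nn(i)] - y[i])^2, where nn(i) is the
        nearest neighbor of sample i in the selected feature subspace.
        """
        Xs = X[:, mask]
        tree = cKDTree(Xs)
        # k=2 because each point's nearest neighbor is itself at distance 0.
        _, idx = tree.query(Xs, k=2)
        nn = idx[:, 1]
        return 0.5 * np.mean((y[nn] - y) ** 2)

In a GA wrapper, each chromosome encodes a feature mask, and lower delta_test values indicate better subsets; the paper's contribution is evaluating many such fitness calls in parallel on GPUs and across islands.
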
Open Access Article: Information-Theoretic Data Discarding for Dynamic Trees on Data Streams
Entropy 2013, 15(12), 5510-5535; doi:10.3390/e15125510
Received: 9 August 2013 / Revised: 4 December 2013 / Accepted: 9 December 2013 / Published: 13 December 2013
Cited by 2
Abstract
Ubiquitous automated data collection at an unprecedented scale is making available streaming, real-time information flows in a wide variety of settings, transforming both science and industry. Learning algorithms deployed in such contexts often rely on single-pass inference, where the data history is never revisited. Learning may also need to be temporally adaptive to remain up-to-date against unforeseen changes in the data-generating mechanism. Online Bayesian inference remains challenged by such transient, evolving data streams. Nonparametric modeling techniques can prove particularly ill-suited, as the complexity of the model is allowed to increase with the sample size. In this work, we take steps to overcome these challenges by porting information-theoretic heuristics, such as exponential forgetting and active learning, into a fully Bayesian framework. We showcase our methods by augmenting a modern nonparametric modeling framework, dynamic trees, and illustrate its performance on a number of practical examples. The end product is a powerful streaming regression and classification tool whose performance compares favorably to the state of the art.
(This article belongs to the Special Issue Big Data)
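
As a generic illustration of exponential forgetting (not the paper's dynamic-tree algorithm), the sketch below, with names of our choosing, discounts the sufficient statistics of a running mean by a factor lam before absorbing each new observation, so the effective sample size stays bounded and old data gradually lose influence.

    class ForgetfulMean:
        """Streaming mean whose memory decays geometrically at rate `lam`."""

        def __init__(self, lam=0.99, prior_mean=0.0, prior_weight=1.0):
            self.lam = lam
            self.weight = prior_weight            # effective sample size
            self.total = prior_mean * prior_weight

        def update(self, x):
            # Discount the past, then absorb the new observation.
            self.weight = self.lam * self.weight + 1.0
            self.total = self.lam * self.total + x

        @property
        def mean(self):
            return self.total / self.weight

With lam = 0.99 the effective sample size saturates near 1 / (1 - lam) = 100, which is what keeps such a streaming estimator adaptive to changes in the data-generating mechanism.
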
Open Access Article: Stochasticity: A Feature for the Structuring of Large and Heterogeneous Image Databases
Entropy 2013, 15(11), 4782-4801; doi:10.3390/e15114782
Received: 26 August 2013 / Revised: 27 September 2013 / Accepted: 29 October 2013 / Published: 4 November 2013
Cited by 3
Abstract
The paper addresses image feature characterization and the structuring of large and heterogeneous image databases through their appearance of stochasticity, or randomness. Measuring stochasticity involves finding suitable representations that can significantly reduce statistical dependencies of any order. Wavelet packet representations provide such a framework for a large class of stochastic processes through an appropriate dictionary of parametric models. From this dictionary and the Kolmogorov stochasticity index, the paper proposes semantic stochasticity templates upon wavelet packet sub-bands in order to provide high-level classification and content-based image retrieval. The approach is shown to be relevant for texture images.
(This article belongs to the Special Issue Big Data)
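
To make the Kolmogorov stochasticity index concrete, the sketch below computes it for each wavelet packet sub-band of an image, assuming PyWavelets and SciPy; the standard normal reference model is an illustrative stand-in for the paper's dictionary of parametric models, and the function name is ours.

    import numpy as np
    import pywt
    from scipy import stats

    def stochasticity_indices(image, wavelet="db2", level=2):
        """Kolmogorov stochasticity index for each wavelet packet sub-band."""
        wp = pywt.WaveletPacket2D(image, wavelet=wavelet, maxlevel=level)
        out = {}
        for node in wp.get_level(level):
            c = node.data.ravel()
            c = (c - c.mean()) / (c.std() + 1e-12)   # standardize coefficients
            # Kolmogorov statistic D_n = sup |F_n(x) - F(x)| against the
            # reference CDF; scaling by sqrt(n) yields the stochasticity index.
            d, _ = stats.kstest(c, "norm")
            out[node.path] = np.sqrt(c.size) * d
        return out
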
Open Access Article: Kernel Spectral Clustering for Big Data Networks
Entropy 2013, 15(5), 1567-1586; doi:10.3390/e15051567
Received: 1 March 2013 / Revised: 25 April 2013 / Accepted: 29 April 2013 / Published: 3 May 2013
Cited by 21
Abstract
This paper shows the feasibility of utilizing the Kernel Spectral Clustering (KSC) method for community detection in big data networks. KSC employs a primal-dual framework to construct a model, which has the powerful property of effectively inferring community affiliation for out-of-sample extensions. The original large kernel matrix cannot fit into memory, so we select a smaller subgraph that preserves the overall community structure to construct the model, and then use the out-of-sample extension property to obtain community memberships for the unseen nodes. We provide a novel memory- and computationally efficient model selection procedure based on angular similarity in the eigenspace. We demonstrate the effectiveness of KSC on large-scale synthetic networks and real-world networks such as the YouTube network, a road network of California and the Livejournal network. These networks contain millions of nodes and several million edges.
(This article belongs to the Special Issue Big Data)
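
The subsample-then-extend idea can be illustrated in a simplified form, assuming scikit-learn: fit a spectral embedding on the sampled subgraph's adjacency matrix, then assign unseen nodes by angular (cosine) similarity to cluster prototypes in that eigenspace. This is a sketch of the general idea, not the paper's primal-dual KSC formulation, and all names are ours.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.manifold import spectral_embedding

    def fit_subgraph(A_sub, k):
        """Cluster a sampled subgraph given its adjacency matrix A_sub."""
        Z = spectral_embedding(A_sub, n_components=k, drop_first=False)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(Z)
        # Unit-norm cluster prototypes, so the comparison below is angular.
        P = np.vstack([Z[labels == c].mean(axis=0) for c in range(k)])
        P /= np.linalg.norm(P, axis=1, keepdims=True)
        return Z, labels, P

    def extend(B, Z, P):
        """Assign unseen nodes; B[i, j] is the affinity of new node i to training node j."""
        Znew = B @ Z                          # project into the trained eigenspace
        Znew /= np.linalg.norm(Znew, axis=1, keepdims=True) + 1e-12
        return np.argmax(Znew @ P.T, axis=1)  # highest cosine similarity wins
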
Open Access Article: Discretization Based on Entropy and Multiple Scanning
Entropy 2013, 15(5), 1486-1502; doi:10.3390/e15051486
Received: 28 February 2013 / Revised: 16 April 2013 / Accepted: 18 April 2013 / Published: 25 April 2013
Cited by 9
Abstract
In this paper, we present an entropy-driven methodology for discretization. Recently, the original entropy-based discretization was enhanced with two options for selecting the best numerical attribute. In the first option, Dominant Attribute, the attribute with the smallest conditional entropy of the concept given the attribute is selected for discretization, and then the best cut point is determined. In the second option, Multiple Scanning, all attributes are scanned a number of times, and the best cut points are selected for all attributes at the same time. The results of experiments on 17 benchmark data sets, including large data sets with 175 attributes or 25,931 cases, are presented. For comparison, the results of experiments on the same data sets using globalized versions of the well-known Equal Interval Width and Equal Frequency per Interval discretization methods are also included; the entropy-driven technique enhanced both of these methods by converting them into globalized methods. Our experiments show that the Multiple Scanning methodology is significantly better than both Dominant Attribute and the better of the globalized Equal Interval Width and Equal Frequency per Interval methods (two-tailed test, 0.01 level of significance).
(This article belongs to the Special Issue Big Data)
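
The step shared by both options is choosing the cut point that minimizes the conditional entropy of the concept given the induced split. Below is a minimal single-attribute sketch in Python (names are ours, not the authors' code):

    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def best_cut_point(values, labels):
        """Binary cut minimizing H(concept | split) for one numeric attribute."""
        order = np.argsort(values)
        v, y = values[order], labels[order]
        best_h, best_cut = np.inf, None
        # Candidate cuts: midpoints between consecutive sorted values.
        for cut in np.unique((v[:-1] + v[1:]) / 2.0):
            left, right = y[v <= cut], y[v > cut]
            if left.size == 0 or right.size == 0:
                continue
            h = (left.size * entropy(left) + right.size * entropy(right)) / y.size
            if h < best_h:
                best_h, best_cut = h, cut
        return best_cut, best_h

Dominant Attribute applies this to the single attribute with the smallest conditional entropy; Multiple Scanning repeats the scan over all attributes, which is the variant the experiments above favor.
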

Journal Contact

MDPI AG
Entropy Editorial Office
St. Alban-Anlage 66, 4052 Basel, Switzerland
entropy@mdpi.com
Tel.: +41 61 683 77 34
Fax: +41 61 302 89 18