Special Issue "Big Data"

A special issue of Entropy (ISSN 1099-4300).

Deadline for manuscript submissions: closed (30 October 2013)

Special Issue Editor

Guest Editor
Dr. Nikunj C. Oza

NASA Ames Research Center, Moffett Field, CA 94035, USA
Interests: data mining; machine learning; ensemble learning methods; online learning; anomaly detection; applications of machine learning and data mining

Special Issue Information

Dear Colleagues,

"Big data" refers to datasets that are so large that conventional database management and data analysis tools are insufficient to work with them. Big data has become a bigger-than-ever problem for many reasons. Data storage is rapidly becoming cheaper in terms of cost per unit of storage, thereby making appealing the prospect of saving all collected data. Computer processing is becoming more powerful and cheaper, and computer memory is also becoming cheaper, thereby making processing such data increasingly practical. The number of deployed sensors is growing rapidly. For example, there are a greater number of Earth-Observing Satellites than ever before, collecting many terabytes of data per day. Engineered systems have increasing sensing of their environment as well as of the systems themselves for integrated vehicle health management. The internet has greatly added to the volume and heterogeneity of data available---the world-wide web contains an enormous volume of text, images, videos, and connections between these. Many complex processes that we desire to understand generate these data. We desire methods that go in the reverse direction---from big data to an understanding of these complex processes---how they work, when and how they display anomalous behavior, and other insights. Data mining is a field---brought about through the combination of machine learning, statistics, and database management---that seeks to develop such methods. This special issue seeks comprehensive reviews or research articles in the area of entropy and information theory methods for big data. Research articles may describe theoretical and/or algorithmic developments.

Dr. Nikunj C. Oza
Guest Editor

Submission

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. Papers will be published continuously (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are refereed through a peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed Open Access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs).


Keywords

  • big data
  • analytics
  • data mining
  • predictive analytics
  • knowledge discovery
  • classification
  • regression
  • anomaly detection
  • clustering

Published Papers (5 papers)

Research

Open Access Article: Fast Feature Selection in a GPU Cluster Using the Delta Test
Entropy 2014, 16(2), 854-869; doi:10.3390/e16020854
Received: 13 October 2013 / Revised: 10 January 2014 / Accepted: 28 January 2014 / Published: 13 February 2014
Cited by 6
Abstract
Feature or variable selection remains an unsolved problem, because evaluating the entire solution space is infeasible. Several heuristic algorithms have been proposed so far, with successful results. However, these algorithms were not designed for very large datasets, whose memory and time demands make their execution impossible. This paper presents an implementation of a genetic algorithm that has been parallelized using the classical island approach and that also uses graphics processing units to speed up the computation of the fitness function. Special attention has been paid to the population evaluation, as well as to the migration operator of the parallel genetic algorithm (GA), which is not usually considered significant although, as the experiments show, it is crucial for obtaining robust results.
(This article belongs to the Special Issue Big Data)
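
For readers unfamiliar with the Delta Test, the following is a minimal sequential sketch of such a fitness function, assuming NumPy and SciPy; it illustrates the general nearest-neighbor noise-variance estimator, not the paper's GPU-cluster implementation, and all names are ours.

    import numpy as np
    from scipy.spatial import cKDTree

    def delta_test(X, y, mask):
        """Noise-variance estimate for the feature subset selected by `mask`.

        delta = (1 / 2N) * sum_i (y[nn(i)] - y[i])^2, where nn(i) is the
        nearest neighbor of sample i in the selected feature subspace.
        """
        Xs = X[:, mask]
        tree = cKDTree(Xs)
        # k=2 because each point's nearest neighbor is itself at distance 0.
        _, idx = tree.query(Xs, k=2)
        nn = idx[:, 1]
        return 0.5 * np.mean((y[nn] - y) ** 2)

In a GA wrapper, each chromosome encodes a feature mask, and lower delta_test values indicate better subsets; the paper's contribution is evaluating many such fitness calls in parallel on GPUs and across islands.
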
Open Access Article: Information-Theoretic Data Discarding for Dynamic Trees on Data Streams
Entropy 2013, 15(12), 5510-5535; doi:10.3390/e15125510
Received: 9 August 2013 / Revised: 4 December 2013 / Accepted: 9 December 2013 / Published: 13 December 2013
Cited by 2
Abstract
Ubiquitous automated data collection at an unprecedented scale is making available streaming, real-time information flows in a wide variety of settings, transforming both science and industry. Learning algorithms deployed in such contexts often rely on single-pass inference, where the data history is never revisited. Learning may also need to be temporally adaptive to remain up-to-date against unforeseen changes in the data-generating mechanism. Online Bayesian inference remains challenged by such transient, evolving data streams. Nonparametric modeling techniques can prove particularly ill-suited, as the complexity of the model is allowed to increase with the sample size. In this work, we take steps to overcome these challenges by porting information-theoretic heuristics, such as exponential forgetting and active learning, into a fully Bayesian framework. We showcase our methods by augmenting a modern nonparametric modeling framework, dynamic trees, and illustrate its performance on a number of practical examples. The end product is a powerful streaming regression and classification tool whose performance compares favorably to the state of the art.
(This article belongs to the Special Issue Big Data)
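
As a generic illustration of exponential forgetting (not the paper's dynamic-tree algorithm), the sketch below, with names of our choosing, discounts the sufficient statistics of a running mean by a factor lam before absorbing each new observation, so the effective sample size stays bounded and old data gradually lose influence.

    class ForgetfulMean:
        """Streaming mean whose memory decays geometrically at rate `lam`."""

        def __init__(self, lam=0.99, prior_mean=0.0, prior_weight=1.0):
            self.lam = lam
            self.weight = prior_weight            # effective sample size
            self.total = prior_mean * prior_weight

        def update(self, x):
            # Discount the past, then absorb the new observation.
            self.weight = self.lam * self.weight + 1.0
            self.total = self.lam * self.total + x

        @property
        def mean(self):
            return self.total / self.weight

With lam = 0.99 the effective sample size saturates near 1 / (1 - lam) = 100, which is what keeps such a streaming estimator adaptive to changes in the data-generating mechanism.
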
Open Access Article: Stochasticity: A Feature for the Structuring of Large and Heterogeneous Image Databases
Entropy 2013, 15(11), 4782-4801; doi:10.3390/e15114782
Received: 26 August 2013 / Revised: 27 September 2013 / Accepted: 29 October 2013 / Published: 4 November 2013
Cited by 3
Abstract
The paper addresses image feature characterization and the structuring of large and heterogeneous image databases through their appearance of stochasticity, or randomness. Measuring stochasticity involves finding suitable representations that can significantly reduce statistical dependencies of any order. Wavelet packet representations provide such a framework for a large class of stochastic processes through an appropriate dictionary of parametric models. From this dictionary and the Kolmogorov stochasticity index, the paper proposes semantic stochasticity templates upon wavelet packet sub-bands in order to provide high-level classification and content-based image retrieval. The approach is shown to be relevant for texture images.
(This article belongs to the Special Issue Big Data)
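
To make the Kolmogorov stochasticity index concrete, the sketch below computes it for each wavelet packet sub-band of an image, assuming PyWavelets and SciPy; the standard normal reference model is an illustrative stand-in for the paper's dictionary of parametric models, and the function name is ours.

    import numpy as np
    import pywt
    from scipy import stats

    def stochasticity_indices(image, wavelet="db2", level=2):
        """Kolmogorov stochasticity index for each wavelet packet sub-band."""
        wp = pywt.WaveletPacket2D(image, wavelet=wavelet, maxlevel=level)
        out = {}
        for node in wp.get_level(level):
            c = node.data.ravel()
            c = (c - c.mean()) / (c.std() + 1e-12)   # standardize coefficients
            # Kolmogorov statistic D_n = sup |F_n(x) - F(x)| against the
            # reference CDF; scaling by sqrt(n) yields the stochasticity index.
            d, _ = stats.kstest(c, "norm")
            out[node.path] = np.sqrt(c.size) * d
        return out
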
Open Access Article: Kernel Spectral Clustering for Big Data Networks
Entropy 2013, 15(5), 1567-1586; doi:10.3390/e15051567
Received: 1 March 2013 / Revised: 25 April 2013 / Accepted: 29 April 2013 / Published: 3 May 2013
Cited by 21
Abstract
This paper shows the feasibility of utilizing the Kernel Spectral Clustering (KSC) method for community detection in big data networks. KSC employs a primal-dual framework to construct a model, which has the powerful property of effectively inferring community affiliation for out-of-sample extensions. The original large kernel matrix cannot fit into memory, so we select a smaller subgraph that preserves the overall community structure to construct the model, and then use the out-of-sample extension property to obtain community memberships for the unseen nodes. We provide a novel memory- and computationally efficient model selection procedure based on angular similarity in the eigenspace. We demonstrate the effectiveness of KSC on large-scale synthetic networks and real-world networks such as the YouTube network, a road network of California and the Livejournal network. These networks contain millions of nodes and several million edges.
(This article belongs to the Special Issue Big Data)
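
The subsample-then-extend idea can be illustrated in a simplified form, assuming scikit-learn: fit a spectral embedding on the sampled subgraph's adjacency matrix, then assign unseen nodes by angular (cosine) similarity to cluster prototypes in that eigenspace. This is a sketch of the general idea, not the paper's primal-dual KSC formulation, and all names are ours.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.manifold import spectral_embedding

    def fit_subgraph(A_sub, k):
        """Cluster a sampled subgraph given its adjacency matrix A_sub."""
        Z = spectral_embedding(A_sub, n_components=k, drop_first=False)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(Z)
        # Unit-norm cluster prototypes, so the comparison below is angular.
        P = np.vstack([Z[labels == c].mean(axis=0) for c in range(k)])
        P /= np.linalg.norm(P, axis=1, keepdims=True)
        return Z, labels, P

    def extend(B, Z, P):
        """Assign unseen nodes; B[i, j] is the affinity of new node i to training node j."""
        Znew = B @ Z                          # project into the trained eigenspace
        Znew /= np.linalg.norm(Znew, axis=1, keepdims=True) + 1e-12
        return np.argmax(Znew @ P.T, axis=1)  # highest cosine similarity wins
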
Open Access Article: Discretization Based on Entropy and Multiple Scanning
Entropy 2013, 15(5), 1486-1502; doi:10.3390/e15051486
Received: 28 February 2013 / Revised: 16 April 2013 / Accepted: 18 April 2013 / Published: 25 April 2013
Cited by 9
Abstract
In this paper, we present an entropy-driven methodology for discretization. Recently, the original entropy-based discretization was enhanced with two options for selecting the best numerical attribute. In the first option, Dominant Attribute, the attribute with the smallest conditional entropy of the concept given the attribute is selected for discretization, and then the best cut point is determined. In the second option, Multiple Scanning, all attributes are scanned a number of times, and the best cut points are selected for all attributes at the same time. The results of experiments on 17 benchmark data sets, including large data sets with 175 attributes or 25,931 cases, are presented. For comparison, the results of experiments on the same data sets using globalized versions of the well-known Equal Interval Width and Equal Frequency per Interval discretization methods are also included; the entropy-driven technique enhanced both of these methods by converting them into globalized methods. Our experiments show that the Multiple Scanning methodology is significantly better than both Dominant Attribute and the better of the globalized Equal Interval Width and Equal Frequency per Interval methods (two-tailed test, 0.01 level of significance).
(This article belongs to the Special Issue Big Data)
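
The step shared by both options is choosing the cut point that minimizes the conditional entropy of the concept given the induced split. Below is a minimal single-attribute sketch in Python (names are ours, not the authors' code):

    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def best_cut_point(values, labels):
        """Binary cut minimizing H(concept | split) for one numeric attribute."""
        order = np.argsort(values)
        v, y = values[order], labels[order]
        best_h, best_cut = np.inf, None
        # Candidate cuts: midpoints between consecutive sorted values.
        for cut in np.unique((v[:-1] + v[1:]) / 2.0):
            left, right = y[v <= cut], y[v > cut]
            if left.size == 0 or right.size == 0:
                continue
            h = (left.size * entropy(left) + right.size * entropy(right)) / y.size
            if h < best_h:
                best_h, best_cut = h, cut
        return best_cut, best_h

Dominant Attribute applies this to the single attribute with the smallest conditional entropy; Multiple Scanning repeats the scan over all attributes, which is the variant the experiments above favor.
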

Journal Contact

MDPI AG
Entropy Editorial Office
St. Alban-Anlage 66, 4052 Basel, Switzerland
entropy@mdpi.com
Tel.: +41 61 683 77 34
Fax: +41 61 302 89 18