Open Access Article
Entropy 2013, 15(12), 5510-5535; doi:10.3390/e15125510

Information-Theoretic Data Discarding for Dynamic Trees on Data Streams

Department of Mathematics, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
Booth School of Business, The University of Chicago, 5807 South Woodlawn Avenue, Chicago, IL 60637, USA
Author to whom correspondence should be addressed.
Received: 9 August 2013 / Revised: 4 December 2013 / Accepted: 9 December 2013 / Published: 13 December 2013
(This article belongs to the Special Issue Big Data)

Ubiquitous automated data collection at an unprecedented scale is making available streaming, real-time information flows in a wide variety of settings, transforming both science and industry. Learning algorithms deployed in such contexts often rely on single-pass inference, where the data history is never revisited. Learning may also need to be temporally adaptive to remain up-to-date against unforeseen changes in the data generating mechanism. Online Bayesian inference remains challenged by such transient, evolving data streams. Nonparametric modeling techniques can prove particularly ill-suited, as the complexity of the model is allowed to increase with the sample size. In this work, we take steps to overcome these challenges by porting information-theoretic heuristics, such as exponential forgetting and active learning, into a fully Bayesian framework. We showcase our methods by augmenting a modern nonparametric modeling framework, dynamic trees, and illustrate their performance on a number of practical examples. The end product is a powerful streaming regression and classification tool, whose performance compares favorably to the state-of-the-art.
Keywords: regression and classification trees; dynamic trees; streaming data; massive data; online learning; active learning
This is an open access article distributed under the Creative Commons Attribution License (CC BY 3.0).

MDPI and ACS Style

Anagnostopoulos, C.; Gramacy, R.B. Information-Theoretic Data Discarding for Dynamic Trees on Data Streams. Entropy 2013, 15, 5510-5535.

Entropy, EISSN 1099-4300, published by MDPI AG, Basel, Switzerland