Open Access Article

Topological Information Data Analysis

1 Inserm UNIS UMR1072—Université Aix-Marseille, 13015 Marseille, France
2 Institut de Mathématiques de Jussieu—Paris Rive Gauche (IMJ-PRG), 75013 Paris, France
* Author to whom correspondence should be addressed.
Current address: Median Technologies, Les Deux Arcs, 1800 Route des Crêtes, 06560 Valbonne, France.
Entropy 2019, 21(9), 869; https://doi.org/10.3390/e21090869
Received: 5 July 2019 / Revised: 14 August 2019 / Accepted: 28 August 2019 / Published: 6 September 2019
(This article belongs to the Section Information Theory, Probability and Statistics)
This paper presents methods that quantify the structure of statistical interactions within a given data set, as applied in a previous article. It establishes new results on the k-multivariate mutual information (I_k) inspired by the topological formulation of information introduced in a series of studies. In particular, we show that the vanishing of all I_k for 2 ≤ k ≤ n of n random variables is equivalent to their statistical independence. Pursuing the work of Hu Kuo Ting and Te Sun Han, we show that information functions provide coordinates for binary variables, and that they are analytically independent of the probability simplex for any set of finite variables. The maximal positive I_k identifies the variables that co-vary the most in the population, whereas the minimal negative I_k identifies synergistic clusters and the variables that differentiate–segregate the most in the population. Finite data size effects and estimation biases severely constrain the effective computation of the information topology on data, and we provide simple statistical tests for the undersampling bias and the k-dependences. We give an example of application of these methods to genetic expression and unsupervised cell-type classification. The methods unravel biologically relevant subtypes, with a sample size of 41 genes and with few errors. This establishes generic methods to quantify epigenetic information storage and a unified epigenetic unsupervised learning formalism. We propose that higher-order statistical interactions and non-identically distributed variables are constitutive characteristics of biological systems that should be estimated in order to unravel their significant statistical structure and diversity. The topological information data analysis presented here allows for precisely estimating this higher-order structure characteristic of biological systems.
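The abstract's central quantities, the multivariate mutual informations I_k, can be computed for discrete variables via the alternating entropy sum I_k(X_1, …, X_k) = Σ_{∅ ≠ S ⊆ {1..k}} (−1)^{|S|+1} H(X_S) (the Hu Kuo Ting formula). A minimal sketch (not the authors' code; function names are illustrative), showing I_2 = 0 for independent bits and the negative I_3 of an XOR triple, the synergy case mentioned above:

```python
# Sketch: multivariate mutual information I_k for discrete variables,
# via the alternating entropy sum over nonempty subsets of variables.
from collections import Counter
from itertools import combinations
from math import log2

def entropy(samples):
    """Shannon entropy (bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

def mutual_info_k(columns):
    """I_k for k discrete variables, each given as an equal-length list."""
    k = len(columns)
    total = 0.0
    for r in range(1, k + 1):
        for subset in combinations(range(k), r):
            # Joint distribution of the selected variables as tuples.
            joint = list(zip(*(columns[i] for i in subset)))
            total += (-1) ** (r + 1) * entropy(joint)
    return total

x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]  # XOR of x and y

print(mutual_info_k([x, y]))     # independent fair bits: I_2 = 0.0
print(mutual_info_k([x, x]))     # identical bits: I_2 = 1.0 bit
print(mutual_info_k([x, y, z]))  # XOR triple: I_3 = -1.0 (synergy)
```

The negative I_3 of the XOR triple illustrates how minimal negative I_k flags synergistic clusters: no pair of variables carries information about each other, yet any two jointly determine the third.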
Keywords: information theory; cohomology; information category; topological data analysis; genetic expression; epigenetics; multivariate mutual-information; synergy; statistical independence
Figure 1

MDPI and ACS Style

Baudot, P.; Tapia, M.; Bennequin, D.; Goaillard, J.-M. Topological Information Data Analysis. Entropy 2019, 21, 869.

