Open Access Article

Robust and Scalable Learning of Complex Intrinsic Dataset Geometry via ElPiGraph

by Luca Albergante 1,2,3,4,*, Evgeny Mirkes 5,6, Jonathan Bac 1,2,3,7, Huidong Chen 8,9, Alexis Martin 1,2,3,10, Louis Faure 1,2,3,11, Emmanuel Barillot 1,2,3, Luca Pinello 8,9, Alexander Gorban 5,6 and Andrei Zinovyev 1,2,3,*
1 Institut Curie, PSL Research University, 75005 Paris, France
2 INSERM U900, 75248 Paris, France
3 CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
4 Sensyne Health, Oxford OX4 4GE, UK
5 Center for Mathematical Modeling, University of Leicester, Leicester LE1 7RH, UK
6 Lobachevsky University, 603000 Nizhny Novgorod, Russia
7 Centre de Recherches Interdisciplinaires, Université de Paris, F-75000 Paris, France
8 Molecular Pathology Unit & Cancer Center, Massachusetts General Hospital Research Institute and Harvard Medical School, Boston, MA 02114, USA
9 Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
10 ECE Paris, F-75015 Paris, France
11 Center for Brain Research, Medical University of Vienna, 22180 Vienna, Austria
* Authors to whom correspondence should be addressed.
Entropy 2020, 22(3), 296; https://doi.org/10.3390/e22030296
Received: 6 December 2019 / Revised: 26 February 2020 / Accepted: 2 March 2020 / Published: 4 March 2020
(This article belongs to the Special Issue Entropies: Between Information Geometry and Kinetics)
Multidimensional data point clouds representing large datasets are frequently characterized by non-trivial low-dimensional geometry and topology, which can be recovered by unsupervised machine learning approaches, in particular by principal graphs. Principal graphs approximate multivariate data by a graph injected into the data space, with some constraints imposed on the node mapping. Here we present ElPiGraph, a scalable and robust method for constructing principal graphs. ElPiGraph exploits and further develops the concept of elastic energy, the topological graph grammar approach, and a gradient descent-like optimization of the graph topology. The method can withstand high levels of noise and can approximate data point clouds via principal graph ensembles. This strategy can be used to estimate the statistical significance of complex data features and to summarize them into a single consensus principal graph. ElPiGraph deals efficiently with large datasets in various fields, such as biology, where it can be used, for example, with single-cell transcriptomic or epigenomic datasets to infer gene expression dynamics and recover differentiation landscapes.
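To make the notion of elastic energy more concrete, the following is a minimal NumPy sketch of the classic elastic principal graph energy: the mean squared distance from each data point to its nearest graph node, plus an edge-stretching penalty weighted by λ (`lam`) and a star-bending penalty weighted by μ (`mu`). The function name `elastic_energy`, the default coefficient values, and the definition of a "star" as every node of degree two or more together with all of its neighbours are assumptions made for illustration. This is not the ElPiGraph implementation, which may scale the penalties differently and includes robustness options (such as a trimming radius) that are not reproduced here.

```python
import numpy as np


def elastic_energy(X, nodes, edges, lam=0.01, mu=0.1):
    """Simplified elastic energy of a principal graph (hypothetical helper).

    X     : (n, d) array of data points
    nodes : (k, d) array of node positions in data space
    edges : iterable of (i, j) node-index pairs
    lam   : edge-stretching coefficient (illustrative value, an assumption)
    mu    : star-bending coefficient (illustrative value, an assumption)
    Returns (total, mse, u_e, u_r, labels).
    """
    X = np.asarray(X, dtype=float)
    nodes = np.asarray(nodes, dtype=float)
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)

    # Approximation term: squared distance from each point to its nearest
    # node, averaged over all points (ties go to the lowest node index).
    d2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    mse = d2[np.arange(len(X)), labels].mean()

    # Stretching term: penalize the squared length of every edge.
    u_e = lam * ((nodes[edges[:, 0]] - nodes[edges[:, 1]]) ** 2).sum()

    # Bending term: for every node with two or more neighbours (a "star"),
    # penalize its squared deviation from the mean of its neighbours.
    neighbours = [[] for _ in range(len(nodes))]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    u_r = 0.0
    for centre, nb in enumerate(neighbours):
        if len(nb) >= 2:
            u_r += ((nodes[centre] - nodes[nb].mean(axis=0)) ** 2).sum()
    u_r *= mu

    return mse + u_e + u_r, mse, u_e, u_r, labels


# Usage: three collinear points fitted by a straight three-node chain.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
total, mse, u_e, u_r, labels = elastic_energy(X, X.copy(), [(0, 1), (1, 2)])
```

For a straight chain lying exactly on the points, only the stretching term is non-zero; bending the middle node away from the line raises both the approximation and the bending terms. In a grammar-based search of the kind the abstract describes, candidate graphs produced by topological operations (for example, bisecting an edge or attaching a new node) would each be fitted and scored with an energy of this type, keeping the lowest-energy candidate; that loop and the refitting of node positions are not shown in this sketch.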
Keywords: data approximation; principal graphs; principal trees; topological grammars; software

MDPI and ACS Style

Albergante, L.; Mirkes, E.; Bac, J.; Chen, H.; Martin, A.; Faure, L.; Barillot, E.; Pinello, L.; Gorban, A.; Zinovyev, A. Robust and Scalable Learning of Complex Intrinsic Dataset Geometry via ElPiGraph. Entropy 2020, 22, 296.
