Why Topology for Machine Learning and Knowledge Extraction?

Data has shape, and shape is the domain of geometry and in particular of its “free” part, called topology. The aim of this paper is twofold. First, it provides a brief overview of applications of topology to machine learning and knowledge extraction, as well as the motivations thereof. Second, it aims to promote cross-talk between the theoretical and applied domains of topology and machine learning research. Such interactions can benefit both the development of novel theoretical tools and the discovery of cutting-edge practical applications.


Introduction
Data has shape in more than one sense: Macroscopically, a dataset may itself have a shape, and this is generally of capital importance for analyzing it. At a finer granularity, each element of a dataset may have a shape, often in a scarcely formalized way. Both machine learning and knowledge extraction, then, need to understand and take advantage of both types of data shape. Moreover, visualization plays an important role in human-machine interaction. All these issues require a smart use of geometry and, more and more often, of that "free" branch of it which is topology. There are literally thousands of texts (articles, proceedings, books) on Topological Data Analysis (TDA). Here I will mention only a few keystones, examples and comprehensive surveys.

The Shape of Datasets and Networks
There are different ways of thinking of the shape of a dataset. For a long time we have used Mahalanobis and Bhattacharyya distances to take into account the shape of clusters present in data. The remarkable success of statistical learning by Support Vector Machines (SVM) rests on the observation that, in many practical situations, if a separator of different populations in a dataset exists, it does not have a simple (i.e., linear) shape [1] (see [2], co-authored by the topologist and Fields medalist S. Smale, for a different viewpoint on similar themes).
How do relevant data embed into the space of all possible occurrences? This problem is raised in a paper co-authored by Fields medalist D. Mumford [3] and is solved in a surprising way in [4]: In the space of all possible 3 × 3 pixel patches of a digital image, consider the ones with high contrast; this is a finite subset of a finite set, so it seems impossible to speak sensibly of its shape. Still, the authors show that it is a discretization of a Klein bottle; this is made possible by persistent homology (see Section 3).
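The first step of this analysis, selecting high-contrast patches, can be sketched in a few lines. This is a simplified sketch, not the pipeline of [3,4]: it assumes a grayscale image given as a list of rows, measures contrast as the sum of squared differences between 4-adjacent pixels (assumed here as the contrast measure), and skips the further preprocessing and the persistent homology computation of the cited works. All function names are hypothetical.

```python
import math
from itertools import product

# Pairs of 4-adjacent pixels in a 3x3 patch flattened row by row (indices 0..8):
# 6 horizontal pairs and 6 vertical pairs.
ADJACENT = [(i, i + 1) for i in range(9) if i % 3 != 2] + [(i, i + 3) for i in range(6)]


def patches_3x3(image):
    """Yield every 3x3 patch of a grayscale image given as a list of rows."""
    rows, cols = len(image), len(image[0])
    for r, c in product(range(rows - 2), range(cols - 2)):
        yield [image[r + i][c + j] for i in range(3) for j in range(3)]


def contrast(patch):
    """Assumed contrast measure: sum of squared differences of adjacent pixels."""
    return sum((patch[a] - patch[b]) ** 2 for a, b in ADJACENT)


def high_contrast_patches(image, keep=0.2):
    """Return the fraction `keep` of patches with the highest contrast."""
    ranked = sorted(patches_3x3(image), key=contrast, reverse=True)
    return ranked[: max(1, int(keep * len(ranked)))]


def normalize(patch):
    """Subtract the mean and rescale to unit contrast (requires contrast > 0)."""
    mean = sum(patch) / len(patch)
    scale = math.sqrt(contrast(patch))
    return [(x - mean) / scale for x in patch]


# Toy image with a vertical step edge: columns 0-2 are dark, columns 3-5 bright.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
selected = high_contrast_patches(image, keep=0.5)
points = [normalize(p) for p in selected]
```

After normalization, every selected patch lies on an ellipsoid (topologically a 7-sphere) in the 8-dimensional space of mean-zero patches; the question studied in [3,4] is how the points of a natural image concentrate on it.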
It has long been known that the topology of a space is closely connected with the behaviour of a real function defined on it (for smooth functions, with the indices of its critical points). This is the core of classical Morse Theory [5] and is a key idea in TDA, for instance in Mapper [6] and in persistent homology (see Section 3).
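A standard worked instance of this connection, from classical Morse theory rather than from the works cited here: if f is a Morse function on a closed manifold M and c_λ denotes the number of critical points of f of index λ, the Euler characteristic of M is recovered from the critical points alone. The height functions on an upright torus and on a round sphere give:

```latex
\begin{aligned}
\chi(M)   &= \sum_{\lambda=0}^{\dim M} (-1)^{\lambda}\, c_\lambda, \\
\chi(T^2) &= c_0 - c_1 + c_2 = 1 - 2 + 1 = 0 \quad \text{(one minimum, two saddles, one maximum)}, \\
\chi(S^2) &= c_0 - c_1 + c_2 = 1 - 0 + 1 = 2 \quad \text{(one minimum, one maximum)}.
\end{aligned}
```

Counting critical points by index thus reads off a topological invariant, the same idea of extracting topology from a function that Mapper and persistence build on.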
A common problem is that data can be described by a high number of variables while the dataset X is intrinsically low-dimensional. The "true" dimension of X is easily discovered by Principal Component Analysis, provided that the data lie in a linear subspace. Otherwise, Mapper can help. Its ingredients are: A parameter space Z, an open covering $\mathcal{U}$ of it, and a continuous function f (called filter) from the metric space where the data are represented to Z. Mapper then builds a simplicial complex as follows: The vertices are the data clusters contained in $f^{-1}(U)$ for each open set $U \in \mathcal{U}$, and there is a k-simplex with vertices $v_0, \ldots, v_k$ whenever the clusters represented by these vertices have a nonempty intersection. The result is a "topological summary" of the dataset that is much easier to handle. Of course, the choices of Z, $\mathcal{U}$ and f influence the dimension of the resulting complex and the resolution of the representation. Mapper is widely used by companies working in data analysis and has many scientific applications, from network security to RNA sequencing to epidemiology, to mention a few [7][8][9][10][11][12][13].
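The Mapper construction can be sketched directly from this description. This is a minimal sketch under stated assumptions, not the reference implementation of [6]: Z is the real line, the covering is a family of overlapping intervals, clustering is single-linkage with a hypothetical distance threshold `eps`, and only the edges (1-simplices) of the nerve are built. All function names are hypothetical.

```python
import math
from itertools import combinations


def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] by n_intervals equal open intervals, each widened on both
    sides by `overlap` times its length so that neighbouring intervals overlap."""
    length = (hi - lo) / n_intervals
    pad = overlap * length
    return [(lo + i * length - pad, lo + (i + 1) * length + pad) for i in range(n_intervals)]


def single_linkage(indices, points, eps):
    """Split the given point indices into connected components of the graph
    joining points at distance < eps (single-linkage clustering)."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) < eps}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_fn, cover, eps):
    """Vertices: clusters of each preimage f^{-1}(U); edges: pairs of clusters
    sharing at least one data point (the 1-skeleton of the nerve)."""
    values = [filter_fn(p) for p in points]
    vertices = []
    for a, b in cover:
        preimage = [i for i, v in enumerate(values) if a < v < b]
        vertices.extend(single_linkage(preimage, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(vertices)), 2)
             if vertices[u] & vertices[v]]
    return vertices, edges


# Usage: 40 points sampled on a circle, filtered by height (the y coordinate).
circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40)) for k in range(40)]
cover = interval_cover(-1.0, 1.0, n_intervals=4, overlap=0.25)
vertices, edges = mapper_graph(circle, lambda p: p[1], cover, eps=0.2)
```

For the sampled circle the summary is a cycle with 6 vertices and 6 edges: the bottom and top intervals each yield one arc, the two middle intervals each yield a left and a right arc, and the loop of the circle survives. With this cover no point lies in three intervals, so the nerve has no higher simplices here.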
UMAP [14] is another recent method for reducing the dimension of datasets.
While the idea of the shape, in a topological sense, of a dataset is fairly new, topology is commonly associated with networks.
A way to think of the shape of a network is to analyze whether it has features similar to those of a small-world network [15]. A different problem is the coverage of a given domain by a network of sensors [16]; this already provides an example of the use of topological persistence, which I will treat later. There is also a sort of invisible shape in a neural network: The values of its weights. While learning, a neural network modifies its own "shape" until it reaches a stable one. This shape stabilization interacts with the (more) visible topological structure of the network itself as a graph, which may be, so to say, non-Euclidean; this is the subject of "geometric deep learning", which makes use of Fourier theory, spectral graph theory and much more [17].
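A common quantitative reading of "small-world features" is the Watts-Strogatz one: a clustering coefficient as high as that of a regular lattice together with a short average path length. The sketch below assumes this reading (it is not taken from [15]) and computes both quantities for an undirected graph stored as a dictionary of neighbour sets; the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Ring of n nodes, each joined to its k nearest neighbours (k/2 per side)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj):
    """Average over nodes of (links among neighbours) / (possible links)."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += links / (d * (d - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over ordered pairs (graph assumed connected)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


lattice = ring_lattice(20, 4)
c_lattice, l_lattice = clustering_coefficient(lattice), average_path_length(lattice)

shortcut = ring_lattice(20, 4)  # same lattice plus one long-range edge
shortcut[0].add(10)
shortcut[10].add(0)
c_short, l_short = clustering_coefficient(shortcut), average_path_length(shortcut)
```

The lattice has clustering coefficient 0.5 and average path length 55/19 (about 2.89); a single shortcut lowers the path length while the clustering coefficient only drops to 0.48, the trade-off that characterizes small-world networks in this sense.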

The Shape of a Data Element
What does a machine learn? How can one extract knowledge from a dataset? Quite often it is a matter of estimating similarity in a broad sense: Classifying a music piece, retrieving images, and attributing a text to an author all require determining either a distance or a relation among objects, or both. In this task, topology is extremely powerful: By its very nature, it is a method for formalizing qualitative aspects of reality. So it is no wonder that TDA has grown tremendously in recent years.
When speaking of TDA, one often refers to Persistent Homology. Homology is a branch of algebraic topology, a discipline which assigns algebraic invariants to topological spaces in such a way that two topologically equivalent (homeomorphic) spaces get the same invariants (the converse, unfortunately, does not hold in general); in particular, homology provides the invariants that are most readily computed in practice [18]. What about persistent homology? It arises from the following class of problems.
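Part of what makes homology computable is that it reduces to linear algebra. A minimal sketch, assuming coefficients in the two-element field (a common choice in TDA software) and a finite simplicial complex given as lists of sorted vertex tuples: the k-th Betti number is the number of k-simplices minus the rank of the boundary map on k-simplices minus the rank of the boundary map on (k+1)-simplices. The function names are hypothetical.

```python
from itertools import combinations


def boundary_rank(k_simplices, faces):
    """Rank over GF(2) of the boundary map from k-simplices to (k-1)-simplices.
    Each k-simplex becomes a bitmask row with one bit per (k-1)-face."""
    index = {face: i for i, face in enumerate(faces)}
    pivots = {}  # leading bit -> reduced row having that leading bit
    for simplex in k_simplices:
        row = 0
        for face in combinations(simplex, len(simplex) - 1):
            row |= 1 << index[face]
        # Gaussian elimination over GF(2): XOR away already-known leading bits
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]
    return len(pivots)


def betti_numbers(complex_):
    """complex_[k] lists the k-simplices as sorted vertex tuples (closed under faces)."""
    top = max(complex_)

    def rank(k):
        # the boundary of a vertex is zero, and there are no simplices above `top`
        return boundary_rank(complex_[k], complex_[k - 1]) if 1 <= k <= top else 0

    return [len(complex_[k]) - rank(k) - rank(k + 1) for k in range(top + 1)]


hollow = {0: [(0,), (1,), (2,)], 1: [(0, 1), (0, 2), (1, 2)]}
filled = {**hollow, 2: [(0, 1, 2)]}
```

The hollow triangle has one connected component and one independent loop, so its Betti numbers are [1, 1]; filling it with a 2-simplex kills the loop, giving [1, 0, 0].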
Imagine that there is an object of interest, a subset of a metric (e.g., Euclidean) space, of which you only have a finite cloud of samples. This is the common situation of a digitized image or a 3D mesh, but also of the aforementioned cases of a network of sensors and of the pixel patches. How can you infer the (algebraic) topology of the original object from these samples? A first idea is to build a continuous object out of the point cloud, hoping that the result is a good approximation of the original object. One way to do that is to center a ball of fixed radius on each sample point. Several constructions are based on this idea, e.g., Vietoris-Rips, Čech and alpha complexes [19]. From these constructions we can compute topological invariants; typically, the ranks of the homology modules in each degree k, called Betti numbers. Roughly speaking, the k-th Betti number counts the independent k-dimensional cycles: connected components for k = 0, loops for k = 1, cavities (voids) for k = 2. There is a problem: The Betti numbers one obtains depend on the radius of the balls. As the radius varies, some k-cycles persist, and a good guess is that these correspond to the true k-cycles of the sampled object [20,21].
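For k = 0 the dependence on the radius is easy to observe. The connected components of a Vietoris-Rips complex are those of the graph that joins two samples when their distance is at most the scale r (equivalently, when balls of radius r/2 around them intersect). A minimal sketch with a union-find structure; the function name and the sample points are hypothetical.

```python
import math


def betti0(points, r):
    """Number of connected components of the graph joining points at distance <= r,
    computed with a union-find structure."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# Two small, well-separated clusters of three samples each.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
profile = {r: betti0(samples, r) for r in (0.05, 0.2, 1.0, 5.0, 10.0)}
```

The count drops from 6 to 2 as soon as r reaches 0.1 and stays at 2 until r reaches about 7, where the clusters merge: the value that persists over a long range of radii is the natural guess for the number of components of the sampled object.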
More formally and more generally, instead of studying only a topological space X, persistence studies pairs (X, f), where f is a continuous real function defined on X (the filtering function). Then, for each pair of reals u < v, the k-th persistent Betti number function counts how many k-cycles present in the sublevel set under u (i.e., in the set $\{x \in X \mid f(x) \le u\}$) survive in the sublevel set under v [22,23]. These functions are usually summarized by persistence diagrams or barcodes and yield lower bounds for a natural pseudodistance between such pairs [24]. There are several derived structures (e.g., zigzag persistence, persistence landscapes, vineyards, extended persistence diagrams) and a compelling algebraic setting, persistence modules. An important development is the use of a range other than $\mathbb{R}$ for the filtering function; in particular, $\mathbb{R}^n$ as a range is very tempting for applications but poses hard problems [25,26]. The stability of these representations has received particular attention [27][28][29]; this (and much more) is thoroughly covered for the 1-dimensional range in [30].
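A minimal sketch of the simplest case, assuming X is a path graph (a sampled one-dimensional signal), f is given by its vertex values and k = 0. Sweeping the threshold upwards, a component is born at each local minimum; when two components merge, the younger one (later birth) dies, the so-called elder rule. The persistent Betti number at (u, v) then counts the diagram points with birth at most u and death greater than v. The function names are hypothetical.

```python
import math


def sublevel_persistence_0d(f):
    """0-dimensional persistence pairs (birth, death) of the sublevel-set filtration
    of a function sampled on a path graph: vertex i carries value f[i] and is
    adjacent to i - 1 and i + 1. Zero-length pairs are discarded."""
    n = len(f)
    parent = list(range(n))
    birth = list(f)  # birth value of the component whose root is i
    present = [False] * n
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in sorted(range(n), key=lambda k: f[k]):  # sweep the threshold upwards
        present[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and present[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the younger component (later birth) dies at f[i].
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < f[i]:
                    pairs.append((birth[young], f[i]))
                parent[young] = old
    pairs.append((min(f), math.inf))  # the oldest component never dies
    return pairs


def persistent_betti0(pairs, u, v):
    """Number of 0-cycles born at or below level u that are still alive at level v > u."""
    return sum(1 for b, d in pairs if b <= u and d > v)


f = [0, 3, 1, 4, 2, 5]
diagram = sublevel_persistence_0d(f)
```

For this signal the diagram is {(0, ∞), (1, 3), (2, 4)}. The sublevel set under 2.5 has three components (vertices 0, 2 and 4); under 3.5 vertex 1 has joined the first two, so only two of those three classes survive, and the persistent Betti number at (2.5, 3.5) is 2.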
The freedom of choice of the filtering function and the generality of the setting give this tool great modularity; this was already clear when its historical predecessor, Size Functions, was applied to classification problems [31,32]. This flexibility has been widely exploited in the analysis of data of natural origin [33]. The multiset nature of persistence diagrams and barcodes makes it a challenge to use them as input to a machine learning system, which typically expects vectors of fixed length or at least a kernel. A solution, roughly speaking, is to replace each cornerpoint with a Gaussian kernel [34][35][36][37]; this idea had been proposed long before, in rudimentary form, for size functions [38,39]. The idea of associating different pairs (X, f) with the same data is part of a general philosophy of inserting the observer into the observed phenomenon, well expressed in [40].
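The idea of replacing cornerpoints with Gaussians can be sketched in the spirit of persistence images. Each finite point (b, d) of a diagram is moved to birth-persistence coordinates (b, d − b) and weighted by its persistence, so that points on the diagonal contribute nothing; the resulting sum of Gaussians is sampled on a fixed grid, giving a vector whose length does not depend on the number of points in the diagram. The grid, the width `sigma` and the linear weight are assumptions of this sketch; the methods in [34][35][36][37] differ in these choices, and the function name is hypothetical.

```python
import math


def persistence_vector(diagram, grid, sigma=0.1):
    """Sum of persistence-weighted Gaussians centred at the (birth, persistence)
    coordinates of the finite diagram points, evaluated at each grid point."""
    vector = []
    for gx, gy in grid:
        value = 0.0
        for birth, death in diagram:
            pers = death - birth
            sq = (gx - birth) ** 2 + (gy - pers) ** 2
            value += pers * math.exp(-sq / (2 * sigma ** 2))
        vector.append(value)
    return vector


grid = [(0.25 * i, 0.25 * j) for i in range(5) for j in range(5)]  # 25 sample points
d1 = [(0.0, 1.0), (0.2, 0.5)]
d2 = [(0.0, 1.01), (0.2, 0.5)]  # a slightly perturbed copy of d1
v1, v2 = persistence_vector(d1, grid), persistence_vector(d2, grid)
```

The output ignores the order of the diagram points, treats diagonal points as absent, and changes only slightly under a small perturbation of the diagram, which is what makes such vectors usable by standard learning algorithms.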
Implementations of persistent homology algorithms are covered in [41]. Python implementations of TDA are surveyed in [42]. A multiscale mapper combining Mapper with the ideas of persistent homology has recently been developed [43]. Exporting the structure of persistence diagrams beyond topology is the main goal of [44].

Visualization for the Human-In-The-Loop Paradigm
There are several reasons for an automatic system to be, at least for now, just a smart assistant of a human operator in a number of tasks. This is particularly the case in biomedical applications, where technology offers invaluable support but the final word still rests with the competence of the physician. While a machine has the advantage of speed and tirelessness, the human expert has the unbeaten capability of learning concepts from few examples and of evaluating problems and solutions in non-formalized environments. These different skills risk becoming drawbacks if they remain separated; on the contrary, they can enhance each other if they are integrated in the human-in-the-loop paradigm [45]. This is what happens, e.g., in Active Learning, where continual interaction between human and machine optimizes the trade-off between what an algorithm can offer and what the human actually looks for [46]. This can take the simple form of relevance feedback in data retrieval, or of smarter systems where the machine poses queries to the operator, or even exploits peculiarities of human psychology to minimize time and cognitive burden. Interaction becomes unavoidable when data exploration goals are ill-defined or evolve over time.
A structural difficulty is that a learning machine usually works in a very high-dimensional space, which a human can hardly deal with. On the other hand, human experts are highly skilled in extracting knowledge from datasets of dimension ≤ 3. This is why visualization is a keystone of human-machine interaction, bringing information down to the sensory domain of a human user. Data exploration and analysis, but also data presentation, can greatly benefit from it. A typical example is the representation of relations by a graph, where additional information can be conveyed by the size, color and shape of vertices and edges. (A peculiar example is that of crystallizations, edge-colored graphs by which all piecewise-linear manifolds of any dimension can be completely represented, with applications in pure topology and, recently, in theoretical physics [47].) Selection, modification, and hypothesis testing on such representations then become easier. In general, it is necessary to build algorithms able to provide representations of data that humans can interpret (see, e.g., the already cited [9,10,14]). This is part of a large project [48,49] in which geometry, and particularly topology, plays a relevant role.

Conclusions
A lot of work has been done, applying topology to machine learning and knowledge extraction, but much more awaits the competence and imagination of experts from both sides.I passionately hope that more topologists discover the challenges, suggestions, and application chances coming from this domain, but I also invite researchers from computer science, artificial intelligence and even robotics [50] to add topology to their toolboxes.