Article

Exploring Graph and Digraph Persistence

by Mattia G. Bergomi 1,† and Massimo Ferri 2,*,†
1 Independent Researcher, 20124 Milan, Italy
2 Advanced Research Center on Electronic Systems (ARCES), Department of Mathematics, University of Bologna, 40126 Bologna, Italy
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Algorithms 2023, 16(10), 465; https://doi.org/10.3390/a16100465
Submission received: 29 August 2023 / Revised: 22 September 2023 / Accepted: 25 September 2023 / Published: 2 October 2023

Abstract

Among the various generalizations of persistent topology, that based on rank functions and leading to indexing-aware functions appears to be particularly suited to catching graph-theoretical properties without the need for a simplicial construction and a homology computation. This paper defines and studies “simple” and “single-vertex” features in directed and undirected graphs, through which several indexing-aware persistence functions are produced, within the scheme of steady and ranging sets. The implementation of the “sink” feature and its application to trust networks provide an example of the ease of use and meaningfulness of the method.

1. Introduction

Data-driven applications increasingly rely on graph-based representations, making graphs and directed graphs (digraphs) crucial tools in modern data analysis. Graphs highlight patterns, trends, and correlations that would be difficult or impossible to discern through other means. Directed graphs offer an even more nuanced understanding of these interactions, allowing for causal connections.
Graph theory and machine learning offer powerful tools to study directed and undirected graphs. However, on the one hand, powerful graph-theoretical features can be hard to translate into efficient algorithms. On the other hand, most deep learning algorithms are designed for undirected graphs (for instance, direction can affect information propagation or the ability of the model to capture long-range dependencies between nodes).
We aim to combine graph-theoretical features with strategies inspired by computational topology. This approach would allow us to easily and directly integrate graph-theoretical features in computational pipelines.
Computational topology and homological persistence provide swift algorithms to quantify geometrical and topological aspects of triangulable manifolds [1]. Briefly, given a triangulable manifold $M$ and a continuous function $f\colon M \to \mathbb{R}$, persistent homology studies the evolution of homological classes throughout the sublevel-set filtration induced by $f$ on $M$. Indeed, in this context, the word persistence refers to the lifespan of homological classes along the filtration. In intuitive terms, this procedure allows us to see $M$ using $f$ as a lens capable of highlighting features of interest in $M$.
Many extensions and generalizations of topological persistence aim to augment the field of application of persistent homology. Such generalizations allow for the study of objects other than triangulable manifolds, maintaining the valuable properties granted by the persistent homology framework, e.g., stability [2], universality [3], resistance to occlusions [4], and computability [5].
In most cases, generalizations of persistent homology can be attained in two ways. On the one hand, it is possible to define explicit mappings from the object of interest to a triangulable manifold preserving the properties of the original object. On the other hand, theoretical generalizations aim to extend the definition of persistence to objects other than triangulable manifolds and functors other than homology. These generalizations come in many flavors, which we shall briefly describe in Section 2.1. Here, we shall consider the rank-based persistence framework and its developments and applications detailed in [6,7,8]. This framework revolves around an axiomatic approach that yields usable definitions of persistence in categories such as graphs and quivers, without requiring the usage of auxiliary topological constructions.
In the current manuscript, following the approach described in [8,9], we provide strategies to compute rank-based persistence on weighted graphs (either directed or undirected). Weighted (di)graphs are widely used in many real-world scenarios, ranging from the analysis of interactions in social networks to context-based recommendation. Modern machine learning provides specific neural architectures to process directed graphs, e.g., [10,11,12]. As neural networks assure maximum flexibility by learning directly from training data, graph theory offers a plethora of invariants that—albeit static—can be leveraged as descriptors of directed, weighted networks [13,14,15]. We provide an algorithmic approach to encode such features as persistence diagrams carrying compact information that can be leveraged complementarily or even integrated into modern neural analysis pipelines.

Organization of the Paper

For the sake of simplicity, we shall write “(di)graphs” for “graphs (respectively directed graphs)”. Section 2 briefly outlines various generalizations of persistence and the current methods taking advantage of persistent homology on (di)graphs. Section 3 presents the essential concepts for the generalized theory of persistence used in this manuscript, namely, categorical and indexing-aware persistence functions. These functions are based on graph-theoretical features. In particular, we introduce the concept of simple features in Section 4 and single-vertex features in Section 5. This latter class of features yields plug-and-play methods for applications: we provide an implementation of single-vertex feature persistence in Section 6 and apply it in the context of trust networks. Section 8 concludes the paper.

2. State of the Art

Oftentimes, modern datasets can be endowed with a network-like structure. At the same time, topological persistence, thanks to its aforementioned mathematically provable properties, has proved to be a valuable tool in many applications. Thus, generalizations of topological persistence and its adaptation to objects such as (di)graphs are lively research areas.

2.1. Generalizations of Persistence

The keystone of persistence theory is classically the persistence module, i.e., a functor from the category $(\mathbb{R}, \le)$ to the category of modules on a fixed ring (generally, a field) that catches the essential features of a filtered topological space or simplicial complex. As hinted previously, the filtration is mostly constituted by the nested sublevel-sets of a real filtering function. The modules are the images of the linear maps induced in homology by the inclusions. This setting provides great modularity: different filtering functions may catch different features. The most used filtering function is the ball radius in building Vietoris–Rips or Čech complexes out of sampled objects [16].
Extensions of the classical persistence theory offer much greater flexibility. The first generalization was the passage from single-parameter to multi-parameter filtering functions [17] and to different ranges for the filtering function [18]. However, we believe that the most impactful generalizations are in categorical terms. In [19], persistence theory was extended to functors from $(\mathbb{R}, \le)$ to any Abelian category, concerning in particular the “interleaving distance”. This distance was also investigated in [3] for multidimensional domains and extended in [20] to functors from a poset to an arbitrary category. Posets as domains were studied in [21,22] for a generalized notion of persistence diagram. The extension on which the present paper is based was presented in [6] and aims to widen the range of phenomena that could give rise to persistence diagrams. This is achieved via “rank functions” that generalize the dimension of homology modules in classical persistence.

2.2. Persistent Homology for (Di)graphs

Data frequently come in the form of graphs or digraphs, and researchers have often been attracted by the possibility of applying persistence to the analysis of networks. The classical strategy consists in building a filtered simplicial complex from a weighted graph and then computing a persistent homology. This can be achieved in several ways. Of course, the most direct way is to consider the graph itself as a one-dimensional simplicial complex (or CW complex if loops and multiple edges are allowed). This was carried out, e.g., in [23], where a connection between Laplacian, persistence, and network circuit graphs was drawn. Another paper [24] addressed persistence, using the notion of discrete curvature for the evaluation of graph generative models. The clique complex is also a very natural way of transforming a graph into a possibly multidimensional complex. This latter construction was leveraged in [25] for the study of deep neural networks, in [26] for the analysis of the dynamics of scientific knowledge, in [27] for understanding the breakdown and survival of democracies, and in [28,29] (among many other works) for the representation and study of the human connectome. The persistent homology of clique–community complexes was applied to the study of general types of complex networks in [30,31]. Other constructions of simplicial complexes—e.g., independent sets, neighborhoods, and dominating sets—have also been discussed and utilized in applications, though, to the best of our knowledge, not in connection with persistence.
The construction of simplicial complexes from digraphs poses similar but not identical problems to their undirected counterparts. In [32], directed cliques were used to investigate the structural and functional topology of the brain. A differently flavored homology was built in [33,34], where the role of simplices was played by directed paths. Directed paths were also at the basis of an application of Hochschild homology [35] (Ch.1)—through the use of “connectivity digraphs” and within the framework of [6]—for a new persistence homology for digraphs in [36].

3. Graph-Theoretical Persistence

To the best of our knowledge, current applications of persistence to (di)graphs leverage a homology of some kind. Here, we provide a not necessarily homological approach to the study of (di)graphs. We recall the basic notions of the extension of topological persistence to graph theory introduced in [6,7]. For an encompassing view of classical topological persistence, we refer the reader to [37,38].

3.1. Categorical Persistence Functions

Definition 1.
[6] (Def. 3.2) Let $\mathbf{C}$ be a category. We say that a lower-bounded function $p\colon \mathrm{Morph}(\mathbf{C}) \to \mathbb{Z}$ is a categorical persistence function if, for all $u_1 \to u_2 \to v_1 \to v_2$, the following inequalities hold:
  • $p(u_1 \to v_1) \le p(u_2 \to v_1)$ and $p(u_2 \to v_2) \le p(u_2 \to v_1)$;
  • $p(u_2 \to v_1) - p(u_1 \to v_1) \ge p(u_2 \to v_2) - p(u_1 \to v_2)$.
With $\mathbf{C} = (\mathbb{R}, \le)$, the graphs of such functions have the appearance of superimposed triangles typical of persistent Betti number functions [37] (Sect. 2), so they can be condensed into persistence diagrams [6] (Def. 3.16).
As anticipated in the Introduction, we shall write “(di)graph” for “graph (respectively, digraph)”, extending this notation to cases such as sub(di)graphs et similia. Similarly, and when no confusion arises, we shall denote with $\mathbf{Gr}$ the category $\mathbf{Graph}$ or the category $\mathbf{Digraph}$, with monomorphisms as morphisms. We shall use the noun edge for edges of a graph and arcs of a digraph.
Filtered (di)graphs $(G, f)$ are pairs consisting of a (di)graph $G = (V, E)$ and a filtering function $f\colon V \cup E \to \mathbb{R}$ defined on the edges and extended to each vertex as the minimum of the values of its incident edges. For any $u \in \mathbb{R}$, the sublevel (di)graph $G_{f,u}$ (briefly, $G_u$, if no confusion occurs) is the sub(di)graph induced by the edges $e$ such that $f(e) \le u$.
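As an illustration, the sublevel construction can be sketched in a few lines of Python. The sketch assumes that a filtered digraph is stored as a dictionary mapping arcs (tail, head) to their filtering values; the function names and the toy data are hypothetical and are not taken from the gpa package.

```python
def sublevel_arcs(weights, u):
    """Arcs e with f(e) <= u; `weights` maps (tail, head) -> f(e)."""
    return {e for e, w in weights.items() if w <= u}


def sublevel_vertices(weights, u):
    """Vertices of the sub(di)graph induced by the arcs at level u."""
    return {v for e in sublevel_arcs(weights, u) for v in e}


def vertex_value(weights, v):
    """Extension of f to a vertex: minimum over its incident arcs."""
    return min(w for e, w in weights.items() if v in e)


# Hypothetical toy digraph, used only for illustration.
toy = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "a"): 3.5}
```

With this representation, a vertex belongs to the sublevel digraph at level u exactly when its extended value is at most u.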
Definition 2.
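[7] (Def. 5) The natural pseudodistance of filtered (di)graphs $(G, f)$ and $(G', f')$ is
$$\delta\big((G, f), (G', f')\big) = \begin{cases} +\infty & \text{if } \mathcal{H} = \emptyset \\ \inf_{\phi \in \mathcal{H}} \sup_{e \in E} \lvert f(e) - f'(\phi(e)) \rvert & \text{otherwise} \end{cases}$$
where $\mathcal{H}$ is the set of (di)graph isomorphisms between $G$ and $G'$.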
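For very small digraphs, the natural pseudodistance can be evaluated by brute force over all vertex bijections. The following is a minimal sketch under stated assumptions: digraphs are simple (no parallel arcs), have no isolated vertices, and are stored as dictionaries from arcs to filtering values. The function name is hypothetical, and the factorial cost makes this unsuitable for real data.

```python
import math
from itertools import permutations


def natural_pseudodistance(f, g):
    """Brute-force natural pseudodistance of two small filtered digraphs.

    f, g: dicts mapping arcs (tail, head) -> filtering value.
    Returns math.inf when no isomorphism exists.
    """
    vf = sorted({v for e in f for v in e})
    vg = sorted({v for e in g for v in e})
    if len(vf) != len(vg) or len(f) != len(g):
        return math.inf
    best = math.inf
    for perm in permutations(vg):
        phi = dict(zip(vf, perm))
        image = {(phi[a], phi[b]): w for (a, b), w in f.items()}
        if set(image) != set(g):
            continue  # this bijection is not a digraph isomorphism
        cost = max((abs(w - g[e]) for e, w in image.items()), default=0.0)
        best = min(best, cost)
    return best


# Hypothetical toy data: two directed paths and one non-isomorphic digraph.
f = {("a", "b"): 1.0, ("b", "c"): 2.0}
g = {("x", "y"): 1.5, ("y", "z"): 2.0}
g2 = {("x", "y"): 1.0, ("z", "y"): 2.0}
```

Here the two directed paths admit a single isomorphism, so the infimum reduces to the largest difference between corresponding arc values.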
Let $p$ be a categorical persistence function on $\mathbf{Gr}$; $(G, f)$, $(G', f')$ any filtered (di)graphs; $D$, $D'$ the respective persistence diagrams; and $d$ the bottleneck distance [37] (Sect. 6), [6] (Def. 3.24).
Definition 3.
[7] (Sect. 2) The categorical persistence function $p$ is said to be stable if the inequality
$$d(D, D') \le \delta\big((G, f), (G', f')\big)$$
holds. Moreover, the bottleneck distance is said to be universal with respect to $p$ if it yields the best possible lower bound for the natural pseudodistance among the possible distances between $D$ and $D'$ for any $(G, f)$, $(G', f')$.

3.2. Indexing-Aware Persistence Functions

The difference between any of the categorical persistence functions introduced in [7] and the functions of the present subsection (presented originally in [8]) is that the former come from a functor defined on $\mathbf{Gr}$, while the latter strictly depend on the filtration, thus descending from a functor defined on $(\mathbb{R}, \le)$.
Definition 4.
[8] (Def. 5) Let $p$ be a map assigning to each filtered (di)graph $(G, f)$ a categorical persistence function $p_{(G,f)}$ on $(\mathbb{R}, \le)$, such that $p_{(G,f)} = p_{(G',f')}$ whenever an isomorphism between $(G, f)$ and $(G', f')$ compatible with the functions $f$, $f'$ exists.
All the resulting categorical persistence functions p ( G , f ) are called indexing-aware persistence functions (ip-functions for brevity). The map p itself is called an ip-function generator.
Definition 5.
[8] (Def. 6, Prop. 1) Let $p$ be an ip-function generator on $\mathbf{Gr}$. The map $p$ itself and the resulting ip-functions are said to be balanced if the following condition is satisfied. Let $G = (V, E)$ be any (di)graph, $f$ and $g$ two filtering functions on $G$, and $p_{(G,f)}$ and $p_{(G,g)}$ their ip-functions. If a positive real number $h$ exists, such that $\sup_{e \in E} \lvert f(e) - g(e) \rvert \le h$, then for all $(u, v) \in \Delta^+$ the inequality $p_{(G,f)}(u - h, v + h) \le p_{(G,g)}(u, v)$ holds.
Proposition 1.
[8] (Thm. 1) Balanced ip-functions are stable.
We now present our main ip-function generators: steady and ranging sets.
Definition 6.
[8] (Def. 8) Given a (di)graph $G = (V, E)$, any function $\mathcal{F}\colon 2^{V \cup E} \to \{\mathit{true}, \mathit{false}\}$ is called a feature. Conventionally, for $X \subseteq V \cup E$, we set $\mathcal{F}(X) = \mathit{false}$ in a sub(di)graph $G' = (V', E')$ if $X \not\subseteq V' \cup E'$.
Remark 1.
The second part of Definition 6 was added for use in some of the following proofs.
Let $\mathcal{F}$ be a feature.
Definition 7.
[8] (Def. 8) We call an $\mathcal{F}$-set any $X \subseteq V \cup E$ such that $\mathcal{F}(X) = \mathit{true}$. In a weighted (di)graph $G$, we shall say that $X \subseteq V \cup E$ is an $\mathcal{F}$-set at level $w \in \mathbb{R}$ if it is an $\mathcal{F}$-set of $G_w$.
Definition 8.
[8] (Def. 9) We define the maximal feature $m\mathcal{F}$ associated with $\mathcal{F}$ as follows: for any $X \subseteq V \cup E$, $m\mathcal{F}(X) = \mathit{true}$ if and only if $\mathcal{F}(X) = \mathit{true}$ and there is no $Y \subseteq V \cup E$ such that $X \subsetneq Y$ and $\mathcal{F}(Y) = \mathit{true}$.
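The passage from a feature to its maximal version can be sketched by brute force. The example below is only illustrative: it restricts the universe to edges (rather than vertices and edges), uses matchings as the feature, and enumerates all supersets, which is exponential. All names are hypothetical.

```python
from itertools import combinations


def is_matching(edges, X):
    """True if X is a set of edges of the graph with pairwise disjoint endpoints."""
    X = set(X)
    if not X <= set(edges):
        return False
    seen = set()
    for a, b in X:
        if a in seen or b in seen:
            return False
        seen.update((a, b))
    return True


def is_maximal(feature, universe, X):
    """Maximal version: feature(X) holds and fails on every proper superset."""
    X = frozenset(X)
    if not feature(X):
        return False
    rest = [e for e in universe if e not in X]
    for k in range(1, len(rest) + 1):
        for extra in combinations(rest, k):
            if feature(X | frozenset(extra)):
                return False
    return True


# Hypothetical toy graph: the path a - b - c - d.
e1, e2, e3 = ("a", "b"), ("b", "c"), ("c", "d")
path = [e1, e2, e3]
matching = lambda X: is_matching(path, X)
```

On this path, the middle edge alone is a maximal matching, whereas the first edge alone is a matching that can still be extended by the last edge.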
Definition 9.
[8] (Def. 10) A set $X \subseteq V \cup E$ is a steady $\mathcal{F}$-set (s$\mathcal{F}$-set for brevity) at $(u, v)$ if it is an $\mathcal{F}$-set at all levels $w$ with $u \le w \le v$.
We call $X$ a ranging $\mathcal{F}$-set (r$\mathcal{F}$-set) at $(u, v)$ if there exist levels $w \le u$ and $w' \ge v$ at which it is an $\mathcal{F}$-set.
Let $S^{\mathcal{F}}_{(G,f)}(u, v)$ be the set of s$\mathcal{F}$-sets at $(u, v)$ and let $R^{\mathcal{F}}_{(G,f)}(u, v)$ be the set of r$\mathcal{F}$-sets at $(u, v)$.
Definition 10.
[8] (Def. 11) For any filtered (di)graph $(G, f)$, we define $\sigma^{\mathcal{F}}_{(G,f)}$ (resp., $\varrho^{\mathcal{F}}_{(G,f)}$) as the function that assigns to $(u, v) \in \mathbb{R}^2$, $u < v$, the number $\lvert S^{\mathcal{F}}_{(G,f)}(u, v) \rvert$ (resp., $\lvert R^{\mathcal{F}}_{(G,f)}(u, v) \rvert$).
We denote by $\sigma^{\mathcal{F}}$ and $\varrho^{\mathcal{F}}$ the maps assigning $\sigma^{\mathcal{F}}_{(G,f)}$ and $\varrho^{\mathcal{F}}_{(G,f)}$, respectively, to $(G, f)$.
Proposition 2.
[8] (Prop. 2) The maps $\sigma^{\mathcal{F}}$ and $\varrho^{\mathcal{F}}$ are ip-function generators.
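To make the steady and ranging counts concrete, the following minimal sketch evaluates them for a single-vertex feature on a toy digraph. It relies on two assumptions that are not part of the original text: the sublevel digraph changes only at the values attained by the filtering function, so only those levels (plus the endpoints) are tested, and the feature is false on vertices not yet in the sublevel digraph. The sink feature and all names are illustrative.

```python
def sublevel(weights, w):
    """Arcs of the sublevel digraph at level w."""
    return {e for e, fe in weights.items() if fe <= w}


def is_sink(arcs, x):
    """Example single-vertex feature: indegree exceeds outdegree."""
    indeg = sum(1 for _, head in arcs if head == x)
    outdeg = sum(1 for tail, _ in arcs if tail == x)
    return indeg > outdeg


def is_steady(weights, feature, x, u, v):
    """x is a feature set at every level w with u <= w <= v."""
    inner = [c for c in set(weights.values()) if u < c <= v]
    return all(feature(sublevel(weights, w), x) for w in [u] + inner)


def is_ranging(weights, feature, x, u, v):
    """x is a feature set at some level <= u and at some level >= v."""
    crit = set(weights.values())
    below = any(feature(sublevel(weights, c), x) for c in crit if c <= u)
    above = any(feature(sublevel(weights, w), x)
                for w in [v] + [c for c in crit if c > v])
    return below and above


def count(weights, feature, test, u, v):
    """Number of vertices passing `test` (steady or ranging) at (u, v)."""
    vertices = {x for e in weights for x in e}
    return sum(test(weights, feature, x, u, v) for x in vertices)


# Hypothetical toy digraph: b is a sink at level 1, not at level 2, again at level 3.
toy = {("a", "b"): 1.0, ("b", "c"): 2.0, ("d", "b"): 3.0}
sigma = count(toy, is_sink, is_steady, 1.0, 3.0)
rho = count(toy, is_sink, is_ranging, 1.0, 3.0)
```

In this toy case the vertex b is ranging but not steady at (1, 3), so the two counts differ; this is the behavior that monotone features rule out.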
The next definition will be generalized in Section 4.
Definition 11.
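[8] (Def. 13) We say that a feature $\mathcal{F}$ is monotone if:
  • For any (di)graphs $G' = (V', E') \subseteq G = (V, E)$ and any $X \subseteq V' \cup E'$, $\mathcal{F}(X) = \mathit{true}$ in $G$ implies that $\mathcal{F}(X) = \mathit{true}$ in $G'$.
  • In any (di)graph $\bar{G} = (\bar{V}, \bar{E})$, for any $Y \subseteq X \subseteq \bar{V} \cup \bar{E}$, $\mathcal{F}(X) = \mathit{true}$ implies that $\mathcal{F}(Y) = \mathit{true}$.
Proposition 3.
[8] (Prop. 3) If $\mathcal{F}$ is monotone, then $\sigma^{\mathcal{F}} = \varrho^{\mathcal{F}}$.
Proposition 4.
[8] (Prop. 4) If $\mathcal{F}$ is monotone, then the ip-function generators $\sigma^{\mathcal{F}} = \varrho^{\mathcal{F}}$ are balanced.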

4. Simple Features

We now define a class of features extending the class of monotone features (Definition 11).
Definition 12.
Let $G = (V, E)$ be a (di)graph. A feature $\mathcal{F}$ is said to be simple if, for $X \subseteq V \cup E$ and for sub(di)graphs $H_1 \subseteq H_2 \subseteq H_3 \subseteq G$, the following condition holds:
if $\mathcal{F}(X) = \mathit{true}$ in $H_3$ and $\mathcal{F}(X) = \mathit{false}$ in $H_2$, then $\mathcal{F}(X) = \mathit{false}$ in $H_1$.
For the remainder of this section, let $(G, f)$ be a weighted (di)graph, $G = (V, E)$, and $\mathcal{F}$ be a simple feature in $G$.
Lemma 1.
Let $X \subseteq V \cup E$. Then, either there is no value $u$ for which $\mathcal{F}(X) = \mathit{true}$ in $G_u$, or $\mathcal{F}(X) = \mathit{true}$ in $G_u$ for all $u \in [u_1, v_1)$, where $u_1$ is the lowest value $u$ such that $\mathcal{F}(X) = \mathit{true}$ in the sub(di)graph $G_u = (V_u, E_u)$, and $v_1$ is either the lowest value $v > u_1$ for which $\mathcal{F}(X) = \mathit{false}$ in $G_v$ or $+\infty$.
Proof.
Assume that there is at least a value $\bar{u}$ such that $\mathcal{F}(X) = \mathit{true}$ in $G_{\bar{u}}$. There are surely values of $w$ such that $\mathcal{F}(X) = \mathit{false}$ in $G_w$: at least the values beneath the minimum attained by $f$. Let $\bar{w} < \bar{u}$ be such that $\mathcal{F}(X) = \mathit{false}$ in $G_{\bar{w}}$; for any $w < \bar{w}$, set $H_1 = G_w$, $H_2 = G_{\bar{w}}$, $H_3 = G_{\bar{u}}$. Then, by Definition 12, $\mathcal{F}(X) = \mathit{false}$ in $G_w$. Thus, between two values for which $\mathcal{F}(X) = \mathit{true}$, there cannot be a value such that $\mathcal{F}(X) = \mathit{false}$. □
Definition 13.
The interval $[u_1, v_1)$ of Lemma 1, i.e., the widest interval for which $\mathcal{F}(X) = \mathit{true}$ in $(G, f)$, is called the $\mathcal{F}$-interval of $X$ in $(G, f)$.
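For a simple feature, the interval of Definition 13 can be read off by scanning the values attained by the filtering function. The following sketch assumes an undirected graph stored as a dictionary from vertex pairs to filtering values and uses the degree features of the next examples as test cases; the function names are hypothetical.

```python
import math


def sublevel(weights, w):
    """Edges of the sublevel graph at level w."""
    return {e for e, fe in weights.items() if fe <= w}


def degree(edges, x):
    """Degree of x in an undirected graph stored as a set of pairs."""
    return sum(1 for e in edges if x in e)


def feature_interval(weights, feature, x):
    """Return (u1, v1) for a simple feature, or None if it never holds."""
    crit = sorted(set(weights.values()))
    u1 = next((c for c in crit if feature(sublevel(weights, c), x)), None)
    if u1 is None:
        return None
    v1 = next((c for c in crit
               if c > u1 and not feature(sublevel(weights, c), x)), math.inf)
    return (u1, v1)


# Hypothetical toy graph: a star centered at "a".
toy = {("a", "b"): 1.0, ("a", "c"): 2.0, ("a", "d"): 3.0}
exactly_two = lambda edges, x: degree(edges, x) == 2
at_least_two = lambda edges, x: degree(edges, x) >= 2
```

The center of the star has degree exactly two only between levels 2 and 3, while the condition "degree at least two" holds on a half-line starting at level 2.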
Proposition 5.
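$\sigma^{\mathcal{F}} = \varrho^{\mathcal{F}}$.
Proof.
$\sigma^{\mathcal{F}}$ and $\varrho^{\mathcal{F}}$ would differ only if, for at least one set $X$, there were values $w_1 < w_2 < w_3$ for which $\mathcal{F}(X) = \mathit{true}$ in $G_{w_1}$ and in $G_{w_3}$ but $\mathcal{F}(X) = \mathit{false}$ in $G_{w_2}$; this is excluded by Lemma 1. □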
Example 1.
Of course, there exist features that are not simple. As an example, for a (di)graph $G$, let $\mathrm{EU}(X) = \mathit{true}$ if and only if $X$ is a set of vertices inducing a nonempty Eulerian sub(di)graph. Neither $\mathrm{EU}$ nor its maximal version $m\mathrm{EU}$ (simply $\mathrm{EU}$ in [8]) is simple, as can be seen in [8] (Fig. 2), depicting a graph with $\sigma^{m\mathrm{EU}} \neq \varrho^{m\mathrm{EU}}$.
It is easy to prove that the following features are simple.
Example 2.
For a graph $G$, the (rather trivial) feature $\mathrm{Deg}_d$ for which $\mathrm{Deg}_d(X) = \mathit{true}$ if and only if $X$ is a singleton containing a vertex of a fixed degree $d$ is simple. Also, the feature $\mathrm{Deg}_{\ge d}$ for which $\mathrm{Deg}_{\ge d}(X) = \mathit{true}$ if and only if $X$ is a singleton containing a vertex whose degree is $\ge d$ is simple, with half-lines as $\mathrm{Deg}_{\ge d}$-intervals. Analogous features are defined for a digraph in terms of indegree and outdegree.
Example 3.
For a (di)graph $G$, $\mathrm{Conn}(X) = \mathit{true}$ if and only if $X$ is a set of vertices inducing a connected subgraph (resp., a strongly connected subdigraph) of $G$. $\mathrm{Conn}$ is simple, and its $\mathrm{Conn}$-intervals are half-lines.
Example 4.
For a (di)graph $G$, $\mathrm{Comp}(X) = \mathit{true}$ if and only if $X$ is a set of vertices inducing a connected component (resp., a strong component) of $G$. $\mathrm{Comp}$ is simple; for the $\mathrm{Comp}$-interval $[u_1, v_1)$ of a set $X$, the value $v_1$ is either $+\infty$ or the value at which some other vertex joins the component (resp., strong component) induced by $X$.
Proposition 6.
For a (di)graph $G$, any monotone feature $\mathcal{F}$ is simple.
Proof.
Let $G = (V, E)$ be a (di)graph and $\mathcal{F}$ be a monotone feature. For $H_1 = (V_1, E_1) \subseteq H_2 = (V_2, E_2) \subseteq H_3 \subseteq G$ and $X \subseteq V \cup E$, assume that $\mathcal{F}(X) = \mathit{true}$ in $H_3$ and $\mathcal{F}(X) = \mathit{false}$ in $H_2$. By condition 1 of Definition 11, $X \not\subseteq V_2 \cup E_2$ (recall the convention of Definition 6), so $X \not\subseteq V_1 \cup E_1$ and $\mathcal{F}(X) = \mathit{false}$ in $H_1$. □
Lemma 2.
Let $f, g\colon E \to \mathbb{R}$ be filtering functions on a (di)graph $G$, and assume that there exists a positive real number $h$ such that $\sup_{e \in E} \lvert f(e) - g(e) \rvert \le h$. Then, for real numbers $\bar{u}, \bar{v}$ such that $\bar{u} < \bar{v} - h$, the sublevel (di)graph $G_{f,\bar{u}}$ is a sub(di)graph of $G_{g,\bar{v}}$, and $G_{g,\bar{u}}$ is a sub(di)graph of $G_{f,\bar{v}}$.
Proof.
For any $e \in E$, if $f(e) \le \bar{u}$, then $g(e) \le \bar{u} + h < \bar{v}$; analogously, if $g(e) \le \bar{u}$, then $f(e) \le \bar{u} + h < \bar{v}$. □
The following Proposition generalizes Proposition 4 (i.e., [8] (Prop. 4)) with essentially the same proof.
Proposition 7.
The ip-function generators $\sigma^{\mathcal{F}} = \varrho^{\mathcal{F}}$ are balanced, and hence stable.
Proof.
With the same hypotheses as in Lemma 2, let $X \subseteq V \cup E$ be such that $\mathcal{F}(X) = \mathit{true}$ in $G_{f,w}$ at all levels $w \in [u - h, v + h]$, with $u < v$. For any $\bar{w} \in [u, v]$, we now show that $\mathcal{F}(X) = \mathit{true}$ in $G_{g,\bar{w}}$. In fact, by Lemma 2, $G_{f,u-h}$ is a sub(di)graph of $G_{g,\bar{w}}$, which is a sub(di)graph of $G_{f,v+h}$. Since $\mathcal{F}$ is simple and $\mathcal{F}(X) = \mathit{true}$ both in $G_{f,u-h}$ and in $G_{f,v+h}$, we necessarily have that $\mathcal{F}(X) = \mathit{true}$ in $G_{g,\bar{w}}$. Therefore, there is an injective map (actually, an inclusion map) from $S^{\mathcal{F}}_{(G,f)}(u - h, v + h)$ to $S^{\mathcal{F}}_{(G,g)}(u, v)$, and so $\sigma^{\mathcal{F}}_{(G,f)}(u - h, v + h) \le \sigma^{\mathcal{F}}_{(G,g)}(u, v)$. Stability comes from Proposition 1. □
Remark 2.
The maximal version of a simple feature (Definition 8) is generally not simple: an example is the feature $\mathcal{I}$ identifying independent sets of vertices, which is monotone (hence, simple), while the ip-functions corresponding to its maximal version $m\mathcal{I}$ are not balanced ([8] (Sect. 2.4, Appendix, Figure 14, Figure 15)), so $m\mathcal{I}$ is not simple. Still, some are.
Proposition 8.
For a (di)graph $G$, the feature $m\mathrm{M}$ [8] (Sect. 2.4), i.e., such that $m\mathrm{M}(X) = \mathit{true}$ if and only if $X$ is a maximal matching, is simple.
Proof.
A matching of a graph is also a matching of any supergraph. If, for $H_1 = (V_1, E_1) \subseteq H_2 = (V_2, E_2) \subseteq H_3 \subseteq G$, we have $m\mathrm{M}(X) = \mathit{true}$ in $H_3$ and $m\mathrm{M}(X) = \mathit{false}$ in $H_2$, this means that $X$ is a maximal matching in $H_3$ but not in $H_2$. There are only two ways of not being a maximal matching in $H_2$: either $X \not\subseteq V_2 \cup E_2$ or $H_2$ contains a matching $X' \supsetneq X$. The latter is impossible because in that case, $X$ would not be maximal in $H_3$ either. Thus, $X \not\subseteq V_2 \cup E_2$; since $V_1 \cup E_1 \subseteq V_2 \cup E_2$, also $X \not\subseteq V_1 \cup E_1$, and $m\mathrm{M}(X) = \mathit{false}$ in $H_1$. □
Two variations on the notion of matching for digraphs are as follows:
Example 5.
For a digraph $G = (V, E)$, $\mathrm{PK}(X) = \mathit{true}$ if and only if $X$ is a set of arcs, any two of which have neither the same head nor the same tail. Such a set $X$ is said to be path-like.
Example 6.
For a digraph $G = (V, E)$, $\mathrm{PS}(X) = \mathit{true}$ if and only if $X$ is a set of arcs, such that for any two of them the head of one is not the tail of the other. Such a set $X$ is said to be path-less.
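Both predicates reduce to a pairwise check on arcs. The sketch below assumes that each arc is a pair (tail, head), with the head being the vertex the arc points to; the function names are hypothetical.

```python
from itertools import combinations


def is_path_like(arcs):
    """No two arcs share a tail, and no two arcs share a head."""
    return all(a[0] != b[0] and a[1] != b[1]
               for a, b in combinations(arcs, 2))


def is_path_less(arcs):
    """For any two arcs, the head of one is not the tail of the other."""
    return all(a[1] != b[0] and b[1] != a[0]
               for a, b in combinations(arcs, 2))


# Hypothetical toy arc sets.
chain = [("a", "b"), ("b", "c")]   # a directed path
fork = [("a", "b"), ("a", "c")]    # two arcs leaving the same vertex
```

A directed path is path-like but not path-less, while two arcs leaving the same vertex are path-less but not path-like.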
Proposition 9.
The features $\mathrm{PK}$ and $\mathrm{PS}$ are monotone, and hence simple.
Proof.
Straightforward. □
Proposition 10.
The features $m\mathrm{PK}$ and $m\mathrm{PS}$ are simple.
Proof.
The same as for Proposition 8. □
Figure 1 shows the functions $\sigma^{m\mathrm{PK}} = \varrho^{m\mathrm{PK}}$ (middle) and $\sigma^{m\mathrm{PS}} = \varrho^{m\mathrm{PS}}$ (right) for the same toy example (left) used in [8].

5. Single-Vertex Features

Some features are of particular interest in applications: those that may be true on singletons consisting of vertices. They can be categorized as point-wise if they depend on the single vertex, local if they depend on the k-neighborhood of the vertex for some k, or global if they depend on the whole (di)graph.
Example 7.
See Example 2 for examples of point-wise features.
Example 8.
Another point-wise feature for a digraph $G$ is $\mathrm{Sou}$, for which $\mathrm{Sou}(X) = \mathit{true}$ if and only if $X$ consists of a single vertex (that we call a source) whose outdegree is larger than its indegree. An analogous feature identifies a sink, where the indegree exceeds the outdegree. Variations on these features can be built by specifying that the outdegree (respectively, indegree) exceeds a fixed fraction of the total degree of the vertex. The usual example with its ip-functions $\sigma^{\mathrm{Sou}}$ (middle) and $\varrho^{\mathrm{Sou}}$ is shown in Figure 2. The generators $\sigma^{\mathrm{Sou}}$ and $\varrho^{\mathrm{Sou}}$ are not balanced: see the Appendix.
Example 9.
A local feature is one identifying a hub (see [8] (Sect. 2.5, 4)) in a (di)graph, i.e., a vertex whose degree (resp., outdegree) exceeds that of each of its neighbors. The corresponding ip-functions σ and ϱ are unbalanced [8] (Appendix).
Example 10.
In [9] (Sect. 4.1.4), a local feature is defined for the graph $G = (V, E)$, representing a gray-tone image, where each vertex is a pixel and is adjacent to each of its eight neighboring pixels. The filtering function $f\colon E \to \mathbb{R}$ maps each edge to the minimum of the intensities of the incident pixels. The feature $\mathcal{G}^{k,t}_{m,n}$ (short: $\mathcal{G}$) maps each vertex $v$ to $\mathit{true}$ if and only if more than $m$ and fewer than $n$ pixels at distance $k$ have an intensity $< t$. The function $\sigma^{\mathcal{G}}$ is proved to be balanced [39] (Thm. 4.3.1).
Example 11.
A global feature in a digraph $G$ is $\mathrm{Root}$, for which $\mathrm{Root}(X) = \mathit{true}$ if and only if $X$ consists of a vertex (that we call a root) from which there is a directed path to each vertex of the connected component containing it. Figure 3 shows the same weighted digraph of the previous examples, with its ip-functions $\sigma^{\mathrm{Root}}$ (middle) and $\varrho^{\mathrm{Root}}$. The generators $\sigma^{\mathrm{Root}}$ and $\varrho^{\mathrm{Root}}$ are not balanced: see the Appendix.
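A root can be detected with two breadth-first searches: one following arc directions and one ignoring them. The sketch below assumes that "connected component" means weakly connected component, which is an interpretation rather than something stated in the text; all names are hypothetical.

```python
from collections import deque


def reachable(adjacency, start):
    """Vertices reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adjacency.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def is_root(arcs, x):
    """True if every vertex of x's weak component is reachable from x."""
    directed, undirected = {}, {}
    for tail, head in arcs:
        directed.setdefault(tail, set()).add(head)
        undirected.setdefault(tail, set()).add(head)
        undirected.setdefault(head, set()).add(tail)
    if x not in undirected:
        return False  # convention: false when x is not in the sub(di)graph
    return reachable(directed, x) == reachable(undirected, x)


# Hypothetical toy digraphs.
out_star = {("a", "b"), ("a", "c")}
mixed = {("a", "b"), ("b", "c"), ("d", "c")}
```

Because the test compares against the whole component, adding a single arc far from the vertex can switch the feature on or off, which is why it is classified as global.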

6. Computational Experiments

Large, directed, weighted graphs are widely used for representing context-based ratings, from recommendation engines to trust networks.
We implemented the single-vertex features described in Section 5 as part of the Generalized Persistence Analysis Python package available at https://github.com/LimenResearch/gpa, accessed on 13 August 2023.

6.1. Algorithm

We consider a directed, weighted graph $G = (V, E, \Omega)$ and apply a transformation function on the weights defined on its edges. Transformed weights induce a sublevel set filtration. See Figure 4. For each sublevel set, we compute a single-vertex feature (sink in Figure 4). Finally, we compute the persistence in terms of the considered single-vertex feature, realized by each vertex throughout the filtration. Thus, the persistence associated with each vertex is the largest interval where the feature holds, as detailed in [8].
We draw the readers’ attention to Figure 4, which serves as a pseudocode for the implementation (direct or leveraging the provided package) of persistence-based features. A Python implementation is available at https://github.com/LimenResearch/gpa/tree/master/gpa/examples/bitcoin/bitcoin.py, accessed on 13 August 2023. The users instantiate directed weighted graphs, specify a transformation function to be applied to the weights to obtain the filtration, and finally choose a feature (in the following applications, we shall leverage sources and sinks). Features are implemented by adding two methods to the WeightedGraph class. The first method computes the feature of interest, while the second applies the computation to each step of the filtration. Thereafter, the persistence diagram is computed according to the steady and ranging paradigms described in [8].
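The steps described above can be sketched independently of the package. The following minimal example is not the gpa implementation and does not use its WeightedGraph class: it assumes a digraph stored as a dictionary from arcs to weights, an unweighted sink feature, and "largest interval" read as the longest maximal run of filtration levels on which the feature holds. All names are hypothetical.

```python
import math


def sublevel(weights, level):
    """Arcs whose (transformed) weight is at most `level`."""
    return {e for e, w in weights.items() if w <= level}


def is_sink(arcs, x):
    """Example feature: indegree exceeds outdegree."""
    indeg = sum(1 for _, head in arcs if head == x)
    outdeg = sum(1 for tail, _ in arcs if tail == x)
    return indeg > outdeg


def vertex_persistence(weights, feature, transform=lambda w: w):
    """Map each vertex to the longest interval (birth, death) of the feature."""
    f = {e: transform(w) for e, w in weights.items()}
    levels = sorted(set(f.values()))
    result = {}
    for x in sorted({v for e in f for v in e}):
        status = [feature(sublevel(f, c), x) for c in levels]
        best, i = None, 0
        while i < len(levels):
            if not status[i]:
                i += 1
                continue
            j = i
            while j < len(levels) and status[j]:
                j += 1
            # The run ends at the first level where the feature fails, if any.
            death = levels[j] if j < len(levels) else math.inf
            if best is None or death - levels[i] > best[1] - best[0]:
                best = (levels[i], death)
            i = j
        if best is not None:
            result[x] = best
    return result


# Hypothetical toy digraph.
toy = {("a", "b"): 1.0, ("b", "c"): 2.0, ("d", "b"): 3.0}
diagram = vertex_persistence(toy, is_sink)
```

Vertices that never satisfy the feature are simply absent from the result, and an infinite death value corresponds to a point with infinite persistence.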

6.2. Datasets

The OTC and Bitcoin Alpha datasets consist of user-to-user trust networks of Bitcoin traders on the OTC and Alpha platforms, respectively [40]. Users can express a vote of trust on other users by assigning them an integer rating ranging from $-10$ (fraudulent user) to $10$ (completely trustworthy), excluding the value 0. We utilized the data and ground truth provided at https://github.com/horizonly/Rev2-model/tree/master, accessed on 30 January 2020. There, fraudulent and trustworthy users are identified by considering the platform founders’ votes.

6.3. Results

We computed the steady persistence of the sink single-vertex feature considering the sublevel set filtration induced on the Bitcoin Alpha network $A = (V_A, E_A, \Omega_A)$ by the function
$$f(\omega) = \max(\Omega_A) - \big(\omega + \lvert \min(\Omega_A) \rvert\big),$$
where $\Omega_A$ is the set of all weights defined on the network’s edges.
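A small numerical check may help to read the transformation. The sketch below assumes that the two terms of the formula are combined by subtraction, and uses hypothetical ratings; when the extreme ratings are 10 and −10, the transformed weight is simply the opposite of the rating, so arcs carrying the highest trust enter the filtration first.

```python
def transform_weights(weights):
    """Apply f(w) = max(W) - (w + |min(W)|) to every arc weight."""
    omega_max = max(weights.values())
    omega_min = min(weights.values())
    return {e: omega_max - (w + abs(omega_min)) for e, w in weights.items()}


# Hypothetical ratings between three users.
ratings = {("u1", "u2"): 10, ("u2", "u3"): -10, ("u3", "u1"): 3}
filtration_values = transform_weights(ratings)
```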
For each sublevel of the filtration and each of its nodes, we say that a vertex $v$ is a sink if the sum of the weights of its in_edges is larger than the sum of the weights defined on its out_edges. In symbols:
$$\mathrm{Si}(v) = \sum_{e \in \mathrm{in\_edges}(v)} \omega_e > \sum_{e \in \mathrm{out\_edges}(v)} \omega_e.$$
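The weighted sink condition can be sketched as a single predicate. The example assumes that the sums run over the ratings of the arcs present in the current sublevel digraph, passed as a dictionary from arcs to weights; the function name and the data are hypothetical.

```python
def is_weighted_sink(weights, v):
    """True if the total incoming weight of v exceeds its total outgoing weight."""
    incoming = sum(w for (_, head), w in weights.items() if head == v)
    outgoing = sum(w for (tail, _), w in weights.items() if tail == v)
    return incoming > outgoing


# Hypothetical sublevel digraph with signed ratings.
level = {("a", "b"): 5, ("b", "c"): 2, ("c", "b"): -4}
```

Note that, with signed ratings, a vertex receiving negative votes can fail the condition even when it has more incoming than outgoing arcs, as vertex b does here.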
This computation gave rise to the persistence diagram in Figure 5 (left panel). The persistence associated with the networks’ nodes allowed us to draw the histogram of the values associated with trustworthy and fraudulent users, respectively; see Figure 5 (right panel). The histogram revealed how nodes representing fraudulent users attained high persistence levels (with a mean > 12 and most of them being represented by a cornerline in the persistence diagram, i.e., points with infinite persistence, as showcased in Figure 5), while trustworthy users’ persistence was typically finite, with a mean 6 and median 4. Thus, persistence computed via the sink single-vertex feature could serve as a concise score to separate the populations of trustworthy and fraudulent users. Importantly, such a score could be easily integrated into analysis pipelines and complement the information carried by other methods.

7. Discussion

The features defined in Section 4 and Section 5 are fairly natural from the graph-theoretical viewpoint but would lead to rather complicated constructions if we wanted to apply the classical pipeline: graph–simplicial complex–homology–persistence. On the contrary, the ip-function generators “steady” and “ranging” immediately produce persistence diagrams, through which distances, analysis, discrimination between signal and noise, stability, and universality can be directly determined and studied. This may widen and simplify the use of persistence in situations where data appear as (undirected) graphs, even more so when they occur as digraphs. This might have been the case for the problems faced, e.g., in [41] for startling deductions from microglial morphology, in [42,43] for protein-binding prediction, in [44] for RNA analysis, in [45] for investigating market crashes, and in [46] for genealogical studies. Research such as that presented in [47] on antibiotic discovery did not make use of persistence; we suspect that our new persistence techniques might boost its representation and analysis power. Some of the above-referenced papers modeled their data as hypergraphs, so it could be interesting and useful to extend our methods to this type of structure.

8. Conclusions

The availability of large amounts of data represented as graphs has expedited the development of computational methods specializing in handling data endowed with the complex structures arising from pairwise interactions.
We provided a computational recipe to generate compact representations of large networks according to graph-theoretical features or custom properties defined on the networks’ vertices. To achieve this, we leveraged a generalization of topological persistence adapted to work on weighted (di)graphs. In this framework, we widened the notion of monotone features on (di)graphs—i.e., features respecting inclusion—thus well-suited to generating filtrations and persistence diagrams. Simple features are a special case of features whose value can change from true to false only once in the filtration of a (di)graph. We proved that in a (di)graph, any monotone feature is simple, and that simple features give rise to stable indexing-aware persistence functions. We then provided examples of simple features. Turning our attention to applications, single-vertex features allow the user to compute the persistence of a weighted (di)graph for any feature, focused on singletons. Here, we provided several examples, characterizing single-vertex features as point-wise, local, or global descriptors of a network.
Computational experiments, which support our belief that single-vertex features can be easily integrated into applications, showed how the sink feature could characterize the users of trust networks. When considering the persistence of sink vertices, fraudulent users were, in the majority of cases, represented by half-lines with infinite persistence, while trusted users were associated with finite persistence. To make our results reproducible, we provided a Python implementation of the algorithm for computing single-vertex feature persistence, integrated into the package originally supporting [7,8].
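As an illustration of how little machinery a single-vertex feature needs, the following minimal sketch computes sink persistence directly from an edge list. It is not the implementation shipped with the package: the function name `sink_persistence`, the sublevel filtration on raw (untransformed) edge weights, and the reading of a sink as a vertex with at least one incoming and no outgoing edge are all assumptions made for this example.

```python
import math
from collections import defaultdict


def sink_persistence(edges):
    """Return {vertex: (birth, death)} for a hypothetical sink feature.

    edges: iterable of (source, target, weight) triples of a weighted digraph.
    Assumptions (not taken from the package): the subgraph at level t keeps
    the edges with weight <= t, and a vertex is a sink when it has at least
    one incoming edge and no outgoing edge in that subgraph.
    """
    first_in = defaultdict(lambda: math.inf)   # level of the first incoming edge
    first_out = defaultdict(lambda: math.inf)  # level of the first outgoing edge
    for source, target, weight in edges:
        first_out[source] = min(first_out[source], weight)
        first_in[target] = min(first_in[target], weight)

    intervals = {}
    for vertex, birth in first_in.items():
        # The vertex stops being a sink as soon as an outgoing edge appears;
        # with no outgoing edge at all, the interval is a half-line.
        death = first_out.get(vertex, math.inf)
        if birth < death:
            intervals[vertex] = (birth, death)
    return intervals


# Toy digraph: "c" never acquires an outgoing edge, so it persists forever.
edges = [("a", "b", 1.0), ("a", "c", 2.0), ("b", "c", 3.0)]
print(sink_persistence(edges))  # {'b': (1.0, 3.0), 'c': (2.0, inf)}
```

Under these assumptions the feature switches from true to false at most once per vertex, so each vertex contributes at most one interval, and vertices with no outgoing edge yield the half-lines of infinite persistence mentioned above.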
We believe that a straightforward implementation for computing local and global descriptors of networks from easily engineerable features could complement current approaches based on artificial neural architectures and learned features, and help gain control over them.

Author Contributions

Conceptualization, methodology, formal analysis, investigation, writing—original draft preparation, M.G.B. and M.F.; software, resources, data curation, validation, M.G.B.; writing—review and editing, supervision, M.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The Generalized Persistence Analysis Python package is available at https://github.com/LimenResearch/gpa, accessed on 13 August 2023. The feature implementation is available at https://github.com/LimenResearch/gpa/tree/master/gpa/examples/bitcoin/bitcoin.py, accessed on 13 August 2023. Data and ground truth are provided at https://github.com/horizonly/Rev2-model/tree/master, accessed on 30 January 2020.

Acknowledgments

Work performed under the auspices of INdAM-GNSAGA.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Unbalanced

The ip-function generators $\sigma^{Sou}$ and $\varrho^{Sou}$ are not balanced (Definition 5), as the weighted digraphs $(G, f)$ and $(G, g)$ of Figure A1 show. In fact, $\sup_{e \in E} |f(e) - g(e)| = 2$ and $\sigma^{Sou}_{(G,f)}(3.5 - 2, 4 + 2) = 1 > 0 = \sigma^{Sou}_{(G,g)}(3.5, 4)$. In this particular case, $\varrho^{Sou}$ coincides with $\sigma^{Sou}$.
Figure A1. Left: $(G, f)$ and the corresponding $\sigma^{Sou}_{(G,f)}$. Right: $(G, g)$ and the corresponding $\sigma^{Sou}_{(G,g)}$.
The ip-function generators $\sigma^{Root}$ and $\varrho^{Root}$ are not balanced (Definition 5), as the weighted digraphs $(G, f)$ and $(G, g)$ of Figure A2 show. In fact, $\sup_{e \in E} |f(e) - g(e)| = 2$ and $\sigma^{Root}_{(G,f)}(3.5 - 2, 4 + 2) = 1 > 0 = \sigma^{Root}_{(G,g)}(3.5, 4)$. In this particular case, $\varrho^{Root}$ coincides with $\sigma^{Root}$.
Figure A2. Left: $(G, f)$ and the corresponding $\sigma^{Root}_{(G,f)}$. Right: $(G, g)$ and the corresponding $\sigma^{Root}_{(G,g)}$.
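Both counterexamples fail the same inequality. As a sketch, and assuming that Definition 5 is the usual balance condition for ip-function generators (the statement below is a paraphrase under that assumption, not a quotation), the requirement and its violation for $\varepsilon = 2$ and $(u, v) = (3.5, 4)$ read:

```latex
% Balance condition (assumed form of Definition 5):
\sup_{e \in E} |f(e) - g(e)| \le \varepsilon
\;\Longrightarrow\;
\sigma_{(G,f)}(u - \varepsilon,\, v + \varepsilon) \le \sigma_{(G,g)}(u, v)
\quad \text{for all } u < v .

% Instance with \varepsilon = 2 and (u, v) = (3.5, 4), for both Sou and Root:
\sigma_{(G,f)}(1.5,\, 6) = 1 \not\le 0 = \sigma_{(G,g)}(3.5,\, 4) .
```

The hypothesis holds because the two weight functions differ by exactly 2, yet the conclusion fails, so neither generator can be balanced.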

References

1. Otter, N.; Porter, M.A.; Tillmann, U.; Grindrod, P.; Harrington, H.A. A roadmap for the computation of persistent homology. EPJ Data Sci. 2017, 6, 1–38.
2. Cohen-Steiner, D.; Edelsbrunner, H.; Harer, J. Stability of persistence diagrams. In Proceedings of the Symposium on Computational Geometry, Pisa, Italy, 6–8 June 2005; Mitchell, J.S.B., Rote, G., Eds.; ACM: New York, NY, USA, 2005; pp. 263–271.
3. Lesnick, M. The Theory of the Interleaving Distance on Multidimensional Persistence Modules. Found. Comput. Math. 2015, 15, 613–650.
4. Di Fabio, B.; Landi, C. A Mayer–Vietoris formula for persistent homology with an application to shape recognition in the presence of occlusions. Found. Comput. Math. 2011, 11, 499–527.
5. Malott, N.O.; Chen, S.; Wilsey, P.A. A survey on the high-performance computation of persistent homology. IEEE Trans. Knowl. Data Eng. 2022, 35, 4466–4484.
6. Bergomi, M.G.; Vertechi, P. Rank-based persistence. Theory Appl. Categ. 2020, 35, 228–260.
7. Bergomi, M.G.; Ferri, M.; Vertechi, P.; Zuffi, L. Beyond Topological Persistence: Starting from Networks. Mathematics 2021, 9, 3079.
8. Bergomi, M.G.; Ferri, M.; Tavaglione, A. Steady and ranging sets in graph persistence. J. Appl. Comput. Topol. 2022, 7, 33–56.
9. Bergomi, M.G.; Ferri, M.; Mella, A.; Vertechi, P. Generalized Persistence for Equivariant Operators in Machine Learning. Mach. Learn. Knowl. Extr. 2023, 5, 346–358.
10. Monti, F.; Otness, K.; Bronstein, M.M. Motifnet: A motif-based graph convolutional network for directed graphs. In Proceedings of the 2018 IEEE Data Science Workshop (DSW), Lausanne, Switzerland, 4–6 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 225–228.
11. Tong, Z.; Liang, Y.; Sun, C.; Rosenblum, D.S.; Lim, A. Directed graph convolutional network. arXiv 2020, arXiv:2004.13970.
12. Zhang, X.; He, Y.; Brugnone, N.; Perlmutter, M.; Hirn, M. Magnet: A neural network for directed graphs. Adv. Neural Inf. Process. Syst. 2021, 34, 27003–27015.
13. Estrada, E. The Structure of Complex Networks: Theory and Applications; Oxford University Press: New York, NY, USA, 2012.
14. Zhao, S.X.; Fred, Y.Y. Exploring the directed h-degree in directed weighted networks. J. Inf. 2012, 6, 619–630.
15. Yang, Y.; Xie, G.; Xie, J. Mining important nodes in directed weighted complex networks. Discret. Dyn. Nat. Soc. 2017, 2017, 9741824.
16. Carlsson, G. Topology and data. Bull. Amer. Math. Soc. 2009, 46, 255–308.
17. Carlsson, G.; Singh, G.; Zomorodian, A. Computing Multidimensional Persistence. In Proceedings of the ISAAC, Honolulu, HI, USA, 16–18 December 2009; Dong, Y., Du, D.Z., Ibarra, O.H., Eds.; Springer: Berlin/Heidelberg, Germany, 2009. Lecture Notes in Computer Science. Volume 5878, pp. 730–739.
18. Burghelea, D.; Dey, T.K. Topological persistence for circle-valued maps. Discret. Comput. Geom. 2013, 50, 69–98.
19. Bubenik, P.; Scott, J.A. Categorification of persistent homology. Discret. Comput. Geom. 2014, 51, 600–627.
20. de Silva, V.; Munch, E.; Stefanou, A. Theory of interleavings on categories with a flow. Theory Appl. Categ. 2018, 33, 583–607.
21. Kim, W.; Mémoli, F. Generalized persistence diagrams for persistence modules over posets. J. Appl. Comput. Topol. 2021, 5, 533–581.
22. McCleary, A.; Patel, A. Edit Distance and Persistence Diagrams Over Lattices. SIAM J. Appl. Algebra Geom. 2022, 6, 134–155.
23. Mémoli, F.; Wan, Z.; Wang, Y. Persistent Laplacians: Properties, algorithms and implications. SIAM J. Math. Data Sci. 2022, 4, 858–884.
24. Southern, J.; Wayland, J.; Bronstein, M.; Rieck, B. Curvature filtrations for graph generative model evaluation. arXiv 2023, arXiv:2301.12906.
25. Watanabe, S.; Yamana, H. Topological measurement of deep neural networks using persistent homology. Ann. Math. Artif. Intell. 2022, 90, 75–92.
26. Ju, H.; Zhou, D.; Blevins, A.S.; Lydon-Staley, D.M.; Kaplan, J.; Tuma, J.R.; Bassett, D.S. Historical growth of concept networks in Wikipedia. Collect. Intell. 2022, 1, 26339137221109839.
27. Arfi, B. The promises of persistent homology, machine learning, and deep neural networks in topological data analysis of democracy survival. Qual. Quant. 2023.
28. Sizemore, A.E.; Giusti, C.; Kahn, A.; Vettel, J.M.; Betzel, R.F.; Bassett, D.S. Cliques and cavities in the human connectome. J. Comput. Neurosci. 2018, 44, 115–145.
29. Guerra, M.; De Gregorio, A.; Fugacci, U.; Petri, G.; Vaccarino, F. Homological scaffold via minimal homology bases. Sci. Rep. 2021, 11, 5355.
30. Rieck, B.; Fugacci, U.; Lukasczyk, J.; Leitte, H. Clique community persistence: A topological visual analysis approach for complex networks. IEEE Trans. Vis. Comput. Graph. 2018, 24, 822–831.
31. Aktas, M.E.; Akbas, E.; Fatmaoui, A.E. Persistence homology of networks: Methods and applications. Appl. Netw. Sci. 2019, 4, 61.
32. Reimann, M.W.; Nolte, M.; Scolamiero, M.; Turner, K.; Perin, R.; Chindemi, G.; Dłotko, P.; Levi, R.; Hess, K.; Markram, H. Cliques of Neurons Bound into Cavities Provide a Missing Link between Structure and Function. Front. Comput. Neurosci. 2017, 11, 48.
33. Chowdhury, S.; Mémoli, F. Persistent path homology of directed networks. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–10 January 2018; SIAM: Philadelphia, PA, USA, 2018; pp. 1152–1169.
34. Dey, T.K.; Li, T.; Wang, Y. An efficient algorithm for 1-dimensional (persistent) path homology. Discret. Comput. Geom. 2022, 68, 1102–1132.
35. Loday, J.L. Cyclic Homology; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 301.
36. Caputi, L.; Riihimäki, H. Hochschild homology, and a persistent approach via connectivity digraphs. J. Appl. Comput. Topol. 2023, 1–50.
37. Edelsbrunner, H.; Harer, J. Persistent homology—A survey. In Surveys on Discrete and Computational Geometry; American Mathematical Society: Providence, RI, USA, 2008; Volume 453, pp. 257–282.
38. Edelsbrunner, H.; Harer, J. Computational Topology: An Introduction; American Mathematical Society: Providence, RI, USA, 2009.
39. Mella, A. Non-Topological Persistence for Data Analysis and Machine Learning. Ph.D. Thesis, Alma Mater Studiorum-Università di Bologna, Bologna, Italy, 2021.
40. Kumar, S.; Spezzano, F.; Subrahmanian, V.; Faloutsos, C. Edge weight prediction in weighted signed networks. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 221–230.
41. Colombo, G.; Cubero, R.J.A.; Kanari, L.; Venturino, A.; Schulz, R.; Scolamiero, M.; Agerberg, J.; Mathys, H.; Tsai, L.H.; Chachólski, W.; et al. A tool for mapping microglial morphology, morphOMICs, reveals brain-region and sex-dependent phenotypes. Nat. Neurosci. 2022, 25, 1379–1393.
42. Wee, J.; Xia, K. Persistent spectral based ensemble learning (PerSpect-EL) for protein–protein binding affinity prediction. Briefings Bioinform. 2022, 23, bbac024.
43. Qiu, Y.; Wei, G.W. Artificial intelligence-aided protein engineering: From topological data analysis to deep protein language models. arXiv 2023, arXiv:2307.14587.
44. Xia, K.; Liu, X.; Wee, J. Persistent Homology for RNA Data Analysis. In Homology Modeling: Methods and Protocols; Springer: Berlin/Heidelberg, Germany, 2023; pp. 211–229.
45. Yen, P.T.W.; Xia, K.; Cheong, S.A. Laplacian Spectra of Persistent Structures in Taiwan, Singapore, and US Stock Markets. Entropy 2023, 25, 846.
46. Boyd, Z.M.; Callor, N.; Gledhill, T.; Jenkins, A.; Snellman, R.; Webb, B.; Wonnacott, R. The persistent homology of genealogical networks. Appl. Netw. Sci. 2023, 8, 15.
47. Choo, H.Y.; Wee, J.; Shen, C.; Xia, K. Fingerprint-Enhanced Graph Attention Network (FinGAT) Model for Antibiotic Discovery. J. Chem. Inf. Model. 2023, 63, 2928–2935.
Figure 1. A weighted digraph $(G, f)$ (left), its coinciding ip-functions $\sigma^{mPK}_{(G,f)}$ and $\varrho^{mPK}_{(G,f)}$ (middle), and its coinciding ip-functions $\sigma^{mPS}_{(G,f)}$ and $\varrho^{mPS}_{(G,f)}$ (right). The filtration is shown above.
Figure 2. A weighted digraph $(G, f)$ (left) and its ip-functions $\sigma^{Sou}_{(G,f)}$ (middle) and $\varrho^{Sou}_{(G,f)}$ (right).
Figure 3. A weighted digraph $(G, f)$ (left) and its ip-functions $\sigma^{Root}_{(G,f)}$ (middle) and $\varrho^{Root}_{(G,f)}$ (right).
Figure 4. Algorithmic flow. A directed weighted graph is mapped to a filtration (induced by transformed weights). Then, persistence is computed according to a given single-vertex feature.
Figure 5. (left) The persistence diagrams associated with the Bitcoin Alpha (top) and OTC (bottom) networks. Persistence diagrams represent the sink single-vertex feature. Histograms of the persistence values associated with trustworthy and fraudulent users are depicted in (right). In each histogram, the black middle line is the median, while the white circle is the mean of the values.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
