Article

Edge-Centric Embeddings of Digraphs: Properties and Stability Under Sparsification

by Ahmed Begga *, Francisco Escolano Ruiz and Miguel Ángel Lozano
Department of Computer Science and Artificial Intelligence, University of Alicante, 03690 Alicante, Spain
* Author to whom correspondence should be addressed.
Entropy 2025, 27(3), 304; https://doi.org/10.3390/e27030304
Submission received: 25 November 2024 / Revised: 28 February 2025 / Accepted: 11 March 2025 / Published: 14 March 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract:
In this paper, we define and characterize the embedding of edges and higher-order entities in directed graphs (digraphs) and relate these embeddings to those of nodes. Our edge-centric approach consists of the following: (a) embedding line digraphs (or their iterated versions); (b) exploiting the rank properties of these embeddings to show that edge/path similarity can be posed as a linear combination of node similarities; (c) solving scalability issues through digraph sparsification; (d) evaluating the performance of these embeddings for classification and clustering. We begin by motivating the need for edge-centric approaches, then introduce all the elements of the approach, and finally validate it. Our edge-centric embedding entails a top-down mining of links, instead of inferring them from the similarities of node embeddings. This analysis is key to discovering the inter-subgraph links that keep the whole graph connected, i.e., central edges. Using directed graphs (digraphs) allows us to cluster edge-like hubs and authorities. In addition, since directed edges inherit their labels from destination (origin) nodes, their embedding also provides a proxy representation for node classification and clustering. This representation is obtained by embedding the line digraph of the original one. The line digraph has appealing formal properties with respect to the original graph; in particular, it produces more entropic latent spaces. With these properties at hand, we can relate edge embeddings to node embeddings. The main contribution of this paper is to state and prove the linearity theorem, which poses each element of the transition matrix for an edge embedding as a linear combination of the elements of the transition matrix for the node embedding. As a result, the rank-preservation property explains why embedding the line digraph and using the labels of the destination nodes provides better classification and clustering performance than embedding the nodes of the original graph. In other words, we not only facilitate edge mining but also reinforce node classification and clustering. However, computing the line digraph is challenging, so a sparsification strategy is implemented for the sake of scalability. Our experimental results show that the line digraph representation of the sparsified input graph is quite stable as we increase the sparsification level, and that it outperforms the original (node-centric) representation. For the sake of simplicity, our theorem relies on node2vec-like (factorization) embeddings. However, we also include several experiments showing how line digraphs may improve the performance of Graph Neural Networks (GNNs), also following the principle of maximum entropy.

1. Introduction

Graph Neural Networks (GNNs) [1,2,3,4] are of pivotal importance for the advancement of AI in structural domains such as molecular chemistry, social-network analysis [5], and even algorithmic reasoning [6,7]. GNNs are node-centric, since they exploit adjacency information to learn latent spaces where the nodal features are encoded for node classification and graph classification. Link/edge prediction, however, usually relies on the latent representations of the two nodes assumed to form an edge.
So far, capturing the latent spaces of edges has been an elusive problem. For instance, HOPE [8] exploits the powers of the adjacency matrix to assign asymmetric roles (source and target) to the nodes via a double embedding. This leads to edge prediction, but the resulting long-range similarity tends to diffuse local contexts, which results in poor node classification. The NERD method [9] overcomes the latter limitation by using alternating random walks, a standard method for estimating node centrality in directed graphs [10]. These walks again provide two embeddings: one for the source role of the node and another for its target role. The product of these embeddings provides good performance in link prediction and graph reconstruction, and the concatenation of the embeddings is also competitive with node-centric methods in node classification.
However, the issues arising from addressing edge-related tasks such as link prediction from a node-centric angle have only recently been recognized [11]. The degree distribution of most social networks follows a power law: a small number of nodes has a large number of neighbors, whereas most of the nodes have a small degree and lie in the tail of the degree distribution. This is the so-called long-tail effect: link prediction with GNNs is greatly hindered by the tail node pairs since they share few neighbors. See, for instance, Figure 1-Left, where we plot the probabilities of reaching each node from a set of random walks for three social networks. Therein, the tail effect is quite visible: half of the nodes are unreached. However, in the Center and Right columns of Figure 1 we present more entropic distributions, which mitigate the long-tail effect to some extent; but how do we construct these distributions?
Maximum Entropy Latent Spaces. Instead of modifying the structure to increase the number of neighbors among under-represented nodes, as done in [11], we propose a more principled approach in this paper. In particular, we turn our attention to the latent spaces of the edges themselves. Such spaces must produce similar codes for edges that play a similar structural role in the network. This requirement is even stronger when the networks encode causal relationships via directed edges. We show a couple of examples in Figure 2-Left column (a),(d), where there are two distinct roles: intra-community edges and inter-community edges. The endpoints of intra-community edges share the same color (blue and green, respectively), whereas inter-community edges connect nodes of different colors.
Then, in Figure 2-Center column (b),(e), we show a graph, say $LG = (V_L, E_L)$, whose nodes are the edges of the original graph $G = (V, E)$, i.e., $V_L = E$. There is an edge $(ij, kl) \in E_L$ iff $(i,j) \in E$, $(k,l) \in E$, and $j = k$. In other words, the edges of $LG$, the so-called line graph of $G$, encode transitive relationships in $G$. As a result, paths and loops are shortened in the line graph. Since the edges of $G$ are now the nodes of $LG$, we can apply a node-centric approach to $LG$ in order to uncover the properties of the latent spaces of the edges in $G$. Independently of whether we use a node2vec-like embedding [12,13,14,15,16] or the embedding resulting from a state-of-the-art (SOTA) or more recent GNN [17,18,19,20,21], the subsequent latent space relies on how the structure constrains the random walks exploring it.
For instance, the Center and Right columns of Figure 2 show the line graphs of the graphs in the previous columns. A brief observation reveals that the line graph produces denser communities while preserving inter-class edges. Thus, shortening paths and loops respects the community structure of the graph: communities become redundant, but the inter-community information flow is preserved. In other words, random walks launched inside a given community visit it faster, but their probability of visiting another community is preserved with respect to that of the original graph.
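To make the construction concrete, the following minimal sketch (ours, not the authors' code) builds the line digraph of a toy two-community digraph with networkx; the graph is illustrative and is not the one shown in Figure 2.

```python
import networkx as nx

# Toy digraph with two directed 3-cycles joined by one inter-community edge.
G = nx.DiGraph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"),      # community 1
                  ("d", "e"), ("e", "f"), ("f", "d"),      # community 2
                  ("c", "d")])                             # inter-community edge

# Line digraph: its nodes are the edges of G, and (u, v) -> (v, w) whenever both edges exist in G.
LG = nx.line_graph(G)

print(G.number_of_nodes(), G.number_of_edges())    # 6 nodes, 7 edges
print(LG.number_of_nodes(), LG.number_of_edges())  # 7 nodes (one per edge of G)
```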
Main Contribution. Considering the above observations, and noting that $LG$ has $O(|E|)$ nodes instead of $O(|V|)$, in this paper we give spectral arguments showing that line graphs lead naturally to quasi-maximum-entropy embeddings.
Thus, this paper goes beyond previous (instrumental) uses of line graphs. For instance, in the LGNN (line graph NN) model [22], line graph features complement node-based features for community detection. Similarly, in PRUNE [23], line graphs are implicitly computed via second-order proximity (considering the respective direct predecessors and successors of node pairs). Second-order proximity was introduced in LINE [14]; in [24], it is used to bridge the microscopic scale of nodes with the mesoscopic community structure, which PRUNE interprets via tri-factorization. Finally, GNNs have recently incorporated convolutions on both nodes and edges [25], and transfer learning [26] has been applied to node-centric approaches. In this regard, node-centric embeddings fail to capture transferable edges, since they infer edges in a node-centric way and are subject to the long-tail effect. We conjecture that modeling edges themselves would help to better transfer structural information.
Organization of the paper. The remainder of the paper is organized as follows. In Section 2, we introduce the main properties of line digraphs and uncover the formal relationship between nodal embeddings and the embedding of edges/paths. This relationship is linked to the rank property of the line digraph (the rank of a line digraph is the number of nodes of the original digraph), and this is what makes the difference in classification and clustering. In fact, a large rank is an algebraic interpretation of maximum entropy [27]. In Section 3, we present our experimental results (edge classification and clustering), both for node2vec-like methods and for GNNs, where we compute a proxy for node classification/clustering. We show that this proxy not only outperforms the original (node-centric) classification and clustering but also has stable performance at different levels of sparsification. Finally, in Section 4, we sketch our conclusions and future work.

2. Method

2.1. Embedding Edges and Paths

This section presents the main theoretical contribution of this paper: the linearity theorem and its corollary, namely the rank-preservation property. We commence by defining line digraphs and introducing some of their formal properties. We focus our attention on two complementary concepts: (a) The preservation of the non-zero spectrum of the original digraph; (b) The relative density and diameter.
Given an input graph G, its line digraph $LG$ tends to shorten the cycles of G, which results in a small degree and diameter relative to G. When we translate these facts to the respective transition matrices, we find that running random walks in $LG$ (instead of in G) results in the following: (a) Sampling a larger number of nodes; (b) Doing the sampling in a more efficient way (lower probability of returning to the origin); (c) Obtaining fewer oversampled nodes, which makes the representation less biased towards notable nodes, i.e., more entropic, thus reducing the long-tail effect.
With the above properties at hand, we prove (linearity theorem) that the similarity between two edges (an entry of the $r$-th power of the transition matrix of $LG$) is a linear combination of the similarities between several pairs of nodes (entries of the transition matrix of G). Since the respective embeddings of the nodes of G and those of $LG$ (the edges of G) rely on weighted sums of these powers, the matrix-to-factorize for $LG$ tends to preserve its rank much better than that for G (rank-preservation corollary). This corollary explains why embedding the line digraph and using the labels of the destination nodes provides better classification and clustering performance than embedding the nodes of the original graph (we test this hypothesis in the experimental section).
Therefore, we contribute theoretical results with important practical implications beyond explaining the link between the embedding of edges and that of nodes. In general, due to the iterative property of line digraphs, our rationale extends to line digraphs of line digraphs (path embedding), provided that there are enough computational resources to compute and store the iterated digraph. In this regard, we rely on digraph sparsification techniques to make the method scalable.

2.2. Line and Iterated Line Digraphs

Similarity of $G$ and $LG$. Let $G = (V, E)$ be a digraph with $n$ nodes $V$ and $m$ edges $E$. Its line digraph $LG = (V_L, E_L)$ is a digraph whose nodes $V_L = E$ are the edges of $G$, and there is an edge between two nodes $ij, kl \in V_L$ if $j = k$, i.e., if the edges $(i,j), (k,l) \in E$ share the node $j = k$. Then, $G$ and $LG$ are encoded by their respective adjacency matrices, which are related as follows:
$$A_G = T\,H^{\top}, \qquad A_{LG} = H^{\top}T,$$
where $H$ is the $n \times m$ incidence matrix of heads, with $H_{ij} = 1$ if node $i$ is the head of edge $j$ and $H_{ij} = 0$ otherwise, and $T$ is the $n \times m$ incidence matrix of tails, with $T_{ij} = 1$ if node $i$ is the tail of edge $j$ and $T_{ij} = 0$ otherwise.
Example. Consider the Top-Left digraph (directed graph) in Figure 2. We have $V = \{a, b, c, d, e, f\}$ and $E = \{(a,b), (b,a), (b,c), (c,a), (c,d), (d,e), (e,f), (f,d), (f,e)\}$. The edges are ordered lexicographically: $e_1 = (a,b)$, $e_2 = (b,a)$, $\ldots$, $e_9 = (f,e)$. Then, we have the following matrix product $A_{LG} = H^{\top}T$:
$$
\begin{pmatrix}
0&1&0&0&0&0\\
1&0&0&0&0&0\\
0&0&1&0&0&0\\
1&0&0&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&1&0\\
0&0&0&0&0&1\\
0&0&0&1&0&0\\
0&0&0&0&1&0
\end{pmatrix}
\cdot
\begin{pmatrix}
1&0&0&0&0&0&0&0&0\\
0&1&1&0&0&0&0&0&0\\
0&0&0&1&1&0&0&0&0\\
0&0&0&0&0&1&0&0&0\\
0&0&0&0&0&0&1&0&0\\
0&0&0&0&0&0&0&1&1
\end{pmatrix}
=
\begin{pmatrix}
0&1&1&0&0&0&0&0&0\\
1&0&0&0&0&0&0&0&0\\
0&0&0&1&1&0&0&0&0\\
1&0&0&0&0&0&0&0&0\\
0&0&0&0&0&1&0&0&0\\
0&0&0&0&0&0&1&0&0\\
0&0&0&0&0&0&0&1&1\\
0&0&0&0&0&1&0&0&0\\
0&0&0&0&0&0&1&0&0
\end{pmatrix}.
$$
Note that the rows of $H^{\top}$ correspond to the edges of $G$, and each row indicates the target node of its edge. The first edges of $LG$ are given by $H^{\top}_{1:}\cdot T_{:2} = 1$ and $H^{\top}_{1:}\cdot T_{:3} = 1$: $(ab, ba)$ and $(ab, bc)$, respectively. Then, we have $H^{\top}_{2:}\cdot T_{:1} = 1$, leading to $(ba, ab)$, and $H^{\top}_{3:}\cdot T_{:4} = 1$ and $H^{\top}_{3:}\cdot T_{:5} = 1$, leading to $(bc, ca)$ and $(bc, cd)$, respectively. In total, we obtain 12 edges in $LG$; these become the nodes of the iterated line digraph in Figure 2c.
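For readers who want to reproduce the example, the sketch below (our own illustration, not the authors' code) builds the incidence matrices $H$ and $T$ with numpy, checks the product $H^{\top}T$, and verifies the 12 edges of $LG$ and the rank property discussed next.

```python
import numpy as np

# Worked example of Section 2.2: V = {a,...,f}, edges e1..e9 in lexicographic order.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a","b"), ("b","a"), ("b","c"), ("c","a"), ("c","d"),
         ("d","e"), ("e","f"), ("f","d"), ("f","e")]
idx = {v: i for i, v in enumerate(nodes)}

n, m = len(nodes), len(edges)
T = np.zeros((n, m), dtype=int)   # T[i, j] = 1 if node i is the tail of edge j
H = np.zeros((n, m), dtype=int)   # H[i, j] = 1 if node i is the head of edge j
for j, (u, v) in enumerate(edges):
    T[idx[u], j] = 1
    H[idx[v], j] = 1

A_G  = T @ H.T        # adjacency of G
A_LG = H.T @ T        # adjacency of the line digraph LG

print(A_LG.sum())                        # 12 edges in LG
print(np.linalg.matrix_rank(A_LG))       # rank = |V| = 6
print(len({tuple(r) for r in A_LG}))     # only 6 distinct rows (one per destination node)
```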
An iterated line digraph $L^{p}G$, with $p > 1$, is obtained by applying the line-digraph operator $p$ times to $G$. For instance, in Figure 2, digraph (c) is $L^{2}G$, that is, the line digraph of $LG$: see digraph (c) with 12 nodes. Ultimately, computing iterated line digraphs allows us to understand the latent spaces of directed paths encoding causal chains.
Coding Intuition: The Rank Property. In the example above, the distinct rows of $A_{LG}$ (six out of nine, one per destination node) are those defining its algebraic rank. We will build our linearity theorem on a result from [28]: $\operatorname{rank}(A_{LG}) = |V|$. However, at this point in the paper, the intuition that there are as many different connectivity patterns in $LG$ as nodes in $G$ is quite useful for the reader. Actually, this fact reveals the coding properties of $LG$ with respect to $G$. We highlight two basic consequences:
(a)
We have that $\operatorname{rank}(A_G) \le \operatorname{rank}(A_{LG}) = |V|$. Note that node2vec-like latent spaces are rooted in an SVD factorization (PCA) involving the adjacency matrix. As a result, the latent spaces of $LG$ are at least as entropic as those of $G$: exactly $|V|$ codewords are needed to explain the ways of linking the $|E|$ nodes of $LG$.
(b)
When it comes to GNNs, we observe a similar result. The foundational mechanism of a GNN is message passing: (i) nodes are endowed with feature vectors; (ii) these vectors are propagated using the adjacency structure; (iii) the vectors received by each node are combined and projected onto a learnable matrix. Having $\operatorname{rank}(A_{LG}) = |V|$ implies that $|V|$ of the $|E|$ nodes of $LG$ integrate their neighboring information in different, mutually orthogonal ways. As a result, the subsequent latent spaces of the nodes of $LG$ are at least as entropic as those of the nodes of $G$. These spaces are actually less prone to the over-smoothing problem (degraded performance caused by the decreasing entropy of the coding under repeated aggregation).

2.3. The Spectral Argument

The rank property is transferred to random walks as follows.
Preservation of the spectrum of $G$ in $LG$. The line digraph $LG$ is somewhat redundant with respect to $G$: the respective adjacencies $A_G$ and $A_{LG}$ form a Flanders pair [29]; they have the same non-zero eigenvalues, with the same multiplicities, and also the same number of distinct eigenvectors and generalized eigenvectors associated with these non-zero eigenvalues (see also [30]). The transition matrices $P_G = D_G^{-1}A_G$ and $P_{LG} = D_{LG}^{-1}A_{LG}$ also share the same non-zero spectrum. This is key for understanding how random walks sample both $G$ and $LG$. In particular, we have [31]: $\operatorname{tr}(P_G^{r}) = \operatorname{tr}(P_{LG}^{r})$ for all $r$ (equal traces for all powers of the transition matrices).
Transport Efficiency. The condition $\operatorname{tr}(P_G^{r}) = \operatorname{tr}(P_{LG}^{r})$ leads to the following expected probabilities of returning to the starting node: $\mathcal{E}_G(r) = \frac{1}{n}\operatorname{tr}(P_G^{r})$ and $\mathcal{E}_{LG}(r) = \frac{1}{m}\operatorname{tr}(P_{LG}^{r})$, where $n = |V|$ and $m = |E|$. Borrowing the terminology from continuous-time random walks, $\mathcal{E}_G(r)$ and $\mathcal{E}_{LG}(r)$ quantify the efficiency of the respective random walks in terms of diffusing far from the origin (the smaller, the more efficient) [32]. Thus, $LG$ is more efficient than $G$, and to some extent this mitigates the long-tail effect, as we show in Figure 1.
Quasi-Maximum Entropy. Maximal-entropy random walks [33] require that all paths of a given length $r$ be equiprobable. The condition $\operatorname{tr}(P_G^{r}) = \operatorname{tr}(P_{LG}^{r})$ for all $r$ leads to $n\,\mathcal{E}_G(r) = m\,\mathcal{E}_{LG}(r)$, i.e., $\mathcal{E}_{LG}(r) = \frac{n}{m}\mathcal{E}_G(r)$. Then, the denser $G$ is, the more we have $m \gg n$ and $\mathcal{E}_{LG}(r) \to 0$ (maximal efficiency). Under these conditions, we have maximal entropy for cycles of any length. This is consistent with viewing $LG$ as a quasi-entropy maximizer with respect to $G$: it maximizes the entropy of $G$ subject to preserving the spectrum. Since the transition matrices $P_G$ and $P_{LG}$ share the same non-zero spectrum, they have identical asymptotic behavior (same maximal eigenvalue).
Density and diameter of $G$ vs. $LG$. Line digraphs maintain small degree and diameter relative to their large number of nodes $m$ [28]. This means that when we are forced to place $m$ nodes in $LG$ against the $n$ nodes of $G$, $LG$ (loosely) behaves like a complete digraph without forcing each node to be adjacent to all the others [34]. More formally, some line digraphs are almost Moore graphs. For instance, if the order (the number of nodes that can be reached from any node) of $G$ is upper bounded [35] by $M_{\lambda_1, D}(G) = (\lambda_1^{D+1} - 1)/(\lambda_1 - 1)$, then the order of $LG$ is upper bounded by $m \le M_{\lambda_1, D}(LG) = (\lambda_1^{D+2} - 1)/(\lambda_1 - 1)$, where $D$ is the diameter of $G$ and $\lambda_1$ is the leading eigenvalue of $A_G$ (and of $A_{LG}$, since both matrices share the non-zero spectrum). Therefore, we have $O(\lambda_1^{D})$ vs. $O(\lambda_1^{D+1})$ bounds, which control, respectively, the reachability of the nodes in both graphs. Again, the larger reachability bound of $LG$ allows us to mitigate the long-tail effect.
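The spectral claims of this section are easy to check numerically. The sketch below (our own verification, assuming networkx and numpy) compares the traces of powers of $P_G$ and $P_{LG}$ and the corresponding return probabilities $\mathcal{E}_G(r)$ and $\mathcal{E}_{LG}(r)$ on a small random strongly connected digraph.

```python
import numpy as np
import networkx as nx

# Our own check: on a random strongly connected digraph, verify that P_G and P_LG share the
# traces of their powers, tr(P_G^r) = tr(P_LG^r), and that the expected return probability
# E(r) = tr(P^r) / (#nodes) is smaller for the line digraph.
rng = np.random.default_rng(0)
while True:
    G = nx.gnp_random_graph(12, 0.3, seed=int(rng.integers(10**6)), directed=True)
    if nx.is_strongly_connected(G):
        break
LG = nx.line_graph(G)

def transition(graph):
    A = nx.to_numpy_array(graph)
    return A / A.sum(axis=1, keepdims=True)   # row-stochastic; no sinks by construction

P_G, P_LG = transition(G), transition(LG)
n, m = len(P_G), len(P_LG)

for r in (2, 3, 5, 10):
    tG = np.trace(np.linalg.matrix_power(P_G, r))
    tL = np.trace(np.linalg.matrix_power(P_LG, r))
    print(r, np.isclose(tG, tL), tG / n, tL / m)   # equal traces; E_LG(r) = (n/m) * E_G(r)
```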

2.4. The Rank Property of LDEs

Since node-centric graph embeddings can be seen as factorizing a matrix $S$ that relies on the powers of the transition matrix $P$ [36], the above properties suggest a deeper analysis of $S_G$ vs. $S_{LG}$ (the respective matrices-to-factorize for $G$ and $LG$). In particular, the preservation of the non-zero spectrum of $P_{LG}^{r}$ with respect to that of $P_G^{r}$ for all powers $r$ suggests a link between both matrices-to-factorize and, consequently, between their respective embeddings. Therefore, uncovering this link allows us to understand (and then characterize) the structural differences between the respective latent spaces of nodes and edges (or even paths) of the same graph.
More precisely, the standard factorization approach for network embedding [36] states that the latent representations for the nodes of $G = (V, E)$ are obtained from the SVD of $\hat{M}_G = \log(\max(M_G, 1))$, where $M_G$ is the Pointwise Mutual Information (PMI) matrix. More recently, Qiu et al. [15] showed that $M_G$ can be posed in the following terms:
$$M_G = \frac{\operatorname{vol}(G)}{b}\,S_G, \qquad S_G = \left(\frac{1}{T}\sum_{r=1}^{T}P_G^{r}\right)D_G^{-1},$$
where $P_G = D_G^{-1}A_G$ is the transition matrix of $G$. This emphasizes the role of the random walks (RWs) in the resulting embedding. In particular, Qiu et al. show that the negative spectrum of $S_G$ is filtered out (zeroed) even for moderate values of the window size $T$. In practice, this allows bounding the singular values of $S_G$ by its eigenvalues (the formal proof is only correct when the average matrix $\frac{1}{T}\sum_{r=1}^{T}P_G^{r}$ is symmetric). Since the eigenvalues of $S_G$ decay exponentially, one should expect small optimal embedding dimensions for locally dense graphs, due to their small spectral gap [37,38]. In addition, the factorization of $\hat{M}_G$ provides a loss function (the quality of the rank-$K$ approximation) for quantifying the relative distortion of the current embedding with respect to an ideal one [39].
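As an illustration of this factorization view, the following dense-matrix sketch builds $S_G$, $M_G$, and the truncated SVD of $\hat{M}_G$ for a small graph; it is a simplified stand-in under our own choices (dense matrices, no negative-sampling subtleties), not the NetMF implementation.

```python
import numpy as np
import networkx as nx

def netmf_like_embedding(G, T=10, b=1.0, dim=16):
    # Dense small-graph sketch of the factorization view above:
    # S = (1/T * sum_{r=1..T} P^r) D^{-1},  M = (vol(G)/b) * S,  embedding = truncated SVD
    # of log(max(M, 1)). Assumes every node has positive out-degree; names are ours.
    A = nx.to_numpy_array(G)
    D_inv = np.diag(1.0 / A.sum(axis=1))
    P = D_inv @ A                                            # transition matrix P = D^{-1} A
    S = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1)) / T
    M = (A.sum() / b) * (S @ D_inv)                          # vol(G) = total edge weight
    M_hat = np.log(np.maximum(M, 1.0))
    U, s, _ = np.linalg.svd(M_hat)
    k = min(dim, len(A))
    return U[:, :k] * np.sqrt(s[:k])                         # rows = node (or edge) embeddings
```

Applied to $G$, this returns a node embedding; applied to the (possibly sparsified) line digraph, it returns the edge embedding analyzed below.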
Herein, we extend the above analysis to link nodal embeddings with the embedding of higher-order entities (edges, paths, etc.). To that end, we compare the factorization of $\hat{M}_G$ with that of $\hat{M}_{LG}$ or $\hat{M}_{L^{p}G}$, $p > 1$. This is particularly interesting for directed networks (digraphs), where the following conditions apply: (a) We have a more flexible/general model; (b) There is solid mathematical machinery linking the ranks of $G$, $LG$, and $L^{p}G$ [28,31], as well as establishing the optimal diameter, degree preservation, and density of line digraphs and their iterations [35,40].
Our main formal result now follows. The intuition behind the theorem is that random walks of length $r$ in $LG$ are linear combinations of $n = |V|$ random walks of length $r-1$ in $G$, for any length $r > 1$ (see Figure 3).
Theorem 1
(Linearity). Embedding edges can be posed as a linear combination of node embeddings. For a strongly connected digraph $G = (V, E)$, i.e., without sinks and sources, and with line digraph $LG$, we have the following relationship between edge similarity and node similarity:
$$P_{LG}^{r}(e_i, e_j) = \sum_{k=1}^{n}\alpha_{ik}\,\tilde{P}_{G}^{\,r-1}(e_i, e_k), \qquad \text{with}\ \ \alpha_{ik} = P_{LG}(e_k, e_j),$$
where $P_{LG} = D_{LG}^{-1}A_{LG}$ is the transition matrix of $LG$, $\tilde{P}_{G}$ is a permutation of $P_G = D_G^{-1}A_G$, the transition matrix of $G$, and $e_i = (e_i^{+}, e_i^{-})$ denotes a directed edge (with “out-of” and “into” components) of $E$ (the nodes of $LG$).
Proof. 
If $G = (V, E)$ is strongly connected, then $\operatorname{rank}(A_{LG}) = n$, with $n = |V|$ (see Lemma 4.2 in [28]). In addition, following the characterization theorem for the adjacency matrices of line graphs (Theorem 10 in [31]), given two of the $m = |E|$ rows of $A_{LG}$, they are either orthogonal or identical. As a result, there are only $n$ different rows, or prototypes, in $A_{LG}$, say $e_1, e_2, \ldots, e_n$, each one representing an equivalence class (see Figure 3). Therefore, $A_{LG}$ can be factorized as $A_{LG} = E_{in}E_{out}$, where $E_{in}$ is an $m \times n$ indicator matrix with $E_{in}(e_i, e_j) = 1$ if $e_i$ belongs to the class of $e_j$ and 0 otherwise; on the other hand, $E_{out}$ is the $n \times m$ matrix containing the different rows of $A_{LG}$ (the characteristic linkage patterns of the line digraph).
Interestingly, the prototypes $e_i$, $i = 1, \ldots, n$, are defined by the “into” node $e_i^{-}$, since edges ending in the same node of $G$ exhibit the same linkage pattern in $LG$. As a result, $A_G = Q^{\top}E_{out}E_{in}Q$, where $Q$ is an $n \times n$ permutation matrix with $Q(i,j) = 1$ if $i = e_j^{-}$ and 0 otherwise, for $i \in V$. In addition, the transition matrices of $G$ and $LG$ store the same probabilities in their non-zero positions, and we have $P_{LG} = E_{in}E_{out}^{P}$ and $P_G = Q^{\top}E_{out}^{P}E_{in}Q$, where $E_{out}^{P}(e_i, e_j) = \Pr[e_i \rightarrow e_j]$. Then, we have
$$
P_{LG}^{r} = \underbrace{(E_{in}E_{out}^{P})(E_{in}E_{out}^{P})\cdots(E_{in}E_{out}^{P})}_{r\ \text{terms}}
= E_{in}\underbrace{(E_{out}^{P}E_{in})(E_{out}^{P}E_{in})\cdots(E_{out}^{P}E_{in})}_{r-1\ \text{terms}}\,E_{out}^{P}
= E_{in}\underbrace{(QP_{G}Q^{\top})(QP_{G}Q^{\top})\cdots(QP_{G}Q^{\top})}_{r-1\ \text{terms}}\,E_{out}^{P}
= E_{in}\,\tilde{P}_{G}^{\,r-1}E_{out}^{P},
$$
where $\tilde{P}_{G} = QP_{G}Q^{\top}$ is the permuted $P_G$. Then, we obtain $P_{LG}^{r} = E_{in}\tilde{P}_{G}^{\,r-1}E_{out}^{P}$, which is an $m \times m$ matrix. Since $E_{in}$ is a rank-$n$ indicator matrix, the resulting $P_{LG}^{r}$ has only $n$ distinct rows, i.e., many edges have the same probabilistic linkage pattern. This is why the $n \times m$ matrix $\tilde{P}_{G}^{\,r-1}E_{out}^{P}$ retains only the distinct rows of $P_{LG}^{r}$; it is the matricial version of the equation in Theorem 1. □
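The row-collapsing structure exposed by the proof can be verified directly. The sketch below (ours) reuses the example digraph of Section 2.2 and counts the distinct rows of $P_{LG}^{r}$: the count never exceeds $n = |V|$, although $P_{LG}^{r}$ is a $9 \times 9$ matrix.

```python
import numpy as np
import networkx as nx

# Check the consequence highlighted in the proof: P_LG^r has at most n = |V| distinct rows,
# because edges sharing a destination node share a row. We reuse the Section 2.2 example.
G = nx.DiGraph([("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("c", "d"),
                ("d", "e"), ("e", "f"), ("f", "d"), ("f", "e")])
LG = nx.line_graph(G)
A = nx.to_numpy_array(LG)
P_LG = A / A.sum(axis=1, keepdims=True)

for r in (1, 2, 3, 5):
    distinct = {tuple(row) for row in np.linalg.matrix_power(P_LG, r)}
    print(r, len(distinct))   # never more than |V| = 6, although LG has 9 nodes
```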
The proof of the linearity theorem uncovers the differences between the structure of $S_G$ (defined above) and that of $S_{LG} = \left(\frac{1}{T}\sum_{r=1}^{T}P_{LG}^{r}\right)D_{LG}^{-1}$. These differences lead to the rank-preservation property of $S_{LG}$, which is summarized by the following corollary:
Corollary 1
(Rank Preservation). The structure of $S_{LG}$ preserves the rank better than that of $S_G$ for values of $T$ (window size) closer to the mixing time $t_{mix}$ of $P_G$.
The proof is in Appendix A, where we refer the reader for a deeper understanding of the link between rank preservation and random walks.

2.5. Implications of Using Line Digraphs

In light of the formal analysis above, line digraphs and their embeddings have several pros and cons.
Implications of Rank Preservation. This is a key property of LDEs (Line Digraph Embeddings), since the rank of the similarity matrix (and therefore of the embedding) is closely related to the optimal dimension of the embedding. Following [41], too low a dimensionality discards too much signal power, which leads to high bias (many vectors in the embedding collapse). On the other hand, a very large dimensionality leads to high variance (the embedding becomes a noisy subspace). The optimal dimensionality is upper bounded by the rank of the embedding. Thus, if we set a given dimensionality $d \le n$, LDEs hold a rank close to $n - 1$, whereas DEs (Digraph Embeddings) do not generally reach this rank. As a result, LDEs naturally retain the most informative part of the signal (which reduces the bias) with respect to DEs.
Finally, Theorem 1 and its corollary lead to the following definition of the component-wise similarity for the line digraph:
$$M_{LG}(e_i, e_j) = \frac{\operatorname{vol}(LG)}{b\cdot d_{out}(e_j)}\,\frac{1}{T}\sum_{r=1}^{T}\sum_{k=1}^{n}\alpha_{ik}\,\tilde{P}_{G}^{\,r-1}(e_i, e_k).$$
Therefore, factorizing $\hat{M}_{LG}$ allows us to classify the edges of $LG$. Since the label of each edge is that of its destination node, we thus obtain a proxy for classifying the nodes of $G$. As a result, the differences in node-classification performance between $\hat{M}_G$ and $\hat{M}_{LG}$ are due to the following:
(a)
The flexibility of linearly combining many probabilities (actually $n$ terms) to define a single probability of linking any pair of edges in the line digraph.
(b)
The better rank preservation of the line digraph embedding with respect to its nodal counterpart. The rank of $\hat{M}_{LG}$ is $O(n)$ even for large values of the window size $T$. Since edges with the same destination node are hashed to nearly the same embedding vector (see the proof of Theorem 1), this enforces the coherent classification of these destination nodes.
Implications in Edge Mining. As an additional result, the component-wise Katz similarity (used in HOPE [8]) for $LG$ is $K_{LG}(e_i, e_j) = \sum_{r=1}^{T}\beta^{r}\sum_{k=1}^{n}\alpha_{ik}\,\tilde{A}_{G}^{\,r-1}(e_i, e_k)$, where $\tilde{A}_{G}$ is a permutation of $A_G$, and we down-weight the number of paths between each pair of nodes in $LG$ (edges of $G$). Such paths are linear combinations of the corresponding paths in $G$. This can be used to predict links in $LG$ (second-order paths in $G$) and to quantify the centrality of an edge (by applying [42] to the line digraph embeddings). We may also identify edge hubs and authorities (by applying HITS [43] to the line digraph; see Figure 3).
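As a minimal illustration of this edge-mining use, the following sketch (ours; the graph is the illustrative example of Section 2.2) runs HITS on the line digraph to score edge hubs and edge authorities.

```python
import networkx as nx

# Edge-level hubs and authorities via HITS on the line digraph (illustrative sketch).
G = nx.DiGraph([("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("c", "d"),
                ("d", "e"), ("e", "f"), ("f", "d"), ("f", "e")])
LG = nx.line_graph(G)

hubs, authorities = nx.hits(LG, max_iter=1000)

top_hub  = max(hubs, key=hubs.get)                  # edge of G acting as the strongest hub
top_auth = max(authorities, key=authorities.get)    # edge of G acting as the strongest authority
print("edge-hub:", top_hub, "edge-authority:", top_auth)
```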
The Need for Sparsification. Computing the line digraph of a realistic (i.e., large) digraph is very challenging, since the line digraph can have $O(n^{2})$ nodes (the worst case arises with dense digraphs). To overcome this limitation, we rely on digraph sparsification to filter out non-critical edges before embedding the line digraph. Although graph sparsification theory is mostly developed for undirected graphs [44,45], there are recent approaches focused on asymmetric adjacency matrices, i.e., designed for digraphs. Herein, we follow Cohen et al.’s notion of matrix approximation: two adjacency matrices $A_G$ and $\tilde{A}_G$ are similar if their symmetrizations have good spectral properties; they are spectrally similar [46]. With this notion at hand, a good approximation (sparsification) $\tilde{A}_G$ should preserve the in-degrees and out-degrees of $A_G$ (Lemma 3.13 of [46]). As a result, the edge $e = (i, j)$ of $G$ (a node of $LG$) will be sampled with probability $p_e$:
$$p_e = \frac{1}{2n}\left(\frac{1}{d_{out}(i)} + \frac{1}{d_{in}(j)}\right),$$
where $d_{out}(i)$ and $d_{in}(j)$ are, respectively, the out-degree of $i$ and the in-degree of $j$. Keeping the approximation error below $\epsilon$ with probability $p$ is achieved by independently sampling $k \ge 128\cdot(2n/\epsilon^{2})\log(2n/p)$ edges (see Theorem 3.9 in [46]).
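A simple way to implement this sampling scheme is sketched below; it draws a fixed fraction of edges without replacement with probabilities proportional to $1/d_{out}(i) + 1/d_{in}(j)$, which is our simplification of the $(\epsilon, p)$ sample-size bound above.

```python
import numpy as np
import networkx as nx

def sparsify_digraph(G, keep_fraction=0.5, seed=0):
    # Sketch of degree-based edge sampling: edge (i, j) is drawn with probability
    # proportional to 1/d_out(i) + 1/d_in(j), as in p_e above. The fixed 'keep_fraction'
    # knob and sampling without replacement are our simplifications of the paper's scheme.
    rng = np.random.default_rng(seed)
    edges = list(G.edges())
    w = np.array([1.0 / G.out_degree(u) + 1.0 / G.in_degree(v) for u, v in edges])
    p = w / w.sum()
    k = max(1, int(keep_fraction * len(edges)))
    kept = rng.choice(len(edges), size=k, replace=False, p=p)
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes())
    H.add_edges_from(edges[i] for i in kept)
    return H
```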
In our experiments, we will analyze how competitive the line digraph embedding is at classifying and clustering nodes with a decreasing number of sampled edges. Therefore, our approach is consistent with analyzing the robustness of the embedding to attacks (graph poisoning) [39]. Since edges are removed when they do not link good hubs with good authorities, both hubs and authorities are preserved as much as possible (hubs point to many authorities, and authorities are pointed to by many hubs). However, the singular gap (the difference between the leading singular value and the second one) may be reduced. Thus, we interpret performance stability as evidence of the preservation of the singular gaps [47].

3. Experimental Results

3.1. Experimental Setup

In this paper, we analyze the following networks:
(a) Directed single-label citation networks: Cora [48]—Citation network containing 2708 scientific publications with 5278 links between them. Each publication is classified into one of 7 categories. CiteSeer for Document Classification [48]—Citation network containing 3312 scientific publications with 4676 links between them. Each publication is classified into one of 6 categories. Wiki—Contains a network of 2405 web pages with 17,981 links between them. Each page is classified into one of 19 categories. https://github.com/thunlp/MMDW/ (accessed on 10 March 2025). Citeseer and Cora have been retrieved from LINQS (https://linqs.soe.ucsc.edu/data, accessed on 10 March 2025). See Table 1 for details.
(b) Originally undirected multi-label networks: Protein-Protein Interactions (PPI)—Subgraph from PPI corresponding to Homo Sapiens. https://downloads.thebiogrid.org/BioGRID (accessed on 10 March 2025) [49]. Wikipedia Part-of-Speech (POS)—Co-occurrence of words appearing in the first million bytes of the Wikipedia dump. The categories correspond to the Part-of-Speech (POS) labels inferred by the Stanford POS-Tagger. Facebook social circles [50] that are treated as directed by adding bidirectional edges. These datasets have been retrieved from SNAP (https://snap.stanford.edu/node2vec/, accessed on 10 March 2025) [51]. See Table 2.
All the experiments were run on an Intel Xeon(R) W-2123 CPU @ 3.60 GHz × 8, equipped with a GeForce RTX 2080 Ti GPU and 32 GB of RAM. Regarding the hyper-parameters of node2vec and NetMF, we used window size T = 10, random-walk length L = 80, 10 walks per node, p = q = 1, and embedding dimension d = 128. The down-weighting parameter for the HOPE algorithm is β = 0.1. The obtained results are quite invariant to changes in these settings.

3.2. Classification Experiments

Protocol. The classification experiments are performed by training a logistic regression classifier on the embedding vectors of 50% of the nodes and testing on the remaining vectors, using the OpenNE framework (https://github.com/thunlp/OpenNE, accessed on 10 March 2025). Some cells in the results tables have been left blank due to computational limitations. Concerning the sparsification level, for an inverse degree of sparsification IDS $z \in (0, 1]$, we sample $O(zm)$ edges with the probability $p_e$ defined in Section 2.5. A descending IDS illustrates the robustness of the embedding better than a pair $(\epsilon, p)$.
Then, we proceed to embed edges and use these embeddings for edge classification and/or clustering. Since edges inherit their labels from nodes, the latter process returns a proxy representation for node classification or clustering.
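The following sketch summarizes this protocol for a line-digraph embedding; it is our simplified stand-in (scikit-learn logistic regression, micro-F1, destination-node labels) rather than the exact OpenNE pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def evaluate_edge_embedding(Z, edge_list, node_labels, seed=0):
    # Each line-digraph node (an edge of G) inherits the label of its destination node;
    # a logistic-regression classifier is trained on 50% of the vectors and tested on the rest.
    # Z[i] is the embedding vector of edge_list[i]; the metric choice (micro-F1) is ours.
    y = np.array([node_labels[v] for (_, v) in edge_list])
    Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, train_size=0.5, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
    return f1_score(y_te, clf.predict(Z_te), average="micro")
```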
Citation Networks. In Figure 4-Top, we show the stability of node classification for Cora, Citeseer, and Wiki. In all cases, the node and edge (line digraph) versions of NetMF are quite stable, even for small values of IDS (very sparse). However, the reachability patterns of Cora and Citeseer follow a power law with respect to the visiting of nodes by random walks (see Figure 1), and this does not change when we apply the line digraph (see their large numbers of sources and sinks in Table 2). Under these conditions, the random walks running on the line digraphs cannot reach many more different (contextual) nodes than when they run on the original graphs. This leads to very redundant embeddings. In addition, these networks have a low edge density. As a result, the LDE (node2vec(edge)) is not always the best alternative (especially in Citeseer).
However, for the Wiki network, the reachability pattern does not follow a power law (see Figure 1). In addition, the reachability pattern of the line digraph is even more entropic than that of the original graph. This is consistent with Wiki having far fewer sinks than Cora and Citeseer. It also has fewer sources than these networks, and the higher density of the network allows the random walks exploring the line digraph to reach many different nodes. This leads to less redundant embeddings for the nodes of the line digraph. The node2vec(edges) algorithm works better because it reduces the redundancy with respect to the NetMF factorization.
In all cases, HOPE is stable but its performance is the lowest. It is only competitive with NetMF and node2vec in Cora and Citeseer because of their power laws. This is consistent with some findings showing that Katz similarity is competitive for networks with high-degree node coverage [52].
Undirected Multi-label Networks. Regarding PPI, Facebook, and POS, Figure 4-Bottom shows that in all cases the line digraph significantly improves the classification scores of the alternatives. It is worth mentioning that all the cases in which the line digraph presents a significant improvement correspond to datasets with a large edge/node ratio (i.e., cases in which the size of the line digraph grows drastically with respect to the original graph). This is also the case of Wiki among the citation networks.
Stability Under Sparsification. Concerning the citation networks, we do not observe significant differences between the magnitudes and the evolution of their average singular gaps. However, Wiki has larger gaps than Cora and Citeseer for large values of IDS (a small fraction of removed edges). Additionally, the most stable network (Facebook) has the largest average singular gap for the line digraph representation: 16.2 (line digraph) vs. 2.65 (original graph).
Summarizing, the line digraph embedding improves on the alternatives when it is possible to reduce the redundancy. This is explained by the rank property of the line digraph. In general, line digraphs are more robust than the original graphs with respect to increasing levels of sparsification, especially when their average singular gap is large (Facebook).

3.3. Clustering Experiments

Protocol. We evaluate the performance of the obtained embeddings on community detection problems by applying agglomerative clustering to the embedding vectors. The total number of clusters C is set to the number of labels. We analyze both the Adjusted Rand Index (ARI) and the modularity, which is defined, for a digraph with adjacency matrix $A$, volume $W$, and proposed partition $\mathcal{P}$, as
$$Q(A) = \frac{1}{W}\sum_{C\in\mathcal{P}}\sum_{i,j\in C}\left(A_{ij} - \frac{d_{out}(i)\,d_{in}(j)}{W}\right).$$
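For reference, the directed modularity above can be computed as in the following sketch (ours); the partition may come either from the ground-truth labels or from agglomerative clustering of the embedding vectors.

```python
import numpy as np
import networkx as nx

def directed_modularity(G, communities):
    # Directed modularity of a partition, following the formula above:
    # Q = (1/W) * sum_{C} sum_{i,j in C} (A_ij - d_out(i) * d_in(j) / W).
    A = nx.to_numpy_array(G)
    W = A.sum()
    d_out, d_in = A.sum(axis=1), A.sum(axis=0)
    index = {v: k for k, v in enumerate(G.nodes())}
    Q = 0.0
    for C in communities:
        idx = [index[v] for v in C]
        Q += A[np.ix_(idx, idx)].sum() - np.outer(d_out[idx], d_in[idx]).sum() / W
    return Q / W
```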
ARI Analysis. We run clustering experiments on all our single-label datasets: Cora, Citeseer, and Wiki. In Figure 5, we show the ARI values obtained by clustering the embedding of the original graph and the line digraph corresponding to the citation datasets. In the case of Cora, the results with the original graph, the line digraph, and the iterated line digraph are similar, but in the other cases, the line digraph outperforms the results obtained with the original graph.
Modularity Analysis. Regarding modularity, we have obtained both the modularity of the partition given by the original labeling of the graphs (ground truth) and the modularity of the partition provided by the clustering. Consistent with the ARI values, the modularity obtained with the line digraph outperforms that of the original graph. In addition, we can observe that the modularity of the ground truth is almost the same for the original digraph and the line digraph. Interestingly, in Wiki we outperform the clustering obtained with the ground-truth labeling.
Stability under Sparsification. The performance of clustering is less stable than that of classification for increasing levels of sparsification (decreasing IDS). This is due to the multi-modality of the labelled communities. Still, the best results are obtained with the line digraph representation.

3.4. GNN Experiments

To further validate our theoretical findings about line digraphs and their stability under sparsification, we extended our analysis to modern Graph Neural Networks (GNNs). While our previous experiments focused on node2vec-like embeddings, here we evaluate whether the maximum entropy properties of line digraphs translate to improved performance with GNN architectures as well.
GNNs have emerged as powerful tools for learning graph-structured data by iteratively updating node representations through neighborhood aggregation schemes [1]. In their simplest form, vanilla GNNs follow a message-passing framework where each node aggregates feature vectors from its neighbors and then applies a learnable transformation to update its own representation [53,54]. This process can be expressed as follows:
$$h_{v}^{(l+1)} = \sigma\left(W^{(l)}\cdot \operatorname{AGGREGATE}\left(\{h_{u}^{(l)} : u \in \mathcal{N}(v)\}\right)\right),$$
where $h_{v}^{(l)}$ is the feature vector of node $v$ at layer $l$, $\mathcal{N}(v)$ denotes the neighbors of $v$, $W^{(l)}$ is a learnable weight matrix, and $\sigma$ is a non-linear activation function.
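A minimal numpy sketch of one such vanilla message-passing layer (mean aggregation; our illustration, not any of the specific architectures listed below) is as follows:

```python
import numpy as np

def message_passing_layer(A, H, W, sigma=np.tanh):
    # One vanilla message-passing layer, as in the equation above:
    # h_v^{(l+1)} = sigma(W^{(l)} . AGGREGATE({h_u^{(l)} : u in N(v)})), with mean aggregation.
    # A: (n, n) adjacency, H: (n, d_in) node features, W: (d_in, d_out) learnable weights.
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)   # avoid division by zero for isolated nodes
    aggregated = (A @ H) / deg                           # mean over neighbors
    return sigma(aggregated @ W)
```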
We evaluated four state-of-the-art GNN architectures [55] that extend this basic framework in different ways:
  • Graph Convolutional Networks (GCN) [1]: Uses a simple weighted average of neighbor features with spectral motivation.
  • Graph Attention Networks (GAT) [2]: Employs attention mechanisms to weight neighbor contributions differently.
  • GraphSAGE (SAGE) [4]: Samples a fixed number of neighbors and learns how to aggregate their features.
  • Approximate Personalized Propagation of Neural Predictions (APPNP) [3]: Separates feature transformation from propagation using personalized PageRank.
Following standard practice in the literature, we used the established split of 60% training, 20% validation, and 20% testing sets for all datasets. For each architecture and dataset, we implemented two different sparsification strategies:
  • Edge sparsification: Randomly sampling edges while maintaining graph connectivity.
  • Node sparsification: Selecting a subset of nodes and preserving their associated edges.
The fraction of selected edges varied from 0.0 to 1.0 in steps of 0.1, allowing us to assess model robustness across different sparsification levels.
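The two strategies can be sketched as follows (our simplified versions; in particular, the spanning-forest backbone is our heuristic for "maintaining graph connectivity", which the text does not specify further).

```python
import random
import networkx as nx

def edge_sparsify(G, fraction, seed=0):
    # Edge sparsification sketch: keep a random fraction of the edges on top of an
    # undirected spanning forest, so the sampled graph stays (weakly) connected.
    rng = random.Random(seed)
    backbone = set()
    for u, v in nx.minimum_spanning_edges(G.to_undirected(), data=False):
        backbone.update({(u, v), (v, u)})
    edges = list(G.edges())
    keep = [e for e in edges if e in backbone]
    rest = [e for e in edges if e not in backbone]
    rng.shuffle(rest)
    keep += rest[:max(0, int(fraction * len(edges)) - len(keep))]
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes())
    H.add_edges_from(keep)
    return H

def node_sparsify(G, fraction, seed=0):
    # Node sparsification sketch: keep a random subset of nodes and the edges among them.
    rng = random.Random(seed)
    kept = rng.sample(list(G.nodes()), max(1, int(fraction * G.number_of_nodes())))
    return G.subgraph(kept).copy()
```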
Figure 6 visualizes the learned edge embeddings from the line digraph of Cora using t-SNE dimensionality reduction at different sparsification levels. Remarkably, even with significant sparsification (80% of edges removed), the embedding space preserves clear community structures with well-defined boundaries between different classes. This visual evidence supports our theoretical claim that line digraphs maintain high entropy between communities even under aggressive sparsification, preserving the essential topological structure needed for node classification.
The quantitative results in Figure 7, Figure 8 and Figure 9 reveal several key findings:
  • Edge sparsification consistently outperforms node sparsification across all three datasets, with the effect being most pronounced in Citeseer and Pubmed. This aligns with our theory that strategic edge sampling preserves more of the graph’s spectral properties than node-based approaches.
  • GraphSAGE exhibits remarkable stability under edge sparsification, maintaining accuracy above 88% in Cora, 80% in Citeseer, and 90% in Pubmed even with only 10% of edges. This robustness can be attributed to its neighborhood sampling strategy, which naturally accommodates sparser structures.
  • GCN and GAT show similar patterns, with edge sparsification providing better performance than node sparsification when the fraction of selected edges is below 0.5.
  • APPNP demonstrates higher sensitivity to sparsification compared to other architectures, suggesting its personalized PageRank-based propagation mechanism benefits from denser graph structures.
These findings complement our earlier analysis of node2vec-like embeddings, providing additional evidence that the line digraph representation can maintain high classification performance even under significant edge reduction. The consistency of these results across both traditional embedding approaches and modern GNN architectures validates our theoretical framework regarding the entropy-maximizing properties of line digraphs and their stability under sparsification.

4. Conclusions

In this paper, we have proposed an edge-centric embedding: embedding the line digraph instead of the original digraph. Our motivation is that node-centric approaches do not model edge directionality. We have uncovered a formal link between the embedding of edges and that of nodes. This link leads to the rank property, which allows us to (a) better classify and cluster the nodes of the original graph, when we use their respective directed-edge embeddings as proxies, and (b) better understand the latent spaces of both nodes and edges, which may explain the success of their co-embedding. In addition, we filter out the non-critical edges of the original graph in advance to make the line digraph computation scalable. Our experiments show that line digraphs are stable under increasing levels of sparsification. These experiments are performed from two different angles: node2vec-like latent spaces, where nodes have no attributes (only topology), and GNNs, where each node is attributed and the combination of topology and aggregation determines the latent spaces. In both cases, the entropy-maximization margin of line graphs is key.
Future work is motivated by the scalability problems arising in dense graphs, which are inherent to investigating the role of an edge in a graph. This requires the development of more powerful, and even learnable, sparsifiers. In addition, the rank-preservation property (which is congruent with determinantal point processes) suggests that edge embeddings could be more powerful than node-based ones for classifying graphs themselves through deep sets [56]. Finally, we will test to what extent LDEs contribute to structural transfer learning [26].

Author Contributions

Conceptualization, F.E.R.; Methodology, F.E.R. and A.B.; Software, A.B. and M.Á.L.; Validation, A.B., F.E.R. and M.Á.L.; Formal analysis, F.E.R.; Investigation, A.B. and F.E.R.; Resources, M.Á.L.; Data curation, A.B.; Writing—original draft, A.B.; Writing—review & editing, F.E.R. and M.Á.L.; Visualization, A.B.; Supervision, F.E.R. and M.Á.L.; Project administration, F.E.R.; Funding acquisition, F.E.R. All authors have read and agreed to the published version of the manuscript.

Funding

The authors are funded by the project PID2022-142516OB-I00 of the Spanish government.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors are indebted to Edwin R. Hancock from the University of York, who recently passed away.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
LDE: Line Digraph Embeddings
PMI: Pointwise Mutual Information
PPI: Protein-Protein Interactions
POS: Part-of-Speech
RW: Random Walk
SBM: Stochastic Block Model
HITS: Hyperlink-Induced Topic Search
IDS: Inverse Degree of Sparsification
ARI: Adjusted Rand Index

Appendix A. Proof of the Rank-Preservation Corollary

Corollary A1
(Rank Preservation). The structure of $S_{LG}$ preserves the rank better than that of $S_G$ for values of $T$ (window size) closer to the mixing time $t_{mix}$ of $P_G$.
Proof. 
From the definition of S G we have the following:
$$\lim_{T\to\infty} S_G = \lim_{T\to\infty}\left(\frac{1}{T}\sum_{r=1}^{T}P_G^{r}\right)D_G^{-1} = \Pi_G D_G^{-1},$$
where $\Pi_G$ is an $n\times n$ matrix whose rows repeat $n$ times the steady state $\pi_G = [\pi_G^{1}, \pi_G^{2}, \ldots, \pi_G^{n}]$ of the Markov chain defined by $P_G$. Therefore, the asymptotic rank of $S_G$ is one. Similarly, for $S_{LG}$ we have the following:
$$\lim_{T\to\infty} S_{LG} = \lim_{T\to\infty}\left(\frac{1}{T}\sum_{r=1}^{T}P_{LG}^{r}\right)D_{LG}^{-1} = \Pi_{LG} D_{LG}^{-1},$$
where the $m\times m$ matrix $\Pi_{LG}$ has $m$ identical rows, given by the stationary state vector $\pi_{LG} = [\pi_{LG}^{1}, \pi_{LG}^{2}, \ldots, \pi_{LG}^{m}]$. However, a closer look at the above limit uncovers an interesting property of the $LG$ similarity:
$$
\lim_{T\to\infty} S_{LG}
= \lim_{T\to\infty}\left(\frac{1}{T}\sum_{r=1}^{T}P_{LG}^{r}\right)D_{LG}^{-1}
= \lim_{T\to\infty}\left(\frac{1}{T}\sum_{r=1}^{T}E_{in}\,\tilde{P}_{G}^{\,r-1}E_{out}^{P}\right)D_{LG}^{-1}
= E_{in}\left(\lim_{T\to\infty}\frac{1}{T}\sum_{r=1}^{T}\tilde{P}_{G}^{\,r-1}\right)E_{out}^{P}D_{LG}^{-1}
= E_{in}\left(\lim_{T\to\infty}\frac{1}{T}\left(I+\tilde{P}_{G}+\tilde{P}_{G}^{2}+\cdots\right)\right)E_{out}^{P}D_{LG}^{-1}.
$$
The last expression recovers the previous limit $\Pi_{LG}D_{LG}^{-1}$, thus uncovering $\Pi_{LG} = E_{in}\tilde{\Pi}_{G}E_{out}^{P}$, where $\tilde{\Pi}_{G} = Q\,\Pi_G\,Q^{\top}$, i.e., a permutation of $\Pi_G$ (see the proof of Theorem 1). However, for large but finite values of $T = T_l$, we have that
$$\tilde{P}_{G}+\tilde{P}_{G}^{2}+\cdots+\tilde{P}_{G}^{T_{l}} = \left(I+\tilde{P}_{G}+\tilde{P}_{G}^{2}+\cdots+\tilde{P}_{G}^{T_{l}-1}\right)\tilde{P}_{G} = \left(\tilde{P}_{G}^{T_{l}}-I\right)\left(\tilde{P}_{G}-I\right)^{-1}\tilde{P}_{G}.$$
For $T = T_l$, $S_{LG}$ is given by
$$
\begin{aligned}
S_{LG}^{large} &= E_{in}\,\frac{1}{T_{l}}\left(I+\tilde{P}_{G}+\tilde{P}_{G}^{2}+\cdots+\tilde{P}_{G}^{T_{l}-1}\right)E_{out}^{P}D_{LG}^{-1}
= E_{in}\,\frac{1}{T_{l}}\left(I+(\tilde{P}_{G}^{T_{l}-1}-I)(\tilde{P}_{G}-I)^{-1}\tilde{P}_{G}\right)E_{out}^{P}D_{LG}^{-1}\\
&= \frac{1}{T_{l}}\left(E_{in}E_{out}^{P}+E_{in}(\tilde{P}_{G}^{T_{l}-1}-I)(\tilde{P}_{G}-I)^{-1}\tilde{P}_{G}\,E_{out}^{P}\right)D_{LG}^{-1}
= \frac{1}{T_{l}}\left(E_{in}E_{out}^{P}+E_{in}(\tilde{P}_{G}-I)^{-1}(\tilde{P}_{G}^{T_{l}-1}-I)\tilde{P}_{G}\,E_{out}^{P}\right)D_{LG}^{-1}.
\end{aligned}
$$
However, the shape of $S_G^{large}$ is quite different:
$$S_G^{large} = \frac{1}{T_{l}}\left(P_G-I\right)^{-1}\left(P_G^{T_{l}}-I\right)P_G\,D_G^{-1}.$$
In terms of rank, for $T_l$ large enough (e.g., close to the mixing time $t_{mix}$), the ergodicity theorem leads to $P_G^{T_l} = \Pi_G$. In this case, $\operatorname{rank}(S_G^{large}) < n$, even if $(P_G - I)^{-1}$ exists (full rank), since $\det\big((P_G^{T_l}-I)P_G\big) = \det\big(P_G^{T_l}-I\big)\det(P_G) = 0$.
However, this is not (necessarily) the case for $S_{LG}^{large}$, even though $(\tilde{P}_{G}-I)^{-1}(\tilde{P}_{G}^{T_l-1}-I)\tilde{P}_{G}$ is not full rank. Since $E_{in}E_{out}^{P} = P_{LG}$ (by definition), we have $\operatorname{rank}(E_{in}E_{out}^{P}) = n$. Considering the worst case, where the rank of $(\tilde{P}_{G}-I)^{-1}(\tilde{P}_{G}^{T_l-1}-I)\tilde{P}_{G}$ is one, post-multiplying this matrix by $E_{out}^{P}$ cannot increase its rank. In addition, the role of the pre-multiplying (indicator) matrix $E_{in}$ is just to produce an $m\times m$ matrix. Thus, in the worst case, $S_{LG}^{large}$ becomes the sum of a rank-$n$ matrix and a rank-one matrix. Therefore, $\operatorname{rank}(S_{LG}^{large}) \ge n-1$. In this case, $S_{LG}^{large}$ is only full rank when the Sherman–Morrison conditions are fulfilled.
In practice, however, $T_l \ll t_{mix}$, and one can (a priori) expect a better rank-preservation behavior of $S_{LG}^{large}$ with respect to $S_G^{large}$. However, this is not always necessarily true. A close look at recent bounds of $t_{mix}$ in (Eulerian) digraphs [57] shows that if we define
$$t_{mix}(\epsilon) = \min\left\{t\ge 0:\ \max_{x,y}\left|\frac{P_{G}^{t}(x,y)}{\pi_{G}(y)}-1\right|\le\epsilon\right\},$$
then the upper bound on $t_{mix}$ is $O(mn)$. As a result, $S_{LG}^{large}$ is expected to preserve its rank better than $S_G^{large}$ when $G$ is dense. □

References

  1. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017. [Google Scholar]
  2. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  3. Klicpera, J.; Bojchevski, A.; Günnemann, S. Personalized Embedding Propagation: Combining Neural Networks on Graphs with Personalized PageRank. arXiv 2018, arXiv:1810.05997. [Google Scholar]
  4. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30, 1025–1035. [Google Scholar]
  5. Waikhom, L.; Patgiri, R. A survey of graph neural networks in various learning paradigms: Methods, applications, and challenges. Artif. Intell. Rev. 2023, 56, 6295–6364. [Google Scholar] [CrossRef]
  6. Veličković, P.; Badia, A.P.; Budden, D.; Pascanu, R.; Banino, A.; Dashevskiy, M.; Hadsell, R.; Blundell, C. The CLRS Algorithmic Reasoning Benchmark. arXiv 2022, arXiv:2205.15659. [Google Scholar]
  7. Markeeva, L.; McLeish, S.; Ibarz, B.; Bounsi, W.; Kozlova, O.; Vitvitskyi, A.; Blundell, C.; Goldstein, T.; Schwarzschild, A.; Veličković, P. The CLRS-Text Algorithmic Reasoning Language Benchmark. arXiv 2024, arXiv:2406.04229. [Google Scholar]
  8. Ou, M.; Cui, P.; Pei, J.; Zhang, Z.; Zhu, W. Asymmetric Transitivity Preserving Graph Embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1105–1114. [Google Scholar] [CrossRef]
  9. Khosla, M.; Leonhardt, J.; Nejdl, W.; Anand, A. Node Representation Learning for Directed Graphs. In Proceedings of the ECML/PKDD (1), Würzburg, Germany, 16–20 September 2019; Brefeld, U., Fromont, E., Hotho, A., Knobbe, A.J., Maathuis, M.H., Robardet, C., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2019; Volume 11906, pp. 395–411. [Google Scholar]
  10. Benzi, M.; Estrada, E.; Klymko, C. Ranking hubs and authorities using matrix functions. Linear Algebra Its Appl. 2013, 438, 2447–2474. [Google Scholar] [CrossRef]
  11. Wang, Y.; Wang, D.; Liu, H.; Hu, B.; Yan, Y.; Zhang, Q.; Zhang, Z. Optimizing Long-tailed Link Prediction in Graph Neural Networks through Structure Representation Enhancement. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’24, Barcelona, Spain, 25–29 August 2024; pp. 3222–3232. [Google Scholar] [CrossRef]
  12. Grover, A.; Leskovec, J. node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864. [Google Scholar]
  13. Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA, 24–27 August 2014; pp. 701–710. [Google Scholar] [CrossRef]
  14. Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; Mei, Q. LINE: Large-scale Information Network Embedding. In Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, 18–22 May 2015; pp. 1067–1077. [Google Scholar] [CrossRef]
  15. Qiu, J.; Dong, Y.; Ma, H.; Li, J.; Wang, K.; Tang, J. Network Embedding As Matrix Factorization: Unifying DeepWalk, LINE, PTE, and Node2Vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM ’18, Marina Del Rey, CA, USA, 5–9 February 2018; pp. 459–467. [Google Scholar] [CrossRef]
  16. Chen, S.; Niu, S.; Akoglu, L.; Kovacevic, J.; Faloutsos, C. Fast, Warped Graph Embedding: Unifying Framework and One-Click Algorithm. arXiv 2017, arXiv:1702.05764. [Google Scholar]
  17. Beaini, D.; Passaro, S.; Létourneau, V.; Hamilton, W.; Corso, G.; Lió, P. Directional Graph Networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Volume 139, pp. 748–758.
  18. Chien, E.; Peng, J.; Li, P.; Milenkovic, O. Adaptive Universal Generalized PageRank Graph Neural Network. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021.
  19. Abboud, R.; Dimitrov, R.; Ceylan, I.I. Shortest Path Networks for Graph Property Prediction. In Proceedings of the First Learning on Graphs Conference, Virtual, 9–12 December 2022.
  20. Song, Y.; Zhou, C.; Wang, X.; Lin, Z. Ordered GNN: Ordering Message Passing to Deal with Heterophily and Over-smoothing. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
  21. Begga, A.; Escolano, F.; Lozano, M.Á. Node classification in the heterophilic regime via diffusion-jump GNNs. Neural Netw. 2025, 181, 106830.
  22. Chen, Z.; Li, L.; Bruna, J. Supervised Community Detection with Line Graph Neural Networks. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
  23. Lai, Y.A.; Hsu, C.C.; Chen, W.H.; Yeh, M.Y.; Lin, S.D. PRUNE: Preserving Proximity and Global Ranking for Network Embedding. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 5257–5266.
  24. Wang, X.; Cui, P.; Wang, J.; Pei, J.; Zhu, W.; Yang, S. Community Preserving Network Embedding. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, San Francisco, CA, USA, 4–9 February 2017; pp. 203–209.
  25. Jiang, X.; Zhu, R.; Li, S.; Ji, P. Co-embedding of Nodes and Edges with Graph Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 45, 7075–7086.
  26. Shen, X.; Dai, Q.; Mao, S.; Chung, F.L.; Choi, K.S. Network Together: Node Classification via Cross-Network Deep Network Embedding. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 1935–1948.
  27. Escolano, F.; Suau, P.; Bonev, B. Information Theory in Computer Vision and Pattern Recognition, 1st ed.; Springer Publishing Company, Incorporated: Berlin/Heidelberg, Germany, 2009.
  28. Balbuena, C.; Ferrero, D.; Marcote, X.; Pelayo, I. Algebraic Properties of a Digraph and its Line Digraph. J. Interconnect. Netw. 2003, 4, 377–393.
  29. Terán, F.D.; Lippert, R.; Nakatsukasa, Y.; Noferini, V. Flanders theorem for many matrices under commutativity assumptions. Linear Algebra Its Appl. 2014, 443, 120–138.
  30. Johnson, C.; Schreiner, E. The Relationship Between AB and BA. Am. Math. Mon. 1996, 103, 578–582.
  31. Pakonski, P.; Tanner, G.; Zyczkowski, K. Families of Line-Graphs and Their Quantization. J. Stat. Phys. 2003, 111, 1331–1352.
  32. Mülken, O.; Blumen, A. Continuous-time quantum walks: Models for coherent transport on complex networks. Phys. Rep. 2011, 502, 37–87.
  33. Sinatra, R.; Gómez-Gardeñes, J.; Lambiotte, R.; Nicosia, V.; Latora, V. Maximal-entropy random walks in complex networks with limited information. Phys. Rev. E 2011, 83, 030103.
  34. Fiol, M.; Alegre, I.; Yebra, J. Line Digraph Iterations and the (d,k) Problem for Directed Graphs. In Proceedings of the 10th Annual Symposium on Computer Architecture, Washington, DC, USA, 22–26 August 1983; pp. 174–177.
  35. Dalfó, C. Iterated line digraphs are asymptotically dense. Linear Algebra Its Appl. 2017, 529, 391–396.
  36. Levy, O.; Goldberg, Y. Neural Word Embedding as Implicit Matrix Factorization. In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014; pp. 2177–2185.
  37. Curado, M.; Escolano, F.; Lozano, M.; Hancock, E. Dirichlet densifiers for improved commute times estimation. Pattern Recognit. 2019, 91, 56–68.
  38. Curado, M.; Lozano, M.; Escolano, F.; Hancock, E. Dirichlet densifier bounds: Densifying beyond the spectral gap constraint. Pattern Recognit. Lett. 2019, 125, 425–431.
  39. Bojchevski, A.; Günnemann, S. Adversarial Attacks on Node Embeddings via Graph Poisoning. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 9–15 June 2019; pp. 695–704.
  40. Miller, M.; Siran, J. Moore graphs and beyond: A survey of the degree/diameter problem. Electron. J. Comb. 2013, 20, 1–92.
  41. Yin, Z.; Shen, Y. On the Dimensionality of Word Embedding. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, Montréal, QC, Canada, 3–8 December 2018; pp. 895–906.
  42. Fan, C.; Zeng, L.; Ding, Y.; Chen, M.; Sun, Y.; Liu, Z. Learning to Identify High Betweenness Centrality Nodes from Scratch: A Novel Graph Neural Network Approach. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ’19, Beijing, China, 3–7 November 2019; pp. 559–568.
  43. Kleinberg, J.M. Authoritative Sources in a Hyperlinked Environment. J. ACM 1999, 46, 604–632.
  44. Spielman, D.A.; Teng, S.H. Spectral Sparsification of Graphs. SIAM J. Comput. 2011, 40, 981–1025.
  45. Cheng, D.; Cheng, Y.; Liu, Y.; Peng, R.; Teng, S.H. Spectral Sparsification of Random-Walk Matrix Polynomials. arXiv 2015, arXiv:1502.03496.
  46. Cohen, M.B.; Kelner, J.A.; Peebles, J.; Peng, R.; Rao, A.B.; Sidford, A.; Vladu, A. Almost-Linear-Time Algorithms for Markov Chains and New Spectral Primitives for Directed Graphs. arXiv 2016, arXiv:1611.00755.
  47. Donato, D.; Leonardi, S.; Tsaparas, P. Stability and Similarity of Link Analysis Ranking Algorithms. Internet Math. 2006, 3, 479–507.
  48. Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Gallagher, B.; Eliassi-Rad, T. Collective Classification in Network Data. AI Mag. 2008, 29, 93–106.
  49. Breitkreutz, B.; Stark, C.; Reguly, T.; Boucher, L.; Breitkreutz, A.; Livstone, M.S.; Oughtred, R.; Lackner, D.H.; Bähler, J.; Wood, V.; et al. The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2008, 36, 637–640.
  50. McAuley, J.; Leskovec, J. Learning to Discover Social Circles in Ego Networks. In Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 548–556.
  51. Leskovec, J.; Krevl, A. SNAP Datasets: Stanford Large Network Dataset Collection. 2014. Available online: http://snap.stanford.edu/data (accessed on 10 March 2025).
  52. Song, H.H.; Cho, T.W.; Dave, V.; Zhang, Y.; Qiu, L. Scalable Proximity Estimation and Link Prediction in Online Social Networks. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, Chicago, IL, USA, 4–6 November 2009; pp. 322–335.
  53. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural Message Passing for Quantum Chemistry. arXiv 2017, arXiv:1704.01212.
  54. Ju, W.; Fang, Z.; Gu, Y.; Liu, Z.; Long, Q.; Qiao, Z.; Qin, Y.; Shen, J.; Sun, F.; Xiao, Z.; et al. A Comprehensive Survey on Deep Graph Representation Learning. Neural Netw. 2024, 173, 106207.
  55. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. arXiv 2019, arXiv:1901.00596.
  56. Zaheer, M.; Kottur, S.; Ravanbakhsh, S.; Poczos, B.; Salakhutdinov, R.R.; Smola, A.J. Deep Sets. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
  57. Boczkowski, L.; Peres, Y.; Sousi, P. Sensitivity of Mixing Times in Eulerian Digraphs. SIAM J. Discret. Math. 2018, 32, 624–655.
Figure 1. The Long-Tail Effect. Histogram of the occurrences of each node in the random walks for the Cora (top), Citeseer (middle), and Wiki (bottom) datasets, using the original graph (left), the line graph (center), and the iterated line graph (right). The y-axis is shown on a log scale.
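Figure 1 can be reproduced in spirit with a few lines of Python. The sketch below is an illustration only, not the authors' code: it runs uniform random walks (i.e., node2vec-style walks with p = q = 1, an assumption) over a digraph and over its line digraph, and counts node occurrences; plotting the counts with a logarithmic y-axis yields the long-tail histograms of the figure. The random digraph is only a stand-in for Cora, Citeseer, or Wiki.

```python
# Minimal sketch of the occurrence histograms in Figure 1 (assumes uniform random walks).
import random
from collections import Counter
import networkx as nx

def walk_occurrences(G, walks_per_node=10, walk_length=80, seed=0):
    """Count how often each node appears in uniform random walks over the digraph G."""
    rng = random.Random(seed)
    counts = Counter()
    for start in G.nodes():
        for _ in range(walks_per_node):
            v = start
            counts[v] += 1
            for _ in range(walk_length - 1):
                succ = list(G.successors(v))
                if not succ:            # stop the walk at sinks
                    break
                v = rng.choice(succ)
                counts[v] += 1
    return counts

G = nx.gnp_random_graph(200, 0.03, directed=True, seed=1)   # stand-in for Cora/Citeseer/Wiki
LG = nx.line_graph(G)                                       # line digraph: walks over edges
hist_G, hist_LG = walk_occurrences(G), walk_occurrences(LG)
# Plotting these counts with a log-scaled y-axis reproduces the long-tail shape of Figure 1.
```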
Figure 2. Examples of (a,d) original digraphs, (b,e) their line digraphs, and (c,f) their iterated line digraphs. The color of the nodes represents two different labels. Here, edges are labeled with the label of their destination node; in other application domains it may be more convenient to use the label of the source node.
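The constructions in Figure 2 are straightforward to reproduce with networkx. The toy digraph and labels below are illustrative assumptions; the points being demonstrated are that the nodes of the line digraph are the edges of the original digraph, that the construction can be iterated, and that each edge (u, v) inherits the label of its destination node v.

```python
# Minimal sketch (toy labels, not the paper's datasets) of the constructions in Figure 2.
import networkx as nx

G = nx.DiGraph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)])
node_label = {0: "A", 1: "A", 2: "B", 3: "B", 4: "A"}        # two classes, as in the figure

LG = nx.line_graph(G)            # nodes of LG are the edges (u, v) of G
LLG = nx.line_graph(LG)          # iterated line digraph

# An edge (u, v) of G -- i.e., a node of LG -- inherits the label of its destination v.
edge_label = {(u, v): node_label[v] for (u, v) in LG.nodes()}
print(len(G), len(LG), len(LLG))
print(edge_label)
```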
Figure 3. Explanation of the Linearity Theorem. (Top): Input digraph $G$ and its line digraph $LG$. Colors index nodes as well as edge destinations. Example: compute $P^{4}_{LG}(e_{10}, e_{9})$, where $e_{10}$ is a hub and $e_{9}$ is an authority. (Middle): Adjacencies $A_G$ vs. $A_{LG}$. Clearly, $\mathrm{rank}(A_{LG}) = n$, with $n = 6$ nodes in this case. $E_{out}$ retains the $n$ distinct rows of $A_{LG}$. $E_{in}$ is the indicator matrix where $E_{in}(i,j) = 1$ means that $e_i$ belongs to equivalence class $j = 1, \ldots, n$. Then, $E_{out}E_{in}$ is a permutation of $A_G$ (see colors to uncover the permutation matrix $Q$ making $A_G = Q^{T} E_{out} E_{in} Q$). (Bottom): Explicit linear combinations. $P^{r}_{LG}$ as a function of $P^{r-1}_{G}$ (taking $Q$ into account). $P^{4}_{LG}(e_{10}, e_{9})$ results from the aggregation of three paths in $G$, all of them starting from $e_{10} = V_1$ and ending at some node $e_k \in V$ in three steps. Two of the paths reach $V_6 = e_{k=6}$ (and also $V_5 = e_{k=9}$) and another one reaches $V_5 = e_{k=9}$. Since only $e_6$ is linked with $e_9$, we only retain the scores of the paths reaching $V_6 = e_{k=6}$, i.e., $P^{4}_{LG}(e_{10}, e_{9}) = \frac{1}{3} + \frac{1}{9} = \frac{4}{9}$.
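A numerical sanity check of the objects in Figure 3 is easy to set up. In the sketch below (a toy example under our own indicator-matrix conventions, which may differ in detail from the paper's exact $E_{out}$/$E_{in}$ construction), E_in marks the head of each edge and E_out marks the edges leaving each node, so that A_G = E_out @ E_in, A_LG = E_in @ E_out, and consequently A_LG^r = E_in @ A_G^(r-1) @ E_out — the adjacency-level counterpart of the linearity theorem — with rank(A_LG) bounded by the number of nodes n.

```python
# Minimal numerical sketch of the matrices behind Figure 3 (toy example, not the authors' code).
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5), (5, 3)]  # toy digraph
n = 6
m = len(edges)

E_in = np.zeros((m, n))    # E_in[e, v]  = 1 iff v is the destination (head) of edge e
E_out = np.zeros((n, m))   # E_out[v, f] = 1 iff v is the origin (tail) of edge f
for k, (u, v) in enumerate(edges):
    E_in[k, v] = 1.0
    E_out[u, k] = 1.0

A_G = E_out @ E_in     # node adjacency: A_G[u, v] = 1 iff (u, v) is an edge of G
A_LG = E_in @ E_out    # line-digraph adjacency: e -> f iff head(e) = tail(f)

r = 4
lhs = np.linalg.matrix_power(A_LG, r)
rhs = E_in @ np.linalg.matrix_power(A_G, r - 1) @ E_out
assert np.allclose(lhs, rhs)               # r-step edge walks from (r-1)-step node walks
assert np.linalg.matrix_rank(A_LG) <= n    # rank of A_LG bounded by |V(G)|
print(np.linalg.matrix_rank(A_LG), n)
```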
Figure 4. Classification stability. (Top): directed single-label citation networks (Micro-F1 score). (Bottom): undirected multi-label networks (Macro-F1 score). Legend: node2vec node refers to the performance of the nodal embedding, node2vec edge to the performance of the edge embedding, and similarly for NetFM and HOPE.
Figure 5. Clustering stability: evolution of the ARI (top) vs. evolution of the modularity (bottom) for different sparsification levels (fraction of preserved edges). Legend: same as in the classification results.
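The clustering-stability metrics of Figure 5 are standard. The following sketch shows how the ARI against ground-truth labels and the modularity of a predicted partition can be computed; the random embeddings and labels are placeholders for the paper's edge embeddings and ground truth, and in the actual experiments this evaluation would be repeated at each sparsification level.

```python
# Hedged sketch of the Figure 5 metrics (stand-in embeddings/labels, not the paper's data).
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
G = nx.gnp_random_graph(200, 0.05, seed=0)                 # stand-in graph
X = rng.normal(size=(G.number_of_nodes(), 16))             # stand-in embeddings
y_true = rng.integers(0, 4, size=G.number_of_nodes())      # stand-in ground-truth labels

y_pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
ari = adjusted_rand_score(y_true, y_pred)                  # agreement with ground truth
communities = [set(map(int, np.flatnonzero(y_pred == c))) for c in range(4)]
mod = nx.algorithms.community.modularity(G, communities)   # quality of the induced partition
print(f"ARI = {ari:.3f}, modularity = {mod:.3f}")
```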
Figure 6. Visualization of the learned edge embeddings of the Cora dataset using t-SNE dimensionality reduction. Colors represent the different ground-truth classes. Note that even with significant sparsification (80% of edges removed), the embedding space maintains clear community structure and distinct class boundaries, demonstrating the entropy-preserving properties of line digraphs even under aggressive edge reduction.
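A Figure 6-style plot only requires an embedding matrix and t-SNE. In the hedged sketch below, a truncated SVD of the line-digraph adjacency stands in for the paper's node2vec-like edge embeddings; colors, omitted here, would encode the ground-truth classes.

```python
# Hedged sketch of a Figure 6-style visualization (SVD embeddings as a stand-in).
import networkx as nx
import matplotlib.pyplot as plt
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE

G = nx.gnp_random_graph(300, 0.02, directed=True, seed=2)   # stand-in for Cora
LG = nx.line_graph(G)
A_LG = nx.to_scipy_sparse_array(LG, nodelist=list(LG.nodes()), format="csr")

X = TruncatedSVD(n_components=32, random_state=0).fit_transform(A_LG)   # edge embeddings
Y = TSNE(n_components=2, random_state=0).fit_transform(X)               # 2-D projection

plt.scatter(Y[:, 0], Y[:, 1], s=4)   # in the paper, colors would encode ground-truth classes
plt.show()
```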
Figure 7. Comparison of node and edge sparsification strategies across GNN architectures on the Cora dataset. Solid lines represent node sparsification; dashed lines represent edge sparsification. The x-axis indicates the fraction of edges preserved (from 0.0 to 1.0), while the y-axis shows classification accuracy. Edge sparsification consistently outperforms node sparsification, with GraphSAGE (dashed green) showing remarkable stability even with minimal edge retention.
Figure 8. Comparison of node and edge sparsification strategies across GNN architectures on the Citeseer dataset. Solid lines represent node sparsification; dashed lines represent edge sparsification. The performance gap between edge and node sparsification is more pronounced in Citeseer than in Cora, with edge-based methods (dashed lines) maintaining consistently higher accuracy across all sparsification levels. This suggests that preserving edges is particularly important for maintaining the structural information in citation networks with sparse connectivity patterns.
Figure 9. Comparison of node and edge sparsification strategies across GNN architectures on the Pubmed dataset. Solid lines represent node sparsification; dashed lines represent edge sparsification. All GNN architectures show remarkable stability under edge sparsification (dashed lines), with GraphSAGE maintaining above 90% accuracy even with only 10% of edges preserved. This demonstrates that strategic edge preservation captures the essential topology of the graph, aligning with our theoretical findings about the maximal-entropy properties of line digraphs.
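Figures 7–9 compare two sparsification regimes. The sketch below implements the simplest possible stand-ins — random edge removal versus random node removal — purely to make the distinction concrete; the sparsifiers actually used in the paper (e.g., spectral ones) are more principled, and the GNN training loop is omitted.

```python
# Simplified stand-in (random removal) for the edge- vs. node-sparsification regimes of Figures 7-9.
import random
import networkx as nx

def sparsify_edges(G, keep_frac, seed=0):
    """Keep a random fraction of the edges; all nodes remain."""
    rng = random.Random(seed)
    kept = rng.sample(list(G.edges()), int(keep_frac * G.number_of_edges()))
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes(data=True))
    H.add_edges_from(kept)
    return H

def sparsify_nodes(G, keep_frac, seed=0):
    """Keep a random fraction of the nodes; incident edges of dropped nodes disappear too."""
    rng = random.Random(seed)
    kept = rng.sample(list(G.nodes()), int(keep_frac * G.number_of_nodes()))
    return G.subgraph(kept).copy()

G = nx.gnp_random_graph(500, 0.01, directed=True, seed=3)
for frac in (0.1, 0.3, 0.5, 0.7, 0.9):
    He, Hn = sparsify_edges(G, frac), sparsify_nodes(G, frac)
    print(frac, He.number_of_edges(), Hn.number_of_edges())
# In Figures 7-9, each sparsified graph would then be fed to a GNN (GCN, GAT, GraphSAGE, ...)
# and the node-classification accuracy recorded for each preserved fraction.
```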
Table 1. Properties of Cora, Citeseer, and Wiki datasets.

Dataset | Nodes | Edges | Classes | Sinks | Sources | Ratio | Density
Cora | 2708 | 5429 | 7 | 486 | 1143 | 2.004801 | 0.0741%
Cora line graph | 5429 | 10,147 | 7 | 716 | 1926 | 1.869037 | 0.0344%
Cora iterated | 10,147 | 18,678 | 7 | 1213 | 3358 | 1.840741 | 0.0181%
Citeseer | 3327 | 4732 | 6 | 1006 | 1365 | 1.422302 | 0.0428%
Citeseer line graph | 4732 | 6413 | 6 | 864 | 1332 | 1.355241 | 0.0286%
Citeseer iterated | 6413 | 8254 | 6 | 894 | 1537 | 1.287073 | 0.0201%
Wiki | 2405 | 16,523 | 17 | 193 | 60 | 6.870270 | 0.2858%
Wiki line graph | 16,523 | 227,131 | 17 | 861 | 242 | 13.746354 | 0.0832%
Wiki iterated | 227,131 | 3,786,444 | 17 | 993 | 16,021 | 16.670749 | 0.0073%
Table 2. Properties of PPI, POS, and Facebook datasets.

Dataset | Nodes | Edges | Classes | Diameter | Ratio | Density
PPI | 3890 | 76,584 | 50 | 8 | 19.687404 | 0.5062%
PPI line graph | 76,584 | 6,113,024 | 50 | 9 | 79.821164 | 0.1042%
Facebook | 4039 | 176,468 | 10 | 8 | 21.845506 | 0.5410%
Facebook line graph | 176,468 | 18,806,166 | 10 | 9 | 106.569837 | 0.0604%
POS | 4777 | 184,812 | 40 | 3 | 38.687879 | 0.8100%
POS line graph | 184,812 | 99,322,576 | 40 | 4 | 537.424929 | 0.2908%
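The columns of Tables 1 and 2 can be recomputed directly from a digraph and its line digraph. In the sketch below, sinks and sources are nodes with zero out- and in-degree, respectively, the ratio is taken as edges/nodes, and the density as edges/(n(n−1)) expressed in percent; these definitions are inferred from the reported figures for the directed datasets (the undirected multi-label datasets of Table 2 may follow slightly different conventions), and the random digraph is only a Cora-sized stand-in.

```python
# Sketch of the dataset statistics reported in Tables 1 and 2 (definitions inferred, see text).
import networkx as nx

def digraph_stats(G):
    n, m = G.number_of_nodes(), G.number_of_edges()
    sinks = sum(1 for v in G.nodes() if G.out_degree(v) == 0)
    sources = sum(1 for v in G.nodes() if G.in_degree(v) == 0)
    return {
        "nodes": n,
        "edges": m,
        "sinks": sinks,
        "sources": sources,
        "ratio": m / n,                           # edge-to-node ratio
        "density": 100.0 * m / (n * (n - 1)),     # in %, as reported in the tables
    }

G = nx.gnp_random_graph(2708, 0.00074, directed=True, seed=4)   # Cora-sized stand-in
print(digraph_stats(G))
print(digraph_stats(nx.line_graph(G)))                          # same statistics for the line digraph
```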
