Beyond Weisfeiler–Lehman with Local Ego-Network Encodings

Abstract: Identifying similar network structures is key to capturing graph isomorphisms and learning representations that exploit structural information encoded in graph data. This work shows that ego networks can produce a structural encoding scheme for arbitrary graphs with greater expressivity than the Weisfeiler–Lehman (1-WL) test. We introduce IGEL, a preprocessing step that produces features augmenting node representations by encoding ego networks into sparse vectors, enriching message passing (MP) graph neural networks (GNNs) beyond 1-WL expressivity. We formally describe the relation between IGEL and 1-WL, and characterize its expressive power and limitations. Experiments show that IGEL matches the empirical expressivity of state-of-the-art methods on isomorphism detection while improving performance on nine GNN architectures and six graph machine learning tasks.


Introduction
Novel approaches for representation learning on graph-structured data have appeared in recent years [1]. Graph neural networks can efficiently learn representations that depend both on the graph structure and on node and edge features from large-scale graph data sets. The most popular choice of architecture is the message passing graph neural network (MP-GNN). MP-GNNs represent nodes by repeatedly aggregating feature 'messages' from their neighbors.
To improve the expressivity of MP-GNNs, recent methods have extended the vanilla message passing mechanism in various ways. For example, using higher-order k-vertex tuples [9] leading to k-WL generalizations, introducing relative positioning information for network vertices [15], propagating messages beyond direct neighborhoods [16], using concepts from algebraic topology [17], or combining subgraph information in different ways [18][19][20][21][22][23][24]. Similarly, provably powerful graph networks (PPGN) [25] have been proposed as an architecture with 3-WL expressivity guarantees, at the cost of quadratic memory and cubic time complexities with respect to the number of nodes.

Graph neural networks are deep learning architectures for learning representations on graphs. The most popular choice of architecture is the message passing graph neural network (MP-GNN). In MP-GNNs, there is a direct correspondence between the connectivity of network layers and the structure of the input graph. Because of this, the representation (embedding) of each node depends directly on its neighbors and only indirectly on more distant nodes.
Each layer of an MP-GNN computes an embedding for a node by iteratively aggregating its attributes and the attributes of its neighboring nodes. Aggregation is expressed via two parametrized functions: MSG, which represents the computation of joint information for a vertex and a given neighbor, and UPDATE, a pooling operation over messages that produces a vertex representation. Let h^0_v ∈ R^w denote an initial w-dimensional feature vector associated with v ∈ V. Each i-th GNN layer computes the i-th message passing step, such that µ^i_v is the multi-set of messages received by v:

µ^i_v = {{MSG(h^{i−1}_v, h^{i−1}_u) | u ∈ N(v)}},

and h^i_v is the output of a permutation-invariant UPDATE function over the message multi-set and the previous vertex state:

h^i_v = UPDATE(h^{i−1}_v, µ^i_v).

For machine learning tasks that require graph-level embeddings, an additional parameterized function dubbed READOUT produces a graph-level representation r^i_G, pooling all vertex representations at step i:

r^i_G = READOUT({{h^i_v | v ∈ V}}).

The functions MSG, UPDATE, and READOUT are differentiable with respect to their parameters, which are optimized during learning via gradient descent. The choice of functions gives rise to a broad variety of GNN architectures [30][31][32][33][34]. However, it has been shown that all MP-GNNs defined by MSG, UPDATE, and READOUT are at most as expressive as the 1-WL test when distinguishing non-isomorphic graphs [8].
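As an illustration, the generic scheme above can be sketched in a few lines of Python. This is a minimal sketch rather than any specific architecture from the literature: here MSG simply forwards the neighbor's state, UPDATE averages the pooled messages with the previous state, and READOUT sums vertex states; graphs are plain adjacency dictionaries with scalar features.

```python
# Minimal message passing sketch: MSG forwards the neighbor state,
# UPDATE averages messages with the previous state, READOUT sums.
def mp_step(adj, h):
    """One message passing step over adjacency dict adj: node -> neighbors."""
    new_h = {}
    for v, neighbors in adj.items():
        messages = [h[u] for u in neighbors]           # MSG(h_v, h_u) = h_u
        pooled = sum(messages) / len(messages) if messages else 0.0
        new_h[v] = 0.5 * (h[v] + pooled)               # UPDATE(h_v, mu_v)
    return new_h

def readout(h):
    """Graph-level representation: permutation-invariant sum pooling."""
    return sum(h.values())

# Toy graph: a triangle (0, 1, 2) with a pendant node 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
h = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
h = mp_step(adj, h)
r = readout(h)
```

Because both the mean pooling and the sum readout are permutation invariant, relabeling the nodes leaves r unchanged, which is the property the text requires of UPDATE and READOUT.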

Expressivity of Weisfeiler-Lehman and MATLANG
The classic Weisfeiler-Lehman algorithm (1-WL), also known as color refinement, is shown in Algorithm 1. The algorithm starts with an initial color assignment c^0_v for each vertex v according to its degree, and proceeds to update the assignment at each iteration.

Algorithm 1 1-WL (Color refinement).
Input: G = (V, E). The update aggregates, for each node v, its color c^i_v and the colors of its neighbors c^i_u, then hashes this multi-set of colors, mapping it into a new color c^{i+1}_v to be used in the next iteration. The algorithm converges to a stable color assignment, which can be used to test graphs for isomorphism. Note that neighbor aggregation in 1-WL can be understood as a message passing step, with the hash operation being analogous to the UPDATE steps in MP-GNNs.
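For concreteness, the refinement loop admits a compact Python sketch. This is a hypothetical minimal implementation, not the paper's: graphs are adjacency dictionaries with nodes labeled 0..n−1, and the hash step is replaced by relabeling sorted color signatures with small integers.

```python
def wl_colors(adj):
    """1-WL color refinement: returns a stable color assignment for adj."""
    colors = {v: len(ns) for v, ns in adj.items()}  # initial colors: degrees
    while True:
        # Signature: own color plus sorted multi-set of neighbor colors.
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
               for v in adj}
        # 'Hash' step: map each distinct signature to a fresh small integer.
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new_colors = {v: palette[sig[v]] for v in adj}
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors  # partition stopped refining: stable
        colors = new_colors

def wl_equivalent(adj1, adj2):
    """True if 1-WL cannot distinguish the two graphs (possibly isomorphic)."""
    # Refine over the disjoint union so final colors are comparable.
    n1 = len(adj1)
    union = {v: list(ns) for v, ns in adj1.items()}
    for v, ns in adj2.items():
        union[v + n1] = [u + n1 for u in ns]
    c = wl_colors(union)
    return sorted(c[v] for v in adj1) == sorted(c[v + n1] for v in adj2)
```

For example, two disjoint triangles and a 6-cycle are both 2-regular on six vertices, so `wl_equivalent` reports them as indistinguishable even though they are not isomorphic, illustrating the false positives discussed below.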
Two graphs, G_1, G_2, are not isomorphic if they are distinguishable (that is, their stable color assignments do not match). However, if they are not distinguishable (that is, their stable color assignments match), they are likely to be isomorphic [35]. To reduce the likelihood of false positives when color assignments match, one can consider k-tuples of vertices instead of single vertices, leading to higher-order variants of the WL test (denoted k-WL), which assign colors to k-vertex tuples. In this case, G_1 and G_2 are said to be k-WL equivalent, denoted G_1 ≡_{k-WL} G_2, if their stable assignments are not distinguishable. For more details on the algorithm, we refer to [14,36]. k-WL tests are more expressive than their (k − 1)-WL counterparts for k > 2, with the exception of 2-WL, which is known to be as expressive as the 1-WL color refinement test [10].
Among the various characterizations of k-WL expressivity, the relationship with the matrix query languages defined by MATLANG [11] has also found applications in graph representation learning [26]. MATLANG is a language of operations on matrices, where sentences are formed by sequences of operations. There exists a subset of MATLANG, ML_1, that is as expressive as the 1-WL test. Another subset, ML_2, is strictly more expressive than the 1-WL test but less expressive than the 3-WL test. Finally, there exists another subset, ML_3, that is as expressive as the 3-WL test [13]. We provide additional technical details on MATLANG and its sub-languages in Appendix E, where we analyze the relation between IGEL and MATLANG.

Graph Neural Networks Beyond 1-WL
Recently, approaches have been proposed for improving the expressivity of MP-GNNs. Here, we focus on subgraph and substructure GNNs, which are most closely related to IGEL. For an overview of augmented message passing methods, see [37] and Appendix I.
k-hop MP-GNNs (k-hop) [16] propagate messages beyond immediate vertex neighbors, effectively using ego network information in the vertex representation. Neighborhood subgraphs are extracted, and message passing occurs on each subgraph, with a cost exponential in the number of hops k, both at preprocessing and at each iteration (epoch). In contrast, IGEL only requires a preprocessing step that can be cached once computed.
Distance encoding GNNs (DE-GNNs) [38] also improve MP-GNNs with extra node features that encode distances to a subset of p nodes. The features obtained by DE-GNN are similar to IGEL when restricting the subset to size p = 1 and using a distance encoding function with k = α. However, these features are not strictly equivalent to IGEL, as node degrees within the ego network can be smaller than in the full graph.
Graph substructure networks (GSNs) [19] incorporate topological features by counting local substructures (such as the presence of cliques or cycles). GSNs require expert knowledge about which features are relevant for a given task, along with modifications to MP-GNN architectures. In contrast, IGEL reaches comparable performance using a general encoding for ego networks and without altering the original message passing mechanism.
GNNML3 [26] performs message passing in the spectral domain with a custom frequency profile. While this approach achieves good performance on graph classification, it requires an expensive preprocessing step to compute the eigendecomposition of the graph Laplacian, and O(k)-order tensors to achieve k-WL expressiveness with cubic time complexity.
More recently, a series of methods formulate the problem of representing vertices or graphs as aggregations over subgraphs. The subgraph information is pooled or introduced during message passing at an additional cost that varies depending on the architecture. Consequently, these methods require generating the subgraphs (or effectively replicating the nodes of every subgraph of interest) and pay an additional overhead due to the aggregation.
These approaches include identity-aware GNNs (ID-GNNs) [39], which embed each node while incorporating identity information in the GNN and apply rounds of heterogeneous message passing; nested GNNs (NGNNs) [20], which perform a two-level GNN computation using rooted subgraphs and consider a graph as a bag of subgraphs; GNN-as-Kernel (GNN-AK) [21], which follows a similar idea but introduces additional positional and contextual embeddings during aggregation; equivariant subgraph aggregation networks (ESAN) [22], which encode graphs as bags of subgraphs and show that such an encoding can lead to better expressive power; shortest-path neural networks (SPNNs) [40], which represent nodes by aggregating over sets of neighbors at the same distance; and subgraph union networks (SUN) [23], which unify and generalize previous subgraph GNN architectures and connect them to invariant graph networks [41]. Compared to all these methods, IGEL only relies on an initial preprocessing step based on distances and degrees, without having to run additional message passing iterations or modify the architecture of the GNN.
Interestingly, recent work [42] showed that the implicit encoding of pairwise distances between nodes, plus the degree information that can be extracted via aggregation, is fundamental to providing a theoretical justification of ESAN. Furthermore, the work on SUN [23] showed that node-based subgraph GNNs are at most as powerful as the 3-WL test. This result is aligned with recent analyses on hierarchies of model expressivity [43].
In this work, we directly consider distances and degrees in the ego network, explicitly providing the structural information encoded by more expressive GNN architectures. In contrast to previous work, IGEL aims to be a minimal yet expressive representation of network structures, requiring no learning and amenable to formal analysis, as shown in Section 4. This connects ego network properties to subgraph GNNs, corroborating the 3-WL upper bound of SUN and the expressivity analysis of ESAN, to which IGEL is strongly related through its ego-network subgraph policy (EGO+). Furthermore, these relationships may explain the comparable empirical performance of IGEL to the state of the art, as shown in Section 5.

Local Ego-Network Encodings
The idea behind the IGEL encoding is to represent each vertex v by compactly encoding its corresponding ego network E^α_v at depth α. The encoding consists of a histogram of vertex degrees at distance d ≤ α, for each vertex in E^α_v. Essentially, IGEL runs a breadth-first traversal up to depth α, counting the number of times the same degree appears at distance d ≤ α. We postulate that such a simple encoding is sufficiently expressive, and at the same time computationally tractable enough to be used as vertex features.
Figure 1 shows an illustrative example for α = 2. In this example, the green node is encoded using a (sparse) vector of size 5 × 3 = 15, since the maximum degree at depth 2 is 5 and there are three distances considered: 0, 1, and 2. The ego network contains six nodes, and all (distance, degree) pairs occur once, except for degree 3 at distance 2, which occurs twice. Algorithm 2 describes the steps to produce the IGEL encoding iteratively.

Algorithm 2 IGEL Encoding.
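As a reference point, the traversal can be sketched in Python as follows. This is a hypothetical minimal implementation, not the paper's Algorithm 2: a BFS up to depth α, after which each visited node contributes one (distance, degree) pair, with the degree measured inside the induced ego network.

```python
from collections import Counter, deque

def igel_encode(adj, root, alpha):
    """IGEL encoding of root: a Counter of (distance, degree) pairs over the
    depth-alpha ego network, with degrees taken inside the ego network."""
    # 1. BFS from the root up to depth alpha to collect ego network nodes.
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if dist[v] == alpha:
            continue
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    # 2. Degree of each node in the induced ego network (edges between
    #    included nodes only; may be smaller than the full-graph degree).
    ego_degree = {v: sum(1 for u in adj[v] if u in dist) for v in dist}
    # 3. Histogram of (distance, degree) pairs.
    return Counter((dist[v], ego_degree[v]) for v in dist)
```

For instance, on a triangle with one pendant node, the pendant's depth-1 ego network contains only the pendant and its neighbor, so both contribute degree 1 inside the subgraph even though the neighbor has degree 3 in the full graph.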
Similarly to 1-WL, information from the neighbors of each vertex is aggregated at each iteration. However, 1-WL does not preserve distance information in the encoding due to the hashing step. Instead of hashing the degrees into equivalence classes, IGEL keeps track of the distance at which a degree is found, generating a more expressive encoding.
The cost of such additional expressiveness is a computational complexity that grows exponentially in the number of iterations. More precisely, the time complexity follows O(n · min(m, (d_max)^α)), with O(n · m) when α ≥ diam(G). In Appendix F, we provide a possible breadth-first search (BFS) implementation that is embarrassingly parallel and can thus be computed over p processors in O(n · min(m, (d_max)^α)/p) time.
The encoding produced by Algorithm 2 can be described as a multi-set of path length and degree pairs (l, d) in the α-depth ego network of v, E^α_v:

e^α_v = {{(l_u, d_u) | u ∈ E^α_v}},

where l_u is the path length between v and u and d_u is the degree of u within E^α_v, which also results in exponential space complexity. However, e^α_v can be represented as a (sparse) vector IGEL^α_vec(v), where the i-th index contains the frequency of the path length and degree pair (l, d), as shown in Figure 1:

IGEL^α_vec(v)[i_{(l,d)}] = |{{u ∈ E^α_v | (l_u, d_u) = (l, d)}}|,

which has linear space complexity O(α · n · d_max), conservatively assuming every node requires d_max parameters at every depth up to α from the center of the ego network, where d_max = O(n) in the worst case when a vertex is fully connected. Note that in practice, we may normalize raw counts by, e.g., applying log1p-normalization, and for real-world graphs, d_max ≪ n, as the probability of larger degrees often decays exponentially [44]. Finally, complexity can be further reduced by making use of sparse vector implementations.
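Concretely, flattening the multi-set of (distance, degree) pairs into the vector form can be sketched as follows. The indexing scheme (row l, column d − 1) and the log1p normalization of counts are one plausible layout; the paper's exact implementation may differ.

```python
import math
from collections import Counter

def igel_vector(pairs, alpha, d_max):
    """Flatten a multi-set of (distance, degree) pairs into a dense vector of
    length (alpha + 1) * d_max, log1p-normalizing the raw counts."""
    vec = [0.0] * ((alpha + 1) * d_max)
    for (dist, deg), count in Counter(pairs).items():
        vec[dist * d_max + (deg - 1)] = math.log1p(count)
    return vec

# Toy multi-set: every (distance, degree) pair occurs once, except
# degree 3 at distance 2, which occurs twice.
pairs = [(0, 2), (1, 2), (1, 3), (2, 3), (2, 3)]
vec = igel_vector(pairs, alpha=2, d_max=3)
```

Most entries of the resulting vector are zero, which is why a sparse vector implementation reduces space in practice.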
We finish this section with two remarks. First, the produced encodings can only differ for values of α < diam(G): for α ≥ diam(G), the ego network of every vertex v is the entire graph, and thus the resulting encodings no longer change, i.e., e^α_v = e^{α+1}_v for α ≥ diam(G). Second, the sparse formulation in Equation (5) can be understood as an inductive analogue of Weisfeiler-Lehman graph kernels [45], as we explore in Appendix B.

Which Graphs Are IGEL-Distinguishable?
We now present results about the expressive power of IGEL, extending preliminary results on 1-WL presented in [46]. We discuss the increased expressivity of IGEL with respect to 1-WL, and identify upper bounds on expressivity via graphs that are also indistinguishable under MATLANG and the 3-WL test. We assess expressivity by studying whether two graphs can be distinguished by comparing the encodings obtained by the k-WL test and the IGEL encodings for a given value of α. Similarly to the definition of k-WL equivalence, we say that G_1 = (V_1, E_1) and G_2 = (V_2, E_2) are IGEL-equivalent if the sorted multi-sets of node representations are the same for G_1 and G_2:

{{e^α_v | v ∈ V_1}} = {{e^α_v | v ∈ V_2}}.

Distinguishability on 1-WL Equivalent Graphs
We first show IGEL is more powerful than 1-WL, following Lemmas 1 and 2:

Lemma 1. IGEL is at least as expressive as 1-WL: any two graphs G_1, G_2 that are distinguished by 1-WL in k iterations are also distinguished by IGEL with α = k + 1.

Lemma 2. There exist at least two non-isomorphic graphs, G_1, G_2, that IGEL can distinguish but that 1-WL cannot distinguish; i.e., G_1 ≡_{1-WL} G_2 while their IGEL encodings differ.

First, we formally prove Lemma 1, i.e., that IGEL is at least as expressive as 1-WL. For this, we consider a variant of 1-WL which removes the hashing step. This modification can only increase the expressive power of 1-WL and makes it possible to directly compare such (possibly more expressive) 1-WL encodings with the encodings generated by IGEL. Intuitively, after k color refinement iterations, 1-WL considers nodes at k hops from each node, which is equivalent to running IGEL with α = k + 1, i.e., using ego networks that include information of all nodes that 1-WL would visit.

Proof of Lemma 1. For convenience, let c^k_v = {{c^{k−1}_u | u ∈ N_G(v) ∪ {v}}}, with c^0_v = {{d_G(v)}}, be a recursive definition of Algorithm 1 where hashing is removed. Since the hash is no longer computed, the nested multi-sets contain strictly the same or more information as in the traditional 1-WL algorithm.

For IGEL to be less expressive than 1-WL, it must hold that there exist two graphs distinguished by 1-WL whose IGEL encodings match. We denote the equally or more expressive hash-free variant of the 1-WL test 1-WL*, such that c^k_{v_1} = {{{{...{{d_G(v_1)}}, {{d_G(u) ∀u ∈ N^1_{G_1}(v_1)}}...}}}}, nested up to depth k. To avoid nesting, the multi-set of nested degree multi-sets can be rewritten as the union of degree multi-sets by introducing an indicator variable for the iteration number at which a degree is found:

c^k_{v_1} = {{(i, d_{G_1}(u)) | 0 ≤ i ≤ k, u ∈ N^i_{G_1}(v_1)}},

where N^i_{G_1}(v_1) denotes the nodes at distance at most i from v_1, with N^0_{G_1}(v_1) = {v_1}. At each step i, we introduce information about nodes up to distance i of v_1. Furthermore, by construction, nodes will be visited on every subsequent iteration; i.e., for c^2_{v_1}, we will observe (2, d_{G_1}(u)) for every u already observed as (1, d_{G_1}(u)). The flattened representation provided by 1-WL* is still equally or more expressive than 1-WL, as it removes hashing and keeps track of the iteration at which a degree is found.

Let IGEL-W be a less expressive version of IGEL that does not include edges between nodes at k + 1 hops of the ego network root. Now, consider the encodings c^k_{v_1} and c^k_{v_2} from 1-WL*, and let α = k + 1 so that IGEL-W considers degrees by counting edges found at up to k + 1 hops of v_1 and v_2. Assume that the IGEL-W encodings of v_1 and v_2 match. This implies that all degrees, and the iteration counts given by the distance indicator variable at which the degrees are found, also match, so c^k_{v_1} = c^k_{v_2} under 1-WL*. Hence, whenever 1-WL (and thus 1-WL*) distinguishes two graphs, the IGEL-W encodings must also differ. Therefore, by extension, IGEL is at least as expressive as 1-WL.
To prove Lemma 2, we show graphs that IGEL can distinguish despite being indistinguishable by 1-WL and the MATLANG sub-languages ML_1 and ML_2. In Section 4.1.1, we provide an example where IGEL distinguishes 1-WL/ML_1 equivalent graphs, while Section 4.1.2 shows that IGEL also distinguishes graphs that are indistinguishable in the strictly more expressive ML_2 language.

ML_1/1-WL Expressivity: Decalin and Bicyclopentyl

Decalin and Bicyclopentyl (in Figure 2) are two molecules whose graph representations are not distinguishable by 1-WL despite their simplicity. The graphs are non-isomorphic, but 1-WL identifies 3 equivalence classes in both graphs: central nodes with degree 3 (purple), their neighbors (blue), and peripheral nodes farthest from the center (green). Figure 3 shows the resulting IGEL encoding for the central node (in purple) using α = 1 (top) and α = 2 (bottom). For α = 1, the encoding field of IGEL is too narrow to identify substructures that distinguish the two graphs (Figure 3, top). However, for α = 2, the representations of central nodes differ between the two graphs (Figure 3, bottom). In this example, any value of α ≥ 2 can distinguish between the graphs.

ML_2 Expressivity: Cospectral 4-Regular Graphs
IGEL can also distinguish ML_2-equivalent graphs. Recall that ML_2 is strictly more expressive than 1-WL, as described in Section 2.2. It is known that d-regular graphs of the same cardinality are indistinguishable by the 1-WL test in Algorithm 1, and that co-spectral graphs cannot be distinguished in ML_2 ([13], Proposition 5.1).
In contrast to 1-WL, we can find examples of non-isomorphic d-regular graphs that IGEL can distinguish, as the generated encodings will differ for any pair of graphs whose sets of sorted degree sequences do not match at any path length less than α. Furthermore, we can find examples of co-spectral graphs that can be distinguished by IGEL encodings. In both cases, the intuition is that the ego network encoding generated by IGEL discards the edges that connect nodes beyond the subgraph. Consequently, the generated encoding depends on the actual connectivity at the boundary of the ego network, providing IGEL with increased expressivity compared to other methods.
Figure 4 shows two co-spectral 4-regular graphs taken from [47], and the structures obtained using IGEL encodings with α = 1 on each graph. The 1-WL test assigns a single color to all nodes and stabilizes after one iteration. Likewise, any ML_2 sentences executed as operations on the adjacency matrices of both graphs produce equal results. However, IGEL identifies four different structures (denoted a, b, c, d). Since the IGEL encodings of the two graphs do not match, they are distinguishable. This is the case for any value of α ≥ 1.
Figure 4. IGEL encodings for two co-spectral 4-regular graphs from [47]. IGEL distinguishes 4 kinds of structures within the graphs (associated with every node as a, b, c, and d). The two graphs can be distinguished since the encoded structures and their frequencies do not match.

Indistinguishability on Strongly Regular Graphs
We identify an upper bound on the expressive power of IGEL: non-isomorphic strongly regular graphs (SRGs) with equal parameters. In this case, the two non-isomorphic graphs are indistinguishable by IGEL. Formally, an SRG with parameters srg(n, d, λ, γ) is a d-regular graph on n vertices in which every pair of adjacent vertices shares λ common neighbors and every pair of non-adjacent vertices shares γ common neighbors. Consider the depth-1 ego network E^1_v of any vertex v, which contains v and its d neighbors. By definition, the d neighbors of v each have λ shared neighbors with v, plus an edge with v, and E^1_v does not include the γ-related edges beyond its neighbors. Thus, for SRGs G_1, G_2 with equal n, d, and λ, the IGEL encodings match, and IGEL with α ∈ {1, 2} can only distinguish between different values of n, d, and λ.
Our findings show that IGEL is a powerful permutation-equivariant representation (see Lemma A1), capable of distinguishing 1-WL equivalent graphs such as those in Figure 4, which, being co-spectral, are only distinguishable in strictly more powerful MATLANG sub-languages than those matching 1-WL [13]. Furthermore, in Appendix D, we connect IGEL to SPNNs [40] and show that IGEL is strictly more expressive than SPNNs on unattributed graphs. Finally, we note that the upper bound on strongly regular graphs is a hard ceiling on expressivity, since SRGs are commonly known to be indistinguishable by 3-WL [14,21,23,43,49]. IGEL formally reaches this expressivity upper bound on SRGs, distinguishing SRGs with different values of n, d, and λ. These results are similar to those of subgraph methods implemented within MP-GNN architectures, such as nested GNNs [20] and GNN-AK [21], which are known to be no less powerful than 3-WL, and the ESAN framework when leveraging ego networks with root-node flags as a subgraph sampling policy (EGO+), which is as powerful as 3-WL on SRGs [22]. However, in contrast to these methods, IGEL cannot distinguish graphs that differ only in γ. Furthermore, in Section 5, we study IGEL on a series of empirical tasks, finding that the expressivity difference that IGEL exhibits on SRGs does not have significant implications in downstream tasks.
Summarizing, Lemma 1 shows that IGEL can distinguish any graphs that 1-WL can distinguish, and Lemma 2 shows that IGEL distinguishes non-isomorphic graphs that are indistinguishable by the 1-WL test. We derive a precise expressivity upper bound for IGEL on SRGs, showing that IGEL cannot distinguish SRGs with equal parameters, nor SRGs differing only in γ. Overall, the expressive power of IGEL on SRGs is similar to that of other state-of-the-art methods, including k-hop GNNs [16], GSNs [19], NGNNs [20], GNN-AK [21], and ESAN [22].

Empirical Validation
We evaluate IGEL as a sparse local ego network encoding following Equation (5), extending vertex/edge attributes on six experimental tasks: graph classification, graph isomorphism detection, graphlet counting, graph regression, link prediction, and vertex classification. With our experiments, we seek to answer the following empirical questions:
Q1. Does IGEL improve MP-GNN performance on standard graph-level tasks?
Q2. Can we empirically validate our results on the expressive power of IGEL compared to 1-WL?
Q3. Are IGEL encodings appropriate features to learn on unattributed graphs?
Q4. How do GNN models compare with more traditional neural network models when they are enriched with IGEL features?

Overview of the Experiments
For graph classification, isomorphism detection, and graphlet counting, we reproduce the benchmark proposed by [26] on eight graph data sets. For each task and data set, we introduce IGEL as vertex/edge attributes and compare the performance of including or excluding IGEL on several GNN architectures, including linear and MLP baselines (without message passing), GCNs [31], GATs [33], GINs [8], Chebnets [30], and GNNML3 [26]. We measure whether IGEL improves inductive MP-GNN performance while validating our theoretical expressivity analysis (Q1 and Q2). We also evaluate on the ZINC-12K and PATTERN data sets from Benchmarking GNNs [50] to test IGEL on larger, real-world data sets.
For link prediction, we experiment on two unattributed social network graphs. We train self-supervised embeddings on IGEL encodings and compare them against standard transductive vertex embeddings. Transductive methods require that all nodes in the graph are known at training and inference time, while inductive methods can be applied to unseen nodes, edges, and graphs. Inductive methods may be applied in transductive settings, but not vice versa. Since IGEL is label and permutation invariant, its output is an inductive representation. We detail the self-supervised embedding approach in Appendix H. We compare our results against strong vertex embedding models, namely DeepWalk [51] and Node2Vec [52], seeking to validate IGEL as a theoretically grounded structural feature extractor on unattributed graphs (Q3).
For vertex classification, we train DNN models without message passing, using IGEL encodings and vertex attributes as inputs, on an inductive protein-to-protein interaction (PPI) multi-label classification problem. We evaluate the impact of introducing IGEL on top of vertex attributes and compare the performance of IGEL-inclusive models with MP-GNNs (Q4).

Experimental Methodology
On graph-level tasks, we introduce IGEL encodings concatenated to existing vertex features. We also introduce IGEL as edge-level features, representing an edge as the element-wise product of the node-level IGEL encodings at each end of the edge. Both are added to the best performing model configurations found by [26] without any hyper-parameter tuning (e.g., number of layers, hidden units, choice of pooling and activation functions). We evaluate performance differences with and without IGEL on each task, data set, and model on 10 independent runs, measuring statistical significance of the differences through paired t-tests. On benchmark data sets, we reuse the best reported GIN-AK+ baseline from [21] and simply introduce IGEL as additional node features with α ∈ {1, 2}, with no hyper-parameter changes.
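The edge-feature construction described above reduces to an element-wise product of the two endpoint encodings; a small sketch, where the input vectors stand in for precomputed node-level IGEL encodings and the values are made up for illustration:

```python
def igel_edge_feature(igel_u, igel_v):
    """Edge-level IGEL feature: element-wise product of the node-level IGEL
    encodings at each endpoint of the edge."""
    return [a * b for a, b in zip(igel_u, igel_v)]

# Hypothetical precomputed node encodings for the two endpoints of an edge.
feat = igel_edge_feature([1.0, 0.0, 2.0], [0.5, 3.0, 2.0])
```

Because multiplication is commutative, the feature is invariant to the order of the endpoints, which matches the undirected graphs used in the benchmarks.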
On vertex- and edge-level tasks, we report the best performing configurations after hyper-parameter search. Each configuration is evaluated on five independent runs. Our results are compared against strong standard baselines from the literature, and we provide a breakdown of the best-performing hyper-parameters in Appendix A.

Results and Notation
The following formatting denotes significant (as per paired t-tests) positive (in bold), negative (in italic), and insignificant differences (no formatting) after introducing IGEL, with the best results per task/data set underlined.

Graph Classification: TU Graphs
Table 1 shows graph classification results for the TU molecule data sets [28]. In each data set, nodes represent atoms and edges represent their atomic bonds. The graphs contain no edge features, while node features are a one-hot encoded vector of the atom represented by each node. We evaluate differences in mean accuracies with and without IGEL through paired t-tests, denoting significance levels of p < 0.01 (marked *) and p < 0.0001. Our results show that introducing IGEL in the Mutag and Proteins data sets improves the performance of all MP-GNN models, including GNNML3, contributing to answering Q1. By introducing IGEL in those data sets, MP-GNN models reach performance similar to GNNML3. IGEL achieves this at O(n · min(m, (d_max)^α)) preprocessing cost, compared to the O(n^3) worst-case eigendecomposition cost associated with GNNML3's spectral supports.
Additionally, since IGEL is an inductive method, the worst-case O(n · (d_max)^α) cost when α < diam(G) is only incurred when the graph is first processed. Afterwards, encodings can be reused, recomputing them only for nodes within α hops of new nodes or updated edges. This contrasts with GNNML3's spectral supports, which are computed on the adjacency matrix and would require a full recalculation when nodes or edges change.
On the Enzymes and PTC data sets, results are mixed: for all models other than GNNML3, IGEL either significantly improves accuracy (on MLPNet, GCN, and GIN on Enzymes) or does not have a negative impact on performance. GAT outperforms GNNML3 on PTC, while GNNML3 is the best performing model on Enzymes. Additionally, GNNML3 performance degrades when IGEL is introduced on the Enzymes and PTC data sets. We believe this degradation may be caused by overfitting due to a lack of additional parameter tuning, as GNNML3 models are deeper on Enzymes and PTC (four GNNML3 layers) than on Mutag and Proteins (three and two GNNML3 layers, respectively). It may be possible to improve GNNML3 performance with IGEL by re-tuning model parameters, but due to computational constraints we do not test this hypothesis. Nevertheless, we observe that all models improve on at least two different data sets after introducing IGEL without hyper-parameter tuning, which we believe indicates our results are a conservative lower bound on model performance.
We also compare the best IGEL results from Table 1 with state-of-the-art methods for improving expressivity. Table 2 summarizes the reported results for k-hop GNNs [16], GSNs [19], nested GNNs [20], ID-GNNs [39], GNN-AK [21], and ESAN [22]. When we compare IGEL and the best performing baseline for every data set, none of the differences are statistically significant (p > 0.01), except for ID-GNN on Proteins (where p = 0.009). Overall, our results show that incorporating IGEL encodings into a vanilla GNN yields performance comparable to state-of-the-art methods (Q2).

Graph Isomorphism Detection
Table 3 shows isomorphism detection results on two data sets: Graph8c (as described in Appendix A) and EXP [53]. On the Graph8c data set, we identify isomorphisms by counting the number of graph pairs for which randomly initialized MP-GNN models produce equivalent outputs. Equivalence is measured by the Manhattan distance between graph representations over 100 independent initialization runs, further described in Appendix A. The EXP data set contains 1-WL equivalent pairs of graphs, and the objective is to identify whether or not they are isomorphic. We report model accuracies on the binary classification task of distinguishing non-isomorphic graphs that are 1-WL equivalent. On Graph8c, introducing IGEL significantly reduces the number of graph pairs erroneously identified as isomorphic for all MP-GNN models. Furthermore, IGEL allows a linear baseline, employing a sum readout over input feature vectors followed by a projection onto a 10-component space, to identify all but 1571 non-isomorphic pairs, compared to the 4196 errors of GCNs or the 1827 errors of GATs without IGEL.
We also find that all Graph8c graphs can be distinguished if the IGEL encodings for α = 1 and α = 2 are concatenated. We do not study the expressivity of concatenating combinations of α values in this work, but based on our results we hypothesize that it produces strictly more expressive representations.
On EXP, introducing IGEL is sufficient to correctly identify all non-isomorphic graphs for all standard MP-GNN models, as well as the MLP baseline. Furthermore, the linear baseline reaches 97.25% classification accuracy with IGEL, despite only computing a global sum readout before a single-output fully connected layer. Results on Graph8c and EXP validate our theoretical claims that IGEL is more expressive than 1-WL and can distinguish graphs that would be indistinguishable under 1-WL, answering Q2.
We also evaluate IGEL on the SR25 data set (described in Appendix A), which contains 15 non-isomorphic strongly regular graphs on 25 vertices, known to be indistinguishable by 3-WL, on which we can empirically validate Theorem 1. In [26], it was shown that all models in our benchmark are unable to distinguish any of the 105 non-isomorphic graph pairs in SR25. Introducing IGEL does not improve distinguishability, as expected from Theorem 1.

Graphlet Counting
We evaluate IGEL on a graphlet counting regression task, training models to minimize mean squared error (MSE) on normalized graphlet counts. Counts are normalized by the standard deviation of the counts in the training set, as in [26].
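The target normalization and training objective can be sketched as below. The function names are illustrative, and the choice of population standard deviation (`pstdev`) is an assumption; [26] only states that counts are scaled by the training-set standard deviation.

```python
import statistics

def normalize_counts(train_counts, counts):
    # scale regression targets by the standard deviation of the
    # training-set graphlet counts, so MSE is comparable across tasks
    sd = statistics.pstdev(train_counts)
    return [c / sd for c in counts]

def mse(predictions, targets):
    # mean squared error, the training objective for graphlet counting
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
```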
In Table 4, we show the results of introducing IGEL in five graphlet counting tasks on the RandomGraph data set [54]. We count 3-star, triangle, tailed triangle, and 4-cycle graphlets, as shown in Figure 6, plus a custom structure with 1-WL expressiveness proposed in [26] to evaluate GNNML3. We highlight statistically significant differences when introducing IGEL (p < 0.0001).

Introducing IGEL improves the ability of 1-WL GNNs to recognize triangles, tailed triangles, and the custom 1-WL graphlets from [26]. Stars can be identified by all baselines, and introducing IGEL only produces statistically significant differences on the linear baseline. Interestingly, IGEL on the linear model outperforms MP-GNNs without IGEL for star, triangle, tailed triangle, and custom 1-WL graphlets.
Introducing IGEL on the MLP baseline obtains the best performance (lowest MSE) on the triangle, tailed triangle, and custom 1-WL graphlets, even when compared to GNNML3 and subgraph GNNs, and including when IGEL encodings are used as input to GNNML3.
Results on the linear and MLP baselines are noteworthy, since neither baseline uses message passing, indicating that raw IGEL encodings may be sufficient for simple linear models to identify certain graph structures. For all graphlets except 4-cycles, introducing IGEL outperforms or matches GNNML3 at lower preprocessing and model training/inference costs, without the need for costly eigendecomposition or message passing, answering Q1 and Q2. IGEL moderately improves performance when counting 4-cycle graphlets, but the results are not competitive with GNNML3.

Benchmark Results with Subgraph GNNs
Given the favorable performance of IGEL compared to related subgraph-aware methods such as NGNN and ESAN, shown in Table 2, we also explore introducing IGEL on a subgraph GNN, namely GNN-AK, proposed by [21]. We follow a similar experimental approach as in previous experiments, reproducing the results of GNN-AK using GINs [8] with edge-feature support [55] as the base GNN (GINs are the best performing base model in three out of the four data sets, as reported in [21]) on two real-world benchmark data sets from benchmarking GNNs [50]: ZINC-12K (a graph regression task minimizing mean squared error ↓, where lower is better) and PATTERN (a graph classification task, where higher accuracy ↑ is better).
Without performing any additional hyper-parameter tuning or architecture search, we evaluate the impact of introducing IGEL with α ∈ {1, 2} on the best performing model configuration and code published by the authors of GNN-AK [21]. Furthermore, we also evaluate the setting in which subgraph information is not used, to assess whether IGEL can provide performance comparable to GNN-AK without changes to the network architecture. Table 5 summarizes our results. IGEL maintains or improves performance in both cases when introduced on a GIN, but only in the case of PATTERN do we find a statistically significant difference (p < 0.05). When introducing IGEL on a GIN-AK model, we find statistically significant improvements on ZINC-12K. Introducing IGEL on GIN-AK+ on PATTERN produces unstable losses on the validation set, with model performance showing larger variations across epochs. We believe this instability might explain the loss in performance, and that further hyperparameter tuning and regularization (e.g., tuning dropout to avoid overfitting on specific IGEL features) could improve model performance.
Finally, we note that, despite our constrained setup, introducing IGEL is also interesting from a runtime and memory standpoint. In particular, introducing IGEL on a GIN for PATTERN yields performance only 0.2% worse than its GNN-AK counterpart (86.711 vs. 86.877), while executing 3.87 times faster (62.21 s vs. 240.91 s per iteration) and requiring 20.51 times less memory (1.3 GB vs. 26.2 GB). This is in line with our theoretical analysis in Section 3, as IGEL can be computed once as a preprocessing step and then adds only a negligible cost to the input size of the first layer, which is amortized across multi-layer GNNs. Together with our results on graph classification, isomorphism detection, and graphlet counting, our experiments show that IGEL is also an efficient way of introducing subgraph information without architecture modifications on large real-world data sets.

Link Prediction
We also test IGEL on a link prediction task, following the approach of [52] to compare with well-known transductive node embedding methods on the Facebook and ArXiv AstroPhysics data sets [56]. We model the task as follows: for each graph, we generate negative examples (non-existing edges) by sampling random unconnected node pairs. Positive examples (existing edges) are obtained by removing half of the edges, keeping the pruned graph connected after edge removals. Both sets of vertex pairs are chosen to have the same cardinality. Note that keeping the graph connected is not required for IGEL; it is required by transductive methods, which fail to learn meaningful representations on disconnected graph components. We learn self-supervised IGEL embeddings with a DeepWalk [51] approach (described in Appendix H) and model link prediction as a logistic regression problem whose input is the representation of an edge, computed as the element-wise product of the IGEL embeddings of the vertices at each end of the edge without fine-tuning, which is the best edge representation reported in [52].
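The edge scoring step can be sketched as follows. The function names and the untrained logistic weights are illustrative assumptions; only the element-wise (Hadamard) product as the edge operator comes from the setup described above.

```python
import math

def edge_representation(emb_u, emb_v):
    # element-wise (Hadamard) product of the endpoint embeddings,
    # the best-performing edge operator reported in node2vec [52]
    return [a * b for a, b in zip(emb_u, emb_v)]

def predict_link(emb_u, emb_v, weights, bias=0.0):
    # logistic regression score over the edge representation
    z = bias + sum(w * x for w, x in zip(weights, edge_representation(emb_u, emb_v)))
    return 1.0 / (1.0 + math.exp(-z))
```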
In Table 6, we report AUC results averaged over five independent executions, compared against previously reported baselines. In this case, we perform hyper-parameter search, with results described in Appendix A. We provide additional details on the unsupervised parameters in our code repositories, also referenced in Appendix A.

Table 6. Area under the ROC curve (AUROC) link prediction results on Facebook and AP-arXiv. Embeddings learned on IGEL encodings outperform transductive methods. IGEL standard deviations are below 0.005.

Method         Facebook  arXiv
DeepWalk [51]  0.968     0.934
node2vec [52]  0.968     0.937

IGEL with α = 2 significantly outperforms standard transductive methods on both data sets. This is despite the fact that we compare against methods that are aware of node identities, and that several vertices can share the same IGEL encodings. Furthermore, IGEL is an inductive method that may be used on unseen nodes and edges, unlike the transductive DeepWalk and node2vec. We do not explore the inductive setting, as it would unfairly favor IGEL and cannot be directly applied to DeepWalk or node2vec.
Additionally, IGEL significantly underperforms when α = 1. We believe this is because, when α < 2, the model cannot assess whether two vertices are neighbors based on their IGEL representations. Overall, our link prediction results show that IGEL encodings can be used as a potentially inductive feature generation approach on unattributed networks, without degrading performance compared to standard vertex embedding methods, answering Q3.

Vertex Classification
In light of our graph-level results on graphlet counting, we evaluate to what extent a vertex classification task can be solved by leveraging IGEL structural features without message passing (Q4). We introduce IGEL in a DNN model and evaluate against several MP-GNN baselines. Our comparison includes the supervised baselines proposed by GraphSAGE [32], LGCL [34], and GAT [33] on a multi-label classification task on a protein-protein interaction (PPI) data set [32]. The aim is to predict 121 binary labels, given graph data where every vertex has 50 attributes. We tune the parameters of a multilayer perceptron (MLP) whose input features are either IGEL, vertex attributes, or both, through randomized grid search, and we provide a detailed description of the grid-search parameters in Appendix A.
Table 7 shows Micro-F1 scores averaged over five independent runs. Introducing IGEL alongside vertex attributes in an MLP can outperform standard MP-GNNs like GraphSAGE or LGCL despite not using message passing, and thus only having access to the attributes of a vertex without additional context from its neighbors, answering Q4. Furthermore, even though IGEL underperforms compared with GAT, the results reported in [33] use a three-layer GAT model propagating messages through 3 hops, while we observe the best IGEL performance with α = 1. We believe that the structural information at 1 hop captured by IGEL might be sufficient to improve performance on tasks where local information is critical, potentially reducing the number of hops required by downstream models (e.g., GATs).

Discussion and Conclusions
IGEL is a novel and simple vertex representation algorithm that increases the expressive power of MP-GNNs beyond the 1-WL test. Empirically, we found that IGEL can be used as a vertex/edge feature extractor in graph-, edge-, and node-level settings. On four different graph-level tasks, IGEL significantly improves the performance of nine graph representation models without requiring architectural modifications, including Linear and MLP baselines, GCNs [57], GATs [33], GINs [8], ChebNets [30], GNNML3 [26], and GNN-AK [21]. We introduce IGEL without performing hyper-parameter search on the existing baselines, which suggests that IGEL encodings are informative and can be introduced in a model without costly architecture search.
Although structure-aware message passing [16,20,21,23], substructure counts [19], identity [39], and subgraph pooling [22] may also be combined with existing MP-GNN architectures, IGEL reaches performance comparable to related models while simply augmenting the set of vertex-level features, without tuning model hyper-parameters. IGEL consistently improves performance on five data sets for graph classification, one data set for graph regression, two data sets for isomorphism detection, and five different graphlet structures in a graphlet counting task. Furthermore, even though MP-GNNs with learnable subgraph representations are expected to be more expressive, since they can freely learn structural characteristics according to optimization objectives, our results show that introducing IGEL produces comparable results on three different domains and improves the performance of a strong GNN-AK+ baseline on ZINC-12K. Additionally, introducing IGEL on a GIN model on the PATTERN data set achieves 99.8% of the performance of a strong GIN-AK+ baseline while running 3.87 times faster and requiring 20.51 times less memory. This computational efficiency is a key benefit of IGEL, as it only requires a single preprocessing step that extends vertex/edge attributes and can be cached during training and inference.
On link prediction tasks evaluated on two different graph data sets, IGEL-based DeepWalk [51] embeddings outperformed transductive methods based on embedding node identities, such as DeepWalk [51] and node2vec [52]. Finally, IGEL with α = 1 outperformed certain MP-GNN architectures like GraphSAGE [32] on a protein-protein interaction node-classification task, despite being used as an additional input to a DNN without message passing. More powerful MP-GNNs, namely a three-layer GAT [33], outperformed the IGEL-enriched DNN model, albeit at potentially higher computational costs due to the increased depth of the model.
A fundamental aspect of our analysis of the expressivity of IGEL is that the connectivity of nodes differs depending on whether they are analyzed as part of the subgraph (ego network) or within the entire input graph. In particular, the edges at the boundary of the ego network are a subset of the edges in the input graph. IGEL exploits this idea in combination with a simple encoding based on the frequencies of degrees and distances. This novel idea allows us to connect the expressive power of IGEL with other analyses, such as the 1-WL test, Weisfeiler-Lehman kernels (see Appendix B), shortest-path neural networks (see Appendix D), and MATLANG (see Appendix E). Furthermore, the ego network formulation allows us to identify an upper bound on expressivity in strongly regular graphs, matching recent findings on the expressivity of subgraph GNNs.
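The encoding idea just described can be sketched in a few lines. This is a simplified reading, with an assumed function name and graph representation; the exact encoding is specified by Algorithm 2 in the paper.

```python
from collections import Counter, deque

def igel_sketch(adj, root, alpha):
    # extract the ego network E^alpha_root via breadth-first search
    dist = {root: 0}
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        if dist[u] == alpha:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                frontier.append(w)
    # degrees are measured *within* the ego network: boundary edges
    # leaving it are dropped, which is the signal 1-WL does not see
    ego_degree = {u: sum(1 for w in adj[u] if w in dist) for u in dist}
    # the encoding is the frequency of each (distance, degree) pair
    return Counter((dist[u], ego_degree[u]) for u in dist)
```

For example, the root of a triangle and the center of a 3-path both have degree 2, yet their sketched encodings at α = 1 differ because the ego-network degrees of the neighbors differ.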
Although we have presented IGEL on unattributed graphs, the principle underlying its encoding can also be applied to labelled or attributed graphs. Appendix G outlines possible extensions in this direction. In Appendix G.3, we also connect IGEL with k-hop GNNs and GNN-AK, drawing a direct link between subgraph GNNs and our proposed encoding.
Overall, our results show that IGEL can be efficiently used to enrich network representations on a variety of tasks and data sets, and we believe it is an attractive baseline for expressivity-related tasks. This opens up interesting future research directions by showing that explicit network encodings like IGEL can perform competitively against more complex learnable representations while being more computationally efficient.

Appendix A.2. Hyper-Parameters and Experiment Details
Graph Level Experiments. We reproduce the benchmark of [26] without modifying model hyper-parameters for the tasks of graph classification, graph isomorphism detection, and graphlet counting. For classification tasks, the six models in Table 2 are trained with binary/categorical cross-entropy objectives, depending on the task. For graph isomorphism detection, we train GNNs as binary classification models on EXP [53], and identify isomorphisms on Graph8c (the simple 8-vertex graphs from http://users.cecs.anu.edu.au/~bdm/data/graphs.html, accessed on 16 September 2023) by counting the number of graph pairs for which randomly initialized MP-GNN models produce equivalent outputs. This evaluation procedure means models are not trained but simply initialized, following the approach of [26]. We also evaluate on the SR25 data set and find that IGEL encodings cannot distinguish the 105 strongly regular graph pairs with parameters SRG(25, 12, 5, 6) from http://users.cecs.anu.edu.au/~bdm/data/graphs.html (accessed on 16 September 2023). For the graphlet counting regression task on the RandomGraph data set [54], we train models to minimize mean squared error (MSE) on the normalized graphlet counts for five types of graphlets.
On all tasks, we experiment with α ∈ {1, 2} and optionally introduce a preliminary linear transformation layer to reduce the dimensionality of IGEL encodings. For every setup, we execute the same configuration 10 times with different seeds and compare runs with and without IGEL by measuring whether differences in the target metric (e.g., accuracy or MSE) are statistically significant, as shown in Tables 1 and 2. In Table A1, we provide the value of α used in our experimental results. Our results show that the choice of α depends on both the task and the model type. We believe these results may be applicable to subgraph-based MP-GNNs, and we will explore how different settings, graph sizes, and downstream models interact with α in future work.

Vertex and Edge-level Experiments. In this section, we break down the best performing hyper-parameters for the edge-level (link prediction) and vertex-level (node classification) experiments.
Link Prediction-The best performing hyperparameter configuration on the Facebook graph includes α = 2, learning t = 256-component vectors with e = 10 walks per node, each of length s = 150, and p = 8 negative samples per positive example for self-supervised negative sampling. On the arXiv citation graph, we find the best configuration at α = 2, t = 256, e = 2, s = 100, and p = 9.
Node Classification-In the node classification experiment, we analyze both encoding distances α ∈ {1, 2}. Other IGEL hyper-parameters are fixed after a small greedy search based on the best configurations from the link prediction experiments. For the MLP model, we perform a greedy architecture search over the number of hidden units, activation functions, and depth. Our results show scores averaged over five differently seeded runs with the same configuration obtained from hyperparameter search.
The best performing hyperparameter configuration for node classification uses α = 2 with t = 256-length embedding vectors, concatenated with node features as the input layer, trained for 1000 epochs in a three-layer MLP using ELU activations with a learning rate of 0.005. Additionally, we apply early stopping with a patience of 100 epochs, monitoring the F1 score on the validation set.
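The patience-based stopping rule can be sketched as follows; the helper name is hypothetical, and it assumes a higher-is-better monitored score such as validation F1.

```python
def early_stop_epoch(val_scores, patience):
    # return the epoch at which training halts: the monitored score has
    # not improved for `patience` consecutive epochs past its best value
    best, best_epoch = float("-inf"), -1
    for epoch, score in enumerate(val_scores):
        if score > best:
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_scores) - 1
```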
Reproducibility-We provide a replication folder in the code repository for the exact configurations used to run the experiments.

Appendix B. Relationship to Weisfeiler-Lehman Graph Kernels
The Weisfeiler-Lehman algorithm has inspired the design of graph kernels for graph classification, as proposed by [45,58]. In particular, Weisfeiler-Lehman graph kernels [45] produce representations of graphs based on the labels resulting from evaluating several iterations of the 1-WL test described in Algorithm 1. The resulting vector representation resembles IGEL encodings, particularly in vector form, IGEL^α_vec(v). In the case of WL kernels, the hash function maps sorted label multi-sets to their position in lexicographic order within the iteration. An iteration of the algorithm is illustrated in Figure A1. After k iterations, graphs are represented by counting the frequency of the distinct labels c^i_v observed at each iteration 1 ≤ i ≤ k. The resulting vector representation can be used to compare whether two graphs are similar and as an input to graph classification models. However, WL graph kernels can suffer from generalization problems, as the label compression step assigns different labels to multi-sets that differ in a single element. If a given graph contains a previously unseen label, each iteration will produce previously unseen multi-sets, propagating at each iteration step and potentially harming model performance. Recent works generalize certain iteration steps of WL graph kernels to address these limitations, introducing topological information [59] or Wasserstein distances between node attributes to derive labels [60]. IGEL can be understood as another work in that direction, removing the hashing step altogether and simply relying on structural features, i.e., (path length, degree) tuples, that can be inductively computed on unseen or mutating graphs.
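One relabeling iteration of the WL kernel, as illustrated in Figure A1, can be sketched as follows; the function name and dict-based graph representation are assumptions for illustration.

```python
def wl_iteration(adj, labels):
    # one Weisfeiler-Lehman relabeling step: each node's signature is its
    # own label plus the sorted multiset of its neighbors' labels
    signatures = {
        v: (labels[v], tuple(sorted(labels[u] for u in adj[v]))) for v in adj
    }
    # compress signatures to fresh integer labels in lexicographic order
    table = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
    return {v: table[signatures[v]] for v in adj}
```

Repeating this step k times and counting label frequencies per iteration yields the WL kernel feature vector; the compression `table` is exactly the hashing step that IGEL removes.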
Furthermore, if we can find A^α_v, it is also possible to find A^k_v for all k ∈ {1, ..., α}. Thus, we can compute the distance of every node to v by progressively expanding k, mapping the degrees of nodes to the value of k being processed by means of unary function application (f), and then computing the minimum value. Let ξ denote a distance vector containing entries for all vertices found at distance k of v. The distance l_{E^α_v}(u) between a given node u and v in E^α_v can then be computed in terms of ML_1 operations applied recursively, where f^α_min is a unary function that retrieves the minimum length. Thus, computing the distance is also possible in ML_1 given A^α_v. However, in order to compute A^α_v, we must be able to extract the subset of V contained in E^α_v. Following the approach of [13] (see Proposition 7.2), we may leverage a node indicator vector 1_{V'} ∈ {0, 1}^n for V' ⊆ V, where (1_{V'})_v = 1 when v ∈ V' and 0 otherwise. We can then compute A^α_v via a diagonal mask matrix M ∈ {0, 1}^{n×n} such that M_{i,i} = 1 when v_i ∈ V' and M_{i,i} = 0 otherwise. It follows from Equation (A3) that, if we are provided M, it is possible to compute A^α_v in ML_1. However, since indicator vectors are not part of ML, it is not possible to extract the ego subgraph for a given node v. As such, IGEL cannot be expressed within MATLANG unless indicator vectors are introduced. A natural question is whether it is possible to express IGEL with an operator that requires no knowledge of V', unlike indicator vectors, which require computing E^α_v = (V', E') beforehand. One possible approach is to only allow indicator vectors for single vertices, encoding any V' ⊆ V only if |V'| = 1. We denote single-vertex indicator vectors as one-hot_v, an operation that represents the one-hot encoding of v. Note that, for any V' ⊆ V, its indicator vector 1_{V'} can be computed as the sum of one-hot encoding vectors: 1_{V'} = ∑_{v∈V'} one-hot_v. Thus, introducing one-hot is as expressive as introducing indicator vectors.
We now express IGEL in terms of the one-hot operation. Consider the following ML_1 expression, where Z^0_v = one-hot_v. For Z^1_v, we obtain an indicator vector containing the neighbors of v, matching N^1_G(v), which is binarized (mapping non-zero values to 1, e.g., by applying f_bin, which returns 0 when x is 0 and 1 otherwise). Furthermore, when computed recursively for α steps, the Z^α_v implementation is also embarrassingly parallel, which means that, as noted in Section 3, it can be distributed over p processors with O(n · min(m, (d_max)^α)/p) time complexity. In this setting, messages are aggregated not only from the immediate neighbors of v, but from nodes at every distance 1 ≤ l ≤ α, and the update will need to pool over the messages passed onto v. This formulation matches the general form of k-hop GNNs [16] presented in Section 2.3. Furthermore, introducing distance and degree signals in the message passing can yield models analogous to GNN-AK [21], which explicitly embeds the distance of a neighboring node when representing the ego network root during message passing. As such, IGEL directly connects the expressivity of k-hop GNNs with the 1-WL algorithm and provides a formal framework to explore the expressivity of higher-order MP-GNN architectures.

Appendix H. Self-Supervised IGEL
We provide additional context for the application of IGEL as a representational input to methods such as DeepWalk [51]. This section describes in detail how self-supervised IGEL embeddings are learned. We also provide a qualitative analysis of IGEL in the self-supervised setting, using networks that are amenable to visualization. We focus on analyzing how α influences the learned representations.
IGEL & Self-Supervised Node Representations

IGEL can be easily incorporated into standard node representation methods like DeepWalk [51] or node2vec [52]. Due to its relative simplicity, integrating IGEL only requires replacing the input to the embedding method so that IGEL encodings are used rather than node identities. We provide an overview of the process of generating embeddings through DeepWalk, which involves (a) sampling random walks to capture the relationships between nodes and (b) training a negative-sampling-based embedding model in the style of word2vec [62] on the random walks to embed random walk information into a compact latent space.
Distributional Sampling through Random Walks-First, we sample π random walks from each vertex in the graph. Walks are generated by selecting the next vertex uniformly at random from the neighbors of the current vertex. Figure A3 illustrates a possible random walk of length 9 in the graph of Figure 1. By randomly sampling walks, we obtain traversed node sequences to use as inputs for the negative sampling optimization objective described next.
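The sampling step can be sketched as below; the function name and dict-based graph representation are illustrative assumptions.

```python
import random

def sample_walks(adj, walks_per_node, length, rng=random):
    # DeepWalk-style sampling: the next vertex is drawn uniformly at
    # random from the neighbors of the current vertex
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks
```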
Negative Sampling Optimization Objective-Given a random walk ω, defined as a sequence of nodes of length s, we define the context C(v, ω) associated with the occurrence of vertex v in ω as the sub-sequence of ω containing the nodes that appear close to v, including repetitions. Closeness is determined by p, the size of the positive context window; i.e., the context contains all nodes that appear at most p steps before or after the node within ω. In DeepWalk, a skip-gram negative sampling objective learns to represent vertices appearing in similar contexts within random walks.
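The per-pair term of this objective can be sketched as follows. The function name and the plain dict of embeddings are assumptions; the term itself is the standard skip-gram negative-sampling log-likelihood.

```python
import math

def skipgram_ns_objective(center, context, negatives, emb):
    # per-pair negative-sampling log-likelihood:
    #   log sigma(e_c . e_o) + sum over negatives z of log sigma(-e_c . e_z)
    def dot(u, v):
        return sum(a * b for a, b in zip(emb[u], emb[v]))

    def log_sigmoid(x):
        # numerically stable log of the logistic function
        return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

    return log_sigmoid(dot(center, context)) + sum(
        log_sigmoid(-dot(center, z)) for z in negatives
    )
```

Summing this term over all (center, context) pairs in all walks, with negatives drawn from the noise distribution, yields the global objective maximized by gradient ascent.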
Given a node v_c ∈ V in a random walk ω, our task is to learn embeddings that assign high probability to nodes v_o ∈ V appearing in the context C(v_c, ω) and lower probability to nodes not appearing in the context. Let σ(·) denote the logistic function. As we focus on the learned representation capturing these symmetric relationships, the probability of v_o being in C(v_c, ω) is given by a logistic function of the inner product of their embeddings. Table A2 summarizes the hyper-parameters of DeepWalk. Our global objective function is a negative sampling log-likelihood: for each of the π random walks and each vertex v_c at the center of a context in the random walk, we sum the terms corresponding to the positive cases for vertices found in the context and the expectation over z negative, randomly sampled vertices. Let P_n(V) be a noise distribution from which the z negative samples are drawn; our task is to maximize (A4) through gradient ascent.

Defining the IGEL-DeepWalk embedding function-In DeepWalk, for every vertex v ∈ V, there is a corresponding t-dimensional embedding vector e_v ∈ R^t. As such, one can represent the embedding function as a product between a one-hot encoded vector one-hot_v ∈ B^n, corresponding to the index of the vertex, and an embedding matrix E_V ∈ R^{n×t}. Introducing IGEL requires modifying the shape of E_V to account for the t_vec-dimensional IGEL^α_vec(v) encoding vectors, whose shape depends on α and d_max, so that IGEL embeddings are computed as a weighted sum of the embeddings corresponding to each (path length, degree) pair. Let E^α_IGEL ∈ R^{t_vec × t_emb} define a structural embedding matrix with one embedding per (path length, degree) pair; a linear IGEL embedding can then be defined as the product of IGEL^α_vec(v) and E^α_IGEL.

Since the definition of IGEL^α_emb(v) is differentiable, it can be used as a drop-in replacement for e_v in Equation (A4).
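The linear IGEL embedding can be sketched as a sparse vector-matrix product; the function name and list-of-lists matrix layout are assumptions for illustration.

```python
def igel_embedding(igel_vec, E):
    # weighted sum of structural embeddings: row i of E corresponds to one
    # (path length, degree) pair, weighted by its frequency in igel_vec
    t_emb = len(E[0])
    out = [0.0] * t_emb
    for i, count in enumerate(igel_vec):
        if count:
            for j in range(t_emb):
                out[j] += count * E[i][j]
    return out
```

Because IGEL encodings are sparse, skipping zero entries keeps the lookup cheap even when t_vec is large.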

Definition 3. An n-vertex d-regular graph is strongly regular, denoted SRG(n, d, λ, γ), if all adjacent vertices have λ common neighbors and all non-adjacent vertices have γ common neighbors.

Remark 3. For any G = SRG(n, d, λ, γ), diam(G) ≤ 2 [48].

Theorem 1. IGEL cannot distinguish two SRGs when n, d, and λ are the same, for any values of γ (equal or not).

Figure 6. Graphlet types in the counting task.

Figure A1. One iteration of the Weisfeiler-Lehman graph kernel, where input labels are node degrees. Given an input labeling, one iteration of the kernel computes a new set of labels based on the sorted multi-sets produced by aggregating the labels of each node's neighbors.

Figure A3. Example of a random walk starting at the green node and finishing at the purple node. Nodes contain the time-step at which they were visited. The context of a given vertex is any node whose time-step is within the context window of that node.
No two such graphs, regardless of their values of γ, can be distinguished by ≡¹_IGEL or ≡²_IGEL.
Proof. Recall that, for any graph G, IGEL encodings are equal for all α ≥ diam(G), and, per Remark 3, SRGs have diameter two or less. Let G = SRG(n, d, λ, γ); only α ∈ {1, 2} produce different encodings for the nodes of G. By construction, every v in G has encoding e^α_v, so the encoding of G is e^α_G = {{e^α_v}}_n. Furthermore, {{e^1_v}}_n only encodes n, d, and λ, and {{e^2_v}}_n only encodes n and d, as can be seen by expanding e^α_v in Algorithm 2.
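Definition 3 can be checked directly on small graphs; the function name and dict-based adjacency are assumptions, and the check simply verifies regularity plus the common-neighbor conditions.

```python
def is_strongly_regular(adj, n, d, lam, gam):
    # verify the SRG(n, d, lambda, gamma) definition: the graph is
    # d-regular, adjacent pairs share lam common neighbors, and
    # non-adjacent pairs share gam common neighbors
    verts = sorted(adj)
    if len(verts) != n or any(len(adj[v]) != d for v in verts):
        return False
    for u in verts:
        for v in verts:
            if u == v:
                continue
            common = len(set(adj[u]) & set(adj[v]))
            expected = lam if v in adj[u] else gam
            if common != expected:
                return False
    return True
```

For instance, the 5-cycle is SRG(5, 2, 0, 1): adjacent vertices share no neighbors, while non-adjacent vertices share exactly one.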

Table 1. Per-model classification accuracy on the TU data sets. Each cell shows the average accuracy for the model and data set in that row and column, with IGEL (left) and without IGEL (right).

Table 3. Graph isomorphism detection results. The IGEL column denotes whether IGEL is used in each configuration. For Graph8c, we report graph pairs erroneously detected as isomorphic. For EXP-classify, we show the accuracy of distinguishing non-isomorphic graphs in a binary classification task.

Table 4. Graphlet counting results. Each cell shows the mean test set MSE (lower is better), with statistically significant (p < 0.0001) results highlighted and the best results per task underlined. For comparison, we also report strong literature results from two subgraph GNNs: GNN-AK+ using a GIN base [21] and SUN [23].

Table 5. Means and standard deviations of the evaluation metrics on the real-world benchmark data sets in combination with GNN-AK [21], highlighting positive and negative statistically significant (p < 0.05) results when IGEL is added, with the best per-data-set results underlined.

Table A1. Values of α used when introducing IGEL in the best reported configuration for the graphlet counting and graph classification tasks. The table is broken down by graphlet type (upper section) and graph classification tasks on the TU data sets (lower section).