Modeling Tree-like Heterophily on Symmetric Matrix Manifolds

Tree-like structures, characterized by hierarchical relationships and power-law distributions, are prevalent in a multitude of real-world networks, ranging from social networks to citation networks and protein–protein interaction networks. Recently, there has been significant interest in utilizing hyperbolic space to model these structures, owing to its capability to represent them with diminished distortions compared to flat Euclidean space. However, real-world networks often display a blend of flat, tree-like, and circular substructures, resulting in heterophily. To address this diversity of substructures, this study aims to investigate the reconstruction of graph neural networks on the symmetric manifold, which offers a comprehensive geometric space for more effective modeling of tree-like heterophily. To achieve this objective, we propose a graph convolutional neural network operating on the symmetric positive-definite matrix manifold, leveraging Riemannian metrics to facilitate the scheme of information propagation. Extensive experiments conducted on semi-supervised node classification tasks validate the superiority of the proposed approach, demonstrating that it outperforms comparative models based on Euclidean and hyperbolic geometries.


Introduction
The prevalence of hierarchical tree-like structures, characterized by power-law distributions, is a ubiquitous phenomenon observed across various real-world applications, encompassing domains from social networks [1,2] to data mining [3] and recommendation systems [4]. This pervasive structural pattern has garnered significant attention within the realms of computer science and network analysis due to its profound implications for comprehending network dynamics, functionality, and evolution [5–7].
In recent years, there has been a burgeoning interest among researchers in employing hyperbolic space to model tree structures. In contrast to conventional Euclidean spaces with zero curvature, hyperbolic spaces, endowed with negative curvature, offer a more faithful measure of inter-nodal distances within a tree. Moreover, the intrinsic property of hyperbolic space to manifest exponential expansion aligns seamlessly with the exponential proliferation inherent in tree growth dynamics.
The complexities inherent in real-world networks often entail a broad spectrum of structural motifs, encompassing flat, tree-like, and circular substructures, thereby giving rise to heterophily within the network. Heterophily contrasts with homophily, where nodes sharing similar attributes tend to cluster together. As depicted in Figure 1, within the overarching tree-like structure, the diverse properties of local substructures yield a variety of graphs. The left graph shows cluster-forming sub-trees, reflecting homophily, while the right graph exhibits hierarchical sub-trees, indicative of heterophily. Hyperbolic spaces offer a nuanced depiction of hierarchical structures and exponential growth dynamics, whereas Euclidean spaces are valued for their simplicity and intuitive geometric properties. Regardless of whether such networks are modeled in hyperbolic or Euclidean space, both approaches inevitably encounter local distortion, resulting in inaccurate modeling of distances between nodes. To mitigate these limitations, this study explores a more expressive space that can tolerate structural heterophily. The aim is to encode the information inherent in the graph topology into a continuous embedding space with less distortion, thus enhancing the performance of the downstream node classification task. From a geometric perspective, the quality of the embedding in geometric learning depends on the compatibility between the intrinsic graph structure and the embedding space. In light of this principle, we employ the Riemannian manifold of symmetric positive-definite matrices to embed node representations. As shown in Figure 2, symmetric spaces have a rich structure of totally geodesic subspaces, including flat (Euclidean) subspaces and tree-like (hyperbolic) subspaces, facilitating the representation of various substructures within a continuous space. In Riemannian geometry, a Riemannian metric is a fundamental concept used to define distances, angles, and other geometric properties on smooth manifolds. Various Riemannian metrics have been proposed to guarantee the geometric properties of the symmetric positive-definite (SPD) manifold, including the affine-invariant metric (AIM) [8], the log-Euclidean metric (LEM) [9,10], and the log-Cholesky metric (LCM) [11]. Equipped with these metrics, many Euclidean methods can be generalized to the domain of the Riemannian manifold.
In this study, we introduce a novel approach termed Riemannian graph convolutional neural network (RGCN), aimed at effectively capturing tree-like heterophily within graphs. RGCN operates on the Riemannian symmetric positive-definite matrix manifold and utilizes pullback techniques to generalize Riemannian metrics, such as LEM and LCM, to reconstruct the key components of graph convolutional neural networks. In particular, the pullback technique first maps the embedding from the SPD manifold onto the tangent space, proceeds with the operations of information propagation, and ultimately pulls the resulting embeddings back to the SPD manifold. These information propagation components encompass feature transformation, neighborhood aggregation, and non-linear activation, as detailed in prior work [12]. Specifically, the integration of feature transformation and non-linear activation enriches the expressive capacity of the SPD neural network. Concurrently, the iterative process of neighborhood aggregation updates the node embeddings by transporting neighboring features across the graph topology. Our experimental results on semi-supervised node classification tasks substantiate the superiority of the proposed methodology, which consistently surpasses comparative models grounded in Euclidean and hyperbolic geometries. In summary, the principal contributions of this research are the reconstruction of the core information propagation components of graph convolution on the SPD manifold via pullback of the LEM and LCM, and the empirical validation of the resulting model on semi-supervised node classification.

The rest of this paper is organized as follows. In Section 2, we briefly survey related work on GNNs and Riemannian manifolds of symmetric positive-definite matrices. Section 3 introduces some preliminaries. Section 4 presents the details of our proposed model. In Section 5, experimental results on eight benchmark datasets are shown and analyzed to highlight the benefits of our approach. Finally, we conclude the paper in Section 6.

Graph Neural Networks
Contemporary graph neural network (GNN) models commonly embrace the message-passing paradigm [13] to encode node representations, demonstrating significant achievements across tasks such as node classification [12], link prediction [14], and graph classification [15]. Advancements in this domain are typically categorized into two primary branches: spectral approaches [16,17] and spatial approaches [12,18]. Spectral approaches leverage graph spectral theory to define graph convolutional operations. Taking inspiration from [19], which suggests approximating spectral filters via truncated Chebyshev polynomial expansions of the graph Laplacian, ChebNet [17] introduces K-localized convolutions, laying the groundwork for convolutional neural networks on graphs. Expanding upon this, graph convolutional network (GCN) [12] restricts the K-localized convolution to K = 1, employing multiple layers to implement rich convolutional filter functions. To address both local and global consistency, deep graph convolutional neural network (DGCNN) [20] extends GCN by integrating a convolutional operation with a positive pointwise mutual information matrix. Conversely, spatial approaches directly aggregate neighborhood information around the central node. For example, GraphSAGE [12] introduces a versatile inductive framework that samples fixed-size local neighborhoods and aggregates their features using mean, long short-term memory (LSTM), or pooling mechanisms. Graph attention network (GAT) [21] enhances this aggregation process with attention mechanisms, assigning varying weights to aggregated neighborhoods through self-attention. Despite the robust theoretical foundation of spectral-based GCNs, spatial-based GCNs demonstrate superior efficiency, generality, and adaptability. For deeper insights into graph neural networks, numerous comprehensive surveys are available [22,23].
Researchers have observed that numerous graphs, including social networks and biological networks, often manifest a pronounced hierarchical structure [24]. Krioukov et al. [25] emphasized that the strong clustering and power-law degree distribution properties in such graphs can be ascribed to a latent hierarchy. Recent investigations have underscored the remarkable representational efficacy of hyperbolic spaces in modeling underlying hierarchies across diverse domains, such as taxonomies [26,27], knowledge graphs [28,29], images [30], semantic classes [31], and actions [32], yielding promising outcomes. Liu et al. [33] and Chami et al. [34] have proposed hyperbolic graph convolutional networks (HGCNs), extending GCNs to hyperbolic spaces for capturing hierarchical structures in graphs. Recently, a series of GNNs have emerged in these spaces, executing graph convolution on various Riemannian manifolds to accommodate diverse graph structures, such as hyperbolic space on tree-like graphs [25], spherical space on spherical graphs [35], and their Cartesian products [36,37].

Riemannian Manifold of Symmetric Positive-Definite Matrices
The utilization of symmetric positive-definite (SPD) matrices for data representation has been a topic of extensive investigation, primarily leveraging covariance matrices to capture the statistical dependencies among Euclidean features [38,39]. Recent research endeavors have shifted towards the development of foundational components of neural networks within the covariance matrix space. This includes techniques for feature transformation, such as mapping Euclidean features to covariance matrices using geodesic Gaussian kernels [40], non-linear operations applied to the eigenvalues of covariance matrices [41], convolutional operations employing SPD filters [42], and the Fréchet mean [43]. Furthermore, proposals for Riemannian recurrent networks [44] and Riemannian batch normalization [45] have been put forth. In comparison to these prior approaches, our proposal introduces an adaptive framework utilizing the pullback paradigm to construct the information propagation component with both LEM and LCM.

Preliminaries and Problem Definition
In this section, we initially introduce the preliminaries and notation essential for constructing an SPD embedding space. Subsequently, we define the problem of semi-supervised node classification on the SPD manifold.

Riemannian Manifold
A smooth manifold M extends the concept of a surface to higher dimensions. At each point x ∈ M, there is an associated tangent space T_xM, representing the first-order approximation of M around x, which is locally Euclidean. The Riemannian metric g_x(•, •) : T_xM × T_xM → R defined on the tangent space T_xM induces an inner product, enabling the derivation of geometric concepts. The pair (M, g) constitutes a Riemannian manifold. The transition between the tangent space and the manifold is facilitated by the exponential and logarithmic maps, denoted as exp_x : T_xM → M and log_x : M → T_xM, respectively. Here, exp_x(v) projects the vector v ∈ T_xM onto the manifold M at point x, while log_x(y) maps the point y ∈ M back to the tangent space T_xM. For further elucidation, please consult the mathematical references [46].

Geometry of SPD Manifold
SPD matrices constitute a subset of the Euclidean space R^{n(n+1)/2}, and various well-established Riemannian metrics exist on the SPD manifold. Here, we briefly provide an overview of two such metrics, namely, LEM [9] and LCM [11]. The matrix logarithms log : S^n_++ → S^n and log_lcm : S^n_++ → L^n are defined as follows:

log(S) = U ln(Λ) U^⊤,    log_lcm(S) = ϕ(L(S)) = ⌊L⌋ + ln(D(L)),

where S = UΛU^⊤ denotes the eigenvalue decomposition, L = L(S) represents the Cholesky decomposition, ϕ(L) = ⌊L⌋ + ln(D(L)) signifies a coordinate transformation from the L^n_+ manifold onto the Euclidean space L^n, ⌊L⌋ denotes the strictly lower triangular part of L, and D(L) represents the diagonal elements. It is noteworthy that, topologically, L^n ≃ S^n ≃ R^{n(n+1)/2}, as their metric topology stems from the Euclidean metric tensor. Leveraging the matrix logarithm, Arsigny et al. [9] propose LEM via Lie group translation, while Lin et al. [11] introduce LCM based on the Cholesky logarithm, establishing an isometry between S^n_++ and L^n_+. In this investigation, we regard LEM and LCM as fundamentally analogous at a high level of mathematical abstraction.
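To make the two logarithms concrete, the following NumPy sketch computes both maps; the function names `log_lem` and `log_lcm` are our own illustration, not an implementation from the paper:

```python
import numpy as np

def log_lem(S):
    """Matrix logarithm via eigendecomposition: log(S) = U ln(Lambda) U^T."""
    w, U = np.linalg.eigh(S)              # S = U diag(w) U^T, w > 0 for SPD
    return (U * np.log(w)) @ U.T

def log_lcm(S):
    """Cholesky logarithm: strict lower part of L plus ln of its diagonal."""
    L = np.linalg.cholesky(S)             # S = L L^T, diag(L) > 0
    return np.tril(L, -1) + np.diag(np.log(np.diag(L)))
```

Both outputs live in flat spaces (S^n and L^n, respectively), which is what allows Euclidean operations to be carried out before mapping back.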
The Riemannian metric and corresponding geodesic distance under the LEM are expressed as follows:

g^lem_S(V, W) = g^E( log_lem*,S(V), log_lem*,S(W) ),    d_lem(S_1, S_2) = ∥log(S_1) − log(S_2)∥_F,

where V, W ∈ T_S S^n_++ are tangent vectors, log_lem*,S(•) denotes the differential map of the matrix logarithm at S, g^E represents the standard Euclidean metric tensor, and ∥•∥_F stands for the Frobenius norm.
Similarly, the Riemannian metric and geodesic distance under the LCM are defined as

g^lcm_S(V, W) = g^E( (ϕ ∘ L)_{*,S}(V), (ϕ ∘ L)_{*,S}(W) ),    d_lcm(S_1, S_2) = ( ∥⌊L_1⌋ − ⌊L_2⌋∥²_F + ∥ln D(L_1) − ln D(L_2)∥²_F )^{1/2},

where (ϕ ∘ L)_{*,S}(•) denotes the differential of the Cholesky logarithm at S, and L_i = L(S_i) are the Cholesky factors of S_1 and S_2.
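Under LEM the distance is the Frobenius norm between matrix logarithms, while under LCM it combines the strictly lower-triangular and log-diagonal parts of the Cholesky factors. A NumPy sketch of both distances (function names are ours):

```python
import numpy as np

def dist_lem(S1, S2):
    """d_LEM(S1, S2) = ||log(S1) - log(S2)||_F."""
    def mlog(S):
        w, U = np.linalg.eigh(S)
        return (U * np.log(w)) @ U.T
    return np.linalg.norm(mlog(S1) - mlog(S2), 'fro')

def dist_lcm(S1, S2):
    """d_LCM from the Cholesky factors L_i = L(S_i)."""
    L1, L2 = np.linalg.cholesky(S1), np.linalg.cholesky(S2)
    low = np.linalg.norm(np.tril(L1, -1) - np.tril(L2, -1), 'fro') ** 2
    dia = np.linalg.norm(np.log(np.diag(L1)) - np.log(np.diag(L2))) ** 2
    return np.sqrt(low + dia)
```

Both distances reduce to Euclidean computations in the flattened coordinates, which is exactly why closed-form information propagation is tractable under these metrics.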

Problem Definition
In this study, we delve into semi-supervised graph representation learning within the SPD space. For clarity and without loss of generality, we define a graph G = (V, E, X), where V = {v_1, ..., v_n} represents the node set and E = {(v_i, v_j) | v_i, v_j ∈ V} denotes the edge set. The edges are encapsulated in the adjacency matrix A, where A_ij = 1 if (v_i, v_j) ∈ E and 0 otherwise. Each node v_i is characterized by a feature vector x_i ∈ R^d, and the matrix X ∈ R^{|V|×d} collects the features of all nodes. We now formalize the problem at hand.

Definition 1 (Semi-supervised graph representation learning in the SPD space). Given a graph G = (V, E, X), the objective of semi-supervised graph representation learning in the SPD space is to ascertain an encoding function Φ : V → Z that maps each node v to a point z within an SPD space. This encoding should encapsulate the intrinsic complexity of the graph structure, leveraging information from a subset of labeled nodes to enable accurate label predictions for unlabeled nodes.

SPD Graph Convolutional Networks
Our approach, RGCN, introduces an innovative graph neural network framework constructed on the SPD manifold. Drawing upon the foundation established by HGCN, we conduct graph convolution operations within the substituted Euclidean space and subsequently pull the embeddings back to the SPD manifold. Following the paradigm of GCN and HGNN architectures, RGCN comprises three essential components: feature transformation, neighborhood aggregation, and non-linear activation.

Mapping from Euclidean to SPD Spaces
RGCN initially projects input features onto the SPD manifold using the exp map. Let x^E ∈ R^d represent input Euclidean features, which may be generated by pre-trained Euclidean neural networks. The objective is to devise a transformation that maps these Euclidean features to a point within the SPD space. To achieve this, we learn a linear map that converts the input Euclidean features into a vector of dimension n(n + 1)/2, which is reshaped to form the lower triangle of an initially zero matrix A ∈ R^{n×n}. Subsequently, we apply the exponential map to transition the coordinates from the substituted Euclidean space to the original manifold S^n_++. For instance, in the case of LEM, we define a symmetric matrix U ∈ S^n such that U = A + A^⊤, followed by the exp map as the inverse of Equation (1):

S = exp(U) = Q exp(Λ′) Q^⊤,  where U = QΛ′Q^⊤ is the eigenvalue decomposition,

whereas for LCM, we directly employ the exp map as the inverse map of Equation (2):

S = S(Φ(A)) = Φ(A) Φ(A)^⊤,

where S(•) represents the inverse of the Cholesky decomposition, and Φ(L) = ⌊L⌋ + exp(D(L)) signifies a coordinate transformation from the Euclidean space L^n onto the L^n_+ manifold. This one-time mapping process enables input features to operate seamlessly within the SPD manifold.
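The input mapping can be sketched as follows. The weight matrix `W` is drawn at random here purely for illustration, whereas in the model it would be a learned parameter; the function names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 3, 8                                   # SPD size n x n, input feature dim d
W = rng.normal(size=(n * (n + 1) // 2, d)) * 0.1   # stand-in for the learned linear map

def to_spd_lem(x):
    """Map a Euclidean feature x in R^d to S^n_++ via the LEM exp map."""
    v = W @ x                                  # vector of dimension n(n+1)/2
    A = np.zeros((n, n))
    A[np.tril_indices(n)] = v                  # fill the lower triangle
    U = A + A.T                                # symmetrize: tangent element in S^n
    w, Q = np.linalg.eigh(U)
    return (Q * np.exp(w)) @ Q.T               # matrix exponential -> SPD

def to_spd_lcm(x):
    """Same map via the LCM exp map: S = Phi(A) Phi(A)^T."""
    v = W @ x
    A = np.zeros((n, n))
    A[np.tril_indices(n)] = v
    Phi = np.tril(A, -1) + np.diag(np.exp(np.diag(A)))
    return Phi @ Phi.T
```

In both cases the output is symmetric with strictly positive eigenvalues, so downstream layers can rely on the embedding staying on S^n_++.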

Feature Transformation
The feature transformation employed in the standard GCN maps the embedding space of one layer to that of the next, aiming to capture large neighborhood structures. In our approach, we aim to learn transformations of points on the SPD manifold. However, the SPD space lacks a vector space structure. To address this, we extend the framework provided by HGCN and derive transformations within this space. The core concept is to leverage the matrix exponential (exp) and logarithm (log) maps, enabling us to perform Euclidean transformations in the substituted Euclidean subspaces S^n or L^n. Assuming W is an n′ × n weight matrix, we define the SPD linear transformation as follows:

W ⊗ Z := exp( W log(Z) W^⊤ ),

where both the exp and log maps can be instantiated with either the log-Euclidean metric (LEM) or the log-Cholesky metric (LCM).
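Assuming the bilinear pullback form exp(W log(Z) W^⊤) under the LEM instantiation, the transformation can be sketched as follows (helper names are ours):

```python
import numpy as np

def mlog(S):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(S)
    return (U * np.log(w)) @ U.T

def mexp(X):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, U = np.linalg.eigh(X)
    return (U * np.exp(w)) @ U.T

def spd_linear(Z, W):
    """Pullback linear map: project to the tangent space, apply the
    bilinear Euclidean map W X W^T, and pull back to the SPD manifold."""
    return mexp(W @ mlog(Z) @ W.T)
```

Because W X W^⊤ is symmetric whenever X is, the exponential of the result is always SPD, so the transformation may also change the matrix dimension from n to n′.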

Neighborhood Aggregation
Neighborhood aggregation stands as a pivotal operation within GCNs, enabling the capture of intricate neighborhood structures and features. Consider a node x_i that aggregates information from its neighbors (x_j)_{j∈N(i)} with associated weights (w_j)_{j∈N(i)}. While mean aggregation in Euclidean GCNs computes the weighted average ∑_{j∈N(i)} w_j x_j, the analogous operation in hyperbolic space, known as the Fréchet mean, lacks a closed-form solution. To address this, we propose aggregation within the substituted Euclidean subspaces S^n or L^n, employing an attention mechanism.
In GCNs, attention learns the significance of neighbors and aggregates their information based on their relevance to the central node. Yet, attention on Euclidean embeddings often overlooks the tree-like structure prevalent in many real-world graphs. Thus, we further propose an SPD attention-based aggregation operation. Given SPD embeddings (Z_i, Z_j), we initially map Z_i and Z_j to the substituted Euclidean subspaces S^n or L^n to compute attention weights w_ij using concatenation and a Euclidean multi-layer perceptron (MLP). Subsequently, we propose SPD aggregation to update node embeddings as follows:

AGG(Z)_i = exp( ∑_{j∈N(i)} w_ij log(Z_j) ).

Similar to Euclidean aggregation, RGCN employs a non-linear activation function, σ_S(•), to learn non-linear transformations. Specifically, RGCN applies the Euclidean non-linear activation in the substituted Euclidean subspaces S^n or L^n and then maps back to the SPD manifold S^n_++:

σ_S(Z) = exp( σ( log(Z) ) ).

It is worth noting that the exponential and logarithm maps can be instantiated with both the log-Euclidean metric (LEM) and the log-Cholesky metric (LCM).
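The aggregation and activation steps can be sketched in the LEM tangent space as follows; the attention weights are taken as given inputs here, whereas in RGCN they would come from the Euclidean MLP (helper names are ours):

```python
import numpy as np

def mlog(S):
    w, U = np.linalg.eigh(S)
    return (U * np.log(w)) @ U.T

def mexp(X):
    w, U = np.linalg.eigh(X)
    return (U * np.exp(w)) @ U.T

def spd_aggregate(Zs, weights):
    """Tangent-space mean: exp( sum_j w_j log(Z_j) ).
    `weights` are non-negative attention scores over the neighbourhood."""
    T = sum(w * mlog(Z) for w, Z in zip(weights, Zs))
    return mexp(T)

def spd_relu(Z):
    """Non-linear activation sigma_S: Euclidean ReLU applied in the
    tangent space, then pulled back to the manifold."""
    return mexp(np.maximum(mlog(Z), 0.0))
```

Note that aggregating identical embeddings with weights summing to one returns the embedding unchanged, the SPD analogue of a Euclidean mean of identical points.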

RGCN Architecture
Having introduced all the building blocks of RGCN, we now summarize the model architecture, as illustrated in Figure 3. Given a graph G = (V, E) and input Euclidean features (x_i^E)_{i∈V}, the first layer of RGCN maps from Euclidean to SPD space. RGCN then stacks multiple SPD graph convolution layers. At each layer, RGCN transforms and aggregates neighbors' embeddings in the substituted Euclidean subspaces. Hence, the information propagation in the ℓ-th RGCN layer is:

Z_i^ℓ = σ_S( AGG( W^ℓ ⊗ Z^{ℓ−1} )_i ) = exp( σ( ∑_{j∈N(i)} w_ij W^ℓ log(Z_j^{ℓ−1}) (W^ℓ)^⊤ ) ).

The SPD embeddings (Z_i)_{i∈V} of the last RGCN layer can then be used to predict node labels.
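Putting the pieces together, a single layer under the LEM instantiation might look like the following sketch; the uniform matrix dimensions and externally supplied attention weights are simplifying assumptions, and the function names are ours:

```python
import numpy as np

def mlog(S):
    w, U = np.linalg.eigh(S)
    return (U * np.log(w)) @ U.T

def mexp(X):
    w, U = np.linalg.eigh(X)
    return (U * np.exp(w)) @ U.T

def rgcn_layer(Zs, adj_weights, W):
    """One RGCN layer (LEM pullback): transform, aggregate, and activate,
    all within the substituted Euclidean (tangent) space.

    Zs          : list of SPD node embeddings
    adj_weights : row i holds attention weights of node i over all nodes
    W           : weight matrix of the layer
    """
    # feature transformation in the tangent space
    Ts = [W @ mlog(Z) @ W.T for Z in Zs]
    out = []
    for row in adj_weights:
        agg = sum(w * T for w, T in zip(row, Ts))   # weighted aggregation
        out.append(mexp(np.maximum(agg, 0.0)))       # ReLU, then pull back
    return out
```

Because all three operations happen between a single log and a single exp, the layer performs only two eigendecompositions per node, which is where most of the cost of SPD convolution lies.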
For the node classification task, we directly classify the nodes on the SPD manifold using the SPD multinomial logistic loss.

Experiments
In this section, we present our experimental evaluation to validate the effectiveness of the proposed method and analyze the results.
Table 2. Comparison of model capabilities regarding tree-like structure modeling, tree-like heterophily modeling, and neighbor interaction. A checkmark (✓) indicates the presence of the capability, while a cross (×) denotes its absence.

Experimental Details. We adhere to the consistent data splitting strategy employed in previous studies [18,34]. Specifically, nodes in the Disease dataset are partitioned into training (30%), validation (10%), and test (60%) sets. For Texas, Wisconsin, Cornell, Squirrel, and Chameleon, the nodes are split 70%, 15%, and 15%, respectively. For Cora and PubMed, we utilize 20 labeled training examples per class. Our methodology closely mirrors the parameter configurations and optimization techniques outlined in the original works.

Experimental Results
The proposed RGCN model is initially assessed in the context of node classification to gauge its discriminative capacity across tree-like and grid-like structures. Table 3 presents performance comparisons between different models, encompassing those operating within Euclidean and hyperbolic spaces. Notably, models leveraging hyperbolic geometry exhibit substantial performance gains over several comparative models, particularly on datasets resembling tree structures, such as the Disease dataset, which exhibits a completely tree-like structure when δ = 0. This underscores the efficacy of hyperbolic geometry in capturing hierarchical structures within graphs. As illustrated in the table, the proposed RGCN achieves peak performance on three out of four datasets, displaying slightly lower performance only on Cora, which tends towards Euclidean geometry. This underscores the effectiveness of symmetric positive-definite (SPD) geometry as an adaptive mixed space, encompassing both Euclidean and hyperbolic subspaces, for modeling intricate graphs comprising hierarchical and grid-like structures. Particularly noteworthy are the relative performance enhancements of 13.7% and 2.2% achieved by RGCN over methods based solely on Euclidean or hyperbolic geometry, respectively, on the real network Airport with δ = 1. In summary, the SPD geometry employed by RGCN surpasses individual models grounded in hyperbolic and Euclidean geometries in modeling complex networks, with the experimental outcomes validating the effective exploitation of SPD geometric properties in crafting neural network modules, thereby enhancing experimental performance.

Moreover, Table 4 presents the outcomes of graph neural network models based on Euclidean, hyperbolic, and SPD geometries for node classification on heterophily graphs. Notably, on intricate heterophily graphs, models grounded in hyperbolic geometry (e.g., HGCN and HyboNet) do not consistently surpass MLP. Specifically, hyperbolic models outperform the conventional homophily model GCN across nearly all five graph datasets; nevertheless, in comparison to heterophily models, they demonstrate superior performance only on the Squirrel and Chameleon graphs, potentially attributable to disparities in specific graph structures.
Given that RGCN comprehensively harnesses the attributes of the SPD manifold, which is compatible with both Euclidean and hyperbolic geometries, it achieves the best classification results across all five datasets when compared to all comparative methodologies. On the first three datasets, RGCN outperforms the Euclidean heterophily graph model H2GCN, while on the latter two datasets, RGCN's performance also eclipses that of the hyperbolic model HyboNet. This validates the geometric versatility of the SPD manifold and underscores the superior representational capability of the proposed RGCN.

Analysis and Discussion
In this subsection, we analyze the sensitivity to hyperparameters regarding the embedding dimension and propagation layer.
For the hidden layer dimension, Figure 4 illustrates that on graphs biased towards hierarchical structures, such as Disease and Airport, optimal performance is attained at larger feature dimensions. Conversely, on graphs biased towards grid-like structures, such as Cora and PubMed, optimal performance is achieved at smaller dimensions, specifically at five dimensions. This variance in representation space dimensions due to geometric structural disparities aligns with the expectations of this study, indicating that the symmetric positive-definite (SPD) space encompasses both Euclidean and hyperbolic subspaces, enabling adaptive encoding of distinct spatial structures. Regarding the number of propagation layers, as depicted in Figure 5, the challenge of over-smoothing has long impeded graph neural networks from effectively capturing long-distance dependencies. Consequently, optimal performance of graph neural network models is typically achieved with fewer layers. The analysis of the number of propagation layers validates this observation. Although the optimal layer setting may vary with different graph properties, optimal performance is generally attained within four layers, with a risk of over-smoothing beyond this threshold.

Conclusions
In this study, we systematically reconstructed the components of information propagation in classical Euclidean graph convolutional networks, such as linear feature transformations, information aggregation, and non-linear activation functions, on symmetric manifold spaces, specifically symmetric positive-definite matrix spaces. By integrating Riemannian geometry with the log-Euclidean metric (LEM) and log-Cholesky metric (LCM) through pullback techniques, we developed a comprehensive scheme of information propagation on the symmetric positive-definite matrix manifold. Experimental results show that the proposed model outperforms its Euclidean and hyperbolic geometry counterparts on complex network data exhibiting implicit hierarchy. The efficacy of this approach further validates the applicability of deep learning to symmetric manifolds, offering a novel avenue for processing data with intricate structures. Although this study demonstrates the superiority of SPD manifolds over Euclidean and hyperbolic geometries for graph embedding, the neural network operations defined on SPD manifolds are computationally expensive. To enhance the scalability of SPD geometry to large-scale graph data, we will focus on the efficiency optimization of SPD neural networks in future work.

Figure 2. Illustration of graphs embedded in a continuous symmetric space with both flat and tree-like substructures.

Figure 4. Classification results under different dimension settings.

Figure 5. Classification results under different propagation layer settings.