Graph Convolutional Networks Guided by Explicitly Estimated Homophily and Heterophily Degree

Abstract: Graph convolutional networks (GCNs) have been successfully applied to learning tasks on graph-structured data. However, most traditional GCNs assume homophily in graphs, which leads to poor performance on heterophilic graphs. Although many novel methods have recently been proposed to deal with heterophily, the effects of homophily and heterophily on classifying node pairs are not clearly separated in existing approaches and inevitably influence each other. To deal with various types of graphs more accurately, in this work we propose a new GCN-based model that leverages the explicitly estimated homophily and heterophily degree between node pairs to adaptively guide the propagation and aggregation of signed messages. We also design a pre-training process to learn the homophily and heterophily degree from both the original node attributes, which are graph-agnostic, and localized graph structure information extracted by Deepwalk, which reflects graph topology. Extensive experiments on eight real-world benchmarks demonstrate that the new approach achieves state-of-the-art results on three homophilic graph datasets and outperforms baselines on five heterophilic graph datasets.

The design of traditional GCN-based methods, such as GAT, GraphSAGE and GIN, is based on the homophily assumption, which means that connected nodes in a graph are more likely to belong to the same category or have similar features [1,11]. These methods implement the assumption through a message-passing mechanism [5,12], where connected nodes propagate features to each other as messages, and each node aggregates the features of its neighbors and then combines the aggregated message with its ego features to update the node representation [1,2,4,5]. For each node, the aggregation and update process can be viewed as a weighted sum of the ego features and all the neighbors' features, where the neighbors' features are always assigned positive coefficients. From the perspective of spectral approaches [5,13,14], the results in [15,16] show that the typical message-passing mechanism of the GCNs above actually acts as a low-pass filter, which filters out high-pass signals and retains the low-pass signals. This causes the representations of connected nodes to become similar and easier to classify into the same category. Under this assumption, traditional GCN-based methods have shown good predictive accuracy on node classification tasks on homophilic graphs.
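This weighted-sum view can be illustrated with a minimal numpy sketch on a toy 3-node graph (the row-mean normalization here is a simplifying assumption; real GCNs typically use symmetric normalization, but the low-pass effect is the same):

```python
import numpy as np

# Toy graph: v1 -- v2, v1 -- v3 (undirected), with self-loops added.
A = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 1]], dtype=float)

# Row-normalize so each update is a weighted mean of self + neighbor
# features, with all-positive coefficients (classic low-pass aggregation).
A_hat = A / A.sum(axis=1, keepdims=True)

H = np.array([[1.0], [0.0], [2.0]])  # 1-D node features

H_new = A_hat @ H  # one round of message passing

# The features of connected nodes contract toward each other:
spread_before = H.max() - H.min()    # 2.0
spread_after = H_new.max() - H_new.min()
```

After one round, the feature spread shrinks from 2.0 to 1.0, i.e., the connected nodes become harder to tell apart, exactly the behavior that hurts heterophilic graphs.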
However, there are many graphs in the real world where a large number of neighbouring nodes belong to different categories; these are known as heterophilic graphs. For example, on e-commerce websites, a fraudster is more inclined to interact with ordinary users than with another fraudster [17]. In dating networks, users tend to establish relationships with the opposite gender. Studies have shown that many traditional GCN models are ineffective on heterophilic graphs, especially for semi-supervised node classification tasks [18,19]. As shown in Figure 1a, consider updating the node feature h_1 of v_1, which has two neighbours v_2 and v_3. By applying the traditional message-passing method, the updated representation of node v_1 is obtained by adding the node features of v_2 and v_3 to its self-representation. After updating, the node feature of v_1 becomes closer to those of both v_2 and v_3, which makes it difficult to distinguish between nodes v_1 and v_2, which belong to different categories.

Recently, many novel GCN-based methods have been proposed to deal with heterophily. Notably, FAGCN [20] takes the perspective of graph signal processing [21] and explores both low-pass and high-pass filters in its model design. In the spatial domain, low-pass filtering corresponds to summing a node's features with those of its neighbours, while high-pass filtering corresponds to taking the difference of features between neighbouring nodes; FAGCN uses a self-gating mechanism to learn the signed coefficients of the messages. HOG-GCN [22] and BM-GCN [23] explore the estimated homophily degree between node pairs to guide the propagation of positive messages but do not consider how to guide the propagation of negative messages, which is often important in heterophilic graphs. Consider again the example in Figure 1b: if we introduce signed messages during the message-passing process, the updated representation of node v_1 is closer to that of its homophilic neighbor v_3 and farther from that of its heterophilic neighbor v_2, as expected. We also observe that the above methods share a common limitation: the effects of homophily and heterophily on classifying node pairs are not clearly separated in their learning approaches and inevitably influence each other.
In this paper, we aim to design a new model that works well on either homophilic or heterophilic graphs for semi-supervised node classification tasks. Comparing the existing approaches and inspired by the above models, especially FAGCN and HOG-GCN, we explicitly leverage the estimated homophily and heterophily degree between node pairs to adaptively guide the propagation and aggregation of signed messages. In particular, positive messages are aggregated from homophilic neighbors, and negative messages are aggregated from heterophilic neighbors. To overcome the limitation that the categories of most nodes are unknown in semi-supervised node classification tasks, we pre-train estimators to calculate both the homophily and the heterophily degree of node pairs, which reflect the probabilities that neighbouring nodes belong to the same category and to different categories, respectively. We also propose learning the estimators from both the original node attributes, which are graph-agnostic, and localized graph structure information extracted by Deepwalk [24], which reflects graph topology. These two estimators are jointly used in our learning process. We adopt a two-stage training method for our proposed model, in which the estimators are pre-trained in the first stage, and the GCN module together with the pre-trained estimators is jointly trained in the second stage. That is, the estimators are fine-tuned while training the GCN module.
To summarize, the main contributions of this paper are as follows:
1. We propose an improved GCN-based message-passing mechanism that explores the explicitly estimated homophily and heterophily degree to adaptively guide the passing of positive and negative messages, respectively. The improved mechanism makes the learned representations of nodes belonging to different categories easier to distinguish in both homophilic and heterophilic graphs.
2. We propose extracting localized structure information from graphs using Deepwalk and exploring it to pre-train an estimator. This estimator, together with the estimator pre-trained on the original node attributes, gives us an approach to estimating appropriate homophily and heterophily degrees of node pairs.
3. We then design a new Graph Convolutional Network Guided by Explicitly-Estimated Homophily and Heterophily Degree, called GCN-EHHD, based on the above components. Extensive experiments on eight real-world benchmarks demonstrate the effectiveness of our model. Specifically, GCN-EHHD achieves state-of-the-art results on three homophilic graph datasets and outperforms baselines on five heterophilic graph datasets.
The rest of the paper is organized as follows. In Section 2, we briefly introduce the related work on tackling semi-supervised node classification tasks on heterophilic graphs. In Section 3, we define the semi-supervised node classification tasks and describe the details of our proposed GCN-EHHD. In Section 4, we demonstrate the effectiveness of our proposed approach by comprehensive experiments and an ablation study on real-world datasets. Finally, we conclude with future work in Section 5.

Related Work
There are two major GCN-based approaches for tackling learning problems on heterophilic graphs. One is to mine supplemental nodes for each node when updating its features in the message-passing process; these supplemental nodes act like directly connected neighbors, propagating messages to the node and participating in the aggregation and updating of the node representation. Geom-GCN [18] maps each node to a latent space and takes the nodes whose distance from the node in the latent space is less than a preset value as the supplemental nodes. NL-GNN [25] utilizes an attention-guided sorting method to choose the supplemental nodes. It defines a learnable calibration vector and adopts an attention mechanism to calculate the attention score between each node representation and the calibration vector. All nodes are sorted according to their attention scores, and for each node, the other nodes with similar attention scores are chosen as its supplemental nodes. In addition to the above methods, some works choose the higher-order neighbors in the graph as supplemental nodes. They are based on the observation that for nodes in heterophilic graphs, the first-order neighbors are often heterophilic, but the higher-order neighbors are likely to be homophilic [26]. MixHop [19] chooses two-hop neighbors as the supplemental nodes for message passing. In the layers of MixHop, the messages from neighbors at different hops are linearly transformed by different weights and then concatenated. H2GCN [26] chooses higher-order neighbors as the supplemental nodes for message passing. It cancels the linear transformation and concatenates the messages aggregated from neighbors of different hops in the hidden layers, and the aggregated messages at different hidden layers are concatenated and linearly transformed in the output layer.
Considering that the number of two-hop neighbors grows exponentially with the network scale, UGCN [27] restricts the supplemental nodes to two-hop neighbors that have at least two different paths to the node. Apart from the two-hop neighbors, UGCN also utilizes the cosine similarity to choose the k-nearest neighbors as supplemental nodes. However, these methods share a shortcoming: mixing the messages obtained from connected nodes and supplemental nodes can damage the graph structure information, which is often important in graph learning tasks.
The other approach is to design a more reasonable message-passing mechanism. FAGCN [20] demonstrates the importance of high-pass signals on heterophilic graphs through experiments and illustrates that high-pass signals can be retained by propagating negative messages between connected nodes. It uses a self-gating attention mechanism to learn the signed coefficients of neighbors for propagation, and the coefficients represent the proportions of low-pass and high-pass signals. However, the effects of homophily and heterophily on the learning of the signed coefficients are not clearly distinguished and are treated in a mixed, implicit way. GGCN [28] uses the cosine similarity function to calculate the signed coefficients of neighbors for propagation between connected nodes. CPGNN [29] defines a parameterized, learnable compatibility matrix that models the probability of node pairs belonging to each category. It also adopts a pre-trained estimator to calculate the prior belief of class labels for nodes and propagates the prior belief through the compatibility matrix. HOG-GCN [22] combines the estimated homophily degree and a learnable homophily degree of node pairs to guide the positive message passing between nodes and their multi-hop neighbors. Since HOG-GCN models the homophily degree between node pairs with learnable parameters, its number of parameters can be quite large for a large graph. BM-GCN [23] adopts the estimated soft labels to calculate a block similarity matrix and combines the estimated soft labels and the block similarity matrix to guide the positive message passing. HOG-GCN and BM-GCN do not consider the passing of signed messages, which is often important in heterophilic graphs. In short, the existing works do not consider utilizing the explicitly estimated homophily and heterophily degree to guide the passing of signed messages.

Problem Definition and Overview of the Proposed Method
Consider a graph G = (V, E), where V = {v_1, v_2, . . . , v_N} is a set of N nodes, E represents a set of edges, and A ∈ {0, 1}^{N×N} is an adjacency matrix that describes the topology of the graph G. A[i, j] is 1 if there is an edge between v_i and v_j, and 0 otherwise. In this paper, we focus on the semi-supervised node classification task on graphs. For each graph, there is a feature matrix X. Let x_i be the i-th column of X, which represents the feature vector associated with node v_i. Each node belongs to one of C classes. There is a set of nodes V_L (|V_L| ≪ N), and each node of V_L is assigned a label y_i ∈ {1, 2, . . . , C}. The labels of all nodes in V_L form a set Y_L. The task is to learn a function f : (G, X, Y_L) → Y_U that predicts the labels Y_U of the unlabeled nodes.
There are two major components in GCN-EHHD. One is the MLP module for pre-training estimators. Here we choose the multi-layer perceptron (MLP) as the estimator to explicitly estimate the homophily and heterophily degree between neighbouring nodes. The other is the GCN module that leverages the estimated homophily and heterophily degree to adaptively guide signed message passing. In the improved message-passing mechanism, when updating the representation of each node in the graph, we use the homophily degree to guide the passing of positive messages and the heterophily degree to guide the passing of negative messages, and these signed messages are aggregated with the node's self-representation. We also adopt a two-stage training process to train GCN-EHHD, where the MLP module is pre-trained first to obtain the estimators and is then jointly trained with the GCN module in the second stage. An overview of GCN-EHHD is shown in Figure 2.

The Pre-Training Module for Estimators
In order to make full use of the graph information, we pre-train two MLPs: one learns from the node attribute information following [22], and the other learns independently from localized structure information. We choose Deepwalk [24] to mine the localized structure information. Deepwalk applies a group of random walk generators to generate node sequences from a graph. For each node v_i in the graph, the random walk generator takes v_i as the first node to visit, samples uniformly from the neighbors of the current node and designates the sampled neighbor as the next node to visit. The generator repeats the sampling process until the maximum length T of the sequence is reached, and the sampled nodes form a sequence in order. Then, Skip-gram [30] is used to generate an embedding x_i^E for each node v_i from the random walk sequences. For each node in a random walk sequence, Skip-gram maximizes the probability of the neighbors that co-occur with the node within a window. It iterates over the set of random walks and uses gradient descent to train the node embeddings. Let X^E be the embedding matrix generated by Deepwalk for graph G, and let x_i^E be the i-th column of X^E. X^E will be used to train an estimator.
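The walk-generation step can be sketched as follows; the adjacency list and parameters are illustrative, and the subsequent Skip-gram training (e.g., via a Word2Vec implementation) is omitted:

```python
import random

def random_walks(adj, walks_per_node=80, walk_length=10, seed=0):
    """Generate truncated random walks as in Deepwalk: start walks at
    each node and repeatedly hop to a uniformly sampled neighbor until
    the maximum length is reached."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: stop the walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical 4-node graph as an adjacency list.
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
walks = random_walks(adj, walks_per_node=2, walk_length=5)
```

Each walk is a node sequence that can then be fed to Skip-gram exactly like a sentence of words.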
Then, we define two MLPs, called MLP 1 and MLP 2 , respectively. The l-th layers of the two MLPs can be defined as:

p_i^{(l)} = σ(W_1^{(l)} p_i^{(l-1)} + b_1^{(l)}), (1)
q_i^{(l)} = σ(W_2^{(l)} q_i^{(l-1)} + b_2^{(l)}), (2)

where p_i^{(0)} = x_i, q_i^{(0)} = x_i^E, W_1^{(l)}, W_2^{(l)} and b_1^{(l)}, b_2^{(l)} are the learnable weights and biases of the l-th layers, and σ is a non-linear activation function. Through the MLP estimators, we can obtain the soft labels of all the nodes:

s_i = softmax(p_i^{(L)}), (3)
t_i = softmax(q_i^{(L)}), (4)

where p_i^{(L)} and q_i^{(L)} are the outputs of the last layers of MLP 1 and MLP 2 for node v_i, respectively, and s_i[j], t_i[j] denote the probabilities that node v_i belongs to class j as estimated by MLP 1 and MLP 2 , respectively.
We pre-train the estimators using the nodes with known labels and obtain the optimal parameters of MLP 1 and MLP 2 by minimizing the following two loss functions:

L_1 = Σ_{v_i ∈ V_L} J(s_i, y_i), (5)
L_2 = Σ_{v_i ∈ V_L} J(t_i, y_i), (6)

where J represents the cross-entropy loss.
After the above pre-training process, we obtain the estimators MLP 1 and MLP 2 , and we will use them to estimate the homophily and heterophily degree of the neighboring nodes.
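A small numpy sketch of the soft-label and cross-entropy computations performed by an estimator (the logits here are hypothetical last-layer outputs, not trained values):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(soft_labels, y):
    """Mean cross entropy J over the labeled nodes."""
    return -np.mean(np.log(soft_labels[np.arange(len(y)), y]))

# Hypothetical last-layer outputs of an estimator for 3 labeled nodes, C = 2.
logits = np.array([[2.0, 0.0],
                   [0.0, 2.0],
                   [1.0, 1.0]])
s = softmax(logits)      # soft labels s_i
y = np.array([0, 1, 0])  # known labels on V_L
loss = cross_entropy(s, y)
```

Minimizing this loss over the labeled set is exactly what the pre-training stage does for each of the two MLPs.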

The GCN Module Guided by Explicitly Estimated Homophily and Heterophily Degree
For any directly connected nodes v_i and v_j in the graph, we use the pre-trained estimators to calculate the homophily degree of these two nodes, which is the estimated probability that the two nodes belong to the same category:

H_1[i, j] = Σ_{c=1}^{C} s_i[c] · s_j[c], (7)
H_2[i, j] = Σ_{c=1}^{C} t_i[c] · t_j[c]. (8)

We take a weighted sum of the two estimated homophily degrees:

H[i, j] = θ H_1[i, j] + (1 − θ) H_2[i, j]. (9)

The hyperparameter θ controls the proportion of original node attribute information and localized structure information used in estimation. Meanwhile, we can obtain the estimated heterophily degree between node pairs, which is the estimated probability that the two nodes belong to different categories:

Q[i, j] = 1 − H[i, j]. (10)

Following the settings in BM-GCN [23], for a graph G, we add multiple self-loops to it:

Ã = A + δI, (11)

where I ∈ R^{N×N} is the identity matrix, and δ is the self-loop parameter, which is used to control the proportion contributed by the node itself when calculating the weights of messages. We use N(v_i) to denote the set of neighbors of node v_i in the graph Ã. We use a softmax operation to normalize the homophily and heterophily degrees between node v_i and all its neighbors to obtain the weights of positive and negative messages:

w_{ij}^+ = exp(H[i, j]) / Σ_{v_k ∈ N(v_i)} exp(H[i, k]), (12)
w_{ij}^− = exp(Q[i, j]) / Σ_{v_k ∈ N(v_i)} exp(Q[i, k]). (13)

Then, we use the above weights to guide the message passing in the graph convolutional layer. The l-th graph convolutional layer in GCN-EHHD can be defined by:

h_i^{(l)} = σ( W^{(l)} Σ_{v_j ∈ N(v_i)} (α w_{ij}^+ − (1 − α) w_{ij}^−) h_j^{(l−1)} ), (14)

where h_i^{(l)} is the representation of node v_i at the l-th layer, h_i^{(0)} = x_i, W^{(l)} is a learnable weight matrix, and the hyperparameter α controls the proportion of aggregated positive and negative messages. Finally, we can obtain the ultimate prediction of GCN-EHHD by:

z_i = W_{out} h_i^{(K)}, (15)
r_i = softmax(z_i), (16)

where K is the number of convolutional layers in GCN-EHHD, r_i ∈ R^C is the predicted result of GCN-EHHD for node v_i, and r_i[c] is the probability that node v_i belongs to class c.
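The following numpy sketch illustrates the pipeline of this section under simplifying assumptions: soft labels already blended with θ, a 3-node toy graph with δ = 1 self-loops, and a single signed aggregation step (the learnable transformation and activation are omitted):

```python
import numpy as np

# Soft labels for 3 nodes over C = 2 classes (hypothetical estimator
# outputs, assumed already blended with the hyperparameter theta).
S = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.8, 0.2]])

# Homophily degree: estimated probability that two nodes share a category;
# the heterophily degree is its complement.
H = S @ S.T   # H[i, j] = sum_c S[i, c] * S[j, c]
Q = 1.0 - H

# Toy adjacency (v1 -- v2, v1 -- v3) with delta = 1 self-loops added.
A = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 1]], dtype=float)

def masked_softmax(M, A):
    """Softmax of M over each node's neighborhood (entries masked by A)."""
    W = np.where(A > 0, np.exp(M), 0.0)
    return W / W.sum(axis=1, keepdims=True)

W_pos = masked_softmax(H, A)  # weights of positive messages
W_neg = masked_softmax(Q, A)  # weights of negative messages

alpha = 0.5                   # balance of positive vs. negative messages
X = np.array([[1.0], [0.0], [2.0]])
# One signed aggregation step: homophilic neighbors attract,
# heterophilic neighbors repel.
X_new = (alpha * W_pos - (1 - alpha) * W_neg) @ X
```

Note how node v_1's homophilic neighbor v_3 receives a larger positive weight than its heterophilic neighbor v_2, while the negative weights are reversed, which is the intended guiding behavior.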

Optimization Objective
Similar to the loss functions in Equations (5) and (6), we can calculate the loss of the GCN module by:

L_GCN = Σ_{v_i ∈ V_L} J(r_i, y_i). (17)

When training the GCN module of GCN-EHHD, we fine-tune the estimators at the same time. That is, we add the loss of MLP 1 described in Equation (5), the loss of MLP 2 described in Equation (6) and the loss described in Equation (17) to obtain the final loss:

L = L_GCN + L_1 + L_2. (18)

We jointly train the MLP module and the GCN module by minimizing the above loss function.
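To illustrate why minimizing the sum of the three losses fine-tunes the estimators together with the GCN module, here is a toy sketch in which three hypothetical quadratic losses share one parameter and gradient descent on their sum finds the joint optimum:

```python
# Hypothetical quadratic stand-ins for L_GCN, L_1 and L_2, all depending
# on one shared parameter w (in the real model they share the estimator
# weights, which the joint loss fine-tunes).
def losses(w):
    l_gcn = (w - 1.0) ** 2
    l_mlp1 = (w - 0.8) ** 2
    l_mlp2 = (w - 1.2) ** 2
    return l_gcn, l_mlp1, l_mlp2

def final_loss(w):
    return sum(losses(w))  # L = L_GCN + L_1 + L_2

# Gradient descent on the summed loss.
w = 0.0
for _ in range(200):
    grad = 2 * ((w - 1.0) + (w - 0.8) + (w - 1.2))
    w -= 0.05 * grad
```

The summed objective pulls the shared parameter toward a compromise (here w = 1.0) rather than the optimum of any single term, which is the essence of the joint fine-tuning stage.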

Experimental Settings
Datasets. We chose eight widely adopted datasets. Following [26], we use the homophily ratio h = |{(u, v) : (u, v) ∈ E ∧ y_u = y_v}| / |E| to describe the homophily level of a graph. We chose three homophilic graph datasets, Cora, Citeseer and Pubmed [31,32], and five heterophilic graph datasets, Texas, Wisconsin, Squirrel, Chameleon and Cornell [18]. The statistics of these datasets are listed in Table 1, where h represents the homophily ratio of a graph and d_avg = |E|/|V| represents the average degree of nodes in a graph.
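The homophily ratio h and the average degree d_avg can be computed directly from the edge list; a small sketch with toy data:

```python
def homophily_ratio(edges, labels):
    """Fraction of edges whose endpoints share a label (h in the text)."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

# Toy graph: 4 edges, of which 2 connect same-label endpoints.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
labels = {0: 'a', 1: 'a', 2: 'b', 3: 'b'}

h = homophily_ratio(edges, labels)   # 2 / 4 = 0.5
d_avg = len(edges) / len(labels)     # |E| / |V| = 1.0
```

A ratio near 1 indicates a strongly homophilic graph such as Cora, while heterophilic graphs such as Texas have ratios close to 0.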
Implementation. The baselines MLP and Deepwalk were the same methods as MLP 1 and MLP 2 in GCN-EHHD, respectively, and the parameter settings and training procedures of the baselines MLP and Deepwalk were the same as those of the pre-trained MLP 1 and MLP 2 in GCN-EHHD. For GCN and GAT, the dimension of the hidden layers was set to 64, the number of layers was set to 2, the Adam optimizer [33] was used, the learning rate was set to 0.001, and the weight decay was set to 5e-4. The number of attention heads in GAT was set to 5. The number of training epochs of GCN and GAT was set to 1600 with an early stopping strategy of 100 patience epochs. For the other baselines, the optimal parameters and training procedures were taken as reported in their original papers. We implemented our proposed GCN-EHHD based on PyTorch [34] and trained it with the Adam optimizer [33]. In the process of generating the embedding X^E using Deepwalk, the number of random walk generators was set to 80, the maximum length of a random walk was set to 10, and the window size was set to 5. For each dataset, the embedding dimension of X^E was the same as that of the original features X.

Experimental Results
For all eight datasets, we adopted the test approach of [18] and randomly split the data ten times according to the division ratio 48%/32%/20% to generate the training/validation/test sets. We used the mean accuracy on the test sets of the ten splits as the result. We report the experimental results on the three homophilic graphs in Table 2 and the results on the five heterophilic graphs in Table 3. Table 2. Node classification accuracy on homophilic graphs (in % ± Standard Error). The best results are in bold, and the second-best results are underlined. By inspecting Tables 2 and 3, we make the following observations. On the one hand, for homophilic graphs, the performances of GCN-based models built under the homophily assumption were similar to those of GCN-based models tailored for dealing with heterophily. This shows that the emerging methods for tackling heterophily retain good compatibility with homophily. For heterophilic graphs, the methods tailored for heterophily significantly outperformed traditional GCNs. On the other hand, for homophilic graphs, our GCN-EHHD outperformed all the baselines on Citeseer and Pubmed and achieved the second-best result on Cora. For heterophilic graphs, GCN-EHHD outperformed all the baselines on the five datasets. In particular, GCN-EHHD achieved improvements of 4.83%, 3.17% and 2.01% on Texas, Squirrel and Cornell, respectively. Therefore, our proposed model can achieve the best or comparable performance on both homophilic and heterophilic graphs. Table 3. Node classification accuracy on heterophilic graphs (in % ± Standard Error). The best results are in bold, and the second-best results are underlined. The accuracies of MLP 1 and MLP 2 of GCN-EHHD are also meaningful for analyzing the performance of GCN-EHHD. We show the accuracies of the pre-trained MLP 1 and MLP 2 and the final MLP 1 and MLP 2 after fine-tuning in Table 4.
In our experiments, in general, the better the accuracies of the MLPs were, the better the accuracy of the whole GCN-EHHD was, especially for large graphs. We discuss the effect of the pre-trained estimators in Section 4.4.

Next, we discuss the computation complexity and model complexity compared with three representative and competitive models: FAGCN, HOG-GCN and BM-GCN. To facilitate comparison, we assume that for each model, the dimensions of the input layer, the output layer and all hidden layers are equal and denoted by d. Let K be the number of network layers, L be the number of layers of the MLP for pre-training the estimators, and |V| and |E| be the numbers of nodes and edges in a graph, respectively. We also take the time complexity of one forward pass into account. The coefficients of messages must be calculated at each layer in FAGCN, and the linear transformation of features is used only in the first and last layers of FAGCN but in every layer of HOG-GCN, BM-GCN and GCN-EHHD. Hence, the computation time of FAGCN grows with the number of edges of the graph, while the computation time of HOG-GCN, BM-GCN and GCN-EHHD grows quadratically with the representation dimension of the hidden layer. In our experiments on the Cora dataset, the time costs of a one-pass forward inference in FAGCN, HOG-GCN, BM-GCN and GCN-EHHD were 2.00 ms, 361.03 ms, 1.99 ms and 1.99 ms, respectively. It can be seen that our proposed GCN-EHHD consumes relatively little time.

Model Complexity
The complexity of parameters in our GCN-EHHD is O((3K + 2L)d^2), while it is O(2d^2 + Kd), O((2K + L)d^2 + K|V|^2) and O((2K + L)d^2) in FAGCN, HOG-GCN and BM-GCN, respectively. Note that the parameter complexity of HOG-GCN can be reduced to O((2K + L)d^2 + K|E|) by modeling the edge weights rather than the homophily degree matrix and overlooking the message passing from high-order neighborhoods. Among these models, the number of parameters in FAGCN is generally the smallest, and the number of parameters in HOG-GCN is generally the largest, since the numbers of nodes and edges in the graph are generally much larger than the hidden layer dimension of the model. Taking the parameters used in the four models in the experiments on the Cora dataset as an example, the numbers of parameters of FAGCN, HOG-GCN, BM-GCN and GCN-EHHD were 92.49 k, 26,154.75 k, 336.46 k and 553.36 k, respectively. It can be seen that our proposed GCN-EHHD achieves an improvement in experimental performance without an excessive increase in the number of parameters.

Visualization
In order to more visually observe the performance of our proposed model and the compared baselines, we performed visualization for GCN, FAGCN, BM-GCN and GCN-EHHD on Chameleon, a heterophilic graph. We extracted the output embeddings of the last layer of the four models and used t-SNE [35] to project them to a 2D space. The projected results are shown in Figure 3, where we can observe some meaningful phenomena. First, for GCN, many nodes of different categories are mixed together, which once again shows the defect of GCN for learning tasks on heterophilic graphs. Next, for FAGCN, most nodes of the same category are basically clustered together, but the boundaries between clusters are not obvious. For BM-GCN, the classification result is further improved, but there is still no clear boundary between two clusters (i.e., the blue cluster and the purple cluster) in Figure 3c. Finally, for our proposed GCN-EHHD, the boundaries between clusters are the clearest among the four models. This visualization result once again demonstrates the effectiveness of our proposed model.

Ablation Study
In order to explore the roles of the negative message passing guided by the heterophily degree and the estimator generated by Deepwalk, we conducted an ablation study. Recall that the hyperparameter α in Equation (14) controls the proportion of aggregated positive and negative messages from the neighbors, and the hyperparameter θ in Equation (9) controls the proportion of node attributes and graph structure information used when estimating the homophily and heterophily degree. We therefore performed the ablation study by comparing different settings of the parameters α and θ and report their effects on the predictive accuracy of the node classification tasks.
We chose eleven values of α in the interval [0, 1]. When α = 1, it amounts to discarding negative message passing and only aggregating positive messages from neighbors. When α = 0, the reverse is true. For each setting of α, we chose eleven values of θ ranging from 0 to 1. When θ = 0, we only used the pre-trained estimator generated from the graph structure information extracted by Deepwalk. When θ = 1, we only considered node attributes for estimating the homophily and heterophily degree. For each setting of α and θ, we conducted node classification experiments as described in Section 4.2 on the eight datasets introduced in Section 4.1.

The analysis results of the ablation study on α are shown in Figures 4 and 5. For each sub-figure, the x-axis shows the different settings of α; the y-axis of the figures on the left shows the average accuracy over the aforementioned eleven settings of θ, and the y-axis of the figures on the right shows the classification accuracy for θ ∈ {0, 0.5, 1}, respectively. As shown, there are three main situations. In the first situation, the influence of α on the average model accuracy is small, for graphs such as Cora, Citeseer and Pubmed. In the second situation, the plots are characterized by a sort of "reverse U-shape", where the accuracy values for α = 0 and α = 1 are close to each other, and the best results are obtained in the vicinity of α = 0.5. The analysis results for Wisconsin, Cornell and Texas fall into this case. In the third situation, the plots for Chameleon and Squirrel are characterized by a sort of "U-shape". When α = 1, that is, when negative message passing is cancelled, the performance is not the best on any of the eight datasets, and is particularly poor on Cornell, Wisconsin and Texas. This validates the effectiveness of negative message passing.
For Cora, Citeseer, Pubmed, Wisconsin, Cornell and Texas, the performance is best when the value of α is set to approximately 0.5. Note that the ablated GCN-EHHD models with α ∈ {0, 1} are structurally similar, which is why the accuracies for α = 0 and α = 1 are close. For Chameleon and Squirrel, the performance is best when α = 0, that is, when we only use negative message passing, and worst in the vicinity of α = 0.5. We think the difference is related to the average degree of nodes in the graphs. As shown in Table 1, the average degrees of nodes in Chameleon and Squirrel are both much larger than those of the other datasets. For graphs with a small average degree, homophilic neighbors and heterophilic neighbors can both play important roles in message passing on heterophilic graphs. This may be the reason why the best performance is obtained in the vicinity of α = 0.5 for Wisconsin, Cornell and Texas. Therefore, while for most graphs we can obtain a good performance by mixing positive and negative messages, for some heterophilic graphs with large average node degrees, it may be better to use negative messages alone.
The analysis results of the ablation study on θ are shown in Figures 6 and 7. For each sub-figure, the x-axis shows the different settings of θ; the y-axis of the figures on the left shows the average accuracy over the aforementioned eleven settings of α, and the y-axis of the figures on the right shows the classification accuracy for α ∈ {0, 0.5, 1}, respectively. In addition, we annotate each sub-figure with the accuracies of the two pre-trained MLPs. As shown, there are three main situations. In the first situation, better accuracy of GCN-EHHD is obtained for values of θ that give more weight to the better-performing pre-trained MLP, on graphs such as Chameleon and Squirrel as shown in Figure 7g,h and Figure 7i,j, and Pubmed as shown in Figure 6e,f. These three graphs are relatively large compared with the other datasets. In particular, the average degree of nodes in Chameleon and Squirrel is quite large. In the second situation, the influence of θ on the accuracy of GCN-EHHD is small, for graphs such as Cora and Citeseer as shown in Figure 6a,b and Figure 6c,d, respectively. The difference in accuracy between the two pre-trained MLPs on Cora and Citeseer is about 10% and not as significant as on the other datasets. In the third situation, the accuracy of GCN-EHHD fluctuates with the change of θ, on graphs such as Texas, Cornell and Wisconsin as shown in Figure 7c-f and Figure 7a,b, respectively. The behavior in the third situation may mainly be because these three graphs are all relatively small, which causes instability.
The results indicate that the estimator pre-trained using Deepwalk is useful in measuring the homophily and heterophily degree on some large graphs, especially those with large average node degrees.

Conclusions
We proposed a new model called GCN-EHHD in order to better generalize GCN-based methods to heterophilic graphs. In GCN-EHHD, we pre-trained estimators of the homophily and heterophily degree between neighbouring nodes and leveraged this information to adaptively guide the propagation and aggregation of positive and negative messages. Since both node attributes and graph topology matter to the prediction accuracy, we proposed applying Deepwalk to extract the structure information of graphs for training an estimator, which is combined with the estimator learned from the original node attributes to estimate accurate homophily and heterophily degrees between node pairs. Our experiments on eight datasets demonstrate the effectiveness of the proposed approach: it achieved state-of-the-art results on three homophilic graphs and outperformed baselines on five heterophilic graphs. Our extensive experiments show that the explicit propagation of signed messages, adaptively guided by the heterophily degree, is important in the message-passing process on heterophilic graphs. The exploration of graph structure information using Deepwalk turned out to be useful for large heterophilic graphs such as Chameleon and Squirrel, for which the graph topology seemed to play a more decisive role than node attributes, as shown by the accuracies of the pre-trained estimators. In future work, it would be interesting to explore models other than MLPs to train more effective estimators and to study more large graphs with a variety of statistical properties.