Multi-Channel Graph Convolutional Networks for Graphs with Inconsistent Structures and Features

Chang, Xinglong; Wang, Jianrong; Wang, Rui; Wang, Tao; Wang, Yingkui; Li, Weihao

doi:10.3390/electronics13030607

Open AccessEditor’s ChoiceArticle

Multi-Channel Graph Convolutional Networks for Graphs with Inconsistent Structures and Features

by

Xinglong Chang

^1,2,

Jianrong Wang

^1,3,

Rui Wang

³,

Tao Wang

³,

Yingkui Wang

^4,* and

Weihao Li

⁵

¹

School of New Media and Communication, Tianjin University, Tianjin 300350, China

²

Qijia Youdao Network Technology (Beijing) Co., Ltd., Beijing 100012, China

³

College of Intelligence and Computing, Tianjin University, Tianjin 300350, China

⁴

Department of Computer Science and Technology, Tianjin Renai College, Tianjin 301636, China

⁵

Data61-CSIRO, Black Mountain Laboratories, Canberra, ACT 2601, Australia

^*

Author to whom correspondence should be addressed.

Electronics 2024, 13(3), 607; https://doi.org/10.3390/electronics13030607

Submission received: 1 January 2024 / Revised: 29 January 2024 / Accepted: 30 January 2024 / Published: 1 February 2024

Download

Browse Figures

Versions Notes

Abstract

Graph convolutional networks (GCNs) have attracted increasing attention in various fields due to their significant capacity to process graph-structured data. Typically, the GCN model and its variants heavily rely on the transmission of node features across the graph structure, which implicitly assumes that the graph structure and node features are consistent, i.e., they carry related information. However, in many real-world networks, node features may unexpectedly mismatch with the structural information. Existing GCNs fail to generalize to inconsistent scenarios and are even outperformed by models that ignore the graph structure or node features. To address this problem, we investigate how to extract representations from both the graph structure and node features. Consequently, we propose the multi-channel graph convolutional network (MCGCN) for graphs with inconsistent structures and features. Specifically, the MCGCN encodes the graph structure and node features using two specific convolution channels to extract two separate specific representations. Additionally, two joint convolution channels are constructed to extract the common information shared by the graph structure and node features. Finally, an attention mechanism is utilized to adaptively learn the importance weights of these channels under the guidance of the node classification task. In this way, our model can handle both consistent and inconsistent scenarios. Extensive experiments on both synthetic and real-world datasets for node classification and recommendation tasks show that our methods, MCGCN-A and MCGCN-I, achieve the best performance on seven out of eight datasets and the second-best performance on the remaining dataset. For simpler graph structures or tasks where the overhead of multiple convolution channels is not justified, traditional single-channel GCN models might be more efficient.

Keywords:

graph convolutional networks; network analysis; graph representation learning

1. Introduction

Network data are ubiquitous in the real world, such as social networks [1,2], citation networks [3,4], communication networks [5,6], and biological networks [7,8]. Such data often comprise two sources of information: the underlying network structure and the node features. For example, in a citation network, each node symbolizes a paper, characterized by keywords, whereas the edges denote citation links between papers. Consequently, an effective graph learning algorithm should seamlessly integrate both types of information [9,10].

Graph convolutional networks (GCNs) [11] are extensively studied for their robust representation learning capabilities and have been applied to various network analysis tasks, such as node classification, link prediction, community detection, and recommendation systems [12,13,14]. Typically, GCNs employ a feature-passing mechanism over the structure of the input graph, integrating information from both the network structure and node features. Specifically, each layer of the GCN updates node representations by aggregating those of their neighboring nodes, which is supervised by partially available node labels.

Recently, a variety of GCN layers have been proposed. For example, GraphSAGE [15] samples a fixed size of neighborhoods for each node, proposes a sum/max/LSTM pooling aggregator, and adopts a concatenation operation for the update. GAT [16] applies an attention mechanism to assign weights to neighbors. GIN [17] generates a provably maximally powerful graph neural network under the Weisfeiler–Lehman test [18].

Existing GCN-based methods [11,15,16,19] suffer from serious drawbacks. To fuse structural information and node features, they heavily rely on the process of passing features across the graph structure, which is roughly equivalent to Laplacian smoothing on node features [20]. Essentially, they assume consistency between structural and feature information, both contributing positively to node label prediction. However, in numerous real-world networks, these two sources of information often possess distinct characteristics, leading to a prevalent mismatch between them [21].

Such a mismatch (inconsistency) between structures and features can arise from two scenarios: (1) Either the topological structure or the node features frequently carry substantial noise, which can negatively impact node predictions. Using social networks like Twitter as an example, the real social relationship reflects the community structure more directly than the diverse and noisy user-generated features. (2) Adversarial attacks can specifically target either the network structure or node features [22]. These attacks involve actions like adding/removing edges or injecting random noise into node attributes. Both scenarios result in a mismatch between these two information sources. When such a mismatch occurs, existing methods struggle to derive effective representations from both the graph structure and node features. These methods exhibit inferior performance compared to models that solely leverage either the network structure (e.g., DeepWalk [23]) or node features (e.g., MLP [24]), as supported by our motivating observations.

Recent methods, such as H2GCN [25], Geom-GCN [26], CPGNN [27], GGCN [28], and GPR-GNN [29], target graphs exhibiting varying degrees of heterophily or low homophily, where connected nodes may have differing class labels and features. However, these methods continue to presume that both high-order structures and node features are reliable and consistent. In situations where the graph structure is initially completely inconsistent with the node features (e.g., random graph structure), incorporating a high-order structure fails to capture useful representations for downstream tasks [30]. As a result, these models struggle to generalize in scenarios with inconsistent structures and features.

Therefore, our research focuses on devising methods to derive representations from both the graph structure and node features, particularly under circumstances where these elements exhibit inconsistencies. In this work, we propose a novel approach called the multi-channel graph convolutional network (MCGCN) for graph data with inconsistent structures and node features. The core idea is to extract distinct representations from both the structural aspects and node features of the graphs. Additionally, we aim to capture the shared insights arising from combinations of structures and features. Meanwhile, we utilize an attention mechanism to adaptively incorporate these representations under the guidance of semi-supervised node classification tasks.

Specifically, we first generate a feature structure from the original node features and extract topological features based on the inherent network topology. Subsequently, we propagate the original features across the feature structure and the topological features across the original structure, employing two distinct convolution channels to extract two specific representations of the features and structure, respectively. This approach aims to identify inconsistencies within the data.

Furthermore, we propagate the original features across the original structure and the topological features across the feature structure using two joint convolution channels, aiming to effectively fuse information in a consistent scenario.

Ultimately, we leverage an attention mechanism to adaptively learn the importance weights of these representations, extracting the most useful information for downstream tasks [31,32]. Our method excels at extracting useful representations that significantly contribute to node prediction, even in cases of mismatched network structures and node features.

To summarize, our work makes the following contributions:

We study the mismatch (inconsistency) between structures and node features and present two motivating examples, highlighting the limitations of GCNs in fusing inconsistent structures and node features.
We propose a multi-channel graph convolutional network for graphs characterized by inconsistent structures and features. Our method extracts representations from both the structure and feature spaces, along with their combinations, and adaptively fuses the most useful information from these representations through an attention mechanism.
Extensive results on both synthetic and real-world datasets for node classification tasks show that the proposed method outperforms existing start-of-the-art methods on graphs with inconsistent structures and features and also delivers competitive performance on graphs with consistent structures and features.

2. Related Works

2.1. Graph Convolutional Networks

A graph convolutional network (GCN) [11] is a kind of graph neural network that aims to extend traditional deep learning methods to graph-structured data [33,34]. GCNs simplify spectral-based graph convolutional networks [35,36] by limiting the filtering operation on the first-order neighborhood. GCNs update the representation of the target node by combining its own representation and the aggregated representations from its direct neighborhood.

The field has seen a variety of methodological advancements aimed at improving the flexibility, scalability, and performance of GCNs. This includes the development of spatial-based approaches like GraphSAGE [15] and GIN [17], which allow for inductive learning on graphs [37,38,39]. Attention mechanisms were introduced into GCNs with the Graph Attention Network (GAT) [40], enabling the model to focus on important parts of the graph. Further, Rong et al. proposed the DropEdge technique to alleviate over-smoothing issues prevalent in deep GCNs [41].

2.2. Multi-Channel Graph Convolutional Networks

Drawing inspiration from multi-channel strategies in conventional CNNs, researchers have explored similar concepts in the realm of GCNs. These works involve using multiple types of convolutions or incorporating various aspects of graph data (like different types of relationships or features) into separate channels. One of the pioneering works in this area is the multi-channel spectral GCN [42,43,44], which extends spectral graph convolutions by incorporating multiple spectral filters. This method allows different channels to capture various frequency components of graph signals, leading to richer representations.

However, in scenarios where the graph structure is misaligned with node features, existing methods prove ineffective in capturing useful representations for subsequent tasks. Consequently, models operating under these conditions face challenges in generalizing effectively when confronted with discrepancies between structures and features.

In this work, we design two specific convolution channels to extract representations from the structure and feature spaces. Another two joint convolution channels are designed to fuse and capture the common information shared by both the structure and node features. Finally, an attention mechanism is employed to adaptively learn the importance weights of these representations. By adopting this approach, the problem of mismatch between network structure and features can be effectively resolved.

3. Preliminaries

We first present the notations and problem definition and then introduce graph convolutional networks (GCNs).

3.1. Problem Definition

Definition 1.

Attributed Network. Given an undirected, unweighted, and attributed network,

G = (V, E, X)

, where

V = {v_{1}, v_{2}, \dots, v_{n}}

is a set of n nodes,

E = \{e_{i j}\} \subseteq V \times V

is a set of edges, and

X \in R^{n \times m}

is a matrix representing node features, where m denotes the feature dimension. The i-th row of X corresponds to the feature of node

v_{i}

. The topological structure of G is depicted by an adjacency matrix

A = [a_{i j}] \in R^{n \times n}

, where

a_{i j} = 1

indicates a connection between nodes

v_{i}

and

v_{j}

, and

a_{i j} = 0

otherwise.

Definition 2.

Semi-Supervised Node Classification Task. In the attributed network,

G = (V, E, X)

, each node is categorized into one of C classes. We have labels for a subset

V_{L}

of nodes, containing u nodes, where

u ≪ n

. For each node

v_{i} \in V_{L}

, a label

y_{i} \in C

is assigned. The objective of the node classification task is to predict the labels of the remaining nodes in

V ∖ V_{L}

.

3.2. Notations of Graph Convolutional Networks

Given a network,

G = (V, E, X)

, where A is the adjacency matrix and D is the degree matrix with

D_{i i} = \sum_{j} A_{i j}

. By introducing self-loop and normalizing the adjacency matrix, we have

\tilde{A} = {\tilde{D}}^{(- 1 / 2)} \tilde{A} {\tilde{D}}^{(- 1 / 2)}

, where

\tilde{A} = A + I

(identity matrix) and

{\tilde{D}}_{i i} = \sum_{j} {\tilde{A}}_{i j}

. Then, the classic two-layer GCN can be defined as:

Z = f (X, A) = σ (\tilde{A} σ (\tilde{A} X W^{(0)}) W^{(1)}),

(1)

where

W^{(0)}

and

W^{(1)}

are learnable weight matrices,

σ

denotes non-linear activation functions such as ReLU, and Z is the final output representation for downstream tasks. Although GCNs show proficiency in various network analysis tasks such as node classification [12,45], they fail to extract useful and effective representations from networks with inconsistent structures and features, as discussed in this paper.

4. Motivating Observations

In this section, we generate two sets of synthetic graphs to evaluate the adaptability of classical GCNs in learning from features and topological structures, especially under different inconsistency levels. A graph with 900 nodes is generated, and these nodes are categorized into three groups, with each group containing 300 nodes. For nodes sharing the same category, node features are generated following a Gaussian distribution, whereas their topological structure is created using the Stochastic Block Model (SBM) [46]. As a result, node labels are correlated with both the node features and topological structure. The initial phase of our experiment involves a synthetic graph with ideal features and an ideal structure, a condition termed the “perfect setting”. We then incrementally add inconsistency. The methodology used to create these synthetic graphs and the approach used to integrate the inconsistencies are detailed in Section 6.2.3. For a comparative study, we select three benchmark models: Multi-Layer Perceptron (MLP) [24] as a feature-based model, DeepWalk [23] for structure-based analysis, and a GCN, which is a standard graph neural network model that utilizes both the graph structure and feature data, assuming their dependability and association with node labels.

4.1. Setting One: Structure Inconsistency

In this part of the experiment, we maintain constant and ideal node features that are correlated with their labels but introduce inconsistency into the topological structure. We generate a series of 11 graphs with increasing levels of structure inconsistency. The process begins with a perfectly consistent structure (i.e., structure inconsistency is 0). In the final graph of the series, the structure inconsistency reaches its maximum (equal to 1), resulting in a completely random topological structure. The results are shown in Figure 1a. Our observations reveal that all models perform efficiently under ideal conditions. However, as structural inconsistency is introduced, the MLP’s performance remains stable, given its independence from structural information. On the other hand, DeepWalk’s performance significantly worsens as structural inconsistency increases, eventually equating to the accuracy of random assignments. This decline is attributed to DeepWalk’s inability to extract meaningful insights from a random and label-unrelated topology. Similarly, the performance of the GCN also diminishes, underscoring its limitations in gleaning valuable information from unreliable topological structures.

4.2. Setting Two: Feature Inconsistency

In this part, we maintain a constant and ideal structure, and feature inconsistency is added gradually. Similarly, 11 graphs with increasing feature inconsistency levels are generated. The series starts with an ideal situation where feature inconsistency is nonexistent (equal to 0). In the final graph of this sequence, feature inconsistency reaches its peak (equal to 1), characterized by all nodes sharing the same Gaussian distribution, thereby making the features uncorrelated with the node labels. The results are illustrated in Figure 1b. The results indicate that under ideal conditions, all models perform efficiently. DeepWalk demonstrates robust performance across various feature inconsistency levels. The effectiveness of the MLP diminishes swiftly, eventually aligning with the accuracy of random label assignment. The performance of the GCN also reduces as feature inconsistency increases, confirming its limitations in extracting valuable information from unreliable feature data.

4.3. Motivation

The outcomes of these experiments reveal a critical aspect of classical GCNs: their inherent method of combining information from both features and topological structures without discerning the reliability of each. This approach results in GCNs being less effective in extracting useful information from data where either the structure or features are inconsistent. Intriguingly, models that solely rely on either a topological structure or node features have been shown to outperform GCNs in such scenarios. This limitation of GCNs has been a key motivator in our research, leading us to develop and introduce specific and joint convolution methods as the primary contribution of this work.

5. Methodology

In this section, we provide a brief overview of our approach, followed by a detailed introduction of the main components of our model.

5.1. Overview

In order to enhance information fusion and extract optimal representations from inconsistent structures and features, we introduce the MCGCN, a multi-channel graph convolutional network. The framework of the proposed approach is illustrated in Figure 2. We separately process the network structure and features, deriving two distinct representations from the structure and feature spaces to effectively address inconsistency. Initially, feature structures are generated from the original node features, and structural features are initialized based on the network topology. Subsequently, the MCGCN propagates the original node features and structural features using two specific convolution channels over the features and the network structure, respectively. Simultaneously, the original and structural features are propagated through their respective structures via two joint convolution channels, efficiently extracting shared information and enabling generalization in both consistent and partially inconsistent scenarios. Lastly, recognizing the uncertain contribution of each representation to label prediction before training, we employ an attention mechanism to adaptively learn their importance weights. This approach adaptively ensures the extraction of pertinent information for the classification task, even in scenarios where the structure and features exhibit inconsistency.

5.2. Specific Convolution Channels

To enhance the ability to manage inconsistent information, we employ two specific convolution channels to extract distinct representations from both the structure and features. Initially, we use a traditional MLP to capture representations from node features. However, this method treats each node independently, disregarding any concealed relationships among them. To address this limitation and extract more potent feature representations that account for the hidden node relationships, we leverage a specific graph convolution channel. This channel allows us to grasp the underlying connections between nodes, leading to more effective representations in the feature space. In order to carry out convolution operations within the feature space, we first generate a feature structure based on the original node features. Although there exist various methods for creating a feature structure, in this context, we choose an intuitive approach, k-nearest neighbor (kNN), to construct the feature structure, denoted as

A_{f} \in R^{n \times n}

.

Specifically, we first calculate the similarity matrix

S \in R^{n \times n}

using cosine similarity to assess the similarity between any two nodes.

S_{i j} = \frac{x_{i} x_{j}}{| x_{i} | | x_{j} |},

(2)

where

x_{i} \in R^{m}

represents the original features of node

v_{i}

. Based on the similarity matrix S, we choose the top k similar nodes for each node to establish edges and, finally, derive the adjacency matrix of the feature structure

A_{f}

. Then, with the feature structure

A_{f}

and original node features X, we use a two-layer graph convolutional network to extract specific representation from the feature space, each layer of which is expressed as:

Z_{f f}^{(l)} = σ ({\tilde{D}}_{f}^{- 1 / 2} {\tilde{A}}_{f} {\tilde{D}}_{f}^{- 1 / 2} Z_{f f}^{(l - 1)} W_{f f}^{(l)}),

(3)

where

{\tilde{A}}_{f} = A_{f} + I_{f}

(

I_{f}

is the identity matrix) and

{\tilde{D}}_{f}

denotes the degree matrix of

{\tilde{A}}_{f}

with

{({\tilde{D}}_{f})}_{i i} = \sum_{j} {({\tilde{A}}_{f})}_{i j}

.

W_{f f}^{(l)}

denotes the weight matrix of the l-th layer.

Z_{f f}^{(0)} = X

.

σ

is an activation function. We use

Relu (t) = max (0, t)

in the first layer and

linear (t) = t

in the second layer. In this way, we can obtain the final feature representation

Z_{f f}

(

Z_{f f} = Z_{f f}^{(2)}

) that captures the underlying structure of nodes in the feature space.

Next, to extract specific representations from the topological space, we employ another dedicated convolutional channel designed to capture the inherent relationships between nodes within the topology space. Specifically, we first generate the initial topological features from the original network structure. There are numerous methods available for generating these topological features. Notably, the network’s adjacency matrix, denoted as A, can directly encapsulate the structural information of nodes. Therefore, we initially consider the original adjacency matrix A as the initial topological features

X_{s}

, i.e.,

X_{s} = A

. Leveraging both the topological features and the original network structure, we subsequently employ a two-layer graph convolutional network (GCN) to distill structural representations from the structural space, each layer of which is expressed as:

Z_{s s}^{(l)} = σ ({\tilde{D}}^{- 1 / 2} \tilde{A} {\tilde{D}}^{- 1 / 2} Z_{s s}^{(l - 1)} W_{s s}^{(l)}),

(4)

where

\tilde{A} = A + I

and

\tilde{D}

denotes the degree matrix of

\tilde{A}

.

W_{s s}^{(l)}

denotes the weight matrix of the l-th layer.

Z_{s s}^{(0)} = X_{s}

. We also use Relu and

linear (t) = t

in the first and second layers, respectively. Moreover, as shown in (4), we can also set the initial structure features

Z_{s s}^{(0)} = X_{s} = I

, as used in SAT [47]. Finally, we obtain the final structure representation

Z_{s s}^{(2)}

that captures the underlying relationship of nodes in the structure space.

With these two specific convolution channels, we can extract specific representations from both the feature and topology spaces. In this way, our model can adaptively choose useful information for graphs with inconsistent structures and features according to specific downstream tasks.

5.3. Joint Convolution Channels

Many real-world networks exhibit diverse degrees of alignment between their partial network structures and node features. To ensure that our model is capable of handling both consistent and inconsistent scenarios, we further design two joint convolutional channels. These channels effectively merge the network structure and node feature information, enhancing flexibility in classification tasks to identify more pertinent information.

Specifically, our method propagates the original features across the network structure and topological features through the feature structure using two joint convolution channels. It is important to highlight that this approach is distinct from traditional GCN-based methods that only propagate features across the network structure. The new joint convolution channels effectively fuse these two distinct types of information more efficiently.

First, based on the original network structure and node features, we use a two-layer graph convolutional network, each layer of which is expressed as:

Z_{f s}^{(l)} = σ ({\tilde{D}}^{- 1 / 2} \tilde{A} {\tilde{D}}^{- 1 / 2} Z_{f s}^{(l - 1)} W_{f s}^{(l)}),

(5)

where

W_{f s}^{(l)}

denotes the weight matrix of the l-th layer and

Z_{f s}^{(0)} = X

.

Second, based on the feature structure and topological features, we use a two-layer graph convolutional network, each layer of which is expressed as:

Z_{s f}^{(l)} = σ ({\tilde{D}}_{f}^{- 1 / 2} {\tilde{A}}_{f} {\tilde{D}}_{f}^{- 1 / 2} Z_{s f}^{(l - 1)} W_{s f}^{(l)}),

(6)

where

W_{s f}^{(l)}

denotes the weight matrix of the l-th layer and

Z_{s f}^{(0)} = X_{s}

. Finally, we obtain two final joint representations

Z_{f s}^{(2)}

and (

Z_{s f}^{(2)}

). With these two joint convolution channels, we can extract the common information shared by both the structure and features.

5.4. Attention Mechanism

Following the utilization of two distinct convolution channels and two combined convolution channels, we can successfully acquire the structural representation

Z_{s s}

, feature representation

Z_{f f}

, and two joint representations

Z_{f s}

and

Z_{s f}

. Nonetheless, predicting which representation correlates most with the classification task can be challenging, particularly in cases where the structure and features do not match. To address this, we employ an attention mechanism, which adaptively combines these representations using varying degrees of importance weights, enabling the automatic extraction of valuable insights. Their importance weights

(α_{s s}, α_{f f}, α_{f s}, α_{s f})

are defined as follows:

(α_{s s}, α_{f f}, α_{f s}, α_{s f}) = attention (Z_{s s}, Z_{f f}, Z_{f s}, Z_{s f}),

(7)

where

α_{s s}, α_{f f}, α_{f s}, α_{s f} \in R^{n \times 1}

represent the attention weights of the representations

Z_{s s}

,

Z_{f f}

,

Z_{f s}

,

Z_{s f} \in R^{n \times d}

, respectively. Specifically, we take the structure representation of node

v_{i}

,

z_{s s}^{i} \in R^{1 \times d}

as an example. To obtain the attention value of the structure representation, we first transform the representation through linear transformation, followed by a nonlinear activation function tanh. Then, we apply a shared attention vector

q \in R^{d^{'} \times 1}

to obtain the attention value as follows:

β_{s s}^{i} = q^{T} \cdot tanh (W_{a} \cdot {(z_{s s}^{i})}^{T} + b_{1}),

(8)

where

W_{a} \in R^{d^{'} \times d}

represents the learnable weight matrix and

b_{1} \in R^{d^{'} \times 1}

denotes the bias vector. Similarly, we can obtain the attention values of

β_{f f}^{i}, β_{f s}^{i}, β_{s f}^{i}

. We can obtain the final importance weights by normalizing the attention values using a softmax function [48]:

α_{s s}^{i} = softmax (β_{s s}^{i}) = \frac{exp (β_{s s}^{i})}{exp (β_{s s}^{i} + β_{f f}^{i} + β_{f s}^{i} + β_{s f}^{i})} .

(9)

The above equation implies that a larger

α_{s s}^{i}

value increases the importance of the structure representation. Similarly, we have:

\begin{matrix} α_{f f}^{i} = softmax (β_{f f}^{i}) \\ α_{f s}^{i} = softmax (β_{f s}^{i}) \\ α_{s f}^{i} = softmax (β_{s f}^{i}) . \end{matrix}

(10)

Based on the importance weights, we can obtain the final representation of node

v_{i}

as follows:

z^{i} = α_{s s}^{i} z_{s s}^{i} + α_{f f}^{i} z_{f f}^{i} + α_{f s}^{i} z_{f s}^{i} + α_{s f}^{i} z_{s f}^{i} .

(11)

The final representation for all n nodes is expressed in matrix form as follows:

Z = α_{S S} Z_{s s} + α_{F F} Z_{f f} + α_{F S} Z_{f s} + α_{S F} Z_{s f},

(12)

where

α_{S S} = d i a g (α_{s s})

,

α_{F F} = d i a g (α_{f f})

,

α_{F S} = d i a g (α_{f s})

and

α_{S F} = d i a g (α_{s f})

.

5.5. Optimization Objective

Based on the final representation Z obtained using (12), we then apply a linear transformation and softmax function to obtain the predicted soft label assignment matrix

\hat{Y} \in R^{n \times C}

as follows:

\hat{Y} = softmax (W_{c} Z + b_{2}),

(13)

where

{\hat{Y}}_{i c}

represents the probability of node

v_{i}

belonging to class c. Then, for the semi-supervised node classification task, we can minimize the cross-entropy over all labeled nodes between the ground truth and the prediction:

L = \sum_{l \in V_{L}} Y_{l} ln {\hat{Y}}_{l},

(14)

where Y represents the label indicator matrix, i.e.,

Y_{i c} = 1

if node

v_{i}

belongs to class c, or

Y_{i c} = 0

otherwise.

6. Experiments

In this section, we first give the experimental settings and then compare our proposed approach (MCGCN) with some state-of-the-art methods on two network analysis tasks, including transductive node classification and visualization. Next, we give attention to the analysis of synthetic datasets to validate whether the MCGCN can learn interpretable importance weights.

6.1. Experimental Setting

6.1.1. Datasets

Eight publicly available real-world datasets were utilized for evaluation. The Cornell, Texas, and Wisconsin datasets consist of web pages as nodes, hyperlinks as edges, and page categories (student, project, course, staff, and faculty) as node labels. These datasets were obtained from the respective web pages of Cornell University, the University of Texas, and the University of Wisconsin. The textual content of the web pages is typically processed and converted into feature vectors, which serve as input attributes for each node in the learning process. They are benchmarks for evaluating the performance of various graph-based machine learning algorithms, especially GCNs and other GNN models. The Film dataset is an actor co-occurrence network [49]. where nodes are actors, edges represent co-occurrence on the same Wikipedia page, and labels categorize actors based on their Wikipedia content. The Chameleon and Squirrel datasets are Wikipedia networks [50] with web pages as nodes, mutual links as edges, and labels categorized by the average monthly traffic of each page. The Cora and Citeseer datasets are citation networks [51,52], where nodes are academic papers, edges are citations, and node labels represent academic topics. Paper contents are processed as node features. Table 1 summarizes the statistics of these datasets.

6.1.2. Baselines

We compared two variants of the MCGCN with six state-of-the-art methods (DeepWalk, MLP, GCN, GAT, H2GCN [25], GPRGNN [29]): (1) MCGCN-A, which initializes topological features with an adjacency matrix, and (2) MCGCN-I, which initializes topological features with an identity matrix.

6.1.3. Parameter Setting

Following the Geom-GCN approach [26], for each dataset, we performed ten random separations, allocating 38%, 52%, and 10% of the total number of nodes for training, validation, and testing, respectively. All the parameters of the baselines were initialized as suggested in their respective papers. In our methods, MCGCN-A and MCGCN-I, we employed two-layer GCNs across four convolution channels. For Citeseer, we maintained a consistent hidden layer dimension of 512 and an output layer dimension of 32. ReLU was used as the activation function for all GCN modules. For the feature structure, we constructed a kNN graph using

k \in {2, \dots, 9}

. Our models were developed using PyTorch deep learning tools and optimized with the Adam optimizer. The learning rate was set to 0.001, and we assessed model performance using classification accuracy.

6.2. Results and Analysis

6.2.1. Node Classification

Table 2 demonstrates that our methods, MCGCN-A and MCGCN-I, achieved the best performance on seven out of the eight datasets and the second-best performance on the remaining dataset.

Our models demonstrated a noticeable improvement over the standard DeepWalk, GCN, and GAT across all datasets. This is primarily due to the MCGCN’s multi-channel approach, which allows it to better capture the complex interplay between node features and the graph structure. DeepWalk and GCN, while robust in simpler graph environments, fell short in scenarios where the graph structure was inconsistent or features were complex. Compared to GAT, the MCGCN exhibited superior performance, particularly in environments with inconsistent graph structures. The MCGCN outperformed GAT, which employs attention mechanisms at the node level. This suggests that the MCGCN’s method for handling inconsistency through multiple channels is more effective than the attention-based strategies used in GAT. The H2GCN achieved the second-best result on the Citeseer dataset. It addressed the limitations of GNNs in handling heterophily or low-homophily settings. However, the real-world complexity of heterophily might not have been fully captured in the models and datasets used. GPRGNN, which utilizes a novel GPRGNN architecture that adaptively learns GPR weights, achieved the second-best result on both the Chameleon and Film datasets. This adaptability is crucial for optimizing the use of both node features and topological information, regardless of the extent to which node labels exhibit homophily or heterophily.

The MCGCN’s strength lies in its ability to handle inconsistency in both graph structures and node features, making it particularly suitable for complex networks found in domains like social network analysis or bioinformatics. However, its complex architecture may not yield significant improvements in more uniform or consistent graph datasets. In such scenarios, the additional computational complexity of the MCGCN might not translate into proportionate performance gains, especially when compared to simpler models like the GCN or GAT, which are optimized for such environments.

In conclusion, the MCGCN represents a significant advancement in graph convolutional networks, especially for challenging datasets with structural and feature inconsistencies. Its multi-channel approach offers a novel and effective solution for these complex scenarios, though its complexity may not always be necessary for more straightforward graph analysis tasks.

6.2.2. Visualization

To demonstrate our approach’s superior performance, we employed t-SNE [53], which can project the learned node representations into a two-dimensional space, to visualize the representations in the Cora dataset as an example. Figure 3 displays the visualization results for the MLP, DeepWalk, GCN, and MCGCN-A, with each color denoting a categorical label. As shown, the results for the MLP, DeepWalk, and GCN are less satisfactory, since the borders between different segmented groups are unclear, and some points of different classes are mixed with each other. In contrast, the MCGCN forms more discernible clusters.

6.2.3. Synthetic Experiments

In this section, we elaborate on the generation of the synthetic datasets described in Section 4, which are utilized in the subsequent attention analysis.

Each synthetic graph comprised 900 nodes, and each node had a 50-dimensional feature vector. These 900 nodes were divided into three groups of 300, with each group receiving a distinct label. Although the three Gaussian distributions for these groups shared a common covariance matrix, their centers were initially distanced from each other. The topological structure was created using the SBM model [46], where each group was treated as a community. Intra-community edges were formed with a probability of 0.03, while initially, there were no inter-community edges. The node labels correlated with both the node features and the topological structure. Starting from this ideal scenario, we gradually added inconsistency.

In scenarios with structural inconsistencies, we maintained perfect and fixed node features while setting the intra-community edge probability at 0.03. We began with a synthetic graph with a 0 probability for inter-community edges. Inconsistency was introduced by incrementally increasing the inter-community edge probability by 0.003 for each new graph. In total, eleven graphs were generated, culminating in a final graph where the inter-community edge probability reached 0.03, matching the intra-community probability.

In settings with feature inconsistencies, we began with an ideal scenario, where three distinct node groups were formed using three Gaussian distributions with centers distanced from each other. We assumed that three randomly generated Gaussian distribution centers were

f_{1}

,

f_{2}

, and

f_{3}

, with a mean of

f_{m e a n} = (f_{1} + f_{2} + f_{3}) / 3

. We generated eleven graphs, each with centers defined as

f_{k} [i] = i \times f_{k} + (1 - i) \times f_{m e a n}

, where

k = 1, 2, 3

. When

i = 1

, the three centers were in their original positions, distant from each other. We then decreased i by 0.1 for each subsequent graph, causing each center to gradually shift toward

f_{m e a n}

. In the final graph of the series, with

i = 0

, all three Gaussian distributions converged at the same center.

The results, presented in Figure 1, indicate that the MCGCN consistently performed well in both groups of settings. This demonstrates the MCGCN’s effectiveness across various inconsistency levels, aligning with our expectations.

6.2.4. Attention Analysis

To further validate the MCGCN’s ability to yield interpretable results, we analyzed the distributions and trends of its attention values. Given the lack of a definitive measure for inconsistency in real-world datasets, we analyzed the MCGCN’s attention values on the synthetic datasets described earlier. This analysis helps illustrate how attention values shift with varying degrees of inconsistency in structures or features. Figure 4 shows the trends of the attention values on four representations (i.e., structure representation, feature representation, and two joint representations). Initially, when both the features and structure are informative, attention is distributed uniformly across all representations, as depicted in Figure 4a. However, as structural inconsistency intensifies, particularly approaching 1, the attention predominantly shifts toward feature representation. The results validate that the MCGCN can extract information that is correlated with the classification task and filter the redundant information contained in an unreliable structure. A similar pattern is observed in Figure 4b, where increasing feature inconsistency leads to a gradual shift of attention toward structure representation. These trends, illustrated in Figure 4, indicate that the MCGCN can adaptively learn the effective representation from an inconsistent structure and features and further demonstrate that our proposed MCGCN can obtain more interpretable results.

6.3. Case Study on Recommendation Task

6.3.1. Datasets

To evaluate the performance of the MCGCN in the context of recommendations, we constructed a dataset focused on person–job fit, sourced from the renowned human resources platform Xinrenxinshi.com, as outlined in Table 3. The dataset allocation included 80% for training, with the remaining 20% equally divided between validation and testing, each constituting 10%. BERT was utilized to initialize the representations for each person or job node, deriving these from their respective textual descriptions.

6.3.2. Baselines

For the job recommendation task, we employed GCN, GAT, AGC [54], and HAN [55] as baseline methods. These models are recognized as state of the art in this domain. Notably, HAN introduces a unique technique through its use of hierarchical attention networks. This method enables a deeper comprehension of the inherent hierarchical structures present in documents.

6.3.3. Metrics

The AUC (Area Under the Receiver Operating Characteristic Curve) metric is of paramount importance in assessing the performance of recommendation tasks. Furthermore, we employed four other widely acknowledged metrics—accuracy, recall, and F1 score—to comprehensively evaluate effectiveness.

6.3.4. Results and Analysis

Table 4 displays the job recommendation results on the Xinrenxinshi dataset, with our method exhibiting outstanding performance. Our approach surpassed all competing methods in terms of accuracy and F1 score while achieving the second-best results in terms of the AUC. In terms of recall, AGC and GAT marginally outperformed our model, with differences of 3.79% and 0.45%. Overall, as illustrated in Table 4, our model demonstrated either superior or comparable effectiveness in relation to other baseline models across most of the evaluated metrics.

7. Conclusions and Future Work

In this paper, we propose the MCGCN, a novel multi-channel graph convolutional network for graphs with inconsistent structures and features. The proposed approach includes four main convolution channels and an attention mechanism. To handle inconsistent information, two specific convolution channels are designed to extract representations from the structure and feature spaces. Additionally, two joint convolution channels are developed to fuse and capture the common information shared by both the structure and node features, enabling the handling of both consistent and inconsistent scenarios. An attention mechanism is employed to adaptively learn the importance weights of these representations, focusing on the information most relevant to node label prediction. Comprehensive experiments on synthetic and real-world datasets demonstrate the proposed approach’s superior performance compared to several existing state-of-the-art methods.

The MCGCN is particularly well suited for applications involving complex network structures with variable or inconsistent data, such as social networks, biological networks, and recommendation systems. Its ability to handle inconsistency and extract meaningful representations from complex graphs makes it a valuable tool in these areas. However, for simpler graph structures or tasks where the overhead of multiple convolution channels is not justified, traditional single-channel GCN models might be more efficient.

Incorporating the proposed approach into real-world applications demands careful consideration of several practical aspects. (1) Scalability: As the approach is designed for graph-based data, which can be extensive and complex in real-world scenarios, it is vital to ensure that the model scales efficiently. This might involve using distributed computing techniques to manage large datasets and optimizing the model’s architecture to handle an increasing number of nodes and edges without significantly compromising performance. (2) Deployment challenges: Integrating this model into existing systems involves challenges such as ensuring compatibility with current infrastructures, minimal downtime, and maintaining data integrity and security during the integration process. Continuous monitoring and maintenance will be required to ensure the model’s performance remains optimal over time. (3) Computational resources: The proposed approach requires significant computational resources for both training and inference. The training phase might require high-performance GPUs or even TPUs to manage the computational load, especially for larger datasets. For inference, especially in real-time applications, the computational demands can be substantial. It is crucial to balance the computational costs with the performance gains provided by the model. In summary, the successful implementation of the proposed approach in real-world applications requires a well-thought-out strategy that addresses scalability, deployment, and computational resources.

In future research, we plan to explore alternative approaches for initializing topological features and generating feature structures. Furthermore, we aim to enhance our method with a data-driven, self-supervised learning approach, making it applicable to graphs without available label information.

Author Contributions

Conceptualization, X.C. and Y.W.; methodology, J.W.; software (https://github.com/wyking36/AGC, accessed on 1 January 2024), R.W. and T.W.; validation, T.W. and W.L.; writing—original draft preparation, X.C.; writing—review and editing, J.W.; visualization, R.W.; supervision, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 62276187 and No. U2031142).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Author Xinglong Chang was employed by the company Qijia Youdao Network Technology (Beijing) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Mitchell, J.C. Social networks. Annu. Rev. Anthropol. 1974, 3, 279–299. [Google Scholar] [CrossRef]
Milroy, L.; Llamas, C. Social networks. In The Handbook of Language Variation and Change; Wiley-Blackwell: Oxford, UK, 2013; pp. 407–427. [Google Scholar]
Radicchi, F.; Fortunato, S.; Vespignani, A. Citation networks. In Models of Science Dynamics: Encounters between Complexity Theory and Information Sciences; Springer: Berlin/Heidelberg, Germany, 2011; pp. 233–257. [Google Scholar]
Greenberg, S.A. How citation distortions create unfounded authority: Analysis of a citation network. BMJ 2009, 339, b2680. [Google Scholar] [CrossRef] [PubMed]
Shaw, M.E. Communication networks. In Advances in Experimental Social Psychology; Elsevier: Amsterdam, The Netherlands, 1964; Volume 1, pp. 111–147. [Google Scholar]
Monge, P.R.; Contractor, N.S. Theories of Communication Networks; Oxford University Press: New York, NY, USA, 2003. [Google Scholar]
Alm, E.; Arkin, A.P. Biological networks. Curr. Opin. Struct. Biol. 2003, 13, 193–202. [Google Scholar] [CrossRef] [PubMed]
Girvan, M.; Newman, M.E. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826. [Google Scholar] [CrossRef] [PubMed]
Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864. [Google Scholar]
Chen, Y.; Wu, L.; Zaki, M. Iterative deep graph learning for graph neural networks: Better and robust node embeddings. Adv. Neural Inf. Process. Syst. 2020, 33, 19314–19326. [Google Scholar]
Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the ICLR, Toulon, France, 24–26 April 2017. [Google Scholar]
Tang, J.; Aggarwal, C.C.; Liu, H. Node classification in signed social networks. In Proceedings of the ICDM, Barcelona, Spain, 12–15 December 2016; pp. 54–62. [Google Scholar]
Gao, S.; Denoyer, L.; Gallinari, P. Temporal link prediction by integrating content and structure information. In Proceedings of the CIKM, Scotland, UK, 24–28 October 2011; pp. 1169–1174. [Google Scholar]
Yu, X.; Han, J. Personalized entity recommendation: A heterogeneous information network approach. In Proceedings of the WSDM, New York City, NY, USA, 24–28 February 2014; pp. 283–292. [Google Scholar]
Hamilton, W.L.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the NIPS, Long Beach, CA, USA, 12–24 December 2017; pp. 1024–1034. [Google Scholar]
Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. In Proceedings of the ICLR, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How Powerful are Graph Neural Networks? In Proceedings of the ICLR, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
Shervashidze, N.; Schweitzer, P.; van Leeuwen, E.J.; Mehlhorn, K.; Borgwardt, K.M. Weisfeiler-Lehman Graph Kernels. J. Mach. Learn. Res. 2011, 12, 2539–2561. [Google Scholar]
Wu, F.; Souza, A.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K. Simplifying Graph Convolutional Networks. In Proceedings of the ICML, Long Beach, CA, USA, 10–15 June 2019; pp. 6861–6871. [Google Scholar]
Li, Q.; Han, Z.; Wu, X. Deeper Insights Into Graph Convolutional Networks for Semi-Supervised Learning. In Proceedings of the AAAI, New Orleans, LA, USA, 2–7 February 2018; pp. 3538–3545. [Google Scholar]
Qin, M.; Jin, D.; Lei, K.; Gabrys, B.; Musial-Gabrys, K. Adaptive community detection incorporating topology and content in social networks. Knowl. Based Syst. 2018, 161, 342–356. [Google Scholar] [CrossRef]
Zügner, D.; Akbarnejad, A.; Günnemann, S. Adversarial attacks on neural networks for graph data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 19–23 August 2018; pp. 2847–2856. [Google Scholar]
Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the KDD, New York, NY, USA, 24–27 August 2014; pp. 701–710. [Google Scholar]
Pal, S.K.; Mitra, S. Multilayer perceptron, fuzzy sets, and classification. IEEE Trans. Neural Netw. 1992, 3, 683–697. [Google Scholar] [CrossRef]
Zhu, J.; Yan, Y.; Zhao, L.; Heimann, M.; Akoglu, L.; Koutra, D. Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs. In Proceedings of the NIPS, Virtual-only Conference, 6–12 December 2020. [Google Scholar]
Pei, H.; Wei, B.; Chang, K.C.; Lei, Y.; Yang, B. Geom-GCN: Geometric Graph Convolutional Networks. In Proceedings of the ICLR, Virtual-only Conference, 26 April–1 May 2020. [Google Scholar]
Zhu, J.; Rossi, R.A.; Rao, A.B.; Mai, T.; Lipka, N.; Ahmed, N.K.; Koutra, D. Graph Neural Networks with Heterophily. In Proceedings of the AAAI, Virtual-only Conference, 2–9 February 2021. [Google Scholar]
Yan, Y.; Hashemi, M.; Swersky, K.; Yang, Y.; Koutra, D. Two Sides of the Same Coin: Heterophily and Oversmoothing in Graph Convolutional Neural Networks. arXiv 2021, arXiv:2102.06462. [Google Scholar]
Chien, E.; Peng, J.; Li, P.; Milenkovic, O. Adaptive Universal Generalized PageRank Graph Neural Network. In Proceedings of the ICLR, Virtual-only Conference, 3–7 May 2021. [Google Scholar]
Wang, K.; Zhu, Y.; Liu, H.; Zang, T.; Wang, C. Learning Aspect-Aware High-Order Representations from Ratings and Reviews for Recommendation. ACM Trans. Knowl. Discov. Data 2023, 17, 1–22. [Google Scholar] [CrossRef]
Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N.; Jiang, P.T.; Mu, T.J.; Zhang, S.H.; Martin, R.R.; Cheng, M.M.; Hu, S.M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80. [Google Scholar] [CrossRef]
Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24. [Google Scholar] [CrossRef]
Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Proceedings of the NIPS, Barcelona, Spain, 5–10 December 2016; pp. 3837–3845. [Google Scholar]
Shuman, D.I.; Narang, S.K.; Frossard, P.; Ortega, A.; Vandergheynst, P. The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains. IEEE Signal Process. Mag. 2013, 30, 83–98. [Google Scholar] [CrossRef]
Ge, L.; Li, S.; Wang, Y.; Chang, F.; Wu, K. Global spatial-temporal graph convolutional network for urban traffic speed prediction. Appl. Sci. 2020, 10, 1509. [Google Scholar] [CrossRef]
Chen, Z.; Li, S.; Yang, B.; Li, Q.; Liu, H. Multi-scale spatial temporal graph convolutional network for skeleton-based action recognition. In Proceedings of the AAAI, Virtual-only Conference, 2–9 February 2021; Volume 35, pp. 1113–1122. [Google Scholar]
Shan, X.; Cao, J.; Huo, S.; Chen, L.; Sarrigiannis, P.G.; Zhao, Y. Spatial–temporal graph convolutional network for Alzheimer classification based on brain functional connectivity imaging of electroencephalogram. Hum. Brain Mapp. 2022, 43, 5194–5209. [Google Scholar] [CrossRef]
Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
Rong, Y.; Huang, W.; Xu, T.; Huang, J. Dropedge: Towards deep graph convolutional networks on node classification. arXiv 2019, arXiv:1907.10903. [Google Scholar]
Wang, X.; Zhu, M.; Bo, D.; Cui, P.; Shi, C.; Pei, J. Am-gcn: Adaptive multi-channel graph convolutional networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 6–10 July 2020; pp. 1243–1253. [Google Scholar]
Zhang, H.; Tian, Q.; Han, Y. Multi channel spectrum prediction algorithm based on GCN and LSTM. In Proceedings of the 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall), London, UK, 26–29 September 2022; pp. 1–5. [Google Scholar]
Zhai, R.; Zhang, L.; Wang, Y.; Song, Y.; Yu, J. A multi-channel attention graph convolutional neural network for node classification. J. Supercomput. 2023, 79, 3561–3579. [Google Scholar] [CrossRef]
Hu, F.; Zhu, Y.; Wu, S.; Wang, L.; Tan, T. Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; pp. 4532–4539. [Google Scholar]
Karrer, B.; Newman, M.E.J. Stochastic blockmodels and community structure in networks. Phys. Rev. 2010, 83, 016107. [Google Scholar] [CrossRef] [PubMed]
Chen, X.; Chen, S.; Yao, J.; Zheng, H.; Zhang, Y.; Tsang, I.W. Learning on attribute-missing graphs. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 740–757. [Google Scholar] [CrossRef]
Liu, W.; Wen, Y.; Yu, Z.; Yang, M. Large-margin softmax loss for convolutional neural networks. In Proceedings of the 33rd International Conference on International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; Volume 48, pp. 507–516. [Google Scholar]
Tang, J.; Sun, J.; Wang, C.; Yang, Z. Social influence analysis in large-scale networks. In Proceedings of the KDD, Paris, France, 28 June–1 July 2009; pp. 807–816. [Google Scholar]
Rozemberczki, B.; Allen, C.; Sarkar, R. Multi-Scale attributed node embedding. J. Complex Netw. 2021, 9, cnab014. [Google Scholar] [CrossRef]
Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Gallagher, B.; Eliassi-Rad, T. Collective Classification in Network Data. AI Mag. 2008, 29, 93–106. [Google Scholar] [CrossRef]
Namata, G.M.; London, B.; Getoor, L.; Huang, B. Query-driven Active Surveying for Collective Classification. In Proceedings of the Workshop on Mining and Learning with Graphs (MLG), Edinburgh, UK, 1 July 2012. [Google Scholar]
Hinton, G.E. Visualizing High-Dimensional Data Using t-SNE. Vigiliae Christ. 2008, 9, 2579–2605. [Google Scholar]
Chang, X.; Wang, J.; Guo, R.; Wang, Y.; Li, W. Asymmetric Graph Contrastive Learning. Mathematics 2023, 11, 4505. [Google Scholar] [CrossRef]
Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489. [Google Scholar]

Figure 1. Classification accuracy on synthetic graphs. The x-axis represents the structure/feature inconsistency levels.

Figure 2. The framework of the MCGCN. We first generate topological features from the original structure and create a feature structure from the original features. Then, we design four convolution channels to capture different information: two specific convolution channels to separately capture feature and structure information, and two joint convolution channels to effectively extract the common information shared by the structure and features. Finally, we employ an attention mechanism to adaptively learn the importance weights of these representations.

Figure 3. Visualization results on the Cora dataset. Each color denotes a categorical label (i.e., academic topic).

Figure 4. Attention analysis of the MCGCN on synthetic datasets. The x-axis represents the feature/structure inconsistency levels.

Table 1. The statistics of the datasets. # denotes the number of objects.

Dataset	Texas	Wisconsin	Cornell	Squirrel	Chameleon	Film	Cora	Citeseer
# Nodes	183	251	183	5201	2277	7600	2708	3327
#Edges	309	499	295	217,073	36,101	33,544	5429	4732
#Features	1703	1703	1703	2089	2325	931	1433	3703
#Classes	5	5	5	5	5	5	7	6

Table 2. Mean accuracies for transductive node classification task (%). The best result is indicated in bold and the second-best result is underlined.

	Texas	Wisconsin	Cornell	Squirrel	Chameleon	Film	Cora	Citeseer
DeepWalk	49.19 ± 0.38	53.51 ± 1.10	44.12 ± 0.52	32.37 ± 0.95	42.61 ± 0.42	23.74 ± 0.56	76.08 ± 0.63	53.59 ± 0.63
MLP	77.30 ± 0.55	83.01 ± 1.02	77.98 ± 0.83	34.39 ± 0.43	45.47 ± 0.37	32.78 ± 0.52	72.30 ± 0.88	70.17 ± 0.62
GCN	52.16 ± 1.04	55.88 ± 0.97	52.70 ± 0.71	37.96 ± 1.13	60.03 ± 0.74	27.92 ± 0.51	85.21 ± 0.53	73.68 ± 0.47
GAT	58.38 ± 0.48	54.41 ± 0.94	54.32 ± 0.38	30.03 ± 1.28	59.93 ± 0.69	28.15 ± 0.92	85.34 ± 0.73	73.92 ± 0.43
H2GCN	77.57 ± 0.87	81.72 ± 0.74	77.81 ± 0.69	40.14 ± 0.47	59.64 ± 1.02	31.63 ± 0.49	85.27 ± 0.32	74.42 ± 0.49
GPRGNN	77.83 ± 0.43	81.96 ± 0.96	77.93 ± 0.98	41.81 ± 0.89	60.09 ± 0.73	33.25 ± 0.47	85.79 ± 0.36	73.37 ± 0.90
MCGCN-A	78.46 ± 0.39	82.39 ± 0.91	78.65 ± 0.59	41.74 ± 1.28	59.91 ± 0.47	33.46 ± 0.62	86.32 ± 0.60	74.04 ± 0.67
MCGCN-I	78.39 ± 0.47	82.55 ± 0.42	78.21 ± 0.38	42.21 ± 1.79	61.64 ± 0.35	33.17 ± 0.58	86.28 ± 0.74	74.51 ± 0.34

Table 3. Statistics of the Xinrenxinshi dataset. # denotes the number of objects.

#Person	#Job	#Interaction	#Recommend	#Interview	#Offer
9719	2035	15,101	1735	952	159

Table 4. A comparison of the job recommendation results on the Xinrenxinshi dataset. The best result is indicated in bold and the second-best result is underlined.

Method	AUC	Accuracy	Recall	F1 Score
GCN	0.7789	0.7347	0.9155	0.7753
GAT	0.8176	0.7300	0.9765	0.7834
AGC	0.8909	0.6596	0.9437	0.7349
HAN	0.6465	0.5266	0.7230	0.5693
Ours	0.8448	0.7483	0.9395	$0.7891$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chang, X.; Wang, J.; Wang, R.; Wang, T.; Wang, Y.; Li, W. Multi-Channel Graph Convolutional Networks for Graphs with Inconsistent Structures and Features. Electronics 2024, 13, 607. https://doi.org/10.3390/electronics13030607

AMA Style

Chang X, Wang J, Wang R, Wang T, Wang Y, Li W. Multi-Channel Graph Convolutional Networks for Graphs with Inconsistent Structures and Features. Electronics. 2024; 13(3):607. https://doi.org/10.3390/electronics13030607

Chicago/Turabian Style

Chang, Xinglong, Jianrong Wang, Rui Wang, Tao Wang, Yingkui Wang, and Weihao Li. 2024. "Multi-Channel Graph Convolutional Networks for Graphs with Inconsistent Structures and Features" Electronics 13, no. 3: 607. https://doi.org/10.3390/electronics13030607

APA Style

Chang, X., Wang, J., Wang, R., Wang, T., Wang, Y., & Li, W. (2024). Multi-Channel Graph Convolutional Networks for Graphs with Inconsistent Structures and Features. Electronics, 13(3), 607. https://doi.org/10.3390/electronics13030607

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Multi-Channel Graph Convolutional Networks for Graphs with Inconsistent Structures and Features

Abstract

1. Introduction

2. Related Works

2.1. Graph Convolutional Networks

2.2. Multi-Channel Graph Convolutional Networks

3. Preliminaries

3.1. Problem Definition

3.2. Notations of Graph Convolutional Networks

4. Motivating Observations

4.1. Setting One: Structure Inconsistency

4.2. Setting Two: Feature Inconsistency

4.3. Motivation

5. Methodology

5.1. Overview

5.2. Specific Convolution Channels

5.3. Joint Convolution Channels

5.4. Attention Mechanism

5.5. Optimization Objective

6. Experiments

6.1. Experimental Setting

6.1.1. Datasets

6.1.2. Baselines

6.1.3. Parameter Setting

6.2. Results and Analysis

6.2.1. Node Classification

6.2.2. Visualization

6.2.3. Synthetic Experiments

6.2.4. Attention Analysis

6.3. Case Study on Recommendation Task

6.3.1. Datasets

6.3.2. Baselines

6.3.3. Metrics

6.3.4. Results and Analysis

7. Conclusions and Future Work

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI