Efficient Graph Representation Learning by Non-Local Information Exchange
Abstract
1. Introduction
- We propose a non-local information exchange mechanism to efficiently integrate feature representations from non-local neighborhoods with a collection of express connections.
- Current works focus on optimizing feature representations with deep GNN models by overcoming the over-smoothing issue. We address this challenge from the novel perspective of expressibility: the insight behind our NLE mechanism is to combine local and global information directly through express links, before over-smoothed features undermine the discriminative power of the representation.
- We devise our ExM wrapper as a pre-processing step that generates new topologies in advance, enabling various GNN models to achieve SOTA performance on both homophilous and heterophilous graph data.
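The express connections described above amount to a graph re-wiring pre-processing step. As a minimal illustration (not the paper's actual implementation — the function names, the pure-Python adjacency-set representation, and the BFS formulation are our assumptions), one way to add an express edge between every pair of nodes exactly k hops apart:

```python
from collections import deque

def k_hop_neighbors(adj, src, k):
    """Return the set of nodes exactly k hops from `src` via BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand past the target hop distance
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v, d in dist.items() if d == k}

def add_express_links(adj, k):
    """Re-wire the graph by adding an undirected 'express' edge between
    every pair of nodes exactly k hops apart (input left untouched)."""
    new_adj = {u: set(nbrs) for u, nbrs in adj.items()}
    for u in adj:
        for v in k_hop_neighbors(adj, u, k):
            new_adj[u].add(v)
            new_adj[v].add(u)
    return new_adj

# A 5-node path graph 0-1-2-3-4; 2-hop express links join (0,2), (1,3), (2,4).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
rewired = add_express_links(path, 2)
```

On the path graph, node 2 gains express neighbors 0 and 4 in addition to its local neighbors 1 and 3, so non-local information reaches it in a single message-passing step.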
2. Preliminaries
2.1. Related Works
2.2. Proxy Measurement for Expressibility
3. Methods
3.1. Graph Re-Wiring by Hierarchical Non-Local Information Exchange
3.2. Express Messenger (ExM) Wrapper
3.2.1. Exhaustive vs. Progressive NLE
Cascaded Express Messenger (C-ExM) Wrapper
Algorithm 1 Cascaded ExM wrapper
Require: range of hopping steps …, hop …
while … do
  if l is odd or … then
    …
  else if l is even then
    …
  end if
end while
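The pseudocode for Algorithm 1 survives only in fragments, but its skeleton (a loop over hopping steps with an odd/even branch on the step index l) suggests cascading message passing that alternates between express and local topologies. A hypothetical sketch under that reading, with a simple mean aggregation standing in for an arbitrary GNN layer (both helpers are our placeholders, not the paper's code):

```python
def mean_aggregate(h, adj):
    """Placeholder GNN layer: mean of self and neighbor features."""
    return {u: sum(h[v] for v in adj[u] | {u}) / (len(adj[u]) + 1)
            for u in h}

def cascaded_exm(h, num_layers, local_adj, express_adj):
    """Cascaded ExM (sketch): odd steps pass messages over the express
    (non-local) topology, even steps over the original local one."""
    for l in range(1, num_layers + 1):
        h = mean_aggregate(h, express_adj if l % 2 == 1 else local_adj)
    return h

local = {0: {1}, 1: {0, 2}, 2: {1}}    # path graph 0-1-2
express = {0: {2}, 1: set(), 2: {0}}   # 2-hop express links
out = cascaded_exm({0: 0.0, 1: 1.0, 2: 2.0}, 1, local, express)
```

After a single express step, the endpoint features of the path are already mixed, which is the point of routing non-local information before deep stacking smooths it away.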
Aggregated Express Messenger (A-ExM) Wrapper
Algorithm 2 Aggregated ExM wrapper
Require: range of hopping steps …, hop …
while … do
  while … do
    …
  end while
end while
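The nested loops in the fragmentary pseudocode of Algorithm 2 suggest that the aggregated variant runs message passing over each hop topology independently and then merges the branch outputs, rather than cascading them. A sketch under that assumption (uniform averaging is our placeholder for the merge step, and the mean-aggregation helper stands in for a real GNN layer):

```python
def mean_aggregate(h, adj):
    """Placeholder GNN layer: mean of self and neighbor features."""
    return {u: sum(h[v] for v in adj[u] | {u}) / (len(adj[u]) + 1)
            for u in h}

def aggregated_exm(h, hop_adjs):
    """Aggregated ExM (sketch): message-pass over every hop topology in
    parallel, then average the branch outputs instead of cascading."""
    branches = [mean_aggregate(h, adj) for adj in hop_adjs]
    return {u: sum(b[u] for b in branches) / len(branches) for u in h}

local = {0: {1}, 1: {0, 2}, 2: {1}}    # path graph 0-1-2
express = {0: {2}, 1: set(), 2: {0}}   # 2-hop express links
out = aggregated_exm({0: 0.0, 1: 1.0, 2: 2.0}, [local, express])
```

The design difference from the cascaded wrapper is that every hop range sees the same input features, so local and non-local evidence are combined in one shot rather than sequentially.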
3.3. Implementation
4. Experiments
4.1. Experimental Setting
4.1.1. Dataset
4.1.2. Experiment Setup
4.2. Results on Synthetic Data
4.3. Results on Real Data
4.3.1. Benchmark Graph Re-Wiring Techniques
4.3.2. Evaluation on Graph Feature Representation Learning
4.3.3. Performance of Deep GNN
4.3.4. Ablation Studies
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Data Statistics
Dataset | h | Node Number | Edge Number | Class Number
---|---|---|---|---
Questions | 0.84 | 48,921 | 153,540 | 2
Cora | 0.81 | 2,708 | 10,556 | 7
Pubmed | 0.80 | 19,717 | 88,648 | 3
Citeseer | 0.74 | 3,327 | 9,104 | 6
Minesweeper | 0.68 | 10,000 | 39,402 | 2
Tolokers | 0.60 | 11,758 | 519,000 | 2
Amazon-ratings | 0.38 | 24,492 | 93,050 | 5
Chameleon | 0.24 | 2,277 | 31,421 | 5
Squirrel | 0.22 | 5,201 | 198,493 | 5
Wisconsin | 0.20 | 251 | 515 | 5
Cornell | 0.13 | 183 | 298 | 5
Texas | 0.11 | 183 | 325 | 5
Roman-empire | 0.05 | 22,662 | 32,927 | 18
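The h column above is the homophily ratio of each dataset. Assuming the common edge-homophily definition (the fraction of edges joining two nodes of the same class; the extract does not state which variant the paper uses), it can be computed as:

```python
def edge_homophily(edges, labels):
    """Edge homophily h: fraction of edges whose endpoints share a class."""
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)

# Toy 4-cycle: edges (0,1) and (2,3) are intra-class, the other two are not.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
h = edge_homophily(edges, labels)
```

High h (e.g., Cora, 0.81) means neighbors usually share labels, while low h (e.g., Roman-empire, 0.05) marks strongly heterophilous graphs where local aggregation alone is least informative.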
Appendix B. Detailed Analysis in Ablation Studies
 | Roman | Amazon | Mine. | Squirrel | Citeseer
---|---|---|---|---|---
GAT + C-ExM | 83.74 ± 0.55 | 46.44 ± 0.42 | 86.54 ± 0.78 | 61.55 ± 1.33 | 74.25 ± 1.27
 | 76.44 ± 0.87 | 46.83 ± 0.72 | 90.56 ± 0.59 | 61.48 ± 0.99 | 73.58 ± 1.97
 | 76.19 ± 0.82 | 46.76 ± 0.71 | 90.96 ± 0.74 | 62.51 ± 0.81 | 73.84 ± 1.27
 | 75.97 ± 0.70 | 47.00 ± 0.61 | 90.82 ± 0.82 | 61.91 ± 1.32 | 74.25 ± 1.52
SAGE + C-ExM | 90.34 ± 0.42 | 50.25 ± 0.49 | 87.09 ± 0.79 | 43.93 ± 1.21 | 75.30 ± 1.74
 | 85.00 ± 0.52 | 50.44 ± 0.79 | 92.51 ± 0.40 | 43.30 ± 1.86 | 74.57 ± 1.50
 | 85.11 ± 0.50 | 51.30 ± 0.52 | 92.81 ± 0.39 | 44.85 ± 1.60 | 74.64 ± 1.72
 | 84.81 ± 0.50 | 51.59 ± 0.39 | 92.93 ± 0.40 | 44.76 ± 1.79 | 74.35 ± 1.37
 | Roman | Amazon | Chameleon | Squirrel
---|---|---|---|---
Average Degree | 1.45 | 3.80 | 13.80 | 38.16
GAT + C-ExM | 80.77 ± 0.45 | 48.43 ± 0.44 | 70.88 ± 1.60 | 76.00 ± 0.95
GAT + A-ExM | 79.99 ± 0.49 | 48.17 ± 0.53 | 73.18 ± 2.08 | 77.52 ± 0.86
FSGNN + C-ExM | 72.61 ± 0.57 | 44.56 ± 0.59 | 75.99 ± 1.31 | 66.45 ± 4.17
FSGNN + A-ExM | 72.17 ± 0.35 | 45.42 ± 0.45 | 77.76 ± 0.94 | 73.85 ± 2.08
 | GCN | GDC | DropEdge | NAGphormer | NLE (Ours)
---|---|---|---|---|---
Long-range interaction | ✗ | ✗ | ✗ | ✓ | ✓ |
Graph re-wiring | ✗ | ✓ | ✓ | ✗ | ✓ |
Anti-over-smoothing | ✗ | ✗ | ✓ | ✓ | ✓ |
Dataset | h | Original | DropEdge | GDC | NLE (Ours)
---|---|---|---|---|---
Cham. | 0.24 | 26.54 | 24.56 (−1.98) | 33.55 (+7.01) | 35.31 (+8.77) |
Squirrel | 0.22 | 22.96 | 23.44 (+0.48) | 22.96 (−0.00) | 22.96 (−0.00) |
Wiscon. | 0.20 | 47.06 | 41.18 (−5.88) | 41.18 (−5.88) | 50.98 (+3.92) |
Texas | 0.11 | 64.87 | 64.87 (−0.00) | 64.87 (−0.00) | 70.27 (+5.40) |
Roman. | 0.05 | 24.32 | 22.71 (−1.61) | 19.91 (−4.41) | 28.08 (+3.76) |
 | Roman (h = 0.05) | Texas (h = 0.11) | Wisconsin (h = 0.20) | Squirrel (h = 0.22) | Chameleon (h = 0.24) | Amazon (h = 0.38) | Citeseer (h = 0.74) | Pubmed (h = 0.80) | Cora (h = 0.81)
---|---|---|---|---|---|---|---|---|---
Baseline | | | | | | | | |
DropEdge | | | | | | | | |
GDC | | | | | | | | |
SDRF | – | – | | | | | | |
C-ExME | | | | | | | | |
A-ExME | | | | | | | | |
C-ExMP | | | | | | | | |
A-ExMP | | | | | | | | |
 | Roman (h = 0.05) | Texas (h = 0.11) | Wisconsin (h = 0.20) | Squirrel (h = 0.22) | Chameleon (h = 0.24) | Citeseer (h = 0.74) | Pubmed (h = 0.80) | Average Imp.
---|---|---|---|---|---|---|---|---
GCN | – | | | | | | |
GPRGNN | – | – | | | | | |
UGT | – | – | – | | | | |
GloGNN | – | | | | | | |
GBKGNN | – | – | – | | | | |
GAT | | | | | | | |
w/ExM | | | | | | | |
GT | | | | | | | |
w/ExM | | | | | | | |
SAGE | | | | | | | |
w/ExM | | | | | | | |
NAG | | | | | | | |
w/ExM | | | | | | | |
JacobiConv | | | | | | | |
w/ExM | | | | | | | |
FSGNN | | | | | | | |
w/ExM | | | | | | | |
Average Imp. | | | | | | | |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wei, Z.; Dan, T.; Ding, J.; Wu, G. Efficient Graph Representation Learning by Non-Local Information Exchange. Electronics 2025, 14, 1047. https://doi.org/10.3390/electronics14051047