GLADC: Global Linear Attention and Dual Constraint for Mitigating Over-Smoothing in Graph Neural Networks
Abstract
1. Introduction
- Global linear attention, which captures long-range dependency across all nodes with linear complexity, enabling efficient global information mixing;
- A dual constraint applied during local propagation that synergistically mitigates over-smoothing:
  - Column-wise random masking acts as a dynamic neighbor filter. It randomly freezes a subset of feature channels during aggregation, limiting the influx of redundant information from high-order neighbors and preserving each node's intrinsic features from the previous layer.
  - A row-wise contrastive constraint explicitly maximizes the distance between representations of different nodes while minimizing the distance between representations of the same node across layers, thereby enhancing inter-node discriminability.
- We propose GLADC, a novel scheme that integrates global linear attention and dual constraint to achieve efficient long-range modeling while effectively resisting over-smoothing.
- Extensive experiments demonstrate that GLADC significantly alleviates over-smoothing, prevents performance degradation in deep architectures, and achieves state-of-the-art results on multiple real-world graph datasets.
2. Related Work
2.1. Over-Smoothing Phenomenon and Mitigation Strategies
2.2. Graph Transformers and Scalability
3. Preliminaries
3.1. Problem Definition
3.2. Graph Convolutional Networks
3.3. Simplified Graph Convolutional Networks
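SGC (Wu et al., cited in the references) removes the nonlinearities between GCN layers, collapsing K propagation steps into a single fixed filter H = S^K X, where S = D̃^(-1/2) Ã D̃^(-1/2) is the symmetrically normalized adjacency with self-loops. A minimal numpy sketch of this propagation (a generic illustration of SGC, not the paper's code):

```python
import numpy as np

def sgc_features(A: np.ndarray, X: np.ndarray, K: int) -> np.ndarray:
    """K-step SGC propagation: H = S^K X with S = D^{-1/2}(A+I)D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ A_hat @ D_inv_sqrt       # symmetric normalization
    H = X
    for _ in range(K):                        # collapse K GCN layers into one linear filter
        H = S @ H
    return H

# toy graph: a path 0-1-2, identity features
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3)
H = sgc_features(A, X, K=2)
```

As K grows, S^K X converges toward a rank-one projection, which is exactly the over-smoothing behavior GLADC targets.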
4. Methods
4.1. Input Layer
4.2. Global Linear Attention
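This extract does not include the paper's exact attention formulation, so the sketch below uses a standard kernelized linear-attention construction as a stand-in: with a positive feature map φ, computing φ(K)ᵀV first reduces the cost from O(N²) to O(N·d²), which is what makes all-pairs mixing over nodes affordable. All names here are illustrative.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized linear attention in O(N * d^2):
    phi(Q) @ (phi(K)^T V), row-normalized, with phi(x) = elu(x) + 1."""
    def phi(x):                                   # positive feature map
        return np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                 # d x d summary of all nodes
    Z = Qp @ Kp.sum(axis=0)                       # per-node normalizer
    return (Qp @ KV) / (Z[:, None] + eps)

rng = np.random.default_rng(0)
N, d = 5, 4
Q = rng.normal(size=(N, d))
K = rng.normal(size=(N, d))
V = rng.normal(size=(N, d))
out = linear_attention(Q, K, V)
```

The result equals explicit softmax-free attention (φ(Q)φ(K)ᵀV with row normalization), but the N×N attention matrix is never materialized.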
4.3. Dual Constraint
- (a) Random Masking (Column-wise Masking)
- (b) Contrastive Constraint (Row-wise Constraint)
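The two constraints can be sketched in numpy as follows. This is a hedged illustration, not the paper's implementation: the exact mask distribution and loss form are not given in this extract, so the masking keeps frozen channels at their previous-layer values, and the contrastive term is an InfoNCE-style stand-in with same-node cross-layer pairs as positives.

```python
import numpy as np

rng = np.random.default_rng(42)

def column_masked_aggregate(S, H_prev, p=0.5):
    """(a) Column-wise random masking: a random subset of feature channels
    skips aggregation this layer and keeps the previous layer's values."""
    d = H_prev.shape[1]
    m = (rng.random(d) > p).astype(H_prev.dtype)   # 1 = aggregate, 0 = freeze
    return m * (S @ H_prev) + (1.0 - m) * H_prev

def contrastive_constraint(H_prev, H_next, tau=0.5):
    """(b) Row-wise contrastive term: pull the same node together across
    layers, push different nodes apart (InfoNCE-style stand-in)."""
    def norm(X):
        return X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    Zp, Zn = norm(H_prev), norm(H_next)
    sim = (Zn @ Zp.T) / tau                        # [i, j]: node i (layer l+1) vs node j (layer l)
    sim = sim - sim.max(axis=1, keepdims=True)     # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # positives lie on the diagonal

# toy usage: uniform propagation matrix over 4 nodes, 8 feature channels
S = np.full((4, 4), 0.25)
H = rng.normal(size=(4, 8))
H2 = column_masked_aggregate(S, H)
loss = contrastive_constraint(H, H2)
```

In training, a term like `loss` would be added to the task loss with a weighting coefficient, so that propagation smooths features while the row-wise constraint keeps nodes distinguishable.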
4.4. Overall Training Procedure
Algorithm 1: Training procedure of GLADC
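The Algorithm 1 listing did not survive extraction. As a hedged stand-in for its general shape, the loop below trains a linear classifier on already-propagated node features with plain cross-entropy; in the full method the contrastive term of Section 4.3 would be added to the loss with a weighting coefficient. All names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# toy setup: propagated features H (e.g., from Section 4), labels y, linear head W
N, d, C = 20, 8, 3
H = rng.normal(size=(N, d))
y = rng.integers(0, C, size=N)
Y = np.eye(C)[y]
W = np.zeros((d, C))

losses = []
for epoch in range(300):                      # per epoch: forward, loss, update
    P = softmax(H @ W)                        # forward pass on propagated features
    ce = -np.mean(np.log(P[np.arange(N), y] + 1e-12))
    losses.append(ce)
    grad = H.T @ (P - Y) / N                  # gradient of cross-entropy w.r.t. W
    W -= 0.1 * grad                           # plain gradient-descent step
```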
4.5. Structural Information Fusion
5. Results
5.1. Experimental Setups
5.2. Experimental Results
5.3. Over-Smoothing Analysis
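A common way to quantify over-smoothing (used, e.g., by Chen et al. in the references; this extract does not state which metric the paper adopts) is the Mean Average Distance (MAD) between node representations: values near zero indicate that embeddings have collapsed together. A minimal sketch:

```python
import numpy as np

def mad(H: np.ndarray) -> float:
    """Mean average cosine distance over all ordered node pairs;
    values near 0 indicate over-smoothed (collapsed) embeddings."""
    Z = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)
    D = 1.0 - Z @ Z.T                        # pairwise cosine distances
    n = H.shape[0]
    return float(D.sum() / (n * (n - 1)))    # exclude self-pairs (diagonal is 0)

# distinct rows -> large MAD; identical rows -> MAD near 0
rng = np.random.default_rng(1)
H_div = rng.normal(size=(6, 4))
H_same = np.tile(rng.normal(size=(1, 4)), (6, 1))
```

Tracking this quantity per layer makes the degradation of plain GCN/SGC stacks, and its mitigation, directly measurable.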
5.4. Efficiency Analysis
5.5. Ablation Study
- SGC: The baseline model without any of our proposed components.
- W/O dual constraint: Only the global linear attention is active.
- Only random masking: Only the random masking mechanism is active.
- Only contrastive constraint: Only the contrastive constraint is active.
- W/O random masking: The full model without random masking.
- W/O contrastive constraint: The full model without the contrastive constraint.
- GLADC (SGC): Our complete proposed model.
5.6. Hyperparameter Sensitivity Analysis
6. Conclusions and Outlooks
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
- Cheng, T.; Bi, T.; Ji, W.; Tian, C. Graph Convolutional Network for Image Restoration: A Survey. Mathematics 2024, 12, 2020.
- Réau, M.; Renaud, N.; Xue, L.C.; Bonvin, A.M. DeepRank-GNN: A graph neural network framework to learn patterns in protein–protein interfaces. Bioinformatics 2023, 39, btac759.
- Tao, Z.; Huang, J. Research on recommender systems based on GCN. In AIP Conference Proceedings; AIP Publishing: Melville, NY, USA, 2024; Volume 3194.
- Zhang, Y.; Xu, W.; Ma, B.; Zhang, D.; Zeng, F.; Yao, J.; Yang, H.; Du, Z. Linear attention based spatiotemporal multi graph GCN for traffic flow prediction. Sci. Rep. 2025, 15, 8249.
- Huang, S.; Xiang, H.; Leng, C.; Xiao, F. Cross-Social-Network User Identification Based on Bidirectional GCN and MNF-UI Models. Electronics 2024, 13, 2351.
- Kenyeres, M.; Kenyeres, J. Average Consensus over Mobile Wireless Sensor Networks: Weight Matrix Guaranteeing Convergence without Reconfiguration of Edge Weights. Sensors 2020, 20, 3677.
- Ataei Nezhad, M.; Barati, H.; Barati, A. An authentication-based secure data aggregation method in internet of things. J. Grid Comput. 2022, 20, 29.
- Li, Q.; Han, Z.; Wu, X.M. Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- Rong, Y.; Huang, W.; Xu, T.; Huang, J. DropEdge: Towards Deep Graph Convolutional Networks on Node Classification. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Fang, T.; Xiao, Z.; Wang, C.; Xu, J.; Yang, X.; Yang, Y. DropMessage: Unifying random dropping for graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 4267–4275.
- Guo, X.; Wang, Y.; Du, T.; Wang, Y. ContraNorm: A Contrastive Learning Perspective on Oversmoothing and Beyond. In Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
- Zhang, W.; Sheng, Z.; Yin, Z.; Jiang, Y.; Xia, Y.; Gao, J.; Yang, Z.; Cui, B. Model degradation hinders deep graph neural networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 2493–2503.
- Wu, Q.; Zhao, W.; Yang, C.; Zhang, H.; Nie, F.; Jiang, H.; Bian, Y.; Yan, J. SGFormer: Simplifying and empowering transformers for large-graph representations. In Proceedings of the International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; pp. 64753–64773.
- Gasteiger, J.; Bojchevski, A.; Günnemann, S. Predict then propagate: Graph neural networks meet personalized pagerank. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Do, T.H.; Nguyen, D.M.; Bekoulis, G.; Munteanu, A.; Deligiannis, N. Graph convolutional neural networks with node transition probability-based message passing and DropNode regularization. Expert Syst. Appl. 2021, 174, 114711.
- Zhao, L.; Akoglu, L. PairNorm: Tackling Oversmoothing in GNNs. arXiv 2019, arXiv:1909.12223.
- Zhou, K.; Dong, Y.; Wang, K.; Lee, W.S.; Hooi, B.; Xu, H.; Feng, J. Understanding and resolving performance degradation in deep graph convolutional networks. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual, 1–5 November 2021; pp. 2728–2737.
- Xu, K.; Li, C.; Tian, Y.; Sonobe, T.; Kawarabayashi, K.I.; Jegelka, S. Representation learning on graphs with jumping knowledge networks. In Proceedings of the International Conference on Machine Learning. PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 5453–5462.
- Li, G.; Muller, M.; Thabet, A.; Ghanem, B. DeepGCNs: Can GCNs go as deep as CNNs? In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 9267–9276.
- Chen, M.; Wei, Z.; Huang, Z.; Ding, B.; Li, Y. Simple and deep graph convolutional networks. In Proceedings of the International Conference on Machine Learning. PMLR, Vienna, Austria, 12–18 July 2020; pp. 1725–1735.
- Peng, F.; Liu, K.; Lu, X.; Qian, Y.; Yan, H.; Ma, C. TSC: A Simple Two-Sided Constraint against Over-Smoothing. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 2376–2387.
- Ying, C.; Cai, T.; Luo, S.; Zheng, S.; Ke, G.; He, D.; Shen, Y.; Liu, T.Y. Do transformers really perform badly for graph representation? In Proceedings of the International Conference on Neural Information Processing Systems, Virtual, 6–14 December 2021; pp. 28877–28888.
- Wu, Z.; Jain, P.; Wright, M.; Mirhoseini, A.; Gonzalez, J.E.; Stoica, I. Representing long-range context for graph neural networks with global attention. In Proceedings of the International Conference on Neural Information Processing Systems, Virtual, 6–14 December 2021; Volume 34, pp. 13266–13279.
- Wu, F.; Souza, A.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K. Simplifying graph convolutional networks. In Proceedings of the International Conference on Machine Learning. PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6861–6871.
- Chen, D.; Lin, Y.; Li, W.; Li, P.; Zhou, J.; Sun, X. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3438–3445.
- Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; Eliassi-Rad, T. Collective classification in network data. AI Mag. 2008, 29, 93.
- Shchur, O.; Mumme, M.; Bojchevski, A.; Günnemann, S. Pitfalls of Graph Neural Network Evaluation. arXiv 2018, arXiv:1811.05868.
- Lim, D.; Hohne, F.; Li, X.; Huang, S.L.; Gupta, V.; Bhalerao, O.; Lim, S.N. Large scale learning on non-homophilous graphs: New benchmarks and strong simple methods. In Proceedings of the International Conference on Neural Information Processing Systems, Virtual, 6–14 December 2021; pp. 20887–20902.
- Platonov, O.; Kuznedelev, D.; Diskin, M.; Babenko, A.; Prokhorenkova, L. A critical look at the evaluation of GNNs under heterophily: Are we really making progress? In Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
| Dataset | Cora | Citeseer | Pubmed | CoauthorCS | AmazonPhoto | Actor | Squirrel |
|---|---|---|---|---|---|---|---|
| Nodes | 2708 | 3327 | 19,717 | 18,333 | 7650 | 7600 | 2223 |
| Edges | 5429 | 4732 | 44,338 | 81,894 | 119,043 | 29,926 | 46,998 |
| Classes | 7 | 6 | 3 | 15 | 8 | 5 | 5 |
| Train | 140 | 120 | 60 | 300 | 160 | 100 | 100 |
| Validate | 500 | 500 | 500 | 500 | 500 | 500 | 500 |
| Test | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 |
| Dataset | Cora | Citeseer | Pubmed | CoauthorCS | AmazonPhoto | Actor | Squirrel |
|---|---|---|---|---|---|---|---|
| GCN | |||||||
| SGC | |||||||
| APPNP | |||||||
| AIR (SGC) | |||||||
| ContraNorm (GCN) | |||||||
| DropMessage (GCN) | |||||||
| TSC (GCN) | |||||||
| SGFormer | |||||||
| GLADC (GCN) | |||||||
| GLADC (SGC) |  |  |  |  |  |  |  |
| Model | Layer 4 | Layer 8 | Layer 16 | Layer 32 |
|---|---|---|---|---|
| GCN | ||||
| SGC | ||||
| ContraNorm (GCN) | ||||
| DropMessage (GCN) | ||||
| AIR (SGC) | ||||
| TSC (GCN) | ||||
| GLADC (GCN) | 86.3 ± 0.2 | |||
| GLADC (SGC) |  |  |  |  |
| Model | Layer 4 | Layer 8 | Layer 16 | Layer 32 |
|---|---|---|---|---|
| GCN | ||||
| SGC | ||||
| ContraNorm (GCN) | OOM | OOM | ||
| DropMessage (GCN) | ||||
| AIR (SGC) | OOM | OOM | ||
| TSC (GCN) | ||||
| GLADC (GCN) | 40.4 ± 0.3 | |||
| GLADC (SGC) |  |  |  |  |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chen, Z.; Yan, Y.; Wang, Q.; Chen, H. GLADC: Global Linear Attention and Dual Constraint for Mitigating Over-Smoothing in Graph Neural Networks. Algorithms 2025, 18, 739. https://doi.org/10.3390/a18120739


