MADS-GCN: A Robust Interactive Memory-Augmented Dual-Stream GCN with Adaptive Spatiotemporal Modeling for Human Action Recognition

Wang, Qian; Zhou, Yini; Shi, Haowen; Huang, Qian

doi:10.3390/app16115408

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

¹ School of Dance, Nanjing University of the Arts, Nanjing 210013, China

² College of Computer Science and Software Engineering, Hohai University, Nanjing 210098, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(11), 5408; https://doi.org/10.3390/app16115408 (registering DOI)

Submission received: 1 April 2026 / Revised: 25 May 2026 / Accepted: 25 May 2026 / Published: 28 May 2026

(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Human action recognition is a key research area in computer vision, where accurate recognition relies on effective modeling of both global and local spatiotemporal information. However, existing GCN-based methods often overemphasize the local topological connectivity of human skeletons. Moreover, their temporal modules fail to fully capture the evolution of action sequences, so critical instantaneous information is obscured by global representations. To address these problems, we propose an integrated framework termed MADS-GCN. In the spatial modeling stage, we introduce two parallel streams: the Physical Stream uses the adjacency matrix to constrain convolution and capture global structural patterns, while the Topological Stream leverages spatial attention to assign adaptive weights to joints, preserving discriminative local features. For temporal modeling, a channel-temporal attention mechanism adaptively refines the feature maps, and a bidirectional GRU then captures multi-scale temporal patterns. Extensive experiments on NTU RGB+D 60, Northwestern-UCLA, and our custom DanceBasic-Set demonstrate the effectiveness of MADS-GCN and indicate its applicability to dance action recognition scenarios.

Keywords: human action recognition; MADS-GCN; graph convolutional network; spatio-temporal feature modeling
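
The dual-stream spatial stage described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer definitions, so the symmetric adjacency normalization, the dot-product form of the spatial attention, the summation used to fuse the two streams, and every function and parameter name below (`physical_stream`, `topological_stream`, `dual_stream_block`, `w_phys`, `w_q`, `w_k`, `w_v`) are assumptions made for illustration only.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (assumed, not from the paper)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def physical_stream(x, a_norm, w):
    """Graph convolution constrained by the skeleton adjacency.

    x: (T, V, C_in) frames x joints x channels, a_norm: (V, V), w: (C_in, C_out).
    """
    aggregated = np.einsum("uv,tvc->tuc", a_norm, x)  # mix features of linked joints
    return aggregated @ w


def topological_stream(x, w_q, w_k, w_v):
    """Spatial attention that assigns data-dependent weights to joints.

    Dot-product attention is an assumed stand-in for the paper's attention.
    """
    q, k = x @ w_q, x @ w_k                            # (T, V, d)
    scores = np.einsum("tud,tvd->tuv", q, k) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)                    # (T, V, V), rows sum to 1
    return np.einsum("tuv,tvc->tuc", attn, x @ w_v), attn


def dual_stream_block(x, adj, params):
    """Run both streams in parallel and fuse them by summation plus ReLU (assumed)."""
    phys = physical_stream(x, normalize_adjacency(adj), params["w_phys"])
    topo, attn = topological_stream(x, params["w_q"], params["w_k"], params["w_v"])
    return np.maximum(phys + topo, 0.0), attn
```

In this sketch the Physical Stream can only mix joints that are connected in the skeleton graph, while the Topological Stream may relate any pair of joints; how MADS-GCN actually combines the two streams is not stated in the abstract.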

