Electronics
  • This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

30 November 2025

HDCA: Heterogeneous Dual-Path Contrastive Architecture for Action Recognition

College of Information Engineering and Artificial Intelligence, Henan University of Science and Technology, Luoyang 471000, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(23), 4730; https://doi.org/10.3390/electronics14234730 (registering DOI)
This article belongs to the Special Issue Advances in Artificial Intelligence and Computer Vision Based on Deep Learning

Abstract

We propose the Heterogeneous Dual-Path Contrastive Architecture (HDCA) for action recognition. The model consists of a spatial pathway and a temporal pathway that use distinct backbone networks and input formats, each tailored to the properties of the features it extracts. The spatial pathway processes super images to capture spatial semantics, while the temporal pathway operates on frame sequences to model motion patterns. This targeted design is intended to capture both the scenes and the motions depicted in videos while improving parameter efficiency. To let the two modalities complement and enhance each other, we train HDCA with a cross-modality contrastive loss and an intra-group contrastive loss. Together, these losses maximize the similarity of feature representations among videos of the same class and minimize it across different classes: the cross-modality contrastive loss aligns the two modalities, and the intra-group contrastive loss enhances intra-group compactness. HDCA thus exploits the complementary strengths of spatial and temporal features for action recognition. Systematic experiments on three benchmark datasets validate the effectiveness of our approach and support the motivation and hypothesis behind the model design. The results show that our model achieves competitive performance compared with existing state-of-the-art action recognition methods. Notably, the performance gains grow with dataset complexity, indicating that the discriminative cross-modal correlation information learned by HDCA is most beneficial for recognizing complex videos.
