HDCA: Heterogeneous Dual-Path Contrastive Architecture for Action Recognition
Abstract
1. Introduction
1. The spatial pathway and the temporal pathway in the Heterogeneous Dual-Path Contrastive Architecture (HDCA) employ distinct backbone networks and input formats tailored to the specific properties of spatial and temporal features. The spatial pathway processes super images to capture spatial semantics, while the temporal pathway operates on frame sequences to model motion patterns. This targeted design precisely captures the scenes and motions depicted in videos while improving parameter efficiency.
2. We devise a cross-modality contrastive loss. Training HDCA with this loss maximizes the similarity between the different feature representations of the same video. This enables the model to discover discriminative correlation information between modalities while reducing the dependence of performance on the spatiotemporal fusion strategy.
3. To address the potential intra-group feature dispersion caused by the cross-modality contrastive loss, we introduce an intra-group contrastive loss, which maximizes the similarity of feature representations among videos belonging to the same class. By enhancing intra-group compactness, HDCA effectively leverages the complementary strengths of spatial and temporal features in action recognition.
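The two losses described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the batch construction, the temperature value `tau`, and the exact positive/negative definitions are assumptions. The cross-modality loss treats the spatial and temporal embeddings of the same video as the positive pair (an InfoNCE-style objective), while the intra-group loss pulls together embeddings of videos sharing a class label.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def cross_modality_loss(z_s, z_t, tau=0.1):
    """Symmetric InfoNCE between spatial (z_s) and temporal (z_t)
    embeddings of the same B videos; both have shape (B, D)."""
    z_s, z_t = l2_normalize(z_s), l2_normalize(z_t)
    logits = z_s @ z_t.T / tau            # (B, B); positives on the diagonal
    idx = np.arange(len(z_s))

    def nce(lg):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(lg) / np.exp(lg).sum(axis=1, keepdims=True)
        return -np.log(p[idx, idx]).mean()

    # average the spatial->temporal and temporal->spatial directions
    return 0.5 * (nce(logits) + nce(logits.T))

def intra_group_loss(z, y, tau=0.1):
    """Supervised-contrastive-style loss: embeddings z (B, D) of videos
    with the same label y are pulled together."""
    z = l2_normalize(z)
    sim = np.exp(z @ z.T / tau)
    np.fill_diagonal(sim, 0.0)                     # exclude self-pairs
    pos = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    losses = []
    for i in range(len(y)):
        if pos[i].any():
            losses.append(-np.log(sim[i, pos[i]] / sim[i].sum()).mean())
    return float(np.mean(losses))
```

When the spatial and temporal embeddings of each video are aligned, the cross-modality loss is near zero; mismatching them across videos drives it up, which is what pushes the two pathways toward a shared, correlated representation.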
2. Related Work
2.1. Two-Stream CNNs
2.2. Contrastive Learning
3. Method
3.1. Spatial Pathway
3.2. Temporal Pathway
3.3. Cross-Modality Contrastive Loss
3.4. Intra-Group Contrastive Loss
4. Experiment
4.1. Datasets
4.2. Implementation Details
4.3. Ablation Studies
4.4. Comparison with the State of the Art
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Carreira, J.; Zisserman, A. Quo vadis, action recognition? A new model and the kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6299–6308. [Google Scholar]
- Feichtenhofer, C.; Fan, H.; Malik, J.; He, K. Slowfast networks for video recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6202–6211. [Google Scholar]
- Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. Adv. Neural Inf. Process. Syst. 2014, 27, 568–576. [Google Scholar]
- Wang, L.; Xiong, Y.; Wang, Z.; Qiao, Y.; Lin, D.; Tang, X.; Van Gool, L. Temporal segment networks for action recognition in videos. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2740–2755. [Google Scholar] [CrossRef]
- Feichtenhofer, C.; Pinz, A.; Zisserman, A. Convolutional two-stream network fusion for video action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1933–1941. [Google Scholar]
- Choi, M.; Han, K.; Wang, X.; Zhang, Y.; Liu, Z. A dual-stream neural network explains the functional segregation of dorsal and ventral visual pathways in human brains. Adv. Neural Inf. Process. Syst. 2023, 36, 50408–50428. [Google Scholar]
- Chen, S.; Cheng, N.; Chen, X.; Wang, C. Integration and competition between space and time in the hippocampus. Neuron 2024, 112, 3651–3664. [Google Scholar] [CrossRef]
- Feichtenhofer, C.; Pinz, A.; Wildes, R.P. Spatiotemporal multiplier networks for video action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4768–4777. [Google Scholar]
- Wanyan, Y.; Yang, X.; Chen, C.; Xu, C. Active exploration of multimodal complementarity for few-shot action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 6492–6502. [Google Scholar]
- Xia, L.; Fu, W. Spatial-temporal multiscale feature optimization based two-stream convolutional neural network for action recognition. Clust. Comput. 2024, 27, 11611–11626. [Google Scholar] [CrossRef]
- Kar, A.; Rai, N.; Sikka, K.; Sharma, G. Adascan: Adaptive scan pooling in deep convolutional neural networks for human action recognition in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3376–3385. [Google Scholar]
- Xiao, Z.; Xing, H.; Qu, R.; Li, H.; Cheng, X.; Xu, L.; Feng, L.; Wan, Q. Heterogeneous Mutual Knowledge Distillation for Wearable Human Activity Recognition. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 16589–16603. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Ganj, A.; Ebadpour, M.; Darvish, M.; Bahador, H. LR-net: A block-based convolutional neural network for low-resolution image classification. Iran. J. Sci. Technol. Trans. Electr. Eng. 2023, 47, 1561–1568. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31, pp. 4278–4284. [Google Scholar]
- Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 221–231. [Google Scholar] [CrossRef] [PubMed]
- Feichtenhofer, C. X3d: Expanding architectures for efficient video recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 203–213. [Google Scholar]
- Hara, K.; Kataoka, H.; Satoh, Y. Can spatiotemporal 3d cnns retrace the history of 2d cnns and imagenet? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6546–6555. [Google Scholar]
- Tran, D.; Wang, H.; Torresani, L.; Feiszli, M. Video classification with channel-separated convolutional networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5552–5561. [Google Scholar]
- Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality reduction by learning an invariant mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 2, pp. 1735–1742. [Google Scholar]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9729–9738. [Google Scholar]
- Dorkenwald, M.; Xiao, F.; Brattoli, B.; Tighe, J.; Modolo, D. Scvrl: Shuffled contrastive video representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4132–4141. [Google Scholar]
- Pan, T.; Song, Y.; Yang, T.; Jiang, W.; Liu, W. Videomoco: Contrastive video representation learning with temporally adversarial examples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 20–25 June 2021; pp. 11205–11214. [Google Scholar]
- Qian, R.; Meng, T.; Gong, B.; Yang, M.H.; Wang, H.; Belongie, S.; Cui, Y. Spatiotemporal contrastive video representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 20–25 June 2021; pp. 6964–6974. [Google Scholar]
- Pang, B.; Zha, K.; Cao, H.; Tang, J.; Yu, M.; Lu, C. Complex sequential understanding through the awareness of spatial and temporal concepts. Nat. Mach. Intell. 2020, 2, 245–253. [Google Scholar] [CrossRef]
- Tang, G.; Han, Y.; Sun, X.; Zhang, R.; Han, M.H.; Liu, Q.; Wei, P. Anti-drift pose tracker (ADPT), a transformer-based network for robust animal pose estimation cross-species. eLife 2025, 13, RP95709. [Google Scholar] [CrossRef]
- Zong, M.; Wang, R.; Chen, X.; Chen, Z.; Gong, Y. Motion saliency based multi-stream multiplier ResNets for action recognition. Image Vis. Comput. 2021, 107, 104108. [Google Scholar] [CrossRef]
- Zhang, B.; Wang, L.; Wang, Z.; Qiao, Y.; Wang, H. Real-time action recognition with enhanced motion vector CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2718–2726. [Google Scholar]
- Piergiovanni, A.; Ryoo, M.S. Representation flow for action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9945–9953. [Google Scholar]
- Li, J.; Liu, X.; Zhang, M.; Wang, D. Spatio-temporal deformable 3d convnets with attention for action recognition. Pattern Recognit. 2020, 98, 107037. [Google Scholar] [CrossRef]
- Zhang, H.; Liu, D.; Xiong, Z. Two-stream action recognition-oriented video super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8799–8808. [Google Scholar]
- Wang, L.; Koniusz, P. Feature Hallucination for Self-supervised Action Recognition. arXiv 2025, arXiv:2506.20342. [Google Scholar] [CrossRef]
- Liao, D.; Shu, X.; Li, Z.; Liu, Q.; Yuan, D.; Chang, X.; He, Z. Fine-grained feature and template reconstruction for tir object tracking. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 9276–9286. [Google Scholar] [CrossRef]
- Cho, S.; Kim, T.K. Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition. arXiv 2025, arXiv:2503.14960. [Google Scholar]
- Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
- Sohn, K. Improved deep metric learning with multi-class n-pair loss objective. Adv. Neural Inf. Process. Syst. 2016, 29, 1–9. [Google Scholar]
- Oord, A.v.d.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748. [Google Scholar]
- Chen, X.; Fan, H.; Girshick, R.; He, K. Improved baselines with momentum contrastive learning. arXiv 2020, arXiv:2003.04297. [Google Scholar] [CrossRef]
- Chen, X.; Xie, S.; He, K. An empirical study of training self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 9640–9649. [Google Scholar]
- Fan, Q.; Chen, C.F.; Panda, R. Can an image classifier suffice for action recognition? arXiv 2021, arXiv:2106.14104. [Google Scholar]
- Christoph, R.; Pinz, F.A. Spatiotemporal residual networks for video action recognition. Adv. Neural Inf. Process. Syst. 2016, 2, 3468–3476. [Google Scholar]
- Pang, Z.; Wang, C.; Pan, H.; Zhao, L.; Wang, J.; Guo, M. MIMR: Modality-invariance modeling and refinement for unsupervised visible-infrared person re-identification. Knowl.-Based Syst. 2024, 285, 111350. [Google Scholar] [CrossRef]
- Soomro, K.; Zamir, A.R.; Shah, M. UCF101: A dataset of 101 human action classes from videos in the wild. Cent. Res. Comput. Vis. 2012, 2, 1–7. [Google Scholar]
- Kuehne, H.; Jhuang, H.; Garrote, E.; Poggio, T.; Serre, T. HMDB: A large video database for human motion recognition. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2556–2563. [Google Scholar]
- Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P.; et al. The kinetics human action video dataset. arXiv 2017, arXiv:1705.06950. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Yang, C.; Xu, Y.; Shi, J.; Dai, B.; Zhou, B. Temporal pyramid network for action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 591–600. [Google Scholar]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7794–7803. [Google Scholar]
- Liu, T.; Ma, Y.; Yang, W.; Ji, W.; Wang, R.; Jiang, P. Spatial-temporal interaction learning based two-stream network for action recognition. Inf. Sci. 2022, 606, 864–876. [Google Scholar] [CrossRef]
- Zhu, Y.; Lan, Z.; Newsam, S.; Hauptmann, A. Hidden two-stream convolutional networks for action recognition. In Proceedings of the Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; Revised Selected Papers, Part III 14. Springer: Berlin/Heidelberg, Germany, 2019; pp. 363–378. [Google Scholar]
- Tishby, N.; Pereira, F.C.; Bialek, W. The information bottleneck method. arXiv 2000, arXiv:physics/0004057. [Google Scholar]
- Kwon, H.; Kim, M.; Kwak, S.; Cho, M. Motionsqueeze: Neural motion feature learning for video understanding. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XVI 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 345–362. [Google Scholar]
- Wang, L.; Qiao, Y.; Tang, X. Action recognition with trajectory-pooled deep-convolutional descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4305–4314. [Google Scholar]
- Wang, X.; Lu, Y.; Yu, W.; Pang, Y.; Wang, H. Few-shot action recognition via multi-view representation learning. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 8522–8535. [Google Scholar] [CrossRef]
- Wang, X.; Zhang, S.; Qing, Z.; Zuo, Z.; Gao, C.; Jin, R.; Sang, N. HyRSM++: Hybrid relation guided temporal set matching for few-shot action recognition. Pattern Recognit. 2024, 147, 110110. [Google Scholar] [CrossRef]
- Donahue, J.; Anne Hendricks, L.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2625–2634. [Google Scholar]
- Majd, M.; Safabakhsh, R. Correlational convolutional LSTM for human action recognition. Neurocomputing 2020, 396, 224–229. [Google Scholar] [CrossRef]
- Liu, Z.; Li, Z.; Wang, R.; Zong, M.; Ji, W. Spatiotemporal saliency-based multi-stream networks with attention-aware LSTM for action recognition. Neural Comput. Appl. 2020, 32, 14593–14602. [Google Scholar] [CrossRef]
- Diba, A.; Fayyaz, M.; Sharma, V.; Karami, A.H.; Arzani, M.M.; Yousefzadeh, R.; Van Gool, L. Temporal 3d convnets: New architecture and transfer learning for video classification. arXiv 2017, arXiv:1711.08200. [Google Scholar] [CrossRef]
- Khorasgani, S.H.; Chen, Y.; Shkurti, F. Slic: Self-supervised learning with iterative clustering for human action videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16091–16101. [Google Scholar]
- Shi, Q.; Zhang, H.B.; Li, Z.; Du, J.X.; Lei, Q.; Liu, J.H. Shuffle-invariant network for action recognition in videos. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2022, 18, 1–18. [Google Scholar] [CrossRef]
- Li, C.; Zhang, J.; Wu, S.; Jin, X.; Shan, S. Hierarchical compositional representations for few-shot action recognition. Comput. Vis. Image Underst. 2024, 240, 103911. [Google Scholar] [CrossRef]
- Wang, X.; Yan, Y.; Hu, H.M.; Li, B.; Wang, H. Cross-modal contrastive learning network for few-shot action recognition. IEEE Trans. Image Process. 2024, 33, 1257–1271. [Google Scholar] [CrossRef]







| Stage | Spatial Pathway | Temporal Pathway | Output |
|---|---|---|---|
| sampling | - | - | spatial: ; temporal: |
| super image | tiled in a grid format | - | spatial: ; temporal: |
| conv0 | stride | - | spatial: ; temporal: |
| conv1 | stride | stride | spatial: ; temporal: |
| pool1 | max, stride | max, stride | spatial: ; temporal: |
| res2 | | | spatial: ; temporal: |
| res3 | | | spatial: ; temporal: |
| res4 | | | spatial: ; temporal: |
| res5 | | | spatial: ; temporal: |
| global pool, fusion, fc | | | classes |
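The "super image" stage above tiles the sampled frames into a single grid-shaped image before it enters the spatial backbone. A minimal sketch of that tiling follows; the near-square grid layout and zero-padding of unused cells are illustrative choices, not necessarily the exact layout used in the paper's tables.

```python
import math
import numpy as np

def make_super_image(frames):
    """Tile T sampled frames (T, H, W, C) into one grid image.

    Uses a near-square grid (e.g., 2x2 for 4 frames); cells beyond
    the T available frames are left zero-padded.
    """
    t, h, w, c = frames.shape
    cols = math.ceil(math.sqrt(t))
    rows = math.ceil(t / cols)
    canvas = np.zeros((rows * h, cols * w, c), dtype=frames.dtype)
    for i in range(t):
        r, col = divmod(i, cols)          # row-major placement
        canvas[r * h:(r + 1) * h, col * w:(col + 1) * w] = frames[i]
    return canvas
```

For example, 4 frames of size H×W produce a 2H×2W super image, which a standard 2D image backbone can then process as a single input.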
| Super Image | CCL | ICL | HMDB51 Top-1 (%) | Kinetics-400 Top-1 (%) |
|---|---|---|---|---|
| ✘ | ✔ | ✔ | 74.8 | 74.0 |
| ✔ | ✘ | ✔ | 77.9 | 77.4 |
| ✔ | ✔ | ✘ | 77.2 | 75.9 |
| ✔ | ✔ | ✔ | 78.6 | 78.3 |
| Frames for Spatial Pathway | Layout of Super Image | Frames for Temporal Pathway | HMDB51 Top-1 (%) |
|---|---|---|---|
| 4 | | 16 | 76.1 |
| 8 | | 32 | 78.6 |
| 16 | | 64 | 78.9 |
| Action Category | Handstand | Push Up | Run | Climb Stairs | Eat |
|---|---|---|---|---|---|
| Number | 30 | 30 | 30 | 30 | 30 |
| Precision (%) | 100 | 100 | 100 | 96.7 | 93.3 |
| False negative | 0 | 0 | 0 | 1 | 2 |
| Action Category | Sword Fight | Fencing | Hug | Kiss | Smile |
|---|---|---|---|---|---|
| Number | 30 | 30 | 30 | 30 | 30 |
| Precision (%) | 46.7 | 56.7 | 56.7 | 60.0 | 60.0 |
| False negative | 16 | 13 | 13 | 12 | 12 |
| Method | Pre-Training | Input | UCF101 Top-1 (%) | HMDB51 Top-1 (%) | Kinetics-400 Top-1 (%) |
|---|---|---|---|---|---|
| Two Stream CNN [3] | ✔ | RGB, Flow | 88.0 | 59.4 | - |
| TDD [56] | ✔ | RGB, Flow | 90.3 | 63.2 | - |
| Feichtenhofer et al. [5] | ✔ | RGB, Flow | 93.5 | 69.2 | - |
| AdaScan [11] | ✔ | RGB, Flow | 89.4 | 54.9 | - |
| TSN [4] | ✔ | RGB, Flow | 94.2 | 69.4 | 72.5 |
| TSN + SoSR + ToSR [34] | ✔ | RGB, Flow | 92.1 | 68.3 | - |
| MSNet [55] | ✔ | RGB | - | 77.4 | 76.4 |
| MSM-ResNets [30] | ✘ | RGB, Flow | 93.5 | 66.7 | - |
| SCVRL [25] | ✔ | RGB, Flow | 89.0 | 62.6 | - |
| MRLN [57] | ✔ | RGB, Flow | 86.9 | 65.5 | 75.7 |
| HyRSM++ [58] | ✔ | RGB, Flow | 93.5 | 61.5 | 74.0 |
| LRCN [59] | ✔ | RGB, Flow | 82.7 | - | - |
| C2LSTM [60] | ✔ | RGB | 92.8 | 61.3 | - |
| STS-ALSTM [61] | ✔ | RGB, Flow | 92.7 | 64.4 | - |
| C3D [20] | ✔ | RGB | 90.4 | - | - |
| T3D [62] | ✔ | RGB | 93.2 | 63.5 | 62.2 |
| I3D [1] | ✔ | RGB, Flow | 97.8 | 80.9 | - |
| NL I3D [51] | ✔ | RGB | - | - | 77.7 |
| SlowFast [2] | ✘ | RGB | - | - | 75.6 |
| TPN-R50 [50] | ✘ | RGB | - | - | 77.7 |
| SLIC [63] | ✔ | RGB, Flow | 83.2 | 56.2 | - |
| SIN (AVG) [64] | ✔ | RGB | 91.6 | 75.0 | - |
| AMFAR [9] | ✘ | RGB, Flow | 91.2 | 73.9 | - |
| HCR [65] | ✔ | RGB | 88.9 | 67.5 | 75.7 |
| CCLN [66] | ✔ | RGB | 89.6 | 61.5 | 75.8 |
| HDCA (ours) | ✘ | RGB | 94.0 | 78.6 | 78.3 |
Share and Cite
Kang, S.; Huo, H.; Ma, L.; Wang, J.; Mei, A. HDCA: Heterogeneous Dual-Path Contrastive Architecture for Action Recognition. Electronics 2025, 14, 4730. https://doi.org/10.3390/electronics14234730