A Multimodal CNN–Transformer Network for Gait Pattern Recognition with Wearable Sensors in Weak GNSS Scenarios
Abstract
1. Introduction
- A multi-node MEMS sensor network that synchronizes data from wrist, chest, and ankle nodes using linear interpolation and frequency-domain feature enhancement.
- A CNN–Transformer hybrid architecture combining dilated convolutional layers for multi-scale feature extraction with positional encoding-enhanced self-attention mechanisms.
- Comprehensive validation across seven ambulatory modes (including stair navigation and fall detection) demonstrating 99.8% recognition accuracy in real-world urban scenarios.
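The linear-interpolation synchronization named in the first contribution can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the dictionary-based stream format, and the 100 Hz target grid are assumptions.

```python
import numpy as np

def synchronize_nodes(node_streams, rate_hz=100.0):
    """Resample asynchronous sensor streams onto a shared time grid.

    node_streams: dict mapping node name -> (timestamps, samples), where
    timestamps is a 1-D array in seconds and samples has shape (N, channels).
    Returns (grid, dict of resampled arrays). Each channel is linearly
    interpolated onto the grid, mirroring the synchronization step (sketch).
    """
    # Use the overlapping time span common to all nodes.
    start = max(ts[0] for ts, _ in node_streams.values())
    stop = min(ts[-1] for ts, _ in node_streams.values())
    grid = np.arange(start, stop, 1.0 / rate_hz)
    aligned = {}
    for name, (ts, x) in node_streams.items():
        # np.interp handles one channel at a time; stack the results.
        aligned[name] = np.column_stack(
            [np.interp(grid, ts, x[:, c]) for c in range(x.shape[1])]
        )
    return grid, aligned
```

After this step every node contributes the same number of samples per window, which is what makes the spatiotemporal feature matrix well defined.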
2. Related Work
3. Methodology
3.1. Abbreviations and Acronyms
3.2. Time-Frequency Joint Feature Enhancement
3.3. A Hybrid CNN–Transformer Architecture for Gait Recognition
4. Experiments and Evaluation
4.1. Experimental Setup
4.2. Multi-Node Biomechanical Signal Analysis
- The triaxial accelerometer shows comparable amplitude across all three axes during high-intensity gaits (R, J).
- The gyroscope reveals X-axis angular velocity dominance during running and Y-axis prominence during jumping.
- The foot sensor records maximum acceleration responses during weight-bearing gaits (R, J), with jumping impact peaks being most pronounced, while low-dynamic gaits (e.g., E) yield minimal values, consistent with the foot’s role as the primary load-bearing region.
- The wrist sensor exhibits large-amplitude data only during intense gaits, with smooth fluctuations in regular gaits, reflecting correlations between upper limb swing amplitude and gait intensity.
- The chest sensor demonstrates intermediate variation magnitudes, where multi-axis data combinations characterize the trajectory of the body’s center of mass.
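Per-node dynamics of the kind described above are typically summarized as windowed time-frequency feature vectors (the joint enhancement of Section 3.2). The sketch below shows one plausible feature set; the specific choices (mean, standard deviation, dominant frequency, spectral power) are illustrative assumptions, not the paper's exact set.

```python
import numpy as np

def time_freq_features(window, rate_hz=100.0):
    """Compute simple time- and frequency-domain features for one sensor
    window of shape (N, channels). Returns a 1-D feature vector."""
    n = window.shape[0]
    spec = np.abs(np.fft.rfft(window, axis=0)) / n   # one-sided magnitude
    freqs = np.fft.rfftfreq(n, d=1.0 / rate_hz)
    power = spec ** 2
    # Dominant frequency per channel, skipping the DC bin.
    dom = freqs[np.argmax(power[1:], axis=0) + 1]
    return np.concatenate([
        window.mean(axis=0),   # time domain: mean
        window.std(axis=0),    # time domain: standard deviation
        dom,                   # frequency domain: dominant frequency
        power.sum(axis=0),     # frequency domain: total spectral power
    ])
```

A high dominant frequency with large spectral power would separate running and jumping windows from elevator-riding windows, matching the amplitude observations listed above.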
4.3. Performance Evaluation of CNN–Transformer Hybrid Architectures for Multi-Node Wearable Sensor Networks
4.3.1. Effectiveness Analysis of Multi-Source Sensor Wearable System
4.3.2. Performance Analysis of Human Gait Recognition Algorithms
- Highest Sensitivity: The proposed model achieves the highest coverage in recognizing all gait types, with no missed detections of transient actions (e.g., falling).
- Optimal Specificity: The model demonstrates the highest classification reliability, reducing misjudgment rates by 60% compared to the second-best model.
- Robustness: Addressing imbalanced gait durations, the proposed method achieves an F1-score of 0.997, surpassing traditional models by 0.05–0.12, proving its robustness in complex real-world scenarios.
5. Conclusions
- Multi-source Sensor Network Construction: A distributed wearable node network is designed to synchronously capture gait parameters from wrist, chest, and ankle regions. Temporal calibration and frequency-domain feature fusion are employed to construct a spatiotemporally correlated feature matrix.
- Cross-modal Feature Enhancement: Compared to single-node sensing solutions (wristband/chest strap/foot device), the multi-source fusion system significantly improves recognition accuracy, achieving 7.98%, 0.43%, and 5.59% enhancements over wrist-, chest-, and foot-only systems, respectively.
- Hybrid Model Architecture Optimization: The CNN–Transformer–Attention hybrid architecture integrates the CNN’s local feature extraction strengths with the Transformer’s global dependency modeling. It achieves >99% recognition accuracy across seven fundamental gait patterns, outperforming traditional CNN, LSTM, and other models by 3–8 percentage points.
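The Transformer side of the hybrid architecture summarized above (positional encoding followed by self-attention) can be illustrated with a minimal NumPy sketch. A single head with identity query/key/value projections is a simplifying assumption for clarity, not the paper's configuration.

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Standard sinusoidal positional encoding over a sequence."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    # Even feature indices use sine, odd indices use cosine.
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(x):
    """Single-head scaled dot-product self-attention with identity
    projections: a minimal sketch of the global-dependency step."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # softmax over positions
    return w @ x                                  # weighted mix of timesteps
```

In the full model, `x` would be the dilated-CNN feature sequence plus the positional encoding, so each timestep can attend to gait context anywhere in the window rather than only its local receptive field.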
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Parameters | Wrist Gyroscope | Wrist Accelerometer | Shoulder/Foot Gyroscope | Shoulder/Foot Accelerometer |
|---|---|---|---|---|
| Update rate | 100 Hz | 100 Hz | 100 Hz | |
| Bias stability | 1°/s | 0.5 mg | 1°/s | |
| Dynamic range | ±2000°/s | ±8 g | ±2000°/s | |
| White noise | | | | |
| Bandwidth | 250 Hz | 250 Hz | 250 Hz | |
Share and Cite
Wang, J.; Liu, N.; Xie, Y.; Que, S.; Xia, M. A Multimodal CNN–Transformer Network for Gait Pattern Recognition with Wearable Sensors in Weak GNSS Scenarios. Electronics 2025, 14, 1537. https://doi.org/10.3390/electronics14081537