A ST-ConvLSTM Network for 3D Human Keypoint Localization Using MmWave Radar
Abstract
1. Introduction
2. Related Works
3. Building Dataset
3.1. Dataset Definition and Design
3.2. Hardware Design of Dataset Testing System
3.3. Keypoint 3D Coordinate Computation
3.4. Kalman Filter Algorithm
3.5. Synchronization of Radar and Vision Systems
4. Radar Signal Processing Flow
4.1. Point Cloud Clustering Algorithm
4.2. Point Cloud Fusion Algorithm
5. ST-ConvLSTM
5.1. Overall Architecture of ST-ConvLSTM
5.2. Data Preprocessing
6. Experiments and Results
6.1. Experiment Platform
6.2. Model Training
6.3. Model Performance Evaluation
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Al-Abri, S.; Keshvari, S.; Al-Rashdi, K.; Al-Hmouz, R.; Bourdoucen, H. Computer vision based approaches for fish monitoring systems: A comprehensive study. Artif. Intell. Rev. 2025, 58, 185.
2. Sage, K.; Young, S. Security applications of computer vision. IEEE Aerosp. Electron. Syst. Mag. 2002, 14, 19–29.
3. Deng, L.; Deng, Y.; Bi, Z. Simulation of athletes’ motion detection and recovery technology based on monocular vision and biomechanics. J. Intell. Fuzzy Syst. 2021, 40, 2241–2252.
4. Jaimes, A.; Sebe, N. Multimodal human–computer interaction: A survey. Comput. Vis. Image Underst. 2007, 108, 116–134.
5. Gu, S.; Zhang, X.; Zhang, J. A full-time deep learning-based alert approach for bridge–ship collision using visible spectrum and thermal infrared cameras. Meas. Sci. Technol. 2023, 34, 095907.
6. Ramanan, D.; Forsyth, D.A.; Zisserman, A. Strike a pose: Tracking people by finding stylized poses. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 271–278.
7. Gkioxari, G.; Hariharan, B.; Girshick, R.; Malik, J. Using k-Poselets for Detecting People and Localizing Their Keypoints. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3582–3589.
8. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
9. Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1302–1310.
10. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
11. Wang, J.; Chen, Y.; Hao, S.; Peng, X.; Hu, L. Deep learning for sensor based activity recognition: A survey. Pattern Recognit. Lett. 2019, 119, 3–11.
12. Zhang, H.; Ho, E.S.; Zhang, F.X.; Shum, H.P. Pose-based tremor classification for Parkinson’s disease diagnosis from video. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Singapore, 18–22 September 2022; pp. 489–499.
13. Liu, W.; Lin, X.; Chen, X.; Wang, Q.; Wang, X.; Yang, B.; Cai, N.; Chen, R.; Chen, G.; Lin, Y. Vision-based estimation of MDS-UPDRS scores for quantifying Parkinson’s disease tremor severity. Med. Image Anal. 2023, 85, 102754.
14. Tao, Z.; Li, Y.; Wang, P.; Ji, L. Traffic incident detection based on mmWave radar and improvement using fusion with camera. J. Adv. Transp. 2022, 2022, 2286147.
15. Tan, B.; Ma, Z.; Zhu, X.; Li, S.; Zheng, L.; Chen, S.; Huang, L.; Bai, J. 3-D object detection for multiframe 4-D automotive millimeter-wave radar point cloud. IEEE Sens. J. 2022, 23, 11125–11138.
16. Hu, Y.; Yang, X.; Xia, Z.; Xu, F. Human activity recognition trained on simulated millimeter-wave radar data with domain adaptation. IEEE Trans. Instrum. Meas. 2025, 74, 1–13.
17. Scholes, S.; Ruget, A.; Zhu, F.; Leach, J. Human Pose Inference Using an Elevated mmWave FMCW Radar. IEEE Access 2024, 12, 115605–115614.
18. Gu, M.; Chen, Z.; Chen, K.; Pan, H. RMPCT-Net: A multi-channel parallel CNN and transformer network model applied to HAR using FMCW radar. Signal Image Video Process. 2024, 18, 2219–2229.
19. Chen, J.; Gu, M.; Lin, Z. R-ATCN: Continuous human activity recognition using FMCW radar with temporal convolutional networks. Meas. Sci. Technol. 2024, 36, 016180.
20. Cai, J.; Yang, Z.; Chu, P.; Guo, J.; Zhou, J. Robust hand gesture detection and recognition using 4D millimeter-wave radar in a ubiquitous scene. Measurement 2025, 253, 117545.
21. Sengupta, A.; Jin, F.; Zhang, R.; Cao, S. mm-Pose: Real-time human skeletal posture estimation using mmWave radars and CNNs. IEEE Sens. J. 2020, 20, 10032–10044.
22. Jogin, M.; Madhulika, M.S.; Divya, G.D.; Meghana, R.K.; Apoorva, S. Feature Extraction using Convolution Neural Networks (CNN) and Deep Learning. In Proceedings of the 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, India, 18–19 May 2018; pp. 2319–2323.
23. Sigal, L.; Balan, A.O.; Black, M.J. Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int. J. Comput. Vis. 2010, 87, 4–27.
24. Zhang, Z. Microsoft kinect sensor and its effect. IEEE Multimed. 2012, 19, 4–10.
25. Jin, F.; Zhang, R.; Sengupta, A.; Cao, S.; Hariri, S.; Agarwal, N.K.; Agarwal, S.K. Multiple Patients Behavior Detection in Real-time using mmWave Radar and Deep CNNs. In Proceedings of the 2019 IEEE Radar Conference (RadarConf), Boston, MA, USA, 22–26 April 2019; pp. 1–6.
26. Kim, Y.; Ling, H. Human activity classification based on micro-Doppler signatures using a support vector machine. IEEE Trans. Geosci. Remote Sens. 2009, 47, 1328–1337.
27. Cao, P.; Xia, W.; Li, Y. Heart ID: Human Identification Based on Radar Micro-Doppler Signatures of the Heart Using Deep Learning. Remote Sens. 2019, 11, 1220.
28. Li, X.; He, Y.; Fioranelli, F.; Jing, X. Semisupervised human activity recognition with radar micro-Doppler signatures. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–12.
29. Adib, F.; Hsu, C.-Y.; Mao, H.; Katabi, D.; Durand, F. Capturing the human figure through a wall. ACM Trans. Graph. 2015, 34, 1–13.
30. Zhao, M.M.; Li, T.H.; Mohammad, A.A. Through-Wall Human Pose Estimation Using Radio Signals. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7356–7365.
31. Zhao, M.; Tian, Y.; Zhao, H.; Alsheikh, M.A.; Li, T.; Hristov, R.; Kabelac, Z.; Katabi, D.; Torralba, A. RF-based 3D skeletons. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, Budapest, Hungary, 20–25 August 2018; pp. 267–281.
32. Yu, Z.; Taha, A.; Taylor, W.; Zahid, A.; Rajab, K.; Heidari, H.; Imran, M.A.; Abbasi, Q.H. A radar-based human activity recognition using a novel 3-D point cloud classifier. IEEE Sens. J. 2022, 22, 18218–18227.
33. Dang, X.; Jin, P.; Hao, Z.; Ke, W.; Deng, H.; Wang, L. Human Movement Recognition Based on 3D Point Cloud Spatiotemporal Information from Millimeter-Wave Radar. Sensors 2023, 23, 9430.
34. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 22, 1330–1334.
35. Bajpai, R.; Joshi, D. Movenet: A deep neural network for joint profile prediction across variable walking speeds and slopes. IEEE Trans. Instrum. Meas. 2021, 70, 1–11.
36. Basar, T. A New Approach to Linear Filtering and Prediction Problems. In Control Theory: Twenty-Five Seminal Papers (1932–1981); IEEE: Piscataway, NJ, USA, 2001; pp. 167–179.
37. Schumann, O.; Hahn, M.; Scheiner, N.; Weishaupt, F.; Tilly, J.F.; Dickmann, J.; Wohler, C. RadarScenes: A Real-World Radar Point Cloud Data Set for Automotive Applications. In Proceedings of the 2021 IEEE 24th International Conference on Information Fusion (FUSION), Sun City, South Africa, 1–4 November 2021; pp. 1–8.
38. Engels, F.; Heidenreich, P.; Wintermantel, M.; Stäcker, L.; Al Kadi, M.; Zoubir, A.M. Automotive Radar Signal Processing: Research Directions and Practical Challenges. IEEE J. Sel. Top. Signal Process. 2021, 15, 865–878.
39. Raj, S.; Ghosh, D. Improved and Optimal DBSCAN for Embedded Applications Using High-Resolution Automotive Radar. In Proceedings of the 2020 21st International Radar Symposium (IRS), Warsaw, Poland, 5–7 October 2020; pp. 343–346.
40. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
41. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y. Convolutional LSTM Network: A machine learning approach for precipitation nowcasting. In Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 1 (NIPS’15), Montreal, QC, Canada, 7–12 December 2015; Volume 1, pp. 802–810.
42. Li, Y.; Liu, Y.; Li, H.; Zhang, G.; Xu, M.F.; Hao, C.Q. Millimeter-wave radar human pose estimation based on Transformer and PointNet++. Comput. Sci. 2025, 52 (Suppl. S1), 445–453.
43. Xue, H.; Cao, Q.; Ju, Y.; Hu, H.; Wang, H.; Zhang, A.; Su, L. M4esh: mmWave-Based 3D Human Mesh Construction for Multiple Subjects. In Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, Boston, MA, USA, 6–9 November 2022; pp. 391–406.
Dataset Notation | Dataset Description |
---|---|
3D keypoint coordinates | |
point cloud coordinates | |
point cloud velocity/SNR | |
motion category/confidence | |
timestamp | current frame timestamp |
Component | Configuration |
---|---|
Central Processing Unit (CPU) | Intel Core i7-8700K (Santa Clara, CA, USA) |
Graphics Processing Unit (GPU) | NVIDIA GeForce RTX 2060 (Santa Clara, CA, USA) |
System Memory (RAM) | 64 GB DDR4 |
Operating System | Windows 11 |
Module | Version | Module Function |
---|---|---|
Python | 3.9 | High-level programming language |
TensorFlow | 2.7.0 | Neural network backend framework |
Keras | 2.7.0 | High-level neural networks API |
OpenCV | 3.4.18.65 | Computer vision library |
NumPy | 1.26.4 | Scientific computing library |
Module | MAE | MAD |
---|---|---|
Single Frame | 0.0274 | 0.0258 |
ICP Multi-Frame Fusion | 0.0115 | 0.0102 |
Network Model | Statistic | Horizontal | Vertical | Depth |
---|---|---|---|---|
ST-ConvLSTM | mean | 0.1075 | 0.0633 | 0.1180 |
ST-ConvLSTM | range | 0.1995 | 0.1945 | 0.2809 |
ST-ConvLSTM | median | 0.1090 | 0.0648 | 0.1201 |
mmPose [21] | mean | 0.1576 | 0.4381 | 0.2570 |
mmPose [21] | range | 0.2358 | 0.3368 | 0.9646 |
mmPose [21] | median | 0.1584 | 0.4363 | 0.2278 |
MnPoTr [42] | mean | 0.4745 | 0.2044 | 2.7247 |
MnPoTr [42] | range | 0.3627 | 0.0360 | 0.8549 |
MnPoTr [42] | median | 0.4678 | 0.2046 | 2.7461 |
M4esh [43] | mean | 0.3782 | 0.4011 | 2.9600 |
M4esh [43] | range | 0.3537 | 0.2710 | 0.0466 |
M4esh [43] | median | 0.3573 | 0.3942 | 3.0033 |
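The per-axis statistics in the table above (mean, range, median) can be illustrated with a minimal sketch. It assumes, since this excerpt does not say how the figures were derived, that each statistic is taken over absolute coordinate errors and that the three coordinates are ordered horizontal, vertical, depth. The function name `per_axis_error_stats` and the sample values are hypothetical.

```python
from statistics import mean, median

# Assumed axis order for each (x, y, z) coordinate triple.
AXES = ("horizontal", "vertical", "depth")


def per_axis_error_stats(pred, truth):
    """Summarise absolute errors per axis.

    pred, truth: equal-length lists of (x, y, z) tuples.
    Returns {axis: {"mean": ..., "range": ..., "median": ...}}.
    """
    stats = {}
    for i, axis in enumerate(AXES):
        # Absolute error along this axis for every keypoint sample.
        errs = [abs(p[i] - t[i]) for p, t in zip(pred, truth)]
        stats[axis] = {
            "mean": mean(errs),
            "range": max(errs) - min(errs),  # spread between worst and best error
            "median": median(errs),
        }
    return stats


# Made-up sample data, not taken from the paper.
pred = [(0.0, 1.0, 2.0), (1.0, 1.0, 2.0), (2.0, 1.0, 5.0)]
truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
stats = per_axis_error_stats(pred, truth)
print(stats["depth"])
```

Reporting the median alongside the mean shows whether a few large errors dominate an axis, and the range gives the spread between the best and worst cases.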
Network Model | Loss | MAE |
---|---|---|
ST-ConvLSTM | 2.6443 × 10⁻⁴ | 0.0115 |
mmPose | 8.3952 × 10⁻⁴ | 0.0191 |
MnPoTr | 6.8622 × 10⁻⁴ | 0.0167 |
M4esh | 7.3146 × 10⁻⁴ | 0.0172 |
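As a rough illustration of the two columns above, the sketch below computes a mean absolute error (MAE) and a squared-error loss over 3D keypoint coordinates. The excerpt does not state which loss function was used, so mean squared error is an assumption made purely for illustration. The function names and sample values are hypothetical.

```python
def flatten_errors(pred, truth):
    """Return the signed coordinate errors of all keypoints as one flat list."""
    return [p - t for pv, tv in zip(pred, truth) for p, t in zip(pv, tv)]


def mae(pred, truth):
    """Mean absolute error over every coordinate of every keypoint."""
    errs = flatten_errors(pred, truth)
    return sum(abs(e) for e in errs) / len(errs)


def mse(pred, truth):
    """Mean squared error; assumed here as the loss, not confirmed by the source."""
    errs = flatten_errors(pred, truth)
    return sum(e * e for e in errs) / len(errs)


# Made-up sample data, not taken from the paper.
pred = [(0.5, 0.0, 0.0), (0.0, 0.25, 0.0)]
truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mae(pred, truth), mse(pred, truth))
```

Because squaring shrinks errors smaller than one, a squared-error loss on small coordinate errors is numerically much smaller than the corresponding MAE, which is the pattern the table shows.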
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wei, S.; Wang, H.; Mo, Y.; Du, D. A ST-ConvLSTM Network for 3D Human Keypoint Localization Using MmWave Radar. Sensors 2025, 25, 5857. https://doi.org/10.3390/s25185857