Integrating OpenPose for Proactive Human–Robot Interaction Through Upper-Body Pose Recognition
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. System Overview
3.2. Upper-Body Pose Recognition Using Neural Networks
3.2.1. Upper-Body Feature Extraction and Labeling
3.2.2. Artificial Neural Network (ANN) Model Construction
3.3. Active Care System for Zenbo Junior
- Bowing the head: When people engage in activities such as using their smartphones, having a meal, or reading a book, they often bow their heads. Upon recognizing the posture of bowing the head, the robot moves in front of the person, lowers its head to look at the person, and asks, “What are you doing?” The robot activates its voice recognition function and provides appropriate reminders based on the person’s response.
- Looking at the screen: When people are focused on their screens while working, they assume a posture of looking at the screen. When the robot recognizes this posture, it moves next to the person, looks at the screen, and asks, “What are you looking at?” The robot activates its voice recognition function and offers reminders or suggestions based on the person’s response.
- Looking at the robot: When people direct their gaze toward the robot, they assume a posture of looking at it. Upon recognizing this posture, the robot rotates left and right and responds, “Wishing you a wonderful day.”
- Propping up the head: When people are engaged in deep thinking or pondering, they assume a posture of propping up their head. When the robot detects this posture, it asks, “What are you thinking about?” The robot activates its voice recognition function and responds accordingly to the person’s input.
- Stretching hands: When people have been working for a while and want to relax, they assume a posture of stretching their hands. Upon recognizing this posture, the robot looks up and reminds the person to take proper rest.
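Each proactive behavior above is a mapping from a recognized pose label to a robot movement and utterance. A minimal dispatch sketch follows; the label names, the `move` values, and the `react` helper are illustrative assumptions, not calls from the actual Zenbo Junior SDK:

```python
# Illustrative pose-to-behavior table for the five proactive responses in
# Section 3.3. Labels, movement names, and structure are assumptions; the
# real robot control would go through the Zenbo SDK.
BEHAVIORS = {
    "bowing_head":      {"move": "front_of_person",   "utterance": "What are you doing?",                  "listen": True},
    "looking_screen":   {"move": "beside_person",     "utterance": "What are you looking at?",             "listen": True},
    "looking_robot":    {"move": "rotate_left_right", "utterance": "Wishing you a wonderful day.",         "listen": False},
    "propping_head":    {"move": None,                "utterance": "What are you thinking about?",         "listen": True},
    "stretching_hands": {"move": "look_up",           "utterance": "Please remember to take proper rest.", "listen": False},
}

def react(pose_label: str) -> dict:
    """Return the behavior for a recognized pose, or a no-op for unknown labels."""
    return BEHAVIORS.get(pose_label, {"move": None, "utterance": None, "listen": False})
```

For example, `react("looking_robot")["utterance"]` returns the greeting, while an unrecognized label yields a no-op, so the robot stays idle rather than misfiring.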
4. Experiments and Results
4.1. Data Collection
4.2. Experimental Setting
4.3. Experimental Results
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Sakai, K.; Hiroi, Y.; Ito, A. Playing with a Robot: Realization of “Red Light, Green Light” Using a Laser Range Finder. In Proceedings of the 2015 Third International Conference on Robot, Vision and Signal Processing (RVSP), Kaohsiung, Taiwan, 18–20 November 2015.
- Brock, H.; Sabanovic, S.; Nakamura, K.; Gomez, R. Robust real-time hand gestural recognition for non-verbal communication with tabletop robot Haru. In Proceedings of the 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Naples, Italy, 31 August–4 September 2020; pp. 891–898.
- Brock, H.; Chulani, J.P.; Merino, L.; Szapiro, D.; Gomez, R. Developing a Lightweight Rock-Paper-Scissors Framework for Human-Robot Collaborative Gaming. IEEE Access 2020, 8, 202958–202968.
- Hsu, R.C.; Lin, Y.-P.; Lin, C.-J.; Lai, L.-S. Humanoid robots for searching and kicking a ball and dancing. In Proceedings of the 2020 IEEE Eurasia Conference on IOT, Communication and Engineering (ECICE), Yunlin, Taiwan, 23–25 October 2020; pp. 385–387.
- Cha, N.; Kim, I.; Park, M.; Kim, A.; Lee, U. HelloBot: Facilitating Social Inclusion with an Interactive Greeting Robot. In Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers, Singapore, 8–12 October 2018; pp. 21–24.
- Li, C.; Imeokparia, E.; Ketzner, M.; Tsahai, T. Teaching the NAO robot to play a human-robot interactive game. In Proceedings of the 2019 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 5–7 December 2019; pp. 712–715.
- Hsieh, C.-F.; Lin, Y.-R.; Lin, T.-Y.; Lin, Y.-H.; Chiang, M.-L. Apply Kinect and Zenbo to Develop Interactive Health Enhancement System. In Proceedings of the 2019 8th International Conference on Innovation, Communication and Engineering (ICICE), Zhengzhou, China, 25–30 October 2019; pp. 165–168.
- Wei, S.-E.; Ramakrishna, V.; Kanade, T.; Sheikh, Y. Convolutional pose machines. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4724–4732.
- Newell, A.; Yang, K.; Deng, J. Stacked hourglass networks for human pose estimation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 483–499.
- Yang, W.; Li, S.; Ouyang, W.; Li, H.; Wang, X. Learning feature pyramids for human pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1281–1290.
- Ke, L.; Chang, M.-C.; Qi, H.; Lyu, S. Multi-scale structure-aware network for human pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 713–728.
- Fang, H.-S.; Xie, S.; Tai, Y.-W.; Lu, C. RMPE: Regional multi-person pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2334–2343.
- Cao, Z.; Simon, T.; Wei, S.-E.; Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299.
- Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.-E.; Sheikh, Y. OpenPose: Realtime multi-person 2D pose estimation using Part Affinity Fields. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 172–186.
- Rishan, F.; De Silva, B.; Alawathugoda, S.; Nijabdeen, S.; Rupasinghe, L.; Liyanapathirana, C. Infinity Yoga Tutor: Yoga Posture Detection and Correction System. In Proceedings of the 2020 5th International Conference on Information Technology Research (ICITR), Moratuwa, Sri Lanka, 2–4 December 2020; pp. 1–6.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Chen, K. Sitting posture recognition based on OpenPose. IOP Conf. Ser. Mater. Sci. Eng. 2019, 677, 032057.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Vaufreydaz, D.; Johal, W.; Combe, C. Starting engagement detection towards a companion robot using multimodal features. Robot. Auton. Syst. 2016, 75, 4–16.
- Lin, L.-H.; Cui, Y.; Hao, Y.; Xia, F.; Sadigh, D. Gesture-informed robot assistance via foundation models. In Proceedings of the 7th Annual Conference on Robot Learning, Atlanta, GA, USA, 6–9 November 2023.
- Belardinelli, A. Gaze-based intention estimation: Principles, methodologies, and applications in HRI. ACM Trans. Hum.-Robot Interact. 2024, 13, 1–30.
- Recht, B.; Re, C.; Wright, S.; Niu, F. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. Adv. Neural Inf. Process. Syst. 2011, 24, 1–9.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep feedforward networks. In Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
- Chollet, F. Deep Learning with Python; Simon and Schuster: New York, NY, USA, 2021.
Item | Specification
---|---
Central Processing Unit (CPU) | Intel Core i7-8700
Graphics Processing Unit (GPU) | NVIDIA GeForce RTX 3060
Random Access Memory (RAM) | 64 GB DDR3
Operating system | Windows 10
Programming language | Python 3.7.3
Development environment | TensorFlow 2.3.0, Keras 2.4.3, Android Studio 4.2.1
Parameter | Value
---|---
Input features | 24
Upper-body feature nodes (24 values) | [x0, y0, x1, y1, x2, y2, x3, y3, x4, y4, x5, y5, x6, y6, x7, y7, x15, y15, x16, y16, x17, y17, x18, y18]
Hidden-layer neurons | 32
Batch size | 25
Training epochs | 100
Learning rate | 1 × 10−4 (reduced to 1 × 10−5 after epoch 40)
Activation function | ReLU
Optimizer | Adam
Loss function | Categorical cross-entropy
Training samples | 4320
Validation samples | 1080
Evaluation index | Accuracy
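The hyperparameters listed above describe a small feed-forward classifier over 24 inputs: the (x, y) coordinates of 12 OpenPose upper-body keypoints (indices 0–7 and 15–18). A minimal NumPy sketch of the forward pass and learning-rate schedule follows; the keypoint index set is taken from the feature list above, but the random weights, the BODY_25-style keypoint array, and the exact layer shapes beyond 24→32→5 are illustrative assumptions rather than the authors' trained model:

```python
import numpy as np

# Upper-body keypoint indices per the feature table (OpenPose-style ordering assumed).
UPPER_BODY_IDS = [0, 1, 2, 3, 4, 5, 6, 7, 15, 16, 17, 18]
N_CLASSES = 5  # the five poses in Section 3.3

def extract_features(keypoints: np.ndarray) -> np.ndarray:
    """Flatten the (x, y) of the 12 upper-body keypoints into a 24-dim vector."""
    return keypoints[UPPER_BODY_IDS, :2].reshape(-1)

def forward(x, W1, b1, W2, b2):
    """24 -> 32 (ReLU) -> 5 (softmax), mirroring the ANN in the table."""
    h = np.maximum(0.0, x @ W1 + b1)      # hidden layer: 32 ReLU units
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

def learning_rate(epoch: int) -> float:
    """1e-4, reduced to 1e-5 after epoch 40, per the table."""
    return 1e-4 if epoch <= 40 else 1e-5

rng = np.random.default_rng(0)
kp = rng.random((25, 3))                  # fake pose output: 25 keypoints x (x, y, conf)
x = extract_features(kp)                  # 24 features
probs = forward(x,
                rng.normal(size=(24, 32)) * 0.1, np.zeros(32),
                rng.normal(size=(32, 5)) * 0.1, np.zeros(5))
```

With Adam and categorical cross-entropy as in the table, the same architecture is a one-liner in Keras; the NumPy version is shown only to make the 24→32→5 data flow explicit.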
Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
ANN | 0.95 | 0.95 | 0.95 | 0.95 |
SVM | 0.69 | 0.76 | 0.69 | 0.68 |
CNN | 0.62 | 0.60 | 0.62 | 0.57 |
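The comparison above reports accuracy together with (macro-averaged) precision, recall, and F1-score. As a reference for how those metrics relate, here is a short sketch computing them from a confusion matrix; the toy matrix is fabricated for illustration and is not the paper's data, and the zero-division guard for never-predicted classes is omitted for brevity:

```python
import numpy as np

def macro_metrics(cm: np.ndarray):
    """Accuracy plus macro-averaged precision/recall/F1 from a confusion
    matrix where cm[i, j] counts samples of true class i predicted as j."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # per class: TP / predicted-positive
    recall = tp / cm.sum(axis=1)      # per class: TP / actual-positive
    f1 = 2 * precision * recall / (precision + recall)
    return tp.sum() / cm.sum(), precision.mean(), recall.mean(), f1.mean()

# Toy two-class example (not the paper's data):
cm = np.array([[8, 2],
               [1, 9]])
acc, p, r, f1 = macro_metrics(cm)     # acc = 17/20 = 0.85
```

Because precision and recall are averaged per class, a model with high accuracy on frequent poses but poor recall on a rare pose is penalized, which is why all four columns are reported above.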
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Tseng, S.-H.; Chiang, J.-C.; Shiue, C.-E.; Yueh, H.-P. Integrating OpenPose for Proactive Human–Robot Interaction Through Upper-Body Pose Recognition. Electronics 2025, 14, 3112. https://doi.org/10.3390/electronics14153112