Real-Time Human Action Recognition with a Low-Cost RGB Camera and Mobile Robot Platform
Abstract
1. Introduction
2. Materials and Methods
2.1. Pose Estimation Pipeline
2.2. Skeleton to Image Conversion
2.3. CNN Model
2.4. Mobile Robot Platform
3. Experimental Setup
3.1. Dataset
3.2. Training
4. Results and Discussion
4.1. Results of Proposed Model
4.2. Results of Tracking
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
| Method | Accuracy | Precision | Recall |
|---|---|---|---|
| RGB camera-17 | 71% | 0.71 | 0.69 |
| Kinect-25 | 75% | 0.74 | 0.74 |
| Kinect-17 | 74% | 0.73 | 0.73 |
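The metrics reported above (accuracy, plus macro-averaged precision and recall across action classes) can be computed directly from label/prediction pairs. A minimal sketch, using hypothetical action labels rather than the paper's actual data:

```python
# Sketch of accuracy, macro precision, and macro recall computation.
# The labels and predictions below are illustrative, not from the paper.
from collections import Counter

def macro_metrics(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p predicted but wrong
            fn[t] += 1          # t missed
    accuracy = sum(tp.values()) / len(y_true)
    precision = sum(
        tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0 for c in classes
    ) / len(classes)
    recall = sum(
        tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0 for c in classes
    ) / len(classes)
    return accuracy, precision, recall

y_true = ["wave", "sit", "wave", "stand", "sit"]
y_pred = ["wave", "sit", "stand", "stand", "wave"]
acc, prec, rec = macro_metrics(y_true, y_pred)
```

In practice a library routine such as scikit-learn's `precision_score(..., average="macro")` would typically replace the hand-rolled loop; the sketch only makes the averaging explicit.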
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lee, J.; Ahn, B. Real-Time Human Action Recognition with a Low-Cost RGB Camera and Mobile Robot Platform. Sensors 2020, 20, 2886. https://doi.org/10.3390/s20102886