Multi-Camera-Based Human Activity Recognition for Human–Robot Collaboration in Construction
Abstract
1. Introduction
2. Related Works
3. Methodology
3.1. Estimating 3D Human Pose Using Particle Filter
Algorithm 1 Pseudocode for the particle filter algorithm used in 3D human pose estimation—EstimatePose(N, M, C)

Parameters:
N: number of iterations
M: number of particles
C: set of locations and orientations of the camera sensors

1: P = initialize_particle(M) // a particle holds the 3D location (x_i, y_i, z_i) of the i-th joint, for i = 0, …, 14
2: for n = 1 to N do
3:   for i = 1 to n(C) do
4:     image[i] = capture_frame(C[i])
5:   end for
6:   for m = 1 to M do
7:     P[m] += N(0, Σ) // pose change modeled as a free-rolling ball
8:     W[m] = 0
9:     for i = 1 to n(C) do
10:      // expected and observed 2D joint locations in image coordinates
11:      expected[m][i] = project_2d(P[m], C[i])
12:      observation[i] = detect_pose_2d(image[i])
13:      W[m] = update(W[m], expected[m][i], observation[i])
14:    end for
15:  end for
16:  W = normalize(W)
17:  P = resample(P, W)
18: end for
19: return estimated pose P
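The loop structure of Algorithm 1 can be sketched for a single 3D joint with toy linear cameras. All helper names, camera models, and noise parameters below are illustrative assumptions, not the authors' implementation; in particular, the observed 2D joint locations are passed in directly instead of being produced by a 2D pose detector:

```python
import numpy as np

def estimate_pose(observations, cams, n_iter=50, n_particles=200,
                  motion_sigma=0.1, meas_sigma=0.1, seed=0):
    """Toy particle filter for one 3D joint, mirroring Algorithm 1.

    observations: one observed 2D point per camera (standing in for
    detect_pose_2d). cams: one 2x3 linear projection matrix per camera.
    Returns the mean of the final particle set as the pose estimate.
    """
    rng = np.random.default_rng(seed)
    # initialize_particle(M): random 3D hypotheses
    particles = rng.normal(0.0, 1.0, size=(n_particles, 3))
    for _ in range(n_iter):
        # Motion model (line 7): diffuse each particle ("free-rolling ball")
        particles = particles + rng.normal(0.0, motion_sigma, size=particles.shape)
        log_w = np.zeros(n_particles)
        for cam, obs in zip(cams, observations):
            # project_2d(P[m], C[i]) for all particles at once
            expected = particles @ cam.T
            err = np.sum((expected - obs) ** 2, axis=1)
            # update(W[m], ...): Gaussian reprojection likelihood, per camera
            log_w -= err / (2.0 * meas_sigma ** 2)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()                                   # normalize(W)
        keep = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[keep]                    # resample(P, W)
    return particles.mean(axis=0)
```

With two toy cameras that observe (x, y) and (y, z) respectively, each view constrains the joint in 2D and the filter fuses them into a 3D estimate, which is the role the multi-camera setup plays in the paper.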
3.2. Recognizing Human Activities Using LSTM with 3D Joint Locations
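The section title above describes feeding sequences of 3D joint locations to an LSTM for activity classification. A minimal numpy sketch of that pipeline, with every weight shape and name an assumption (a single LSTM cell unrolled over frames, followed by a softmax over activity classes), might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step: input/forget/output gates and candidate memory."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c_new = f * c + i * g           # update cell memory
    h_new = o * np.tanh(c_new)      # new hidden state
    return h_new, c_new

def classify_activity(joint_seq, params):
    """Classify a sequence of 3D poses.

    joint_seq: array of shape (T, 15, 3) -- the 15 joints' (x, y, z) per frame.
    params: dict with LSTM weights W (4H, 45), U (4H, H), b (4H,)
            and an output layer Wy (K, H), by (K,) for K activity classes.
    Returns softmax probabilities over the K classes.
    """
    H = params["b"].shape[0] // 4
    h, c = np.zeros(H), np.zeros(H)
    for frame in joint_seq:
        # flatten 15 joints * 3 coordinates into a 45-dim input vector
        h, c = lstm_step(frame.reshape(-1), h, c,
                         params["W"], params["U"], params["b"])
    logits = params["Wy"] @ h + params["by"]   # classify from final hidden state
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

The 45-dimensional input (15 joints × 3 coordinates) matches the joint set produced by the particle filter in Section 3.1; the hidden size and class count here are placeholders.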
4. Experiments
4.1. Data Collection and Model Implementation
4.2. Performance Measurement
5. Results and Discussion
5.1. Experimental Results
5.2. Discussions
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Maeda, J.; Takada, H.; Abe, Y. Applicable Possibility Studies on a Humanoid Robot to Cooperative Work on Construction Site with a Human Worker. In Proceedings of the 21st International Symposium on Automation and Robotics in Construction, Jeju Island, Republic of Korea, 21–25 September 2004. [Google Scholar] [CrossRef] [Green Version]
- Bock, T.; Bulgakow, A.; Ashida, S. Façade Cleaning Robot for the Skyscraper. In Proceedings of the 19th International Symposium on Automation and Robotics in Construction (ISARC), Washington, DC, USA, 23–25 September 2002. [Google Scholar] [CrossRef] [Green Version]
- Zhu, A.; Pauwels, P.; De Vries, B. Smart Component-Oriented Method of Construction Robot Coordination for Prefabricated Housing. Autom. Constr. 2021, 129, 103778. [Google Scholar] [CrossRef]
- Zhang, M.; Xu, R.; Wu, H.; Pan, J.; Luo, X. Human–Robot Collaboration for on-Site Construction. Autom. Constr. 2023, 150, 104812. [Google Scholar] [CrossRef]
- Brosque, C.; Galbally, E.; Khatib, O.; Fischer, M. Human-Robot Collaboration in Construction: Opportunities and Challenges. In Proceedings of the HORA 2020—2nd International Congress on Human-Computer Interaction, Optimization and Robotic Applications, Ankara, Turkey, 26–28 June 2020. [Google Scholar] [CrossRef]
- Lee, S.U.; Hofmann, A.; Williams, B. A Model-Based Human Activity Recognition for Human-Robot Collaboration. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 736–743. [Google Scholar] [CrossRef]
- Roitberg, A.; Perzylo, A.; Somani, N.; Giuliani, M.; Rickert, M.; Knoll, A. Human Activity Recognition in the Context of Industrial Human-Robot Interaction. In Proceedings of the 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, Siem Reap, Cambodia, 9–12 December 2014. [Google Scholar] [CrossRef]
- Agrawal, S.C.; Tripathi, R.K.; Jalal, A.S. Human Fall Detection Using Video Surveillance. ACS J. Sci. Eng. 2021, 1, 1–10. [Google Scholar] [CrossRef]
- Gupta, A.; Gupta, K.; Gupta, K.O. Human Activity Recognition Using Pose Estimation and Machine Learning Algorithm. In Proceedings of the ISIC’21: International Semantic Intelligence Conference, New Delhi, India, 25–27 February 2021. [Google Scholar]
- Nguyen, H.C.; Nguyen, T.H.; Scherer, R.; Le, V.H. Deep Learning for Human Activity Recognition on 3D Human Skeleton: Survey and Comparative Study. Sensors 2023, 23, 5121. [Google Scholar] [CrossRef]
- Xing, Y.; Zhu, J. Deep Learning-Based Action Recognition with 3D Skeleton: A Survey. CAAI Trans. Intell. Technol. 2021, 6, 80–92. [Google Scholar] [CrossRef]
- Ren, B.; Liu, M.; Ding, R.; Liu, H. A Survey on 3D Skeleton-Based Action Recognition Using Learning Method. arXiv 2020, arXiv:2002.05907v1. [Google Scholar]
- Ramirez, H.; Velastin, S.A.; Meza, I.; Fabregas, E.; Makris, D.; Farias, G. Fall Detection and Activity Recognition Using Human Skeleton Features. IEEE Access 2021, 9, 33532–33542. [Google Scholar] [CrossRef]
- Taylor, C.J. Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image. Comput. Vis. Image Underst. 2000, 80, 349–363. [Google Scholar] [CrossRef] [Green Version]
- Agarwal, A.; Triggs, B. 3D Human Pose from Silhouettes by Relevance Vector Regression. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 27 June–2 July 2004. [Google Scholar] [CrossRef]
- Agarwal, A.; Triggs, B. Recovering 3D Human Pose from Monocular Images. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 44–58. [Google Scholar] [CrossRef] [Green Version]
- Xu, W.; Chatterjee, A.; Zollhöfer, M.; Rhodin, H.; Fua, P.; Seidel, H.P.; Theobalt, C. Mo2Cap2: Real-Time Mobile 3D Motion Capture with a Cap-Mounted Fisheye Camera. IEEE Trans. Vis. Comput. Graph. 2018, 25, 2093–2101. [Google Scholar] [CrossRef] [Green Version]
- Tome, D.; Peluse, P.; Agapito, L.; Badino, H. XR-EgoPose: Egocentric 3D Human Pose from an HMD Camera. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7727–7737. [Google Scholar] [CrossRef] [Green Version]
- Tome, D.; Alldieck, T.; Peluse, P.; Pons-Moll, G.; Agapito, L.; Badino, H.; De la Torre, F. SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera. IEEE Trans. Pattern. Anal. Mach. Intell. 2020, 45, 6794–6806. [Google Scholar] [CrossRef] [PubMed]
- Wang, J.; Tan, S.; Zhen, X.; Xu, S.; Zheng, F.; He, Z.; Shao, L. Deep 3D Human Pose Estimation: A Review. Comput. Vis. Image Underst. 2021, 210, 103225. [Google Scholar] [CrossRef]
- Ye, G.; Liu, Y.; Hasler, N.; Ji, X.; Dai, Q.; Theobalt, C. Performance Capture of Interacting Characters with Handheld Kinects. In Proceedings of the ECCV 2012: Computer Vision—ECCV 2012, Florence, Italy, 7–13 October 2012; Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer: Berlin/Heidelberg, Germany, 2012; Volume 7573, pp. 828–841. [Google Scholar] [CrossRef]
- Bashirov, R.; Ianina, A.; Iskakov, K.; Kononenko, Y.; Strizhkova, V.; Lempitsky, V.; Vakhitov, A. Real-Time RGBD-Based Extended Body Pose Estimation. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision, WACV, Waikoloa, HI, USA, 3–8 January 2021; pp. 2806–2815. [Google Scholar] [CrossRef]
- Zhang, H.B.; Zhang, Y.X.; Zhong, B.; Lei, Q.; Yang, L.; Du, J.X.; Chen, D.S. A Comprehensive Survey of Vision-Based Human Action Recognition Methods. Sensors 2019, 19, 1005. [Google Scholar] [CrossRef] [Green Version]
- Wang, J.; Liu, L.; Xu, W.; Sarkar, K.; Theobalt, C. Estimating Egocentric 3D Human Pose in Global Space. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 11480–11489. [Google Scholar] [CrossRef]
- Chen, L.; Ai, H.; Chen, R.; Zhuang, Z.; Liu, S. Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3276–3285. [Google Scholar] [CrossRef]
- Gholami, M.; Rezaei, A.; Rhodin, H.; Ward, R.; Wang, Z.J. TriPose: A Weakly-Supervised 3D Human Pose Estimation via Triangulation from Video. arXiv 2021, arXiv:2105.06599. [Google Scholar]
- Ragaglia, M.; Zanchettin, A.M.; Rocco, P. Trajectory Generation Algorithm for Safe Human-Robot Collaboration Based on Multiple Depth Sensor Measurements. Mechatronics 2018, 55, 267–281. [Google Scholar] [CrossRef]
- Saito, A.; Kizawa, S.; Kobayashi, Y.; Miyawaki, K. Pose Estimation by Extended Kalman Filter Using Noise Covariance Matrices Based on Sensor Output. ROBOMECH J. 2020, 7, 36. [Google Scholar] [CrossRef]
- Negin, F.; Koperski, M.; Crispim, C.F.; Bremond, F.; Coşar, S.; Avgerinakis, K. A hybrid framework for online recognition of activities of daily living in real-world settings. In Proceedings of the 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Colorado Springs, CO, USA, 23–26 August 2016; pp. 37–43. [Google Scholar] [CrossRef]
- Zhang, Z.; Liu, Y.; Li, A.; Wang, M. A novel method for user-defined human posture recognition using Kinect. In Proceedings of the IEEE 7th International Congress on Image and Signal Processing, Dalian, China, 14–16 October 2014; pp. 736–740. [Google Scholar]
- Jalal, A.; Kim, Y.H.; Kim, Y.J.; Kamal, S.; Kim, D. Robust human activity recognition from depth video using spatiotemporal multi-fused features. Pattern Recognit. 2017, 61, 295–308. [Google Scholar] [CrossRef]
- Chang, J.; Wang, L.; Meng, G.; Xiang, S.; Pan, C. Vision-based occlusion handling and vehicle classification for traffic surveillance systems. IEEE Intell. Transp. Syst. Mag. 2018, 10, 80–92. [Google Scholar] [CrossRef]
- Dang, L.M.; Min, K.; Wang, H.; Piran, M.J.; Lee, C.H.; Moon, H. Sensor-based and vision-based human activity recognition: A comprehensive survey. Pattern Recognit. 2020, 108, 107561. [Google Scholar] [CrossRef]
- Li, X.; Fan, Z.; Liu, Y.; Li, Y.; Dai, Q. 3D Pose Detection of Closely Interactive Humans Using Multi-View Cameras. Sensors 2019, 19, 2831. [Google Scholar] [CrossRef] [Green Version]
- Caron, F.; Davy, M.; Duflos, E.; Vanheeghe, P. Particle Filtering for Multisensor Data Fusion with Switching Observation Models: Application to Land Vehicle Positioning. IEEE Trans. Signal Process. 2007, 55, 2703–2719. [Google Scholar] [CrossRef]
- Andrieu, C.; Davy, M.; Doucet, A. Efficient Particle Filtering for Jump Markov Systems. Application to Time-Varying Autoregressions. IEEE Trans. Signal Process. 2003, 51, 1762–1770. [Google Scholar] [CrossRef] [Green Version]
- Doucet, A.; Freitas, N.; Gordon, N. (Eds.) Sequential Monte Carlo Methods in Practice; Springer: Berlin/Heidelberg, Germany, 2001. [Google Scholar] [CrossRef]
- Inoue, M.; Inoue, S.; Nishida, T. Deep Recurrent Neural Network for Mobile Human Activity Recognition with High Throughput. Artif. Life Robot 2018, 23, 173–185. [Google Scholar] [CrossRef] [Green Version]
- Zhu, W.; Lan, C.; Xing, J.; Zeng, W.; Li, Y.; Shen, L.; Xie, X. Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, AAAI, Phoenix, AZ, USA, 12–17 February 2016; pp. 3697–3703. [Google Scholar] [CrossRef]
- Núñez, J.C.; Cabido, R.; Pantrigo, J.J.; Montemayor, A.S.; Vélez, J.F. Convolutional Neural Networks and Long Short-Term Memory for Skeleton-Based Human Activity and Hand Gesture Recognition. Pattern Recognit. 2018, 76, 80–94. [Google Scholar] [CrossRef]
- Kim, K.; Cho, Y.K. Automatic Recognition of Workers’ Motions in Highway Construction by Using Motion Sensors and Long Short-Term Memory Networks. J. Constr. Eng. Manag. 2020, 147, 04020184. [Google Scholar] [CrossRef]
- Shang, X.; Song, M.; Wang, Y.; Yu, C.; Yu, H.; Li, F.; Chang, C.I. Target-Constrained Interference-Minimized Band Selection for Hyperspectral Target Detection. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6044–6064. [Google Scholar] [CrossRef]
- Wang, P.; Wang, L.; Leung, H.; Zhang, G. Super-Resolution Mapping Based on Spatial-Spectral Correlation for Spectral Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 59, 2256–2268. [Google Scholar] [CrossRef]
- Lugaresi, C.; Tang, J.; Nash, H.; Mcclanahan, C.; Uboweja, E.; Hays, M.; Zhang, F.; Chang, C.-L.; Yong, M.G.; Lee, J.; et al. MediaPipe: A Framework for Building Perception Pipelines. arXiv 2019, arXiv:1906.08172. [Google Scholar]
- Rossi, L.; Paolanti, M.; Pierdicca, R.; Frontoni, E. Human trajectory prediction and generation using LSTM models and GANs. Pattern Recognit. 2021, 120, 108136. [Google Scholar] [CrossRef]
- Kim, K.; Cho, Y.K. Effective inertial sensor quantity and locations on a body for deep learning-based worker’s motion recognition. Autom. Constr. 2020, 113, 103126. [Google Scholar] [CrossRef]
- Roberts, D.; Calderon, W.T.; Tang, S.; Golparvar-Fard, M. Vision-Based Construction Worker Activity Analysis Informed by Body Posture. J. Comput. Civ. Eng. 2020, 34, 04020017. [Google Scholar] [CrossRef]
- Akhavian, R.; Behzadan, A.H. Smartphone-Based Construction Workers’ Activity Recognition and Classification. Autom. Constr. 2016, 71, 198–209. [Google Scholar] [CrossRef]
- Sam, M.; Franz, B.; Sey-Taylor, E.; McCarty, C. Evaluating the perception of human-robot collaboration among construction project managers. In Proceedings of the Construction Research Congress 2022, Arlington, VA, USA, 9–12 March 2022; pp. 550–559. [Google Scholar]
- Chea, C.P.; Bai, Y.; Pan, X.; Arashpour, M.; Xie, Y. An integrated review of automation and robotic technologies for structural prefabrication and construction. Transp. Saf. Environ. 2020, 2, 81–96. [Google Scholar] [CrossRef]
- Pransky, J. The Pransky interview: Dr. Tessa Lau, Founder and CEO of Dusty Robotics. Ind. Robot Int. J. Robot. Res. Appl. 2020, 47, 643–646. [Google Scholar] [CrossRef]
- Cardno, C.A. Robotic rebar-tying system uses artificial intelligence. Civ. Eng. Mag. Arch. 2018, 88, 38–39. [Google Scholar] [CrossRef]
- Eternal Robotics. MYRO. Available online: www.eternalrobotics.com/solutions/#SMyro (accessed on 30 June 2023).
- Putra, P.U.; Shima, K.; Shimatani, K. Markerless Human Activity Recognition Method Based on Deep Neural Network Model Using Multiple Cameras. In Proceedings of the 5th International Conference on Control, Decision and Information Technologies, CoDIT, Thessaloniki, Greece, 10–13 April 2018; pp. 13–18. [Google Scholar] [CrossRef]
- Siddiqi, M.H.; Almashfi, N.; Ali, A.; Alruwaili, M.; Alhwaiti, Y.; Alanazi, S.; Kamruzzaman, M.M. A Unified Approach for Patient Activity Recognition in Healthcare Using Depth Camera. IEEE Access 2021, 9, 92300–92317. [Google Scholar] [CrossRef]
- Agarwal, P.; Alam, M. A Lightweight Deep Learning Model for Human Activity Recognition on Edge Devices. Procedia Comput. Sci. 2020, 167, 2364–2373. [Google Scholar] [CrossRef]
- Antwi-Afari, M.F.; Qarout, Y.; Herzallah, R.; Anwer, S.; Umer, W.; Zhang, Y.; Manu, P. Deep learning-based networks for automated recognition and classification of awkward working postures in construction using wearable insole sensor data. Autom. Constr. 2022, 136, 104181. [Google Scholar] [CrossRef]
Manufacturer | Construction Robot | Functions | Collaborative Human Workers’ Behaviors | Human Workers’ Activities |
---|---|---|---|---|
Construction Robotics | Semi-Automated Masonry System (SAM) [49] | Lifting each brick, applying mortar, and setting it in place | Accurate placement of the bricks, cleaning up excess mortar, and overseeing the overall project | Lifting/Moving/Standing/Walking
Construction Robotics | Material Unit Lift Enhancer (MULE) | Lifting and placing heavy materials | Ensuring the accurate placement of the materials | Lifting/Moving
Fastbrick Robotics | Hadrian X [50] | Bricklaying | Accurate placement of the bricks, cleaning up excess mortar | Lifting/Moving
Dusty Robotics | FieldPrinter/Theometrics | Layout and measurement tasks | Measuring with tape to ensure accuracy | Measuring
Doxel | Doxel AI [51] | Monitoring job progress | Supervision | Standing/Walking
Advanced Construction Robotics, Inc. | TyBot [52] | Rebar tying | Supervision | Standing/Walking/Sitting/Kneeling
Eternal Robotics | Myro [53] | Wall painting | Supervision | Standing/Walking/Climbing ladder/Working overhead
| Actual Class: Positive | Actual Class: Negative |
---|---|---|
Predicted Class: Positive | True positive (TP) | False positive (FP) |
Predicted Class: Negative | False negative (FN) | True negative (TN) |
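From these confusion-matrix counts, the standard per-class metrics (reported as P, R, and F in the results table) follow directly. The example counts below are made up for illustration:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1-score from confusion-matrix counts.

    precision = TP / (TP + FP): fraction of positive predictions that are correct.
    recall    = TP / (TP + FN): fraction of actual positives that are found.
    F1 is the harmonic mean of precision and recall.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 9 correct detections, 1 false alarm, 3 misses
p, r, f = precision_recall_f1(tp=9, fp=1, fn=3)  # p = 0.9, r = 0.75
```

In a multi-class setting such as the ten activities here, these metrics are computed one class at a time by treating that class as "positive" and all others as "negative".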
Activity | Single Camera (P) | Single Camera (R) | Single Camera (F) | Two Cameras (P) | Two Cameras (R) | Two Cameras (F) | Three Cameras (P) | Three Cameras (R) | Three Cameras (F) | Four Cameras (P) | Four Cameras (R) | Four Cameras (F)
---|---|---|---|---|---|---|---|---|---|---|---|---
Climbing ladder | 0.81 | 0.71 | 0.76 | 0.82 | 0.90 | 0.86 | 0.84 | 0.99 | 0.91 | 0.85 | 0.94 | 0.89 |
Hammering | 1.00 | 0.64 | 0.78 | 1.00 | 0.70 | 0.82 | 1.00 | 0.91 | 0.95 | 1.00 | 1.00 | 1.00 |
Kneeling | 0.63 | 0.67 | 0.65 | 0.95 | 0.98 | 0.97 | 0.95 | 0.98 | 0.97 | 0.96 | 1.00 | 0.98 |
Lifting object | 0.66 | 0.78 | 0.72 | 0.97 | 0.92 | 0.95 | 0.98 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 |
Measuring | 0.82 | 0.99 | 0.90 | 0.81 | 1.00 | 0.89 | 0.93 | 0.94 | 0.94 | 1.00 | 0.88 | 0.94 |
Moving object | 0.95 | 0.92 | 0.93 | 0.97 | 0.97 | 0.97 | 0.98 | 0.98 | 0.98 | 1.00 | 1.00 | 1.00 |
Sitting | 0.72 | 0.84 | 0.77 | 1.00 | 0.83 | 0.91 | 0.96 | 1.00 | 0.98 | 0.98 | 0.98 | 0.98 |
Standing | 0.84 | 0.68 | 0.75 | 0.81 | 0.99 | 0.89 | 0.99 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 |
Walking | 0.89 | 0.86 | 0.88 | 0.87 | 0.90 | 0.89 | 0.99 | 0.95 | 0.97 | 0.97 | 0.97 | 0.97 |
Working Overhead | 0.99 | 0.95 | 0.97 | 0.98 | 0.87 | 0.92 | 0.99 | 0.93 | 0.96 | 0.99 | 0.99 | 0.99 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jang, Y.; Jeong, I.; Younesi Heravi, M.; Sarkar, S.; Shin, H.; Ahn, Y. Multi-Camera-Based Human Activity Recognition for Human–Robot Collaboration in Construction. Sensors 2023, 23, 6997. https://doi.org/10.3390/s23156997