Computer Vision for Collaborative Robots in Industry 5.0: A Survey of Techniques, Gaps, and Future Directions †
Abstract
1. Introduction
2. Methodology
3. Computer Vision Techniques and Architectures
3.1. Classical Computer Vision Methods
3.2. Machine Learning-Based Methods
3.3. Deep Learning-Based Methods
4. Software Frameworks and Implementation Platforms
5. AI-Driven Computer Vision Tasks and Applications in Collaborative Robotics
6. Current Challenges and Research Gaps
7. Future Directions
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Aheleroff, S.; Huang, H.; Xu, X.; Zhong, R.Y. Toward sustainability and resilience with Industry 4.0 and Industry 5.0. Front. Manuf. Technol. 2022, 2, 951643.
2. Langås, E.F.; Zafar, M.H.; Sanfilippo, F. Exploring the synergy of human-robot teaming, digital twins, and machine learning in Industry 5.0: A step towards sustainable manufacturing. J. Intell. Manuf. 2025, 37, 999–1022.
3. Shah, R.; Doss, A.S.A.; Lakshmaiya, N. Advancements in AI-Enhanced Collaborative Robotics: Towards Safer, Smarter, and Human-Centric Industrial Automation. Results Eng. 2025, 27, 105704.
4. Puttero, S.; Verna, E.; Genta, G.; Galetto, M. Collaborative robots for quality control: An overview of recent studies and emerging trends. J. Intell. Manuf. 2025.
5. Cohen, Y.; Biton, A.; Shoval, S. Fusion of Computer Vision and AI in Collaborative Robotics: A Review and Future Prospects. Appl. Sci. 2025, 15, 7905.
6. Patil, S.; Vasu, V.; Srinadh, K.V.S. Advances and perspectives in collaborative robotics: A review of key technologies and emerging trends. Discov. Mech. Eng. 2023, 2, 13.
7. Rahman, M.M.; Khatun, F.; Jahan, I.; Devnath, R.; Bhuiyan, M.A.A. Cobotics: The Evolving Roles and Prospects of Next-Generation Collaborative Robots in Industry 5.0. J. Robot. 2024, 2024, 2918089.
8. Robinson, N.; Tidd, B.; Campbell, D.; Kulić, D.; Corke, P. Robotic Vision for Human-Robot Interaction and Collaboration: A Survey and Systematic Review. ACM Trans. Hum.-Robot Interact. 2022, 12, 12.
9. Borboni, A.; Reddy, K.V.V.; Elamvazuthi, I.; Al-Quraishi, M.S.; Natarajan, E.; Ali, S.S.A. The Expanding Role of Artificial Intelligence in Collaborative Robots for Industrial Applications: A Systematic Review of Recent Works. Machines 2023, 11, 111.
10. Santos, A.A.; Schreurs, C.; da Silva, A.F.; Pereira, F.; Felgueiras, C.; Lopes, A.M.; Machado, J. Integration of Artificial Vision and Image Processing into a Pick and Place Collaborative Robotic System. J. Intell. Robot. Syst. 2024, 110, 159.
11. Gaboardi, C. Use of the Wavelet Transform for Digital Terrain Model Edge Detection (Special Issue—Wavelet Analysis). J. Appl. Math. Phys. 2018, 6, 1997–2005.
12. Canny, J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 679–698.
13. Agrawal, H.; Desai, K. Canny Edge Detection: A Comprehensive Review. Int. J. Technol. Res. Sci. 2024, 9, 27–35.
14. Kumar, A. SURF feature descriptor for image analysis. Imaging Radiat. Res. 2024, 6, 5643.
15. Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded Up Robust Features. In Computer Vision—ECCV 2006; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2006; Volume 3951, pp. 404–417.
16. Karami, E.; Shehata, M.; Smith, A. Image Identification Using SIFT Algorithm: Performance Analysis against Different Image Deformations. arXiv 2022, arXiv:1710.02728.
17. Otero, I.R.; Delbracio, M. Anatomy of the SIFT Method. Image Process. Line 2014, 4, 370–396.
18. Mangat, A.S.; Mangler, J.; Rinderle-Ma, S. Interactive Process Automation based on lightweight object detection in manufacturing processes. Comput. Ind. 2021, 130, 103482.
19. Molaei, A.; Kolu, A.; Lahtinen, K.; Geimer, M. Automatic recognition of excavator working cycles using supervised learning and motion data obtained from inertial measurement units (IMUs). Constr. Robot. 2024, 8, 14.
20. Hussain, S.; Saeed, K.; Baimagambetov, A.; Rab, S.; Saad, M. Advancements in Gesture Recognition Techniques and Machine Learning for Enhanced Human-Robot Interaction: A Comprehensive Review. arXiv 2024, arXiv:2409.06503.
21. Roitberg, A.; Perzylo, A.; Somani, N.; Giuliani, M.; Rickert, M.; Knoll, A. Human activity recognition in the context of industrial human-robot interaction. In Proceedings of the 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA); IEEE: Piscataway, NJ, USA, 2014; pp. 1–10.
22. Jiang, Y.; Leung, F. Gaussian Mixture Model and Gaussian Supervector for Image Classification. In Proceedings of the 2018 IEEE 23rd International Conference on Digital Signal Processing (ICDSP); IEEE: Piscataway, NJ, USA, 2018; pp. 1–5.
23. Chernova, S.; Veloso, M. Confidence-based policy learning from demonstration using Gaussian mixture models. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS); ACM: New York, NY, USA, 2007; pp. 1–8.
24. Singh, A.K. On effective human robot interaction based on recognition and association. arXiv 2018, arXiv:1812.07100.
25. Qi, J.; Ma, L.; Cui, Z.; Yu, Y. Computer vision-based hand gesture recognition for human-robot interaction: A review. Complex Intell. Syst. 2023, 10, 1581–1606.
26. Hong, Y.; Yang, Y.; Park, J. Linear Discriminant Analysis-Based Motion Classification Using Distributed Micro-Doppler Radars with Limited Backhaul. Sensors 2021, 21, 2924.
27. Shahin, M.; Chen, F.F.; Hosseinzadeh, A.; Zand, N. Using machine learning and deep learning algorithms for downtime minimization in manufacturing systems: An early failure detection diagnostic service. Int. J. Adv. Manuf. Technol. 2023, 128, 3857–3883.
28. Li, H.; Jia, M.; Mao, Z. Dynamic Feature Extraction-Based Quadratic Discriminant Analysis for Industrial Process Fault Classification and Diagnosis. Entropy 2023, 25, 1664.
29. Ashfaq, T.; Khurshid, K. Classification of Hand Gestures Using Gabor Filter with Bayesian and Naïve Bayes Classifier. Int. J. Adv. Comput. Sci. Appl. 2016, 7.
30. Trovato, G.; Chrupała, G.; Takanishi, A. Application of the Naive Bayes Classifier for Representation and Use of Heterogeneous and Incomplete Knowledge in Social Robotics. Robotics 2016, 5, 6.
31. Escalante, H.J.; Morales, E.F.; Sucar, L.E. A naïve Bayes baseline for early gesture recognition. Pattern Recognit. Lett. 2016, 73, 91–99.
32. Zhuang, C.; Li, S.; Ding, H. Instance segmentation based 6D pose estimation of industrial objects using point clouds for robotic bin-picking. Robot. Comput.-Integr. Manuf. 2023, 82, 102541.
33. Saleem, Z.; Gustafsson, F.; Furey, E.; McAfee, M.; Huq, S. A review of external sensors for human detection in a human robot collaborative environment. J. Intell. Manuf. 2024, 36, 2255–2279.
34. Wang, S.; Zhang, J.; Wang, P.; Law, J.; Călinescu, R.; Mihaylova, L. A deep learning-enhanced Digital Twin framework for improving safety and reliability in human–robot collaborative manufacturing. Robot. Comput.-Integr. Manuf. 2023, 85, 102608.
35. Amorim, A.; Guimarães, D.; Mendonça, T.; Neto, P.; Costa, P.; Moreira, A.P. Robust human position estimation in cooperative robotic cells. Robot. Comput.-Integr. Manuf. 2020, 67, 102035.
36. Bellotto, N.; Hu, H. Computationally efficient solutions for tracking people with a mobile robot: An experimental evaluation of Bayesian filters. Auton. Robot. 2009, 28, 425–438.
37. Islam, M.J.; Hong, J.; Sattar, J. Person-following by autonomous robots: A categorical overview. Int. J. Robot. Res. 2019, 38, 1581–1618.
38. Sridharan, M.; Meadows, B. Towards a Theory of Explanations for Human–Robot Collaboration. KI Künstliche Intell. 2019, 33, 331–342.
39. Rožanec, J.M.; Montini, E.; Cutrona, V.; Papamartzivanos, D.; Klemenčič, T.; Fortuna, B.; Mladenić, D.; Veliou, E.; Giannetsos, T.; Emmanouilidis, C. Human in the AI Loop via xAI and Active Learning for Visual Inspection. In Explainable Artificial Intelligence for Industry 5.0; Springer: Berlin/Heidelberg, Germany, 2023; pp. 381–406.
40. Terras, N.; Pereira, F.; Silva, A.R.; Santos, A.A.; Lopes, A.M.; da Silva, A.F.; Cartal, L.A.; Apostolescu, T.C.; Badea, F.; Machado, J. Integration of Deep Learning Vision Systems in Collaborative Robotics for Real-Time Applications. Appl. Sci. 2025, 15, 1336.
41. Govi, E.; Sapienza, D.; Toscani, S.; Cotti, I.; Franchini, G.; Bertogna, M. Addressing challenges in industrial pick and place: A deep learning-based 6 Degrees-of-Freedom pose estimation solution. Comput. Ind. 2024, 161, 104130.
42. Tavakoli, H.; Suh, S.; Walunj, S.; Pahlevannejad, P.; Plociennik, C.; Ruskowski, M. Object Detection for Human–Robot Interaction and Worker Assistance Systems. In Explainable Artificial Intelligence for Industry 5.0; Springer: Berlin/Heidelberg, Germany, 2023; pp. 319–332.
43. Patalas-Maliszewska, J.; Łosyk, H.; Dudek, A. Improving safety in human–robot collaboration towards sustainable production in Industry 5.0. J. Intell. Manuf. 2025.
44. Maham, A.; Tashfa, D.E.N. Deep Learning Perspective of Scene Understanding in Autonomous Robots. arXiv 2025, arXiv:2512.14020.
45. Munasinghe, C.; Amin, F.M.; Scaramuzza, D.; van de Venn, H.W. COVERED, CollabOratiVE Robot Environment Dataset for 3D Semantic segmentation. In Proceedings of the 2022 IEEE 27th International Conference on Emerging Technologies and Factory Automation (ETFA), Stuttgart, Germany, 6–9 September 2022; pp. 1–4.
46. Monsone, C.R.; Csapó, Á. Instance Segmentation in Industry 5.0 Applications Based on the Automated Generation of Point Clouds. Acta Polytech. Hung. 2025, 22, 25–46.
47. Ma, Z.; Jiao, W.; Li, L.; Yang, S.; Xu, X. Application of Keypoint Recognition for Industrial Human-Robot Safe Collaboration Scenarios. In Proceedings of the 2024 IEEE International Symposium on Assembly and Manufacturing (ISAM); IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
48. Forlini, M.; Neri, F.; Ciccarelli, M.; Palmieri, G.; Callegari, M. Experimental implementation of skeleton tracking for collision avoidance in collaborative robotics. Int. J. Adv. Manuf. Technol. 2024, 134, 57–73.
49. Svarny, P.; Tesar, M.; Behrens, J.K.; Hoffmann, M. Safe physical HRI: Toward a unified treatment of speed and separation monitoring together with power and force limiting. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2019; pp. 7580–7587.
50. Amaya-Mejía, L.M.; Duque-Suárez, N.; Jaramillo-Ramírez, D.; Martínez, C. Vision-Based Safety System for Barrierless Human-Robot Collaboration. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 7331–7336.
51. Liang, G.; Chen, F.; Liang, Y.; Feng, Y.; Wang, C.; Wu, X. A Manufacturing-Oriented Intelligent Vision System Based on Deep Neural Network for Object Recognition and 6D Pose Estimation. Front. Neurorobot. 2021, 14, 616775.
52. Sanghai, N.; Brown, N. Advances in Transformers for Robotic Applications: A Review. arXiv 2024, arXiv:2412.10599.
53. Xia, W.; Zheng, H.; Xu, W.; Xu, X. Large vision-language models enabled novel objects 6D pose estimation for human-robot collaboration. Robot. Comput.-Integr. Manuf. 2025, 95, 103030.
54. Zhang, X.; Tian, S.; Liang, X.; Zheng, M.; Behdad, S. Early Prediction of Human Intention for Human–Robot Collaboration Using Transformer Network. J. Comput. Inf. Sci. Eng. 2024, 24, 051003.
55. Laplaza, J.; Moreno, F.; Sanfeliu, A. Enhancing Robotic Collaborative Tasks Through Contextual Human Motion Prediction and Intention Inference. Int. J. Soc. Robot. 2024, 17, 2077–2096.
56. Fung, A.; Benhabib, B.; Nejat, G. LDTrack: Dynamic People Tracking by Service Robots Using Diffusion Models. Int. J. Comput. Vis. 2025, 133, 3392–3412.
57. Osa, T.; Ikemoto, S. Goal-Conditioned Variational Autoencoder Trajectory Primitives with Continuous and Discrete Latent Codes. SN Comput. Sci. 2020, 1, 303.
58. Deng, F.; Luo, J.; Fu, L.; Huang, Y.; Chen, J.; Li, N.; Zhong, J.; Lam, T.L. DG2GAN: Improving defect recognition performance with generated defect image sample. Sci. Rep. 2024, 14, 14787.
59. Hong, J.; Fulton, M.; Sattar, J. A Generative Approach Towards Improved Robotic Detection of Marine Litter. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA); IEEE: Piscataway, NJ, USA, 2020; pp. 10525–10531.
60. Hafez, M.B.; Wermter, S. Behavior Self-Organization Supports Task Inference for Continual Robot Learning. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 6739–6746.
61. Hsiao, F.I.; Kuo, J.H.; Sun, M. Learning a Multi-Modal Policy via Imitating Demonstrations with Mixed Behaviors. arXiv 2022, arXiv:1903.10304.
62. Mao, W.; Liu, M.; Salzmann, M. Generating Smooth Pose Sequences for Diverse Human Motion Prediction. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2021; p. 13289.
63. Ling, H.Y.; Zinno, F.; Cheng, G.G.; van de Panne, M. Character controllers using motion VAEs. ACM Trans. Graph. 2020, 39, 40.
64. Aliakbarian, S.; Saleh, F.S.; Petersson, L.; Gould, S.J.; Salzmann, M. Contextually Plausible and Diverse 3D Human Motion Prediction. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2021; p. 11313.
65. Li, J.; Villegas, R.; Ceylan, D.; Yang, J.; Kuang, Z.; Li, H.; Zhao, Y. Task-Generic Hierarchical Human Motion Prior using VAEs. In Proceedings of the 2021 International Conference on 3D Vision (3DV); IEEE: Piscataway, NJ, USA, 2021; pp. 771–781.
66. Riba, E.; Mishkin, D.; Ponsa, D.; Rublee, E.; Bradski, G. Kornia: An Open Source Differentiable Computer Vision Library for PyTorch. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV); IEEE: Piscataway, NJ, USA, 2020; pp. 3663–3672.
67. Wang, Z.; Liu, K.; Li, J.; Zhu, Y.; Zhang, Y. Various Frameworks and Libraries of Machine Learning and Deep Learning: A Survey. Arch. Comput. Methods Eng. 2019, 31, 1–24.
68. Bonci, A.; Gaudeni, F.; Giannini, M.; Longhi, S. Robot Operating System 2 (ROS2)-Based Frameworks for Increasing Robot Autonomy: A Survey. Appl. Sci. 2023, 13, 12796.
69. de Melo, M.S.P.; da Silva Neto, J.G.; da Silva, P.J.L.; Teixeira, J.M.; Teichrieb, V. Analysis and Comparison of Robotics 3D Simulators. In Proceedings of the 2019 21st Symposium on Virtual and Augmented Reality (SVR); IEEE: Piscataway, NJ, USA, 2019; pp. 242–251.
70. Kim, D.; Kwon, J.I.; Kim, Y.; Kim, D.; Choi, C. AI-native robotic vision systems enabled by in-sensor computing. npj Unconv. Comput. 2026, 3, 2.
71. Erős, E.; Dahl, M.; Bengtsson, K.; Hanna, A.; Falkman, P. A ROS2 based communication architecture for control in collaborative and intelligent automation systems. Procedia Manuf. 2019, 38, 349–357.
72. Zhang, J.; Keramat, F.; Yu, X.; Hernández, D.M.; Queralta, J.P.; Westerlund, T. Distributed Robotic Systems in the Edge-Cloud Continuum with ROS 2: A Review on Novel Architectures and Technology Readiness. In Proceedings of the 2022 Seventh International Conference on Fog and Mobile Edge Computing (FMEC); IEEE: Piscataway, NJ, USA, 2022; pp. 1–8.
73. Bonci, A.; Cheng, P.D.C.; Indri, M.; Nabissi, G.; Sibona, F. Human-Robot Perception in Industrial Environments: A Survey. Sensors 2021, 21, 1571.
74. Alenjareghi, M.J.; Keivanpour, S.; Chinniah, Y.; Jocelyn, S. Computer vision-enabled real-time job hazard analysis for safe human–robot collaboration in disassembly tasks. J. Intell. Manuf. 2024, 36, 5563–5591.
75. Liu, S.; Zhang, J.; Wang, L.; Gao, R.X. Vision AI-based human-robot collaborative assembly driven by autonomous robots. CIRP Ann. 2024, 73, 13–16.
76. Shahria, M.T.; Sunny, M.S.H.; Zarif, M.I.I.; Ghommam, J.; Ahamed, S.I.; Rahman, M.H. A Comprehensive Review of Vision-Based Robotic Applications: Current State, Components, Approaches, Barriers, and Potential Solutions. Robotics 2022, 11, 139.
77. Harada, K.; Wan, W.; Tsuji, T.; Kikuchi, K.; Nagata, K.; Onda, H. Experiments on Learning Based Industrial Bin-picking with Iterative Visual Recognition. arXiv 2018, arXiv:1805.08449.
78. Villalonga, A.; Cruz, Y.J.; Alfaro, D.D.; Haber, R.E.; Lastra, J.L.M.; Castaño, F. Enhancing Quality Inspection in Zero-Defect Manufacturing Through Robotic-Machine Collaboration. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
79. Lema, D.G.; Sánchez-González, L.; Usamentiaga, R.; de la Calle, F.J. Benchmarking Deep Learning Models for Surface Defect Detection: A Reproducible and Statistically-Rigorous Approach. J. Intell. Manuf. 2025.
80. Koskinopoulou, M.; Raptopoulos, F.; Papadopoulos, G.; Maniadakis, M.; Partsinevelos, P. Robotic Waste Sorting Technology: Toward a Vision-Based Categorization System for the Industrial Robotic Separation of Recyclable Waste. IEEE Robot. Autom. Mag. 2021, 28, 50–60.
81. Vukićević, A.M.; Petrović, M.; Jurišević, N.; Djapan, M.; Knezevic, N.; Novakovic, A.; Jovanovic, K. Versatile Waste Sorting in Small Batch and Flexible Manufacturing Industries Using Deep Learning Techniques. Sci. Rep. 2025, 15, 3756.
82. Jabrane, K.; Bousmah, M. A New Approach for Training Cobots from Small Amount of Data in Industry 5.0. Int. J. Adv. Comput. Sci. Appl. 2021, 12.
83. ISO/TS 15066; Robots and Robotic Devices—Collaborative Robots. International Organization for Standardization: Geneva, Switzerland, 2016. Available online: https://www.iso.org/standard/62996.html (accessed on 11 March 2026).
84. Weber, T.; Wermter, S. Integrating Intrinsic and Extrinsic Explainability: The Relevance of Understanding Neural Networks for Human-Robot Interaction. arXiv 2020, arXiv:2010.04602.
85. Ambsdorf, J.; Munir, A.; Wei, Y.; Degkwitz, K.; Harms, H.M.; Stannek, S.; Ahrens, K.; Becker, D.; Strahl, E.; Weber, T.; et al. Explain yourself! Effects of Explanations in Human-Robot Interaction. In Proceedings of the 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Napoli, Italy, 29 August–2 September 2022; pp. 393–400.
86. Callari, T.C.; Segate, R.V.; Hubbard, E.; Daly, A.; Lohse, N. An Ethical Framework for Human-Robot Collaboration for the Future People-Centric Manufacturing: A Collaborative Endeavour with European subject-matter experts in Ethics. Technol. Soc. 2024, 78, 102680.
87. Đorđević, M.; Albonico, M.; Lewis, G.A.; Malavolta, I.; Lago, P. Computation offloading for ground robotic systems communicating over WiFi—An empirical exploration on performance and energy trade-offs. Empir. Softw. Eng. 2023, 28, 140.
88. Neuman, S.M.; Plancher, B.; Duisterhof, B.P.; Krishnan, S.; Banbury, C.; Mazumder, M.; Prakash, S.; Jabbour, J.; Faust, A.; de Croon, G.; et al. Tiny Robot Learning: Challenges and Directions for Machine Learning in Resource-Constrained Robots. In Proceedings of the 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS); IEEE: Piscataway, NJ, USA, 2022; pp. 296–299.
89. Park, J.; Kim, P.; Ko, D. Real-time open-vocabulary perception for mobile robots on edge devices: A systematic analysis of the accuracy-latency trade-off. Front. Robot. AI 2025, 12, 1693988.
90. Triess, S.C.; Leitritz, T.; Jauch, C. Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation. In Proceedings of the 2024 32nd European Signal Processing Conference (EUSIPCO); IEEE: Piscataway, NJ, USA, 2024; pp. 471–475.
91. Moon, S.; Kim, M.; Qin, Z.; Liu, Y.; Kim, D. Anonymization for Skeleton Action Recognition. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Palo Alto, CA, USA, 2023; Volume 37, pp. 15028–15036.
92. Cramariuc, A.; Petrov, A.; Suri, R.; Mittal, M.; Siegwart, R.; Cadena, C. Learning Camera Miscalibration Detection. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA); IEEE: Piscataway, NJ, USA, 2020; pp. 4997–5003.
93. Qiao, G.; Li, G. Auto-Calibration for Vision-Based 6-D Sensing System to Support Monitoring and Health Management for Industrial Robots. In Proceedings of the ASME 2021 16th International Manufacturing Science and Engineering Conference. Volume 2: Manufacturing Processes; Manufacturing Systems; Nano/Micro/Meso Manufacturing; Quality and Reliability, Online, 21–25 June 2021.
94. Zhao, W.; Gangaraju, K.; Yuan, F. Multimodal perception-driven decision-making for human-robot interaction: A survey. Front. Robot. AI 2025, 12, 1604472.
95. Xue, T.; Wang, W.; Ma, J.; Liu, W.; Pan, Z.; Han, M. Progress and Prospects of Multimodal Fusion Methods in Physical Human–Robot Interaction: A Review. IEEE Sens. J. 2020, 20, 10355–10370.
96. Martinez-Gil, J.; Pichler, M.; Bountouni, N.; Koussouris, S.; Barreiro, M.M.; Gusmeroli, S. An Agentic Framework for Rapid Deployment of Edge AI Solutions in Industry 5.0. arXiv 2025, arXiv:2510.25813.
97. Shao, X.; Xu, L.; Zheng, T.; Sun, G.; Zhu, Y. Practical Finite-Time Motion Planning for Spacecraft-Mounted Soft Manipulators Under Dynamic Obstacles. IEEE Trans. Aerosp. Electron. Syst. 2026, 62, 2603–2620.
98. Shao, X.; Xu, L.; Sun, G.; Yao, W.; Wu, L.; Santina, C.D. Self-Attention Enhanced Dynamics Learning and Adaptive Fractional-Order Control for Continuum Soft Robots with System Uncertainties. IEEE Trans. Autom. Sci. Eng. 2025, 22, 18694–18708.

| Approach/Method | Typical CV Tasks | Data | Robustness | Real-Time | Deployment | Key Limitations |
|---|---|---|---|---|---|---|
| Classical computer vision approaches | | | | | | |
| Canny, SURF, SIFT, sliding window | Edge detection; localization; obstacle avoidance; feature matching; basic recognition; pose; inspection | Low | Low | High | Low–Med | Sensitive to lighting/occlusion; limited generalization; feature engineering required |
| Machine-learning-based methods | | | | | | |
| SVM, KNN, DT, RF | Classification; gesture/command recognition; fault detection; motion classification; task learning | Med | Med | High | Med | Depends on engineered features; limited end-to-end perception |
| GMM | Image classification; task learning | Med | Med | High | Med | Distribution/feature assumptions |
| LDA/QDA | Dimensionality reduction; classification; fault tasks | Low–Med | Med | High | Med | Separability/distribution assumptions |
| Naive Bayes | Gesture recognition; knowledge representation | Low–Med | Med | High | Med | Feature-independence assumption |
| Unsupervised (k-Means, DBSCAN, EM) | Segmentation; clustering; anomaly detection; bin-picking support | Low labels | – | – | – | Sensitive to feature space and hyperparameters |
| Kalman/Particle Filters | Pose tracking; trajectory prediction | – | Med–High | High | Med | Model mismatch can degrade tracking |
| Deep-learning-based methods | | | | | | |
| CNNs/Detectors (YOLO, SSD, Faster R-CNN) | Detection; classification; inspection; recognition; collision-aware perception | High | High | High | High | Compute-heavy; limited interpretability |
| Segmentation (U-Net, DeepLab, Mask R-CNN) | Pixel-wise segmentation; manipulation; bin picking; digital twins | High | High | High | High | Annotation cost; compute; limited interpretability |
| Pose nets (OpenPose, HRNet, SCC-HRNet) | Keypoints/pose; intention inference; action prediction | High | High | High | High | Occlusion and latency constraints |
| Depth/3D nets | 6D pose; reconstruction; grasping; navigation | High (RGB-D) | High | High | High | Sensor dependence; domain shift; compute |
| ViTs (DETR, Swin) | Long-range context; scene understanding; language-guided tasks | High | High | High | High | Training/data cost; compute |
| RNN/LSTM | Sequential modeling; motion prediction; gesture/anomaly detection | High | High | High | High | Latency/training complexity |
| GAN/VAE | Augmentation; synthetic data; representation learning | Synthetic | Med–High | High | High | Stability and realism validation issues |
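
The Kalman/Particle Filters row above lists pose tracking and trajectory prediction as typical tasks, with model mismatch as the key limitation. A minimal sketch of that idea follows, assuming a constant-velocity motion model for a single coordinate of a tracked keypoint (for example, a hand position along one axis). The class name `ConstantVelocityKF`, the helper `track`, the noise values `q` and `r`, and the synthetic measurements are illustrative assumptions and are not taken from the surveyed works.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKF:
    """Kalman filter for one coordinate with state (position x, velocity v)."""
    x: float = 0.0       # position estimate
    v: float = 0.0       # velocity estimate
    p_xx: float = 1.0    # covariance of position
    p_xv: float = 0.0    # cross-covariance of position and velocity
    p_vv: float = 1.0    # covariance of velocity
    q: float = 1e-4      # process noise added per step (assumed tuning value)
    r: float = 4e-4      # measurement noise variance (assumed tuning value)

    def predict(self, dt: float) -> None:
        # State transition F = [[1, dt], [0, 1]]; covariance P <- F P F^T + q*I.
        self.x += self.v * dt
        p_xx = self.p_xx + 2.0 * dt * self.p_xv + dt * dt * self.p_vv + self.q
        p_xv = self.p_xv + dt * self.p_vv
        p_vv = self.p_vv + self.q
        self.p_xx, self.p_xv, self.p_vv = p_xx, p_xv, p_vv

    def update(self, z: float) -> None:
        # Only position is measured (H = [1, 0]).
        s = self.p_xx + self.r          # innovation variance
        k_x = self.p_xx / s             # Kalman gain for position
        k_v = self.p_xv / s             # Kalman gain for velocity
        innovation = z - self.x
        self.x += k_x * innovation
        self.v += k_v * innovation
        # Covariance P <- (I - K H) P, written out for the 2x2 case.
        p_xx = (1.0 - k_x) * self.p_xx
        p_xv = (1.0 - k_x) * self.p_xv
        p_vv = self.p_vv - k_v * self.p_xv
        self.p_xx, self.p_xv, self.p_vv = p_xx, p_xv, p_vv


def track(measurements, dt):
    """Run predict/update over a sequence of position measurements."""
    kf = ConstantVelocityKF()
    for z in measurements:
        kf.predict(dt)
        kf.update(z)
    return kf


if __name__ == "__main__":
    dt = 0.1  # seconds between frames (assumed)
    # Synthetic target moving at 0.5 m/s, with an alternating +/-2 cm error
    # standing in for keypoint detection noise.
    zs = [0.5 * k * dt + (0.02 if k % 2 else -0.02) for k in range(1, 101)]
    kf = track(zs, dt)
    print(f"position ~ {kf.x:.3f} m, velocity ~ {kf.v:.3f} m/s")
```

If the tracked person accelerates sharply, the constant-velocity assumption no longer holds and the estimate lags behind the true motion, which is the model-mismatch limitation noted in the table.
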

| Category | Framework/Platform | Primary Role | Typical CV Tasks | Strengths | Limitations |
|---|---|---|---|---|---|
| Robot middleware | ROS/ROS2 | Perception–control integration | Image streaming; sensor fusion; vision-to-motion pipelines | Modular; open-source; widely adopted [68,71] | Real-time constraints; setup complexity |
| | Industrial robot SDKs | Robot–vision interfacing | Vision-guided manipulation; calibration-to-actuation integration | Industrial reliability; vendor support | Vendor lock-in; limited flexibility |
| Vision libraries | OpenCV | Classical & hybrid vision | Detection; tracking; calibration; preprocessing | Lightweight; real-time capable [66] | Limited native deep learning support |
| | PCL | 3D perception | Point-cloud processing; workspace modeling; registration | Strong 3D tooling; mature ecosystem | Computationally intensive |
| Deep learning frameworks | PyTorch | Model development & training | CNNs; Transformers; multimodal learning | Flexible; research-friendly | Requires deployment optimization |
| | TensorFlow/TF Lite | Training & deployment | Industrial inspection; edge inference | Strong deployment ecosystem | Less flexible for rapid research iteration |
| Prototyping platforms | MATLAB/Simulink | Rapid prototyping & validation | Vision algorithm prototyping; control testing | Fast development; robust toolboxes | Proprietary licensing and toolbox costs |
| Simulation environments | Gazebo; Isaac Sim; CoppeliaSim | Virtual testing | Vision-based navigation; HRC evaluation; synthetic data generation | Safe testing; reproducibility [69,72] | Sim-to-real gap |
| Edge AI platforms | NVIDIA Jetson | On-device inference | Real-time detection; gesture/action recognition | High performance; GPU acceleration | Power and thermal constraints relative to server-grade hardware |
| | Intel OpenVINO | Optimized inference | Industrial inspection; CPU-optimized pipelines | Efficient CPU usage; deployment tooling | Model/operator constraints; optimized primarily for Intel hardware |
| Integration ecosystem | Python stack | Pipeline integration | Data handling; inference orchestration; ROS integration | Flexible; easy to prototype | Performance tuning needed |
| | Docker/MLOps tools | Deployment & scaling | Model lifecycle; reproducibility; CI/CD | Reproducibility; portability | Industrial adoption challenges; requires DevOps expertise |

| Feature | MATLAB | Python |
|---|---|---|
| Development Environment | Proprietary integrated environment with specialized toolboxes | Open-source and flexible ecosystem relying on diverse libraries and frameworks |
| Ease of Use | High, particularly for deep learning development and debugging | High, due to readability and extensive community support |
| Libraries/ Frameworks | Comprehensive proprietary toolboxes | Extensive open-source libraries (e.g., PyTorch, Scikit-learn, Scikit-image) |
| Performance | Interpreted environment; performance depends on toolbox implementation and available hardware acceleration (e.g., GPU) | Typically high when using GPU-accelerated frameworks such as PyTorch |
| Cost | Commercial software requiring licenses | Free and open-source |
| Support | Dedicated vendor support and extensive documentation | Primarily community-driven support with extensive documentation and a large user ecosystem; no dedicated vendor service-level agreements by default |
| Robotics Integration | Dedicated Robotics System Toolbox available with ROS and ROS2 support | Strong native integration with ROS, benefiting from extensive community-developed packages and tighter middleware coupling; adaptable for industrial applications |
| Deployment | Embedded deployment via MATLAB Coder/Simulink; edge/cloud deployment requires additional setup | Flexible deployment across edge, cloud, and embedded systems with widely available tooling |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Varolia, H.; Vasques, C.M.A.; Cavadas, A.M.S. Computer Vision for Collaborative Robots in Industry 5.0: A Survey of Techniques, Gaps, and Future Directions. Eng. Proc. 2026, 124, 99. https://doi.org/10.3390/engproc2026124099

