Deep Learning Framework for Controlling Work Sequence in Collaborative Human–Robot Assembly Processes
Abstract
1. Introduction
- Human-centered data—action, gesture, and sound or voice recognition concerning the human worker;
- Data regarding the workspace—object detection, object tracking, or pose estimation of the components within the workspace, either on their own or in coordination with some kind of information regarding the human worker.
2. Proposed Approach for Collaborative Human–Robot Assembly
2.1. Deep Learning Model
- Firstly, the physical HRC workspace where the human worker and the collaborative robot will co-work is defined. This also comprises defining beforehand the entire assembly sequence for the mechanical component to be built, as well as the tools and equipment needed to complete the entire assembly process;
- Subsequently, a set of images (as large as possible) of the objects and human actions to be detected, embedded in the scenes of interest for the operation of the HRC environment, is acquired by a camera for subsequent training of the deep learning models. It is important to avoid categorical class imbalance in this set of images, i.e., there should be roughly the same number of instances of each categorical class to be detected across the entire set of images. The images acquired by the camera are a mixture of frames taken while the human and/or the robot continuously perform the entire assembly sequence, as well as individual pictures showing a smaller set of components or just a single categorical class per image;
- After acquiring this large set of images, each image has to be manually labelled with bounding boxes for each of the chosen categorical classes, completing the creation of the dataset. The dataset is then divided into a training dataset, a validation dataset and a test dataset (a minimal sketch of this split is given after this list): the training and validation datasets are used during the training of a deep learning model, and the test dataset is used to assess and compare its performance;
- In this step, the structure (backbone) of a deep learning model is created using a CNN architecture. This architecture can be created from scratch, or, alternatively, an existing architecture that has been successful in solving a similar problem can be used and adapted to the desired case;
- Before training the deep learning model, all the labelled images that make up the datasets must be resized to match the input layer size of the deep learning model. In addition, data augmentation is performed on the training and validation datasets to quickly create more intra-class variation (e.g., added noise, rotation, translation) and/or to artificially increase the number of images in each dataset without the need to physically acquire more images;
- The deep learning model should be trained with a neural network training method, having previously selected the training parameters, namely the number of epochs, batch size, initial learning rate, dropout and L2 regularization.
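To make the dataset preparation step concrete, the following minimal Python sketch illustrates the class-balance check and the split into training, validation and test subsets described above. The sample structure (an image path paired with its labelled bounding boxes) and the split fractions are assumptions chosen for illustration and are not prescribed by the framework.

```python
import random
from collections import Counter

def class_distribution(samples):
    """Count labelled instances per categorical class to expose class imbalance."""
    counts = Counter()
    for _image_path, boxes in samples:   # sample: (image_path, [(class_name, bbox), ...])
        counts.update(class_name for class_name, _bbox in boxes)
    return counts

def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle the labelled samples and split them into train/validation/test subsets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],                 # training dataset
            shuffled[n_train:n_train + n_val],  # validation dataset
            shuffled[n_train + n_val:])         # test dataset
```

The class distribution should be inspected before splitting, so that under-represented classes can be compensated with additional images or augmentation.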
2.2. Management Algorithm for Collaborative Human–Robot Assembly
- Based on an assembly sequence, specific tasks are assigned to the human and the robot. Emphasis is placed on the human worker, so the robot performs assistive tasks, such as picking up tools or part assemblies out of the human’s reach and bringing them to the worker or mounting them on other parts;
- The tasks allocated to the robot must be taught to it. The robot must be able to execute these tasks in a decoupled manner, meaning that it can execute any task at any time, without the execution of one task depending on the prior execution of another; the sequence is imposed by the decision-making algorithm and not by the robot itself. Each task must be taught to the robot in a way that ensures human safety, i.e., the robot must change position with reduced accelerations and end speeds, following a trajectory that avoids collisions and maintains a minimum safety distance from the human worker;
- The next step is to define which parts/sets of parts, tools and human actions detected by the deep learning model, as well as which conditions of the robotic cell (e.g., joint positions or robot speeds), should act as triggers to command the robot to perform a task. To ensure high reliability of the HRC framework, a class recognized by the deep learning model must be detected in at least m frames out of every n frames processed, where m < n; the values of m and n must be defined at this step;
- Based on the previous points, the management algorithm is implemented using a programming language. A generic structure for this algorithm is suggested in Algorithm 1.
Algorithm 1: Management Algorithm
1: Initialize communication between management algorithm and robot controller
2: camera ← Set up the camera
3: model ← Load deep learning model
4: storage_detected_classes ← Reset the classes detected in the last n frames
5: tasks_performed ← Reset the tasks registered as performed by the robot
6: num_frames ← 0 (reset the count of frames captured by the camera)
7: running_flag ← true
8: while (running_flag)
9:  image ← Capture_Image (camera)
10:  detected_classes ← model (image)
11:  storage_detected_classes ← Update (detected_classes)
12:  tasks_performed ← The robot provides feedback about task execution
13:  num_frames ← num_frames + 1
14:  if (num_frames == n)
15:   if (the presence/absence of certain key categorical classes within the workspace has been verified in at least m frames (storage_detected_classes) and certain robot tasks have/have not been performed (tasks_performed), as required by robot task 1)
16:    Command the robot to perform robot task 1
17:    num_frames ← 0
18:    storage_detected_classes ← Reset
19:   elseif (the presence/absence of certain key categorical classes within the workspace has been verified in at least m frames (storage_detected_classes) and certain robot tasks have/have not been performed (tasks_performed), as required by robot task 2)
20:    Command the robot to perform robot task 2
21:    num_frames ← 0
22:    storage_detected_classes ← Reset
23:   elseif (…)
24:    …
25:   elseif (all k robot tasks have been performed (tasks_performed))
26:    tasks_performed ← Reset
27:   endif
28:   if (num_frames == n)
29:    num_frames ← num_frames − 1
30:   endif
31:  endif
32:  running_flag ← Check whether the human has turned off the HRC application
33: endwhile
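The generic loop of Algorithm 1 can be prototyped in a few dozen lines. The sketch below is a minimal Python rendering of that structure; the camera, model and robot interfaces (capture, detect, feedback, execute, stop_requested) are hypothetical placeholders rather than part of the original implementation, and each robot task is paired with a trigger callable encoding the presence/absence and task-precedence conditions of the if/elseif block.

```python
from collections import deque

def detected_in_m_frames(window, class_name, m):
    """True if class_name was detected in at least m of the buffered frames."""
    return sum(1 for frame_classes in window if class_name in frame_classes) >= m

def management_loop(camera, model, robot, tasks, n=10):
    """Generic management loop mirroring Algorithm 1.

    `tasks` is an ordered list of (trigger, robot_task) pairs; each `trigger`
    is a callable taking the detection window and the set of performed tasks
    and returning True when its robot task should be commanded.
    """
    window = deque()       # storage_detected_classes: one set of classes per frame
    performed = set()      # tasks_performed: robot tasks reported as executed
    running = True
    while running:
        image = camera.capture()                   # hypothetical camera interface
        window.append(set(model.detect(image)))    # hypothetical detector interface
        performed |= robot.feedback()              # robot feedback on executed tasks
        if len(window) == n:
            fired = False
            for trigger, robot_task in tasks:
                if robot_task not in performed and trigger(window, performed):
                    robot.execute(robot_task)      # command the robot task
                    window.clear()                 # reset frame count and detections
                    fired = True
                    break
            if not fired:
                if performed.issuperset(task for _, task in tasks):
                    performed.clear()              # all k robot tasks done: start over
                window.popleft()                   # slide the n-frame window by one frame
        running = not robot.stop_requested()       # human switched off the HRC application?
```

The values of n and m and the trigger definitions are left to the instantiation step; Section 3.2 shows one concrete assignment.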
3. Human–Robot Collaborative Assembly Environment
3.1. Deep Learning Model Creation
3.2. Implementation of the Management Algorithm
- Perform task R1 only when the human was performing the insertion of the screws in place (task H3);
- Perform task R2 only when the human started screwing (task H4);
- Perform task R3 only when the human turned the half-assembled mechanical component upwards (task H5);
- Perform task R4 only when all the components were assembled (completed task H6).
Algorithm 2: Implementation of the Management Algorithm to Control the HRC Robotic Cell
1: Initialize communication between management algorithm and robot controller
2: camera ← Set up the camera
3: model ← Load deep learning model
4: storage_detected_classes ← Reset the classes detected in the last n frames
5: tasks_performed ← Reset the tasks registered as performed by the robot
6: num_frames ← 0 (reset the count of frames captured by the camera)
7: running_flag ← true
8: while (running_flag)
9:  image ← Capture_Image (camera)
10:  detected_classes ← model (image)
11:  storage_detected_classes ← Update (detected_classes)
12:  tasks_performed ← The robot provides feedback about task execution
13:  num_frames ← num_frames + 1
14:  n ← 10
15:  if (num_frames == n)
16:   if (Base and Stepper have been detected in at least 7 of the 10 frames (storage_detected_classes) and R1 has not yet been performed (tasks_performed))
17:    Command the robot to perform robot task 1
18:    num_frames ← 0
19:    storage_detected_classes ← Reset
20:   elseif (Base and Screw have been detected in at least 5 of the 10 frames (storage_detected_classes) and R2 has not yet been performed (tasks_performed) and R1 has been performed (tasks_performed))
21:    Command the robot to perform robot task 2
22:    num_frames ← 0
23:    storage_detected_classes ← Reset
24:   elseif (the Base has been detected and neither the Bolt nor the Stepper has been detected in at least 3 of the 10 frames (storage_detected_classes) and R3 has not yet been performed (tasks_performed) and R1 and R2 have been performed (tasks_performed))
25:    Command the robot to perform robot task 3
26:    num_frames ← 0
27:    storage_detected_classes ← Reset
28:   elseif (the Base, the Belt and the Wheels A, B, C and D have been detected in at least 2 of the 10 frames (storage_detected_classes) and R4 has not yet been performed (tasks_performed) and R1, R2 and R3 have been performed (tasks_performed))
29:    Command the robot to perform robot task 4
30:    num_frames ← 0
31:    storage_detected_classes ← Reset
32:   elseif (R1, R2, R3 and R4 have already been performed (tasks_performed))
33:    tasks_performed ← Reset
34:   endif
35:   if (num_frames == n)
36:    num_frames ← num_frames − 1
37:   endif
38:  endif
39:  running_flag ← Check whether the human has turned off the HRC application
40: endwhile
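Continuing the Python sketch introduced after Algorithm 1, the four trigger conditions of Algorithm 2 could be written as the callables below. The class-name strings (Base, Stepper, Screw, Bolt, Belt, Wheel_A to Wheel_D) are assumed to match the labels produced by the detector, and detected_in_m_frames() is the helper defined earlier; the "R_k not yet performed" checks are already enforced inside management_loop().

```python
# Trigger callables for the four robot tasks of Algorithm 2 (n = 10 frames).
def trigger_r1(window, performed):
    # Base and Stepper detected in at least 7 of the 10 frames.
    return (detected_in_m_frames(window, "Base", 7)
            and detected_in_m_frames(window, "Stepper", 7))

def trigger_r2(window, performed):
    # Base and Screw detected in at least 5 of the 10 frames, after R1.
    return (detected_in_m_frames(window, "Base", 5)
            and detected_in_m_frames(window, "Screw", 5)
            and "R1" in performed)

def trigger_r3(window, performed):
    # Base visible without Bolt or Stepper in at least 3 of the 10 frames, after R1 and R2.
    hits = sum(1 for frame in window
               if "Base" in frame and "Bolt" not in frame and "Stepper" not in frame)
    return hits >= 3 and {"R1", "R2"} <= performed

def trigger_r4(window, performed):
    # Base, Belt and the four wheels each detected in at least 2 of the 10 frames,
    # after R1, R2 and R3.
    wanted = ("Base", "Belt", "Wheel_A", "Wheel_B", "Wheel_C", "Wheel_D")
    return (all(detected_in_m_frames(window, cls, 2) for cls in wanted)
            and {"R1", "R2", "R3"} <= performed)

tasks = [(trigger_r1, "R1"), (trigger_r2, "R2"),
         (trigger_r3, "R3"), (trigger_r4, "R4")]
# management_loop(camera, model, robot, tasks, n=10)
```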
4. Results and Discussion
4.1. Deep Learning Model Performance
4.2. Human–Robot Collaborative Framework Performance
- The YOLOv3 deep learning model enabled the construction of a successful HRC framework for managing the sequence of tasks performed by a robot to assemble a mechanical component;
- The HRC framework was capable of triggering the robot to perform its tasks at the exact moment needed, sequentially assisting the human throughout the assembly process;
- The HRC framework made it possible for a human to co-work with a robot within the same workspace;
- The HRC framework proof-of-concept worked effectively and successfully in assembling mechanical parts.
5. Conclusions
- The best performing deep learning models in terms of mAP values were the models based on YOLOv3 and Faster R-CNN with ResNet-101 (72.26% and 72.93%, respectively). However, YOLOv3 was the only model capable of detecting the Screw categorical class, which was important for the proper functioning of the HRC framework;
- Although the YOLOv3-based model had a similar mAP to that of the Faster R-CNN with ResNet-101 model and showed a comparatively worse ability to distinguish the Wheel_A, Wheel_B, Wheel_C and Wheel_D categorical classes, it was the only model able to detect the Screw class. Given the nature of the assembly sequence defined for the selected mechanical component, correctly detecting the Screw categorical class was deemed more important than perfectly distinguishing all the categorical classes pertaining to the cogwheels;
- The YOLOv3-based model has the advantage of requiring fewer computational resources and offering a faster detection speed than the other deep learning models discussed in this study, owing to the smaller number of layers and parameters comprising the model.
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Yu, F.; Schweisfurth, T. Industry 4.0 Technology Implementation in SMEs—A Survey in the Danish-German Border Region. Int. J. Innov. Stud. 2020, 4, 76–84. [Google Scholar] [CrossRef]
- Gualtieri, L.; Rauch, E.; Vidoni, R. Emerging Research Fields in Safety and Ergonomics in Industrial Collaborative Robotics: A Systematic Literature Review. Robot. Comput. Integr. Manuf. 2021, 67, 101998. [Google Scholar] [CrossRef]
- Machado, M.A.; Rosado, L.S.; Mendes, N.M.; Miranda, R.M.; Santos, T.G. Multisensor Inspection of Laser-Brazed Joints in the Automotive Industry. Sensors 2021, 21, 7335. [Google Scholar] [CrossRef]
- Machado, M.A.; Rosado, L.F.S.G.; Mendes, N.A.M.; Miranda, R.M.M.; dos Santos, T.J.G. New Directions for Inline Inspection of Automobile Laser Welds Using Non-Destructive Testing. Int. J. Adv. Manuf. Technol. 2022, 118, 1183–1195. [Google Scholar] [CrossRef]
- Pohlt, C.; Schlegl, T.; Wachsmuth, S. Human Work Activity Recognition for Working Cells in Industrial Production Contexts. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Bari, Italy, 6–9 October 2019; pp. 4225–4230. [Google Scholar]
- Bonci, A.; Cheng, P.D.C.; Indri, M.; Nabissi, G.; Sibona, F. Human-Robot Perception in Industrial Environments: A Survey. Sensors 2021, 21, 1571. [Google Scholar] [CrossRef]
- Chen, C.; Wang, T.; Li, D.; Hong, J. Repetitive Assembly Action Recognition Based on Object Detection and Pose Estimation. J. Manuf. Syst. 2020, 55, 325–333. [Google Scholar] [CrossRef]
- Tao, W.; Lai, Z.H.; Leu, M.C.; Yin, Z. Worker Activity Recognition in Smart Manufacturing Using IMU and SEMG Signals with Convolutional Neural Networks. In Proceedings of the Procedia Manufacturing, College Station, TX, USA, 1 January 2018; Elsevier: Amsterdam, The Netherlands, 2018; Volume 26, pp. 1159–1166. [Google Scholar]
- El Aswad, F.; Djogdom, G.V.T.; Otis, M.J.D.; Ayena, J.C.; Meziane, R. Image Generation for 2D-CNN Using Time-Series Signal Features from Foot Gesture Applied to Select Cobot Operating Mode. Sensors 2021, 21, 5743. [Google Scholar] [CrossRef]
- Amin Khalil, R.; Saeed, N.; Masood, M.; Moradi Fard, Y.; Alouini, M.-S.; Al-Naffouri, T.Y. Deep Learning in the Industrial Internet of Things: Potentials, Challenges, and Emerging Applications. IEEE Internet Things J. 2021, 8, 11016–11040. [Google Scholar] [CrossRef]
- Laptev, A.; Andrusenko, A.; Podluzhny, I.; Mitrofanov, A.; Medennikov, I.; Matveev, Y. Dynamic Acoustic Unit Augmentation with Bpe-Dropout for Low-Resource End-to-End Speech Recognition. Sensors 2021, 21, 3063. [Google Scholar] [CrossRef]
- Mendes, N.; Ferrer, J.; Vitorino, J.; Safeea, M.; Neto, P. Human Behavior and Hand Gesture Classification for Smart Human-Robot Interaction. Procedia Manuf. 2017, 11, 91–98. [Google Scholar] [CrossRef]
- Lopes, J.; Simão, M.; Mendes, N.; Safeea, M.; Afonso, J.; Neto, P. Hand/Arm Gesture Segmentation by Motion Using IMU and EMG Sensing. Procedia Manuf. 2017, 11, 107–113. [Google Scholar] [CrossRef]
- Mendes, N. Surface Electromyography Signal Recognition Based on Deep Learning for Human-Robot Interaction and Collaboration. J. Intell. Robot. Syst. Theory Appl. 2022, 105, 42. [Google Scholar] [CrossRef]
- Mendes, N.; Simao, M.; Neto, P. Segmentation of Electromyography Signals for Pattern Recognition. In Proceedings of the IECON Proceedings (Industrial Electronics Conference), Lisbon, Portugal, 14–17 October 2019; IEEE: Lisbon, Portugal, 2019; pp. 732–737. [Google Scholar]
- Mendes, N.; Neto, P.; Safeea, M.; Moreira, A.P. Online Robot Teleoperation Using Human Hand Gestures: A Case Study for Assembly Operation. In Proceedings of the Advances in Intelligent Systems and Computing, Lisbon, Portugal, 19-21 December 2015; Springer: Berlin, Germany, 2016; Volume 418, pp. 93–104. [Google Scholar]
- Wen, X.; Chen, H.; Hong, Q. Human Assembly Task Recognition in Human-Robot Collaboration Based on 3D CNN. In Proceedings of the 9th IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems, Suzhou, China, 29 July–2 August 2019; pp. 1230–1234. [Google Scholar]
- Cheng, Y.; Sun, L.; Liu, C.; Tomizuka, M. Towards Efficient Human-Robot Collaboration with Robust Plan Recognition and Trajectory Prediction. IEEE Robot. Autom. Lett. 2020, 5, 2602–2609. [Google Scholar] [CrossRef]
- Katona, J. A Review of Human–Computer Interaction and Virtual Reality Research Fields in Cognitive Infocommunications. Appl. Sci. 2021, 11, 2646. [Google Scholar] [CrossRef]
- Katona, J. Examination and Comparison of the EEG Based Attention Test with CPT and T.O.V.A. In Proceedings of the CINTI 2014—15th IEEE International Symposium on Computational Intelligence and Informatics, Budapest, Hungary, 19–21 November 2014; IEEE: Danvers, MA, USA, 2014; pp. 117–120. [Google Scholar]
- Katona, J.; Ujbanyi, T.; Sziladi, G.; Kovari, A. Examine the Effect of Different Web-Based Media on Human Brain Waves. In Proceedings of the 8th IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2017, Debrecen, Hungary, 11–14 September 2017; pp. 407–412. [Google Scholar]
- Katona, J.; Kovari, A.; Ujbanyi, T.; Sziladi, G. Hand Controlled Mobile Robot Applied in Virtual Environment. Int. J. Mech. Mechatronics Eng. 2017, 11, 1430–1435. [Google Scholar]
- Ha, J.; Park, S.; Im, C.H. Novel Hybrid Brain-Computer Interface for Virtual Reality Applications Using Steady-State Visual-Evoked Potential-Based Brain–Computer Interface and Electrooculogram-Based Eye Tracking for Increased Information Transfer Rate. Front. Neuroinform. 2022, 16, 758537. [Google Scholar] [CrossRef]
- Katona, J. Analyse the Readability of LINQ Code Using an Eye-Tracking-Based Evaluation. Acta Polytech. Hung. 2021, 18, 193–215. [Google Scholar] [CrossRef]
- Park, K.B.; Choi, S.H.; Kim, M.; Lee, J.Y. Deep Learning-Based Mobile Augmented Reality for Task Assistance Using 3D Spatial Mapping and Snapshot-Based RGB-D Data. Comput. Ind. Eng. 2020, 146, 106585. [Google Scholar] [CrossRef]
- Lai, Z.H.; Tao, W.; Leu, M.C.; Yin, Z. Smart Augmented Reality Instructional System for Mechanical Assembly towards Worker-Centered Intelligent Manufacturing. J. Manuf. Syst. 2020, 55, 69–81. [Google Scholar] [CrossRef]
- Choi, S.H.; Park, K.B.; Roh, D.H.; Lee, J.Y.; Mohammed, M.; Ghasemi, Y.; Jeong, H. An Integrated Mixed Reality System for Safety-Aware Human-Robot Collaboration Using Deep Learning and Digital Twin Generation. Robot. Comput. Integr. Manuf. 2022, 73, 102258. [Google Scholar] [CrossRef]
- Zhang, J.; Wang, P.; Gao, R.X. Hybrid Machine Learning for Human Action Recognition and Prediction in Assembly. Robot. Comput. Integr. Manuf. 2021, 72, 102184. [Google Scholar] [CrossRef]
- Davis, J.; Goadrich, M. The Relationship between Precision-Recall and ROC Curves. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; Volume 148, pp. 233–240. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Danvers, MA, USA, 2010; pp. 248–255. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context; Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2014; Volume 8693, pp. 740–755. [Google Scholar]
Architecture | Faster R-CNN | Faster R-CNN | YOLOv2 | YOLOv3 |
---|---|---|---|---|
Backbone | ResNet-50 | ResNet-101 | Darknet-19 | Darknet-53 |
Input layer size in pixels [height width] | [224 × 396] | [224 × 396] | [256 × 256] | [608 × 608] |
Pretraining dataset | ImageNet [35] | ImageNet [35] | ImageNet [35] | COCO [36] |
Training Parameters | Value |
---|---|
Solver | SGDM |
Number of epochs | 50 |
Mini-batch size | 16/32/64 |
Initial learning rate | 1 × 10⁻⁴ |
Dropout | Yes |
L2 regularization | 1 × 10⁻⁴ |
Validation dataset | Yes |
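For illustration only, the settings in this table translate roughly into the following PyTorch-style optimizer configuration; the momentum value and the dummy network are assumptions, since the table only specifies that the SGDM solver was used.

```python
import torch

# Dummy network standing in for the detector backbones compared in the study.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU())

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-4,            # initial learning rate
    momentum=0.9,       # assumed momentum value for SGDM (not stated in the table)
    weight_decay=1e-4,  # L2 regularization
)

num_epochs = 50         # number of epochs
mini_batch_size = 32    # 16, 32 and 64 were compared
```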
Task Number | Task Description | Requirements |
---|---|---|
H1 | Mount the bottom of the Base part on the assembly support. | Assembly support |
H2 | Mount the Stepper on the Base. | N/A |
H3 | Insert the two Screws in the correct holes to fix the Stepper to the Base. | N/A |
H4 | Screw the Screws with the Screwdriver. | Screwdriver |
H5 | Turn the semi-assembled mechanical component upside down (top). | N/A |
H6 | Assemble the Wheels A, B, C and D and the Belt. | Cogwheel and belt storage box |
Robot Task | Robot Task Description |
---|---|
R1 | Pick up the screwdriver from outside the human workspace (HWS) and place it in the screwdriver holder in the HWS. |
R2 | Pick up the wheel box from outside the HWS and place it in a specific location inside the HWS. |
R3 | Pick up the screwdriver placed in the inner screwdriver holder, inside the HWS, and carry it to the outer holder, outside the HWS. |
R4 | Pick up and take away the empty wheel box to the initial position outside the HWS. |
Model | Mini-Batch Size | mAP50 (%) |
---|---|---|
Faster R-CNN—ResNet-50 | 16 | 62.76 |
Faster R-CNN—ResNet-50 | 32 | 69.47 |
Faster R-CNN—ResNet-50 | 64 | 66.34 |
Faster R-CNN—ResNet-101 | 16 | 69.53 |
Faster R-CNN—ResNet-101 | 32 | 72.93 |
Faster R-CNN—ResNet-101 | 64 | 71.43 |
YOLOv2—Darknet-19 | 16 | 38.82 |
YOLOv2—Darknet-19 | 32 | 42.32 |
YOLOv2—Darknet-19 | 64 | 40.61 |
YOLOv3—Darknet-53 | 16 | 67.34 |
YOLOv3—Darknet-53 | 32 | 72.26 |
YOLOv3—Darknet-53 | 64 | 70.75 |
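For reference, mAP50 in the table above denotes the mean average precision computed with detections counted as correct when their intersection over union (IoU) with the ground-truth box is at least 0.5; a standard formulation is:

```latex
\mathrm{AP}_c = \int_0^1 p_c(r)\,\mathrm{d}r, \qquad
\mathrm{mAP}_{50} = \frac{1}{C}\sum_{c=1}^{C} \mathrm{AP}_c \quad (\text{IoU} \ge 0.5),
```

where p_c(r) is the precision attained by class c at recall r and C is the number of categorical classes.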
Deep Learning Model/Input Image Size | Average Detection Time per Image [s] | Standard Deviation [s] |
---|---|---|
Faster R-CNN—ResNet-50 [224 × 396] | 0.346 | 0.043 |
Faster R-CNN—ResNet-101 [224 × 396] | 0.403 | 0.036 |
YOLOv2—Darknet-19 [256 × 256] | 0.031 | 0.009 |
YOLOv3—Darknet-53 [608 × 608] | 0.551 | 0.060 |