Enhancing Trust in Collaborative Assembly Through Resilient Adversarial Reinforcement Learning
Featured Application
Abstract
1. Introduction
- Ability refers to the skills and competencies that enable a party to function reliably within a specific domain. In HRC, this translates to the robot’s capability to complete tasks correctly and safely.
- Benevolence is the extent to which a trustee is believed to want to do good for the trustor. For a robot, this can be interpreted as the capacity to adapt its actions to support human partners, even when they make errors or deviate from the plan.
- Integrity involves adhering to a set of principles acceptable to the trustor, which implies predictability and consistency in behavior.
2. State of the Art in AAP
3. Materials and Methods
3.1. Task Decomposition for Collaborative Assembly
- 1. Absolute Constraints (feasibility, precedence):
  - a. Feasibility Constraints (RTC): validate contact existence. The “OR” operator is applied to the feasibility constraint.
  - b. Precedence Constraints: check for collision-free paths. The “AND” operator is applied to the columns of the precedence constraint truth table to obtain the Boolean products, which are then summed.
- 2. Optimization Constraints (topological, functional, and stability):
  - a. Topological Constraint: ensures the application of precedence rules.
  - b. Functional Constraint: ensures the task is feasible for the robot gripper.
  - c. Stability Constraint: ensures parts remain stable during the assembly.
3.2. Synthetic Generation of Assembly Sequences
3.3. Reinforcement Learning Models
- $S$ is the set of all possible states in the environment (e.g., the status of the assembly);
- $A$ is the set of valid actions the agent can take (e.g., picking a part, fastening a bolt);
- $P(s' \mid s, a)$ represents the state transition probability, describing the likelihood of moving to a new state $s'$ given the current state $s$ and action $a$;
- $R$ is the reward function, providing a scalar feedback signal received after transitioning from state $s$ via action $a$;
- $\gamma$ is the discount factor, which determines the importance of future rewards compared to immediate ones.
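As a minimal illustration of how the discount factor weights future rewards against immediate ones, the sketch below computes a discounted return for a short reward sequence. The function name `discounted_return` and the sample rewards are illustrative assumptions, not values from the paper.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the k-th future reward is weighted by gamma**k."""
    g = 0.0
    # Walk backwards so each step applies one more factor of gamma
    # to everything that comes after it.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequence: three steps, reward 1 each.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=1.0))  # no discounting: 3.0
```

A discount factor close to 0 makes the agent short-sighted, while a value close to 1 makes late rewards count almost as much as immediate ones.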
3.4. Adversarial Reinforcement Learning Application
- On-Policy Learning in Adversarial Settings: In our adversarial framework, two agents (robot and human) compete against each other. PPO is an on-policy algorithm that requires data generated by the current policy to perform updates. In a multi-agent adversarial environment, the environment is usually non-stationary because the opponents are learning simultaneously. To address this, we use an alternating training scheme. Fixing the opponent’s policy while training the active agent makes the environment temporarily stationary. This allows the active agent to collect valid on-policy trajectories against the opponent’s current strategy, effectively optimizing a response to the game’s current “goal.”
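The alternating scheme can be sketched on a toy graph. To keep the example self-contained, the PPO update is replaced here by an exact best-response computation (backward induction against the frozen opponent); this is a stand-in for illustration, not the training procedure used in the paper. The graph, the node names, and the assumption that the robot moves first are all hypothetical.

```python
# Hypothetical toy DAG (node -> successors); "END" is the completed assembly.
DAG = {
    "S": ["A", "B"],
    "A": ["C", "END"],
    "B": ["C"],
    "C": ["D", "END"],
    "D": ["END"],
    "END": [],
}


def best_response(active, opponent_pi):
    """Exact best response of `active` ("robot" or "human") to a frozen opponent."""
    pick = min if active == "robot" else max  # robot shortens, human prolongs
    policy, cache = {}, {}

    def value(node, robot_turn):
        """Steps still needed to reach END from `node`."""
        if node == "END":
            return 0
        key = (node, robot_turn)
        if key not in cache:
            if robot_turn == (active == "robot"):  # active agent's turn
                nxt = pick(DAG[node], key=lambda n: value(n, not robot_turn))
                policy[node] = nxt
            else:  # the frozen opponent follows its fixed policy
                nxt = opponent_pi[node]
            cache[key] = 1 + value(nxt, not robot_turn)
        return cache[key]

    for node in DAG:
        if node != "END":
            value(node, active == "robot")  # define the policy on every node
    return policy


def path_length(robot_pi, human_pi):
    """Play one game from S with the robot moving first; return the step count."""
    node, robot_turn, steps = "S", True, 0
    while node != "END":
        node = (robot_pi if robot_turn else human_pi)[node]
        robot_turn = not robot_turn
        steps += 1
    return steps


# Start from arbitrary policies: always take the first listed successor.
robot_pi = {n: succ[0] for n, succ in DAG.items() if succ}
human_pi = dict(robot_pi)

for round_idx in range(6):
    if round_idx % 2 == 0:
        robot_pi = best_response("robot", human_pi)  # human frozen
    else:
        human_pi = best_response("human", robot_pi)  # robot frozen

print(path_length(robot_pi, human_pi))  # prints 3 for this toy graph
```

Because only one policy changes per round, each agent always optimizes against a fixed opponent, which is the property the alternating scheme relies on.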
3.4.1. Problem Formulation
- The robot receives a penalty for every time step to encourage speed. Upon reaching the final node, it receives a sparse completion reward.
- The human receives a positive reward for every step the game continues, incentivizing the prolongation of the task.
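The opposing incentives can be made concrete with a small sketch of the two episode returns. The reward magnitudes below (`STEP_PENALTY`, `COMPLETION_REWARD`, `HUMAN_STEP_REWARD`) are hypothetical placeholders chosen for illustration; the actual values are not given in this section.

```python
# Hypothetical magnitudes, chosen only to illustrate the reward structure.
STEP_PENALTY = -1.0        # robot: cost of every time step
COMPLETION_REWARD = 10.0   # robot: sparse bonus at the final node
HUMAN_STEP_REWARD = 1.0    # human: gain for every step the game continues


def robot_return(steps: int) -> float:
    """Undiscounted robot return for an episode that finishes in `steps` steps."""
    return steps * STEP_PENALTY + COMPLETION_REWARD


def human_return(steps: int) -> float:
    """Undiscounted human return for the same episode."""
    return steps * HUMAN_STEP_REWARD


for steps in (3, 5):
    print(steps, robot_return(steps), human_return(steps))
# A longer episode lowers the robot's return and raises the human's.
```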
3.4.2. ARL Algorithm
- Environment and Graph (structure of the DAG and agents’ interaction rules)
- Reward Function for the Robot
- Reward Function for the Human
- Training Loop Parameters
- PPO Algorithm Hyperparameters (Stable Baselines3 Defaults)
4. Results
4.1. Definition of the Performance Metrics
- Efficiency metrics focus on the baseline performance of the system, primarily the Task Completion Time (TCT), which measures the total number of steps required to traverse the assembly DAG from start to finish. It is the minimum number of steps, attained if the human cooperated perfectly (or if the robot controlled both turns). While ARL is not expected to outperform purely optimal planning in ideal conditions, TCT serves as a benchmark to ensure the resilient policy remains within acceptable productivity limits.
- Robustness metrics, which directly address the ability and benevolence components of trust, are critical for demonstrating resilience. Key indicators include:
  - a. Worst-Case Path Length (WCPL): the number of steps needed to complete the task when the robot counters the optimal policy of a human who is trying to delay the process.
  - b. Resilience Ratio (RR): compares the WCPL against the distribution of all possible path lengths. A high percentile ranking confirms the robot’s ability to mitigate human variability. To calculate it, the script runs 1000 simulations of the robot (optimal) against the human (random). The ratio is the percentage of these random trials that finish within the step bound established by the adversarial case. A 100% ratio confirms the robot has effectively learned a robust upper bound.
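The three metrics can be sketched on a toy graph as follows. This is an illustrative reading of the definitions above, not the authors' script: the graph is hypothetical, the robot is assumed to move first with turns alternating, and the robot policy is the exact minimax policy rather than a learned one.

```python
import random
from functools import lru_cache

# Hypothetical toy DAG (node -> successors); "END" is the completed assembly.
DAG = {
    "S": ["A", "B"],
    "A": ["C", "END"],
    "B": ["C"],
    "C": ["D", "END"],
    "D": ["END"],
    "END": [],
}


def tct(node="S"):
    """Shortest path length to END: the fully cooperative case."""
    if node == "END":
        return 0
    return 1 + min(tct(nxt) for nxt in DAG[node])


@lru_cache(maxsize=None)
def wcpl(node="S", robot_turn=True):
    """Minimax path length: the robot minimizes steps, the human maximizes."""
    if node == "END":
        return 0
    values = [1 + wcpl(nxt, not robot_turn) for nxt in DAG[node]]
    return min(values) if robot_turn else max(values)


def robot_move(node):
    """Robot's minimax-optimal successor (the human moves next)."""
    return min(DAG[node], key=lambda nxt: wcpl(nxt, False))


def play_vs_random_human(rng):
    """One game: optimal robot against a human choosing uniformly at random."""
    node, robot_turn, steps = "S", True, 0
    while node != "END":
        node = robot_move(node) if robot_turn else rng.choice(DAG[node])
        robot_turn = not robot_turn
        steps += 1
    return steps


def resilience_ratio(n_trials=1000, seed=0):
    """Percentage of random-human trials finishing within the WCPL bound."""
    rng = random.Random(seed)
    bound = wcpl("S", True)
    within = sum(play_vs_random_human(rng) <= bound for _ in range(n_trials))
    return 100.0 * within / n_trials


print(tct(), wcpl("S", True), resilience_ratio())  # 2 3 100.0
```

In this sketch the ratio is 100% by construction, because an exact minimax policy can never be pushed beyond its own worst case.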
4.2. Definition of a Case Study
4.3. Execution of the Experiment on Synthetic Data
5. Discussion
5.1. Analysis of the Experimental Plan
5.2. Analysis of the Repeated Experiment
- The “Adversarial Tax”: The difference between the shortest possible theoretical paths (ranging from two to six steps) and the robot’s optimal policy (ranging from five to seventeen steps) is substantial. This gap illustrates the effectiveness of the adversary. By acting optimally, the human forces the robot to take paths that are, on average, over two and a half times longer than the shortest possible route (Figure 13). This is an extreme scenario, not standard practice: the human coworker is expected to follow a collaborative policy most of the time.
- Optimal vs. Random Expectations: Interestingly, the overall average of the optimal robot’s worst-case policy (11.7 steps) is slightly higher than the average random length (10.73 steps). This occurs because the worst-case metric measures a guaranteed upper bound against an optimal adversary. Random trials average out human moves and reflect scenarios in which a smart robot can capitalize on human choices to cross the graph much faster.
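The averages quoted above can be checked directly against the ten-trial results table. The short script below copies the TCT, WCPL, and average random length columns from that table; the ratio is computed here as a ratio of means, which is one possible reading of "on average".

```python
# Columns copied from the repeated-experiment table (trials 1 to 10).
tct = [5, 6, 2, 4, 3, 6, 3, 4, 6, 5]
wcpl = [11, 17, 11, 8, 5, 10, 16, 15, 13, 11]
avg_random = [11.3, 13.32, 10.29, 7.54, 9.32, 10.16, 12.31, 10.23, 12.11, 10.77]


def mean(values):
    return sum(values) / len(values)


mean_tct = mean(tct)            # 4.4 steps
mean_wcpl = mean(wcpl)          # 11.7 steps
ratio = mean_wcpl / mean_tct    # about 2.66
mean_random = mean(avg_random)  # about 10.7 steps

print(f"mean TCT = {mean_tct:.2f}, mean WCPL = {mean_wcpl:.2f}")
print(f"WCPL / TCT = {ratio:.2f}, mean random length = {mean_random:.2f}")
```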
5.3. Implications for Trust
- Ability: The robot demonstrates competence by consistently managing complex task sequences and avoiding deadlocks or excessive delays.
- Benevolence: By adapting its strategy to mitigate potential human errors (simulated by the adversary), the robot acts in the best interest of the team, reducing the burden on the human operator to perform perfectly.
5.4. Limitations
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| HRC | Human–Robot Collaboration |
| RL | Reinforcement Learning |
| ARL | Adversarial Reinforcement Learning |
| DAG | Directed Acyclic Graph |
| AAP | Automated Assembly Planning |
| SME | Small and Medium-sized Enterprises |
| ABI | Ability, Benevolence, Integrity |
| ASP | Assembly Sequence Planning |
| APP | Assembly Path Planning |
| PPO | Proximal Policy Optimization |
| MDP | Markov Decision Process |
| TCT | Task Completion Time |
| WCPL | Worst-Case Path Length |
| RR | Resilience Ratio |
| VLM | Vision Language Model |













| Software/Library | Version/Status |
|---|---|
| Python | 3.13.11 |
| Stable-Baselines3 | 2.7.1 |
| PyTorch | 2.9.1 |
| NumPy | 2.4.1 |
| Cloudpickle | 3.1.2 |
| Gymnasium | 1.2.3 |

| Task | Operation | Description | Assigned to |
|---|---|---|---|
| 1.1 | Input confirmation | Confirm start of R2 movement | H |
| 1.2 | Bit change | Replace hex bit with cross bit on R1 | H |
| 1.3 | Component move | Move PC to the work area | R2 |
| 1.4 | PC Positioning | Place PC and 2× VT1 screws on BD | H |
| 1.5 | Component move | Move BC to the work area | R2 |
| 1.6 | Screwing | Screw VT1 (0.5 Nm) while holding BD | R1 |
| 2.1 | Bit change | Replace cross bit with hex bit on R1 | H |
| 2.2 | BC Positioning | Place BC on BD | H |
| 2.3 | Component move | Position 4× VT2 screws on BC | R2 |
| 2.4 | Screw placement | Insert VT2 screws into holes | H |
| 2.5 | Screwing | Screw VT2 (1.5 Nm) while holding BD | R1 |
| 3.1 | Tool move | Move screwdriver to work area | R2 |
| 3.2 | Tool move | Move hex keys to work area | R2 |
| 3.3 | Component move | Move FOR to work area | R2 |
| 4.1 | Input confirmation | Move BD position | H |
| 4.2 | Support | Lift and hold BD | R2 |
| 4.3 | Manual screwing | Position and screw 2× VT1 (0.5 Nm) | H |
| 5.1 | Input confirmation | Confirm BD movement | H |
| 5.2 | BD Positioning | Place BD on EV | R2 |
| 5.3 | Manual screwing | Position and screw 6× VT3 (3 Nm) | H |
| 5.4 | Final assembly | Position and screw FOR and VT4 (1.5 Nm) | H |

| Layers/Max Nodes | TCT | WCPL | Max Random Length | Avg Random Length | RR |
|---|---|---|---|---|---|
| 4/10 | 1 | 4 | 5 | 3.86 | 87.90% |
| 10/4 | 2 | 7 | 8 | 4.57 | 94.20% |
| 10/10 | 2 | 9 | 11 | 8.18 | 93.40% |
| 15/10 | 5 | 10 | 12 | 8.18 | 93.30% |
| 20/10 | 4 | 11 | 18 | 10.75 | 71% |
| 20/20 | 3 | 14 | 19 | 12.28 | 85% |

| Trial | Total Nodes | TCT | WCPL | Avg Random Length | RR |
|---|---|---|---|---|---|
| 1 | 76 | 5 | 11 | 11.3 | 62.60% |
| 2 | 70 | 6 | 17 | 13.32 | 97.00% |
| 3 | 59 | 2 | 11 | 10.29 | 54.50% |
| 4 | 56 | 4 | 8 | 7.54 | 61.50% |
| 5 | 62 | 3 | 5 | 9.32 | 28.30% |
| 6 | 64 | 6 | 10 | 10.16 | 63.00% |
| 7 | 73 | 3 | 16 | 12.31 | 98.70% |
| 8 | 69 | 4 | 15 | 10.23 | 97.70% |
| 9 | 77 | 6 | 13 | 12.11 | 62.50% |
| 10 | 68 | 5 | 11 | 10.77 | 66.80% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Antonelli, D.; Aliev, K.; Yang, B. Enhancing Trust in Collaborative Assembly Through Resilient Adversarial Reinforcement Learning. Appl. Sci. 2026, 16, 3244. https://doi.org/10.3390/app16073244

