SAC-Optimized Fuzzy Variable Admittance Control for Lead-Through Teaching of Collaborative Robots
Abstract
1. Introduction
2. System Overview and Methods
2.1. Admittance Modeling and Operator Intent Quantification
2.2. Fuzzy Variable Admittance Controller
2.3. SAC-Based Optimization of Membership Function Parameters
2.3.1. Episode-Level Optimization Architecture and MDP Formulation
2.3.2. Task Rewards and Scaling-Factor Constraints
2.3.3. Performance-Gated Curriculum Learning and Training Configuration
3. Optimizer Selection and Ablation Analysis
3.1. Optimizer Comparison and Selection
3.2. Ablation Study
3.3. Deployed Membership Function Configuration
4. Lead-Through Teaching Experiments
4.1. Experimental Platform and Gravity-Compensation Validation
4.2. Experimental Design
4.3. Results and Analysis
4.3.1. Straight-Line Trajectory
4.3.2. Compound Trajectory
4.3.3. 3D Ramp Trajectory
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Villani, V.; Pini, F.; Leali, F.; Secchi, C. Survey on Human–Robot Collaboration in Industrial Settings: Safety, Intuitive Interfaces and Applications. Mechatronics 2018, 55, 248–266.
- Haddadin, S.; De Luca, A.; Albu-Schäffer, A. Robot Collisions: A Survey on Detection, Isolation, and Identification. IEEE Trans. Rob. 2017, 33, 1292–1312.
- Hogan, N. Impedance Control: An Approach to Manipulation: Part I—Theory. J. Dyn. Syst. Meas. Control 1985, 107, 1–7.
- Ficuciello, F.; Villani, L.; Siciliano, B. Variable Impedance Control of Redundant Manipulators for Intuitive Human–Robot Physical Interaction. IEEE Trans. Rob. 2015, 31, 850–863.
- ISO 10218-1:2025; Robotics—Safety Requirements—Part 1: Industrial Robots. International Organization for Standardization: Geneva, Switzerland, 2025.
- ISO 10218-2:2025; Robotics—Safety Requirements—Part 2: Industrial Robot Applications and Robot Cells. International Organization for Standardization: Geneva, Switzerland, 2025.
- ISO/TS 15066:2016; Robots and Robotic Devices—Collaborative Robots. International Organization for Standardization: Geneva, Switzerland, 2016.
- Hamad, Y.M.; Aydin, Y.; Basdogan, C. Adaptive Human Force Scaling via Admittance Control for Physical Human-Robot Interaction. IEEE Trans. Haptics 2021, 14, 750–761.
- Kang, G.; Oh, H.S.; Seo, J.K.; Kim, U.; Choi, H.R. Variable Admittance Control of Robot Manipulators Based on Human Intention. IEEE/ASME Trans. Mechatron. 2019, 24, 1023–1032.
- Sharkawy, A.-N.; Koustoumpardis, P.N.; Aspragathos, N. A Neural Network-Based Approach for Variable Admittance Control in Human–Robot Cooperation: Online Adjustment of the Virtual Inertia. Intell. Serv. Robot. 2020, 13, 495–519.
- Han, L.; Zhao, L.; Huang, Y.; Xu, W. Variable Admittance Control for Safe Physical Human–Robot Interaction Considering Intuitive Human Intention. Mechatronics 2024, 97, 103098.
- Dimeas, F.; Aspragathos, N. Reinforcement Learning of Variable Admittance Control for Human-Robot Co-Manipulation. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 1011–1016.
- Du, Z.; Wang, W.; Yan, Z.; Dong, W.; Wang, W. Variable Admittance Control Based on Fuzzy Reinforcement Learning for Minimally Invasive Surgery Manipulator. Sensors 2017, 17, 844.
- Roveda, L.; Maskani, J.; Franceschi, P.; Abdi, A.; Braghin, F.; Molinari Tosatti, L.; Pedrocchi, N. Model-Based Reinforcement Learning Variable Impedance Control for Human-Robot Collaboration. J. Intell. Robot. Syst. 2020, 100, 417–433.
- Hu, X.; Liu, G.; Ren, P.; Jia, B.; Liang, Y.; Li, L.; Duan, S. An Admittance Parameter Optimization Method Based on Reinforcement Learning for Robot Force Control. Actuators 2024, 13, 354.
- Gao, H.; Yang, Y.; Liu, J.; Sun, C. Reinforcement Learning-Based Admittance Control for Physical Human–Robot Interaction with Output Constraints. IEEE Trans. Autom. Sci. Eng. 2025, 22, 16334–16345.
- Yang, Q.; Dürr, A.; Topp, E.A.; Stork, J.A.; Stoyanov, T. Variable Impedance Skill Learning for Contact-Rich Manipulation. IEEE Robot. Autom. Lett. 2022, 7, 8391–8398.
- Martín-Martín, R.; Lee, M.A.; Gardner, R.; Savarese, S.; Bohg, J.; Garg, A. Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 1010–1017.
- Rei, C.; Wang, Q.; Chen, L.; Yan, X.; Zhang, P.; Fu, L.; Wang, C.; Liu, X. Constant Force Grinding Controller for Robots Based on SAC Optimal Parameter Finding Algorithm. Sci. Rep. 2024, 14, 14127.
- Hou, Z.; Li, Z.; Hsu, C.; Zhang, K.; Xu, J. Fuzzy Logic-Driven Variable Time-Scale Prediction-Based Reinforcement Learning for Robotic Multiple Peg-in-Hole Assembly. IEEE Trans. Autom. Sci. Eng. 2022, 19, 218–229.
- Liu, Y.; Ning, R.; Sang, H.; Yu, S.; Yan, Y.; Fan, Y. TD3 Integrated Fuzzy-Finite Variable Admittance Control of Posture Estimation and Adjustment for Robotic Precise Peg-in-Hole. IEEE Trans. Autom. Sci. Eng. 2026, 23, 3782–3792.
- Han, Q.; Boussaid, F.; Bennamoun, M. Soft Fuzzy Reinforcement Neural Network Proportional–Derivative Controller. Appl. Sci. 2025, 15, 5071.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870.
- Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng. 1960, 82, 35–45.
- Duan, J.; Liu, Z.; Bin, Y.; Cui, K.; Dai, Z. Payload Identification and Gravity/Inertial Compensation for Six-Dimensional Force/Torque Sensor with a Fast and Robust Trajectory Design Approach. Sensors 2022, 22, 439.
- Wen, C.; Zheng, P.; Jing, Z.; Guo, C.; Chen, C. Force–Position Coordinated Compliance Control in the Adhesion/Detachment Process of Space Climbing Robot. Aerospace 2025, 12, 20.
- Mamdani, E.H. Application of Fuzzy Algorithms for Control of Simple Dynamic Plant. Proc. Inst. Electr. Eng. 1974, 121, 1585–1588.
- Ng, A.Y.; Harada, D.; Russell, S.J. Policy Invariance under Reward Transformations: Theory and Application to Reward Shaping. In Proceedings of the 16th International Conference on Machine Learning (ICML), Bled, Slovenia, 27–30 June 1999; pp. 278–287.
- Narvekar, S.; Peng, B.; Leonetti, M.; Sinapov, J.; Taylor, M.E.; Stone, P. Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey. J. Mach. Learn. Res. 2020, 21, 1–50.
- Todorov, E.; Erez, T.; Tassa, Y. MuJoCo: A Physics Engine for Model-Based Control. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal, 7–12 October 2012; pp. 5026–5033.
- Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res. 2021, 22, 1–8.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. In Proceedings of the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016.
- Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
- Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; de Freitas, N. Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE 2016, 104, 148–175.

| Dragging Stage | Characteristic | Characteristic | Adjustment |
|---|---|---|---|
| Stationary | | | Increase |
| Start-up | ; small | Small | Decrease |
| Acceleration | Large positive | Small | Decrease |
| Constant velocity | Nearly constant | Small | Maintain |
| Deceleration | | Small | Increase |
| Turning | Small | Large | Increase |

| | ZO | S | M | B |
|---|---|---|---|---|
| NB | IL | IL | IL | IM |
| NM | IL | IL | IM | IM |
| NS | IM | IM | IM | IS |
| ZO | KP | IS | IM | IL |
| PS | DS | KP | IS | IM |
| PM | DM | IS | IS | IM |
| PB | DL | KP | IS | IS |
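
The rule base above can be encoded as a lookup table and evaluated with Mamdani-style min/max inference. The sketch below is a minimal illustration under stated assumptions: the row labels (NB to PB) and column labels (ZO to B) are taken to be the fuzzified values of the controller's two inputs, whose symbols are not recoverable in this copy; the output labels are presumed to denote damping increments (IL/IM/IS increase, KP keep, DS/DM/DL decrease); the membership degrees in the usage example are hypothetical; and defuzzification is omitted.

```python
# Rule base transcribed from the table: rows are the signed labels of the
# first input, columns the magnitude labels of the second input.
COLUMNS = ("ZO", "S", "M", "B")
RULES = {
    "NB": ("IL", "IL", "IL", "IM"),
    "NM": ("IL", "IL", "IM", "IM"),
    "NS": ("IM", "IM", "IM", "IS"),
    "ZO": ("KP", "IS", "IM", "IL"),
    "PS": ("DS", "KP", "IS", "IM"),
    "PM": ("DM", "IS", "IS", "IM"),
    "PB": ("DL", "KP", "IS", "IS"),
}


def rule_output(row_label: str, col_label: str) -> str:
    """Return the consequent label for one (row, column) antecedent pair."""
    return RULES[row_label][COLUMNS.index(col_label)]


def fire_rules(mu_row: dict[str, float], mu_col: dict[str, float]) -> dict[str, float]:
    """Mamdani-style inference up to (not including) defuzzification.

    Each rule fires with strength min(mu_row, mu_col); rules that share a
    consequent label are aggregated with max.
    """
    strengths: dict[str, float] = {}
    for r, mr in mu_row.items():
        for c, mc in mu_col.items():
            w = min(mr, mc)
            if w <= 0.0:
                continue  # rule not activated
            out = rule_output(r, c)
            strengths[out] = max(strengths.get(out, 0.0), w)
    return strengths
```

For example, with hypothetical degrees `fire_rules({"ZO": 0.7, "PS": 0.3}, {"S": 0.6, "M": 0.4})`, the four active rules yield IS at 0.6, IM at 0.4, and KP at 0.3; a defuzzifier would then turn these strengths into a single damping change.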

| Index | Scale Factor | Propulsion Force (N) | Lateral Force (N) | Braking Force (N) |
|---|---|---|---|---|
| 0 | 0.50 | [1.5, 7.5] | [0.75, 2.5] | [1.5, 5.0] |
| 1 | 0.65 | [1.95, 9.75] | [0.98, 3.25] | [1.95, 6.5] |
| 2 | 0.80 | [2.4, 12.0] | [1.2, 4.0] | [2.4, 8.0] |
| 3 | 1.00 | [3.0, 15.0] | [1.5, 5.0] | [3.0, 10.0] |
| 4 | 1.20 | [3.6, 18.0] | [1.8, 6.0] | [3.6, 12.0] |
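
Each level's force ranges in the table are the level-3 ranges multiplied by that level's scale factor (for example, 0.65 × [1.5, 5.0] N gives the level-1 lateral range [0.98, 3.25] N after rounding). The sketch below reproduces that scaling. Two parts are assumptions, labeled as such in the code: that disturbance magnitudes are drawn uniformly from each range, and the hypothetical gate `next_level`, which advances one level once the mean return over the last 50 episodes reaches a threshold; the actual gating criterion and threshold are not given here.

```python
import random

# Level-3 (scale 1.00) force ranges in newtons, read off the table.
NOMINAL_RANGES = {
    "propulsion": (3.0, 15.0),
    "lateral": (1.5, 5.0),
    "braking": (3.0, 10.0),
}
SCALES = (0.50, 0.65, 0.80, 1.00, 1.20)  # curriculum levels 0..4


def level_ranges(level: int) -> dict[str, tuple[float, float]]:
    """Scale every nominal range by the factor of the given level."""
    s = SCALES[level]
    return {name: (s * lo, s * hi) for name, (lo, hi) in NOMINAL_RANGES.items()}


def sample_disturbance(level: int, rng: random.Random) -> dict[str, float]:
    """Assumption: draw each force magnitude uniformly from its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in level_ranges(level).items()}


def next_level(level: int, returns: list[float], threshold: float, window: int = 50) -> int:
    """Hypothetical performance gate: advance one level when the mean of the
    last `window` episode returns reaches `threshold`; never past the last level."""
    if level >= len(SCALES) - 1 or len(returns) < window:
        return level
    recent_mean = sum(returns[-window:]) / window
    return level + 1 if recent_mean >= threshold else level
```

A fixed schedule such as "200 episodes per level" would replace the return test in `next_level` with an episode counter, which is the contrast the ablation study draws between gated and fixed disturbance schedules.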

| Hyperparameter | Value |
|---|---|
| Policy network architecture | Two fully connected layers, 256 units each |
| Q-network architecture | Two fully connected layers, 256 units each |
| Learning rate | 3 × 10−4 |
| Discount factor | 0.99 |
| Soft update coefficient | 0.005 |
| Batch size | 64 |
| Replay buffer warm-up steps | 100 |
| Total training episodes | 1000 (gated across 5 disturbance levels) |
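
If the agent were built on Stable-Baselines3 (which the reference list cites, although that use is not confirmed in this section), the table rows could map onto keyword arguments of its `SAC` class roughly as below. The row-to-argument mapping, including treating the warm-up steps as `learning_starts`, is our assumption. The `soft_update` function illustrates what the soft update coefficient controls: Polyak averaging of the target critic weights.

```python
# Hypothetical mapping of the hyperparameter table onto keyword arguments of
# stable_baselines3.SAC; SB3 itself is not imported in this sketch.
SAC_KWARGS = {
    "learning_rate": 3e-4,
    "gamma": 0.99,            # discount factor
    "tau": 0.005,             # soft update coefficient
    "batch_size": 64,
    "learning_starts": 100,   # replay buffer warm-up steps (assumed mapping)
    "policy_kwargs": {"net_arch": {"pi": [256, 256], "qf": [256, 256]}},
}


def soft_update(target: list[float], online: list[float], tau: float = 0.005) -> list[float]:
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


def repeated_updates(n: int, tau: float = 0.005) -> float:
    """Target weight after n soft updates from 0.0 toward a fixed online weight 1.0.

    Analytically this equals 1 - (1 - tau) ** n, so tau = 0.005 means the
    target network tracks the online network over a few hundred updates.
    """
    w = [0.0]
    for _ in range(n):
        w = soft_update(w, [1.0], tau)
    return w[0]
```

Given a Gymnasium environment `env`, `SAC("MlpPolicy", env, **SAC_KWARGS)` would construct the agent; the total of 1000 episodes would still have to be converted into a `total_timesteps` value, which depends on the episode length.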

| Algorithm | Mean Return (5 Seeds) | Best-Seed Return | Boundary Parameters ¹ | Training Time (s) | Training Budget ² |
|---|---|---|---|---|---|
| SAC | 339.4 ± 10.9 | 350.7 | 7/15 | 1265 | 1000 ep |
| DDPG | 329.9 ± 13.5 | 346.5 | 15/15 | 1167 | 1000 ep |
| TD3 | 329.1 ± 12.4 | 342.7 | 15/15 | 1188 | 1000 ep |
| PPO | 339.5 ± 2.9 | 341.7 | 0/15 | 1175 | 1024 ts |
| BO | 342.7 ± 3.5 | 347.9 | 2/15 | 28,796 | 25,000 env ep |

| Algorithm | Candidate Seed | Baseline Return ¹ | Max Return Increase | Observable Improvements ² | Feasible Directions ³ |
|---|---|---|---|---|---|
| SAC | 42 | 335.45 | +0.23 | 0 | 4 |
| DDPG | 42 | 335.19 | — | 0 | 3 |
| TD3 | 45 | 336.82 | +0.16 | 0 | 3 |
| PPO | 45 | 330.58 | +0.36 | 0 | 6 |
| BO | 43 | 334.01 | +0.51 | 1 | 4 |

| Group | | Curriculum | Performance Gating | Disturbance Schedule | | | | (last-50) |
|---|---|---|---|---|---|---|---|---|
| G1 | — | — | — | Fixed | 0.9407 | 0.5055 | 1.5937 | 0.002 |
| G2 | ✓ | ✓ | ✓ | Gated, : 0.5–1.2 | 0.9540 | 0.5080 | 1.5940 | 0.027 |
| G3 | — | ✓ | ✓ | Gated, : 0.5–1.2 | 0.9405 | 0.5046 | 1.5947 | 0.0008 |
| G4 | ✓ | — | — | Fixed | 0.9407 | 0.5053 | 1.5937 | 0.002 |
| G5 | ✓ | ✓ | — | Fixed 200 ep/level, : 0.5–1.2 | 0.9994 | 0.5063 | 1.5943 | 0.0007 |

| Control Strategy | Damping Setting | Description |
|---|---|---|
| Fixed damping (20 N·s/m) | Fixed at 20 N·s/m | Low damping; high responsiveness |
| Fixed damping (80 N·s/m) | Fixed at 80 N·s/m | High damping; high stability |
| FAC | N·s/m, N·s/m | Hand-tuned fuzzy variable admittance |
| SAC-FAC | Same range as FAC | SAC-optimized fuzzy variable admittance |
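
The trade-off between the two fixed-damping baselines follows from the translational admittance law M·dv/dt + B·v = F: under a constant hand force, the velocity settles at F/B, so the steady-state admittance is 1/B. The sketch below integrates this law with explicit Euler. Only the damping values 20 and 80 N·s/m come from the table; the virtual mass, time step, duration, and 6 N test force are hypothetical.

```python
def simulate_velocity(force: float, damping: float, mass: float = 2.0,
                      dt: float = 0.001, duration: float = 5.0) -> float:
    """Integrate M*dv/dt + B*v = F with explicit Euler and return the final velocity.

    mass, dt, and duration are hypothetical illustration values; the update
    is stable here because dt * damping / mass is well below 2.
    """
    v = 0.0
    for _ in range(int(round(duration / dt))):
        v += dt * (force - damping * v) / mass
    return v


def steady_state_admittance(force: float, damping: float) -> float:
    """Ratio of settled velocity to applied force; approaches 1/B."""
    return simulate_velocity(force, damping) / force
```

With a 6 N push, the 20 N·s/m setting settles at 0.3 m/s (admittance 0.05 m/(N·s)), while 80 N·s/m settles at 0.075 m/s (0.0125 m/(N·s)): four times the damping requires four times the force for the same speed. These idealized values are not directly comparable with the admittance metrics in the results tables, whose exact definition is not given in this section, but the ordering between the two baselines is the same.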

| Control Strategy | (mm) | (N) | (N) | (m/(N·s)) |
|---|---|---|---|---|
| Fixed damping (20 N·s/m) | 3.56 ± 0.80 | 6.61 ± 0.51 | 6.26 ± 0.42 | 0.0263 ± 0.0014 |
| Fixed damping (80 N·s/m) | 3.20 ± 0.28 | 11.17 ± 1.27 | 11.24 ± 0.96 | 0.0091 ± 0.0003 |
| FAC | 3.11 ± 0.23 | 7.48 ± 0.48 | 6.97 ± 0.39 | 0.0138 ± 0.0009 |
| SAC-FAC | 3.04 ± 0.67 | 6.99 ± 0.36 | 6.56 ± 0.29 | 0.0145 ± 0.0008 |
| p-value (FAC vs. SAC-FAC) | 0.518 | <0.001 | <0.001 | <0.001 |

| Control Strategy | (mm) | (N) | (N) | (m/(N·s)) | (rad/(N·m)) |
|---|---|---|---|---|---|
| Fixed damping (20 N·s/m) | 3.40 ± 0.17 | 4.17 ± 0.10 | 4.01 ± 0.10 | 0.0133 ± 0.0007 | 36.09 ± 5.95 |
| Fixed damping (80 N·s/m) | 2.77 ± 0.22 | 5.39 ± 0.27 | 5.23 ± 0.24 | 0.0053 ± 0.0003 | 22.54 ± 3.36 |
| FAC | 2.64 ± 1.20 | 4.66 ± 0.21 | 4.51 ± 0.20 | 0.0070 ± 0.0008 | 29.11 ± 5.02 |
| SAC-FAC | 1.67 ± 0.25 | 4.43 ± 0.11 | 4.31 ± 0.07 | 0.0071 ± 0.0003 | 33.63 ± 5.08 |
| p-value (FAC vs. SAC-FAC) | <0.001 | <0.001 | <0.001 | 0.411 | <0.001 |

| Control Strategy | (mm) | (N) | (N) | (m/(N·s)) | (rad/(N·m)) |
|---|---|---|---|---|---|
| Fixed damping (20 N·s/m) | 5.80 ± 1.64 | 4.11 ± 0.24 | 6.77 ± 0.85 | 0.0114 ± 0.0010 | 28.99 ± 2.64 |
| Fixed damping (80 N·s/m) | 5.10 ± 1.10 | 6.72 ± 0.42 | 8.33 ± 0.75 | 0.0061 ± 0.0003 | 17.18 ± 0.80 |
| FAC | 3.45 ± 0.34 | 5.36 ± 0.33 | 8.20 ± 0.50 | 0.0067 ± 0.0004 | 23.46 ± 1.17 |
| SAC-FAC | 2.76 ± 0.26 | 4.11 ± 0.18 | 7.36 ± 0.33 | 0.0086 ± 0.0005 | 25.26 ± 1.25 |
| p-value (FAC vs. SAC-FAC) | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
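
The relative change of SAC-FAC with respect to hand-tuned FAC can be computed directly from the means in the three results tables. The sketch below does this for the (mm) column and the first (N) column; it assumes the (mm) column is an error-type metric and the first (N) column a force-type metric, since their symbols were lost in this copy. The percentages describe the reported means only and carry no information about significance.

```python
# (FAC, SAC-FAC) mean values copied from the three results tables.
# Assumption: "mm" is an error-type metric and "first_N" a force-type metric.
RESULTS = {
    "straight-line": {"mm": (3.11, 3.04), "first_N": (7.48, 6.99)},
    "compound": {"mm": (2.64, 1.67), "first_N": (4.66, 4.43)},
    "3D ramp": {"mm": (3.45, 2.76), "first_N": (5.36, 4.11)},
}


def relative_reduction(baseline: float, value: float) -> float:
    """Percentage decrease of `value` relative to `baseline` (negative means an increase)."""
    return 100.0 * (baseline - value) / baseline


def reduction_table() -> dict[str, dict[str, float]]:
    """Relative reduction of SAC-FAC vs. FAC for every trajectory and metric."""
    return {
        traj: {metric: relative_reduction(fac, sac) for metric, (fac, sac) in cols.items()}
        for traj, cols in RESULTS.items()
    }
```

On these means, SAC-FAC lowers the (mm) metric by about 2.3%, 36.7%, and 20.0% on the straight-line, compound, and 3D ramp trajectories, and the first (N) metric by about 6.6%, 4.9%, and 23.3%. The straight-line (mm) difference is not statistically significant (p = 0.518), so only the compound and ramp error reductions are supported by the reported tests.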

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Song, Y.; Ma, G. SAC-Optimized Fuzzy Variable Admittance Control for Lead-Through Teaching of Collaborative Robots. Sensors 2026, 26, 3576. https://doi.org/10.3390/s26113576
