# Development of a New Robust Stable Walking Algorithm for a Humanoid Robot Using Deep Reinforcement Learning with Multi-Sensor Data Fusion


## Abstract


## 1. Introduction

## 2. Walking Pattern Generator

#### 2.1. Walking Trajectory

#### 2.1.1. Ankle Trajectory

#### 2.1.2. Hip Trajectory

#### 2.2. Kinematic Analysis

#### 2.2.1. Forward Kinematics

#### 2.2.2. Inverse Kinematics

## 3. Orientation Angles and Gait Stability Criterion

#### 3.1. Calculation of Orientation Angles

#### 3.2. Calculation of Gait Stability Criterion

## 4. Proposed Framework of Gait Parameter Optimization

#### 4.1. Training with Dueling Double Deep Q Network

**Algorithm 1.** D3QN.

```text
Initialize the stable humanoid gait system (environment)
Initialize replay buffer D with capacity M
Initialize evaluation network Q1 with random parameters θ
Initialize target network Q2 with parameters θ⁻ ← θ
Fill D with transitions generated by a random policy
while True do
    Restart the stable humanoid walking system
    while no termination do
        With probability ε choose a random action a,
            otherwise choose a = argmax_a Q1(s_t, a; θ)
        Perform action a, observe reward r and next state s′
        Save experience tuple ⟨s, a, r, s′⟩ to D
        Sample a mini-batch from D
        if there is a termination at step j + 1 then
            Y_j = r_j
        else
            a_max = argmax_a Q1(s′, a; θ)
            Y_j = r_j + γ Q2(s′, a_max; θ⁻)
        Ŷ_j = Q1(s, a; θ)
        Minimize the loss between Y_j and Ŷ_j to update θ
        Every C steps update the target network: θ⁻ ← θ
    if Q1 converges then
        Q2 = Q1
        Q_optimal = Q2
        break
```
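The core of Algorithm 1 is the Double DQN target: the evaluation network selects the greedy next action while the target network evaluates it. A minimal NumPy sketch of that computation (the toy Q-value arrays are made up for illustration, not from the paper):

```python
import numpy as np

def double_dqn_target(r, done, q1_next, q2_next, gamma=0.95):
    """Double DQN target: Q1 selects the greedy next action,
    the target network Q2 evaluates it."""
    if done:  # termination at step j + 1
        return r
    a_max = int(np.argmax(q1_next))     # a_max = argmax_a Q1(s', a; θ)
    return r + gamma * q2_next[a_max]   # Y_j = r_j + γ Q2(s', a_max; θ⁻)

# Toy example with a 3-action space (values are hypothetical):
q1_next = np.array([0.2, 1.0, 0.5])   # Q1(s', ·; θ)
q2_next = np.array([0.3, 0.8, 0.9])   # Q2(s', ·; θ⁻)
y = double_dqn_target(r=1.0, done=False, q1_next=q1_next, q2_next=q2_next)
# a_max = 1, so y = 1.0 + 0.95 * 0.8 = 1.76
```

Decoupling selection (Q1) from evaluation (Q2) is what removes the overestimation bias of vanilla Q-learning.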

#### 4.1.1. The Architecture of D3QN

#### 4.1.2. State Space

| State | Unit | Size |
|---|---|---|
| Orientation angles (pitch $\theta$ and roll $\phi$) | deg. | 2 |
| Position ($P_x$, $P_y$, $P_z$) | mm | 3 |
| Rotation about the z-axis ($R_z$) | rad. | 1 |
| Zero moment point ($ZMP_x$ and $ZMP_y$) | mm | 2 |
| Distance taken in the forward direction ($D_x$) | mm | 1 |
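The table's sizes sum to nine, matching the 9-neuron input layer of the D3QN. A sketch of assembling that state vector (function name and sample values are hypothetical):

```python
import numpy as np

def build_state(pitch_deg, roll_deg, pos_mm, rz_rad, zmp_mm, dx_mm):
    """Concatenate the observations from the table above into the
    9-dimensional state vector fed to the D3QN input layer."""
    return np.concatenate([
        [pitch_deg, roll_deg],   # orientation angles (2)
        pos_mm,                  # body position P_x, P_y, P_z (3)
        [rz_rad],                # rotation about the z-axis (1)
        zmp_mm,                  # ZMP_x, ZMP_y (2)
        [dx_mm],                 # forward distance D_x (1)
    ])

s = build_state(-13.0, 1.5, [10.0, 0.0, 310.0], 0.02, [4.0, -2.0], 120.0)
# s.shape == (9,), matching the input layer in Section 4.1.1
```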

#### 4.1.3. Action Space

#### 4.1.4. Reward Function

#### 4.2. Body Posture Balancing

## 5. Experimental Results

## 6. Conclusions and Future Work

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A

## References

1. Silva, M.F.; Machado, J.A.T. A literature review on the optimization of legged robots. J. Vib. Control. **2011**, 18, 1753–1767.
2. Chung, R.-L.; Hsueh, Y.; Chen, S.-L.; Abu, P.A.R. Efficient and Accurate CORDIC Pipelined Architecture Chip Design Based on Binomial Approximation for Biped Robot. Electronics **2022**, 11, 1701.
3. Rostro-Gonzalez, H.; Lauterio-Cruz, J.; Pottiez, O. Modelling Neural Dynamics with Optics: A New Approach to Simulate Spiking Neurons through an Asynchronous Laser. Electronics **2020**, 9, 1853.
4. Liu, C.; Wang, D.; Chen, Q. Central Pattern Generator Inspired Control for Adaptive Walking of Biped Robots. IEEE Trans. Syst. Man Cybern. Syst. **2013**, 43, 1206–1215.
5. Yu, J.; Tan, M.; Chen, J.; Zhang, J. A Survey on CPG-Inspired Control Models and System Implementation. IEEE Trans. Neural Netw. Learn. Syst. **2013**, 25, 441–456.
6. Bai, L.; Hu, H.; Chen, X.; Sun, Y.; Ma, C.; Zhong, Y. CPG-Based Gait Generation of the Curved-Leg Hexapod Robot with Smooth Gait Transition. Sensors **2019**, 19, 3705.
7. Liu, C.-C.; Lee, T.-T.; Xiao, S.-R.; Lin, Y.-C.; Wong, C.-C. Real-Time FPGA-Based Balance Control Method for a Humanoid Robot Pushed by External Forces. Appl. Sci. **2020**, 10, 2699.
8. Morales, E.F.; Zaragoza, J.H. An Introduction to Reinforcement Learning. In Decision Theory Models for Applications in Artificial Intelligence: Concepts and Solutions; IGI Global: Hershey, PA, USA, 2012; pp. 63–80.
9. Kasaei, M.; Lau, N.; Pereira, A. A Fast and Stable Omnidirectional Walking Engine for the Nao Humanoid Robot. In RoboCup 2019: Robot World Cup XXIII, Proceedings of the RoboCup 2019, Sydney, NSW, Australia, 2–8 July 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 99–111.
10. MacAlpine, P.; Barrett, S.; Urieli, D.; Vu, V.; Stone, P. Design and Optimization of an Omnidirectional Humanoid Walk: A Winning Approach at the RoboCup 2011 3D Simulation Competition. Proc. Conf. AAAI Artif. Intell. **2012**, 26, 1047–1053.
11. Or, J. A hybrid CPG–ZMP control system for stable walking of a simulated flexible spine humanoid robot. Neural Netw. **2010**, 23, 452–460.
12. He, B.; Wang, Z.; Shen, R.; Hu, S. Real-time Walking Pattern Generation for a Biped Robot with Hybrid CPG-ZMP Algorithm. Int. J. Adv. Robot. Syst. **2014**, 11, 160.
13. Kasaei, S.M.; Simões, D.; Lau, N.; Pereira, A. A Hybrid ZMP-CPG Based Walk Engine for Biped Robots. In Proceedings of the ROBOT 2017: Third Iberian Robotics Conference, Sevilla, Spain, 22–24 November 2017; pp. 743–755.
14. Chang, L.; Piao, S.; Leng, X.; He, Z.; Zhu, Z. Inverted pendulum model for turn-planning for biped robot. Phys. Commun. **2020**, 42, 101168.
15. Pelit, M.M.; Chang, J.; Takano, R.; Yamakita, M. Bipedal Walking Based on Improved Spring Loaded Inverted Pendulum Model with Swing Leg (SLIP-SL). In Proceedings of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Boston, MA, USA, 6–9 July 2020; pp. 72–77.
16. Li, L.; Xie, Z.; Luo, X.; Li, J. Trajectory Planning of Flexible Walking for Biped Robots Using Linear Inverted Pendulum Model and Linear Pendulum Model. Sensors **2021**, 21, 1082.
17. Menga, G. The Spherical Inverted Pendulum: Exact Solutions of Gait and Foot Placement Estimation Based on Symbolic Computation. Appl. Sci. **2021**, 11, 1588.
18. Vukobratović, M.; Borovac, B. Zero-Moment Point—Thirty Five Years of Its Life. Int. J. Hum. Robot. **2004**, 1, 157–173.
19. Bin Peng, X.; Berseth, G.; van de Panne, M. Dynamic terrain traversal skills using reinforcement learning. ACM Trans. Graph. **2015**, 34, 1–11.
20. Le, A.; Veerajagadheswar, P.; Kyaw, P.T.; Elara, M.; Nhan, N. Coverage Path Planning Using Reinforcement Learning-Based TSP for hTetran—A Polyabolo-Inspired Self-Reconfigurable Tiling Robot. Sensors **2021**, 21, 2577.
21. Huang, Y.; Wei, G.; Wang, Y. V-D D3QN: The Variant of Double Deep Q-Learning Network with Dueling Architecture. In Proceedings of the 37th Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018.
22. Michel, O. Webots™: Professional Mobile Robot Simulation. Int. J. Adv. Robot. Syst. **2004**, 1, 39–42.
23. Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv **2016**, arXiv:1606.01540.
24. Heess, N.; Dhruva, T.B.; Sriram, S.; Lemmon, J.; Merel, J.; Wayne, G.; Tassa, Y.; Erez, T.; Wang, Z.; Ali Eslami, S.M.; et al. Emergence of Locomotion Behaviours in Rich Environments. arXiv **2017**, arXiv:1707.02286.
25. Gil, C.R.; Calvo, H.; Sossa, H. Learning an Efficient Gait Cycle of a Biped Robot Based on Reinforcement Learning and Artificial Neural Networks. Appl. Sci. **2019**, 9, 502.
26. Moodie, E.E.M.; Dean, N.; Sun, Y.R. Q-Learning: Flexible Learning About Useful Utilities. Stat. Biosci. **2013**, 6, 223–243.
27. Liu, C.; Ning, J.; Chen, Q. Dynamic walking control of humanoid robots combining linear inverted pendulum mode with parameter optimization. Int. J. Adv. Robot. Syst. **2018**, 15, 1729881417749672.
28. Peters, J. Policy gradient methods. Scholarpedia **2010**, 5, 3698.
29. Lin, J.-L.; Hwang, K.-S.; Jiang, W.-C.; Chen, Y.-J. Gait Balance and Acceleration of a Biped Robot Based on Q-Learning. IEEE Access **2016**, 4, 2439–2449.
30. Silva, I.J.; Perico, D.H.; Homem, T.; Vilão, C.O., Jr.; Tonidandel, F.; Bianchi, R.A.C. Humanoid Robot Gait on Sloping Floors Using Reinforcement Learning. In Robotics, Proceedings of the 12th Latin American Robotics Symposium and Third Brazilian Symposium on Robotics, LARS 2015/SBR 2015, Uberlândia, Brazil, 28 October–1 November 2015; Springer: Berlin/Heidelberg, Germany, 2016; pp. 228–246.
31. Silva, I.J.; Perico, D.H.; Costa, A.H.; Bianchi, R.A. Using Reinforcement Learning to Optimize Gait Generation. In Proceedings of the XIII Simpósio Brasileiro de Automação Inteligente, Porto Alegre, Brazil, 1–4 October 2017; pp. 288–294.
32. Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM **1995**, 38, 58–68.
33. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. In Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016.
34. Kumar, A.; Paul, N.; Omkar, S.N. Bipedal Walking Robot Using Deep Deterministic Policy Gradient. arXiv **2018**, arXiv:1807.05924.
35. Heess, N.; Hunt, J.J.; Lillicrap, T.P.; Silver, D. Memory-Based Control with Recurrent Neural Networks. arXiv **2015**, arXiv:1512.04455.
36. Song, D.R.; Yang, C.; McGreavy, C.; Li, Z. Recurrent Deterministic Policy Gradient Method for Bipedal Locomotion on Rough Terrain Challenge. In Proceedings of the 15th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, 18–21 November 2018; pp. 311–318.
37. Kasaei, M.; Abreu, M.; Lau, N.; Pereira, A.; Reis, L.P. Robust biped locomotion using deep reinforcement learning on top of an analytical control approach. Robot. Auton. Syst. **2021**, 146, 103900.
38. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv **2017**, arXiv:1707.06347.
39. Jiang, Y.; Zhang, W.; Farrukh, F.U.D.; Xie, X.; Zhang, C. Motion Sequence Learning for Robot Walking Based on Pose Optimization. In Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA), Beijing, China, 13–16 October 2020; pp. 1877–1882.
40. Zhang, W.; Jiang, Y.; Farrukh, F.U.D.; Zhang, C.; Zhang, D.; Wang, G. LORM: A novel reinforcement learning framework for biped gait control. PeerJ Comput. Sci. **2022**, 8, e927.
41. Christiano, P.F.; Leike, J.; Brown, T.B.; Martic, M.; Legg, S.; Amodei, D. Deep Reinforcement Learning from Human Preferences. In Advances in Neural Information Processing Systems 30, Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: New York, NY, USA, 2018; pp. 4300–4308.
42. Peng, X.B.; Berseth, G.; Yin, K.; van de Panne, M. DeepLoco: Dynamic locomotion skills using hierarchical deep reinforcement learning. ACM Trans. Graph. **2017**, 36, 1–13.
43. Xi, A.; Chen, C. Walking Control of a Biped Robot on Static and Rotating Platforms Based on Hybrid Reinforcement Learning. IEEE Access **2020**, 8, 148411–148424.
44. Wawrzyński, P. Reinforcement Learning with Experience Replay for Model-Free Humanoid Walking Optimization. Int. J. Hum. Robot. **2014**, 11, 1450024.
45. Feirstein, D.S.; Koryakovskiy, I.; Kober, J.; Vallery, H. Reinforcement Learning of Potential Fields to Achieve Limit-Cycle Walking. IFAC-PapersOnLine **2016**, 49, 113–118.
46. Leng, J.; Fan, S.; Tang, J.; Mou, H.; Xue, J.; Li, Q. M-A3C: A Mean-Asynchronous Advantage Actor-Critic Reinforcement Learning Method for Real-Time Gait Planning of Biped Robot. IEEE Access **2022**, 10, 76523–76536.
47. Tao, C.; Xue, J.; Zhang, Z.; Gao, Z. Parallel Deep Reinforcement Learning Method for Gait Control of Biped Robot. IEEE Trans. Circuits Syst. II Express Briefs **2022**, 69, 2802–2806.
48. Liu, C.; Audu, M.L.; Triolo, R.J.; Quinn, R.D. Neural Networks Trained via Reinforcement Learning Stabilize Walking of a Three-Dimensional Biped Model with Exoskeleton Applications. Front. Robot. AI **2021**, 8, 253.
49. Liu, C.; Lonsberry, A.G.; Nandor, M.J.; Audu, M.L.; Lonsberry, A.J.; Quinn, R.D. Implementation of Deep Deterministic Policy Gradients for Controlling Dynamic Bipedal Walking. Biomimetics **2019**, 4, 28.
50. Huang, C.; Wang, G.; Zhou, Z.; Zhang, R.; Lin, L. Reward-Adaptive Reinforcement Learning: Dynamic Policy Gradient Optimization for Bipedal Locomotion. IEEE Trans. Pattern Anal. Mach. Intell. **2022**, 1–10.
51. Jang, J.-S.R. ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Trans. Syst. Man Cybern. **1993**, 23, 665–685.
52. Van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. Proc. Conf. AAAI Artif. Intell. **2016**, 30, 2094–2100.
53. Li, T.-H.S.; Kuo, P.-H.; Chen, L.-H.; Hung, C.-C.; Luan, P.-C.; Hsu, H.-P.; Chang, C.-H.; Hsieh, Y.-T.; Lin, W.-H. Fuzzy Double Deep Q-Network-Based Gait Pattern Controller for Humanoid Robots. IEEE Trans. Fuzzy Syst. **2020**, 30, 147–161.
54. Wong, C.-C.; Liu, C.-C.; Xiao, S.-R.; Yang, H.-Y.; Lau, M.-C. Q-Learning of Straightforward Gait Pattern for Humanoid Robot Based on Automatic Training Platform. Electronics **2019**, 8, 615.
55. Webots User Guide, ROBOTIS' Robotis OP2. 2021. Available online: https://cyberbotics.com/doc/guide/robotis-op2 (accessed on 15 September 2021).
56. Narváez, F.; Árbito, F.; Proaño, R. A Quaternion-Based Method to IMU-to-Body Alignment. In Proceedings of the DMH 2018: International Conference on Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management, Las Vegas, NV, USA, 15–20 July 2018; pp. 217–231.
57. Athans, M. Kalman Filtering. In The Control Systems Handbook: Control System Advanced Methods, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2010; pp. 295–306.
58. Han, T.R.; Paik, N.J.; Im, M.S. Quantification of the path of center of pressure (COP) using an F-scan in-shoe transducer. Gait Posture **1999**, 10, 248–254.
59. Galanis, G.; Anadranistakis, M. A one-dimensional Kalman filter for the correction of near surface temperature forecasts. Meteorol. Appl. **2002**, 9, 437–441.
60. Ha, I.; Tamura, Y.; Asama, H. Gait pattern generation and stabilization for humanoid robot based on coupled oscillators. In Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA, 25–30 September 2011; pp. 1737–1742.

**Figure 9.** Relationship between ZMP and CoP: (**a**) dynamically stable gait, (**b**) dynamically unstable gait.

**Figure 10.** Robotis-OP2’s FSRs: (**a**) layout and numbering of forces (top view), (**b**) placement in Webots.
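The force-sensing resistors (FSRs) in Figure 10 are what make the CoP measurable: the CoP is the force-weighted average of the sensor contact points. A sketch of that computation (the sensor layout and readings below are illustrative, not the actual Robotis-OP2 geometry):

```python
import numpy as np

def center_of_pressure(forces, positions):
    """CoP as the force-weighted average of the FSR contact points.
    `forces` holds the FSR readings (N); `positions` their x-y
    coordinates on the sole (mm)."""
    forces = np.asarray(forces, dtype=float)
    positions = np.asarray(positions, dtype=float)
    total = forces.sum()
    if total <= 0.0:        # foot not in contact with the ground
        return None
    return (forces[:, None] * positions).sum(axis=0) / total

# Four hypothetical FSRs at the sole corners (mm):
pos = [(-25.0, 45.0), (25.0, 45.0), (-25.0, -45.0), (25.0, -45.0)]
cop = center_of_pressure([2.0, 2.0, 1.0, 1.0], pos)
# heavier load on the front sensors shifts the CoP forward: cop = [0, 15]
```

Comparing this measured CoP with the model-based ZMP is the stability check illustrated in Figure 9.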

**Figure 20.** Default parameters in the “config.ini” file [55].

**Figure 21.** Comparison of orientation angles calculated with the Kalman filter: (**a**) pitch angle, (**b**) roll angle.
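The orientation angles in Figure 21 come from fusing the gyroscope and accelerometer with a Kalman filter (cf. the one-dimensional filter of [59]). A generic scalar sketch of that fusion; the noise covariances `q` and `r` here are illustrative placeholders, not the paper’s tuned values:

```python
class Kalman1D:
    """Scalar Kalman filter: the gyro-integrated angle is the
    prediction, the accelerometer angle is the measurement."""
    def __init__(self, q=0.01, r=1.0):
        self.q, self.r = q, r   # process / measurement noise variances
        self.x = 0.0            # angle estimate (deg)
        self.p = 1.0            # estimate variance

    def update(self, gyro_rate, accel_angle, dt):
        # predict: integrate the gyroscope rate (deg/s)
        self.x += gyro_rate * dt
        self.p += self.q
        # correct: blend in the accelerometer angle (deg)
        k = self.p / (self.p + self.r)      # Kalman gain
        self.x += k * (accel_angle - self.x)
        self.p *= (1.0 - k)
        return self.x

kf = Kalman1D()
for _ in range(200):  # stationary robot: gyro reads 0, accel reads -13 deg
    angle = kf.update(gyro_rate=0.0, accel_angle=-13.0, dt=0.008)
# the estimate converges toward the accelerometer angle of -13 deg
```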

**Figure 22.** Comparison of ZMP values: (**a**) $\mathrm{ZMP}_{x}$, (**b**) $\mathrm{ZMP}_{y}$.

**Figure 23.** The robot’s initial walking states: (**a**) proposed framework, (**b**) the robot’s own walking algorithm.

**Figure 24.** Comparison of the real robot’s orientation angles during walking: (**a**) pitch angle, (**b**) roll angle.

| Axis | $\theta$ (rad) | d (mm) | a (mm) | $\alpha$ (rad) |
|---|---|---|---|---|
| 1 | $\theta_1$ | 0 | 0 | −π/2 |
| 2 | $\theta_2$ | 0 | 93 | 0 |
| 3 | $\theta_3$ | 0 | 93 | 0 |
| 4 | $\theta_4$ | 0 | 0 | π/2 |
| 5 | $\theta_5$ | 0 | 0 | 0 |
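The DH rows above can be chained into the leg’s forward kinematics. A sketch assuming the standard Denavit–Hartenberg convention (the paper’s exact frame assignment may differ); with all joints at zero, the two 93 mm links line up:

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform for one joint under the standard
    DH convention: Rz(theta) Tz(d) Tx(a) Rx(alpha)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

# (d, a, alpha) per axis from the table above, in mm and rad
DH_ROWS = [(0, 0, -np.pi / 2), (0, 93, 0), (0, 93, 0), (0, 0, np.pi / 2), (0, 0, 0)]

def leg_forward_kinematics(thetas):
    """Chain the five joint transforms; returns the 4x4 end pose."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(thetas, DH_ROWS):
        T = T @ dh_transform(theta, d, a, alpha)
    return T

T = leg_forward_kinematics([0.0] * 5)
# zero pose: position component is [186, 0, 0] (mm), i.e. the two links extended
```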

**Table 2.** Joint angle limits of Robotis-OP2 legs [55] (degrees).

| Joint | Right min. | Right max. | Left min. | Left max. | Selected min. | Selected max. |
|---|---|---|---|---|---|---|
| Ankle (roll) $\theta_1$ | −39 | 60 | −58 | 34 | −58 | 34 |
| Ankle (pitch) $\theta_2$ | −71 | 78 | −79 | 70 | −79 | 70 |
| Knee $\theta_3$ | 0 | 128 | −129 | 0 | 0 | 128 |
| Hip (pitch) $\theta_4$ | −101 | 25 | −28 | 96 | −96 | 25 |
| Hip (roll) $\theta_5$ | −57 | 58 | −57 | 53 | −57 | 53 |

| Parameter | Accelerometer | Gyroscope |
|---|---|---|
| $\mathrm{ADC}_{\mathrm{value}}$ | 0–1023 | 0–1023 |
| $\mathrm{V}_{\mathrm{ref}}$ | 3.3 V | 3.3 V |
| Bit rate | 1024 | 1024 |
| $\mathrm{V}_{\mathrm{zero}}$ | 1.65 V | 1.65 V |
| Sensitivity | 0.33 V/g | 0.0008 V/(degree/s) |
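The table’s parameters plug into the usual linear conversion for analog IMUs (assumed here, not quoted from the paper): the ADC count becomes a voltage, and the offset from the zero-g/zero-rate voltage is scaled by the sensitivity:

```python
def adc_to_physical(adc_value, sensitivity, v_ref=3.3, bit_rate=1024, v_zero=1.65):
    """Convert a raw 10-bit ADC reading to a physical quantity
    (g for the accelerometer, deg/s for the gyroscope)."""
    voltage = adc_value / bit_rate * v_ref    # ADC counts -> volts
    return (voltage - v_zero) / sensitivity   # volts -> g or deg/s

accel_g = adc_to_physical(768, sensitivity=0.33)       # -> 2.5 g
gyro_dps = adc_to_physical(512, sensitivity=0.0008)    # mid-scale -> 0 deg/s
```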


| Layer Name | Layer Type | Neuron Numbers | Activation Type |
|---|---|---|---|
| Input | Fully connected | 9 | - |
| Common FC | Fully connected | 512 | ReLU |
| Common FC | Fully connected | 256 | ReLU |
| FC1 for Value | Fully connected | 128 | ReLU |
| FC1 for Advantage | Fully connected | 128 | ReLU |
| FC2 for Value | Fully connected | 1 | Linear |
| FC2 for Advantage | Fully connected | 14,400 | Linear |
| Output | Fully connected | 14,400 | - |
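The value and advantage streams in the table are merged into Q-values by the dueling aggregation. The mean-subtracted form below is the one common in D3QN implementations; whether the paper uses mean or max subtraction is not stated, so treat this as a sketch:

```python
import numpy as np

def dueling_aggregate(value, advantages):
    """Combine the dueling streams:
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    return value + advantages - advantages.mean()

# Toy 4-action example (the real output layer has 14,400 actions):
v = 2.0
a = np.array([0.5, -0.5, 1.0, -1.0])
q = dueling_aggregate(v, a)
# mean(a) = 0 here, so q = v + a = [2.5, 1.5, 3.0, 1.0]
```

Subtracting the mean advantage makes the V/A decomposition identifiable, which stabilizes training when many of the 14,400 actions have similar values.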

| Hyperparameter | Value |
|---|---|
| Mini-batch size | 32 |
| Replay memory size | 15,000 |
| Discount factor $\gamma$ | 0.95 |
| Learning rate | 0.0001 |
| Initial exploration rate $\epsilon$ | 1 |
| Min. exploration rate $\epsilon_{min}$ | 0.01 |
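The two exploration-rate rows correspond to the ε-greedy policy in Algorithm 1: ε starts at 1 and is annealed toward 0.01. A sketch; the exponential decay factor below is an assumption, since the paper’s schedule is not given here:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decay_epsilon(epsilon, eps_min=0.01, decay=0.995):
    """Anneal epsilon multiplicatively, clamped at eps_min."""
    return max(eps_min, epsilon * decay)

action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)  # greedy -> action 1
```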

| Parameter | Discrete Values |
|---|---|
| $\mathrm{w}$ | [25 30 35 40 45 50 55 60 65 70] |
| $\mathrm{s}$ | [25 30 35 40 45 50 55 60 65 70 75 80] |
| $\mathrm{h}$ | [25 30 35 40 45 50] |
| DSR | [0.3 0.4 0.5 0.6] |
| ${\theta}_{\mathrm{r}}$ | [−3 −8 −13 −18 −23] |
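The 14,400-unit output layer of the D3QN matches the Cartesian product of these discrete values (10 × 12 × 6 × 4 × 5 = 14,400); the pairing of one network output per parameter combination is our reading of the two tables:

```python
from itertools import product

# Discrete gait-parameter values from the table above
W   = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
S   = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]
H   = [25, 30, 35, 40, 45, 50]
DSR = [0.3, 0.4, 0.5, 0.6]
THR = [-3, -8, -13, -18, -23]

# One discrete action = one (w, s, h, DSR, theta_r) combination
ACTIONS = list(product(W, S, H, DSR, THR))
assert len(ACTIONS) == 10 * 12 * 6 * 4 * 5 == 14_400
```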

| Parameter | Value | Unit |
|---|---|---|
| $\mathrm{w}$ | 50 | mm |
| $\mathrm{s}$ | 60 | mm |
| $\mathrm{h}$ | 30 | mm |
| DSR | 0.6 | - |
| ${\theta}_{\mathrm{r}}$ | −13 | degree |

| Method | Pitch min. (deg.) | Pitch max. (deg.) | Roll min. (deg.) | Roll max. (deg.) |
|---|---|---|---|---|
| Proposed framework | −13.446 | −12.995 | −1.964 | 2.332 |
| Robot’s walking algorithm | −13.457 | −7.126 | −10.033 | 12.233 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kaymak, Ç.; Uçar, A.; Güzeliş, C.
Development of a New Robust Stable Walking Algorithm for a Humanoid Robot Using Deep Reinforcement Learning with Multi-Sensor Data Fusion. *Electronics* **2023**, *12*, 568.
https://doi.org/10.3390/electronics12030568
