Article

Realization of a Human-like Gait for a Bipedal Robot Based on Gait Analysis †

1 Mechanical Engineering Program, Graduate School of Science and Technology, Meiji University, Kawasaki 214-8571, Japan
2 Graduate School of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan
* Authors to whom correspondence should be addressed.
This paper is an extended version of our paper published in the 26th International Conference Series on Climbing and Walking Robots and the Support Technologies for Mobile Machines (CLAWAR 2023), Florianópolis, Brazil, 2–4 October 2023.
Machines 2024, 12(2), 92; https://doi.org/10.3390/machines12020092
Submission received: 19 December 2023 / Revised: 19 January 2024 / Accepted: 23 January 2024 / Published: 25 January 2024
(This article belongs to the Special Issue The Latest Advances in Climbing and Walking Robots)

Abstract

There are many studies analyzing human motion. However, we do not yet fully understand the mechanisms of our own bodies. We believe that mimicking human motion and function with a robot will help us to deepen our understanding of humans. Therefore, we focus on the characteristics of the human gait, and our goal is to realize a human-like bipedal gait that lands on the heels and takes off from the toes. In this study, we focus on the kinematic synergy (planar covariation) of the lower limbs as a characteristic of the human gait. Planar covariation means that, when the elevation angles of the thigh, shank, and foot in the sagittal plane are plotted on three axes, the angular data lie on a single plane. We propose this feature as a reward for reinforcement learning. By introducing this reward, the bipedal robot achieved a human-like bipedal gait in which it lands on its heels and takes off from its toes. We also compared the learning results with those obtained when this feature was not used. The results suggest that planar covariation is one factor that characterizes a human-like gait.

1. Introduction

There are many studies that have analyzed the mechanisms and movements of the human body [1,2,3,4]. However, we do not yet fully understand the mechanisms and functions of our own bodies.
To date, robots have been used to analyze the mechanisms and movements of the human body from various angles. Kamikawa et al. developed a five-finger prosthetic hand with an underactuated mechanism optimized for human-like grasp force distribution that can grasp a variety of objects robustly [5]. This confirmed the effectiveness of human-like grasp force distribution in robust power grasping with a five-finger prosthetic hand. Hashimoto et al. developed a human-like foot mechanism for WABIAN-2R that mimics the elastic properties of the arch of the human foot and the changes in arch height during walking in order to clarify the function of the foot arch structure [6,7]. Several walking experiments were conducted using a gait stabilization control based on gait analysis to quantitatively clarify the function of the arch structure [8]. Cheng et al. developed the Computational Brain (CB) [9], which was created to explore the fundamental processing of the human brain while dealing with the real world. Ude et al. focused on visual attention models, saccadic eye movements, and the vestibulo-ocular reflex (VOR), and implemented them in CB, thereby achieving eye movement responses comparable to those of humans [10]. Kryczka et al. focused on the tendency of humans to naturally stabilize their head posture while walking and proposed a head stabilization controller based on an IMU installed in the head of KOBIAN [11,12]. Using this control, the vibration of the robot's head was shown to be reduced during walking and when subjected to large-amplitude vibrations. To clarify the relationship between torso motion during static and dynamic walking and the center of pressure (CoP) under each foot, Ferreira et al. used four force sensors under each foot to obtain the CoP, confirming the influence of the human torso angle during walking on the CoP position [13]. Based on these previous studies, we believe that mimicking humans, not only in appearance but also in movement and function, will help to deepen our understanding of humans.
In this study, we aim to understand the characteristics of the human gait by applying them to a bipedal robot and to realize a human-like bipedal gait in which the robot lands on its heels and takes off from its toes. We focus on kinematic synergy in the lower limbs and aim to realize a human-like bipedal gait by reproducing this kinematic synergy in simulation using deep reinforcement learning.
This article is an extended version of the concept first introduced in [14] and published in the proceedings of the 26th International Conference on Climbing and Walking Robots (CLAWAR 2023). The paper is organized as follows: Section 2 describes the characteristics of the human gait, the simulation environment, and the control method used in this study; Section 3 evaluates the gait locomotion and learning results; Section 4 summarizes the results and discusses future work.

2. Materials and Methods

2.1. Characteristics of the Human Gait

In this study, we focus on the kinematic synergy [15] (planar covariation) highlighted by Borghese et al. as a characteristic of the human gait. Planar covariation means that, when the elevation angles of the thigh, shank, and foot in the sagittal plane of the human body are plotted on three axes, the angular data lie on a single plane. This allows humans to reduce and control the redundant degrees of freedom of the joints. This kinematic synergy does not depend on walking speed, and it is observed when walking on hard or soft surfaces as well as when climbing stairs or hopping [16]. The contribution of the elevation angles during human walking on hard and soft surfaces was determined through principal component analysis and is shown in Table 1. The first and second principal components are the basis vectors of the approximate plane, and the third principal component is its normal vector. It is known that the cumulative contribution ratio up to the second principal component is more than 99% in humans during walking and that the first and second principal components do not differ significantly between the left and right legs. Planar covariation is seen not only in humans but also in other mammals, such as cats and Japanese macaques [17], so it is considered an essential characteristic of the gait.
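As a minimal sketch of how this planarity can be quantified, the contribution ratios reported in Table 1 can be reproduced from elevation-angle data with a principal component analysis; the NumPy-based function and the synthetic data below are illustrative placeholders, not the measurement pipeline used in the cited studies.

```python
import numpy as np

def planar_covariation_contributions(elev_angles: np.ndarray) -> np.ndarray:
    """Return the contribution ratio of each principal component.

    elev_angles: (n_samples, 3) array whose columns are the thigh, shank,
    and foot elevation angles over one or more gait cycles.
    """
    centered = elev_angles - elev_angles.mean(axis=0)   # remove the mean posture
    cov = np.cov(centered, rowvar=False)                # 3x3 covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]             # eigenvalues, largest first
    return eigvals / eigvals.sum()                      # contribution ratios

# Synthetic example: a nearly planar point cloud (foot angle ~ linear in thigh/shank)
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
thigh, shank = 20 * np.sin(t), 30 * np.sin(t - 0.5)
foot = 0.8 * thigh - 1.1 * shank + rng.normal(0, 0.5, t.size)
ratios = planar_covariation_contributions(np.column_stack([thigh, shank, foot]))
print(ratios)  # the first two components should account for ~99% of the variance
```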
Figure 1 shows the definition of the elevation angles at the thigh, shank, and foot. As a concrete example, Figure 2 plots the elevation angles at the thigh, shank, and foot in 3-D space for walking on hard and soft surfaces. As shown in Figure 2, the three elevation angles do not change significantly at heel strike and toe-off during the human gait, which indicates that the leg motion during walking is cyclically repeated.
Although planar covariation has been used for robot control [18,19], these approaches often combine multiple controllers. This study aims to realize a human-like gait by using planar covariation as a reward function for reinforcement learning. The reward function is based on planar covariation and does not depend on other control elements. This approach is expected to verify the hypothesis that planar covariation reflects the kinematic synergies of a human-like gait.

2.2. Simulation Environment

In this study, we used PyBullet Gymperium (version 0.1) [20], open-source software based on the physics simulator PyBullet [21]. We use the Walker2DPyBulletEnv-v0 environment, which is designed to make a bipedal robot walk, and the Walker2D model as the bipedal robot. This robot model has three joints (hip, knee, and ankle) in each of its two legs, and the six joints in total are controlled by applying torque. Table 2 shows the range of motion of each joint, which is based on that of a human [22].
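For reference, a minimal setup of this environment might look like the sketch below; it assumes the pybullet-gym package is installed (imported as pybulletgym) and uses the classic Gym API that PyBullet Gymperium follows, with a random policy standing in for the learned controller.

```python
import gym
import pybulletgym  # noqa: F401  (registers the PyBullet Gymperium environments with gym)

# Walker2D: 6 torque-controlled joints (hip, knee, and ankle on each leg)
env = gym.make("Walker2DPyBulletEnv-v0")
obs = env.reset()

done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()           # placeholder for the learned Gaussian policy
    obs, reward, done, info = env.step(action)   # torque commands for the 6 joints
    total_reward += reward
print("episode return:", total_reward)
```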

2.3. Control Method

2.3.1. Deep Reinforcement Learning

The main walking control methods for bipedal robots are model-based control and learning-based control. In this study, we use learning-based control, which has potential for future development and has been actively studied [23,24,25,26].
Humans learn to walk as infants by adjusting their movements through trial and error according to their abilities and the characteristics of the terrain [27]. This learning process is considered similar to the process of reinforcement learning, in which the actions that maximize future value are learned through trial and error. The neural networks used in deep learning are said to mimic the mechanism of human neurons [28]. For these reasons, we believe that, among learning-based controls, deep reinforcement learning can be employed to incorporate human gait characteristics as rewards for acquiring a human-like bipedal gait in various environments.
We first explain reinforcement learning. In reinforcement learning, the agent executes actions according to the policy, and then, the environment returns rewards to the agent as feedback based on the state of the position, velocity, posture, joint angles, etc., and the agent’s choice of action. The agent improves the policy so that the cumulative reward received in the sequence of actions becomes larger. The above is repeated to find the optimal solution.
Conventional reinforcement learning has the problem that it can only handle a discrete action space. Therefore, we combine the high function-approximation ability of neural networks with the action selection of reinforcement learning. As a result, a continuous action space can be learned directly, without artificial discretization, and a system that responds more flexibly to changes in the state can be realized.
In this study, we use the policy gradient method, in which the policy is a function expressed by parameters and can be learned directly by updating those parameters. Moreover, by using the REINFORCE algorithm and a Gaussian policy, the policy gradient method can be applied to a continuous action space [29].

2.3.2. Policy Gradient Methods

The update equation of the policy parameter $\theta$ can be expressed as follows:
$$\theta_{t+1} = \theta_t + \alpha \nabla_\theta J(\theta_t)$$
where $\alpha$ is the learning rate factor, and $\nabla_\theta$ denotes the vector of partial derivatives with respect to $\theta$, where $\theta$ is a multidimensional vector.
In the policy gradient method, we consider the problem of maximizing the objective function, the expected return $J(\theta)$. We define the objective function $J(\theta)$ as the value function $v_\pi(s_0)$ calculated under the policy $\pi(a \mid s_0, \theta)$ at the start of learning, since this is the expected return at state $s_0$.
$$J(\theta) = v_\pi(s_0) \equiv \mathbb{E}_\pi\!\left[ G_t \mid S_t = s_0 \right]$$
where $G_t$ is the cumulative discounted reward, and $S_t$ is the state variable at time $t$. The gradient $\nabla_\theta J(\theta)$ with respect to the policy parameters is expressed as follows for any policy $\pi(a \mid s, \theta)$ that is differentiable with respect to the parameter $\theta$ and for the objective function $J(\theta)$ defined by Equation (2) [30].
$$\nabla_\theta J(\theta) = \mathbb{E}_\pi\!\left[ \nabla_\theta \log \pi(a \mid s, \theta)\, q_\pi(s, a) \right]$$
where $a$ is the action and $q_\pi(s, a)$ is the action value function. Two problems arise in calculating Equation (3): it is difficult to compute the expected value, and an estimate of the action value function is required.
The solution to the first problem is to use Monte Carlo approximation, which is a method for approximating the expected value. This allows the expected value to be calculated even when the probability distribution is complex. Based on the probabilistic policy $\pi$, actions are executed for $T$ steps, and the gradient is approximated from the resulting observations of states, actions, and rewards.
$$\nabla_\theta J(\theta) \approx \frac{1}{T} \sum_{t=0}^{T-1} \nabla_\theta \log \pi(A_t \mid S_t, \theta)\, q_\pi(S_t, A_t)$$
The solution to the second problem is to use the REINFORCE algorithm, which does not require direct estimation of the action value function but approximates it with the cumulative discounted reward actually obtained. The action value function $q_\pi(s, a)$ in Equation (4) is approximated by the cumulative discounted reward $G_t$.
$$G_t = \sum_{k=1}^{T-t} \gamma^{k-1} R_{t+k}$$
$$\nabla_\theta J(\theta) \approx \frac{1}{T} \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(A_t \mid S_t)\, G_t$$
where $\gamma$ is the discount rate factor, and $R_t$ is the reward variable at time $t$. The REINFORCE algorithm reduces the variance of the action value function $q_\pi(s, a)$ by introducing a baseline, a function $b(s)$ that provides a reference for the action value function $q_\pi(s, a)$. This makes learning easier to converge. The equation with the baseline is shown below.
$$\nabla_\theta J(\theta) = \mathbb{E}_\pi\!\left[ \nabla_\theta \log \pi(a \mid s, \theta) \left( q_\pi(s, a) - b(s) \right) \right]$$
The value function $v_\pi(s)$ is, by definition, the average of the action value function $q_\pi(s, a)$ weighted by the policy probability $\pi(a \mid s)$. Therefore, the value function $v_\pi(s)$ is used as the baseline in this case. We define the advantage function, which represents the action value measured relative to its mean value, as follows.
$$A_\pi(s, a) = q_\pi(s, a) - v_\pi(s)$$
The expected value can be expressed by substituting Equation (8) into Equation (7) and computing the Monte Carlo approximation as follows.
$$\nabla_\theta J(\theta) = \mathbb{E}_\pi\!\left[ \nabla_\theta \log \pi(a \mid s, \theta)\, A_\pi(s, a) \right] \approx \frac{1}{T} \sum_{t=0}^{T-1} \nabla_\theta \log \pi(A_t \mid S_t, \theta)\, A_\pi(S_t, A_t)$$
where $A_t$ is the action variable at time $t$. The action value function $q_\pi(s, a)$ included in the advantage function is approximated by the cumulative discounted reward $G_t$ using the REINFORCE algorithm. Then, the update formula for $\theta$ is calculated as follows.
$$\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta)$$
$$\nabla_\theta J(\theta) = \nabla_\theta \log \pi(A_t \mid S_t, \theta)\, A_\pi(S_t, A_t)$$
$$A_\pi(s, a) = G_t - v_\pi(s)$$
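As a rough illustration of the update rule above (the last three equations), the sketch below performs one REINFORCE-with-baseline update from a single episode; it assumes PyTorch, and the policy and value networks together with their optimizers (RMSprop and Adam, as described in Section 2.3.3) are created elsewhere. It is a minimal sketch, not the authors' exact implementation.

```python
import torch

def reinforce_update(log_probs, rewards, values, policy_opt, value_opt, gamma=0.99):
    """One REINFORCE-with-baseline update from a single episode.

    log_probs: list of log pi(A_t | S_t, theta) tensors collected during the episode
    rewards:   list of scalar rewards R_{t+1}
    values:    list of state-value estimates v(S_t) from the value network
    """
    # Cumulative discounted rewards G_t, computed backwards through the episode
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)

    values = torch.stack(values).squeeze(-1)
    log_probs = torch.stack(log_probs)

    # Advantage A_t = G_t - v(S_t); detached so only the policy loss uses it
    advantages = returns - values.detach()

    # Policy loss: -(1/T) sum log pi(A_t|S_t) * A_t (minimizing ascends the gradient)
    policy_loss = -(log_probs * advantages).mean()
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()

    # Baseline (value network) loss: sum (v(S_t) - G_t)^2
    value_loss = ((values - returns) ** 2).sum()
    value_opt.zero_grad()
    value_loss.backward()
    value_opt.step()
    return policy_loss.item(), value_loss.item()
```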

2.3.3. Neural Network

In this study, we use two types of neural networks.
The first is a neural network used for the Gaussian policy. To turn the objective into a loss function, the derivative operator is removed from Equation (11) and the result is multiplied by −1, giving Equation (13). RMSprop is used as the optimization algorithm, and the parameters are updated to minimize this loss function. The input, hidden, and output layers are listed in Table 3.
$$-\log \pi(A_t \mid S_t, \theta)\, A_\pi(S_t, A_t)$$
The second is a neural network for estimating the state value function. This neural network is designed to output the cumulative discounted reward, using Equation (15) as the loss function. Adam is used as the optimization algorithm, and the parameters are updated to minimize this loss function. The input, hidden, and output layers are listed in Table 4.
$$\sum_{t=0}^{T-1} \left( V - G_t \right)^2$$
where $V$ denotes the output of this network.
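For concreteness, the two networks could be defined as below with the layer sizes of Tables 3 and 4 (state dimension 3 + 3 + 2 + 6 + 6 + 2 = 22); the activation functions and the softplus used to keep the variance positive are assumptions, since the paper does not specify them.

```python
import torch
import torch.nn as nn

STATE_DIM = 3 + 3 + 2 + 6 + 6 + 2  # body position/velocity/posture, joint angles/velocities, foot contact

class GaussianPolicy(nn.Module):
    """Policy network following Table 3 (activation choices are assumptions)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(STATE_DIM, 220), nn.ReLU(),
            nn.Linear(220, 114), nn.ReLU(),
            nn.Linear(114, 60), nn.ReLU(),
        )
        self.mean_head = nn.Linear(60, 6)  # mu(s) for the 6 joint torques
        self.var_head = nn.Linear(60, 6)   # sigma^2(s), made positive below

    def forward(self, state):
        h = self.body(state)
        return self.mean_head(h), nn.functional.softplus(self.var_head(h)) + 1e-6

class ValueNet(nn.Module):
    """State-value network following Table 4 (activation choices are assumptions)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 220), nn.ReLU(),
            nn.Linear(220, 33), nn.ReLU(),
            nn.Linear(33, 5), nn.ReLU(),
            nn.Linear(5, 1),
        )

    def forward(self, state):
        return self.net(state)
```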
A schematic diagram of the learning algorithm used in this study is shown in Figure 3. We define one step as selecting the next action in a given state according to the Gaussian policy and executing that action to obtain the next state and reward. One episode is defined as repeating this process until a termination condition is met: the bipedal robot falls over, or the maximum number of steps is reached. When the termination condition is met, the neural network of the state value function is updated based on the history of states, actions, and rewards, and the cumulative discounted reward is calculated. The advantage function is then updated, and the neural network of the Gaussian policy is updated. In this study, this learning process is repeated for 500,000 episodes.
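Putting the pieces together, the episode loop sketched in Figure 3 might be written as follows; it reuses GaussianPolicy, ValueNet, and reinforce_update from the sketches above, sample_action is the Gaussian sampling helper sketched in Section 2.3.4, and the learning rates are placeholders.

```python
import gym
import pybulletgym  # noqa: F401  (registers Walker2DPyBulletEnv-v0)
import torch

# GaussianPolicy, ValueNet: sketched above; reinforce_update: Section 2.3.2 sketch;
# sample_action: Gaussian sampling helper sketched in Section 2.3.4.
policy, value_net = GaussianPolicy(), ValueNet()
policy_opt = torch.optim.RMSprop(policy.parameters(), lr=1e-4)
value_opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)

env = gym.make("Walker2DPyBulletEnv-v0")
for episode in range(500_000):                     # 500,000 episodes, as in the paper
    obs, done = env.reset(), False
    log_probs, rewards, values = [], [], []
    while not done:                                # one step = one action under the policy
        state = torch.as_tensor(obs, dtype=torch.float32)
        mean, var = policy(state)
        action, log_prob = sample_action(mean, var)
        values.append(value_net(state))
        obs, reward, done, _ = env.step(action.detach().numpy())
        log_probs.append(log_prob)
        rewards.append(reward)
    # Termination (fall or max steps): update the value network and the Gaussian policy
    reinforce_update(log_probs, rewards, values, policy_opt, value_opt)
```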

2.3.4. Probabilistic Policy with the Gaussian Model

The Gaussian model is a typical example of a probabilistic policy for control in continuous space. This model samples a $K$-dimensional action vector $a$ from a $K$-dimensional normal distribution with mean $\mu(s)$ and covariance matrix $\Sigma(s)$ as parameters in state $s$. The equation is expressed as follows.
$$\pi(a \mid s, \theta) \propto \frac{1}{\sqrt{\det \Sigma(s)}} \exp\!\left( -\frac{1}{2} \left( a - \mu(s) \right)^{\top} \Sigma(s)^{-1} \left( a - \mu(s) \right) \right)$$
If the covariance matrix $\Sigma(s)$ is not diagonal, the different components of the action vector will interact. This complicates the implementation of the policy function in a neural network. Therefore, we assume that the components of the action vector are independent and that the covariance matrix consists only of diagonal components. Letting the $k$-th diagonal component of the covariance matrix be $\sigma_k^2(s)$, the $K$-dimensional normal distribution can be decomposed into independent one-dimensional normal distributions as follows.
$$\pi(a \mid s, \theta) = \prod_{k=1}^{K} \pi(a_k \mid s, \theta) \propto \prod_{k=1}^{K} \frac{1}{\sqrt{\sigma_k^2(s)}} \exp\!\left( -\frac{\left( a_k - \mu_k(s) \right)^2}{2 \sigma_k^2(s)} \right)$$
where $\mu_k(s)$ and $\sigma_k^2(s)$ are functions with state $s$ as input, obtained from the neural network introduced in Section 2.3.3.
Using the Gaussian policy, $\log \pi(A_t \mid S_t, \theta)$ in Equation (13) can be calculated as follows.
$$\log \pi(A_t \mid S_t, \theta) = \log \prod_{k=1}^{K} \pi(A_{t,k} \mid S_t, \theta) = \sum_{k=1}^{K} \log \pi(A_{t,k} \mid S_t, \theta) = \sum_{k=1}^{K} \left( -\frac{1}{2} \log \sigma_k^2(S_t) - \frac{\left( A_{t,k} - \mu_k(S_t) \right)^2}{2 \sigma_k^2(S_t)} \right)$$
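A minimal sketch of this calculation (and of sampling an action from the diagonal Gaussian), assuming PyTorch tensors for the mean and variance produced by the policy network:

```python
import torch

def gaussian_log_prob(action, mean, var):
    """log pi(A_t | S_t, theta) for a diagonal Gaussian policy.

    action, mean, var: tensors of shape (K,), with K = 6 joint torques.
    The additive constant -K/2 * log(2*pi) is omitted because it does not
    affect the gradient with respect to the policy parameters.
    """
    return torch.sum(-0.5 * torch.log(var) - (action - mean) ** 2 / (2 * var))

def sample_action(mean, var):
    """Sample a = mu(s) + sigma(s) * eps with eps ~ N(0, I) and return its log-probability."""
    std = torch.sqrt(var)
    action = mean + std * torch.randn_like(std)
    return action, gaussian_log_prob(action, mean, var)
```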

2.3.5. Rewards

This study used the following four rewards for reinforcement learning.
  • $r_{\text{alive}}$
This is the reward for keeping the upper body at a certain height and limiting the rotation of the body around the pitch axis. The equation is expressed as follows.
$$r_{\text{alive}} = \begin{cases} 1, & z_{\text{body}} > 0.8\ \mathrm{m} \ \text{and} \ \left| \theta_{\text{body}} \right| < 1.0\ \mathrm{rad} \\ -1, & \text{otherwise} \end{cases}$$
where $z_{\text{body}}$ is the height of the center of the body and $\theta_{\text{body}}$ is the pitch angle of the center of the body.
  • $r_{\text{progress}}$
This is the reward for the distance advanced by the center of the body. First, we find the distance $p_t$ from the position at time $t$ to the target position.
$$p_t = \sqrt{\left( x_d - x_t \right)^2 + \left( y_d - y_t \right)^2}$$
where $x_d$ and $y_d$ are the target positions in the $x$ and $y$ directions, respectively. The advanced distance $d$ is computed with the following equation.
$$d = p_t - p_{t-1}$$
The reward $r_{\text{progress}}$ is expressed by the following equation.
$$r_{\text{progress}} = k_{\text{progress}} \times d$$
where $k_{\text{progress}} = 121.2$ is the weight coefficient.
  • $r_{\text{joint}}$
This is the reward for ensuring that each joint does not exceed its range of motion. The reward $r_{\text{joint}}$ is computed with the following equation.
$$r_{\text{joint}} = -k_{\text{joint}} \times n_{\text{joint}}$$
where $k_{\text{joint}} = 0.1$ is the weight coefficient and $n_{\text{joint}}$ is the number of joints beyond the range of motion.
  • $r_{\text{planar}}$
This is the reward for planar covariation, which is a characteristic of the human gait. Since the kinematic synergy means that the elevation angles at the thigh, shank, and foot lie on a single plane, it is sufficient to ensure that the three calculated elevation angles are placed on a single plane. The application method and the calculation process are described in detail below.
First, we explain how the angles are read and when the reward is applied. The angle of each joint is read each time an action is executed according to the policy. When the heel of the bipedal robot lands on the ground, the reward $r_{\text{planar}}$ is computed and the angle data are reset.
Next, we explain how a plane is obtained from the angle readings. In this study, the plane is found using the least-squares method. The least-squares plane is the plane that minimizes the sum of squared distances to all points of a three-dimensional point cloud. The least-squares plane can be expressed as follows.
$$z = A x + B y + C$$
where $A$, $B$, and $C$ are coefficients. We obtain the unknowns $A$, $B$, and $C$ using the lower–upper (LU) decomposition method. The least-squares plane is obtained by substituting the calculated coefficients into Equation (23). In this study, $x$ is the elevation angle at the thigh, $y$ is the elevation angle at the shank, and $z$ is the elevation angle at the foot, so for simplicity, $\alpha$ is used for $x$, $\beta$ for $y$, and $\gamma$ for $z$. The fitted plane value $\hat{\gamma}$ is obtained by substituting each variable into Equation (23).
$$\hat{\gamma} = A \alpha + B \beta + C$$
The mean squared difference $\epsilon$ between the fitted plane and the elevation angle at the foot should be zero for the angle data to lie on a single plane. Therefore, we train so that $\epsilon$ is close to zero. $\epsilon$ and the reward $r_{\text{planar}}$ are expressed by the following equations, respectively.
$$\epsilon = \frac{\sum_{i=1}^{n_{\text{data}}} \left( \hat{\gamma}_i - \gamma_i \right)^2}{n_{\text{data}}}$$
$$r_{\text{planar}} = \begin{cases} k_{\text{planar}} \times \dfrac{0.5}{\epsilon}, & \epsilon \geq 1 \\ k_{\text{planar}} \times \left( 1 - 0.5\,\epsilon \right), & \epsilon < 1 \end{cases}$$
where $n_{\text{data}}$ is the number of angle data points, $\gamma_i$ is the foot elevation angle of the $i$-th data point, $\hat{\gamma}_i$ is the corresponding value on the fitted plane, and $k_{\text{planar}} = 3.0$ is the weight coefficient. Since planar covariation holds for each of the left and right legs, the reward in Equation (26) is applied to each leg separately.
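A sketch of this reward for one leg is given below; np.linalg.lstsq is used in place of an explicit LU decomposition of the normal equations (the plane coefficients are the same either way), and the piecewise form follows Equation (26) as given above.

```python
import numpy as np

K_PLANAR = 3.0  # weight coefficient from the paper

def planar_reward(thigh, shank, foot):
    """Planar covariation reward for one leg.

    thigh, shank, foot: 1-D arrays of elevation angles (alpha, beta, gamma)
    collected between two consecutive heel strikes.
    """
    alpha, beta, gamma = map(np.asarray, (thigh, shank, foot))

    # Fit gamma ~= A*alpha + B*beta + C by linear least squares
    design = np.column_stack([alpha, beta, np.ones_like(alpha)])
    (a, b, c), *_ = np.linalg.lstsq(design, gamma, rcond=None)
    gamma_hat = a * alpha + b * beta + c

    # Mean squared deviation of the foot angle from the fitted plane
    eps = np.mean((gamma_hat - gamma) ** 2)

    # Larger reward as the points come closer to a single plane (eps -> 0)
    if eps >= 1.0:
        return K_PLANAR * 0.5 / eps
    return K_PLANAR * (1.0 - 0.5 * eps)
```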

3. Results

3.1. Walking Motion

3.1.1. Proposed Method

Figure 4 shows the walking motion of one step when the proposed method described in Section 2.3.5 is executed, with the robot walking in order from (1) to (5). Figure 4 shows that the bipedal robot landed on its heels and took off from its toes during walking. The elevation angles at the thigh, shank, and foot are plotted in 3-D space, and the views from the side and from above are shown in Figure 5. The blue points in each graph represent the timing of the heel strike, the red points represent the timing of the toe-off, and the yellow line in Figure 5a represents the least-squares plane. The contribution of the elevation angles is shown in Table 5.

3.1.2. Comparative Method

We use the rewards of Walker2DPyBulletEnv-v0 for comparison with our proposed method. The rewards for the body posture, the distance advanced by the center of the body, and the range of motion of the joints are the same. Instead of the reward for planar covariation, we use deduction rewards that penalize excessive torque and excessive power consumption. The following equations are used for these rewards.
$$r_{\text{torque}} = -k_{\text{torque}} \times \frac{\sum_{i=1}^{6} \left( a_t^i \right)^2}{6}$$
$$r_{\text{electricity}} = -k_{\text{electricity}} \times \frac{\sum_{i=1}^{6} \left| a_t^i \times \omega_t^i \right|}{6}$$
where $a_t^i$ and $\omega_t^i$ are the action and angular velocity of joint $i$ at time $t$, and $k_{\text{torque}} = 0.1$ and $k_{\text{electricity}} = 2.0$ are the weight coefficients.
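A minimal sketch of these two deduction terms, assuming the per-joint actions and angular velocities are available as NumPy arrays:

```python
import numpy as np

K_TORQUE, K_ELECTRICITY = 0.1, 2.0  # weight coefficients from the paper

def deduction_rewards(action, omega):
    """Torque and power-consumption penalties used by the comparative method.

    action, omega: arrays of length 6 with each joint's action (torque command)
    and angular velocity at the current time step.
    """
    r_torque = -K_TORQUE * np.mean(np.square(action))
    r_electricity = -K_ELECTRICITY * np.mean(np.abs(action * omega))
    return r_torque, r_electricity
```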
The walking motion of one step when the comparison method was executed is shown in Figure 6, with the steps taken in the order (1) through (5). As shown in Figure 6, the bipedal robot walked with its entire foot landing parallel to the road surface. Figure 7 shows the elevation angles at the thigh, shank, and foot plotted in 3-D space, viewed from the side and from above with respect to the least-squares plane of the point cloud. The blue points in each graph represent the timing of the heel strike, the red points represent the timing of the toe-off, and the yellow line in Figure 7a represents the least-squares plane. The contribution of the elevation angle is shown in Table 6.

3.2. Learning Curve

Figure 8 and Figure 9 show the learning curves of the number of steps, the cumulative reward, and the loss function of the neural network representing the Gaussian policy in one episode for the proposed method and the comparison method, respectively. These figures show that the cumulative reward increases and the loss function decreases as learning progresses. However, the loss function in Figure 8 has not converged as well as that in Figure 9, suggesting that the learning is still insufficient.

4. Discussion and Future Work

This paper describes the gait control of a bipedal robot that aims to realize a human-like bipedal gait, in which the robot lands on its heels and takes off from its toes, using deep reinforcement learning in simulation. We achieved a human-like gait by incorporating planar covariation as part of the reinforcement learning reward.
The cumulative contribution ratio up to the second principal component is about 97% for the proposed method and 97~98% for the comparison method. However, the comparison method did not produce a human-like gait, suggesting that planar covariation has effects beyond the elevation angles at the thigh, shank, and foot simply lying on a single plane.
Comparing Figure 5 and Figure 7, it is clear that the proposed method does not significantly change the three elevation angles during heel strike and toe-off, as is the case with humans. On the other hand, in the comparative method, the red and blue points are unevenly distributed, suggesting that different movements are generated each time.
It is also known that the human gait consists of continuous, periodic, and symmetrical movements produced by a precise series of coordinated movements alternating between one leg and the other [31]. In humans and in the proposed method, the contribution ratio of each principal component does not differ significantly between the left and right legs. In the comparison method, on the other hand, the contribution ratios of the first and second principal components differ by nearly 15% between the left and right legs. This suggests that the comparison method cannot reproduce the symmetry of the movement of the left and right legs.
These results indicate that planar covariation contributes to the symmetry of the left and right legs and to keeping the three elevation angles consistent at heel strike and toe-off. In the future, we will consider not only the lower limbs but also the upper limbs, head, and other human characteristics, and we would like to explore coordinated whole-body movements that characterize a human-like gait.

Author Contributions

Conceptualization, J.Y., M.K., Y.S. and K.H.; methodology, J.Y., M.K., Y.S. and K.H.; software, J.Y., M.K. and Y.S.; validation, J.Y.; formal analysis, J.Y.; investigation, J.Y.; data curation, J.Y.; writing—original draft preparation, J.Y., M.K. and Y.S.; writing—review and editing, J.Y. and K.H.; visualization, J.Y.; supervision, K.H.; project administration, K.H.; funding acquisition, K.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by JSPS KAKENHI Grant Numbers JP20H04267 and JP21H05055. This work was also supported by a Waseda University Grant for Special Research Projects (Project number: 2023C-179).

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Acknowledgments

This study was conducted with the support of the Information, Production and Systems Research Center, Waseda University; Future Robotics Organization, Waseda University; and as a part of the humanoid project at the Humanoid Robotics Institute, Waseda University.

Conflicts of Interest

M.K. and Y.S. graduated from Meiji University in March 2023, and all authors declare no conflict of interest.

References

  1. Tomomitsu, M.S.V.; Alonso, A.C.; Morimoto, E.; Bobbio, T.G.; Greve, J.M.D. Static and dynamic postural control in low-vision and normal-vision adults. Clinics 2013, 68, 517–521. [Google Scholar] [CrossRef] [PubMed]
  2. Pozzo, T.; Berthoz, A.; Lefort, L. Head stabilization during various locomotor tasks in humans. Exp. Brain Res. 1990, 82, 97–106. [Google Scholar] [CrossRef]
  3. Venkadesan, M.; Yawar, A.; Eng, C.M.; Dias, M.A.; Singh, D.K.; Tommasini, S.M.; Haims, A.H.; Bandi, M.M.; Mandre, S. Stiffness of the human foot and evolution of the transverse arch. Nature 2020, 579, 97–100. [Google Scholar] [CrossRef] [PubMed]
  4. Bohm, S.; Mersmann, F.; Santuz, A.; Schroll, A.; Arampatzis, A. Muscle-specific economy of force generation and efficiency of work production during human running. eLife 2021, 10, e67182. [Google Scholar] [CrossRef] [PubMed]
  5. Kamikawa, Y.; Maeno, T. Underactuated Five-Finger Prosthetic Hand Inspired by Grasping Force Distribution of Humans. In Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, 22–26 September 2008; pp. 717–722. [Google Scholar]
  6. Ogura, Y.; Aikawa, H.; Shimomura, K.; Morishima, A.; Lim, H.O.; Takanishi, A. Development of a New Humanoid Robot WABIAN-2. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA), Orlando, FL, USA, 15–19 May 2006; pp. 76–81. [Google Scholar]
  7. Hashimoto, K.; Takezaki, Y.; Hattori, K.; Kondo, H.; Takashima, T.; Lim, H.O.; Takanishi, A. A Study of Function of Foot’s Medial Longitudinal Arch Using Biped Humanoid Robot. In Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Taipei, Taiwan, 18–22 October 2010; pp. 2206–2211. [Google Scholar]
  8. Hashimoto, K.; Takezaki, Y.; Motohashi, H.; Otani, T.; Kishi, T.; Lim, H.O.; Takanishi, A. Biped Walking Stabilization Based on Gait Analysis. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation (ICRA), Saint Paul, MN, USA, 14–18 May 2012; pp. 154–159. [Google Scholar]
  9. Cheng, G.; Hyon, S.H.; Morimoto, J.; Ude, A.; Colvin, G.; Scroggin, W.; Jacobsen, S.C. CB: A Humanoid Research Platform for Exploring NeuroScience. In Proceedings of the 6th IEEE-RAS International Conference on Humanoid Robots, Genova, Italy, 4–6 December 2006; pp. 182–187. [Google Scholar]
  10. Ude, A.; Wyar, V.; Lin, L.H.; Cheng, G. Distributed Visual Attention on a Humanoid Robot. In Proceedings of the 5th IEEE-RAS International Conference on Humanoid Robots, Tsukuba, Japan, 5 December 2005; pp. 381–386. [Google Scholar]
  11. Endo, N.; Takanishi, A. Development of whole-body emotional expression humanoid robot for ADL-assistive RT services. J. Robot. Mechatron. 2011, 23, 969–977. [Google Scholar] [CrossRef]
  12. Kryczka, P.; Falotico, E.; Hashimoto, K.; Lim, H.O.; Takanishi, A.; Laschi, C.; Dario, P.; Berthoz, A. A Robotic Implementation of a Bio-Inspired Head Motion Stabilization Model on a Humanoid Platform. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 2076–2081. [Google Scholar]
  13. Ferreira, J.P.; Crisóstomo, M.M.; Coimbra, A.P. Human Gait Acquisition and Characterization. IEEE Trans. Instrum. Meas. 2009, 58, 2979–2988. [Google Scholar] [CrossRef]
  14. Yamano, J.; Kurokawa, M.; Sakai, Y.; Hashimoto, K. Walking Motion Generation of Bipedal Robot Based on Planar Covariation Using Deep Reinforcement Learning. In Proceedings of the 26th International Conference on Climbing and Walking Robots (CLAWAR 2023), Florianópolis, Brazil, 2–4 October 2023. [Google Scholar]
  15. Borghese, N.A.; Bianchi, L.; Lacquaniti, F. Kinematic determinants of human locomotion. J. Physiol. 1996, 494, 863–879. [Google Scholar] [CrossRef] [PubMed]
  16. Ivanenko, Y.P.; d’Avella, A.; Poppele, R.E.; Lacquaniti, F. On the Origin of Planar Covariation of Elevation Angles During Human Locomotion. J. Neurophysiol. 2008, 99, 1890–1898. [Google Scholar] [CrossRef] [PubMed]
  17. Ogihara, N.; Kikuchi, T.; Ishiguro, Y.; Makishima, H.; Nakatsukasa, M. Planar covariation of limb elevation angles during bipedal walking in the Japanese macaque. J. R. Soc. Interface 2012, 9, 2181–2190. [Google Scholar] [CrossRef] [PubMed]
  18. Ha, S.S.; Yu, J.H.; Han, Y.J.; Hahn, H.S. Natural Gait Generation of Biped Robot based on Analysis of Human’s Gait. In Proceedings of the 2008 International Conference on Smart Manufacturing Application, Goyangi, Republic of Korea, 9–11 April 2008; pp. 30–34. [Google Scholar]
  19. Ghiasi, A.R.; Alizadeh, G.; Mirzaei, M. Simultaneous design of optimal gait pattern and controller for a bipedal robot. Multibody Syst. Dyn. 2010, 23, 410–429. [Google Scholar] [CrossRef]
  20. GitHub Benelot/Pybullet-Gym. Available online: https://github.com/benelot/pybullet-gym (accessed on 8 December 2023).
  21. PyBullet. Available online: https://pybullet.org/wordpress/ (accessed on 8 December 2023).
  22. Kapandji, A.I.; Owerko, C. The Physiology of the Joints: 2 The Lower Limb, 7th ed.; Handspring Publishing Limited: London, UK, 2019. [Google Scholar]
  23. Wang, S.; Chaovalitwongse, W.; Babuska, R. Machine learning algorithms in bipedal robot control. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2012, 42, 728–743. [Google Scholar] [CrossRef]
  24. Xie, Z.; Berseth, G.; Clary, P.; Hurst, J.; van de Panne, M. Feedback Control For Cassie With Deep Reinforcement Learning. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1241–1246. [Google Scholar]
  25. Tsounis, V.; Alge, M.; Lee, J.; Farshidian, F.; Hutter, M. DeepGait: Planning and Control of Quadrupedal Gaits Using Deep Reinforcement Learning. IEEE Robot. Autom. Lett. 2020, 5, 3699–3706. [Google Scholar] [CrossRef]
  26. Wang, Z.; Wei, W.; Xie, A.; Zhang, Y.; Wu, J.; Zhu, Q. Hybrid Bipedal Locomotion Based on Reinforcement Learning and Heuristics. Micromachines 2022, 13, 1688. [Google Scholar] [CrossRef]
  27. Adolph, K.E.; Bertenthal, B.I.; Boker, S.M.; Goldfield, E.C.; Gibson, E.J. Learning in the Development of Infant Locomotion. Monogr. Soc. Res. Child Dev. 1997, 62, 1–140. [Google Scholar] [CrossRef]
  28. Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-art in artificial neural network applications: A survey. Heliyon 2018, 4, e00938. [Google Scholar] [CrossRef] [PubMed]
  29. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  30. Sutton, R.S.; McAllester, D.; Singh, S.; Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Neural Information Processing Systems, Denver, CO, USA, 29 November–4 December 1999. [Google Scholar]
  31. Castermans, T.; Duvinage, M.; Cheron, G.; Dutoit, T. Towards Effective Non-Invasive Brain-Computer Interfaces Dedicated to Gait Rehabilitation Systems. Brain Sci. 2014, 4, 1–48. [Google Scholar] [CrossRef]
Figure 1. Definition of elevation angles at the thigh, shank, and foot.
Figure 2. This figure plots the elevation angles at the thigh, shank, and foot during walking in 3-D space; (a) plotted data for walking on a hard surface; (b) plotted data for walking on a soft surface.
Figure 3. A schematic diagram of the learning algorithm.
Figure 4. Walking motion with the proposed method. The red foot is the left side.
Figure 5. The elevation angles at the thigh, shank, and foot plotted in 3-D space using the proposed method; (a) viewed from the side with respect to the least-squares plane of the point cloud; (b) viewed from above with respect to the least-squares plane of the point cloud.
Figure 6. Walking motion with the comparative method. The red foot is the left side.
Figure 7. The elevation angles at the thigh, shank, and foot plotted in 3-D space using the comparative method; (a) viewed from the side with respect to the least-squares plane of the point cloud; (b) viewed from above with respect to the least-squares plane of the point cloud.
Figure 8. Learning curve for the proposed method. (a) number of steps per episode; (b) the cumulative reward obtained in one episode; (c) loss function of the neural network representing Gaussian policy.
Figure 9. Learning curve for the comparative method. (a) number of steps per episode; (b) the cumulative reward obtained in one episode; (c) loss function of the neural network representing Gaussian policy.
Table 1. The contribution of the elevation angle when humans walk on hard and soft surfaces.
Surface | 1st | 2nd | 3rd
Hard | 77.4% | 22.0% | 0.6%
Soft | 72.0% | 27.4% | 0.6%
Table 2. The joint’s range of motion.
Joint Name | Angle [deg]
Hip Joint | −20~135
Knee Joint | −140~0
Ankle Joint | −45~25
Table 3. Neural network for the Gaussian Policy.
Layer | Detail | Dimension
Input | Body position | 3
  | Body velocity | 3
  | Body posture | 2
  | Joint angle | 6
  | Joint angular velocity | 6
  | Foot ground state | 2
Hidden | Layer 1 | 220
  | Layer 2 | 114
  | Layer 3 | 60
Output | Mean | 6
  | Variance | 6
Table 4. Neural network for estimating the state value function.
Layer | Detail | Dimension
Input | Body position | 3
  | Body velocity | 3
  | Body posture | 2
  | Joint angle | 6
  | Joint angular velocity | 6
  | Foot ground state | 2
Hidden | Layer 1 | 220
  | Layer 2 | 33
  | Layer 3 | 5
Output | State value function | 1
Table 5. The contribution of the elevation angle using the proposed method.
Leg | 1st | 2nd | 3rd
Left | 70.0% | 27.1% | 2.9%
Right | 72.8% | 24.0% | 3.2%
Table 6. The contribution of the elevation angle using the comparative method.
Leg | 1st | 2nd | 3rd
Left | 64.9% | 33.3% | 1.8%
Right | 78.8% | 17.8% | 3.4%