Article

Just Don’t Fall: An AI Agent’s Learning Journey Towards Posture Stabilisation

by
Mohammed Hossny
*,† and
Julie Iskander
Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Geelong 3217, Australia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
AI 2020, 1(2), 286-298; https://doi.org/10.3390/ai1020019
Submission received: 6 May 2020 / Revised: 7 June 2020 / Accepted: 9 June 2020 / Published: 15 June 2020

Abstract

Learning to maintain postural balance while standing requires significant, fine coordination between the neuromuscular and sensory systems. It is one of the key contributing factors to fall prevention, especially in the older population. Using artificial intelligence (AI), we can similarly teach an agent to maintain a standing posture, and thus teach the agent not to fall. In this paper, we investigate the learning progress of an AI agent and how it maintains a stable standing posture through reinforcement learning. We used the Deep Deterministic Policy Gradient (DDPG) method and the OpenSim musculoskeletal simulation environment based on OpenAI Gym. During training, the AI agent learnt three policies. First, it learnt to maintain the Centre-of-Gravity and Zero-Moment-Point in front of the body. Then, it learnt to shift the load of the entire body onto one leg while using the other leg to fine tune the balancing action. Finally, it started to learn the coordination between the two pre-trained policies. This study shows the potential of using deep reinforcement learning in human movement studies. The learnt AI behaviour also exhibited attempts to achieve an unplanned goal because it correlated with the set goal (e.g., walking in order to prevent falling). The failed attempts to maintain a standing posture are an interesting by-product which can enrich fall detection and prevention research efforts.

Graphical Abstract

1. Introduction

Postural balance is one of the key contributing factors to fall prevention [1]. Despite the common misconception, maintaining a standing posture requires substantial coordination between the neuromuscular and sensory systems, and involves the coordinated activity of many muscles [2,3]. In contrast, walking has been described as controlled falling [4]. Walking, which resembles a swinging pendulum, utilises the potential energy stored by the upper body and actuated by gravity; walking efficiency is maintained by the effective interchange between potential and kinetic energy [4,5,6]. In addition, from a standing posture, gait is initiated by de-innervating the muscles responsible for maintaining balance, which causes the body to fall forwards, after which a series of coordinated protective steps is performed [7,8]. In postural balance, walking, running and other human movements, the coordinated motor control of the neuro-musculoskeletal units (muscles) plays an essential role [9].
Studies of the biomechanical and physiological aspects of human movement have been conducted to understand the neural control needed to maintain a desired type of movement [4,6,10,11,12,13]. Most of these studies involved recruiting human participants and collecting data using motion capture, force plates and/or electromyography (EMG) electrodes. Biomechanics analysis platforms such as OpenSim [14] are also used extensively in human movement studies [15,16,17,18,19,20,21,22]. The main challenge facing this type of study is that it involves human participants, which entails ethical and safety considerations to maintain the safety and well-being of the participants. Simulation-based studies using OpenSim provide a solution that does not require human participants [14]; OpenSim has been used to predict movements such as walking [23] and jumping [24].
Recently, machine learning techniques have been used alongside statistical analysis tools in human movement studies, with emphasis on classification, prediction and estimation tasks [25]. Bipedal and quadrupedal robot locomotion has also been studied extensively, and the use of different machine learning techniques has been investigated [26,27,28]. Reinforcement learning (RL) [29], and especially deep reinforcement learning (DRL), which utilises deep learning techniques, is slowly but steadily providing solutions to many simulation studies [30,31,32,33]. Reinforcement learning involves observing an environment and learning to perform actions that maximise a score [29]. Learning to maximise the score is done through exploration of the environment and exploitation of past experience, and the challenge is to balance exploration against exploitation. Continuous action spaces still pose a bigger challenge for agents, but more sophisticated learners have been developed to address it [34,35,36,37]. Recently, DRL was used to train a human musculoskeletal model, based on OpenSim models, to walk and run [31,32,33] and even to walk with a prosthetic leg [30]. Pilotto et al. discussed the importance of technology in the advancement of geriatrics and the prevention of falls in the elderly population [38]. While the results of Lee et al. demonstrate posture stability, their solution presented in [39] relied on providing a reference motion collected from participants. In our work, we focused on self-induced motion leading to posture stabilisation. In this paper, we discuss the use of Artificial Intelligence (AI), and especially DRL, to learn how not to fall. More specifically, we investigate the learning progress of an AI agent towards maintaining a standing posture. We adopted a modular design of the control neural network by separating the observation from the policy. We also used multiple policies, each trained separately, as well as a coordination policy to coordinate between the learnt policies. The rest of this paper is organised as follows. Section 2 describes the materials and methods used in this study. Section 3 reports the results. Section 4 discusses the behaviour learnt by the AI agent. Finally, Section 5 draws conclusions and introduces future directions.

2. Materials and Methods

In this work, we focus our efforts on studying the muscle control strategies an AI agent can learn in order to prevent falling. This is achieved using biomechanical simulation in OpenSim and reinforcement learning with deep neural networks. The biomechanical simulation serves as the environment, and the deep neural network serves as the motor control part of the brain.

2.1. Biomechanical Simulation Environment

The environment used is based on OpenSim [31] and OpenAI Gym [40]. The human musculoskeletal model is based on previous work presented in [41,42,43], where the model is made up of seven body parts. The head, neck, torso and pelvis are represented as a single body for simplicity; in addition, each leg is represented by three body parts: upper leg, lower leg and foot. The model has 14 degrees-of-freedom (DoF): 6 DoFs (3 rotational and 3 translational) for the pelvis, 2 rotational DoFs for each hip, 1 rotational DoF for each knee and, finally, 1 rotational DoF for each ankle. The model is actuated by 22 muscles [44], 11 for each leg. The muscles actuating each leg include the hip adductor, hip abductor, hamstrings, biceps femoris, gluteus maximus, iliopsoas, rectus femoris, vastus intermedius, gastrocnemius, soleus, and tibialis anterior. In addition, contact with the ground is modelled using the Hunt-Crossley model [45]. As shown in Figure 1, for each foot, contact spheres are positioned at the heel and toes, and a rectangular contact plane is placed over the ground. Force is generated when the objects come into contact, and depends on the velocity of the collision and the depth of penetration of the contact objects [31].
The observation fed to the AI agent included 100 values covering ground reaction forces, pelvis velocities, joint angles and muscle state. These readings are grouped as listed in Table 1. Three additional values, random numbers summarising a velocity vector field provided by the environment, were incorporated to introduce a randomisation factor into the AI training. Typically, the AI agent should learn how to progress within the environment by observing the score provided by the environment. This score reflects how well or poorly the AI agent behaved towards achieving the desired goal. It is worth noting that the AI agent is completely oblivious to what the desired goal is and thus must learn based on the praise it receives from the environment (much like a toddler learning to stand and walk). The scoring function was designed to provide a positive reward for having contact with the ground via $F_g$ while penalising undesired actions resulting in changes in pelvis height $H$ and velocity $||\dot{p}||$, $||\dot{\theta}_p||$. In order to maintain a standing posture, the score was also penalised by the magnitude of, and change in, the joint angles $||\theta_J||$, $||\Delta\theta_J||$. The score also includes a penalty to prevent the scissor-legs posture described in [30]; this was achieved by penalising the adduction angles $\theta_{add_J}$ of both legs. The muscular state was not incorporated in the score function. The scoring function is then formulated as
$$R_{GRF} = \begin{cases} 1, & ||F_g|| > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

$$R_{Alive} = 1 \qquad (2)$$

$$Cost = ||m_a|| + 8(H - H_s)^2 + 8\,||\dot{p}|| + 8\,||\dot{\theta}_p|| + 64\,||\theta_J|| + 64\,||\Delta\theta_J|| + 8\gamma^2 + 8\phi^2 + 512\,\theta_{add_J} \qquad (3)$$

$$Norm.Score = R_{Alive} + R_{GRF} - Norm.Cost \qquad (4)$$
where $H_s$ is the baseline pelvis height when standing, and $||\cdot||$ is the Euclidean norm. All angles were normalised by $\pi$. The $Norm.Cost$ term is the normalised cost using the sum of the weights.
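For concreteness, a minimal sketch of this scoring function is given below. The observation keys, and the interpretation of $\gamma$ and $\phi$ as pelvis roll and pitch, are illustrative assumptions; the actual environment computes the score internally from its own state.

```python
import numpy as np

# Hypothetical helper: the observation keys below are illustrative and do not
# correspond to the actual field names exposed by the simulation environment.
def normalised_score(obs, h_standing):
    """Sketch of Equations (1)-(4). Angles are assumed to be pre-normalised by pi."""
    r_grf = 1.0 if np.linalg.norm(obs["grf"]) > 0 else 0.0       # Equation (1)
    r_alive = 1.0                                                # Equation (2)

    cost = (np.linalg.norm(obs["muscle_activation"])             # ||m_a||
            + 8.0 * (obs["pelvis_height"] - h_standing) ** 2     # 8 (H - H_s)^2
            + 8.0 * np.linalg.norm(obs["pelvis_lin_vel"])        # 8 ||p_dot||
            + 8.0 * np.linalg.norm(obs["pelvis_ang_vel"])        # 8 ||theta_p_dot||
            + 64.0 * np.linalg.norm(obs["joint_angles"])         # 64 ||theta_J||
            + 64.0 * np.linalg.norm(obs["joint_angle_deltas"])   # 64 ||delta theta_J||
            + 8.0 * obs["pelvis_roll"] ** 2                      # 8 gamma^2 (assumed roll)
            + 8.0 * obs["pelvis_pitch"] ** 2                     # 8 phi^2 (assumed pitch)
            + 512.0 * obs["hip_adduction"])                      # 512 theta_add_J

    weights_sum = 1 + 8 + 8 + 8 + 64 + 64 + 8 + 8 + 512          # normalise the cost
    return r_alive + r_grf - cost / weights_sum                  # Equation (4)
```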

2.2. Artificial Neural Network Model

For any AI agent, the choice of artificial neural network (ANN) is largely affected by the inputs provided by the environment, for several reasons. First, the dynamic range of the input values differs based on the measured phenomenon. For example, joint angles range over $[-\pi, +\pi]$ and increase on a spherical scale, while muscle length is normalised between $[0, 1]$ and increases linearly. Even with normalisation of joint angles as $\bar{\theta}_i = \theta_i / \pi$, the rate of change remains different. The second challenge was how to prevent information leakage between different measured values during the training of the ANN. To address these two challenges, we designed smaller neural networks to act as mini-observers, each trained to encode the input values of the phenomena sharing the same dynamic range and purpose in the simulation. The outputs of these mini-observers were then concatenated into one encoded output, as sketched below.
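A minimal PyTorch sketch of such mini-observers is given below. The layer widths, latent size and grouping of modalities (taken loosely from Table 1) are illustrative assumptions, not the exact configuration used in this work.

```python
import torch
import torch.nn as nn

class MiniObserver(nn.Module):
    """Small MLP that encodes one observation modality into a compact latent code."""
    def __init__(self, in_dim, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, latent_dim), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

# One mini-observer per modality, so that values with different dynamic ranges
# are encoded separately and do not leak into each other's representations.
observers = nn.ModuleDict({
    "grf": MiniObserver(6),            # ground reaction forces
    "pelvis": MiniObserver(9),         # pelvis orientation and velocities
    "joint_angles": MiniObserver(8),
    "joint_deltas": MiniObserver(8),
    "muscles": MiniObserver(66),       # actuation, force and length (22 each)
})

def encode(obs_by_modality):
    # Concatenate the per-modality latent codes into one encoded observation.
    return torch.cat([observers[k](v) for k, v in obs_by_modality.items()], dim=-1)
```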
The complexity of the task also affected the design of the ANN. Specifically, during training, the neural network attempts to minimise the error between the estimated output and the target output via calculated update gradients. These gradients update all the parameters of the neural network at once, which prevents the network from solving sub-tasks (e.g., joint flexion and extension) on the way to solving the desired task (e.g., do not fall). There are two solutions to this problem. The first is to enable only certain parts of the neural network to train while locking the rest of the network. The other, which is bio-inspired, is to expand the network as needed. In this work, we manually adopted the second solution: we trained each policy separately with different initialisation parameters and then trained a coordination policy to derive a mixture of the actuation signals from the different policies. Ideally, this approach should be automated using the Neuro-Evolution of Augmenting Topologies (NEAT) algorithm [46]. NEAT is an evolutionary algorithm which relies on generating several MLP architectures and harnessing the power of mutation and cross-over for exploration over thousands of generations. However, the NEAT algorithm is more suited to figuring out the topology of a single MLP (i.e., a policy). To the best of our knowledge, there is no evidence that it can be expanded to an entire deep neural network architecture, especially given the computationally expensive training and the vast amount of data and/or trials required. While Uber is now investigating the expansion of the NEAT algorithm to deep learning [47], meta-learning research [48] is perhaps the closest approach to expanding the NEAT algorithm to an entire deep learning architecture.
To that end, as shown in Figure 1, the AI model proposed in this work consists of three types of neural networks: mini-observer networks (one for each modality provided by the environment), policy networks (one for each sub-task) and a coordination network to combine the actions from the different policies into a final actuation signal. A sketch of this actor is given below.
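The sketch below shows one way these components could be wired together, continuing the mini-observer sketch above. A single mixing weight per policy (normalised by a softmax) is an assumption; the coordination network could equally produce per-muscle weights. Hidden sizes are illustrative.

```python
class CoordinatedActor(nn.Module):
    """Actor of Figure 1: a pool of policy networks whose actions are mixed by
    a coordination network. Hidden sizes are illustrative assumptions."""
    def __init__(self, encoded_dim, n_muscles=22, n_policies=2):
        super().__init__()
        self.policies = nn.ModuleList([
            nn.Sequential(nn.Linear(encoded_dim, 64), nn.ReLU(),
                          nn.Linear(64, n_muscles), nn.Sigmoid())
            for _ in range(n_policies)
        ])
        self.coordinator = nn.Sequential(
            nn.Linear(encoded_dim, 32), nn.ReLU(),
            nn.Linear(32, n_policies), nn.Softmax(dim=-1),
        )

    def forward(self, encoded_obs):
        # Each policy proposes muscle excitations in [0, 1].
        actions = torch.stack([p(encoded_obs) for p in self.policies], dim=-2)
        # The coordinator assigns a weight to each policy given the current state.
        weights = self.coordinator(encoded_obs).unsqueeze(-1)
        # The final action is the weighted mixture of the policies' actions.
        return (weights * actions).sum(dim=-2)
```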

2.3. Reinforcement Training Procedure

Due to the dynamic nature of the problem, we chose to utilise deep reinforcement learning. We adopted the Deep Deterministic Policy Gradient (DDPG) method because of its impressive results in continuous action spaces [34]. The DDPG model consists of an actor network (described above) and a critic network to evaluate the actions produced by the actor in relation to the environment. The critic network takes the actions produced by the actor network and the values obtained from the environment and produces a score [34,49,50]. In our setup, the actor is the entire coordinated multi-policy shown in Figure 1, while the critic is a classical multi-layer perceptron (MLP) neural network which takes the observations and the coordinated action as input and produces an estimated score. The estimated score is then compared to the actual score reported by the environment; a minimal critic sketch is given below.
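The following sketch assumes the 100-value observation and 22-muscle action described in Section 2.1; layer sizes are illustrative assumptions.

```python
class Critic(nn.Module):
    """MLP critic: estimates the score (Q-value) for an observation-action pair."""
    def __init__(self, obs_dim=100, act_dim=22):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, obs, action):
        # Concatenate observation and coordinated action, then estimate the score.
        return self.net(torch.cat([obs, action], dim=-1))
```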
In our experiments, we altered the training algorithm to suit the incremental expansion of the policy neural networks. The proposed training was carried out in two stages: action policy training and policy aggregation. We started by training two policy networks for 10,000 steps (roughly 100 episodes). Both policies were initialised with different seeds in order to obtain diversity in the outcome. During training, the neural network model with the best score was stored. In the second stage, both action policies were combined and a new coordination policy was constructed. The coordination policy works as a switch that weights the actuation signals of the different muscles and produces the final action. During this stage, the aggregated policies and the coordination policy were trained with a fresh experience replay buffer and a new critic neural network. The rationale behind this is that the environment has changed from the policies' point of view, and thus new experiences must be gathered.
When expanding the policy pool with a new policy network in the second stage, the trained weights of the previous policy network are copied to the new, untrained policy network instead of using the standard random initialisation. This gives the newly added policy a training head start from the current training state and protects the previously trained policies from being drastically altered while adapting to the newly added policy. A sketch of this expansion step is shown below.
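The sketch continues the illustrative actor above. Resizing the coordinator's output head while keeping its learnt weights is our own assumption about how the coordination network accommodates the new policy.

```python
import copy

def expand_policy_pool(actor):
    # Initialise the new policy from the most recently trained one rather than
    # randomly, giving it a head start from the current training state.
    new_policy = copy.deepcopy(actor.policies[-1])
    actor.policies.append(new_policy)

    # Grow the coordinator's output layer to emit one weight per policy while
    # preserving the weights already learnt for the existing policies.
    n = len(actor.policies)
    old_head = actor.coordinator[2]              # Linear layer feeding the Softmax
    new_head = nn.Linear(old_head.in_features, n)
    with torch.no_grad():
        new_head.weight[: n - 1].copy_(old_head.weight)
        new_head.bias[: n - 1].copy_(old_head.bias)
    actor.coordinator[2] = new_head
```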
During the coordination refinement stage of the training, we prevented all policy networks from training and used the gathered experience to fine tune the coordination network. This allows the coordination network to adapt to the distinctive postural stability strategies adopted by the different policies. Locking the policies is also essential to preserve their trained strategies without leaking information from other policies or the coordinator network, as sketched below.
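Locking the policies amounts to disabling gradient updates on their parameters; the names below refer to the illustrative actor sketched earlier.

```python
def lock_policies(actor, lock=True):
    # Freeze all policy networks so only the coordination network (and, if
    # desired, the mini-observers) receives gradient updates during refinement.
    for policy in actor.policies:
        for param in policy.parameters():
            param.requires_grad = not lock
```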

3. Results

During the training of the AI agent, the final goal was clearly defined by the scoring function implemented in the environment. However, in order to achieve this goal, the AI agent had to discover and solve two intermediate tasks:
  • identify the importance of centre of gravity (COG); and
  • identify and exploit the dominant leg concept.
The training took place in two stages, with the number of policies increasing incrementally from one policy to a total of three.
As illustrated in Figure 2 (top), in the first few training episodes the AI agent explored the extremes of de-innervating (10 episodes) and randomly innervating (100 episodes) the muscle set controlling the body before it discovered the concept of the centre of gravity (COG). This allowed the AI agent to attempt maintaining the COG and Zero-Moment-Point (ZMP) in front of the body, which, in return, allowed it to fall bottom-first instead of head-first. This resembles the behaviour toddlers exhibit when falling from a standing posture. It is worth noting that neither the COG nor the ZMP were provided as inputs to the AI agent. In contrast to toddlers, the newly discovered concept (from the AI agent's perspective) was derived from the need to balance. Toddlers, on the other hand, already grasp this concept during the early stages of locomotion, namely sitting without assistance, crawling, standing with assistance and standing without assistance, with an average of 1.43 ± 2.1 months between milestones [51]. This is achieved via inputs from the vestibular, visual and somatosensory systems [52]. During the second stage of the training (two policies), the AI agent learnt to exploit the concept of a dominant leg [53]. This allowed it to shift the load of the entire body onto one leg while using the other leg for fine tuning the balancing action.
The first policy tried to prevent backward falls by exploiting the dominant leg and thrusting the pelvis forward. The second policy, however, adopted a leaning-forward action, pivoting on the heels and adjusting the ZMP by shifting the weight of the upper body anteriorly, as shown in Figure 2 (bottom). While it abused this newly discovered capability, the fine tuning of the coordination neural network allowed the agent to maintain the balance between the new action and the previously learnt actions. Finally, during the fine tuning and coordination between the two trained policies, the AI agent explored the possibility of widening the leg base and finally managed to stand.
As shown in Figure 3, the muscle actuation pattern changed across four milestones. Each milestone was trained for 500,000 simulation steps. The maximum training episode length was 500 simulation steps, or 5 s (1 step = 0.01 s). Each milestone was evaluated via 35 test trials. Actuation tables are available in the Supplementary Materials. First, the AI agent adopted a strategy with three actuation levels (no, medium and full actuation). While this strategy does not actually maintain balance, it served as a foundation for subsequent milestones. In the next milestone, the AI agent adopted a left-dominant-leg strategy by maximising the actuation of the left gluteus maximus (glut_max_l) muscle to thrust the pelvis forward, while locking the left knee in full extension by fully actuating the left vastus intermedius (vasti_l) muscle. Accordingly, the right gluteus maximus and vastus intermedius muscles were actuated to achieve dexterous traction with the ground.
This strategy was further refined in the third milestone, which allowed the agent to prolong the balancing action by engaging the hamstrings (hamstrings) and the iliopsoas (iliopsoas) muscles for finer hip and knee control while exploiting alternating foot tapping, as shown in the actuation graphs of the gastrocnemius (gastroc) and tibialis anterior (tib_ant) muscles. In the fourth and final milestone, the AI agent further improved the actuation strategy to maintain a longer standing duration. Because the AI agent adopted a locked-knee strategy, it did not attempt to actuate the soleus muscle (soleus). This can be considered a local minimum caused by the negligible weight of the effort penalty term in the score function (Equation (3)).

4. Discussion

Because each policy is self-contained, the proposed approach is expected to work with other off-policy reinforcement learning algorithms such as Soft Actor Critic (SAC) [55] and Distributed Distributional DDPG (D4PG) [56]. However, while early tests using SAC did show similar behaviour, we anticipate technical challenges with D4PG because of the asynchronous update to the experience replay buffer. In this section we discuss an interesting behaviour of the AI agent, limitations of the study, and future directions.

4.1. An Interesting Behaviour

An interesting AI behaviour emerged during an early training stage of the coordination neural network. Because the ultimate goal remains not to fall, the AI agent explored the option of taking a protective step to maintain better balance. In doing so, the AI agent learnt to take a few coordinated steps, as shown in Figure 4. This behaviour took place when we injected the AI agent with noise to increase exploration. However, because the score function was designed for standing, the learnt behaviour did not constitute a proper gait cycle. Also, the limited capacity of the AI model may have limited the dexterity of the learnt gait cycle. That being said, the ambition of the AI agent to engage in locomotion as a way to prevent falling remains an interesting behaviour. Considering Novacheck's conclusion that walking is controlled falling [4], the AI behaviour here cannot be considered walking because it is not self-induced. This behaviour is known as the value misalignment problem in the computer science literature. It occurs when the score function is not specific enough to excite the AI agent to learn the desired task, but instead causes the AI agent to engage in an obsessive behaviour of maximising the score by any means necessary. This problem poses a paradox because having a very specific score function may lead the neural networks to overfit on the observed training scenarios and fail to generalise to other variations in the environment. It is worth noting that this locomotion attempt (3 steps) could not be reproduced with the same coordination demonstrated in the video attached in the Supplementary Materials.

4.2. Limitations

It is worth noting that, theoretically, the AI agent should be able to achieve the same result using only two policies (or, ideally, a much deeper single policy). Such redundancy is usually addressed via model pruning, which discards the redundant parts of the neural network. However, when we applied model pruning, the performance (measured by standing duration) dropped by 50% and the model could not sustain a standing posture for more than two seconds. This suggests that the two policies do indeed both contribute to posture stabilisation. We also noticed that, after allowing training on a third policy, the AI agent discovered a new sub-task of slowly spreading the feet laterally to achieve a wider base. This behaviour opens the door to further research into rearranging the policy neural networks into a chain or a pipeline, in which case there would be no need for the coordination network. That being said, distributing the load over multiple neural networks does make the behaviour of the AI model easier to explain, which is an important step towards explainable AI (XAI).
The main challenge with training a single neural network on such a complex task is the lack of control over the flow of gradient updates. Not only does a gradient update alter the entire policy at once, including the first few observation layers interacting with the environment, it also deprives the agent of perfecting any of the sub-tasks required to solve the problem. We observed this as an oscillation between two policies favouring leaning forward and backward in a 2D planar setup (i.e., with no lateral movement). In the 3D setup used in this work, lateral movement became a problem, not only because of the added dimension but also because the hip adduction and hip abduction muscles are now engaged. The maximum isometric force of these muscles is approximately 10 times that of muscles such as the biceps femoris, which flexes the knee joint. This highlights the discrepancy in the characteristics of different actuators, which is now being investigated using learnable parameterised activation functions in [57].

5. Conclusions

In this paper, we followed the learning journey of an AI agent attempting to assume a stable standing posture. We used the OpenSim biomechanics simulation environment [30] and adopted the DDPG reinforcement learning technique to derive coordinated, continuous muscle actuation signals in order to stabilise a standing posture [34,49,50]. The AI agent learnt to maintain a standing posture for 4 s by learning the two sub-tasks of leaning backward and forward and the coordination between the two actions. While this is a short duration for maintaining a standing posture, it is worth noting that maintaining a standing posture for prolonged periods requires recurrent backtracking through different standing states. Such a recursive behaviour would require utilising Long-Short-Term-Memory (LSTM) modules. Nevertheless, it was very interesting to witness the evolution of learnt sub-tasks as we allowed the AI agent to train on new policies.
The behaviour witnessed in this study highlights two further research points to be investigated. The first relates to training AI models using synthetic data. The main motivation driving this area forward is the expanding gap in data available for training AI models. This issue becomes more significant when considering sensitive applications where collecting realistic data is difficult or may raise safety and ethical concerns. Fall detection and prevention is a growing public health concern for which there is a shortage of datasets of realistic fall posture sequences. Such datasets are usually recorded by stunt actors who can fall safely, or generated by 3D artists [58]. However, both solutions provide data that are not a real representation of fall occurrences. Fortunately, the presented work does derive the coordinated muscle actuations that cause a realistic fall. The AI agent's failed attempts to maintain a standing posture can provide a comprehensive dataset of falling posture sequences that can advance fall detection and prevention research endeavours.
The second research point to be investigated is the discovered AI ambition to explore and exploit locomotion as a means to prolong not falling. This problem is known in AI research as the value misalignment problem and has sparked a huge debate among computer scientists and philosophers. The reason is that relying solely on maximising the score can excite the AI agent to achieve this by exploiting the environment. While the argument for designing tighter score functions is sound, there is a very fine line to walk between resorting to classical rule-based AI and the modern aspirations towards Artificial General Intelligence (AGI). This, in return, may introduce the much-feared scenarios of intelligent machines running critical aspects of human lives [59]. While these scenarios are exaggerated in the media and in dystopian literature, they are unlikely to actually occur in the near future due to the limitations of compute power. However, not only does this debate raise good questions regarding AI safety, ethics and even rights, it also raises questions about our societal rights and duties.

Supplementary Materials

The following are available online at https://www.mdpi.com/2673-2688/1/2/19/s1.

Author Contributions

Both authors contributed equally to this work. This includes, but is not limited to, conceptualisation, experiment design, coding, result reporting and writing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI	Artificial Intelligence
AGI	Artificial General Intelligence
ANN	Artificial Neural Network
COG/COM	Centre of Gravity/Mass
D4PG	Distributed Distributional Deep Deterministic Policy Gradient
DDPG	Deep Deterministic Policy Gradient
DoF	Degree of Freedom
DRL	Deep Reinforcement Learning
LSTM	Long-Short-Term-Memory
MLP	Multi-layer Perceptron
NEAT	Neuro-Evolution of Augmenting Topologies
RL	Reinforcement Learning
XAI	Explainable Artificial Intelligence
ZMP	Zero Moment Point

References

  1. Pua, Y.H.; Ong, P.H.; Clark, R.A.; Matcher, D.B.; Lim, E.C.W. Falls efficacy, postural balance, and risk for falls in older adults with falls-related emergency department visits: Prospective cohort study. BMC Geriatr. 2017, 17, 291. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Alonso, A.C.; Brech, G.C.; Bourquin, A.M.; Greve, J.M.D. The influence of lower-limb dominance on postural balance. Sao Paulo Med J. 2011, 129, 410–413. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  3. Riemann, B.L.; Guskiewicz, K.M. Contribution of the Peripheral Somatosensory System to Balance and Postural Equilibrium; Human Kinetics: Champaign, IL, USA, 2000; pp. 37–51. [Google Scholar]
  4. Novacheck, T.F. The biomechanics of running. Gait Posture 1998, 7, 77–95. [Google Scholar] [CrossRef]
  5. Winter, D.A.; Quanbury, A.O.; Reimer, G.D. Analysis of instantaneous energy of normal gait. J. Biomech. 1976, 9, 253–257. [Google Scholar] [CrossRef]
  6. Miller, C.A.; Verstraete, M.C. A mechanical energy analysis of gait initiation. Gait Posture 1999, 9, 158–166. [Google Scholar] [CrossRef]
  7. Inman, V.T.; Ralston, H.J.; Todd, F. Human Walking; Williams & Wilkins: Philadelphia, PA, USA, 1981. [Google Scholar]
  8. Devine, J. The versatility of human locomotion. Am. Anthropol. 1985, 87, 550–570. [Google Scholar] [CrossRef]
  9. Winter, D.A. Biomechanics and Motor Control of Human Movement; John Wiley & Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
  10. Bohannon, R.W. Comfortable and maximum walking speed of adults aged 20–79 years: Reference values and determinants. Age Ageing 1997, 26, 15–19. [Google Scholar] [CrossRef] [Green Version]
  11. Serra-AÑó, P.; López-Bueno, L.; García-Massó, X.; Pellicer-Chenoll, M.T.; González, L.M. Postural control mechanisms in healthy adults in sitting and standing positions. Percept. Mot. Skills 2015, 121, 119–134. [Google Scholar] [CrossRef]
  12. Park, E.; Reimann, H.; Schöner, G. Coordination of muscle torques stabilizes upright standing posture: An UCM analysis. Exp. Brain Res. 2016, 234, 1757–1767. [Google Scholar] [CrossRef]
  13. Barroso, F.O.; Torricelli, D.; Molina-Rueda, F.; Alguacil-Diego, I.M.; Cano-de-la Cuerda, R.; Santos, C.; Moreno, J.C.; Miangolarra-Page, J.C.; Pons, J.L. Combining muscle synergies and biomechanical analysis to assess gait in stroke patients. J. Biomech. 2017, 63, 98–103. [Google Scholar] [CrossRef]
  14. Seth, A.; Hicks, J.L.; Uchida, T.K.; Habib, A.; Dembia, C.L.; Dunne, J.J.; Ong, C.F.; DeMers, M.S.; Rajagopal, A.; Millard, M.; et al. OpenSim: Simulating musculoskeletal dynamics and neuromuscular control to study human and animal movement. PLoS Comput. Biol. 2018, 14, e1006223. [Google Scholar] [CrossRef]
  15. Raabe, M.E.; Chaudhari, A.M. An investigation of jogging biomechanics using the full-body lumbar spine model: Model development and validation. J. Biomech. 2016, 49, 1238–1243. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Delp, S.L.; Anderson, F.C.; Arnold, A.S.; Loan, P.; Habib, A.; John, C.T.; Guendelman, E.; Thelen, D.G. OpenSim: Open-source software to create and analyze dynamic simulations of movement. IEEE Trans. Biomed. Eng. 2007, 54, 1940–1950. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  17. Reinbolt, J.A.; Seth, A.; Delp, S.L. Simulation of human movement: Applications using OpenSim. Procedia IUTAM 2011, 2, 186–198. [Google Scholar] [CrossRef]
  18. Iskander, J.; Hossny, M.; Nahavandi, S. Using biomechanics to investigate the effect of VR on eye vergence system. Appl. Ergon. 2019, 81, 102883. [Google Scholar] [CrossRef]
  19. Iskander, J.; Hossny, M.; Nahavandi, S.; Del Porto, L. An Ocular Biomechanic Model for Dynamic Simulation of Different Eye Movements. J. Biomech. 2018, 71, 208–216. [Google Scholar] [CrossRef]
  20. Nahavandi, D.; Iskander, J.; Hossny, M.; Haydari, V.; Harding, S. Ergonomic effects of using Lift Augmentation Devices in mining activities. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016. [Google Scholar]
  21. Abobakr, A.; Nahavandi, D.; Hossny, M.; Iskander, J.; Attia, M.; Nahavandi, S.; Smets, M. RGB-D ergonomic assessment system of adopted working postures. Appl. Ergon. 2019, 80, 75–88. [Google Scholar] [CrossRef] [PubMed]
  22. Hossny, M.; Nahavandi, D.; Nahavandi, S.; Haydari, V.; Harding, S. Musculoskeletal analysis of mining activities. In Proceedings of the 2015 IEEE International Symposium on Systems Engineering (ISSE), Rome, Italy, 28–30 September 2015; pp. 184–189. [Google Scholar]
  23. Dorn, T.W.; Wang, J.M.; Hicks, J.L.; Delp, S.L. Predictive simulation generates human adaptations during loaded and inclined walking. PLoS ONE 2015, 10, e0121407. [Google Scholar] [CrossRef] [Green Version]
  24. DeMers, M.S.; Hicks, J.L.; Delp, S.L. Preparatory co-activation of the ankle muscles may prevent ankle inversion injuries. J. Biomech. 2017, 52, 17–23. [Google Scholar] [CrossRef] [Green Version]
  25. Halilaj, E.; Rajagopal, A.; Fiterau, M.; Hicks, J.L.; Hastie, T.J.; Delp, S.L. Machine learning in human movement biomechanics: Best practices, common pitfalls, and new opportunities. J. Biomech. 2018, 81, 1–11. [Google Scholar] [CrossRef]
  26. Haarnoja, T.; Ha, S.; Zhou, A.; Tan, J.; Tucker, G.; Levine, S. Learning to walk via deep reinforcement learning. arXiv 2018, arXiv:1812.11103. [Google Scholar]
  27. Hebbel, M.; Kosse, R.; Nistico, W. Modeling and learning walking gaits of biped robots. In Proceedings of the Workshop on Humanoid Soccer Robots of the IEEE-RAS International Conference on Humanoid Robots, Genova, Italy, 4–6 December 2006; pp. 40–48. [Google Scholar]
  28. Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Robot. Res. 2013, 32, 1238–1274. [Google Scholar] [CrossRef] [Green Version]
  29. Sutton, R.S.; Barto, A.G. Introduction to Reinforcement Learning; MIT Press: Cambridge, MA, USA, 1998; Volume 2. [Google Scholar]
  30. Kidziński, Ł.; Ong, C.; Mohanty, S.P.; Hicks, J.; Carroll, S.; Zhou, B.; Zeng, H.; Wang, F.; Lian, R.; Tian, H.; et al. Artificial Intelligence for Prosthetics: Challenge Solutions. In The NeurIPS’18 Competition; Springer: Berlin/Heidelberg, Germany, 2020; pp. 69–128. [Google Scholar]
  31. Kidziński, Ł.; Mohanty, S.P.; Ong, C.F.; Hicks, J.L.; Carroll, S.F.; Levine, S.; Salathé, M.; Delp, S.L. Learning to run challenge: Synthesizing physiologically accurate motion using deep reinforcement learning. In The NIPS’17 Competition: Building Intelligent Systems; Springer: Berlin/Heidelberg, Germany, 2018; pp. 101–120. [Google Scholar]
  32. Kidziński, Ł.; Mohanty, S.P.; Ong, C.F.; Huang, Z.; Zhou, S.; Pechenko, A.; Stelmaszczyk, A.; Jarosik, P.; Pavlov, M.; Kolesnikov, S.; et al. Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments. In The NIPS’17 Competition: Building Intelligent Systems; Springer: Berlin/Heidelberg, Germany, 2018; pp. 121–153. [Google Scholar]
  33. Jaśkowski, W.; Lykkebø, O.R.; Toklu, N.E.; Trifterer, F.; Buk, Z.; Koutník, J.; Gomez, F. Reinforcement Learning to Run… Fast. In The NIPS’17 Competition: Building Intelligent Systems; Springer: Berlin/Heidelberg, Germany, 2018; pp. 155–167. [Google Scholar]
  34. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
  35. Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; Abbeel, P. High-dimensional continuous control using generalized advantage estimation. arXiv 2015, arXiv:1506.02438. [Google Scholar]
  36. Levine, S.; Finn, C.; Darrell, T.; Abbeel, P. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 2016, 17, 1334–1373. [Google Scholar]
  37. Duan, Y.; Chen, X.; Houthooft, R.; Schulman, J.; Abbeel, P. Benchmarking deep reinforcement learning for continuous control. In Proceedings of the 33rd International Conference on International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1329–1338. [Google Scholar]
  38. Pilotto, A.; Boi, R.; Petermans, J. Technology in geriatrics. Age Ageing 2018, 47, 771–774. [Google Scholar] [CrossRef] [Green Version]
  39. Lee, S.; Lee, K.; Park, M.; Lee, J. Scalable Muscle-actuated Human Simulation and Control. ACM Trans. Graph. 2019, 37. [Google Scholar] [CrossRef]
  40. Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. Openai gym. arXiv 2016, arXiv:1606.01540. [Google Scholar]
  41. Delp, S.L.; Loan, J.P.; Hoy, M.G.; Zajac, F.E.; Topp, E.L.; Rosen, J.M. An interactive graphics-based model of the lower extremity to study orthopaedic surgical procedures. IEEE Trans. Biomed. Eng. 1990, 37, 757–767. [Google Scholar] [CrossRef]
  42. Ong, C.F.; Geijtenbeek, T.; Hicks, J.L.; Delp, S.L. Predictive simulations of human walking produce realistic cost of transport at a range of speeds. In Proceedings of the 16th International Symposium on Computer Simulation in Biomechanics, Gold Coast, Australia, 20–22 July 2017; pp. 19–20. [Google Scholar]
  43. Arnold, E.M.; Ward, S.R.; Lieber, R.L.; Delp, S.L. A model of the lower limb for analysis of human movement. Ann. Biomed. Eng. 2010, 38, 269–279. [Google Scholar] [CrossRef] [Green Version]
  44. Thelen, D.G. Adjustment of muscle mechanics model parameters to simulate dynamic contractions in older adults. J. Biomech. Eng. 2003, 125, 70–77. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  45. Hunt, K.; Crossley, F. Coefficient of Restitution Interpreted as Damping in Vibroimpact. J. Appl. Mech. 1975, 42, 440–445. [Google Scholar] [CrossRef]
  46. Stanley, K.O.; Miikkulainen, R. Evolving Neural Networks through Augmenting Topologies. Evol. Comput. 2002, 10, 99–127. [Google Scholar] [CrossRef] [PubMed]
  47. Stanley, K.O. Neuroevolution: A different kind of deep learning. O’Reilly 2017, 27, 2019. [Google Scholar]
  48. Hospedales, T.; Antoniou, A.; Micaelli, P.; Storkey, A. Meta-Learning in Neural Networks: A Survey. arXiv 2020, arXiv:cs.LG/2004.05439. [Google Scholar]
  49. Grondman, I.; Busoniu, L.; Lopes, G.A.; Babuska, R. A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Trans. Syst. Man, Cybern. Part C Appl. Rev. 2012, 42, 1291–1307. [Google Scholar] [CrossRef] [Green Version]
  50. Konda, V.R.; Tsitsiklis, J.N. Actor-critic algorithms. Adv. Neural Inf. Process. Syst. 2000, 1008–1014. [Google Scholar]
  51. Ghassabian, A.; Sundaram, R.; Bell, E.; Bello, S.C.; Kus, C.; Yeung, E. Gross Motor Milestones and Subsequent Development. Pediatrics 2016, 138, e20154372. [Google Scholar] [CrossRef] [Green Version]
  52. Forbes, P.A.; Chen, A.; Blouin, J.S. Sensorimotor control of standing balance. In Handbook of Clinical Neurology; Elsevier: Amsterdam, The Netherlands, 2018; Volume 159, pp. 61–83. [Google Scholar]
  53. Spry, S.; Zebas, C.; Visser, M. What is Leg Dominance? ISBS-Conference Proceedings Archive. 1993. Available online: https://ojs.ub.uni-konstanz.de/cpa/article/view/1700 (accessed on 15 June 2012).
  54. Dragicevic, P. Fair Statistical Communication in HCI. In Modern Statistical Methods for HCI; Springer: Berlin/Heidelberg, Germany, 2016; pp. 291–330. [Google Scholar] [CrossRef] [Green Version]
  55. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv 2018, arXiv:cs.LG/1801.01290. [Google Scholar]
  56. Barth-Maron, G.; Hoffman, M.W.; Budden, D.; Dabney, W.; Horgan, D.; TB, D.; Muldal, A.; Heess, N.; Lillicrap, T. Distributed Distributional Deterministic Policy Gradients. arXiv 2018, arXiv:cs.LG/1804.08617. [Google Scholar]
  57. Hossny, M.; Iskander, J.; Attia, M.; Saleh, K. Refined Continuous Control of DDPG Actors via Parametrised Activation. arXiv 2020, arXiv:cs.LG/2006.02818. [Google Scholar]
  58. Abobakr, A.; Hossny, M.; Nahavandi, S. A Skeleton-Free Fall Detection System From Depth Images Using Random Decision Forest. IEEE Syst. J. 2018, 12, 1–12. [Google Scholar] [CrossRef]
  59. Rahwan, I.; Cebrian, M.; Obradovich, N.; Bongard, J.; Bonnefon, J.F.; Breazeal, C.; Crandall, J.W.; Christakis, N.A.; Couzin, I.D.; Jackson, M.O.; et al. Machine behaviour. Nature 2019, 568, 477–486. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. The actor part of the proposed AI agent. The observations (from different modalities) read from the environment are fed into lightweight MLP networks (mini-observers) to produce encoded latent features. The encoded features are then fed into all policies to produce a set of actions. The produced actions are then multiplied by the weight factors produced by the coordination network based on the state reported by the environment. The final, weighted action is then fed back into the environment to actuate the muscles.
Figure 2. The learning of an AI agent towards maintaining a standing posture. Rows: Individually trained policies. Columns: Warm up episodes, better coordination, refined coordination (left to right). Muscles are coloured based on innervation level (Blue = 0 to Red = 1).
Figure 3. Four milestones of training (one trained model each). The box plot shows the duration before falling (epsdlen) for different milestones. The line plots show the AI-generated actuation signals for different left and right leg muscles. The AI agent adopted a knee-locking strategy while thrusting the pelvis forward to control the centre of gravity. Each milestone was trained for 500,000 simulation steps. The final agent was able to maintain balance for 4 s on average. The maximum training episode length was 500 simulation steps or 5 s (1 step = 0.01 s). The solid blue line is the average of 35 test trials and the light blue envelope is the estimated standard error at p < 0.05 using random bootstrapping [54].
Figure 4. The AI agent explored the option of taking a protective step (top-left). When the associated score was positive (top-right), the AI agent started exploiting this with both legs (bottom-left). One thousand episodes later, the AI agent was able to provide coordinated actuation signals to perform a short locomotion (3 steps). Muscles are coloured based on innervation level (Blue = 0 to Red = 1). Locomotion video is available in the Supplementary Materials.
Table 1. Observation Table.
Observation | Size | Notation (in Score Fn.) | Comments
Ground Reaction Forces | 6 | $F_g$ | 3 per foot
Pelvis Orientation/Linear/Angular Velocity | 9 | $p$, $\dot{\theta}_p$, $H$, $\gamma$, $\phi$ | —
Joint Angles | 8 | $\theta_J$ | 4 per leg
Change in Joint Angles | 8 | $\Delta\theta_J$ | 4 per leg
Muscle Actuation | 22 | $m_a$ | 11 per leg
Muscle Force | 22 | $m_f$ | 11 per leg
Muscle Length | 22 | $m_l$ | 11 per leg
Random Values | 3 | — | velocity vector field [30]
