# Goal-Directed Planning for Habituated Agents by Active Inference Using a Variational Recurrent Neural Network


## Abstract


## 1. Introduction

## 2. Model

#### 2.1. Overview of PV-RNN

#### 2.2. Learning with Evidence Lower Bound

#### 2.3. Plan Generation with GLean and the Estimated Lower Bound

## 3. Experiments

#### 3.1. Experiment 1: Simulated Mobile Agent in a 2D Space

#### 3.1.1. Prior Generation

#### 3.1.2. Target Regeneration

#### 3.1.3. Plan Generation

#### 3.1.4. Plan Generation for Goals Set in Unlearned Regions

#### 3.2. Experiment 2: Simulated Robotic Object Manipulation Task

#### Plan Generation

#### 3.3. Comparison between GLean, FM, and SI

## 4. Conclusions and Discussion

## Author Contributions

## Acknowledgments

## Conflicts of Interest

## References


**Figure 1.** (**a**) The forward model (FM) and (**b**) the predictive coding and active inference framework, where $\mathit{state}_{t}$ and $\mathit{sense}_{t+1}$ represent the current latent state and the prediction of the next sensory state in terms of exteroception and proprioception. The predicted proprioception can then be converted into a motor control signal as necessary, such as by using an inverse model as depicted in (**b**).

**Figure 2.** Three different models for learning-based goal-directed motor planning. (**a**) The forward model implemented in an RNN, (**b**) the predictive coding (PC) and active inference (AIF) frameworks implemented in a recurrent neural network (RNN) with initial sensitivity introduced via variables at the initial step, either the stochastic ${\mathit{z}}_{t}$ or the deterministic ${\mathit{d}}_{t}$, and (**c**) the proposed GLean scheme based on the PC and AIF framework implemented in a variational RNN. In each case, the horizontal axis indicates progression through time (left to right). The black arrows represent computation in the forward pass, while the red arrows represent prediction error propagated during backpropagation through time (BPTT).

**Figure 3.** Graphical representation of the predictive-coding-inspired variational RNN (PV-RNN) as implemented in this paper.

**Figure 4.** Difference in how error regression is employed in (**a**) future sequence prediction and (**b**) goal-directed planning. Solid black lines represent the forward generative model, while the dashed red lines represent backpropagation through time used to update ${A}^{\varnothing}$.
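The error-regression scheme in (**b**), in which prediction error is backpropagated to update only the adaptation variable while the learned weights stay frozen, can be sketched with a toy example. A hypothetical linear map $W$ stands in for the unrolled network so the gradient has a closed form; in the paper the mapping is the full PV-RNN unrolled in time and the gradient comes from BPTT:

```python
import numpy as np

# Hypothetical linear stand-in for the unrolled generative model:
# prediction = W @ A, with W playing the role of the frozen learned weights.
W = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5]])
goal = np.array([1.0, -1.0])      # desired sensory state at the goal
A = np.zeros(3)                   # adaptation variable, the only quantity updated

lr = 0.1
for _ in range(200):
    err = W @ A - goal            # prediction error at the goal
    A -= lr * (W.T @ err)         # backpropagated error updates A only

# After iterating, the generated plan reaches the goal state.
assert np.allclose(W @ A, goal, atol=1e-3)
```

Gradient descent over the adaptation variables alone is what allows planning without retraining: the habituated dynamics stay intact and only the latent trajectory is re-inferred.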

**Figure 5.** Plots of the trajectories prepared for a mobile agent generating goal-directed behaviors in 2D space. (**a**) XY plot showing the initial position of the agent, the branch point, and the two goal areas; (**b**) the X position over time; and (**c**) the Y position over time. The branch point is visible at around $t=10$.

**Figure 7.** Trajectory plots showing (**a**) the training data (ground truth), (**b**) prior generation with a weak meta-prior, (**c**) with an intermediate meta-prior, and (**d**) with a strong meta-prior. Each plot contains 60 trajectories.

**Figure 9.** Trajectory plots showing (**a**) the target (ground truth), (**b**) target regeneration with a weak meta-prior, (**c**) target regeneration with an intermediate meta-prior, and (**d**) target regeneration with a strong meta-prior. Each plot contains 60 trajectories.

**Figure 10.** Plots of Kullback-Leibler divergence (KLD) during target regeneration given a particular ${\mathit{A}}_{1}$ adaptation value. (**a**) KLD for the weak, intermediate, and strong meta-priors in the bottom layer; (**b**) KLD for the weak, intermediate, and strong meta-priors in the top layer; (**c**) the same data as (**b**) with the scale adjusted so that the intermediate meta-prior result can be seen more clearly. The peak in KLD in the intermediate meta-prior network is visible around $t=8$. The shaded areas indicate the standard deviation of KLD over 60 generated trajectories.
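The KLD plotted here, between the approximate posterior and the prior at each timestep, has the standard closed form for diagonal Gaussian distributions. A minimal sketch of that formula (not the authors' code; `mu`/`sigma` names are illustrative):

```python
import numpy as np

def gaussian_kld(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    return float(np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p)**2) / (2.0 * sigma_p**2)
        - 0.5
    ))

# Identical distributions diverge by zero; a unit mean shift at unit
# variance costs 0.5 nat.
assert abs(gaussian_kld(np.zeros(2), np.ones(2), np.zeros(2), np.ones(2))) < 1e-12
assert abs(gaussian_kld(np.ones(1), np.ones(1), np.zeros(1), np.ones(1)) - 0.5) < 1e-12
```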

**Figure 11.** Plots of the x coordinate over time during target regeneration given a particular ${\mathit{A}}_{1}$ adaptation value, with (**a**) a weak meta-prior, (**b**) an intermediate meta-prior, and (**c**) a strong meta-prior. The branch point is visible around $t=10$, except in (**c**), which does not exhibit any branching behavior.

**Figure 12.** Plots showing motor plans for the 20 test sequences. (**a**) The ground truth for the untrained test data set; the remaining plots were generated with (**b**) a weak meta-prior, (**c**) an intermediate meta-prior, and (**d**) a strong meta-prior, as described in Table 2.

**Figure 13.** Plots showing motor plans for the 10 test sequences with goals set in an untrained region. (**a**) The ground truth test trajectories and (**b**) the results of plan generation.

**Figure 14.** Simulated robot executing the grasp-and-place task. In the workspace in front of the robot, there are two graspable blocks and two goal circles. Crosshair markers show the predicted positions of the gripper and the two blocks.

**Figure 15.** Trajectories of the gripper in two dimensions, with the mean positions of the blocks and goal circles overlaid. Dashed circles represent the standard deviation of the positions.

**Figure 16.** Graphical representations of (**a**) the forward model (FM) and (**b**) the stochastic initial state (SI) model as implemented in this paper.

**Figure 17.** Plots showing the predicted sensory states (top row) and the motor plans (bottom row) for a given goal. The colored lines within each plot represent a sequence of predictions for one sensory or proprioceptive dimension. The columns correspond to (**a**) a weak meta-prior, (**b**) an intermediate meta-prior, (**c**) a strong meta-prior, and (**d**) the ground truth. An arrow indicates the grasp point, where the robot attempts to pick up the block. While the exact timestep of the grasp point can vary, if the relationship between the predicted dimensions is not maintained, the grasping attempt is more likely to fail.

**Figure 18.** Comparison between the generated sensory predictions (solid lines) and the ground truth sensory states (dashed lines) for (**a**) the forward model, (**b**) the stochastic initial state model, and (**c**) GLean.

**Figure 19.** Comparison of one-step look-ahead sensory predictions (solid lines) and the ground truth (dashed lines) among three different models: (**a**) the forward model, (**b**) the stochastic initial state model, and (**c**) GLean.

**Table 1.** Network parameters used for the simulated mobile agent experiment.

| MTRNN Layer | 1 | 2 |
|---|---|---|
| Neurons $\lvert{\mathit{d}}^{l}\rvert$ | 20 | 10 |
| Z-units $\lvert{\mathit{z}}^{l}\rvert$ | 2 | 1 |
| $\tau$ | 4 | 8 |

**Table 2.** Meta-prior settings $\mathit{w}$ used for the simulated mobile agent experiment.

| Meta-Prior Setting $\mathit{w}$ | MTRNN Layer 1 | MTRNN Layer 2 |
|---|---|---|
| Weak | 0.00001 | 0.000005 |
| Intermediate | 0.01 | 0.005 |
| Strong | 0.2 | 0.1 |
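Schematically, the meta-prior $w$ scales the Kullback-Leibler term of the loss layer by layer, trading reconstruction accuracy against adherence to the learned prior. A minimal sketch with hypothetical variable names (not the authors' implementation):

```python
import numpy as np

def free_energy(pred, target, klds, w):
    """Schematic negative ELBO: reconstruction error plus the layer-wise
    KL terms, each scaled by that layer's meta-prior w[l]."""
    rec = float(np.mean((pred - target) ** 2))
    reg = sum(w_l * kld_l for w_l, kld_l in zip(w, klds))
    return rec + reg

# With identical prediction error and KLD, a strong meta-prior penalizes
# posterior/prior divergence far more heavily than a weak one.
weak = free_energy(np.array([0.1]), np.array([0.0]), [2.0, 1.0], [0.00001, 0.000005])
strong = free_energy(np.array([0.1]), np.array([0.0]), [2.0, 1.0], [0.2, 0.1])
assert strong > weak
```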

**Table 3.** Distribution of goals reached by networks with different meta-priors, after 60 prior generation sequences.

| Training Meta-Prior | Left Goal % | Right Goal % |
|---|---|---|
| Weak | 38.3 | 61.7 |
| Intermediate | 46.7 | 53.3 |
| Strong | 55.0 | 45.0 |
| Ground truth | 50.0 | 50.0 |

**Table 4.** Distribution of goals reached by networks with different meta-priors, after 60 target regeneration sequences.

| Training Meta-Prior | Left Goal % | Right Goal % |
|---|---|---|
| Weak | 56.7 | 43.3 |
| Intermediate | 70.0 | 30.0 |
| Strong | 100.0 | 0.0 |
| Target | 100.0 | 0.0 |

**Table 5.** Plan generation results on the 20-trajectory test set with varying meta-prior. Best results highlighted in bold.

| Meta-Prior | Average ${\mathit{KLD}}_{\mathit{pq}}$ | Average RMSE $\pm\sigma$ | Average GD $\pm\sigma$ |
|---|---|---|---|
| Weak | 159.1 | $0.0615\pm 0.0042$ | $\mathbf{5.2\times 10^{-6}\pm 2.1\times 10^{-6}}$ |
| Intermediate | 3.36 | $\mathbf{0.0344\pm 0.0021}$ | $7.8\times 10^{-5}\pm 1.9\times 10^{-5}$ |
| Strong | 0.17 | $0.0375\pm 0.0015$ | $6.7\times 10^{-4}\pm 8.8\times 10^{-5}$ |
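The RMSE column presumably reports the root-mean-square error between generated and ground-truth trajectories, with GD being the remaining distance at the goal. A minimal sketch of such a metric (a hypothetical helper, not the authors' evaluation code):

```python
import numpy as np

def rmse(pred, truth):
    """Root-mean-square error over all timesteps and dimensions."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

truth = np.zeros((10, 2))   # 10 timesteps, 2 dimensions
pred = truth + 0.05         # a uniform 0.05 offset in every dimension
assert abs(rmse(pred, truth) - 0.05) < 1e-12
```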

**Table 6.** Network parameters used for the simulated robot experiment for three different models: GLean, FM, and SI.

(a) GLean

| MTRNN Layer | 1 | 2 | 3 |
|---|---|---|---|
| Neurons $\lvert{\mathit{d}}^{l}\rvert$ | 30 | 20 | 10 |
| Z-units $\lvert{\mathit{z}}^{l}\rvert$ | 3 | 2 | 1 |
| $\tau$ | 2 | 4 | 8 |

(b) FM

| MTRNN Layer | 1 | 2 | 3 |
|---|---|---|---|
| Neurons $\lvert{\mathit{d}}^{l}\rvert$ | 30 | 20 | 10 |
| Z-units $\lvert{\mathit{z}}^{l}\rvert$ | 0 | 0 | 0 |
| $\tau$ | 2 | 4 | 8 |

(c) SI

| MTRNN Layer | 1 | 2 | 3 |
|---|---|---|---|
| Neurons $\lvert{\mathit{d}}^{l}\rvert$ | 30 | 20 | 10 |
| Z-units $\lvert{\mathit{z}}^{l}\rvert$ | 30 * | 20 * | 10 * |
| $\tau$ | 2 | 4 | 8 |
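The time constants $\tau$ above follow the standard multiple-timescale RNN (MTRNN) leaky-integrator update of Yamashita and Tani, in which higher layers evolve more slowly. A minimal sketch of one unit's update (a hypothetical function, with the learned weight matrices and latent z-units omitted; `net_input` stands for the weighted sum a unit would receive):

```python
import numpy as np

def mtrnn_step(u_prev, net_input, tau):
    """One leaky-integrator update of an MTRNN unit's internal state.

    Units with a larger time constant tau integrate their input more
    slowly, giving the higher layers (tau = 8) slower dynamics than
    the lower layers (tau = 2)."""
    u = (1.0 - 1.0 / tau) * u_prev + (1.0 / tau) * net_input
    return u, np.tanh(u)

# A step input moves a fast unit (tau = 2) further in one timestep
# than a slow unit (tau = 8).
u_fast, _ = mtrnn_step(0.0, 1.0, tau=2)
u_slow, _ = mtrnn_step(0.0, 1.0, tau=8)
assert abs(u_slow) < abs(u_fast)
```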

**Table 7.** Meta-prior settings $\mathit{w}$ used for the simulated robot experiment.

| Meta-Prior Setting $\mathit{w}$ | MTRNN Layer 1 | MTRNN Layer 2 | MTRNN Layer 3 |
|---|---|---|---|
| Weak | 0.0004 | 0.0002 | 0.0001 |
| Intermediate | 0.0008 | 0.0004 | 0.0002 |
| Strong | 0.002 | 0.001 | 0.0005 |

**Table 8.** GLean-generated plans with networks trained with different meta-priors, compared with ground truth. Note that in order for the results in the following tables to be comparable to the previous experiment, the output values were rescaled to $[0,1]$. Only the sensory states are compared between generated and ground truth trajectories. Best results highlighted in bold.

| Meta-Prior | Average ${\mathit{KLD}}_{\mathit{pq}}$ | Average RMSE $\pm\sigma$ | Average GD $\pm\sigma$ |
|---|---|---|---|
| Weak | 12.48 | $0.0387\pm 0.00067$ | $\mathbf{5.7\times 10^{-5}\pm 7.4\times 10^{-6}}$ |
| Intermediate | 4.64 | $\mathbf{0.0230\pm 0.00058}$ | $6.9\times 10^{-5}\pm 8.6\times 10^{-6}$ |
| Strong | 2.35 | $0.0242\pm 0.00051$ | $1.3\times 10^{-4}\pm 1.4\times 10^{-5}$ |

**Table 9.** Simulation results of executing GLean-generated plans with networks trained with different meta-priors. Best result highlighted in bold.

| Meta-Prior | Success Rate | Average Error at Goal $\pm\sigma$ |
|---|---|---|
| Weak | 51.5% | $1.74\pm 0.15$ cm |
| Intermediate | 86.0% | $\mathbf{1.52\pm 0.07}$ cm |
| Strong | 60.5% | $2.02\pm 0.11$ cm |

**Table 10.** Plan generation results for GLean, FM, and SI. Best results highlighted in bold.

| Model | Average ${\mathit{KLD}}_{\mathit{pq}}$ | Average RMSE $\pm\sigma$ | Average GD $\pm\sigma$ |
|---|---|---|---|
| Forward model | – | $0.1504$ | $6.8\times 10^{-3}$ |
| Stochastic initial state | 3.32 | $0.0257\pm 0.00085$ | $7.8\times 10^{-5}\pm 3.3\times 10^{-6}$ |
| GLean | 4.64 | $\mathbf{0.0230\pm 0.00058}$ | $\mathbf{6.9\times 10^{-5}\pm 8.6\times 10^{-6}}$ |

**Table 11.** Simulation results of executing plans generated by GLean, FM, and SI. Best result highlighted in bold.

| Model | Success Rate | Average Error at Goal $\pm\sigma$ |
|---|---|---|
| Forward model (FM) | 0.0% | – |
| Stochastic initial state (SI) | 68.0% | $2.02\pm 0.14$ cm |
| GLean | 86.0% | $\mathbf{1.52\pm 0.07}$ cm |

**Table 12.** Comparison of average errors in sensory predictions generated by GLean, FM, and SI when provided with the ground truth motor states.

| Model | Average RMSE |
|---|---|
| Forward model (FM) | 0.0119 |
| Stochastic initial state (SI) | 0.0107 |
| GLean | 0.0086 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Matsumoto, T.; Tani, J. Goal-Directed Planning for Habituated Agents by Active Inference Using a Variational Recurrent Neural Network. *Entropy* **2020**, *22*, 564.
https://doi.org/10.3390/e22050564
