# Nonparametric Problem-Space Clustering: Learning Efficient Codes for Cognitive Control Tasks


## Abstract


## 1. Introduction

## 2. Methods

#### 2.1. Distance-Dependent Chinese Restaurant Process

#### 2.2. Generative Embedded CRP

#### 2.2.1. Algorithmic Generative Models

- (i) Sub-goal-sensitive model: This model assigns to each state a probability proportional to how likely the state is to be a stage of a plan or, in the algorithmic metaphor, to be the execution of some program’s instruction. By considering every potential program, namely every permitted combination of input states and policies that returns the state $s_{i'}$, the prior probability of $s_{i'}$ can be defined, up to a normalization factor, as:

  $$P(s_{i'}) \propto \sum_{i} \sum_{j} 2^{-|c_{i'}(s_i, \pi_j)|} \tag{7}$$

  Equation (7) can be extended to every element of the state space, generating an a priori algorithmic probability distribution (or prior distribution) that assigns the highest probability to the states that are useful for decomposing a planning task into multiple steps [11,38]. These states can be called sub-goal states, because they allow problems to be split into less demanding and less complicated sub-problems. Sub-goaling, namely breaking the original problem down into smaller and more manageable problems whose achievement corresponds to landmarks (or “sub-goals”) of the original problem, has long been recognized as a fundamental strategy of problem solving in humans (and possibly other animals) [39].
- (ii) Structure-sensitive model: Analogously to Equation (7), one can define the probability that two or more states co-occur. The joint probability of a subset of states depends on the number and the length of the programs passing simultaneously through the states. Hence, for two states $s_i$, $s_l$, the whole set of programs with such characteristics is built by considering every policy that starts from a generic state $s_k$ and ends both in $s_i$ and in $s_l$. Formally, $P(s_i, s_l)$ is defined as:

  $$P(s_i, s_l) \propto \sum_{k} \sum_{j} 2^{-\left(|c_i(s_k, \pi_j)| + |c_l(s_k, \pi_j)|\right)} = \sum_{j} 2^{-|c_i(s_l, \pi_j)|} \tag{8}$$

  The joint probability of two states in Equation (8) rests upon the number of programs that connect them. This model thus conveys relevant information on the problem structure and the degree of connectedness between its states.
- (iii) Goal-sensitive model: In goal-directed planning, the identification of a state as a goal changes the relations between every state of the domain. Practically speaking, the probability of each state depends on how many effective paths toward the goal pass through it. In probabilistic terms, this corresponds to the fact that the goal state conditions the state probabilities. More formally, this corresponds to the conditional probability that a state $s_i$ is reached given that $s_l$ has been returned. This conditional probability can be obtained from Equations (7) and (8) by the probabilistic product rule:

  $$P(s_i \mid s_l) = \frac{P(s_i, s_l)}{P(s_l)} \tag{9}$$
- (iv) Path-sensitive model: Planning can be loosely defined as acting to find a trajectory in a (symbolic or real) problem space, from an initial state to a final one. This means that it is crucial to produce a succession of states that are “descriptive” of the agent’s provenance and destination. Hence, the probability that a state is selected as a point of a planned trajectory is conditioned on both the initial and final states. In algorithmic probability terms, denoting with $s_o$ and $s_g$ the states corresponding respectively to the origin and the goal of a planning task, the conditional probability of a state $s_i$ given $s_o$ and $s_g$ can be written as:

  $$P(s_i \mid s_o, s_g) \propto \sum_{j} 2^{-\left(|c_i(g, \pi_j)| + |c_i(o, \pi_j)|\right)} \tag{10}$$
- (v) Multiple goal-sensitive model: In many real situations, a domain can include not just one goal, but a series of goals. The presence of multiple goals entails a different distribution of the most relevant states on the basis of their contribution to the achievement of a set of goals. This relation between a state $s_i$ and a set of goal states $\{s_{g_1}, \cdots, s_{g_G}\}$ is probabilistically ruled by the conditional distribution $P(s_i \mid s_{g_1}, \cdots, s_{g_G})$. Algorithmically, $P(s_i \mid s_{g_1}, \cdots, s_{g_G})$ is formulated as the generalization of Equation (10):

  $$P(s_i \mid s_{g_1}, \cdots, s_{g_G}) \propto \sum_{j} 2^{-\left(|c_i(g_1, \pi_j)| + \cdots + |c_i(g_G, \pi_j)|\right)} \tag{11}$$

  At the same time, as we consider in our experiments, one could be interested in grouping the states according to the goal of the set $\{s_{g_1}, \cdots, s_{g_G}\}$ to which they are most strongly probabilistically connected. Answering this question results in a maximum a posteriori (MAP) estimation, a widespread method adopted in fault-cause diagnostics and expert systems. MAP consists of solving the probabilistic inference:

  $$s_{g^*} = \arg\max_{s_{g_i}} P(s_{g_i} \mid s_i) = \arg\max_{s_{g_i}} \left( P(s_i \mid s_{g_i}) \, P(s_{g_i}) \right) \tag{12}$$
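As a rough illustration of the sub-goal-sensitive prior in Equation (7), the sketch below scores every state of a 5×5 open grid. It rests on simplifying assumptions that are not part of the definition above: each start state contributes a single program (a shortest path), the code length is taken to be the number of steps in that path, moves are 4-connected, and states are numbered row by row from S1 to S25. The helper names (`grid_neighbors`, `bfs_distances`, `subgoal_prior`) are hypothetical.

```python
from collections import deque


def grid_neighbors(n):
    """Adjacency of an n x n grid with 4-connected moves; states are 1..n*n, row-major."""
    adj = {}
    for r in range(n):
        for c in range(n):
            s = r * n + c + 1
            adj[s] = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n:
                    adj[s].append(rr * n + cc + 1)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length (in steps) from source to every reachable state."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def subgoal_prior(adj):
    """P(s') proportional to the sum over start states s of 2^-(shortest path length s -> s').

    Assumption: one shortest-path 'program' per start state stands in for the
    sum over all policies in Equation (7).
    """
    scores = {}
    for target in adj:
        # The grid is undirected, so distances to the target equal distances from it.
        dist = bfs_distances(adj, target)
        scores[target] = sum(2.0 ** -d for s, d in dist.items() if s != target)
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


prior = subgoal_prior(grid_neighbors(5))
best = max(prior, key=prior.get)
print(f"most probable state: S{best}")
```

Under these assumptions the central state receives the highest probability, which is consistent with the state highlighted for the sub-goal test in the open space scenario (Figure 4).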

#### 2.2.2. Kernels

- a: specifies the generative model adopted. If we consider the models introduced in Section 2.2.1, it can be “$subg$” (sub-goal sensitive), “$struc$” (structure sensitive), “$goal$” (goal sensitive), “$path$” (path sensitive) and “$multig$” (multiple goal sensitive).
- b: denotes the standard kernel. For example, we can use “L” for the Laplacian kernel (the only kernel involved in the experiments reported in Section 3) or “U” for the uniform kernel; other kernels are possible but are not shown here.
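To make the notation concrete, a scheme such as $geCRP/subg/L$ pairs a generative model with a standard kernel. The sketch below shows one plausible form of that composition. It assumes that the Laplacian kernel is $\exp(-|p_i - p_j|/\lambda)$ applied to the state probabilities produced by the chosen model; the exact functional form, the role of λ as a width parameter, and the toy probabilities are assumptions made for illustration, not details taken from the text.

```python
import math


def laplacian_kernel(p_i, p_j, lam):
    """Similarity between two embedded states; equals 1.0 when the probabilities coincide."""
    return math.exp(-abs(p_i - p_j) / lam)


def generative_embedded_kernel(probs, lam):
    """Compose the kernel with the embedding: K_M(s_i, s_j) = K(P(s_i), P(s_j))."""
    states = sorted(probs)
    return {(a, b): laplacian_kernel(probs[a], probs[b], lam)
            for a in states for b in states}


# Hypothetical probabilities for four states under some generative model.
probs = {"S1": 0.10, "S2": 0.12, "S3": 0.38, "S4": 0.40}
K = generative_embedded_kernel(probs, lam=0.05)

# States with similar probabilities are more similar than states with distant ones.
print(K[("S1", "S2")] > K[("S1", "S3")])
```

With this form, a small λ makes the similarity drop sharply as soon as two probabilities differ, so only states with nearly identical probabilities are likely to be linked.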

#### 2.2.3. Posterior Inference with Gibbs Sampling

#### 2.3. Properties Captured by the Five Clustering Schemes

- (i) The sub-goal-sensitive model clusters the state space in ways that unveil the differences between states that are central to many paths (like sub-goals or bottlenecks) and from which many actions are possible, versus states that are visited by fewer paths. This method is related to measures such as graph centrality and network flow [47] and empowerment [48], at least in the sense that reaching states that are part of many paths and permit accessing many other states can enhance an agent’s empowerment.
- (ii) The goal-sensitive model and the multiple goal-sensitive model resemble the sub-goal-sensitive model, but with one important difference: here, the information that is extracted is also conditioned on (and influenced by) the agent’s goal(s). This implicitly creates a kind of gradient to the goal that can, in principle, be used to direct a planning algorithm from any start location, with some analogies to the concept of a value function in RL [6].
- (iii) The path-sensitive model can be considered as another sub-case of the sub-goal-sensitive model, in which both the initial and the final goal states are known. Unlike the goal- (or multiple goal-) sensitive models, here an intuitive metaphor is that of a valley created by a river, where the very structure of the state space prescribes a (high probability) path. It can be related to diffusion measures introduced in network analysis, where vertices can be ranked on the basis of their aptitude to spread information, infection or similar quantities through a target vertex [49].
- (iv) The structure-sensitive model is different from the aforementioned schemes, in that it is not directly related to goal-directed navigation, but is more revelatory of structural information, such as connectedness in graph theory. Basically, it decomposes the space into subsets that could be used to subdivide a large and complex task into a hierarchy of simpler tasks. Such decomposition can simplify the planning problem by squeezing the effective size of the original state space into a smaller number of abstract states considered by every sub-task [50].
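The idea of a gradient to the goal can be illustrated with a small sketch. It assumes, purely for illustration, a 4-connected 5×5 grid numbered row by row, and an unnormalized goal-sensitive score of $2^{-d}$, where $d$ is the shortest-path length to the goal (a stand-in for the program code length). The goal S19 mirrors the goal-sensitive test in the open space scenario; the function names are hypothetical.

```python
from collections import deque


def grid_adjacency(n):
    """4-connected n x n grid; states numbered 1..n*n row by row (an assumed layout)."""
    adj = {}
    for r in range(n):
        for c in range(n):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n:
                    nbrs.append(rr * n + cc + 1)
            adj[r * n + c + 1] = nbrs
    return adj


def goal_scores(adj, goal):
    """Unnormalized score 2^-d(s, goal), with d the shortest-path length to the goal."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {s: 2.0 ** -d for s, d in dist.items()}


def follow_gradient(adj, scores, start, goal):
    """Greedy ascent: always step to the neighbor with the highest score."""
    path = [start]
    while path[-1] != goal:
        path.append(max(adj[path[-1]], key=scores.get))
    return path


adj = grid_adjacency(5)
scores = goal_scores(adj, goal=19)
path = follow_gradient(adj, scores, start=1, goal=19)
print(" -> ".join(f"S{s}" for s in path))
```

Because the score strictly increases as the distance to the goal shrinks, the greedy walk reaches the goal from any start state in the minimum number of steps, which is the sense in which the scores behave like a value function.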

## 3. Experiments and Results

#### 3.1. The Three Experimental Scenarios

- (i) Open space scenario. This scenario is composed of 25 discrete states arranged in a square with all transitions enabled for the neighboring states. In this scenario, no structure restrictions (e.g., bottlenecks) are present. See Figure 3a.
- (ii) Grid world scenario. This scenario consists of an environment of 29 states grouped into four rooms. Here, rooms are groups of states linked together by restrictions (bottlenecks), i.e., states that allow the transition from one room to another (states $S3,S12,S13,S22$). In our scenario, we can identify four rooms: Room 1 (states $S1,S2,S7,S8$), Room 2 ($S4,S5,S6,S9,S10,S11$), Room 3 ($S14,S15,S16,S19,S20,S21,S25,S26,S27$) and Room 4 ($S17,S18,S23,S24,S28,S29$). See Figure 3b.
- (iii) Hanoi Tower scenario. This scenario models a Tower of Hanoi (ToH) game with three disks of different sizes that can slide onto three rods. Using a well-known mapping [51], the “three disks-three rods” ToH problem is converted into a path planning problem of 27 states. In this scenario, a recursive community structure can be identified [12], i.e., sets of similar moves separated by bottlenecks represented by different kinds of transitions. Bottlenecks can be identified at different hierarchical levels: third order bottlenecks ($S9$–$S11$, $S6$–$S10$ and $S23$–$S24$), second order bottlenecks (e.g., $S3$–$S5$, $S2$–$S4$ and $S7$–$S8$) and first order bottlenecks (e.g., $S1$–$S2$, $S1$–$S3$ and $S2$–$S3$). Transitions among states are shown in Figure 3c.
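The 27-state problem space of the Hanoi Tower scenario can be reproduced by enumerating disk placements directly. The following sketch is one way to do so under the standard rules (only the top disk of a rod moves, and never onto a smaller disk). The tuple encoding of states and the function name `hanoi_graph` are illustrative choices, and the resulting state order need not match the $S1$–$S27$ labels used in Figure 3c.

```python
from itertools import product


def hanoi_graph(n_disks=3, n_rods=3):
    """Map each state to its legal successor states.

    A state is a tuple giving the rod of each disk, with disk 0 the smallest.
    """
    graph = {}
    for state in product(range(n_rods), repeat=n_disks):
        successors = []
        for disk, rod in enumerate(state):
            # A disk can move only if no smaller disk sits on the same rod.
            if any(state[d] == rod for d in range(disk)):
                continue
            for target in range(n_rods):
                # The target rod must not hold a smaller disk.
                if target != rod and all(state[d] != target for d in range(disk)):
                    successors.append(state[:disk] + (target,) + state[disk + 1:])
        graph[state] = successors
    return graph


graph = hanoi_graph()
n_states = len(graph)
n_transitions = sum(len(v) for v in graph.values()) // 2  # each move is reversible
print(n_states, n_transitions)
```

For three disks and three rods this yields 27 states, matching the count given above. The three states with all disks stacked on one rod have two neighbors each, and every other state has three.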

#### 3.2. Simulations and Results

- The clustering matrix graph illustrates a symmetric matrix whose elements represent the empirical probabilities ${P}_{{C}_{k}}({s}_{i},{s}_{j})$, $k=1,\cdots ,T$, that two states ${s}_{i}$ and ${s}_{j}$ belong to the same cluster ${C}_{k}$ in the 25 repetitions. The matrix values are shown in gray scale, with lighter (darker) colors for a higher (lower) frequency of finding two states in the same cluster. The size of the clustering matrix for a scenario of N states is $N\times N$. The states are sorted in rows and columns so as to reflect the best clustering partition of the 25 repetitions, where by the best clustering we mean the one that maximizes Equation (26) (see below). States within the same cluster follow no particular order. The ordering of the different clusters is assigned with an intuitive criterion: when selecting a geCRP scheme, it is possible to assign to every state an algorithmic probability value computed according to the adopted geCRP scheme. Consequently, we can assign to each cluster the mean of the probabilities of the states that belong to it, and then order the clusters as a function of these mean values. Intuitively, clusters that are close in this ordering contain states with similar probability values. As a consequence, the presence of a “block” in the matrix reveals how frequently a cluster appears in the cluster configurations obtained in the 25 runs;
- The best clustering graph shows the best clustering result found in 25 runs. The best clustering result is defined as the cluster configuration that maximizes the log-likelihood:

  $$\mathcal{L}(C_1, \cdots, C_T \mid s_{1:N}) = \sum_{k=1}^{T} \log p\left(s_{z(c_{1:N}) = k} \mid z(c_{1:N})\right) \tag{26}$$
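The clustering matrix described above amounts to a co-occurrence frequency over repeated runs. A minimal sketch follows, assuming each run is summarized as a list of cluster labels (one per state). The function name `coclustering_matrix` and the three toy runs are hypothetical and stand in for the 25 repetitions.

```python
def coclustering_matrix(runs):
    """Fraction of runs in which each pair of states shares a cluster.

    `runs` is a list of label lists; runs[r][i] is the cluster label of state i in run r.
    Labels only need to be consistent within a run, not across runs.
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]


# Three toy runs over four states.
runs = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
M = coclustering_matrix(runs)
for row in M:
    print(" ".join(f"{v:.2f}" for v in row))
```

Sorting rows and columns by cluster membership in the best run, as the text describes, would then make frequently recurring clusters appear as light blocks along the diagonal.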

#### 3.3. Results in the Open Space Scenario

#### 3.4. Results in the Grid World Scenario

#### 3.5. Results in the Hanoi Tower Scenario

## 4. Discussion

- The sub-goal-sensitive model is able to reveal useful sub-goal- or bottleneck-related information (i.e., extract states that can act as useful sub-goals or bottlenecks), if present in the environment. This can be considered a clustering method that is “agnostic” with regard to the specific behavioral goal of the agent, but can be used for planning from any start state to any goal state.
- The structure-sensitive model clusters the environment into coherent geometric partitions or shapes, e.g., rooms in a four-room scenario (grid world). However, in the open space, this method does not find coherent partitions: because every pair of states has the same similarity metric, each clustering experiment produces a different partitioning, due to the stochasticity of Gibbs sampling.
- The goal-sensitive model generates a clustering that groups the states according to a gradient-field that culminates in the goal site and affords effective navigation to a known goal. This method shares some resemblances with value functions that are widely used in reinforcement learning methods, which essentially create gradients to reward locations [6]. Our results indicate that this clustering method is appropriate for planning a path to a known goal location, from any start location.
- The path-sensitive model groups states on the basis of their probability values with respect to both a starting and a goal state, both of which are known a priori. The results indicate that this clustering method is appropriate for planning a path from a known start location to a known goal location.
- The multiple goal-sensitive model groups states in a way that is similar to the goal-sensitive model, but for several (known) goal states. The results show that this clustering method is appropriate when the planning problem includes multiple goal locations. Interestingly, the number and location of goals entail a modification, in an informational sense, of the geometry of the scenario. Consider for example the case of the grid world with three goals, one of which is a bottleneck. Although the scenario is characterized by the presence of four rooms, the model only produces three clusters, thus clustering together two rooms. The dependence of the clustering on the agent’s objectives (here, reaching multiple goal locations) is a hallmark of flexible coding schemes.

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Olshausen, B.A.; Field, D.J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature **1996**, 381, 607–609.
2. Simoncelli, E.P.; Olshausen, B.A. Natural image statistics and neural representation. Annu. Rev. Neurosci. **2001**, 24, 1193–1216.
3. Friston, K. The free-energy principle: A unified brain theory? Nat. Rev. Neurosci. **2010**, 11, 127–138.
4. Botvinick, M.; Weinstein, A.; Solway, A.; Barto, A. Reinforcement learning, efficient coding, and the statistics of natural tasks. Curr. Opin. Behav. Sci. **2015**, 5, 71–77.
5. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2009.
6. Sutton, R.; Barto, A. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998.
7. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature **2015**, 518, 529–533.
8. Van Dijk, S.G.; Polani, D. Grounding sub-goals in information transitions. In Proceedings of the 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Paris, France, 11–15 April 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 105–111.
9. Van Dijk, S.G.; Polani, D.; Nehaniv, C.L. Hierarchical behaviours: Getting the most bang for your bit. In Advances in Artificial Life. Darwin Meets von Neumann; Springer: Berlin/Heidelberg, Germany, 2011; pp. 342–349.
10. Van Dijk, S.; Polani, D. Informational Constraints-Driven Organization in Goal-Directed Behavior. Adv. Complex Syst. **2013**, 16.
11. Maisto, D.; Donnarumma, F.; Pezzulo, G. Divide et impera: Subgoaling reduces the complexity of probabilistic inference and problem solving. J. R. Soc. Interface **2015**, 12.
12. Solway, A.; Diuk, C.; Cordova, N.; Yee, D.; Barto, A.G.; Niv, Y.; Botvinick, M.M. Optimal behavioral hierarchy. PLoS Comput. Biol. **2014**, 10, e1003779.
13. Rigotti, M.; Barak, O.; Warden, M.R.; Wang, X.J.; Daw, N.D.; Miller, E.K.; Fusi, S. The importance of mixed selectivity in complex cognitive tasks. Nature **2013**, 497, 585–590.
14. Genovesio, A.; Tsujimoto, S.; Wise, S.P. Encoding goals but not abstract magnitude in the primate prefrontal cortex. Neuron **2012**, 74, 656–662.
15. Pezzulo, G.; Castelfranchi, C. Thinking as the Control of Imagination: A Conceptual Framework for Goal-Directed Systems. Psychol. Res. PRPF **2009**, 73, 559–577.
16. Pezzulo, G.; Rigoli, F.; Friston, K. Active Inference, homeostatic regulation and adaptive behavioural control. Prog. Neurobiol. **2015**, 134, 17–35.
17. Stoianov, I.; Genovesio, A.; Pezzulo, G. Prefrontal goal-codes emerge as latent states in probabilistic value learning. J. Cogn. Neurosci. **2015**, 28, 140–157.
18. Verschure, P.F.M.J.; Pennartz, C.M.A.; Pezzulo, G. The why, what, where, when and how of goal-directed choice: Neuronal and computational principles. Philos. Trans. R. Soc. B **2014**, 369.
19. Pitman, J. Combinatorial Stochastic Processes; Picard, J., Ed.; Springer: Berlin/Heidelberg, Germany, 2006.
20. Blei, D.M.; Frazier, P.I. Distance dependent Chinese restaurant processes. J. Mach. Learn. Res. **2011**, 12, 2461–2488.
21. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012; p. 15.
22. Therrien, C.W. Decision Estimation and Classification: An Introduction to Pattern Recognition and Related Topics; Wiley: Hoboken, NJ, USA, 1989.
23. Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Stat. **1973**, 1, 209–230.
24. Dahl, D.B. Distance-based probability distribution for set partitions with applications to Bayesian nonparametrics. In JSM Proceedings, Section on Bayesian Statistical Science, Washington, DC, USA, 30 July–6 August 2009.
25. Ahmed, A.; Xing, E. Dynamic Non-Parametric Mixture Models and the Recurrent Chinese Restaurant Process: With Applications to Evolutionary Clustering. In Proceedings of the 2008 SIAM International Conference on Data Mining, Atlanta, GA, USA, 24–26 April 2008; pp. 219–230.
26. Zhu, X.; Ghahramani, Z.; Lafferty, J. Time-Sensitive Dirichlet Process Mixture Models; Technical Report CMU-CALD-05-104; Carnegie Mellon University: Pittsburgh, PA, USA, May 2005.
27. Rasmussen, C.E.; Ghahramani, Z. Infinite mixtures of Gaussian process experts. Adv. Neural Inf. Process. Syst. **2002**, 2, 881–888.
28. Haussler, D. Convolution Kernels on Discrete Structures; Technical Report UCSC-CRL-99-10; University of California at Santa Cruz: Santa Cruz, CA, USA, July 1999.
29. Jaakkola, T.; Haussler, D. Exploiting generative models in discriminative classifiers. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1999; pp. 487–493.
30. Shawe-Taylor, J.; Cristianini, N. Kernel Methods for Pattern Analysis; Cambridge University Press: Cambridge, UK, 2004.
31. Brodersen, K.H.; Schofield, T.M.; Leff, A.P.; Ong, C.S.; Lomakina, E.I.; Buhmann, J.M.; Stephan, K.E. Generative embedding for model-based classification of fMRI data. PLoS Comput. Biol. **2011**, 7, e1002079.
32. Li, M.; Vitányi, P.M. An Introduction to Kolmogorov Complexity and Its Applications; Springer: Berlin/Heidelberg, Germany, 2009.
33. Solomonoff, R.J. A formal theory of inductive inference. Part I. Inf. Control **1964**, 7, 1–22.
34. Solomonoff, R.J. A formal theory of inductive inference. Part II. Inf. Control **1964**, 7, 224–254.
35. Solomonoff, R.J. Complexity-based induction systems: Comparisons and convergence theorems. IEEE Trans. Inf. Theory **1978**, 24, 422–432.
36. Hutter, M. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability; Springer: Berlin/Heidelberg, Germany, 2005.
37. Zvonkin, A.K.; Levin, L.A. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russ. Math. Surv. **1970**, 25, 83–124.
38. Van Dijk, S.G.; Polani, D. Informational constraints-driven organization in goal-directed behavior. Adv. Complex Syst. **2013**, 16, 1350016.
39. Newell, A.; Simon, H.A. Human Problem Solving; Prentice Hall: Upper Saddle River, NJ, USA, 1972.
40. Schölkopf, B.; Tsuda, K.; Vert, J.P. Kernel Methods in Computational Biology; MIT Press: Cambridge, MA, USA, 2004.
41. Ruiz, A.; López-de-Teruel, P.E. Nonlinear kernel-based statistical pattern analysis. IEEE Trans. Neural Netw. **2001**, 12, 16–32.
42. Storn, R.; Price, K. Differential evolution – A simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. **1997**, 11, 341–359.
43. Geman, S.; Geman, D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. **1984**, PAMI-6, 721–741.
44. Robert, C.; Casella, G. Monte Carlo Statistical Methods; Springer: Berlin/Heidelberg, Germany, 2013.
45. Neal, R.M. Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. **2000**, 9, 249–265.
46. Anderson, J.R. The adaptive nature of human categorization. Psychol. Rev. **1991**, 98, 409–429.
47. Borgatti, S.P. Centrality and network flow. Soc. Netw. **2005**, 27, 55–71.
48. Tishby, N.; Polani, D. Information Theory of Decisions and Actions. In Perception-Action Cycle; Springer: Berlin/Heidelberg, Germany, 2011; pp. 601–636.
49. Barrat, A.; Barthelemy, M.; Vespignani, A. Dynamical Processes on Complex Networks; Cambridge University Press: Cambridge, UK, 2008.
50. Barto, A.G.; Mahadevan, S. Recent advances in hierarchical reinforcement learning. Discret. Event Dyn. Syst. **2003**, 13, 41–77.
51. Nilsson, N.J. Problem-Solving Methods in Artificial Intelligence; McGraw-Hill: New York, NY, USA, 1971.
52. Botvinick, M.M. Hierarchical models of behavior and prefrontal function. Trends Cogn. Sci. **2008**, 12, 201–208.
53. Kiebel, S.J.; Daunizeau, J.; Friston, K.J. A hierarchy of time-scales and the brain. PLoS Comput. Biol. **2008**, 4, e1000209.
54. Tse, D.; Langston, R.F.; Kakeyama, M.; Bethus, I.; Spooner, P.A.; Wood, E.R.; Witter, M.P.; Morris, R.G.M. Schemas and memory consolidation. Science **2007**, 316, 76–82.
55. Pezzulo, G.; van der Meer, M.A.A.; Lansink, C.S.; Pennartz, C.M.A. Internally generated sequences in learning and executing goal-directed behavior. Trends Cogn. Sci. **2014**, 18, 647–657.
56. McClelland, J.L.; McNaughton, B.L.; O’Reilly, R.C. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. **1995**, 102, 419–457.
57. Collins, A.; Koechlin, E. Reasoning, learning, and creativity: Frontal lobe function and human decision-making. PLoS Biol. **2012**, 10, e1001293.
58. Donoso, M.; Collins, A.G.; Koechlin, E. Foundations of human reasoning in the prefrontal cortex. Science **2014**, 344, 1481–1486.
59. Schapiro, A.C.; Rogers, T.T.; Cordova, N.I.; Turk-Browne, N.B.; Botvinick, M.M. Neural representations of events arise from temporal community structure. Nat. Neurosci. **2013**, 16, 486–492.
60. Duncan, J. The multiple-demand (MD) system of the primate brain: Mental programs for intelligent behaviour. Trends Cogn. Sci. **2010**, 14, 172–179.
61. Passingham, R.E.; Wise, S.P. The Neurobiology of the Prefrontal Cortex: Anatomy, Evolution, and the Origin of Insight; Oxford University Press: Oxford, UK, 2012.
62. Donnarumma, F.; Prevete, R.; Chersi, F.; Pezzulo, G. A Programmer-Interpreter Neural Network Architecture for Prefrontal Cognitive Control. Int. J. Neural Syst. **2015**, 25, 1550017.

**Figure 1.** Schematization of the generative embedded Chinese restaurant process. A generative embedded Chinese restaurant process (geCRP) can be viewed as the composition of three different modules. Initially, in Stage 1, a probabilistic model $\mathcal{M}=(S,{P}_{\mathcal{M}})$ of the problem at hand has to be defined. $\mathcal{M}$ is a probability space where ${P}_{\mathcal{M}}$ is a distribution on the space of events coinciding with the subsets of the problem state space. ${P}_{\mathcal{M}}$ acts as an embedding function that projects (collections of) states in $[0,1]$. To this aim, any probabilistic framework can be adopted. Here, we adopt a methodology from information theory, algorithmic probability theory, to construct different distributions, as detailed in Section 2.2.1. Thereafter, in Stage 2, a generative embedded kernel ${K}_{\mathcal{M}}:S\times S\to {\mathbb{R}}^{+}$ is defined that represents a similarity metric and determines a generative score space. This stage is described in Equation (3) and can be thought of as being split into two steps: first, the introduction of a kernel function $K:{P}_{\mathcal{M}}\left(\mathcal{F}\right)\times {P}_{\mathcal{M}}\left(\mathcal{F}\right)\to {\mathbb{R}}^{+}$, chosen among those in common use according to the characteristics of the problem (Section 2.2.2); second, the composition of K with the function ${P}_{\mathcal{M}}\times {P}_{\mathcal{M}}$, whose domain is the Cartesian product $S\times S$ of the space of states S with itself. In the last stage, Stage 3, the generative embedded kernel is used in a Gibbs sampling scheme (Section 2.2.3) to infer the cluster partitioning that best explains the observations. The clusters can then be visualized and interpreted according to the generative model adopted in Stage 1.

**Figure 2.** Effects of adding/removing a link in the customer link and table assignment representations. Illustration of the potential effects on cluster configurations considered by the Gibbs sampler when adding/removing a link between two customers, in the customer link (left side) and table assignment (right side) representations. When the gray link is removed, it could leave the cluster configuration unaltered (the first two examples from the top) or separate a cluster into two (the bottom example). When it is added, it could leave the clustering unchanged (the first two examples) or merge two distinct clusters into one (the last example).

**Figure 3.** 2D discrete scenarios used in the experimental analysis. (**a**) Open space, a scenario without restrictions and bottlenecks; (**b**) grid world, four different groups of states linked together by four bottlenecks; (**c**) Hanoi Tower, a model of the tower of Hanoi game in the configuration of three disks and three rods.

**Figure 4.** Test OS1: $geCRP/subg/L$ results in the open space scenario (rectangles highlight the first cluster of sub-goals $\{S13\}$). (**a**) Clustering matrix; (**b**) best clustering. See Table 1 for the parameters and the text for details.

**Figure 5.** Test OS2: $geCRP/struc/L$ results in the open space scenario. (**a**) Clustering matrix; (**b**) best clustering. See Table 1 for the parameters and the text for details.

**Figure 6.** Test OS3: $geCRP/goal/L$ results in the open space scenario (${s}_{g}=S19$, highlighted with rectangles in both panels). (**a**) Clustering matrix; (**b**) best clustering. See Table 1 for the parameters and the text for details.

**Figure 7.** Test OS4: $geCRP/path/L$ results in the open space scenario (${s}_{o}=S1$, ${s}_{g}=S25$, highlighted with rectangles in both panels). (**a**) Clustering matrix; (**b**) best clustering. See Table 1 for the parameters and the text for details.

**Figure 8.** Test OS5: $geCRP/multig/L$ results in the open space scenario (goals $=\{S1,S21,S5,S25\}$, highlighted with rectangles in both panels). (**a**) Clustering matrix; (**b**) best clustering. See Table 1 for the parameters and the text for details.

**Figure 9.** Test GW1: $geCRP/subg/L$ results in the grid world scenario (rectangles highlight the first sub-goal cluster $\{S10, S15, S20, S21, S25\}$). (**a**) Clustering matrix; (**b**) best clustering. See Table 2 for the parameters and the text for details.

**Figure 10.** Test GW2: $geCRP/struc/L$ results in the grid world scenario (rectangles highlight bottlenecks $S3$, $S12$, $S13$, $S22$). (**a**) Clustering matrix; (**b**) best clustering. See Table 2 for the parameters and the text for details.

**Figure 11.** Test GW3: $geCRP/goal/L$ results in the grid world scenario (${s}_{g}=S25$, highlighted with rectangles in both panels). (**a**) Clustering matrix; (**b**) best clustering. See Table 2 for the parameters and the text for details.

**Figure 12.** Test GW4: $geCRP/path/L$ results in the grid world scenario (${s}_{o}=S6$, ${s}_{g}=S25$, highlighted with rectangles in both panels). (**a**) Clustering matrix; (**b**) best clustering. See Table 2 for the parameters and the text for details.

**Figure 13.** Test GW5: $geCRP/multig/L$ results in the grid world scenario (goals $=\{S25,S1,S13\}$, highlighted with rectangles in both panels). (**a**) Clustering matrix; (**b**) best clustering. See Table 2 for the parameters and the text for details.

**Figure 14.** Test GW6: $geCRP/multig/L$ results in the grid world scenario (goals $=\{S25,S1,S13,S6,S29\}$, highlighted with rectangles in both panels). (**a**) Clustering matrix; (**b**) best clustering. See Table 2 for the parameters and the text for details.

**Figure 15.** Test ToH1: $geCRP/subg/L$ results in the Hanoi Tower scenario (rectangles highlight the first sub-goal cluster $\{S2, S3, S16, S19, S21, S26\}$). (**a**) Clustering matrix; (**b**) best clustering. See Table 3 for the parameters and the text for details.

**Figure 16.** Test ToH2: $geCRP/struc/L$ results in the Hanoi Tower scenario (rectangles highlight third order bottlenecks). (**a**) Clustering matrix; (**b**) best clustering. See Table 3 for the parameters and the text for details.

**Figure 17.** Test ToH3: $geCRP/goal/L$ results in the Hanoi Tower scenario (${s}_{g}=S20$, highlighted with rectangles). (**a**) Clustering matrix; (**b**) best clustering. See Table 3 for the parameters and the text for details.

**Figure 18.** Test ToH4: $geCRP/path/L$ results in the Hanoi Tower scenario (${s}_{o}=S27$, ${s}_{g}=S20$, highlighted with rectangles). (**a**) Clustering matrix; (**b**) best clustering. See Table 3 for the parameters and the text for details.

**Figure 19.** Test ToH5: $geCRP/multig/L$ results in the Hanoi Tower scenario (goals $=\{S27,S20\}$, highlighted with rectangles). (**a**) Clustering matrix; (**b**) best clustering. See Table 3 for the parameters and the text for details.

**Figure 20.** Test ToH6: $geCRP/multig/L$ results in the Hanoi Tower scenario (goals $=\{S1,S11,S20,S27\}$, highlighted with rectangles). (**a**) Clustering matrix; (**b**) best clustering. See Table 3 for the parameters and the text for details.

**Table 1.** Parameter setup for Tests OS1–5 in the open space scenario. geCRP, generative embedded Chinese restaurant process.

Test | OS1 | OS2 | OS3 | OS4 | OS5 |
---|---|---|---|---|---|
geCRP scheme | $geCRP/subg/L$ | $geCRP/struc/L$ | $geCRP/goal/L$ | $geCRP/path/L$ | $geCRP/multig/L$ |
α | 0.08 | 0.08 | 0.008 | 0.008 | 0.08 |
λ | $1\times {10}^{-5}$ | $1\times {10}^{-5}$ | $1\times {10}^{-4}$ | $1\times {10}^{-5}$ | 0.1 |
case-specific setup | – | – | ${s}_{g}=S19$ | ${s}_{o}=S1$, ${s}_{g}=S25$ | goals: $S1$, $S21$, $S5$, $S25$ |

**Table 2.** Parameter setup for Tests GW1–6 in the grid world scenario.

Test | GW1 | GW2 | GW3 | GW4 | GW5 | GW6 |
---|---|---|---|---|---|---|
geCRP scheme | $geCRP/subg/L$ | $geCRP/struc/L$ | $geCRP/goal/L$ | $geCRP/path/L$ | $geCRP/multig/L$ | $geCRP/multig/L$ |
α | 0.008 | 0.08 | 0.008 | 0.008 | 0.01 | 0.01 |
λ | $1\times {10}^{-3}$ | $1\times {10}^{-4}$ | $1\times {10}^{-6}$ | $1\times {10}^{-7}$ | $1\times {10}^{-7}$ | $1\times {10}^{-7}$ |
case-specific setup | – | – | ${s}_{g}=S25$ | ${s}_{o}=S6$, ${s}_{g}=S25$ | goals: $S25$, $S1$, $S13$ | goals: $S25$, $S1$, $S13$, $S6$, $S29$ |

**Table 3.** Parameter setup for Tests ToH1–6 in the Hanoi Tower scenario.

Test | ToH1 | ToH2 | ToH3 | ToH4 | ToH5 | ToH6 |
---|---|---|---|---|---|---|
geCRP scheme | $geCRP/subg/L$ | $geCRP/struc/L$ | $geCRP/goal/L$ | $geCRP/path/L$ | $geCRP/multig/L$ | $geCRP/multig/L$ |
α | 0.08 | 0.08 | 0.008 | 0.08 | 0.1 | 0.1 |
λ | $1\times {10}^{-5}$ | $1\times {10}^{-5}$ | $1\times {10}^{-7}$ | $1\times {10}^{-8}$ | $1\times {10}^{-3}$ | $1\times {10}^{-3}$ |
case-specific setup | – | – | ${s}_{g}=S20$ | ${s}_{o}=S27$, ${s}_{g}=S20$ | goals: $S27$, $S20$ | goals: $S1$, $S11$, $S20$, $S27$ |

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Maisto, D.; Donnarumma, F.; Pezzulo, G.
Nonparametric Problem-Space Clustering: Learning Efficient Codes for Cognitive Control Tasks. *Entropy* **2016**, *18*, 61.
https://doi.org/10.3390/e18020061
