# Inferring an Observer’s Prediction Strategy in Sequence Learning Experiments


## Abstract


## 1. Introduction

## 2. Background

#### 2.1. A Hypothetical Experiment

#### 2.2. Some Hypothetical Observers

#### 2.3. Memory, Complexity, and Order-R Markov Models

## 3. Results

#### 3.1. A Simple, Principled Strategy for Inferring an Observer’s Prediction Algorithm

#### 3.2. With Infinite Identical Observers, We Can Infer Observers’ Prediction Strategies

**Lemma 1.**

**Proof.**

**Theorem 1.**

**Proof.**

#### 3.3. With Reasonable Amounts of Finite Data, We Can Only Infer the Model Class

## 4. Discussion

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest


## Appendix A. Derivation for Expressions of Likelihood


**Figure 1.** The quintessential experiment to reveal an observer’s prediction strategy: observations are shown to the observer, who then tries to predict the next observation. This happens repeatedly for as many trials as the observer can stand.

**Figure 2.** Two example order-R Markov models: order 0 (**top**) and order 1 (**bottom**) with the finite alphabet $\mathcal{A}=\{0,1\}$.
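To make the order-R idea concrete, here is a minimal sketch of sampling from such models over $\mathcal{A}=\{0,1\}$. The function name, dictionary encoding of transition probabilities, and parameter values are ours for illustration, not the paper's:

```python
import random

def sample_markov(transitions, length, order, seed=0):
    """Sample a binary sequence from an order-R Markov model.

    `transitions` maps each length-R context (a tuple of past symbols)
    to the probability of emitting a 1 next; order 0 uses the empty context.
    """
    rng = random.Random(seed)
    # Seed the sequence with R uniform symbols so every context is defined.
    seq = [rng.randint(0, 1) for _ in range(order)]
    for _ in range(length):
        context = tuple(seq[-order:]) if order > 0 else ()
        p_one = transitions[context]
        seq.append(1 if rng.random() < p_one else 0)
    return seq[order:]

# Order 0: i.i.d. coin with P(1) = 0.3, as in the top panel.
order0 = sample_markov({(): 0.3}, length=20, order=0)
# Order 1: P(next = 1) depends only on the previous symbol (bottom panel).
order1 = sample_markov({(0,): 0.9, (1,): 0.1}, length=20, order=1)
```

An order-R model needs a probability for each of the $|\mathcal{A}|^{R}$ contexts, which is why memory cost grows exponentially in R.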

**Figure 3.** The average error in inferring ${\varphi}_{ngram-argmax}$ for the n-gram argmax prediction strategy and ${\varphi}_{ngram-average}$ for the n-gram average prediction strategy suggests that the inference performs essentially perfectly. For n-gram average, every combination of parameters yielded perfect estimates of ${\varphi}_{ngram-average}$. For n-gram argmax, the sample sizes were 7 for each $(R,N)$ pair with $R=2$ or $R=6$, and 10 for $R=4$; for n-gram average, the sample sizes were 9 for each $(R,N)$ pair.
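The two strategies differ only in how they turn n-gram counts into a guess. As a hedged sketch (the function names, add-$\alpha$ smoothing form, and example data are ours, not the paper's exact parameterization): argmax always reports the most probable next symbol, while the average strategy reports the full posterior-mean probabilities (probability matching):

```python
from collections import Counter

def ngram_predict(history, order, alpha=1.0, alphabet=(0, 1)):
    """Posterior-mean next-symbol probabilities from n-gram counts,
    using add-alpha (Dirichlet) smoothing for the current context."""
    context = tuple(history[-order:]) if order > 0 else ()
    counts = Counter()
    for i in range(order, len(history)):
        if tuple(history[i - order:i]) == context:
            counts[history[i]] += 1
    total = sum(counts.values()) + alpha * len(alphabet)
    return {a: (counts[a] + alpha) / total for a in alphabet}

def predict_argmax(history, order, alpha=1.0):
    """n-gram argmax: always guess the single most probable next symbol."""
    probs = ngram_predict(history, order, alpha)
    return max(probs, key=probs.get)

history = [0, 1, 0, 1, 0, 1, 0]
# After a 0, a 1 has always followed, so argmax guesses 1.
print(predict_argmax(history, order=1))
```

The average strategy would instead emit `ngram_predict(...)` itself (or sample from it), which is what makes its likelihood surface smoother in Figure 5.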

**Figure 4.** Surface and contour plots of the log likelihood of the n-gram argmax prediction strategy show a peak at a particular combination of parameters. The x and y axes show the parameters ${\varphi}_{ngram-argmax}$ (the ratio of $\beta$, the probability of dropping an observation, to $\alpha - 1$, the concentration parameter minus one) and $\gamma$ (the regularization term in the prior over models); the z-axis shows the log likelihood of the observer model for a string of inputs, averaged over infinitely many identical observers. Though difficult to see at this scale, the average log likelihood has ridges as a function of ${\varphi}_{ngram-argmax}$ that we cannot yet explain.

**Figure 5.** Surface and contour plots of the log likelihood of the n-gram average prediction strategy show a much smoother surface than for n-gram argmax. The x and y axes show the parameters ${\varphi}_{ngram-average}$ (the ratio of $\beta$, the probability of dropping an observation, to $\alpha - 1$, the concentration parameter minus one) and $\gamma$ (the regularization term in the prior over models); the z-axis shows the log likelihood of the observer model for a string of inputs, averaged over infinitely many identical observers. This surface is much easier to optimize over, though it is not without ridges of its own.
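Surfaces like these are maximized by brute-force evaluation over a parameter grid. A minimal sketch of that step, with an illustrative toy surface standing in for the paper's averaged log likelihood (the grid values and peak location are assumptions, not the paper's):

```python
import math

def grid_search_mle(avg_loglik, phis, gammas):
    """Return the (phi, gamma) grid point with maximal average log likelihood.

    `avg_loglik(phi, gamma)` is assumed to return the log likelihood of the
    observer model, averaged over (simulated) identical observers.
    """
    best = (None, None, -math.inf)
    for phi in phis:
        for gamma in gammas:
            ll = avg_loglik(phi, gamma)
            if ll > best[2]:
                best = (phi, gamma, ll)
    return best

# Toy concave surface with a known peak at phi = 0.5, gamma = 0.1.
toy = lambda phi, gamma: -((phi - 0.5) ** 2 + (gamma - 0.1) ** 2)
phi_hat, gamma_hat, _ = grid_search_mle(
    toy, phis=[0.1 * k for k in range(11)], gammas=[0.05 * k for k in range(5)]
)
```

On a ridged surface like Figure 4's, a dense grid is safer than gradient ascent, since ridges create many near-optimal parameter combinations.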

**Table 1.** The confusion matrix for strategy inference shows that we are able to perfectly infer the observer’s prediction model class, even if we are not able to perfectly infer the exact parameters.

| Inferred Strategy | Actual: Bayesian | Actual: GLM | Total |
|---|---|---|---|
| Bayesian | 100 | 0 | 100 |
| GLM | 0 | 100 | 100 |
| Total | 100 | 100 | 200 |
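Tabulating such a matrix from inferred versus actual labels is straightforward; a minimal sketch (the label strings and 100-observers-per-strategy setup mirror the table, but the function itself is ours):

```python
def confusion_matrix(actual, inferred, labels=("Bayesian", "GLM")):
    """Count (inferred, actual) label pairs, as summarized in Table 1."""
    table = {(i, a): 0 for i in labels for a in labels}
    for a, i in zip(actual, inferred):
        table[(i, a)] += 1
    return table

# Perfect recovery of the model class: 100 simulated observers per strategy,
# with every inferred label matching the true one.
actual = ["Bayesian"] * 100 + ["GLM"] * 100
inferred = list(actual)
cm = confusion_matrix(actual, inferred)
```

Perfect class recovery puts all mass on the diagonal, exactly as in the table above.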

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Uppal, A.; Ferdinand, V.; Marzen, S.
Inferring an Observer’s Prediction Strategy in Sequence Learning Experiments. *Entropy* **2020**, *22*, 896.
https://doi.org/10.3390/e22080896
