# Learning Genetic Population Structures Using Minimization of Stochastic Complexity


## Abstract


## 1. Introduction

## 2. Results and Discussion

#### 2.1. Tree-based factorization of the joint distribution of multilocus genotypes

#### 2.2. Chow expansion of the joint distribution of multilocus genotypes

#### Prior predictive data distributions under Chow expansion

#### 2.3. Stochastic complexity and learning of classifications and tree structures

#### Asymptotic expansion of the stochastic complexity for a Chow expansion

#### The stochastic complexity for an unsupervised classification under Chow expansion

**Figure 1.** Graphical representation of the dependence structure for an unsupervised classification model augmented by Chow–Liu trees. Here $d=5$ and $k=2$; the unbroken arrows correspond to dependence between the stochastic nodes, and the dashed arrows correspond to the dependence of the root nodes on the classification variable λ, which is connected to the trees by a random switch (represented by the curved arrow) according to the probabilities in λ.

#### 2.4. Algorithms for learning unsupervised classifications and Chow expansions

#### Deterministic algorithm for learning Chow expansions

**A1.**- Compute the numbers $$I{P}_{i,j}=\sum _{u=0}^{1}\sum _{v=0}^{1}{\widehat{P}}_{i,j}\left(u,v\right)\log \frac{{\widehat{P}}_{i,j}\left(u,v\right)}{{\widehat{P}}_{i}\left(u\right)\cdot {\widehat{P}}_{j}\left(v\right)}-\frac{1}{2}\cdot \frac{1}{t}\left[\log \left({n}_{c}\left(1\right)\right)+\log \left({n}_{c}\left(0\right)\right)\right]$$
**A2.**- Construct a complete undirected graph with the binary variables as nodes.
**A3.**- Construct a maximum weighted spanning tree with the extra condition that an edge is in the tree only if $I{P}_{i,j}>0$.
**A4.**- Make the maximum weighted spanning tree directed by choosing a root variable and setting the direction of all edges to be outward from the root.

Step **A3** is, when the condition permitting disconnected graphs is not imposed, the standard problem of constructing a maximum weighted spanning tree. The most time-honoured algorithm for this task is the Borůvka–Choquet–Kruskal algorithm [29].

After executing steps **A1**–**A4** we have a tree structure.

#### Deterministic algorithm for learning unsupervised classification augmented by Chow expansions

**B1.**- Fix $k$, set $w=0$ and store an arbitrary (random) ${U}_{\left(w\right)}$.
**B2.**- Find the structure ${\widehat{\Pi}}_{\left(w\right)}$ maximizing $$\sum _{c=1}^{k}\frac{{t}_{c}}{n}\sum _{i=2}^{d}{I}_{i,{\Pi}_{c}\left(i\right)}-\frac{1}{2}k\cdot \left(2d\right)\frac{\log n}{n}$$ (using the algorithm **A1**–**A4**).
**B3.**- For ${U}_{\left(w\right)}$ and ${\widehat{\Pi}}_{\left(w\right)}$ compute the maximum likelihood estimates ${\widehat{\Theta}}_{\left(w\right)}$ and ${\widehat{\lambda}}_{\left(w\right)}$.
**B4.**- Given ${\widehat{\Theta}}_{\left(w\right)}$, ${\widehat{\lambda}}_{\left(w\right)}$, and ${\widehat{\Pi}}_{\left(w\right)}$, determine ${U}_{(w+1)}={\left\{{\left({u}_{c}^{\left(l\right)}\right)}_{(w+1)}\right\}}_{c,l=1}^{n,k}$ using $${\left({u}_{c}^{\left(l\right)}\right)}_{(w+1)}=\left\{\begin{array}{cc}1\hfill & \text{if}\phantom{\rule{4.pt}{0ex}}{c}_{*}^{\left(l\right)}=c\phantom{\rule{4.pt}{0ex}}\hfill \\ 0\hfill & \text{otherwise,}\hfill \end{array}\right.$$ where $${c}_{*}^{\left(l\right)}=arg\underset{1\le c\le k}{max}{P}_{{\underline{\widehat{\theta}}}_{c},{\underline{\widehat{\varphi}}}_{c}}\left({x}^{\left(l\right)}\mid {\Pi}_{c}\right){\widehat{\lambda}}_{c}.$$
**B5.**- If ${U}_{(w+1)}={U}_{\left(w\right)}$, then stop, otherwise set $w=w+1$ and go to **B2**.

The algorithm visits **B2** only a finite number of times and, after having stopped, will have found a local minimum of the stochastic complexity, the final classification being given by the rule in **B4**.
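The alternating scheme **B1**–**B5** can be sketched as follows. For brevity, this sketch replaces the class-conditional Chow expansion of step **B2** with an independent-Bernoulli model per class (with Laplace smoothing, an addition not in the original), so it illustrates only the hard-assignment dynamics of steps **B1** and **B3**–**B5**; all names are illustrative.

```python
import numpy as np


def classify_hard_assignment(X, k, max_iter=50, seed=0):
    """Alternating loop in the spirit of B1-B5 on an (n, d) binary array X.
    Each class uses an independent-Bernoulli model instead of the full
    Chow expansion of step B2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    z = rng.integers(0, k, size=n)          # B1: arbitrary random start
    for _ in range(max_iter):
        # B3: maximum likelihood estimates per class
        # (Laplace-smoothed so that empty classes and log 0 cause no trouble)
        theta = np.vstack([
            (X[z == c].sum(axis=0) + 1.0) / ((z == c).sum() + 2.0)
            for c in range(k)])
        lam = np.clip(np.array([(z == c).mean() for c in range(k)]), 1e-12, None)
        # B4: reassign each item to the class maximizing P(x | class) * lambda_c
        loglik = (X @ np.log(theta).T
                  + (1 - X) @ np.log(1 - theta).T
                  + np.log(lam))
        z_new = loglik.argmax(axis=1)
        # B5: stop when the classification no longer changes
        if np.array_equal(z_new, z):
            break
        z = z_new
    return z
```

Since each iteration can only decrease the criterion and there are finitely many classifications, the loop terminates, mirroring the finite-visit argument for **B2** above.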

#### 2.5. Discussion

## Acknowledgements

## Appendices

#### A.1. Prior predictive data distributions under Chow expansion

#### A.2. Asymptotic expansion of the stochastic complexity for a Chow expansion

## References

- Ewens, W.J. Mathematical Population Genetics, 2nd ed.; Springer-Verlag: New York, NY, USA, 2004. [Google Scholar]
- Nagylaki, T. Theoretical Population Genetics; Springer-Verlag: Berlin, Germany, 1992. [Google Scholar]
- Pritchard, J.K.; Stephens, M.; Donnelly, P. Inference of population structure using multilocus genotype data. Genetics
**2000**, 155, 945–959. [Google Scholar] - Dawson, K.J.; Belkhir, K. A Bayesian approach to the identification of panmictic populations and the assignment of individuals. Genet. Res.
**2001**, 78, 59–77. [Google Scholar] [CrossRef] [PubMed] - Corander, J.; Waldmann, P.; Sillanpää, M.J. Bayesian analysis of genetic differentiation between populations. Genetics
**2003**, 163, 367–374. [Google Scholar] [PubMed] - Corander, J.; Marttinen, P. Bayesian identification of admixture events using multi-locus molecular markers. Mol. Ecol.
**2006**, 15, 2833–2843. [Google Scholar] [CrossRef] [PubMed] - Corander, J.; Gyllenberg, M.; Koski, T. Random Partition models and Exchangeability for Bayesian Identification of Population Structure. Bull. Math. Biol.
**2007**, 69, 797–815. [Google Scholar] [CrossRef] [PubMed] - Falush, D.; Stephens, M.; Pritchard, J.K. Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics
**2003**, 164, 1567–1587. [Google Scholar] [PubMed] - Guillot, G.; Estoup, A.; Mortier, F.; Cosson, J.F. A spatial statistical model for landscape genetics. Genetics
**2005**, 170, 1261–1280. [Google Scholar] [CrossRef] [PubMed] - Guillot, G.; Leblois, R.; Coulon, A.; Frantz, A.C. Statistical methods in spatial genetics. Mol. Ecol.
**2010**, 18, 4734–4756. [Google Scholar] [CrossRef] [PubMed] - Gyllenberg, M.; Carlsson, J.; Koski, T. Bayesian Network Classification of Binarized DNA Fingerprinting Patterns. In Mathematical Modelling and Computing in Biology and Medicine; Capasso, V., Ed.; Progetto Leonardo: Bologna, Italy, 2003; pp. 60–66. [Google Scholar]
- Chow, C.K.; Liu, C.N. Approximating Discrete Probability Distributions with Dependence Trees. IEEE Trans. Inf. Theory
**1968**, 14, 462–467. [Google Scholar] [CrossRef] - Friedman, N.; Geiger, D.; Goldszmidt, M. Bayesian Network Classifiers. Mach. Learn.
**1997**, 29, 1–36. [Google Scholar] [CrossRef] - Corander, J.; Tang, J. Bayesian analysis of population structure based on linked molecular information. Math. Biosci.
**2007**, 205, 19–31. [Google Scholar] [CrossRef] [PubMed] - Cowell, R.G.; Dawid, A.P.; Lauritzen, S.L.; Spiegelhalter, D.J. Probabilistic Networks and Expert Systems; Springer-Verlag: New York, NY, USA, 1999. [Google Scholar]
- Koski, T.; Noble, J.N. Bayesian Networks: an Introduction; Wiley: Chichester, UK, 2009. [Google Scholar]
- Meilă, M.; Jordan, M.I. Learning with Mixtures of Trees. J. Mach. Learn. Res.
**2000**, 1, 1–48. [Google Scholar] - Pearl, J. Probabilistic Reasoning in Intelligent Systems; Morgan Kaufmann: San Francisco, CA, USA, 1988. [Google Scholar]
- Becker, A.; Geiger, D.; Meek, C. Perfect Tree-like Markovian Distributions. Proc. 16th Conf. Uncertainty in Artificial Intelligence
**2000**, 19–23. [Google Scholar] - Heckerman, D.; Geiger, D.; Chickering, D.M. Learning Bayesian Networks: The combination of knowledge and statistical data. Mach. Learn.
**1995**, 20, 197–243. [Google Scholar] [CrossRef] - Heckerman, D.; Geiger, D.; Chickering, D.M. Likelihoods and Parameter Priors for Bayesian Networks. Microsoft Res. Tech. Rep. MSR-TR-95-54.
- Duda, R.O.; Hart, P.E. Pattern Classification and Scene Analysis; Wiley: New York, NY, USA, 1973. [Google Scholar]
- Clarke, B.S.; Barron, A.R. Jeffreys’ prior is asymptotically least favorable under entropy risk. J. Stat. Planning Inference
**1994**, 41, 37–60. [Google Scholar] [CrossRef] - Rissanen, J. Fisher Information and Stochastic Complexity. IEEE Trans. Inf. Theory
**1996**, 42, 40–47. [Google Scholar] [CrossRef] - Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 1991. [Google Scholar]
- Gyllenberg, M.; Koski, T. Bayesian Predictiveness, Exchangeability and Sufficientness in Bacterial Taxonomy. Math. Biosci.
**2002**, 177 & 178, 161–184. [Google Scholar] [CrossRef] - DeGroot, M.H. Optimal Statistical Decisions; McGraw-Hill: New York, NY, USA, 1970. [Google Scholar]
- Suzuki, J. Learning Bayesian Belief Networks Based on the Minimum Description Length Principle: Basic Properties. IEICE Trans. Fundamentals
**1999**, 82, 2237–2245. [Google Scholar] - Kučera, L. Combinatorial Algorithms; Adam Hilger: Bristol, UK, 1990. [Google Scholar]
- Schwarz, G. Estimating the Dimension of a Model. Ann. Statist.
**1978**, 6, 461–464. [Google Scholar] [CrossRef] - Gyllenberg, M.; Koski, T.; Verlaan, M. Classification of Binary Vectors by Stochastic Complexity. J. Multiv. Analysis
**1997**, 63, 47–72. [Google Scholar] [CrossRef] - Kass, R.E.; Wasserman, L. A Reference Bayesian Test for Nested Hypotheses and Its Relationship to the Schwarz Criterion. J. Amer. Stat. Assoc.
**1995**, 90, 928–934. [Google Scholar] [CrossRef] - Drton, M.; Sturmfels, B.; Sullivant, S. Lectures on Algebraic Statistics; Birkhäuser: Basel, Switzerland, 2009. [Google Scholar]
- Rusakov, D.; Geiger, D. Asymptotic Model Selection for Naive Bayesian Networks. J. Mach. Learn. Res.
**2005**, 6, 1–35. [Google Scholar] - Biernacki, C.; Celeux, G.; Govaert, G. Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood. IEEE Trans. Patt. Anal. Mach. Intel.
**2000**, 22, 719–725. [Google Scholar] [CrossRef] - Haughton, D.M.A. On the Choice of the Model to Fit Data from an Exponential Family. Ann. Statist.
**1988**, 16, 342–355. [Google Scholar] [CrossRef] - Wong, S.K.M.; Poon, F.C.S. Comments on the Approximating Discrete Probability Distributions with Dependence Trees. IEEE Trans. Patt. Anal. Mach. Intel.
**1989**, 11, 333–335. [Google Scholar] [CrossRef] - Balagani, K.S.; Phoha, V.V. On the Relationship between Dependence Tree Classification Error and Bayes Error Rate. IEEE Trans. Patt. Anal. Mach. Intel.
**2007**, 29, 1866–1868. [Google Scholar] [CrossRef] [PubMed] - Rissanen, J. Stochastic Complexity in Learning. J. Comp. System Sci.
**1997**, 55, 89–95. [Google Scholar] [CrossRef] - Vitányi, P.M.B.; Li, M. Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity. IEEE Trans. Inf. Theory
**2000**, 46, 446–464. [Google Scholar] [CrossRef] - Yamanishi, J.K. A decision-theoretic extension of stochastic complexity and its applications to learning. IEEE Trans. Inf. Theory
**1998**, 44, 1424–1439. [Google Scholar] [CrossRef] - Gyllenberg, M.; Koski, T.; Lund, T.; Gyllenberg, H.G. Bayesian Predictive Identification and Cumulative Classification of Bacteria. Bull. Math. Biol.
**1999**, 61, 85–111. [Google Scholar] [CrossRef] [PubMed] - Friedman, N.; Goldszmidt, M. Learning Bayesian Networks with Local Structure. In Learning in Graphical Models; Jordan, M., Ed.; MIT Press: Cambridge, MA, USA, 1997; pp. 421–459. [Google Scholar]
- Lam, W.; Bacchus, F. Learning Bayesian Belief Networks: An Approach Based on the MDL Principle. Comput. Intel.
**1994**, 10, 269–293. [Google Scholar] [CrossRef] - Buntine, W. A guide to the literature on learning probabilistic networks from data. IEEE Trans. Knowl. Data Eng.
**1996**, 8, 195–210. [Google Scholar] [CrossRef] - Sangüesa, R.; Cortés, U. Learning causal networks from data: a survey and a new algorithm for recovering possibilistic causal networks. AI Commun.
**1997**, 10, 31–61. [Google Scholar] - Chow, C.K.; Wagner, T.J. Consistency of an estimate of tree-dependent probability distributions. IEEE Trans. Inf. Theory
**1973**, 19, 369–371. [Google Scholar] [CrossRef] - Cooper, G.F.; Herskovits, E. A Bayesian Method for the Induction of Probabilistic Networks from Data. Mach. Learn.
**1992**, 9, 309–347. [Google Scholar] [CrossRef] - Chickering, D.M. Learning Bayesian Networks is NP-Complete. In Learning from Data. Artificial Intelligence and Statistics; V. Fisher, D., Lenz, H-J., Eds.; Springer-Verlag: New York, NY, USA, 1996; pp. 121–130. [Google Scholar]
- Corander, J.; Gyllenberg, M.; Koski, T. Bayesian model learning based on a parallel MCMC strategy. Stat. Comput.
**2006**, 16, 355–362. [Google Scholar] [CrossRef] - Corander, J.; Ekdahl, M.; Koski, T. Parallel interacting MCMC for learning of topologies of graphical models. Data Mining Knowl. Discovery
**2008**, 17, 431–456. [Google Scholar] [CrossRef] - Corander, J.; Gyllenberg, M.; Koski, T. Bayesian unsupervised classification framework based on stochastic partitions of data and a parallel search strategy. Adv. Data Anal. Classification
**2009**, 3, 3–24. [Google Scholar] [CrossRef]

© 2010 by the authors; licensee MDPI, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license http://creativecommons.org/licenses/by/3.0/.

## Share and Cite

**MDPI and ACS Style**

Corander, J.; Gyllenberg, M.; Koski, T.
Learning Genetic Population Structures Using Minimization of Stochastic Complexity. *Entropy* **2010**, *12*, 1102-1124.
https://doi.org/10.3390/e12051102
