# Computation of the Likelihood of Joint Site Frequency Spectra Using Orthogonal Polynomials


## Abstract


## 1. Introduction

#### 1.1. Inference with a Single Site Frequency Spectrum Assuming Equilibrium

#### 1.2. Inference Based on the Beta Equilibrium Distribution

#### 1.3. Inference Based on the Assumptions of Equilibrium and Rare Mutations

## 2. Mathematical Theory and Algorithms

#### 2.1. The Boundary-Mutation Model

#### 2.2. Modified Gegenbauer Polynomials

#### 2.2.1. Solution of the Pure Drift Forward Equation with Gegenbauer Polynomials

#### 2.2.2. Starting and Prior Distributions

#### 2.2.3. Algorithm 1: Allelic Proportions x with Pure Drift for All Times t, Conditional on Initial Values

- A measure $f(x)$ between zero and one, which may have point masses $m_0$ and $m_1$ at Boundaries 0 and 1, is represented by an expansion of the $H_i(x)$ up to $i=n$. The coefficients $c_i$ are calculated according to Equation (34). The expansion of $g(x)$ times the prior, up to the order $n$, is then:

  $$g(x)=\left(m_0-\sum_{i=2}^{n}c_i\frac{(-1)^{i}}{i}\right)\,\delta(x)+\left(m_1-\sum_{i=2}^{n}\frac{c_i}{i}\right)\,\delta(x-1)+\sum_{i=2}^{n}c_i H_i(x)+O(n+1)\,.$$

- The solution of Equation (28) for all $t$ conditional on the initial distribution can be represented by a series expansion up to $n$:

  $$g(x,t)=\left(m_0-\sum_{i=2}^{n}c_i\frac{(-1)^{i}}{i}\right)\,\delta(x)+\left(m_1-\sum_{i=2}^{n}\frac{c_i}{i}\right)\,\delta(x-1)+\sum_{i=2}^{n}c_i H_i(x)\,e^{-\lambda_i t}+O(n+1)\,,$$

#### 2.3. Modified Gegenbauer Polynomials and the Boundary-Mutation Model

#### 2.3.1. Mutation and Drift: Slowly Evolving Dynamics

#### 2.3.2. Mutation and Drift: Quickly Evolving Dynamics

#### 2.3.3. Mutation and Drift: Slowly and Quickly Evolving Dynamics

**Theorem 1.**

**Proof.**

#### 2.3.4. Boundary-Mutation-Drift Equilibrium Distribution

#### 2.3.5. Prior Distribution

#### 2.3.6. Algorithm 2: Allelic Proportions x with Boundary-Mutations and Drift for All Times t, Conditional on Initial Values

- The interior of a joint distribution $p(x,y\,|\,M_0,\alpha,\theta)$ is represented as a Gegenbauer series (53).
- The slowly evolving part of the system consists of the dynamics at the boundaries. Set the boundary terms at $t=0$ to $b_0(t=0)$ and $b_1(t=0)$ as in Equations (54) and (55). With time, the boundary terms $b_0(t)$ and $b_1(t)$ then change slowly at the rate of $\theta$ according to the exponential function in Equation (39).
- Set $\omega=\int_0^1 p(x,y\,|\,M_0,\alpha,\theta)\,dx=b_0(t)+b_1(t)$. The solution of Equation (41) for all $t$ conditional on $f(x)$ can be represented by a series expansion up to $n$:

  $$f(x,t)=b_0(t)\,\delta(x)+b_1(t)\,\delta(x-1)+\sum_{i=2}^{n}\tau_i(t)\,H_i(x)+O(n+1)\,,$$

  with

  $$\tau_i(t)=\frac{A_i^{\prime}}{\lambda_i}+\left(c_i-\frac{A_i^{\prime}}{\lambda_i}-\frac{B_i^{\prime}}{\lambda_i-\theta}\right)e^{-\lambda_i t}+\frac{B_i^{\prime}}{\lambda_i-\theta}\,e^{-\theta t}\,,$$

  and

  $$\begin{aligned}A_i^{\prime}&=-\omega\alpha\beta\theta\,(2i-1)\,i\left((-1)^{i}+1\right),\\ B_i^{\prime}&=-\theta\,(2i-1)\,i\left(b_0(0)-\omega\beta\right)\left((-1)^{i}\alpha-\beta\right).\end{aligned}$$
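The time-dependent coefficient $\tau_i(t)$ in the last step can be evaluated directly from the printed formulas. The following is a minimal Python sketch, not the authors' implementation: the function name `tau` and all argument names are hypothetical, and the eigenvalue $\lambda_i$ and the initial coefficient $c_i$ are assumed to be supplied by the caller, since their definitions are given elsewhere in the paper.

```python
import math


def tau(i, t, c_i, lam_i, theta, alpha, beta, omega, b0_init):
    """Evaluate tau_i(t), the coefficient of H_i(x) in the expansion of f(x, t).

    All names are illustrative. lam_i (eigenvalue lambda_i) and c_i (initial
    coefficient) are inputs; b0_init stands for b_0(0).
    """
    if lam_i == theta:
        raise ValueError("lambda_i must differ from theta (division by zero)")
    parity = (-1) ** i
    # A'_i and B'_i as printed in Algorithm 2.
    a_prime = -omega * alpha * beta * theta * (2 * i - 1) * i * (parity + 1)
    b_prime = (-theta * (2 * i - 1) * i
               * (b0_init - omega * beta) * (parity * alpha - beta))
    stationary = a_prime / lam_i          # term that survives as t grows
    slow = b_prime / (lam_i - theta)      # term decaying at rate theta
    fast = c_i - stationary - slow        # term decaying at rate lambda_i
    return (stationary
            + fast * math.exp(-lam_i * t)
            + slow * math.exp(-theta * t))
```

Two checks follow from the formula itself: at $t=0$ the three terms collapse to $c_i$, and for large $t$ only $A_i^{\prime}/\lambda_i$ remains. For odd $i$ the factor $(-1)^i+1$ vanishes, so $A_i^{\prime}=0$ and the limit is zero.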

## 3. Applications

#### 3.1. A Joint Site Frequency Spectrum under Pure Drift

**Figure 1.** Approximate densities using the Gegenbauer polynomial expansion with terms up to $n=52$. (**A**) Approximation to point masses at both boundaries, but without mass in the interior region; (**B**) approximation to the equilibrium improper density overlying the function $x^{-1}(1-x)^{-1}$; (**C**) approximation to the joint posterior distribution for a sample with $y=1$, $M=1$ overlying the joint distribution $2\,x^{1-1}(1-x)^{1-1}$; (**D**) approximation to the joint posterior distribution for a sample with $y=3$, $M=6$ overlying the joint distribution $\binom{6}{3}\,x^{3-1}(1-x)^{3-1}$.
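One way to read the reference curve in panel D is as a binomial sampling likelihood for $y=3$ out of $M=6$ multiplied by the improper density $x^{-1}(1-x)^{-1}$ of panel B. The short Python sketch below checks this algebra numerically; the function names are hypothetical, and the binomial sampling step is an assumption made here for illustration.

```python
from math import comb


def binomial_likelihood(x, y, M):
    """Probability of y focal alleles in a sample of size M at proportion x."""
    return comb(M, y) * x**y * (1 - x) ** (M - y)


def improper_density(x):
    """The function x^{-1} (1 - x)^{-1} from panel B (interior only)."""
    return 1.0 / (x * (1 - x))


def joint_density(x, y, M):
    """Product of the sampling likelihood and the improper density."""
    return binomial_likelihood(x, y, M) * improper_density(x)


def panel_d_reference(x):
    """The curve quoted for panel D: C(6,3) x^(3-1) (1-x)^(3-1)."""
    return comb(6, 3) * x ** (3 - 1) * (1 - x) ** (3 - 1)
```

For any interior $x$, `joint_density(x, 3, 6)` and `panel_d_reference(x)` agree up to rounding, because dividing $x^{3}(1-x)^{3}$ by $x(1-x)$ lowers each exponent by one.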

**Figure 2.** Pure drift model. Likelihood curves of the parameter $t_1$ given a sample of $L=10{,}000$, $M_0=M_1=3$ and true $t_1$ (dashed vertical lines) equal to 0.1 (**A**), 0.5 (**B**), 1 (**C**) and 2 (**D**).

#### 3.2. Application to Drosophila Population Data

**Table 1.** A joint site frequency spectrum of Drosophila short intronic sites with $M_0=1$ and $M_1=6$. The left-most column and the upper row of the table represent the possible allelic states of sites for the samples $M_0$ and $M_1$, respectively. The interior entries of the table are the counts of sites with a specific allelic state with respect to Allele 1.

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 84,294 | 862 | 369 | 59 | 233 | 293 | 5121 |
| 1 | 5637 | 259 | 276 | 310 | 475 | 1168 | 41,531 |
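As an illustration of how such a joint site frequency spectrum can be held in code, the sketch below stores Table 1 as a nested list and derives the total number of sites and the two marginal spectra. The variable names are hypothetical, and the entries printed as "84, 294" and "41, 531" are assumed to be the thousands-separated counts 84,294 and 41,531.

```python
# Rows: allelic state in the sample of size M0 = 1 (0 or 1 copies of Allele 1).
# Columns: allelic state in the sample of size M1 = 6 (0 to 6 copies).
jsfs = [
    [84294, 862, 369, 59, 233, 293, 5121],
    [5637, 259, 276, 310, 475, 1168, 41531],
]

# Total number of sites in the joint spectrum.
total_sites = sum(sum(row) for row in jsfs)

# Marginal spectrum of the first sample: sum over columns within each row.
marginal_m0 = [sum(row) for row in jsfs]

# Marginal spectrum of the second sample: sum over rows within each column.
marginal_m1 = [sum(col) for col in zip(*jsfs)]

print(total_sites, marginal_m0, marginal_m1)
```

With these counts the table holds 140,887 sites in total, and most of them fall in the two corner cells where both samples are fixed for the same allele.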

**Figure 3.** Likelihood surface with respect to parameters $\theta_1$ and $t_1$ estimated from the joint site frequency spectrum in Table 1. The point on the likelihood surface corresponds to ML estimates: $\widehat{\theta}_1=0.03$ and $\widehat{t}_1=4.5$.

## 4. Discussion

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendices

#### A.1. Appendix: Modified Gegenbauer Polynomials as the Limit of Modified Jacobi Polynomials

**Lemma 1.**

**Proof.**

**Remark 1.**

#### A.2. Appendix: Mutation-Drift Equilibrium

**Theorem 2.**

**Proof.**


© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Vogl, C.; Bergman, J. Computation of the Likelihood of Joint Site Frequency Spectra Using Orthogonal Polynomials. *Computation* **2016**, *4*, 6.
https://doi.org/10.3390/computation4010006
