# Using Machine Learning to Discover Latent Social Phenotypes in Free-Ranging Macaques

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Study Site and Subjects

#### 2.2. Observational Data

#### 2.3. Genetic Data

#### 2.4. A Model for Social Phenotypes

#### 2.5. Data Processing and Likelihood

- Construct a data matrix with a row for each focal observation and a column for each behavior in the ethogram.
- For each focal observation:
  - (a) For each “event” behavior, count the number of times that behavior occurred during the observation.
  - (b) For each “activity” behavior, calculate the proportion of the focal observation spent engaged in that behavior.
  - (c) Populate the associated row in the data matrix with these values.
- For each behavior:
  - (a) Calculate the quintile cutpoints (the 20th, 40th, 60th, and 80th percentiles) of the values in that behavior’s column of the data matrix.
  - (b) Also calculate the 1st and 99th percentiles of the behavior, to define separate bins for low and high outliers.
  - (c) Bin the values using the percentiles calculated above as cutpoints, e.g., values ≤ 1st percentile are assigned 1, values > 1st and ≤ 20th percentile are assigned 2, etc.
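The processing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the observation record layout (`events`, `activity_seconds`, `duration_seconds`), the helper names `build_row` and `bin_column`, and the linearly interpolated percentile are all hypothetical choices made for the example.

```python
from bisect import bisect_left

# Cutpoints named in the text: outlier bounds plus the quintiles.
CUT_PERCENTILES = (1, 20, 40, 60, 80, 99)


def build_row(observation, event_behaviors, activity_behaviors):
    """Return one data-matrix row for a focal observation.

    Assumed (hypothetical) record layout:
      observation["events"]           -> list of event-behavior names, one per occurrence
      observation["activity_seconds"] -> dict of activity name -> seconds engaged
      observation["duration_seconds"] -> total length of the focal observation
    """
    counts = [observation["events"].count(b) for b in event_behaviors]
    duration = observation["duration_seconds"]
    proportions = [
        observation["activity_seconds"].get(b, 0.0) / duration
        for b in activity_behaviors
    ]
    return counts + proportions


def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in [0, 100]) of a non-empty sorted list."""
    position = (len(sorted_values) - 1) * p / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def bin_column(values):
    """Map each value in one behavior's column to a bin label from 1 to 7."""
    ordered = sorted(values)
    cuts = [percentile(ordered, p) for p in CUT_PERCENTILES]
    # bisect_left counts the cutpoints strictly below v, so a value equal to a
    # cutpoint falls in the lower bin, matching the "<=" rule in the text.
    return [bisect_left(cuts, v) + 1 for v in values]


if __name__ == "__main__":
    obs = {
        "duration_seconds": 600,
        "events": ["Scratch", "Scratch", "Threat"],
        "activity_seconds": {"Feed": 150.0},
    }
    print(build_row(obs, ["Scratch", "Threat"], ["Feed", "Travel"]))
    print(bin_column(list(range(101)))[:3])
```

With six cutpoints this scheme yields seven ordinal bins. When a behavior is rare, several cutpoints can coincide (for example at zero), in which case some bin labels are simply never used; how the original analysis handled such ties is not stated here.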

#### 2.6. Mathematical Description of the Model

#### 2.7. Model Fitting

#### 2.8. Repeatability Analysis

#### 2.9. Simulations

- Pick an output state ${k}^{\prime}$ and calculate the posterior mean for each of its parameters, ${\widehat{\theta}}^{\left({k}^{\prime}\right)}$.
- For each simulated state k that is not already matched with an output state, calculate the correlation between ${\theta}^{\left(k\right)}$ and ${\widehat{\theta}}^{\left({k}^{\prime}\right)}$.
- Pick the simulated state with the highest correlation as the match for the output state ${k}^{\prime}$.
- Repeat for each ${k}^{\prime}$.
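The greedy matching procedure above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names `pearson` and `match_states` are hypothetical, parameter vectors are assumed to be plain lists of equal length with non-zero variance, and output states are visited in index order because the text does not specify an order.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def match_states(simulated, fitted):
    """Greedily match each output state k' to a simulated state k.

    simulated: list of true parameter vectors theta^(k)
    fitted:    list of posterior-mean vectors theta_hat^(k')
    Returns a dict mapping output-state index k' -> simulated-state index k.
    """
    if len(fitted) > len(simulated):
        raise ValueError("more output states than simulated states")
    unmatched = set(range(len(simulated)))
    matches = {}
    for k_out, theta_hat in enumerate(fitted):
        # Only simulated states that have not been matched yet are candidates.
        best = max(unmatched, key=lambda k: pearson(simulated[k], theta_hat))
        matches[k_out] = best
        unmatched.remove(best)
    return matches


if __name__ == "__main__":
    simulated = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
    fitted = [[4.1, 2.9, 2.2, 0.8], [0.9, 2.1, 3.2, 3.9], [1.2, 2.8, 2.1, 4.1]]
    print(match_states(simulated, fitted))
```

Because the procedure is greedy, the result can depend on the order in which output states are visited; an optimal one-to-one assignment would require a different algorithm, which is not what the steps above describe.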

#### 2.10. Comparisons with Factor Analysis

#### 2.11. Assessing Genetic and Covariate Influences on Social Phenotypes

## 3. Results

#### 3.1. Simulation Tests

#### 3.2. Phenotype Distributions and Behavioral State Content

#### 3.3. Repeatability

#### 3.4. Group, Rank, and Sex Effects on Social Phenotypes

#### 3.5. Genetic Components of Social Phenotypes

#### 3.6. Comparison with Factor Analysis

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Abbreviations

| Abbreviation | Definition |
|---|---|
| CPRC | Caribbean Primate Research Center |
| PCA | Principal components analysis |
| NHP | Non-human primate |
| CI | Central credible interval |
| WAIC | Widely applicable information criterion |

## Appendix A

#### Appendix A.1. Self-Directed Behaviors

- Scratch: Rapid and repeated movement of the nails of a hand or foot across the skin.
- Self groom: Running of hands or mouth through one’s own hair for at least 5 s.
- Feed: Searching for, manipulating, holding and ingesting food items, including water, for at least 5 s.
- Travel: Movement from one location to another over a distance of at least 5 m.

#### Appendix A.2. Affiliative Behaviors

- Approach: One individual approaches another to within arms’ reach (2 m) without physical contact, and remains within that distance for at least 5 s.
- Leave: Exiting a 2 m area around another without an agonistic interaction.
- Affiliative Vocalization (AffilVocal in figures): Emitting a friendly vocalization in the form of either a grunt, girney, vocal exchange, or lipsmack. Individuals will often emit many vocalizations in short succession.
- Groom: Running the hands or mouth through the hair of another monkey for at least 5 s.
- Passive Contact: Sitting or lying in physical contact with another animal without grooming.
- Social Proximity: Number of time points out of three at which the focal was observed to be within 2 m of at least one other animal. The time points were at the beginning, middle, and end of the focal observation.
- Proximity Group Size: Number of unique animals with whom the focal animal shared social proximity as defined above.

#### Appendix A.3. Agonistic Behaviors

- Threat: One individual threatens another with one or a combination of staring, barks, head bobs, and opening one’s mouth with covered teeth.
- Avoid: Moving out of the way of another before they come within 2 m.
- Displacement: Similar to avoid, but within 2 m.
- Fear Grimace: Submissive facial expression wherein lips are retracted horizontally to expose teeth.
- Submit: Leaning away from another or crouching while raising hindquarters towards another.
- Noncontact Aggression: A lunge, charge, or chase that does not result in direct physical contact.
- Contact Aggression: Direct physical contact such as a bite, hit, push, or grab.

**Figure 1.** Correlations between simulated and fit phenotypes (**left**) and state contents (**right**). Both panels show boxplots, though for the state contents the values are concentrated enough that the hinges of the plots are not distinguishable. For phenotypes, each data point is an individual, while for states, each data point is a state.

**Figure 2.** The distribution of phenotypes of the studied population. The box-and-whisker plots display the distribution of posterior mean probabilities of being in each state across all studied individuals. States are ordered by descending mean probability across the population. “Hinges” of the boxes represent the 25th, 50th, and 75th percentiles.

**Figure 3.** The content of behavioral states fit by the model. Relative rates for each behavior are calculated by dividing the posterior mean rates across states by the mean rate of the highest state, such that 1 represents the highest mean rate across states. For visual clarity, behaviors with relative rates below 0.05 are omitted, and behaviors for which the difference between the “give” and “receive” variants is less than 0.33 are concatenated into a single label (give/rec).

**Figure 4.** Regression coefficients with S2 as a baseline (see Methods). Points represent posterior means, and lines represent 95% central credible intervals.

**Figure 5.** Expected probabilities of being in a state across different social groups for animals of average age and rank. Points represent posterior means, and error bars represent 95% central credible intervals of the expected probability.

**Figure 6.** Relationship between dominance rank and the expected probability of being in a state for animals of average age in group F. Curves represent posterior means, and shaded regions represent 95% central credible intervals of the value of the curve at the corresponding rank.

**Figure 7.** Genetic influences on the probability of being in a state in the studied population. Points represent posterior means, thick error bars represent 66% central credible intervals (roughly equivalent to 1 standard error), and thin error bars represent 95% central credible intervals. See Methods for the definition of the pseudo-$h^{2}$ measure.

**Figure 8.** Comparisons between the state model and factor analysis models. (**a**) The leftmost panel shows observed means and standard errors of rates of Travel at varying levels of Feed and Noncontact Aggression (received); the two right panels show the expected levels of Travel under the state model and factor model 1; (**b**) Repeatability, as measured by the correlation between 2012 and 2013 phenotypes, for the three models; (**c**) Heritability estimates of phenotypes from the three models. Error bars in all panels represent one standard error.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Madlon-Kay, S.; Brent, L.; Montague, M.; Heller, K.; Platt, M.
Using Machine Learning to Discover Latent Social Phenotypes in Free-Ranging Macaques. *Brain Sci.* **2017**, *7*, 91.
https://doi.org/10.3390/brainsci7070091
