Open Access Article

Toward Inferring Potts Models for Phylogenetically Correlated Sequence Data

1 Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Institut de Biologie Paris-Seine, Sorbonne Université, Centre national de la recherche scientifique (CNRS), 75005 Paris, France
2 Group of Complex Systems and Statistical Physics, Department of Theoretical Physics, Physics Faculty, University of Havana, La Habana 10400, Cuba
3 Biozentrum, University of Basel, 4056 Basel, Switzerland
* Author to whom correspondence should be addressed.
Entropy 2019, 21(11), 1090; https://doi.org/10.3390/e21111090
Received: 25 September 2019 / Revised: 28 October 2019 / Accepted: 6 November 2019 / Published: 7 November 2019
Global coevolutionary models of protein families have become increasingly popular because they can predict residue–residue contacts from sequence information, and can also be used to predict the fitness effects of amino acid substitutions or to infer protein–protein interactions. The central idea in these models is to construct a probability distribution, a Potts model, that reproduces the single-site and pairwise frequencies of amino acids found in natural sequences of the protein family. This approach treats sequences from the family as independent samples, completely ignoring the phylogenetic relations between them. This simplification is known to lead to potentially biased estimates of the model parameters, decreasing their biological relevance. Current workarounds for this problem, such as reweighting sequences, are poorly understood and not principled. Here, we propose an inference scheme that takes the phylogeny of a protein family into account in order to correct biases in estimating the frequencies of amino acids. Using artificial data, we show that a Potts model inferred from these corrected frequencies performs better in predicting contacts and the fitness effects of mutations. First tests on real protein data, which are only partially successful, are also presented.
Keywords: phylogeny; co-evolution; direct coupling analysis
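
To make the quantities named in the abstract concrete, the following is a minimal sketch of how single-site and pairwise amino acid frequencies can be estimated from an alignment using the sequence-reweighting heuristic that the abstract describes as a current workaround. It is not the phylogeny-aware scheme proposed in the article. The function names, the integer encoding of the alignment, the toy data, and the similarity threshold `theta` are all assumptions made for illustration.

```python
import numpy as np

def sequence_weights(msa, theta=0.2):
    """Similarity-based weights (illustrative heuristic, theta is an assumed value).

    msa: (M, L) integer array, one row per sequence, entries in 0..q-1.
    Each sequence gets weight 1 / (number of sequences, itself included,
    whose normalized Hamming distance to it is below theta).
    """
    msa = np.asarray(msa)
    # Pairwise normalized Hamming distances, shape (M, M).
    dist = (msa[:, None, :] != msa[None, :, :]).mean(axis=2)
    neighbors = (dist < theta).sum(axis=1)
    return 1.0 / neighbors

def weighted_frequencies(msa, q, weights):
    """Weighted single-site (L, q) and pairwise (L, q, L, q) frequencies."""
    msa = np.asarray(msa)
    onehot = np.eye(q)[msa]            # shape (M, L, q)
    w = weights / weights.sum()        # normalize so frequencies sum to 1
    fi = np.einsum("m,mia->ia", w, onehot)
    fij = np.einsum("m,mia,mjb->iajb", w, onehot, onehot)
    return fi, fij

# Toy alignment (hypothetical data): three near-identical sequences and one outlier.
msa = np.array([
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [2, 0, 1, 1],
])
w = sequence_weights(msa, theta=0.3)
fi, fij = weighted_frequencies(msa, q=3, weights=w)
fi_unweighted, _ = weighted_frequencies(msa, q=3, weights=np.ones(len(msa)))
```

In this toy case the three closely related sequences share a total weight of one, so the first column's frequency of state 0 drops from 0.75 (unweighted) to 0.5. A Potts model would then be fitted to reproduce `fi` and `fij`. The weights depend only on pairwise similarity and a hand-chosen threshold, not on the tree relating the sequences.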
MDPI and ACS Style

Rodriguez Horta, E.; Barrat-Charlaix, P.; Weigt, M. Toward Inferring Potts Models for Phylogenetically Correlated Sequence Data. Entropy 2019, 21, 1090.
