ProtParCon: A Framework for Processing Molecular Data and Identifying Parallel and Convergent Amino Acid Replacements
Abstract
1. Introduction
2. Description of the Computational Steps
2.1. Implementation of ProtParCon
2.2. Package Validation
3. Case Study: Parallel and Convergent Amino Acid Replacements in Lysozyme C Sequences
3.1. Data
3.2. Parallel and Convergent Amino Acid Replacements in Lysozyme C
3.3. Results and Tentative Conclusions
4. Discussion
Supplementary Materials
Author Contributions
Acknowledgments
Conflicts of Interest
References
1. Nei, M. DNA Polymorphism and Adaptive Evolution. In Plant Population Genetics, Breeding and Genetic Resources; Sinauer Associates Inc.: Sunderland, MA, USA, 1990; pp. 128–142.
2. Pagel, M.D.; Harvey, P.H. Comparative methods for examining adaptation depend on evolutionary models. Folia Primatol. 1989, 53, 203–220.
3. Zhang, J.; Kumar, S. Detection of convergent and parallel evolution at the amino acid sequence level. Mol. Biol. Evol. 1997, 14, 527–536.
4. Graur, D. Molecular and Genome Evolution; Sinauer Associates Inc.: Sunderland, MA, USA, 2016.
5. Zou, Z.; Zhang, J. Are Convergent and Parallel Amino Acid Substitutions in Protein Evolution More Prevalent Than Neutral Expectations? Mol. Biol. Evol. 2015, 32, 2085–2096.
6. Edgar, R.C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32, 1792–1797.
7. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780.
8. Sievers, F.; Wilm, A.; Dineen, D.; Gibson, T.J.; Karplus, K.; Li, W.; Lopez, R.; McWilliam, H.; Remmert, M.; Söding, J.; et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 2011, 7, 539.
9. Di Tommaso, P.; Moretti, S.; Xenarios, I.; Orobitg, M.; Montanyola, A.; Chang, J.M.; Taly, J.F.; Notredame, C. T-Coffee: A web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. Nucleic Acids Res. 2011, 39, W13–W17.
10. Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014, 30, 1312–1313.
11. Price, M.N.; Dehal, P.S.; Arkin, A.P. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 2010, 5, e9490.
12. Nguyen, L.T.; Schmidt, H.A.; von Haeseler, A.; Minh, B.Q. IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol. Biol. Evol. 2015, 32, 268–274.
13. Guindon, S.; Delsuc, F.; Dufayard, J.F.; Gascuel, O. Estimating maximum likelihood phylogenies with PhyML. Methods Mol. Biol. 2009, 537, 113–137.
14. Rambaut, A.; Grassly, N.C. Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 1997, 13, 235–238.
15. Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007, 24, 1586–1591.
16. Shimodaira, H. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 2002, 51, 492–508.
17. Thomas, G.W.; Hahn, M.W. Determining the null model for detecting adaptive convergence from genomic data: A case study using echolocating mammals. Mol. Biol. Evol. 2015, 32, 1232–1236.
18. Stewart, C.B.; Schilling, J.W.; Wilson, A.C. Adaptive Evolution in the Stomach Lysozymes of Foregut Fermenters. Nature 1987, 330, 401–404.
19. Kornegay, J.R.; Schilling, J.W.; Wilson, A.C. Molecular adaptation of a leaf-eating bird: Stomach lysozyme of the hoatzin. Mol. Biol. Evol. 1994, 11, 921–928.
20. Irwin, D.M. Molecular evolution of ruminant lysozymes. EXS 1996, 75, 347–361.
21. The UniProt Consortium. UniProt: The universal protein knowledgebase. Nucleic Acids Res. 2017, 45, D158–D169.
22. Prasad, A.B.; Allard, M.W.; NISC Comparative Sequencing Program; Green, E.D. Confirming the phylogeny of mammals by use of large comparative sequence data sets. Mol. Biol. Evol. 2008, 25, 1795–1808.
23. Irwin, D.M.; Biegel, J.M.; Stewart, C.B. Evolution of the mammalian lysozyme gene family. BMC Evol. Biol. 2011, 11, 166.
24. Esselstyn, J.A.; Oliveros, C.H.; Swanson, M.T.; Faircloth, B.C. Investigating difficult nodes in the placental mammal tree with expanded taxon sampling and thousands of ultraconserved elements. Genome Biol. Evol. 2017, 9, 2308–2321.
25. Jarvis, E.D.; Mirarab, S.; Aberer, A.J.; Li, B.; Houde, P.; Li, C.; Ho, S.Y.; Faircloth, B.C.; Nabholz, B.; Howard, J.T.; et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 2014, 346, 1320–1331.
26. Jones, D.T.; Taylor, W.R.; Thornton, J.M. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 1992, 8, 275–282.
27. Yuan, F.; Nguyen, H.; Graur, D. A new null model for detecting adaptive parallelism and convergence in proteins. J. Mol. Evol., under review.
28. Hughes, A.L.; Packer, B.; Welch, R.; Bergen, A.W.; Chanock, S.J.; Yeager, M. Widespread purifying selection at polymorphic sites in human protein-coding loci. Proc. Natl. Acad. Sci. USA 2003, 100, 15754–15757.
29. Landan, G.; Graur, D. Heads or tails: A simple reliability check for multiple sequence alignments. Mol. Biol. Evol. 2007, 24, 1380–1383.
30. Sela, I.; Ashkenazy, H.; Katoh, K.; Pupko, T. GUIDANCE2: Accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res. 2015, 43, W7–W14.
Functions | Description | Supported Programs |
---|---|---|
oma | For the OMA orthology database | N/A |
msa | For multiple sequence alignment | MUSCLE [6], MAFFT [7], Clustal Omega [8], T-COFFEE [9] |
asr | For ancestral state reconstruction | CODEML (PAML [15]), RAxML [10] |
mlt | For maximum-likelihood tree inference | FastTree [11], IQ-TREE [12], RAxML, PhyML [13] |
aut | For topology testing (approximately unbiased test [16]) | IQ-TREE |
sim | For protein sequence simulation | EVOLVER (PAML [15]), Seq-Gen [14] |
imc | For identifying parallel and convergent (P&C) amino acid replacements | N/A |
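For the imc step, the usual definitions (e.g., Zhang and Kumar [3]) apply to two independent branches, neither of which descends from the other: a site carries a parallel replacement when both branches change from the same ancestral amino acid to the same descendant amino acid, and a convergent replacement when they change from different ancestral amino acids to the same descendant amino acid. The sketch below applies these definitions to a single branch pair, assuming the ancestral and descendant sequences are already available (for example, from the asr step) and aligned to the same length. The function name `classify_pair` and the rule of skipping gapped columns are illustrative assumptions, not ProtParCon's actual `imc` interface.

```python
from typing import List, Tuple

# One branch = (sequence at its ancestral node, sequence at its descendant node),
# both taken from the same alignment.
Branch = Tuple[str, str]


def classify_pair(b1: Branch, b2: Branch) -> Tuple[List[int], List[int]]:
    """Return 0-based (parallel, convergent) site indices for two independent branches.

    A site is counted only if both branches change at that site and end in the
    same amino acid: parallel if the two ancestral residues are identical,
    convergent if they differ.
    """
    anc1, des1 = b1
    anc2, des2 = b2
    if not len(anc1) == len(des1) == len(anc2) == len(des2):
        raise ValueError("all four sequences must have the same aligned length")

    parallel: List[int] = []
    convergent: List[int] = []
    for site, (a1, d1, a2, d2) in enumerate(zip(anc1, des1, anc2, des2)):
        if "-" in (a1, d1, a2, d2):
            continue  # assumption of this sketch: gapped columns are ignored
        if a1 != d1 and a2 != d2 and d1 == d2:
            if a1 == a2:
                parallel.append(site)    # e.g. K->R on both branches
            else:
                convergent.append(site)  # e.g. L->V on one branch, I->V on the other
    return parallel, convergent


# Site 0 is K->R on both branches (parallel); site 2 is L->V vs. I->V (convergent).
print(classify_pair(("KAL", "RAV"), ("KGI", "RSV")))  # ([0], [2])
```

Summing the two lists over all sites gives per-pair counts of the kind reported in the Obs. columns of the branch-pair table.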
Branch Pair | Parallel: Obs. | Parallel: Exp. | Parallel: p-Value | Convergent: Obs. | Convergent: Exp. | Convergent: p-Value |
---|---|---|---|---|---|---|
Cow-Langur | 0 | 0.43 | 0.6505 | 1 | 0.35 | 0.0487 * |
Cow-Hoatzin | 8 | 1.51 | 0.0000 ** | 1 | 0.78 | 0.184 |
Langur-Hoatzin | 2 | 0.52 | 0.0159 * | 1 | 0.32 | 0.0415 * |
Cow-Squirrel | 1 | 0.98 | 0.2569 | 0 | 0.23 | 0.7945 |
Cow-Mouse | 2 | 1.96 | 0.3125 | 0 | 0.57 | 0.5655 |
Cow-Tarsier | 2 | 1.51 | 0.1937 | 1 | 0.5 | 0.0902 |
Cow-Colobus | 1 | 0.34 | 0.0462 * | 0 | 0.17 | 0.8437 |
Hoatzin-Mallard | 1 | 0.61 | 0.1252 | 0 | 0.43 | 0.6505 |
Hoatzin-Chicken | 1 | 0.35 | 0.0487 * | 0 | 0.27 | 0.7634 |
Hoatzin-Turkey | 1 | 0.34 | 0.0462 * | 0 | 0.23 | 0.7945 |
Hoatzin-Opossum | 7 | 3.08 | 0.0137 * | 0 | 0.14 | 0.8694 |
Hoatzin-Elephant | 4 | 1.43 | 0.0155 * | 0 | 0.26 | 0.7711 |
Hoatzin-Hedgehog | 1 | 1.6 | 0.5249 | 0 | 0.46 | 0.6313 |
Hoatzin-Cat | 7 | 1.98 | 0.0010 ** | 1 | 0.73 | 0.1663 |
Hoatzin-Pig | 4 | 2.1 | 0.0621 | 2 | 0.64 | 0.0273 * |
Hoatzin-Dolphin | 1 | 0.94 | 0.2422 | 0 | 0.42 | 0.657 |
Hoatzin-Squirrel | 1 | 1.22 | 0.6554 | 0 | 0.35 | 0.7047 |
Hoatzin-Mouse | 5 | 2.62 | 0.0505 | 0 | 0.72 | 0.4868 |
Hoatzin-Rat | 7 | 1.88 | 0.0007 ** | 1 | 0.59 | 0.1186 |
Hoatzin-Tarsier | 5 | 2.02 | 0.0173 * | 0 | 0.74 | 0.4771 |
Hoatzin-Colobus | 1 | 0.38 | 0.0563 | 0 | 0.28 | 0.7558 |
Langur-Mouse | 1 | 0.57 | 0.1121 | 0 | 0.28 | 0.7558 |
Langur-Rat | 1 | 0.55 | 0.1057 | 0 | 0.3 | 0.7408 |
Langur-Tarsier | 2 | 0.48 | 0.0129 * | 0 | 0.27 | 0.7634 |
Langur-Pig | 1 | 0.46 | 0.0783 | 0 | 0.29 | 0.7483 |
Langur-Cat | 1 | 0.38 | 0.0563 | 0 | 0.27 | 0.7634 |
Cat-Pig | 5 | 2.87 | 0.0714 | 0 | 0.00 | N/A |
Cat-Cow | 5 | 1.5 | 0.0045 ** | 0 | 0.42 | 0.657 |
Cat-Squirrel | 1 | 1.67 | 0.5026 | 0 | 0.11 | 0.8958 |
Cat-Mouse | 4 | 2.21 | 0.0736 | 0 | 0.36 | 0.6977 |
Cat-Rat | 2 | 1.88 | 0.2909 | 0 | 0.25 | 0.7788 |
Cat-Tarsier | 3 | 2.03 | 0.1483 | 0 | 0.32 | 0.7261 |
Cat-Colobus | 2 | 0.43 | 0.0096 ** | 0 | 0.16 | 0.8521 |
Chicken-Elephant | 1 | 0.19 | 0.0159 * | 0 | 0.20 | 0.8187 |
Chicken-Mouse | 0 | 0.31 | 0.7334 | 1 | 0.22 | 0.0209 * |
Dolphin-Squirrel | 1 | 0.77 | 0.1805 | 0 | 0.13 | 0.8781 |
Dolphin-Rat | 1 | 0.96 | 0.2495 | 0 | 0.31 | 0.7334 |
Dolphin-Tarsier | 2 | 0.98 | 0.0767 | 0 | 0.28 | 0.7558 |
Elephant-Hedgehog | 1 | 1.48 | 0.5645 | 0 | 0.17 | 0.8437 |
Elephant-Cat | 2 | 1.54 | 0.2013 | 0 | 0.11 | 0.8958 |
Elephant-Pig | 2 | 1.71 | 0.2454 | 1 | 0.11 | 0.0056 ** |
Elephant-Cow | 1 | 1.02 | 0.7284 | 0 | 0.28 | 0.7558 |
Elephant-Squirrel | 1 | 1.31 | 0.6233 | 0 | 0.03 | 0.9704 |
Elephant-Mouse | 2 | 1.64 | 0.2270 | 0 | 0.15 | 0.8607 |
Elephant-Rat | 2 | 1.59 | 0.2141 | 0 | 0.12 | 0.8869 |
Elephant-Tarsier | 1 | 1.50 | 0.5578 | 0 | 0.19 | 0.827 |
Elephant-Marmoset | 1 | 0.31 | 0.0392 * | 0 | 0.08 | 0.9231 |
Hedgehog-Cat | 3 | 2.39 | 0.2192 | 0 | 0.00 | N/A |
Hedgehog-Pig | 5 | 2.75 | 0.0608 | 0 | 0.00 | N/A |
Hedgehog-Dolphin | 3 | 0.93 | 0.0150 * | 0 | 0.32 | 0.7261 |
Hedgehog-Cow | 2 | 1.52 | 0.1962 | 0 | 0.38 | 0.6839 |
Hedgehog-Squirrel | 3 | 1.58 | 0.0761 | 0 | 0.08 | 0.9231 |
Hedgehog-Mouse | 5 | 2.37 | 0.0339 * | 0 | 0.36 | 0.6977 |
Hedgehog-Rat | 4 | 1.91 | 0.0449 * | 0 | 0.33 | 0.7189 |
Hedgehog-Tarsier | 3 | 1.94 | 0.1322 | 0 | 0.29 | 0.7483 |
Hedgehog-Colobus | 2 | 0.37 | 0.0064 ** | 0 | 0.21 | 0.8106 |
Mallard-Platypus | 1 | 0.47 | 0.0812 | 0 | 0.38 | 0.6839 |
Mallard-Mouse | 1 | 0.62 | 0.1285 | 0 | 0.41 | 0.6637 |
Mallard-Rat | 1 | 0.40 | 0.0616 | 0 | 0.46 | 0.6313 |
Opossum-Elephant | 3 | 1.31 | 0.0441 * | 0 | 0.24 | 0.7866 |
Opossum-Hedgehog | 1 | 1.68 | 0.4995 | 0 | 0.53 | 0.5886 |
Opossum-Cat | 3 | 1.67 | 0.0888 | 0 | 0.51 | 0.6005 |
Opossum-Pig | 3 | 1.90 | 0.1253 | 0 | 0.54 | 0.5827 |
Opossum-Cow | 0 | 1.35 | 0.2592 | 1 | 0.48 | 0.0842 |
Opossum-Squirrel | 2 | 1.05 | 0.0897 | 0 | 0.33 | 0.7189 |
Opossum-Mouse | 4 | 1.72 | 0.0309 * | 0 | 0.59 | 0.5543 |
Opossum-Rat | 5 | 1.66 | 0.0072 ** | 0 | 0.68 | 0.5066 |
Opossum-Tarsier | 2 | 1.74 | 0.2534 | 0 | 0.45 | 0.6376 |
Opossum-Marmoset | 1 | 0.35 | 0.0487 * | 0 | 0.17 | 0.8437 |
Platypus-Hedgehog | 2 | 1.15 | 0.1099 | 0 | 0.38 | 0.6839 |
Platypus-Pig | 2 | 1.45 | 0.1787 | 0 | 0.45 | 0.6376 |
Platypus-Dolphin | 1 | 0.61 | 0.1252 | 0 | 0.36 | 0.6977 |
Platypus-Squirrel | 1 | 0.85 | 0.2093 | 0 | 0.33 | 0.7189 |
Platypus-Mouse | 1 | 1.51 | 0.5545 | 1 | 0.45 | 0.0754 |
Platypus-Rat | 2 | 1.45 | 0.1787 | 0 | 0.47 | 0.625 |
Platypus-Tarsier | 1 | 1.19 | 0.6662 | 0 | 0.48 | 0.6188 |
Pig-Cow | 1 | 1.83 | 0.4540 | 0 | 0.32 | 0.7261 |
Pig-Squirrel | 1 | 1.51 | 0.5545 | 0 | 0.10 | 0.9048 |
Pig-Mouse | 4 | 2.36 | 0.0909 | 1 | 0.34 | 0.0462 * |
Pig-Rat | 6 | 2.23 | 0.0080 ** | 0 | 0.35 | 0.7047 |
Pig-Tarsier | 3 | 2.29 | 0.1986 | 0 | 0.24 | 0.7866 |
Rat-Tarsier | 5 | 1.87 | 0.0123 * | 0 | 0.36 | 0.6977 |
Squirrel-Mouse | 1 | 1.53 | 0.5478 | 0 | 0.13 | 0.8781 |
Squirrel-Tarsier | 1 | 1.48 | 0.5645 | 0 | 0.20 | 0.8187 |
Squirrel-Marmoset | 1 | 0.34 | 0.0462 * | 0 | 0.10 | 0.9048 |
Mouse-Tarsier | 3 | 2.23 | 0.1866 | 0 | 0.34 | 0.7118 |
Mouse-Marmoset | 1 | 0.37 | 0.0537 | 0 | 0.22 | 0.8025 |
Mouse-Colobus | 1 | 0.39 | 0.0589 | 0 | 0.18 | 0.8353 |
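Tarsier-Colobus | 1 | 0.42 | 0.0670 | 0 | 0.16 | 0.8521 |

Obs., observed number of replacements; Exp., expected number of replacements; * p < 0.05; ** p < 0.01.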
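The Obs., Exp., and p-Value columns of the branch-pair table can be related through a simple count model. As a hedged sketch (an interpretation of the tabulated numbers, not ProtParCon's documented procedure), treat Exp. as the mean of a Poisson distribution and report the tail in the direction of the deviation: P(X > Obs.) when Obs. exceeds Exp., and P(X ≤ Obs.) otherwise. Under this assumed convention the function reproduces the rows spot-checked here, such as Cow-Langur convergent (1 vs. 0.35, p = 0.0487) and Hoatzin-Hedgehog parallel (1 vs. 1.6, p = 0.5249). The names `poisson_cdf` and `tail_p` are hypothetical.

```python
import math
from typing import Optional


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) = P(X = i - 1) * lam / i
        total += term
    return total


def tail_p(observed: int, expected: float) -> Optional[float]:
    """Poisson tail probability in the direction of the deviation.

    Assumed convention (inferred from the table): upper tail P(X > observed)
    when observed exceeds expected, otherwise lower tail P(X <= observed).
    Returns None when expected is 0, mirroring the N/A entries.
    """
    if expected <= 0:
        return None
    if observed > expected:
        return 1.0 - poisson_cdf(observed, expected)
    return poisson_cdf(observed, expected)


# (Obs., Exp.) pairs taken from the table; p-values rounded to 4 decimals.
for obs, mean in [(1, 0.35), (0, 0.43), (1, 1.6), (1, 0.11)]:
    print(obs, mean, round(tail_p(obs, mean), 4))
# 1 0.35 0.0487
# 0 0.43 0.6505
# 1 1.6 0.5249
# 1 0.11 0.0056
```

The upper tail as written excludes Obs. itself; a conventional one-sided test for an excess of replacements would use P(X ≥ Obs.) instead, which yields larger p-values.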
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yuan, F.; Nguyen, H.; Graur, D. ProtParCon: A Framework for Processing Molecular Data and Identifying Parallel and Convergent Amino Acid Replacements. Genes 2019, 10, 181. https://doi.org/10.3390/genes10030181