Three-Way Alignment Improves Multiple Sequence Alignment of Highly Diverged Sequences
Abstract
:1. Introduction
2. Materials and Methods
2.1. Carrillo–Lipman Algorithm for Three Sequences
2.2. Three-Way Alignment Algorithm
2.3. Simulated Dataset
2.4. Measuring Distance Matrix and Constructing Guide Tree
2.5. Seqeuence Alignment with MAFFT
2.6. Comparing the Accuracy of Phylogenetic Trees
3. Results
3.1. Three-Way Alignment Tends to Generate Guide Trees Closer to the True Tree than Other Approaches
3.2. Three-Way Alignment Leads to More Accurate Phylogenetic Results than Other Approaches
3.3. Accuracy of the Guide Tree Affects the Accuracy of the Final Tree from MSA
3.4. Sum-of-Pair Score May Not Be a Good Criterion for Choosing the Best MSA
3.5. Performance of the Three-Way Alignment on Benchmark Datasets
4. Discussion
4.1. Is the True Tree the Best Guide Tree for Progressive Multiple Sequence Alignment?
4.2. How to Obtain the Best Guide Tree?
4.3. Is Sum-of-Pair Score or Its Derivative a Good Criterion for Choosing the Best MSA?
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Xia, X. Post-Alignment Adjustment and Its Automation. Genes 2021, 12, 1809. [Google Scholar] [CrossRef] [PubMed]
- Hall, B.G. Comparison of the Accuracies of Several Phylogenetic Methods Using Protein and DNA Sequences. Mol. Biol. Evol. 2005, 22, 792–802. [Google Scholar] [CrossRef] [PubMed]
- Goldman, N. Effects of Sequence Alignment Procedures on Estimates of Phylogeny. BioEssays 1998, 20, 287–290. [Google Scholar] [CrossRef]
- Morrison, D.A.; Ellis, J.T. Effects of Nucleotide Sequence Alignment on Phylogeny Estimation: A Case Study of 18S rDNAs of Apicomplexa. Mol. Biol. Evol. 1997, 14, 428–441. [Google Scholar] [CrossRef] [PubMed]
- Needleman, S.B.; Wunsch, C.D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol. 1970, 48, 443–453. [Google Scholar] [CrossRef] [PubMed]
- Eddy, S.R. What Is Dynamic Programming? Nat. Biotechnol. 2004, 22, 909–910. [Google Scholar] [CrossRef] [PubMed]
- Sankoff, D. Minimal Mutation Trees of Sequences. SIAM J. Appl. Math. 1975, 28, 35–42. [Google Scholar] [CrossRef]
- Sankoff, D.; Cedergren, R.J.; Lapalme, G. Frequency of Insertion-Deletion, Transversion, and Transition in the Evolution of 5S Ribosomal RNA. J. Mol. Evol. 1976, 7, 133–149. [Google Scholar] [CrossRef] [PubMed]
- Hirschberg, D.S. A Linear Space Algorithm for Computing Maximal Common Subsequences. Commun. ACM 1975, 18, 341–343. [Google Scholar] [CrossRef]
- Smith, T.F.; Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol. 1981, 147, 195–197. [Google Scholar] [CrossRef]
- Feng, D.-F.; Doolittle, R.F. Progressive Sequence Alignment as a Prerequisite to Correct Phylogenetic Trees. J. Mol. Evol. 1987, 25, 351–360. [Google Scholar] [CrossRef] [PubMed]
- Katoh, K.; Misawa, K.; Kuma, K.; Miyata, T. MAFFT: A Novel Method for Rapid Multiple Sequence Alignment Based on Fast Fourier Transform. Nucleic Acids Res. 2002, 30, 3059–3066. [Google Scholar] [CrossRef] [PubMed]
- Edgar, R.C. MUSCLE: Multiple Sequence Alignment with High Accuracy and High Throughput. Nucleic Acids Res. 2004, 32, 1792–1797. [Google Scholar] [CrossRef] [PubMed]
- Notredame, C.; Higgins, D.G.; Heringa, J. T-Coffee: A Novel Method for Fast and Accurate Multiple Sequence alignment11Edited by J. Thornton. J. Mol. Biol. 2000, 302, 205–217. [Google Scholar] [CrossRef] [PubMed]
- Thompson, J.D.; Higgins, D.G.; Gibson, T.J. CLUSTAL W: Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice. Nucleic Acids Res. 1994, 22, 4673–4680. [Google Scholar] [CrossRef] [PubMed]
- Thompson, J.D.; Plewniak, F.; Poch, O. A Comprehensive Comparison of Multiple Sequence Alignment Programs. Nucleic Acids Res. 1999, 27, 2682–2690. [Google Scholar] [CrossRef] [PubMed]
- Noah, K.E.; Hao, J.; Li, L.; Sun, X.; Foley, B.; Yang, Q.; Xia, X. Major Revisions in Arthropod Phylogeny through Improved Supermatrix, with Support for Two Possible Waves of Land Invasion by Chelicerates. Evol. Bioinform. Online 2020, 16, 1176934320903735. [Google Scholar] [CrossRef]
- Regier, J.C.; Shultz, J.W.; Zwick, A.; Hussey, A.; Ball, B.; Wetzer, R.; Martin, J.W.; Cunningham, C.W. Arthropod Relationships Revealed by Phylogenomic Analysis of Nuclear Protein-Coding Sequences. Nature 2010, 463, 1079–1083. [Google Scholar] [CrossRef] [PubMed]
- Xia, X. PhyPA: Phylogenetic Method with Pairwise Sequence Alignment Outperforms Likelihood Methods in Phylogenetics Involving Highly Diverged Sequences. Mol. Phylogenetics Evol. 2016, 102, 331–343. [Google Scholar] [CrossRef]
- Bellamy-Royds, A.B.; Turcotte, M. Can Clustal-Style Progressive Pairwise Alignment of Multiple Sequences Be Used in RNA Secondary Structure Prediction? BMC Bioinform. 2007, 8, 190. [Google Scholar] [CrossRef]
- Masoumi, B.; Turcotte, M. Simultaneous Alignment and Structure Prediction of Three RNA Sequences. Int. J. Bioinform. Res. Appl. 2005, 1, 230–245. [Google Scholar] [CrossRef] [PubMed]
- Xia, X. Phylogenetic Relationship Among Horseshoe Crab Species: Effect of Substitution Models on Phylogenetic Analyses. Syst. Biol. 2000, 49, 87–100. [Google Scholar] [CrossRef] [PubMed]
- Xia, X.; Xie, Z.; Kjer, K.M. 18S Ribosomal RNA and Tetrapod Phylogeny. Syst. Biol. 2003, 52, 283–295. [Google Scholar] [CrossRef] [PubMed]
- Zhan, Q.; Ye, Y.; Lam, T.-W.; Yiu, S.-M.; Wang, Y.; Ting, H.-F. Improving Multiple Sequence Alignment by Using Better Guide Trees. BMC Bioinform. 2015, 16, S4. [Google Scholar] [CrossRef] [PubMed]
- Capella-Gutiérrez, S.; Gabaldón, T. Measuring Guide-Tree Dependency of Inferred Gaps in Progressive Aligners. Bioinformatics 2013, 29, 1011–1017. [Google Scholar] [CrossRef] [PubMed]
- Penn, O.; Privman, E.; Landan, G.; Graur, D.; Pupko, T. An Alignment Confidence Score Capturing Robustness to Guide Tree Uncertainty. Mol. Biol. Evol. 2010, 27, 1759–1767. [Google Scholar] [CrossRef] [PubMed]
- Nelesen, S.; Liu, K.; Zhao, D.; Linder, C.R.; Warnow, T. The Effect of the Guide Tree on Multiple Sequence Alignments and Subsequent Phylogenetic Analyses. In Biocomputing 2008; World Scientific: Singapore, 2007; pp. 25–36. ISBN 978-981-277-608-2. [Google Scholar]
- Ye, Y.; Cheung, D.W.; Wang, Y.; Yiu, S.-M.; Zhan, Q.; Lam, T.-W.; Ting, H.-F. GLProbs: Aligning Multiple Sequences Adaptively. In Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics, Wshington, DC, USA, 22–25 September 2013; Association for Computing Machinery: New York, NY, USA; pp. 152–160. [Google Scholar]
- Kruspe, M.; Stadler, P.F. Progressive Multiple Sequence Alignments from Triplets. BMC Bioinform. 2007, 8, 254. [Google Scholar] [CrossRef] [PubMed]
- Chien, R.-T.; Liao, Y.-L.; Wang, C.-A.; Li, Y.-C.; Lu, Y.-C. Three-Dimensional Dynamic Programming Accelerator for Multiple Sequence Alignment. In Proceedings of the 2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), Tallinn, Estonia, 30–31 October 2018; pp. 1–5. [Google Scholar]
- Gotoh, O. Alignment of Three Biological Sequences with an Efficient Traceback Procedure. J. Theor. Biol. 1986, 121, 327–337. [Google Scholar] [CrossRef] [PubMed]
- Carrillo, H.; Lipman, D. The Multiple Sequence Alignment Problem in Biology. SIAM J. Appl. Math. 1988, 48, 1073–1082. [Google Scholar] [CrossRef]
- Huang, X. Alignment of Three Sequences in Quadratic Space. SIGAPP Appl. Comput. Rev. 1993, 1, 7–11. [Google Scholar] [CrossRef]
- Ly-Trong, N.; Naser-Khdour, S.; Lanfear, R.; Minh, B.Q. AliSim: A Fast and Versatile Phylogenetic Sequence Simulator for the Genomic Era. Mol. Biol. Evol. 2022, 39, msac092. [Google Scholar] [CrossRef] [PubMed]
- Jones, D.T.; Taylor, W.R.; Thornton, J.M. The Rapid Generation of Mutation Data Matrices from Protein Sequences. Bioinformatics 1992, 8, 275–282. [Google Scholar] [CrossRef]
- Guindon, S.; Dufayard, J.-F.; Lefort, V.; Anisimova, M.; Hordijk, W.; Gascuel, O. New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Syst. Biol. 2010, 59, 307–321. [Google Scholar] [CrossRef]
- Robinson, D.F.; Foulds, L.R. Comparison of Phylogenetic Trees. Math. Biosci. 1981, 53, 131–147. [Google Scholar] [CrossRef]
- Paradis, E.; Claude, J.; Strimmer, K. APE: Analyses of Phylogenetics and Evolution in R Language. Bioinformatics 2004, 20, 289–290. [Google Scholar] [CrossRef] [PubMed]
- Edgar, R.C.; Batzoglou, S. Multiple Sequence Alignment. Curr. Opin. Struct. Biol. 2006, 16, 368–373. [Google Scholar] [CrossRef]
- Gotoh, O. A Weighting System and Aigorithm for Aligning Many Phylogenetically Related Sequences. Bioinformatics 1995, 11, 543–551. [Google Scholar] [CrossRef]
- Altschul, S.F.; Carroll, R.J.; Lipman, D.J. Weights for Data Related by a Tree. J. Mol. Biol. 1989, 207, 647–653. [Google Scholar] [CrossRef] [PubMed]
- Thompson, J.D.; Koehl, P.; Ripp, R.; Poch, O. BAliBASE 3.0: Latest Developments of the Multiple Sequence Alignment Benchmark. Proteins Struct. Funct. Bioinform. 2005, 61, 127–136. [Google Scholar] [CrossRef]
- Edgar, R.C. Muscle5: High-Accuracy Alignment Ensembles Enable Unbiased Assessments of Sequence Homology and Phylogeny. Nat. Commun. 2022, 13, 6968. [Google Scholar] [CrossRef]
- Gotoh, O. Consistency of Optimal Sequence Alignments. Bull. Math. Biol. 1990, 52, 509–525. [Google Scholar] [CrossRef] [PubMed]
Symmetric Tree | Asymmetric Tree | |||||
---|---|---|---|---|---|---|
Guide Tree | Ntrue (1) | RFd (2) | SERFd (3) | Ntrue (1) | RFd (2) | SERFd (3) |
L-INS-i | 27 | 0.92 | 0.1424 | 0 | 5.12 | 0.1993 |
3-WAY | 31 | 0.76 | 0.1387 | 0 | 4.84 | 0.2067 |
H-S Tree | S Tree | H-AS Tree | AS Tree | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Guide Tree | Ntrue | RFd | SERFd | Ntrue | RFd | SERFd | Ntrue | RFd | SERFd | Ntrue | RFd | SERFd |
FFT-NS-1 | 35 | 1.08 | 0.2693 | 0 | 9.24 | 0.5107 | 15 | 2.52 | 0.3514 | 0 | 15.32 | 0.5817 |
FFT-NS-2 | 34 | 1.28 | 0.2956 | 0 | 7.24 | 0.4495 | 29 | 1.16 | 0.2497 | 0 | 13 | 0.4891 |
L-INS-i | 50 | 0 | 0 | 39 | 0.6 | 0.1429 | 41 | 0.36 | 0.1098 | 0 | 6.48 | 0.2325 |
3-WAY | 50 | 0 | 0 | 40 | 0.4 | 0.1143 | 39 | 0.52 | 0.1491 | 3 | 5.8 | 0.3886 |
Symmetric Tree | Asymmetric Tree | |||||
---|---|---|---|---|---|---|
Guide Tree | Ntrue | RFd | SERFd | Ntrue | RFd | SERFd |
True Tree | 50 | 0 | 0 | 21 | 1.52 | 0.2180 |
L-INS-i | 29 | 0.92 | 0.1637 | 0 | 6 | 0.2356 |
3-WAY | 34 | 0.68 | 0.1469 | 0 | 5 | 0.2231 |
H-S Tree | S Tree | H-AS Tree | AS Tree | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Guide Tree | Ntrue | RFd | SERFd | Ntrue | RFd | SERFd | Ntrue | RFd | SERFd | Ntrue | RFd | SERFd |
True tree | 50 | 0 | 0 | 50 | 0 | 0 | 50 | 0 | 0 | 18 | 2.40 | 0.3182 |
FFT-NS-1 | 45 | 0.12 | 0.0679 | 11 | 3.28 | 0.3865 | 40 | 0.44 | 0.1314 | 0 | 11.16 | 0.4203 |
FFT-NS-2 | 47 | 0.20 | 0.0857 | 13 | 3.28 | 0.3908 | 43 | 0.32 | 0.1193 | 0 | 10.52 | 0.4345 |
L-INS-i | 50 | 0 | 0 | 45 | 0.20 | 0.0857 | 44 | 0.28 | 0.1144 | 0 | 6.60 | 0.3758 |
3-Way | 50 | 0 | 0 | 48 | 0.12 | 0.0887 | 46 | 0.16 | 0.0755 | 1 | 6.36 | 0.3114 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Askari Rad, M.; Kruglikov, A.; Xia, X. Three-Way Alignment Improves Multiple Sequence Alignment of Highly Diverged Sequences. Algorithms 2024, 17, 205. https://doi.org/10.3390/a17050205
Askari Rad M, Kruglikov A, Xia X. Three-Way Alignment Improves Multiple Sequence Alignment of Highly Diverged Sequences. Algorithms. 2024; 17(5):205. https://doi.org/10.3390/a17050205
Chicago/Turabian StyleAskari Rad, Mahbubeh, Alibek Kruglikov, and Xuhua Xia. 2024. "Three-Way Alignment Improves Multiple Sequence Alignment of Highly Diverged Sequences" Algorithms 17, no. 5: 205. https://doi.org/10.3390/a17050205