Benchmarking State-of-the-Art Approaches for Norovirus Genome Assembly in Metagenome Sample
Simple Summary
Abstract
1. Introduction
2. Methods
3. Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Sample | Metric | RVS | CS | RS | M | T |
|---|---|---|---|---|---|---|
| SRR8074276 | Longest alignment (nt) | 7538 | 7538 | 7538 | 7548 | 7547 |
| | Genome fraction (%) | 99.83 | 99.83 | 99.83 | 99.96 | 99.94 |
| | Longest alignment IDY (%) | 99.95 | 99.95 | 99.91 | 99.87 | 99.93 |
| SRR9141472 | Longest alignment (nt) | 7569 | 7569 | 5282 | 7560 | 5848 |
| | Genome fraction (%) | 100.0 | 100.0 | 78.80 | 99.88 | 100.0 |
| | Longest alignment IDY (%) | 100.0 | 100.0 | 99.95 | 99.99 | 100.0 |
| SRR9141473 | Longest alignment (nt) | 7536 | 7536 | 4979 | 7487 | 6899 |
| | Genome fraction (%) | 100.0 | 100.0 | 90.55 | 99.35 | 100.0 |
| | Longest alignment IDY (%) | 100.0 | 100.0 | 99.91 | 99.97 | 100.0 |
| SRR9141474 | Longest alignment (nt) | 7541 | 7541 | 6838 | 7516 | 7542 |
| | Genome fraction (%) | 99.99 | 99.99 | 99.99 | 99.65 | 100.0 |
| | Longest alignment IDY (%) | 100.0 | 100.0 | 99.91 | 99.99 | 100.0 |
| SRR9141475 | Longest alignment (nt) | 7482 | 7493 | 7049 | 7440 | 7534 |
| | Genome fraction (%) | 99.28 | 99.43 | 99.08 | 98.72 | 99.97 |
| | Longest alignment IDY (%) | 100.0 | 100.0 | 99.96 | 100.0 | 100.0 |
| SRR9141476 | Longest alignment (nt) | 7533 | 7533 | 5699 | 7526 | 7533 |
| | Genome fraction (%) | 100.0 | 100.0 | 100.0 | 99.90 | 100.0 |
| | Longest alignment IDY (%) | 100.0 | 100.0 | 99.98 | 100.0 | 99.98 |
| SRR9141477 | Longest alignment (nt) | 7554 | 7554 | 5935 | 7467 | 3658 |
| | Genome fraction (%) | 99.95 | 99.95 | 91.20 | 98.79 | 99.84 |
| | Longest alignment IDY (%) | 100.0 | 100.0 | 99.93 | 99.96 | 99.97 |
| SRR9141478 | Longest alignment (nt) | 7540 | 7540 | 6592 | 7540 | 7540 |
| | Genome fraction (%) | 100.0 | 100.0 | 95.90 | 100.0 | 100.0 |
| | Longest alignment IDY (%) | 99.99 | 99.99 | 99.95 | 99.99 | 99.88 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Meleshko, D.; Korobeynikov, A. Benchmarking State-of-the-Art Approaches for Norovirus Genome Assembly in Metagenome Sample. Biology 2023, 12, 1066. https://doi.org/10.3390/biology12081066