Open Access Article

Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers

1 Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
2 Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
3 Department of Autoimmunology and Biomarkers, Statens Serum Institut, DK-2300 Copenhagen S, Denmark
4 NNF Center for Protein Research, University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen, Denmark
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Academic Editor: Marcus Thomas Gilbert
Viruses 2016, 8(2), 53; https://doi.org/10.3390/v8020053
Received: 30 October 2015 / Revised: 29 January 2016 / Accepted: 5 February 2016 / Published: 19 February 2016
Virus discovery from high-throughput sequencing data often follows a bottom-up approach in which taxonomic annotation takes place before association with disease. Although effective in some cases, this approach fails to detect novel pathogens and remote variants that are not present in reference databases. We have developed a species-independent pipeline that utilises sequence clustering to identify nucleotide sequences that co-occur across multiple sequencing data instances. We applied the workflow to 686 sequencing libraries from 252 cancer samples of different cancer and tissue types, 32 non-template controls, and 24 test samples. Recurrent sequences were statistically associated with biological, methodological or technical features, with the aim of identifying novel pathogens or plausible contaminants that may be associated with a particular kit or method. We provide examples of identified inhabitants of the healthy tissue flora as well as experimental contaminants. Unmapped sequences that co-occur with high statistical significance potentially represent the unknown sequence space where novel pathogens can be identified.
Keywords: sequence clustering; taxonomic characterisation; novel sequence identification; next generation sequencing; cancer causing viruses; oncoviruses; assay contamination
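As an illustration of the statistical association step described in the abstract, the following minimal sketch tests whether a recurrent sequence cluster is over-represented among samples sharing a feature (for example, a particular extraction kit). It is not the authors' implementation: the abstract does not name the test used, so the one-sided Fisher's exact test here is an assumption, and the function names, sample identifiers and feature are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: samples with the feature that contain the cluster
    b: samples with the feature that lack the cluster
    c: samples without the feature that contain the cluster
    d: samples without the feature that lack the cluster
    Returns P(X >= a) under the hypergeometric null (no association).
    """
    with_feature = a + b
    with_cluster = a + c
    total = a + b + c + d
    upper = min(with_feature, with_cluster)
    numerator = sum(
        comb(with_feature, k) * comb(total - with_feature, with_cluster - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, with_cluster)


def cluster_feature_p_value(cluster_samples, feature_samples, all_samples):
    """Build the 2x2 table from sets of sample IDs and return the p-value.

    All three arguments are sets of hypothetical sample identifiers.
    """
    a = len(cluster_samples & feature_samples)
    b = len(feature_samples - cluster_samples)
    c = len(cluster_samples - feature_samples)
    d = len(all_samples) - a - b - c
    return fisher_exact_greater(a, b, c, d)


# Hypothetical data: 10 samples, 5 prepared with "kit A",
# and a sequence cluster observed in 4 of those 5 and nowhere else.
all_samples = {f"s{i}" for i in range(10)}
kit_a_samples = {"s0", "s1", "s2", "s3", "s4"}
cluster_samples = {"s0", "s1", "s2", "s3"}

p_value = cluster_feature_p_value(cluster_samples, kit_a_samples, all_samples)
print(f"p = {p_value:.4f}")
```

In a real analysis the same test would be repeated for every cluster and every feature, so the resulting p-values would need a multiple-testing correction before any cluster is interpreted as a candidate pathogen or contaminant.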

MDPI and ACS Style

Friis-Nielsen, J.; Kjartansdóttir, K.R.; Mollerup, S.; Asplund, M.; Mourier, T.; Jensen, R.H.; Hansen, T.A.; Rey-Iglesia, A.; Richter, S.R.; Nielsen, I.B.; Alquezar-Planas, D.E.; Olsen, P.V.S.; Vinner, L.; Fridholm, H.; Nielsen, L.P.; Willerslev, E.; Sicheritz-Pontén, T.; Lund, O.; Hansen, A.J.; Izarzugaza, J.M.G.; Brunak, S. Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers. Viruses 2016, 8, 53.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
