Open Access Article

The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences

1 School of Biological Sciences, The University of Manchester, Manchester M13 9PT, UK
2 Modernising Medical Microbiology Consortium, Nuffield Department of Clinical Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DU, UK
3 Department of Viroscience, Erasmus Medical Centre, Doctor Molewaterplein 40, 3015 GD Rotterdam, The Netherlands
4 Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford OX3 7DQ, UK
5 MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
6 MRC/UVRI & LSHTM Uganda Research Unit Entebbe, P.O. Box 49 Entebbe, Uganda
* Author to whom correspondence should be addressed.
Viruses 2019, 11(5), 394; https://doi.org/10.3390/v11050394
Received: 30 March 2019 / Revised: 19 April 2019 / Accepted: 22 April 2019 / Published: 26 April 2019
(This article belongs to the Special Issue Virus Bioinformatics)

Abstract

Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
Keywords: alignment; assembly; taxonomic classification; time series; data transformation; DWT; DFT; PAA; data compression; compressive genomics
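
As a rough illustration of the kind of transformation the abstract refers to, the sketch below applies piecewise aggregate approximation (PAA, one of the methods named in the keywords) to a numerically encoded nucleotide sequence and compares two sequences in the reduced space. It is a minimal sketch under stated assumptions, not the authors' implementation: the nucleotide-to-number mapping `BASE_VALUES` and the helper names `encode`, `paa` and `euclidean` are hypothetical and were chosen only for this example.

```python
import math
from typing import List

# Assumed, illustrative nucleotide-to-number mapping (not taken from the article).
BASE_VALUES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode(seq: str) -> List[float]:
    """Turn a nucleotide string into a numeric series; unknown bases map to 0.0."""
    return [BASE_VALUES.get(base, 0.0) for base in seq.upper()]


def paa(signal: List[float], segments: int) -> List[float]:
    """Piecewise aggregate approximation: the mean of each of `segments` chunks."""
    n = len(signal)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(signal)")
    means = []
    for i in range(segments):
        # Integer boundaries keep every chunk non-empty even when n % segments != 0.
        start = i * n // segments
        end = (i + 1) * n // segments
        chunk = signal[start:end]
        means.append(sum(chunk) / len(chunk))
    return means


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    read = "ACGTACGTTTGA"
    reference_window = "ACGTACGTTTGC"
    # Compress each 12-base sequence to 4 values, then compare the compressed forms.
    read_paa = paa(encode(read), 4)
    window_paa = paa(encode(reference_window), 4)
    print(read_paa, window_paa, euclidean(read_paa, window_paa))
```

Comparing the short PAA vectors instead of the full-length series is what reduces the cost of each comparison; how much accuracy is retained depends on the encoding and on the number of segments, neither of which is specified in this abstract.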
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
MDPI and ACS Style

Tapinos, A.; Constantinides, B.; Phan, M.V.T.; Kouchaki, S.; Cotten, M.; Robertson, D.L. The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences. Viruses 2019, 11, 394.



Viruses, EISSN 1999-4915, published by MDPI AG, Basel, Switzerland.