Open Access Article

Bioinformatics Pipeline for Human Papillomavirus Short Read Genomic Sequences Classification Using Support Vector Machine

1 Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech, Atlanta, GA 30332, USA
2 Division of High-Consequence Pathogens & Pathology, Centers for Disease Control and Prevention, Atlanta, GA 30329, USA
3 School of Computational Science and Engineering, Georgia Tech, Atlanta, GA 30332, USA
* Author to whom correspondence should be addressed.
Viruses 2020, 12(7), 710; https://doi.org/10.3390/v12070710
Received: 3 June 2020 / Revised: 26 June 2020 / Accepted: 27 June 2020 / Published: 30 June 2020
(This article belongs to the Special Issue Computational Biology of Viruses: From Molecules to Epidemics)
We recently developed a test based on the Agilent SureSelect target enrichment system that captures genomic fragments from 191 human papillomavirus (HPV) types for Illumina sequencing. This enriched whole genome sequencing (eWGS) assay provides an approach to identify all HPV types in a sample. Here we present a machine learning algorithm that calls HPV types based on the eWGS output. The algorithm, based on the support vector machine (SVM) technique, was trained on eWGS data from 122 control samples with known HPV types. The new algorithm demonstrated good performance in HPV type detection for designed samples with 25 or more HPV plasmid copies per sample. For 261 residual epidemiologic samples, we compared the HPV typing results of the new algorithm with those of the standard HPV Linear Array (LA) assay. The agreement between the methods (97.4%) was substantial (kappa = 0.783). In addition, the new algorithm identified 428 instances of HPV types that the LA assay is not designed to detect. Overall, we have demonstrated that the bioinformatics pipeline is an accurate tool for calling HPV types by analyzing data generated by eWGS processing of DNA fragments extracted from control and epidemiological samples.
Keywords: HPV typing; HPV whole genome sequencing; target enrichment; h classification; bioinformatics pipeline
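The abstract reports both a raw agreement (97.4%) and a Cohen's kappa (0.783) between the two typing methods. The sketch below illustrates how these two statistics are computed from a 2x2 table of paired positive/negative calls, and why kappa can be noticeably lower than raw agreement when most calls are negative. It is a minimal illustration only: the function name `cohen_kappa` and the counts in the usage example are hypothetical and are not taken from the article's data or pipeline.

```python
def cohen_kappa(both_pos, only_a, only_b, both_neg):
    """Return (observed_agreement, kappa) for two binary raters.

    both_pos: calls positive by method A and method B
    only_a:   calls positive by method A only
    only_b:   calls positive by method B only
    both_neg: calls negative by both methods
    """
    n = both_pos + only_a + only_b + both_neg
    if n == 0:
        raise ValueError("the agreement table is empty")

    # Observed agreement: fraction of calls on which the methods concur.
    p_o = (both_pos + both_neg) / n

    # Agreement expected by chance, from each method's positive rate.
    a_pos = (both_pos + only_a) / n
    b_pos = (both_pos + only_b) / n
    p_e = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)

    if p_e == 1.0:
        # Both methods give the same single answer every time;
        # kappa is undefined (0/0), so report it as NaN.
        return p_o, float("nan")

    return p_o, (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    agreement, kappa = cohen_kappa(both_pos=20, only_a=5, only_b=10, both_neg=15)
    print(f"observed agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

Because kappa discounts the agreement expected by chance, a table dominated by concordant negative calls yields a high raw agreement but a more moderate kappa, which is the pattern seen in the reported figures.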
MDPI and ACS Style

Lomsadze, A.; Li, T.; Rajeevan, M.S.; Unger, E.R.; Borodovsky, M. Bioinformatics Pipeline for Human Papillomavirus Short Read Genomic Sequences Classification Using Support Vector Machine. Viruses 2020, 12, 710.
