HPV Population Profiling in Healthy Men by Next-Generation Deep Sequencing Coupled with HPV-QUEST

Multiple-type human papillomaviruses (HPV) infection presents a greater risk for persistence in asymptomatic individuals and may accelerate cancer development. To extend the scope of HPV types defined by probe-based assays, multiplexing deep sequencing of HPV L1, coupled with an HPV-QUEST genotyping server and a bioinformatic pipeline, was established and applied to survey the diversity of HPV genotypes among a subset of healthy men from the HPV in Men (HIM) Multinational Study. Twenty-one HPV genotypes (12 high-risk and 9 low-risk) were detected in the genital area from 18 asymptomatic individuals. A single HPV type, either HPV16, HPV6b or HPV83, was detected in 7 individuals, while coinfection by 2 to 5 high-risk and/or low-risk genotypes was identified in the other 11 participants. In two individuals studied for over one year, HPV16 persisted, while fluctuations of coinfecting genotypes occurred. HPV L1 regions were generally identical between query and reference sequences, although nonsynonymous and synonymous nucleotide polymorphisms of HPV16, 18, 31, 35h, 59, 70, 73, cand85, 6b, 62, 81, 83, cand89 or JEB2 L1 genotypes, mostly unidentified by linear array, were evident. Deep sequencing coupled with HPV-QUEST provides efficient and unambiguous classification of HPV genotypes in multiple-type HPV infection in host ecosystems.

Studies of multiple-type HPV infection by conventional probe-based assays are hampered by the limited scope of type-specific probes and/or cross-hybridization by probes for related HPV genomes [26][27][28][29][30]. Direct sequencing based on electropherographs or pyrograms is problematic in accurately defining multiple infections present in one specimen [31,32]. Next-generation deep sequencing (NGS) technology combines the specificity of sequence data with a capacity to analyze large numbers of samples within a clinical setting. In this study, an HPV-genotyping system was developed by coupling NGS with HPV-QUEST, a web server we developed previously for automated HPV genotyping with a capability of handling up to 10 megabases (Mb) of sequences [33], and applied to a subset of healthy men from the HPV in Men (HIM) Multinational Study [34][35][36][37] to survey HPV genotype diversity cross-sectionally and longitudinally.

Study Participants, Specimen Collection, and DNA Extraction
The study included 22 samples from 18 men from the United States (n = 15), Brazil (n = 1), or Mexico (n = 2), representing a subset of healthy males from the HPV in Men (HIM) Multinational Study recruited between 2005 and 2008 to examine the HPV prevalence in asymptomatic males [34][35][36][37]. The HIM study was approved by the human-subjects' committees of the University of South Florida (St. Petersburg, FL, USA), the Ludwig Institute for Cancer Research (São Paulo, Brazil), the Centro de Referẽncia e Tratamento de Doencas Sexualmente Transmissiveis e AIDS, and the National Institute of Public Health of Mexico (Cuernavaca, Mexico). Written consent was obtained from all participants. Participants at study entry were a median (25%-75% quartile range (QR)) age of 22 years (19-31 years), had none to 3 sexual partners within 6 weeks before sampling (Table S1), were HIV-1 negative, and received no HPV vaccines before or during the course of the study. Genital sampling and DNA extraction have been described previously [35,38]. In brief, saline-wetted Dacron applicators (Digene, Gaithersburg, MD, USA) were used to swab three sites of the external genitalia (glans penis/coronal sulcus, penile shaft, and scrotum), and combined into one sample in 450 µL of specimen transport medium (Digene Corporation, Gaithersburg, MD). DNA was extracted using the Media Kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions, and eluted in a 100-to 120 µL elution buffer. The consistency of sample collection reached 91.4% as tested previously [38]. To prevent sample cross-contamination, DNA extraction was carefully performed in a clean and plasmid-free environment. HPV L1 genotyping by linear array (LA) method (Roche Diagnostics, Indianapolis, IN, USA) using PGMY11/09 (PGMY) consensus primers and probes for 37 HPV mucosal types (6, 11,16,18,26,31,33,35,39,40,42,45,[51][52][53][54][55][56]58,59,61,62,64,[66][67][68][69][70][71][72]73 (MM9), 81, 82 (MM4), 83 (MM7), 84 (MM8), IS39 and CP6108) (Linear Array®HPV Genotyping Test, Roche Molecular Diagnostics, Pleasanton, CA, USA) was performed at the H. Lee Moffitt Cancer Center (HLMCC, Tampa, FL, USA). HPV genotyping by LA was highly reproducible with 83.5% or 93.2% concordance rate for single-or multiple-type detection, respectively [38]. The deep sequencing sub-study included samples with HPV L1 genomes identified by the LA method (12 samples including longitudinal samples from 8 participants) or with indeterminate HPV infection (10 samples from 10 participants) reflecting failure either to amplify by PGMY primers or to hybridize by probes in the LA assay. DNA from de-identified samples, provided to the University of Florida (UF) with approval by the Institutional Review Boards at HLMCC and UF, was concentrated by speed vacuum, re-suspended in 5 µL of double distilled water, and used in a single PCR to capture all HPV copies in the sample. The median (QR) input of total DNA was 349 ng (182-995 ng), equivalent to approximately median (QR) of 72,000 cells (31,500-150,000 cells), based on a calculation that 600 ng DNA is equivalent to 100,000 cells [39][40][41][42]. Limited sample DNA precluded direct quantification of total cells or HPV genome copies by quantitative PCR. Deep sequencing coupled with HPV-QUEST [33] was applied to a cross-sectional study of 18 individuals, and a longitudinal follow-up of 2 participants with 6 samples obtained at 6-month intervals. DNA extracted from Primer sets used to generate human papillomaviruses (HPV) L1 amplicon libraries. A cocktail of PGMY11/09 PGMY consensus primers was used in 1st---round amplification to produce a product of 464 nucleotides (nt). GP5+ and GP6+ consensus primers conjugated at 5' end with adaptors A or B respectively, were used in 2nd---round amplification to produce a product of 155 nt (106 nt w/o adaptors, multiplex identifier (MID) and primers). Primer positions are based on HPV8 genome (M12737.1) [44,45]. Distinct MIDs were added between adaptor A and GP5+ for sample identification. Sequences were read from adaptor A.

Sequence Analysis
The bioinformatic pipeline for analysis of L1 deep sequences is outlined in Figure  2. Raw reads generated from pooled libraries were de---multiplexed in Geneious Pro 5.6.5 (Biomatters Ltd., Auckland, New Zealand) using "Separate Reads by Barcode" function with 454 MIDS selected as the Barcode Set to assign sequences to each individual. Quality control steps were performed first in Geneious Pro 5.6.5 to exclude unassigned reads, reads inaccurately assigned to MIDs unused in multiplexing due to an incomplete MID, no MID, sequencing errors in the MID, reads with more than one mismatch in any of the primers, or reads with a sequence length outside the mean ± 2 SD range. Subsequent analysis in BioEdit (Ibis Biosciences, Carlsbad, CA, USA) identified and excluded reads with ambiguous nucleotide(s) or out---of---reading---frame indels. After quality control, a median (QR) of 1,441 (684-1,670) and 25,928 (16,,038) quality sequences were obtained for the first 14 samples and the 2nd 11 samples, respectively, with correspondent median (QR) removal rate of 9.7% (8.0%-14.5%) and 1.4% (0.7%-3.4%) (Table  S1). Primer sets used to generate human papillomaviruses (HPV) L1 amplicon libraries. A cocktail of PGMY11/09 PGMY consensus primers was used in 1st-round amplification to produce a product of 464 nucleotides (nt). GP5+ and GP6+ consensus primers conjugated at 5' end with adaptors A or B respectively, were used in 2nd-round amplification to produce a product of 155 nt (106 nt w/o adaptors, multiplex identifier (MID) and primers). Primer positions are based on HPV8 genome (M12737.1) [44,45]. Distinct MIDs were added between adaptor A and GP5+ for sample identification. Sequences were read from adaptor A.

Sequence Analysis
The bioinformatic pipeline for analysis of L1 deep sequences is outlined in Figure 2. Raw reads generated from pooled libraries were de-multiplexed in Geneious Pro 5.6.5 (Biomatters Ltd., Auckland, New Zealand) using "Separate Reads by Barcode" function with 454 MIDS selected as the Barcode Set to assign sequences to each individual. Quality control steps were performed first in Geneious Pro 5.6.5 to exclude unassigned reads, reads inaccurately assigned to MIDs unused in multiplexing due to an incomplete MID, no MID, sequencing errors in the MID, reads with more than one mismatch in any of the primers, or reads with a sequence length outside the mean˘2 SD range. Subsequent analysis in BioEdit (Ibis Biosciences, Carlsbad, CA, USA) identified and excluded reads with ambiguous nucleotide(s) or out-of-reading-frame indels. After quality control, a median (QR) of 1,441 (684-1,670) and 25,928 (16,,038) quality sequences were obtained for the first 14 samples and the 2nd 11 samples, respectively, with correspondent median (QR) removal rate of 9.7% (8.0%-14.5%) and 1.4% (0.7%-3.4%) (Table S1).
Quality sequences were clustered at 3% pairwise distance (representing 2% dissimilarity within variants [52][53][54] plus 1% to more than compensate for sequencing errors) using ESPRIT [51]. ESPRIT generated a consensus sequence for each cluster representing an HPV genotype variant, and displayed the size of the cluster (number of sequences/cluster) reflecting the abundance of each variant, which only serves as a reference rather than a quantitation due to the possible bias of PGMY/GP+ consensus primers towards certain HPV genotypes over others. The number of consensus sequences for each genotype is detailed in Table S1. . Bioinformatic pipeline of sequence analysis. HPV L1 was amplified by PCR using PGMY or nested PCR using PGMY/GP+ from genomic DNA from each individual and subjected to Linear Array (LA) and deep sequencing/HPV---QUEST, respectively, to compare sensitivity and coverage in HPV genotype capturing. Raw deep sequencing reads generated from pooled HPV L1 libraries were separated and assigned to each individual according to MID sequence codes. Quality control removed low quality reads. Quality sequences of an HPV16 clone were aligned to HPV16 reference sequence (GI│333031) to study errors due to PCR and deep sequencing. To study HPV L1 sequences at variant level, quality sequences were clustered at 3% pairwise distance [51]. Consensus sequence derived from each cluster represented an HPV variant, and relative proportion of each HPV genotype within a sample served as a reference rather than a quantification of viral population structure. Consensus sequences were queried in HPV---QUEST to obtain HPV genus, species, genotype and oncogenicity. Papillomavirus Episteme (PaVE) Specific Blastn was used to confirm HPV---QUEST genotyping results, and Nationa Center for Biotechnology Information (NCBI) Blastn was applied to evaluate discrepancies of genotyping by HPV---QUEST and PaVE. NGS/HPV---QUEST was compared with LA with respect of sensitivity and coverage in identifying HPV genotypes and variants.
To assess deep sequencing errors, 25,928 quality L1 deep sequences from a molecular clone of HPV16 were aligned to an HPV16 reference sequence (GI│333031) using MUSCLE in Geneious package, followed by BioEdit (Ibis Biosciences, Carlsbad, CA, USA). Errors were evaluated using an Figure 2. Bioinformatic pipeline of sequence analysis. HPV L1 was amplified by PCR using PGMY or nested PCR using PGMY/GP+ from genomic DNA from each individual and subjected to Linear Array (LA) and deep sequencing/HPV-QUEST, respectively, to compare sensitivity and coverage in HPV genotype capturing. Raw deep sequencing reads generated from pooled HPV L1 libraries were separated and assigned to each individual according to MID sequence codes. Quality control removed low quality reads. Quality sequences of an HPV16 clone were aligned to HPV16 reference sequence (GI |333031) to study errors due to PCR and deep sequencing. To study HPV L1 sequences at variant level, quality sequences were clustered at 3% pairwise distance [51]. Consensus sequence derived from each cluster represented an HPV variant, and relative proportion of each HPV genotype within a sample served as a reference rather than a quantification of viral population structure. Consensus sequences were queried in HPV-QUEST to obtain HPV genus, species, genotype and oncogenicity. Papillomavirus Episteme (PaVE) Specific Blastn was used to confirm HPV-QUEST genotyping results, and Nationa Center for Biotechnology Information (NCBI) Blastn was applied to evaluate discrepancies of genotyping by HPV-QUEST and PaVE. NGS/HPV-QUEST was compared with LA with respect of sensitivity and coverage in identifying HPV genotypes and variants.
Consensus sequences were queried in HPV-QUEST, a custom HPV-genotyping server which collected 150 cutaneous and mucosal HPV reference sequences containing the L1 region obtained from National Center for Biotechnology Information (NCBI) Genebank [55], Los Alamos HPV Sequence Database [56] and Virus Sequence Database [57] with nomenclatures updated, and accommodates large sequence data sets for automated and expedited HPV genotyping and classification of HPV genus, species, type and oncogenicity (high risk = oncogenic; low risk = nononcogenic) [33]. To insure reliability of the HPV genotyping, highly stringent cutoff values were applied with e-value ď 1.0E-38 and ě 80 nucleotide identities between query (~106 nt) and reference sequences. Because of stability and conservation of HPV genomes over evolutionary times [58], genome segments as short as 20 to 30 nt can provide reliable genotyping [26,31]. Validation analysis of genetically unrelated sequences, including human immunoglobulin heavy chain variable region or human immunodeficiency virus type 1 envelope hypervariable region 3, returned results as "not identified (ND)". Singleton HPV genotypes concordant with LA typing, or unique among individuals, or containing sample-specific substitution(s) were included in the study because of the unlikelihood of contamination among the samples. They were otherwise excluded from further analysis. Cervical epidermoid carcinoma cell lines CaSki and C-4 II, reported to contain HPV16 or HPV18, respectively, were used to verify the sequence analysis pipeline. A total of 16,444 quality sequences generated from C-4 II DNA were genotyped as a single variant identical to reference HPV18, while a total of 22,763 quality sequences obtained from CaSki DNA formed a single cluster that was identical to reference HPV16 (Tables S1 and S2). Papillomavirus Episteme (PaVE) PV Specific Blastn [59] was used to confirm HPV-QUEST genotyping results, and NCBI Blastn [60] was applied to evaluate discrepancies of genotyping by HPV-QUEST and PaVE PV Specific Blastn. Genotype variants were studied by aligning them to correspondent reference sequences to identify synonymous and/or nonsynonymous substitution(s).
Genotypes defined by PaVE PV Specific Blastn were virtually identical to those classified by HPV---QUEST with exactly the same local identities to the same reference sequences used in both databases. Minor differences between the two genotyping tools, including HPVJEB2 typed by HPV---QUEST and classified as HPV72 by PaVE PV Specific Blastn, were due to different reference sequences used by the two programs (Table S2). HPVJEB2 is more accurate because the local identity is higher and is the top hit by NCBI Blastn.

HPV Genotype Variants
Across the study group, L1 regions from a variety of genotypes were identical to reference sequences at the nucleotide level (for example, HPVs51, 52, 56, 66, 11, 32 and 87 from S7, S9 and Genotypes defined by PaVE PV Specific Blastn were virtually identical to those classified by HPV-QUEST with exactly the same local identities to the same reference sequences used in both databases. Minor differences between the two genotyping tools, including HPVJEB2 typed by HPV-QUEST and classified as HPV72 by PaVE PV Specific Blastn, were due to different reference sequences used by the two programs (Table S2). HPVJEB2 is more accurate because the local identity is higher and is the top hit by NCBI Blastn.

HPV Genotype Variants
Across the study group, L1 regions from a variety of genotypes were identical to reference sequences at the nucleotide level (for example, HPVs51, 52, 56, 66, 11, 32 and 87 from S7, S9 and S12-S16; HPV16 from S5, S9, S12 and S15-S17; HPVcand85 from S13; and HPV6b from S7, S8 and S12). In contrast, a number of L1 variants displayed nucleotide differences from the corresponding reference sequences in more than 77% (14 of 18) of individuals (Figure 4). Both synonymous and nonsynonymous nucleotide variants were detected within high-risk and low-risk HPV genotypes. Multiple sample-specific substitutions could be found within a single HPV genotype, such as HPV16 in S3 (T1154C) and S4 (A1203G), HPV18 in S8 (C1196G), HPV59 in S9 (T1115C), and HPV81 in S18 (A1085G), while same substitution within a genotype occurred in samples from different individuals, such as HPV70 in S10 and S13 (G1044A), HPV73 in S10, S11 and S13 (G1083A), or HPV83 in S6 and S7 (A1104C). Overall, 81% (47/58) of nucleotide variants differing from reference variants within HPV L1 were identified among samples unclassified by LA. HPV16 in S3 (T1154C) and S4 (A1203G), HPV18 in S8 (C1196G), HPV59 in S9 (T1115C), and HPV81 in S18 (A1085G), while same substitution within a genotype occurred in samples from different individuals, such as HPV70 in S10 and S13 (G1044A), HPV73 in S10, S11 and S13 (G1083A), or HPV83 in S6 and S7 (A1104C). Overall, 81% (47/58) of nucleotide variants differing from reference variants within HPV L1 were identified among samples unclassified by LA. were aligned to corresponding reference genotype sequences with grey color highlighting variants identical to reference sequences. Synonymous and nonsynonymous nucleotide differences were highlighted by green and red, respectively, and boxed to indicate the same nucleotide substitution in more than one variant of an HPV genotype in one or multiple individuals. Nucleotide positions in HPV L1 sequences were based on L1 region of reference sequences (  were aligned to corresponding reference genotype sequences with grey color highlighting variants identical to reference sequences. Synonymous and nonsynonymous nucleotide differences were highlighted by green and red, respectively, and boxed to indicate the same nucleotide substitution in more than one variant of an HPV genotype in one or multiple individuals. Nucleotide positions in HPV L1 sequences were based on L1 region of reference sequences (

Longitudinal Changes of HPV Types within Two Healthy Individuals
Longitudinal changes of HPV genotypes were evaluated in S15 and S16 based on the ability to generate amplicon libraries from a series of three samples collected at six-month intervals (T0, T1, and T2) ( Figure 5). Similar to cross-sectional results, HPV84 genotype captured by LA was not detected by deep sequencing in either participant. In contrast, HPV16 appeared as the persistent genotype at each time point with temporal fluctuations in other genotypes, including HPVs51, 81, and 83 in S15 or HPV11 in S16. All HPV genotypes identified in the samples over time from each individual displayed nucleotide identities with reference genotypes.

Longitudinal Changes of HPV Types within Two Healthy Individuals
Longitudinal changes of HPV genotypes were evaluated in S15 and S16 based on the ability to generate amplicon libraries from a series of three samples collected at six---month intervals (T0, T1, and T2) ( Figure 5). Similar to cross---sectional results, HPV84 genotype captured by LA was not detected by deep sequencing in either participant. In contrast, HPV16 appeared as the persistent genotype at each time point with temporal fluctuations in other genotypes, including HPVs51, 81, and 83 in S15 or HPV11 in S16. All HPV genotypes identified in the samples over time from each individual displayed nucleotide identities with reference genotypes.

Discussion
Much recent attention has highlighted the necessity of HPV genotyping in clinical pathology. An increasing line of evidence implicates HPV viral diversity and genotypes as substantial factors in cancer grade, response to therapy, recurrence, and a patient's survival in HPV---related cancers, e.g., cervical cancers (CC) and head and neck squamous cell carcinomas (HNSCC) [12][13][14][15][16][17][18]30,64]. Infection by multiple HPV genotypes leads to poor response to radiotherapy with reduced survival in CC [13,14], and α7 genotypes, e.g., HPVs18, 39 and 45 are more resistant to radiotherapy [14,30]. HPV + HNSCC are emerging as a separate biological entity among all HNSCC. HPV---positive HNSCC are less likely to harbor p53 mutations, are more sensitive to radio/chemotherapy, and have a lower risk of recurrence and a better chance of survival than HPV---negative HNSCC [4,[15][16][17]. Presence of low---risk HPV type(s) in HNSCC result in poor therapy outcome [18]. The College of American Pathologists has recently recommended routine HPV testing as part of the standard pathologic evaluation of resected oropharyngeal SCC [65].

Discussion
Much recent attention has highlighted the necessity of HPV genotyping in clinical pathology. An increasing line of evidence implicates HPV viral diversity and genotypes as substantial factors in cancer grade, response to therapy, recurrence, and a patient's survival in HPV-related cancers, e.g., cervical cancers (CC) and head and neck squamous cell carcinomas (HNSCC) [12][13][14][15][16][17][18]30,64]. Infection by multiple HPV genotypes leads to poor response to radiotherapy with reduced survival in CC [13,14], and α7 genotypes, e.g., HPVs18, 39 and 45 are more resistant to radiotherapy [14,30]. HPV + HNSCC are emerging as a separate biological entity among all HNSCC. HPV-positive HNSCC are less likely to harbor p53 mutations, are more sensitive to radio/chemotherapy, and have a lower risk of recurrence and a better chance of survival than HPV-negative HNSCC [4,[15][16][17]. Presence of low-risk HPV type(s) in HNSCC result in poor therapy outcome [18]. The College of American Pathologists has recently recommended routine HPV testing as part of the standard pathologic evaluation of resected oropharyngeal SCC [65].
HPV-genotyping methods vary considerably across laboratories, resulting in substantial disagreement among studies [65][66][67]. Hybridization-and sequence-based tests are two major classes of HPV-genotyping systems. Current conventional hybridization-based HPV-genotyping methods represented by LA (Roche), SPF 10 -LiPA 25 [68,69], HPV DNA chip [70], and GP+ reverse line blot [71] detect a limited number of 22 to 37 HPV types out of more than 100 known mucosal types due to a limited number of type-specific probes available. Cross-hybridization between related HPV genomes compromises genotyping accuracy [37,64,72,73]. Primers and probes differ with PGMY11/09 for LA, SPF 10 for SPF 10 -LiPA 25 , and GP+ for HPV DNA chip and GP+ reverse line blot, resulting in large differences in sensitivity of detecting HPV infection and individual genotypes among the tests [14,27,28].
With the rapid revolution in technology, NGS applied to HPV genotyping provides high sensitivity in capturing even low frequency types, reflecting unprecedented coverage of viral copies and almost no limit in the number of types detected in a sample by querying reference sequence databases [72,[74][75][76]. NGS has opened up the opportunity to examine viral diversity, monitor persistence, fluctuation and evolution, discover new genotypes and survey vaccine response, which can be detailed at variant level [74,75,77,78]. Studies at the HPV variant level have clinical relevance since some HPV intra-type variants are geographically specific, associated with oncogenicity, or responsible for vaccine escape [52,[79][80][81][82]. However, the spectrum of HPV genotypes identified by deep sequencing still relies on the primer system used for amplicon generation.
In this study, we applied nested PCR using PGMY, followed by GP+ primers, to generate HPV L1 amplicon libraries since PGMY/GP+ provided greater sensitivity and broader coverage of HPV genotypes compared to MY alone, PGMY alone, GP+ alone, or MY/GP+ [46,47]. Other pyrosequencing-based HPV-genotyping studies used single-round amplification with MY, PGMY, or GP+ primers [72,74,76,83]. In addition, a 150 nt amplicon length falls within the average read length of 400 nt by Titanium pyrosequencing. Consensus primer systems all suffer from certain amplification bias towards HPV genotypes with greater priming efficiency [44,46,47,72,76,84,85]; thus, the percent of sequences may not correlate with the actual amount of virus present and cannot serve as a quantitative measure of viral population structure [72]. Primer design remains a critical factor for HPV genotyping by deep sequencing. Application of rolling circle amplification using random hexamer followed by deep sequencing may provide unbiased amplification of episomal HPV particles but misses genotypes integrated into the cell genome, which are closely correlated with viral persistence, cancer development, increased carcinogenesis and poor therapy response [86][87][88][89][90][91][92][93][94][95].
HPV genotyping of deep sequences relied mainly on querying to NCBI Blastn [72,75]. NCBI Blastn can only query a single sequence per blast search and thus cannot accommodate large sequence data sets. PaVE PV Specific Blastn is able to handle multiple sequences per blast search, but the capacity is far less than 10 Mb per run provided by HPV-QUEST, and genotyping results returned by PaVE PV Specific Blastn require further processing and organization. HPV-QUEST, a web-based genotyping system we developed [33], includes 150 annotated cutaneous and mucosal HPV reference sequences with L1 regions and updated nomenclatures, processes up to 10 Mb of sequences (around 6500 sequences of 100 nt) per run, returns results within one to two minutes, and outputs results in excel and text format, including blast score, e-value, local identity (the percentage of matched nucleotides within the alignment region), genus, species, genotype, infection site (mucosal or cutaneous or both), risk (high or low or unknown), accession number and gene identification number of reference sequence, and alignment of query sequence with reference sequence. Query sequences failing to align with any reference sequences in the HPV-QUEST are designated as "ND". Any sequences that fail to blast, have low local identity, or with an e-value >1e´15 are considered low quality, a new recombination, or a new genotype. With our NGS/HPV-QUEST genotyping system, large data sets of L1 sequences were easily generated and managed, and were classified rapidly and accurately with stringent quality assurance. The repertoire of HPV genotypes that can be classified unambiguously was extended because all published HPV genotypes were included among the reference sequence database in HPV-QUEST, and analysis was not restricted by availability of HPV type-specific probes. Nevertheless, the consensus nature of PGMY/GP+ primers used in generating amplicon libraries were designed to cover a broad spectrum of HPV types with genetic dissimilarity of at least 10%, rendering amplification bias towards certain HPV types unavoidable, similar to probe-based assays. Less than optimal specificity of GP+ primers for HPV genomes species 3 [96] likely accounts in part for lack of concordance between LA and deep sequencing for HPV typing in some samples. Previous studies suggested that PCR amplification-mediated bias is more due to differential primer-template annealing efficiencies of consensus primers than input template copies [97][98][99].
Sequencing allowed the identification of signature polymorphisms among L1 genotypes in more than 75% of individuals in this exploratory study with relatively small sample size, providing a framework to apply to future studies of HPV clearance, persistence, and super-infection by variants of the same HPV genotypes in a larger cohort. In addition, other regions of the HPV genome that may provide more robust molecular information can be incorporated into a deep sequencing platform to track genotypes over time or evaluate HPV gene expression [100]. A high frequency of nucleotide variants differing from reference variants among a range of HPV genotypes failed to be typed by LA, potentially reflecting an inefficient hybridization between HPV L1 regions with polymorphism and primers and/probes used in LA.
In the current study, deep sequencing was applied to define the range of HPV genotypes in asymptomatic individuals. Our data clearly demonstrate that healthy men can carry a range of high-risk and/or low-risk types HPV genotypes, including the vaccine types. Infection by multiple-type high-risk HPV, alone or in combination with low-risk genotypes, is a significant clinical issue for prognosis or treatment for a variety of HPV cancers [4,[13][14][15][16][17][18], while multiple-type infection among asymptomatic individuals can diminish HPV clearance and increase progression to precancerous lesions [8][9][10][11][12]14,19,[21][22][23][24][25]. Deep sequencing technology combined with an efficient analysis pipeline offers a platform that can detect HPV genotypes at low frequency in the population, that enables the discovery of the extent to which a repertoire of HPV genotype(s) may relate to subsequent development of disease, that can extend the unambiguous classification of HPV genotypes, and that can provide more robust data about prevalence, dynamics, and competition among genotypes in an ecosystem. Sequence data also provides the basis to implement phylogenetic tools to both assess the organization of HPV genomes within an individual, among transmission pairs, or across geographic locations, and investigate the relationship among new HPV sequences with reference genotypes.