Cervicovaginal Human Papillomavirus Genomes, Microbiota Composition and Cytokine Concentrations in South African Adolescents

The interaction between cervicovaginal virome, bacteriome and genital inflammation has not been extensively investigated. We assessed the vaginal DNA virome from 33 South African adolescents (15–19 years old) using shotgun DNA sequencing of purified virions. We present analyses of eukaryote-infecting DNA viruses, with a focus on human papillomavirus (HPV) genomes and relate these to the vaginal bacterial microbiota (assessed by 16S rRNA gene sequencing) and cytokines (assessed by Luminex). The DNA virome included single-stranded (Anelloviridae, Genomoviridae) and double-stranded DNA viruses (Adenoviridae, Alloherpesviridae, Herpesviridae, Marseilleviridae, Mimiviridae, Polyomaviridae, Poxviridae). We identified 110 unique, complete HPV genomes within two genera (Alphapapillomavirus and Gammapapillomavirus) representing 40 HPV types and 12 species. Of the 40 HPV types identified, 35 showed positive co-infection patterns with at least one other type, mainly HPV-16. HPV-35, a high-risk genotype currently not targeted by available vaccines, was the most prevalent HPV type identified in this cohort. Bacterial taxa commonly associated with bacterial vaginosis also correlated with the presence of HPV. Bacterial vaginosis, rather than HPV, was associated with increased genital inflammation. This study lays the foundation for future work characterizing the vaginal virome and its role in women’s health.


Introduction
Various eukaryote-infecting DNA viruses have been identified in the genital tracts of generally healthy, asymptomatic women of reproductive age by shotgun metagenomic sequencing, including double-stranded (ds) DNA viruses (families Adenoviridae, Herpesviridae, Papillomaviridae and Polyomaviridae) and single-stranded (ss) DNA viruses (families Anelloviridae) [1,2]. Additionally, novel dsDNA viruses that share sequence similarity to those in the Alloherpesviridae, Iridoviridae, Marseilleviridae, Mimiviridae, Phycodnaviridae and Poxviridae families have been identified in the vaginal samples of pregnant people, and those with reproductive disorders [2][3][4][5].
To date, more than 220 human papillomavirus (HPV) types have been identified, including at least 50 types that preferentially infect the genital mucosa [6]. In the Human Microbiome Project (HMP) that included healthy, asymptomatic women from North America, viruses of the genus Alphapapillomavirus were the most common DNA viruses detected in the lower reproductive tract, with 38% of the participants being infected with at least one Alphapapillomavirus. Longitudinal sampling showed the presence of many HPVs over multiple time points, suggesting that up to 50% of these papillomaviruses established productive infections [1]. Similarly, papillomaviruses were the most frequently detected viruses in women undergoing in vitro fertilization [5] and in women with human immunodeficiency virus (HIV) [7], confirming that papillomaviruses are highly prevalent in the human female genital tract (FGT), irrespective of the population studied.
High-risk HPV types have been well described to cause high-grade cervical intraepithelial neoplasia (CIN) and cancer [8], and more recently, HPV infections have been linked to increased HIV acquisition risk in sub-Saharan African women [9]. Given that young women in Sub-Saharan Africa are at the highest risk of HIV acquisition [10], and vaginal microbiota composition and genital tract inflammation have been linked to HIV risk [11,12], it is important to evaluate the interactions between HPV, cervicovaginal microbiota and inflammation in this age group. Further, although infections with low-risk HPV types do not necessarily result in clinically adverse health outcomes [8], viruses that do not cause obvious disease yet establish chronic infections have been shown to influence immunity in the gut [13]. Conversely, gut microbiota has been suggested to regulate viral infections [14]. Similar processes are likely to occur in the mucosal environment of the FGT, yet the interactions between vaginal virome, bacteriome and immunity have not been well described yet.
Bacterial vaginosis (BV) is a vaginal condition in women of reproductive age where the optimal microbiota dominated by Lactobacillus spp. is replaced by a range of diverse anaerobes [15]. Meta-analyses have shown an association between BV and incident HPV infection and HPV persistence [16]. Similarly, these meta-analyses showed that women with vaginal microbiota dominated by non-Lactobacillus spp. or Lactobacillus iners had up to five times higher odds of being infected with any HPV compared with women with vaginal microbiomes dominated by L. crispatus [17]. In these studies, HPV types were detected using commercially available HPV typing arrays that tested for the presence of several pre-specified types. These HPV typing approaches are limited by the restricted number of HPV genotypes included in commercial arrays.
Here, we used an untargeted approach to characterize eukaryotic DNA viruses in the FGT of South African adolescents to capture the FGT virus landscape in this population. Our emphasis lies on HPV genotypes, including their co-infection patterns and association with cervicovaginal bacteriomes and cytokines.

Study Cohort
The participants of this sub-study were recruited through a parent study, UChoose (clinicaltrials.gov/NCT02404038), which was conducted between 2015-2017 and enrolled cis-gender females aged 15-19 years. None of the participants had received an HPV vaccine, as the roll out of HPV vaccinations in the public sector in South Africa only commenced in 2014 and targeted younger girls [18]. The study was conducted in full compliance with South African Good Clinical Practice (SA-GCP), ICH76 GCP and ICMJE guidelines and approval for the study was obtained from the Human Research Ethics Committee at the University of Cape Town (HREC 801/2014). Eligibility criteria and study design have previously been described in detail [19][20][21]. Briefly, 130 non-pregnant HIV-seronegative adolescent girls were enrolled at the Desmond Tutu Health Foundation (DTHF) Youth Centre in Masiphumelele, Cape Town, South Africa, into a randomized study of injectable hormonal contraception (norethisterone enanthate, NET-EN), combined contraceptive intravaginal ring (CCVR, NuvaRing ® ; MSD Pty Ltd., Johannesburg, South Africa), and combined oral contraceptive pills (COC, Triphasil ® or Nordette ® ). The participants were seen at enrollment, 4 months, and 8 months. For this exploratory viral metagenome analysis, we included single time point samples from 33 participants, with samples from two longitudinal time points included for participants who experienced a change in BV status throughout the study.

Sample Collection
At all study visits, rapid HIV and pregnancy testing were performed, and if positive, the participant was counselled and referred for management, and no further mucosal samples were collected. Detailed interviewer-assisted questionnaires assessing medical/health history, sexual behavior, menstrual cycle, contraceptive use, intravaginal practices, and antibiotic use were completed at all visits. Venous blood was obtained for herpes simplex virus 2 (HSV-2) serological testing. Cervicovaginal secretions were collected using a disposable menstrual cup (Softcup ® ) placed over the cervix for half an hour to measure genital cytokines by Luminex, and a clinician collected a vulvovaginal swab for sexually transmitted infection (STI) testing (Chlamydia trachomatis, Neisseria gonorrhoeae, Trichomonas vaginalis, Mycoplasma genitalium) by multiplex PCR [22], BV testing (Gram staining and Nugent scoring; BV negative (Nugent 0-3), intermediate (Nugent [4][5][6] or positive (Nugent 7-10)), and pH measurement using color-fixed indicator strips (Macherey-Nagel, Düren, Germany). A clinician-collected vaginal lateral wall swab was collected for bacterial 16S rRNA gene sequencing and bacterial and viral metagenomic analysis. No samples were collected during menstruation. Upon arrival at the laboratory, the Softcup ® fluid and vaginal swabs were stored at −80 • C until further processing.

Extraction of Bacterial Nucleic Acids and 16S rRNA Gene Sequencing
DNA extraction followed by amplification, sequencing and bioinformatics analysis of the V4 region of the 16S rRNA gene using 515F and 806R primers from vaginal lateral wall swabs were described previously in detail [20]. Amplicons from 96 samples were pooled in equimolar amounts, and the libraries were sequenced on the Illumina MiSeq platform (300-bp paired-end) with v3 chemistry. The raw 16S rRNA gene amplicon sequences from this study are available at https://www.ebi.ac.uk/ (accessed on 15 December 2022) under project number PRJEB30774.

Extraction of Viral Nucleic Acids
The genital tract virome was determined by shotgun sequencing of an aliquot (450 µL) from the vaginal lateral wall swab buffer solution. Virus-like particles (VLPs) in the vaginal swabs were resuspended by the addition of 1 mL SM buffer, 0.1 M NaCl, 50 mM Tris/HCl (pH 7.4), and 10 mM MgSO4, to the sample. This step was followed by filtering the sample through a 0.2 µm filter (Santorius, Göttingen, Germany) to separate VLPs from bacteria, human cells, and larger particles. Viral DNA was extracted from the resulting filtrate using the High Pure Viral Nucleic acid extraction kit (Roche Life Sciences, Indianapolis, IN, USA) following the manufacturer's protocol. The extracted viral DNA was subjected to rolling circle amplification using the Illustra TM TempliPhi kit (GE Healthcare, Chicago, IL, USA) and stored at −80 • C until library preparation.

Viral Metagenome Sequencing and Bioinformatics Analyses
Library preparation and quantitation were performed by the University of Washington (UW) Northwest Genomics Center (NWGC), and the resulting libraries were sequenced in multiplex mode on a NovaSeq6000 S4 flow cell, generating~30 Gb per sample. The raw reads were trimmed using Trimmomatic 0.39 [23], and then de novo assembled using metaSPAdes 3.12.0 [24] with k = 33, 55, 77. De novo assembled contigs >500 nucleotides in length were identified via alignment against a viral protein RefSeq sequence database (downloaded from NCBI) using BLASTx [25]. Circular contigs were identified in the viral-like contigs by checking for terminal redundancy. All the open reading frames (ORF) were identified using ORFfinder at https://www.ncbi.nlm.nih.gov/orffinder/ (accessed 7 January 2022) and refined and annotated with data from PaVE [26]. The HPV genomes (n = 65) identified via viral metagenome sequencing are deposited in GenBank (OP970964-OP970967; OP971042-OP971102), and raw reads have been deposited in SRA under BioProject PRJNA881266.

Whole Community Metagenome Sequencing and Bioinformatics Analyses
We leveraged previously generated lateral vaginal wall shotgun metagenome sequencing data (i.e., from DNA extracted from swabs without VLP-enrichment, described in Happel et al. [27]) from participants of the same cohort to identify additional complete HPV genomes and report on these here. Briefly, the UW NWGC performed whole metagenome library preparation, quantitation and sequencing of genomic DNA using an Illumina No-vaSeq 6000 S4 flow cell. The bioinformatic analysis has been described before [27], but for this analysis, circular genomes were identified based on terminal redundancy. Those with homology to previously described HPV genotypes (n = 45) have been deposited in GenBank (OP970997-OP971041). Raw metagenomic reads have been deposited in SRA under BioProject PRJNA767784.

Genome Analyses of HPVs and L1 Phylogeny
CenoteTaker2 [28], coupled with manually refined reference genomes from PaVE [26], was used to annotate the 110 HPV genomes. The classification of papillomavirus species is based on major capsid protein (L1) nucleotide sequence similarity [29]. The species demarcation threshold was 70%, with the papilloma-type threshold being 90%. The L1 sequences of classified papillomaviruses were downloaded from PaVE [26], including reference and non-reference sequences, and these sequences, together with the 110 genomes identified (65 genomes identified by viral metagenome sequencing and 45 genomes identified by whole community metagenome sequencing) constituted an L1 sequence dataset. The pairwise identities of L1 sequences were determined using SDT v1.2 [30]. Based on these pairwise identities of the L1 sequences, the HPVs identified in this study were assigned HPV types. Since the pairwise identities of the L1 sequences showed that all 110 HPVs are members of the Alphapapillomavirus or Gammapapillomavirus genus, we assembled a dataset of known L1 sequences of all HPV types in these two genera, as well as those in this study and a subset of those from the genera Betapapillomavirus, Nupapillomavirus, Mupapillomavirus and avian-infecting papillomaviruses. This dataset was translated and aligned using MAFFT v7.113 [31]. The alignment was then used to determine a best-fit amino acid substitution model (LG + I+G + F) using ProtTest 3 [32], and a maximum likelihood phylogenetic tree was inferred using PhyML 3 [33] with approximate likelihoodratio test (aLRT) branch support. The maximum likelihood tree was rooted with the avian papillomavirus L1 sequences and visualized and annotated in iTOL v6 [34].

Statistical Analyses
All downstream analyses were conducted in Rstudio using R version 4.2.2 [35] unless otherwise specified. Study cohort characteristics were described using means, medians, standard deviations, interquartile ranges, and proportions, as appropriate. Differences in study population characteristics during the absence and presence of BV were tested using Fisher's exact test for count measurements, paired Student's t-test for differences in means (parametric data) and paired Wilcoxon signed-rank test for differences in medians (nonparametric data). Co-occurrence patterns of HPV genotypes were determined using the R package cooccur [36]. The Data Integration Analysis for Biomarker discovery using Latent Components (DIABLO) framework, as part of the mixOmics R Bioconductor package [37], was used to integrate the microbial and HPV data. Bacterial taxa that accounted for the highest degree of variance between women with and without HPV were selected via sparse partial least-squares discriminant analysis (sPLS-DA) using centered log ratio (CLR) transformed bacterial relative abundance and binary presence/absence of any HPV. A network including the selected variable was drawn using the network function and visualized using the igraph R package version 1.3.5 [38] and reformatted in Cytoscape [39]. Differences in median cervicovaginal cytokine concentrations between women with any HPV, multiple HPVs, any high-risk or any low-risk HPVs were compared to those without, and between women with a given HPV type versus those without that specific type using unpaired Mann-Whitney U test (nonparametic data). Mixed effect logistic regressions that included participant identification number (PID) as a random variable (since a subset of participants had samples collected at two time points) and adjusted for BV status (since BV influences cytokine concentrations) were run using the R package lme4 [40] to evaluate associations between HPV positivity and concentrations of cervicovaginal cytokines. Adjustment for multiple comparisons was done using a false discovery rate step-down procedure [41]. p values and 95% confidence intervals were used to assess statistical significance.

The Eukaryotic Vaginal DNA Virome in South African Adolescents
The 33 South African adolescents and young women included in the viral metagenome analysis had a median age of 16 years (IQR 16-18 years) with a median age of sexual debut of 15 years (IQR 14-16 years). Most of the sampling was performed when participants were using injectables, oral contraceptive pills or Nuvaring ® as hormonal contraceptives ( Table 1). The prevalence of HSV-2 seropositivity and BV was 30%, and about one-fifth had a bacterial STI or candidiasis. With regards to sexual risk behavior, most women only had one partner, half used a condom during their last sex act, and one-sixth were unsure whether their partner was monogamous. Previous pregnancies were rare (Table 1). About half of the women had a vaginal microbiota dominated by a diverse group of anaerobic bacteria, with the most abundant species being Gardnerella vaginalis followed by Lachnocurva vaginae (formerly BVAB1), Megasphaera, L. iners, Prevotella spp. (including P. amnii, P. timonensis, and P. bivia), Fanyhessa vaginae, Sneathia, and Aerococcus christensenii (community state type IV (CST-IV), Figure S1). The remainder had low-diversity communities dominated by L. crispatus (CST-I) or L. iners (CST-III). For a subset of eight participants that experienced a change in BV status, the vaginal viromes of longitudinal samples were sequenced from visits both with and without BV, 4 months apart. As expected, participants had a more diverse bacterial microbiota (CST-IV) when diagnosed with BV, and most vaginal microbiotas were dominated by Lactobacillus spp. (CST-I or -III) with a lower diversity in the absence of BV. A higher proportion of women were using oral contraceptive pills during episodes of BV, but no other evaluated demographic or biological variable differed by BV status (Table 1). To identify eukaryote-infecting DNA viruses in the vaginal samples, we characterized the viral taxonomy of contigs longer than 500 nucleotides using BLASTx and the NCBI Refseq viral protein database. Sequences from 11 eukaryote-infecting virus families were detected ( Figure 1). We identified ssDNA viruses that share sequence similarities to Anelloviridae and Genomoviridae; and dsDNA viruses that share sequence similarities to Adenoviridae, Alloherpesviridae, Herpesviridae, in the subfamilies Alphaherpesvirinae, Betaherpesvirinae, and Gammaherpesvirinae, Marseilleviridae, Mimiviridae, Polyomaviridae, Poxviridae in the subfamilies Chordopoxvirinae and Entomopoxvirinae. We also observed sequences with similarities to Retroviridae, primarily human endogenous retroviruses. Finally, we identified complete genomes in the viral families Anelloviridae (n = 3), Genomoviridae (n = 1), and Polyomaviridae (n = 2), which have been reported by Jimoh et al. [42]. Betaherpesvirinae, and Gammaherpesvirinae, Marseilleviridae, Mimiviridae, Polyomaviridae, Poxviridae in the subfamilies Chordopoxvirinae and Entomopoxvirinae. We also observed sequences with similarities to Retroviridae, primarily human endogenous retroviruses. Finally, we identified complete genomes in the viral families Anelloviridae (n = 3), Genomoviridae (n = 1), and Polyomaviridae (n = 2), which have been reported by Jimoh et al. [42].

HPV Infection Patterns in South African Adolescents
Among the eukaryote-infecting viruses, we identified multiple viruses in the Papillomaviridae (subfamily Firstpapillomavirinae) family, which are all HPVs. We assembled 65 full genomes from the viral metagenomics dataset and leveraged our total metagenome dataset from a subset of participants from the same cohort [27] to assemble an additional 45 HPV genomes. These 110 unique genomes represent 40 types and 12 species within two genera, i.e., Alphapapillomavirus (n = 104) and Gammapapillomavirus (n = 6) (Table 2; Figure 2).   The overall prevalence of any HPV in this group of women was 60.6% (20/33), with 54.5% (18/33) being infected with high-risk HPV types. Most of the women infected with any HPV were infected with multiple HPV types (80.0%; 16/20). The most frequent HPV types detected were HPV-35 (high-risk), found in six women, followed by HPV-16 and -53 (both high-risk) and HPV-81 and -90 (both low-risk). HPV-16, -53, -81, and -90 were each detected in four women. Interestingly, one woman had HPV-214, which was only recently described in a penile swab from a study conducted in Cape Town, South Africa [43]. HPV types targeted by the bivalent HPV vaccine (HPV-16/-18) were detected in 15.2% (5/33) of women, while those targeted by the quadrivalent vaccine (HPV-6/-11/-16/-18) were detected in 21.2% (7/33) of women; and those targeted by the nonavalent vaccine (HPV-6/-11/-16/-18/-31/-33/-45/-52/-58) were detected in 33.3% (11/33) of women. HPV-11, which is one of the types targeted by the quadrivalent vaccine, was not detected, while all other HPV types targeted in the bivalent, quadrivalent and nonavalent vaccines were found. The most common HPV type identified (HPV-35) is, however, not targeted by any of the commercially available vaccines. to samples where HPV-16 was not present (n = 35; median 1, IQR 0-2; p < 0.0001). Lowrisk HPV-61 and high-risk HPV-69 co-occurred in all samples in which the given genotypes were detected (p = 0.0008). This was also true for the co-occurrence of HPV-39 and -68 (both high-risk, p = 0.0003) and HPV-40 and -53 (both low-risk, p = 0.0003). These data suggest that HPV genotypes frequently co-occur and that the co-occurrence between specific HPV genotypes might be more common than for others.  We next evaluated HPV co-infection patterns using a model that employs combinatorics to determine the probability that the observed frequency of HPV genotype cooccurrence is significantly greater than expected (positive association), significantly less than expected (negative association), or approximately equal to expected (random association) (Figure 3a). Of the 40 HPV genotypes identified in this study, 35 showed positive associations with one or more HPV genotypes. No associations were found between HPV-62, -66, -67, -215, -mSD2 and any other HPV genotype. There was a significant positive co-occurrence between the presence of HPV-16 and twelve other HPV genotypes, including high-risk HPV-18, -33, -53, -56, and -68, and low-risk HPV-30, -40, -43, -44, -54, -61 and -69 (all p < 0.05) (Figure 3a). In agreement, samples where HPV-16 was present (n = 6) had significantly more other HPV types (median 12, interquartile range (IQR) 5-12) compared to samples where HPV-16 was not present (n = 35; median 1, IQR 0-2; p < 0.0001). Low-risk HPV-61 and high-risk HPV-69 co-occurred in all samples in which the given genotypes were detected (p = 0.0008). This was also true for the co-occurrence of HPV-39 and -68 (both high-risk, p = 0.0003) and HPV-40 and -53 (both low-risk, p = 0.0003). These data suggest that HPV genotypes frequently co-occur and that the co-occurrence between specific HPV genotypes might be more common than for others.  HPV42  HPV61  HPV62  HPV81  HPV84  HPV87  HPV89  HPV69  HPV26  HPV51  HPV30  HPV53  HPV56  HPV66  HPV40  HPV58  HPV18  HPV39  HPV45  HPV68  HPV43  HPV91  HPV16  HPV31  HPV33  HPV35  HPV52  HPV67  HPV44  HPV6  HPV74  HPV34  HPV73  HPV54  HPV90  HPV108  HPV214   determined by a probabilistic co-occurrence model [36]. Genotype names are positioned to indicate the columns and rows that represent their pairwise relationships with other species. Blue indicates that the co-occurrence of HPV genotypes is significantly large and greater than expected (statistically significant positive association). This analysis included the HPV genotypes identified in the viral metagenome and whole community sequencing projects. The plot only includes the 35 HPV genotypes that had significant associations with one or more other HPV genotypes. (B) The occurrence of the 39 HPV genotypes identified in the 33 participants collected when women had BV (light blue, n = 19) and when BV was absent (dark blue, n = 22) is shown in a cross-sectional comparison. HPV genotypes are annotated by risk classification and species. This analysis includes samples from the 33 women in the viral metagenome sequencing project.

HPV and the Vaginal Microbiota
Given the previously reported relationship between HPV and the vaginal microbiota [16], we evaluated associations between the presence of HPV and the vaginal microbiota in cross-sectional analyses that included 19 samples collected when women had BV and 22 samples collected during the absence of BV, using an untargeted viral metagenome sequencing approach. Cervicovaginal samples from women with BV had a significantly higher number of HPV genotypes (median 2, IQR 0-6) compared to those without BV (median 0, IQR 0-3, p = 0.031). The proportion of women with high-risk HPV types was not statistically different between women with BV (12/19, 63.2%) compared to those without BV ( We performed sPLSDA to identify bacterial taxa that accounted for the highest degree of variance between women with and without any HPV (Figure 4a). A network analysis that included the statistically significant bacterial taxa selected by sPLSDA showed that the presence of HPV correlated positively with the presence of Coriobacteriaceae, Prevotella and Dialister spp., in addition to other BV-associated species, including Megasphera, Gardnerella vaginalis, Fannyhessea vaginae and Aerococcus christensenii (Figure 4b). In contrast, the absence of HPV infection correlated positively with the presence of L. crispatus, and Corynebacteria (C. tuberculostearicum and C. amycolatum). There were two main clusters of bacteria that best distinguished HPV status: one dominated by L. crispatus (with a lower abundance in women infected with HPV) and one comprised of Coriobacteriaceae, Prevotella spp. (P. buccalis and P. timonensis), Parvimonas micra and Dialister micraerophilus (with a higher abundance in women infected with HPV). Most of the latter clusters were highly abundant in CST-IV in this cohort ( Figure S1) and have previously been associated with BV and genital inflammation in South African women [44,45]. In agreement, most samples assigned to this BV-associated bacteria cluster (16/20, 80%) were from women who had HPV, while fewer women assigned to the L. crispatus cluster had HPV (9/21; 43%; p = 0.0247) (Figure 4c). These data suggest that vaginal microbiota and HPV infection might be associated in this cohort of South African adolescents.
We sought to investigate this relationship in more detail in a subset of eight participants who experienced a change in BV status longitudinally over 4 months (Table 1, Figure 5). This analysis only included samples with a read coverage of >75% of the genome as a cut-off for positive detection of a given HPV type. Using these criteria, four women had more HPV variants present during episodes of BV compared to episodes with no BV, three had fewer, and one participant did not have any HPV in the presence or absence of BV. Women had a median of three HPV variants during a BV episode (IQR 1-8)  between the presence of HPV and some BV-associated taxa, and for some women episodes of BV coincide with the acquisition of new HPV genotypes, this pattern is not universal.  We sought to investigate this relationship in more detail in a subset of eight participants who experienced a change in BV status longitudinally over 4 months (Table 1, Figure 5). This analysis only included samples with a read coverage of >75% of the genome as a cut-off for positive detection of a given HPV type. Using these criteria, four women   BV. Women had a median of three HPV variants during a BV episode (IQR 1-8) and a median of two variants when BV was absent (IQR 0-4; p = 0.638). The same observations were made on genotype (median 2 (IQR 0-4) vs. 3 (IQR 1-4), p = 0.678) and species (median 2 (IQR 0-3) vs. 2 (IQR 1-5), p = 0.674) level. This suggests that while there were associations between the presence of HPV and some BV-associated taxa, and for some women episodes of BV coincide with the acquisition of new HPV genotypes, this pattern is not universal. Figure 5. Evaluating presence of HPV variants in longitudinal samples from participants who experienced a change in BV status. From eight participants included in the viral metagenome sequencing project longitudinal samples were collected 4 months apart, and HPV genotype presence was evaluated in the absence (purple) and presence (blue) of BV. A variant was defined based on a 5% difference (virus operational taxonomic unit, vOTU), and the analysis was limited to vOTUs that had >75% genome coverage.

HPV and Cervicovaginal Cytokine Concentrations
Previously, in a study of South African women at high risk of HIV acquisition, infection with any HPV type was associated with increases in several chemokines,  Figure 5. Evaluating presence of HPV variants in longitudinal samples from participants who experienced a change in BV status. From eight participants included in the viral metagenome sequencing project longitudinal samples were collected 4 months apart, and HPV genotype presence was evaluated in the absence (purple) and presence (blue) of BV. A variant was defined based on a 5% difference (virus operational taxonomic unit, vOTU), and the analysis was limited to vOTUs that had >75% genome coverage.

HPV and Cervicovaginal Cytokine Concentrations
Previously, in a study of South African women at high risk of HIV acquisition, infection with any HPV type was associated with increases in several chemokines, growth/hematopoietic factors, pro-inflammatory, adaptive and regulatory cytokines, including those found to be associated with HIV risk [46]. Similar observations were made in other settings [47,48].
Here, we compared cervicovaginal cytokine levels from women infected with a given HPV genotype to those without that specific genotype in univariate analyses (Figure 6a). Infection with some HPV types (i.e., high-risk HPV-35, -39, and -68) was associated with higher cervicovaginal cytokine concentrations, particularly of IL-17A, IL-17F, TNF-α, IL-25 and IFN-γ. In contrast, infection with HPV genotypes of the species Alphapapillomovirus 6 (i.e., HPV-30) was generally associated with lower cytokines concentrations. In univariate analyses, infection with HPV-35 was associated with a significantly higher concentration of six of the thirteen cytokines measured, including IL-1β (median 131.65 pg/mL (IQR 109.08-214. 31 34-8.89), p = 0.048), with IL-1β and TNF-α remaining significant after adjustment for multiple comparisons (adjust. p = 0.0044 and adjust. p = 0.0054, respectively). As BV has been shown to increase genital inflammation, we adjusted for BV status in mixed effect logistic regressions that included PID as a random variable. In these models, only TNF-α showed a significant association with the presence of HPV-35 (β = 2.53; 95% CI 1.30-1.95, p = 0.048).
To further dissect the relationship between HPV and genital cytokines, we compared cervicovaginal cytokine levels from women according to HPV status, including infections with any HPV, multiple HPVs, high-risk, or low-risk HPVs versus those without any HPV infection (Figure 6b). We found no significant differences in cytokine concentrations in univariate analyses between women with any, multiple or any high-risk or low-risk HPVs compared to those without. We next used mixed effect logistic regressions that included PID as a random variable and adjusted for BV status to evaluate associations between HPV positivity and levels of cervicovaginal cytokines (Table 3). This analysis confirmed that having any HPV, multiple HPVs, any high-risk or any low-risk HPVs was not associated with significantly higher concentrations in any of the cervicovaginal cytokines measured, while having BV was associated with significantly higher IL-1β (β = 1.00; 95% CI 0.30 -1.69; p = 0.006), IL-17F (β = 0.82; 95% CI 0.01-1.64; p = 0.048), IL-25 (β = 0.68; 95% CI 0.03-1.33; p = 0.041), IL-33 (β = 1.34; 95% CI 0.12-2.57; p = 0.032) and TNF-α (β = 1.60; 95% CI 0.52-2.68; p = 0.004). This suggests that the presence of BV has a greater impact on cytokine levels in the lower FGT than the presence of HPV in this group of women. We controlled for BV and included PID as a random variable, as some participants had repeated measurements. p-values after adjustment for multiple comparisons using a false discovery rate step-down procedure [41] are shown.

Discussion
Our exploratory study describes the eukaryote-infecting DNA virome in young women from Sub-Saharan Africa, with an emphasis on HPV genomes. Using a metagenome sequencing approach, we assembled 110 complete HPV genomes across 40 HPV genotypes. We have made these publicly available to expand the current databases.
This untargeted approach for HPV typing allowed us to capture a broader snapshot of the FGT HPV landscape, as we were not limited to HPV types included in commercial arrays. While most HPV types detected are included in currently available commercial assays like the HPV Direct Flow CHIP (Master Diagnóstica, Granada, Spain), more recently described HPV genotypes like HPV-108, -214, -215 or -mSD2 would not have been detected using commercial arrays. The risks associated with infections with these HPV genotypes are still unknown and will be difficult to assess if only commercial arrays are used to conduct HPV typing.
HPV-35 was the most common HPV type detected, with a prevalence of 15% in this small cohort of asymptomatic, young South African women. Studies in Sub-Saharan African populations have reported an HPV-35 prevalence of up to 40% among women with cervical intraepithelial neoplasia (CIN) [49]. HPV-35 is also detected in approximately 10% of cervical cancer cases in African women, while it is detected in 2% of cervical cancer cases worldwide [50]. Despite its high prevalence and risk associated with cancer development in African women, HPV-35 is not targeted by any of the currently available HPV vaccines. HPV co-infections were common, especially with high-risk HPV genotypes. Whether this relates to behavior or biological mechanisms remains unclear. Our cohort included young women, and others have found that co-infections are more common amongst younger women than their older counterparts [51,52], suggesting that the higher frequency of sexual activity of younger women, or the higher number of partners, may lead to the sexual transmission of multiple HPV genotypes. HPV typing and investigation of co-infection patterns on a larger scale across more demographic and age groups would help inform the HPV vaccination campaigns for Sub-Saharan African women. Given the high prevalence of HPV in this age group, alongside high HIV risk [10], and recent observations that infection with any high-risk or low-risk HPV is associated with increased risk of HIV acquisition in Sub-Saharan women [9], these findings not only provide an argument for increased HPV vaccine coverage to prevent HPV-associated cancers but also to potentially reduce HIV incidence in this population. Further, while the current South African guidelines recommend that all women should have a Pap smear at least every 10 years, starting at age 30, this might suggest including younger women in HPV screening programs, in addition to raising awareness around this topic [53].
A meta-analysis has suggested a causal link between vaginal dysbiosis and HPV risk [16]. There was limited evidence for associations between the presence of BV by Nugent scoring and HPV infections in our study, but the sample size is small and limits our power to detect differences. However, in a more detailed analysis of the vaginal microbiota composition, we found that women with HPV had less Lactobacillus spp. and a higher abundance of Prevotella buccalis and P. timonensis, Parvimonas micra and Dialister micraerophilus, which were highly abundant in women with CST-IV in this cohort. This supports findings from a recent study that has described that HPV infection alters the vaginal microbiome through the down-regulation of host mucosal innate peptides used by Lactobacillus spp. as amino acid sources [54]. Larger longitudinal studies are needed to infer causality and describe the directionality in the interplay between HPV and BV. Our findings raise the question of whether the shifting of the vaginal microbiota towards a Lactobacillus-dominant state, e.g., by using live biotherapeutic products, could be leveraged as a treatment strategy for women with HPV. While larger studies in diverse populations are warranted, previous studies have suggested that the administration of Lactobacillus probiotics may favor HPV clearance [55][56][57].
Following the adjusting for BV status, we found that having any HPV type or a high-risk HPV type was not associated with cytokine levels in this small cohort of women. Only infection with HPV-35 was associated with significantly higher concentrations of TNF-α after adjusting for BV status. In contrast, having BV was associated with higher levels of many cytokines that overlap substantially with cytokines associated with HIV risk. While HPV infection has been found to increase the risk of HIV acquisition in Sub-Sharan African women [9], and an increase in genital inflammation, independent of the cause, has been associated with an increased risk of HIV acquisition [11], these data suggest that the increased risk of HIV acquisition amongst women with HPV is not facilitated via an increase in genital inflammation.
The sample size of this exploratory study was small, thereby limiting our ability to make strong conclusions regarding associations between vaginal microbiota, cytokines, and the presence of HPV genotypes. Nonetheless, we analyzed longitudinal, matched data for a subset of participants, which strengthened our approach. While we focused on DNA viruses here, we plan to include RNA viruses in subsequent studies and to assess bacteriophages. The effect of other clinical parameters on the vaginal microbiota, such as diet or vaginal insertion practices, should be considered in future studies to be carried out in different populations. Strengths of this study included using a viral metagenomic sequencing approach to capture the HPV landscape in a more unbiased way and to identify full HPV genomes from Sub-Saharan Africa, which we have made publicly available.

Conclusions
This viral metagenome sequencing study has identified 110 unique HPV genomes from healthy, asymptomatic South African women. Amongst the most common high-risk HPV types were genotypes that are not currently targeted by available vaccines. Describing the HPV landscape helps inform HPV vaccination strategies and should be conducted on a larger scale in Sub-Saharan African women.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/v15030758/s1, Figure S1: Bacterial community state types (CSTs). A bar plot depicting the relative abundance of the most abundant bacteria in the samples of the 33 participants was identified by 16S rRNA microbiome profiling. Samples are grouped by communitystate type (CST) established using soft k-means clustering with weighted UniFrac distances. CST-I was dominated by L. crispatus, CST-III by L. iners, and CST-IV had a diverse range of anaerobic bacteria, including G. vaginalis, BVAB-I and P. amnii.  Informed Consent Statement: Informed consent was obtained from all participants 18 years or older involved in the study, while assent from the participant and informed consent from a parent or legal guardian was obtained for participants younger than 18 years old.
Data Availability Statement: Papillomaviruses sequences have been deposited in GenBank under accessions OP970964-OP971102, and raw reads have been deposited in SRA under BioProjects PRJNA767784 and PRJNA881266.