2. Materials and Methods
We conducted a prospective cross-sectional study that included patients who underwent both cervical screening and histopathologic evaluation at Cuza Voda Clinical Hospital of Obstetrics and Gynecology, Iasi, Romania, between September 2024 and September 2025. Vaginal microbiome profiling was performed as an additional test in the evaluation of cervical dysplasia.
The study was conducted in accordance with the Declaration of Helsinki. The protocol was approved by the local institutional ethics committees (Cuza Voda Clinical Hospital of Obstetrics and Gynecology, 11630/6 September 2024; Grigore T. Popa University of Medicine and Pharmacy Iasi, 480/21 October 2024), and written informed consent was obtained from all participants.
Only samples with concordant cytologic and histopathologic results were included in the analysis. We also included patients with available results for HPV genotyping, who gave their informed consent for participation in the study. Patients were excluded from the study if any of the following applied: absence of a histopathologic diagnosis, incomplete screening test results, prior treatment for cervical intraepithelial neoplasia or cervical cancer, or insufficient clinical data for the variables of interest.
The following clinically relevant data were retrieved from the patients’ medical files: age (years), body mass index (BMI, kg/m²), number of pregnancies, place of residence, history of HPV infection, history of sexually transmitted infections (STIs), smoking, alcohol consumption, hormonal contraceptive use, immunosuppression, HPV vaccination status, and HR-HPV positivity.
Patients were grouped according to histopathological diagnosis:
- Normal (negative for intraepithelial lesion or malignancy, NILM);
- LSIL (low-grade squamous intraepithelial lesion; CIN1, cervical intraepithelial neoplasia grade 1);
- HSIL (high-grade squamous intraepithelial lesion; CIN2–CIN3, cervical intraepithelial neoplasia grades 2 and 3);
- CCU (cervical carcinoma).
The final dataset included 86 samples: Normal (n = 26 patients), LSIL (n = 25 patients), HSIL (n = 25 patients), and CCU (n = 10 patients). A cervical brush (Hologic, Bedford, MA, USA) was used to collect cervical samples for the Pap test and HPV genotyping. The samples were prepared using the ThinPrep liquid-based procedure in accordance with the manufacturer’s instructions (ThinPrep, Hologic, Bedford, MA, USA). All samples were subjected to human papillomavirus detection and genotyping using Allplex™ HPV28 Detection (Seegene Technologies Inc. Europe, Dusseldorf, Germany) in accordance with the manufacturer’s instructions.
Vaginal samples for microbiota analysis were collected using the OMNIgene®•VAGINAL collection device (DNA Genotek, Stittsville, ON, Canada). All samples were collected according to the manufacturer’s instructions and processed uniformly.
Microbial DNA was extracted following standardized protocols recommended by the manufacturer. Library preparation for bacterial profiling was performed using the Oxford Nanopore Technologies (ONT) (Oxford, UK) 16S Barcoding Kit 1–24 (SQK-16S024), which targets the full-length 16S rRNA gene and enables multiplexing of up to 24 samples per sequencing run.
Sequencing libraries were loaded onto R9.4.1 flow cells (FLO-MIN106) and sequenced on the ONT MinION platform. Flow cell priming, library loading, and sequencing were performed following ONT manufacturer recommendations.
Taxonomic classification was performed using 16S rRNA gene-based pipelines, yielding genus-level bacterial profiles for each sample. Only taxa consistently detected above background levels were retained for downstream analyses.
All wet-lab procedures were performed strictly according to the manufacturer’s protocols for the OMNIgene®•VAGINAL device, ONT 16S Barcoding Kit 1–24, and MinION sequencing platform.
Alpha diversity was assessed using observed genus richness and the Shannon diversity index, calculated on relative abundance data. Differences across diagnostic groups were evaluated using the Kruskal–Wallis test, followed by pairwise Mann–Whitney U tests with Benjamini–Hochberg false discovery rate (FDR) correction. Effect sizes were quantified using Cliff’s delta, and effect magnitude was interpreted using established thresholds (negligible, small, medium, large).
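The two alpha diversity metrics can be illustrated with a minimal sketch. The function names and the example counts below are hypothetical, and the use of the natural logarithm for the Shannon index is an assumption, since the logarithm base is not stated above.

```python
import math

def relative_abundance(counts):
    """Convert raw genus counts to proportions that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]

def observed_richness(counts):
    """Number of genera with a non-zero count in the sample."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over genera with p > 0.

    Assumption: natural logarithm; the source does not state the base.
    """
    props = relative_abundance(counts)
    return -sum(p * math.log(p) for p in props if p > 0)

# Hypothetical genus counts, for illustration only (not study data).
lactobacillus_dominated = [990, 5, 5, 0, 0]
polymicrobial = [200, 200, 200, 200, 200]

for name, sample in [("dominated", lactobacillus_dominated),
                     ("polymicrobial", polymicrobial)]:
    print(name, observed_richness(sample), round(shannon_index(sample), 3))
```

A community dominated by a single genus yields a Shannon index close to zero, whereas an even community of k genera yields ln(k), which is the contrast the index is meant to capture.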
Beta diversity was quantified using the Aitchison distance, computed as Euclidean distance in CLR-transformed space. Ordination was performed using principal coordinates analysis (PCoA). Global differences in community composition across diagnostic categories were tested using PERMANOVA with 9999 permutations. Pairwise PERMANOVA comparisons were performed between diagnostic groups, with FDR correction applied to p values. To assess whether observed compositional differences were influenced by heterogeneity in within-group variance, PERMDISP was conducted using the same distance matrix.
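As an illustration of the distance definition, the following sketch computes the centered log-ratio (CLR) transform and the Aitchison distance between two samples. The pseudocount used to handle zero counts is an assumption introduced for this example; the zero-handling strategy is not specified above.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample.

    Assumption: zeros are handled by adding a pseudocount before taking logs.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(counts_a, counts_b, pseudocount=0.5):
    """Euclidean distance between two samples in CLR space."""
    a = clr(counts_a, pseudocount)
    b = clr(counts_b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical counts over the same four genera, for illustration only.
sample_1 = [950, 30, 15, 5]
sample_2 = [100, 400, 300, 200]
print(round(aitchison_distance(sample_1, sample_2), 3))
```

Because CLR values are centered on the geometric mean of each sample, the distance depends only on ratios between genera and is unaffected by sequencing depth when no pseudocount is needed.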
Associations between vaginal microbiome composition and host-related variables (including physical activity, HPV vaccination status, menstrual cycle phase, and additional clinical and behavioral factors) were evaluated using PERMANOVA based on Aitchison distances. Each factor was tested independently using 9999 permutations.
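The logic of a one-factor PERMANOVA can be sketched as follows. This is a simplified pure-Python illustration of the pseudo-F statistic and its permutation p value, not the software used in the study; the function names and the toy data are hypothetical.

```python
import random

def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, pooled over groups.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_g = sum(dist[i][j] ** 2
                   for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += ss_g / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, labels, permutations=9999, seed=0):
    """Return (pseudo-F, permutation p value) under random label shuffling."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical one-dimensional example: two well-separated groups of three.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(p - q) for q in points] for p in points]
f_stat, p_value = permanova(dist, labels, permutations=999)
print(f"pseudo-F = {f_stat:.2f}, permutation p = {p_value:.3f}")
```

With only six samples there are 20 distinct ways to split the labels, so the smallest attainable p value in this toy example is about 0.1; the example illustrates the mechanics only.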
Differential abundance testing was performed using ANCOM-BC2, a bias-corrected compositional method that estimates log-fold changes while accounting for sampling variability and compositional constraints. Global tests and biologically relevant pairwise contrasts were conducted across diagnostic categories. Genera with an FDR-adjusted q value ≤ 0.10 were considered statistically significant. Log-fold change estimates and 95% confidence intervals were visualized using forest plots.
To quantify disease-associated shifts in community structure, two log-ratio indices were computed using CLR-transformed data. The first was a predefined Lactobacillus-to-anaerobe log-ratio, contrasting Lactobacillus against a curated set of obligate anaerobic genera (including Gardnerella, Prevotella, Dialister, and related taxa), representing the ecological transition from Lactobacillus-dominated to dysbiotic states. The second was a data-driven composite log-ratio, constructed by contrasting geometric means of genera consistently increased versus decreased in CCU relative to Normal samples.
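A minimal sketch of the predefined index is given below, assuming that it is computed as the log abundance of Lactobacillus minus the mean log abundance of the anaerobe panel, which equals the difference of the corresponding CLR values because the centering term cancels. The panel shown contains only the genera named above, and the pseudocount and example abundances are hypothetical.

```python
import math

# Only the genera named in the text; the full curated panel is not listed.
ANAEROBE_PANEL = ["Gardnerella", "Prevotella", "Dialister"]

def lactobacillus_anaerobe_log_ratio(abundances, anaerobes=ANAEROBE_PANEL,
                                     pseudocount=1e-6):
    """log(Lactobacillus) minus the mean log abundance of the anaerobe panel.

    Assumption: a small pseudocount replaces zeros before taking logs.
    """
    lacto = math.log(abundances.get("Lactobacillus", 0.0) + pseudocount)
    panel = [math.log(abundances.get(g, 0.0) + pseudocount) for g in anaerobes]
    return lacto - sum(panel) / len(panel)

# Hypothetical relative abundances, for illustration only.
dominated = {"Lactobacillus": 0.97, "Gardnerella": 0.01,
             "Prevotella": 0.01, "Dialister": 0.01}
dysbiotic = {"Lactobacillus": 0.05, "Gardnerella": 0.40,
             "Prevotella": 0.35, "Dialister": 0.20}

print(round(lactobacillus_anaerobe_log_ratio(dominated), 2))
print(round(lactobacillus_anaerobe_log_ratio(dysbiotic), 2))
```

Positive values indicate that Lactobacillus exceeds the geometric mean of the panel, and values near or below zero indicate an anaerobe-rich community.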
Log-ratio values were compared across diagnostic groups using Kruskal–Wallis tests followed by pairwise Mann–Whitney U tests with FDR correction. Effect sizes were quantified using Cliff’s delta. Diagnostic performance was evaluated using receiver operating characteristic (ROC) curves.
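Cliff’s delta and the area under the ROC curve are closely related, as the following sketch shows. The magnitude cut-offs (0.147, 0.33, 0.474) are commonly cited values and are an assumption here, because the thresholds applied in the study are not listed; the example values are hypothetical.

```python
def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))

def auc_from_scores(positives, negatives):
    """ROC AUC as the probability that a positive outranks a negative."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def magnitude(delta):
    """Label the effect size; cut-offs are assumed, not taken from the source."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"

# Hypothetical log-ratio values for two groups, for illustration only.
group_a = [5.1, 4.4, 6.3, 4.9]
group_b = [0.5, 1.2, -0.6, 4.6]
delta = cliffs_delta(group_a, group_b)
print(delta, magnitude(delta), auc_from_scores(group_a, group_b))
```

For any two groups, delta equals 2 × AUC − 1 when the first group is treated as the positive class, so the two quantities convey the same ranking information on different scales.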
Genus–genus associations were estimated using Spearman rank correlations computed on CLR-transformed abundances. Correlations involving constant vectors or undefined coefficients were excluded.
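The correlation step can be sketched as follows, assuming tied values receive average ranks and that a constant vector yields an undefined coefficient, which is returned as None so that the pair can be excluded. The helper names and example vectors are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return None  # constant vector: coefficient undefined, pair excluded
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical CLR values for two genera across five samples.
genus_a = [-1.2, 0.3, 0.9, 2.1, 2.8]
genus_b = [-0.8, 0.1, 1.5, 1.4, 3.0]
print(round(spearman_rho(genus_a, genus_b), 3))
```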
Microbial co-occurrence networks were inferred separately for each diagnostic group using samples with available histopathological diagnoses. To ensure network stability and reduce spurious associations, taxa were filtered within each group prior to network construction. Genera were retained if they met the following criteria:
- Prevalence ≥ 30% of samples within the group (≥ 40% for CCU due to smaller sample size);
- Non-zero variance across samples.
From the genera passing these criteria, a maximum of 20 genera with the highest total abundance within each group were selected.
Pairwise Spearman correlations were computed on CLR-transformed data. p values were adjusted using the Benjamini–Hochberg FDR procedure. Edges were retained if they met both statistical and strength criteria:
- |ρ| ≥ 0.60 and q < 0.05 for Normal, LSIL, and HSIL groups;
- |ρ| ≥ 0.70 and q < 0.05 for CCU.
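The edge-selection rule can be sketched as a Benjamini–Hochberg adjustment followed by a joint filter on correlation strength and q value. The function names, genus labels, and correlation values below are hypothetical and serve only to show how the two thresholds combine.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values (q values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonic q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def select_edges(correlations, rho_min, q_max=0.05):
    """Keep pairs with |rho| >= rho_min and q < q_max.

    correlations: list of (genus_a, genus_b, rho, p) tuples.
    """
    q_values = benjamini_hochberg([c[3] for c in correlations])
    return [(a, b, rho, q)
            for (a, b, rho, _), q in zip(correlations, q_values)
            if abs(rho) >= rho_min and q < q_max]

# Hypothetical pairwise results, for illustration only.
pairs = [("genus_1", "genus_2", 0.80, 0.001),
         ("genus_1", "genus_3", 0.65, 0.002),
         ("genus_2", "genus_3", -0.75, 0.200)]

print(select_edges(pairs, rho_min=0.60))  # threshold for Normal, LSIL, HSIL
print(select_edges(pairs, rho_min=0.70))  # stricter threshold for CCU
```

Raising the correlation threshold for the smallest group trades sensitivity for stability, because rank correlations estimated from few samples vary more widely.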
Undirected weighted networks were constructed using NetworkX 3.6.1, with nodes representing genera and edges weighted by Spearman’s ρ.
Network structure was characterized using standard graph metrics, including the following:
- Number of nodes (genera retained);
- Number of edges (significant correlations);
- Network density;
- Number of connected components.
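The four metrics can be illustrated with a dependency-free sketch. It deliberately uses plain Python rather than NetworkX, where `nx.density` and `nx.number_connected_components` provide the same quantities. The example uses the node and edge counts reported for the LSIL network in Section 3.8 (15 genera, three edges); the placeholder names for the genera without edges are hypothetical.

```python
def network_summary(nodes, edges):
    """Node count, edge count, density, and number of connected components."""
    n = len(nodes)
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)

    components = len({find(v) for v in nodes})
    # Density of an undirected simple graph: 2E / (N * (N - 1)).
    density = 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": len(edges),
            "density": density, "components": components}

# LSIL network as reported in Section 3.8; isolated genera are placeholders.
lsil_edges = [("Dialister", "Anaerococcus"),
              ("Dialister", "Prevotella"),
              ("Escherichia", "Shigella")]
named = ["Dialister", "Anaerococcus", "Prevotella", "Escherichia", "Shigella"]
lsil_nodes = named + [f"other_{i}" for i in range(1, 11)]

summary = network_summary(lsil_nodes, lsil_edges)
print(summary["nodes"], summary["edges"],
      round(summary["density"], 4), summary["components"])
```

Isolated genera count as their own components, which is why a sparse network with many retained genera shows both a low density and a high component count.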
All analyses were conducted in Python 3.12 and Stata 19.5 (StataCorp LLC, College Station, TX, USA). A p value < 0.05 was considered statistically significant.
3. Results
3.1. Baseline Characteristics of the Included Patients
The final cohort included 86 patients, and their clinical characteristics are presented in Table 1. Women diagnosed with LSIL and HSIL were younger on average (35.40 ± 8.86 years and 37.56 ± 9.88 years, respectively) compared with those in the normal group (42.35 ± 9.40 years, p = 0.0109). The highest mean age was observed in the CCU group (45.10 ± 8.80 years), suggesting a trend toward increasing age with disease severity and progression from precursor lesions to invasive cervical cancer.
HR-HPV positivity increased progressively with lesion severity, being present in 46.15% of women with normal cytology, 88.00% of those with LSIL, and 100% of women with HSIL and CCU (p < 0.001). Vaccination coverage was highest among women with LSIL (40.00%) and HSIL (28.00%), while no vaccinated individuals were identified in the CCU group (p = 0.049).
3.2. Microbial Diversity Association with Lesion Severity
Cervicovaginal microbial communities exhibited a progressive restructuring across the spectrum of cervical disease. Normal samples displayed low richness (median 6.0, Q1–Q3 4.0–14.0) and Shannon diversity (median 0.08, Q1–Q3 0.01–0.45), reflecting low-complexity ecosystems (Table 2; Figure 1 and Figure 2).
LSIL and HSIL samples showed intermediate diversity (LSIL richness 10.0 [2.5–20.0], Shannon 0.23 [0.03–0.81]; HSIL richness 6.0 [4.0–10.3], Shannon 0.21 [0.01–0.78]), indicating partial destabilization of these communities. Cervical cancer samples exhibited markedly higher richness (15.5 [8.5–20.5]) and Shannon index (1.06 [0.84–1.78]) (Table 2; Figure 1 and Figure 2).
Table 2. Alpha diversity metrics across diagnostic groups.
| Diagnosis | Richness (Median) | Richness (Q1–Q3) | Shannon Index (Median) | Shannon Index (Q1–Q3) |
|---|---|---|---|---|
| Normal | 6.0 | 4.0–14.0 | 0.08 | 0.01–0.45 |
| LSIL | 10.0 | 2.5–20.0 | 0.23 | 0.03–0.81 |
| HSIL | 6.0 | 4.0–10.3 | 0.21 | 0.01–0.78 |
| CCU | 15.5 | 8.5–20.5 | 1.06 | 0.84–1.78 |
Pairwise comparisons revealed that cervical cancer samples had significantly higher Shannon diversity than all non-cancer groups and significantly higher richness than the normal and HSIL groups. Specifically, cervical cancer samples versus normal samples showed a Shannon index difference with FDR-adjusted p = 0.000028 and a large effect size (Cliff’s delta = −0.90), and richness was also significantly higher (FDR-adjusted p = 0.0471, Cliff’s delta = −0.53) (Table 3 and Table 4). Compared with LSIL, cervical cancer samples exhibited elevated Shannon diversity (FDR-adjusted p = 0.0068, Cliff’s delta = −0.67) and a smaller, non-significant increase in richness (FDR-adjusted p = 0.427, Cliff’s delta = −0.21). Relative to HSIL, cervical cancer samples displayed higher Shannon diversity (FDR-adjusted p = 0.0068, Cliff’s delta = −0.65) and richness (FDR-adjusted p = 0.0471, Cliff’s delta = −0.54).
On the other hand, differences among non-cancer samples were modest or negligible: normal versus LSIL (Shannon FDR = 0.375, Cliff’s delta = −0.18; richness FDR = 0.427, delta = −0.19), normal versus HSIL (Shannon FDR = 0.375, delta = −0.17; richness FDR = 0.740, delta = −0.06), and LSIL versus HSIL (Shannon FDR = 0.992, delta = −0.00; richness FDR = 0.427, delta = 0.17).
3.3. Community Composition Differs by Disease Severity
Multivariate analyses revealed that cervicovaginal microbial community composition diverges progressively across cervical disease severity. Global PERMANOVA based on Aitchison distances confirmed a strong effect of diagnosis on microbiome structure (pseudo-F = 2.43, p = 0.0006; 9999 permutations; n = 82) (Table 5), indicating that diagnostic category accounts for a significant proportion of compositional variance.
Pairwise PERMANOVA comparisons highlighted that the largest shifts in community structure were associated with cervical cancer samples. Specifically, normal versus cervical cancer samples exhibited the strongest separation (FDR-adjusted p = 0.0006), followed by significant differences between HSIL and cervical cancer samples (FDR-adjusted p = 0.0296) and LSIL and cervical cancer samples (FDR-adjusted p = 0.0377) (Table 6).
In contrast, differences between intermediate lesion stages were smaller and, in some cases, not statistically significant, such as LSIL versus HSIL (FDR-adjusted p = 0.2228).
Normal samples also differed significantly from HSIL (FDR-adjusted p = 0.0138), indicating that compositional changes begin early but intensify as lesions progress (Table 6).
The PCoA scatter plot (Figure 3) visualizes these compositional differences. Samples cluster broadly by diagnostic category, with normal and LSIL samples forming a relatively tight cluster near the origin, HSIL samples slightly more dispersed, and cervical cancer samples spreading further along both PCoA1 and PCoA2 axes, reflecting their higher dissimilarity.
Table 7 presents the mean Aitchison distances between vaginal microbiome profiles across diagnostic categories. The largest distances were observed between cervical cancer samples and all other groups (Normal: 14.98, LSIL: 15.98, HSIL: 14.88), indicating that cervical cancer microbiomes are highly divergent from non-cancer microbiomes.
In contrast, the distances among non-cancer groups were smaller (Normal–LSIL: 10.74, Normal–HSIL: 9.91, LSIL–HSIL: 11.83), suggesting smaller compositional differences between normal samples and intermediate lesion stages.
3.4. Increasing Within-Group Heterogeneity with Progression
To determine whether variation in within-group dispersion influenced these patterns, PERMDISP analyses were performed. Dispersion differed significantly across diagnostic categories (F = 4.97, p = 0.0034). Mean dispersion was lowest in Normal (5.86), higher in LSIL (8.69) and HSIL (7.35), and highest in cervical cancer samples (11.42), suggesting that microbial communities become more heterogeneous with lesion severity, although the increase was not strictly monotonic (Table 8).
The convex hull PCoA plot (Figure 4) further illustrates group dispersion. The hulls for normal, LSIL, and HSIL largely overlap, reflecting their moderate similarity, while the convex hull for cervical cancer samples extends away from other groups, confirming both their compositional divergence and higher intra-group variability.
3.5. Host Factors Associated with Microbiome Composition
Table 9 summarizes PERMANOVA analyses assessing the association between host factors and vaginal microbiome composition. Among the factors tested, only a few were significantly associated with microbiome variation. Physical activity showed a significant effect (pseudo-F = 1.836, p = 0.007), suggesting that activity levels influence overall community structure. HPV vaccination status also had a significant effect (pseudo-F = 2.594, p = 0.014), indicating that vaccinated and unvaccinated individuals harbor distinct microbial communities. Additionally, current menstrual cycle phase was modestly significant (pseudo-F = 1.670, p = 0.046), implying that hormonal fluctuations may contribute to microbiome variation.
3.6. Loss of Lactobacillus Dominance and Anaerobe Enrichment Across Lesion Severity
Individual-level stacked bar plots revealed marked shifts in the taxonomic structure of the cervicovaginal microbiome across diagnostic categories (Figure 5 and Figure 6). Normal samples were uniformly dominated by Lactobacillus, with only minor contributions from other genera. In LSIL and HSIL, this structure became progressively more heterogeneous, with increasing representation of anaerobic taxa such as Prevotella, Peptostreptococcus, Dialister, and Anaerococcus. Cervical cancer samples exhibited the most pronounced restructuring: Lactobacillus dominance was lost in nearly all patients and replaced by highly diverse, polymicrobial communities enriched in inflammatory anaerobes (Prevotella, Peptoniphilus, Fannyhessea, Finegoldia, Fusobacterium).
Normal samples were dominated by Lactobacillus, which accounted for a median relative abundance of 98.77% and was detected in 100% of samples. Although Lactobacillus remained abundant and prevalent in LSIL (median relative abundance 96.33%, prevalence 95.65%) and HSIL (median relative abundance 96.08%, prevalence 91.67%), its dominance progressively weakened, as reflected by declining mean relative abundance (LSIL 83.48%, HSIL 66.88%) and increasing representation of non-Lactobacillus taxa.
In contrast, cervical cancer samples exhibited a marked loss of Lactobacillus dominance, with a median relative abundance of only 5.02% despite persistence in 80% of samples (Table 10). This loss of Lactobacillus was accompanied by increased abundance and prevalence of anaerobic and facultative anaerobic genera. Prevotella (median 1.39%, prevalence 70%), Anaerococcus (median 0.88%, prevalence 70%), Peptoniphilus (median 0.66%, prevalence 70%), Fannyhessea (median 0.00%, prevalence 10%), Finegoldia (median 0.07%, prevalence 60%), and Fusobacterium (median 0.00%, prevalence 30%) were substantially enriched in cervical cancer samples, showing both higher mean relative abundance and increased prevalence compared with non-cancer groups. Several of these genera were present at low median abundance but high prevalence, indicating widespread low-level colonization rather than dominance by single taxa (Table 10).
Intermediate lesion categories (LSIL and HSIL) displayed transitional microbial profiles, characterized by partial retention of Lactobacillus dominance alongside changes in the prevalence of anaerobic genera. For example, Anaerococcus prevalence increased from 28% in normal samples to 60.87% in LSIL and 33.33% in HSIL, while Dialister prevalence rose from 48% in normal samples to 52.17% in LSIL. Peptoniphilus prevalence was 40% in normal samples, 43.48% in LSIL, and 33.33% in HSIL. Taken together, these patterns are more consistent with a gradual ecological shift than with an abrupt compositional change (Table 10).
3.7. Differential Abundance and Lactobacillus-to-Anaerobe Compositional Shifts Across Cervical Disease Severity
Global and pairwise analyses using ANCOM-BC2 (Table 11) corroborated the predominant role of Lactobacillus in driving microbial differences across diagnostic categories. The global test indicated a trend for Lactobacillus depletion across disease stages (p = 0.0038, FDR q = 0.369), although this did not reach statistical significance after correction for multiple testing. Pairwise comparisons further demonstrated consistent reductions in Lactobacillus abundance with disease severity: logFC = −5.47 in cervical cancer versus normal, −4.49 in cervical cancer versus LSIL, −1.38 in HSIL versus LSIL, and −0.98 in LSIL versus normal. Prevotella exhibited increased abundance in cervical cancer versus normal (logFC = 2.76, q = 0.160) and versus LSIL (logFC = 1.47, q = 0.929), but decreased modestly in HSIL versus LSIL (logFC = −1.22, q = 0.422).
Dialister and Staphylococcus showed more subtle and inconsistent patterns across diagnostic categories. For Dialister, the log fold changes were −0.24 for cervical cancer versus normal (q = 0.409), −0.98 for cervical cancer versus LSIL (q = 0.929), −1.17 for HSIL versus LSIL (q = 0.422), and 0.74 for LSIL versus normal (q = 0.657). Staphylococcus exhibited logFC values of −0.26 in cervical cancer versus normal (q = 0.391), −0.85 in cervical cancer versus LSIL (q = 0.929), −0.82 in HSIL versus LSIL (q = 0.479), and 0.58 in LSIL versus normal (q = 0.657). These data indicate that, unlike Lactobacillus, both Dialister and Staphylococcus show minor, non-significant fluctuations across lesion severity without consistent directional trends.
To capture community-wide compositional shifts, log-ratio analyses were performed. A predefined ratio contrasting Lactobacillus against a panel of anaerobic genera showed a strong monotonic decline, reaching its lowest values in cervical cancer samples (Figure 7).
Analysis of the log-ratio contrasting Lactobacillus abundance against anaerobic genera revealed significant differences across cervical disease categories (Kruskal–Wallis H = 18.69, p = 0.0003; Table 12). Normal samples exhibited high median log-ratio values (5.06, Q1–Q3: 4.41–6.36), consistent with strong Lactobacillus dominance and low relative abundance of anaerobes.
Intermediate lesions displayed partially destabilized communities, with LSIL samples showing a median log-ratio of 3.57 (Q1–Q3: 3.05–5.22) and HSIL samples a median of 4.34 (Q1–Q3: 0.21–5.07), reflecting heterogeneous microbial states. Cervical cancer samples demonstrated markedly reduced log-ratio values (median 0.51, Q1–Q3: −0.63–1.26), indicating a pronounced shift toward anaerobe-rich microbiomes and loss of Lactobacillus dominance.
Pairwise comparisons (Table 13) confirmed that the largest differences were observed in comparisons involving cervical cancer. Log-ratio values differed significantly between Normal and cervical cancer (p = 8.7 × 10⁻⁵, FDR q = 5.2 × 10⁻⁴) and between LSIL and cervical cancer (p = 0.0040, FDR q = 0.0120), both with large effect sizes (Cliff’s delta = 0.86 and 0.64, respectively). Differences between Normal and LSIL (p = 0.0149, q = 0.0223) and Normal and HSIL (p = 0.0135, q = 0.0223) were of medium magnitude (Cliff’s delta = 0.41). No meaningful difference was detected between LSIL and HSIL (p = 0.710, q = 0.710, Cliff’s delta = 0.07), indicating that early lesion categories exhibit only partial shifts in microbial composition. Comparisons of HSIL versus cervical cancer showed a medium effect size (Cliff’s delta = 0.40) but did not reach statistical significance after FDR correction (q = 0.0871).
To ensure consistency with non-parametric approaches, the same log-ratio was evaluated using Mann–Whitney tests (Table 14). Significant differences were observed primarily in comparisons involving cervical cancer, including normal versus cervical cancer (p = 0.00010, q = 0.00060) and LSIL versus cervical cancer (p = 0.00107, q = 0.00643). Differences between HSIL and cervical cancer were nominally significant (p = 0.0120, q = 0.0717), whereas comparisons among non-cancer categories (Normal vs. LSIL, Normal vs. HSIL, LSIL vs. HSIL) were non-significant after FDR adjustment.
3.8. Co-Occurrence Patterns Within Diagnostic Groups
Analysis of genus–genus co-occurrence patterns within each diagnostic category revealed strong, statistically significant correlations among specific bacterial taxa (Table 15 and Figure 8). In normal samples, the strongest positive correlations were observed between Peptostreptococcus and Escherichia (Spearman’s ρ = 0.92, p < 0.001), Veillonella and Shigella (ρ = 0.80, p < 0.001), and Prevotella with Hoylesella (ρ = 0.77, p < 0.001).
Additional notable associations included Anaerococcus with Hoylesella (ρ = 0.76, p < 0.001) and with Peptoniphilus (ρ = 0.74, p < 0.001), reflecting coordinated presence of anaerobic and facultative anaerobic taxa within the healthy cervicovaginal microbiome.
In LSIL samples, strong correlations persisted among facultative anaerobes and anaerobic genera, including Shigella and Escherichia (ρ = 0.78, p < 0.001), Dialister with Anaerococcus (ρ = 0.71, p < 0.001) and with Prevotella (ρ = 0.69, p < 0.001), as well as Peptostreptococcus with Fusobacterium (ρ = 0.60, p < 0.01) and Prevotella with Anaerococcus (ρ = 0.60, p < 0.01). These results indicate that early lesion stages are characterized by moderate co-occurrence among anaerobic and facultative anaerobic taxa, consistent with partial destabilization of the microbiome (Table 15 and Figure 9).
In HSIL samples, correlations were generally stronger, with Shigella and Escherichia remaining highly correlated (ρ = 0.87, p < 0.001). Other strong associations included Fusobacterium with Campylobacter (ρ = 0.80, p < 0.001), Peptoniphilus with Finegoldia (ρ = 0.74, p < 0.001), and Anaerococcus with Campylobacter (ρ = 0.74, p < 0.001), reflecting emerging co-occurrence networks among pathogenic anaerobes as lesion severity increases (Table 15 and Figure 10).
In cervical cancer samples, genus–genus correlations reached the highest magnitudes. Perfect or near-perfect correlations were observed between Shigella and Escherichia (ρ = 1.00, p < 0.001) and Dialister with Hoylesella (ρ = 0.99, p < 0.001). Other strong associations included Veillonella with Pseudomonas (ρ = 0.94, p < 0.001), Anaerococcus with Peptoniphilus (ρ = 0.93, p < 0.001), and Ureaplasma with Staphylococcus (ρ = 0.87, p < 0.01). These results indicate that advanced disease is associated with highly structured co-occurrence networks among anaerobic and facultative anaerobic genera, reflecting a more deterministic, polymicrobial community state in invasive carcinoma (Table 15 and Figure 11).
Correlation network analysis revealed marked differences in microbial community structure across disease stages (Table 16). In the normal group, the network was sparse, with only seven genera retained after prevalence filtering and a single significant association detected between Dialister and Peptoniphilus (Spearman’s ρ = 0.686, q = 0.0032). The low network density (0.0476) and high number of disconnected components (six components) indicated a weakly interacting microbial community, consistent with a stable ecosystem.
In contrast, the LSIL group exhibited a substantial increase in network complexity, with fifteen genera retained and three strong positive associations. Notably, Dialister emerged as a central node, showing significant correlations with both Anaerococcus (ρ = 0.708, q = 0.0084) and Prevotella (ρ = 0.689, q = 0.0098), while a strong association between Escherichia and Shigella (ρ = 0.776, q = 0.0014) indicated coordinated expansion of facultative pathobionts. Despite this increased connectivity, the LSIL network remained fragmented, with a low density (0.0286) and a high number of disconnected components (12 components), suggesting heterogeneous microbial configurations characteristic of an early dysbiotic transition rather than a fully consolidated community state.
The HSIL group displayed an intermediate network structure, characterized by ten retained genera and two significant edges, representing a reduction in both node and edge counts relative to LSIL. However, the remaining associations were strong and predominantly involved anaerobic taxa, including Finegoldia–Peptoniphilus (ρ = 0.740, q = 0.0016) and Dialister–Peptoniphilus (ρ = 0.687, q = 0.0047). Network density (0.0444) suggested consolidation around a smaller number of tightly co-occurring anaerobic consortia, potentially reflecting increasing environmental constraints associated with lesion progression.
The cervical cancer samples showed the highest degree of network connectivity despite the smallest sample size, with eleven genera retained and four significant edges. This group exhibited the highest network density (0.0727) across all diagnostic categories. Strong positive correlations were observed among several anaerobic genera, including Dialister–Hoylesella (ρ = 0.988, q = 5.1 × 10⁻⁶) and Anaerococcus–Peptoniphilus (ρ = 0.927, q = 0.0031). In addition, a strong negative association between Anaerococcus and Pseudomonas (ρ = −0.855, q = 0.0225) was consistent with competitive exclusion within the cancer-associated microbiome.
4. Discussion
In this study, we comprehensively characterized the dynamics of the cervicovaginal microbiome across the spectrum of cervical disease severity, from normal cytology to intraepithelial lesions and invasive cervical carcinoma. Our results revealed a progressive, coherent, and multifaceted restructuring of the microbial community that is significantly associated with disease progression, loss of Lactobacillus dominance, and the emergence of complex polymicrobial anaerobic communities.
A central finding of our analyses is that the most pronounced differences in microbial diversity and composition are concentrated in cervical cancer, whereas transitions among non-cancer states (Normal, LSIL, HSIL) are comparatively subtle. Both alpha diversity metrics (richness and Shannon index) and multivariate analyses (PERMANOVA, PCoA, Aitchison distances) consistently demonstrate a clear ecological break between cancer and all other diagnostic categories. This pattern suggests that severe dysbiosis is not an early event in cervical disease but rather a defining feature of invasive carcinoma, likely reflecting profound alterations in the local cervical microenvironment accompanying malignant transformation.
Quantitative analyses across multiple studies consistently showed that HPV infection, cervical intraepithelial neoplasia, and cervical cancer were associated with increased microbial richness and higher Shannon diversity, together with distinct beta-diversity patterns. A large meta-analysis of 507 cervical samples showed significantly higher Shannon diversity and evenness in CIN and cervical cancer compared with normal controls, with a clear increasing trend across normal controls, HPV infection, CIN, and cancer, although differences between CIN and cancer were not significant [18]. The same meta-analysis showed that cervical cancer was characterized by enrichment of opportunistic pathogenic taxa, including Streptococcus, Fusobacterium, Pseudomonas, and Anaerococcus, alongside a marked depletion of Lactobacillus compared with normal controls. On the other hand, the CIN group exhibited significantly increased relative abundances of Gardnerella, Sneathia, Pseudomonas, and Fannyhessea relative to other bacterial taxa [18].
Consistently, a cross-sectional study that included a large HPV-positive cohort (n = 692 patients) demonstrated significantly greater diversity in high-grade CIN compared with lower-grade lesions using Shannon-based indices [19]. The authors also showed that high-grade CIN was associated with coordinated downregulation of multiple metabolic and regulatory pathways, including the phosphotransferase system, transcription-related functions, fructose and mannose metabolism, amino sugar and nucleotide sugar metabolism, and galactose metabolism. Also, CIN was characterized by a distinct vaginal microbiome configuration marked by depletion of Lactobacillus and Pseudomonas and concomitant enrichment of Gardnerella, Prevotella, and Dialister [19].
Similarly, another cohort of HPV-positive patients reported an increase in mean Shannon diversity from 1.06 in HPV-negative patients to 2.23 in HPV-positive patients (p = 0.002), with values rising across normal cytology, CIN, and cancer, even though not all histology-specific comparisons reached statistical significance. Moreover, HPV-negative normal samples clustered distinctly from CIN and cancer cases in ordination space, indicating fundamentally different community structures [20]. A longitudinal CIN progression study including controls, LSIL, HSIL, and invasive cervical cancer further confirmed increasing Shannon and Simpson diversity with lesion severity in parallel with progressive loss of Lactobacillus dominance [21].
A systematic review and meta-analysis reinforced these findings, reporting significantly higher richness and Shannon diversity in vaginal samples from cervical cancer cases compared with controls, as well as higher Shannon diversity in cervical samples, although richness measures were less consistent across sample types [22].
Beta-diversity and multivariate analyses further supported the presence of disease-associated shifts in overall community composition [
18]. Recent literature data incorporating compositionality-aware methods, such as Aitchison distances and robust compositional models, largely confirmed significant class separation after adjusting for study-specific effects, further strengthening evidence for consistent, disease-associated restructuring of the cervicovaginal microbiome [
23,
24].
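For readers unfamiliar with the Aitchison distance mentioned above, a minimal sketch follows. It is the Euclidean distance between centered log-ratio (CLR) transformed compositions. The pseudocount of 0.5 used to handle zero counts is an assumption made for illustration; the cited studies may handle zeros differently.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(a, b, pseudocount=0.5):
    """Euclidean distance between the CLR-transformed vectors a and b."""
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa in the same order")
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))

# Hypothetical counts for the same three taxa in two samples.
sample_a = [900, 60, 40]
sample_b = [200, 450, 350]
print(aitchison_distance(sample_a, sample_b))
```

Without a pseudocount, the distance is unchanged when a sample is multiplied by a constant, which is why it is not affected by sequencing depth in the way distances on raw counts are.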
Contrary to a model of gradual, linear microbial deterioration across lesion stages, our data indicate that LSIL and HSIL represent intermediate and unstable states characterized by increased heterogeneity rather than uniform shifts in diversity or community structure. This is supported by the small or negligible effect sizes observed in pairwise comparisons among Normal, LSIL, and HSIL groups for both alpha diversity and overall composition. Thus, microbiome alterations appear to accumulate gradually but manifest abruptly upon transition to invasive cancer.
The marked increase in alpha diversity observed in cervical cancer, together with higher within-group dispersion, points to a loss of community-level constraint and the emergence of more permissive and less stable microbial assemblages. Anaerobic growth in cervical cancer most likely results from the interplay of microbial ecology, inflammation, and persistent HPV infection. Immune dysregulation and HPV-induced epithelial disruption decrease colonization resistance, while
Lactobacillus depletion raises pH and lessens ecological limitations, which promotes anaerobic expansion [
25,
26]. In turn, inflammatory metabolites produced by anaerobes may enhance viral persistence and alter the local milieu in ways that facilitate cancer growth [
27,
28].
A key feature of disease progression identified in this study is the progressive loss of Lactobacillus dominance. While Lactobacillus remained prevalent in most non-cancer samples, its relative abundance declined in parallel with increasing lesion severity, accompanied by expansion of anaerobic taxa. Compositional log-ratio analyses, which are robust to the constraints of relative abundance data, revealed a strong monotonic decrease in the Lactobacillus-to-anaerobe ratio, with the most pronounced differences involving cervical cancer. These findings support the notion that an imbalance between lactic acid-producing bacteria and anaerobic, pro-inflammatory taxa is a feature of disease progression.
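One simple way to express a Lactobacillus-to-anaerobe balance as a log-ratio is sketched below. This is an illustrative assumption and not necessarily the index used in this study. The anaerobe panel, the pseudocount, and the function name are hypothetical.

```python
import math

# Hypothetical panel of anaerobic genera (illustrative, not the study's list).
ANAEROBES = ("Gardnerella", "Prevotella", "Sneathia", "Dialister")

def lactobacillus_anaerobe_log_ratio(sample, anaerobes=ANAEROBES, pseudocount=0.5):
    """Natural log of Lactobacillus reads over summed anaerobe reads.

    `sample` maps genus name to read count; the pseudocount avoids log(0).
    """
    lacto = sample.get("Lactobacillus", 0) + pseudocount
    anaerobe_total = sum(sample.get(g, 0) for g in anaerobes) + pseudocount
    return math.log(lacto / anaerobe_total)

# Hypothetical samples (not study data).
normal_like = {"Lactobacillus": 900, "Gardnerella": 50, "Prevotella": 50}
cancer_like = {"Lactobacillus": 20, "Prevotella": 500, "Sneathia": 480}

print(lactobacillus_anaerobe_log_ratio(normal_like))  # positive: Lactobacillus prevails
print(lactobacillus_anaerobe_log_ratio(cancer_like))  # negative: anaerobes prevail
```

Because the index is a ratio of two parts of the same sample, it does not depend on the abundance of taxa outside the two groups, which is the property that makes log-ratios attractive for relative abundance data.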
Across multiple cohorts, cervicovaginal microbial communities have been consistently classified into
Lactobacillus-dominated community state types (CSTs I–III/V) and anaerobe-dominated CST IV, revealing systematic proportional shifts with increasing disease severity. In a Chinese cohort spanning normal cytology, HPV infection, LSIL, HSIL, and cervical cancer,
Lactobacillus remained the most abundant genus overall but declined progressively with lesion severity, while anaerobic taxa, including
Prevotella,
Anaerococcus,
Sneathia,
Megasphaera,
Fusobacterium,
Veillonellaceae, and
Porphyromonas uenonis, were disproportionately enriched in cancer cases [
29]. Normal samples were predominantly classified as CST III (
L. iners–dominated), whereas HPV infection and subsequent lesion development were associated with a stepwise increase in CST IV prevalence, reflecting a marked reduction in the
Lactobacillus:anaerobe balance as disease progressed [
29].
Similar patterns were observed in a mixed-ethnicity cohort, where the prevalence of high-diversity,
Lactobacillus-poor CST IV increased from 10% in healthy controls to 40% in invasive cervical cancer, accompanied by declining
Lactobacillus abundance and increasing representation of strict anaerobes such as
Sneathia,
Anaerococcus, and
Peptostreptococcus [
29].
A culturomics-based comparison of non-cancer and cervical cancer samples similarly showed dominance of Firmicutes and lactic acid bacteria in non-cancer samples, contrasted with depletion or complete absence of
Lactobacillus, increased anaerobic diversity, and frequent isolation of
Bacteroides and other opportunistic anaerobes in cervical cancer, suggesting that the
Lactobacillus:anaerobe ratio approaches zero in many affected women [
30].
In a longitudinal study of CIN2 patients followed for 24 months,
Lactobacillus-dominant communities (≥81.6%
Lactobacillus) were present in 65.5% of women at baseline, whereas those with
Lactobacillus-depleted, strict-anaerobe–rich communities (<54.2%
Lactobacillus) exhibited 3.2- to 3.6-fold increased odds of CIN2 persistence at 12 months (adjusted OR 3.56, 95% CI 1.31–9.60) [
31]. These findings highlight the
Lactobacillus:anaerobe ratio as a biologically meaningful metric that distinguishes regressing from persistent disease and links microbial community structure to clinical outcomes.
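The abundance cut-offs reported in the longitudinal CIN2 study can be expressed as a simple grouping rule, sketched below. Only the two thresholds (≥81.6% and <54.2% Lactobacillus) come from the text above. The function name and the "intermediate" label for values between them are assumptions made for illustration.

```python
def classify_by_lactobacillus(percent_lactobacillus):
    """Group a sample by its Lactobacillus relative abundance (in percent)."""
    if not 0 <= percent_lactobacillus <= 100:
        raise ValueError("relative abundance must be between 0 and 100")
    if percent_lactobacillus >= 81.6:
        return "Lactobacillus-dominant"
    if percent_lactobacillus < 54.2:
        return "Lactobacillus-depleted"
    # Label assumed here; the text above does not name the middle group.
    return "intermediate"

# Hypothetical relative abundances (not study data).
for value in (95.0, 70.0, 12.5):
    print(value, classify_by_lactobacillus(value))
```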
Notably, genus-level differential abundance analyses (ANCOM-BC2) identified consistent directional trends but few statistically significant associations after correction for multiple testing. This highlights an important limitation of single-taxon approaches in highly variable microbial ecosystems and suggests that biologically meaningful changes are better captured at the community level or through compositional contrasts, such as log-ratios and network-based analyses.
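To illustrate why directional trends may lose significance after correction for multiple testing, the sketch below applies the Benjamini-Hochberg procedure to a set of hypothetical per-genus p-values. The correction method used in this study is not stated here, so Benjamini-Hochberg is shown only as one common choice, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for five genera (not study data).
raw = [0.004, 0.030, 0.041, 0.200, 0.650]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw={p:.3f} adjusted={q:.3f}")
```

With many genera tested, adjusted values grow roughly in proportion to the number of tests, so modest single-taxon effects can fall below the significance threshold even when the direction of change is consistent.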
Network and correlation analyses further illuminated the reorganization of microbial community structure across disease stages. Normal samples exhibited sparse, weakly connected networks, characteristic of a stable Lactobacillus-dominated state with limited inter-taxon interactions. With increasing lesion severity, networks became denser and more structured, culminating in cervical cancer with tightly interconnected anaerobic consortia. The presence of strong negative correlations in cancer samples further suggests competitive interactions and niche exclusion, hallmarks of perturbed and highly constrained microbial systems.
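A correlation network of the kind described above can be sketched as follows: taxa are nodes, and an edge is drawn when the absolute Spearman correlation between two taxa across samples reaches a threshold. The network method and threshold used in this study are not specified here, so the threshold of 0.6, the function names, and the example abundances are all hypothetical. Correlations computed on relative abundances can also be spurious, which dedicated compositional network methods aim to address.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_edges(abundance, threshold=0.6):
    """Return (taxon_a, taxon_b, rho) for pairs with |rho| >= threshold."""
    taxa = sorted(abundance)
    edges = []
    for i, a in enumerate(taxa):
        for b in taxa[i + 1:]:
            rho = spearman(abundance[a], abundance[b])
            if abs(rho) >= threshold:
                edges.append((a, b, rho))
    return edges

# Hypothetical abundances across five samples (not study data).
abundance = {
    "Lactobacillus": [90, 80, 60, 30, 10],
    "Prevotella": [1, 5, 15, 30, 40],
    "Sneathia": [0, 2, 10, 25, 35],
}
for a, b, rho in correlation_edges(abundance):
    print(a, b, round(rho, 2))
```

Network density can then be summarized as the number of retained edges divided by the number of possible taxon pairs, and negative edges flag pairs that vary in opposite directions.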
Among host-related factors, only physical activity, HPV vaccination status, and menstrual cycle phase were significantly associated with overall microbiome composition. However, evidence on physical activity, HPV vaccination, and menstrual cycle phase as modifiers of the vaginal microbiota specifically in cervical cancer is sparse. Published data suggest that vaginal microbial community structure is strongly influenced by hormonal fluctuations and becomes less stable and more diverse during menstruation, whereas pregnancy is associated with greater stability and dominance of
Lactobacillus [
32].
Reviews focusing on the cervical microbiota and cancer further emphasize the role of endogenous hormones and menopausal status in shaping cervicovaginal communities, with menopause generally associated with reduced
Lactobacillus, increased anaerobic taxa, and heightened inflammation, conditions that may promote neoplastic processes [
22,
33]. Also, a large case series of HPV-positive women with CIN and invasive cervical cancer showed that microbiome, metabolite, and cytokine interactions differed before and after menopause, with age-specific cancer-associated genera and metabolite correlations, supporting a strong effect of hormonal and menstrual status on the microenvironment of cervical neoplasia [
34].
Several limitations of this study should be acknowledged. The relatively small size of the cervical cancer group may have limited statistical power for certain analyses, particularly differential abundance testing. As a consequence, the probability of detecting small effects, particularly in high-dimensional microbiome data, is reduced, and the risk of false negatives is higher.
In addition, the cross-sectional design precludes causal inference regarding whether microbiome alterations contribute to lesion progression or arise as a consequence of disease. Although this study was not designed to investigate host-related determinants, physical activity, HPV vaccination status, and menstrual cycle phase reached nominal significance. However, their modest effect sizes indicated a limited contribution to overall microbiome variation. Several other factors showed borderline associations, which may be biologically plausible but cannot be interpreted definitively due to limited statistical power. Larger, adequately powered studies are needed to confirm these findings.
In summary, our data support a model in which progression to cervical cancer is associated with a major reorganization of the cervicovaginal microbiome, characterized by loss of Lactobacillus dominance, increased diversity, and consolidation of complex anaerobic networks. These changes appear to reflect a shift in community state associated with advanced disease rather than a gradual continuum across early lesion stages. Finally, compositional log-ratio indices provide a quantitative measure of the Lactobacillus-to-anaerobe balance, which could be further studied as a potentially more robust marker of dysbiosis than relative abundance alone.
Thus, our findings highlight the cervicovaginal microbiome as a potential biomarker of disease progression and a candidate target for adjunctive strategies in cervical cancer prevention and management. However, high inter-individual variability, temporal instability influenced by hormonal and environmental factors, and the lack of standardized sampling and analytical pipelines limit the immediate clinical translation of vaginal microbiota profiling.