1. Introduction
Inflammation is a well-recognized hallmark of cancer and is implicated across multiple stages of tumor development [1,2]. This immune-response signaling can be triggered by a variety of stimuli arising from exogenous factors, particularly infectious agents, or from endogenous sources, including tissue damage, cellular stress, and dysregulated cellular signaling [3,4]. When inflammation becomes persistent, it may contribute to tumorigenesis by sustaining cytokine and chemokine signaling, increasing oxidative and replicative stress, and inducing microenvironmental alterations, such as stromal activation and extracellular matrix (ECM) remodeling [5,6,7,8]. Together, these processes may establish a pro-tumorigenic milieu that has been associated with tumor initiation and progression [9,10].
Within the female reproductive tract, several pathogens can drive inflammatory responses; among them, the bacterium Chlamydia trachomatis (CT) causes one of the most common sexually transmitted infections worldwide [11,12,13,14,15]. CT can establish persistent or recurrent infections and is strongly associated with adverse reproductive outcomes, including pelvic inflammatory disease (PID), tubal scarring, and infertility [14,16,17]. Beyond reproductive morbidity, CT has been proposed as a potential contributor to the development of a pro-tumoral microenvironment through sustained inflammatory signaling, activation of innate immune pathways, metabolic and oxidative stress, and tissue remodeling [18,19,20,21,22,23]. However, the infection-induced molecular pathways that could plausibly connect CT exposure to tumor initiation and progression remain incompletely investigated [24].
The fallopian tube is a relevant reproductive tissue for investigating CT infection-associated mechanisms in gynecologic disease [
25,
26,
27]. Indeed, mesenchymal cells in the fallopian tube contribute to ECM deposition, wound healing responses, and stromal remodeling, and they participate in the inflammatory crosstalk during tissue injury and repair [
26]. In the setting of CT infection, stromal and mesenchymal responses may therefore contribute to chronic inflammation and fibrosis that underlie tubal pathology [
28]. Characterizing transcriptional responses in fallopian tube mesenchymal cells can provide mechanistic insight into host pathways engaged by CT infection and into tissue remodeling pathways that may have downstream relevance to disease [
29].
Interestingly, high-grade serous ovarian carcinoma (HGSOC), the most frequent and lethal ovarian cancer subtype, is strongly associated with the distal fallopian tube as a site of origin in many patients [
30]. This anatomical and biological connection motivates investigating whether CT infection-associated transcriptional programs observed in fallopian tube cell types show overlap with transcriptional states present in ovarian tumors [
30,
31,
32,
33,
34,
35]. Importantly, pathway-level overlaps do not indicate causality or confirm pathogen presence in tumor tissues, but may provide translational context for mechanistic hypotheses.
Despite established interest in this topic, evidence connecting CT infection to ovarian tumorigenesis remains limited and has not been systematically explored at the level of infection-induced transcriptional programs and their relevance in the tumoral context [
36,
37]. In this study, we performed an integrative in silico analysis using publicly available transcriptomic data from primary human fallopian tube mesenchymal cells infected in vitro with CT (GSE109428) to identify differentially expressed genes and to characterize affected biological processes using functional enrichment and protein-protein association network analysis [
38]. To provide transcriptional context, we further analyzed TCGA ovarian cancer (TCGA-OV) by computing single-sample gene set enrichment (ssGSEA) scores for four pre-defined signatures capturing IFN/ISG, TNF/NF-kB, NOD/innate immunity, and ECM programs, and by evaluating inter-signature relationships and exploratory associations with clinical outcomes [
39]. This study design is based on the re-analysis of existing high-throughput datasets to generate testable hypotheses and prioritize candidate pathways for future mechanistic validation using approaches that can distinguish contributions from different cell populations [
40,
41,
42]. Accordingly, we aimed to define CT-responsive gene expression changes in primary human fallopian tube mesenchymal cells and identify enriched biological processes and interaction modules using g:Profiler and STRING [
43,
44]. Concomitantly, we assessed the co-occurrence and exploratory prognostic relevance of selected infection-linked signatures in TCGA ovarian tumors (TCGA-OV) using ssGSEA and survival analyses [
45].
A schematic overview of the study design is provided in Figure 1, summarizing the in vitro infection transcriptome analysis, functional/network interpretation, and the translational ssGSEA-based evaluation in TCGA-OV.
2. Materials and Methods
2.1. Transcriptomic Data Search Strategy
Public transcriptomic datasets related to CT infection were searched in the NCBI Gene Expression Omnibus (GEO) database [46]. The query term “Chlamydia trachomatis” was used, and results were filtered to “Homo sapiens” origin and study type “Expression profiling by array”. This search returned 28 datasets, which were then screened manually based on predefined inclusion/exclusion criteria aligned with the study objective of characterizing host transcriptional programs induced by CT infection in a fallopian tube–relevant cellular context. The inclusion criteria were human-derived samples; a clear comparison of CT infection vs. an appropriate uninfected control; availability of raw or processed expression data with sample-level metadata sufficient to define groups; and a study design compatible with differential expression analysis without major co-interventions. The exclusion criteria were blood samples; studies involving drug treatments or additional interventions beyond CT infection; established long-term cell lines if primary tissue-derived cells were available; and non-gynecologic infection contexts.
After manual screening, the GSE109428 dataset (“Primary mesenchymal cells from the human fallopian tube infected with Chlamydia trachomatis”) was selected for three reasons: it uses primary human fallopian tube mesenchymal cells, a stromal population relevant to tissue remodeling and scarring; it includes defined timepoints, 24 and 48 hours post-infection (hpi), enabling evaluation of initial and sustained responses; and it provides a direct infection vs. control design appropriate for downstream analyses [38]. Importantly, this dataset has been comparatively underexplored in the context of CT infection and tumorigenesis hypothesis generation.
2.2. Differential Gene Expression Analysis
Differential gene expression analysis was performed in R version 4.5.3 (2026-03-11 ucrt) using the Bioconductor package Linear Models for Microarray Data (limma) (version 3.66.0). The GSE109428 series matrix was retrieved using the GEOquery package (version 2.78.0). Prior to differential expression analysis, quality control was performed to assess data integrity and sample consistency. Expression value distributions were inspected across samples. Expression values were log2-transformed when required (maximum expression value > 100). Unsupervised principal component analysis (PCA) was performed on the full expression matrix to assess sample clustering by experimental group and to identify potential outliers or batch effects. All nine samples clustered in accordance with their assigned experimental groups (control, 24-hpi, 48-hpi), with no evidence of outliers or systematic batch effects. Probes with zero variance across samples were removed prior to model fitting. It should be noted that the GSE109428 dataset includes a single set of non-infected control samples (n = 3) used as the reference for both the 24-hpi and 48-hpi comparisons, as time-matched controls were not available in this public dataset. Consequently, transcriptional changes observed at each timepoint are interpreted relative to the same baseline, and the potential contribution of culture duration effects cannot be formally excluded.
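The two preprocessing rules described above (conditional log2 transformation and removal of zero-variance probes) can be sketched as follows. This is a minimal illustration in Python, not the R code used in the study; the function names are hypothetical, and the pseudocount of 1 is an assumption of this sketch that the text does not specify.

```python
import math


def maybe_log2(matrix, threshold=100.0):
    """Log2-transform every value if the matrix maximum exceeds `threshold`.

    `matrix` is a list of probe rows, each a list of per-sample values.
    The +1 pseudocount is an assumption of this sketch (keeps zeros finite).
    """
    max_value = max(value for row in matrix for value in row)
    if max_value > threshold:
        return [[math.log2(value + 1.0) for value in row] for row in matrix]
    return matrix


def drop_zero_variance(probe_ids, matrix):
    """Keep only probes whose values are not identical across all samples."""
    kept = [(pid, row) for pid, row in zip(probe_ids, matrix)
            if max(row) != min(row)]
    return [pid for pid, _ in kept], [row for _, row in kept]


# Toy example with hypothetical probe identifiers.
ids, values = drop_zero_variance(
    ["probe_1", "probe_2"],
    maybe_log2([[1023.0, 3.0], [7.0, 7.0]]),
)
```

In the toy example the maximum (1023) exceeds 100, so the matrix is transformed, and `probe_2` is then dropped because its values are constant across samples.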
A design matrix was defined to model the three experimental conditions (uninfected control, 24-hpi, and 48-hpi). Linear models were fitted using lmFit, and contrasts were specified to compare infected samples at 24- and 48-hpi against controls. Empirical Bayes moderation was applied using eBayes (both functions from the limma package, version 3.66.0). Multiple testing correction was performed using the Benjamini–Hochberg false discovery rate (FDR) method.
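For readers unfamiliar with the Benjamini–Hochberg adjustment, the following self-contained Python sketch shows how adjusted p-values are obtained from raw p-values. It is an illustration of the general procedure (the same quantity that R's p.adjust with method = "BH" returns), not the study's code; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each raw p-value is scaled by m/rank, and the running minimum taken from the largest rank downward guarantees that adjusted values never decrease as raw p-values increase.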
Microarray probe identifiers were mapped to gene symbols using the corresponding GEO platform annotation file (GPL21272, Agilent microarray, Santa Clara, CA, USA) [47]. Probes without valid gene symbol annotations were excluded. When multiple probes mapped to the same gene symbol, probe-level results were collapsed to gene level by retaining the probe with the lowest adjusted p-value for that gene. Differentially expressed genes (DEGs) were defined using an adjusted p-value < 0.05 and an absolute log2 fold change (|log2FC|) ≥ 1. To facilitate interpretation and transparency, the top 25 upregulated and downregulated genes at each timepoint, ranked by FDR, are provided in Supplementary File S1 (Table S1), including log2FC, mean expression, t-statistic, and FDR values. DEG sets were subsequently used for functional enrichment and protein–protein association network analyses. All data processing, statistical analysis, integration, and visualization steps were performed in R. The complete R script is provided in Supplementary File S1.
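The probe-to-gene collapsing rule and the DEG thresholds described above can be illustrated with a short Python sketch. The record layout (keys `symbol`, `log2fc`, `adj_p`), the function names, and the toy probe and gene identifiers are all hypothetical; the actual analysis is in the R script of Supplementary File S1.

```python
def collapse_to_genes(probe_results):
    """Keep, per gene symbol, the probe with the lowest adjusted p-value.

    Probes with an empty or missing symbol are excluded.
    """
    best = {}
    for row in probe_results:
        symbol = row.get("symbol")
        if not symbol:
            continue
        if symbol not in best or row["adj_p"] < best[symbol]["adj_p"]:
            best[symbol] = row
    return best


def call_degs(gene_results, alpha=0.05, min_abs_lfc=1.0):
    """Split genes into up/down DEGs: adj. p < alpha and |log2FC| >= cutoff."""
    significant = {g: r for g, r in gene_results.items() if r["adj_p"] < alpha}
    up = sorted(g for g, r in significant.items() if r["log2fc"] >= min_abs_lfc)
    down = sorted(g for g, r in significant.items() if r["log2fc"] <= -min_abs_lfc)
    return up, down


# Toy probe-level table with made-up values.
toy = [
    {"probe": "P1", "symbol": "GENE_A", "log2fc": 2.0, "adj_p": 0.01},
    {"probe": "P2", "symbol": "GENE_A", "log2fc": 0.5, "adj_p": 0.20},
    {"probe": "P3", "symbol": "GENE_B", "log2fc": -1.5, "adj_p": 0.001},
    {"probe": "P4", "symbol": "", "log2fc": 3.0, "adj_p": 0.0001},
    {"probe": "P5", "symbol": "GENE_C", "log2fc": 0.4, "adj_p": 0.001},
]
genes = collapse_to_genes(toy)
up_genes, down_genes = call_degs(genes)
```

Note that selecting the probe with the lowest adjusted p-value is the rule stated in the text; it favors the most significant probe per gene and should be kept in mind when interpreting gene-level effect sizes.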
2.3. Functional Enrichment Analysis
Functional enrichment analysis was performed to interpret the biological processes represented in the CT-responsive gene sets derived from the differential expression analysis. This analysis was carried out using the g:Profiler web server (g:GOSt), with the organism set to Homo sapiens [44]. To reduce platform-related bias, the enrichment background (gene universe) was restricted to genes represented on the GPL21272 microarray platform and mapped to valid gene symbols after preprocessing and annotation (n = 32,063). Gene Ontology Biological Process (GO:BP), KEGG, and Reactome terms were queried. Multiple testing correction was performed using the g:SCS (Set Counts and Sizes) method implemented in g:Profiler, which controls for multiple testing in the context of gene set enrichment and is not equivalent to Benjamini–Hochberg FDR correction. Significantly enriched terms were defined at a g:SCS-adjusted p-value threshold of <0.05. For reporting in the main text, a more stringent threshold of g:SCS-adjusted p-value < 0.01 was applied to prioritize the most robust terms. Enrichment results were used to guide the selection of representative gene signatures for downstream analyses and to support interpretation of infection-associated transcriptional programs.
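To make the role of the restricted background concrete, the sketch below computes an unadjusted over-representation p-value with a one-sided hypergeometric test in Python. It is a generic illustration only: it does not reproduce the g:SCS correction, the function name and toy gene identifiers are hypothetical, and the use of a hypergeometric test here is an assumption of the sketch rather than a description of the study's pipeline.

```python
from math import comb


def hypergeom_enrichment_p(query, term_genes, background):
    """One-sided hypergeometric P(overlap >= observed) within `background`.

    Query and term genes absent from the background are discarded first,
    which is the practical effect of restricting the gene universe.
    """
    universe = set(background)
    query_set = set(query) & universe
    term_set = set(term_genes) & universe
    n_universe = len(universe)
    n_term = len(term_set)
    n_query = len(query_set)
    overlap = len(query_set & term_set)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_term, i) * comb(n_universe - n_term, n_query - i)
        for i in range(overlap, min(n_term, n_query) + 1)
    )
    return tail / total


# Toy universe of 10 genes; "not_on_array" is outside it and is ignored.
toy_background = [f"g{i}" for i in range(10)]
p_value = hypergeom_enrichment_p(
    query=["g0", "g1", "g5", "not_on_array"],
    term_genes=["g0", "g1", "g2"],
    background=toy_background,
)
```

Because the universe size enters every binomial coefficient, a background restricted to the 32,063 annotated genes on the array gives different p-values than a whole-genome background would for the same query.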
2.4. Protein–Protein Interaction (PPI) Analysis
PPI analysis was performed using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) web platform (version 12.0) to identify interaction modules and highly connected pathways within CT-responsive gene sets [43]. Networks were generated using a high-confidence interaction threshold (combined score ≥ 0.7), restricted to Homo sapiens. To minimize literature-driven and context-independent links, text-mining and genomic neighborhood evidence were excluded. The resulting PPI networks were visualized to assess dominant biological pathways, interaction patterns, and functional modules.
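As an illustration of what a high-confidence threshold does to a network, the following Python sketch filters an edge list at a combined score of 0.7 and groups the surviving nodes into connected components. It assumes the combined scores have already been computed with the chosen evidence channels; treating connected components as "modules" is a simplification introduced here, not the procedure used in the study, and all names and toy edges are hypothetical.

```python
from collections import defaultdict


def high_confidence_modules(edges, min_score=0.7):
    """Return connected components of the graph kept after score filtering.

    `edges` is an iterable of (protein_a, protein_b, combined_score) with
    scores on a 0-1 scale. Components are sorted from largest to smallest.
    """
    adjacency = defaultdict(set)
    for protein_a, protein_b, score in edges:
        if score >= min_score:
            adjacency[protein_a].add(protein_b)
            adjacency[protein_b].add(protein_a)

    seen = set()
    modules = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        modules.append(sorted(component))
    return sorted(modules, key=len, reverse=True)


toy_edges = [("A", "B", 0.9), ("B", "C", 0.75), ("C", "D", 0.4), ("E", "F", 0.7)]
modules = high_confidence_modules(toy_edges)
```

In the toy network the C–D edge (0.4) is removed, so D drops out entirely and two modules remain.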
2.5. TCGA-OV ssGSEA Signature Scoring, Tumor Microenvironment Deconvolution, and Survival Analysis
As a translational extension of the in vitro CT infection analysis, we investigated whether infection-linked transcriptional programs are reflected as expression signatures in human ovarian tumors and whether they show any exploratory association with clinical outcomes. Gene-level mRNA expression data for ovarian cancer were retrieved from the UCSC Xena platform using the UCSCXenaTools R package (version 1.7.0). We analyzed the TCGA-OV HiSeqV2 gene expression matrix (UCSC Xena dataset identifier: TCGA.OV.sampleMap/HiSeqV2). In detail, TCGA-OV was selected given the relevance of the fallopian tube to high-grade serous ovarian carcinoma and to provide a clinically annotated tumor transcriptome resource for signature-based analyses.
The downloaded expression matrix was processed as a gene-by-sample matrix with gene symbols as row names and log2-scaled expression values as provided by UCSC Xena. Four biologically motivated gene signatures were defined based on infection-associated programs observed in the in vitro analysis. Signature gene selection followed a data-driven approach based on the results of the in vitro transcriptomic analysis. Briefly, following differential expression analysis, each DEG set was independently submitted to g:Profiler functional enrichment analysis. Enriched biological terms were reviewed, and non-redundant representative pathways were selected across GO:BP, KEGG, and Reactome databases. For each selected pathway, the corresponding contributing genes from the DEG list were identified, and subsets of these genes were submitted to STRING network analysis to identify highly connected interaction modules. Signature genes were then defined as those forming coherent, densely connected modules in STRING that were also supported by functional enrichment evidence, ensuring that each signature reflected a biologically interpretable and network-supported transcriptional program rather than an arbitrary gene list. Specifically, the IFN/ISG signature (12 genes) was derived from the interferon/ISG STRING module of sustained upregulated genes, comprising MX1, MX2, IFIT1, IFIT2, IFIT3, OAS2, OASL, ISG15, USP18, RSAD2, IRF7, and ISG20. The TNF/NF-κB signature (13 genes) was derived from the cytokine/chemokine STRING module, comprising IL1A, IL1B, IL6, CXCL1, CXCL2, CCL2, CCL3, CCL5, NFKBIA, RELB, NFKB1, TNF, and PTGS2. The NOD/innate signature (9 genes) was derived from innate immune signaling components enriched in the sustained upregulated set, comprising RIPK2, TRAF6, NFKB1, IRF9, CASP4, GBP5, NLRP3, SQSTM1, and PELI2. 
The ECM signature (15 genes) was derived from the basement membrane and the ECM STRING module of genes downregulated exclusively at 48-hpi, comprising COL1A2, COL3A1, COL4A1, COL4A2, COL5A1, COL5A2, LAMA2, LAMB1, LAMB2, LAMC1, NID1, NID2, POSTN, FN1, and ITGAV. Prior to scoring, each signature was intersected with the TCGA-OV expression matrix to retain only genes present in the dataset, and the final signature sizes were recorded. For each tumor sample, signature scores were calculated using single-sample gene set enrichment analysis (ssGSEA) implemented in the GSVA package (parameter-based API; ssgseaParam), generating one score per signature, per tumor sample. Associations between signature scores were assessed using Spearman’s rank correlation with pairwise complete observations.
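The inter-signature association step, Spearman's rank correlation with pairwise complete observations, can be sketched in plain Python as follows. This is an illustrative re-implementation rather than the R call used in the study; missing scores are represented as `None`, which is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = tied_rank
        start = end + 1
    return ranks


def spearman_pairwise_complete(x, y):
    """Spearman's rho using only samples where both scores are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    rank_x = average_ranks([a for a, _ in pairs])
    rank_y = average_ranks([b for _, b in pairs])
    mean_x = sum(rank_x) / len(pairs)
    mean_y = sum(rank_y) / len(pairs)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    return covariance / math.sqrt(var_x * var_y)


# Toy scores for two signatures; the last sample lacks one score.
rho = spearman_pairwise_complete([1.0, 2.0, 3.0, 4.0, None],
                                 [10.0, 20.0, 30.0, 40.0, 5.0])
```

Because the correlation is computed on ranks, it is insensitive to the scale of the ssGSEA scores and captures any monotonic relationship between two signatures.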
To characterize the contribution of tumor microenvironment composition to the observed signature scores, immune and stromal infiltration were estimated using the ESTIMATE algorithm, implemented in the ESTIMATE R package. Briefly, ESTIMATE derives an ImmuneScore and a StromalScore for each sample based on the expression of curated immune- and stromal-associated gene sets, respectively, alongside a composite ESTIMATEScore. The expression matrix was formatted according to ESTIMATE requirements and filtered to the 10,412 common genes using filterCommonGenes prior to score computation. Spearman correlations between ssGSEA signature scores and ESTIMATE scores were computed and visualized to assess the extent to which signature variation reflects immune or stromal cell abundance.
Clinical endpoints were obtained from the UCSC Xena curated survival table (dataset identifier: survival/OV_survival.txt), including overall survival (OS) and progression-free interval (PFI). Clinical covariate data, including age at diagnosis, tumor stage (FIGO), histological grade, and residual disease, were retrieved from the TCGA-OV clinical matrix (UCSC Xena dataset identifier: TCGA.OV.sampleMap/OV_clinicalMatrix). Stage was categorized as I–II, III, or IV; grade as G1–G2 or G3–G4; and residual disease as none, low (1–10 mm), or high (>10 mm).
For exploratory visualization, tumors were stratified into “High” versus “Low” groups based on a median split of a signature score (IFN/ISG), and Kaplan–Meier curves were compared using the log-rank test. Multivariable Cox proportional hazards models were fitted for OS and PFI in three specifications: (i) an unadjusted model including only the four continuous ssGSEA signature scores; (ii) a clinically adjusted model additionally including age, stage, grade, and residual disease as covariates; and (iii) a model further incorporating ESTIMATE ImmuneScore and StromalScore as additional covariates to account for tumor microenvironment composition. The analytic cohort for adjusted models was restricted to samples with complete data for all covariates (n = 264; 43 samples excluded due to missing clinical covariate data, primarily residual disease status). To account for multiple comparisons across signatures and endpoints, p-values from all Cox models were adjusted using the Benjamini–Hochberg (BH) false discovery rate procedure. All analyses were performed in R; for reproducibility, the full workflow is provided in Supplementary File S2.
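The median-split stratification and the Kaplan–Meier estimate underlying the survival curves can be illustrated with the following Python sketch. It is not the survival/survminer code used in the study; the function names are hypothetical, and assigning samples exactly at the median to the “Low” group is an assumption of this sketch.

```python
from statistics import median


def median_split(scores):
    """Label each score "High" if strictly above the median, else "Low"."""
    cutoff = median(scores)
    return ["High" if score > cutoff else "Low" for score in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    `events` uses 1 for an observed event and 0 for censoring.
    Returns a list of (time, survival_probability) tuples.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    index = 0
    while index < len(data):
        current_time = data[index][0]
        event_count = 0
        leaving = 0
        # Collect everyone whose follow-up ends at this time point.
        while index < len(data) and data[index][0] == current_time:
            event_count += data[index][1]
            leaving += 1
            index += 1
        if event_count:
            survival *= 1 - event_count / at_risk
            curve.append((current_time, survival))
        at_risk -= leaving
    return curve


groups = median_split([0.1, 0.4, 0.2, 0.9])
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
```

In practice one such curve would be estimated per group and the two curves compared with a log-rank test, which is not implemented in this sketch.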
2.6. Statistical Analysis and Visualization
All analyses were performed in RStudio (version 2025.09.1+401). Differential expression analyses were conducted using limma with empirical Bayes moderation and Benjamini–Hochberg false discovery rate (FDR) correction. For the microarray dataset, expression values were log2-transformed when required (maximum expression value > 100), and probes with zero variance across samples were removed prior to model fitting. Probe identifiers were mapped to gene symbols using the GEO platform annotation (GPL21272) and probes without valid gene symbols were excluded. When multiple probes were mapped to the same gene symbol, results were collapsed to the gene level by retaining the probe with the lowest adjusted p-value for that gene. DEGs were defined using FDR < 0.05 and |log2FC| ≥ 1. Functional enrichment analysis was performed using g:Profiler with g:SCS-adjusted p-values. STRING network analysis used a high-confidence threshold (combined score ≥ 0.7), excluding text-mining and genomic neighborhood evidence.
Data manipulation was performed using the dplyr package. Volcano plots were generated using ggplot2 and heatmaps using pheatmap. TCGA-OV data retrieval was performed using UCSCXenaTools; ssGSEA scoring used the GSVA package; correlation analyses used Spearman’s method; tumor microenvironment deconvolution was performed using the ESTIMATE package, deriving ImmuneScore, StromalScore, and ESTIMATEScore for each sample; survival analyses used the survival and survminer packages, including Kaplan–Meier curves with log-rank tests and Cox proportional hazards models. Multivariable Cox models were fitted in two specifications: unadjusted (signature scores only) and clinically adjusted (additionally incorporating age, tumor stage, histological grade, and residual disease), with a third specification further including ESTIMATE ImmuneScore and StromalScore. To account for multiple comparisons across signatures and endpoints, p-values from Cox models were adjusted using the Benjamini–Hochberg procedure.
4. Discussion
In this exploratory in silico study, we characterized gene expression signatures induced by CT infection in primary human fallopian tube mesenchymal cells and evaluated whether analogous inflammatory and ECM-related programs are detectable in clinically annotated ovarian tumors from TCGA-OV [
38,
45,
48]. By integrating differential expression analysis, functional enrichment, protein–protein interaction network analysis, and ssGSEA scoring, this work provides a reproducible, hypothesis-generating framework and prioritizes infection-associated pathways for follow-up in studies addressing potential links between CT infection, chronic tissue remodeling, and female tumorigenesis.
The most prominent transcriptional feature of CT-infected fallopian tube mesenchymal cells was a pronounced and sustained inflammatory response, characterized by enrichment of TNF/NF-κB, IL-17, and cytokine-mediated signaling pathways alongside a densely connected STRING module centered on IL1A/IL1B, IL6, CXCL1/2, CCL2/5, and related chemokines (
Table 3 and
Figure 2). These findings are consistent with the known immunobiology of CT infection, which triggers pattern recognition receptor activation and downstream cytokine cascades, and with broader enrichment of NF-κB- and MAPK-related terms together with cytokine–cytokine receptor interaction annotations [
13]. The sustained nature of this response, detectable across both post-infection timepoints, is consistent with CT engaging host transcriptional programs characteristic of chronic inflammatory tissue states rather than a purely transient innate response.
In parallel, we observed a strongly connected interferon-stimulated gene (ISG) module, with enrichment for type I interferon signaling and a compact STRING subnetwork comprising MX1/MX2, IFIT1/2/3, OAS2/OASL, ISG15, USP18, RSAD2, and IRF7 (
Figure 2). This pattern is consistent with activation downstream of intracellular pathogen detection and can persist beyond the early-infection timepoint. Persistent interferon-associated programs may have complex downstream consequences, including modulation of cytokine networks, antigen presentation, cell survival and stress responses [
49,
50,
51]. Importantly, both the inflammatory and ISG programs have been implicated in chronic inflammation-associated carcinogenesis, where persistent cytokine signaling and oxidative stress can promote genomic instability and pro-tumorigenic microenvironmental changes [
8,
52,
53]. In the context of infection–tumorigenesis hypotheses, these sustained inflammatory and interferon-linked axes represent a mechanistically plausible route by which recurrent or persistent CT infection could potentially contribute to long-term microenvironmental perturbation, even if active infection is no longer present at the time of tumor development. Indeed, the concurrent activation of interferon and NF-κB programs observed here is mechanistically supported by evidence from Madaan and colleagues, who demonstrate in fallopian tube epithelial cells that ISGylation, mediated by ISG15 and its activating enzyme UBA7, simultaneously amplifies IRF3 and NF-κB signaling downstream of cytosolic RIG-I/MDA5 pattern recognition receptors [
54]. Although this mechanism was defined in epithelial cells, it suggests that coordinated interferon and NF-κB activation may represent a conserved feature of innate immune signaling across fallopian tube cellular compartments. In the context of intracellular infections, such as with CT, this suggests a plausible molecular basis for the co-regulation of these transcriptional axes observed in our data, consistent with transcriptional states that have been associated with pro-tumorigenic microenvironmental conditions in other contexts.
It should also be acknowledged that the relationship between chronic infection, inflammation, and tumorigenesis is not unidirectional. Persistent CT infection and the associated inflammatory and interferon responses could also promote immune surveillance mechanisms or induce fibrotic responses that may inhibit early neoplastic transformation. The net effect of CT-associated stromal and immune perturbation on tumor initiation or progression is therefore uncertain and likely context-dependent, underscoring the need for experimental models that can clarify these biological processes.
A second major observation of our analysis was the coherent downregulation of ECM and adhesion-related programs specifically at 48-hpi, including enrichment for ECM organization, ECM–receptor interaction, and focal adhesion pathways, and a densely connected basement membrane STRING module dominated by collagens, laminins, and nidogens (
Table 4 and
Figure 3). Reactome enrichment for MET–PTK2 signaling further supports involvement of motility- and adhesion-associated signaling axes [
55,
56]. Mesenchymal and stromal cells are central orchestrators of tissue remodeling and fibrosis; thus, coordinated downregulation of ECM structural genes in this cell type suggests that CT infection may perturb stromal programs that regulate tissue architecture and cell–matrix interactions [
57]. This finding aligns with reports that CT infection induces stromal activation and fibrotic responses in the fallopian tube, contributing to tubal scarring and dysfunction [
27,
58,
59]. In cancer biology, ECM remodeling and altered adhesion signaling are key features of tumor microenvironments that influence invasion, immune cell trafficking, and therapeutic resistance [
60,
61,
62,
63].
The downregulation of ECM structural genes in CT-infected mesenchymal cells at 48-hpi may appear paradoxical given that most solid tumors, including HGSOC, are associated with ECM upregulation and desmoplasia [
64]. This apparent contrast may reflect several non-mutually exclusive possibilities: an acute cytoskeletal reorganization response to intracellular infection, a transient adaptation that precedes subsequent fibrotic remodeling upon resolution or persistence of infection, or a limitation of the in vitro system that cannot recapitulate the complexity of a multicellular in vivo system [
65,
66]. Importantly, fibrotic remodeling in the context of CT infection is a complex process involving immune cell recruitment and paracrine cytokine signaling that cannot be fully modeled in an infection system based on a single cell type. Whether the acute ECM repression observed here is protective, permissive, or a transient in vitro response warrants investigation in more physiologically relevant experimental models.
To provide translational tumor context, we quantified ssGSEA scores for infection-linked inflammatory, innate, interferon, and ECM programs in TCGA-OV. The moderate correlations observed between TNF/NF-κB, NOD/innate, and IFN/ISG scores indicate that these immune-related transcriptional states frequently co-occur across ovarian tumor bulk transcriptomes, consistent with the concept of an inflamed tumor microenvironment phenotype described across multiple cancer types and associated with immune cell infiltration and coordinated cytokine signaling [
67,
68,
69]. This pattern may reflect combined contributions from tumor cells and the immune and stromal components of the tumor microenvironment [
67]. Notably, the ECM signature showed weak correlation with inflammatory programs, suggesting that ECM variation in tumors is likely driven by distinct biological processes, such as stromal composition, desmoplastic responses, or fibrosis, rather than by the same inflammatory axes captured here, consistent with the known functional heterogeneity of cancer-associated fibroblasts and their role in extracellular matrix remodeling and desmoplasia [
64,
70,
71,
72]. It should be emphasized that the co-occurrence of inflammatory transcriptional states in TCGA-OV tumors reflects broad immune programs that are common across many cancer types and inflammatory conditions and cannot be interpreted as evidence of CT-driven oncogenesis or pathogen presence in tumor tissues.
Neither the IFN/ISG median-split Kaplan–Meier analysis nor the multivariable Cox models (unadjusted, clinically adjusted, or adjusted with ESTIMATE microenvironment scores) yielded statistically significant associations with overall survival or progression-free interval after Benjamini–Hochberg correction (
Figure 6 and
Table 6). This negative finding is informative rather than merely null: it is compatible with specific limitations in signature design and data context and does not by itself establish the absence of a biologically relevant association. In detail, the signatures used herein were short (9–15 genes) and derived from a single in vitro infection model, which cannot capture the full complexity of these transcriptional programs in heterogeneous tumor tissues. TCGA-OV expression profiles are bulk measurements influenced not only by tumor cells but also by variable immune and stromal infiltration. Importantly, ESTIMATE-based deconvolution showed that the ECM signature score is strongly correlated with stromal cell abundance (StromalScore ρ = 0.83) and that inflammatory signatures substantially reflect immune cell infiltration (ImmuneScore ρ = 0.72–0.74 for TNF/NF-κB and NOD/innate), indicating that these scores do not capture infection-specific transcriptional programs independently of microenvironment composition [
73]. Furthermore, TCGA-OV is enriched for high-grade serous ovarian carcinoma, a relatively homogeneous and aggressive subtype with limited transcriptomic prognostic stratification in many published signatures, and the analytic cohort may be underpowered to detect modest effect sizes [
74,
75,
76]. These findings do not exclude the possibility that refined or expanded signatures, validated in independent cohorts, could reveal meaningful prognostic associations. The TCGA component should therefore be interpreted as a supportive translational context for pathway-level plausibility rather than as evidence of CT-driven prognostic stratification.
Several limitations of this study should be acknowledged. First, this work is entirely in silico and hypothesis-generating, and no experimental validation of candidate pathways or genes has been performed. Experimental confirmation of key findings, for example, using qRT-PCR, Western blotting, or immunostaining in independent infection models or patient-derived samples, would be required to establish biological significance beyond the transcriptomic level. Second, the GSE109428 dataset profiles a single mesenchymal cell population from a limited number of donors (n = 3 per condition), which substantially limits statistical power and increases the possibility of donor-specific effects or unstable differential expression estimates. The differential expression results should therefore be treated as hypothesis-generating. Furthermore, the dataset does not include time-matched uninfected controls; both the 24-hpi and 48-hpi comparisons were made against a single set of non-infected controls, and therefore transcriptional changes at each timepoint may partly reflect culture duration effects rather than exclusively infection-specific responses. This limitation is inherent to the original study design and cannot be resolved retrospectively.
Third, the study focuses exclusively on fallopian tube mesenchymal cells, whereas ovarian tumorigenesis involves complex interactions among epithelial, stromal, and immune compartments. The transcriptional programs identified here are cell-type-specific and may not be representative of responses in fallopian tube epithelial cells, which are considered the predominant cell of origin for high-grade serous ovarian carcinoma. The choice of mesenchymal cells is justified by their established roles in stromal remodeling, fibrosis, and inflammatory crosstalk, but the translational relevance of these findings to epithelial-driven carcinogenesis remains to be determined.
Also, functional enrichment and network analyses were performed using g:Profiler and STRING, which rely on curated annotation databases and association evidence that may not fully capture context-specific or cell-type-specific biology, and are subject to annotation biases toward extensively studied genes and pathways. Complementary unbiased approaches, such as pathway topology analysis or network centrality metrics, could provide additional interpretive value in future analyses.
Moreover, the four ssGSEA signatures used for TCGA-OV scoring were short (9–15 genes each) and derived from a single in vitro infection dataset, which limits their generalizability. Although signature genes were selected through a data-driven approach combining g:Profiler enrichment and STRING network evidence, the signatures were not benchmarked against existing immune or ECM signatures in cancer, nor validated across multiple infection datasets. ESTIMATE-based deconvolution confirmed that the ECM signature score is strongly correlated with stromal cell abundance (StromalScore ρ = 0.83), and inflammatory signatures substantially reflect immune cell infiltration, indicating that these scores may not capture infection-specific transcriptional programs independently of microenvironment composition.
Finally, the TCGA-OV survival analyses were exploratory. Although clinically adjusted Cox models incorporating age, tumor stage, histological grade, and residual disease were fitted, and p-values were corrected using the Benjamini–Hochberg procedure, no significant associations were identified. The analytic cohort was restricted to samples with complete clinical covariate data (n = 264), and results should be interpreted with caution, given the limited signature size, cohort heterogeneity, and absence of CT exposure data in TCGA clinical annotations. The TCGA analyses cannot establish whether transcriptional programs observed in tumors are related to prior CT infection, as pathogen exposure history is unavailable.
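As an illustration of the multiple-testing step mentioned above, the following is a minimal sketch of the Benjamini–Hochberg adjustment in plain Python. It is not the code used in this study; the function name `benjamini_hochberg` and the example p-values are hypothetical and serve only to show how raw p-values from several signature-level tests would be converted to adjusted values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the rank decreases (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four signature scores (illustrative only).
raw_p = [0.04, 0.20, 0.03, 0.60]
adj_p = benjamini_hochberg(raw_p)
```

In this toy example, the two smallest raw p-values (0.03 and 0.04) both adjust to approximately 0.08, which shows how nominally significant results can lose significance once the number of tests is taken into account.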
In line with these limitations, future studies should prioritize experimental validation of candidate pathways in more physiologically relevant, cell-type-specific models, such as co-culture systems incorporating tumor, stromal, and immune cells, to better recapitulate the in vivo biology and dynamics of the tumor microenvironment. Future work combining serological exposure data with tumor transcriptomic profiling in well-annotated cohorts could help bridge the gap between population-level associations and molecular mechanisms. Additionally, expanded signature development using multiple infection datasets and validation in independent ovarian cancer cohorts would improve both robustness and potential prognostic utility.
5. Conclusions
In summary, this in silico study shows that CT infection in primary human fallopian tube mesenchymal cells is associated with sustained inflammatory and interferon-linked transcriptional programs, including activation of NF-κB/TNF and IL-17 signaling axes, a densely connected ISG module comprising MX1/MX2, IFIT family members, OAS2, ISG15, and IRF7, and coordinated repression of ECM and adhesion networks at 48-hpi. Cytokine hubs centered on IL1A/IL1B, IL6, and CXCL/CCL chemokines represent candidate mediators of the sustained inflammatory state and constitute priority targets for experimental follow-up.
From a translational perspective, analogous inflammatory and innate immune transcriptional states are detectable in TCGA ovarian tumors, consistent with the biological plausibility of infection-linked programs as features of the tumor microenvironment, though this co-occurrence does not imply a causal relationship with CT infection. The short gene signatures derived here did not yield robust prognostic signals in TCGA-OV, a finding that remained consistent across unadjusted, clinically adjusted, and microenvironment-adjusted models. Given the exploratory nature of the analysis, this null result should not be taken as evidence that a biologically relevant relationship is absent, and it does not preclude meaningful associations in larger, better-annotated cohorts with expanded signatures.
Altogether, this work provides a reproducible integrative framework linking in vitro infection biology and tumor transcriptomics, and identifies specific candidate programs, including cytokine hubs centered on IL1A/IL1B and IL6, the ISG module comprising MX1, IFIT family members, and ISG15, and the ECM repression program centered on collagen and laminin genes. These candidate programs represent tractable entry points for future mechanistic studies in complex cell-type-specific models, including co-culture systems and organoid-based infection models. From an epidemiological perspective, studies combining serological CT exposure data with tumor transcriptomic profiling in prospectively annotated cohorts with documented infection history would help connect population-level associations with molecular mechanisms, and more directly address the potential role of CT infection as a co-factor in female reproductive tract tumorigenesis.