Review Reports
- Ixchel Salter 1,*,
- Michaele-Francesco Corbisiero 1 and
- Andrés F. Henao-Martínez 1,*
- et al.
Reviewer 1: Anonymous
Reviewer 2: Emmanouil N. Magiorkinis
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This is a clinically interesting study by Salter et al. that addresses NHNT patients, an under-characterised population, using a large federated database, which is a powerful approach. The writing is generally clear and the authors demonstrate good awareness of their limitations. However, several methodological concerns, inferential overreaches, and structural issues need to be addressed before publication.
Major Comments
- Propensity Score Matching Design and Scope
The PSM is used to balance five comorbidities (anemia, hypertension, neoplasms, T2DM, heart failure) when comparing survivors vs. non-survivors, but the rationale for selecting these five variables exclusively is not adequately justified. Why were neutropenia, glucocorticoid use, or malignancy subtype not included as matching covariates, given their well-established relationship with CMV reactivation risk and mortality? The paper would benefit from a clearer explanation of covariate selection for the PSM model.
Additionally, the PSM reduces the cohort from 1,123 to 556, discarding roughly half the data. The authors should demonstrate that the matched cohort remains representative of the original, and discuss whether findings generalize to the full population.
- Multiple Comparisons Without Correction for Multiplicity
The statistical analysis presents a serious multiplicity problem that is not acknowledged anywhere in the manuscript. Table 2 performs simultaneous comparisons across approximately 40 variables, spanning demographics, comorbidities, symptoms, medications, and laboratory values, each tested at a conventional α = 0.05 threshold using independent t-tests and χ² / Fisher's exact tests. No correction for multiple comparisons is applied or even discussed.
At α = 0.05 across 40 simultaneous tests, approximately 2 false positives would be expected by chance alone even if no true associations existed. This is a fundamental concern for the validity of several reported findings. The authors should apply an appropriate correction, either a Bonferroni correction or a false discovery rate adjustment such as Benjamini-Hochberg, and reassess their conclusions accordingly.
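The arithmetic behind this concern can be checked with a short sketch. It uses the figures stated above (roughly 40 tests at α = 0.05); the family-wise error rate line additionally assumes the tests are independent, which is a simplification for illustration and not a property established for Table 2.

```python
# Multiplicity arithmetic for m tests at significance level alpha.
m = 40        # approximate number of comparisons in Table 2 (from the text)
alpha = 0.05  # per-test significance threshold

# Expected number of false positives if every null hypothesis were true.
expected_false_positives = m * alpha

# Probability of at least one false positive, ASSUMING independent tests.
familywise_error_rate = 1 - (1 - alpha) ** m

# Bonferroni-adjusted per-test threshold.
bonferroni_alpha = alpha / m

print(expected_false_positives)
print(round(familywise_error_rate, 3))
print(bonferroni_alpha)
```

Under the independence assumption, the chance of at least one spurious finding is roughly 87%, which is why an unadjusted α = 0.05 is hard to defend across this many comparisons.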
- Conflation of Association and Risk Factor Language
Throughout the Results and Discussion, variables that are associated with mortality in a matched cohort are repeatedly described in language that implies they are predictive risk factors or independent predictors (e.g., the figure title reads "Independent Predictors of Mortality"). No multivariable logistic regression or Cox proportional hazards model is presented.
- Tacrolimus Finding Needs Explanation
The finding that survivors had higher tacrolimus exposure than non-survivors is biologically counterintuitive given that tacrolimus is an immunosuppressant. The authors mention it only in passing in the Results. This deserves some discussion.
- Absence of a Control Group
Can the authors include a control group without CMV DNAemia? If not, this should be listed as a limitation.
- Viral Load Threshold Justification
The 5,000 IU/mL cutoff is described as "pragmatic" with no universally accepted alternative. While the authors acknowledge this, they should more clearly articulate the potential consequence of this threshold: what happens if this threshold is changed?
Minor Comments
- Missing sentence
The final sentence of section 4.6 (Ferritin) is incomplete: "Further investigation related to the interaction of CMV and ferritin,"
- CD4+ Count Interpretation
The significantly lower CD4+ count in non-survivors (190 vs. 408 cells/µL) is a compelling finding, but the authors should note that CD4+ testing in non-HIV patients is non-standardized and may be ordered selectively (i.e., in sicker patients or those with suspected immunodeficiency), introducing ascertainment bias in this variable.
Author Response
This is a clinically interesting study by Salter et al that addresses NHNT patients, who are an under characterised population, using a large federated database, a powerful approach. The writing is generally clear and the authors demonstrate good awareness of their limitations. However, several methodological concerns, inferential overreaches, and structural issues need to be addressed before publication.
Major Comments
- Propensity Score Matching Design and Scope
The PSM is used to balance five comorbidities (anemia, hypertension, neoplasms, T2DM, heart failure) when comparing survivors vs. non-survivors, but the rationale for selecting these five variables exclusively is not adequately justified. Why were neutropenia, glucocorticoid use, or malignancy subtype not included as matching covariates, given their well-established relationship with CMV reactivation risk and mortality? The paper would benefit from a clearer explanation of covariate selection for the PSM model.
- Thank you for this important observation. The five covariates selected for propensity score matching (anemia, hypertension, neoplasms, type 2 diabetes mellitus, and heart failure) were chosen because they represent the most prevalent comorbidities in the cohort that are independently associated with all-cause mortality in the general population, and because the TriNetX platform's built-in PSM module constrains the number of covariates that can be included while maintaining adequate matching. Variables such as neutropenia, glucocorticoid use, and malignancy subtype were not included in the PSM model because they were considered potential mediators on the causal pathway between immunosuppression and mortality rather than baseline confounders, and their inclusion could introduce overadjustment bias. However, the reviewer's concern about residual confounding is well taken, and this limitation is now explicitly acknowledged.
- Section 2.4 (Statistical Analysis), page 5, line 171: we have added the following text after the sentence listing the matched covariates: "These five covariates were selected because they represent the most prevalent comorbidities in the cohort with established independent associations with all-cause mortality. Variables such as neutropenia, glucocorticoid use, and malignancy subtype were not included in the propensity model because they were considered potential mediators on the causal pathway between immunosuppression and CMV-related outcomes, and their inclusion could introduce overadjustment bias. Additionally, the TriNetX platform's PSM module imposes practical constraints on the number of covariates that can be simultaneously matched. We acknowledge that the omission of these variables may result in residual confounding, and this is discussed as a study limitation."
- Section 5 Limitations, page 13, line 442: we added: "The propensity score model included only five covariates, and important confounders such as severity of illness scores, immunosuppressive burden, neutropenia, and sepsis were not available for matching within the TriNetX platform. Residual confounding, therefore, cannot be excluded, and the findings should be interpreted accordingly."
Additionally, the PSM reduces the cohort from 1,123 to 556, discarding roughly half the data. The authors should demonstrate that the matched cohort remains representative of the original, and discuss whether findings generalize to the full population.
- Thank you for raising this point. The reduction from 1,123 to 556 patients is inherent to 1:1 nearest-neighbor matching and reflects the exclusion of patients without suitable matches. To address this concern, a supplementary table comparing baseline characteristics of the full unmatched cohort and the matched cohort is provided to demonstrate that the matched cohort remains broadly representative.
- Section 3.3 (Results), page 7, line 252: Added: "Readers should note that findings from the matched analysis may not fully generalize to the unmatched population, particularly to patients at the extremes of the propensity score distribution."
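To illustrate why 1:1 nearest-neighbor matching discards patients, here is a minimal sketch of greedy matching without replacement. The function name `greedy_match`, the caliper value, and the propensity scores are all hypothetical; this is not the TriNetX implementation, whose internals are not described in this exchange.

```python
def greedy_match(group_a, group_b, caliper):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    group_a, group_b: dicts mapping patient id -> propensity score.
    Returns a list of (id_a, id_b) pairs within the caliper.
    """
    available = dict(group_b)
    pairs = []
    for a_id, a_score in group_a.items():
        if not available:
            break
        # Closest remaining candidate by absolute score distance.
        b_id = min(available, key=lambda b: abs(available[b] - a_score))
        if abs(available[b_id] - a_score) <= caliper:
            pairs.append((a_id, b_id))
            del available[b_id]  # each candidate is used at most once
    return pairs

# Hypothetical propensity scores, for illustration only.
non_survivors = {"n1": 0.30, "n2": 0.55, "n3": 0.90}
survivors = {"s1": 0.32, "s2": 0.50, "s3": 0.58, "s4": 0.10}

pairs = greedy_match(non_survivors, survivors, caliper=0.05)
print(pairs)
```

In this toy example, seven patients go in and only two pairs (four patients) come out: `n3`, `s2`, and `s4` have no partner within the caliper. Unmatched patients tend to sit at the extremes of the score distribution, which is the generalizability caveat stated above.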
- Multiple Comparisons Without Correction for Multiplicity
The statistical analysis presents a serious multiplicity problem that is not acknowledged anywhere in the manuscript. Table 2 performs simultaneous comparisons across approximately 40 variables, spanning demographics, comorbidities, symptoms, medications, and laboratory values, each tested at a conventional α = 0.05 threshold using independent t-tests and χ² / Fisher's exact tests. No correction for multiple comparisons is applied or even discussed. At α = 0.05 across 40 simultaneous tests, approximately 2 false positives would be expected by chance alone even if no true associations existed. This is a fundamental concern for the validity of several reported findings. The authors should apply an appropriate correction; either Bonferroni correction or a false discovery rate adjustment such as Benjamini-Hochberg and reassess their conclusions accordingly.
- Thank you for this critical and statistically valid observation. We have added the following text to Section 3.3, page 7, line 248: "To address the multiplicity of comparisons in Table 2, Bonferroni-corrected p-values are reported (adjusted α = 0.05/40 = 0.00125). After correction, 17 of 22 originally significant associations remained significant. Notably, CMV viral load (original p = 0.0462) and CD4+ cell count (original p = 0.0054) did not survive Bonferroni correction, and these findings should be interpreted as hypothesis-generating."
- Table 2: Added: "Given the large number of simultaneous comparisons, the Benjamini-Hochberg false discovery rate (FDR) correction was applied to all p-values in the post-propensity score matched analysis to control the expected proportion of false discoveries. Variables that did not remain significant after adjustment (q-values) are identified in the Table 2 footnote."
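For readers comparing the two corrections mentioned in this response, the following is a minimal sketch of the Benjamini-Hochberg step-up adjustment next to a Bonferroni threshold. The helper `benjamini_hochberg` and the p-values are hypothetical and are not the Table 2 results.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(p_values)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical p-values for illustration only; NOT the Table 2 results.
example_p = [0.001, 0.008, 0.012, 0.041, 0.20]

q = benjamini_hochberg(example_p)
bh_significant = [qi <= 0.05 for qi in q]

# Bonferroni compares each raw p-value with alpha / m instead.
bonferroni_significant = [p <= 0.05 / len(example_p) for p in example_p]

print(bh_significant)
print(bonferroni_significant)
```

With these toy values, Benjamini-Hochberg retains three findings and Bonferroni retains two, which shows how the choice of correction can change which associations are reported as significant.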
- Conflation of Association and Risk Factor Language
Throughout the Results and Discussion, variables that are associated with mortality in a matched cohort are repeatedly described in language that implies they are predictive risk factors or independent predictors (eg the figure title reads "Independent Predictors of Mortality"). No multivariable logistic regression or Cox proportional hazards model is presented.
- Thank you for this important point. The reviewer is correct that the study design does not support the use of terms such as "independent predictors", as no multivariable logistic regression or Cox proportional hazards model was performed. The term has now been removed from the figure title.
- Figure 1 title: Changed from "Independent Predictors of Mortality from CMV DNAemia at 90 Days Following Propensity Score Matching" to "Clinical Factors Associated with 90-Day All-Cause Mortality in Patients with CMV DNAemia After Propensity Score Matching."
- Tacrolimus Finding Needs Explanation
The finding that survivors had higher tacrolimus exposure than non-survivors is biologically counterintuitive given that tacrolimus is an immunosuppressant. The authors mention it only in passing in the Results. This deserves some discussion.
- Thank you for highlighting this. The finding that survivors had higher tacrolimus exposure (32.4% vs. 15.5%) is indeed counterintuitive, given that tacrolimus is an immunosuppressant known to increase CMV risk. However, this finding likely reflects confounding by indication: tacrolimus use in this NHNT cohort may identify a subgroup of patients with autoimmune or inflammatory conditions (rather than malignancy) who have a fundamentally different prognosis and disease trajectory.
- Section 4.3 (Discussion: Immunosuppression and CMV Reactivation), page 10, after line 328: Added a new paragraph: "The observation that survivors had significantly higher tacrolimus exposure than non-survivors (32.4% vs. 15.5%, p < 0.0001) is counterintuitive, given that tacrolimus is an immunosuppressant associated with increased CMV risk. This finding likely reflects confounding by indication: tacrolimus use in this NHNT cohort may identify a subgroup of patients with autoimmune or inflammatory conditions (rather than malignancy) who have a fundamentally different prognosis and disease trajectory. This association should not be interpreted as a protective effect of tacrolimus and requires further investigation in prospective studies."
- Absence of a Control Group
Can the authors include a control group without CMV DNAemia? If not this should be listed as a limitation.
- Thank you for this suggestion. The inclusion of a control group without CMV DNAemia would indeed strengthen the study. However, the TriNetX platform does not allow the construction of a well-matched control cohort without CMV DNAemia that is comparable in terms of underlying disease severity, immunosuppressive burden, and comorbidity profile, as the indication for CMV viral load testing itself introduces selection bias. This is now explicitly listed as a limitation.
- Section 4.7. Limitations, page 12, line 448: Added: "A major limitation of this study is the absence of a comparator group without CMV DNAemia. Without such a control group, it is not possible to determine whether CMV DNAemia independently increases risk compared with similar patients without detectable viremia. The TriNetX platform does not readily permit the construction of a well-matched control cohort, as the indication for CMV viral load testing itself introduces selection bias. This limitation restricts the ability to address whether CMV DNAemia is a causal contributor to adverse outcomes or merely a marker of underlying disease severity."
- Viral Load Threshold Justification
The 5,000 IU/mL cutoff is described as "pragmatic" with no universally accepted alternative. While the authors acknowledge this, they should more clearly articulate the potential consequence of this threshold: what happens if this threshold is changed?
- Thank you for this comment. The 5,000 IU/mL threshold was chosen pragmatically to reduce the likelihood of capturing clinically insignificant low-level reactivation. As the ECIL 10 guidelines note, a universal viral load threshold for clinically significant CMV infection cannot be defined because patient groups vary greatly. Sensitivity analyses at alternative thresholds (e.g., 1,000 IU/mL and 10,000 IU/mL) would be informative but are constrained by the TriNetX platform's analytic capabilities. The potential consequences of threshold selection are now discussed.
- Section 4.7. Limitations, page 13: Added: "The 5,000 IU/mL viral load threshold used to define CMV DNAemia lacks validation across populations, and the robustness of findings at alternative thresholds has not been assessed. This represents a limitation that may affect the generalizability of the results."
Minor Comments
- Missing sentence
The final sentence of section 4.6 (Ferritin) is incomplete: "Further investigation related to the interaction of CMV and ferritin,"
- Thank you for catching this. The sentence is now complete. Page 13, line 433: "Further investigation related to the interaction of CMV and ferritin, including prospective studies comparing ferritin levels in patients with and without CMV DNAemia matched for underlying disease severity, is warranted to clarify whether elevated ferritin reflects CMV-specific pathology or the overall burden of critical illness."
- CD4+ Count Interpretation
The significantly lower CD4+ count in non-survivors (190 vs. 408 cells/µL) is a compelling finding, but the authors should note that CD4+ testing in non-HIV patients is non-standardized and may be ordered selectively (i.e., in sicker patients or those with suspected immunodeficiency), introducing ascertainment bias in this variable.
- Thank you for this insightful comment. CD4+ testing in non-HIV patients is indeed non-standardized and is typically ordered selectively in patients with suspected immunodeficiency or those who are clinically sicker, introducing ascertainment bias. This caveat will be added to the discussion.
- Section 4.3 (Discussion), page 10, line 354: the following sentences were added: "However, this finding should be interpreted with caution, as CD4+ testing in non-HIV patients is non-standardized and is typically ordered selectively in patients with suspected immunodeficiency or those who are clinically more ill, introducing potential ascertainment bias. Patients in the non-survivor group may have been more likely to undergo CD4+ testing because of their greater clinical severity, potentially inflating the observed difference between groups."
Reviewer 2 Report
Comments and Suggestions for Authors
This manuscript examines the clinical characteristics and outcomes of cytomegalovirus (CMV) DNAemia in non-HIV-infected, non-transplant (NHNT) patients using a large, retrospective dataset derived from the TriNetX global research network. The authors aim to describe morbidity, mortality, and factors associated with 90-day mortality in this understudied population, employing propensity score matching to compare survivors and non-survivors. The study addresses an important and clinically relevant gap, as CMV reactivation outside traditional immunocompromised populations remains poorly characterized. The relatively large sample size and use of an international database are strengths. However, substantial methodological and interpretative limitations significantly weaken the conclusions, and major revisions are required before the manuscript can be considered for publication.
A central concern is the persistent ambiguity between association and causation throughout the manuscript. While the authors acknowledge that CMV DNAemia may represent a marker of illness severity rather than a causal driver of outcomes, the discussion frequently reverts to mechanistic interpretations implying pathogenic roles of CMV. For example, extensive speculation regarding immune evasion, ferritin-mediated inflammation, and thrombosis suggests biological causality that is not supported by the study design. Given the retrospective and observational nature of the dataset, the findings should be strictly framed as associative. The manuscript would benefit from a clearer conceptual framework early in the discussion explicitly distinguishing CMV DNAemia as a potential biomarker rather than a mediator of disease, and mechanistic interpretations should be substantially tempered or removed.
A second major issue relates to the definition of CMV DNAemia using a viral load threshold of greater than or equal to 5,000 IU/mL, which is described as pragmatic but lacks validation across populations. The absence of sensitivity analyses using alternative thresholds raises concerns about misclassification and robustness of the findings. Furthermore, viral load is used both as an inclusion criterion and as a predictor of mortality, introducing conceptual circularity. The authors should justify this cut-off with existing literature where possible and perform sensitivity analyses using different thresholds to demonstrate stability of the results.
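A threshold sensitivity analysis of the kind requested here can be sketched in a few lines. The records, the helper `mortality_at_threshold`, and the alternative cut-offs are hypothetical and serve only to show the shape of the analysis; they are not drawn from the study data.

```python
# Hypothetical records: (peak viral load in IU/mL, died within 90 days).
records = [
    (1200, False), (3400, False), (5200, False), (6100, True),
    (8000, False), (12000, True), (45000, True), (150000, False),
]

def mortality_at_threshold(rows, threshold):
    """Return (cohort size, crude mortality) for rows at or above threshold."""
    cohort = [died for load, died in rows if load >= threshold]
    if not cohort:
        return 0, float("nan")
    return len(cohort), sum(cohort) / len(cohort)

# Re-derive the cohort and its crude mortality at each candidate cut-off.
for threshold in (1000, 5000, 10000):
    n, mortality = mortality_at_threshold(records, threshold)
    print(threshold, n, round(mortality, 3))
```

If the estimate moves substantially as the cut-off changes, the headline result depends on the inclusion definition, which is the robustness concern raised above.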
The study is also heavily affected by confounding, particularly by underlying illness severity. The cohort is dominated by patients with significant comorbidities, including a high prevalence of neoplasms (63%) and extensive glucocorticoid exposure (91%) suggesting a population that is already severely ill or immunocompromised. Many variables identified as predictors of mortality, such as acute respiratory failure, hypotension, encephalopathy, and malnutrition, are well-established markers of critical illness rather than CMV-specific effects. Without adequate adjustment for baseline severity, it is not possible to disentangle whether CMV DNAemia contributes to outcomes or merely reflects the underlying disease burden. The authors should incorporate more robust measures of illness severity, perform additional multivariable analyses, and consider stratified analyses (e.g., ICU versus non-ICU populations, malignancy versus non-malignancy) to better address this issue.
Related to this, the propensity score matching approach is insufficiently specified and appears incomplete. Matching was performed using a limited set of variables (e.g., anaemia, hypertension, neoplasms, diabetes, and heart failure) omitting critical confounders such as severity of illness, immunosuppressive burden, and sepsis. This raises serious concerns about residual confounding and the validity of comparisons between survivors and non-survivors. The manuscript should provide a clear rationale for variable selection, include additional clinically relevant covariates in the propensity model, and present comprehensive balance diagnostics to demonstrate adequate matching.
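One widely used balance diagnostic is the standardized mean difference (SMD), with absolute values below 0.1 often taken as a conventional sign of adequate balance. The sketch below computes it for a binary covariate; the function name `smd_binary` and the prevalences are hypothetical and not taken from the manuscript.

```python
from math import sqrt

def smd_binary(p_a, p_b):
    """Standardized mean difference for a binary covariate.

    p_a, p_b: prevalence of the covariate in each group (0 to 1).
    """
    pooled_var = (p_a * (1 - p_a) + p_b * (1 - p_b)) / 2
    return (p_a - p_b) / sqrt(pooled_var)

# Hypothetical prevalences of one comorbidity, for illustration only.
smd_before = smd_binary(0.70, 0.50)  # before matching
smd_after = smd_binary(0.60, 0.59)   # after matching

print(round(smd_before, 3), round(smd_after, 3))
```

Reporting this value for every covariate before and after matching, including covariates left out of the propensity model, would show readers how much imbalance remains.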
The interpretation of medication use is another area requiring revision. Observed differences in antiviral therapy (e.g., higher ganciclovir use among non-survivors and higher valganciclovir use among survivors) are presented descriptively but may be misinterpreted as reflecting treatment effects. In reality, these associations are highly susceptible to confounding by indication, where sicker patients are more likely to receive more aggressive therapy. The authors should explicitly acknowledge this bias and avoid any implication of causal treatment effects. Consideration of time-dependent biases, such as immortal time bias, would further strengthen the analysis.
The choice of all-cause mortality as the primary outcome is appropriate given database limitations, but the manuscript repeatedly interprets findings in a CMV-specific context. Since the TriNetX dataset lacks histopathological confirmation and clinical adjudication, CMV-attributable mortality cannot be assessed. Therefore, conclusions should be strictly limited to associations between CMV DNAemia and overall mortality, and any inference regarding CMV-specific pathogenicity should be removed or clearly framed as speculative.
Another critical limitation is the absence of a comparator group without CMV DNAemia. Without such a control group, the study cannot determine whether CMV DNAemia independently increases risk compared to similar patients without detectable viremia. This limits the ability to address the manuscript’s central clinical question. If inclusion of a control cohort is not feasible, this limitation should be more explicitly acknowledged and emphasized.
The reliance on ICD-10 codes to define CMV end-organ disease introduces additional uncertainty. As the authors note, histopathological confirmation is not available and ICD coding for CMV disease is known to be imprecise. As such, statements regarding the incidence or distribution of CMV organ involvement should be interpreted with caution and toned down accordingly.
Temporal relationships between variables are also insufficiently defined. It is unclear whether symptoms, laboratory abnormalities, and medication exposures occurred before or after the detection of CMV DNAemia. This ambiguity complicates interpretation, as many identified "predictors" may in fact represent consequences of disease progression rather than antecedent risk factors. Clarification of timing is essential to support any predictive claims.
Several statistical concerns further limit confidence in the findings. The analysis involves numerous comparisons across a wide range of variables without adjustment for multiple testing, increasing the risk of false-positive results. Additionally, several continuous variables, including viral load and laboratory parameters, demonstrate extreme variability, suggesting non-normal distributions that may not be appropriately analysed using parametric methods. The authors should consider data transformation, non-parametric testing, and reporting of medians with interquartile ranges.
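The point about skewed variables can be made concrete with a small sketch that reports a median with interquartile range and computes a rank-based Mann-Whitney U statistic. The viral loads and helper names are hypothetical; a p-value would normally come from a statistics package and is not computed here.

```python
from statistics import mean, median, quantiles

def median_iqr(values):
    """Return (median, Q1, Q3) using inclusive quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3

def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x; ties count as 0.5."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5
    return u

# Hypothetical viral loads (IU/mL) with one extreme value in group_a.
group_a = [5200, 6100, 7400, 9800, 250000]
group_b = [5100, 5600, 6000, 6900, 8000]

print(mean(group_a), median_iqr(group_a))
print(mann_whitney_u(group_a, group_b))
```

Here a single extreme value pulls the mean of `group_a` far above its median, while the rank-based statistic depends only on ordering, which is why medians with interquartile ranges and non-parametric tests are the usual choice for such data.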
Beyond these major issues, the cohort itself is highly heterogeneous, encompassing patients with malignancy, critical illness, and various forms of immunosuppression. While this reflects real-world complexity, it reduces interpretability without appropriate subgroup analyses. Similarly, the discussion of ferritin as a mechanistic biomarker and the proposed association between CMV and venous thromboembolism appear overstated given the nonspecific nature of these findings and the high prevalence of confounding conditions such as cancer and critical illness. The extensive immunological discussion, while informative, is not directly supported by the data and should be condensed.
In summary, this study addresses an important clinical question and leverages a large dataset, but its current form is limited by substantial confounding, methodological weaknesses, and overinterpretation of associative findings. The manuscript would benefit from a clearer focus on its observational nature, more rigorous statistical adjustment, improved transparency in methodology, and a more cautious interpretation of results. With these revisions, the study could provide valuable descriptive insights into CMV DNAemia in NHNT populations, but in its current state, major revision is required.
Author Response
Reviewer #2
This manuscript examines the clinical characteristics and outcomes of cytomegalovirus (CMV) DNAemia in non-HIV-infected, non-transplant (NHNT) patients using a large, retrospective dataset derived from the TriNetX global research network. The authors aim to describe morbidity, mortality, and factors associated with 90-day mortality in this understudied population, employing propensity score matching to compare survivors and non-survivors. The study addresses an important and clinically relevant gap, as CMV reactivation outside traditional immunocompromised populations remains poorly characterized. The relatively large sample size and use of an international database are strengths. However, substantial methodological and interpretative limitations significantly weaken the conclusions, and major revisions are required before the manuscript can be considered for publication.
A central concern is the persistent ambiguity between association and causation throughout the manuscript. While the authors acknowledge that CMV DNAemia may represent a marker of illness severity rather than a causal driver of outcomes, the discussion frequently reverts to mechanistic interpretations implying pathogenic roles of CMV. For example, extensive speculation regarding immune evasion, ferritin-mediated inflammation, and thrombosis suggests biological causality that is not supported by the study design. Given the retrospective and observational nature of the dataset, the findings should be strictly framed as associative. The manuscript would benefit from a clearer conceptual framework early in the discussion explicitly distinguishing CMV DNAemia as a potential biomarker rather than a mediator of disease, and mechanistic interpretations should be substantially tempered or removed.
- Thank you for this important critique. A conceptual framework paragraph will be added early in the Discussion explicitly framing CMV DNAemia as a potential biomarker of illness severity rather than a confirmed mediator of disease, and mechanistic speculation will be substantially tempered throughout.
- Section 4.1 (Discussion: Overview of Key Findings), page 9, line 267: Added a new paragraph at the beginning of the Discussion: "Before interpreting the findings of this study, it is essential to establish a clear conceptual framework. CMV DNAemia in NHNT patients may represent: (1) a pathogenic driver of adverse outcomes through direct tissue injury and immune dysregulation; (2) a bystander marker of severe immunosuppression and critical illness; or (3) a combination of both. The retrospective, observational design of this study cannot distinguish among these possibilities. All findings reported herein should be interpreted as associations, not causal relationships. Mechanistic hypotheses discussed below are offered as context for future hypothesis-generating research and should not be construed as evidence of causality."
- Mechanistic hypotheses are now clearly stated as speculative.
A second major issue relates to the definition of CMV DNAemia using a viral load threshold of greater than or equal to 5,000 IU/mL, which is described as pragmatic but lacks validation across populations. The absence of sensitivity analyses using alternative thresholds raises concerns about misclassification and robustness of the findings. Furthermore, viral load is used both as an inclusion criterion and as a predictor of mortality, introducing conceptual circularity. The authors should justify this cut-off with existing literature where possible and perform sensitivity analyses using different thresholds to demonstrate stability of the results.
- Thank you for this comment. The 5,000 IU/mL threshold was chosen pragmatically to reduce the likelihood of capturing clinically insignificant low-level reactivation. As the ECIL 10 guidelines note, a universal viral load threshold for clinically significant CMV infection cannot be defined because patient groups vary greatly. Sensitivity analyses at alternative thresholds (e.g., 1,000 IU/mL and 10,000 IU/mL) would be informative, but the TriNetX platform's analytic capabilities unfortunately preclude them. The potential consequences of threshold selection are now discussed.
- Section 4.7. Limitations, page 12: Added: "The 5,000 IU/mL viral load threshold used to define CMV DNAemia lacks validation across populations, and the robustness of findings at alternative thresholds has not been assessed. This represents a limitation that may affect the generalizability of the results."
- Section 4.1 (Results), page 9, line 289: Added: "Also, the association between mortality and CMV DNAemia disappeared after Bonferroni correction."
The study is also heavily affected by confounding, particularly by underlying illness severity. The cohort is dominated by patients with significant comorbidities, including a high prevalence of neoplasms (63%) and extensive glucocorticoid exposure (91%) suggesting a population that is already severely ill or immunocompromised. Many variables identified as predictors of mortality, such as acute respiratory failure, hypotension, encephalopathy, and malnutrition, are well-established markers of critical illness rather than CMV-specific effects. Without adequate adjustment for baseline severity, it is not possible to disentangle whether CMV DNAemia contributes to outcomes or merely reflects the underlying disease burden. The authors should incorporate more robust measures of illness severity, perform additional multivariable analyses, and consider stratified analyses (e.g., ICU versus non-ICU populations, malignancy versus non-malignancy) to better address this issue.
- Section 4.1 (Discussion), pages 8–9: Added: "It is important to recognize that many of the clinical factors associated with mortality in this analysis are nonspecific markers of critical illness. Without adequate adjustment for baseline severity, disentangling the contribution of CMV DNAemia from the underlying disease burden is not possible."
- Section 4.7. Limitations, page 13: Added: "The TriNetX database does not provide validated illness severity scores (e.g., APACHE II, SOFA), precluding adequate adjustment for baseline severity of illness. Many variables associated with mortality in this analysis, including acute respiratory failure, hypotension, encephalopathy, and malnutrition, are well-established markers of critical illness and may reflect the underlying disease burden rather than CMV-specific effects. Stratified analyses by ICU vs. non-ICU status and malignancy vs. non-malignancy subgroups were not performed due to platform constraints and anticipated small subgroup sizes, but would be valuable in future studies with more granular data."
Related to this, the propensity score matching approach is insufficiently specified and appears incomplete. Matching was performed using a limited set of variables (e.g., anaemia, hypertension, neoplasms, diabetes, and heart failure) omitting critical confounders such as severity of illness, immunosuppressive burden, and sepsis. This raises serious concerns about residual confounding and the validity of comparisons between survivors and non-survivors. The manuscript should provide a clear rationale for variable selection, include additional clinically relevant covariates in the propensity model, and present comprehensive balance diagnostics to demonstrate adequate matching.
- Thank you for this important observation. The five covariates selected for propensity score matching (anemia, hypertension, neoplasms, type 2 diabetes mellitus, and heart failure) were chosen because they represent the most prevalent comorbidities in the cohort that are independently associated with all-cause mortality in the general population, and because the TriNetX platform's built-in PSM module constrains the number of covariates that can be included while maintaining adequate matching. Variables such as neutropenia, glucocorticoid use, and malignancy subtype were not included in the PSM model because they were considered potential mediators on the causal pathway between immunosuppression and mortality rather than baseline confounders, and their inclusion could introduce overadjustment bias. However, the reviewer's concern about residual confounding is well taken, and this limitation is now explicitly acknowledged.
- Section 2.4 (Statistical Analysis), page 5, line 171: We have added the following text after the sentence listing the matched covariates: "These five covariates were selected because they represent the most prevalent comorbidities in the cohort with established independent associations with all-cause mortality. Variables such as neutropenia, glucocorticoid use, and malignancy subtype were not included in the propensity model because they were considered potential mediators on the causal pathway between immunosuppression and CMV-related outcomes, and their inclusion could introduce overadjustment bias. Additionally, the TriNetX platform's PSM module imposes practical constraints on the number of covariates that can be simultaneously matched. We acknowledge that the omission of these variables may result in residual confounding, and this is discussed as a study limitation."
- Section 4.7. Limitations, page 13, line 442: we added: "The propensity score model included only five covariates, and important confounders such as severity of illness scores, immunosuppressive burden, neutropenia, and sepsis were not available for matching within the TriNetX platform. Residual confounding, therefore, cannot be excluded, and the findings should be interpreted accordingly."
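For readers unfamiliar with the balance diagnostics requested above, the following is a minimal sketch of one commonly used diagnostic, the standardized mean difference (SMD) for a binary covariate. It is illustrative only and is not the authors' analysis or the TriNetX implementation; the function name `smd_binary` and the example prevalences are hypothetical, and the threshold of 0.1 is a widely used convention rather than a value taken from the manuscript.

```python
import math


def smd_binary(p_a: float, p_b: float) -> float:
    """Standardized mean difference for a binary covariate.

    p_a and p_b are the covariate prevalences (as proportions) in the two
    groups being compared, e.g. non-survivors and survivors after matching.
    """
    diff = p_a - p_b
    # Average of the two binomial variances (the usual pooled form).
    pooled_var = (p_a * (1 - p_a) + p_b * (1 - p_b)) / 2
    if pooled_var == 0:
        # Both prevalences are 0 or 1: identical groups give 0,
        # completely separated groups give an infinite difference.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical post-matching prevalences for a single covariate.
    value = smd_binary(0.63, 0.61)
    # |SMD| < 0.1 is conventionally read as adequate balance.
    print(f"SMD = {value:.3f}, balanced = {abs(value) < 0.1}")
```

Reporting this quantity for every matched covariate, before and after matching, is one way to present the balance diagnostics the reviewer asks for.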
The interpretation of medication use is another area requiring revision. Observed differences in antiviral therapy (e.g., higher ganciclovir use among non-survivors and higher valganciclovir use among survivors) are presented descriptively but may be misinterpreted as reflecting treatment effects. In reality, these associations are highly susceptible to confounding by indication, where sicker patients are more likely to receive more aggressive therapy. The authors should explicitly acknowledge this bias and avoid any implication of causal treatment effects. Consideration of time-dependent biases, such as immortal time bias, would further strengthen the analysis.
- Thank you for this important point. These differences may well reflect treatment allocation patterns driven by clinical severity rather than treatment effects. Unfortunately, time-dependent biases, including immortal time bias, cannot be assessed within the TriNetX platform.
- Section 4.7. Limitations, page 13: Add: "Differences in antiviral prescribing patterns between survivors and non-survivors are confounded by indication and immortal-time bias and should not be interpreted as evidence of treatment efficacy or harm."
The choice of all-cause mortality as the primary outcome is appropriate given database limitations, but the manuscript repeatedly interprets findings in a CMV-specific context. Since the TriNetX dataset lacks histopathological confirmation and clinical adjudication, CMV-attributable mortality cannot be assessed. Therefore, conclusions should be strictly limited to associations between CMV DNAemia and overall mortality, and any inference regarding CMV-specific pathogenicity should be removed or clearly framed as speculative.
- Thank you. This concern is addressed by the conceptual framework paragraph added in response to comments above. Additionally, all instances where findings are interpreted in a CMV-specific context will be revised to explicitly state that the outcome is all-cause mortality and that CMV-attributable mortality cannot be assessed.
Another critical limitation is the absence of a comparator group without CMV DNAemia. Without such a control group, the study cannot determine whether CMV DNAemia independently increases risk compared to similar patients without detectable viremia. This limits the ability to address the manuscript's central clinical question. If inclusion of a control cohort is not feasible, this limitation should be more explicitly acknowledged and emphasized.
- Thank you for this suggestion. The inclusion of a control group without CMV DNAemia would indeed strengthen the study. However, the TriNetX platform does not allow the construction of a well-matched control cohort without CMV DNAemia that is comparable in terms of underlying disease severity, immunosuppressive burden, and comorbidity profile, as the indication for CMV viral load testing itself introduces selection bias. This is now explicitly listed as a limitation.
- Section 4.7. Limitations, page 12, line 448: Add: "A major limitation of this study is the absence of a comparator group without CMV DNAemia. Without such a control group, it is not possible to determine whether CMV DNAemia independently increases risk compared with similar patients without detectable viremia. The TriNetX platform does not readily permit the construction of a well-matched control cohort, as the indication for CMV viral load testing itself introduces selection bias. This limitation restricts the ability to address whether CMV DNAemia is a causal contributor to adverse outcomes or merely a marker of underlying disease severity."
The reliance on ICD-10 codes to define CMV end-organ disease introduces additional uncertainty. As the authors note, histopathological confirmation is not available, and ICD coding for CMV disease is known to be imprecise. As such, statements regarding the incidence or distribution of CMV organ involvement should be interpreted with caution and toned down accordingly.
- Thank you. The manuscript already acknowledges this limitation, but the language will be strengthened (page 6, line 223).
Temporal relationships between variables are also insufficiently defined. It is unclear whether symptoms, laboratory abnormalities, and medication exposures occurred before or after the detection of CMV DNAemia. This ambiguity complicates interpretation, as many identified "predictors" may in fact represent consequences of disease progression rather than antecedent risk factors. Clarification of timing is essential to support any predictive claims.
- Thank you for this important methodological concern. The TriNetX platform captures symptoms within 30 days and laboratory values within 90 days of the index event, but the temporal sequence (before vs. after CMV DNAemia detection) cannot be reliably determined. This means that many "associated factors" may represent consequences of disease progression rather than antecedent risk factors.
- Section 2.2 (Study Design and Population), page 5, lines 142–146: Added: "Importantly, the temporal relationship between the index CMV DNAemia event and the captured symptoms, laboratory values, and medication exposures cannot be reliably determined within the TriNetX platform. Symptoms were captured within 30 days and laboratory values within 90 days of CMV DNAemia documentation, but whether these occurred before or after viremia detection is unknown."
- Section 4.7. Limitations, page 13: Add: "Importantly, the temporal relationship between the index CMV DNAemia event and the captured symptoms, laboratory values, and medication exposures cannot be reliably determined within the TriNetX platform. Symptoms were captured within 30 days and laboratory values within 90 days of CMV DNAemia documentation, but whether these occurred before or after viremia detection is unknown."
Several statistical concerns further limit confidence in the findings. The analysis involves numerous comparisons across a wide range of variables without adjustment for multiple testing, increasing the risk of false-positive results. Additionally, several continuous variables, including viral load and laboratory parameters, demonstrate extreme variability, suggesting non-normal distributions that may not be appropriately analysed using parametric methods. The authors should consider data transformation, non-parametric testing, and reporting of medians with interquartile ranges.
- Thank you for this critical, statistically valid observation. We have added the following sentence to Section 3.3 (page 7, line 248): "To address the multiplicity of comparisons in Table 2, Bonferroni-corrected p-values are reported (adjusted α = 0.05/40 = 0.00125). After correction, 17 of 22 originally significant associations remained significant. Notably, CMV viral load (original p = 0.0462) and CD4+ cell count (original p = 0.0054) did not survive Bonferroni correction, and these findings should be interpreted as hypothesis-generating."
- Table 2: Add: "Given the large number of simultaneous comparisons, the Benjamini-Hochberg false discovery rate (FDR) correction was applied to all p-values in the post-propensity score matched analysis to control the expected proportion of false discoveries. Variables that did not remain significant after adjustment (q-values) are identified in the Table 2 footnote."
- Unfortunately, because TriNetX provides only aggregated data, we are unable to report medians and interquartile ranges.
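The two corrections named in this response behave differently, and a short sketch may help readers follow the arithmetic. The example below is illustrative only and is not the authors' code: the function names are hypothetical, the Bonferroni threshold uses the α = 0.05 and 40 comparisons quoted above, and the p-values passed to the Benjamini-Hochberg function are invented for demonstration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure.

    Returns, in the original order, True for each hypothesis rejected
    while controlling the false discovery rate at level alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    threshold = bonferroni_threshold(0.05, 40)
    # The two p-values quoted in the response, checked against the threshold.
    for label, p in [("CMV viral load", 0.0462), ("CD4+ cell count", 0.0054)]:
        print(label, "survives Bonferroni:", p <= threshold)
    # Hypothetical p-values, used only to show the step-up behaviour.
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20]))
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate, so the set of variables that remain significant can differ between the two methods.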
Beyond these major issues, the cohort itself is highly heterogeneous, encompassing patients with malignancy, critical illness, and various forms of immunosuppression. While this reflects real-world complexity, it reduces interpretability without appropriate subgroup analyses. Similarly, the discussion of ferritin as a mechanistic biomarker and the proposed association between CMV and venous thromboembolism appear overstated given the nonspecific nature of these findings and the high prevalence of confounding conditions such as cancer and critical illness. The extensive immunological discussion, while informative, is not directly supported by the data and should be condensed.
- Section 4.7. Limitations, page 13: Added: "The cohort is highly heterogeneous, encompassing patients with diverse malignancies, varying degrees of critical illness, and multiple forms of immunosuppression. While this reflects real-world clinical complexity, it reduces interpretability without appropriate subgroup analyses."
- Thank you. The discussions of ferritin as a mechanistic biomarker (Section 4.6) and the proposed association between CMV and VTE (Section 4.5) will be condensed and more explicitly framed as hypothesis-generating rather than conclusive, given the nonspecific nature of these findings in a cohort with high prevalence of cancer and critical illness.
- The immunological discussion was condensed as recommended.
In summary, this study addresses an important clinical question and leverages a large dataset, but its current form is limited by substantial confounding, methodological weaknesses, and overinterpretation of associative findings. The manuscript would benefit from a clearer focus on its observational nature, more rigorous statistical adjustment, improved transparency in methodology, and a more cautious interpretation of results. With these revisions, the study could provide valuable descriptive insights into CMV DNAemia in NHNT populations, but in its current state, major revision is required.
- We thank the reviewer for the very helpful feedback. The changes above will certainly strengthen the manuscript.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have satisfied my concerns.
Reviewer 2 Report
Comments and Suggestions for Authors
Thank you for your reply. I have no further comments.