sMICA/sMICB and Immune Checkpoint in Endometriosis: Toward a Minimally Invasive Diagnostic Model Based on Machine Learning
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This is a strong, well-designed study with high potential impact. The core idea, profiling a comprehensive panel of soluble immune mediators to build a machine learning model for non-invasive endometriosis diagnosis, is novel, timely, and addresses a critical clinical need. The methodological foundation is robust. However, the manuscript's presentation does not yet meet the standard required for publication. Significant revisions are needed, primarily to the Results section, figures, and language, to fully realize the study's merit and ensure clarity for the reader.
1. Results and Figures (Highest Priority)
The current presentation is the manuscript's weakest point and must be improved.
- Figures and Legends: Each figure legend must be a self-contained, detailed description. You must specify:
  - Plot type (e.g., box plot, scatter plot).
  - Data representation (e.g., median with IQR; mean ± SEM).
  - Units of measurement on axes (e.g., pg/mL).
  - Exact statistical annotations (e.g., p = 0.01; ρ = 0.75) directly on the figure or in the legend.
  - Clear group labels (avoid using "control" for subgroup comparisons within the endometriosis cohort; use "adhesion-positive" vs. "adhesion-negative").
- Report Numerical Data: Key findings (e.g., sMICA levels in patients with and without infertility) must include the actual concentration values (median/mean) for each group in the text or, preferably, in a supplementary table summarizing all biomarker levels across groups.
- Correct Statistical Notation: The use of q to represent the correlation coefficient (e.g., q > 0.6) is incorrect and confusing. Replace it with the standard notation (e.g., Spearman's ρ or Pearson's r). Consistently report the coefficient value and its p-value.
- Clarify Model Development: Add a sentence specifying the data split (e.g., 70/30 train/test) and the validation method (e.g., k-fold cross-validation) used for the machine learning models.
2. Methods – Assay Details (Critical)
- Section 2.3 is inadequate. Stating that sEng, sMICA, and sMICB were measured "using previously published methods [21, 22]" is not sufficient for replication. Summarize the key details here: assay type (e.g., ELISA), kit catalog numbers (if commercial), and critical performance characteristics (sensitivity, intra-assay CV) as performed in your laboratory.
- Section 2.2 needs more detail. Specify the initial sample volume, the final dilution factor, and whether samples were run in singlicate or duplicate. Report the assay's sensitivity (lower limit of detection, LLOD) and dynamic range for the key analytes.
Specific Comments by Section
- Abstract: Well-structured and clear. Consider starting the final sentence more actively: "Our results demonstrate…"
- Introduction: Good background. Strengthen the logical bridge to your specific biomarkers. Explicitly state the rationale, based on prior literature, for choosing NKG2D ligands and soluble checkpoints.
- Methods:
  - 2.1: Excellent cohort description. Consider noting the absence of a pre-study power calculation as a limitation in the Discussion.
  - 2.4: The statistical description is outstanding in its detail and rigor.
- Results:
  - Clarify the apparently contradictory statement "p = 0.12, q = 0.023". Explain that the q-value (FDR-corrected) indicates significance.
  - The narrative is logically sound but must be supported by the complete data presentation outlined above.
- Discussion:
  - Well-written, linking findings to the literature. Take care to frame mechanistic explanations (e.g., on sB7.2 binding or TIM-3 shedding) as hypotheses informed by your correlative data, not as proven conclusions.
  - The discussion of the ML model should state more forcefully that external validation in an independent cohort is the primary next step and that its absence is a current limitation.
- Conclusions: Supportable but should be tempered. Use associative language (e.g., "are linked to" or "are associated with") rather than causative language ("contribute to") for the biomarkers' roles. Explicitly mention that the diagnostic model requires prospective validation.
The English requires comprehensive proofreading to achieve a polished, professional standard.
- Fix inconsistent hyphenation/spacing (e.g., "non- invasive" → "non-invasive").
- Correct article usage ("remains significant challenge" → "remains a significant challenge").
- Improve sentence flow to be more direct and idiomatic (e.g., "The results obtained demonstrate…" → "Our results demonstrate…").
- Ensure consistent terminology (e.g., use "sMICB" consistently, not "sMlCB").
Author Response
Dear reviewer,
Thank you for your feedback.
We would like to clarify the following points:
- Results and Figures (Highest Priority)
Figures and Legends. We sincerely thank the reviewer for this valuable feedback. In response, we have thoroughly revised all figures and their legends to ensure clarity, completeness, and adherence to journal standards. Specifically, each legend now includes the exact plot type (scatter plot), a clear description of data representation, units of measurement on all axes (e.g., pg/mL), and precise group labels (lines 203-206, 211-214, 219-222).
Correct Statistical Notation. We have carefully reviewed all instances of correlation reporting throughout the manuscript and replaced the incorrect notation “q” with the appropriate statistic, Spearman’s ρ. In every case, we now report both the correlation coefficient and its exact p-value in the figures, legends, and main text, ensuring consistent and clear statistical reporting (lines 227, 229, 235, 236).
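For readers checking the notation change: Spearman’s ρ is the Pearson correlation of the rank-transformed values, with tied values given their average rank. The stdlib-only sketch below illustrates that definition; the function names and example values are hypothetical, and it does not compute a p-value (in practice, `scipy.stats.spearmanr` returns both the coefficient and a p-value).

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical paired values (e.g., PF vs serum), for illustration only.
print(round(spearman_rho([1, 2, 2, 3], [1, 2, 3, 4]), 3))  # 0.949
```

Because only ranks enter the calculation, any monotone relationship (linear or not) yields ρ = 1, which is why ρ rather than r suits skewed concentration data.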
Clarify Model Development. We appreciate this suggestion. We have now added a clear description of the model development procedure in the Methods section (lines 166-185).
- Methods – Assay Details (Critical)
Section 2.3 is Inadequate. Stating that sEng, sMICA, and sMICB were measured "using previously published methods [21, 22]" is not sufficient for replication. Summarize the key details here: assay type (e.g., ELISA), kit catalog numbers (if commercial), and critical performance characteristics (sensitivity, intra-assay CV) as performed in your laboratory.
Section 2.2 needs more detail. Specify the initial sample volume, final dilution factor, and whether samples were run in singlicate or duplicate. Report the assay's sensitivity (LLOD) and dynamic range for the key analytes.
We sincerely thank the reviewer for this essential methodological comment. We have fully revised Sections 2.2 and 2.3 to provide complete assay transparency and ensure reproducibility (lines 117-118, 123-125, 127-134).
- Specific Comments by Section.
Abstract. We thank the reviewer for this helpful suggestion. We have revised the final sentence of the Abstract to read more actively (line 25).
Introduction. We sincerely thank the reviewer for this insightful suggestion. We have revised the Introduction to strengthen the logical connection between the established pathophysiology of endometriosis and our selection of specific biomarkers (lines 45-49, 54-56, 62-67).
Results. We have revised the statistical reporting to ensure accuracy and clarity. The value previously labeled “q = 0.023” has been corrected to indicate that it is the p-value adjusted for multiple comparisons using the Holm–Bonferroni method. We have updated the corresponding sentence in the Results (lines 201-202) and the Methods section (lines 163-164).
Discussion. We have carefully revised the Discussion to ensure that all mechanistic interpretations are explicitly framed as hypotheses rather than established conclusions. We have also clarified the limitations of our machine learning model, emphasizing the need for external validation in an independent cohort before clinical application (lines 309-318).
Conclusions. We have carefully revised the Conclusions to use strictly associative language (lines 366-367).
Reviewer 2 Report
Comments and Suggestions for Authors
Below I provide my review of the manuscript entitled “sMICA/sMICB Shedding and Immune Checkpoint Dysregulation in Endometriosis: Toward a Non-Invasive Diagnostic Model Based on Machine Learning.”
Here are my suggestions for the paper. The topic is clearly up-to-date, but I see several issues, mainly related to group sample sizes, ASRM interpretation, statistical reporting, and, most importantly, a high risk of overfitting. Therefore, I recommend major revision, and we will see whether the authors are able to substantially improve the work.
First:
There are multiple contradictory sample sizes across the manuscript. You report 67 patients with endometriosis and 16 controls (total 83). However, the ASRM split in Methods is stages I–II (n=18) and III–IV (n=48), which sums to 66, not 67. In Results (three-group split), you report I–II n=15, III n=17, and IV n=28, which sums to 60 (so 7 patients are missing). The caption of Figure 1 gives yet other numbers (e.g., control n=9; stage IV n=26). Please correct the sample sizes so that one consistent set of numbers is used throughout. I suggest creating one table or flowchart (screening → inclusion → analyses for each marker).
Second:
Your inclusion criteria state “peritoneal endometriosis confirmed intraoperatively and histologically.” At the same time, the interpretation of ASRM stages in the Results and Conclusions mixes concepts: ASRM is not a classification of deep infiltrating endometriosis (DIE) per se, and stage IV does not automatically mean “deep infiltrating” (this depends on phenotype: DIE vs endometrioma vs adhesions, etc.). In the Conclusions you directly state that stage IV equals “deep infiltrating,” which is not correct. You therefore need:
- a precise phenotype description (peritoneal-only? endometrioma present? DIE? exact locations, rASRM score + components), and
- if DIE is present, provide a DIE-appropriate classification (e.g., ENZIAN) or at least an operational definition.
Third:
Your control group consists of women after laparoscopy in a “pre-IVF work-up” due to male factor infertility, with no pathology detected. This is not a truly healthy population in a general sense and it carries a risk of confounding (IVF qualification itself, stress, coexisting factors, possible subclinical endometriosis, selection to surgery). Additionally, in the infertility analyses you use the label “control” for the “without infertility” subgroup within endometriosis, which is confusing (Figure 2). Please standardize the group naming.
Fourth:
Soluble checkpoint proteins were measured only in 32 endometriosis patients and 8 controls (total 40), whereas sEng/sMICA/sMICB were measured in all 83 participants. Nevertheless, the narrative draws broad conclusions about “immune checkpoint dysregulation,” although in practice the study is underpowered for stable estimates with multiple markers plus multiple-comparison correction.
Fifth:
The result “p = 0.12, q = 0.023” for the stage IV vs III comparison is illogical under standard FDR correction (q generally should not be lower than p). This suggests either a typo (maybe p=0.012?) or an analysis/reporting error.
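The reviewer's point follows from how adjusted p-values are constructed. In both Holm's step-down procedure and the Benjamini–Hochberg (FDR) procedure, every candidate value is at least as large as the raw p-value it is compared with, so an adjusted p-value can never fall below its raw p-value. The stdlib sketch below implements both procedures (function names and example p-values are hypothetical); under either method, a raw p of 0.12 could not yield an adjusted value of 0.023. Established implementations exist, e.g. `multipletests` in `statsmodels.stats.multitest` with `method="holm"` or `method="fdr_bh"`.

```python
def holm_adjust(pvals):
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending raw p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p is multiplied by m - rank, never by less than 1.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def bh_adjust(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending raw p
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        k = m - pos  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvals[i] * m / k)
        adjusted[i] = running_min
    return adjusted


raw = [0.12, 0.01, 0.04]  # hypothetical raw p-values
print([round(q, 3) for q in holm_adjust(raw)])  # [0.12, 0.03, 0.08]
print([round(q, 3) for q in bh_adjust(raw)])    # [0.12, 0.03, 0.06]
```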
The biggest issue (in my opinion):
The “non-invasive diagnostic” model is based on a small set of features (serum sEng/sMICA/sMICB + age, VAS, infertility, parity) and reports very high performance (AUC 0.94–0.95 and accuracy 0.94 for XGBoost). With this dataset size, there is a major risk that:
- metrics are calculated on the training set or after an unstable split,
- the model is mainly driven by simple separation based on VAS (not biomarkers),
- calibration and clinical utility are not evaluated.
You should improve reporting and methods by including:
- a clear validation scheme: train/test split (how randomized, seed), or preferably nested cross-validation / bootstrap,
- metrics with 95% CI (e.g., bootstrapping),
- model comparisons: clinical-only vs biomarkers-only vs combined (do biomarkers add value beyond VAS/history?),
- calibration (calibration slope/intercept, Brier score) and decision thresholds (sensitivity/specificity/PPV/NPV),
- full XGBoost parameters (max_depth, eta, nrounds…), pre-processing, missing data handling, standardization,
- transparency for SHAP interpretation: on which dataset it was computed and how stable it is.
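As an illustration of the request for metrics with 95% CIs, a percentile bootstrap resamples test-set participants with replacement and recomputes the AUC each time. The stdlib sketch below assumes binary labels (1 = endometriosis, 0 = control) and one predicted score per participant; the names, seed, and data are hypothetical, and the percentile method is only one of several bootstrap interval types. With test sets of a few dozen participants or fewer, such intervals will be wide, which is itself informative. `sklearn.metrics.roc_auc_score` computes the same point estimate.

```python
import random


def auc(y_true, scores):
    """ROC AUC: probability that a random case scores above a random control (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case (1) and one control (0)")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=42):
    """Point AUC plus a percentile bootstrap (1 - alpha) CI, resampling participants."""
    point = auc(y_true, scores)  # also checks that both classes are present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return point, lower, upper


# Hypothetical test-set labels and model scores, for illustration only.
y = [0, 0, 0, 0, 1, 1, 1, 1]
s = [0.1, 0.4, 0.35, 0.8, 0.3, 0.6, 0.7, 0.9]
print(bootstrap_auc_ci(y, s))
```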
In the Discussion, when comparing peritoneal fluid (PF) and serum, and when discussing the concept of “local pelvic environment vs systemic signal” and the limitations of an IVF-workup control group, you should address the fact that PF biomarkers can correlate more strongly with phenotype (e.g., infertility) than serum markers. You suggest this for sMICA/sMICB, and it is key for the credibility of your predictive model. I recommend that you discuss the ZEB findings and cite: https://doi.org/10.3390/biomedicines10102460.
Throughout the manuscript the terminology “control” is very inconsistent — sometimes meaning “no endometriosis,” sometimes “no infertility,” and sometimes “no adhesions.” This needs to be fixed, otherwise readers will be lost.
Also, in practice your model uses blood + VAS. This is at best “minimally invasive” / “blood-based,” not strictly “non-invasive.” It is worth naming it precisely, otherwise the editors may challenge it.
Overall, I think the manuscript has potential for publication, but there is still a lot of work to do.
Author Response
Dear reviewer,
Thank you for your feedback.
We would like to clarify the following points:
- There are multiple contradictory sample sizes across the manuscript.
We sincerely thank the reviewer for this careful observation. We acknowledge that the initial description contained inconsistencies in sample sizes, and we have now thoroughly revised the manuscript to ensure full consistency across all sections (lines 92, 188, 200).
The total number of participants is 82: 66 women with endometriosis and 16 controls. The discrepancy in the Results section (where the three-group ASRM split sums to n = 60) arises because sMICB levels in peritoneal fluid could not be quantified in 6 patients due to insufficient sample volume or assay limitations.
- Your inclusion criteria state “peritoneal endometriosis confirmed intraoperatively and histologically.” At the same time, the interpretation of ASRM stages in Results and Conclusions mixes concepts: ASRM is not a DIE classification per se, and stage IV does not automatically mean “deep infiltrating” (this depends on phenotype: DIE vs endometrioma vs adhesions, etc.). In the Conclusions you directly state that stage IV equals “deep infiltrating,” which is not correct.
We sincerely thank the reviewer for this crucial clarification. We fully agree that rASRM stage IV does not automatically equate to deep infiltrating endometriosis (DIE), as stage IV can also result from large endometriomas or extensive adhesions without true deep infiltration.
In response, we have removed all statements equating rASRM stage IV with “deep infiltrating endometriosis” from the Results and Conclusions (line 364). All analyses are now based strictly on the rASRM classification—specifically, the total score and assigned stage—as a standardized, surgically documented measure of disease severity, without assuming or implying a specific phenotype.
- Your control group consists of women after laparoscopy in a “pre-IVF work-up” due to male factor infertility, with no pathology detected. This is not a truly healthy population in a general sense and it carries a risk of confounding (IVF qualification itself, stress, coexisting factors, possible subclinical endometriosis, selection to surgery). Additionally, in the infertility analyses you use the label “control” for the “without infertility” subgroup within endometriosis, which is confusing (Figure 2). Please standardize the group naming.
We thank the reviewer for this important observation.
First, we do not describe our control group as “healthy” in the general population sense. Instead, we explicitly state that these women underwent diagnostic laparoscopy as part of a pre-IVF work-up due to male-factor infertility, and no endometriotic lesions or other gynecological pathology were identified intraoperatively. Given that laparoscopy remains the gold standard for ruling out peritoneal endometriosis—especially superficial or atypical forms not detectable by imaging—we consider this group to be confirmed endometriosis-free, which is the critical criterion for our immunological comparisons. We acknowledge the potential for residual confounding (e.g., stress, hormonal stimulation), and we have added a sentence in the Limitations section addressing this (lines 319-324).
Second, we fully agree that using the term “control” for the “infertility-negative” subgroup within the endometriosis cohort was misleading. As noted in our previous revisions, we have standardized all group labels (lines 203-206, 211-214, 219-222).
- Soluble checkpoint proteins were measured only in 32 endometriosis patients and 8 controls (total 40), while sEng/sMICA/sMICB were measured in all 83 participants. Despite that, the narrative suggests broad conclusions on “immune checkpoint dysregulation,” but in practice the study is underpowered for stable estimates with multiple markers plus multiple-comparison correction.
We fully acknowledge that soluble immune checkpoint proteins were assessed in a subset of participants (32 patients with endometriosis and 8 controls) because of sample availability and assay constraints, while sMICA/sMICB/sEng were measured in the full cohort (n = 82). Consequently, our analyses of immune checkpoint correlations and associations are exploratory and hypothesis-generating, not definitive. We have added a statement to this effect to the Limitations section (lines 304-308).
- The result “p = 0.12, q = 0.023” for the stage IV vs III comparison is illogical under standard FDR correction (q generally should not be lower than p). This suggests either a typo (maybe p=0.012?) or an analysis/reporting error.
We thank the reviewer for this observation. We have updated the corresponding sentence in the Results (line 202) and the Methods section (lines 163-164).
- The “non-invasive diagnostic” model is based on a small set of features (serum sEng/sMICA/sMICB + age, VAS, infertility, parity) and reports very high performance (AUC 0.94–0.95 and accuracy 0.94 for XGBoost). With this dataset size, there is a major risk that:
- metrics are calculated on the training set or after an unstable split,
- the model is mainly driven by simple separation based on VAS (not biomarkers),
- calibration and clinical utility are not evaluated.
You should improve reporting and methods by including:
- a clear validation scheme: train/test split (how randomized, seed), or preferably nested cross-validation / bootstrap,
- metrics with 95% CI (e.g., bootstrapping),
- model comparisons: clinical-only vs biomarkers-only vs combined (do biomarkers add value beyond VAS/history?),
- calibration (calibration slope/intercept, Brier score) and decision thresholds (sensitivity/specificity/PPV/NPV),
- full XGBoost parameters (max_depth, eta, nrounds…), pre-processing, missing data handling, standardization,
- transparency for SHAP interpretation: on which dataset it was computed and how stable it is.
We thank the reviewer for these valuable comments. We fully acknowledge that, given our sample size (n = 82 total, with 66 endometriosis cases), there is a non-negligible risk of overfitting, despite our use of a held-out test set (17 participants) and early stopping based on validation performance. To address this concern:
Performance metrics (AUC = 0.95, accuracy = 0.94) are reported exclusively on the independent test set, not on training or validation data. The dataset was split once into training (n = 49), validation (n = 16), and test (n = 17) subsets using stratified random sampling to preserve case–control balance.
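The split described above (one stratified random split of 82 participants into 49 training, 16 validation, and 17 test) can be sketched as follows, including a fixed seed as the reviewer requested. This stdlib-only illustration is not the authors' code: the function names, the seed value, and the largest-remainder allocation of slots across classes are assumptions. A common alternative is two successive calls to scikit-learn's `train_test_split` with the `stratify` argument.

```python
import random
from collections import defaultdict


def allocate(total, class_sizes, n):
    """Distribute `total` slots across classes proportionally (largest-remainder rounding)."""
    exact = {c: total * size / n for c, size in class_sizes.items()}
    alloc = {c: int(v) for c, v in exact.items()}
    leftover = total - sum(alloc.values())
    # Give the remaining slots to the classes with the largest fractional parts.
    for c in sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)[:leftover]:
        alloc[c] += 1
    return alloc


def stratified_three_way_split(labels, n_val, n_test, seed=2024):
    """Return (train, val, test) index lists that preserve the class balance."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    sizes = {c: len(ix) for c, ix in by_class.items()}
    test_alloc = allocate(n_test, sizes, len(labels))
    val_alloc = allocate(n_val, sizes, len(labels))
    rng = random.Random(seed)  # a fixed, reported seed makes the split reproducible
    train, val, test = [], [], []
    for c in sorted(by_class):
        ix = by_class[c][:]
        rng.shuffle(ix)
        t, v = test_alloc[c], val_alloc[c]
        test += ix[:t]
        val += ix[t:t + v]
        train += ix[t + v:]
    return sorted(train), sorted(val), sorted(test)


# 66 endometriosis cases (1) and 16 controls (0), matching the revised cohort size.
labels = [1] * 66 + [0] * 16
train, val, test = stratified_three_way_split(labels, n_val=16, n_test=17)
print(len(train), len(val), len(test))  # 49 16 17
```

With these counts, the sketch places 14 cases and 3 controls in the test set; the authors' actual allocation is not stated here.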
While VAS pain score was indeed among the top predictors (alongside serum sMICB), SHAP analysis (Figure 9) confirms that both clinical and biomarker features contributed meaningfully to predictions.
Calibration and clinical utility: We agree that calibration curves, decision curve analysis, and net reclassification improvement were not performed, primarily because of the limited sample size. We have added a statement to this effect in the Limitations section (lines 351-358).
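For reference, the Brier score and the threshold-based metrics the reviewer listed are straightforward to compute from test-set probabilities; with a test set of 17 participants, however, each estimate would carry wide uncertainty, consistent with the authors' caveat. The stdlib sketch below assumes binary labels (1 = endometriosis) and predicted probabilities; the function names, the 0.5 threshold, and the data are hypothetical. Calibration slope and intercept, which require fitting a logistic model to the logit of the predictions, are not shown.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(probs) or not y_true:
        raise ValueError("y_true and probs must be non-empty and paired")
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def threshold_metrics(y_true, probs, threshold=0.5):
    """Sensitivity, specificity, PPV and NPV when 'case' is predicted for p >= threshold."""
    pairs = list(zip(y_true, probs))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined with an empty denominator

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical outcomes and predicted probabilities, for illustration only.
y = [1, 1, 1, 0, 0]
p = [0.9, 0.8, 0.4, 0.3, 0.6]
print(round(brier_score(y, p), 3))  # 0.172
print(threshold_metrics(y, p))
```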
We have fully addressed all points by adding a dedicated subsection in the Methods titled “Machine Learning Model Development and Validation”, which now provides complete transparency (lines 166-185).
- In the Discussion, when comparing PF vs serum and the concept of “local pelvic environment vs systemic signal,” and also the limitations of an IVF-workup control group, you should address that PF biomarkers can correlate more strongly with phenotype than serum markers — which you suggest for sMICA/sMICB and which is key for the credibility of your predictive model
We thank the reviewer for this insightful comment. We fully agree that peritoneal fluid (PF) biomarkers show stronger associations with specific clinical phenotypes—such as sMICA with infertility and sMICB with adhesions—reflecting their origin in the local pelvic microenvironment. This is precisely why we first identified these key biomarkers through PF analysis.
However, since PF is not accessible in routine clinical practice, our central goal was to determine whether serum levels of the same molecules could serve as reliable surrogates. Our data show significant correlations between PF and serum concentrations for sMICA, sMICB, and several immune checkpoints (ρ = 0.4–0.7), supporting the biological plausibility of using serum as a minimally invasive proxy.
We sincerely thank the reviewer for this insightful suggestion and for recommending the study by Bartnik et al. (Biomedicines 2022, DOI:10.3390/biomedicines10102460). We have now incorporated this important reference into the Discussion to strengthen our rationale for using serum, rather than peritoneal fluid, biomarkers in our diagnostic model (lines 333-347).
We acknowledge that the control group—women undergoing laparoscopy for male-factor infertility work-up—may differ from a general asymptomatic population. Nevertheless, because all controls were confirmed endometriosis-free by direct visualization, they provide a valid reference for detecting systemic immune alterations specifically linked to endometriosis (rather than infertility per se).
- Throughout the manuscript the terminology “control” is very inconsistent — sometimes meaning “no endometriosis,” sometimes “no infertility,” and sometimes “no adhesions.” This needs to be fixed, otherwise readers will be lost.
We thank the reviewer for this important observation. We have thoroughly revised the manuscript to eliminate inconsistent use of the term “control.”
- Also, in practice your model uses blood + VAS. This is at best “minimally invasive” / “blood-based,” not strictly “non-invasive.” It is worth naming it precisely, otherwise the editors may challenge it.
We appreciate this precise methodological note. We agree that, strictly speaking, blood-based diagnostics are minimally invasive rather than fully non-invasive. We have therefore revised the manuscript to use more accurate terminology:
The term “non-invasive” has been replaced with “minimally invasive” or “blood-based” throughout the Abstract, Introduction, Results, and Discussion (lines 3, 26, 27, 239, 258-259, 348, 373).
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I still have a few minor corrections to the manuscript; some inconsistencies remain.
First, there are still sample size inconsistencies: in Methods → ML you write “No missing data were present; thus, no imputation…”. Meanwhile, in the Results/Figures 1–3 it’s clearly visible that for peritoneal fluid (PF) some measurements are missing (e.g., in Fig. 1 the control group is n=9 instead of 16, and the stage groups are also smaller).
Second, in the description of stage IV you state that it is “defined by large endometriomas, extensive deep infiltrating lesions, and dense adhesions…”, which again suggests that rASRM stage IV = DIE (even if not stated directly).
Third, in the Results you mention “8 healthy controls”. This is inconsistent with your explanation that this is not a general healthy population.
In the Discussion you already bring up “peritoneal fibrosis/adhesions” and phenotypes, and I think this is a good place to also mention fibronectin and collagen IV as ECM markers and potential endometriosis biomarkers measured in both plasma and peritoneal fluid (Warzecha et al., IJMS 2022): https://doi.org/10.3390/ijms232415669
Another issue: in Methods (ML) you declare evaluation of calibration and utility metrics (Brier score, slope/intercept, PPV/NPV etc.), but in the Discussion you say that “clinical utility and calibration … were not assessed”. Please unify the narrative here.
Also, in Results 3.1 there is a sentence: “No significant differences between the groups were observed for any analyte.” But just a few lines later you report a significant difference for PF sMICB between stage IV and III.
Finally, the term “control” is still used for subgroups within the endometriosis cohort (which was supposed to be fixed), e.g. Fig. 3: “Adhesions-negative patients (control, n=12)”. This is exactly the confusing naming issue — I suggest “adhesion-negative subgroup” / “no-adhesions subgroup”.
Overall, these are now smaller points; in my view, this is a minor revision, but the corrections are still necessary before potential publication.
Author Response
Dear reviewer,
We sincerely thank the reviewer for this careful observation.
We would like to clarify the following points:
- First, there are still sample size inconsistencies: in Methods → ML you write “No missing data were present; thus, no imputation…”. Meanwhile, in the Results/Figures 1–3 it’s clearly visible that for peritoneal fluid (PF) some measurements are missing (e.g., in Fig. 1 the control group is n=9 instead of 16, and the stage groups are also smaller).
The statement “No missing data were present” refers exclusively to the variables used in the machine learning model, which included only serum biomarkers (sMICB, sMICA, sEng) and clinical features (age, VAS, infertility status, parity). These variables were fully available for all 82 participants (66 endometriosis patients + 16 controls), so no imputation was needed for model development.
In contrast, peritoneal fluid (PF) analyses (reported in Figures 1–3) involved a subset of patients due to limited PF volume—hence the smaller group sizes (e.g., control n = 9 in Figure 1). However, PF measurements were not used in the diagnostic model, which relies solely on serum-based, minimally invasive inputs.
We have clarified this distinction in the Methods section to avoid confusion (lines 171-173).
- Second, in the description of stage IV you state it is “defined by large endometriomas, extensive deep infiltrating lesions, and dense adhesions…”, which again suggests that rASRM stage IV = DIE (even if not stated directly).
We thank the reviewer for this important clarification. We agree that our original phrasing could be misinterpreted as implying that all stage IV cases necessarily include deep infiltrating endometriosis (DIE). In fact, the rASRM classification assigns stage IV based on a total score ≥40, which can result from various combinations of factors—including large endometriomas, dense adhesions, and/or deep infiltrating lesions—but does not require the presence of DIE.
We have revised the text to state this more precisely (lines 199-200).
- Third, in the Results you mention “8 healthy controls”. This is inconsistent with your explanation that this is not a general healthy population.
We thank the reviewer for this important clarification. We agree that referring to the control group as “healthy” is inaccurate and potentially misleading. We have therefore replaced all instances of “healthy controls” with precise descriptions (lines 84, 191, 266).
- In the Discussion you already bring up “peritoneal fibrosis/adhesions” and phenotypes, and I think this is a good place to also mention fibronectin and collagen IV as ECM markers and potential endometriosis biomarkers measured in both plasma and peritoneal fluid (Warzecha et al., IJMS 2022): https://doi.org/10.3390/ijms232415669
We sincerely thank the reviewer for recommending the inclusion of fibronectin and collagen IV as relevant extracellular matrix (ECM) biomarkers in endometriosis. We have now incorporated this important reference (Warzecha et al., Int. J. Mol. Sci. 2022, DOI:10.3390/ijms232415669) into the Discussion, where we contextualize our findings on peritoneal fibrosis and adhesions within the broader framework of ECM remodeling. This addition strengthens the biological plausibility of our observed associations and aligns our work with recent advances in endometriosis biomarker research (lines 205-308).
- Another issue: in Methods (ML) you declare evaluation of calibration and utility metrics (Brier score, slope/intercept, PPV/NPV etc.), but in the Discussion you say that “clinical utility and calibration … were not assessed”. Please unify the narrative here.
We thank the reviewer for identifying this inconsistency. Upon careful review, we confirm that calibration metrics (Brier score, calibration slope/intercept) and detailed clinical utility measures (PPV, NPV at multiple thresholds) were not formally evaluated due to the limited size of the test set (n = 17), which precludes stable estimation of these statistics.
The earlier mention of these metrics in the Methods section was an overstatement based on planned analyses that could not be reliably performed. We have now removed all references to calibration and clinical utility metrics from the Methods (line 180).
- Also, in Results 3.1 there is a sentence: “No significant differences between the groups were observed for any analyte.” But just a few lines later you report a significant difference for PF sMICB between stage IV and III.
We sincerely thank the reviewer for this important observation. We have revised the text in Section 3.1 to clarify that the statement “No significant differences between the groups were observed for any analyte” refers exclusively to the soluble immune checkpoint proteins (measured in a subset of participants) and not to sMICA/sMICB/sEng, for which significant associations were found. This change eliminates the ambiguity and ensures accurate interpretation of our results (lines 191-192).
- Finally, the term “control” is still used for subgroups within the endometriosis cohort (which was supposed to be fixed), e.g. Fig. 3: “Adhesions-negative patients (control, n=12)”. This is exactly the confusing naming issue — I suggest “adhesion-negative subgroup” / “no-adhesions subgroup”.
We sincerely thank you for this exceptionally careful observation and for your patience with us. You are absolutely right: despite our intention to standardize terminology, we inadvertently retained the misleading label “control” for internal subgroups (e.g., “adhesions-negative patients (control)”) in Figure 3 and related text. This was an oversight on our part, and we deeply appreciate your sharp eye (line 225).
