Review Reports - Exploratory Statistical Analyses of Clinical and Biochemical Factors for Differentiated Thyroid Cancer from a Romanian Cohort

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript represents a retrospective statistical assessment of correlations between clinicopathological/biochemical/demographic characteristics in a large Romanian cohort of patients diagnosed with DTC. Stratified subtype regression analyses were also performed. As mentioned above, strengths of this manuscript include a reasonably sized dataset (n=1470 fully evaluable patients) as well as an important clinical topic of interest, i.e. personalized risk stratification/thresholding and accompanying health economics questions. With that said, this manuscript has significant limitations and should undergo major revision prior to publication. Major revisions focus on methodological/statistical concerns as well as interpretation/discussion of findings.
I. For what is presented as an “exploratory statistical assessment,” the manuscript fails to identify a guiding primary research question/hypothesis. Reading through the statistical methodology and results section, it is clear that many of the analyses are data-driven and hypothesis-generating. The authors perform 195 statistical tests, 70 of which remain significant following FDR correction. While some sort of multiple-testing correction has been performed, the authors do not clarify the hierarchy behind this analytical approach nor discuss exploratory vs. confirmatory inference. Overall, the authors should be clearer about whether they want this to read as hypothesis-generating or hypothesis-testing, and should define upfront which endpoints are primary and which are secondary. Secondary endpoints interpreted after many moderate associations have been identified are vulnerable to over-interpretation of biologic significance.
II. With regard to missing data, the authors removed all records from analyses that were missing TG, Anti-TG, or TSH lab values, decreasing the final sample size from 1556 patients to 1470 cases analyzed. However, they fail to perform a sensitivity analysis demonstrating that patients who were excluded due to these missing values do not systematically differ from patients who were included in the analysis. Because these biochemical markers are not primary predictors in all models, complete-case exclusion could induce meaningful selection bias. Authors should either confirm that included and excluded cases are similar enough that the complete-case analysis is valid or provide summary statistics comparing included vs. excluded cases and discuss reasons why imputation was not suitable in this scenario.
III. Regression modeling strategy should be clarified/strengthened. Authors chose to perform logistic regression models stratified by subtype but failed to acknowledge that many of these models have quasi-complete separation. The most egregious example of this is any model including lymphovascular invasion as a predictor. While the authors discuss inflated coefficients and unstable model performance in the Results section, they continue to interpret these models narratively throughout the discussion. In scenarios with separation, penalized regression approaches (e.g. Firth correction) or exact logistic regression are preferred. Authors should choose one of these strategies and either refit models or drastically change the tone/conclusions drawn from these unstable parameter estimates.
Additionally, there is no mention of any model performance metrics other than Cox & Snell pseudo-R² values. Authors do not report any calibration or discrimination metrics (e.g. AUC), CIs for odds ratios, or any internal validation. Regression modeling is difficult to interpret without these metrics, and clinicians cannot determine usefulness without them. If these models are strictly exploratory, the authors should make that clearer. Otherwise, claims regarding predictive value should be watered down significantly.
IV. There are a handful of statistically significant findings with trivial effect sizes that are discussed as if they have substantial meaning. Authors discuss multiple statistically significant correlations between sex and tumor size, height, or other parameters associated with local spread as if this could be biologically meaningful. However, no underlying mechanism is proposed, and no adjustment is performed for known confounding beyond what is included in limited regression models. With such a large sample size, we will often see small p-values arising from trivial effects. I would encourage the authors to be more deliberate about what constitutes a meaningful deviation from zero.

Author Response

Comment 0: This manuscript represents a retrospective statistical assessment of correlations between clinicopathological/biochemical/demographic characteristics in a large Romanian cohort of patients diagnosed with DTC. Stratified subtype regression analyses were also performed. As mentioned above, strengths of this manuscript include a reasonably sized dataset (n=1470 fully evaluable patients) as well as an important clinical topic of interest, i.e. personalized risk stratification/thresholding and accompanying health economics questions. With that said, this manuscript has significant limitations and should undergo major revision prior to publication. Major revisions focus on methodological/statistical concerns as well as interpretation/discussion of findings.

Response 0: Thank you for your review and for the constructive comments. We have addressed the concerns raised and hope that the revised version of the manuscript meets your expectations.

Comment 1: For what is presented as an “exploratory statistical assessment,” the manuscript fails to identify a guiding primary research question/hypothesis. Reading through the statistical methodology and results section, it is clear that many of the analyses are data-driven and hypothesis-generating. The authors perform 195 statistical tests, 70 of which remain significant following FDR correction. While some sort of multiple-testing correction has been performed, the authors do not clarify the hierarchy behind this analytical approach nor discuss exploratory vs. confirmatory inference. Overall, the authors should be clearer about whether they want this to read as hypothesis-generating or hypothesis-testing, and should define upfront which endpoints are primary and which are secondary. Secondary endpoints interpreted after many moderate associations have been identified are vulnerable to over-interpretation of biologic significance.

Response 1: Thank you for your comment. The study was designed as an exploratory, hypothesis-generating analysis rather than a confirmatory hypothesis-testing study. To clarify this, we have revised the manuscript to explicitly state the primary research question and study objectives. We have also emphasized throughout the Abstract, Methods, Results, and Discussion that the statistical analyses are exploratory in nature and that the identified associations should be interpreted as hypothesis-generating. In addition, we have moderated the interpretation of the findings to avoid overstatement of their biological significance.
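
For readers unfamiliar with the FDR correction discussed in Comment 1, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. This report does not state which FDR method the authors used, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, and the example p-values are assumptions made purely for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the test is declared
    significant at false discovery rate `alpha` (Benjamini-Hochberg)."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Every hypothesis ranked at or below max_k is rejected.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, not taken from the manuscript.
example = [0.039, 0.001, 0.216, 0.008]
print(benjamini_hochberg(example))
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be declared significant when a larger p-value further down the ordering meets its threshold.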

Comment 2: With regard to missing data, the authors removed all records from analyses that were missing TG, Anti-TG, or TSH lab values, decreasing the final sample size from 1556 patients to 1470 cases analyzed. However, they fail to perform a sensitivity analysis demonstrating that patients who were excluded due to these missing values do not systematically differ from patients who were included in the analysis. Because these biochemical markers are not primary predictors in all models, complete-case exclusion could induce meaningful selection bias. Authors should either confirm that included and excluded cases are similar enough that the complete-case analysis is valid or provide summary statistics comparing included vs. excluded cases and discuss reasons why imputation was not suitable in this scenario.

Response 2: Thank you for your observation. According to the study protocol, these measurements were expected to be recorded for all patients after the first radioiodine treatment. Therefore, the absence of these values suggested that the data collection protocol was not fully adhered to in those cases. Because the reliability and completeness of the remaining information in such records could not be ensured, these cases were removed from the analysis. We considered that including them or performing data imputation might compromise the dataset's consistency and trustworthiness.
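
The included-versus-excluded comparison requested in Comment 2 could take several forms. One simple option, sketched here, is the standardized difference between two proportions (for example, the share of female patients in each group). The function name `standardized_difference` and the proportions used are hypothetical; neither the manuscript nor this report provides such a comparison.

```python
import math


def standardized_difference(p_included, p_excluded):
    """Standardized difference between two proportions.

    Unlike a p-value, this measure does not grow with sample size, which
    makes it convenient for comparing a large group with a small one.
    """
    pooled_var = (p_included * (1 - p_included)
                  + p_excluded * (1 - p_excluded)) / 2
    if pooled_var == 0:
        # Both proportions are exactly 0 or 1.
        if p_included == p_excluded:
            return 0.0
        return math.copysign(math.inf, p_included - p_excluded)
    return (p_included - p_excluded) / math.sqrt(pooled_var)


# Hypothetical proportions of a binary characteristic in each group.
d = standardized_difference(0.80, 0.60)
print(round(d, 3))
```

A commonly cited rule of thumb treats absolute values above roughly 0.1 as a sign of imbalance worth reporting, although that threshold is a convention rather than a formal test.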

Comment 3: Regression modeling strategy should be clarified/strengthened. Authors chose to perform logistic regression models stratified by subtype but failed to acknowledge that many of these models have quasi-complete separation. The most egregious example of this is any model including lymphovascular invasion as a predictor. While the authors discuss inflated coefficients and unstable model performance in the Results section, they continue to interpret these models narratively throughout the discussion. In scenarios with separation, penalized regression approaches (e.g. Firth correction) or exact logistic regression are preferred. Authors should choose one of these strategies and either refit models or drastically change the tone/conclusions drawn from these unstable parameter estimates.

Additionally, there is no mention of any model performance metrics other than Cox & Snell pseudo-R² values. Authors do not report any calibration or discrimination metrics (e.g. AUC), CIs for odds ratios, or any internal validation. Regression modeling is difficult to interpret without these metrics, and clinicians cannot determine usefulness without them. If these models are strictly exploratory, the authors should make that clearer. Otherwise, claims regarding predictive value should be watered down significantly.

Response 3: Thank you for your insight. The regression models in our study were intended primarily for exploratory purposes. While issues related to quasi-separation were already noted in several sections of the manuscript, we agree that the interpretation required further moderation. We have further revised the text in the Abstract, Methods, Results, Discussion, and Conclusions sections to clarify the exploratory nature of these analyses. We have also moderated the interpretation of the regression findings to avoid overstatement and to emphasize that these results should be considered hypothesis-generating.
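
To illustrate why separation inflates estimates, the sketch below computes an odds ratio from a 2x2 table containing an empty cell. Note that this is not the Firth correction or exact logistic regression recommended in Comment 3; it uses the simpler Haldane-Anscombe adjustment (adding 0.5 to every cell) as a stand-in that fits in a few lines. The function name `odds_ratio` and all counts are hypothetical.

```python
import math


def odds_ratio(a, b, c, d, correction=0.0):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: predictor present / absent. Columns: outcome yes / no.
    `correction` is a constant added to every cell before computing.
    """
    a, b, c, d = (x + correction for x in (a, b, c, d))
    if b * c == 0:
        # Empty off-diagonal cell: the estimate diverges (or is undefined
        # when the diagonal product is zero as well).
        return math.nan if a * d == 0 else math.inf
    return (a * d) / (b * c)


# Hypothetical counts: every patient with the predictor has the outcome.
raw = odds_ratio(20, 0, 30, 50)
adjusted = odds_ratio(20, 0, 30, 50, correction=0.5)
print(raw, round(adjusted, 2))
```

The unadjusted estimate is infinite, which is the tabular counterpart of the inflated coefficients described in the Results section. The adjusted value is finite but still large and imprecise, so it should be read with the same caution the reviewer asks for.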

Comment 4: There are a handful of statistically significant findings with trivial effect sizes that are discussed as if they have substantial meaning. Authors discuss multiple statistically significant correlations between sex and tumor size, height, or other parameters associated with local spread as if this could be biologically meaningful. However, no underlying mechanism is proposed, and no adjustment is performed for known confounding beyond what is included in limited regression models. With such a large sample size, we will often see small p-values arising from trivial effects. I would encourage the authors to be more deliberate about what constitutes a meaningful deviation from zero.

Response 4: Thank you for your observation. Given the large sample size of our dataset, we anticipated that statistically significant p-values with small effects could arise. Therefore, we calculated effect sizes for all statistical tests and structured the Results section accordingly, separating findings into two subsections: results with moderate to large effect sizes and results with small effect sizes. The latter were included due to the exploratory nature of the study, but they are presented separately because small effect sizes are more likely to reflect limited practical relevance or reduced reliability. We have clarified this distinction in the manuscript and emphasized the need for cautious interpretation of these findings. Regarding potential mechanisms, we agree that the current data do not allow firm biological explanations; therefore, these results are framed as exploratory observations that may generate hypotheses for future research.
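
The point raised in Comment 4 and Response 4, that a large sample can yield a small p-value for a trivial effect, can be checked numerically. The sketch below computes the Pearson chi-square statistic, its p-value (1 degree of freedom, no continuity correction), and the phi effect size for a 2x2 table. The counts are hypothetical and merely sum to the cohort size of 1470; the function name `chi_square_2x2` is likewise an assumption, and the effect-size measures actually used in the manuscript are not specified in this report.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Return (chi2, p_value, phi) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Phi is the chi-square statistic rescaled to remove the sample size.
    phi = math.sqrt(chi2 / n)
    return chi2, p_value, phi


# Hypothetical table with n = 1470.
chi2, p_value, phi = chi_square_2x2(400, 335, 355, 380)
print(round(chi2, 2), p_value < 0.05, round(phi, 3))
```

In this example the association passes the conventional 0.05 threshold while phi stays below 0.1, the value often used as the lower bound of a "small" effect.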

Response to Comments on the Quality of English Language: We thank the reviewer for the constructive feedback. In addition to the specific revisions described above, the manuscript has been checked in multiple iterations with Grammarly and by English speakers who have lived for more than 20 years in an English-speaking country.

Reviewer 2 Report

Comments and Suggestions for Authors

I appreciate the invitation to review this paper. This paper presents an assessment of clinical data of a cohort of 1470 thyroid cancer patients from Romania.

The paper is written in good and understandable language, with no issues with grammar or spelling. The paper is well organized and follows a logical narrative. However, I do have concerns about how the data are presented and displayed in the figures. I am a statistician and I am confused.

The last part of the introduction (lines 139-160) reads more like a discussion and a conclusion as it reveals what was done in the paper. This style is odd and uncommon in academic papers.

Missing data assessment is mentioned in lines 212-219 and again in lines 263-267 in the methods, but summaries on missing data are not presented in tables 1 and 2. These should include the n of the missing as well.

In table 1, several categorical data variables do not add up to the total. Some of these missing records may not be possible (ontological/etiological impossibility) or they were not recorded (data collection or documentation failure). There should be a way to differentiate them and present them in the tables.

Figure 1 is odd. The effect size scale goes from 0.0 to 3.0; what do these values represent? Some of the variables are categorical, some binary, and some continuous.

In the same figure, why use the 4 colors in Figure 1 instead of a gradient?

Heatmaps are presented as row normalized and unnormalized, what is the point of presenting both?

I do not understand table 3, what are all these confidence intervals and how were they calculated?

Data is presented in a very awkward way. I am not even sure of what is being presented.

Author Response

Comment 0: I appreciate the invitation to review this paper. This paper presents an assessment of clinical data of a cohort of 1470 thyroid cancer patients from Romania. The paper is written in good and understandable language, with no issues with grammar or spelling. The paper is well organized and follows a logical narrative. However, I do have concerns about how the data are presented and displayed in the figures. I am a statistician and I am confused.

Response 0: Thank you for your thorough review!

Comment 1: The last part of the introduction (lines 139-160) reads more like a discussion and a conclusion as it reveals what was done in the paper. This style is odd and uncommon in academic papers.

Response 1: Thank you for this observation. We agree that the final portion of the introduction reads more like a discussion or conclusion and was not consistent with the usual structure of academic manuscripts. This section had been included following a suggestion from one of the review editors; however, upon reconsideration, we agree with your point. Accordingly, we have removed this passage from the introduction to improve the manuscript’s structure and flow.

Comment 2: Missing data assessment is mentioned in lines 212-219 and again in lines 263-267 in the methods, but summaries on missing data are not presented in tables 1 and 2. These should include the n of the missing as well.

Response 2: Thank you for this observation. Table 1 has been revised to explicitly include the number of missing observations for each variable. This addition provides a clearer overview of data completeness and aligns the descriptive tables with the missing data assessments.

Comment 3: In table 1, several categorical data variables do not add up to the total. Some of these missing records may not be possible (ontological/etiological impossibility) or they were not recorded (data collection or documentation failure). There should be a way to differentiate them and present them in the tables.

Response 3: Thank you for this observation. Table 1 has been revised in several ways. First, the number of missing observations has been added, as requested in the previous comment. Second, minor numerical typographical errors were corrected (e.g., 815 instead of 655 for tumor location LDT). Finally, cases with multiple recorded categories (mixed entries) are now indicated. When categories are mutually exclusive, the total number of cases can be recovered by summing the category counts and missing values while accounting for mixed cases. For cancer type, subtracting mixed cases is sufficient because overlaps occur only between two categories. For tumor location, however, some cases involve 2–4 simultaneous locations due to data recoding, so subtracting mixed cases alone does not recover the total. This clarification has been added as a note under Table 1.
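
The reconciliation rule described in Response 3 can be made concrete with a toy example. The sketch below assumes each patient record is represented as a set of recorded categories, with an empty set standing for a missing value; that representation, the function name `reconcile`, and the toy records are all hypothetical and are not drawn from Table 1.

```python
from collections import Counter


def reconcile(records):
    """Summarize category counts for records given as sets of categories.

    An empty set represents a missing value; a set with more than one
    element represents a mixed entry.
    """
    counts = Counter(cat for rec in records for cat in rec)
    category_sum = sum(counts.values())
    missing = sum(1 for rec in records if not rec)
    mixed = sum(1 for rec in records if len(rec) > 1)
    # A patient with k categories is counted k times, i.e. k - 1 too many.
    overcount = sum(len(rec) - 1 for rec in records if len(rec) > 1)
    return {
        "total": len(records),
        "category_sum": category_sum,
        "missing": missing,
        "mixed": mixed,
        # Correct only when every mixed entry spans exactly two categories.
        "sum_minus_mixed": category_sum - mixed + missing,
        # Correct for any number of simultaneous categories.
        "sum_minus_overcount": category_sum - overcount + missing,
    }


pairwise = [{"PTC"}, {"PTC", "FTC"}, {"FTC"}, set()]
multi = [{"A"}, {"A", "B", "C"}, set()]
print(reconcile(pairwise))
print(reconcile(multi))
```

With pairwise overlaps only, subtracting the number of mixed cases recovers the total. When a record carries three or more categories, as described for tumor location, only the overcount-based formula does.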

Comment 4: Figure 1 is odd. The effect size scale goes from 0.0 to 3.0; what do these values represent? Some of the variables are categorical, some binary, and some continuous. In the same figure, why use the 4 colors in Figure 1 instead of a gradient?

Response 4: Thank you for pointing this out. The numeric scale (0–3) in the original figure was used only to implement the color coding and did not represent meaningful values. To avoid confusion, we have removed this scale and retained only the color categories. Because the figure summarizes associations between variables of different types (categorical, binary, and continuous), the corresponding effect sizes were calculated using different statistical measures, as described in the Methods section. Since these measures are not directly comparable on a single continuous scale, using a gradient would be misleading. Therefore, we used four color categories to indicate the magnitude of associations: grey for non-significant correlations, yellow for small effects, orange for moderate effects, and red for large effects.

Comment 5: Heatmaps are presented as row normalized and unnormalized, what is the point of presenting both?

Response 5: The heatmaps are included in both forms to highlight different aspects of the data. The unnormalized heatmap shows the absolute counts, allowing readers to see the overall distribution and relative frequencies across categories. In contrast, the row-normalized heatmap presents proportions within each row, which facilitates comparison of patterns between groups independent of differences in sample size. Presenting both views, therefore, allows readers to examine both the absolute distribution of cases and the relative structure of the associations.
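
The difference between the two heatmap views in Response 5 amounts to one transformation. The following sketch row-normalizes a small table of counts; the function name `row_normalize` and the counts are hypothetical and do not come from the manuscript's heatmaps.

```python
def row_normalize(counts):
    """Convert each row of a count table into within-row proportions.

    Rows that sum to zero are returned as all zeros to avoid division
    by zero.
    """
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([value / total for value in row])
    return normalized


# Hypothetical counts: a large group (row 1) and a small group (row 2).
counts = [[300, 100], [6, 2]]
print(row_normalize(counts))
```

Both rows normalize to the same proportions even though their absolute counts differ by a factor of 50, which is the pattern the row-normalized view is meant to expose and the unnormalized view is meant to qualify.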

Comment 6: I do not understand table 3, what are all these confidence intervals and how were they calculated?

Response 6: Table 3 presents the 95% confidence intervals for each variable by subtype. For example, the first entry indicates that, for papillary microcarcinoma, the confidence interval for the proportion of females is 80.73%–87.74%. This interval represents an estimate of the proportion in the overall population from which the sample was drawn. To our knowledge, reporting these values may be useful for future hypothesis-confirmatory studies, where such estimates can inform sample size calculations based on power analyses. As described in the Methods section, confidence intervals for proportions (percentages) were calculated using the Wilson method, while confidence intervals for means were computed using a t-distribution–based approach.
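
For reference, the Wilson method named in Response 6 can be sketched as follows. The function name `wilson_interval` and the example counts are hypothetical; the subtype counts behind Table 3 are not given in this report, so the example does not attempt to reproduce the 80.73%–87.74% interval quoted above.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical example: 50 of 100 patients have the characteristic.
low, high = wilson_interval(50, 100)
print(f"{low:.2%} to {high:.2%}")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0% to 100% and behaves sensibly for small subtypes or proportions near the boundaries, which may be why it suits a table stratified by subtype.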

Comment 7: Data is presented in a very awkward way. I am not even sure of what is being presented.

Response 7: Thank you for your input. Following your earlier suggestions, we revised several figures and tables in the manuscript to improve clarity. We also provided additional context in our responses above. We hope that these changes make the presented information clearer and easier to follow.

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript, "Statistical Analyses of Clinical and Biochemical Factors for Differentiated Thyroid Cancer From a Romanian Cohort," is well-written and detailed, and I found it interesting to read. However, after reading the text to the end, I found myself still unclear about what the authors were trying to achieve and what they ultimately achieved. I believe the main limitations of this study are:

1) Lack of a clearly stated research objective in the Introduction.

2) Lack of clear conclusions in the Discussion (or Conclusions) – what the authors consider to be the main findings of their study and why.

If the authors add these points to the manuscript, its perception will be significantly improved.

The second important issue is related to the sample, which the authors describe as follows:

“1,556 patients diagnosed with differentiated thyroid cancer who underwent surgical tumor removal followed by radioiodine therapy;”

First, why were only patients who had undergone radioiodine therapy included? This isn't clearly justified and isn't discussed further. Why weren't patients who hadn't received radioiodine therapy included in the sample?

Secondly, why weren't only PTC samples included? The sample also included FTC and (based on Table 1) MTC. However, FTC accounts for approximately 5% of the sample, while MTC accounts for 0.5%. This raises the question of how applicable the study results are to FTC/MTC. This issue is not addressed, and in fact, FTC/MTC only introduce statistical error into the PTC data. I believe their inclusion in the sample should be justified.

Section-by-section comments.

Introduction

Line 90: such as higher BMI…

Comment: The abbreviation needs to be explained.

Line 143: (age, PNI, vascular…

Comment: The abbreviation needs to be explained.

Method

It's very difficult to understand the number of samples used in the calculations. The text initially lists a total of 1556.

Line 165-167: The study population comprised 1233 female and 323 male patients, with 1458 cases of papillary carcinoma, 73 cases of follicular carcinoma, and 25 cases of mixed histology.

Further in the text the number 1470 appears.

Line 217-219: After applying the filters, the working dataset used in subsequent analyses comprised 1,470 fully evaluable patient records.

Table 1 has different numbers. For example: PTC – 1404, FTC – 82, MTC – 8, for a total of 1494. In other words, not a single number in the table matches what was given above in the text.

And the other variables add up to anything but 1470 or 1556, for example, Nodal Status N0+N1=935. So, in fact, the data for all variables was fragmentary? This should be clearly reflected in the text.

In addition, a footnote below the table 1 should provide an explanation of the abbreviations LST and LDT.

Line 296-297: Specifically, for the most prevalent subtypes (papillary microcarcinoma, diffuse sclerosing variant of PTC, papillary thyroid carcinoma),

Comment: "papillary thyroid carcinoma" is not a subtype.

Results

Figure 1. Heatmap of Significant Correlations and Their Effect Sizes

Comment: The presence of a significant correlation between "Cancer Type - Papillary" and "Cancer Type - Follicular" is unclear, and this is not discussed in the text. I would like some clarification on this matter.

Line 495-497: A comprehensive list of all identified statistically significant correlations, irrespective of effect size, is provided in the supplementary materials for completeness and transparency.

Comment: Here, authors should add the name of the supplementary materials file if there are multiple. However, I didn't find this specific information in the supplementary materials. So, either add it or remove the reference from the text.

Line 505-506: The corresponding heatmaps are provided in the supplementary materials and may serve as a reference for domain specialists.

Comment: Here, the authors also need to add the file name (subtip_vs_all_heatmap_data.xlsx). I successfully found it, but I should note that the file should contain explanations, as it's unclear what the percentages mean, why the number of samples for one subtype changes in some rows, and why some variables are represented as separate values—for example, Clinical Stage is represented as ST1…ST4, and Nodal Status as N[x/0/1].

Line 512: Minimally invasive PTC

Comment: Apparently the authors mean “minimally invasive FTC”.

Table 3. It is necessary to indicate what the percentages mean here.

Author Response

Comment 0: The manuscript, "Statistical Analyses of Clinical and Biochemical Factors for Differentiated Thyroid Cancer From a Romanian Cohort," is well-written and detailed, and I found it interesting to read. However, after reading the text to the end, I found myself still unclear about what the authors were trying to achieve and what they ultimately achieved. I believe the main limitations of this study are:

1) Lack of a clearly stated research objective in the Introduction.

2) Lack of clear conclusions in the Discussion (or Conclusions) – what the authors consider to be the main findings of their study and why.

If the authors add these points to the manuscript, its perception will be significantly improved.

Response 0: Thank you for this helpful comment. We agree that the original version of the manuscript did not sufficiently emphasize the study objective and the main conclusions. To address this, we revised both the Introduction and the Conclusion sections.

Clarification of the study objective (Introduction).
We added a paragraph explicitly stating the aim and research questions of the study. The revised text now explains that the analysis is exploratory in nature and intended to identify potential associations among clinical, pathological, biochemical, and demographic variables that may generate hypotheses for future research. The primary and secondary research questions are now explicitly stated.
Clarification of the main findings (Conclusion).
We substantially expanded the Conclusion section to summarize the principal findings of the study.

These revisions aim to make both the purpose of the study and the key findings clearer to readers.

Comment 1: The second important issue is related to the sample, which the authors describe as follows:

“1,556 patients diagnosed with differentiated thyroid cancer who underwent surgical tumor removal followed by radioiodine therapy;”

Response 1: Thank you for your insightful comment. The inclusion of only patients who underwent radioiodine therapy reflects the study design and was intended to ensure a clinically homogeneous cohort. Specifically, the analysis was restricted to patients who underwent total or near-total thyroidectomy followed by radioiodine therapy, a group typically characterized by an intermediate-to-high risk of recurrence requiring adjuvant treatment. At our institution, these patients undergo standardized postoperative evaluation, including biochemical monitoring (thyroglobulin and anti-thyroglobulin antibodies) and whole-body scintigraphy. This standardized follow-up enabled consistent data collection and more reliable clinicopathological correlations across the cohort. Patients who did not receive radioiodine therapy often had more heterogeneous management and follow-up protocols, which could have introduced additional variability into the analysis. This rationale has now been clarified in the Methods section.

With respect to histological subtypes, our intention was to analyze a real-world dataset reflecting the actual caseload treated at the “C.I. Parhon” National Institute of Endocrinology during the study period. As expected, papillary thyroid carcinoma (PTC) constituted the majority of cases, while follicular thyroid carcinoma (FTC) and medullary thyroid carcinoma (MTC) were present in smaller numbers within the institutional database and were therefore retained to reflect the pathological diversity encountered in clinical practice. The cohort was treated as a representative sample of the treated population, and the analyses were performed on the dataset as a whole. However, given the predominance of PTC cases, this type naturally has a greater influence on the overall results. We have clarified this point in the manuscript.

These clarifications have been incorporated into the revised manuscript.

Comment 2: Line 90: such as higher BMI…

Comment: The abbreviation needs to be explained.

Response 2: Thank you for your observation. As this was the only instance where the abbreviation appeared, we have replaced it with the full term (body mass index).

Comment 3: Line 143: (age, PNI, vascular…

Comment: The abbreviation needs to be explained.

Response 3: Thank you for the observation. Based on feedback from another reviewer, the paragraph containing this instance was removed from the revised version of the manuscript. In the next occurrence of the abbreviation, the full term is provided before its use.

Comment 4: It's very difficult to understand the number of samples used in the calculations. The text initially lists a total of 1556.

Line 165-167: The study population comprised 1233 female and 323 male patients, with 1458 cases of papillary carcinoma, 73 cases of follicular carcinoma, and 25 cases of mixed histology.

Further in the text the number 1470 appears.

Line 217-219: After applying the filters, the working dataset used in subsequent analyses comprised 1,470 fully evaluable patient records.

Table 1 has different numbers. For example: PTC – 1404, FTC – 82, MTC – 8, for a total of 1494. In other words, not a single number in the table matches what was given above in the text.

Response 4: Thank you for this important observation. The apparent inconsistencies arise from two aspects of the dataset that have now been clarified in the revised manuscript and Table 1.

First, the initial cohort comprised 1556 patients, as stated in the text. After applying the predefined filters and excluding records that were not fully evaluable for the main analyses, the working dataset used in subsequent analyses consisted of 1,470 patients.

Second, several variables contain missing values or multiple recorded categories, which means that the counts for individual categories do not always sum to the total number of patients. Table 1 has been revised to explicitly report the number of missing observations for each variable, and cases with mixed category entries are now indicated. This allows the reader to understand why the sums of category counts may differ from the total number of cases.

A clarification has also been added to the table notes explaining how totals should be interpreted when missing values or mixed categories are present.

Comment 5: In addition, a footnote below the table 1 should provide an explanation of the abbreviations LST and LDT.

Response 5: Thank you for your observation. Initially, the explanations for LST and LDT were provided in the description of Figure 1. However, since these abbreviations first appear in Table 1, we agree that they should be defined there. Accordingly, we have added a footnote to Table 1 explaining these abbreviations.

Comment 6: Line 296-297: Specifically, for the most prevalent subtypes (papillary microcarcinoma, diffuse sclerosing variant of PTC, papillary thyroid carcinoma),

Comment: "papillary thyroid carcinoma" is not a subtype.

Response 6: Thank you for this observation. We agree that “papillary thyroid carcinoma” refers to the main histological type rather than a subtype. This text, as well as other similar instances in the manuscript, has been corrected to refer to the classical variant of papillary thyroid carcinoma, which was the intended subtype.

Comment 7: Figure 1. Heatmap of Significant Correlations and Their Effect Sizes

Response 7: Thank you for this observation. The significant correlation between “Cancer Type – Papillary” and “Cancer Type – Follicular” arises from the way these variables are encoded in the dataset. These categories are mutually exclusive in almost all cases, meaning that patients diagnosed with papillary carcinoma generally do not have follicular carcinoma, and vice versa. As a result, the statistical analysis detects a strong negative association between the two variables. However, this relationship is a structural consequence of the classification scheme rather than a clinically informative finding. To clarify this point, we have added an explanatory note in the revised manuscript (at the end of the “Correlations with Moderate and Large Effect Sizes” subsection).
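As a minimal illustration of why mutually exclusive indicator columns correlate negatively by construction, the sketch below computes the Pearson (phi) correlation between two 0/1 columns. The counts (80 papillary, 15 follicular, 5 other), the one-hot encoding, and the function name are assumptions made for illustration only; they are not taken from the study data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical cohort (NOT the study data): one cancer type per patient.
types = ["papillary"] * 80 + ["follicular"] * 15 + ["other"] * 5

# One-hot encoding: each category becomes its own 0/1 column.
is_papillary = [1 if t == "papillary" else 0 for t in types]
is_follicular = [1 if t == "follicular" else 0 for t in types]

# No patient has both indicators set to 1, so the correlation is
# strongly negative even though it carries no clinical information.
phi = pearson(is_papillary, is_follicular)
print(f"phi = {phi:.3f}")
```

For two binary columns the Pearson correlation equals the phi coefficient, so the strong negative value here reflects only the encoding.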

Comment 8: Line 495-497: A comprehensive list of all identified statistically significant correlations, irrespective of effect size, is provided in the supplementary materials for completeness and transparency.

Response 8: Thank you for this observation. In an earlier stage of manuscript preparation, we planned to include a supplementary file containing the complete list of statistically significant correlations regardless of effect size. However, during the final revision of the manuscript, we concluded that these additional associations would provide limited interpretative value and might unnecessarily increase the volume of supplementary material. Therefore, we decided not to include this file and have now removed the corresponding sentence from the revised manuscript.

Comment 9: Line 505-506: The corresponding heatmaps are provided in the supplementary materials and may serve as a reference for domain specialists.

Response 9: Thank you for your comment. We have revised the manuscript to explicitly include the name of the supplementary file (subtip_vs_all_heatmap_data.xlsx). The supplementary file has also been updated with a table description explaining the meaning of the reported percentages, confidence intervals, and sample sizes.

The number of samples for each subtype varies across rows and columns because it depends on both the total number of cases in that subtype and the availability of data for the corresponding variable. Cases with missing data for a given variable were excluded from the corresponding calculation. Additional information on missing data is now provided in Table 1 of the manuscript.

Finally, we clarified the representation of categorical variables. Binary variables (e.g., nodal status) are represented in a single column, while variables with multiple categories require separate columns. We also corrected a typographical error in the table header, where binary variables were previously labeled with an unnecessary “x/” (e.g., N[x/0/1]), which has now been corrected to N[0/1].

Comment 10: Line 512: Minimally invasive PTC. Comment: Apparently the authors mean “minimally invasive FTC”.

Response 10: Thank you for pointing this out. This was indeed a typographical error, and it has been corrected to “minimally invasive FTC” in the revised version of the manuscript.

Comment 11: Table 3. It is necessary to indicate what the percentages mean here.

Response 11: Thank you for this observation. The percentages in Table 3 correspond to proportions calculated for categorical variables, whereas numerical variables are reported as means. Accordingly, the reported values represent 95% confidence intervals for either proportions or means. Confidence intervals for proportions (categorical variables) were calculated using the Wilson method, while confidence intervals for means (numerical variables) were calculated using a t-distribution–based method. To improve clarity, we have revised the description of Table 3 in the manuscript to explicitly state this.
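For readers unfamiliar with the Wilson method, a minimal sketch of the standard Wilson score interval for a proportion is given below. The function name and the example counts are hypothetical, and this is not necessarily the exact implementation used for Table 3. The t-distribution-based interval for means is not shown, because it needs a t quantile that the Python standard library does not provide.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (illustrative helper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided normal quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical example: 5 positive cases out of 10 with non-missing data.
low, high = wilson_interval(5, 10)
print(f"proportion = {5 / 10:.1%}, 95% CI = [{low:.1%}, {high:.1%}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves reasonably for small subtype sample sizes, which is a common reason for choosing it.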

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Thanks again for the invitation to follow up on this paper. I apologize for the delay in responding, as I was out of town. The authors have done a great job addressing my comments.

The introduction now flows well with the edits added at the end.

With the revisions in Table 1, the table is much clearer. Figure 1 is also much improved. I would only suggest adding two items to the legend: first, the definitions for small, medium, and large effects, and second, a note that the results have been adjusted for multiple testing. The adjustment increases the robustness of the findings and adds extra merit.

I understand the argument presented in the response regarding the unnormalized heatmaps. However, I am still concerned that they may confuse readers. Because the unnormalized tables heavily reflect the sampling, they may be extremely biased when shown next to their normalized counterparts. I appreciate the transparency they provide, but I would prefer to see them moved to the supplementary materials.

For Table 3, the proportions and means are mentioned in the legend, but the actual values do not appear in the table. I recommend adding each value next to its corresponding confidence interval.

Overall, I believe the paper can be accepted once these minor suggestions are addressed.

Author Response

Comment 0: Thanks again for the invitation to follow up on this paper. I apologize for the delay in responding, as I was out of town. The authors have done a great job addressing my comments.

The introduction now flows well with the edits added at the end.

Response 0: Thank you for your kind feedback and for taking the time to follow up. We appreciate your thoughtful review.

Comment 1: With the revisions in Table 1, the table is much clearer. Figure 1 is also much improved. I would only suggest adding two items to the legend: first, the definitions for small, medium, and large effects, and second, a note that the results have been adjusted for multiple testing. The adjustment increases the robustness of the findings and adds extra merit.

Response 1: Thank you for this helpful suggestion. We have revised the Figure 1 legend to include clarification on the interpretation of effect sizes and a note regarding multiple testing adjustment. Specifically, we now indicate that effect size categories (small, moderate, large) are based on established, metric-specific thresholds appropriate to each statistical test. Given that different effect size measures were used (correlation coefficients, rank-biserial correlation, and Cramér’s V with dimension-dependent thresholds), detailed definitions have been provided in the Methods section to ensure clarity while maintaining readability of the figure legend. Additionally, we have specified in the legend that all p-values were adjusted using a false discovery rate (FDR) correction.
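The response does not name the specific FDR procedure. As a sketch only, assuming the widely used Benjamini-Hochberg step-up method, adjusted p-values can be computed as follows. The function name and the example p-values are hypothetical and do not come from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adj]
print(adj, significant)
```

A test is then declared significant when its adjusted p-value is at or below the chosen FDR level (0.05 in this sketch).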

Comment 2: I understand the argument presented in the response regarding the unnormalized heatmaps. However, I am still concerned that they may confuse readers. Because the unnormalized tables heavily reflect the sampling, they may be extremely biased when shown next to their normalized counterparts. I appreciate the transparency they provide, but I would prefer to see them moved to the supplementary materials.

Response 2: Thank you for this insightful suggestion. We agree that presenting both normalized and unnormalized heatmaps side by side may lead to potential confusion due to the influence of sampling on the unnormalized data. Accordingly, we have moved the unnormalized heatmaps to the supplementary materials, where they remain available for transparency and completeness.

Comment 3: For Table 3, the proportions and means are mentioned in the legend, but the actual values do not appear in the table. I recommend adding each value next to its corresponding confidence interval.

Response 3: Thank you for this helpful suggestion. We have revised Table 3 to include the corresponding proportions and means alongside their respective confidence intervals, as recommended.

Comment 4: Overall, I believe the paper can be accepted once these minor suggestions are addressed.

Response 4: Thank you for your positive assessment of our work. We have addressed all the suggested revisions.