Integrated Hyperparameter Optimization with Dimensionality Reduction and Clustering for Radiomics: A Bootstrapped Approach

Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript presents a comprehensive analysis of unsupervised clustering methods applied to synthetic and clinical data, proposing a generalizable framework for clustering evaluation. The topic is relevant and highly interesting, and the methodology is robust. However, there are some recommendations to improve clarity.
Abstract: The abstract lacks a summary of the main results and conclusions. I recommend integrating the most relevant results (e.g., performance metrics, insights from clustering) and a short concluding remark on the potential of the proposed framework.
Introduction: Please ensure that all abbreviations and acronyms are properly defined when first introduced, both in the Introduction and throughout the paper.
Additionally, the authors should clearly state that the study involves both synthetic and clinical datasets, and explicitly frame the rationale for using simulated data as a preliminary validation step for the proposed methodology.
Materials and Methods: The structure of this section should be improved to better balance the presentation of analyses on simulated and clinical data (as in the Results section). Indeed, the role of the simulated datasets relative to the clinical ones is not entirely clear.
I suggest including in the manuscript a description of the preprocessing steps applied to the features (as indicated in Figure 1). In particular, I would recommend clarifying:
- Which radiomic features were extracted and grouped (if any)
- Whether the preprocessing steps were identical for both datasets
- The rationale behind specific preprocessing choices (especially the normalization and the correlation phase).
Discussion: First, I would recommend that the authors elaborate more on the results obtained from the synthetic data. I would also avoid repeating the methodological setup and focus instead on the innovativeness and potential results of the framework.
Given that AUROC values are generally not very high, it would be useful to better highlight the novel aspects of the study and to contextualize the results in terms of practical clinical relevance.
For the clinical dataset, it would strengthen the paper to further explore whether the identified clusters correlate with clinical variables not initially used in the clustering process, especially given the framework’s stated objective to uncover novel clinical groupings or features.
If possible, I suggest including graphical representations (e.g., t-SNE, PCA, UMAP) showing how clinical data points are distributed across the clusters. This would enhance the interpretability of the clustering and help assess separation and structure visually.
Author Response
Abstract
1. The abstract lacks a summary of the main results and conclusions. I recommend integrating the most relevant results (e.g., performance metrics, insights from clustering) and a short concluding remark on the potential of the proposed framework.
Ans: Thank you for the suggestion. We have revised the abstract to include key findings from both the simulation and real-world studies, as follows: We evaluated ten unsupervised learning pipelines using both simulation studies and real-world radiomics data derived from multiphase computed tomography (CT) images of renal cell carcinoma. In simulations, we found that Non-negative Matrix Factorization (NMF) and Spectral Clustering outperformed the traditional Principal Component Analysis (PCA)-based approach. The best-performing pipeline (NMF followed by K-means clustering) successfully identified all three simulated clusters, achieving a Cramér’s V of 0.9. The simulation also established a reference framework for understanding the concordance patterns among different pipelines under varying strengths of clustering effects: high concordance reflects strong clustering. In the real-world data application, we observed a moderate clustering effect, which aligned with the weak associations with clinical outcomes, reflected by an area under the receiver operating characteristic curve (AUROC) of ≤ 0.63.
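To make the agreement metric quoted above concrete, Cramér’s V between recovered and true cluster labels can be computed from the chi-squared statistic of their contingency table, V = sqrt(χ² / (n · (min(r, c) − 1))). The sketch below assumes NumPy and the uncorrected form, with no small-sample bias correction. The response does not say which variant was used, and the function name `cramers_v` is hypothetical.

```python
import numpy as np


def cramers_v(x, y):
    """Uncorrected Cramér's V between two categorical labelings of the same samples."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)  # contingency table of (x, y) label pairs
    n = table.sum()
    # Expected counts under independence: outer product of margins / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    r, c = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))
```

Because V depends only on the contingency table, it is unaffected by how cluster IDs are numbered. This matters because cluster labels from K-means are arbitrary. For example, `cramers_v(true_labels, predicted_labels)` returns 1.0 whenever the two labelings match up to a relabeling.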
Introduction:
2. Please ensure that all abbreviations and acronyms are properly defined when first introduced, both in the Introduction and throughout the paper. Additionally, the authors should clearly state that the study involves both synthetic and clinical datasets and explicitly frame the rationale for using simulated data as a preliminary validation step for the proposed methodology.
Ans: We thank the reviewer for this observation. We have thoroughly reviewed the manuscript to ensure that all abbreviations and acronyms are defined upon first use (we had missed this in the case of t-SNE). Additionally, we have revised the Introduction to explicitly state that the study involves both synthetic and clinical datasets. We also clarified that simulated data were used to preliminarily validate the robustness and effectiveness of the proposed unsupervised pipelines under controlled conditions before applying them to real-world radiomics data.
Materials and Methods:
3. The structure of this section should be improved to better balance the presentation of analyses on simulated and clinical data (as in the Results section). Indeed, the role of the simulated datasets relative to the clinical ones is not entirely clear. I suggest including in the manuscript a description of the preprocessing steps applied to the features (as indicated in Figure 1). In particular, I would recommend clarifying:
- Which radiomic features were extracted and grouped (if any)
- Whether the preprocessing steps were identical for both datasets
- The rationale behind specific preprocessing choices (especially the normalization and the correlation phase).
Ans: As suggested by the reviewer, we have revised the structure of the Materials and Methods section to better distinguish and balance the presentation of simulated and clinical data. The updated structure is summarized below.
Materials and Methods
2.1 Datasets
2.1.1 Clinical Data
2.1.2 Simulated Data
2.2 Methods
2.2.1 Unsupervised Learning Pipelines
2.2.2 Hyperparameter Tuning
2.2.3 Performance Evaluation
- The extracted radiomic feature families are already described in the “Preprocessing and Data Extraction” subsection. These include shape-based features, first-order statistics, and higher-order texture features. The texture features are derived from the gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), gray-level size-zone matrix (GLSZM), gray-level dependence matrix (GLDM), and neighboring gray-tone difference matrix (NGTDM). No additional grouping was applied beyond these predefined categories.
- In the revised manuscript, we have added a sentence to Section 2.1.2 clarifying that the preprocessing steps applied to the clinical dataset were also applied to the simulated data.
- Preprocessing steps were chosen to reflect the needs of unsupervised clustering in high-dimensional radiomics data. Mean imputation was used to ensure a complete feature matrix, which is necessary for applying dimensionality reduction methods such as PCA and NMF. Correlation filtering removed redundant features with pairwise Pearson correlation coefficients above 0.9. This mitigates redundancy and improves model interpretability, as redundant features can bias clustering outcomes and distort distance metrics. Z-score normalization was essential to standardize feature scales, because the clustering and dimensionality reduction methods used (e.g., K-means, spectral clustering, PCA) are sensitive to feature magnitude. Together, these steps ensured a fair comparison across pipelines and improved numerical stability during optimization.
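The three preprocessing steps described above can be sketched as follows, assuming NumPy and a samples × features matrix with missing values encoded as NaN. Several details are our own illustrative assumptions rather than anything stated in the response:

- the correlation-filtering rule, which compares absolute correlations and, scanning left to right, keeps the first feature of each correlated pair;
- the removal of zero-variance columns;
- the function name `preprocess`.

```python
import numpy as np


def preprocess(X, corr_threshold=0.9):
    """Mean imputation -> correlation filtering -> z-score normalization.

    Returns the transformed matrix and the indices of the retained columns.
    """
    X = np.asarray(X, dtype=float).copy()

    # 1) Mean imputation: replace each NaN with its column mean.
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]

    # Assumption: drop zero-variance columns (undefined correlation and z-score).
    candidates = [j for j in range(X.shape[1]) if X[:, j].std() > 0]

    # 2) Correlation filtering (assumed greedy rule): keep a feature only if its
    #    absolute Pearson correlation with every already-kept feature is <= threshold.
    corr = np.abs(np.corrcoef(X[:, candidates], rowvar=False))
    kept_pos = []
    for p in range(len(candidates)):
        if all(corr[p, q] <= corr_threshold for q in kept_pos):
            kept_pos.append(p)
    keep = [candidates[p] for p in kept_pos]
    X = X[:, keep]

    # 3) Z-score normalization: zero mean, unit (population) standard deviation.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return X, keep
```

One design consequence worth noting is that z-scored features contain negative values. NMF, by definition, factorizes a non-negative matrix, so a pipeline that feeds this output into NMF needs an additional non-negativity step. This sketch does not cover how that was handled.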
Discussion:
4. First, I would recommend that the authors elaborate more on the results obtained from the synthetic data. I would also avoid repeating the methodological setup and focus instead on the innovativeness and potential results of the framework. Given that AUROC values are generally not very high, it would be useful to better highlight the novel aspects of the study and to contextualize the results in terms of practical clinical relevance. For the clinical dataset, it would strengthen the paper to further explore whether the identified clusters correlate with clinical variables not initially used in the clustering process, especially given the framework’s stated objective to uncover novel clinical groupings or features. If possible, I suggest including graphical representations (e.g., t-SNE, PCA, UMAP) showing how clinical data points are distributed across the clusters. This would enhance the interpretability of the clustering and help assess separation and structure visually.
Ans: We have reorganized the Discussion according to the following logical flow:
1) Highlighted the methodological contribution: "This study presents a novel approach to unsupervised learning by conducting hyperparameter tuning across multiple computing steps in a unified framework."
2) Highlighted the novelty in performance evaluation: "Our study further introduced a rigorous performance evaluation framework. We assessed model performance on both simulated and real-world radiomics data, allowing systematic comparison under controlled conditions and in a clinically relevant setting. Performance was evaluated based on three key criteria: (1) cluster robustness, measured through the Adjusted Rand Index (ARI) across bootstrapped iterations, (2) cluster interpretability, assessed using a Classification Tree (CART) to unpack high-dimensional radiomic features, and (3) clinical relevance."
3) Explained a key insight: the simulations provide a framework for understanding the concordance patterns among different pipelines under varying strengths of clustering effects, in which high concordance reflects strong clustering. The true strength of the clustering structure remains unknown when high-dimensional imaging data are used in cancer research, and pathological subtypes may not fully account for the observed variation in radiological image expression.
4) Discussed why performance differed across the pipelines and pointed out the limitations of t-SNE.
5) Explained how the clustering concordance across pipelines observed in the real-world data was consistent with the concordance patterns from the simulated data. This supports the conclusion of a moderate clustering strength, which aligned with the weak associations with clinical outcomes, reflected by an AUROC of ≤ 0.63.
6) Made a final recommendation: "Given the variability in clustering outcomes and the observed disagreement across different dimensionality reduction methods, we recommend a multi-pipeline approach for future studies."
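The bootstrap robustness criterion in point 2 can be sketched as follows. The sketch assumes NumPy, uses K-means directly on the features with no dimensionality-reduction step, and follows this resampling scheme: refit on each bootstrap resample, relabel the full dataset with the refitted centers, and compare against the full-data labels using the ARI. The response does not specify how bootstrap labelings were aligned, so this scheme and the names `kmeans` and `bootstrap_ari` are hypothetical. The same `adjusted_rand_index` function also measures concordance between two pipelines’ labelings of the same patients, as in points 3 and 5.

```python
import numpy as np
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand Index (Hubert and Arabie) between two labelings of the same samples."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)  # contingency table
    np.add.at(table, (ai, bi), 1)

    def pairs(counts):
        # Number of unordered sample pairs inside each cell / row total / column total.
        return sum(comb(int(c), 2) for c in counts)

    index = pairs(table.ravel())
    row_pairs, col_pairs = pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = row_pairs * col_pairs / comb(len(ai), 2)
    max_index = (row_pairs + col_pairs) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (index - expected) / (max_index - expected)


def sq_dists(X, centers):
    """Squared Euclidean distances, shape (n_samples, n_centers)."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's K-means with k-means++ seeding; keeps the lowest-inertia restart."""
    best, best_inertia = None, np.inf
    for _restart in range(n_init):
        centers = X[[rng.integers(len(X))]]
        while len(centers) < k:  # k-means++: sample new seeds proportional to d^2
            d2 = sq_dists(X, centers).min(axis=1)
            centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
        for _step in range(n_iter):
            labels = sq_dists(X, centers).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = sq_dists(X, centers).min(axis=1).sum()
        if inertia < best_inertia:
            best, best_inertia = centers, inertia
    return best


def bootstrap_ari(X, k, n_boot=50, seed=0):
    """Mean ARI between full-data labels and labels from bootstrap-refitted centers."""
    rng = np.random.default_rng(seed)
    reference = sq_dists(X, kmeans(X, k, rng)).argmin(axis=1)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))  # resample rows with replacement
        centers = kmeans(X[idx], k, rng)
        # Assign all original samples to the bootstrap centers, then compare labelings.
        scores.append(adjusted_rand_index(reference, sq_dists(X, centers).argmin(axis=1)))
    return float(np.mean(scores))
```

A mean ARI near 1 indicates that the partition is reproduced across resamples. For pipeline concordance, a call such as `adjusted_rand_index(labels_pipeline_a, labels_pipeline_b)` (hypothetical variable names) compares two pipelines directly.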
Regarding graphical representations, we initially considered including them. On further reflection, however, we concluded that although t-SNE plots and cluster heatmaps are visually appealing and can provide a high-level understanding of complex data, they have major limitations for interpreting the clinical meaning of clusters.
t-SNE preserves local relationships (i.e., nearby points stay nearby), but it distorts global distances. Clusters may appear well separated or compact in t-SNE space even when they are not truly distinct in the high-dimensional space. The position, size, and distance between clusters in a t-SNE plot therefore cannot be trusted for inference or interpretation. t-SNE is also non-deterministic: each run can produce a different visualization due to random initialization. Finally, it provides no statistical rigor.
Cluster heatmaps are highly sensitive to row/column ordering, which is often determined by hierarchical clustering dendrograms. Apparent clusters in a heatmap may arise from how the data are sorted rather than from meaningful subgroups. Heatmaps are also not robust to scaling or feature selection and often do not reveal the real structure.
Neither t-SNE nor cluster heatmaps can link back to the original variables to provide a clinical or biological interpretation. A preferable approach is transparent supervised learning, such as a classification tree, or direct feature comparisons between clusters (e.g., ANOVA, t-tests).
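The direct feature comparison mentioned above can be sketched as a one-way ANOVA F statistic for each radiomic feature across the cluster labels, computed with NumPy. Features are then ranked by how strongly they separate the clusters. The function names are hypothetical. If p-values are needed, `scipy.stats.f_oneway` returns both the F statistic and the p-value for the same test.

```python
import numpy as np


def anova_f(feature, labels):
    """One-way ANOVA F statistic of a single feature across cluster labels."""
    feature, labels = np.asarray(feature, dtype=float), np.asarray(labels)
    groups = [feature[labels == g] for g in np.unique(labels)]
    k, n = len(groups), len(feature)
    grand_mean = feature.mean()
    # Between-cluster and within-cluster sums of squares
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(X, labels):
    """Feature indices ordered from most to least separated by the clusters, plus F scores."""
    X = np.asarray(X, dtype=float)
    scores = np.array([anova_f(X[:, j], labels) for j in range(X.shape[1])])
    return [int(j) for j in np.argsort(-scores)], scores
```

Unlike a t-SNE plot or heatmap, each score refers back to a named original feature. Because the same data were used to form the clusters, these F statistics are best read descriptively rather than as confirmatory tests.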
In this study, we used a classification tree for cluster interpretation and have added the classification tree output to the Supplementary Material.
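A classification tree links clusters back to the original features through its splits. The root split of a CART-style tree can be sketched with NumPy as follows. It searches every feature and every midpoint threshold for the split that minimizes the weighted Gini impurity of the cluster labels, and a full tree repeats this search recursively within each child node. This is an illustration of the splitting criterion only, not the implementation used in the study, and the function names are hypothetical.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a set of cluster labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split(X, labels):
    """Return (feature_index, threshold, weighted_impurity) of the best single split."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    n, m = X.shape
    best = (None, None, gini(labels))  # no split yet: impurity of the root node
    for j in range(m):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values[:-1], values[1:]):
            t = (lo + hi) / 2
            left, right = labels[X[:, j] <= t], labels[X[:, j] > t]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if impurity < best[2]:
                best = (j, t, impurity)
    return best
```

The returned feature index and threshold read directly as a rule such as "feature 1 ≤ 5.0 → cluster 0", which is the kind of statement a t-SNE plot cannot provide.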
Reviewer 2 Report
Comments and Suggestions for Authors
Reviewer’s comments
This manuscript, "Novel Hyperparameter Optimization with a Chain of Dimension Reduction and Unsupervised Clustering Methods for Radiomics Data", is well-written. The work is relevant. However, some areas require significant revision to improve clarity and strengthen the manuscript to reach publication standards.
Here are my comments to improve the quality of the manuscript
Minor Corrections
- The title is quite ambiguous; consider changing it to be more specific. Instead of “Novel Hyperparameter Optimization”, I recommend “Integrated Hyperparameter Optimization with Dimensionality Reduction and Clustering for Radiomics: A Bootstrapped Approach”.
- The manuscript lacks a clear discussion of why t-SNE and NMF were selected as dimensionality reduction (DR) tools. Similarly, there is little reference to hyperparameter tuning in unsupervised radiomics studies. I recommend adding two more citations to expand the literature gap, specifically around pipeline optimization in unsupervised radiomics. Consider citing: “Enhanced MRI-based brain tumour classification with a novel Pix2pix GAN augmentation framework” and “A Comparative Analysis of the Novel Conditional Deep Convolutional Neural Network Model, Using Conditional Deep Convolutional Generative Adversarial Network-Generated Synthetic and Augmented Brain Tumour Datasets for Image Classification”.
- To improve the interpretation of clustering results, I recommend providing visuals (e.g., t-SNE plots or cluster heatmaps).
- The authors reported an AUROC value < 0.63. This indicates weak discrimination. I would recommend that the authors discuss the implications of this weak performance and future directions to improve its clinical relevance.
Author Response
Response to Reviewer-2
Minor Corrections
1. The title is quite ambiguous; consider changing it to be more specific. Instead of “Novel Hyperparameter Optimization”, I recommend “Integrated Hyperparameter Optimization with Dimensionality Reduction and Clustering for Radiomics: A Bootstrapped Approach”.
Ans: We thank the reviewer for the constructive suggestion. As recommended, we have revised the manuscript title to: “Integrated Hyperparameter Optimization with Dimensionality Reduction and Clustering for Radiomics: A Bootstrapped Approach.”
2. The manuscript lacks a clear discussion of why t-SNE and NMF were selected as dimensionality reduction (DR) tools. Similarly, there is little reference to hyperparameter tuning in unsupervised radiomics studies. I recommend adding two more citations to expand the literature gap, specifically around pipeline optimization in unsupervised radiomics. Consider citing: “Enhanced MRI-based brain tumour classification with a novel Pix2pix GAN augmentation framework” and “A Comparative Analysis of the Novel Conditional Deep Convolutional Neural Network Model, Using Conditional Deep Convolutional Generative Adversarial Network-Generated Synthetic and Augmented Brain Tumour Datasets for Image Classification”.
Ans: We thank the reviewer for this important point. The rationale for selecting t-SNE and NMF is now explicitly discussed in the revised manuscript. As noted in the Discussion section (lines 301–310), NMF was chosen for its ability to extract parts-based, interpretable components, which is advantageous for uncovering latent biological patterns in radiomic data. In contrast, t-SNE was included to evaluate the performance of a widely used nonlinear embedding technique. Its instability and lack of consistent cluster preservation across runs were also discussed (lines 309–318), highlighting why it ultimately underperformed in our pipeline evaluation.
We appreciate the reviewer pointing this out. We have read the literature the reviewer provided, as well as additional literature from our own search, and have decided to cite the following reference, which is more directly related to our discussion:
- Sun, Eric D., Rong Ma, and James Zou. "Dynamic visualization of high-dimensional data." Nature Computational Science 3.1 (2023): 86-100.
3. To improve the interpretation of clustering results, I recommend providing visuals (e.g., t-SNE plots or cluster heatmaps).
Ans: We thank the reviewer for raising this point. Interpretation of the clustering results is indeed important. t-SNE plots and cluster heatmaps are visually appealing and can provide a high-level understanding of complex data, but they have major limitations when interpreting the biological meaning of clusters.
t-SNE preserves local relationships (i.e., nearby points stay nearby), but it distorts global distances. Clusters may appear well separated or compact in t-SNE space even when they are not truly distinct in the high-dimensional space. The position, size, and distance between clusters in a t-SNE plot therefore cannot be trusted for inference or interpretation. t-SNE is also non-deterministic: each run can produce a different visualization due to random initialization. Finally, it provides no statistical rigor.
Cluster heatmaps are highly sensitive to row/column ordering, which is often determined by hierarchical clustering dendrograms. Apparent clusters in a heatmap may arise from how the data are sorted rather than from meaningful subgroups. Heatmaps are also not robust to scaling or feature selection and often do not reveal the real structure.
Neither t-SNE nor cluster heatmaps can link back to the original variables to provide a clinical or biological interpretation. A preferable approach is transparent supervised learning, such as a classification tree, or direct feature comparisons between clusters (e.g., ANOVA, t-tests).
The focus of our paper is to seek solutions that produce clusters that are reliable (highly repeatable), realistic (reflecting the true clustering effect rather than hallucinated structure), and meaningful (clinically or biologically interpretable, and associated with biological or clinical outcomes). Therefore, we feel that including visualizations may shift attention away from the primary focus of our paper.
- Yang, Z., Chen, Y., & Corander, J. (2021). T-SNE Is Not Optimized to Reveal Clusters in Data. arXiv preprint arXiv:2110.02573.
- https://bioinformatics.mdanderson.org/public-software/ngchm/heatmaps/
4. The authors reported an AUROC value < 0.63. This indicates weak discrimination. I would recommend that the authors discuss the implications of this weak performance and future directions to improve its clinical relevance.
Ans: We acknowledge the reviewer’s observation regarding the weak discriminatory performance (AUROC < 0.63). This point is addressed in the Discussion section (lines 319–334), where we noted the relatively low agreement between unsupervised clusters and clinical metrics, consistent with prior studies [1]. As discussed, this reflects the known disconnect between high-dimensional radiomic patterns and categorical clinical endpoints, which are often coarse and influenced by histopathological variability. However, our paper is a methodological study rather than a clinical one. The real-world data application is intended to illustrate how unsupervised methods can be appropriately applied when the clustering effect is modest, and pathological subtypes do not fully explain the observed variation in radiological image expression.
- Gao et al., "Identification of clear cell renal cell carcinoma subtypes by integrating radiomics and transcriptomics," Heliyon, vol. 10, no. 11, 2024.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I appreciate the authors' responses and the adjustments to the manuscript.
These modifications have greatly enhanced the clarity and quality of the piece.
Thank you very much.