Peer-Review Record

An Investigation on Radiomics Feature Handling for HNSCC Staging Classification

Appl. Sci. 2022, 12(15), 7826; https://doi.org/10.3390/app12157826

by Nadia Brancati¹,*, Massimo La Rosa¹, Giuseppe De Pietro¹, Giusy Esposito²,³, Marika Valentino⁴,⁵, Marco Aiello⁶ and Marco Salvatore⁶

Reviewer 1: Yi Tao

Reviewer 2: Carlo Aprile

Reviewer 3: Anonymous

Submission received: 17 June 2022 / Revised: 1 August 2022 / Accepted: 3 August 2022 / Published: 4 August 2022

(This article belongs to the Special Issue Artificial Intelligence and Radiomics in Computer-Aided Diagnosis)

Round 1

Reviewer 1 Report

This work proposes a machine learning method based on radiomics features, extracted from CT and PET images, to classify the disease in terms of pN-Stage, pT-Stage and overall Stage. It is an interesting work. My comments are below:

1. Line 32: please add references to the statement “a lot of studies have been published in the radiomics field”.

2. Line 59: change “Accuracy” to “accuracy”.

3. Line 154: a Pearson coefficient less than 0.85 was selected as the threshold. Please add a reference.

4. Line 295: change “resulted” to “resulted in”.

5. Line 305: change “Age and Gender” to “age and gender”.

6. Line 315: change “give” to “gave”.

Author Response

We thank the reviewer for the comments.

Our answers to the reviewer's comments follow:

1) Line 32: please add references to the statement “a lot of studies have been published in the radiomics field”. We added references [9] to [12].

2) Line 59: change “Accuracy” to “accuracy”. Done.

3) Line 154: a Pearson coefficient less than 0.85 was selected as the threshold. Please add a reference. We added references [42] and [43], in which the Pearson correlation coefficient is used for feature selection. The threshold of 0.85 was chosen after a few experiments with other thresholds (0.75, 0.80, and 0.90). Thresholds below 0.85 selected fewer features for classification and eliminated features that could be important for reliable classification, while raising the threshold to 0.90 retained too many features, and the resulting redundancy could compromise the reliability of the classification results.

4) Line 295: change “resulted” to “resulted in”. Done.

5) Line 305: change “Age and Gender” to “age and gender”. Done.

6) Line 315: change “give” to “gave”. Done.
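
The correlation filter discussed in response 3 can be illustrated with a minimal Python sketch. It assumes a greedy rule: a feature is kept only if its absolute Pearson coefficient with every feature already kept is below the threshold. The response does not state which member of a correlated pair was discarded, so this rule, the function names `pearson` and `drop_correlated`, and the toy feature names and values are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.85):
    """Greedy filter (an assumption, not the authors' stated procedure):
    keep a feature only if |r| with every already-kept feature is below
    the threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy features: feat_b is almost a rescaled copy of feat_a.
columns = {
    "feat_a": [1, 2, 3, 4, 5],
    "feat_b": [2, 4, 6, 8, 10.5],
    "feat_c": [5, 1, 4, 2, 3],
}

kept_at_085 = drop_correlated(columns, threshold=0.85)
kept_at_025 = drop_correlated(columns, threshold=0.25)
print(kept_at_085, kept_at_025)
```

On this toy data, lowering the threshold keeps fewer features, which mirrors the trade-off the authors describe.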

Reviewer 2 Report

The stage evaluation of HNSCC and lymph node status greatly affect the prognosis and management of these patients.
In addition to common diagnostic imaging methods, biopsy specimens are needed not only in the diagnostic/staging phase but must also be repeated several times to evaluate the response to chemo-radiotherapy. More recently, advances in the field of Artificial Intelligence and Machine Learning seem to indicate the possibility of earlier diagnosis, more accurate staging and personalised therapy. Brancati and colleagues address the issue by applying different analytical approaches to an open-source HNSCC dataset.
The results they obtained are very promising for N, T, and overall stage, even if overall stage classification requires a higher number of features. The paper highlights the importance of radiomics analysis of both FDG-PET and CT, which are usually performed in the diagnosis/staging and follow-up steps. Therefore, the introduction of such an approach could reduce the costs for the health system and the dosimetric cost for patients.
A limitation of the research is its retrospective design. The reported data are promising but need to be validated in a prospective study in the future.

Author Response

We thank the reviewer for the comments.

We added the following sentence to the "Conclusions" section (highlighted in blue in the manuscript):

"Our future aim is to strengthen these results by validating our model in a prospective study."

Reviewer 3 Report

This paper reports the results of a machine learning method based on radiomics features, extracted from CT and PET images, for HNSCC staging classification. The authors' results may represent an important contribution to oncology research. The study was well planned, the methods are described in detail, the results are consistently presented, and the conclusions correspond to the data obtained. Overall, the present quality of the manuscript is sufficient for publication. However, please consider the following suggestions, which might help to improve the manuscript.

Specific comments:

1. How was the sample size determined? Please provide more information about participants in this study. What are the inclusion and exclusion criteria in the study?

2. If ethics committee approval was obtained, please provide the relevant protocol number in the manuscript.

3. Please provide information about the statistical analyses in the Materials and Methods section.

4. Please provide more information about how the features extracted from PET images play a key role in this study. It would be helpful if the authors gave an example or scenario to support the description.

5. Line 204: please correct “problem problems”.

Author Response

We thank the reviewer for the comments.

Our answers to the reviewer's comments follow:

R1. How was the sample size determined? Please provide more information about participants in this study. What are the inclusion and exclusion criteria in the study?

A1. We provided information on patients and on the inclusion and exclusion criteria in the “H&N dataset” subsection. For the sake of clarity, we reproduce the relevant text below:

“[...] The dataset collects 298 patients from four different institutions (HGJ, CHUS, HMR and CHUM) with Head and Neck Squamous Cell Carcinoma (HNSCC), locoregional recurrence, and distant metastasis. For each patient, fluorodeoxyglucose (FDG) PET and CT diagnostic images were acquired before treatment. Further information about specific cohorts, image acquisition, Region Of Interest (ROI) contours, and clinical data is presented in a reference study [36]. In our work, 201 patients with an oropharyngeal primary tumor have been selected, and patients with a primary tumor site in the Larynx (n=45), Nasopharynx (n=28), Hypopharynx (n=13) or Unknown (n=9) have been excluded. Moreover, patients with Overall Stage equal to I have been excluded from this analysis (n=2). [...]”.
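
The counts quoted above can be cross-checked with a few lines of Python. This is only one reading that is consistent with the quoted numbers: it assumes the exclusion groups are disjoint and that 201 is the cohort size after both the site-based and the Stage I exclusions. The variable names are illustrative.

```python
# Counts as quoted in the response; the variable names are illustrative.
total_patients = 298
excluded_by_site = {
    "Larynx": 45,
    "Nasopharynx": 28,
    "Hypopharynx": 13,
    "Unknown": 9,
}
excluded_stage_one = 2

# Assumption: the exclusion groups do not overlap.
remaining = total_patients - sum(excluded_by_site.values()) - excluded_stage_one
print(remaining)
```

Under these assumptions the result matches the 201 selected patients reported by the authors.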

R2. If ethics committee approval was obtained, please provide the relevant protocol number in the manuscript.

A2. The TCIA "Head-Neck-PET-CT" dataset used in our study is available online at:

https://wiki.cancerimagingarchive.net/display/Public/Head-Neck-PET-CT#242838670b323d6250cc42fa8fa09821fabe0bd7.

It was released under the TCIA rules and can be used following the TCIA data usage policies and restrictions:

https://wiki.cancerimagingarchive.net/display/Public/Data+Usage+Policies+and+Restrictions

R3. Please provide information about the statistical analyses in the Materials and Methods section.

A3. As suggested, we reorganized the sections on feature selection and classification, since the statistical parts were already included in those sections. We report below how we modified these sections to highlight the statistical analyses. The modified text is in blue:

2.4. Statistical-based feature selection methods

After the feature extraction step, feature selection has been performed to reduce the high number of extracted features, applying different statistical measures to identify the most appropriate statistical-based feature selection method. The high number of extracted features leads to correlations and redundancies, which can degrade further analyses both in speed, because of the high dimensionality, and in prediction accuracy, because of the irrelevant details [41].

Nevertheless, it is important to demonstrate which features contribute most to classification after feature selection, without excluding a priori features that can be pivotal to the clinical outcome. Among the statistical methods for selecting features, Pearson correlation-based selection has been used to eliminate the most correlated features: only features with a Pearson coefficient below the threshold of 0.85 have been retained, both for PET (SUV maps) images and CT ones [42, 43]. Moreover, a set of supervised feature selection methods has been tested: Chi-square, Mutual Information, Fisher Score, ANOVA F-value, and Recursive Feature Elimination using Logistic Regression [44]. After several experiments, Recursive Feature Elimination (RFE) using Logistic Regression (LR) turned out to be the best method for selecting particularly significant features for classification purposes. RFE is a wrapper selection algorithm that reduces feature dimensionality by recursively constructing the model. Here, it trains an LR model, then computes the ranking criterion for all features and removes those with the smallest ranking criterion [45]. A Leave-One-Out (LOO) cross-validation (CV) on DS1 has been conducted in order to determine the best subset of features selected by RFE-LR. For each patient in the test fold, a set of features SFk with k in [5, 10, 20, 50, 100] was selected to train the LR model. Finally, at predefined cutoff values of the 25th and 75th percentile, two subsets of features were selected for each set SFk, denoted SF1k and SF2k respectively. In conclusion, 10 subsets of features were selected for each patient.
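
The RFE loop described above (train an LR model, rank the features, remove the weakest, repeat) can be sketched in dependency-free Python. This is an illustration under stated assumptions, not the authors' implementation: the ranking criterion is taken to be the absolute LR weight, one feature is removed per round, the model is fitted by plain batch gradient descent, and the function names and toy data are hypothetical. In practice a library implementation would normally be used instead.

```python
import math


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Binary logistic regression fitted by batch gradient descent.
    Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe_logistic(X, y, n_keep):
    """Recursively drop the feature with the smallest |weight| (assumed
    ranking criterion) until n_keep feature indices remain."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        X_sub = [[row[j] for j in kept] for row in X]
        w, _ = fit_logistic(X_sub, y)
        weakest = min(range(len(kept)), key=lambda j: abs(w[j]))
        del kept[weakest]
    return kept


# Hypothetical toy data: columns 0 and 3 carry the class signal,
# columns 1 and 2 are uninformative patterns.
X = [
    [-2.0, 1, 1, -1],
    [-1.5, -1, 1, -2],
    [-1.0, -1, -1, -1],
    [-0.5, 1, -1, -2],
    [0.5, 1, -1, 2],
    [1.0, -1, -1, 1],
    [1.5, -1, 1, 2],
    [2.0, 1, 1, 1],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = rfe_logistic(X, y, n_keep=2)
print(selected)
```

The weights are comparable here only because the toy columns have similar scales; with real radiomics features, standardizing each feature before ranking would be a sensible precaution.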

2.5. Multivariate analysis for classification pipeline

After the feature selection step, a classification pipeline has been implemented. Six well-known ML algorithms have been tested: Decision Tree (DT) [46], K-Nearest Neighbors (KNN) [47], Multi-Layer Perceptron (MLP) [48], Naive Bayes (NB) [49], Random Forest (RF) [50], and Support Vector Machine (SVM) [51]. To address the data imbalance, the Synthetic Minority Oversampling Technique (SMOTE) algorithm [52] has been used. This algorithm helps to overcome the over-fitting problem that arises when random oversampling is used. New instances in feature space are generated by interpolating between neighboring positive instances. The subsets SF1k and SF2k have been fed to the six ML models. As in the case of feature selection, LOO-CV on DS1 has been conducted to determine the predictive power of each model. F1-score, AUC and Accuracy have been used to estimate the predictive performance of the models. In particular, AUC has been chosen as the main measure to quantify the predictive performance of the classification methods. In the case of multi-class classification (pT-Stage and Overall Stage), AUC has been computed with a One vs Rest (OvR) strategy: one class was considered “positive”, while all the others were considered the “negative” classes. Finally, the mean of the AUCs was computed. Thus, for each of the subsets SF1k and SF2k, the model with the best AUC value has been selected. Next, the best models have been tested on the independent test set, DS2, in order to assess the validity and robustness of the approach.
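
The One vs Rest averaging described above can be made concrete with a short Python sketch. It assumes an unweighted (macro) mean over classes and computes each binary AUC as the fraction of positive/negative pairs ranked correctly, with ties counted as one half. The function names, class labels and scores are hypothetical and are not taken from the study.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive is scored above a
    random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def ovr_macro_auc(y_true, proba, classes):
    """Unweighted mean of per-class One vs Rest AUCs.
    proba[i][k] is the score of sample i for classes[k]."""
    aucs = []
    for k, c in enumerate(classes):
        labels = [1 if y == c else 0 for y in y_true]  # class c vs the rest
        scores = [row[k] for row in proba]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Hypothetical three-class example with six samples.
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]

mean_auc = ovr_macro_auc(y_true, proba, classes)
print(round(mean_auc, 3))
```

In this toy case the per-class AUCs are 0.875, 0.875 and 1.0, so the mean is about 0.917.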

Furthermore, to better understand the classification results in terms of F1-score and AUC with respect to the selected features for each kind of diagnostic image, we have used a horizontal bar plot, where the x-axis shows the range of F1-score and AUC values and the different bar colors, explained in the legend, refer to all possible combinations of the selected features. In this way, we have been able to easily determine which feature combination had the highest values of F1-score and AUC, and hence the most accurate, reliable, and suitable diagnostic tool for head and neck tumour discrimination.

All the statistical analyses have been conducted by using Python v3.7.

R4. Please provide more information about how the features extracted from PET images play a key role in this study. It would be helpful if the authors gave an example or scenario to support the description.

A4. In the “Discussion” section, we added the following text (highlighted in blue):

“For instance, as shown in Table S.2 in the Supplementary Materials, for pT-Stage classification only features coming from PET images were selected, while for the more complex Overall Stage classification features coming from both PET and CT were necessary to achieve promising results. The best results for pN-Stage classification were obtained by using only features coming from original images, but in this case too features from PET images were selected together with some features from CT images, as shown in Table S.3 in the Supplementary Materials, highlighting the importance of PET imaging in head and neck cancer diagnosis.

In fact, the importance and the significance of radiomics features extracted from PET images is confirmed by Bogowicz M. et al. [28], asserting that, since CT images are more prone to artifacts altering the classification purpose, radiomics features extracted from PET images are more robust for the classification modeling in head and neck region.”

[28] Bogowicz, M.; Riesterer, O.; Stark, L.S.; Studer, G.; Unkelbach, J.; Guckenberger, M.; Tanadini-Lang, S. Comparison of PET and CT radiomics for prediction of local tumor control in head and neck squamous cell carcinoma. Acta Oncologica 2017, 56, 1531–1536.

R5. Line 204: please correct “problem problems”.

A5. Done
