A Systematic Review of the Current Status and Quality of Radiomics for Glioma Differential Diagnosis

Simple Summary
Gliomas can be difficult to discern clinically and radiologically from other brain lesions (either neoplastic or non-neoplastic) since their clinical manifestations as well as preoperative imaging features often overlap and appear misleading. Radiomics could be extremely helpful for non-invasive glioma differential diagnosis (DDx). However, implementation in clinical practice is still distant and concerns have been raised regarding the methodological quality of radiomic studies. In this context, we aimed to summarize the current status and quality of radiomic studies concerning glioma DDx in a systematic review. In total, 42 studies were selected and examined in our work. Our study revealed that, despite promising and encouraging results, current studies on radiomics for glioma DDx still lack the quality required to allow its introduction into clinical practice. This work could provide new insights and help to reach a consensus on the use of the radiomic approach for glioma DDx.

Abstract
Radiomics is a promising tool that may increase the value of imaging in differential diagnosis (DDx) of glioma. However, implementation in clinical practice is still distant and concerns have been raised regarding the methodological quality of radiomic studies. Therefore, we aimed to systematically review the current status of radiomic studies concerning glioma DDx, also using the radiomics quality score (RQS) to assess the quality of the methodology used in each study. A systematic literature search was performed to identify original articles published from 2015 onwards that focused on the use of radiomics for glioma DDx. Methodological quality was assessed using the RQS tool. Spearman’s correlation (ρ) analysis was performed to explore whether RQS was correlated with journal metrics and the characteristics of the studies. Finally, 42 articles were selected for the systematic qualitative analysis. Selected articles were grouped and summarized in terms of those on DDx between glioma and primary central nervous system lymphoma, those aiming at differentiating glioma from brain metastases, and those based on DDx of glioma and other brain diseases. Median RQS was 8.71 out of 36, with a mean RQS of all studies of 24.21%. Our study revealed that, despite promising and encouraging results, current studies on radiomics for glioma DDx still lack the quality required to allow its introduction into clinical practice. This work could provide new insights and help to reach a consensus on the use of the radiomic approach for glioma DDx.


Introduction
Gliomas are the most common primary brain tumors and originate from glial cells, including astrocytes, oligodendrocytes, and ependymal cells [1]. According to the World Health Organization (WHO) grading system, gliomas are categorized into grades 1 to 4. Except for pilocytic astrocytoma (WHO grade 1), all WHO grade 2-4 gliomas are malignant tumors [2]. Although comprising less than 2% of all newly diagnosed cancers, gliomas are associated with substantial mortality and morbidity. Of these, glioblastoma multiforme (GBM) is the most aggressive and lethal glioma and accounts for 70-75% of all gliomas [3].
Concerning clinical aspects, glioma predominantly manifests with neurological signs, which can also be encountered in other neoplastic and nonneoplastic lesions such as brain inflammation, abscess, lymphoma, or brain metastasis [4,5].
Brain imaging has a fundamental role in glioma management, for establishing an accurate diagnosis, classification, surgical planning, and post-treatment follow-up. Commonly, a brain computed tomography (CT) scan is the initial imaging modality used to diagnose glioma, which presents as a hypodense lesion, possibly showing rim enhancement following contrast agent injection. Despite providing important anatomical information, CT is usually followed by magnetic resonance imaging (MRI), which is generally considered superior to CT in terms of contrast resolution and can provide complementary information [6,7]. MRI with gadolinium contrast enhancement is considered the gold standard imaging method for assessing brain tumors. It provides information on location, mass effect, peritumoral edema, and contrast-enhancement [7]. However, advances in imaging techniques have allowed for a more detailed characterization of tumor characteristics and for a deeper investigation of glioma pathophysiological aspects. Advanced MRI sequences such as perfusion, advanced diffusion protocols [8], and susceptibility weighted imaging, as well as positron emission tomography (PET) scans with specific radiotracer, have emerged as valuable tools to inform clinical decision making and provide a non-invasive way to help in glioma management [9].
Nevertheless, beyond what concerns the overlapping clinical manifestations, gliomas can be difficult to discern radiologically from other brain lesions (either neoplastic or nonneoplastic) since their preoperative imaging features often overlap and appear misleading. Because certain lesions require nonoperative treatments, it is necessary to distinguish them from gliomas, and this constitutes a serious clinical challenge affecting both surgical planning and follow-up treatment.
For example, primary central nervous system lymphoma (PCNSL) is a common brain lesion that has shown an increase in occurrence in recent decades as the number of immunosuppressed and immunocompetent patients has increased. On MRI, PCNSL and high-grade gliomas share structural overlaps and anatomical similarities, both of which show contrast-enhancing lesions with peritumoral edema [10]. Similarly, distinguishing a glioma from brain metastasis is another clinical challenge, not only because of the similar symptoms of these conditions but also due to their very similar appearance on conventional MRI sequences as solitary, highly enhancing brain tumors surrounded by a T2-hyperintense edema [4,11].
Furthermore, despite the great spectrum of imaging available, a wide range of brain non-neoplastic disorders can mimic a brain tumor, both clinically and radiologically, posing a potential pitfall for physicians involved in patient care. For example, distinguishing brain parenchyma inflammation from grade II glioma can be difficult for neuroradiologists since both inflammation and glioma appear on conventional MRI sequences as lesions with a mass effect. Moreover, they have similar properties on specific sequences, such as hypointensity on T1-w, hyperintensity on T2-w, and no enhancement on postcontrast T1-CE [4,12].
As a result, there is a continued need for more accurate pre-operative glioma differential diagnosis (DDx), which may be conducted non-invasively with more advanced imaging techniques or through artificial intelligence methods [13,14].
In light of the above, the use of radiomics could be extremely helpful for non-invasive glioma DDx since it uses a voxel-by-voxel approach to convert the sparse imaging data into big data (histogram, texture, and transformed features). The concept behind radiomics is that medical images (e.g., CT, MRI, and PET) contain hidden information that can be discovered by quantitative image analyses and used to obtain pathophysiological information so as to supplement the data available to the radiologist [15,16].
Using advanced mathematical algorithms, radiomics has advantages in exploiting more tumor features that cannot be recognized by the naked eye [17]. The basic principle of radiomics is that a pathological process that alters the tissue modifies the intensity and distribution of the pixels, which will be reflected in different values of textural features with respect to those of the normal tissue and/or tissues affected by other diseases [18].
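As a minimal illustration of this principle, the sketch below computes a gray-level co-occurrence matrix (GLCM) and its contrast feature for two toy 4 × 4 patches in plain Python. The patches, the function names (glcm, glcm_contrast), and the single horizontal offset are assumptions chosen for illustration; they are not taken from any of the reviewed studies, which rely on dedicated radiomics software.

```python
from collections import Counter


def glcm(image, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix as a dict.

    Keys are (gray level i, gray level j); values are joint probabilities
    of pixel pairs separated by the offset (dx, dy).
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # count both directions (symmetric GLCM)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_contrast(p):
    """Contrast = sum over (i, j) of (i - j)^2 * p(i, j)."""
    return sum((i - j) ** 2 * prob for (i, j), prob in p.items())


# Hypothetical patches: uniform tissue vs. a strongly alternating pattern.
homogeneous = [[1, 1, 1, 1] for _ in range(4)]
heterogeneous = [[0, 3, 0, 3], [3, 0, 3, 0], [0, 3, 0, 3], [3, 0, 3, 0]]

print(glcm_contrast(glcm(homogeneous)))    # 0: all neighbors share one gray level
print(glcm_contrast(glcm(heterogeneous)))  # 9.0: every neighbor pair differs by 3
```

The same intensity histogram can produce very different contrast values depending on how gray levels are arranged spatially, which is the kind of difference textural features are meant to capture.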
In neuro-oncology, these features can potentially be used for DDx of newly diagnosed cerebral lesions suggestive of brain tumors [19].
In the last decade, radiomics studies aiming at differentiating gliomas from other intracranial diseases have substantially increased, with many demonstrating the power of radiomic features for distinguishing between gliomas and metastases, as well as gliomas and PCNSL, and also non-neoplastic brain diseases [12,20,21]. Nevertheless, the current use of radiomics in glioma differentiation is largely confined to the academic literature, with no research translating to clinical applications, thus generating doubts among clinicians about the validity of radiomics in this field. This is owing in part to a general lack of efficient and effective strategies for translation of imaging biomarkers into clinical practice. In response to the great need for qualified reporting and standardized evaluation of the performance, reproducibility, and clinical utility of radiomics, a system of metrics to determine the validity and completeness of radiomics studies was developed by Lambin et al. in the form of the radiomics quality score (RQS) [15]. The RQS is a modality-independent tool developed to assess the methodological quality of studies using radiomics. It is based on 16 items that reward and penalize the methodology and analyses of a radiomics study, thus encouraging best scientific practice.
Given the above, the aim of our study was to summarize the current status of radiomic studies concerning glioma DDx, evaluating the radiomics analysis conducted in previous publications by means of the RQS. Our intention was to promote the quality of radiomics research studies in glioma DDx, analyzing its feasibility for medical decision making, and triggering integrated clinical and advanced imaging analyses.

Search Strategy and Selection Criteria
A systematic search for all published studies using radiomics for glioma DDx was conducted. Three of the most relevant scientific electronic databases (PubMed, Web of Science, Google Scholar) were comprehensively explored and used to build the search. Only studies published since 2015 were selected. The last search was performed on 1 March 2022. The search strategy included the key terms listed in Supplementary Materials. The literature search was restricted to English-language publications and studies of human subjects.
Two reviewers independently screened the identified titles and abstracts and then assessed the full text of articles that evaluated the use of a radiomics approach for glioma DDx with respect to other diseases and were not review articles. For articles meeting these criteria with full text available, the following further selection criteria had to be fulfilled: inclusion of patients with diseases confirmed by pathology and/or surgery and/or overall analysis combined with medical history, clinical symptoms, and various imaging data; and presence of information about the imaging protocol. Studies were excluded if they aimed at differentiating between different types of glioma (this kind of classification cannot be considered as "DDx" since it falls within the "grading" task).

Planning and Conducting the Review
After the selection procedure, selected articles were analyzed by two reviewers, and data useful for conducting the systematic review were collected in a predesigned sheet. Extracted data included the following: study characteristics (first author name; publication year; scientometric indexes, namely Impact Factor (IF), 5-year IF, CiteScore, H-index, and first author H-index with and without self-citations; study design, in particular prospective or retrospective; and number of included patients), diseases involved in the DDx task, imaging modalities used for radiomic feature extraction, information on the ROI placement, software for radiomic feature extraction, number and type of features, feature selection methods (if used), classification methods, validation methods (if used), information on whether models were applied to separate test or validation datasets, highest accuracy/most important results, and main findings.
Studies were classified and analyzed according to the purpose they had, and in particular to diseases evaluated other than glioma in the DDx task. This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (see Supplementary Materials for PRISMA Checklist) [22]. This systematic review has been registered on the Centre for Open Science's Open Science Framework (OSF) (osf.io/3ksa9).

Quality Assessment Using RQS Evaluation
The methodological quality of each study was evaluated by two reviewers independently using the radiomics quality score (RQS) [15]. Any disagreement was resolved by consensus. The RQS tool is composed of 16 items structured to assess various crucial steps in the workflow of radiomics analyses. In particular, a maximum of 36 points can be assigned to each study: up to 2 points for the first RQS checkpoint (a single item, namely "Image protocol quality"), up to 3 points for the second (3 items, specifically on multiple segmentation strategies, the use of phantoms, and multiple imaging time points), and up to 31 points for the third (12 items, encompassing feature extraction, exploratory analysis design as well as model building and validation) (refer to Supplementary Table S1 for RQS checkpoints, items, and points for each item). The total score ranges between −8 and 36 and can be translated into a final 0-100 RQS percentage.
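The conversion from a total score to a percentage can be sketched as follows. This is a minimal sketch under two assumptions that the text above does not state explicitly: the percentage is taken relative to the 36-point maximum, and negative totals are reported as 0%. The function name rqs_percentage is hypothetical.

```python
RQS_MAX = 36  # maximum total score stated in the text
RQS_MIN = -8  # minimum total score stated in the text


def rqs_percentage(total_score):
    """Convert an RQS total score into a 0-100 percentage.

    Assumptions: the percentage is relative to the 36-point maximum,
    and negative totals are floored at 0%.
    """
    if not RQS_MIN <= total_score <= RQS_MAX:
        raise ValueError(f"RQS total must lie in [{RQS_MIN}, {RQS_MAX}]")
    return max(total_score, 0) / RQS_MAX * 100


print(round(rqs_percentage(19), 2))  # 52.78
print(rqs_percentage(-3))            # 0.0
```

Under these assumptions, a total of 19 points maps to 52.78%, and any negative total maps to 0%.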

Statistical Analysis
Spearman's correlation (ρ) analysis was performed to explore whether there was a correlation between RQS and journal metrics (Impact Factor (IF) of the journal at the year of publication, 5-Year IF, CiteScore, and H-index at the year of publication). Moreover, Spearman's correlation was used to explore the correlation between RQS and the H-index of the first author (both with and without self-citations) and the year of publication of the study, as well as the association with the number of patients involved in the study and the number of radiomic features investigated. Finally, to explore whether there was a difference in RQS according to the clinical purpose of the study, a subgroup analysis was performed using the Kruskal-Wallis test. In case of significance, Wilcoxon rank-sum post hoc tests with Bonferroni correction were carried out on each pair of groups. The significance level was set at 0.05. All statistical analyses were performed using SPSS (version 27) (SPSS Inc., Chicago, IL, USA).
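For readers who want to see what these two building blocks compute, the sketch below implements Spearman's ρ (as the Pearson correlation of average ranks) and the Bonferroni adjustment in plain Python. It is not the SPSS procedure used in this review: the data are hypothetical, the function names are illustrative, and the p-value for ρ as well as the Kruskal-Wallis and Wilcoxon tests themselves are deliberately left out.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical data: RQS totals and journal impact factors of five studies.
rqs = [4, 9, 12, 7, 15]
impact_factor = [2.1, 3.4, 6.6, 3.0, 4.9]
print(round(spearman_rho(rqs, impact_factor), 3))  # 0.9

# Hypothetical raw p-values from three pairwise post hoc comparisons.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

With three pairwise comparisons, a raw p-value must be below 0.05/3 to remain significant after adjustment, which is equivalent to comparing the adjusted value with 0.05.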

Study Selection
A total of 491 articles were retrieved by searching scientific electronic databases. After removal of duplicates, there were 124 articles left for investigation. By scanning the title and abstract of these records, 53 records were excluded because they clearly did not match the inclusion criteria (23 were off-topic, 14 were on glioma grading, 16 were review articles). A total of 71 articles were evaluated on their full text. Of these articles, 19 records were excluded based on the inclusion criteria (15 were off-topic, 11 were not on radiomics, 4 were on glioma grading). An additional 12 articles were found through references of selected articles or pre-existing review/systematic review/meta-analyses, of which 3 were included in our study. Finally, 41 records were included for qualitative synthesis. The PRISMA flow diagram of included studies according to the inclusion and exclusion criteria is presented in Figure 1.

Characteristics of Included Studies
Characteristics of the 42 selected articles are reported in Table 1. The median number of patients (±absolute deviation) was 107.5 ± 76.64. Study designs were prospective in 4.8% (2/42) and retrospective in 95.2% (40/42) of the studies. All studies except one investigated the power of radiomic features arising from MRI for DDx. Only two investigated radiomic features from 18FDG-PET [23,24] and only one investigated the power of CT radiomics for glioma DDx [25]. A total of 20 studies focused on radiomics for DDx of primary central nervous system lymphoma (PCNSL) and glioma (47.6%), with all but one involving grade IV glioma (GBM) patients. In total, 16 studies explored the diagnostic feasibility of radiomic features for DDx of glioma and metastases (38.1%), with all but three studies involving grade IV glioma (GBM) patients. One study investigated the power of radiomic features for DDx of GBM, PCNSL, and metastasis and was discussed separately (GBM vs. PCNSL and GBM vs. MET) [26]. The remaining five studies focused on DDx of glioma and other brain diseases (11.9%). Based on these findings, the following section was divided into three subsections, according to the diseases other than glioma involved in the included studies.

Radiomics for DDx of Glioma and PCNSL
In total, 21 studies focused on radiomics for DDx of PCNSL and glioma, with all but one [28] involving GBM. Among them, all but one extracted radiomic features from MRI sequences, while the remaining one focused on radiomic features extracted from PET [23].
Among MRI radiomic studies, 6 extracted radiomic features from CE-T1w images. Kunimatsu et al. performed two complementary studies [33,40]. In the first [33], they performed image feature extraction and selection and limited the analysis to a principal component analysis to find the predominant features for evaluating the differences between GBM and PCNSL. Training and cross-validation were performed in a subsequent study [40], which found an AUC from 0.87 to 0.99 for the training set and of 0.75 for the testing set.
Xiao et al. [36] compared different supervised classifiers based on T1-CE radiomic features and found that naive Bayes classifier had an AUC of 0.90 for preoperative discrimination of GBM and PCNSL. Similar studies were performed by Priya et al. and Chen et al. [47,57], who found similarly high AUC values for different combinations of classifier models and feature selection techniques. Chen et al. [29] proposed a method based on Scale Invariant Feature Transform features and found that an SVM model based on SIFT features yielded an AUC superior to 0.99 for GBM vs. PCNSL classification task.
Promising results in DDx between PCNSL and high-grade gliomas were also found by Alcaide-Leon et al. [28], who found that SVM classification based on textural features of T1w-CE is not inferior to expert human evaluation in the differentiation of PCNSL and high-grade gliomas, with similar results in terms of AUC. Notably, their study also involved grade III gliomas other than GBM.
Other studies built prediction models based on radiomic features extracted from multiparametric MRI (mpMRI). In particular, Kim et al. [21] found that a logistic regression-based classifier built starting from CE-T1, T2, and ADC features yielded an AUC above 0.95 for distinguishing between GBM and PCNSL. Similar classification performances were reached by mpMRI-based classifiers built in studies by Xia et al. and Bathla et al. [49,53]. Interestingly, Priya et al. found that T1-CE had comparable performance to that of mpMRI-based methods. However, these results were obtained from a three-class problem that also included a group of patients with metastasis. Promising results were also found by Suh et al. [35] in an mpMRI-based radiomic study involving features extracted from CE-T1, T2w, and FLAIR. They found that a random forest classifier built using these features outperformed both ADC values and visual analysis by human radiologists. Findings by Nakagawa et al. [34] were in line with those of Kim et al. However, unlike in that study, features were extracted from T2, rCBV, CE-T1WIs, and ADC.
Xia et al. [52] found that the combination of CE-T1w and ADC radiomic features showed high diagnostic performances (AUC = 0.94). Moreover, the integration of this model with radiologists' diagnoses outperformed performances of the radiologists alone. Similar results were obtained by Choi et al. [27], who found that the initial area under the curve derived from CE-T1w could be useful in combination with ADC for differentiating between PCNSL and atypical GBM.
Two studies were performed by the same group [21,44] and were also based on radiomic features extracted from CE-T1w and ADC. In the older one, they evaluated different feature selection methods and machine learning models and found that the combination of recursive feature elimination and a random forest classifier achieved an AUC of 0.984 in the internal and 0.94 in the external validation set. In the more recent study, they used fewer radiomic features (n = 936, compared with n = 1618 in the previous one) and applied four different classification metrics, two of which were based on radiomic features extracted from CE-T1w and ADC. For these two metrics, feature selection and classification were optimized with SVM, GLM, or random forest (metric 1) or with a multilayer perceptron (MLP) network (metric 2). They found that a deep learning-based MLP network classifier with radiomic features showed the highest performance in differentiating PCNSL from GBM. These results were in line with the considerations of Wu et al. [30], who also proposed a radiomic approach based on deep learning, using CE-T1WI and T2w as MRI sequences. In particular, they proposed a sparse representation-based radiomics system for distinguishing GBM from PCNSL and found that this approach outperformed traditional radiomics methods.
Among MRI-based studies, those by Wang et al. [43] and Bao et al. [37] were the only two that did not involve radiomic features extracted from CE-T1. Wang et al. [43] focused only on T2w and found that texture features from T2w could be used for differentiating GBM from PCNSL. However, it should be noted that they considered only 5 textural features. Bao et al. [37] found that the combination of whole-tumor-based histogram features from normalized cerebral blood volume (nCBV) and ADC for contrast-enhancing lesions could be useful for GBM/PCNSL differentiation.
Kong et al. [23] explored a 18F-FDG-PET-based radiomics approach to distinguish PCNSL from GBM. They extracted features from a standardized uptake value (SUV) map, an SUV map calibrated with the normal contralateral cortex (ncc) activity (SUV/ncc map), and an SUV map calibrated with the normal brain mean (nbm) activity (SUV/nbm map). They found that the most discriminative power was achieved by SUV first-order and textural features.

Radiomics for DDx of Glioma and Metastases
A total of 16 studies explored the diagnostic feasibility of radiomic features for DDx of glioma and metastases. All but two of them extracted radiomic features from MRI sequences, while one evaluated features from CT [25] and one extracted features from PET [24]. In all but three studies [25,46,58], the glioma group consisted of patients with grade IV glioma (GBM). Six studies extracted radiomic features from contrast-enhanced T1-weighted MRI scans [20,31,38,54,55,59]. Among them, the largest patient sample was investigated by Artzi et al. [31] (439 patients), who aimed at differentiating GBM and MET subtypes using radiomics analysis based on conventional post-contrast T1w. They tested four different types of machine learning algorithms (both supervised and unsupervised), revealing that SVM was the best (AUC = 0.98). They suggest that classification between glioblastoma and brain metastasis subtypes may require additional MRI sequences with other tissue contrasts. Similar study settings and results can be found in studies by Chen et al. [38], Han et al. [55], and De Causans et al. [54], in which diagnostic models were built based on multiple selection methods and classification algorithms for differentiating GBM from MET. Su et al. [59] aimed to differentiate GBM from primary brain metastases, finding that a radiomics model based on logistic regression might be a useful supporting tool for the preoperative differentiation of GBM from solitary brain MET due to an AUC superior to 80%. Ortiz-Ramon et al. proposed a radiomics MRI approach able to discriminate between GBM and MET with AUC > 80%. Unlike the previous three studies, they used radiomic features extracted from 2D ROIs.
Dong et al. [39], Qian et al. [42], and Bae et al. [45] investigated multiple classifiers for differentiating between solitary brain MET and GBM by extracting radiomic features from T1w, T2w, and T1-CE. Dong et al. [39] found that features derived from the peri-enhancing oedema region had moderate value in differentiating supratentorial single brain MET from GBM. Qian et al. [42] found more promising results, showing that the clinical performance of the classifier based on SVM and LASSO (>95%) was superior to neuroradiologists' performances. Bae et al. [45] also investigated multiple feature selection methods and classifiers for differentiating between single brain metastases and GBM. Interestingly, they also compared results from traditional machine learning radiomic approaches with a deep neural network approach. The latter performed better than the best-performing traditional machine learning classifiers or human readers and demonstrated good generalizability in the external validation.
Petrujkić et al. [41] aimed to differentiate GBM and solitary brain metastases of different origin by means of quantitative parameters of fractal and GLCM texture features from T2W, SWI, and CET1 images and found that texture features are more significant than fractal-based features for differentiating GBM from solitary MET.
A recent study by Priya et al. [56] also cross-compared multiple radiomics-based machine learning models using features extracted from mpMRI (T1W, T2W, T1-CE, ADC, FLAIR) for DDx of intracranial metastatic disease from GBM and found that FLAIR was the best individual sequence (LASSO-full feature set, AUC 0.951), while for combined T1-CE/FLAIR sequence, adaBoost-full feature set was the best performer (AUC 0.951).
Among studies investigating the value of MRI radiomics features in differentiating brain metastases from both high- and low-grade gliomas (unlike the previously discussed 9 studies involving only GBM), Dastmalchian et al. [46] found that texture features from MRI fingerprinting T1 and T2 maps were able to differentiate brain MET from high- and low-grade glial brain tumors. Notably, they did not build any multivariable model but performed ROC analysis on each feature. Similar results were obtained by Csutak et al. [51], who found that texture parameters from T2w were able to distinguish high-grade gliomas from MET. Notably, they investigated texture analysis of the peritumoral zone. Su et al. evaluated the utility of radiomics based on amide proton transfer-weighted imaging for the same purpose and in a similar patient cohort. Their classification model based on the random forest classifier achieved an AUC above 70%.
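A per-feature ROC analysis of the kind mentioned above, without any multivariable model, can be sketched by computing one AUC per feature. The sketch uses the pairwise (Mann-Whitney) formulation of the AUC in plain Python; the feature names and values are hypothetical and do not come from any of the cited studies.

```python
def auc(positives, negatives):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher feature value; ties count as 0.5."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical values of two texture features in two lesion groups.
features = {
    "feature_a": {"metastasis": [0.9, 0.8, 0.7, 0.6], "glioma": [0.5, 0.4, 0.65, 0.3]},
    "feature_b": {"metastasis": [0.2, 0.5, 0.4, 0.7], "glioma": [0.3, 0.6, 0.4, 0.5]},
}

# One AUC per feature, each evaluated on its own.
per_feature_auc = {
    name: auc(groups["metastasis"], groups["glioma"])
    for name, groups in features.items()
}
print(per_feature_auc)  # {'feature_a': 0.9375, 'feature_b': 0.5}
```

An AUC of 0.5 means the feature separates the two groups no better than chance. Because each feature is assessed in isolation, this kind of analysis says nothing about how features perform in combination.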
Among studies involving other modalities than MRI, Zhang et al. [24] found that an integrated radiomics model incorporating DWI and 18F-FDG PET improved the performance of differentiating GBM from solitary brain metastases. Promising performances (AUC = 0.992) were also obtained from models built using CT-based textural features to differentiate patients with high-grade gliomas from those with solitary brain metastases. However, the patient sample was relatively small (36 patients).

Radiomics for DDx of Glioma and Other Brain Diseases
Five studies focused on DDx of glioma and other brain diseases, of which two involved paediatric populations. In particular, Dong et al. [48] aimed to investigate the effectiveness of radiomics and machine-learning techniques based on mpMRI in distinguishing the glioma subtype ependymoma (EP) from medulloblastoma (MB). They explored different combinations of feature selection and machine learning techniques starting from features extracted from postcontrast T1w images and ADC maps, finding that multivariable logistic regression feature selection combined with the random forest classifier yielded an AUC of 91% for the classification of EP from MB. Zhou et al. [50] aimed to assess the power of machine learning radiomic-based models for differentiating paediatric posterior fossa tumors and involved a larger population of 288 patients. Unlike Dong et al. [48], they extracted features from T2w images and included patients with the glioma subtype pilocytic astrocytoma in addition to those with EP and MB in their cohort. Their automatic machine-learning approach achieved an AUC of 94% with an accuracy of 85% for differentiation between MB and non-MB (namely the glioma group) and was superior to a non-automatic pipeline and to qualitative expert MRI review. The third study involved adult patients and aimed to assess the value of MR-based radiomic features arising from T1w and T2w in differentiating brain inflammation from grade II glioma [12]. Its findings were promising, with model AUCs above 92% and performances superior to those of experienced radiologists. Finally, the remaining two studies investigated the ability of radiomics to differentiate between gliomas (in particular, necrotic glioblastomas [60] and cystic gliomas [61]) and brain abscess.

Quality Assessment with RQS
The details of the RQS of all included studies are provided in Supplementary Table S3. The average RQS total score was 8.71 ± 5.67, with the corresponding percentage of 24.21 ± 15.56%, ranging from 0.0 to 52.78% (Figure 2). Concerning the first RQS checkpoint (item 1), all studies provided a comprehensive documentation of imaging protocol, with only two of them scoring the maximum amount of points arising from the usage of a public protocol.
Concerning the second RQS checkpoint (items from 2 to 4), more than half of the studies (57.1%, 24/42) employed multiple segmentation (mainly arising from segmentation by different radiologists), but only five studies satisfied the item of "imaging at multiple time points" and only three articles satisfied that of "phantom study". Regarding items included in the third RQS checkpoint (items from 5 to 16), all but four studies (90.5%) applied feature reduction techniques. Only four studies (9.52%) performed multivariable analysis with non-radiomics features. Only 2 out of 42 included articles (4.76%) were able to detect and discuss biological correlates and only 15 (35.7%) provided a cut-off analysis.
All but one of the studies reported discrimination statistics and their statistical significance, of which all but three applied resampling techniques. Conversely, only 2/42 studies reported calibration statistics, and none of them applied resampling techniques.
In total, 35.7% of the studies (15/42) did not include a validation of their results. Among studies validating their results, only five validated analyses using an external validation cohort and one used two external validation cohorts. Moreover, 8/42 studies compared radiomics models with the specific gold standard and about half of the included studies (21/42) discussed the clinical utility of the developed model by means of decision curve analysis.
Finally, no study included a cost-effectiveness analysis and 11 made code and data publicly available.
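The relation between a total RQS and its percentage can be illustrated with a minimal sketch. It assumes the maximum score of 36 and, as an assumption not stated in the text, that negative totals (the RQS includes items with negative points) are reported as 0%, which would be consistent with the 0.0% lower bound of the reported range. The example totals are hypothetical and are not taken from the included studies.

```python
RQS_MAX = 36  # maximum total RQS ("out of 36")


def rqs_percentage(total_score: float, max_score: int = RQS_MAX) -> float:
    """Convert a total RQS into a percentage of the maximum score.

    Assumption: negative totals are floored at 0%, consistent with the
    0.0% lower bound of the reported range.
    """
    return max(total_score, 0) / max_score * 100.0


# Hypothetical totals for illustration only (not data from the included studies).
example_totals = [-2, 0, 9, 19]
percentages = [round(rqs_percentage(s), 2) for s in example_totals]
print(percentages)  # [0.0, 0.0, 25.0, 52.78]
```

Under this assumption, a total of 19 corresponds to the reported maximum of 52.78%. Flooring negative totals would also explain why the mean percentage (24.21%) can be slightly higher than the mean total divided by 36 (8.71/36 ≈ 24.19%).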

Statistical Analysis
There was a significant positive correlation between RQS and journal Impact Factor (ρ = 0.35, p = 0.022), number of patients involved (ρ = 0.44, p = 0.003), and number of radiomics features (ρ = 0.51, p = 0.0009) extracted in the study. On the other hand, weak positive but not significant correlations were found between RQS and 5-year IF, HI of the journal, and of the first author with and without self-citations (ρ = 0.25, ρ = 0.25, ρ = 0.20, and ρ = 0.22, respectively). No statistically significant differences were found between RQS of studies with different aims. Refer to Supplementary Table S2 for details of scientometric indexes of the included studies.
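As an illustration of the statistic used above, the following sketch computes Spearman's ρ as the Pearson correlation of the ranks, with tied values receiving their average rank. It is not the authors' analysis code; the function names and the data are hypothetical, and the p-value is not computed here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical values for illustration only (not data from the included studies).
rqs_totals = [2, 5, 9, 12, 19]
impact_factors = [1.8, 3.1, 2.9, 4.5, 6.0]
print(round(spearman_rho(rqs_totals, impact_factors), 2))
```

Because ρ depends only on ranks, it captures monotonic rather than strictly linear association, which suits bounded ordinal scores such as the RQS.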

Discussion
In this systematic review, we aimed to explore whether radiomics could provide information about the DDx of gliomas, summarizing the current status of the literature and evaluating the quality of the included studies using the RQS tool. Two reasons led us to perform the study. The first is the urgent need for clinicians to assess alternative noninvasive differential diagnostic tools that ensure an accurate preoperative assessment of intracranial masses, since the lack of a clear diagnosis may lead to invasive procedures that are inappropriate for the treatment of the primary disease and could also aggravate a patient's condition. The second is the potential power of radiomics for DDx of newly diagnosed cerebral lesions suggestive of brain tumors.
A total of 42 studies from 2015 onwards were examined. Almost all studies involved machine learning techniques for radiomic analysis, of which two involved unsupervised DNN techniques. Among studies involving supervised machine learning, 24 investigated multiple models combined with multiple feature selection methods and evaluated the combination providing the best result in terms of accuracy.
Despite promising results obtained from each of them (with best AUCs ranging from 0.7 to 0.99), our study revealed that those studies were far from providing definitive conclusions for clinical implementation and widespread use of radiomics for glioma DDx.
Almost all studies investigated radiomic approaches based on MRI. In particular, CE-T1WI sequence was the most investigated since it is the first-line MRI sequence for glioma assessment. Only two studies investigated the ability of PET radiomic features to differentiate gliomas from metastases [24] and glioma from PCNSL [23], and only one study was on CT [25].
The RQS results highlighted the main strengths and weaknesses of the radiomic workflow followed in each selected study. The mean RQS was 8.71 out of 36, with a mean percentage RQS of 24.21%, in line with previously published data regarding prostate, breast, lung, renal, and brain cancer [62][63][64][65]. The lack of a rigorous procedure for the radiomics workflow largely contributed to the low RQS scores of the included studies.
Concerning RQS checkpoint 1, the imaging protocol was well documented in all studies. However, only two studies used public imaging protocols, which allow reproducibility and replicability. Regarding the items in RQS checkpoint 2, more than half of the studies performed multiple segmentations to limit the bias arising from segmentation variability. It is worth noting that the ROI type (2D/3D) and the segmentation method (manual, semi-automatic, automatic) were not uniform across studies. Furthermore, manual segmentation or semi-automated segmentation with manual correction was used in almost all studies. This is a limitation, since manual segmentation is time-consuming and both manual and semi-automated segmentation introduce considerable observer bias in the form of intra- and inter-observer variation in ROI/VOI delineation [18]. The area used for feature extraction was also extremely variable across studies: some studies targeted the enhancing tumor (with or without the inclusion of necrosis and intratumoral cysts) [27,36,54], while others targeted the peritumoral zone [25,51,60].
Notably, only three studies assessed inter-scanner and inter-vendor variability by means of a phantom study, and only five collected images at multiple time points. On a positive note, considering the third RQS checkpoint, all studies except four performed feature reduction. This is a positive aspect, since excessive feature dimensionality negatively affects model performance and could lead to overfitting [66].
Another relevant finding emerging from our study was that only two of the included studies were prospectively designed. This constitutes an important limiting factor in radiomic research, since a well-designed prospective study can minimize potential confounding factors and therefore represents a higher level of evidence. For this reason, prospective studies are given the highest weighting in the RQS tool (7 points, around 20% of the full scale).
Notably, more than one-third of the reviewed papers (15/42) did not include a validation of their results, which increases the risk of false-positive results and hinders the translation of radiomics to clinical practice. On a positive note, among the remaining studies that did not validate on an independent cohort, almost all opted for cross-validation.
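For readers less familiar with cross-validation, the idea can be sketched as follows: the cohort is split into k folds, and each fold is held out once for testing while the model is fitted on the rest. The function name, the fold count, and the seed below are hypothetical choices for illustration; the included studies used their own implementations.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each sample is held out exactly once across the k folds.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducible folds
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, test_idx


# Hypothetical cohort of 42 patients split into 5 folds.
fold_sizes = [len(test) for _, test in k_fold_indices(42, 5)]
print(fold_sizes)  # [9, 9, 8, 8, 8]
```

In a radiomics setting, feature reduction and model fitting should be repeated inside each training fold so that the held-out patients do not influence feature selection. Even so, internal cross-validation estimates performance on the same cohort and does not replace validation on an independent external cohort.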
Most studies lacked any kind of openness in sharing datasets, segmentations, or code, and this constitutes a significant limitation in terms of verification and reproducibility of the reported findings [67,68].
The same applies to cost-effectiveness analysis, which no study included. Such an analysis evaluates a radiomics prediction model in terms of health economics if applied in clinical practice, by comparing the health effect of using a radiomics predictor with that of not using one, on the assumption that a novel predictor should not be more expensive than currently available predictors when accuracy is comparable [15]. However, this RQS item is of secondary importance, since standardization and validation of radiomics models are its prerequisites.
It should be highlighted that only 20/42 studies referred to the IBSI guidelines or used IBSI-compliant software for radiomic feature extraction (e.g., PyRadiomics). Adhering to the IBSI standardization of radiomics feature nomenclature and calculation is important to improve the reproducibility of scientific research [69]. Future studies should therefore adhere more closely to the standardization of radiomics features.
To our knowledge, this is the first systematic review aimed at exploring whether radiomics could provide information about the DDx of gliomas and evaluating studies by means of the RQS tool.
Previous studies evaluated the quality of radiomic analyses for other applications. Park et al. evaluated radiomics analysis in neuro-oncologic studies according to RQS and found that the quality of reporting of radiomics studies was insufficient, with a median RQS of 11 out of 36 [65]. The results of a study by Stanzione et al. on prostate MRI radiomics were in line with our findings and revealed an average RQS score of 7.93 and an RQS percentage of 23% [62]. Wang et al. performed a systematic review of radiomic studies focused on lymphoma and found a mean percentage RQS of 14.2% [70]. Notably, their study included 12 studies also evaluated in our systematic review, in particular those on DDx of glioma and PCNSL.
Unlike most studies investigating the quality of radiomic studies by means of RQS, we considered it appropriate to investigate the possible association between RQS and scientometric indexes, and we found that publications with higher RQS tended to be published in journals with higher IF. However, studies with high RQS and low IF, and vice versa, were also found. Interestingly, we also found that the quality of the included studies increased with the number of included patients and the number of extracted features.
Our review of the literature has some limitations that should be acknowledged. First, as also highlighted in previous studies, the RQS scoring system is not a gold standard for assessing radiomics studies and still needs revision to become a widely accepted tool in radiology. Some aspects of the RQS scoring system, such as the difficulty of implementing imaging at multiple time points and phantom studies on all scanners, as well as its lack of specificity for a particular study aim, could therefore penalize the current literature more than necessary [65,71]. Another limitation is that almost all included studies were retrospective and are therefore more prone to bias [72,73]. This aspect, together with the absence of external validation cohorts and of comparison with reference standards in most included studies, prevented us from drawing conclusions about the efficacy of radiomics for glioma DDx. Moreover, the high variability in sample size, inclusion criteria, and methodological settings across studies prevented us from performing a meta-analysis according to the aims of the studies. Finally, we did not investigate specific radiomics features shared among different studies (according to the specific aim), given the extreme variability of imaging protocols and feature extraction software.

Conclusions
Despite promising and encouraging results found in each of the included studies, our study revealed that the current literature on radiomics for glioma DDx still lacks the quality required to allow its introduction into clinical practice. In particular, validation using an external dataset is necessary, and improvements need to be made in feature reproducibility, analysis of clinical utility, pursuit of a higher level of evidence in study design, and openness of science. However, the value of these studies might go beyond what was formally assessed with the RQS tool, and further efforts are warranted to provide more solid evidence and a basis for future investigations in this field. This work could provide new insights and help to reach a consensus on the use of the radiomic approach for glioma DDx.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14112731/s1, Table S1: RQS checkpoints, items and points for each item; Table S2: Journal metrics of the included studies; Table S3: Details of methodological quality assessment by Radiomic quality score (RQS) tool.

Conflicts of Interest:
The authors declare no conflict of interest.