1. Introduction
Sinonasal tumors are rare conditions that include both benign and malignant entities. Malignant tumors account for 3–5% of all head and neck neoplasms and often carry a poor prognosis, primarily due to late diagnosis [1,2,3]. Although benign tumors typically exhibit non-aggressive behavior, they can present at advanced clinical stages and, like malignant tumors, produce nonspecific symptoms. Moreover, certain benign tumors have the potential to recur or even to transform into malignancies [4]. Distinguishing between malignant (cancerous) and benign (non-cancerous) tumors is crucial, as it significantly affects patient management, prognosis, and treatment strategies [5,6,7].
The initial diagnostic approach is generally similar for both benign and malignant tumors. It involves a comprehensive physical examination, including nasal endoscopy, imaging, and tissue biopsy [5,6]. Radiological evaluations are essential for diagnosing, staging, and determining the feasibility of tumor resection. Both Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) are indispensable for devising an appropriate treatment plan [8].
CT scans, while helpful, provide limited insight into specific tumor patterns, with exceptions such as fibro-osseous lesions like osteomas, fibrous dysplasia, and ossifying fibromas [9]. Additionally, CT scans may overestimate disease extent, as they cannot reliably differentiate between tumors, inflamed mucosa, and retained secretions [10,11]. To better delineate tumor patterns, assess surrounding soft tissues, and evaluate orbital or intracranial involvement, MRI is often superior to CT, owing to its ability to manipulate contrast between different tissues through techniques such as fat suppression and diffusion weighting, among others. Imaging features on conventional MR sequences, such as T2-weighted (T2-w) and T1-weighted (T1-w) images (with and without gadolinium-based contrast agent, GBCA), may predict tumor histology before biopsy [8,10].
However, MRI diagnostic accuracy is limited by spatial resolution and relies on the operator’s expertise, introducing an element of subjectivity. Furthermore, the morphological features of benign and malignant sinonasal tumors often overlap, complicating their differentiation [12]. As a result, traditional imaging techniques only scratch the surface of the diagnostic potential they could provide.
The emergence of artificial intelligence (AI), machine learning (ML), and radiomics holds promise for addressing these limitations [13]. These technologies can extract hidden and otherwise inaccessible radiological information, potentially improving diagnostic precision and accelerating the pathway to a final diagnosis. This advancement could significantly enhance treatment planning and prognosis [12,14,15,16,17,18,19,20,21,22].
The aim of this study was to assess the potential of a radiomic pipeline applied to conventional MRI scans for classifying sinonasal and skull base neoplasms as either benign or malignant. Clinical and radiomic variables were evaluated and selected to possibly improve the discrimination model. A dedicated ML pipeline for the classification of malignant and benign tumors was developed. The analyses were performed using both radiomic and clinical features, demonstrating the effectiveness of their synergy in determining the most informative subset of variables. Emphasis was given to the feature selection strategy employed for data integration, which is an active topic in the current radiomic literature. A hybrid signature of clinical and radiomic features was proposed, customizing the DNetPRO algorithm [23] for this purpose [24]. The classification performance of the DNetPRO signature was compared to that of other standard ML approaches. The ability to inspect the elements of the network signature extracted by the DNetPRO algorithm also provided explainable results associated with the numerical performance of the model. To the authors’ knowledge, this is the first application of the DNetPRO approach to this kind of tumor and the second to radiomic feature selection. This technique represents a technical innovation in this field, allowing the depiction of the most important features as well as their relations.
2. Materials and Methods
2.1. Study Population and Patient Selection
A retrospective study was conducted using data from patients referred to two Italian tertiary-care referral centers for sinonasal and skull base tumors, namely “Ospedale di Circolo e Fondazione Macchi” in Varese and “Ospedale Sant’Anna” in Como. The reference period spanned from 1 January 2011 to 1 April 2024. The inclusion criteria were as follows: (i) histopathologically diagnosed sinonasal or skull base tumors; (ii) primary or recurrent tumors; (iii) MRI examination performed no more than 4 weeks before the intended treatment; (iv) MRI studies including contrast-enhanced T1-w images (CE T1-w in the following) and axial T2-w images; (v) image acquisition on 1.5 T scanners. The exclusion criteria were as follows: (a) incomplete or unavailable clinical or radiological data; or (b) MR sequences with a signal-to-noise ratio ≤ 1.0. All tumors were categorized as either benign or malignant based on histopathological analysis and according to the most recent World Health Organization (WHO) classification [25]. This study was approved by the Institutional Review Board of the hospitals (Insubria Board of Ethics, approval number 0033025/2015, approval date: 7 July 2015), and written informed consent was waived.
2.2. Image Acquisition and Segmentation
All participants underwent an MRI examination protocol that included at least the following sequences: fast spin echo (FSE) T1-w, FSE fat-saturated T2-w, and CE FSE T1-w. The latter sequence was acquired following intravenous administration of a gadolinium-based contrast agent (gadopentetate dimeglumine, Magnevist; Bayer, 0.2 mL/kg, or gadobutrol, Gadovist; Bayer, 0.1 mL/kg). All scans were acquired on one of two 1.5 T MR scanners (Philips Achieva at the Varese hospital; GE Signa at the Como hospital). MR images were reviewed by an experienced head and neck radiologist to confirm the visual quality, readability, and adequacy for the subsequent analysis phases.
Image alignment and semi-automatic 3D volume of interest (VOI) segmentation were performed on CE T1-w and T2-w images using ITK-SNAP software (version 3.8.0) [26]. Two additional semi-automatic VOI segmentations and an alignment review were performed by two expert radiologists to assess intra-observer agreement. Subsequently, the processed alignment and semi-automatic segmentation were independently reviewed to assess inter-observer agreement. Intra-/inter-observer disagreements were resolved by consensus. Quantitative inter-/intra-observer variability was assessed using the Dice similarity coefficient (DSC), which yielded a value of 0.90.
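For readers unfamiliar with the DSC, the following minimal sketch shows how the coefficient is computed for two binary masks, DSC = 2|A ∩ B| / (|A| + |B|). The function name `dice_coefficient` and the toy masks are illustrative assumptions; they are not the study's segmentations or code.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks (flat sequences)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # convention chosen for this sketch: two empty masks agree
    return 2.0 * overlap / (size_a + size_b)


# Hypothetical flattened masks drawn by two readers over the same eight voxels.
reader_1 = [0, 1, 1, 1, 1, 0, 0, 0]
reader_2 = [0, 0, 1, 1, 1, 1, 0, 0]
dsc = dice_coefficient(reader_1, reader_2)
print(dsc)  # 3 shared voxels, 4 voxels per mask: 2 * 3 / 8 = 0.75
```

A 3D VOI mask can be flattened to a one-dimensional sequence before calling the function, since the DSC depends only on voxel counts.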
2.3. Clinical and Radiological Variables Selection
The population analyzed in the current study includes 145 patients, split into 76 (52%) malignant tumors and 69 (48%) benign ones. Patient demographics, tumor details, and clinical characteristics were recorded in a dedicated database and are summarized in Table 1. For some patients, the whole set of characteristics was not available, leading to some missing values (i.e., for tumor size, bone involvement, perineural spread, gross adjacent involvement, and epicenter).
Clinical information included age, sex, risk factors, symptoms, and tumor side. More specifically, symptoms were divided into class 1, asymptomatic or nonspecific (nasal obstruction, anosmia, headache, and epiphora), and class 2, red flags (persistent epistaxis, visual impairment, diplopia, proptosis, facial pain, ocular movement pain, and facial or palatal swelling). Imaging features included tumor size (more or less than 5 cm), T2-w low signal (more or less than 50% of tumor), margins (well- or ill-defined), cystic component (yes/no), necrosis (more or less than 10% of tumor), septations (yes/no), bone involvement (yes/no), pattern of enhancement (heterogeneous or homogeneous), perineural spread (yes/no), midline crossing (yes/no), epicenter (nasal cavity vs. ethmoid sinus vs. maxillary sinus vs. frontal sinus vs. sphenoid sinus), and gross adjacent site involvement (yes/no) of at least one of the following: orbit, brain, nasolacrimal drainage system, palate, or skin.
The statistical analysis of clinical and demographic characteristics was performed after transforming the values into categorical variables. The association between each clinical variable and the tumor outcome was assessed using Cramér’s V statistic [27] with the Bergsma–Wicher correction [28]. The test indicated a lower degree of association with the tumor outcome for all the considered variables except for gross adjacent site involvement (0.38), symptoms (0.41), and tumor size (0.38).
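As an illustration of the statistic used above, the sketch below computes the bias-corrected Cramér's V for two categorical variables following the standard Bergsma formulation (the chi-square-based phi squared and the table dimensions are both corrected for sample size). The function name and the toy data are assumptions made for this example and are unrelated to the study's variables.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long categorical sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    row_counts, col_counts = Counter(x), Counter(y)
    joint = Counter(zip(x, y))
    # Pearson chi-square statistic of the contingency table.
    chi2 = 0.0
    for a, n_a in row_counts.items():
        for b, n_b in col_counts.items():
            expected = n_a * n_b / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    r, k = len(row_counts), len(col_counts)
    phi2 = chi2 / n
    # Small-sample corrections of phi squared and of the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical data: one variable mirrors the outcome, the other is unrelated.
outcome = [0] * 10 + [1] * 10
mirrored = list(outcome)
unrelated = [0, 1] * 10
v_strong = cramers_v_corrected(mirrored, outcome)
v_none = cramers_v_corrected(unrelated, outcome)
```

Values close to 1 indicate a strong association and values close to 0 indicate none, which is the scale on which the reported coefficients are read.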
2.4. Radiomic Features Extraction
For each lesion, a standard set of radiomic features was extracted from the VOI mask on both the CE T1-w and T2-w images. The extracted features include first-order statistics, 3D shape-based scores, gray level co-occurrence matrix, gray level run length matrix, gray level size zone matrix, neighboring gray tone difference matrix, and gray level dependence matrix. For the sake of simplicity, we will label this set of variables as Original in the following sections. In addition to this first set of variables, an analogous set of features was extracted from the images transformed using Laplacian of Gaussian (LoG) filters (with sigma of 0.5, 1.0, 1.5 and 2.0 mm) and the wavelet transform (with 10 as the bin width parameter). For the sake of clarity, we will refer to these two sets of radiomic features as LoG and Wavelet, respectively. At the end of the extraction, each patient was characterized by a total of 1224 radiomic variables, split into the Original, LoG, and Wavelet categories for each image modality (CE T1-w or T2-w). The radiomic feature extraction was performed using the PyRadiomics [29] Python package (v3.1.0, accessed on 24 March 2024).
2.5. Radiomic Feature Embedding
A preliminary analysis of the radiomic features was performed using a dimensionality reduction procedure. The variables were first standardized according to their median and interquartile range. The PaCMAP [30] dimensionality reduction algorithm was used to project the high-dimensional feature space onto a 2D space, using the hospital center and the tumor type as reference labels for the clustering enrichment. The aim of this analysis was to check for possible batch effects in the data and/or for structures in the embedding space that carry information about the tumor type in a completely unsupervised framework.
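The standardization step can be sketched as follows. The text does not state the exact formula, so this example assumes the common robust scaling (x − median) / IQR, with quartiles computed by the inclusive method of Python's `statistics` module; the function name and the toy feature values are hypothetical.

```python
import statistics


def robust_standardize(values):
    """Center a feature on its median and scale it by its interquartile range."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate spread: return zeros (a choice made for this sketch).
        return [0.0 for _ in values]
    return [(v - median) / iqr for v in values]


# Hypothetical radiomic feature with one extreme value.
feature = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_standardize(feature)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

Unlike mean and standard deviation scaling, the median and IQR are barely affected by the extreme value, so the bulk of the samples keeps a comparable scale across features before the embedding is computed.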
2.6. Radiomic Feature Selection
The automated identification of malignant and benign tumors can be framed as a binary classification problem. To this end, a dedicated ML pipeline was developed for the identification of the most informative features and for outcome classification. Due to the large number of radiomic variables and the intrinsic collinearity of the information, an accurate feature selection procedure is mandatory for noise reduction and for the final interpretability of the results. In this work, we adapted the DNetPRO [23] algorithm to manage the high-dimensional radiomic feature space [24]. The original version of the algorithm was developed to handle gene expression data, typically characterized by a high number of features compared to the small number of available samples. An analogous situation arises in radiomic data analysis, in which the redundancy of radiomic variables can make it difficult to identify a small, interpretable subset of them. Identifying a low-dimensional set of features (a signature) described by a network of relationships between them can facilitate both the interpretability of the results and the explainability of the classification model. For this purpose, we adapted the DNetPRO algorithm by inserting a linear Support Vector Machine (SVM) in the couple evaluation step, to evaluate feature pairs, and Penalized Logistic Regression models in the best signature identification step, to filter the resulting signature. The results without the insertion of the DNetPRO procedure, i.e., applying an SVM classifier to all variables together, were used as a benchmark and discussed in terms of classification performance and model explainability.
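The idea of scoring feature pairs and reading a signature from the resulting network can be illustrated with the simplified sketch below. It is not the DNetPRO implementation: a nearest-centroid rule evaluated by leave-one-out stands in for the linear SVM, and the accuracy threshold, the use of connected components as candidate signatures, the function names, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def loo_pair_accuracy(rows, labels, i, j):
    """Leave-one-out accuracy of a nearest-centroid rule on features i and j."""
    correct = 0
    for held_out in range(len(rows)):
        sums = {}  # label -> [sum of feature i, sum of feature j, count]
        for idx, (row, lab) in enumerate(zip(rows, labels)):
            if idx == held_out:
                continue
            acc = sums.setdefault(lab, [0.0, 0.0, 0])
            acc[0] += row[i]
            acc[1] += row[j]
            acc[2] += 1
        x, y = rows[held_out][i], rows[held_out][j]
        predicted = min(
            sums,
            key=lambda lab: (x - sums[lab][0] / sums[lab][2]) ** 2
            + (y - sums[lab][1] / sums[lab][2]) ** 2,
        )
        correct += predicted == labels[held_out]
    return correct / len(rows)


def pair_network_signatures(rows, labels, threshold):
    """Keep well-performing pairs as edges; return connected components."""
    neighbours = {}
    for i, j in combinations(range(len(rows[0])), 2):
        if loo_pair_accuracy(rows, labels, i, j) >= threshold:
            neighbours.setdefault(i, set()).add(j)
            neighbours.setdefault(j, set()).add(i)
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical data: features 0 and 1 separate the classes, 2 and 3 are noise.
rows = [
    [0.0, 0.1, 100.0, 50.0], [0.1, 0.0, -100.0, 50.0],
    [0.2, 0.1, 100.0, -50.0], [0.1, 0.2, -100.0, -50.0],
    [1.0, 0.9, 100.0, 50.0], [0.9, 1.0, -100.0, 50.0],
    [1.1, 1.0, 100.0, -50.0], [1.0, 1.1, -100.0, -50.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
signatures = pair_network_signatures(rows, labels, threshold=0.9)
print(signatures)  # [[0, 1]]
```

In this toy case only the pair of informative features survives the threshold, so the network contains a single edge and the candidate signature is the set {0, 1}. Inspecting which features are connected is what makes a signature of this kind readable.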
2.7. Machine Learning Pipeline
Following the same scheme proposed for the feature embedding analysis, a dedicated ML pipeline for the classification of tumor types was developed independently for clinical and radiomic features. The different but complementary nature of these two sources of information was taken into account by comparing the classification performances with those of a final model integrating both sources. In all the simulations, the developed ML pipeline involved a preprocessing step for the standardization of the values according to mean and standard deviation, followed by a feature selection procedure (DNetPRO or SVM classifier [31]), which retained only the variables passed to a Penalized Logistic Regression model used for the binary classification. The data were split into training and test sets using a 10-fold cross-validation procedure, tuning the model parameters on the training set and evaluating the obtained performance on the related test set. The use of the DNetPRO algorithm was tested following both procedure A and procedure B (according to the nomenclature described in the original paper). Due to the low number of samples involved in the current study, we applied DNetPRO procedure B using a 3-step hold-out partitioning of the dataset, iterating the procedure 3 times. The schematic representation of the proposed pipelines is reported in Figure 1.
All these procedures were performed on features extracted from CE T1-w and T2-w separately and in combination. For each of the three classifications, a reduced set of radiomic features was used to remove redundant information and to focus on the information gain that could be achieved by integrating radiomic and clinical variables.
We identified the best set of radiomic features among the Original, LoG, and Wavelet categories by testing the performance of an SVM classifier over them separately. The evaluation of the feature category performance was carried out by repeating a stratified 10-fold cross-validation 100 times and employing the median and interquartile range (IQR) as a performance indicator. The performances were quantified using the Matthews Correlation Coefficient (MCC) score, aiming to prevent possible unbalanced results between the two classes. In this way, it was possible to consider the bias of the validation dataset and the different performances that could be achieved by different parameter initialization.
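For reference, the MCC for a binary problem is computed from the four confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a minimal standard-library version; the function name `mcc_binary`, the 0/1 label coding, and the toy predictions are assumptions for illustration only.

```python
import math


def mcc_binary(y_true, y_pred):
    """Matthews Correlation Coefficient for labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical test fold: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
mcc = mcc_binary(y_true, y_pred)
print(mcc)  # TP = 3, TN = 3, FP = 1, FN = 1: (9 - 1) / 16 = 0.5
```

Because all four counts enter the formula, a classifier that predicts only the majority class scores 0 rather than benefiting from class prevalence.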
Once the most informative radiomic feature category was identified, it was analyzed using the DNetPRO A and B procedures. The different approaches (SVM and DNetPRO) were compared considering the distribution of the results (after 100 repeated stratified 10-fold cross-validation iterations) achieved on the same category of radiomic features, using the Wilcoxon test. Finally, the DNetPRO signature and the most relevant features for the classic radiomic approach were analyzed.
4. Discussion
The application of radiomics and ML to MRI in the analysis of sinonasal tumors has demonstrated remarkable potential, achieving significantly higher diagnostic accuracy compared to traditional methods [13]. However, despite these advancements, challenges persist in translating these tools into routine clinical practice. One key limitation lies in the variability of results across studies, driven by differences in data quality, sample size, and the diversity of AI models used. Moreover, the “black box” nature of many AI algorithms raises concerns about the interpretability and clinical reliability of the results.
To ensure that these innovative approaches become practical tools in everyday diagnostics, it is essential to prioritize the development of models that not only achieve high accuracy but also offer interpretable outcomes. This interpretability is crucial for fostering trust among clinicians and for guiding informed decision-making in patient care. Additionally, standardizing radiomic feature extraction, model validation, and reporting practices will be necessary to enhance reproducibility and comparability across studies. In our analysis, we tried to overcome this issue, providing a set of explainable, low-dimensional, and interpretable overviews of the most informative radiomic features.
Starting with the cluster analysis of the PaCMAP space, a partial stratification related to the hospital center is observed, possibly due to different acquisition devices and procedures. However, since malignant and benign samples are well balanced between the two centers, the batch effect introduced by the different centers is likely mitigated; thus, the findings from the ML analysis should not be significantly biased.
Performance analysis of the ML pipeline demonstrates that radiomic features alone have less predictive power than clinical features alone. However, when combined, the classifier’s performance improves, suggesting that radiomic features provide complementary information to clinical features.
Analyzing separately the contribution of individual categories of the radiomic features, stratified by the MRI modality from which they are extracted (CE T1-w, T2-w, and CE T1-w and T2-w) and by filters applied to the image itself (Original, LoG, and Wavelet), it was possible to observe how different features affect classification. Comparing the classification results of DNetPRO and SVM, it is possible to observe how some results improve by considering only one category of radiomic features. This is the case for the Wavelet features from both the CE T1-w and T2-w sequences, the LoG features from the CE T1-w MRI sequence only, and the Original features for the T2-w sequence only, possibly because radiomic features provide redundant information, reducing the performance of the ML models.
Moreover, it is possible to explain the meaning of the set of most informative features selected for each image modality. The LoG extracts information about edges and blobs of a certain scale (specified by the sigma parameter); since the CE T1-w MRI sequence provides precise structural and morphological information, we can surmise that the LoG filter and the radiomic features are able to retrieve precise information about the tumor border and its internal structure (in terms of uniformity or pattern complexity), also allowing the consideration of the adjacent inflamed mucosa. Since the T2-w MRI sequence can better visualize the tumor, the informative power of the radiomic features extracted from the original image (Original) is prominent, since it characterizes the texture and image appearance. Finally, radiomic features extracted from CE T1-w and T2-w images after wavelet transform are the most informative set when combining the two image modalities. This is likely due to the application of low-pass and high-pass filters in the wavelet transform computation, which resulted in images carrying similar information to the original and LoG-filtered images. The low-pass filter produces a denoised version of the original image, preserving the same information. In contrast, the high-pass filter enhances the edges, yielding results like those from the LoG filter. Based on the identified signature, most of the selected T2-w MRI features are derived from the image filtered with the low-pass filter. The features selected from the CE T1-w images, in turn, are extracted after applying the high-pass filter in at least one direction. This behavior supports the hypothesis that the Wavelet features contain information from both the Original and LoG categories.
Considering the DNetPRO approach, it was possible to observe that procedure A outperforms procedure B every time, possibly due to the low number of samples, which makes splitting the dataset into three parts more difficult. The small number of patients represents an intrinsic limit of the current study, further penalizing the approaches in which a greater split of the data is required. Analogous results were also found and discussed in the work of Curti et al. [23], in which the authors emphasized the importance of a correct approach in terms of ML pipeline, considering the procedure A structure valid for a fair comparison with the state-of-the-art literature.
According to the Wilcoxon one-sided test, the DNetPRO procedure B performance is lower than ML in the T2 (p = ) and T1 + T2 (p = ) cases; for the T1 case, the test was not significant (p = 0.18). However, the DNetPRO signature provides valuable information about feature subsets, their relation, and their relevance for classification.
Analyzing the best clinical–radiological signatures selected by the model, the significance of the “Symptoms” and “Tumor Size” variables was remarkable. This highlighted the role of the so-called “red flags” in malignant tumors as an alarm for a potentially aggressive disease. Both variables appeared to be relevant, particularly when correlated with specific radiomic parameters. Other variables considered by the model included the gross evident involvement of adjacent sites and the tumor epicenter.
This also mirrored the clinical practice, as malignant tumors are more likely to involve structures such as the orbit, brain, lacrimal pathway, palate, and skin. Despite the absence of other clinical–radiological variables among the top signatures, their relevance was not dismissed but rather relegated to a secondary role. The model considered these variables less consistent when associated with the selected radiomic indicators.
The DNetPRO approach enhances the understanding and interpretability of the signature and the relationships between the selected features. Upon studying the signature, it becomes evident that radiomic features do not play a central role in classification; instead, they integrate the information provided by the clinical features, contributing to more accurate classification results. As expected, redundant radiomic features are also present. Redundancy is highlighted by the fact that these features are related within the same adjacency matrix, sharing interactions with other covariates. Finally, an interpretation of the radiomic component of the signatures can be proposed. The significance of gray level distribution and histogram dispersion for features extracted from the T2-w MRI sequence may reflect the degree of internal heterogeneity within the tumor. Malignant tumors often exhibit such heterogeneities, including areas of necrosis or high cellular density, which are less commonly observed in benign tumors, typically characterized by greater uniformity. Moreover, the presence of complex texture patterns in the CE T1-w image may be indicative of malignant tumors, which frequently display varied and irregular structural features, such as calcifications or hemorrhages. In contrast, large uniform areas are more commonly associated with benign tumors, such as cysts or fibrous lesions.
The proposed model, by integrating clinical and radiomic features, can significantly enhance the routine clinical workflow in the evaluation of sinonasal tumors. It has the potential to assist clinicians in expediting the diagnostic process by providing probabilistic discrimination between benign and malignant lesions prior to the biopsy. This can reduce diagnostic delays and help prioritize cases that require urgent attention, thus optimizing clinical workflow and resource allocation. Moreover, this model could especially assist non-referral or peripheral centers that may lack specialized expertise in sinonasal tumors. It can also play a crucial role during patient follow-up, aiding in distinguishing between a true tumor recurrence and benign post-treatment changes or inflammatory lesions, particularly in anatomical regions where biopsy is challenging (e.g., frontal sinus). The integration of the radiomic component into MRI analysis software is a step that could further empower radiologists by providing real-time, data-driven insights when interpreting complex or equivocal imaging findings. This decision support system would enhance radiological accuracy and confidence, ultimately streamlining the diagnostic pathway.
5. Conclusions
The introduction of ML and radiomics is transforming sinonasal tumor diagnostics, addressing limitations of conventional MRI and radiologist interpretation. Despite progress, challenges like dataset size, class imbalance, and segmentation persist. Furthermore, a clear link between data and phenotypes of tumor volumes is still missing, highlighting the huge difficulty in radiomic explainability. The demand for explainable solutions and signatures represents a mandatory task for novel ML models to guarantee their application in clinical practice, requiring extra efforts in data analysis. In our application we tried to overcome this limitation by introducing an accurate feature selection step, showing how signatures identified using a network-based approach (DNetPRO) could facilitate this task.
The present study has several limitations that should be carefully considered when interpreting its findings and implications for clinical practice. Firstly, its retrospective design introduces inherent selection bias, relying on existing records which may not uniformly capture all relevant clinical and radiological variables. Additionally, the small sample size limits statistical power and generalizability, potentially affecting the model’s accuracy in differentiating between various histological types of sinonasal and skull base tumors. The relatively small dataset size does not allow the creation of an external test set, affecting the quantification of generalization capabilities. Furthermore, while semi-automated, the segmentation process involved manual steps, which may introduce bias and reduce reproducibility. Finally, with such a small dataset, differences between the two acquisition centers could introduce bias, although this effect was assessed through the PaCMAP embedding and clustering procedure.
Nevertheless, the present study underscores the potential of ML in MRI analysis as a tool to aid clinicians in distinguishing between benign and malignant sinonasal tumors. From a multitude of radiomic features, we were able to identify essential signatures crucial for this differentiation, suggesting future integration into MRI software to support real-time diagnostic decision-making by radiologists.