Comparison of Ruptured Intracranial Aneurysms Identification Using Different Machine Learning Algorithms and Radiomics

Different machine learning algorithms have different characteristics and applicability. This study aims to predict ruptured intracranial aneurysms with radiomics models based on different machine learning algorithms and to evaluate their differences under the same data conditions. A total of 576 patients with intracranial aneurysms (192 ruptured and 384 unruptured intracranial aneurysms) from two institutions were included and randomly divided into training and validation cohorts in a ratio of 7:3. Of the 107 radiomics features extracted from computed tomography angiography images, seven features stood out. Then, these radiomics features and 12 common machine learning algorithms, including the decision-making tree, support vector machine, logistic regression, Gaussian Naive Bayes, k-nearest neighbor, random forest, extreme gradient boosting, bagging classifier, AdaBoost, gradient boosting, light gradient boosting machine, and CatBoost, were applied to construct models for predicting ruptured intracranial aneurysms, and the predictive performance of all models was compared. In the validation cohort, the area under the curve (AUC) values of models based on AdaBoost, gradient boosting, and CatBoost for predicting ruptured intracranial aneurysms were 0.889, 0.883, and 0.864, respectively, with no significant differences among them. Of note, the performance of these models was significantly superior to that of the other nine models. The AUC of the AdaBoost model in the cross-validation was within the range of 0.842 to 0.918. Radiomics models based on machine learning algorithms can be used to predict ruptured intracranial aneurysms, and the prediction efficacy differs among machine learning algorithms. The boosting algorithms might be superior when radiomics is combined with machine learning to predict aneurysm ruptures.


Introduction
Intracranial aneurysms, a life-threatening "untimed bomb" inside the skull that can cause an aneurysmal subarachnoid hemorrhage (aSAH), affect approximately 1-3% of the adult population [1][2][3]. However, a large proportion of detected aneurysms remain asymptomatic and unruptured during lifelong follow-up [4]. Preventive treatment for such aneurysms is risky and expensive, yet the outcome of a rupture is catastrophic [2]. Accurate and timely assessment of the rupture risk of intracranial aneurysms would therefore be of great significance to clinical practice.
In clinical practice, scales based on the clinical data and morphology of the aneurysm, such as the PHASES and ELAPSS scores [5,6], play an important role in assessing the risk of rupture. As improvements in their efficacy have reached a bottleneck, interest has turned to new types of features, such as radiomics [7]. Radiomics has the advantage of using features that cannot be obtained by routine observation and can achieve a more comprehensive and detailed quantification of features. Recent studies have shown that radiomics features differ between ruptured and unruptured intracranial aneurysms [8][9][10]. Furthermore, most studies have shown that the use of radiomics features may improve the rupture prediction performance for intracranial aneurysms [8,9]. However, future research must explore how to analyze radiomics features so that they better capture the characteristic information of aneurysms and more effectively predict ruptured aneurysms.
Machine learning can not only process complex data, including morphologic features, radiomics features, or fusion features derived from different types of medical imaging methods [11][12][13], but can also identify trends and patterns that humans may miss. In addition, traditional machine learning methods require less training time and are more suitable for small datasets than deep learning. Models and results based on traditional machine learning methods are easy to understand and interpret, and their application is relatively flexible, for example through algorithm hybridization [14]. With the combination of radiomics and machine learning, the prediction of intracranial aneurysm ruptures has produced an accumulating body of notable results [15][16][17][18], with a best reported AUC of 0.86 [18]. However, the potential of diverse machine learning algorithms has been overlooked. In reviewing previous work [15][16][17][18] and the broader application of machine learning, we noticed that many machine learning algorithms are suitable for the prediction of aneurysm ruptures, which suggested that, for a given dataset (the patient population and the radiomics features derived from it), a proper machine learning algorithm for further analysis could be identified.
Thus, in this study, after extracting the radiomics features of intracranial aneurysms from computed tomography angiography (CTA) images, we focus on the efficacy of different algorithms combined with radiomics for predicting aneurysm ruptures under the same baseline parameters. We aim to predict ruptured intracranial aneurysms using different machine learning algorithms combined with radiomics and to evaluate their differences.

Materials and Methods
The design of this study is shown in Figure 1.

Participants
This experiment was approved by the ethics standards committee on human experimentation of the Second Affiliated Hospital of Chongqing Medical University, and written informed consent was obtained from each participant.
Patients diagnosed with intracranial aneurysms by CTA or digital subtraction angiography (DSA) in the two centers of our hospital between 2015 and 2021 were included in this study. The exclusion criteria were: (1) secondary intracranial aneurysms of primary vascular disease or intracranial aneurysms combined with intracranial vascular diseases (such as Moyamoya disease, arteriovenous malformations, and autoimmune-related vascular disease); (2) multiple or fusiform intracranial aneurysms; (3) intracranial aneurysms with a maximum diameter < 2 mm on CTA images; (4) intracranial aneurysms that could not be differentiated from infundibulum on CTA images; (5) poor quality of CTA images (motion artifacts, delayed scanning, etc.); and (6) surgical or interventional therapy of intracranial aneurysms before the CTA examination. Finally, a total of 739 patients were included in this study; 192 who suffered from SAH during the follow-up were diagnosed as having ruptured intracranial aneurysms. The demographic data of all patients, including a history of hypertension and SAH, were recorded.

Image Acquisition and Analysis
CTA examinations were performed on multislice spiral CT scanners (Aquilion ONE, Canon Medical Systems, Japan; Somatom Definition Force, Siemens Healthcare, Germany), with the following scanning protocol: a tube voltage of 110-120 kV, a tube current of 200-250 mA, a layer thickness of 1.0 mm, a layer spacing of 0.7 mm, and a matrix of 512 × 512. The contrast agent was iohexol solution, the total dose was 150-300 mg/kg of iodine, and the injection flow rate was 4.5-5.0 mL/s.
Given that the images were derived from multiple CT scanners with different parameters, the images were preprocessed as follows: (1) the CTA images were resampled to a voxel size of 1.0 × 1.0 × 1.0 mm³; (2) the original intensities were discretized into gray levels with a fixed bin width (256 bins); and (3) the characteristics of intracranial aneurysms were viewed in a fixed window (level, 50 Hounsfield units [HU]; width, 110 HU) on CTA images. Image analysis of intracranial aneurysms was performed independently by three radiologists who were blinded to the clinical status of the patients, and a consensus was reached on the final interpretation. Multiplanar reformation technology was applied when necessary to measure the long diameter and to determine the location of the intracranial aneurysm. In addition, the PHASES scores were analyzed to evaluate the general risk of rupture of intracranial aneurysms [5].
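The windowing and gray-level discretization steps above can be illustrated with a minimal sketch. It assumes that the viewing window is a simple clip to [level - width/2, level + width/2] and that "256 bins" means 256 equal-width bins over the observed intensity range, which is only one possible reading of the text. The function names and the sample intensities are hypothetical and are not taken from the study.

```python
def apply_window(hu_values, level=50.0, width=110.0):
    """Clip Hounsfield-unit intensities to the viewing window
    [level - width/2, level + width/2] (here -5 HU to 105 HU)."""
    lower = level - width / 2.0
    upper = level + width / 2.0
    return [min(max(v, lower), upper) for v in hu_values]


def discretize(values, n_bins=256):
    """Map intensities to integer gray levels 1..n_bins.

    Assumption: equal-width bins spanning the observed minimum and maximum.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1 for _ in values]
    bin_width = (hi - lo) / n_bins
    # The maximum value would fall in bin n_bins + 1, so it is capped at n_bins.
    return [min(int((v - lo) / bin_width) + 1, n_bins) for v in values]


# Invented intensities in HU, for illustration only.
sample_hu = [-200.0, -5.0, 50.0, 105.0, 400.0]

windowed = apply_window(sample_hu)   # used for viewing only
gray_levels = discretize(sample_hu)  # applied to the original intensities
```

In this sketch the window affects only how the images are displayed, while discretization is applied to the original intensities, as the text describes.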

Simple Random Sampling
Considering the influence of sample imbalance between the ruptured and unruptured intracranial aneurysm groups, we randomly selected 384 patients from the sample population with unruptured intracranial aneurysms using the random seed of 68,439. The ratio of individuals in the unruptured intracranial aneurysm group to those in the ruptured intracranial aneurysm group was 2:1. To test whether the distribution of the random sample was consistent with that of the population, Poisson, negative binomial, normal, gamma, and generalized Pareto distribution tests were conducted on the PHASES scores of the random sample and the population [5,6]. We found that the PHASES scores of the population presented the minimum standard errors under, and were in agreement with, the negative binomial and Poisson distributions, with values of 0.047 and 0.049, respectively; the values for the corresponding test sample (384 patients) were 0.060 and 0.070, respectively. In addition, when these models were compared with the other distribution models, the values for the corresponding sample remained the lowest. Thus, the 384 randomly selected patients can be considered representative of the sample population with unruptured intracranial aneurysms, and this sample was chosen as the unruptured intracranial aneurysm group for the analysis.
Finally, 576 patients with intracranial aneurysms were retrospectively reviewed and randomly divided into training (n = 403) and validation (n = 173) cohorts in a ratio of 7:3 using computer software-generated random numbers.
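The sampling and splitting steps can be sketched as follows. This is an illustration under stated assumptions, not the software used in the study: the unruptured population size (547) is inferred here as 739 - 192, the patient identifiers are hypothetical, the split is not stratified (the text does not say whether it was), and Python's `random` module with the reported seed will not reproduce the authors' actual selection.

```python
import random

N_TOTAL = 739
N_RUPTURED = 192
N_UNRUPTURED_POPULATION = N_TOTAL - N_RUPTURED  # 547, inferred from the text
N_UNRUPTURED_SAMPLE = 384                       # gives the 2:1 ratio
SEED = 68439                                    # seed value reported in the text

rng = random.Random(SEED)

# Hypothetical identifiers; real ones would come from the study database.
ruptured_ids = [f"R{i:03d}" for i in range(N_RUPTURED)]
unruptured_ids = [f"U{i:03d}" for i in range(N_UNRUPTURED_POPULATION)]

# Simple random sampling without replacement from the unruptured population.
sampled_unruptured = rng.sample(unruptured_ids, N_UNRUPTURED_SAMPLE)

# Shuffle the pooled cohort and cut it at 70% (576 * 0.7 = 403.2, rounded to 403).
cohort = ruptured_ids + sampled_unruptured
rng.shuffle(cohort)
n_train = round(len(cohort) * 0.7)
training, validation = cohort[:n_train], cohort[n_train:]
```

The resulting cohort sizes match those reported above (403 and 173); the composition of each cohort depends on the random number generator and is not meaningful here.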

Radiomics Analysis and Models
First, the volumes of interest (VOIs) of the intracranial aneurysms were manually delineated slice by slice on CTA images by a trained radiologist using ITK-SNAP software (version 3.8.0) and double-checked by another radiologist. Then, a total of 107 radiomics features were extracted automatically from each VOI using PyRadiomics software (version 3.0.1). Inter-observer and intra-observer reproducibility analyses were then performed to assess the stability of the radiomics features (see Supplementary Materials).
Harmonization in the feature domain was performed before further feature selection, following previous research [19,20], as described in the Supplementary Materials. Then, an independent-samples test and elastic network regression analysis were performed to choose the optimal radiomics features related to ruptured intracranial aneurysms. Finally, the optimal radiomics features and 12 common machine learning algorithms (the support vector machine (SVM), decision-making tree, eXtreme gradient boosting (XGB), Gaussian Naive Bayes (GNB), logistic regression, random forest, k-nearest neighbor (KNN), bagging classifier, AdaBoost, gradient boosting, light gradient boosting machine (LGBM), and CatBoost) were used to construct models for predicting intracranial aneurysm ruptures in the training cohort, which were then validated in the validation cohort. Calibration curves were plotted to assess the calibration ability of the 12 machine learning models. The area under the curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of each model in the validation cohort were calculated to quantify the discriminant performance of each model. The DeLong test was used to determine the significance of the AUC differences among the 12 machine learning models. For the model with the best performance, we further carried out cross-validation tests (3 folds and 5 repeats) to calculate its AUC values in the validation cohort. Additionally, SHapley Additive exPlanations (SHAP) values were introduced to show the importance of each feature in the model.
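The discrimination metrics listed above can be computed from predicted probabilities and true labels as in the following minimal sketch. The AUC is obtained from its rank interpretation (the probability that a ruptured case receives a higher score than an unruptured one), and the other metrics from the 2 × 2 confusion matrix. The function names, the 0.5 threshold, and the toy data are assumptions for illustration and do not come from the study.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (ruptured, unruptured) pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_metrics(labels, scores, threshold=0.5):
    """Threshold the scores and derive five metrics from the confusion matrix.

    Assumes each denominator is non-zero (both classes present and predicted).
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(labels),
    }


# Toy data, invented for illustration (1 = ruptured, 0 = unruptured).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

auc = auc_score(labels, scores)
metrics = confusion_metrics(labels, scores)
```

Confidence intervals and the DeLong comparison between models are not covered by this sketch.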

Statistical Analysis
The Shapiro-Wilk and Bartlett tests were applied to test the distribution of the clinical variables; then, the Student's t-test, F test, or chi-squared test was applied to determine the between-group differences between the training and validation cohorts, and between patients with ruptured and unruptured intracranial aneurysms within the training and validation cohorts, respectively. The Spearman correlation analysis was used to evaluate the correlation between the optimal radiomics characteristics and the size measurement of aneurysms. A two-tailed p < 0.05 was considered statistically significant.
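As an illustration of the Spearman analysis, the coefficient can be computed as the Pearson correlation of the ranks, with tied values receiving their average rank. The sketch below uses only the standard library and does not compute a p-value; the aneurysm sizes and feature values are invented for the example.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: aneurysm size (mm) and one radiomics feature.
size_mm = [2.1, 3.5, 4.0, 6.2, 8.8]
feature = [0.10, 0.30, 0.25, 0.70, 0.90]

rho = spearman(size_mm, feature)
```

With these toy values only one adjacent pair is out of order, so the coefficient is high but below 1.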

Demographics and Clinical Variables
The demographic information of the patients is shown in Table 1. In both the training and validation cohorts, patients with ruptured aneurysms were significantly younger, had larger aneurysms, and had higher PHASES scores than patients with unruptured aneurysms. In addition, there were significant differences in the aneurysm location between patients with ruptured and unruptured aneurysms. In the training cohort, the proportion of patients with hypertension was significantly lower in the ruptured aneurysm group (p = 0.047). Moreover, age, gender, proportion of hypertension, aSAH, aneurysm size, and PHASES scores did not significantly differ between the training and validation cohorts (Supplementary Materials, Table S1).

Optimal Radiomics Features
A total of 107 features of aneurysms were obtained, and 21 of them survived the Student's t-test. Next, the elastic network regression was used to select the features that were most relevant to the rupture of aneurysms, and seven optimal radiomics features were finally determined (Figure 2). Definitions of the seven optimal radiomics features are given in the Supplementary Materials. Among them, six optimal radiomics features were significantly associated with aneurysm size, with details described in Supplementary Materials Table S2.

Machine Learning Model Construction and Evaluation
The calibration curves of the 12 machine learning models for the probability of ruptured aneurysms showed a good agreement between the actual observed and predicted ruptured aneurysms in the validation cohort (Figure 3A). The ROC curves of the 12 machine learning models in the validation cohort are shown in Figure 3B, and their diagnostic performances are shown in Table 2. With the DeLong test, we observed that AdaBoost, gradient boosting, and CatBoost had significantly higher AUCs than the other nine models, with no significant differences in the AUC among these three (see Supplementary Table S3). The radiomics model based on AdaBoost showed the best performance, with an AUC of 0.889 (0.842-0.936) and a sensitivity of 0.716 (0.591-0.817). In addition, the AUCs of the radiomics model based on AdaBoost in the cross-validation tests ranged from 0.842 to 0.913 (Table 3), and the SHAP values of features in the AdaBoost model presented the contribution of seven features to the prediction (Figure 2B), of which the top three features were the dependence entropy, elongation, and cluster shape.

Discussion
In this work, we included patients with ruptured or unruptured intracranial aneurysms and constructed prediction models with 12 common machine learning algorithms. We observed that, with the same dataset and radiomics features, all 12 radiomics models based on different algorithms achieved considerable prediction efficacy; however, the efficacy differed across algorithms.
A previous study reported that introducing radiomics features into prediction models could significantly improve prediction strength, with AUCs ranging from 0.767 to 0.879 [8]. Most algorithms we adopted in this work had been applied individually in previous studies, including KNN with feature fusion [21], SVM for aneurysms after stent surgery [22], logistic regression with hundreds of features [18], and SVM, random forest, logistic regression, and multilayer perceptron with radiomics features and hemodynamics [23]; the last of these showed that SVM performed best, but it did not include boosting algorithms. In our study, eight of the 12 models achieved prediction efficacy comparable to that in previous studies, with AUC values over 0.80, which indicated that the prediction efficacy of the models we constructed reached the general level. Moreover, our current work included a relatively broader range of machine learning algorithms, and all predictive models were constructed with the same radiomics features and evaluated on the same basis.
As we established, a larger aneurysm size is related to an increased risk of rupture. However, a considerable proportion of aneurysms with diameters larger than 3 mm remain unruptured. This led us to wonder whether diagnostic indicators other than aneurysm size can be used to predict aneurysm status. Interestingly, six of the seven optimal radiomics features were found to be significantly correlated with the size of the aneurysm. The radiomics features we employed might therefore have covered both aneurysm size and information beyond regular observation, which might be a crucial reason why the application of radiomics would improve the sensitivity of rupture predictions. Additionally, the SHAP analysis indicated that the AdaBoost model tended to associate heterogeneous density within the aneurysm and asymmetry of the aneurysm shape with an increased rupture risk. Meanwhile, during the review of cases with false-negative predictions, we observed several cases whose diameters were smaller than 3 mm; however, those with slightly higher-than-average values of radiomics features (2 or 3) still suffered from aSAH during follow-up. When observing the original images, we found that these aneurysms were located where the artery bent sharply. After consulting a neurosurgeon, we hypothesized that the rupture might be associated with hemodynamics; previous work that included hemodynamic measurements achieved notable prediction efficacy [23,24].
The machine learning algorithms we employed could be classified into three main categories: four in supervised learning (SVM, decision-making tree, logistic regression, and GNB), one in statistical inference (KNN), and the remaining seven in ensemble learning. From the DeLong test, we observed that the ensemble learning algorithms performed relatively better than the others in this study, which might have resulted from their different ways of learning. The algorithms in the supervised learning category apply a single classifier, as does the KNN in the statistical inference category, whereas the ensemble learning algorithms integrate more than one fitted algorithm [15]. Put simply, the random forest can be likened to an effective combination of several decision-making trees. Correspondingly, we might assume that, for multifaceted radiomics features displaying different aspects of the aneurysms, ensemble learning could achieve a better performance. It should be noted that a previous study implementing five machine learning algorithms with morphological parameters (random forest, CatBoost, SVM, LGBM, and XGB) showed a better performance than PHASES, and the SVM was superior among them [25]. Combined with our findings, this might indicate that the performance of an algorithm differs among different types of features.
Among the seven ensemble learning algorithms, AdaBoost showed the highest AUC, although it was not significantly different from gradient boosting and CatBoost. The best-known application of the AdaBoost algorithm is automatic face recognition, which performs multiple iterations on the features of the same training set at different scales [26] and integrates the classifiers generated by each iteration into the final classifier [27,28]. The boosting algorithms follow a similar technical path, which involves re-iterating over the misclassified samples to further optimize the model [29]; this might indicate that machine learning with boosting algorithms would be the better choice for intracranial aneurysm rupture identification based on radiomics.
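The idea of re-iterating over misclassified samples can be made concrete with one round of the textbook (discrete) AdaBoost weight update. The sketch below is not the implementation or configuration used in this study, which is not specified here; it assumes labels coded as -1/+1 and a weak classifier whose weighted error lies strictly between 0 and 1. The function name and the toy predictions are hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One round of discrete AdaBoost.

    Returns the classifier weight alpha and the renormalized sample weights.
    Labels and predictions are coded as -1 or +1.
    """
    # Weighted error: total weight of the misclassified samples.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Correct samples (t * p = +1) shrink; misclassified ones (t * p = -1) grow.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Five toy samples with uniform starting weights; only the last is misclassified.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
weights = [0.2] * 5

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

After the update the single misclassified sample carries half of the total weight, so the next weak classifier is pushed to classify it correctly; the final model combines the weak classifiers weighted by their alpha values.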
This work had several limitations. First, although we used the baseline data from two centers, as well as the follow-up data from ruptured intracranial aneurysms, selection bias was difficult to completely avoid. Second, given the retrospective nature of this study, the timing of the aneurysm rupture was not considered in the analysis, and the diagnostic accuracy may have been overestimated. Third, although we performed image normalization and feature harmonization in the study, there were parameter differences between the CT scanners, so differences not attributable to biological effects may also have existed. Lastly, given the lack of recognized automatic extraction methods, manually delineating VOIs on CTA images was feasible in this study; however, it may not be suitable for clinical applications. The results of our study can only provide indicative references and must be further confirmed in larger, related studies.

Figure 1. Radiomics combined with machine learning methods to predict the rupture of intracranial aneurysms.

Figure 2. Feature selection with the elastic network regression. (A) The elastic network regression coefficient analysis of the 21 radiomics features. Each colored line represents the coefficients of each feature. (B) The SHAP values of the 7 optimal radiomics features for the model construction.

Figure 3. The calibration and receiver operating characteristic curves of models based on 12 machine learning algorithms in the validation cohort. (A) The calibration curves of models based on 12 machine learning algorithms in the validation cohort. (B) The receiver operating characteristic curves of models based on 12 machine learning algorithms in the validation cohort. Abbreviations: SVM, support vector machine; DT, decision-making tree; XGB, eXtreme gradient boosting; GNB, Gaussian Naive Bayes; LR, logistic regression; RF, random forest; KNN, k-nearest neighbor; BC, bagging classifier; ADA, AdaBoost; GB, gradient boosting; LGBM, light gradient boosting machine; CB, CatBoost.

Table 1. Univariate analysis of patients in the training and validation cohorts.
Data are given as medians with interquartile ranges or as numbers with percentages in parentheses. SD, standard deviation; aSAH, aneurysmal subarachnoid hemorrhage; ICA, internal carotid artery; ACA, anterior cerebral artery; MCA, middle cerebral artery; ACOM, anterior communicating artery; PCOM, posterior communicating artery.

Table 3. The results of cross-validation on AdaBoost.