Intracranial Hemorrhage Detection in Head CT Using Double-Branch Convolutional Neural Network, Support Vector Machine, and Random Forest

Abstract: Brain hemorrhage is a severe threat to human life, and its timely and correct diagnosis and treatment are of great importance. Multiple types of brain hemorrhage are distinguished depending on the location and character of bleeding. The main division covers five subtypes: subdural, epidural, intraventricular, intraparenchymal, and subarachnoid hemorrhage. This paper presents an approach to detect these intracranial hemorrhage types in computed tomography images of the head. The model trained for each hemorrhage subtype is based on a double-branch convolutional neural network of ResNet-50 architecture. It extracts features from two chromatic representations of the input data: a concatenation of the image normalized in different intensity windows and a stack of three consecutive slices creating a 3D spatial context. The joint feature vector is passed to the classifier to produce the final decision. We tested two tools: the support vector machine and the random forest. The experiments involved 372,556 images from 11,454 CT series of 9997 patients, with each image annotated with labels related to the hemorrhage subtypes. We validated deep networks from both branches of our framework and the model with either of two classifiers under consideration. The obtained results justify the use of a combination of double-source features with the random forest classifier. The system outperforms state-of-the-art methods in terms of F1 score. The highest detection accuracy was obtained in intraventricular (96.7%) and intraparenchymal hemorrhages (93.3%).


Introduction
Intracranial hemorrhage (ICH) refers to bleeding occurring within the intracranial vault. Possible causes include, among others, vascular abnormalities, venous infarction, tumor, trauma effects, therapeutic anticoagulation, and cerebral aneurysm [1][2][3][4]. Regardless of the actual cause, a hemorrhage constitutes a major threat. Therefore, an accurate and rapid diagnosis is crucial for the treatment process and its success. ICH diagnosis relies on patient medical history, physical examination, and non-contrast computed tomography (CT) examination of the brain region. CT examination enables bleeding localization and can indicate the primary causes of ICH [5]. There are several challenges related to ICH diagnosis and treatment: the urgency of the procedure, a complex and time-consuming decision-making process, an insufficient level of experience in the case of novice radiologists, and the fact that most emergencies occur at nighttime. Thus, there is a significant need for a computer-aided diagnosis tool to assist the specialist. Nevertheless, the accuracy of automated hemorrhage detection should be sufficiently high for medical purposes.
Depending on the brain's anatomic site of bleeding, several ICH subtypes can be distinguished (Figure 1). Subdural hemorrhage (SDH) refers to bleeding between the dura and the arachnoid, whereas the epidural subtype (EDH) involves bleeding between the dura and the bone. Both frequently result from traumatic injuries. Intraparenchymal hemorrhage (IPH) is bleeding within the area of brain parenchyma. A hemorrhage inside the ventricular system is known as intraventricular (IVH). Finally, blood within the subarachnoid space indicates subarachnoid hemorrhage (SAH) [1,5]. One of the leading causes of SAH is a ruptured cerebral aneurysm [3,4]. Hemorrhage detection and classification are challenging due to similarities between the various ICH subtypes (e.g., SDH vs. EDH) and subtle differences between healthy and bleeding tissues. These are barely noticeable to the inexperienced observer. Figure 1 presents examples of selected CT slices featuring the physiological brain appearance and ICH subtypes under consideration. In recent years, we have observed increasing interest among researchers in deep learning methods for image classification and segmentation. Convolutional neural networks (CNNs) have gained popularity due to their reliability and efficiency, becoming significant factors in medical diagnosis support [6,7]. In general, CT scans are 3D structures composed of a stack of 2D slices. Thus, operating on image voxels is possible, but may incur a high computational cost. One way to avoid this is to either process slices individually or employ the 3D context in a less complex way. Several deep learning approaches to intracranial hemorrhage detection and classification have been proposed, most of them in the last three or four years.
Different studies address either one-class detection related to a single class of ICH present in a CT scan [8][9][10], multi-class classification distinguishing ICH subtypes [11][12][13][14][15], or ICH pixel area detection within individual images [7,16,17]. The popularity of pre-trained CNN models can be observed, for example for VGG [8], MobileNet [15], AlexNet [18], and ResNet-18 [14].
Nguyen et al. [11] reported the results of a combination of a convolutional neural network and a long short-term memory (LSTM) network for hemorrhage classification. The slice-wise pre-trained CNN (ResNet-50 and SE-ResNeXt-5 architectures) extracted features from every image, while the LSTM linked them across slices. Arbabshirani et al. [9] proposed a CNN architecture, employing two fully connected layers and 37,084 training images to achieve ICH presence detection. The achieved accuracy was equal to 95%. Danilov et al. [19] designed a ResNeXt CNN model for classification between five ICH subtypes, using the Adam optimizer and a dataset of 674,258 CT slices. The accuracies were equal to 82.8%, 81.8%, 82.0%, 89.3%, and 83.5% for EDH, SDH, SAH, IVH, and IPH, respectively. Ye et al. [12] employed a joint 3D CNN and recurrent neural network (RNN) to detect ICH and recognize its five subtypes. The accuracy of detecting any bleeding exceeded 98%, yet in individual subtypes it varied between 75% and 96%. Ker et al. [20] proposed a 3D CNN for classification among the healthy brain and various ICH subtypes: SAH, IPH, acute subdural hemorrhage (ASDH), and brain polytrauma hemorrhage (BPH). The dataset included 399 volumetric CT scans (approx. 12,000 images). Two types of classification were performed: two-class (normal vs. a specific ICH) and four-class (normal vs. all considered ICH subtypes). The authors employed the F1 score to assess their approaches and obtained 70.6-95.2% for the two-class cases and 68.4% for the multiclass analysis. Togaçar et al. [18] proposed a combination of a CNN, autoencoder network, and a heat map method. The dataset was processed using an autoencoder network with heatmaps generated based on every image. The outcome subjected to the augmentation process constituted the input to the pre-trained CNN (AlexNet architecture). The classification employed a support vector machine (SVM).
The authors reported 98.6% accuracy, 98.1% sensitivity, and 99.0% specificity for detecting any hemorrhage using a dataset consisting of 2101 images with augmentation. Dawud et al. [21] compared the results of the AlexNet CNN-SVM combination and two conventional CNN architectures (CNN and pre-trained AlexNet model). The first one produced evaluation metrics better than traditional architectures: 93% accuracy, 95% sensitivity, and 90% specificity for the ICH presence detection over a database containing 12,635 images. Burduja et al. [22] proposed an integration of two neural network architectures to perform ICH subtype classification: ResNeXt-101 and bidirectional long short-term memory (BiLSTM). They compared model performance with evaluation metrics calculated based on annotations collected from three radiologists. The obtained accuracy exceeded 96% in ICH subtypes with a weighted mean log loss equal to 0.04989 in the test dataset. However, reduced sensitivity over highly imbalanced data suggests that the F1 score would reflect the system's performance more realistically.
In this study, we propose a method for detecting various subtypes of intracranial hemorrhage (SDH, EDH, IPH, IVH, and SAH) in brain CT scans based on a double-branch CNN for feature extraction and two different classifiers. According to the literature reports, binary detectors of individual subtypes frequently feature higher precision than multiclass approaches [8,10,12,15,23]. Moreover, a definitive decision from the multiclass framework is often partially wrong since a single slice can present more than one subtype of the ICH. Thus, we use individually trained instances of the proposed architecture to detect each ICH subtype separately. Before feature extraction and classification, preprocessing is applied, including a skull removal algorithm. Three intensity windows are employed to transform the original CT slice. We concatenate the resulting grayscale images to set up inputs for the double-branch CNN. In the first branch, the method considers different-window images to gain more information from a single input. The second branch analyzes the slice in a local 3D context using the neighboring slices. A ResNet-50 architecture is implemented and trained in both paths to automatically extract the features passed on to the classification. The classification itself involves two tools: the SVM and the random forest (RF). We compare the results of both classifiers in the detection of each ICH subtype. We also verify the efficiency of individual ResNet-50 networks from both branches. For the training and assessment, we employ a public database containing 372,556 head CT slices with the corresponding ground-truth labels indicating the presence of either SDH, EDH, IPH, IVH, or SAH. Finally, we compare our approach to the state-of-the-art reference methods described in the introduction. The use of two different representations of the data and of an external classifier constitutes the main contribution of our study.
Both spatial and intensity-related features feed a classifier through a joint vector. Approaches incorporating spatial information can be found in the ICH detection domain; however, to the best of our knowledge, none of them have used a hybrid technique similar to the one proposed in this study.
The remainder of the paper is structured as follows: Section 2 describes the materials and methods, including preprocessing, double-branch feature extraction, and the classification process. In Section 3, we present the results of intracranial hemorrhage classification using the proposed tool with different settings and compare them to the state-of-the-art methods. Section 3 also discusses the results, whereas Section 4 concludes the paper.

Materials
The database employed in this study includes 372,556 head CT slices acquired from 9997 patients and 11,454 complete CT series. The data are a part of the public Radiological Society of North America (RSNA) database used for the intracranial hemorrhage detection competition [24,25]. The database contains images presenting various types of bleeding: subdural, epidural, intraventricular, intraparenchymal, and subarachnoid. Many CT slices are not assigned to any of those types; some contain more than one hemorrhage. A ground-truth label determining the presence of every ICH subtype is attached to each case. The class distribution within the database is given in Table 1. The number of slices per scan varies among patients, although most of the CT studies present the whole brain area, not only its middle section. We prepared the data for the analysis by sorting it by patient ID, CT study, and series, based on standard DICOM (Digital Imaging and Communications in Medicine) attributes.
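The sorting step can be illustrated with a minimal sketch. It assumes each slice is represented by a dictionary keyed with the standard DICOM attribute names PatientID, StudyInstanceUID, SeriesInstanceUID, and ImagePositionPatient; ordering slices within a series by the z coordinate of ImagePositionPatient is an assumption, as the text does not state which attribute was used. The UIDs and file names are hypothetical.

```python
def sort_slices(records):
    """Order slice records by patient, study, series, then axial position."""
    return sorted(
        records,
        key=lambda r: (
            r["PatientID"],
            r["StudyInstanceUID"],
            r["SeriesInstanceUID"],
            r["ImagePositionPatient"][2],  # z coordinate (assumed slice order)
        ),
    )

# Hypothetical records standing in for parsed DICOM headers.
records = [
    {"PatientID": "P2", "StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.1",
     "ImagePositionPatient": (0.0, 0.0, 5.0), "file": "c.dcm"},
    {"PatientID": "P1", "StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.1",
     "ImagePositionPatient": (0.0, 0.0, 10.0), "file": "b.dcm"},
    {"PatientID": "P1", "StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.1",
     "ImagePositionPatient": (0.0, 0.0, 5.0), "file": "a.dcm"},
]

ordered = [r["file"] for r in sort_slices(records)]
```

A consistent slice order matters later on, because the second CNN branch relies on the slices directly preceding and following the current one.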

Methods
The proposed framework involves three stages of data processing (Figure 2): (1) preprocessing and preparing the input data for deep learning, (2) automated extraction of features using two CNN branches, and (3) two-class classification based on the joint feature vector. As shown in Figure 2, an individual deep learning and classification model is dedicated to each ICH subtype, so the preprocessing is the only step common to all five models.

Preprocessing
The data preprocessing starts with applying dedicated intensity windows to the CT image, followed by the skull removal algorithm. We use three intensity windows (in Hounsfield units, HU): L = 100, W = 200 (subdural window), L = 600, W = 2800 (bone window), and L = 40, W = 80 (brain window). Images after the intensity windowing are normalized to the 0-1 range. Then, we determine the region of interest (ROI) embracing the cranial area as a bounding box framing the largest binary object in the subdural image after binarization using the Otsu method [26]. The skull removal process is performed on the subdural-window image by extracting pixels with the highest intensities corresponding to the bone regions and replacing them with zero-valued pixels. Because soft tissue remains outside the skull, a morphological opening is applied to extract the intracranial area.
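The intensity windowing and normalization can be sketched as follows. This is a minimal illustration, assuming the common linear mapping in which HU values are clipped to [L - W/2, L + W/2] and rescaled to 0-1; the function name `apply_window` and the toy slice are hypothetical, and only the (L, W) pairs come from the text.

```python
def apply_window(hu_values, level, width):
    """Clip HU values to [level - width/2, level + width/2] and rescale to 0-1."""
    low = level - width / 2.0
    high = level + width / 2.0
    return [
        [(min(max(v, low), high) - low) / (high - low) for v in row]
        for row in hu_values
    ]

# Window settings (level L, width W) in HU, as listed in the text.
WINDOWS = {
    "subdural": (100, 200),
    "bone": (600, 2800),
    "brain": (40, 80),
}

# Toy 2x3 "slice" in HU (hypothetical values spanning air to dense bone).
toy_slice = [[-1000, 0, 40], [80, 200, 2000]]

windowed = {
    name: apply_window(toy_slice, level, width)
    for name, (level, width) in WINDOWS.items()
}
```

With these settings, the brain window spans 0 to 80 HU, so a pixel of 40 HU maps to 0.5, while everything at or above 80 HU saturates at 1.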
After the skull removal, the data preparation splits into two branches related to the architecture of the following CNNs. In either branch, preprocessing concludes with preparing RGB images for the feature extraction part of the CNNs. In the left branch in Figure 2, the 3D image structure is composed of three windowed slices with skull pixels replaced with zeros. The right branch employs spatial information by combining three subsequent CT slices from the current scan, all in the subdural window and with a removed skull. The i-th slice under consideration is put into the green (G) channel, while the red (R) and blue (B) channels contain slices preceding and following the i-th one, respectively. Figure 3 shows the results of subsequent preprocessing stages for three sample CT slices.
Figure 2. General scheme of a proposed hemorrhage detection model. The same architecture applies for each classification task (five ICH subtypes vs. healthy brain). The preprocessing is a common step for all five models.
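The two three-channel inputs described above can be assembled as in the sketch below. Two details are assumptions not stated in the text: the channel order of the windowed images in the first branch, and the handling of the first and last slice of a series, where the edge slice is simply repeated. Function names are hypothetical, and strings stand in for 2D images.

```python
def branch1_input(subdural, brain, bone):
    """Stack three windowed versions of one slice as channels (order assumed)."""
    return [subdural, brain, bone]

def branch2_input(series, i):
    """Stack slices i-1, i, i+1 as (R, G, B); edge slices are repeated (assumption)."""
    prev_idx = max(i - 1, 0)
    next_idx = min(i + 1, len(series) - 1)
    return [series[prev_idx], series[i], series[next_idx]]

# Placeholder "images" of an ordered, subdural-windowed, skull-stripped series.
series = ["s0", "s1", "s2", "s3"]

first = branch2_input(series, 0)
middle = branch2_input(series, 2)
last = branch2_input(series, 3)
```

In both branches the result has three channels, which is what allows an RGB-pretrained network to be reused without changing its input layer.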

Double-Branch-CNN Feature Extraction
To automatically extract features from the prepared RGB images, we use a double-branch convolutional neural network. Both paths employ a similar pre-trained ResNet-50 architecture consisting of a set of convolutional and identity blocks (Figure 4) [27]. ResNet is a representative deep CNN yielding efficient results in the image classification domain. As a deeper network, it allows the extraction of more advanced features. However, a large number of layers might cause vanishing gradients and a decrease in accuracy. The skip connections in the ResNet architecture minimize that risk [27]. The network accepts input data of a 224 × 224 × 3 size, so the images prepared during preprocessing are resized to match the required format.
Figure 3. Columns (left to right): raw CT slice before region of interest (ROI) extraction, subdural-window image, brain-window image, bone-window image (first three before and next three after skull removal), a stack of three CT-windowed images (Branch #1), a stack of three neighboring slices (Branch #2).
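The effect of the skip connections on vanishing gradients can be written compactly. The following is a simplified view of a single identity block in the standard residual formulation, ignoring the activation applied after the addition; the symbols x, y, F, W_i, and the loss L are introduced here for illustration only.

```latex
% Output of an identity block: learned residual plus the unchanged input
y = \mathcal{F}(x, \{W_i\}) + x

% Gradient of the loss with respect to the block input
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}
    \left( \frac{\partial \mathcal{F}}{\partial x} + I \right)
```

The identity term I passes the gradient to earlier layers even when the derivative of the residual mapping is small, which is why very deep stacks of such blocks remain trainable.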
We trained both ResNet networks separately using our training data. As mentioned in Section 1, we tested both ResNets to assess their performance (note Experiment #1 described in Section 3). However, in our model, we have taken the features extracted by either branch after the average pooling layer and concatenated them to set up a joint feature vector for classification.
The training of either network was performed using the adaptive moment estimation optimizer (Adam). We applied image augmentation, including data rotation (±15°), scaling (0.8-1.25), shear (0°-45° horizontally and vertically), and translation (±25 pixels horizontally and vertically). Due to the dominance of non-hemorrhage slices within the database, we performed class distribution weighting in each ICH subtype, leading to different numbers of images for classification into categories (this concerned both the balancing of class sizes and the application of weighted cross-entropy loss during training). In each experiment, we randomly partitioned the database into the training, validation, and test subsets with an 80%:10%:10% ratio in a patient-wise manner. The patient-wise assignment ensures that data from each patient appear in a single subset only. That makes the framework robust against the impact of a subject's slice-to-slice similarity on the classification accuracy. The network parameters, including the mini-batch size (8) and the number of epochs (60), were chosen experimentally. The training was terminated automatically if the validation loss did not decrease for ten epochs.
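A patient-wise partition can be sketched as follows. This is one possible realization, assuming the 80%:10%:10% ratio is applied to the list of patients rather than to individual slices, which the text does not specify; the function name, the seed, and the toy identifiers are hypothetical.

```python
import random

def patient_wise_split(slice_to_patient, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole patients to train/val/test so no patient spans two subsets."""
    patients = sorted(set(slice_to_patient.values()))
    random.Random(seed).shuffle(patients)
    n_train = int(round(ratios[0] * len(patients)))
    n_val = int(round(ratios[1] * len(patients)))

    subset_of = {}
    for k, patient in enumerate(patients):
        if k < n_train:
            subset_of[patient] = "train"
        elif k < n_train + n_val:
            subset_of[patient] = "val"
        else:
            subset_of[patient] = "test"

    split = {"train": [], "val": [], "test": []}
    for slice_id, patient in slice_to_patient.items():
        split[subset_of[patient]].append(slice_id)
    return split

# Toy data: 20 hypothetical patients with 3 slices each.
slice_to_patient = {f"p{p}_s{s}": f"p{p}" for p in range(20) for s in range(3)}
split = patient_wise_split(slice_to_patient)

# Patients present in each subset, used to check that the subsets are disjoint.
patients_in = {
    name: {slice_to_patient[s] for s in ids} for name, ids in split.items()
}
```

Splitting by slice instead would place near-identical neighboring slices of one patient in both the training and test subsets, inflating the measured accuracy.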
After the training, features from the last block preceding the ResNet-50's fully connected layer were taken from either branch and concatenated. The joint feature vector containing 4096 elements was subjected to the classification process.
For classification purposes, we involved and compared two classifiers: the support vector machine and random forest. SVM [28] is a well-recognized tool that is efficient in regression and classification, especially in two-class tasks. It aims to separate samples from different classes in the feature space with a surface that maximizes the margin between them using optimization methods. We used the SVM classifier with a linear kernel [29] due to its superior performance compared to the other kernels under consideration, including the Gaussian (RBF) kernel. As a result, the tool provides probabilities of a given ICH subtype's presence in the image. Hemorrhage is confirmed if the probability exceeds 50%. RF [30] is an ensemble classifier that uses multiple decision tree (DT) models in parallel. RF takes the decision of the majority of the trained DTs as its final choice. Single trees are trained in parallel using various subsets of a given dataset (bootstrapping) followed by an aggregation, together known as bagging. The training data are divided into bootstrap sample data and out-of-bag (OOB) data, which allow the RF to be validated internally. RF is considered harder to interpret than a single DT, yet more convenient with respect to hyperparameter adjustment. We set the maximum number of splits to n − 1, where n is the number of observations in the training sample, and the number of trees to 200.
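The feature fusion and the two decision rules can be sketched without any machine learning library. The sketch assumes 2048 features per branch, which is consistent with the 4096-element joint vector, and treats a tie among the trees as a negative decision, which is an assumption. All function names and the placeholder values are hypothetical; this is not the authors' implementation.

```python
def joint_feature_vector(branch1_features, branch2_features):
    """Concatenate the pooled features of both branches into one vector."""
    return list(branch1_features) + list(branch2_features)

def svm_decision(probability, threshold=0.5):
    """Confirm hemorrhage when the SVM output probability exceeds 50%."""
    return probability > threshold

def forest_decision(tree_votes):
    """Majority vote over binary tree outputs (1 = hemorrhage present)."""
    return sum(tree_votes) > len(tree_votes) / 2

# Placeholder feature vectors standing in for the outputs of the two ResNet-50s.
features_windows = [0.0] * 2048
features_context = [1.0] * 2048
joint = joint_feature_vector(features_windows, features_context)

# Hypothetical votes of a 200-tree forest.
votes_positive = [1] * 101 + [0] * 99
votes_tie = [1] * 100 + [0] * 100
```

Keeping the fusion as a plain concatenation leaves it to the classifier to weigh intensity-related and spatial features against each other.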

Evaluation Metrics
We performed several experiments to train and evaluate our models. In each case, the classification performance was assessed using the following metrics:
• accuracy: $\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$
• sensitivity (recall, true positive rate): $\mathrm{TPR} = \frac{TP}{TP + FN}$
• specificity (true negative rate): $\mathrm{TNR} = \frac{TN}{TN + FP}$
• F1 score (Dice index): $F_1 = \frac{2\,TP}{2\,TP + FP + FN}$
where TP, TN, FP, and FN denote the number of true positive, true negative, false positive, and false negative classifications, respectively.
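The four metrics can be computed directly from the TP, TN, FP, and FN counts, as in the sketch below. The counts in the example are hypothetical and were chosen to mimic a class-imbalanced task; they are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity, and F1 from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Hypothetical imbalanced case: 70 positive and 930 negative slices.
imbalanced = classification_metrics(tp=40, tn=900, fp=30, fn=30)

# Same errors with ten times more true negatives: F1 is unchanged.
more_negatives = classification_metrics(tp=40, tn=9000, fp=30, fn=30)
```

In this example the accuracy is 0.94 although only 40 of the 70 positive slices are found, whereas the F1 score stays near 0.57 regardless of how many true negatives are added, since TN does not appear in its formula.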

Experimental Results
In Experiment #1, we tested both full ResNet-50 architectures (corresponding to two branches of the feature extraction stage) separately. The concatenation of features and SVM or RF classification were omitted. The results of Experiment #1 are presented in Table 2. The best results were achieved for the intraventricular hemorrhage, whereas the worst performance was observed in the case of epidural hemorrhage. Neither branch outperformed the other in all cases.
In Experiment #2, the model described in Section 2.2 was prepared and validated with the concatenation of double-branch features and classification employing both the SVM and RF. Table 3 shows the obtained metrics for each ICH subtype and classifier. Similarly to Experiment #1, the best performance was observed in intraventricular hemorrhage, whereas the poorest outcomes were noted in epidural hemorrhage. RF produced higher quality measures in most categories except EDH. Moreover, in terms of accuracy and F1 score, the RF classifier outperformed any ResNet model from Experiment #1.
Table 2. Summary of evaluation metrics obtained in Experiment #1 for individual ICH subtypes and ResNet models.

Discussion
The proposed classification tool works efficiently either as individual ResNet-50 networks or a double-branch deep learning architecture. There is a balance in the results produced by standalone single-branch ResNets with mostly slight differences. That suggests that both input data representations contribute almost equally to the proper hemorrhage classification. The IVH is the best-classified subtype featuring accuracy and F1 scores exceeding 95% in both branches. The worst performance can be observed in the EDH (accuracy below 70%, F1 score below 60%).
In a double-branch CNN architecture supported by SVM and RF classifiers, IVH and EDH classification performs most and least efficiently, respectively. The random forest has a slight advantage over the support vector machine except for the EDH. However, note that the EDH produces classification results significantly lower than the other subtypes. The ranking of subtypes by classification performance reflects the order yielded by individual ResNets. Nonetheless, double-branch architecture improves classification in each subtype. The gains in accuracy and F1 score differ between subtypes, from ca. 1-2% in SDH, IPH, IVH, and SAH to over 10% in EDH, where the room for growth is the largest. The consistency of improvement throughout various ICHs leads us to conclude that the employment of an external machine learning classifier taking advantage of the double-source deep features is profitable in ICH detection.
Improved performance of the proposed double-branch architecture can be associated with the use of multi-source features for classification. The fusion of features extracted by networks addressing specific image data representations provides a broader range of information. A slice-to-slice neighborhood context appears to be particularly relevant. Considering the entire CT study arranged in the correct order, hemorrhage should be observed in subsequent slices. Thus, such spatial information improves the cognitive capabilities of the automated system.
Slice-wise brain hemorrhage detection frameworks generally operate on the entire CT slices or, like our approach, perform some primary ROI extraction to prepare the data for the analysis. The central part of processing is left to the deep learning tool with a limited impact of the operator on the feature extraction. Some targeted localization of selected portions of the image could likely improve the detection and classification scores. Differences between various hemorrhage subtypes are associated with their location and shape. Taking these into account in more specifically dedicated procedures might improve classification accuracy. Hemorrhages are frequently caused by mechanical head injuries. Thus, the skull's external area could be examined as well, apart from the deep learning analysis.
The lowest metrics obtained in EDH classification raise a question on the imbalance between classes in a single classification task. The number of epidural cases is an order of magnitude smaller than in the other subtypes. The imbalance probably affects the results for this particular bleeding type, and the class weighting employed during training is not sufficient to resolve this problem. However, there is no such correlation throughout all subtypes. The IVH with the top classification accuracy (correctly detecting 29 out of each 30 cases) is not the best-represented hemorrhage; second-best IPH (14 out of 15 correct) is the second-worst-represented subtype of ICH. Thus, the character of a particular hemorrhage appearance in CT has a crucial influence on the detection capabilities. Similarities between different bleeding symptoms and features are also possible (recall that the dataset of negative samples in each subtype classification also contains images with all other hemorrhages). Some complex cases represent a combination of multiple subtypes simultaneously. Therefore, correct classification is undoubtedly a challenging task.
To put our study in context, we gathered the F1 score results from corresponding state-of-the-art ICH detection methods [12,16,19,22] along with our double-branch RF model (DB-RF) in Table 4. The F1 score is a reliable measure in highly imbalanced datasets since it ignores the true negatives. The selected methods addressed the detection of at least half of our hemorrhage subtype set (Chang et al. [16] operate on a joint SDH+EDH class and omit the IVH) and were published from 2018 to 2020. Most importantly, similarly to our approach, they detected hemorrhages in CT slices, not in the entire 3D series. Danilov et al. [19] and Burduja et al. [22] used the RSNA competition database. We used provided data and assessment results to determine the F1 score wherever it was not given explicitly.
Table 4. Comparison of classification F1 scores obtained using state-of-the-art methods [12,16,19,22] and our double-branch random forest (DB-RF) model. All values in (%).

Conclusions
In this paper, we addressed the detection of various intracranial hemorrhage types by employing double-branch deep feature extraction and machine learning classifiers. The results justify the idea of searching for relevant features in different representations of the data (normalized image in multiple intensity windows and 3D spatial context) and their concatenation for classification using random forest. The system offers the highest detection efficiency in intraventricular and intraparenchymal hemorrhage.