Machine Learning Model for Intracranial Hemorrhage Diagnosis and Classification

Intracranial hemorrhage (ICH) is a pathological disorder that necessitates quick diagnosis and decision making. Computed tomography (CT) is a precise and highly reliable diagnostic modality for detecting hemorrhages. Automated detection of ICH from CT scans with a computer-aided diagnosis (CAD) model is useful for detecting and classifying the different grades of ICH. Owing to the recent advancement of deep learning (DL) models in image processing applications, several medical imaging techniques utilize them. This study develops a new densely connected convolutional network (DenseNet) with an extreme learning machine (ELM) for ICH diagnosis and classification, called DN-ELM. The presented DN-ELM model utilizes Tsallis entropy with a grasshopper optimization algorithm (GOA), named TEGOA, for image segmentation and DenseNet for feature extraction. Finally, an ELM is exploited for image classification. To examine the classification performance of the proposed method, a wide range of experiments were performed, and the results were assessed using several performance measures. The simulation results confirmed that the DN-ELM model reaches a proficient diagnostic performance, with a maximum accuracy of 96.34%.


Introduction
Intracranial hemorrhage (ICH) is a severe disease that paves the way for heart disease and stroke. ICH mostly affects severely overweight people, and the mortality rate has increased progressively within a limited time period. Moreover, it occurs in multiple intracranial regions and is caused by many external factors. To treat ICH, neuro-imaging mechanisms are available for examining the position and quantity of hemorrhage and its impending cerebral damage, which helps inpatient treatment [1]. Hemorrhage may also occur outside the brain parenchyma (extra-axial). Both intra-axial and extra-axial hemorrhages are difficult to treat unless they are discovered at an early stage. For instance, intra-axial hemorrhage affects severely overweight people in the United States with maximum fatality [2]. Clinical admission for ICH has increased drastically because of the growing population, expensive lifestyle, and poor blood pressure management. Furthermore, late diagnosis of ICH causes serious health effects that result in massive numbers of deaths within a short period of time. Computed tomography (CT) screening is the general mechanism used for diagnosing ICH accurately and early, which relies on the robustness of CT with respect to quick interpretation of ICH.
The interpretation of radiological work depends on whether a patient is tested as an inpatient or an outpatient. Typically, stat work is interpreted within a limited time, whereas a regular outpatient examination takes the maximum duration, depending on the accessible radiology system. ICH occurs in outpatient settings, albeit with low frequency compared with inpatient or emergency department settings. For instance, an aged outpatient on anticoagulation therapy is at risk of ICH [3][4][5]. Interestingly, the primary signs might be vague, which prompts a non-emergent, regular head CT. Furthermore, CT is a popular, non-invasive, and efficient imaging technique for ICH prediction. Hemorrhage appears in non-contrast CT because blood has a high density (Hounsfield units, HU) compared with other brain tissues, although this does not hold for bones. An exact analysis of bleeding is critical for medical interventions. Moreover, head CT assessment is required for patients admitted in emergencies. A primary interpretation of head CT is made by junior and trainee radiologists for emergency patients, and the final interpretation is made by expert radiologists.
An automated triage of imaging studies, applying computer models capable of predicting ICH with enhanced results, has been used. A quality enhancement tool was employed for automated management of the early interpretation of imaging work with suspected ICH, as well as optimization of the radiology workflow. Computer vision and machine learning (ML) methodologies are suitable for learning and predicting patterns. Specifically, DL algorithms are a kind of ML model that has been leveraged for automatic classification operations such as natural language processing (NLP), audio analysis, and object prediction [6][7][8]. Progressive development of "augmented" diagnostic vision with ML has taken place in the clinical field. For instance, DL models are used for diagnosing diabetic retinopathy (DR) from retinal images, breast cancer from mammograms, and so on. Published applications involve the prediction and diagnosis of skin cancer, pulmonary nodules, and cerebral microhemorrhages. In spite of the studies demonstrating the efficiency of ML for diagnostic medicine and radiology, the medical implementation of DL technology remains unexplored [9,10].
Automated identification of ICH from CT scans using computer-aided diagnosis (CAD) models can be employed to increase the detection rate within a short period of time. As the quantity of neuroimaging data available for the design of these solutions is normally restricted, this paper designs an effective densely connected convolutional network (DenseNet) with an extreme learning machine (ELM) for ICH diagnosis and classification, called DN-ELM. The presented method comprises several subprocesses, namely, pre-processing, segmentation, feature extraction, and classification. The DN-ELM model undergoes a pre-processing step, where the input data from the NIfTI files are transformed into JPEG format. Next, Tsallis entropy with a grasshopper optimization algorithm (GOA), named TEGOA, is used for image segmentation. Afterward, the DenseNet algorithm is applied to identify a useful set of feature vectors and, finally, the ELM is employed to categorize the ICH into different class labels. A detailed analysis of the experimental results takes place to determine the performance of the DN-ELM technique.

State-of-the-Art Approaches to ICH Diagnosis
Many traditional and DL algorithms are explained in this work. Among the traditional ML models, Yuh et al. [11] developed a threshold-reliant methodology for the prediction of ICH. The technique predicted ICH sub-types depending on position, structure, and volume. The developers optimized the threshold value using retrospective instances of CT scans and evaluated it on CT scans of subjects with traumatic brain injury (TBI). Consequently, maximum sensitivity and specificity were accomplished for ICH prediction, and intermediate accuracy was accomplished while predicting ICH sub-types. Alternatively, Li et al. [12] projected two models to segment the subarachnoid hemorrhage (SAH) space and applied the segmented regions for the purpose of forecasting SAH. In this approach, CT scans were employed to train and test the mechanisms. Effective performance was reported with the help of the Bayesian decision model in terms of testing SE, SP, and accuracy.
Based on DL models, convolutional neural networks (CNNs) and corresponding variants were deployed in [13], which depend upon the fully convolutional network (FCN) approach. Here, the spatial dependence among adjacent slices was modeled under the application of random forests (RFs) or recurrent neural networks (RNNs). Moreover, developers have applied an extended version of CNNs to compute a complete CT scan or an interpolation layer. Other technologies are one-stage, which means that they do not apply spatial dependency among the slices. Prevedello et al. [14] projected two methodologies related to CNNs. The primary approach concentrated on the prediction of ICH, hydrocephalus, and mass effect at the scan level, whereas the alternate model was established for predicting acute infarcts.
Chilamkurthy et al. [15] projected four models for forecasting ICH subtypes, midline shift, mass effect, and calvarial fractures, respectively. They trained and validated their processes on a massive dataset of CT scans. Two datasets were utilized mainly for testing, where one contains partial scans that are available in a common dataset named CQ500.
Clinical radiology reports were employed as the gold standard for labelling the training CT scans and validating the CT scans. The medical reports were mined with an NLP model to label the training scans, while the test scans were annotated using the majority vote of three specialized radiologists on the ICH subtypes. Diverse deep methods were employed for the four prediction types; for example, ResNet18 underwent training with five parallel FC layers as the output layer. Ye et al. [16] designed a 3D joint convolutional and recurrent neural network (CNN-RNN) for the purpose of classifying and predicting ICH. The entire structure of this technique is similar to the method developed by Grewal et al. [17]. VGG-16 was applied as the CNN mechanism and a bidirectional GRU (bi-GRU) was applied as the RNN model. The RNN layer performs a similar function to the slice interpolation approach presented by Lee et al. [18], although it is effective in terms of the adjacent slices applied in classification. The model was trained and verified on sampled CT scans, and a more precise slice-level performance was attained in ICH detection.
In line with this, Jnawalia et al. [19] applied a TL method to an ensemble of four popular CNN methodologies for forecasting ICH sub-types and bleeding points. Spatial dependency from adjacent slices was regarded in a slice interpolation framework. This ensemble model underwent training and verification under the application of a dataset of CT scans, and was tested on a retrospective database of CT scans as well as a prospective dataset. As a result, ICH prediction achieved a better area under the ROC curve (AUC), specificity, and sensitivity. However, the newly developed approach resulted in minimum SE for classifying ICH sub-types.

Proposed Methodology
In this study, a new DN-ELM model is introduced for the diagnosis and classification of ICH. Initially, the input data from the NIfTI files are transformed into JPEG images. The pre-processed data are segmented using the TEGOA model, and then features are extracted using the DenseNet model. Finally, the ELM method is employed for classifying the different class labels of ICH. The working principle is exhibited in Figure 1 and the algorithms are discussed in the following subsections.

TEGOA-Based Segmentation Process
Primarily, the input data are pre-processed and then the segmentation process is carried out. Entropy is related to the degree of disorder inside a system. Shannon initially applied entropy to measure the uncertainty of the information involved in a system and recommended that, when a physical system is divided into two statistically independent subsystems A and B, the entropy measure is additive:

S(A + B) = S(A) + S(B).

Extending Shannon's strategy, a non-extensive entropy paradigm was presented by Tsallis and is expressed as follows:

S_q = \frac{1}{q - 1}\left(1 - \sum_{i=1}^{T} p_i^{q}\right),

where T denotes the number of states of the system, q implies the entropic index, and p_i refers to the probability of state i. In general, the Tsallis entropy S_q reduces to Shannon's entropy as q → 1.
The Tsallis entropy obeys a pseudo-additive rule, as given below:

S_q(A + B) = S_q(A) + S_q(B) + (1 - q) \, S_q(A) \, S_q(B).

The Tsallis entropy is assumed for identifying effective thresholds of an image [20]. Assume an image with L gray levels from {0, 1, . . . , L − 1} with probability distribution p_i = p_0, p_1, . . . , p_{L−1}. For a threshold t, the two classes have probabilities P_A = \sum_{i=0}^{t-1} p_i and P_B = \sum_{i=t}^{L-1} p_i, with class entropies

S_q^{A}(t) = \frac{1}{q-1}\left(1 - \sum_{i=0}^{t-1} \left(\frac{p_i}{P_A}\right)^{q}\right), \qquad S_q^{B}(t) = \frac{1}{q-1}\left(1 - \sum_{i=t}^{L-1} \left(\frac{p_i}{P_B}\right)^{q}\right).

Tsallis multilevel thresholding is then attained by maximizing the objective function

f(T) = S_q^{A}(T) + S_q^{B}(T) + (1 - q) \, S_q^{A}(T) \, S_q^{B}(T).

In the case of the multi-level thresholding model, an optimal threshold vector T that maximizes the objective function f(T) has to be computed. In this work, the maximization of f(T) is performed under the application of the GOA.
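To make the thresholding objective concrete, the sketch below evaluates the bi-level Tsallis criterion f(t) = S_q^A + S_q^B + (1 − q) S_q^A S_q^B exhaustively over a gray-level histogram. This is an illustrative stand-in written for this text (the function name `tsallis_threshold` and the choice q = 0.8 are assumptions, not the paper's settings); in the proposed model this maximization is delegated to the GOA rather than a brute-force sweep.

```python
import numpy as np

def tsallis_threshold(hist, q=0.8):
    """Exhaustively find the threshold t that maximizes the bi-level
    Tsallis objective f(t) = S_A + S_B + (1 - q) * S_A * S_B."""
    p = hist / hist.sum()                 # normalize histogram to probabilities
    best_t, best_f = 0, -np.inf
    for t in range(1, len(p) - 1):
        pa, pb = p[:t].sum(), p[t:].sum()
        if pa == 0 or pb == 0:            # skip degenerate splits
            continue
        sa = (1.0 - np.sum((p[:t] / pa) ** q)) / (q - 1.0)
        sb = (1.0 - np.sum((p[t:] / pb) ** q)) / (q - 1.0)
        f = sa + sb + (1.0 - q) * sa * sb
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

# usage: a synthetic bimodal histogram -> threshold separates the two modes
hist = np.zeros(256)
hist[40:60] = 10.0
hist[180:200] = 10.0
t, f_best = tsallis_threshold(hist)
print(t)   # falls between the two intensity modes
```

For multilevel thresholding, the same objective generalizes to a vector of thresholds, which is where a metaheuristic search such as the GOA becomes preferable to exhaustive enumeration.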
The GOA emulates the natural swarming behavior of grasshoppers. As in other swarm methods, a grasshopper implies a candidate solution that is generated randomly at the time of initialization; furthermore, using the evaluation function, the optimal grasshopper is considered as a leader, which attracts the neighbouring grasshoppers towards it. Let X_i imply the position of the ith grasshopper in an n-dimensional space. The numerical formulation of the GOA is depicted as follows:

X_i = S_i + G_i + A_i, (8)

where S_i represents the social interaction as described in Equation (9), G_i depicts the gravity force illustrated in Equation (11), and A_i denotes the wind advection demonstrated in Equation (12).
In grasshopper motion, the social interaction S_i plays an important role and is attained from Equation (9):

S_i = \sum_{j=1, j \neq i}^{N} s(d_{ij}) \, \hat{d}_{ij}, (9)

where \hat{d}_{ij} = (x_j − x_i)/d_{ij} denotes the unit vector between two grasshoppers, d_{ij} = |x_j − x_i| refers to the Euclidean distance between them, and s signifies the function estimating the intensity of the social interaction, evaluated as follows:

s(r) = f e^{−r/l} − e^{−r}, (10)

where f signifies the intensity of attraction and l refers to the attractive length scale. A study on the behavior of grasshoppers with diverse measures of l and f identifies that the interaction at distances within [0, 2.079] is repulsive, while at a distance of 2.079 attraction and repulsion balance; this constitutes the comfort zone. The function used for determining the gravity component is represented as follows:

G_i = −g \hat{e}_g, (11)

where g denotes the gravitational constant and \hat{e}_g implies a unit vector towards the center of the earth. The wind advection is formulated as follows:

A_i = u \hat{e}_w, (12)

where u denotes a constant drift and \hat{e}_w signifies a unit vector in the wind direction. Substituting S_i, G_i, and A_i into Equation (8) yields the equation of grasshopper motion:

X_i = \sum_{j=1, j \neq i}^{N} s(|x_j − x_i|) \frac{x_j − x_i}{d_{ij}} − g \hat{e}_g + u \hat{e}_w, (13)

where x_i and x_j imply the positions of the ith and jth grasshoppers and X_i denotes the next location of grasshopper x_i. The grasshoppers reach the comfort zone using Equation (13); however, to force convergence towards a certain point, the formulation is enhanced so as to approach a closer optimal solution. Consider that X_i^d is the position of grasshopper i in the dth dimension. The enhanced function is expressed as follows:

X_i^d = c \left( \sum_{j=1, j \neq i}^{N} c \, \frac{ub_d − lb_d}{2} \, s(|x_j^d − x_i^d|) \frac{x_j − x_i}{d_{ij}} \right) + \hat{T}_d, (14)

where ub_d and lb_d refer to the upper and lower bounds in the dth dimension, correspondingly, and \hat{T}_d means the value of the dth dimension of the target (the best solution found so far). In Equation (14), the gravity component is fixed to zero and the wind component always points towards the current best grasshopper. The declining coefficient parameters c_1 and c_2 are employed to simulate the slowdown of grasshoppers that approach the food position and finally consume the food.
As the iterations proceed, c_1 is applied to limit the search scope, whereas c_2 is utilized to reduce the impact of attraction and repulsion among the agents. The update rule of the variable c_i (i = 1, 2) is provided below:

c = cMax − l \, \frac{cMax − cMin}{L}, (15)

where cMax and cMin denote the maximum and minimum values of c_1 and c_2, respectively, and the parameters are allocated their respective values. L denotes the maximum number of iterations and l is the current iteration.
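A minimal sketch of the GOA position update described above might look as follows. All names, the parameter values f = 0.5 and l = 1.5 in the social function, and the sphere objective are illustrative choices of this text, not the authors' configuration; as the section states, gravity is set to zero and the target term points to the current best grasshopper. Note that with f = 0.5 and l = 1.5, the social function changes sign almost exactly at the comfort distance 2.079 mentioned above.

```python
import numpy as np

def s_func(r, f=0.5, l=1.5):
    """Social-interaction strength s(r) = f*exp(-r/l) - exp(-r):
    negative (repulsive) below ~2.079, positive (attractive) beyond it."""
    return f * np.exp(-r / l) - np.exp(-r)

def goa_minimize(obj, dim, n=20, iters=100, lb=-5.0, ub=5.0,
                 c_max=1.0, c_min=0.00004, seed=0):
    """Toy GOA: random swarm, decreasing coefficient c, update toward
    the best-so-far position (target) plus scaled social interaction."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n, dim))
    fit = np.array([obj(x) for x in X])
    best = X[fit.argmin()].copy()
    best_f = float(fit.min())
    for t in range(iters):
        c = c_max - t * (c_max - c_min) / iters        # decreasing coefficient
        X_new = np.empty_like(X)
        for i in range(n):
            social = np.zeros(dim)
            for j in range(n):
                if i == j:
                    continue
                d = np.linalg.norm(X[j] - X[i]) + 1e-12
                social += c * (ub - lb) / 2.0 * s_func(d) * (X[j] - X[i]) / d
            # gravity fixed to zero; target term = current best grasshopper
            X_new[i] = np.clip(c * social + best, lb, ub)
        X = X_new
        fit = np.array([obj(x) for x in X])
        if fit.min() < best_f:
            best_f = float(fit.min())
            best = X[fit.argmin()].copy()
    return best, best_f
```

In the TEGOA model, the objective passed to such a search would be the Tsallis thresholding criterion f(T) over a vector of candidate thresholds rather than the toy sphere function used for testing here.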

DenseNet Based Feature Extraction Process
The segmented images are fed as input to the DenseNet-201 model. The proficient way to accomplish a prominent outcome in classification problems with a small amount of data is the transfer learning (TL) module. Moreover, hyper-tuning of the deep transfer learning (DTL) method is applicable in enhancing the simulation outcome. Here, a DTL approach with DenseNet201 is presented. The newly projected approach is applied in feature extraction, where the weights learned on the ImageNet dataset and the convolutional neural framework are deployed [21]. The framework of the newly developed DTL approach with DenseNet201 for ICH classification is depicted in Figure 2. DenseNet201 makes use of a condensed network, which provides simple training and efficiency because features are reused across diverse layers, which enhances the variety in the consecutive layers and maximizes system performance. This method has displayed standard performance on different datasets such as ImageNet and CIFAR-100. In order to improve the connectivity in the DenseNet201 scheme, direct connections from previous layers to consecutive layers are employed, as illustrated in Figure 3. The feature combination is expressed in numerical form:

z_l = H_l([z_0, z_1, . . . , z_{l−1}]), (16)

where H_l means a non-linear transformation described as a composite function of batch normalization (BN), ReLU, and a 3 × 3 convolution (Conv). [z_0, z_1, . . . , z_{l−1}] represents the concatenation of the feature maps of layers 0 to l − 1, which are integrated into a single tensor for simple implementation. In the case of the down-sampling mechanism, dense blocks are separated by transition layers that have BN with a 1 × 1 Conv layer and a 2 × 2 average pooling layer. The growth rate k of DenseNet201 is the hyper-parameter that defines how much new information each dense layer contributes, where the collected feature maps are assumed as the global state of the system.
Thus, a successive layer is composed of the feature maps of all previous layers. k feature maps are added to the global state in every layer, so that the overall number of input feature maps at the lth layer is

(FM)_l = k_0 + k(l − 1),

where k_0 refers to the number of channels in the input layer. In order to enhance the processing efficiency, a 1 × 1 Conv layer is deployed before every 3 × 3 Conv layer, which mitigates the overall number of input feature maps, which is higher than the number of output feature maps k. Hence, the introduced 1 × 1 Conv layer, named the bottleneck layer, generates 4k feature maps. For the purpose of classification [22], two dense (FC) layers of neurons are appended to the DenseNet201 feature extractor, and a sigmoid activation function is applied for computing the binary classification, interchanging the softmax activation function applied in the traditional DenseNet201 structure. A neuron present in a fully connected (FC) dense layer is linked to all neurons in the former layer. For FC layer l, where the input 2D feature map is flattened to a 1D feature vector, this is defined numerically with dropout as

t^{l−1} ∼ Bernoulli(p), \qquad x^l = f\big(w^l (t^{l−1} \odot x^{l−1}) + o^l\big).

The Bernoulli function randomly generates a vector t^{l−1} of dimension c^{l−1} using the 0-1 distribution with a certain probability. The two FC layers apply this dropout principle for blocking specific neurons based on the desired probability, which prevents over-fitting problems in a deep system. w^l and o^l describe the weight and offset variables of the lth FC layer, correspondingly. A sigmoid activation function is applied for changing non-normalized results into binary outputs as zero/one; henceforth, it is helpful in the consequent classification of ICH-positive or ICH-negative patients. The sigmoid function is illustrated as follows:

y = \frac{1}{1 + e^{−\sum_i w_i x_i}},

where y refers to the final outcome of the neuron, and w_i and x_i define the weights and inputs, correspondingly.
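The dense-connectivity rule of Equation (16) and the role of the growth rate k can be illustrated with a toy one-dimensional sketch. This is not the DenseNet201 used in the paper: the random linear map standing in for the composite transformation H_l, and all names and sizes below, are assumptions made purely for illustration of how each layer consumes the concatenation of all earlier feature maps and contributes k new channels.

```python
import numpy as np

def dense_block(z0, num_layers=4, growth_k=12, rng=None):
    """Toy 1-D dense block: layer l receives the concatenation
    [z_0, z_1, ..., z_{l-1}] (Eq. 16) and emits growth_k new channels.
    A random linear map + ReLU stands in for H_l (BN-ReLU-Conv)."""
    rng = rng or np.random.default_rng(0)
    feats = [z0]
    for _ in range(num_layers):
        x = np.concatenate(feats)                  # [z_0, ..., z_{l-1}]
        W = rng.standard_normal((growth_k, x.size)) / np.sqrt(x.size)
        feats.append(np.maximum(W @ x, 0.0))       # H_l: linear + ReLU
    return np.concatenate(feats)

z0 = np.ones(16)              # k0 = 16 input channels
out = dense_block(z0)
print(out.size)               # k0 + num_layers * k = 16 + 4*12 = 64 channels
```

The output size follows the (FM)-style accounting in the text: starting from k_0 channels, each of the l layers contributes k channels to the global state, so the block's concatenated output has k_0 + l·k channels.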

ELM-Based Classification Process
After the extraction of a valuable set of feature vectors, the ELM model is applied for classification. In general, the ELM is defined as a single hidden-layer feed-forward neural network (SLFN). In the classical SLFN, the parameters such as threshold values, weights, and activation functions have to be optimized on labelled data, and thus iterative learning is carried out. In gradient-based learning models, the parameters are modified iteratively to accomplish an optimized measure; consequently, owing to the dependence among parameters and the presence of local minima, the procedure may generate sub-optimal outcomes. In contrast to an FNN, whose weights are renewed according to the gradient, in the ELM the output weights are estimated analytically, whereas the input weights are selected randomly. In the analytic learning process, the success rate is enhanced, as the resolution time and error value mitigate the probability of becoming stuck in a local minimum. The ELM also allows the cells of the hidden layer to apply linear as well as non-linear (sinusoidal and sigmoid), non-differentiable, or intermittent activation functions [23]. Figure 4 showcases the ELM structure.
For an input x = (x_1, . . . , x_n), the SLFN output with m hidden neurons is expressed as

y = \sum_{j=1}^{m} \beta_j \, g\left(\sum_{i=1}^{n} w_{i,j} x_i + b_j\right), (24)

where w_{i,j} denotes the weights between the input and hidden layers, β_j refers to the weight between the jth hidden neuron and the output layer, b_j implies the threshold value of the jth neuron in the hidden layer, and g is the activation function. The input-layer weights w_{i,j} and biases b_j are allocated arbitrarily. The activation function g(·) operates over the input-layer neuron number (n) and hidden-layer neuron number (m). In this approach, these parameters form a unified and organized system whose output layer is depicted in Equation (24).
In the training procedure, the training error is minimized to the greatest possible extent. The error function in the ELM is the squared deviation ||Ŷ_o − Y_o||^2 between the network output Ŷ_o and the original output Y_o, which has to be reduced; that is, the achieved output Y_p should be as similar as possible to the original value Y_o. Writing the hidden-layer outputs in matrix form as Hβ = Y, the unknown parameter β is obtained analytically as β = H^†Y, where H^† denotes the generalized inverse of H. The H matrix generally has no exact inverse, since the count of data samples in the training set is not equal to the count of hidden neurons.
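A minimal ELM along the lines described above — random input weights and biases, a sigmoid hidden layer, and output weights solved in one step via a pseudo-inverse — can be sketched as follows. The class name, hidden-layer size, and the XOR toy task are illustrative assumptions of this text, not the paper's implementation.

```python
import numpy as np

class ELM:
    """Minimal single hidden-layer ELM: input weights/biases are random
    and fixed; output weights beta are solved in closed form as
    beta = pinv(H) @ Y, with no iterative training."""
    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # sigmoid activation of the hidden layer
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, Y):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ Y    # analytic output weights
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# usage: learn XOR, a task a purely linear readout cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
Y = np.array([[0], [1], [1], [0]], float)
pred = ELM(n_hidden=32).fit(X, Y).predict(X)
print((pred > 0.5).astype(int).ravel())   # expected: [0 1 1 0]
```

Because the hidden layer is random and only the linear readout is fitted, training reduces to one pseudo-inverse computation, which is the speed advantage the section attributes to the ELM over gradient-based learning.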
Experimental Validation

To estimate the performance of the projected method, a research study was carried out utilizing a standard ICH dataset [24]. The dataset includes ICH masks and CT scans, in JPG and NIfTI formats, from the PhysioNet repository. NIfTI is a neuroimaging file format that is used very commonly in imaging informatics for neuroscience and neuroradiology research. The data were gathered from the CT scans of 82 individuals up to 72 years of age. Furthermore, the dataset has images falling under six classes, namely, intraventricular, with 24 slices; epidural, with 182 slices; intraparenchymal, with 73 slices; subdural, with 56 slices; subarachnoid, with 18 slices; and no hemorrhage, with 2173 slices. For experimental validation, we used fivefold cross-validation for splitting the dataset into training and testing sets.
The results are observed in terms of four measures, namely, sensitivity (sens_y), specificity (spec_y), accuracy (acc_y), and precision (prec_s). The set of methods employed for comparison comprises U-Net, the Window Estimator Module with a Deep Convolutional Neural Network (WEM-DCNN) [25], the Watershed Algorithm with ANN (WA-ANN) [26], ResNexT [27], SVM, and CNN approaches.
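For reference, the four measures can be computed from a binary confusion matrix as sketched below. This is a generic illustration with hypothetical labels (hemorrhage = positive class), not the paper's evaluation code.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision, and accuracy from the
    binary confusion matrix (positive class = hemorrhage)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))   # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
    }

# usage with hypothetical labels; each metric equals 0.75 here
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 1, 0, 0, 0, 1, 0, 1])
print(binary_metrics(y_true, y_pred))
```

With the heavy class imbalance noted above (e.g., only 18 subarachnoid slices against 2173 no-hemorrhage slices), sensitivity and precision are more informative than raw accuracy, which motivates reporting all four measures.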

Results and Discussion
An analysis of the ICH diagnosis results achieved by the DN-ELM model is examined under different numbers of epochs. The presented DN-ELM model displayed excellent results under all the distinct epoch counts, as shown in Table 1 and Figures 5 and 6. For example, with 100 epochs, the DN-ELM model resulted in a sens_y of 95.67%, spec_y of 98.10%, prec_s of 96.55%, and acc_y of 96.08%. Simultaneously, with 200 epochs, the DN-ELM method resulted in a sens_y of 94.82%, spec_y of 97.75%, prec_s of 95.98%, and acc_y of 96.15%. Concurrently, with 300 epochs, the DN-ELM approach resulted in a sens_y of 94.91%, spec_y of 97.51%, prec_s of 96.18%, and acc_y of 96.42%. In addition, with 400 epochs, the DN-ELM methodology resulted in a sens_y of 95.12%, spec_y of 97.34%, prec_s of 96.27%, and acc_y of 96.30%. Besides, with 500 epochs, the DN-ELM model resulted in a sens_y of 95.76%, spec_y of 97.81%, prec_s of 96.45%, and acc_y of 96.76%. The average results analysis showed that the DN-ELM technique attained the highest sens_y of 95.26%, spec_y of 97.70%, prec_s of 96.29%, and acc_y of 96.34%. Figure 7 investigates the results of the DN-ELM approach against the existing techniques with respect to several performance measures. It is shown that the WA-ANN model provided ineffective ICH diagnosis results by offering a minimum spec_y of 70.13% and sens_y of 60.18%. At the same time, the U-Net algorithm showcased a slightly higher sens_y of 63.1% and spec_y of 88.6%. Furthermore, the SVM model demonstrated manageable performance with a sens_y of 76.38% and spec_y of 79.41%. In line with this, the WEM-DCNN technique depicted better performance, with a sens_y of 83.33% and spec_y of 97.48%. Moreover, the CNN model provided somewhat higher performance with a sens_y of 87.06% and spec_y of 88.18%.
Although the ResNexT model resulted in a competitive sens y of 88.75% and spec y of 97.7%, the proposed DN-ELM system achieved a superior ICH diagnostic outcome with a sens y of 95.26% and spec y of 97.7%. It is shown that the WA-ANN method provided ineffective ICH diagnosis results by offering a minimum prec s of 70.08% and acc y of 69.78%. Simultaneously, the SVM model attempted to demonstrate a somewhat superior prec s of 77.53% and acc y of 77.32%. In line with this, the CNN approach portrayed manageable performance with an acc y of 87.56% and prec s of 87.98%. At the same time, the U-Net approach displayed even more optimal outcomes with an acc y of 87% and prec s of 88.19%. Besides, the WEM-DCNN approach provided slightly higher performance with a prec s of 89.9% and acc y of 88.35%. Although the ResNexT technique resulted in a good prec s of 95.2% and acc y of 89.3%, the presented DN-ELM method attained optimal ICH diagnostic results, with a prec s of 96.29% and acc y of 96.34%.

Conclusions
This paper introduced a new DN-ELM technique for the diagnosis and classification of ICH. The presented method comprises several subprocesses, namely, preprocessing, segmentation, feature extraction, and classification. The DN-ELM model undergoes a preprocessing step, where the input data from the NIfTI files are transformed into JPEG format. Then, the TEGOA technique is employed for the image segmentation process; the application of the GOA helps to determine the optimal threshold values to perform multilevel thresholding-based image segmentation. Furthermore, the segmented image is fed as input to the DenseNet-201 model. Subsequent to the extraction of a valuable set of feature vectors, the ELM model is employed for the classification process. A detailed experimental results analysis took place to determine the performance of the DN-ELM approach. The outcome of the simulations implied that the DN-ELM model outperformed the state-of-the-art ICH approaches, with a sens_y of 95.26%, spec_y of 97.70%, prec_s of 96.29%, and acc_y of 96.34%. As part of the future scope, the hyperparameters of the DenseNet methodology should be determined using bio-inspired optimization algorithms to further improve the classification outcome.