Intracranial Hemorrhages Segmentation and Features Selection Applying Cuckoo Search Algorithm with Gated Recurrent Unit

Abstract: Traumatic and aneurysmal brain injuries commonly cause intracranial hemorrhage, a severe condition that can result in death if it is not diagnosed and treated at an early stage. Compared to other imaging techniques, Computed Tomography (CT) images are extensively utilized by clinicians for locating and identifying intracranial hemorrhage regions. However, this is a time-consuming and complex task that depends heavily on experienced clinicians. To address this problem, a novel model is developed for the automatic detection of intracranial hemorrhages. After collecting the 3D CT scans from the Radiological Society of North America (RSNA) 2019 brain CT hemorrhage database, image segmentation is carried out using the Fuzzy C Means (FCM) clustering algorithm. Then, hybrid feature extraction is performed on the segmented regions utilizing the Histogram of Oriented Gradients (HoG), Local Ternary Pattern (LTP), and Local Binary Pattern (LBP) to extract discriminative features. Furthermore, the Cuckoo Search Optimization (CSO) algorithm and the Optimized Gated Recurrent Unit (OGRU) classifier are integrated for feature selection and sub-type classification of intracranial hemorrhages. In the experimental results, the proposed OGRU-CSO model obtained a classification accuracy of 99.36%, which is higher than that of the other considered classifiers.


Introduction
Intracranial hemorrhage is caused by leakage from blood vessels in the brain, which impairs body functions such as memory, speech, and eyesight [1]. The major risk factors for intracranial hemorrhage are infected blood vessel walls and leakage in the veins [2]. Compared to other imaging modalities, CT is the preferred modality for intracranial hemorrhage detection because of its low cost, high sensitivity, rapidity, and wide availability [3]. Intracranial hemorrhage lesions appear bright in CT images. Nevertheless, the manual detection of intracranial hemorrhage lesions from CT scans remains challenging because of artifacts, uneven boundaries, noise, and overlapping pixel intensities [4,5]. Hence, manual demarcation is subject to intra-observer and inter-observer variability, and it depends heavily on the physician's expertise [6].
Additionally, the complexities and irregularities associated with the varied sizes and shapes of intracranial hemorrhage lesions make the segmentation and classification process more difficult [7,8]. Intracranial hemorrhage detection becomes a daunting and laborious task, especially in large clinical settings, which introduces delays and inadvertent errors. Therefore, the development of automated models supports physicians in making efficient, reliable, and rapid detection of intracranial hemorrhage lesions from 3D CT scans.

• Developed FCM clustering algorithm for segmenting the diseased portions from the collected 3D brain scans. The FCM clustering algorithm gives good results in the overlapped database, and it is comparatively better than other considered clustering algorithms.
• Performed hybrid feature extraction using HoG, LTP, and LBP descriptors. The hybrid feature extraction includes advantages such as improved data visualization, a sped-up training process, an increase in explainability, and overfitting risk reduction.
• Developed CSO algorithm for feature optimization to diminish the dimension of the extracted feature vectors that reduce the complexity of the system and computational time.
• Proposed OGRU model for classifying 3D brain image classes, namely Intraparenchymal, Subdural, Subarachnoid, Intraventricular, Epidural, and any other. The proposed OGRU model includes Broyden Fletcher Goldfarb Shanno's (BFGS) algorithm to resolve the un-constrained non-linear optimization issues.
• Proposed OGRU-CSO model's efficiency is investigated by utilizing evaluation measures such as Matthews Correlation Coefficient (MCC), precision, f-measure, specificity, recall, and accuracy.
This paper is organized as follows: recent articles related to intracranial hemorrhage detection are reviewed in Section 2. The mathematical derivations and the experimental evaluation of the OGRU-CSO model are presented in Sections 3 and 4, respectively. Lastly, the conclusion is given in Section 5.

Related Works
Anupama et al. [10] implemented a new intracranial hemorrhage detection model based on a synergic deep learning model and a Grab cut-based segmentation algorithm. Initially, the Gabor filter was used to remove noise from the acquired images, and then the Grab cut-based segmentation algorithm was developed to segment the diseased portions from the denoised images. Finally, the synergic deep learning model was implemented with a soft-max classifier for feature extraction and classification. Raghavendra et al. [11] used non-linear features and a probabilistic neural network for effective intracranial hemorrhage detection. The experimental examination showed that the developed model performs better than the existing models. The developed model simplifies the diagnostic process, which enables clinicians to evaluate a large number of 3D CT scans with consistent and more accurate results. In addition, Hssayeni et al. [12] used a U-Net Convolutional Neural Network (CNN) for the automatic classification of intracranial hemorrhages. The developed model supports better intracranial hemorrhage indexing and offers an accurate and cost-effective solution for the identification of intracranial hemorrhage lesions.
Ye et al. [13] integrated Recurrent Neural Network (RNN) and CNN models for intracranial hemorrhage detection and classification of its subtypes: subarachnoid, epidural, intraventricular, cerebral parenchymal, and subdural. In this study, an extensive experiment was carried out on benchmark datasets to investigate the performance of the developed model. Sage and Badura [14] implemented a double branch CNN with random forest and support vector machine classifiers for automatic intracranial hemorrhage detection in 3D head CT scans. The experimental results justified the use of random forest with the double source features. Burduja et al. [15] integrated CNN and long short-term memory networks to detect intracranial hemorrhages in 3D CT scans. The developed model exhibits strong generalization capacity and also provides robust and accurate results on large image databases. Wang et al. [16] implemented a new deep learning model that integrates two sequence models and a two-dimensional CNN for achieving precise acute intracranial hemorrhage detection. In this study, the simulation result was examined under different performance measures: accuracy, sensitivity, and specificity.
Gautam and Raman [17] developed a new system by integrating a deep-learning model and image fusion. Initially, the quad-tree approach was developed to pre-process the collected images and then the fusion technique was utilized to improve the contrast of the stroke regions. Furthermore, the CNN model was implemented to classify the brain strokes into three (normal, hemorrhagic, and ischemic) and two categories (ischemic and hemorrhagic) from the 3D CT images. Additionally, Mansour and Aljehane [18] combined an elephant herd optimization algorithm with Kapur's thresholding for segmenting the diseased portions from the collected images, and further, the Inception V4 Network was developed for extracting discriminative features from the diseased portions. Lastly, the multi-layer perceptron was applied for sub-type classification, which achieved a better classification accuracy rate compared to other deep learning models.
Kumar et al. [19] implemented an entropy-based unsupervised model for automatic detection and segmentation of intracranial hemorrhages on brain CT images. The developed model used the FCM clustering algorithm for skull removal. The experimental evaluation confirmed that the FCM clustering algorithm obtained significant segmentation results compared to manual segmentation. Patel et al. [20] implemented a bi-directional long short-term memory network for intracranial hemorrhage detection at the image level. In this study, the developed model was applied across various pathologies and anatomies. A review of the existing literature shows that the majority of the works rely on handcrafted features, which require substantial domain expertise for the detection of intracranial hemorrhages. To address this concern, a new model named OGRU-CSO is proposed in this paper.

Methodology
For intracranial hemorrhage detection, the proposed system comprises five phases: image collection (RSNA 2019 brain CT hemorrhage database), segmentation (FCM clustering algorithm), feature extraction (HoG, LTP, and LBP), feature optimization (binary cuckoo search algorithm), and classification (OGRU). The flowchart of the proposed system is shown in Figure 1.

Image Collection
The proposed OGRU model's performance is validated on an external database, the RSNA 2019 brain CT hemorrhage database, which consists of 25,272 3D brain scans with 870,301 slices and a pixel size of 256 × 256. In this manuscript, the proposed OGRU model is re-trained on this database, and the results are validated with different cross-folds. In the RSNA 2019 brain CT hemorrhage database, the 3D brain scans are labeled by annotators using five brain hemorrhage label types: intraparenchymal, epidural, intraventricular, subarachnoid, and subdural. Furthermore, the brain scans are collected from three institutions: Stanford University, Universidade Federal de Sao Paulo, and Thomas Jefferson University Hospital. The annotators had no information about symptom acuity, medical history, patient age, or prior examinations. In addition, a slice is automatically labeled as intracranial hemorrhage when it comprises at least one intracranial hemorrhage type. Samples of the acquired 3D brain scans are depicted in Figure 2.

Image Segmentation
After collecting the 3D brain scans, image segmentation is accomplished using the FCM clustering algorithm to localize specific objects in complex templates. The FCM uses fuzzy-set theory to assign a data object to the clusters: each object is considered a member of every cluster with a variable degree of membership. The similarity between objects is estimated by utilizing the Euclidean distance measure, which plays a crucial role in selecting precise clusters. In every iteration, the FCM clustering algorithm reduces the objective function $J$ that is defined in Equation (1).
$$J = \sum_{i=1}^{N_{x_{di}}} \sum_{j=1}^{C} \delta_{ij}^{m} \left\lVert x_{di} - c_j \right\rVert^{2} \qquad (1)$$
where $C$ indicates the number of clusters, $\delta_{ij}$ states the degree of membership of the $i$th data point $x_{di}$ in the cluster $j$, $c_j$ indicates the center vector of the cluster $j$, and $N_{x_{di}}$ denotes the number of data points.
In addition, the norm $\lVert x_{di} - c_j \rVert$ estimates the similarity of the data point $x_{di}$ to the center vector of the cluster $j$. Then, $\delta_{ij}$ is determined for a given data point $x_{di}$ using Equation (2).
$$\delta_{ij} = \frac{1}{\sum_{l=1}^{C} \left( \frac{\lVert x_{di} - c_j \rVert}{\lVert x_{di} - c_l \rVert} \right)^{\frac{2}{m-1}}} \qquad (2)$$
where $m$ states the fuzziness coefficient. Additionally, the center vector $c_j$ is determined using Equation (3) [21,22].
$$c_j = \frac{\sum_{i=1}^{N_{x_{di}}} \delta_{ij}^{m} \, x_{di}}{\sum_{i=1}^{N_{x_{di}}} \delta_{ij}^{m}} \qquad (3)$$
The fuzziness coefficient $m$ controls the clustering tolerance through Equations (2) and (3): a smaller value of $m$ produces a smaller overlap between the clusters. In this clustering algorithm, the accuracy $a$ is estimated from the change in $\delta_{ij}$ between the present iteration $k$ and the next iteration $k + 1$, which is mathematically specified in Equation (4).
$$a = \Delta_{i}^{N_{x_{di}}} \, \Delta_{j}^{C} \left| \delta_{ij}^{k+1} - \delta_{ij}^{k} \right| \qquad (4)$$
where $\delta_{ij}^{k}$ and $\delta_{ij}^{k+1}$ indicate the degree of membership at the iterations $k$ and $k + 1$, and $\Delta$ specifies the highest vector value. Furthermore, hybrid feature extraction is accomplished using the HoG, LBP, and LTP descriptors for extracting features from the segmented images. Samples of the segmented 3D brain scans are depicted in Figure 3.
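The alternating membership and center updates described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the function name `fuzzy_c_means`, the random initialization, the tolerance `tol`, and the default `m = 2` are assumptions made for illustration only.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Illustrative FCM. X has shape (n_points, n_features)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; every row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Membership-weighted cluster centers (center update of Equation (3)).
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance of every point to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Membership update of Equation (2), written in normalized form.
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Largest membership change between two iterations (Equation (4)).
        change = np.abs(U_new - U).max()
        U = U_new
        if change < tol:
            break
    return U, centers
```

For segmentation, `X` would hold one row per pixel (for example its intensity), and the hard label of a pixel can be taken as `U.argmax(axis=1)`.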
Appl. Sci. 2022, 12, x FOR PEER REVIEW

Hybrid Feature Extraction
After image segmentation, hybrid feature extraction is accomplished by using the HoG, LBP, and LTP feature descriptors, which are selected based on the feature importance calculation. In image processing applications, the HoG descriptor is often used for extracting feature values from medical images. In the HoG feature descriptor, the gradient magnitude and orientation of the brain scans $I_N$ are initially computed. The vertical gradient $G_v$ and horizontal gradient $G_h$ are mathematically specified in Equation (5).
$$G_h(x, y) = I_N(x+1, y) - I_N(x-1, y), \qquad G_v(x, y) = I_N(x, y+1) - I_N(x, y-1) \qquad (5)$$
The computed vertical gradient $G_v$ and horizontal gradient $G_h$ are utilized to calculate the gradient magnitude $M(x, y)$ and angular orientation $\theta(x, y)$ that are defined in Equations (6) and (7).
$$M(x, y) = \sqrt{G_h(x, y)^2 + G_v(x, y)^2} \qquad (6)$$
$$\theta(x, y) = \tan^{-1}\left( \frac{G_v(x, y)}{G_h(x, y)} \right) \qquad (7)$$
The 3D brain scans are partitioned into cells on the basis of the gradient magnitude $M(x, y)$ and angular orientation $\theta(x, y)$. Furthermore, the orientations within each cell are accumulated and quantized into histogram bins, and then the respective bins are combined into the final histogram [23,24]. The total number of features $T_{hog}$ is estimated by utilizing Equation (8).
$$T_{hog} = N_b \times B_{img} \times B_s \qquad (8)$$
where $N_b$ represents the number of bins, $B_{img}$ specifies the number of blocks per 3D brain scan, and $B_s$ denotes the block size.
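As an illustration of the gradient and histogram steps of the HoG descriptor, the following NumPy sketch computes central-difference gradients, their magnitude and unsigned orientation, and a magnitude-weighted orientation histogram for one cell. The function names, the use of `arctan2` folded to the range 0 to 180 degrees, and the choice of 9 bins are assumptions for illustration, not settings reported in this work.

```python
import numpy as np

def gradient_magnitude_orientation(img):
    """Central-difference gradients, magnitude, and unsigned orientation."""
    img = img.astype(float)
    gh = np.zeros_like(img)
    gv = np.zeros_like(img)
    gh[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal gradient
    gv[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical gradient
    mag = np.hypot(gh, gv)                   # gradient magnitude
    # arctan2 avoids division by zero; modulo 180 gives unsigned angles.
    ang = np.degrees(np.arctan2(gv, gh)) % 180.0
    return mag, ang

def cell_histogram(mag, ang, n_bins=9):
    """Magnitude-weighted orientation histogram of one cell."""
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
```

Concatenating the histograms of all cells in all blocks yields a feature vector whose length follows the bins, blocks, and block-size product described above.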
In addition, the LBP and LTP encode the relation between the neighborhood pixels and the referenced pixel by calculating the gray-level difference. The LBP is an effective texture feature descriptor, which transforms the 3D brain scans into labels based on the luminance value. In a 3D brain scan $I$, the position of the pixel is represented as $(x, y)$, and the central pixel value $x_c$ is used as the threshold to binarize each neighborhood pixel $n_p$. Additionally, the binary pixel values are weighted using powers of two, and the weighted values are summed to generate a decimal number that is stored in the location of $x_c$. The LBP is mathematically specified in Equations (9) and (10) [25].
where $u$ denotes maximum jumping time and $x_i$ specifies the gray-level value of the center pixel $x_c$. Similarly, the LTP is an extension of the LBP that uses a thresholding constant to quantize the pixel intensity differences into three values. In the LTP feature descriptor, the thresholding value is defined by using Equation (11) [26].
$$s'(n_p, x_c, T_v) = \begin{cases} 1, & n_p \ge x_c + T_v \\ 0, & \left| n_p - x_c \right| < T_v \\ -1, & n_p \le x_c - T_v \end{cases} \qquad (11)$$
where $T_v$ denotes the thresholding constant. Subsequently, the extracted 9824 feature vectors are given as the input to the binary cuckoo search algorithm for feature optimization. The graphical representation of the feature importance calculation is shown in Figure 4.
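The thresholding and power-of-two weighting of the LBP, and the three-level thresholding of the LTP, can be sketched for a single 3 × 3 neighborhood as follows. This uses the commonly cited textbook forms of both descriptors, which may differ in detail from the formulation in [25,26]. The neighbor ordering in `OFFSETS`, the function names, and the split of the ternary code into an "upper" and a "lower" binary code are assumptions for illustration.

```python
import numpy as np

# Offsets of the 8 neighbors of a pixel, in a fixed clockwise order (assumed).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """LBP code of pixel (r, c): neighbors >= center contribute 2**p."""
    center = int(img[r, c])
    code = 0
    for p, (dr, dc) in enumerate(OFFSETS):
        if int(img[r + dr, c + dc]) >= center:
            code += 2 ** p
    return code

def ltp_codes(img, r, c, t):
    """LTP of pixel (r, c) with threshold t, split into two binary codes."""
    center = int(img[r, c])
    upper = lower = 0
    for p, (dr, dc) in enumerate(OFFSETS):
        diff = int(img[r + dr, c + dc]) - center
        if diff >= t:        # ternary value +1
            upper += 2 ** p
        elif diff <= -t:     # ternary value -1
            lower += 2 ** p
    return upper, lower      # neighbors with |diff| < t map to 0

patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 6, 3]])
print(lbp_code(patch, 1, 1))      # 42
print(ltp_codes(patch, 1, 1, 2))  # (2, 212)
```

Histograms of these codes over the segmented region would then form the LBP and LTP parts of the hybrid feature vector.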

Feature Optimization
After feature extraction, feature optimization is accomplished using the binary cuckoo search algorithm, which is inspired by the obligate brood parasitism of cuckoos. Cuckoo birds lay their eggs in the nests of host birds. The cuckoo mimics the external properties of the host eggs, such as color, spots, and size, and then places its eggs in the host bird's nest. When this approach is ineffective, the host birds identify the cuckoo eggs and either abandon the nest or throw the cuckoo eggs away; otherwise, the cuckoo's strategy succeeds and it proceeds to the next generation. Based on this concept, the cuckoo search algorithm is formulated [27,28], and the step-by-step process of this algorithm is given below:
Initialization Stage: Firstly, the host nest population $P_i$ is selected randomly (where $i = 1, 2, \ldots, n$).
Generation of New Cuckoo Stage: After randomly initializing the nest population in the search space, the initialized cuckoos are assessed by utilizing an objective function for identifying better solutions.
Fitness Evaluation Stage: Compute the fitness based on Equation (12), which helps to select the best solution, where $F_L$ indicates the feature length, $V$ denotes the state vector of the chaotic system, and $\hat{V}$ represents the state vector of the estimated system.
Updating Stage: A cosine transform is employed to revise the initial solution of the Levy flights. A nest is chosen randomly and the quality of the new solution is assessed. If the quality of the new solution is superior to that of the old solution, the old solution is replaced with the new solution; otherwise, the old solution is retained as the best solution. The Levy flight used by the cuckoo search algorithm is mathematically represented in Equation (13).
The Levy flight of Equation (13) with Gaussian distribution is shown in Equations (14) and (15), where $\sigma_0$ and $\mu$ indicate constant values and $c_g$ denotes the current generation.
Reject Worst Nest Stage: In this stage, new nests are generated randomly, and the worst nests are discarded based on the probability values. Additionally, the best solutions are ranked based on a fitness function. Finally, the best solutions are identified and recognized as optimal solutions.
Stopping Criterion Stage: This process is repeated until the maximum number of iterations is reached.
Immigration of Cuckoos: Once the cuckoos have grown and become mature, they live in their own area and society for a certain period. The society with the best profit value is selected after the cuckoo groups are formed in different areas. It is hard to recognize which cuckoo belongs to which group when mature cuckoos live all over the environment. To avoid this concern, cuckoo grouping is carried out using the decision tree method. Each cuckoo flies toward the goal habitat according to a parameter $\beta$ and with a deviation of $\emptyset$ radians. These two parameters, $\beta$ and $\emptyset$, help cuckoos identify their positions in the environment. For each cuckoo, $\beta$ and $\emptyset$ are determined by using Equations (16) and (17).
where $\beta$ indicates the random number, and $w$ denotes the parameter that constrains the deviation from the goal habitat. The parameter settings of the cuckoo search algorithm are given as follows: the number of iterations is 100, the step length is 0.01, the Levy flight distribution parameter is 1.5, the number of nests is 20, the number of transition groups is 8, the transition separation coefficient is 1, and the transition probability coefficient is 0.1. Next, the selected 5409 feature vectors are given as the input to the OGRU model to classify six classes: Intraparenchymal, Subdural, Subarachnoid, Intraventricular, Epidural, and any other. The flowchart of the binary cuckoo search algorithm is represented in Figure 5.
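The stages listed above (initialization, Levy-flight generation of new cuckoos, greedy replacement, rejection of the worst nests, and a fixed iteration budget) can be sketched generically in NumPy as shown below. This is a simplified illustration and not the algorithm configured in this work: the Levy steps are drawn with Mantegna's method, the continuous positions are binarized with a sigmoid transfer function, the abandonment fraction `pa` is an assumed value, and the toy `fitness` in the usage lines is hypothetical (in practice it would score a feature subset, for example by classification accuracy). The cosine transform and the immigration step are omitted.

```python
import math
import numpy as np

def levy_step(rng, size, beta=1.5):
    """Levy-distributed step lengths drawn with Mantegna's method."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def binary_cuckoo_search(fitness, n_features, n_nests=20, n_iter=100,
                         step=0.01, pa=0.25, beta=1.5, seed=0):
    """Maximize fitness(mask) over boolean feature masks (illustrative)."""
    rng = np.random.default_rng(seed)

    def binarize(p):
        # Sigmoid transfer function followed by a random threshold.
        return 1.0 / (1.0 + np.exp(-p)) > rng.random(p.shape)

    # Initialization stage: random nests and their fitness values.
    pos = rng.normal(size=(n_nests, n_features))
    masks = np.array([binarize(p) for p in pos])
    fit = np.array([fitness(m) for m in masks], dtype=float)
    b = int(fit.argmax())
    best_pos, best_mask, best_fit = pos[b].copy(), masks[b].copy(), fit[b]

    for _ in range(n_iter):  # stopping criterion: iteration budget
        for i in range(n_nests):
            # New cuckoo generated by a Levy flight around nest i.
            cand = pos[i] + step * levy_step(rng, n_features, beta) * (pos[i] - best_pos)
            cand = np.clip(cand, -10.0, 10.0)
            cand_mask = binarize(cand)
            cand_fit = fitness(cand_mask)
            # Compare with a randomly chosen nest and keep the better one.
            j = int(rng.integers(n_nests))
            if cand_fit > fit[j]:
                pos[j], masks[j], fit[j] = cand, cand_mask, cand_fit
        # Reject the worst nests and rebuild them randomly.
        for j in np.argsort(fit)[:int(pa * n_nests)]:
            pos[j] = rng.normal(size=n_features)
            masks[j] = binarize(pos[j])
            fit[j] = fitness(masks[j])
        b = int(fit.argmax())
        if fit[b] > best_fit:
            best_pos, best_mask, best_fit = pos[b].copy(), masks[b].copy(), fit[b]
    return best_mask, best_fit

# Toy usage: recover a hypothetical "ideal" subset of 12 features.
target = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1], dtype=bool)
toy_fitness = lambda mask: -int(np.sum(mask != target))
best_mask, best_fit = binary_cuckoo_search(toy_fitness, n_features=12)
```

The returned boolean mask plays the role of the selected feature subset; columns of the feature matrix where the mask is true would be passed on to the classifier.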

Classification
The GRU is a variant of the Long Short-Term Memory (LSTM) network that merges the forget and input gates into a single gate named the "update gate", and the GRU model further includes an additional gate named the "reset gate". Compared to the LSTM network, the GRU model is simpler; therefore, it is becoming increasingly popular. The GRU modulates the feature information inside the unit without using a separate memory cell. In the GRU model, the activation $h_t^j$ is a linear interpolation between the previous activation $h_{t-1}^j$ and the candidate activation $\tilde{h}_t^j$ at the time state $t$, which is mathematically specified in Equation (18) [29,30].
$$h_t^j = \left(1 - z_t^j\right) h_{t-1}^j + z_t^j \, \tilde{h}_t^j \qquad (18)$$
where $z_t^j$ represents the update gate that decides how much the unit updates its activation, and $\tilde{h}_t^j$ states the candidate activation. The mathematical expressions of the update gate and the candidate activation are defined in Equations (19) and (20).
$$z_t^j = \sigma\left(w_z x_t + U_z h_{t-1}\right)^j \qquad (19)$$
$$\tilde{h}_t^j = \tanh\left(w x_t + U \left(r_t \odot h_{t-1}\right)\right)^j \qquad (20)$$
where $r_t^j$ states the reset gate and $\tanh$ states the hyperbolic tangent function. The reset gate $r_t^j$ is mathematically calculated using Equation (21).
$$r_t^j = \sigma\left(w_r x_t + U_r h_{t-1}\right)^j \qquad (21)$$
where $w$ states a parameter or weight, $\sigma$ indicates the sigmoid function, $x_t$ is the input at the time state $t$, and $U$ is the recurrent weight matrix. In this scenario, the update gate controls the prior states: units that capture long-term dependencies have active update gates, and units that capture short-term dependencies have active reset gates. The Stochastic Gradient Descent (SGD) optimization algorithm is applied in the GRU model for optimizing the stochastic objective functions based on the lower-order moments. The iterative SGD algorithm starts from a random point on the loss curve and then descends along the slope with a user-defined learning rate until the curve reaches its minimum value. In this study, the SGD optimization algorithm updates the weight or parameter $w$ utilizing the gradient value $\partial L / \partial w$, which is multiplied by the learning rate $\alpha$. Therefore, the updated reset gate is mathematically defined in Equation (22).
$$r_t^j = \sigma\left(w_{r+1} x_t + U_r h_{t-1}\right)^j \qquad (22)$$
where $w_{r+1} = w_r - \alpha \, \partial L / \partial w_r$ and the term $\partial L / \partial w_r$ states the gradient of the loss function $L$ with respect to $w_r$. If any decimal values occur, the GRU model rounds off the respective decimal values to whole values. The architecture of the GRU model is specified in Figure 6.
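A single GRU time step and the SGD weight update described above can be illustrated with the following NumPy sketch. It follows the standard GRU formulation; the parameter names in the dictionary `p` (`Wz`, `Uz`, `Wr`, `Ur`, `W`, `U`), the omission of bias terms, and the function names are assumptions for illustration, while the 32 hidden units and the learning rate of 0.0025 mirror the settings reported in this section.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU time step; p maps names to weight matrices (no biases)."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)            # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)            # reset gate
    h_cand = np.tanh(p["W"] @ x_t + p["U"] @ (r * h_prev))   # candidate activation
    return (1.0 - z) * h_prev + z * h_cand                   # linear interpolation

def sgd_update(w, grad, lr=0.0025):
    """Plain SGD step: move the weight against its loss gradient."""
    return w - lr * grad

# Toy usage with 32 hidden units and a hypothetical 8-dimensional input.
rng = np.random.default_rng(0)
n_in, n_hid = 8, 32
params = {k: rng.normal(scale=0.1, size=(n_hid, n_in)) for k in ("Wz", "Wr", "W")}
params.update({k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in ("Uz", "Ur", "U")})
h = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):   # a sequence of 5 feature vectors
    h = gru_step(x_t, h, params)
```

In a full classifier, the final hidden state `h` would feed an output layer, and `sgd_update` would be applied to every weight matrix using gradients obtained by backpropagation through time.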
Appl. Sci. 2022, 12, x FOR PEER REVIEW 9 of 17 of the gradient curve, and then it slants in the slope with the help of a user-defined learning rate until the gradient curve reaches the minimum value. In this study, the SGD optimization algorithm updates the weight or parameter utilizing the gradient value / , and then the corresponding gradient value is multiplied by the learning rate . Therefore, the updated reset gate is mathematically defined in Equation (22).
where / and the term / states gradient loss function that reduces .
If any decimal values occur, the GRU model approximately rounds off the respective decimal values into complete values. The architecture of the GRU model is specified in Figure 6. The parameter settings of the GRU model are listed as follows: lambda loss amount is 0.0015, the number of hidden units are 32, the learning rate is 0.0025, and the number of iteration is 100. To resolve the un-constrained non-linear optimization issues, a BFGS algorithm is integrated with the GRU model. The BFGS algorithm uses a gradient descent function to further reduce the gradient value to the local minimum. The gradient descent function is mathematically defined in Equation (23). (23) where, states non-convex function. The point is computed in the next iteration using the point , as mentioned in Equation (24). (24) where states search direction and denotes step size, and the minimizer is mathematically defined in Equation (25).
Additionally, the search direction is specified in Equation (26).
The parameter settings of the GRU model are as follows: the lambda loss amount is 0.0015, the number of hidden units is 32, the learning rate is 0.0025, and the number of iterations is 100. To resolve the unconstrained non-linear optimization problem, a BFGS algorithm is integrated with the GRU model. The BFGS algorithm iterates on the variable $\mu$ to further reduce the gradient value toward a local minimum. The corresponding minimization problem over $\mu$ is defined in Equation (23).
where $\Psi(\mu)$ denotes a non-convex function. The point $\mu_{k+1}$ is computed at iteration $k$ from the point $\mu_k$, as given in Equation (24).
where $d_k$ denotes the search direction and $\vartheta_k$ denotes the step size; the minimizer $\vartheta_k$ is defined in Equation (25). Additionally, the search direction is specified in Equation (26).
where $S_k$ denotes the second derivative of $\Psi$, which is called the Hessian matrix. In this scenario, the quasi-Newton method is employed to compute $S_k$, as given in Equation (27).
where $\delta_k = \mu_{k+1} - \mu_k$ and $\gamma_k = \nabla\Psi(\mu_{k+1}) - \nabla\Psi(\mu_k)$. Furthermore, the approximation of $S_k$ is computed using Equation (28). The classes (intraparenchymal, subdural, subarachnoid, intraventricular, epidural, and any other) are classified based on the approximation of $S_k$. The experimental results of the OGRU-CSO model are presented in Section 4.
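The BFGS iteration described above can be sketched as follows. This is a minimal illustration and not the authors' MATLAB implementation. It assumes NumPy, uses the textbook BFGS update for the Hessian approximation S_k, and replaces the exact step-size minimizer of Equation (25) with a backtracking (Armijo) line search. The names bfgs_minimize, toy_psi, and toy_grad, the tolerances, and the toy objective are hypothetical.

```python
import numpy as np


def bfgs_minimize(psi, grad, mu0, max_iter=100, tol=1e-8):
    """Minimize psi(mu) using a BFGS approximation S_k of the Hessian."""
    mu = np.asarray(mu0, dtype=float)
    S = np.eye(mu.size)                  # S_0: identity as initial approximation
    g = grad(mu)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:      # stop once the gradient is near zero
            break
        d = -np.linalg.solve(S, g)       # search direction: S_k d_k = -grad psi(mu_k)
        # Backtracking line search (assumption: stands in for the exact minimizer).
        step = 1.0
        while psi(mu + step * d) > psi(mu) + 1e-4 * step * (g @ d) and step > 1e-12:
            step *= 0.5
        mu_next = mu + step * d          # mu_{k+1} = mu_k + step_k * d_k
        g_next = grad(mu_next)
        delta = mu_next - mu             # delta_k: change in the point
        gamma = g_next - g               # gamma_k: change in the gradient
        if gamma @ delta > 1e-12:        # curvature check keeps S positive definite
            S_delta = S @ delta
            S = (S + np.outer(gamma, gamma) / (gamma @ delta)
                 - np.outer(S_delta, S_delta) / (delta @ S_delta))
        mu, g = mu_next, g_next
    return mu


def toy_psi(mu):
    """Hypothetical smooth objective with its minimum at (1, -2)."""
    return (mu[0] - 1.0) ** 2 + 10.0 * (mu[1] + 2.0) ** 2


def toy_grad(mu):
    """Gradient of toy_psi."""
    return np.array([2.0 * (mu[0] - 1.0), 20.0 * (mu[1] + 2.0)])
```

Calling bfgs_minimize(toy_psi, toy_grad, [0.0, 0.0]) should return a point close to (1, -2). The toy objective is convex for simplicity; on a non-convex Ψ the same iteration only reaches a local minimum, as the text notes.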

Experimental Results
In this paper, the OGRU-CSO model's performance is simulated using the MATLAB 2020a software tool on a system with 16 GB of random access memory, an Intel Core i9 processor, and the Windows 10 operating system. The efficiency of the developed OGRU-CSO model is investigated in terms of MCC, precision, f-measure, specificity, recall, and accuracy. The formulas of these performance measures are given in Equations (29)-(34), where FP, FN, TP, and TN denote false positives, false negatives, true positives, and true negatives, respectively.
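As an illustration, the six measures can be computed from the four confusion-matrix counts as sketched below. The sketch assumes the standard textbook definitions of each measure, which may differ in presentation from Equations (29)-(34). The function name and the example counts are hypothetical, and all denominators are assumed to be non-zero.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return the six measures as fractions in [0, 1] (MCC in [-1, 1])."""
    precision = tp / (tp + fp)            # correct share of predicted positives
    recall = tp / (tp + fn)               # share of actual positives found
    specificity = tn / (tn + fp)          # share of actual negatives found
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts, used only to exercise the function.
example = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

Multiplying each value by 100 gives the percentages in the form reported in the tables.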

Quantitative Evaluation
In this scenario, the efficiency of the OGRU model is validated on the RSNA 2019 brain CT hemorrhage database in terms of recall, precision, f-measure, MCC, specificity, and accuracy. In Tables 1 and 2, seven classifiers (LSTM, Deep Belief Network (DBN), autoencoder, Recurrent Neural Network (RNN), Adaptive Neuro-Fuzzy Inference System (ANFIS), GRU, and OGRU) are tested with the Cuckoo Search Optimization (CSO) algorithm. During classification, the same parameters are used in all runs for all classifiers, and five-fold cross-validation is applied to analyze the performance of the OGRU model, since the better use of data for training and testing decreases the computational time with limited bias and variance. The OGRU model's performance is evaluated with two training and testing ratios, 50:50% and 80:20%. As stated in Table 1, the OGRU model with the CSO algorithm achieved 92.80% precision, 90.28% recall, 91.90% f-measure, 92.91% MCC, 90.88% accuracy, and 90.48% specificity in intracranial hemorrhage detection with 50% training and 50% testing of data. These results are better than those of the comparative classifiers: LSTM, DBN, autoencoder, RNN, ANFIS, and the conventional GRU model. A graphical comparison of the different classifiers with 50% training and 50% testing of data is depicted in Figure 7.
Furthermore, the RSNA 2019 brain CT hemorrhage database is divided into a training set and a test set with a ratio of 80% to 20%. On this split, the highest performance is again achieved by the OGRU model with the CSO algorithm. The developed OGRU-CSO model achieved a precision of 99.86%, recall of 99.25%, f-measure of 99.34%, MCC of 99.67%, specificity of 99.40%, and classification accuracy of 99.36%, which are superior to those of the other classification techniques: LSTM, DBN, autoencoder, RNN, ANFIS, and the conventional GRU model.
As shown in Table 2, the proposed OGRU-CSO model is highly capable of extracting and optimizing the most discriminative features, which helps in achieving better classification with limited feature vectors. A graphical comparison of the different classifiers with 80% training and 20% testing of data is shown in Figure 8.
In Tables 3 and 4, the performance investigation is conducted using different feature optimization algorithms with the OGRU model: Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Whale Optimization Algorithm (WOA), CSO, and Grasshopper Optimization Algorithm (GOA). Compared with the other combinations, the CSO algorithm with the OGRU attained higher results in intracranial hemorrhage detection. In this paper, the CSO algorithm selects the optimum feature vectors with a better balance between exploitation and exploration. In contrast, the comparative optimization algorithms suffer from premature convergence because they become stuck at local optima.
As seen in Tables 3 and 4, the proposed OGRU-CSO model obtained better performance with 80:20% training and testing of the RSNA 2019 brain CT hemorrhage data than with 50:50% training and testing. The graphical comparison of the different optimizers with 50% training and 50% testing of data is shown in Figure 9. Correspondingly, the graphical comparison of the different optimizers with 80% training and 20% testing of data is shown in Figure 10.
In addition, the simulation results of the proposed model obtained by varying the features with 50:50% and 80:20% training and testing of data are given in Tables 5 and 6.
As the tables show, the hybrid feature extraction achieved better classification results than the individual feature descriptors in terms of precision, recall, f-measure, MCC, specificity, and accuracy. A graphical comparison of the different features with 50% training and 50% testing of data is shown in Figure 11, and the corresponding comparison with 80% training and 20% testing of data is shown in Figure 12.
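The two evaluation protocols mentioned in this section, hold-out splits (50:50% and 80:20%) and five-fold cross-validation, can be sketched as index generators. The paper does not state how the two protocols were combined, whether the splits were stratified by class, or which random seed was used, so the shuffling, the seed, and the function names below are assumptions.

```python
import random


def holdout_split(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    cut = int(round(train_fraction * n_samples))  # size of the training set
    return indices[:cut], indices[cut:]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

For example, holdout_split(n, 0.8) gives the 80:20% protocol and holdout_split(n, 0.5) the 50:50% protocol, while k_fold_indices(n, 5) produces five train/test pairs whose test sets together cover every sample exactly once.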

Comparative Evaluation
The comparative investigation between the proposed and existing models is summarized in Table 7. Anupama et al. [10] created an intracranial hemorrhage detection system based on a synergic deep learning model and a GrabCut-based segmentation algorithm. Their model achieved 95.73% classification accuracy, 97.78% specificity, and 94.01% recall on the benchmark intracranial hemorrhage detection database. Burduja et al. [15] combined a pre-trained CNN model, ResNeXt-101, with a bidirectional long short-term memory network for recognizing intracranial hemorrhages in 3D CT scans. Their experimental results showed that the individual ResNeXt-101 model achieved 97.54% classification accuracy, 60.79% recall, and 99.32% specificity, whereas ResNeXt-101 with the bidirectional long short-term memory network achieved a specificity of 99%, a higher recall of 72.86%, and a higher classification accuracy of 97.83% on the RSNA 2019 brain CT hemorrhage database. Wang et al. [16] developed a deep learning model that integrates two sequence models and a 2D CNN for acute intracranial hemorrhage detection; it attained a specificity of 94.85%, recall of 95.84%, and classification accuracy of 95% on the benchmark intracranial hemorrhage detection database. Compared with these models, the developed OGRU-CSO model achieved superior results in intracranial hemorrhage detection in terms of specificity, recall, and accuracy on the RSNA 2019 brain CT hemorrhage database. In addition, selecting the optimal features with the CSO algorithm reduces the model complexity to linear. The computational time of the proposed model is 33.28 s, which is lower than that of the other optimizers and classifiers considered.

Conclusions
In this paper, the OGRU-CSO model is developed for intracranial hemorrhage detection or segmentation in 3D CT scans. The proposed OGRU-CSO model consists of two important phases: feature selection (optimization) and classification. After segmenting the diseased regions from the collected 3D CT scans, the feature vectors are extracted using three feature descriptors: HoG, LBP, and LTP. Because the extracted features are multi-dimensional, the CSO algorithm is employed to reduce the dimension of the extracted feature vectors, which improves the computation time and reduces the system complexity. Finally, the optimized discriminative feature vectors are given as input to the OGRU for the sub-type classification of intracranial hemorrhages. In the experimental section, the proposed OGRU-CSO model achieved 99.36% classification accuracy on the RSNA 2019 brain CT hemorrhage database, which is higher than that of the other classifiers and optimizers considered. Furthermore, optimizing the feature vectors makes the computational complexity of the proposed OGRU-CSO model linear. As a future extension, a hyperparameter optimization algorithm can be included in the OGRU model to further enhance the detection of intracranial hemorrhage lesions.