A Novel Lightweight CNN Architecture for the Diagnosis of Brain Tumors Using MR Images

Over the last few years, brain tumor-related clinical cases have increased substantially, particularly in adults, due to environmental and genetic factors. If they are unidentified in the early stages, there is a risk of severe medical complications, including death. Early diagnosis of brain tumors therefore plays a vital role in treatment planning and improving a patient's condition. Brain tumors vary in form, properties, and treatment, and manual identification and classification of them is complex, time-demanding, and sensitive to error. Based on these observations, we developed an automated methodology for detecting and classifying brain tumors using the magnetic resonance (MR) imaging modality. The proposed work includes three phases: pre-processing, classification, and segmentation. In pre-processing, we started with a skull-stripping process based on morphological and thresholding operations to eliminate non-brain matter such as skin, muscle, fat, and eyeballs. We then employed image data augmentation to improve the model accuracy by minimizing overfitting. In the classification phase, we developed a novel lightweight convolutional neural network (lightweight CNN) model to extract features from skull-free augmented brain MR images and classify them as normal or abnormal. Finally, in the segmentation phase, we obtained the infected tumor regions from the brain MR images using a fast-linking modified spiking cortical model (FL-MSCM). Based on this sequence of operations, our framework achieved 99.58% classification accuracy and a dice similarity coefficient (DSC) of 95.7%. The experimental results illustrate the efficiency of the proposed framework and its appreciable performance compared to existing techniques.


Introduction
The brain plays a crucial role in every aspect of human activity, but studying its clinical elements is very challenging due to the complexity of its structure and functionality. Tumors are the main cause behind many medical complications in the brain. A tumor usually forms in or around the brain due to the unconstrained growth of irregular cells, which may spread to other parts [1]. Typically, brain tumors are classified into primary and secondary (metastatic). Primary tumors begin in the brain, while secondary brain tumors arise in other body regions such as the lungs, breasts, kidneys, or skin and migrate to brain tissues through the bloodstream [2].
Further, primary brain tumors can be categorized as either cancerous (malignant) or non-cancerous (benign). Non-cancerous tumors do not have any active cells; hence, they can be wholly restrained and treated by a surgical process. On the other hand, cancerous tumors have active cells that proliferate and attack other brain areas. These tumors cannot be cured under regular medication but may be controlled by radiotherapy/chemotherapy. The survival rate for victims of cancerous tumors is low compared to non-cancerous tumors, so early brain tumor detection is crucial. In this process, imaging modalities such as magnetic resonance (MR) imaging and computed tomography (CT) [3] play an essential role. The main limitations of the existing detection approaches are summarized as follows:
1. Traditional automatic detection approaches utilized conventional machine learning algorithms, whose performance depends on the choice of appropriate features and learning approaches.
2. Some classification methods employed wavelets for image analysis. However, wavelets fail to acquire directional information, and the selection of subbands and mother wavelets is critical.
3. Some approaches use handcrafted features, which are not robust to noise and exhibit poor discrimination.
4. In a few works, the authors implemented traditional CNN frameworks, such as pre-trained CNN models with transfer learning, to classify brain MR images. However, these models demand a large number of parameters and high computational time.

Table 1. Summary of the state-of-the-art approaches.

Reference | Methods Used | Accuracy | Pros | Cons
Kale et al. [8] | LBP and SP | 96.17% | Effectively extracts the directional details of abnormal tissues. | Performance depends on the selection of orientation bands.
Singh et al. [9] | DWT and ICA | 98.87% | Obtains spatial information useful in the classification of brain MR images. | Selection of an appropriate mother wavelet is a major challenge; requires a large number of coefficients to approximate smooth functions.
Gokulalakshmi et al. [11] | DWT and GLCM | 92.76% | Low processing time and easy to implement. | Selection of the displacement vector; works on low-resolution images; loses some significant information.
Wang et al. [14] | SWT and Entropy | 96.6% | Effectively highlights image edge features. | Irrelevant features might be extracted due to wavelet aliasing.
Arunkumar et al. [15] | K-means clustering and ANN | 94.07% | Works very well on limited data. | Selection of the K-value is difficult.
Togaçar et al. [16] | CNN and hyper-column feature selection | 96.77% | Retains local discriminative features. | High computational time; low performance on normal brain MR images.
Lu et al. [18] | AlexNet | 95.71% | Performs well on abnormal brain MR images. | Large number of parameters needed for training; requires necessary and sufficient information for developing significant clusters.
Hasan et al. [20] | Modified GLCM | 97.8% | Achieved remarkable accuracy and is independent of atlas registration. | Large memory requirements and computationally expensive; only the axial dataset of brain tumors was considered.
Haitham et al. [31] | Cascaded CNN | DSC = 85.3% | Achieved good performance on a limited brain MR image database. | Requires more time to train the parameters.
To address the above-mentioned problems, we propose a new approach for identifying and classifying brain MR images using a fast-linking modified spiking cortical model (FL-MSCM) and a lightweight CNN.

Significant Contributions
The significant contributions of this work are summarized as follows:

1. Skull-stripping is performed to enhance the robustness of the segmentation process by eliminating extra-meningeal matter (e.g., the dura mater) using thresholding and morphological operations.
2. Image data augmentation is implemented to enhance the sufficiency and diversity of the training database through geometric transformation operators. This significantly reduces the overfitting encountered during training.
3. We propose a novel lightweight CNN architecture to extract high-level features from brain MR images. It requires far fewer parameters, both trainable and non-trainable, than existing CNN models while automatically extracting the significant features. This limits human intervention in the analysis of brain MR tumor images, which is a considerable benefit of the suggested CNN model.
4. We analyze the impact of various optimization algorithms (stochastic gradient descent with momentum (SGDM), Adam, Adagrad, AdaMax, Adadelta, Nadam, and RMSProp) during training of the CNN model with the help of K-fold cross-validation (K-FCV). This is a fundamental difference between the existing and proposed models.
5. The FL-MSCM is employed to separate the foreground (affected regions) from the background (non-affected areas) in brain MR images. By making each region as uniform as possible, it mitigates issues of traditional segmentation algorithms such as noise, spurious blobs, and other imaging artifacts. This improves the segmentation accuracy, which is a significant advantage of the presented FL-MSCM technique.
The remainder of the work is organized as follows: Section 2 presents the background of the CNN model. Section 3 illustrates the proposed technique and the metrics used to evaluate the performance of the models. Section 4 analyzes the outcomes, discusses the reasons behind the proposed method's success, and compares it with other state-of-the-art approaches. Section 5 concludes the present work.

Preliminaries
In this section, we discuss the background of deep learning and describe the layers used in the implementation of the proposed model. Deep learning (DL) architectures can learn complex tasks by hierarchically constructing feature maps. CNN-based methods are the most popular among the available DL models and comprise the following layers: convolutional, pooling, activation, batch normalization, fully connected (FC), and softmax.

Convolutional Layer
The convolutional layer plays a crucial role in classification. Typically, it produces a set of feature maps F by convolving the input image with a set of filters in a sliding-window manner:

F(u, v) = (B ∗ C)(u, v) = Σ_m Σ_n B(u − m, v − n) C(m, n),

where ∗ represents the convolution operator, B is the segmented image, C denotes the filter kernel, and u and v are the indices of the generated feature map.
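To make this concrete, the following is a minimal numpy sketch of the sliding-window operation above (valid padding, single channel; the function name, stride handling, and use of cross-correlation rather than a flipped kernel, as is conventional in CNN layers, are our own choices):

```python
import numpy as np

def conv2d(B, C, stride=1):
    # Valid-mode cross-correlation, the form used in CNN layers:
    # each output F[u, v] is the sum of an input window weighted by C.
    kh, kw = C.shape
    H = (B.shape[0] - kh) // stride + 1
    W = (B.shape[1] - kw) // stride + 1
    F = np.empty((H, W))
    for u in range(H):
        for v in range(W):
            F[u, v] = (B[u*stride:u*stride+kh, v*stride:v*stride+kw] * C).sum()
    return F
```

For example, convolving a 4 × 4 image of ones with a 3 × 3 kernel of ones yields a 2 × 2 feature map in which every entry is 9.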

Batch Normalization Layer
It is also termed batch norm and is mainly used to enhance the stability of a network by normalizing the features obtained from a convolutional or FC layer. Typically, it lies between the convolutional and activation layers. The main advantages of this layer are:

1. Improving the training speed of the network.
2. Minimizing the internal covariance shift [32].
3. Reducing overfitting, since it has a slight regularization effect.
The entire process of the batch norm is described in Algorithm 1, where γ represents the scale; ξ the shift; K the number of feature inputs; µ_b and σ²_b the mean and variance across the batch b; and ε a small constant used to enhance stability when σ²_b is too small. The output is denoted bnF_j.


Algorithm 1. Batch normalization
Input: values of F over a mini-batch b = {F_1, …, F_K}; parameters to be learned: γ, ξ.
1. µ_b ← (1/K) Σ_j F_j (mini-batch mean)
2. σ²_b ← (1/K) Σ_j (F_j − µ_b)² (mini-batch variance)
3. F̂_j ← (F_j − µ_b) / √(σ²_b + ε) (normalize)
4. bnF_j ← γ F̂_j + ξ (scale and shift)
Output: bnF_j
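A short numpy sketch of the batch-norm forward pass described above (the function name is ours; γ, ξ, and ε are as in Algorithm 1):

```python
import numpy as np

def batch_norm(F, gamma, xi, eps=1e-5):
    # Mean and variance across the mini-batch (axis 0)
    mu = F.mean(axis=0)
    var = F.var(axis=0)
    F_hat = (F - mu) / np.sqrt(var + eps)  # normalize
    return gamma * F_hat + xi              # scale and shift
```

With γ = 1 and ξ = 0 the output has approximately zero mean and unit variance per feature, which is what stabilizes the activations fed to the next layer.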

Activation Functions
Usually, activation functions are incorporated after the convolutional layer, establishing non-linearity in each neuron's output. Due to this, the network is able to learn many complex tasks. In this work, we utilized the softplus activation function, a smoothed version of the rectified linear unit (ReLU), as shown in Figure 1. Mathematically, the softplus function is defined as

softplus(x) = ln(1 + eˣ).
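A one-line numpy implementation of softplus follows (we use the numerically stable form max(x, 0) + log1p(e^{−|x|}), which is algebraically equal to ln(1 + eˣ) but avoids overflow for large x):

```python
import numpy as np

def softplus(x):
    # Stable evaluation of log(1 + e^x): avoids exp() overflow for large x
    return np.maximum(x, 0) + np.log1p(np.exp(-np.abs(x)))
```

Like ReLU, softplus tends to x for large positive inputs and to 0 for large negative inputs, but it is smooth everywhere; softplus(0) = ln 2.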

Pooling Layer
The main goal of this layer is to scale down the spatial size of feature maps obtained from the preceding layers, minimizing the number of parameters to be learned and reducing computational time. Average pooling and max-pooling are the most frequently used approaches [33]. In our work, we utilized average and global average pooling (GAP), which is achieved by estimating the average value from each/entire region of the feature map, as shown in Figures 2 and 3. Here, the main objective of the GAP is to yield one feature map for each corresponding classification task category, which avoids the overfitting problem.
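To make the two pooling variants concrete, here is a small numpy sketch of 2 × 2 average pooling and of GAP (function names are ours; real frameworks vectorize these loops):

```python
import numpy as np

def avg_pool2d(F, k=2, stride=2):
    # Replace each k x k window with its mean, shrinking the feature map
    h, w = F.shape[0] // stride, F.shape[1] // stride
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = F[i*stride:i*stride+k, j*stride:j*stride+k].mean()
    return out

def global_avg_pool(stack):
    # One scalar per feature map: (C, H, W) -> (C,)
    return stack.mean(axis=(1, 2))
```

For instance, average-pooling the 4 × 4 map [[0..3], [4..7], [8..11], [12..15]] yields [[2.5, 4.5], [10.5, 12.5]], while GAP collapses each channel of a (C, H, W) stack to a single average value.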


Softmax
Typically, the softmax is employed at the end of the neural network to transform features into class probabilities. It yields a value for each class based on

P_i = exp(w_iᵀ f) / Σ_{j=1}^{M} exp(w_jᵀ f),

where f is the feature vector; T indicates the transpose operator; w_i is the weight vector of the i-th class; P_i is the predicted probability of the i-th class; and M represents the number of classes. Here, we chose M = 2 since we perform binary classification.
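A compact numpy sketch of this computation (the max-shift is a standard numerical-stability trick that leaves the probabilities unchanged; the function name is ours):

```python
import numpy as np

def softmax(f, W):
    """Class probabilities from feature vector f and weight matrix W
    (one row w_i per class); scores are shifted by their max for stability."""
    z = W @ f                  # scores w_i^T f
    e = np.exp(z - z.max())    # shift does not change the ratio
    return e / e.sum()
```

With equal scores for both classes the output is [0.5, 0.5], and the probabilities always sum to one.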

Materials and Methods
The proposed system for identifying and classifying brain MR images is represented in Figure 4, and it includes the collection of the database, skull-stripping, image data augmentation, feature extraction and classification by CNN model, and tumor detection using FL-MSCM.




Database
To measure the effectiveness of the presented framework, we collected 60 normal and 125 abnormal T2-weighted brain MR images (glioma, metastatic adenocarcinoma, meningioma, sarcoma, and Alzheimer's disease) from a publicly available data source, Harvard Medical School [34]. However, an effective diagnosis model cannot be developed from such a small sample size. Therefore, we generated augmented images with the help of rotation, translation, reflection, shearing, and scaling geometric transformation operations. Before this step, we performed a skull-stripping process to improve the detection accuracy of the model.

Skull-Stripping
Skull-stripping is a significant preliminary stage in the analysis of biomedical images, which helps improve the effectiveness of brain tumor segmentation during the diagnosis of patients [35]. The main objective of this approach is to extract brain tissues by eliminating non-brain matter such as fat, skin, skull, etc. Numerous approaches exist [36]; among them, thresholding- and morphology-based procedures are the most popular. Inspired by this, we propose a combination of thresholding and morphological operations to achieve better skull-stripping:

1. Initially, we separate the image I into two regions, R1 and R2, over an intensity level. Here, L is the number of intensity levels, usually an integer power of 2.
2. Obtain the binary image B by setting the optimal threshold value T_opt, which is estimated from the means m1, m2 and variances s1², s2² of the regions R1 and R2, where T defines the threshold.
3. Construct a disk-shaped structuring element S_d with the required radius.
4. Eliminate small peak objects from B using a simple area-opening operation and then fill the regions with an image-filling operation.
5. Apply the erosion operation to the outcome of step 4 with the defined S_d. Using this, we can eliminate small objects that appear in the binary image B.
6. Finally, the binary image obtained in step 5 is superimposed on the original image I, and the non-brain region is replaced with zeros. With this process, the skull-free brain MR image is obtained, which improves the segmentation accuracy.
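The six steps above can be sketched as follows in numpy/scipy. This is a hypothetical illustration: the function name, the iterative mean-based threshold refinement, and the minimum-area limit are our own stand-ins, since the paper's exact T_opt equations are not reproduced here.

```python
import numpy as np
from scipy import ndimage

def skull_strip(img, radius=3, min_area=50):
    # Steps 1-2: split intensities into two regions and refine the
    # threshold from the two region means until it stabilizes
    t = img.mean()
    for _ in range(100):
        high, low = img[img > t], img[img <= t]
        t_new = 0.5 * (high.mean() if high.size else t) + \
                0.5 * (low.mean() if low.size else t)
        if abs(t_new - t) < 1e-3:
            break
        t = t_new
    B = img > t
    # Step 3: disk-shaped structuring element S_d
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    S_d = (xx**2 + yy**2) <= radius**2
    # Step 4: area opening (drop small blobs), then fill holes
    lbl, n = ndimage.label(B)
    sizes = ndimage.sum(B, lbl, range(1, n + 1))
    keep = np.isin(lbl, np.nonzero(sizes >= min_area)[0] + 1)
    keep = ndimage.binary_fill_holes(keep)
    # Step 5: erosion with S_d removes thin skull remnants
    mask = ndimage.binary_erosion(keep, structure=S_d)
    # Step 6: superimpose the mask; zero out the non-brain region
    return np.where(mask, img, 0)
```

On a synthetic image containing a large bright "brain" block and a tiny bright speck, the speck is removed by the area opening while the brain interior survives the erosion.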

Image Data Augmentation
Deep learning heavily depends on large amounts of data to prevent overfitting. Overfitting occurs when a model learns a function with huge variance, which results in high performance on the training database but fails to achieve high accuracy on the testing database. Hence, to mitigate this problem, we need to increase the number of samples in the given database. To meet this criterion, we employed data augmentation on skull-stripped images using geometric transformation techniques such as rotation, scaling, translation, shearing along the x- and y-directions, and reflection. Table 2 illustrates the configurations of the suggested augmentation operators. With these operators, we finally attained 540 normal and 1125 abnormal brain MR images.
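A minimal numpy sketch of such geometric augmentation follows. It is deliberately simplified: rotation is limited to 90-degree steps and translation to integer zero-padded shifts, whereas the paper's operators (Table 2) also include arbitrary-angle rotation, shearing, and scaling, which require interpolation.

```python
import numpy as np

def augment(img, shift=2):
    """Yield simple geometric variants of a 2-D image (sketch only)."""
    yield np.fliplr(img)            # reflection about the vertical axis
    yield np.flipud(img)            # reflection about the horizontal axis
    yield np.rot90(img)             # 90-degree rotation
    shifted = np.zeros_like(img)    # translation along x, zero-padded
    shifted[:, shift:] = img[:, :-shift]
    yield shifted
```

Applying all four operators to each skull-stripped slice multiplies the dataset size while keeping every variant the same shape as the original.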
After that, we deployed the lightweight CNN model onto augmented images to predict the abnormality of brain MR images.

The Suggested Lightweight CNN Architecture
In the literature, various conventional CNN frameworks [18,26,29,30] have been discussed for identifying abnormality in brain MR images. However, they demand a large number of parameters to yield good accuracy, which increases the computational complexity. Hence, we propose a lightweight CNN architecture. With our model, we can minimize the number of learnable parameters and reduce the training time without compromising classification performance. This is the significant difference between conventional and lightweight CNN models. The architecture of the presented CNN model is illustrated in Figure 5. The fundamental building block of our model is the ConvNet, which includes a convolutional layer, a softplus activation function, and batch norm. The structure of the ConvNet is illustrated on the left side of Figure 5.
The proposed CNN model has four blocks, denoted Blocks 1-4. The first block has only one ConvNet module, while each of the remaining blocks has three ConvNet modules followed by a 2 × 2 average pooling with a stride of 2 and an adder operator that sums the feature map values point-to-point. The configurations of the ConvNets in each block are as follows:
1. In the first block, the ConvNet module has 32 filters with a 5 × 5 kernel and a stride of 2. The stride of 2 halves the input size, reducing computational complexity. Since the initial convolutional layers mainly extract edge features, a stride of 2 does not significantly impact the model's accuracy there.

2. Block 2 has three ConvNets with 48 filters, kernel sizes of 3 × 3, 3 × 3, and 1 × 1, and strides of 2, 1, and 1, respectively. Similarly, Blocks 3 and 4 contain three ConvNets with 64 and 128 filters; each filter has a size of 3 × 3, 3 × 3, or 1 × 1, with a stride of 1. The 1 × 1 convolutional filters are mainly used to reduce the dimensionality of the feature maps and, hence, the computational requirements. As a result, the proposed CNN model requires significantly fewer learnable parameters, as illustrated in Table 3. From this table, we observed that the total number of parameters is nearly 0.35 million, much less than other traditional CNN models discussed in the literature, such as AlexNet [18,26], ResNet-50 [29], and VGG-19 [30]. Hence, we call it a lightweight CNN.
3. In each ConvNet, we use a batch norm layer to improve the training speed and minimize overfitting.
At the end of Block 4, we incorporate one GAP layer, a dense layer, and a softmax layer with two classes, in sequence. The GAP compresses each incoming feature map by taking its average. After the proposed CNN model is applied, the resultant outcomes are fed to the segmentation phase to identify the infected area of abnormal brain MR images.
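The roughly 0.35 million parameter figure can be checked back-of-the-envelope from the block configurations described above. The tally below is our own (one bias per filter, two learnable batch-norm parameters per channel, and a 128-to-2 dense layer after GAP); it is not the paper's Table 3 breakdown:

```python
def conv_params(k, c_in, c_out):
    """Conv weights (k*k*c_in*c_out) plus one bias per output filter."""
    return k * k * c_in * c_out + c_out

def bn_params(c):
    """Learnable scale and shift per channel."""
    return 2 * c

blocks = [
    [(5, 1, 32)],                                  # Block 1
    [(3, 32, 48), (3, 48, 48), (1, 48, 48)],       # Block 2
    [(3, 48, 64), (3, 64, 64), (1, 64, 64)],       # Block 3
    [(3, 64, 128), (3, 128, 128), (1, 128, 128)],  # Block 4
]
total = sum(conv_params(*c) + bn_params(c[2]) for b in blocks for c in b)
total += 128 * 2 + 2   # dense layer: GAP output (128) -> 2 classes
print(total)           # -> 346354, i.e., about 0.35 million
```

This agrees with the "nearly 0.35 million" total quoted for Table 3, and is one to two orders of magnitude below AlexNet- or VGG-scale models.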

Segmentation
The main objective of segmentation is to improve diagnosis by automatically identifying suspicious patterns. However, it is a challenging task due to the artifacts, soft tissue boundaries, irregular shapes of brain tissues, etc. To address this, we developed a new brain tumor segmentation methodology termed fast-linking modified spiking cortical model (FL-MSCM), motivated by the work in [37].

Modified Spiking Cortical Model (MSCM)
The spiking cortical model (SCM) [38] is derived from Eckhorn's visual cortex model [39] and was developed especially for image processing applications such as segmentation, fusion, and texture retrieval. The functional flow graph of the SCM is illustrated in Figure 6; it consists of a receptive field, a modulation field, and a pulse generator. In the receptive field, each (i, j)-th neuron has a feeding input S_ij and a linking input L_ij. In the modulation field, the membrane potential (internal activity) U_ij of the neuron is obtained by modulating S_ij with L_ij. Finally, the neuron fires and provides a pulse output Y_ij when U_ij is greater than the threshold E_ij. This procedure is given by

L_ij(n) = m_L Σ_{k,l} W_ijkl Y_kl(n − 1),
U_ij(n) = f U_ij(n − 1) + S_ij (1 + β L_ij(n)),
Y_ij(n) = 1 if U_ij(n) > E_ij(n − 1), and 0 otherwise,

where (k, l) denotes the positions of neighboring neurons, n is the iteration index, W_ijkl and m_L represent the weight matrix and the magnitude scaling factor of the linking field, respectively, β is the linking strength, and f is a decay constant that always lies between 0 and 1. In our work, S is the input image and S_ij is the intensity value at pixel location (i, j). In the conventional SCM [38], the threshold E is updated with an exponential decay function g, which results in slow computation. To speed up the process, the MSCM employs a linear decay mechanism to obtain E:

E_ij(n) = E_ij(n − 1) − Δ + h Y_ij(n),

where h is the threshold magnitude component and Δ ensures that the entire neuron threshold decays linearly. From these equations, we note that the proposed approach has only one convolution term and two leaky integrators, which is the significant advantage of the MSCM over pulse-coupled neural networks [40].
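One MSCM iteration can be sketched in numpy as follows. This is an illustrative sketch under the update equations above: the default parameter values and the naive loop implementing the linking convolution are our own choices, not the paper's settings.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.0):
    # Normalized 2-D Gaussian used as the position-invariant weight W
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax[:, None]**2 + ax[None, :]**2) / (2 * sigma**2))
    return g / g.sum()

def mscm_step(S, U, Y, E, W, f=0.8, beta=0.3, m_L=1.0, h=1000.0, delta=0.02):
    # Linking input: weighted sum of neighboring pulses from iteration n-1
    pad = W.shape[0] // 2
    Yp = np.pad(Y, pad)
    L = m_L * np.array([[(Yp[i:i + W.shape[0], j:j + W.shape[1]] * W).sum()
                         for j in range(S.shape[1])] for i in range(S.shape[0])])
    U = f * U + S * (1 + beta * L)   # leaky integrator modulated by linking
    Y = (U > E).astype(float)        # pulse output
    E = E - delta + h * Y            # linear threshold decay (MSCM)
    return U, Y, E
```

With a large h, every neuron fires at most once: a fired neuron's threshold jumps far above its activity, which is exactly the behavior the parameter settings below rely on.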


Parameter Settings of MSCM
In the implementation of MSCM, the parameters are initialized as follows:

1. The output Y and internal activity U are initialized to zero.
5. Threshold decay: Δ = 0.02.
6. Due to its position-invariant nature, W can be determined by a 7 × 7 Gaussian filter with standard deviation 1, which is utilized to estimate the precision level of the image pixels.
7. The threshold magnitude component h ensures that each neuron will not fire more than once and is estimated using Equation (16), where the linking strength β is obtained from G = √(G_x² + G_y²), with G_x and G_y the central-difference gradients of S along the x- and y-directions.
8. The maximum number of iterations N is determined from T_G, the gray-level threshold of S estimated by Otsu's approach [41].
Here, the primary objective of thresholding is to calculate the number of iterations.
For better segmentation, we apply the fast-linking algorithm to MSCM.

Fast-Linking
Here, compared to normal linking [42], the neurons with similar stimuli respond quickly and synchronously. It mainly includes two loops:

1. Internal loop: U and Y are updated repeatedly until Y does not vary.
2. External loop: the threshold function E is iterated.
The above process is depicted in Algorithm 2, and the corresponding outputs of FL-MSCM are shown in Figure 7i-l. This figure shows that the proposed segmentation approach significantly separated the tumor and non-tumor regions from skull-free brain MR images.

Performance Metrics
The performance of the proposed model is evaluated using well-known metrics: true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), F-score, accuracy, and the area under the curve (AUC) [43]. TPR estimates the percentage of accurately identified abnormal brain MR images, while TNR measures the percentage of correctly recognized normal brain MR images. PPV calculates the fraction of images flagged as abnormal that are truly abnormal. The F-score is the harmonic mean of PPV and TPR. AUC quantifies the overall performance of the test. Accuracy represents the percentage of correctly classified brain MR images, both normal and abnormal, over the total number of images. These metrics are defined as

TPR = TP / (TP + FN), TNR = TN / (TN + FP), PPV = TP / (TP + FP),
F-score = 2 · PPV · TPR / (PPV + TPR), Accuracy = (TP + TN) / (TP + TN + FP + FN),
DSC = 2|S ∩ S_G| / (|S| + |S_G|),

where S = segmented image; S_G = ground truth; TP = true positives; FN = false negatives; FP = false positives; and TN = true negatives.

Results and Discussion
In this section, we present experimental outcomes to demonstrate the performance of the proposed methodology. To assess the efficiency of our model, we conduct a wide range of experiments using K-fold cross-validation (K-FCV). It is a simple and effective method compared to other cross-validation approaches [44] and is mainly used to reduce overfitting. The selection of the K-value is a significant aspect of classification problems: a small value of K results in high bias, low variance, and an underfitting model, while a high value of K yields low bias, high variance, and an overfitting model. Therefore, we chose a moderate value of K = 5 to avoid this ambiguity.
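The K-FCV split itself can be sketched in a few lines of plain Python (a simple contiguous split without shuffling or stratification, which real experiments would typically add):

```python
def k_fold_indices(n, k=5):
    """Split range(n) into k near-equal folds; fold i serves as the test set."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    for i in range(k):
        test = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test
```

Each sample appears in the test set exactly once across the k rounds, so every image contributes to both training and evaluation.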

Experimental Outcomes
This study implemented an efficient framework to identify and classify brain MR images using a lightweight CNN and FL-MSCM. First, we extracted brain tissue from MR images, removing non-brain matter using mathematical morphology and thresholding operations to improve diagnostic accuracy. Then, we employed data augmentation to enhance the model's generalization ability. Afterward, we employed the CNN model to differentiate brain MR images as normal or abnormal. Finally, we separated the infected and non-infected tumor regions from abnormal samples using the FL-MSCM-based image segmentation framework. All experiments were carried out on an Intel(R) Core(TM) i3-5005U CPU @ 2 GHz using MATLAB 2020 and Google Colab. For clarity, the outcomes of the proposed methodology are separated into two phases: the first covers the classification results and the second the segmentation results.

Classification Analysis
To classify brain MR images, we applied the CNN model to the skull-free augmented images. Our architecture automatically attains the relevant features using a series of hidden layers and learns via the back-propagation approach. During training, we used the cross-entropy loss function with a batch size of 64 and 30 epochs. In addition, stochastic gradient descent with momentum (SGDM) [45], Adam [46], AdaMax [46], Adagrad [47], Adadelta [48], RMSProp [49], and Nadam [50] optimizers were considered for minimizing the loss. The parameters considered for optimization are given in Table 4.

Table 4. Parameter settings of the optimizers.
The performance of the proposed approach with the various optimization techniques under 5-fold cross-validation is presented in Tables 5-11. From these tables, we identified that Adadelta yields the poorest results among the optimizers (see Table 9), especially in predicting normal brain MR images, because its learning rate becomes very low in the late training period. Similarly, we noted that the Adam, AdaMax, and Nadam optimizers performed significantly better than the others, with more than 99% accuracy on average. Adam minimizes the loss function most effectively, since it slows down when converging to a local minimum and reduces high variance. Hence, it provides the best results on the suggested lightweight CNN model, with 99.45% TPR, 99.80% TNR, 99.91% PPV, 99.68% F-score, 99.66% AUC, and 99.58% accuracy (see Table 6).
The suggested methodology is compared with other well-received techniques in Table 12. From this, we note that the proposed diagnosis approach provides better results on the given benchmark dataset than the traditional CNN-based approaches [12,16-18,25,27-30] and other machine learning frameworks. The significant advantages of the proposed method are:

1. Fewer parameters to train the model, approximately 0.35 million.
2. Reduced overfitting due to the initialization of the weights in the layers.
3. High performance achieved due to image data augmentation.
4. Extraction of complex features without human intervention.

Segmentation Analysis
The assessment of the proposed segmentation methodology is presented in Table 13, while a comparison of the suggested approach with existing techniques is illustrated in Table 14. The outcomes of our framework are 0.96 DSC, 99.83% PPV, 99.8% TPR, 96.5% TNR, 99.82% F-score, 98.15% AUC, and 99.65% accuracy. Based on the analysis of the segmentation results (Table 14), we conclude that the proposed framework achieved remarkable performance compared to the existing techniques in terms of DSC. It must be noted that, in evaluating segmentation, higher values of DSC represent good performance; even a small increment in this metric is remarkable and essential for clinical decisions. The reasons behind the success of the proposed segmentation methodology are:

1. The proposed skull-stripping process significantly isolates brain tissues from non-brain matter. Due to this, the implemented approach accurately identifies brain-related diseases.
2. The proposed FL-MSCM makes each region as homogeneous as possible, with high computational efficiency, simple parameter tuning, and little loss of contrast and image detail. This is a significant advantage of the FL-MSCM.
3. The implemented approach adequately captures visible edges and boundaries.

Table 14. Segmentation performance of the proposed and existing approaches.

Conclusions and Future Scope
Considering the spread of brain tumor-related cases and their impact on human life, we proposed an efficient methodology to differentiate between normal and abnormal brain MR images based on a CNN and FL-MSCM. This study initially utilized the skull-stripping process to isolate extra-cranial tissues from MR images. Further, we generated augmented images using geometric transformation operators. After that, each augmented slice was fed to our lightweight CNN model to classify brain MR slices as normal or abnormal. Finally, the FL-MSCM-based automatic segmentation approach was applied to abnormal brain MR slices to identify the region of interest (the pixels of infected organs). Based on a detailed analysis of the experimental outcomes, we observed that our framework has low computational time and achieved high performance, with an accuracy of 99.58%, compared to well-received approaches, owing to automatic feature learning, appropriate selection of the numbers of training/testing samples, effective hyper-parameter tuning, and adequate capture of visible edges and boundaries in an image. Hence, anatomists can use the recommended method as a decision-making tool during clinical therapy. This paper focused mainly on binary classification (normal vs. abnormal). In the future, our work would extend to the multiclass classification of brain MR images (normal vs. sarcoma vs. glioma vs. meningioma vs. Alzheimer's) and to other medical diseases such as breast, skin, and lung cancers. In addition, we would like to extend our work to real-time experimental data.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. All of the authors have read and approved the paper, and it has not been published previously nor is it being considered by any other peer-reviewed journal.