MRI-Based Brain Tumor Classification Using a Dilated Parallel Deep Convolutional Neural Network

Abstract: Convolutional neural networks (CNNs) are frequently used to classify brain tumors with high accuracy because they capture the spatial relationships among pixels in complex images. Because of their small receptive fields, most deep convolutional neural network (DCNN)-based techniques overfit and cannot extract global context information from larger regions. Dilated convolution retains data resolution at the output layer and enlarges the receptive field without adding computation, but stacking several dilated convolutions produces a gridding effect. This research proposes a dilated parallel deep convolutional neural network (PDCNN) architecture that preserves a wide receptive field in order to handle gridding artifacts and extract both coarse and fine features from the images. Multiple preprocessing strategies are applied to the input MRI images used to train the model. After comparing various dilation rates, the global path uses a low dilation rate (2,1,1), while the local path uses a high dilation rate (4,2,1), decreasing through even numbers, to tackle gridding artifacts and to extract both coarse and fine features from the two parallel paths. On three different MRI datasets, the suggested dilated PDCNN with the average ensemble method performs best. The accuracy achieved for the multiclass Kaggle dataset-III, the Figshare dataset-II, and the binary tumor identification dataset-I is 98.35%, 98.13%, and 98.67%, respectively. Compared with state-of-the-art techniques, the suggested structure improves results by extracting both fine and coarse features, which makes it efficient.


Introduction
A brain tumor is a growth that may adversely impact a person's life; it can appear in the tissues enclosing the brain or in the skull. A growth can be characterized as either benign or malignant. Primary tumors start inside the brain, while secondary tumors, also referred to as brain metastasis tumors, typically form from tumors outside the brain. Meningiomas, pituitary adenomas, and gliomas are the three most common primary brain tumors. Meningiomas are slow-growing tumors that originate in the membrane layers of the brain and spinal cord. Cancerous cells that arise in the pituitary gland are referred to as pituitary adenomas [1]. The irregular growth of these tumors compresses the brain tissue. Malignant tumors, in comparison with benign tumors, grow unevenly and damage the tissues around them. Surgical techniques are frequently employed in the treatment of brain tumors [2]. Because MRI is non-invasive, it is preferred over computed tomography (CT), positron emission tomography (PET), and X-rays [3]. It is estimated that 79,340 Americans aged 40 and older will be diagnosed with a primary brain tumor in 2023. It is estimated that one million Americans suffer from primary brain tumors; of these, 72% are benign and 28% are malignant. Adults with primary brain tumors typically have meningioma (46.1%), glioblastoma (16.4%), and pituitary tumors […]
(Figure with panels (a) and (b); the right panel is a healthy brain [8].)
As the number of patients has grown, analyzing these images individually has become laborious, disorganized, and frequently inaccurate. A computer-aided diagnostic technique that makes brain MRI identification easier needs to be developed to overcome this limitation. Many attempts have been made to create an effective and trustworthy method for classifying brain tumors automatically. Conventional machine learning approaches rely on handcrafted features, which increase the cost and limit the robustness of the solution. Although supervised learning models can occasionally perform better than unsupervised learning strategies, they may yield an overfitted framework that is inappropriate for another large database. These challenges underscore the significance of creating a machine-learning-based, fully automated system for classifying brain tumors.
A CNN's architecture is based on a neural network known as the deep learning model, which excels at image recognition and classification [9,10]. The receptive field in a CNN is too small to produce excellent precision [11]. A large receptive field of the convolution kernel would help enhance the efficiency of the classification techniques, because the fixed size of the sliding window in a CNN limits what operations such as convolution, pooling, and flattening can capture. The parameters of the recommended model are able to learn the characteristics extracted from the images. Hyperparameters are also important, yet recent iterations of CNN models have paid little attention to them. Another important consideration is a CNN's local feature collection. Furthermore, because of the limited size of the kernel, sharply raising the dilation rate could exacerbate feature collection failures and hinder small-object detection [12]. High dilation rates may impair tiny-object detection. As a result, the dilation rate is gradually decreased in the suggested model. This reduces the sparsity of the dilated feature map, so more information can be extracted from the investigated region.
Using publicly available Kaggle and Figshare datasets, this work aims to develop a fully automated dilated PDCNN with an average ensemble model for brain tumor classification [8,13,14]. This article proposes an architecture for the detection and classification of brain tumors that consists of two dilated DCNNs operating in parallel. Because convolutions are accurate and time-efficient operations, the dilated PDCNN with an average ensemble model runs faster than conditional random field (CRF)-based methods. The recommended framework incorporates batch normalization to normalize the outputs of previous layers.
By simultaneously integrating two DCNNs with two distinct window sizes, the parallel pathways enable the model to learn both global and local features. While maintaining a large receptive field, this research also manages gridding artifacts and extracts both coarse and fine characteristics from the images. The key contributions of the work are as follows:
• A dilated PDCNN is proposed for brain tumor classification, with even-numbered dilation rate decrements in the global and local paths and a combination of two parallel CNNs with data preparation (image preprocessing, data augmentation) and hyperparameter tuning.
• To improve the identification and classification performance, both high-level and low-level data, along with particular brain features, are integrated.
• To achieve high accuracy and computational efficiency, the architecture of the suggested dilated convolution with an expanded receptive field of the kernel is analyzed in detail.
• The experimental findings are discussed, including how the small receptive field of the PDCNN causes low precision in brain tumor identification, depending on the dilation rate.
• Feature fusion is employed to greatly improve the features produced by the two parallel convolutional paths.
The remainder of this work is organized as follows: Section 2 summarizes and assesses pertinent studies. Section 3 describes the recommended dilated PDCNN with an average ensemble approach in detail. Sections 4 and 5 describe the proposed approach, thoroughly compare it to existing approaches, and present the outcomes of the experiment. Section 6 concludes the article.

Related Work: A Brief Review
Multiple studies in the literature classify brain tumors in different ways. A few of the works that have been analyzed are listed here.
The method proposed by Anil et al. [15] consists of a classification network that divides the input MRI images into two groups: one that contains tumors and another that does not. In this study, the classifier for brain cancer identification is retrained using transfer learning. With a success rate of 95.78%, the results show that VGG-19 is the most efficient. Muhammad Sajjad et al. [16] establish a new CNN model to categorize brain tumors. First, segmentation is used to locate the tumor in the MRI images. The dataset is then enlarged. Finally, the suggested CNN performs the classification, reaching 94.58% accuracy. Habib [17] recommends a CNN model for recognizing brain tumors that uses the Kaggle binary brain tumor dataset (dataset-I in this study). With an updated neural network architecture, this method attains an accuracy of 88.7%. The authors of [5] describe the development of a model centered on a simulated CNN for MRI analysis using matrix calculations and mathematical formulas. MRI images of 155 brains with tumors and 98 brains with no tumors are used to train this neural network. The model identifies a tumor's location with 96.7% accuracy on the validation data.
A multi-pathway CNN structure is created by Díaz et al. [18] to automatically segment brain tumors, including pituitary tumors, meningiomas, and gliomas. They achieve 97.3% accuracy when testing their proposed model on a publicly available T1-weighted contrast-enhanced MRI dataset. Their training environment is quite expensive, though. Mahmoud Khaled Abd-Ellah et al. [19] recommend a PDCNN framework to identify and categorize gliomas from brain MRI images. The proposed PDCNNs are tested on the BraTS-2017 dataset. In this study, 1200 images are used for the PDCNN's training phase, 150 images for its validation phase, and 450 images for its testing phase. The framework obtains impressive sensitivity, specificity, and accuracy (97.44%, 97.00%, and 98.00%, respectively).
To classify brain tumors, Kwabena Adu et al. [20] propose a CapsNet structure with fewer trainable parameters. This architecture uses segmented tumor regions as inputs, and it outperforms related works with an accuracy of 95.54%. The network also employs dilation to maintain the high resolution of the images for better classification. Dilation in the architecture shortens training times and decreases the number of parameters that need to be learned. A. E. Minarno et al. [21] use a CNN structure to identify three different kinds of brain tumors on MRI images. A dataset of 3264 images is analyzed in this study, comprising meningioma tumors (937 images), pituitary tumors (901 images), glioma tumors (926 images), and tumor-free brains (500 images). The CNN method is presented along with hyperparameter tuning to achieve the best possible results for brain tumor categorization. The framework is tested in three distinct scenarios, and the third model evaluation scenario classifies brain tumors with an accuracy of 96.00%.
M.A. Gomez-Guzman et al. [22] evaluate seven deep CNN models for brain tumor classification. The dataset used in that study is Msoud, which consists of 7023 MRI images from the Figshare, SARTAJ, and Br35H datasets. The MRIs in the dataset comprise three classes of brain tumors (glioma, meningioma, and pituitary) and one class of healthy brains. Multiple preprocessing strategies are applied to the input MRI images used to train the models. The CNN models assessed are Xception, MobileNetV2, InceptionV3, InceptionResNetV2, Generic CNN, ResNet50, and EfficientNetB0. When all CNN models (a generic CNN and six pre-trained models) are compared, the best model for this dataset is InceptionV3, which achieves an average accuracy of 97.12%.
M. I. Mahmud et al. [23] recommend a CNN architecture for the effective identification of brain tumors using MR images. Their article also compares the suggested architecture with a number of models, including ResNet-50, VGG16, and Inception V3. The suggested model outperforms the others on various metrics, including accuracy, recall, loss, and area under the curve (AUC). On a dataset of 3264 MR images, the CNN model achieves 93.3% accuracy, 98.43% AUC, 91.19% recall, and a loss of 0.25.

Proposed Brain Tumor Detection and Classification Methodology
Prior to beginning treatment, the most significant challenge is identifying and categorizing brain tumors in MRI images. Most brain tumor identification research focuses on tumor segmentation and localization methods, and few studies treat tumor diagnosis as a time-saving method. Most DCNN-based methods are unable to acquire global context details of larger regions because of their small receptive fields. Dilated convolution maintains data resolution at the output layer and expands the receptive field without adding computation, but stacking multiple dilated convolutions creates a gridding effect. If the dilation factor (DF) is low, the model has a smaller receptive field and may miss the coarse characteristics. In contrast, when the DF is excessive, the model is unable to learn from the finer details. This study proposes a dilated PDCNN architecture that maintains a large receptive field to cope with gridding artifacts and capture both coarse and fine attributes from images. The input images are first resized and then converted to grayscale to reduce complexity. Data augmentation is then used to expand the datasets. While maintaining an extensive receptive field, the dilated PDCNN keeps the computational cost low and helps to reduce gridding artifacts. The schematic representation of the suggested dilated PDCNN design is presented in Figure 2.
The steps of the recommended structure occur in the following order. Brain MRI images are fed into the input layer of the dilated PDCNN after being preprocessed. The initial images are resized from their various resolutions to 32 × 32 pixels for training. The grayscale transformation of these input images reduces complexity. Following that, new images are created from existing ones using data augmentation. The dataset is split into training and testing subsets in order to train the suggested network. The PDCNN structure then uses the chosen dilation rates to classify the input images effectively. After the images are classified using four classifiers, support vector machine (SVM), K-nearest neighbor (KNN), naïve Bayes (NB), and decision tree, the brain tumor identification process is completed using an average ensemble approach.
Algorithm: 1: Develop a dilated PDCNN with selected parameters; 2: Accuracy ← 0; …; 14: end for
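The final step of the pipeline can be sketched as follows. The text does not say how the four classifier outputs are averaged, so this minimal sketch assumes that each classifier returns per-class probabilities and that the ensemble takes their arithmetic mean before picking the most probable class. The function names, class names, and scores are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def average_ensemble(prob_by_model: Dict[str, Sequence[float]]) -> List[float]:
    """Average per-class probabilities across classifiers (assumed scheme)."""
    outputs = list(prob_by_model.values())
    if not outputs:
        raise ValueError("need at least one classifier output")
    n_classes = len(outputs[0])
    if any(len(p) != n_classes for p in outputs):
        raise ValueError("all classifiers must score the same classes")
    return [sum(p[c] for p in outputs) / len(outputs) for c in range(n_classes)]


def predict(prob_by_model: Dict[str, Sequence[float]],
            class_names: Sequence[str]) -> Tuple[str, List[float]]:
    """Return the class with the highest averaged probability."""
    avg = average_ensemble(prob_by_model)
    best = max(range(len(avg)), key=avg.__getitem__)
    return class_names[best], avg


# Hypothetical scores for one MRI image from the four classifiers.
scores = {
    "svm": [0.10, 0.70, 0.15, 0.05],
    "knn": [0.20, 0.50, 0.20, 0.10],
    "naive_bayes": [0.30, 0.30, 0.30, 0.10],
    "decision_tree": [0.00, 1.00, 0.00, 0.00],
}
classes = ["glioma", "meningioma", "no_tumor", "pituitary"]
label, avg = predict(scores, classes)
print(label, [round(p, 4) for p in avg])
```

Averaging probabilities rather than counting votes lets a confident classifier outweigh several uncertain ones; a majority vote would be an equally plausible reading of "average ensemble".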

Dataset
This study makes use of three distinct public datasets containing brain MRI images. The details of the datasets are as follows.
Dataset-I: A widely used, publicly accessible dataset of binary-class brain MRI scans was obtained through the Kaggle platform. This dataset is referred to as dataset-I in this study [8]. This set of 253 brain MRI images includes 155 samples with tumors and 98 samples without tumors.
Dataset-II: The Figshare dataset containing brain MRI images of 233 patients is employed in this research [13]. These brain MRI images were obtained at Nanfang Hospital and General Hospital, two Chinese medical centers. This dataset, designated dataset-II, comprises 3064 brain MRI scans, including 1426 glioma tumors, 708 meningioma tumors, and 930 pituitary tumors.
Dataset-III: The additional dataset utilized in this study can also be obtained via the Kaggle website [14]; it contains 826 glioma tumor, 822 meningioma tumor, 395 no-tumor, and 827 pituitary tumor brain MRI images. This collection is identified as dataset-III in the current research. The four different kinds of brain MRI images present in dataset-III are shown in Figure 3.

Data Preprocessing
Data preprocessing is a method for improving the efficiency of a machine learning model by cleaning and preparing the data before the model uses them. The images in the MRI datasets do not all have the same width and height, so each image is scaled to 32 × 32 pixels for training. Converting the images to grayscale reduces the level of complexity. An anisotropic diffusion filter removes noise from digital images without blurring their edges. Table 1 shows the dataset after the anisotropic diffusion filter is applied.
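The edge-preserving behavior of the filter can be illustrated with a small sketch. The text does not name the filter variant or its settings, so this example assumes the classic Perona-Malik scheme with an exponential conduction function; `n_iter`, `kappa`, and `step` are illustrative values, not the ones used in this study.

```python
import math
from typing import List

Image = List[List[float]]


def anisotropic_diffusion(img: Image, n_iter: int = 10,
                          kappa: float = 20.0, step: float = 0.2) -> Image:
    """Perona-Malik diffusion on a 2-D grayscale image (assumed variant).

    kappa sets which intensity differences count as edges; step must stay
    at or below 0.25 for the 4-neighbor update to remain stable.
    """
    h, w = len(img), len(img[0])
    cur = [[float(v) for v in row] for row in img]
    for _ in range(n_iter):
        nxt = [row[:] for row in cur]
        for y in range(h):
            for x in range(w):
                flow = 0.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        diff = cur[ny][nx] - cur[y][x]
                        # Conduction is close to 1 for small differences
                        # (noise is smoothed) and close to 0 across strong
                        # edges (edges are kept).
                        flow += math.exp(-(diff / kappa) ** 2) * diff
                nxt[y][x] = cur[y][x] + step * flow
        cur = nxt
    return cur


# Toy 5 x 6 image: dark left half, bright right half, one noisy pixel.
noisy = [[0.0] * 3 + [100.0] * 3 for _ in range(5)]
noisy[2][1] = 10.0
smooth = anisotropic_diffusion(noisy)
print(round(smooth[2][1], 2), round(smooth[2][3], 2))
```

In the toy image the isolated noisy pixel is pulled toward its dark neighbors, while the step between the dark and bright halves stays sharp because the conduction term is nearly zero across it.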

Table 1. Filtered dataset after utilization of anisotropic diffusion filter. (Image table. Columns: No Tumor, Glioma Tumor, Meningioma Tumor, Pituitary Tumor. Rows: MRI brain pictures [12]; anisotropic diffusion-filtered pictures.)

Data Augmentation
Since deep learning requires a lot of data to extract information, data augmentation is employed to increase the quantity of available data by altering the original images. The additional data can improve the classification results. Images can undergo the following procedures: shifting, scaling, translation, and filtering. This article uses anisotropic diffusion filtering as augmentation. Dataset statistics for this research are presented in Table 2.
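As an illustration of one of the listed procedures, the sketch below creates shifted copies of an image. It is not the augmentation used in this article, which relies on anisotropic diffusion filtering; the helper names, the one-pixel offsets, and the zero fill value are assumptions.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def translate(img: Image, dx: int, dy: int, fill: float = 0.0) -> Image:
    """Shift a 2-D image by (dx, dy) pixels; uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands on (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def augment(images: Sequence[Image],
            shifts: Sequence[Tuple[int, int]] = ((1, 0), (-1, 0), (0, 1), (0, -1)),
            ) -> List[Image]:
    """Return the originals followed by one shifted copy per offset."""
    out = list(images)
    for img in images:
        out.extend(translate(img, dx, dy) for dx, dy in shifts)
    return out


sample = [[1, 2], [3, 4]]
print(translate(sample, 1, 0))
print(len(augment([sample])))
```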

Developed Dilated PDCNN Design
This paper presents the design of a multiscale dilated two-path parallel deep CNN that extracts multiscale detail features from MRI images. Dilated convolution is used to increase the receptive field without adding more parameters to the network. Additionally, batch normalization is used so that the model's precision does not drop as the network depth increases.
The suggested network has three main elements, as illustrated in Figure 4: multiscale feature extraction, the merging path, and the classification stage. Since the suggested model uses dilated CNNs, the DF is an additional hyperparameter that must be considered.
In the dilated PDCNN framework, local and global features are acquired through the corresponding local and global paths. Most DCNN-based methods cannot effectively collect both local and global information because of their small receptive fields. Dilated convolution maintains data resolution at the output layer and expands the receptive field without adding computation, but stacking multiple dilated convolutions creates a gridding effect. With a low DF, the model has a smaller receptive field and may miss the coarse features. In contrast, with an excessive DF, the model is unable to pick up the finer details. Suitable DFs for the local and global feature paths are chosen by comparing various DFs. In each path, every convolutional layer uses the ReLU activation function and is followed by a max-pooling layer that down-samples its output. Finally, after four ML classifiers have been trained on the images, an average ensemble method carries out the brain tumor classification.
CNNs have been used extensively in medicine and have demonstrated good results in the segmentation and classification of medical images [24]. CNN architectures are built from a variety of building blocks, such as convolution layers, pooling layers, and fully connected (FC) layers. Convolution layers, which combine linear and nonlinear operations (convolution operations and activation functions), are used for feature extraction [25,26]. The parameters of convolution layers are the kernels, and their hyperparameters include the size, number, stride, padding, and activation function of each kernel [27]. Six convolution layers are used in the two parallel paths, and the convolution operation is computed using Equation (1).
$$O^{l}_{p,q,r} = f\left( (W^{l}_{r})^{\top} I^{l-1}_{p,q} + b^{l}_{r} \right) \tag{1}$$
where, for the $r$th kernel in layer $l$, $O^{l}_{p,q,r}$ is the resulting feature map value at position $(p, q)$, $W^{l}_{r}$ is the weight vector, $I^{l-1}_{p,q}$ is the input vector at position $(p, q)$ in layer $l-1$, $b^{l}_{r}$ is the bias, and $f(\cdot)$ is the activation function [28]. Pooling layers lower the dimensionality of the feature maps by down-sampling. They have hyperparameters (the stride, padding, and filter size) but no learnable parameters. Two common varieties of pooling layers are max pooling and global average pooling; max-pooling layers are used in this structure. The output size of the pooling operation in the CNN is calculated using Equation (2).
$$\text{output size} = \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1 \tag{2}$$
where $n$ is the input dimension, $f$ is the kernel size, $p$ is the padding size, and $s$ is the stride size [28].
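A quick numerical check of the pooling output size, written as a sketch: it uses the standard formula floor((n + 2p − f) / s) + 1 with the symbols defined above. The 2 × 2 kernel matches the max-pooling layers of the proposed model, while the stride of 2 and zero padding are assumptions, since the text does not state them.

```python
def pool_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output length along one axis: floor((n + 2p - f) / s) + 1."""
    if s <= 0:
        raise ValueError("stride must be positive")
    if n + 2 * p < f:
        raise ValueError("kernel is larger than the padded input")
    return (n + 2 * p - f) // s + 1


# Side length of a 32-pixel input after three successive 2x2 poolings
# (stride 2 and no padding are assumed here).
size = 32
sizes = []
for _ in range(3):
    size = pool_output_size(size, f=2, p=0, s=2)
    sizes.append(size)
print(sizes)
```

This only tracks the pooling layers; it assumes the convolution layers in between leave the spatial size unchanged.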
The feature maps from the pooling layers are flattened into one-dimensional (1D) vectors and passed to the FC layers. The most popular activation function for FC layers is the rectified linear unit (ReLU), given in Equation (3).
$$f(x) = \max(0, x) \tag{3}$$
The activation function of the final FC layer is usually SoftMax for multiclass classification and sigmoid for binary classification. The node values in the final FC layer of the proposed model are computed using Equation (4), and the sigmoid activation function for the binary categorization dataset-I is calculated using Equation (5) [25].
$$z = w^{\top} h + b \tag{4}$$
$$P(y = 1 \mid x) = \frac{1}{1 + e^{-z}} \tag{5}$$
where $h$ stands for the internal calculations of the neural network layers, $b$ is the bias, and $w$ stands for the weights used to determine the value $z$ of an output node. Furthermore, the input vector and output class are denoted by $x$ and $y$, respectively. For the multiclass categorization of Figshare dataset-II and Kaggle dataset-III, the SoftMax activation function in the proposed structure is calculated using Equation (6).
$$P(y = k \mid x) = \frac{e^{f_k}}{\sum_{c} e^{f_c}} \tag{6}$$
where $x$ stands for the input vector and $y$ for the class in a multiclass categorization problem, and $f_c$ is the $c$th component of the class score vector in the final FC layer. The category $k$ with the highest probability $P$ is chosen as the output class of the SoftMax activation function [25]. During CNN training, a backpropagation algorithm adjusts the weights of the FC and convolution layers. The two main elements of backpropagation are the loss function and gradient descent (GD), where GD is used to minimize the loss function. One of the loss functions most frequently employed with CNNs is the cross-entropy (CE) loss function. For the binary categorization dataset-I with a sigmoid activation function, the CE loss function is computed using Equation (7).
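$$L = -\left[ y \log \sigma(z) + (1 - y) \log\left(1 - \sigma(z)\right) \right], \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \tag{7}$$
where $y$ is the class label and $z$ is computed using Equation (4). For the multiclass categorization of Figshare dataset-II and Kaggle dataset-III with the SoftMax activation function, the CE loss function is calculated using Equation (8) [28,29].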
$$L = -\frac{1}{N} \sum_{i=1}^{N} \log\left( \frac{e^{f_{y_i}}}{\sum_{c} e^{f_c}} \right) \tag{8}$$
where $N$ denotes the number of training samples, $y_i$ is the class of the $i$th input image, and $f_c$ is the $c$th component of the class score vector in the final FC layer [28].
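The activation and loss functions described above can be checked numerically with a short sketch. It assumes the standard definitions of sigmoid, SoftMax, and cross-entropy; the function names and sample scores are illustrative and are not taken from the authors' implementation.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Sigmoid output for a binary classifier node."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores: Sequence[float]) -> List[float]:
    """SoftMax over class scores f_c; the max is subtracted for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(y: int, z: float) -> float:
    """CE loss for one sample with label y in {0, 1} and node value z."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def categorical_cross_entropy(score_rows: Sequence[Sequence[float]],
                              labels: Sequence[int]) -> float:
    """Mean CE loss over N samples; labels[i] is the true class index y_i."""
    losses = [-math.log(softmax(f)[y]) for f, y in zip(score_rows, labels)]
    return sum(losses) / len(losses)


probs = softmax([1.0, 3.0, 0.5, 0.2])
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, round(sum(probs), 6))
print(round(categorical_cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2]), 6))
```

With four equal scores the SoftMax output is uniform, so the loss equals log 4, which is a convenient sanity check for a four-class problem such as dataset-III.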
In deep learning, expanding the receptive field normally means increasing the size and depth of the convolution kernels, which in turn increases the number of parameters in the network. By inserting zero weights into the conventional convolution kernel, dilated convolution can enlarge the receptive field without adding more network parameters.
Equation (9) defines the 1-D convolution operation $*$, which convolves the input $F$ with the kernel $k$; it corresponds to a dilated convolution with DF $l = 1$.
$$(F * k)(p) = \sum_{s + t = p} F(s)\, k(t) \tag{9}$$
The term "standard CNN" refers to this 1-D convolution. The network is identified as a dilated CNN when $l$ rises. When a DF denoted as $l$ is introduced, the dilated operation $*_{l}$ is defined as
$$(F *_{l} k)(p) = \sum_{s + lt = p} F(s)\, k(t) \tag{10}$$
The dilated convolution operation in the proposed structure is calculated using Equation (10). The fundamental CNN has a value of $l = 1$ [29,30].
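A 1-D dilated convolution can be sketched in a few lines. The example assumes the common definition in which the output at position p sums F(s)·k(t) over all index pairs with s + l·t = p, and it returns the full-length result; `dilated_conv1d` is a hypothetical helper name.

```python
from typing import List, Sequence


def dilated_conv1d(F: Sequence[float], k: Sequence[float], l: int = 1) -> List[float]:
    """Full 1-D dilated convolution: out[p] = sum of F[s] * k[t] with s + l*t = p.

    With l = 1 this is the ordinary convolution; a larger l spreads the
    kernel taps l positions apart without adding any weights.
    """
    if l < 1:
        raise ValueError("dilation factor must be at least 1")
    n, m = len(F), len(k)
    out = [0.0] * (n + l * (m - 1))
    for s in range(n):
        for t in range(m):
            out[s + l * t] += F[s] * k[t]
    return out


signal = [1, 2, 3, 4]
kernel = [1, 1]
print(dilated_conv1d(signal, kernel, l=1))
print(dilated_conv1d(signal, kernel, l=2))
```

Both calls use the same two kernel weights; only the spacing between the taps changes, which is why the receptive field grows while the parameter count stays fixed.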
The main function of the dilated convolution layer is to extract features. MRI images convey fine, high-level feature details as well as rough, low-level information. As a result, image data must be extracted at several scales. Specifically, the local and global paths are employed to obtain the local and global features. Within the local path, the convolutional layers use a small window of 5 × 5 pixels to provide low-level details about the images. The convolutional stages of the global path, however, use a large number of filters of 12 × 12 pixels. Three different convolution layers in the local path use the same 5 × 5 filters, and the coarse feature maps are produced solely by the high DF of each layer, which decreases through even numbers (4,2,1). Three distinct convolution layers in the global path employ identical 12 × 12 filters, and the generation of finer feature maps depends exclusively on the small DF (2,1,1) of each layer. As illustrated in Figure 4, three convolution layers with distinct filter numbers (128,96,96) are applied in each feature extraction path to extract image data at various scales.
Conv1, Conv3, and Conv4 provide local as well as coarse features, while Conv2, Conv5, and Conv6 supply global as well as fine features. In each path, a max-pooling layer follows each convolutional layer and down-samples its output. By employing a 2 × 2 kernel, the max-pooling layers lower the dimension of the features that are produced.
A dimension of (32,32,1) is assigned to each input tensor in the suggested model's structure.To test the impact of the DF on the model's efficiency and comprehend the gridding impact brought about by the dilation approach, the interior design is kept as simple as possible.In the local path, layer Conv1 applies a 5 × 5 filter and a dilation factor of d 1 = 4 to generate coarse feature maps (such as shapes and contours); layer Conv3 applies the same filter and dilation factor of d 2 = 2 along with the final convolution to generate coarse feature maps once more; and layer Conv4 applies a 5 × 5 filter and dilation factor of d 3 = 1 to generate coarse feature maps.In the global route, layer Conv2 applies a 12 × 12 filter and a dilation factor of d 4 = 2, layer Conv5 applies the same filter and dilation factor of d 5 = 1 along with the last convolution to generate fine feature maps once more, and layer Conv6 applies a 12 × 12 filter and a dilation factor of d 6 = 1 to generate fine feature maps.The activation function of ReLU is utilized by all six convolutional stages.
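To make the filter sizes and dilation factors above concrete, the following sketch computes the spatial extent that a dilated kernel covers, using the standard relation k_eff = k + (k − 1)(d − 1). The helper name is hypothetical; the filter sizes and DFs are those stated for the two paths.

```python
def effective_kernel_size(kernel_size, dilation):
    """Extent (in pixels) covered by a kernel once zeros are inserted between its taps."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Local path: 5 x 5 filters with DF (4, 2, 1)
local_path = [effective_kernel_size(5, d) for d in (4, 2, 1)]

# Global path: 12 x 12 filters with DF (2, 1, 1)
global_path = [effective_kernel_size(12, d) for d in (2, 1, 1)]
```

Under this relation the first local layer covers a 17 × 17 region with only 25 weights per channel, and the first global layer covers 23 × 23 with 144 weights per channel.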

Merge Stage
A merge layer connects the two paths, creating a single path with a cascaded link that runs to the output. This process extracts multiscale features: the local path with high dilation rates extracts local as well as coarse features, and the global path with low dilation rates extracts global as well as fine features. In the merged path, a batch normalization layer and a ReLU layer are followed by two fully connected layers that are attached to a dropout layer. Batch normalization is used to address the performance degradation brought on by an increase in the number of network layers. Equation (11) provides the feature map that results from the merging phase [19],
where σ stands for the ReLU activation function, BN denotes the batch normalization function, and f(X) denotes the fused feature maps from each channel in the preceding paths.

Hyperparameter Tuning
Hyperparameter tuning is an effective parameter-searching technique for the suggested dilated PDCNN framework. The dense layer size, optimizer, and dropout rate are among the parameters that must be chosen for this PDCNN. Tuning provides the framework with the set of parameters that produces the most effective results. The hyperparameter settings for model training are displayed in Table 3.

Table 3. Hyperparameter settings for model training.

Hyperparameter         Value
Dense layer            512
Learning rate          0.0001
Iteration per epoch    34
The tuned hyperparameters used for training include the adaptive moment estimation (Adam) optimizer, a dropout of 0.3, a dense layer of 512 units, and a learning rate of 0.0001. In this work, the layer weights are updated via Adam, an optimizer that computes an adaptive learning rate for every parameter. The training setting employs a validation frequency of 20 iterations. The highest average accuracy on the test datasets is recorded for each run. The framework is trained with a range of epoch counts: it attains 98.67% accuracy for dataset-I at 70 epochs, and 98.13% and 98.35% accuracy for dataset-II and dataset-III, respectively, at 60 epochs.
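As an illustration of how Adam adapts the step for each parameter, the sketch below applies the standard Adam update to a single scalar weight. Only the learning rate of 0.0001 comes from the text; the decay rates β1 = 0.9 and β2 = 0.999, the constant ε = 1e-8, and the toy loss w² are assumptions chosen for the example.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (beta1, beta2, eps are assumed defaults)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)                # bias correction for the second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


w, m, v = 1.0, 0.0, 0.0
history = []
for t in range(1, 4):
    grad = 2.0 * w  # gradient of the toy loss w**2
    w, m, v = adam_step(w, grad, m, v, t)
    history.append(w)
```

Because the update divides by the root of the squared-gradient average, each of these first steps moves the weight by roughly the learning rate, regardless of the gradient's magnitude.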

Feature Map of Dilated Convolutional Layers
A CNN feature map represents specific attributes of the input image as the result of a convolutional layer. It is produced by filtering the input images or the feature map output of the previous layers. The feature maps acquired from every convolutional layer are presented in Figures 5 and 6. Figure 5 displays the low-level and coarse features of the three convolutional layers conv_1, conv_3, and conv_4, which have 128, 96, and 96 filters. The feature maps in this figure are primarily composed of coarse and local features, which represent the texture in an image. In this local path, a dilated CNN with DFs (d_1 = 4, d_2 = 2, d_3 = 1) is referred to as a dilated PDCNN (4,2,1). Figure 6 shows the high-level feature maps, including contour representations, shape descriptors, and fine features, of the three deeper convolutional layers conv_2, conv_5, and conv_6, which have the same numbers of filters. DFs (d_4 = 2, d_5 = 1, d_6 = 1) are used in this global path. The multiscale feature maps, which are displayed in Figure 7, are greatly improved when these features are combined using a feature fusion technique. Figure 8 displays the final multiscale features that are extracted, along with a fully connected layer that is protected from overfitting by the dropout technique.

Parameters for Dilated PDCNN Model
Both the FC and convolutional layers provide parameters that can be learned. Parameters are the weights that the CNN structure learns during training. The number of convolutional layer parameters (P_conv) is calculated from F_h and F_w, the height and width of the filter; F_num, the number of filters; and C_in, the number of input channels of the layer. The number of parameters of the fully connected layer (P_FC) is calculated from A_(prev), the number of activations of the prior layer, and N_(unit), the number of neurons in the present FC layer. The max-pooling layer has no learnable parameters. The number of parameters of the batch normalization layer is determined by the number of channels in the preceding convolutional layer [10]. Details of the suggested dilated PDCNN framework are presented in Table 4.
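The parameter counts can be sketched as follows, assuming the usual convention of one bias per filter and one bias per neuron; this convention is an assumption and may differ from the exact expressions used for Table 4. The function names and the FC input size of 100 are hypothetical, while the filter sizes, the single input channel, and the 128 filters of the first layers follow the text.

```python
def conv_params(f_h, f_w, c_in, f_num, bias=True):
    """Learnable parameters of a convolutional layer (weights plus optional biases)."""
    return (f_h * f_w * c_in + (1 if bias else 0)) * f_num


def fc_params(a_prev, n_unit, bias=True):
    """Learnable parameters of a fully connected layer (weights plus optional biases)."""
    return (a_prev + (1 if bias else 0)) * n_unit


# First local layer: 5 x 5 filter, 1 input channel, 128 filters
conv1 = conv_params(5, 5, 1, 128)

# First global layer: 12 x 12 filter, 1 input channel, 128 filters
conv2 = conv_params(12, 12, 1, 128)

# Hypothetical FC layer: 100 incoming activations, 512 units
fc_example = fc_params(100, 512)
```

Dilation does not enter either expression, which is why enlarging the receptive field through the DF leaves the parameter count unchanged.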

Classification Stage
In this classification stage, the multiscale features are extracted from the last FC layer, and four types of classifiers (SVM, KNN, NB, and decision tree) are used to categorize the three brain tumor datasets. These ML classifiers and the hyperparameter settings used in this experiment for brain tumor classification are discussed in the following subsections.

SVM
The SVM algorithm is a classification technique that works by creating a hyperplane with the maximum margin between classes. An SVM needs a collection of instances, each of which can be expressed as a pair (x_i, y_i), where x_i is an input vector and y_i is its matching class label.
In SVM, the ideal hyperplane is computed as illustrated in Equation (14) [31],
where x_i represents features from the brain MRI, α*_i is the Lagrange multiplier, and y_i indicates the class label in each of the three datasets.
Hence, a multiclass support vector machine with a linear kernel function and a verbosity level of zero is employed in this suggested method.
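For intuition, the sketch below evaluates the textbook dual-form decision rule of a binary linear SVM, sign(Σ α_i y_i ⟨x_i, x⟩ + b). It is an assumed standard form, not necessarily identical to Equation (14), and the support vectors, multipliers, and bias are toy values; a multiclass SVM would combine several such binary rules.

```python
def svm_decision(x, support_vectors, labels, alphas, bias):
    """Binary linear SVM decision in dual form; returns +1 or -1."""
    score = bias
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        dot = sum(s * xi for s, xi in zip(sv, x))  # linear kernel <x_i, x>
        score += alpha * y * dot
    return 1 if score >= 0 else -1


# Toy model: two support vectors on either side of the origin
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [0.25, 0.25]
bias = 0.0

pred_a = svm_decision([2.0, 0.0], support_vectors, labels, alphas, bias)
pred_b = svm_decision([-3.0, 1.0], support_vectors, labels, alphas, bias)
```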

k-NN
k-NN makes predictions directly from the initial training set, which is kept in memory. To categorize a new data point, such as a feature vector from a brain MR image, k-NN selects the k training samples that are nearest to it by measuring the distance and assigns a label from two categories (normal and tumor); three categories (glioma, meningioma, and pituitary); or four categories (normal, glioma, meningioma, and pituitary tumor). The label of the new point is decided by a majority vote of its k neighbors. In the present research, the k-NN algorithm uses the Euclidean distance (d) between data points x and y [28], $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, where n is the number of features. The efficacy of the algorithm is assessed using the rate of correct classification during the test phase. If the outcome does not meet expectations, the coefficient K can be modified until an acceptable degree of accuracy is achieved. Here, the number of neighbors is limited to 5, and the best value for K is determined to be 5. This value is then applied to dataset-I, -II, and -III, where it yields the highest accuracy. The predictor standardization option is set to zero.
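The procedure described above can be sketched in a few lines: compute the Euclidean distance to every training sample, keep the k = 5 nearest, and take a majority vote. The two-dimensional feature vectors and labels are toy data chosen for illustration, not features from the datasets.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_x, train_y, query, k=5):
    """Label a query point by majority vote among its k nearest training samples."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train_x = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
train_y = ["normal"] * 4 + ["tumor"] * 4

label_a = knn_predict(train_x, train_y, [0.5, 0.5], k=5)
label_b = knn_predict(train_x, train_y, [5.5, 5.5], k=5)
```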

Naïve Bayes (NB)
The NB classifier is an ML classifier that assumes conditional independence between the attributes given the class. In this article, the final class is predicted using vectors of attributes and class prior probabilities, as shown in Equation (18) [28],
where X indicates the given data instance (deep features extracted from a brain MR image), which is represented by its feature vector (x_1, ..., x_n), and C is the class target (type of brain tumor) with two classes (normal and tumor) for binary dataset-I, three classes (glioma, meningioma, and pituitary) for Figshare dataset-II, or four classes (normal, glioma, meningioma, and pituitary) for multiclass Kaggle dataset-III. Here, P(X = x|C = i) is computed by taking the dataset's features to be independent.
The minimum probability threshold for the NB classifier is 0.001. For every predictor and class combination, the kernel smoothing window width is chosen automatically by default in this suggested approach.
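The independence assumption means that the class-conditional likelihood factorizes over the features, so the posterior follows from Bayes' rule as a normalized product. The sketch below shows this computation for given likelihood values; the priors and per-feature likelihoods are toy numbers, and the kernel density estimation that would produce them in practice is not modeled here.

```python
def nb_posterior(priors, likelihoods):
    """Posterior P(C | x) from class priors and per-feature likelihoods P(x_k | C).

    priors:      {class: P(C)}
    likelihoods: {class: [P(x_1 | C), ..., P(x_n | C)]}
    """
    joint = {}
    for cls, prior in priors.items():
        p = prior
        for value in likelihoods[cls]:
            p *= value  # features are treated as conditionally independent
        joint[cls] = p
    total = sum(joint.values())  # P(X = x), the normalizing constant
    return {cls: p / total for cls, p in joint.items()}


posterior = nb_posterior(
    priors={"normal": 0.5, "tumor": 0.5},
    likelihoods={"normal": [0.2, 0.5], "tumor": [0.6, 0.5]},
)
predicted = max(posterior, key=posterior.get)
```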

Decision Tree
Decision tree structures provide a non-parametric supervised learning technique for both classification and regression. In this work, this technique is employed to build a model that uses basic decision rules deduced from the features extracted from brain MR images. Given training vectors x_i ∈ R^n, i = 1, ..., l and a label vector y ∈ R^l, a decision tree successively partitions the feature space so that instances that share similar target values are grouped together [32].
Three datasets are used in this proposed approach. Here, the data of dataset-I at node m are denoted by Q_m. Every candidate split θ = (j, t_m), composed of a feature j and a threshold t_m, divides the data into the subsets $Q_m^{left}(\theta)$ and $Q_m^{right}(\theta)$. Next, a loss function H() is used to calculate the quality of a candidate split of node m, and the split θ* with the best quality is selected.
This process is repeated for the subsets $Q_m^{left}(\theta^*)$ and $Q_m^{right}(\theta^*)$ until the maximum permitted depth is reached. The same steps are followed for the other two datasets, multiclass Figshare dataset-II and Kaggle dataset-III.
The maximum number of branch nodes in the suggested method is fixed at 1, and the tree must have a minimum of one leaf node.
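A single split search of this kind can be sketched as below. The Gini impurity is assumed for the loss H(), which the text does not specify, and the convention that samples with x_j ≤ t go left is likewise an assumption; the data are toy values.

```python
def gini(labels):
    """Gini impurity of a list of class labels (assumed choice for H)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(X, y):
    """Return (weighted impurity, feature j, threshold t) of the best candidate split."""
    n = len(y)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # a valid split must send samples to both sides
            quality = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or quality < best[0]:
                best = (quality, j, t)
    return best


X = [[1.0, 7.0], [2.0, 3.0], [3.0, 8.0], [4.0, 2.0]]
y = ["a", "a", "b", "b"]
split = best_split(X, y)
```

With the maximum number of branch nodes fixed at 1, the tree consists of exactly one such split, so the classifier reduces to a single threshold on one feature.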

Average Ensemble Method
Machine learning and signal analysis both use the statistical technique of ensemble averaging. Model averaging is an ensemble learning technique in which each member of the ensemble makes an equal contribution to the final prediction. A group of models often outperforms a single one because the individual errors of the models "average out." In the average ensemble method, the steps are as follows:

• Create N experts, each starting at a different initial value; typically, initial values are selected at random from a distribution.
• Train every expert independently.
• Add up the outputs of all the experts and take the mean of their scores.
These steps of the average ensemble method are used to average the accuracy to acquire the final outcome in the proposed method [33,34].
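The averaging step can be sketched as an equal-weight mean over the four classifiers. The accuracy values below are hypothetical placeholders chosen for illustration and are not results from Tables 5-7.

```python
def average_ensemble(scores):
    """Equal-weight mean of the scores produced by the ensemble members."""
    return sum(scores) / len(scores)


# Hypothetical per-classifier accuracies (%), for illustration only
accuracies = {"SVM": 98.0, "KNN": 99.0, "NB": 97.0, "Decision tree": 98.0}

final_accuracy = average_ensemble(list(accuracies.values()))
```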

Experimental Outcomes and Evaluation
MATLAB is used to implement the suggested dilated PDCNN model. The computing device is equipped with an Intel Core i5 processor running at 3.2 GHz, 8 GB of RAM, and the Windows 10 operating system.

Performance Analysis of Suggested Dilated PDCNN Model
The confusion matrix is used to express the classification system's results. The efficiency is evaluated using accuracy, precision, recall, and F1-score [35].
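The four criteria follow from the entries of a binary confusion matrix through their standard definitions, as in the sketch below. The counts are toy values, not results from the experiments; for the multiclass datasets the same quantities would be computed per class and then averaged.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Toy counts: 45 true positives, 5 false positives, 5 false negatives, 45 true negatives
accuracy, precision, recall, f1 = binary_metrics(tp=45, fp=5, fn=5, tn=45)
```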
Table 5 provides an overview of the performance indicators of the dilated PDCNN algorithm with ML classifiers on dataset-I. As per the findings in Table 5, the dilated PDCNN model with the KNN and decision tree classifiers has the best accuracy, precision, recall, and F1-score in comparison with the remaining models, yielding 100.00% for all performance criteria. The dilated PDCNN model with the SVM and NB classifiers also reaches 100% precision and recall, matching the KNN and decision tree classifiers, but the KNN and decision tree classifiers perform better on the other performance indicators. After the average ensemble technique is applied, the suggested dilated PDCNN has final accuracy, precision, recall, and F1-score values of 98.67%, 98.62%, 99.17%, and 98.28%, respectively, for dataset-I.
Table 6 provides an overview of the performance indicators of the dilated PDCNN algorithm with ML classifiers on dataset-II. The results in Table 6 demonstrate that the dilated PDCNN algorithm with the NB classifier provides the highest performance indicators, outperforming the remaining ML classifiers with an accuracy of 98.90%, precision of 98.67%, recall of 98.67%, and F1-score of 98.67%. With the average ensemble approach, the suggested dilated PDCNN has accuracy, precision, recall, and F1-score values of 98.13%, 97.74%, 98.05%, and 97.80%, respectively, for dataset-II.
Table 7 provides an overview of the performance measurements of the suggested dilated PDCNN model with machine learning classifiers for dataset-III. Based on the results in Table 7, and compared with the other models, the suggested dilated PDCNN with the SVM classifier achieves an accuracy, precision, recall, and F1-score of 98.60%, 98.50%, 98.25%, and 98.50%, respectively. After the average ensemble strategy is applied, the dilated PDCNN yields 98.35%, 98.35%, 97.85%, and 98.20% for accuracy, precision, recall, and F1-score, respectively, for dataset-III.

Comparative Analysis of Different Dilation Rates
While dilated convolution retains data resolution at the output layer and increases the receptive field without adding computation, stacking several dilated convolutions has the drawback of producing a grid effect. The results are validated by comparing multiple combinations of dilation rates for the various convolution layers. High dilation rates may impair the recognition of small objects. As a result, the DF gradually decreases (an even-numbered decrement) along the local path in the suggested framework. This reduces the sparsity of the dilated feature map, so that more information can be extracted from the investigated region. In the global path, the low DF (2,1,1) is used to extract the fine features.
Figure 9 gives a comprehensive overview of the gridding issue and the consequences of different dilation rates. The poor performance of the (4,2,1) dilation setting for the global path together with the (8,4,2) setting for the local path is caused by the gridding phenomenon, which arises when a high DF is used and prevents the framework from acquiring finer characteristics. When the DF (4,2,1) is used for both the local and global paths, the accuracy is higher than in that configuration. In contrast, with low dilation rates, the model learns only fine features. When a low DF (1,1,1) is used in the global path and (2,2,2) is used in the local path, the accuracy for dataset-I, dataset-II, and dataset-III is 97.30%, 97.70%, and 93.70%, respectively. The highest accuracy is achieved when a low DF (2,1,1) is used for the global path and a high DF (4,2,1) is used for the local path: 97.33%, 98.20%, and 97.94% for dataset-I, dataset-II, and dataset-III, respectively. This well-balanced configuration, (4,2,1) for the local path and (2,1,1) for the global path, is the best case because it captures both the coarse and the fine characteristics of the images.
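The gridding effect can be made concrete by listing which input positions contribute to one output after several stacked dilated layers. The 1-D sketch below assumes a 3-tap kernel for simplicity, whereas the model uses 5 × 5 and 12 × 12 filters; the function names are illustrative.

```python
def covered_offsets(dilations, kernel_size=3):
    """Input offsets that can reach one output through a stack of dilated 1-D layers."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        # Each layer lets every reachable offset shift by d * t for each kernel tap t.
        offsets = {o + d * t for o in offsets for t in range(-half, half + 1)}
    return sorted(offsets)


def has_gaps(offsets):
    """True if some positions inside the receptive field are never sampled (gridding)."""
    return len(offsets) < offsets[-1] - offsets[0] + 1


gridding_222 = has_gaps(covered_offsets((2, 2, 2)))
gridding_842 = has_gaps(covered_offsets((8, 4, 2)))
gridding_421 = has_gaps(covered_offsets((4, 2, 1)))
gridding_211 = has_gaps(covered_offsets((2, 1, 1)))
```

In this simplified setting, the rate sets (2,2,2) and (8,4,2) only ever sample even offsets and so leave holes in the receptive field, while the decremental sets (4,2,1) and (2,1,1) end with a rate of 1 and cover every position.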

Evaluation Measurements of the Proposed System on the Three Datasets
Table 8 presents the classification accuracy, error, elapsed time, and kappa scores of the suggested PDCNN and dilated PDCNN architectures with the SVM, KNN, NB, and decision tree classifiers for dataset-I. The kappa statistic expresses how closely the instances labeled by the classification model match the ground truth, after accounting for the expected accuracy of a random classifier. The dilated PDCNN has a larger kappa than the PDCNN with the average ensemble model. After dilation is applied to the PDCNN with the average ensemble model, the error rate is reduced and the elapsed time is increased. Table 9 presents the same measurements for dataset-II. Here, too, the dilated PDCNN offers a greater kappa than the PDCNN with the average ensemble model, and applying dilation lowers the error rate but increases the elapsed time. For dataset-III (Table 10), the dilated PDCNN likewise has a higher kappa value, a reduced error rate, and an increased elapsed time compared with the PDCNN with the average ensemble model.
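The kappa statistic described above can be computed from a confusion matrix with the standard Cohen's kappa formula, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The confusion matrix in the sketch is a toy example, not a result from Tables 8-10.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    n = sum(sum(row) for row in confusion)
    size = len(confusion)
    observed = sum(confusion[i][i] for i in range(size)) / n
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)  # row total * column total
        for i in range(size)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy binary confusion matrix with 90% observed agreement
kappa = cohen_kappa([[45, 5], [5, 45]])
```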

Impact of Applying Dilation on the Proposed Model
Figure 10 shows that the suggested dilated PDCNN with the average ensemble approach performs better than the conventional PDCNN with the average ensemble approach in terms of accuracy, precision, recall, F1-score, error rate, kappa, and training time. The values of the performance indicators increase further when dilation is applied to the recommended approach.
These findings show that, in comparison to the proposed PDCNN model using the average ensemble technique, the proposed dilated average ensemble classifier achieves higher accuracy, precision, recall, F1-score, and kappa and a lower error rate and execution time for the three datasets.

Comparison of the Suggested Model with Prior Investigations Based on Three Datasets
A comprehensive assessment is made at the end of the validation process for the proposed approach. A brief overview is shown in Table 11.

Discussion
With an increasing number of patients, manually analyzing MRI images has grown more complicated, time-consuming, and frequently inaccurate. Conventional machine learning techniques rely on handcrafted features, which reduce the solution's robustness and raise its cost. Nonetheless, there are situations when supervised learning models perform better than unsupervised learning strategies, leading to an overfitted structure that is inappropriate for another large database. These problems emphasize how crucial it is to create a fully automated, machine-learning-based classification system for brain tumors. By combining the average ensemble technique with the PDCNN, this investigation presents a novel approach to the identification and classification of brain tumors. The dilated PDCNN architecture includes local and global multiscale feature extraction paths, a merging phase, and a classification path. The initial images are converted to grayscale, which simplifies processing. After that, new images are produced from the existing ones by data augmentation. Using a small window size of 5 × 5 pixels and decremental high dilation rates (4,2,1) across the convolution layers, the convolutional layers in the local path collect coarse characteristics and provide local information about the images. In contrast, the global path's convolutional layers obtain fine details by using a large window size of 12 × 12 pixels and low dilation rates (2,1,1) across the convolution layers. In each path, the ReLU activation function and a max-pooling layer that down-samples the output are applied after each convolutional layer. A fusion layer connects the two parallel paths, forming a single path with a cascading link that continues to the output. In the merged path, a batch normalization layer and a ReLU layer are followed by two fully connected layers that are attached to a dropout layer. In the output path, four classifier types (SVM, KNN, NB, and decision tree) are used to perform the brain tumor classification. Dropout, a regularization method, is also employed to prevent overfitting to the training data.
Tables 5-7 present the performance parameters of the dilated PDCNN model with ML classifiers on binary dataset-I, multiclass Figshare dataset-II, and multiclass Kaggle dataset-III. Across the performance metrics (accuracy, precision, recall, and F1-score) of the three brain tumor datasets with the average ensemble technique, binary classification dataset-I provides the best outcomes: the accuracy, precision, recall, and F1-score of the dilated PDCNN model on this dataset are 98.67%, 98.62%, 99.17%, and 98.28%, respectively.
The impact of different dilation rates on the model's accuracy has been examined for the dilated PDCNN. The comparison among various dilation rate configurations for the convolution layers is displayed in Figure 9. Based on an understanding of the gridding phenomenon and various recommendations for the per-layer dilation rate, the comparative study demonstrates that the decremental high dilation rate (4,2,1) for the local path and the low dilation rate (2,1,1) for the global path yield the best results. For datasets I, II, and III, the highest accuracy values obtained are 98.67%, 98.13%, and 98.35%, respectively. This demonstrates that the global path (lower dilation rates) learns the finer features, while the local path (higher dilation rates) concentrates on the coarse features. The best outcomes are obtained with this combination.
Tables 8-10 display the evaluation results, including accuracy, error rate, time, and kappa value, for the proposed system with the three types of datasets. The lowest error rate, 1.33%, is obtained for the binary classification dataset-I, and the highest kappa value, 0.977, is obtained for the multiclass Kaggle dataset-III.
The results shown in Figure 10 demonstrate that, in terms of accuracy, precision, recall, F1-score, error rate, kappa, and training duration, the suggested dilated PDCNN with the average ensemble model performs better than the standard PDCNN with the average ensemble approach. Applying dilation thus raises the performance indicators on all three types of datasets. A thorough comparison with prior work is performed once the evaluation of the proposed method is complete. The findings demonstrate that the suggested parallel network topology outperforms previously published detection and classification techniques.

Conclusions and Future Work
Since brain tumors vary in shape, size, and structure, the proper identification of these conditions remains extremely difficult. Detecting brain tumors early is well known to be important for receiving the right medical care. This study proposed a dilated PDCNN structure with ML classifiers to detect and classify brain tumors from MRI images. The proposed dilated PDCNN with the average ensemble method is evaluated for binary and multiclass classification, using the Kaggle dataset, which contains four classes of images, and the Figshare dataset, which contains three types of tumor images. The suggested dilated convolution with an expanded receptive field of the kernel increases the computation efficiency while preserving high accuracy. The framework achieves outstanding accuracy, precision, recall, and F1-score on the binary brain tumor dataset-I. To gain a better understanding of the inner workings of the network and the effectiveness of the dilation rate parameter, experimental evaluation can be performed on other datasets in future investigations. Additionally, studies can be carried out to identify brain tumors with greater accuracy by utilizing actual patient information from any source (various images captured by scanners).

Figure 1. MRI scans performed on two different brains. (a) On the left is a tumor, and (b) on the right is a healthy brain [8].

Figure 2. Proposed methodology's workflow.
The step-by-step flow of the suggested framework is mentioned in Algorithm 1.

Algorithm 1. Algorithm based on brain tumor detection and classification approach
Input: Three different representative public datasets of brain MRI pictures: I_j, j = 1, 2, ..., K of size W × H and the class labels C = 2, 3, 4.
Output: Brain tumor detection and classification
1: (a) No tumor (b) Glioma tumor (c) Meningioma tumor (d) Pituitary tumor

Figure 4. Proposed architecture of dilated PDCNN model.
The step-by-step flow of the suggested dilated PDCNN structure is mentioned in Algorithm 2.

Algorithm 2. Algorithm based on dilated PDCNN model
Parallel dilated deep convolutional layers
i. For the local and coarse path, set the window size = 5, and for the global and finer path, set the window size = 12
ii. Divide I_j, j = 1, 2, ..., K into blocks a_ij, i = 1, 2, 3, ..., B, j = 1, 2, 3, ..., K of size w × h × d, where d = number of feature maps in I_j and B = number of blocks created from each I_j
iii. Flatten a_ij into vector x_i ∈ R^n, i = 1, 2, 3, ..., M, where M = K × B, and D = w × h × d
iv. Large dilation_rate: coarse features
v. Small dilation_rate: finer features
vi. Compare different configurations of dilation rates to find the best diagnostic results
vii. Compute the ReLU activation function, f(x) = x for x ≥ 0 and f(x) = 0 for x < 0
viii. Compute cross-channel normalization, $x' = x / (K + \alpha \cdot ss / \text{windowChannelSize})^{\beta}$, where α, β, K are the hyperparameters in the normalization and ss = sum of squares of the elements in the normalization window
ix. Apply max-out plan Z_s to different feature maps O_s, O_{s+1}, ..., O_{s+k−1}, which takes the maximum over the feature maps at each position, as represented in Z_{s,i,j} = max(O_{s,i,j}, O_{s+1,i,j}, ..., O_{s+k−1,i,j})
x. Apply Adam_Optimizer to minimize the error rate
xi. Repeat steps ii, vii, and ix twice for both parallel paths, where the filter number is 96
xii. Update weights using back_propagation
xiii. Weights_Best ← Save weights
xiv. Employ the optimized weights to extract the multiscale features in the training set
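The three element-wise operations in steps vii-ix of Algorithm 2 can be sketched as follows. The default values K = 2, α = 1e-4, and β = 0.75 are assumptions chosen for illustration, since the hyperparameters of the normalization are not stated in the text; the function names and inputs are likewise illustrative.

```python
def relu(x):
    """Step vii: f(x) = x for x >= 0 and 0 otherwise."""
    return x if x >= 0 else 0.0


def cross_channel_norm(x, window, k=2.0, alpha=1e-4, beta=0.75):
    """Step viii: x' = x / (K + alpha * ss / window size) ** beta.

    window holds the activations at the same position in the neighboring channels;
    k, alpha, and beta are assumed values.
    """
    ss = sum(value * value for value in window)  # sum of squares in the window
    return x / (k + alpha * ss / len(window)) ** beta


def maxout(feature_maps):
    """Step ix: position-wise maximum over k feature maps of equal length."""
    return [max(values) for values in zip(*feature_maps)]


activated = [relu(v) for v in (-1.5, 0.0, 2.0)]
normalized = cross_channel_norm(1.0, [1.0, 1.0], k=1.0, alpha=1.0, beta=1.0)
pooled = maxout([[1, 5, 2], [3, 2, 2]])
```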

Figure 7. Addition of all features.


Figure 9. Comparing accuracy across different configurations of dilation rates.

Table 1. Filtered dataset after utilization of anisotropic diffusion filter.

Table 3. Hyperparameter settings for model training.

Table 4. Information about the suggested dilated PDCNN framework.

Table 5. Performance parameters of dilated PDCNN model with ML classifiers on dataset-I.

Table 6. Performance parameters of dilated PDCNN model with ML classifiers for dataset-II.

Table 7. Performance parameters of dilated PDCNN model with ML classifiers for dataset-III.

Table 8. Evaluation results of the proposed system with the binary classification dataset-I.

Table 9. Evaluation results of the proposed system with the multiclass Figshare dataset-II.

Table 10 presents the kappa values, accuracy, error, and training duration for the recommended PDCNN and dilated PDCNN models that employ the SVM, KNN, NB, and decision tree classifiers for dataset-III, in that order.

Table 10. Evaluation results of the proposed system with the multiclass Kaggle dataset-III.

Table 11. Assessment of the employed Kaggle and Figshare datasets with the methods currently in use.