Multi-Scale CNN: An Explainable AI-Integrated Unique Deep Learning Framework for Lung-Affected Disease Classification

Abstract: Lung-related diseases continue to be a leading cause of global mortality. Timely and precise diagnosis is crucial to save lives, but the availability of testing equipment remains a challenge, often coupled with issues of reliability. Recent research has highlighted the potential of Chest X-Ray (CXR) images in identifying various lung diseases, including COVID-19, fibrosis, and pneumonia.


Introduction
The field of medical image analysis has witnessed remarkable advancements in recent years, particularly in the context of diagnosing lung-related diseases. Among these, the severe acute respiratory syndrome caused by coronavirus type 2 (SARS-CoV-2), commonly known as COVID-19, has posed unprecedented challenges to healthcare systems worldwide. Since its emergence in Wuhan, Hubei, China, in December 2019, COVID-19 has evolved into a global pandemic, with staggering statistics as of 30 July 2023: more than 768 million reported cases spanning 234 countries and over 6.9 million lives lost [1]. COVID-19 manifests with a spectrum of symptoms, including fever, cough, fatigue, shortness of breath, and a loss of taste and smell. Given the rapid spread of the virus, swift and accurate diagnosis is paramount in controlling its worldwide impact.
Concerning COVID-19 image classification, chest X-rays (CXRs) have emerged as a valuable tool, notably as the initial image-based strategy employed in countries like Spain [1]. When a patient is suspected of having COVID-19, a nasopharyngeal exudate sample is typically collected for reverse transcription-polymerase chain reaction (RT-PCR) analysis. Simultaneously, a chest X-ray is obtained to assess the patient's condition. The CXR plays a pivotal role in accelerating clinical evaluations, especially when PCR test results may only be available after several hours. In cases where both the clinical condition and CXR appear normal, patients may be discharged while waiting for the results of additional tests. However, if the CXR reveals abnormalities, the patient is often referred to a hospital for further evaluation.
In response to the global demand for lung-related disease testing, healthcare professionals have explored alternative diagnostic methods, particularly those relying on medical imaging techniques such as chest X-rays and computed tomography (CT) scans. These imaging modalities aid in confirming the presence of lung infection and tracking disease progression. Notably, when viral or bacterial infection affects the lungs, it manifests as distinctive radiological patterns, often referred to as ground-glass opacities (GGOs), visible in CXR images and chest CT scans.
Recent developments in deep learning (DL) have opened new avenues for predicting various lung-related diseases, including COVID-19 [2,3]. Researchers have leveraged DL-powered models to detect and classify these diseases [4,5]. Parallel to these developments, contemporary publications in system reliability research have provided rich insights, methodologies, and perspectives that can be thoughtfully integrated into the design and execution of deep learning models [6,7]. These insights contribute to enhancing such models' robustness, efficiency, and reliability when applied to the intricate domain of medical image analysis. However, existing multi-class classification models have exhibited limitations, characterized by reduced accuracy and high complexity. The inherent complexity of these models has hindered their effectiveness in making precise diagnostic decisions. Existing methodologies struggle with accurate disease classification as the number of disease classes increases, impacting precision and recall rates.
To address these challenges and critical gaps in the existing research, an innovative DL architecture called a multi-scale CNN (MS-CNN) is presented. This model is specifically designed for the classification of multiple lung-related diseases, including COVID-19, bacterial pneumonia, viral pneumonia, fibrosis, lung opacity, tuberculosis, and normal cases. One of the key strengths of the proposed approach lies in its ability to maintain high accuracy, reliability, and efficiency even as the number of disease classes increases, overcoming a prevalent drawback in the existing literature. Another unique strength is that predictions from adjacent layers are carefully combined with the model's backbone, preventing the oversight of vital predictions. Furthermore, it is worth noting that the proposed approach aims to significantly reduce testing time compared to the state-of-the-art (SOTA) models. This streamlined efficiency has the potential to achieve precise diagnostic results and expedite diagnostic processes, particularly in real-world clinical scenarios, ensuring timely and effective medical interventions for patients with various lung-related diseases.
Moreover, XAI techniques such as SHAP and Grad-CAM have been integrated to visualize and identify the regions of CXR images that contribute most to the model's predictions. This further enhances the model's interpretability and provides valuable insights into its decision-making process. SHAP values provide insights into pixel contributions for each instance in the dataset, shedding light on the significance of different image regions in the model's decision. Grad-CAM generates heatmaps highlighting areas of interest within the images that the model relies on for classification. This additional layer of transparency enhances the reliability and trustworthiness of the deep learning model's outputs, making it a valuable tool in the clinical setting.
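The Grad-CAM step described above can be sketched framework-independently. The following is a minimal sketch, assuming the convolutional activations and the gradients of the class score with respect to them have already been extracted from the trained model; the tensor shapes and helper name are illustrative, not the paper's implementation:

```python
import numpy as np

def grad_cam_heatmap(activations, gradients):
    """Compute a Grad-CAM heatmap from a conv layer's activations (H, W, C)
    and the gradients of the class score w.r.t. those activations (H, W, C)."""
    # Channel importance weights: global-average-pool the gradients.
    weights = gradients.mean(axis=(0, 1))                      # shape (C,)
    # Weighted sum of activation maps, followed by ReLU.
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)
    # Normalize to [0, 1] for overlaying on the CXR image.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example with random tensors standing in for real model outputs.
rng = np.random.default_rng(0)
acts = rng.standard_normal((18, 18, 512))
grads = rng.standard_normal((18, 18, 512))
heatmap = grad_cam_heatmap(acts, grads)
print(heatmap.shape)  # (18, 18)
```

In practice the resulting heatmap would be upsampled to the CXR's resolution and overlaid on the image to highlight the regions driving the prediction.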
The major contributions of this study can be summarized as follows:
1. To create a comprehensive dataset encompassing seven distinct classes (COVID-19, normal, viral pneumonia, bacterial pneumonia, fibrosis, lung opacity, and tuberculosis), four publicly available datasets were combined.
2. An MS-CNN model is proposed to detect six lung-related disorders and healthy patients from CXR images, where predictions from different layers are combined, avoiding any instances of overlooking or omitting important predictions.
3. Predictions from several layers are concatenated to create a variety of feature maps that operate at various resolutions in order to improve the accuracy and effectiveness of multi-class predictions.
4. The performance of the proposed MS-CNN model is compared with popular TL models (VGG16 and VGG19) and other SOTA models proposed in the literature.
5. The XAI techniques were integrated to enhance the interpretability and trustworthiness of the model by providing visual insights into how the model makes predictions and highlighting the regions of importance in the chest X-ray images for different disease classifications.
The remaining parts of this paper are structured as follows. Section 2 reviews existing research works connected to this study. Section 3 explains the dataset collection and creation, preprocessing, suggested system architecture, hyperparameter and experimental settings, and performance metrics. Section 4 presents experimental results for dataset 1 to dataset 10 and explainable AI on multi-scale CNN interpretability using SHAP and Grad-CAM techniques. Section 5 presents discussions on comparative analysis with other published research and pre-trained models. Finally, the conclusions along with future research directions are drawn in Section 6.

Literature Review
Since the beginning of the COVID-19 catastrophe, investigators have developed several deep learning-based methods for accurately detecting COVID-19-positive patients using a variety of radiological imaging techniques, including CXR and CT scans. Investigations into COVID-19 diagnosis that predominantly relied on AI-based techniques, notably machine learning and deep learning, are highlighted in this section.
To identify COVID-19 utilizing chest X-ray image classification, a deep CNN architecture was suggested by Reshi et al. [8]. The dataset used in the architecture was preprocessed using several methods across multiple stages, which involved balancing the dataset, having medical professionals analyze the images, and augmenting the data. The trial outcomes demonstrated an astounding total accuracy of 99.5%, underscoring the suggested CNN model's outstanding performance in this application domain. The study by Muhammad et al. [9] presented a CNN model that had fewer model parameters but produced good accuracy. The model is made up of five primary convolution connection layers or blocks. With this model, a multi-layer fusion strategy is designed to increase the effectiveness of COVID-19 screening. Observations were made utilizing freely available databases of lung ultrasound (LUS) images and videos. The precision, accuracy, and recall rates of the suggested fusion method were impressively high at 92.5%, 91.8%, and 93.2%, respectively. In COVID-19 screening, these efficiency metrics outperform those of current cutting-edge CNN models.
A controlled study by Mahajan et al. [10] investigated COVID-19 detection utilizing radiology-based images, specifically chest X-rays, and analyzed several detection models, including VGG16, VGG19, Residual Network, and DarkNet. For predictions, these models were compared using the Single Shot MultiBox Detector (SSD), augmented by task-specific preprocessing approaches such as CLAHE. Notably, the study indicates the efficacy of the DenseNet201 + SSD512 model, with precision and recall rates of 93.01% and 94.98%, respectively.
A hybrid COVID-CheXNet model based on deep learning was developed by Al-Waisy et al. [11] to detect the COVID-19 virus in chest X-ray images. The method successfully identified COVID-19 patients with a detection accuracy rate of 99.99%, demonstrating high confidence in distinguishing between healthy individuals and those infected with COVID-19 based on the X-ray images.
Srivastava et al. [12] introduced an innovative custom CNN-based CoviXNet model. This model comprises 15 carefully designed layers, emphasizing the efficiency of the architecture. Their research showcased CoviXNet's exceptional performance in binary classification tasks related to COVID-19 detection. Notably, the model attained an accuracy rate of 99.47%, highlighting its potential as a powerful tool for diagnosing COVID-19 in medical imaging.
Nahiduzzaman et al. [13] developed a method for detecting COVID-19 cases among various lung diseases, using a three-class classification approach specifically designed to distinguish COVID-19 cases from pneumonia and normal cases. To achieve this, the authors employed a CNN-ELM model and achieved 97.42% accuracy. The CNN-ELM utilized a dataset of 12,701 samples with 512 features for model training. Additionally, 3176 data points were used to assess the model's performance.
A 2D-CNN model was designed to classify instances of bacterial pneumonia, COVID-19, and normal instances by Abida et al. [15]. The proposed model demonstrated high performance, achieving an impressive accuracy of 97.49%. The model was also modified for five classes (bacterial pneumonia, COVID-19, fibrosis, normal, and tuberculosis) and six classes (bacterial pneumonia, COVID-19, fibrosis, normal, tuberculosis, and viral pneumonia) and secured accuracies of 97.81% and 96.75%, respectively. This study's findings showcase the potential of the 2D-CNN approach for the accurate and efficient classification of different lung conditions, contributing to the field of medical imaging and disease diagnosis.
Elakkiya et al. [16] presented a novel approach for categorizing various diseases, including COVID-19, pneumonia, tuberculosis, and other specific conditions. They introduced the sharpened cosine similarity network (SCS-Net), which stands out from traditional neural networks by utilizing sharpened cosine similarity instead of dot products. In their experiments involving multi-class classification combining classes such as COVID-19, normal, pneumonia, and tuberculosis, the proposed SCS-Net demonstrated an accuracy rate of 94.05%.
Hussain et al. [17] introduced a novel CNN model named CoroDet. The primary objective of this model was to facilitate the automatic detection of COVID-19 through the utilization of raw chest X-ray and CT scan images. In their research, the authors comprehensively evaluated CoroDet's performance, employing a four-class classification approach involving categories such as bacterial pneumonia, COVID-19, normal, and viral pneumonia, achieving an accuracy rate of 91.20%.
Al-Timemy et al. [18] presented a pipeline for classifying five classes using a combination of ResNet-50 for deep feature (DF) computation and an ensemble of subspace discriminant classifiers. Through their research, this pipeline emerged as the top performer in accurately classifying the five classes with an accuracy of 91.6% and a 95% confidence interval of 2.6%.

Some of the recent developments in CXR and CT scan dataset-based research that utilized deep learning approaches similar to the proposed work are analyzed in this section. Ghoshal and Tucker [19] developed a Bayesian convolutional neural network (BCNN) to assess uncertainties and the interpretability of coronavirus identification using COVID-19 CXR images. The results demonstrate that the pre-trained VGG-16 model significantly increased detection accuracy from 85.2% to 92.9%. By developing saliency maps to better understand the suggested model's outcomes, they also established the approach's interpretability. Narin et al. [20] provided a transfer learning-based method for classifying CXR images into COVID-19 and normal categories, using three pre-trained models, with ResNet50 achieving the highest accuracy. Oh et al. [21] developed a patch-based approach for training and fine-tuning the ResNet18 CNN model. Jain et al. [22] used X-ray images and transfer learning-based algorithms for COVID-19 screening and found that the Xception model achieved the highest accuracy of 97.97%. Hoon et al. [23] developed a decision tree classifier based on deep learning for COVID-19 screening, achieving a 95% accuracy rate for categorizing coronavirus patients. Pereira et al. [24] proposed a deep learning-based system that used a radiography image data augmentation approach for COVID-19 identification, achieving an F1-score of 0.89. Sakib et al. [25] used a generic augmentation method and a GAN to create artificial COVID-19 images, achieving a test data accuracy of 93.94%.
Makris et al. [26] conducted a study in which they offered numerous CNN models with transfer learning techniques to categorize three distinct categories. According to their observations, VGG16 had the maximum accuracy, with a score of 95.88%. Then, in a study by El Asnaoui et al. [27], numerous pre-trained CNN models were proposed to categorize three separate classes. According to their results, Inception ResNetV2 had the best accuracy of 92.18%. Furthermore, Saiz et al. [28] suggested a CNN VGG16 approach utilizing CLAHE. As per the study's findings, utilizing CLAHE on the database led to an accuracy level of 94%, rather than the accuracy rate of 83% achieved without such an approach. COVIDX-Net, a deep learning framework proposed by the authors of [29], can facilitate radiologists in automatically diagnosing COVID-19. The suggested framework included seven distinct architectures, including a modified VGG19 and the second version of Google MobileNet. Rahimzadeh et al. [30] provided a method for classifying X-ray images into three groups based on two publicly accessible datasets. They also showed how Xception and ResNet50V2 might be used to enhance classification accuracy.
While many studies have reported impressive accuracy in binary and limited-class classification scenarios, their performance consistently degrades as the number of classes increases. This degradation arises from the growing complexity of distinguishing between multiple conditions whose features differ only minutely. This limitation hampers the applicability of these models in real-world clinical applications, where patients may exhibit diverse lung conditions. Therefore, a tailor-made and robust deep learning framework is required to perform multi-class classification of lung diseases with high accuracy and confidence in real-life scenarios.

Methodology
Figure 1 shows the general workflow of the proposed research. A larger dataset with seven classes was produced by combining CXR images from publicly available sources. The dataset was split into three separate subsets for three different operations: 80% of the original for training, 10% for validation, and 10% for testing. After that, the training data were appropriately preprocessed through resizing, rescaling, and augmentation. Preprocessing was performed after data splitting to ensure that information from the validation and test sets did not influence the preprocessing decisions made on the training set. This helps maintain the integrity of the evaluation process, as the validation and test sets should represent the real-world data that the model will encounter. Each class comprises 950 images: 760 for training, 95 for validation, and 95 for testing. To achieve the optimum outcome, various hyperparameters were tuned. Additionally, binary, three-class, four-class, five-class, six-class, and seven-class datasets were trained with the MS-CNN model. Finally, the model's effectiveness was demonstrated by a comparative analysis using a variety of performance metrics.

Chest X-ray Databases
Most of the datasets used in this investigation were acquired from four distinct reputable sources, shown in Figure 2: (1) COVID-19 Radiography Database [31], (2) Curated Dataset for COVID-19 (accessed on 16 February 2023) [32], (3) NIAID TB dataset (accessed on 12 May 2023) [33], and (4) NIH Chest X-ray Dataset (accessed on 9 August 2023) [34]. The datasets are utilized in this study in the following manner.

Dataset 1
This database contains 1900 CXRs, with the images evenly divided between COVID-19 patients and healthy participants. All CXRs from affected and healthy individuals are obtained from the COVID-19 Radiography Database [31]. This dataset is intended to be split into two categories.

Dataset 2
This database contains 2850 images: 950 COVID-19, 950 normal, and 950 fibrosis images. The COVID-19 and healthy person images were obtained from the COVID-19 Radiography Database [31]. The 950 CXR fibrosis images in this dataset come from the NIH Chest X-ray Dataset [34]. A three-class categorization is devised for this balanced dataset.

Dataset 3
This dataset contains 2850 images: 950 COVID-19, 950 normal, and 950 tuberculosis images. The COVID-19 and healthy person images were obtained from the COVID-19 Radiography Database [31]. The 950 CXR tuberculosis images in the mix come from the NIAID TB dataset [33]. A three-class categorization is planned for this balanced dataset.


Dataset 4
This dataset contains 2850 images: 950 COVID-19, 950 normal, and 950 bacterial pneumonia images. The COVID-19 and healthy person images were obtained from the COVID-19 Radiography Database [31]. The 950 CXR bacterial pneumonia images in the mix come from the COVID-19 Curated Dataset [32]. A three-class categorization is designed for this balanced dataset.

Dataset 5
This dataset contains 950 COVID-19, 950 normal, 950 TB, and 950 fibrosis images. All COVID-19 and normal images are gathered from the COVID-19 Radiography Database [31]. The NIAID TB dataset [33] and the NIH Chest X-ray Dataset [34], respectively, served as the sources of the remaining 950 images of tuberculosis and 950 images of fibrosis. A four-class classification is considered for this balanced dataset.

Dataset 6
This dataset has 950 COVID-19, 950 normal, 950 bacterial pneumonia, and 950 fibrosis images. All COVID-19 and normal images are gathered from the COVID-19 Radiography Database [31]. The COVID-19 Curated Dataset [32] and the NIH Chest X-ray Dataset [34], respectively, served as the sources of the remaining 950 images of bacterial pneumonia and 950 images of fibrosis. A four-class classification is considered for this balanced dataset.

Dataset 7
In this collection, 950 COVID-19 images, 950 images of healthy individuals, 950 images of bacterial pneumonia, and 950 tuberculosis images are found. All COVID-19 and normal images are gathered from the COVID-19 Radiography Database [31]. The COVID-19 Curated Dataset [32] and the NIAID TB dataset [33], respectively, served as the sources of the remaining 950 images of bacterial pneumonia and 950 images of tuberculosis. A four-class classification is considered for this balanced dataset.

Dataset 8
This CXR assembly of 4750 images is spanned evenly across 950 images of COVID-19, 950 images of healthy individuals, 950 images of TB, 950 images of bacterial pneumonia, and 950 images of fibrosis. The COVID-19 Radiography Database [31] provides the images of COVID-19 and healthy persons. In addition, the COVID-19 Curated Dataset [32] is used to gather 950 images of bacterial pneumonia. While the 950 images of fibrosis are derived from the NIH Chest X-ray Dataset [34], the remaining 950 images of tuberculosis are gathered from the NIAID TB dataset [33]. A five-class categorization is considered in this regard.

Dataset 9
This CXR assembly of 5700 images is spanned evenly across 950 images of COVID-19, 950 images of healthy individuals, 950 images of TB, 950 images of bacterial pneumonia, 950 images of viral pneumonia, and 950 images of fibrosis. The COVID-19 Radiography Database [31] provides the images of COVID-19, viral pneumonia, and healthy persons. In addition, the COVID-19 Curated Dataset [32] is used to gather 950 images of bacterial pneumonia. While the 950 images of fibrosis are derived from the NIH Chest X-ray Dataset [34], the remaining 950 images of TB are gathered from the NIAID TB dataset [33]. A six-class categorization is considered in this regard.

Dataset 10
This CXR assembly of 6650 images is spanned evenly across 950 images of COVID-19, 950 images of healthy individuals, 950 images of TB, 950 images of bacterial pneumonia, 950 images of viral pneumonia, 950 images of lung opacity, and 950 images of fibrosis. The COVID-19 Radiography Database [31] provides the images of COVID-19, viral pneumonia, lung opacity, and healthy persons. In addition, the COVID-19 Curated Dataset [32] is used to gather 950 images of bacterial pneumonia. While the 950 images of fibrosis are derived from the NIH Chest X-ray Dataset [34], the remaining 950 images of tuberculosis are gathered from the NIAID TB dataset [33]. A seven-class categorization is considered in this regard.

Dataset Splitting
As mentioned before, 80% of the introduced datasets are used for training, 10% for testing, and 10% for validation. Each class uses 760 images for training for dataset 1 to dataset 10. In each class, 95 images are utilized for testing, and 95 images are used for validation. Table 1 presents details of the datasets.
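The per-class split reduces to simple arithmetic; a minimal sanity check of the counts stated above:

```python
# Per-class 80/10/10 split used across dataset 1 to dataset 10.
IMAGES_PER_CLASS = 950

n_train = int(IMAGES_PER_CLASS * 0.80)        # 760 images per class
n_val = int(IMAGES_PER_CLASS * 0.10)          # 95 images per class
n_test = IMAGES_PER_CLASS - n_train - n_val   # 95 images per class

print(n_train, n_val, n_test)  # 760 95 95

# Totals scale with the number of classes, e.g. the seven-class
# dataset 10 holds 7 * 950 = 6650 images.
print(7 * IMAGES_PER_CLASS)  # 6650
```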

Pre-Processing and Augmentation
The images are scaled to match the input dimension of the CNN, since overly large images are likely to suppress the traits of interest. To begin, all images are downsized to 300 × 300 pixels. Then, all pixel values are rescaled to [0, 1] using the min-max normalization approach. Additionally, image augmentation techniques were used to address the limited number of images in the datasets and increase training efficiency while preventing model overfitting.
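The rescaling step can be sketched in NumPy; the helper name is illustrative, and resizing to 300 × 300 is assumed to have already been applied:

```python
import numpy as np

def minmax_rescale(image):
    """Rescale an image's pixel values to [0, 1] (min-max normalization)."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    # Guard against constant images, which would divide by zero.
    return (image - lo) / (hi - lo) if hi > lo else np.zeros_like(image)

# Example on an 8-bit, 300 x 300 image-shaped array (post-resizing).
img = np.random.default_rng(1).integers(0, 256, size=(300, 300), dtype=np.uint8)
scaled = minmax_rescale(img)
print(scaled.min(), scaled.max())  # 0.0 1.0
```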

Sample-Wise Centering
This technique was applied to ensure that the mean pixel value of individual images was set to zero.It involves adjusting the brightness levels of the image while preserving the relative differences between pixels.

Sample-by-Sample Standard Deviation Normalization
This technique involves rescaling the pixel values based on their associated standard deviation.This normalization process helps to standardize the variability of pixel values across different images.

Horizontal Flipping
This technique involves creating a mirrored version of the original image by flipping it horizontally.In the context of lung images, this augmentation is relevant as lung structure and patterns can be symmetric.

Image Generator
The image generator applies sample-wise centering for augmentation to make each image's mean pixel value zero. Following that, sample-by-sample standard deviation normalization is used to divide each image's pixel values by its own standard deviation. Finally, horizontal flipping is used to mirror the images horizontally.
These augmentation techniques were specifically chosen to enhance the diversity of the dataset while ensuring that the transformations were meaningful for lung images.
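These three operations correspond to the Keras ImageDataGenerator flags samplewise_center, samplewise_std_normalization, and horizontal_flip. A NumPy sketch of their effect on a single image follows; the helper names are illustrative, not the paper's implementation:

```python
import numpy as np

def samplewise_center(img):
    # Shift pixel values so the image's own mean becomes zero.
    return img - img.mean()

def samplewise_std_normalize(img):
    # Divide by the image's own standard deviation (epsilon avoids /0).
    return img / (img.std() + 1e-7)

def horizontal_flip(img):
    # Mirror the image left-to-right.
    return img[:, ::-1]

rng = np.random.default_rng(2)
img = rng.random((300, 300)).astype(np.float32)

out = samplewise_std_normalize(samplewise_center(img))
flipped = horizontal_flip(img)
print(round(float(out.mean()), 4))  # ~0.0 (zero mean after centering)
```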

Proposed Multi-Scale CNN Architecture
The proposed architecture in Figure 3 has two components: a backbone and a CNN head. The backbone is a pre-trained image classification network acting as a feature extractor. Here, the convolutional layers of the pre-trained network are retained, and its fully connected top layers are removed, so that only the low-level extracted feature maps are provided. Using VGG-16 as the backbone and convolutional layers as the head for feature extraction at multiple scales with filter size optimization, the model's ability to extract discriminative features from CXR images is enhanced [35]. It leverages pre-trained weights, benefits from transfer learning, captures features at multiple scales, and adapts to the specific characteristics of CXR images. Transfer learning allows the model to transfer knowledge gained from a source task (ImageNet classification) to a target task (CXR classification) [35]. This is especially valuable when the target task has limited labeled data, as it enables the model to generalize better and achieve higher accuracy by leveraging the learned representations from a related task. These advantages contribute to improved accuracy and robustness in CXR classification tasks.
VGG-16 is a deep neural network architecture pre-trained on the large-scale ImageNet dataset [35]. The VGG-16 backbone consists of convolutional and max pooling layers, gradually reducing the spatial dimensions while increasing the number of channels. Using its pre-trained weights, the model can leverage the knowledge learned from millions of images to initialize its feature extraction process. This helps in capturing generic visual features useful for a wide range of image classification tasks, including CXR classification.
The CNN head comprises multiple convolutional layers stacked together and added to the top of the backbone model. By incorporating these convolutional layers into the head of the model, the architecture is customized to extract features at multiple scales. The VGG16 layers closer to the input learn low-level features like edges and textures, while deeper layers in the CNN head learn more complex and abstract features. This is crucial for CXR classification, as abnormalities within the image can appear in unusual sizes. Multi-scale feature mapping enables the model to become more robust and capable of identifying abnormalities of varying sizes, improving its overall accuracy in CXR classification.
The input image size is set to 300 × 300 pixels, and the VGG-16 network acts as the backbone of the model, consisting of five blocks. Blocks 1 to 5, which form the VGG-16 backbone, extract hierarchical features from the input image. In each block, the number of filters is doubled in the Conv2D layers, while the feature map size is halved in the MaxPooling2D layer. The first two blocks, Block 1 and Block 2, include two Conv2D layers (convX_1 and convX_2) and one MaxPooling2D layer (maxpoolX). The Conv2D layer output sizes in Block 1 and Block 2 are 300 × 300 × 64 and 150 × 150 × 128, respectively. The corresponding MaxPooling2D layers produce output sizes of 150 × 150 × 64 and 75 × 75 × 128. The MaxPooling2D layers downsample the feature maps, reducing their spatial dimensions. The following three blocks, Block 3, Block 4, and Block 5, consist of three Conv2D layers and one MaxPooling2D layer (convX_1, convX_2, convX_3, and maxpoolX). The Conv2D layers progressively decrease in spatial dimensions, resulting in feature maps with sizes of 75 × 75 × 256 (Block 3), 37 × 37 × 512 (Block 4), and 18 × 18 × 512 (Block 5). The MaxPooling2D layers perform further downsampling of the feature maps, producing output sizes of 37 × 37 × 256, 18 × 18 × 512, and 18 × 18 × 512, respectively. The first layer effectively responsible for chest X-ray (CXR) classification, conv4_3, has a spatial dimension of 37 × 37, a considerable reduction compared to the input image size. Higher-resolution feature maps play a crucial role in detecting small edges and patterns in the image.
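The stated spatial dimensions follow from repeated 2 × 2, stride-2 max pooling with floor division at odd sizes (Block 5's pooling preserves the 18 × 18 resolution as described). A quick arithmetic check:

```python
# Spatial size after each stride-2 max pooling, from the 300 x 300 input.
size = 300
sizes = []
for _ in range(4):   # Blocks 1-4 each halve the resolution
    size //= 2
    sizes.append(size)
print(sizes)  # [150, 75, 37, 18]
```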
Afterward, a CNN head consisting of Blocks 6 to 11 introduces additional convolutional layers to the model, increasing its complexity. This allows the model to learn more intricate and abstract features of the CXR input image. Gradually, as the Conv2D structure keeps decreasing in spatial dimensions, the resolution of the feature maps also decreases. The feature map from Block 4 (conv4_3) is connected to a Concatenate layer, and the feature map from Block 5 (maxpool5) is connected to the Conv6 block with an output size of 18 × 18 × 1024. Conv6 is then connected to the Conv7 block. Following Conv7, four additional convolutional blocks (Conv8 to Conv11) are added, each containing two Conv2D layers. Each block in the architecture builds upon the features extracted by the previous blocks. The Conv8 block has conv8_1 and conv8_2 layers with output sizes of 18 × 18 × 256 and 9 × 9 × 512, respectively. The second layer of Conv8 (conv8_2) is connected to the first layer of Conv9 (conv9_1). Similarly, Conv9 has conv9_1 and conv9_2 layers with output sizes of 9 × 9 × 128 and 5 × 5 × 256, respectively. This pattern reiterates, with conv9_2 further connecting to conv10_1 (5 × 5 × 128), conv10_2 (3 × 3 × 256), and finally conv11_1 (3 × 3 × 128).
Lastly, a Concatenation block merges the feature maps from the convolutional layers conv4_3, conv6, conv7, conv8_2, conv9_2, conv10_2, and conv11_2 into a single concatenated feature map. These smaller feature maps contain different levels of information extracted from the input image at different scales and resolutions. The resulting tensor has a greater depth, combining the channels of the individual feature maps; the output size of the Concatenate layer is 8096 × 16.
After the concatenation operation, the resulting tensor is passed through a Flatten layer that converts the multi-dimensional tensor into a one-dimensional vector: the concatenated output of shape 8096 × 16 becomes a flat vector of length 8096 × 16 = 129,536. The flattened vector is then passed through a dense layer with SoftMax activation, providing the classification probabilities for the input image across the different classes. The choice of filter sizes, number of layers, and block configurations followed the SSD300 (Single-Shot Multibox Detector) feature extraction standard and the abstraction practices of VGG-16. Furthermore, introducing the additional convolutional blocks (Conv Blocks 6 to 11) was purposeful, allowing the model to learn more intricate and abstract features from the input. The basic working principle of the MS-CNN model is presented in Algorithm 1: (i) resize the images, X_resized = resize(X, 300 × 300 × 3); (ii) center and standard-normalize them, applying horizontal flips (HF) for training; (iii) construct and train the model; and (iv) evaluate it, e.g., compute the area under the ROC curve, AUC(X_test, M_trained).
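The final classification stage can be sketched minimally: the flatten arithmetic, plus a plain-Python SoftMax standing in for the dense layer's activation (illustrative only, not the Keras implementation):

```python
import math

def softmax(logits):
    """Numerically stable SoftMax: turns the dense layer's class
    scores into probabilities that sum to 1."""
    m = max(logits)                               # shift for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Flattening the 8096 x 16 concatenated tensor yields the feature
# vector that feeds the SoftMax-activated dense layer:
flat_len = 8096 * 16   # 129,536 features
```

Equal logits map to equal probabilities, and the largest logit always receives the largest probability, which is what the dense + SoftMax classifier relies on.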

Experimental Setup and Hyperparameter Settings
The proposed model was trained using the TensorFlow API on a 64-bit Windows 11 Pro system. The Keras and Scikit-Learn libraries facilitated Multi-Scale CNN model design, training, and evaluation on the local machine setup listed in Table 2. Hyperparameters were determined through extensive experimentation: multiple training runs varying the learning rate, patience, optimizer, epochs, and batch size were executed, and model performance was rigorously assessed on validation data to pinpoint the optimal combination of hyperparameters.
Training commenced with a conservative learning rate of 0.0001, with continuous progress monitoring. The rate of loss reduction guided adjustments to the learning rate, and a patience value of 10 played a pivotal role in implementing effective early stopping, a technique crucial for preventing overfitting. Throughout training, the Adam optimizer dynamically adapted the learning rate for each parameter.
Experimentation revealed that training for fewer than 15 epochs resulted in underfitting, while more than 50 epochs led to overfitting; a balanced choice of 25 epochs captured the underlying patterns while avoiding overfitting. Different batch sizes were explored: smaller sizes showed promise for better generalization, albeit at a slower training pace, while batch sizes exceeding 32 led to unstable minima on the validation data. A batch size of 16 was deemed optimal, balancing training speed, memory usage, and convergence. This meticulous hyperparameter tuning process yielded a high-performing model for the classification task, detailed in Tables 2 and 3.
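The patience-based early stopping described above is available in Keras as `tf.keras.callbacks.EarlyStopping(patience=10)`; the stopping rule itself can be illustrated with a minimal pure-Python sketch (not the training code used in the paper):

```python
def early_stop_epoch(val_losses, patience=10):
    """Return the 0-based epoch at which training halts: the run ends
    once `patience` consecutive epochs fail to improve on the best
    validation loss seen so far. Returns the last epoch if the
    criterion never triggers."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0     # new best: reset the patience counter
        else:
            wait += 1                # no improvement this epoch
            if wait >= patience:
                return epoch         # patience exhausted: stop here
    return len(val_losses) - 1
```

With a loss curve that improves for three epochs and then plateaus, training stops exactly ten non-improving epochs after the last best value, mirroring the patience-10 behavior.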

Performance Metrics
A confusion matrix is a table that compares a model's predictions with the actual ground-truth labels of the dataset. A classification report is a comprehensive summary of various metrics, including precision, recall, F1-score, and support (the number of occurrences of each class). Accuracy (Ac) is the ratio of correct predictions to all predictions: Ac = (TP + TN)/(TP + TN + FP + FN).
Precision (Pr) evaluates the quality of the positive predictions produced by a model: Pr = TP/(TP + FP).
Recall (Rc), also known as sensitivity or the true positive rate, quantifies how many actual positives the model recovers: Rc = TP/(TP + FN).
The F1-score, the harmonic mean of precision and recall, F1 = 2 × Pr × Rc/(Pr + Rc), is often considered more informative than accuracy when class imbalance is present,
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
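The four metrics follow directly from the confusion-matrix counts; the sketch below (illustrative, not the authors' evaluation code) encodes the formulas above:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts
    (true positives, true negatives, false positives, false negatives)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1
```

For example, the dataset 1 result (87 COVID-19 and 103 normal images, all correct, so FP = FN = 0) gives 1.0 for every metric.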

Classification of Dataset 1
The classification outputs using dataset 1 are presented in Table 4. These results serve as the foundation for comparing the proposed model with several transfer learning models across all the applied performance assessment criteria. The proposed MS-CNN training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 4. The model acquired a testing accuracy of 100% with a loss of 0.0131. The confusion matrix demonstrates that the proposed model correctly classified 100% (87 images) of the COVID-19 images and 100% (103 images) of the normal images, misclassifying no normal images as COVID-19 and no COVID-19 images as normal. The model also achieved an AUC value of 1.00 for distinguishing COVID-19 samples from healthy samples. A recall of 1.00 indicates that the model reduced the false-negative rate to zero, ensuring no COVID-19 infection cases were missed, while a precision of 1.00 shows that the model produced no false positives, so no normal cases were misclassified as COVID-19. The performance comparison between the proposed model and other TL models is presented in Figure 5.
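The reported AUC of 1.00 can be read as the probability that a random positive sample is scored above a random negative one; a minimal rank-based sketch (illustrative only, not the evaluation code used in the paper):

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of (positive, negative) pairs in which the positive
    receives the higher score (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfectly separating classifier, like the one reported for dataset 1, scores every positive above every negative and thus reaches exactly 1.0.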

Classification of Dataset 2
The proposed MS-CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 6. The outputs for dataset 2 are presented in Table 5 and Figure 7. The testing accuracy and loss of the proposed model were 99.65% and 0.0236, respectively. The confusion matrix (CM) indicates that the proposed model misclassified one COVID-19 image as fibrosis, two fibrosis images as COVID-19, and five normal images as fibrosis. The AUC values for COVID-19, Fibrosis, and Normal were 1.00, 0.99, and 0.99, respectively.


Classification of Dataset 3
The proposed MS-CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 8. The outputs for dataset 3 are presented in Table 6 and Figure 9. The testing accuracy of the proposed model was 99.30%.


Classification of Dataset 4
The proposed MS-CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 10. The outputs for dataset 4 are presented in Table 7 and the accompanying figure.


Classification of Dataset 5
The proposed MS-CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 12. The outputs for dataset 5 are presented in Table 8 and the accompanying figure.

Classification of Dataset 6
The proposed Multi-Scale CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 14. The outputs for dataset 6 are presented in Table 9 and Figure 15. The testing accuracy and loss of the proposed model were 99.21% and 0.0498, respectively. The confusion matrix indicates that the proposed model misclassified one Bacterial Pneumonia image.

Classification of Dataset 7
The proposed Multi-Scale CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 16. The outputs for dataset 7 are presented in Table 10 and Figure 17.


Classification of Dataset 8
The proposed Multi-Scale CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 18. The outputs for dataset 8 are presented in Table 11 and the accompanying figure.

Classification of Dataset 9
The proposed Multi-Scale CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 20. The outputs for dataset 9 are presented in Table 12 and the accompanying figure.

Classification of Dataset 10
The proposed MS-CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 22. The outputs for dataset 10 are presented in Table 13 and the accompanying figure.

Explainable AI on MS-CNN Interpretability
For model interpretability through different explainable AI techniques, image plots were created to visualize the SHAP values generated by the explainer object [36], and Grad-CAM was used to generate heatmaps of CXR images.
The weights of the final convolutional layer are typically used to create a heatmap from the original image [37]. The heatmap is then overlaid on the original input image to produce the output image, in which the affected area of the CXR is depicted to identify the disease category. By enabling quick viewing of the affected region, this helps medical professionals identify problems.
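The weighting step behind this heatmap construction can be sketched in miniature. The snippet below is a pure-Python toy with hypothetical activation maps and importance weights; a real Grad-CAM implementation derives the weights from gradients through the trained network:

```python
def grad_cam_heatmap(activation_maps, weights):
    """Toy Grad-CAM: weight each final-layer activation map by its
    (gradient-derived) importance, sum, apply ReLU, and normalize to
    [0, 1] so the map can be overlaid on the input CXR."""
    h, w = len(activation_maps[0]), len(activation_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for amap, wgt in zip(activation_maps, weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wgt * amap[i][j]          # weighted sum of maps
    relu = [[max(v, 0.0) for v in row] for row in cam]  # keep positive evidence
    peak = max(max(row) for row in relu)
    if peak == 0.0:
        return relu
    return [[v / peak for v in row] for row in relu]    # normalize to [0, 1]
```

After normalization, values near 1 correspond to the red (high-importance) regions of the overlay and values near 0 to the blue (low-importance) ones.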
To begin with, a SHAP explainer was established for the model to calculate SHAP values for a given set of instances. The SHAP partition explainer function was employed to create a specialized explainer designed for deep learning models. The SHAP values represent how much each pixel contributes to the model's output for every instance in the dataset. In binary classification, two sets of SHAP values correspond to the two classes. These SHAP values are organized in matrices in which rows represent instances and columns represent features. Positive values indicate features that push the prediction toward the positive class, while negative values indicate features that push it toward the negative class. Figure 24 shows an initial image plot generated using the SHAP values. The plot displays the actual image with certain parts highlighted in shades of red and blue: red areas signify positive contributions to the prediction of a class and enhance its predicted probability, while blue areas indicate negative contributions and diminish it.

Figure 24 shows a lung opacity sample prediction extracted through the MS-CNN classifier using the SHAP Partition Explainer with an image plot. The model ranks the sample in the top two categories: Lung Opacity and COVID. On the x-axis of Figure 24a, a higher SHAP value to the right corresponds to a higher prediction value (the "Lung Opacity" class), and a lower SHAP value to the left corresponds to a lower prediction value (not the "Lung Opacity" class). The larger the pixel value in the lung region (the redder the color), the higher the SHAP value. This means that when the pixel values of the outermost top, left, and right lung regions in Figure 24a are larger, the SHAP value corresponds to a higher prediction value, so the model is more likely to assign the sample to the "Lung Opacity" class. Conversely, the smaller the pixel value in the lung center regions (the bluer the color), the smaller the SHAP value; hence, when the pixel values inside the lung cavities in Figure 24a are smaller, the model is less likely to assign the sample to the "Lung Opacity" class. In Figure 24b, the "COVID" class has the second-highest probability, where the whole left half of the sample corresponds to a higher prediction value for the "COVID" class and the right half corresponds to a lower prediction value.
Figures 25 and 26 also show the first seven categories to which the model thinks the image belongs. Figure 25 explains the same lung opacity sample for the seven-class predictions used in Figure 24. The model confuses the image not only with "COVID", as explained above, with the second-highest probability, but also with the "Fibrosis" and "Tuberculosis" classes. As seen in Figure 25d, the prominence of red areas (positive SHAP values) in the plot signifies a tendency toward the "Lung Opacity" class, indicating a correct prediction. The sample in Figure 26 illustrates an essential concept about explanations for black-box models: they explain what the model is predicting but do not attempt to explain whether the predictions are correct. The similarity in magnitude of the red areas (positive SHAP values) in Figure 26d and the blue areas (negative SHAP values) in Figure 26f creates confusion between predicting the "Fibrosis" class and the "Tuberculosis" class. The explainer generates positive SHAP values for "Fibrosis" and negative SHAP values for "Tuberculosis" of similar magnitude, indicating a higher probability of misclassification for this sample.
Figure 27 shows the Grad-CAM representation on example images of lung disorders in which the MS-CNN model primarily detects the afflicted area as (a) Fibrosis and (b) Tuberculosis. In Grad-CAM, a region's greater importance to the model is shown by its red hue, while lesser importance is indicated by blue. However, caution should be taken when interpreting the heatmaps. The same sample used for the SHAP explanation in Figure 26, when used with Grad-CAM, yields heatmap regions extracted from the deeper layers of the model for both the "Fibrosis" class and the "Tuberculosis" class, again indicating a higher probability of misclassification by the model.


Comparative Analysis of Multi-Scale CNN with Different Datasets
Figure 28 compares the performance of the MS-CNN model in correctly identifying lung-related disorders across the various datasets. The illustration shows that the testing accuracy was lowest (96.05%) for dataset 10 (seven classes) and improved as the number of classes was reduced; for example, dataset 1 (binary) reached an accuracy of 100.00%. However, a slight discrepancy is observed between dataset 3 (three classes), dataset 4 (three classes), and dataset 5 (four classes). Increasing the number of classes means including additional images of similar-looking lung diseases in the training and testing datasets, which reduces accuracy. The high AUC values nevertheless demonstrate the model's capacity to correctly classify lung-related disorders even among a larger number of lung conditions.

Comparative Analysis of Multi-Scale CNN with other Research in the Literature
A comparison between the proposed MS-CNN classification technique and other deep learning research based on CXR images with two-class, three-class, four-class, five-class, six-class, and seven-class schemes is presented in Table 14. The COVID-CheXNet system proposed by Al-Waisy et al. [11] and the CoviXNet model proposed by Srivastava et al. [12] diagnosed COVID-19 patients in binary classification with accuracy rates of 99.99% and 99.47%, respectively; in those cases, the MS-CNN model performed with 100% accuracy.
Nahiduzzaman et al. [13] employed a lightweight CNN-ELM method with only three layers, applying a three-class classification approach that achieved 97.42% accuracy. Yaman et al. [14] introduced the ACL model, combining attention, LSTM, and CNN for classifying healthy, COVID-19, and pneumonia cases in chest X-ray (CXR) images; it achieved 96% accuracy on an 80:20 train/test split, and changing the ratio affects the accuracy. In the proposed model, by contrast, the outputs of every layer were merged to extract additional features, predicting the output with a higher accuracy of 98.60%.
Abida et al. [15] designed a 2D-CNN model to classify Bacterial Pneumonia, COVID-19, Fibrosis, Lung Opacity, Normal, Tuberculosis, and Viral Pneumonia. For the two-, three-, four-, five-, six-, and seven-class schemes, their model achieved 98.00%, 97.49%, 97.81%, 96.96%, 96.75%, and 93.15%, respectively. They utilized a lightweight 2D-CNN with three Conv2D layers to extract features for classification, which is insufficient to obtain higher accuracy in multi-class classification. In the two-, three-, and four-class schemes they acquired good results compared to other related works, but at higher class counts the accuracy drops; for example, the seven-class classification accuracy is 93.15%. The proposed model achieved an accuracy of 96.05%, nearly 3% higher than that of Abida et al. [15], because it merges the output predictions of all layers from multiple feature maps at different resolution scales to improve class predictions.
Elakkiya et al. [16] presented a novel approach, SCS-Net, for categorizing COVID-19, pneumonia, tuberculosis, and normal cases with an accuracy of 94.05%. Hussain et al. [17] introduced CoroDet, employing a four-class classification and achieving an accuracy of 91.20%. Al-Timemy et al. [18] presented a five-class classification using ResNet-50 for deep feature (DF) computation combined with an ensemble of subspace discriminant classifiers, with an accuracy of 91.6%. In these cases, the proposed model achieved better scores of 98.95%, 98.33%, and 97.00%, respectively.

Comparison with Datasets of Other Literature
The MS-CNN model was further validated by training and testing on datasets from other studies (both balanced and imbalanced); the model comparison is shown in Table 15. Al-Waisy et al. [11] proposed the COVID-CheXNet framework with two deep learning methods (ResNet34 and HRNet). The authors created their own COVID-19-vs-normal dataset containing 400 images of confirmed COVID-19 cases gathered from 4 different sources and 400 chest X-ray images of normal condition. The whole dataset is split into training, validation, and test sets, with 70% for training and validation and 30% for test-set evaluation. The proposed MS-CNN model was evaluated for binary-class performance on this dataset. In the literature, ResNet34 and HRNet diagnosed the COVID-19 patients with detection accuracy rates (DARs) of 89.98% and 90%, respectively; the proposed model outperformed both with a testing accuracy of 99.38%. MS-CNN obtained an average testing accuracy, precision, recall, and F1-score of 99.38% each. A class-wise comparison was also made against the 2D-CNN architecture with transfer learning on the dataset developed by Abida et al. [15], which contains 18,564 CXR images. The MS-CNN was tested for five- to seven-class performance on that dataset: on 5, 6, and 7 classes, the proposed model outperformed the 2D-CNN with average testing accuracies of 98.80%, 98.10%, and 95.18%, respectively. Model training times were also lower than those of the 2D-CNN, with the longest (seven-class) training time being 50 min, almost 12 min faster than the 2D-CNN.

Comparison with State-of-the-Art Models on Dataset 10
The efficiency of the proposed MS-CNN was compared with current state-of-the-art classification architectures on dataset 10. The training dataset contains 5320 CXR images (80%), whereas the validation (10%) and testing (10%) sets contain the remaining 1330 images, 665 each. The proposed MS-CNN was tested for seven-class performance against SOTA architectures such as DenseNet, InceptionResNetV2, NASNet, and ResNet using transfer learning with the top layer removed. Models such as NASNet Mobile, ResNet101V2, and ResNet152 performed best with fine-tuned weights trained on dataset 10 in the bottom layers and pre-trained ImageNet weights in the top layers, whereas ResNet50 and ResNet101 performed best with fully trained top- and bottom-layer weights. The remaining pre-trained models did not need further weight training to generate low-bias, low-variance predictions. The models were trained with the Adam optimizer for 25 epochs or fewer, with an early-stop callback at patience 10. The proposed model outperformed most models, with an average testing accuracy, precision, and recall of 96.05%, 97.00%, and 95.00%, respectively. The comparison of performance metrics is shown in Table 16, and computational time is presented in Figure 29; the MS-CNN model took only 3.12 s to evaluate the whole test dataset.


Strengths and Limitations
Motivated by the limitations of higher-class classification problems, the authors of this study integrated multiple databases to build a dataset of 6650 CXR images classified into a maximum of seven classes. In this investigation, the MS-CNN model, a deep learning framework tailored to excel in multi-class classification scenarios, was applied. Building on the strengths of existing models, this framework was designed to overcome this challenge through advanced architectural and optimization techniques. In the proposed model, multiple Conv2D blocks were applied and their outputs concatenated, merging feature maps at different resolution scales to improve class predictions. The resulting model achieved high accuracy across diverse lung conditions, even as the class count expanded, making it a robust tool for accurate disease classification.
The term "Multi-Scale CNN" denotes the model's ability to integrate information from different resolution scales using multiple feature maps. This allows it to capture both fine- and coarse-level features within images, enhancing its effectiveness in identifying lung-related diseases with varied manifestations.
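As a minimal illustration of the multi-scale idea, the sketch below pools hypothetical feature maps taken at different resolutions and concatenates them into a single descriptor. The actual MS-CNN performs this concatenation inside the network with Conv2D blocks, so the function below (and its name) is only a conceptual stand-in:

```python
import numpy as np

def multiscale_descriptor(feature_maps):
    """Globally average-pool feature maps of different spatial
    resolutions and concatenate them into one vector, mirroring
    how multi-resolution information can be merged before the
    classification head. Each input has shape (H, W, C)."""
    pooled = [fm.mean(axis=(0, 1)) for fm in feature_maps]  # (H, W, C) -> (C,)
    return np.concatenate(pooled)
```

For example, pooling a (56, 56, 64) map and a (28, 28, 128) map yields one 192-dimensional descriptor combining fine and coarse features.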
The model was run on all the formed datasets, from dataset 1 to dataset 10, and in every case its scores were superior; the details are presented in Section 5.1. In comparison with other recent research, the proposed model performed better in terms of testing accuracy, as shown in Section 5.2. Compared with other models on their respective datasets, whether balanced or imbalanced, the current model also performed better, as presented in Section 5.3. For the maximum number of classes (seven) in dataset 10, the proposed model outperformed various pre-trained models in discriminating lung disorders as well as healthy individuals in terms of the performance metrics used, along with the computational times shown in Section 5.4.
The lack of data on other types of lung disorders limits this study. Significant improvements can be made with greater data availability and algorithm training using radiological data from patients and non-patients throughout the world. It should be noted that the MS-CNN model was not constructed on a lightweight structure: VGG-16 was employed as the backbone, and additional Conv2D layers were used to concatenate the outputs, which requires more parameters than some pre-trained models. Therefore, running the model might require more hardware resources. However, this minor issue did not limit its superiority in terms of higher accuracy and shorter testing time.

Conclusions
In this study, a highly accurate Multi-Scale CNN architecture was designed to predict seven distinct classes of images, encompassing COVID-19 and five other lung-affected disorders. Notably, the MS-CNN model exhibits remarkable efficiency in COVID-19 detection, resulting in significantly higher testing accuracy compared with previous methodologies. Even as the number of classes increases, the MS-CNN consistently outperformed all previously reported models in the literature, showcasing a novel approach that addresses a persistent limitation in the existing research. Additionally, the current approach substantially shortens the testing duration compared with state-of-the-art models, offering the potential for expedited medical interventions for patients with lung-related diseases. In the case of dataset 10, which comprises seven classes, the MS-CNN model achieves an accuracy of 96.05%, complemented by precision, recall, F1-score, and AUC values averaging 97%, 95%, 95%, and 94%, respectively. Likewise, on dataset 9, encompassing six classes, the MS-CNN demonstrates an accuracy of 97.47%, with precision, recall, F1-score, and AUC values averaging 96%, 95%, 95%, and 99%, respectively. Better classification scores are achieved by merging predictions from several feature maps at various resolution scales, using additional Conv2D layers with the VGG16 backbone. SHAP and Grad-CAM were integrated into the model as XAI techniques, enhancing its interpretability, which ultimately brings further confidence for practical applications. As part of future development, a comprehensive plan has been devised to expand the number of disease classes in future studies.
The test data should represent real-world data that the model will encounter in general. Each class comprises 950 images: 760 for training, 95 for validation, and 95 for testing. To achieve the optimum outcome, various hyperparameters were utilized. Additionally, binary, three-class, four-class, five-class, six-class, and seven-class datasets were trained with the MS-CNN model. Finally, the model's effectiveness was demonstrated by a comparative analysis using a variety of performance metrics.
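The 950-images-per-class split described above (760/95/95, i.e., 80/10/10) can be sketched as follows; the function name and the fixed seed are illustrative, not taken from the paper:

```python
import random

def split_class(images, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle one class's images reproducibly and split them
    into train/validation/test subsets (80/10/10 by default),
    as in the 950 -> 760/95/95 split described above."""
    imgs = list(images)
    random.Random(seed).shuffle(imgs)
    n_train = int(len(imgs) * train_frac)
    n_val = int(len(imgs) * val_frac)
    return (imgs[:n_train],
            imgs[n_train:n_train + n_val],
            imgs[n_train + n_val:])
```

Applying this per class keeps the resulting datasets balanced, since every class contributes the same number of images to each subset.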

Figure 1 .
Figure 1. A schematic of the overall Multi-Scale CNN system architecture.
Figure 2 shows examples of chest X-ray images of Bacterial Pneumonia, COVID-19, Fibrosis, Lung Opacity, Tuberculosis, Viral Pneumonia, and normal subjects utilized in the proposed work. The following public datasets of CXR images were used in this study: (1) COVID-19 Radiography Database 1 (accessed on 16 February 2023) [31],

Figure 3 .
Figure 3. Block diagram of Multi-scale CNN architecture.

Figure 4 .
Figure 4. (a) Accuracy curves, (b) Loss curves, (c) Confusion Matrix, and (d) ROC curves of the proposed Multi-Scale CNN model with two individual classes (COVID and Normal).

Figure 6 .
Figure 6. (a) Accuracy curves, (b) Loss curves, (c) Confusion Matrix, and (d) ROC curves of the proposed Multi-Scale CNN model with three individual classes (COVID, Normal, and Fibrosis).

4.3. Classification of Dataset 3
The proposed MS-CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 8. The outputs of dataset 3 are presented in Table 6 and Figure 9. The testing accuracy and loss of the proposed model were 99.30% and 0.0250, respectively. The CM indicated that the proposed model misclassified two COVID-19 images as Tuberculosis, one COVID-19 as Normal, and one Tuberculosis as COVID-19. The AUC values for COVID-19, Tuberculosis, and Normal were each 1.00.
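The per-class results quoted throughout this section (accuracy, precision, recall) are all derivable from the confusion matrix whose off-diagonal entries are the misclassifications listed above. A minimal sketch, using an illustrative matrix rather than the paper's exact counts:

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision and recall, plus overall accuracy,
    from a confusion matrix whose rows are true labels and
    whose columns are predicted labels."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                      # correctly classified per class
    precision = tp / cm.sum(axis=0)       # TP / predicted-positive per class
    recall = tp / cm.sum(axis=1)          # TP / actual-positive per class
    accuracy = tp.sum() / cm.sum()
    return precision, recall, accuracy
```

For a three-class matrix with 95 test images per class, a handful of off-diagonal entries immediately yields testing accuracies in the high-99% range, matching the magnitude of the scores reported here.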

Figure 8 .
Figure 8. (a) Accuracy curves, (b) Loss curves, (c) Confusion Matrix, and (d) ROC curves of the proposed Multi-Scale CNN model with three individual classes (COVID, Normal, and Tuberculosis).

4.4. Classification of Dataset 4
The proposed MS-CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 10. The outputs of dataset 4 are presented in Table 7 and Figure 11. The testing accuracy and loss of the proposed model were 98.60% and 0.1079, respectively. The CM indicated that the proposed model misclassified one COVID-19 image as Normal, and eight Normal as Bacterial Pneumonia. The AUC values for COVID-19, Bacterial Pneumonia, and Normal were each 1.00.

Figure 10 .
Figure 10. (a) Accuracy curves, (b) Loss curves, (c) Confusion Matrix, and (d) ROC curves of the proposed Multi-Scale CNN model with three individual classes (COVID, Bacterial Pneumonia, and Normal).

4.5. Classification of Dataset 5
The proposed MS-CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 12. The outputs of dataset 5 are presented in Table 8 and Figure 13. The testing accuracy and loss of the proposed model were 99.74% and 0.0240, respectively. The CM indicated that the proposed model misclassified two COVID-19 images as Tuberculosis, three Fibrosis as COVID-19, five Fibrosis as Normal, and two Tuberculosis as COVID-19. The AUC values for COVID-19, Fibrosis, Normal, and Tuberculosis were 0.99, 0.99, 1.00, and 0.99, respectively.

4.6. Classification of Dataset 6
The proposed Multi-Scale CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 14. The outputs of dataset 6 are presented in Table 9 and Figure 15. The testing accuracy and loss of the proposed model were 99.21% and 0.0498, respectively. The CM indicated that the proposed model misclassified one Bacterial Pneumonia image as COVID-19, one Bacterial Pneumonia as Fibrosis, one COVID-19 as Fibrosis, three Fibrosis as COVID-19, and one Normal as Fibrosis. The AUC values for Bacterial Pneumonia, COVID-19, Fibrosis, and Normal were each 1.00.

Figure 14 .
Figure 14. (a) Accuracy curves, (b) Loss curves, (c) Confusion Matrix, and (d) ROC curves of the proposed Multi-Scale CNN model with four individual classes (Bacterial Pneumonia, COVID, Fibrosis, and Normal).

Figure 16 .
Figure 16. (a) Accuracy curves, (b) Loss curves, (c) Confusion Matrix, and (d) ROC curves of the proposed Multi-Scale CNN model with four individual classes (Bacterial Pneumonia, COVID, Normal, and Tuberculosis).

Figure 17 .
Figure 17. Average Precision (%), Average Recall (%), Average F1-Score (%), Average AUC (%), and Accuracy of the different models with four individual classes (Bacterial Pneumonia, COVID, Normal, and Tuberculosis) for Dataset 7.

The testing accuracy and loss of the proposed model were 98.95% and 0.0589, respectively. The CM indicated that the proposed model misclassified 3 Bacterial Pneumonia images as Tuberculosis, 11 COVID-19 as Tuberculosis, 2 Normal as Bacterial Pneumonia, 2 Normal as COVID-19, 1 Normal as Tuberculosis, and 1 Tuberculosis as COVID-19. The AUC values for Bacterial Pneumonia, COVID-19, Normal, and Tuberculosis were 0.99, 0.95, 1.00, and 0.99, respectively.

4.8. Classification of Dataset 8
The proposed Multi-Scale CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 18. The outputs of dataset 8 are presented in Table 11 and Figure 19. The testing accuracy and loss of the proposed model were 98.67% and 0.0715, respectively. The CM indicated that the proposed model misclassified one Bacterial Pneumonia image as COVID-19, two Bacterial Pneumonia as Fibrosis, one COVID-19 as Fibrosis, three Fibrosis as COVID-19, one Fibrosis as Normal, two Normal as Bacterial Pneumonia, one Tuberculosis as COVID-19, and three Tuberculosis as Fibrosis. The AUC values for Bacterial Pneumonia, COVID-19, Fibrosis, Normal, and Tuberculosis were 1.00, 1.00, 0.99, 1.00, and 0.97, respectively.

4.9. Classification of Dataset 9
The proposed Multi-Scale CNN model training and validation accuracy curves, ROC curves, and confusion matrix are shown in Figure 20. The outputs of dataset 9 are presented in Table 12 and Figure 21. The testing accuracy and loss of the proposed model were 97.47% and 0.0885, respectively. The CM indicated that the proposed model misclassified two Bacterial Pneumonia images as Fibrosis, three COVID-19 as Fibrosis, three Normal as Bacterial Pneumonia, three Normal as Fibrosis, two Tuberculosis as Fibrosis, seven Viral Pneumonia as Bacterial Pneumonia, one Viral Pneumonia as Fibrosis, and one Viral Pneumonia as Tuberculosis. The AUC values for Bacterial Pneumonia, COVID-19, Fibrosis, Normal, Tuberculosis, and Viral Pneumonia were 0.99, 1.00, 1.00, 1.00, 1.00, and 0.99, respectively.
The outputs of dataset 10 are presented in Table 13 and Figure 23. The testing accuracy and loss of the proposed model were 96.05% and 0.1386, respectively. The CM indicated that the proposed model misclassified three Bacterial Pneumonia images as Viral Pneumonia, two COVID-19 as Tuberculosis, one Fibrosis as Normal, one Fibrosis as Tuberculosis, one Lung Opacity as Bacterial Pneumonia,

Figure 24 .
Figure 24. SHAP Partition Explainer with image plot on a lung opacity sample; the top two categories that the model thinks the sample belongs to are (a) Lung Opacity and (b) COVID.

Figures 25 and 26
Figures 25 and 26 also show the first seven categories the model thinks the image belongs to. Figure 25 explains the same lung opacity sample for the seven class predictions used in Figure 24. The model confuses the image not only with "COVID", as explained above, with the second-highest probability, but also with the "Fibrosis" and "Tuberculosis" classes. It can be seen in Figure 25d that the prominence of red areas (positive SHAP values) in the plot signifies a tendency toward the "Lung Opacity" class, indicating the correct prediction.
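SHAP's Partition Explainer attributes a prediction to image regions by systematically masking groups of pixels and measuring the effect on the class score. As a rough, non-equivalent stand-in for that game-theoretic attribution, a simple occlusion-sensitivity map conveys the same intuition that positive (red) regions support the predicted class; the sketch below is illustrative only, and `predict` stands for any image-to-score model:

```python
import numpy as np

def occlusion_map(predict, image, patch=8, baseline=0.0):
    """Occlusion sensitivity: mask one patch at a time with a
    baseline value and record how much the class score drops.
    Positive entries mark patches that supported the prediction."""
    h, w = image.shape[:2]
    base = predict(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = baseline
            heat[i // patch, j // patch] = base - predict(masked)
    return heat
```

Unlike SHAP values, these drops are not additive attributions, but they highlight the same kind of class-relevant lung regions discussed above.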

Figure 25 .
Figure 25. SHAP Partition Explainer with Image Plot on a Lung Opacity sample; predictions on all seven categories where the model thinks the sample is (a) Bacterial Pneumonia, (b) COVID, (c) Fibrosis, (d) Lung Opacity, (e) Normal, (f) Tuberculosis, and (g) Viral Pneumonia.

Figure 26 .
Figure 26. SHAP Partition Explainer with Image Plot on a Fibrosis sample; predictions on all seven categories where the model thinks the sample is (a) Bacterial Pneumonia, (b) COVID, (c) Fibrosis, (d) Lung Opacity, (e) Normal, (f) Tuberculosis, and (g) Viral Pneumonia.

Figure 27 .
Figure 27. Original CXR, Heatmap, and Super-imposed Grad-CAM image of the Multi-Scale CNN model with two individual classes for one sample: (a) Fibrosis (on top) and (b) Tuberculosis (on bottom).
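Grad-CAM produces heatmaps like those in Figure 27 by weighting each feature map of the final convolutional layer with the global average of the class-score gradient over that map, summing, and applying a ReLU. A minimal numpy sketch of that weighting step follows; in practice both inputs come from a deep learning framework's automatic differentiation, so this is illustrative rather than a complete pipeline:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from (H, W, C) activations of the last
    conv layer and the (H, W, C) gradients of the class score
    with respect to those activations."""
    weights = gradients.mean(axis=(0, 1))                    # alpha_c, shape (C,)
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                               # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                                # normalize to [0, 1]
    return cam
```

The normalized map is then upsampled to the input resolution and overlaid on the CXR, producing the superimposed images shown above.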

Figure 28 .
Figure 28. Comparison of performance metrics for all ten datasets from Class-2 to Class-7 obtained by the Multi-Scale CNN for identifying lung-affected diseases. Increasing the class count means including additional images of another type of lung disease in the training and testing datasets, which reduces accuracy.

Figure 28 compares the performance of the MS-CNN model in correctly identifying lung-related disorders for the various datasets. The illustration clearly shows that the testing accuracy was lowest (96.05%) for dataset 10 (seven classes), but when the number of classes was reduced, the testing accuracy improved. For example, dataset 1 (binary class) has an accuracy of 100.00%. However, a slight discrepancy is observed between dataset 3 (three classes), dataset 4 (three classes), and dataset 5 (four classes).

Figure 29 .
Figure 29. Comparison of computational time for all state-of-the-art (SOTA) models of Dataset 10.

Table 1 .
Designing of Chest X-ray Datasets.

Table 3 .
Hyperparameters utilized in model training.

Table 4 .
Classification performance results for Dataset 1.

Table 5 .
Classification performance results for Dataset 2.

Table 6 .
Classification performance results for Dataset 3.

Table 7 .
Classification performance results for Dataset 4.

Table 8 .
Classification performance results for Dataset 5.

Table 9 .
Classification performance results for Dataset 6.

Table 10 .
Classification performance results for Dataset 7.

Table 11 .
Classification performance results for Dataset 8.

Table 12 .
Classification performance results for Dataset 9.

Table 13 .
Classification performance results for Dataset 10.

Table 14 .
Comparative analysis with different diagnostic approaches of previous works.
Note: Bold text indicates the best values.

Table 15 .
Comparison with Datasets of Other Literature.
Note: Bold text indicates the best values.
* All weights trained on Dataset 10. ** Bottom-layer weights fine-tuned on Dataset 10. Bold text indicates the best values.