COVID-19 Diagnosis in Computerized Tomography (CT) and X-ray Scans Using Capsule Neural Network

This study proposes a deep-learning-based solution (named CapsNetCovid) for COVID-19 diagnosis using a capsule neural network (CapsNet). CapsNets are robust to image rotations and affine transformations, which is advantageous when processing medical imaging datasets. This study presents a performance analysis of CapsNets on standard images and their augmented variants for binary and multi-class classification. CapsNetCovid was trained and evaluated on two COVID-19 datasets of CT images and X-ray images. It was also evaluated on eight augmented datasets. The results show that the proposed model achieved classification accuracy, precision, sensitivity, and F1-score of 99.929%, 99.887%, 100%, and 99.319%, respectively, for the CT images. It also achieved a classification accuracy, precision, sensitivity, and F1-score of 94.721%, 93.864%, 92.947%, and 93.386%, respectively, for the X-ray images. This study presents a comparative analysis of CapsNetCovid, CNN, DenseNet121, and ResNet50 in terms of their ability to correctly identify randomly transformed and rotated CT and X-ray images without the use of data augmentation techniques. The analysis shows that CapsNetCovid outperforms CNN, DenseNet121, and ResNet50 when trained and evaluated on CT and X-ray images without data augmentation. We hope that this research will aid in improving the decision making and diagnostic accuracy of medical professionals when diagnosing COVID-19.


Introduction
Coronavirus disease (COVID-19), one of the deadliest pandemics in the history of mankind, has swept through almost all the countries in the world [1]. Coronavirus has infected over 676 million people and killed over 6.88 million as of 17 March 2023, as indicated in the COVID-19 map of Johns Hopkins University. Unfortunately, the virus is still evolving, and new variants continue to emerge worldwide. Multiple nations, including Australia, Bangladesh, Denmark, India, Japan, and the United States, detected a novel immune-evasive COVID-19 strain (XBB) in August 2022, which has caused outbreaks in several countries. This shows that COVID-19 is still a threat, and there is a need for suitable techniques that can be used to tackle this pandemic.
Recently, computer-aided diagnosis technologies have become a fundamental part of routine clinical practice. These tools can be utilized to aid physicians in accurately diagnosing COVID-19 patients. Convolutional neural networks (CNNs) are among the most effective deep learning (DL) algorithms for building improved medical imaging systems. However, they are unable to handle input transformations effectively. In addition, CNNs must be trained on massive or augmented datasets to generate superior results. A capsule neural network (CapsNet) is a recent DL algorithm, proposed by Hinton and colleagues, that groups neurons into capsules to encode the pose and spatial relationships of features extracted from specific regions in an image. In one related CapsNet study, concatenated connections were found to strengthen the coupling coefficients and improve the learning ability of the capsule layers, as well as the model's ability to extract complex features from images with large spatial dimensions. The authors also used a pre-trained model to finetune the network's performance. The proposed framework was tested on three different datasets, achieving classification accuracies of 96.0%, 96.8%, and 95.9% without finetuning, and 98.3%, 99.0%, and 98.9% with finetuning.
Toraman et al. [7] introduced a novel CapsNet architecture for COVID-19 diagnosis from X-ray images. The architecture is composed of five convolution layers, which consist of 16, 32, 64, 128, and 256 kernels, respectively. A large number of convolutional layers was added to provide effective feature maps to the primary layer of the CapsNet. The kernel size of the first three layers is 5 × 5, while that of the fourth layer is 9 × 9. The fifth layer is a primary capsule layer consisting of 32 capsules, each with a kernel size of 9 × 9. The proposed architecture was evaluated on a dataset consisting of 231 X-ray images of COVID-19 [8], 1050 X-ray images of no findings [9], and 1050 X-ray images of pneumonia [9]. The technique produced classification accuracies of 84.22% and 97.24% for multi-class and binary classification, respectively.
Tiwari et al. [10] designed a hybrid framework for COVID-19 diagnosis (called VGG-CapsNet). The framework consists of CapsNet and VGG16. The input images are fed into the pre-trained VGG16 network to extract feature maps, which are then fed into the CapsNet for classification. The proposed technique was evaluated on a dataset containing 219, 1345, and 1341 radiography images of COVID-19, pneumonia, and normal conditions, respectively [11]. The technique was evaluated for both multi-class and binary classification, and the hybrid model was compared with a standard CapsNet model (called CNN-CapsNet). The framework achieved an accuracy of 97% for binary classification and 92% for multi-class classification. The results also show that the proposed hybrid framework outperforms the standard CapsNet by 2% for binary classification and by 1% for multi-class classification.
Afshar et al. [12] proposed a CapsNet-based framework for COVID-19 diagnosis. The framework consists of four convolutional layers and three capsule layers. The first convolutional layer is followed by a batch normalization layer, while the second convolutional layer is followed by an average pooling layer. The features from the fourth convolutional layer are reshaped and fed into the CapsNet. The dataset used to evaluate the framework is imbalanced; the number of positive cases is lower than the number of negative cases. Therefore, the loss function of the network was modified such that more weight is assigned to the positive samples. The weights are determined using a formula specified in [12]. The framework was evaluated by first pre-training it on a dataset containing 94,323 frontal-view chest X-ray images. The pre-trained network was then finetuned on a dataset containing 358 CXR COVID-19 images, 8,066 normal images, and 5,538 non-COVID-19 images. The framework achieved an accuracy, sensitivity, specificity, and AUC of 95.7%, 90%, 95.8%, and 0.97, respectively.
Heidarian et al. [13] proposed a fully automated two-stage framework for COVID-19 diagnosis using CapsNet and CT images, called COVID-FACT. In the first stage, COVID-FACT uses a U-Net architecture to detect infected slices in a 3D volumetric CT scan; the infected slices are classified in the second stage. Two variants of the framework were developed in the study: whole CT images are used as inputs to the first variant, while the segmented lung region is used as input to the second variant. COVID-FACT was trained on a dataset containing 171, 60, and 76 volumetric CT images of COVID-19, community-acquired pneumonia (CAP), and normal cases, respectively. Experiments show that the two variants produced the same classification accuracy of 90.82%. However, the variant trained on the segmented lung regions improved the sensitivity and AUC of the model by over 1.83% and 0.03, respectively.
Quan et al. [14] designed a COVID-19 diagnosis method using DenseNet121 and CapsNet. They also introduced a dataset pre-processing technique that reduces the impact of dataset heterogeneity on the performance of a network, and data augmentation was used to generate additional training data. The proposed framework uses a segmentation network, namely TernausNet [15], to extract the lung contour from X-ray images. The segmented lung contours are then fed into DenseNet121 for feature extraction, and the extracted features are fed into CapsNet for classification. The segmentation network was trained on the Montgomery County Chest X-ray Database [16], containing 80 normal and 50 tuberculosis X-ray images. The classification network was trained on a dataset from three sources, containing 781, 2917, 2884, and 2850 X-ray images of COVID-19, normal, pneumonia (virus), and pneumonia (bacteria) cases, respectively. The framework achieved a classification accuracy, sensitivity, and F1-score of 90.7%, 96%, and 90.9%, respectively.
Qi et al. [17] developed a fully automated pipeline for distinguishing COVID-19 from CAP using CT images. The pipeline consists of four modules. The first module uses LinkNet [18] to segment the lungs from CT images, and the second module uses CapsNet to select slices with lesions. The third module uses ResNet50 and CapsNet for slice-level prediction, and the fourth module uses DenseNet121 and CapsNet for patient-level prediction. The pipeline was trained on a dataset containing 161 CT images with COVID-19 and 100 CT images with CAP. The CapsNet with ResNet50 achieved a classification accuracy and AUC of 92.5% and 0.933, respectively, for slice-level prediction. The CapsNet with DenseNet121 achieved a better classification accuracy and AUC of 97.1% and 0.992, respectively, for slice-level prediction. The pipeline achieved an accuracy of 100% for patient-level prediction.
Attallah [19] proposed a CNN-based technique for COVID-19 diagnosis called RADIC. RADIC is divided into four stages. In the first stage, four radiomics methods are used to analyze CT and X-ray images, including the gray-level run-length matrix (GLRLM), gray-level co-occurrence matrix (GLCM), discrete wavelet transform (DWT), and dual-tree complex wavelet transform (DTCWT). The output of the analysis is then converted to heatmap images. In the second stage, the heatmap images are used to train three CNN models, namely MobileNet, DenseNet201, and Darknet53. After training, deep features are extracted from the batch normalization layers of the three models. Furthermore, the complexity of the extracted features is reduced using the fast Walsh-Hadamard transform (FWHT). The reduced features from the three CNN models are combined using the discrete cosine transform. Finally, the combined features are used to train different classification models, including a linear support vector machine (L-SVM), quadratic SVM, linear discriminant analysis (LDA), and ensemble subspace discriminant (ESD). The technique was evaluated on a CT and an X-ray dataset, producing classification accuracies of 99.4% and 99% on the two datasets, respectively.
Mercaldo et al. [20] designed a DL technique for COVID-19 diagnosis using VGG16. They added one fully connected layer to VGG16 and trained the added layer on a dataset containing 18,000 CT images; the model achieved an accuracy of 95%. In another study, Shah et al. [21] designed a CNN-based technique for COVID-19 diagnosis. They evaluated the model on 738 CT images, and it produced a classification accuracy of 82.1%. They also compared the performance of the proposed model to DenseNet169, VGG16, ResNet50, InceptionV3, and VGG19. The comparison shows that VGG19 outperformed the other techniques, achieving an accuracy of 94.52%.
Attallah and Samir [22] designed a DL-based pipeline for COVID-19 diagnosis using multilevel discrete wavelet transform (DWT) and three ResNet models (ResNet50, ResNet101, and ResNet18). DWT was used to analyze CT scans and generate heatmap images, which were then used to train the three ResNet models. After training, spectral-temporal features were extracted from the three models. The same ResNet models were also trained on the original CT images, and spatial features were extracted from them after training. The spatial features were then combined with the spectral-temporal features, and the dimension of the combined features was reduced. Finally, the reduced features were used to train three SVM models. The technique was evaluated on two datasets, achieving classification accuracies of 99.33% and 99.7%.
Attallah [23] proposed a framework for COVID-19 diagnosis using texture-based radiomic images. The author trained three ResNet models (ResNet18, ResNet50, and ResNet101) on two types of texture-based radiomic images: the first set was generated by the discrete wavelet transform, while the second set was generated by the gray-level co-occurrence matrix. After training, texture-based radiomic features were extracted from the trained models and combined using the discrete cosine transform. The fused features were used to train three SVM algorithms. The technique was evaluated on a dataset consisting of 2482 COVID-19 and normal CT images, and it achieved a classification accuracy of 99.60%. Zhao et al. [24] designed a technique for COVID-19 diagnosis using a modified version of the ResNet model. In the modified model, the authors substituted group normalization for batch normalization and performed weight standardization for all the convolutional layers. The model was evaluated on a dataset containing 194,922 images, and it achieved a classification accuracy of 99.2%.
Shankar and Perumal [25] proposed a novel technique for COVID-19 diagnosis, divided into three stages. In the first stage, Gaussian filtering is used for smoothing and noise removal. In the second stage, a fusion model is used to extract different sets of features from the processed images: handcrafted features using the local binary pattern model and DL features using the InceptionV3 model. In the third stage, the extracted features are fused and used to train a multilayer perceptron classifier. The technique was evaluated on an X-ray dataset consisting of 27 normal, 220 COVID-19, 11 SARS, and 15 Pneumocystis images, and it produced a classification accuracy of 94.08%. In another study, Marios et al. [26] presented an analysis of five DL algorithms for COVID-19 diagnosis: ResNet50, ResNet101, DenseNet121, DenseNet169, and InceptionV3. The models were trained on a dataset consisting of 11,956 COVID-19 X-ray images, 10,701 normal images, and 11,263 pneumonia images. The results show that ResNet101 achieved the best classification accuracy of 96%. Attallah [27] proposed a CNN-based method for COVID-19 diagnosis using spectral-temporal images. The method is divided into three stages. In the first stage, multilevel discrete wavelet transform (DWT) is used to analyze CT images and extract spectral-temporal images. The extracted images are then used to train three ResNet models. After training, deep features are extracted and fused together, the dimension of the fused features is reduced, and the result is used to train an SVM. The technique was evaluated on a dataset consisting of CT images, and it produced satisfactory accuracy. A summary of the literature review is presented in Table 1.

Limitations of Existing COVID-19 Diagnosis Models
As shown in the summary and in the literature surveys in [28,29], COVID-19 diagnosis models have some shortcomings. Tracking people infected with COVID-19 is a challenging task. Moreover, identifying patients infected with COVID-19 beforehand is difficult because COVID-19 has an incubation period of up to 14 days. Furthermore, some of the datasets used for training lack quality, as some of them are only available in an unstructured format. In addition, some of the datasets are too clean and lack representation of real-world data [28]. Moreover, the generalization performance of some of the proposed models is poor due to overfitting. Furthermore, most studies do not explore the use of unsupervised ML algorithms for COVID-19 diagnosis, such as principal component analysis (PCA) and cluster analysis [29]. Most studies also focus on DL algorithms such as CNN, while few studies have explored CapsNet. Furthermore, to the best of the authors' knowledge, no study has presented a performance analysis of CapsNet on images with different rotations and transformations, and existing studies have not compared CapsNet with other CNN-based techniques in terms of their ability to recognize randomly transformed and rotated images. Such a comparison is necessary, as one of the core advantages of CapsNet over CNN is its resistance to image rotations and transformations [2], as well as its ability to produce excellent results when trained on small datasets. This study aims to bridge some of the highlighted gaps. The main contributions of this study are as follows: (1) a novel CapsNet architecture (CapsNetCovid) for COVID-19 diagnosis is introduced; (2) a performance analysis of CapsNet on standard and augmented (rotated and transformed) CT and X-ray images is presented for binary and multi-class classification; and (3) CapsNetCovid is compared with CNN, DenseNet121, and ResNet50 in terms of their ability to recognize randomly transformed and rotated images.

Methodology
This study proposes a CapsNet architecture for COVID-19 diagnosis (CapsNetCovid). The architecture is shown in Figure 1. The same model was used for the CT and X-ray images. The model consists of a convolutional layer, a primary capsule layer, and a digit capsule layer. The convolutional layer is used to extract features from images; the primary capsule layer is used to learn the different parts and features of an image (such as orientation, size, pose, and texture) and the spatial relationships between the parts; and the digit capsule layer is used to perform the final classification. Specifically, the proposed model consists of three convolutional layers, sixteen primary capsule layers, and one digit capsule layer. The three convolutional layers were added to the network after performing experiments with different numbers of layers, kernels, and filter sizes. The convolutional layers help to extract effective and informative features for the primary capsule. The first and second convolutional layers consist of 256 kernels of size 3 × 3 with a stride of 1. The third convolutional layer consists of 512 kernels of size 3 × 3 with a stride of 2. The ReLU activation function is used for all the layers to introduce non-linearity into the model and mitigate the vanishing gradient problem.
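For illustration, the effect of these kernel and stride settings on the feature-map size can be traced with the standard convolution output-size formula. The padding values below are assumptions made for the sketch, since the text does not state them:

```python
def conv_out_size(size, kernel, stride, padding):
    # Standard convolution output-size formula: floor((n + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

# Trace a 224x224 input through the three convolutional layers described
# above, assuming "same"-style padding of 1 for the 3x3 kernels.
side = 224
side = conv_out_size(side, kernel=3, stride=1, padding=1)  # layer 1: 224
side = conv_out_size(side, kernel=3, stride=1, padding=1)  # layer 2: 224
side = conv_out_size(side, kernel=3, stride=2, padding=1)  # layer 3: 112
```

Under these assumptions, only the stride-2 layer halves the spatial resolution of the feature maps.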
Initially, images are passed through the three convolutional layers. The images are resized to 224 × 224, a size chosen after experimenting with different image sizes. The output from the convolutional layers is passed to 16 primary capsule layers, where each capsule contains an 8D vector. The capsule layer applies a convolutional operation with a 9 × 9 kernel and then squashes the output to obtain a capsule. The output of the capsule layer is passed to a digit capsule layer containing a 16D vector per class. This layer is used to classify the CT images into two classes (COVID-19 and normal) and the X-ray images into three classes (COVID-19, normal, and pneumonia).
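The squashing step mentioned above can be sketched as follows. This is a minimal NumPy illustration of the standard capsule squash non-linearity introduced by Sabour, Frosst, and Hinton, not the authors' implementation:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-7):
    # Squashing non-linearity used between capsule layers: short vectors
    # shrink toward zero length, long vectors approach (but never reach)
    # unit length, and the vector's direction is preserved.
    sq_norm = np.sum(np.square(s), axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)
```

Because the output length lies in [0, 1), it can be interpreted as the probability that the entity represented by the capsule is present.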
Another CNN model was designed in this study for the purpose of comparison. The CNN model consists of two convolutional layers, one fully connected layer, and one output layer. The proposed architecture was also compared with DenseNet121 and ResNet50. The output of DenseNet121 and ResNet50 was passed through two fully connected layers and one output layer. The output layer consists of two neurons for the binary classification and three neurons for the multi-class classification. Pooling and dropout layers were also used to improve the computation speed and prevent overfitting. Note that only the added layers were finetuned. The number of layers and parameters for the CapsNet model, the CNN model, and the two pre-trained models were selected after performing a series of experiments. More information about the parameters is presented in Tables 2-4.
After training, the trained CapsNetCovid was saved and used in the subsequent experiments, in which the saved model was evaluated on the eight augmented datasets. Note that the augmented datasets were not used to train CapsNetCovid; they were only used to evaluate the pre-trained CapsNetCovid. We did this to assess CapsNetCovid's ability to distinguish precisely between standard, flipped, shifted, and rotated images, and to evaluate its ability to recognize augmented images even though it was not exposed to such images during training. The same procedure was carried out for CNN, DenseNet121, and ResNet50: the models were trained, validated, and tested on the original datasets; after training, their trained weights were saved and evaluated on the eight augmented datasets.

Dataset
Two types of datasets are used in this study. The first dataset type consists of standard images, while the second dataset type consists of augmented/transformed images. Standard images/datasets in this study refer to images/datasets that have not been transformed (rotated or shifted).

Standard Dataset
Two datasets with standard images were used in this study. The first dataset was obtained from different sources, including the China National Center for Bio-information [31], National Institutes of Health Intramural Targeted Anti-COVID-19 [32], Negin Radiology Medical Center [33], Union Hospital and Liyuan Hospital of Huazhong University of Science and Technology [34], the COVID-19 CT Lung and Infection Segmentation initiative [35], and the Radiopaedia collection [36]. This dataset (called COVID-Net CT-2) was created by Gunraj et al. [37]; readers are referred to [37] for more information on the dataset. A subset of the COVID-Net CT-2 dataset is used in this study, consisting of 14,000 CT images (9000 COVID-19 images and 5000 non-COVID-19 images). Samples of the dataset are shown in Figure 2. The second dataset was created by researchers at the University of Qatar [38,39]. It consists of 3616 COVID-19 X-ray images, 10,192 normal X-ray images, and 1345 pneumonia X-ray images. The dataset is publicly available and can be downloaded from [40].

Augmented Datasets
Eight new augmented datasets were generated from the original CT and X-ray datasets. The Keras ImageDataGenerator class was used to generate the augmented datasets. The first four augmented datasets consist of 14,000 randomly flipped CT images, 14,000 randomly shifted CT images, 14,000 CT images rotated randomly by 45 degrees, and 14,000 CT images rotated randomly by 90 degrees. The last four augmented datasets consist of 15,153 randomly flipped X-ray images, 15,153 randomly shifted X-ray images, 15,153 X-ray images rotated randomly by 45 degrees, and 15,153 X-ray images rotated randomly by 90 degrees. More details on the datasets are provided in Table 5. Additionally, samples from the standard and augmented CT and X-ray datasets are shown in Figures 2 and 3, respectively.
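The flip, shift, and 90-degree-rotation families can be illustrated with simple NumPy operations. This is an illustrative sketch only: the actual datasets were produced with Keras's ImageDataGenerator, and 45-degree rotations additionally require interpolation (e.g., scipy.ndimage.rotate), so only the exact, lossless transformations are shown:

```python
import numpy as np

def augment(img, mode):
    # Toy stand-ins for three of the four augmentation families used in
    # the study; mode names are illustrative, not the study's labels.
    if mode == "flip":
        return np.fliplr(img)                            # horizontal flip
    if mode == "shift":
        return np.roll(img, shift=(2, 2), axis=(0, 1))   # wrap-around shift
    if mode == "rot90":
        return np.rot90(img)                             # exact 90-degree rotation
    raise ValueError(f"unknown mode: {mode}")
```

All three operations preserve the image shape and pixel values, which makes them convenient for probing a trained model's robustness without altering image content.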


Data Pre-Processing and Experimental Setup
During the pre-processing stage, the images' pixel values were scaled to the range 0 to 1 by dividing them by 255, since 255 is the maximum possible pixel value of an 8-bit image. The images were also resized to 224 × 224 and used as inputs to the CapsNet model. Eighty percent of the dataset was used for training, while the remaining twenty percent was used to test the model. During training, 20 percent of the training images were used to validate the training performance. All the experiments were conducted on a computer cluster with the following specifications: 2 × Intel Xeon E5-2697A v4 processors with 512 GB of 2.4 GHz DDR4 memory.
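The scaling and splitting protocol described above can be sketched as follows. The function name, NumPy-array interface, and fixed seed are assumptions made for illustration; resizing to 224 × 224 is assumed to happen upstream:

```python
import numpy as np

def preprocess_and_split(images, labels, seed=42):
    # Scale pixels to [0, 1] (255 is the maximum 8-bit pixel value), then
    # split 80/20 into train/test and hold out 20% of the training set for
    # validation, mirroring the protocol described above.
    x = images.astype("float32") / 255.0
    idx = np.random.default_rng(seed).permutation(len(x))
    n_test = int(0.2 * len(x))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    n_val = int(0.2 * len(train_idx))
    val_idx, tr_idx = train_idx[:n_val], train_idx[n_val:]
    return ((x[tr_idx], labels[tr_idx]),
            (x[val_idx], labels[val_idx]),
            (x[test_idx], labels[test_idx]))
```

With this split, a dataset of 100 images yields 64 training, 16 validation, and 20 test samples.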

Performance Measures
Five performance measures were used to evaluate the performance of the models, namely accuracy, precision, sensitivity, F1-score, and area under the ROC curve (AUC-ROC). The first four metrics can be calculated using Equations (1)-(4). The five metrics are influenced by the number of true negatives (TNs), true positives (TPs), false negatives (FNs), and false positives (FPs).
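A minimal sketch of the four confusion-matrix-based metrics behind Equations (1)-(4), using their standard definitions:

```python
def classification_metrics(tp, tn, fp, fn):
    # Standard definitions: accuracy, precision, sensitivity (recall),
    # and F1-score, computed from confusion-matrix counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, f1
```

For example, a confusion matrix with 8 true positives, 8 true negatives, 2 false positives, and 2 false negatives yields 0.8 for all four metrics.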
AUC-ROC is a measure showing the efficacy of a model in separating different classes. A high AUC indicates that the model is performing well, while a low AUC indicates otherwise.
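AUC-ROC can also be computed directly from its rank interpretation, i.e., the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The following brute-force sketch (quadratic in the number of samples, for illustration only) assumes binary labels and real-valued scores:

```python
def roc_auc(scores, labels):
    # Rank interpretation of AUC-ROC: fraction of positive/negative pairs
    # in which the positive sample is scored higher (ties count as half).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 corresponds to perfect separation of the two classes, while 0.5 corresponds to random scoring.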

Results and Discussion
Different experiments were performed to evaluate the performance of the proposed CapsNet model. This section presents the results and discussion. This section also presents a comparative analysis between CapsNetCovid and CNN, ResNet50, DenseNet121, and two existing studies.

Performance of CapsNetCovid for Binary Classification
Tables 6-10 and Figure 4 show the performance of CapsNetCovid on COVID-19 CT scans. As shown, CapsNetCovid achieved a test accuracy of 99.929%, meaning that it misclassified less than 0.1% of the CT images in the test dataset. Tables 7 and 8 show the precision and sensitivity produced by the model during evaluation. CapsNetCovid achieved a precision, sensitivity, and F1-score of 99.887%, 100%, and 99.316%, respectively. The sensitivity of 100% shows that the proposed model correctly classified all the COVID-19 samples, making it a good fit for medical diagnosis; it is crucial in the medical field to develop a model with a high degree of sensitivity. The precision of 99.887% reflects the quality of the predictions: nearly all samples predicted as COVID-19 were truly positive. The F1-score of 99.316% shows that the proposed model maintained a good balance between precision and sensitivity across the COVID-19 and normal samples in the evaluated dataset. Table 10 shows the AUC scores produced by the proposed model, and Figure 5 shows the ROC curves and their macro average with AUC scores. As shown, the proposed model performed well, with an AUC of 100% for the two classes, indicating that it correctly distinguished all the COVID-19 and normal CT images in the original dataset. This is useful to medical practitioners: a false positive result can lead to unnecessary procedures and treatments, while a false negative result can prevent a patient from receiving the necessary treatment, which can lead to the death of a patient.


Performance of CapsNetCovid on Augmented Datasets for Binary Classification
Tables 6-10 also show the performance of CapsNetCovid on the augmented datasets. As shown, CapsNetCovid produced a classification accuracy of 71.075%, 84.935%, 87.114%, and 80.584% for the RandomShift, RandomFlip, Rotated_45, and Rotated_90 datasets, respectively. The results show that the CapsNet is able to correctly identify a significant proportion of the augmented variants of the images it was trained on, demonstrating its resistance to image transformations and its ability to generate accurate results without additional data. Table 10 also shows that CapsNetCovid produced AUC scores of 0.61, 0.81, 0.81, and 0.72 for the RandomShift, RandomFlip, Rotated_45, and Rotated_90 datasets, respectively. This indicates that CapsNetCovid's ability to reliably distinguish between COVID-19 and normal CT images decreased on the augmented data. The generalization performance of the CapsNet could be improved if it were exposed to augmented images during training. In addition, as demonstrated by the results, CapsNetCovid's performance varies across image transformations: the CapsNet is more robust at recognizing randomly rotated and randomly flipped images than randomly shifted images. This shows that the robustness of the CapsNet depends on the type and degree of image transformation. More work is required to improve the generalization performance of CapsNet on augmented medical images, which presents an opportunity for future research.

Comparative Analysis of CapsNetCovid with CNN-Based Techniques on Binary Classification
One of the key advantages of the CapsNet over CNN is its ability to handle affine transformations and rotations better than CNN. In view of this, we trained CNN, DenseNet121, and ResNet50 on the same COVID-19 dataset and compared their performance to that of CapsNetCovid. The results are shown in Tables 6-10 and Figures 6-8. As shown, CapsNetCovid outperformed CNN on the standard and rotated datasets, producing better classification accuracy, precision, sensitivity, and F1-score than CNN in most cases. This indicates that the CapsNet is more robust than CNN in identifying randomly rotated and transformed images without data augmentation: a CNN must be trained on all orientations of the images to achieve very good results, whereas the CapsNet can detect and learn all orientations from a single image using a single capsule. In addition, it should be noted that the CapsNet is a recent DL algorithm; CNN existed before the CapsNet and has undergone numerous improvements over the years. It is therefore notable that the CapsNet outperforms CNN in most cases.
CapsNetCovid was also compared with two state-of-the-art CNN pre-trained models, namely DenseNet121 and ResNet50. The two models were finetuned on the COVID-19 datasets used in this study. After training, the finetuned models were saved and evaluated on the four augmented datasets. The results of these experiments are reported in Tables 6-10. As shown in the tables, CapsNetCovid produced better classification accuracy, sensitivity, precision, and F1-score than DenseNet121 and ResNet50 on the original dataset, and it also outperformed DenseNet121 and ResNet50 on the augmented datasets in most cases. In addition, CapsNetCovid produced a higher AUC score than DenseNet121 and ResNet50 for the RandomFlip and Rotated_45 datasets, and a higher AUC score than ResNet50 for the Rotated_90 dataset. This demonstrates that the CapsNet is superior to CNN at handling transformed images. Note that DenseNet121 and ResNet50 have already been trained on a large-scale dataset (ImageNet) containing over 1.2 million images; nonetheless, the CapsNet still performed better than the two models. This demonstrates the capability of the CapsNet to handle small and augmented medical image datasets without data augmentation techniques.
Figures 9-11 show the ROC curves for CNN, DenseNet121, and ResNet50. As shown, CapsNetCovid outperforms the AUC scores of CNN, DenseNet121, and ResNet50 by 0.01%, 0.16%, and 0.23% for both normal and COVID-19 CT images. This shows that CapsNetCovid is more effective at distinguishing between positive and negative classes than the three compared CNN-based models.
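The AUC comparison above can be reproduced from raw model scores. As a minimal sketch (illustrative names, not the paper's code), AUC equals the Mann-Whitney rank statistic: the probability that a randomly chosen COVID-19 (positive) image receives a higher score than a randomly chosen normal (negative) image, with ties counted as half:

```python
def auc_score(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This O(n²) form is fine for small evaluation sets; for large datasets, a rank-sort implementation (as in scikit-learn's `roc_auc_score`) gives the same value in O(n log n).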

Performance of CapsNetCovid on Multi-Class Classification
The proposed technique was applied to a dataset with three classes: COVID-19, normal, and pneumonia. Figure 12 and Tables 11-15 show the performance of CapsNetCovid on the multi-class dataset. As shown, CapsNetCovid achieved a classification accuracy, precision, sensitivity, and F1-score of 94.721%, 93.864%, 92.947%, and 93.386%, respectively. The accuracy shows that the proposed model correctly predicted over 94% of the images in the dataset. Figure 13 shows that CapsNetCovid also produced an AUC score of 95.21%. This shows that the model has a strong ability to distinguish between COVID-19, normal, and pneumonia X-ray images. CapsNetCovid correctly predicted 95% of normal X-ray scans, 96% of pneumonia scans, and 95% of COVID-19 X-ray scans. Tables 12-14 show the precision, sensitivity, and F1-score of CapsNetCovid. As shown, CapsNetCovid produced a precision, sensitivity, and F1-score of 93.864%, 92.947%, and 93.386%, respectively. The high F1-score shows that the model has good generalization performance and performs well for the normal, COVID-19, and pneumonia classes. The high sensitivity shows that the model correctly identified most of the COVID-19 and pneumonia cases, which is important because failing to diagnose a patient with COVID-19 or pneumonia can be catastrophic; medical practitioners prefer models with high sensitivity over models with high accuracy. The high precision shows that the CapsNetCovid model is 93.864% correct when it predicts an image to be COVID-19 or pneumonia.
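The macro-averaged metrics quoted above follow mechanically from the model's 3x3 confusion matrix over the normal / pneumonia / COVID-19 classes. The following is an illustrative computation (not the authors' code), where `cm[i][j]` counts class-`i` images predicted as class `j`:

```python
def macro_metrics(cm):
    """Return (accuracy, macro precision, macro sensitivity, macro F1)."""
    k = len(cm)
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, truly other
        fn = sum(cm[c]) - tp                       # truly c, predicted other
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = sum(map(sum, cm))
    acc = sum(cm[c][c] for c in range(k)) / n
    return acc, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

Macro averaging weights each class equally, which is why a model can score 94.7% accuracy while its macro precision and sensitivity sit slightly lower when one class is harder than the others.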
It was observed that the performance of CapsNetCovid decreased from 99.929% to 94.721% when applied to multi-class classification. This decrease could be due to the quality of images in the dataset or the change in image modality, which may indicate that the CapsNet performs better on CT images than on X-ray images. The reduction may also be due to the multi-class nature of the dataset, which may indicate that the CapsNet performs better on binary classification than on multi-class classification. More experiments are required to confirm the reason(s) for the decrease in performance. Overall, the proposed model performed well on the original X-ray images.

Comparative Analysis of CapsNetCovid with CNN-Based Techniques on Multi-Class Classification
Note that DenseNet121 and ResNet50 are pre-trained on the ImageNet dataset containing over 1.2 million images. This shows that the CapsNet does not need to be trained on large-scale datasets to outperform CNN-based models. The results also show that CapsNetCovid produced a higher F1-score, precision, sensitivity, and AUC score than the compared CNN-based techniques in most cases. This indicates that the proposed technique has a better ability to correctly predict COVID-19 and pneumonia X-ray scans than CNN, DenseNet121, and ResNet50, and suggests that the CapsNet will be more acceptable to medical practitioners than CNN, especially when working with small datasets, which is often the case for medical image datasets. Figures 14-16 and Tables 11-15 show the performance of CNN, DenseNet121, and ResNet50 on the multi-class dataset. As shown, CapsNetCovid outperforms the three models in terms of classification accuracy and AUC score; it outperforms CNN, DenseNet121, and ResNet50 by 5.18%, 4.52%, and 26.36%, respectively. This shows that CapsNetCovid is better than CNN at correctly distinguishing between COVID-19, pneumonia, and normal X-ray images without using data augmentation. It also shows that the proposed technique outperformed the compared CNN-based techniques in correctly identifying COVID-19 and pneumonia cases. The proposed model will be a good fit for medical practitioners, as its predictions for COVID-19, pneumonia, and normal X-ray images are satisfactory.

Figures 17-19 show the ROC curves produced by CNN, DenseNet121, and ResNet50 for multi-class classification. As shown, CapsNetCovid outperformed CNN by 0.09%, 0.05%, and 0.11% for normal, pneumonia, and COVID-19 images, respectively. It outperformed DenseNet121 by 0.07%, 0.02%, and 0.08%, and ResNet50 by 0.45%, 0.46%, and 0.45%, for normal, pneumonia, and COVID-19 images, respectively. This shows that CapsNetCovid is more effective at correctly predicting COVID-19, pneumonia, and normal X-ray images than CNN, DenseNet121, and ResNet50.

Performance of CapsNetCovid on Augmented Dataset for Multi-Class Classification
As aforementioned, the proposed technique was evaluated on four augmented X-ray datasets containing 15,153 randomly flipped, randomly rotated, and randomly shifted X-ray images. The results are reported in Tables 11-15. As shown, the performance of CapsNetCovid decreased when evaluated on the augmented images. This is because the model was not exposed to any of the augmented images during training. CapsNetCovid was expected to successfully recognize a larger percentage of the augmented versions of the images it was trained on; however, as the results show, that was not the case. This shows that the robustness of the CapsNet to affine transformations requires improvement, especially for multi-class classification. This is an opportunity for future research.
As shown in Tables 11-15, the performance of CapsNetCovid varies across randomly flipped, randomly rotated, and randomly shifted images. It achieved higher classification accuracy for randomly flipped and randomly rotated images, which shows that the CapsNet is more resistant to randomly flipped and rotated images than to randomly shifted images. CapsNetCovid also produced a higher AUC score for randomly flipped and rotated images, correctly predicting more randomly flipped COVID-19 and pneumonia images than normal images. The results also show that CapsNetCovid performed better on images randomly rotated by 45 degrees than on images rotated by 90 degrees. This shows that the robustness of the CapsNet to image rotation is limited by the degree of rotation.
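The per-transformation comparison above amounts to re-measuring accuracy on transformed copies of the same held-out test set, so that the drop relative to the untransformed accuracy isolates the effect of each transformation. A small harness can sketch this (illustrative; `predict` is a placeholder standing in for a trained model, not an API from the paper):

```python
def accuracy_under_transform(predict, images, labels, transform=None):
    """Accuracy of `predict` on the test set, optionally transformed first.

    predict:   callable mapping one image to a class label
    transform: callable mapping one image to its augmented variant, or None
    """
    correct = 0
    for img, y in zip(images, labels):
        x = transform(img) if transform else img
        if predict(x) == y:
            correct += 1
    return correct / len(images)
```

Running this once with `transform=None` and once per augmentation (flip, shift, each rotation angle) produces exactly the kind of robustness table reported in Tables 11-15.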

Comparative Analysis of CapsNetCovid with CNN-Based Techniques on the Augmented Multi-Class Dataset
Tables 11-15 also show the performance of CNN, DenseNet121, and ResNet50 on the augmented X-ray images. As shown, the performance of the three models also decreased. CapsNetCovid produced better accuracy than CNN for randomly flipped and randomly rotated images. Furthermore, although DenseNet121 and ResNet50 produced higher classification accuracy than CapsNetCovid, the proposed model produced better precision, sensitivity, and F1-score than DenseNet121 and ResNet50. This shows that the CapsNet is more robust than CNN-based techniques in correctly identifying COVID-19 and pneumonia images. The high classification accuracy of DenseNet121 and ResNet50 is most likely because the two models were pre-trained on over 1.2 million normal and augmented images. This suggests that data augmentation can be used to improve the robustness and generalization performance of the CapsNet under image transformations. This is supported by the performance of the CNN model, which was not previously trained on augmented images and performed worse than the CapsNet, DenseNet121, and ResNet50.
Furthermore, as shown in the results, CapsNetCovid outperforms CNN, DenseNet121, and ResNet50 in terms of precision, sensitivity, and F1-score. This shows that the CapsNet is more robust to image rotations and affine transformations than the compared CNN-based techniques. Figure 13 shows the ROC curves of CapsNetCovid for the three classes and their macro average. As shown, the proposed model produced a better AUC score for standard images than for augmented images, which shows that the performance of the CapsNet can be improved if it is exposed to augmented images during training. The ROC curves for CNN, DenseNet121, and ResNet50 are shown in Figures 17-19. As shown, CapsNetCovid produced a better AUC score than CNN and ResNet50, which shows that it outperforms the two models in correctly predicting COVID-19 and pneumonia images.

Comparison of CapsNetCovid with Related Studies
The proposed technique was compared with existing state-of-the-art COVID-19 diagnosis techniques: 10 binary classification techniques and 11 multi-class classification techniques. The results are reported in Tables 16 and 17. As shown in the tables, the proposed technique outperformed all the compared techniques for binary classification and most of the techniques for multi-class classification. It is worth highlighting that some of the compared techniques combined CNN pre-trained models with a CapsNet; notwithstanding, the proposed CapsNetCovid model still outperformed most of them. For example, Tiwari and Anurag [41] proposed a CapsNet architecture for COVID-19 diagnosis from CT scans in which different CNN pre-trained models were hybridized with a CapsNet. As shown in the results, CapsNetCovid performed slightly better than DenseCapsNet. It should be noted that DenseCapsNet is an aggregation of a CapsNet and DenseNet121, implying that it is already pre-trained on the ImageNet dataset with millions of images; despite this, CapsNetCovid still produced comparable results. Other studies combined CNN and SVM, CNN and CapsNet, or optimization techniques and InceptionV3; nevertheless, the proposed model still outperformed them.

Summarized Results and Deductions
Different experiments were performed in this study, and their results are presented in Sections 4.1-4.8. As shown in the results, CapsNetCovid performed differently for CT and X-ray images. This section summarizes all the results and presents deductions from them.

•
The results show that CapsNetCovid performs well on standard X-ray and CT images. It produced better accuracy when trained and evaluated on CT images for binary classification; its performance slightly decreased when trained and evaluated on X-ray images for multi-class classification. Overall, the proposed technique produced very good accuracy, sensitivity, F1-score, and AUC score when trained on standard images without data augmentation. The proposed technique also performs well on small medical image datasets. This is because a CNN model must be trained on all orientations of the images to achieve very good results, whereas the CapsNet can detect and learn all orientations from a single image using a single capsule.

•
The results show that CapsNet is able to correctly identify a large proportion of the augmented variants of the images it was previously trained on, especially for binary classification. This demonstrates the CapsNet's resistance to image transformations and its ability to achieve good results without data augmentation techniques.

•
The performance of the CapsNet decreased when evaluated on the augmented variants of images it was previously trained on. This decrease was larger for X-ray images and multi-class classification, indicating that the CapsNet is more resistant to image rotations and transformations for binary classification than for multi-class classification.

•
The results show that CapsNetCovid outperforms CNN, DenseNet121, and ResNet50 when trained and evaluated on CT and X-ray images without data augmentation. This indicates that the CapsNet is an excellent choice when working with small datasets for both binary and multi-class classification.

•
The CapsNet outperforms CNN, DenseNet121, and ResNet50 when evaluated on an augmented CT image dataset with two classes (binary classification), beating the CNN-based techniques in terms of classification accuracy, sensitivity, F1-score, and AUC score. Furthermore, although DenseNet121 and ResNet50 outperform the CapsNet in terms of classification accuracy, the CapsNet produced better precision, sensitivity, and F1-score than CNN, DenseNet121, and ResNet50 when evaluated on an augmented X-ray dataset with three classes (multi-class classification). This suggests that medical practitioners will favor the CapsNet over CNN, given the significance of high sensitivity and F1-score in the medical domain. The higher classification accuracy of DenseNet121 and ResNet50 is most likely because the two models are pre-trained on a dataset with over 1.2 million normal and augmented images. This suggests that data augmentation can be used to improve the performance of the CapsNet for multi-class classification.

•
The results show that the CapsNet produces a better AUC score than CNN, DenseNet121, and ResNet50 for both binary and multi-class classification problems. This shows that the CapsNet has a better ability to distinguish between positive and negative classes, which is remarkable.
Overall, as shown in all the reported results, the proposed CapsNet model produced very good results on small medical image datasets, and it outperformed CNN, DenseNet121, and ResNet50 at classifying both standard and augmented CT and X-ray images. Moreover, Figures 4 and 12 show the training and validation loss of CapsNetCovid. As shown, the training and validation curves nearly overlap, indicating no significant gap between training and validation loss or accuracy. This shows that the CapsNet model did not overfit.
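The "nearly overlapping curves" argument above can be made quantitative by measuring the mean train-minus-validation accuracy gap over the final epochs; a gap near zero is consistent with no overfitting, while a large positive gap signals it. This is an illustrative check only (the window size and any threshold are our assumptions, not values from the paper):

```python
def overfit_gap(train_acc, val_acc, window=3):
    """Mean (train - validation) accuracy gap over the last `window` epochs.

    train_acc, val_acc: per-epoch accuracy histories of equal length.
    A value near 0 indicates overlapping curves; a large positive value
    indicates the model fits the training set better than unseen data.
    """
    t = train_acc[-window:]
    v = val_acc[-window:]
    return sum(a - b for a, b in zip(t, v)) / len(t)
```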

Conclusions
The COVID-19 pandemic remains a threat, with multiple waves causing significant damage to the health of millions of people around the world. This study developed a CapsNet model (named CapsNetCovid) for COVID-19 diagnosis using CT and X-ray images. The model achieved a classification accuracy, precision, sensitivity and F1-score of 99.929%, 99.887%, 100%, and 99.319%, respectively, for CT images. Moreover, it achieved a classification accuracy, precision, sensitivity, and F1-score of 94.721%, 93.864%, 92.947%, and 93.386%, respectively, for the X-ray dataset. CapsNetCovid was compared with a CNN model designed for the purpose of comparison, and it outperformed the model on both standard and augmented CT and X-ray images. CapsNetCovid was also compared with two state-of-the-art pre-trained models, namely DenseNet121 and ResNet50. CapsNetCovid outperformed the two models for the standard CT and X-ray image dataset.
Moreover, the results show that CapsNetCovid is more resistant to image rotations and affine transformations than CNN, DenseNet121, and ResNet50 for CT and X-ray images. The results also show that the CapsNet is more resistant to image rotations and transformations for binary classification than for multi-class classification, and that it performs better on randomly rotated and flipped images than on shifted images. The results further suggest that data augmentation can be used to improve the performance of the CapsNet for multi-class classification, as well as its overall generalization performance. Future research can focus on improving the generalization performance of the CapsNet and its robustness to image rotations and transformations, especially for multi-class classification problems.

Data Availability Statement:
Publicly available datasets were analyzed in this study. The data can be found here: [37][38][39].

Conflicts of Interest:
The authors declare no conflict of interest.