A Dual-Stage Vocabulary of Features (VoF)-Based Technique for COVID-19 Variants’ Classiﬁcation

Abstract: Novel coronavirus, known as COVID-19, is a highly dangerous virus. Initially detected in China, it has since spread all over the world, causing many deaths. There are several variants of COVID-19, which have been categorized into two major groups: variants of concern and variants of interest. Variants of concern are more dangerous, and there is a need for a system that can detect and classify COVID-19 and its variants without physical contact with an infected person. In this paper, we propose a dual-stage deep learning framework to detect and classify COVID-19 and its variants using CT scans and chest X-ray images. Initially, detection is performed by a convolutional neural network; then, spatial features are extracted with deep convolutional models, while handcrafted features are extracted with several handcrafted descriptors. The spatial and handcrafted features are combined into a single feature vector, called the vocabulary of features (VoF) because it contains both spatial and handcrafted features. This feature vector is fed as input to a classifier to classify the different variants. The proposed model is evaluated in terms of accuracy, F1-score, sensitivity, specificity, Cohen’s kappa, and classification error. The experimental results show that the proposed method outperforms the existing state-of-the-art methods.


Introduction
Coronavirus, known as COVID-19, is a deadly virus that was discovered in Wuhan, China, in December 2019 and swiftly spread all over the world, taking the lives of millions of people. The World Health Organization (WHO) declared it a global pandemic [1]. It has several variants, which are categorized into three major groups: variants of concern, variants of interest, and variants under monitoring. Variants of concern can cause an increase in transmission; these variants are alpha (α), beta (β), gamma (γ) [2], and delta (δ) [3]. On the other hand, variants of interest such as lambda (λ) and mu (µ) can cause community transmission or multiple clusters. The epsilon (ε), eta (η), iota (ι), and kappa (κ) variants were downgraded to variants under monitoring, while zeta (ζ) and theta (θ) have not been formally labeled [4]. The variants of concern are more dangerous and cause death, and they transmit from one individual to another through physical contact. The most recent and dangerous variant of COVID-19 is the delta variant, which emerged in India; it is highly contagious and has the potential to evade some types of antibodies. Figure 1 shows the variants of concern and their places of emergence. It is therefore necessary to develop a system that detects COVID-19 and classifies its variants without any physical contact with the infected person. Medical professionals are focused on developing technology to fight this deadly virus, which has taken the lives of many people around the world. Artificial intelligence is one of the most promising technologies for detecting this virus, and many systems have been developed to detect COVID-19 [5]. One of the most widely used approaches to detect COVID-19 from chest X-ray images is the convolutional neural network (CNN) [6]. Several state-of-the-art techniques using machine learning and deep learning have been proposed for disease classification and prognostics.
In [7], the authors proposed a machine learning framework for the prediction of brain strokes using brain images. Similarly, in [8], a machine learning algorithm was proposed for the prediction of heart diseases based on electrocardiogram (ECG) signals. In [9], the authors introduced a novel framework for evaluating the outcome of brain neurons after a stroke and demonstrated the use of machine learning algorithms for quantitative evaluation. The authors of [10] demonstrated a framework for feature extraction from electroencephalogram (EEG) signals. The use of machine learning and deep learning has thus proven proficient for the detection and classification of several diseases.
In this paper, we propose a vocabulary of features (VoF)-based deep learning framework to detect COVID-19 and classify its variants. The framework is divided into two stages: the detection of COVID-19 and the classification of its variants. Detection is performed through a series of preprocessing and deep learning operations; after detection, the COVID-19 strain is classified into its variant. For the classification, the vocabulary of features (VoF) is used, and the VoF vectors are used to train the models. Several VoFs were evaluated, and the results verify that the proposed method gives the best performance.
The rest of the paper is organized as follows: Section 2 explains a comprehensive literature review, and Section 3 illustrates the proposed methodology. In Section 4, a comparative analysis of experimental results is presented, while in Section 5, a brief conclusion is drawn.

Literature Review
Coronavirus is a highly dangerous virus, and many systems have been developed to detect COVID-19. The analysis of radiology images to assist in detecting coronavirus has received considerable attention from researchers around the world. In [11], a deep convolutional neural network-based technique was proposed to detect the virus; the authors fused two datasets into a combined dataset, and the model achieved an accuracy of 98.7%. Another model used for classification is the support vector machine (SVM). The researchers studied different neural networks, and among them, ResNet-50 emerged as the best, with an accuracy of 95%; however, this approach is complex and requires a massive dataset as well as abundant execution time [12]. Furthermore, Narin et al. evaluated three pre-trained CNN architectures, Inception-v3, Inception-ResNet-v2, and ResNet-50, for detecting COVID-19-positive cases from X-rays. A 98% accuracy was achieved for ResNet-50, whereas an 87% accuracy was obtained for Inception-ResNet-v2, and Inception-v3 reached an accuracy of 97%. The drawback of these models is that they used only 100 images for examination, so their performance may decline on a larger dataset [12]. Another method proposed for the detection of COVID-19, named COVID-Net, classified several classes of illness: COVID-19, severe pneumonia, and normal. However, this approach gained only 92.4% accuracy, which is lower than that of other techniques [13]. Another article used a SqueezeNet model and achieved an accuracy of 98.3%; however, this model is not recommended, as it requires high processing power and speed [14].
Many techniques have been discussed for the detection of coronavirus pneumonia, among which some apply deep learning to computed tomography (CT) images, achieving up to 89.5% accuracy, 87% sensitivity, and 88% specificity [15]. An advanced deep learning technique that automatically detects COVID-19 was developed using X-ray images and a DCNN-based Inception-V3 model, achieving an accuracy of up to 98% [16].
In [17], COVID-19 was detected using X-ray images; three algorithms were applied to those images, and a 95–99% F-score was successfully achieved. In [18], an exemplar model was developed that first applies a fuzzy tree transformation and then a multi-kernel local binary pattern. Features are then detected from the images and processed through algorithms such as the support vector machine (SVM) and decision tree. SVM produced the best results, with an accuracy of about 97.01%.
Similarly, in [19], a CNN model called CoroDet is proposed to detect COVID-19 from raw CT scan and X-ray images. This model is mainly used for three different types of classification, achieving accuracies of 94.2%, 99.1%, and 91.2% for the three-class, two-class, and four-class classifications, respectively. In [19], a deep learning model is introduced whose architecture relies on the ResNet-101 CNN; it recognizes objects from millions of images and then detects anomalies in chest X-ray images, with an accuracy of about 71.9%. In [20], another automatic model for COVID-19 detection is presented using X-ray images, addressing both binary and multi-class classifications. A real-time object detection system named DarkNet, which uses 17 convolutional layers, was employed; the accuracy achieved was 87.02% and 98.08% for the multi-class and binary classifications, respectively. Ahsan et al. proposed a novel deep learning-based technique for the detection of COVID-19 from chest X-rays and CT scans, achieving accuracies of 82.94% and 93.94%, respectively [21].
CT scans can also be used for the detection of coronavirus. In [22], the authors demonstrated the presence of COVID-19 in the CT scan of a 44-year-old patient; lesions in the lungs can be observed in the virus-affected CT scan. Similarly, in [23], the authors illustrated the detection of the novel coronavirus in a CT scan by detecting the presence of non-invasive fluid. The authors of [24] presented recent advances and emerging techniques for the detection of the virus.
From the literature, it is evident that DCNNs are effective in the detection of COVID-19. Motivated by this, we propose a novel algorithm to detect COVID-19 from chest X-rays and CT scans and then classify it into its variants. The following are the major contributions of this article.

• Detection of COVID-19 from CT scans and chest X-rays;
• Classification of COVID-19 variants based on a unique vocabulary of features (VoF) technique;
• Comparison of the proposed method with state-of-the-art techniques.
The following section explains the proposed methodology in detail.

Proposed Methodology
Chest X-rays and CT scans are utilized in this methodology to detect the presence of COVID-19. Initially, the dataset is preprocessed: all input images are resized to an equal size, and then the 2D discrete wavelet transform (DWT) is applied, which gives the wavelet decomposition of the 2D images. These preprocessed images are used as input for the deep convolutional neural network. If the output of the first CNN is COVID-19, then handcrafted features are extracted from the gray-scale images, while spatial features are extracted from the three-channel gray-scale and RGB images. The spatial features are integrated with the handcrafted features to obtain the vocabulary of features (VoF) vector, which is fed as input to the classifier to obtain the output label. The output label tells us which COVID-19 variant is present. The proposed framework is illustrated in Figure 2.

Preprocessing
All the images of the dataset are preprocessed before being passed as input to the feature extractors. The preprocessing steps are explained as follows.

CT Scan Image Slicing Planes
CT scan images can be sliced using three planes: the axial, coronal, and sagittal planes. These planes are visually represented in Figure 3. In our research, we used the axial plane for slicing the CT scan images because it is the standard acquisition plane for CT and provides a clear perception of the type and distribution of the abnormalities.

Sample Size of Chest X-rays and CT Scans
In medical image processing, particularly with CT scans and chest X-rays, it is important to consider the image dimensions. In this article, all the CT scans have dimensions of 365 × 260 pixels, and the chest X-rays have dimensions of 299 × 299 pixels.

Image Resize
It is necessary to resize all the images to a uniform size for the better performance of the models. In this paper, all images were resized to 227 × 227 pixels.
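As a rough illustration, this resizing step can be sketched with nearest-neighbour interpolation in NumPy (the function name and the use of NumPy are our own; the paper does not specify the interpolation method):

```python
import numpy as np

def resize_nn(img, out_h=227, out_w=227):
    """Nearest-neighbour resize of a 2D image to a fixed size (227 x 227 by default)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row index for each output row
    cols = np.arange(out_w) * w // out_w   # source column index for each output column
    return img[rows[:, None], cols]
```

In practice, a library routine with bilinear or bicubic interpolation would typically be preferred for medical images.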

Discrete Wavelet Transform
After image resizing, the discrete wavelet transform (DWT) is applied to obtain the compressed and approximated image pixels. Wavelets are functions generated from a single prototype function by dilations and translations. A simple illustration of the DWT is shown in Figure 4. The approximation coefficients of the DWT are used as input for the feature extractors.
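To make the decomposition concrete, the following is a minimal NumPy sketch of a single-level 2D DWT using the Haar wavelet (the choice of the Haar family is our assumption; the paper does not name the wavelet used):

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar DWT; returns (cA, cH, cV, cD) subbands at half resolution."""
    img = img.astype(float)
    a = img[0::2, 0::2]; b = img[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c = img[1::2, 0::2]; d = img[1::2, 1::2]   # bottom-left, bottom-right
    cA = (a + b + c + d) / 2.0   # approximation (low-low) subband
    cH = (a - b + c - d) / 2.0   # horizontal detail
    cV = (a + b - c - d) / 2.0   # vertical detail
    cD = (a - b - c + d) / 2.0   # diagonal detail
    return cA, cH, cV, cD
```

The cA subband is the compressed approximation that is passed to the feature extractors; in practice a library such as PyWavelets (`pywt.dwt2`) would be used.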

Features Extraction
If the output of the DCNN in the first stage is COVID-19, then features are extracted from the COVID-19-affected image. Two types of features are extracted: handcrafted features, which are obtained from handcrafted descriptors, and spatial features, which are obtained from the DCNN.

Handcrafted Features Extraction
Handcrafted features are extracted with the help of histogram of oriented gradient (HOG) [25], local binary pattern (LBP) [26], and oriented FAST and rotated BRIEF (ORB) [27].
HOG is a handcrafted descriptor used in image processing. The basic purpose of HOG is to detect objects based on the orientation of the gradient. It counts occurrences of gradient orientation in the localized segments of an image.
LBP is a very simple texture operator. It labels each pixel by thresholding its neighborhood against the center pixel value, producing a binary number for each pixel. That is the reason it is known as the local binary pattern.
ORB is a fast and reliable local feature detector for computer vision tasks such as object recognition and 3D reconstruction. It uses a modified version of the visual descriptor BRIEF and the FAST key-point detector. Its goal is to provide a quick and effective replacement for the scale-invariant feature transform (SIFT).
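As an illustration of one of these descriptors, the basic 3 × 3 LBP operator described above can be sketched in NumPy as follows (the function name is ours; production code would use a library implementation such as scikit-image's `local_binary_pattern`):

```python
import numpy as np

def lbp_basic(img):
    """Basic 3x3 local binary pattern: threshold each pixel's 8 neighbours
    against the centre pixel and pack the results into an 8-bit code."""
    img = img.astype(int)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    out = np.zeros((h - 2, w - 2), dtype=int)
    # neighbour offsets in clockwise order, each contributing one bit
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out += (neigh >= centre).astype(int) << bit
    return out
```

A histogram of the resulting codes then serves as the texture feature vector.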

Spatial Features Extraction
Spatial features are extracted with the help of deep convolutional neural network (DCNN) models. A DCNN has convolutional layers as well as max-pooling layers: the convolutional layers extract features from the images, while the max-pooling layers downsample the feature maps. The output feature vectors are collected at the fully connected layers of the DCNN. Feature extraction from an image via a DCNN is demonstrated in Figure 6.
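The convolution-then-pooling pipeline can be sketched in miniature as follows (a toy NumPy illustration of one conv + ReLU + max-pool stage, not the actual DCNN used in the paper):

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2D cross-correlation, the core operation of a convolutional layer."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2d(x, size=2):
    """Non-overlapping max pooling that downsamples the feature map."""
    h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def spatial_features(img, kernels):
    """One conv + ReLU + max-pool stage per kernel, flattened into a feature vector."""
    maps = [np.maximum(conv2d(img, k), 0) for k in kernels]   # conv + ReLU
    pooled = [maxpool2d(m) for m in maps]
    return np.concatenate([p.ravel() for p in pooled])
```

A real DCNN stacks many such stages with learned kernels and reads the feature vector from the fully connected layers.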

Vocabulary of Features (VoF) Vector
Vocabulary of features (VoF) is the vector containing handcrafted as well as spatial features. The VoF vector is the combined vector of the spatial feature vector and handcrafted feature vector.
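A minimal sketch of forming the VoF vector (the per-part L2 normalisation before concatenation is our own assumption; the paper only specifies that the two vectors are combined):

```python
import numpy as np

def vocabulary_of_features(handcrafted, spatial):
    """VoF vector: normalise each part, then concatenate into one feature vector."""
    h = np.asarray(handcrafted, dtype=float)
    s = np.asarray(spatial, dtype=float)
    h = h / (np.linalg.norm(h) + 1e-12)   # scale handcrafted part to unit norm
    s = s / (np.linalg.norm(s) + 1e-12)   # scale spatial part to unit norm
    return np.concatenate([h, s])
```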

Classifier
The VoF vector is applied to the classifier to perform classification. The classifier used is the support vector machine (SVM), which classifies objects by drawing a hyperplane between support vectors. The larger the margin between the support vectors and the hyperplane, the better the classification accuracy, and vice versa. There are three basic kernels of the SVM: linear, Gaussian, and polynomial.
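This stage can be sketched with scikit-learn's `SVC` (the toy data and function name below are hypothetical stand-ins for VoF vectors; note that in scikit-learn the Gaussian kernel is called `rbf`):

```python
import numpy as np
from sklearn.svm import SVC

def train_variant_svm(X, y, kernel="linear"):
    """Fit an SVM classifier; kernel may be 'linear', 'rbf' (Gaussian), or 'poly'."""
    return SVC(kernel=kernel).fit(X, y)

# Hypothetical toy VoF vectors for two variant classes
X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
clf = train_variant_svm(X, y)
```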

Dataset
The datasets used for the experiments are chest X-ray and CT scan images. The chest X-ray dataset contains X-rays of normal and COVID-19-affected persons; similarly, the CT scan dataset contains scans of normal and COVID-19-affected patients. The COVID-19 images are categorized as alpha, beta, gamma, and delta. Then, 60% of the dataset is used for training the model, while 20% is used for validation and 20% for testing. There are 1345 images for the alpha variant, 10,192 for the beta variant, 6012 for the gamma variant, and 3616 for the delta variant. The chest X-ray and CT scan images are publicly available at [4]. For the classification of the delta variant, we followed [28] to arrange our database into the different classes of COVID-19 variants. Figure 7 shows sample images from the database. The next section presents the experimental results and discussion.
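The 60/20/20 split can be sketched as follows (a NumPy sketch; the seed and function name are our own):

```python
import numpy as np

def split_indices(n, train=0.6, val=0.2, seed=0):
    """Shuffle n sample indices and split them 60/20/20 into train/val/test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```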

Experimental Results
The proposed framework was evaluated based on different performance parameters: accuracy, F1-score, sensitivity, specificity, Cohen's kappa, and classification error. The accuracy of the model can be evaluated by Equation (1):

Accuracy = (TP + TN)/(TP + TN + FP + FN),     (1)
where TP denotes true positives, TN denotes true negatives, FP represents false positives, and FN means false negatives. Accuracy can be stated as the ratio of correct predictions to the total number of images in the database and is a very important performance parameter. Similarly, specificity, the ratio of correctly predicted normal images to the total number of normal images in the database, can be calculated by Equation (2):

Specificity = TN/(TN + FP).     (2)
Another performance parameter is the sensitivity of the model, defined as the ratio of correctly predicted COVID-19 images to the total number of COVID-19 images. It can be determined by Equation (3):

Sensitivity = TP/(TP + FN).     (3)
Like accuracy, the F1-score is an important parameter for evaluating model performance. It is computed here as the harmonic mean of the specificity and sensitivity, and the formula to evaluate the F1-score is given in Equation (4):

F1 = 2 · Se · Sp/(Se + Sp),     (4)
where Se and Sp denote sensitivity and specificity, respectively. Another performance parameter considered in this article is Cohen's kappa, which can be calculated using Equation (5):

κ = (p_o − p_e)/(1 − p_e),     (5)

where p_o is the observed agreement and p_e is the agreement expected by chance.
These parameters were used to evaluate the performance of our proposed method.
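These parameters can be computed directly from the confusion-matrix counts; the following is a sketch (the harmonic-mean form of the F1-score and the two-class kappa are our reading of Equations (4) and (5)):

```python
def metrics(tp, tn, fp, fn):
    """Compute the paper's performance parameters from TP/TN/FP/FN counts."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n                      # Eq. (1)
    specificity = tn / (tn + fp)                  # Eq. (2)
    sensitivity = tp / (tp + fn)                  # Eq. (3)
    f1 = 2 * sensitivity * specificity / (sensitivity + specificity)  # Eq. (4)
    # Cohen's kappa from the 2x2 confusion matrix (Eq. (5))
    po = (tp + tn) / n                            # observed agreement
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2  # chance agreement
    kappa = (po - pe) / (1 - pe)
    error = 1 - accuracy                          # classification error
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, f1=f1, kappa=kappa, error=error)
```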

Simulation Parameters
In our framework, DCNNs were used for the extraction of spatial features. The training of the DCNNs was performed by fine-tuning the different hyperparameters; the final hyperparameters are shown in Table 1.

Simulation Results
The performance of the trained model was evaluated. The training accuracy and loss of the model were 99.95% and 1.02%, respectively. First, the performance of the first stage is presented. Tables 2 and 3 show the performance of the different DCNNs applied for the detection of COVID-19 from chest X-rays and CT scans, respectively. The highest accuracies achieved by the proposed model for the first stage were 99.5% and 99.74% for CT scans and chest X-rays, respectively. The confusion matrices of the proposed framework for chest X-rays and CT scans are shown in Figure 8, and the ROC curves for the training and testing of the first stage of the proposed methodology are shown in Figure 9.

Validation of Results
We performed a k-fold cross-validation to validate our model as well as to deal with the data imbalance problem. We applied a 10-fold cross-validation to our dataset, and the optimal accuracy achieved by the cross-validation was 98.9% with a 1.1% loss.
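The generation of the 10-fold cross-validation splits can be sketched as follows (a NumPy sketch; in practice a utility such as scikit-learn's `KFold` would typically be used):

```python
import numpy as np

def kfold_indices(n, k=10):
    """Yield (train_idx, test_idx) index pairs for k-fold cross-validation."""
    folds = np.array_split(np.arange(n), k)   # k near-equal contiguous folds
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```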

Comparison with Existing Models
We also compared the accuracy of the first stage of our proposed framework with other state-of-the-art techniques; the proposed framework outperformed the existing frameworks for COVID-19 detection. Figure 10 compares the first-stage performance of the proposed method for chest X-rays with [41-47], and Figure 11 compares the first-stage performance of the proposed technique for CT scans with [21] and [48-55].
We can conclude from Figures 10 and 11 that the proposed method achieves better accuracy than the other methods. The accuracy of the proposed method is lower than that of one model of Loey et al. [41] for X-ray images, but the authors of that article used a small dataset compared to the database used in this paper. Similarly, the accuracy of the proposed model for CT scan images is lower than that of Hasan et al. [52], as those authors also used a limited database.
After the detection of COVID-19, we further classified COVID-19 into its variants using the vocabulary of features (VoF) technique, where the handcrafted and spatial features were used to train the model. The comparison of all variants in terms of handcrafted features in X-ray images is shown in Figure 12. The classification results for the X-rays and CT scans are shown in Figure 13 in the form of confusion matrices. The proposed method achieved a 99.12% accuracy with a classification error of 0.88% for the classification of COVID-19 variants with the X-ray images, and a 98.54% accuracy with a classification error of 1.46% with the CT scans. The following section presents the conclusion.

Conclusions
COVID-19 was discovered in December 2019 and quickly became a global pandemic. Many people have lost their lives to COVID-19, which transmits from one person to another through contact. Since its discovery, several variants have emerged. In this article, we proposed a dual-stage framework for the classification of the different variants of COVID-19. The proposed method achieved 99.5% and 99.74% accuracy for the detection of COVID-19 in CT scan and X-ray images, respectively. The maximum accuracies of the second stage of the method were 98.54% and 99.12% for the classification of COVID-19 variants in CT scans and X-ray images, respectively. The proposed framework thus achieves state-of-the-art performance for the classification of the variants of concern. In the future, we plan to extend this work to three stages: the detection of COVID-19, classification into variants of concern and variants of interest, and further classification into the individual named variants.