QUCoughScope: An Intelligent Application to Detect COVID-19 Patients Using Cough and Breath Sounds

Problem: Since the outbreak of the COVID-19 pandemic, mass testing has become essential to reduce the spread of the virus. Several recent studies suggest that a significant number of COVID-19 patients display no physical symptoms whatsoever. These patients are therefore unlikely to undergo COVID-19 testing, which increases their chances of unintentionally spreading the virus. Currently, the primary diagnostic tool for COVID-19 is a reverse-transcription polymerase chain reaction (RT-PCR) test on respiratory specimens from the suspected patient, which is an invasive and resource-dependent technique. Recent research indicates that asymptomatic COVID-19 patients cough and breathe differently from healthy people. Aim: This paper aims to use a novel machine learning approach to detect symptomatic and asymptomatic COVID-19 patients from the convenience of their homes, so that, by continuously monitoring themselves, they neither overburden the healthcare system nor unknowingly spread the virus. Method: A Cambridge University research group shared a dataset of cough and breath sound samples from 582 healthy subjects and 141 COVID-19 patients. Among the COVID-19 patients, 87 were asymptomatic while 54 were symptomatic (had a dry or wet cough). In addition to this dataset, the proposed work deployed a real-time deep learning-based backend server with a web application to crowdsource cough and breath data and to screen for COVID-19 infection from the comfort of the user's home. The collected dataset includes data from 245 healthy individuals and 78 asymptomatic and 18 symptomatic COVID-19 patients. Users can open the application in any web browser without installation, enter their symptoms, record audio clips of their cough and breath sounds, and upload the data anonymously. Two screening pipelines were developed based on the symptoms reported by the users: asymptomatic and symptomatic.
A novel stacking CNN model was developed using three base learners selected from eight state-of-the-art deep learning CNN algorithms. The stacking model uses a logistic regression classifier as the meta-learner and takes as input the spectrograms generated from the breath and cough sounds of symptomatic and asymptomatic patients in the combined (Cambridge and collected) dataset. Results: The stacking model outperformed the eight individual CNN networks, giving the best binary classification performance using cough sound spectrogram images. The accuracy, sensitivity, and specificity were 96.5%, 96.42%, and 95.47% for symptomatic patients and 98.85%, 97.01%, and 99.6% for asymptomatic patients. For breath sound spectrogram images, the corresponding metrics were 91.03%, 88.9%, and 91.5% for symptomatic patients and 80.01%, 72.04%, and 82.67% for asymptomatic patients. Conclusion: The web application QUCoughScope records coughing and breathing sounds, converts them to spectrograms, and applies the best-performing machine learning model to classify COVID-19 patients and healthy subjects. The result is then reported back to the user in the application interface. This system can therefore be used by patients at home as a pre-screening method to aid COVID-19 diagnosis by prioritizing patients for RT-PCR testing, thereby reducing the risk of spreading the disease.


Introduction
The novel coronavirus disease 2019 (COVID-19) had infected 320 million people and caused around 5.5 million deaths worldwide as of 15 January 2022 [1]. This has led countries to impose strict lockdowns to reduce the infection rate, severely affecting people's economic and social lives. Mass vaccination has helped some countries, but others have entered second and third waves of infection. With new variants emerging, the pattern of infection and the effectiveness of vaccination remain uncertain. Common symptoms of COVID-19 include fever, cough, shortness of breath, and pneumonia. People with a compromised immune system and elderly people are more likely to develop serious illness, but younger people are also affected, especially by the new variants [2][3][4][5][6].
Currently, COVID-19 is diagnosed with reverse transcription polymerase chain reaction (RT-PCR) testing, which is time-consuming, expensive, and expert-dependent. RT-PCR kits are not easily available in some regions due to a lack of adequate supplies, medical professionals, and healthcare facilities. Moreover, patients must travel to a laboratory facility to be tested, potentially infecting others along the way. Because RT-PCR results take time to obtain, many countries have also used rapid antigen detection tests, but these suffer from low accuracy [7][8][9]. Artificial Intelligence (AI) has recently been applied widely in the health sector [10], for example, to chest X-rays [11][12][13][14] and computed tomography (CT) scans [15][16][17], which have been used for the early detection of COVID-19 and other lung abnormalities. Electrocardiogram (ECG) trace images have also been used with AI to detect COVID-19 and other cardiovascular diseases [18]. Hasoon et al. [19] proposed a method for the classification and early detection of COVID-19 by processing X-ray images, reporting diagnostic accuracies from 89.2% up to 98.66%. Alyasseri et al. [20] provided a comprehensive review of deep learning and machine learning (ML) techniques for COVID-19 diagnosis in studies published between December 2019 and April 2021. The review covered more than 200 studies carefully selected from several publishers, such as IEEE, Springer, and Elsevier, and listed public COVID-19 datasets established in and extracted from different countries. Al-Waisy et al. [21] proposed a novel hybrid multimodal deep learning system, termed COVID-DeepNet, for identifying COVID-19 in chest X-ray (CXR) images. It aids expert radiologists in rapid and accurate image interpretation and diagnosed COVID-19 patients with an accuracy of 99.93%. Abdulkareem et al. [22] proposed a model based on ML and the Internet of Things (IoT) to diagnose patients with COVID-19 in smart hospitals; compared with benchmark studies, their support vector machine (SVM) model obtained the best diagnostic performance (up to 95%). Obaid et al. [23] proposed a prediction mechanism using a long short-term memory (LSTM) deep learning model trained on a coronavirus dataset of recorded infections, deaths, and recovered cases worldwide. They further stated that a predictive model for areas where the virus is likely to spread could be developed from a dataset that includes the features (temperature and humidity) of geographic regions that have experienced severe outbreaks, risk factors, spatiotemporal analysis, and people's social behavior. All of the above approaches require the patient to visit a medical center to provide a sample or undergo testing [18]. However, asymptomatic COVID-19 patients will not undergo any test until the disease reaches a level of concern, so they can easily spread the disease. Moreover, vaccinated people who become infected are often asymptomatic or show very mild symptoms and can spread the disease very easily. Thus, there is a need for an early screening tool that such patients can use at home.
Machine learning has been used for many applications in speech and audio [24][25][26], including the analysis of spectrogram images [27][28][29], and for the screening and early detection of different life-threatening diseases. Breathing, speech, sneezing, and coughing have been reported to be usable by machine learning models to diagnose respiratory illnesses such as COVID-19 [30][31][32]. Researchers have also used body signals such as respiration or heart signals to automatically detect lung and heart diseases (for example, wheeze detection in asthma [33][34][35]). The human voice has been used for the early detection of several diseases, such as Parkinson's disease, coronary artery disease, traumatic brain injury, and brain disorders. Parkinson's disease has been linked to softness of speech, which can result from a lack of vocal muscle coordination [36,37]. Voice parameters such as vocal frequency, tone, pitch, rhythm, rate, and volume can be correlated with coronary artery disease [38]. Invisible illnesses such as post-traumatic stress disorder [39], traumatic brain injury, and psychiatric conditions [40] can also be linked with audio information. Human-generated audio can therefore serve as a biomarker for the early detection of different diseases and offers a cheap solution for mass screening and pre-screening, which is even more convenient for users when the data come from their daily activities and can be acquired non-invasively.
Recent works have shown that respiratory sounds (e.g., coughing, breathing, and voice) from patients who tested positive for COVID-19 in hospitals differ from those of healthy people. Digital stethoscope data from lung auscultation have been used as a diagnostic signal for COVID-19 [41], while the coughs of 48 COVID-19 patients and of patients with other pathological coughs, collected with phones, were used to detect COVID-19 with an ensemble of CNN models [42]. In [11], speech recordings from hospitalized COVID-19 patients were used to automatically detect the patients' health status. Thus, respiratory signals such as breath and cough sounds can be used to identify whether a person is infected with the virus.
Data collection from COVID-19 patients is challenging because of the risk of infection, and the resulting datasets are often not publicly available. McFarlane et al. [43] stressed the need for a COVID-19 cough database to support the development of algorithms that detect COVID-19 from coughs. They compiled 73 individual cough events from public media into a database named NoCoCoDa and stressed the need for uniformity and consistency in such datasets to develop reliable algorithms. Grant et al. [44] used crowd-sourced speech, breath, and cough recordings from 150 COVID-19-positive cases to train machine learning models. They investigated random forests and deep neural networks using mel-frequency cepstral coefficients (MFCCs) and relative spectral perceptual linear prediction (RASTA-PLP) features, achieving an area under the curve (AUC) of 0.7983 for detecting COVID-19 from speech and of 0.7575 from breathing sounds. Mouawad et al. [45] used MFCC features of coughs and of the vowel "eh" from a dataset collected by the Corona Voice Detect project in partnership with Voca.ai and Carnegie Mellon University. With an XGBoost classifier, they achieved F1-scores of 91% for cough and 89% for the vowel "eh". Erdoğan and Narin [46] analyzed features of cough sound data derived with empirical mode decomposition (EMD), the discrete wavelet transform (DWT), and the ReliefF algorithm on a dataset from a free-access site, achieving a 98.06% F1-score in detecting COVID-19 from cough sounds. Pahar et al. [47] investigated machine learning classifiers, long short-term memory (LSTM) networks, and convolutional neural networks (CNNs) and found that a ResNet50 network achieved an AUC of 0.98 on the Coswara [48] and Sarcos [49] datasets.
Imran et al. [42] proposed a mobile app called AI4COVID-19, which records 3 s of cough sound and analyzes it automatically with transfer learning to detect COVID-19 within 2 min. The pipeline consists of two stages: cough detection and collection, and COVID-19 diagnosis. In the cough detection engine, the user records 3 s of good-quality cough sounds, and a mel spectrogram image of the waveform is analyzed with a convolutional neural network (CNN). Once a cough is detected, the recording is passed to the COVID-19 diagnosis stage, which combines three AI approaches: a deep transfer learning multi-class classifier (DTL-MC), a classical machine learning multi-class classifier (CML-MC), and a deep transfer learning binary-class classifier. Key limitations of AI4COVID-19 are (1) limited training data, (2) limited data to generalize the model, and (3) the AI model is not publicly available. In another study, Pal and Sankarasubbu [50] investigated deep neural networks (DNNs) on a dataset of 328 cough sounds recorded from 150 patients in four categories: COVID-19, asthma, bronchitis, and healthy. Their trained DNN distinguished COVID-19 coughs from the others with an accuracy of 96.83% [50]. These studies indicate that COVID-19 coughs have a distinctive pattern. Bagad et al. [51] found that a pre-trained ResNet18 classifier could identify COVID-19 coughs with an AUC of 0.72 using cough samples with confirmed COVID-19 status collected over the phone from 3621 individuals. Laguarta et al. [52] reported an AUC of 0.97 and a sensitivity of 98.5% with a pre-trained ResNet50 model that distinguished the coughs of COVID-19 patients from those of non-COVID-19 subjects; the model was trained on 4256 subjects and tested on the remaining 1064 subjects [52].
Belkacem et al. [53] reported a complete hardware system that collects cough samples, temperature (via a thermal camera), and airflow (via a spirometer) and transmits this information to a database using smartphones. The cough samples and other health details, together with expert opinion, were then used to train a machine learning network to classify the samples as COVID-19, bronchitis, flu, cold, or other. Building on recent evidence that cough samples and machine learning networks are very useful for distinguishing COVID-19 patients from healthy subjects, they added airflow and body temperature data for confirmation. However, they did not report the performance of their approach. Rahman et al. [54] adopted a similar approach using chest X-rays, CT scans, cough samples, temperature, and symptom inputs from patients. Although both approaches can make the final results very reliable, they cannot be deployed immediately because of the extra hardware or health details they require.
Brown et al. [55] collected both cough and breathing sounds and investigated how such data can aid COVID-19 diagnosis. They extracted handcrafted features from cough and breath sounds, such as duration, onset, tempo, period, root mean square (RMS) energy, spectral centroid, roll-off frequency, zero-crossing rate, mel-frequency cepstral coefficients (MFCCs), and delta MFCCs. Combining these with deep transfer learning using VGGish, a convolutional network designed to extract audio features automatically, they achieved an accuracy of 0.80 ± 0.7 for two-class classification problems using the cough and breathing data. Coppock et al. [56] used this dataset in a pilot study even before it was made public, and their deep learning network achieved an AUC of 0.846. Kumar et al. [57] achieved a weighted F1-score of 96.46% in distinguishing non-COVID and COVID-19 patients with a deep convolutional network they developed. This dataset was shared with our team under a data-sharing agreement and was used, in combination with Qatari data, to develop a machine learning pipeline.
Diagnostics 2022, 12, 920
The available datasets suffer from inconsistent and low-quality recordings, which motivated the current work: the need for a more reliable and robust machine learning network trained and validated on a diverse database. This work proposes a novel machine learning framework using the combined Cambridge and Qatari cough and breathing sound databases. Most previous works either used classical machine learning with handcrafted features or used pre-trained models to classify spectrograms. Very few works used combined datasets, and none used the novel stacking concept to increase model performance. Moreover, none of the existing AI-enabled data collection applications shows users an instant outcome for their data; most are merely crowd-sourced data collection tools. We developed an AI-enabled web application as a pre-screening tool to decrease the pressure on health centers and provide a faster and more reliable testing mechanism to reduce the spread of the virus. Our contributions can be summarized as follows:
Conduct a literature review of related works to establish the potential applicability of the proposed solution.
Point out the limitations of related works and how the proposed solution may overcome those problems.
To the best of the authors' knowledge, this is the first time a novel stacking-based CNN model using spectrograms of cough and breath sounds has been proposed.
Experimentally demonstrate that cough sounds have latent features that distinguish COVID-19 patients from non-COVID subjects.
A web application with a backend server was created that allows the user to share symptoms and cough and breath data for COVID-19 diagnosis anonymously from a computer, tablet, or Android or iOS mobile phone.
To the best of our knowledge, QUCoughScope (https://www.qu-mlg.com/projects/qu-cough-scope, accessed on 5 May 2021) is the first solution that is not just an application to collect crowd-sourced data. Rather, we have implemented a deep learning pipeline in the backend to immediately provide the screening outcome to the user.
This article consists of six sections. In the introduction, we explained the problem of the current COVID-19 testing approach and how it can be addressed with the help of our pre-screening tool. Section II highlights related works, while Section III introduces the methodology, with details of the dataset, data preparation, and experiment, and Section IV summarizes all the results. Section V explains the implementation details while Section VI concludes the article.

Methodology
The overall methodology of the study is summarized in Figure 1. This study identified COVID-19 patients from the cough and breath sounds of symptomatic and asymptomatic COVID-19 patients and healthy subjects after converting these sounds into spectrograms. Four binary classification experiments are discussed: healthy subjects versus (i) symptomatic and (ii) asymptomatic COVID-19 patients using cough sound spectrograms, and healthy subjects versus (iii) symptomatic and (iv) asymptomatic COVID-19 patients using breath sound spectrograms.
For all four experiments, novel stacking machine learning models were deployed, in which the eight CNN models were used as base learners and a logistic regression-based meta-learner was then used to detect COVID-19 from cough and breath sound spectrograms. Detailed descriptions of the dataset, preprocessing, and experiments are presented below.

Dataset Description
Several public datasets are available, such as Coswara [48], CoughVid [58], and the Cambridge dataset [55]. However, the Cambridge dataset was not completely public; its team made it available upon request. Among the accessible datasets, the Cambridge dataset was the most reliable because it was acquired within a well-designed framework. In addition, the authors collected a similar cough and breath dataset from COVID-19-infected and healthy subjects using the proposed framework.

Cambridge dataset: The Cambridge dataset was designed for developing a diagnostic tool for COVID-19 based on cough and breath sounds [55]. The dataset was collected through an app (Android and web application (www.covid-19-sounds.org (accessed on 5 May 2021))) that asked volunteers for samples of their coughs and breathing as well as their medical history and symptoms. Age, gender, geographical location, current health status, and pre-existing medical conditions were also recorded. Audio recordings were sampled at 44.1 kHz and subjects were from different parts of the world. Cough and breath sound samples were collected from 582 healthy subjects and 141 COVID-19-positive patients. Among them, 264 healthy subjects and 54 COVID-19 patients had cough symptoms while 318 healthy subjects and 87 COVID-19 patients had no symptoms (Table 1).
Qatari dataset: The QU cough dataset [59] consists of both cough and breath data from symptomatic and asymptomatic patients. Cough and breath sound samples were collected from 245 healthy subjects and 96 COVID-19-positive patients. Among them, 32 healthy subjects and 18 COVID-19 patients had cough symptoms, while 213 healthy subjects and 78 COVID-19 patients had no symptoms (Table 1).
In this study, we investigated the cough and breath sounds to overcome the limitations of some related works. We have therefore investigated two different pipelines for cough and breath. Moreover, for both cough and breath, we investigated symptomatic and asymptomatic patients' data. Both datasets were merged to train, validate, and test the models in this study. Table 2 shows the experimental pipelines used in this study.
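As a quick check on class balance, the subject counts reported for the two datasets can be merged per pipeline. This small sketch assumes that each subject contributes one cough and one breath recording (the per-image totals in Table 3 may differ), and the dictionary layout and function name are illustrative only.

```python
# Subject counts as reported in the text; the same subjects supply cough and breath sounds.
COUNTS = {
    "Cambridge": {"symptomatic": {"healthy": 264, "covid": 54},
                  "asymptomatic": {"healthy": 318, "covid": 87}},
    "Qatari": {"symptomatic": {"healthy": 32, "covid": 18},
               "asymptomatic": {"healthy": 213, "covid": 78}},
}

def merged_counts(counts):
    """Sum healthy/COVID-19 subject counts across datasets for each pipeline."""
    merged = {}
    for per_dataset in counts.values():
        for group, classes in per_dataset.items():
            target = merged.setdefault(group, {"healthy": 0, "covid": 0})
            for label, n in classes.items():
                target[label] += n
    return merged

merged = merged_counts(COUNTS)
for group, c in merged.items():
    # Print the pipeline name, both class counts, and the healthy:COVID-19 ratio.
    print(group, c["healthy"], c["covid"], round(c["healthy"] / c["covid"], 2))
```

Under that assumption, the merged data contain 296 healthy versus 72 COVID-19 subjects in the symptomatic pipeline and 531 versus 165 in the asymptomatic pipeline, roughly 4:1 and 3.2:1 imbalances, which is the kind of skew the class balancing in the Five-Fold Cross-Validation subsection addresses.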

Pre-Processing Stage
As shown in Figure 1, the input data (i.e., user cough and breath sounds) were converted to spectrograms, which were then used in a 5-fold cross-validation scheme with 80% of the data for training and 20% for testing in each fold. The pre-processing steps are detailed below. Since the dataset was collected using web and Android platforms, it was first organized into two subsets: cough and breath sounds. Each subset was then divided into symptomatic and asymptomatic groups. The symptomatic and asymptomatic breath and cough sounds of the COVID-19 and healthy groups were visualized in the time domain to inspect potential differences among them (Figure 2).
First, we converted the cough and breath sounds to spectrograms. A spectrogram is a visual representation of an audio signal that shows how its frequency spectrum evolves over time. It is usually generated by applying the Fast Fourier Transform (FFT) to a series of overlapping windows extracted from the original signal; dividing the signal into short, fixed-size segments and applying the FFT to each independently is called the short-time Fourier transform (STFT). The spectrogram is the squared magnitude of the STFT of the signal s(t) computed with a window of width w. The STFT parameters were n_fft = 2048, hop_length = 512, win_length = n_fft, and window = 'hann'.
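The spectrogram step can be sketched with NumPy alone. The parameter names above match those of `librosa.stft`, but the paper does not say which library was used, so this is an independent sketch under stated assumptions: a periodic Hann window, win_length equal to n_fft, and no centering/padding of frames (librosa pads by default with `center=True`, so its frame count differs). It computes $S[k, m] = \left|\sum_{n=0}^{N-1} s[n + mH]\, w[n]\, e^{-j 2\pi k n / N}\right|^2$ with $N = 2048$ and $H = 512$; the function name is hypothetical.

```python
import numpy as np

def power_spectrogram(signal, n_fft=2048, hop_length=512):
    """Squared-magnitude STFT of a 1-D signal; frames are not centred or padded."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one FFT window")
    # Periodic Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/N), n = 0..N-1 (win_length = n_fft).
    n = np.arange(n_fft)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    # Index matrix of overlapping frames, shape (n_frames, n_fft); frame m starts at m*hop_length.
    idx = n[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    # One-sided FFT per frame, transposed to (frequency bins, time frames).
    stft = np.fft.rfft(frames, n=n_fft, axis=1).T
    return np.abs(stft) ** 2
```

For a 1 s recording at 44.1 kHz, this yields 1025 frequency bins (n_fft / 2 + 1) by 83 frames; rendering the result as an image (for example, on a log scale) is a separate step that the text does not specify.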

Five-Fold Cross-Validation
The training set had to be balanced to avoid biased training. This was done with data augmentation, which has provided reliable results in many of the authors' recent publications [11][12][60][61][62][63]. Two augmentation strategies, scaling and translation, were used to balance the training images, as shown in Table 3. Scaling magnifies or reduces the frame size of the image; magnifications of 2.5% to 10% were used in this work. Translation shifted the images horizontally and vertically by 5% to 10%. For five-fold cross-validation, the complete image set was divided into 80% training and 20% testing subsets, and 10% of the training data were used for validation, primarily to avoid model overfitting. Table 3 shows the number of training, validation, and test images used in the experiments on symptomatic and asymptomatic patients.
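The two augmentations and the balancing step can be sketched as follows. The paper does not state which library or interpolation was used, so this NumPy sketch makes its own assumptions: single-channel 2-D images, nearest-neighbour magnification about the image centre (cropped back to the original frame), zero fill for pixels vacated by a translation, and minority classes topped up with augmented copies until they match the largest class. All function names are hypothetical.

```python
import numpy as np

def translate(img, dx_frac, dy_frac):
    """Shift a 2-D image by fractions of its width (dx) and height (dy); vacated pixels are zero."""
    h, w = img.shape
    dx, dy = int(round(dx_frac * w)), int(round(dy_frac * h))
    out = np.zeros_like(img)
    # Source region that remains inside the frame after the shift.
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0], max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def scale(img, factor):
    """Zoom a 2-D image about its centre (factor > 1 magnifies), keeping the frame size."""
    h, w = img.shape
    # Map each output pixel back to its nearest source pixel.
    yi = np.rint((np.arange(h) - (h - 1) / 2) / factor + (h - 1) / 2).astype(int)
    xi = np.rint((np.arange(w) - (w - 1) / 2) / factor + (w - 1) / 2).astype(int)
    valid_y, valid_x = (yi >= 0) & (yi < h), (xi >= 0) & (xi < w)
    out = np.zeros_like(img)
    out[np.ix_(valid_y, valid_x)] = img[np.ix_(yi[valid_y], xi[valid_x])]
    return out

def random_augment(img, rng):
    """Apply either a 2.5-10% magnification or a 5-10% horizontal and vertical translation."""
    if rng.random() < 0.5:
        return scale(img, rng.uniform(1.025, 1.10))
    sx, sy = rng.choice([-1, 1], size=2)  # random shift directions
    return translate(img, sx * rng.uniform(0.05, 0.10), sy * rng.uniform(0.05, 0.10))

def balance_with_augmentation(images_by_class, rng):
    """Add augmented copies to smaller classes until every class matches the largest one."""
    target = max(len(imgs) for imgs in images_by_class.values())
    balanced = {}
    for label, imgs in images_by_class.items():
        extra = [random_augment(imgs[rng.integers(len(imgs))], rng)
                 for _ in range(target - len(imgs))]
        balanced[label] = list(imgs) + extra
    return balanced
```

Balancing would be applied only to the training images of each fold, after the split, so that augmented copies never leak into validation or test data.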
As discussed earlier, eight pre-trained CNN models were used in the study. They were implemented using the PyTorch library with Python 3.7 on an Intel® Xeon® CPU E5-2697 v4 @ 2.30 GHz with 64 GB RAM and a 16-GB NVIDIA GeForce GTX 1080 GPU. All eight pre-trained CNN models were trained using the same training parameters and stopping criteria, listed in Table 4. The five-fold cross-validation results were averaged to produce the final receiver operating characteristic (ROC) curve, confusion matrix, and evaluation metrics. In each fold, 80% of the images were used for training and 20% for testing. Image augmentation was applied to the training set, and 20% of the non-augmented training set was used for validation to avoid overfitting of the models [64]. A logistic regression classifier was used as the meta-learner for the final prediction in the stacking model, with the 'lbfgs' solver, L2 regularization, and a maximum of 100 iterations.
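The fold construction can be sketched as a stratified five-fold split that also carves a validation subset out of each training portion. This is a NumPy-only sketch, not the authors' code: the text gives the validation share as 10% in one place and 20% in another, so it is left as the `val_fraction` parameter, and the validation subset is drawn at random rather than stratified. The function name is hypothetical.

```python
import numpy as np

def stratified_folds(labels, n_splits=5, val_fraction=0.1, seed=0):
    """Yield (train, val, test) index arrays for stratified k-fold cross-validation."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    # Deal each class's shuffled indices round-robin into folds so every fold keeps the class ratio.
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        fold_of[idx] = np.arange(len(idx)) % n_splits
    for k in range(n_splits):
        test = np.flatnonzero(fold_of == k)  # about 20% of the data when n_splits = 5
        rest = rng.permutation(np.flatnonzero(fold_of != k))
        n_val = int(round(val_fraction * len(rest)))
        val, train = rest[:n_val], rest[n_val:]  # augmentation would then touch `train` only
        yield train, val, test
```

Each image lands in exactly one test fold, so averaging the five per-fold results, as described above, covers the whole dataset once.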

Stacking Model Development
In this study, we used a CNN-based stacking approach in which eight state-of-the-art CNN models (Resnet18 [65], Resnet50 [65], Resnet101 [65], InceptionV3 [65], DenseNet201 [66], Mobilenetv2 [67], EfficientNet_B0 [68], and EfficientNet_B7 [68]) were used as base learners, and the best-performing models among them were used to train a logistic regression-based meta-learner classifier for the final decision. A single dataset A consists of data vectors (x_i) and their class labels (y_i). First, a set of base-level classifiers M_1, …, M_p is generated, and their outputs are used to train the meta-level classifier, as illustrated in Figure 3.
We used five-fold cross-validation to generate the training set for the meta-level classifier: in each fold, the base-level classifiers were trained on four folds, and the remaining fold was used for testing. Each base-level classifier predicts a probability (0 to 1) over the possible class values. Thus, for an input x, a probability distribution is created from the predictions of the base-level classifier set M:
$$P_M(x) = \big(P_{M_1}(c_1 \mid x), \ldots, P_{M_1}(c_m \mid x), \ldots, P_{M_p}(c_1 \mid x), \ldots, P_{M_p}(c_m \mid x)\big) \qquad (1)$$
where c_1, …, c_m are the possible class values and P_{M_j}(c_k | x) is the probability that classifier M_j assigns to class c_k for input x.
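The meta-learning step can be sketched once the base CNNs' out-of-fold probabilities are available. This sketch assumes scikit-learn's `LogisticRegression` (the paper names the 'lbfgs' solver, L2 regularization, and 100 iterations but not the library), assumes label 1 denotes COVID-19, and keeps only each model's P(COVID-19) because, in a binary problem, P(healthy) = 1 - P(COVID-19) adds no information. Selecting the top three base learners mirrors the abstract; the helper names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_top_k(cv_accuracy, k=3):
    """Names of the k base CNNs with the highest cross-validated accuracy."""
    return sorted(cv_accuracy, key=cv_accuracy.get, reverse=True)[:k]

def meta_features(prob_by_model, names):
    """Stack each selected model's P(COVID-19) into an (n_samples, k) matrix."""
    return np.column_stack([np.asarray(prob_by_model[n], dtype=float) for n in names])

def fit_meta_learner(oof_prob_by_model, y, names):
    """Fit the logistic-regression meta-learner on out-of-fold base-learner probabilities."""
    meta = LogisticRegression(solver="lbfgs", max_iter=100)  # L2 penalty is the default
    meta.fit(meta_features(oof_prob_by_model, names), y)
    return meta

def stacked_predict(meta, test_prob_by_model, names):
    """Final stacked P(COVID-19) for test samples (label 1 assumed to be COVID-19)."""
    return meta.predict_proba(meta_features(test_prob_by_model, names))[:, 1]
```

Training the meta-learner on out-of-fold probabilities, rather than on predictions the CNNs made for their own training images, keeps it from learning the base learners' overconfidence on data they have already seen.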

Performance Metrics
To evaluate the performance of the COVID-19 detection classifiers, we used the receiver operating characteristic (ROC) and area under the curve (AUC) along with precision, sensitivity, specificity, accuracy, and F1-Score as shown in Equations (2)-(6). Here, TP, TN, FP, and FN represent the true positive, true negative, false positive, and false negative, respectively.
where accuracy is the ratio of the correctly classified samples to all the samples.
where precision is the rate of correctly classified positive class samples among all the samples classified as positive.
where sensitivity is the rate of correctly predicted positive samples in the positive class samples, where F1 is the harmonic average of precision and sensitivity.
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{6}$$
where specificity is the ratio of correctly predicted negative-class samples to all negative-class samples. The performance of the deep CNNs was assessed using these evaluation metrics with 95% confidence intervals (CIs). The CI for each evaluation metric was computed as shown in Equation (7):
$$r = z \sqrt{\frac{\text{metric}\,(1 - \text{metric})}{N}} \tag{7}$$
where N is the number of test samples and z is the critical value of the standard normal distribution, which is 1.96 for a 95% CI. The classification networks were also compared in terms of elapsed time per image, that is, the time each network took to classify an input image, as shown in Equation (8):
$$T = T_2 - T_1 \tag{8}$$
In this equation, $T_1$ is the time at which a network starts to classify a cough sound S, and $T_2$ is the time at which the network has finished classifying the same cough sound S.
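The metric definitions in Equations (2)-(7) can be checked with a short Python sketch. It is illustrative only, and the class and function names are hypothetical. The example counts come from the symptomatic breath-sound confusion matrix described in the Results section: 8 of 72 COVID-19 images and 25 of 296 healthy images were misclassified, which gives TP = 64, FN = 8, TN = 271, and FP = 25. The CI uses the normal-approximation form of Equation (7), with N taken as the total number of test samples.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # COVID-19 samples classified as COVID-19
    tn: int  # healthy samples classified as healthy
    fp: int  # healthy samples classified as COVID-19
    fn: int  # COVID-19 samples classified as healthy


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Equations (2)-(6): accuracy, precision, sensitivity, F1 and specificity."""
    precision = c.tp / (c.tp + c.fp)
    sensitivity = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "specificity": c.tn / (c.tn + c.fp),
    }


def ci_half_width(metric: float, n: int, z: float = 1.96) -> float:
    """Equation (7): normal-approximation CI half-width (z = 1.96 for a 95% CI)."""
    return z * math.sqrt(metric * (1.0 - metric) / n)


if __name__ == "__main__":
    counts = ConfusionCounts(tp=64, tn=271, fp=25, fn=8)
    n = counts.tp + counts.tn + counts.fp + counts.fn
    for name, value in classification_metrics(counts).items():
        print(f"{name:>11}: {100 * value:.2f}% ± {100 * ci_half_width(value, n):.2f}%")
```

With these counts, accuracy is 335/368 (about 91.03%), sensitivity is 64/72 (about 88.89%), and specificity is 271/296 (about 91.55%).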

Results and Discussion
This section describes the performance of the different classification networks on healthy and COVID-19 cough and breath sound spectrograms for symptomatic and asymptomatic patients. As mentioned earlier, two different experiments using cough and breath sound spectrograms were conducted: (i) symptomatic COVID-19 and healthy, and (ii) asymptomatic COVID-19 and healthy. The comparative performance of different CNNs for these classification schemes is shown in Table 5A,B.
Using cough sounds with five-fold cross-validation, the top three CNN models for symptomatic patients were ResNet50, ResNet101, and DenseNet201, with overall accuracies of 95.38%, 94.29%, and 93.25%, respectively. For asymptomatic patients, the top three were MobileNetV2, DenseNet201, and ResNet101, with accuracies of 98.5%, 98.28%, and 96.84%, respectively. Using breath sounds, the top three models for symptomatic patients were EfficientNet_B0, MobileNetV2, and ResNet101 (90.33%, 87.57%, and 84.53%, respectively). For asymptomatic patients, they were EfficientNet_B7, ResNet101, and MobileNetV2 (75.6%, 69.72%, and 68.4%, respectively). These results show that cough sound-based stratification models perform better than breath sound-based models for both symptomatic and asymptomatic patients. The stacking CNN model outperformed all individual CNN models for both cough and breath sounds (Table 5). It achieved accuracies of 96.5% and 98.85% on the cough sounds of symptomatic and asymptomatic patients, respectively. On breath data, by contrast, it achieved accuracies of 91.03% and 80.01% for symptomatic and asymptomatic patients, respectively. Breath sounds therefore could not reliably distinguish healthy subjects from COVID-19 patients, whereas cough sounds performed well for both symptomatic and asymptomatic patients. Figure 4 shows the receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC, also known as AUROC) for the cough and breath data of symptomatic and asymptomatic patients.
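The ROC curves and AUC values summarized in Figure 4 can be computed from per-sample COVID-19 probabilities. The following sketch is illustrative: the function names are hypothetical, and the toy scores are invented for the example, not taken from the study. It builds ROC points by sweeping the threshold over every distinct score and integrates them with the trapezoidal rule. It then checks the result against the equivalent pairwise (Mann-Whitney) definition of AUC.

```python
from typing import List, Sequence, Tuple


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by sweeping the threshold over every distinct score."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Move every sample tied at this score across the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    # Area under the piecewise-linear ROC curve.
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a COVID-19 sample > score of a healthy sample); ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # toy probabilities, label 1 = COVID-19
    labels = [1, 1, 0, 1, 0, 0]
    print(auc_trapezoid(roc_curve(scores, labels)), auc_pairwise(scores, labels))
```

For the toy scores, both functions return 8/9. Eight of the nine (COVID-19, healthy) pairs are ranked correctly, which is why AUC can be read as a ranking measure independent of any single decision threshold.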
These ROC curves show that the stacking model performs better than any individual CNN model for both cough and breath data. However, as noted earlier, cough sounds distinguish COVID-19 patients from the healthy group more reliably than breath sounds do. The best-performing scheme is the stratification of asymptomatic COVID-19 patients using cough sounds. Asymptomatic patients can spread the virus unknowingly, and our trained network performs well in detecting them from their cough sounds. Therefore, this COVID-19 screening framework can significantly help in screening suspected populations and reducing the risk of spread. Figure 5 shows the confusion matrices of the best-performing stacking model for the cough and breath data of symptomatic and asymptomatic patients. Even with this model, 8 out of 72 COVID-19 spectrogram images were misclassified as healthy, and 9 out of 296 healthy spectrogram images were misclassified as COVID-19, for symptomatic cough sounds. For asymptomatic cough sounds, 5 out of 165 COVID-19 images were misclassified as healthy, and only 2 out of 531 healthy images were misclassified as COVID-19. Consistent with Figure 4, cough sounds performed very well in distinguishing asymptomatic COVID-19 patients from healthy subjects.
For symptomatic breath sound spectrograms, 8 out of 72 COVID-19 images were misclassified as healthy, and 25 out of 296 healthy images were misclassified as COVID-19. For asymptomatic breath sound spectrograms, 47 out of 165 COVID-19 images were misclassified as healthy, and 92 out of 531 healthy images were misclassified as COVID-19. The confusion matrices thus confirm that cough sound spectrograms outperform breath sound spectrograms. A computer-aided classifier that performs this well on noninvasively acquired cough sounds could significantly support fast COVID-19 diagnosis from the comfort of the user's home.
Figure 6 compares accuracy against per-image inference time for the different CNN networks and the stacking CNN model on symptomatic and asymptomatic data. The inference times of the best-performing stacking network for symptomatic and asymptomatic cough sounds were about 0.0389 s and 0.0411 s, respectively. Although the stacking model's inference time was higher than that of most individual models, it was still small enough for real-time applications [69]. Therefore, to enable real-time use, we deployed the best-performing stacking models in a web application that can be used from any mobile browser, which makes it independent of the Android and iOS platforms. The next section describes the development and deployment of the AI-enabled web application.
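Per-image inference time as defined in Equation (8), $T = T_2 - T_1$, can be measured with Python's monotonic high-resolution clock. The sketch below is hypothetical: `mean_inference_time` and the dummy classifier are illustrations, not the authors' benchmarking code. It simply averages $T_2 - T_1$ over a set of inputs.

```python
import time
from statistics import mean
from typing import Callable, Sequence, TypeVar

Sample = TypeVar("Sample")


def mean_inference_time(classify: Callable[[Sample], object], samples: Sequence[Sample]) -> float:
    """Mean of T2 - T1 (Equation (8)) over all samples, in seconds."""
    elapsed = []
    for sample in samples:
        t1 = time.perf_counter()  # T1: the network starts classifying the sample
        classify(sample)
        t2 = time.perf_counter()  # T2: the network has finished classifying it
        elapsed.append(t2 - t1)
    return mean(elapsed)


if __name__ == "__main__":
    # Hypothetical stand-in for a trained model applied to spectrogram images.
    dummy_model = lambda spectrogram: sum(map(sum, spectrogram)) > 0
    spectrograms = [[[0.1] * 128 for _ in range(128)] for _ in range(10)]
    print(f"{mean_inference_time(dummy_model, spectrograms):.6f} s per image")
```

`time.perf_counter` is used instead of `time.time` because it is monotonic and has the highest available resolution, which matters for intervals of a few tens of milliseconds.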

AI-Enabled Application for COVID-19 Detection
An AI-enabled application was developed using Flutter, a cross-platform app development framework maintained by Google that uses the Dart programming language. The advantage of a cross-platform framework over native development in Swift or Kotlin is that multiple platforms, such as Android, iOS, and even desktop, can be maintained from a single codebase. This provides maximum user coverage, faster development and continuous integration, seamless deployment and maintenance, easier cloud integration, and increased stability. Furthermore, compared with other cross-platform frameworks such as Ionic, Flutter produces near-native code with full access to native plugins and device hardware features, including on-device AI using the built-in GPU. We deployed an application entitled QUCoughScope [70] that allows users to upload cough and breath sounds along with their clinical history. The application requires access to the smartphone's microphone to record cough and breath sounds. Once the server receives the recorded audio signals and the reported symptoms, the raw audio signals are converted into spectrogram images by a short-time Fourier transform (STFT), without any further pre-processing. The backend AI server, deployed on Google Compute Engine, then analyzes the uploaded sounds and classifies them as healthy or COVID-19-positive.
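The STFT step that turns raw audio into a spectrogram can be sketched without any audio library. This is a hedged, minimal sketch, not the server's actual code. The frame length (512 samples), hop size (256), periodic Hann window, and dB scaling are all assumptions, because the text does not state the STFT parameters. A naive DFT is used for readability, and a production server would use an FFT routine instead.

```python
import cmath
import math
from typing import List, Sequence


def hann_window(n: int) -> List[float]:
    # Periodic Hann window (assumed; the window used by the authors is not stated).
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal: Sequence[float], frame_len: int = 512, hop: int = 256) -> List[List[float]]:
    """Return |STFT| as frames x (frame_len // 2 + 1) frequency bins (naive DFT per frame)."""
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1
    # Twiddle factors exp(-2*pi*j*k*n / frame_len) for the non-negative frequency bins.
    twiddle = [[cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
               for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([abs(sum(w * x for w, x in zip(row, segment))) for row in twiddle])
    return frames


def to_decibels(magnitude: List[List[float]], eps: float = 1e-10) -> List[List[float]]:
    """Log-magnitude (dB) spectrogram, an assumed scale for the spectrogram images."""
    return [[20.0 * math.log10(m + eps) for m in frame] for frame in magnitude]


if __name__ == "__main__":
    fs = 8000  # assumed sample rate (Hz)
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 2)]  # 0.5 s, 1 kHz
    spec = to_decibels(stft_magnitude(tone))
    print(len(spec), "frames x", len(spec[0]), "bins")
```

Each row of the result is one time frame, and each column is one frequency bin. Rendering the dB matrix as an image yields the kind of spectrogram input that the CNNs consume.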

In the prototype system, the user fills in some demographic data and a list of confirmed symptoms. Once the app has collected the user's cough and breath sounds, they are transferred to the server over HTTPS. The server performs signal processing and machine learning classification to determine whether the cough and breath sounds resemble those of COVID-19 patients (Figure 7). The app then notifies users of their status, displays the results, and stores them in a cloud database.
Our pipeline is divided into two parts: symptomatic users (with a cough) and asymptomatic users (no symptoms). Once the spectrogram is generated, the AI-enabled server checks whether the user has reported a cough and runs one of two separate pipelines accordingly. If the user has reported a cough, the symptomatic pipeline is activated; otherwise, the asymptomatic pipeline is used. We observed that, for both symptomatic and asymptomatic patients, cough sounds play a more important role than breath sounds in differentiating COVID-19-positive from healthy users.
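The symptom-based routing can be sketched as a small dispatch function. Everything in this sketch is hypothetical: the `Submission` fields, the model callables, and the 0.5 decision threshold are illustrative assumptions. Passing only the cough spectrogram to the model is also an assumption, motivated by the finding that cough sounds outperform breath sounds.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Spectrogram = List[List[float]]
CovidModel = Callable[[Spectrogram], float]  # returns P(COVID-19 | spectrogram)


@dataclass
class Submission:
    reported_cough: bool            # taken from the symptom form filled in by the user
    cough_spectrogram: Spectrogram  # STFT of the recorded cough


def screen(sub: Submission, symptomatic_model: CovidModel, asymptomatic_model: CovidModel,
           threshold: float = 0.5) -> Tuple[str, str]:
    """Route a submission to the symptomatic or asymptomatic pipeline and label it."""
    if sub.reported_cough:
        pipeline, model = "symptomatic", symptomatic_model
    else:
        pipeline, model = "asymptomatic", asymptomatic_model
    probability = model(sub.cough_spectrogram)  # assumed threshold of 0.5 below
    label = "COVID-19" if probability >= threshold else "healthy"
    return pipeline, label


if __name__ == "__main__":
    # Dummy stand-ins for the two trained stacking models.
    sym_model = lambda spec: 0.9
    asym_model = lambda spec: 0.1
    print(screen(Submission(True, [[0.0]]), sym_model, asym_model))
    print(screen(Submission(False, [[0.0]]), sym_model, asym_model))
```

Keeping two separately trained models behind a single entry point lets each pipeline be retrained on new crowdsourced data without changing the client.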

Conclusions
This work presents a novel stacking approach with deep CNN models for the automatic detection of COVID-19 from cough and breath sound spectrogram images of symptomatic and asymptomatic patients. As the comparison in Table 6 shows, the proposed stacking approach provided the best performance among similar studies. Eight different CNN models were evaluated for the binary classification of healthy subjects and COVID-19 patients using cough and breath sound spectrogram images, separately for symptomatic and asymptomatic patients. The study also evaluated a stacking CNN model in which the top three models were used as base learners, and their predictions were used to train a logistic regression-based meta-learner classifier for the final decision. The stacking CNN model outperformed the other networks. Using cough sound spectrogram images, its best classification accuracy, sensitivity, and specificity were 96.5%, 96.42%, and 95.47% for symptomatic data and 98.85%, 97.01%, and 99.6% for asymptomatic data, respectively. Using breath sound data, the best accuracy, sensitivity, and specificity were 91.03%, 88.9%, and 91.5% for symptomatic data and 80.01%, 72.04%, and 82.67% for asymptomatic data, respectively. Thus, cough sound spectrogram images are more reliable than breath sound spectrograms for detecting COVID-19 patients. Moreover, the network performed best in detecting asymptomatic patients, who can spread the virus unknowingly. The proposed web application can also help crowdsource more data and further increase the robustness of the solution.
Therefore, automatic COVID-19 detection using cough sound spectrogram images can play a crucial role in computer-aided diagnosis. As a fast diagnostic tool, it can identify a significant number of infected people at an early stage and significantly reduce healthcare costs and burden.
The limitations of this work include (i) limited ethnic diversity in the dataset, as the data come from the UK and Qatar; (ii) the application's inability to distinguish cough or breath sounds from other sounds, although users have an option to confirm the recorded sound; and (iii) a limited amount of RT-PCR-verified labelled data.
These limitations can be addressed in future work. The application is being proposed to many doctors and government organizations, nationally and internationally, so that the network can be trained on a more diverse dataset. Doctors and government organizations can help by providing RT-PCR-labelled datasets, since this convenient solution could be a better alternative to the low-sensitivity rapid antigen test kits that are widely used for quicker results. The authors are also training an anomaly detection model so that users can submit only cough and breathing sounds and other sounds are rejected, which will improve the robustness of the proposed system.
Institutional Review Board Statement: The study was approved by the Institutional Review Board of Qatar University (protocol code IRB-A-QU-2020-0014; date of approval: 8 July 2020).
Informed Consent Statement: Patient consent was waived because the dataset was collected through crowdsourcing, contains no identifying information, and offers no way to track users, and because users participated in the study voluntarily.

Data Availability Statement:
The dataset collected from QUCoughScope through crowdsourcing is available in [59].