Article

SCMs: Systematic Conglomerated Models for Audio Cough Signal Classification

Department of Artificial Intelligence Convergence, Hallym University, Chuncheon 24252, Republic of Korea
* Author to whom correspondence should be addressed.
Algorithms 2024, 17(7), 302; https://doi.org/10.3390/a17070302
Submission received: 14 June 2024 / Revised: 2 July 2024 / Accepted: 4 July 2024 / Published: 8 July 2024
(This article belongs to the Special Issue Quantum and Classical Artificial Intelligence)

Abstract

A cough is a common and natural physiological response of the human body that attempts to expel air and other waste from the airways. Coughs occur due to environmental factors, allergic responses, pollution or disease, and a cough can be either dry or wet depending on the amount of mucus produced. A characteristic feature of the cough is its sound, which is mostly a quacking sound. Human cough sounds can be monitored continuously, and so cough sound classification has attracted a lot of interest in the research community over the last decade. In this research, three systematic conglomerated models (SCMs) are proposed for audio cough signal classification. The first conglomerated technique utilizes robust models like the Cross-Correlation Function (CCF) and Partial Cross-Correlation Function (PCCF) model, the Least Absolute Shrinkage and Selection Operator (LASSO) model and the elastic net regularization model with Gabor dictionary analysis and efficient ensemble machine learning techniques; the second technique utilizes the concept of stacked conditional autoencoders (SAEs); and the third technique utilizes efficient feature extraction schemes like the Tunable Q Wavelet Transform (TQWT), sparse TQWT, the Maximal Information Coefficient (MIC) and the Distance Correlation Coefficient (DCC) and feature selection techniques like the Binary Tunicate Swarm Algorithm (BTSA), aggregation functions (AFs), factor analysis (FA) and explanatory factor analysis (EFA), classified with machine learning classifiers such as the kernel extreme learning machine (KELM), the arc-cosine ELM, the Rat Swarm Optimization (RSO)-based KELM, etc. The techniques are evaluated on publicly available datasets, and the results show that the highest classification accuracy of 98.99% was obtained when sparse TQWT with AF was implemented with an arc-cosine ELM classifier.

1. Introduction

The most important information about the respiratory system can be obtained from the cough sounds of a particular patient [1]. The cough sound of each respiratory disease is quite distinct from the others, and so the physician can make a diagnosis easily. The waveforms of the cough sounds reflect the air expelled from the lungs. The cough has a specific sound pressure, and its unique relationship to the respiratory condition of the patient can be demonstrated well [2]. The esophageal pressure and the cough flow have a strong relationship with the power of cough sounds. The voiced phase, explosive phase and intermediate phase are the three main phases of individual cough sounds. When COVID-19 occurred, Artificial Intelligence (AI) techniques helped greatly in the diagnosis of various medical conditions and provided good solutions for predicting patient outcomes from various data inputs [3]. Machine learning algorithms with AI techniques helped greatly in developing quick and inexpensive approaches to diagnosing COVID-19. The recording of human audio signals can be performed instantaneously, and with the advent of cough segmentation, data segmentation, machine learning and deep learning techniques, audio signal classification has advanced over the past decade. A few important works conducted in the past decade and in the past few years include the following. For detecting abnormal pulmonary function, the classification of voluntary cough sounds and airflow patterns was conducted by Abaza et al. [4]. By internal sound analysis, the detection of voluntary cough sounds was implemented by Lucio et al. [5]. Asthmatic cough sounds were described quantitatively by Thorpe et al. [6]. For the rapid diagnosis of childhood pneumonia, wavelet-augmented cough analysis was examined by Kosasih et al. [7]. Cough sound recognition was used for automatic croup diagnosis by Sharan et al. [8].
Wet and dry coughs in pediatric patients were automatically identified and classified by Swarnkar et al. [9]. The audio-based assessment of cough analysis was performed by Shi et al. [10]. The analysis of cough sounds with objective correlation and spirometry was conducted by Rudraraju et al., where an accuracy of 91.97% was obtained [11]. A multi-modal deep learning method for the analysis of cough sounds was presented by Malik et al., where an accuracy of 99.01% was obtained [12]. In a real-world environment, automatic cough classification for tuberculosis diagnosis was performed by Pahar et al. [13]. The Mel Frequency Cepstral Coefficient (MFCC) features and a CNN were used for cough classification for COVID-19 by Bansal et al. [14]. A multi-branch deep learning model for COVID-19 detection from cough sounds was developed by Ulukaya et al. [15]. Scalogram image specification using deep learning models was performed for cough audio classification by Loey et al. [16]. An ensemble approach for detecting COVID-19 from cough sounds was developed by Chowdhury et al. [17]. The unique spectral fingerprints were identified in cough sounds for diagnosing respiratory ailments by Ghrabli et al. [18]. A deep neural network-based respiratory pathology classification using cough sounds was conducted by Balamurali et al. [19]. A Malaysian cough sound analysis and COVID-19 classification with deep learning was conducted by Kho et al. [20]. With the help of interpretable symptom embeddings, the early diagnosis of COVID-19 was performed by Pal et al. [21]. Cough classification using machine learning and global smartphone recordings was conducted by Pahar et al. [22]. An in-depth study of utilizing cough sounds and deep neural networks for the early detection of COVID-19 was conducted by Islam et al. [23]. The wavelet analysis of voluntary cough sounds in patients with respiratory diseases was performed by Knocikova et al. [24]. 
A Variable Markov Model and recurrence dynamics were used for cough sound detection by Mouawad et al. [25]. Cough sound analysis for pneumonia and asthma classification in a pediatric population was conducted by Amrulloh et al. [26]. The acoustic features were analyzed for the speech-sound-based classification of asthmatic and healthy subjects by Yadav et al. [27]. A proof-of-concept study was conducted for the development of machine learning for asthmatic and healthy voluntary cough sounds by Hee et al. [28]. In this work, three systematic conglomerated models (SCMs) are proposed for audio cough signal classification. Initially, the pre-processing aspect is given some importance in our work, as follows.
To pre-process the audio cough signal while preserving it, a filter is applied so that the unnecessary and redundant detail coefficients are removed. To mitigate noise from both non-white and uncertain sources, a time–frequency domain filter bank based on the Wavelet Transform is used. The Discrete Wavelet Transform (DWT) [29] is a versatile time series signal technique built upon a collection of recursive functions. Wavelet functions aid greatly in the formation of the Wavelet Transform (WT), and Wavelet Transforms are widely used to mitigate noise in a signal. A low-pass digital filter is applied initially so that the audio cough signal passes through it efficiently, and most of the redundant frequency components of the signal are eliminated. A signal can be decomposed into various frequency sub-bands by the DWT, whose basis functions are expressed in the following equation:
$\psi_{s^*,p}(t) = \frac{1}{\sqrt{s^*}}\, \psi\!\left( \frac{t - p}{s^*} \right)$
where $s^*$ denotes the scale and $p$ denotes the translation parameter. At various frequencies, the correlation with the signal $x(t)$ is expressed as follows:
$W(p, s^*) = \frac{1}{\sqrt{s^*}} \int x(t)\, \psi^*\!\left( \frac{t - p}{s^*} \right) dt$
To define the signal frequency, the DWT coefficients are used, and this is represented as follows:
$(s_j^*, p_k) = \left( 2^j,\; k\,2^j \right), \quad j, k \in \mathbb{Z}$
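As an illustration of this denoising step, the sketch below applies a one-level Haar DWT with soft thresholding of the detail coefficients. The choice of the Haar mother wavelet, a single decomposition level and the threshold value are assumptions for illustration, since the text does not specify them.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass (detail)
    return a, d

def haar_idwt(a, d):
    """Inverse one-level Haar DWT."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def wavelet_denoise(x, threshold):
    """Soft-threshold the detail coefficients, then reconstruct the signal."""
    a, d = haar_dwt(x)
    d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)  # soft threshold
    return haar_idwt(a, d)
```

With a threshold of zero the transform is perfectly invertible, which is a quick sanity check on the filter pair.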
Once the cough audio signals are pre-processed, the proposed models are implemented. The main innovation of this proposal is to select from the various models available in the literature, create novel conglomerated or hybrid models and test their performance on the audio cough signal datasets. The following techniques are used:
(a)
The first technique utilizes the concept of robust models with Gabor dictionary analysis and machine learning techniques with special focus on support vector regression (SVR) and ensemble models;
(b)
The second technique utilizes the concept of stacked conditional autoencoders (SAEs);
(c)
The third technique utilizes the concept of using some efficient feature extraction schemes like TQWT, sparse TQWT, MIC and DCC and some feature selection techniques like BTSA, aggregation functions, FA, EFA classified with machine learning classifiers, KELM, arc-cosine ELM, RSO-based KELM, etc.

2. Proposed Method 1: Robust Models with Gabor Dictionary Analysis and Machine Learning

The cough sound signal is projected as $z$ of length $L$ into a very-high-dimensional subspace. The projection can also be over an overcomplete dictionary, which comprises column vectors arranged as normalized atoms and is represented as $C = [c_1, c_2, \dots, c_N]$ of size $L \times N$, with $L < N$. With the help of an ordinary least squares term, this can be conducted efficiently, and a sparsity-inducing regularizer can also be designed using such a technique. The best coefficient vector $b$ of length $N$ is attained by solving the following optimization problem:
$\min_b \left\| z - Cb \right\|_2 \quad \text{subject to} \quad \left\| b \right\|_0 \leq K \ll N$
where the energy norm is represented by $\|\cdot\|_2$ and the number of non-zero entries in the vector $b$ is counted by $\|\cdot\|_0$. Therefore, the most descriptive collection of atoms is obtained by the model. A high-end analysis dictionary technique and a robust regularization scheme are required to model the atoms. With the help of greedy matching, the issue is tackled efficiently: the coefficient vector $b$ is computed iteratively by projecting the residual onto the specific atoms of the dictionary. The atoms that mitigate the residual energy the most are chosen, and then the residual is updated. The procedure is repeated until a termination criterion is met. Traditionally, the concept of matching pursuit was used with the Gabor dictionary; hence, in our proposed work, three different schemes were used to replace the matching pursuit concept [30]. The sparsity degree of the model is controlled, and the regularization parameter handles multicollinearity very well.
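The greedy matching step described above can be sketched as a minimal matching-pursuit loop over a unit-norm dictionary; this is the traditional baseline the paper replaces, shown here only to make the select–update–repeat cycle concrete.

```python
import numpy as np

def matching_pursuit(z, C, K):
    """Greedy sparse coding: pick up to K atoms of dictionary C (columns
    assumed unit-norm) that best reduce the residual energy of signal z."""
    residual = z.astype(float).copy()
    b = np.zeros(C.shape[1])
    for _ in range(K):
        proj = C.T @ residual              # correlate residual with every atom
        k = np.argmax(np.abs(proj))        # atom with the largest correlation
        b[k] += proj[k]                    # update its coefficient
        residual -= proj[k] * C[:, k]      # remove its contribution
    return b, residual
```

Each iteration strictly reduces the residual energy, and the loop stops after the sparsity budget $K$ is spent.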
For the cough audio signals, a dictionary along with a particular resolution based on its time–frequency atoms is considered, and then, the coefficient vector b is estimated so that the regularization parameter is explored well. This optimization step aids the accurate classification of cough audio signals. Then, using the estimated coefficient vector b of the respective cough audio signals, the time–frequency matrices are obtained so that it is useful for classification. The coefficient vectors are reshaped efficiently so that appropriate mathematical functions are implemented correctly. The proposed framework is given in Figure 1, as follows.

2.1. Gabor Dictionary

For the conceptualization of basis functions, the analysis dictionary is commonly used. An overcomplete dictionary is generally used so that a better sparsity can be obtained by means of the versatile localization of atoms. The analytic Gabor dictionary is used here, where a Gaussian window function is used and can be translated, scaled and modulated as follows [31]:
$d_{[\alpha, \beta, w, \theta]}[n] = \frac{\gamma_{[\alpha, \beta, w, \theta]}}{\sqrt{\alpha}}\, g\!\left[ \frac{n - \beta}{\alpha} \right] \cos\left[ w(n - \beta) + \theta \right]$
where the scaling constant is expressed by $\gamma_{[\alpha, \beta, w, \theta]}$ and the Gaussian factor is expressed by $g\!\left[ \frac{n - \beta}{\alpha} \right] = 2^{1/4} e^{-\pi \left[ \frac{n - \beta}{\alpha} \right]^2}$, with $n = 0, 1, 2, \dots, 2^N - 1$. The Gaussian window length is $L = 2^N$, and it should correspond to the length of the audio signal being decomposed. Once the scale parameter $\alpha$ is chosen, the translation parameter $\beta$, frequency $w$ and phase $\theta$ are expressed as follows:
$[\alpha, \beta, w, \theta] = \left[ 2^h,\; 2^{h-1} q,\; 2^{-h-1} \pi f,\; 0 \right]$
where $0 < h < N$, $q = 0, 1, \dots, 2^{N-h+1} - 1$ and $f = 0, 1, \dots, 2^{h+1} - 1$.
The shift parameter is presumed to be zero here, and the total number of atoms is modelled accordingly. The $h$th Gabor dictionary $C_h$ of size $2^N \times 2^{N+2}$ is developed in matrix format for a particular value of $h$. All the audio cough signals have a similar length, and the variables of the $h$th dictionary are assigned appropriately. The tradeoff between the spectral and temporal resolution of the dictionary is managed by altering $h$. The Gaussian function $g\!\left[ \frac{n - \beta}{\alpha} \right]$ decays more slowly as the value of $h$ increases; therefore, the time duration of the atom is extended, and the temporal resolution is severely degraded. As the Gaussian window duration increases, the overall bandwidth of the atoms around each frequency reduces, leading to the atoms attaining a higher spectral resolution.
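The dyadic grid above can be sketched as a dictionary constructor. The unit-norm normalization of each atom is assumed to play the role of the scaling constant $\gamma$; everything else follows the parameter grid $[2^h,\ 2^{h-1}q,\ 2^{-h-1}\pi f,\ 0]$.

```python
import numpy as np

def gabor_atom(L, alpha, beta, w, theta=0.0):
    """One real Gabor atom: a Gaussian window scaled by alpha, shifted by
    beta, modulated at frequency w with phase theta; unit-energy normalized."""
    n = np.arange(L)
    g = 2 ** 0.25 * np.exp(-np.pi * ((n - beta) / alpha) ** 2)
    atom = g / np.sqrt(alpha) * np.cos(w * (n - beta) + theta)
    return atom / np.linalg.norm(atom)   # normalization stands in for gamma

def gabor_dictionary(L, h):
    """Dictionary C_h of size 2^N x 2^(N+2) on the dyadic parameter grid."""
    N = int(np.log2(L))
    alpha = 2.0 ** h
    atoms = []
    for q in range(2 ** (N - h + 1)):                  # translations
        for f in range(2 ** (h + 1)):                  # frequencies
            atoms.append(gabor_atom(L, alpha,
                                    2 ** (h - 1) * q,
                                    2 ** (-h - 1) * np.pi * f))
    return np.column_stack(atoms)
```

For $L = 2^N$ the grid yields $2^{N-h+1} \cdot 2^{h+1} = 2^{N+2}$ atoms, matching the stated dictionary size.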

2.2. Cross-Correlation Function (CCF) and Partial Cross-Correlation Function (PCCF) Model

The audio cough signal data are split into $a_t$ and $b_t$ with $t = 1, 2, \dots, n$ and a specific lag $l$. Between the two time series $a_t$ and $b_{t+l}$, the general formula for the covariance function is represented as follows:
$c_{ab}(l) = \frac{1}{n} \sum_{t=1}^{n-l} \left( a_t - \bar{\mu}_a \right)\left( b_{t+l} - \bar{\mu}_b \right)$
where $\bar{\mu}_a$ and $\bar{\mu}_b$ are the means of $a_t$ and $b_t$, respectively. The cross-correlation function (CCF) is then expressed as follows:
$r_{ab}(l) = \begin{cases} \dfrac{c_{ab}(l)}{\sigma_a \sigma_b}, & l \geq 0 \\[4pt] \dfrac{c_{ba}(-l)}{\sigma_a \sigma_b}, & l < 0 \end{cases}$
where the standard deviations of $a_t$ and $b_t$ are indicated by $\sigma_a$ and $\sigma_b$, respectively. $a_t$ and $b_{t+l}$ are correlated if $l$ is a positive integer; $a_t$ and $b_{t-l}$ are correlated if $l$ is a negative integer. When $l = 0$, then $r_{ab} = r_{ab}(0)$, indicating the ordinary correlation coefficient. It is assumed that the cough audio signal $a_t$ is correlated with the time-delayed signal $b_{t+l}$ and that the audio signal $q_t$ is highly correlated with both $a_t$ and $b_{t+l}$ ($t = 1, 2, \dots, n$). Based on the CCF $r_{ab}(l)$, the partial CCF (PCCF) is represented as follows [32]:
$r_{ab|q}(l) = \dfrac{r_{ab}(l) - r_{aq}\, r_{qb}(l)}{\sqrt{\left( 1 - r_{aq}^2 \right)\left( 1 - r_{qb}^2(l) \right)}}$
where the CCFs at lag $l$ are represented by $r_{ab}(l)$ and $r_{qb}(l)$. The discrete PCCF corresponding to lag $l$ is represented by $r_{ab|q}(l)$. The linear influence of $q_t$ is eliminated from $r_{ab}(l)$, so the PCCF indicates the cross-correlation among the signal residuals.
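The CCF and PCCF above can be sketched directly in a few lines; the biased covariance estimator (division by $n$) follows the covariance formula above.

```python
import numpy as np

def ccf(a, b, lag):
    """Cross-correlation r_ab(l) between a_t and b_{t+l}."""
    n = len(a)
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    if lag < 0:                       # negative lag: swap roles of a and b
        return ccf(b, a, -lag)
    cov = np.sum((a[:n - lag] - a.mean()) * (b[lag:] - b.mean())) / n
    return cov / (a.std() * b.std())

def pccf(a, b, q, lag):
    """Partial CCF: cross-correlation of a and b with the linear
    influence of a third signal q removed."""
    r_ab = ccf(a, b, lag)
    r_aq = ccf(a, q, 0)
    r_qb = ccf(q, b, lag)
    return (r_ab - r_aq * r_qb) / np.sqrt((1 - r_aq**2) * (1 - r_qb**2))
```

At zero lag the CCF of a signal with itself is 1, and the PCCF of a signal with itself remains 1 whatever third signal is partialled out, which is a convenient sanity check.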

2.3. LASSO Model

LASSO is quite powerful in selecting variables and constructing regression techniques. In LASSO, a penalty term is added to the model's residual sum of squares [33]. The larger the penalty term, the more variables are penalized. A regularization process is tied to the penalty term, which shrinks some of the coefficients to zero. As the least correlated variables shrink, the interpretability of the regression model increases. The irrelevant variables can be eliminated so that only the explanatory variables are brought on board, thereby preventing overfitting, in which a model fits well during training but cannot predict the testing data accurately. Finally, a suitable tradeoff between variance and bias can be found, so the model is less complex and well balanced. The mathematical formulation of this linear model is as follows:
$G = H\beta + \varepsilon$
where the response variable is denoted as $G$ and the explanatory variables are denoted as $H$. In vector format, $H$ is assessed as an $n \times k$ matrix and $G$ as an $n \times 1$ vector. The residual sum of squares (RSS) is minimized by LASSO, with the upper bound specified by $t$. The formula for LASSO is expressed in the following equation:
$\min \sum_{i=1}^{n} \left( g_i - \sum_{j} h_{ij} \beta_j \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{k} \left| \beta_j \right| < t$
In vector form, the optimization is expressed as follows:
$\min \frac{1}{n} \left\| G - H\beta \right\|_2^2 \quad \text{subject to} \quad \sum_{j=1}^{k} \left| \beta_j \right| < t$
A penalty term is added to the sum of the absolute values of the model parameters. If the penalty term is large, the model exhibits more shrinkage. The estimation of the parameters is represented as follows:
$\hat{\beta}(\lambda) = \underset{\beta}{\arg\min} \left( \frac{\left\| G - H\beta \right\|_2^2}{n} + \lambda \sum_{j=1}^{k} \left| \beta_j \right| \right)$
When the $j$th coefficient shrinks to zero, the corresponding variable is eliminated from the model. The penalty value can be altered so that the number of selected features changes.
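As a sketch of the penalized estimation above, the solver below minimizes the LASSO objective by iterative soft thresholding (ISTA, a proximal-gradient method). ISTA is an assumption here for illustration; the paper does not name its LASSO solver.

```python
import numpy as np

def lasso_ista(H, g, lam, n_iter=500):
    """LASSO via iterative soft thresholding:
    minimizes ||g - H beta||^2 / (2n) + lam * ||beta||_1."""
    n, k = H.shape
    beta = np.zeros(k)
    step = 1.0 / (np.linalg.norm(H, 2) ** 2 / n)   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = H.T @ (H @ beta - g) / n            # gradient of smooth part
        z = beta - step * grad
        # soft threshold: shrinks small coefficients exactly to zero
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta
```

The soft-threshold step is what drives coefficients of the least relevant variables exactly to zero, realizing the variable selection described above.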

2.4. Elastic Net Regularization Model

A slightly broader version of the LASSO technique/ridge regression is the elastic net regularization model [34]. Linear models have many vector coefficients, and sometimes the estimates are unstable; ridge regression is used to prevent this. An accurate estimate of the coefficients is provided by the LASSO, as some explanatory atoms associated with it become zero. The mixing parameter $\alpha$ lies between 0 and 1, and the regularization parameter $\lambda$ is non-negative ($\lambda \geq 0$). The model solves the following problem:
$\min_{a_{h,\alpha}} \left( \frac{1}{2N} \left\| Z - D_h a_{h,\alpha} \right\|_2^2 + \lambda \left( \frac{1 - \alpha}{2} \left\| a_{h,\alpha} \right\|_2^2 + \alpha \left\| a_{h,\alpha} \right\|_1 \right) \right)$
Ridge regression and LASSO are obtained when $\alpha$ is 0 and 1, respectively. When $\alpha = 0.5$, the model imbibes the properties of both ridge regression and LASSO. Producing the lowest error is very important so that the coefficient vector is well estimated; thereby, the selection of non-zero entries is performed easily. For various values of $\lambda$, a geometric sequence is used by the elastic net regularization models so that the feature shrinkage is controlled. The optimal value of $\lambda$ is assessed based on the minimum mean square error. With the help of the alternating direction method of multipliers, the optimization of Equation (14) is carried out. The iterative update of the vector $a_{h,\alpha}$ is terminated when the change in its size falls below a threshold. Some quantities of interest can be assessed at this convergence point so that the optimal dictionary can be identified well.
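The elastic net objective above differs from LASSO only in its smooth $\ell_2$ term, which the sketch below folds into the gradient step of a proximal-gradient loop. The paper uses the alternating direction method of multipliers; proximal gradient is a simpler stand-in used here purely for illustration.

```python
import numpy as np

def elastic_net_pg(D, z, lam, alpha, n_iter=500):
    """Elastic net via proximal gradient: minimizes
    ||z - D a||^2 / (2N) + lam * ((1-alpha)/2 ||a||_2^2 + alpha ||a||_1)."""
    N, k = D.shape
    a = np.zeros(k)
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 / N + lam * (1 - alpha))
    for _ in range(n_iter):
        # smooth part: least squares + ridge (l2) penalty
        grad = D.T @ (D @ a - z) / N + lam * (1 - alpha) * a
        w = a - step * grad
        # non-smooth part: l1 proximal operator (soft threshold)
        a = np.sign(w) * np.maximum(np.abs(w) - step * lam * alpha, 0.0)
    return a
```

Setting `alpha=1` recovers the LASSO behaviour (exact zeros), while `alpha=0` recovers the ridge solution (uniform shrinkage without zeros), matching the limiting cases stated above.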

2.5. Support Vector Regression

SVR is formed by extending the kernel strategies of the SVM, and it is used for regression analysis [35]. The qualities that characterize the SVM, like sparsity, entropy and kernels, are all supported by SVR. SVR is a powerful tool for predictive data analysis and is used in multiple applications. SVR is widely used for both non-linear and linear mapping, and different kernels can be applied efficiently. Between the input and output, the mapping function is expressed as follows:
$g_i = w\, \xi(y) + \alpha$
where $y$ is the input $(y_1, y_2, \dots, y_{N_{train}})$ and $g_i$ is the output. The weight vector is $w$, the number of training data is $N_{train}$, and $\alpha$ is a constant. The non-linear mapping function is $\xi(\cdot)$. The value of $w$ is estimated with the help of the following problem:
$\text{Minimize} \quad \frac{1}{2} \left\| w \right\|^2 + B \sum_{i=1}^{N_{train}} \left( \theta_i + \theta_i^* \right)$
$\text{subject to} \quad \begin{cases} g_i - \left( w\, \xi(y_i) + \alpha \right) \leq \varepsilon + \theta_i \\ \left( w\, \xi(y_i) + \alpha \right) - g_i \leq \varepsilon + \theta_i^* \\ \theta_i,\, \theta_i^* \geq 0 \end{cases}$
where $\theta_i, \theta_i^*$ are the slack variables, $B$ is the box constraint and $\varepsilon$ is the insensitive loss parameter. The output is computed with the help of the following equation:
$g_i = \sum_{i=1}^{N_{train}} \left( \theta_i - \theta_i^* \right) K(y_i, y) + \alpha$
where the kernel function is denoted as K ( y i , y ) .
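Given dual coefficients already obtained from the constrained problem above, the SVR output is just the kernel expansion over the training points. In the sketch below, the Gaussian RBF kernel and the $(\theta_i - \theta_i^*)$ sign convention follow the standard SVR dual form and are assumptions, since the paper does not fix the kernel.

```python
import numpy as np

def rbf_kernel(yi, y, gamma=1.0):
    """Gaussian RBF kernel K(y_i, y) = exp(-gamma * ||y_i - y||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(yi) - np.asarray(y)) ** 2))

def svr_predict(Y_train, theta, theta_star, y, alpha_const=0.0, gamma=1.0):
    """SVR output: sum_i (theta_i - theta_i^*) K(y_i, y) + alpha,
    with theta, theta_star already solved by the dual optimization."""
    return sum((t - ts) * rbf_kernel(yi, y, gamma)
               for yi, t, ts in zip(Y_train, theta, theta_star)) + alpha_const
```

Only training points with non-zero $(\theta_i - \theta_i^*)$ (the support vectors) contribute to the prediction, which is where the sparsity property mentioned above comes from.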

2.6. Learning through Ensemble Methods

To obtain good accuracy, many ensemble approaches are leveraged so that a good conceptual framework is developed. Ensemble learning improves overall performance by bringing together and combining several weak learners, and it is implemented when the training data are limited. The outputs of various classifiers are averaged, so the chance of selecting an inadequate classifier is mitigated by the ensemble algorithm [36]. Bagging and boosting techniques are utilized in the development of ensemble learning algorithms. A famous regression algorithm that utilizes the bootstrapping technique is the bagging ensemble learning technique. Here, a forest of decision trees is created randomly from various training sets. An output is taken from every decision tree, and the final output is generated by averaging as in the equation below:
$\hat{f}_{average}(y) = \frac{1}{T} \sum_{i=1}^{T} \hat{f}_i(y)$
where $\hat{f}_i(y)$ is the individual prediction from the $i$th of the $T$ training sets. Another type of ensemble learning is the boosting ensemble learning technique, where the bootstrap sampling method is not used. The models are created sequentially, and hence good knowledge of the model is necessary before the next model is generated.
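The bootstrap-and-average procedure above can be sketched with a simple base learner. A least-squares line is used here instead of a decision tree purely to keep the sketch short; the resample–fit–average structure is the same.

```python
import numpy as np

def bagging_regression(X, y, x_query, T=50, seed=0):
    """Bagging for regression: fit one least-squares line per bootstrap
    resample, then average the T individual predictions at x_query."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)             # bootstrap resample
        A = np.column_stack([X[idx], np.ones(n)])    # design matrix [x, 1]
        coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        preds.append(coef[0] * x_query + coef[1])    # f_hat_i(x_query)
    return np.mean(preds)                            # ensemble average
```

Averaging over resamples reduces the variance of the base learner, which is the point of bagging when training data are limited.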

3. Proposed Technique 2: Stacked CAE Network for Audio Cough Classification

3.1. CAE

An autoencoder (AE) is an unsupervised feature learning neural network in which the learned features are specified by the hidden layer [37]. The encoder network comprises the input and hidden layers and transforms the original input into hidden features. The decoder network comprises the hidden and output layers and reconstructs the original inputs from the learned hidden features. For every input vector $p_d$ from the dataset $\{p_d\}_{d=1}^{M}$, the hidden vector $h_d$ is represented as follows:
$h_d = f\!\left( W^{(1)} p_d + b^{(1)} \right)$
For every input vector $p_d$, the reconstruction vector $\hat{p}_d$ is expressed as follows:
$\hat{p}_d = f\!\left( W^{(2)} h_d + b^{(2)} \right)$
where the weight matrices are expressed by $W^{(1)}$ and $W^{(2)}$, the bias vectors are indicated by $b^{(1)}$ and $b^{(2)}$, and the activation function is indicated by $f$.
In this work, the sigmoid function is utilized; the function and its derivative are expressed as follows:
$f(\beta) = \frac{1}{1 + e^{-\beta}}$
$f'(\beta) = f(\beta)\left( 1 - f(\beta) \right)$
The reconstruction error is expressed as follows:
$L(p_d, \hat{p}_d) = \frac{1}{2} \left\| p_d - \hat{p}_d \right\|^2$
A CAE is obtained by applying a penalty term to the reconstruction error of the AE [38]. The penalty term is the Frobenius norm of the Jacobian matrix of the hidden activations with respect to the input vector and is expressed as follows:
$\left\| J_f(p_d) \right\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{e} \left( \frac{\partial h_j(p_d)}{\partial p_i} \right)^2$
For the $j$th row and the $i$th column of the matrix $J_f(p_d)$, the element is specified as $J_f(p_d)_{ji}$ and is defined as follows:
$J_f(p_d)_{ji} = \frac{\partial h_j(p_d)}{\partial p_i}$
When the activation function is the sigmoid function, Equation (26) is computed as follows:
$J_f(p_d)_{ji} = h_j(p_d) \left( 1 - h_j(p_d) \right) W_{ji}^{(1)}$
where $W_{ji}^{(1)}$ is the connection weight between neuron $j$ in the hidden layer and neuron $i$ in the input layer. The penalty term can then be computed further as follows:
$\left\| J_f(p_d) \right\|_F^2 = \sum_{j=1}^{e} \left( h_j(p_d)\left( 1 - h_j(p_d) \right) \right)^2 \sum_{i=1}^{n} \left( W_{ji}^{(1)} \right)^2$
For S samples, the overall cost function of the CAE is expressed as follows:
$J(W, b) = \frac{1}{S} \sum_{d=1}^{S} \left[ L(p_d, \hat{p}_d) + \lambda \left\| J_f(p_d) \right\|_F^2 \right]$
where $\lambda$ is the penalty term parameter, which controls the relative significance of the reconstruction error and the penalty term.
The representative hidden features can be learnt automatically by training the CAE so that all the useful information can be easily extracted. The cost function $J(W, b)$ is minimized using the BP algorithm, so the learned features are invariant even to small changes in the inputs. Each element of $W^{(1)}$, $W^{(2)}$, $b^{(1)}$ and $b^{(2)}$ is updated after every iteration as follows:
$W_{ji}^{(1)} = W_{ji}^{(1)} - \varepsilon \frac{\partial J(W, b)}{\partial W_{ji}^{(1)}}, \qquad W_{ij}^{(2)} = W_{ij}^{(2)} - \varepsilon \frac{\partial J(W, b)}{\partial W_{ij}^{(2)}}$
$b_j^{(1)} = b_j^{(1)} - \varepsilon \frac{\partial J(W, b)}{\partial b_j^{(1)}}, \qquad b_j^{(2)} = b_j^{(2)} - \varepsilon \frac{\partial J(W, b)}{\partial b_j^{(2)}}$
where $\varepsilon$ is the learning rate. Thus, through the training process, the optimal parameters $W^{(1)}$, $W^{(2)}$, $b^{(1)}$, $b^{(2)}$ are learned simultaneously.
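The CAE cost above, combining the reconstruction error with the closed-form contractive penalty for a sigmoid hidden layer, can be sketched for a single sample as follows.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cae_cost(p, W1, b1, W2, b2, lam):
    """Contractive autoencoder cost for one sample: reconstruction error
    plus lam times the squared Frobenius norm of the Jacobian of the
    hidden activations with respect to the input."""
    h = sigmoid(W1 @ p + b1)                   # encoder
    p_hat = sigmoid(W2 @ h + b2)               # decoder
    recon = 0.5 * np.sum((p - p_hat) ** 2)     # reconstruction error
    # closed form for sigmoid units: sum_j (h_j(1-h_j))^2 * sum_i W1_ji^2
    penalty = np.sum((h * (1 - h)) ** 2 * np.sum(W1 ** 2, axis=1))
    return recon + lam * penalty, h, p_hat
```

With all weights at zero the Jacobian penalty vanishes, since the squared-row-norm factor is zero, which matches the closed-form expression above.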

3.2. Softmax Classifier

For multi-class classification, a commonly utilized classifier is the softmax classifier [39]. The training dataset is described as $\{(a^{(1)}, b^{(1)}), \dots, (a^{(N)}, b^{(N)})\}$, where an input vector of the softmax classifier is expressed as $a^{(s)} \in \mathbb{R}^c$, $s = 1, 2, \dots, N$, and the corresponding label is $b^{(s)}$. If the training data are assumed to belong to $k$ different classes, then the labels take $k$ different values, with $b^{(s)} \in \{1, 2, \dots, k\}$. The output vector $o^{(s)}$ for every input vector $a^{(s)}$ is expressed as follows:
$o^{(s)} = \theta a^{(s)}$
where the parameter matrix of the softmax classifier is expressed as $\theta \in \mathbb{R}^{k \times c}$ and is defined as
$\theta = \begin{bmatrix} \theta_1^T \\ \theta_2^T \\ \vdots \\ \theta_k^T \end{bmatrix}$
The probability of $a^{(s)}$ belonging to class $q$ (i.e., the probability that $b^{(s)} = q$, $q = 1, 2, \dots, k$) is defined as follows:
$p\!\left( b^{(s)} = q \mid a^{(s)}; \theta \right) = \frac{e^{\theta_q^T a^{(s)}}}{\sum_{v=1}^{k} e^{\theta_v^T a^{(s)}}}$
The probability of $a^{(s)}$ belonging to each class is estimated by the hypothesis $h_\theta(a^{(s)})$ and is expressed as follows:
$h_\theta\!\left( a^{(s)} \right) = \begin{bmatrix} p(b^{(s)} = 1 \mid a^{(s)}; \theta) \\ p(b^{(s)} = 2 \mid a^{(s)}; \theta) \\ \vdots \\ p(b^{(s)} = k \mid a^{(s)}; \theta) \end{bmatrix} = \frac{1}{\sum_{v=1}^{k} e^{\theta_v^T a^{(s)}}} \begin{bmatrix} e^{\theta_1^T a^{(s)}} \\ e^{\theta_2^T a^{(s)}} \\ \vdots \\ e^{\theta_k^T a^{(s)}} \end{bmatrix}$
The error between the target and the predicted class labels has to be described, and the cost function of the softmax classifier $J(\theta)$ is defined as follows:
$J(\theta) = -\frac{1}{N} \left[ \sum_{s=1}^{N} \sum_{q=1}^{k} 1\{ b^{(s)} = q \} \log \frac{e^{\theta_q^T a^{(s)}}}{\sum_{v=1}^{k} e^{\theta_v^T a^{(s)}}} \right] + \frac{\gamma}{2} \left\| \theta \right\|_2^2$
To prevent overfitting, the second term, also called the weight penalty term, is used, and the penalty term parameter $\gamma$ controls its significance. To procure the optimal parameter matrix $\theta$, the softmax classifier is trained like the CAE: by minimizing the cost function with the back propagation (BP) algorithm, a suitable transfer function is established between the input and the target labels. Every element of $\theta$ is updated after each iteration as follows:
$\theta_{ji} = \theta_{ji} - \varepsilon_2 \frac{\partial J(\theta)}{\partial \theta_{ji}}$
where $\varepsilon_2$ is the learning rate and $\theta_{ji}$ indicates the element of $\theta$ in the $j$th row and $i$th column.
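The class probabilities and the per-element update above can be sketched as follows; for brevity this sketch takes a single-example gradient step rather than summing over the full batch.

```python
import numpy as np

def softmax_probs(theta, a):
    """Class probabilities for input a under parameter matrix theta."""
    scores = theta @ a
    e = np.exp(scores - scores.max())   # subtract max for numerical stability
    return e / e.sum()

def softmax_grad_step(theta, a, label, lr, gamma):
    """One gradient-descent update of theta for a single example (a, label),
    with weight-decay parameter gamma for the penalty term."""
    p = softmax_probs(theta, a)
    target = np.zeros(len(p))
    target[label] = 1.0                         # one-hot indicator 1{b = q}
    grad = np.outer(p - target, a) + gamma * theta
    return theta - lr * grad
```

One step on a correctly labelled example raises that class's probability, which is the behaviour the cost minimization above is driving toward.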

3.3. Stacked CAE Network

To form a stacked CAE network with hidden layers, $L$ CAEs and one softmax classifier are stacked together [40]. The encoder network of CAE1 comprises the input and the first hidden layer; the encoder network of CAE2 comprises the first and second hidden layers. A softmax classifier is added to the output layer of the stacked CAE network so that the classification can be conducted efficiently. Pre-training and fine-tuning are the two important training processes of the stacked CAE network. Each CAE is pre-trained individually by the pre-training process, and to learn the intra-non-linear transformation relationship of every single CAE, a softmax classifier is used. The relationship between the target labels and the learned high-level features can be easily explored by this process, and a better set of weight initializations is obtained than with random initialization. The fine-tuning process learns the inter-relationships between the layers and establishes the most suitable transformation relationships between the labels and the learned high-level intrinsic features, thereby enabling classification. The softmax classifier is trained as in Section 3.2, by minimizing its cost function $J(\theta)$ with the BP algorithm and updating every element of $\theta$ after each iteration. To train CAE1, the BP optimization algorithm is used so that the optimal parameters are obtained by minimizing the cost function [37]. Once CAE1 is trained effectively, the first hidden representation $h_1^d$ is computed as the input vector of CAE2. The definition of $h_1^d$ is expressed as follows:
$h_1^d = f\!\left( W_1^{(1)} p_d + b_1^{(1)} \right)$
Using the BP algorithm, CAE2 is also trained effectively, and the second hidden representation $h_2^d$ is computed as the input vector of CAE3. This process is repeated until the $L$th CAE is trained effectively, and the $L$th hidden representation $h_L^d$ is computed as the input vector of the softmax classifier and is expressed as follows:
$h_L^d = f\!\left( W_L^{(1)} h_{L-1}^d + b_L^{(1)} \right)$
The hypothesis $Hyp_\theta(h_L^{(d)})$ and the stacked cost function $J(\theta)_{stacked}$ are calculated as follows:
$Hyp_\theta\!\left( h_L^{(d)} \right) = \begin{bmatrix} p(q^{(d)} = 1 \mid h_L^{(d)}; \theta) \\ p(q^{(d)} = 2 \mid h_L^{(d)}; \theta) \\ \vdots \\ p(q^{(d)} = k \mid h_L^{(d)}; \theta) \end{bmatrix} = \frac{1}{\sum_{v=1}^{k} e^{\theta_v^T h_L^{(d)}}} \begin{bmatrix} e^{\theta_1^T h_L^{(d)}} \\ e^{\theta_2^T h_L^{(d)}} \\ \vdots \\ e^{\theta_k^T h_L^{(d)}} \end{bmatrix}$
$J(\theta)_{stacked} = -\frac{1}{M} \left[ \sum_{d=1}^{M} \sum_{r=1}^{k} 1\{ q^{(d)} = r \} \log \frac{e^{\theta_r^T h_L^{(d)}}}{\sum_{v=1}^{k} e^{\theta_v^T h_L^{(d)}}} \right] + \frac{\gamma}{2} \left\| \theta \right\|_2^2$
Hence, by means of minimizing the cost function, the optimal parameter matrix θ is obtained, and ultimately, the training of the softmax classifiers is conducted with the help of supervised learning. Finally, to fine-tune the parameters of the entire network, the BP algorithm is used so that the most suitable relationship between the layers is well established, and thereby, a well-trained stacked CAE network is obtained. Figure 2 shows the illustration for the proposed stacked CAE for cough audio classification.

4. Proposed Technique 3: Efficient Feature Selection and Machine Learning Classification

4.1. Analysis of TQWT

An improved variant of the Wavelet Transform, known as TQWT, is used depending on the oscillation characteristics of the audio cough signal [41]. With the aid of real-valued sampling factors, reversible oversampling filters are used by TQWT. In every decomposition stage, the input is passed through low-pass and high-pass filters. The frequency responses of the low-pass and high-pass filters in TQWT are indicated as $G_0(w)$ and $G_1(w)$ and expressed as follows:
$G_0(w) = \begin{cases} 1, & |w| \leq (1 - \sigma)\pi \\ \theta\!\left[ \dfrac{w + (\sigma - 1)\pi}{\mu + \sigma - 1} \right], & (1 - \sigma)\pi < |w| < \mu\pi \\ 0, & \mu\pi \leq |w| \leq \pi \end{cases}$
$G_1(w) = \begin{cases} 0, & |w| \leq (1 - \sigma)\pi \\ \theta\!\left[ \dfrac{\mu\pi - w}{\mu + \sigma - 1} \right], & (1 - \sigma)\pi < |w| < \mu\pi \\ 1, & \mu\pi \leq |w| \leq \pi \end{cases}$
where $\mu$ denotes the low-pass scaling factor and $\sigma$ the high-pass scaling factor, with $0 < \mu < 1$, $0 < \sigma \le 1$ and $\mu + \sigma > 1$. The Daubechies frequency response $\theta(w)$ is used to construct the transition bands of $G_0(w)$ and $G_1(w)$, giving rise to the following equation:
$$\theta(w) = \frac{1}{2}(1 + \cos w)\sqrt{2 - \cos w}, \quad |w| \le \pi$$
In this multi-stage decomposition technique, the low-pass filter output $w_L$ of every stage is the input of the next stage, and the wavelet coefficients of each layer are $w_H$. After the decomposition is performed, every coefficient set can be processed to obtain a new set, and the signal can be reconstructed with the inverse TQWT. The main parameters are the oversampling rate $r$, the quality factor $Q$ and the decomposition level $D$. $Q = f_0/B_w$ is the ratio between the center frequency $f_0$ and the bandwidth of the signal $B_w$. The oversampling rate $r$ controls the number of wavelet coefficients and assesses the spectral overlap between the corresponding bandpass filters; in our work, $r = 5$ is considered. A $D$-level decomposition produces $D + 1$ sub-bands. From the bandwidth limitation of the filters, the maximum decomposition level is obtained as follows:
$$D_{\max} = \left\lfloor \frac{\lg(\sigma N_s/8)}{\lg(1/\mu)} \right\rfloor$$
where $N_s$ is the signal length. Once $Q$ and $r$ are chosen, the scaling factors can be computed as $\sigma = 2/(Q+1)$ and $\mu = 1 - 2/((Q+1)r)$.
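These parameter relations can be sketched directly. A minimal illustration follows; the function name and the example values ($Q = 1$, $N_s = 512$, together with the paper's $r = 5$) are our illustrative assumptions:

```python
import math

def tqwt_params(Q, r, Ns):
    """Compute the TQWT scaling factors and the maximum decomposition
    level from the quality factor Q, oversampling rate r and signal
    length Ns, following the relations in the text."""
    sigma = 2.0 / (Q + 1)        # high-pass scaling factor
    mu = 1.0 - sigma / r         # low-pass scaling factor, 1 - 2/((Q+1)r)
    # Floor of lg(sigma*Ns/8) / lg(1/mu); int() truncates, which equals
    # the floor for these positive arguments.
    D_max = int(math.log(sigma * Ns / 8.0) / math.log(1.0 / mu))
    return sigma, mu, D_max

# Example: Q = 1, r = 5 (as in the paper) and a 512-sample signal give
# sigma = 1.0, mu = 0.8 and D_max = 18.
```

Note that the constraints $0 < \mu < 1$ and $\mu + \sigma > 1$ are satisfied automatically for any $Q \ge 1$ and $r > 2$.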

4.2. Analysis of Sparse Coefficients of TQWT

Because TQWT is oversampled, the wavelet coefficients that represent a given audio cough signal are not unique. A sparse collection of TQWT coefficients can be obtained by means of basis pursuit. The $(D + 1)$ wavelet coefficient sets are specified as $\{w_d\}_{d=1}^{D+1}$, $w_d = [w_{d,1}, w_{d,2}, \ldots, w_{d,H_d}]$, where $H_d$ is the number of coefficients in the $d$-th set. The coefficients are obtained by solving the following optimization problem:
$$\arg\min_{w}\; \left\|z - \mathrm{TQWT}^{-1}(w)\right\|_2^2 + \sum_{d=1}^{D+1} \eta_d \left\|w_d\right\|_1$$
where $\eta_d$ is the regularization parameter of the $d$-th sub-band and $\mathrm{TQWT}^{-1}(\cdot)$ denotes TQWT reconstruction. This optimization problem can be solved easily with Lagrangian shrinkage methods.
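The core of such shrinkage solvers is a soft-threshold applied to each sub-band with its weight $\eta_d$. A minimal sketch of that step is given below; the operator names are ours, and a full solver would iterate this shrinkage together with the TQWT and its inverse:

```python
import numpy as np

def soft_threshold(w, eta):
    # Soft-thresholding: the proximal operator of eta * ||w||_1 and the
    # elementary shrinkage step of ISTA/SALSA-type solvers.
    return np.sign(w) * np.maximum(np.abs(w) - eta, 0.0)

def shrink_subbands(subbands, etas):
    # Apply the per-sub-band threshold eta_d to each coefficient set w_d.
    return [soft_threshold(w, eta) for w, eta in zip(subbands, etas)]
```

For example, `soft_threshold(np.array([3.0, -0.5, 1.0]), 1.0)` shrinks large coefficients towards zero and zeroes out the small ones.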

4.3. Analysis of Maximal Information Coefficient

The MIC measures the strength of association between two variables, considering both linear and non-linear relationships [42]. The data are partitioned into a grid, and the dependence between the variables is measured in every cell. For two variables $a_i$ and $a_j$, the MIC is computed as follows:
$$MIC = \max\left\{ \frac{I(a_i, a_j)}{\log_2 \min\{n_a, n_j\}} \right\}$$
where $I(a_i, a_j)$ is the mutual information over the grid and $n_a$ and $n_j$ are the numbers of partition bins. This highest value of dependence indicates how informative one variable is about the other. The MIC is quite robust to outliers and retains the merits of maximal dependency, which makes detecting relationships quite possible.
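A single-grid version of this score can be sketched as follows; the full MIC maximises the normalised mutual information over many grid resolutions, whereas this illustrative function (its name and the default bin counts are ours) evaluates just one grid:

```python
import numpy as np

def grid_mic_score(x, y, nx=4, ny=4):
    """Grid-based dependence score in the spirit of MIC: bin both
    variables, estimate the mutual information from the joint histogram
    and normalise by log2(min(nx, ny))."""
    joint, _, _ = np.histogram2d(x, y, bins=(nx, ny))
    pxy = joint / joint.sum()                 # joint cell probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (nx, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, ny)
    nz = pxy > 0                              # skip empty cells
    mi = float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
    return mi / np.log2(min(nx, ny))
```

A perfect monotone relationship yields a score of 1, while an unrelated pairing stays near 0.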

4.4. Analysis of Distance Correlation Coefficient

The DCC evaluates the interdependence between two random variables, analyzing both linear and non-linear relationships [43]. The distance covariance between the distance matrices of the two variables is normalized by the distance standard deviations, which is represented as follows:
$$dCor(a_i, a_j) = \frac{dCov(a_i, a_j)}{\sqrt{dVar(a_i)\, dVar(a_j)}}$$
where the pairwise distances between the observations of each sample are collected in a distance matrix. The correlation present between the scalar products is emphasized by the DCC. The proposed model of feature extraction/selection and classification for audio cough classification is shown in Figure 3.
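The equation above can be computed directly from the doubly-centred pairwise-distance matrices. A minimal sketch for scalar samples (the function name is ours) follows:

```python
import numpy as np

def distance_correlation(a, b):
    """Sample distance correlation dCor(a, b): the distance covariance of
    the doubly-centred pairwise-distance matrices, normalised by the
    distance variances, as in the equation above."""
    a = np.asarray(a, float).reshape(-1, 1)
    b = np.asarray(b, float).reshape(-1, 1)
    A = np.abs(a - a.T)                       # pairwise distance matrices
    B = np.abs(b - b.T)
    # Double-centre both matrices (subtract row/column means, add grand mean).
    A = A - A.mean(0) - A.mean(1)[:, None] + A.mean()
    B = B - B.mean(0) - B.mean(1)[:, None] + B.mean()
    dcov2 = (A * B).mean()                    # squared distance covariance
    dvar2 = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / dvar2)) if dvar2 > 0 else 0.0
```

For any non-degenerate linear relationship (e.g. $b = 2a + 1$), the distance correlation equals 1.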

4.5. Feature Selection

Not all extracted features have high discriminative capability, and keeping them all can lead to overfitting. Only the essential features must be retained, and the redundant ones should be eliminated. Feature selection identifies the optimal features by optimizing a fitness (cost) function, often with nature-inspired algorithms. In this work, a simple and effective algorithm called the Binary Tunicate Swarm Algorithm (BTSA) is used for feature selection.

4.5.1. BTSA

This famous swarm-based optimization algorithm models the food-source search of tunicates in the deep ocean. With the help of a jet-propulsion scheme, the tunicates progress towards food sources, exhibiting a high degree of collective intelligence. The mathematical modelling of these two behaviors, jet propulsion and swarm behavior, is used to solve various numerical optimization problems. In the BTSA, the tunicate position is converted to a binary vector [44]. Owing to the swarm behavior, collisions among the tunicates are prevented while they progress towards the best tunicate; while searching for food particles, the tunicates avoid colliding with one another. This strategy is defined as follows:
$$P = \frac{G}{S}$$
where $G$ represents the gravity force and $S$ the social forces among the agents; both regulate tunicate movement. The water flow $W$ is used to control the gravity force and is represented as follows:
$$G = c_1 + c_2 - W$$
$$W = 2 c_3$$
where $c_1$, $c_2$ and $c_3$ are random variables drawn from $rand(0, 1)$. The parameter $S$ is expressed as follows:
$$S = h_i + c_3 (h_s - h_i)$$
where $h_i$ and $h_s$ indicate the minimum and maximum speed, ensuring good social interaction. The tunicates trace the best position and progress towards it so that collisions between them are avoided. The average distance is computed as follows:
$$D_k = \left| Q_b - r_1 \cdot Q_k(i) \right|$$
where $D_k$ is the distance of the present location $Q_k(i)$ of the $k$-th tunicate from the best tunicate position $Q_b$, and $r_1$ is a random number. The tunicate tries to remain near the best tunicate, so a good source position is sought, expressed by the following equation:
$$Q_k(i) = \begin{cases} Q_b + P \cdot D_k, & rand \ge 0.5 \\ Q_b - P \cdot D_k, & rand < 0.5 \end{cases}$$
The next k t h tunicate position is updated as follows:
$$Q_k(i+1) = \frac{Q_k(i) + Q_b}{2 + c_1}$$
To match the feature selection problem, these positions are mapped to binary values by applying a transfer function such as an S-shaped or V-shaped function. In this work, the BTSA uses a V-shaped transfer function $T$, so the positions are updated well from one iteration to the next in binary space. The tunicate position in binary form is updated as follows:
$$Q_k(i+1) = \begin{cases} \overline{Q_k(i)}, & rand() < T(Q_k(i+1)) \\ Q_k(i), & rand() \ge T(Q_k(i+1)) \end{cases}$$
where $\overline{Q_k(i)}$ indicates the complement of $Q_k(i)$.
The fitness function depends on the feature selection ratio and the classification error and is designed as follows:
$$Fitness_1 = k_1 \cdot error + k_2 \cdot FSR$$
where $error$ represents the classification error and the feature selection ratio ($FSR$) is the number of chosen features divided by the total feature size. The weighting factors $k_1$ and $k_2$ are chosen in the range [0, 1]. A small FSR is usually preferable, as it mitigates the classifier burden.
The BTSA together with this fitness function performs the optimal feature selection. Algorithm 1 shows the BTSA feature-selection process. The tunicates symbolize the search-agent position vectors and are initialized with randomly generated binary values. Each agent selects a feature subset, where a binary value of 1 indicates that the corresponding feature of the initial vector is chosen. With its feature subset, every agent performs the classification task, and the fitness function is used to evaluate the result. The position vector that minimizes the fitness function is recorded. The TSA updates the position of every agent, and the transformation function converts the positions into binary values. The algorithm iterates, producing new position vectors, until the termination criterion is met; when the iterations end, the best position vector provides the optimal feature set.
Algorithm 1: BTSA Algorithm
Input: Feature Set
Assign: Search agents (Tunicate population)
Initialize TSA parameters
   For i = 1: no of search agents do
      Choose feature with a position 1
      Execute classification task with selected features
End
Assess fitness function
Memorize the best agent position
While (no of iterations) do
   Update the tunicate positions
   Convert to binary values
   Execute classification using updated positions
   Assess Fitness function
   Memorize the global best position
   If convergence reached then
   Exit the process
   End
End
Output the optimal feature set.
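Algorithm 1 can be sketched compactly. The illustration below uses a toy fitness of the form $k_1 \cdot error + k_2 \cdot FSR$; the speed bounds, population size and the particular V-shaped transfer function are our illustrative assumptions, not values from the paper:

```python
import numpy as np

def v_transfer(x):
    # V-shaped transfer function: maps a continuous position update to a
    # bit-flip probability in [0, 1).
    return np.abs(np.tanh(x))

def btsa_select(fitness, n_feats, n_agents=20, n_iter=50, seed=0):
    """Sketch of the BTSA loop of Algorithm 1. `fitness` maps a binary
    mask to a cost to be minimised; continuous TSA position updates are
    mapped to bit flips via the V-shaped transfer."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(n_agents, n_feats))
    fits = [fitness(ag) for ag in pop]          # evaluate initial agents
    best_idx = int(np.argmin(fits))
    best, best_fit = pop[best_idx].copy(), fits[best_idx]
    h_min, h_max = 1.0, 4.0                     # illustrative speed bounds
    for _ in range(n_iter):
        for ag in pop:
            c1, c2, c3 = rng.random(3)
            W = 2.0 * c3                        # water flow
            G = c1 + c2 - W                     # gravity force
            S = h_min + c3 * (h_max - h_min)    # social forces
            P = G / S
            D = np.abs(best - rng.random() * ag)  # distance to best tunicate
            step = best + P * D if rng.random() >= 0.5 else best - P * D
            flip = rng.random(n_feats) < v_transfer(step)
            ag[flip] = 1 - ag[flip]             # binary position update
            f = fitness(ag)
            if f < best_fit:
                best, best_fit = ag.copy(), f
    return best, best_fit
```

Running with more iterations can only improve (never worsen) the memorised best fitness for a fixed seed.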

4.5.2. Choosing Aggregate Functions

The main goal of this feature selection stage lies in aggregation function operations [45]. In some techniques, every feature selection result is collected and combined to make a final decision. Aggregation functions based on basic operations such as the average, minimum, maximum or median are usually considered. Aggregation techniques can also be developed easily from voting strategies.

Aggregation Techniques Depending on Basic Operations

Assume there are $k$ feature selection results, each comprising $y$ features. To aggregate the various feature subsets, the final score of every feature is computed and the features are ordered by it; the higher the end score, the more significant the feature. The end score of feature $f_z$ ($z = 1, 2, \ldots, y$) is computed as follows. Max aggregation sets the end score of $f_z$ to the highest of its scores across the subsets:
$$Score = Max(F_1(f_z), F_2(f_z), \ldots, F_k(f_z))$$
Min aggregation sets the end score of $f_z$ to the lowest of its scores across the subsets:
$$Score = Min(F_1(f_z), F_2(f_z), \ldots, F_k(f_z))$$
Median aggregation sets the end score of $f_z$ to the median of its scores across the subsets:
$$Score = Median(F_1(f_z), F_2(f_z), \ldots, F_k(f_z))$$
Mean aggregation sets the end score of $f_z$ to the mean of its scores across the subsets:
$$Score = \frac{1}{k} \sum_{i=1}^{k} F_i(f_z)$$
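These four operators reduce a $k \times y$ matrix of scores (one row per feature selection result) to one score per feature. A minimal sketch (the function name is ours):

```python
import numpy as np

def aggregate_scores(score_matrix, mode="mean"):
    """Combine k feature-selection score vectors (rows) into one final
    score per feature using the basic aggregation operators above."""
    ops = {"max": np.max, "min": np.min, "median": np.median, "mean": np.mean}
    # Aggregate down the columns: one end score per feature.
    return ops[mode](np.asarray(score_matrix, float), axis=0)
```

For example, with $k = 3$ results over two features, `aggregate_scores([[1, 4], [3, 2], [2, 6]], "max")` yields `[3., 6.]`.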

4.5.3. Factor Analysis

Factor analysis (FA) summarizes or reduces the relationships among variables so that a limited number of basic dimensions is ultimately obtained [46]. By reducing the dimensionality, the dependency structure is simplified: the relationships between the variables are explored, new structures are formed and the number of variables is reduced. Varimax rotation is selected when the factors in FA are independent. The correlation between the variables is computed by assessing the ingredients of the obtained factors. Exploratory and confirmatory factor analysis are the two types generally used, and in this work, exploratory factor analysis (EFA) is used. Each factor is assumed to be a simple linear combination of its variables. Since the covariance and correlation between the observed variables are linked, FA forms a variable-reduction method: a collection of variables is reduced by using a limited number of latent factors. Variables with very high correlations are grouped by FA and analyzed together as factors. FA is referred to by various names based on its utility. Using the correlation and covariance matrices of the data, independent new variables are formed by the EFA.

EFA

To provide good visualization and analysis, the correlated variables are considered together, and the best few factors are retained while the total number of variables is reduced. This dimension reduction technique uses the covariance or correlation matrix. The observed variables ($Z_i$, $i = 1, 2, \ldots, q$) are assumed to be linear combinations of the factors. A factor analysis model with $q$ observed variables and $k$ common factors is expressed as follows:
$$Z_1 - \mu_1 = c_{11} F_1 + c_{12} F_2 + \cdots + c_{1k} F_k + \varepsilon_1$$
$$Z_2 - \mu_2 = c_{21} F_1 + c_{22} F_2 + \cdots + c_{2k} F_k + \varepsilon_2$$
$$\vdots$$
$$Z_q - \mu_q = c_{q1} F_1 + c_{q2} F_2 + \cdots + c_{qk} F_k + \varepsilon_q$$
where the coefficients $c_{ij}$ are the factor loadings and $\mu$ is the mean vector. The matrix form of the FA model is expressed as follows:
$$Z - \mu = Load_m \, Factor_v + \varepsilon$$
where $Z - \mu$ is the centred variable vector, $Load_m$ is the factor loading matrix, $Factor_v$ is the factor vector and $\varepsilon$ is the error vector. Likelihood techniques are used in this study to estimate the factors.
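A common starting point for estimating the loading matrix $Load_m$ is principal-factor extraction from the correlation matrix; the likelihood estimation mentioned above would refine such a solution iteratively. A minimal sketch (the function name is ours):

```python
import numpy as np

def efa_loadings(X, k):
    """Principal-factor sketch of EFA: estimate the loading matrix from
    the top-k eigenpairs of the correlation matrix of the data X
    (rows = observations, columns = variables)."""
    R = np.corrcoef(np.asarray(X, float), rowvar=False)
    vals, vecs = np.linalg.eigh(R)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]        # pick the top-k factors
    # Loadings: eigenvectors scaled by the square roots of their eigenvalues.
    return vecs[:, order] * np.sqrt(vals[order])
```

The squared loadings of each variable summed over the retained factors (the communalities) never exceed 1, since the correlation matrix has unit diagonal.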

4.6. KELM

The ELM is a commonly used machine learning technique comprising a single-hidden-layer feed-forward neural network (SLFN) [47]. A standard neural network contains many parameters, and adjusting them through back propagation is quite difficult. The ELM instead initializes the weights and biases connecting the inputs and hidden nodes in a random manner and then estimates the output weights analytically, which greatly improves performance and learning efficiency. However, the random start mechanism and the choice of hidden-layer nodes can sometimes make the ELM prediction unstable, and the KELM was proposed to prevent these shortcomings. The expression of the SLFN is as follows:
$$f(p) = h(p)\beta = H\beta$$
where $p$ is the input vector, $\beta$ denotes the weights that fulfil the feature transformation and $H$ is the feature mapping matrix.
β is calculated as follows:
$$\beta = H^T \left( \frac{I}{C} + H H^T \right)^{-1} T$$
where $C$ is the regularization coefficient and $T$ denotes the label data of the training dataset. The expression of the ELM is as follows:
$$f(p) = h(p)\beta = h(p) H^T \left( \frac{I}{C} + H H^T \right)^{-1} T$$
A kernel function replaces the feature mapping of the hidden layer, so that the KELM is obtained with better generalization and stability. The definition of the KELM is as follows:
$$f(p) = h(p) H^T \left( \frac{I}{C} + H H^T \right)^{-1} T = \begin{bmatrix} k(p, p_1) \\ \vdots \\ k(p, p_N) \end{bmatrix}^T \left( \frac{I}{C} + \Omega_{KELM} \right)^{-1} T$$
where Ω K E L M is known as the kernel matrix and is expressed as follows:
$$\Omega_{KELM}(i, j) = h(p_i) \cdot h(p_j) = k(p_i, p_j)$$
When compared to SVM [48], the generalization performance of the KELM is much better.
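The KELM training and prediction equations above can be sketched directly. A minimal illustration with an RBF kernel follows; the kernel choice, the value of $\gamma$ and the function names are our illustrative assumptions:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kelm_fit(X, T, C=4.0, kernel=rbf_kernel):
    # Solve alpha = (I/C + Omega)^(-1) T with Omega[i, j] = k(x_i, x_j),
    # following the KELM equations above.
    omega = kernel(X, X)
    alpha = np.linalg.solve(np.eye(len(X)) / C + omega, T)
    return X, alpha, kernel

def kelm_predict(model, Xnew):
    # f(p) = [k(p, p_1) ... k(p, p_N)] (I/C + Omega)^(-1) T
    X, alpha, kernel = model
    return kernel(Xnew, X) @ alpha
```

With one-hot label rows in `T`, the predicted class of each new sample is the argmax of the corresponding output row.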

4.6.1. Arc-Cosine Kernel Function

Choosing the kernel function is quite important in the KELM. The most used kernel functions are the sigmoid, wavelet, polynomial, RBF, etc. In this study, the arc-cosine kernel function is used: the similarity of the input vectors is computed depending on the cosine angle, and good results have been obtained with it [49]. With the help of an integral representation, the $n$-th order kernel is expressed as follows:
$$K_n(p, q) = 2 \int dw \, \frac{e^{-\frac{\|w\|^2}{2}}}{(2\pi)^{d/2}} \, \Theta(w \cdot p) \, \Theta(w \cdot q) \, (w \cdot p)^n (w \cdot q)^n$$
where the Heaviside step function is indicated as follows:
$$\Theta(l) = \frac{1}{2}\left(1 + \mathrm{sign}(l)\right)$$
The arc-cosine kernel family of degree $n$ is expressed as follows:
$$k_n(p, q) = \frac{1}{\pi} \|p\|^n \|q\|^n A_n(\theta)$$
where the angular dependence is computed by A n ( θ ) and is expressed as follows:
$$A_n(\theta) = (-1)^n (\sin\theta)^{2n+1} \left( \frac{1}{\sin\theta} \frac{\partial}{\partial\theta} \right)^n \left( \frac{\pi - \theta}{\sin\theta} \right), \quad n = 1, 2, 3, \ldots$$
The first three expressions of $A_n(\theta)$ are
$$A_0(\theta) = \pi - \theta, \quad A_1(\theta) = \sin\theta + (\pi - \theta)\cos\theta, \quad A_2(\theta) = 3\sin\theta\cos\theta + (\pi - \theta)\left(1 + 2\cos^2\theta\right)$$
The similarity of vectors $p$ and $q$ is measured by the angle $\theta$, expressed as follows:
$$\theta = \cos^{-1}\left( \frac{p \cdot q}{\|p\| \|q\|} \right)$$
Because the degree $n$ can take arbitrary values, various geometric properties are captured by the arc-cosine kernel. The arc-cosine kernel function has no corresponding kernel parameters, so no tedious and meticulous pre-setting is required.
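The closed forms of $A_0$, $A_1$ and $A_2$ above make the kernel straightforward to evaluate. A minimal sketch (the function name is ours) follows:

```python
import numpy as np

def arc_cosine_kernel(p, q, n=1):
    """Arc-cosine kernel of degree n: (1/pi) ||p||^n ||q||^n A_n(theta),
    using the closed forms of A_0, A_1 and A_2 listed above."""
    norm_p, norm_q = np.linalg.norm(p), np.linalg.norm(q)
    cos_t = np.clip(p @ q / (norm_p * norm_q), -1.0, 1.0)
    t = np.arccos(cos_t)                  # angle between p and q
    A = {0: np.pi - t,
         1: np.sin(t) + (np.pi - t) * np.cos(t),
         2: 3.0 * np.sin(t) * np.cos(t)
            + (np.pi - t) * (1.0 + 2.0 * np.cos(t) ** 2)}[n]
    return (norm_p * norm_q) ** n * A / np.pi
```

As a sanity check, for $\theta = 0$ (a vector with itself) the degree-1 kernel reduces to $\|p\|^2$, and the degree-0 kernel of orthogonal vectors equals $1/2$.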

4.6.2. RSO-Based KELM

The kernel parameter and penalty coefficient are important choices in the KELM algorithm. The RSO algorithm possesses a good optimization capability, and so, it is used to optimize the KELM [50]. The fitness function is the mean square error (MSE), and, in every iteration, the parameters with the minimum error are kept so that the best kernel parameter and penalty coefficient can be obtained. The MSE formula is as follows:
$$MSE = \frac{1}{N} \sum_{t=1}^{N} \left( \hat{m}(t) - m(t) \right)^2$$
where $m(t)$ denotes the original data, $\hat{m}(t)$ the predicted data and $N$ the number of predicted samples. The main procedure of RSO-KELM is as follows:
(a) The features are input and divided into training and test data. The kernel parameter and penalty coefficient of the KELM are taken as the variables to optimize.
(b) The rat population size is set as $s$, and the maximum iteration number is set as $T$.
(c) The individual position of each rat is assigned as $R_i$, $i = 1, 2, \ldots, n$.
(d) The fitness value of every rat is computed.
(e) The best individual rat position $R_r$ is determined.
(f) The present iteration number is set to $i = 0$.
(g) The individual position information of each rat is updated.
(h) The updated fitness value of every rat is computed.
(i) The present optimal position $R_r$ is saved if the present position is better than the previous one.
(j) Set $i = i + 1$.
(k) If the termination criterion is met, the algorithm ends; otherwise, the individual rat positions are updated and the process repeats from step (g).
(l) The individual position of the best rat $R_r$, which is nothing but the best kernel parameter and penalty coefficient, is given as the output.
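Steps (a)-(l) can be sketched as a swarm search over the (penalty coefficient, kernel parameter) pair that keeps the position with minimum MSE. The position update below is a simplified chase-the-best rule standing in for the exact RSO equations of [50], and the toy objective in the test is illustrative:

```python
import numpy as np

def rso_tune_kelm(mse_of, bounds, n_rats=10, n_iter=30, seed=0):
    """Swarm-style sketch of steps (a)-(l): minimise mse_of(position)
    over box bounds, memorising the best rat position R_r."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, float).T      # bounds = [(lo, hi), ...]
    rats = lo + rng.random((n_rats, len(lo))) * (hi - lo)
    fits = np.array([mse_of(r) for r in rats])
    best_idx = int(fits.argmin())
    best, best_fit = rats[best_idx].copy(), float(fits[best_idx])
    for it in range(n_iter):
        A = 1.0 - it / n_iter                 # decaying exploration factor
        for i in range(n_rats):
            C = 2.0 * rng.random()            # random attraction strength
            step = A * rats[i] + C * (best - rats[i])
            rats[i] = np.clip(np.abs(best - step), lo, hi)
            f = mse_of(rats[i])
            if f < best_fit:                  # save the new optimum R_r
                best, best_fit = rats[i].copy(), f
    return best, best_fit
```

In the real pipeline, `mse_of` would train a KELM with the candidate penalty coefficient and kernel parameter and return its validation MSE.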

5. Results and Discussion

This work was implemented on a publicly available dataset: http://web.firat.edu.tr/turkertuncer/acute_asthma_cough.rar [51]. From Firat University Hospital, 842 subjects were considered, including healthy individuals and individuals with conditions such as heart failure, COVID-19 and acute asthma. The cough signals were acquired with smartphone microphones and saved at a sampling rate of 48 kHz. To avoid redundancy, some of the speech segments were removed manually. Subdividing every cough sound signal yielded a smaller number of segments. The experimental parameter settings in this work were as follows. For the BTSA, the weighting factor was set to 0.6, the iteration limit to 500, the search space to [0, 10] and the population size to 50. For the ELM, the regularization coefficient was set to 4 and the kernel parameter to 200; the gamma value was set to 0.5, and the error tolerance in the kernel matrix factorization was determined by a trial-and-error approach. The model structure selection was based on the input dimension. For the LASSO model, a grid search over the range 0 to 20 was used. For SVR, the constant value was set as 0.5 and the gamma value as 0.8. For the autoencoders, the learning rate of every layer was 0.4, and the fine-tuning learning rate was set at 0.2. For the bagging ensemble learning hyperparameters, the number of trees was specified as 10, the minimum leaf size of the tree was set as 50 and the number of predictors in each split was set as 75. A 10-fold cross-validation method was used for all experiments.
This work was implemented in MATLAB 2020a on a Windows 10 operating system with a 512 GB SSD, an Intel i7 CPU and 32 GB of RAM.
Table 1 shows the proposed work of robust models with Gabor dictionary analysis and machine learning. Table 2 shows the proposed stacked CAE for audio cough classification. Table 3 shows the proposed model of TQWT, feature selection and classification for audio cough classification. Table 4 shows the proposed model of sparse TQWT, feature selection and classification for audio cough classification. Table 5 shows the proposed model of MIC, feature selection and classification for audio cough classification. Table 6 shows the proposed model of DCC, feature selection and classification for audio cough classification. On examining Table 1, it is evident that a high classification accuracy of 98.78% is obtained when the Gabor dictionary is utilized with an elastic net model and ensemble models. A low classification accuracy of 92.05% is obtained when the Gabor dictionary is utilized with the LASSO model. On examining Table 2, it is evident that a high classification accuracy of 96.35% is obtained when using SCAE, and a low classification accuracy of 91.02% is obtained when using AE. On examining Table 3, it is evident that a high classification accuracy of 98.34% is obtained when AF is utilized with the arc-cosine ELM classifier. A low classification accuracy of 90.31% is obtained when FA is utilized with the ELM classifier. On examining Table 4, it is evident that a high classification accuracy of 98.99% is obtained when AF is utilized with the arc-cosine ELM classifier. A low classification accuracy of 91.23% is obtained when FA is utilized with the ELM classifier. On examining Table 5, it is evident that a high classification accuracy of 98.86% is obtained when EFA is utilized with the arc-cosine ELM classifier. A low classification accuracy of 90.11% is obtained when FA is utilized with the ELM classifier. On examining Table 6, it is evident that a high classification accuracy of 98.93% is obtained when EFA is classified with the arc-cosine ELM classifier. 
A low classification accuracy of 92.23% is obtained when the BTSA is classified with the ELM classifier.
On examining Figure 4, it is evident that a high classification accuracy is obtained for the Gabor dictionary and elastic net combination classified with the ensemble models. On examining Figure 5, it is evident that a high classification accuracy is obtained for the TQWT concept with AF analysis and the arc-cosine ELM classifier. On examining Figure 6, it is evident that a high classification accuracy is obtained from the DCC concept with EFA analysis and classified with the arc-cosine ELM classifier.

Performance Comparison with Other Works

A generalized comparison is given in Table 7 for the reader’s understanding where different techniques have been utilized with various subjects reporting a classification accuracy for various datasets utilized for audio cough classification. However, this work mainly compares its results with the Firat University Hospital dataset [51] that contains 110 individuals with acute asthma, 247 healthy individuals, 241 individuals with COVID-19 and 244 individuals with heart failure, and the results are provided.
On examining Table 7, the proposed work gives very good results when compared to previous works, especially reference [51]. Although that work reported a classification accuracy of 100 percent, it concentrated on only a single idea, whereas in this work, three unique ideas were implemented, and a comprehensive analysis was performed and reported. The highest classification accuracy of 98.99% was obtained when sparse TQWT with AF was implemented with the arc-cosine ELM classifier. The second highest classification accuracy of 98.93% was obtained when DCC with EFA was implemented with the arc-cosine ELM classifier. The third highest classification accuracy of 98.86% was obtained when MIC with EFA was implemented with the arc-cosine ELM classifier. The advantage of the proposed model is that it utilizes the intrinsic properties of all the hybrid models, taking advantage of the strengths of each algorithm to produce consistently good results. The disadvantage of the proposed model is that the models and algorithms must be chosen carefully, as certain combinations can produce extremely good results while others produce very poor ones. As a result, many trial-and-error combinations must be tried out, which can demand a lot of time and energy from researchers.

6. Conclusions and Future Works

In machine learning, the cough-based detection of diseases is one of the hot research issues. As the COVID-19 pandemic affected people's lives very badly, and the early signs of COVID-19 may resemble those of other medical conditions such as kidney problems, lung problems and heart failure, automated analysis is valuable. With the advent of AI techniques, many successful applications have been implemented in the field of medicine. In this work, cough sound signals are used for disease detection with the help of several algorithms and machine learning techniques. Various computer-aided diagnostic methods are used for the automated detection of many disorders, and in this work, such strategies are developed using systematic conglomerated models. The best results were produced when sparse TQWT with AF was implemented with the arc-cosine ELM classifier, producing a classification accuracy of 98.99%. The second highest classification accuracy of 98.93% was obtained when DCC with EFA was implemented with the arc-cosine ELM classifier. In the future, advanced AI chip-based diagnostic models can be developed, and this work can be extended to telemedicine and cloud-based applications by implementing it directly on embedded systems. This work can be useful in pulmonology clinics, where it can largely assist the physician. The strategy can also be combined with other signal processing techniques, and larger audio cough datasets can be considered in the future.

Author Contributions

S.K.P.—Conceptualization, Methodology, Software, Validation, Formal Analysis, Investigation, Resources, Data Curation, Writing—Original draft preparation; D.-O.W.—Validation, Formal Analysis, Writing—review and editing, visualization, supervision, project administration, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A5A8019303) and partly supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02068, Artificial Intelligence Innovation Hub).

Data Availability Statement

The data is publicly available online at http://web.firat.edu.tr/turkertuncer/acute_asthma_cough.rar.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Harle, A.S.; Blackhall, F.H.; Molassiotis, A.; Yorke, J.; Dockry, R.; Holt, K.J.; Yuill, D.; Baker, K.; Smith, J.A. Cough in patients with lung cancer: A longitudinal observational study of characterization and clinical associations. Chest 2019, 155, 103–113. [Google Scholar] [CrossRef] [PubMed]
  2. Sharma, N.; Krishnan, P.; Kumar, R.; Ramoji, S.; Chetupalli, S.R.; Ghosh, P.K.; Ganapathy, S. Coswara—A database of breathing, cough, and voice sounds for COVID-19 diagnosis. arXiv 2020, arXiv:2005.10548. [Google Scholar]
  3. Nessiem, M.A.; Mohamed, M.M.; Coppock, H.; Gaskell, A.; Schuller, B.W. Detecting COVID-19 from breathing and coughing sounds using deep neural networks. In Proceedings of the 2021 IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS), Aveiro, Portugal, 7–9 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 183–188. [Google Scholar]
  4. Abaza, A.A.; Day, J.B.; Reynolds, J.S.; Mahmoud, A.M.; Goldsmith, W.T.; McKinney, W.G.; Petsonk, E.L.; Frazer, D.G. Classification of voluntary cough sound and airflow patterns for detecting abnormal pulmonary function. Cough 2009, 5, 8. [Google Scholar] [CrossRef] [PubMed]
  5. Lücio, C.; Teixeira, C.; Henriques, J.; de Carvalho, P.; Paiva, R.P. Voluntary cough detection by internal sound analysis. In Proceedings of the 2014 7th International Conference on Biomedical Engineering and Informatics, BMEI 2014, Dalian, China, 14–16 October 2014. [Google Scholar] [CrossRef]
  6. Thorpe, C.W.; Toop, L.J.; Dawson, K.P. Towards a quantitative description of asthmatic cough sounds. Eur. Respir. J. 1992, 5, 685–692. [Google Scholar] [CrossRef] [PubMed]
  7. Kosasih, K.; Abeyratne, U.R.; Swarnkar, V.; Triasih, R. Wavelet augmented cough analysis for rapid childhood pneumonia diagnosis. IEEE Trans. Biomed. Eng. 2015, 62, 1185–1194. [Google Scholar] [CrossRef]
  8. Sharan, R.V.; Abeyratne, U.R.; Swarnkar, V.R.; Porter, P. Automatic croup diagnosis using cough sound recognition. IEEE Trans. Biomed. Eng. 2018, 66, 485–495. [Google Scholar] [CrossRef] [PubMed]
  9. Swarnkar, V.; Abeyratne, U.R.; Chang, A.B.; Amrulloh, Y.A.; Setyati, A.; Triasih, R. Automatic identification of wet and dry cough in pediatric patients with respiratory diseases. Ann. Biomed. Eng. 2013, 41, 1016–1028. [Google Scholar] [CrossRef] [PubMed]
  10. Shi, Y.; Liu, H.; Wang, Y.; Cai, M.; Xu, W. Theory and application of audio-based assessment of cough. J. Sens. 2018, 2018, 9845321. [Google Scholar] [CrossRef]
  11. Rudraraju, G.; Palreddy, S.; Mamidgi, B.; Sripada, N.R.; Sai, Y.P.; Vodnala, N.K.; Haranath, S.P. Cough sound analysis and objective correlation with spirometry and clinical diagnosis. Inform. Med. Unlocked 2020, 19, 100319. [Google Scholar] [CrossRef]
  12. Malik, H.; Anees, T. Multi-modal deep learning methods for classification of chest diseases using different medical imaging and cough sounds. PLoS ONE 2024, 19, e0296352. [Google Scholar] [CrossRef]
  13. Pahar, M.; Klopper, M.; Reeve, B.; Warren, R.; Theron, G.; Niesler, T. Automatic cough classification for tuberculosis screening in a real-world environment. Physiol. Meas. 2021, 42, 105014. [Google Scholar] [CrossRef] [PubMed]
  14. Bansal, V.; Pahwa, G.; Kannan, N. Cough Classification for COVID-19 based on audio mfcc features using Convolutional Neural Networks. In Proceedings of the 2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON), Greater Noida, India, 2–4 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 604–608. [Google Scholar]
  15. Ulukaya, S.; Sarıca, A.A.; Erdem, O.; Karaali, A. MSCCov19Net: Multi-branch deep learning model for COVID-19 detection from cough sounds. Med. Biol. Eng. Comput. 2023, 61, 1619–1629. [Google Scholar] [CrossRef] [PubMed]
  16. Loey, M.; Mirjalili, S. COVID-19 cough sound symptoms classification from scalogram image representation using deep learning models. Comput. Biol. Med. 2021, 139, 105020. [Google Scholar] [CrossRef] [PubMed]
  17. Chowdhury, N.K.; Kabir, M.A.; Rahman, M.M.; Islam SM, S. Machine learning for detecting COVID-19 from cough sounds: An ensemble-based MCDM method. Comput. Biol. Med. 2022, 145, 105405. [Google Scholar] [CrossRef] [PubMed]
  18. Ghrabli, S.; Elgendi, M.; Menon, C. Identifying unique spectral fingerprints in cough sounds for diagnosing respiratory ailments. Sci. Rep. 2024, 14, 593. [Google Scholar] [CrossRef] [PubMed]
  19. Balamurali, B.T.; Hee, H.I.; Kapoor, S.; Teoh, O.H.; Teng, S.S.; Lee, K.P.; Herremans, D.; Chen, J.M. Deep Neural Network-Based Respiratory Pathology Classification Using Cough Sounds. Sensors 2021, 21, 5555. [Google Scholar] [CrossRef] [PubMed]
  20. Kho, S.J.; Shiong, B.L.C.; Wan-Tze, V.; Boon, L.K.; Pathmanathan, M.D.; Rahman, M.A.B.A.; Xuan, K.P.; Hanafi, W.N.B.W.; Peariasamy, K.M.; Hui, P.T.H. Malaysian cough sound analysis and COVID-19 classification with deep learning. Intell. Based Med. 2024, 9, 100129. [Google Scholar] [CrossRef]
  21. Pal, A.; Sankarasubbu, M. Pay attention to the cough: Early diagnosis of COVID-19 using interpretable symptoms embeddings with cough sound signal processing. In Proceedings of the 36th Annual ACM Symposium on Applied Computing, Virtual, 22–26 March 2021; pp. 620–628. [Google Scholar]
  22. Pahar, M.; Klopper, M.; Warren, R.; Niesler, T. COVID-19 cough classification using machine learning and global smartphone recordings. Comput. Biol. Med. 2021, 135, 104572. [Google Scholar] [CrossRef] [PubMed]
  23. Islam, R.; Abdel-Raheem, E.; Tarique, M. A study of using cough sounds and deep neural networks for the early detection of COVID-19. Biomed. Eng. Adv. 2022, 3, 100025. [Google Scholar] [CrossRef]
  24. Knocikova, J.; Korpas, J.; Vrabec, M.; Javorka, M. Wavelet analysis of voluntary cough sound in patients with respiratory diseases. J. Physiol. Pharmacol. 2008, 59, 331–340. [Google Scholar]
  25. Mouawad, P.; Dubnov, T.; Dubnov, S. Robust Detection of COVID-19 in Cough Sounds: Using Recurrence Dynamics and Variable Markov Model. SN Comput. Sci. 2021, 2, 34. [Google Scholar] [CrossRef]
  26. Amrulloh, Y.; Abeyratne, U.; Swarnkar, V.; Triasih, R. Cough sound analysis for pneumonia and asthma classification in pediatric population. In Proceedings of the 2015 6th International Conference on Intelligent Systems, Modelling and Simulation, Kuala Lumpur, Malaysia, 9–12 February 2015; pp. 127–131. [Google Scholar]
  27. Yadav, S.; Keerthana, M.; Gope, D.; Ghosh, P.K. Analysis of acoustic features for speech sound-based classification of asthmatic and healthy subjects. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 6789–6793. [Google Scholar]
  28. Hee, H.I.; Balamurali, B.; Karunakaran, A.; Herremans, D.; Teoh, O.H.; Lee, K.P.; Teng, S.S.; Lui, S.; Chen, J.M. Development of machine learning for asthmatic and healthy voluntary cough sounds: A proof-of-concept study. Appl. Sci. 2019, 9, 2833. [Google Scholar] [CrossRef]
  29. Cohen, A.; Daubechies, I.; Jawerth, B.; Vial, P. Multiresolution analysis, wavelets and fast wavelet transform on an interval. C. R. Acad. Sci. Paris 1993, 316, 417–421. [Google Scholar]
  30. Zhao, S.; Zhang, Q.; Yang, H. Orthogonal Matching Pursuit Based on Tree-Structure Redundant Dictionary. In Proceedings of the Advances in Computer Science, Environment, Ecoinformatics, and Education. CSEE 2011. Communications in Computer and Information Science, Wuhan, China, 21–22 August 2011; Lin, S., Huang, X., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; Volume 215. [Google Scholar]
  31. Yang, M.; Zhang, L. Gabor Feature Based Sparse Representation for Face Recognition with Gabor Occlusion Dictionary. In Proceedings of the Computer Vision—ECCV 2010. ECCV 2010. Lecture Notes in Computer Science, Heraklion, Greece, 5–11 September 2010; Daniilidis, K., Maragos, P., Paragios, N., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6316. [Google Scholar]
  32. Dean, R.T.; Dunsmuir, W.T.M. Dangers and uses of cross-correlation in analyzing time series in perception, performance, movement, and neuroscience: The importance of constructing transfer function autoregressive models. Behav. Res. 2016, 48, 783–802. [Google Scholar] [CrossRef] [PubMed]
  33. Yazdi, M.; Golilarz, N.A.; Nedjati, A.; Adesina, K.A. An improved lasso regression model for evaluating the efficiency of intervention actions in a system reliability analysis. Neural Comput. Appl. 2021, 33, 7913–7928. [Google Scholar] [CrossRef]
  34. Bry, X.; Niang, N.; Verron, T.; Bougeard, S. Clusterwise elastic-net regression based on a combined information criterion. Adv. Data Anal. Classif. 2023, 17, 75–107. [Google Scholar] [CrossRef]
  35. Kandiri, A.; Shakor, P.; Kurda, R.; Deifalla, A.F. Modified Artificial Neural Networks and Support Vector Regression to Predict Lateral Pressure Exerted by Fresh Concrete on Formwork. Int. J. Concr. Struct. Mater. 2022, 16, 64. [Google Scholar] [CrossRef]
  36. Nti, I.K.; Adekoya, A.F.; Weyori, B.A. A comprehensive evaluation of ensemble learning for stock-market prediction. J. Big Data 2020, 7, 20. [Google Scholar] [CrossRef]
  37. Torabi, H.; Mirtaheri, S.L.; Greco, S. Practical autoencoder based anomaly detection by using vector reconstruction error. Cybersecurity 2023, 6, 1. [Google Scholar] [CrossRef]
  38. Xu, X.; Wang, Y.; Wang, L.; Yu, B.; Jia, J. Conditional Temporal Variational AutoEncoder for Action Video Prediction. Int. J. Comput. Vis. 2023, 131, 2699–2722. [Google Scholar] [CrossRef]
  39. Webb, D. Applying Softmax Classifiers to Open Set. In Proceedings of the Data Mining. AusDM 2019. Communications in Computer and Information Science, Adelaide, Australia, 2–5 December 2019; Le, T.D., Ong, T.-L., Zhao, Y., Jin, W.H., Wong, S., Liu, L., Williams, G., Eds.; Springer: Singapore, 2019; Volume 1127. [Google Scholar]
  40. Tibebu, H.; Malik, A.; De Silva, V. Text to Image Synthesis Using Stacked Conditional Variational Autoencoders and Conditional Generative Adversarial Networks. In Proceedings of the Intelligent Computing. SAI 2022. Lecture Notes in Networks and Systems, Virtual, 14–15 July 2022; Arai, K., Ed.; Springer: Cham, Switzerland, 2022; Volume 506. [Google Scholar]
  41. Subasi, A.; Qaisar, S.M. Surface EMG signal classification using TQWT, Bagging and Boosting for hand movement recognition. J. Ambient. Intell. Humaniz. Comput. 2022, 13, 3539–3554. [Google Scholar] [CrossRef]
  42. Shuliang, W.; Surapunt, T. Bayesian Maximal Information Coefficient (BMIC) to reason novel trends in large datasets. Appl. Intell. 2022, 52, 10202–10219. [Google Scholar] [CrossRef]
  43. Miao, C. Clustering of different dimensional variables based on distance correlation coefficient. J. Ambient. Intell. Humaniz. Comput. 2021. [Google Scholar] [CrossRef]
  44. Rizk-Allah, R.M.; Hagag, E.A.; El-Fergany, A.A. Chaos-enhanced multi-objective tunicate swarm algorithm for economic-emission load dispatch problem. Soft Comput. 2023, 27, 5721–5739. [Google Scholar] [CrossRef]
  45. Rocklin, M.; Pinar, A. Computing an Aggregate Edge-Weight Function for Clustering Graphs with Multiple Edge Types. In Proceedings of the Algorithms and Models for the Web-Graph. WAW 2010. Lecture Notes in Computer Science, Stanford, CA, USA, 13–14 December 2010; Kumar, R., Sivakumar, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6516. [Google Scholar]
  46. Rajaguru, H.; Prabhakar, S.K. Factor Analysis and Weighted KNN Classifier for Epilepsy Classification from EEG signals. In Proceedings of the 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 29–31 March 2018; pp. 332–335. [Google Scholar] [CrossRef]
  47. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme Learning Machine: Theory and Applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  48. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  49. Ghanizadeh, A.N.; Ghiasi-Shirazi, K.; Monsefi, R.; Qaraei, M. Neural Generalization of Multiple Kernel Learning. Neural Process. Lett. 2024, 56, 12. [Google Scholar] [CrossRef]
  50. Dhiman, G.; Garg, M.; Nagar, A.; Kumar, V.; Dehghani, M. A novel algorithm for global optimization: Rat Swarm Optimizer. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 8457–8482. [Google Scholar] [CrossRef]
  51. Kuluozturk, M.; Kobat, M.A.; Barua, P.D.; Dogan, S.; Tuncer, T.; Tan, R.S.; Ciaccio, E.J.; Acharya, U.R. DKPNet41: Directed knight pattern network-based cough sound classification model for automatic disease diagnosis. Med. Eng. Phys. 2022, 110, 103870. [Google Scholar] [CrossRef]
Figure 1. Proposed work of robust models with Gabor dictionary analysis and machine learning.
Figure 2. Proposed stacked CAE for audio cough classification.
Figure 3. Proposed integrated model of feature extraction, feature selection and classification for audio cough classification.
Figure 4. Proposed work of robust models with Gabor dictionary analysis and machine learning—Comparative Analysis.
Figure 5. Comparative analysis of TQWT, feature selection and classification for audio cough classification.
Figure 6. Comparative analysis of DCC, feature selection and classification for audio cough classification.
Table 1. Proposed work of robust models with Gabor dictionary analysis and machine learning.
| Method | SVR | Ensemble Models | NBC | SVM | Adaboost | RF | DT |
|---|---|---|---|---|---|---|---|
| Gabor dictionary + CCF | 96.76 | 97.64 | 93.45 | 94.56 | 93.25 | 94.37 | 93.45 |
| Gabor dictionary + PCCF | 97.21 | 98.03 | 94.35 | 95.32 | 94.24 | 95.21 | 94.22 |
| Gabor dictionary + LASSO | 96.24 | 97.99 | 95.21 | 94.34 | 94.92 | 96.01 | 92.05 |
| Gabor dictionary + elastic net | 95.77 | 98.78 | 97.09 | 97.22 | 94.67 | 95.78 | 93.78 |

All values are classification accuracy (%).
Table 2. Proposed stacked CAE for audio cough classification.
| Model | AE | CAE | SCAE |
|---|---|---|---|
| Accuracy (%) | 91.02 | 93.45 | 96.35 |
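The stacking idea behind Table 2 can be illustrated with greedy layer-wise autoencoder pretraining. The NumPy sketch below is a simplification of the proposed SCAE: tied weights, plain (unconditional) autoencoders and no softmax classification head, so it only demonstrates how each layer is trained on the codes of the previous one.

```python
# Simplified layer-wise autoencoder stacking (not the paper's SCAE:
# conditional inputs and the classifier head are omitted).
import numpy as np

rng = np.random.default_rng(0)

def recon_error(X, W):
    """Mean squared reconstruction error of a tied-weight autoencoder."""
    H = np.tanh(X @ W)
    return float(np.mean((H @ W.T - X) ** 2))

def train_ae(X, hidden, epochs=300, lr=0.02):
    """Train one tied-weight layer by gradient descent on 0.5*||HW^T - X||^2."""
    n, d = X.shape
    W = rng.normal(0, 0.1, (d, hidden))
    err_start = recon_error(X, W)
    for _ in range(epochs):
        H = np.tanh(X @ W)            # encode
        E = H @ W.T - X               # reconstruction error (decode is tied)
        # gradient through both the decoder (E^T H) and encoder path
        dW = (X.T @ ((E @ W) * (1 - H ** 2)) + E.T @ H) / n
        W -= lr * dW
    return W, err_start, recon_error(X, W)

X = rng.normal(size=(256, 32))        # placeholder feature matrix
W1, e0, e1 = train_ae(X, 16)          # first encoder layer
H1 = np.tanh(X @ W1)                  # codes of layer 1 ...
W2, f0, f1 = train_ae(H1, 8)          # ... become the input of layer 2
print(f"layer-1 error: {e0:.3f} -> {e1:.3f}")
```

In the full stacked model, the pretrained encoders would be fine-tuned jointly with a classification layer; this sketch stops at the unsupervised stacking step.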
Table 3. Proposed model of TQWT, feature selection and classification for audio cough classification.
| Feature Selection | ELM | Kernel ELM | Arc-Cosine ELM | RSO-ELM | SVM | Adaboost | NBC |
|---|---|---|---|---|---|---|---|
| BTSA | 89.45 | 92.35 | 96.89 | 97.44 | 95.34 | 93.23 | 90.12 |
| AF | 91.03 | 93.45 | 98.34 | 98.23 | 96.34 | 92.24 | 91.22 |
| FA | 90.31 | 94.21 | 97.34 | 98.01 | 95.45 | 93.45 | 92.01 |
| EFA | 92.34 | 94.99 | 96.23 | 98.11 | 96.22 | 91.45 | 90.34 |

All values are classification accuracy (%).
Table 4. Proposed model of sparse TQWT, feature selection and classification for audio cough classification.
| Feature Selection | ELM | Kernel ELM | Arc-Cosine ELM | RSO-ELM | SVM | Adaboost | NBC |
|---|---|---|---|---|---|---|---|
| BTSA | 90.23 | 93.33 | 97.11 | 98.03 | 97.23 | 95.34 | 94.22 |
| AF | 92.34 | 94.21 | 98.99 | 98.22 | 96.23 | 94.56 | 93.45 |
| FA | 91.23 | 95.11 | 98.02 | 98.01 | 95.35 | 93.56 | 94.09 |
| EFA | 93.23 | 95.01 | 97.34 | 97.99 | 97.01 | 95.01 | 93.99 |

All values are classification accuracy (%).
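For the arc-cosine ELM columns of Tables 3–6, a kernel ELM can be written in closed form: with kernel matrix K over the training set and target vector T, the output weights are beta = (I/C + K)^-1 T. The sketch below assumes the degree-1 arc-cosine kernel of Cho and Saul; the two-blob dataset and the regularization constant C are illustrative placeholders, not the paper's data or settings.

```python
# Hedged sketch of a kernel ELM with the degree-1 arc-cosine kernel.
import numpy as np

def arccos_kernel(A, B):
    """Degree-1 arc-cosine kernel matrix between rows of A and rows of B."""
    na = np.linalg.norm(A, axis=1, keepdims=True)
    nb = np.linalg.norm(B, axis=1, keepdims=True)
    cos = np.clip((A @ B.T) / (na * nb.T), -1.0, 1.0)
    theta = np.arccos(cos)
    return (na * nb.T) / np.pi * (np.sin(theta) + (np.pi - theta) * cos)

def kelm_fit(X, T, C=100.0):
    """Closed-form KELM output weights: beta = (I/C + K)^-1 T."""
    K = arccos_kernel(X, X)
    return np.linalg.solve(np.eye(len(X)) / C + K, T)

def kelm_predict(Xnew, X, beta):
    return arccos_kernel(Xnew, X) @ beta

rng = np.random.default_rng(1)
# Two Gaussian blobs stand in for two classes of cough feature vectors.
X = np.vstack([rng.normal(-1.0, 0.6, (60, 8)),
               rng.normal(+1.0, 0.6, (60, 8))])
T = np.hstack([np.zeros(60), np.ones(60)])   # one output node

beta = kelm_fit(X, T)
pred = (kelm_predict(X, X, beta) > 0.5).astype(float)
print("training accuracy:", np.mean(pred == T))
```

Unlike a basic ELM with random hidden weights, the kernelized form needs no explicit hidden layer; swapping in a different positive-definite kernel changes only `arccos_kernel`.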
Table 5. Proposed model of MIC, feature selection and classification for audio cough classification.
| Feature Selection | ELM | Kernel ELM | Arc-Cosine ELM | RSO-ELM | SVM | Adaboost | NBC |
|---|---|---|---|---|---|---|---|
| BTSA | 91.32 | 94.08 | 95.22 | 97.34 | 96.87 | 96.09 | 96.11 |
| AF | 93.21 | 92.66 | 97.34 | 96.67 | 98.55 | 95.67 | 96.45 |
| FA | 90.11 | 96.78 | 95.67 | 95.87 | 97.67 | 96.89 | 97.64 |
| EFA | 92.99 | 93.92 | 98.86 | 98.11 | 96.93 | 94.22 | 96.26 |

All values are classification accuracy (%).
Table 6. Proposed model of DCC, feature selection and classification for audio cough classification.
| Feature Selection | ELM | Kernel ELM | Arc-Cosine ELM | RSO-ELM | SVM | Adaboost | NBC |
|---|---|---|---|---|---|---|---|
| BTSA | 92.23 | 95.09 | 97.11 | 98.22 | 95.11 | 95.09 | 95.22 |
| AF | 94.46 | 94.77 | 98.24 | 98.36 | 97.23 | 94.97 | 95.24 |
| FA | 95.77 | 97.67 | 98.75 | 97.87 | 96.78 | 95.56 | 96.68 |
| EFA | 95.89 | 95.83 | 98.93 | 97.98 | 95.99 | 93.88 | 95.94 |

All values are classification accuracy (%).
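The Distance Correlation Coefficient (DCC) underlying Table 6 is computed from doubly centred pairwise-distance matrices. The NumPy sketch below implements the biased (uncorrected) estimator for illustration only; it is not tied to the paper's feature pipeline.

```python
# Biased sample distance correlation (Szekely et al.), for illustration.
import numpy as np

def dcor(x, y):
    """Distance correlation between two samples of equal length."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)

    def centred(z):
        # pairwise Euclidean distances, then double centering
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    A, B = centred(x), centred(y)
    dcov2 = (A * B).mean()                       # squared distance covariance
    dvar = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(max(dcov2, 0.0) / dvar)) if dvar > 0 else 0.0

rng = np.random.default_rng(2)
x = rng.normal(size=500)
r_lin = dcor(x, 3 * x + 1)              # exact linear dependence -> 1.0
r_ind = dcor(x, rng.normal(size=500))   # independent -> near 0
print(round(r_lin, 3), round(r_ind, 3))
```

Unlike Pearson correlation, DCC is zero (in the population) only under full independence, which is what makes it attractive as a feature-relevance score.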
Table 7. Performance comparison with other works.
| Reference | Method | Subjects | Accuracy (%) or Other Performance Metric |
|---|---|---|---|
| [17] | MFCC and multi-criteria decision making | 622 COVID-19 and 1610 healthy | AUC = 78.00% |
| [21] | MFCC and DNN | 150 COVID-19, bronchitis, healthy and asthma | 90.8% for cough data |
| [22] | CNN + LSTM | 92 COVID-19 and 1079 healthy | 95.3% |
| [23] | MFCC and DNN | 50 COVID-19 and 50 healthy | 97.5% |
| [24] | Wavelet transform + LDA | 26 healthy, 17 asthma and 22 chronic obstructive pulmonary disease | 90% |
| [25] | MFCC, recurrence quantification analysis and XGBoost | 20 COVID-19 and 1468 healthy | 97% |
| [26] | Neural networks | 9 asthma and 9 pneumonia | Kappa = 89.00% |
| [27] | MFCC + SVM | 47 asthma and 48 healthy | 77.8% |
| [51] | Directed knight pattern | 110 acute asthma, 247 healthy, 241 COVID-19 and 244 heart failure | 100% |
| Proposed work | Gabor dictionary with elastic net and ensemble model | 110 acute asthma, 247 healthy, 241 COVID-19 and 244 heart failure | 98.78% |
| Proposed work | SCAE | 110 acute asthma, 247 healthy, 241 COVID-19 and 244 heart failure | 96.35% |
| Proposed work | TQWT + AF + arc-cosine ELM | 110 acute asthma, 247 healthy, 241 COVID-19 and 244 heart failure | 98.34% |
| Proposed work | Sparse TQWT + AF + arc-cosine ELM | 110 acute asthma, 247 healthy, 241 COVID-19 and 244 heart failure | 98.99% |
| Proposed work | MIC + EFA + arc-cosine ELM | 110 acute asthma, 247 healthy, 241 COVID-19 and 244 heart failure | 98.86% |
| Proposed work | DCC + EFA + arc-cosine ELM | 110 acute asthma, 247 healthy, 241 COVID-19 and 244 heart failure | 98.93% |
Prabhakar, S.K.; Won, D.-O. SCMs: Systematic Conglomerated Models for Audio Cough Signal Classification. Algorithms 2024, 17, 302. https://doi.org/10.3390/a17070302