1. Introduction
Respiratory diseases span a wide spectrum of medical conditions that affect the respiratory system, which consists of the organs responsible for breathing, such as the lungs, bronchi and diaphragm. These conditions include flu, bronchitis, asthma, pneumonia, pulmonary fibrosis, cystic fibrosis, tuberculosis and lung cancer, among many others [1]. COVID-19, the cause of the recent pandemic, also belongs to this group of diseases, as it attacks the patient’s respiratory system as well [2]. A significant number of these conditions share similar symptoms during the early stages; thus, proper and timely diagnosis is vital for successful treatment [3]. Examinations are typically tailored to each patient’s symptoms and suspected respiratory disease, so a combination of diagnostic tests is often necessary to accurately identify the condition and choose an adequate treatment procedure.
Along with an accurate diagnosis, early detection is also vital for optimal treatment. If the condition is discovered early enough, accurate and timely treatment may prevent disease progression, decrease the possibility of complications and drastically improve the outcome. Otherwise, the symptoms may develop and become considerably more severe, posing a much greater risk. Once the condition is properly identified, treatment may start, depending on the specific disease, its severity and the patient’s history. Treatment in most cases comprises medication and different sorts of therapies; however, it sometimes also requires changes in the patient’s lifestyle and, in the most severe cases, even surgery. Moreover, if the condition is not identified correctly, the applied treatment can be ineffective, harmful and dangerous [4,5].
Various advanced methods are used to diagnose respiratory conditions. These include ultrasound, chest X-rays and magnetic resonance imaging (MRI). Artificial intelligence (AI) techniques have also become widely accepted as support tools for early diagnosis due to their ability to detect subtle correlations. For example, deep learning methods, mainly convolutional neural networks (CNNs), are intensively utilized to analyze medical imaging data, like X-rays, MRIs and CT scans [6,7]. They can effectively aid in detecting and localizing tumors and lesions in the lungs and other tissue. Recurrent neural networks are typically employed to analyze time-series data, like a patient’s vital signs, historical data and clinical records, to forecast the progression of the condition along with the treatment outcome [8,9]. Machine learning approaches, on the other hand, are successful in classification problems, and they are capable of distinguishing among different respiratory diseases with respect to the symptoms and/or test results [10,11,12].
Generally speaking, AI algorithms are already intensively used in medical examinations, and they possess tremendous potential to revolutionize respiratory condition identification by supporting early and accurate diagnosis, along with personalized treatment and enhanced decision-making processes. Nevertheless, numerous challenges exist, like training data quality, the interpretability of the model and regulatory compliance, to name a few [13,14,15]. Moreover, AI models require their hyperparameters to be fine-tuned for every single practical problem, since no single method can achieve superior outcomes across all possible application domains, as stated by the no free lunch (NFL) theorem [16].
Tuning hyperparameters is an extremely hard optimization problem that is NP-hard by nature. In other words, this task cannot be resolved with standard deterministic approaches due to unacceptably intensive time and resource requirements. Stochastic algorithms, however, are capable of obtaining near-optimal solutions within a reasonable time frame. Metaheuristic algorithms, more precisely, have established themselves as very potent optimizers with considerable success for this particular use case. Once again, according to the NFL theorem [16], no single algorithm performs equally well on all possible optimization problems. Consequently, thorough experiments with various algorithms are needed to find the one that performs best on the particular task at hand. Fine-tuning the parameters of CNNs through metaheuristics is essential, as it significantly enhances the ability of the model to accurately identify complex patterns in data, thereby improving the efficacy and reliability of medical diagnostic tools [17,18,19,20].
A recent introduction to the family of nature-inspired optimizers is the elk herd optimization (EHO) algorithm [21]. Introduced in 2024, the algorithm draws inspiration from the foraging and mating behaviors of elk, a species that is a member of the deer family. Elk are herbivores and are relatively low in the food chain. Nevertheless, they boast notable physical capabilities. Elk congregate in larger herds for safety, comprising males, females and offspring, and this herd structure serves as a defense mechanism against potential threats. Communication is maintained in the herd through a series of sounds that are distinct to each member’s role, with grunts used to signal danger and locate offspring. During the mating season, males engage in aggressive displays to establish dominance. These behaviors are mathematically modeled by the EHO algorithm to facilitate the exploration of a simulated search space in the hope of locating more promising solutions within acceptable time constraints and with realistic computational resources.
This research investigates a novel approach to the diagnosis of respiratory conditions based on the analysis of audio recordings. These recordings are first transformed into a graphical format by computing mel spectrograms. A CNN is subsequently used to perform the classification task. Since the performance of a CNN relies heavily on the appropriate choice of its hyperparameters, an altered version of the EHO algorithm is utilized to tune them. The EHO algorithm was proposed in 2024, and its potential has still not been fully explored in CNN optimization processes. This algorithm has been empirically chosen for further modifications, as the baseline variant yielded very promising outcomes in the smaller-scale experiments that were executed prior to the main simulations. The major contributions of this manuscript can be summarized as follows:
A novel framework based on mel spectrograms is proposed to convert audio recordings of the respiratory system into images, with a CNN employed to perform the classification between healthy lungs and respiratory diseases.
An altered EHO metaheuristic is introduced, which compensates for some limitations of the baseline metaheuristic.
This improved metaheuristic has been utilized to optimize the CNN hyperparameters for this specific task.
The remainder of the manuscript is structured as follows. Section 2 presents a literature survey of AI in medical diagnosis and a brief introduction to the employed technologies. Section 3 commences by describing the baseline variant of the EHO metaheuristic, followed by the proposed modifications. The simulation environment is explained in Section 4, while the simulation outcomes are presented in Section 5. Finally, Section 6 puts forward the concluding remarks and indicates future endeavors in this domain.
3. Methods
In this section, the baseline variant of the EHO metaheuristic is explained first, followed by the limitations noted during extensive experiments with benchmark functions. The improved version of EHO is then introduced, which aims to raise the performance of the basic EHO even further.
3.1. Elementary Elk Herd Optimization Algorithm
The EHO metaheuristic is among the most recent algorithms, having been put forward in early 2024 [21]. It is a nature-inspired, population-based method modeled on the mating and breeding processes exhibited by elk herds. These processes can be separated into two main stages: rutting and calving. The first phase is characterized by the separation of the herd into families whose sizes may differ. The separation process is guided by the bulls and their competition for dominance, where the most powerful bulls form families with numerous females. During the second phase, every family produces fresh calves from the dominant male and related females. The baseline version of EHO is characterized by a single control variable, $B_r$, denoting the initial bull ratio in the population.
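Each execution of the metaheuristic begins by generating the starting population: a herd of elk, comprising bulls and harems. The herd $EH$ is represented by the matrix shown in Equation (10), with the dimensions $N \times d$, $N$ being the size of the herd and $d$ the number of parameters in a solution.

$$EH = \begin{bmatrix} x_1^1 & x_2^1 & \cdots & x_d^1 \\ x_1^2 & x_2^2 & \cdots & x_d^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^N & x_2^N & \cdots & x_d^N \end{bmatrix} \qquad (10)$$

Every single elk $x^j$ is generated according to Equation (11):

$$x_i^j = lb_i + (ub_i - lb_i) \cdot U(0, 1) \qquad (11)$$

where $ub_i$ and $lb_i$ correspond to the upper and lower boundaries of the solution realm and $U(0, 1)$ is a uniformly distributed random number. The elk within the population are subsequently arranged in ascending order based on their fitness values.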
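The initialization step can be illustrated with a minimal sketch. It assumes the usual uniform sampling between the lower and upper bounds for Equation (11); the function names `init_herd` and `sphere`, the herd size and the bounds are hypothetical choices made only for this illustration and are not taken from the original implementation.

```python
import random


def init_herd(n, lower, upper, fitness, rng=None):
    """Sample n elk uniformly inside [lower, upper] and sort them by fitness (ascending)."""
    rng = rng or random.Random()
    herd = []
    for _ in range(n):
        # One parameter per dimension: lb + (ub - lb) * U(0, 1)
        elk = [lb + (ub - lb) * rng.random() for lb, ub in zip(lower, upper)]
        herd.append(elk)
    herd.sort(key=fitness)  # ascending order, so the first elk has the lowest fitness value
    return herd


def sphere(x):
    """Toy objective used only for this illustration."""
    return sum(v * v for v in x)


herd = init_herd(6, [-5.0, -5.0], [5.0, 5.0], sphere, random.Random(42))
```

In the actual framework, the fitness call would train and score a CNN, which is why the number of such calls is the dominant cost.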
Within the rutting stage, families are established based upon the bull rate $B_r$. Initially, the overall number of families is determined as $B = B_r \times N$, rounded to a whole number. Males are selected from the population according to their fitness values. The top $B$ elk regarding fitness (the best-ranked individuals within the population) are designated as bulls, reflecting the contests for supremacy in which the most robust males prevail, consequently securing more harems for their families.
Hence, the bulls within $B$ engage in combat to establish families. The roulette wheel technique is applied to distribute females among the bulls within $B$, considering their fitness values relative to the cumulative fitness. Specifically, each male $x^j$ within $B$ is allocated a probability $p_j$, which is determined by its absolute fitness, marked by $f(x^j)$, divided by the cumulative sum of fitness across all males, as outlined by the following equation:

$$p_j = \frac{f(x^j)}{\sum_{k=1}^{B} f(x^k)} \qquad (12)$$
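A short sketch of the roulette wheel allocation may help here. It implements the ratio exactly as described in the text (a bull's fitness divided by the sum over all bulls) and assumes strictly positive fitness values; the helper names `selection_probabilities` and `assign_females` are hypothetical.

```python
import random


def selection_probabilities(bull_fitness):
    """Probability of each bull: its fitness divided by the summed fitness of all bulls."""
    total = sum(bull_fitness)  # assumed to be strictly positive
    return [f / total for f in bull_fitness]


def assign_females(num_females, bull_fitness, rng=None):
    """Return, for every female, the index of the bull whose family she joins."""
    rng = rng or random.Random()
    probs = selection_probabilities(bull_fitness)
    bulls = list(range(len(bull_fitness)))
    # random.choices draws with replacement according to the given weights
    return rng.choices(bulls, weights=probs, k=num_females)


probs = selection_probabilities([1.0, 3.0])
owners = assign_females(4, [1.0, 3.0], random.Random(0))
```

Because the draw is made with replacement, family sizes differ from run to run, which matches the description of families whose sizes may differ.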
In the calving phase, the progeny of each family, labeled $x^j(t+1)$, is created based primarily on the traits inherited from the paternal bull $x^{h_j}$ and the maternal elk, denoted as $x^j(t)$. In cases where the calf $x^j(t+1)$ has the same index as the paternal bull of the family, it is produced according to the following equation:

$$x_i^j(t+1) = x_i^j(t) + \alpha \left( x_i^k(t) - x_i^j(t) \right) \qquad (13)$$

In the equation above, $\alpha$ represents an arbitrary value within the interval $[0, 1]$, utilized to determine the rate of inheriting attributes from the randomly picked elk from the population, $x^k(t)$. Greater values of $\alpha$ lead to a greater probability of arbitrary attributes in the fresh calf, which enhances diversification.
Alternatively, if the calf has an identical index to the mother, then $x^j(t+1)$ is derived from both the mother $x^j(t)$ and the father $x^{h_j}(t)$ based on the next equation:

$$x_i^j(t+1) = x_i^j(t) + \beta \left( x_i^{h_j}(t) - x_i^j(t) \right) + \gamma \left( x_i^r(t) - x_i^j(t) \right) \qquad (14)$$

Above, $x_i^j(t+1)$ corresponds to the $i$-th parameter of calf $j$ during the $(t+1)$-th iteration, $h_j$ denotes the bull of the $j$-th harem and $r$ is the index of a randomly selected bull, as, in the wild, a certain probability exists that the mother also mated with other bulls in the herd in cases where her bull did not defend her appropriately. Lastly, variables $\beta$ and $\gamma$ represent random values inside the range $[0, 2]$, having the role of arbitrarily selecting the ratio of attributes inherited from formerly generated calves.
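The two calving cases described above can be sketched as follows. This is an illustration under stated assumptions rather than the reference implementation: the update forms follow the verbal description (a step toward a randomly picked elk, or toward the family bull plus a randomly selected bull), and the sampling ranges of the three random coefficients ([0, 1] for the first case, [0, 2] for the other two) are assumed. The function names are hypothetical.

```python
import random


def calf_from_bull(bull, herd, rng):
    """Calf sharing its index with the family bull: move toward a randomly picked elk."""
    other = rng.choice(herd)   # randomly picked elk from the population
    alpha = rng.random()       # assumed range [0, 1]
    return [x + alpha * (o - x) for x, o in zip(bull, other)]


def calf_from_mother(mother, family_bull, bulls, rng):
    """Calf sharing its index with the mother: inherit from the family bull and a random bull."""
    random_bull = rng.choice(bulls)  # the mother may also have mated with another bull
    beta = rng.uniform(0.0, 2.0)     # assumed range [0, 2]
    gamma = rng.uniform(0.0, 2.0)    # assumed range [0, 2]
    return [
        x + beta * (fb - x) + gamma * (rb - x)
        for x, fb, rb in zip(mother, family_bull, random_bull)
    ]


rng = random.Random(1)
bulls = [[0.0, 0.0], [1.0, 1.0]]
herd = bulls + [[4.0, -2.0], [-3.0, 5.0]]
calf_a = calf_from_bull(bulls[0], herd, rng)
calf_b = calf_from_mother(herd[2], bulls[0], bulls, rng)
```

Since the second and third coefficients may exceed 1, a calf can overshoot its parents, so a practical implementation would typically clip the result to the search bounds.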
In the subsequent stage, the bulls, females and calves belonging to every family are consolidated. The elk are arranged in ascending order based on their fitness, and only the top-performing elk are retained for the following generation.
3.2. Improved Elk Herd Optimization Algorithm
Even though the baseline EHO is a novel algorithm, extensive experimentation with benchmark functions has shown that there is still some room for improvement. This manuscript suggests adding a quasi-reflection-based learning (QRL) procedure during the algorithm’s initialization stage, as it can aid in improving the search space coverage [94]. For every parameter $j$ ($x_j$), a quasi-reflexive-opposite parameter ($x_j^{qr}$) is synthesized according to

$$x_j^{qr} = \mathrm{rnd}\left( \frac{lb_j + ub_j}{2},\; x_j \right) \qquad (15)$$

where $\mathrm{rnd}$ corresponds to a random number from the interval $\left[ \frac{lb_j + ub_j}{2},\; x_j \right]$. The altered EHO initialization stage commences by producing $N/2$ solutions and then utilizes the QRL procedure, so the algorithm’s complexity in terms of fitness function evaluations (FFEs) is not increased. This is a common approach for evaluating the complexity of metaheuristic methods, as the most expensive operation during the execution of the algorithm is the fitness function computation. The suggested initialization procedure is provided in Algorithm 1.
Algorithm 1 QRL Initialization Procedure
Stage 1: Construct the starting populace $P_{init}$ having $N/2$ individuals through the traditional EHO initialization scheme provided in Equation (11)
Stage 2: Construct the QRL populace $P_{qr}$ upon $P_{init}$ by employing Equation (15)
Stage 3: Construct the final initial populace $P$ as the union of $P_{init}$ and $P_{qr}$ ($P = P_{init} \cup P_{qr}$)
Stage 4: Compute the fitness score for every solution in $P$
Stage 5: Arrange all solutions belonging to $P$ with respect to their fitness scores
After the initialization stage, during the entire execution of the metaheuristic, the worst solution in every iteration is deleted and replaced with the QRL opposite of the best individual (guided best procedure). This alteration does not increase the complexity of the baseline method measured in FFEs, because no additional fitness evaluations are performed.
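A compact sketch of the QRL initialization and the guided best replacement is given below. It assumes the common quasi-reflection form, in which each parameter is redrawn uniformly between the centre of its interval and its current value, and it assumes that half of the population is sampled conventionally while the other half is derived from it. The names `quasi_reflect`, `qrl_init` and `sphere` are hypothetical.

```python
import random


def quasi_reflect(solution, lower, upper, rng):
    """Redraw every parameter uniformly between its interval centre and its current value."""
    mirrored = []
    for x, lb, ub in zip(solution, lower, upper):
        centre = (lb + ub) / 2.0
        mirrored.append(rng.uniform(min(centre, x), max(centre, x)))
    return mirrored


def qrl_init(n, lower, upper, fitness, rng):
    """Sample n // 2 solutions, add their quasi-reflections, then sort by fitness."""
    base = [
        [lb + (ub - lb) * rng.random() for lb, ub in zip(lower, upper)]
        for _ in range(n // 2)
    ]
    mirrored = [quasi_reflect(s, lower, upper, rng) for s in base]
    population = base + mirrored
    population.sort(key=fitness)  # each solution is evaluated once: n evaluations in total
    return population


def sphere(x):
    """Toy objective used only for this illustration."""
    return sum(v * v for v in x)


rng = random.Random(7)
lower, upper = [-5.0, -5.0], [5.0, 5.0]
population = qrl_init(6, lower, upper, sphere, rng)
# Guided best step: overwrite the worst solution with the quasi-reflection of the best one.
population[-1] = quasi_reflect(population[0], lower, upper, rng)
```

The replacement in the last line performs no fitness call, which mirrors the argument that the evaluation budget stays unchanged.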
Another alteration introduced to the EHO is inspired by the GA [67]. As the iterations pass and the algorithm begins to converge, the search should focus on the best solutions found so far. The algorithm is therefore accelerated in the final rounds of the run, defined relative to the maximum number of iterations $T$, by replacing the solution with the second-worst fitness score with a fresh individual synthesized as a hybrid of the best two solutions. The hybrid is obtained by applying a uniform crossover operator with a given crossover probability per gene (parameter of the individual). This modification reinforces exploitation and thus accelerates the algorithm. Again, this alteration does not introduce supplementary calculations of the fitness values; therefore, it does not increase the complexity with respect to FFEs. Consequently, the complexity of the modified EHO is the same as that of the baseline EHO. The altered variant of EHO has been named Accelerated Guided best Adaptive EHO (AGbAEHO), and the pseudocode is illustrated in Algorithm 2, where variable $t$ denotes the current iteration of the optimization.
Algorithm 2 AGbAEHO Pseudocode
Synthesize initial population P with QRL as explained in Algorithm 1
t = 0
while t < T do
    Sort solutions within P regarding their fitness scores
    for each solution X in P do
        Employ EHO search to conduct optimization
        Synthesize fresh solution as QRL opposite of the current best
        Replace the worst solution within P with this fresh solution
        if t falls within the final (accelerated) rounds then
            Produce new individual as the hybrid of the best pair of individuals by applying uniform crossover operator with the per-gene crossover probability
            Replace the second-worst solution with this novel hybrid individual
        end if
    end for
    t = t + 1
end while
return the best solution found within P
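The acceleration step can be sketched as a uniform crossover between the two best solutions. The sketch is illustrative only: the per-gene probability `p_gene` is a hypothetical parameter name, the value 0.5 used in the example is a placeholder rather than the setting used in the experiments, and the choice to start from the best solution and swap in genes from the second best is an assumption.

```python
import random


def uniform_crossover(best, second_best, p_gene, rng):
    """Copy the best solution and take each gene from the second best with probability p_gene."""
    return [b if rng.random() < p_gene else a for a, b in zip(best, second_best)]


def accelerate(population, p_gene, rng):
    """Overwrite the second-worst solution with a hybrid of the two best (population sorted best-first)."""
    hybrid = uniform_crossover(population[0], population[1], p_gene, rng)
    population[-2] = hybrid
    return population


rng = random.Random(5)
population = [[0.1, 0.2], [0.3, 0.4], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
population = accelerate(population, 0.5, rng)  # 0.5 is a placeholder value
```

No fitness value is computed for the hybrid at the moment of insertion, consistent with the claim that the modification adds no fitness evaluations.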
4. Experimental Setup
To evaluate the viability of the introduced approach, this work used a publicly available respiratory dataset [95] (https://www.kaggle.com/datasets/vbookshelf/respiratory-sound-database, accessed on 15 March 2024). The dataset comprises audio recordings in Waveform Audio File (WAV) format. Respiratory sounds are an indicator of lung health as well as of the presence of disorders. The sounds produced while an individual inhales and exhales are directly related to issues of air movement or lung tissues. Listening to a person’s lungs while breathing is a standard step during medical checkups and a vital part of the diagnosis. Particular sounds can indicate the presence of certain conditions such as asthma or chronic obstructive disorders. The dataset includes 920 annotated recordings of 126 patients.
The format of the audio files makes them poorly suited for direct use with AI classifiers. While time-series classification may be applied, the intense computational demands of such an approach make it relatively inefficient. Furthermore, the large size of the dataset exacerbates these computational demands. Finally, much of the data in the audio files can be considered redundant. While manual filtering might be used to reduce the data and focus on certain frequencies, the use of mel spectrograms allows the audio file to be observed across the whole frequency spectrum simultaneously. Additionally, mel spectrograms can be treated as images, making them well suited for use with CNNs. CNNs rely on filter kernels and sparse connections between neurons, thus reducing computational demands to some degree. In this work, the WAV audio files are converted into mel spectrogram images using the Librosa Python library. Images are then labeled according to the condition of the patient. Training uses 70% of the samples, while 30% are reserved for testing the trained models. A sample of generated spectrograms can be observed in Figure 1.
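The preprocessing described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the experiments: the number of mel bands, the decibel conversion, the random seed and the helper names `wav_to_mel_db` and `split_indices` are assumptions, and the split is drawn over recordings because the text does not state whether it was made per recording or per patient. Librosa and NumPy are imported inside the function, so only the spectrogram step depends on them.

```python
import random


def wav_to_mel_db(path, n_mels=128):
    """Load a WAV file and return its mel spectrogram in decibels (needs librosa and numpy)."""
    import librosa
    import numpy as np

    signal, sample_rate = librosa.load(path, sr=None)  # sr=None keeps the native sampling rate
    mel = librosa.feature.melspectrogram(y=signal, sr=sample_rate, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # log scale, relative to the peak power


def split_indices(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices and return (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# 920 recordings, 70% for training and 30% for testing
train_idx, test_idx = split_indices(920)
```

The resulting two-dimensional array can be rendered or resized to a fixed image shape before being passed to the CNN.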
Once the inputs are established, the next step is to determine suitable network architectures and appropriate training parameters for the networks. Due to the large search space of potential solutions, traditional methods such as grid search would be impractical. Therefore, metaheuristic optimizers are employed to optimize both the network architecture and the training parameters. The tuned parameters were selected empirically as those with the highest impact on performance, and their experimental ranges were likewise determined empirically. They include the number of CNN layers and fully connected layers in the networks as well as the number of neurons in each individual layer. Additionally, the number of training epochs is tuned alongside the learning rate and dropout probability. The respective parameter ranges are provided in Table 1.
Given the computational intensity of training deep learning models, the experimental setup is constrained to a maximum of eight iterations ($T = 8$) per run, with each run utilizing only six solution candidates. Each experiment is repeated over 30 independent runs to account for the stochastic nature of the metaheuristic optimization processes. These settings keep the experiments computationally feasible while supporting the repeatability of the results.
To meet the demands of this study, a modified version of the recently introduced EHO, called AGbAEHO, is utilized. The algorithm optimizes the CNN through an iterative process of hyperparameter tuning. To assess the performance of the introduced approach, it is compared to the original EHO [21]. Additionally, several state-of-the-art optimizers are included in the comparative analysis. All algorithms are independently implemented for this study in Python. Each optimizer is issued six agents, with eight iterations allocated to improve outcomes. Algorithms chosen for the comparative analysis include the GA [96], PSO [66], the ABC [97], the FA [69], SCHO [98] and the COLSHADE [74] algorithm. Optimizers are implemented with the default parameters suggested in the respective source works that introduced each algorithm.
As the evaluated algorithms optimize classification models, several metrics are included in the assessment to ensure a thorough evaluation. Apart from the standard accuracy, precision, recall and F1-score metrics shown in Equations (16)–(19), the Cohen’s kappa [99] statistic is also tracked during experimentation and calculated according to Equation (20).

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (16)$$

$$Precision = \frac{TP}{TP + FP} \qquad (17)$$

$$Recall = \frac{TP}{TP + FN} \qquad (18)$$

$$F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \qquad (19)$$

where $TP$, $TN$, $FP$ and $FN$ denote true positive, true negative, false positive and false negative values, respectively.

$$\kappa = \frac{p_o - p_e}{1 - p_e} \qquad (20)$$

where $p_o$ represents the observed agreement while $p_e$ is the expected agreement. Cohen’s kappa is well suited for working with imbalanced data, such as the dataset utilized in this work, and it is therefore utilized as the objective function. Alongside the objective function, an indicator function is tracked. In this work, the indicator function is the error rate, determined as $1 - Accuracy$. This metric is a fairly intuitive way of understanding the outcomes. Two sets of experiments are conducted: the first explores the detection of respiratory conditions, and the second evaluates the potential of identifying specific types of conditions.
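For the binary detection case, the metrics above can be computed directly from the four confusion counts. The sketch below uses the standard definitions, with the expected agreement for Cohen's kappa obtained from the row and column marginals; the function name `binary_metrics` and the example counts are hypothetical and do not correspond to any reported result.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1, Cohen's kappa and error rate from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total          # also the observed agreement p_o
    precision = tp / (tp + fp)            # assumes at least one positive prediction
    recall = tp / (tp + fn)               # assumes at least one positive sample
    f1 = 2 * precision * recall / (precision + recall)
    # Expected agreement p_e from the marginals of predicted and actual labels
    p_positive = ((tp + fp) / total) * ((tp + fn) / total)
    p_negative = ((tn + fn) / total) * ((tn + fp) / total)
    p_e = p_positive + p_negative
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
        "error_rate": 1 - accuracy,
    }


# Made-up counts, used only to exercise the formulas
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these example counts, the accuracy of 0.85 corresponds to a kappa of 0.7, which illustrates how kappa discounts the agreement expected by chance.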
A flowchart of the introduced framework is presented in Figure 2.
6. Conclusions
This work examines the diagnostic potential of AI for respiratory illness detection, emphasizing the value of prompt diagnosis and treatment in enhancing patient outcomes in a range of healthcare environments. By using audio analysis and CNNs, a potentially helpful way to determine patients’ respiratory problems is introduced. Because classifier performance depends heavily on hyperparameter selection, a modified version of a metaheuristic optimizer is introduced. Simulations using mel spectrograms of patients’ breathing patterns demonstrate the potential of this method in respiratory condition detection and multiclass classification scenarios. An accuracy of 0.933 is demonstrated for condition detection, with specific condition classification demonstrating an accuracy of 0.75.
Notwithstanding these encouraging results, it is critical to recognize the inherent limitations of this work. The limited data availability makes it difficult to investigate a wider range of respiratory diseases, and the computing requirements of optimization limit the thorough investigation of different optimizers. Future work will aim to further refine the proposed methodology and address some of the observed limitations as additional computational resources and data become available.