Prediction of Asthma Disease Using Machine Learning Algorithm

Zahab; Hussain, Manzoor; Parwati, Lusiana Sani

doi:10.3390/engproc2025107115

Open Access Proceeding Paper

Prediction of Asthma Disease Using Machine Learning Algorithm †

by Zahab ¹,*, Manzoor Hussain ² and Lusiana Sani Parwati ³

¹ Department of Software Engineering, University of Sialkot, Sialkot 51040, Pakistan

² Department of Computing, Indus University, Karachi 74000, Pakistan

³ Department of Mathematics, Nusa Putra University, Sukabumi 43152, West Java, Indonesia

* Author to whom correspondence should be addressed.

† Presented at the 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society, Aizuwakamatsu City, Japan, 20–26 January 2025.

Eng. Proc. 2025, 107(1), 115; https://doi.org/10.3390/engproc2025107115

Published: 26 September 2025

(This article belongs to the Proceedings of The 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society)

Abstract

Millions of people worldwide suffer from asthma, and early diagnosis and efficient treatment are needed to improve patient outcomes. Through an analysis of clinical and environmental characteristics, this study investigates machine learning algorithms for predicting asthma: decision trees, K-Nearest Neighbors (K-NN), random forests, and naïve Bayes. Because the asthma dataset is imbalanced, SMOTE is applied to balance it before it is divided into a training set (about 70%) and a test set (30%). K-NN attained 97.50% accuracy, random forest attained 97.35%, naïve Bayes attained 96.99%, and the decision tree attained 97.65%. The decision tree therefore achieved the highest accuracy of the four algorithms, with 97.65% of its predictions correct. These algorithms can be applied to related predictive healthcare tasks.

Keywords:

asthma prediction; machine learning; decision tree; SMOTE; predictive healthcare

1. Introduction

Worldwide, over 300 million people suffer from asthma, a disease that affects the airways and is a major global health problem. Common symptoms include coughing, chest tightness, wheezing, and shortness of breath, and panic attacks have also been reported [1]. The extent and duration of these symptoms differ from patient to patient. Asthma places a heavy burden on the healthcare system, and in this advanced era there is an opportunity to improve healthcare by understanding this complex disease and predicting it on time [2]. Environmental factors are a major reason for the increase in asthma cases: factories and vehicles increase air pollution, and this pollution, together with climate change, is linked to a growing number of asthma patients. Many newborn babies are also affected by this disease.

According to the Global Asthma Report 2019, almost 461,000 people die from asthma worldwide every year, which is more than 1000 deaths per day. This is why asthma has a high impact on the healthcare system. In 2014, 17.7 million adults and 6.3 million children in the United States had been diagnosed with asthma [3]. In this study, asthma is predicted using machine learning, an artificial intelligence technique in which algorithms learn from available data and make predictions on unseen data [4]. We take an asthma dataset from Kaggle and apply machine learning algorithms (decision tree, K-Nearest Neighbors, random forest, and naïve Bayes) to find the one with the best accuracy. A model trained with the most accurate algorithm can help detect asthma early, so that patients receive treatment and recover on time. The flow of this research paper is shown in Figure 1.

2. Literature Review

Ref. [5] explored machine learning applications in asthma management using mHealth, focusing on the challenge of delivering personalized care for a highly variable condition. It highlighted a gap in leveraging diverse mHealth data: existing research is limited to small datasets and lacks real-world validation. Reviewing 22 studies, the authors grouped them into three categories (technology development, attack prediction, and clustering), with most applying supervised machine learning algorithms. Promising results were achieved in attack prediction and subgroup identification; however, wide applicability remains a constraint. Compared with earlier methods, machine learning may have greater potential for integrated asthma care but needs testing on a large scale. Ref. [6] addressed the prediction of childhood asthma, which is difficult because of symptom variability and the limited accuracy and generalizability of traditional regression models. The study identified a gap in the external validation of machine learning prediction models. The authors developed the CAPE (Childhood Asthma Prediction in Early life) and CAPP (Childhood Asthma Prediction at Preschool age) models using recursive feature elimination and SVM (Support Vector Machine) classifiers, achieving effective prediction performance (AUCs (Area Under the Curve) of 0.71 and 0.82) and effective external validation. These models were compared with regression-based models, such as PARS (Predictive Asthma Risk Score), in terms of sensitivity and predictive power. This work points to the potential of machine learning to enhance asthma prediction.

Ref. [7] addressed the challenge of complex asthma diagnosis, highlighting the inefficiency of current methods in leveraging routine blood biomarkers. It identified a gap in developing simpler, scalable models as alternatives to complex techniques like SVM and neural networks. Using the Mahalanobis–Taguchi System (MTS), this research optimized a set of seven biomarkers, including platelet distribution width and eosinophil count, reaching 94.15%, a specificity of 97.20%, and an AUC of 0.983. Compared with an SVM-based approach that relied on 24 features, the method in [7] achieved effective accuracy with fewer features. Ref. [8] focused on imbalanced datasets and irrelevant features in asthma detection, issues that older machine learning methods like SVM and neural networks fail to resolve effectively. The authors proposed combining an improved Generative Adversarial Network for data augmentation with eXtreme Gradient Boosting for feature selection and classification. With this method, they achieved 94.03% accuracy and an AUC of 0.929 and identified five critical features, such as Score and Eosin2, that improve consistency in testing. For comparison, backpropagation neural networks reached 80.68% and SVM reached 89.73%. The authors concluded that integrating advanced data processing techniques gives the model effective performance.

Ref. [9] focused on complex genetic interactions in asthma diagnosis and the missing heritability phenomenon. The authors identified a gap in older methods, which usually miss SNP-SNP (Single Nucleotide Polymorphism) interactions and struggle with complex data. Using random forest (RF) for feature selection and SVM for classification, this research achieved 62.17% accuracy and 69% sensitivity and identified key SNPs linked to asthma. An earlier study from 2011 combined clinical data with SNPs for an AUC of 0.66. The research in [9] relied on SNP data and achieved comparable results, showing the effectiveness of RF for genomic feature selection. Ref. [10] addressed the accurate detection of asthma and chronic respiratory disease using machine learning models and pointed out the complexity of the many causes of asthma. The authors identified a gap in applying machine learning to numerical datasets for asthma detection, since prior work focused more on biomarkers or imaging. They applied many models to an asthma dataset taken from Kaggle, using techniques like SMOTE (Synthetic Minority Over-sampling Technique) for class imbalance and PCA (Principal Component Analysis) for dimensionality reduction. The CatBoost classifier performed best, achieving 96.04% accuracy, while models like SVC (Support Vector Classifier) and neural networks also gave more effective results than the remaining models.

Ref. [11] addressed the challenges of asthma prediction, which is difficult because of the complexity of factors like medical history, biomarkers, and environmental triggers, and noted that older methods often fail to detect asthma. The authors identified a gap in the use of multidimensional data and the limited application of machine learning in clinical settings. Their study explored ML models such as regression, gradient boosting machines, random forests, and decision trees using diverse datasets such as electronic health records and patient-generated data. The results favored ML models over older methods: by leveraging key features like prior exacerbations and environmental exposure, these models achieved higher prediction accuracy than earlier studies. Ref. [12] addressed the challenges of asthma prediction and pointed out the limitations of older detection methods and the many factors involved in the disease. The authors identified a gap in leveraging large-scale datasets and advanced machine learning techniques for asthma populations. Applying machine learning algorithms like XGBoost, LSTM, and Transformers to the electronic health records of over 1.3 million patients, the study identified key risk factors and protective factors. The XGBoost models achieved the best performance, outperforming earlier studies, such as those by Tong and Zein, which used smaller datasets and had lower prediction accuracy. Ref. [13] conducted a systematic review and meta-analysis of asthma detection, motivated by the way the disease decreases quality of life and increases healthcare problems. While other studies use traditional regression models, the review identified a gap in exploring advanced machine learning algorithms like boosting and random forests. After studying 11 research articles covering 23 models, the authors reported that boosting and random forest algorithms achieved higher prediction performance (AUROC (Area Under the Receiver Operating Characteristic curve) = 0.84) than logistic regression (AUROC = 0.77). Key predictors included steroid use, emergency visits, and exacerbation history. Like prior work that pointed out gaps in older methods, this study highlights the promise of machine learning but calls for wider applicability through large datasets and standard methodologies.

Ref. [14] addressed the prediction of asthma and eczema phenotypes, emphasizing the heterogeneity of these diseases and the limitations of older diagnostic methods. It identified a gap in integrating diverse data sources, such as environmental, genetic, and clinical attributes, and applied machine learning algorithms like random forest and regression for more accurate prediction. The most effective algorithm was random forest (AUROC: 84% for asthma, 76% for wheezing, and 64% for eczema). Key predictors included allergen sensitization and lung function, with bio-impedance also identified as a novel factor for eczema. Unlike some previous work, such as [15], which relied on prior diagnoses for higher sensitivity, this paper avoided circular reasoning and focused on diversity for effective predictions. Ref. [16] addressed the limitations of older methods of asthma diagnosis in handling complex data patterns. The authors identified a gap in leveraging affinity-graph-based classifiers, whereas older machine learning methods like SVM and ensemble methods have already been explored. The research proposed the Affinity Graph Enhanced Classifier (AGEC), which uses affinity graphs to improve prediction from 24 blood biomarkers. The AGEC reached 72.50% accuracy; in terms of AUROC, the AGEC achieved 74.01%, SVM 69.80%, and FWAdaboost 61.02%. A prior study using this approach demonstrated the potential of graph-based methods in asthma prediction [17].

3. Methodology

In this study, machine learning algorithms were used to predict asthma disease. These algorithms learn patterns from data and use them to make predictions. As technologies advance, they can be used to predict asthma before it is too late, so we trained models using different technologies [18,19] to predict the disease on time. Predicting asthma in childhood is difficult because older methods are very time-consuming, and existing approaches that use machine learning attain lower accuracy [6,7]. Because our dataset is highly imbalanced, it was first balanced using SMOTE up-sampling, and four algorithms were then applied to it: K-NN (K-Nearest Neighbors), decision tree, random forest, and naïve Bayes. The proposed solution framework is shown in Figure 2.
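The split-then-classify step of this framework can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`train_test_split`, `knn_predict`, `knn_accuracy`), the choice of k = 3, the Euclidean distance, and the two-dimensional toy data are all assumptions made for illustration, and only K-NN (one of the four algorithms) is shown.

```python
import math
import random
from collections import Counter

def train_test_split(rows, labels, test_fraction=0.30, seed=0):
    """Shuffle the sample indices and hold out `test_fraction` for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    pick = lambda idx, seq: [seq[i] for i in idx]
    return (pick(train_idx, rows), pick(test_idx, rows),
            pick(train_idx, labels), pick(test_idx, labels))

def knn_predict(train_rows, train_labels, query, k=3):
    """Return the majority label among the k training rows closest to `query`."""
    nearest = sorted(range(len(train_rows)),
                     key=lambda i: math.dist(train_rows[i], query))[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

def knn_accuracy(train_rows, train_labels, test_rows, test_labels, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_rows, train_labels, row, k) == label
               for row, label in zip(test_rows, test_labels))
    return hits / len(test_rows)

# Hypothetical toy data (not the Kaggle dataset): class 0 near (0, 0), class 1 near (5, 5).
rng = random.Random(42)
toy_rows = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
            + [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)])
toy_labels = [0] * 50 + [1] * 50

X_train, X_test, y_train, y_test = train_test_split(toy_rows, toy_labels)
toy_accuracy = knn_accuracy(X_train, y_train, X_test, y_test)
print(len(X_train), len(X_test), toy_accuracy)
```

With real tabular data the features would normally be scaled before computing distances, since K-NN is sensitive to attributes measured on different ranges.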

3.1. Boxplot

Figure 3 shows a boxplot of the physical activity attribute of the dataset.
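For readers unfamiliar with how a boxplot summarizes an attribute, the sketch below computes the quantities a boxplot draws: the quartiles, the whisker ends, and any outliers. The activity values are hypothetical and are not taken from the dataset; the 1.5 × IQR whisker rule and the "inclusive" quantile method are common conventions assumed here, not details stated in the paper.

```python
import statistics

def boxplot_summary(values):
    """Return the quartiles, whisker ends, and outliers that a boxplot displays."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Points beyond 1.5 * IQR from the box are drawn as outliers.
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Hypothetical physical-activity values for ten patients.
activity = [4.0, 4.5, 5.0, 5.0, 5.5, 5.5, 6.0, 6.0, 6.5, 10.0]
summary = boxplot_summary(activity)
print(summary)
```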

3.2. Dataset Description

The parameters used for detecting asthma follow WHO (World Health Organization) guidelines, and different resources were used to create the dataset. The dataset was acquired from Kaggle.com and contains 2392 instances in total [17]. It has 28 features, based on global standards and WHO guidelines, that can be used for the prediction of asthma; the class label (0 or 1) is given in the last column. The dataset was used to train the models for asthma prediction. The attributes are shown in Table 1. We first checked whether the dataset was balanced and found that it was highly imbalanced, as shown in Figure 4.

The number of instances in each class is shown in Table 2.

More instances are labeled 0 than 1: label 0 has 2268 instances (94.81% of the data), whereas label 1 has only 124 instances (5.18%). To correct this imbalance, we used SMOTE (Synthetic Minority Over-sampling Technique), which generates synthetic instances for the minority class 1 [18]. After applying SMOTE, the number of instances of class 1 increased from 124 to 2268, matching class 0 and balancing the dataset. After applying the SMOTE up-sampling technique, the class distribution of the dataset was plotted, as shown in Figure 4.
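The core idea of SMOTE can be sketched in a few lines: each synthetic sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. The function below is a simplified illustration under stated assumptions, not the implementation used in the paper: the name `smote`, the choice of k = 5, the Euclidean distance, and the random two-dimensional points standing in for the 124 minority records are all hypothetical. Only the class counts (2268 and 124) come from the text.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample, excluding the sample itself.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Class counts reported in the paper.
n_majority, n_minority = 2268, 124
share_minority = 100 * n_minority / (n_majority + n_minority)

# Hypothetical 2-D points standing in for the real minority-class records.
rng = random.Random(1)
minority_points = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(n_minority)]

new_points = smote(minority_points, n_majority - n_minority)
balanced_minority = minority_points + new_points
print(len(new_points), len(balanced_minority), round(share_minority, 2))
```

Plain interpolation only makes sense for numeric attributes; categorical attributes usually need a variant of the technique. It is also worth noting that when oversampling is done before the train/test split, synthetic points derived from training records can end up in the test set, so a common alternative is to oversample the training split only.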

4. Results

This section reports the accuracy of the four algorithms on the asthma dataset. The K-NN algorithm obtains 97.50% accuracy, the naïve Bayes algorithm 96.99%, the random forest algorithm 97.35%, and the decision tree 97.65%. The decision tree thus obtains better accuracy than the other algorithms. A graph of accuracy is shown in Figure 5.

This graph shows the accuracy of the algorithms applied to the dataset, in ascending order: naïve Bayes attained 96.99%, random forest 97.35%, K-NN 97.50%, and the decision tree 97.65%. The decision tree attained the best accuracy, and its confusion matrix is shown in Table 3.
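The reported figures can be checked directly from the counts in Table 3. The sketch below assumes that the rows of Table 3 are predicted classes and the columns are true classes; the paper does not state this, but it is the reading under which the tabulated precision and recall values are reproduced. The dictionary layout and function names are illustrative choices.

```python
# Counts from Table 3, keyed as (predicted class, true class).
confusion = {
    (0, 0): 650,  # predicted 0, actually 0
    (0, 1): 2,    # predicted 0, actually 1
    (1, 0): 30,   # predicted 1, actually 0
    (1, 1): 678,  # predicted 1, actually 1
}

def precision(cls):
    """Correct predictions of `cls` divided by all predictions of `cls`."""
    predicted = sum(n for (pred, _), n in confusion.items() if pred == cls)
    return confusion[(cls, cls)] / predicted

def recall(cls):
    """Correct predictions of `cls` divided by all true members of `cls`."""
    actual = sum(n for (_, true), n in confusion.items() if true == cls)
    return confusion[(cls, cls)] / actual

total = sum(confusion.values())
accuracy = (confusion[(0, 0)] + confusion[(1, 1)]) / total

print(f"test instances: {total}")
print(f"accuracy: {100 * accuracy:.2f}%")
for cls in (0, 1):
    print(f"class {cls}: precision {100 * precision(cls):.2f}%, "
          f"recall {100 * recall(cls):.2f}%")
```

The 1360 test instances correspond to roughly 30% of the 4536 records obtained after balancing (2268 per class).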

Author Contributions

Z. conceived the main idea of the study, designed the overall framework, and supervised the research process. M.H. contributed to the implementation, data preprocessing, algorithm development, and experimental analysis. L.S.P. assisted in model evaluation, literature review, result validation, and manuscript preparation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data will be made available upon reasonable request to the first author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Kavitha, K.S.; Monisha, M.; Nischitha, M.; Nisha, M.; Raksitha, A. Asthma Prediction and Monitoring. Indian J. Allergy Asthma Immunol. 2023, 37, 33–36.
2. Fontanella, S.; Cucco, A.; Custovic, A. Machine Learning in Asthma Research: Moving Toward a More Integrated Approach. Expert Rev. Respir. Med. 2021, 15, 609–621.
3. Bhat, G.S.; Shankar, N.; Kim, D.; Song, D.J.; Seo, S.; Panahi, I.M.; Tamil, L. Machine Learning-Based Asthma Risk Prediction Using IoT and Smartphone Applications. IEEE Access 2021, 9, 118708–118715.
4. Spathis, D.; Vlamos, P. Diagnosing Asthma and Chronic Obstructive Pulmonary Disease with Machine Learning. Health Inform. J. 2019, 25, 811–827.
5. Tsang, K.C.H.; Pinnock, H.; Wilson, A.M.; Shah, S.A. Application of Machine Learning Algorithms for Asthma Management with mHealth: A Clinical Review. J. Asthma Allergy 2022, 2022, 855–873.
6. Kothalawala, D.M.; Murray, C.S.; Simpson, A.; Custovic, A.; Tapper, W.J.; Arshad, S.H.; Holloway, J.W.; Rezwan, F.I.; STELAR/UNICORN Investigators. Development of Childhood Asthma Prediction Models Using Machine Learning Approaches. Clin. Transl. Allergy 2021, 11, e12076.
7. Zhan, J.; Chen, W.; Cheng, L.; Wang, Q.; Han, F.; Cui, Y. Diagnosis of Asthma Based on Routine Blood Biomarkers Using Machine Learning. Comput. Intell. Neurosci. 2020, 2020, 8841002.
8. Lee, Z.J.; Yang, M.R.; Hwang, B.J. A Sustainable Approach to Asthma Diagnosis: Classification with Data Augmentation, Feature Selection, and Boosting Algorithm. Diagnostics 2024, 14, 723.
9. Gaudillo, J.; Rodriguez, J.J.R.; Nazareno, A.; Baltazar, L.R.; Vilela, J.; Bulalacao, R.; Domingo, M.; Albia, J. Machine Learning Approach to Single Nucleotide Polymorphism-Based Asthma Prediction. PLoS ONE 2019, 14, e0225574.
10. Mukherjee, A.; Jena, J.J.; Gourisaria, M.K. Predictive Modeling for Asthma Disease Detection: A Comparative Study of Machine Learning Algorithms. 2024. Available online: https://hal.science/hal-04731433v1 (accessed on 1 January 2025).
11. Molfino, N.A.; Turcatel, G.; Riskin, D. Machine Learning Approaches to Predict Asthma Exacerbations: A Narrative Review. Adv. Ther. 2024, 41, 534–552.
12. Turcatel, G.; Xiao, Y.; Caveney, S.; Gnacadja, G.; Kim, J.; Molfino, N.A. Predicting Asthma Exacerbations Using Machine Learning Models. Adv. Ther. 2024, 42, 362–374.
13. Xiong, S.; Chen, W.; Jia, X.; Jia, Y.; Liu, C. Machine Learning for Prediction of Asthma Exacerbations Among Asthmatic Patients: A Systematic Review and Meta-Analysis. BMC Pulm. Med. 2023, 23, 278.
14. Prosperi, M.C.; Marinho, S.; Simpson, A.; Custovic, A.; Buchan, I.E. Predicting Phenotypes of Asthma and Eczema with Machine Learning. BMC Med. Genom. 2014, 7, S7.
15. Chatzimichail, E.; Paraskakis, E.; Sitzimi, M.; Rigas, A. An intelligent system approach for asthma prediction in symptomatic preschool children. Comput. Math. Methods Med. 2013, 2013, 240182.
16. Li, D.; Abhadiomhen, S.E.; Zhou, D.; Shen, X.J.; Shi, L.; Cui, Y. Asthma Prediction via Affinity Graph Enhanced Classifier: A Machine Learning Approach Based on Routine Blood Biomarkers. J. Transl. Med. 2024, 22, 100.
17. Airehrour, D.; Gutierrez, J.; Kumar Ray, S. GradeTrust: A secure trust based routing protocol for MANETs. In Proceedings of the 25th International Telecommunication Networks and Applications Conference (ITNAC), Sydney, NSW, Australia, 18–20 November 2015; pp. 65–70.
18. Diwaker, C.; Tomar, P.; Solanki, A.; Nayyar, A.; Jhanjhi, N.Z.; Abdullah, A.; Supramaniam, M. A New Model for Predicting Component-Based Software Reliability Using Soft Computing. IEEE Access 2019, 7, 147191–147203.
19. Kok, S.H.; Abdullah, A.; Jhanjhi, N.Z.; Supramaniam, M. A review of intrusion detection system using machine learning approach. Int. J. Eng. Res. Technol. 2019, 12, 8–15.

Figure 1. Flow of this research paper.

Figure 2. Proposed framework.

Figure 3. Boxplot.

Figure 4. Unbalanced dataset.

Figure 5. Accuracy of classifiers.

Table 1. Patient data attributes used in this study.

Attribute 1	Attribute 2
Patient ID	Pollution Exposure
Age	Pollen Exposure
Gender	Dust Exposure
Ethnicity	Pet Allergy
Education Level	Family History of Asthma
BMI	History of Allergies
Smoking	Eczema
Physical Activity	Hay Fever
Diet Quality	Sleep Quality
Gastroesophageal Symptoms	Reflux
Lung Function FEV1	Lung Function FVC
Wheezing	Shortness of Breath
Chest Tightness	Coughing
Nighttime Symptoms	Exercise-Induced Diagnosis

Table 2. Classification report by class.

Class	Count	Percentage	Precision	Recall
0	2268	94.81%	99.69%	95.59%
1	124	5.18%	95.76%	99.71%

Table 3. Confusion matrix with class-wise precision and recall.

Predicted class	True 0	True 1	Class Precision	Class Recall
0	650	2	99.69%	95.59%
1	30	678	95.76%	99.71%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
