From Black Box to Glass Box: SHAP-Explained XGBoost Model for Coronary Artery Disease Prediction
Abstract
1. Introduction
2. Related Works
3. Materials and Methods
3.1. Dataset
3.2. Data Preprocessing
3.3. Modeling/Machine Learning Algorithms
3.3.1. Support Vector Machine (SVM)
3.3.2. Logistic Regression (LR)
3.3.3. K-Nearest Neighbor (KNN)
3.3.4. Naïve Bayes (NB)
3.3.5. Random Forest (RF)
3.3.6. Extreme Gradient Boosting (XGBoost)
3.3.7. ANOVA
3.3.8. Shapley Additive Explanations (SHAP)
3.4. Proposed Method
4. Results and Discussion
Computational Cost and Interpretability Review
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Ramalingam, V.V.; Dandapath, A.; Raja, M.K. Heart disease prediction using machine learning techniques: A survey. Int. J. Eng. Technol. 2018, 7, 684–687.
2. Kundu, J.; Kundu, S. Cardiovascular disease (CVD) and its associated risk factors among older adults in India: Evidence from LASI Wave 1. Clin. Epidemiol. Glob. Health 2022, 13, 100937.
3. WHO. Cardiovascular Diseases (CVDs). Available online: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) (accessed on 6 December 2024).
4. Ali, F.; El-Sappagh, S.; Islam, S.R.; Kwak, D.; Ali, A.; Imran, M.; Kwak, K.S. A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion. Inf. Fusion 2020, 63, 208–222.
5. Ali, M.M.; Paul, B.K.; Ahmed, K.; Bui, F.M.; Quinn, J.M.; Moni, M.A. Heart disease prediction using supervised machine learning algorithms: Performance analysis and comparison. Comput. Biol. Med. 2021, 136, 104672.
6. Anitha, S.; Sridevi, N. Heart disease prediction using data mining techniques. J. Anal. Comput. 2019, 13, 47–55.
7. Dwivedi, A.K. Performance evaluation of different machine learning techniques for prediction of heart disease. Neural Comput. Appl. 2018, 29, 685–693.
8. Al-Taie, R.R.K.; Saleh, B.J.; Saedi, A.Y.F.; Salman, L.A. Analysis of WEKA data mining algorithms Bayes net, random forest, MLP and SMO for heart disease prediction system: A case study in Iraq. Int. J. Electr. Comput. Eng. 2021, 11, 5229.
9. Pouriyeh, S.; Vahid, S.; Sannino, G.; De Pietro, G.; Arabnia, H.; Gutierrez, J. A comprehensive investigation and comparison of machine learning techniques in the domain of heart disease. In Proceedings of the 2017 IEEE Symposium on Computers and Communications (ISCC), Heraklion, Greece, 3–6 July 2017; pp. 204–207.
10. Khan, A.; Khan, A.; Khan, M.M.; Farid, K.; Alam, M.M.; Su’ud, M.B.M. Cardiovascular and Diabetes Diseases Classification Using Ensemble Stacking Classifiers with SVM as a Meta Classifier. Diagnostics 2022, 12, 2595.
11. Kavitha, M.; Gnaneswar, G.; Dinesh, R.; Sai, Y.R.; Suraj, R.S. Heart disease prediction using hybrid machine learning model. In Proceedings of the 2021 6th International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 20–22 January 2021; pp. 1329–1333.
12. Hossain, M.M.; Khurshid, S.; Fatema, K.; Hasan, M.Z.; Kamal Hossain, M. Analysis and Prediction of Heart Disease Using Machine Learning and Data Mining Techniques. Can. J. Med. 2021, 3, 36–44.
13. Shah, D.; Patel, S.; Bharti, S.K. Heart disease prediction using machine learning techniques. SN Comput. Sci. 2020, 1, 345.
14. Ishaq, A.; Sadiq, S.; Umer, M.; Ullah, S.; Mirjalili, S.; Rupapara, V.; Nappi, M. Improving the prediction of heart failure patients’ survival using SMOTE and effective data mining techniques. IEEE Access 2021, 9, 39707–39716.
15. Reddy, G.T.; Reddy, M.P.K.; Lakshmanna, K.; Rajput, D.S.; Kaluri, R.; Srivastava, G. Hybrid genetic algorithm and a fuzzy logic classifier for heart disease diagnosis. Evol. Intell. 2020, 13, 185–196.
16. Musa, U.A.; Muhammad, S.A. Enhancing the Performance of Heart Disease Prediction from Collecting Cleveland Heart Dataset using Bayesian Network. J. Appl. Sci. Environ. Manag. 2022, 26, 1093–1098.
17. UCI Repository. Available online: https://archive.ics.uci.edu/ml/datasets/heart+disease (accessed on 26 October 2024).
18. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
19. Suthaharan, S. Support vector machine. In Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning; Springer: Boston, MA, USA, 2016; pp. 207–235.
20. Gong, Y.; Jia, L. Research on SVM environment performance of parallel computing based on large dataset of machine learning. J. Supercomput. 2019, 75, 5966–5983.
21. Binbusayyis, A.; Alaskar, H.; Vaiyapuri, T.; Dinesh, M. An investigation and comparison of machine learning approaches for intrusion detection in IoMT network. J. Supercomput. 2022, 78, 17403–17422.
22. Al-Hajjar, A.L.N.; Al-Qurabat, A.K.M. An overview of machine learning methods in enabling IoMT-based epileptic seizure detection. J. Supercomput. 2023, 79, 16017–16064.
23. Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges, and trends. Neurocomputing 2020, 408, 189–215.
24. LaValley, M.P. Logistic regression. Circulation 2008, 117, 2395–2399.
25. Nasteski, V. An overview of the supervised machine learning methods. Horizons B 2017, 4, 51–62.
26. Mitchell, T.M. Machine Learning; McGraw-Hill: New York, NY, USA, 1997.
27. Mahesh, B. Machine learning algorithms—A review. Int. J. Sci. Res. 2020, 9, 381–386.
28. Jackins, V.; Vimal, S.; Kaliappan, M.; Lee, M.Y. AI-based smart prediction of clinical disease using random forest classifier and Naive Bayes. J. Supercomput. 2021, 77, 5198–5219.
29. Ullah, F.; Moon, J.; Naeem, H.; Jabbar, S. Explainable artificial intelligence approach in combating real-time surveillance of COVID19 pandemic from CT scan and X-ray images using ensemble model. J. Supercomput. 2022, 78, 19246–19271.
30. Prabha, A.; Yadav, J.; Rani, A.; Singh, V. Design of intelligent diabetes mellitus detection system using hybrid feature selection based XGBoost classifier. Comput. Biol. Med. 2021, 136, 104664.
31. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
32. Adewole, K.S.; Han, T.; Wu, W.; Song, H.; Sangaiah, A.K. Twitter spam account detection based on clustering and classification methods. J. Supercomput. 2020, 76, 4802–4837.
33. Farzipour, A.; Elmi, R.; Nasiri, H. Detection of Monkeypox cases based on symptoms using XGBoost and Shapley additive explanations methods. Diagnostics 2023, 13, 2391.
34. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
35. Abbasniya, M.R.; Sheikholeslamzadeh, S.A.; Nasiri, H.; Emami, S. Classification of breast tumors based on histopathology images using deep features and ensemble of gradient boosting methods. Comput. Electr. Eng. 2022, 103, 108382.
36. Deng, A.; Zhang, H.; Wang, W.; Zhang, J.; Fan, D.; Chen, P.; Wang, B. Developing computational model to predict protein-protein interaction sites based on the XGBoost algorithm. Int. J. Mol. Sci. 2020, 21, 2274.
37. Maleki, A.; Raahemi, M.; Nasiri, H. Breast cancer diagnosis from histopathology images using deep neural network and XGBoost. Biomed. Signal Process. Control 2023, 86, 105152.
38. Bennasar, M.; Hicks, Y.; Setchi, R. Feature selection using joint mutual information maximization. Expert Syst. Appl. 2015, 42, 8520–8532.
39. Kumar, M.; Rath, N.K.; Swain, A.; Rath, S.K. Feature selection and classification of microarray data using MapReduce based ANOVA and K-nearest neighbor. Procedia Comput. Sci. 2015, 54, 301–310.
40. Johnson, K.J.; Synovec, R.E. Pattern recognition of jet fuels: Comprehensive GC×GC with ANOVA-based feature selection and principal component analysis. Chemom. Intell. Lab. Syst. 2002, 60, 225–237.
41. Mangalathu, S.; Hwang, S.H.; Jeon, J.S. Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng. Struct. 2020, 219, 110927.
42. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
43. Parsa, A.B.; Movahedi, A.; Taghipour, H.; Derrible, S.; Mohammadian, A.K. Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis. Accid. Anal. Prev. 2020, 136, 105405.
44. Fatahi, R.; Nasiri, H.; Homafar, A.; Khosravi, R.; Siavoshi, H.; Chehreh Chelgani, S. Modeling operational cement rotary kiln variables with explainable artificial intelligence methods—A “conscious lab” development. Part. Sci. Technol. 2023, 41, 715–724.
45. Das, S.; Sultana, M.; Bhattacharya, S.; Sengupta, D.; De, D. XAI–reduct: Accuracy preservation despite dimensionality reduction for heart disease classification using explainable AI. J. Supercomput. 2023, 1–31.










| Variable Name | Description | Minimum | Maximum | Average | STD |
|---|---|---|---|---|---|
| Age | This shows the age of the patient. | 29 | 77 | 54.5 | 9.03 |
| Sex | Patient sex (1 = male, 0 = female). | 0 | 1 | - | - |
| Chest Pain (CP) | The type of chest pain experienced: 1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic. | 1 | 4 | - | - |
| Blood Pressure (trestbps) | Resting blood pressure (mm Hg) measured on admission to the hospital. | 94 | 200 | 131.69 | 17.76 |
| Cholesterol (chol) | Serum cholesterol level (mg/dL). | 126 | 564 | 247.35 | 51.99 |
| Blood Sugar (fbs) | Whether fasting blood sugar exceeds 120 mg/dL (1 = true, 0 = false). | 0 | 1 | - | - |
| Electrocardiographic (restecg) | Resting electrocardiography result: 0 = normal, 1 = abnormal (ST-T wave abnormality), 2 = left ventricular hypertrophy. | 0 | 2 | - | - |
| Heart Rate (thalach) | Maximum heart rate achieved. | 71 | 202 | 149.59 | 22.94 |
| Exercise Induced (exang) | Exercise-induced angina: 1 = present, 0 = absent. | 0 | 1 | - | - |
| Depression (oldpeak) | Continuous variable: ST depression induced by exercise relative to rest. | 0 | 6.20 | 1.05 | 1.16 |
| Slope | Slope of the peak exercise ST segment: 1 = upsloping, 2 = flat, 3 = downsloping. | 1 | 3 | - | - |
| Ca | Number of major vessels (0–3) colored by fluoroscopy. | 0 | 3 | - | - |
| Heart Condition (Thal) | Thallium test result: 3 = normal, 6 = fixed defect, 7 = reversible defect. | 3 | 7 | - | - |
| Target | This indicates the presence or absence of coronary artery disease in the patient, and has 2 values: 0: indicates the absence of coronary artery disease. 1: indicates the presence of coronary artery disease. | 0 | 1 | - | - |
| Parameter | Values |
|---|---|
| Random Forest | |
| n_estimators: the number of trees in the forest. | 1000 |
| criterion: the function that measures the quality of a split. | Entropy |
| SVM | |
| Kernel | RBF |
| K-Nearest Neighbor | |
| n_neighbors: the number of neighbors. | 3, 7 |
| p: power parameter of the Minkowski metric (2 = Euclidean distance). | 2 |
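The settings above map directly onto standard estimator hyperparameters. A minimal sketch, assuming scikit-learn as the implementation library (the paper does not name the library it used):

```python
# Hedged sketch: the baseline classifiers configured with the
# hyperparameters from the table above (scikit-learn names assumed).
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

models = {
    "RF": RandomForestClassifier(n_estimators=1000, criterion="entropy"),
    "SVM": SVC(kernel="rbf"),
    # p=2 turns the Minkowski metric into the Euclidean distance
    "KNN, K=3": KNeighborsClassifier(n_neighbors=3, p=2),
    "KNN, K=7": KNeighborsClassifier(n_neighbors=7, p=2),
}
```

Each estimator can then be fitted with the usual `fit`/`predict` interface on the selected feature subset.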
| Parameter | Value |
|---|---|
| Maximum depth of tree | 1 |
| Subsample ratio of the training instances | 0.6 |
| Minimum sum of instance weight | 5 |
| Alpha | 0.1 |
| Subsample ratio of columns when constructing each tree | 0.6 |
| Learning Rate | 0.4 |
| Number of predictors | 50 |
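The table lists parameter descriptions rather than API names; the mapping below to the `xgboost` parameter vocabulary is our interpretation, not something the paper states:

```python
# Hedged mapping of the XGBoost settings above to xgboost API names
# (our assumption: "number of predictors" = n_estimators, "alpha" = reg_alpha).
xgb_params = {
    "max_depth": 1,           # maximum depth of tree
    "subsample": 0.6,         # subsample ratio of the training instances
    "min_child_weight": 5,    # minimum sum of instance weight in a child
    "reg_alpha": 0.1,         # L1 regularization term (alpha)
    "colsample_bytree": 0.6,  # subsample ratio of columns per tree
    "learning_rate": 0.4,
    "n_estimators": 50,       # number of boosting rounds
}

# Usage, assuming the xgboost package is installed:
#   from xgboost import XGBClassifier
#   model = XGBClassifier(**xgb_params)
```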
| Model \ Number of Features | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 0.8666 | 0.8833 | 0.9000 | 0.9000 | 0.8833 | 0.9000 | 0.8833 | 0.9000 | 0.8666 | 0.8500 |
| NB | 0.8833 | 0.8833 | 0.8833 | 0.8833 | 0.9000 | 0.9000 | 0.8833 | 0.9000 | 0.9000 | 0.9000 |
| RF | 0.8166 | 0.8166 | 0.8500 | 0.8500 | 0.8500 | 0.9000 | 0.8666 | 0.8666 | 0.8666 | 0.8833 |
| SVM | 0.8666 | 0.8666 | 0.8833 | 0.8833 | 0.8833 | 0.9166 | 0.9000 | 0.8833 | 0.9000 | 0.8500 |
| KNN, K = 7 | 0.8833 | 0.8166 | 0.8500 | 0.9166 | 0.9166 | 0.8833 | 0.9000 | 0.8833 | 0.8833 | 0.8833 |
| KNN, K = 3 | 0.8333 | 0.8666 | 0.8166 | 0.8833 | 0.9166 | 0.9166 | 0.9333 | 0.9000 | 0.9166 | 0.8833 |
| XGBOOST | 0.8166 | 0.7666 | 0.8333 | 0.8333 | 0.8000 | 0.8500 | 0.8019 | 0.8333 | 0.8500 | 0.8500 |
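The per-column results in tables like the one above come from re-fitting each model on the top-k ANOVA-ranked features for k = 4 through 13. A minimal sketch of that sweep, assuming scikit-learn's `SelectKBest` with the `f_classif` (ANOVA F-test) scorer and a generic hold-out split (the exact split and seed used in the paper are not reproduced here):

```python
# Hedged sketch of the ANOVA feature-count sweep behind the results
# tables (data loading and the paper's exact split are assumptions).
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def sweep(X, y, model, k_range=range(4, 14)):
    """Hold-out accuracy for each number of ANOVA-ranked features."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y)
    scores = {}
    for k in k_range:
        # Rank features by the ANOVA F-statistic on the training fold only
        sel = SelectKBest(f_classif, k=k).fit(X_tr, y_tr)
        model.fit(sel.transform(X_tr), y_tr)
        scores[k] = accuracy_score(y_te, model.predict(sel.transform(X_te)))
    return scores
```

Running `sweep` once per classifier yields one row of such a table; precision, recall, and F1 follow the same pattern with a different metric function.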
| Model \ Number of Features | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 0.8333 | 0.875 | 0.9166 | 0.9166 | 0.9166 | 0.9166 | 0.9166 | 0.9166 | 0.9166 | 0.9166 |
| NB | 0.7916 | 0.8333 | 0.875 | 0.875 | 0.9166 | 0.9166 | 0.875 | 0.9166 | 0.9166 | 0.9166 |
| RF | 0.7083 | 0.7916 | 0.8333 | 0.8333 | 0.8333 | 0.875 | 0.875 | 0.8333 | 0.875 | 0.875 |
| SVM | 0.7916 | 0.7916 | 0.7916 | 0.7916 | 0.8333 | 0.875 | 0.875 | 0.8333 | 0.8333 | 0.8333 |
| KNN, K = 7 | 0.7916 | 0.75 | 0.75 | 0.8333 | 0.8333 | 0.7916 | 0.875 | 0.875 | 0.875 | 0.875 |
| KNN, K = 3 | 0.75 | 0.8333 | 0.75 | 0.8333 | 0.8333 | 0.875 | 0.9166 | 0.875 | 0.9166 | 0.9166 |
| XGBOOST | 0.7083 | 0.875 | 0.8333 | 0.8333 | 0.875 | 0.8333 | 0.7814 | 0.7916 | 0.875 | 0.875 |
| Model \ Number of Features | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 0.8333 | 0.8400 | 0.8461 | 0.8461 | 0.8148 | 0.8461 | 0.8148 | 0.8461 | 0.7857 | 0.8461 |
| NB | 0.7916 | 0.8695 | 0.8400 | 0.8571 | 0.8461 | 0.8461 | 0.8400 | 0.8461 | 0.8461 | 0.9160 |
| RF | 0.8095 | 0.7600 | 0.8000 | 0.8000 | 0.8000 | 0.8750 | 0.8076 | 0.8333 | 0.8076 | 0.8400 |
| SVM | 0.8636 | 0.8636 | 0.9047 | 0.9047 | 0.8695 | 0.9130 | 0.8750 | 0.8695 | 0.9090 | 0.8000 |
| KNN, K = 7 | 0.9047 | 0.7826 | 0.8571 | 0.9523 | 0.9523 | 0.9047 | 0.8750 | 0.8400 | 0.8400 | 0.8400 |
| KNN, K = 3 | 0.8181 | 0.8333 | 0.7826 | 0.8695 | 0.9523 | 0.9130 | 0.9166 | 0.8750 | 0.8800 | 0.8148 |
| XGBOOST | 0.8095 | 0.8076 | 0.7692 | 0.7692 | 0.7000 | 0.8000 | 0.8039 | 0.7916 | 0.7777 | 0.7777 |
| Model \ Number of Features | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 0.8333 | 0.8571 | 0.8799 | 0.8799 | 0.8627 | 0.8799 | 0.8627 | 0.8799 | 0.8461 | 0.8301 |
| NB | 0.7916 | 0.8510 | 0.8571 | 0.8571 | 0.8799 | 0.8799 | 0.8571 | 0.8799 | 0.8461 | 0.8799 |
| RF | 0.7755 | 0.7755 | 0.8163 | 0.8163 | 0.8163 | 0.8571 | 0.8400 | 0.8333 | 0.8400 | 0.8571 |
| SVM | 0.8260 | 0.8260 | 0.8444 | 0.8444 | 0.8510 | 0.8936 | 0.8750 | 0.8510 | 0.8695 | 0.8163 |
| KNN, K = 7 | 0.8444 | 0.7659 | 0.7999 | 0.8888 | 0.8888 | 0.8444 | 0.8750 | 0.8571 | 0.8571 | 0.8571 |
| KNN, K = 3 | 0.7826 | 0.8333 | 0.7659 | 0.8510 | 0.8888 | 0.8936 | 0.9166 | 0.8750 | 0.8979 | 0.8627 |
| XGBOOST | 0.7555 | 0.8400 | 0.8000 | 0.8000 | 0.7777 | 0.8163 | 0.7899 | 0.7916 | 0.8235 | 0.8235 |
| Model \ Number of Features | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 0.8116 | 0.8078 | 0.8312 | 0.8381 | 0.8314 | 0.8348 | 0.8248 | 0.8282 | 0.8382 | 0.8517 |
| NB | 0.7983 | 0.8080 | 0.8245 | 0.8312 | 0.8313 | 0.8213 | 0.8248 | 0.8248 | 0.8247 | 0.8247 |
| RF | 0.7773 | 0.7844 | 0.7811 | 0.8145 | 0.8147 | 0.8248 | 0.8081 | 0.8045 | 0.8181 | 0.8079 |
| SVM | 0.7947 | 0.8012 | 0.8180 | 0.8079 | 0.8217 | 0.8350 | 0.8181 | 0.8180 | 0.8214 | 0.8349 |
| KNN | 0.8112 | 0.8078 | 0.8045 | 0.8245 | 0.8350 | 0.8316 | 0.8045 | 0.8045 | 0.8112 | 0.8078 |
| Model \ Number of Features | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 0.8181 | 0.8115 | 0.8517 | 0.8495 | 0.8434 | 0.8438 | 0.8225 | 0.8280 | 0.8378 | 0.8532 |
| NB | 0.7901 | 0.7987 | 0.8324 | 0.8428 | 0.8320 | 0.8177 | 0.8218 | 0.8239 | 0.8218 | 0.8218 |
| RF | 0.7774 | 0.7879 | 0.7900 | 0.8365 | 0.8225 | 0.8430 | 0.8182 | 0.8279 | 0.8343 | 0.8216 |
| SVM | 0.8175 | 0.8242 | 0.8360 | 0.8270 | 0.8295 | 0.8446 | 0.8423 | 0.8359 | 0.8413 | 0.8621 |
| KNN | 0.8342 | 0.8224 | 0.8042 | 0.8292 | 0.8249 | 0.8284 | 0.7976 | 0.8036 | 0.8080 | 0.8191 |
| Model \ Number of Features | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 0.7829 | 0.7626 | 0.7660 | 0.7905 | 0.7842 | 0.7905 | 0.7898 | 0.7898 | 0.8027 | 0.8181 |
| NB | 0.7823 | 0.8747 | 0.7886 | 0.7864 | 0.7996 | 0.8007 | 0.7985 | 0.7985 | 0.7985 | 0.7985 |
| RF | 0.7347 | 0.7378 | 0.7296 | 0.7515 | 0.7664 | 0.7572 | 0.7567 | 0.7349 | 0.7584 | 0.7517 |
| SVM | 0.7200 | 0.7349 | 0.7549 | 0.7392 | 0.7776 | 0.7910 | 0.7504 | 0.7561 | 0.7561 | 0.7710 |
| KNN | 0.7398 | 0.7532 | 0.7687 | 0.7910 | 0.8200 | 0.8069 | 0.7815 | 0.7712 | 0.7772 | 0.7582 |
| Model \ Number of Features | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 0.7918 | 0.7820 | 0.8081 | 0.8150 | 0.8078 | 0.8120 | 0.8019 | 0.8048 | 0.8151 | 0.8310 |
| NB | 0.7798 | 0.7867 | 0.8025 | 0.8080 | 0.8095 | 0.8023 | 0.8039 | 0.8040 | 0.8031 | 0.8031 |
| RF | 0.7528 | 0.7594 | 0.7549 | 0.7892 | 0.7910 | 0.7950 | 0.7814 | 0.7740 | 0.7919 | 0.7814 |
| SVM | 0.7619 | 0.7729 | 0.7907 | 0.7767 | 0.7996 | 0.8134 | 0.7899 | 0.7895 | 0.7921 | 0.8092 |
| KNN | 0.7802 | 0.7810 | 0.7817 | 0.8055 | 0.8187 | 0.8134 | 0.7834 | 0.7801 | 0.7860 | 0.7839 |
| Parameter | Value |
|---|---|
| Maximum depth of tree | 1 |
| Subsample ratio of the training instances | 0.95 |
| Minimum sum of instance weight | 1 |
| Fraction of features used per tree | 1 |
| L1 regularization | 1 |
| Learning Rate | 0.3 |
| Gamma | 1 |
| Number of predictors | 100 |
| Alpha | 30 |
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| XGBoost | 0.8616 | 0.8168 | 0.8741 | 0.8418 |
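The 10-fold figures above can be obtained in one call with scikit-learn's `cross_validate`; the sketch below assumes binary labels and accepts any scikit-learn-compatible classifier in place of the tuned XGBoost model:

```python
# Hedged sketch: mean accuracy/precision/recall/F1 across 10-fold CV,
# as reported for the tuned model (data loading is assumed elsewhere).
from sklearn.model_selection import cross_validate


def cv_report(model, X, y, folds=10):
    """Average the four reported metrics over a k-fold cross-validation."""
    res = cross_validate(model, X, y, cv=folds,
                         scoring=("accuracy", "precision", "recall", "f1"))
    return {m: res[f"test_{m}"].mean()
            for m in ("accuracy", "precision", "recall", "f1")}
```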
| Research | Year | Feature Selection Method | Data Evaluation | Classifier | Accuracy |
|---|---|---|---|---|---|
| [9] | 2017 | - | 10-fold cross-validation | SVM | 84.85 |
| [13] | 2020 | - | Hold-out | KNN | 90.78 |
| [15] | 2020 | Rough set theory | Hold-out | Combine adaptive genetic algorithm with fuzzy logic | 90 |
| [11] | 2021 | - | Hold-out | Combination of Decision Tree and Random Forest | 88.7 |
| [16] | 2022 | Embedded feature selection methods | Hold-out | Bayesian Networks | 88 |
| Proposed Method | 2025 | ANOVA | Hold-out | KNN | 93.33 |
| Proposed Method | 2025 | ANOVA | Hold-out | SVM | 91.66 |
| Proposed Method | 2025 | - | 10-fold cross-validation | XGBoost | 86.16 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ahmadian, H.; Emami, S.; Nasiri, H. From Black Box to Glass Box: SHAP-Explained XGBoost Model for Coronary Artery Disease Prediction. Algorithms 2025, 18, 771. https://doi.org/10.3390/a18120771
