Breast Cancer Classification Using an Adapted Bump-Hunting Algorithm
Abstract
1. Introduction
2. Material and Methods
2.1. State of the Art of Breast Cancer Classification
2.2. Overview of Major Supervised Algorithms Used for Breast Cancer Classification
2.3. The Patient Rule Induction Method
2.3.1. Overview
2.3.2. The PRIM’s Metrics
2.3.3. Related Works
3. PRIM-Based Framework for Breast Cancer Classification and Explanation
3.1. Presentation of the Framework
- Feature space X = {x1, x2, ..., xn} with categorical and numeric features
- Binary target variable y ∈ {0, 1}
- Training data D = {(x1,y1), (x2,y2), ..., (xn,yn)}
- Minimum support threshold β
- Peeling parameter α
- Random feature selection phase, drawing a random number of features for each feature combination
- Box construction phase for each subspace, using the peeling parameter α and the minimum support β
- Metarules applied to all boxes to find associations and overlaps between boxes
- Quality assessment of the boxes using density, coverage, and support
- Selection of the final boxes for the classifier
- Final classifier for prediction
- Metarules for detecting overlapping regions
- Weak boxes for subgroup and knowledge discovery
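The box-construction phase listed above can be illustrated with a minimal sketch for a single feature subset. This is not the authors' implementation: it is a simplified greedy variant of PRIM's top-down peeling that assumes numeric features only, stops as soon as no peel raises the density, and leaves out pasting, random feature selection, and the metarules. The function names (`in_box`, `box_metrics`, `prim_peel`) and the synthetic data are hypothetical, and the metric definitions are the ones commonly used for PRIM boxes, assumed here to match the framework's.

```python
import random

def in_box(row, box):
    """True if the row satisfies every interval of the box (feature index -> (low, high))."""
    return all(low <= row[j] <= high for j, (low, high) in box.items())

def box_metrics(X, y, box, target=1):
    """Return (support, density, coverage) of a box for the target class."""
    inside = [label for row, label in zip(X, y) if in_box(row, box)]
    hits = sum(1 for label in inside if label == target)
    n_target = sum(1 for label in y if label == target)
    support = len(inside) / len(X)                      # share of all data in the box
    density = hits / len(inside) if inside else 0.0     # share of the box that is target
    coverage = hits / n_target if n_target else 0.0     # share of the target captured
    return support, density, coverage

def prim_peel(X, y, features, alpha=0.05, beta=0.05, target=1):
    """Greedy peeling: shrink one side of one feature at a time (assumed simplification)."""
    # Start from the box that spans the full range of every selected feature.
    box = {j: (min(r[j] for r in X), max(r[j] for r in X)) for j in features}
    while True:
        best_density = box_metrics(X, y, box, target)[1]
        best_box = None
        for j in features:
            values = sorted(r[j] for r in X if in_box(r, box))
            k = max(1, int(alpha * len(values)))  # number of points removed by one peel
            if k >= len(values):
                continue
            low, high = box[j]
            # Candidate peels: drop the k lowest or the k highest values of feature j.
            for bounds in ((values[k], high), (low, values[-k - 1])):
                trial = {**box, j: bounds}
                support, density, _ = box_metrics(X, y, trial, target)
                if support >= beta and density > best_density:
                    best_density, best_box = density, trial
        if best_box is None:  # no admissible peel improves the density
            return box
        box = best_box

# Synthetic data (hypothetical): the positive class occupies the region x0 > 0.7.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(500)]
y = [1 if row[0] > 0.7 else 0 for row in X]

box = prim_peel(X, y, features=[0, 1], alpha=0.1, beta=0.1)
support, density, coverage = box_metrics(X, y, box)
print(box, support, density, coverage)
```

In this sketch α controls how many points each peel removes and β is the support below which a box is rejected, so a smaller α gives a more "patient" search at the cost of more iterations.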
3.1.1. Step 1: Data Preparation and Defining Learning Objectives
3.1.2. Step 2: Building the Boxes with the PRIM on Random Feature Selection
3.1.3. Step 3: Handling Rule Conflict
3.1.4. Step 4: Organizing and Pruning Rules Using Metarules
3.1.5. Step 5: Selecting the Final Classifier
3.2. Illustrative Example of the PRIM-Based Classification Framework
- Ten rules that were contained within each other;
- Five rules that were associated with another five rules.
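One simple way to picture "rules contained within each other" is a geometric check on the boxes themselves. The sketch below is an assumption made for illustration, not the metarule procedure used in the framework, which relates rules to one another rather than comparing interval bounds. The helper `is_contained` is hypothetical; the two boxes are the Class = 1 rules R45 and R46 from the rules table of this article.

```python
def is_contained(inner, outer):
    """True if box `inner` lies inside box `outer`.

    Boxes map a feature name to a (low, high) interval; a feature that is
    absent from a box is treated as unconstrained.
    """
    for feature, (outer_low, outer_high) in outer.items():
        if feature not in inner:
            return False  # inner is unbounded where outer is restricted
        inner_low, inner_high = inner[feature]
        if inner_low < outer_low or inner_high > outer_high:
            return False
    return True

# Class = 1 rules R46 and R45 from the rules table.
r46 = {"Glucose": (109.0, 199.0)}
r45 = {"Glucose": (134.0, 199.0), "Insulin": (0.0, 478.0)}

print(is_contained(r45, r46))  # R45 is the narrower box
print(is_contained(r46, r45))
```

Under this geometric reading, R45 is a specialization of R46, which is the kind of redundancy a pruning step could use to keep only one of the two rules.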
4. Results
4.1. Experimental Setup
4.2. Empirical Results
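- The best results are obtained by Random Forest and XGBoost on almost all datasets and metrics.
- The PRIM framework maintains strong performance close to the best algorithms: its accuracy stays within about 3.5 percentage points of the best model on every dataset, and it is the highest on SEER (98.4).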
- Despite its clinical viability, Logistic Regression usually exhibits a lower capacity for discrimination.
- The models generally perform better in the high-sensitivity region, which is crucial for medical applications.
- The performance remains robust across varying dataset sizes and feature dimensions.
5. Discussion
6. Limitations
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Rules for Class = 1 | Coverage | Density | Dimension | Support |
---|---|---|---|---|
R1: 128.0 < Glucose < 199.0 AND 17.0 < SkinThickness < 99.0 AND 0.0 < Insulin < 520.0 AND 0.257 < DiabetesPedigreeFunction < 2.42 AND 25.0 < Age < 57.0 | 0.29 | 0.77 | 5 | 0.13 |
R2: 111.0 < Glucose < 199.0 AND 56.0 < BloodPressure < 122.0 AND 0.0 < SkinThickness < 43.0 AND 0.078 < DiabetesPedigreeFunction < 1.37 AND 32.0 < Age < 54.0 | 0.24 | 0.63 | 5 | 0.13 |
R3: 100.0 < Glucose < 199.0 AND 0.253 < DiabetesPedigreeFunction < 1.16 AND 29.0 < Age < 62.0 | 0.16 | 0.52 | 3 | 0.10 |
R4: 90.0 < Glucose < 199.0 AND 0.1495 < DiabetesPedigreeFunction < 2.42 AND 22.0 < Age < 81.0 | 0.24 | 0.22 | 3 | 0.38 |
R5: 0.0 < BloodPressure < 82.0 AND 12.0 < SkinThickness < 99.0 AND 0.0 < Insulin < 99.0 AND 0.1265 < DiabetesPedigreeFunction < 2.42 AND 24.0 < Age < 81.0 | 0.02 | 0.15 | 5 | 0.05 |
R6: 89.0 < Glucose < 199.0 AND 0.1265 < DiabetesPedigreeFunction < 2.42 | 0.03 | 0.13 | 2 | 0.08 |
R7: 128.0 < Glucose < 199.0 AND 17.0 < SkinThickness < 99.0 AND 25.0 < Age < 56.0 | 0.34 | 0.75 | 3 | 0.16 |
R8: 101.0 < Glucose < 199.0 AND 60.0 < BloodPressure < 85.0 AND 0.0 < SkinThickness < 26.0 AND 33.0 < Age < 52.0 | 0.14 | 0.63 | 4 | 0.08 |
R9: 109.0 < Glucose < 199.0 AND 12.0 < SkinThickness < 99.0 AND 31.0 < Age < 59.0 | 0.08 | 0.53 | 3 | 0.05 |
R10: 124.0 < Glucose < 199.0 AND 0.0 < SkinThickness < 0.0 AND 25.0 < Age < 53.0 | 0.11 | 0.71 | 3 | 0.05 |
R11: 95.0 < Glucose < 199.0 AND 22.0 < Age < 62.0 | 0.23 | 0.22 | 2 | 0.37 |
R12: 0.0 < BloodPressure < 85.0 AND 7.0 < SkinThickness < 99.0 AND 26.0 < Age < 56.0 | 0.03 | 0.25 | 3 | 0.05 |
R13: 93.0 < Glucose < 199.0 AND 60.0 < BloodPressure < 92.0 | 0.03 | 0.16 | 2 | 0.08 |
R14: 130.0 < Glucose < 199.0 AND 30.05 < BMI < 67.1 | 0.51 | 0.73 | 2 | 0.24 |
R15: 109.0 < Glucose < 199.0 AND 27.85 < BMI < 67.1 | 0.26 | 0.39 | 2 | 0.22 |
R16: 95.0 < Glucose < 199.0 AND 22.79 < BMI < 67.1 | 0.18 | 0.22 | 2 | 0.28 |
R17: 7.0 < Pregnancies < 9.0 AND 145.0 < Glucose < 199.0 AND 0.0 < Insulin < 495.0 | 0.14 | 0.88 | 3 | 0.05 |
R18: 128.0 < Glucose < 199.0 AND 18.0 < SkinThickness < 99.0 AND 74.0 < Insulin < 478.0 | 0.22 | 0.64 | 3 | 0.12 |
R19: 109.0 < Glucose < 199.0 | 0.48 | 0.39 | 1 | 0.42 |
R20: 7.0 < Pregnancies < 17.0 AND 84.0 < Glucose < 199.0 | 0.06 | 0.4 | 2 | 0.05 |
R21: 0.0 < Pregnancies < 3.0 AND 77.0 < Glucose < 199.0 AND 13.0 < SkinThickness < 45.0 AND 36.0 < Insulin < 846.0 | 0.04 | 0.12 | 4 | 0.12 |
R22: 4.0 < Pregnancies < 6.0 AND 0.0 < Glucose < 105.0 AND 0.0 < SkinThickness < 42.0 AND 0.0 < Insulin < 156.0 | 0.02 | 0.14 | 4 | 0.06 |
R23: 0.0 < Pregnancies < 2.0 AND 90.0 < Glucose < 199.0 AND 0.0 < SkinThickness < 42.0 AND 0.0 < Insulin < 15.0 | 0.01 | 0.09 | 4 | 0.05 |
R24: 8.0 < Pregnancies < 17.0 AND 24.0 < SkinThickness < 43.0 AND 31.0 < BMI < 45.90 | 0.11 | 0.75 | 3 | 0.05 |
R25: 30.05 < BMI < 67.1 | 0.69 | 0.43 | 1 | 0.55 |
R26: 7.0 < Pregnancies < 17.0 AND 0.0 < BloodPressure < 94.0 AND 23.15 < BMI < 67.1 | 0.08 | 0.45 | 3 | 0.05 |
R27: 4.0 < Pregnancies < 17.0 AND 0.0 < BloodPressure < 80.0 AND 0.0 < SkinThickness < 24.0 | 0.06 | 0.27 | 3 | 0.07 |
R28: 3.0 < Pregnancies < 17.0 AND 0.0 < SkinThickness < 33.0 | 0.03 | 0.14 | 2 | 0.07 |
R29: 0.0 < BloodPressure < 86.0 AND 23.15 < BMI < 29.5 | 0.03 | 0.09 | 2 | 0.12 |
R30: 28.1 < BMI < 67.1 AND 0.20 < DiabetesPedigreeFunction < 2.42 AND 31.0 < Age < 60.0 | 0.5 | 0.62 | 3 | 0.27 |
R31: 29.0 < Insulin < 846.0 AND 26.1 < BMI < 67.1 AND 0.1275 < DiabetesPedigreeFunction < 2.42 AND 28.0 < Age < 53.0 | 0.1 | 0.49 | 4 | 0.07 |
R32: 26.9 < BMI < 67.1 AND 0.1265 < DiabetesPedigreeFunction < 2.42 AND 25.0 < Age < 62.0 | 0.21 | 0.36 | 3 | 0.20 |
R33: 0.0 < Insulin < 194.0 AND 22.79 < BMI < 67.1 AND 0.1195 < DiabetesPedigreeFunction < 0.817 AND 23.0 < Age < 81.0 | 0.11 | 0.22 | 4 | 0.17 |
R34: 22.0 < Age < 54.0 | 0.059 | 0.11 | 1 | 0.18 |
R35: 24.75 < BMI < 67.1 | 0.018 | 0.11 | 1 | 0.05 |
R36: 30.85 < BMI < 67.1 | 0.74 | 0.46 | 1 | 0.56 |
R37: 23.25 < BMI < 67.1 | 0.24 | 0.24 | 1 | 0.34 |
R38: 0.0 < BMI < 23.05 | 0.01 | 0.04 | 1 | 0.08 |
R39: 7.0 < Pregnancies < 12.0 AND 110.0 < Insulin < 846.0 AND 0.188 < DiabetesPedigreeFunction < 2.42 | 0.12 | 0.82 | 3 | 0.05 |
R40: 0.3235 < DiabetesPedigreeFunction < 2.42 | 0.56 | 0.37 | 1 | 0.52 |
R41: 7.0 < Pregnancies < 12.0 AND 64.0 < BloodPressure < 122.0 AND 0.1215 < DiabetesPedigreeFunction < 0.2825 | 0.067 | 0.43 | 3 | 0.05 |
R42: 0.11 < DiabetesPedigreeFunction < 0.2825 | 0.20 | 0.25 | 1 | 0.27 |
R43: 0.0 < Insulin < 140.0 AND 0.086 < DiabetesPedigreeFunction < 2.42 | 0.03 | 0.16 | 2 | 0.07 |
R44: 7.0 < Pregnancies < 9.0 AND 145.0 < Glucose < 199.0 AND 0.0 < Insulin < 495.0 | 0.14 | 0.88 | 3 | 0.06 |
R45: 134.0 < Glucose < 199.0 AND 0.0 < Insulin < 478.0 | 0.40 | 0.59 | 2 | 0.23 |
R46: 109.0 < Glucose < 199.0 | 0.31 | 0.34 | 1 | 0.31 |
R47: 7.0 < Pregnancies < 17.0 AND 84.0 < Glucose < 199.0 | 0.05 | 0.4 | 2 | 0.05 |
R48: 0.0 < Pregnancies < 3.0 AND 78.0 < Glucose < 199.0 AND 36.0 < Insulin < 846.0 | 0.04 | 0.11 | 3 | 0.13 |
R49: 4.0 < Pregnancies < 6.0 AND 0.0 < Glucose < 104.0 AND 0.0 < Insulin < 156.0 | 0.02 | 0.14 | 3 | 0.06 |
R50: 0.0 < Pregnancies < 2.0 AND 90.0 < Glucose < 199.0 AND 0.0 < Insulin < 15.0 | 0.01 | 0.09 | 3 | 0.05 |
R51: 28.1 < BMI < 67.1 AND 0.20 < DiabetesPedigreeFunction < 2.42 AND 31.0 < Age < 60.0 | 0.5 | 0.62 | 3 | 0.27 |
R52: 26.70 < BMI < 35.45 AND 0.1275 < DiabetesPedigreeFunction < 2.42 AND 30.0 < Age < 53.0 | 0.09 | 0.53 | 3 | 0.06 |
R53: 29.95 < BMI < 67.1 AND 0.1265 < DiabetesPedigreeFunction < 2.42 AND 25.0 < Age < 81.0 | 0.21 | 0.38 | 3 | 0.18 |
R54: 23.35 < BMI < 67.1 AND 0.1275 < DiabetesPedigreeFunction < 0.6535 AND 28.0 < Age < 61.0 | 0.05 | 0.32 | 3 | 0.05 |
R55: 0.1195 < DiabetesPedigreeFunction < 2.42 AND 22.0 < Age < 60.0 | 0.12 | 0.14 | 2 | 0.29 |
R56: 27.85 < BMI < 67.1 AND 21.0 < Age < 62.0 | 0.02 | 0.17 | 2 | 0.05 |
Rules for Class = 0 | | | | |
R1: 94.0 < Glucose < 157.0 AND 0.0 < BloodPressure < 88.0 AND 60.0 < Insulin < 228.0 AND 0.078 < DiabetesPedigreeFunction < 0.899 AND 21.0 < Age < 49.0 | 0.25 | 0.76 | 5 | 0.22 |
R2: 89.0 < Glucose < 183.0 AND 0.0 < BloodPressure < 90.0 AND 0.0 < SkinThickness < 41.0 AND 0.0 < Insulin < 190.0 AND 0.078 < DiabetesPedigreeFunction < 1.1855 AND 21.0 < Age < 59.0 | 0.42 | 0.66 | 6 | 0.42 |
R3: 80.0 < Glucose < 189.0 AND 52.0 < BloodPressure < 82.0 AND 12.0 < SkinThickness < 39.0 AND 49.0 < Insulin < 394.0 AND 0.259 < DiabetesPedigreeFunction < 2.42 | 0.07 | 0.84 | 5 | 0.06 |
R4: 70.0 < BloodPressure < 106.0 AND 16.0 < SkinThickness < 50.0 AND 0.0 < Insulin < 145.0 AND 0.1535 < DiabetesPedigreeFunction < 0.712 | 0.06 | 0.71 | 4 | 0.06 |
R5: 52.0 < BloodPressure < 122.0 AND 0.0 < Insulin < 485.0 AND 0.239 < DiabetesPedigreeFunction < 2.42 | 0.14 | 0.54 | 3 | 0.16 |
R6: 0.0 < Glucose < 189.0 AND 0.11 < DiabetesPedigreeFunction < 1.143 | 0.05 | 0.43 | 2 | 0.07 |
R7: 93.0 < Glucose < 137.0 AND 54.0 < BloodPressure < 88.0 AND 7.0 < SkinThickness < 40.0 AND 21.0 < Age < 52.0 | 0.32 | 0.71 | 4 | 0.30 |
R8: 90.0 < Glucose < 157.0 AND 23.25 < BMI < 41.65 | 0.61 | 0.68 | 2 | 0.58 |
R9: 19.20 < BMI < 47.34 | 0.36 | 0.62 | 1 | 0.37 |
R10: 0.0 < Pregnancies < 0.0 AND 13.0 < SkinThickness < 45.0 AND 63.0 < Insulin < 291.0 | 0.07 | 0.86 | 3 | 0.05 |
R11: 2.0 < Pregnancies < 7.0 AND 92.0 < Glucose < 133.0 AND 0.0 < SkinThickness < 39.0 AND 73.0 < Insulin < 267.0 | 0.10 | 0.79 | 4 | 0.09 |
R12: 1.0 < Pregnancies < 8.0 AND 105.0 < Glucose < 169.0 AND 0.0 < SkinThickness < 47.0 AND 74.0 < Insulin < 846.0 | 0.14 | 0.70 | 4 | 0.13 |
R13: 1.0 < Pregnancies < 17.0 AND 80.0 < Glucose < 199.0 AND 0.0 < SkinThickness < 41.0 AND 0.0 < Insulin < 220.0 | 0.51 | 0.63 | 4 | 0.53 |
R14: 56.0 < Glucose < 199.0 AND 0.0 < SkinThickness < 51.0 AND 0.0 < Insulin < 474.0 | 0.15 | 0.57 | 3 | 0.17 |
R15: 0.0 < BloodPressure < 88.0 AND 21.45 < BMI < 43.55 | 0.84 | 0.66 | 2 | 0.83 |
R16: 0.0 < Pregnancies < 10.0 AND 17.0 < SkinThickness < 46.0 AND 20.6 < BMI < 46.15 | 0.06 | 0.78 | 3 | 0.05 |
R17: 0.0 < BloodPressure < 94.0 AND 0.0 < SkinThickness < 47.0 AND 0.0 < BMI < 51.15 | 0.08 | 0.57 | 3 | 0.09 |
R18: 40.0 < Insulin < 215.0 AND 25.1 < BMI < 41.65 AND 0.078 < DiabetesPedigreeFunction < 1.18 AND 21.0 < Age < 46.0 | 0.31 | 0.75 | 4 | 0.27 |
R19: 20.6 < BMI < 43.34 AND 0.078 < DiabetesPedigreeFunction < 0.9155 | 0.55 | 0.63 | 2 | 0.56 |
R20: 15.0 < Insulin < 846.0 AND 0.0 < BMI < 46.6 AND 0.247 < DiabetesPedigreeFunction < 2.2125 | 0.06 | 0.67 | 3 | 0.06 |
R21: 0.0 < Insulin < 14.0 | 0.07 | 0.56 | 1 | 0.08 |
R22: 0.0 < BloodPressure < 88.0 AND 21.45 < BMI < 43.55 | 0.84 | 0.66 | 2 | 0.83 |
R23: 0.0 < BloodPressure < 106.0 AND 19.20 < BMI < 46.150 | 0.11 | 0.66 | 2 | 0.11 |
R24: 0.0 < BloodPressure < 108.0 | 0.04 | 0.54 | 1 | 0.05 |
R25: 0.0 < Pregnancies < 1.0 AND 62.0 < BloodPressure < 84.0 AND 60.0 < Insulin < 265.0 | 0.12 | 0.85 | 3 | 0.09 |
R26: 2.0 < Pregnancies < 7.0 AND 70.0 < BloodPressure < 88.0 AND 56.0 < Insulin < 160.0 AND 0.078 < DiabetesPedigreeFunction < 0.69 | 0.07 | 0.81 | 4 | 0.05 |
R27: 0.0 < BloodPressure < 90.0 AND 0.0 < Insulin < 220.0 AND 0.1405 < DiabetesPedigreeFunction < 1.1855 | 0.64 | 0.64 | 3 | 0.65 |
R28: 0.0 < Pregnancies < 3.0 AND 52.0 < BloodPressure < 106.0 AND 14.0 < Insulin < 540.0 AND 0.094 < DiabetesPedigreeFunction < 2.42 | 0.06 | 0.78 | 4 | 0.05 |
R29: 1.0 < Pregnancies < 17.0 AND 64.0 < BloodPressure < 108.0 AND 0.098 < DiabetesPedigreeFunction < 2.42 | 0.08 | 0.57 | 3 | 0.10 |
R30: 0.0 < Pregnancies < 2.0 AND 65.0 < Insulin < 291.0 | 0.24 | 0.78 | 2 | 0.20 |
R31: 0.0 < Pregnancies < 10.0 AND 89.0 < Glucose < 169.0 AND 56.0 < Insulin < 846.0 | 0.18 | 0.66 | 3 | 0.18 |
R32: 1.0 < Pregnancies < 17.0 AND 75.0 < Glucose < 199.0 AND 0.0 < Insulin < 0.0 | 0.38 | 0.63 | 3 | 0.40 |
R33: 0.0 < Pregnancies < 6.0 AND 56.0 < Glucose < 187.0 | 0.16 | 0.61 | 2 | 0.17 |
R34: 0.0 < Glucose < 195.0 | 0.04 | 0.48 | 1 | 0.06 |
R35: 23.25 < BMI < 42.5 AND 0.078 < DiabetesPedigreeFunction < 1.09 AND 21.0 < Age < 58.0 | 0.77 | 0.67 | 3 | 0.74 |
R36: 19.45 < BMI < 49.65 AND 0.2355 < DiabetesPedigreeFunction < 1.31 | 0.16 | 0.67 | 2 | 0.16 |
R37: 0.10 < DiabetesPedigreeFunction < 2.42 AND 22.0 < Age < 81.0 | 0.07 | 0.49 | 2 | 0.09 |
Datasets | Number of Instances | Number of Attributes | Class Labels | Class Distribution |
---|---|---|---|---|
Wisconsin | 569 | 32 | Malignant: 1 | 359 |
 | | | Benign: 0 | 210 |
SEER | 4024 | 12 | Alive: 0 | 3408 |
 | | | Dead: 1 | 616 |
ISPY1-clinica | 168 | 18 | No (not dead): 0 | 32 |
 | | | Yes (dead): 1 | 136 |
Mammographic masses | 961 | 6 | Malignant: 1 | 445 |
 | | | Benign: 0 | 516 |
NKI dataset | 272 | 1570 | Dead: 1 | 195 |
 | | | Alive: 0 | 77 |
Datasets | Accuracy (%): RF | Accuracy (%): XGB | Accuracy (%): LG | Accuracy (%): R-PRIM-Cl | F1-Score (%): RF | F1-Score (%): XGB | F1-Score (%): LG | F1-Score (%): R-PRIM-Cl |
---|---|---|---|---|---|---|---|---|
Wisconsin | 97.6 | 95.4 | 94.1 | 96.8 | 96 | 97.3 | 89.9 | 94.9 |
SEER | 94.4 | 97.5 | 88.7 | 98.4 | 96.7 | 97.3 | 87.4 | 96.3 |
ISPY1-clinica | 98.7 | 96.3 | 89.8 | 95.3 | 98 | 95.3 | 88.2 | 94.7 |
Mammographic masses | 95.8 | 98.6 | 92.4 | 97.2 | 97.6 | 97.4 | 92.8 | 96.1 |
NKI dataset | 98 | 97.5 | 85.6 | 95.6 | 97.8 | 95.6 | 93.7 | 96.9 |
Datasets | Precision (%): RF | Precision (%): XGB | Precision (%): LG | Precision (%): R-PRIM-Cl | Recall (%): RF | Recall (%): XGB | Recall (%): LG | Recall (%): R-PRIM-Cl |
---|---|---|---|---|---|---|---|---|
Wisconsin | 98.2 | 98 | 94.8 | 95.6 | 93.9 | 96.7 | 85.4 | 94.2 |
SEER | 95.6 | 97.6 | 88.1 | 97.1 | 98 | 97.2 | 86.7 | 95.6 |
ISPY1-clinica | 98.9 | 95.2 | 86.5 | 94.6 | 97.2 | 95.4 | 89.9 | 94.8 |
Mammographic masses | 97.7 | 96.3 | 92.4 | 96.3 | 97.5 | 98.7 | 93.2 | 95.8 |
NKI dataset | 97.9 | 94.5 | 93.3 | 96.7 | 97.8 | 96.7 | 94.1 | 97.1 |
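For reference, the four scores reported in these tables are assumed to follow the standard confusion-matrix definitions, with TP, TN, FP, and FN counting true positives, true negatives, false positives, and false negatives for the positive class (the choice of positive class is not restated here and is an assumption). The last line is a worked check using the Random Forest precision and recall reported for the Wisconsin dataset.

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall}    &= \frac{TP}{TP + FN}, &
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \\
F_1^{\text{RF, Wisconsin}} &= \frac{2 \times 98.2 \times 93.9}{98.2 + 93.9} \approx 96.0 .
\end{align*}
```

The computed value agrees with the F1-Score of 96 listed for Random Forest on the Wisconsin dataset.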
Datasets | Class 0 Rules Before the Metarules | Class 1 Rules Before the Metarules | Class 0 Rules After the Metarules | Class 1 Rules After the Metarules |
---|---|---|---|---|
Wisconsin | 36 | 45 | 19 | 34 |
SEER | 28 | 72 | 15 | 44 |
ISPY1-clinica | 5 | 12 | 5 | 11 |
Mammographic masses | 14 | 12 | 9 | 9 |
NKI dataset | 38 | 54 | 12 | 33 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nassih, R.; Berrado, A. Breast Cancer Classification Using an Adapted Bump-Hunting Algorithm. Algorithms 2025, 18, 136. https://doi.org/10.3390/a18030136