Unsupervised Hierarchical Classification Approach for Imprecise Data in the Breast Cancer Detection
Abstract
1. Introduction
2. Materials and Methods
2.1. Methodology of the Proposed Approach
- $S(A, B) = 1$, when $A$ and $B$ are identical;
- $S(A, B) = 0$, when $A$ and $B$ are disjoint;
- Otherwise, $0 < S(A, B) < 1$.
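The three properties listed above describe a bounded, overlap-type similarity between two intervals. A minimal sketch follows, assuming closed intervals given as `(lower, upper)` pairs and using the Jaccard ratio (length of the intersection over length of the union) as a stand-in. The function names and the choice of ratio are assumptions made for illustration and are not necessarily the measure used by the authors.

```python
def interval_similarity(a, b):
    """Jaccard-type overlap of two closed intervals a=(lo, hi) and b=(lo, hi).

    Assumption: this ratio is only one possible measure with the three
    properties in the text; it is not claimed to be the authors' measure.
    """
    a_lo, a_hi = a
    b_lo, b_hi = b
    if a_lo > a_hi or b_lo > b_hi:
        raise ValueError("each interval must satisfy lower <= upper")
    if (a_lo, a_hi) == (b_lo, b_hi):
        return 1.0  # identical intervals
    intersection = min(a_hi, b_hi) - max(a_lo, b_lo)
    if intersection <= 0:
        return 0.0  # disjoint; a zero-length overlap is treated as no overlap
    # The intervals overlap, so the union is the span from the smallest
    # lower bound to the largest upper bound.
    union = max(a_hi, b_hi) - min(a_lo, b_lo)
    return intersection / union


def mean_dissimilarity(x, y):
    """Average of (1 - similarity) over the features of two interval-valued records."""
    sims = [interval_similarity(a, b) for a, b in zip(x, y)]
    return 1.0 - sum(sims) / len(sims)
```

A dissimilarity of this kind, computed for every pair of records, is the usual input to an agglomerative hierarchical clustering routine.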
2.2. Description of the Breast Cancer Example Data
- Diagnosis (M = malignant, B = benign).
1. Radius (mean of distances from center to points on the perimeter);
2. Texture (standard deviation of gray-scale values);
3. Perimeter;
4. Area;
5. Smoothness (local variation in radius lengths);
6. Compactness (perimeter² / area − 1.0);
7. Concavity (severity of concave portions of the contour);
8. Concave points (number of concave portions of the contour);
9. Symmetry;
10. Fractal dimension (“coastline approximation” − 1).
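The feature list above matches the Wisconsin Diagnostic Breast Cancer data, in which each feature is recorded per image as a mean, a standard error, and a "worst" value. A minimal sketch of one way to turn such a record into interval-valued data follows, assuming the interval `[mean - k*SE, mean + k*SE]`. The helper name `to_interval`, the width factor `k`, and the sample numbers are hypothetical; this section does not state that the authors built their intervals this way.

```python
def to_interval(mean, se, k=1.0):
    """Return (lower, upper) = (mean - k*se, mean + k*se).

    Assumption: a symmetric interval around the mean, with half-width
    proportional to the standard error, represents the imprecision.
    """
    if se < 0 or k < 0:
        raise ValueError("se and k must be non-negative")
    half_width = k * se
    return (mean - half_width, mean + half_width)


# Hypothetical (mean, standard error) pairs for two features of one image.
record = {"radius": (14.0, 0.5), "texture": (19.0, 1.25)}

# One interval per feature; a crisp value is the special case se == 0.
intervals = {name: to_interval(m, s) for name, (m, s) in record.items()}
```

With `k = 0` every interval collapses to a single point, which corresponds to treating the data as precise.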
2.3. Statistical Analysis
3. Results
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| MDPI | Multidisciplinary Digital Publishing Institute |
| DOAJ | Directory of Open Access Journals |
| TLA | Three letter acronym |
| LD | Linear dichroism |
Appendix A. Validation Study
| Measure | Interval-valued: estimate | Interval-valued: lower 95% | Interval-valued: upper 95% | Conventional: estimate | Conventional: lower 95% | Conventional: upper 95% |
|---|---|---|---|---|---|---|
| Sensitivity | 1.000 | 0.962 | 1.000 | 0.036 | 0.004 | 0.125 |
| Specificity | 0.709 | 0.571 | 0.824 | 1.000 | 0.962 | 1.000 |
| Positive predictive value | 0.856 | 0.776 | 0.915 | 1.000 | 0.158 | 1.000 |
| Negative predictive value | 1.000 | 0.910 | 1.000 | 0.642 | 0.559 | 0.719 |
| LR+ | 3.438 | 2.275 | 5.193 | 8.571 | 0.419 | 175.363 |
| LR− | 0.007 | 0.000 | 0.118 | 0.964 | 0.915 | 1.014 |
| Accuracy | 0.893 | 0.833 | 0.938 | 0.647 | 0.565 | 0.723 |

| Measure | Interval-valued: estimate | Interval-valued: lower 95% | Interval-valued: upper 95% | Conventional: estimate | Conventional: lower 95% | Conventional: upper 95% |
|---|---|---|---|---|---|---|
| Sensitivity | 0.935 | 0.890 | 0.966 | 0.009 | 0.000 | 0.048 |
| Specificity | 0.623 | 0.527 | 0.712 | 1.000 | 0.980 | 1.000 |
| Positive predictive value | 0.802 | 0.743 | 0.853 | 1.000 | 0.025 | 1.000 |
| Negative predictive value | 0.855 | 0.761 | 0.923 | 0.622 | 0.564 | 0.677 |
| LR+ | 2.480 | 1.953 | 3.149 | 4.878 | 0.200 | 118.743 |
| LR− | 0.104 | 0.059 | 0.182 | 0.991 | 0.974 | 1.008 |
| Accuracy | 0.817 | 0.768 | 0.859 | 0.623 | 0.566 | 0.678 |
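The measures in the tables above are all functions of a 2×2 confusion matrix. A minimal sketch of the standard definitions follows. The function name and the counts in the usage line are illustrative assumptions and are not taken from the source; any continuity correction applied when a cell is zero is not reproduced here.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic accuracy measures from confusion-matrix counts.

    tp/fn: diseased cases classified as positive/negative.
    fp/tn: non-diseased cases classified as positive/negative.
    """
    if min(tp, fn, fp, tn) < 0:
        raise ValueError("counts must be non-negative")
    if 0 in (tp + fn, fp + tn, tp + fp, tn + fn):
        raise ValueError("every row and column total must be positive")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Likelihood ratios are unbounded when the denominator is zero.
        "lr_pos": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "lr_neg": (1 - sensitivity) / specificity if specificity > 0 else float("inf"),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Illustrative counts only (an assumption, not data from the study).
metrics = diagnostic_metrics(tp=90, fn=10, fp=20, tn=80)
```

A perfect specificity makes LR+ unbounded, which is one reason the conventional-approach columns show very wide limits for that ratio.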

| Measure | Interval-valued: estimate | Interval-valued: lower 95% | Interval-valued: upper 95% | Conventional: estimate | Conventional: lower 95% | Conventional: upper 95% |
|---|---|---|---|---|---|---|
| Sensitivity | 0.613 | 0.544 | 0.679 | 0.080 | 0.047 | 0.125 |
| Specificity | 0.905 | 0.869 | 0.933 | 1.000 | 0.990 | 1.000 |
| Positive predictive value | 0.793 | 0.723 | 0.852 | 1.000 | 0.805 | 1.000 |
| Negative predictive value | 0.798 | 0.755 | 0.836 | 0.647 | 0.605 | 0.687 |
| LR+ | 6.439 | 4.596 | 9.020 | 58.826 | 3.556 | 973.204 |
| LR− | 0.427 | 0.360 | 0.508 | 0.920 | 0.883 | 0.957 |
| Accuracy | 0.796 | 0.761 | 0.829 | 0.657 | 0.617 | 0.696 |
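Each proportion in the tables is reported with lower and upper 95% limits. A minimal sketch of how such limits can be computed follows, assuming exact binomial (Clopper-Pearson) intervals; this section does not state which interval method was used. The function names and the counts in the usage line are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a monotone function f on [lo, hi], where f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return (lo + hi) / 2


def exact_binomial_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes out of n trials."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


# Hypothetical counts: 45 correct classifications out of 50 cases.
low, high = exact_binomial_ci(45, 50)
```

When every case is classified correctly (`x == n`), the upper limit is 1 and the lower limit reduces to `(alpha / 2) ** (1 / n)`.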
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fordellone, M.; Chiodini, P. Unsupervised Hierarchical Classification Approach for Imprecise Data in the Breast Cancer Detection. Entropy 2022, 24, 926. https://doi.org/10.3390/e24070926