Deep Learning Approach for Breast Cancer Detection Using UNet and CNN in Ultrasound Imaging

Ch, Ravikumar; Naresh, Usikela; Malik, Arun; Hattamurrahman, M. Putra Sani

doi:10.3390/engproc2025107077

Open Access Proceeding Paper

¹ Department of CSE, Sreenidhi University, Hyderabad 501301, India

² Department of CSE (AI&ML), CVR College of Engineering, Hyderabad 501510, India

³ Department of CSE, Lovely Professional University, Phagwara 144411, India

⁴ Department of Electrical Engineering, Nusa Putra University, Sukabumi 43152, Indonesia

* Author to whom correspondence should be addressed.

† Presented at the 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society, Aizuwakamatsu City, Japan, 20–26 January 2025.

Eng. Proc. 2025, 107(1), 77; https://doi.org/10.3390/engproc2025107077

Published: 9 September 2025

(This article belongs to the Proceedings of The 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society)

Abstract

Breast cancer continues to be a serious global health concern, especially because effective treatment is time-sensitive. This research contributes a novel method to improve breast cancer detection in ultrasound images by employing a deep learning technique that integrates UNet and Convolutional Neural Network (CNN) architectures. UNet is used for tumor segmentation within breast ultrasound images, alongside a CNN that performs feature extraction and classifies the segmented tumor as benign or malignant. When evaluated on the ‘Dataset_BUSI_with_GT’, the model was found to be reliable across varying conditions, achieving high sensitivity (97.44%) and accuracy (95.24%), outperforming existing approaches. The developed system comprises an imaging module, image upload, preprocessing, inference, result display, and feedback, providing uninterrupted service and enhancing user-centered functionality. Continuous improvement capabilities allow the system to adapt to changes in new images, sustaining reliability in examinations and clinical settings. Compared to other methodologies, the proposed model demonstrates superior accuracy while using fewer computational resources, which translates to reduced human diagnostic error and an optimized workflow in primary healthcare. Future work could include applying multimodal imaging, deploying real-time imaging, and increasing interpretability to strengthen the model's use in medical diagnosis.

Keywords:

breast cancer detection; deep learning; UNet architecture; CNN; image segmentation; ultrasound imaging; sensitivity

1. Introduction

Since obesity contributes substantially to the global burden of chronic illnesses such as metabolic syndrome, type 2 diabetes, and cardiovascular conditions, it has emerged as a significant public health concern. Although Body Mass Index (BMI) and other traditional techniques for assessing obesity are widely used, they often fail to consider body composition and other important criteria, which leads to misclassification [1]. One promising tool is machine learning (ML), which draws on different types of datasets to uncover subtle patterns and the interconnections among causal factors related to obesity. Recent studies have shown how well machine learning algorithms predict obesity by considering variables including metabolic markers, physical activity, and eating patterns. However, most current methods either use a small number of datasets or concentrate on a small number of algorithms, which leaves gaps in reaching the best possible predictive accuracy. This work seeks to close these gaps by evaluating the performance of four machine learning algorithms, decision tree, naïve Bayes, random forest, and K-Nearest Neighbors (K-NN), on a carefully chosen dataset that contains several traits associated with obesity. The aim of this study is to find the optimal algorithm for predicting obesity, with an emphasis on appropriate feature selection and data processing. The findings add to the expanding body of research on machine learning applications in obesity management and lay the groundwork for creating practical, evidence-based strategies to address the obesity pandemic. The dataset, methodology, results, and potential avenues for future research are covered in depth in the sections that follow. The workflow of our research paper is illustrated in Figure 1.

2. Literature Review

Jeon et al. [1] explored age-specific risk factors for predicting obesity using multiple machine learning algorithms. Unlike traditional models that mainly rely on lifestyle data, such as eating habits or physical activity, their research incorporated metabolic parameters to enable more accurate predictions across demographics. The study showed that triglycerides, glycated hemoglobin (HbA1c), alanine aminotransferase (ALT, also known as serum glutamic-pyruvic transaminase, SGPT), and uric acid are strong predictors of obesity, especially in younger populations, where the models achieved accuracy levels above 70%. For older groups, the accuracy decreased, but the findings emphasized the necessity of incorporating biological and metabolic indicators for a more personalized prediction framework compared to previous lifestyle-only approaches.

Colmenarejo et al. [2] conducted a comprehensive review of machine learning models applied to childhood and adolescent obesity. They highlighted the inadequacy of traditional statistical models in addressing the nonlinear and multivariate nature of obesity risk factors. By comparing different approaches, the review concluded that machine learning methods such as deep learning and ensemble techniques offered stronger predictive power, particularly when applied to large and diverse datasets. Extending this work, Jeon et al. [3] examined the limitations of body mass index (BMI), which only considers height and weight, and proposed a machine learning framework enhanced by three-dimensional (3D) body scans, dual-energy X-ray absorptiometry (DXA), and bioelectrical impedance analysis (BIA). With genetic algorithm–based feature selection, the model achieved 80% accuracy, significantly outperforming BMI- and BIA-only methods. This marked a step forward in obesity research by providing more precise classifications that take body composition into account.

Kaur et al. [4] advanced the field by not only predicting obesity risk but also recommending personalized dietary plans to manage it. They used gradient boosting (GB) and extreme gradient boosting (XGBoost), achieving an impressive 98.11% accuracy with a 90:10 train-test split. Beyond prediction, their model applied the Harris–Benedict equation to recommend calorie-based meal plans, bridging the gap between prediction and intervention. Osadchiy et al. [5] also expanded the predictive scope of obesity by combining metabolic and neuroimaging data. Their machine learning models identified brain connectivity patterns and gut-derived metabolites that differentiated obese individuals from overweight ones with accuracy exceeding 90%. These findings suggested that obesity involves distinct brain–gut mechanisms, emphasizing the importance of neuro-metabolic biomarkers. Complementing these works, Cheng et al. [6] discussed practical considerations in childhood obesity prediction, identifying challenges such as class imbalance, insufficient feature selection, and the lack of standardized benchmarks in obesity-related datasets.

Sun et al. [7] used interpretable machine learning models to investigate the role of lifestyle behaviors in predicting overweight and obesity. Employing data from the China Health and Nutrition Survey (CHNS) and the U.S. National Health and Nutrition Examination Survey (NHANES), they used gradient boosting decision trees with SHapley Additive exPlanations (SHAP) analysis. Their results revealed that protein intake, alcohol consumption, and physical inactivity were the most important predictors of obesity risk. Singh et al. [8] addressed the issue of childhood obesity prediction using longitudinal records from the UK Millennium Cohort Study. Their research employed the Synthetic Minority Oversampling Technique (SMOTE) to handle class imbalance and applied advanced machine learning techniques to achieve higher accuracy than traditional methods, particularly in predicting obesity risk at age 14.

Kim et al. [9] expanded obesity-related research by predicting not just obesity but also associated cardiometabolic conditions such as hypertension, dyslipidemia, and type 2 diabetes mellitus. They used deep neural networks (DNNs) on dietary intake data from the Korea National Health and Nutrition Examination Survey, achieving superior prediction accuracy compared to logistic regression and decision tree methods. Similarly, Du et al. [10] developed a visualization-based obesity risk prediction system that tested ten machine learning models. Among them, XGBoost demonstrated the best performance, and SHAP analysis highlighted hip circumference and triglycerides as key predictors, providing both predictive accuracy and interpretability. Cheng et al. [11] further analyzed whether physical activity could predict obesity, combining statistical and machine learning techniques. Their results reinforced physical activity as a key protective factor, demonstrating its predictive power across large datasets.

Pang et al. [12] examined the use of electronic health records (EHRs) to predict early childhood obesity. Their model highlighted the importance of longitudinal data for improving predictive accuracy, providing a framework for clinical applications. Meanwhile, Diwaker et al. [13] and Kok et al. [14], though addressing different application domains such as software reliability and cybersecurity intrusion detection, showcased the flexibility of machine learning approaches to complex prediction tasks. Their methodologies highlight how techniques developed outside of healthcare can be adapted and applied effectively to obesity research. Airehrour et al. [15] extended this concept by presenting a secure trust-based communication framework for mobile networks. While not directly related to obesity, their emphasis on reliability and trust in data-driven systems reinforces the importance of robust computational infrastructures for the deployment of machine learning in health contexts.

3. Methodology

In this work, machine learning classifiers are applied to a carefully chosen dataset to predict obesity. The goal of this project is to develop an efficient and understandable machine learning-based system that predicts obesity using RapidMiner. The methodology includes data processing, feature selection, model training, and evaluation. The following algorithms were used in our experiment:

Decision Tree
A decision tree classifier partitions the dataset according to feature values, building a tree structure that categorizes obesity levels. The tree repeatedly splits the dataset into subgroups on the feature that most reduces impurity, that is, the feature that maximizes information gain.

Information gain: $IG(X, Z) = H(X) - \sum_{v} \frac{|X_{v}|}{|X|} H(X_{v})$, where
- H(X) = entropy of the dataset X;
- X_v = subset of X for which feature Z has value v.

Entropy: $H(X) = -\sum_{i} p_{i} \log_{2} p_{i}$, where
- p_i = proportion of class i in dataset X.

Figure 2 shows how information gain is calculated.

Figure 2. Information gain.
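As a rough illustration of the entropy-based split criterion described for the decision tree, the following Python sketch computes Shannon entropy and the information gain of a single categorical feature. The function names and the toy data are hypothetical and are not taken from the study, which used RapidMiner rather than custom code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature_index):
    """Entropy reduction obtained by splitting on one categorical feature."""
    total = len(labels)
    # Group the labels by the value the chosen feature takes in each row.
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[feature_index], []).append(label)
    # Weighted average entropy of the subsets after the split.
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


# Hypothetical toy data: one feature (family history) and two classes.
toy_rows = [("yes",), ("yes",), ("no",), ("no",)]
toy_labels = ["obese", "obese", "normal", "normal"]
toy_gain = information_gain(toy_rows, toy_labels, 0)
```

In this toy case the feature separates the two classes perfectly, so the gain equals the full entropy of the labels (one bit). A tree learner would evaluate this quantity for every candidate feature and split on the one with the largest gain.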

Random Forest
Random forest is an ensemble learning method that builds several decision trees during training and chooses the most common class for classification tasks or average predictions for regression tasks to obtain the final result. Random forest formula:

$\hat{y} = \operatorname{Mode}(R_{1}(a), R_{2}(a), \ldots, R_{n}(a))$

(1)

where
- $R_{i}(a)$ = prediction of the $i$-th decision tree for input $a$;
- $n$ = number of decision trees.
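The mode-based aggregation used by random forest for classification can be sketched in a few lines of Python. This shows only the voting step; the per-tree predictions below are hypothetical placeholders, and training the individual trees is out of scope here.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most common class among the per-tree predictions."""
    # Counter.most_common(1) yields [(label, count)] for the top label.
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical outputs of five trees for a single input a.
votes = ["obese", "overweight", "obese", "normal", "obese"]
forest_prediction = majority_vote(votes)
```

When two classes tie, `most_common` returns the one encountered first, so a production implementation would typically define an explicit tie-breaking rule.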

K-Nearest Neighbors (K-NN)
The K-NN algorithm classifies a data point according to the majority class among its k nearest neighbors. Distance metric: the similarity between data points is measured using the Euclidean distance:

$d(A, B) = \sqrt{\sum_{i=1}^{n} (A_{i} - B_{i})^{2}}$

(2)

where
- A and B = two data points in n-dimensional space.
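A minimal Python sketch of K-NN classification with the Euclidean distance is given below. The two-feature points (height in metres, weight in kilograms), the labels, and the choice k = 3 are illustrative assumptions, not values from the study.

```python
from collections import Counter
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric points."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority class among the k training points nearest to query."""
    # Sort training indices by distance to the query and keep the k closest.
    order = sorted(
        range(len(train_points)),
        key=lambda i: euclidean(train_points[i], query),
    )
    nearest = [train_labels[i] for i in order[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical training data: (height in m, weight in kg).
points = [(1.60, 50.0), (1.65, 55.0), (1.70, 95.0), (1.75, 100.0)]
labels = ["normal", "normal", "obese", "obese"]
knn_label = knn_predict(points, labels, (1.72, 97.0), k=3)
```

Because the raw features here are on very different scales, weight dominates the distance; in practice, features are usually normalized before applying K-NN.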

Naïve Bayes
Assuming feature independence, the naïve Bayes classification technique is based on Bayes’ theorem. It works well with large data and uses feature likelihoods to estimate the likelihood that a data point belongs to a class.

$P (A| K) = \frac{P (K| A) \cdot P (A)}{P (K)}$

(3)

where
- P(A|K) = posterior probability of class A given feature vector K;
- P(K|A) = likelihood of K given class A;
- P(A) = prior probability of class A;
- P(K) = marginal probability of K.
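To make Equation (3) concrete, the sketch below computes class posteriors under the naïve independence assumption, multiplying per-feature likelihoods by the prior and normalizing by the marginal P(K). All probabilities are made-up illustrative numbers, and the function name is hypothetical.

```python
def posterior(priors, likelihoods):
    """Return P(A | K) for each class A under the naive independence assumption.

    priors: {class: P(A)}
    likelihoods: {class: [P(k_1 | A), P(k_2 | A), ...]}
    """
    joint = {}
    for cls, prior in priors.items():
        p = prior
        # Independence assumption: P(K | A) is the product of per-feature terms.
        for lk in likelihoods[cls]:
            p *= lk
        joint[cls] = p
    evidence = sum(joint.values())  # marginal probability P(K)
    return {cls: p / evidence for cls, p in joint.items()}


# Hypothetical priors and per-feature likelihoods for two classes.
priors = {"obese": 0.5, "normal": 0.5}
likelihoods = {"obese": [0.8, 0.5], "normal": [0.2, 0.5]}
post = posterior(priors, likelihoods)
```

With these numbers the joint terms are 0.20 and 0.05, so the posterior for the "obese" class is 0.8. The predicted class is simply the one with the largest posterior.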

3.1. Framework

The proposed framework integrates several machine learning classifiers for predicting obesity. Its procedure includes gathering and preprocessing obesity-related data; feature engineering, i.e., handling missing values and selecting relevant predictors; training and validating models with machine learning algorithms; and generating predictions using the best-performing model. Figure 3 shows the framework of the proposed methodology, illustrating the sequence of processes from data collection and preprocessing through feature selection, data splitting, model training, validation, prediction, performance evaluation, and final conclusion.

3.2. Attribute with Description

The dataset used in this investigation contains twenty-one attributes, including physical stature, family history of obesity, and lifestyle choices. The target variable categorizes people into classes such as normal, overweight, and obese. Figure 4 shows the features used in this research along with their types and descriptions.

3.3. Replace Missing Values

We handled the missing values in the dataset to clean it and thereby achieve better performance and accuracy. For numeric features, we replaced each missing value with the mean of the respective feature; for categorical features, we replaced it with the most frequent category. This ensures that no data point is lost and that the dataset maintains its integrity.
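The imputation rule described above (mean for numeric features, most frequent category for categorical ones) can be sketched as follows. The study performed this step in RapidMiner; the function name, the use of `None` as the missing-value marker, and the sample columns are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def impute_column(values):
    """Fill None entries with the mean (numeric) or the mode (categorical)."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)  # numeric feature: use the column mean
    else:
        fill = Counter(present).most_common(1)[0][0]  # most frequent category
    return [fill if v is None else v for v in values]


# Hypothetical columns with one missing entry each.
weights = impute_column([60.0, None, 80.0])
transport = impute_column(["walk", "car", None, "car"])
```

This sketch assumes every column has at least one observed value; a column that is entirely missing would need separate handling.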

3.4. Split Data

To guarantee proper model training and evaluation, the dataset was divided into two sections: training and testing. The workflow is shown in Figure 5. Twenty percent of the data were set aside for performance evaluation, and the remaining eighty percent were used for training.
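An 80/20 hold-out split of this kind can be sketched in plain Python as shown below. The shuffling, the fixed seed, and the function name are assumptions for illustration; the text does not state whether the split performed in RapidMiner was shuffled or stratified.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # local RNG, so global state is untouched
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


# Hypothetical dataset of 100 row identifiers.
train_rows, test_rows = train_test_split(list(range(100)))
```

Every row ends up in exactly one of the two partitions, which keeps the test set unseen during training.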

3.5. Machine Learning

The proposed methodology uses four machine learning algorithms, shown in Figure 6, to predict obesity from the selected dataset: decision tree, random forest, K-NN, and naïve Bayes. Each algorithm was implemented in RapidMiner and evaluated on accuracy. The decision tree, which classifies the data with a single tree structure, achieved the best accuracy of 98.33%. Random forest, an ensemble technique that aggregates many decision trees for improved robustness, reached 98.27%. K-NN, which assigns each point the majority class of its neighbors, achieved 98.03%. Naïve Bayes, a probabilistic classifier based on Bayes’ theorem, obtained only 90.08%; the lower score is attributed to its assumption of independence among the features. The results show that tree-based models are effective at predicting obesity, and the decision tree is the most appropriate algorithm for this dataset.
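The text does not define the accuracy measure explicitly; assuming the usual definition (the fraction of test samples whose predicted class matches the true class), it can be computed as in the sketch below. The labels are hypothetical and do not reproduce the reported results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical true and predicted classes for five test samples.
true_labels = ["obese", "normal", "obese", "overweight", "normal"]
pred_labels = ["obese", "normal", "normal", "overweight", "normal"]
toy_accuracy = accuracy(true_labels, pred_labels)
```

Accuracy alone can be misleading when classes are imbalanced, which is why per-class measures are often reported alongside it.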

4. Results

Figure 7 compares the four machine learning algorithms examined in this study: random forest, K-NN, naïve Bayes, and decision tree. Decision tree scored highest at 98.33% accuracy, followed by random forest at 98.27% and K-NN at 98.03%. Naïve Bayes, which assumes feature independence, performed worst at 90.08%. Overall, the tree-based models performed well, making decision tree the preferred option for this task.

5. Conclusions

Decision tree achieved the highest accuracy of 98.33% in predicting obesity, followed by random forest (98.27%), K-NN (98.03%), and naïve Bayes (90.08%). Compared with more conventional techniques such as BMI, the results indicate that tree-based models are well suited to the classification of obesity. Future research could focus on including more data sources, such as genetic and metabolic markers, and on exploring cutting-edge methods like deep learning to further enhance model performance. Practical applications in healthcare will be further supported by creating customized obesity management systems and by tackling issues like data imbalance and real-time deployment.

Author Contributions

Conceptualization, R.C. and U.N.; Methodology, A.M. and M.P.S.H.; Software, U.N.; Validation, R.C. and A.M.; Formal Analysis, M.P.S.H.; Investigation, U.N.; Resources, R.C.; Data Curation, A.M.; Writing, Original Draft Preparation, R.C. and U.N.; Writing, Review & Editing, A.M. and M.P.S.H.; Visualization, U.N.; Supervision, R.C.; Project Administration, R.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Jeon, J.; Lee, S.; Oh, C. Age-specific risk factors for the prediction of obesity using a machine learning approach. Front. Public Health 2023, 10, 998782.
2. Colmenarejo, G. Machine learning models to predict childhood and adolescent obesity: A review. Nutrients 2020, 12, 1–31.
3. Jeon, S.; Kim, M.; Yoon, J.; Lee, S.; Youm, S. Machine learning-based obesity classification considering 3D body scanner measurements. Sci. Rep. 2023, 13, 1–10.
4. Kaur, R.; Kumar, R.; Gupta, M. Predicting risk of obesity and meal planning to reduce the obese in adulthood using artificial intelligence. Endocrine 2022, 78, 458–469.
5. Osadchiy, V.; Bal, R.; Mayer, E.A.; Kunapuli, R.; Dong, T.; Vora, P.; Petrasek, D.; Liu, C.; Stains, J.; Gupta, A. Machine learning model to predict obesity using gut metabolite and brain microstructure data. Sci. Rep. 2023, 13, 5488.
6. Cheng, E.; Steinhardt, R.; Miled, Z. Predicting childhood obesity using machine learning: Practical considerations. BioMedInformatics 2022, 2, 184–203.
7. Sun, Z.; Yuan, Y.; Farrahi, V.; Herold, F.; Xia, Z.; Xiong, X.; Qiao, Z.; Shi, Y.; Yang, Y.; Qi, K.; et al. Using interpretable machine learning methods to identify the relative importance of lifestyle factors for overweight and obesity in adults: Pooled evidence from CHNS and NHANES. BMC Public Health 2024, 24, 3034.
8. Singh, B.; Tawfik, H. Machine learning approach for the early prediction of the risk of overweight and obesity in young people. In International Conference on Computational Science (ICCS); Springer International Publishing: Cham, Switzerland, 3 June 2020; pp. 523–535.
9. Kim, H.; Lim, D.; Kim, Y. Classification and prediction on the effects of nutritional intake on overweight/obesity, dyslipidemia, hypertension and type 2 diabetes mellitus using deep learning model: 4–7th Korea National Health and Nutrition Examination Survey. Int. J. Environ. Res. Public Health 2021, 18, 5597.
10. Du, J.; Yang, S.; Zeng, Y.; Ye, C.; Chang, X.; Wu, S. Visualization obesity risk prediction system based on machine learning. Sci. Rep. 2024, 14, 22424.
11. Cheng, X.; Lin, S.Y.; Liu, J.; Liu, S.; Zhang, J.; Nie, P.; Fuemmeler, B.F.; Wang, Y.; Xue, H. Does physical activity predict obesity—A machine learning and statistical method-based analysis. Int. J. Environ. Res. Public Health 2021, 18, 3966.
12. Pang, X.; Forrest, C.; Lê-Scherban, F.; Masino, A. Prediction of early childhood obesity with machine learning and electronic health record data. Int. J. Med. Inform. 2021, 150, 104454.
13. Diwaker, C.; Tomar, P.; Solanki, A.; Nayyar, A.; Jhanjhi, N.Z.; Abdullah, A.; Supramaniam, M. A new model for predicting component-based software reliability using soft computing. IEEE Access 2019, 7, 147191–147203.
14. Kok, S.H.; Abdullah, A.; Jhanjhi, N.Z.; Supramaniam, M. A review of intrusion detection system using machine learning approach. Int. J. Eng. Res. Technol. 2019, 12, 8–15.
15. Airehrour, D.; Gutierrez, J.; Ray, S.K. GradeTrust: A secure trust based routing protocol for MANETs. In Proceedings of the 25th International Telecommunication Networks and Applications Conference (ITNAC), Sydney, Australia, 18–20 November 2015; pp. 65–70.

Figure 1. Research Paper Workflow.

Figure 3. Framework.

Figure 4. Attribute with description.

Figure 5. Split data.

Figure 6. Machine learning algorithm.

Figure 7. Algorithm accuracy.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
