Proceeding Paper

Breast Cancer Predictions Using Machine Learning Algorithms †

1 Department of Software Engineering, University of Sialkot, Sialkot 51040, Pakistan
2 Department of Computer Science, Gulf University for Sciences and Technology, Safat 13060, Kuwait
3 Department of Mathematics, Nusa Putra University, Sukabumi 43152, Indonesia
* Author to whom correspondence should be addressed.
† Presented at the 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society, Aizuwakamatsu City, Japan, 20–26 January 2025.
Eng. Proc. 2025, 107(1), 129; https://doi.org/10.3390/engproc2025107129
Published: 10 October 2025

Abstract

Breast cancer is a rapidly growing health problem among women. It has several causes, and it can often be anticipated when its symptoms are known. A family history of the disease is one risk factor, and hormonal imbalance is another. Many women are affected; in 2024, a Bollywood actress was also diagnosed with breast cancer and is still suffering from the disease. Breast cancer can be treated with different therapies and with surgery. In its last stage, it spreads to other parts of the body, producing symptoms such as weight loss and fatigue.

1. Introduction

Breast cancer is a rapidly growing health problem among women. It occurs when breast cells grow rapidly and without control, forming a tumor that harms the surrounding tissue and can metastasize to other parts of the body. China, India, and the US have had the highest ratios of breast cancer in recent years. According to research, developed countries have a higher expected ratio of breast cancer than developing countries, which may reflect the limited access to healthcare and other basic health needs in developing countries. There are different types of breast cancer: ductal carcinoma in situ (DCIS), invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), triple-negative breast cancer, human epidermal growth factor receptor 2 (HER2)-positive breast cancer, and inflammatory breast cancer. DCIS occurs when abnormal cells are present in the inner layer of the breast duct. IDC, the most common type of breast cancer, occurs when the cancer spreads into the adjacent tissues of the breast [1]. ILC occurs when cancer that begins in the milk-producing tissue spreads to other areas of the breast. Triple-negative breast cancer, found in the most critical cases, lacks estrogen, progesterone, and HER2 receptors. HER2-positive breast cancer occurs when there is a higher quantity of HER2 protein in the breast, and it can be treated with different therapies. Inflammatory breast cancer is a very rare type in which the breast becomes swollen and its color changes to red [2]. Figure 1 summarizes the different types of breast cancer. Breast cancer has different causes and risk factors. One is a family history of the disease, for example, a sister, daughter, or mother who has had breast cancer. 
Hormonal imbalance can also lead to breast cancer; higher body fat, an irregular diet, and irregular periods are major contributing factors. Hormonal balance and related problems can be improved through a habit of regular exercise, at least 15 to 30 min each morning. Regular exercise helps reduce weight and helps protect against breast cancer. The symptoms of breast cancer depend on the type and stage of the disease. The most common sign is a lump or thickening near the breast or underarm. The lump may be painless, but any change near the breast or underarm should be checked by a doctor. An unusual change in one side of the breast is also a possible sign of breast cancer. A change of skin color, such as redness, can indicate a serious form of breast cancer. Even without a lump near or under the breast, a breast that feels swollen or warm can also be a sign of breast cancer. In advanced stages, cancer can spread to other parts of the body; symptoms of the last stage include bone pain, weight loss, and fatigue [3]. There are different treatment methods for breast cancer, which we discuss in this paper; the most commonly used are therapies. However, if breast cancer is detected early on the basis of its symptoms, it can be controlled and removed at an early stage, reducing the need for further treatment. Figure 1 summarizes the types of breast cancer, while Figure 2 displays the early signs and symptoms.

2. Literature Review

A 2020 study addresses the challenges of breast cancer prediction using machine learning, motivated by the limitations of traditional diagnostic methods. To close this research gap, it aims for high accuracy with fewer errors using various machine learning models. The study compares the predictive performance of Support Vector Machine (SVM), Logistic Regression, Random Forest, and K-Nearest Neighbors (KNN) on datasets processed in the Jupyter environment. SVM achieved the highest accuracy (97.13%) and the lowest error. Compared with prior studies, this work demonstrates competitive performance, emphasizing the potential of machine learning for improving breast cancer diagnosis [4].
Another paper covers the high ratio of breast cancer in women in 2020 and addresses the need for more accurate models for detecting breast cancer. It shows that machine learning algorithms are well suited to classification. It applies KNN, SVM, Logistic Regression, and Naïve Bayes to a dataset downloaded from an open-source database and divided in the ratio 80:20, where 80% is the training data and 20% is the testing data. The results show that KNN is the best classifier among these models, with an accuracy of 99%. The paper concludes that KNN is more efficient for the prediction and detection of breast cancer than SVM and Logistic Regression [5].
The paper written by [6] describes the problems faced in the classification of breast cancer, a challenge made pressing by its high death ratio, and applies machine learning (ML) models to address it. The paper identifies the most efficient algorithm, with the highest accuracy and minimal error rate. It applies different ML models to a dataset from Kaggle in an Anaconda environment. The results show that Random Forest is the best model for classifying breast cancer on the basis of its death ratio, with an accuracy of 99.76%. Compared with other studies, this paper confirms the efficiency of Random Forest over SVM and Logistic Regression [7].
A further paper addresses the challenges of breast cancer prediction by combining multiple risk factors for improved accuracy. It combines demographic, laboratory, and mammographic data for detailed predictions. It applies Random Forest (RF), Gradient Boosting Trees (GBT), Multi-Layer Perceptron, and genetic algorithms to a dataset from the Motamed Cancer Institute. The results show that RF has the highest sensitivity (95%) and that GBT presented improved global capability for accurate predictions (AUC = 0.59). Compared with other studies, this paper highlights the advantage of multi-factorial modeling over single-method techniques [8].
Advanced treatments such as chemotherapy, gene therapy, and similar techniques have improved the survival ratio of women who suffer from breast cancer. The paper discusses various technologies for treating this disease. Compared with traditional chemotherapy and home therapy, monoclonal antibodies are best for reducing side effects and enhancing the accuracy rate of treatment [9].
Benson et al. wrote a paper in 2009 which shows that breast cancer is the major cause of death in women. The paper stresses the importance of treatment with fewer side effects. Screening and other treatments have increased the survival rate, but issues remain in surgery and in gene testing for personalized care. The paper covers different treatments of breast cancer, such as surgery, targeted drugs, and combinations of therapies. The results show that targeted drugs are the best treatment approach compared with chemotherapy. This research helps in choosing the best treatment technology and avoiding unnecessary therapies [10].
Ref. [11] shows that breast cancer is the most commonly spreading disease among women in Western countries, with an increased ratio of 190 cases per 1,000,000 women per year. The study highlights advanced technologies for treatment, i.e., gene profiling and imaging for HER2-positive cases.
This study shows the need for awareness among women of the early prediction of breast cancer and its treatment, in order to decrease the death ratio of women due to breast cancer [12]. Ref. [13] shows how limited the knowledge of breast cancer in men is, since most research addresses breast cancer only in women. The author shows the difference in the ratio of breast cancer between men and women. The research shows that hormone receptor positivity is higher in men than in women. The results show a reduced rate of breast cancer cases and improved survival in this case, which avoids unnecessary treatments and chemotherapy. The higher rate of hormone receptor positivity in men may protect them from breast cancer [14].

3. Methodology

We applied different machine learning models for the early prediction of breast cancer among women in upcoming years, on the basis of past research data and results; Figure 3 shows the overall methodology flow. We downloaded a dataset from an open source and then applied the algorithms to it using an efficient tool named RapidMiner [15,16]. The applied models are KNN, Logistic Regression, Decision Tree, SVM, and ANN. We applied these classification algorithms to our dataset to find the model with the highest accuracy.
  • Logistic Regression: A generalized linear model that is simple and interpretable. It is suitable for binary classification and gives a quick baseline for a binary target such as the diagnosis column.
  • Naïve Bayes: Fast and efficient for high-dimensional datasets. It also works well with categorical data when the features are independent.
  • K-Nearest Neighbors (KNN): Simple and non-parametric; it makes no assumption about the data distribution. It is used for small datasets.
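To make the independence assumption behind Naïve Bayes concrete, the following is a minimal sketch of a Gaussian Naïve Bayes classifier in plain Python. It is not the RapidMiner operator used in this work; the function names, the two numeric features, and the toy records are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for cls in sorted(set(labels)):
        members = [row for row, y in zip(rows, labels) if y == cls]
        columns = list(zip(*members))
        model[cls] = {
            "prior": len(members) / len(rows),
            "means": [mean(col) for col in columns],
            # A small floor keeps every variance strictly positive.
            "variances": [pvariance(col) + 1e-6 for col in columns],
        }
    return model


def predict(model, row):
    """Pick the class with the largest log-posterior, treating features as independent."""
    def log_posterior(cls):
        params = model[cls]
        score = math.log(params["prior"])
        for x, m, v in zip(row, params["means"], params["variances"]):
            # Log of the Gaussian density for one feature, added per feature.
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)


# Hypothetical toy records: (tumor size in mm, number of involved nodes).
rows = [(10, 0), (12, 1), (15, 0), (35, 4), (40, 6), (45, 5)]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

model = fit_gaussian_nb(rows, labels)
print(predict(model, (14, 1)))
print(predict(model, (38, 5)))
```

Because each feature contributes its own additive term to the log-posterior, training reduces to computing per-class means and variances, which is why the method stays fast on high-dimensional data.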

3.1. Description of Dataset

Table 1 shows the dataset description with parameters.

3.2. Proposed Machine Learning Model

Figure 3 shows the proposed model and the process used to prepare it for the years 2012–2015. The accuracy obtained for each applied model is detailed in Table 2.

4. Results

Here, we will discuss breast cancer solutions and accuracy from previous models and will recommend the best models. The classification results for the experiments are presented in Figure 4, Figure 5, Figure 6 and Figure 7. Figure 4 shows the performance of the Naïve Bayes classifier, while Figure 5, Figure 6 and Figure 7 illustrate the classification workflows implemented in RapidMiner for breast cancer prediction. Each workflow begins with the retrieval of the breast cancer dataset, followed by preprocessing using the Replace Missing Values operator to handle incomplete data. The dataset is then split into training and testing subsets through the Split Data operator. For model construction, different classifiers are applied: Figure 5 shows the Naïve Bayes classifier, Figure 6 presents the Logistic Regression classifier, and Figure 7 illustrates the KNN classifier. In each case, the trained model is applied to the test subset using the Apply Model operator, and the results are evaluated through the Performance operator, which computes metrics such as accuracy, precision, recall, and F-measure. These workflows ensure a systematic approach to preprocessing, model training, prediction, and performance evaluation across different algorithms. Table 2 shows the classification results.
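The workflow described above (retrieve the data, replace missing values, split, train, apply the model, evaluate) can be sketched outside RapidMiner as follows. This is only an illustration under stated assumptions: mean imputation, a 70:30 split with a fixed seed, a KNN model with k = 3, and a synthetic toy dataset are all hypothetical choices, since the operator settings are not given in the text.

```python
import math
import random
from collections import Counter
from statistics import mean


def replace_missing(rows):
    """Fill None entries with the column mean (stand-in for Replace Missing Values)."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def split_data(rows, labels, train_ratio=0.7, seed=0):
    """Shuffle the indices and split them (stand-in for Split Data)."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_ratio)
    train_idx, test_idx = order[:cut], order[cut:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


def apply_knn(train_X, train_y, row, k=3):
    """Majority vote among the k nearest training rows (stand-in for Apply Model)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], row))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def performance(predicted, actual, positive="malignant"):
    """Accuracy, precision, and recall (stand-in for Performance)."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    return {
        "accuracy": sum(p == a for p, a in pairs) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical, well-separated toy data: [tumor size in mm, involved nodes].
rows = [[10 + i % 5, i % 2] for i in range(20)] + [[40 + i % 5, 4 + i % 3] for i in range(20)]
labels = ["benign"] * 20 + ["malignant"] * 20
rows[3][1] = None   # simulate incomplete records
rows[27][1] = None

clean = replace_missing(rows)
train_X, train_y, test_X, test_y = split_data(clean, labels)
predictions = [apply_knn(train_X, train_y, row) for row in test_X]
print(performance(predictions, test_y))
```

Swapping `apply_knn` for another classifier leaves the other steps unchanged, which mirrors how the three workflows differ only in the model operator.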
Figure 8 shows the comparative performance of Naïve Bayes, KNN, and Logistic Regression classifiers, highlighting the superior effectiveness of Naïve Bayes and KNN over Logistic Regression for this dataset.
All diagnostic models were trained and incorporated into an end-to-end diagnostic pipeline consisting of the following steps:
  • Before segmentation, the images are cleaned: they are resized and unneeded noise is removed. The preprocessed images are then sent to the U-Net segmentor.
  • The output from the segmentor is classified by the Convolutional Neural Network (CNN).
Healthcare professionals can easily navigate the user interface, developed in React, and access the results through the web-based system.
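The preprocessing step of this pipeline (resize, then remove noise) can be illustrated with a small plain-Python sketch. The text does not state which resizing or denoising method is used, so nearest-neighbour sampling and a 3×3 mean filter are assumptions made here for illustration; the function names and the target size are hypothetical, and the U-Net and CNN stages are not shown.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2D grid of pixel values with nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def mean_filter(image):
    """Replace each pixel with the mean of its 3x3 neighbourhood (clipped at the edges)."""
    h, w = len(image), len(image[0])
    smoothed = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            row.append(sum(window) / len(window))
        smoothed.append(row)
    return smoothed


def preprocess(image, size=(4, 4)):
    """Resize first, then denoise; the result would be passed on to a segmentation model."""
    return mean_filter(resize_nearest(image, *size))


# Hypothetical 8x8 grayscale image with a single noisy pixel.
image = [[100] * 8 for _ in range(8)]
image[2][2] = 255
for row in preprocess(image):
    print([round(value, 1) for value in row])
```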

5. Implementation

The proposed system demonstrated significant improvements over existing methods in terms of segmentation and classification accuracy. Below, we discuss the performance metrics and key findings.

5.1. Description of Dataset

We applied different machine learning models to get the model with the highest accuracy. The formula for calculating the accuracy of models is as follows:
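$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.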
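As a quick numerical check of the accuracy formula, the sketch below computes it from confusion-matrix counts, where `tp`, `tn`, `fp`, and `fn` are the numbers of true positives, true negatives, false positives, and false negatives. The counts in the example are hypothetical and are not taken from the experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical confusion-matrix counts, chosen only for illustration.
print(accuracy(tp=50, tn=40, fp=5, fn=5))  # 0.9
```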

5.2. Accuracy with Machine Learning Models

As discussed earlier, different algorithms were applied to the dataset to achieve the highest accuracy. The accuracy differs for every model, and we consider the model with the highest accuracy. Table 2 shows the accuracy achieved by each algorithm in RapidMiner. Every algorithm has its own purpose and method.
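The F-measure mentioned among the evaluation metrics is not reported alongside accuracy, precision, and recall. Assuming the usual definition as the harmonic mean of precision and recall, it can be derived from the precision and recall values in Table 2, as in this sketch. The printed values are plain arithmetic on the reported figures, not additional output from the RapidMiner experiments.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in Table 2.
table_2 = {
    "KNN": (91.67, 85.94),
    "Naïve Bayes": (93.65, 92.19),
    "Logistic Regression": (37.43, 38.9),
}

for name, (precision, recall) in table_2.items():
    print(f"{name}: F-measure = {f_measure(precision, recall):.2f}%")
```

The ranking by F-measure matches the ranking by accuracy, with Naïve Bayes first and Logistic Regression last.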

6. Conclusions

The results show that Naïve Bayes is the best of the applied models for predicting breast cancer, with an accuracy of 94.7%. The accuracy of KNN is 91.81% and the accuracy of Logistic Regression is 37.43%. Therefore, early prediction of breast cancer in upcoming years can be obtained by applying Naïve Bayes to the given dataset. Most women are unaware of breast cancer and its consequences for their health, and women living in villages in particular are unaware of it. We need to run awareness campaigns on the causes, symptoms, and dangers of breast cancer. Campaigns can be run in person in the villages, as most villages face network issues, and also on social media platforms for the younger generation.

Author Contributions

Conceptualization, A.M.; methodology, A.M.; software, A.N.; validation, T.M.A.; formal analysis, A.M.; investigation, T.M.A.; resources, A.N.; data curation, T.M.A.; writing—original draft preparation, A.M.; writing—review and editing, T.M.A.; visualization, A.N.; supervision, L.S.P.; project administration, L.S.P.; funding acquisition, L.S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, Y. Performance evaluation of machine learning methods for breast cancer prediction. Appl. Comput. Math. 2018, 7, 212.
  2. Floyd, C.E.; Lo, J.Y.; Yun, A.J.; Sullivan, D.C.; Kornguth, P.J. Prediction of breast cancer malignancy using an artificial neural network. Cancer 1994, 74, 2944–2948.
  3. Wang, H. Breast Cancer Prediction Using Data Mining Method. 2015. Available online: https://www.researchgate.net/publication/319688741 (accessed on 1 January 2025).
  4. Daemen, A.; Griffith, O.L.; Heiser, L.M.; Wang, N.J.; Enache, O.M.; Sanborn, Z.; Pepin, F.; Durinck, S.; Korkola, J.E.; Griffith, M.; et al. Modeling precision treatment of breast cancer. Genome Biol. 2013, 14, R110.
  5. Fatima, N.; Liu, L.; Hong, S.; Ahmed, H. Prediction of breast cancer, comparative review of machine learning techniques, and their analysis. IEEE Access 2020, 8, 150360–150376.
  6. Sivapriya, J.; Kumar, A.; Sai, S.S.; Sriram, S. Breast cancer prediction using machine learning. Int. J. Recent Technol. Eng. (IJRTE) 2019, 8, 4879–4881.
  7. Mavaddat, N.; Michailidou, K.; Dennis, J.; Lush, M.; Fachal, L.; Lee, A.; Tyrer, J.P.; Chen, T.H.; Wang, Q.; Bolla, M.K.; et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 2019, 104, 21–34.
  8. Botlagunta, M.; Botlagunta, M.D.; Myneni, M.B.; Lakshmi, D.; Nayyar, A.; Gullapalli, J.S.; Shah, M.A. Classification and diagnostic prediction of breast cancer metastasis on clinical data using machine learning algorithms. Sci. Rep. 2023, 13, 485.
  9. Khan, F.; Khan, M.A.; Abbas, S.; Athar, A.; Siddiqui, S.Y.; Khan, A.H.; Saeed, M.A.; Hussain, M. Cloud-Based Breast Cancer Prediction Empowered with Soft Computing Approaches. J. Healthc. Eng. 2020, 2020, 8017496.
  10. Sakri, S.B.; Abdul Rashid, N.B.; Muhammad Zain, Z. Particle swarm optimization feature selection for breast cancer recurrence prediction. IEEE Access 2018, 6, 29637–29647.
  11. Baselga, J.; Norton, L. Focus on breast cancer. Cancer Cell 2002, 1, 319–322.
  12. Rawal, R. Breast Cancer Prediction Using Machine Learning. 2020. Available online: http://www.jetir.org (accessed on 10 February 2025).
  13. Benson, J.R.; Jatoi, I.; Keisch, M.; Esteva, F.J.; Makris, A.; Jordan, V.C. Early breast cancer. Lancet 2009, 373, 1463–1479.
  14. Diwaker, C.; Tomar, P.; Solanki, A.; Nayyar, A.; Jhanjhi, N.Z.; Abdullah, A.; Supramaniam, M. A new model for predicting component-based software reliability using soft computing. IEEE Access 2019, 7, 147191–147203.
  15. Kok, S.H.; Abdullah, A.; Jhanjhi, N.Z.; Supramaniam, M. A review of intrusion detection system using machine learning approach. Int. J. Eng. Res. Technol. 2019, 12, 8–15.
  16. Ahmed, S.; Hossain, M.A.; Bhuiyan, M.M.I.; Ray, S.K. A comparative study of machine learning algorithms to predict road accident severity. In Proceedings of the 2021 20th International Conference on Ubiquitous Computing and Communications (IUCC/CIT/DSCI/SmartCNS), London, UK, 20–22 December 2021; pp. 390–397.
Figure 1. Summary of types of breast cancer.
Figure 2. Warning signs.
Figure 3. Proposed machine learning model.
Figure 4. Naïve Bayes results.
Figure 5. Logistic regression.
Figure 6. Naïve Bayes.
Figure 7. KNN.
Figure 8. Results of Classification.
Table 1. Description of attributes in the dataset.
| Sr. | Attributes | Description |
|---|---|---|
| 1 | Year | Years from 2019 to 2024 |
| 2 | Age | Age gaps of women |
| 3 | Menopause | Feature for predicting breast size |
| 4 | Tumor Size | Used for predicting breast cancer |
| 5 | Inv-Nodes | Number of affected people |
| 6 | Breast | Population data (2022) |
| 7 | History | Past cases of breast cancer |
| 8 | Diagnosis Result | Indicates whether cancer is benign or malignant |
Table 2. Performance of different classifiers.
| Algorithms | Accuracy | Precision | Recall |
|---|---|---|---|
| KNN | 91.81% | 91.67% | 85.94% |
| Naïve Bayes | 94.7% | 93.65% | 92.19% |
| Logistic Regression | 37.43% | 37.43% | 38.9% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mehak, A.; Ali, T.M.; Nawaz, A.; Parwati, L.S. Breast Cancer Predictions Using Machine Learning Algorithms. Eng. Proc. 2025, 107, 129. https://doi.org/10.3390/engproc2025107129

AMA Style

Mehak A, Ali TM, Nawaz A, Parwati LS. Breast Cancer Predictions Using Machine Learning Algorithms. Engineering Proceedings. 2025; 107(1):129. https://doi.org/10.3390/engproc2025107129

Chicago/Turabian Style

Mehak, Adan, Tahir Muhammad Ali, Ali Nawaz, and Lusiana Sani Parwati. 2025. "Breast Cancer Predictions Using Machine Learning Algorithms" Engineering Proceedings 107, no. 1: 129. https://doi.org/10.3390/engproc2025107129

APA Style

Mehak, A., Ali, T. M., Nawaz, A., & Parwati, L. S. (2025). Breast Cancer Predictions Using Machine Learning Algorithms. Engineering Proceedings, 107(1), 129. https://doi.org/10.3390/engproc2025107129
