Computers
  • Article
  • Open Access

12 September 2022

Predicting Breast Cancer from Risk Factors Using SVM and Extra-Trees-Based Feature Selection Method

1 Department of Electrical Engineering and Informatics, Vocational College, Universitas Gadjah Mada, Yogyakarta 55281, Indonesia
2 Department of Artificial Intelligence, Sejong University, Seoul 05006, Korea
3 Department of Data Science, Sejong University, Seoul 05006, Korea
4 Industrial and System Engineering School, Telkom University, Bandung 40257, Indonesia
This article belongs to the Special Issue Human Understandable Artificial Intelligence

Abstract

Developing a prediction model from risk factors can provide an efficient method to recognize breast cancer. Machine learning (ML) algorithms have been applied to increase the efficiency of diagnosis at the early stage. This paper studies a support vector machine (SVM) combined with an extremely randomized trees classifier (extra-trees) to provide an early-stage diagnosis of breast cancer based on risk factors. The extra-trees classifier was used to remove irrelevant features, while SVM was utilized to diagnose the breast cancer status. A breast cancer dataset consisting of 116 subjects was utilized by the machine learning models to predict breast cancer, while stratified 10-fold cross-validation was employed for model evaluation. Our proposed combined SVM and extra-trees model reached the highest accuracy, up to 80.23%, which was significantly better than the other ML models. The experimental results demonstrated that by applying extra-trees-based feature selection, the average ML prediction accuracy was improved by up to 7.29% compared with ML without feature selection. Our proposed model is expected to increase the efficiency of breast cancer diagnosis based on risk factors. In addition, we presented how the proposed prediction model could be employed for web-based breast cancer prediction. The proposed model is expected to improve diagnostic decision-support systems by predicting breast cancer disease accurately.

1. Introduction

Large-scale, high-dimensional data sets have recently become accessible in a wide range of disciplines and technologies; for example, machine learning (ML) models have been employed to help analyze medical data so that potential health issues can be identified [,,,]. Breast cancer is one of the major global health problems: it is the most prevalent cancer in women and one of the leading causes of mortality among them []. The World Health Organization (WHO) reported that three out of every ten women diagnosed with breast cancer globally died in 2020 []. Most breast cancer is discovered during routine screening because of its silent development []. The incidence, mortality, and survival rates of breast cancer can be influenced by various factors, such as environment, genetic factors, lifestyle, and population structure []. The likelihood of survival is very high when breast cancer is discovered and treated quickly.
Early detection of disease can be achieved by developing a prediction model so that the patient receives better treatment. Machine-learning-based models have been utilized in previous studies for detecting breast cancer and have shown significant performance [,,,,,]. The support vector machine (SVM) is an ML model that separates instances of each class from the others by locating the optimal linear hyperplane after nonlinearly mapping the original data into a high-dimensional feature space. SVMs have demonstrated superior performance for breast cancer detection compared with conventional models [,,,]. Additionally, earlier research has demonstrated the beneficial effect of using extra-trees as the feature selection approach to increase classification accuracy in natural language processing [], white blood cell classification [], and network intrusion detection [].
However, none of these previous studies integrated SVM-based prediction models with extra-trees into a web-based breast cancer prediction system. Therefore, the current study integrated SVM and extra-trees into web-based breast cancer prediction to improve the prediction performance for breast cancer. Extra-trees was utilized to extract significant risk factors, while SVM was used as the classifier to generate a more accurate prediction. In addition, integrating the proposed model into a web-based application could help the medical team in the decision-making process: an early prediction of breast cancer allows preemptive actions for patients to be taken before incidents occur. The contributions of the present study are as follows:
(i)
For the first time, we suggest a combined extra-trees and SVM technique for predicting breast cancer.
(ii)
By employing extra-trees to identify the most useful features, we enhanced the performance of the proposed model.
(iii)
We undertook in-depth experiments comparing the proposed model to other prediction models and findings from earlier research.
(iv)
We analyzed the effect of applying, or omitting, extra-trees-based feature selection on the accuracy of the model.
(v)
Finally, we created a web-based breast cancer prediction tool to illustrate the viability of our model.
Additionally, the developed application can serve practitioners and decision-makers as a helpful guideline for creating and deploying breast cancer prediction models in practical applications.
The remainder of our study is structured as follows: Section 2 presents ML models for breast cancer, including related work on SVM and extra-trees-based feature selection. The proposed breast cancer prediction model is described in Section 3, and Section 4 presents the experimental results and the deployment of our model. Section 5 presents the conclusions, including study limitations and future research directions.

3. Methodology

3.1. Dataset

In this study, we used the breast cancer dataset provided by previous studies [,]. The dataset was collected from the Gynaecology Department of the University Hospital Centre of Coimbra (CHUC) between 2009 and 2013. The Coimbra Breast Cancer Dataset (CBCD) consists of 64 women with breast cancer and 52 healthy subjects. Nine potential risk factors (attributes) were obtained from routine blood analysis: body mass index (BMI, kg/m²), age (years), glucose (mg/dL), homeostasis model assessment (HOMA), insulin (μU/mL), adiponectin (μg/mL), leptin (ng/mL), MCP-1 (pg/dL), and resistin (ng/mL). The class label (breast cancer) was assigned when the subject had a positive mammography that was histologically confirmed. The purpose of this study is to determine whether a subject will develop breast cancer; therefore, we proposed a combined SVM and extra-trees model for this prediction task.
Risk factor significance reflects the relationship between the attributes and the subject's disease class. We used Pearson's correlation coefficient to investigate this relationship; the coefficient ranges from −1 to +1, where the sign indicates a negative or positive correlation and a larger absolute value indicates a stronger correlation. Attributes with a high correlation to the output class can be utilized as input features to maximize prediction model accuracy. Figure 1 shows that glucose, insulin, HOMA, and resistin have a high positive correlation with the class, whereas leptin has a weak correlation.
Figure 1. Attribute correlation for breast cancer dataset.
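A minimal sketch of this correlation analysis is shown below, assuming the Coimbra dataset has been downloaded from the UCI repository as a CSV file; the file name and column names used here are illustrative assumptions, not taken from the paper.

```python
# Sketch: Pearson correlation of each attribute with the class label.
# Assumes the UCI Coimbra CSV (file name and columns are assumptions).
import pandas as pd

df = pd.read_csv("dataR2.csv")  # columns assumed: Age, BMI, Glucose, Insulin,
                                # HOMA, Leptin, Adiponectin, Resistin, MCP.1,
                                # Classification (1 = healthy, 2 = patient)

# Recode the class label to 0/1 so the correlation sign is easy to read.
df["Classification"] = df["Classification"].map({1: 0, 2: 1})

# Pearson correlation of every attribute against the class label.
correlations = df.corr(method="pearson")["Classification"].drop("Classification")
print(correlations.sort_values(ascending=False))
```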

3.2. Breast Cancer Prediction Model

The proposed model combines SVM and extra-trees to predict breast cancer; the details can be seen in Figure 2. In our study, we employed data pre-processing to remove inappropriate and inconsistent data. During the pre-processing stage, data normalization was applied by rescaling the numeric attributes into [0, 1]. The extra-trees feature selection method was utilized to remove irrelevant features, while SVM was used as the classifier. Performance was evaluated by comparing the proposed model with other machine learning models. In this study, stratified 10-fold cross-validation (CV) was utilized for the proposed and the other machine learning models. K-fold CV works by splitting the dataset into k subsets of equal size, with the instances for each subset (fold) randomly selected. Each subset, in turn, is used for testing and the remainder for training, so the model is evaluated k times and each subset is used once as the test set. In stratified k-fold cross-validation, each subset contains approximately the same proportion of class labels as the original dataset.
Figure 2. Flow diagram of the proposed model for breast cancer prediction.
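The sketch below illustrates the evaluation protocol just described: min-max normalization to [0, 1] fitted inside each fold of a stratified 10-fold CV loop. It reuses the DataFrame `df` from the previous sketch and is an approximation of the described procedure, not the authors' code.

```python
# Sketch: min-max normalization + stratified 10-fold cross-validation.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X = df.drop(columns="Classification").values
y = df["Classification"].values

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
accuracies = []
for train_idx, test_idx in skf.split(X, y):
    scaler = MinMaxScaler()                      # fit only on the training fold
    X_train = scaler.fit_transform(X[train_idx])
    X_test = scaler.transform(X[test_idx])

    clf = SVC(C=1, kernel="rbf")
    clf.fit(X_train, y[train_idx])
    accuracies.append(accuracy_score(y[test_idx], clf.predict(X_test)))

print(f"Mean accuracy over 10 folds: {np.mean(accuracies):.4f}")
```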

3.3. Extra-Trees Feature Selection Method

Feature selection removes redundant and irrelevant features to improve machine learning quality and efficiency. Feature selection methods can be categorized into three groups: wrapper, filter, and embedded methods []. In our study, we utilized the extra-trees algorithm, an embedded method, to extract the relevant features [].
Extra-trees generates a large number of individual decision trees from the whole training dataset. For the root node, the algorithm chooses a split rule based on a random subset of features (K) and a partially random cut point; it selects a random split to divide the parent node into two child nodes. This process is repeated in each child node until a leaf node (a node without children) is reached. The predictions of all the trees are combined into the final prediction through a majority vote. To perform feature selection, the Gini importance of each feature is computed during the construction of the forest. The features are then ranked in descending order of Gini importance, and the user selects the top k features to be used as input for the classification model.
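The following sketch illustrates this ranking and top-k selection with Scikit-learn's `ExtraTreesClassifier` and its Gini-based `feature_importances_`; it reuses `df`, `X`, and `y` from the earlier sketches, and the hyperparameter values are illustrative.

```python
# Sketch: extra-trees feature ranking by Gini importance and top-k selection.
from sklearn.ensemble import ExtraTreesClassifier

feature_names = df.drop(columns="Classification").columns

et = ExtraTreesClassifier(n_estimators=100, max_features="sqrt", random_state=42)
et.fit(X, y)

# Rank features by Gini importance in descending order.
ranking = sorted(zip(feature_names, et.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")

# Keep the top-k features (k = 3 in this study) as input for the classifier.
top_k = [name for name, _ in ranking[:3]]
```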
Figure 3 shows the attribute ranking for the breast cancer dataset generated by the extra-trees model. We investigated the importance of the features so that the significant attributes could be used as input for the classification model. We found that the top three features, namely glucose level, age, and resistin, maximize model accuracy.
Figure 3. Feature importance using extra-trees classifier on breast cancer dataset.

3.4. SVM

In this study, the SVM algorithm was utilized as the prediction model for breast cancer. SVM can be extended to solve nonlinear classification tasks when the original data cannot be separated linearly. By applying kernel functions, the original data are mapped onto a high-dimensional feature space in which a linear classifier can separate instances of each class from the others []. For the case of separating training vectors belonging to two linearly separable classes,
$(x_i, y_i), \quad x_i \in \mathbb{R}^n, \quad y_i \in \{+1, -1\}, \quad i = 1, \dots, n,$
where $x_i$ is a real-valued n-dimensional input vector, and $y_i$ is the class label associated with the training vector. The separating hyperplane is determined by an orthogonal vector $w$ and bias $b$, which identify points that satisfy
$w \cdot x + b = 0.$
Thus, the SVM optimization problem can be expressed in its dual form as
$\max_{\alpha} \left[ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \right],$
with constraints
$\sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, 2, \dots, n,$
where $\alpha_i$ are the Lagrange multipliers that define the classifier hyperplane, $K(\cdot,\cdot)$ is the kernel function, and $C$ is a penalty parameter that controls the number of misclassifications [].
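Once the optimal multipliers $\alpha_i$ and bias $b$ are obtained, a new sample $x$ is classified with the standard kernel decision function (not stated explicitly above; added here for completeness):

$f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right).$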
We implemented the machine learning models in Python V3.7.3, utilizing the Scikit-learn V0.22.2 library []. In our SVM model, we set the regularization parameter C = 1 and used the radial basis function (RBF) as the kernel K. The default Scikit-learn parameters were used for the other classification models. In addition, we set the maximum number of features in the extra-trees classifier to max_features = sqrt(n_features); therefore, the final maximum number of features is three out of nine. The experiments were performed on an Intel Core i5-4590 computer with 8 GB RAM running Windows 7 64-bit.
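A minimal end-to-end sketch of the reported setup is given below: min-max scaling, extra-trees selection of the top three features, SVM with C = 1 and an RBF kernel, and 10 runs of stratified 10-fold CV. Hyperparameters not stated in the paper are left at Scikit-learn defaults or chosen for illustration; this is an approximation of the described pipeline, not the authors' code. It reuses `X` and `y` from the earlier sketches.

```python
# Sketch: extra-trees feature selection + SVM, 10 x stratified 10-fold CV.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

pipeline = Pipeline([
    ("scale", MinMaxScaler()),                                   # rescale to [0, 1]
    ("select", SelectFromModel(                                  # keep the 3 best features
        ExtraTreesClassifier(n_estimators=100, max_features="sqrt", random_state=0),
        max_features=3, threshold=-np.inf)),
    ("svm", SVC(C=1, kernel="rbf")),
])

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```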
Table 1 shows the confusion matrix, a useful tool for analyzing classifier performance. True positives (TP) and true negatives (TN) represent data that are correctly classified, whereas false positives (FP) and false negatives (FN) represent data that are incorrectly classified. For this dataset, patients diagnosed with breast cancer are labeled 1, while healthy subjects are labeled 0. Average performance metrics, namely accuracy (%), precision (%), sensitivity or recall (%), specificity (%), and the area under the receiver operating characteristic curve (AUC), were obtained by conducting 10 runs of stratified 10-fold CV. Table 2 shows the classification model performance metrics based on the average value over all cross-validation runs.
Table 1. Classifier confusion matrix.
Table 2. Classifier model performance metrics.
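For reference, the sketch below shows how the metrics in Table 2 are typically derived from the confusion matrix entries (label 1 = breast cancer, label 0 = healthy, as in the dataset description); the formulas are the standard definitions and are assumed to match those used in the paper.

```python
# Sketch: standard metrics computed from the confusion matrix entries.
from sklearn.metrics import confusion_matrix

def classification_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # recall
        "specificity": tn / (tn + fp),
    }

# Example usage with dummy labels and predictions:
print(classification_metrics([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))
```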

4. Results and Discussions

4.1. Breast Cancer Model Performances

We assessed how well the machine learning models performed and how feature selection affected their accuracy. The proposed SVM with extra-trees was compared with other data-driven models to predict breast cancer from known risk factors. The ML algorithms logistic regression (LR), multi-layer perceptron (MLP), decision tree (DT), K-nearest neighbor (KNN), random forest (RF), naïve Bayes (NB), eXtreme Gradient Boosting (XGBoost), and adaptive boosting (AdaBoost) were employed as baseline breast cancer prediction models. The performance metrics, averaged over 10 iterations of stratified 10-fold CV, are displayed in Table 3. Our proposed model combining SVM with extra-trees achieved accuracy, precision, specificity, sensitivity, and AUC of up to 80.23%, 82.71%, 78.57%, 78.57%, and 0.78, respectively. Our proposed model outperformed the other models on all metrics except recall, where XGBoost showed a better result. Finally, the proposed model achieved a 13.61% average accuracy improvement compared with the other breast cancer prediction models.
Table 3. Performance metrics for breast cancer prediction.
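Continuing the pipeline sketch above (and reusing `pipeline`, `cv`, `X`, `y`, and the earlier imports), the comparison in Table 3 can be approximated as follows. XGBoost is omitted because it requires the separate xgboost package, and `max_iter` is raised for LR and MLP only to avoid convergence warnings; otherwise Scikit-learn defaults are kept, as in the paper.

```python
# Sketch: evaluate several classifiers with the same scaling + feature selection.
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

models = {
    "LR": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(max_iter=1000),
    "DT": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(),
    "NB": GaussianNB(),
    "AdaBoost": AdaBoostClassifier(),
    "SVM (proposed)": SVC(C=1, kernel="rbf"),
}

for name, model in models.items():
    clf = Pipeline(pipeline.steps[:-1] + [("clf", model)])   # reuse scaler + selector
    acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {acc.mean():.4f}")
```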
We further evaluated the model performance using the receiver operating characteristic (ROC) curve, a suitable metric for an imbalanced dataset []. The ROC curve contrasts the true-positive rate against the false-positive rate, and an AUC ≈ 1 indicates the best model []. Figure 4 displays the ROC curve analysis for the proposed and the other classification models under consideration. Our findings indicated that the proposed model had the highest AUC, 0.78.
Figure 4. ROC analysis for the breast cancer prediction models.
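A sketch of the ROC analysis for the proposed model is given below, again reusing `pipeline`, `X`, and `y`; out-of-fold decision scores are collected with `cross_val_predict` and plotted with matplotlib. This is an illustrative reconstruction, not the authors' plotting code.

```python
# Sketch: ROC curve and AUC from out-of-fold SVM decision scores.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import cross_val_predict, StratifiedKFold

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_predict(pipeline, X, y, cv=skf, method="decision_function")

fpr, tpr, _ = roc_curve(y, scores)
print("AUC:", round(roc_auc_score(y, scores), 2))

plt.plot(fpr, tpr, label="SVM + extra-trees")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")   # diagonal reference line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```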

4.2. Impact Analysis of Extra-Trees-Based Feature Selection

Figure 5 shows the effect of feature selection on the accuracy of the classification models. The results showed that the feature selection strategy improved accuracy for all prediction models, except AdaBoost, compared with using all attributes as input. Our findings demonstrated that classifier accuracy can be increased by removing unimportant features. Finally, extra-trees-based feature selection increased the average accuracy by up to 7.29% compared with the classifiers without feature selection.
Figure 5. Impact of feature selection on classification accuracy.
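The ablation in Figure 5 can be approximated with the sketch below (reusing `pipeline`, `cv`, `X`, and `y` from the earlier sketches): the same SVM is evaluated once with the extra-trees selection step and once with all nine attributes.

```python
# Sketch: accuracy with vs. without extra-trees feature selection.
with_fs = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy").mean()

no_fs_pipeline = Pipeline([("scale", MinMaxScaler()), ("svm", SVC(C=1, kernel="rbf"))])
without_fs = cross_val_score(no_fs_pipeline, X, y, cv=cv, scoring="accuracy").mean()

print(f"With extra-trees selection: {with_fs:.4f}")
print(f"Without feature selection:  {without_fs:.4f}")
print(f"Difference: {100 * (with_fs - without_fs):.2f} percentage points")
```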
Risk factors differ worldwide, and prior studies demonstrated that significant predictors for breast cancer can be extracted. The present study used the extra-trees classifier to select the optimal features. The extra-trees algorithm identified the top three risk factors for breast cancer as glucose level, age, and resistin. These results are consistent with previous findings on improving the diagnosis of breast cancer [].

4.3. Study Comparison with Earlier Works

In this study, we compared our findings with those of previous studies that used the same Coimbra breast cancer dataset. Table 4 presents a comparison between our study and earlier research.
Ghani et al. [] applied recursive feature elimination (RFE) for feature selection and various classification models, such as DT, KNN, NB, and ANN. The hold-out validation method (70% for training and 30% for testing) was employed to evaluate the performance of each ML model. The results showed that ANN achieved the highest accuracy, up to 80%.
Khatun et al. [] evaluated four ML models, namely NB, RF, MLP, and simple LR, and applied the hold-out method by splitting the data into 80% training and 20% testing. Their study revealed that MLP outperformed the other ML models, achieving an accuracy of up to 85%.
Nanglia et al. [] utilized an ensemble model and chi-square-based feature selection for breast cancer prediction on the Coimbra dataset. They formed a stacking ensemble model from three ML algorithms, namely SVM, DT, and KNN, and applied 20-fold CV for validation. Their ensemble model achieved the highest accuracy, 78%, compared with the other classifiers employed in their work.
Rasool et al. [] developed a model with RFE for the same Coimbra breast cancer dataset. They applied hold-out validation (80% for training and 20% for testing) and achieved the highest accuracy, 76.42%, with the polynomial SVM classifier.
MLP retains the highest reported classification accuracy on the Coimbra dataset []; however, it should be emphasized that, in contrast to 10-fold cross-validation, that study used the hold-out validation approach (80%/20% for training and testing), which is less reliable and increases the likelihood of over-fitting and over-optimism []. Furthermore, none of the earlier studies offered a real-world web-based application of their research. We therefore constructed and implemented a web-based application of our model for breast cancer prediction in this study.
Table 4. Comparison of our study with previous works.
Author/Method | Feature Selection | Validation Method | Accuracy (%) | Practical Application
Ghani et al. (2019) []/ANN | RFE | Hold-out (70/30) | 80 | No
Khatun et al. (2021) []/MLP | - | Hold-out (80/20) | 85 | No
Nanglia et al. (2022) []/Ensemble model | Chi-square | 20-fold CV | 78 | No
Rasool et al. (2022) []/Polynomial SVM | RFE | Hold-out (80/20) | 76.42 | No
Our study/SVM and Extra-trees | Extra-trees | 10-fold stratified CV | 80.23 | Yes
Notes: ANN, artificial neural networks; MLP, multilayer perceptron; RFE, recursive features elimination; CV, cross-validation.
It is important to note that the reported results were obtained using various classification models, parameter settings, and validation techniques, so a direct performance comparison is not entirely fair. Therefore, the results shown in Table 4 should not be used to judge the effectiveness of the classification models directly, but rather to place our study in the context of earlier research in general.

4.4. Management Implications

Web-based diagnostics have been widely utilized by researchers to detect risks and facilitate decision making in a range of contexts, including the prediction of chronic disease [,], violent behavior [], self-care [], and preventive medicine []. Therefore, the objective of our work is to design and implement a web-based cancer screening tool that aids the medical team in making screening decisions. The developed web-based breast cancer prediction system was implemented in Python V3.9 and Flask V2.2, while the proposed prediction model was implemented using Scikit-learn V1.1.1 on the server side. Figure 6a illustrates how a user (a member of the medical team) can access the application through a web browser on a computer or mobile device. The user fills in the diagnosis form, which provides the input features, and submits it. The input data are then transmitted to the web server, where our proposed model is employed to diagnose the subject's breast cancer status. The diagnosis result is then presented to the user in the prediction output interface. The proposed breast cancer prediction model is trained on the breast cancer data and uses an extra-trees model for feature selection and SVM to predict the final breast cancer status.
Figure 6. The developed web-based breast cancer prediction: (a) designed framework; (b) input form interface; (c) prediction output interface.
The diagnosis form interface, where users enter the input values, is shown in Figure 6b. When all the required fields have been filled in, the user clicks the "submit" button to send the information to a secure remote server, which loads our model to predict the subject's risk of breast cancer. The result interface, depicted in Figure 6c, then displays the prediction status. This application is expected to support early breast cancer diagnosis for individuals and improve the performance of breast cancer classification, so that preventive actions or further treatments can be provided to each individual.
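A minimal Flask sketch of the server side described above is given below; the model file name, form field names, templates, and route are illustrative assumptions, not the authors' implementation.

```python
# Sketch: Flask endpoint that loads the trained pipeline and returns a prediction.
import joblib
from flask import Flask, render_template, request

app = Flask(__name__)
model = joblib.load("breast_cancer_pipeline.joblib")   # scaler + extra-trees + SVM (assumed file)

FIELDS = ["Age", "BMI", "Glucose", "Insulin", "HOMA",
          "Leptin", "Adiponectin", "Resistin", "MCP.1"]  # assumed form field names

@app.route("/", methods=["GET", "POST"])
def predict():
    if request.method == "POST":
        values = [[float(request.form[f]) for f in FIELDS]]
        label = int(model.predict(values)[0])            # 1 = breast cancer, 0 = healthy
        return render_template("result.html", prediction=label)
    return render_template("form.html", fields=FIELDS)

if __name__ == "__main__":
    app.run()
```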

5. Conclusions and Future Works

Our study proposed a breast cancer prediction model based on SVM and extra-trees. A dataset incorporating breast cancer risk factors was employed. Our proposed combined SVM and extra-trees model was contrasted with other ML prediction models, i.e., LR, NB, KNN, DT, RF, AdaBoost, MLP, and XGBoost, and outperformed them, achieving an accuracy of 80.23%. The extra-trees classifier was used to identify the significant features in the dataset. Furthermore, by utilizing extra-trees as a feature selection method, the average ML prediction accuracy was improved by up to 7.29% compared with ML without feature selection. In addition, we integrated our prediction model into a web-based breast cancer prediction system, which can support the medical team in decision making regarding breast cancer. Finally, our study is expected to improve healthcare systems and help reduce breast cancer risk for individuals.
Our study focused on a small population sample; thus, the results may not generalize to wider populations. Future studies should consider other clinical datasets, prediction models, and feature selection methods.

Author Contributions

Conceptualization, G.A. and M.S.; methodology, G.A.; software, M.S. and N.L.F.; validation, F.T.D.A., T.W. and N.B.; formal analysis, M.S.; investigation, N.L.F.; resources, F.B. and F.T.D.A.; data curation, T.W. and N.B.; writing—original draft preparation, G.A.; writing—review and editing, G.A. and M.S.; visualization, I.F. and T.W.; supervision, J.R.; project administration, I.F.; funding acquisition, F.B. and J.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset can be downloaded here: [].

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Abbreviation | Definition
SVM | Support vector machine
Extra-trees | Extremely randomized trees classifier
CV | Cross-validation
AI | Artificial intelligence
ML | Machine learning
WHO | World Health Organization
WBCD | Wisconsin Breast Cancer dataset
CHUC | University Hospital Centre of Coimbra
LR | Logistic regression
CBCD | Coimbra Breast Cancer dataset
GNB | Gaussian naïve Bayes
BPNN | Back-propagation neural network
TP | True positive
KNN | K-nearest neighbor
V-CDNN | Voting convergent difference neural network
MLP | Multi-layer perceptron
CFS | Curvature-based feature selection
RF | Random forest
DT | Decision tree
NB | Naïve Bayes
TN | True negative
GBM | Gradient boosting method
BMI | Body mass index
ANN | Artificial neural network
RBF | Radial basis function
NLP | Natural language processing
WBC | White blood cell
ELM | Extreme learning machine
AUC | Area under the receiver operating characteristic curve
FN | False negative
HOMA | Homeostasis model assessment
FP | False positive
AdaBoost | Adaptive boosting
XGBoost | eXtreme Gradient Boosting
ROC | Receiver operating characteristic curve
RFE | Recursive features elimination

References

  1. Alfian, G.; Syafrudin, M.; Fitriyani, N.L.; Anshari, M.; Stasa, P.; Svub, J.; Rhee, J. Deep Neural Network for Predicting Diabetic Retinopathy from Risk Factors. Mathematics 2020, 8, 1620. [Google Scholar] [CrossRef]
  2. Alfian, G.; Syafrudin, M.; Fitriyani, N.L.; Syaekhoni, M.A.; Rhee, J. Utilizing IoT-Based Sensors and Prediction Model for Health-Care Monitoring System. In Artificial Intelligence and Big Data Analytics for Smart Healthcare; Elsevier: Amsterdam, The Netherlands, 2021; pp. 63–80. ISBN 978-0-12-822060-3. [Google Scholar]
  3. Fitriyani, N.L.; Syafrudin, M.; Alfian, G.; Rhee, J. Development of Disease Prediction Model Based on Ensemble Learning Approach for Diabetes and Hypertension. IEEE Access 2019, 7, 144777–144789. [Google Scholar] [CrossRef]
  4. Fitriyani, N.L.; Syafrudin, M.; Alfian, G.; Fatwanto, A.; Qolbiyani, S.L.; Rhee, J. Prediction Model for Type 2 Diabetes Using Stacked Ensemble Classifiers. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Sakheer, Bahrain, 8–9 November 2020; pp. 399–402. [Google Scholar]
  5. Ferlay, J.; Soerjomataram, I.; Dikshit, R.; Eser, S.; Mathers, C.; Rebelo, M.; Parkin, D.M.; Forman, D.; Bray, F. Cancer incidence and mortality worldwide: Sources, methods and major patterns in GLOBOCAN 2012. Int. J. Cancer 2015, 136, E359–E386. [Google Scholar] [CrossRef] [PubMed]
  6. Breast Cancer. Available online: https://www.who.int/news-room/fact-sheets/detail/breast-cancer (accessed on 15 August 2021).
  7. Alkabban, F.M.; Ferguson, T. Breast Cancer. In StatPearls; StatPearls Publishing: Treasure Island, FL, USA, 2022. [Google Scholar]
  8. Hortobagyi, G.N.; de la Garza Salazar, J.; Pritchard, K.; Amadori, D.; Haidinger, R.; Hudis, C.A.; Khaled, H.; Liu, M.-C.; Martin, M.; Namer, M.; et al. The Global Breast Cancer Burden: Variations in Epidemiology and Survival. Clin. Breast Cancer 2005, 6, 391–401. [Google Scholar] [CrossRef] [PubMed]
  9. Akben, S. Determination of the Blood, Hormone and Obesity Value Ranges that Indicate the Breast Cancer, Using Data Mining Based Expert System. IRBM 2019, 40, 355–360. [Google Scholar] [CrossRef]
  10. Dalwinder, S.; Birmohan, S.; Manpreet, K. Simultaneous feature weighting and parameter determination of Neural Networks using Ant Lion Optimization for the classification of breast cancer. Biocybern. Biomed. Eng. 2019, 40, 337–351. [Google Scholar] [CrossRef]
  11. Zuo, Z.; Li, J.; Xu, H.; Al Moubayed, N. Curvature-based feature selection with application in classifying electronic health records. Technol. Forecast. Soc. Chang. 2021, 173, 121127. [Google Scholar] [CrossRef]
  12. Zhang, Z.; Chen, B.; Xu, S.; Chen, G.; Xie, J. A novel voting convergent difference neural network for diagnosing breast cancer. Neurocomputing 2021, 437, 339–350. [Google Scholar] [CrossRef]
  13. Austria, Y.D.; Lalata, J.-A.; Maria, L.B.S., Jr.; Goh, J.E.; Goh, M.L.; Vicente, H. Comparison of Machine Learning Algorithms in Breast Cancer Prediction Using the Coimbra Dataset. Int. J. Simul. Syst. Sci. Technol. 2019, 20, 23.1–23.8. [Google Scholar] [CrossRef]
  14. Nanglia, S.; Ahmad, M.; Khan, F.A.; Jhanjhi, N. An enhanced Predictive heterogeneous ensemble model for breast cancer prediction. Biomed. Signal Process. Control 2021, 72, 103279. [Google Scholar] [CrossRef]
  15. Akay, M.F. Support vector machines combined with feature selection for breast cancer diagnosis. Expert Syst. Appl. 2009, 36, 3240–3247. [Google Scholar] [CrossRef]
  16. Patrício, M.; Pereira, J.; Crisóstomo, J.; Matafome, P.; Gomes, M.; Seiça, R.; Caramelo, F. Using Resistin, glucose, age and BMI to predict the presence of breast cancer. BMC Cancer 2018, 18, 29. [Google Scholar] [CrossRef]
  17. Rahman, M.; Ghasemi, Y.; Suley, E.; Zhou, Y.; Wang, S.; Rogers, J. Machine Learning Based Computer Aided Diagnosis of Breast Cancer Utilizing Anthropometric and Clinical Features. IRBM 2020, 42, 215–226. [Google Scholar] [CrossRef]
  18. Alnowami, M.R.; Abolaban, F.A.; Taha, E. A Wrapper-Based Feature Selection Approach to Investigate Potential Biomarkers for Early Detection of Breast Cancer. J. Radiat. Res. Appl. Sci. 2022, 15, 104–110. [Google Scholar] [CrossRef]
  19. Nicula, B.; Dascalu, M.; Newton, N.N.; Orcutt, E.; McNamara, D.S. Automated Paraphrase Quality Assessment Using Language Models and Transfer Learning. Computers 2021, 10, 166. [Google Scholar] [CrossRef]
  20. Baby, D.; Devaraj, S.J.; Hemanth, J.; M, A.R.M. Leukocyte classification based on feature selection using extra trees classifier: A transfer learning approach. Turk. J. Electr. Eng. Comput. Sci. 2021, 29, 2742–2757. [Google Scholar] [CrossRef]
  21. Sharma, J.; Giri, C.; Granmo, O.-C.; Goodwin, M.; Sharma, J.; Giri, C.; Granmo, O.-C.; Goodwin, M. Multi-layer intrusion detection system with ExtraTrees feature selection, extreme learning machine ensemble, and softmax aggregation. EURASIP J. Inf. Secur. 2019, 2019, 15. [Google Scholar] [CrossRef]
  22. Breast Cancer Dataset. Available online: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra (accessed on 1 June 2022).
  23. Guyon, I. Feature Extraction Foundations and Applications; Springer: Berlin, Germany, 2006; Volume 207, ISBN 9783540354871. [Google Scholar]
  24. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar] [CrossRef]
  25. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  26. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  27. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159. [Google Scholar] [CrossRef] [Green Version]
  28. Huang, J.; Ling, C. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. 2005, 17, 299–310. [Google Scholar] [CrossRef]
  29. Ghani, M.U.; Alam, T.M.; Jaskani, F.H. Comparison of Classification Models for Early Prediction of Breast Cancer. In Proceedings of the 2019 International Conference on Innovative Computing (ICIC), Lahore, Pakistan, 9–10 November 2019; pp. 1–6. [Google Scholar]
  30. Khatun, T.; Utsho, M.M.R.; Islam, M.A.; Zohura, M.F.; Hossen, M.S.; Rimi, R.A.; Anni, S.J. Performance Analysis of Breast Cancer: A Machine Learning Approach. In Proceedings of the 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 2–4 September 2021; pp. 1426–1434. [Google Scholar]
  31. Rasool, A.; Bunterngchit, C.; Tiejian, L.; Islam, R.; Qu, Q.; Jiang, Q. Improved Machine Learning-Based Predictive Models for Breast Cancer Diagnosis. Int. J. Environ. Res. Public Health 2022, 19, 3211. [Google Scholar] [CrossRef] [PubMed]
  32. Santos, M.S.; Soares, J.P.; Abreu, P.H.; Araujo, H.; Santos, J. Cross-Validation for Imbalanced Datasets: Avoiding Overoptimistic and Overfitting Approaches [Research Frontier]. IEEE Comput. Intell. Mag. 2018, 13, 59–76. [Google Scholar] [CrossRef]
  33. Alfian, G.; Syafrudin, M.; Ijaz, M.F.; Syaekhoni, M.A.; Fitriyani, N.L.; Rhee, J. A Personalized Healthcare Monitoring System for Diabetic Patients by Utilizing BLE-Based Sensors and Real-Time Data Processing. Sensors 2018, 18, 2183. [Google Scholar] [CrossRef] [PubMed]
  34. Fitriyani, N.L.; Syafrudin, M.; Alfian, G.; Rhee, J. HDPM: An Effective Heart Disease Prediction Model for a Clinical Decision Support System. IEEE Access 2020, 8, 133034–133050. [Google Scholar] [CrossRef]
  35. Krebs, J.; Negatsch, V.; Berg, C.; Aigner, A.; Opitz-Welke, A.; Seidel, P.; Konrad, N.; Voulgaris, A. Applicability of two violence risk assessment tools in a psychiatric prison hospital population. Behav. Sci. Law 2020, 38, 471–481. [Google Scholar] [CrossRef]
  36. Syafrudin, M.; Alfian, G.; Fitriyani, N.L.; Anshari, M.; Hadibarata, T.; Fatwanto, A.; Rhee, J. A Self-Care Prediction Model for Children with Disability Based on Genetic Algorithm and Extreme Gradient Boosting. Mathematics 2020, 8, 1590. [Google Scholar] [CrossRef]
  37. Yu, C.-S.; Lin, Y.-J.; Lin, C.-H.; Lin, S.-Y.; Wu, J.L.; Chang, S.-S. Development of an Online Health Care Assessment for Preventive Medicine: A Machine Learning Approach. J. Med. Internet Res. 2020, 22, e18585. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
