Empirical Analysis of Data Sampling-Based Decision Forest Classifiers for Software Defect Prediction
Abstract
1. Introduction
- To empirically assess the effectiveness of decision forest (DF) models (CS-Forest, FPA, and FT) on balanced and imbalanced SDP datasets;
- To develop enhanced ensemble variants of the DF models (CS-Forest, FPA, and FT) using diverse homogeneous ensemble methods;
- To empirically assess and compare the DF models (CS-Forest, FPA, and FT) and their enhanced ensemble variants with existing SDP models.
2. Related Works
3. Methodology
3.1. Decision Forest Classifiers
3.1.1. Cost-Sensitive Forest (CS-Forest) Classifier
Algorithm 1. Cost-Sensitive Forest (CS-Forest) Classifier
Input: Training dataset with features and corresponding class labels: D; misclassification cost matrix indicating the penalty for misclassifying one class as another: C; number of trees in the forest: n; test dataset for evaluation: T; maximum depth or splitting criterion for each decision tree: k
Output: Predicted class labels for the instances in T
Begin
1. Initialization: load D, C, n, k, and T.
2. Tree construction: for each tree i (where i ∈ {1, 2, 3, …, n}), grow a decision tree on D, bounded by the depth/splitting criterion k and guided by the misclassification costs in C during split evaluation and pruning.
3. Classification: for each instance in T, average the class-probability estimates of the n trees and assign the class label with the minimum expected misclassification cost under C.
4. Output: return the predicted class labels for all instances in T.
End
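CS-Forest is distributed as a WEKA learner rather than a scikit-learn estimator, but its distinctive cost-sensitive voting step can be illustrated in isolation. The following Python sketch (an illustration of the idea, not the authors' implementation; the function name and toy numbers are ours) averages the trees' class-probability estimates and picks the label with the lowest expected misclassification cost:

```python
import numpy as np

def min_expected_cost_label(tree_probs, cost_matrix):
    """Cost-sensitive voting: pick the class whose expected cost is lowest.

    tree_probs:   (n_trees, n_classes) class-probability estimates, one row per tree
    cost_matrix:  (n_classes, n_classes), entry [i, j] = cost of predicting j when truth is i
    """
    p = np.asarray(tree_probs).mean(axis=0)       # average the forest's probability estimates
    expected_cost = p @ np.asarray(cost_matrix)   # expected cost of predicting each label
    return int(np.argmin(expected_cost))

# Toy example: two trees, and missing a defective module (class 1) is costlier.
probs = [[0.7, 0.3], [0.6, 0.4]]
costs = [[0, 1],   # true non-defective: a false alarm costs 1
         [5, 0]]   # true defective: a miss costs 5
print(min_expected_cost_label(probs, costs))  # -> 1 (predict defective)
```

With a miss on the defective class costing five times a false alarm, the averaged probabilities [0.65, 0.35] yield expected costs of 1.75 versus 0.65, so the defective label wins despite being the minority vote.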
3.1.2. Forest Penalizing Attribute (FPA) Classifier
Algorithm 2. Forest Penalizing Attribute (FPA) Classifier
Input: Dataset D with attributes A and class labels C; number of decision trees T; weights range (WR) configuration; parameters λ (attribute level) and ρ (overlap-prevention factor); maximum depth or splitting criterion for each decision tree: k
Output: Predicted class labels for instances by FPA
Begin
1. Initialize parameters: assign each attribute in A an initial weight and configure the weight ranges WR using λ and ρ.
2. Tree construction: for each tree t in T:
   a. Draw a bootstrap sample from D and grow a decision tree, weighting each attribute's split merit by its current weight.
   b. Penalize the attributes used in tree t by assigning them lower weights drawn from the WR associated with the level (λ) at which they appear; ρ prevents adjacent weight ranges from overlapping.
3. Aggregate predictions: combine the votes of the T trees by majority voting.
Output the predicted class labels for all instances in D.
End
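The step that distinguishes FPA from a plain random forest is the weight penalty applied to the attributes used by the most recent tree. The schematic sketch below conveys that step only; the weight ranges and attribute names are hypothetical, and Forest PA's actual WR bookkeeping is more involved:

```python
import random

def penalize_attributes(weights, used_levels, weight_ranges):
    """After growing a tree, demote the attributes it used (FPA's key idea).

    weights:       dict attribute -> current weight in (0, 1]
    used_levels:   dict attribute -> shallowest tree level where the attribute was tested
    weight_ranges: list of (low, high) per level; shallower levels map to lower
                   ranges, kept non-overlapping (the role of rho in Algorithm 2)
    """
    for attr, level in used_levels.items():
        low, high = weight_ranges[min(level, len(weight_ranges) - 1)]
        weights[attr] = random.uniform(low, high)  # penalized weight for the next tree
    return weights

weights = {"loc": 1.0, "cyclomatic": 1.0, "halstead": 1.0}
ranges = [(0.0, 0.2), (0.2, 0.4), (0.4, 0.6)]      # hypothetical non-overlapping WR
print(penalize_attributes(weights, {"loc": 0, "halstead": 2}, ranges))
```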
3.1.3. Functional Tree (FT) Classifier
Algorithm 3. Functional Tree (FT) Classifier
Input: Training dataset D = {(x_i, y_i)}, where x_i = input features and y_i = class labels; parameters for pruning and constructor-function configurations
Output: Predicted class labels for instances by FT
Begin
1. Initialize parameters: configure the pruning strategy and the constructor function (e.g., logistic regression).
2. Grow the tree: at each node, fit the constructor function and add its outputs as candidate attributes alongside the original ones; split on the attribute with the best merit.
3. Prune: for each node, decide whether to keep the subtree, replace it with a plain leaf, or replace it with a functional leaf containing the constructor model.
Output: an FT classifier capable of classifying new data instances.
End
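Functional trees have no scikit-learn implementation, so the sketch below only approximates their core idea under that constraint: a logistic-regression "constructor function" is fitted once and its class probabilities are appended as extra candidate attributes before a tree is grown:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for an SDP dataset.
X, y = make_classification(n_samples=500, n_features=10, weights=[0.85], random_state=1)

# Constructor function: a logistic model whose probability outputs become
# extra candidate attributes for the tree (FT's core idea, approximated).
constructor = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
X_ext = np.hstack([X, constructor.predict_proba(X)])

ft_like = DecisionTreeClassifier(min_samples_leaf=15, random_state=1).fit(X_ext, y)
```

New instances must be passed through the same constructor before prediction, loosely mirroring how an FT evaluates its constructor functions along the path from root to leaf.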
3.2. Homogeneous Ensemble Methods
3.2.1. Bootstrap Aggregating (Bagging) Technique
Algorithm 4. Bagging Technique
Input: Training dataset D with N instances; number of base classifiers T; base classifiers {CS-Forest, FPA, FT}
Output: The ensemble model E for predictions
Begin
1. Initialize parameters: set the bag size and the number of base classifiers T.
2. For t = 1, …, T: draw a bootstrap sample D_t of N instances from D (sampling with replacement) and train base classifier C_t on D_t.
3. Aggregate predictions: E(x) = majority vote over C_1(x), …, C_T(x).
Output: the ensemble model E for predictions.
End
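For intuition, the bagging configuration used in the experiments (bagSizePercent = 100, numIterations = 10) maps directly onto scikit-learn's BaggingClassifier. In this illustrative sketch a plain decision tree stands in for the WEKA-only CS-Forest/FPA/FT base learners, and the estimator keyword assumes scikit-learn ≥ 1.2:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# A plain decision tree stands in for CS-Forest / FPA / FT (WEKA-only learners).
bagged = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=10,    # numIterations = 10, as in the experimental setup
    max_samples=1.0,    # bagSizePercent = 100: each bag is as large as D
    bootstrap=True,     # sample with replacement
    random_state=1,
)
# Usage: bagged.fit(X_train, y_train); bagged.predict(X_test)
```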
3.2.2. Boosting Technique
Algorithm 5. Boosting Technique (AdaBoost)
Input: Training dataset D = {(x_i, y_i)}, i = 1, …, N, where x_i = input features and y_i ∈ {−1, +1} = class labels; number of iterations T; base classifiers {CS-Forest, FPA, FT}
Output: The ensemble (strong) model H(x) for predictions
Begin
1. Initialize weights: w_i = 1/N for i = 1, …, N.
2. For t = 1, …, T:
   a. Train a weak classifier: train h_t using the weighted training dataset.
   b. Compute the weighted error: ε_t = Σ_i w_i · I(h_t(x_i) ≠ y_i), where I(·) equals 1 if the condition is true and 0 otherwise.
   c. Compute classifier weight: α_t = ½ ln((1 − ε_t)/ε_t).
   d. Update sample weights: w_i ← w_i · exp(−α_t y_i h_t(x_i)), then normalize the weights such that Σ_i w_i = 1.
3. Aggregate weak classifiers: H(x) = sign(Σ_t α_t h_t(x)).
Output: the ensemble (strong) model H(x) for predictions.
End
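The same stand-in applies to boosting: scikit-learn's AdaBoostClassifier implements the reweighting loop of Algorithm 5 (via the multi-class SAMME variant), sketched here with a shallow tree as the weak learner rather than the paper's DF models:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

booster = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=3),  # weak-learner stand-in
    n_estimators=10,       # numIterations = 10, as in the experimental setup
    random_state=1,
)
# Internally, each round reweights samples as in Algorithm 5, step 2d, and the
# final prediction follows Algorithm 5, step 3: H(x) = sign(sum_t alpha_t * h_t(x)).
# Usage: booster.fit(X_train, y_train); booster.predict(X_test)
```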
3.3. Synthetic Minority Oversampling Technique (SMOTE)
Algorithm 6. Synthetic Minority Oversampling Technique (SMOTE)
Input: Minority class dataset: S_min; number of synthetic samples to generate: N; number of nearest neighbors: k
Output: Augmented dataset with N synthetic samples
Begin
1. Compute k-nearest neighbors: for each instance x_i in the minority class S_min, calculate its k-nearest neighbors using a distance metric (e.g., Euclidean distance).
2. Select neighbors for oversampling: randomly select one or more neighbors x_zi from the k-nearest neighbors of each x_i.
3. Generate synthetic instances: x_new = x_i + δ · (x_zi − x_i), where δ is a random number in [0, 1].
4. Repeat: continue the process until N synthetic instances have been generated.
5. Augment dataset: combine the synthetic instances with the original dataset to form the augmented dataset.
End
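For concreteness, Algorithm 6 can be written in a few lines with scikit-learn's NearestNeighbors. This is a minimal sketch; in practice the imbalanced-learn library's SMOTE class implements the same procedure with more safeguards:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_samples, k=5, seed=None):
    """Generate n_samples synthetic minority instances (sketch of Algorithm 6)."""
    rng = np.random.default_rng(seed)
    # Step 1: k-nearest neighbours within the minority class.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)                 # idx[:, 0] is the point itself
    synthetic = np.empty((n_samples, X_min.shape[1]))
    for s in range(n_samples):
        # Step 2: pick a minority instance and one of its k neighbours.
        i = rng.integers(len(X_min))
        neighbour = X_min[rng.choice(idx[i, 1:])]
        # Step 3: interpolate along the segment x_i + delta * (x_zi - x_i).
        delta = rng.random()
        synthetic[s] = X_min[i] + delta * (neighbour - X_min[i])
    return synthetic

# Usage: X_min = feature rows of the defective class; smote(X_min, n_samples=200)
```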
3.4. Software Defect Datasets
3.5. Experimental Framework
3.6. Experimental Performance Metrics
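The tables in Section 4 report accuracy (in percent) and the area under the ROC curve (AUC). Assuming any fitted binary classifier that exposes predict and predict_proba, both metrics reduce to two scikit-learn calls, as in this minimal sketch:

```python
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(model, X_test, y_test):
    """Accuracy (%) and AUC, the two metrics tabulated in Section 4."""
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the defective class
    return accuracy_score(y_test, y_pred) * 100, roc_auc_score(y_test, y_score)
```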
4. Results and Discussion
4.1. Scenario 1: Experimental Results of DF Models and the Baseline Classifiers
4.2. Scenario 2: Experimental Results of Enhanced DF Models
4.3. Scenario 3: Comparison of DF Models and Their Enhanced Variants with Existing SDP Models
5. Threats to Validity
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Laplante, P.A.; Kassab, M. Requirements Engineering for Software and Systems; Auerbach Publications: Boca Raton, FL, USA, 2022. [Google Scholar]
- Westfall, L. Software requirements engineering: What, why, who, when, and how. Softw. Qual. Prof. 2005, 7, 17. [Google Scholar]
- Luo, L.; He, Q.; Xie, J.; Yang, D.; Wu, G. Investigating the relationship between project complexity and success in complex construction projects. J. Manag. Eng. 2017, 33, 04016036. [Google Scholar]
- Nan, N.; Harter, D.E. Impact of budget and schedule pressure on software development cycle time and effort. IEEE Trans. Softw. Eng. 2009, 35, 624–637. [Google Scholar]
- Menzies, T.; Nichols, W.; Shull, F.; Layman, L. Are delayed issues harder to resolve? Revisiting cost-to-fix of defects throughout the lifecycle. Empir. Softw. Eng. 2017, 22, 1903–1935. [Google Scholar]
- Humphrey, W.S. Why big software projects fail: The 12 key questions. In Software Management; John Wiley & Sons: Hoboken, NJ, USA, 2006; pp. 21–27. [Google Scholar]
- Humphrey, W.S. Psp (sm): A Self-Improvement Process for Software Engineers; Addison-Wesley Professional: Boston, MA, USA, 2005. [Google Scholar]
- Wu, W.; Wang, S.; Liu, B.; Shao, Y.; Xie, W. A novel software defect prediction approach via weighted classification based on association rule mining. Eng. Appl. Artif. Intell. 2024, 129, 107622. [Google Scholar]
- Leszak, M.; Perry, D.E.; Stoll, D. A case study in root cause defect analysis. In Proceedings of the 22nd International Conference on Software Engineering, Limerick, Ireland, 4–11 June 2000; pp. 428–437. [Google Scholar]
- Catal, C. Software fault prediction: A literature review and current trends. Expert Syst. Appl. 2011, 38, 4626–4636. [Google Scholar] [CrossRef]
- Koçan, M.; Yıldız, E. Evaluation of Consumer Complaints: A Case Study Using MAXQDA 2020 Data Analysis Software. Çankırı Karatekin Üniversitesi İktisadi Ve İdari Bilim. Fakültesi Derg. 2024, 14, 266–289. [Google Scholar] [CrossRef]
- Kumar, G.; Imam, A.A.; Basri, S.; Hashim, A.S.; Naim, A.G.H.; Capretz, L.F.; Balogun, A.O.; Mamman, H. Ensemble Balanced Nested Dichotomy Fuzzy Models for Software Requirement Risk Prediction. IEEE Access 2024, 12, 146225–146243. [Google Scholar]
- Bayramova, T.A.; Malikova, N.C. Developing a conceptual model for improving the software system reliability. Probl. Inf. Soc. 2024, 15, 42–56. [Google Scholar] [CrossRef]
- Phung, K.; Ogunshile, E.; Aydin, M. Error-type—A novel set of software metrics for software fault prediction. IEEE Access 2023, 11, 30562–30574. [Google Scholar] [CrossRef]
- Li, Z.; Niu, J.; Jing, X.-Y. Software defect prediction: Future directions and challenges. Autom. Softw. Eng. 2024, 31, 19. [Google Scholar]
- Mashhadi, E.; Chowdhury, S.; Modaberi, S.; Hemmati, H.; Uddin, G. An empirical study on bug severity estimation using source code metrics and static analysis. J. Syst. Softw. 2024, 217, 112179. [Google Scholar]
- Malek, A.; Balogun, A.O.; Basri, S.; Abdullahi, A.; Imam, A.K.A.; Alazzawi, A.K.; Adeyemo, V.E.; Kumar, G. Empirical Analysis of Threshold Values for Rank-Based Filter Feature Selection Methods in Software Defect Prediction. J. Eng. Sci. Technol. 2023, 18, 187–209. [Google Scholar]
- Ali, M.; Mazhar, T.; Arif, Y.; Al-Otaibi, S.; Ghadi, Y.Y.; Shahzad, T.; Khan, M.A.; Hamam, H. Software defect prediction using an intelligent ensemble-based model. IEEE Access 2024, 12, 20376–20395. [Google Scholar]
- Bashir, A.T.; Balogun, A.O.; Adigun, M.O.; Ajagbe, S.A.; Capretz, L.F.; Awotunde, J.B.; Mojeed, H.A. Cascade Generalization-Based Classifiers for Software Defect Prediction. In Proceedings of the Computer Science Online Conference, Online, 25–28 April 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 22–42. [Google Scholar]
- Odejide, B.J.; Bajeh, A.O.; Balogun, A.O.; Alanamu, Z.O.; Adewole, K.S.; Akintola, A.G.; Salihu, S.A.; Usman-Hamza, F.E.; Mojeed, H.A. An empirical study on data sampling methods in addressing class imbalance problem in software defect prediction. In Proceedings of the Computer Science Online Conference, Online, 26 April 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 594–610. [Google Scholar]
- Balogun, A.O.; Basri, S.; Mahamad, S.; Capretz, L.F.; Imam, A.A.; Almomani, M.A.; Adeyemo, V.E.; Kumar, G. A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction. Comput. Intell. Neurosci. 2021, 2021, 5069016. [Google Scholar] [PubMed]
- Balogun, A.O.; Basri, S.; Abdulkadir, S.J.; Hashim, A.S. Performance analysis of feature selection methods in software defect prediction: A search method approach. Appl. Sci. 2019, 9, 2764. [Google Scholar] [CrossRef]
- Nama, P. Integrating AI in testing automation: Enhancing test coverage and predictive analysis for improved software quality. World J. Adv. Eng. Technol. Sci. 2024, 13, 769–782. [Google Scholar]
- Batarseh, F.A.; Gonzalez, A.J. Predicting failures in agile software development through data analytics. Softw. Qual. J. 2018, 26, 49–66. [Google Scholar]
- Khan, M.F.I.; Masum, A.K.M. Predictive Analytics and Machine Learning for Real-Time Detection Of Software Defects And Agile Test Management. Educ. Adm. Theory Pract. 2024, 30, 1051–1057. [Google Scholar]
- Croft, R.; Babar, M.A.; Kholoosi, M.M. Data quality for software vulnerability datasets. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 14–20 May 2023; pp. 121–133. [Google Scholar]
- Shiri Harzevili, N.; Boaye Belle, A.; Wang, J.; Wang, S.; Jiang, Z.M.; Nagappan, N. A Systematic Literature Review on Automated Software Vulnerability Detection Using Machine Learning. ACM Comput. Surv. 2024, 57, 1–36. [Google Scholar]
- Wang, J.; Liu, Y.; Li, P.; Lin, Z.; Sindakis, S.; Aggarwal, S. Overview of data quality: Examining the dimensions, antecedents, and impacts of data quality. J. Knowl. Econ. 2024, 15, 1159–1178. [Google Scholar]
- Balogun, A.O.; Basri, S.; Said, J.A.; Adeyemo, V.E.; Imam, A.A.; Bajeh, A.O. Software defect prediction: Analysis of class imbalance and performance stability. J. Eng. Sci. Technol. 2019, 14, 3294–3308. [Google Scholar]
- Pandey, S.; Kumar, K. Software fault prediction for imbalanced data: A survey on recent developments. Procedia Comput. Sci. 2023, 218, 1815–1824. [Google Scholar]
- Balogun, A.O.; Odejide, B.J.; Bajeh, A.O.; Alanamu, Z.O.; Usman-Hamza, F.E.; Adeleke, H.O.; Mabayoje, M.A.; Yusuff, S.R. Empirical analysis of data sampling-based ensemble methods in software defect prediction. In Proceedings of the International Conference on Computational Science and Its Applications, Malaga, Spain, 4–7 July 2022; pp. 363–379. [Google Scholar]
- Pachouly, J.; Ahirrao, S.; Kotecha, K.; Selvachandran, G.; Abraham, A. A systematic literature review on software defect prediction using artificial intelligence: Datasets, Data Validation Methods, Approaches, and Tools. Eng. Appl. Artif. Intell. 2022, 111, 104773. [Google Scholar]
- Yin, L.; Sun, Z.; Gao, F.; Liu, H. Deep forest regression for short-term load forecasting of power systems. IEEE Access 2020, 8, 49090–49099. [Google Scholar]
- Adnan, M.N.; Islam, M.Z. Forest PA: Constructing a decision forest by penalizing attributes used in previous trees. Expert Syst. Appl. 2017, 89, 389–403. [Google Scholar]
- Siers, M.J.; Islam, M.Z. Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem. Inf. Syst. 2015, 51, 62–71. [Google Scholar]
- Chen, Y.; Yang, X.; Dai, H.-L. Cost-sensitive continuous ensemble kernel learning for imbalanced data streams with concept drift. Knowl.-Based Syst. 2024, 284, 111272. [Google Scholar]
- Wang, N.; Zhao, S.; Wang, S. A novel clustering-based resampling with cost-sensitive boosting method to model and map wildfire susceptibility. Reliab. Eng. Syst. Saf. 2024, 242, 109742. [Google Scholar]
- Cardoso, P.; Guillerme, T.; Mammola, S.; Matthews, T.J.; Rigal, F.; Graco-Roza, C.; Stahls, G.; Carlos Carvalho, J. Calculating functional diversity metrics using neighbor-joining trees. Ecography 2024, 2024, e07156. [Google Scholar]
- Balogun, A.O.; Adewole, K.S.; Bajeh, A.O.; Jimoh, R.G. Cascade generalization based functional tree for website phishing detection. In Proceedings of the Advances in Cyber Security: Third International Conference, ACeS 2021, Penang, Malaysia, 24–25 August 2021; Revised Selected Papers 3. Springer: Singapore, 2021; pp. 288–306. [Google Scholar]
- Luong, A.V.; Vu, T.H.; Nguyen, P.M.; Van Pham, N.; McCall, J.; Liew, A.W.-C.; Nguyen, T.T. A homogeneous-heterogeneous ensemble of classifiers. In Proceedings of the Neural Information Processing: 27th International Conference, ICONIP 2020, Bangkok, Thailand, 18–22 November 2020; Proceedings, Part V 27. Springer: Cham, Switzerland, 2020; pp. 251–259. [Google Scholar]
- Ramakrishna, M.T.; Venkatesan, V.K.; Izonin, I.; Havryliuk, M.; Bhat, C.R. Homogeneous adaboost ensemble machine learning algorithms with reduced entropy on balanced data. Entropy 2023, 25, 245. [Google Scholar] [CrossRef] [PubMed]
- Jhala, R.; Majumdar, R. Software model checking. ACM Comput. Surv. (CSUR) 2009, 41, 1–54. [Google Scholar]
- Leokhin, Y.; Fatkhulin, T.; Kozhanov, M. Research of Static Application Security Testing Technique Problems and Methods for Solving Them. In Proceedings of the 2024 Systems of Signals Generating and Processing in the Field of on Board Communications, Moscow, Russia, 12–14 March 2024; pp. 1–7. [Google Scholar]
- Smidts, C.; Stutzke, M.; Stoddard, R.W. Software reliability modeling: An approach to early reliability prediction. IEEE Trans. Reliab. 1998, 47, 268–278. [Google Scholar]
- Cortellessa, V.; Singh, H.; Cukic, B. Early reliability assessment of UML based software models. In Proceedings of the 3rd International Workshop on Software and Performance, Rome, Italy, 24–27 July 2002; pp. 302–309. [Google Scholar]
- Gaffney, J.; Davis, C.F. An approach to estimating software errors and availability. In Proceedings of the Eleventh Minnowbrook Workshop on Software Reliability, Blue Mountain Lake, NY, USA, 26–29 July 1988; SPC-TR-88-007, Version 1.0. [Google Scholar]
- Gaffney, J.; Pietrolewiez, J. An automated model for software early error prediction (SWEEP). In Proceedings of the 13th Minnowbrook Workshop on Software Reliability, Blue Mountain Lake, NY, USA, 24–27 July 1990; pp. 45–57. [Google Scholar]
- Al-Jamimi, H.A. Toward comprehensible software defect prediction models using fuzzy logic. In Proceedings of the 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 26–28 August 2016; pp. 127–130. [Google Scholar]
- Yadav, H.B.; Yadav, D.K. A fuzzy logic based approach for phase-wise software defects prediction using software metrics. Inf. Softw. Technol. 2015, 63, 44–57. [Google Scholar]
- Adak, M.F. Software defect detection by using data mining based fuzzy logic. In Proceedings of the 2018 Sixth International Conference on Digital Information, Networking, and Wireless Communications (DINWC), Beirut, Lebanon, 25–27 April 2018; pp. 65–69. [Google Scholar]
- Borgwardt, S.; Distel, F.; Peñaloza, R. The limits of decidability in fuzzy description logics with general concept inclusions. Artif. Intell. 2015, 218, 23–55. [Google Scholar]
- Ma, Y.; Qin, K.; Zhu, S. Discrimination Analysis for Predicting Defect-Prone Software Modules. J. Appl. Math. 2014, 2014, 675368. [Google Scholar]
- Jing, X.-Y.; Wu, F.; Dong, X.; Xu, B. An improved SDA based defect prediction framework for both within-project and cross-project class-imbalance problems. IEEE Trans. Softw. Eng. 2016, 43, 321–339. [Google Scholar]
- Naseem, R.; Khan, B.; Ahmad, A.; Almogren, A.; Jabeen, S.; Hayat, B.; Shah, M.A. Investigating tree family machine learning techniques for a predictive system to unveil software defects. Complexity 2020, 2020, 6688075. [Google Scholar]
- Abdulshaheed, M.; Hammad, M.; Alqaddoumi, A.; Obeidat, Q. Mining historical software testing outcomes to predict future results. Compusoft 2019, 8, 3525–3529. [Google Scholar]
- Tantithamthavorn, C.; McIntosh, S.; Hassan, A.E.; Matsumoto, K. The impact of automated parameter optimization on defect prediction models. IEEE Trans. Softw. Eng. 2018, 45, 683–711. [Google Scholar]
- Al Qasem, O.; Akour, M.; Alenezi, M. The influence of deep learning algorithms factors in software fault prediction. IEEE Access 2020, 8, 63945–63960. [Google Scholar] [CrossRef]
- Shen, Z.; Chen, S. A survey of automatic software vulnerability detection, program repair, and defect prediction techniques. Secur. Commun. Netw. 2020, 2020, 8858010. [Google Scholar] [CrossRef]
- Liang, H.; Yu, Y.; Jiang, L.; Xie, Z. Seml: A semantic LSTM model for software defect prediction. IEEE Access 2019, 7, 83812–83824. [Google Scholar] [CrossRef]
- Farid, A.B.; Fathy, E.M.; Eldin, A.S.; Abd-Elmegid, L.A. Software defect prediction using hybrid model (CBIL) of convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM). PeerJ Comput. Sci. 2021, 7, e739. [Google Scholar] [CrossRef]
- Uddin, M.N.; Li, B.; Ali, Z.; Kefalas, P.; Khan, I.; Zada, I. Software defect prediction employing BiLSTM and BERT-based semantic feature. Soft Comput. 2022, 26, 7877–7891. [Google Scholar] [CrossRef]
- Li, Z.; Zhang, H.; Jing, X.-Y.; Xie, J.; Guo, M.; Ren, J. Dssdpp: Data selection and sampling based domain programming predictor for cross-project defect prediction. IEEE Trans. Softw. Eng. 2022, 49, 1941–1963. [Google Scholar] [CrossRef]
- Bennin, K.E.; Keung, J.W.; Monden, A. On the relative value of data resampling approaches for software defect prediction. Empir. Softw. Eng. 2019, 24, 602–636. [Google Scholar] [CrossRef]
- Qiao, L.; Li, X.; Umer, Q.; Guo, P. Deep learning based software defect prediction. Neurocomputing 2020, 385, 100–110. [Google Scholar] [CrossRef]
- Usman-Hamza, F.E.; Balogun, A.O.; Nasiru, S.K.; Capretz, L.F.; Mojeed, H.A.; Salihu, S.A.; Akintola, A.G.; Mabayoje, M.A.; Awotunde, J.B. Empirical analysis of tree-based classification models for customer churn prediction. Sci. Afr. 2024, 23, e02054. [Google Scholar] [CrossRef]
- Ahmadlou, M.; Karimi, M.; Sammen, S.S.; Alsafadi, K. Three novel cost-sensitive machine learning models for urban growth modelling. Geocarto Int. 2024, 39, 2353252. [Google Scholar]
- Van Phong, T.; Ly, H.-B.; Trinh, P.T.; Prakash, I.; Pham, B.T. Landslide susceptibility mapping using Forest by Penalizing Attributes (FPA) algorithm based machine learning approach. Vietnam J. Earth Sci. 2020, 42, 237–246. [Google Scholar]
- Gama, J. Functional trees. Mach. Learn. 2004, 55, 219–250. [Google Scholar]
- Mosavi, A.; Shirzadi, A.; Choubin, B.; Taromideh, F.; Hosseini, F.S.; Borji, M.; Shahabi, H.; Salvati, A.; Dineva, A.A. Towards an ensemble machine learning model of random subspace based functional tree classifier for snow avalanche susceptibility mapping. IEEE Access 2020, 8, 145968–145983. [Google Scholar]
- Zhao, C.; Peng, R.; Wu, D. Bagging and boosting fine-tuning for ensemble learning. IEEE Trans. Artif. Intell. 2023, 5, 1728–1742. [Google Scholar]
- Archana, K.; Komarasamy, G. A novel deep learning-based brain tumor detection using the Bagging ensemble with K-nearest neighbor. J. Intell. Syst. 2023, 32, 20220206. [Google Scholar]
- Wu, Y.; Liu, L.; Xie, Z.; Chow, K.-H.; Wei, W. Boosting ensemble accuracy by revisiting ensemble diversity metrics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16469–16477. [Google Scholar]
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar]
- Fernández, A.; Garcia, S.; Herrera, F.; Chawla, N.V. SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. J. Artif. Intell. Res. 2018, 61, 863–905. [Google Scholar]
- Balogun, A.O.; Lafenwa-Balogun, F.B.; Mojeed, H.A.; Adeyemo, V.E.; Akande, O.N.; Akintola, A.G.; Bajeh, A.O.; Usman-Hamza, F.E. SMOTE-based homogeneous ensemble methods for software defect prediction. In Proceedings of the Computational Science and Its Applications–ICCSA 2020: 20th International Conference, Cagliari, Italy, 1–4 July 2020; Proceedings, Part VI 20. Springer: Berlin/Heidelberg, Germany, 2020; pp. 615–631. [Google Scholar]
- Shepperd, M.; Song, Q.; Sun, Z.; Mair, C. Data quality: Some comments on the nasa software defect datasets. IEEE Trans. Softw. Eng. 2013, 39, 1208–1215. [Google Scholar]
- Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software: An update. ACM SIGKDD Explor. Newsl. 2009, 11, 10–18. [Google Scholar]
- Davari, A.; Islam, S.; Seehaus, T.; Hartmann, A.; Braun, M.; Maier, A.; Christlein, V. On Mathews correlation coefficient and improved distance map loss for automatic glacier calving front segmentation in SAR imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–12. [Google Scholar]
- Akintola, A.G.; Balogun, A.O.; Capretz, L.F.; Mojeed, H.A.; Basri, S.; Salihu, S.A.; Usman-Hamza, F.E.; Sadiku, P.O.; Balogun, G.B.; Alanamu, Z.O. Empirical analysis of forest penalizing attribute and its enhanced variations for android malware detection. Appl. Sci. 2022, 12, 4664. [Google Scholar] [CrossRef]
- Alsaeedi, A.; Khan, M.Z. Software defect prediction using supervised machine learning and ensemble techniques: A comparative study. J. Softw. Eng. Appl. 2019, 12, 85–100. [Google Scholar]
- Iqbal, A.; Aftab, S.; Ali, U.; Nawaz, Z.; Sana, L.; Ahmad, M.; Husen, A. Performance analysis of machine learning techniques on software defect prediction using NASA datasets. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 300–308. [Google Scholar]
- Babatunde, A.N.; Ogundokun, R.O.; Adeoye, L.B.; Misra, S. Software Defect Prediction Using Dagging Meta-Learner-Based Classifiers. Mathematics 2023, 11, 2714. [Google Scholar] [CrossRef]
- El-Shorbagy, S.A.; El-Gammal, W.M.; Abdelmoez, W.M. Using SMOTE and heterogeneous stacking in ensemble learning for software defect prediction. In Proceedings of the 7th International Conference on Software and Information Engineering, Cairo, Egypt, 2–4 May 2018; pp. 44–47. [Google Scholar]
- Li, R.; Zhou, L.; Zhang, S.; Liu, H.; Huang, X.; Sun, Z. Software defect prediction based on ensemble learning. In Proceedings of the 2019 2nd International Conference on Data Science and Information Technology, Seoul, Republic of Korea, 19–21 July 2019; pp. 1–6. [Google Scholar]
References | SDP Models | Class Imbalance | Findings | Limitations |
---|---|---|---|---|
Al-Jamimi [48] | Fuzzy logic (Takagi-Sugeno fuzzy inference engine) | No | The reported findings demonstrated the ability of fuzzy logic to produce transparent defect prediction models. | Scalability challenges, static rule bases, and dependence on expert-defined membership functions, making it less adaptable to complex and dynamic systems. While hybrid approaches combining fuzzy logic with others exist, they often face integration and computational complexity issues. |
Yadav and Yadav [49] | Fuzzy inference system | No | The projected defect density indicators help analyze fault severity in the SDLC artefacts of software projects. |
Adak [50] | MANOVA, fuzzy logic, and Gini decision tree | No | Hybrid models combining fuzzy logic with statistical methods provide better outcomes than pure fuzzy or data-mining models. |
Ma, Qin and Zhu [52] | Kernel discrimination classifier (KDC) | No | KDC can tackle nonlinearly separable and class-imbalanced problems; experiments showed that KDC performed well against comparative methods on the test sets. | They often require extensive computational resources to process high-dimensional data and tune kernel parameters effectively, limiting their scalability. Additionally, the reduced interpretability of KDC or ISDA models makes it challenging to understand the impact of individual features or justify predictions, especially in domains where transparency is critical. |
Jing, Wu, Dong and Xu [53] | Improved subclass discriminant analysis (ISDA) | No | ISDA performed better than other state-of-the-art within-project class-imbalance learning methods. |
Naseem, Khan, Ahmad, Almogren, Jabeen, Hayat and Shah [54] | Credal Decision Tree (CDT), CS-Forest, Decision Stump (DS), FPA, Hoeffding Tree (HT), DT, Logistic Model Tree (LMT), RF, Random Tree (RT), and REP-Tree (REP-T) | No | RF outperformed the other classifiers. | Tree-based classifiers, such as DT and RF, often struggle with imbalanced datasets, and the latent class imbalance problem was not considered in the evaluation of the experimented tree-based classifiers. |
Abdulshaheed, Hammad, Alqaddoumi and Obeidat [55] | kNN, MLP, and RF | No | kNN outperformed other methods such as RF and MLP. | The study was limited in scope due to the small number of classifiers and datasets used, and kNN’s performance heavily depended on parameter tuning. |
Al Qasem, Akour and Alenezi [57] | MLP and CNN | No | The study found that adding layers positively impacted performance, with an ideal number of layers for each dataset; the best performance was achieved using the ReLU activation function. | DL models are sensitive to hyperparameter settings and can overfit to noise or specific patterns in training data, necessitating careful tuning and regularization. DL models also face significant challenges with imbalanced datasets, as they tend to prioritize the majority class, leading to poor performance on minority-class predictions. |
Liang, Yu, Jiang and Xie [59] | Semantic LSTM | No | This method outperformed recent defect prediction methods in most projects. | |
Farid, Fathy, Eldin and Abd-Elmegid [60] | CNN and Bi-LSTM (CBIL) | No | The proposed method significantly improved base models. | |
Uddin, Li, Ali, Kefalas, Khan and Zada [61] | Bi-LSTM and BERT | No | It employs BiLSTM to leverage contextual information derived from the embedded token vectors obtained via the BERT model. Additionally, it employs an attention mechanism to identify significant features of the nodes. | |
Qiao, et al. [64] | Deep Neural Network (DPNN) | No | The results demonstrate that the proposed approach is accurate and improves on existing state-of-the-art methods. |
DF Models | Parameter Configuration |
---|---|
CS-Forest | BatchSize = 100; confidenceLevel = 0.25; costGoodness = 0.2; costMatrix = (2 × 2) with default value “1”; minRecLeaf = 10; numberTrees = 60; separation = 0.3
FPA | BatchSize = 100; numberTrees = 10; seed = 1; simpleCartMinimumRecords = 2; simpleCartPruningFolds = 2 |
FT | BatchSize = 100; binSplit = False; errorOnProbabilities = False; minNumInstances = 15; modelType = InnerLeaves; numBoostingIterations = 15; useAIC = False; weightTrimBeta = 0.0
Homogeneous Ensemble | Parameter Configuration |
---|---|
Bagging | bagSizePercent = 100; calcOutOfBag = False; numIterations = 10; classifiers = {FPA, CS-Forest, FT}; outputOutOfBagComplexityStatistics = False |
Boosting | batchSize = 100; resume = False; numIterations = 10; classifiers = {FPA, CS-Forest, FT}; useResampling = False; weightThreshold = 100 |
Datasets | Instances | Features | Defective Instances | Non-Defective Instances |
---|---|---|---|---|
CM1 | 327 | 38 | 42 | 285 |
KC1 | 1126 | 22 | 294 | 868 |
KC3 | 194 | 40 | 36 | 158 |
MW1 | 250 | 38 | 25 | 225 |
PC1 | 679 | 38 | 55 | 624 |
PC3 | 1053 | 38 | 130 | 923 |
PC4 | 1270 | 38 | 176 | 1094 |
PC5 | 1694 | 39 | 458 | 1236 |
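Every dataset above is skewed toward the non-defective class, which motivates the SMOTE treatment; a quick computation from the table's columns makes the imbalance explicit:

```python
datasets = {  # name: (defective, total), taken from the table above
    "CM1": (42, 327), "KC1": (294, 1126), "KC3": (36, 194), "MW1": (25, 250),
    "PC1": (55, 679), "PC3": (130, 1053), "PC4": (176, 1270), "PC5": (458, 1694),
}
for name, (defective, total) in datasets.items():
    print(f"{name}: {100 * defective / total:.1f}% defective")  # e.g. CM1: 12.8%
```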
Accuracy values of the DF models and baseline classifiers on the original (imbalanced) datasets.
SDP Models | CM1 | KC1 | KC3 | MW1 | PC1 | PC3 | PC4 | PC5 | Average
---|---|---|---|---|---|---|---|---|---
CSForest | 87.16 | 74.78 | 81.44 | 90.00 | 91.90 | 87.56 | 87.65 | 74.11 | 84.33 |
FPA | 86.85 | 76.94 | 79.90 | 89.20 | 91.90 | 87.00 | 88.19 | 76.86 | 84.61 |
FT | 83.49 | 76.42 | 82.47 | 88.80 | 89.70 | 85.70 | 88.89 | 74.81 | 83.79 |
NB | 81.35 | 73.58 | 78.87 | 81.60 | 89.10 | 35.65 | 87.18 | 74.34 | 75.21 |
kNN | 77.98 | 73.24 | 72.16 | 83.60 | 90.70 | 84.96 | 85.70 | 72.53 | 80.11 |
DT | 81.04 | 74.18 | 79.38 | 90.40 | 91.50 | 84.96 | 88.34 | 73.40 | 82.90 |
AUC values of the DF models and baseline classifiers on the original (imbalanced) datasets.
SDP Models | CM1 | KC1 | KC3 | MW1 | PC1 | PC3 | PC4 | PC5 | Average
---|---|---|---|---|---|---|---|---|---
CSForest | 0.725 | 0.686 | 0.745 | 0.802 | 0.833 | 0.832 | 0.924 | 0.778 | 0.791 |
FPA | 0.652 | 0.713 | 0.713 | 0.702 | 0.775 | 0.778 | 0.904 | 0.776 | 0.752 |
FT | 0.591 | 0.600 | 0.656 | 0.689 | 0.629 | 0.582 | 0.762 | 0.650 | 0.645 |
NB | 0.645 | 0.681 | 0.662 | 0.779 | 0.790 | 0.766 | 0.833 | 0.690 | 0.731 |
kNN | 0.521 | 0.633 | 0.539 | 0.607 | 0.679 | 0.643 | 0.697 | 0.654 | 0.622 |
DT | 0.570 | 0.604 | 0.653 | 0.503 | 0.598 | 0.616 | 0.722 | 0.651 | 0.615 |
Accuracy values of the DF models and baseline classifiers on the balanced datasets.
SDP Models | CM1 | KC1 | KC3 | MW1 | PC1 | PC3 | PC4 | PC5 | Average
---|---|---|---|---|---|---|---|---|---
CSForest | 81.40 | 76.04 | 76.27 | 86.67 | 94.31 | 89.71 | 93.11 | 76.25 | 84.22 |
FPA | 88.25 | 83.12 | 87.03 | 91.33 | 93.83 | 91.04 | 93.51 | 82.06 | 88.77 |
FT | 84.39 | 74.17 | 82.60 | 86.00 | 88.86 | 87.43 | 91.49 | 76.17 | 83.89 |
NB | 63.68 | 60.94 | 64.56 | 71.88 | 65.86 | 59.81 | 76.62 | 59.07 | 65.30 |
kNN | 87.54 | 79.84 | 85.76 | 91.55 | 92.22 | 88.81 | 91.62 | 77.82 | 86.90 |
DT | 85.44 | 78.57 | 81.65 | 87.77 | 92.86 | 88.23 | 91.17 | 79.03 | 85.59 |
AUC values of the DF models and baseline classifiers on the balanced datasets.
SDP Models | CM1 | KC1 | KC3 | MW1 | PC1 | PC3 | PC4 | PC5 | Average
---|---|---|---|---|---|---|---|---|---
CSForest | 0.952 | 0.884 | 0.915 | 0.959 | 0.985 | 0.972 | 0.985 | 0.897 | 0.944 |
FPA | 0.946 | 0.898 | 0.930 | 0.960 | 0.984 | 0.969 | 0.982 | 0.900 | 0.946 |
FT | 0.844 | 0.742 | 0.826 | 0.860 | 0.889 | 0.874 | 0.915 | 0.762 | 0.839 |
NB | 0.771 | 0.691 | 0.684 | 0.822 | 0.839 | 0.804 | 0.876 | 0.714 | 0.775 |
kNN | 0.875 | 0.799 | 0.863 | 0.916 | 0.922 | 0.886 | 0.916 | 0.778 | 0.869 |
DT | 0.848 | 0.824 | 0.835 | 0.869 | 0.934 | 0.897 | 0.918 | 0.799 | 0.866 |
Accuracy values of the DF models and their enhanced ensemble variants on the original datasets.
Datasets | CSForest | Bagged CSForest | BoostCSForest | FPA | Bagged FPA | BoostFPA | FT | Bagged FT | BoostFT
---|---|---|---|---|---|---|---|---|---
CM1 | 87.16 | 87.16 | 87.16 | 86.85 | 86.85 | 86.85 | 83.49 | 85.32 | 85.93 |
KC1 | 74.78 | 76.94 | 76.94 | 76.94 | 78.14 | 78.14 | 76.42 | 77.80 | 77.80 |
KC3 | 81.44 | 81.44 | 81.44 | 79.90 | 80.93 | 80.93 | 82.47 | 81.96 | 81.96 |
MW1 | 90.00 | 90.00 | 90.40 | 89.20 | 89.60 | 89.60 | 88.80 | 88.80 | 88.80 |
PC1 | 91.90 | 91.90 | 91.61 | 91.90 | 91.90 | 91.90 | 89.70 | 91.31 | 90.72 |
PC3 | 87.56 | 87.56 | 88.02 | 87.00 | 87.65 | 87.65 | 85.70 | 85.52 | 84.22 |
PC4 | 87.65 | 89.43 | 89.43 | 88.19 | 89.88 | 89.36 | 88.89 | 90.21 | 91.14 |
PC5 | 74.11 | 77.50 | 77.50 | 76.86 | 82.66 | 82.38 | 74.81 | 76.56 | 76.56 |
Average | 84.33 | 85.24 | 85.31 | 84.61 | 85.95 | 85.85 | 83.79 | 84.69 | 84.64 |
AUC values of the DF models and their enhanced ensemble variants on the original datasets.
Datasets | CSForest | Bagged CSForest | BoostCSForest | FPA | Bagged FPA | BoostFPA | FT | Bagged FT | BoostFT
---|---|---|---|---|---|---|---|---|---
CM1 | 0.725 | 0.728 | 0.728 | 0.652 | 0.714 | 0.714 | 0.591 | 0.723 | 0.712 |
KC1 | 0.686 | 0.808 | 0.808 | 0.713 | 0.742 | 0.742 | 0.600 | 0.718 | 0.694 |
KC3 | 0.745 | 0.751 | 0.751 | 0.713 | 0.730 | 0.730 | 0.656 | 0.711 | 0.659 |
MW1 | 0.802 | 0.863 | 0.863 | 0.702 | 0.756 | 0.702 | 0.689 | 0.735 | 0.656 |
PC1 | 0.833 | 0.861 | 0.861 | 0.775 | 0.851 | 0.883 | 0.629 | 0.800 | 0.794 |
PC3 | 0.832 | 0.835 | 0.835 | 0.778 | 0.827 | 0.778 | 0.582 | 0.777 | 0.753 |
PC4 | 0.924 | 0.925 | 0.925 | 0.904 | 0.933 | 0.922 | 0.762 | 0.921 | 0.931 |
PC5 | 0.778 | 0.785 | 0.785 | 0.776 | 0.793 | 0.776 | 0.650 | 0.770 | 0.739 |
Average | 0.791 | 0.820 | 0.820 | 0.752 | 0.793 | 0.781 | 0.645 | 0.769 | 0.742 |
Accuracy values of the DF models and their enhanced ensemble variants on the balanced datasets.
Datasets | CSForest | Bagged CSForest | BoostCSForest | FPA | Bagged FPA | BoostFPA | FT | Bagged FT | BoostFT
---|---|---|---|---|---|---|---|---|---
CM1 | 81.40 | 85.44 | 90.18 | 88.25 | 90.00 | 90.06 | 84.39 | 85.80 | 88.60 |
KC1 | 76.04 | 79.67 | 81.39 | 83.12 | 83.35 | 83.35 | 74.17 | 79.55 | 79.21 |
KC3 | 76.27 | 79.75 | 85.76 | 87.03 | 87.03 | 87.03 | 82.60 | 84.49 | 85.13 |
MW1 | 86.67 | 86.67 | 93.11 | 91.33 | 92.44 | 93.33 | 86.00 | 87.56 | 89.56 |
PC1 | 94.31 | 94.23 | 96.87 | 93.83 | 93.99 | 95.83 | 88.86 | 92.07 | 93.27 |
PC3 | 89.71 | 90.19 | 92.20 | 91.04 | 92.26 | 92.26 | 87.43 | 89.77 | 89.50 |
PC4 | 93.11 | 93.02 | 95.14 | 93.51 | 94.19 | 94.60 | 91.49 | 92.84 | 93.96 |
PC5 | 76.25 | 79.88 | 82.50 | 82.06 | 83.38 | 83.38 | 76.17 | 79.52 | 77.54 |
Average | 84.22 | 86.11 | 89.64 | 88.77 | 89.58 | 89.98 | 83.89 | 86.45 | 87.10 |
AUC values of the DF models and their enhanced ensemble variants on the balanced datasets.
Datasets | CSForest | Bagged CSForest | BoostCSForest | FPA | Bagged FPA | BoostFPA | FT | Bagged FT | BoostFT
---|---|---|---|---|---|---|---|---|---
CM1 | 0.952 | 0.955 | 0.962 | 0.946 | 0.962 | 0.964 | 0.844 | 0.929 | 0.943 |
KC1 | 0.884 | 0.898 | 0.883 | 0.898 | 0.907 | 0.892 | 0.742 | 0.858 | 0.855 |
KC3 | 0.915 | 0.920 | 0.932 | 0.930 | 0.941 | 0.936 | 0.826 | 0.900 | 0.912 |
MW1 | 0.959 | 0.961 | 0.972 | 0.960 | 0.970 | 0.973 | 0.860 | 0.937 | 0.961 |
PC1 | 0.985 | 0.986 | 0.991 | 0.984 | 0.988 | 0.989 | 0.889 | 0.974 | 0.983 |
PC3 | 0.972 | 0.972 | 0.969 | 0.969 | 0.975 | 0.974 | 0.874 | 0.957 | 0.955 |
PC4 | 0.985 | 0.985 | 0.987 | 0.982 | 0.987 | 0.987 | 0.915 | 0.973 | 0.981 |
PC5 | 0.897 | 0.955 | 0.956 | 0.900 | 0.910 | 0.903 | 0.762 | 0.878 | 0.857 |
Average | 0.944 | 0.954 | 0.957 | 0.946 | 0.955 | 0.952 | 0.839 | 0.926 | 0.931 |
Accuracy comparison of the enhanced DF models (marked with *) against existing SDP models.
SDP Models | CM1 | KC1 | KC3 | MW1 | PC1 | PC3 | PC4 | PC5
---|---|---|---|---|---|---|---|---
* BaggedCSForest | 85.44 | 79.67 | 79.75 | 86.67 | 94.23 | 90.19 | 93.02 | 79.88 |
* BoostCSForest | 90.18 | 81.39 | 85.76 | 93.11 | 96.87 | 92.20 | 95.14 | 82.50 |
* BaggedFPA | 90.00 | 83.35 | 87.03 | 92.44 | 93.99 | 92.26 | 94.19 | 83.38 |
* BoostFPA | 90.06 | 83.35 | 87.03 | 93.33 | 95.83 | 92.26 | 94.60 | 83.38 |
* BaggedFT | 85.80 | 79.55 | 84.49 | 87.56 | 92.07 | 89.77 | 92.84 | 79.52 |
* BoostFT | 88.60 | 79.21 | 85.13 | 89.56 | 93.27 | 89.50 | 93.96 | 77.54 |
CG-NB [19] | 84.71 | 77.45 | 78.35 | 90.00 | 92.19 | 87.84 | 90.16 | 77.33 |
CG-DT [19] | 85.32 | 76.85 | 79.90 | 90.00 | 92.05 | 87.94 | 88.98 | 78.16 |
CG-kNN [19] | 85.32 | 77.62 | 80.41 | 90.40 | 91.90 | 86.99 | 89.09 | 76.45 |
BaggedLR [80] | 74.00 | - | 76.00 | - | 81.00 | 75.00 | 83.00 | 68.00 |
AdaboostSVM [80] | 75.00 | - | 77.00 | - | 79.00 | 74.00 | 81.00 | 68.00 |
kStar [81] | 77.55 | 72.20 | 75.86 | 82.67 | 86.27 | 82.59 | 81.89 | 69.88 |
CS-Forest [54] | 82.53 | - | 81.44 | 88.33 | 91.16 | 84.77 | 88.88 | - |
Rotation Tree [54] | 83.33 | - | 70.61 | 86.60 | 91.07 | 85.54 | 86.69 | - |
Dagging_NB [82] | 70.80 | 66.90 | 70.20 | 75.30 | 78.60 | 78.50 | 81.60 | 69.90 |
Dagging_DT [82] | 59.60 | 68.10 | 64.50 | 77.10 | 76.70 | 78.20 | 89.70 | 76.50 |
Dagging_kNN [82] | 61.10 | 67.90 | 62.30 | 72.20 | 78.50 | 75.30 | 84.40 | 76.40 |
AUC comparison of the enhanced DF models (marked with *) against existing SDP models.
SDP Models | CM1 | KC1 | KC3 | MW1 | PC1 | PC3 | PC4 | PC5
---|---|---|---|---|---|---|---|---
* BaggedCSForest | 0.955 | 0.898 | 0.920 | 0.961 | 0.986 | 0.972 | 0.985 | 0.955 |
* BoostCSForest | 0.962 | 0.883 | 0.932 | 0.972 | 0.991 | 0.969 | 0.987 | 0.956 |
* BaggedFPA | 0.962 | 0.907 | 0.941 | 0.970 | 0.988 | 0.975 | 0.987 | 0.910 |
* BoostFPA | 0.964 | 0.892 | 0.936 | 0.973 | 0.989 | 0.974 | 0.987 | 0.903 |
* BaggedFT | 0.929 | 0.858 | 0.900 | 0.937 | 0.974 | 0.957 | 0.973 | 0.878 |
* BoostFT | 0.943 | 0.855 | 0.912 | 0.961 | 0.983 | 0.955 | 0.981 | 0.857 |
CG-NB [19] | 0.704 | 0.732 | 0.723 | 0.731 | 0.885 | 0.846 | 0.937 | 0.803 |
CG-DT [19] | 0.717 | 0.723 | 0.731 | 0.723 | 0.861 | 0.834 | 0.925 | 0.806 |
CG-kNN [19] | 0.689 | 0.733 | 0.703 | 0.724 | 0.864 | 0.830 | 0.935 | 0.800 |
BaggedLR [80] | 0.650 | - | 0.660 | - | 0.770 | 0.740 | 0.870 | 0.680 |
AdaboostSVM [80] | 0.680 | - | 0.660 | - | 0.760 | 0.730 | 0.820 | 0.680 |
Stacking (NB, MLP, J48) [83] | - | - | - | - | 0.749 | - | - | - |
Stacking (NB, MLP, J48)+SMOTE [83] | - | - | - | - | 0.871 | - | - | - |
J48 [84] | 0.594 | 0.689 | - | - | 0.668 | - | - | - |
kStar [81] | 0.538 | 0.651 | 0.528 | 0.543 | 0.673 | 0.749 | 0.734 | 0.629 |
Dagging_NB [82] | 0.708 | 0.669 | 0.702 | 0.753 | 0.786 | 0.785 | 0.816 | 0.699 |
Dagging_DT [82] | 0.596 | 0.681 | 0.645 | 0.771 | 0.767 | 0.782 | 0.897 | 0.765 |
Dagging_kNN [82] | 0.611 | 0.679 | 0.623 | 0.722 | 0.785 | 0.753 | 0.844 | 0.764 |