1. Introduction
Software defect prediction (SDP) is a technique for improving software quality and reducing software testing costs by building classification models with various machine learning (ML) approaches. Many companies that develop various types of software want to foresee defects in order to maintain software quality for customer satisfaction and to save testing costs. SDP is part of the software development life cycle in which faults are predicted by applying ML to historical data [1]. It is a structured methodology that enables the creation of high-quality, low-cost software in the shortest possible time to meet customer expectations.
SDP’s mission is to deliver high-quality, dependable software while making efficient use of limited resources. As a result, software developers can prioritize the use of computing resources at each level of the software development process [3]. A wide range of ML approaches has been investigated to anticipate errors in software modules, enhance software quality and reduce software testing costs. The ML techniques that have been applied to SDP include Decision tree, Naïve Bayes (NB), Radial basis function, Support Vector Machine (SVM), K-nearest neighbor, Multi-layer perceptron and Random Forest (RF) […]. Newer advances in ML include ensembling techniques and the combination of ML techniques with feature selection methods such as principal component analysis (PCA) [6]. In the extant literature, several types of software metrics have been identified and used for SDP, and it is more practical to focus on the most important of them when predicting defects [7]. SDP analyses past data acquired from software repositories to assess the quality and reliability of software modules [8]. SDP models are built from software metrics collected from previously developed systems or similar software projects […].
The dataset used in this paper has been publicly available in the PROMISE repository since 2005 and provides information on various applications investigated by NASA (National Aeronautics and Space Administration). After dataset pre-processing and feature selection (FS), K-means clustering is used to perform the output categorization. ML approaches such as SVM, NB and RF are then applied with and without particle swarm optimization (PSO), and an ensemble approach is used to integrate their results. Finally, all ML models are analysed and compared with previous studies. The models’ performance is evaluated using precision, accuracy, recall, F-measure, performance error metrics and the confusion matrix.
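Precision, recall, accuracy and F-measure can all be derived from the four cells of the confusion matrix. The following is a minimal Python sketch, assuming binary labels in which `1` marks a defective module; the function names (`confusion_counts`, `safe_div`, `evaluate`) and the counts in the usage line are hypothetical and are not results from this study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN, treating `positive` as the defective class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # defective module correctly flagged
        elif pred == positive:
            fp += 1  # clean module wrongly flagged
        elif truth == positive:
            fn += 1  # defective module missed
        else:
            tn += 1  # clean module correctly passed
    return tp, fp, fn, tn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(tp, fp, fn, tn):
    """Precision, recall, accuracy and F-measure from confusion-matrix counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    f_measure = safe_div(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f_measure": f_measure}


# Hypothetical counts for illustration only.
scores = evaluate(tp=40, fp=10, fn=20, tn=130)
print(scores)
```

With these counts, precision is 0.8, recall is about 0.667, accuracy is 0.85 and the F-measure is 8/11 (about 0.727). The F-measure is the harmonic mean of precision and recall, so it stays low unless both are reasonably high.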
4. Related Work
In SDP, the most commonly used ML approaches are clustering, classification and deep learning. Researchers have proposed a variety of SDP models based on ML classifiers and statistical approaches. Iqbal et al. [4] used multiple classification algorithms to forecast software errors on twelve NASA datasets. The classifiers include NB, Radial basis function, K-nearest neighbor, Multi-layer perceptron, K-star, SVM, Decision tree, One rule and RF, and precision, accuracy, recall, F-measure and AUC were used to assess performance. Root cause analysis attempts to find the root causes of an issue in order to remedy it, which helps to find defects at an early stage. In [10], different types of clustering techniques are implemented for this purpose. Clustering is a form of unsupervised learning that groups data by similarity. The researchers applied various clustering techniques, implemented in WEKA, to find defects, and K-means clustering showed the best performance.
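To make the clustering step concrete, here is a minimal K-means (Lloyd's algorithm) sketch in plain Python: points are assigned to their nearest centroid, centroids are moved to the mean of their members, and the loop stops once assignments no longer change. It is not WEKA's implementation; it assumes numeric metric vectors that are already scaled and uses a deterministic farthest-first initialisation for reproducibility. The function name `kmeans` and the six sample points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster `points` (lists of floats) into k groups with Lloyd's algorithm."""
    # Farthest-first initialisation: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical scaled metric vectors (e.g. size and complexity) for six modules.
modules = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
labels, centroids = kmeans(modules, k=2)
print(labels, centroids)
```

Because K-means only finds a local optimum, the result depends on initialisation; random restarts or k-means++ seeding are common alternatives to the farthest-first choice made here.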
ML-based classification is also used for software bug prediction [11]. Artificial Neural Network (ANN), NB and Decision tree classifiers are used in this ML-based classification and achieved good results. The output of this approach has five classes, and it was evaluated on small datasets. The experiments were conducted on three public datasets using WEKA 3.6.9 as the ML tool, and the ML techniques performed well on this task. Perreault et al. used five distinct types of classifiers to discover software defects [12]: neural networks, logistic regression, NB, SVM and k-nearest neighbor. Performance was measured on five distinct datasets from NASA’s Metrics Data Program, and NB and SVM performed best on some of them. Poor software quality is caused by software faults; as a result, eliminating faults is critical for improving software quality.
Rawat and Dubey [13] present numerous models for improving software quality by studying the factors that affect it. They examined a variety of size and complexity measurements, as well as models such as Bayesian belief networks, genetic algorithms and neural networks. Surndha Naidu et al. [14] aimed to identify the total number of defects in order to save money and time. Volume, difficulty, effort, time estimator and program length were used to categorize the defects. They employed a decision tree classifier, namely the ID3 classification algorithm, to classify defects, and then used a pattern mining approach to classify faulty patterns. The proposed approach was implemented in Java.
The literature on prediction models focuses on extracting characteristics and utilizing various ML algorithms. An approach combining Control Flow Graphs (CFG) with a neural network is introduced in [15]. The first step of this model represents a software program as a graph: after the source code is compiled, the program’s CFG is built from the assembly code. A strong neural network on labeled graphs, a multilayer convolutional neural network, is then applied. The model’s performance is evaluated on four datasets.
For prediction, the research in [16] presents an ADBBO (Adaptive Dimensional Biogeography-Based Optimization) model combined with an RBFNN (Radial Basis Function Neural Network) model. Five publicly available datasets from the NASA data program are used in the experiments, and the authors compare their predictions with those of earlier studies on similar datasets. Class imbalance reduces the accuracy of defect prediction; to address it, Ren et al. [17] introduced kernel-based learning. The problem is handled using an AKPLSC (Asymmetric Kernel Partial Least Square Classifier) and an AKPCAC (Asymmetric Kernel Principal Component Analysis Classifier), both of which use a kernel function, namely a Gaussian function. Experiments were carried out on the SOFTLAB and NASA datasets.
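A Gaussian kernel is commonly written as k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), so it equals 1 for identical vectors and decays towards 0 as they move apart. The sketch below computes it and the resulting kernel (Gram) matrix in plain Python. The bandwidth `sigma`, the helper names and the sample vectors are assumptions for illustration; the exact parameterization used in [17] is not stated here.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(rows, sigma=1.0):
    """Gram matrix K with K[i][j] = k(rows[i], rows[j])."""
    return [[gaussian_kernel(a, b, sigma) for b in rows] for a in rows]


# Hypothetical, already-scaled metric vectors for three modules.
modules = [[0.20, 0.10], [0.25, 0.15], [0.90, 0.80]]
K = kernel_matrix(modules, sigma=0.5)
for row in K:
    print([round(v, 3) for v in row])
```

The first two modules have similar metrics, so their kernel entry is close to 1, while their entries with the third module are much smaller. Kernel classifiers operate on this matrix rather than on the raw metric vectors.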
In [18], Pooja Paramshetti et al. use K-means clustering and the Apriori technique: clustering is used to discretize the data, and Apriori is then used to extract rules or patterns from it. Experiments were carried out on NASA defect data, and the results are compared with existing techniques. Paper [19] proposes the CART technique, an optimized classification and regression tree, together with principal component analysis to reduce dimensionality. The authors note that the optimization run time is an overhead in this case, so further improvements can be achieved by shortening it.
A genetic algorithm with an upgraded deep neural network is proposed in [20], in which the genetic algorithm is applied for feature optimization. This hybrid technique is used to predict defects on four datasets from the PROMISE data repository and is implemented in MATLAB. The hybrid method is reported to achieve high accuracy.
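Genetic-algorithm feature selection typically encodes each candidate feature subset as a bitmask and evolves a population of masks through selection, crossover and mutation. The sketch below shows that loop in plain Python. It is a generic illustration, not the method of [20]: the truncation selection, one-point crossover, elitism and all parameter values are assumptions, and `toy_fitness` with its `relevance` scores is a hypothetical stand-in for what would normally be a classifier's validation accuracy on the selected features.

```python
import random


def genetic_feature_selection(fitness, n_features, pop_size=20, generations=40,
                              mutation_rate=0.1, seed=0):
    """Search feature bitmasks (1 = keep feature) with a simple genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:2]                # elitism: carry the two best masks over
        parents = ranked[:pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Hypothetical stand-in fitness: reward relevant features, penalise each kept one.
relevance = [0.9, 0.1, 0.7, 0.05, 0.6, 0.2]


def toy_fitness(mask):
    return sum(r for r, m in zip(relevance, mask) if m) - 0.3 * sum(mask)


best_mask, best_fit = genetic_feature_selection(toy_fitness, n_features=len(relevance))
print(best_mask, best_fit)
```

Elitism guarantees that the best mask found so far is never lost between generations; the size penalty in the toy fitness mimics the goal of keeping only the most informative metrics.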
The ensemble method in [21] is based on feature selection, and the proposed framework was implemented both with and without FS on twelve cleaned public NASA datasets. Its findings are compared with those of different classifiers; although the results improve, the class imbalance problem persists. Different ML algorithms are examined in [22], including artificial neural network, decision tree, NB, PSO and a linear classifier. The experiments were performed with the KEEL tool on seven datasets from NASA’s data repository, and the linear classifier outperformed the others on four of them.
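One simple way for an ensemble to integrate the outputs of several classifiers is majority voting over their per-module predictions. The sketch below assumes hard (label) predictions and is only an illustration; the combination rules used in [21] and in this paper's ensemble are not specified here, and the prediction lists are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine equal-length label lists, one per classifier, by majority vote.

    Ties go to the label first seen in classifier order, because
    Counter.most_common orders equal counts by first insertion.
    """
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all prediction lists must have the same length")
    combined = []
    for i in range(n):
        votes = Counter(p[i] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical predictions (1 = defective) from three classifiers on four modules.
svm_pred = [1, 0, 0, 1]
nb_pred = [1, 1, 0, 0]
rf_pred = [0, 1, 0, 1]
ensemble = majority_vote([svm_pred, nb_pred, rf_pred])
print(ensemble)
```

Using an odd number of classifiers avoids ties for binary labels; weighted or probability-averaging schemes are common alternatives when classifiers differ in reliability.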
The ANN approach is combined with the Artificial Bee Colony (ABC) approach in [23], where the ABC algorithm is used to optimize the training of the artificial neural network. The data suffer from class imbalance, and consequently the results are imbalanced. Five separate datasets from the NASA Data Program were used, and performance was evaluated with accuracy, AUC and other metrics. A mixed strategy is adopted in [24], in which a genetic algorithm is applied together with a decision tree classifier, and the fitness function is applied to the features optimized by the genetic algorithm. That study uses three PROMISE repository datasets, and the experiments are carried out in MATLAB. Alternative approaches to estimating the probability of defects are discussed in [25]; however, most of these experiments rely on predicting defects from a wide variety of device functions.
In the review in [26], the authors discuss software metrics and datasets for predicting defects. In the ML approach, defect prediction mostly relies on information extracted from software archives, which contain messages as well as source code. An instance can be a method, class, source file, package or code change, and it carries features obtained from the software archives; instances may be labeled or unlabeled. Metric values characterize the complexity of the software and of its development. Metrics play an important part in any prediction model, helping to improve the consistency of software by detecting as many defects as possible. Metrics are divided into two groups: code metrics, which describe the complexity of the code, and process metrics, which describe the complexity of development. The most commonly used code metrics are lines-of-code metrics. A defect dataset is essential for predicting defects; some early studies used non-public datasets, while publicly available datasets include NASA, SOFTLAB, PROMISE, ReLink, AEEEM, ECLIPSE 1 and ECLIPSE 2. Many evaluation measures are used for SDP, e.g., probability of defect, true positive (TP) rate, false negative (FN) rate, true negative (TN) rate, false positive (FP) rate, precision, accuracy, G-measure, F-measure and AUC. With new advances in research, ML algorithms for SDP have also improved.
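The TP rate and FP rate are often reported in the SDP literature as pd (probability of detection) and pf (probability of false alarm), and one common definition of the G-measure is the harmonic mean of pd and 1 - pf. The sketch below computes these from confusion-matrix counts; the function names and the counts are hypothetical, and other papers may define the G-measure differently (for example as a geometric mean).

```python
def detection_rates(tp, fp, fn, tn):
    """Return (pd, pf): the TP rate and FP rate from confusion-matrix counts."""
    pd = tp / (tp + fn) if tp + fn else 0.0  # TP rate (probability of detection)
    pf = fp / (fp + tn) if fp + tn else 0.0  # FP rate (probability of false alarm)
    return pd, pf


def g_measure(pd, pf):
    """Harmonic mean of pd and (1 - pf); one common G-measure definition in SDP."""
    specificity = 1.0 - pf  # the TN rate
    denom = pd + specificity
    return 2.0 * pd * specificity / denom if denom else 0.0


pd, pf = detection_rates(tp=40, fp=10, fn=20, tn=130)  # hypothetical counts
print(round(pd, 3), round(pf, 3), round(g_measure(pd, pf), 3))
```

For these counts pd is 2/3, pf is 1/14 and the G-measure is 52/67 (about 0.776). Unlike accuracy, the G-measure drops sharply when a model flags few defective modules, which makes it more informative on imbalanced defect datasets.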
In [5], a hybrid technique is used to predict faults: the authors combine a feature selection approach with various ML classifiers. The Optimized Artificial Immune Networks (Opt-aiNet) method is used for feature selection, and ML classifiers are used to compare the results. Five separate datasets from the PROMISE repository are used, with accuracy and AUC as evaluation metrics, and performance improved after feature selection. The work in [27] proposes a remedy for the class imbalance problem: principal component analysis is used to select features, and ANFIS is employed for prediction in the cost-sensitive imbalance setting. This method improves the area under the ROC curve by 5%.
In another study, Alsaeedi et al. [28] used three classifiers, Decision tree, RF and SVM, together with the ensemble approaches AdaBoost and Bagging to anticipate flaws. SMOTE sampling was used for the imbalanced data, and ten NASA datasets were used in the experiment. Among all approaches, RF and AdaBoost with RF performed best. Combining multiple classifiers is a popular topic these days. The article in [29] also used an ensemble technique for defect prediction: six algorithms were applied to five different datasets from the NASA MDP repository, SMOTE was employed for sampling to improve data quality, and the experiments were performed with the WEKA tool. RF was the best ensemble algorithm.
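SMOTE (Synthetic Minority Over-sampling Technique) balances a dataset by creating new minority-class samples on the line segments between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified, SMOTE-style version in plain Python that picks base samples at random; it is an assumption-laden illustration rather than the exact procedure used in [28] or [29], and the function name and the defective-module vectors are hypothetical. In practice, libraries such as imbalanced-learn provide a tested `SMOTE` implementation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        # new point on the segment between base and neighbour
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Hypothetical defective-module metric vectors (the minority class).
defective = [[10.0, 3.0], [12.0, 4.0], [11.0, 5.0], [30.0, 9.0]]
new_samples = smote(defective, n_synthetic=4, k=2)
print(new_samples)
```

Because every synthetic point is an interpolation between two real minority samples, oversampling should be applied only to the training split, so that synthetic points do not leak into the evaluation data.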
The work in [30] employs tree-based ensembles: seven ensemble methods are applied to four open-source NASA Metrics Data Program datasets, two based on bagging and five on boosting. Bagging performs well, whereas AdaBoost performs poorly. In [31], different clustering algorithms are applied, the resulting clusters are integrated into a single model, and PSO is employed to increase software quality. In [32], the NB classifier is used in conjunction with ARM, which selects features, to improve software quality; five NASA datasets are used for testing, and performance is assessed against other methods. The review in [33] covers different ML approaches and frameworks used with different metrics to eliminate defects, examining 40 studies published in various journals between 2009 and 2018. Despite all the work done in this field, there is still room for improvement because of imbalanced datasets and ambiguities.
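Particle swarm optimization searches a continuous space with a population of particles whose velocities are pulled towards each particle's own best position and the swarm's global best. The sketch below is a basic global-best PSO in plain Python. The inertia and acceleration values (`w`, `c1`, `c2`), the bounds and the names are assumptions, and the `sphere` test function is a hypothetical stand-in; when PSO tunes a defect classifier, the objective would instead be something like cross-validated error as a function of the model's hyperparameters.

```python
import random


def pso(objective, dim, bounds, n_particles=30, iterations=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # move, then clamp the particle inside the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Hypothetical test objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, best_val)
```

The inertia weight `w` below 1 damps the velocities so the swarm gradually contracts around the best region, while `c1` and `c2` balance each particle's own experience against the swarm's.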
The authors of [6] analyzed five public datasets from the PROMISE repository using ten ML classifiers and evaluated them by accuracy. The survey in [34] investigated deep learning techniques for defect prediction and found that none of the methods consistently achieved high accuracy, recall and precision. A systematic literature review (SLR) was used in [35] to track current developments in ensemble and hybrid techniques and to present a strategy for boosting performance. Forty-six papers from reputable online libraries were shortlisted. According to the study, FS and data sampling improve outcomes, performance is assessed with evaluation metrics, and the ensemble strategy outperforms the others.
The literature shows that various ML techniques have been applied, but their performance varies across datasets and their accuracy remains limited. Therefore, we aim to improve accuracy by analysing various ML techniques combined with FS and K-means clustering, with the goal of improving on the accuracy reported in previous studies.