Developing a Machine Learning-based Software Fault Prediction Model using Improved Whale Optimization Algorithm

Software fault prediction (SFP) is vital for ensuring software system reliability by detecting and mitigating faults. Machine learning has proven effective in addressing SFP challenges. However, extensive fault data from historical repositories often leads to dimensionality issues due to numerous metrics. Feature selection (FS) helps mitigate this problem by identifying key features. This research enhances the Whale Optimization Algorithm (WOA) by combining truncation selection with a single-point crossover method to improve exploration and avoid local optima. Evaluating the enhancement on 14 SFP datasets from the PROMISE repository reveals its superiority over the original WOA and other variants, demonstrating its potential for improved SFP.


Introduction
SFP greatly aids in producing high-quality software at a low cost by identifying fault-prone software modules [1]. Machine learning algorithms such as decision trees, Bayesian learners, neural networks, support vector machines, and rule-based learning have shown promise, as have soft computing approaches such as fuzzy computing, neural networks, evolutionary computing, and swarm intelligence [2].
Feature selection is frequently used to improve the SFP performance of machine learning (ML) algorithms, with the aim of increasing data processing effectiveness and avoiding algorithmic error [3]. This is often done using metaheuristic algorithms such as genetic algorithms and particle swarm optimization [4]. Among these metaheuristic approaches, the Whale Optimization Algorithm (WOA) has emerged as a promising choice for feature selection. However, WOA is susceptible to becoming trapped in local optima, which is a particular challenge on large datasets.
This study addresses the local optima problem in WOA for feature selection in SFP by introducing the truncation selection method. Building on recent advancements in WOA variants [5], the research investigates the effectiveness of truncation selection as an enhancement to WOA's selection scheme. This novel approach aims to improve WOA's performance in SFP, offering a potential solution to the local optima challenge.
In summary, the research aims to contribute to the field of software fault prediction by leveraging metaheuristic algorithms, specifically WOA, in combination with the novel truncation selection method, building upon previous advancements to address local optima challenges and enhance the effectiveness of machine learning algorithms in software fault prediction.

Review of Related Work
A comprehensive review of literature related to Software Fault Prediction (SFP), Machine Learning (ML)-based SFP, feature selection, and metaheuristic algorithms for software fault prediction is presented. This review aims to provide a foundational understanding of existing knowledge in this field to support the development and evaluation of the proposed methodology.
A study by [4] brings to the fore the traditional techniques used in SFP, encompassing software metrics, Soft Computing (SC), and Machine Learning (ML). While these approaches have significantly contributed to early fault prediction and the development of dependable software, they still have limitations in predicting certain types of faults. Additionally, they can be time-consuming, particularly when applied to complex software projects, potentially diminishing software testing effectiveness.
Feature selection (FS) has become a significant step in data mining in general and machine learning in particular, since it helps to clean data by removing noisy, irrelevant, and redundant features [11]. A study by [12] developed a novel FS method called evolving populations with mathematical diversification (FS-EPWMD), which uses arithmetic diversification among candidate solutions to avoid the local optimum. The guiding principle of populations evolving through crossover and mutation is the survival of the fittest. The results demonstrated that FS-EPWMD outperforms other models.
Swarm intelligence (SI) is one of the computational intelligence techniques used to resolve complicated problems [13]. Swarm intelligence algorithms have demonstrated excellent performance in lowering the running time and addressing the FS problem. For instance, in order to solve the FS problem in the area of software fault prediction, [14] proposed the island model to improve the BMFO. The EBMFO with an SVM classifier produced the best results overall. These findings show that the suggested model can be a useful predictor for the software fault issue.

Proposed Work
This section presents the research workflow and outlines the proposed methodology employed in the study, covering the enhancement of the algorithm and subsequent performance evaluation.

The Research Workflow
The proposed model works in four phases, namely literature review, methodology, implementation, and evaluation and results. The flow runs from the literature review down to evaluation and results, and each phase is represented with its activities. Figure 1 represents the entire workflow of the model.
In the first phase, we conducted a comprehensive literature review, exploring 42 articles related to Software Fault Prediction (SFP), ML-based SFP, Feature Selection (FS), and selection schemes. The second phase highlighted the main components of the proposed ML-based SFP model and the enhanced WOA. ML-based SFP handles prediction, while the enhanced WOA aims to improve its performance. In the third phase, we implemented the ML-based SFP model using Google Colab and replicated baseline work in the same environment for comparison. To classify FS problems, four well-known classifiers were employed: Support Vector Machine (SVM), Decision Tree (DT), Linear Discriminant Analysis (LDA), and K-Nearest Neighbors (KNN). In the final phase, the proposed model's performance was evaluated using the following metrics: Area Under Curve (AUC), precision, recall, F1 score, and accuracy. A cross-validation technique was also used to assess the performance of the model, where 80% of the dataset was used for training and 20% for testing.
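The evaluation metrics named in the final phase can be computed directly from predicted labels and scores. The following is a minimal sketch in plain Python, not the study's actual evaluation code: the function names, the seed, and the toy labels are illustrative assumptions, and faulty modules are assumed to carry the label 1. The split helper performs a single 80/20 hold-out split.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 80% train / 20% test (hold-out)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1; label 1 = faulty (assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc(y_true, scores):
    """AUC as the chance that a faulty module outscores a clean one.

    Ties between a faulty and a clean score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]
    print(classification_metrics(y_true, y_pred))
    print(auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch; a library routine would normally be preferred on full datasets.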

The Proposed ML-based SFP Model
This section details the research methodology for ML-based Software Fault Prediction (SFP). Figure 2 presents the proposed model diagram, which works in five stages: data collection, data pre-processing, feature selection, machine learning classifiers, and evaluation.
Stage 1 involves gathering 14 datasets from the PROMISE dataset repository, with details provided in Table 1. In Stage 2, data pre-processing was used to bring the dataset into a form suitable for training and validation. Stage 3 employs the Whale Optimization Algorithm (WOA) for feature selection, with a focus on improving its selection scheme using truncation selection. Stage 4 deploys four ML classifiers (DT, KNN, LDA, SVM) to predict software faults, enhancing the WOA's performance in feature selection. In the final stage, we evaluated the model using the evaluation metrics mentioned above.
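For orientation, the position updates of the standard WOA, as they are usually stated in the WOA literature, are reproduced below. This is background only and not the enhanced variant proposed in this work. The symbols are not defined in this paper and are introduced here as assumptions for illustration: $\vec{X}^{*}(t)$ is the best solution found so far, $\vec{a}$ decreases linearly from 2 to 0 over the iterations, $\vec{r}$ is a random vector in $[0, 1]$, $b$ is a constant that shapes the spiral, and $l$ is a random number in $[-1, 1]$.

```latex
\begin{align}
\vec{D} &= \left| \vec{C} \cdot \vec{X}^{*}(t) - \vec{X}(t) \right|, &
\vec{X}(t+1) &= \vec{X}^{*}(t) - \vec{A} \cdot \vec{D} \\
\vec{A} &= 2\,\vec{a} \cdot \vec{r} - \vec{a}, &
\vec{C} &= 2\,\vec{r} \\
\vec{D}' &= \left| \vec{X}^{*}(t) - \vec{X}(t) \right|, &
\vec{X}(t+1) &= \vec{D}' \, e^{bl} \cos(2\pi l) + \vec{X}^{*}(t)
\end{align}
```

The first update (encircling) and the last one (spiral) are typically chosen with equal probability, and when $|\vec{A}| \geq 1$ the best solution in the encircling update is replaced by a randomly chosen whale to encourage exploration. Because the exploitation updates pull every whale toward the current best solution, a poor early best can stall the search, which is the local optima issue that the improved selection scheme targets. How continuous positions are mapped to binary feature masks is not specified here and is left out of this sketch.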

The Proposed Enhanced Whale Optimization Algorithm
This work employs the Whale Optimization Algorithm (WOA) and enhances it by incorporating truncation selection to improve its selection scheme, as illustrated in Figure 3. In Figure 3, we address the challenge of the best solution becoming stuck in local optima by combining WOA with truncation selection and a single-point crossover approach. This enhancement primarily focuses on improving the selection part of WOA, where truncation selection is employed. In the truncation selection process, individuals are ranked based on their fitness values, and only the best-performing individuals are chosen as parents for the next generation. This selection is governed by a primary truncation selection parameter known as the TRS threshold, which can vary between 10% and 50% and gives the percentage of the population that will serve as parents. Individuals falling below this threshold are eliminated, as they are considered unfit for reproduction. The truncation selection steps are indicated below.
• The population is sorted based on each individual's evaluation scores.
• The poorest-performing fraction of the population is removed.
• The eliminated individuals are replaced with variations of individuals from the top-performing fraction, with each of the best individuals creating one offspring. These offspring subsequently replace one of the previously removed, lower-performing individuals in the population.

This section is devoted to the presentation of the experimental results.
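The truncation selection steps listed above, combined with single-point crossover, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: individuals are assumed to be binary feature masks, lower fitness is assumed to be better, the way two parents are paired for crossover is an assumption, and all function names are hypothetical. The toy fitness in the usage note simply counts selected features and stands in for a classifier-based score.

```python
import random


def truncation_select(population, fitness, threshold=0.5):
    """Sort by fitness (lower is better, assumed) and keep the top fraction.

    At least two parents are kept so that crossover stays possible.
    """
    if len(population) < 2:
        raise ValueError("population needs at least two individuals")
    ranked = sorted(population, key=fitness)
    keep = max(2, int(len(ranked) * threshold))
    return ranked[:keep]


def single_point_crossover(parent_a, parent_b, rng):
    """Head of parent_a joined to the tail of parent_b at a random cut.

    Both parents must have the same length, which must be at least 2.
    """
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def next_generation(population, fitness, threshold=0.5, rng=None):
    """Drop the worst individuals and refill with offspring of the best."""
    rng = rng or random.Random()
    parents = truncation_select(population, fitness, threshold)
    offspring = []
    # One offspring per removed individual; parents take turns in rank order.
    for i in range(len(population) - len(parents)):
        a = i % len(parents)
        # Mate choice is an assumption: any other surviving parent.
        b = rng.choice([j for j in range(len(parents)) if j != a])
        offspring.append(single_point_crossover(parents[a], parents[b], rng))
    return parents + offspring
```

As a usage example, `next_generation(masks, sum, threshold=0.5, rng=random.Random(42))` keeps the half of `masks` with the fewest selected features and refills the population to its original size. With a 50% threshold each surviving parent produces exactly one offspring, which matches the replacement step described above.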

Figure 3: Flowchart of the proposed enhanced WOA

Implementation Environment
The work was implemented in the Google Colab environment, using Python with the Pandas and TensorFlow libraries.

Proposed Model Performance
The proposed enhanced WOA was evaluated by iteratively enhancing a population of candidate solutions using the truncation selection method, crossover, and coefficient update procedures. The results obtained from the experiments revealed promising outcomes as shown in

Figure 4: Comparison of results of WOA implemented with different selection schemes with KNN