Software Refactoring Prediction Using SVM and Optimization Algorithms

Test suite code coverage is often used as an indicator of a test suite's capability to detect faults. However, earlier studies that explored the correlation between code coverage and test suite effectiveness have not addressed this correlation from an evolutionary perspective. Moreover, some of these works addressed only small systems, or systems from the same domain, which makes it unclear whether their results generalize to systems from other domains. Software refactoring has a positive effect on software maintainability and understandability. It aims to enhance software quality by modifying the internal structure of a system without affecting its external behavior. However, identifying refactoring needs, and the level at which refactoring should be applied, is still a major challenge for software developers. In this paper, the authors explore the effectiveness of employing a support vector machine (SVM) along with two optimization algorithms to predict software refactoring at the class level. In particular, the SVM was combined with genetic and whale algorithms. A well-known dataset of open-source software systems (ANTLR4, JUnit, MapDB, and McMMO) was used in this study. Accuracy ranged from 84% for the SVM–Junit system to 93% for McMMO − GA + Whale + SVM, and the F-measure ranged from 86% for the SVM–Antlr4 system to 96% for McMMO − GA + Whale + SVM. These results show that merging the SVM with the two optimization algorithms adds value. Moreover, the results of the proposed approach were compared with those of four well-known ML algorithms (NB-Naïve, IBK-Instance, RT-Random Tree, and RF-Random Forest), and the proposed approach outperformed all of them.


Introduction
In any business sector, the quality of a product or service matters, and this quality often depends on the process followed to build it [1]. Today's world depends heavily on software technology, and high-quality software systems have been in great demand for the past few decades. High-quality software is expected, above all, to be reliable and to fit well within its ecosystem. This is achieved by reducing bugs and failures in the software. Such bugs can slow down the software's response and degrade the user experience, harming its performance. Errors in a system cause faults, and the faults subsequently cause system failures [1–5]. Altering a software system in a way that improves its internal structure without affecting its external behavior is known as refactoring [6]. Although refactoring does not change external behavior, it can indirectly benefit externally perceived qualities such as user experience and interfaces.
During a software application's life cycle, the software is continuously changed to add new features or to modify existing ones as requirements change. To keep satisfying stakeholders, developers must reflect the stakeholders' evolving requirements in the software. Software maintenance is known to be the most expensive phase of the software development life cycle [7]. Maintenance activities usually happen incrementally and can be carried out to add or modify functionality, or to restructure the design for a better user experience. Without periodic design correction activities, the system's quality degrades [8].
When software developers receive new demands or requests, they modify the software to accommodate them; restructuring the existing code as part of this work is known as software refactoring [9]. Software refactoring modifies the internal structure of the software without altering its external functionality [8,9]. It is employed to enhance understandability, reduce complexity, and increase the maintainability of the targeted software [10,11].
Software refactoring can change the software at three levels, from lowest to highest: variables, methods, and classes. The primary aim of refactoring is to make the code more maintainable without changing its semantics [12]. Refactoring is nevertheless a highly challenging task, particularly when developers must identify the level at which to refactor, all the code fragments that need refactoring, and the refactoring methods to use. These challenges arise from the significant functional limitations of software repositories and the type of data they contain [1]. Hence, many studies have called for refactoring prediction/recommendation systems to assist in software evolution tasks [6,11,13–16].
Although the refactoring task is generally dependent on the software developers' skills and insights, this process may still be supported by refactoring prediction/recommendation systems. These prediction systems facilitate the process of detecting the classes or methods that need refactoring.
To the best of our knowledge, this article makes a new research contribution: it presents class-level refactoring prediction for four open-source Java-based systems (ANTLR4, JUnit, MapDB, and McMMO) using a support vector machine (SVM) and two optimization algorithms, a genetic algorithm (GA) and the whale algorithm (WA). The SVM is applied both stand-alone and integrated with the optimization algorithms to predict refactoring needs at the class level. The main problem that software practitioners encounter is recognizing which code segment has to be refactored; therefore, this paper focuses on the use of SVM and optimization methods for this task. By repeating the experiments until the best iteration is reached, we can determine which technique performs better, leading to better results in terms of software quality. These experiments then support suggestions and conclusions regarding the studied refactoring prediction methods and algorithms.

Related Work
To predict defects in software, researchers and developers also apply machine learning approaches to software systems, including real-time systems; well-known examples of such systems include telecontrol/telepresence, robotics, and mission planning systems. Many studies have been conducted in the field of software fault prediction, using optimization, machine learning, and classification techniques [3]. Several procedures have been employed to examine the defects present in software, but so far no technique has been reported to achieve consistently high accuracy.
As discussed, various types of refactoring exist. The main refactoring process involves modifying classes, methods, and variables. In doing so, developers must also address an important aspect: identifying all the code elements or segments in a large, complex software system that require refactoring.
In this respect, support vector machines (SVMs) are highly popular among software developers and testers. An SVM classifies data into predefined classes by computing a separating hyperplane in a high-dimensional space [24–26]; in other words, it is a machine learning technique for classification. SVMs are also used for feature selection, where they tend to reduce computation time and improve prediction performance. Because SVM-based feature selection improves prediction accuracy and helps reveal which factors are crucial for evaluating performance, many researchers use it in their work.
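To make the hyperplane idea concrete, the following is a minimal sketch of a linear soft-margin SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss. The paper does not state which kernel or solver its MATLAB experiments used, so the solver, the function names, and the parameters (`lam`, `epochs`) are illustrative assumptions; the bias is handled by appending a constant feature to every sample.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    Returns the weight vector for the augmented input [x, 1], so its last
    component acts as the bias of the hyperplane w.x + b = 0.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]  # append a constant feature for the bias
    w = [0.0] * len(data[0])
    t = 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # step size shrinks over time
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrinkage
            if margin < 1.0:  # hinge loss active: push toward the correct side
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A kernel SVM replaces the dot product with a kernel function; the linear case is shown only because it fits in a few lines.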
Refactoring has been studied extensively in the literature. Fowler initiated this effort with the first catalog of 72 refactoring types, accompanied by a guide [9]. Simon et al. [27] proposed an approach to generate visualizations that help developers identify bad smells.
Several studies have examined the prediction of faults in software using object-oriented metrics. Their results show that object-oriented metrics produce significantly better outcomes than static code metrics, because object-oriented metrics capture different structural characteristics, such as coupling, cohesion, inheritance, encapsulation, complexity, and size [3,11,17,27].
An early survey [8] shed light on refactoring by discussing refactoring activities, techniques, and tools. Its authors discussed how they believe refactoring can improve software quality in the long run. Most existing research studies are based on rule-based, machine learning, or search-based approaches. A systematic literature review (SLR) [28] discusses how researchers are increasingly interested in automatic refactoring techniques. Its results suggest that source-code-based approaches are far more studied than model-based ones. The results also show that search-based approaches are more popular, and that researchers have recently explored more machine learning approaches to help experts discover refactoring needs.
Mariani and Vergilio [29] conducted an SLR of search-based refactoring approaches. They observed that evolutionary algorithms, specifically genetic algorithms, were the most used algorithms. Mohan and Greer [30] investigated search-based refactoring in more depth, covering tools, metrics, and evolution, since their focus was software maintenance. They also found that the evolutionary algorithms were the most used.
Moreover, Shepperd and Kadoda [31] used simulation to compare software prediction techniques: stepwise regression, rule induction (RI), case-based reasoning (CBR), and artificial neural networks (ANN). They compared these prediction models against actual software data in terms of accuracy, explanatory value, and configurability, and found that CBR and RI had an advantage over ANN, with CBR favored overall.
Azeem et al. [32] conducted a systematic literature review to summarize research on machine learning (ML) algorithms for code smell prediction. Their review included 15 studies involving code smells and prediction models. According to their results, decision trees and SVM are the most widely used ML algorithms for code smell detection, and JRip and Random Forest are the most effective in terms of performance.
In addition to this, Liu et al. [33] describe a tool that uses conceptual relationship, implementation similarity, structural correspondence, and inheritance hierarchies to identify potential refactoring opportunities in the source code of open-source software systems.
Liu et al. [33] also showed that machine learning models that predict highly defect-prone classes could be built using static measures and defect data collected at the class level.
Tsantalis and Chatzigeorgiou [34] reported a way to identify refactoring suggestions based on polymorphism. Their main focus was the detection and elimination of state-checking problems in Java programs, with tool support deployed as an Eclipse plug-in. In 2007, Ng and Levitin proposed correcting faults in addition to predicting the faulty parts of software. To achieve this, they applied a genetic algorithm and a number of neural networks iteratively, with the genetic algorithm used to increase the performance of the prediction model. Erturk and Sezer [5] analyzed the evolution of object-oriented source code at the class level, focusing on refactoring events through an approach based on a vector space model. Applying this approach to open-source software produced a list of class refactoring operations.
Processes 2022, 10, 1611 4 of 10
Another study by Caldeira et al. [1] investigated the effects of aspects such as dataset size and metric sets, which had not been researched before. Random forests and artificial immune systems were used as machine learning methods, and a dataset was collected from the PROMISE repository. The study found that the choice of algorithm was much more important than the choice of metrics [1].

Methodology
In this section, the authors present the technique developed for predicting software refactoring using a support vector machine classifier and two optimization algorithms. The approach comprises four main phases. In the first phase, the collected datasets are pre-processed. In the second phase, the GA and WA optimization algorithms and the SVM classifier are applied to the processed datasets to predict refactoring opportunities. In the third phase, the results are evaluated with the Wilcoxon signed-rank test [35]. In the last phase, the results are compared to determine the best overall approach. Figure 1 depicts the main phases of the proposed technique.
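The third phase relies on the Wilcoxon signed-rank test. The sketch below computes the statistic W = min(W+, W-) and a two-sided p-value from the normal approximation, assuming paired scores (for example, per-run accuracies of two approaches on the same dataset); the paper does not specify which pairs were compared. Zero differences are discarded and tied absolute differences receive their average rank, but the tie correction to the variance is omitted. For very few pairs, an exact test such as `scipy.stats.wilcoxon` is preferable.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    a, b: paired measurements of equal length.
    Returns (W, p_value) with W = min(W+, W-).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences, giving tied values their average (1-based) rank
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p_value
```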

Datasets and Pre-Processing
In this work, a well-known dataset was used. It includes empirical refactoring occurrences of four open-source software systems (JUnit, McMMO, MapDB, and ANTLR4) [36] and is available at the PROMISE Repository, which makes the experiment easily reproducible. The studied attributes of the datasets are shown in Table 1. During pre-processing, all unnecessary attributes, such as Long-Name, Parent, path, and Component, were deleted. Moreover, the Boolean class labels were encoded numerically: false became 0 and true became 1.
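The pre-processing step can be sketched as follows. The four dropped attribute names come from the text; the label column name (`label` in the test data) and the helper names are hypothetical, since the actual columns are listed only in Table 1.

```python
import csv
import io

# Attributes that the text reports deleting during pre-processing
DROP_COLUMNS = {"Long-Name", "Parent", "path", "Component"}


def rows_from_csv_text(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows, label_column):
    """Drop identifier-like attributes and encode the Boolean label as 0/1.

    label_column is a hypothetical name; the real one is given in Table 1.
    """
    cleaned = []
    for row in rows:
        record = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
        label = str(record.pop(label_column)).strip().lower()
        if label not in ("true", "false"):
            raise ValueError(f"unexpected label value: {label!r}")
        record[label_column] = 1 if label == "true" else 0  # false -> 0, true -> 1
        cleaned.append(record)
    return cleaned
```

Metric values are left as strings here; converting them to numbers before training is a separate step.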

Classifiers and Optimization Algorithms
An optimization algorithm is an iterative process that compares candidate solutions until an optimum is found. A classifier is an algorithm that maps input data to a specific category. A classifier model can individually assess the properties of the software under inspection [37].
One well-known classifier and two optimization algorithms were used in this study, and several experiments were performed to find the combination of these algorithms that gives the best refactoring prediction accuracy.

GA with SVM
A genetic algorithm (GA) is a heuristic, search-based optimization technique. GAs are commonly used in optimization, classification, and regression problems [38]. They solve both constrained and unconstrained optimization problems through a process modeled on natural selection, and they are widely used in software fault prediction. In this stage, the genetic algorithm was integrated with an SVM classifier and applied to the four datasets, with the number of iterations set to 51.
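The paper does not say what the GA chromosome encodes. One common choice, assumed here, is wrapper feature selection: each chromosome is a binary mask over the metrics, and its fitness would be the cross-validated accuracy of an SVM trained on the selected metrics. The sketch below uses tournament selection, single-point crossover, bit-flip mutation, and elitism, with 51 generations to mirror the iteration setting above; the function name and all rates are hypothetical.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=51,
                              crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Binary-mask GA: gene j = 1 keeps metric j, 0 drops it (needs n_features >= 2).

    fitness maps a mask (list of 0/1) to a score to maximize, e.g. the
    cross-validated accuracy of an SVM trained on the selected metrics.
    Returns (best_mask, best_score, best_score_per_generation).
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]

    def tournament():
        # Binary tournament selection: the fitter of two random masks wins
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, fitness(best), history
```

Because the best mask is always carried over, the best fitness never decreases between generations. In a real run, fitness values should be cached, since each evaluation would retrain an SVM.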

Whale Algorithm with SVM
The whale algorithm's structure is inspired by the behavior of whales. It uses a population of solutions to discover the optimal solution. Its main idea differs from that of other algorithms in that it employs two opposite solutions, the best and the worst, to ascertain the optimal situation [39]. The whale algorithm is a relatively new optimization algorithm and is also used in our work. In this stage, we integrated the whale algorithm with an SVM classifier and applied it to the four datasets, again with 51 iterations.
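For comparison, the sketch below implements the standard whale optimization algorithm as published by Mirjalili and Lewis (2016), which alternates between encircling the best solution, a spiral "bubble-net" move around it, and exploration around a randomly chosen whale. This standard form may differ from the variant described above. It minimizes an arbitrary objective over a box, uses scalar coefficients A and C per move (a common simplification), and its function and parameter names are assumptions.

```python
import math
import random


def whale_optimize(objective, dim, lower, upper, n_whales=20, iterations=51,
                   b=1.0, seed=0):
    """Standard whale optimization algorithm, minimizing `objective` on [lower, upper]^dim."""
    rng = random.Random(seed)
    clip = lambda v: min(max(v, lower), upper)
    whales = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_score = objective(best)
    for t in range(iterations):
        a = 2.0 - 2.0 * t / iterations  # decreases linearly from 2 toward 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore around a random one
                ref = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - xj) for r, xj in zip(ref, x)]
            else:
                # Bubble-net attack: logarithmic spiral around the best solution
                l = rng.uniform(-1.0, 1.0)
                new = [abs(bj - xj) * math.exp(b * l) * math.cos(2 * math.pi * l) + bj
                       for bj, xj in zip(best, x)]
            whales[i] = [clip(v) for v in new]
            score = objective(whales[i])
            if score < best_score:  # keep the best solution found so far
                best, best_score = whales[i][:], score
    return best, best_score
```

With the same seed, a run with `iterations=0` returns the best initial whale, so comparing it with a full run shows how much the search improved the starting population.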

GA and Whale Algorithms with SVM
In this stage, the three algorithms were integrated: we first applied the GA to the dataset for 17 iterations, then the whale algorithm for 17 iterations, and finally the SVM for another 17 iterations, for a total of 51 iterations.
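The paper does not describe how the three stages exchange information. Assuming each stage is seeded with the previous stage's output, the split of the 51-iteration budget into three stages of 17 can be sketched as follows; the stage callables are hypothetical placeholders for the GA, whale, and SVM steps.

```python
def run_staged(stages, total_iterations=51, initial=None):
    """Split an iteration budget evenly across sequential stages.

    Each stage is a callable (state, iterations) -> state; the output of one
    stage seeds the next (e.g. GA -> whale -> SVM).
    """
    if not stages or total_iterations % len(stages) != 0:
        raise ValueError("the budget must divide evenly among the stages")
    per_stage = total_iterations // len(stages)  # 51 // 3 = 17
    state = initial
    for stage in stages:
        state = stage(state, per_stage)
    return state
```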

Results and Discussion
The proposed approach was empirically assessed on the four studied systems, with all experiments conducted in MATLAB. The genetic algorithm was merged with the support vector machine classifier and applied to the four datasets, with the number of iterations fixed at 51. The same experiment was repeated for the other approaches. Tables 2-4 summarize the prediction performance results for the datasets in terms of accuracy, STD, and F-measure, respectively, and the four developed approaches were compared. In this study, the effectiveness of merging the optimization algorithms with the machine learning (SVM) classifier was evaluated in terms of refactoring prediction performance. Two optimization algorithms and four prediction datasets were studied. To evaluate the developed approach, we used the Wilcoxon signed-rank test to calculate the p-value and check for significant differences.
In this work, prediction effectiveness is mainly measured through accuracy. The experiments revealed differences between the addressed approaches. As shown in Table 2, accuracy ranged from 0.845 for the SVM-Junit system to 0.937 for McMMO − GA + Whale + SVM, which shows that merging the SVM with the two optimization algorithms adds value. Table 3 compares the STD results obtained by all proposed approaches. The lowest STD, 1.2755, was achieved by the GA + Whale + SVM − McMMO system, and the highest, 6.668, by the GA + Whale + SVM − Antlr4 system. A low STD means that the values are close to their mean, while a high STD means that they are widely spread around it. Despite the high STD for Antlr4, integrating the SVM with the optimization algorithms improved refactoring prediction performance.
Table 4 compares the experiments in terms of F-measure. The F-measure ranged from 0.861 for the SVM-Antlr4 system to 0.967 for McMMO − GA + Whale + SVM; again, merging the SVM with the two optimization algorithms added value.
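For reference, accuracy and F-measure can be computed from a binary confusion matrix as below, with refactored classes (label 1) as the positive class. The paper does not state how its STD values were obtained; this sketch assumes the sample standard deviation of per-run scores, and all function names are hypothetical.

```python
import statistics


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of correctly classified instances."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def run_std(scores):
    """Sample standard deviation of per-run scores (assumed definition of STD)."""
    return statistics.stdev(scores)
```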
Although the empirical refactoring datasets of the four open-source software systems (JUnit, McMMO, MapDB, and ANTLR4) are widely used for prediction purposes, the high accuracy rates, with only small differences between the studied algorithms, lead the authors to believe that larger datasets from different domains should be used to generalize the findings. These prediction rates are promising, but further investigation using other datasets will be performed in future work.
Four further comparisons were conducted using four well-known and widely used ML algorithms: NB-Naïve, IBK-Instance, Random Tree, and Random Forest. Tables 5-7 show the prediction performance of these classifiers in terms of accuracy, F-measure, and STD. As shown in Table 5, the lowest prediction accuracy among them was 0.825 for the IBK-Instance-Junit system, and the highest was 0.929 for McMMO-RF. The lowest accuracy achieved by the proposed approach (0.845) was higher than the lowest accuracy achieved by the studied ML algorithms, and the highest accuracy achieved by the proposed approach (0.937) was higher than their highest accuracy. Table 6 shows that random forest achieved the best F-measure among the studied classifiers on all datasets, yet the proposed approach still achieved better results: the classifiers' lowest F-measure was 0.774, versus 0.861 for the proposed approach, and their highest was 0.876, versus 0.967 for the proposed approach. Table 7 shows that random forest also achieved the best STD results among the classifiers on all datasets, while the highest STD, 6.777, was achieved by NB-Naïve.

Threat to Validity
In this study, two optimization algorithms were utilized by incorporating them into the SVM to improve the proposed refactoring prediction approach. The main threat to validity concerns the studied software systems and their datasets. To reduce this threat, the authors used a well-known dataset that includes empirical refactoring occurrences of four open-source software systems (JUnit, McMMO, MapDB, and ANTLR4) [36].
As an external threat to validity, the authors did not evaluate the proposed approach on the most complex open-source software systems. Moreover, the set of studied and implemented algorithms is limited, although the experiments give promising refactoring prediction results. Therefore, the authors intend to incorporate other optimization algorithms with the SVM or other classification algorithms in future work.

Conclusions and Future Work
In this paper, the authors address refactoring prediction at the class level by employing an SVM with two optimization algorithms, genetic and whale algorithms, and evaluate its performance on four open-source software product datasets. Machine learning algorithms are effective in predicting software refactoring and can help developers make faster and better-informed decisions about what needs to be refactored. To the best of our knowledge, the optimization algorithms employed in this study were used for the first time for class-level refactoring prediction. Several experiments were conducted, and promising performance results were observed: accuracy ranged from 84% for the SVM-Junit system to 93% for McMMO − GA + Whale + SVM. Merging the SVM with the two optimization algorithms played an important role in enhancing both accuracy and F-measure. Moreover, four well-known ML algorithms were also applied and their prediction results compared; the proposed approach achieved better performance in terms of accuracy and F-measure.
In future work, the authors will attempt to predict the refactoring type at the class or method level by applying the studied algorithms to another dataset. This will give further information about the accuracy of these refactoring prediction systems, and may also clarify which of these four systems responds best.
Data Availability Statement: Publicly available datasets were analyzed in this study. The data can be found here: https://malenezi.github.io/malenezi/data/Internal-Quality-Evolution-Java (accessed on 1 April 2022).