Harmony Search-Based Approach for Multi-Objective Software Architecture Reconstruction

Abstract: The success of any software system depends heavily on the quality of its architectural design. It has been observed that the quality of a software architecture degrades over time, and a system with a poor architecture is difficult to understand and maintain. To improve the architecture of a software system, multiple (often conflicting) design goals or objectives need to be optimized simultaneously. To address such multi-objective optimization problems, a variety of metaheuristic-oriented computational intelligence algorithms have been proposed. Among existing approaches, the harmony search (HS) algorithm has been demonstrated to be effective for numerous types of complex optimization problems. Despite its successful application to different non-software engineering optimization problems, the HS algorithm has gained little attention for the architecture reconstruction problem. In this study, we customize the original HS algorithm and propose a multi-objective harmony search algorithm for software architecture reconstruction (MoHS-SAR). To demonstrate its effectiveness, MoHS-SAR has been tested on seven object-oriented software projects and compared with existing related multi-objective evolutionary algorithms in terms of different software architecture quality metrics and metaheuristic performance criteria. The experimental results show that MoHS-SAR performs better than the other related multi-objective evolutionary algorithms.


Introduction
The growing demand for functional and non-functional requirements in modern software systems drastically increases the size and complexity of the implementation code. Maintaining a large and complex software system is a challenging task for software developers. The software architecture is an effective and widely used abstract design model that enables developers to deal with large and complex software systems [1]. Software architecture is defined as "the structure or structures of the system, which comprise software components, the externally visible properties of those components, and the relationships among them" [2]. The architecture of a software system can be viewed from the different perspectives of the stakeholders' concerns, and the different architectural views separate these concerns [3]. The architectural views also help in understanding the design decisions and their rationales.
The module view [4] is the most common and important architectural view of a software system. In this view, low-level implementation units are organized into higher-level implementation modules/components or sub-systems based on different design criteria. To generate an effective module-view software architecture, multiple (often conflicting) design criteria are considered simultaneously when distributing the low-level implementation units into higher-level modules or sub-systems [5,6]. What counts as a low-level implementation unit and as a higher-level module or sub-system may vary according to the software domain and the purpose of the software architecture reconstruction. In object-oriented software systems developed in the Java programming language, classes can be considered low-level implementation units, and packages can be considered higher-level modules or sub-systems. The package organization of a Java-based object-oriented system can therefore be viewed as a module-view architecture in which each architectural module, i.e., package, contains several Java classes.
To produce a high-quality software system, developers usually follow various architectural design guidelines and principles. A well-designed software architecture makes the system comprehensible, maintainable, testable, and manageable. Even when a software system is of good quality at deployment, the quality of its architecture generally deteriorates over time [4]. There can be many reasons for this deterioration; the key reason is frequent modification of the software system without conforming to the architectural guidelines. For example, in object-oriented software, the quality of the module-view software architecture, i.e., the quality of the package structure, deteriorates if classes are created in or assigned to irrelevant packages during maintenance. It has been observed that a poorly designed or severely deteriorated software system can become useless if its architecture is not overhauled or reconstructed.
To keep software systems relevant and useful in a frequently changing environment, their architecture often needs to be renovated. The renovation is usually done by restructuring the software system based on different architectural quality criteria [7]. For a large and complex software system, reconstruction of the software architecture is a difficult and time-consuming process. The reverse engineering community has developed a variety of approaches to reconstruct the architecture of existing software systems [8]. The infeasibility of manual and deterministic approaches for such systems has led to the application of automated search-based metaheuristic approaches. Moreover, the success of search-based software engineering (SBSE) [9] has opened various avenues for addressing different aspects of the software architecture reconstruction problem. Although there has been considerable progress in applying metaheuristic search algorithms to software architecture reconstruction, some effective metaheuristic search algorithms, such as the harmony search algorithm [10], have gained little attention.
Recently, the harmony search algorithm has been successfully tailored by different research communities to solve various types of optimization problems [11][12][13][14]. Its flexibility of customization and effective optimization process make it a popular metaheuristic algorithm. To overcome shortcomings and efficiency-related challenges, different researchers have customized the operators of the original harmony search algorithm [15]. Moreover, its simple operators and easy parameter tuning for a diverse set of optimization problems make it more acceptable [15]. The original harmony search algorithm was intended to address single-objective optimization problems. Later, many researchers customized the algorithm to address various types of multi-objective optimization problems [16][17][18][19].
Despite the successful customization and application of multi-objective variants of the harmony search algorithm in different areas, it has not been customized and applied to software architecture reconstruction. In this work, we customize the harmony search algorithm into a multi-objective harmony search for the software architecture reconstruction problem. The overall framework of the proposed work is named multi-objective harmony search for software architecture reconstruction (MoHS-SAR). More specifically, the main contributions of the paper are as follows:
• A framework of multi-objective harmony search for software architecture reconstruction (MoHS-SAR) is presented. For this contribution, various concepts of multi-objective optimization have been exploited and incorporated into the harmony search algorithm.
• In the proposed MoHS-SAR approach, the external archive concept of TAMoEA, the non-dominated sorting and crowding distance concepts of NSGA-II, and the candidate solution representation and initialization concepts of MHypEA have been exploited and utilized.
• To demonstrate the effectiveness of MoHS-SAR, it has been tested on seven object-oriented software projects and compared with existing related multi-objective evolutionary algorithms. The experimental results show that MoHS-SAR is competitive with the other related multi-objective evolutionary algorithms in terms of modularization quality (MQ), coupling, cohesion, and inverted generational distance (IGD).
The rest of the article is divided into the following sections: Section 2 presents the related works corresponding to the software architecture reconstruction and multi-objective evolutionary algorithm. Section 3 presents the basic concepts of multi-objective software architecture reconstruction. Section 4 presents a detailed description of the proposed work. Section 5 discusses the experimentation details. Section 6 presents the results and discussions. Section 7 discusses possible threats that may affect the validity of results. Finally, Section 8 concludes with future works.

Related Work
In the last two decades, many approaches have been suggested by the software reverse engineering community to reconstruct the architecture of the software system from the existing source code implementation. The existing approaches can be broadly categorized into analytical or deterministic and search-based optimization approaches. The analytical based software architecture recovery approaches (e.g., [20][21][22][23][24][25][26]) have been found as a good alternative for small size software applications. However, for the large and complex software applications, the performances of analytical based software architecture recovery approaches degrade drastically. On the other hand, the search-based optimization approaches (e.g., [27][28][29][30][31]) have been found as a good alternative for large and complex software applications. Hence, for the large and complex software systems, the applications of search-based metaheuristics are gaining wide attention among researchers and academics for reconstructing the software systems.
To provide a better understanding on the existing architecture reconstruction approaches, a comprehensive state-of-art survey is presented in the study [8]. The survey is useful for the researchers and academics who want to know the appropriate architecture restructuring methods and would like to reconstruct the architecture from source code information. In another work, the authors [32] provided a detailed process-oriented taxonomy for the software architecture reconstruction. The study [33] introduced a pattern recognition-based software architecture reconstruction approach named as ARM (Architecture Reconstruction Method) where the iterative process used to discover the historical design decision. The approach is based on the deterministic method of software restructuring. The article [34] discussed the Symphony method of software architecture reconstruction, where multiple view-based software architecture approaches are also introduced.
The research described in [35] proposed a plug-in named as Metrics and Architecture Reconstruction Plug-in for Eclipse (MARPLE) for software architecture restructuring and design pattern detection of the source code implementation. The research described in [36] presented a generic metamodel-driven process for reconstructing software architecture and proposed CaCOphoNy. The study [37] discussed the application of software architecture reconstruction to support the renovation of the existing implementation. The authors [38] proposed a software restructuring approach based on graph clustering and partitioning. They reconstructed the software architecture of the existing software systems by improving the cohesion and coupling. The authors [39] used the static and dynamic source code information to reconstruct the static and dynamic view of software architecture. The study [40] proposed a multi-view software architecture reconstruction approach instead of single-view software architecture reconstruction. They especially considered the structure, design, and behavior view of software architecture reconstruction. The study [41] addressed various questions that are relevant to the software architecture reconstruction such as various issues that must be considered, the way of extracting, analyzing, and presenting the information.
The study [42] proposed a software architecture reconstruction approach for Data-Tier software projects, in which a hypergraph is used to represent the inter-module dependencies for architecture reconstruction.
The research described in [43] introduced a run-time software architecture reconstruction approach. To recover the run-time software architecture process mining concept was utilized. The research described in [44] presented a behavior-based software architecture approach to reconstruct the up-to-date software architecture. The study [45] conducted a case study to reconstruct the module-view software architecture for the software projects developed in the C# programming language. The research described in [46] proposed an execution-view software architecture reconstruction approach where they used the execution traces, available documentation, and expert knowledge in this purpose. The study [47] provided a systematic analysis and characterization of the proposed/published software architecture reconstruction approaches.
The aforementioned studies have been used successfully to reconstruct the software architecture of various types of software projects. Although these approaches have shown good results, they have some limitations. They are mostly based on deterministic/analytical methods. Hence, for large and complex software systems, they take a huge amount of time, and it can sometimes be infeasible to obtain the software architecture within a reasonable time. To overcome this limitation, many researchers and practitioners have suggested the use of metaheuristic search approaches to reconstruct the software architecture.
Since the introduction of Search-Based Software Engineering (SBSE) [9], many software engineering tasks have been formulated as search-based optimization problems and addressed by customizing existing metaheuristic algorithms or by proposing new ones. The effectiveness of SBSE approaches in addressing software engineering problems has attracted a lot of attention from the software engineering community [48]. Recently, automatic software reconstruction based on the SBSE approach has gained wide attention from researchers working in the reverse engineering field. In the last two decades, many automatic SBSE-based approaches have been proposed to support software architecture reconstruction.
The authors [28,31] were the first who used the idea of SBSE in restructuring the software system to improve the architectural design. They extracted the software architecture from the implementation using the clustering technique, which is accomplished by the genetic algorithm. The goal of the work was to demonstrate the formulation of software clustering problem as a search-based optimization problem and solving it by customizing the genetic algorithm. Further, the authors [29] developed a tool called Bunch where they implemented the simulated annealing (SA), Hill-climbing (HC), and genetic algorithm (GA). The research described in [30] addressed the software architecture reconstruction problem as a software module clustering problem where the multiple-hill climbing algorithm is used to find out the suitable clustering solution. The research described in [27] also addressed the software architecture reconstruction problem as a software module clustering problem, whereas they used the evolutionary algorithm to obtain the software reconstruction solution. The research described in [28] addressed the software architecture reconstruction problem as a search-based software architecture recovery problem where particle swarm optimization has been used as the search technique. Recently, the research described in [29] addressed the software architecture reconstruction problem as a software remodularization problem where they used the Harmony search algorithm [10] to remodularize the object-oriented software systems.
The search-based software architecture approaches [28][29][30][31] discussed in the above paragraphs have performed well. Their main limitation is that they are designed for the single-objective formulation of software architecture reconstruction. Single-objective search-based approaches are simple, but they may generate sub-optimal reconstruction solutions. In other words, searching for the best software architecture in a huge search space with a single objective as the guiding force may lead towards a sub-optimal point of the search space. Overall, a software remodularization technique that employs a single objective may not produce appropriate results for various types of software systems. To overcome these limitations, researchers have formulated software architecture reconstruction as a multi-objective search-based problem and solved it using customized search-based metaheuristic algorithms.
The authors [49] were the first who have introduced the concepts of multi-objective optimization for the software architecture optimization problem. They reformulated the software architecture reconstruction problem as a multi-objective clustering problem and solved using the multi-objective search algorithm, namely Two-Archive multi-objective evolutionary algorithm (TAMoEA) [50]. The TAMoEA is a genetic evolutionary-based multi-objective optimizer. It exploits the external archive and evolutionary concepts to achieve the good approximation of optimal pareto front. The research described in [51,52] customized the non-dominated sorting genetic algorithm-II (NSGA-II) [53] to address the multi-objective software optimization problems related to the module clustering. The NSGA-II is multi-objective extension of canonical genetic algorithm and uses the concept of non-dominated sorting and crowding distance to realize the multi-objective phenomenon. The study [54] introduced a hyper-heuristic-based multi-objective software module clustering approach named as Multi-objective Hyper-heuristic Evolutionary Algorithm (MHypEA): The MHypEA is demonstrated as an effective approach to address the multi-objective software module clustering problems.
Most of the multi-objective software reconstruction approaches in the literature are based on the genetic algorithm. For example, TAMoEA, NSGA-II, and MHypEA exploit the crossover and mutation strategies of the standard genetic algorithm to produce new solutions for the next generation. The major limitation of genetic algorithm-based solution generation is that only two candidate solutions of the population are involved in generating a new solution. However, it has been commonly observed that the final results can be more effective if all the candidate solutions of the population are involved in generating new solutions. Based on this observation, we use the framework of the harmony search algorithm in our proposed MoHS-SAR, where a new candidate solution is generated by considering all the existing candidates of the population.
Another challenge with TAMoEA, NSGA-II, and MHypEA is that these approaches provide many options for the selection of operators and parameter values, and the effectiveness of the final results depends highly on these choices. It is therefore very challenging to evaluate all operator and parameter combinations for a specific optimization problem. For the software architecture reconstruction problem, which is generally fuzzy in nature, finding the most suitable operator and parameter combination can be even more difficult. In contrast, the proposed MoHS-SAR has only a few operators and parameter settings. Hence, without an elaborate operator and parameter determination strategy, the proposed MoHS-SAR can generate better results than existing multi-objective software reconstruction approaches.

Multi-Objective Software Reconstruction Problem
Software architecture reconstruction problem refers to the problem of reconstructing software architecture descriptions from the implementation of the existing system. The architecture reconstruction of a software system is generally required in case of unavailability of documented architecture or availability of poorly documented architecture. The requirements of various views of software architecture documentation needed for the different purposes makes the software architecture reconstruction problem a multifaceted problem. The description of various views of the software architecture is provided in the studies [3,55]. Our software architecture reconstruction problem is mainly focusing on reconstructing module-view of software architecture as our study is concerned with software construction and maintenance.
In module-view software architecture reconstruction, the low-level implementation units are classified into higher-level architectural modules or sub-systems based on some architecture design criteria. In other words, to generate a module-view software architecture more effectively, developers generally follow the various quality criteria simultaneously (often conflicting) in distributing the low-level implementation units into higher-level modules or sub-systems [5,6]. The definition of low-level implementation units into higher-level modules or sub-systems may vary according to the software domain and purpose of software architecture reconstruction [56]. In this study, our module-view software architecture reconstruction is representing the package-view software architecture reconstruction of Java-based object-oriented software systems.
In module-view software architecture reconstruction, the software architecture is reconstructed by reorganizing the low-level implementation units into higher-level architectural modules based on multiple architectural design criteria. Hence, it can be seen as a multi-objective optimization problem. In this study, the module-view software architecture reconstruction problem is expressed as an optimization problem with multiple (often conflicting) objectives representing the different architectural design criteria.
Multi-objective optimization problems consist of more than one (usually conflicting) objective function that must be optimized simultaneously. Each objective function is defined in terms of the decision variables. The search spaces derived from the objective functions and the decision variables are usually referred to as the objective space and the decision space, respectively. The mathematical model of the multi-objective optimization problem is defined as follows:

$$\text{Minimize } \Phi(x) = \left(f_1(x), f_2(x), \ldots, f_M(x)\right), \quad x \in X \tag{1}$$

Here $\Phi(x)$ denotes the objective functions and $x$ represents the decision variable; an objective that is to be maximized can be handled by negating it. To define the objective functions for module-view software architecture reconstruction, multiple architectural design quality metrics representing the needs of the different stakeholders can be considered. Although a large number of architectural design quality metrics exist, in this study we select a small number of them. Table 1 provides two sets of multi-objective formulations for module-view software architecture reconstruction. In each set, five objective functions corresponding to different aspects of software architecture quality are considered. Similarly, many other objective functions can be used in multi-objective software architecture reconstruction, depending on the quality requirements of the stakeholders.

Table 1. Multi-objective formulations for module-view software architecture reconstruction.

| Architectural Design Quality Metrics-1 (ADQM-1) | Architectural Design Quality Metrics-2 (ADQM-2) |
| --- | --- |
| Intra-module dependency (maximize) | Intra-module dependency (maximize) |
| Inter-module dependency (minimize) | Inter-module dependency (minimize) |
| Number of module cycles (minimize) | Indirect module dependency (minimize) |
| Implementation units in modules (minimize) | Implementation units in modules (minimize) |
| Difference in large and small modules (minimize) | Difference in large and small modules (minimize) |

For module-view software architecture, intra-module dependency (i.e., cohesion) and inter-module dependency (i.e., coupling) are two important architecture design quality metrics. A software system with high intra-module dependency and low inter-module dependency is considered to have a good software architecture design. However, along with high intra-module dependency and low inter-module dependency, the development stakeholders also desire other architectural quality characteristics: a minimum number of module cycles, implementation units that are not all accommodated in a single module, and no large difference between the sizes of large and small modules.
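As a minimal sketch of how three of the objectives in Table 1 could be evaluated, the following Python snippet counts intra-module and inter-module dependencies and the size difference between the largest and smallest module. The exact metric definitions are not given in this section, so the simple edge counts, the function names, and the five-class example graph are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple

def module_dependencies(edges: List[Tuple[int, int]],
                        assignment: List[int]) -> Tuple[int, int]:
    """Return (intra, inter) dependency counts for a module assignment.

    assignment[i] is the module (package) index of class i; an edge whose two
    classes share a module counts as intra-module, otherwise as inter-module.
    """
    intra = inter = 0
    for src, dst in edges:
        if assignment[src] == assignment[dst]:
            intra += 1
        else:
            inter += 1
    return intra, inter

def size_difference(assignment: List[int]) -> int:
    """Difference between the largest and the smallest non-empty module."""
    sizes = Counter(assignment).values()
    return max(sizes) - min(sizes)

# Hypothetical system with five classes and five class-to-class dependencies.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (2, 3)]
assignment = [0, 0, 0, 1, 1]  # classes 0-2 in module 0, classes 3-4 in module 1

intra, inter = module_dependencies(edges, assignment)
print(intra, inter, size_difference(assignment))  # prints: 4 1 1
```

Placing all five classes in a single module would raise the intra-module count to 5 and remove all inter-module dependencies, which illustrates why the table also includes objectives that penalize overly large modules.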
Overall, the software architecture reconstruction problem is formulated as the search-based combinatorial optimization problem where optimal software architecture from feasible software architecture is captured based on multiple objective functions. The literature of software engineering uses different terms to refer to software architecture reconstruction: software architecture mining, software architecture discovery, software architecture recovery, software reverse architecting, and software architecture extraction.
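With several conflicting objectives, candidate architectures are typically compared by Pareto dominance rather than by a single score. The sketch below assumes that every objective is expressed as a minimization (maximized objectives such as intra-module dependency are negated); the function names and the sample objective vectors are hypothetical.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b: no worse in every objective, better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point in the list dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical objective vectors: (negated intra-module, inter-module dependency).
points = [(-4, 1), (-3, 1), (-5, 3), (-2, 4)]
print(non_dominated(points))  # prints: [(-4, 1), (-5, 3)]
```

Neither of the two surviving vectors is better than the other in both objectives, so both are retained as trade-off solutions.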

Basics of HS Algorithm
HS [10] is an effective and easily customizable metaheuristic optimization algorithm designed to address different types of search-based optimization problems. The basic idea of HS is derived from the pitch tuning process that musicians use to achieve better harmony. HS was originally designed for problems with only a single objective; it was later customized by many researchers and practitioners to address different types of multi-objective optimization problems. Its small number of parameters and the ease with which it can be tailored to complex optimization problems make HS one of the most popular metaheuristic optimization algorithms. Many researchers have adapted the HS algorithm to solve different aspects of complex real-world optimization problems.
To understand the working of the HS algorithm, it is important to know the basic formulation of the single-objective optimization problem. A typical single-objective optimization problem can be expressed in terms of an objective function and decision variables as follows:

$$\text{Minimize } f(x) \quad \text{subject to } LB_i \le x_i \le UB_i, \quad i = 1, 2, \ldots, n \tag{2}$$

where $f(x)$ is the objective function or goal of the optimization problem and $x = (x_1, x_2, \ldots, x_n)$ is the set of decision variables over which the objective function is defined. Each decision variable can be bounded by lower and upper boundaries; $LB_i$ and $UB_i$ denote the lower and upper boundaries of decision variable $x_i$, respectively. The HS algorithm begins with the creation of a set of random harmony vectors (i.e., candidate solutions) known as the population. The randomly generated candidate solutions of the population are improvised as follows: (1) corresponding to each candidate solution of the population, a new candidate solution is created using harmony memory consideration, pitch adjustment, or randomization; (2) if the newly generated candidate solution is better than the worst candidate solution of the population, the worst candidate solution is replaced with it; otherwise, the newly generated candidate solution is discarded; (3) this improvisation continues until a predefined stopping criterion is met. The detailed description of HS for the single-objective problem formulation in Equation (2) is as follows:
• Step 1. Parameter initialization: In the HS algorithm, the main parameters are the harmony memory size (HMS), the harmony memory considering rate (HMCR), and the pitch adjusting rate (PAR). The values of these parameters strongly affect the performance of the algorithm, and suitable settings may vary from problem to problem. Hence, these parameter values need to be tuned properly for a specific optimization problem. Apart from HMS, HMCR, and PAR, the stopping criterion, i.e., the number of improvisations, also needs special care when its value is set.

• Step 2. Initialize the harmony memory (HM): In the harmony memory, candidate solutions are stored as vectors of the decision variables. In the beginning, the solution vectors are initialized by selecting random values from the range of each decision variable. For example, the jth decision variable of the ith solution vector, $x_i^j$, can be initialized as follows:

$$x_i^j = L_j + \text{rand}() \times (U_j - L_j)$$

where $L_j$ and $U_j$ are the lower and upper bounds of the jth decision variable, and rand() is a random function that generates a random value from a uniform distribution over [0,1].

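• Step 3. Improvisation: A new solution vector $x_{new}$ is generated using harmony memory consideration, pitch adjustment, and randomization. Each decision variable $x_{new}^j$ is taken from the harmony memory with probability HMCR and is otherwise drawn at random from its range:

$$x_{new}^j = \begin{cases} x^j \in \{x_1^j, x_2^j, \ldots, x_{HMS}^j\} & \text{if } rand \le HMCR \\ L_j + rand \times (U_j - L_j) & \text{otherwise} \end{cases}$$

where $[L_j, U_j]$ is the range of possible values of the jth decision variable and rand is a uniform random value between 0 and 1. After the memory consideration phase, the decision variables of the newly generated solution vector are examined for pitch adjustment. To complete the pitch adjustment operation, the PAR parameter is used as follows:

$$x_{new}^j = \begin{cases} x_{new}^j \pm rand \times bw & \text{if } rand \le PAR \\ x_{new}^j & \text{otherwise} \end{cases}$$

where bw is the distance bandwidth parameter. The value of bw is selected according to the nature of the optimization problem.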

• Step 4. Harmony memory update: After the new solution vector $x_{new}$ has been generated, the harmony memory is updated. If $x_{new}$ is better than the worst solution vector of the harmony memory, the worst solution vector is replaced with $x_{new}$; otherwise, $x_{new}$ is discarded.

• Step 5. Stopping: Steps 3 to 4 are repeated until the stopping criterion is satisfied. After the algorithm stops, the best solution of the final harmony memory becomes the final solution.
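The five steps above can be sketched in Python as follows. This is an illustrative implementation of the basic single-objective HS under stated assumptions, not the authors' code: one new harmony is improvised per iteration, pitch adjustment is applied only to values taken from memory, adjusted values are clamped to the variable bounds, and the default parameter values and the sphere test function are arbitrary choices.

```python
import random
from typing import Callable, List, Sequence, Tuple

def harmony_search(f: Callable[[Sequence[float]], float],
                   bounds: Sequence[Tuple[float, float]],
                   hms: int = 10, hmcr: float = 0.9, par: float = 0.3,
                   bw: float = 0.1, iterations: int = 2000,
                   seed: int = 0) -> Tuple[List[float], float]:
    """Minimize f over box bounds with a basic harmony search."""
    rng = random.Random(seed)
    # Step 2: fill the harmony memory with uniformly random solution vectors.
    hm = [[lo + rng.random() * (hi - lo) for lo, hi in bounds]
          for _ in range(hms)]
    costs = [f(x) for x in hm]

    for _ in range(iterations):
        # Step 3: improvise a new harmony, one decision variable at a time.
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse the j-th value of a random member.
                value = hm[rng.randrange(hms)][j]
                if rng.random() < par:
                    # Pitch adjustment: shift by a uniform amount in [-bw, bw].
                    value += (2.0 * rng.random() - 1.0) * bw
                    value = min(max(value, lo), hi)  # assumption: clamp to bounds
            else:
                # Randomization: draw a fresh value from the variable's range.
                value = lo + rng.random() * (hi - lo)
            new.append(value)

        # Step 4: replace the worst member if the new harmony is better.
        new_cost = f(new)
        worst = max(range(hms), key=costs.__getitem__)
        if new_cost < costs[worst]:
            hm[worst], costs[worst] = new, new_cost

    # Step 5: return the best harmony left in memory.
    best = min(range(hms), key=costs.__getitem__)
    return hm[best], costs[best]

def sphere(x: Sequence[float]) -> float:
    """Hypothetical test objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)

bounds = [(-5.0, 5.0)] * 3
best, cost = harmony_search(sphere, bounds, seed=1)
print(best, cost)
```

Because the worst member is replaced only by a better harmony, the best cost in memory never gets worse from one iteration to the next.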

Proposed Software Architecture Reconstruction Approach
In this study, we propose a harmony search-based software architecture reconstruction approach named MoHS-SAR, i.e., multi-objective harmony search for software architecture reconstruction. The general framework of the proposed approach is presented in Figure 1. The framework is divided into several components, each dedicated to a specific activity. The working details of each component are as follows:
• Extraction of software entities and relationships: This component is responsible for extracting the various types of source code information required for software architecture reconstruction. For an object-oriented software system, the source code classes and their dependency relationships are extracted and used for the reconstruction.
• Formation of entity dependency graph: Based on the extracted software classes/entities and their dependency relationships, a class dependency graph (CDG) is constructed. In the CDG, the nodes represent the classes and the edges represent the class dependency relationships.
• Representation of software architecture as candidate solution: To apply a metaheuristic search algorithm, the problem needs to be defined and encoded in a form in which every feasible solution can be generated by applying the search operators. In the context of software architecture reconstruction, the target software system therefore has to be represented in a suitable form. In this work, we use the integer array encoding method used by previous researchers [30,31,57,58].
• Objective functions: The main goal of the proposed approach is to extract the module-view software architecture of software projects using a multi-objective harmony search algorithm. Hence, we need to select objective functions that can guide the metaheuristic optimization technique towards a good approximation of the module-view software architecture. In this work, we use two multi-objective software architecture formulation sets (i.e., ADQM-1 and ADQM-2).
In software architecture reconstruction, the scales of the defined objective functions vary widely. To ensure a reliable distance measure, objective normalization becomes necessary. Given a software architecture reconstruction solution $x_1 \in X$, each of its objectives $f_i(x_1)$, $i = 1, 2, \ldots, M$, is normalized as follows:
$$\tilde{f}_i(x_1) = \frac{f_i(x_1) - z_i^{*}}{z_i^{nad} - z_i^{*}}$$
where $z_i^{nad}$ and $z_i^{*}$ denote the $i$-th components of the nadir point and the ideal point, respectively. Since the Pareto front of a software architecture reconstruction problem is unknown, the values of $z_i^{nad}$ and $z_i^{*}$ are usually estimated from the current population.
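A minimal sketch of this objective normalization is given below. It assumes that every objective is expressed in minimization form and that the ideal and nadir points are approximated by the per-objective minimum and maximum over the current population; the function names and the small `eps` guard against a zero range are illustrative choices, not details from the paper.

```python
from typing import List, Sequence, Tuple


def estimate_ideal_and_nadir(
    population: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Approximate the ideal and nadir points from the current population.

    Each element of `population` is the objective vector of one solution.
    """
    num_objectives = len(population[0])
    ideal = [min(obj[i] for obj in population) for i in range(num_objectives)]
    nadir = [max(obj[i] for obj in population) for i in range(num_objectives)]
    return ideal, nadir


def normalize(
    objectives: Sequence[float],
    ideal: Sequence[float],
    nadir: Sequence[float],
    eps: float = 1e-12,
) -> List[float]:
    """Scale each objective to roughly [0, 1] using the ideal and nadir points."""
    return [
        (f - lo) / max(hi - lo, eps)  # eps avoids division by zero
        for f, lo, hi in zip(objectives, ideal, nadir)
    ]


if __name__ == "__main__":
    objective_vectors = [[10.0, 200.0], [20.0, 100.0], [15.0, 150.0]]
    z_star, z_nad = estimate_ideal_and_nadir(objective_vectors)
    print(normalize(objective_vectors[2], z_star, z_nad))
```

After normalization, objectives measured on very different scales contribute comparably to any distance computed between solutions.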
MoHS-SAR algorithm: To address the multi-objective software architecture reconstruction problem, we design a multi-objective harmony search algorithm by exploiting and integrating concepts from existing multi-objective optimization approaches. The designed algorithm is referred to as multi-objective harmony search for software architecture reconstruction (MoHS-SAR). The pseudocode of the MoHS-SAR is given in Algorithm 1.
The MoHS-SAR relies on several supporting concepts that help perform and control the optimization process. To process and store the candidate solutions, three harmony memories are used: the current harmony memory (CHM), the new harmony memory (NHM), and the external harmony memory (EHM). All three memories have the same size, HMS. The following parameters control the optimization process: number of iterations (NI), harmony memory size (HMS), harmony memory consideration rate (HMCR), external harmony memory consideration rate (EHMCR), and pitch adjustment rate (PAR). The values of these parameters have a major influence on the performance of the MoHS-SAR, so appropriate values need to be selected. After suitable parameter settings are selected and initialized, the algorithm proceeds as follows:
Step 1: The MoHS-SAR begins with the initialization of the CHM, which contains a fixed number of candidate solutions. Initializing the CHM therefore requires initializing each candidate solution it contains. Each decision variable of every candidate solution in the CHM is initialized using the random initialization method. After the candidate solutions of the CHM are initialized, the objective vector of each candidate solution is computed by evaluating every objective function on it.
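As an illustration of Step 1, the sketch below fills the CHM with randomly generated candidate solutions under the integer array encoding. The exact initialization formula is not reproduced here, so the sketch assumes that each class receives a uniformly random module label between 1 and a hypothetical upper bound `max_modules`; the function names are also assumptions.

```python
import random
from typing import List, Optional


def random_solution(
    num_classes: int, max_modules: int, rng: random.Random
) -> List[int]:
    """One candidate solution: position i holds the module label of class i.

    Assumption: labels are drawn uniformly from 1..max_modules.
    """
    return [rng.randint(1, max_modules) for _ in range(num_classes)]


def initialize_chm(
    hms: int, num_classes: int, max_modules: int, seed: Optional[int] = None
) -> List[List[int]]:
    """Create the current harmony memory with `hms` random candidate solutions."""
    rng = random.Random(seed)
    return [random_solution(num_classes, max_modules, rng) for _ in range(hms)]


if __name__ == "__main__":
    for solution in initialize_chm(hms=3, num_classes=6, max_modules=3, seed=7):
        print(solution)
```

Passing a seed makes a run reproducible, which is convenient when the same algorithm is executed many times for statistical comparison.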
Step 2: In this step, all candidate solutions of the CHM are sorted into non-dominated fronts (i.e., front 1, front 2, . . . , front n) using the non-dominated sorting method [24]. The front 1 solutions are superior to the solutions of the remaining fronts. To preserve the superior solutions found in every iteration of the algorithm, the solutions of front 1 are transferred into the external harmony memory (EHM) according to the EHM update rule.
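The non-dominated sorting used in Step 2 can be sketched as below. The sketch follows the usual fast non-dominated sorting scheme and assumes that all objectives are minimized; the function names are illustrative.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs: Sequence[Sequence[float]]) -> List[List[int]]:
    """Split solution indices into fronts; fronts[0] is the non-dominated front."""
    n = len(objs)
    dominated_set: List[List[int]] = [[] for _ in range(n)]  # solutions i dominates
    domination_count = [0] * n  # number of solutions that dominate i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated_set[i].append(j)
            elif dominates(objs[j], objs[i]):
                domination_count[i] += 1

    fronts: List[List[int]] = []
    current = [i for i in range(n) if domination_count[i] == 0]
    while current:
        fronts.append(current)
        next_front: List[int] = []
        for i in current:
            for j in dominated_set[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        current = next_front
    return fronts
```

Objectives that are maximized can be handled by negating them before the sort.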
Step 3: Generating NHM: This is an important step of the MoHS-SAR, in which the candidate solutions of the CHM are improvised and stored in the NHM. In the harmony memory improvisation process, one new candidate solution for the NHM is created for each candidate solution of the CHM. The decision variable values of each new candidate solution of the NHM are determined using the following three rules: (1) considering the candidate solutions of the CHM or EHM, (2) adjusting the pitch values, i.e., the decision variable values, and (3) random selection.
In CHM or EHM consideration, the decision variable values of the new candidate solution are generated from the decision variable values of the candidate solutions residing in the CHM or EHM. Whether the CHM or the EHM is considered depends on the harmony memory consideration rate (HMCR) and the external harmony memory consideration rate (EHMCR). After the CHM or EHM consideration, the pitch of each candidate solution is adjusted according to the pitch adjustment rate (PAR). If the CHM or EHM consideration is not applied, the decision variable value of the new candidate solution is generated by random selection.
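One possible reading of the improvisation step is sketched below. Several details are assumptions made only for illustration: the two consideration rates are treated as cumulative probabilities (so HMCR + EHMCR should not exceed 1), pitch adjustment moves an integer module label by one position within 1..`max_modules`, and all function names are hypothetical.

```python
import random
from typing import List, Sequence


def improvise(
    chm: Sequence[Sequence[int]],
    ehm: Sequence[Sequence[int]],
    max_modules: int,
    hmcr: float,
    ehmcr: float,
    par: float,
    rng: random.Random,
) -> List[int]:
    """Build one new candidate solution, one decision variable at a time."""
    new_solution: List[int] = []
    for d in range(len(chm[0])):
        r = rng.random()
        if r < hmcr:
            # Rule 1a: copy the value from a random CHM member.
            value, from_memory = rng.choice(chm)[d], True
        elif ehm and r < hmcr + ehmcr:
            # Rule 1b: copy the value from a random EHM member.
            value, from_memory = rng.choice(ehm)[d], True
        else:
            # Rule 3: random selection of a module label.
            value, from_memory = rng.randint(1, max_modules), False
        if from_memory and rng.random() < par:
            # Rule 2: pitch adjustment, kept inside the valid label range.
            value = min(max_modules, max(1, value + rng.choice((-1, 1))))
        new_solution.append(value)
    return new_solution


def generate_nhm(chm, ehm, max_modules, hmcr, ehmcr, par, rng):
    """Create one new candidate per CHM member, as described in Step 3."""
    return [improvise(chm, ehm, max_modules, hmcr, ehmcr, par, rng) for _ in chm]
```

Because module labels are categorical, other pitch adjustment moves (for example, reassigning a class to the module of one of its neighbours in the CDG) would fit the same structure.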
Step 4: Update rule of EHM: In every iteration, the candidate solutions of the NHM are sorted using the non-dominated sorting technique, and the candidate solutions of the first front are moved to the EHM according to its update rule. Under this rule, each front 1 candidate solution is compared with every candidate solution stored in the EHM. If the front 1 candidate solution dominates candidate solutions of the EHM, the dominated solutions are removed from the EHM and the front 1 solution is placed in the EHM; if the front 1 solution is dominated by all solutions of the EHM, it is discarded; if the front 1 solution and the solutions of the EHM are mutually non-dominated, the front 1 solution is also inserted into the EHM. If the EHM grows beyond its defined size, its candidate solutions are sorted using the non-dominated sorting technique and solutions are selected for the EHM front by front. If a front cannot be fully accommodated in the EHM, the crowding distance [24] is used to select the part of that front that is kept.
Step 5: Stopping: Steps 3 and 4 are repeated until the stopping criterion is satisfied. When the algorithm stops, the best solution of the final EHM becomes the final solution.
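The EHM update rule of Step 4 can be sketched roughly as follows. The sketch works on objective vectors only and assumes minimization. It also makes two simplifications that should be read as assumptions: a candidate is rejected as soon as any archive member dominates it (the usual archive rule), and, because the resulting archive is then a single non-dominated front, an oversized archive is truncated by crowding distance alone.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def crowding_distance(objs: Sequence[Vector]) -> List[float]:
    """Crowding distance of each vector within one front (larger = less crowded)."""
    n = len(objs)
    dist = [0.0] * n
    if n == 0:
        return dist
    for k in range(len(objs[0])):
        order = sorted(range(n), key=lambda i: objs[i][k])
        lo, hi = objs[order[0]][k], objs[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")  # keep boundary points
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            gap = objs[order[pos + 1]][k] - objs[order[pos - 1]][k]
            dist[order[pos]] += gap / (hi - lo)
    return dist


def update_ehm(ehm: Sequence[Vector], front1: Sequence[Vector], capacity: int) -> List[Vector]:
    """Merge the front 1 vectors into the external harmony memory."""
    archive = list(ehm)
    for cand in front1:
        if cand in archive or any(dominates(m, cand) for m in archive):
            continue  # duplicate or dominated: discard the candidate
        archive = [m for m in archive if not dominates(cand, m)]
        archive.append(cand)
    if len(archive) > capacity:
        dist = crowding_distance(archive)
        keep = sorted(range(len(archive)), key=lambda i: dist[i], reverse=True)
        archive = [archive[i] for i in sorted(keep[:capacity])]
    return archive
```

In a full implementation each archive entry would carry the encoded solution together with its objective vector; only the objective vectors are needed to decide what is kept.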

Experimentation Setup
To properly investigate the effectiveness of the proposed MoHS-SAR, an appropriate experimentation setup is established. This includes the selection of test problems and their representations, the configuration of the competitive approaches, and suitable parameter settings. The experiments are conducted on a laptop with the Microsoft Windows 10 Home Single Language 64-bit operating system and an Intel Core i7-1160G7 4.4 GHz CPU. Since the proposed and existing approaches are stochastic optimizers, they may not produce the same output on different runs for the same problem instance. Hence, we ran each algorithm 31 times and applied the Wilcoxon rank-sum test to evaluate the quality measures of the software architecture reconstruction solutions.
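For readers unfamiliar with the test, a simplified rank-sum comparison of two sets of runs is sketched below. It uses the normal approximation without a tie correction, which is a simplification chosen for this sketch; the function names are hypothetical, and the default 0.05 threshold is the significance level reported in the results section.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def rank_sum_test(sample_a: Sequence[float], sample_b: Sequence[float]) -> Tuple[float, float]:
    """Return (z, two-sided p) using the normal approximation, no tie correction."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / std
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """True when the difference between the two samples is significant."""
    return p_value < alpha
```

With 31 runs per algorithm, `sample_a` and `sample_b` would each hold 31 values of one quality measure (for example, MQ) for one subject system.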

Test Problems
To demonstrate the effectiveness of the proposed MoHS-SAR, seven object-oriented software projects developed in the Java programming language have been selected for experimentation. These software projects are open source and easily available. To avoid biased conclusions, software projects of different sizes and complexity levels are included. A brief description of the selected software systems is presented in Table 2. The main reason for including these object-oriented software projects in our experimentation is that they have diverse characteristics and have been used by many researchers and academics for different software engineering research purposes [57][58][59][60]. To extract the required attributes of the software systems, such as the classes and their relationships, we followed the same tools and method as used in [57][58][59][60].

Competitive Approaches
In the SBSE literature, many multi-objective optimization algorithms have been proposed to address different aspects of software engineering problems. To justify the effectiveness of the proposed MoHS-SAR, we have selected for comparison some related multi-objective optimization algorithms that address similar software engineering problems. A brief description of the selected metaheuristic optimization algorithms is given below.
• Two-Archive multi-objective evolutionary algorithm (TAMoEA): The TAMoEA [50] was designed to address the multi-objective software module clustering problem. It is a genetic evolutionary-based multi-objective optimizer. It exploits an external archive and evolutionary concepts to achieve a good approximation of the optimal Pareto front.
• Non-dominated sorting genetic algorithm-II (NSGA-II): The NSGA-II [53] is a multi-objective extension of the canonical genetic algorithm and uses the concepts of non-dominated sorting and crowding distance to realize multi-objective optimization. This algorithm has also been tailored to address multi-objective software module clustering [60].
• Multi-objective Hyper-heuristic Evolutionary Algorithm (MHypEA): The MHypEA [53] is specifically designed to solve multi-objective software module clustering. This algorithm exploits the concepts of hyper-heuristics and evolutionary optimization to achieve multi-objective optimization behaviour.
The main reason for selecting the above algorithms is that they were originally designed or customized to address the multi-objective software module clustering problem. They can nevertheless be readily applied to multi-objective software architecture reconstruction problems, because the software module clustering technique can be used to reconstruct the module view of a software architecture.

Parameter Settings
The parameter settings of search-based metaheuristic algorithms play an important role in the quality of the results [61,62]. To choose appropriate parameter values, we use the values suggested and used in the original papers of the competitive algorithms (i.e., TAMoEA, NSGA-II, and MHypEA). The details of the parameter settings of each algorithm are given in Table 3. Because the software systems used for validating the proposed approach differ in size (i.e., number of entities), using the same population size and other parameter values for all of them is not appropriate. Therefore, the parameter values of the algorithms have been defined in terms of the number of entities present in the particular software system.

Results and Discussion
This section demonstrates the effectiveness of the proposed MoHS-SAR in addressing the software architecture reconstruction problem. It is divided into four parts. The first part studies the improvement in MQ software architecture quality. The second part is related to the software coupling measurement. The third part is concerned with the cohesion measurement. The last part is dedicated to the IGD measurement. Each of these parts covers both the ADQM-1 and ADQM-2 multi-objective software architecture reconstruction formulations and the TAMoEA, NSGA-II, MHypEA, and MoHS-SAR algorithms.
Since the proposed and existing approaches are stochastic optimizers, we used the Wilcoxon rank-sum test to evaluate the quality measures of the software architecture reconstruction solutions. In this study, the significance level for the result evaluation is set to 0.05. In every part of the result demonstration, the performance of the proposed MoHS-SAR is compared with that of the TAMoEA, NSGA-II, and MHypEA algorithms. The comparative results are marked with the symbols "[−]", "[+]", and "[≈]". In Tables 4-7, these symbols denote that the performance of the TAMoEA, NSGA-II, or MHypEA algorithm is significantly worse than, significantly better than, or not significantly different from that of the proposed MoHS-SAR, respectively.
Overall, the proposed and existing approaches are evaluated with the ADQM-1 and ADQM-2 multi-objective formulations designed for the software architecture reconstruction problem. First, the comparison of MoHS-SAR with TAMoEA, NSGA-II, and MHypEA in terms of software quality metrics (i.e., MQ, coupling, and cohesion) under ADQM-1 and ADQM-2 is described. Second, the comparison of MoHS-SAR with TAMoEA, NSGA-II, and MHypEA in terms of the Pareto quality indicator (i.e., IGD) under ADQM-1 and ADQM-2 is presented. Finally, a summarized analysis of the software quality metrics (i.e., MQ, coupling, and cohesion) under ADQM-1 and ADQM-2 is provided.
Table 4 presents the MQ results obtained by the proposed MoHS-SAR and the existing approaches TAMoEA, NSGA-II, and MHypEA on all seven subject systems under both the ADQM-2 and ADQM-1 multi-objective formulations. Under the ADQM-2 formulation, the proposed MoHS-SAR performs significantly better than the existing approaches TAMoEA, NSGA-II, and MHypEA in most of the cases: it is significantly better than MHypEA, TAMoEA, and NSGA-II in four, five, and four cases, respectively. Similarly, under the ADQM-1 formulation, the proposed MoHS-SAR performs significantly better than the existing approaches TAMoEA, NSGA-II, and MHypEA in most of the cases.
Table 5 presents coupling as an assessment criterion for evaluating the performance of the proposed MoHS-SAR and the existing approaches TAMoEA, NSGA-II, and MHypEA. A lower coupling value indicates a better software architecture reconstruction solution. The coupling results in Table 5 show that the proposed MoHS-SAR approach achieves lower coupling values than the existing approaches TAMoEA, NSGA-II, and MHypEA in most of the cases, and that the difference is significant in most of those cases. This holds for both multi-objective formulations, ADQM-1 and ADQM-2.
The cohesion values of the software architecture reconstruction solutions achieved with the proposed MoHS-SAR and the existing approaches TAMoEA, NSGA-II, and MHypEA are presented in Table 6. These cohesion results are collected under both the ADQM-2 and ADQM-1 multi-objective software architecture reconstruction formulations. In contrast to coupling, a higher cohesion value indicates a better-quality software architecture. The cohesion results presented in Table 6 demonstrate that the proposed MoHS-SAR is able to generate software architecture reconstruction solutions with higher cohesion values than the existing approaches TAMoEA, NSGA-II, and MHypEA. Under both ADQM-2 and ADQM-1, the cohesion values produced by the proposed MoHS-SAR are significantly larger than those of the existing approaches in most of the cases.
The MQ, coupling, and cohesion assessment criteria are used to determine the quality of a software architecture design. The MQ, coupling, and cohesion results reported in Tables 4-6 show that the proposed MoHS-SAR approach outperforms the existing TAMoEA, NSGA-II, and MHypEA approaches in most of the cases. In other words, the better MQ, coupling, and cohesion values of the proposed MoHS-SAR indicate that it is able to reconstruct a better software architecture than the existing TAMoEA, NSGA-II, and MHypEA approaches. Hence, the proposed MoHS-SAR approach can be considered a good alternative for software architecture reconstruction.
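To make the three quality criteria concrete, the sketch below computes them for one candidate solution from a class dependency graph and an integer-encoded module assignment. The definitions used here are assumptions made for illustration and may differ from the exact formulations used in the study: cohesion is counted as the number of intra-module dependencies, coupling as the number of inter-module dependencies, and MQ follows the commonly used TurboMQ form, in which each module contributes a cluster factor μ / (μ + ε/2).

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def architecture_metrics(
    edges: Sequence[Tuple[int, int]], assignment: Sequence[int]
) -> Dict[str, float]:
    """Compute MQ, coupling, and cohesion for one candidate solution.

    edges: (source, target) class indices of the class dependency graph.
    assignment: assignment[i] is the module label of class i.
    """
    intra = defaultdict(int)  # dependencies inside each module
    inter = defaultdict(int)  # dependencies crossing each module's boundary
    cohesion = 0
    coupling = 0
    for src, dst in edges:
        a, b = assignment[src], assignment[dst]
        if a == b:
            intra[a] += 1
            cohesion += 1
        else:
            inter[a] += 1
            inter[b] += 1
            coupling += 1

    mq = 0.0
    for module in set(assignment):
        mu, eps = intra[module], inter[module]
        if mu > 0:  # modules without internal edges contribute zero
            mq += mu / (mu + 0.5 * eps)
    return {"MQ": mq, "coupling": coupling, "cohesion": cohesion}


if __name__ == "__main__":
    cdg_edges = [(0, 1), (1, 2), (2, 0), (3, 4), (2, 3)]
    print(architecture_metrics(cdg_edges, [1, 1, 1, 2, 2]))
```

In this toy graph, classes 0 to 2 form one module and classes 3 and 4 another, so only the dependency from class 2 to class 3 crosses a module boundary.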
Although the proposed MoHS-SAR approach performs better than the TAMoEA, NSGA-II, and MHypEA approaches in terms of the software quality metrics, its performance also needs to be validated with Pareto quality indicators to justify its significance from the algorithmic perspective. For this purpose, we used the IGD quality indicator to evaluate the performance of the multi-objective optimization algorithms. Table 7 presents the IGD results obtained by the proposed MoHS-SAR approach and the existing TAMoEA, NSGA-II, and MHypEA approaches under the ADQM-2 and ADQM-1 multi-objective formulations. The IGD results indicate that the proposed MoHS-SAR performs better than the existing TAMoEA, NSGA-II, and MHypEA approaches in most of the cases.
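For reference, the IGD indicator used in this section can be computed as sketched below: it is the average distance from each point of a reference front to its nearest point in the front obtained by an algorithm, so lower values are better. How the reference front is built is an assumption here; since the true Pareto front is unknown for this problem, a common choice is the set of non-dominated solutions gathered from all runs of all compared algorithms. Objective vectors are assumed to be normalized beforehand.

```python
import math
from typing import Sequence


def igd(
    reference_front: Sequence[Sequence[float]],
    obtained_front: Sequence[Sequence[float]],
) -> float:
    """Inverted generational distance (lower is better).

    For every reference point, take the Euclidean distance to the closest
    obtained point, then average these distances over the reference front.
    """
    total = 0.0
    for ref in reference_front:
        total += min(math.dist(ref, sol) for sol in obtained_front)
    return total / len(reference_front)


if __name__ == "__main__":
    reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
    obtained = [(0.1, 1.0), (0.6, 0.6)]
    print(igd(reference, obtained))
```

An obtained front that covers the whole reference front closely yields a value near zero, while a front that misses regions of the reference front is penalized for the missing coverage.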

Overall, the proposed MoHS-SAR approach produces better results than the existing multi-objective software architecture reconstruction approaches. There can be several reasons for this. The existing approaches TAMoEA, NSGA-II, and MHypEA are based on the evolutionary approach, i.e., the genetic algorithm, in which offspring are created from two parent vectors. In MoHS-SAR, by contrast, a new candidate vector is generated by considering all the existing candidate vectors. This feature makes the approach flexible and effective in generating better results. A second reason is that the MoHS-SAR requires only a few parameters to be set before the algorithm is run, whereas the TAMoEA, NSGA-II, and MHypEA approaches provide many options for operators and parameters, and their effectiveness depends heavily on the choice of operators and parameter values. It is very tedious to evaluate all operator and parameter combinations for a specific optimization problem. Even without an elaborate operator and parameter determination strategy, the proposed MoHS-SAR can generate better results than the TAMoEA, NSGA-II, and MHypEA approaches.
Another reason for the good results of the proposed MoHS-SAR approach could be the effective maintenance of the best non-dominated solutions obtained in each iteration of the algorithm. Proper customization of a metaheuristic algorithm to suit the optimization problem generally helps in producing good-quality solutions, and the hybridization of concepts from different metaheuristic algorithms can also lead to good-quality solutions. In the proposed MoHS-SAR approach, the external archive concept of TAMoEA, the non-dominated sorting and crowding distance concepts of NSGA-II, and the candidate solution representation and initialization concepts of MHypEA have been exploited.
Even though our proposed MoHS-SAR approach performs well compared to the existing multi-objective software reconstruction approaches in terms of MQ, coupling, cohesion, and IGD, it still has limitations and differs from the traditional software architecture recovery approaches. Our proposed approach mainly focuses on the structural module view of software architecture reconstruction and is not guaranteed to generate architectural solutions that comply with the developers' view of the software architecture. Many traditional approaches rely on textual or non-structural input (e.g., identifier names, method names, and file authorship) to generate the architectural components. The design of software architecture reconstruction approaches also varies according to the programming languages used for the software systems. Our proposed approach is designed for software systems developed in object-oriented programming languages, but it can also work with software systems developed in other programming languages by redefining the meaning of software entity and component. In summary, the fuzzy nature and multiple views of software architecture encourage the design of different types of software architecture reconstruction approaches.

Threats to Validity
The proposed approach is based on search-based metaheuristic optimization, in which various operators and parameters are used to make the search effective. Better coordination of the operators and well-suited parameter values can lead search-based metaheuristics towards better optimization results. Apart from the methodology, the modeling of the problem can also affect the outcome of the approach. In this section, we discuss the factors that can affect the validity of the outcome obtained through the methodology. More specifically, we consider external and internal validity.
External threats to validity concern factors that could affect how well the results of the approach generalize to a broader range of problems. In software architecture reconstruction, the software systems to be reconstructed generally have different characteristics. Some software systems are small and less complex, whereas others are large and highly complex. Hence, generalizing the results obtained on some specific software systems to broader categories of software is very challenging. Some approaches may perform better on one set of software systems, while other approaches may perform better on another set. We have mitigated this threat by transforming the software systems into an abstract representation called the entity dependency graph, since many software systems can be transformed into this representation. Further, to validate the approach, software systems of various sizes and complexity levels have been considered in our experimentation.
Internal threats to validity concern the relationship between the independent and dependent variables. In other words, internal validity corresponds to the degree to which conclusions can be drawn about the causal effect of the independent variables on the dependent variables. In our approach, possible internal threats to validity are the selection of the statistical analysis and the assumptions made about it.

Conclusions and Future Works
This paper has proposed an HS-based multi-objective optimization algorithm, called MoHS-SAR, for solving the software architecture reconstruction problem. In the proposed MoHS-SAR, the concepts of external harmony memory and crowding distance have been incorporated into the standard harmony search framework. The experimental results have demonstrated that the MoHS-SAR can produce better software architecture reconstruction solutions than several existing customized multi-objective software architecture recovery approaches, such as TAMoEA, NSGA-II, and MHypEA. According to the experimental results, the proposed approach can generate software architecture reconstruction solutions with better values of the MQ, coupling, cohesion, and IGD quality metrics. However, the proposed approach is not guaranteed to generate a good architectural solution from the perspective of architectural views other than the structural module view. The proposed approach works well with a limited number of objectives, but it may not perform well on software architecture problems with a large number of objectives. The proposed approach therefore needs to be further improved for many-objective software architecture reconstruction problems. Future work includes the consideration of multiple dimensions of source code information (e.g., lexical, dynamic, etc.) along with more quality aspects as objective functions in software architecture reconstruction. To strengthen the evidence for the proposed approach, it also needs to be evaluated on larger and more complex software systems.