Immune System Programming: A Machine Learning Approach Based on Artificial Immune Systems Enhanced by Local Search

Abstract: The foundation of machine learning is to enable computers to automatically solve certain problems. One of the main tools for achieving this goal is genetic programming (GP), which was developed from the genetic algorithm to expand its scope in machine learning. Although many studies have been conducted on GP, there are many questions about the disruptive effect of the main GP breeding operators, i.e., crossover and mutation. Moreover, this method often suffers from high computational costs when implemented in some complex applications. This paper presents the meta-heuristic programming framework to create new practical machine learning tools as alternatives to the GP method. Furthermore, the immune system programming with local search (ISPLS) algorithm is composed from the proposed framework to enhance the classical artificial immune system algorithm with the tree data structure in order to deal with machine learning applications. The ISPLS method uses a set of breeding procedures over a tree space with gradual changes in order to surmount the defects of GP, especially the high disruptions of its basic operations. The efficiency of the proposed ISPLS method was proven through several numerical experiments, including promising results for symbolic regression, 6-bit multiplexer and 3-bit even-parity problems.


Introduction
A genetic algorithm (GA) is a meta-heuristic search algorithm that mimics the biological processes of natural selection and survival of the fittest. The GA has been studied and experimented with widely in many applications across several areas [1]. Later, genetic programming (GP) was introduced as a new evolutionary algorithm that inherits the GA strategy [2,3]. The first pure GP work was written in 1985 by Nichael Cramer [4], who used the idea of the GA to propose the tree-based genetic programming approach. This work was then popularized via a search technique created by John Koza in the 1990s [5,6]. Koza demonstrated the feasibility of GP in many application areas. Since then, research in this field has spread and increased rapidly, and the concept of GP has been widely applied in many applications, such as classification [7,8], control [9,10], dynamic processes [11,12], electrical circuit design [13,14], chemical engineering including polymer design [15,16], regression [17,18], and signal processing [19,20].
The main differences between GA and GP lie in the representation of solutions and the application fields. Candidate solutions in GP are represented in executable forms, usually hierarchical tree forms, while solutions in GA are represented as fixed-length binary strings or linear real-valued codes [16]. In GP, each tree consists of a set of leaf nodes called terminals and a set of internal nodes called functions. The functions and terminals are defined to represent candidate solutions depending on the problem under study. The tree-based representation of a solution is an important feature of GP that enables it to handle and solve a variety of applications. The GP algorithm optimizes a population of computer programs in tree-based representation that evolve in each generation according to their fitness values, defined by a program's ability to solve the problem. The algorithm begins the search with a randomly generated set of computer programs. Then, based on their fitness, some individuals are selected from the current population to breed new individuals, using the crossover and/or mutation operators. This process is iterated to drive the population toward the main goal. Consequently, GP is known to pursue the goal of machine learning, which means that computers program themselves without humans guiding them step by step. In addition, many challenging real-world problems have been solved by using the GP technique [18,21]. Despite all these GP successes and applications, the standard breeding operators can spoil promising solutions, and there is some risk that the optimal structure will be difficult to find. Therefore, there have been many attempts to modify the GP operators with the purpose of maintaining promising individuals and reaching optimal solutions [13,22].
Meta-heuristics are often seen as promising options for solving problems that can be modeled as, or transformed into, optimization problems. The most widely used data-structure types in meta-heuristics are real-valued vectors and bit strings [23]. Choosing a suitable meta-heuristic method for a given problem is fundamental: the performance of different meta-heuristics varies, even when they are applied to the same problem. Moreover, the "No Free Lunch" theorem states that the average performance of search algorithms is equal over all possible applications, so no search method is better than the others in general [24]. The presence of a large variety of meta-heuristic methods has stimulated many attempts to extend the scope of meta-heuristics to deal with the tree-data structure and accommodate different types of applications. Using the tree-data structure, there are a limited number of successful attempts to produce alternatives to GP from traditional meta-heuristic methods, such as memetic programming [25], tabu programming [26], and scatter programming [27]. Therefore, we aim to extend more efficient meta-heuristic methods and design new machine learning tools that use the tree-data structure and manipulate computer programs.
This paper introduces the immune system programming with local search (ISPLS) algorithm, which searches for an appropriate computer program and draws its inspiration from the artificial immune system (AIS) algorithm. The AIS algorithm is a meta-heuristic method that takes its inspiration from the human natural immune system. The immune system has many attributes that would be desirable for an artificial system to have, such as self-organization, learning, adaptation, and recognition. AIS is growing rapidly, and it applies the principles of the natural immune system in a wide range of application areas [28,29]. Therefore, we introduce ISPLS as a new machine learning approach that behaves as an AIS algorithm but deals with computer programs represented as trees. The main contribution of the proposed algorithm is to provide more alternatives to the GP algorithm in order to suit more application areas by using the evolutionary and adaptive mechanisms of AIS. The main focus is that ISPLS uses breeding operators over a tree space to generate new offspring from the current parent. These breeding operators modify the program under some restrictions. In addition, ISPLS exploits the breeding operators within the structure of the local search (LS) procedure with gradual changes in order to overcome the defects of GP. The results show that ISPLS is promising in terms of solving several benchmark problems: the symbolic regression, 6-bit multiplexer, and 3-bit even-parity problems.
The paper is organized as follows. The next section illustrates the main framework of meta-heuristic programming (MHP). Section 3 describes the basic concept of AIS, which is an inspirational source for the proposed algorithm. Section 4 presents the proposed ISPLS algorithm, together with the breeding operators over the tree space and the LS procedure that applies the intensification principle in the most promising areas. The numerical results of the ISPLS method are shown in Section 5 for three types of benchmark problems: symbolic regression, the 6-bit multiplexer, and the 3-bit even-parity problems. Finally, the conclusion appears in Section 6.

Meta-Heuristic Programming (MHP)
Meta-heuristics can be categorized into two categories: population-based methods and point-to-point methods [1]. In the latter group of methods, at the end of each iteration, the search provides only one solution with which the search then begins in the next iteration. On the other hand, at the end of each iteration, population-based methods retrieve a set of many solutions. The most famous meta-heuristic method is the GA, which is considered an example of population-based meta-heuristics, while tabu search, simulated annealing and ant colony optimization are point-to-point meta-heuristics [30,31].
The MHP is a framework that attempts to cover many of the well-known meta-heuristic methods and generalizes the data structures used in these methods to be tree-data structures, instead of bit strings or vectors of numbers [26]. In the MHP framework, the initial computer programs can be adapted through four procedures to find an acceptable target solution to the problem at hand. The order of these procedures depends on the meta-heuristic algorithm. These procedures are as follows [26,30]:
• TrialProgram: attempts to generate trial programs from the current program(s).
• Enhancement: enhances the search process by exploiting the best region (good regions are explored more thoroughly to find better solutions) or escaping from the local region if an improvement step is not achieved.
• UpdateProgram: selects one or more programs to use for the next generation or the next iterate.
• Diversification: directs the search to new unexplored regions in the search space or escapes from the local area.
The TrialProgram and UpdateProgram procedures are the basic procedures in MHP. The other procedures are added to achieve better and faster performance from MHP. In fact, by using these procedures, the MHP behaves as an intelligent hybrid framework [30]. The layout of the MHP framework follows these steps:
1. Initialization: Generate an initial population P_0 (or an initial program x_0) and initialize the iteration counter k := 0.
2. Main Loop: Repeat the main search steps (2.1)-(2.4) M times.
2.1 Trial Solutions: Use the TrialProgram procedure to create trial programs S_k from the current ones P_k (or x_k).
2.2 Enhancement: Apply the Enhancement procedure to improve the programs in S_k.
2.3 Solution Updating: Apply the UpdateProgram procedure to choose the next population P_{k+1} (or the next iterate program x_{k+1}).
2.4 Parameter Updating: Update the current parameters.
3. Termination: Proceed to Step 5 if the termination criteria are met.
4. Diversification: If it is necessary to diversify, apply the DiverseProgram procedure to update the population P_{k+1} (or solution x_{k+1}) with new diverse solutions. Set k := k + 1 and go to Step 2.
5. Intensification: Apply the Enhancement procedure to improve the best programs obtained so far.
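The five-step layout above can be sketched as a generic driver loop. The following is a minimal illustration, not the paper's implementation: the `mhp` function name and its callable parameters are hypothetical, and a toy numeric population stands in for tree-structured programs.

```python
import random

def mhp(init_pop, trial, enhance, update, diversify, is_done, M=50):
    """Generic MHP driver following Steps 1-5 of the framework above."""
    pop = init_pop()                      # 1. Initialization
    for _ in range(M):                    # 2. Main loop
        trials = trial(pop)               # 2.1 TrialProgram
        trials = enhance(trials)          # 2.2 Enhancement
        pop = update(pop, trials)         # 2.3 UpdateProgram
        if is_done(pop):                  # 3. Termination
            break
        pop = diversify(pop)              # 4. Diversification
    return enhance(pop)                   # 5. Intensification

# Toy instantiation: minimize |x| over a population of floats.
random.seed(1)
best = mhp(
    init_pop=lambda: [random.uniform(-10, 10) for _ in range(10)],
    trial=lambda pop: [x + random.gauss(0, 0.5) for x in pop],
    enhance=lambda pop: sorted(pop, key=abs),
    update=lambda pop, tr: sorted(pop + tr, key=abs)[:10],
    is_done=lambda pop: abs(pop[0]) < 1e-3,
    diversify=lambda pop: pop[:8] + [random.uniform(-10, 10) for _ in range(2)],
)
print(len(best), abs(best[0]) <= 10)  # 10 True
```

Because the update step always keeps the best solutions, the incumbent is never lost across iterations, which mirrors the elitist behavior most MHP instantiations rely on.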

Artificial Immune System
The immune system of each organism differs according to its attributes. The human immune system aims to protect the human body from harmful bacteria, fungi, parasites, and viruses that are classified as pathogenic sources and capable of causing disease [32]. Moreover, the immune system recognizes pathogens through antigen molecules and presents different forms of antibodies consisting of T cells and B cells to generate a reaction against these pathogens [33,34].
Although diseases come in many forms, the immune system is considered an adaptive and robust system because it is capable of forming a group of immune cells of many shapes and sizes [35]. These cells combine to form more complex structures called antibodies. In order for the immune system to be more effective, it can change the structure of immune cells to maximize their affinity for the antigen through the process of affinity mutation. Thus, despite the complexity of diseases, immune cells try to adapt themselves to fight these diseases without any prior knowledge of their structure [36,37].
Clonal selection is a widely accepted theory used to model the immune system's responses to infection. Clonal selection theory was proposed by Frank Burnet in 1959 [38]. It is based on the concepts of cloning and affinity maturation. The entire process of clonal selection is based on antigen recognition and cell proliferation. B cells are used to destroy all antigens that invade the body. When the body is exposed to a foreign antigen attack, these B cells clone a specific type of antibody, Ab, which achieves the best bond with the antigen Ag. The binding between Ab and Ag is measured by how well the Ab paratope matches an epitope of the Ag; a closer match means a stronger bond. The mutation process is applied to the cloned cells to ensure diversity. Moreover, the selection process ensures that the cells with higher affinities survive [32].
Many artificial immune algorithms have been proposed to imitate the clonal selection theory. De Castro and Von Zuben [32] introduced a clonal selection algorithm called CLONALG to solve learning and optimization problems in 2002. The CLONALG algorithm is illustrated in Algorithm 1, and its flowchart is explained in Figure 1. For more details, see [32].

1. Create an initial population at random consisting of N_pop candidate solutions according to the problem under study.
2. Evaluate all antibodies and determine their affinities.
3. Select the n antibodies with the highest affinities.
4. Create clones of the n selected antibodies (the number of copies is determined according to their affinities); a higher affinity means a larger clone size.
5. Mutate the cloned antibodies at a rate inversely proportional to their affinities.
6. Re-select the best n mutated clones with the highest affinities to compose the new repertoire.
7. Replace some low-affinity members of the antibody pool with new random ones.
8. Repeat Steps 2 to 7 until a given stopping criterion is met.
9. Return the best antibodies found.
From the CLONALG algorithm, one can note that individuals have independent mutation rates based on their affinities. Specifically, promising solutions that are close to the optimal solution are processed with smaller mutation rates, while those that fall far from the optimal solution undergo greater mutation rates [32]. The cloning process reproduces new solutions that are copies of their parents. The number of clones, N_c^i, generated from each of the n best solutions is calculated as follows:

N_c^i = round(β · N_pop / i),  (1)

where β is a clonal factor ∈ [0, 1], i is the current solution rank, i ∈ [1, n], and N_pop is the population size; see [37]. By observing the AIS algorithm, one can note that the clonal selection immune algorithm is a class of evolutionary algorithms inspired by the human immune system [31].
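The rank-based clone allocation above follows the standard CLONALG rule. A minimal sketch (the helper name `clone_counts` is hypothetical):

```python
def clone_counts(n, beta, n_pop):
    """Number of clones N_c^i for each of the n best antibodies, using the
    standard CLONALG rule N_c^i = round(beta * n_pop / i): the top-ranked
    antibody (i = 1) receives the most clones."""
    return [round(beta * n_pop / i) for i in range(1, n + 1)]

counts = clone_counts(n=5, beta=0.5, n_pop=100)
print(counts)  # [50, 25, 17, 12, 10]
```

Note that Python's `round` uses banker's rounding on exact .5 ties (here 12.5 rounds to 12); an implementation following a different rounding convention would allocate slightly different clone sizes.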

Immune System Programming with Local Search
The proposed ISPLS algorithm deals with a computer program that is represented as a tree with some inner nodes acting as functions and some external nodes representing terminals. The set of functions and the set of terminals and their domains are determined by the user based on the problem at hand. In addition, the tree construction is converted into executable code during the coding process. The search space in the case of the ISPLS algorithm is the collection of all computer programs that can be represented as tree forms. In addition, each computer program has some neighborhoods that should be generated by using the breeding operators that are illustrated in Section 4.1. With these breeding operators, the LS procedure is defined in Section 4.2 for creating new computer programs that meet certain conditions, which are explained later. Section 4.3 summarizes the full steps of the proposed ISPLS algorithm.
The running algorithm of ISPLS achieves the MHP conditions, as shown in the following stages:
1. Initial Stage: The set Pop of initial programs is randomly generated.
2. Evaluation Stage: For each program in Pop, evaluate its efficiency through its ability to solve the considered problem.
3. Clonal Stage: Create some clones of the most promising programs in Pop and save them as the Copy set.
4. Mutation Stage: Apply a mutation mechanism to the programs in the Copy set to create a new set of children programs called the Children set.
5. Diversification Stage: Construct a new set, named the Diverse set, that contains diverse programs to add variety to the search process.
6. Replacement Stage: Replace the Pop set with selected programs from Pop ∪ Children ∪ Diverse.
By iterating the last five stages, the ISPLS searches the space of computer programs to reach an elite program that solves the problem under consideration. During the search process of the main algorithm, the ISPLS should achieve a balance between the intensification search and the diversification search. In fact, the intensification search is achieved by ISPLS through the clonal process together with small-scale mutations that stay close to the neighborhood of the current program. The diversification search is achieved by carrying out a large-scale mutation process on some selected programs, as well as by generating a new set of random programs in each generation during the diversification stage.
In the clonal stage, the fitness values of all programs in the current population are used to rank these programs in descending order. Then, each of the first n programs is replicated several times to produce N_c^i clones according to Equation (1). On the other hand, the ISPLS algorithm applies the mutation process at a different scale to each program among the first n best programs, using a factor called M_c^i, which is calculated as follows:

M_c^i = round(λ · i),  (2)

where λ is a given positive parameter, and i is the program rank; see [37]. The values of N_c^i and M_c^i depend on the program efficiency. For the best program in the current population, N_c^i reaches its maximum value, and M_c^i reaches its minimum value. Therefore, the algorithm applies several mutations at small scales and produces a deep exploration around the best program. As i increases, N_c^i decreases, and M_c^i increases. Therefore, the algorithm applies mutation processes at a large scale to escape trapping in local maxima or minima.

Breeding Operations
The mutation stage has a main role in advancing the search operation. The ISPLS algorithm depends on the LS procedure to perform the mutation process and obtain good programs in the neighborhood of the current one. The LS procedure employs three types of breeding operations, shaking, grafting, and pruning, at different scales to achieve harmony between the intensification and diversification searches. The basic ideas of these breeding operations are taken from tabu programming [26], with some modifications. These three operations can be classified into two categories: fixed-structure search and dynamic-structure search. A fixed-structure search explores the neighborhood of the present program by altering its nodes without changing its structure. On the other hand, a dynamic-structure search is able to modify the structure of the current program by extending some of its terminal nodes or deleting some of its subtrees. The LS procedure uses the shaking process as a fixed-structure search, while grafting and pruning are employed as dynamic-structure searches. Before describing the shaking, grafting, and pruning procedures, some basic notations are defined. For a program P, we define the following:
• d(P), the program depth, is the number of links in the path from the root of the program P to its farthest terminal node.
• d(l) is the number of links in the path from node l to the root of the program that contains l.
• M_d is the maximum depth allowed for a program during the search process.
• |P| is the number of all nodes in the program P.

Shaking Procedure
The shaking search process can be classified as a fixed-structure search. It creates a new set of computer programs, called X_s, from the current program P by changing at most M_c nodes, selected randomly from P, without any effect on the structure. The selected nodes and their alternatives must have the same properties, i.e., a terminal node is replaced with a new terminal node, and a function node is replaced with a new function node with the same number of arguments. Procedure 1 describes the shaking process, which creates a set of N trials from the current program P, with a maximum number of changes M_c for each trial.

Procedure 1. X_s = Shaking(P, N, M_c)
1. Initialization: Set Γ to hold the numbers of all changeable nodes in P, and set the program pool X_s to be empty.

3.3.1 If a similar alternative value from the collection of terminals or functions exists, replace P̄(v(j)) with it.
3.4 Add P̄ to X_s.
4. Return with X_s.
It is important to note that in Step 3.3.1 of Procedure 1, a leaf node is exchanged with another leaf node from the terminal set, and a function node is exchanged with another function node of the same number of arguments. However, the procedure is terminated if there are no alternatives for all nodes in the program.
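The shaking idea can be sketched as follows. This is an illustrative simplification, not Procedure 1 itself: nodes are swapped at a fixed per-node probability rather than by selecting at most M_c nodes, and the function and terminal sets are toy examples.

```python
import random

FUNCS = {'+': 2, '-': 2, '*': 2}          # toy function set with arities
TERMS = ['x', '1']                        # toy terminal set

def shake(p, prob=0.3):
    """Fixed-structure search: each node may be swapped for a same-kind
    alternative (terminal for terminal, function for a function with the
    same number of arguments); the tree shape is never changed."""
    if isinstance(p, str):                        # terminal node
        if random.random() < prob:
            return random.choice([t for t in TERMS if t != p] or [p])
        return p
    f = p[0]
    if random.random() < prob:                    # function node
        same_arity = [g for g in FUNCS if g != f and FUNCS[g] == FUNCS[f]]
        if same_arity:
            f = random.choice(same_arity)
    return (f,) + tuple(shake(a, prob) for a in p[1:])

def shape(p):
    """Structural skeleton of a program, ignoring node labels."""
    return None if isinstance(p, str) else tuple(shape(a) for a in p[1:])

random.seed(0)
P = ('+', ('*', 'x', 'x'), ('-', 'x', '1'))
trials = [shake(P) for _ in range(3)]             # N = 3 trial programs
print(all(shape(t) == shape(P) for t in trials))  # True: structure preserved
```

The `shape` check makes the fixed-structure property explicit: however nodes are relabeled, the skeleton of the tree is identical before and after shaking.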

Grafting Procedure
The grafting search is performed as a dynamic-structure search to enhance the search process. This process creates a set X_g of new programs derived from a program P by expanding some of its terminal nodes, selected randomly, into subtrees. Procedure 2 describes the grafting process, which creates N trial programs from a program P, where at most M_c terminals are expanded to become subtrees.

Procedure 2. X_g = Grafting(P, N, M_c)

2.2.1 Set T to contain all terminal nodes in P̄ whose depth is less than or equal to (M_d − i).
2.2.2 If T is empty, then terminate. Otherwise, choose a terminal node t ∈ T at random.
2.2.3 Replace the node t with a new randomly generated subtree of depth i.
2.3 Update P̄ and add it to X_g.
3. Return with X_g.
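A minimal sketch of the grafting idea on nested-tuple programs (an illustrative encoding; the maximum-depth check against M_d is omitted for brevity, and the helper names are hypothetical):

```python
import random

FUNCS = {'+': 2, '-': 2, '*': 2}   # toy function set with arities
TERMS = ['x', '1']                 # toy terminal set

def random_subtree(d):
    """Grow a random full subtree of depth d over the function/terminal sets."""
    if d == 0:
        return random.choice(TERMS)
    f = random.choice(list(FUNCS))
    return (f,) + tuple(random_subtree(d - 1) for _ in range(FUNCS[f]))

def depth(q):
    """d(P): links from the root to the farthest terminal node."""
    return 0 if isinstance(q, str) else 1 + max(depth(c) for c in q[1:])

def graft(p, d):
    """Dynamic-structure search: replace one randomly chosen terminal node
    with a new random subtree of depth d (a sketch of grafting)."""
    paths = []
    def walk(q, path):                       # collect paths to all terminals
        if isinstance(q, str):
            paths.append(path)
        else:
            for i, c in enumerate(q[1:], 1):
                walk(c, path + (i,))
    walk(p, ())
    target = random.choice(paths)
    def rebuild(q, path):                    # copy the tree, replacing the target
        if path == target:
            return random_subtree(d)
        if isinstance(q, str):
            return q
        return (q[0],) + tuple(rebuild(c, path + (i,))
                               for i, c in enumerate(q[1:], 1))
    return rebuild(p, ())

random.seed(3)
P = ('+', 'x', ('*', 'x', '1'))
G = graft(P, d=2)
print(depth(P), depth(G))  # grafting deepened one branch
```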

Pruning Procedure
The pruning search process is another type of dynamic-structure search and is considered the reverse of the grafting process. The pruning process generates a new set of changed programs, X_p, from the program P by replacing some of its branches with terminals. Procedure 3 describes the pruning process, which creates N trials from the program P, where the procedure cuts at most M_c branches of a specified depth, selected randomly, for each trial.

Procedure 3. X_p = Pruning(P, N, M_c)
1. Initialization: Initialize X_p to be an empty program pool set, and update N to be equal to min{N, d(P)}.

2.2.1 Set S to contain all subtrees in P̄ whose depth is equal to i.
2.2.2 From S, choose a subtree θ at random.
2.2.3 Replace θ with a terminal node chosen at random from the set of terminals.
2.3 Update P̄ and add it to X_p.
3. Return with X_p.

Figure 2 shows, graphically, the strategy of the three types of breeding operations. As mentioned above, some values must be determined before calling these three procedures, such as N and M_c. The value N determines the number of trials produced, although this number also depends on the number of functions and terminals in the original solution. The value M_c controls the amount of change in each trial. In more detail, if N = 3 and M_c = 1, the shaking procedure produces three trials: the first trial is generated by choosing a random node and replacing it with an alternative node, the second trial is generated in the same way, and so on. When the grafting procedure is called, the first trial is generated by choosing a random node and replacing it with a subtree of depth 1, while the second trial is generated by choosing a random node and replacing it with a subtree of depth 2, and so on. Applying the breeding procedures and obtaining the neighbors of the current solution P requires considerable information about it, such as the size of P, the numbers of terminal and function nodes in P, the number of arguments of each function node in P, and the depth of each branch in the solution. Moreover, applying the previous breeding procedures does not guarantee obtaining all the neighbors of an original solution. Thus, these procedures are considered stochastic search procedures that obtain random candidate solutions in the neighborhood of the solution at hand. In other words, each call of any of these procedures can return totally different results, even with the same solution and the same parameters.
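The pruning idea admits a similar sketch. This simplification picks any non-root function-rooted subtree rather than one of a specified depth i as in Procedure 3, and the helper names are hypothetical:

```python
import random

TERMS = ['x', '1']   # toy terminal set

def subtree_paths(p, path=()):
    """Paths to all function-rooted subtrees of a program."""
    if isinstance(p, str):
        return []
    out = [path]
    for i, c in enumerate(p[1:], 1):
        out += subtree_paths(c, path + (i,))
    return out

def size(q):
    """|P|: total number of nodes in the program."""
    return 1 if isinstance(q, str) else 1 + sum(size(c) for c in q[1:])

def prune(p):
    """Dynamic-structure search (the reverse of grafting): replace one
    randomly chosen non-root subtree with a random terminal."""
    candidates = [q for q in subtree_paths(p) if q != ()]
    if not candidates:
        return p                          # nothing to prune
    target = random.choice(candidates)
    def rebuild(q, path):                 # copy the tree, cutting the target
        if path == target:
            return random.choice(TERMS)
        if isinstance(q, str):
            return q
        return (q[0],) + tuple(rebuild(c, path + (i,))
                               for i, c in enumerate(q[1:], 1))
    return rebuild(p, ())

random.seed(7)
P = ('+', ('*', 'x', 'x'), ('-', 'x', '1'))
Q = prune(P)
print(size(P), size(Q))  # 7 5
```

Here either prunable branch is a size-3 subtree, so the pruned program always shrinks from 7 nodes to 5, illustrating how pruning counteracts the growth caused by grafting.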
To enhance the result of the search process, the previous breeding procedures were designed to explore the search space gradually. They can be applied with a small scale of changes to avoid disrupting the current solution, and with a large scale of changes to preserve the principle of diversity. The gradual changes keep the balance between the intensification and diversification principles in the search process, which is a major issue that should be taken into account in designing efficient search procedures [1]. Moreover, ISPLS uses an LS procedure, a strategy that exploits the previous breeding procedures to improve the search process. The next subsection illustrates the LS procedure in detail.

LS Procedure
In this subsection, the breeding procedures presented in Section 4.1 are employed in the LS procedure to create new programs under some restrictions. Specifically, when intensification is desired, the LS procedure is used on a small scale. Conversely, the LS procedure is used on a large scale when the diversification search is required. The LS procedure is proposed to find the best program in the neighborhood of the current program and also to cover the whole search space by moving from one area to another. Procedure 4 illustrates the steps of the LS procedure, where the main loop of the procedure terminates when the maximum number M_f of non-improvements is reached.

2.1 Apply the shaking procedure and set X = Shaking(P, N, M_c).
2.2 Set X_best to be the best program in X.
2.3 If X_best is better than P, then set P = X_best and go to Step 2.1. Otherwise, set k = k + 1.
2.4 If P is better than P̄, then set P̄ = P.
2.5 If k < M_f, apply only one option randomly selected from the following choices (i) or (ii), and let P = Y_best, where Y_best is the best program in Y.
3. Termination: Return P̄.
In the initialization step, Step 1, the procedure starts with a program P that is received from another algorithm. In addition, P̄ and a counter k take their initial values. The counter k counts the number of non-improvements during the search process. The user must determine two positive integers, M_f and N. Specifically, M_f is the maximum number of non-improvements, and N represents the number of trial programs generated in the neighborhood of the current program by using the shaking, grafting and pruning search procedures. In Step 2.1, an inner loop iterates the shaking procedure until it finds a better program near P; it then replaces the current program P, and the procedure goes back to Step 2.1. Otherwise, the LS procedure updates the counter k and proceeds to Step 2.4 to update P̄ if a better program has been explored. In Step 2.5, when the number of non-improvements reaches the maximum number M_f, the procedure stops and returns with P̄. Otherwise, it proceeds to Step 2.6 to diversify the search process by applying either the grafting or the pruning procedure, chosen randomly. In Step 2.7, the procedure replaces P with the best program in Y and updates P̄ if a better program has been explored, then goes back to Step 2.1. Finally, when the termination condition is satisfied, the algorithm stops at Step 3 and returns with the best program found. The main loop is repeated as long as the value of the counter k does not exceed the maximum value M_f. Therefore, the number of fitness evaluations needed during a single run of the LS procedure varies depending on the improvement of the current program. Figure 3 shows the flowchart of the proposed LS procedure. As mentioned above, the strategy of the LS procedure depends on the shaking procedure, which detects the close tree structures around the current program and moves to the best of them.
Applying this operation several times leads to good exploration around the current program; the strategy also uses the dynamic-structure search to move to another area when improvement has stopped in the current region. Therefore, the ISPLS algorithm exploits the LS procedure to enhance the search operation.
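The LS strategy can be sketched as a generic routine: shake while improving, count non-improvements, and fall back to a large-scale diversifying move while k < M_f. The toy numeric instantiation below stands in for tree-structured programs, and the two neighborhood callables stand in for shaking and for the grafting/pruning moves; all names are hypothetical.

```python
import random

def local_search(p, fitness, shake_nbhd, diverse_nbhd, M_f=5, N=8):
    """A sketch of the LS strategy: climb via shaking neighborhoods; on a
    non-improving step increase the counter k, and while k < M_f jump via
    a randomly generated large-scale (diversifying) neighborhood."""
    best, k = p, 0
    while k < M_f:
        X = [shake_nbhd(p) for _ in range(N)]          # Step 2.1
        x_best = max(X, key=fitness)                   # Step 2.2
        if fitness(x_best) > fitness(p):               # Step 2.3
            p = x_best
            continue
        k += 1
        if fitness(p) > fitness(best):                 # Step 2.4
            best = p
        if k < M_f:                                    # diversifying move
            Y = [diverse_nbhd(p) for _ in range(N)]
            p = max(Y, key=fitness)
            if fitness(p) > fitness(best):
                best = p
    return max(best, p, key=fitness)                   # Step 3

# Toy run: maximize -(x - 3)**2 over real-valued "programs".
random.seed(2)
result = local_search(
    p=0.0,
    fitness=lambda x: -(x - 3) ** 2,
    shake_nbhd=lambda x: x + random.gauss(0, 0.1),     # small, fixed-scale move
    diverse_nbhd=lambda x: x + random.gauss(0, 2.0),   # large, diversifying move
)
print(round(result, 2))
```

Note how the counter k, rather than an iteration budget, governs termination, so the number of fitness evaluations per run varies with how long improvements keep arriving, as described above.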

ISPLS Algorithm
Algorithm 2 summarizes the complete steps of the ISPLS method, using the breeding operations and procedures introduced in the previous subsections, where n_rate is given and represents the rate of the most promising solutions that will be cloned and mutated. Additionally, the value d_rate is given and represents the rate of new programs created in the diversification stage.
The termination condition is satisfied by one or more of the following: the algorithm reaches the maximum number of iterations, the algorithm reaches the maximum number of fitness evaluations, or the algorithm obtains the optimal solution.

Termination: Return the best program in Pop.
The ISPLS algorithm uses the strategy of the artificial immune system and augments it with the LS procedure to guide the search to an optimal solution in a suitable time. In Steps 1.1-1.3, the algorithm begins by reading and determining the parameter values, and then it generates the initial solutions in a Pop set. Then, it sorts the initial population in descending order according to fitness value to give the good solutions more opportunity for improvement by the LS procedure. In each iteration, the algorithm selects the best solutions in the population and applies the LS procedure to each of them. The algorithm treats the number of trials generated from one solution by each of the breeding operators as the number of clones of this solution, to ensure that there is no repetition in the output of the breeding operators. In addition, each solution has a single substitute in the next generation. In each generation, the algorithm enhances the search operation by adding a new set of randomly generated solutions to increase diversification. The main loop is repeated until a termination condition is satisfied, and then the algorithm returns with the best program obtained.
The BestPop set represents the memory section, while the other elements of the population are considered the remaining section. The replacement stage arbitrates among the output of the LS procedure (BestPop), the remaining section, and the newly generated solutions (DivPop).
The computational complexity of the worst-case scenario for a single iteration of the main loop of Algorithm 2 is O(N_pop log(N_pop) + M_f M_d T_s L_c), and it can be interpreted as follows:
• Step 2.1 is O(N_pop log(N_pop)), for sorting all programs in Pop.
• Step 2.2 is O(M_f M_d T_s L_c), for applying the LS procedure to the programs in BestPop [35].
• Step 2.5 is O(d), for generating a set of d programs for DivPop.
As a result, the complexity of ISPLS is higher than the corresponding complexity of AIS, unless the LS procedure succeeds in improving the programs in BestPop and finds the optimal solution.

Experimental Results
In this section, we focus on the performance of the proposed ISPLS algorithm in different environments. In all experiments, the algorithm is terminated as soon as the maximum number of fitness evaluations is reached or the optimal solution is found. In addition, some comparisons between the ISPLS algorithm and other algorithms are reported. It is important to note that, for the ISPLS algorithm, the initial population in each test problem is generated as full trees of depth S_d, which can be considered a parameter.

Symbolic Regression Problems
The main target in a symbolic regression problem is to find a mathematical formula that fits a given data set. The fitness value of a program is computed as the negative sum of the absolute errors between the expected output and the actual output over all fitness cases. Therefore, the maximum fitness value for this problem is 0. In this paper, we consider four symbolic regression problems in which a polynomial function f is given, and the target for each problem is to produce a new function g that approximates the original polynomial with the minimum error by using a data set generated at random from f [26]. The set {+, −, *, %} is used as the function set for all symbolic regression problems, where the operator % is the protected division, i.e., a % b = 1 if b = 0 and a % b = a/b otherwise [30,39].
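The fitness computation and the protected division % can be sketched directly (the helper names `pdiv` and `fitness` are hypothetical, and the sample points are illustrative rather than the random cases used in the experiments):

```python
def pdiv(a, b):
    """Protected division (the % operator above): returns 1 when b == 0."""
    return 1.0 if b == 0 else a / b

def fitness(program, cases):
    """Negative sum of absolute errors over all fitness cases; the maximum is 0."""
    return -sum(abs(program(x) - y) for x, y in cases)

# A target expressible with the function set {+, -, *, %} and the terminal x.
g = lambda x: x * x * x * x + x * x * x + x * x + x   # x^4 + x^3 + x^2 + x
cases = [(x / 10.0, g(x / 10.0)) for x in range(-10, 11)]  # 21 sample points
print(pdiv(1, 0), fitness(g, cases) == 0)  # 1.0 True
```

A program reproducing every fitness case exactly attains the maximum fitness 0, which is the success criterion used for these problems.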

The Fourth Degree Polynomial
In this subsection, we consider the quartic polynomial function f(x) = x^4 + x^3 + x^2 + x, where a data set of 20 fitness cases of the form (x, f(x)) is obtained by choosing x uniformly at random in the interval [−1, 1] [2,26]. This problem is referred to as the SR-QP problem in this paper. The function set of the SR-QP problem is {+, −, *, %}, and the terminal set is the singleton {x}.
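Such a data set can be generated, for instance, as in the following sketch (variable names are ours):

```python
import random

def quartic(x):
    # Target polynomial f(x) = x^4 + x^3 + x^2 + x
    return x**4 + x**3 + x**2 + x

# 20 fitness cases (x, f(x)) with x chosen uniformly at random in [-1, 1]
cases = [(x, quartic(x)) for x in (random.uniform(-1.0, 1.0) for _ in range(20))]
```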

The Quintic Degree Polynomial
In this problem, we consider the quintic polynomial function f(x) = x^5 − 2x^3 + x, where a data set of 50 fitness cases of the form (x, f(x)) is obtained by choosing x uniformly at random in the interval [−1, 1] [39]. This problem is referred to as the SR-QUP problem. The function set of the SR-QUP problem is {+, −, *, %}, and the terminal set is {x}.

The Sixtic Degree Polynomial
Here, we consider the sixtic polynomial function f(x) = x^6 − 2x^4 + x^2, where a data set containing 50 fitness cases of the form (x, f(x)) is obtained by choosing x uniformly at random in the interval [−1, 1] [39]. This problem is referred to as the SR-SP problem throughout the remainder of this paper. Moreover, we use {+, −, *, %} as the function set for the SR-SP problem, and the terminal set is {x}.

The Multivariate Polynomial
In this problem, the multivariate polynomial function f(x_1, x_2, x_3, x_4) = x_1 x_2 + x_3 x_4 + x_1 x_4 is considered, where a data set of 50 fitness cases of the form (x_1, x_2, x_3, x_4, f(x_1, x_2, x_3, x_4)) is generated randomly with x_i ∈ [−1, 1], i = 1, 2, 3, 4. This problem is referred to as the POLY-4 problem in this paper. Similarly to the previous problems, we use {+, −, *, %} as the function set; however, the variables x_1, x_2, x_3 and x_4 form the terminal set for the POLY-4 problem.

6-Bit Multiplexer Problem
The input to the Boolean 6-Bit Multiplexer (6-BM) function is composed of two "address" bits a_1 and a_0, and four "data" bits d_3, d_2, d_1 and d_0. The value of the 6-BM function is the data bit d_{a_0 + 2a_1} singled out by the two address bits a_0 and a_1. All 2^6 = 64 combinations of the arguments are considered fitness cases, and the truth table is formed by calculating the Boolean value of each combination. The fitness value of a program is the number of fitness cases for which the Boolean values returned by that program are correct; therefore, a program's maximum fitness value equals the number of fitness cases, which is 64 [26,30]. For the 6-BM problem, the function set is the set of Boolean functions {AND, OR, NOT, IF}, where IF(x, y, z) returns y if x is true and returns z otherwise. Moreover, the set of arguments {a_0, a_1, d_0, d_1, d_2, d_3} is used as the terminal set.
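The 6-BM target and its fitness computation can be sketched as follows (function names are ours for illustration):

```python
from itertools import product

def mux6(a1, a0, d0, d1, d2, d3):
    # The 6-BM target: the address bits select data bit d_(a0 + 2*a1)
    return (d0, d1, d2, d3)[a0 + 2 * a1]

def fitness_6bm(program):
    # Number of the 2^6 = 64 fitness cases answered correctly (maximum 64)
    return sum(program(*bits) == mux6(*bits)
               for bits in product((0, 1), repeat=6))
```

A perfect program scores 64, while the trivial always-0 program agrees with the selected data bit on exactly half of the cases and scores 32.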

3-Bit Even-Parity Problem
A parity bit is a check bit (1 or 0) appended to a string of binary code to denote whether the total number of 1-bits in the string is even or odd. Parity bits are used for error detection purposes, e.g., detecting transmission errors [40]. The Boolean 3-bit even-parity (3-BEP) function is a function of 3 arguments of bit type, namely a_0, a_1 and a_2. The function returns 1 if the arguments include an even number of 1-bits, and it returns 0 otherwise. All 2^3 = 8 combinations of the arguments are considered fitness cases, and the truth table is formed by calculating the Boolean value of each combination. The fitness value of a program is the number of fitness cases for which the Boolean values returned by the program are correct; therefore, a program's maximum fitness value for the 3-BEP problem is 8 [27,39]. For the 3-BEP problem, the set of Boolean functions {AND, OR, NAND, NOR} forms the function set, and the set of arguments {a_0, a_1, a_2} forms the terminal set [41].
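Analogously to the 6-BM case, the 3-BEP target and its fitness can be sketched as (names are ours):

```python
from itertools import product

def even_parity3(a0, a1, a2):
    # Returns 1 when the arguments contain an even number of 1-bits
    return 1 if (a0 + a1 + a2) % 2 == 0 else 0

def fitness_3bep(program):
    # Number of the 2^3 = 8 fitness cases answered correctly (maximum 8)
    return sum(program(*bits) == even_parity3(*bits)
               for bits in product((0, 1), repeat=3))
```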

Parameter Tuning
This subsection discusses the performance of the ISPLS algorithm on the benchmark problems of the previous subsections. We focus on the effect of the ISPLS parameters and their proper values for each test problem. Therefore, a set of different values is specified for each of these parameters; then, for each value, 100 independent runs are performed in order to calculate the mean number of fitness evaluations used over these independent runs. The other parameters are fixed at their standard values given in Table 1. These values are determined from common settings in the literature as well as from some pilot experiments. The results obtained from parameter tuning differ from one problem to another because of differences in problem structure and complexity, which is why the parameter tuning process is applied to each problem separately. Throughout all the experiments in this paper, we use the maximum depth M_d = 7, which means the maximum length of a program is 255 nodes for the SR-QP, SR-QUP, SR-SP, POLY-4 and 3-BEP problems. However, the maximum length of a program for the 6-BM problem is 3280, since the function set contains the ternary function IF(x, y, z). Tables 2 and 3 show the performance of the ISPLS algorithm with different values of each parameter for all the problems under consideration. In all the experiments shown in these tables, the ISPLS algorithm terminates when it obtains the optimal program with the maximum fitness value.

GPLab [43,44] is a Matlab toolbox that includes the traditional features and capabilities of GP tools and can be used for a wide range of applications. In this experiment, we compared the results of the ISPLS algorithm with the corresponding results of the GPLab toolbox, assuming that both have a limited number of fitness evaluations. Table 4 illustrates the comparison between ISPLS and GPLab, where MaxnFits denotes the maximum allowed number of fitness evaluations [26].
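The maximum program lengths quoted above follow from the node count of a full tree: a full tree of depth M_d whose internal nodes all have arity a contains (a^(M_d+1) − 1)/(a − 1) nodes, giving 255 nodes for a binary function set with M_d = 7 and 3280 nodes when the ternary IF is included. A quick check:

```python
def max_nodes(arity, depth):
    # Total nodes in a full tree of the given depth (root at depth 0)
    # when every internal node has the given arity:
    # 1 + a + a^2 + ... + a^depth = (a^(depth+1) - 1) / (a - 1)
    return (arity ** (depth + 1) - 1) // (arity - 1)
```

Here max_nodes(2, 7) gives 255 and max_nodes(3, 7) gives 3280, matching the bounds above.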
The tabu programming (TP) algorithm is an extended version of the tabu search algorithm [30], in which the search space is the collection of all computer programs that can be represented as trees. The main contribution of TP is the design of alternatives to the GP algorithm in order to accommodate more application areas. Table 4 illustrates the comparison between ISPLS, TP and GPLab for 100 independent runs with a limited number of fitness evaluations for each run. Specifically, the maximum number of fitness evaluations per run is 2500 for the SR-QP problem and 25,000 for both the 6-BM and 3-BEP problems [26]. The results of these algorithms are shown in Table 4 in terms of the mean number of fitness evaluations needed for each run and the success rate. A two-tailed one-sample t-test is used to assess the significance of the differences between the means of GPLab and TP versus ISPLS, where p < 0.01 ** means the difference is highly significant, p < 0.05 * means the difference is significant, and p > 0.05 means the difference is not significant. The parameter values of ISPLS are those shown in Table 1, except for the 3-BEP problem, where n_rate equals 0.3.

Table 4. Comparison of the GPLab, TP and ISPLS algorithms under the same limitations on the number of fitness evaluations, where p > 0.05 means "not significantly different", p < 0.05 * means "significantly different", and p < 0.01 ** means "highly significant".

Poli and Langdon [45] introduced the backward-chaining GP (BC-GP) algorithm as a modified version of the GP algorithm. Extensive numerical experiments involving thousands of independent runs have been performed to compare the standard GP algorithm with the BC-GP algorithm for the SR-QP and POLY-4 problems [45,46]. In this section, we performed several experiments for the same problems using the ISPLS algorithm to compare our results with those of Poli and Langdon [46].
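The statistic behind the two-tailed one-sample t-test, t = (x̄ − μ0)/(s/√n), can be sketched as follows (the helper name is ours; in this setting the sample would be the per-run fitness-evaluation counts of one algorithm and μ0 the ISPLS mean):

```python
import math
import statistics

def one_sample_t(sample, popmean):
    # t = (sample mean - popmean) / (sample std / sqrt(n));
    # the statistic is then compared against the two-tailed critical
    # value of Student's t distribution with n - 1 degrees of freedom
    # to obtain the reported p-value thresholds.
    n = len(sample)
    s = statistics.stdev(sample)
    return (statistics.mean(sample) - popmean) / (s / math.sqrt(n))
```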

Comparison with GP and BC-GP
Poli and Langdon [45] evaluated the GP and BC-GP algorithms through two different experiments for the SR-QP problem with different settings. In the first experiment, 5000 independent runs were performed using N_pop = 100 for 30 generations, i.e., the maximum number of fitness evaluations for each algorithm was MaxnFits = 3000. In the second experiment, 1000 independent runs were performed using N_pop = 1000 for 30 generations, i.e., MaxnFits = 30,000. Similar experiments were conducted for the POLY-4 problem, where 5000 independent runs and another 1000 independent runs were performed with N_pop = 1000 and N_pop = 10,000, respectively. For each run, the GP and BC-GP algorithms iterated for 30 generations, i.e., MaxnFits = 30,000 and MaxnFits = 300,000 fitness evaluations, respectively. The results of the GP and BC-GP algorithms are taken from [46].
For the ISPLS algorithm, six different experiments were conducted for the SR-QP problem using different settings. Specifically, 5000 independent runs with MaxnFits = 3000 and 1000 independent runs with MaxnFits = 30,000 were performed, where each experiment was repeated three times using N_pop = 25, N_pop = 50 and N_pop = 100. The remaining parameter values of the ISPLS algorithm are shown in Table 1. Figure 4 shows the performance of the proposed algorithm compared with the GP and BC-GP algorithms in terms of the rate of success for the SR-QP problem.
For the POLY-4 problem, we repeated the same experiments as for the SR-QP problem using the ISPLS algorithm. Specifically, 5000 independent runs with MaxnFits = 30,000 and 1000 independent runs with MaxnFits = 300,000 were performed, each repeated three times using N_pop = 50, N_pop = 100 and N_pop = 150. The remaining parameter values of the ISPLS algorithm are shown in Table 1. Figure 5 shows the performance of all algorithms in terms of the rate of success for the POLY-4 problem.
As seen in the previous figures, the ISPLS algorithm finds an optimal solution much faster than the BC-GP and GP algorithms. Therefore, the ISPLS algorithm can save considerable time and computation during the search process. One can conclude that ISPLS was superior to GP and BC-GP.

Walker and Miller [39], Atkinson [47] and Fang and Joe [48] examined the performance of different versions of the Cartesian genetic programming (CGP) algorithm. Specifically, extensive numerical experiments have been performed on many benchmark problems using the CGP, ECGP, EGGP, TAPMCGP and FMCGP algorithms. In this experiment, we present a comparison between the performance of these algorithms and that of the ISPLS algorithm. For the CGP and ECGP algorithms in [39], 50 independent runs were performed for each problem under study, while for the CGP, EGGP, TAPMCGP and FMCGP algorithms in [47,48], 100 independent runs were performed for each problem. Similarly, we performed 100 independent runs for each problem using the ISPLS algorithm. In each run, the algorithm ran until an optimal solution with the maximum fitness value was found. The parameter values of the ISPLS algorithm for each problem are shown in Table 1; moreover, we used S_d = 5, d_rate = 0.4 and n_rate = 0.3 for the SR-QUP, SR-SP and 3-BEP problems, respectively. Table 5 shows the performance of the CGP, ECGP, EGGP and ISPLS algorithms in terms of the median (ME), median absolute deviation (MAD) and interquartile range (IQR) of the number of fitness evaluations used to reach the optimal solution, where the results of CGP, ECGP and EGGP were reported from the original references [39,47].
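The dispersion statistics used in Table 5 can be computed as in the following sketch (using Python's statistics module; MAD here denotes the median absolute deviation from the median, and the helper name is ours):

```python
import statistics

def dispersion(evals):
    # ME:  median of the fitness-evaluation counts
    # MAD: median absolute deviation from the median
    # IQR: interquartile range Q3 - Q1
    me = statistics.median(evals)
    mad = statistics.median(abs(e - me) for e in evals)
    q1, _, q3 = statistics.quantiles(evals, n=4)
    return me, mad, q3 - q1
```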
Table 6 shows the performance of the Baseline CGP, TAPMCGP, FMCGP and ISPLS algorithms in terms of the mean and standard deviation (Std) of the number of fitness evaluations used to reach the optimal solution, where the results of Baseline CGP, TAPMCGP and FMCGP were reported from their original reference [48]. A two-tailed one-sample t-test was used to assess the significance of the differences between the means of Baseline CGP, TAPMCGP and FMCGP versus ISPLS, where p < 0.01 ** means the difference is highly significant, p < 0.05 * means the difference is significant, and p > 0.05 means the difference is not significant.
From the results in Tables 5 and 6, we can see that the ISPLS algorithm clearly outperforms all algorithms under consideration for all test problems, except the FMCGP algorithm for the SR-SP problem. These results reflect the power of the LS procedure combined with the proposed breeding operators: shaking, grafting and pruning.

Table 6. Comparison of the CGP, TAPMCGP, FMCGP and ISPLS algorithms in terms of mean and Std, where p > 0.05 means "not significantly different", p < 0.05 * means "significantly different", and p < 0.01 ** means "highly significant".

Conclusions
The ISPLS algorithm is proposed as an extension of the AIS algorithm, a popular population-based meta-heuristic method. The main difference between the ISPLS and AIS algorithms lies in the solution representation: solutions in the ISPLS algorithm are computer programs represented by parse trees. In addition, the proposed local search procedures are used to define and explore the best neighborhoods of a solution; these procedures are able to balance intensification and diversification of the search. Three types of standard problems were used to test the performance of the ISPLS algorithm through a set of experiments analyzing the main parameters of the proposed algorithm. In these numerical experiments, the ISPLS algorithm showed promising performance compared with different versions of the GP algorithm. Specifically, the ISPLS algorithm outperformed the GP algorithm in terms of the success rate and the number of fitness evaluations required to reach an optimal solution, at least for the well-studied test problems.