Universal Approach to Solution of Optimization Problems by Symbolic Regression

Optimization problems and their solution by symbolic regression methods are considered. The search is performed in non-Euclidean spaces. In such spaces it is impossible to determine a distance between two potential solutions and, therefore, algorithms using the arithmetic operations of multiplication and addition cannot be applied. Instead, the search for the optimal solution is performed in the space of codes. It is proposed that the principle of small variations of the basic solution be applied as a universal approach to creating search algorithms. Small variations define a neighborhood of a potential solution, and the solution is searched for within this neighborhood. The concept of the inheritance property is introduced. It is shown that for a non-Euclidean search space, the application of evolution and small variations of possible solutions is effective. Examples of applying the principle of small variations of the basic solution to different symbolic regression methods are presented.


Introduction
All optimization problems can be divided into two large classes. The first class includes problems where the target function is calculated directly on elements of the search space. In problems of the second class, the target function is calculated on elements of one space, while the search for optimal solutions is performed in another space with a different metric. These metrics do not coincide, although a one-to-one mapping exists between the spaces. Problems of the second class will be referred to as optimization in a non-Euclidean space or, briefly, non-numerical optimization.
All NP-hard problems belong to non-numerical optimization. To solve non-numerical optimization problems, random search or complete enumeration algorithms are usually used, or, if the problem is of great importance, special algorithms are developed.
Recently, symbolic regression methods have appeared. They can be applied to non-numerical optimization. All symbolic regression methods search for optimal solutions in a space of codes. The first symbolic regression method was genetic programming (GP) [1]. Its author applied a genetic algorithm to search for an optimal program code. For this purpose, a program was presented in a universal form with prefix notation: each operator is written as the code of the operator followed by the codes of its operands, and the codes of other operators may themselves serve as operands. The entire structure can be presented as a tree. Each node of the tree is associated with an operator, and the number of branches from a node equals the number of operands of that operator. Variables and constant parameters are placed on the leaves of the tree.
One of the main achievements of the author of GP is that he managed to apply the genetic algorithm (GA) to search for a solution encoded by a complex computational operator tree. To do this, it was necessary to change the crossover operation of the GA.

Definition 1. Optimization in a non-numerical space is an optimization in which the target function is calculated for elements of a space with one metric, the search for the optimal solution is performed in a space with another metric, and these two metrics do not coincide.
The search for the optimal solution consists of performing certain actions on the elements of these spaces, for example, calculating gradients, addition, multiplication by scalars, etc.; this also includes arithmetic operations on the components of possible solutions in order to obtain new possible solutions with better values of the target function.
There are optimization problems in which the target function is calculated over the elements of possible solutions presented in spaces with metrics (1)-(3), but the possible solutions, and the actions performed on them to obtain new solutions, are presented in a different space with a different metric. Let us call this generalized space the code space S, where s ∈ S is the code of a possible solution from a space with the numerical metrics (1)-(3).
There is a one-to-one correspondence between the code space and the spaces with numerical metrics. It is also possible to define metrics in the code space, for example, the Levenshtein [25] or Hamming distance or some other, but these metrics will not agree with the metrics of the corresponding elements from the spaces with distances (1)-(3). This means that if an inequality between distances is satisfied for codes s_1, s_2, s_3 ∈ S, the corresponding inequality for their images in the numerical space may not be satisfied. Typical optimization problems in a space with a non-numerical metric are NP-hard problems, for example, the traveling salesman problem (TSP), where all towns must be visited once while minimizing some criterion (route length). In this problem, the target function is calculated in a space with a Euclidean metric, each possible route is given by the order of towns, and a possible solution code is a combinatorial permutation of the indices of towns. It is believed that the difficulty of many NP-hard problems is that the search for the optimal solution is performed in one space, while the value of the target function is calculated in another.
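The mismatch between a code-space metric and the numerical metric can be illustrated on a toy TSP instance. The towns, the choice of the Hamming metric, and the routes below are illustrative assumptions, not data from the article.

```python
# Sketch: a metric on the code space (permutations) need not agree with the
# route-length metric computed in Euclidean space.

def hamming(p, q):
    """Hamming distance between two permutation codes."""
    return sum(a != b for a, b in zip(p, q))

def route_length(p, towns):
    """Closed-tour length for a permutation p over town coordinates."""
    total = 0.0
    for i in range(len(p)):
        (x1, y1), (x2, y2) = towns[p[i]], towns[p[(i + 1) % len(p)]]
        total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total

towns = [(0, 0), (0, 1), (1, 1), (1, 0)]   # unit square
base = (0, 1, 2, 3)                         # perimeter tour, length 4
swap = (0, 2, 1, 3)                         # Hamming distance 2 from base

print(hamming(base, swap))                                   # close in the code space
print(route_length(base, towns), route_length(swap, towns))  # yet the tours differ
```

Here the two codes differ in only two positions, but the second tour crosses the square diagonally and is noticeably longer, so closeness of codes says little about closeness of criterion values.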
In the second half of the 20th century, the genetic algorithm appeared [26]. The algorithm was designed to find the optimal solution in a vector space. The search for the optimal solution is done in the space of Gray codes, i.e., possible solutions are transformed from the vector space into the space of Gray codes. Genetic operations are performed on the Gray codes. Then, to evaluate the new codes of possible solutions, they are decoded back into real vectors. Such complications in the search are most likely associated with the fact that a global optimization problem is being solved and the target function is not unimodal.
This work is devoted to the methods of symbolic regression. These techniques emerged as a development of genetic programming [1]. Here, symbolic regression is used to find a mathematical expression for some function, coded in a certain way; GA is then applied to these codes to find the optimal solution.
Note that the classical GA performs the crossover operation similarly to the crossing of genes in living organisms. The crossing point is determined randomly, and the tails of the genes are exchanged after the crossing point. If the GA crosses two identical codes, then the same codes are obtained as a result of the crossover.
In GP, the code for a possible solution is a computational tree. To encode a mathematical expression, it is necessary to determine a basic set of elementary functions. Each elementary function can be encoded by two integers, the function index and the number of its arguments. The computational tree of a mathematical expression is a sequence of codes of elementary functions. Unlike codes in GA, codes of different mathematical expressions in GP have different lengths. Therefore, to perform the crossover, a random crossing point is chosen for each parent, and crossover is performed by exchanging the branches starting from these points. Since the sizes of the branches differ, the lengths of the offspring differ from the lengths of the parents' codes. The crossover in GP is thus not similar to the crossing of genes in living organisms: when crossing two identical codes, we obtain two different codes due to the two different crossover points.
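The GP crossover described above can be sketched for prefix-coded trees. The symbol set, the arity table, and the flat-list encoding are illustrative assumptions, not the article's exact tables.

```python
import random

# Each symbol has a fixed arity; a tree in prefix notation is a flat list.
ARITY = {'+': 2, '*': 2, 'sin': 1, 'cos': 1, 'x': 0, 'q': 0}

def subtree_end(code, start):
    """Index one past the subtree that begins at position `start`."""
    need, i = 1, start
    while need:
        need += ARITY[code[i]] - 1
        i += 1
    return i

def crossover(p1, p2, rng):
    """Exchange random subtrees: offspring lengths generally differ from parents'."""
    a = rng.randrange(len(p1))
    b = rng.randrange(len(p2))
    a2, b2 = subtree_end(p1, a), subtree_end(p2, b)
    return p1[:a] + p2[b:b2] + p1[a2:], p2[:b] + p1[a:a2] + p2[b2:]

rng = random.Random(1)
parent = ['+', 'sin', 'x', '*', 'q', 'x']   # sin(x) + q*x
c1, c2 = crossover(parent, parent, rng)
print(c1, c2)   # crossing a code with itself may still yield different offspring
```

Because whole subtrees are exchanged, each offspring remains a valid prefix code even though its length changes.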
Note that when searching for the structure of a mathematical expression, it is necessary to use the code space, since the mathematical notation of a function is itself a specific code in a space with a non-numerical metric. Previously, the search for mathematical expressions was reduced to searching for parameters with some accuracy; in the search process, these mathematical expressions could only become more complex, as happens in series expansions or in neural networks. GP and other symbolic regression methods allow searching not only for the parameters but also for the structure of mathematical expressions. Nowadays, many symbolic regression methods have emerged that eliminate some of the disadvantages of genetic programming; for example, owing to redundancy in the encoding, all codes of possible solutions may have the same length.
All symbolic regression methods use special operations of crossover and mutation, which construct codes of new possible solutions. It should be noted that, due to the complexity of the code, crossover operations can create new possible solutions that do not preserve the properties of the parents, unlike in the classical GA. In this case, complex operations of crossover and mutation generate new possible solutions much like random number generators do.
The effectiveness of symbolic regression methods is usually compared to that of random search. This means that the crossover and mutation operations of a symbolic regression method should generate a set of codes of new possible solutions whose best member has a smaller target function value than the best member of a randomly generated set of the same cardinality.
Let us define the inheritance property of crossover for symbolic regression methods.

Definition 2.
A symbolic regression method has the inheritance property if, when the crossover operation is performed for M randomly selected possible solutions, there exist αM new possible solutions, where 1/M < α ≤ 1, whose target function values differ from the parent values by no more than Δ ≥ 0.

Theorem 1. For a symbolic regression method with the inheritance property to be more efficient than random search in solving optimization problems in a non-numerical space, it is enough that the value of the target function is uniformly distributed from f− to f+ and

f* < (α/2) f+ + (1 − α/2) f−,

where f* is the value of the functional of a possible solution and α determines the neighborhood in the space of solutions whose functional values differ from f* by no more than Δ.
Proof. Let us find a possible solution with target function value f*. Then, according to the condition of the theorem, random search will find a possible solution with target function value f̂ < f* with probability

p_RS = (f* − f−)/(f+ − f−).

In symbolic regression, the probability of getting into the neighborhood of the parent solution is α. Due to the uniform distribution, half of the solutions in this neighborhood are better than the parent solution and the other half are worse; thus, the probability of finding a possible solution with target function value f̂ < f* according to the inheritance property is

p_S = α/2.

Then p_S > p_RS if the condition α/2 > (f* − f−)/(f+ − f−) is fulfilled, which is equivalent to the inequality of the theorem.

Consider now a universal approach to creating symbolic regression methods that most likely preserves the inheritance property.
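The two probabilities in the proof can be compared numerically under the theorem's own assumptions. The concrete values of f−, f+, α, and f* below are arbitrary illustrations chosen to satisfy the theorem's condition.

```python
# Numerical check of Theorem 1 under its stated assumptions: target values
# are uniform on [f_minus, f_plus], and an inheritance-property method lands
# in the parent's alpha-neighborhood, half of which is better than the parent.
f_minus, f_plus, alpha = 0.0, 1.0, 0.2
f_star = 0.05   # satisfies f* < (alpha/2) f+ + (1 - alpha/2) f- = 0.1

p_rs = (f_star - f_minus) / (f_plus - f_minus)   # random-search success probability
p_s = alpha / 2.0                                 # inheritance-based success probability
print(p_s > p_rs)                                 # the symbolic method wins here
```

With these numbers p_RS = 0.05 and p_S = 0.1, so the inheritance-property method is twice as likely as random search to improve on f*.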

Principle of Small Variations of Basic Solution
The principle of small variations is universal and can be applied to any symbolic regression method. To apply the principle, small variations are first defined for the code of the symbolic regression method. A small variation is coded by an integer vector of small dimension; this code contains the information required to perform the variation. Then, one possible solution, called the basic solution, is encoded by the symbolic regression method. The basic solution is set by the researcher as the solution presumed closest to the optimal solution of the problem. All other possible solutions are determined by ordered sets of vectors of small variations. The search for the optimal solution is performed by a GA, called the variational genetic algorithm (VarGA), that searches in the space of ordered sets of vectors of small variations. During the search, after a given number of generations, the basic solution is replaced by the best possible solution found so far.
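The VarGA loop described above can be sketched as follows. This is a minimal sketch under assumptions: the individual encoding and the helper callables (`random_variation`, `apply_variations`, `fitness`) are problem-specific placeholders supplied by the user, not the article's exact algorithm.

```python
import random

def var_ga(basic, random_variation, apply_variations, fitness,
           pop_size=50, depth=8, generations=100, epoch=20, seed=0):
    """Search in the space of ordered sets of small-variation vectors."""
    rng = random.Random(seed)
    new = lambda: [random_variation(rng) for _ in range(depth)]
    # The empty variation set [] encodes the basic solution itself.
    pop = [[]] + [new() for _ in range(pop_size - 1)]
    for g in range(generations):
        pop.sort(key=lambda w: fitness(apply_variations(basic, w)))
        # Keep the elite half, refill the rest with random variation sets.
        pop = pop[:pop_size // 2] + [new() for _ in range(pop_size - pop_size // 2)]
        if (g + 1) % epoch == 0:
            # After a given number of generations, replace the basic
            # solution with the best solution found so far.
            basic = apply_variations(basic, pop[0])
            pop[0] = []
    pop.sort(key=lambda w: fitness(apply_variations(basic, w)))
    return apply_variations(basic, pop[0])
```

A selection-only loop stands in for full crossover and mutation to keep the sketch short; the essential VarGA features, decoding through the basic solution and periodic replacement of the basic solution, are both present.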
Let us consider the application of the principle of small variations in detail. In general, the elements of the code space of a non-numerical optimization problem can be written in the form of ordered sets of integer vectors (16), where S_i is a coded element. Each element of (16) consists of a given number of integer vectors, S_k = (s_{k,1}, s_{k,2}, . . . , s_{k,n_k}).
Here, s_{k,j} is an integer vector, j = 1, . . . , n_k, n_k is the length of one element code, and r is the length of a code vector. An element of code can be a vector, a matrix, or an ordered set of matrices. In general, such code constructions can always be presented as an ordered set of integer vectors if no special mathematical operations are applied to them; for example, a matrix code can be written as the ordered set of its rows. Hence, the code of a symbolic regression method is an ordered set of integer vectors. Different sets can have different numbers of vectors, but all integer vectors have the same number of elements.
Let us introduce an elementary variation of the code of a non-numerical element: the replacement of one element of the code. Replacement of one element does not always result in a new valid code; there are certain coding rules that must not be violated. Let us define a small variation.

Definition 4.
A small variation is the required minimum set of elementary variations that are necessary to obtain valid code of the element from another valid code.
In some symbolic regression methods, a small variation consists of one elementary variation, and in other methods, several elementary variations are needed.
For a given set of valid codes, let us define a finite set of small variations. Completeness is the main property of a set of small variations for a set of valid codes of bounded length.

Definition 5.
A set of small variations is complete if, for any two valid codes S_i, S_j ∈ Ξ, it is always possible to find a finite number of small variations that transform the valid code S_i ∈ Ξ into the valid code S_j ∈ Ξ.
Any small variation is a mapping of the set of valid element codes into itself. Let us define the distance between two valid codes.

Definition 6.
The distance between two valid codes equals the minimum number of small variations to obtain a valid code from another valid code.
Here, the distance corresponds to the Levenshtein metric [25] but for symbolic regression codes.
Let us define the R-neighborhood ∆_R(S) of code S.

Definition 7.
Neighborhood ∆_R(S) of code S is the set of all codes that are at a distance of no more than R from S.
To describe a variation, let us introduce a vector of variation w = (w_1, . . . , w_r), where r is the dimension of the vector of variation, determined by the information required to perform a small variation δ_i(S); it depends on the symbolic regression method. For example, w_1 is the index of the small variation; w_2, . . . , w_{r−1} are the index of the varied symbol in the code, or the indices of the element in a vector or matrix by which the varied element is determined; and w_r is the new value of the varied element of the code S. A vector of variation is an operator that acts on a code and transforms it into another code.

According to the principle of small variations of the basic solution in the code space for symbolic regression methods, let us define a basic solution. The basic solution is set by the researcher based on the assumption that this possible solution is the closest to the desired optimal solution. Here, the researcher may interfere with the machine search and "advise" the machine where to search for the optimal solution. If a researcher is looking for, for example, a mathematical expression solving an identification or a control synthesis problem, then he can simplify the statement, find an analytical solution, and use it as the basic one. Next, he encodes the basic solution using the symbolic regression method S_0 = (s_1, . . . , s_{n_0}). (29)

Other possible solutions are given by sets of vectors of variations (30). Thus, any possible solution S_i is in the R-neighborhood of the basic solution, that is, d(S_0, S_i) ≤ R, 1 ≤ i ≤ H. Now, instead of searching for a solution on the entire set of codes (16), a solution is sought in the neighborhood of a given basic solution on the space of vectors of variations (30).
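The action of variation vectors on a code can be sketched as follows. The codes here are plain integer lists, and the (index, new_value) variation format is an illustrative assumption standing in for the full vector w = (w_1, . . . , w_r).

```python
# Sketch: a variation vector acts as an operator on a code; an individual is
# an ordered set of such vectors, and applying them in order yields a code
# in the R-neighborhood of the basic solution.

def apply_variation(code, w):
    """w = (index, new_value): replace one element of the code."""
    idx, val = w
    out = list(code)
    out[idx] = val
    return out

def decode(basic, variations):
    """Compose an ordered set of variation vectors, left to right."""
    code = basic
    for w in variations:
        code = apply_variation(code, w)
    return code

basic = [1, 0, 2, 0]
W = [(1, 3), (3, 1)]       # at most len(W) variations, so d(basic, S) <= 2
print(decode(basic, W))    # [1, 3, 2, 1]
```

Every individual is thus a recipe for reaching a code from the basic solution, which bounds its distance from the basic solution by the number of variation vectors.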
To find the optimal solution, we use a genetic algorithm. In this case, there is no need to develop special operations for crossover and mutation.
When performing the crossover, we select two sets of vectors of variations, randomly or according to the selection methods used in the theory of GA. We then define a crossover point c ∈ {1, . . . , R} and exchange the vectors of variations after the crossover point in the selected sets. As a result, we obtain two new sets of vectors of variations (34), (35). These two new sets of vectors of variations are two new codes in the neighborhood of the basic solution. To perform the mutation in the obtained sets (34), (35), we randomly select one of the vectors and replace it with a randomly generated vector of variations.
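Because individuals are ordered sets of variation vectors of equal length, ordinary one-point GA crossover and point mutation suffice; no special operations are needed. The (index, new_value) vector format below is an illustrative assumption.

```python
import random

def crossover_sets(W1, W2, c):
    """Exchange variation vectors after crossover point c."""
    return W1[:c] + W2[c:], W2[:c] + W1[c:]

def mutate_set(W, rng, code_len, max_val):
    """Replace one randomly chosen vector with a random new one."""
    W = list(W)
    W[rng.randrange(len(W))] = (rng.randrange(code_len), rng.randrange(max_val))
    return W

rng = random.Random(0)
Wa = [(0, 1), (1, 2), (2, 3)]
Wb = [(2, 0), (0, 5), (1, 1)]
child1, child2 = crossover_sets(Wa, Wb, 1)
print(child1)   # [(0, 1), (0, 5), (1, 1)]
```

Any mixture of variation vectors still decodes to a code within the neighborhood of the basic solution, which is why the classical crossover preserves validity here.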

Small Variations for Symbolic Regression Methods
Nowadays, many symbolic regression methods are known. Let us name some of them: GP [1], analytic programming [4], grammatical evolution [2], Cartesian GP [3], inductive genetic programming [13], the network operator method [5], parse-matrix evolution [6], and complete binary GP [7]. Only eight symbolic regression methods are listed here. All these methods, except for the network operator method, do not use the principle of small variations of the basic solution; the principle was first applied in the network operator method. If we apply the principle of small variations of the basic solution to the remaining seven symbolic regression methods, we obtain seven new methods. Symbolic regression methods with the principle of small variations of the basic solution carry the word "variational" in their names, for example, variational GP, variational Cartesian GP, etc. Let us first recall the network operator method and then consider the application of the principle of small variations of the basic solution to the other methods.

Network Operator Method
The network operator method encodes mathematical expressions in the form of directed graphs [5,27]. For coding, the method uses functions with one and two arguments. On the graph, functions with one argument are associated with the edges, functions with two arguments are associated with the nodes, and the arguments of the encoded mathematical expression are associated with the source nodes of the graph. Functions with two arguments must be commutative, associative, and have a unit element. An integer matrix of the network operator is used to store the code in the computer memory.
Let us consider an example of coding in the network operator method. Let a mathematical expression be given as (38), where q_1, q_2 are parameters and x_1, x_2 are variables. The parameters and the variables are the arguments of the mathematical expression.
To encode a mathematical expression, it is sufficient to use the following sets of arguments and elementary functions: the set of arguments (39), the set of functions with one argument (40), and the set of functions with two arguments (41). The graph of the network operator for (38) is given in Figure 1. In the source nodes of the graph are the arguments of the mathematical expression; the remaining nodes contain indices of functions with two arguments, and next to the edges are indices of functions with one argument. The indices correspond to the second index of the elements in the sets of elementary functions (40) and (41). The nodes are enumerated according to a topological sorting, with the number shown in the upper part of each node: the number of the node from which an edge exits is less than the number of the node which the edge enters. Such an enumeration is always possible for graphs without loops, and it allows one to obtain an upper triangular network operator matrix. For nodes 8 and 9, the second argument is not specified; we use the unit element, zero for the addition function, as the second argument. The network operator matrix for (38) is given in (42). The numbers of rows and columns in the network operator matrix correspond to the node numbers in the graph. Edges exiting a node are located in its row, and edges entering a node are located in its column. The diagonal elements contain indices of functions with two arguments; the remaining nonzero elements are indices of functions with one argument.
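The evaluation rule just described can be sketched as code. The function sets and the small matrix below are illustrative stand-ins for the article's sets (39)-(41) and matrix (42), not the exact example.

```python
import math

# Assumed unary set {1: z, 2: -z, 3: sin z} and binary set {1: +, 2: *}
# with unit elements 0 and 1 for the two commutative associative operations.
UNARY = {1: lambda z: z, 2: lambda z: -z, 3: math.sin}
BINARY = {1: (lambda a, b: a + b, 0.0), 2: (lambda a, b: a * b, 1.0)}

def eval_network_operator(psi, args):
    """psi: upper-triangular matrix; the first len(args) nodes are sources."""
    m, n = len(args), len(psi)
    vals = list(args) + [None] * (n - m)
    for j in range(m, n):                  # nodes in topological order
        op, unit = BINARY[psi[j][j]]       # diagonal: binary function index
        acc = unit                         # start from the unit element
        for i in range(j):                 # off-diagonal: unary on each edge
            if psi[i][j] != 0:
                acc = op(acc, UNARY[psi[i][j]](vals[i]))
        vals[j] = acc
    return vals[-1]

# Hypothetical matrix encoding sin(x1) + (-x2); nodes 0 and 1 are sources.
psi = [[0, 0, 3],
       [0, 0, 2],
       [0, 0, 1]]
print(eval_network_operator(psi, [0.5, 0.2]))   # sin(0.5) - 0.2
```

The upper-triangular structure guarantees that every node only consumes values of lower-numbered nodes, so a single left-to-right pass evaluates the whole expression.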
Let us introduce small variations for the code of the network operator: 1-replacement of a nonzero off-diagonal element, 2-replacement of a nonzero diagonal element, 3-replacement of a zero off-diagonal element, 4-zeroing of an off-diagonal nonzero element. Small variation 4 is performed only if at least one off-diagonal nonzero element remains in the given row and in the given column.
To present a small variation, we use a vector of four components, where w_1 is the index of the small variation, w_2 is the index of the row, w_3 is the index of the column, and w_4 is the new value of the element. Suppose that we have a set of four vectors of variations. If we apply this set of variations to the network operator matrix (42), we obtain a new network operator matrix. Note that the second variation w_2 cannot be performed since, after it, there would be no nonzero off-diagonal elements left.
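The four variation types and their validity checks can be sketched as follows. The matrix is a hypothetical small example, not the article's matrix (42); invalid variations are simply skipped, matching the rule that a contradictory variation is not performed.

```python
# w = (w1, w2, w3, w4): variation type, row, column, new value.

def vary_matrix(psi, w):
    t, r, c, v = w
    psi = [row[:] for row in psi]                # work on a copy
    if t == 1 and r != c and psi[r][c] != 0:     # replace nonzero off-diagonal
        psi[r][c] = v
    elif t == 2 and psi[r][r] != 0:              # replace nonzero diagonal
        psi[r][r] = v
    elif t == 3 and r != c and psi[r][c] == 0:   # fill a zero off-diagonal
        psi[r][c] = v
    elif t == 4 and r != c and psi[r][c] != 0:   # zero an element, if allowed
        row_ok = any(psi[r][j] != 0 for j in range(len(psi)) if j != c and j != r)
        col_ok = any(psi[i][c] != 0 for i in range(len(psi)) if i != r and i != c)
        if row_ok and col_ok:
            psi[r][c] = 0
    return psi                                    # invalid variations are no-ops

psi = [[0, 0, 3],
       [0, 0, 2],
       [0, 0, 1]]
print(vary_matrix(psi, (1, 0, 2, 5)))   # replaces psi[0][2] with 5
```

Variation 4 checks that another nonzero off-diagonal element remains in both the row and the column, so a node never loses its last incoming or outgoing edge.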

Variational Genetic Programming
The GP code for a mathematical expression is a computational tree. Arguments of the mathematical expression are located on the leaves of the tree. The computational tree for the mathematical expression (38) is shown in Figure 2. In a GP computational tree, functions are placed in the nodes; the functions from the sets (40) and (41) were used. The number of occurrences of an argument in the leaves of the tree must match the number of times the argument is used in the expression. For example, the parameter q_2 appears in the mathematical expression twice, so it appears twice in the leaves of the tree.
To store the computational tree in computer memory, a vector of two components is used, where s_1 is the number of function arguments and s_2 is the function index.
Arguments of a mathematical expression are represented as functions without arguments. The GP code for (38) is given in (49). Let us define small variations for GP: 1-change of the second component of a function code vector, where the value of the second component indicates the index of the element from the set given by the first component; 2-removal of the code vector of a function with one argument; 3-insertion of the code vector of a function with one argument; 4-increasing the value of the first component of a function code vector by one, with insertion of an argument code vector after the code of the function; 5-decreasing the value of the first component of a function code vector by one, with deletion of the first argument code encountered after the varied code. If a contradiction arises when performing a variation, the small variation is not performed.
To code a small variation for GP, we use a vector of variation of three components, where w_1 is the type of variation, w_1 ∈ {1, . . . , 5}, w_2 is the index of the varied element, and w_3 is the value of the new element. Suppose that for the GP code (49) of expression (38) the variations (51) are defined. Performing the small variations (51) for code (49), we obtain the code (56) of the expression y = sin(q_1 q_2 cos(x_1)) + sin(−x_1 q_2). (57)
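Three of the five GP variation types can be sketched on a prefix code of (arity, index) pairs. The encoding and the index values are illustrative assumptions, not the article's exact tables, and a variation that contradicts the coding rules is simply skipped.

```python
# w = (type, position, value): 1 - change a function index, 2 - delete a
# one-argument function code, 3 - insert a one-argument function code.

def vary_gp(code, w):
    t, pos, val = w
    code = [list(node) for node in code]
    if t == 1:                              # new index within the same arity set
        code[pos][1] = val
    elif t == 2 and code[pos][0] == 1:      # drop a unary function node
        del code[pos]
    elif t == 3:                            # insert a unary function node
        code.insert(pos, [1, val])
    return [tuple(node) for node in code]   # contradictory variations are no-ops

code = [(2, 1), (1, 1), (0, 1), (0, 2)]     # e.g. f2,1(f1,1(x1), x2)
print(vary_gp(code, (2, 1, 0)))             # [(2, 1), (0, 1), (0, 2)]
```

Deleting or inserting a one-argument function never changes the number of operands a node consumes, which is why these variations keep the prefix code valid.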

Variational Cartesian Genetic Programming
Cartesian genetic programming (CGP) does not use graphs to present codes of expressions. All elementary functions are combined into one set, and the number of a function's arguments is determined by its index. A mathematical expression is coded as a sequence of calls of elementary functions. Each function call is coded by an integer vector: the first element of the vector is the function index, and the remaining elements are indices of elements from the set of arguments. The result of each function call is immediately appended to the set of arguments so that it can be used in subsequent calls.
Consider an example of coding the mathematical expression (38) by CGP. We use the set of arguments (39) and combine all functions from (40) and (41) into one set (58), excluding the identity function f_{1,1}(z) = z. Since (58) contains only functions with one and two arguments, it is sufficient to use a vector of three elements to encode a function call; for a function with one argument, the third element is not used.
The CGP code for the mathematical expression (38) is given in (59). A small variation of code in CGP is a change of one element of the code [28]. To present a small variation, it is enough to use an integer vector of three elements, where w_1 is the index of a column in the code, w_2 is the index of a row in column w_1, and w_3 is the new value of the element. If we vary the first element of the vector of an elementary function, then its new value is determined by the index of an element from the set of elementary functions (58). If we vary some other element, then its value must be less than the sum of the number of elements in the set of arguments (39) and the index w_1 of the varied call vector.
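Decoding a CGP call sequence can be sketched as follows. The combined function set is an assumption standing in for (58): indices 1-2 are binary (+, *) and 3-4 are unary (sin, cos); the argument list plays the role of (39) and grows with each call.

```python
import math

FUNCS = {1: (2, lambda a, b: a + b), 2: (2, lambda a, b: a * b),
         3: (1, lambda a: math.sin(a)), 4: (1, lambda a: math.cos(a))}

def eval_cgp(calls, args):
    vals = list(args)
    for f, a1, a2 in calls:        # third index is unused for unary functions
        arity, fn = FUNCS[f]
        vals.append(fn(vals[a1]) if arity == 1 else fn(vals[a1], vals[a2]))
    return vals[-1]                # result of the last call

# Hypothetical code for sin(q1 * x1) + q2 with args [q1, q2, x1, x2]:
calls = [(2, 0, 2),   # v4 = q1 * x1
         (3, 4, 0),   # v5 = sin(v4)
         (1, 5, 1)]   # v6 = v5 + q2
print(eval_cgp(calls, [2.0, 0.5, 0.25, 1.0]))   # sin(0.5) + 0.5
```

An argument index in a call may point at any earlier value, which is exactly the rule that a varied index must stay below the count of arguments plus the call's own position.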
Let us define some variations of the CGP code of (59).
The disadvantage of CGP is that some function calls may not be used in the final mathematical expression.

Variational Complete Binary Genetic Programming
Complete binary genetic programming (CBGP) encodes mathematical expressions as complete binary trees. For this structure, only functions with one or two arguments are used. Functions with two arguments are associated with tree nodes, and functions with one argument are associated with tree branches. Arguments of mathematical expressions and unit elements for functions with two arguments are placed on the leaves of the tree. Since the tree is complete, the number of elements at each level is known, and there is no need to specify the number of arguments of a function when storing the code in computer memory: the quantity of arguments is determined by the position of the function in the code. Unit elements for functions with two arguments are added to the set of arguments.
To encode the mathematical expression (38) by CBGP, we use the sets of elementary functions with one and two arguments (40) and (41). We add the unit elements for the functions with two arguments to the set of arguments (39), i.e., zero for addition and one for multiplication (64). A CBGP computational tree for the mathematical expression (38) is given in Figures 3 and 4. The CBGP code is an ordered set of function indices from the tree, written sequentially from left to right. At the last level, the indices of arguments from the set (64) are indicated.
Here, for convenience, the CBGP code is presented on different lines according to the levels of the tree. Each level k contains 2^k functions with one argument and the same number of functions with two arguments; altogether, there are 2^(k+1) elements at level k. The total number of elements L in the CBGP code for a binary tree with K levels is L = 2^(K+2) − 2. In the considered example, we have K = 4 levels and thus 2^(4+2) − 2 = 62 elements.
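The level arithmetic above can be checked directly. The `level_of` helper is an illustrative reconstruction of the "smallest k satisfying the inequality" rule referenced in the text, not the article's exact relations.

```python
# Level k holds 2^k unary and 2^k binary slots, so a tree with levels
# 0..K has L = sum_{k=0}^{K} 2^(k+1) = 2^(K+2) - 2 elements.

def cbgp_length(K):
    return sum(2 ** (k + 1) for k in range(K + 1))

def level_of(alpha):
    """Level of the element at 0-based position alpha in the CBGP code."""
    k = 0
    while alpha >= cbgp_length(k):
        k += 1
    return k

print(cbgp_length(4))   # 62, matching 2**(4+2) - 2
```

Knowing the level of a position is what lets a small variation decide which set (unary function, binary function, or argument) a new value must come from.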
To determine whether an element f_{s_α}, where α is the index of the element in the code, belongs to one of the sets F_0, F_1, or F_2, we use relations in which k is the smallest number satisfying the corresponding inequality. To present a small variation of a CBGP code, let us use a vector of two elements, where w_1 is the index of the element position and w_2 is the new value of the element.
Consider an example of small variations of the CBGP code (65) that describes the mathematical expression (38). After the variations, the code presents a new mathematical expression.

Case 1. Control Synthesis for Mobile Robot
As an example, let us consider the application of the variational symbolic regression method for control synthesis of a mobile robot. In the problem, it is necessary to find a mathematical expression for the control function that transfers the object from the set of initial conditions to the terminal one with the optimal value of the quality criterion.
The mathematical model [29] of the control object is the ODE system (73). The control is constrained. The initial condition is a set of 30 states, and the terminal condition is a single state. It is necessary to find a control as a function of the state coordinates (77) that minimizes the functional (78), where ε = 0.01, t+ = 1.5 s, and x_i(t, x_{0,k}) is a partial solution of the ODE system (73) with control (77) for the initial state x_{0,k}, k ∈ {1, . . . , 30}.
To solve the problem, we used variational CGP and obtained the control function (79), where ũ_1 = sgn(arctan(q_2 x_2/(q_3 x_1) + 1/x_1))(exp(q_3 arctan(q_2 x_2/(q_3 … The value of the quality criterion (78) for the solution is J = 34.6372. Trajectories of the mobile robot from different initial states to the terminal one on the plane are presented in Figure 5. To solve the problem, the system (73) was integrated 2,386,440 times. The calculations were performed on an Intel Core i7, 2.8 GHz; the computational time was approximately 15 min. It should be noted that CGP without the principle of small variations of the basic solution did not cope with the problem and could not find a single acceptable solution with the same search parameters.

Case 2. Knapsack Problem
Consider the classic NP-hard knapsack problem. We need to choose, among a number of objects, a set of objects that satisfies a certain criterion. The vector of small variations consists of two elements: 1-the index of an element in the set of objects; 2-its new value.
Suppose that the capacity of the knapsack is C. The set of objects with their weights is given, where x− and x+ are the lower and upper values of the object weights and K is the number of objects. Each possible solution is a selection vector y. It is necessary to find a vector y minimizing the quality criterion so that the weight of the chosen objects is as close to the capacity of the knapsack as possible. Consider the following example. We have K = 100 objects with real values (weights, kg) from 0 to 10. The capacity of the knapsack is 100 kg. We need to choose the objects so that their total weight is close to the capacity. In the general case, there are different types of constraints: on weight, volume, cost, etc.
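The setup can be sketched as follows. The weights are randomly generated here since the article's exact data are not given, and the criterion is one natural form of "weight as close to the capacity as possible"; the two-element variation vector matches the description above.

```python
import random

def criterion(y, weights, C):
    """Relative gap between the chosen weight and the knapsack capacity C."""
    return abs(C - sum(w for w, yi in zip(weights, y) if yi)) / C

def small_variation(y, w):
    """w = (object index, new 0/1 value): vary one component of y."""
    y = list(y)
    y[w[0]] = w[1]
    return y

rng = random.Random(0)
K, C = 100, 100.0
weights = [rng.uniform(0.0, 10.0) for _ in range(K)]
y0 = [0] * K                          # basic solution: empty knapsack
y1 = small_variation(y0, (3, 1))      # put object 3 into the knapsack
print(criterion(y0, weights, C), criterion(y1, weights, C))
```

Starting from the empty knapsack as the basic solution, each small variation toggles a single object, and VarGA searches over ordered sets of such toggles.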
To solve this problem, we applied VarGA. The parameters of algorithm are given in Table 1.
The objective function for this solution is F(y) = 0.00125.

Results
Symbolic regression methods are currently used to solve complex optimization problems in non-numerical spaces. To expand the area of their application and, in particular, to simplify the execution of the operations of the genetic algorithm, a universal approach was developed based on the principle of small variations of the basic solution.
The main definitions were given: the distance between codes, the neighborhood of a code, and the concept of the inheritance property. A theorem was proved showing that, for an optimization problem in a non-numerical space, provided that the value of the functional is uniformly distributed over a certain interval, symbolic regression methods with the inheritance property search for the optimal solution more efficiently than random search.
Examples of constructing small variations of the basic solution for various methods of symbolic regression were given. It is proposed that the methods which use the principle of small variations be called variational. An example of solving the synthesis problem was given, in which it was necessary to find one control function to ensure the transfer of an object from 30 initial conditions to one terminal point according to the criterion of speed and accuracy. The problem was solved by the variational Cartesian genetic programming. In addition, the classical NP-hard knapsack problem for 100 objects was solved using a variational genetic algorithm.
The results presented in the article have both fundamental and practical importance.

Discussion
The principle of small variations of the basic solution is considered as a universal approach to solving problems by symbolic regression methods. In the future, it is proposed to expand the area of its application to other optimization problems in a non-numerical space.
The set of types of elementary variations can be expanded by introducing new variations: inserting a code element at a certain position with a shift of the rest of the code to the right, deleting a code element with a shift to the left, exchanging code elements, etc. The depth of variations, that is, the number of small variations applied to the basic solution for obtaining a new code, affects the computational time to find the optimal solution and may vary. The study of the effectiveness of using various types of small variations and their depth are tasks to be solved.