Approaches to Numerical Solution of Optimal Control Problem Using Evolutionary Computations

Abstract: Two approaches to the numerical solution of the optimal control problem are studied. The direct approach is based on the reduction of the optimal control problem to a nonlinear programming problem. The other approach is the so-called synthesized optimal control; it includes the solution of the control synthesis problem for stabilization at some point in the state space, followed by the search for stabilization points and the movement of the control object along these points. The comparison of the two approaches was motivated by the fact that a solution of the optimal control problem as a time function cannot be used directly in the control system, although the obtained discretized control can be embedded. The control object was a group of interacting mobile robots. Dynamic and static constraints were included in the quality criterion. The implemented methods were evolutionary algorithms and a random search over the parameters of a piecewise linear approximation and the coordinates of stabilization points, along with a multilayer network operator for control synthesis.


Introduction
The focus of this research was the study of numerical methods for solving the optimal control problem. A group of objects should move from given initial states to terminal ones while avoiding obstacles in a minimum time. The problem belongs to the class of infinite-dimensional optimization. There are two approaches to solving it numerically [1]. A direct approach is based on discretization of the control function and reduction to a finite-dimensional optimization problem. An indirect approach is based on the Pontryagin maximum principle, which transforms the original optimization problem into a boundary value problem that is numerically solved by shooting methods or as a finite-dimensional optimization problem [2].
A complex control object (a group of mobile robots) is considered. The main property of the object is the presence of phase constraints to avoid collisions between robots, which significantly complicates the development of a numerical method. An attempt to solve this problem, for example, by introducing additional control components (so-called epsilon control) was considered in [3,4]. Other methods that deal with constraints are the potential field method [5], the vector field histogram [6], the gap method [7], or artificial intelligence methods such as swarm intelligence algorithms [8], artificial neural networks [9], and fuzzy logic [10].
The presence of phase constraints in the optimal control problem also leads to the absence of convexity and unimodality of the integral quality criterion. Despite the huge number of works in this area, no effective numerical method has been obtained for its solution [11-14]. All the main solutions were obtained analytically for low-dimensional models, mainly with differential equations that admit a general solution. The theoretical results are devoted more to proving convergence [15] than to developed algorithms.

The rest of the article is organized as follows: the optimal control problem for a group of robots with phase constraints is presented in Section 2; Section 3 describes the method of synthesized control; Section 4 introduces the multilayer network operator method; Section 5 contains a short review of evolutionary algorithms; simulation results are given in Section 6, followed by a conclusion.

Optimal Control Problem for Group of Robots
The optimal control problem for a group of mobile robots is considered. Robots should move from given initial states to terminal states in a minimum time. They must not collide with each other or with static obstacles. The coordinates of obstacles in the state space are given.
An essential feature of this problem is the mandatory presence of phase constraints that significantly complicate the search for its solution. To solve the optimal control problem with phase constraints, due to the lack of convexity and unimodality of the quality function, it is suggested to use mainly evolutionary algorithms.
The second obstacle here is the feasibility of the obtained solution. The optimal control found as a function of time cannot be applied directly to a real object. Researchers in the field of optimal control agree that, to implement the found optimal solution, it is necessary to develop a stabilization system for the optimal program path. In this paper, we propose to solve the optimal control problem using the synthesized control method, which is widely used in practice.
Formal mathematical studies of this method have not been carried out because it requires solving the control synthesis problem, which is more complicated than the optimal control problem: a solution must be found in the form of a mathematical expression for the control function whose argument is the state vector of the control object. Substituting the found control function into the model of the control object ensures the stability of the control object in the neighborhood of a point in the state space. Approaches to solving control synthesis problems are often limited to analytical methods, backstepping [26], and the analytical design of aggregated regulators [27]. The successful application of these methods depends on the mathematical model of the control object. In this work, a numerical method of symbolic regression, the multilayer network operator, was used to solve the synthesis problem. After solving the synthesis problem, the object was controlled by switching from one stabilization point to another.
Evolutionary algorithms were used to implement the synthesized control method. Initially, when solving the synthesis problem, a specialized genetic algorithm was used to find an encoded mathematical expression for the optimal control function. Then, using another evolutionary algorithm, the coordinates of stable equilibrium points for all control objects were found. To analyze the effectiveness of this approach, we also investigated a direct solution to the optimal control problem using various algorithms.
Consider the optimal control problem below for a group of N identical mobile robots. A mathematical model of a mobile robot is given as follows [28]:

ẋ1^j = 0.5(u1^j + u2^j) cos x3^j, ẋ2^j = 0.5(u1^j + u2^j) sin x3^j, ẋ3^j = 0.5(u1^j − u2^j),

where j is the index of a robot in the group, j = 1, ..., N, x1^j, x2^j are the coordinates of the mass center of the robot, x3^j is the rotation angle of the robot axis, and u1^j, u2^j are the control signals on the rotors.
The control signals are limited by

u⁻ ≤ u_k^j ≤ u⁺, k = 1, 2, j = 1, ..., N.

Initial states of each mobile robot j are given as

x^j(0) = x^{j,0}, j = 1, ..., N.

Static phase constraints in the state space are

((x1^j − x1^{*,i})² + (x2^j − x2^{*,i})²)^{1/2} ≥ r_i, i = 1, ..., B,

where r_i is the distance between the center of static phase constraint i and the center of the robot that cannot be violated, x1^{*,i}, x2^{*,i} are the coordinates of the center of static phase constraint i, and B is the number of static constraints.
Dynamic phase constraints that account for collisions between robots are the following:

((x1^i − x1^j)² + (x2^i − x2^j)²)^{1/2} ≥ r0, i ≠ j, i, j = 1, ..., N,

where r0 is the radius of the dynamic phase constraints, i.e., the outside dimension of the robot, that cannot be violated.
Terminal states of robots are

x^j(t_f) = x^{j,f}, j = 1, ..., N.

The quality function to be minimized is the control time t_f, augmented with penalty terms for violations of the phase constraints, where

t_f = min{ t ≤ t+ : ||x^j(t) − x^{j,f}|| ≤ ε, j = 1, ..., N },

t+ is the maximal control time, and ε is a small positive value.
Phase constraints are included in the quality criterion using the Heaviside function.
When we need to obtain a differentiable quality function, we substitute the Heaviside function with a sigmoid function as follows:

ϑ(z) ≈ 1/(1 + exp(−Az)),

where A is a large positive value. The solution of the optimal control problem is a control vector u(·) = [u1(·) ... um(·)]^T.
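As a minimal illustration of this penalty construction (the coordinates, radius, and steepness constant here are hypothetical, not the article's values), a single static obstacle constraint can be penalized with either the Heaviside step or its smooth sigmoid surrogate:

```python
import math

def heaviside(z):
    """Heaviside step: 1 if z > 0 else 0 (the penalty is 'on' when violated)."""
    return 1.0 if z > 0 else 0.0

def sigmoid_step(z, A=100.0):
    """Smooth surrogate for the Heaviside step, 1 / (1 + exp(-A z));
    A is a large positive value controlling the sharpness."""
    return 1.0 / (1.0 + math.exp(-A * z))

def obstacle_penalty(x1, x2, cx, cy, r, step=heaviside):
    """Penalty for one static circular constraint: the violation is
    positive when the robot center is closer than r to the obstacle."""
    violation = r - math.hypot(x1 - cx, x2 - cy)
    return step(violation) * violation
```

With `step=sigmoid_step`, the resulting penalty term is differentiable everywhere, which is the point of the substitution described above.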
An optimal control problem can be transformed into a finite-dimensional optimization problem by discretization of the control in time. To apply nonlinear programming methods, we approximated u_i(·), i = 1, ..., m, by functional dependences with a finite number of parameters using piecewise linear approximations. Let us introduce a constant time interval ∆t = t_{i+1} − t_i, i = 0, ..., d − 1, where d = t+/∆t. The control is searched for in the form of piecewise linear functions between the parameters q_i, i = 0, ..., d: on each interval [t_i, t_{i+1}],

u(t) = q_i + (q_{i+1} − q_i)(t − t_i)/∆t.

The control is constrained by u⁻ ≤ q_i ≤ u⁺. The task is to find the vector of parameters q = [q_0 ... q_d]^T that minimizes the quality function. For the search of the parameters, we used evolutionary and population methods, as well as a random search.
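A minimal sketch of evaluating such a piecewise linear control from its node parameters (the control bounds and uniform grid used here are illustrative assumptions):

```python
def piecewise_linear_control(q, dt, t, u_min=-10.0, u_max=10.0):
    """Evaluate a piecewise linear control u(t) defined by node values
    q = [q_0, ..., q_d] on a uniform time grid t_i = i * dt.
    The bounds u_min/u_max are illustrative, not taken from the article."""
    d = len(q) - 1
    i = min(int(t / dt), d - 1)                    # interval index, clamped
    ti = i * dt
    u = q[i] + (q[i + 1] - q[i]) * (t - ti) / dt   # linear interpolation
    return max(u_min, min(u_max, u))               # enforce control limits
```

An optimizer then only manipulates the flat vector `q`, which is exactly what reduces the infinite-dimensional control search to nonlinear programming.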

Synthesized Control
The synthesized control problem was solved in two steps. First, we found a stabilization system for each robot to guarantee its stability in the neighborhood of a given point in the state space. To solve the synthesis problem, we used one of the symbolic regression methods, the multilayer network operator method.
A stabilization problem involves synthesizing a control function

u^j = g^j(x*^j − x^j),

where x*^j is a point in the state space R^n and x^j is the state vector of robot j. Second, we found a set of points in the state space and the parameters of switching between them.
The points found in Equation (12) represented the stabilization points of the robots; the index p of the current stabilization point increases by 1 when the robot reaches the given stability point. Coordinates of the stability points were searched simultaneously for all robots. The objective function includes a penalty for phase constraint violation. It is necessary to solve the control synthesis problem in Equation (11) and find a multidimensional nonlinear function u^j = g^j(x*^j − x^j) that guarantees the stability of the system of ODEs in the neighborhood of the point x*^j.

Multilayer Network Operator Method
A network operator method encodes mathematical expressions in the form of directed graphs [29,30]. A multilayer network operator method is a development of the network operator method that encodes mathematical expressions in the form of directed graphs which consist of several subgraphs. Let us consider an example of coding a mathematical expression using the multilayer network operator method.
Let a mathematical expression be given. To code an expression, it is enough to have basic sets of arguments, unary functions, and binary functions. If an element of a set has two indices, the first one shows the number of arguments, whereas the second one is the index of the element in the set. The Heaviside function ϑ(z) is used for the conditional operator IF. Suppose that a multilayer network operator has K = 4 layers or subgraphs, that the number of nodes is equal to 8, and that four of them are source nodes, n0 = 4. Each layer has the same number of source nodes, some of which may not be used. The graph of a multilayer network operator for Equation (11) is shown in Figure 1. In Figure 1, the nodes of the graph contain the indices of binary functions; next to the edges are the indices of unary functions. To calculate a mathematical expression, it is necessary to specify the matrix of connections D. This matrix has dimensions of K × 2n0. Each row of the matrix is associated with a specific layer and indicates the source nodes of this layer. Each odd element of the row contains the index of a layer; if the index is equal to 0, then it refers to the set of arguments. Each even element of the row indicates the node number in that layer or the element number in the argument set. The matrix of connections and the network operator matrices of the four layers are defined accordingly for the example in Figure 1. Note that, when solving the synthesis problem using the symbolic regression method, direct coding is not required. In the search process, variations of codes and calculations of mathematical expressions, i.e., decoding, are performed.
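A simplified single-layer evaluator conveys the idea of decoding such a graph; the basic sets and matrices below are illustrative assumptions, not the article's sets:

```python
import math

# Illustrative basic sets (assumptions, not the article's exact sets):
UNARY = {1: lambda z: z,            # identity
         2: lambda z: -z,           # negation
         3: math.sin,
         4: lambda z: z * z}        # square
BINARY = {1: lambda a, b: a + b,    # summation
          2: lambda a, b: a * b}    # multiplication

def eval_network_operator(psi, node_binary, args):
    """Evaluate a single-layer network operator. psi is an upper-triangular
    matrix: psi[i][j] != 0 means an edge i -> j labeled with a unary
    function index; node_binary[j] gives the binary function that
    accumulates incoming values at node j. The first len(args) nodes are
    source nodes holding the arguments. Returns the value of the last node."""
    n = len(psi)
    z = list(args) + [None] * (n - len(args))
    for j in range(len(args), n):
        acc = None
        for i in range(j):
            if psi[i][j]:
                v = UNARY[psi[i][j]](z[i])
                acc = v if acc is None else BINARY[node_binary[j]](acc, v)
        z[j] = acc
    return z[-1]
```

For example, a matrix with edges labeled identity and negation into a summation node decodes to the expression x1 − x2; the multilayer version of the article chains several such subgraphs.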

Evolutionary Methods
Evolutionary methods form a class of modern optimization algorithms that work with a set of randomly created possible solutions and apply certain modifications to these solutions at each iteration. The efficiency of evolutionary methods is due to the right balance between exploration and exploitation. Exploration investigates the whole search space for promising areas; this part of the search process should cover the search space as broadly as possible. Exploitation, on the other hand, involves a local search in promising areas to ensure higher accuracy of the solution. The way exploration and exploitation are performed, and the balance between them, differ from one evolutionary method to another.
The first and most well-known of the evolutionary methods is the genetic algorithm (GA) [20]. This method was inspired by Darwin's concept of evolution. In the genetic algorithm, crossover and mutation operators were introduced to modify a set of possible solutions in search of the best one. Each possible solution is encoded with a Gray binary code and called a chromosome, whereas each search iteration is called a generation. By analogy with Darwin's theory, better solutions have a higher probability of participating in generating new chromosomes for the next generation. However, relying only on the best solutions may lead to premature convergence to a local extremum. To prevent this, similar solutions should not be crossed over, and the mutation operator also helps to avoid getting stuck in a local extremum. The mutation probability and the crossover probability, which depend on the quality of the solutions and their proximity, are the main tuning parameters of the genetic algorithm.
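The mechanics above can be sketched as a minimal GA with Gray-coded chromosomes, tournament selection, one-point crossover, and bitwise mutation; all parameter values are illustrative, not the article's settings:

```python
import random

def gray_to_int(bits):
    """Decode a Gray-coded bit list to an integer."""
    n = bits[0]
    out = [n]
    for b in bits[1:]:
        n ^= b
        out.append(n)
    return int("".join(map(str, out)), 2)

def decode(bits, lo=-1.0, hi=1.0):
    """Map a Gray-coded chromosome to a real value in [lo, hi]."""
    return lo + (hi - lo) * gray_to_int(bits) / (2 ** len(bits) - 1)

def ga_minimize(f, n_bits=16, pop_size=30, generations=60,
                p_cross=0.9, p_mut=0.02, seed=0):
    """Minimal generational GA: tournament selection, one-point
    crossover, bitwise mutation. Parameters are illustrative."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if f(decode(a)) < f(decode(b)) else b
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = tournament()[:], tournament()[:]
            if rng.random() < p_cross:          # one-point crossover
                cut = rng.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):              # bitwise mutation
                for k in range(n_bits):
                    if rng.random() < p_mut:
                        child[k] ^= 1
                new_pop.append(child)
        pop = new_pop[:pop_size]
    return min((decode(c) for c in pop), key=f)
```

Gray coding is used so that adjacent real values differ in a single bit, which makes small mutations correspond to small moves in the search space.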
Another well-known and popular evolutionary method is particle swarm optimization (PSO) [21-23]. As the method's name suggests, it mimics the social behavior of creatures that live in swarms. PSO was proposed in 1995 and gained popularity due to its effective way of combining exploration and exploitation during solution modification. In this method, possible solutions are called particles. The particles are distributed in the search space and seek better positions by taking into account the swarm's best particle and their own best positions from previous iterations. This behavior can be seen in flocks of birds or schools of fish and is called social intelligence. The movement toward the global best solution is the exploitation part of the search, whereas the inertial movement and the movement toward a particle's own best position are the exploration part. The balance among these three components of particle movement is determined by the corresponding tuning parameters of the method.
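The three-component particle update described above can be sketched as follows; the coefficient values are common defaults, not taken from the article:

```python
import random

def pso_minimize(f, dim, bounds, n_particles=30, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO sketch: inertia weight w, cognitive pull c1 toward
    each particle's own best position, social pull c2 toward the
    swarm's best position. Parameter values are illustrative."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    gbest = min(pbest, key=f)[:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))
            if f(x[i]) < f(pbest[i]):
                pbest[i] = x[i][:]
                if f(x[i]) < f(gbest):
                    gbest = x[i][:]
    return gbest
```

The relative sizes of w, c1, and c2 set the exploration/exploitation balance mentioned above.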
The observation of the collective behavior of bees searching for nectar sources prompted the creation of the bee algorithm (BA) [24]. In this evolutionary method, the exploration part of the search is a random search over the whole search space, analogous to scout bees looking for promising places with a high nectar concentration. The found solutions are divided into highly interesting, interesting, and not interesting. Not interesting solutions are replaced by new randomly created ones, while subdomains of given radii around highly interesting and interesting solutions are investigated more intensively: a given number of randomly created solutions are distributed within these subdomains, with highly interesting subdomains receiving more attention. The best solution found within each subdomain passes to the next iteration. The radii of the subdomains decrease with each iteration, which ensures the success of the exploitation search. The main tuning parameters of the bee algorithm are the numbers of additional solutions in the subdomains and their initial radii.
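A compact sketch of this scheme for a one-dimensional function; the site counts, radii, and shrink factor are illustrative assumptions:

```python
import random

def bee_algorithm(f, bounds, n_scouts=20, n_best=3, n_recruits=5,
                  radius=1.0, shrink=0.95, iters=100, seed=0):
    """Minimal bee algorithm sketch: scout sites sample the whole
    interval (exploration); the best sites are searched locally by
    recruited bees within a shrinking radius (exploitation); abandoned
    sites are replaced by fresh random scouts. Values are illustrative."""
    rng = random.Random(seed)
    lo, hi = bounds
    sites = [rng.uniform(lo, hi) for _ in range(n_scouts)]
    for _ in range(iters):
        sites.sort(key=f)
        new_sites = []
        for s in sites[:n_best]:                 # local search around best sites
            local = [min(hi, max(lo, s + rng.uniform(-radius, radius)))
                     for _ in range(n_recruits)]
            new_sites.append(min(local + [s], key=f))
        # remaining sites are abandoned: fresh random scouts (exploration)
        new_sites += [rng.uniform(lo, hi) for _ in range(n_scouts - n_best)]
        sites = new_sites
        radius *= shrink                          # shrink local search areas
    return min(sites, key=f)
```

The shrinking radius is what gradually shifts the method from exploration toward exploitation, as the paragraph above describes.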
The gray wolf optimizer (GWO) is the most recent evolutionary method used in this study [25]. Proposed in 2014, it was inspired by the social behavior of a pack of wolves during hunting. The leader of the pack, the alpha, always participates in the hunting process and coordinates it. Next in the pack hierarchy are the betas and deltas, which help in attacking prey. In the mathematical model of this hierarchy, the alpha is the current best solution, whereas the second and third best solutions are the beta and delta, respectively. At each search iteration, the three best solutions (alpha, beta, and delta) are selected from the set of all possible solutions, and each possible solution is modified with respect to their positions. The balance between exploration and exploitation is achieved by a special coefficient that decreases linearly over the iterations. The main advantage of the gray wolf optimizer over the methods discussed above is that it is free of tuning parameters.
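The update rule can be sketched as follows; this simplified variant keeps the three leaders unchanged within an iteration (an assumption made here for brevity, not a property of the original method):

```python
import random

def gwo_minimize(f, dim, bounds, n_wolves=20, iters=100, seed=0):
    """Minimal gray wolf optimizer sketch: every wolf moves toward the
    three best solutions (alpha, beta, delta); the coefficient a
    decreases linearly from 2 to 0, shifting the search from
    exploration to exploitation."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)]
              for _ in range(n_wolves)]
    for it in range(iters):
        wolves.sort(key=f)
        alpha, beta, delta = wolves[0], wolves[1], wolves[2]
        a = 2.0 * (1 - it / iters)          # linearly decreasing component
        for i in range(3, n_wolves):
            new = []
            for d in range(dim):
                pos = 0.0
                for leader in (alpha, beta, delta):
                    A = a * (2 * rng.random() - 1)   # exploration factor
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolves[i][d])
                    pos += leader[d] - A * D
                new.append(min(hi, max(lo, pos / 3)))
            wolves[i] = new
    return min(wolves, key=f)
```

Note that, apart from the population size and iteration count common to all population methods, no algorithm-specific coefficients need to be tuned, which matches the advantage claimed above.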
When developed, all evolutionary algorithms are tested on benchmark functions that may have many local minima. An algorithm is considered satisfactory if it finds the global optimum. To increase the probability of finding the global optimum, it is necessary to increase parameters such as the population size and the number of generations.

Simulation Results
In both considered approaches, we needed to solve a finite-dimensional nonlinear programming problem. The methods recommended in [18] for solving nonlinear programming problems require the quality criterion to satisfy the conditions of smoothness, convexity, and unimodality. In most applied optimal control problems, these conditions cannot be fulfilled. The application of evolutionary methods to nonlinear programming problems does not impose the abovementioned conditions on the quality criterion. Recent research has shown the high efficiency of evolutionary methods for solving optimal control problems [19].
To estimate the complexity of algorithms, we used the average number of objective function calculations required to obtain a result, Y avg . Parameters of the algorithms were selected such that the number of objective function calculations Y avg for different algorithms was nearly the same.
The following parameters were used in the computational experiments: N = 4, t+ = 2.8. A detailed description of the evolutionary algorithms used for both approaches was provided in [19]. For a group of four robots, the search was performed over a vector of 12 parameters × 2 controls × 4 robots = 96 elements.
The results of computational experiments for the optimal control problem solution using the direct approach are given in Table 1. Table 1 contains the mean values of the objective function J_avg and the average complexity estimation Y_avg, i.e., the number of functional calculations. Note that, in Table 1, each row from 1 to 10 contains the results of one experiment for each algorithm. The results presented in Table 1 show that all evolutionary algorithms performed better than the random search. Furthermore, PSO and GA solved the problem more effectively. Figure 2 shows the trajectories of robots that bypassed the obstacles (circles) with control obtained using PSO, J = 2.81. As can be seen from Figure 2, PSO found the optimal solution that ensured the attainment of the terminal conditions with given accuracy, without violation of the phase constraints and without collisions. The trajectories of the robots had a certain excess length.

In the second approach with synthesized control, we first solved a control synthesis problem relative to a point in the state space. Since all robots were identical, the synthesized control function in Equation (13) was used for each robot.
Then, we solved a finite-dimensional optimization problem and searched for the stabilization points x_i^{j,k}, where i is the index of an element in the state vector, i = 1, ..., 3, j is a robot index, j = 1, ..., 4, and k is a point index.
The switch from one point to another was performed after each time interval ∆t = 0.7. For each robot, we searched for three points plus the known terminal point. Thus, the search was performed over a vector of 3·4·(2.8/0.7 − 1) = 36 elements.
In the computational experiments for this approach, the average number of objective function calculations Y_avg was set smaller than in the direct approach because the simulation of synthesized control was very time-consuming. The results of the computational experiments are given in Table 2.

Figure 3 shows the trajectories of robots with control obtained using GWO, J = 2.49. Here, black squares show the identified stabilization points, whereas red circles are obstacles. The paths of the robots crossed, but at different time moments; thus, they avoided collisions and did not come across obstacles. Stabilization points "attracted" the robots but did not necessarily lie on their paths. All robots reached the terminal conditions with given accuracy, without violation of the phase constraints and without collisions. The trajectories of the robots were smoother in comparison to those shown in Figure 2.

Despite the fact that the optimal control search using the synthesized control method required fourfold fewer calculations of the objective function for each method in comparison to the direct approach, the search results turned out to be much better. The best result, on average, was obtained using GWO, followed by PSO.
All evolutionary techniques in both approaches performed better than the random search. We may suppose that evolutionary algorithms use certain features of intelligent search.

Conclusions
The optimal control problem of a group of robots with phase constraints was solved using two numerical approaches. Robots had to achieve the terminal points without collisions among themselves or with obstacles in a minimal time. The presence of phase constraints complicated the search. Evolutionary computations were applied to cope with the nonconvex and multimodal quality criterion.
In the first approach, a robotic group was treated as one object. The optimal control problem was reduced to the nonlinear programming problem. In the second approach, the problem of control synthesis was first solved separately for each robot relative to the points in the state space. To solve the synthesis problem, the multilayer network operator method was used, although this could also be achieved using other symbolic regression methods that can derive mathematical expressions.
Then, the optimal control problem was considered in the original statement, where the coordinates of the stabilization points of robots were used as control. A search was carried out for three stabilization points for each of the robots. Switching between points occurred at a specified time interval. A search of the coordinates of stability points was performed using evolutionary algorithms (genetic algorithm, particle swarm optimization, bee algorithm, and gray wolf optimizer) and a random search.
After a series of 10 tests for each algorithm, the algorithms were evaluated by the average value of the functional of the best solution identified. Experiments showed that, on average, the PSO algorithm was the most effective in the search for parameters in the direct method and for the coordinates of stabilization points in synthesized control.
With respect to the application of evolutionary algorithms to the compared approaches, a synthesized control approach was proven to perform better. In the direct approach, only GA and PSO gave satisfactory results on average, whereas, in the second approach, all the evolutionary algorithms performed well and gave, on average, approximately the same results.
Currently, there are no universal numerical methods for solving optimal control problems. It is proposed to continue the study of evolutionary algorithms and consider other new evolutionary algorithms, including hybrid ones. To use evolutionary algorithms in optimal control problems, it is necessary to include phase constraints in the functional, discretize the problem, and reduce it to a nonlinear programming problem. For this purpose, the time of the control process is divided into intervals in which the control function is approximated by polynomials depending on a finite number of parameters. Furthermore, the search for the values of the parameters of the approximating function can be carried out by evolutionary algorithms.
A distinctive feature of the application of evolutionary algorithms for solving optimal control problems is that, when calculating the value of the quality criterion, it is necessary to integrate a system of differential equations that describe the mathematical model of the control object with an approximating control function.