Machine Learning Control Based on Approximation of Optimal Trajectories

Abstract: The paper is devoted to an emerging trend in control: machine learning control. Despite the popularity of machine learning, the concept has various interpretations, and there is an urgent need for its strict mathematical formalization. This paper presents an attempt at such a formalization. The concepts of an unknown function, a work area, and a training set are introduced, and a mathematical formulation of the machine learning problem is presented. Based on this formulation, the concept of machine learning control is considered. One of the problems of machine learning control is the general synthesis of control. It implies finding a control function that depends on the state of the object and ensures the achievement of the control goal with the optimal value of the quality criterion from any initial state in some admissible region. Supervised and unsupervised approaches to solving the problem based on symbolic regression methods are considered. As a computational example, the problem of general synthesis of optimal control for a spacecraft landing on the surface of the Moon is considered as supervised machine learning control with a training set.
Author Contributions: Conceptualization, A.D. and E.S.; methodology, A.D., E.S. and S.K.; software, S.K.; validation, E.S. and G.D.; formal analysis, A.D. and G.D.; investigation, S.K.; data curation, S.K.; writing (original preparation), A.D., E.S. and S.K.; writing (review and editing), E.S. All authors


Introduction
The complexity of control synthesis problems for autonomous robots, which must perform assigned tasks and achieve a set goal, has led to new ideas in control theory. Now, to create a control system for an autonomous robot, the system needs to be trained [1,2] rather than obtained by solving some known optimization problem.
To formulate a real mobile robot control problem, a large number of different phase constraints must be described. These can be walls, doors between rooms, windows, columns and other obstacles. For example, a robot has to avoid a column, not hit a wall and pass through a door. At present, when control systems for mobile robots are created, programmers imagine the problems that the robot will face and decide how it should overcome them. This is a laborious process, but it was justified when control systems were developed individually for single technical objects, such as spacecraft. However, modern automation and robotization are becoming ubiquitous. This trend requires new universal and even automatic approaches to the development of control systems.
Symbolic regression methods make it possible to obtain mathematical expressions for control functions automatically. Such expressions describe how the robot should optimally reach the goal while avoiding obstacles.
Only symbolic regression methods can search for both the structure and the parameters of a mathematical expression. Other methods, including artificial neural networks, search only for parameters.
The search for the structure of a control function in the control synthesis problem is called machine learning control [1]. This is a new technology in the development of control systems, and a rigorous mathematical formulation that defines this approach has not yet been proposed. In this paper, we propose a mathematical formalization of the machine learning problem (Section 2) and, on the basis of the proposed definitions, we single out a special area of machine learning, namely machine learning control (Section 3).
One of the main problems of machine learning control is the problem of control synthesis. The paper first presents the general mathematical formulation of the control synthesis problem, and then proposes its numerical formulation, since according to the methodology of machine learning control, the synthesis problem must be solved numerically using symbolic regression methods.
Section 4 then presents our approach to solving the machine learning control problem based on approximation of optimal trajectories. In learning, it is first necessary to create a training set in order to show the learning object what is wanted of it. For this purpose, the optimal control problem is first solved, with the same quality criterion as in the synthesis problem, for several different initial conditions. The obtained optimal trajectories are templates for learning. They show what shapes the plots of the variables should have in the solution of the control synthesis problem and what values of the functional these solutions should give. Then the optimal trajectories obtained for the different initial conditions are approximated by a numerical method of symbolic regression. The proposed approach is demonstrated on a computational example of general synthesis of optimal control for a spacecraft landing on the surface of the Moon (Section 5).

Problem Statement of Machine Learning
Definition 1. A set of computational procedures that transforms a vector x from an input space X into a vector y from an output space Y, and for which no algebraic equation y = f(x) is available, is called an unknown function.
For example, the system of ordinary differential equations $\dot{x} = f(x)$ is an unknown function between a vector of initial conditions x(0) and a vector of solutions as functions of time x(t, x(0)), if a general solution of this differential equation is unknown.
The unknown function between the input vector x and the output vector y is defined as
$$y = \alpha(x). \quad (1)$$
Then, for differential equations without general solutions, the unknown function has the form
$$x(t, x(0)) = \alpha(x(0)). \quad (2)$$
Definition 2. A work area is a subset of the input vector space in which the input vectors are certain to exist and which is used for solving the problem.
The unknown function can be realized by physical equipment or by an experiment. In that case the unknown function is called a black box, but it is still described by (1).
Let a set of input vectors X̃ be determined in the work area. For every input vector from X̃, the output vector is determined by the unknown function (1), and these output vectors form the set Ỹ.
Definition 3. A pair of sets (X̃, Ỹ) is called a training set.
It is known that there are supervised and unsupervised machine learning methods. An unsupervised machine learning problem can be formulated as follows: for some unknown function (1) and a small positive value δ, it is necessary to find a function
$$y = \beta(x, q), \quad (6)$$
where q is a vector of parameters, $q = [q_1 \ldots q_p]^T$, such that for all x ∈ X
$$\|\alpha(x) - \beta(x, q)\| \le \delta. \quad (7)$$
A supervised machine learning problem can then be formulated as follows: for some unknown function (1) and a small positive value δ, it is necessary to determine a positive value ε, to build a training set (5) and to find a function (6) such that, if the total error on the training sample is less than the given value ε, then for every x* from the work area that is not included in the training set, x* ∈ X and x* ∉ X̃, the following inequality holds:
$$\|y^* - \beta(x^*, q)\| \le \delta, \quad (9)$$
where y* = α(x*).
Here the function β(x, q) includes a parameter vector q. In many approaches the structure of the function is defined beforehand on the basis of experience or intuition, and only the values of some parameters have to be found. For example, an artificial neural network [3][4][5], which is often used for solving machine learning problems, has a fixed structure and a large number of unknown parameters. In contrast, symbolic regression methods [6][7][8] search for both the structure of the function and its parameters.
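The supervised formulation above can be illustrated with a minimal sketch. It assumes a toy scalar system ẋ = −x whose numerical integration plays the role of the unknown function α, and a candidate β(x, q) = q·x with a fixed structure and a single parameter fitted by least squares. The names `alpha`, `beta`, the work area [0, 2] and the sample points are illustrative assumptions, not taken from the paper.

```python
def alpha(x0, t_end=1.0, steps=1000):
    """Unknown function: maps x(0) to x(t_end) by Euler integration of dx/dt = -x."""
    x, dt = x0, t_end / steps
    for _ in range(steps):
        x += dt * (-x)
    return x

def beta(x, q):
    """Candidate function with a fixed structure and one parameter q (assumed form)."""
    return q * x

# Training set: input vectors taken from the assumed work area [0, 2]
# and the outputs that the unknown function produces for them.
X_train = [0.5, 1.0, 1.5, 2.0]
Y_train = [alpha(x) for x in X_train]

# Least-squares estimate of q for beta(x, q) = q * x.
q = sum(x * y for x, y in zip(X_train, Y_train)) / sum(x * x for x in X_train)

# Total error on the training sample (to be compared with epsilon) and the
# error at a work-area point outside the training set (to be compared with delta).
train_error = sum(abs(y - beta(x, q)) for x, y in zip(X_train, Y_train))
x_star = 1.25
test_error = abs(alpha(x_star) - beta(x_star, q))
```

Here only the parameter q is searched; a symbolic regression method would also search over the form of `beta` itself.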

The Problem of General Control Synthesis as Machine Learning Control
In the field of control there are also problems that require machine learning. One of the main machine learning control problems is the search for a control function in the general control synthesis problem.
The general control synthesis problem was formulated in the middle of the last century by Boltyanskii [9] after studying Pontryagin's maximum principle for the optimal control problem.
The problem is described as follows. The mathematical model of the control object is given as a system of ordinary differential equations
$$\dot{x} = f(x, u), \quad (10)$$
where x is a state vector, $x \in \mathbb{R}^n$, u is a control vector, $u \in U \subseteq \mathbb{R}^m$, U is a compact set, and m ≤ n.
The domain of initial conditions $X_0 \subseteq \mathbb{R}^n$ is given (11). The existence of the initial condition domain is the main feature of the general control synthesis problem. Initially Boltyanskii defined the domain of initial conditions as the whole state space, $X_0 = \mathbb{R}^n$, because he tried to solve the problem analytically. Here we intend to solve the problem numerically; therefore the domain $X_0$ is a bounded set in the state space.
The terminal condition (12) is given, where $t_f$ is the unassigned time of getting from any initial condition $x_0 \in X_0$ to the terminal state (12). The finishing time is bounded,
$$t_f \le t^+, \quad (13)$$
where $t^+$ is a given positive value.
The phase constraints (14) and the quality criterion (15) are given, where $x(t, x_0)$ is a partial solution of the differential Equation (10) with control u(t) ∈ U from the initial condition $x_0 \in X_0$. It is necessary to find a control function in the form
$$u = h(x), \quad (16)$$
where $h(x): \mathbb{R}^n \to \mathbb{R}^m$. If the control function (16) is inserted into the right part of the differential Equation (10), then the system of stationary differential equations
$$\dot{x} = f(x, h(x)) \quad (17)$$
is obtained, which does not have a free control vector in the right part. Any partial solution of the differential Equation (17) from the initial conditions (11) achieves the terminal condition (12), satisfying all the phase constraints (14), with the optimal value of the quality criterion (15).
Note that the control function (16) can have simple discontinuities; therefore, in many cases analytical methods cannot be applied. The majority of analytical methods, such as integrator backstepping [10,11] and analytical design of aggregated regulators [12,13], provide Lyapunov stability by means of nonlinear smooth feedback control. The main drawback of all analytical methods of control synthesis is that they are tied to a specific form of the mathematical model of the control object. The control synthesis problem (10)-(17) under consideration is complicated by the arbitrary form of the mathematical model of the control object and of the integrand of the quality criterion, as well as by the phase constraints and the wide class of control functions, which can have simple discontinuities.
In the general case, this general control synthesis problem can be solved numerically by symbolic regression methods as a machine learning control problem.
To apply numerical methods, the problem statement has to be reformulated. The domain of initial conditions is replaced by a finite set of initial state points (18). The terminal condition (12) and the phase constraints are added to the quality criterion (15), and the integral over the domain of initial conditions is replaced by a sum over all initial state points, which gives the criterion (19).
In the formulation (18)-(21), $a_1$ is a weight coefficient, ϑ(A) is the Heaviside step function, and $\varepsilon_0$ is a small positive value that determines the accuracy of terminal state achievement. Within the framework of the formulation of the machine learning problem, the solution of the synthesis problem based on symbolic regression methods is machine learning control.
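A minimal sketch of how such a numerical criterion might be evaluated for a candidate control function is given below. It is not the paper's criterion (19): the exact form of the terms is an assumption. The sketch sums, over a finite set of initial states, the time to reach the terminal state, a weighted terminal error, and a Heaviside penalty for violating a phase constraint φ(x) ≤ 0. The scalar system, the weights `a1` and `a2`, and both feedback laws are hypothetical.

```python
def heaviside(a):
    """Heaviside step function: 1 if a > 0, otherwise 0."""
    return 1.0 if a > 0 else 0.0

def synthesis_cost(f, h, phi, x_f, initial_states,
                   t_plus=10.0, dt=0.01, eps0=0.01, a1=1.0, a2=1.0):
    """Sum over initial states of: reach time + a1 * terminal error + a2 * violation time."""
    total = 0.0
    for x0 in initial_states:
        x, t, violation = x0, 0.0, 0.0
        # Integrate the closed-loop system until the terminal state is reached
        # with accuracy eps0 or the time limit t_plus expires.
        while t < t_plus and abs(x - x_f) > eps0:
            violation += heaviside(phi(x)) * dt   # time spent violating phi(x) <= 0
            x += dt * f(x, h(x))                  # Euler step of dx/dt = f(x, h(x))
            t += dt
        total += t + a1 * abs(x - x_f) + a2 * violation
    return total

# Toy object dx/dt = u with |u| <= 1 and phase constraint |x| <= 3 (all assumed).
f = lambda x, u: u
phi = lambda x: abs(x) - 3.0
starts = [-2.0, -1.0, 1.0, 2.0]
fast = lambda x: max(-1.0, min(1.0, -5.0 * x))   # candidate control function 1
slow = lambda x: max(-1.0, min(1.0, -0.5 * x))   # candidate control function 2
cost_fast = synthesis_cost(f, fast, phi, 0.0, starts)
cost_slow = synthesis_cost(f, slow, phi, 0.0, starts)
```

A symbolic regression method would call a function like `synthesis_cost` as the fitness of each candidate expression for h(x); here the steeper feedback law gets the lower cost because it reaches the terminal state sooner from every initial state.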

Control Synthesis as Unsupervised Machine Learning Control
The first approach is a direct search for the control function on the basis of minimization of a quality criterion. In this case we obtain unsupervised machine learning control. The stated general control synthesis problem (10), (12), (18)-(21) can be solved in the framework of unsupervised machine learning control by different symbolic regression methods. Such an approach has been demonstrated with genetic programming [2], the network operator method [14], variational genetic programming [15], variational analytic programming [16], the multi-layer network operator [17], binary variational genetic programming [18], and modified Cartesian genetic programming [19]. All these symbolic regression methods search for mathematical expressions of control functions such that the resulting solutions achieve the terminal condition (12) from all the initial conditions (18) with the optimal value of the quality criterion (19), which describes the time and accuracy of reaching the terminal state and includes the phase constraints in the form of penalty functions.
Symbolic regression methods use evolutionary algorithms to search for functions and can achieve a certain level of accuracy when minimizing the functional, but it remains unknown how far the values of the criterion (19) for these solutions are from the true optimal values. To address this, supervised machine learning can be used with a training set obtained by solving the optimal control problem.

Control Synthesis as Supervised Machine Learning Control
The second approach is learning with a training set. This is supervised machine learning control. In this case the training set has to be obtained first. For this purpose solutions of the optimal control problem can be used.
The statement of the optimal control problem includes the mathematical model of the control object (10), an initial condition given at one point (22), the terminal condition (12), (13), the phase constraints (14), and a quality criterion (23). It is necessary to find a control as a function of time (25). When the function (25) is inserted into the right part of the mathematical model of the control object (10), a system of non-stationary differential equations (26) is obtained.
To create a training set for the control synthesis problem, it is necessary to solve the optimal control problem with the criterion (23) for each particular initial condition from (18) and to obtain the sets of optimal controls. Then we define the time interval Δt and calculate the value of the state vector on each optimal trajectory at the interval boundaries. As a result, we get a training set of optimal trajectories $\tilde{X} = \{\tilde{X}_1, \ldots, \tilde{X}_K\}$ (29), where Δt is a given time interval.
Now, in order to solve the control synthesis problem (10), (12), (18)-(21) and to find the control function in the form (16), it is enough to approximate the training set (29) with respect to the criterion (31), where $t_0 = 0$ and $x(t, x^{0,i})$ is a partial solution of Equation (17) with the corresponding initial condition. To ensure that the phase constraints are satisfied, both criteria (31) and (19) are applied. As a result, a combined criterion (32) is used, in which γ is a weight coefficient.
To solve the approximation problem, symbolic regression is also used. Control synthesis based on approximation of optimal trajectories makes it possible to find a control function (16) that provides optimal control up to the accuracy of approximation of the training set. How close the solution is to the optimal one is determined by the accuracy with which the optimal control problem is solved.
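The two steps described above, sampling optimal trajectories at interval boundaries and scoring a candidate control function by its distance to them, can be sketched as follows. This is an illustration under assumptions: the scalar object ẋ = u with |u| ≤ 1, its known time-optimal open-loop control, the sum-of-absolute-differences error, and all function names are hypothetical and stand in for the paper's criterion (31).

```python
def sign(a):
    return (a > 0) - (a < 0)

def simulate(f, control, x0, t_end, dt_sample, substeps=100):
    """Euler-integrate dx/dt = f(x, control(t, x)); return states at the sample times."""
    x, t = x0, 0.0
    h = dt_sample / substeps
    samples = [x]
    for _ in range(round(t_end / dt_sample)):
        for _ in range(substeps):
            x += h * f(x, control(t, x))
            t += h
        samples.append(x)
    return samples

f = lambda x, u: u                 # toy object dx/dt = u, |u| <= 1 (assumed)
initial_states = [1.0, 2.0]
t_end, dt_sample = 3.0, 0.5

# Training set: trajectories under the time-optimal open-loop control,
# stored only at the boundaries of the intervals of length dt_sample.
training_set = [
    simulate(f, lambda t, x, x0=x0: -sign(x0) if t < abs(x0) else 0.0,
             x0, t_end, dt_sample)
    for x0 in initial_states
]

def approximation_error(h):
    """Distance between closed-loop trajectories under u = h(x) and the templates."""
    err = 0.0
    for x0, template in zip(initial_states, training_set):
        traj = simulate(f, lambda t, x: h(x), x0, t_end, dt_sample)
        err += sum(abs(a - b) for a, b in zip(template, traj))
    return err

good = approximation_error(lambda x: max(-1.0, min(1.0, -20.0 * x)))
poor = approximation_error(lambda x: -0.5 * x)
```

The saturated feedback reproduces the template trajectories much more closely than the slow linear one, so a search guided by `approximation_error` would prefer it.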

Computational Algorithms
To solve the control synthesis problem as machine learning control based on approximation of a set of optimal trajectories, two complex problems have to be solved: the optimal control problem, in order to form a training set, and the approximation of the optimal trajectories by some symbolic regression method. Evolutionary computations are used for both problems.

Algorithms for the Optimal Control Problem
The optimal control problem with phase constraints is not unimodal [20]; therefore, evolutionary algorithms, which can solve a global optimization problem, are applied. Recently it has become popular to use hybrid evolutionary algorithms that combine different evolutionary algorithms. Studies of evolutionary algorithms for the numerical solution of the optimal control problem [21] show that the most successful are the genetic algorithm (GA) [22], the particle swarm optimization algorithm (PSO) [23] and the grey wolf optimizer algorithm (GWO) [24].
The PSO algorithm uses the best current possible solution and the best of several randomly selected solutions, as well as information about the historical changes of the given possible solution. The evolution is performed for each component of a possible solution, where $\tilde{q}^i_j$ is component j of the best solution among k randomly selected ones, $\tilde{q}^i = [\tilde{q}_1 \ldots \tilde{q}_p]^T$, $q_j(0)$ is component j of the best current possible solution, $q(0) = [q_1(0) \ldots q_p(0)]^T$, ξ is a random value in the interval from 0 to 1 (each call of this function gives a new random number), α, β, γ are constant parameters of the algorithm, and the vector v has zero initial value.
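As an illustration of the ingredients listed above, the following sketch performs a PSO-style update. The update formula is an assumption: it uses the common inertia form in which α scales the previous velocity, β weights the pull towards the best current solution, and γ weights the pull towards the best of k randomly selected solutions. The parameter values and the function names are hypothetical.

```python
import random

def pso_step(q, v, q_best, q_informant, alpha, beta, gamma, rng):
    """One assumed PSO update of every component of a possible solution q."""
    new_q, new_v = [], []
    for j in range(len(q)):
        vj = (alpha * v[j]
              + rng.random() * beta * (q_best[j] - q[j])         # pull to best current
              + rng.random() * gamma * (q_informant[j] - q[j]))  # pull to best of k random
        new_v.append(vj)
        new_q.append(q[j] + vj)
    return new_q, new_v

def minimize(cost, dim, size=20, k=3, generations=100, seed=0):
    """Small driver loop around pso_step; returns the best solution found."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(size)]
    vel = [[0.0] * dim for _ in range(size)]    # the vector v has zero initial value
    best = min(pop, key=cost)
    for _ in range(generations):
        for i in range(size):
            informant = min(rng.sample(pop, k), key=cost)
            pop[i], vel[i] = pso_step(pop[i], vel[i], best, informant,
                                      0.7, 1.5, 1.5, rng)
            if cost(pop[i]) < cost(best):
                best = pop[i]
    return best

sphere = lambda q: sum(c * c for c in q)   # toy cost function
solution = minimize(sphere, dim=2)
```

In the setting of the paper, `q` would hold the control values at discrete moments of time and `cost` would be the quality criterion of the optimal control problem.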
The GWO algorithm changes a possible solution on the basis of several best current solutions, where $q_j(0), \ldots, q_j(N-1)$ are the j-th components of the N best possible solutions, r is calculated once per generation, g is the generation number, and G is the number of generations.
The GA represents vectors in Gray code and performs evolution on two selected possible solutions by the operations of crossover and mutation. For crossover, two possible solutions are selected, where $z^i_k, z^j_k \in \{0, 1\}$, c is the number of bits for the integer part and d is the number of bits for the fractional part of the Gray code.
Then the crossover point s is determined. As a result, two new possible solutions are obtained.
In hybrid algorithms, in each cycle of evolution one of the three ways of obtaining new possible solutions, (33), (34), or (35), (36), or (37), (38), is selected randomly for each possible solution. The algorithm stops after all cycles of evolution have been performed.
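The Gray-code representation and the one-point crossover used by the GA can be sketched as follows. The sketch assumes non-negative parameter values stored in fixed-point form with c integer bits and d fractional bits; the function names and the handling of the value range are illustrative assumptions.

```python
def to_gray_bits(value, c, d):
    """Encode a non-negative real as Gray code with c integer and d fractional bits."""
    n = int(round(value * (1 << d))) & ((1 << (c + d)) - 1)  # fixed-point integer
    g = n ^ (n >> 1)                                         # binary to Gray
    return [(g >> k) & 1 for k in range(c + d - 1, -1, -1)]

def from_gray_bits(bits, d):
    """Decode a Gray-code bit list back to a real value with d fractional bits."""
    g = 0
    for b in bits:
        g = (g << 1) | b
    n = 0
    while g:            # Gray to binary: XOR of all right shifts of g
        n ^= g
        g >>= 1
    return n / (1 << d)

def crossover(a, b, s):
    """One-point crossover: exchange the tails of two bit lists after point s."""
    return a[:s] + b[s:], b[:s] + a[s:]

# Two parents, crossover at s = 3, and the decoded children.
parent_a = to_gray_bits(5.25, c=4, d=2)
parent_b = to_gray_bits(9.5, c=4, d=2)
child_a, child_b = crossover(parent_a, parent_b, 3)
decoded = (from_gray_bits(child_a, 2), from_gray_bits(child_b, 2))
```

Gray code is convenient here because neighbouring fixed-point values differ in exactly one bit, so a single-bit mutation can move a parameter by the smallest step.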

Numerical Methods of Symbolic Regression for the Control Synthesis Problem
Numerical methods of symbolic regression are used to solve the control synthesis problem. More than fourteen such methods are now known. Each method encodes a mathematical expression and searches for the optimal solution in the space of codes. The methods differ in the form of the code.
As an example, consider a mathematical expression (39). To encode this expression, the following basic sets are used: the set of arguments (40) and the sets of functions, in which the indexes of the elements indicate the number of arguments and the function number; if the first index is equal to zero, the element is an argument of the mathematical expression.
The earliest and most popular symbolic regression method is genetic programming by J. Koza [6]. This method represents a mathematical expression as a computational tree. Figure 1 presents the computational tree for the mathematical expression (39).
The code of genetic programming consists of the indexes of the elements of the computational tree along all branches from the top to the leaves. This code is used to represent the computational tree in computer memory.
The code of genetic programming is not very convenient, since the codes of different mathematical expressions have different lengths, which also change after a crossover operation. If an argument enters the mathematical expression several times, it has to appear on the leaves of the computational tree the same number of times.
Another method of symbolic regression, the network operator method [14], encodes a mathematical expression as an oriented graph. Figure 2 presents the network operator graph for the mathematical expression (39). In the network operator graph, the nodes contain the numbers of functions with two arguments, the source nodes contain the arguments of the mathematical expression, and the arcs are marked with the numbers of functions with one argument.
In the computer memory the network operator graph is presented as an integer matrix.
Each row of the matrix corresponds to a graph node. The numbers of nodes are located at the top of the nodes (see Figure 2). The nodes are numbered in such a way that the number of the node from which the arc exits must be less than the number of the node where the arc enters. Then a network operator matrix has an upper triangular form. In the network operator matrix the numbers of functions with two arguments are located on the main diagonal. Zero element on the main diagonal shows that the row corresponds to a source-node. Other non-zero non-diagonal elements are the numbers of functions with one argument.
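The matrix conventions described above suggest a simple evaluation procedure, sketched below. The function numbering, the small example matrix, and the expression it encodes (x1·x2 + x1²) are hypothetical and are not the paper's expression (39). Non-source nodes are assumed to start from the unit element of their two-argument function.

```python
# Hypothetical function sets: numbers of one-argument and two-argument functions.
UNARY = {1: lambda z: z, 2: lambda z: z * z, 3: lambda z: -z}
BINARY = {1: (lambda a, b: a + b, 0.0),   # addition, unit element 0
          2: (lambda a, b: a * b, 1.0)}   # multiplication, unit element 1

def evaluate_network_operator(psi, inputs):
    """Evaluate an upper triangular network operator matrix; return all node values."""
    n = len(psi)
    sources = iter(inputs)
    # A zero on the diagonal marks a source node; other nodes start from the
    # unit element of their two-argument function.
    z = [next(sources) if psi[i][i] == 0 else BINARY[psi[i][i]][1] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if psi[i][j] != 0:                     # arc from node i to node j
                op = BINARY[psi[j][j]][0]
                z[j] = op(z[j], UNARY[psi[i][j]](z[i]))
    return z

# Nodes 0 and 1 are sources (x1, x2); node 2 multiplies; node 3 adds.
psi = [
    [0, 0, 1, 2],   # x1 enters node 2 unchanged and node 3 squared
    [0, 0, 1, 0],   # x2 enters node 2 unchanged
    [0, 0, 2, 1],   # node 2 (product) enters node 3 unchanged
    [0, 0, 0, 1],   # node 3 (sum) is the output
]
result = evaluate_network_operator(psi, [3.0, 4.0])[-1]   # 3*4 + 3**2
```

Because every arc goes from a lower-numbered node to a higher-numbered one, processing the rows in order guarantees that a node's value is complete before it is used.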
Consider one more symbolic regression method. In Cartesian genetic programming [25], the code of the mathematical expression (39) is a sequence of integer vectors. Here, every integer vector corresponds to one elementary function. The first element of a vector is the function number; the other elements are the numbers of elements from the set of arguments (40). If a function has one argument, the second argument is not used. The result of the calculation for each vector is added to the argument set, so the number of arguments increases by one after each vector is evaluated.
Due to redundancy, the Cartesian genetic programming code has the same length for all mathematical expressions. The crossover operation in Cartesian genetic programming is performed by exchanging the vectors after the crossover point, so the length of the codes does not change.
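The decoding rule and the length-preserving crossover can be sketched as follows. The function numbering and the example code are hypothetical (they do not reproduce the paper's code for expression (39)); each call is a triple of a function number and two argument indexes.

```python
# Hypothetical function table: number -> (arity, implementation).
FUNCTIONS = {
    1: (2, lambda a, b: a + b),
    2: (2, lambda a, b: a * b),
    3: (1, lambda a: -a),
}

def evaluate_cgp(code, arguments):
    """Evaluate a Cartesian genetic programming code on the given argument set."""
    values = list(arguments)
    for func, i, j in code:
        arity, fn = FUNCTIONS[func]
        # Each result is appended to the argument set, so later calls can use it.
        values.append(fn(values[i]) if arity == 1 else fn(values[i], values[j]))
    return values[-1]

def cgp_crossover(a, b, point):
    """Exchange the call vectors after the crossover point; code lengths are unchanged."""
    return a[:point] + b[point:], b[:point] + a[point:]

# Arguments: x1, x2, q1 at indexes 0, 1, 2; results get indexes 3, 4, 5.
code = [[1, 0, 1],   # index 3: x1 + x2
        [3, 3, 0],   # index 4: -(x1 + x2), second index unused
        [2, 4, 2]]   # index 5: -(x1 + x2) * q1
value = evaluate_cgp(code, [1.0, 2.0, 0.5])
other = [[2, 0, 1], [1, 3, 2], [3, 4, 0]]
child_a, child_b = cgp_crossover(code, other, 1)
```

Every child produced this way is again a valid code of the same length, provided that argument indexes never point past the results already computed.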
Studies of symbolic regression methods for control synthesis problems show that it is effective to use the principle of small variations of the basic solution in these methods [26]. According to this principle, only one basic solution is encoded by the symbolic regression method. Other possible solutions are encoded by sets of variation vectors. Each variation vector makes one small change to the code of the basic solution. After several generations, the basic solution is replaced by the best solution found so far. This approach speeds up the search by narrowing the search space and by avoiding additional checks of the correctness of the codes of possible solutions.
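A minimal sketch of this principle is shown below for a code made of integer call vectors. The layout of a variation vector (index of the call, position inside it, new value) is an assumption chosen for illustration; the actual variation vectors depend on the symbolic regression method.

```python
import copy

def apply_variations(basic_code, variations):
    """Build a possible solution from the basic code and a set of variation vectors."""
    code = copy.deepcopy(basic_code)        # the basic solution itself is not changed
    for call, position, value in variations:
        code[call][position] = value        # one small change per variation vector
    return code

basic = [[1, 0, 1], [2, 3, 2]]              # hypothetical basic solution
candidate = apply_variations(basic, [(1, 0, 1)])              # one variation
candidate2 = apply_variations(basic, [(0, 2, 2), (1, 1, 0)])  # two variations
```

With this representation the evolutionary operators act on the short lists of variation vectors, and the basic code is overwritten only when a better solution has been found.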

Computational Experiment
In the computational experiment, the problem of general synthesis of optimal control for a spacecraft landing on the surface of the Moon is considered [27]. The spacecraft is described by a system of differential equations with the state vector $x = [x_1 \ldots x_5]^T$, where $x_1$ is the current speed of the spacecraft (m/s), $x_2$ is the trajectory inclination angle (rad), $x_3$ is the current flight altitude relative to the lunar surface (km), $x_4$ is the flight distance (km), and $x_5$ is the mass of the spacecraft including fuel (kg). $u = [u_1\ u_2]^T$ is the control vector, whose values are constrained.
The parameters of the model are as follows: the gravitational acceleration at a given altitude above the lunar surface is a function of the altitude, the Moon's gravitational acceleration is $g_M$ = 1.623 m/s², the Earth's gravitational acceleration is $g_E$ = 9.80665 m/s², the Moon's radius is $r_M$ = 1737 km, the nominal thrust of the spacecraft engine is $P_c$ = 720 kg, and the specific impulse of the spacecraft engine is $P_s$ = 319 s.
The domain of initial states is given by (48), and the terminal state is specified. The phase constraints are determined by the mechanics of spacecraft flight. Obviously, the speed $x_1$, the altitude $x_3$ and the fuel level $x_5$ cannot be negative, and reaching zero altitude $x_3$ or zero fuel level $x_5$ at a significant speed $x_1$ means that the spacecraft has crashed. The phase constraints considered use the maximum landing speed $V_{max}$ = 1 and the Heaviside function ϑ(A).
According to the proposed method, the training set is formed at the first step. We determine a finite set of initial states within the domain (48) and solve the optimal control problem for each initial state from this set.
Let us replace the domain of initial states (48) with a set (51) of M = 21 elements uniformly distributed over this domain. The quality criterion takes into account how closely the terminal state is reached and any violation of the phase constraints, where $\alpha_i$, i = 1, 2, are given penalty factors, K = 5 is the number of phase constraints, and j = 1, ..., M.
To search for a solution to the optimal control problem, the direct approach was used. The original problem was reduced to a nonlinear programming problem by introducing the time interval Δt. The solution of each optimal control problem, in the form of the control vector at discrete moments of time, was sought independently by a hybrid evolutionary algorithm combining the modern Grey Wolf Optimizer (GWO), which does not require problem-specific tuning of additional parameters, and the well-known particle swarm optimization (PSO). Separately, these algorithms have shown high efficiency in solving optimal control problems. The hybrid implementation is intended to increase their effectiveness.
In a computational experiment the size of the set of possible solutions was 100, number of search iterations was 5000. Modeling parameters were the following: maximum control time t max = 300, discretization time interval ∆t = 30, penalty factors α 1 = 10, α 2 = 10.
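The reduction to a nonlinear programming problem can be illustrated with a small sketch of how a vector of control values at discrete moments of time might define the control at any time. Piecewise-linear interpolation between the nodes and clipping to the control bounds are assumptions; the paper does not state the interpolation rule here. With the reported t_max = 300 and Δt = 30 there would be 11 nodes per control component. The function name and the bounds are hypothetical.

```python
def control_from_parameters(q, t, dt, u_min, u_max):
    """Piecewise-linear control: q[k] is the value at time k*dt, clipped to the bounds."""
    k = min(int(t // dt), len(q) - 2)        # index of the interval containing t
    s = (t - k * dt) / dt                    # relative position inside the interval
    u = q[k] + s * (q[k + 1] - q[k])
    return max(u_min, min(u_max, u))

t_max, dt = 300.0, 30.0
nodes = int(t_max / dt) + 1                  # number of searched values per component
q = [float(k) for k in range(nodes)]         # hypothetical parameter values 0, 1, ..., 10
u_mid = control_from_parameters(q, 45.0, dt, -100.0, 100.0)
u_end = control_from_parameters(q, 300.0, dt, -100.0, 100.0)
u_clipped = control_from_parameters(q, 300.0, dt, 0.0, 5.0)
```

The evolutionary algorithm then searches over the entries of `q` for each control component, and the quality criterion is evaluated by integrating the model with the control produced by this function.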
At the second step of the proposed approach, we use the obtained optimal trajectories to synthesize a multidimensional control function of the object state. The search for the control function is conducted by a symbolic regression method that looks for the expression that best approximates the given optimal trajectories.
We used the network operator method (NOP) to synthesize the control function. NOP searches for the structure of a mathematical expression simultaneously with the optimal values of the parameter vector. In the computational experiment we used the following NOP parameters: the size of the NOP matrix was 40, the size of the set of input variables was 3, the size of the set of input parameters was 12, the number of outputs was 2, the number of candidate solutions in the initial set was 256, and the maximum number of search iterations was 25,000.
Functions $\chi_5$ and $\chi_6$ are commutative and associative and have zero as their unit element.
To check the solution, we used the found control function to obtain optimal control and the corresponding trajectories for various initial states from (48). The initial states considered included both those present in the training set (51) and those not present in it.
Table 1 shows the values of the quality criterion J* obtained using the found control function (53) for the 21 initial states from the finite set (51). The optimal trajectories known for these initial states were previously used as the training set, so this test shows the quality of approximation. The value of the quality criterion $J_{ocp}$ obtained by solving the optimal control problem for the same initial state is shown in the table as a reference value. The average deviation of the quality criterion values from the reference ones is 0.0591, the maximum deviation is 0.2514, and the standard deviation is 0.0648.
Table 2 shows the values of the quality criterion J* obtained using the found control function (53) for 10 initial states generated randomly within the domain (48). This test shows the suitability of the found control function for any initial state from the domain (48). The value of the quality criterion $J_{ocp}$ obtained by solving the optimal control problem for the same initial state is shown in the table as a reference value. The average deviation of the quality criterion values from the reference ones is 0.0366, the maximum deviation is 0.1122, and the standard deviation is 0.0341.
Figure 5 shows the values of the found control function over time.
The computational experiment showed that the found multidimensional control function gives a close-to-optimal solution for any initial state from the given domain (48), even for initial states that were not in the training set (51).
According to the analysis of the standard deviation, the training set contained a sufficient number of optimal trajectories. The better value of the standard deviation in the experiment with initial states randomly distributed in (48) can be explained by the fact that the set (51) had a large number of initial states on the boundary of the set (48).

Conclusions and Perspectives
The paper provides mathematical formulations of the supervised and unsupervised machine learning problems and defines basic concepts such as the work area and the training set. Based on the presented formulations, it is shown that the main task of machine learning is to find a function that determines the correspondence between the input data and the resulting data. It is shown that today this problem can be solved numerically using symbolic regression methods. The problem of obtaining a mathematical expression arises in various situations: approximation of experimental data to determine a physical law or a trend model; analysis and prediction of variables or indicators based on previous observations; identification of a mathematical model of a process or a dynamic object; generalization of the control law based on the current state of the control object. The application of machine learning based on symbolic regression methods to control opens up the possibility of solving such a complex problem of control theory as the general control synthesis problem. The paper presents a mathematical formulation of the control synthesis problem and provides methods for its solution using machine learning, both directly and based on a training set. An important result of the article is the methodology for solving the general control synthesis problem as machine learning control based on a training set. An approach to constructing a training set from multiple solutions of the optimal control problem is proposed. An example of solving a specific problem of control synthesis for a complex technical object based on the approximation of optimal trajectories is given. It is shown that such a control, obtained on the basis of machine learning, gives good results not only for input data from the training set but also for input data outside it.
The concept of machine learning is widely known, but very often limited by its association with neural network technology. We are expanding the concept of machine learning to include a description of an unknown function in its formulation. Thus, a function can be specified and training is aimed only at finding parameters, as in neural networks, but you can also search for the structure of the function and its parameters. This became possible with the advent of symbolic regression methods. The complexity of these methods lies in the need to organize search in a space in which there is no metric. This greatly complicates the solution of the problem of finding the required structure of the function. This complexity opens up a wide field of research. One of the ways to solve this problem is to use the principle of small variations of the basic solution indicated in the article. This approach allows you to concentrate the search for a solution around a basic solution based on the developer's experience or intuition. This approach also requires further study.

Conflicts of Interest:
The authors declare no conflict of interest.