Application of Mini-Batch Metaheuristic Algorithms in Problems of Optimization of Deterministic Systems with Incomplete Information about the State Vector

† This paper is an extended version of the conference paper "Application of the mini-batch adaptive method of random search (MAMRS) in problems of optimal in mean control of the trajectory pencils", in Proceedings of the 19th International Conference “Aviation and Cosmonautics” (AviaSpace-2020), Moscow, Russia, 23–27 November 2020.

Abstract: In this paper, we consider the application of a zero-order mini-batch optimization method to the problem of finding the optimal control of a pencil of trajectories of nonlinear deterministic systems in the case of incomplete information about the state vector. The pencil of trajectories originates from a given set of initial states. To solve the problem, a feedback system structure is proposed that contains models of the plant, the measuring system, a nonlinear state observer and a control law of fixed structure with unknown coefficients. The proposed objective function measures the quality of the pencil control by the average value of the Bolza functional over the given set of initial states. The unknown control laws of the plant and the observer are sought in the form of expansions in terms of orthonormal systems of basis functions, which are specified on the set of possible states of the dynamical system. The original pencil control problem is reduced to a global optimization problem, which is solved by a zero-order method that uses a modified mini-batch approach in a random search procedure with adaptation. An algorithm for solving the problem is proposed. The satellite stabilization problem with incomplete information is solved.


Introduction
A general approach to the numerical solution of the problem of finding the average optimal control of nonlinear deterministic dynamical systems under conditions of uncertainty in setting the initial conditions and incomplete information about the state vector is proposed. Since direct information about the state vector is not available, a nonlinear state observer is included in the closed-loop control system, which finds an estimate of the state vector from the output of the nonlinear model of the measuring system. The control laws of the plant and the observer are found simultaneously as functions of time and estimates of the state vector. In contrast to linear systems with a quadratic criterion, in which the synthesis of the optimal controller and the optimal filter is performed independently, in the proposed procedure, the undefined coefficients of the control laws of the plant and the observer are sought simultaneously [1].
An alternative way is to use various numerical methods for solving the Bellman equation as a sufficient condition for optimality of feedback control in the complete state information problem. In this case, arbitrary initial conditions are considered, for which the minimum of the functional should be obtained. When solving practical problems of control theory, it is usually possible to define a set of initial states, determined by the conditions of operation of the control system, and for this set to search for the corresponding law of control with feedback. To complete the solution, one should find the parameters of the nonlinear observer independently and use the estimate of the state vector in the optimal control law instead of exact information about the state vector.
In the present paper, the behavior of a nonlinear continuous deterministic plant (the model of the object) is described by a system of ordinary differential equations. Parallelepiped constraints are imposed on the coordinates of the control vector. The initial conditions are given by a compact set of initial states. The quality of control of a separate trajectory is estimated by the value of the Bolza functional. For the given set of initial conditions, a pencil of trajectories is considered. The performance index to be minimized is the average value of the Bolza functional over the set of initial states. The problem is to find the control laws for the plant and the state observer in the class of functional expansions in terms of elements of orthonormal basis systems with unknown coefficients, depending on time and the estimates of the state vector coordinates. The components of the control laws are found using systems of basis functions that are used in problems of spectral analysis [2,3]. It is proposed to apply the mini-batch adaptive method of random search (MAMRS) [4,5,6] to solve the problem under consideration and to analyze its solution for various models of the measuring system. As a special case, the control problem with complete information about the state vector is considered. MAMRS can be classified as a metaheuristic method [7,8,9,10,11]. It extends the idea of stochastic gradient methods [12,13,14,15] to a method that does not require information about the gradient. The efficiency of the method is demonstrated by solving an applied optimal control problem of satellite stabilization [16].

Statement of the Problem
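We consider the nonlinear continuous dynamical system described by the vector differential equation:

$$\dot{x}(t) = f(t, x(t), u(t)), \qquad (1)$$

where $x \in R^n$ is the state vector, $u$ is the control vector, $t \in [t_0, t_1]$ is time and $f(t, x, u)$ is a given continuous function. The initial conditions are specified as:

$$x(t_0) = x_0 \in \Omega, \qquad (2)$$

where $\Omega$ is a set with positive measure ($\operatorname{mes}\,\Omega > 0$) and a piecewise smooth boundary. It characterizes the uncertainty in setting the initial conditions.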
The model of the measuring system is described by the relation:

$$z(t) = h(t, x(t)), \qquad (3)$$

where $z \in R^m$ is an output vector and $h(t, x)$ is a given continuous function. The information coming from the model of the measuring system arrives at the input of the state observer, which produces an estimate of the state vector.
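We suppose that it is possible to obtain an estimate of the state vector using a nonlinear observer of the form:

$$\frac{d\hat{x}(t)}{dt} = f(t, \hat{x}(t), u(t, \hat{x}(t))) + K(t, \hat{x}(t))\,[z(t) - h(t, \hat{x}(t))], \qquad (4)$$

$$\hat{x}(t_0) = \hat{x}_0, \qquad (5)$$

where $\hat{x}(t)$ is a state vector estimate, $\hat{x}_0 \in \Omega$ is an initial estimate and $K(t, \hat{x}) \in R^{n \times m}$ is an unknown continuous $n \times m$ matrix function. This matrix is considered as a feedback control of the observation process. The state vector estimate is also used in the plant control law $u(t, \hat{x})$.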
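The interaction of the plant, the measuring system and the observer can be illustrated with a minimal Python sketch. It assumes the common observer structure in which a copy of the plant model is corrected by the term $K\,[z - h(t, \hat{x})]$, and it uses explicit Euler integration. The function `simulate_closed_loop` and the toy oscillator (`f_toy`, `h_toy`, `u_toy`, `K_toy`) are hypothetical and serve only as an illustration; they are not the models used in the paper.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def simulate_closed_loop(
    f: Callable[[float, Vector, Vector], Vector],
    h: Callable[[float, Vector], Vector],
    u: Callable[[float, Vector], Vector],
    K: Callable[[float, Vector], List[Vector]],
    x0: Vector,
    xhat0: Vector,
    t0: float,
    t1: float,
    steps: int,
) -> Tuple[Vector, Vector]:
    """Explicit Euler integration of the plant together with the observer."""
    dt = (t1 - t0) / steps
    x, xhat, t = list(x0), list(xhat0), t0
    for _ in range(steps):
        ctrl = u(t, xhat)  # the control law sees only the estimate
        # Difference between the actual and the predicted measurement.
        innovation = [zi - zh for zi, zh in zip(h(t, x), h(t, xhat))]
        gain = K(t, xhat)  # n x m matrix stored as nested lists
        correction = [sum(g * e for g, e in zip(row, innovation)) for row in gain]
        dx = f(t, x, ctrl)
        dxhat = f(t, xhat, ctrl)
        x = [xi + dt * di for xi, di in zip(x, dx)]
        xhat = [xi + dt * (di + ci) for xi, di, ci in zip(xhat, dxhat, correction)]
        t += dt
    return x, xhat


# Hypothetical toy plant: an oscillator with only the first coordinate measured.
def f_toy(t: float, x: Vector, ctrl: Vector) -> Vector:
    return [x[1], -x[0] + ctrl[0]]


def h_toy(t: float, x: Vector) -> Vector:
    return [x[0]]


def u_toy(t: float, xhat: Vector) -> Vector:
    return [-xhat[1]]  # damping based on the estimated velocity


def K_toy(t: float, xhat: Vector) -> List[Vector]:
    return [[2.0], [1.0]]  # constant observer gain, chosen by hand


x_end, xhat_end = simulate_closed_loop(
    f_toy, h_toy, u_toy, K_toy, [1.0, 0.0], [0.0, 0.0], 0.0, 5.0, 5000
)
estimation_error = sum((a - b) ** 2 for a, b in zip(x_end, xhat_end)) ** 0.5
print(estimation_error)
```

In this toy case the gain is fixed by hand so that the estimation error decays. In the setting of the paper, the gain $K(t, \hat{x})$ is unknown and is found together with the plant control law.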
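We define the set of admissible control laws $U$ as the set of pairs $(u(t, \hat{x}), K(t, \hat{x}))$ in which the plant control law $u(t, \hat{x})$ is a piecewise continuous function satisfying the control constraints and the observer control law $K(t, \hat{x})$ is a continuous function. It is assumed that the solution of the system of equations (1) and (4) with the initial conditions (2) and (5), taking into account (3), exists and is unique.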
The performance index for a separate trajectory is:

$$I = \int_{t_0}^{t_1} f^0(t, x(t), u(t), K(t))\,dt + F(x(t_1)), \qquad (6)$$

where $f^0(t, x, u, K)$ and $F(x)$ are given continuous functions.
We associate the pencil of trajectories of the system of equations (1) and (4) with each admissible control law $(u(t, \hat{x}), K(t, \hat{x})) \in U$ and the set $\Omega$ of initial states. The pencil is the union of the solutions of the system of equations (1) and (4) over all possible initial states from the set $\Omega$.
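The performance index for the pencil of trajectories control to be minimized is:

$$J[u, K] = \frac{1}{\operatorname{mes}\,\Omega} \int_{\Omega} I\,dx_0, \qquad (7)$$

where $I$ is the value of performance index (6) on the trajectory starting from $x_0$. The optimal control problem is to choose the control policy $(u^*(t, \hat{x}), K^*(t, \hat{x})) \in U$ such that:

$$J[u^*, K^*] = \min_{(u, K) \in U} J[u, K]. \qquad (8)$$

Since the average value of performance index (6) is minimized on the set of initial states $\Omega$, the required control is called optimal on average.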
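The averaging over the set of initial states can be sketched numerically. The Python fragment below assumes that $\Omega$ is a box and that the average is estimated from uniformly sampled initial states; `sample_initial_states`, `average_cost` and the quadratic `toy_cost` are hypothetical names introduced for illustration. In the actual problem, the cost of one initial state would be the Bolza functional evaluated along the closed-loop trajectory.

```python
import random
from typing import Callable, List, Sequence, Tuple

Box = Sequence[Tuple[float, float]]


def sample_initial_states(box: Box, n: int, seed: int = 0) -> List[List[float]]:
    """Draw n initial states uniformly from a box given as (low, high) pairs."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in box] for _ in range(n)]


def average_cost(
    cost: Callable[[List[float]], float], states: Sequence[List[float]]
) -> float:
    """Sample mean of the per-trajectory cost over the given initial states."""
    return sum(cost(x0) for x0 in states) / len(states)


# Hypothetical stand-in for the per-trajectory functional: a quadratic in x0.
def toy_cost(x0: List[float]) -> float:
    return x0[0] ** 2 + x0[1] ** 2


states = sample_initial_states([(-1.0, 1.0), (-1.0, 1.0)], 20000)
estimate = average_cost(toy_cost, states)
print(estimate)
```

For this toy cost the exact average over the square $[-1, 1]^2$ equals $2/3$, which gives a simple check of the sampling estimate.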

Solution Search Strategy
We consider the transition to the parametric optimization problem from the control problem (8), i.e., to the problem of finding unknown coefficients of the plant control and the observer control. The plant control constraints of parallelepiped type should be taken into account.
To implement this transition, we use the following assumptions:
1. The set of initial states $\Omega$ is a parallelepiped, defined by the direct product of intervals.
3. The plant control policy is sought in the form (11) of an expansion in orthonormal basis functions with unknown coefficients, passed through the saturation function sat, which guarantees the fulfillment of the parallelepiped-type plant control constraints. The basis functions are defined on the corresponding intervals, such as $[t_0, t_1]$, and satisfy the orthonormality condition. As the basis functions, one can take, for example:
• Legendre polynomials
and other systems of basis functions.
The entries of the matrix $K(t, \hat{x})$ are found by a formula similar to (11), where the variable $u$ is replaced by $K$.
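A control law of this kind can be sketched in Python as follows. The sketch assumes a tensor-product expansion in Legendre polynomials that are shifted and scaled to be orthonormal on each interval, followed by clipping to the control bounds. The exact index ranges and notation of formula (11) are not reproduced here; `legendre_orthonormal`, `sat`, `control_law` and `inner_product` are hypothetical helper names.

```python
import math
from typing import Dict, Sequence, Tuple


def legendre_orthonormal(i: int, x: float, a: float, b: float) -> float:
    """Legendre polynomial of degree i, orthonormal on the interval [a, b]."""
    s = 2.0 * (x - a) / (b - a) - 1.0  # map [a, b] onto [-1, 1]
    p_prev, p = 1.0, s
    if i == 0:
        p = 1.0
    else:
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) s P_k - k P_{k-1}
        for k in range(1, i):
            p_prev, p = p, ((2 * k + 1) * s * p - k * p_prev) / (k + 1)
    return math.sqrt((2 * i + 1) / (b - a)) * p


def sat(value: float, low: float, high: float) -> float:
    """Saturation function: clip the value to the segment [low, high]."""
    return min(max(value, low), high)


def control_law(
    coeffs: Dict[Tuple[int, ...], float],
    point: Sequence[float],
    intervals: Sequence[Tuple[float, float]],
    u_min: float,
    u_max: float,
) -> float:
    """Saturated expansion; point holds time followed by the state estimates."""
    total = 0.0
    for index, c in coeffs.items():
        term = c
        for i, v, (a, b) in zip(index, point, intervals):
            term *= legendre_orthonormal(i, v, a, b)
        total += term
    return sat(total, u_min, u_max)


def inner_product(i: int, j: int, a: float, b: float, n: int = 2000) -> float:
    """Midpoint-rule check of orthonormality on [a, b]."""
    w = (b - a) / n
    return w * sum(
        legendre_orthonormal(i, a + (k + 0.5) * w, a, b)
        * legendre_orthonormal(j, a + (k + 0.5) * w, a, b)
        for k in range(n)
    )


# One control coordinate depending on time t in [0, 1] and one estimate in [-1, 1].
coeffs = {(0, 0): 0.3, (0, 1): -1.2, (1, 1): 0.5}
value = control_law(coeffs, [0.25, 0.4], [(0.0, 1.0), (-1.0, 1.0)], -1.0, 1.0)
print(value)
```

The dictionary of coefficients plays the role of the unknown parameter vector that the optimization method adjusts. The entries of the observer matrix can be represented in the same way with their own coefficients.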
The value of the pencil control cost functional (7) is approximated as:

$$J \approx \frac{1}{N} \sum_{k=1}^{N} I_k, \qquad (12)$$

where $N$ is the number of initial states taken from $\Omega$ and $I_k$ is the value of functional (6) on the trajectory starting from the $k$-th of them. The optimization problem is to choose the best parameters minimizing performance index (12) by using a mini-batch adaptive method of random search (MAMRS) [4]. The strategy of its application is that, for the approximate calculation of functional (12), $d$ randomly selected non-coinciding trajectories emanating from the set of initial states are used; they form a mini-batch. The mini-batch size is user-definable, $1 \le d \le N$, and is usually selected step by step. Furthermore, for simplicity of presentation, we assume that each coordinate of the control laws …

Step 0. Set the initial mini-batch size.
Step 3. Define the initial values of the coefficients.
Step 4. Generate a random vector.

Mini-Batch Adaptive Search Algorithm
Step 5. Calculate: …
Step 6. If the search direction is unsuccessful, go to step 7; if an unsuccessful step is made, go to step 7.
Step 7. Calculate the number of unsuccessful steps from the current solution: … and go to step 10.
Step 10*. Put …
Step 11*. Check the condition for completing the study of the effect of the mini-batch size: if $d < N$, put $d = d + 1$, $s = 1$ and go to step 1; if $d = N$, go to step 12.
Step 12. As a result, find the best estimate of the solution.
* Steps 10 and 11 are performed if necessary.
It is recommended to do restarts to increase the chances of finding a global extremum. The best solution is selected from the restarts made.
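The general idea of combining mini-batches with an adaptive random search can be illustrated by the following simplified Python sketch. It is not a reproduction of the MAMRS steps listed above. It assumes a classical adaptive random search scheme: a trial step along a random direction, an extended step when the trial succeeds, and a step reduction after a run of failures. A fresh mini-batch of `d` distinct trajectories is drawn at every iteration. All names and parameter values (`alpha`, `beta`, `max_fail` and so on) are hypothetical.

```python
import math
import random
from typing import Callable, List


def minibatch_random_search(
    traj_cost: Callable[[List[float], int], float],
    n_states: int,
    dim: int,
    d: int,
    step: float = 1.0,
    alpha: float = 1.6,
    beta: float = 0.5,
    max_fail: int = 30,
    min_step: float = 1e-4,
    max_iter: int = 2000,
    seed: int = 0,
) -> List[float]:
    """Simplified adaptive random search driven by mini-batch cost estimates."""
    rng = random.Random(seed)
    c = [0.0] * dim  # initial coefficients
    fails = 0

    def batch_cost(coef: List[float], batch: List[int]) -> float:
        return sum(traj_cost(coef, k) for k in batch) / len(batch)

    for _ in range(max_iter):
        if step < min_step:
            break
        batch = rng.sample(range(n_states), d)  # d non-coinciding trajectories
        xi = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in xi)) or 1.0
        trial = [ci + step * v / norm for ci, v in zip(c, xi)]
        base = batch_cost(c, batch)
        if batch_cost(trial, batch) < base:
            # Successful trial step: try a longer step in the same direction.
            far = [ci + alpha * (yi - ci) for ci, yi in zip(c, trial)]
            if batch_cost(far, batch) < base:
                c, step, fails = far, alpha * step, 0
                continue
        fails += 1
        if fails >= max_fail:
            step, fails = beta * step, 0  # too many failures: shrink the step
    return c


# Hypothetical toy problem: 27 "trajectories", each with its own quadratic cost.
N_STATES = 27


def toy_traj_cost(coef: List[float], k: int) -> float:
    return (coef[0] - k / (N_STATES - 1)) ** 2 + (coef[1] - 2.0) ** 2


def full_cost(coef: List[float]) -> float:
    return sum(toy_traj_cost(coef, k) for k in range(N_STATES)) / N_STATES


best = minibatch_random_search(toy_traj_cost, N_STATES, dim=2, d=10)
print(full_cost([0.0, 0.0]), full_cost(best))
```

Each comparison inside an iteration uses the same mini-batch, so an accepted step always decreases the mini-batch estimate, while the full average over all initial states is evaluated only for reporting. Restarts with different seeds and a gradual increase of `d` can be added as an outer loop.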

Satellite Stabilization Problem
The problem of damping the rotational motion of a satellite by the engines installed on it is considered. The motion of a rigid body relative to the center of inertia is described, after the transition to dimensionless variables, by a system of equations in the variables $p$, $q$, $r$. At the final moment of the system functioning, the following conditions must be fulfilled: $p(1) = q(1) = r(1) = 0$, which corresponds to the meaning of the satellite stabilization problem. The fulfillment of the terminal conditions should be accompanied by minimization of the fuel used to turn the satellite.
The functional (6) is specified in accordance with these requirements. Next, we consider two examples: the joint estimation and control problem with incomplete information about the state vector and the optimal control problem with complete information about the state vector.

Example 1. The Joint Estimation and Control Problem
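The proposed observer equations follow the form (4). The right-hand sides for the estimates $\hat{p}$, $\hat{q}$, $\hat{r}$ contain the controls $u_i(t, \hat{x}(t))$, the products $\hat{r}\hat{p}$ and $\hat{p}\hat{q}$ in the second and third equations, respectively, and the correction terms $K_i(t, \hat{x}(t))\,[z(t) - h(t, \hat{x}(t))]$.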
Further, we will consider the cases of solving the problem with different models of the measuring system.
In all tests, the same number of initial states is used. To synthesize the plant control $u(t, \hat{x})$ and the observer control $K(t, \hat{x})$, a system of orthonormal Legendre polynomials is used.

Case A
The measuring system model is described by the following relationship: …
The behavior of the trajectory set for different mini-batch sizes is shown in Figure 1. Table 1 shows the results of solving the problem depending on the mini-batch size.
Table 2 shows the results of solving the problem depending on the mini-batch size.
The behavior of the trajectory set for different mini-batch sizes is shown in Figure 3. Table 3 shows the results of solving the problem depending on the mini-batch size.
Based on Tables 1-3, we can conclude that the accuracy of the problem solution increases with the mini-batch size. Figure 4 and Table 4 show the solution of the satellite stabilization problem for each of the selected models of the measuring system with the mini-batch size $d = 27$. They show a similar character of convergence for the different models of the measuring system.

Example 2. The Control Problem with Complete Information about the State Vector
The measuring system model is described by the following relationship: $z(t) = (p(t), q(t), r(t))^T$. In this case, there is no need to use a state observer because complete information about the state vector is available at an arbitrary moment in time. In practice, this case is rarely realized, but it is of interest for analyzing the losses in the value of the cost functional caused by the incompleteness of the information received. In all tests, the same number of randomly generated initial states is used. To synthesize the plant control $u(t, x)$, a system of orthonormal Legendre polynomials is used.
The behavior of the trajectory set for different mini-batch sizes is shown in Figure 5. Table 5 shows the results of solving the problem depending on the mini-batch size. Based on the results of Examples 1 and 2, we can conclude that, with the mini-batch size $d = 10$, good convergence of the estimates of the state vector coordinates to the true values is already achieved. The total execution time of the algorithm was 30 min with the mini-batch size $d = 10$ and 90 min with $d = 27$ on an Intel Core i5 2.10 GHz processor. The results obtained indicate that, when using mini-batches, the required quality of transients is achieved at reasonable computational costs.

Conclusions
The developed zero-order metaheuristic optimization algorithm, namely, the mini-batch adaptive method of random search, is tested on the satellite stabilization problem of finding the optimal control for a pencil of trajectories of nonlinear deterministic systems emanating from a given set of initial states. Software for solving the satellite stabilization problem is developed. Three cases with different models of the measuring system with incomplete information are considered and analyzed. A comparison is made with the solution of the problem with a model of the measuring system containing complete information about the state vector. The influence of the mini-batch size on the accuracy of the solution is studied in each considered problem. Recommendations on the choice of the algorithm parameters are given. The obtained numerical results confirm the idea that, for a certain mini-batch size, an acceptable quality of transient processes can be achieved with low computational costs.