
Optimal Stabilization of Linear Stochastic System with Statistically Uncertain Piecewise Constant Drift

Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44/2 Vavilova Str., 119333 Moscow, Russia
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(2), 184; https://doi.org/10.3390/math10020184
Submission received: 3 December 2021 / Revised: 24 December 2021 / Accepted: 30 December 2021 / Published: 7 January 2022
(This article belongs to the Special Issue Stochastic Processes and Their Applications)

Abstract

The paper presents an optimal control problem for the partially observable stochastic differential system driven by an external Markov jump process. The available controlled observations are indirect and corrupted by some Wiener noise. The goal is to optimize a linear function of the state (output) given a general quadratic criterion. The separation principle, verified for the system at hand, allows examination of the control problem apart from the filter optimization. The solution to the latter problem is provided by the Wonham filter. The solution to the former control problem is obtained by formulating an equivalent control problem with a linear drift/nonlinear diffusion stochastic process and with complete information. This problem, in turn, is immediately solved by the application of the dynamic programming method. The applicability of the obtained theoretical results is illustrated by a numerical example, where an optimal amplification/stabilization problem is solved for an unstable externally controlled step-wise mechanical actuator.

1. Introduction

The linear quadratic Gaussian (LQG) control problem is one of the most productive results in control theory. It represents the synthesis of a control law for the state of a linear Gaussian stochastic system that minimizes the expected value of a quadratic utility function [1]. The application area of the result is extremely broad. Aerospace applications were and still are important and include, for example, re-entry vehicle tracking and orbit determination [1]; other uses include navigation and guidance systems, radar tracking, and sonar, i.e., the same tasks that became the first applications of the Kalman filter [2]. The management of non-stationary technological objects has also become a significant source of applications. Such industrial applications arise in the paper and mineral processing industries, biochemistry, energy, and many other domains [3]. They are often represented by very specific examples, such as a dry-process rotary cement kiln, cement raw material mixing, and thickness control of a plastic film [4,5]. Of particular interest are tasks connected with the mechanics of drives and manipulators, e.g., two-link manipulator stabilization [6] or output feedback stabilization for robot control [7]. In fact, these models have no stochastic elements, but the applications served as a source of motivation for the present research.
One of the cornerstones of the contemporary LQG technique is the separation theorem, which is inherent to control problems with incomplete information [8,9]. This result justifies a transparent engineering approach, according to which the initial control problem splits into an unobservable state estimation stage and control synthesis based on the obtained state estimate. In the general nonlinear/non-Gaussian case, the separation procedure does not deliver an absolute optimum in the class of admissible controls. Nevertheless, forcing the separation of state estimation and control optimization allows synthesizing high-performance controls for various applied problems. This approach is called the separation principle. Its applications are not limited to the aforementioned traditional tasks of Kalman filtering and LQG control. Recent applications cover rather modern areas, e.g., telecommunications: in [10,11], LQG and risk-sensitive LQG controls are used under Denial-of-Service (DoS) attack conditions, and [12] uses similar models and tasks for control applications in lossy data networks. In [13], user activity measurements are used to manage the computational resources of a data center. In [14], the ideas of estimation and control separation are adapted for deep learning, namely neural network training tasks. We can also note the evolution of the meaning of the term “separation”. In [15], for hybrid systems containing real-time control loops and logic-based decision makers, the control loops are separated. In [16], separation is interpreted in the sense that the output feedback controller is based on separate properties related to state feedback on the one hand and output-to-state stability on the other.
Control problems with incomplete information, especially in stochastic settings, are usually of much higher relevance than those with complete information. The separation principle, when applied in a general control problem, provides a chance to use known theoretical results and the corresponding numerical procedures developed for optimal stochastic control with complete information [17,18,19]. The separation principle implies the necessity to calculate the most precise state estimate given the available indirect noisy observations, which results in the optimal filtering problem. Its general solution, in turn, is characterized by either the Kushner–Stratonovich [20,21] or the Duncan–Mortensen–Zakai [22,23,24] equations, which describe the evolution of the posterior state distribution in time. The approach developed in [25,26,27] results in easily calculated state estimates only in a limited number of special cases. Some examples can be found in [28] for a controlled Markov jump process (MJP).
For the quadratic criterion, the most general model for which the separation theorem is valid was considered in [29,30,31]. The main goal of our research is to apply these results to a typical linear model of a mechanical actuator augmented by an MJP to simulate changes in the actuator operating modes or external drift. The difference is that, in the mentioned papers, there was no quadratic term involving the observed state; the criterion in [31] is more general but still does not include a term with the unobservable MJP. In that case, the obtained results provide an optimal control in the form of a linear dependency on the observed state and the optimal estimate of the unobservable component. The latter is interpreted as an uncontrollable disturbance and is not included in the criterion. In the target application of our work, the unobservable MJP represents the state of a controlled system. This state can be called an operating mode, and the purpose of control is to monitor this mode and to stabilize the system under the conditions determined by the mode switches. For the analysis of such a system, we do not need general models of the unobserved component as in [29,31] and the corresponding optimal filtering equations, which are difficult to implement in practice. For our purposes, the MJP model as in [30] and the optimal Wonham filter [32] are sufficient. Thus, the main theoretical goal of our research is to develop the result of [30] for the case of a general quadratic criterion containing a term with the unobservable MJP. The practical goal is to implement this statement in order to stabilize the mechanical actuator in conditions of changing operating modes.
The paper is organized as follows. Section 2 contains a thorough statement of the optimal control problem. The controlled observable output represents a solution to a linear stochastic differential system governed by an unobservable finite-state Markov jump process. The control appears on the right-hand side (RHS) as an additive term with a nonrandom matrix gain. The optimality criterion is a generalization of the traditional quadratic one. It allows investigating various optimal control problems (MJP state tracking in terms of an integral performance index, control assistance in systems with noisy controls, damping of statistically uncertain exogenous step-wise disturbances, etc.) within a unified approach.
On the one hand, a slight modification of the Wonham filter admits the calculation of the conditional expectation of the unobservable MJP state given the output observations. On the other hand, the investigated dynamic system can be rewritten in a form that can be treated as a controlled system with complete information. In turn, the optimality criterion can be transformed to a form where the dependency on the unobserved MJP is only through its estimate. These manipulations, performed in Section 3, legitimize the usage of the separation principle in the investigated control problem (see Proposition 1).
Generally, in order to synthesize the optimal control, one has to solve the dynamic programming equation. However, the similarity of the considered control problem to the “classic” LQG case gives a hint about the form of the Bellman function: one can observe that it is a quadratic function of the controlled state. Hence, the problem is reduced to the characterization of the corresponding coefficients. Section 4 and Section 5 contain their derivation.
Section 6 presents an illustrative example of the application of the obtained theoretical results. The object of the control is a mechanical actuator (e.g., the cart of a gantry crane). This actuator performs step-wise movements governed by an external unobservable controlling signal, which is an MJP corrupted by noise. The goal is to assist the actuator in carrying out external commands more accurately. Section 7 contains concluding remarks.

2. Problem Statement

On the canonical probability space with filtration $(\Omega, \mathcal{F}, \mathsf{P}, \{\mathcal{F}_t\})$, $t \in [0, T]$, we consider a partially observable stochastic dynamic system:
$$ dz_t = a_t y_t\,dt + b_t z_t\,dt + c_t u_t\,dt + \sigma_t\,dw_t, \qquad z_0 = Z, \tag{1} $$
$$ dy_t = \Lambda_t^T y_t\,dt + d\lambda_t, \qquad y_0 = Y, \tag{2} $$
where $z_t \in \mathbb{R}^{n_z}$ is an observable system state governed by a statistically uncertain (i.e., unobservable random) drift $y_t \in \mathbb{R}^{n_y}$. We suppose that $y_t$ is a finite-state Markov jump process (MJP) with values in the set $\{e_1, \dots, e_{n_y}\}$ formed by the unit coordinate vectors of the Euclidean space $\mathbb{R}^{n_y}$. The initial value $Y$ has a known distribution $\pi_0$, $\Lambda_t$ is a transition intensity matrix, and $\lambda_t$ is an $\mathcal{F}_t$-adapted martingale with the quadratic characteristic [33]
$$ \langle \lambda, \lambda \rangle_t = \int_0^t \left( \mathrm{diag}\left( \Lambda_s^T y_s \right) - \Lambda_s^T \mathrm{diag}(y_s) - \mathrm{diag}(y_s)\,\Lambda_s \right) ds. $$
We denote the natural filtration generated by the system state $z$ as $\mathcal{F}_t^z$ and assume that $\mathcal{F}_t^z \subseteq \mathcal{F}_t \subseteq \mathcal{F}$. Furthermore, $w_t \in \mathbb{R}^{n_w}$ is a standard vector Wiener process, and $Z \in \mathbb{R}^{n_z}$ is a random vector with finite variance; $Z$, $\{y_t\}$, and $\{w_t\}$ are mutually independent. Next, $u_t \in \mathbb{R}^{n_u}$ is an admissible control. We suppose that $u_t = u(t, z_t)$; thus, the admissible control is a feedback control: $u = u(t, z)$, $z \in \mathbb{R}^{n_z}$ [19]. We also assume that the observation noise is uniformly non-degenerate, i.e., $\sigma_t \sigma_t^T \geq \kappa I > 0$, where $\kappa > 0$ is some positive value and $I$ stands for the identity matrix of appropriate dimensionality.
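For intuition, the MJP model above can be sampled directly. Below is a minimal simulation sketch (Python/NumPy is used for all sketches in this text; the function name and packaging are illustrative, not part of the paper), based on the first-order approximation of the transition probabilities over a short step, $\mathsf{P}\{y_{t+\Delta} = e_j \mid y_t = e_i\} \approx \delta_{ij} + \Lambda_{ij}\Delta$:

```python
import numpy as np

def simulate_mjp(Lam, pi0, T, dt, rng):
    """Sample a path of a finite-state MJP y_t in unit-vector form.

    Lam : (n, n) transition intensity matrix (off-diagonal >= 0, rows sum to 0);
    pi0 : (n,) initial distribution of Y;
    returns an (N + 1, n) array of one-hot rows on the uniform time grid.
    """
    n = len(pi0)
    N = int(round(T / dt))
    states = np.zeros(N + 1, dtype=int)
    states[0] = rng.choice(n, p=pi0)
    for k in range(N):
        i = states[k]
        # first-order transition probabilities over the step: P ~ I + Lam * dt
        p = np.eye(n)[i] + Lam[i] * dt
        states[k + 1] = rng.choice(n, p=p)
    return np.eye(n)[states]
```

An exact sampler would instead draw exponential holding times with intensities $-\Lambda_{ii}$; the discretized version above suffices for step sizes of the order used in Section 6.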
The quality of the control law $U_0^T = \{u(t, z),\ 0 \le t \le T,\ z \in \mathbb{R}^{n_z}\}$ is defined by the following objective:
$$ J\left( U_0^T \right) = \mathsf{E}\left\{ \int_0^T \left\| P_t y_t + Q_t z_t + R_t u_t \right\|^2 dt + \left\| P_T y_T + Q_T z_T \right\|^2 \right\}, \tag{3} $$
where $P_t \in \mathbb{R}^{n_J \times n_y}$, $Q_t \in \mathbb{R}^{n_J \times n_z}$, and $R_t \in \mathbb{R}^{n_J \times n_u}$ are known bounded matrix functions; $\|x\|^2 = x^T x$ is the squared Euclidean norm of a vector $x$, and below we also use the norm $\|x\|^2 = \mathrm{tr}\left( x^T x \right)$ of a matrix $x$. In order to eliminate the possibility of zero penalties for individual components of the control vector $u_t$, which would make objective (3) physically incorrect, we assume that the common non-degeneracy condition $R_t^T R_t \ge \kappa I > 0$ holds.
Note that the stochastic system models (1) and (2) are the same as in [30] and are special cases of the models in [29,31]. In contrast, objective (3) includes additional terms: the objective from [29,30] is obtained at $P_t \equiv 0$, $Q_t \equiv 0$, and the one from [31] is obtained at $P_t \equiv 0$. This reflects the key terminological difference of the problem under consideration: we treat $y_t$ as an element of the observation system, an external drift, and not as a complex measurement error. Thus, we arrive at objective (3), which may find application in problems of tracking the drift by the state ($\|y_t - z_t\|^2$) or by the control ($\|z_t - u_t\|^2$) while taking the control cost $\|u_t\|^2$ into account.
The matrix functions $a_t \in \mathbb{R}^{n_z \times n_y}$, $b_t \in \mathbb{R}^{n_z \times n_z}$, $c_t \in \mathbb{R}^{n_z \times n_u}$, and $\sigma_t \in \mathbb{R}^{n_z \times n_w}$ are bounded: $\|a_t\| + \|b_t\| + \|c_t\| + \|\sigma_t\| \le C$ for all $0 \le t \le T$. Thus, the existence of a strong solution to (1) is guaranteed for any admissible control $u_t$. Moreover, we assume that all the time functions $\Lambda_t$, $a_t$, $b_t$, $c_t$, $\sigma_t$, $P_t$, $Q_t$, and $R_t$ are piecewise continuous, which ensures that the typical conditions for the existence of solutions to the ordinary differential equations (ODEs) obtained below are satisfied.
Our goal is to find an admissible control $(U^*)_0^T = \{u^*(t, z),\ 0 \le t \le T,\ z \in \mathbb{R}^{n_z}\}$, which delivers the minimum to the quadratic criterion $J(U_0^T)$:
$$ (U^*)_0^T = \arg\min_{U_0^T} J\left( U_0^T \right), $$
under the assumption that such a control exists.

3. Separation of Filtering and Control

In order to solve the problem formulated in Section 2, we need to transform it into a form that reveals two necessary properties. Note that for system (1) and (2) the optimal filtering problem solution is known: the conditional expectation $\mathsf{E}\{y_t \mid \mathcal{F}_t^z\}$ is given by the Wonham filter [33]. First, for the separation, we need to find a representation of this filter showing that the quality of the optimal filter estimate does not depend on the chosen control law. The second goal is to decompose objective (3) into independent additive terms, which separately determine the quality of control and the quality of estimation.
To that end, we propose a change of variables in (1) that eliminates the terms $b_t z_t\,dt$ and $c_t u_t\,dt$. Denote by $B_t \in \mathbb{R}^{n_z \times n_z}$ the solution to the equation $dB_t = -B_t b_t\,dt$, which is the matrix exponential $B_t = \exp\left( -\int_0^t b_s\,ds \right)$. Moreover, denote by $z_t^{(1)} \in \mathbb{R}^{n_z}$ the following linear output transform:
$$ z_t^{(1)} = B_t z_t - \int_0^t B_s c_s u_s\,ds. $$
By differentiating, we obtain the following:
$$ dz_t^{(1)} = -B_t b_t z_t\,dt + B_t\,dz_t - B_t c_t u_t\,dt = B_t \left( dz_t - b_t z_t\,dt - c_t u_t\,dt \right) = B_t a_t y_t\,dt + B_t \sigma_t\,dw_t, $$
or
$$ dz_t^{(1)} = a_t^{(1)} y_t\,dt + \sigma_t^{(1)}\,dw_t, \qquad z_0^{(1)} = Z, \tag{5} $$
with the additional notation $a_t^{(1)} = B_t a_t$ and $\sigma_t^{(1)} = B_t \sigma_t$.
Taking into account $B_t^{-1} = \exp\left( \int_0^t b_s\,ds \right)$, for what follows, note that
$$ B_t^{-1}\,dz_t^{(1)} = dz_t - b_t z_t\,dt - c_t u_t\,dt. $$
The transformation does not affect the filtering problem solution, i.e., $\mathsf{E}\{y_t \mid \mathcal{F}_t^z\} = \mathsf{E}\{y_t \mid \mathcal{F}_t^{z^{(1)}}\}$, because the change of variables used to obtain $z_t^{(1)}$ is a linear non-degenerate transformation of $z_t$. Hence, the optimal filtering estimate does not depend on $u_t$, and we can use $z_t^{(1)}$ as the observations and obtain the same drift estimate no matter which admissible control law is chosen. Consequently, in place of the estimation problem given the control-dependent observations $z_t$, we pass to an equivalent problem of estimating $y_t$ given the transformed observations $z_t^{(1)}$, which are described by Equation (5) and do not depend on $u_t$.
Denoting the optimal estimate $\hat{y}_t = \mathsf{E}\{y_t \mid \mathcal{F}_t^{z^{(1)}}\}$, we write the equation for it. The uniform non-degeneracy of $\sigma_t^{(1)} (\sigma_t^{(1)})^T$, together with the non-degeneracy of the matrix exponential $B_t$ [34], legitimizes the use of the Wonham filter [33]:
$$ d\hat{y}_t = \Lambda_t^T \hat{y}_t\,dt + \left( \mathrm{diag}(\hat{y}_t) - \hat{y}_t \hat{y}_t^T \right) \left( a_t^{(1)} \right)^T \left( \sigma_t^{(1)} \left( \sigma_t^{(1)} \right)^T \right)^{-1} \left( dz_t^{(1)} - a_t^{(1)} \hat{y}_t\,dt \right), \qquad \hat{y}_0 = \mathsf{E}\{Y\}. $$
Replacing the variables $a_t^{(1)}$, $\sigma_t^{(1)}$, and $z_t^{(1)}$, we obtain the following:
$$ \begin{aligned} d\hat{y}_t &= \Lambda_t^T \hat{y}_t\,dt + \left( \mathrm{diag}(\hat{y}_t) - \hat{y}_t \hat{y}_t^T \right) a_t^T B_t^T \left( B_t \sigma_t \sigma_t^T B_t^T \right)^{-1/2} \left( B_t \sigma_t \sigma_t^T B_t^T \right)^{-1/2} B_t \left( B_t^{-1}\,dz_t^{(1)} - a_t \hat{y}_t\,dt \right) \\ &= \Lambda_t^T \hat{y}_t\,dt + \left( \mathrm{diag}(\hat{y}_t) - \hat{y}_t \hat{y}_t^T \right) a_t^T B_t^T \left( B_t \sigma_t \sigma_t^T B_t^T \right)^{-1} B_t \left( dz_t - b_t z_t\,dt - c_t u_t\,dt - a_t \hat{y}_t\,dt \right). \end{aligned} $$
Taking into account $B_t^T \left( B_t \sigma_t \sigma_t^T B_t^T \right)^{-1} B_t = \left( \sigma_t \sigma_t^T \right)^{-1}$, we arrive at the final representation of the Wonham filter:
$$ d\hat{y}_t = \Lambda_t^T \hat{y}_t\,dt + \left( \mathrm{diag}(\hat{y}_t) - \hat{y}_t \hat{y}_t^T \right) a_t^T \left( \sigma_t \sigma_t^T \right)^{-1/2} \left( \sigma_t \sigma_t^T \right)^{-1/2} \left( dz_t - b_t z_t\,dt - c_t u_t\,dt - a_t \hat{y}_t\,dt \right), \qquad \hat{y}_0 = \mathsf{E}\{Y\}, $$
where
$$ dW_t = \left( \sigma_t \sigma_t^T \right)^{-1/2} \left( dz_t - b_t z_t\,dt - c_t u_t\,dt - a_t \hat{y}_t\,dt \right) $$
is the differential of an $\mathcal{F}_t^z$-adapted standard vector Wiener process. This provides the final observation equation:
$$ dz_t = a_t \hat{y}_t\,dt + b_t z_t\,dt + c_t u_t\,dt + \hat{\sigma}_t\,dW_t, $$
where $\hat{\sigma}_t = \left( \sigma_t \sigma_t^T \right)^{1/2}$, and the final new state equation is as follows:
$$ d\hat{y}_t = \Lambda_t^T \hat{y}_t\,dt + \left( \mathrm{diag}(\hat{y}_t) - \hat{y}_t \hat{y}_t^T \right) a_t^T \left( \sigma_t \sigma_t^T \right)^{-1/2}\,dW_t. $$
Furthermore, by denoting
$$ \Sigma_t = \Sigma_t(y) = \left( \mathrm{diag}(y) - y y^T \right) a_t^T \left( \sigma_t \sigma_t^T \right)^{-1/2}, \qquad y \in \mathbb{R}^{n_y}, $$
we obtain a control system with complete information:
$$ \begin{cases} dz_t = a_t \hat{y}_t\,dt + b_t z_t\,dt + c_t u_t\,dt + \hat{\sigma}_t\,dW_t, & z_0 = Z, \\ d\hat{y}_t = \Lambda_t^T \hat{y}_t\,dt + \Sigma_t(\hat{y}_t)\,dW_t, & \hat{y}_0 = \mathsf{E}\{Y\}, \end{cases} \tag{9} $$
where both variables $\hat{y}_t$ and $z_t$ are observable.
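For illustration, a single Euler discretization step of the filter in this innovation form could be sketched as follows (the names are illustrative, $\sigma_t \sigma_t^T$ is assumed invertible as above, and the final simplex projection is a purely numerical safeguard, not part of the filter):

```python
import numpy as np

def wonham_step(y_hat, dz, z, u, a, b, c, sigma, Lam, dt):
    """One Euler step of the Wonham filter above, driven by the output increment dz."""
    S_inv = np.linalg.inv(sigma @ sigma.T)           # (sigma_t sigma_t^T)^{-1}
    innov = dz - (b @ z + c @ u + a @ y_hat) * dt    # dz - b z dt - c u dt - a y_hat dt
    gain = (np.diag(y_hat) - np.outer(y_hat, y_hat)) @ a.T @ S_inv
    y_hat = y_hat + (Lam.T @ y_hat) * dt + gain @ innov
    y_hat = np.clip(y_hat, 0.0, None)                # keep the estimate a distribution
    return y_hat / y_hat.sum()
```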
For the final formulation of the equivalent control problem, one needs to perform the same change of variables in objective (3). This can be done by using the law of total expectation [35] as follows:
$$ \begin{aligned} J\left( U_0^T \right) &= \mathsf{E}\bigg\{ \int_0^T \left\| P_t \left( y_t - \hat{y}_t + \hat{y}_t \right) + Q_t z_t + R_t u_t \right\|^2 dt + \left\| P_T \left( y_T - \hat{y}_T + \hat{y}_T \right) + Q_T z_T \right\|^2 \bigg\} \\ &= \mathsf{E}\bigg\{ \int_0^T \left\| P_t \hat{y}_t + Q_t z_t + R_t u_t \right\|^2 dt + \left\| P_T \hat{y}_T + Q_T z_T \right\|^2 + \int_0^T \left\| P_t \left( y_t - \hat{y}_t \right) \right\|^2 dt + \left\| P_T \left( y_T - \hat{y}_T \right) \right\|^2 \bigg\}. \end{aligned} $$
Since the last two additive terms in the expression for $J(U_0^T)$ do not depend on $U_0^T$ but merely characterize the quality of the optimal filter estimate $\hat{y}_t$ (the cross terms vanish because the estimation error $y_t - \hat{y}_t$ is conditionally centered given $\mathcal{F}_t^z$), they can be excluded from the criterion. Hence, the new objective takes the following form:
$$ J\left( U_0^T \right) = \mathsf{E}\left\{ \int_0^T \left\| P_t \hat{y}_t + Q_t z_t + R_t u_t \right\|^2 dt + \left\| P_T \hat{y}_T + Q_T z_T \right\|^2 \right\}. \tag{10} $$
The final statement is formulated as follows.
Proposition 1
(Separation theorem). The optimal ($\mathcal{F}_t^{\hat{y},z}$-adapted) feedback control for the system with complete information (9) and objective (10) is, at the same time, the optimal ($\mathcal{F}_t^z$-adapted) solution of the optimal control problem for the original system (1) and (2) with objective (3).
Note that the control $(U^*)_0^T$ minimizes both functionals (10) and (3), and the difference
$$ \mathsf{E}\left\{ \int_0^T \left\| P_t \left( y_t - \hat{y}_t \right) \right\|^2 dt + \left\| P_T \left( y_T - \hat{y}_T \right) \right\|^2 \right\} $$
between $J(U_0^T)$ from (10) and $J(U_0^T)$ from (3) does not depend on $U_0^T$; thus, there is no need for a separate notation for the two objectives.

4. Solution to Control Problem

In order to find the control for system (9) that is optimal with respect to criterion (3), we use the classical dynamic programming method [18,19].
Denote the Bellman function:
$$ V_t = V(t, y, z) = \inf_{U_t^T} \mathsf{E}\left\{ \int_t^T \left\| P_s \hat{y}_s + Q_s z_s + R_s u_s \right\|^2 ds + \left\| P_T \hat{y}_T + Q_T z_T \right\|^2 \right\}, $$
where $U_t^T = \{u(s, y, z),\ t \le s \le T,\ y \in \mathbb{R}^{n_y},\ z \in \mathbb{R}^{n_z}\}$ and the processes in (9) start from $\hat{y}_t = y$, $z_t = z$. Then, with $\Sigma_t = \Sigma_t(y)$, the dynamic programming equation is as follows:
$$ \begin{aligned} &\frac{\partial V_t}{\partial t} + \frac{1}{2}\,\mathrm{tr}\left[ \Sigma_t^T \frac{\partial^2 V_t}{\partial y^2} \Sigma_t + \hat{\sigma}_t^T \frac{\partial^2 V_t}{\partial z^2} \hat{\sigma}_t + 2 \Sigma_t^T \frac{\partial^2 V_t}{\partial y\,\partial z} \hat{\sigma}_t \right] \\ &\quad + \min_u \left[ y^T \Lambda_t \frac{\partial V_t}{\partial y} + \left( a_t y + b_t z + c_t u \right)^T \frac{\partial V_t}{\partial z} + \left\| P_t y + Q_t z + R_t u \right\|^2 \right] = 0, \qquad V_T = \left\| P_T y + Q_T z \right\|^2, \end{aligned} \tag{11} $$
where the admissible feedback control $u = u(t, y, z)$ corresponds to the designations of system (9); thus, $u_t = u(t, \hat{y}_t, z_t)$. Note that this is the same control that appeared in the original statement, since $\hat{y}_t = \hat{y}_t(z)$ is a functional of the observations.
The existence of a solution to (11) is a sufficient optimality condition. Moreover, the optimal control is the one that minimizes the corresponding additive term. Hence, under the condition $R_t^T R_t \ge \kappa I > 0$ and under the assumption that a solution to (11) exists, this minimum is delivered by the following control:
$$ u^* = u^*(t, y, z) = -\frac{1}{2} \left( R_t^T R_t \right)^{-1} \left( c_t^T \frac{\partial V_t}{\partial z} + 2 R_t^T \left( P_t y + Q_t z \right) \right). \tag{12} $$
By substituting $u_t^*$ into (11) and regrouping the additive terms, we obtain the following:
$$ \begin{aligned} &\frac{\partial V_t}{\partial t} + \frac{1}{2}\,\mathrm{tr}\left[ \Sigma_t^T \frac{\partial^2 V_t}{\partial y^2} \Sigma_t + \hat{\sigma}_t^T \frac{\partial^2 V_t}{\partial z^2} \hat{\sigma}_t + 2 \Sigma_t^T \frac{\partial^2 V_t}{\partial y\,\partial z} \hat{\sigma}_t \right] + y^T \Lambda_t \frac{\partial V_t}{\partial y} \\ &\quad + \left[ y^T a_t^T + z^T b_t^T - \left( P_t y + Q_t z \right)^T R_t \left( R_t^T R_t \right)^{-1} c_t^T \right] \frac{\partial V_t}{\partial z} \\ &\quad + \left( P_t y + Q_t z \right)^T \left[ I - R_t \left( R_t^T R_t \right)^{-1} R_t^T \right] \left( P_t y + Q_t z \right) - \frac{1}{4} \left( \frac{\partial V_t}{\partial z} \right)^T c_t \left( R_t^T R_t \right)^{-1} c_t^T \frac{\partial V_t}{\partial z} = 0. \end{aligned} \tag{13} $$
The linear dynamics (9) and the quadratic objective (10) suggest that a solution of the dynamic programming Equation (13) might be sought in the following form:
$$ V_t = V(t, y, z) = z^T \alpha_t z + z^T \beta_t(y) + \gamma_t(y), \tag{14} $$
which reduces the optimal solution search to the problem of finding a symmetric matrix function $\alpha_t$, a vector function $\beta_t(y)$, and a scalar function $\gamma_t(y)$. Moreover, the explicit representation of $\gamma_t(y)$ is not necessary for the optimal control: one only needs the derivative $\frac{\partial V_t}{\partial z} = 2 \alpha_t z + \beta_t(y)$; nevertheless, $\gamma_t(y)$ will be useful in determining the quality of control.
The Bellman function representation (14) can be simplified further by using the fact that the term with the derivative $\frac{\partial V_t}{\partial y}$ in (13) contains only the multiplier $y^T \Lambda_t$. This suggests that $\beta_t(y)$ is an affine transform of $y$:
$$ \beta_t(y) = B_t y, \tag{15} $$
where the matrix $B_t \in \mathbb{R}^{n_z \times n_y}$. The boundary condition then takes the following form:
$$ V_T = \left\| P_T y + Q_T z \right\|^2 = z^T Q_T^T Q_T z + 2 z^T Q_T^T P_T y + y^T P_T^T P_T y; $$
in other words, we have the following:
$$ \alpha_T = Q_T^T Q_T, \qquad B_T = 2 Q_T^T P_T, \qquad \gamma_T(y) = y^T P_T^T P_T y, \tag{16} $$
and the optimal control (12) can finally be represented as
$$ u^* = u^*(t, y, z) = -\frac{1}{2} \left( R_t^T R_t \right)^{-1} \left( c_t^T \left( 2 \alpha_t z + B_t y \right) + 2 R_t^T \left( P_t y + Q_t z \right) \right). \tag{17} $$
Thus, $u_t^*$ contains two terms: the first is linear in the observations (output variable) $z$; the second is linear in the state (the state estimate given by the Wonham filter) $y$. In order to justify the assumptions made, it is necessary to derive $\alpha_t$, $B_t$, and $\gamma_t(y)$ such that, under the assumptions (14) and (15) on the Bellman function $V_t$ and the affinity of $\beta_t(y)$, the function $V_t$ from (14) is indeed a solution of (13). To that end, we substitute (14) into Equation (13) and take into account that the derivatives of $\beta_t(y)$ under (15) are $\frac{\partial^2 \beta_t}{\partial y^2} = 0$ and $\frac{\partial \beta_t}{\partial y} = B_t^T$, and also that
$$ y^T \Lambda_t \frac{\partial V_t}{\partial y} = \left( \frac{\partial V_t}{\partial y} \right)^T \Lambda_t^T y = z^T B_t \Lambda_t^T y + y^T \Lambda_t \frac{\partial \gamma_t}{\partial y}, \qquad \frac{\partial^2 V_t}{\partial y\,\partial z} = B_t^T. $$
After this substitution and minor transformations, grouping the terms that are quadratic in $z$, linear in $z$, and free of $z$ (at $z^0$), we obtain equations for $\alpha_t$, $\beta_t(y) = B_t y$, and $\gamma_t = \gamma_t(y)$, respectively:
$$ \frac{d\alpha_t}{dt} + M_t^{\alpha} \alpha_t + \alpha_t \left( M_t^{\alpha} \right)^T + N_t^{\alpha} - \alpha_t c_t \left( R_t^T R_t \right)^{-1} c_t^T \alpha_t = 0, \tag{18} $$
$$ \frac{dB_t}{dt}\,y + B_t \Lambda_t^T y + M_t^{\beta} y + N_t^{\beta} B_t y = 0, \tag{19} $$
$$ \frac{\partial \gamma_t}{\partial t} + \frac{1}{2}\,\mathrm{tr}\left[ \Sigma_t^T \frac{\partial^2 \gamma_t}{\partial y^2} \Sigma_t \right] + \mathrm{tr}\left[ \hat{\sigma}_t^T \alpha_t \hat{\sigma}_t + \Sigma_t^T B_t^T \hat{\sigma}_t \right] + y^T \Lambda_t \frac{\partial \gamma_t}{\partial y} + M_t^{\gamma}(y) = 0, \tag{20} $$
where
$$ \begin{aligned} M_t^{\alpha} &= b_t^T - Q_t^T R_t \left( R_t^T R_t \right)^{-1} c_t^T, \\ N_t^{\alpha} &= Q_t^T \left[ I - R_t \left( R_t^T R_t \right)^{-1} R_t^T \right] Q_t, \\ M_t^{\beta} &= 2 \alpha_t \left[ a_t - c_t \left( R_t^T R_t \right)^{-1} R_t^T P_t \right] + 2 Q_t^T \left[ I - R_t \left( R_t^T R_t \right)^{-1} R_t^T \right] P_t, \\ N_t^{\beta} &= b_t^T - Q_t^T R_t \left( R_t^T R_t \right)^{-1} c_t^T - \alpha_t c_t \left( R_t^T R_t \right)^{-1} c_t^T, \\ M_t^{\gamma}(y) &= y^T B_t^T \left[ a_t - c_t \left( R_t^T R_t \right)^{-1} R_t^T P_t \right] y + y^T P_t^T \left[ I - R_t \left( R_t^T R_t \right)^{-1} R_t^T \right] P_t y - \frac{1}{4}\,y^T B_t^T c_t \left( R_t^T R_t \right)^{-1} c_t^T B_t y. \end{aligned} $$
Finally, from (19), we obtain the equation for $B_t$:
$$ \frac{dB_t}{dt} + B_t \Lambda_t^T + M_t^{\beta} + N_t^{\beta} B_t = 0. \tag{21} $$
The obtained equations are solved backward in time with the boundary conditions (16). Equation (18) is a matrix Riccati equation for the square symmetric matrix $\alpha_t$. The above assumptions on the piecewise continuity of the coefficients of this equation and the condition $R_t^T R_t > 0$ are sufficient for the existence and uniqueness of a positive semidefinite solution for every $0 \le t \le T$. Indeed, such an equation occurs in the classical linear-quadratic problem, which admits a unique optimal feedback control, linear with respect to the output $z_t$, with the gain described by this Riccati equation [36].
Equation (21) is a system of linear ODEs with respect to $B_t$ with piecewise continuous coefficients; hence, the existence and uniqueness of its solution is ensured by the standard theorem for linear differential equations [37].
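As a sketch of the numerical procedure, the backward integration of (18) and (21) from the terminal conditions (16) and the evaluation of the feedback (17) might look as follows (a simple explicit Euler scheme is used here for brevity, while Section 6 reports an implicit one; the coefficient packaging `coeffs(t)` is an illustrative assumption, not the authors' implementation):

```python
import numpy as np

def solve_alpha_B(tgrid, coeffs, PT, QT):
    """Explicit Euler march, backward in time, for the Riccati equation (18)
    and the linear ODE (21).

    coeffs(t) -> (a, b, c, P, Q, R, Lam) at time t (illustrative packaging);
    returns {t: (alpha_t, B_t)} on the grid, starting from conditions (16)."""
    dt = tgrid[1] - tgrid[0]
    alpha, B = QT.T @ QT, 2.0 * QT.T @ PT                # alpha_T and B_T from (16)
    out = {float(tgrid[-1]): (alpha, B)}
    for t in tgrid[-2::-1]:                              # march backward in time
        a, b, c, P, Q, R, Lam = coeffs(t)
        Rinv = np.linalg.inv(R.T @ R)
        Pi = R @ Rinv @ R.T                              # projector onto range(R)
        I = np.eye(Pi.shape[0])
        M_a = b.T - Q.T @ R @ Rinv @ c.T                 # M_t^alpha
        N_a = Q.T @ (I - Pi) @ Q                         # N_t^alpha
        M_b = 2 * alpha @ (a - c @ Rinv @ R.T @ P) + 2 * Q.T @ (I - Pi) @ P
        N_b = M_a - alpha @ c @ Rinv @ c.T               # N_t^beta
        # (18) and (21) give d(.)/dt = -(RHS); a backward step therefore adds dt*(RHS)
        alpha = alpha + dt * (M_a @ alpha + alpha @ M_a.T + N_a
                              - alpha @ c @ Rinv @ c.T @ alpha)
        B = B + dt * (B @ Lam.T + M_b + N_b @ B)
        out[float(t)] = (alpha, B)
    return out

def u_star(alpha, B, y, z, c, P, Q, R):
    """Optimal feedback (17) for given coefficients and the current pair (y, z)."""
    Rinv = np.linalg.inv(R.T @ R)
    return -0.5 * Rinv @ (c.T @ (2 * alpha @ z + B @ y) + 2 * R.T @ (P @ y + Q @ z))
```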
Equation (20) for $\gamma_t(y)$ is a linear partial differential equation of parabolic type; unlike the equation for $\beta_t(y)$, it cannot be simplified. We assume that this equation has a solution, and we interpret this assumption as a sufficient condition, in the same way as for the dynamic programming Equation (11).
The following statement summarizes the reasoning in the section.
Proposition 2
(Output control under complete information). If a solution to the dynamic programming Equation (11) exists, then it admits the representation (14), (15) with the coefficients $\alpha_t$, $B_t$, and $\gamma_t(y)$ given by (18), (21), and (20), respectively, and the optimal control $u^*$ is given by (17).

5. Stochastic Representation of Gamma Functional

In order to implement control (12), one needs the functions $\alpha_t$ and $B_t$. These can be calculated by means of any effective numerical ODE method; hence, the practical solution of Equations (18) and (21) presents no difficulties. At the same time, the quality of the optimal control is given by a solution of the dynamic programming equation:
$$ J\left( (U^*)_0^T \right) = \mathsf{E}\left\{ V(0, Y, Z) \right\} = \mathsf{E}\left\{ Z^T \alpha_0 Z + Z^T B_0 Y + \gamma_0(Y) \right\}. $$
Therefore, the functional $\gamma_t(y)$ is also required. An approximate solution to (20) can be obtained numerically by traditional grid methods, especially since these methods are well studied for equations of parabolic type. Nevertheless, the growth of the dimensionality of $Y$ may cause certain difficulties. The main issue, however, is that Equation (20) is supplied only with the condition $\gamma_T(y)$ defined by (16), i.e., a terminal condition, and there is no boundary condition of the type $\gamma_t(y_{\max})$, which is necessary for numerical treatment.
That is why, in our opinion, an alternative approach based on the known relation between solutions of partial differential equations of parabolic type and stochastic differential equations is more promising. An example of such a relation is the Kolmogorov equation [38], which connects the solution of the Cauchy problem with a terminal condition for a parabolic equation with the solution of a related stochastic equation. Here, we need an even simpler integral representation of the solution of a parabolic equation, known as the Feynman–Kac formula [39]. The application of such tools to the numerical solution of parabolic equations has been well studied in a more general setting than we need [40].
For differential Equation (20), the Feynman–Kac formula takes the following form:
$$ \gamma_t(y) = \mathsf{E}\left\{ \hat{y}_T^T P_T^T P_T \hat{y}_T + \int_t^T M_\tau^{\gamma}(\hat{y}_\tau)\,d\tau \;\Big|\; \hat{y}_t = y \right\}. \tag{22} $$
Here, we deliberately use the same notation for two different random processes:
  • the random process that generates the terminal condition under the expectation and passes through the point $(t, y)$: $\hat{y}_t = y$;
  • the Wonham filter estimate, which substitutes for the state in observation Equation (9).
This emphasizes the fact that both processes are described by the same equation and have the same probabilistic characteristics.
For practical applications of (22), we rewrite this relation in differential form by adding an auxiliary variable $\hat{y}_t^{\gamma}$:
$$ \begin{cases} d\hat{y}_\tau = \Lambda_\tau^T \hat{y}_\tau\,d\tau + \Sigma_\tau(\hat{y}_\tau)\,dW_\tau, & \hat{y}_t = y, \quad t \le \tau \le T, \\ d\hat{y}_\tau^{\gamma} = M_\tau^{\gamma}(\hat{y}_\tau)\,d\tau, & \hat{y}_t^{\gamma} = 0, \end{cases} \qquad \gamma_t(y) = \mathsf{E}\left\{ \hat{y}_T^T P_T^T P_T \hat{y}_T + \hat{y}_T^{\gamma} \right\}. \tag{23} $$
First, (23) is a stochastic representation of $\gamma_t(y)$, which explains the meaning of this coefficient. Second, (23) provides a basis for its approximate calculation. To be more precise, the proposed numerical method consists in applying the Monte Carlo approach to system (23): one simulates a series of sample solutions of this system for the initial condition $(t, y)$ and obtains an approximate value of $\gamma_t(y)$ as the statistical estimate of the expectation of the terminal value in (23).
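A sketch of this Monte Carlo estimator is given below (it assumes $\sigma_s \sigma_s^T$ is invertible; the callback `coeffs(s)` returning the model data, and the plain per-path loop, are illustrative choices rather than the authors' code):

```python
import numpy as np
from scipy.linalg import sqrtm

def gamma_mc(t_idx, y, tgrid, coeffs, PT, n_paths=10_000, rng=None):
    """Monte Carlo estimate of gamma_t(y) via the stochastic representation (23).

    coeffs(s) -> (a, sigma, Lam, M_gamma) at time s, where M_gamma(s, y) is the
    scalar M_s^gamma(y) from (20); this packaging is illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    dt = tgrid[1] - tgrid[0]
    Y = np.tile(np.asarray(y, dtype=float), (n_paths, 1))  # y_hat paths started at y
    G = np.zeros(n_paths)                                  # auxiliary variable y_hat^gamma
    for s in tgrid[t_idx:-1]:
        a, sigma, Lam, M_gamma = coeffs(s)
        S_root_inv = np.linalg.inv(np.real(sqrtm(sigma @ sigma.T)))
        dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, S_root_inv.shape[0]))
        for i in range(n_paths):                           # plain loop for clarity
            G[i] += M_gamma(s, Y[i]) * dt
            Sig = (np.diag(Y[i]) - np.outer(Y[i], Y[i])) @ a.T @ S_root_inv
            Y[i] = Y[i] + (Lam.T @ Y[i]) * dt + Sig @ dW[i]
            Y[i] = np.clip(Y[i], 0.0, None)
            Y[i] /= Y[i].sum()                             # numerical safeguard
    terminal = np.sum((Y @ PT.T) ** 2, axis=1)             # y_hat_T^T P_T^T P_T y_hat_T
    return float(np.mean(terminal + G))
```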

6. Assistance of Mechanical Actuator Control

We investigate a controllable mechanical actuator designed for step-wise movement. Its model is described by the following dynamic system:
$$ d\begin{pmatrix} x_t \\ v_t \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ a & b \end{pmatrix} \begin{pmatrix} x_t \\ v_t \end{pmatrix} dt + \begin{pmatrix} 0 & 0 & 0 \\ c_1 & c_2 & c_3 \end{pmatrix} y_t\,dt + \begin{pmatrix} 0 \\ h \end{pmatrix} u_t\,dt + \begin{pmatrix} 0 \\ g \end{pmatrix} dw_t, \qquad t \in (0, T], \tag{24} $$
$$ \begin{pmatrix} x_0 \\ v_0 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_v^2 \end{pmatrix} \right). $$
Here, we have the following:
  • $x_t \in \mathbb{R}$ is the observable actuator position;
  • $v_t \in \mathbb{R}$ is the observable actuator velocity;
  • $y_t \in \{e_1, e_2, e_3\}$ is an unobservable external control action, supposed to be an MJP with a known transition intensity matrix $\Lambda$ and initial distribution $\pi_0$;
  • $w_t \in \mathbb{R}$ is the external control noise, supposed to be a standard Wiener process;
  • $u_t \in \mathbb{R}$ is an admissible assisting control.
Thus, in the original notation, we have a three-dimensional external unobservable control $y_t$ corrupted by the one-dimensional Wiener noise $w_t$, a two-dimensional observable state vector $z_t = \mathrm{col}(x_t, v_t)$, and a one-dimensional control $u_t$. The other notations are as follows:
$$ a_t \equiv \begin{pmatrix} 0 & 0 & 0 \\ c_1 & c_2 & c_3 \end{pmatrix}, \qquad b_t \equiv \begin{pmatrix} 0 & 1 \\ a & b \end{pmatrix}, \qquad c_t \equiv \begin{pmatrix} 0 \\ h \end{pmatrix}, \qquad \sigma_t \equiv \begin{pmatrix} 0 \\ g \end{pmatrix}. $$
It is easy to verify that, in the case of a noiseless (i.e., $g = 0$) constant external control action $y_t \equiv e_j$, the actuator coordinate $x_t$ tends to its steady-state value $C_j = -c_j/a$ as $t \to \infty$. Without loss of generality, we suppose that all components of the vector $C = \mathrm{row}(-c_1/a, -c_2/a, -c_3/a)$ are different.
In the case of a piecewise constant control action $y_t$ and constant $g > 0$, the mechanical actuator cannot implement the expected “ideal” step-wise trajectory $C y_t$ due to the inertia and the noise accompanying the external control. Thus, the aim of the assisting control $u_t$ is to improve the actuator performance, and the corresponding criterion has the following form:
$$ J\left( U_0^T \right) = \mathsf{E}\left\{ \int_0^T \left[ \left( C y_t - x_t \right)^2 + R u_t^2 \right] dt + \frac{1}{T} \left( C y_T - x_T \right)^2 \right\} \longrightarrow \min_{U_0^T}, $$
where $R > 0$ is the dimensionless unit cost of the assisting control.
We perform the calculations for the following parameter values:
$$ \Lambda = \begin{pmatrix} -0.5 & 0.5 & 0 \\ 0.5 & -1 & 0.5 \\ 0 & 0.5 & -0.5 \end{pmatrix}, \qquad \pi_0 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, $$
$$ a = -1, \quad b = -0.5, \quad T = 10, \quad h = 10, \quad g = 0.01, \quad \sigma_x = 1, \quad \sigma_v = 1, \quad R = 0.01, $$
$$ \begin{pmatrix} c_1 & c_2 & c_3 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 1 \end{pmatrix}, \qquad C = \begin{pmatrix} C_1 & C_2 & C_3 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 1 \end{pmatrix}. $$
Stochastic system (24) is solved by the Euler–Maruyama numerical scheme with step $0.01$; meanwhile, Equations (18) and (21) are solved using the implicit Euler scheme with the same step.
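For reference, a single simulated trajectory of (24) coupled with the Wonham filter could be produced as in the sketch below. The filter is driven by the velocity observation only, in line with the remark below on the violated condition $\sigma_t \sigma_t^T > 0$; the parameter signs follow the stable case discussed next, and `simulate_mjp` from the sketch in Section 2 is assumed:

```python
import numpy as np

# Example parameters (the stable case: a, b < 0)
a_p, b_p, h, g, T, dt = -1.0, -0.5, 10.0, 0.01, 10.0, 0.01
c_vec = np.array([-1.0, 0.0, 1.0])                  # (c_1, c_2, c_3)
Lam = np.array([[-0.5, 0.5, 0.0],
                [0.5, -1.0, 0.5],
                [0.0, 0.5, -0.5]])
rng = np.random.default_rng(0)

def run(control, y_path):
    """Euler-Maruyama trajectory of (24) with the Wonham filter driven by v_t only
    (the coordinate x_t is a plain integral of v_t, so no noise enters its equation)."""
    N = int(round(T / dt))
    x, v = rng.normal(0.0, 1.0), rng.normal(0.0, 1.0)    # sigma_x = sigma_v = 1
    y_hat = np.array([1.0, 0.0, 0.0])                    # pi_0
    traj = [(x, v, y_hat.copy())]
    for k in range(N):
        u = control(k * dt, y_hat, np.array([x, v]))
        dv = (a_p * x + b_p * v + c_vec @ y_path[k] + h * u) * dt \
             + g * rng.normal(scale=np.sqrt(dt))
        # Wonham filter step: the innovation is the surprise in the velocity increment
        innov = dv - (a_p * x + b_p * v + c_vec @ y_hat + h * u) * dt
        gain = (np.diag(y_hat) - np.outer(y_hat, y_hat)) @ c_vec / g**2
        y_hat = y_hat + (Lam.T @ y_hat) * dt + gain * innov
        y_hat = np.clip(y_hat, 0.0, None)
        y_hat = y_hat / y_hat.sum()
        x, v = x + v * dt, v + dv
        traj.append((x, v, y_hat.copy()))
    return traj
```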
Note that the violation of the condition $\sigma_t \sigma_t^T > 0$ in (24) does not prevent the use of the obtained results, since the coordinate $x_t$ in the state $z_t$ is a simple integral of the velocity $v_t$; thus, in the filtering problem, one can use only the observation $v_t$.
It is easy to verify that the actuator is stable if $b < 0$ and $b^2 + 4a < 0$. In the example, we investigate only the stable case.
Figure 1, Figure 2 and Figure 3 present the assisting control coefficients. Figure 1 contains plots of the evolution in time of the coefficients $\alpha_t^{(11)}$, $\alpha_t^{(12)}$, and $\alpha_t^{(22)}$; Figure 2 shows the coefficients $B_t^{(11)}$, $B_t^{(12)}$, and $B_t^{(13)}$; Figure 3 shows the coefficients $B_t^{(21)}$, $B_t^{(22)}$, and $B_t^{(23)}$. Here, $x^{(ij)}$ denotes the element of a matrix $x$ with the indices $(i, j)$.
Figure 4 presents a comparison of the unobserved external control term $c\,y_t$ and its estimate $c\,\hat{y}_t$ calculated by the Wonham filter.
Figure 5 contains plots of the actuator coordinate governed by various types of assisting control:
  • $x_t^{(0)}$, obtained without assisting control, i.e., for $u_t = u_t^{(0)} \equiv 0$;
  • $x_t^{me}$, obtained under the trivial assisting control, i.e., for $u_t = u_t^{me} \equiv u^*(t, y_t^{me}, z_t^{me})$, where the control law $u^*(t, y, z)$ is defined in (17), $y_t^{me} = \mathsf{E}\{y_t\}$, and $z_t^{me} = \mathsf{E}\{z_t\}$;
  • the optimal trajectory $x_t^*$, calculated for the optimal control $u_t = u_t^*$;
  • the “ideal” trajectory $x_t^{**}$, calculated for the control $u_t = u_t^{**} \equiv u^*(t, y_t, z_t^{**})$, i.e., the optimal control law $u^*$ defined in (17) with full information: $y = y_t$, $z = z_t^{**}$.
A comparison with the step-wise “target” trajectory $C y_t$ is also provided.
Figure 6 contains plots of the actuator velocity governed by various types of assisting control:
  • $v_t^{(0)}$, obtained under control $u_t^{(0)}$;
  • $v_t^{me}$, obtained under control $u_t^{me}$;
  • the optimal trajectory $v_t^*$, calculated for the optimal control $u_t^*$;
  • the “ideal” trajectory $v_t^{**}$, calculated for the control $u_t^{**}$.
Finally, Figure 7 contains the following:
  • the optimal control $u_t^*$;
  • the “ideal” control $u_t^{**}$.
We also calculate the objective values for the various control types: $J((U^*)_0^T) = 1.71$, while $J((U^{(0)})_0^T) = 8.65$ and $J((U^{me})_0^T) = 7.83$; i.e., the optimal assistance improves the actuator performance significantly. Additionally, we calculate the performance index for the “ideal” control, $J((U^{**})_0^T) = 1.29$; i.e., the full-information control does not demonstrate a dramatic superiority over the optimal control $U^*$.

7. Conclusions

The principal results of the paper are the derived explicit form (17) of the optimal control in the problem at hand and the fact that this optimal solution is indeed a feedback control, in which the key role is played by the solution of the auxiliary optimal filtering problem, i.e., the conditional expectation $\hat{y}_t = \mathsf{E}\{y_t \mid \mathcal{F}_t^z\}$.
This provided grounds to formulate the main result, the separation theorem, generalizing the results of [29,30,31] to the general criterion (3). The fundamental difference between the classical LQG setting and the alternatives proposed in the referenced papers and in the present research is that the transformed problem is not similar to the original one: the martingale representation of the MJP (2) is replaced by a completely different object, the stochastic Ito equation with a Wiener process (9). Thus, the key to the solution turned out to be a remarkable property of the Wonham filter, which presents the MJP estimate given the indirect noisy observations in the form of a classical Ito equation with a Wiener process.
The obtained results admit the following interpretations and extensions. First, in the initial partially observable system, the process $y_t$ can be treated as an unobservable system state, while the process $z_t$ plays the role of the controlled observable output. This point of view can be useful for various applications. Second, the obtained results remain valid if one complements the initial optimality criterion (3) with the additional terms $\|y_t - z_t\|^2$ for optimal output tracking and $\|u_t\|^2$ and $\|z_t\|^2$ for optimal output/cost control. Third, the MJP $y_t$ can be replaced by any random process with a finite second moment described by some nonlinear stochastic differential equation. In this case, the separation theorem is still valid, but the coefficient $\beta$ in the optimal control is defined by some partial differential equation, and the optimal filtering estimate $\hat{y}_t$ is described via the solution of the Kushner–Stratonovich or Duncan–Mortensen–Zakai equation. Fourth, if $y_t$ is an MJP, the diffusion coefficient $\sigma_t$ in (1) can be replaced by a function $\sigma(t, y_t)$. In this manner, we extend the class of dynamics to systems with both statistically uncertain piecewise constant drift and diffusion. All the separation theorems and optimal control formulae remain valid; however, the solution of the optimal filtering problem is expressed not by the original Wonham filter but via its generalization [41,42].

Author Contributions

Conceptualization, A.B. (Andrey Borisov); methodology, A.B. (Alexey Bosov); software, G.M.; validation, A.B. (Andrey Borisov); formal analysis and investigation, A.B. (Alexey Bosov); writing—original draft preparation, A.B. (Alexey Bosov), G.M.; writing—review and editing, A.B. (Alexey Bosov); visualization, G.M.; supervision, A.B. (Andrey Borisov). All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the Ministry of Science and Higher Education of the Russian Federation, project No. 075-15-2020-799.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article.

Acknowledgments

The research was carried out using the infrastructure of the Shared Research Facilities «High Performance Computing and Big Data» (CKP «Informatics») of FRC CSC RAS (Moscow).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LQG: Linear quadratic Gaussian;
MJP: Markov jump process;
ODE: Ordinary differential equation;
RHS: Right-hand side.

References

  1. Athans, M. The Role and Use of the Stochastic Linear-Quadratic-Gaussian Problem in Control System Design. IEEE Trans. Autom. Control 1971, 16, 529–552.
  2. Cipra, B.A. Engineers look to Kalman filtering for guidance. SIAM News 1993, 26, 8–9.
  3. Johnson, A. LQG applications in the process industries. Chem. Eng. Sci. 1993, 48, 2829–2838.
  4. Mäkilä, P.; Westerlund, T.; Toivonen, H. Constrained linear quadratic gaussian control with process applications. Automatica 1984, 20, 15–29.
  5. Westerlund, T. A digital quality control system for an industrial dry process rotary cement kiln. IEEE Trans. Autom. Control 1981, 26, 885–890.
  6. Ammar, S.; Mabrouk, M.; Vivalda, J.C. Observer and global output feedback stabilisation for some mechanical systems. Int. J. Control 2009, 82, 1070–1081.
  7. Battilotti, S.; Lanari, L.; Ortega, R. On the Role of Passivity and Output Injection in the Output Feedback Stabilisation Problem: Application to Robot Control. Eur. J. Control 1997, 3, 92–103.
  8. Wonham, W.M. On the Separation Theorem of Stochastic Control. SIAM J. Control 1968, 6, 312–326.
  9. Georgiou, T.T.; Lindquist, A. The Separation Principle in Stochastic Control, Redux. IEEE Trans. Autom. Control 2013, 58, 2481–2494.
  10. Amin, S.; Cárdenas, A.A.; Sastry, S.S. Safe and Secure Networked Control Systems under Denial-of-Service Attacks. In Hybrid Systems: Computation and Control; Majumdar, R., Tabuada, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 31–45.
  11. Befekadu, G.K.; Gupta, V.; Antsaklis, P.J. Risk-Sensitive Control Under Markov Modulated Denial-of-Service (DoS) Attack Strategies. IEEE Trans. Autom. Control 2015, 60, 3299–3304.
  12. Sinopoli, B.; Schenato, L.; Franceschetti, M.; Poolla, K.; Sastry, S. An LQG Optimal Linear Controller for Control Systems with Packet Losses. In Proceedings of the 44th IEEE Conference on Decision and Control, Seville, Spain, 15 December 2005; pp. 458–463.
  13. Bosov, A.V. Discrete stochastic system linear output control with respect to a quadratic criterion. J. Comput. Syst. Sci. Int. 2016, 55, 349–364.
  14. Achille, A.; Soatto, S. A Separation Principle for Control in the Age of Deep Learning. Annu. Rev. Control Robot. Auton. Syst. 2018, 1, 287–307.
  15. Bencze, W.; Franklin, G. A separation principle for hybrid control system design. IEEE Control Syst. Mag. 1995, 15, 80–84.
  16. Besancon, G.; Battilotti, S.; Lanari, L. A new separation result for a class of quadratic-like systems with application to Euler–Lagrange models. Automatica 2003, 39, 1085–1093.
  17. Kushner, H.; Dupuis, P.G. Numerical Methods for Stochastic Control Problems in Continuous Time; Stochastic Modelling and Applied Probability Series; Springer: New York, NY, USA, 2001; Volume 24.
  18. Bertsekas, D.P. Dynamic Programming and Optimal Control; Athena Scientific: Cambridge, MA, USA, 2013.
  19. Fleming, W.H.; Rishel, R.W. Deterministic and Stochastic Optimal Control; Stochastic Modelling and Applied Probability Series; Springer: New York, NY, USA, 1975; Volume 1.
  20. Stratonovich, R.L. Conditional Markov Processes. Theory Probab. Appl. 1960, 5, 156–178.
  21. Kushner, H.J. On the differential equations satisfied by conditional probability densities of Markov processes with applications. J. Soc. Ind. Appl. Math. Ser. A Control 1964, 2, 106–119.
  22. Mortensen, R.E. Stochastic Optimal Control with Noisy Observations. Int. J. Control 1966, 4, 455–464.
  23. Duncan, T.E. On the Absolute Continuity of Measures. Ann. Math. Stat. 1970, 41, 30–38.
  24. Zakai, M. On the optimal filtering of diffusion processes. Z. Wahrsch. Verw. Geb. 1969, 11, 230–243.
  25. Davis, M.H.A.; Varaiya, P. Dynamic Programming Conditions for Partially Observable Stochastic Systems. SIAM J. Control 1973, 11, 226–261.
  26. Beneš, V.; Karatzas, I. On the Relation of Zakai’s and Mortensen’s Equations. SIAM J. Control Optim. 1983, 21, 472–489.
  27. Bensoussan, A. Stochastic Control of Partially Observable Systems; Cambridge University Press: Cambridge, UK, 1992.
  28. Miller, B.M.; Avrachenkov, K.E.; Stepanyan, K.V.; Miller, G.B. Flow Control as a Stochastic Optimal Control Problem with Incomplete Information. Probl. Inf. Transm. 2005, 41, 150–170.
  29. Beneš, V. Quadratic Approximation by Linear Systems Controlled From Partial Observations. In Stochastic Analysis; Mayer-Wolf, E., Merzbach, E., Shwartz, A., Eds.; Academic Press: Cambridge, MA, USA, 1991; pp. 39–50.
  30. Helmes, K.; Rishel, R. The solution of a partially observed stochastic optimal control problem in terms of predicted miss. IEEE Trans. Autom. Control 1992, 37, 1462–1464.
  31. Rishel, R. A Strong Separation Principle for Stochastic Control Systems Driven by a Hidden Markov Model. SIAM J. Control Optim. 1994, 32, 1008–1020.
  32. Wonham, W.M. Some Applications of Stochastic Differential Equations to Optimal Nonlinear Filtering. J. Soc. Ind. Appl. Math. Ser. A Control 1964, 2, 347–369.
  33. Elliott, R.J.; Aggoun, L.; Moore, J.B. Hidden Markov Models: Estimation and Control; Stochastic Modelling and Applied Probability; Springer: New York, NY, USA, 1995.
  34. Bhatia, R. Matrix Analysis; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1997.
  35. Shiryaev, A.N. Probability; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1996.
  36. Davis, M. Linear Estimation and Stochastic Control; A Halsted Press Book; Chapman and Hall: London, UK, 1977.
  37. Hurewicz, W. Lectures on Ordinary Differential Equations; Dover Phoenix Editions; Dover Publications: New York, NY, USA, 2002.
  38. Gihman, I.I.; Skorohod, A.V. The Theory of Stochastic Processes III; Springer: New York, NY, USA, 1979.
  39. Oksendal, B. Stochastic Differential Equations: An Introduction with Applications; Universitext; Springer: Berlin/Heidelberg, Germany, 2003.
  40. Fahim, A.; Touzi, N.; Warin, X. A probabilistic numerical method for fully nonlinear parabolic PDEs. Ann. Appl. Probab. 2011, 21, 1322–1364.
  41. Borisov, A.; Sokolov, I. Optimal Filtering of Markov Jump Processes Given Observations with State-Dependent Noises: Exact Solution and Stable Numerical Schemes. Mathematics 2020, 8, 506.
  42. Borisov, A.; Bosov, A.; Miller, G.; Sokolov, I. Partial Diffusion Markov Model of Heterogeneous TCP Link: Optimization with Incomplete Information. Mathematics 2021, 9, 1632.
Figure 1. Optimal control coefficients: 1 − $\alpha_t^{(11)}$, 2 − $\alpha_t^{(12)}$, and 3 − $\alpha_t^{(22)}$.
Figure 2. Optimal control coefficients: 1 − $B_t^{(11)}$, 2 − $B_t^{(12)}$, and 3 − $B_t^{(13)}$.
Figure 3. Optimal control coefficients: 1 − $B_t^{(21)}$, 2 − $B_t^{(22)}$, and 3 − $B_t^{(23)}$.
Figure 4. Trajectory and its estimate: 1 − $c\,\hat{y}_t$ and 2 − $c\,y_t$.
Figure 5. Actuator coordinate: 1 − $x_t^{(0)}$, 2 − $x_t^{me}$, 3 − $x_t^*$, 4 − $x_t^{**}$, and 5 − target $C y_t$.
Figure 6. Actuator velocity: 1 − $v_t^{(0)}$, 2 − $v_t^{me}$, 3 − $v_t^*$, 4 − $v_t^{**}$, and 5 − target $C y_t$.
Figure 7. Controls: 1 − $u_t^*$, 2 − $u_t^{**}$, and 3 − target $C y_t$.

