Optimal Stochastic Control in the Interception Problem of a Randomly Tacking Vehicle

Galyaev, Andrey A.; Lysenko, Pavel V.; Rubinovich, Evgeny Y.

doi:10.3390/math9192386


by Andrey A. Galyaev *, Pavel V. Lysenko and Evgeny Y. Rubinovich

Institute of Control Sciences of RAS, 117997 Moscow, Russia

* Author to whom correspondence should be addressed.

Mathematics 2021, 9(19), 2386; https://doi.org/10.3390/math9192386

Submission received: 30 August 2021 / Revised: 22 September 2021 / Accepted: 22 September 2021 / Published: 25 September 2021

(This article belongs to the Special Issue Identification, Knowledge Engineering and Digital Modeling for Adaptive and Intelligent Control)


Abstract:

This article considers the mathematical aspects of the problem of the optimal interception of a mobile search vehicle moving along random tacks on a given route and searching for a target, which travels parallel to this route. Interception begins when the probability of the target being detected by the search vehicle exceeds a certain threshold value. Interception was carried out by a controlled vehicle (defender) protecting the target. An analytical estimation of this detection probability is proposed. The interception problem was formulated as an optimal stochastic control problem, which was transformed to a deterministic optimization problem. As a result, the optimal control law of the defender was found, and the optimal interception time was estimated. The deterministic problem is a simplified version of the problem whose optimal solution provides a suboptimal solution to the stochastic problem. The obtained control law was compared with classic guidance methods. All the results were obtained analytically and validated with a computer simulation.

Keywords:

optimal stochastic control; path planning; 2D random search; interception

1. Introduction

Search problems have become increasingly popular recently and have attracted a significant number of researchers [1,2,3,4,5]. The search process is considered to be that of exploring a certain area of a physical space in order to detect a searched object (SO) in this area with the search vehicle (SV) using various types of physical sensors. The basis for solving these problems is a symbiosis of models and methods from multiple branches of science, which allows establishing causal relationships among the search conditions, the physical characteristics of the SOs, and the search results.

Mathematical formulations of search problems can include various criteria [6,7] with the goal of the minimization or maximization of these criteria. All search problems can be divided into two groups according to the SO’s type: it can be stationary or mobile. The problems of the first type (Chapter 2 of [1]) are easier to solve than the problems of mobile SOs (Chapter 3 of [1,5]), since the parameters of their movement may be unknown to the SV. The problems of the second type have become popular in recent years due to the development of unmanned vehicles such as unmanned aerial vehicles (UAVs) or unmanned underwater vehicles (UUVs), operating in a largely unpredictable and uncertain marine environment [1,8].

The practical applications of such autonomous vehicles and search problems can vary from environmental monitoring and geological exploration to combat and reconnaissance tasks. Therefore, the parameters of the mathematical models can vary greatly depending on the different characteristics of real-world objects and their operating conditions. The problem considered in this article can be applied to objects in the marine environment such as UUVs or autonomous surface vehicles (ASVs), which can serve as both the SO and SV in the model under discussion.

The search can be performed by one [3,5] or several SVs [9,10]. If the SV and SO are on conflicting sides and the search itself is undesirable for the SO [11,12], then we can talk about the so-called threat environment [13,14]. Several SVs can be connected in a network structure and form a dynamically changing threat map [10,15]. The task of the SO (UUV or UAV) in this case is to avoid these threats while moving. The trajectory planning problem can be formulated for the SO when the threat mapping is known. If the dynamics of the SO is also known, then these problems are classical problems of deterministic optimal control.

If the SV presents a danger to the SO, the problem of interception can be considered. There is a vast class of such problems with various formulations and models of the moving vehicles. These models may include restrictions on the maneuverability of the vehicles [16,17,18]. Moreover, the problem becomes an optimization problem if some criterion, for example the intercept time, must be minimized [19,20,21]. In most problems studied in the literature, the intercepted vehicle moves along a given programmed trajectory [22]. Real vehicles, however, as a rule move in a stochastic way, and this is the case considered in this article.

The article relates to various branches of mathematics, such as stochastic control, guidance, information processing and search, and optimization, and is devoted to the problem of the optimal interception of an SV that moves randomly on tacks along a given course and searches for a target SO. The interception is carried out by a controlled mobile vehicle protecting the target SO. The presence of an arbitrarily maneuvering search vehicle requires an adequate mathematical formalization in the form of a stochastic control problem. The maneuvering process can be conveniently formalized using a jump-like Markov process with a given state vector and a given matrix of the transition intensities between these states. Such a model allows us to describe the trajectory of the SV in the form of a linear stochastic differential equation, which makes it possible to obtain the equations of the evolution of the mathematical expectation and variance. These equations allow us to formulate the problem of SV interception by the controlled vehicle with the criterion of a predicted miss or with a given mathematical expectation of a miss at the final position of the SV [16,17,18,19,20,21]. The purpose of the article is to find an interception trajectory of the controlled defender vehicle as a result of solving the optimal stochastic control problem and comparing this trajectory with classical guidance algorithms such as the pursuit guidance method and the method of proportional navigation guidance [23,24,25].

The considered problem belongs to the “attacker–target–defender” type [26,27,28]: the SO (target), which can be a strategically important mobile vehicle, counteracts the SV (attacker) by using an autonomous attacking robotic complex (defender), for example a UAV or UUV.

In this article, by SV, we mean a vehicle moving programmatically or randomly on a plane equipped with a circular detection zone of a fixed radius. The goal of the SV is to detect the SO, i.e., to cover the point of the plane depicting the SO with its detection zone and maximize some functional that characterizes the reliability of detecting the SO in this zone. The reliability of the detection (probability of correct classification) of the SV may depend on various physical factors, in particular on the time spent by the SO in the detection zone, its current distance from the SV, the direction of the velocity vector of the SO, etc. [29].

We considered the SO to be able to observe the real trajectory of the SV and evaluate the characteristics of its movement, i.e., current coordinates and components of the velocity vector. At some point in time, the SO releases a mobile defender, which moves autonomously and stealthily and does not have a communication channel with the SO. It was also assumed that the defender can evaluate the current motion characteristics of the SV using its passive onboard sensors. The stealthiness of the defender is provided, in particular, with its low velocity.

This article has the following structure. In Section 2, the model of the SV with a given detection zone is considered. Section 3 contains a statistical description of the detection probability of the SO moving along a straight-line trajectory. In Section 4, the interception problem is formulated as an optimal stochastic control problem. This problem is solved analytically in Section 5, and the obtained results are discussed and illustrated with simulation examples in Section 6. Section 7 concludes the article and suggests a direction for future work.

2. Model of the SV’s Movement on Tacks

The search system consists of one SV, which has a circular detection zone of radius R. The SV moves piecewise-rectilinearly on a plane, tacking randomly around the line of the general course. The origin O of the stationary Cartesian coordinate system XOY is situated in the initial position of the SV, as shown in Figure 1. This coordinate system is oriented in such a way that its OX axis coincides with the line of the general course of the SV.

The SV moves on tacks in accordance with the following law:

\begin{cases} \dot{x}_{SV} = v_{x} = v \cos α, \\ \dot{y}_{SV} = v_{y} = θ_{t} v \sin α, \end{cases}

(1)

where α is the specified tacking angle, v is the SV’s search speed, and θ_{t} is a random jump-like Markov process. The component of the SV’s velocity vector \vec{v} along the line of the general course is constant: v_{x} = const.

Figure 2 shows a velocity diagram of the SV. As follows from (1), tacking is performed by changing the velocity component v_{y} at random instants according to a random Markov process θ_{t} with a finite vector of states J = (j_{1}, j_{2}, \dots, j_{n}) and a given matrix Λ of the transition intensities between these states. This article discusses the case of processes with three states, J = (-1, 0, 1). This means that the SV’s velocity vector can coincide with the general course line (θ_{t} = 0) or deviate from it by a constant angle equal to \pm α (when θ_{t} = \pm 1), as shown in Figure 2.

We consider transitions between the process states to be equally likely, with the transition intensity matrix:

Λ = λ \begin{pmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{pmatrix},

(2)

corresponding to the state vector J. Here, λ = 1 / τ_{0}, where τ_{0} is the average time that the SV spends on one tack. This model generates random trajectories that have the approximate shape shown in Figure 1.
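The tacking model (1)–(2) can be illustrated with a minimal time-stepped simulation. This is only a sketch under stated assumptions: the function name `simulate_sv_path`, the initial state θ = 0, and the seed handling are illustrative choices, not taken from the article. The per-step jump probability 2λΔt follows from the total exit rate 2λ of each state in the matrix (2).

```python
import math
import random

def simulate_sv_path(v, alpha, lam, t_end, dt, seed=0):
    """Sample path of model (1) driven by the three-state process theta_t.

    Returns a list of tuples (t, x, y, theta). Hypothetical helper for
    illustration; the initial state theta = 0 is an assumption.
    """
    rng = random.Random(seed)
    states = (-1, 0, 1)
    theta = 0
    x = y = 0.0
    path = [(0.0, x, y, theta)]
    n_steps = int(round(t_end / dt))
    for k in range(1, n_steps + 1):
        # Each state is left with total intensity 2*lam (rows of (2)),
        # so a jump during dt has probability of about 2*lam*dt.
        if rng.random() < 2.0 * lam * dt:
            # The two other states are equally likely targets.
            theta = rng.choice([s for s in states if s != theta])
        x += v * math.cos(alpha) * dt          # constant along-course speed
        y += theta * v * math.sin(alpha) * dt  # tack-dependent cross speed
        path.append((k * dt, x, y, theta))
    return path
```

The step Δt must satisfy 2λΔt ≪ 1 for the per-step jump probability to be a reasonable approximation.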

For the mathematical formulation of the stochastic optimization problem, it is convenient to study a Gaussian Markov analog instead of the jump-like process θ_{t}. This diffusion process Θ_{t} has the same mathematical expectation and correlation function as the process θ_{t}. It follows from the theory of jump-like Markov processes that Θ_{t} admits the Itô stochastic differential [30]:

d Θ_{t} = - D Θ_{t} d t + σ d w_{t},

(3)

where w_{t} is a standard Wiener process and D and σ are constants related to the original Markov process θ_{t}: D ≜ 3 λ and σ ≜ 2 \tan α \sqrt{λ}.
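As a consistency check on these constants (an observation added here, not a derivation from the article): the matrix (2) has eigenvalues 0 and -3λ, which matches the decay rate D = 3λ, and the stationary variance σ^{2} / (2 D) of (3) equals (2/3) \tan^{2} α, the variance of \tan α · θ_{t} under the uniform stationary distribution over (-1, 0, 1). The sketch below samples (3) with the exact one-step transition of an Ornstein–Uhlenbeck process; the function names and the seed are illustrative assumptions.

```python
import math
import random

def ou_parameters(lam, tan_alpha):
    """Constants of (3): D = 3*lam and sigma = 2*tan(alpha)*sqrt(lam)."""
    return 3.0 * lam, 2.0 * tan_alpha * math.sqrt(lam)

def simulate_ou(theta0, D, sigma, dt, n_steps, seed=0):
    """Sample Theta_t from (3) on a grid with step dt.

    Uses the exact Gaussian transition of the linear equation (3),
    so the result is unbiased for any dt > 0.
    """
    rng = random.Random(seed)
    decay = math.exp(-D * dt)
    noise_std = sigma * math.sqrt((1.0 - math.exp(-2.0 * D * dt)) / (2.0 * D))
    path = [theta0]
    for _ in range(n_steps):
        path.append(decay * path[-1] + noise_std * rng.gauss(0.0, 1.0))
    return path
```

For a long sample path, the empirical variance should be close to σ^{2} / (2 D).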

3. Detection Probability of the SO Moving at a Constant Velocity

First, let us consider the task of detecting a target SO (target) with the SV, whose dynamics are described in Section 2. The following model is investigated. The target moves at a constant speed parallel to the general course line of the SV at a distance l from it.

The initial distance between the vehicles along the general course is L, so the initial Euclidean distance is \sqrt{L^{2} + l^{2}}. The SV moves according to (1), where θ_{t} is a random Markov process with the state vector J and the transition matrix Λ from (2). The target moves according to the law:

\begin{cases} \dot{x} = - u, \\ \dot{y} = 0, \end{cases}

(4)

where u is its constant speed.

The target is detected if the distance between it and the SV becomes less than R. To simplify the model, let us assume that the detection is successful when the x-coordinates of the target and the SV become equal at some instant ϑ, x_{SV}(ϑ) = x(ϑ), and the inequality | y_{SV}(ϑ) - y(ϑ) | \leq R is satisfied for the y-coordinates.

The rendezvous instant ϑ is defined as:

ϑ = \frac{L}{v \cos α + u} .

(5)

The detection probability is the probability that the coordinate y_{SV}(ϑ) falls within the interval [l - R, l + R], namely:

P_{\det} = P \{ l - R \leq y_{SV}(ϑ) \leq l + R \} = P \{ \frac{l - R}{v \sin α} \leq \int_{0}^{ϑ} θ_{s} d s \leq \frac{l + R}{v \sin α} \} .

(6)

As mentioned in Section 2, the random jump-like Markov process θ_{t} can be replaced with its Gaussian Markov analog Θ_{t} from (3), which has the same mathematical expectation and correlation function as the process θ_{t}.

Further, instead of evaluating the probability of the random integral in (6) directly, we estimate the probability of the target being detected by the SV through an analytical approximation of the probability histograms obtained in numerical simulation. We assume that at the instant t_{0} = 0, the target is situated in the position E^{0} = (L, l) with L ≫ 1 (as shown in Figure 3) and that the velocity of the target satisfies u < 1.

Due to the latter assumption, the SV’s detection zone can be considered as a flat-line segment with the length of the diameter instead of the circle. Thus, the detection probability can be estimated as the probability of meeting the target with this segment.

The histograms of the distribution density of the y_{SV} coordinate obtained in the interval [l - Δ l, l + Δ l] for some small Δ l are well approximated by the symmetric density of the Gaussian distribution. Figure 4 depicts the histogram of the probability of the target meeting the SV and the corresponding density of the Gaussian distribution: N (0, σ_{1}^{2}) with σ_{1} = 0.705 for the case L = L_{1} = 5 (Figure 4a) and N (0, σ_{2}^{2}) with σ_{2} = 0.993 for the case L = L_{2} = 10 (Figure 4b). The histograms were constructed from a computer simulation of the movement of the target and the SV for 10,000 realizations of the SV trajectory corresponding to λ = 5 / 3.

These graphs allow us to estimate the SV’s detection probability P_{\det} for various initial positions. Equation (6) may now be approximated as:

P_{\det} = P \{ l - R \leq y_{SV}(ϑ) \leq l + R \} \approx \frac{1}{\sqrt{2 π} σ_{i}} \int_{l - R}^{l + R} \exp (- y^{2} / (2 σ_{i}^{2})) d y,

(7)

where σ_{i} corresponds to the particular parameters (L_{i}, l_{i}, u_{i}). In particular, for l = l_{1} = 1.5 with L_{1} and l = l_{2} = 2.5 with L_{2}, these probabilities are presented in Table 1. In all cases, the velocity of the target is u = 0.3. All values are given in a normalized scale.
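The Gaussian integral in (7) has a closed form in terms of the error function, so it can be evaluated without numerical quadrature. The snippet below is a minimal sketch; the function names are illustrative, and the value of R must be supplied by the user (Section 6 uses R = 1 in the normalized scale, and reusing that value here would be an assumption).

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def detection_probability(l, R, sigma_y):
    """Approximation (7): probability that a N(0, sigma_y^2) variable
    falls within [l - R, l + R]."""
    return normal_cdf((l + R) / sigma_y) - normal_cdf((l - R) / sigma_y)
```

For example, `detection_probability(1.5, 1.0, 0.705)` evaluates (7) for the first parameter set under the assumption R = 1.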

Next, we introduce a threshold value (security threshold) h < 1 of the permissible detection probability P_{\det}, for example h = 0.07. The situation with P_{\det} \leq h is considered safe. In this case, the target continues to move in a straight line without changing its course and speed. If P_{\det} > h, then the situation is considered dangerous. It is assumed that in a dangerous situation, the target (to prevent the negative consequences of possible detection) uses the mobile defender mentioned in the Introduction, whose task is to intercept the SV with a minimum standard error at a given point in the plane relative to the SV.

The minimization of this miss is associated with the solution of the following optimal stochastic control problem.

4. Optimal Stochastic Control Problem

The problem is considered in a moving Cartesian coordinate system X_{t} O_{t} Y_{t}, where the origin O_{t} is associated with the current position P^{t} of the SV and the axis O_{t} X_{t} is directed parallel to the SV’s general course. The current position of the defender E_{2}^{t} is given by a two-dimensional vector Z_{2}^{t} directed from O_{t} ≜ P^{t} to E_{2}^{t}.

The terminal position E_{2}^{ϑ} of the defender is defined by a given two-dimensional vector d, as shown in Figure 5. An auxiliary vector η_{t} ≜ Z_{2}^{t} - d is introduced for a more convenient formulation of the defender’s optimal control problem.

In the selected coordinate system, the equations of the relative motion of the defender–SV system have the form:

\dot{Z}_{2}^{t} = u_{t} - \begin{pmatrix} 1 \\ Θ_{t} \end{pmatrix}, \quad u_{t} = \begin{pmatrix} u_{x}^{t} \\ u_{y}^{t} \end{pmatrix},

(8)

where Θ_{t} is from (3) and the initial position Z_{2}^{0} is given. The two-dimensional velocity vector u_{t} of the defender plays the role of the control and is subject to the restriction:

| u_{t} | \leq β < 1

(9)

with the specified constant β.

In terms of the auxiliary vector η_{t} introduced above, the equations of motion (8) take the compact form:

\dot{η}_{t} = u_{t} + A + B Θ_{t}, \quad η_{0} ≜ Z_{2}^{0} - d,

(10)

where:

A = \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 \\ -1 \end{pmatrix} .

(11)

At the terminal moment ϑ, the following condition must be met:

E η_{ϑ} = 0,

(12)

where E denotes the mathematical expectation. As a criterion, we take the terminal functional:

E G (η_{ϑ}, Θ_{ϑ}) \to \min_{u_{t}},

(13)

where:

G (η_{ϑ}, Θ_{ϑ}) = η_{ϑ}^{2} + γ Θ_{ϑ} .

(14)

In (13) and (14), the summand η_{ϑ}^{2} characterizes the squared deviation of the defender from the end of the vector d at the terminal moment ϑ, so its expectation in (13) is the mean-square miss. The term γ E Θ_{ϑ}, where γ is a given constant, plays the role of an additional terminal penalty for a “convenient” or “inconvenient” tack of the SV at the time ϑ. Here, the words “convenient” and “inconvenient” are used in the following sense. The tack of the SV at the time ϑ is considered “convenient” if Θ_{ϑ} < 0, i.e., the component of the velocity of the SV along the OY axis is negative (the SV is moving away from the line of movement of the target E_{1}). Otherwise, we consider the tack of the SV “inconvenient”.

5. Optimal Stochastic Control

5.1. Reduction of the Optimal Stochastic Control Problem to the Deterministic One

It is known that solving stochastic optimization problems in real time is associated with certain difficulties [30]. For this reason, instead of the original stochastic problem (3), (9)–(14), we solved its deterministic analog. To construct this analog, we need the following auxiliary results.

The solution of Equation (3) has the form:

Θ_{t} = e^{- D t} Θ_{0} + σ \int_{0}^{t} e^{- D (t - s)} d w_{s} .

(15)

Integrating (15) leads to the equation:

\int_{0}^{t} Θ_{s} d s = \frac{Θ_{0}}{D} (1 - e^{- D t}) + \frac{σ}{D} \int_{0}^{t} (1 - e^{- D (t - s)}) d w_{s} .

(16)

Now, let us calculate the value of the criterion (13) for an arbitrary admissible program control u_{t} and the parameter ϑ fixed at the moment t_{0} = 0. To this end, we integrate the equations of motion (10) taking into account (16). We have:

η_{ϑ} = η_{0} + A ϑ + B \frac{Θ_{0}}{D} (1 - e^{- D ϑ}) + B \frac{σ}{D} \int_{0}^{ϑ} (1 - e^{- D (ϑ - s)}) d w_{s} + \int_{0}^{ϑ} u_{s} d s .

(17)

From (12) and (17), it follows that:

E η_{ϑ} = η_{0} + A ϑ + B \frac{Θ_{0}}{D} (1 - e^{- D ϑ}) + \int_{0}^{ϑ} u_{s} d s = 0 .

(18)

Finally, from (17) and (18), we obtain:

E η_{ϑ}^{2} = \frac{σ^{2}}{D^{2}} [ϑ - \frac{2}{D} (1 - e^{- D ϑ}) + \frac{1}{2 D} (1 - e^{- 2 D ϑ})] .

(19)

Thus, criterion (13) takes the form:

E G = \frac{σ^{2}}{D^{2}} [ϑ - \frac{2}{D} (1 - e^{- D ϑ}) + \frac{1}{2 D} (1 - e^{- 2 D ϑ})] + γ e^{- D ϑ} Θ_{0} \to \min_{u_{t}} .

(20)
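The step from (17) and (18) to (19) and (20) can be spelled out as follows. This is a sketch of the intermediate calculation, assuming the Itô isometry for the stochastic integral and using |B|^{2} = 1 from (11); the substitution r = ϑ - s is introduced here for convenience.

```latex
\begin{align*}
η_{ϑ} - E η_{ϑ} &= B \frac{σ}{D} \int_{0}^{ϑ} (1 - e^{-D(ϑ - s)}) \, d w_{s}, \\
E η_{ϑ}^{2} &= |B|^{2} \frac{σ^{2}}{D^{2}} \int_{0}^{ϑ} (1 - e^{-D(ϑ - s)})^{2} \, d s
             = \frac{σ^{2}}{D^{2}} \int_{0}^{ϑ} (1 - 2 e^{-D r} + e^{-2 D r}) \, d r \\
            &= \frac{σ^{2}}{D^{2}} \Big[ ϑ - \frac{2}{D} (1 - e^{-D ϑ}) + \frac{1}{2 D} (1 - e^{-2 D ϑ}) \Big], \\
E Θ_{ϑ} &= e^{-D ϑ} Θ_{0} \quad \text{(from (15), since the stochastic integral has zero mean).}
\end{align*}
```

The second line uses E η_{ϑ} = 0 from (12), so the second moment of η_{ϑ} coincides with its variance and does not depend on the control u_{t}.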

Now, we transform (18) by introducing a two-dimensional vector ξ_{t} that satisfies the equation:

{\dot{ξ}}_{t} = A + B Θ_{0} e^{- D t} + u_{t}

(21)

with boundary conditions:

ξ_{0} = η_{0}, ξ_{ϑ} = 0 .

(22)

In terms of the vector ξ_{t}, the desired deterministic analog is the following auxiliary problem of optimal (deterministic) control, which includes the equations of motion (21), boundary conditions (22), control constraints (9), and the terminal criterion F (ϑ) \to \min_{u_{t}}, where F (ϑ) denotes the right-hand side of (20) with the additive constants - 2 σ^{2} / D^{3} and σ^{2} / (2 D^{3}) excluded:

F (ϑ) ≜ \frac{σ^{2}}{D^{2}} [ϑ + \frac{2}{D} e^{- D ϑ} - \frac{1}{2 D} e^{- 2 D ϑ}] + γ e^{- D ϑ} Θ_{0} \to \min_{u_{t}} .

(23)

5.2. Pontryagin’s Maximum Principle in the Auxiliary Optimal Problem (23)

To solve the auxiliary problem, we use Pontryagin’s maximum principle (PMP) [31]. According to the PMP procedure, we first construct the Hamiltonian:

H = λ_{ξ} \cdot (A + B Θ_{0} e^{- D t}) + λ_{ξ} \cdot u_{t} \to \max_{u_{t}} .

(24)

Here, the dot between two-dimensional vectors denotes the scalar product, and λ_{ξ} = λ_{ξ} (t) is the conjugate variable corresponding to the phase variable ξ_{t}. From (24), we find the explicit form of the optimal control (here and below, the symbol * indicates optimal controls):

u_{t}^{*} = β \frac{λ_{ξ} (t)}{| λ_{ξ} (t) |} .

(25)

The conjugate variable satisfies [31]:

{\dot{λ}}_{ξ} (t) = - \frac{\partial H}{\partial ξ} (t) = 0;

(26)

hence λ_{ξ} (t) = λ_{ξ} = const, which leads to u_{t}^{*} = u^{*} = const with | u^{*} | = β. In other words, the program motion of the controlled object follows a straight line at the maximum possible speed. The transversality conditions at the instant ϑ are given by:

δ F (ϑ) + λ_{ξ} \cdot δ ξ - H δ ϑ = 0,

(27)

where according to (23):

δ F (ϑ) = \frac{\partial F (ϑ)}{\partial ϑ} δ ϑ = \frac{σ^{2}}{D^{2}} [1 - 2 e^{- D ϑ} + e^{- 2 D ϑ}] δ ϑ - γ D e^{- D ϑ} Θ_{0} δ ϑ .

(28)

Since the terminal state ξ_{ϑ} = 0 is fixed by (22), δ ξ = 0 at the instant ϑ, and (27), (28) give:

H (ϑ) = \frac{σ^{2}}{D^{2}} [1 - 2 e^{- D ϑ} + e^{- 2 D ϑ}] - γ D e^{- D ϑ} Θ_{0} .

(29)

Integrating (21), taking into account (22), gives:

η_{0} + A ϑ + B \frac{Θ_{0}}{D} (1 - e^{- D ϑ}) + u^{*} ϑ = 0,

(30)

which naturally coincides with (18) for u_{t} = u^{*}.

Next, we put:

\begin{cases} u^{*} ≜ β (\cos φ, \sin φ), \quad φ = const, \\ η_{0} ≜ (x_{0}, y_{0}) . \end{cases}

(31)

Then, from (30) and (31), we obtain in componentwise form a system of two equations with respect to φ and ϑ:

\begin{cases} x_{0} - ϑ + β ϑ \cos φ = 0, \\ y_{0} + β ϑ \sin φ - \frac{Θ_{0}}{D} (1 - e^{- D ϑ}) = 0 . \end{cases}

(32)

From (32), it follows that:

\begin{cases} \cos φ = (ϑ - x_{0}) (β ϑ)^{- 1}, \\ \sin φ = [\frac{Θ_{0}}{D} (1 - e^{- D ϑ}) - y_{0}] (β ϑ)^{- 1}, \end{cases}

(33)

where ϑ can be found as the least positive root of the equation that follows from the identity \cos^{2} φ + \sin^{2} φ = 1 applied to the right-hand sides of (33), namely:

(ϑ - x_{0})^{2} + [\frac{Θ_{0}}{D} (1 - e^{- D ϑ}) - y_{0}]^{2} = β^{2} ϑ^{2} .

(34)

Formulas (33) and (34) allow us to find the velocity components of the controlled object and the time interval [0, ϑ] of its motion from the initial position to the end of the vector d.
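Equation (34) is transcendental in ϑ, so its least positive root has to be found numerically. The sketch below uses a forward scan for the first sign change followed by bisection; this particular root-finding scheme, the function names, and the default step sizes are assumptions made for illustration, not the procedure used in the article.

```python
import math

def residual(t, x0, y0, theta0, D, beta):
    """Left-hand side minus right-hand side of (34)."""
    drift = theta0 / D * (1.0 - math.exp(-D * t))
    return (t - x0) ** 2 + (drift - y0) ** 2 - (beta * t) ** 2

def least_positive_root(x0, y0, theta0, D, beta,
                        t_max=1000.0, step=0.01, tol=1e-10):
    """Least positive root of (34), or None if none is found below t_max.

    The residual is positive at t = 0 unless the defender already sits at
    the end of d, so the first change to a non-positive value brackets the
    least root. Roots closer together than `step` may be missed.
    """
    lo = 0.0
    f_lo = residual(lo, x0, y0, theta0, D, beta)
    while lo < t_max:
        hi = lo + step
        f_hi = residual(hi, x0, y0, theta0, D, beta)
        if f_lo > 0.0 >= f_hi:
            # Bisection on the bracketing interval [lo, hi].
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if residual(mid, x0, y0, theta0, D, beta) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        lo, f_lo = hi, f_hi
    return None
```

Once ϑ is known, the direction φ follows directly from (33).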

If D ϑ in (34) is sufficiently large, then the term e^{- D ϑ} is close to zero and can be omitted. In this case, (34) takes the form:

(ϑ - x_{0})^{2} + (\frac{Θ_{0}}{D} - y_{0})^{2} = β^{2} ϑ^{2} .

(35)

Then, the instant ϑ can be found as the least root of the quadratic Equation (35):

ϑ = \frac{x_{0} - \sqrt{x_{0}^{2} - (1 - β^{2}) (x_{0}^{2} + (\frac{Θ_{0}}{D} - y_{0})^{2})}}{1 - β^{2}} .

(36)
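Formulas (36), (33), and (31) translate directly into a few lines of code. The sketch below is illustrative: the function names are assumptions, and in line with the approximation (35) the factor (1 - e^{- D ϑ}) in (33) is replaced by 1, which keeps the resulting velocity exactly of magnitude β.

```python
import math

def interception_time(x0, y0, theta0, D, beta):
    """Least root of the quadratic (35), i.e. Formula (36).

    Returns None when (35) has no real positive root.
    """
    c = theta0 / D - y0
    disc = x0 * x0 - (1.0 - beta * beta) * (x0 * x0 + c * c)
    if disc < 0.0:
        return None
    t = (x0 - math.sqrt(disc)) / (1.0 - beta * beta)
    return t if t > 0.0 else None

def optimal_control(x0, y0, theta0, D, beta):
    """Velocity components (u_x, u_y) from (31), (33) and the time from (36)."""
    t = interception_time(x0, y0, theta0, D, beta)
    if t is None:
        return None
    cos_phi = (t - x0) / (beta * t)
    sin_phi = (theta0 / D - y0) / (beta * t)  # e^{-D t} neglected, as in (35)
    return beta * cos_phi, beta * sin_phi, t
```

With x_{0} = 10, y_{0} = 1, β = 0.5, D = 5, and the assumed initial value Θ_{0} = 0, the call `optimal_control(10.0, 1.0, 0.0, 5.0, 0.5)` returns an interception time of about 6.77.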

To construct a positional optimal control (feedback control) of the defender, the current moment t is taken as the initial moment t_{0}, the current position (x_{t}, y_{t}) as the initial position (x_{0}, y_{0}), and the current value Θ_{t} as the initial value Θ_{0}; after that, the instantaneous direction of the defender’s velocity vector u_{t}^{*} is calculated using Formulas (31), taking into account (33) and (36). Next, u_{t}^{*} is recalculated at the rate at which the current information is updated. Note that at a high update rate, it may be quite justified to use piecewise program control of the defender, in which its control is recalculated only at certain moments, called correction moments, separated by intervals Δ t_{u}. During these intervals, the defender moves programmatically according to the control u_{t}^{*} calculated at the previous step.
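The piecewise program control described above can be sketched as a closed-loop simulation in the relative coordinates of (8). Several choices below are assumptions made for illustration rather than details from the article: the SV’s cross-course velocity is taken as \tan α · θ_{t} (normalized v_{x} = 1), the controller is fed this value in place of Θ_{t}, the defender keeps its previous control when (35) has no real root, and the run stops when the relative x-coordinate reaches zero.

```python
import math
import random

def control_from_36(x, y, theta, D, beta):
    """Defender velocity from (31), (33) with the time estimate (36)."""
    c = theta / D - y                      # e^{-D*t} neglected, as in (35)
    disc = x * x - (1.0 - beta * beta) * (x * x + c * c)
    if disc < 0.0:
        return None                        # (35) has no real root
    t_hat = (x - math.sqrt(disc)) / (1.0 - beta * beta)
    if t_hat <= 0.0:
        return None
    return (t_hat - x) / t_hat, c / t_hat  # beta*cos(phi), beta*sin(phi)

def simulate_interception(z0, beta, lam, tan_alpha, dt_u,
                          dt=1e-3, t_max=100.0, seed=0):
    """Return (final time, smallest defender-SV distance along the run)."""
    rng = random.Random(seed)
    D = 3.0 * lam
    x, y = z0                              # defender position relative to the SV
    theta = 0                              # jump-process state, assumed start
    u = (0.0, 0.0)
    t, next_update = 0.0, 0.0
    miss = math.hypot(x, y)
    while x > 0.0 and t < t_max:
        if t >= next_update - 1e-12:       # correction moment
            new_u = control_from_36(x, y, theta * tan_alpha, D, beta)
            if new_u is not None:
                u = new_u
            next_update += dt_u
        if rng.random() < 2.0 * lam * dt:  # tack change of the SV
            theta = rng.choice([s for s in (-1, 0, 1) if s != theta])
        # Relative motion (8): defender velocity minus SV velocity (1, Theta).
        x += (u[0] - 1.0) * dt
        y += (u[1] - theta * tan_alpha) * dt
        t += dt
        miss = min(miss, math.hypot(x, y))
    return t, miss
```

Decreasing `dt_u` makes the control closer to the positional (feedback) law, at the cost of more frequent recalculation.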

6. Examples

To demonstrate the effectiveness of the obtained optimal control, a numerical simulation was performed for two approaches to studying the interaction between the defender and the SV. These approaches differ in the mathematical description of the evolution of the y-component of the SV’s velocity. In the first (discrete) approach, this component is piecewise constant, and its evolution is described as a jump-like Markov process θ_{t} with three states (1, 0, -1) and the transition intensity matrix Λ from (2). This process is described at the beginning of Section 2. In the second (continuous) approach, the evolution of the y-component of the SV’s velocity vector is given by the Gaussian process Θ_{t}, i.e., the continuous diffusion process (3).

In both approaches, the control of the defender was obtained through Equations (31), (33), and (36). In other words, the control of the defender is always calculated according to the continuous diffusion model (3) of the evolution of the y-component of the SV’s velocity vector. Strictly speaking, since this control law results from solving the continuous problem, it is not guaranteed to solve the discrete problem simulated in the first approach. The idea of these experiments is to apply the solution of the continuous problem, which can be obtained analytically, to the similar discrete practical model, which cannot be studied in the same convenient way. In all experiments, the vector d was taken to be zero, i.e., the defender has to intercept the SV itself.

Both approaches to the simulation are shown in further examples, which were devoted to two different applications of the studied interception problem.

A realization of the diffusion process Θ_{t} was obtained in Maple with a package for stochastic equations. The approximate Formula (36) for ϑ was used for the stochastic differential Equation (15). Thus, Maple allows integrating this equation numerically and obtaining the optimal trajectory of the defender, as well as the random trajectory of the SV corresponding to the process with the appropriate mathematical expectation and variance.

A more practical discrete jump-like process θ_{t} was simulated in a Python script. The movement of the SV and the defender was computed with a very small discretization step Δ t, which determines the quality of the simulation. At each step, the SV, according to the model from Section 2, can change the direction of its v_{y} velocity component with probability 2 λ Δ t or keep it with probability (1 - 2 λ Δ t). In practice, however, another description is more convenient. The process is treated like a Poisson process: the duration of the next SV tack is sampled from an exponential distribution with mathematical expectation 1 / λ, and the direction of the vertical velocity for this tack is chosen from the two directions different from the current one, each with probability 1 / 2. The defender, on the other hand, has its own parameter Δ t_{u} and corrects its control law according to (36) every interval Δ t_{u}, taking the current positions as the initial ones.
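The exponential sampling of tack durations can be sketched as follows. The function name and the initial state θ = 0 are illustrative assumptions, and the mean tack duration is passed in explicitly as `mean_tack` so that the sketch does not commit to a particular relation between this mean and λ.

```python
import random

def sample_tacks(mean_tack, t_end, seed=0):
    """Event-driven sample of the tacking process on [0, t_end).

    Returns a list of (start_time, state) pairs. Each tack lasts an
    exponentially distributed time with mean `mean_tack`; the next state
    is one of the two other states, chosen with probability 1/2 each.
    """
    rng = random.Random(seed)
    states = (-1, 0, 1)
    t, theta = 0.0, 0                      # assumed initial state
    tacks = []
    while t < t_end:
        tacks.append((t, theta))
        t += rng.expovariate(1.0 / mean_tack)
        theta = rng.choice([s for s in states if s != theta])
    return tacks
```

Compared with the step-by-step scheme, this variant needs one random draw per tack rather than one per time step, and its accuracy does not depend on Δ t.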

6.1. Intrusion in the Detection Zone

The first application is the intrusion of the defender into the SV’s detection zone in order to distract the SV from the target. In the normalized scale, the parameters are:

R = 1, v_{x} = 1, τ_{0} = 0.6 .

Let \tan α = 0.5. Then, the parameters of the Gaussian process Θ_{t} are:

λ = \frac{1}{τ_{0}} \approx 1.67, D = 3 λ \approx 5, σ = 2 \tan α \sqrt{λ} \approx 1.29 .

In the coordinate system associated with the initial position of the SV, the initial coordinates of the defender are (10, 1) in the normalized scale. The velocity of the defender was chosen as β = 0.5. The probability of detection of the target following a parallel course from these coordinates equals P_{\det} = 0.5, which is higher than the accepted security threshold h = 0.07. Thus, according to the security concept described above, the target must use a mobile defender.

The results of this experiment are shown in Figure 6. The red line depicts the trajectory of the defender, and the blue one that of the SV. Figure 6a corresponds to the evolution of the y-component of the SV’s velocity according to the Markov jump-like process θ_{t}. Figure 6b shows the trajectories of the vehicles for the diffusion approximation Θ_{t} of the process θ_{t}. In Figure 6a, the black ellipse depicts the circular detection zone of radius R, which looks elliptical because of the different scales of the OX and OY axes. In the case of the discrete model, the parameter Δ t_{u} is equal to τ_{0}. In the case of the continuous model, the defender’s optimal control is recalculated in step with the updating of the SV’s information, i.e., almost continuously (Δ t_{u} equals the simulation discretization step).

For the estimation of the time ϑ, Equation (36) was used. According to (36), the interception time is ϑ \approx 7, which means e^{- D ϑ} \approx 0, i.e., 1 - e^{- D ϑ} \approx 1, so u_{t} can be found from Equations (31), (33), and (36). One can see in Figure 6 that the trajectories of the defender for the discrete and continuous models of the SV’s movement are quite close. The difference between the trajectories in the final sections is due to the significant duration of the interval Δ t_{u} between the updates of the information about the SV and, therefore, between the corrections of the defender’s program control in the discrete approach.

As one can see, the interception problem was solved successfully: the defender, moving from the initial position with the control u found above, ended up in the close vicinity of the SV.

6.2. Destruction of the SV

The second application is the task of the destruction of the SV using the defender. To complete this task, the defender must come close enough to the SV. In the normalized scale:

R = 1, v_{x} = 1, τ_{0} \approx 60 .

Let \tan α = 0.5. Therefore:

λ \approx 0.017, D \approx 0.05, σ \approx 0.13 .

In the coordinate system associated with the initial position of the SV, the initial coordinates of the defender are (300, 20) in the normalized scale. The velocity of the defender was chosen as β = 0.5. Since the target moves parallel to the general course of the SV, the detection probability is P_{\det} = 0.37 > h = 0.07; thus, using the defender is justified.

The results of the modeling are presented in Figure 7. As in the first example, Figure 7a corresponds to the discrete approach to the simulation and the process θ_{t}, and Figure 7b relates to the continuous approach and the process Θ_{t}.

The accuracy of the interception of the SV by the defender, the so-called terminal miss, depends on the parameter $Δ t_{u}$, the time interval between corrections of the defender’s control. Figure 8 presents the results of different simulations of the interception of the SV by the defender for the discrete approach. Figure 8a corresponds to the case of $Δ t_{u} = τ_{0}$. The significant miss of the defender can be explained by the relatively long duration $Δ t_{u}$ of its movement without control correction and by an “inconvenient” realization of the tack, which, combined with the SV’s velocity advantage ($β < 1$), allowed the SV to avoid interception by the defender. However, decreasing the parameter $Δ t_{u}$ gave more satisfactory results, as shown in Figure 8b. For two similar realizations of the process $θ_{t}$ (blue lines), the trajectories of the controlled defender (red lines) differed markedly depending on the parameter $Δ t_{u}$ ($τ_{0}$ and $τ_{0} / 10$, respectively).
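The role of the correction interval can be illustrated with a small sample-and-hold experiment. The sketch below is not the optimal control law of this article: as a stand-in, the defender simply points at the current SV position (pure pursuit) at each correction instant and keeps that heading for the next $Δ t_{u}$. The SV model (two-state tack process with exponential holding times of mean $τ_{0}$), the function name, and all arguments are assumptions made for illustration. The miss is measured as the Y-distance at the instant when the X-coordinates coincide.

```python
import math
import random


def terminal_miss(dt_u, tau0=60.0, tan_alpha=0.5, beta=0.5,
                  start=(300.0, 20.0), dt=0.05, seed=1):
    """Return (theta_time, miss) for a defender whose heading is held for dt_u.

    Assumptions: SV speed along the course is 1 (normalized scale); the
    defender uses pure pursuit as a stand-in for the article's control law.
    """
    rng = random.Random(seed)
    tack = rng.choice((-1, 1))
    next_switch = rng.expovariate(1.0 / tau0)
    sx, sy = 0.0, 0.0          # SV position
    dx, dy = start             # defender position
    t, next_update, heading = 0.0, 0.0, math.pi
    while sx < dx:             # stop when the X-coordinates coincide
        if t >= next_update:   # control correction every dt_u
            heading = math.atan2(sy - dy, sx - dx)
            next_update += dt_u
        while t >= next_switch:  # random change of tack
            tack = -tack
            next_switch += rng.expovariate(1.0 / tau0)
        sx += dt
        sy += tack * tan_alpha * dt
        dx += beta * math.cos(heading) * dt
        dy += beta * math.sin(heading) * dt
        t += dt
    return t, abs(sy - dy)


if __name__ == "__main__":
    for dt_u in (60.0, 6.0):   # tau0 and tau0 / 10
        print(dt_u, terminal_miss(dt_u))
```

Because the same `seed` is used for both calls, the SV follows the same tack realization and only the correction interval differs. The numbers printed depend on the seed and on the stand-in heading rule, so they should not be read as reproducing Figure 8.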

6.3. Comparison with Classic Guidance Methods

The optimal control law of the defender obtained here was compared with classic guidance methods, mentioned in the Introduction, such as the pursuit guidance method and parallel guidance, which is a specific case of the proportional navigation guidance method. On average, our method gave better results than the others. In Figure 9, a typical realization of different simulated guidance methods is presented. The orange line designates the trajectory of the defender, acting according to the pursuit guidance method; the red line denotes the trajectory generated by the parallel guidance algorithm; the blue graph shows the SV’s movement. The defender, controlled according to Equations (31), (33) and (36), has a green trajectory. Dashed lines illustrate the distances on the Y axis between the SV and defender at instant $ϑ$ when their X-coordinates coincide.

As one can see, the green defender came closer to the SV than the others. Classic guidance methods are effective when the pursuer is faster than the evader. That is not the case in the current study, because the defender’s velocity $β$ was less than the velocity of the SV. Moreover, the classic guidance methods are not intended to be used for intercepting stochastic targets, unlike the control law obtained in this article as a solution of the stochastic optimal control problem.
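For readers unfamiliar with the two classic methods, the sketch below computes the commanded heading for each in a single geometry. It uses the textbook definitions: pure pursuit points the velocity at the target, and parallel guidance matches the target's velocity component normal to the line of sight so that the line of sight does not rotate. The function names are hypothetical, and the fallback used when the pursuer is too slow to match the normal component is an assumption of this sketch, not something taken from the article.

```python
import math


def pursuit_heading(pursuer, target):
    """Pure pursuit: head straight at the target's current position."""
    return math.atan2(target[1] - pursuer[1], target[0] - pursuer[0])


def parallel_heading(pursuer, target, v_target, beta):
    """Parallel guidance: keep the line of sight (LOS) from rotating.

    The pursuer's velocity component normal to the LOS is set equal to the
    target's; the remaining speed is spent closing along the LOS.
    """
    los = math.atan2(target[1] - pursuer[1], target[0] - pursuer[0])
    ex, ey = math.cos(los), math.sin(los)   # unit vector along the LOS
    nx, ny = -ey, ex                        # unit vector normal to the LOS
    v_n = v_target[0] * nx + v_target[1] * ny
    if abs(v_n) > beta:
        # Assumption: a pursuer too slow to match v_n spends all of its
        # speed on the normal component (LOS rotation cannot be cancelled).
        p_n, p_e = math.copysign(beta, v_n), 0.0
    else:
        p_n, p_e = v_n, math.sqrt(beta * beta - v_n * v_n)
    return math.atan2(p_e * ey + p_n * ny, p_e * ex + p_n * nx)


if __name__ == "__main__":
    defender, sv = (300.0, 20.0), (0.0, 0.0)
    sv_velocity = (1.0, 0.5)                # one tack with tan(alpha) = 0.5
    print(pursuit_heading(defender, sv))
    print(parallel_heading(defender, sv, sv_velocity, beta=0.5))
```

With a defender slower than the SV, the feasibility branch matters: whenever the SV's velocity component normal to the line of sight exceeds $β$, parallel guidance in its strict form has no solution, which is one way to see why these methods lose effectiveness in the setting considered here.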

7. Conclusions

The article considered an “attacker–target–defender”-type problem of interaction on a plane between a search system, consisting of one search vehicle with a circular detection zone, and a mobile searched object. The search vehicle tacked randomly along a given general course towards the searched object, and its movement was described using a Markov jump-like process. The searched object had a mobile defender onboard, which can be used for the distraction and destruction of the search vehicle if the latter poses a danger of detecting the searched object. A distinctive feature of this problem is that the defender has lower dynamic capabilities than the search vehicle being intercepted.

It was shown that the optimal control problem of the interception of a search vehicle, although stochastic in nature, can be transformed into the classic deterministic problem of optimal control in the class of piecewise-programmatic controls. The optimal time of interception was estimated, and an optimal control law was found. Numerical simulations for both the discrete and continuous (stochastic and deterministic) problems were presented to demonstrate the effectiveness of the obtained results. Furthermore, a comparison with interception solutions based on classic guidance laws was presented.

In the future, it is planned to consider a similar problem statement with a group of search vehicles instead of one.

Author Contributions

Conceptualization, A.A.G. and E.Y.R.; methodology, A.A.G. and E.Y.R.; software, P.V.L.; validation, A.A.G., P.V.L. and E.Y.R.; formal analysis, A.A.G., P.V.L. and E.Y.R.; investigation, A.A.G. and E.Y.R.; writing—original draft preparation, A.A.G., P.V.L. and E.Y.R.; writing—review and editing, A.A.G., P.V.L. and E.Y.R.; visualization, P.V.L.; supervision, A.A.G. and E.Y.R.; project administration, A.A.G.; funding acquisition, A.A.G. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was partially supported by the Program of Basic Research of RAS.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SO	Searched object
SV	Search vehicle
UAV	Unmanned aerial vehicle
UUV	Unmanned underwater vehicle
ASV	Autonomous surface vehicle

References

Stone, L.D.; Royset, J.O.; Washburn, A.R. Optimal Search for Moving Targets; Springer International Publishing: Cham, Switzerland, 2016.
Meghjani, M.; Manjanna, S.; Dudek, G. Multi-target search strategies. In Proceedings of the 2016 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Lausanne, Switzerland, 23–27 October 2016; pp. 328–333.
Youngchul, B. Target searching method in the chaotic mobile robot. In Proceedings of the 23rd Digital Avionics Systems Conference, Salt Lake City, UT, USA, 24–28 October 2004; Volume 12, pp. 7–12.
Austin, D.J.; Jensfelt, P. Using multiple Gaussian hypotheses to represent probability distributions for mobile robot localization. In Proceedings of the 2000 ICRA, Millennium Conference, IEEE International Conference on Robotics and Automation, San Francisco, CA, USA, 24–28 April 2000; pp. 1036–1041.
Galyaev, A.A.; Lysenko, P.V.; Yakhno, V.P. Optimal Path Planning for an Object in a Random Search Region. Autom. Remote Control 2018, 79, 2080–2089.
Shaikin, M.E. On statistical risk functional in a control problem for an object moving in a conflict environment. J. Comput. Syst. Sci. Int. 2011, 1, 22–31.
Sysoev, L.P. Criterion of detection probability on trajectory in the problem of object motion control in threat environment. Control Probl. 2010, 6, 65–72.
Andreev, K.V.; Rubinovich, E.Y. Moving observer trajectory control by angular measurements in tracking problem. Autom. Remote Control 2016, 77, 106–129.
Wettergren, T.A.; Baylog, J.G. Collaborative search planning for multiple vehicles in nonhomogeneous environments. In Proceedings of the OCEANS 2009, Biloxi, MS, USA, 26–29 October 2009; pp. 1–7.
Wang, Y.; Hussein, I.I. Search and Classification Using Multiple Autonomous Vehicles; Springer: London, UK, 2012.
Galyaev, A.A.; Maslov, E.P. Optimization of a mobile object evasion laws from detection. J. Comput. Syst. Sci. Int. 2010, 49, 560–569.
Galyaev, A.A.; Maslov, E.P. Optimization of the Law of Moving Object Evasion from Detection under Constraints. Autom. Remote Control 2012, 73, 992–1004.
Zabarankin, M.; Uryasev, S.; Pardalos, P. Optimal Risk Path Algorithms Cooperative Control and Optimization; Murphey, P., Ed.; Kluwer Acad: Dordrecht, The Netherlands, 2002; Volume 66, pp. 271–303.
Sidhu, H.; Mercer, G.; Sexton, M. Optimal trajectories in a threat environment. J. Battlef. Technol. 2006, 9, 33–39.
Dogan, A.; Zengin, U. Unmanned Aerial Vehicle Dynamic-Target Pursuit by Using Probabilistic Threat Exposure Map. J. Guid. Control Dyn. 2006, 29, 723–732.
Meyer, Y.; Isaiah, P.; Shima, T. On Dubins paths to intercept a moving target. Automatica 2015, 53, 256–263.
Zheng, Y.; Chen, Z.; Shao, X.; Zhao, W. Time-optimal guidance for intercepting moving targets by Dubins vehicles. Automatica 2021, 128, 109557.
Buzikov, M.E.; Galyaev, A.A. Time-minimal interception of a moving target by Dubins car. Autom. Remote Control 2021, 82, 745–758.
Guelman, M.; Shinar, J. Optimal guidance law in the plane. J. Guid. Control Dyn. 1984, 7, 471–476.
Glizer, V.Y. Optimal planar interception with fixed end conditions: Closed-form solution. J. Optim. Theory Appl. 1996, 88, 503–539.
Gopalan, A.; Ratnoo, A.; Ghose, D. Time-optimal guidance for lateral interception of moving targets. J. Guid. Control Dyn. 2016, 39, 510–525.
Manyam, S.G.; Casbeer, D.W. Intercepting a target moving on a racetrack path. In Proceedings of the 2020 International Conference on Unmanned Aircraft Systems (ICUAS), Athens, Greece, 1–4 September 2020; pp. 799–806.
Siouris, G.M. Missile Guidance and Control Systems; Springer: New York, NY, USA, 2004.
Lin, C.F. Modern Navigation, Guidance, and Control Processing; Prentice Hall: Hoboken, NJ, USA, 1991.
Palumbo, N.F.; Blauwkamp, R.; Lloyd, J. Basic Principles of Homing Guidance. Johns Hopkins APL Tech. Digest. 2010, 29, 25–41.
Pachter, M.; Garcia, E.; Casbeer, D.W. Active target defense differential game. In Proceedings of the 52nd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 30 September–3 October 2014; pp. 46–53.
Pachter, M.; Garcia, E.; Casbeer, D.W. Active Target defense differential game with a fast Defender. In Proceedings of the American Control Conference (ACC), Chicago, IL, USA, 1–3 July 2015; pp. 3752–3757.
Zhang, J.; Zhuang, J. Modeling a Multi-target Attacker-defender Game with Multiple Attack Types. Reliab. Eng. Syst. Saf. 2019, 185, 465–475.
Galyaev, A.A.; Dobrovidov, A.V.; Lysenko, P.V.; Shaikin, M.E.; Yakhno, V.P. Path Planning in Threat Environment for UUV with Non-Uniform Radiation Pattern. Sensors 2020, 20, 2076.
Krichagina, N.V.; Liptser, R.S.; Rubinovich, E.Y. Kalman filter for Markov processes. In Statistics and Control of Stochastic Processes; Publ. Div.: New York, NY, USA, 1985; pp. 197–213.
Ross, I.M. A Primer on Pontryagin’s Principle in Optimal Control; Collegiate Publishers: San Francisco, CA, USA, 2009.

Figure 1. The SV’s trajectory.

Figure 2. Velocity diagram of the SV.

Figure 3. Relative positions of the SV and SO.

Figure 4. Histograms of the probability detection distribution density of the target moving at a constant velocity.

Figure 5. Geometry of the problem.

Figure 6. Intrusion of the SV’s detection zone. (a) SV and defender trajectories corresponding to the path of $θ_{t}$; (b) SV and defender trajectories corresponding to the path of $Θ_{t}$.

Figure 7. Destruction of the SV. (a) SV and defender trajectories corresponding to the path of $θ_{t}$; (b) SV and defender trajectories corresponding to the path of $Θ_{t}$.

Figure 8. Interception trajectories with different values of $Δ t_{u}$.

Figure 9. Comparison of different guidance methods.

Table 1. The detection probability of the target $P_{\det}$ at its various initial positions $E^{0} = (L, l)$.

L	l	$P_{\det}$
5	1.5	0.238
5	2.5	0.017
10	1.5	0.304
10	2.5	0.065

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
