1. Introduction
The theory of deterministic pursuit–evasion games can be attributed almost single-handedly to Isaacs in the 1950s [1]. Here, Isaacs first considered differential games as two-player zero-sum games. One early application was the formulation of missile guidance systems during his time with the RAND Corporation. Shortly thereafter, Kalman among others initiated the linear quadratic regulator (LQR) and the linear quadratic tracker (LQT) in the continuous and discrete cases (see Refs. [2,3,4,5]). Since then, the concepts of pursuit–evasion games and optimal control have been closely related, each playing a fundamental role in control engineering and economics. One breakout paper that combined these concepts was written by Ho, Bryson, and Baron, who studied linear quadratic pursuit–evasion games (LQPEG) as regulator problems [6,7]. In particular, this work included a three-dimensional target interception problem. Since then, a number of papers have extended these results in the continuous and discrete cases. One of the issues that researchers have faced in the past is the discrete nature of these mixed strategies.
In 1988, Stefan Hilger initiated the theory of dynamic equations on time scales, which seeks to unify and extend discrete and continuous analysis [8]. As a result, we can generalize a process to account for both cases, or any combination of the two, provided we restrict ourselves to closed, nonempty subsets of the reals (a time scale). From a numerical viewpoint, this theory can be thought of as a generalized sampling technique that allows a researcher to evaluate processes with continuous, discrete, or uneven measurements. Since its inception, this area of mathematics has gained a great deal of international attention. Researchers have since found applications of time scales in heat transfer, population dynamics, and economics. For a more in-depth study of time scales, see Bohner and Peterson's books [9,10].
A number of researchers have sought to combine this field with the theory of control, contributing generalizations of the basic notions of controllability and observability (see Refs. [11,12,13,14,15]). Bohner first provided the conditions for optimality for dynamic control processes in Ref. [16]. DaCunha unified Lyapunov and Floquet theory in his dissertation [17]. Hilscher and Zeidan have studied optimal control for symplectic systems [18]. Additional contributions can be found in Refs. [19,20,21,22,23,24], among several others.
In this paper, we study a natural extension of the LQR and LQT previously generalized to dynamic equations on time scales (see Refs. [25,26]). Here, we consider the following separable dynamic systems
$$x^{\Delta}(t) = A_1 x(t) + B_1 u(t), \quad x(t_0) = x_0, \qquad y^{\Delta}(t) = A_2 y(t) + B_2 v(t), \quad y(t_0) = y_0, \tag{1}$$
where $x, y : \mathbb{T} \to \mathbb{R}^{n}$ represent the pursuer and evader states, respectively, and $u : \mathbb{T} \to \mathbb{R}^{m_1}$ and $v : \mathbb{T} \to \mathbb{R}^{m_2}$ are the respective controls. In general, we can assume $m_1, m_2 \leq n$. Note that $A_1, A_2 \in \mathbb{R}^{n \times n}$ are the associated state matrices, while $B_1 \in \mathbb{R}^{n \times m_1}$ and $B_2 \in \mathbb{R}^{n \times m_2}$ are the corresponding control matrices. Here, the pursuing state seeks to overtake the evading state at time $t_f$, while the evading state seeks an escape. For convenience, we make the following assumptions. First, we assume the matrices for both players are constant (i.e., the systems are linear time-invariant). However, it should be noted that the control schemes developed throughout can be adapted to the time-varying case in a similar fashion. Second, we assume that the pursuer and evader dynamic systems are both controllable and belong to the same time scale.
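For intuition (a consequence of the delta derivative recalled in Section 2, not a separate assumption), on the uniform time scale $\mathbb{T} = h\mathbb{Z}$ the pursuer's equation in (1) becomes the familiar sampled-data recursion
$$x(t + h) = (I + h A_1)\, x(t) + h B_1 u(t), \qquad t \in h\mathbb{Z},$$
which recovers the usual continuous-time model as $h \to 0^{+}$ and a standard discrete-time model for $h = 1$.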
Next, we note our state equations are associated with the cost functional
$$J(u, v) = \frac{1}{2}\,\big[x(t_f) - y(t_f)\big]^{\top} S_f\, \big[x(t_f) - y(t_f)\big] + \frac{1}{2}\int_{t_0}^{t_f} \Big[ u^{\top}(t) R_1\, u(t) - v^{\top}(t) R_2\, v(t) \Big]\,\Delta t, \tag{2}$$
where $S_f \geq 0$ is constant and diagonal, $R_1 > 0$, and $R_2 > 0$. Note that the goal of the pursuing state is to minimize (2), while the evading state seeks to maximize it. Since these states represent opposing players, evaluating this cost can be thought of as a minimax problem.
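In symbols (anticipating the formal Definition 9 below), a pair of optimal strategies $(u^{*}, v^{*})$ resolves this minimax problem as a saddle point:
$$J(u^{*}, v) \;\leq\; J(u^{*}, v^{*}) \;\leq\; J(u, v^{*}) \qquad \text{for all admissible } u, v.$$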
The pursuit–evasion framework remains an active area across multiple disciplines, as found in Refs. [27,28,29,30,31,32,33]. It should be noted that there have been other excursions in combining dynamic games with the time scales calculus. Libich and Stehlík introduced macroeconomic policy games on time scales with inefficient equilibria in Ref. [34]. Martins and Torres considered $N$-player games in which each player sought to minimize a shared cost functional. Mozhegova and Petrov introduced a simple pursuit problem in Ref. [35] and a dynamic analogue of the "Cossacks-robbers" game in Ref. [36]. Minh and Phuong have previously studied linear pursuit–evasion games on time scales in Ref. [37]. However, those results do not include a regulator/saddle-point framework and are less complete than the treatment given in this manuscript.
The organization of this paper is as follows. Section 2 presents core definitions and concepts of the time scales calculus. We offer the variational properties needed for an optimal strategy to exist in Section 3. In Section 4, we seek a mixed strategy when the final states are both fixed. In this setting, we can rewrite our cost functional (2) in terms of the difference in gramians of each system. For Section 5, we find a pair of controls in terms of an extended state. In Section 6, we offer some examples, including a numerical result. Finally, we provide some concluding remarks and future plans in Section 7. In Table 1 below, we summarize the notation used throughout the manuscript.
2. Time Scales Preliminaries
Here we offer a brief introduction to the theory of dynamic equations on time scales. For a more in-depth study of time scales, see Bohner and Peterson's books [9,10].
Definition 1. A time scale $\mathbb{T}$ is an arbitrary nonempty closed subset of the real numbers. We let $\mathbb{T}^{\kappa} := \mathbb{T} \setminus \{\max \mathbb{T}\}$ if $\max \mathbb{T}$ exists; otherwise, $\mathbb{T}^{\kappa} := \mathbb{T}$.
Example 1. The most common examples of time scales are $\mathbb{R}$, $\mathbb{Z}$, $h\mathbb{Z}$ for $h > 0$, and $q^{\mathbb{N}_0} := \{q^{k} : k \in \mathbb{N}_0\}$ for $q > 1$.
Next, we introduce two time scales calculus concepts used throughout this paper.
Definition 2. We define the forward jump operator $\sigma : \mathbb{T} \to \mathbb{T}$ and the graininess function $\mu : \mathbb{T} \to [0, \infty)$ by
$$\sigma(t) := \inf\{s \in \mathbb{T} : s > t\} \quad \text{and} \quad \mu(t) := \sigma(t) - t.$$
We say that $t \in \mathbb{T}$ is right-scattered when $\sigma(t) > t$ and right-dense when $\sigma(t) = t$. Below, we note how the forward jump operator is applied to functions.
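First, a few standard instances of $\sigma$ and $\mu$ (ours, for orientation):
$$\mathbb{T} = \mathbb{R}: \ \sigma(t) = t, \ \mu(t) = 0; \qquad \mathbb{T} = h\mathbb{Z}: \ \sigma(t) = t + h, \ \mu(t) = h; \qquad \mathbb{T} = q^{\mathbb{N}_0}: \ \sigma(t) = qt, \ \mu(t) = (q - 1)t.$$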
Definition 3. For any function $f : \mathbb{T} \to \mathbb{R}$, we define the function $f^{\sigma} : \mathbb{T} \to \mathbb{R}$ by $f^{\sigma} := f \circ \sigma$, i.e., $f^{\sigma}(t) = f(\sigma(t))$.
Next, we define the delta (or Hilger) derivative as follows.
Definition 4. Assume $f : \mathbb{T} \to \mathbb{R}$ and let $t \in \mathbb{T}^{\kappa}$. The delta derivative $f^{\Delta}(t)$ is the number (when it exists) such that, given any $\varepsilon > 0$, there is a neighborhood U of t such that
$$\left| f(\sigma(t)) - f(s) - f^{\Delta}(t)\big[\sigma(t) - s\big] \right| \leq \varepsilon\, \big|\sigma(t) - s\big| \quad \text{for all } s \in U.$$
In the next two theorems, we consider some properties of the delta derivative.
Theorem 1. (See Ref. [9], Theorem 1.16). Suppose $f : \mathbb{T} \to \mathbb{R}$ is a function and let $t \in \mathbb{T}^{\kappa}$. Then we have the following:
- a. If f is differentiable at t, then f is continuous at t.
- b. If f is continuous at t, where t is right-scattered, then f is differentiable at t and
$$f^{\Delta}(t) = \frac{f(\sigma(t)) - f(t)}{\mu(t)}.$$
- c. If f is differentiable at t, where t is right-dense, then
$$f^{\Delta}(t) = \lim_{s \to t} \frac{f(t) - f(s)}{t - s}.$$
- d. If f is differentiable at t, then
$$f(\sigma(t)) = f(t) + \mu(t)\, f^{\Delta}(t). \tag{3}$$
Note that (3) is called the “simple useful formula.”
Example 2. Note the following examples.
- a. When $\mathbb{T} = \mathbb{R}$, then (if the limit exists)
$$f^{\Delta}(t) = \lim_{s \to t} \frac{f(t) - f(s)}{t - s} = f'(t).$$
- b. When $\mathbb{T} = \mathbb{Z}$, then $f^{\Delta}(t) = \Delta f(t) = f(t + 1) - f(t)$.
- c. When $\mathbb{T} = h\mathbb{Z}$ for $h > 0$, then
$$f^{\Delta}(t) = \frac{f(t + h) - f(t)}{h}.$$
- d. When $\mathbb{T} = q^{\mathbb{N}_0}$ for $q > 1$, then
$$f^{\Delta}(t) = \frac{f(qt) - f(t)}{(q - 1)\, t}.$$
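A minimal numerical sketch (ours, not from the paper) that checks the formulas of Example 2 for $f(t) = t^2$; the function and sampling points are purely illustrative:

```python
def delta_derivative(f, t, sigma_t):
    """f^Delta(t): difference quotient over [t, sigma(t)] when t is
    right-scattered; ordinary derivative (approximated) when right-dense."""
    mu = sigma_t - t
    if mu == 0:                           # right-dense point: f^Delta = f'
        h = 1e-8
        return (f(t + h) - f(t)) / h
    return (f(sigma_t) - f(t)) / mu

f = lambda t: t ** 2

# T = Z: sigma(t) = t + 1, so f^Delta(t) = 2t + 1
print(delta_derivative(f, 3.0, 4.0))      # 7.0

# T = hZ with h = 0.5: sigma(t) = t + h, so f^Delta(t) = 2t + h
print(delta_derivative(f, 3.0, 3.5))      # 6.5

# T = q^{N_0} with q = 2: sigma(t) = qt, so f^Delta(t) = (q + 1) t
print(delta_derivative(f, 4.0, 8.0))      # 12.0

# T = R: mu = 0, recovering f'(t) = 2t
print(delta_derivative(f, 3.0, 3.0))      # approximately 6.0
```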
Next we consider the linearity property as well as the product rules.
Theorem 2. (See Ref. [9], Theorem 1.20). Let $f, g : \mathbb{T} \to \mathbb{R}$ be differentiable at $t \in \mathbb{T}^{\kappa}$. Then we have the following:
- a. For any constants α and β, the sum $\alpha f + \beta g$ is differentiable at t with
$$(\alpha f + \beta g)^{\Delta}(t) = \alpha f^{\Delta}(t) + \beta g^{\Delta}(t).$$
- b. The product $fg$ is differentiable at t with
$$(fg)^{\Delta}(t) = f^{\Delta}(t)\, g(t) + f(\sigma(t))\, g^{\Delta}(t) = f(t)\, g^{\Delta}(t) + f^{\Delta}(t)\, g(\sigma(t)).$$
Before introducing integration on time scales, we require our functions to be rd-continuous.
Definition 5. A function $f : \mathbb{T} \to \mathbb{R}$ is said to be rd-continuous on $\mathbb{T}$ when f is continuous at every right-dense point $t \in \mathbb{T}$ and has finite left-sided limits at every left-dense point $t \in \mathbb{T}$ (i.e., points with $\sup\{s \in \mathbb{T} : s < t\} = t$). The class of rd-continuous functions is denoted by $C_{\mathrm{rd}}$. The set of functions that are differentiable and whose derivative is rd-continuous is denoted by $C^{1}_{\mathrm{rd}}$.
Next, we define an antiderivative on time scales.
Theorem 3. (See Ref. [9], Theorem 1.74). Any rd-continuous function f has an antiderivative F, i.e., $F^{\Delta} = f$ holds on $\mathbb{T}^{\kappa}$. Now we introduce our fundamental theorem of calculus.
Definition 6. Let $f \in C_{\mathrm{rd}}$ and let F be any function such that $F^{\Delta}(t) = f(t)$ for all $t \in \mathbb{T}^{\kappa}$. Then the Cauchy integral of f is defined by
$$\int_{r}^{s} f(t)\,\Delta t := F(s) - F(r) \quad \text{for all } r, s \in \mathbb{T}.$$
Example 3. Let $a, b \in \mathbb{T}$ with $a \leq b$ and assume that $f \in C_{\mathrm{rd}}$.
- a. When $\mathbb{T} = \mathbb{R}$, then $\int_a^b f(t)\,\Delta t = \int_a^b f(t)\,dt$.
- b. When $\mathbb{T} = \mathbb{Z}$, then $\int_a^b f(t)\,\Delta t = \sum_{t=a}^{b-1} f(t)$.
- c. When $\mathbb{T} = h\mathbb{Z}$ for $h > 0$, then
$$\int_a^b f(t)\,\Delta t = \sum_{k = a/h}^{b/h - 1} f(kh)\, h.$$
- d. When $\mathbb{T} = q^{\mathbb{N}_0}$ for $q > 1$, then
$$\int_a^b f(t)\,\Delta t = (q - 1) \sum_{t \in [a, b) \cap \mathbb{T}} t\, f(t).$$
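As a quick illustration (ours) of how the same integrand produces different values on different time scales, take $f(t) = t$ on $[0, 4]$:
$$\mathbb{T} = \mathbb{R}: \int_0^4 t\,\Delta t = \int_0^4 t\,dt = 8; \qquad \mathbb{T} = \mathbb{Z}: \int_0^4 t\,\Delta t = 0 + 1 + 2 + 3 = 6; \qquad \mathbb{T} = 2\mathbb{Z}: \int_0^4 t\,\Delta t = 2(0) + 2(2) = 4.$$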
Next, we introduce our regressivity condition, used throughout the manuscript.
Definition 7. An $n \times n$ matrix-valued function A on $\mathbb{T}$ is rd-continuous if each of its entries is rd-continuous. Furthermore, if
$$I + \mu(t)\, A(t) \ \text{is invertible for all } t \in \mathbb{T}^{\kappa},$$
A is said to be regressive (we write $A \in \mathcal{R}$). Note that for our purposes, the matrix exponential used in this paper is defined to be the solution to the dynamic equation below.
Theorem 4. (See Ref. [9], Theorem 5.8). Suppose that A is regressive and rd-continuous. Then the initial value problem
$$X^{\Delta} = A(t)\, X, \qquad X(t_0) = I,$$
where I is the identity matrix, has a unique matrix-valued solution X.
Definition 8. The solution X from Theorem 4 is called the matrix exponential function on $\mathbb{T}$ and is denoted by $e_A(\cdot, t_0)$.
Next, we offer useful properties associated with the matrix exponential.
Theorem 5. (See Ref. [9], Theorem 5.21). Let A be regressive and rd-continuous. Then for $t, s, r \in \mathbb{T}$,
- a. $e_0(t, s) \equiv I$, hence $e_A(t, t) \equiv I$,
- b. $e_A(\sigma(t), s) = \big(I + \mu(t) A(t)\big)\, e_A(t, s)$,
- c. $e_A^{-1}(t, s) = e_{\ominus A^{\top}}^{\top}(t, s)$, where $\ominus A := -(I + \mu A)^{-1} A$,
- d. $e_A(t, s) = e_A^{-1}(s, t)$,
- e. $e_A(t, s)\, e_A(s, r) = e_A(t, r)$.
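Property (b) above gives a direct way to evaluate $e_A(t, t_0)$ on a purely isolated time scale: step forward one point at a time, multiplying by $I + \mu(t)A$. A minimal sketch (ours; the matrix and sampling points are illustrative assumptions):

```python
import numpy as np

def time_scale_exp(A, ts, i0, i1):
    """e_A(ts[i1], ts[i0]) for constant A on the isolated time scale ts,
    built from Theorem 5(b): e_A(sigma(t), t0) = (I + mu(t) A) e_A(t, t0)."""
    X = np.eye(A.shape[0])
    for k in range(i0, i1):
        mu = ts[k + 1] - ts[k]              # graininess at ts[k]
        X = (np.eye(A.shape[0]) + mu * A) @ X
    return X

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
ts = [0.0, 0.1, 0.25, 0.5, 1.0, 1.5]        # unevenly sampled points
print(time_scale_exp(A, ts, 0, len(ts) - 1))
```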
Next we give the solution (state response) to our linear system using a variation of parameters.
Theorem 6. (See Ref. [9], Theorem 5.24). Let $A \in \mathcal{R}$ be an $n \times n$ matrix-valued function on $\mathbb{T}$ and suppose that $f : \mathbb{T} \to \mathbb{R}^{n}$ is rd-continuous. Let $t_0 \in \mathbb{T}$ and $x_0 \in \mathbb{R}^{n}$. Then the solution of the initial value problem
$$x^{\Delta} = A(t)\, x + f(t), \qquad x(t_0) = x_0,$$
is given by
$$x(t) = e_A(t, t_0)\, x_0 + \int_{t_0}^{t} e_A(t, \sigma(s))\, f(s)\,\Delta s.$$
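On an isolated time scale, the integral in Theorem 6 collapses to a graininess-weighted sum, so the state response can be computed by stepping pointwise via the simple useful formula. A short self-contained sketch (ours; system, forcing, and sampling are illustrative assumptions):

```python
import numpy as np

def state_response(A, f, ts, x0):
    """x along the isolated time scale ts solving x^Delta = A x + f(t),
    x(ts[0]) = x0, via x(sigma(t)) = x(t) + mu(t) (A x(t) + f(t))."""
    xs = [np.array(x0, dtype=float)]
    for k in range(len(ts) - 1):
        mu = ts[k + 1] - ts[k]                  # graininess at ts[k]
        xs.append(xs[-1] + mu * (A @ xs[-1] + f(ts[k])))
    return xs

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
ts = [0.0, 0.1, 0.25, 0.5, 1.0, 1.5]            # uneven sampling
f = lambda t: np.array([0.0, 1.0])              # illustrative forcing
print(state_response(A, f, ts, [1.0, 0.0])[-1]) # x(t_f)
```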
3. Optimization of Linear Systems on Time Scales
In this section, we make use of variational methods on time scales as introduced by Bohner in Ref. [16]. First, note that the state equations in (1) are uncoupled. However, when establishing our conditions for a saddle point, we would prefer to use a state that combines the information of the pursuer and evader. For convenience, we rewrite (1) as
$$z^{\Delta}(t) = A z(t) + B u(t) + C v(t), \qquad z(t_0) = z_0, \tag{4}$$
where z represents an extended state given by
$$z = \begin{pmatrix} x \\ y \end{pmatrix}, \quad A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}, \quad B = \begin{pmatrix} B_1 \\ 0 \end{pmatrix}, \quad C = \begin{pmatrix} 0 \\ B_2 \end{pmatrix}.$$
Associated with (4) is the quadratic cost functional
$$J(u, v) = \frac{1}{2}\, z^{\top}(t_f)\, S(t_f)\, z(t_f) + \frac{1}{2}\int_{t_0}^{t_f} \Big[ u^{\top}(t) R_1\, u(t) - v^{\top}(t) R_2\, v(t) \Big]\,\Delta t, \tag{5}$$
where $S(t_f) = \begin{pmatrix} S_f & -S_f \\ -S_f & S_f \end{pmatrix} \geq 0$, $R_1 > 0$, and $R_2 > 0$. To minimize (5), we introduce the augmented cost functional
$$J_a = \frac{1}{2}\, z^{\top}(t_f)\, S(t_f)\, z(t_f) + \int_{t_0}^{t_f} \Big[ H(t) - \big(\lambda^{\sigma}(t)\big)^{\top} z^{\Delta}(t) \Big]\,\Delta t,$$
where the so-called Hamiltonian H is given by
$$H = \frac{1}{2}\big( u^{\top} R_1\, u - v^{\top} R_2\, v \big) + \big(\lambda^{\sigma}\big)^{\top}\big( A z + B u + C v \big), \tag{6}$$
and $\lambda$ represents a multiplier to be determined later.
Remark 1. Our treatment of (1) differs from the argument used by Ho, Bryson, and Baron in Ref. [6]. In their paper, they appealed to state estimates of the pursuer and evader to evaluate the cost functional. Their argument was motivated by the notion that, in studying pursuing and evading missiles, the difference in altitude could be considered negligible. As a result of our rewriting of (1), we are not required to make such a restriction. Next, we provide necessary conditions for an optimal control. We assume that
$$u \text{ and } v \text{ are rd-continuous} \tag{7}$$
for all $t \in \mathbb{T}^{\kappa}$ such that $t_0 \leq t \leq t_f$.
In the following result, we determine the equations our extended state, costate, and controls must satisfy.
Lemma 1. Let (5) be the cost functional associated with (4). Assume (7) holds. Then the first variation, $\delta J_a$, is zero provided that z, λ, u, and v satisfy
$$z^{\Delta} = \frac{\partial H}{\partial \lambda^{\sigma}}, \qquad \lambda^{\Delta} = -\frac{\partial H}{\partial z}, \qquad 0 = \frac{\partial H}{\partial u}, \qquad 0 = \frac{\partial H}{\partial v}. \tag{8}$$
Proof. Taking the first variation of the augmented cost functional $J_a$ and rearranging terms, $\delta J_a$ can be written as a sum of terms whose coefficients multiply the independent increments $\delta z$, $\delta\lambda$, $\delta u$, and $\delta v$. Now in order for $\delta J_a = 0$, we set each coefficient of the independent increments $\delta z$, $\delta\lambda$, $\delta u$, $\delta v$ equal to zero. This yields the necessary conditions for a minimum of (5). Using the Hamiltonian (6), we have the state and costate equations
$$z^{\Delta} = \frac{\partial H}{\partial \lambda^{\sigma}} = A z + B u + C v$$
and
$$\lambda^{\Delta} = -\frac{\partial H}{\partial z} = -A^{\top} \lambda^{\sigma}.$$
Similarly, we have the stationary conditions
$$0 = \frac{\partial H}{\partial u} = R_1 u + B^{\top} \lambda^{\sigma}, \quad \text{i.e.,} \quad u = -R_1^{-1} B^{\top} \lambda^{\sigma},$$
and
$$0 = \frac{\partial H}{\partial v} = -R_2 v + C^{\top} \lambda^{\sigma}, \quad \text{i.e.,} \quad v = R_2^{-1} C^{\top} \lambda^{\sigma}.$$
This concludes the proof. □
The following remark is useful in eliminating the costate later.
Remark 2. We note that z, λ, u, and v solve (8) if and only if they solve
$$z^{\Delta} = A z - M \lambda^{\sigma}, \qquad \lambda^{\Delta} = -A^{\top} \lambda^{\sigma}, \tag{9}$$
where M is a “mixing term” given by
$$M := B R_1^{-1} B^{\top} - C R_2^{-1} C^{\top}. \tag{10}$$
Throughout this paper, we assume that A is regressive. As a result, we can determine an optimal strategy if we know the value of the costate.
Finally, we provide the sufficient conditions for locally optimal controls that ensure a saddle point.
Lemma 2. Let (5) be the cost functional associated with (4). Assume (7) holds. Then the second variation, $\delta^2 J_a$, is positive provided that $\delta z$, $\delta u$, and $\delta v$ satisfy the constraint
$$\delta z^{\Delta} = A\,\delta z + B\,\delta u + C\,\delta v, \qquad \delta z(t_0) = 0,$$
where $\delta u \neq 0$ and v is fixed.
Proof. Taking the second derivative of $J_a$, we obtain a quadratic form in the increments. If we assume that $\delta z$, $\delta u$, and $\delta v$ satisfy the constraint
$$\delta z^{\Delta} = A\,\delta z + B\,\delta u + C\,\delta v, \qquad \delta z(t_0) = 0,$$
then the second variation is given by
$$\delta^2 J_a = \frac{1}{2}\,\delta z^{\top}(t_f)\, S(t_f)\,\delta z(t_f) + \frac{1}{2}\int_{t_0}^{t_f} \Big[ \delta u^{\top} R_1\,\delta u - \delta v^{\top} R_2\,\delta v \Big]\,\Delta t. \tag{11}$$
Note that $S(t_f) \geq 0$ and $R_1 > 0$, while $R_2 > 0$. Thus, if $\delta u \neq 0$ and v is fixed (so that $\delta v = 0$), then (11) is guaranteed to be positive. □
Next, we provide a definition of a saddle point for two competing players.
Definition 9. The pair $(u^{*}, v^{*})$ is a saddle point to the system (4) associated with the cost (5) provided
$$J(u^{*}, v) \leq J(u^{*}, v^{*}) \leq J(u, v^{*})$$
for all admissible u and v. Here, the stationary conditions needed to ensure a saddle point are $\frac{\partial H}{\partial u} = 0$ and $\frac{\partial H}{\partial v} = 0$ (see Ref. [38]). For our purposes, this pair corresponds to the situation in which neither player wishes to deviate from this compromise without being penalized by the other player. It should be understood that this compromise occurs under the natural caveat that the pursuer and evader belong to the same time scale. In this paper, we do not claim that this saddle point must be unique.
4. Fixed Final States Case
In this section, we seek an optimal strategy when the final states are fixed. In this setting, we write the equations for the pursuer and evader separately. Here we consider the state, costate, and stationary equations for the pursuer
$$x^{\Delta} = A_1 x + B_1 u, \qquad \lambda_p^{\Delta} = -A_1^{\top} \lambda_p^{\sigma}, \qquad u = -R_1^{-1} B_1^{\top} \lambda_p^{\sigma}, \tag{12}$$
as well as those for the evader
$$y^{\Delta} = A_2 y + B_2 v, \qquad \lambda_e^{\Delta} = -A_2^{\top} \lambda_e^{\sigma}, \qquad v = R_2^{-1} B_2^{\top} \lambda_e^{\sigma}, \tag{13}$$
associated with the cost functional
$$J(u, v) = \frac{1}{2}\int_{t_0}^{t_f} \Big[ u^{\top}(t) R_1\, u(t) - v^{\top}(t) R_2\, v(t) \Big]\,\Delta t. \tag{14}$$
The following term is needed to establish an optimal control scheme when the final states are fixed.
Definition 10. The initial state difference $d(t_0)$ is the difference between the zero-input pursuing and evading states at the final time, i.e.,
$$d(t_0) := e_{A_1}(t_f, t_0)\, x_0 - e_{A_2}(t_f, t_0)\, y_0.$$
Next, we determine an open-loop strategy for both players. Note that the following theorem mirrors Kalman's generalized controllability criterion as found in [15], Theorem 3.2.
Theorem 7. Suppose that x and $\lambda_p$ solve (12), while y and $\lambda_e$ satisfy (13). Let the gramians for the pursuer and evader
$$G_p(t_0, t_f) := \int_{t_0}^{t_f} e_{A_1}(t_f, \sigma(s))\, B_1 R_1^{-1} B_1^{\top}\, e_{A_1}^{\top}(t_f, \sigma(s))\,\Delta s \tag{15}$$
and
$$G_e(t_0, t_f) := \int_{t_0}^{t_f} e_{A_2}(t_f, \sigma(s))\, B_2 R_2^{-1} B_2^{\top}\, e_{A_2}^{\top}(t_f, \sigma(s))\,\Delta s, \tag{16}$$
respectively, be such that $G_p - G_e$ is invertible for all $t_0, t_f \in \mathbb{T}$ with $t_0 < t_f$. Then u and v can be rewritten as
$$u(t) = -R_1^{-1} B_1^{\top}\, e_{A_1}^{\top}(t_f, \sigma(t))\, \big(G_p - G_e\big)^{-1} w \tag{17}$$
and
$$v(t) = -R_2^{-1} B_2^{\top}\, e_{A_2}^{\top}(t_f, \sigma(t))\, \big(G_p - G_e\big)^{-1} w, \tag{18}$$
where $w := d(t_0) - \big[x(t_f) - y(t_f)\big]$.
Proof. Solving (12) for $\lambda_p$ (using Theorem 5), we have
$$\lambda_p(t) = e_{A_1}^{\top}(t_f, t)\,\nu, \qquad \nu := \lambda_p(t_f),$$
where the terminal costates are coupled by $\lambda_e(t_f) = -\nu$, the shared multiplier enforcing the fixed final states. Using (3) and the stationary condition in (12), the state equation becomes
$$x^{\Delta} = A_1 x - B_1 R_1^{-1} B_1^{\top}\, e_{A_1}^{\top}(t_f, \sigma(t))\,\nu. \tag{19}$$
Now solving (19) with Theorem 6 at time $t_f$, we have
$$x(t_f) = e_{A_1}(t_f, t_0)\, x_0 - G_p(t_0, t_f)\,\nu.$$
Similarly, the final state for the evader can be written as
$$y(t_f) = e_{A_2}(t_f, t_0)\, y_0 - G_e(t_0, t_f)\,\nu.$$
Taking the difference in the final states and rearranging, we have
$$\nu = \big(G_p - G_e\big)^{-1}\Big[ d(t_0) - \big(x(t_f) - y(t_f)\big) \Big] = \big(G_p - G_e\big)^{-1}\, w. \tag{20}$$
Finally, plugging (20) into the stationary condition for u in (12), together with the costate representation above, yields (17). The equation for v can be shown similarly. This concludes the proof. □
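A brief numerical sketch (ours, not from the paper) of the ingredients of Theorem 7 on an isolated time scale; the matrices, weights, and sampling points are illustrative assumptions, and the printed value corresponds to the interception case $x(t_f) = y(t_f)$, so that $w = d(t_0)$:

```python
import numpy as np

def ts_exp(A, ts, i0, i1):
    """e_A(ts[i1], ts[i0]) for constant A on the isolated time scale ts."""
    X = np.eye(A.shape[0])
    for k in range(i0, i1):
        X = (np.eye(A.shape[0]) + (ts[k + 1] - ts[k]) * A) @ X
    return X

def gramian(A, B, Rinv, ts):
    """Weighted gramian: sum of mu(s) e_A(tf, sigma(s)) B R^{-1} B^T e_A^T(tf, sigma(s))."""
    n, tf = A.shape[0], len(ts) - 1
    G = np.zeros((n, n))
    for k in range(tf):
        mu = ts[k + 1] - ts[k]
        E = ts_exp(A, ts, k + 1, tf)        # e_A(tf, sigma(ts[k]))
        G += mu * E @ B @ Rinv @ B.T @ E.T
    return G

ts = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0]        # shared, unevenly sampled time scale
A1 = np.array([[0.0, 1.0], [0.0, -1.0]]); B1 = np.array([[0.0], [1.0]])
A2 = np.array([[0.0, 1.0], [0.0, -2.0]]); B2 = np.array([[0.0], [0.5]])
Gp = gramian(A1, B1, np.eye(1), ts)         # pursuer gramian (15), R1 = I
Ge = gramian(A2, B2, np.eye(1), ts)         # evader gramian (16),  R2 = I
d = ts_exp(A1, ts, 0, len(ts) - 1) @ np.array([1.0, 0.0]) \
    - ts_exp(A2, ts, 0, len(ts) - 1) @ np.array([0.0, 1.0])  # initial state difference
print(0.5 * d @ np.linalg.solve(Gp - Ge, d))  # cost of Theorem 8 when x(tf) = y(tf)
```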
Next, we determine the optimal cost.
Theorem 8. If u and v are given by (17) and (18), respectively, then the cost functional (14) can be rewritten as
$$J = \frac{1}{2}\, w^{\top}\big(G_p - G_e\big)^{-1}\, w, \tag{21}$$
where $w = d(t_0) - \big[x(t_f) - y(t_f)\big]$.
Proof. First, plugging (17), (18) and (20) into (14), we have
$$J = \frac{1}{2}\,\nu^{\top}\left[ \int_{t_0}^{t_f} e_{A_1}(t_f, \sigma(s)) B_1 R_1^{-1} B_1^{\top} e_{A_1}^{\top}(t_f, \sigma(s))\,\Delta s - \int_{t_0}^{t_f} e_{A_2}(t_f, \sigma(s)) B_2 R_2^{-1} B_2^{\top} e_{A_2}^{\top}(t_f, \sigma(s))\,\Delta s \right]\nu = \frac{1}{2}\,\nu^{\top}\big(G_p - G_e\big)\,\nu,$$
using the gramians (15) and (16). Since $G_p - G_e$ is symmetric, we can pull out common factors on the left and right to obtain our result. □
Remark 3. Suppose that the pursuer wants to use a strategy u that intercepts the evader (using strategy v) with minimal energy. Note that $J \geq 0$ if and only if $G_p - G_e \geq 0$. From the classical definition of controllability, this implies that the pursuer captures the evader when the pursuer is “more controllable” than the evader. A sufficient condition for the pursuing state to intercept the evader is given by $G_p > G_e$. As a result, this relationship is preserved in the unification of pursuit–evasion to dynamic equations on time scales.
5. Free Final States Case
In this section, we develop an optimal control law in the form of state feedback. In considering the boundary conditions, note that $z(t_0)$ is known (meaning $\delta z(t_0) = 0$), while $z(t_f)$ is free (meaning $\delta z(t_f) \neq 0$). Thus, the coefficient on $\delta z(t_f)$ in the first variation must be zero. This gives the terminal condition on the costate to be
$$\lambda(t_f) = S(t_f)\, z(t_f). \tag{22}$$
Remark 4. Now in order to solve this two-point boundary value problem, we make the assumption that z and λ satisfy
$$\lambda(t) = S(t)\, z(t) \tag{23}$$
for all $t \in [t_0, t_f] \cap \mathbb{T}$. This condition (23) is called a “sweep condition,” a term used by Bryson and Ho in Ref. [7]. Since the terminal condition (22) holds at $t_f$, it is natural to assume that (23) holds as well. Next, we offer a form of our Riccati equation that S must satisfy. Here, the Riccati equation is used to update the pursuer and evader's controls when expressed in feedback form.
Theorem 9. If z and λ solve (9) and λ is given by (23), then S satisfies the Riccati equation
$$S^{\Delta} = -S^{\sigma} A - \big(A^{\top} - S^{\sigma} M\big)\big(I + \mu S^{\sigma} M\big)^{-1} S^{\sigma}\big(I + \mu A\big). \tag{24}$$
Proof. Since λ is as given in (23), we may use the product rule, (9), (23) and (3) to arrive at
$$\lambda^{\Delta} = S^{\Delta} z + S^{\sigma} z^{\Delta} = -A^{\top}\lambda^{\sigma} \qquad \text{and} \qquad \lambda^{\sigma} = S^{\sigma} z^{\sigma} = \big(I + \mu S^{\sigma} M\big)^{-1} S^{\sigma}\big(I + \mu A\big)\, z, \tag{25}$$
so that
$$S^{\Delta} z = -S^{\sigma} A z - \big(A^{\top} - S^{\sigma} M\big)\big(I + \mu S^{\sigma} M\big)^{-1} S^{\sigma}\big(I + \mu A\big)\, z, \tag{26}$$
which gives (24) as desired, since (26) holds along every trajectory z. □
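As a quick consistency check (ours): when $\mathbb{T} = \mathbb{R}$, we have $\mu \equiv 0$ and $S^{\sigma} = S$, so (24) collapses to
$$\dot{S} = -A^{\top} S - S A + S M S,$$
the familiar continuous-time game-theoretic Riccati differential equation (with no state weighting in the running cost); on $\mathbb{T} = \mathbb{Z}$, (24) similarly reduces to a backward Riccati difference equation.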
Next, we offer an alternative form of our Riccati equation.
Lemma 3. If $S^{\sigma} M$ is regressive, then S solves (24) if and only if it solves
$$S^{\Delta} = -S^{\sigma} A - \big(A^{\top} - S^{\sigma} M\big)\, S^{\sigma}\big(I + \mu M S^{\sigma}\big)^{-1}\big(I + \mu A\big). \tag{27}$$
Proof. Note the “push-through” identity
$$\big(I + \mu S^{\sigma} M\big)^{-1} S^{\sigma} = S^{\sigma}\big(I + \mu M S^{\sigma}\big)^{-1},$$
which is valid since $S^{\sigma} M$ is regressive. Plugging the above identity into (24) yields (27). □
Next, we define our Kalman gains as follows.
Definition 11. Let $S^{\sigma} M$ be regressive. Then the matrix-valued functions
$$K_1 := R_1^{-1} B^{\top}\big(I + \mu S^{\sigma} M\big)^{-1} S^{\sigma}\big(I + \mu A\big) \tag{28}$$
and
$$K_2 := R_2^{-1} C^{\top}\big(I + \mu S^{\sigma} M\big)^{-1} S^{\sigma}\big(I + \mu A\big) \tag{29}$$
are called the pursuer feedback gain and evader feedback gain, respectively. Now we introduce our combined control scheme in extended state feedback form.
Theorem 10. Let $S^{\sigma} M$ be regressive and suppose that z and λ solve (9) such that (23) holds. Then
$$u = -K_1 z \qquad \text{and} \qquad v = K_2 z. \tag{30}$$
Proof. Using (8), (9), (23) and (3), we have
$$\lambda^{\sigma} = S^{\sigma} z^{\sigma} = S^{\sigma}\big(z + \mu z^{\Delta}\big) = S^{\sigma}\big(I + \mu A\big) z - \mu S^{\sigma} M \lambda^{\sigma}.$$
Now combining like terms yields
$$\big(I + \mu S^{\sigma} M\big)\lambda^{\sigma} = S^{\sigma}\big(I + \mu A\big)\, z.$$
Multiplying both sides by the inverse of $I + \mu S^{\sigma} M$ and rearranging terms, we have
$$\lambda^{\sigma} = \big(I + \mu S^{\sigma} M\big)^{-1} S^{\sigma}\big(I + \mu A\big)\, z.$$
Finally, Equation (30) follows from the stationary conditions $u = -R_1^{-1} B^{\top}\lambda^{\sigma}$ and $v = R_2^{-1} C^{\top}\lambda^{\sigma}$ in (8), together with (28) and (29). □
Next we rewrite our extended state equation under the influence of the pursuit–evasion control laws. This yields the closed-loop plant given by
$$z^{\Delta} = \big(A - B K_1 + C K_2\big)\, z, \qquad z(t_0) = z_0, \tag{31}$$
which can be used to find an optimal trajectory for any given $z_0$.
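To see the free-final-states scheme end to end, the following compact numerical sketch (ours) implements the backward sweep of the Riccati Equation (24) and the feedback law (30) on an isolated time scale; all matrices, weights, and sampling points are illustrative assumptions:

```python
import numpy as np

ts = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0]                     # isolated time scale
A1 = np.array([[0.0, 1.0], [0.0, -1.0]]); B1 = np.array([[0.0], [1.0]])
A2 = np.array([[0.0, 1.0], [0.0, -2.0]]); B2 = np.array([[0.0], [0.5]])
A = np.block([[A1, np.zeros((2, 2))], [np.zeros((2, 2)), A2]])
B = np.vstack([B1, np.zeros((2, 1))]); C = np.vstack([np.zeros((2, 1)), B2])
R1, R2 = np.eye(1), np.eye(1)
M = B @ B.T - C @ C.T                                     # mixing term (10), R1 = R2 = I
Sf = np.block([[np.eye(2), -np.eye(2)], [-np.eye(2), np.eye(2)]])  # weights x(tf) - y(tf)
I4 = np.eye(4)

S = [None] * len(ts); S[-1] = Sf                          # backward Riccati sweep via (24)
for k in range(len(ts) - 2, -1, -1):
    mu, Ss = ts[k + 1] - ts[k], S[k + 1]
    W = np.linalg.solve(I4 + mu * Ss @ M, Ss @ (I4 + mu * A))
    S[k] = Ss - mu * (-Ss @ A - (A.T - Ss @ M) @ W)       # S = S^sigma - mu S^Delta

z = np.array([1.0, 0.0, 0.0, 1.0])                        # z0 = (x0, y0)
print("optimal cost:", 0.5 * z @ S[0] @ z)                # Theorem 12, Equation (34)
for k in range(len(ts) - 1):                              # closed-loop simulation
    mu, Ss = ts[k + 1] - ts[k], S[k + 1]
    lam_sig = np.linalg.solve(I4 + mu * Ss @ M, Ss @ (I4 + mu * A)) @ z
    u = -np.linalg.solve(R1, B.T @ lam_sig)               # pursuer feedback, u = -K1 z
    v = np.linalg.solve(R2, C.T @ lam_sig)                # evader feedback,  v =  K2 z
    z = z + mu * (A @ z + B @ u + C @ v)                  # step the extended state
print("final x - y:", z[:2] - z[2:])
```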
The following result is useful in establishing another form of the Riccati equation.
Lemma 4. If $S^{\sigma} M$ is regressive and S is symmetric, then
$$K_1^{\top} R_1 K_1 - K_2^{\top} R_2 K_2 = \big(I + \mu A\big)^{\top}\Big[\big(I + \mu S^{\sigma} M\big)^{-1} S^{\sigma}\Big]^{\top} M\, \big(I + \mu S^{\sigma} M\big)^{-1} S^{\sigma}\big(I + \mu A\big). \tag{32}$$
Moreover, both sides of (32) are equal to $\big(B K_1 - C K_2\big)^{\top}\big(I + \mu S^{\sigma} M\big)^{-1} S^{\sigma}\big(I + \mu A\big)$.
Proof. We can use (28) and (29) to rewrite the left-hand side of (32) as
$$\big(I + \mu A\big)^{\top} W^{\top}\big(B R_1^{-1} B^{\top} - C R_2^{-1} C^{\top}\big)\, W\big(I + \mu A\big), \qquad W := \big(I + \mu S^{\sigma} M\big)^{-1} S^{\sigma}.$$
Using (28) and (29) again, together with the definition of the mixing term (10), the right-hand side of (32) can be written as the same expression. Thus, Equation (32) holds. □
Now we rewrite the Riccati Equation (27) in so-called (generalized) Joseph stabilized form (see [38]).
Theorem 11. If $S^{\sigma} M$ is regressive and S is symmetric, then S solves the Riccati Equation (27) if and only if it solves
$$S^{\Delta} = -\Big[\hat{A}^{\top} S^{\sigma} + S^{\sigma}\hat{A} + \mu\,\hat{A}^{\top} S^{\sigma}\hat{A} + K_1^{\top} R_1 K_1 - K_2^{\top} R_2 K_2\Big], \qquad \hat{A} := A - B K_1 + C K_2. \tag{33}$$
Proof. The statement follows directly from Lemma 4. □
Note that each equation in this section can be stored “offline,” meaning its structure will not be altered when simulating results. It should be noted that each variation of our Riccati equation accounts for the gaps in time between decisions made by the pursuer and evader. Finally, we rewrite the optimal cost.
Theorem 12. Suppose that S solves (33). If z, u, and v satisfy (31) and (30), respectively, then the cost functional (5) can be rewritten as
$$J^{*} = \frac{1}{2}\, z^{\top}(t_0)\, S(t_0)\, z(t_0). \tag{34}$$
Proof. First note that we may use the product rule, (3) and (31) to find
$$\big(z^{\top} S z\big)^{\Delta} = z^{\Delta\top} S z + z^{\sigma\top} S^{\Delta} z + z^{\sigma\top} S^{\sigma} z^{\Delta} = z^{\top}\Big[ S^{\Delta} + \hat{A}^{\top} S^{\sigma} + S^{\sigma}\hat{A} + \mu\,\hat{A}^{\top} S^{\sigma}\hat{A} \Big] z. \tag{35}$$
Using this and (30) in (5), we have
$$J = \frac{1}{2}\, z^{\top}(t_f) S(t_f) z(t_f) + \frac{1}{2}\int_{t_0}^{t_f} z^{\top}\Big[ K_1^{\top} R_1 K_1 - K_2^{\top} R_2 K_2 \Big] z\,\Delta t.$$
Using (35) and (33), the cost functional can be rewritten as
$$J = \frac{1}{2}\, z^{\top}(t_f) S(t_f) z(t_f) - \frac{1}{2}\int_{t_0}^{t_f} \big(z^{\top} S z\big)^{\Delta}\,\Delta t = \frac{1}{2}\, z^{\top}(t_0) S(t_0) z(t_0).$$
This concludes the proof. □
From Theorem 12, if the current state and S are known, we can determine the optimal cost before we apply the optimal control or even calculate it. Table 2 below summarizes our results.
7. Concluding Remarks and Future Work
In this project, we have established the LQPEG where the pursuer and evader belong to the same arbitrary time scale $\mathbb{T}$. One potential application of this work is when the pursuer represents a drone and the evader represents a missile guidance system, where their corresponding signals are unevenly sampled. Here, the cost in part represents the wear and tear on the drone. A saddle point in this setting would represent a “live and let live” arrangement, where the drone is allowed to spy briefly on the missile guidance system and return home, but is not given the opportunity to preserve enough of its battery to outstay its welcome. Similarly, in finance, the pursuer and evader can represent competing companies, where a saddle point would correspond to an effort to coexist in which a hostile takeover or unnecessarily expended resources can be avoided. We have sidestepped the setting where the pursuer and evader each belong to their own time scale $\mathbb{T}_1$ and $\mathbb{T}_2$, respectively. However, these time scales can be merged using a sample-and-hold method as found in Refs. [39,40].
One potential extension of this work is the introduction of additional pursuers. In this setting, the cost must be adjusted to account for the closest pursuer, which can vary over the time scale. A second potential extension is to consider the setting where one player is subject to a delay. Here, both players can still belong to the same time scale; however, this allows one player to act after the other, perhaps with some knowledge of the opposing player's strategy. Finally, a third possible approach is to study such games in a stochastic setting. Here, we can discretize each player's stochastic linear time-invariant system to a dynamic system on an isolated time scale, as found in Refs. [39,41]. However, the usual separability property is not preserved in this setting.