Abstract
A stochastic difference game is considered in which the first player wants to minimize the time spent by a controlled one-dimensional symmetric random walk in the continuation region C, while the second player seeks to maximize the survival time in C. The process starts inside C and the game ends the first time the random walk hits the origin. An exact expression is derived for the value function, from which the optimal solution is obtained, and particular problems are solved explicitly.
1. Introduction
A deterministic one-dimensional two-player linear-quadratic (nonzero-sum) difference game can be defined as follows (see, for example, [1]):
for , where , and are deterministic functions, the control (respectively, ) gives the decision of player 1 (respectively, player 2) at time n. Each player has a general quadratic cost function that he/she tries to minimize.
The difference game can be made stochastic by adding a random disturbance term to Equation (1). These random variables are assumed to be independent and identically distributed.
The final time T can be finite or infinite. Reddy and Zaccour [2] considered a class of non-cooperative N-player finite-horizon linear-quadratic dynamic games with linear constraints. Lin [3] studied the Stackelberg strategies in the infinite horizon LQ mean-field stochastic difference game. Liu et al. [4] used an adaptive dynamic programming approach to solve the infinite horizon linear quadratic Stackelberg game problem for unknown stochastic discrete-time systems with multiple decision makers. Ju et al. [5] used dynamic programming to obtain an optimal linear strategy profile for a class of two-player finite-horizon linear-quadratic difference games.
In this paper, we consider the one-dimensional controlled Markov chain defined by
where the random term takes the value 1 or −1, each with probability 1/2, and the random variables are independent, so that the process is a (controlled) symmetric random walk. The chain starts at a point x > 0.
We define the first-passage time
Our aim is to find the controls and that minimize the expected value of the cost function
where λ is a positive constant. This parameter gives the penalty incurred for survival in the continuation region C. It is needed to obtain a well-defined problem. Indeed, if we set λ = 0, then the optimal solution for the first (respectively, second) player becomes trivial.
Thus, there are two optimizers. The first one would like the Markov chain to hit the origin as soon as possible. Therefore, he/she would like to choose the corresponding control value, but this generates a cost. On the other hand, the second optimizer wants the Markov chain to remain positive as long as possible. Hence, he/she would prefer to choose the corresponding control value, which, however, generates no cost. Both optimizers, and especially the second one, must also take into account the value of the constant λ.
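To fix ideas, the following short Python sketch simulates the part of the problem that does not depend on the controls: a symmetric random walk started at x > 0, the first-passage time to the origin, and the contribution of λ per time step spent in the continuation region. The uncontrolled ±1 dynamics and the omission of the control costs are simplifying assumptions made only for illustration; this is not the full model of the paper.

```python
import random

def first_passage_time(x, max_steps=100_000):
    """Number of steps an uncontrolled symmetric random walk started at x > 0
    takes to first hit the origin (None if not reached within max_steps)."""
    n = 0
    while x > 0 and n < max_steps:
        x += random.choice((-1, 1))  # symmetric +/-1 step
        n += 1
    return n if x == 0 else None

def survival_penalty(x, lam, runs=1000):
    """Monte Carlo estimate of lam * E[T(x)], the part of the expected cost
    due to survival in the continuation region; control costs are ignored."""
    times = [first_passage_time(x) for _ in range(runs)]
    completed = [t for t in times if t is not None]
    return lam * sum(completed) / len(completed) if completed else float("inf")

print(survival_penalty(x=2, lam=0.1))
```

As the cap max_steps is increased, the estimate keeps growing, which is consistent with the fact, recalled in Section 2, that the expected first-passage time of the uncontrolled symmetric random walk is infinite.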
Remark 1.
We could, in theory, assume that the parameter λ is negative. Then, it is mainly the first optimizer who would need to consider the value of λ. See, however, Remark 3.
The main difference between the current paper and the related ones found in the literature is the fact that, in our case, the final time is neither finite nor infinite; rather, it is a random variable.
The above problem is a particular homing problem, in which a stochastic process is controlled until a certain event occurs. This type of problem was introduced by Whittle ([6] p. 289) for n-dimensional diffusion processes. He also considered the case when we take the risk-sensitivity of the optimizer into account; see [7], as well as [8,9].
The author has written numerous papers on homing problems. In [10,11], these problems were extended to the case of discrete-time Markov chains, whereas in [12] the case of autoregressive processes was treated; see also [13].
In [10,11], the authors considered a problem related to the one defined above, but with only one optimizer. Thus, they treated a stochastic optimal control problem, whereas the problem in the current paper is a stochastic difference game.
To solve our problem, we will use dynamic programming. Let the value function be defined by
In the next section, the dynamic programming equation satisfied by the function will be derived.
2. Dynamic Programming
In theory, we must determine the optimal values of the controls at every time step. However, using dynamic programming, the problem is reduced to finding the optimal solution at the initial time only.
Indeed, we can write, making use of Bellman’s principle of optimality, that
We can now state the following proposition.
Proposition 1.
The value function satisfies the dynamic programming equation
The equation is subject to the boundary condition
Remark 2.
The usefulness of the value function is that it enables us to determine the optimal controls and .
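Although Equation (7) involves the specific dynamics and costs of the model, the computational content of the proposition can be illustrated with a generic fixed-point (value) iteration. In the Python sketch below, the admissible control sets, the one-step cost lam + u**2 - v**2 and the dynamics (an up or down step of size u, each with probability 1/2) are hypothetical placeholders, not the quantities of the paper; only the overall structure, a minimization over u of a maximization over v with an absorbing origin, reflects the dynamic programming equation.

```python
def value_iteration(lam, n_states=30, u_set=(1, 2), v_set=(1, 2), n_iter=2000):
    """Generic value iteration for a first-passage min-max control problem.

    The origin is absorbing (F[0] = 0) and the state space is truncated at
    n_states.  The one-step cost lam + u**2 - v**2 and the dynamics (an up or
    down step of size u, each with probability 1/2) are hypothetical
    placeholders, not the actual quantities of the paper.
    """
    F = [0.0] * (n_states + 1)                 # F[x] for x = 0, ..., n_states
    for _ in range(n_iter):
        new_F = F[:]
        for x in range(1, n_states):           # interior states only
            best_u = float("inf")
            for u in u_set:                    # minimizing player
                worst_v = -float("inf")
                for v in v_set:                # maximizing player
                    up = min(x + u, n_states)
                    down = max(x - u, 0)
                    expected_next = 0.5 * F[up] + 0.5 * F[down]
                    worst_v = max(worst_v, lam + u**2 - v**2 + expected_next)
                best_u = min(best_u, worst_v)
            new_F[x] = best_u
        F = new_F
    return F
```

Reading off, for each state, the control value attaining the minimum then yields a candidate feedback policy, which is precisely the use of the value function mentioned in Remark 2.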
Since , we deduce from Equation (7) that
Now, let be the random variable that corresponds to when . Using well-known results on the gambler’s ruin problem (see, for instance, [14], p. 349), we can state that
Hence, we would also have
Since the objective is to minimize the expected value of , we must conclude that the optimal solution is not .
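For completeness, the classical result being invoked can be stated as follows (see [14]): for a symmetric random walk with absorbing barriers at 0 and N, started at x ∈ {0, 1, …, N}, the expected duration of the game is D_x = x (N − x). Letting N tend to infinity, so that the upper barrier disappears, gives D_x → ∞ for every x ≥ 1. This is why the expected first-passage time is infinite when the walk is left uncontrolled, and why a strictly positive λ is required for the problem to be well defined.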
Remark 3.
(i) We deduce from what precedes that we cannot choose a value of the parameter λ in the interval , since otherwise we would obtain an infinite expected reward by choosing .
(ii) If λ = 0, then there is no penalty (or reward) for survival in the continuation region C. The optimal solution is obviously . As will be seen below, the optimal value of is 1 for any value of .
(iii) If we define
then, when , we find that (see, again, [14], p. 349)
so that
Therefore, we could take in that case.
(iv) If and and is defined as in (12), then (see [14], p. 348)
and
Thus, we could consider the case when if .
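For the asymmetric case referred to in part (iv) (a step up with probability p and a step down with probability q = 1 − p, p ≠ q, with no upper barrier), the standard results in [14] are the following: the probability that the walk started at x > 0 ever reaches the origin is equal to 1 if p < q and to (q/p)^x if p > q; moreover, when p < q, the expected time needed to reach the origin is x/(q − p).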
Next, notice that and yield the same expected value of . However, the choice generates a cost of 1, while with a reward of 1 is obtained. Hence, taking is surely a better decision than choosing . Thus, we must determine whether the optimal solution is or .
Proposition 2.
We deduce from what precedes that the second optimizer should choose , independently of the value of .
Remark 4.
The optimal choice for will depend on the value of the parameter λ, which, as mentioned above, gives the penalty incurred for survival in the continuation region.
It follows from Proposition 2 that Equation (9) can be simplified to
Proposition 3.
The value function satisfies the non-linear third-order difference equation
for every x in the continuation region C. The boundary condition is
Proof.
Solving a boundary value problem for a non-linear difference equation of order 3 is not easy. Instead of trying to solve Equation (18) directly, we will proceed as in [10].
3. Optimal Choice for
We must determine whether the first optimizer should take or . The control variable (as well as ) is actually a function of .
Suppose that we set . Then, denoting the function by , Equation (17) implies that
which we rewrite as follows:
where . The equation is valid for and is subject to the boundary condition
Making use of the mathematical software program Maple, we find that the solution of Equation (23) that satisfies the boundary conditions is
for , so that
for .
Remark 5.
(i) The function is real, even though its expression contains the imaginary unit i.
Next, let us denote the function by if we set . We then deduce from Equation (17) that the function satisfies the second-order difference equation
The unique solution that is such that is
Every time the value of the state changes, the (first) optimizer must make a new decision. It then follows from Equation (17) that we can express the value function in terms of and as in the following proposition.
Proposition 4.
The value function is given by
for .
To determine the value function, and hence the optimal value of (), we can compare the two expressions
and
That is, we can write that
Remark 6.
We must set when we compute the function (and ).
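Numerically, the comparison just described amounts to evaluating the two candidate expressions at every state and retaining the better one, together with the corresponding control value. The following Python sketch assumes, as a reading of the minimization above, that the value function is the pointwise minimum of the two candidates; the callables F1 and F2 are hypothetical stand-ins for the two explicit expressions obtained in this section.

```python
def optimal_choice(F1, F2, states):
    """For each state x, compare the two candidate values and keep the
    smaller one, recording which control achieves it.  The index 1 or 2
    refers to the two admissible values of the first player's control
    (hypothetical labelling)."""
    table = {}
    for x in states:
        v1, v2 = F1(x), F2(x)
        table[x] = (min(v1, v2), 1 if v1 <= v2 else 2)
    return table

# Example with placeholder candidate functions (not the ones of the paper):
# result = optimal_choice(lambda x: 0.1 * x * x, lambda x: 2.0 * x, range(1, 11))
```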
In the next section, we will present the results obtained with different values of the parameter λ, in order to see the effect of this parameter on the optimal control.
4. Numerical Examples
Assume first that . Table 1 gives the value function , , , , and the optimal control for . We see that the optimal control is always , which could have been expected because the penalty for survival in the continuation region is not large enough to incite the first optimizer to use in order to leave C as rapidly as possible.
Table 1.
Functions , , , and , and optimal control for when .
Next, we take . This case is rather special, because the function becomes . Again, ; see Table 2.
Table 2.
Functions , , , and , and optimal control for when .
Table 3.
Functions , , , and , and optimal control for when .
Finally, in Table 4, and the optimal control is , which is not surprising because the constant λ is large.
Table 4.
Functions , , , and , and optimal control for when .
4.1. Critical Value of λ
For a fixed x, we can determine the critical value of λ, that is, the value of λ for which . For instance, we see in Figure 1 that when . For , , while for .
Figure 1.
Functions (solid line) and when .
Similarly, and are both approximately equal to 11.576 when ; see Figure 2.
Figure 2.
Functions (solid line) and when .
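The critical value of λ for a fixed x can be located numerically, for instance by bisection on λ, as in the following sketch. Here F1(x, lam) and F2(x, lam) are hypothetical callables returning the two candidate values as functions of λ; the only assumption made is that their difference changes sign at the critical value, which is what Figures 1 and 2 suggest.

```python
def critical_lambda(F1, F2, x, lam_low, lam_high, tol=1e-6):
    """Bisection on lambda to locate the value at which F1(x, lam) = F2(x, lam).

    Assumes F1(x, lam) - F2(x, lam) has opposite signs at lam_low and lam_high.
    F1 and F2 are hypothetical callables standing for the two candidate value
    functions computed in Section 3.
    """
    def diff(lam):
        return F1(x, lam) - F2(x, lam)

    lo, hi = lam_low, lam_high
    if diff(lo) * diff(hi) > 0:
        raise ValueError("No sign change on the given interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(lo) * diff(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```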
4.2. Solution of the Non-Linear Difference Equation
As we mentioned in Section 2, solving the non-linear third-order difference Equation (18), subject to the boundary condition (19), is not an easy task.
We can try to solve Equation (18) recursively. First, making use of the boundary condition (19), we can write that
It follows that
Because can take any real value, it is not obvious which sign should be chosen in the above equation. If we look at the value of in Table 1, Table 2, Table 3 and Table 4, we see that we must in fact choose the minus sign, so that
Next, we could use the above expression for in Equation (18) to determine the value of , and so forth. It is clear that this technique is tedious. Moreover, without having computed the value of as we did above, determining the right sign to choose in the expression for that corresponds to the one in Equation (35) is not straightforward.
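The recursive procedure just described can be organized as follows. Assuming, as the presence of a sign choice suggests, that each stage of the recursion reduces to solving a quadratic equation for the next unknown value, the Python sketch below steps through the recursion and keeps, at each stage, the root closest to a reference value (for example, a value read from Table 1), which is how the sign ambiguity was resolved above. The function step_equation, which returns the coefficients of that quadratic, is a hypothetical stand-in for Equation (18), and the boundary value is passed in explicitly.

```python
import math

def solve_recursively(step_equation, boundary_value, reference, n_values):
    """Compute successive values of the unknown function by solving, at each
    stage, a quadratic a*y**2 + b*y + c = 0 for the next value y.

    step_equation(known) must return the coefficients (a, b, c) of that
    quadratic given the tuple of already-computed values; it is a hypothetical
    stand-in for Equation (18).  The root closest to reference[k] (e.g., the
    value read from Table 1) is kept, which resolves the sign ambiguity."""
    values = [boundary_value]            # value imposed by the boundary condition
    for k in range(1, n_values):
        a, b, c = step_equation(tuple(values))
        disc = max(b * b - 4 * a * c, 0.0)
        roots = [(-b + s * math.sqrt(disc)) / (2 * a) for s in (1, -1)]
        values.append(min(roots, key=lambda r: abs(r - reference[k])))
    return values
```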
From what precedes, we may conclude that the method used in this paper to compute the value function explicitly is of interest in itself and could be used to solve boundary value problems for non-linear difference equations, if we can express the function of interest as a particular value function in a stochastic control problem.
To conclude, we will check that the values of the function given in Table 1 are such that Equation (18), together with the boundary condition (19), is indeed satisfied for when .
First, we saw in Equation (35) that is a solution of the equation if . Next, since is also equal to , we must have
which is indeed correct.
The value of in Table 1 is . We must check that
Again, the result is correct. We can proceed in the same way to check the remaining results. For , we obtain that
5. Conclusions
Homing problems are generally considered for diffusion processes. The author and Kounta [11] extended these problems to the discrete-time case. In [10], the author improved the results found in [11] by finding an explicit expression for the value function.
In the current paper, a homing problem with two optimizers has been defined and solved explicitly. The problem can be interpreted as a stochastic difference game, as one optimizer is trying to minimize the time spent by the controlled stochastic process in the continuation region C, while the second one seeks to maximize the survival time in C.
In Section 2, the equation satisfied by the value function has been derived. This equation is a non-linear third-order difference equation, which is obviously very difficult to solve explicitly.
The technique that we have used in the paper enables us to obtain an exact expression for the solution of a boundary value problem for a non-linear difference equation. This result is of interest in itself.
For the sake of simplicity, we have considered a symmetric random walk, and it has been assumed that the control variables can take only two values. We could of course generalize the results that have been presented in the paper. However, if there are many possible values for the control variables, obtaining an explicit solution to the homing problem considered can be quite tedious.
Funding
This research was supported by the Natural Sciences and Engineering Research Council of Canada. The author also wishes to thank the anonymous reviewers of this paper for their constructive comments.
Data Availability Statement
Not applicable.
Conflicts of Interest
The author declares no conflict of interest.
References
- Hämäläinen, R.P. Nash and Stackelberg solutions to general linear-quadratic two player difference games. I: Open-loop and feedback strategies. Kybernetika 1978, 14, 38–56. [Google Scholar]
- Reddy, P.V.; Zaccour, G. Feedback Nash equilibria in linear-quadratic difference games with constraints. IEEE Trans. Automat. Control 2017, 62, 590–604. [Google Scholar] [CrossRef]
- Lin, Y. Feedback Stackelberg strategies for the discrete-time mean-field stochastic systems in infinite horizon. J. Frank. Inst. 2019, 356, 5222–5239. [Google Scholar] [CrossRef]
- Liu, X.; Liu, R.; Li, Y. Infinite time linear quadratic Stackelberg game problem for unknown stochastic discrete-time systems via adaptive dynamic programming approach. Asian J. Control 2020, 2, 937–948. [Google Scholar] [CrossRef]
- Ju, P.; Li, X.; Lei, J.; Li, T. Optimal linear strategy for stochastic games with one-side control-sharing information. Int. J. Control 2022, 1–7. [Google Scholar] [CrossRef]
- Whittle, P. Optimization over Time; Wiley: Chichester, UK, 1982; Volume 1. [Google Scholar]
- Whittle, P. Risk-Sensitive Optimal Control; Wiley: Chichester, UK, 1990. [Google Scholar]
- Kuhn, J. The risk-sensitive homing problem. J. Appl. Probab. 1985, 22, 796–803. [Google Scholar] [CrossRef]
- Makasu, C. Risk-sensitive control for a class of homing problems. Automatica 2009, 45, 2454–2455. [Google Scholar] [CrossRef]
- Lefebvre, M. An explicit solution to a discrete-time stochastic optimal control problem. WSEAS Trans. Syst. 2023, 22, 368–371. [Google Scholar] [CrossRef]
- Lefebvre, M.; Kounta, M. Discrete homing problems. Arch. Control Sci. 2013, 23. [Google Scholar] [CrossRef]
- Lefebvre, M. The homing problem for autoregressive processes. IMA J. Math. Control Inform. 2022, 39, 322–344. [Google Scholar] [CrossRef]
- Kounta, M.; Dawson, N.J. Linear quadratic Gaussian homing for Markov processes with regime switching and applications to controlled population growth/decay. Methodol. Comput. Appl. Probab. 2021, 23, 1155–1172. [Google Scholar] [CrossRef]
- Feller, W. An Introduction to Probability Theory and its Applications; Wiley: New York, NY, USA, 1968; Volume I. [Google Scholar]