A Controlled Discrete-Time Queueing System as a Model for the Orders of Two Competing Companies

Abstract: We consider two companies that are competing for orders. Let X_1(n) denote the number of orders being processed by the first company at time n, and let τ(k) be the first time that X_1(n) < j or X_1(n) = r, given that X_1(0) = k. We assume that {X_1(n), n = 0, 1, ...} is a controlled discrete-time queueing system. Each company uses some control to increase its share of orders. The aim of the first company is to maximize the expected value of τ(k), while its competitor tries to minimize this expected value. The optimal solution is obtained by making use of dynamic programming. Particular problems are solved explicitly.


Introduction
Many papers have been published on the optimal control of queueing systems. Often the authors assume that it is possible to control the service rate; see, for example, Laxmi et al. [1], Chen et al. [2] and Tian et al. [3]. In other papers, the aim is to control the arrival of customers into the system; see Wu et al. [4]. Sometimes, it is assumed that there are different types of customers; see Wen et al. [5] and Dudin et al. [6]. Another problem that is often considered is to determine the optimal number of servers in the system; see Asadzadeh et al. [7].
In this paper, we consider a single-server queueing model in discrete time. Orders arrive one at a time. We assume that the time needed for an order to arrive is a random variable A that follows a geometric distribution with parameter 2p_A, so that

P(A = n) = 2p_A (1 − 2p_A)^(n−1) for n = 1, 2, ....

Moreover, the service time is a random variable S having a geometric distribution with parameter p_S.
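The geometric laws above are easy to work with numerically. The following sketch tabulates the probability mass function and checks that a geometric random variable with parameter p has mean 1/p; the parameter values p_A = 0.2 and p_S = 0.5 are illustrative choices for the sketch, not values taken from the text.

```python
# Sketch: interarrival time A ~ Geometric(2*pA) and service time S ~ Geometric(pS),
# both on {1, 2, ...}.  Parameter values are illustrative assumptions.

def geometric_pmf(p, n):
    """P(T = n) for a geometric random variable on {1, 2, ...} with success probability p."""
    return p * (1.0 - p) ** (n - 1)

def truncated_mean(p, nmax=10_000):
    """Approximate E[T] = 1/p by summing n * P(T = n) up to nmax."""
    return sum(n * geometric_pmf(p, n) for n in range(1, nmax + 1))

pA, pS = 0.2, 0.5                  # illustrative values
mean_A = truncated_mean(2 * pA)    # close to 1/(2*pA) = 2.5
mean_S = truncated_mean(pS)        # close to 1/pS = 2.0
```

The truncation error is negligible here because the geometric tail decays exponentially.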
There are two competing companies. In equilibrium, orders arrive at each company according to a random variable A_i having a geometric distribution with parameter p_A, for i = 1, 2.
Instead of solving an optimal control problem, our aim in this paper is to present a stochastic dynamic game that can serve as a model for the behavior of companies competing for orders. We must, of course, make simplifying assumptions in order to obtain a mathematically tractable problem. However, we believe that the model is realistic enough to be useful.
Let X_1(n) be the number of orders being processed by the first company at time n, and let

τ(k) = min{n > 0 : X_1(n) < j or X_1(n) = r},

given that X_1(0) = k, where j ≤ k ≤ r and j, k, r ∈ N. Now, suppose that each company can try to use some form of control in order to increase its share of the orders. More precisely, we assume that the first company uses the control u_n ∈ {0, p_A} at time n to increase the arrival rate of its orders. Similarly, the second company uses v_n ∈ {−p_A, 0} at time n to decrease the order arrival rate of its competitor.

Remark 1.
Writing that v_n = −p_A means that Company 2 is making some effort (for instance, by lowering its prices) to reduce to zero the rate at which Company 1 receives orders, unless Company 1 also tries to increase, or at least keep, its share of the orders.

Let

p_A(n) := p_A + u_n + v_n

denote the controlled probability that Company 1 receives an order at time n. We define the cost function

J(k) = Σ_{n=0}^{τ(k)−1} [ q_1 u_n^2 − q_2 v_n^2 − λ ],

where q_1, q_2 and λ are positive constants. We look for the values u_n* and v_n* of the control variables that are such that the expected value of J(k) is minimized with respect to u_n and maximized with respect to v_n.
The idea behind the cost function is as follows: since the parameter λ is positive, if r is large, then Company 1 wants to maximize the time it stays in business. To do this, it would like to use the control u_n = p_A; however, this leads to quadratic control costs. Company 2, for its part, would like to use v_n = −p_A in order to bankrupt Company 1 as quickly as possible, but this also entails quadratic control costs.
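The trade-off described above can be made concrete with a small sketch of the controlled arrival probability p_A(n) = p_A + u_n + v_n and an assumed per-step cost q_1 u_n^2 − q_2 v_n^2 − λ; the exact form of J(k) and all parameter values below are assumptions for illustration.

```python
# Sketch of the controlled arrival probability and the per-step (stage) cost.
# u in {0, pA} is Company 1's control, v in {-pA, 0} is Company 2's control.

def arrival_prob(pA, u, v):
    """Controlled probability that Company 1 receives an order in one time step."""
    return pA + u + v          # takes a value in {0, pA, 2*pA}

def stage_cost(u, v, q1=1.0, q2=1.0, lam=1.0):
    """Assumed one-step cost accumulated from n = 0 to tau(k) - 1."""
    return q1 * u ** 2 - q2 * v ** 2 - lam

pA = 0.2                        # illustrative value
rates = {(u, v): arrival_prob(pA, u, v) for u in (0.0, pA) for v in (-pA, 0.0)}
# e.g. (pA, 0.0): Company 1 pushes and Company 2 idles, giving the full rate 2*pA,
# while (0.0, -pA) drives the arrival rate of Company 1 down to zero.
```

Note that the four possible pairs (u, v) produce only three distinct rates, since (0, 0) and (p_A, −p_A) both give p_A.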
Problems in which the optimizers try to minimize or maximize the expected value of a certain cost function until a given random event occurs are known as homing problems.
Whittle [8] considered the case where the optimizer controls an n-dimensional diffusion process until it leaves a given subset of R^n. Rishel [9] also treated homing problems for n-dimensional diffusion processes; these processes were more realistic models for the wear of devices than those proposed by various authors.
The author has recently extended homing problems to the optimal control of queueing systems in continuous time; see [10–12]. In these three papers, the aim was to determine the optimal number of servers working at time t. He also published a paper [13] on a homing problem for a controlled random walk with two optimizers.
Next, we define the value function

F(k) := min_{u_n} max_{v_n} E[J(k)], for j ≤ k ≤ r.

The function F(k) is the expected cost incurred (which can sometimes be a reward) when both optimizers choose the optimal values of u_n and v_n between the initial time n = 0 and time τ(k) − 1.
In Section 2, the dynamic programming equation satisfied by the value function F(k) will be derived, and a particular problem will be solved explicitly. In Section 3, the problem formulation will be modified. We will then assume that the value of u_n is known, and we will look for the value of v_n that maximizes the expected value of a certain cost function. Concluding remarks will be made in Section 4.

Dynamic Programming Equation
We will derive the dynamic programming equation satisfied by F(k). We have

F(k) = min_{u_n} max_{v_n} E[ q_1 u_0^2 − q_2 v_0^2 − λ + Σ_{n=1}^{τ(k)−1} ( q_1 u_n^2 − q_2 v_n^2 − λ ) ].

Then, making use of Bellman's principle of optimality (see [14]), we can write that

F(k) = min_{u_0} max_{v_0} { q_1 u_0^2 − q_2 v_0^2 − λ + E[ F(X_1(1)) ] }.

Indeed, whatever the two optimizers decide to do at time n = 0, the decisions they make from time n = 1 to time τ(k) − 1 must be optimal.
Remark 2. Equation (7) is valid because of our assumptions that A_i, for i = 1, 2, and S have a geometric distribution. Indeed, as is well known, this distribution possesses the memoryless property. If we assume instead that these random variables have any other (discrete) distribution, then we will have to take the past into account, rendering the optimization problem almost intractable.
Furthermore, we have

E[F(X_1(1))] = p_A(0)(1 − p_S) F(k + 1) + p_S [1 − p_A(0)] F(k − 1) + {1 − p_A(0)(1 − p_S) − p_S [1 − p_A(0)]} F(k),

where p_A(0) = p_A + u_0 + v_0. Hence, we can state the following proposition.
Proposition 1. The value function F(k) satisfies the dynamic programming equation (DPE)

F(k) = min_{u_0} max_{v_0} { q_1 u_0^2 − q_2 v_0^2 − λ + p_A(0)(1 − p_S) F(k + 1) + p_S [1 − p_A(0)] F(k − 1) + [1 − p_A(0)(1 − p_S) − p_S (1 − p_A(0))] F(k) }

for j ≤ k < r, where p_A(0) = p_A + u_0 + v_0. Moreover, we have the boundary condition F(k) = 0 if k < j or k = r.
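A DPE of this min-max form can be solved numerically by value iteration over the interior states k = j, ..., r − 1, with F fixed at 0 on the boundary. The sketch below assumes birth-death transitions of Geo/Geo/1 type (up with probability a(1 − p_S), down with probability p_S(1 − a), unchanged otherwise, where a = p_A + u + v) and the stage cost q_1 u^2 − q_2 v^2 − λ; these modeling choices and all parameter values are assumptions for the sketch, not taken verbatim from the text.

```python
# Sketch: value iteration for the min-max DPE on k = j, ..., r - 1,
# with F = 0 at k = j - 1 and at k = r (boundary condition).

def solve_dpe(j, r, pA, pS, q1, q2, lam, iters=5000):
    F = {k: 0.0 for k in range(j - 1, r + 1)}      # boundary entries stay 0
    for _ in range(iters):
        newF = dict(F)
        for k in range(j, r):
            best_u = None
            for u in (0.0, pA):                    # Company 1 minimizes
                worst_v = None
                for v in (-pA, 0.0):               # Company 2 maximizes
                    a = pA + u + v                 # controlled arrival probability
                    up, down = a * (1 - pS), pS * (1 - a)
                    val = (q1 * u ** 2 - q2 * v ** 2 - lam
                           + up * F[k + 1] + down * F[k - 1]
                           + (1 - up - down) * F[k])
                    worst_v = val if worst_v is None else max(worst_v, val)
                best_u = worst_v if best_u is None else min(best_u, worst_v)
            newF[k] = best_u
        F = newF
    return F

F = solve_dpe(j=1, r=3, pA=0.2, pS=0.5, q1=1.0, q2=1.0, lam=1.0)
```

Because every stage cost is at most −λ + q_1 p_A^2 < 0 for these parameter values, the interior values F(1) and F(2) are negative, reflecting the reward −λ that Company 1 collects at each step it stays in business.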

Remark 3.
(i) We take for granted that each optimizer does not know what the other has decided to do. In Section 3, we will assume that Company 2 knows the decision made by Company 1.
(ii) There are four possibilities for (u_0, v_0): (0, 0), (0, −p_A), (p_A, 0) and (p_A, −p_A). If we solve the difference equation corresponding to each possible value of (u_0, v_0), we actually obtain the value of the function F(k) when the optimizers choose the same value of (u_0, v_0) for every k. Hence, we cannot obtain the value function and/or the optimal controls for any value of k by comparing the four expressions for F(k) obtained by solving the four difference equations.
(iii) We can write that (u_n, v_n) = (u_n(k), v_n(k)). The number of possible choices of (u_0(k), v_0(k)) for j ≤ k < r is equal to 4^(r−j). Given the values of (u_0(k), v_0(k)) for k = j, ..., r − 1, we can solve a system of r − j linear equations to obtain the corresponding values of F(k). If r − j is small, it is a simple matter to consider all the possible values of (u_0(k), v_0(k)) and compute the function F(k) for k = j, ..., r − 1. We can then determine the optimal controls and the associated value function.
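The enumeration described in (iii) can be sketched as follows for j = 1 and r = 3: there are 4^(r−j) = 16 stationary control profiles, and each one yields a 2 × 2 linear system for F(1) and F(2) with F(0) = F(3) = 0. The transition probabilities (up with probability a(1 − p_S), down with probability p_S(1 − a), where a = p_A + u + v), the stage cost q_1 u^2 − q_2 v^2 − λ, and all parameter values are illustrative assumptions, not the text's exact data.

```python
from itertools import product

# Sketch of Remark 3 (iii): enumerate all 16 stationary choices of
# (u0(k), v0(k)) for k = 1, 2 and solve the resulting 2x2 linear system.

def profile_values(controls, pA=0.2, pS=0.5, q1=1.0, q2=1.0, lam=1.0):
    """controls = ((u1, v1), (u2, v2)); returns (F(1), F(2)) with F(0) = F(3) = 0."""
    coeff, rhs = [], []
    for k, (u, v) in zip((1, 2), controls):
        a = pA + u + v
        up, down = a * (1 - pS), pS * (1 - a)
        g = q1 * u ** 2 - q2 * v ** 2 - lam
        # F(k) = g + up*F(k+1) + down*F(k-1) + (1-up-down)*F(k)
        if k == 1:
            coeff.append((up + down, -up))      # down-neighbor F(0) = 0
        else:
            coeff.append((-down, up + down))    # up-neighbor F(3) = 0
        rhs.append(g)
    (a11, a12), (a21, a22) = coeff
    det = a11 * a22 - a12 * a21
    F1 = (rhs[0] * a22 - a12 * rhs[1]) / det    # Cramer's rule (2x2 system)
    F2 = (a11 * rhs[1] - rhs[0] * a21) / det
    return F1, F2

pA = 0.2
pairs = [(u, v) for u in (0.0, pA) for v in (-pA, 0.0)]
table = {c: profile_values(c) for c in product(pairs, repeat=2)}   # the 16 cases
```

Every stage cost is negative for these parameter values, so all 16 pairs (F(1), F(2)) come out negative; comparing the 16 entries is exactly the brute-force procedure suggested in the remark.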
In the following subsection, a particular problem will be solved explicitly.
Remark 4. The four difference equations that must be solved, subject to the boundary conditions F(0) = F(3) = 0, correspond to the four cases listed below. We denote their solutions by F_i(k), for i = 1, 2, 3, 4.
(1) (u_0, v_0) = (0, 0): we easily find the solution F_1(k) satisfying the boundary conditions.
(2) (u_0, v_0) = (p_A, 0): we find the corresponding solution F_2(k).
(3) (u_0, v_0) = (0, −p_A): because p_A(0) = 0, X_1(n) cannot reach the value 3 if k = 1 or 2. The solution F_3(k) then only has to satisfy the boundary condition F(0) = 0.
(4) (u_0, v_0) = (p_A, −p_A): we obtain the solution F_4(k).
The functions F_i(k), for i = 2, 3, 4, are shown in Figure 1. Moreover, the values of F_i(k) for k = 1, 2 and i = 1, 2, 3, 4 are presented in Table 2. Notice that F_1(k) corresponds to Case no. 1, while F_4(k) corresponds to Case no. 16. We observe that none of the functions F_1(k), ..., F_4(k) is the value function. However, F_4(k) yields values which are quite close to those obtained with the value function. Therefore, if r − j is large, so that the number of equations to consider is also large, then a suboptimal solution can be obtained by assuming that (u_0(k), v_0(k)) will be the same for any k.

Optimal Control When u_n Is Known
In this section, we assume that Company 2 knows the strategy of Company 1 and tries to maximize the expected value of the following cost function:

C(k) = Σ_{n=0}^{τ(k)−1} ( −q_2 v_n^2 ) + K(X_1(τ(k))).

The terminal cost function K(·) is defined by

K(x) = K_1 if x = r, and K(x) = K_2 if x < j,

where K_1 < 0 and K_2 > 0. The constant r ∈ N could be the maximum number of orders that Company 1 can process at the same time. Suppose that Company 1 uses the control u_n(k) = p_A for any n and any k. Company 2 must decide whether to choose v_n(k) = 0 or −p_A. We define the value function

V(k) := max_{v_n} E[C(k)], for j ≤ k ≤ r.

Proceeding as in the previous section, we can prove the following proposition.
Proposition 2. The value function V(k) satisfies the dynamic programming equation

V(k) = max_{v_0} { −q_2 v_0^2 + p_A(0)(1 − p_S) V(k + 1) + p_S [1 − p_A(0)] V(k − 1) + [1 − p_A(0)(1 − p_S) − p_S (1 − p_A(0))] V(k) }

for j ≤ k < r, where p_A(0) = 2p_A + v_0, together with the boundary conditions V(k) = K_2 if k < j and V(r) = K_1.

There are now 2^(r−j) possible strategies for Company 2. If r − j is small, we can proceed as in the previous section and calculate the expected value of C(k) for each possible strategy.
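With u fixed at p_A, Company 2's enumeration can be sketched for j = 1 and r = 5, so that the choices v_0(1), ..., v_0(4) give 2^4 = 16 strategies, each yielding a 4 × 4 linear system. The transition probabilities (up with probability a(1 − p_S), down with probability p_S(1 − a), where a = 2p_A + v), the stage cost −q_2 v^2, the terminal costs K_1 = −5 and K_2 = 5, and all other parameter values are assumptions for the sketch.

```python
from itertools import product

# Sketch of Section 3: enumerate Company 2's 2**4 = 16 strategies when u = pA.

def gauss_solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda rr: abs(M[rr][c]))
        M[c], M[piv] = M[piv], M[c]
        for rr in range(n):
            if rr != c and M[rr][c]:
                f = M[rr][c] / M[c][c]
                M[rr] = [x - f * y for x, y in zip(M[rr], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def company2_values(vs, pA=0.2, pS=0.5, q2=1.0, K1=-5.0, K2=5.0, j=1, r=5):
    """Expected cost V(k), k = j, ..., r - 1, when Company 2 plays vs and u = pA."""
    n = r - j
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, k in enumerate(range(j, r)):
        a = 2 * pA + vs[i]                    # u = pA is fixed
        up, down = a * (1 - pS), pS * (1 - a)
        A[i][i] = up + down                   # coefficient (1 - stay) of V(k)
        b[i] = -q2 * vs[i] ** 2
        if k + 1 == r:  b[i] += up * K1       # terminal cost when X_1 hits r
        else:           A[i][i + 1] = -up
        if k - 1 < j:   b[i] += down * K2     # terminal cost when X_1 drops below j
        else:           A[i][i - 1] = -down
    return gauss_solve(A, b)

best = max(product((-0.2, 0.0), repeat=4),
           key=lambda vs: company2_values(list(vs))[0])   # maximize V(1)
```

With v_0(k) ≡ 0 the chain is down-biased, so V(k) decreases in k: the closer Company 1 starts to r, the more likely the (negative) terminal cost K_1 is collected.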
Table 3. Values of V(1), ..., V(4) for the 16 possible choices of the control variables v_0(1), ..., v_0(4).

Remark 5. As in the previous section, we can obtain at least a suboptimal solution that is close to the optimal one by solving the difference equations obtained by assuming first that v_0(k) ≡ 0, and then that v_0(k) ≡ −p_A. The first equation corresponds to Case no. 1, and the second one to Case no. 16. The solutions to these equations in the particular case considered above are shown in Figure 2. Notice that the solution obtained when v_0(k) ≡ 0 (Case no. 1) actually gives the minimum of the expected value of the cost function C(k). Moreover, we see in Table 3 that the choice that corresponds to Case no. 11 almost yields the optimal solution that we are looking for.

Conclusions
In this paper, a homing problem for a queueing model in discrete time has been considered. The problem can be seen as a dynamic game because there are two optimizers with opposing objectives.
In Section 2, dynamic programming was used to derive the equation satisfied by the value function. From this equation, one can deduce the optimal values of the control variables. However, we have seen that, in order to do this, one has to solve a possibly large number of systems of linear equations, subject to the appropriate boundary conditions. Although solving each system is straightforward, repeating this procedure a large number of times can become tedious. We have also seen that it is possible to obtain a good suboptimal solution to our problem fairly quickly.
In Section 3, the problem formulation was modified. We assumed that the strategy of Company 1 was known, and we looked for the strategy that Company 2 should adopt to maximize the expected value of a certain cost function. We treated the case when the control variable u_n is always equal to p_A. However, the same type of analysis could be carried out for any choice of u_n. In particular, we could find the optimal strategy of Company 2 if u_n ≡ 0.
In theory, we could easily extend the problems considered to the case when each optimizer can choose between more than two possible values for the variable it controls. The calculations would, however, become quite complex. One could possibly use numerical simulations to determine the optimal solutions. Indeed, instead of solving a large number of difference equations, simulating the proposed model can enable us to determine the optimal solution by computing the value function for each simulation. Simulating geometric random variables is not a difficult task.
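The simulation approach mentioned above can be sketched as follows: simulate the controlled chain under a fixed stationary strategy and average the accumulated cost up to τ(k). The transition mechanism (up with probability a(1 − p_S), down with probability p_S(1 − a), where a = p_A + u + v), the stage cost q_1 u^2 − q_2 v^2 − λ, and all parameter values are assumptions for illustration.

```python
import random

# Sketch: Monte Carlo estimation of E[J(k)] for a fixed stationary strategy (u, v).

def simulate_J(k, u, v, pA=0.2, pS=0.5, q1=1.0, q2=1.0, lam=1.0,
               j=1, r=3, rng=None, max_steps=100_000):
    """Simulate one path of X_1(n) from state k until tau(k) and return J(k)."""
    rng = rng or random.Random(0)
    a = pA + u + v
    up, down = a * (1 - pS), pS * (1 - a)
    cost = 0.0
    for _ in range(max_steps):
        if k < j or k == r:                  # tau(k) has been reached
            return cost
        cost += q1 * u ** 2 - q2 * v ** 2 - lam
        x = rng.random()
        if x < up:
            k += 1                           # Company 1 receives an order
        elif x < up + down:
            k -= 1                           # a service is completed
    return cost                              # safety cap (practically never reached)

rng = random.Random(42)
estimate = sum(simulate_J(1, 0.0, 0.0, rng=rng) for _ in range(2000)) / 2000
```

Repeating this for each candidate strategy and comparing the averages gives the simulation-based alternative to solving the difference equations.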
Finally, it is also possible to consider optimal control problems for queueing models in continuous time with two optimizers.

Table 1. Values of F(1) and F(2) for the 16 possible choices of the control variables.