Estimation of the Optimal Threshold Policy in a Queue with Heterogeneous Servers Using a Heuristic Solution and Artificial Neural Networks

Dmitry Efrosinin; Natalia Stepanova

doi:10.3390/math9111267

¹ Institute for Stochastics, Johannes Kepler University Linz, 4030 Linz, Austria

² Department of Information Technologies, Faculty of Mathematics and Natural Sciences, Peoples’ Friendship University of Russia (RUDN University), 117198 Moscow, Russia

³ Laboratory N17, Trapeznikov Institute of Control Sciences of RAS, 117997 Moscow, Russia

* Author to whom correspondence should be addressed.

Mathematics 2021, 9(11), 1267; https://doi.org/10.3390/math9111267

This article belongs to the Section E: Applied Mathematics


Abstract

This paper deals with heterogeneous queues where servers differ not only in service rates but also in operating costs. The classical optimisation problem in queueing systems with heterogeneous servers consists in the optimal allocation of customers between the servers with the aim of minimising the long-run average cost of the system per unit of time. As is known, under some assumptions the optimal allocation policy for this system is of threshold type, i.e., the policy depends on the queue length and the states of the faster servers. The optimal thresholds can be calculated using a Markov decision process by implementing the policy-iteration algorithm. This algorithm has certain limitations that may prevent obtaining results over the entire range of system parameter values. However, the available data sets of evaluated optimal threshold levels and the corresponding system parameter values can be used to estimate the optimal thresholds through artificial neural networks. The obtained results are accompanied by a simple heuristic solution. Numerical examples illustrate the quality of the estimations.

Keywords:

heterogeneous servers; policy-iteration algorithm; heuristic solution; artificial neural networks

1. Introduction

Many queueing systems are analysed with respect to their dynamic and optimal control related to system access, resource allocation, changing service area characteristics and so on. Sets of computerised tools and procedures provide large data sets, which can be useful for expanding the potential of classical optimisation methods. This paper deals with a known model of a multi-server queueing system with controllable allocation of customers between heterogeneous servers, which differ in their service and cost attributes. For a queueing system with two heterogeneous servers, it was shown in [1] by a dynamic programming approach that, to minimise the mean sojourn time of customers in the system, the faster server must always be used, and a customer has to be assigned to the slower server if and only if the number of customers in the queue exceeds a certain threshold level. Furthermore, this result was obtained independently and in a simpler form in [2,3]. In [4], the author analysed a multi-server version of such a system and confirmed the threshold nature of the optimal policy as well.

The problem of optimally allocating customers between heterogeneous servers in queueing systems with additional costs, with the aim of minimising the long-run average cost per unit of time, is notoriously more difficult. Some progress has been made since the appearance of the review paper [5]. In [6,7], the authors studied a model with set-up costs using a hysteretic control rule, thereby stressing the algorithmic aspects of the optimal control structure. The same system was discussed in [8], where a direct method providing a closed-form expression for the stationary occupancy distribution was proposed. In [9,10], the authors used a theoretical study and an exhaustive numerical analysis to show that, for a specified ordering of the servers, the optimal allocation policy which minimises the long-run average cost belongs to a set of structural policies. In other words, for the server enumeration (1), the allocation control policy, denoted by f, can be defined through a sequence of threshold levels $1 = q_1 \leq q_2 \leq \dots \leq q_K < \infty$. Under this policy, the server operating at the highest rate remains busy whenever the system is non-empty. The kth server ($k \geq 2$) is used only if the first $k - 1$ servers are busy and the queue length reaches the threshold level $q_k > 0$. In the general case, the optimal threshold levels can depend on the states of slower servers, and formally the optimal policy f is not of a pure threshold type. However, since the kth threshold value varies by at most one when the state of a slower server changes, and this has a weak effect on the average cost, such influence can be neglected. Hence, the optimal allocation policy for a multi-server heterogeneous queueing system can be treated as a threshold policy.

Searching for the optimal values of $q_2, \dots, q_K$ by directly minimising the average cost function can be expensive, especially when K is large. To calculate the optimal threshold levels, we can use a policy-iteration algorithm [11,12,13], which constructs a sequence of improved policies that converges to the optimal one. This algorithm is a fairly versatile tool for solving various optimisation problems. Unfortunately, as is usually the case in practice, it has some limitations, such as convergence difficulties when the system is close to heavy traffic and restrictions on the dimension of the process and, consequently, on the number of states. We would therefore like to compensate for some of the weaknesses of this algorithm with other methods for calculating the optimal control policy. The contribution of this paper consists of two conceptual parts. In the first part, we propose a heuristic solution (HS) that yields functional relationships for the optimal thresholds based on a simple deterministic approximation of the system's behaviour. The second part is devoted to an alternative machine learning technique, artificial neural networks (NN) [14,15,16], which is again used to estimate the optimal threshold levels. The policy-iteration algorithm is used in the paper to generate the data sets needed both to verify the quality of the proposed estimation methods and to train the neural networks. We believe that a trained neural network can be used to calculate the optimal thresholds for system parameters for which alternative numerical methods are difficult or impossible to use, for example, in the heavy-traffic case, or, more generally, to reconstruct the regions of optimality without time-expensive algorithms and procedures. A number of papers address the prediction of the stochastic behaviour of queueing systems and networks using machine learning algorithms; see, e.g., [17,18] and references therein. However, we were unable to find published works in which heuristics and machine learning methods are used to solve a similar optimisation problem for heterogeneous queueing systems, and we therefore consider this paper relevant.

This paper is organised as follows. In Section 2, we briefly describe the mathematical model. Section 3 introduces heuristic choices of the threshold levels that turn out to be nearly optimal. Section 4 presents the results obtained when the trained neural network was run on verification data generated by the policy-iteration algorithm.

2. Mathematical Model

We briefly summarise the model under study. The queueing system is of type $M/M/K$ with an infinite-capacity buffer and K heterogeneous servers. This system is shown schematically in Figure 1. The Poisson arrival stream has rate $λ$, and the service time at server j is exponentially distributed with rate $μ_j$. We assume that service is non-preemptive, i.e., a customer in service cannot change the server. The inter-arrival times and the service times at the servers are assumed to be independent random variables. An additional cost structure is introduced, consisting of the operating cost $c_j > 0$ per unit of time of service at server j and the holding cost $c_0 > 0$ per customer per unit of time of waiting in the queue. Assume that the servers are enumerated such that

$$μ_1 \geq \dots \geq μ_K, \quad c_1 μ_1^{-1} \leq \dots \leq c_K μ_K^{-1}, \tag{1}$$

where $c_j μ_j^{-1}$ is the mean operating cost per customer at the jth server.

Figure 1. Controllable multi-server queueing system with heterogeneous servers and operating costs.

The controller has full information about the system's state and, based on this information, can take control actions at the decision epochs when certain state transitions occur, following the prescription of the policy f. In our case, the controller selects a control action when a new customer enters the system and at service completion times if the queue is not empty. When a new customer arrives, it joins the queue, and at the same time the controller either sends the customer at the head of the queue to one of the idle servers or leaves it in the queue. At a service completion, the customer leaves the corresponding server, and at the same time the controller takes the next customer from the head of the queue, if the queue is not empty, and either dispatches it to one of the idle servers or leaves it in the queue. A service completion in a system without waiting customers does not require any control action.

The fact that, for multi-server heterogeneous queueing systems with costs, the policy minimising the long-run average cost per unit of time belongs to the set of threshold-based policies was first proved in [10] and later confirmed for systems with heterogeneous groups of servers in [19]. In the general case, the corresponding optimal thresholds can depend on the states of slower servers. However, according to the numerical results obtained in [9], we can neglect the weak influence of the slower servers' states on the optimal allocation policy for the faster servers. This phenomenon is discussed further in Example 2. Therefore, we may assume that the optimal policy belongs to the class of pure threshold policies, in which the use of a certain server depends solely on the number of customers waiting in the queue. Specifically, for the system under study, such a policy is defined by the following sequence of threshold levels:

$$1 = q_1 \leq q_2 \leq \dots \leq q_K < \infty. \tag{2}$$

The policy prescribes the use of the k fastest servers whenever the number q of customers waiting in the queue satisfies $q_k \leq q \leq q_{k+1} - 1$ (with $q_{K+1} = \infty$).
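Read literally, rule (2) is a table lookup: for q waiting customers the policy prescribes the largest k with $q_k \leq q$. The following minimal Python sketch encodes this reading; the function name `servers_in_use` is ours and not part of the paper, and the thresholds used in the usage line are the optimal ones reported in Example 1.

```python
from bisect import bisect_right
from typing import Sequence


def servers_in_use(q: int, thresholds: Sequence[int]) -> int:
    """Number k of fastest servers prescribed by the pure threshold policy (2).

    thresholds = (q_1, ..., q_K) must be nondecreasing with q_1 = 1.
    Returns k with q_k <= q <= q_{k+1} - 1 (q_{K+1} = infinity), or 0 if q < q_1.
    """
    if list(thresholds) != sorted(thresholds) or thresholds[0] != 1:
        raise ValueError("thresholds must satisfy 1 = q_1 <= q_2 <= ... <= q_K")
    # bisect_right counts the thresholds that are <= q, i.e. the largest such k.
    return bisect_right(thresholds, q)


# Thresholds (q_1, ..., q_5) = (1, 3, 4, 5, 12) from Example 1.
example = (1, 3, 4, 5, 12)
print([servers_in_use(q, example) for q in (0, 2, 3, 4, 11, 12, 50)])  # [0, 1, 2, 3, 4, 5, 5]
```

Binary search is used only for brevity; since K is small, a linear scan would do equally well.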

To calculate the optimal thresholds, we formulate the optimisation problem in terms of a Markov decision process. This process is based on the $(K+1)$-dimensional continuous-time Markov chain

$$\{X(t)\}_{t \geq 0} = \{Q(t), D_1(t), \dots, D_K(t)\}_{t \geq 0} \tag{3}$$

with infinitesimal matrix $Λ^f$, which depends on the policy f. Here the component $Q(t) \in \mathbb{N}_0$ is the number of waiting customers at time t, and

$$D_j(t) = \begin{cases} 0 & \text{if the } j\text{th server is idle}, \\ 1 & \text{if the } j\text{th server is busy}. \end{cases}$$

The state space of the process $\{X(t)\}_{t \geq 0}$ operating under a policy f is $E^f = \{x = (q(x), d_1(x), \dots, d_K(x))\} \subseteq \mathbb{N}_0 \times \{0, 1\}^K$, where $q(x)$ and $d_j(x)$ denote, respectively, the queue length and the state of the jth server in state $x \in E^f$.

The server states are partitioned as follows:

$$J_0(x) = \{j : d_j(x) = 0\}, \quad J_1(x) = \{j : d_j(x) = 1\}.$$

The sets $J_0(x)$ and $J_1(x)$ are the sets of idle and busy servers in state $x \in E^f$, respectively. The set of control actions is $A = \{0, 1, \dots, K\}$. If $a = 0$, the controller allocates a customer to the queue; if $a \neq 0$, the controller instructs a customer to occupy server a. In addition, we define the subsets $A(x) = J_0(x) \cup \{0\} \subseteq A$ of admissible actions in state x. The policy f specifies the choice of a control action at every decision epoch, and the infinitesimal matrix $Λ^f = [λ_{xy}(a)]$ of the Markov chain (3) then has the following elements:

$$λ_{xy}(a) = \begin{cases} λ, & y = x + e_a,\ a \in A(x), \\ μ_j, & y = x - e_j,\ j \in J_1(x),\ q(x) = 0, \\ μ_j, & y = x - e_j - e_0 + e_a,\ j \in J_1(x),\ a \in A(x - e_j - e_0),\ q(x) > 0, \end{cases}$$

where $e_j$ is the $(K+1)$-dimensional unit vector with all elements equal to zero except the one in position j ($j = 0, 1, \dots, K$).

We search for the optimal control policy among the stationary Markov policies f that guarantee ergodicity of the Markov chain $\{X(t)\}_{t \geq 0}$. The corresponding stability condition is $λ < \sum_{j=1}^{K} μ_j$. This follows from the fact that, once the number of waiting customers exceeds the threshold $q_K$, the system behaves like an $M/M/1$ queue with arrival rate $λ$ and total service rate $μ_1 + \dots + μ_K$. As is known (see, e.g., [13]), for an ergodic Markov chain with costs the long-run average cost per unit of time under the policy f equals the corresponding ensemble average, which can be written in the form

$$g^f = \limsup_{t \to \infty} \frac{1}{t} V^f(x, t) = \sum_{y \in E^f} c(y)\, π_y^f, \tag{4}$$

where $c(y) = c_0 q(y) + \sum_{j=1}^{K} c_j d_j(y)$ is the immediate cost rate in state $y \in E^f$. The cost function $V^f(x, t)$ is given by

$$V^f(x, t) = \mathbb{E}^f\Big[\int_0^t \Big(c_0 Q(s) + \sum_{j=1}^{K} c_j D_j(s)\Big)\, ds \,\Big|\, X(0) = x\Big].$$

This function describes the total expected cost up to time t given that the initial state is x, and $π_y^f = \lim_{t \to \infty} P^f[X(t) = y]$ is the stationary state distribution under the policy f. The policy $f^*$ is said to be optimal if it attains the minimum of $g^f$ defined in (4), i.e., if $g^{f^*} = g^*$, where

$$g^* = \inf_f g^f = \min_{q_2, \dots, q_K} g(q_2, \dots, q_K). \tag{5}$$
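Equations (4) and (5) suggest a direct, if expensive, route to the optimal thresholds: build the generator of a truncated chain for a fixed threshold policy, solve $πΛ^f = 0$ with $\sum π = 1$, evaluate g, and minimise over $(q_2, \dots, q_K)$. The Python/NumPy sketch below illustrates this for small K. It is our illustration, not the authors' code: the function names are hypothetical, the buffer is truncated at W with arriving customers lost when it is full, and we assume the dispatch rule "send the head-of-queue customer to the fastest idle server k whenever the number of waiting customers is at least $q_k$".

```python
import itertools
import numpy as np


def _dispatch(q, d, thresholds):
    """Move the head-of-queue customer to the fastest idle server k if q >= q_k."""
    idle = [k for k, busy in enumerate(d) if not busy]
    if idle and q >= thresholds[idle[0]]:
        d = d[:idle[0]] + (1,) + d[idle[0] + 1:]
        q -= 1
    return q, d


def average_cost(lam, mu, c, c0, thresholds, W):
    """Average cost g of eq. (4) for a threshold policy, buffer truncated at W."""
    K = len(mu)
    start = (0, (0,) * K)
    index, edges, frontier = {start: 0}, [], [start]
    while frontier:  # enumerate the states reachable from the empty system
        x = frontier.pop()
        q, d = x
        targets = []
        y = _dispatch(q + 1, d, thresholds)          # arrival
        if y[0] <= W:                                 # else the customer is lost
            targets.append((y, lam))
        for j in range(K):                            # completion at busy server j
            if d[j]:
                z = _dispatch(q, d[:j] + (0,) + d[j + 1:], thresholds)
                targets.append((z, mu[j]))
        for y, rate in targets:
            if y not in index:
                index[y] = len(index)
                frontier.append(y)
            edges.append((index[x], index[y], rate))
    n = len(index)
    Q = np.zeros((n, n))
    for i, k, rate in edges:
        Q[i, k] += rate
        Q[i, i] -= rate
    # Stationary distribution: pi Q = 0 together with sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    cost = np.empty(n)
    for (q, d), i in index.items():
        cost[i] = c0 * q + sum(cj * dj for cj, dj in zip(c, d))
    return float(pi @ cost)


def best_thresholds(lam, mu, c, c0, W, q_max):
    """Direct minimisation (5) over nondecreasing (q_2, ..., q_K) in [1, q_max]."""
    K = len(mu)
    candidates = itertools.combinations_with_replacement(range(1, q_max + 1), K - 1)
    return min(((1,) + t for t in candidates),
               key=lambda th: average_cost(lam, mu, c, c0, th, W))
```

As a sanity check, if server 2 is effectively disabled ($q_2 > W$), the result reduces to the $M/M/1$ value $c_0 ρ^2/(1-ρ) + c_1 ρ$, which equals 1/3 for λ = 1, μ_1 = 4 and c_0 = c_1 = 1. The number of candidate threshold vectors grows combinatorially with K, which is exactly the expense the policy-iteration algorithm avoids.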

To evaluate the optimal threshold levels and the optimised value of the mean average cost per unit of time, the policy-iteration Algorithm 1 is used. This algorithm constructs a sequence of improved policies until an average-cost optimal policy is reached. It consists of three main steps: value evaluation, policy improvement and threshold evaluation. The value evaluation is based on solving, for a given policy f, the system of linear equations

$$v^f(x) = \frac{1}{λ_x(a)}\Big(c(x) + \sum_{y \neq x} λ_{xy}(a)\, v^f(y) - g^f\Big), \quad x \in E^f, \tag{6}$$

where $λ_x(a) = \sum_{y \neq x} λ_{xy}(a)$ is the total rate of leaving state x.

Algorithm 1 Policy-iteration algorithm

1: procedure PIA($K, W, λ, μ_j, c_j, j = 1, 2, \dots, K, c_0$)
2: $f^{(0)}(x) = \arg\min_{j \in J_0(x)} \{\frac{c_j}{μ_j}\}$ ▹ Initial policy
3: $n \leftarrow 0$
4: $g^{f^{(n)}} = λ v^{f^{(n)}}(e_1)$ ▹ Value evaluation
5: for $x = (0, 1, 0, \dots, 0)$ to $(W, 1, 1, \dots, 1)$ do
6: $v^{f^{(n)}}(x) = \frac{1}{λ + \sum_{j \in J_1(x)} μ_j} \Big[c(x) - g^{f^{(n)}} + λ v^{f^{(n)}}(x + e_{f^{(n)}(x)}) + \sum_{j \in J_1(x)} μ_j v^{f^{(n)}}(x - e_j) 1_{\{q(x) = 0\}} + \sum_{j \in J_1(x)} μ_j v^{f^{(n)}}(x - e_j - e_0 + e_{f^{(n)}(x - e_j - e_0)}) 1_{\{q(x) > 0\}}\Big]$
7: end for
8: $f^{(n+1)}(x) = \arg\min_{a \in A(x)} v^{f^{(n)}}(x + e_a)$ ▹ Policy improvement
9: if $f^{(n+1)}(x) = f^{(n)}(x)$ for all $x \in E^f$ then go to step 12
10: else $n \leftarrow n + 1$, go to step 4
11: end if
12: $q_k : f^{(n+1)}(q, 1, \dots, 1, 0, d_{k+1}, \dots, d_K) = \begin{cases} 0, & q \leq q_k - 2, \\ k, & q > q_k - 2, \end{cases} \quad k = 2, \dots, K$ ▹ Threshold evaluation
13: return $f^{(n+1)}, v^{f^{(n)}}, g^{f^{(n)}}, q_2, \dots, q_K$
14: end procedure

Here $v^f: E^f \to \mathbb{R}$ is the dynamic-programming value function, which describes the effect of the initial state x on the total average cost and satisfies the asymptotic relation

$$V^f(x, t) = g^f t + v^f(x) + o(1), \quad t \to \infty, \ x \in E^f.$$

To make the system (6) uniquely solvable, one of the values $v(x)$ must be set to zero; e.g., for $x_0 = (0, \dots, 0)$ we set $v(x_0) = 0$. Since in our case $c(x_0) = 0$, the first equation of the system (6) takes the form $g^f = \sum_{y \neq x_0} λ_{x_0 y}(a)\, v^f(y)$. In the policy improvement step, a new policy $f'$ is calculated by minimising the value function $v(x + e_a)$ over the admissible control actions $a \in A(x)$ for every state $x \in E^f$. The algorithm stops when the policies f and $f'$ obtained on two consecutive iterations coincide. In the threshold evaluation step, we calculate the optimal thresholds $q_k$, $k = 2, \dots, K$, from the optimal policy f. As the initial policy, we select the policy that prescribes in any state the use of the idle server j with the minimal mean operating cost $\frac{c_j}{μ_j}$ per customer. More detailed information on deriving the dynamic programming equations for the heterogeneous queueing system and calculating the corresponding optimal allocation control policy can be found in [9]. For the existence of an optimal stationary policy and the convergence of the policy-iteration algorithm, we refer to [12,20,21,22].

To implement the policy-iteration algorithm, we convert the $(K+1)$-dimensional state space $E^f$ of the Markov decision process to an equivalent one-dimensional state space. Let $Δ: E^f \to \mathbb{N}_0$ be the one-to-one mapping of the vector state $x = (q(x), d_1(x), \dots, d_K(x)) \in E^f$ to a value in $\mathbb{N}_0$ of the form

$$Δ(x) = q(x)\, 2^K + \sum_{i=1}^{K} d_i(x)\, 2^{i-1}. \tag{7}$$

The new state after a transition involving the addition or removal of a customer in a state $x \in E^f$ is calculated in the one-dimensional state space by

$$Δ(x \pm e_0) = (q(x) \pm 1)\, 2^K + \sum_{i=1}^{K} d_i(x)\, 2^{i-1} = Δ(x) \pm 2^K,$$
$$Δ(x \pm e_j) = q(x)\, 2^K + \sum_{i=1}^{K} d_i(x)\, 2^{i-1} \pm 2^{j-1} = Δ(x) \pm 2^{j-1}.$$
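Mapping (7) is a mixed-radix packing of the queue length and the binary vector of server states, so it can be inverted with one integer division. A minimal Python sketch (the function names `encode` and `decode` are ours):

```python
def encode(q: int, d: tuple[int, ...]) -> int:
    """Delta(x) = q * 2^K + sum_i d_i * 2^(i-1), eq. (7); d = (d_1, ..., d_K)."""
    K = len(d)
    # enumerate starts at 0, so d_1 gets weight 2^0, d_2 weight 2^1, and so on.
    return q * 2 ** K + sum(di << i for i, di in enumerate(d))


def decode(n: int, K: int) -> tuple[int, tuple[int, ...]]:
    """Inverse of encode: recover (q, (d_1, ..., d_K)) from Delta(x)."""
    q, r = divmod(n, 2 ** K)
    return q, tuple((r >> i) & 1 for i in range(K))


x = (3, (1, 0, 1))
print(encode(*x))             # 29
print(encode(4, (1, 0, 1)))   # 37: one more waiting customer adds 2^K = 8
print(encode(3, (1, 1, 1)))   # 31: occupying server 2 adds 2^(2-1) = 2
```

With a buffer of size W the codes run from 0 to $2^K(W+1) - 1$, in agreement with the state count in Remark 1, so the value function can be stored in a plain array indexed by Δ(x).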

In the algorithm, the infinite-buffer system must further be approximated by an equivalent system in which the number of waiting places is finite but sufficiently large. As the truncation criterion, we use the loss probability, which should not exceed some small value $ε > 0$.

Remark 1.

If the buffer size is W, the number of states is $|E^f| = 2^K (W + 1)$. Once the number of waiting customers reaches the level $q_K$, all servers are occupied, and the system dynamics is the same as in a classical $M/M/1$ queue with arrival rate λ and service rate $\sum_{j=1}^{K} μ_j$. The stationary state probabilities for the states x with $q(x) \geq q_K$ satisfy the difference equation

$$λ π_{(q-1, 1, \dots, 1)} - \Big(λ + \sum_{j=1}^{K} μ_j\Big) π_{(q, 1, \dots, 1)} + \sum_{j=1}^{K} μ_j\, π_{(q+1, 1, \dots, 1)} = 0,$$

which has a solution in geometric form, $π_{(q, 1, \dots, 1)} = π_{(q_K, 1, \dots, 1)}\, ρ^{q - q_K}$, $q \geq q_K$. For details and theoretical substantiation see, e.g., [23]. Note that the value of $q_K$ in this formula can be estimated by the heuristic solution (9). The truncation parameter W of the buffer size can then be evaluated from the following constraint on the loss probability:

$$\sum_{q=W}^{\infty} π_{(q, 1, \dots, 1)} = π_{(q_K, 1, \dots, 1)} \sum_{q=W}^{\infty} ρ^{q - q_K} \leq \sum_{q=W}^{\infty} ρ^{q - q_K} = \frac{ρ^{W - q_K}}{1 - ρ} < ε,$$

where $ρ = \frac{λ}{\sum_{j=1}^{K} μ_j}$. Taking logarithms and using $\log ρ < 0$, we obtain

$$W > \frac{\log(ε(1 - ρ))}{\log ρ} + q_K.$$
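The bound at the end of Remark 1 is straightforward to evaluate. A minimal Python sketch (the helper name `truncation_level` is ours) that returns the smallest integer W satisfying the strict inequality:

```python
import math


def truncation_level(lam, mu, q_K, eps):
    """Smallest integer W with W > log(eps * (1 - rho)) / log(rho) + q_K."""
    rho = lam / sum(mu)
    if not 0.0 < rho < 1.0:
        raise ValueError("the stability condition lambda < sum(mu) is violated")
    bound = math.log(eps * (1.0 - rho)) / math.log(rho) + q_K
    return math.floor(bound) + 1  # strict inequality, so round down and add one


# Parameters of Example 1: lambda = 15, mu = (20, 8, 4, 3, 1), q_5 = 12, eps = 1e-4.
print(truncation_level(15, [20, 8, 4, 3, 1], 12, 1e-4))  # 24
```

Since $q_K$ is itself only estimated by (9), it seems prudent to choose W with a generous margin above this value, as is done in Example 1 with W = 80.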

Example 1.

Consider the $M/M/5$ system with $K = 5$ and $λ = 15$. All other parameters take the following values:

j	0	1	2	3	4	5
$c_{j}$	1	5	4	3	2	1
$μ_{j}$	-	20	8	4	3	1
$c_{j} μ_{j}^{- 1}$	-	0.25	0.50	0.75	0.67	1.00

The truncation parameter W of the buffer size is chosen as 80, which for $ε = 0.0001$ and $ρ = λ / \sum_{j=1}^{5} μ_j = 15/36$ guarantees that $W > \frac{\log(0.0001\,(1 - 15/36))}{\log(15/36)} + q_5 \approx 23.14$. Here $q_5 = 12$ was calculated by (9). In the following control table, we summarise the functions $f(x)$, which specify the control actions at arrival times in a given state x:

System state $x$	Queue length $q(x)$
$d = (d_1, d_2, d_3, d_4, d_5)$	0	1	2	3	4	5	6	7	8	9	10	11	12	...
(0,*,*,*,*)	1	1	1	1	1	1	1	1	1	1	1	1	1	1
(1,0,*,*,*)	0	0	2	2	2	2	2	2	2	2	2	2	2	2
(1,1,0,*,*)	0	0	0	3	3	3	3	3	3	3	3	3	3	3
(1,1,1,0,*)	0	0	0	0	4	4	4	4	4	4	4	4	4	4
(1,1,1,1,0)	0	0	0	0	0	0	0	0	0	0	0	5	5	5
(1,1,1,1,1)	0	0	0	0	0	0	0	0	0	0	0	0	0	0

The threshold levels $q_k$, $k = 2, \dots, K = 5$, can be evaluated by comparing the optimal actions

$$f(q, \underbrace{1, \dots, 1}_{k-1}, \underbrace{0, \dots, 0}_{K-k+1}) < f(q+1, \underbrace{1, \dots, 1}_{k-1}, \underbrace{0, \dots, 0}_{K-k+1})$$

for $q = 0, \dots, W - 1$. In this example, the optimal policy $f^*$ is defined through the sequence of threshold levels $(q_2, q_3, q_4, q_5) = (3, 4, 5, 12)$, and $g^* = 4.92897$. Bold and underlined entries in the control table mark the change of the control action in a certain system state.
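The threshold evaluation amounts to locating, in each row of the control table, the first queue length at which the action switches from 0 to k; by step 12 of Algorithm 1 this happens at $q = q_k - 1$. A small Python sketch (the helper name `threshold_from_row` is ours) that recovers the thresholds of this example from the table rows for $q = 0, \dots, 12$:

```python
def threshold_from_row(row, k):
    """q_k from the control-table row of state (q, 1, ..., 1, 0, ...).

    The action switches from 0 to k at queue length q = q_k - 1 (step 12).
    """
    for q, action in enumerate(row):
        if action == k:
            return q + 1
    return None  # server k is never used within the listed queue lengths


# Rows of the control table above for queue lengths 0..12.
rows = {
    2: [0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    3: [0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    4: [0, 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4],
    5: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 5],
}
print(tuple(threshold_from_row(rows[k], k) for k in (2, 3, 4, 5)))  # (3, 4, 5, 12)
```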

In the next example, we give some arguments that justify restricting further attention to threshold-based control policies.

Example 2.

Consider the $M/M/3$ system with $K = 3$ servers. The aim of this example is to show that, for the system states $x = (q, 1, 0, 0)$ and $y = (q, 1, 0, 1)$, the assignment to the second server can in general depend not only on the number of customers in the queue but also on the state of the third server. In this example, it is optimal to make an assignment in state x but not in state y. We solve the optimisation problem for the following two groups of parameters:

$λ = 0.238$, $μ_1 = 0.621$, $μ_2 = 0.071$ and $μ_3 = 0.070$;
$λ = 0.477$, $μ_1 = 0.356$, $μ_2 = 0.096$ and $μ_3 = 0.070$.

The optimal solutions for the first and second groups of system parameters are given in Table 1 and Table 2, respectively.

Table 1. Control table.

Table 2. Control table.

We note that for most parameter values the optimal decision can be made independently of the states of the slower servers. Nevertheless, it is interesting to consider the reasons for such a possible dependence. In our optimisation problem, the optimal policy assigns a customer to the fastest free server even in states for which this would not be optimal if there were no further arrivals. This is because the system should be prepared for possible arrivals, which, if they occur, will prefer to find a less congested system.

Consider now the system with three servers in the states $x + e_0 + e_1$ and $x + e_1 + e_2$, where $x = (0, 0, 0, 0)$. Let us consider a potential service completion at the second server, followed by a large number q of arrivals. Because q is large, it is optimal to occupy all available idle servers, so the states mentioned above become $x + (q-1)e_0 + e_1 + e_2 + e_3$ and $x + (q-2)e_0 + e_1 + e_2 + e_3$, respectively. Thus, the difference

$$v(x + (q-1)e_0 + e_1 + e_2 + e_3) - v(x + (q-2)e_0 + e_1 + e_2 + e_3)$$

of the value functions measures the advantage obtained by the assignment to the second server, $x + e_0 + e_1 \to x + e_1 + e_2$. Hence, the events of service completion at the second server provide the incentive to make an assignment to the second server. However, if the two initial states are $x + e_0 + e_1 + e_3$ and $x + e_1 + e_2 + e_3$, the measure of advantage if a service completion takes place is

$$v(x + q e_0 + e_1 + e_2 + e_3) - v(x + (q-1)e_0 + e_1 + e_2 + e_3).$$

Since we expect the value function $v(q e_0 + e_1 + e_2 + e_3)$ to be convex in q, this difference is at least as large as the previous one, and it is plausible that the incentive to make an assignment to the second server is greater in state $x + e_0 + e_1 + e_3$ than in $x + e_0 + e_1$. The numerical examples in Table 3 confirm our expectations.

Table 3. Value function for system states.

Further numerical examples show that the threshold levels depend only very weakly on the states of slower servers. According to our observations, the optimal threshold varies by at most 1 when the state of a slower server changes.

The data needed to verify the heuristic solution and to train and verify the neural network were generated by the policy-iteration algorithm in the form of the list

$$S = \Big\{(λ, μ_1, \dots, μ_K, c_0, c_1, \dots, c_K) \to (q_2, \dots, q_K) : λ \in [1, 45],\ μ_1, \dots, μ_K \in [1, 40],\ c_0 \in [1, 3],\ c_1, \dots, c_K \in [1, 5],\ λ < \sum_{j=1}^{K} μ_j,\ μ_1 \geq \dots \geq μ_K,\ c_1 μ_1^{-1} \leq \dots \leq c_K μ_K^{-1}\Big\}. \tag{8}$$
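The constraints in (8) can be checked mechanically; a check of this kind could, for instance, filter randomly drawn candidate vectors before they are passed to the policy-iteration algorithm. A minimal Python sketch (the function name `in_data_domain` is ours; the small tolerance guards the floating-point comparison of the ratios $c_j/μ_j$), applied to parameter vectors from Example 3 below:

```python
def in_data_domain(lam, mu, c0, c, tol=1e-12):
    """True if (lam, mu_1..mu_K, c0, c_1..c_K) satisfies the constraints of S in (8)."""
    K = len(mu)
    ranges_ok = (len(c) == K and 1 <= lam <= 45 and all(1 <= m <= 40 for m in mu)
                 and 1 <= c0 <= 3 and all(1 <= cj <= 5 for cj in c))
    stable = lam < sum(mu)
    rates_ordered = all(mu[j] >= mu[j + 1] for j in range(K - 1))
    ratio = [cj / mj for cj, mj in zip(c, mu)]          # mean cost per customer
    costs_ordered = all(ratio[j] <= ratio[j + 1] + tol for j in range(K - 1))
    return ranges_ok and stable and rates_ordered and costs_ordered


print(in_data_domain(1, (20, 8, 4, 2, 1), 1, (1, 1, 1, 1, 1)))   # True
print(in_data_domain(10, (20, 8, 4, 2, 1), 1, (5, 4, 3, 2, 1)))  # True
print(in_data_domain(40, (20, 8, 4, 2, 1), 1, (1, 1, 1, 1, 1)))  # False: 40 > 35
```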

Example 3.

Some elements of the list S for the $M/M/5$ queueing system are

$$(1, 20, 8, 4, 2, 1, 1, 1, 1, 1, 1, 1) \to (2, 5, 13, 30), \quad (10, 20, 8, 4, 2, 1, 1, 1, 1, 1, 1, 1) \to (1, 4, 9, 21),$$
$$(1, 20, 8, 4, 2, 1, 1, 5, 4, 3, 2, 1) \to (5, 12, 20, 20), \quad (10, 20, 8, 4, 2, 1, 1, 5, 4, 3, 2, 1) \to (3, 8, 13, 13).$$

3. Heuristic Solution

In this section, we derive a heuristic solution (HS) that gives the optimal thresholds $q_k$, $k = 2, \dots, K$, in explicit form for arbitrary K. For this purpose, we use a simple deterministic approximation of the dynamic behaviour of the number of customers in the queue, as illustrated in Figure 2.

Figure 2. Queue length approximation.

Let $q_k$ be the optimal threshold used to dispatch a customer to server k in state $(q_k - 1, \underbrace{1, \dots, 1}_{k-1}, \underbrace{0, \dots, 0}_{K-k+1})$, where the first $k - 1$ servers are busy. We now compare the queues of the system for the initial state

$$x_0 = (q_k, \underbrace{1, \dots, 1}_{k-1}, 0, \underbrace{0, \dots, 0}_{K-k}),$$

in which the kth server is not used for a new customer, and for the initial state

$$y_0 = (q_k - 1, \underbrace{1, \dots, 1}_{k-1}, 1, \underbrace{0, \dots, 0}_{K-k}),$$

in which the kth server is occupied by a waiting customer. It is assumed that the stability condition holds and, for the approximation to make sense, that $\sum_{j=1}^{k-1} μ_j > λ$. The initial queue lengths are labelled in Figure 2 by $A = q_k$ and $B = q_k - 1$. The proposed deterministic approximation assumes that the queue length of the system with the first $k - 1$ servers busy decreases at the rate $\sum_{j=1}^{k-1} μ_j - λ$. If this rate is maintained until the queue is empty, the queue empties at the time points

$$D = \frac{q_k}{\sum_{j=1}^{k-1} μ_j - λ} \quad \text{and} \quad C = \frac{q_k - 1}{\sum_{j=1}^{k-1} μ_j - λ}$$

for the initial queue lengths A and B, respectively. The total (accumulated) holding times of all customers in the queues with initial lengths $q_k$ and $q_k - 1$ are equal, respectively, to the number of square blocks of dimension $1 \times \frac{1}{\sum_{j=1}^{k-1} μ_j - λ}$ within the areas $AOD$ and $BOC$, multiplied by the time $\frac{1}{\sum_{j=1}^{k-1} μ_j - λ}$ that the queue of the approximated model needs to decrease by one:

$$F_{AOD} = \big(q_k + (q_k - 1) + (q_k - 2) + \dots + 1\big) \frac{1}{\sum_{j=1}^{k-1} μ_j - λ} = \frac{q_k (q_k + 1)}{2} \cdot \frac{1}{\sum_{j=1}^{k-1} μ_j - λ},$$
$$F_{BOC} = \big((q_k - 1) + (q_k - 2) + \dots + 1\big) \frac{1}{\sum_{j=1}^{k-1} μ_j - λ} = \frac{q_k (q_k - 1)}{2} \cdot \frac{1}{\sum_{j=1}^{k-1} μ_j - λ}.$$

The mean operating cost of the first $k - 1$ servers during the time period until the queue becomes empty, given the initial state $x_0$, can be calculated as

$$q_k \Big(\frac{c_1}{μ_1} \frac{μ_1}{\sum_{j=1}^{k-1} μ_j} + \dots + \frac{c_{k-1}}{μ_{k-1}} \frac{μ_{k-1}}{\sum_{j=1}^{k-1} μ_j}\Big) = q_k \frac{\sum_{j=1}^{k-1} c_j}{\sum_{j=1}^{k-1} μ_j},$$

where $\frac{μ_i}{\sum_{j=1}^{k-1} μ_j}$ is the probability that a service completion occurs at the ith server. Analogously, given the initial state $y_0$, the mean operating cost is $(q_k - 1) \frac{\sum_{j=1}^{k-1} c_j}{\sum_{j=1}^{k-1} μ_j}$.

Using the deterministic approximation, we can now formulate the following proposition.

Proposition 1.

The optimal thresholds $q_k$, $k = 2, \dots, K$, are approximately given by

$$q_k \approx \hat{q}_k = \max\Big\{1, \Big\lfloor \frac{\sum_{j=1}^{k-1} μ_j - λ}{c_0} \Big[\frac{c_k}{μ_k} - \frac{\sum_{j=1}^{k-1} c_j}{\sum_{j=1}^{k-1} μ_j}\Big] \Big\rfloor \Big\}. \tag{9}$$

Proof.

Let $V(x)$ be the overall average system cost until the system becomes empty, given the initial state $x \in E^f$. This value can be represented as the sum of the total holding cost of the customers waiting in the queue and the mean operating cost of all servers that remain busy in state x. Assume that the controller decides to allocate a customer to the kth server in state $(q_k - 1, \underbrace{1, \dots, 1}_{k-1}, \underbrace{0, \dots, 0}_{K-k+1})$. According to the proposed deterministic approximation, this decision reduces the overall system cost, i.e.,

$$V(x_0) - V(y_0) > 0, \tag{10}$$

where

$$V(x_0) = c_0 F_{AOD} + q_k \frac{\sum_{j=1}^{k-1} c_j}{\sum_{j=1}^{k-1} μ_j} + V(0, \underbrace{1, \dots, 1}_{k-1}, \underbrace{0, \dots, 0}_{K-k+1}),$$
$$V(y_0) = \frac{c_k}{μ_k} + V(q_k - 1, \underbrace{1, \dots, 1}_{k-1}, 0, \underbrace{0, \dots, 0}_{K-k}) = \frac{c_k}{μ_k} + c_0 F_{BOC} + (q_k - 1) \frac{\sum_{j=1}^{k-1} c_j}{\sum_{j=1}^{k-1} μ_j} + V(0, \underbrace{1, \dots, 1}_{k-1}, \underbrace{0, \dots, 0}_{K-k+1}). \tag{11}$$

After substituting (11) into (10) and using $F_{AOD} - F_{BOC} = \frac{q_k}{\sum_{j=1}^{k-1} μ_j - λ}$, we get

$$c_0 (F_{AOD} - F_{BOC}) + \frac{\sum_{j=1}^{k-1} c_j}{\sum_{j=1}^{k-1} μ_j} - \frac{c_k}{μ_k} = c_0 \frac{q_k}{\sum_{j=1}^{k-1} μ_j - λ} + \frac{\sum_{j=1}^{k-1} c_j}{\sum_{j=1}^{k-1} μ_j} - \frac{c_k}{μ_k} > 0.$$

Since $\sum_{j=1}^{k-1} μ_j - λ > 0$, solving for $q_k$ gives

$$q_k > \frac{\sum_{j=1}^{k-1} μ_j - λ}{c_0} \Big[\frac{c_k}{μ_k} - \frac{\sum_{j=1}^{k-1} c_j}{\sum_{j=1}^{k-1} μ_j}\Big].$$

Rounding the right-hand side down to an integer and requiring $q_k \geq 1$, we obtain the heuristic solution for the optimal value of $q_k$ in the form (9). □
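Formula (9) costs only a few arithmetic operations per threshold. A minimal Python sketch (the function name `heuristic_thresholds` is ours; the parameters are assumed to be given in the enumeration order (1)):

```python
import math


def heuristic_thresholds(lam, mu, c, c0):
    """Heuristic thresholds (q_2, ..., q_K) of Proposition 1, eq. (9).

    Under the ordering (1) the bracket in (9) is non-negative, so if
    mu_1 + ... + mu_{k-1} <= lam the formula simply returns 1.
    """
    K = len(mu)
    q = []
    for k in range(2, K + 1):
        mu_sum = sum(mu[:k - 1])   # mu_1 + ... + mu_{k-1}
        c_sum = sum(c[:k - 1])     # c_1 + ... + c_{k-1}
        gap = c[k - 1] / mu[k - 1] - c_sum / mu_sum
        q.append(max(1, math.floor((mu_sum - lam) / c0 * gap)))
    return tuple(q)


# First parameter vector of Example 3: lambda = 1, mu = (20, 8, 4, 2, 1), c0 = 1, c_j = 1.
print(heuristic_thresholds(1, (20, 8, 4, 2, 1), (1, 1, 1, 1, 1), 1))  # (1, 4, 12, 29)
```

For this input, each heuristic threshold lies within one of the optimal values (2, 5, 13, 30) listed in Example 3, and for the parameters of Example 1 the last component equals $q_5 = 12$, as used there.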

Example 4.

Consider the queueing system from the previous example with $K = 5$. We randomly select from the data set S (8) a list of system parameter vectors $\vec{α} = (λ, μ_1, \dots, μ_K, c_0, c_1, \dots, c_K)$ and calculate the threshold levels $q_k$, $k = 2, \dots, K$, by means of the HS (9). Figure 3 illustrates the efficiency of the proposed heuristic solution for the threshold levels $(q_2, q_3, q_4, q_5)$ by means of confusion matrices. Each row of a matrix corresponds to a predicted value and each column to an actual value. As metrics of closeness to the exact value and to the interval allowing a deviation of the threshold by $\pm 1$ from the real value, the overall accuracy and the accuracy $\pm 1$ are used. The results are summarised in Table 4.

Figure 3. Confusion matrices (a–d) for the prediction of $q_2$, $q_3$, $q_4$ and $q_5$ using HS.

Table 4. Accuracy for prediction with HS.
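The two metrics of Table 4 can be stated precisely: we read the overall accuracy as the fraction of exact hits and the accuracy ±1 as the fraction of predictions within one of the actual threshold. A Python sketch of both, together with the row-predicted, column-actual confusion-matrix layout of Figure 3 (the function names are ours):

```python
def accuracy(predicted, actual, tol=0):
    """Fraction of predictions within tol of the actual threshold (tol=0: exact)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(abs(p - a) <= tol for p, a in zip(predicted, actual))
    return hits / len(actual)


def confusion_matrix(predicted, actual, labels):
    """Counts with rows indexed by predicted values and columns by actual values."""
    pos = {v: i for i, v in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for p, a in zip(predicted, actual):
        m[pos[p]][pos[a]] += 1
    return m


hs, opt = [1, 4, 12, 29], [2, 5, 13, 30]   # HS vs. optimal thresholds, first vector of Example 3
print(accuracy(hs, opt), accuracy(hs, opt, tol=1))  # 0.0 1.0
```

The example illustrates why both metrics are reported: a heuristic can miss every threshold exactly while still being within ±1 of all of them.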

4. Artificial Neural Networks

Artificial neural networks (NN) belong to the class of supervised machine learning methods. They are widely used in applied problems, including data classification, pattern recognition, regression, clustering and time series forecasting. Here we show that the NN can give even better results than the HS, which indicates that it can be used to predict structural control policies.

The data set S (8) is used to explore predictions of the optimal threshold levels by means of the NN. We use a multilayer neural network for the classification of the data. Formally, it can be defined as a function $f : \vec{α} \to \vec{y}$, which maps an input vector $\vec{α}$ of dimension $2 m + 1$ to an output estimate $\vec{y} \in R^{N_{c}}$ over the class numbers $N = 1, \dots, N_{c}$. The network is decomposed into 6 layers as illustrated in Figure 4, each of which represents a different function mapping vectors to vectors. The successive layers are: a linear (affine) layer with an output vector of size $k$, a nonlinear elementwise activation layer, three further linear layers with output vectors of size $k$ and a nonlinear normalisation layer.

Figure 4. Architecture of the neural network.

The first layer is an affine transformation

{\vec{q}}_{1} = W_{1} \vec{α} + {\vec{b}}_{1},

where ${\vec{q}}_{1} \in R^{k}$ is the output vector, $W_{1} \in R^{k \times (2 m + 1)}$ with $k = 30$ is the weight matrix and ${\vec{b}}_{1} \in R^{k}$ is the bias vector. The rows of $W_{1}$ are interpreted as features that are relevant for differentiating between the corresponding classes. Consequently, $W_{1} \vec{α}$ is a projection of the input $\vec{α}$ onto these features. The second layer is an elementwise activation layer defined by the nonlinear function (the rectified linear unit, ReLU)

{\vec{q}}_{2} = \max (0, {\vec{q}}_{1}),

which sets the negative entries of ${\vec{q}}_{1}$ to zero and keeps the positive entries unchanged. The next three layers are further affine transformations,

{\vec{q}}_{i} = W_{i} {\vec{q}}_{i - 1} + {\vec{b}}_{i},

where ${\vec{q}}_{i} \in R^{k}$, $W_{i} \in R^{k \times k}$ and ${\vec{b}}_{i} \in R^{k}$, $i = 3, 4, 5$. The last layer is the normalisation layer $\vec{y} = \mathrm{softmax} ({\vec{q}}_{5})$, whose components are of the form

y_{N} = \frac{e^{q_{5 N}}}{\sum_{j = 1}^{N_{c}} e^{q_{5 j}}}, \quad N = 1, \dots, N_{c} .

The softmax layer normalises the output vector $\vec{y}$ so that its components lie between 0 and 1 and sum to 1. The output $\vec{y}$ can therefore be treated as a probability distribution vector, where the $N$th element $y_{N}$ represents the likelihood that $\vec{α}$ belongs to class $N$.
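A minimal sketch of the forward pass through the six layers described above, in plain Python. Assumptions: the weights are random rather than trained, the number of classes $N_{c} = 10$ and the entries of the input vector are hypothetical, and the fifth layer is given $N_{c}$ outputs so that the softmax returns one probability per class. The predicted class is the index $N$ with the largest $y_{N}$.

```python
import math
import random


def affine(W, x, b):
    """q = W x + b, with W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(x):
    """Elementwise max(0, x) (layer 2)."""
    return [max(0.0, v) for v in x]


def softmax(z):
    """Normalisation layer; subtracting max(z) avoids overflow without changing the result."""
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Scaled Gaussian weights and zero biases (the initialisation is an assumption)."""
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def make_network(n_in, k=30, n_classes=10, seed=0):
    """Affine layers 1, 3, 4, 5; layer 5 gets n_classes outputs (assumption)."""
    rng = random.Random(seed)
    shapes = [(k, n_in), (k, k), (k, k), (n_classes, k)]
    return [init_layer(n_out, n_prev, rng) for n_out, n_prev in shapes]


def forward(params, alpha):
    (W1, b1), (W3, b3), (W4, b4), (W5, b5) = params
    q1 = affine(W1, alpha, b1)  # layer 1: affine map of the input
    q2 = relu(q1)               # layer 2: elementwise activation
    q3 = affine(W3, q2, b3)     # layers 3-5: further affine maps
    q4 = affine(W4, q3, b4)
    q5 = affine(W5, q4, b5)
    return softmax(q5)          # layer 6: class probabilities y_N


# Hypothetical input for K = 5: (lambda, mu_1..mu_5, c_0, c_1..c_5)
alpha = [1.0, 2.0, 1.5, 1.0, 0.8, 0.5, 1.0, 5.0, 4.0, 3.0, 2.0, 1.0]
params = make_network(n_in=len(alpha), k=30, n_classes=10)
y = forward(params, alpha)
predicted_class = max(range(len(y)), key=y.__getitem__) + 1  # classes N = 1..N_c
```

Since the weights are untrained, the resulting probabilities carry no information about the thresholds; the sketch only shows how the layers compose.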

We use 70% of the part of the data set S that was not used to verify the quality of the HS for the training phase of the NN, and the rest of S as validation data. We train the multilayer (6-layer) NN with the adaptive moment estimation method (Adam) [24] using the neural network toolbox of Wolfram Mathematica©. Then we verify the approximated function

{\hat{q}}_{k} : = {\hat{q}}_{k} (λ, μ_{1}, \dots, μ_{K}, c_{0}, c_{1}, \dots, c_{K}),

which should be accurate enough to predict new outputs from the verification data. The algorithm was run many times on samples and networks of different sizes. In all cases the results were positive, which indicates the potential of the machine learning methodology for optimisation problems in queueing theory.
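The adaptive moment estimation method [24] updates every weight with bias-corrected moving averages of the gradient and of its square. The sketch below applies this update rule to a toy quadratic instead of the network's loss; the objective, the learning rate 0.1 and the number of steps are illustrative assumptions and do not reflect the settings of the Mathematica toolbox.

```python
import math


def adam_minimize(grad, theta0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimise a differentiable function with the Adam update rule.

    grad(theta) must return the gradient as a list of the same length as theta.
    The defaults are the values recommended by Kingma and Ba.
    """
    theta = list(theta0)
    m = [0.0] * len(theta)  # moving average of gradients (first moment)
    v = [0.0] * len(theta)  # moving average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[i] / (1.0 - beta2 ** t)  # bias-corrected second moment
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy objective (not the NN loss): f(theta) = (theta_0 - 3)^2 + (theta_1 + 1)^2
def grad_f(theta):
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


theta_min = adam_minimize(grad_f, [0.0, 0.0], lr=0.1, steps=2000)
```

In the first step each coordinate moves by almost exactly the learning rate in the direction of the negative gradient, independently of the gradient's magnitude, which is a characteristic property of Adam's normalised update.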

Example 5.

The results of estimating the optimal threshold values with the trained NN are again summarised in the form of confusion matrices, shown in Figure 5. The overall accuracy of the classification and the accuracy $\pm 1$ are given in Table 5. Compared with the HS (Table 4), the NN provides more accurate estimates for all four optimal thresholds: the largest gains in overall accuracy are obtained for $q_{5}$ (0.7977 versus 0.6282) and $q_{2}$ (0.9700 versus 0.8430), whereas for $q_{3}$ the improvement is marginal (0.8785 versus 0.8778).
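The comparison between Tables 4 and 5 can be made explicit by subtracting the HS accuracies from the NN accuracies, as in this small Python sketch. The values are copied from the two tables; nothing else is assumed.

```python
# Accuracies from Table 4 (HS) and Table 5 (NN), in the order q2, q3, q4, q5
thresholds = ["q2", "q3", "q4", "q5"]
hs = {"exact": [0.8430, 0.8778, 0.7899, 0.6282], "pm1": [0.9861, 0.9884, 0.9871, 0.9769]}
nn = {"exact": [0.9700, 0.8785, 0.8708, 0.7977], "pm1": [0.9991, 0.9951, 0.9874, 0.9962]}


def gains(nn_acc, hs_acc):
    """Difference NN - HS per threshold, rounded to 4 decimals like the tables."""
    return {name: round(a - b, 4) for name, a, b in zip(thresholds, nn_acc, hs_acc)}


exact_gain = gains(nn["exact"], hs["exact"])
pm1_gain = gains(nn["pm1"], hs["pm1"])
for name in thresholds:
    print(f"{name}: exact {exact_gain[name]:+.4f}, pm1 {pm1_gain[name]:+.4f}")
```

All eight differences are positive; the smallest is +0.0003 for the accuracy $\pm 1$ of $q_{4}$ and the largest is +0.1695 for the overall accuracy of $q_{5}$.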

Figure 5. Confusion matrices (a–d) for prediction of $q_{2}, q_{3}, q_{4}$ and $q_{5}$ using NN.

Table 5. Accuracy for prediction with NN.

5. Conclusions

We combined the classical methodology of analysing controllable queues with a heuristic solution and machine learning to study whether the values of the optimal thresholds can be estimated. Since the results were positive (in Example 5 the NN predicts each threshold within $\pm 1$ of its optimal value in more than 98% of the cases), we draw the following general conclusion. This study confirms that the analysis of controlled queueing systems and the solution of optimisation problems by means of classical Markov decision theory can be successfully combined with machine learning techniques. These approaches do not contradict each other; on the contrary, their combination provides new results.

Author Contributions

Conceptualization, D.E.; formal analysis, investigation, methodology, software and writing, D.E. and N.S. Both authors have read and agreed to the published version of the manuscript.

Funding

Open Access Funding by the University of Linz. This research has been supported by the RUDN University Strategic Academic Leadership Program (recipient D. Efrosinin).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This paper has been supported by the RUDN University Strategic Academic Leadership Program (recipient D. Efrosinin). Open Access Funding by the University of Linz.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Lin, W.; Kumar, P.R. Optimal control of a queueing system with two heterogeneous servers. IEEE Trans. Autom. Control 1984, 29, 696–703.
2. Koole, G. A simple proof of the optimality of a threshold policy in a two-server queueing system. Syst. Control Lett. 1995, 26, 301–303.
3. Walrand, J. A note on: “Optimal control of a queuing system with two heterogeneous servers”. Syst. Control Lett. 1984, 4, 131–134.
4. Rykov, V. Monotone Control of Queueing Systems with Heterogeneous Servers. QUESTA 2001, 37, 391–403.
5. Crabill, T.; Gross, D.; Magazine, M.J. A classified bibliography of research on optimal design and control of queues. Oper. Res. 1977, 25, 219–232.
6. Nobel, R. Hysteretic and Heuristic Control of Queueing Systems. Ph.D. Thesis, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands, November 1998.
7. Nobel, R.; Tijms, H.C. Optimal control of a queueing system with heterogeneous servers and set-up costs. IEEE Trans. Autom. Control 2000, 45, 780–784.
8. Le Ny, L.-M.; Tuffin, B. A Simple Analysis of Heterogeneous Multi-Server Threshold Queues with Hysteresis; Institut National de Recherche en Informatique: Nancy, France, 2000.
9. Efrosinin, D. Controlled Queueing Systems with Heterogeneous Servers: Dynamic Optimization and Monotonicity Properties of Optimal Control Policies in Multiserver Heterogeneous Queues; VDM Verlag: Saarbrücken, Germany, 2008.
10. Rykov, V.; Efrosinin, D. On the slow server problem. Autom. Remote Control 2010, 70, 2013–2023.
11. Howard, R. Dynamic Programming and Markov Processes; Wiley Series; Wiley: London, UK, 1960.
12. Puterman, M.L. Markov Decision Processes; Wiley Series in Probability and Mathematical Statistics; Wiley: London, UK, 1994.
13. Tijms, H.C. Stochastic Models. An Algorithmic Approach; John Wiley and Sons: New York, NY, USA, 1994.
14. Gershenson, C. Artificial Neural Networks for Beginners; 2003; Available online: http://arxiv.org/abs/cs/0308031 (accessed on 20 August 2003).
15. Rätsch, G. A Brief Introduction into Machine Learning; Friedrich Miescher Laboratory of the Max Planck Society: Tübingen, Germany, 2004.
16. Russell, S.J.; Norvig, P. Artificial Intelligence. A Modern Approach; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1995.
17. Kyritsis, A.I.; Deriaz, M. A machine learning approach to waiting time prediction in queueing scenarios. In Proceedings of the Second International Conference on Artificial Intelligence for Industries, Laguna Hills, CA, USA, 25–27 September 2019; pp. 17–21.
18. Stintzing, J.; Norrman, F. Prediction of Queuing Behaviour through the Use of Artificial Neural Networks. Available online: http://www.diva-portal.se/smash/get/diva2:1111289/FULLTEXT01.pdf (accessed on 18 June 2017).
19. Xia, L.; Zhang, Z.G.; Li, Q.-L.; Glynn, P.W. A c/μ-Rule for Service Resource Allocation in Group-Server Queues. arXiv 2018, arXiv:1807.05367.
20. Aviv, Y.; Federgruen, A. The value-iteration method for countable state Markov decision processes. Oper. Res. Lett. 1999, 24, 223–234.
21. Özkan, E.; Kharoufeh, J.P. Optimal control of a two-server queueing system with failures. Probab. Eng. Inform. Sci. 2014, 28, 489–527.
22. Sennott, L.I. Stochastic Dynamic Programming and the Control of Queueing Systems; Wiley: New York, NY, USA, 1999.
23. Efrosinin, D.; Sztrik, J. An algorithmic approach to analyzing the reliability of a controllable unreliable queue with two heterogeneous servers. Eur. J. Oper. Res. 2018, 271, 934–952.
24. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization; 2015; Available online: https://arxiv.org/abs/1412.6980 (accessed on 30 January 2017).

Figure 1. Controllable multi-server queueing system with heterogeneous servers and operating costs.

Figure 2. Queue length approximation.


Table 1. Control table.

System State x	Queue Length $q (x)$
$(d_{1}, d_{2}, d_{3})$	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	...
(0,0,0)	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
(1,0,0)	0	0	0	0	0	2	2	2	2	2	2	2	2	2	2	2	2	2
(0,1,0)	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
(1,1,0)	0	0	0	0	0	3	3	3	3	3	3	3	3	3	3	3	3	3
(0,0,1)	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
(1,0,1)	0	0	0	0	2	2	2	2	2	2	2	2	2	2	2	2	2	2
(0,1,1)	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
(1,1,1)	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

Table 2. Control table.

System State x	Queue Length $q (x)$
$(d_{1}, d_{2}, d_{3})$	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	...
(0,0,0)	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
(1,0,0)	0	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2
(0,1,0)	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
(1,1,0)	0	3	3	3	3	3	3	3	3	3	3	3	3	3	3	3	3	3
(0,0,1)	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
(1,0,1)	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2	2
(0,1,1)	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
(1,1,1)	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

Table 3. Value function for system states.

System State x	Value Function $v (x)$
$(q, d_{1}, d_{2}, d_{3})$	example 1	example 2
(0,0,0,0)	0	0
(0,1,0,0)	2.6034	19.4480
(0,0,1,0)	14.0865	28.3810
(0,0,0,1)	14.2872	33.7009
(1,1,0,0)	7.7979	51.3142
(0,1,1,0)	16.6905	51.4444
(0,1,0,1)	16.8910	55.9981
(0,0,1,1)	28.3747	65.9866
(2,1,0,0)	15.5520	96.1454
(1,1,1,0)	21.8874	90.3521
(1,1,0,1)	22.0873	93.2714
(0,1,1,1)	30.9798	93.2581
(3,1,0,0)	25.7823	154.6580
(2,1,1,0)	29.6487	142.7630
(2,1,0,1)	29.8469	145.4230
(1,1,1,1)	36.1809	140.4050
...	...	-
(6,1,0,0)	68.3382	-
(5,1,1,0)	66.8622	-
(5,1,0,1)	66.9946	-
(4,1,1,1)	66.9830	-
(7,1,0,0)	85.9322	-
(6,1,1,0)	82.9672	-
(6,1,0,1)	83.0730	-
(5,1,1,1)	81.9234	-

Table 4. Accuracy for prediction with HS.

HS	$q_{2}$	$q_{3}$	$q_{4}$	$q_{5}$
Accuracy	0.8430	0.8778	0.7899	0.6282
Accuracy $\pm 1$	0.9861	0.9884	0.9871	0.9769

Table 5. Accuracy for prediction with NN.

NN	$q_{2}$	$q_{3}$	$q_{4}$	$q_{5}$
Accuracy	0.9700	0.8785	0.8708	0.7977
Accuracy $\pm 1$	0.9991	0.9951	0.9874	0.9962


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
