Article

Optimal Scheduling in General Multi-Queue System by Combining Simulation and Neural Network Techniques

by Dmitry Efrosinin 1,2,*, Vladimir Vishnevsky 3 and Natalia Stepanova 4
1 Institute for Stochastics, Johannes Kepler University Linz, 4040 Linz, Austria
2 Department of Information Sciences, Peoples’ Friendship University of Russia (RUDN University), Moscow 117198, Russia
3 V.A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences, Moscow 117997, Russia
4 Scientific and Production Company “INSET”, Moscow 129085, Russia
* Author to whom correspondence should be addressed.
Sensors 2023, 23(12), 5479; https://doi.org/10.3390/s23125479
Submission received: 29 April 2023 / Revised: 21 May 2023 / Accepted: 8 June 2023 / Published: 10 June 2023
(This article belongs to the Special Issue Internet of Mobile Things and Wireless Sensor Networks)

Abstract: The problem of optimal scheduling in a system with parallel queues and a single server has been extensively studied in queueing theory. However, such systems have mostly been analysed under the assumption of homogeneous arrival and service processes, while heterogeneous cases have usually been restricted to Markov queueing models. The calculation of the optimal scheduling policy in such a queueing system with switching costs and arbitrary inter-arrival and service time distributions is not a trivial task. In this paper, we propose to combine simulation and neural network techniques to solve this problem. Scheduling in this system is performed by means of a neural network informing the controller, at each service completion epoch, of the index of the queue to be serviced next. We adapt the simulated annealing algorithm to optimize the weights and biases of a multi-layer neural network, initially trained on an arbitrary heuristic control policy, with the aim of minimizing the average cost function, which in turn can be calculated only via simulation. To verify the quality of the obtained optimal solutions, the optimal scheduling policy was also calculated by solving a Markov decision problem formulated for the corresponding Markovian counterpart. The results of the numerical analysis show the effectiveness of this approach in finding the optimal deterministic control policy for routing, scheduling or resource allocation in general queueing systems. Moreover, a comparison of the results obtained for different distributions illustrates the statistical insensitivity of the optimal scheduling policy to the shape of the inter-arrival and service time distributions with the same first moments.

1. Introduction

Machine learning algorithms have been used over the last ten years in almost all fields where problems associated with data classification, pattern recognition, non-linear regression, etc., have to be solved. The application of such algorithms has also intensified in the field of queueing theory. While the first steps in the successful application of machine learning to evaluate the performance characteristics of simple and complex queueing systems have already been taken, the total number of works on this topic still remains modest. As for reviews, we can only refer to a recent paper by Vishnevsky and Gorbunova [1], which proposes a systematic introduction to the use of machine learning in the study of queueing systems and networks. Before formulating our specific problem, we would also like to make a small contribution to the popularisation of machine learning in queueing theory by briefly describing the latest works. In Stintzing and Norrman [2], an artificial neural network was used for predicting the number of busy servers in the $M/M/s$ queueing system. The papers of Nii et al. [3] and Sherzer et al. [4] answered positively the question of whether machines can be useful for solving problems in general queueing systems. They employed a neural network approach to estimate the mean performance measures of the multi-server queues $GI/G/s$ based on the first two moments of the inter-arrival and service time distributions. A machine learning approach was used in the work of Kyritsis and Deriaz [5] to predict the waiting time in queueing scenarios. The combination of simulation and machine learning techniques for assessing performance characteristics was illustrated by Vishnevsky et al. [6] on a queueing system $MMAP/PH/M/N$ with $K$ priority classes. Markovian queues were simulated using artificial neural networks in Sivakami et al. [7]. Neural networks were also used in research by Efrosinin and Stepanova [8] to estimate the optimal threshold policy in a heterogeneous $M/M/K$ queueing system. The combination of the Markov decision problem and neural networks for a heterogeneous queueing model with processor sharing was studied by Efrosinin et al. [9]. The performance parameters of a closed queueing network were evaluated by means of a neural network in Gorbunova and Vishnevsky [10]. In addition to the presented results on using neural networks in hypothetical queueing theory models, academic studies in this area with real-world applications have gradually been proposed. For example, the problem of choosing an optimal charging–discharging schedule for electric vehicles with the usage of a neural network is addressed by Aljafari et al. [11]. The main conclusion to be drawn from the previous results obtained via the application of machine learning to models of queueing theory is that neural networks cannot be treated as a replacement for classical methods of system performance analysis, but rather as a complement to the capabilities of such an analysis.
Systems with parallel queues and one server are also known as polling systems, which have found wide application in various fields such as computer networks, telecommunication systems, control in manufacturing and road traffic. For analytic and numerical results on various types of polling systems with applications to broadband wireless Wi-Fi and WiMAX networks, we refer interested readers to the textbook by Vishnevsky and Semenova [12] and the references therein. The same authors in [13] extended their research on polling systems to systems with correlated arrival flows such as $MAP$, $BMAP$, and group Poisson arrivals. In Vishnevskiy et al. [14], it was shown that the results obtained by a neural network are close enough to the results of analytical or simulation calculations for $M/M/1$ and $MAP/M/1$-type polling systems with cyclic polling. Markovian versions of a single-server model with parallel queues have been investigated by a number of authors. The two-queue homogeneous model with equal service rates and holding costs was studied in Hofri and Ross [15], where it was shown that under the optimal policy the queues must be serviced exhaustively. In research by Liu et al. [16], it was shown that the scheduling policy that routes the server according to the LQF (Longest Queue First) policy is optimal when all queue lengths are known, and that the cyclic scheduling policy is optimal in cases where the only information available is the previous decisions. Systems with multiple heterogeneous queues in different settings, also known as asymmetric polling systems, have been studied intensively in cases where there are no switching costs by Buyukkoc et al. [17] and Cox and Smith [18], where the optimality of the static $c\mu$-rule was proved. This policy schedules the server first to the queue $i$ with the maximum weight $c_i\mu_i$, the product of the holding cost and the service rate. In Koole [19], the problem of optimal control in a two-queue system was analysed by means of a continuous-time Markov decision process and the dynamic programming approach. The author found numerically that the optimal policy which minimizes the average cost per unit of time can be quite complex if there are both holding and switching costs. The threshold-based policy for such a queueing system was applied by Avram and Gómez-Corral [20], where expressions for the long-run expected average cost of holding units and switching actions of the server were given. The queueing system with general service times and set-up costs, which affect the instantaneous switch from one queue to another, was studied in Duenyas and Van Oyen [21]. The authors proposed a simple heuristic scheduling policy for the system with multiple queues. A rather similar model is described in Matsumoto [22], where the optimal scheduling problem is solved in a system with arbitrary time distributions. There, instead of switching costs, the corresponding set-up time intervals required for switching are used. The system is controlled by the Learning Vector Quantization (LVQ) network, see Kohonen [23] for details, which classifies the system state by the closest codebook vector of a certain class in terms of the Euclidean metric. The problem with this approach is the large number of parameters associated with the codebook vectors: normally several vectors per class must be estimated for a given control policy using computationally expensive recurrent algorithms.
This paper proposes a fairly universal method for solving the problem of optimal dynamic scheduling or allocation in queueing systems of the general type, i.e., where the times between events are arbitrarily distributed, and in queueing systems with correlated inter-arrival and service times. Furthermore, it can provide a performance analysis of complex controlled systems described by multidimensional random processes, for which finding analytical, approximate or heuristic solutions is a difficult task. The main idea of the paper is to use a multi-layer neural network for server scheduling. The parameters of this neural network, first trained on some arbitrary control policy, are then optimized with the aim of minimizing a specified average cost function. Moreover, such a cost function for systems with arbitrary inter-arrival and service time distributions can only be computed via simulation. We consider this approach, which combines neural networks with a simulation technique, to be universal enough to obtain an optimal deterministic control policy in complicated queueing systems. The method is exemplified by a version of a single-server system with parallel queues equipped with a controller for scheduling the server. The system under study is assumed to have heterogeneous arrival and service attributes, i.e., unequal arrival and service rates, as well as holding and switching costs. Systems with arbitrary distributions and switching costs have not yet been considered by other authors. It is assumed in our model that the queue currently being served is serviced exhaustively. The next queue to be served is selected according to a dynamic scheduling policy based on the queue state information, i.e., on the number of customers waiting in each of the parallel queues. Changing the serviced queue incurs switching costs, and holding a customer in the system is also linked to a corresponding cost. Clearly, even under some fixed scheduling control policy, calculating any characteristics of the proposed queueing system with arbitrary inter-arrival and service time distributions in explicit form is not a trivial task. It is also difficult to specify the dynamic control policy defining the scheduling in large systems in a standard way, e.g., through a control matrix that would contain the corresponding control action for every possible state of the system. Therefore, in such a case we consider it justified to solve the problem of finding the optimal scheduling policy, with the aim of minimizing the average cost per unit of time, by combining simulation, as a tool to calculate the performance characteristics of the system, with a machine learning technique, where the neural network is responsible for dynamic control. By training a neural network on some initial control policy, we obtain the characteristics of the network in the form of a matrix of weights and a vector of biases. The process of solving the optimal scheduling problem is then reduced to a discrete parametric optimization. The parameters of the neural network must be optimized in such a way that this network can guarantee minimal values of the average cost functional by generating control actions at decision epochs. For this purpose, we have chosen one of the random search methods, simulated annealing, see, e.g., Aarts and Korst [24], Ahmed [25].
Simulated annealing is a heuristic method based on the concept of heating and controlled cooling in metallurgy and is normally used for global optimization problems in a large search space without any assumptions on the form of the objective function. This algorithm was implemented by Gallo and Capozzi [26] specifically for the probabilistic scheduling problem. Here, the algorithm will be adapted for a non-explicitly defined parametric function with a large number of variables defined on a discrete domain.
To verify the quality of the calculated optimal parameters of the neural network, the values of the average cost functional for the Markovian version of the queueing system are compared with the results obtained by solving the Markov decision problem (MDP). The general theory of MDP models is discussed in Puterman [27] and Tijms [28]. Details on the application of MDPs to controlled queueing systems with heterogeneous servers can be found in Efrosinin [29]. The optimal control policy and the corresponding objective function are calculated in the paper via the policy-iteration algorithm proposed in Howard [30] for an arbitrary finite-state Markov decision process. According to the MDP, the router in our system has to find an optimal control action in the state visited at a decision epoch with the aim of minimizing the long-run average cost. Note that for our queueing model under general assumptions a semi-Markov decision problem (SMDP) can be formulated. The SMDP is a more powerful model than the MDP, since the time spent by the system in each state before a transition is taken into account when calculating the objective function. The objective function must here also be calculated by means of simulation. In this case, a reinforcement learning algorithm, e.g., Q-P-Learning, can be applied. The main problem with this approach is that many state–action pairs can remain unobserved under a deterministic control policy, and as a result the control actions in such states cannot be optimized. However, in our opinion, neural networks can also be used to solve this problem, which presents a potential task for further research. The SMDP topic is outside the scope of this article, but we refer readers to the work by Gosavi [31], where one can find a very interesting overview of reinforcement learning and a well-designed classification of simulation-based optimization algorithms.
Summarising our research in this paper, we can highlight the following main contributions: (a) We propose a new controlled single-server system with parallel queues where the router uses a trained multi-layer neural network to perform scheduling control; (b) A simulated annealing method is adapted to optimize the weights and biases of the neural network with the aim of minimizing the average cost function, which can be calculated only via simulation; (c) The quality of the resulting optimal scheduling policy is verified by solving a Markov decision problem for the Markovian analogue of the queueing system; (d) We provide a detailed numerical analysis of the optimal scheduling policy and discuss its sensitivity to the shape of the inter-arrival and service time distributions; (e) A distinctive feature of our paper is the presence of the employed algorithms in the form of pseudocodes with detailed descriptions of the relevant steps.
The rest of the paper is organized as follows. Section 2 presents a formal description of the queueing system and the optimization problem. Section 3 describes the Markov decision problem and the policy-iteration algorithm used to calculate the optimal scheduling policy. In Section 4, the event-based simulation procedure for the proposed queueing system is discussed. The neural network architecture, parametrization and training algorithm are summarized in Section 5. Section 6 presents the simulated annealing optimization algorithm. The numerical analysis is presented in Section 7, and concluding remarks are provided in Section 8.
The following notations are introduced for use in the sequel. Let $e_j$ denote the vector of appropriate dimension with 1 in the $j$th position (beginning from the 0th) and 0 elsewhere, and let $1_{\{A\}}$ denote the indicator function, which takes the value 1 if the event $A$ occurs and 0 otherwise. The notations $\min_i\{a_i\}$ and $\max_i\{a_i\}$ mean the minimum and maximum of the values $a_i$, and $\arg\min_i\{a_i\}$, $\arg\max_i\{a_i\}$ denote the element index associated, respectively, with the minimum and maximum value.

2. Single-Server System with Parallel Queues

Consider a single-server system with $N$ parallel heterogeneous queues of the type $GI/G/1$ and a router for scheduling the server across the queues. Heterogeneity here refers to unequal distributions of the inter-arrival and service times of customers in different queues, as well as unequal holding and switching costs. The queue currently attended by the server is serviced exhaustively. Denote by $I = \{1, 2, \dots, N\}$ the queue index set. The proposed queueing system is shown schematically in Figure 1.
Denote by $\{\tau_{n,i}\}_{n\ge1}$ the time instants of arrivals to queue $i$ and by $\nu_i := \{\nu_{n,i} = \tau_{n,i} - \tau_{n-1,i}\}_{n\ge1}$ the sequence of mutually independent and identically distributed inter-arrival times with CDF $A_i(t)$, $i \in I$. Further denote by $\zeta_i := \{\zeta_{n,i}\}_{n\ge1}$ the service times of customers in the $i$th queue. These random variables are also assumed to be mutually independent and generally distributed with CDF $B_i(t)$, $i \in I$. We assume that the random variables $\nu_i$ and $\zeta_i$ have at least the first two finite moments
$$a_{k,i} = k\int_0^\infty x^{k-1}(1 - A_i(x))\,dx, \qquad b_{k,i} = k\int_0^\infty x^{k-1}(1 - B_i(x))\,dx, \quad k = 1, 2.$$
The squared coefficients of variation are then defined, respectively, as
$$CV_{\nu_i}^2 = \frac{a_{2,i}}{a_{1,i}^2} - 1, \qquad CV_{\zeta_i}^2 = \frac{b_{2,i}}{b_{1,i}^2} - 1.$$
These characteristics will be required for a comparative analysis of the optimal scheduling policy for different types of inter-arrival and service time distributions. From now on it is assumed that the ergodicity condition is fulfilled, i.e., the traffic load $\rho = \sum_{i=1}^{N}\rho_i = \sum_{i=1}^{N}\frac{b_{1,i}}{a_{1,i}} < 1$.
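To make the moment computations concrete, here is a minimal Python sketch (ours, not from the paper) that evaluates the first two moments, the squared coefficients of variation and the traffic load numerically from given CDFs; the exponential rates below are the Example 1 values as we read them and serve only as an illustration.

```python
# A minimal sketch (ours, not the paper's code): numerical evaluation of the
# moments a_{k,i}, b_{k,i}, the squared coefficients of variation and the load.
import numpy as np
from scipy.integrate import quad

def moments(cdf, k_max=2):
    """a_k = k * int_0^inf x^(k-1) * (1 - F(x)) dx for k = 1..k_max."""
    return [k * quad(lambda x: x**(k - 1) * (1.0 - cdf(x)), 0, np.inf)[0]
            for k in range(1, k_max + 1)]

def scv(m1, m2):
    """Squared coefficient of variation CV^2 = m2 / m1^2 - 1."""
    return m2 / m1**2 - 1.0

# Exponential inter-arrival and service times with illustrative rates
# (lambda_i = 0.05 i, mu_i = 3.750 / i, as we read Example 1 below).
lam = [0.05 * i for i in (1, 2, 3, 4)]
mu = [3.750 / i for i in (1, 2, 3, 4)]
A = [lambda t, l=l: 1.0 - np.exp(-l * t) for l in lam]
B = [lambda t, m=m: 1.0 - np.exp(-m * t) for m in mu]

a1 = [moments(Ai)[0] for Ai in A]           # mean inter-arrival times 1/lambda_i
b1, b2 = zip(*[moments(Bi) for Bi in B])    # first two service time moments
rho = sum(b / a for a, b in zip(a1, b1))    # traffic load, must be < 1
print(f"rho = {rho:.2f}")                   # 0.40 for these rates
print([round(scv(m1, m2), 2) for m1, m2 in zip(b1, b2)])  # CV^2 = 1 for exp.
```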
Let $D(t)$ indicate the index of the queue currently being serviced by the server at time $t$, and let $Q_i(t)$ denote the number of customers in the $i$th queue at time $t$, where $i \in I$. The states of the system at time $t$ are then given by a multidimensional random process
$$\{X(t)\}_{t\ge0} = \{D(t), Q_1(t), \dots, Q_N(t)\}_{t\ge0} \tag{1}$$
with a state space
$$E = \{x = (d, q_1, \dots, q_N) : d \in I,\ q_i \in \mathbb{N}_0,\ i \in I\}.$$
Further in this section, the notations $d(x)$ and $q_i(x)$ will be used to identify the corresponding components of the vector state $x \in E$. The cost structure consists of the holding cost $c_i$ per unit of time a customer spends in queue $i$ and the switching cost $c_{i,j}$ to switch the server from queue $i$ to queue $j$.
It is assumed that the system states $X(t)$ are constantly monitored by the router, which defines the index of the queue to be serviced next after the current queue becomes empty. In the initial state, when the whole system is empty, the server is randomly scheduled to some queue. When the $i$th queue being served becomes empty (we call such a moment a decision epoch), the router decides by means of the trained neural network whether to leave the server at the current queue or dispatch it to another queue. Routing to an idle queue is also possible. Recall that the server allocated by the router to a certain queue serves it exhaustively, i.e., the queue can only be changed when it becomes empty. Denote by $A = I$ the action space with elements $a \in A$, where $a$ indicates the index of the queue to be served next after the current queue has been emptied. The subsets $A(x)$ of control actions in states $x \in \hat{E} \subseteq E$ with
$$\hat{E} = \{x \in E : q_{d(x)}(x) = 0\}$$
coincide with the action space $A$. In all other states $x$ from $E \setminus \hat{E}$ the subset $A(x) = \{0\}$ includes only a fictitious control action $0$ which has no influence on the system's behavior.
The router can operate according to some heuristic control policy. It could be, for example, the Longest Queue First (LQF) policy, a dynamic policy which prescribes serving next, at each decision epoch, the queue with the highest number of customers. If more than one queue has the same maximal number of customers, the queue is selected among them at random. Alternatively, the static $c\mu$-rule, which only needs the information on whether a certain queue is non-empty, can be used for scheduling. According to this control policy, the queue $i$ with the highest factor $c_i\mu_i$, the product of the holding cost and the service rate, must be serviced next. In a system with totally symmetric queues the former policy is optimal according to [16]. The latter control policy is optimal due to [17] if there are no switching costs, i.e., $c_{i,j} = 0$. Otherwise, in the case of positive switching costs and asymmetric or heterogeneous queues, such policies are not optimal with respect to minimizing the average cost per unit of time.
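As an illustration of the two heuristics, the following sketch (our own, with hypothetical function names) implements LQF and the static $c\mu$-rule as decision functions over a vector of queue lengths.

```python
# Illustrative decision functions (our own sketch) for the two heuristics.
import random

def lqf(q):
    """Longest Queue First: a longest queue, ties broken at random (1-based)."""
    longest = max(q)
    return random.choice([i for i, n in enumerate(q) if n == longest]) + 1

def c_mu_rule(q, c, mu):
    """Static c-mu rule: serve a non-empty queue i maximizing c_i * mu_i."""
    nonempty = [i for i, n in enumerate(q) if n > 0]
    if not nonempty:                        # whole system empty: any queue
        return random.randrange(len(q)) + 1
    return max(nonempty, key=lambda i: c[i] * mu[i]) + 1

q = [3, 0, 5, 5]                            # queue lengths at a decision epoch
print(lqf(q))                               # prints 3 or 4
print(c_mu_rule(q, c=[1] * 4, mu=[3.75, 1.875, 1.25, 0.9375]))  # prints 1
```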
The main idea of optimal scheduling in our general model is as follows. We equip the router with a trained neural network which informs it of the index of the next queue to which the server should be routed in order to reach the formulated optimization aim. Obviously, we can only train the neural network on available data sets, i.e., on some heuristic control policy, and then we need to optimize the network parameters, such as the weights and biases, to solve the problem of finding the optimal scheduling policy. Under the average cost criterion, the limit of the expected average cost over finite time intervals is minimized over a set of admissible policies. The control policy $f: \hat{E} \to A$ with $f(x) \in A(x)$ is a stationary policy which prescribes the usage of the control action $f(x)$ whenever the system state at a decision epoch is $x \in \hat{E}$. Decision epochs arise whenever the queue being served becomes empty. For the studied controllable queueing system operating under a control policy $f$, the average cost per unit of time for the ergodic system is of the form
$$g^f = \lim_{t\to\infty}\frac{1}{t}\,\mathbb{E}^f\bigg[\int_0^t \sum_{i=1}^{N} c_i Q_i(u)\,du + \sum_{i=1}^{N}\sum_{j=1}^{N} c_{i,j} S_{i,j}(t)\,\bigg|\,X(0) = (d, 0, \dots, 0)\bigg],$$
where $S_{i,j}(t)$ is the random number of switches from queue $i$ to queue $j$ in the time interval $[0, t]$. The expectation $\mathbb{E}^f$ is calculated with respect to the control policy $f$. The policy $f^*$ is said to be optimal when, for any admissible policy $f$,
$$g^* := g^{f^*} = \min_f g^f. \tag{4}$$
Our approach focuses on a combination of simulation and neural network techniques. To verify the quality of the results obtained by solving the optimization problem (4), we formulate an appropriate Markov decision problem. Then we compute the optimal control policy together with the corresponding average cost $g^*$ using a policy iteration algorithm, see, e.g., Howard [30], Puterman [27], Tijms [28], which will be discussed in detail in a subsequent section.

3. Markov Decision Problem Formulation

Assume that the inter-arrival and service times are exponentially distributed, i.e., $\nu_i \sim E(\lambda_i)$ and $\zeta_i \sim E(\mu_i)$, $i \in I$. Under the Markovian assumption the process (1) is a continuous-time Markov chain with state space $E$. The MDP associated with this Markov process is represented as a five-tuple:
$$(E, A, \{A(x), x \in E\}, \lambda_{xy}(a), c(x, a)),$$
where the state space $E$ and the action spaces $A$ and $A(x)$ have already been defined in the previous section.
$\lambda_{xy}(a)$ is the transition rate from state $x$ to state $y$ under the control action $a$, defined for $y \ne x$ as
$$\lambda_{xy}(a) = \begin{cases} \lambda_i, & y = x + e_i,\\ \mu_i, & y = x - e_i,\ d(x) = i,\ q_i(x) > 1,\\ \mu_i, & y = x - e_i + (a - i)e_0,\ d(x) = i,\ q_i(x) = 1,\ a \in A(x - e_i),\\ 0, & \text{otherwise}, \end{cases}$$
where $\lambda_{xx} := \lambda_{xx}(a) = -\sum_{y \ne x}\lambda_{xy}(a)$.
$c(x, a)$ is the immediate cost in state $x \in E$ when selecting an action $a$,
$$c(x, a) = \sum_{i=1}^{N} c_i q_i(x) + \mu_j c_{j,a} 1_{\{d(x) = j,\ q_j(x) = 1\}}.$$
Here the first summand denotes the total holding cost of the customers in all parallel queues in state $x$, which is independent of the control action. Let $c(x) = \sum_{i=1}^{N} c_i q_i(x)$; if $c_i = 1$, $i \in I$, we get the number of customers in state $x$. The second summand includes the fixed cost $c_{j,a}$ for switching the server from the current queue $j$ to the next queue with index $a$.
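For readers who prefer code, the sketch below (an illustration under our assumptions, not the authors' implementation) spells out the transition rates $\lambda_{xy}(a)$ and immediate costs $c(x,a)$ as plain Python functions over states $x = (d, q_1, \dots, q_N)$.

```python
# A sketch (ours) of the MDP ingredients as plain functions over states
# x = (d, q_1, ..., q_N); rates exclude the diagonal element lambda_xx.
def transition_rate(x, y, a, lam, mu):
    """Rate lambda_xy(a) from x to y under action a; 0 if no direct transition."""
    d, q = x[0], list(x[1:])
    N = len(q)
    for i in range(N):                       # arrival to queue i+1
        if y == (d, *(q[j] + (j == i) for j in range(N))):
            return lam[i]
    i = d - 1                                # served queue, 0-based
    if q[i] > 1 and y == (d, *(q[j] - (j == i) for j in range(N))):
        return mu[i]                         # departure, queue stays non-empty
    if q[i] == 1 and y == (a, *(q[j] - (j == i) for j in range(N))):
        return mu[i]                         # last customer leaves, switch to a
    return 0.0

def immediate_cost(x, a, c, c_sw, mu):
    """c(x,a) = sum_i c_i q_i(x) + mu_j c_{j,a} 1{d(x)=j, q_j(x)=1}."""
    d, q = x[0], x[1:]
    cost = sum(ci * qi for ci, qi in zip(c, q))
    if q[d - 1] == 1:                        # a decision epoch follows this service
        cost += mu[d - 1] * c_sw[d - 1][a - 1]
    return cost
```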
The optimal control policy $f^*$ and the corresponding average cost $g^{f^*}$ are the solutions of the system of Bellman optimality equations,
$$Bv(x) = -\lambda_{xx} v(x) + g = \Big(\sum_{i=1}^{N}\lambda_i + \mu_j 1_{\{d(x) = j,\ q_j(x) \ge 1\}}\Big)v(x) + g, \quad x \in E, \tag{7}$$
where $B$ is the dynamic programming operator acting on the value function $v: E \to \mathbb{R}$.
Proposition 1.
The dynamic programming operator B is defined as
$$Bv(x) = c(x) + \sum_{i=1}^{N}\lambda_i v(x + e_i) + \mu_j v(x - e_j)\, 1_{\{d(x) = j,\ q_j(x) > 1\}} + \mu_j \min_{a \in A(x - e_j)}\{v(x - e_j + (a - j)e_0) + c_{j,a}\}\, 1_{\{d(x) = j,\ q_j(x) = 1\}}, \quad x \in E. \tag{8}$$
Proof. 
From Markov decision theory, e.g., [27,28], it is known that for a continuous-time Markov chain the operator $B$ can be defined as $Bv(x) = \min_a \big\{c(x, a) + \sum_{y \ne x}\lambda_{xy}(a)v(y)\big\}$. For the proposed system this equality can obviously be rewritten in the form (8). In this equation, the first term $c(x)$ represents the immediate holding cost of the customers in state $x$. The second term, weighted by $\lambda_i$, describes the changes in the value function due to new arrivals to the system. The third term, weighted by $\mu_j$ for $q_j(x) > 1$, stands for the value function after a service completion in queue $j$ while customers are still waiting there for service. The last term, weighted by $\mu_j$ for $q_j(x) = 1$, also describes a service completion, which now leads to a state with an empty queue where a control action must be performed. Hence only the last term involves the min operator.    □
Note that the state space of the Markov decision model is countably infinite and the immediate costs $c(x, a)$ are unbounded. The existence of an optimal stationary policy and the convergence of the policy iteration algorithm can be verified for the system under study in a similar way as in Özkan and Kharoufeh [32], where first the convergence of the value iteration algorithm for the equivalent discounted model is proved and then, using the criteria proposed in Sennott [33], this result is extended to the policy iteration algorithm for the average cost criterion.
To solve Equation (8) within the policy iteration algorithm required to calculate the optimal control policy, we convert the multidimensional state space into a one-dimensional one by a mapping $\Delta: E \to \mathbb{N}_0$. The buffer sizes of the queues must obviously be truncated, namely $B_i < \infty$. Thereby the state $x = (d, q_1, \dots, q_N)$ can be rewritten in the following form:
$$s := \Delta(x) = (d(x) - 1)\,\beta_{1,N} + \sum_{i=1}^{N} q_i(x)\,\beta_{i+1,N},$$
where $\beta_{i,j} = \prod_{k=i}^{j}(B_k + 1)$ with $\beta_{N+1,N} = 1$. The notation $\Delta^{-1}(s)$ will be used for the inverse function. In the one-dimensional case the state transitions can be expressed as
$$\Delta(x \pm e_i) = \Delta(x) \pm \beta_{i+1,N}, \qquad \Delta(x + (a - j)e_0) = \Delta(x) + (a - j)\beta_{1,N}.$$
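The linearization can be implemented compactly; the following sketch (ours) builds the encoder $\Delta$ and its inverse for given buffer sizes, assuming the mixed-radix reading of the mapping given above.

```python
# A sketch (ours) of the linearization Delta and its inverse, assuming
# states x = (d, q_1, ..., q_N) with d in 1..N and 0 <= q_i <= B_i.
def make_codec(B):
    N = len(B)
    beta = [1] * (N + 2)                  # beta[i] = prod_{k=i..N} (B_k + 1)
    for i in range(N, 0, -1):
        beta[i] = beta[i + 1] * (B[i - 1] + 1)

    def encode(x):                         # s = Delta(x)
        d, q = x[0], x[1:]
        return (d - 1) * beta[1] + sum(qi * beta[i + 2] for i, qi in enumerate(q))

    def decode(s):                         # x = Delta^{-1}(s)
        d, rest = divmod(s, beta[1])
        q = []
        for i in range(2, N + 2):
            qi, rest = divmod(rest, beta[i])
            q.append(qi)
        return (d + 1, *q)

    return encode, decode

encode, decode = make_codec(B=[10, 10, 10, 10])
assert decode(encode((3, 1, 0, 7, 2))) == (3, 1, 0, 7, 2)
print(4 * 11**4)                           # |E| = 58,564 for N = 4, B_i = 10
```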
The set of states $E$ in the truncated model is finite with cardinality $|E| = N\beta_{1,N}$. The policy iteration Algorithm 1 consists of two main steps: policy evaluation and policy improvement. In the first step, for a given initial control policy (this can be, for example, the LQF policy), a system of linear equations with constant coefficients must be solved. To make the system solvable, the value function $v(s)$ for one of the states can be set to an arbitrary constant, e.g., $v(0) = 0$ for the first state with $d = 1$ and $q_i = 0$. In this case we obtain from the optimality Equation (7) the equality $g = \sum_{i=1}^{N}\lambda_i v(\beta_{i+1,N})$. The remaining equations can be solved numerically. As a solution we get the $|E|$ values $v(s)$ and the current value of the average cost $g$. In the policy improvement step, a control action $a$ that minimizes the test value on the right-hand side of Equation (7) must be evaluated. The algorithm generates a sequence of control policies that converges to the optimal one. Convergence of the algorithm requires that the control actions in two adjacent iterations coincide in each state. To avoid the policy improvement bouncing between equally good control actions in a given state, one can simply keep the previous control action unchanged if its test value is not larger than that of any other action when determining the new policy. As an alternative to the proposed convergence criterion, one can use the values of the average costs, whose variation should be, for example, less than some given small value.
Example 1.
Consider the queueing system with $N = 4$ queues. The buffer sizes are equal to $B_i = 10$, $i \in I$. With these settings the number of states already reaches a large value, $|E| = 58{,}564$, which confirms one of the significant restrictions on the application of dynamic programming to this type of control problem. The switching costs can be defined, for example, as $c_{i,j} = (j - i + 4) \bmod 4$. The holding costs $c_i$ are for simplicity assumed to be equal. The values of the system parameters $\lambda_i$, $\mu_i$, $c_i$ and $c_{i,j}$ are summarized in Table 1 and reflect the heterogeneity of the system parameters, i.e., $\lambda_i = 0.05i$ and $\mu_i = 3.750/i$.
Algorithm 1 Policy iteration algorithm
1: procedure PIA($N$, $B_i$, $\lambda_i$, $\mu_i$, $c_i$, $c_{i,j}$, $i, j \in I$)
2: ▹ Initial policy
$$f^{(0)}(s) = \begin{cases} \text{Random}\{\arg\max_{j \in I}\{q_j(\Delta^{-1}(s))\}\} & \text{if } d(\Delta^{-1}(s)) = i \in I,\ q_i(\Delta^{-1}(s)) = 0,\\ 0 & \text{otherwise} \end{cases}$$
3: $n \leftarrow 0$
4: $g^{(n)} \leftarrow \sum_{i=1}^{N}\lambda_i v^{(n)}(\beta_{i+1,N})$ ▹ Policy evaluation
5: for $s = 1$ to $|E|$ do
6: $v^{(n)}(s) \leftarrow \frac{1}{\sum_{i=1}^{N}\lambda_i + \mu_j 1_{\{q_j(\Delta^{-1}(s)) > 0\}}}\Big[c(\Delta^{-1}(s)) + \mu_j c_{j,a} 1_{\{d(\Delta^{-1}(s)) = j,\ q_j(\Delta^{-1}(s)) = 1\}} - g^{(n)} + \sum_{i=1}^{N}\lambda_i\big[v^{(n)}(s + \beta_{i+1,N})\, 1_{\{q_i(\Delta^{-1}(s)) < B_i\}} + v(s)\, 1_{\{q_i(\Delta^{-1}(s)) = B_i\}}\big] + \mu_j v^{(n)}(s - \beta_{j+1,N})\, 1_{\{d(\Delta^{-1}(s)) = j,\ q_j(\Delta^{-1}(s)) > 1\}} + \mu_j v^{(n)}(s - \beta_{j+1,N} + (a - j)\beta_{1,N})\, 1_{\{d(\Delta^{-1}(s)) = j,\ q_j(\Delta^{-1}(s)) = 1\}}\Big]$, where $a \leftarrow f^{(n)}(s - \beta_{j+1,N})$
7: end for
8: ▹ Policy improvement
$f^{(n+1)}(s) \leftarrow \arg\min_{a \in A(s - \beta_{j+1,N})}\{c_{j,a} + v^{(n)}(s - \beta_{j+1,N} + (a - j)\beta_{1,N})\}\, 1_{\{d(\Delta^{-1}(s)) = j,\ q_j(\Delta^{-1}(s)) = 1\}}$
9: if $f^{(n+1)}(s) = f^{(n)}(s)$ for all $s \in \{0, 1, \dots, |E| - 1\}$ then return $f^{(n+1)}(s)$, $v^{(n)}(s)$, $g^{(n)}$
10: else $n \leftarrow n + 1$, go to step 4
11: end if
12: end procedure
These values correspond to the system load $\rho = \sum_{i=1}^{N}\rho_i = 0.4$; that is, the system is stable. This value is small enough to ensure, on the one hand, that the system is sufficiently loaded so that states appear in which all queues are non-empty and, on the other hand, that the probability of losing an arriving customer for the given rather small buffer sizes is minimized. The solution of the large system of optimality equations is carried out numerically. The optimized average cost is $g^* = 2.5632$.
Using Algorithm 1, we calculate the optimal scheduling policy. For some states with a fixed number of customers in the third and fourth queues and a varying number of customers in the first two queues, the control actions are listed in Table 2. The first row of the table contains the values of the number of customers $q_2$ or $q_1$ in the second or first queue at the moment a decision is made, i.e., when the first or second queue is emptied, respectively. The first column contains selected states of the system for fixed levels $q_3$ and $q_4$ of the third and fourth queues. As we can see, the optimal scheduling policy has a complex structure with a large number of thresholds, making it difficult to obtain any acceptable heuristic solution explicitly. To better visualise the complexity of the structure of the optimal control policy, the background of the table cells changes in grey colour from darker to lighter as the queue index decreases. The $c\mu$-rule, as expected, is not optimal here: $g^{c\mu} = 6.7237$, which is almost two and a half times the value of the average cost under the optimal policy. When the values $q_1$ and $q_2$ are small, the router schedules the server to serve the queues with low service rates. In this case the switching costs are low as well. According to the optimal scheduling policy, the incentive to route the server to a queue with a higher service rate and switching costs increases as the lengths of the first two queues increase.
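For illustration, a compact numpy sketch of the two steps of Algorithm 1 for a generic finite continuous-time MDP is given below; `rates(s, a)` and `cost(s, a)` are hypothetical helpers (for instance, wrappers around the functions sketched earlier together with the state encoding), so this is an illustration of the evaluation/improvement loop rather than the implementation used in the paper.

```python
# A compact numpy sketch (ours) of policy evaluation and improvement for a
# generic finite continuous-time MDP; rates(s, a) -> {next_state: rate} and
# cost(s, a) are assumed helpers, actions(s) lists admissible actions.
import numpy as np

def policy_iteration(n_states, actions, rates, cost, f0):
    f = list(f0)
    while True:
        # Policy evaluation: solve c(s,f(s)) = g + sum_y lam_sy (v(s) - v(y))
        # for v and g, pinning v(0) = 0 to make the linear system non-singular.
        A = np.zeros((n_states, n_states + 1))
        b = np.zeros(n_states)
        for s in range(n_states):
            A[s, n_states] = 1.0                      # coefficient of g
            for y, lam in rates(s, f[s]).items():
                A[s, s] += lam
                A[s, y] -= lam
            b[s] = cost(s, f[s])
        A[:, 0] = 0.0                                 # pin v(0) = 0
        sol, *_ = np.linalg.lstsq(A, b, rcond=None)
        v, g = sol[:n_states], sol[n_states]
        # Policy improvement: minimize the test quantity in every state.
        f_new = [min(actions(s),
                     key=lambda a, s=s: cost(s, a) + sum(
                         lam * (v[y] - v[s])
                         for y, lam in rates(s, a).items()))
                 for s in range(n_states)]
        if f_new == f:                                # adjacent policies coincide
            return f, v, g
        f = f_new
```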
Example 2.
In this example we increase the arrival rates $\lambda_i$ as given in Table 3. The other parameters are fixed at the same values as in the previous example. The load factor is now $\rho = 0.64$, and the corresponding optimized average costs are $g^* = 3.8201$ and $g^{c\mu} = 7.0420$.
Table 4 of the scheduling policy shows that, as the system load increases, the router switches the server to queue 2 or queue 1, which have higher service rates, at almost all queue lengths $q_1$ and $q_2$, respectively.

4. Event-Based Simulation for General Model

We use an event-based simulation to simulate the proposed queueing system. This technique is suitable for evaluating random processes where it is sufficient to have information about the time instants at which state changes occur. Such changes will be referred to as events. Note that although simulation modelling is extensively used in queueing theory, many papers lack explicitly described algorithms that readers can use for independent research. For more information on simulation methods with applications to single- and multi-server queueing systems, we can recommend Ebert et al. [34] and Franzl [35]. In this regard, it will certainly not be superfluous to present and discuss here an algorithm for the system simulation which is not difficult to adapt to other similar systems.
In our case, the events are the arrivals to one of N parallel queues and the departures of customers from the queue d currently being served by the server. The present time is selected as a global time reference.
In Figure 2, we mark on the time axis the moments of arrival of new customers and the moments of their service completion in a fixed queue with index $d$ by means of arrows above and below the axis, respectively. The dotted arrows indicate arrivals of new customers to other queues. The successive events are denoted by $\varepsilon_i$ and the corresponding time moments by $t(\varepsilon_i)$. In the proposed queue simulation Algorithm 2 all times are referred to the present time. Suppose that at the present moment of time there is a new arrival to the queue with number $d$, which is serviced by the server, i.e., $t(\varepsilon_i) = 0$. Denote by $T_x(\varepsilon_i)$ the holding time of the system in state $x$ up to the occurrence of the event $\varepsilon_i$. According to the time schema, the holding time in a previous state is defined as $t_i = \min\{T_x(\varepsilon_i),\, T_b(d) - T_x(\varepsilon_{i-1}),\, \dots\} = T_x(\varepsilon_i)$, where $T_x(\varepsilon_i)$ is the remaining inter-arrival time to queue $d$, $T_b(d)$ stands for the service time generated after the event $\varepsilon_{i-2}$ of the previously occurred departure, and the dots replace the time intervals associated with arrivals of customers to other queues. The next event is then determined by subtracting the holding time $t_i$ from all event time intervals. In this case the current event is a new arrival. Thus, the holding time $t_{i+1}$ in the state up to the event $\varepsilon_{i+1}$ of an arrival to some other queue not equal to $d$ is calculated by $t_{i+1} = \min\{T_a(d),\, T_b(d) - \sum_{j=i-1}^{i} T_x(\varepsilon_j),\, \dots\} = T_x(\varepsilon_{i+1})$. The subsequent holding times are calculated as follows: $t_{i+2} = \min\{T_a(d) - T_x(\varepsilon_{i+1}),\, T_b(d) - \sum_{j=i-1}^{i+1} T_x(\varepsilon_j),\, \dots\} = T_x(\varepsilon_{i+2}) = T_b(d) - \sum_{j=i-1}^{i+1} T_x(\varepsilon_j)$, i.e., the event $\varepsilon_{i+2}$ is the next departure from queue $d$; $t_{i+3} = \min\{T_a(d) - \sum_{j=i+1}^{i+2} T_x(\varepsilon_j),\, T_{b+1}(d),\, \dots\} = T_x(\varepsilon_{i+3})$, where $T_{b+1}(d)$ is the next generated service time; $t_{i+4} = \min\{T_a(d) - \sum_{j=i+1}^{i+3} T_x(\varepsilon_j),\, T_{b+1}(d) - T_x(\varepsilon_{i+3}),\, \dots\} = T_x(\varepsilon_{i+4}) = T_{b+1}(d) - T_x(\varepsilon_{i+3})$; and $t_{i+5} = \min\{T_a(d) - \sum_{j=i+1}^{i+4} T_x(\varepsilon_j),\, T_{b+2},\, \dots\} = T_x(\varepsilon_{i+5}) = T_a(d) - \sum_{j=i+1}^{i+4} T_x(\varepsilon_j)$ is the remaining inter-arrival time for the next arrival to queue $d$. Continuing the process in a similar manner, all holding times of the system in the corresponding states are evaluated. By summing up the times $t_i$ we obtain the total simulation running time of the system, simT. The average cost per unit of time is then obtained by dividing the accumulated cost by the time simT.
The time instants of arrival events to queue $q \in I$ are stored in the vector variable $T_a$, and the departure events in queue $q$ in $T_b[q]$. Algorithm 2 contains pseudo-code of the main elements of the event-based simulation procedure.
Algorithm 2 Queue simulation algorithm
1: procedure QSIM($N$, $B_i$, $A_i(t)$, $B_i(t)$, $c_i$, $c_{i,j}$, $i, j \in I$, $\theta$, $n_{\max}$, $n_{\min}$) ▹ Initialization
2: $T_a \leftarrow (0, \dots, 0)$, $|T_a| = N$; $T_b \leftarrow ((), \dots, ())$, $|T_b| = N$; $xT \leftarrow 0$; $i \leftarrow 0$; $sc \leftarrow 0$
3: $d \leftarrow \text{Random}[\{1, \dots, N\}]$; $x \leftarrow (d, 0, \dots, 0)$, $|x| = N + 1$
4: while $i < n_{\max}$ do ▹ State recording
5: $t_i \leftarrow \min(T_a, \min(T_b[1]), \dots, \min(T_b[N]))$
6: $T_a \leftarrow T_a - t_i$
7: for $q = 1$ to $N$ do
8: $T_b[q](2{:}|T_b[q]|) \leftarrow T_b[q](2{:}|T_b[q]|) - t_i$
9: end for
10: if $i > n_{\min}$ then
11: $simT \leftarrow simT + t_i$ ▹ Simulation time
12: $xT \leftarrow xT + t_i \sum_{j=1}^{N} c_j x[j + 1] + sc$ ▹ Sum up the cost
13: end if
14: $sc \leftarrow 0$
15: for $q = 1$ to $N$ do
16: if $q = d$ and $T_a[q] \le \varepsilon$ then
17: $T_a[q] \leftarrow \text{RandomVariate}[A_q(t)]$ ▹ Generate inter-arrival time
18: $x \leftarrow x + e_{q+1}(N + 1)$, $i \leftarrow i + 1$
19: if $|T_b[q]| \le 1$ then
20: $T_b[q] \leftarrow (T_b[q], \text{RandomVariate}[B_q(t)])$ ▹ Generate service time
21: end if
22: end if
23: if $q = d$ and $T_a[q] > \varepsilon$ and $|T_b[q] \le \varepsilon| > 0$ then
24: $index \leftarrow T_b[q] \le \varepsilon$ ▹ Index of the current departure
25: $T_b[q] \leftarrow (T_b[q] \setminus T_b[q][index])$ ▹ Remove current departure
26: $x \leftarrow x - e_{q+1}(N + 1)$
27: if $x[q + 1] \ge 1$ then
28: $T_b[q] \leftarrow (T_b[q], \text{RandomVariate}[B_q(t)])$
29: end if
30: if $x[q + 1] = 0$ then
31: $a \leftarrow f(x, \theta)$, $d \leftarrow a$ ▹ New server scheduling
32: $x \leftarrow x + (a - q)e_1(N + 1)$
33: $sc \leftarrow c_{q,a}$
34: if $x[a + 1] > 0$ then
35: $T_b[a] \leftarrow (T_b[a], \text{RandomVariate}[B_a(t)])$ ▹ Generate service time
36: end if
37: end if
38: end if
39: if $q \ne d$ and $T_a[q] \le \varepsilon$ then
40: $T_a[q] \leftarrow \text{RandomVariate}[A_q(t)]$ ▹ Generate inter-arrival time
41: $x \leftarrow x + e_{q+1}(N + 1)$, $i \leftarrow i + 1$
42: end if
43: end for
44: end while
45: $g \leftarrow xT / simT$
46: end procedure
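An illustrative Python version of the event loop is given below. It is a simplified sketch of Algorithm 2 under our assumptions (a single residual service time per queue instead of the vector $T_b[q]$, and hypothetical helpers `sample_arrival`, `sample_service` and `schedule`), intended to show the event logic rather than reproduce the authors' code.

```python
# A simplified sketch (ours) of the event-based simulation in Algorithm 2.
import random

def qsim(N, sample_arrival, sample_service, schedule, c, c_sw,
         n_max=5000, n_min=1000):
    Ta = [sample_arrival(i) for i in range(N)]   # residual inter-arrival times
    Tb = None                                    # residual service time, if any
    d = random.randrange(N)                      # initial random scheduling
    q = [0] * N
    sim_t, cost, sw_cost, n = 0.0, 0.0, 0.0, 0
    while n < n_max:
        t = min(min(Ta), Tb if Tb is not None else float("inf"))
        Ta = [ta - t for ta in Ta]               # shift all residual times
        if Tb is not None:
            Tb -= t
        if n > n_min:                            # discard the warm-up period
            sim_t += t
            cost += t * sum(ci * qi for ci, qi in zip(c, q)) + sw_cost
        sw_cost = 0.0
        if Tb is not None and Tb <= 1e-12:       # departure from queue d
            q[d] -= 1
            Tb = None
            if q[d] == 0:                        # decision epoch: reschedule
                a = schedule((d + 1, *q)) - 1    # neural network or heuristic
                sw_cost = c_sw[d][a]
                d = a
        for i in range(N):                       # arrivals
            if Ta[i] <= 1e-12:
                q[i] += 1
                n += 1
                Ta[i] = sample_arrival(i)
        if Tb is None and q[d] > 0:              # start the next service:
            Tb = sample_service(d)               # exhaustive service of queue d
    return cost / sim_t                          # average cost per unit of time
```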

5. Neural Network Architecture

In our model, we propose to equip the router with a trained neural network. This network will determine the index of the queue that the server will serve next, based on the information about the system state at a decision epoch when the server finishes servicing the current queue. We have chosen a simple architecture for the neural network, consisting of only two layers, in such a way that, on the one hand, it has a small number of parameters for further optimization and, on the other hand, the quality of correct classification of some fixed initial control policy is at least 95%. The proposed neural network has one linear layer, which represents an affine transformation, and a softmax normalization layer, as illustrated in Figure 3.
The input includes $N + 1$ neurons according to the system state $x = (d, q_1, \dots, q_N)$, where $q_{d(x)}(x) = 0$. Neuron 0 gets the information on $d(x)$, and the $i$th neuron for $i \in I$ gets the information on the state of the $i$th queue. When the server finishes service at queue $d$, the neural network classifies this state into one of $N$ classes, which defines the current control action $a \in A$ in state $x$. The hidden linear layer consists of $N$ neurons $y = (y_1, \dots, y_N)$ which are connected with the input neurons via the system of linear equations
$$\begin{aligned} y_1 &= w_{1,0}x_0 + w_{1,1}x_1 + \dots + w_{1,N}x_N + b_1,\\ y_2 &= w_{2,0}x_0 + w_{2,1}x_1 + \dots + w_{2,N}x_N + b_2,\\ &\;\;\vdots\\ y_N &= w_{N,0}x_0 + w_{N,1}x_1 + \dots + w_{N,N}x_N + b_N, \end{aligned}$$
or in matrix form $y = Wx + B$ with $W \in \mathbb{R}^{N \times (N+1)}$ and $B \in \mathbb{R}^{N}$, where
$$W = \begin{pmatrix} w_{1,0} & w_{1,1} & \cdots & w_{1,N}\\ w_{2,0} & w_{2,1} & \cdots & w_{2,N}\\ \vdots & \vdots & & \vdots\\ w_{N,0} & w_{N,1} & \cdots & w_{N,N} \end{pmatrix} = \begin{pmatrix} w_1\\ w_2\\ \vdots\\ w_N \end{pmatrix} \quad\text{and}\quad B = (b_1, b_2, \dots, b_N)$$
with $w_i = (w_{i,0}, w_{i,1}, \dots, w_{i,N})$ are, respectively, the matrix of weights and the vector of biases of the given neural network, which must be estimated by means of the training set. The softmax layer $z = \operatorname{softmax}(y)$ is the final layer of the multiclass classification. It generates as output the vector of $N$ estimated probabilities for the input sample, where the $i$th entry is the likelihood that $x$ belongs to class $i$. The vector $y$ is normalized by the transformation
$$z = \begin{pmatrix} z_1\\ z_2\\ \vdots\\ z_N \end{pmatrix} = \frac{1}{\sum_{i=1}^{N} e^{y_i}}\begin{pmatrix} e^{y_1}\\ e^{y_2}\\ \vdots\\ e^{y_N} \end{pmatrix}.$$
The class number is then defined as $\hat{a} = \arg\max_i z_i$. Hence, the output $z$ is a mapping of the form $z = \varphi(x, \theta)$, where $\theta \in \mathbb{R}^{N(N+2)}$ is the parameter vector of the neural network, which includes all entries of the weight matrix $W \in \mathbb{R}^{N \times (N+1)}$ and the bias vector $B \in \mathbb{R}^{N}$, i.e.,
$$\theta = (w_1, w_2, \dots, w_N, B). \tag{11}$$
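A numpy sketch of the forward pass (ours, for illustration) shows how compactly the classifier acts at a decision epoch; the stabilizing shift by $\max_i y_i$ in the softmax is a standard numerical safeguard and not part of the paper's formulation.

```python
# A numpy sketch (ours) of the two-layer classifier: affine map plus softmax.
import numpy as np

def forward(x, W, B):
    y = W @ np.asarray(x, dtype=float) + B       # linear (affine) layer
    z = np.exp(y - y.max())                      # softmax numerator, stabilized
    return z / z.sum()                           # class probabilities z_i

def schedule(x, W, B):
    """Control action a = arg max_i z_i (returned 1-based)."""
    return int(np.argmax(forward(x, W, B))) + 1

def loss(x, a, W, B):
    """Cross-entropy loss -ln z_a for the true class a (1-based)."""
    return -np.log(forward(x, W, B)[a - 1])

N = 4
rng = np.random.default_rng(1)
W, B = rng.normal(size=(N, N + 1)), np.zeros(N)
print(schedule((2, 3, 0, 1, 5), W, B))           # next queue for this state
```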
The values of the parameter vector $\theta$ of the initial control policy, which in the next section will be used as a starting solution for the optimization procedure, are obtained by training the neural network on some known heuristic control policy; in our case this is the LQF policy. In the training phase the following optimization problem must be solved, given the training set $\{x^{(k)}\}_{k=1}^{m}$, $\{a^{(k)}\}_{k=1}^{m}$:
$$\theta^* = \arg\min_\theta \frac{1}{m}\sum_{k=1}^{m} l_k(\theta), \tag{12}$$
where the non-negative loss function
$$l_k(\theta) = -\sum_{i=1}^{N} 1_{\{a^{(k)} = i\}}\ln z_i^{(k)}$$
with $z_i^{(k)} = P[a^{(k)} = i \,|\, x^{(k)}, \theta]$ takes the value 0 only if the class of the $k$th element of the sample is predicted correctly, i.e., $\hat{a} = a^{(k)}$. The problem (12) can be solved in the usual way by the stochastic gradient descent method, where a single learning rate $\eta$ is maintained to update all parameters. The corresponding iterative expression is given below,
$$\theta^{(n)} = \theta^{(n-1)} - \eta\nabla_\theta \frac{1}{m}\sum_{k=1}^{m} l_k(\theta^{(n-1)}),$$
where $\nabla_\theta$ is the nabla operator defining the gradient of the function with respect to the parameter vector $\theta$. In our calculations we use the adaptive moment estimation algorithm (ADAM) to solve problem (12). It iteratively updates the parameters of the neural network based on the training data. ADAM calculates individual adaptive learning rates for the elements of $\theta$ by evaluating the first- and second-moment estimates of the gradient. The method is simple to implement, computationally efficient, requires little memory and is invariant to diagonal rescaling of the gradients. Further detailed information regarding the ADAM algorithm can be found in Kingma and Ba [36]. Although the ADAM algorithm can be found across various sources, we have chosen to reproduce it in this article as well. The main steps required for iteratively updating the parameter vector $\theta$ are summarized in Algorithm 3.
The parameters of Algorithm 3 are fixed to $\eta = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\varepsilon = 10^{-8}$ and $\delta = 0.001$. The classification accuracy of the proposed neural network trained on the LQF policy is over 97%. The test phases of the trained network were conducted on system states with queue lengths of up to 100 customers per queue. Thus, this starting network can be used to generate control actions of the initial control policy for the subsequent optimization of the parameters of this neural network.
Algorithm 3 Adaptive moment estimation algorithm
1: procedure ADAM($\eta$, $\beta_1$, $\beta_2$, $\varepsilon$, $\delta$)
2: $M_1^{(0)} \leftarrow (0, \dots, 0)$ ▹ Initialisation of the moment 1
3: $M_2^{(0)} \leftarrow (0, \dots, 0)$ ▹ Initialisation of the moment 2
4: $CI \leftarrow 0$ ▹ Convergence index
5: $n \leftarrow 0$
6: while $CI = 0$ do
7: $n \leftarrow n + 1$
8: $G^{(n)} \leftarrow \nabla_\theta \frac{1}{m}\sum_{k=1}^{m} l_k(\theta^{(n-1)})$ ▹ Calculate the gradient at step n
9: $M_1^{(n)} \leftarrow \beta_1 M_1^{(n-1)} + (1 - \beta_1) G^{(n)}$ ▹ Update the biased first moment
10: $M_2^{(n)} \leftarrow \beta_2 M_2^{(n-1)} + (1 - \beta_2)(G^{(n)})^2$ ▹ Update the biased second moment
11: $\hat{M}_1^{(n)} \leftarrow \frac{M_1^{(n)}}{1 - \beta_1^n}$ ▹ The bias-corrected first moment
12: $\hat{M}_2^{(n)} \leftarrow \frac{M_2^{(n)}}{1 - \beta_2^n}$ ▹ The bias-corrected second moment
13: $\theta^{(n)} \leftarrow \theta^{(n-1)} - \eta\frac{\hat{M}_1^{(n)}}{\sqrt{\hat{M}_2^{(n)}} + \varepsilon}$ ▹ Update the parameter vector
14: if $|\theta^{(n)} - \theta^{(n-1)}| < \delta$ then ▹ Check the convergence
15: $CI \leftarrow 1$, return $\theta^{(n)}$
16: end if
17: end while
18: end procedure
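The following numpy sketch mirrors Algorithm 3; `grad` is an assumed callback returning the gradient of the empirical loss, and the quadratic example at the end is only a smoke test.

```python
# A numpy sketch (ours) mirroring Algorithm 3 (ADAM).
import numpy as np

def adam(theta, grad, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, delta=1e-3):
    m1 = np.zeros_like(theta)                    # biased first moment M1
    m2 = np.zeros_like(theta)                    # biased second moment M2
    n = 0
    while True:
        n += 1
        g = grad(theta)
        m1 = beta1 * m1 + (1 - beta1) * g
        m2 = beta2 * m2 + (1 - beta2) * g**2
        m1_hat = m1 / (1 - beta1**n)             # bias-corrected first moment
        m2_hat = m2 / (1 - beta2**n)             # bias-corrected second moment
        step = eta * m1_hat / (np.sqrt(m2_hat) + eps)
        theta = theta - step
        if np.linalg.norm(step) < delta:         # convergence criterion
            return theta

# Smoke test: minimize |theta - 3|^2, whose gradient is 2 (theta - 3).
print(adam(np.zeros(3), lambda t: 2.0 * (t - 3.0)))
```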

6. Optimization of the Neural-Network-Based Scheduling Policy

Denote by $\theta$ the known parameter vector of the trained neural network, as defined in (11). The function $g(\theta)$ denotes the average cost for the queueing system where the router chooses an action obtained from the trained neural network with parameter vector $\theta$. We further adapt the simulated annealing method described in Algorithm 4 for discrete stochastic optimization of the average cost function
$$g^* = \min_\theta g(\theta), \qquad \theta^* = \arg\min_\theta g(\theta) \tag{13}$$
with a multidimensional parameter vector $\theta$. This algorithm is quite straightforward. It needs some starting solution, and in each iteration the algorithm evaluates the objective function at a randomly selected neighbour of the current parameter values. If the neighbour turns out to be better than the current solution with respect to the value of the objective function, the algorithm replaces the current solution with the new one. If the neighbour value is worse, the algorithm keeps the current solution with high probability and accepts the new value with a specified low probability.
Simulated annealing requires a finite discrete space for the parameters of the optimized function. It is assumed that all weights and biases of the neural network summarized in the vector $\theta$ take values in the interval $[\theta_{\min}, \theta_{\max}]$ with a lower bound $\theta_{\min}$ and an upper bound $\theta_{\max}$. Moreover, this interval is quantized in such a way that $\theta_i$, $i = 1, \dots, N(N+2)$, takes only the discrete values $\theta_{\min} + k\Delta$, $k = 0, 1, \dots, Q$, where $Q = \frac{\theta_{\max} - \theta_{\min}}{\Delta}$ is the quantization level. Note that the domains for the elements of the parameter vector $\theta$ can be specified separately, and the values of the vector obtained by training the neural network on the optimal policy of the Markov model are suitable for determining the possible maximum and minimum bounds. In this case it is possible to achieve faster convergence of Algorithm 4 to the optimal value.
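A small sketch of this quantized domain and of an admissible grid perturbation (used later in the perturbation step of Algorithm 4) may be helpful; the bounds and step below match the Section 7 settings as we read them, and `perturb` is our hypothetical helper.

```python
# A small sketch (ours) of the quantized parameter grid and a grid perturbation.
import random

theta_min, theta_max, Delta = -6.0, 6.0, 0.1
Q = round((theta_max - theta_min) / Delta)     # quantization level, here 120

def perturb(theta_i, eta=6):
    """Random grid step of at most eta * Delta, clipped to [theta_min, theta_max]."""
    lo = max(-eta, round((theta_min - theta_i) / Delta))
    hi = min(eta, round((theta_max - theta_i) / Delta))
    return theta_i + Delta * random.randint(lo, hi)

print(Q, perturb(5.9))                         # a random neighbour of theta_i = 5.9
```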
Algorithm 4 Simulated annealing algorithm
1: procedure SA($T(n)$, $\Delta$, $m$, $\eta$, $\tau$, $\nu$, $\theta_{\min}$, $\theta_{\max}$) ▹ Initialisation
2: $\theta^{(0)} \leftarrow (w_{1,\text{LQF}}, w_{2,\text{LQF}}, \dots, w_{N,\text{LQF}}, B_{\text{LQF}})$
3: $n \leftarrow 0$
4: $\bar{g}(\theta^{(n)}) \leftarrow \frac{1}{m}\sum_{k=1}^{m}\text{QSIM}(\dots, \theta^{(n)})$
5: $g^* \leftarrow \bar{g}(\theta^{(n)})$, $\theta^* \leftarrow \theta^{(n)}$
6: while $T(n) > \tau$ || $n < \nu$ do
7: $n \leftarrow n + 1$ ▹ Perturbation
8: $i \leftarrow \text{Random}[\{1, \dots, N(N+2)\}]$
9: $\xi \leftarrow \text{Random}[\{\max\{-\eta\Delta, \theta_{\min} - \theta_i^{(n-1)}\}, \dots, \min\{\eta\Delta, \theta_{\max} - \theta_i^{(n-1)}\}\}]$
10: $\theta^{(n)} \leftarrow \theta^{(n-1)} + \xi e_i$
11: $\bar{g}(\theta^{(n)}) \leftarrow \frac{1}{m}\sum_{k=1}^{m}\text{QSIM}(\dots, \theta^{(n)})$ ▹ Acceptance
12: if $\bar{g}(\theta^{(n)}) - g^* - S_{g(\theta^{(n)}), g(\theta^{(n-1)})}\, t_{2m-2;\,1-\alpha} > 0$ then
13: $p \leftarrow e^{-\frac{\bar{g}(\theta^{(n)}) - g^* - S_{g(\theta^{(n)}), g(\theta^{(n-1)})}\, t_{2m-2;\,1-\alpha}}{T(n)}}$
14: else $p \leftarrow 1$
15: end if
16: $u \leftarrow \text{Random}[\,]$
17: if $p \ge u$ then $g^* \leftarrow \bar{g}(\theta^{(n)})$, $\theta^* \leftarrow \theta^{(n)}$
18: else $\theta^{(n)} \leftarrow \theta^{(n-1)}$, $m \leftarrow m + 1$
19: end if
20: end while
21: end procedure
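Below is an illustrative Python driver for Algorithm 4 under our assumptions: `avg_cost(theta)` averages $m$ simulation runs of QSIM, `perturb` is a grid step as sketched above, the cooling schedule and stopping rule are simplified placeholders, and the statistical comparison of noisy estimates (described next) is omitted for brevity.

```python
# An illustrative driver (ours) in the spirit of Algorithm 4.
import math
import random
import numpy as np

def simulated_annealing(theta0, avg_cost, perturb, tau=1e-3, nu=200,
                        T=lambda n: 0.2 / math.log(n + 1)):
    theta = np.array(theta0, dtype=float)
    g_best, theta_best = avg_cost(theta), theta.copy()
    for n in range(1, nu + 1):                   # simplified stopping rule
        if T(n) <= tau:
            break
        cand = theta.copy()
        i = random.randrange(len(cand))          # perturb one random element
        cand[i] = perturb(cand[i])
        g_cand = avg_cost(cand)
        # always accept improvements; accept worse moves with probability p
        p = 1.0 if g_cand <= g_best else math.exp(-(g_cand - g_best) / T(n))
        if random.random() <= p:
            theta, g_best, theta_best = cand, g_cand, cand.copy()
    return theta_best, g_best
```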
Since the average cost function $g$ cannot be calculated analytically, a simulation technique is used for this purpose. As shown in Algorithm 4, at each iteration, at the step where the current solution can be accepted with a given probability, we need to calculate the difference between the objective functions. Because this function can only be calculated numerically, it is necessary to check at each iteration of the algorithm whether this difference is statistically significant. The algorithm is modified in such a way that the two-sample t-test is used to compare the expected values of two normally distributed samples with unknown but equal variances. Denote by $\theta_1$ and $\theta_2$, respectively, the current and the modified parameter vectors and by
$$\bar{g}(\theta_1) = \frac{1}{m}\sum_{k=1}^{m} g^{(k)}(\theta_1), \qquad \bar{g}(\theta_2) = \frac{1}{m}\sum_{k=1}^{m} g^{(k)}(\theta_2)$$
the two corresponding first empirical moments of the objective function. According to the t-test, the null hypothesis, which states that for the modified vector the average cost is statistically smaller than for the previous solution, is rejected if
$$\bar{g}(\theta_2) - \bar{g}(\theta_1) - S_{g(\theta_1), g(\theta_2)}\, t_{2m-2;\,1-\alpha} > 0,$$
where $t_{m;q}$ stands for the $q$-quantile of the t-distribution and the statistic $S_{g(\theta_1), g(\theta_2)}$ is defined as
$$S_{g(\theta_1), g(\theta_2)} = \sqrt{\frac{V_{g(\theta_1)}(m) + V_{g(\theta_2)}(m)}{m}},$$
with empirical variances $V_{g(\theta_1)}(m)$ and $V_{g(\theta_2)}(m)$.
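In code, this significance check is a few lines; the sketch below (ours) uses scipy's t quantile and treats the sample lists as the $m$ simulation runs for the two parameter vectors.

```python
# A sketch (ours) of the two-sample significance check above.
import numpy as np
from scipy.stats import t

def significantly_worse(sample2, sample1, alpha=0.05):
    """True if mean(sample2) - mean(sample1) - S * t_{2m-2; 1-alpha} > 0."""
    m = len(sample1)
    S = np.sqrt((np.var(sample1, ddof=1) + np.var(sample2, ddof=1)) / m)
    return np.mean(sample2) - np.mean(sample1) - S * t.ppf(1 - alpha, 2 * m - 2) > 0
```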
Below, we briefly describe the main steps of Algorithm 4. At the initialisation step, the neural network is trained on the LQF control policy. The parameter vector is then equal to the initial vector $\theta^{(0)}$ to be optimized. The simulation Algorithm 2 is then used to calculate the initial sample $\{g^{(k)}(\theta^{(0)})\}_{k=1}^{m}$ with $g^{(k)}(\theta^{(0)}) = \text{QSIM}(\dots)$ of the average cost function for a given initial parameter vector $\theta^{(0)}$ and the corresponding first empirical moment $\bar{g}(\theta^{(0)})$. These values are set as the current solution $g^*$ and $\theta^*$ of the optimization problem (13). At the perturbation step, a randomly chosen element of the previous parameter vector $\theta^{(n-1)}$ is randomly perturbed within the specified set
$$L(i) = \{\max\{\theta_i^{(n-1)} - \eta\Delta, \theta_{\min}\}, \dots, \min\{\theta_i^{(n-1)} + \eta\Delta, \theta_{\max}\}\}$$
of the admissible discrete domain. For the new parameter vector $\theta^{(n)}$, the next sample $\{g^{(k)}(\theta^{(n)})\}_{k=1}^{m}$ of average costs must be calculated together with the first empirical moment $\bar{g}(\theta^{(n)})$. At the acceptance step, the new policy $\theta^{(n)}$ can be accepted as a current solution with a probability $p$ defined as
$$p = \begin{cases} 1 & \text{if } \bar{g}(\theta^{(n)}) \le g^*,\\ e^{-\frac{\bar{g}(\theta^{(n)}) - g^* - S_{g(\theta^{(n)}), g(\theta^{(n-1)})}\, t_{2m-2;\,1-\alpha}}{T(n)}} & \text{if } \bar{g}(\theta^{(n)}) > g^*, \end{cases}$$
where $T(n)$ is the temperature at the $n$th iteration. If the new policy $\theta^{(n)}$ is accepted, it is stored together with the corresponding average cost $\bar{g}(\theta^{(n)})$ as the current solution. Otherwise, the last change in the parameter vector is reversed, i.e., $\theta^{(n)} = \theta^{(n-1)}$, and the sample size $m$ for calculating the first moments is increased. Then the perturbation step is repeated. For termination of the algorithm the stopping criterion $T(n) \le \tau$ together with $n \ge \nu$ is used.
We note that the classical simulated annealing method generates for some function $g(\theta)$ a sample $\theta^{(n)}$ which, for constant temperature $T(n) = T$, can be interpreted as a realization of a homogeneous Markov chain $\{\Theta_n\}_{n \in \mathbb{N}_0}$ with transition probabilities
$$p_{\theta_i, \theta_j} = P[\Theta_{n+1} = \theta_j \,|\, \Theta_n = \theta_i] = \frac{1}{|L(i)|}\, P\Big[U_n \le e^{-\frac{g(\theta_j) - g(\theta_i)}{T}}\Big], \quad \theta_j \in L(i), \tag{17}$$
where $U_n$ is a uniformly distributed random variable on the interval $[0, 1]$. It is easy to show that the modified transition probabilities, where the objective function is calculated numerically, converge to the transition probabilities (17), which in turn guarantees convergence to an optimal solution.
Proposition 2.
The acceptance probability $p(n)$ satisfies the limit relation
$$\lim_{n\to\infty} p(n) = \lim_{n\to\infty} P\Big[U_n \le e^{-\frac{\bar{g}(\theta_j) - \bar{g}(\theta_i) - S_{g(\theta_j), g(\theta_i)}\, t_{2m-2;\,1-\alpha}}{T}}\Big] = P\Big[U_n \le e^{-\frac{g(\theta_j) - g(\theta_i)}{T}}\Big].$$
Proof. 
The probability $P[U_n \le X]$ can obviously be rewritten as
$$P[U_n \le X] = \int_0^1 P[u \le X] f_{U_n}(u)\,du = E[X],$$
where $X = e^{-\frac{\bar{g}(\theta_j) - \bar{g}(\theta_i) - S_{g(\theta_j), g(\theta_i)}\, t_{2m-2;\,1-\alpha}}{T}}$. Then the following relation holds,
$$\lim_{n\to\infty} E\Big[e^{-\frac{\bar{g}(\theta_j) - \bar{g}(\theta_i) - S_{g(\theta_j), g(\theta_i)}\, t_{2m-2;\,1-\alpha}}{T}}\Big] = E\Big[e^{-\frac{g(\theta_j) - g(\theta_i)}{T}}\Big],$$
due to the strong law of large numbers and the fact that for $n \to \infty$ the sample size $m \to \infty$, and hence
$$\lim_{m\to\infty} S_{g(\theta_j), g(\theta_i)} = \lim_{m\to\infty}\sqrt{\frac{\sigma_j^2 + \sigma_i^2}{m}} = 0. \qquad \square$$

7. Numerical Analysis

Consider the queueing system with $N = 4$. We first analyse the Markov model, where the parallel queues are of the type $M/M/1$ with $\nu_i \sim E(\lambda_i)$ and $\zeta_i \sim E(\mu_i)$, $i \in I$, and the coefficients of variation $CV_{\nu_i}^2 = CV_{\zeta_i}^2 = 1$. The values of the system parameters $\lambda_i$ and $\mu_i$ are fixed as in Examples 1 and 2, which we will refer to as Cases 1 and 2. We compare the optimization results obtained by combining the simulation, neural network and simulated annealing algorithm with the results evaluated by the policy iteration algorithm. In Cases 1 and 2, the weights and biases of the neural network trained on the optimal scheduling policy calculated by the PIA take, respectively, the following values:
$$W_{\text{PIA}} = \begin{pmatrix} 0.4 & 3.2 & 0.3 & 0.1 & 0.2\\ 0.3 & 3.8 & 0.8 & 0.2 & 0.2\\ 0.1 & 2.9 & 3.6 & 0.4 & 0.3\\ 0.4 & 0.3 & 1.6 & 1.3 & 0.3 \end{pmatrix}, \qquad B_{\text{PIA}} = (1.6, 1.0, 0.9, 0.4),$$
$$W_{\text{PIA}} = \begin{pmatrix} 0.5 & 2.0 & 0.2 & 0.0 & 0.3\\ 0.3 & 2.0 & 0.7 & 0.0 & 0.3\\ 0.2 & 1.3 & 2.1 & 0.0 & 0.4\\ 0.1 & 0.0 & 1.0 & 0.0 & 0.3 \end{pmatrix}, \qquad B_{\text{PIA}} = (1.1, 1.1, 0.6, 0.0).$$
On the basis of these values, we can set in the simulated annealing Algorithm 4 the domain, or solution space, for each element of the vector $\theta$. For simplicity, in our experiments we set common boundaries for all elements, $\theta_{\min} = -6$ and $\theta_{\max} = 6$. The increment length $\Delta = 0.1$ implies the quantization level $Q = 120$. Next, we set $\eta = 6$, $\nu = 200$, and $T(n) = 0.2/\log(n)$. As the initial vector $\theta^{(0)}$ we take the parameter vector obtained by training the neural network on the LQF policy. For the initial control policy, one could also choose the policy $(W_{\text{PIA}}, B_{\text{PIA}})$ obtained by Algorithm 1. However, we would like to check the convergence of the algorithm when the initial solution chosen is not the best one, since in the general case one usually chooses either some heuristic policy or an arbitrary one. The empirical average cost $\bar{g}(\theta^{(n)})$ for each iteration step is calculated based on a sample of size $m \ge 20$. The accumulation of sample data in the QSIM Algorithm 2 starts after 1000 customers have entered the system and is completed after 5000 customers have entered the system.
Application of Algorithm 4 to the Markov model leads to the following optimal solutions:
Case 1:
The optimal solution is reached at $n=184$, $g^*=g(\theta^*)=2.2436$,
$$
W_{\mathrm{LQF}}=\begin{pmatrix}0.5&2.0&1.0&0.7&0.9\\0.3&0.7&1.6&0.7&0.9\\0.4&0.6&1.3&2.0&0.8\\0.0&0.6&1.3&1.8&1.9\end{pmatrix},\quad
W_{\mathrm{SA}}=\begin{pmatrix}0.7&5.3&0.3&0.8&0.9\\0.4&2.8&1.6&0.2&0.1\\0.4&5.7&5.9&1.6&0.9\\1.0&0.7&2.3&2.8&1.2\end{pmatrix},
$$
$$
B_{\mathrm{LQF}}=(0.5,0.1,0.2,0.2),\quad B_{\mathrm{SA}}=(1.8,0.2,0.6,0.3).
$$
Case 2:
The optimal solution is reached at $n=188$, $g^*=g(\theta^*)=3.2279$,
$$
W_{\mathrm{LQF}}=\begin{pmatrix}0.5&2.0&1.0&0.7&0.9\\0.3&0.7&1.6&0.7&0.9\\0.4&0.6&1.3&2.0&0.8\\0.0&0.6&1.3&1.8&1.9\end{pmatrix},\quad
W_{\mathrm{SA}}=\begin{pmatrix}0.4&5.3&0.6&0.8&0.4\\0.2&3.2&1.8&0.0&0.3\\0.5&3.1&3.9&1.4&0.0\\0.4&1.0&1.0&3.5&1.3\end{pmatrix},
$$
$$
B_{\mathrm{LQF}}=(0.5,0.1,0.2,0.2),\quad B_{\mathrm{SA}}=(2.5,0.6,0.0,0.6).
$$
We see that the elements of the matrices $W_{\mathrm{PIA}}$ and $W_{\mathrm{SA}}$ differ, but they are markedly similar in terms of which elements dominate. The optimization process of the scheduling policy is illustrated in Figure 4. In addition to the values of the average cost function obtained at each iteration step of the simulated annealing algorithm, the figures show horizontal dotted and dash-dotted lines for the LQF and $c\mu$ heuristic policies at the levels $g_{\mathrm{LQF}}=9.7093$ and $g_{c\mu}=4.1984$ in panel (a), and $g_{\mathrm{LQF}}=11.1740$ and $g_{c\mu}=5.2546$ in panel (b). As expected, the non-optimal LQF control policy results in an excessively high average cost. The results are much better for the $c\mu$ policy, but the presence of switching costs still significantly worsens its performance. The red horizontal line indicates the average cost $g_{\mathrm{PIA}}=2.5632$ and $g_{\mathrm{PIA}}=3.5500$ obtained by solving the Markov decision problem using the policy iteration Algorithm 1. We observe that these values are quite close to those obtained by the random search. The remaining small difference may be due, firstly, to the fact that simulation is used for the calculations, so the results exhibit a certain scatter, and, secondly, to the possible influence of boundary states in the Markov model, where a buffer size truncation has been used. Testing the hypothesis of a difference between the optimal average costs $g^*$ and $g_{\mathrm{PIA}}$, at least for our model, showed the values to be statistically equivalent. In the figures, we have also marked with triangles the iteration steps with an accepted policy (AP), where the perturbed parameter vector has been accepted. The number of accepted points in Cases 1 and 2 is equal, respectively, to 98 and 110. From the above results for exponential time distributions, we can make the following observations. If the parameter vector $\theta^{(0)}$ with elements $W_{\mathrm{PIA}}$ and $B_{\mathrm{PIA}}$ is used as the initial scheduling policy, then one can expect faster convergence of the simulated annealing algorithm to the optimal solution, which was confirmed numerically. If an optimal policy for the controlled Markov process is not available, e.g., when the number of queues is too large, it is reasonable to use the static $c\mu$-rule as the initial policy.
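For reference, the static $c\mu$-rule used as a baseline (and recommended above as an initial policy for large systems) admits a very short implementation. The sketch below, with hypothetical argument names, serves the non-empty queue with the largest product $c_i\mu_i$.

```python
def c_mu_rule(queue_lengths, c, mu):
    """Serve the non-empty queue maximizing c_i * mu_i; None if all are empty."""
    busy = [i for i, q in enumerate(queue_lengths) if q > 0]
    if not busy:
        return None
    return max(busy, key=lambda i: c[i] * mu[i])
```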
Figure 5 displays experiments realized for queues of the type $D/D/1$ with deterministic inter-arrival and service times equal to the corresponding mean values $1/\lambda_i$ and $1/\mu_i$ of the Markov model. Here the coefficients of variation are $CV_{\nu_i}^2=CV_{\zeta_i}^2=0$. The SA algorithm converges to the values $g^*=1.6500$ and $g^*=2.0326$, respectively, for Cases 1 and 2 with the following optimal policies,
Case 1:
$$
W_{\mathrm{LQF}}=\begin{pmatrix}0.5&2.0&1.0&0.7&0.9\\0.3&0.7&1.6&0.7&0.9\\0.4&0.6&1.3&2.0&0.8\\0.0&0.6&1.3&1.8&1.9\end{pmatrix},\quad
W_{\mathrm{SA}}=\begin{pmatrix}0.4&2.5&0.7&0.3&0.2\\0.4&3.6&1.6&0.5&0.5\\0.5&5.8&1.2&1.2&0.0\\0.9&0.1&3.5&2.9&1.2\end{pmatrix},
$$
$$
B_{\mathrm{LQF}}=(0.5,0.1,0.2,0.2),\quad B_{\mathrm{SA}}=(0.1,0.5,0.8,0.6).
$$
Case 2:
$$
W_{\mathrm{LQF}}=\begin{pmatrix}0.5&2.0&1.0&0.7&0.9\\0.3&0.7&1.6&0.7&0.9\\0.4&0.6&1.3&2.0&0.8\\0.0&0.6&1.3&1.8&1.9\end{pmatrix},\quad
W_{\mathrm{SA}}=\begin{pmatrix}0.1&4.5&1.2&0.7&0.9\\0.7&2.6&1.8&0.0&0.5\\0.0&0.3&3.8&0.6&0.6\\0.5&1.1&3.4&1.0&0.7\end{pmatrix},
$$
$$
B_{\mathrm{LQF}}=(0.5,0.1,0.2,0.2),\quad B_{\mathrm{SA}}=(2.0,0.3,0.3,1.0).
$$
The average costs for the heuristic policies take the values $g_{\mathrm{LQF}}=3.7333$, $g_{c\mu}=2.8000$, $g_{\mathrm{PIA}}=1.6500$ and $g_{\mathrm{LQF}}=5.0133$, $g_{c\mu}=3.9866$, $g_{\mathrm{PIA}}=2.7373$.
It is observed that the optimal policy obtained by the SA algorithm is quite close to that obtained by the PIA. Nevertheless, from experiment to experiment, certain deviations in the value of the average cost may appear. Therefore, it is of interest to check whether such differences are statistically significant.
Further, we analyse how sensitive the optimal policy obtained by the SA algorithm in the exponential case is to the shape of the inter-arrival and service time distributions. The following distributions are used to calculate the optimal control policy in the non-exponential case: the gamma $G(\alpha,\beta)$, log-normal $LN(\mu,\sigma)$ and Pareto $PR(\alpha,k)$ distributions, where the last two belong to the class of heavy-tailed distributions. The parameters of these distributions are chosen so that their first and second moments coincide; moreover, the first moments are the same as for the exponential distributions. The parameters are expressed as functions of the corresponding sample moments, as in the method of moments used for parameter estimation. In the following experiments, the first moments of the inter-arrival and service times are fixed at the values of Case 2, and the squared coefficient of variation is varied as $CV_{\nu_i}^2=CV_{\zeta_i}^2=0.5$ and $CV_{\nu_i}^2=CV_{\zeta_i}^2=20$. Denote by $\{Z^{(k)}\}_{k=1}^m$ a sample of the random variable $Z$ distributed according to one of the proposed distributions, with first two sample moments $\bar Z$, $\overline{Z^2}$ and squared empirical coefficient of variation $CV_Z^2=\overline{Z^2}/\bar Z^2-1$. Then for the gamma distribution $Z\sim G(\alpha,\beta)$ with the PDF
$$
f_Z(z)=\begin{cases}\dfrac{\beta(\beta z)^{\alpha-1}e^{-\beta z}}{\Gamma(\alpha)},& z\geq 0,\\[2pt] 0,& z<0,\end{cases}
$$
the parameters $\alpha>0$ and $\beta>0$ satisfy the relations,
$$
\alpha=\frac{1}{CV_Z^2},\qquad \beta=\frac{\alpha}{\bar Z}.
$$
In the case of the lognormal distribution $Z\sim LN(\mu,\sigma)$ with the PDF
$$
f_Z(z)=\frac{1}{\sigma z}\,\varphi\Big(\frac{\ln(z)-\mu}{\sigma}\Big),\quad z>0,
$$
where $\varphi(\cdot)$ denotes the standard normal density, the parameters $\mu\in\mathbb{R}$ and $\sigma>0$ are calculated by
$$
\sigma=\sqrt{\ln(1+CV_Z^2)},\qquad \mu=\ln(\bar Z)-\frac{\sigma^2}{2}.
$$
In the case of a Pareto distribution $Z\sim PR(k,\alpha)$ with the PDF
$$
f_Z(z)=\begin{cases}\dfrac{\alpha k^{\alpha}}{z^{\alpha+1}},& z\geq k,\\[2pt] 0,& z<k,\end{cases}
$$
the parameters $k>0$ and $\alpha>0$ are calculated by the relations
$$
\alpha=1+\frac{\sqrt{1+CV_Z^2}}{CV_Z},\qquad k=\frac{\alpha-1}{\alpha}\,\bar Z.
$$
The parameters of the proposed probability distributions are listed in Table 5 and Table 6, respectively, for the inter-arrival and service time distributions.
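A minimal sketch of this moment-matching step, assuming NumPy's random generator, is given below; it maps a target mean and squared coefficient of variation to distribution parameters via the relations above and draws a matched sample. The function name and interface are illustrative, not the paper's own code.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=2023)

def sample_matched(dist, mean, cv2, size):
    """Draw variates with the given mean and squared coefficient of variation."""
    if dist == "gamma":
        alpha = 1.0 / cv2
        beta = alpha / mean                       # rate; NumPy expects a scale
        return rng.gamma(shape=alpha, scale=1.0 / beta, size=size)
    if dist == "lognormal":
        sigma = math.sqrt(math.log(1.0 + cv2))
        mu = math.log(mean) - sigma ** 2 / 2.0
        return rng.lognormal(mean=mu, sigma=sigma, size=size)
    if dist == "pareto":
        alpha = 1.0 + math.sqrt(1.0 + cv2) / math.sqrt(cv2)
        k = (alpha - 1.0) / alpha * mean
        return k * (1.0 + rng.pareto(alpha, size=size))  # shift to support [k, inf)
    raise ValueError(f"unknown distribution: {dist}")
```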
The sensitivity of the optimal control policy to the shape of the distributions is tested by means of a two-sided t-test for samples with unknown but equal variances. Let $g_{\exp}$ and $g_{\mathrm{opt}}$ be the samples of the average cost values obtained for the optimal control policy in the case of exponentially distributed times and for the system with the proposed inter-arrival and service time distributions, respectively. These samples of size $m$ are associated with the normally distributed random variables $Z_{\exp}\sim N(\mu_{g_{\exp}},\sigma_{g_{\exp}})$ and $Z_{\mathrm{opt}}\sim N(\mu_{g_{\mathrm{opt}}},\sigma_{g_{\mathrm{opt}}})$, where $\mu_{g_{\exp}},\mu_{g_{\mathrm{opt}}}\in\mathbb{R}$ and $\sigma_{g_{\exp}}=\sigma_{g_{\mathrm{opt}}}>0$. The test is then defined as
$$
H_0:\ \mu_{g_{\exp}}=\mu_{g_{\mathrm{opt}}}\quad\text{vs.}\quad H_1:\ \mu_{g_{\exp}}\neq\mu_{g_{\mathrm{opt}}},\qquad
p=\mathbb{P}\Bigg[\frac{|\bar g_{\exp}-\bar g_{\mathrm{opt}}|}{S_{g_{\mathrm{opt}},g_{\exp}}}>t_{2m-2;1-\frac{\alpha}{2}}\Bigg],
$$
where the statistic $S_{g_{\mathrm{opt}},g_{\exp}}$ is calculated by (16). The results of the tests in the form of the p-value, together with the values of the average costs $\bar g_{\exp}$ and $\bar g_{\mathrm{opt}}$ and their 95% confidence intervals, are summarized in Table 7 and Table 8 for systems with different inter-arrival and service time distributions with smaller and greater levels of dispersion around the mean, i.e., for $CV_{\nu_i}^2=CV_{\zeta_i}^2=0.5$ in Table 7 and $CV_{\nu_i}^2=CV_{\zeta_i}^2=20$ in Table 8. Each table cell contains the values of the average costs $\bar g_{\exp}$ and $\bar g_{\mathrm{opt}}$ together with their confidence bounds in the first two rows, and the p-value in the third row.
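The test itself reduces to a standard pooled-variance two-sample t-test; a sketch using SciPy, with hypothetical sample arrays, is given below.

```python
from scipy import stats

def policies_equivalent(g_exp, g_opt, alpha=0.05):
    """Two-sided t-test with pooled variance; True means H0 is not rejected."""
    t_stat, p_value = stats.ttest_ind(g_exp, g_opt, equal_var=True)
    return p_value, p_value > alpha
```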
From the numerical examples, it is observed that the shape of the distributions, expressed through the coefficient of variation, has a strong influence on the values of the average cost functions $\bar g_{\exp}$ and $\bar g_{\mathrm{opt}}$. In almost all cases, the average cost increases significantly as the coefficient of variation increases; only for Pareto-distributed inter-arrival and service times is the change in values not significant. However, an examination of the entries in the last two tables reveals that in all experiments the p-value exceeds the significance level $\alpha=0.05$, and in most cases by a sufficiently large margin. In this regard, the statistical test fails to reject the null hypothesis at the given significance level; in other words, the average cost values are statistically equal and the corresponding optimal control policies are equivalent. Therefore, at least within the framework of the experiments conducted, we can state that the optimal scheduling policy is insensitive to the shape of the inter-arrival and service time distributions, given that the first moments are equal. For practical purposes, in general queueing systems one can either apply the proposed optimization method or use the control policy optimized for the equivalent exponential model as a suboptimal scheduling policy.

8. Conclusions

In this paper, we combined queue simulation, a neural network and simulated annealing optimization to calculate the optimal scheduling policy and the optimized average cost function in a general single-server queueing system with multiple parallel queues. The proposed combination of tools is sufficiently versatile to solve discrete optimization problems that occur in resource allocation for complex queueing systems and networks. The numerical results demonstrate the effectiveness of the proposed approach: the obtained optimal scheduling policy outperforms the best available heuristic policy, the $c\mu$-rule, by more than 45% on average. Nevertheless, a couple of important points must be stressed when using the proposed method. In simulated annealing, the choice of the initial control policy affects the speed of convergence to the optimal solution, and a finite domain must be defined for the solution. If the dimensionality of the state space allows, the initial control policy and the corresponding finite solution space can be obtained by the policy iteration algorithm implemented for the Markov model. The obtained optimal solution appears to be statistically insensitive to the shape of the inter-arrival and service time distributions when the first moments are the same. Moreover, the optimal policy for the exponential case can be treated as a suboptimal policy, and the corresponding trained neural network can be used by routers in queueing systems with arbitrary distributions. In terms of future research, we see potential in developing and applying this method to other complex controlled queueing systems where optimal routing, scheduling and resource allocation policies are required. Combining reinforcement learning algorithms with neural networks to solve optimization problems in general controlled queueing models could also be considered as a further line of research.

Author Contributions

Conceptualization, V.V.; Methodology, D.E.; Validation, N.S.; Formal analysis, D.E. and N.S.; Visualization, N.S.; Project administration, V.V.; Funding acquisition, V.V. All authors have read and agreed to the published version of the manuscript.

Funding

Open Access Funding by the University of Linz. The reported study was funded by RSF, project number 22-49-02023 (recipient V. Vishnevsky). This paper was supported by the RUDN University Strategic Academic Leadership Program (recipient D. Efrosinin).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in the study are available from the authors upon request.

Acknowledgments

The authors acknowledge, with gratitude, the useful and constructive comments and remarks of an anonymous referee and the Editor.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Vishnevsky, V.; Gorbunova, A.V. Application of machine learning methods to solving problems of queuing theory. In Information Technologies and Mathematical Modelling. Queueing Theory and Applications: 20th International Conference, ITMM 2021, Named after A.F. Terpugov, Tomsk, Russia, 1–5 December 2021; Communications in Computer and Information Science; Dudin, A., Nazarov, A., Moiseev, A., Eds.; Springer International Publishing: Cham, Switzerland, 2022; Volume 1605, pp. 304–316.
  2. Stintzing, J.; Norrman, F. Prediction of Queuing Behaviour through the Use of Artificial Neural Networks. 2017. Available online: http://www.diva-portal.se/smash/get/diva2:1111289/FULLTEXT01.pdf (accessed on 25 May 2023).
  3. Nii, S.; Okuda, T.; Wakita, T. A performance evaluation of queueing systems by machine learning. In Proceedings of the IEEE International Conference on Consumer Electronics (ICCE-Taiwan), Taoyuan, Taiwan, 28–30 September 2020.
  4. Sherzer, E.; Senderovich, A.; Baron, O.; Krass, D. Can machines solve general queueing systems? arXiv 2022, arXiv:2202.01729.
  5. Kyritsis, A.I.; Deriaz, M. A machine learning approach to waiting time prediction in queueing scenarios. In Proceedings of the 2019 Second International Conference on Artificial Intelligence for Industries (AI4I), Laguna Hills, CA, USA, 25–27 September 2019; pp. 17–21.
  6. Vishnevsky, V.; Klimenok, V.; Sokolov, A.; Larionov, A. Performance evaluation of the priority multi-server system MMAP/PH/M/N using machine learning methods. Mathematics 2021, 9, 3236.
  7. Sivakami, S.M.; Senthil, K.K.; Yamini, S.; Palaniammal, S. Artificial neural network simulation for Markovian queueing models. Indian J. Comput. Sci. Eng. 2020, 11, 127–134.
  8. Efrosinin, D.; Stepanova, N. Estimation of the optimal threshold policy in a queue with heterogeneous servers using a heuristic solution and artificial neural networks. Mathematics 2021, 9, 1267.
  9. Efrosinin, D.; Rykov, V.; Stepanova, N. Evaluation and prediction of an optimal control in a processor sharing queueing system with heterogeneous servers. In Distributed Computer and Communication Networks: 23rd International Conference, DCCN 2020, Moscow, Russia, 14–18 September 2020; Lecture Notes in Computer Science; Vishnevsky, V.M., Samouylov, K.E., Kozyrev, D.V., Eds.; Springer International Publishing: Cham, Switzerland, 2020; Volume 12563, pp. 450–462.
  10. Gorbunova, A.V.; Vishnevsky, V. Evaluation of the performance parameters of a closed queuing network using artificial neural networks. In Distributed Computer and Communication Networks: Control, Computation, Communications: 24th International Conference, DCCN 2021, Moscow, Russia, 20–24 September 2021; Lecture Notes in Computer Science; Vishnevskiy, V.M., Samouylov, K.E., Kozyrev, D.V., Eds.; Springer International Publishing: Cham, Switzerland, 2021; Volume 13144, pp. 265–278.
  11. Aljafari, B.; Jeyaraj, P.R.; Kathiresan, A.C.; Thanikanti, S.B. Electric vehicle optimum charging-discharging scheduling with dynamic pricing employing multi agent deep neural network. Comput. Electr. Eng. 2022, 105, 108555.
  12. Vishnevsky, V.; Semenova, O. Polling Systems: Theory and Applications for Broadband Wireless Networks; LAP LAMBERT Academic Publishing: London, UK, 2012.
  13. Vishnevsky, V.; Semenova, O. Polling systems and their application to telecommunication networks. Mathematics 2021, 9, 117.
  14. Vishnevsky, V.; Semenova, O.; Bui, D.T. Using a machine learning approach for analysis of polling systems with correlated arrivals. In Distributed Computer and Communication Networks: Control, Computation, Communications: 24th International Conference, DCCN 2021, Moscow, Russia, 20–24 September 2021; Lecture Notes in Computer Science; Vishnevskiy, V.M., Samouylov, K.E., Kozyrev, D.V., Eds.; Springer International Publishing: Cham, Switzerland, 2021; Volume 13144, pp. 336–345.
  15. Hofri, M.; Ross, K.W. On the optimal control of two queues with server setup times and its analysis. SIAM J. Comput. 1987, 16, 399–420.
  16. Liu, Z.; Nain, P.; Towsley, D. On optimal polling policies. Queueing Syst. Theory Appl. 1992, 11, 59–83.
  17. Buyukkoc, C.; Varaiya, P.; Walrand, J. The cμ rule revisited. Adv. Appl. Probab. 1985, 17, 237–238.
  18. Cox, D.R.; Smith, W.L. Queues; Chapman & Hall: London, UK, 1991.
  19. Koole, G. Assigning a single server to inhomogeneous queues with switching costs. In Theoretical Computer Science; CWI Report BS-R9405; Elsevier: Amsterdam, The Netherlands, 1994.
  20. Avram, F.; Gómez-Corral, A. On the optimal control of a two-queue polling model. Oper. Res. Lett. 2006, 34, 339–348.
  21. Duenyas, I.; Van Oyen, M.P. Stochastic scheduling of parallel queues with set-up costs. Queueing Syst. 1995, 19, 421–444.
  22. Matsumoto, Y. On optimization of polling policy represented by neural network. Comput. Commun. Rev. 1994, 4, 181–190.
  23. Kohonen, T. The self-organizing map. Proc. IEEE 1990, 78, 1464–1480.
  24. Aarts, E.; Korst, J. Simulated Annealing and Boltzmann Machines; John Wiley & Sons: Hoboken, NJ, USA, 1989.
  25. Ahmed, M.A. A modification of the simulated annealing algorithm for discrete stochastic optimization. Eng. Optim. 2007, 39, 701–714.
  26. Gallo, C.; Capozzi, V. A simulated annealing algorithm for scheduling problems. J. Appl. Math. Phys. 2019, 7, 2579–2594.
  27. Puterman, M.L. Markov Decision Processes; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons: New York, NY, USA, 1994.
  28. Tijms, H.C. Stochastic Models: An Algorithmic Approach; John Wiley & Sons: Hoboken, NJ, USA, 1994.
  29. Efrosinin, D. Controlled Queueing Systems with Heterogeneous Servers: Dynamic Optimization and Monotonicity Properties; VDM Verlag: Saarbrücken, Germany, 2008.
  30. Howard, R.A. Dynamic Programming and Markov Processes; John Wiley: Hoboken, NJ, USA, 1960.
  31. Gosavi, A. Simulation-Based Optimization; Springer: New York, NY, USA, 2015.
  32. Özkan, E.; Kharoufeh, J. Optimal control of a two-server queueing system with failures. Probab. Eng. Inf. Sci. 2014, 28, 489–527.
  33. Sennott, L.I. Average cost optimal stationary policies in infinite state Markov decision processes with unbounded costs. Oper. Res. 1989, 37, 626–633.
  34. Ebert, A.; Wu, P.; Mengersen, K.; Ruggeri, F. Computationally efficient simulation of queues: The R package queuecomputer. J. Stat. Softw. 2020, 95.
  35. Franzl, G. Queueing Models for Multi-Service Networks. Ph.D. Thesis, Technical University of Vienna, Vienna, Austria, 2015.
  36. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2017, arXiv:1412.6980.
Figure 1. Controlled single-server queueing system with parallel queues.
Figure 2. The time assignment for the present-time-based simulation.
Figure 3. Neural network architecture.
Figure 4. Iteration steps for g with ν_i ∼ E(λ_i) and ζ_i ∼ E(μ_i) for Case 1 (a) and Case 2 (b).
Figure 5. Iteration steps for g with ν_i = 1/λ_i and ζ_i = 1/μ_i for Case 1 (a) and Case 2 (b).
Table 1. The values of system parameters.

i          1      2      3      4
λ_i        0.05   0.10   0.15   0.20
μ_i        3.750  1.875  1.250  0.938
c_i        1      1      1      1
c_{i,1}    0      1      2      3
c_{i,2}    3      0      1      2
c_{i,3}    2      3      0      1
c_{i,4}    1      2      3      0
Table 2. The optimal scheduling policy for selected states.

(d, q_1, q_2, q_3, q_4)    0   1   2   3   4   5   6   7   8   9   10
(1, 0, q_2, 1, 1)          4   4   4   4   4   4   4   4   4   4   4
(1, 0, q_2, 3, 3)          4   4   4   4   4   3   3   3   3   3   3
(1, 0, q_2, 5, 5)          4   3   3   3   3   3   3   2   2   2   2
(1, 0, q_2, 8, 8)          3   3   3   3   2   2   2   2   2   2   2
(1, 0, q_2, 9, 9)          3   3   2   2   2   2   2   2   2   2   2
(2, q_1, 0, 1, 1)          4   4   4   4   4   4   4   4   4   4   4
(2, q_1, 0, 3, 3)          4   4   4   4   4   4   3   3   3   3   3
(2, q_1, 0, 5, 5)          3   3   3   3   3   3   3   3   3   3   3
(2, q_1, 0, 8, 8)          3   3   3   3   3   3   3   3   3   3   3
(2, q_1, 0, 9, 9)          3   3   3   3   3   3   3   3   1   1   1
Table 3. The values of arrival rates.

i          1      2      3      4
λ_i        0.08   0.16   0.24   0.32
Table 4. The optimal scheduling policy for selected states.

(d, q_1, q_2, q_3, q_4)    0   1   2   3   4   5   6   7   8   9   10
(1, 0, q_2, 1, 1)          3   2   2   2   2   2   2   2   2   2   2
(1, 0, q_2, 3, 3)          3   2   2   2   2   2   2   2   2   2   2
(1, 0, q_2, 5, 5)          3   2   2   2   2   2   2   2   2   2   2
(1, 0, q_2, 8, 8)          3   2   2   2   2   2   2   2   2   2   2
(1, 0, q_2, 9, 9)          3   2   2   2   2   2   2   2   2   2   2
(2, q_1, 0, 1, 1)          3   3   1   1   1   1   1   1   1   1   1
(2, q_1, 0, 3, 3)          3   3   1   1   1   1   1   1   1   1   1
(2, q_1, 0, 5, 5)          3   1   1   1   1   1   1   1   1   1   1
(2, q_1, 0, 8, 8)          3   1   1   1   1   1   1   1   1   1   1
(2, q_1, 0, 9, 9)          3   1   1   1   1   1   1   1   1   1   1
Table 5. Parameters for inter-arrival time distributions, CV²_{ν_i} = 0.5 (a) and CV²_{ν_i} = 20 (b).

(a)
i               1                2                3                4
G(α_i, β_i)     (2.00, 0.16)     (2.00, 0.32)     (2.00, 0.48)     (2.00, 0.64)
LN(m_i, σ_i)    (2.323, 0.637)   (1.629, 0.637)   (0.937, 0.637)   (0.637, 0.637)
PR(k_i, α_i)    (7.925, 2.732)   (3.962, 2.732)   (2.642, 2.732)   (1.981, 2.732)

(b)
i               1                2                3                4
G(α_i, β_i)     (0.05, 0.004)    (0.05, 0.008)    (0.05, 0.012)    (0.05, 0.016)
LN(m_i, σ_i)    (1.003, 1.745)   (0.310, 1.745)   (−0.095, 1.745)  (−0.383, 1.745)
PR(k_i, α_i)    (6.326, 2.025)   (3.163, 2.025)   (2.109, 2.025)   (1.582, 2.025)
Table 6. Parameters for service time distributions, CV²_{ζ_i} = 0.5 (a) and CV²_{ζ_i} = 20 (b).

(a)
i               1                 2                 3                 4
G(α_i, β_i)     (2.00, 7.500)     (2.00, 3.750)     (2.00, 2.500)     (2.00, 1.875)
LN(m_i, σ_i)    (−1.524, 0.637)   (−0.831, 0.637)   (−0.426, 0.637)   (−0.138, 0.637)
PR(k_i, α_i)    (0.169, 2.732)    (0.338, 2.732)    (0.507, 2.732)    (0.676, 2.732)

(b)
i               1                 2                 3                 4
G(α_i, β_i)     (0.05, 0.198)     (0.05, 0.094)     (0.05, 0.063)     (0.05, 0.047)
LN(m_i, σ_i)    (−2.844, 1.745)   (−2.151, 1.745)   (−1.745, 1.745)   (−1.458, 1.745)
PR(k_i, α_i)    (0.135, 2.025)    (0.269, 2.025)    (0.405, 2.025)    (0.539, 2.025)
Table 7. Comparison of optimal policies for CV²_{ν_i} = CV²_{ζ_i} = 0.5.

Service \ Arrival   G                  LN                 PR
G                   3.0836 ± 0.0286    3.0556 ± 0.0196    3.0096 ± 0.0437
                    3.0491 ± 0.0576    3.0203 ± 0.0301    3.0083 ± 0.0726
                    p = 0.2964         p = 0.0569         p = 0.9736
LN                  3.0818 ± 0.0291    3.0654 ± 0.0227    3.0347 ± 0.0622
                    3.0445 ± 0.0233    3.0282 ± 0.0364    3.0351 ± 0.0877
                    p = 0.0527         p = 0.0931         p = 0.9881
PR                  3.0904 ± 0.0485    3.1142 ± 0.0539    3.3081 ± 0.4249
                    3.0168 ± 0.0701    3.0572 ± 0.0614    3.1435 ± 0.1305
                    p = 0.0942         p = 0.1749         p = 0.4709
Table 8. Comparison of optimal policies for CV²_{ν_i} = CV²_{ζ_i} = 20.

Service \ Arrival   G                   LN                  PR
G                   44.5518 ± 5.3662    48.3524 ± 13.0935   19.1573 ± 3.7810
                    40.8015 ± 4.0916    38.6532 ± 15.3943   16.3102 ± 1.6154
                    p = 0.2788          p = 0.3493          p = 0.1793
LN                  44.0659 ± 4.6092    26.7610 ± 6.0684    9.6126 ± 1.9352
                    41.6925 ± 5.4512    28.8180 ± 8.7892    11.9165 ± 4.6811
                    p = 0.5162          p = 0.7067          p = 0.3759
PR                  36.3436 ± 4.0311    32.9247 ± 11.1232   5.6667 ± 0.7101
                    34.1937 ± 2.4608    24.4347 ± 4.1215    6.4067 ± 1.5618
                    p = 0.3749          p = 0.1656          p = 0.4008
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
