Here we discuss the main model, a non-preemptive finite-source controlled queueing system of the type
$M/M/K/N//N$ illustrated in
Figure 1. The system has
K heterogeneous servers with different rates
${\mu}_{1}\ge {\mu}_{2}\ge \cdots \ge {\mu}_{K}>0$ and
N customers in a source. It operates under the optimal allocation policy which minimizes the mean number of customers in the system. It will be shown that this policy is defined through a sequence of threshold levels
$1={q}_{1}\le {q}_{2}\le \cdots \le {q}_{K}<\infty $ for the queue lengths which prescribe the activation of slower servers. The analysed system can be treated as a model for the machine-repairman problem, where
N unreliable machines in a working area with exponentially distributed lifetimes and equal rates
$\lambda >0$ must be repaired by
K heterogeneous repair stations. The machines fail independently of each other. The stream of failed machines can be treated as an arrival stream of customers to the queueing system. Hereafter, we refer to a failed machine that enters the repair system and receives service there as a customer. After the repair the machine becomes as good as new and returns to the working area. The aim is to dynamically allocate the customers to the servers in order to minimize the long-run average number of customers in the system and to calculate the corresponding mean performance measures.
2.1. MDP Formulation
We formulate the optimal allocation problem in this machine-repairman system as a Markov Decision Process (MDP) in the following way. The behaviour of the system is described by a multidimensional continuous-time Markov chain
where
$Q\left(t\right)$ stands for the number of customers waiting in the queue at time
t and
${D}_{j}\left(t\right)$ specifies the state of the
jth server at time
t, where
State space: The set
${E}_{X}$ consists of
$(K+1)$-dimensional row vectors,
where
$q\left(x\right)$ denotes the number of customers in the queue and
${d}_{j}\left(x\right)$ – the status of the
jth server in state
x. The total number of states in the set
${E}_{X}$ is equal to
$|{E}_{X}|={\sum}_{j=0}^{K}\left(\genfrac{}{}{0pt}{}{K}{j}\right)(N-j+1)$.
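As an illustrative check (our own code, not part of the model description), the cardinality formula can be verified against a direct enumeration of the states $x=(q,d)$ with $q+{\sum}_{j}{d}_{j}\le N$; all names below are ours:

```python
from itertools import product
from math import comb

def state_count(K, N):
    # |E_X| = sum_{j=0}^{K} C(K, j) (N - j + 1): for every choice of j busy
    # servers the queue length ranges over 0, ..., N - j
    return sum(comb(K, j) * (N - j + 1) for j in range(K + 1))

def enumerate_states(K, N):
    # states x = (q, (d_1, ..., d_K)) with q + sum(d) <= N
    return [(q, d) for d in product((0, 1), repeat=K)
            for q in range(N - sum(d) + 1)]
```

For the parameters of Example 1 below ($K=5$, $N=60$) both ways give 1872 states.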
Decision epochs: The arrival and service completion epochs in the system with waiting customers.
Action space: $A=\{0,1,\cdots ,K\}$. To identify the groups of idle and busy servers, the following sets are defined,
With these notations the set of admissible control actions $A\left(x\right)\subseteq A$ in state $x\in {E}_{X}$ can be defined as $A\left(x\right)={J}_{0}\left(x\right)\cup \left\{0\right\}$. The action $a\in {J}_{0}\left(x\right)$ means that in state x a customer must be allocated to an idle server, while $a=0$ means that the customer must be routed to the queue. At an arrival epoch, which occurs only if the number of customers in the system is less than N, the arrived customer joins the queue and simultaneously another one from the head of the queue must be routed to some idle server or returned back to the queue. At a service completion epoch the same happens, i.e. the customer at the head of the queue is routed either to one of the idle servers or back to the queue. At a service completion in a system without waiting customers, no action has to be performed.
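A minimal sketch of the admissible action sets (our own illustrative helpers, with the server-status vector $d$ passed directly):

```python
def idle_servers(d):
    # J_0(x): 1-based indices of the idle servers in the status vector d
    return [j + 1 for j, busy in enumerate(d) if not busy]

def admissible_actions(d):
    # A(x) = J_0(x) ∪ {0}: allocate to an idle server, or route back
    # to the queue (action 0)
    return [0] + idle_servers(d)
```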
Immediate cost: The function
$l\left(x\right)$ specifies the number of customers in a state
$x\in {E}_{X}$, i.e.,
which is in fact independent of a control action
a.
Transition rates: The policydependent infinitesimal matrix
${\Lambda}^{f}={\left[{\lambda}_{xy}\left(a\right)\right]}_{x,y\in {E}_{X}}$ of the Markov chain (
1) includes the rates to go from state
x to state
y given the control action is
a defined as
with
${\lambda}_{x}\left(a\right)=-{\lambda}_{xx}\left(a\right)={\sum}_{y\ne x}{\lambda}_{xy}\left(a\right)$, where
${S}_{a}$ and
${S}_{j}^{-1}$ stand for the shift operators applied to the state vector
x in the following way,
Due to the finiteness of the state space
${E}_{X}$ and boundedness of the immediate cost function
$l\left(x\right)\le N$, a stationary average-cost optimal policy
$f:{E}_{X}\to A$ exists with a finite constant mean value of the number of customers in the system
which is independent of the initial state
x. In this case the policy-iteration algorithm introduced in Algorithm 1 converges.
This algorithm consists of two main parts: Policy evaluation and Policy improvement. In the first part, a system of linear equations with immediate costs
$l\left(x\right)$
is solved for the unknown real-valued dynamic-programming value function
${v}^{f}:{E}_{X}\to \mathbb{R}$ and mean value
${g}^{f}$ for a given control policy
f. The second part of the algorithm improves the previous policy by determining, for each system state, a control action
a that minimizes the value function
$v\left({S}_{a}x\right)$. The improved control action in state
x is defined then as
${f}^{*}\left(x\right)=\underset{a\in A\left(x\right)}{argmin}v\left({S}_{a}x\right)$ for
$x\in {E}_{X}\setminus \{x:l\left(x\right)=N\}$. Thus, the algorithm constructs a sequence of improved control policies until it finds one that minimizes the gain
${g}^{f}$.
In Algorithm 1 we perform a conversion of the
$(K+1)$-dimensional state space
${E}_{X}$ of the Markov chain (
1) to an equivalent one-dimensional state space using the function
$\Delta :{E}_{X}\to {\mathbb{N}}_{0}$, where
In the one-dimensional state space the transitions due to arrivals and service completions can then be defined as
For more details on the derivation of the optimality equation for heterogeneous queueing systems, the interested reader is referred to relevant publications, e.g., [
3].
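For illustration, the flattening map $\Delta $ can be realized as follows. This is our own hypothetical construction following the state ordering described in Section 2.2 (zero-queue states grouped by the number of busy servers, remaining states by queue length); the paper's exact $\Delta $ may differ:

```python
from itertools import product

def build_delta(K, N):
    # one possible bijection Delta: E_X -> {0, ..., |E_X| - 1}
    # zero-queue states first, grouped by the number of busy servers
    zero_q = sorted(product((0, 1), repeat=K), key=lambda d: (sum(d), d))
    order = [(0, d) for d in zero_q]
    # then the remaining states, ordered by queue length q
    for q in range(1, N + 1):
        for d in product((0, 1), repeat=K):
            if q + sum(d) <= N:
                order.append((q, d))
    return {x: i for i, x in enumerate(order)}
```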
Algorithm 1 Policy-iteration algorithm
1: procedure PIA($K,N,\lambda ,{\mu}_{j},j=1,2,\cdots ,K$)
2: ${f}^{\left(0\right)}\left(x\right)=\underset{j\in {J}_{0}\left(x\right)}{argmax}\left\{{\mu}_{j}\right\}$ ▹ Initial policy
3: $n\leftarrow 0$
4: ${g}^{{f}^{\left(n\right)}}=N\lambda {v}^{{f}^{\left(n\right)}}\left({\mathbf{e}}_{1}(K+1)\right)$ ▹ Policy evaluation
5: for $x=(0,1,0,\cdots ,0)$ to $(N-K,1,1,\cdots ,1)$ do
6: end for
7: ${f}^{(n+1)}\left(x\right)=\underset{a\in A\left(x\right)}{argmin}\,v\left({S}_{a}x\right)$ ▹ Policy improvement
8: if ${f}^{(n+1)}\left(x\right)={f}^{\left(n\right)}\left(x\right),\phantom{\rule{0.166667em}{0ex}}x\in {E}^{f}$ then return ${f}^{(n+1)}\left(x\right),{v}^{{f}^{\left(n\right)}}\left(x\right),{g}^{{f}^{\left(n\right)}}$
9: else $n\leftarrow n+1$, go to step 4
10: end if
11:
12: end procedure

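As a numerical illustration, the algorithm can be sketched in Python. This is our own minimal realization under stated assumptions, not the paper's implementation: the policy is embedded into the generator (an action is applied immediately after every arrival and service-completion epoch), evaluation solves the Poisson equation with the normalization $v\left({x}_{0}\right)=0$, and improvement keeps the previous action in states with $l\left(x\right)=N$, as in the text:

```python
import itertools
import numpy as np

def enumerate_states(K, N):
    # x = (q, d): queue length q and server-status vector d, q + sum(d) <= N
    return [(q, d) for d in itertools.product((0, 1), repeat=K)
            for q in range(N - sum(d) + 1)]

def apply_action(x, a):
    # a = 0: leave the head-of-queue customer waiting;
    # a = j > 0: move one waiting customer to idle server j
    if a == 0:
        return x
    q, d = x
    d = list(d)
    d[a - 1] = 1
    return (q - 1, tuple(d))

def actions(x):
    # admissible actions: 0 always; an idle server only if someone waits
    q, d = x
    return [0] + ([j + 1 for j in range(len(d)) if not d[j]] if q > 0 else [])

def evaluate(f, S, idx, lam, mu, N):
    # policy evaluation: l(x) - g + sum_y lam_xy(f)(v(y) - v(x)) = 0,
    # normalized by v(x0) = 0 for the empty state x0
    n = len(S)
    A, b = np.zeros((n, n + 1)), np.zeros(n)
    for i, (q, d) in enumerate(S):
        load = q + sum(d)
        if load < N:                       # arrival, then the policy acts
            y = apply_action((q + 1, d), f[idx[(q + 1, d)]])
            r = (N - load) * lam
            A[i, idx[y]] += r
            A[i, i] -= r
        for j, busy in enumerate(d):       # service completion at server j
            if busy:
                d2 = list(d)
                d2[j] = 0
                xs = (q, tuple(d2))
                y = apply_action(xs, f[idx[xs]]) if q > 0 else xs
                A[i, idx[y]] += mu[j]
                A[i, i] -= mu[j]
        if np.any(A[i, :n]):
            A[i, n] = -1.0                 # coefficient of the gain g
            b[i] = -float(load)            # -l(x)
        else:                              # isolated state (N, 0, ..., 0)
            A[i, i] = 1.0                  # pin its (irrelevant) value
    A[:, idx[(0, (0,) * len(mu))]] = 0.0   # normalization v(x0) = 0
    sol = np.linalg.lstsq(A, b, rcond=None)[0]
    return sol[:n], sol[n]                 # value function v, gain g

def policy_iteration(K, N, lam, mu):
    S = enumerate_states(K, N)
    idx = {x: i for i, x in enumerate(S)}
    # initial policy: fastest idle server whenever a customer is waiting
    f = [max(actions(x)[1:], key=lambda a: mu[a - 1], default=0) for x in S]
    while True:
        v, g = evaluate(f, S, idx, lam, mu, N)
        f2 = [f[i] if x[0] + sum(x[1]) == N else
              min(actions(x), key=lambda a: v[idx[apply_action(x, a)]])
              for i, x in enumerate(S)]
        if f2 == f:
            return f2, g
        f = f2
```

For $K=1$ this reduces to the classical single-repairman machine interference model, which gives a convenient sanity check of the computed gain.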
Numerical analysis confirms our expectation that the optimal control policy in heterogeneous systems with a finite number of customers also belongs to the class of threshold policies, as in the infinite-population case. A theoretical justification of this statement remains difficult. For this purpose it is necessary to prove that the dynamic-programming operator
B defined for our queueing model as
where
${T}_{0}$ and
${T}_{j}$ are the event operators in the case of a new arrival and a service completion at server
$j\in {J}_{1}\left(x\right)$,
preserves the monotonicity properties of the increments of the value function
v:
In proving inequality (7) we encounter a difficulty. This is due primarily to the form of the operator B in (5): it contains a term describing arriving customers whose coefficient $(N-l\left(x\right))\lambda $ depends on the system state x. Bringing the terms in inequality (7) to a common denominator by introducing fictitious transitions, we obtain terms which cannot be proved to be negative. We hope to overcome these difficulties in our next paper; to date, our statement about the threshold structure of the optimal control policy f rests exclusively on the performed numerical experiments. The following example illustrates this vividly.
Example 1. Consider the system with $K=5$, $N=60$ and $\lambda =0.3$. The service rates take the following values: $({\mu}_{1},{\mu}_{2},{\mu}_{3},{\mu}_{4},{\mu}_{5})=$ $(20,8,4,2,1)$. Table 1 shows the optimal control actions $f\left(x\right)$ for selected system states x. Threshold levels ${q}_{k}$, $2\le k\le K$, are evaluated by comparing the optimal actions $f\left(x\right)=0$ and $f\left({S}_{0}x\right)=k$ for $x=(q\left(x\right),1,\cdots ,1,0,{d}_{k+1}\left(x\right),\cdots ,{d}_{K}\left(x\right))$, $0\le q\left(x\right)\le N-{\sum}_{j=1}^{K}{d}_{j}\left(x\right)$, ${d}_{j}\left(x\right)\in \{0,1\}$. In this example the optimal policy f is defined through a sequence of threshold levels $({q}_{2},{q}_{3},{q}_{4},{q}_{5})$ $=(1,2,4,9)$ and ${g}^{f}=4.91549$.
2.2. Evaluation of System Performance Measures
We are concerned with the calculation of the system performance measures for a given policy f. The state probabilities and performance characteristics defined here refer to a particular fixed control policy f, so we will use the corresponding superscript in the notation. The states x of the set ${E}_{X}$ with $q\left(x\right)=0$ are ordered according to the number of busy servers $|{J}_{1}\left(x\right)|$ while the states with $q\left(x\right)>0$ are ordered with respect to the queue length, so that the infinitesimal matrix ${\Lambda}^{f}$ has a block tridiagonal structure for the fixed policy f. First we define the performance characteristics:
The probability that the kth server $1\le k\le K$ is busy, ${\overline{U}}_{k}^{f}={\sum}_{x\in {E}_{X}}{d}_{k}\left(x\right){\pi}_{x}^{f}$;
The mean number of busy servers, ${\overline{C}}^{f}={\sum}_{k=1}^{K}{\overline{U}}_{k}^{f}$;
The mean number of customers in the queue, ${\overline{Q}}^{f}={\sum}_{x\in {E}_{X}}q\left(x\right){\pi}_{x}^{f}$.
The mean number of customers in the system, ${\overline{N}}^{f}={\overline{C}}^{f}+{\overline{Q}}^{f}$.
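For a fixed policy these measures reduce to solving one linear system for the stationary probabilities. The following is our own self-contained sketch (the policy f is assumed to be given as a map from states $x=(q,d)$ to actions; it restricts the chain to the states reachable from the empty state, which also removes the degenerate isolated state $(N,0,\cdots ,0)$):

```python
import numpy as np

def stationary_measures(K, N, lam, mu, f):
    # f: dict mapping states x = (q, d) to actions (0 = queue, j = server j)
    def step(x, a):
        if a == 0:
            return x
        q, d = x
        d = list(d)
        d[a - 1] = 1
        return (q - 1, tuple(d))

    def transitions(x):
        q, d = x
        load = q + sum(d)
        out = []
        if load < N:                       # arrival, then the policy acts
            out.append((step((q + 1, d), f[(q + 1, d)]), (N - load) * lam))
        for j, busy in enumerate(d):       # service completion at server j
            if busy:
                d2 = list(d)
                d2[j] = 0
                xs = (q, tuple(d2))
                out.append((step(xs, f[xs]) if q > 0 else xs, mu[j]))
        return out

    # states reachable from the empty state x0
    x0 = (0, (0,) * K)
    reach, frontier = {x0}, [x0]
    while frontier:
        for y, _ in transitions(frontier.pop()):
            if y not in reach:
                reach.add(y)
                frontier.append(y)
    S = sorted(reach)
    idx = {x: i for i, x in enumerate(S)}
    n = len(S)
    Q = np.zeros((n, n))
    for x in S:
        for y, r in transitions(x):
            Q[idx[x], idx[y]] += r
            Q[idx[x], idx[x]] -= r
    # stationary distribution: pi Q = 0 with sum(pi) = 1
    A = np.vstack([Q.T, np.ones(n)])
    pi = np.linalg.lstsq(A, np.r_[np.zeros(n), 1.0], rcond=None)[0]
    U = [sum(pi[idx[x]] * x[1][k] for x in S) for k in range(K)]
    C = sum(U)                                   # mean busy servers
    Qbar = sum(pi[idx[x]] * x[0] for x in S)     # mean queue length
    return {"U": U, "C": C, "Q": Qbar, "N": C + Qbar, "pi0": pi[idx[x0]]}
```

For $K=1$ the results agree with the classical machine-repairman formulas, which we use as a sanity check.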
The following vectors of dimension
$|{E}_{X}|-1$ comprise the policy-dependent values
${a}^{f}\left(x\right)$ and the policy-independent values
$b\left(x\right)$,
where the first elements of the vectors are respectively
${a}^{f}\left({\mathbf{e}}_{1}(K+1)\right)$ and
$b\left({\mathbf{e}}_{1}(K+1)\right)$. Denote by
${\overline{M}}_{1}^{f}$ one of the performance characteristics
${\overline{U}}_{k}^{f}$,
${\overline{C}}^{f}$,
${\overline{Q}}^{f}$ and
${\overline{N}}^{f}$.
Proposition 1. The performance measure ${\overline{M}}_{1}^{f}$ satisfies the relation, where the vector ${\mathbf{a}}^{f}$ is a solution of the system. The matrix ${\tilde{\Lambda}}^{f}$ is obtained from ${\Lambda}^{f}$ by removing the first column and the first row, and Proof. We multiply both sides of equality (
9) by the row vector of the stationary state probabilities
${\tilde{\pi}}^{f}=({\pi}_{x}^{f}:x\in {E}_{X}\setminus \left\{{x}_{0}\right\})$,
where
${\tilde{\pi}}^{f}\mathbf{b}={\sum}_{x\in {E}_{X}\setminus \left\{{x}_{0}\right\}}b\left(x\right){\pi}_{x}^{f}$ for the corresponding function
$b\left(x\right)$ is obviously equal to the performance measure
${\overline{M}}_{1}^{f}$. The following sequence of relations
validates the statement. □
The following measures characterize the behaviour of the system in a busy period, which we define as the time interval that starts when an arriving customer enters the empty system in state ${x}_{0}$ and finishes when the system visits ${x}_{0}$ again after a service completion.
The mean length of a busy period, ${\overline{L}}^{f}=\frac{1}{N\lambda}\left(\frac{1}{{\pi}_{{x}_{0}}^{f}}-1\right)$;
The mean number of customers served in a busy period by the kth server, ${\overline{N}}_{L,k}^{f}$;
The total mean number of customers served in a busy period,
${\overline{N}}_{L}^{f}={\sum}_{k=1}^{K}{\overline{N}}_{L,k}^{f}=\frac{1}{{\pi}_{{x}_{0}}^{f}}$.
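The mean busy-period length can be obtained from the first-passage times to the empty state ${x}_{0}$. A small self-contained sketch (our own code, same state and policy conventions as above) solves the corresponding linear system directly:

```python
import numpy as np

def mean_busy_period(K, N, lam, mu, f):
    # mean first-passage time to the empty state x0, starting from the
    # state entered when a customer arrives at the empty system
    def step(x, a):
        if a == 0:
            return x
        q, d = x
        d = list(d)
        d[a - 1] = 1
        return (q - 1, tuple(d))

    def transitions(x):
        q, d = x
        load = q + sum(d)
        out = []
        if load < N:                       # arrival, then the policy acts
            out.append((step((q + 1, d), f[(q + 1, d)]), (N - load) * lam))
        for j, busy in enumerate(d):       # service completion at server j
            if busy:
                d2 = list(d)
                d2[j] = 0
                xs = (q, tuple(d2))
                out.append((step(xs, f[xs]) if q > 0 else xs, mu[j]))
        return out

    x0 = (0, (0,) * K)
    x1 = step((1, (0,) * K), f[(1, (0,) * K)])   # start of a busy period
    reach, frontier = {x1}, [x1]                 # states visited before x0
    while frontier:
        for y, _ in transitions(frontier.pop()):
            if y != x0 and y not in reach:
                reach.add(y)
                frontier.append(y)
    S = sorted(reach)
    idx = {x: i for i, x in enumerate(S)}
    Qt = np.zeros((len(S), len(S)))
    for x in S:
        for y, r in transitions(x):
            Qt[idx[x], idx[x]] -= r              # total exit rate
            if y in idx:                         # jumps into x0 drop out
                Qt[idx[x], idx[y]] += r
    # mean first-passage times: sum_y lam_xy (L(y) - L(x)) = -1, L(x0) = 0
    L = np.linalg.solve(Qt, -np.ones(len(S)))
    return L[idx[x1]]
```

For $K=1$ the result matches the identity ${\overline{L}}^{f}=\frac{1}{N\lambda}(1/{\pi}_{{x}_{0}}^{f}-1)$ evaluated with the classical machine-repairman probabilities.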
In the following proposition we describe a general way to calculate these characteristics for the fixed control policy f. Denote by ${\overline{M}}_{2}^{f}$ one of the performance characteristics ${\overline{L}}^{f}$, ${\overline{N}}_{L,k}^{f}$ and ${\overline{N}}_{L}^{f}$.
Proposition 2. The performance measure ${\overline{M}}_{2}^{f}$ satisfies the relation, where the vector ${\mathbf{a}}^{f}$ is a solution of the system. The matrix ${\tilde{\Lambda}}^{f}$ is obtained from ${\Lambda}^{f}$ by removing the first column and the first row, and Proof. Denote by
${\tilde{\phi}}_{x}^{f}\left(s\right)={\int}_{0}^{\infty}{\phi}_{x}^{f}\left(t\right){e}^{st}dt$,
$Re\left[s\right]>0$, the Laplace-Stieltjes transform (LST) of the probability density function (PDF)
${\phi}_{x}^{f}\left(t\right)$ for the first passage time to state
${x}_{0}$ given that the initial state is
$x\in {E}_{X}$, the control policy is
f and by
${\overline{L}}_{x}^{f}={\int}_{0}^{\infty}t{\phi}_{x}^{f}\left(t\right)dt$ the corresponding first moment. By a first-step analysis we obtain for the LST the system
Taking into account that
${\overline{L}}^{f}\left(x\right)=-\frac{d}{ds}{\tilde{\phi}}_{x}^{f}\left(s\right){|}_{s=0}$, we obtain from (
13) the system for the conditional moments
The system (
14) for the transition rates (
2) is of the form
By expressing relations (
15) in matrix form and taking into account that
${\overline{L}}^{f}:={\overline{L}}^{f}\left({\mathbf{e}}_{1}(K+1)\right)$ we obtain the expressions (
11) for
${a}^{f}\left(x\right)={\overline{L}}^{f}\left(x\right)$.
Denote now by
${\tilde{\psi}}_{x,k}^{f}\left(z\right)={\sum}_{i=0}^{\infty}{\psi}_{x,k}^{f}\left(i\right){z}^{i}$,
$|z|\le 1$, the probability generating function (PGF) of the distribution
${\psi}_{x,k}^{f}\left(i\right)$ of the number of service completions at server
k up to the end of the busy period, given that the initial state is
$x\in {E}_{X}\setminus \left\{{x}_{0}\right\}$. By the law of total probability we get the following relations for the function
${\psi}_{x,k}^{f}\left(i\right)$,
The first term on the right hand side of (
16) represents the transition to state
u accompanied by the event we count, that is, a service completion at server
k. The second term stands for other possible transitions. The system (
16) can be rewritten in terms of the PGF in the following form,
The expressions (
17) can be modified using the property
${\overline{N}}_{L,k}^{f}\left(x\right)=\frac{d}{dz}{\tilde{\psi}}_{x,k}\left(z\right){|}_{z=1}$ in such a way that we get a system for the corresponding first moments,
For the model under study the system (
18) is of the form
The last system can be also expressed in form (
11) for
${a}^{f}\left(x\right)={\overline{N}}_{L,k}^{f}\left(x\right)$ and
${\overline{N}}_{L,k}^{f}={\overline{N}}_{L,k}^{f}\left({\mathbf{e}}_{1}(K+1)\right)$. For the mean total number of customers served
${\overline{N}}_{L}^{f}$ the term
${d}_{k}\left(x\right){\mu}_{k}$ on the right hand side of (
19) must be replaced by
${\sum}_{k=1}^{K}{d}_{k}\left(x\right){\mu}_{k}$. □
Finally, one more performance measure is of interest in this section, namely, the distribution of the maximal queue length in a busy period for the given control policy
f. Denote by
${Q}_{max}^{f}$ the maximum number of customers waiting in the queue during a busy period. For each fixed value
$n\ge 0$ the event
$\{{Q}_{max}^{f}\le n\}$ is equivalent to the event that the process
${\left\{X\left(t\right)\right\}}_{t\ge 0}$ starting in state
${\mathbf{e}}_{1}(K+1)$, where the first server is busy, hits the empty state
${x}_{0}$ before visiting the subset of states
The probability
${\overline{Q}}_{max,n}^{f}=\mathbb{P}[{Q}_{max}^{f}\le n]$ will be calculated by means of absorption probabilities for states in a set of absorbing states
${E}_{max,n}\cup \left\{{x}_{0}\right\}$ given that the initial state is
$x\in {E}_{X,n}={E}_{X}\setminus \left({E}_{max,n}\cup \left\{{x}_{0}\right\}\right)$. Denote by
the columnvectors of dimension
$|{E}_{X,n}|=|{E}_{X}|-|{E}_{max,n}|-1={\sum}_{j=0}^{K}(\genfrac{}{}{0pt}{}{K}{j})(n+1)-1$, where
n is fixed. Denote further by
${\overline{M}}_{3}^{f}$ one of performance characteristics
${\overline{Q}}_{max,n}^{f}$,
$n\ge 0$.
Proposition 3. The performance measure ${\overline{M}}_{3}^{f}$ satisfies the relation, where the vector ${\mathbf{a}}^{f}$ is a solution of the system. The matrix ${\tilde{\Lambda}}^{f}\left(n\right)$ is obtained from ${\tilde{\Lambda}}^{f}$ by removing all columns and rows starting from the $(n+1)$st, and Proof. Denote by
${\overline{Q}}_{max,n}^{f}\left(x\right)$ the probability of absorption into empty state
${x}_{0}$ starting in
$x\in {E}_{X,n}$, where
${\overline{Q}}_{max,n}^{f}={\overline{Q}}_{max,n}^{f}\left({\mathbf{e}}_{1}(K+1)\right)$, where
${\mathbf{e}}_{1}(K+1)$ as before is the state after an arrival to an empty state
${x}_{0}$. The following system can be obtained by conditioning on the next visited state, using again first principles,
For the queueing system operation under the control policy
f the system (
23) is of the form,
Then, after a routine (block) identification, the system (
24) can be expressed in form (
21), where
${a}^{f}\left(x\right)={\overline{Q}}_{max,n}^{f}\left(x\right)$,
$x\in {E}_{X,n}$. □
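The absorption-probability computation behind Proposition 3 can be sketched as follows (our own illustrative code, same state and policy conventions as in the earlier sketches): states with queue length above n and the empty state ${x}_{0}$ are made absorbing, and we solve for the probability of hitting ${x}_{0}$ first.

```python
import numpy as np

def q_max_cdf(K, N, lam, mu, f, n):
    # P(Q_max <= n): probability of reaching the empty state x0 before any
    # state with queue length > n, starting from the state that opens
    # the busy period
    def step(x, a):
        if a == 0:
            return x
        q, d = x
        d = list(d)
        d[a - 1] = 1
        return (q - 1, tuple(d))

    def transitions(x):
        q, d = x
        load = q + sum(d)
        out = []
        if load < N:
            out.append((step((q + 1, d), f[(q + 1, d)]), (N - load) * lam))
        for j, busy in enumerate(d):
            if busy:
                d2 = list(d)
                d2[j] = 0
                xs = (q, tuple(d2))
                out.append((step(xs, f[xs]) if q > 0 else xs, mu[j]))
        return out

    x0 = (0, (0,) * K)
    x1 = step((1, (0,) * K), f[(1, (0,) * K)])
    if x1[0] > n:
        return 0.0
    # transient states: reachable from x1 with queue length <= n, except x0
    reach, frontier = {x1}, [x1]
    while frontier:
        for y, _ in transitions(frontier.pop()):
            if y != x0 and y[0] <= n and y not in reach:
                reach.add(y)
                frontier.append(y)
    S = sorted(reach)
    idx = {x: i for i, x in enumerate(S)}
    M = np.zeros((len(S), len(S)))
    b = np.zeros(len(S))
    for x in S:
        for y, r in transitions(x):
            M[idx[x], idx[x]] -= r
            if y in idx:
                M[idx[x], idx[y]] += r
            elif y == x0:
                b[idx[x]] -= r        # boundary condition h(x0) = 1
            # states with q(y) > n are absorbed with h = 0
    h = np.linalg.solve(M, b)
    return h[idx[x1]]
```

For $K=1$, $N=3$ the probabilities can be checked by hand against the embedded jump chain of the birth-death process.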
As we can see, calculating the performance characteristics requires solving very similar systems of equations. Thus, the same algorithm can be used for this purpose by substituting appropriate values into the vectors ${\mathbf{a}}^{f}$ and b. This versatility of the proposed approach greatly simplifies the algorithmic analysis of complex controlled queueing systems. In principle, we assume that for a fixed threshold control policy the structure of the infinitesimal matrix can be fully defined even for an arbitrary number of servers, as will be proposed in the next section for the special case of the control policy in which all thresholds are equal to 1. Thus we believe that explicit matrix expressions for the performance characteristics can be derived from the presented matrix systems. We leave this problem for our research in the near future.