Analysis of Multi-Server Queueing System with Flexible Priorities

Abstract: In this paper, a multi-server queueing system providing service to two correlated flows of requests was considered. Non-preemptive priority was granted to one flow via the preliminary delay of requests in intermediate buffers with different rates of extraction from these buffers. Customers' impatience during waiting in the intermediate and main buffers was taken into account. The possibility of using the results of the mathematical analysis for managerial goals is numerically illustrated.


Introduction
Queueing theory is a powerful tool for solving the problems of the optimal sharing and scheduling of limited resources in many real-world systems in the fields of telecommunications, transportation, logistics, emergency services, healthcare, computer systems and networks, manufacturing, etc.; for recent references, see, e.g., [1][2][3][4][5][6][7][8][9]. While the majority of the existing queueing literature is devoted to systems with homogeneous requests, the efforts of many researchers have also been focused on queueing systems with heterogeneous requests having, in general, different requirements for the service time and different economic value. An important class of such queueing systems assumes the support of a certain system of priorities provided to different types of requests, aiming to create more comfortable conditions for requests belonging to the higher-priority classes. Examples of such classes are urgent (related to the safety of life or the security of objects) and non-urgent information in communication networks; handover and new calls in mobile communication networks; primary and cognitive users in cognitive radio systems; injured patients with or without a danger to their lives in health emergency services; emergency and public or private transport on city roads; preferred and ordinary clients of banks and other systems, etc.
In priority queueing systems, requests of different types are usually stored in different (physically or virtually) buffers. Customers of low priority can be picked up for service only when the buffers designed for higher-priority requests are empty. There is a variety of priority schemes suitable for modelling and optimising various real-life systems, including non-preemptive (not allowing the interruption of ongoing service), preemptive (allowing the interruption of service), and alternating priorities. Due to the finiteness of the shared resource and the use of work-conserving disciplines, the better the conditions guaranteed to the high-priority requests, the worse the conditions provided to the low-priority requests. Traditional, static, priority schemes suggest that the priority is assigned in advance and does not depend on the queue lengths in the system. Thus, it is possible that, sometimes, there is a very long queue of low-priority requests while the queue of high-priority requests is short. This may not be fair with respect to the low-priority requests. Therefore, different strategies of dynamically providing the priorities have been offered in the literature since the paper [23]; see the survey [24]. Such strategies suggest that, at any moment of decision-making about the type of request to be taken for service, this type is defined via some control policy depending on the relation between the queue lengths of the different types of requests. A popular class of such policies is monotone policies using thresholds. Other possibilities to make static priorities more favourable with respect to the low-priority requests are the use of randomisation in the choice (providing, with a certain probability, the chance for a low-priority request to enter service even in the presence of high-priority requests), the mandatory service of a low-priority request after the service, in turn, of a certain number of high-priority requests, the provisioning of weighted service rates, etc.
There are also many works considering the accumulation of priority during a request's stay in the queue. For a short review of the corresponding research, see, e.g., the papers [25,26]. In [25], the model with a heterogeneous-batch-correlated arrival process of two types of requests and the phase-type distribution of the service time (see, e.g., [27] for the definition and properties of such a distribution) was analysed. A non-priority request becomes a priority request after its waiting time exceeds some random time having a phase-type distribution. In [26], the model with a heterogeneous correlated arrival process of an arbitrary finite number of types of requests, a finite common buffer space, the phase-type distribution of the service time, and exponential distributions of the times until priority upgrading was analysed. In both of these papers, the arrival flow was assumed to be defined by the Marked Markov Arrival Process (MMAP) (see, e.g., [28,29]), which is the generalisation of the well-known Markov Arrival Process (MAP) to the case of heterogeneous requests. In turn, the MAP is a significant generalisation of the stationary Poisson arrival process. In contrast to the stationary Poisson arrival process, the MAP is suitable for modelling, in particular, the flows in modern communication networks and contact centres that exhibit correlation and high variability of inter-arrival times. It is already well known that ignoring the correlation and high variability of inter-arrival times can lead to huge errors in evaluating the performance and the design of real-world systems. For the literature about queues with the MAP, its properties, particular cases, and possible applications, see, e.g., [6,7,[30][31][32][33][34][35][36][37][38]. The literature on priority queues with an MMAP arrival process is still not very extensive. Among the recent papers mentioned above, such an arrival process was assumed in [14,18,20,22].
In the paper [39], a new flexible scheme for non-preemptive priority provision was offered. The idea of that scheme is not to define the rule for picking up requests of different priorities from the buffer, but to regulate the rate of admission of these requests to the buffer. This is achieved via managing auxiliary intermediate buffers for the preliminary storage of arriving requests. The capacities of the two intermediate buffers are different, as are the rates of the transition of requests from these buffers into the main buffer, from which all requests are picked up for service according to the First-In-First-Out (FIFO) principle. Via the proper choice of these rates and capacities, it is possible to provide any degree of priority to requests of both types. The non-preemptive priorities usually considered in the literature are obtained as a very particular case of this priority scheme.
In this paper, we extended the results of [39] in two directions. The first direction is the consideration of a multi-server system instead of the single-server system analysed in [39]. Multi-server queueing systems more adequately describe many real-world systems where the shared restricted resource is split into independent units providing service to the requests (operators in call centres, cashiers in stores, logical information transmission channels obtained from a single physical channel via the use of various multiplexing methods, etc.) and are a more difficult subject for investigation. The second direction is avoiding the loss of requests in the case when the intermediate buffers are overloaded.
Requests newly arriving at any intermediate buffer and seeing that the buffer is full are not lost, as was assumed in [39], but push the first request from this buffer into the main buffer and occupy the vacant place in the intermediate buffer. This feature allows not only modelling the systems where the loss of requests due to buffer overflow is not possible, but also dynamically giving additional priority to the requests from the currently long queue. As in [39], we took into account the possible impatience of requests in the intermediate and main buffers because it is well known (see, e.g., [40]) that requests in many systems exhibit impatience due to various reasons.
The structure of the rest of the paper is as follows. The mathematical model is described in Section 2. The multidimensional stochastic process describing the behaviour of the considered model is introduced and analysed in Section 3. In Section 4, the formulas for the computation of the key performance measures of the system are presented. In Section 5, the results of the numerical experiment are given. Section 6 concludes the paper.

Mathematical Model
We analysed a queueing system having N independent identical servers and a buffer of infinite capacity. Each server of the system can provide service to two types of requests at rate µ, µ > 0, independent of the type of the request.
The arrivals of requests occur according to an MMAP. The MMAP is determined by the irreducible continuous-time Markov chain (CTMC) ν_t, t ≥ 0, with the finite state space {1, 2, . . . , W}. The transition rates of this CTMC are defined by the generator, denoted as D(1). The matrix D(1) is represented in the additive form

D(1) = D_0 + D_1 + D_2,

where the sub-generator D_0 defines the transition rates of the CTMC ν_t that do not cause the arrival of requests. The non-negative matrix D_k defines the transition rates of the CTMC ν_t that are accompanied by the arrival of a Type-k request, k = 1, 2.
Let θ be the invariant probability row vector of the CTMC ν_t. This vector is computed as the unique solution to the system of linear algebraic equations θD(1) = 0, θe = 1. Here and further, e denotes a column vector of 1s and 0 denotes a row vector of 0s of the appropriate dimension. The average arrival rate λ_k of Type-k requests is computed by the formula λ_k = θD_k e, k = 1, 2. The total arrival rate of requests to the system is defined as λ = λ_1 + λ_2. Generally speaking, the lengths of the intervals between requests' arrivals are correlated. The formulas for the computation of the coefficients of variation and correlation can be found, e.g., in [36]. The methods for the estimation of the parameters of an MMAP describing the flow of requests in some real-world system, based on a finite set of observed request arrival moments (timestamps), were presented, e.g., in [41].
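As a minimal numerical illustration, the vector θ and the rates λ_k can be computed directly from the matrices D_0, D_1, D_2. The matrices below are invented placeholders with W = 2, not the MMAP used in the numerical section; only the solution procedure is the point here.

```python
import numpy as np

# Hypothetical MMAP with W = 2 underlying states (illustrative values only).
D0 = np.array([[-5.0,  1.0],
               [ 0.5, -2.0]])
D1 = np.array([[ 2.0,  0.5],
               [ 0.3,  0.2]])   # transitions accompanied by Type-1 arrivals
D2 = np.array([[ 1.0,  0.5],
               [ 0.5,  0.5]])   # transitions accompanied by Type-2 arrivals

D = D0 + D1 + D2                # the generator D(1); its rows sum to zero
W = D.shape[0]

# Solve theta D(1) = 0, theta e = 1: replace one balance equation by the
# normalisation condition and solve the resulting linear system.
A = np.vstack([D.T[:-1], np.ones(W)])
b = np.zeros(W)
b[-1] = 1.0
theta = np.linalg.solve(A, b)

e = np.ones(W)
lam1 = theta @ D1 @ e           # average arrival rate of Type-1 requests
lam2 = theta @ D2 @ e           # average arrival rate of Type-2 requests
lam = lam1 + lam2               # total arrival rate
```

The same procedure applies to any irreducible MMAP; only the matrices change.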
We assumed that Type-1 requests have priority over Type-2 requests, provided via the application of a request admission procedure described as follows. If a request of any type arrives at the system when at least one server is idle, this request immediately starts service on an arbitrary idle server and, after an exponentially distributed service time with rate µ, departs from the system. If an arriving Type-k request sees that all servers are busy, it is stored in the kth intermediate buffer, k = 1, 2. The capacities of the first and second intermediate buffers are equal to K and R, respectively. If the corresponding buffer is full, the arriving request is still placed in the buffer, while the first request staying in this buffer is immediately pushed out of the buffer and transits to the main buffer of infinite capacity. Each request placed in the kth buffer resides there for an exponentially distributed time with rate γ_k, γ_k ≥ 0, k = 1, 2. After this time expires, the request immediately transits to the main buffer. Once stored in the main buffer, the requests of both types become indistinguishable and are picked up from this buffer for service according to the FIFO principle. If, at some service completion moment, the main buffer is empty, the released server picks up for service the first request from the first intermediate buffer, if any. If the first buffer is empty, the offer to start service is given to the first request from the second buffer. If all buffers are empty, the server waits until a request arrives at the system and then has the chance to start its service.
As was stated above, the described admission procedure is flexible in the sense of the degree of the privilege provided to Type-1 requests. The privilege is given via: (i) the order of polling the intermediate buffers when some server becomes idle (the request from the first buffer is invited for service first); (ii) the choice of the rates γ_k of the transition of the requests from the intermediate buffers to the main buffer (the rate γ_1 can be arbitrarily larger than γ_2); (iii) the proper choice of the capacities of the intermediate buffers. In contrast to [39], where a small capacity of a buffer might cause the loss of an arriving request and the capacity of the buffer for low-priority requests could be chosen small in order to drop part of these requests, in the model considered in this paper, a small buffer for some type of requests helps to obtain a quicker transition to the main buffer due to the push-out mechanism.
Customers staying in the intermediate buffers are impatient. Customers staying in the kth buffer depart from the buffer (are lost) independently of each other, instead of transitioning to the main buffer, after residing in the buffer for an exponentially distributed time with rate α_k, α_k ≥ 0, k = 1, 2. Therefore, a large capacity of a buffer and a large impatience rate may cause the frequent loss of low-priority requests. Customers staying in the main buffer can also be impatient. The patience time was assumed to have an exponential distribution with rate ϕ, ϕ ≥ 0. After this time expires, the request departs from the system without service (is lost).
The operation of the system is schematically illustrated in Figure 1. Our goals were to construct the Markov process describing the behaviour of the system, implement its steady-state analysis, and numerically highlight some dependencies of the system performance measures on the parameters of the model.

Selection of the Random Process
Let:
• i_t, i_t ≥ 0, be the total number of requests in service and in the main buffer;
• k_t, k_t = 0, K, be the number of requests in the first intermediate buffer;
• r_t, r_t = 0, R, be the number of requests in the second intermediate buffer;
• ν_t, ν_t = 1, W, be the state of the underlying process of the MMAP
at moment t, t ≥ 0. Here and further, notation like ν = 1, W means that the parameter ν admits values from the set {1, 2, . . . , W}.
The four-dimensional CTMC ξ_t = {i_t, k_t, r_t, ν_t}, t ≥ 0, is regular and irreducible. Its infinite state space is defined as

{(i, ν), 0 ≤ i ≤ N − 1, ν = 1, W} ∪ {(i, k, r, ν), i ≥ N, k = 0, K, r = 0, R, ν = 1, W},

where the components k and r are omitted for i < N because the intermediate buffers are empty while at least one server is idle.

Generator of the Random Process
To write down the generator of the CTMC ξ_t, we need the following notation:
• diag{a_1, . . . , a_L} is the diagonal matrix with the diagonal entries given by the numbers a_1, . . . , a_L;
• the square matrices C_l, Î_l, Ĩ_l, E−_l, and E+_l of size l, where l = K + 1 or l = R + 1, are given by: C_l = diag{0, 1, . . . , l − 1}; Î_l = diag{1, 0, . . . , 0}; Ĩ_l = diag{0, . . . , 0, 1}; the matrices E−_l and E+_l have all zero entries, except the entries (E−_l)_{i,i−1} = 1, i = 1, . . . , l − 1, and (E+_l)_{i,i+1} = 1, i = 0, . . . , l − 2 (rows and columns are numbered starting from 0);
• ⊗ is the symbol of the Kronecker product of matrices; see, e.g., [42][43][44];
• I is the identity matrix and O is a square zero matrix of appropriate size. If needed, the size is indicated as a suffix.
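For concreteness, these auxiliary matrices are easy to materialise numerically. The sketch below assumes the common convention that E−_l and E+_l carry 1s on the sub- and super-diagonal, respectively; the sizes K, R, W and the rate γ_1 are illustrative placeholders, not the values used later in the paper.

```python
import numpy as np

def C(l):        # C_l = diag{0, 1, ..., l-1}
    return np.diag(np.arange(l, dtype=float))

def I_hat(l):    # diag{1, 0, ..., 0}
    m = np.zeros((l, l)); m[0, 0] = 1.0
    return m

def I_tilde(l):  # diag{0, ..., 0, 1}
    m = np.zeros((l, l)); m[-1, -1] = 1.0
    return m

def E_minus(l):  # 1s on the sub-diagonal: entry (i, i-1) = 1
    return np.eye(l, k=-1)

def E_plus(l):   # 1s on the super-diagonal: entry (i, i+1) = 1
    return np.eye(l, k=1)

# Example: the term gamma_1 C_{K+1} E^-_{K+1} (x) I_{(R+1)W}, which appears
# in the rates of transfers from Buffer 1 to the main buffer (toy sizes).
K, R, W, gamma1 = 3, 2, 2, 0.5
block = gamma1 * np.kron(C(K + 1) @ E_minus(K + 1), np.eye((R + 1) * W))
```

The product C_l E−_l places the value k at position (k, k − 1), so the transfer rate grows linearly with the current buffer occupancy, as intended.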
To simplify the analysis of the multidimensional CTMC ξ_t, t ≥ 0, having one countable component (i_t) and three finite components, let us enumerate its states in the direct lexicographic order of the components. We call the set of states of this CTMC that have the value i of the countable component i_t level i of the CTMC.
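The direct lexicographic enumeration of the states within a level can be sketched as follows; the sizes K, R, W are illustrative placeholders. Inside each level i ≥ N, the state (i, k, r, ν) is mapped to a linear index so that the component k varies slowest and ν fastest.

```python
K, R, W = 10, 15, 3   # illustrative sizes only

def state_index(k, r, nu):
    """Position of the state (i, k, r, nu) within its level i >= N
    under the direct lexicographic order of the components (k, r, nu)."""
    assert 0 <= k <= K and 0 <= r <= R and 1 <= nu <= W
    return k * (R + 1) * W + r * W + (nu - 1)

# Each level i >= N then contains (K + 1)(R + 1)W states, which is exactly
# the size of the blocks Q_{i,j} for i, j >= N.
level_size = (K + 1) * (R + 1) * W
```

This indexing is what makes the Kronecker-product form of the blocks below line up with the component order (k, r, ν).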
Let Q be the generator of the CTMC ξ t , t ≥ 0.
Theorem 1. The generator Q has the following block-tridiagonal structure: where the non-zero blocks Q_{i,j}, i, j ≥ 0, contain the intensities of the transitions of the CTMC from the states belonging to level i to the states belonging to level j. These blocks are defined as follows: Proof. The proof of Theorem 1 is implemented by means of the analysis of the intensities of all possible transitions of the CTMC ξ_t during an infinitely small time interval and is presented below.
The block-tridiagonal structure of the generator Q stems from the fact that requests arrive at the system and depart from it (due to service completion or impatience) only one by one.
The form of the non-zero blocks Q_{i,j}, i, j ≥ 0, is explained as follows:
• The block Q_{0,0}: If the system is empty (i = 0), that is, all three buffers are empty and all servers are idle, the behaviour of the CTMC ξ_t is determined only by the process ν_t. The intensities of its transitions to other states are equal to the non-diagonal elements of the matrix D_0, and the rates of exit from the corresponding states are determined, up to the sign, by the diagonal elements of this matrix. Thus, Q_{0,0} = D_0.

• The diagonal entries of the blocks Q_{i,i}, i ≥ 1: These entries are negative. Their modules define the rate of the exit of the CTMC ξ_t from the corresponding state. The exit can occur due to the following reasons:
(a) The underlying process ν_t departs from its current state. The rates of departure are given by the modules of the diagonal elements of the matrix D_0, if i = 1, N − 1, or of the matrix I_{K+1} ⊗ I_{R+1} ⊗ D_0, if i ≥ N.
(b) A service completion in one busy server occurs. The rates are given by the diagonal elements of the matrix µiI_W, if i = 1, N − 1, or of the matrix µN I_{(K+1)(R+1)W}, if i ≥ N.
(c) A Type-1 request departs from the dedicated intermediate buffer due to impatience or transfer to the main buffer. The rates are given by the matrix (α_1 + γ_1)C_{K+1} ⊗ I_{(R+1)W}, i ≥ N.
(d) A Type-2 request departs from the dedicated intermediate buffer due to impatience or transfer to the main buffer. The rates are given by the matrix (α_2 + γ_2)I_{K+1} ⊗ C_{R+1} ⊗ I_W, i ≥ N.
(e) A request departs from the main buffer due to impatience. The rates are given by the matrix ϕ(i − N)I_{(K+1)(R+1)W}, i > N.
• The non-diagonal entries of the blocks Q_{i,i}, i ≥ 1: These entries define the rates of the transitions of the CTMC ξ_t within level i. Such transition rates are given by:
(a) the non-diagonal entries of the matrix D_0, for i = 1, N − 1, or of the matrix I_{K+1} ⊗ I_{R+1} ⊗ D_0, for i ≥ N.

• The blocks Q_{i,i+1}, i ≥ 0: These blocks define the rates of the transitions of the CTMC ξ_t when the number of requests in service or in the main buffer increases from i to i + 1. When i < N − 1, there is at least one idle server left after an arrival; the increase occurs when a request of any type arrives at the system and starts service. The transition rates of the process ν_t at the moment of a request arrival are determined by the elements of the matrix D_1 + D_2. When i = N − 1, the arriving request occupies the last idle server, and from this moment, the numbers of requests in the intermediate buffers must be counted. The row vector ê_{K+1} ⊗ ê_{R+1} fixes that both of these buffers are empty. Therefore, the block Q_{N−1,N} has the form ê_{K+1} ⊗ ê_{R+1} ⊗ (D_1 + D_2). For i ≥ N, the increase of the number of requests in the main buffer may occur due to the transition of a request from some intermediate buffer to the main buffer.
The matrix γ_1 C_{K+1}E−_{K+1} determines the rate of the transition of a request from Buffer 1 to the main buffer, given the current number of requests in Buffer 1, together with the decrease of the number of requests in Buffer 1. No transition of the underlying process ν_t can occur simultaneously with the transition of a request to the main buffer. Therefore, the intensities of all transitions of the CTMC ξ_t at the moment of a request's transition from Intermediate Buffer 1 to the main buffer are given by the matrix γ_1 C_{K+1}E−_{K+1} ⊗ I_{(R+1)W}. By analogy, it may be shown that the intensities of the transitions of the CTMC ξ_t at the moment of a request's transition from Intermediate Buffer 2 to the main buffer are given by the matrix γ_2 I_{K+1} ⊗ C_{R+1}E−_{R+1} ⊗ I_W. The increase of the number of requests in service and in the main buffer from i to i + 1, i ≥ N, can also occur when Buffer 1 is full and a new Type-1 request arrives. This request pushes the first request out of this buffer to the main buffer. The rates of the transitions of the CTMC ξ_t at this moment are determined by the matrix Ĩ_{K+1} ⊗ I_{R+1} ⊗ D_1. By analogy, it may be shown that the intensities of the transitions of the CTMC ξ_t at the moment of a request being pushed out of Intermediate Buffer 2 to the main buffer are given by the matrix I_{K+1} ⊗ Ĩ_{R+1} ⊗ D_2. As a result, we obtain the above-given formula for the block.
• The blocks Q_{i,i−1}, i ≥ 1: The transitions from level i to level i − 1 are possible at the service completion moments (the corresponding rates are given by the matrix µiI_W, for i = 1, N − 1, or µN I_{(K+1)(R+1)W}, for i > N) and at the moments of requests' departure from the main buffer due to impatience (the corresponding rates are given by the matrix ϕ(i − N)I_{(K+1)(R+1)W}, i > N). If i = N, the service completion leads to the emptying of one server. Thus, the block Q_{N,N−1} admits the form ê^T_{K+1} ⊗ ê^T_{R+1} ⊗ µN I_W, where the column vector (ê_{K+1})^T ⊗ (ê_{R+1})^T is used to cancel the components describing the numbers of requests in Buffer 1 and Buffer 2 (these numbers are equal to zero by default).
Theorem 1 is proven.

Ergodicity Condition for the Random Process
Having determined the generator of the CTMC ξ_t, we can proceed to the derivation of the ergodicity condition of this CTMC.
Theorem 2. The following statements are true.
1. If the requests residing in the main buffer are impatient, i.e., the rate ϕ is positive, then the CTMC ξ_t is ergodic for an arbitrary set of the parameters of the system.
2. If the requests in this buffer are patient, i.e., the rate ϕ is equal to zero, then the criterion of the ergodicity of the CTMC ξ_t is the fulfilment of the inequality:

µN > yQ+e, (2)

where the vector y is the unique solution to the system:

y(Q− + Q0 + Q+) = 0, ye = 1. (3)

Here, Q−, Q0, and Q+ denote the blocks Q_{i,i−1}, Q_{i,i}, and Q_{i,i+1}, which do not depend on i for i > N.

Proof. Let us first consider the case ϕ > 0. In this case, it is easy to verify that there exist the limits:

Y_0 = lim_{i→∞} R_i^{−1} Q_{i,i−1}, Y_1 = lim_{i→∞} R_i^{−1} Q_{i,i} + I, Y_2 = lim_{i→∞} R_i^{−1} Q_{i,i+1}, (4)

Y_0 = I, Y_1 = O, Y_2 = O, (5)

where the matrix R_i is a diagonal matrix with the diagonal entries defined by the corresponding diagonal entries of the matrix −Q_{i,i}, i ≥ 0. Therefore, the CTMC ξ_t belongs to the class of continuous-time asymptotically quasi-Toeplitz Markov chains (AQTMC); see [36,45]. It follows from [45] that a sufficient condition for the ergodicity of the Markov chain ξ_t is the fulfilment of the inequality wY_0e > wY_2e, where the vector w is the unique solution to the system w(Y_0 + Y_1 + Y_2) = w, we = 1. It is easy to check that, for the considered CTMC ξ_t with the limiting matrices defined in (4) and (5), this inequality transforms to the evident inequality 1 > 0. This proves that the CTMC ξ_t is ergodic for an arbitrary set of the parameters of the system.
Let us now consider the case ϕ = 0. In this case, the CTMC ξ_t is a particular case of the quasi-birth-and-death processes (see [27]), and the criterion of the ergodicity of the CTMC ξ_t has the form:

yQ−e > yQ+e, (6)

where the vector y is the unique stochastic solution to the equation y(Q− + Q0 + Q+) = 0, ye = 1. Taking into account that Q− = µN I_{(K+1)(R+1)W} and, thus, yQ−e = µN, Inequality (6) reduces to (2).
Theorem 2 is proven.
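The verification of the ergodicity criterion for ϕ = 0 is a small linear-algebra exercise. The sketch below assumes generic limiting blocks Q−, Q0, Q+ (the tiny 2 × 2 matrices are invented placeholders, not derived from the paper's model); it solves y(Q− + Q0 + Q+) = 0, ye = 1 and checks µN > yQ+e.

```python
import numpy as np

def is_ergodic(Q_minus, Q0, Q_plus, mu, N):
    """Check the criterion for phi = 0: mu*N > y Q^+ e, where
    y (Q^- + Q^0 + Q^+) = 0 and y e = 1."""
    A = Q_minus + Q0 + Q_plus           # an irreducible generator
    n = A.shape[0]
    # Replace one balance equation by the normalisation condition.
    M = np.vstack([A.T[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    y = np.linalg.solve(M, b)
    return bool(mu * N > y @ Q_plus @ np.ones(n))

# Invented 2x2 blocks for illustration only:
mu, N = 1.0, 3
Q_plus = np.array([[0.0, 0.4],
                   [0.2, 0.1]])
Q_minus = mu * N * np.eye(2)            # Q^- = mu N I, as in the proof
Q0 = -np.diag((Q_plus + Q_minus).sum(axis=1))   # makes the row sums zero

ergodic = is_ergodic(Q_minus, Q0, Q_plus, mu, N)
```

With these toy numbers, yQ+e is far below µN = 3, so the check succeeds; lowering µN below yQ+e would flip the answer.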
Remark 1. It is easy to check that the vector y has the following probabilistic sense. When the main buffer is overloaded, the vector y defines the joint stationary distribution of the number of requests and the state of the underlying process of the MMAP in the queueing system with the MMAP arrival process, no buffer, two parallel service groups consisting of K and R servers, respectively, and an exponential service time distribution in the servers belonging to the rth group with rate α_r + γ_r, r = 1, 2. It can be verified that the departure process of successfully serviced requests from this queueing system is the MAP defined by the matrices H_0 and H_1. The mean departure rate from this system is yH_1e = yQ+e. In the situation when there are many requests in the main buffer, the discussed process of requests' departure from the system with two service groups defines the arrival process at the main buffer for service in the multi-server system with N servers. Therefore, the process defining the operation of this multi-server system when it is overloaded coincides with the process defining the operation of the MAP/M/N system with the MAP defined by the matrices H_0 and H_1 and the service rate in each server equal to µ. For the latter system, it is well known that the ergodicity condition is yQ+e < Nµ. This inequality, derived here based only on intuitive reasoning, coincides with the strictly proven Condition (2) above.

Remark 2.
It can be verified that the obtained Condition (2) in the case of a single server (i.e., N = 1) does not coincide with the condition derived for a single-server queue in [39]. This is explained by the different assumptions about the fate of a request arriving when the dedicated intermediate buffer is full. Such a request is assumed to be lost in [39], while in the model studied in this paper, this request pushes the first request staying in the intermediate buffer out of it, and that request joins the main buffer.
In the case of patient requests in the main buffer (ϕ = 0), the way of solving the infinite system of equilibrium equations for the stationary probability vectors π_i, i ≥ 0, is well known; see [27,36]. In particular, the vectors π_i, i ≥ N, are computed by the formula:

π_i = π_N S^{i−N}, i ≥ N,

where the matrix S is the minimal non-negative solution of the nonlinear matrix equation:

Q+ + SQ0 + S²Q− = O.

The vectors (π_0, π_1, . . . , π_N) are computed as the unique solution to a finite subsystem of equilibrium equations.
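A standard way to obtain the matrix S numerically is successive substitution. The sketch below assumes the usual quasi-birth-and-death form of the nonlinear equation, Q+ + SQ0 + S²Q− = O, with the limiting blocks Q−, Q0, Q+; the scalar blocks at the end are an invented toy case (an M/M/1-type queue) used only to check the iteration, not the paper's model.

```python
import numpy as np

def solve_S(Q_minus, Q0, Q_plus, tol=1e-12, max_iter=100_000):
    """Minimal non-negative solution S of Q^+ + S Q^0 + S^2 Q^- = O,
    computed by the iteration S <- -(Q^+ + S^2 Q^-) (Q^0)^{-1},
    started from S = O."""
    n = Q0.shape[0]
    Q0_inv = np.linalg.inv(Q0)
    S = np.zeros((n, n))
    for _ in range(max_iter):
        S_new = -(Q_plus + S @ S @ Q_minus) @ Q0_inv
        if np.max(np.abs(S_new - S)) < tol:
            return S_new
        S = S_new
    raise RuntimeError("the iteration did not converge")

# Toy scalar blocks: arrival rate 0.5, service rate 1 (a stable queue).
Q_plus = np.array([[0.5]])
Q_minus = np.array([[1.0]])
Q0 = np.array([[-1.5]])
S = solve_S(Q_minus, Q0, Q_plus)   # for this toy case, S = 0.5
```

Once S is available, the tail vectors follow from π_i = π_N S^{i−N}, and only the finite subsystem for (π_0, . . . , π_N) remains to be solved.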
In the case of impatient requests in the main buffer (ϕ > 0), the solution of this infinite system is much more involved. However, it can be solved using the numerically stable methods developed for the AQTMC; see [45][46][47].

Performance Measures
To give some insight into the quantitative behaviour of the system, we need the possibility of computing some key performance measures of the system. A few of them are listed below.
• The mean number L of requests in the system is calculated by the formula:
• The mean number N_serv of busy servers is calculated as:
• The mean number N_buf of requests in the main buffer is computed by:
• The mean number N_buf−1 of requests in the first buffer is calculated by the formula:
• The mean number N_buf−2 of requests in the second buffer is calculated by the formula:
• The loss probability of an arbitrary Type-1 request from the first buffer due to impatience is calculated by the formula:
• The loss probability P^loss_buf−1 of an arbitrary request from the first buffer due to impatience is calculated by the formula:
• The loss probability of an arbitrary Type-2 request from the second buffer due to impatience is calculated by the formula:
• The loss probability P^loss_buf−2 of an arbitrary request from the second buffer due to impatience is calculated by the formula:
• The intensity of the output flow of successfully served requests is computed by:
• The loss probability P_loss of an arbitrary request is calculated by the formula:
• The loss probability P^loss_buf of an arbitrary request from the main buffer due to impatience is calculated by the formula:
Remark 3. It should be noted that the following equalities hold: L = N_serv + N_buf + N_buf−1 + N_buf−2 and P_loss = P^loss_buf−1 + P^loss_buf−2 + P^loss_buf. They can be used to control the accuracy of the computer realisation of the computation of the stationary probability vectors π_i, i ≥ 0, and the performance characteristics of the model.
• The intensity of the arrival flow of requests at the main buffer or directly at the servers is computed by:
• The probability that an arbitrary Type-1 request will start service in the system immediately upon arrival is calculated by the formula:
• The probability that an arbitrary Type-2 request will start service in the system immediately upon arrival is calculated by the formula:
• The probability that an arbitrary Type-1 request will be selected for service from the first buffer without visiting the main buffer is calculated by the formula:
• The probability that an arbitrary Type-2 request will be selected for service from the second buffer without visiting the main buffer is calculated by the formula:
• The probability that an arbitrary Type-1 request will, upon arrival at the system, find the first buffer full, so that the first request from this buffer moves to the main buffer, is calculated by the formula:
• The probability that an arbitrary Type-2 request will, upon arrival at the system, find the second buffer full, so that the first request from this buffer moves to the main buffer, is calculated by the formula:

Numerical Examples
The arrival flow of requests was modelled by the MMAP arrival process defined by the following matrices: The total rate of requests' (priority and non-priority) arrival at the system is λ = 10.0009. The coefficient of correlation of successive inter-arrival times in this arrival process is 0.300005, and the squared coefficient of variation is 4.00035. The average intensity of priority (Type-1) requests' arrival is λ_1 = 3.00027, and the average intensity of non-priority (Type-2) requests' arrival is λ_2 = 7.00063.
The intensities of impatience in the first and the second buffers are equal to α_1 = 0.03 and α_2 = 0.01; the intensities of the transitions from the first and the second buffers to the main buffer are γ_1 = 0.5 and γ_2 = 0.2, respectively. The service rate is µ = 1.
We present the results of two experiments. In the first experiment, we fixed the capacities of the intermediate buffers and show the impact of the number of servers N and the impatience rate ϕ in the main buffer. In the second experiment, we fixed the values of N and ϕ and demonstrate the effect of changing the capacities K and R of the intermediate buffers.
Experiment 1. We assumed that the capacities of the intermediate buffers are K = 10 for priority requests and R = 15 for non-priority requests. The intensity of impatience ϕ was varied over the interval [0.1, 1] with a step of 0.1, and the number of servers N over the interval [1, 40] with a step of 1.
Figures 2-5 illustrate the dependence of the mean number L of requests in the system, the mean number N_serv of busy servers, and the mean numbers N_buf−1 and N_buf−2 of requests in the first and second buffers on the intensity ϕ and the number of servers N. It is evident from Figure 2 that the mean number L of requests in the system is huge (about 70) when the number N of servers is relatively small (N = 5) and the impatience rate ϕ is also small. An explanation of this fact follows from Figure 3. It is seen in this figure that, when the number N of servers is 5, the average number N_serv of busy servers is close to 5. This means that all available servers are practically always busy. It is well known that, in such a situation, the queue length is very long. Because the average number N_buf of requests in the main buffer is a summand on the right-hand side of the equality L = N_serv + N_buf + N_buf−1 + N_buf−2, it is easy to understand why L is huge when the number N of servers and the impatience rate are small. As expected, the value of L and all its summands essentially decrease when the number of servers N and the impatience rate ϕ increase. For large values of N (N ≥ 35), the mean number of busy servers reduces to about 10, while the values of the other summands become practically negligible. The influence of the impatience rate ϕ is essential only when the number N of servers is small. When N is sufficiently large, service is provided quickly, the main buffer is practically always empty, and requests very rarely depart from this buffer due to impatience.
It should be noted, based on Figures 4 and 5, that the average number N_buf−2 of requests residing in the second buffer is essentially larger than the mean number N_buf−1 of requests in the first buffer. This takes place because the arrival rate at the second buffer is 2.33-times higher than the arrival rate at the first buffer, and due to the priority provided to Type-1 requests via the higher transition rate to the main buffer and the smaller capacity of the intermediate buffer. For a small number N of servers, on average, only about 45 percent of the first buffer is occupied. The average percentage of occupation of the second buffer is about 1.9-times higher.
Figures 6-9 illustrate the dependence of the loss probability P^loss_buf−1 of an arbitrary request from the first buffer, the loss probability P^loss_buf−2 of an arbitrary request from the second buffer, the loss probability P^loss_buf of an arbitrary request from the main buffer, and the loss probability P_loss of an arbitrary request (all these losses occur due to impatience) on the rate of impatience ϕ and the number of servers N. The shapes of the surfaces presented in these figures are similar to the shapes of the surfaces presented in Figures 2-5. This was anticipated because all the mentioned losses occur due to impatience, and thus, the probabilities P^loss_buf−1, P^loss_buf−2, and P^loss_buf of the losses from the two intermediate buffers and the main buffer are proportional (with weights defined by the respective impatience rates) to the mean number of requests in each buffer. The probability P_loss of an arbitrary request loss is the sum of the loss probabilities P^loss_buf−1, P^loss_buf−2, and P^loss_buf, which is confirmed by Figures 6-9. It may be concluded from these figures that all loss probabilities essentially depend on N. The dependence on ϕ is weaker, especially for the loss probabilities from the intermediate buffers.
Let us briefly illustrate the possibility of using the obtained results for managerial goals. We considered the problem of the optimal choice of the number N of servers to maximise the profit of the system. It was assumed that the profit earned by the system per unit of time under the fixed number N of servers is evaluated by the profit function:

E(N) = aλ_out − λ(b_1 P^loss_buf−1 + b_2 P^loss_buf−2 + c P^loss_buf) − dN,

where λ_out is the intensity of the output flow of successfully served requests, a is the profit gained via service provision to one request, b_k is the penalty of the system paid for the loss of one request from the kth intermediate buffer, k = 1, 2, c is the penalty of the system paid for the loss of a request from the main buffer, and d is the cost of the maintenance of one server per unit of time.
With the cost coefficients a, b_1, b_2, c, and d fixed, the surface showing the dependence of the profit function E(N) on the number of servers N and the impatience rate ϕ is presented in Figure 10. The optimal value of N was computed separately for each fixed value of the impatience rate ϕ. Table 1 contains the optimal value N* of N and the corresponding optimal value E(N*) for ten fixed values of ϕ. Clearly, an increase of the impatience rate ϕ implies a larger value of the probability P_loss-buf. To decrease this probability, it is necessary to decrease the mean number of requests in the main buffer, which can be achieved by increasing the number of servers N. This explains the growth of N* with ϕ observed in Table 1. When the number of servers is sufficiently large, the servers succeed in providing service at such a speed that the queue length in the main buffer is very small, and an increase of the impatience rate ϕ has practically no impact on the value of the profit function.
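The search for N* is a one-dimensional maximisation over the integer N. A minimal sketch of this step, with a toy loss model standing in for the stationary analysis (the exponentially decaying p_loss(n), the single aggregated loss penalty c, and all numeric values are illustrative assumptions, not the paper's model or data):

```python
# Toy illustration of choosing N* = argmax E(N).
# p_loss(n) is a stand-in assumption; in the paper the loss probabilities
# come from the stationary distribution of the underlying Markov chain.
import math

a, c, d = 10.0, 5.0, 1.0   # profit per served request, loss penalty, server cost
lam = 6.0                  # total arrival rate (illustrative)

def p_loss(n):
    # stand-in: loss probability decreasing in the number of servers
    return math.exp(-0.4 * n)

def profit(n):
    served = lam * (1.0 - p_loss(n))   # rate of successfully served requests
    lost = lam * p_loss(n)             # rate of lost requests
    return a * served - c * lost - d * n

n_star = max(range(1, 31), key=profit)
print(n_star, round(profit(n_star), 3))
```

The same pattern applies with the real performance measures: recompute the stationary characteristics for each candidate N and keep the maximiser.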

Example 1.
Let us now fix the number of servers N = 15 and the impatience rate in the main buffer ϕ = 0.05. To show the impact of the capacities K and R of the intermediate buffers, we computed the values of various performance measures for K and R varying in the range from 1 to 20 with a step of one.
Figures 11-13 illustrate the dynamics of the mean numbers of requests N_buf-1 and N_buf-2 in the first and second buffers and the mean number of requests N_buf in the infinite main buffer. It is natural that the mean number N_buf-1 of requests in the first buffer increases when the capacity K of this buffer increases. The essential growth of N_buf-1 when the capacity R of the second buffer decreases is explained as follows. When R decreases, more Type-2 requests are pushed out from the second buffer by the arrival of new Type-2 requests. Therefore, the probability that the main buffer is empty decreases, and so do the chances of Type-1 requests to realise their priority via the privilege of being taken for service when the main buffer becomes empty. This leads to an increase of N_buf-1. The maximum of the mean number N_buf-2 of requests in the second buffer is substantially larger than the maximum of N_buf-1. This occurs due to the higher arrival rate of Type-2 requests and their lower rate of transition from the intermediate buffer to the main one. However, the influence of the ratio of the capacities of the intermediate buffers is also strong. If R is small, this clearly reduces the part of the priority of Type-1 requests achieved via their higher rate of transition from the intermediate buffer to the main buffer.
The maximum of the mean number N_buf of requests in the main buffer is achieved for a small capacity R of the second buffer. The arrival rate at this buffer is substantially higher than at the first buffer, and a small R leads to a short stay of Type-2 requests in the second buffer before being pushed out to the main buffer. When both K and R are larger, requests stay in the intermediate buffers longer. This longer delay reduces the burstiness of the flow to the main buffer (recall that the coefficient of correlation in the arrival process is about 0.3, which is rather large), while it is known in the literature that lower burstiness (or higher regularity) of the arrival process leads to a shorter queue in the system.
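The quoted correlation coefficient refers to successive inter-arrival times of the Markovian arrival process (MAP). As a sketch of how such a coefficient is obtained, the standard formulas m = ψ(−D0)⁻¹e, E[T²] = 2ψ(−D0)⁻²e, E[TₙTₙ₊₁] = ψ(−D0)⁻¹P(−D0)⁻¹e with P = (−D0)⁻¹D1 can be evaluated for a two-state MMPP; the matrices D0, D1 below are illustrative assumptions, not the parameters used in the paper:

```python
# Lag-1 correlation of inter-arrival times for a 2-state MAP (illustrative MMPP:
# a high-rate state, a low-rate state, and slow switching, giving a bursty flow).

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]

def inv2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

D0 = [[-5.01, 0.01], [0.02, -0.22]]   # transitions without an arrival
D1 = [[5.0, 0.0], [0.0, 0.2]]         # transitions accompanied by an arrival

M = inv2([[-D0[0][0], -D0[0][1]], [-D0[1][0], -D0[1][1]]])  # (-D0)^{-1}
P = mat_mul(M, D1)                                          # chain embedded at arrivals

# stationary vector psi of P (2-state closed form): psi P = psi, psi e = 1
psi1 = P[1][0] / (P[0][1] + P[1][0])
psi = [psi1, 1.0 - psi1]

e = [1.0, 1.0]
m1 = sum(psi[i] * mat_vec(M, e)[i] for i in range(2))                     # E[T]
m2 = 2.0 * sum(psi[i] * mat_vec(mat_mul(M, M), e)[i] for i in range(2))   # E[T^2]
cross = sum(psi[i] * mat_vec(mat_mul(mat_mul(M, P), M), e)[i]             # E[T_n T_{n+1}]
            for i in range(2))

corr = (cross - m1 * m1) / (m2 - m1 * m1)
print(round(corr, 3))
```

For this example the coefficient comes out clearly positive, reflecting the burstiness that, as noted above, lengthens the queue in the main buffer.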
Figures 14 and 15 depict the dependence on K and R of the probability P_choose-k that an arbitrary Type-k request will be selected for service from the kth buffer, k = 1, 2, without visiting the main buffer. Recall that, for Type-1 requests, this can happen if all N servers are busy, the main buffer is empty, the service in one of the servers is completed, and the first intermediate buffer is not empty. For Type-2 requests, this can happen if all N servers are busy, the main buffer is empty, the service in one of the servers is completed, the first intermediate buffer is empty, and the second intermediate buffer is not empty. Figure 14 correlates with Figure 13. When K and R are large, the mean number N_buf is minimal. Thus, the probability that the infinite buffer is empty at a service-completion moment is high, and the probability P_choose-1 is large. Analogously, when K and R are small (the main role is played by the capacity R of the intermediate buffer that stores the more intensive flow of Type-2 requests), the mean number N_buf is maximal. Thus, the probability that the infinite buffer is empty at a service-completion moment is small, and correspondingly, the probability P_choose-1 is small. The growth of P_choose-1 with the increase of K (which is sharper while K is still relatively small) stems from the increase of the probability that the first buffer is not empty at a service-completion moment. The reason for the growth of P_choose-2 with the increase of R is similar. The impact of the variation of K on the value of P_choose-2 is weak.
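The selection rule just recalled can be sketched as a small function (our own paraphrase of the discipline with simplified data structures, not code from the paper):

```python
# Which queue supplies the next request at a service-completion moment,
# assuming all N servers were busy (simplified sketch of the discipline).
def next_source(main_buffer_len, buf1_len, buf2_len):
    if main_buffer_len > 0:
        return "main"        # main-buffer requests are served in FIFO order
    if buf1_len > 0:
        return "buffer-1"    # Type-1 priority: first intermediate buffer checked first
    if buf2_len > 0:
        return "buffer-2"    # second intermediate buffer only if the first is empty
    return "none"            # the freed server stays idle

print(next_source(0, 3, 5))
```

The ordering of the branches is exactly what makes the events behind P_choose-1 and P_choose-2 disjoint: a Type-2 request can be picked from its buffer only when both the main buffer and the first intermediate buffer are empty.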
Figures 16-19 show the dependence on K and R of the following loss probabilities: the probability P_loss-buf-k of an arbitrary request loss from the kth intermediate buffer, k = 1, 2, the probability P_loss-buf of an arbitrary request loss from the main buffer, and the probability P_loss of an arbitrary request loss. Because the loss of a request in an intermediate buffer is due to impatience, it is clear that the loss probability P_loss-buf-k increases with the capacity of the kth intermediate buffer, k = 1, 2. Because the capacities of these buffers are relatively small and the impatience rate in the infinite main buffer is larger than the rates in the intermediate buffers, the probability P_loss-buf of the loss of an arbitrary request from the main buffer is also larger. As is seen from Figures 16-19, this probability is the dominant summand on the right-hand side of the relation P_loss = P_loss-buf + P_loss-buf-1 + P_loss-buf-2. The decrease of the probability P_loss-buf when the capacity R grows is explained by the increase of the probability P_loss-buf-2, leading to a decrease of the arrival rate at the main buffer, a decrease of the queue length in this buffer, and eventually, a decrease of the rate of requests' departure from the main buffer due to impatience.
The dependence on K and R of the probability P_push-k that an arbitrary Type-k request, upon arrival at the system, will find the kth buffer full, k = 1, 2, so that the first request from this buffer is pushed to the main buffer, is shown in Figures 20 and 21. As expected, the probabilities P_push-k are maximal when the capacity of the kth buffer is small and decrease substantially when this capacity increases. Furthermore, these probabilities are weakly sensitive to the capacity of the other buffer.
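The push-out mechanism behind P_push-k can be sketched as follows (a simplified model of one intermediate buffer; the names and data structures are ours, not the paper's):

```python
from collections import deque

def arrive(intermediate, main_buffer, capacity, request):
    """New request joins its intermediate buffer; if the buffer is full,
    the head request of that buffer is pushed out to the main buffer,
    so no request is lost at the arrival moment."""
    pushed = False
    if len(intermediate) == capacity:
        main_buffer.append(intermediate.popleft())
        pushed = True
    intermediate.append(request)
    return pushed

buf2, main = deque([1, 2, 3]), deque()
print(arrive(buf2, main, 3, 4), list(buf2), list(main))
```

This makes the behaviour observed above transparent: the smaller the capacity, the more often an arrival finds the buffer full, so P_push-k grows as the capacity shrinks.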

Conclusions
In this paper, a new flexible mechanism of providing preference to one type of request, which was offered in [39] for a single-server priority queueing system, was applied to a multi-server queueing system. The priority is granted via the introduction of intermediate buffers of finite capacity. Requests of different priorities are distinguished by the rate of transfer from these buffers to the main buffer and by the rates of departure from these buffers without service. The arrival process of requests can be correlated and have a large variance of inter-arrival times. Requests staying in the main buffer receive service in the order of their transition to this buffer. A suitable choice of the rates of transition from the intermediate buffers to the main buffer, as well as of the capacities of the intermediate buffers, allows optimising the operation of the system. The impact of the capacities of the intermediate buffers, the number of servers, and the impatience rate in the main buffer was illustrated via the presented results of the numerical experiments.
The results obtained in this paper can be used for the optimisation of various real-world systems with heterogeneous requests of different importance to the system. They can be extended to the cases of the batch arrival of requests, phase-type distributions of the service and patience times in the intermediate buffers, the possibility of server breakdowns or errors occurring during service, an arbitrary number of priority classes, etc.

Figure 1. Structure of the system.

Figure 2. Dependence of the mean number L of requests in the system on ϕ and N.

Figure 6. Dependence of the loss probability P_loss-buf-1 of an arbitrary request from the first buffer on ϕ and N.

Figure 10. Dependence of the profit function E(N) on the number of servers N and the impatience rate ϕ.

Figure 11. Dependence of the mean number of requests N_buf-1 in the first buffer on K and R.

Figure 14. Dependence of the probability P_choose-1 that an arbitrary Type-1 request will be selected for service from the first buffer without visiting the main buffer on K and R.

Figure 15. Dependence of the probability P_choose-2 that an arbitrary Type-2 request will be selected for service from the second buffer without visiting the main buffer on K and R.

Figure 16. Dependence of the loss probability P_loss-buf-1 of an arbitrary request from the first buffer on K and R.

Figure 20. Dependence of the probability P_push-1 that an arbitrary Type-1 request upon arrival will push the first request from the intermediate buffer to the main buffer on K and R.
... when the process ν_t makes a transition without the generation of a request.
(b) Entries of the matrix E^+_{K+1} ⊗ I_{R+1} ⊗ D_1, i ≥ N, when a Type-1 request arrives and joins the first intermediate buffer.
(c) Entries of the matrix I_{K+1} ⊗ E^+_{R+1} ⊗ D_2, i ≥ N, when a Type-2 request arrives and joins the second intermediate buffer.
(d) Entries of the matrix α_1 C_{K+1} E^-_{K+1} ⊗ I_{(R+1)W}, i ≥ N, when a Type-1 request departs from the intermediate buffer due to impatience.
(e) Entries of the matrix α_2 I_{K+1} ⊗ C_{R+1} E^-_{R+1} ⊗ I_W, i ≥ N, when a Type-2 request departs from the intermediate buffer due to impatience.
(f) Entries of the matrix (E^-_{K+1} ⊗ I_{R+1} + Î_{K+1} ⊗ E^-_{R+1}) ⊗ µN I_W, when the main buffer is empty at a service-completion moment while the intermediate buffers are not both empty.
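These generator blocks are assembled from Kronecker products. A small sketch building the Type-1 arrival block E^+_{K+1} ⊗ I_{R+1} ⊗ D_1 for toy sizes (the numeric entries of D_1 and the convention that E^+ has ones on the superdiagonal, entry (i, i+1) = 1, are illustrative assumptions):

```python
def kron(A, B):
    # Kronecker product of two dense matrices given as lists of lists
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def shift_up(n):
    # E^+: entry (i, i+1) = 1, moving the buffer occupancy from i to i + 1
    return [[1.0 if j == i + 1 else 0.0 for j in range(n)] for i in range(n)]

K, R = 2, 1                       # toy intermediate-buffer capacities
D1 = [[0.9, 0.1], [0.2, 0.8]]     # illustrative arrival matrix of the MAP (W = 2)

block = kron(kron(shift_up(K + 1), identity(R + 1)), D1)
print(len(block), len(block[0]))  # square block of order (K+1)(R+1)W
```

The same kron helper, applied with E^-, C, and Î in place of E^+ and I, produces each of the blocks (b)-(f) listed above.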

Table 1. Optimal values of the number of servers and the profit function for various values of ϕ.