Abstract
This paper deals with heterogeneous queues where servers differ not only in service rates but also in operating costs. The classical optimisation problem in queueing systems with heterogeneous servers consists in the optimal allocation of customers between the servers with the aim to minimise the long-run average costs of the system per unit of time. As it is known, under some assumptions the optimal allocation policy for this system is of threshold type, i.e., the policy depends on the queue length and the state of faster servers. The optimal thresholds can be calculated using a Markov decision process by implementing the policy-iteration algorithm. This algorithm may have certain limitations on obtaining a result for the entire range of system parameter values. However, the available data sets for evaluated optimal threshold levels and values of system parameters can be used to provide estimations for optimal thresholds through artificial neural networks. The obtained results are accompanied by a simple heuristic solution. Numerical examples illustrate the quality of estimations.
    1. Introduction
Many queueing systems are analysed for their dynamic and optimal control related to system access, resource allocation, changing service area characteristics and so on. Sets of computerised tools and procedures provide large data sets which can be useful to expand potential of classical optimisation methods. The paper deals with a known model of a multi-server queueing system with controllable allocation of customers between heterogeneous servers which are differentiated by their service and cost attributes. For the queueing system with two heterogeneous servers it has been shown in [] by using a dynamic programming approach that to minimise the mean sojourn time of customers in the system, the faster server must be always used and the customer has to be assigned to the slower server if and only if the number of customers in the queue exceeds the certain threshold level. Furthermore, this result was obtained independently in more simple form in [,]. In [], the author has analysed a multi-server version of such a system and confirmed a threshold nature of the optimal policy as well.
The problem of an optimal allocation of customers between heterogeneous servers in queueing systems with additional costs with the aim to minimise the long-run average cost per unit of time is notoriously more difficult. Some progress has been made after the appearance of a review paper []. In [,], the authors studied a model with set-up costs using a hysteretic control rule, thereby stressing the algorithmic aspects of the optimal control structure. The same system has been discussed in [], where a direct method that provides a closed-form expression for the stationary occupancy distribution was proposed. In [,], the authors have used theoretical study and exhaustive numerical analysis to show that for some specified servers, ordering the optimal allocation policy which minimises the long-run average cost belongs to a set of structural policies. In other words, for the servers’ enumeration (1), the allocation control policy denoted by f can be defined through a sequence of threshold levels . With respect to the defined policy, the server operating at highest rate should remain busy by non-empty queueing system. The kth server () is used only if the first  servers are busy and the queue length reaches a threshold level . In the general case, the optimal threshold levels can depend on states of slower server and formally the optimal policy f is not of a pure threshold type. However, since the kth threshold value may vary by at most one when the state of slower server changes and it has a weak effect on the average cost, such influence can be neglected. Hence the optimal allocation policy for multi-server heterogeneous queueing system can be treated as a threshold one.
Searching for the optimal values of  by direct minimising the average cost function can be expensive, especially when K is large. To calculate the optimal threshold levels we can use a policy-iteration algorithm [,,] which constructs a sequence of improved policies that converges to optimal one. This algorithm is a fairly versatile tool for solving various optimisation problems. Unfortunately, as is usually the case in practice, this algorithm is not without some limitations, such as the difficulties associated with convergence when the traffic is close to loaded, limitation on the process dimension and, consequently, on the number of states. Thus, we would like to compensate for some of the weaknesses of this algorithm with other methods for calculating the optimal control policy. The contribution of this paper can be briefly described in two conceptual parts. In the first part, we propose a heuristic solution (HS) to obtain functional relationships for optimal thresholds based on a simple discrete approximation of the system’s behaviour. The second part is devoted to the alternative machine learning technique such as artificial neural networks (NN) [,,] which is used again for the estimation of the optimal threshold levels. The policy-iteration algorithm is used in the paper to generate the data sets needed both to verify the quality of the proposed optimal threshold estimation methods and to train the neural networks. We strongly believe that the trained neural network can be successfully used to calculate the optimal thresholds for those system parameters for which alternative numerical methods are difficult or impossible to use, for example, in heavy traffic case, or, in general, to reconstruct the areas of optimality without usage of time-expensive algorithms and procedures. There are some number of papers on prediction of the stochastic behaviour of queueing systems and networks using machine learning algorithms, see e.g., [,] and references therein. However, we unsuccessfully tried to find published works where heuristics and machine learning methods would be used to solve a similar optimisation problem for heterogeneous queueing systems and therefore we consider this paper relevant.
This paper is organised as follows. In Section 2, we briefly discuss a mathematical model. Section 3 introduces some heuristic choices for threshold levels that turn out to be nearly optimal. Section 4 presents results when the trained neural network was ran on verification data of the policy-iteration algorithm.
2. Mathematical Model
We summarise briefly the model under study. The queueing system is of the type  with infinite-capacity buffer and K heterogeneous servers. This system is shown schematically in Figure 1. The Poisson arrival stream has a rate  and the exponential distribution of the service time at server j has a rate . We assume that the service in the system is without preemption, when customer in service cannot change the server. The random variables of the inter-arrival times and the service times of the servers are assumed to be independent. An additional cost structure is introduced, consisting of the operating cost  per unit of time of service on server j and the holding cost  of waiting in the queue. Assume that the servers are enumerated in a way   
      
      
        
      
      
      
      
    
      where  stands for the mean operating cost per customer for the jth server.
 
      
    
    Figure 1.
      Controllable multi-server queueing system with heterogeneous servers and operating costs.
  
The controller has full information about the system’s state and, based on this information, can make control actions on the system at the decision epochs when certain state transitions occur, following the prescription of the policy f. In our case, the controller selects the control action at the time when a new customer enters the system and at the service completion times, if the queue is not empty. When a new customer arrives, it joins the queue and at the same time, the controller sends another customer from the head of the queue to one of the idle servers or leaves it in the queue. At the service completion, the customer leaves the corresponding server, and at the same time the controller takes the next customer from the head of the queue, if it is not empty, and dispatches it to one of idle servers or can leave it in the queue as well. The service completion in the system without waiting customers does not require the controller to perform any control action.
The fact that the optimal policy for the problem of minimising the long-run average cost per unit of time belongs to a set of threshold-based policies for the multi-server heterogeneous queueing systems with costs were proved first in [] and further conformed for systems with heterogeneous groups of servers in []. The corresponding optimal thresholds can in the general case depend on the states of slower servers. However, according to obtained numerical results in [], we can neglect the weak influence of the slower servers’ states on the optimal allocation policy for the faster servers. This phenomena was discussed additionally in Example 2. Therefore, we may assume that the optimal policy belongs to the class of a pure threshold policy when the use of a certain server depends solely on the number of waiting customers in the queue. Specifically, for the system under study, such a policy is defined by the following sequence of threshold levels:
      
        
      
      
      
      
    
The policy prescribes the use of the k fastest servers whenever the number of customers waiting in the queue satisfies the condition .
To calculate optimal thresholds we need to formulate the introduced optimisation problem in terms of a Markov decision process. This process is based on a -dimensional continuous-time Markov chain
      
      
        
      
      
      
      
    
      with an infinitesimal matrix  which depends on the policy f. Here the component  stands for the number of waiting customers at time t and
      
      
        
      
      
      
      
    
The state space of the process  operating under some policy f is , where the notations  and  are used respectively for the queue length and for the state of jth server in state .
The possible server states are partitioned as follows:
      
        
      
      
      
      
    
The sets  and  denote the sets of idle and busy servers in state , respectively. The set of control actions a is . If , the controller allocates a customer to the queue. Otherwise, if , the controller instructs a customer to occupy the server with a number a. In addition, we can define the subsets  of admissible actions in state x The policy f specifies the choice of a control action at any decision epoch and the infinitesimal matrix  of the Markov-chain (3) has then the following elements,
      
      
        
      
      
      
      
    
      where  is defined as -dimensional unit vector with each element equal to zero except the jth position ().
We will search for the optimal control policy among the set of stationary Markov policies f that guarantee ergodicity of the Markov chain . The corresponding stability condition is obviously defined as . It follows from the fact, that if number of customers exceeds a threshold , then the queueing systems behaves like a  queue with an arrival rate  and total service rate . As it is known, see e.g., [], the ergodic Markov chain with costs implies the equality of the long-run average cost per unit of time for the policy f and the corresponding assemble average, that can be written in the form
      
      
        
      
      
      
      
    
      where  is an immediate cost in state . The cost function  is given by
      
      
        
      
      
      
      
    
This function describes the total average cost up to time t given the initial state is x and  is a stationary state distribution for the policy f. The policy  is said to be optimal when for  defined in (4) we evaluate
      
      
        
      
      
      
      
    
To evaluate optimal threshold levels and optimised value for the mean average cost per unit of time the policy-iteration Algorithm 1 is used. This algorithm constructs a sequence of improved policies until the average cost optimal is reached. It consists of three main steps: value evaluation, policy improvement and threshold evaluation. The Value evaluation is based on solving, for a given policy f, a system of linear equations
      
      
        
      
      
      
      
    
| Algorithm 1 Policy-iteration algorithm | 
| 
 | 
For the dynamic-programming value function , which indicates a transition effect of an initial state x to the total average cost and satisfies the following asymptotic relation,
      
      
        
      
      
      
      
    
In order to make the system (6) solvable, one of the values  must be set to zero, e.g., for  we set . Since in our case , the first equation of the system (6) is of the form . In the policy improvement step a new policy  is calculated by minimising the value function  for any state  and any admissible control action . The algorithm converges if the policies f and  on neighbouring iterations are equal. In the threshold evaluation we calculate the optimal thresholds , , based on optimal policy f. As an initial policy we select the policy which prescribes in any state the usage of a server j with the minimal value of the mean operating cost  per customer. More detailed information on deriving the dynamic programming equations for the heterogeneous queueing system and calculating the corresponding optimal allocation control policy can be found in []. For existence of an optimal stationary policy and convergence of the policy-iteration algorithm we refer to [,,,].
To realise the policy-iteration algorithm we convert the -dimensional state space  of the Markov decision process to a one-dimensional equivalent state space. Let  be a one-to-one mapping of the vector state  to a value from  which is of the form
      
      
        
      
      
      
      
    
A new state after transition involving the addition or removal of customer in some state , in a one-dimensional state space is calculated by
      
      
        
      
      
      
      
    
Further in the algorithm, an infinite buffer system must be approximated by an equivalent system where the number of waiting places is finite but at the same time is sufficiently large. As a truncation criterion, we use the loss probability which should not exceed some small value .
Remark 1. 
If the buffer size is W, the number of states is
      
        
      
      
      
      
    In case the number of waiting customers is getting larger as the level , all servers must be occupied and the system dynamics is the same as in a classical queue  with arrival rate λ and service rate . The stationary state probabilities for the states x where the component  satisfy the following difference equation
      
        
      
      
      
      
    which has a solution in a geometric form, , . For details and theoretical substantiation see, e.g., []. Note that the value of  included in this formula can be estimated by a heuristic solution (9). Then the truncation parameter W of the buffer size can be evaluated from the following constraint for the loss probability
      
        
      
      
      
      
    where . After simple algebra, it implies
      
        
      
      
      
      
    
Example 1. 
Consider the system  with  and . All other parameters take the following values
| j | 0 | 1 | 2 | 3 | 4 | 5 | 
| 1 | 5 | 4 | 3 | 2 | 1 | |
| - | 20 | 8 | 4 | 3 | 1 | |
| - | 0.25 | 0.50 | 0.75 | 0.67 | 1.00 | 
The truncation parameter W of the buffer size is chosen at value 80 which for  guarantees that . Here  was calculated by (9). In a control table, we summarise the functions  which specify the control actions at time of arrivals to a certain state x:
| System State | Queue Length | |||||||||||||
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | ... | |
| (0,*,*,*,*) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 
| (1,0,*,*,*) | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 
| (1,1,0,*,*) | 0 | 0 | 0 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 
| (1,1,1,0,*) | 0 | 0 | 0 | 0 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 
| (1,1,1,1,0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 5 | 5 | 
| (1,1,1,1,1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 
Threshold levels , , can be evaluated by comparing the optimal actions  for . In this example the optimal policy  is defined here through a sequence of threshold levels  and . The bold and underline format in a control table is used to label the change of the control action in a certain system state.
In the next example we give some arguments that allow us to work further only with the threshold-based control policies.
Example 2. 
  
    
         
       
    
  
  
  
    
         
       
    
  
  
  
    
         
       
    
  
  
Consider the system  with  servers. The aim of this example consists in the following: With respect to the system states  and  the assignment to the second server can in general depend not only on the number of customers in the queue but also on the state of the third server. In this example it is optimal to make an assignment in state x but not in state y. We solve optimisation problem for the following parameters:
- , , and ,
- , , and .
The of optimal solution for the first and second group of system parameters are represented in Table 1 and Table 2, respectively.
 
       
    
    Table 1.
    Control table.
  
 
       
    
    Table 2.
    Control table.
  
We notice that for most parameter values the optimal decision can be made independently of the states of the slower servers. However, it is interesting to consider the reasons for such possible dependence. It is evident that in our optimisation problem, the optimal policy assigns a customer to the fastest free server in states for which this would not be optimal if there were no arrivals. This is because the system should be ready for possible arrivals, which, if they occur, will wish to see a less congested system.
Consider now the system with three servers in the states  and , where . Let us consider the case of potential service completion at the second server, taking into account a large number q of accompanied arrivals. Because of large q, it is optimal to occupy all accessible idle servers. The states mentioned above become  and . Thus, the difference  of value functions measures the advantage that will be obtained in the case of the assignment to the second processor . The events of service completion on the second server provide the incentive to make an assignment to the second server. However, if the two initial states are  and , the measure of advantage if service completion takes place is . Since we expect that the value function  is convex in q, it is plausible that the incentive to make an assignment to the second server is greater in state  than in . Numerical examples proposed in Table 3 confirm our expectations.
 
       
    
    Table 3.
    Value function for system states.
  
The further numerical examples show that the threshold levels have a very weak dependence of slower servers’ states. According to our observations, the optimal threshold may vary by at most 1 when the state of a slower server changes.
The data needed either to verify the heuristic solution or for training and verification of the neural network was generated by a policy-iteration algorithm in form of the list
      
      
        
      
      
      
      
    
Example 3. 
Some elements of the list S for the  queueing system are
      
        
      
      
      
      
    
3. Heuristic Solution
In this section, we want to obtain a heuristic solution (HS) to calculate the optimal thresholds ,  for the arbitrary K in explicit form. For this purpose, we will use a simple deterministic approximation for the dynamic behaviour of the number of customers in the queue as illustrated in Figure 2.
 
      
    
    Figure 2.
      Queue length approximation.
  
Let  is an optimal threshold used to dispatch the customer to server k in state , where the first  servers are busy. Now we compare the queues of the system given initial state is , where the kth server is not used for a new customer, and , where the kth server is occupied by a waiting customer. It is assumed that the stability condition holds. The initial queue lengths are labelled in Figure 2 by  and . The proposed deterministic approximation is based on an assumption that the queue length of the system with the first  busy servers decreases with the rate . When this rate is keeping until the queue is empty, it occurs at time points  and  respectively for the given initial queue length A and B. The total (accumulated) holding times of all customers in the queue with lengths  and  are equal respectively to the number of square blocks of dimension  within the areas  and  multiplied by the mean service time of the approximated model:   
      
        
      
      
      
      
    
The mean operating cost of the first  servers during the time period until the queue becomes empty given the initial state is  can be calculated by
      
      
        
      
      
      
      
    
The expression  means the probability of the service completion at the ith server, and the mean operating cost given the initial state is , which can be defined as .
Now using the deterministic approximation we can formulate the following proposition.
Proposition 1. 
The optimal thresholds , , are defined by
      
        
      
      
      
      
    
Proof.  
Let  be the overall average system cost until the system becomes empty given the initial state is . This value can be represented as a sum of the total holding cost of customers waiting in the queue and mean operating cost of all servers which remain busy in state x. Assume that the controller performs a decision to allocate the customer to the kth server in state . As a result, it leads to a reduction of the overall system costs according to the proposed deterministic approximation, i.e.,
        
      
        
      
      
      
      
    
        where
        
      
        
      
      
      
      
    
      
        
      
      
      
      
    
Now, expressing  after some simple manipulations we obtain the heuristic solution for the optimal value of  in form (9).    □
Example 4. 
  
    
       
      
    
  
  
  
    
         
       
    
  
  
Consider a queueing system from the previous example for . We select randomly from the data set S (8) a list of system parameters  and calculate by means of the HS (9) threshold levels , . Figure 3 illustrates the efficiency of the proposed heuristic solution respectively for threshold levels  by confusion matrices. The matrix row represents the elements including a predicted value while each column represents the elements for an actual value. As a metric for the closeness of the measurements to a specific value and to the interval with possible deviation of threshold by  from the real value, the overall accuracy and accuracy  are used. The results are summarised in Table 4.
 
      
    
    Figure 3.
      Confusion matrices (a–d) for prediction of  and  using HS.
  
 
       
    
    Table 4.
    Accuracy for prediction with HS.
  
4. Artificial Neural Networks
Artificial Neural Networks (NN) belong to a set of supervised machine learning methods. It is most popular in different applied problems including data classification, pattern recognition, regression, clustering and time series forecasting. Here we show that the NN can give even more positive results compared to the HS that indicates the possibility to use it for predicting the structural control policies.
The data set S (8) is used to explore predictions for the optimal threshold levels through the NN. The multilayer neural network is used for the data classification. It can be formally defined as a function , which maps an input vector  of dimension  to an estimate output  of the class number . The network is decomposed into 6 layers as illustrated in Figure 4, each of which represents a different function mapping vectors to vectors. The successive layers are: a linear layer with an output vector of size k, a nonlinear elementwise activation layer, other three linear layers with output vectors of size k and a nonlinear normalisation layer.
 
      
    
    Figure 4.
      Architecture of the neural network.
  
The first layer is an affine transformation
      
      
        
      
      
      
      
    
      where  is the output vector,  is the weight matrix,  is the bias vector. The rows in  are interpreted as features that are relevant for differentiating between corresponding classes. Consequently,  is a projection of the input  onto these features. The second layer is an elementwise activation layer which is defined by the nonlinear function  setting negative entries of  to zero and uses only positive entries. The next three layers are other affine transformations,
      
      
        
      
      
      
      
    
      where , , and , . The last layer is the normalisation layer , whose componentwise is of the form
      
      
        
      
      
      
      
    
The last layer normalises the output vector  with the aim to get the values between 0 and 1. The output  can be treated as a probability distribution vector, where the Nth element  represents the likelihood that  belongs to class N.
We use 70% of the same data S which was not used to verify the quality of the HS in a training phase of the NN and the rest of S—as validation data. We train a multilayer (6-layer) NN using an adaptive moment estimation method [] and the neural network toolbox in Mathematica© of the Wolfram Research. Then we verify the approximated function
      
      
        
      
      
      
      
    
      which should be accurate enough to be used to predict new output from verification data. The algorithm was ran many times on samples and networks with different sizes. In all cases the results were quite positive and indicate the potential of machine learning methodology for optimisation problems in the queueing theory.
Example 5. 
  
    
       
      
    
  
  
  
    
         
       
    
  
  
The results of estimations of the optimal threshold values using the trained NN are summarised again in form of confusion matrices, as is shown in Figure 5. The overall accuracy of classification and accuracies for the values with deviations are given in Table 5. We can see that the NN methodology exhibits even more accurate estimations for the optimal thresholds if the results are compared with the corresponding HS.
 
      
    
    Figure 5.
      Confusion matrices (a–d) for prediction of  and  using NN.
  
 
       
    
    Table 5.
    Accuracy for prediction with NN.
  
5. Conclusions
We combine classic methodology of analysing controllable queues with a heuristic solution and machine learning to study the possibility to estimate the values of optimal thresholds. Due to the fact that the results were quite positive, we can make the following general conclusion. With this study we confirm that the analysis of controlled queueing systems and the solution of optimisation problems using classical Markov decision theory can be successfully combined with machine learning techniques. These approaches do not contradict each other; on the contrary, combining them provides new results.
Author Contributions
Conceptualization, D.E.; formal analysis, investigation, methodology, software and writing, D.E. and N.S. Both authors have read and agreed to the published version of the manuscript.
Funding
Open Access Funding by the University of Linz. This research has been supported by the RUDN University Strategic Academic Leadership Program (recipient D. Efrosinin).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
This paper has been supported by the RUDN University Strategic Academic Leadership Program (recipient D. Efrosinin), Open Access Funding by the University of Linz.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Lin, W.; Kumar, P.R. Optimal control of a queueing system with two heterogeneous servers. IEEE Trans. Autom. Control 1984, 29, 696–703. [Google Scholar] [CrossRef]
- Koole, G. A simple proof of the optimality of a threshold policy in a two-server queueing system. Syst. Control. Lett. 1995, 26, 301–303. [Google Scholar] [CrossRef]
- Walrand, J. A note on: “Optimal control of a queuing system with two heterogeneous servers”. Syst. Control Lett. 1984, 4, 131–134. [Google Scholar] [CrossRef]
- Rykov, V. Monotone Control of Queueing Systems with Heterogeneous Servers. QUESTA 2001, 37, 391–403. [Google Scholar]
- Crabill, T.; Gross, D.; Magazine, M.J. A classified bibliography of research on optimal design and control of queues. Oper. Res. 1977, 25, 219–232. [Google Scholar] [CrossRef]
- Nobel, N. Hysteretic and Heuristic Control of Queueing Systems. Ph.D. Thesis, Vrije University Amsterdam, Amsterdam, The Netherlands, November 1998. [Google Scholar]
- Nobel, R.; Tijms, H.C. Optimal control of a queueing system with heterogeneous servers and set-up costs. IEEE Trans. Autom. Control 2000, 45, 780–784. [Google Scholar] [CrossRef]
- Le Ny, L.-M.; Tuffin, B. A Simple Analysis of Heterogeneous Multi-Server Threshold Queues with Hysteresis; Institut National de Recherche en Informatique: Nancy, France, 2000. [Google Scholar]
- Efrosinin, D. Controlled Queueing Systems with Heterogeneous Servers: Dynamic Optimization and Monotonicity Properties of Optimal Control Policies in Multiserver Heterogeneous Queues; VDM Verlag: Saarbrücken, Germany, 2008. [Google Scholar]
- Rykov, V.; Efrosinin, D. On the slow server problem. Autom. Remote. Control 2010, 70, 2013–2023. [Google Scholar] [CrossRef]
- Howard, R. Dynamic Programming and Markov Processes; Wiley Series; Wiley: London, UK, 1960. [Google Scholar]
- Puterman, M.L. Markov Decision Process; Wiley Series in Probability and Mathematical Statistics; Wiley: London, UK, 1994. [Google Scholar]
- Tijms, H.C. Stochastic Models. An Algorithmic Approach; John Wiley and Sons: New York, NY, USA, 1994. [Google Scholar]
- Gershenson, C. Artificial Neural Networks for Beginners; 2003; Available online: http://arxiv.org/abs/cs/0308031 (accessed on 20 August 2003).
- Rätsch, G. A Brief Introduction into Machine Learning; Friedrich Miescher Laboratory of the Max Planck Society: Tuebinger, Germany, 2004. [Google Scholar]
- Russel, S.J.; Norvig, P. Artificial Intelligence. A Modern Approach; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1995. [Google Scholar]
- Kyritsis, A.I.; Deriaz, M. A machine learning approach to waiting time prediction in queueing scenarios. In Proceedings of the Second International Conference on Artificial Intelligence for Industries, Laguna Hills, CA, USA, 25–27 September 2019; pp. 17–21. [Google Scholar]
- Stintzing, J.; Norrman, F. Prediction of Queuing Behaviour through the Use of Artificial Neural Networks. Available online: http://www.diva-portal.se/smash/get/diva2:1111289/FULLTEXT01.pdf (accessed on 18 June 2017).
- Xia, L.; Zhang, Z.G.; Li, Q.-L.; Glynn, P.W. A c/μ-Rule for Service Resource Allocation in Group-Server Queues. arXiv 2018, arXiv:1807.05367. [Google Scholar]
- Aviv, Y.; Federgruen, A. The value-iteration method for countable state Markov decision processes. Oper. Res. Lett. 1999, 24, 223–234. [Google Scholar] [CrossRef]
- Özkan, E.; Kharoufeh, J.P. Optimal control of a two-server queueing system with failures. Probab. Eng. Inform. Sci. 2014, 28, 489–527. [Google Scholar] [CrossRef]
- Sennott, L.I. Stochastic Dynamic Programming and the Control of Queueing Systems; Wiley: New York, NY, USA, 1999. [Google Scholar]
- Efrosinin, D.; Sztrik, J. An algorithmic approach to analyzing the reliability of a controllable unreliable queue with two heterogeneous servers. Eur. J. Oper. Res. 2018, 271, 934–952. [Google Scholar] [CrossRef]
- Kingma, D.P.; Adam, J.B. A Method for Stochastic Optimization; 2015; Available online: https://arxiv.org/abs/1412.6980 (accessed on 30 January 2017).
| Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. | 
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
