Asymptotic Analysis for Systems with Deferred Abandonment

This short paper concerns the analysis of the M/M/k queueing system with customer abandonment. In this system, service managers provide a finite buffer space, which is a waiting area that prevents customers from abandoning the system. Abandonment can occur through reneging (leaving the queue while waiting) and/or balking (leaving the system without waiting). We derive analytical expressions for the impact of the buffer-space capacity on the delay probability and the abandonment probability of a system with deferred abandonment. The results indicate that, in a large system, providing a buffer space can only increase the delay probability while the abandonment probability remains unchanged. Despite the benevolent intentions of service managers, providing a buffer space may therefore degrade the performance of larger systems.


Introduction
Facility managers who operate large service systems such as call centers often face two conflicting goals when their systems are congested. The first goal is to reduce the probability that customers need to wait, which is represented by a delay probability $P_Q$. The second goal is to increase the number of customers they can serve (i.e., to increase the throughput of the system). Managers can increase throughput by reducing the number of customers who leave the system by reneging (leaving the queue while waiting) and/or balking (walking away without entering the queue), in other words, by reducing the customer abandonment probability $P_{ab}$. When both goals cannot be satisfied at the same time in a congested system, facility managers typically prioritize one over the other. In this paper, we provide a model that helps managers control the balance between these two performance indicators, $P_Q$ and $P_{ab}$, when a simultaneous improvement is not possible.
Large service systems tend to exhibit high customer abandonment via reneging and/or balking. Such systems have attracted many prominent researchers and have been studied extensively [1–4]. In recent years, many variations of systems with customer abandonment have also been studied: service slowdowns have been incorporated into the system [5], server availability varies over time [6], and customers' patience depends on their individual service requirements [7]. Among the many models, the Erlang A model, an M/M/k+M queueing model with exponential reneging, is frequently used. The most important findings for the Erlang A model are the three asymptotic regimes that describe the congestion properties of the system: the Quality-and-Efficiency-Driven (QED), Quality-Driven (QD), and Efficiency-Driven (ED) regimes.
The square-root staffing rule in the QED regime plays an important role in the analysis of the Erlang A model [8,9]. Under this rule, the number of servers is set to the offered load plus a multiple of its square root ($s = R + c\sqrt{R}$ in the notation introduced below), which allows facility managers to operate large systems stably while balancing service quality against server utilization. However, several features commonly observed in practice are not incorporated in the standard Erlang A model. First, the Erlang A model considers only reneging as a form of customer abandonment. In reality, customers may not only renege but also balk when systems are heavily congested: when customers expect a long wait, many of them choose not to join the queue at all. Second, facility managers often provide a buffer space between the queue and the service area to prevent abandonment. A buffer space is an area in which customers can wait before proceeding to the service area. For example, in a restaurant lobby, those at the front of the queue are in a buffer space where no reneging or balking occurs; once the queue extends outside the facility, however, abandonment becomes more likely. Another important example is the hospital emergency department, where many patients who need urgent medical attention arrive at random. After being triaged and registered, patients often face a long wait in the buffer space. Patients who have been triaged and registered do not renege or balk, but those who have not yet been triaged and are waiting outside may renege at random times, or may balk and promptly leave for another hospital if the queue is long. In today's competitive market, preventing balking and reneging is crucial for hospitals and urgent care centers from both a fiscal and a medical perspective.
We propose a deferred abandonment model that represents a service facility with a buffer space, such as emergency departments and restaurants. Our deferred abandonment model is similar to the two-stage reneging queue discussed in [10], which represents a queue with two different reneging stages. Our model differs from the two-stage queue in three respects: (1) our model assumes a buffer space in which customers/patients cannot abandon the queue, whereas the two-stage queue has no such buffer space; (2) our model allows either reneging or balking as a form of abandonment, whereas the two-stage queue considers only reneging; and (3) our model allows the arrival and/or service rates to change when a queue with abandonment forms. We study a system with deferred abandonment, derive asymptotic formulae for its performance indicators, and analyze the impact of the buffer space on these measures. We show that, despite the benevolent intentions of facility managers to improve the performance of their systems, providing a buffer space can increase the delay probability without improving the throughput when systems are large.

Deferred Abandonment Model
The deferred abandonment model represents a system in which the first $n$ ($\ge 0$) customers in the queue neither renege nor cause state-dependent balking; abandonment is thus deferred by $n$ (see Figure 1). To analyze the properties of the deferred abandonment model, we split the system into three sub-chains: sub-chain 1 (an M/M/s/s queue on states $0$ to $s$), sub-chain 2 (a reneging/balking queue on states $t$ to $\infty$), and sub-chain 3, the buffer chain representing the buffer space (an M/M/1/n queue on states $s$ to $t$). States $s$ and $t$ are shared by neighboring sub-chains. We denote the stationary probabilities of state $k$ in the entire Markov chain and in the truncated sub-chain $i$ by $\pi_k$ and $\pi^i_k$, respectively. We consider two systems: a reneging system and a balking system. For the reneging system, we assume exponential reneging with rate $\gamma > 0$. For the balking system, we assume that the arrival rate drops by a linear balking rate $\delta > 0$ for each additional customer in the queue. For either system, we allow the arrival and service rates to change when there is a queue that incurs abandonment, and denote these rates by $\lambda_Q = (1-\varepsilon)\lambda$ and $\mu_Q = (1+\tau)\mu$, respectively. (Thus, constant balking at rate $\lambda - \lambda_Q = \varepsilon\lambda$ is incorporated in both the (exponential) reneging and the (linear) balking systems.) Reneging/balking starts at state $t = s + n$, where $s$ is the number of staff. The birth and death coefficients of the Markov chain representing the deferred abandonment model are as follows:
1. For the reneging system, the total arrival rate $\lambda_k$ and the total service rate $\mu_k$ at state $k$ are
$$\lambda_k = \begin{cases} \lambda, & 0 \le k < t,\\ \lambda_Q, & k \ge t, \end{cases} \qquad \mu_k = \begin{cases} \min(k, s)\,\mu, & 1 \le k \le t,\\ s\mu_Q + (k-t)\gamma, & k > t. \end{cases}$$
2. For the balking system, the total arrival rate and the total service rate at state $k$ are
$$\lambda_k = \begin{cases} \lambda, & 0 \le k < t,\\ \left(\lambda_Q - (k-t)\delta\right)^+, & k \ge t, \end{cases} \qquad \mu_k = \begin{cases} \min(k, s)\,\mu, & 1 \le k \le t,\\ s\mu_Q, & k > t. \end{cases}$$
Note: $(\cdot)^+$ denotes the positive part of a function. Our deferred abandonment model can represent either the (exponential) reneging system ($\gamma > 0$) or the (linear) balking system ($\delta > 0$), both of which incorporate buffer spaces ($n \ge 0$), constant balking ($\varepsilon \ge 0$), and a change of server speed (any $\tau$). Our model reduces to the original Erlang A/B/C models for appropriate parameter choices. For example, if we set $\gamma > 0$ ($\delta = 0$), $n = 0$ (thus $s = t$), and $\varepsilon = \tau = 0$, our model becomes the standard Erlang A model (M/M/s+M queue). If $n \to \infty$ with $s > \lambda/\mu$, our model approaches the Erlang C model (M/M/s queue). Finally, if $n = 0$ and $\varepsilon = 1$ (thus $\lambda_Q = 0$), our model becomes the Erlang B model (M/M/s/s queue).
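To make the birth and death coefficients concrete, the following Python sketch encodes them and computes the stationary distribution of a finite truncation of the chain. It is a minimal sketch under our own assumptions: arrivals occur at rate $\lambda$ below state $t$ and at rate $\lambda_Q$ (reneging) or $(\lambda_Q - (k-t)\delta)^+$ (balking) from state $t$ on; service occurs at rate $\min(k, s)\mu$ up to state $t$ and at rate $s\mu_Q$ plus $(k-t)\gamma$ beyond it; the chain is truncated at a hypothetical cap `K`; and the helper names `make_rates`, `stationary`, and `erlang_b` are ours. Setting $n = 0$ and $\varepsilon = 1$ should reproduce the Erlang B blocking probability, which the last lines check.

```python
import math

def make_rates(lam, mu, s, n, eps=0.0, tau=0.0, gamma=0.0, delta=0.0):
    """Birth/death rates of the deferred abandonment chain (assumed form).

    gamma > 0, delta = 0 gives the reneging system;
    delta > 0, gamma = 0 gives the linear balking system.
    """
    t = s + n                                   # abandonment starts at state t
    lam_q, mu_q = (1.0 - eps) * lam, (1.0 + tau) * mu

    def birth(k):
        if k < t:
            return lam                          # a server or buffer seat is free
        return max(lam_q - (k - t) * delta, 0.0)

    def death(k):
        if k <= t:
            return min(k, s) * mu               # no abandonment up to state t
        return s * mu_q + (k - t) * gamma       # queue beyond the buffer

    return birth, death


def stationary(birth, death, K):
    """Stationary distribution on states 0..K via detailed balance in log space."""
    logw = [0.0]
    for k in range(K):
        b = birth(k)
        if b <= 0.0:                            # chain cannot move above state k
            break
        logw.append(logw[-1] + math.log(b) - math.log(death(k + 1)))
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    w += [0.0] * (K + 1 - len(w))               # unreachable states get probability 0
    z = sum(w)
    return [x / z for x in w]


def erlang_b(R, s):
    """Erlang B blocking probability via the standard recursion."""
    b = 1.0
    for k in range(1, s + 1):
        b = R * b / (k + R * b)
    return b


# Erlang B check: n = 0 and eps = 1 (so lambda_Q = 0) truncate the chain at s.
birth, death = make_rates(lam=8.0, mu=1.0, s=10, n=0, eps=1.0)
pi = stationary(birth, death, K=50)
print(pi[10], erlang_b(8.0, 10))
```

Working in log space avoids overflow of products such as $\lambda^k/k!$ when the load is large, which matters for the asymptotic regimes discussed later.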
Before concluding this section, we define several parameters to simplify the presentation. We define the resource requirement of sub-chain 1 as $R := \lambda/\mu$ and that of sub-chain 2 as $R_Q := \lambda_Q/\mu_Q$. In this paper, we assume $R_Q \le R$, because facility managers try to reduce the level of congestion when the system is congested. We define the linear and square-root staffing coefficients as $a := (s-R)/R$ ($\ge -1$) and $c := (s-R)/\sqrt{R}$, respectively. Server utilization is defined as $\rho := \lambda/(s\mu)$. We also define $a_Q := \ldots$, the corresponding linear coefficient for sub-chain 2 (so that $a_Q < 0$ if and only if $s < R_Q$). Other symbols are defined as needed throughout the paper.
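Two identities implied by these definitions are used repeatedly later (for instance, the condition $R_Q = R$ is later expressed as $\varepsilon + \tau = 0$, and Remark 2 relies on $\rho = 1/(1+a)$). The short derivation below spells them out, assuming $1 + \tau > 0$ so that $\mu_Q > 0$, and $s > 0$:

$$R_Q = \frac{\lambda_Q}{\mu_Q} = \frac{(1-\varepsilon)\lambda}{(1+\tau)\mu} = \frac{1-\varepsilon}{1+\tau}\,R, \qquad R_Q \le R \iff 1-\varepsilon \le 1+\tau \iff \varepsilon + \tau \ge 0,$$

$$\rho = \frac{\lambda}{s\mu} = \frac{R}{s} = \frac{1}{1+a}, \qquad a = \frac{s-R}{R} = \frac{c}{\sqrt{R}}.$$

In particular, $R_Q = R$ exactly when $\varepsilon + \tau = 0$, and $\rho < 1$ exactly when $a > 0$.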

Analysis of the Deferred Abandonment Model
We define the following performance indicators: $P_Q$ is the probability that an arriving customer enters a queue with abandonment (reneging or balking), i.e., $P_Q := \sum_{k=t}^{\infty} \pi_k$; $P_{ab}$ is the probability that a customer abandons the queue via constant balking or exponential reneging (for the reneging system), or via constant balking or linear balking (for the balking system); and $P_W$ is the probability that an arriving customer sits in one of the $n$ seats in the buffer space, i.e., $P_W := \sum_{k=s}^{t-1} \pi_k$. (Note: $P_W = 0$ when $n = 0$.) We define the delay probability for this system as $P_{Q+} := \sum_{k=s}^{\infty} \pi_k = P_W + P_Q$. We represent these performance indicators in terms of the blocking probabilities of the three truncated sub-chains: $\pi^1_s$, $\pi^2_t$, $\pi^3_s$, and $\pi^3_t$. For this purpose, we use Kelly's property (Corollary 1.10 in [11]), which holds for reversible Markov chains. (Note: The extension of the property to more general Markov chains is discussed in [12].) Since our deferred abandonment model is a birth-death chain, and every stationary birth-death chain is reversible, the entire Markov chain and its truncated sub-chains satisfy Kelly's property:
Lemma 1 (Kelly's property). Suppose that a Markov chain is reversible. Let $P_i$ be the probability that we observe a state in sub-chain $i$. Then $P_i = \pi_k/\pi^i_k$ holds for any sub-chain $i$ and any state $k$ in sub-chain $i$.
The proof of Lemma 1 is omitted (see [11] or [12] if interested). Lemma 1 essentially states that a truncated sub-chain has the same stationary distribution as the entire Markov chain conditioned on being in that sub-chain. Lemma 1 allows us to work with the individual truncated sub-chains rather than with the entire Markov chain, which substantially simplifies the analysis of Markov chain models. Using Lemma 1 repeatedly, we derive the exact relationships between the performance indicators and the blocking probabilities, which are summarized in Lemma 2.
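Lemma 1 can be checked numerically for any birth-death chain, since such chains are reversible. The Python sketch below is a minimal illustration with hypothetical parameters (an M/M/3+M chain truncated at a cap `K`, and an arbitrary sub-chain on states 2 to 7): it computes the stationary distribution of the whole chain and of the truncated sub-chain, and confirms that the ratio $\pi_k/\pi^i_k$ is the same for every state $k$ in the sub-chain and equals the sub-chain's total probability $P_i$.

```python
import math

def stationary(birth, death, K):
    """Stationary distribution of a birth-death chain on states 0..K (log space)."""
    logw = [0.0]
    for k in range(K):
        logw.append(logw[-1] + math.log(birth(k)) - math.log(death(k + 1)))
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    z = sum(w)
    return [x / z for x in w]

def truncated(birth, death, lo, hi):
    """Stationary distribution of the sub-chain restricted to states lo..hi."""
    return stationary(lambda j: birth(lo + j), lambda j: death(lo + j), hi - lo)

# Hypothetical M/M/3+M example: lambda = 4, mu = 1, s = 3, reneging rate 0.5.
lam, mu, s, gamma, K = 4.0, 1.0, 3, 0.5, 60

def birth(k):
    return lam

def death(k):
    return min(k, s) * mu + max(k - s, 0) * gamma

pi = stationary(birth, death, K)
lo, hi = 2, 7
pi_sub = truncated(birth, death, lo, hi)
ratios = [pi[lo + j] / pi_sub[j] for j in range(hi - lo + 1)]
P_i = sum(pi[lo:hi + 1])                        # probability of the sub-chain
print(ratios, P_i)
```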

Lemma 2.
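The following structural representation holds for the deferred abandonment model:
$$\frac{1}{\pi_s} = \frac{1}{\pi^1_s} + \frac{1}{\pi^3_s} - 1 + \frac{\pi^3_t}{\pi^3_s}\left(\frac{1}{\pi^2_t} - 1\right), \qquad P_W = \pi_s\,\frac{1-\pi^3_t}{\pi^3_s}, \qquad P_Q = \pi_s\,\frac{\pi^3_t}{\pi^3_s\,\pi^2_t}, \qquad P_{ab} = p\,P_Q.$$
Proof of Lemma 2. We denote the sub-chain that combines sub-chain 3 and sub-chain 2 as sub-chain 2+. By viewing chain 2+ as the entire chain and chain 3 as a sub-chain of chain 2+, we can apply Lemma 1 to states $s$ and $t$, which belong to sub-chain 2+, and obtain $\pi^{2+}_s/\pi^3_s = \pi^{2+}_t/\pi^3_t$, from which $\pi^{2+}_s = (\pi^3_s/\pi^3_t)\,\pi^{2+}_t$. Additionally, since sub-chains 3 and 2 share only state $t$, Lemma 1 gives
$$\frac{1}{\pi^{2+}_t} = \frac{1}{\pi^3_t} + \frac{1}{\pi^2_t} - 1,$$
and likewise, since sub-chains 1 and 2+ share only state $s$,
$$\frac{1}{\pi_s} = \frac{1}{\pi^1_s} + \frac{1}{\pi^{2+}_s} - 1.$$
Combining these results, we derive the expression for $1/\pi_s$ stated above. We can also derive $P_W$ and $P_Q$ using Lemma 1: $P_W = \pi_s/\pi^3_s - \pi_t$ and $P_Q = \pi_t/\pi^2_t$, where $\pi_t = \pi_s\,\pi^3_t/\pi^3_s$. Finally, notice that abandonment occurs only in the reneging/balking sub-chain (i.e., sub-chain 2), and the probability of abandonment given sub-chain 2 is $p = \ldots$. Thus, using Lemma 1 again, we obtain $P_{ab} = p\,P_Q$.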
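The relations obtained in the proof can be verified numerically. The Python sketch below is our own illustration, not the paper's code: it assumes the birth-death rates of the reneging system (arrival rate $\lambda$ below state $t = s + n$ and $\lambda_Q$ from $t$ on; service rate $\min(k, s)\mu$ up to $t$ and $s\mu_Q + (k-t)\gamma$ beyond $t$), truncates the chain at a hypothetical cap `K`, computes the blocking probabilities $\pi^1_s$, $\pi^3_s$, $\pi^3_t$, $\pi^2_t$ of the three sub-chains, and compares $1/\pi^1_s + 1/\pi^3_s - 1 + (\pi^3_t/\pi^3_s)(1/\pi^2_t - 1)$, $\pi_s(1-\pi^3_t)/\pi^3_s$, and $\pi_s\pi^3_t/(\pi^3_s\pi^2_t)$ with $1/\pi_s$, $P_W$, and $P_Q$ computed directly from the full chain.

```python
import math

def stationary(birth, death, lo, hi):
    """Stationary distribution of the birth-death chain restricted to states lo..hi."""
    logw = [0.0]
    for k in range(lo, hi):
        logw.append(logw[-1] + math.log(birth(k)) - math.log(death(k + 1)))
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    z = sum(w)
    return [x / z for x in w]

# Hypothetical reneging system, truncated at state K.
lam, mu, s, n, eps, tau, gamma, K = 6.0, 1.0, 5, 3, 0.1, 0.2, 0.4, 200
t = s + n
lam_q, mu_q = (1 - eps) * lam, (1 + tau) * mu

def birth(k):
    return lam if k < t else lam_q

def death(k):
    return min(k, s) * mu if k <= t else s * mu_q + (k - t) * gamma

pi = stationary(birth, death, 0, K)             # entire (truncated) chain
b1 = stationary(birth, death, 0, s)[-1]         # pi^1_s
sub3 = stationary(birth, death, s, t)
b3s, b3t = sub3[0], sub3[-1]                    # pi^3_s, pi^3_t
b2 = stationary(birth, death, t, K)[0]          # pi^2_t

# Relations derived from Lemma 1 in the proof of Lemma 2
inv_pi_s = 1 / b1 + 1 / b3s - 1 + (b3t / b3s) * (1 / b2 - 1)
P_W = pi[s] * (1 - b3t) / b3s
P_Q = pi[s] * b3t / (b3s * b2)

print(inv_pi_s, 1 / pi[s])
print(P_W, sum(pi[s:t]))
print(P_Q, sum(pi[t:]))
```

Because the relations follow from Lemma 1 alone, they hold exactly for the truncated chain as well, so the two columns should agree up to floating-point error.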

Remark 1.
By plugging the exact, approximate, or asymptotic values of the blocking probabilities $\pi^1_s$, $\pi^2_t$, $\pi^3_s$, and $\pi^3_t$ into Lemma 2, we obtain the exact, approximate, or asymptotic values of the performance indicators for the deferred abandonment model, respectively.
For the rest of this section, we show that the two important indicators, the delay probability $P_{Q+}$ and the abandonment probability $P_{ab}$, exhibit a trade-off when the number of buffer seats $n$ changes. For this purpose, we write the performance indicators of the deferred abandonment model as explicit functions of $n$: $\pi_s(n)$, $P_W(n)$, $P_Q(n)$, $P_{Q+}(n)$, and $P_{ab}(n)$. When $n = 0$, sub-chain 3 reduces to the single state $s$; thus, $s = t$, $\pi^3_s = \pi^3_t = 1$, $P_W(0) = 0$, and $P_{Q+}(0) = P_Q(0)$. To simplify the presentation, we introduce functions $\mathcal{P}_Q$ and $\mathcal{P}_{ab}$ that represent the delay and abandonment probabilities of the Markov chain model comprising two of the three sub-chains of the deferred abandonment model: sub-chains 1 and 2, sub-chains 1 and 3, or sub-chains 3 and 2. Their two arguments are the inverse blocking probabilities of the left and right sub-chains at their shared state. (Note: The abandonment probability is properly defined only when the right sub-chain is sub-chain 2.) When $n = 0$, the model is composed of sub-chains 1 and 2, and thus $P_{Q+}(0) = P_Q(0) = \mathcal{P}_Q(1/\pi^1_s, 1/\pi^2_s)$ and $P_{ab}(0) = \mathcal{P}_{ab}(1/\pi^1_s, 1/\pi^2_s)$. To prove the trade-off between $P_{Q+}(n)$ and $P_{ab}(n)$, the following lemma is necessary.

Proposition 1.
For the deferred abandonment system, $P_{Q+}(n)$ and $P_{ab}(n)$ exhibit a trade-off as $n$ changes: as $n$ increases, $P_{Q+}(n)$ increases and $P_{ab}(n)$ decreases.

Remark 2.
In Proposition 1, performance indicators of the two-sub-chain model (i.e., $\mathcal{P}_Q$ and $\mathcal{P}_{ab}$) appear; this can be explained intuitively as follows. When sub-chain 3 (the middle sub-chain) does not exist ($n = 0$), the deferred abandonment model with three sub-chains (left, middle, and right) becomes the model with only sub-chains 1 and 2 (left and right). Likewise, when sub-chain 3 is infinitely large ($n \to \infty$), the deferred abandonment model becomes equivalent to the model with either sub-chains 1 and 3 (left and middle) if $\rho < 1$, or sub-chains 3 and 2 (middle and right) if $\rho > 1$. Since the upper and lower bounds of $P_{Q+}(n)$ and $P_{ab}(n)$ are attained at $n = 0$ or approached as $n \to \infty$, the properties of the deferred abandonment model can be described by those of the two-sub-chain model. Note that the following properties hold for sub-chain 3 as $n \to \infty$: $1/\pi^3_s \to 1/(1-\rho) = 1 + 1/a$ when $a > 0$ (thus $s > R$ and $\rho < 1$), and $1/\pi^3_t \to 1/(1-\rho^{-1}) = -1/a$ when $a < 0$ (thus $s < R$ and $\rho > 1$). Proposition 1 indicates that a trade-off exists between $P_{Q+}(n)$ and $P_{ab}(n)$: if we provide more seats in the buffer space of an abandonment system, we reduce the number of customers abandoning the system (i.e., reduce $P_{ab}(n)$) at the cost of a higher delay probability for arriving customers (i.e., increase $P_{Q+}(n)$). For the remainder of this section, we present the proof of Proposition 1.
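The limiting argument in Remark 2 can be illustrated with the two-sub-chain model. For two sub-chains sharing a single state, Lemma 1 implies that the total probability is $\pi_{\text{shared}}(x + y - 1) = 1$ and that the right sub-chain carries probability $\pi_{\text{shared}}\,y$, where $x$ and $y$ are the inverse blocking probabilities of the left and right sub-chains at the shared state; hence the two-sub-chain delay probability is $y/(x + y - 1)$ (our reading of the two-sub-chain delay function used in Proposition 1). The Python sketch below, with hypothetical parameters and a large $n$ standing in for $n \to \infty$, checks the two sub-chain 3 limits quoted above and confirms that combining sub-chains 1 and 3 with $\rho < 1$ reproduces the textbook Erlang C formula, as expected when the model approaches the M/M/s queue.

```python
import math

def erlang_b(R, s):
    """Exact Erlang B blocking probability (sub-chain 1) via the standard recursion."""
    b = 1.0
    for k in range(1, s + 1):
        b = R * b / (k + R * b)
    return b

def buffer_blocking(rho, n):
    """(pi^3_s, pi^3_t) for the buffer sub-chain on states s..s+n (geometric weights)."""
    w = [rho**j for j in range(n + 1)]
    z = sum(w)
    return w[0] / z, w[-1] / z

def two_chain_delay(x, y):
    """Probability of the right sub-chain (shared state included), given inverse
    blocking probabilities x (left) and y (right) at the shared state."""
    return y / (x + y - 1.0)

def erlang_c(R, s):
    """Textbook Erlang C probability of waiting for an M/M/s queue (R < s)."""
    top = R**s / math.factorial(s) / (1.0 - R / s)
    return top / (sum(R**k / math.factorial(k) for k in range(s)) + top)

R, s, n_large = 4.0, 5, 200                     # hypothetical; n_large mimics n -> infinity
a = (s - R) / R                                 # a = 0.25 > 0, so rho = 0.8 < 1
b3s, _ = buffer_blocking(R / s, n_large)
print(1.0 / b3s, 1.0 + 1.0 / a)                 # 1/pi^3_s vs 1 + 1/a

_, b3t = buffer_blocking(1.0 / (1.0 - 0.2), n_large)   # a = -0.2, rho = 1.25 > 1
print(1.0 / b3t, -1.0 / -0.2)                   # 1/pi^3_t vs -1/a

print(two_chain_delay(1.0 / erlang_b(R, s), 1.0 / b3s), erlang_c(R, s))
```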
To prove that $P_{ab}(n)$ decreases monotonically as $n$ increases, notice that $\eta(n)$ is a positive function of $n$. Regardless of the value of $a$, Lemma 3 shows that $\eta(n)$ is a monotonically increasing function of $n$ that satisfies …, where equality holds at $n = 0$. We conclude that $P_{ab}(n)$ is a decreasing function of $n$, with upper bound $\mathcal{P}_{ab}(1/\pi^1_s, 1/\pi^2_s)$ attained at $n = 0$, and lower bound either $0$ (when $a > 0$) or $\mathcal{P}_{ab}(-1/a, 1/\pi^2_t)$ (when $a < 0$) approached as $n \to \infty$.
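The trade-off in Proposition 1 can be illustrated numerically for a small system. The Python sketch below is an illustration under our own assumptions, not a proof: it uses a reneging system with hypothetical parameters and $\tau = 0$, assumes the rates $\lambda$ ($k < t$), $\lambda_Q$ ($k \ge t$), $\min(k, s)\mu$ ($k \le t$), and $s\mu + (k-t)\gamma$ ($k > t$), truncates the chain a fixed number of states above $t$, and computes $P_{ab}$ as the long-run fraction of arrivals lost to constant balking or reneging. The helper `indicators` is hypothetical. As $n$ grows, $P_{Q+}(n)$ should increase and $P_{ab}(n)$ should decrease.

```python
import math

def stationary(birth, death, K):
    """Stationary distribution on states 0..K of a birth-death chain (log space)."""
    logw = [0.0]
    for k in range(K):
        logw.append(logw[-1] + math.log(birth(k)) - math.log(death(k + 1)))
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    z = sum(w)
    return [x / z for x in w]

def indicators(lam, mu, s, n, eps, gamma, cap=300):
    """(P_Q+, P_ab) for a reneging system with tau = 0 (hypothetical helper)."""
    t = s + n
    lam_q = (1.0 - eps) * lam

    def birth(k):
        return lam if k < t else lam_q

    def death(k):
        return min(k, s) * mu if k <= t else s * mu + (k - t) * gamma

    pi = stationary(birth, death, t + cap)
    p_delay = sum(pi[s:])                       # P_Q+ = P_W + P_Q
    # Loss rate in states k >= t: constant balking (lam - lam_q) plus reneging.
    lost = sum(pi[k] * (lam - lam_q + (k - t) * gamma) for k in range(t, len(pi)))
    return p_delay, lost / lam                  # P_ab as a fraction of arrivals

ns = range(7)
results = [indicators(lam=10.0, mu=1.0, s=10, n=n, eps=0.1, gamma=0.5) for n in ns]
for n, (pq_plus, p_ab) in zip(ns, results):
    print(n, round(pq_plus, 4), round(p_ab, 4))
```

With $\tau = 0$, increasing $n$ raises birth rates and lowers death rates state by state, so the number in system grows stochastically; this is one intuitive reason why the illustration shows both monotone trends.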

Asymptotic Representation of Systems with Deferred Abandonment
In Proposition 1, we observed that a trade-off exists between the delay probability and the abandonment probability when the size of the buffer space changes. This holds for any system with a finite resource requirement $R$. But what happens when the system grows large? In practice, many systems have a resource requirement $R$ that is large compared with the number of buffer seats $n$. In this section, we analyze the asymptotic limit of large systems, obtain useful linear and square-root staffing rules, and discuss the trade-off for large systems. To find the asymptotic limit of the performance indicators, we only need the asymptotic limits of the blocking probabilities of sub-chains 1 and 2, because sub-chain 3 depends only on $n$ and the utilization $\rho$, not on the scale $R$.
To represent asymptotic results, we first define the necessary parameters in Table 1. For simplicity, we use the square-root coefficients $c$, $c'$, and $c''$, which correspond to sub-chain 1 (M/M/s/s), sub-chain 2 (reneging), and sub-chain 2 (balking), respectively. Following the normal approximation described in [10], we obtain Lemma 4: the blocking probabilities of the sub-chains are approximated using the hazard function $h(\cdot)$ of the standard normal distribution and the continuity correction terms $\Delta$, $\Delta'$, and $\Delta''$.
We omit the proof of Lemma 4, as it is almost identical to that given in [10]. The key idea of this approximation is to express the blocking probabilities of the sub-chains through Poisson random variables and then approximate them by the standard normal distribution. In this Poisson representation, the means of the three Poisson distributions for sub-chains 1, 2 (reneging), and 2 (balking) are $R$, $R'$, and $R''$, and the continuity correction terms used when converting the discrete Poisson distribution to the continuous standard normal distribution are $\Delta$, $\Delta'$, and $\Delta''$, respectively. The Poisson-to-normal approximation is elementary but highly accurate when the mean of the Poisson distribution is around 10 or more, and it becomes exact as the mean goes to infinity. Thus, Lemma 4 accurately represents the blocking probabilities of all sub-chains as $R \to \infty$ (which implies $R', R'' \to \infty$ as well). We now obtain two asymptotic limits of the blocking probabilities: (i) the linear staffing rule (when $R_Q < R$); and (ii) the square-root staffing rule (when $R_Q = R$).
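As an illustration of the Poisson-to-normal idea for sub-chain 1 only, the Python sketch below compares the exact Erlang B blocking probability with the normal approximation $1/\pi^1_s \approx \sqrt{R}/h(-c)$, where $h(x) = \phi(x)/(1-\Phi(x))$. This form follows from writing the blocking probability as $P(N = s)/P(N \le s)$ for $N \sim \mathrm{Poisson}(R)$ and approximating the numerator by $\phi(c)/\sqrt{R}$ and the denominator by $\Phi(c)$; it omits the continuity correction $\Delta$, so it is a simplified stand-in for Lemma 4 rather than its exact statement. The load $R = 400$ is hypothetical.

```python
import math

def erlang_b(R, s):
    """Exact Erlang B blocking probability via the standard recursion."""
    b = 1.0
    for k in range(1, s + 1):
        b = R * b / (k + R * b)
    return b

def normal_hazard(x):
    """Hazard function h(x) = phi(x) / (1 - Phi(x)) of the standard normal."""
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(x / math.sqrt(2.0))
    return density / upper_tail

def inv_blocking_normal(R, s):
    """Normal approximation of 1/pi^1_s, without continuity correction."""
    c = (s - R) / math.sqrt(R)
    return math.sqrt(R) / normal_hazard(-c)

R = 400.0                                       # hypothetical offered load
for s in (400, 420):                            # c = 0 and c = 1
    exact = 1.0 / erlang_b(R, s)
    approx = inv_blocking_normal(R, s)
    print(s, round(exact, 3), round(approx, 3), round(abs(approx / exact - 1.0), 4))
```

The relative error at this load is a few percent and shrinks as $R$ grows, which is the behavior Lemma 4 relies on.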
Lemma 5 (Linear staffing asymptotic regime). Let $s = R + aR$ (i.e., $a = (s-R)/R$) and take the limit of large $R$ with $a$ fixed. Then

Proof of Lemma 5.
We use the following properties of the standard normal hazard function in this proof: $x/h(x) \to 1$ as $x \to \infty$, and $x/h(x) \to -\infty$ as $x \to -\infty$. Now, consider the limit of large $R$ with $a = (s-R)/R$ fixed. For sub-chain 1, if $a < 0$ (i.e., $s < R$), then $c = a\sqrt{R} \to -\infty$, and thus $1/\pi^1_s \to -1/a$; if $a > 0$ (i.e., $s > R$), then $c = a\sqrt{R} \to \infty$, and thus $1/\pi^1_s \to \infty$. For sub-chain 2, if $a_Q < 0$ (i.e., $s < R_Q$), then $a' < 0$ and $a'' > 0$, both of which are fixed. Thus, $c' = a'\sqrt{R'} \to -\infty$ and $c'' = a''\sqrt{R''} \to +\infty$, which yields the corresponding limit of $1/\pi^2_t$. (Note that $R', R'' \to \infty$ as $R \to \infty$; see Table 1.) Likewise, if $a_Q > 0$ (i.e., $s > R_Q$), then $a' > 0$ and $a'' < 0$. Thus, $c' = a'\sqrt{R'} \to +\infty$ and $c'' = a''\sqrt{R''} \to -\infty$, which again yields the corresponding limit of $1/\pi^2_t$.
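The sub-chain 1 limits in this proof can be observed numerically: with a fixed $a < 0$, the exact inverse Erlang B blocking probability approaches $-1/a$ as $R$ grows, and with a fixed $a > 0$ the blocking probability vanishes. The Python sketch below uses the hypothetical values $a = \pm 0.2$.

```python
def erlang_b(R, s):
    """Exact Erlang B blocking probability via the standard recursion."""
    b = 1.0
    for k in range(1, s + 1):
        b = R * b / (k + R * b)
    return b

for R in (100, 1000, 10000):
    s_under = round(R * (1 - 0.2))              # a = -0.2: 1/pi^1_s -> -1/a = 5
    s_over = round(R * (1 + 0.2))               # a = +0.2: pi^1_s -> 0
    print(R, 1.0 / erlang_b(R, s_under), erlang_b(R, s_over))
```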
Proof of Lemma 6. We take the limit of large $R$ with $c = (s-R)/\sqrt{R}$ fixed. Thus, $a = c/\sqrt{R} \to 0$. Additionally, using Table 1 and the assumption $R_Q = R$ (thus $\varepsilon + \tau = 0$), we can express the coefficients of sub-chain 2 in terms of $c$. Finally, all continuity correction terms become negligible in the limit of large $R$: $\Delta, \Delta', \Delta'' \to 0$. Combining these results with Lemma 4, we obtain the result of Lemma 6.
Combining Lemmas 2, 5, and 6 with the assumption $R_Q \le R$, we obtain Proposition 2, which describes the asymptotic representation of the performance indicators for a given $n$ (more specifically, we take the limit of large $R$ while fixing $n$). Notice that the case $n = \varepsilon = \tau = 0$ (thus $R_Q = R$) corresponds to the asymptotic formulae for the Erlang A model. We define a function needed for the square-root staffing rule: …, where $\theta = \gamma$ (or $\delta$) for a reneging (or balking) system.

Proposition 2.
We consider three asymptotic regimes for the deferred abandonment model:
1. ED asymptotic regime: We take the limit of large $R$ while fixing the linear coefficient $a$ such that $s = R + aR < R_Q$, and obtain $P_{Q+}(n) \to 1$ and $P_{ab}(n) \to p$.
2. QD asymptotic regime: We take the limit of large $R$ while fixing the linear coefficient $a$ such that $s = R + aR > R$, and obtain $P_{Q+}(n) \to 0$ and $P_{ab}(n) \to 0$.
3. QED asymptotic regime: There are two QED asymptotic regimes.
(a) (Linear staffing rule) When $R_Q < R$, we take the limit of large $R$ while fixing the linear coefficient $a$ such that $R_Q < s = R + aR < R$, and obtain $P_{Q+}(n) \to \frac{1-(1-\rho^{-n})(a_Q/a)}{1-(a_Q/a)}$ and $P_{ab}(n) \to \frac{\varepsilon}{1-(a_Q/a)}$.
(b) (Square-root staffing rule) When $R_Q = R$ (thus, $\varepsilon + \tau = 0$), we take the limit of large $R$ while fixing the square-root coefficient $c$ such that $s = R + c\sqrt{R}$, and obtain $P_{Q+}(n) \to \phi(c)$ and $P_{ab}(n) \to \phi(c)$.
Proof of Proposition 2. Using the linear staffing representation, it is easy to analyze the two extreme cases: $s < R_Q\ (\le R)$ (ED regime) and $s > R\ (\ge R_Q)$ (QD regime). Using Lemmas 2 and 5, we obtain $P_W(n) \to 0$, $P_Q(n) \to 1$, $P_{Q+}(n) = P_W(n) + P_Q(n) \to 1$, and $P_{ab}(n) \to p$ for the ED regime; and $P_W(n) \to 0$, $P_Q(n) \to 0$, $P_{Q+}(n) = P_W(n) + P_Q(n) \to 0$, and $P_{ab}(n) \to 0$ for the QD regime. We next consider the QED regime that lies between the two extreme (ED and QD) regimes. If $R_Q < R$, linear staffing with $R_Q < s = R + aR < R$ achieves the QED regime. The properties of this QED regime are obtained using Lemmas 2 and 5:
$$P_W(n) \to -\frac{(1-\rho^{-n})(a_Q/a)}{1-(a_Q/a)}, \quad P_Q(n) \to \frac{1}{1-(a_Q/a)}, \quad P_{Q+}(n) = P_W(n) + P_Q(n) \to \frac{1-(1-\rho^{-n})(a_Q/a)}{1-(a_Q/a)}, \quad P_{ab}(n) \to \frac{\varepsilon}{1-(a_Q/a)}.$$
(We omit the calculation, which is straightforward but cumbersome.) If $R_Q = R$ (i.e., $\varepsilon + \tau = 0$), the two extreme regimes are adjacent in the linear staffing representation. Thus, we use the finer square-root staffing representation to describe the properties of the QED regime at the boundary of the ED and QD regimes. Using Lemmas 2 and 6, we obtain $P_W(n) \to 0$, $P_Q(n) \to \phi(c)$, $P_{Q+}(n) = P_W(n) + P_Q(n) \to \phi(c)$, and $P_{ab}(n) \to \phi(c)$.
Proposition 2 shows that there is no trade-off between the delay probability $P_{Q+}(n)$ and the abandonment probability $P_{ab}(n)$ in the asymptotic limit of large $R$. In the two extreme regimes, ED and QD, neither $P_{Q+}(n)$ nor $P_{ab}(n)$ depends on $n$, implying that the number of seats $n$ in the buffer space does not affect the performance indicators. For the QED regime that lies between the two extreme regimes, we consider two scenarios: (1) if $R_Q < R$, the linear staffing rule applies and shows that $P_{Q+}(n)$ increases as $n$ increases (since $a_Q > 0$, $a < 0$, and $\rho > 1$); and (2) if $R_Q = R$, the square-root staffing rule applies and shows that $P_{Q+}(n)$ is not affected by $n$. In both scenarios, $P_{ab}(n)$ remains the same as $n$ increases. We conclude that providing a buffer space is not beneficial in the asymptotic limit, in contrast to the non-asymptotic case (Proposition 1).
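The qualitative conclusion can be illustrated at a large but finite load. The Python sketch below is an illustration under our own assumptions rather than a check of the closed-form limits: it uses a reneging system with hypothetical parameters in the linear-staffing QED range $R_Q < s < R$ (here $R = 10000$, $R_Q = 8000$, $s = 9000$, $\tau = 0$, $\gamma = 0.5$), assumes the rates $\lambda$ ($k < t$), $\lambda_Q$ ($k \ge t$), $\min(k, s)\mu$ ($k \le t$), and $s\mu + (k-t)\gamma$ ($k > t$), truncates the chain a fixed number of states above $t$, and computes $P_{Q+}(n)$ and $P_{ab}(n)$ for several buffer sizes. The helper `indicators` is hypothetical. In this range the buffer should raise $P_{Q+}(n)$ substantially while $P_{ab}(n)$ barely moves.

```python
import math

def stationary(birth, death, K):
    """Stationary distribution on states 0..K of a birth-death chain (log space)."""
    logw = [0.0]
    for k in range(K):
        logw.append(logw[-1] + math.log(birth(k)) - math.log(death(k + 1)))
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    z = sum(w)
    return [x / z for x in w]

def indicators(lam, mu, s, n, eps, gamma, cap=600):
    """(P_Q+, P_ab) for a reneging system with tau = 0 (hypothetical helper)."""
    t = s + n
    lam_q = (1.0 - eps) * lam

    def birth(k):
        return lam if k < t else lam_q

    def death(k):
        return min(k, s) * mu if k <= t else s * mu + (k - t) * gamma

    pi = stationary(birth, death, t + cap)
    # Loss rate in states k >= t: constant balking (lam - lam_q) plus reneging.
    lost = sum(pi[k] * (lam - lam_q + (k - t) * gamma) for k in range(t, len(pi)))
    return sum(pi[s:]), lost / lam

# Hypothetical large system: R = 10000, R_Q = 8000 < s = 9000 < R.
ns = (0, 10, 20)
results = [indicators(lam=10000.0, mu=1.0, s=9000, n=n, eps=0.2, gamma=0.5) for n in ns]
for n, (pq_plus, p_ab) in zip(ns, results):
    print(n, round(pq_plus, 4), round(p_ab, 4))
```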

Conclusions
We propose a queueing model with customer abandonment (reneging/balking) that incorporates a buffer space, which we call the deferred abandonment model. We derive asymptotic formulae for two performance indicators: the delay probability and the abandonment probability. We find that providing a buffer space may be worthwhile for smaller systems but is not beneficial, and can in fact be harmful, for larger systems, because in the asymptotic limit the buffer space only increases the delay probability without improving the throughput of the facility.