An Analytical Model for 5G Network Resource Sharing with Flexible SLA-Oriented Slice Isolation

Abstract: Network slicing is a key novel technology in 5G networks that makes it possible to provide a multitude of heterogeneous communication services over a common network infrastructure while satisfying strict Quality of Service (QoS) requirements. Since radio spectrum resources are inherently scarce, the slicing of the radio access network should rely on a flexible resource sharing policy that provides efficient resource usage, fairness and slice isolation. In this article, we propose such a policy for bandwidth-greedy communication services. The policy implies a convex programming problem and is formalized to allow for session-level stochastic modeling. We develop a multi-class service system with service rates obtained as a solution to the optimization problem, a Markovian Arrival Process and state-dependent preemptive priorities. We use matrix-analytic methods to find the steady-state distribution of the resulting continuous-time Markov chain and the expressions for important performance metrics, such as the average user data rate. Numerical analysis illustrates the efficiency of the proposed slicing scheme compared to the complete sharing and complete partitioning policies, showing that, for the analyzed scenario, our approach yields a data rate about twice that obtained under complete partitioning.


Introduction
Network slicing is a key novel technology introduced in 5G networks as a response to two challenges:
• to efficiently transmit data with completely different characteristics and Quality of Service (QoS) requirements over the same physical network infrastructure, and
• to provide seamless support for diverse business models and market scenarios, for example, Mobile Virtual Network Operators (MVNOs), which do not possess their own network infrastructure yet seek autonomy in administration and admission control.
Network slicing permits creating a multitude of logical networks for different tenants over a common infrastructure. Current industry standards define a network slice as a logical network that provides specific network capabilities and network characteristics [1,2]. According to Reference [2], the dynamic scale-in and scale-out of resources in a slice should be supported and occur with minimal impact on the offered services. In the literature, slicing policies are often formulated as optimization problems, for example, to maximize utility functions [13,24], to minimize the overall resource usage while providing performance isolation of the slices [25], or to meet data-rate- and latency-related requirements [16,23]. In References [23,24], a fixed bandwidth allocated to the slice in the core network is considered as an additional constraint. The policy proposed in Reference [12] aims at satisfying, on average, the prenegotiated resource shares per tenant within an allowed deviation. Preset minimum shares per slice are also assumed in References [8,9,19,29]; however, in References [8,29] the effective slice capacities are allowed to go below the guaranteed minima, following the slice workload, while in Reference [19] the minima serve as an optimization constraint and hence are reserved. The objective of the slicing policy proposed in Reference [29] is to minimize the weighted sum of the relative differences between allocated and demanded resource shares. QoS provision and fairness are the focus of References [15,30]. The objective in Reference [26] is to maximize the time-averaged expectation of the infrastructure provider's utility, defined as the difference between the income (data rate and delay gain) and the cost (power and spectrum resources). Some works adopt auction models, where resource allocation follows the tenants' bids for their slices [14] or for their slice users [20,21]. Some works [16,17] specifically consider slice isolation, which is provided by setting a fixed upper limit on the slice's capacity. Slice isolation via guaranteed minima is proposed in References [8,9,19,29].
Finally, some authors take slice priorities into account [15,17,27].
In the literature, the majority of studies, except for the most technology-specific ones, assume each of the considered resources to be homogeneous and divisible, and propose an algorithm for determining the slices' sizes (capacities) as shares of the resources' capacities [9,12-15,18,20,21,29,30]. If the services provided in the slice to end users require a certain minimum data rate, such as Guaranteed Bit Rate (GBR) services, the slicing policy may include not only resource allocation but also admission control [17,21,23,27]. Network slicing at the admission-control level is discussed in Reference [33] along with other approaches to RAN slicing, while user blocking probabilities for such systems are estimated in References [17,21]. Besides, since network slices are to be created and terminated dynamically [2], some studies [31,32,34] investigate slice admission control, that is, whether a new slice instance can/should be admitted to the network. The problem of associating users with base stations under slicing constraints is addressed in References [23,24]. More technology-specific works, such as References [10,11], represent radio resources in the form of a time-frequency grid composed of Resource Blocks (which reflects the physical layer of 4G/5G RAN) and consider not only the share of the total number of Resource Blocks allocated to a slice per time slot, but also their position on the grid, taking account of the corresponding technology-specific constraints.
Two major approaches to the resource slicing problem can be distinguished in the literature. Some authors [12,13,19-21,23,27] propose to perform resource allocation to slices (inter-slice allocation) by means of resource allocation to individual network users with some additional slicing-specific constraints. Then, the slice capacity can be obtained by summing up the resource shares allocated to the corresponding users. Although such a strategy can yield the optimal solution, it may also be computationally inefficient and deprive slice tenants of a certain degree of autonomy and confidentiality [15,22]. The other approach assumes a hierarchical resource slicing (also referred to as two-level [14] or distributed [22]), in which inter-slice allocation is decoupled from resource scheduling to slice users (intra-slice allocation). In this case, the slice shares are determined based upon some aggregated information about slice users and/or additional criteria (e.g., the tenants' bids [14]), after which resources are allocated to slice users within the computed slice capacities. Furthermore, some researchers assume the two steps of the hierarchical RAN slicing to be performed on the same time scale [8,15,22], usually every Transmission Time Interval, while others consider a longer slicing/decision interval [9,11], which may imply traffic prediction using machine learning techniques [25,31].
Algorithmic resource slicing policies are proposed in References [8,9]. Besides, numerous works opt for game-theoretic methods [14,20-22]. Machine learning along with Markov decision processes is applied in References [25,31,32]. Continuous-Time Markov Chain (CTMC) models for network slicing analysis are proposed in References [17,34]. The authors of Reference [34] consider a one-dimensional CTMC whose transitions represent slice instantiation and termination, and use it to establish a flexible auction-based slice-admission policy. A four-dimensional CTMC whose transitions correspond to establishing and terminating user sessions is proposed in Reference [17] to analyze the performance of resource allocation to two network slices, each offering two GBR services with different priorities.
In this article, we propose a flexible and highly customizable resource slicing scheme that satisfies the important criteria given above: efficient resource usage, fairness and performance isolation of slices. The scheme is intended for QoS-aware, service-oriented slicing, where each slice is homogeneous with respect to traffic characteristics and QoS requirements. We adopt a hierarchical approach to slicing and focus on the inter-slice allocation, assuming that the intra-slice allocation is performed by slice tenants at their discretion. Overall, our approach and formulation of the inter-slice allocation problem are similar to those adopted in Reference [15]; however, we consider the performance isolation of slices, by which we understand, in particular, that traffic bursts in one slice do not affect performance and QoS in other slices, or, at least, that such effects are minimized. A similar concept of performance isolation is adopted in References [9,25]. Also, similar to References [17,21,23,27], we assume user admission control to be part of our slicing scheme.
The proposed slicing scheme is formulated in such a way that its efficiency and the impact of its various parameters and customizable components can be readily analyzed via session-level analytical modeling. To this end, we make use of Markov process theory and develop a multi-class service system with state-dependent preemptive priorities, in which job service rates are found as the solution to a non-linear programming problem. Also, we use the Markovian Arrival Process (MAP, [35,36]) for one class of jobs to represent a slice whose workload is less regular than that represented by a Poisson process. Being both analytically tractable and highly parametrizable, the MAP allows for applying the matrix-analytic methods and, thanks to its many independent parameters, permits specifying a wide range of arrival processes; moreover, there exist techniques for fitting a MAP to real traffic data (see, e.g., Reference [37]). Similar to Reference [38], we apply matrix-analytic methods to obtain the stationary distribution of the CTMC representing the system and use it to find expressions for performance measures. Finally, we provide a numerical example, where the analytical model serves to compare the proposed policy with the classical resource sharing schemes: complete sharing (CS) and complete partitioning (CP, sometimes referred to as static slicing in the context of network slicing [20]). An algorithm based upon the Gradient Projection Method [39] is suggested for solving the arising optimization problem.
The contribution of the article is twofold. First, we propose a new resource slicing scheme, which focuses on flexible performance isolation of slices and on fairness, and is customizable to reflect both QoS requirements and SLA terms. Second, the scheme is formalized to allow for session-level stochastic modeling, and in order to evaluate its performance we develop a CTMC model (a multi-class service system with state-dependent preemptive priorities, a MAP and service rates obtained as a solution to a convex programming problem), for which the stationary state distribution is derived.
The article is structured as follows. Section 2 introduces the basic model assumptions. Section 3 details the proposed slicing policy and formalizes its resource allocation and preemption components. In Section 4, we present a multi-class service system for analyzing the performance of the proposed policy for three active slices. The steady-state distribution of the system is derived and the expressions of its main performance metrics are obtained. Section 5 offers numerical results, which give an insight into the performance of the proposed slicing policy, in particular, in comparison with CS and CP. Also, an algorithm for solving the optimization problem based upon the Gradient Projection Method is suggested. Section 6 concludes the article.

Basic Assumptions
We consider the downlink transmission from a 4G/5G co-located base station (featuring both LTE and NR radio access technologies) at which radio access resource virtualization and network slicing are implemented [40]. Note that the proposed approach is applicable to a 5G-only network as well; however, although the network architecture for 5G has not yet been standardized (in contrast to its radio interface), the likely upcoming solution is a full integration with 4G (akin to 2G/GSM or 3G/UMTS). Following Reference [28], we suppose that network slicing is performed at the level of virtualized resources. This approach implies resource allocation among slices/users in terms of virtual resources, which are then translated into network (physical) resources. Generally speaking, the total amount of virtual resources available for allocation depends on the radio conditions of the users at each time instant; however, for simplicity, we assume the total base station capacity available for allocation to be fixed.
Suppose that S slices are active at the base station and denote the set of active slices by S, S = |S|. Denote by C > 0 the capacity of the base station (measured in bits per second, bps), that is, the total amount of resources available, and by C_s ≥ 0 the capacity of slice s ∈ S. We assume that each slice is intended for a particular type of traffic, for example, streaming video, videoconferencing, software updates, and so forth, rather than for a mix of diverse traffic administrated by the same tenant (which may be the case namely in the MVNO scenario). Thus, we assume that each slice s ∈ S is homogeneous in terms of traffic characteristics and QoS requirements and provides to its users only one service, with a data rate not smaller than a_s > 0 (bps) and not larger than b_s ≤ C (bps). The lower bound may reflect QoS requirements [25,27] and corresponds, for instance, to the data rate necessary for providing a certain maximum tolerable delay [15], while the upper bound may be due to the characteristics of a particular service, implying that a higher data rate does not improve the associated QoS (voice transmission is a good example of such a service). Note that we assume all capacity/data rate parameters to be real-valued, that is, C, C_s, a_s, b_s ∈ ℝ+.
Let n_s denote the number of users in slice s ∈ S. We suppose that the effective data rate x_s^(i) allocated to user i of slice s is variable and depends on the slice's capacity and the number of users currently in it, since the sum of the users' data rates cannot exceed the capacity of the slice, that is, ∑_{i=1}^{n_s} x_s^(i) ≤ C_s. Without loss of generality, we assume further that a slice's resources are equally shared among all its users, that is, x_s^(i) = x_s = C_s/n_s, i = 1, ..., n_s. Note that the latter assumption serves only to determine the slices' capacities C_s, s ∈ S, by the slicing scheme detailed in the next section; once the capacities are established, the actual allocation among users x_s^(i), i = 1, ..., n_s, is not the subject of this work, and the capacity of a slice can be distributed among its users at the discretion of the slice's tenant (e.g., as a function of the radio channel conditions). Finally, we assume that a user can be active in one slice only and have no more than one active connection in it.
A slicing algorithm implemented at the base station is aimed at providing (a) efficient resource usage, (b) fair resource allocation among users, and (c) performance isolation of slices. Clearly, performance isolation cannot be guaranteed for unlimited traffic in all slices; hence, following Reference [32], we assume that performance isolation of slice s ∈ S is provided as long as the number n_s of users in this slice does not exceed a threshold G_s, 0 ≤ G_s ≤ ⌊C/a_s⌋ = max{y ∈ ℕ : y ≤ C/a_s}. Since a_s > 0, the threshold can be set as a share 0 ≤ γ_s ≤ 1 of the capacity C, in which case G_s = ⌊Cγ_s/a_s⌋, s ∈ S. Note that we allow for capacity overbooking [31], and therefore, in the general case, 0 ≤ ∑_{s∈S} γ_s ≤ S.
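The threshold computation can be sketched in a few lines of Python (the helper name slice_thresholds is ours; the parameter values are those of the numerical example in Section 5):

```python
import math

def slice_thresholds(C, gammas, a):
    """Isolation thresholds G_s = floor(C * gamma_s / a_s), one per slice."""
    return [math.floor(C * g / a_s) for g, a_s in zip(gammas, a)]

# Scenario of Section 5: C = 40 Mbps, a = (2, 1, 1) Mbps, equal booked shares.
print(slice_thresholds(40, [1/3, 1/3, 1/3], [2, 1, 1]))  # -> [6, 13, 13]
```

Note that overbooking simply means choosing the shares with ∑_s γ_s > 1; the thresholds are computed the same way.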
In order to provide flexibility and efficient resource usage, we assume that the booked capacity γ_s C is not reserved or strictly guaranteed to slice s; rather, slice s has priority in its allocation. Generally speaking, if the base station is fully loaded, the slices in which the number of users is less than or equal to G_s have priority in resource allocation over slices in which the number of users is above G_s (or the share of the base station resources occupied by the slice's users at the minimum data rate a_s is above γ_s).
We suppose that the allocated user data rate x_s is not allowed to drop below a_s, because this would violate the service's QoS requirements. In order to satisfy the minimum data rate requirements and provide performance isolation, admission control with request prioritization is used as part of the slicing scheme. We assume that a request arriving to a slice where the number of users does not exceed G_s will, if free resources are insufficient, preempt users in "violator" slices, that is, the slices in which the number of users is above the threshold. We define the proposed slicing scheme formally in the next section.

Resource Allocation
Denote by x_s(n) the data rate allocated to each user in slice s ∈ S in state n ∈ Ω. Then, the capacities of slices are obtained as C_s(n) = n_s x_s(n). We assume, following Reference [41], that allocating data rate x_s(n) ∈ [a_s, b_s] to a user of slice s has utility U_s(x_s) to the infrastructure provider, and the functions U_s(x_s), s ∈ S, are increasing, strictly concave and continuously differentiable in x_s, a_s ≤ x_s ≤ b_s. In this paper we focus on the logarithmic utility functions proposed by Kelly in Reference [41] for proportionally fair resource sharing,

U_s(x_s) = log x_s, s ∈ S,   (2)

which in our case coincide with max-min fairness [42]. However, other utility functions, which better reflect the nature of the slices' traffic, can also be considered. For n ∈ Ω_0, we set x_s(n) = b_s and, by consequence, C_s(n) = n_s b_s for all s ∈ S. For n ∈ Ω_1, we set x_s(n) = b_s for s ∈ S_0(n) and propose to determine the data rates for s ∈ S_+(n) as the solution to a convex programming problem:

maximize f(x) = ∑_{s∈S_+(n)} n_s w_s(n_s) U_s(x_s)   (3)

subject to ∑_{s∈S_+(n)} n_s x_s = C − ∑_{s∈S_0(n)} n_s b_s,   (4)

a_s ≤ x_s ≤ b_s, s ∈ S_+(n).   (5)

For each s ∈ S, the weight function w_s(n) is assumed positive for all n ∈ ℕ, equal to 1 for n ≤ G_s, and smaller than 1 and nonincreasing for n > G_s. The idea behind such requirements is to ensure max-min fair resource allocation to users as long as the corresponding slices do not exceed their thresholds G_s, and to penalize (squeeze) the "violator" slices, in which the number of users exceeds G_s. In this work, we define the weight functions by (6). In what follows, unless specifically stated, by w_s we mean w_s(n_s) and by w, the column vector (w_1(n_1), ..., w_S(n_S))^T.
Since the objective function (3) is differentiable and strictly concave by assumption and the feasible region (4), (5) is compact and convex, there exists a unique maximum for the data rate vector x, which can be found by Lagrangian methods. For the utility functions (2), the unique stationary point of the problem (3), (4) has the coordinates

x*_s = w_s (C − ∑_{r∈S_0(n)} n_r b_r) / ∑_{r∈S_+(n)} n_r w_r, s ∈ S_+(n),   (7)

and is located at the intersection of the hyperplane (4) and the open ray x_s = w_s y, y > 0, s ∈ S_+(n). However, if the stationary point (7) does not satisfy the direct constraint (5), then the solution lies on the boundary of the feasible region (4), (5).
In the case when x* ∉ P, the problem (3)-(5) can be solved numerically via the Gradient Projection Method [39]. The method is based upon the iterative process

x_{k+1} = x_k + δ_k P ∇f(x_k), k = 0, 1, ...,

where P is the projection matrix and ∇f(x) is the gradient of the objective function. The latter, for the utility (2), equals the column vector

∇f(x) = (n_s w_s / x_s)_{s∈S_+(n)}.

Initially, P is the projection matrix onto the hyperplane (4),

P = I − Π^T (Π Π^T)^{-1} Π,

where Π = n. However, once a boundary of P is reached, matrix Π should be appended to include the hyperplane of the corresponding constraint, and the current approximation x_k should be placed on the boundary. Finally, the intersection point of (4) with the diagonal of P connecting the points (a_s)_{s∈S_+(n)} and (b_s)_{s∈S_+(n)} can be used as the initial approximation x_0.
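As a cross-check of the optimization step, the following sketch solves the same problem for the logarithmic utilities (2), not by the Gradient Projection Method but by bisection on the Lagrange multiplier y of the hyperplane constraint: at the optimum, x_s = w_s/y clipped to [a_s, b_s]. Function and variable names are ours.

```python
def allocate_rates(n, w, a, b, C_free, tol=1e-10):
    """Sketch: solve  max sum_s n_s w_s log x_s
    s.t. sum_s n_s x_s = C_free, a_s <= x_s <= b_s,
    by bisection on the Lagrange multiplier y; at the optimum
    x_s = clip(w_s / y, a_s, b_s)."""
    S = len(n)
    # feasibility of the equality constraint within the box
    assert sum(n[s] * a[s] for s in range(S)) <= C_free <= sum(n[s] * b[s] for s in range(S))
    clip = lambda v, lo, hi: min(max(v, lo), hi)
    used = lambda y: sum(n[s] * clip(w[s] / y, a[s], b[s]) for s in range(S))
    y_lo, y_hi = 1e-12, 1e12          # used(y) is nonincreasing in y
    while y_hi - y_lo > tol:
        y = (y_lo + y_hi) / 2
        if used(y) > C_free:
            y_lo = y                  # allocation too large -> increase y
        else:
            y_hi = y
    y = (y_lo + y_hi) / 2
    return [clip(w[s] / y, a[s], b[s]) for s in range(S)]
```

When no box constraint is active, this reproduces the closed-form stationary point (7); otherwise, the clipping automatically places the solution on the boundary of the feasible region.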

Admission Control and Resource Preemption
Let e_s represent a row vector of order S in which the sth entry is 1 and all others are zero. Let the system be in state n ∈ Ω and denote g(n) = ∑_{s∈S} a_s min{n_s, G_s}.
Now, a request arriving to slice s will be lost only in two cases: either n + e_s ∉ Ω and n_s + 1 > G_s, or n + e_s ∉ Ω, n_s + 1 ≤ G_s and C − g(n) < a_s. Conversely, the arriving request is accepted if either n + e_s ∈ Ω, or n + e_s ∉ Ω, n_s + 1 ≤ G_s and C − g(n) ≥ a_s. In the latter case, resources for the arriving request are freed according to Algorithm 1. The condition C − g(n) ≥ a_s ensures that enough resources are available for preemption. Upon acceptance of the request, the system transitions into state n̂ + e_s, where n̂ is obtained via Algorithm 1.
Algorithm 1: Resource preemption upon an arrival into slice s in state n
Input: s ∈ S, n ∈ Ω such that n + e_s ∉ Ω, n_s + 1 ≤ G_s and C − g(n) ≥ a_s
Output: n̂ ∈ Ω such that, upon acceptance of the request, the system transitions into state n̂ + e_s
1: n̂ := n // initialization
2: repeat
3:   R := {r ∈ S : n̂_r > G_r, w_r(n̂_r) = min_{j∈S} w_j(n̂_j)} // candidates for preemption
4:   Choose r ∈ R // randomly or according to a preset order
5:   n̂ := n̂ − e_r
6: until C − n̂a ≥ a_s
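A minimal Python sketch of Algorithm 1 follows (function and variable names are ours; ties among equal-weight "violators" are broken at random, as in the algorithm):

```python
import random

def preempt(n, s, a, G, w, C):
    """Sketch of Algorithm 1: starting from n, remove users from 'violator'
    slices (n_r > G_r) with the currently smallest weight until the free
    capacity C - n_hat.a suffices for one slice-s user at rate a[s].
    Assumes the admission condition C - g(n) >= a[s] holds, so the loop
    terminates. w is a list of weight functions w_r(k)."""
    n_hat = list(n)
    while C - sum(n_hat[r] * a[r] for r in range(len(a))) < a[s]:
        viol = [r for r in range(len(a)) if n_hat[r] > G[r]]
        w_min = min(w[r](n_hat[r]) for r in viol)
        R = [r for r in viol if w[r](n_hat[r]) == w_min]
        n_hat[random.choice(R)] -= 1   # ties broken at random
    return n_hat
```

For example, with C = 10, a = (2, 1), G = (3, 4) and state n = (1, 8), an arrival to slice 1 preempts two slice-2 users, yielding n̂ = (1, 6).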

Model Assumptions
We suppose that three slices are active at the base station and that re-slicing is performed often enough so that we can assume resources to be reallocated whenever a user connection is established or terminated. Under these assumptions, we can represent the functioning of the base station described in Sections 2 and 3 as a three-class preempt-loss system with elastic jobs and make use of the matrix-analytic methods for its analysis. Consider a loss system of continuous capacity C with no waiting spaces and three job classes, S = {1, 2, 3}. We assume class 1 and class 2 jobs to arrive according to Poisson processes with rates λ_1 and λ_2, respectively. Class 3 jobs form a MAP characterized by two square matrices Q_0 and Q_1 of order K. The matrix Q = Q_0 + Q_1 represents the infinitesimal generator of the CTMC {ξ(t), t ≥ 0} that controls the MAP. Matrix Q_1 contains the transition rates of {ξ(t), t ≥ 0} accompanied by an arrival, whereas the off-diagonal entries of Q_0 represent the transition rates of {ξ(t), t ≥ 0} without arrivals. We assume Q_1 non-zero and {ξ(t), t ≥ 0} irreducible. Denote by 1 a column vector of ones and by 0 a row vector of zeros of appropriate length. The row vector q of the stationary state probabilities of {ξ(t), t ≥ 0} is determined as the unique solution to the global balance equations qQ = 0, q1 = 1. The mean (fundamental) rate of the MAP is then given by λ_3 = qQ_1 1. We assume that jobs are served according to the Egalitarian Processor Sharing (EPS) discipline [43]; however, each class s job demands no less than a_s and no more than b_s of the system capacity, 0 < a_s ≤ b_s ≤ C. Jobs' lengths, that is, the holding times when served by exactly one unit of capacity, are assumed to be exponentially distributed with parameters µ_s, s ∈ S. An accepted job is served at a variable rate until its remaining length is zero and leaves the system thereafter.
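The stationary vector q and the fundamental rate of a MAP can be computed as follows (the 2-phase matrices below are illustrative stand-ins, not the MAP used in Section 5):

```python
import numpy as np

# A hypothetical 2-phase MAP (Q0, Q1 chosen only so that Q0 + Q1 is a
# proper generator: rows of Q sum to zero).
Q0 = np.array([[-3.0,  1.0],
               [ 0.5, -2.0]])
Q1 = np.array([[ 1.5,  0.5],
               [ 0.5,  1.0]])
Q = Q0 + Q1                      # generator of the controlling CTMC

# Stationary vector: q Q = 0, q 1 = 1 (replace one balance equation
# by the normalization condition).
M = np.vstack([Q.T[:-1], np.ones(len(Q))])
rhs = np.zeros(len(Q)); rhs[-1] = 1.0
q = np.linalg.solve(M, rhs)

lam = q @ Q1 @ np.ones(len(Q))   # fundamental (mean) rate of the MAP
```

For the matrices above, q = (0.4, 0.6) and the fundamental rate is λ = qQ₁1 = 1.7.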
The service rate of a class s job varies with the numbers of jobs of each class currently in service, n = (n_1, n_2, n_3), and is determined by the allocated data rate x_s(n), which is found as the solution to the optimization problem (3)-(6). In what follows, the notations x_s(n), x_s(n_1, ..., n_S) and x_s^(n_1,...,n_S) are equivalent. Finally, jobs are accepted and preempted according to the rules detailed in Section 3.2.

Stationary State Distribution
Denote by n_s(t) the number of class s jobs in service at time t ≥ 0, s ∈ S. The stochastic behavior of the system can be represented by the four-dimensional CTMC {ψ(t) = (n_1(t), n_2(t), n_3(t), ξ(t)), t ≥ 0} over the state space Ψ = {(n, k) : n ∈ Ω, k = 1, 2, ..., K}, where Ω is given by (1). Now, order the states of {ψ(t), t ≥ 0} lexicographically and let A represent its infinitesimal generator. We express A as

A = Ã + H,

where Ã represents the infinitesimal generator of {ψ(t), t ≥ 0} without preemption and H contains the preemption rates. Let L = ⌊C/a_1⌋ denote the maximum number of class 1 jobs in service simultaneously. We also introduce notation for the maximum number of class 2 jobs when l class 1 jobs are in service, M(l) = ⌊(C − la_1)/a_2⌋, l = 0, 1, ..., L, and, similarly, for the maximum number of class 3 jobs when l class 1 jobs and m class 2 jobs are in service: N(l, m) = ⌊(C − la_1 − ma_2)/a_3⌋, l = 0, 1, ..., L, m = 0, 1, ..., M(l).
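The bounds L, M(l) and N(l, m) induce a lexicographic enumeration of Ω, which can be sketched as follows (function names are ours; a small capacity is used for illustration):

```python
from math import floor

def enumerate_states(C, a):
    """Lexicographic enumeration of Omega = {n : n.a <= C} for three
    classes, mirroring the bounds L, M(l), N(l, m) of the block
    partitioning."""
    a1, a2, a3 = a
    states = []
    L = floor(C / a1)
    for l in range(L + 1):
        M = floor((C - l * a1) / a2)
        for m in range(M + 1):
            N = floor((C - l * a1 - m * a2) / a3)
            for n3 in range(N + 1):
                states.append((l, m, n3))
    return states

states = enumerate_states(6, (2, 1, 1))
index = {st: i for i, st in enumerate(states)}   # lexicographic index of each state
```

For C = 6 and a = (2, 1, 1), the enumeration yields |Ω| = 50 states, from (0, 0, 0) to (3, 0, 0).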
Then, Ã is a block tridiagonal matrix of the form (16). Its blocks located at the intersection of the ith block row and jth block column are block matrices of block size ∑_{m=0}^{M(j−1)} (N(j − 1, m) + 1), composed of square matrices of order K. The diagonal blocks D_l of Ã are themselves block tridiagonal; here, the blocks located at the intersection of the ith block row and jth block column are block matrices of K-block size (N(l, i − 1) + 1) × (N(l, j − 1) + 1).
Super- and subdiagonal blocks of D_l are rectangular K-block matrices, and the super- and subdiagonal blocks of Ã are block diagonal rectangular matrices. The structure of H depends on the weights w_s(n), s ∈ S; therefore, we propose to construct it using a general recursive algorithm. We introduce the subspaces of states in which an arriving class s job is accepted via preemption of jobs of other classes:

H_s = {n ∈ Ω : n + e_s ∉ Ω, n_s + 1 ≤ G_s, C − g(n) ≥ a_s}, s ∈ S.

Here, the first condition means that the arriving job cannot be accepted directly, the condition n_s + 1 ≤ G_s gives the arriving job priority over possible "violators", while the last condition ensures that there are "violators" in the system whose preemption will vacate enough resources for the arriving job to be accepted. Note that the subspace of all states in which a class s job is accepted equals A_s = Ã_s ∪ H_s, where Ã_s = {n ∈ Ω : n + e_s ∈ Ω}, and hence the subspace of all states in which an arriving class s job is lost is B_s = Ω \ A_s. The number of state n ∈ Ω in lexicographical order is given by

∑_{l=0}^{n_1−1} ∑_{m=0}^{M(l)} (N(l, m) + 1) + ∑_{m=0}^{n_2−1} (N(n_1, m) + 1) + n_3 + 1.

Now, the preemption rate matrix H = (H_{i,j}), where H_{i,j} are square blocks of order K, i, j = 1, ..., |Ω|, can be obtained via Algorithm 2. Note that in Algorithm 2 the "violators" having the same weight are preempted with equal probability. Note also that in step 1 of the algorithm, Ã can be used for initialization instead of 0, in which case the algorithm will produce the infinitesimal generator A = Ã + H. However, the preemption rate matrix H will be used to compute performance measures in Section 4.3, so we prefer to construct it separately. Since under our assumptions {ψ(t), t ≥ 0} is irreducible and its state space Ψ is finite, the stationary distribution of {ψ(t), t ≥ 0} exists.
Let us write it in vector form in accordance with the partitioning (16) of Ã into blocks: p = (p_0, p_1, ..., p_L), where p_l = (p_{l,0}, p_{l,1}, ..., p_{l,M(l)}), l = 0, ..., L, p_{l,m} = (p_{l,m,0}, p_{l,m,1}, ..., p_{l,m,N(l,m)}), m = 0, ..., M(l), and p_{l,m,n} = (p_{l,m,n,1}, p_{l,m,n,2}, ..., p_{l,m,n,K}), n = 0, ..., N(l, m). Vector p satisfies the global balance equations

pA = 0, p1 = 1.   (26)

We note that while Ã is block tridiagonal, adding H may disrupt this structure, since the positions of non-zero entries in H depend on the ratios among a_s, G_s and w_s(n_s). This makes it difficult to apply special block-matrix methods for solving the global balance equations, such as those in References [38,44]. However, A is sparse, which allows for the use of sparse linear system routines such as UMFPACK.
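The solution of (26) can be sketched for a toy generator as follows (for the full model, A is large and sparse, so a sparse routine such as UMFPACK is preferable; the 3-state generator below is illustrative only):

```python
import numpy as np

def stationary(A):
    """Sketch: solve p A = 0, p 1 = 1 for a finite-CTMC generator A by
    appending the normalization condition and using least squares.
    (A dense solve only illustrates the computation; for the slicing
    model a sparse solver is used instead.)"""
    n = A.shape[0]
    M = np.vstack([A.T, np.ones(n)])           # equations p A = 0 and p 1 = 1
    rhs = np.concatenate([np.zeros(n), [1.0]])
    p, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return p

# A toy 3-state generator (illustrative only).
A = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  0.0, -2.0]])
p = stationary(A)
```

For this generator, the stationary distribution is p = (0.4, 0.4, 0.2).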

Performance Measures
Using the stationary state distribution of the system, we can easily obtain a number of its performance measures. In particular, for s ∈ S, the mean number of jobs in service, which represents the mean number of users in a slice, is given by

N̄_s = ∑_{n∈Ω} n_s p_{n_1,n_2,n_3} 1.
The average user data rate in slice s corresponds to the capacity occupied by one class s job when class s jobs are present in the system:

x̄_s = ∑_{n∈Ω: n_s>0} x_s(n) p_{n_1,n_2,n_3} 1 / ∑_{n∈Ω: n_s>0} p_{n_1,n_2,n_3} 1.
The average capacity of slice s is given by

C̄_s = ∑_{n∈Ω} n_s x_s(n) p_{n_1,n_2,n_3} 1.
The blocking probability in slice s, that is, the probability that a slice s user will not receive full service, corresponds to the loss probability of a class s job, which is the sum of the probability for the job to be lost at arrival and the probability for it to be preempted: B_s = B^arr_s + B^pr_s. The loss probabilities upon arrival are given by

B^arr_s = ∑_{n∈B_s} p_{n_1,n_2,n_3} 1, s = 1, 2, and B^arr_3 = λ_3^{-1} ∑_{n∈B_3} p_{n_1,n_2,n_3} Q_1 1.
To obtain the preemption probabilities B^pr_s, we make use of the fact that all transition rates related to preemption are combined in matrix H. Note that B^pr_s represents the probability for an arbitrary incoming slice s user to be preempted, while the probability for an accepted user to be preempted is B^pr_s / (1 − B^arr_s). Besides, the total blocking probability can also be found via the arrival and departure rates. The overall blocking probability in the system is given by

B = ∑_{s∈S} λ_s B_s / ∑_{s∈S} λ_s.

Finally, Little's Law provides the expression for the average user session duration (job holding time), making use of N̄_s and B^arr_s:

T̄_s = N̄_s / (λ_s (1 − B^arr_s)), s ∈ S.
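The measures N̄_s, x̄_s and C̄_s can be computed directly from the stationary distribution; a minimal sketch for a hypothetical two-class toy model follows (the state space, distribution p and allocations x below are illustrative values, with the MAP phase already marginalized out):

```python
# Toy model: states n = (n_1, n_2), p[n] - stationary probability of n,
# x[s][n] - data rate x_s(n) allocated to each class-s user in state n
# (defined only for states with n_s > 0). All values are hypothetical.
states = [(0, 0), (1, 0), (0, 1), (1, 1)]
p = {(0, 0): 0.4, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.2}
x = {0: {(1, 0): 4.0, (1, 1): 2.0},
     1: {(0, 1): 4.0, (1, 1): 2.0}}

def mean_jobs(s):
    """N_bar_s = sum over states of n_s * p_n."""
    return sum(n[s] * p[n] for n in states)

def mean_rate(s):
    """x_bar_s: average allocated rate over states with n_s > 0."""
    num = sum(x[s][n] * p[n] for n in states if n[s] > 0)
    den = sum(p[n] for n in states if n[s] > 0)
    return num / den

def mean_capacity(s):
    """C_bar_s = sum over states of n_s * x_s(n) * p_n."""
    return sum(n[s] * x[s].get(n, 0.0) * p[n] for n in states)
```

For these toy values, mean_jobs(0) = 0.4, mean_rate(0) = 3.0 and mean_capacity(0) = 1.2.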

Numerical Results
Suppose that three slices are instantiated at a base station of capacity 40 Mbps. Each slice provides a service that involves transferring files of average size 2.5 MB at rates no less than 2 Mbps for slice 1 and no less than 1 Mbps for slices 2 and 3. We model this scenario using the service system presented in Section 4 with the following parameter values: S = {1, 2, 3}, C = 40, a = (2, 1, 1)^T, b_s = C and µ_s = 0.05, s ∈ S. Additionally, we set λ_s = 0.55 for all s ∈ S and specify the MAP of slice 3 by its matrices Q_0 and Q_1. The MAP has been chosen so that its fundamental rate equals that of the other two arrival processes, but the variance of the interarrival time is substantially higher: 20.875 vs. 3.306.
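The interarrival-time mean and variance of a MAP can be verified from Q_0 and Q_1 via the standard moment formulas (the 2-phase matrices below are illustrative stand-ins, not the matrices used in this section; φ denotes the phase distribution right after an arrival):

```python
import numpy as np

# A hypothetical bursty 2-phase MAP: a fast phase and a slow phase.
Q0 = np.array([[-10.0,  0.0],
               [  0.0, -0.1]])
Q1 = np.array([[  9.9,  0.1],
               [ 0.05, 0.05]])

Q = Q0 + Q1
q = np.linalg.solve(np.vstack([Q.T[:-1], np.ones(2)]), np.array([0.0, 1.0]))
lam = q @ Q1 @ np.ones(2)               # fundamental rate
phi = (q @ Q1) / lam                    # phase distribution after an arrival
inv = np.linalg.inv(-Q0)
m1 = phi @ inv @ np.ones(2)             # mean interarrival time (= 1 / lam)
m2 = 2 * phi @ inv @ inv @ np.ones(2)   # second moment of the interarrival time
var = m2 - m1 ** 2
```

For this stand-in MAP, the mean interarrival time equals 1/λ, while the variance exceeds the squared mean, that is, the process is burstier than a Poisson process of the same rate.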
To obtain the numerical results presented in this section, we used UMFPACK routines for solving the global balance equations (26), while the optimization problem (3)-(5) was solved for n ∈ Ω_1 such that S_0(n) = ∅ using the Gradient Projection Method via Algorithm 3. First, we set the booked capacity shares of the slices equal to each other, γ_s = γ, s ∈ S, and vary γ from 0 to 1 with step 0.025 (which yields overbooking for γ > 1/3). The charts in Figures 1-3 permit comparing the proposed slicing scheme with the CS and CP policies in terms of the blocking probabilities (Figure 1), the average user data rate (Figure 2) and the average user session duration (Figure 3). Indeed, γ = 1 corresponds to CS with max-min resource allocation and the first-come, first-served (FCFS) discipline, since all the weights equal 1 and there are no preemptive priorities among slices. Such CS provides high resource utilization and max-min fair resource allocation to users, but no slice isolation. The values of the performance measures under CS are not explicitly indicated in Figures 1-3; they can be found on the corresponding curves of the slicing policy at γ = 1. The respective performance measures under CP, such that the capacity of each slice is fixed and equals exactly C/3, are shown by dashed lines. These values were obtained separately for each slice via the CTMC model of Section 4, with the rates of the arrival processes corresponding to the other two slices set to zero and the system's capacity set to C/3. Note that CP provides perfect slice isolation but may result in poor utilization and fairness.
The zigzag shape of the curves is due to the fact that, as stated in Section 2, γ_s is not used in the slicing scheme directly but through parameter G_s via the relation G_s = ⌊Cγ_s/a_s⌋, s ∈ S. As γ grows from 0 to 1, G_1 increases by 1 at γ = 0.05n, n = 1, 2, ..., 20, while G_2 and G_3 increase by 1 at γ = 0.025n, n = 1, 2, ..., 40. In particular, this explains why the average user data rate in slice 2 is higher than that in slice 1 for small γ > 0 (see Figure 2). Indeed, for γ = 0.025 we have G_1 = 0 and G_2 = G_3 = 1, which gives slices 2 and 3 an advantage over slice 1 in resource allocation. This advantage is slightly compensated at γ = 0.05, since here G_1 increases to 1, while G_2 = G_3 = 2, but becomes important again at γ = 0.075, since G_1 is still 1 while G_2 = G_3 = 3.
Overall, we can see that the proposed slicing scheme leads to much better network performance than the CP policy (static slicing), not only in terms of blocking probabilities (which are almost an order of magnitude higher under CP for all slices), but also in terms of average data rates (which are roughly half as large under CP) and average session durations (with factors that vary across slices due to their different characteristics). The higher blocking probability in slice 1 compared to slice 2 under CP or CS is due to the higher minimum data rate in slice 1, while the even higher blocking probability in slice 3 stems from the MAP parameters. As expected, the difference in the arrival-process characteristics between slices 1 and 2, on the one hand, and slice 3, on the other, clearly determines the behavior of the curves, with γ = 1/3 being an obvious turning point. Figures 1-3 demonstrate that the slicing policy protects slices 1 and 2 from the irregular slice 3 traffic: for smaller γ through resource allocation, and for larger γ also through resource preemption. Indeed, for γ = 0 there is no preemption of resources, but the more users a slice has, the less capacity per user it receives. Interestingly, the overall blocking probability for γ = 0 is slightly lower than for γ = 1.
While Figures 1-3 give insight into the functioning of the proposed slicing scheme, Figures 4-6 illustrate how the scheme can be used to accommodate slices with different traffic characteristics and balance their performance. Here, the booked capacity share of slice 3, γ_3, is plotted along the abscissa, while the booked shares of slices 1 and 2 are determined as γ_2 = (1 − γ_3)/3, γ_1 = 2γ_2 for fully booked capacity (solid lines) and as γ_2 = (1.1 − γ_3)/3, γ_1 = 2γ_2 for 10% overbooking (dashed lines). Figures 4-6 show, respectively, the blocking probabilities, average user data rates, and average user session durations as functions of γ_3. As one can see from Figure 4, by varying γ_s it is possible to protect some slices from the others to different extents, as well as to bring the blocking probability in each slice to the base-station average level. Note that the overall blocking probability is, again, slightly higher on average in the case of overbooking.
Figure 6. The average session duration T_s vs. γ_3; γ_2 = (α − γ_3)/3, γ_1 = 2γ_2 (solid lines: α = 1; dashed lines: α = 1.1).
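The booking rule behind Figures 4-6 amounts to a two-line computation; the helper name `booked_shares` and the parameter name `alpha` are ours, not the paper's.

```python
# Booking rule used for Figures 4-6: gamma_2 = (alpha - gamma_3)/3 and
# gamma_1 = 2*gamma_2, with alpha = 1 for fully booked capacity and
# alpha = 1.1 for 10% overbooking.
def booked_shares(gamma3, alpha=1.0):
    gamma2 = (alpha - gamma3) / 3.0
    return 2.0 * gamma2, gamma2, gamma3
```

By construction the three shares always sum to α, so sweeping γ_3 trades booked capacity between slice 3 and slices 1 and 2 while keeping the ratio γ_1 : γ_2 fixed at 2 : 1.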

Discussion
The paper addresses the problem of inter-slice resource allocation, that is, of sharing the capacity of a common infrastructure among slices with different QoS and SLA requirements. A flexible and customizable resource slicing scheme, which leads to efficient resource usage, fairness, and performance isolation of slices, is proposed along with an analytical CTMC model for its performance evaluation.
We assume each slice is homogeneous in terms of traffic characteristics and QoS requirements and assign to it three parameters: a non-zero minimum data rate per user, a maximum data rate per user, and a booked number of users. While the first two parameters reflect the QoS requirements, the third corresponds to the number of slice users for whom capacity is booked, although not reserved in the strict sense, meaning that it may still be allocated to another slice if not used up. As long as the number of users in each slice remains within the booked threshold, all users are treated equally, which implies FCFS admission and fair (e.g., max-min fair) resource allocation. However, if the number of users in some slice goes above the threshold, the slice gets penalized: first, by way of resource allocation (it gets "squeezed") and then, if capacity is insufficient, by way of admission control through the assignment of a lower preemptive priority. Thus, performance isolation is provided to slices as long as their number of users remains below a preset threshold, yet a larger number of users may receive service if the overall capacity and the workload in other slices permit.
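For the within-threshold regime, where all users receive fair treatment, a standard progressive-filling (water-filling) allocator can serve as a reference point; the per-user bounds below play the role of the minimum and maximum data rates just described, and the function name is ours, not the paper's.

```python
# Max-min fair allocation by progressive filling, with per-user rate bounds.
# A sketch of the fair-sharing baseline, not the paper's full convex program.
def max_min_fair(capacity, b_min, b_max):
    """Allocate `capacity` among users with rate bounds [b_min_i, b_max_i];
    returns the list of rates, or None if even the minima do not fit."""
    n = len(b_min)
    if sum(b_min) > capacity:
        return None                    # admission would have to be denied
    rates = list(b_min)
    active = set(range(n))             # users not yet capped at b_max
    spare = capacity - sum(rates)
    while active and spare > 1e-12:
        share = spare / len(active)    # equal share of the remaining capacity
        capped = False
        for i in list(active):
            inc = min(share, b_max[i] - rates[i])
            rates[i] += inc
            spare -= inc
            if b_max[i] - rates[i] <= 1e-12:
                active.discard(i)      # released capacity is redistributed
                capped = True
        if not capped:
            break                      # all active users got an equal share
    return rates
```

For example, with capacity 10 and bounds [1, 2], [1, 10], [1, 10], the first user is capped at 2 and the remaining capacity is split evenly, yielding rates 2, 4, 4.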
The slicing policy is formulated as a combination of a convex programming problem for resource allocation and an admission control policy with resource preemption. It is formalized in a way that allows for session-level modeling. We make use of Markov process theory and develop a multi-class service system with state-dependent preemptive priorities, in which job service rates are found as the solution to the optimization problem. A MAP is used for one class of jobs to represent a slice whose workload is less regular than one following a Poisson process. We apply matrix-analytic methods to obtain the stationary distribution of the CTMC representing the system and use it to derive expressions for the performance measures.
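To illustrate the kind of bursty arrival stream a MAP can supply for such a slice, the sketch below simulates a two-phase MAP (D0, D1) as a race of exponential events; the matrices in the usage note are illustrative values chosen by us, not the parameters of the numerical study.

```python
# Simulation of a Markovian Arrival Process (D0, D1): D0 holds phase-change
# rates without arrivals (negative diagonal), D1 holds rates with arrivals.
import random

def simulate_map(D0, D1, horizon, seed=0):
    """Return the arrival epochs of the MAP (D0, D1) on [0, horizon]."""
    rng = random.Random(seed)
    m = len(D0)
    phase, t, arrivals = 0, 0.0, []
    while True:
        total = -D0[phase][phase]          # total event rate in current phase
        t += rng.expovariate(total)
        if t > horizon:
            return arrivals
        u = rng.uniform(0.0, total)        # pick the competing event
        for j in range(m):                 # arrival events: entries of D1
            if u < D1[phase][j]:
                arrivals.append(t)
                phase = j
                break
            u -= D1[phase][j]
        else:                              # phase change without arrival: D0
            for j in range(m):
                if j == phase:
                    continue
                if u < D0[phase][j]:
                    phase = j
                    break
                u -= D0[phase][j]
```

For instance, D0 = [[-1.0, 0.1], [0.5, -9.5]] and D1 = [[0.9, 0.0], [0.0, 9.0]] (each row of D0 + D1 sums to zero) alternate a calm phase with a short high-rate phase, producing the clustered arrivals a Poisson process cannot capture.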
The numerical results provided in Section 5 give insight into the performance of the proposed slicing policy, in particular in comparison with CS and CP. Also, an algorithm for solving the optimization problem, based upon the Gradient Projection Method, is suggested. The numerical results show the capability of the proposed slicing scheme to accommodate heterogeneous traffic and provide performance isolation of slices while maintaining resource utilization at the CS level, thus clearly outperforming CP. The scheme is highly flexible and customizable, and further research is hence needed to provide clear guidance on its parametrization and on the choice of utility/weight functions. Moreover, future work will be aimed at extending the slicing policy to allow for slice prioritization and at leaving admission control to the discretion of slice tenants.