1. Introduction
Networks will fundamentally transform digital ecosystems by delivering unprecedented data rates, ultra-low latency, and intelligent resource orchestration. To support emerging applications such as extended reality, autonomous systems, and real-time industrial automation, multi-access edge computing (MEC) has emerged as a critical architectural paradigm that extends cloud-like capabilities to the network edge [1,2]. As communication infrastructures evolve beyond 5G, decentralization of computational resources becomes increasingly essential to meet stringent quality of service (QoS) requirements [2,3]. By deploying computing resources in close proximity to end-users, MEC significantly reduces latency and enhances overall network efficiency [3].
The fundamental distinction between MEC and traditional cloud computing lies in their topological organization. Cloud computing relies on centralized, often remote data centers, where service provisioning management involves sophisticated semantic and data-driven frameworks [4,5]. MEC, by contrast, employs a distributed network of edge nodes geographically near users. This proximity is indispensable for latency-sensitive applications, including immersive media, smart city services, and mission-critical industrial operations [2,6]. Service migration, the process of relocating a running service or its state between computational nodes, serves as a key mechanism to maintain QoS adherence and optimize resource utilization under dynamic conditions [7,8]. Unlike in cloud environments, where migration primarily targets load balancing and cost reduction, MEC migration must prioritize latency reduction and real-time performance guarantees [7,9].
Recent studies have explored adaptive migration strategies using machine learning and deep reinforcement learning (DRL) [10,11,12]. However, most existing models adopt an all-or-nothing approach, assuming that a service resides entirely either in the cloud or at the edge at any given time [7,8]. This assumption overlooks the benefits of split-user offloading, wherein a service’s user base is dynamically partitioned between the edge and the cloud. Furthermore, DRL-based approaches operate as black boxes and lack closed-form performance guarantees, which hinders their deployment in systems requiring auditable, low-overhead control logic. As data-driven decision support approaches gain traction in cloud-based service provisioning [5], the need for rigorous analytical models of hybrid edge–cloud systems becomes particularly acute. Queueing theory provides a powerful mathematical foundation for modeling and optimizing such systems [13,14], offering tractable tools to evaluate performance metrics and design efficient resource management strategies under uncertainty. To the best of our knowledge, no existing work provides a continuous-time Markov chain (CTMC)-based model that simultaneously captures split-user offloading and an explicit delay-aware migration policy in a hybrid MEC–cloud system; this is the gap this paper addresses.
A related but different line of research considers task-level offloading between mobile devices and MEC servers. In such models, a computational task generated by a user may be executed locally, offloaded to the MEC, or split between local and edge resources, often under wireless-channel and energy constraints. Examples include energy-latency trade-off optimization in MEC networks, IRS-aided binary offloading, and RIS-aided cooperative MEC systems [15,16,17]. These studies operate mainly at the device–edge computation layer. In contrast, the present paper considers the MEC–cloud orchestration layer: individual user sessions are not divided into local and edge subtasks; instead, the population of users of the same service is split between a local MEC node and the corresponding cloud path. Therefore, the local/MEC split and the MEC/Cloud split address complementary levels of the offloading problem.
In this paper, we propose a queueing-theoretic framework for adaptive service migration in hybrid MEC–cloud environments with split-user offloading capability. The system comprises a resource-constrained MEC node and multiple remote cloud instances, each permanently hosting a dedicated service. A distinctive feature of the model is the ability to dynamically offload a subset of a service’s users from the cloud to the MEC node, while the MEC node hosts users from only one service at a time. Building upon our earlier work [18,19], this paper establishes a rigorous mathematical foundation for analyzing hybrid MEC–cloud systems with adaptive split-user migration. Practical deployments require consideration of additional factors such as migration overhead, heterogeneous network conditions, and multi-node edge environments; however, a solid analytical framework is essential for systematic exploration of fundamental system properties before addressing these applied challenges.
The main contributions of our study are as follows:
- 1. We propose a hybrid MEC–cloud queueing model that formalizes split-user offloading and a delay-aware greedy orchestration policy, which determines at every arrival and departure event which service is active at the MEC node and how many of its users are offloaded from the cloud.
- 2. We prove that the stationary distribution of the underlying CTMC admits a product-form solution that decouples across services, and derive closed-form expressions for the mean end-to-end (E2E) delay, per-service MEC hosting probability, MEC occupancy/saturation, and delay-saving indicators.
- 3. We validate the closed-form expressions using a discrete-event simulation (DES), extend the numerical evaluation to a heterogeneous five-service industrial MEC–cloud scenario, compare the proposed split-user offloading policy with a cloud-only baseline, and examine the sensitivity of the results to alternative arrival and service-time distributions.
In contrast to monolithic migration, where a service is moved to the edge as a whole, the proposed split-user mechanism allows only a capacity-limited subset of users of the selected service to be served at the MEC, while the remaining users continue to be served by the cloud.
The remainder of this paper is organized as follows:
Section 2 reviews related work on MEC service migration, orchestration policies, and analytical queueing models.
Section 3 details the system model and the proposed migration policy.
Section 4 develops the queueing model and derives closed-form performance metrics.
Section 5 presents numerical results and discusses practical implications and limitations.
Section 6 concludes this paper and outlines future research directions.
3. System Model
In this section, we formalize the hybrid MEC–cloud architecture and define the migration policy that governs user placement. We first describe the system architecture and its key assumptions, then introduce the delay model and the QoS objective that motivates the migration decisions, and finally specify the delay-aware greedy orchestration policy, including its triggering conditions and migration logic.
3.1. System Architecture
This work considers a hybrid computing system consisting of a single resource-constrained MEC node and $K$ remote cloud instances, one per service class. Let $\mathcal{K} = \{1, \dots, K\}$ denote the set of services. Each service $k \in \mathcal{K}$ is permanently hosted by a dedicated cloud instance with bandwidth capacity $C_k$.
This architecture is used as a focused abstraction of an industrial edge deployment in which one local MEC node is attached to several existing back-end cloud services, such as control, telemetry, rendering, or analytics services. The single-MEC assumption isolates the main bottleneck considered in this paper: how a limited edge resource should be shared between heterogeneous service classes. A multi-MEC topology with inter-edge migration is an important extension, but it would introduce additional routing and association decisions beyond the scope of the present CTMC model.
The dedicated-cloud-per-service representation should be interpreted as a per-class cloud path with its own capacity and effective delay, rather than as a restriction that a physical cloud platform can run only one service. In practice, these parameters can represent logically separated service backends or isolated cloud slices. A shared elastic cloud can be incorporated in future work by making the per-class cloud capacity and effective cloud delay state-dependent.
The MEC node has a limited bandwidth capacity $C$ and can simultaneously serve users of only one active service $s \in \mathcal{K} \cup \{0\}$, where $s = 0$ denotes the idle state. For the active service $s$, exactly $m$ of its users are placed on the MEC node, while the remaining users of that service continue to be served by the corresponding cloud instance; users of all other services reside entirely in their respective clouds. This splitting of a service’s user base between the edge and the cloud is referred to as the split-user offloading mechanism.
Users of each service $k$ arrive according to a Poisson process with rate $\lambda_k$ (in 1/s) and have exponentially distributed service times with mean $1/\mu_k$ (in s), independently of their placement. When placed on the MEC, each user of service $k$ occupies a fixed bandwidth $b_k$ bps; cloud users consume no MEC resources. The maximum number of users that can be simultaneously hosted on the MEC for service $s$ is $M_s$, and the maximum cloud capacity for service $k$ is $N_k$. The cloud always retains sufficient capacity to accept users already admitted to the system when they are reassigned from the MEC back to the cloud. New arrivals, however, are admitted only while $n_k < N_k$, where $n_k$ is the current number of active users of service $k$; otherwise, they are rejected by the finite admission bound introduced in assumption (A3). The main notation is summarized in Table 1.
3.2. Modeling Assumptions
For clarity and to make the scope of the analytical model explicit, we summarize the modeling assumptions used throughout this paper.
- (A1) Poisson arrivals. Users of service class $k$ arrive according to an independent Poisson process with rate $\lambda_k$. This assumption makes the vector of active users Markovian and is used in the product-form derivation.
- (A2) Exponential service times. The service time of each user of class $k$ is exponentially distributed with mean $1/\mu_k$ and is independent of the user’s placement. Placement at the MEC or in the cloud changes the experienced end-to-end delay, but not the service completion rate. This assumption is required for the CTMC representation and is relaxed numerically through alternative service-time distributions.
- (A3) Finite admission capacity. For each service class $k$, the total number of active users $n_k$ is bounded by $N_k$. An arriving user of class $k$ is admitted only when $n_k < N_k$; otherwise, the arrival is rejected by the finite admission bound. This bound may represent a finite cloud-side admission limit or a numerical truncation of the state space. Users already admitted to the system are never lost when they are reassigned between the MEC and the cloud.
- (A4) Single active service at the MEC. At any time, the MEC node hosts users of at most one service class $s \in \mathcal{K} \cup \{0\}$, where $s = 0$ denotes the idle state. This captures the case where a resource-constrained MEC node runs one active service container or network function at a time.
- (A5) Instantaneous and reversible reconfiguration. Changes in the active service $s$ and in the number of MEC-hosted users $m$ occur instantaneously and do not consume additional bandwidth in the baseline CTMC model. The baseline CTMC therefore provides an idealized lower-delay reference case; explicit migration and reconfiguration overhead is discussed as a modeling limitation and left for future migration-aware extensions.
- (A6) Class-dependent effective delays. A user served at the MEC experiences delay $\tau_0$, while a user of service class $k$ served in the cloud experiences delay $\tau_k$. These values are interpreted as effective mean end-to-end delays that aggregate radio transmission, transport, processing, buffering, and protocol overheads. They can be obtained from measurements or from a lower-layer channel-aware model.
3.3. Delay Model and QoS Objectives
The E2E delay represents the total round-trip time experienced by a user, encompassing radio-channel transmission, processing at the serving node, buffering, encoding/decoding, and the return path. A user placed on the MEC node experiences a fixed delay $\tau_0$ due to physical proximity and the absence of a long-haul transmission segment. A user of service $k$ served in the cloud experiences a significantly higher delay $\tau_k > \tau_0$, which includes inter-domain transmission, remote processing, and the return path. The quantity $\Delta_k = \tau_k - \tau_0$ measures the per-user latency benefit of MEC placement for service $k$ and serves as the primary criterion for migration decisions: a larger $\Delta_k$ implies a greater gain from moving users to the edge.
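The QoS objective of the orchestration layer is to minimize the instantaneous total system delay, defined as the sum of E2E delays over all users present in the system:
$$D(\mathbf{n}, m, s) = m\,\tau_0 + (n_s - m)\,\tau_s + \sum_{k \neq s} n_k \tau_k = \sum_{k \in \mathcal{K}} n_k \tau_k - m\,(\tau_s - \tau_0), \tag{1}$$
where $\mathbf{n} = (n_1, \dots, n_K)$ is the vector of total user counts per service, $m$ is the number of active-service users at the MEC, and $s$ is the index of the active service. When $s = 0$ (MEC idle), $m = 0$ and the formula reduces to $D = \sum_{k \in \mathcal{K}} n_k \tau_k$, since all users reside in their respective clouds. Minimizing (1) at each event is the basis of the orchestration policy described next.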
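As a minimal illustration of the total-delay objective, the sketch below evaluates the sum of E2E delays for a given load vector and MEC placement. The function name `total_delay`, the argument names, the zero-based service indexing, and the use of `None` for the idle MEC state are assumptions made for this example only; the numerical values in the usage note are illustrative and are not the parameters of Table 3.

```python
from typing import Optional, Sequence


def total_delay(n: Sequence[int], m: int, s: Optional[int],
                tau_cloud: Sequence[float], tau_mec: float) -> float:
    """Sum of E2E delays over all users present in the system.

    n[k]         -- total number of active users of service k
    m            -- number of users of the active service placed at the MEC
    s            -- index of the active service, or None if the MEC is idle
    tau_cloud[k] -- effective cloud delay of service k
    tau_mec      -- effective MEC delay
    """
    # Delay if every user were served through its cloud path.
    cloud_only = sum(nk * tk for nk, tk in zip(n, tau_cloud))
    if s is None:
        if m != 0:
            raise ValueError("an idle MEC cannot host users")
        return cloud_only
    if not 0 <= m <= n[s]:
        raise ValueError("m must lie between 0 and n[s]")
    # Each of the m edge-hosted users saves (tau_cloud[s] - tau_mec).
    return cloud_only - m * (tau_cloud[s] - tau_mec)
```

For example, with two hypothetical services with cloud delays of 60 ms and 150 ms, an MEC delay of 5 ms, and load vector (3, 2), placing both users of the second service at the MEC gives 3·60 + 2·5 = 190 ms, compared with 480 ms when the MEC is idle.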
3.4. Migration Orchestration Policy
User placement is governed by a delay-aware greedy orchestration policy that operates in an event-driven manner: a placement decision is made upon every user arrival and every service completion. At each event, the policy selects the service $s$ and the number of edge-hosted users $m$ that minimize the total delay (1) subject to the capacity constraints $m \le \min(n_s, M_s)$ and $n_k \le N_k$ for all $k \in \mathcal{K}$. Priority is given to the service with the largest delay difference $\Delta_k$, since its users derive the greatest benefit from MEC placement.
The policy distinguishes three outcomes after each event. If the currently active service $s$ remains optimal, the MEC placement is updated in place: $m$ increases by one on an arrival to $s$ (provided MEC capacity permits) or decreases by one on a departure from $s$. If an alternative service $l \neq s$ becomes preferable, full migration occurs when $n_l \le M_l$, switching the active service to $l$ and placing all $n_l$ of its users on the MEC; split-user offloading occurs when $n_l > M_l$, placing $M_l$ users of service $l$ on the MEC and leaving the remaining $n_l - M_l$ in the cloud. The conditions triggering each outcome are summarized in Table 2. Migration is instantaneous and costless in this model (an idealization discussed in Section 5) and does not cause loss of users already admitted to the system, since cloud-side reassignment is assumed feasible under (A3).
The reconfiguration is interpreted at the session-routing level. The model does not assume that a partially executed task is checkpointed and moved between the cloud and the MEC; instead, subsequent requests of the affected session are served according to the updated placement. Any additional handover or warm-up delay is outside the baseline CTMC model and is discussed as a limitation.
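The greedy decision rule can be sketched as follows. Because the all-cloud term of the total delay does not depend on the placement, minimizing the total delay is equivalent to maximizing the aggregate saving, i.e., the per-user delay benefit multiplied by the number of edge-hosted users. The names `greedy_placement`, `delta`, and `mec_cap`, the zero-based indexing, and the tie-breaking rule (lowest index wins) are assumptions of this sketch; the paper does not specify how ties are resolved.

```python
from typing import List, Optional, Sequence, Tuple


def greedy_placement(n: Sequence[int], delta: Sequence[float],
                     mec_cap: Sequence[int]) -> Tuple[Optional[int], int]:
    """Return (s, m): the active service and the number of its users at the MEC.

    n[k]       -- total number of active users of service k
    delta[k]   -- per-user delay benefit of MEC placement for service k
    mec_cap[k] -- maximum number of users of service k the MEC can host
    """
    best_s: Optional[int] = None   # None represents the idle MEC
    best_m = 0
    best_saving = 0.0
    for k, (nk, dk, ck) in enumerate(zip(n, delta, mec_cap)):
        m = min(nk, ck)            # full migration if nk <= ck, split otherwise
        saving = dk * m            # aggregate instantaneous delay saving
        if saving > best_saving:   # strict ">" keeps the lowest index on ties (assumed)
            best_s, best_m, best_saving = k, m, saving
    return best_s, best_m


def outcome(n_k: int, cap_k: int) -> str:
    """Label the placement type for the selected service."""
    return "full migration" if n_k <= cap_k else "split-user offloading"
```

With `delta = [55.0, 145.0]` and `mec_cap = [2, 2]`, the load vector (3, 1) selects the second service with one edge-hosted user (saving 145 versus 110), whereas (3, 0) selects the first service in split-user mode with two of its three users at the MEC.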
4. Queueing Model
In this section, we develop the queueing-theoretic model for the system. We define the CTMC state space and embed the migration policy into the stochastic formulation, specify the transition rates governing system dynamics, establish the product-form stationary distribution, and derive closed-form expressions for the key performance metrics: average E2E delay, per-service hosting probability, and MEC utilization.
4.1. CTMC Formulation and State Space
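To describe the system dynamics, we introduce a CTMC $\{X(t), t \ge 0\}$, where $X(t) = (\mathbf{n}(t), m(t), s(t))$ captures the total number of active users per service, the number of users of the active service placed at the MEC node, and the active service index at time $t$. The extended state space $\mathcal{X}$ consists of all triples $(\mathbf{n}, m, s)$ satisfying the resource capacity constraints:
$$\mathcal{X} = \bigl\{(\mathbf{n}, m, s) : 0 \le n_k \le N_k \ \forall k \in \mathcal{K},\ s \in \mathcal{K} \cup \{0\},\ 0 \le m \le \min(n_s, M_s),\ m = 0 \text{ if } s = 0 \bigr\}.$$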
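State transitions in $\mathcal{X}$ occur at user arrival and service completion epochs. Migration is not an independent event type: it occurs instantaneously at the same epoch, whenever the policy requires a change in $s$ or $m$. Let $\mathbf{e}_k$ denote the unit vector with a one in position $k$. The transition structure is as follows.
Arrivals. When a user of service $k$ arrives in state $(\mathbf{n}, m, s)$ with $n_k < N_k$, the load vector updates to $\mathbf{n} + \mathbf{e}_k$ and the policy is re-evaluated:
If $k = s$ and $m < M_s$: no migration; new state $(\mathbf{n} + \mathbf{e}_k, m + 1, s)$.
If $k = s$ and $m = M_s$: MEC saturated, user goes to cloud; new state $(\mathbf{n} + \mathbf{e}_k, m, s)$.
If $k \neq s$ and service $k$ does not become preferable: no migration; new state $(\mathbf{n} + \mathbf{e}_k, m, s)$.
If $k \neq s$ and full migration to $k$ (i.e., service $k$ becomes preferable and $n_k + 1 \le M_k$): new state $(\mathbf{n} + \mathbf{e}_k, n_k + 1, k)$.
If $k \neq s$ and split-user offloading to $k$ (i.e., service $k$ becomes preferable and $n_k + 1 > M_k$): new state $(\mathbf{n} + \mathbf{e}_k, M_k, k)$.
Departures. When a user of service $k$ completes in state $(\mathbf{n}, m, s)$ with $n_k > 0$, the load vector updates to $\mathbf{n} - \mathbf{e}_k$ and the policy is re-evaluated:
If $k = s$ and $m > 0$: user was at MEC; new state $(\mathbf{n} - \mathbf{e}_k, m - 1, s)$, followed by possible migration if another service $l$ becomes preferable.
If $k \neq s$: user was in the cloud of service $k$; new state $(\mathbf{n} - \mathbf{e}_k, m, s)$, with possible migration if service $s$ is no longer optimal.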
In all cases, the post-event migration step applies the conditions of Table 2 with the MEC occupancy equal to the updated value of $m$ after the arrival or departure, and selects the state $(\mathbf{n}', m', s')$ that minimizes the total delay $D$ from (1).
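The migration policy defined in Section 3 uniquely maps every vector $\mathbf{n}$ to a placement decision $(s^*(\mathbf{n}), m^*(\mathbf{n}))$ via:
$$s^*(\mathbf{n}) = \arg\max_{k \in \mathcal{K}} \Delta_k \min(n_k, M_k), \tag{2}$$
$$m^*(\mathbf{n}) = \min\bigl(n_{s^*(\mathbf{n})}, M_{s^*(\mathbf{n})}\bigr), \tag{3}$$
with $s^*(\mathbf{0}) = 0$ and $m^*(\mathbf{0}) = 0$ when the system is empty.
Consequently, the extended state $(\mathbf{n}, m, s)$ is fully determined by $\mathbf{n}$ alone, and the effective state space reduces to:
$$\mathcal{N} = \bigl\{\mathbf{n} \in \mathbb{Z}_{\ge 0}^{K} : n_k \le N_k \ \text{for all } k \in \mathcal{K}\bigr\}.$$
The process $\{\mathbf{n}(t), t \ge 0\}$ is therefore a well-defined CTMC on $\mathcal{N}$ with generator $Q$. The CTMC is irreducible: the zero state $\mathbf{0}$ is reachable from any state via a finite sequence of departures, and any state is reachable from $\mathbf{0}$ via a finite sequence of arrivals. Because the state space is finite, irreducibility guarantees the existence and uniqueness of the stationary distribution. Because migration is an internal reconfiguration that does not alter the arrival or service-completion rates of any service class, the stationary distribution over $\mathcal{N}$ is independent of the specific migration policy.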
4.2. Transition Rates
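Projected onto the effective state space $\mathcal{N}$, the dynamics of the CTMC $\{\mathbf{n}(t)\}$ are governed by the following transition rates: arrivals increment the vector $\mathbf{n}$ by $\mathbf{e}_k$ at rate $\lambda_k$ (provided $n_k < N_k$); departures decrement the vector $\mathbf{n}$ by $\mathbf{e}_k$ at rate $n_k \mu_k$ (provided $n_k > 0$):
$$q(\mathbf{n}, \mathbf{n} + \mathbf{e}_k) = \lambda_k \ \ (n_k < N_k), \qquad q(\mathbf{n}, \mathbf{n} - \mathbf{e}_k) = n_k \mu_k \ \ (n_k > 0). \tag{6}$$
The diagonal elements are set in the standard way, $q(\mathbf{n}, \mathbf{n}) = -\sum_{\mathbf{n}' \neq \mathbf{n}} q(\mathbf{n}, \mathbf{n}')$. After each transition $\mathbf{n} \to \mathbf{n}'$, the placement $(s, m)$ is updated instantaneously according to (2) and (3). The transition rates (6) depend only on $\mathbf{n}$ and not on the current placement $(s, m)$, which is the key structural property exploited in the next subsection.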
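To make the transition structure concrete, the following sketch builds the generator matrix on the truncated load-vector state space using only the Python standard library. It assumes per-class arrival rates, per-user service rates, and per-class admission bounds as inputs; the function name `build_generator`, the argument names, and the dense list-of-lists representation are illustrative choices, suitable only for small state spaces.

```python
from itertools import product
from typing import List, Sequence, Tuple

State = Tuple[int, ...]


def build_generator(lam: Sequence[float], mu: Sequence[float],
                    n_max: Sequence[int]) -> Tuple[List[State], List[List[float]]]:
    """Generator of the load-vector CTMC with arrivals and per-user departures.

    lam[k]   -- arrival rate of class k (blocked when n[k] == n_max[k])
    mu[k]    -- service completion rate of a single class-k user
    n_max[k] -- admission bound for class k
    """
    states = list(product(*(range(N + 1) for N in n_max)))
    index = {st: i for i, st in enumerate(states)}
    size = len(states)
    Q = [[0.0] * size for _ in range(size)]
    for st, i in index.items():
        for k in range(len(lam)):
            if st[k] < n_max[k]:                      # admissible arrival
                up = st[:k] + (st[k] + 1,) + st[k + 1:]
                Q[i][index[up]] = lam[k]
            if st[k] > 0:                             # service completion
                down = st[:k] + (st[k] - 1,) + st[k + 1:]
                Q[i][index[down]] = st[k] * mu[k]
        Q[i][i] = -sum(Q[i])                          # rows of a generator sum to zero
    return states, Q
```

Note that the placement variables do not appear anywhere in this construction, which mirrors the observation that the rates depend only on the load vector.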
4.3. Product-Form Stationary Distribution
The stationary distribution over the effective state space $\mathcal{N}$ admits a product form. The key observation is that the migration policy changes only the placement of already admitted users between the MEC and the cloud, whereas the load vector $\mathbf{n}$ changes only due to arrivals and service completions.
Proposition 1. Under assumptions (A1)–(A5), the finite-state CTMC $\{\mathbf{n}(t)\}$ on $\mathcal{N}$ is irreducible and therefore positive recurrent. Its unique stationary distribution is:
$$\pi(\mathbf{n}) = \frac{1}{G} \prod_{k=1}^{K} \frac{\rho_k^{n_k}}{n_k!}, \qquad \mathbf{n} \in \mathcal{N}, \tag{7}$$
where $\rho_k = \lambda_k / \mu_k$ is the offered load of service class $k$, and:
$$G = \sum_{\mathbf{n} \in \mathcal{N}} \prod_{k=1}^{K} \frac{\rho_k^{n_k}}{n_k!} = \prod_{k=1}^{K} \sum_{j=0}^{N_k} \frac{\rho_k^{j}}{j!}$$
is the normalization constant.
Proof. First, the state space $\mathcal{N}$ is finite. From any state $\mathbf{n} \in \mathcal{N}$, the zero state can be reached through a finite sequence of service completions. Conversely, any state in $\mathcal{N}$ can be reached from the zero state through a finite sequence of admissible arrivals. Since the corresponding transition rates are positive on these edges, the CTMC is irreducible. Finiteness then implies positive recurrence and uniqueness of the stationary distribution.
Second, under the greedy policy (2) and (3), the placement variables $(s, m)$ are deterministic functions of the load vector $\mathbf{n}$. A migration or reconfiguration event changes only this placement; it does not create or remove users. Therefore, the marginal process $\{\mathbf{n}(t)\}$ evolves only through the birth–death transitions:
$$\mathbf{n} \to \mathbf{n} + \mathbf{e}_k \ \text{at rate } \lambda_k \ (n_k < N_k), \qquad \mathbf{n} \to \mathbf{n} - \mathbf{e}_k \ \text{at rate } n_k \mu_k \ (n_k > 0).$$
These are exactly the transition rates defined in Section 4.2, and they do not depend on the current MEC placement $(s, m)$. This is the structural reason why the stationary distribution of the load vector is independent of the placement policy, provided that the policy only reassigns already admitted users and does not modify the class-wise arrival and service completion dynamics.
Third, the distribution (7) satisfies the detailed-balance equations on every admissible edge of the state space. Indeed, for any $\mathbf{n} \in \mathcal{N}$ and $k$ such that $n_k < N_k$:
$$\pi(\mathbf{n})\,\lambda_k = \pi(\mathbf{n} + \mathbf{e}_k)\,(n_k + 1)\,\mu_k.$$
Substituting (7) into both sides gives:
$$\frac{\lambda_k}{G} \prod_{j=1}^{K} \frac{\rho_j^{n_j}}{n_j!} = \frac{1}{G}\,\frac{\rho_k}{n_k + 1}\,(n_k + 1)\,\mu_k \prod_{j=1}^{K} \frac{\rho_j^{n_j}}{n_j!},$$
which holds because $\rho_k = \lambda_k / \mu_k$. Hence, the detailed-balance relations hold for all neighboring states, and therefore $\pi Q = 0$. By uniqueness of the invariant distribution on the finite irreducible CTMC, $\pi$ in (7) is the stationary distribution.
The migration policy affects performance metrics through the deterministic projection $\mathbf{n} \mapsto (s^*(\mathbf{n}), m^*(\mathbf{n}))$, but it does not affect the stationary distribution of $\mathbf{n}$ itself. Consequently, delay, MEC hosting probabilities, and saturation metrics are computed by weighting the policy-induced placement decisions with the product-form probabilities (7), without enumerating the full extended state space $\mathcal{X}$. □
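The product-form result can be checked numerically on a small instance. The sketch below computes the truncated product-form probabilities by direct enumeration and verifies the detailed-balance relation on every arrival/departure edge. The function names `product_form` and `detailed_balance_holds` and the parameter values in the usage note are assumptions chosen for illustration, not values from the paper.

```python
from itertools import product
from math import factorial, isclose
from typing import Dict, Sequence, Tuple

State = Tuple[int, ...]


def product_form(rho: Sequence[float], n_max: Sequence[int]) -> Dict[State, float]:
    """Stationary probabilities proportional to prod_k rho_k**n_k / n_k!."""
    weights: Dict[State, float] = {}
    for st in product(*(range(N + 1) for N in n_max)):
        w = 1.0
        for r, nk in zip(rho, st):
            w *= r ** nk / factorial(nk)
        weights[st] = w
    G = sum(weights.values())                     # normalization constant
    return {st: w / G for st, w in weights.items()}


def detailed_balance_holds(pi: Dict[State, float], lam: Sequence[float],
                           mu: Sequence[float]) -> bool:
    """Check pi(n) * lam_k == pi(n + e_k) * (n_k + 1) * mu_k on every edge."""
    for st, p in pi.items():
        for k in range(len(lam)):
            up = st[:k] + (st[k] + 1,) + st[k + 1:]
            if up in pi and not isclose(p * lam[k], pi[up] * up[k] * mu[k],
                                        rel_tol=1e-9):
                return False
    return True
```

For instance, with arrival rates (1, 2), service rates (1, 0.5), and admission bounds (2, 1), the offered loads are (1, 4), the normalization constant factorizes as (1 + 1 + 0.5)·(1 + 4) = 12.5, and the probability of the empty state is 1/12.5 = 0.08.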
4.4. Performance Metrics
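Using the stationary distribution (7), we derive closed-form expressions for the key QoS indicators evaluated at the user-session level. Here $s^*(\mathbf{n})$ and $m^*(\mathbf{n})$ denote the placement selected by the policy (2) and (3) in load state $\mathbf{n}$.
The hosting probability of service $k$, i.e., the fraction of time the MEC node is assigned to service $k$, is:
$$P_k = \sum_{\mathbf{n} \in \mathcal{N}:\ s^*(\mathbf{n}) = k} \pi(\mathbf{n}).$$
The average number of users of service $k$ at the MEC node is:
$$\bar{m}_k = \sum_{\mathbf{n} \in \mathcal{N}:\ s^*(\mathbf{n}) = k} m^*(\mathbf{n})\, \pi(\mathbf{n}).$$
The mean total system delay is the expected sum of E2E delays over all users in the system:
$$\bar{D} = \sum_{\mathbf{n} \in \mathcal{N}} \pi(\mathbf{n})\, D\bigl(\mathbf{n}, m^*(\mathbf{n}), s^*(\mathbf{n})\bigr).$$
The average delay saving due to MEC placement for service $k$ is:
$$S_k = \Delta_k\, \bar{m}_k,$$
and the overall delay saving is $S = \sum_{k \in \mathcal{K}} S_k$. Finally, the MEC saturation probability is:
$$P_{\mathrm{sat}} = \sum_{\mathbf{n} \in \mathcal{N}:\ s^*(\mathbf{n}) \neq 0,\ m^*(\mathbf{n}) = M_{s^*(\mathbf{n})}} \pi(\mathbf{n}).$$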
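A minimal sketch of how such metrics can be evaluated is given below: it enumerates the load vectors, weights each by its product-form probability, and applies the greedy placement in every state. The function name `performance_metrics`, the dictionary keys, the tie-breaking rule (lowest service index), and the interpretation of the per-service delay saving as the delay benefit times the mean number of edge-hosted users are assumptions of this example.

```python
from itertools import product
from math import factorial
from typing import Dict, Sequence


def performance_metrics(rho: Sequence[float], n_max: Sequence[int],
                        mec_cap: Sequence[int], tau_cloud: Sequence[float],
                        tau_mec: float) -> Dict[str, object]:
    """Weight the policy-induced placement by product-form probabilities."""
    K = len(rho)
    delta = [t - tau_mec for t in tau_cloud]     # per-user MEC benefit per class
    hosting = [0.0] * K                          # P(MEC assigned to class k)
    mec_users = [0.0] * K                        # mean class-k users at the MEC
    mean_delay = saturation = norm = 0.0
    for n in product(*(range(N + 1) for N in n_max)):
        w = 1.0
        for r, nk in zip(rho, n):
            w *= r ** nk / factorial(nk)         # unnormalized product-form weight
        norm += w
        # Greedy placement: class with the largest aggregate saving
        # (ties resolved in favor of the lowest index, an assumption).
        s, m, best = None, 0, 0.0
        for k in range(K):
            mk = min(n[k], mec_cap[k])
            if delta[k] * mk > best:
                s, m, best = k, mk, delta[k] * mk
        mean_delay += w * (sum(nk * t for nk, t in zip(n, tau_cloud)) - best)
        if s is not None:
            hosting[s] += w
            mec_users[s] += w * m
            saturation += w * (m == mec_cap[s])  # MEC filled to class capacity
    hosting = [h / norm for h in hosting]
    mec_users = [u / norm for u in mec_users]
    return {
        "hosting": hosting,
        "mec_users": mec_users,
        "mean_delay": mean_delay / norm,
        "saving": [d * u for d, u in zip(delta, mec_users)],
        "saturation": saturation / norm,
    }
```

As a hand-checkable case, a single service with offered load 1, admission bound 2, MEC capacity 1, cloud delay 60 ms, and MEC delay 5 ms has state probabilities (0.4, 0.4, 0.2), so the hosting probability is 0.6, the delay saving is 55·0.6 = 33 ms, and the mean total delay is 48 − 33 = 15 ms.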
5. Numerical Results and Discussion
In this section, we present the numerical results for the proposed hybrid MEC–cloud framework. We describe the industrial scenario and parameter baseline, analyze the impact of traffic load on all key metrics, examine the sensitivity of mean delay to cloud path quality and service duration, and discuss modeling limitations.
5.1. Scenario Description
The numerical study is carried out for a five-service industrial MEC–cloud scenario whose architecture is illustrated in Figure 1. The five service classes represent qualitatively distinct industrial applications with heterogeneous latency requirements, traffic intensities, and service durations. The default system parameters are summarized in Table 3. The MEC delay is fixed at 5 ms for all services, whereas the cloud delays range from 60 ms to 150 ms. Therefore, the delay benefit of MEC placement, $\Delta_k$ (cloud delay minus MEC delay), is service-dependent and ranges from 55 ms for Service 3 to 145 ms for Service 4.
The MEC delay is ms for all service classes; therefore, the corresponding delay benefits are ms for Services 1–5, respectively. The offered loads are computed as , giving , , , , and .
The parameter set deliberately combines service classes with different roles. Service 4 is the most delay-critical class: it has the largest cloud delay (150 ms) and the largest MEC placement benefit (145 ms), although its offered load is moderate. Service 5 is the most load-intensive class, with the highest offered load and a substantial delay benefit. Service 2 combines unit offered load ($\rho_2 = 1$) with a large delay benefit, making it another strong candidate for MEC placement. By contrast, Service 1 has moderate load and moderate latency gain, whereas Service 3 has the smallest cloud delay (60 ms) and the smallest MEC placement benefit (55 ms), making it the least latency-critical class in the considered scenario.
Under the policy (2) and (3), the MEC placement priority is not determined by $\Delta_k$ alone. Instead, the active service is selected according to the aggregate instantaneous delay saving $\Delta_k \min(n_k, M_k)$. Therefore, the actual MEC occupancy depends jointly on the delay benefit, the offered load, and the service-specific MEC capacity $M_k$. In the baseline configuration, Services 2, 4, and 5 are expected to be the main competitors for MEC placement. The sensitivity analyses that follow vary the load scaling factor $a$ (all $\lambda_k$ scaled by $a$), the cloud-to-MEC delay ratio, and the service-time scale multiplier; for each experiment, only the indicated parameter deviates from the baseline in Table 3.
5.2. Impact of Traffic Load
We first examine how the main performance metrics evolve as the traffic intensity is uniformly scaled. The load scaling factor $a$ multiplies all arrival rates simultaneously, $\lambda_k \mapsto a\,\lambda_k$ for all $k$, so that the offered loads grow proportionally, while the service rates, capacity constraints, and delay parameters remain fixed.
Figure 2 shows the probability
that service
k is active at the MEC node. At the baseline load, the MEC node is occupied for approximately
of the time, with idle probability only about
. The hosting probabilities are highly uneven:
,
,
,
, and
. Thus, MEC occupancy is mainly shared by Services 2, 4, and 5. Service 2 is selected most frequently because it combines unit offered load with a large delay benefit
ms. Service 4 is also frequently selected owing to the largest per-user delay benefit
ms, while Service 5 receives a substantial share of MEC hosting time because it has the highest offered load,
. Services 1 and 3 are selected only rarely since their aggregate instantaneous delay saving
is typically lower.
Figure 3 shows the conditional probability that the active service fully occupies the MEC node, i.e., the probability that
under the condition
. This metric characterizes how often the selected service reaches its service-specific MEC capacity once it is active. The saturation behavior is service-dependent because the classes differ both in traffic intensity and in MEC capacity. Services with smaller MEC capacities, such as Services 2 and 4 with
, reach full MEC occupancy more easily once selected, whereas services with larger MEC capacities require more simultaneously active users to saturate the MEC. As the load scaling factor
a increases, the conditional saturation probabilities generally increase, reflecting stronger competition for the limited edge resource.
Figure 4 presents the unconditional contribution of each service class to MEC saturation. In contrast to the conditional probabilities in Figure 3, this metric also accounts for how often each service is selected by the orchestration policy. Therefore, a service contributes strongly to unconditional MEC saturation only if it is both frequently active and likely to fill its allocated MEC capacity. The dominant contributions are associated with the service classes that combine high MEC selection probability, limited service-specific MEC capacity, and substantial delay benefit.
Figure 5 shows the average user delay under three cloud-latency scenarios as the load scaling factor $a$ increases. The three curves correspond to low-, medium-, and high-latency cloud configurations, while the MEC delay and the service-capacity parameters remain fixed. In all cases, the average delay increases monotonically with load, because a larger number of active users intensifies competition for the limited MEC capacity and leaves more users served through the cloud path.
The cloud-latency level has a pronounced impact on the resulting delay. At the baseline load, the average user delay is approximately 20 ms in the low-latency cloud scenario, about 50 ms in the medium-latency scenario, and about 87 ms in the high-latency scenario. This confirms that the proposed split-user offloading policy is especially beneficial in regimes where the cloud path delay is large: the larger the gap between MEC and cloud latency, the more important the delay-aware service selection becomes.
Figure 6 shows the contribution of each service class to the aggregate delay saving achieved by MEC placement. This contribution is determined by two factors: how often the service is selected for MEC hosting and how large its per-user delay benefit $\Delta_k$ is. Therefore, the dominant contributors are not necessarily the services with the largest offered load or the largest cloud delay alone, but the services with the largest aggregate effect under the policy criterion $\Delta_k \min(n_k, M_k)$.
At the baseline load , the largest delay-saving contributions are provided by Services 2 and 5, approximately ms and ms, respectively, followed by Service 4, with approximately ms. Services 1 and 3 contribute only about ms and ms, respectively. This confirms that the aggregate effect of MEC placement is concentrated in Services 2, 4, and 5: Service 2 is selected most frequently, Service 4 has the largest per-user delay benefit, and Service 5 combines the highest offered load with a substantial delay benefit.
5.3. Impact of Cloud Delay and Service Time
We next examine how the delay-related metrics respond to changes in cloud-path latency and service duration. First, we vary the normalized cloud delay ratio to quantify how the benefit of MEC placement grows as the cloud path becomes slower relative to the MEC path. Second, we vary the service-time scale multiplier to evaluate the effect of longer sessions on MEC contention.
Figure 7 shows the average delay saving obtained through MEC placement as a function of the normalized cloud delay ratio, i.e., the cloud delay divided by the MEC delay. The ratio is varied while the remaining parameters are fixed at the baseline values. The vertical dashed line marks the boundary at which the ratio equals 1, where MEC and cloud delays are equal and edge placement provides no latency benefit.
For all services, the delay saving increases as the cloud path becomes slower relative to the MEC path. Service 5 exhibits the steepest growth, followed by Service 2 and Service 4, because these services combine substantial MEC hosting probability with large latency benefits. Services 1 and 3 grow more slowly, which is consistent with their lower MEC-selection probabilities. This confirms that the proposed policy becomes increasingly beneficial as the cloud-to-MEC delay gap widens.
Figure 8 shows the effect of service-time scaling on the average MEC-related delay. In this experiment, the mean service time of each class is multiplied by the same scale factor, while the arrival rates, capacity limits, and delay parameters remain fixed. As the service time scale increases, users remain active in the system for longer periods, which increases contention for the limited MEC capacity and raises the delay for all service classes.
The sensitivity to service-time scaling is service-dependent. Services 3 and 5 exhibit the steepest growth at large scale factors: Service 3 increases to more than 14 ms at scale factor 2, while Service 5 grows to approximately ms. Service 1 shows a moderate increase, reaching about ms. Services 2 and 4 remain below 10 ms over the considered range, indicating that their MEC placement remains relatively stable even when service durations are scaled upward. Overall, the experiment confirms that longer sessions amplify competition for the MEC resource and can change the relative delay sensitivity of the service classes.
5.4. Discrete-Event Simulation Validation and Baseline Comparison
To validate the analytical CTMC-based model and to assess the sensitivity of the results to the assumed input distributions, we developed a discrete-event simulation (DES) model of the same hybrid MEC–cloud system. The simulator stores the load vector $\mathbf{n}$, the currently active service at the MEC node $s$, and the number of users of this service placed at the MEC node, $m$. After each event, either a user arrival or a service completion, the same greedy decision rule as in the analytical model is applied to update the MEC placement.
In contrast to the analytical CTMC model, the DES does not require all event times to be generated under Markovian assumptions. Instead, inter-arrival times and service times are generated explicitly from specified distributions. This allows us to verify the analytical model in the Markovian case and to evaluate how the system behaves when the arrival or service-time distribution is changed, while the mean values remain consistent with the baseline parameters in Table 3.
Four distributional configurations are considered. The baseline Markovian configuration, denoted as “pois/exp”, uses Poisson arrivals, equivalently exponentially distributed inter-arrival times, and exponential service times. This case provides a direct verification of the analytical CTMC model. The “unif/unif” case uses uniform inter-arrival and service times. The mixed “exp/unif” case combines exponential inter-arrival times with uniform service times, while the reverse “unif/exp” case combines uniform inter-arrival times with exponential service times. The load scaling factor a is shown on the horizontal axis, and the vertical axis reports the mean system delay computed using the same delay metric as in the analytical model. The DES-based sensitivity analysis of the mean system delay with respect to the arrival- and service-time distributions is summarized in Table 4. Values are given in milliseconds.
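The four configurations can be expressed as pairs of samplers that share the same mean. The sketch below is illustrative only: the names `exponential`, `uniform`, `CONFIGS`, and `sample_stats` are hypothetical, and the uniform support [0, 2 × mean] is an assumption chosen here because it preserves the mean; the source does not state the support that was actually used.

```python
import random


def exponential(rng, mean):
    """Exponential sample with the given mean."""
    return rng.expovariate(1.0 / mean)


def uniform(rng, mean):
    """Uniform sample; the support [0, 2 * mean] is an assumption that keeps the mean."""
    return rng.uniform(0.0, 2.0 * mean)


# Configuration label -> (inter-arrival sampler, service-time sampler).
CONFIGS = {
    "pois/exp": (exponential, exponential),
    "unif/unif": (uniform, uniform),
    "exp/unif": (exponential, uniform),
    "unif/exp": (uniform, exponential),
}


def sample_stats(draw, mean, size=50_000, seed=0):
    """Return the sample mean and squared coefficient of variation of a sampler."""
    rng = random.Random(seed)
    xs = [draw(rng, mean) for _ in range(size)]
    m = sum(xs) / size
    var = sum((x - m) ** 2 for x in xs) / size
    return m, var / (m * m)
```

Under this assumed support, the two samplers agree in the mean but differ in variability: the exponential distribution has a squared coefficient of variation of 1, whereas the uniform distribution on [0, 2 × mean] has 1/3.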
The fully Markovian DES configuration closely matches the analytical values over the entire load range. For instance, at the baseline load , the analytical mean system delay is ms, while the corresponding DES estimate is ms. Across all considered load values, the relative discrepancy remains below , which validates the analytical CTMC formulation under the Poisson/exponential assumptions.
Changing the arrival or service-time distribution modifies the absolute delay level, but the qualitative dependence on the load factor remains the same. The mean system delay increases monotonically with a in all configurations. The largest delays are observed in the “exp/unif” case, reaching ms at and ms at . The “unif/exp” and “unif/unif” cases remain closer to the Markovian baseline, although they still produce higher delays than the analytical Poisson/exponential model. These results indicate that the Markovian model should be interpreted as an analytically tractable baseline, while the proposed split-user offloading policy preserves its qualitative behavior under distributional perturbations.
Figure 9 presents the line-plot comparison of the mean system delay under different arrival/service-time distribution combinations.
Figure 10 complements these results by showing the same data in a pointwise bar-chart form for each load factor. This representation makes it easier to compare the absolute differences between the analytical values and the DES results for each distributional configuration at fixed values of a. The bar-chart view again confirms that the Poisson/exponential DES results are nearly identical to those of the analytical model, while the non-Markovian cases produce a consistently higher mean system delay, especially for the “exp/unif” configuration.
At the baseline load , the bar chart clearly shows the ordering of the considered configurations: the analytical model and the Poisson/exponential DES results are approximately 193 ms and 192 ms, respectively, followed by “unif/exp” at about 200 ms, “unif/unif” at about 208 ms, and “exp/unif” at about 237 ms. The same ordering is preserved at higher load levels, with the largest deviation from the analytical baseline observed again for the “exp/unif” case.
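The ordering and the size of the deviations at the baseline load can be checked with a few lines of arithmetic. The sketch below uses the rounded bar-chart values quoted in the preceding paragraph, so the resulting percentages are only indicative; the helper name `relative_deviation` is introduced here for illustration.

```python
def relative_deviation(reference, value):
    """Relative deviation of a simulated value from the analytical reference."""
    return abs(value - reference) / reference


# Rounded readings from the text, in milliseconds.
analytical = 193.0
des = {"pois/exp": 192.0, "unif/exp": 200.0, "unif/unif": 208.0, "exp/unif": 237.0}

deviations = {name: relative_deviation(analytical, v) for name, v in des.items()}
ordering = sorted(des, key=des.get)  # configurations from lowest to highest delay

for name in ordering:
    print(f"{name}: {des[name]:.0f} ms, deviation {100 * deviations[name]:.1f}%")
```

With these rounded inputs, the Markovian DES deviates from the analytical value by about 0.5%, while the “exp/unif” configuration deviates by about 23%.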
Finally, we compare the proposed MEC-assisted split-user offloading scheme with a cloud-only baseline. In the cloud-only configuration, all users are served through their corresponding cloud paths and the MEC node is not used. In the proposed configuration, the limited MEC capacity is dynamically assigned to the service class that provides the largest aggregate instantaneous delay saving according to (2) and (3).
Figure 11 shows that MEC-assisted split-user offloading substantially reduces the average system delay over the entire load range. The cloud-only delay remains close to 90 ms and increases only moderately with a, whereas the proposed MEC-assisted scheme keeps the average delay below approximately 17 ms even at the highest considered load. The relative delay reduction decreases from about
at
to about
at
, remaining above
throughout the experiment. This confirms that even a capacity-constrained MEC node can provide a large delay gain when its resources are allocated selectively according to the proposed delay-aware policy.
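The relative delay reduction is the delay saved by the MEC-assisted scheme divided by the cloud-only delay. As a rough illustration, the sketch below applies this definition to the approximate values quoted above for the highest considered load (about 90 ms for cloud-only operation and about 17 ms for the MEC-assisted scheme). Both inputs are rounded readings, so the output is indicative only and does not replace the reported figures; the function name is hypothetical.

```python
def relative_delay_reduction(cloud_delay, mec_delay):
    """Fraction of the cloud-only delay removed by the MEC-assisted scheme."""
    if cloud_delay <= 0:
        raise ValueError("cloud_delay must be positive")
    return (cloud_delay - mec_delay) / cloud_delay


# Rounded readings from the text, in milliseconds (highest considered load).
reduction = relative_delay_reduction(90.0, 17.0)
print(f"relative delay reduction: {100 * reduction:.0f}%")
```

With these rounded inputs, the reduction comes out at roughly 81%.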
5.5. Discussion
The presented results lead to several structural observations about the behavior of the proposed framework. First, the orchestration policy produces a naturally stratified allocation of MEC resources in the five-service scenario. Service 2 captures the largest share of MEC hosting time at the baseline load (), because it combines unit offered load () with a large delay benefit ( ms). Service 4 is also selected frequently (), despite its moderate offered load (), because it has the largest per-user MEC benefit ( ms). Service 5 receives a substantial share of MEC hosting time () due to the highest offered load () and a significant delay benefit ( ms). By contrast, Services 1 and 3 are rarely selected ( and ), which is consistent with their lower aggregate instantaneous delay-saving potential.
This stratification is not a design artifact but an emergent consequence of the greedy delay-minimization criterion (2) and (3). The policy does not prioritize the service with the largest offered load or the largest cloud delay in isolation. Instead, it maximizes the aggregate instantaneous saving, so the MEC assignment depends jointly on the number of active users, the service-specific MEC capacity, and the cloud-to-MEC delay difference. This explains why Services 2, 4, and 5 dominate MEC occupancy, whereas Services 1 and 3 remain marginal under the considered traffic conditions.
Second, the split-user offloading mechanism differs conceptually from monolithic migration [7, 8]. In all-or-nothing migration, a service is either placed at the MEC as a whole or remains entirely in the cloud. In contrast, the proposed mechanism allows only a capacity-limited subset of users of the selected service to be served at the MEC. When the number of active users of the selected service does not exceed its MEC capacity, all users of the active service can be placed at the MEC; when the number of active users exceeds this capacity, the policy places as many users at the MEC as the capacity allows, while the remaining users of the same service continue to be served by the cloud. This configuration is particularly relevant when the MEC node is resource-constrained and cannot host the entire active service population.
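The difference between the two mechanisms can be illustrated with a small sketch. All names and numbers below are hypothetical, and the all-or-nothing rule is modeled under the assumption that a service exceeding the MEC capacity stays entirely in the cloud.

```python
def split_user_placement(active_users, mec_capacity):
    """Split-user rule: fill the MEC up to capacity, serve the rest from the cloud."""
    mec_users = min(active_users, mec_capacity)
    return mec_users, active_users - mec_users


def all_or_nothing_placement(active_users, mec_capacity):
    """Monolithic rule (assumed): move the service only if every user fits."""
    if active_users <= mec_capacity:
        return active_users, 0
    return 0, active_users


def mean_delay(mec_users, cloud_users, d_mec, d_cloud):
    """Average per-user delay for a given (MEC, cloud) split."""
    total = mec_users + cloud_users
    if total == 0:
        return 0.0
    return (mec_users * d_mec + cloud_users * d_cloud) / total


# Hypothetical example: 8 active users, MEC capacity 5, delays in milliseconds.
split = split_user_placement(8, 5)          # (5, 3)
monolithic = all_or_nothing_placement(8, 5)  # (0, 8)
print(mean_delay(*split, d_mec=10.0, d_cloud=90.0))       # 40.0
print(mean_delay(*monolithic, d_mec=10.0, d_cloud=90.0))  # 90.0
```

In this example the split-user rule still serves five of the eight users at the MEC, whereas the monolithic rule, as modeled here, gains nothing once the capacity is exceeded.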
The quantitative comparison with the cloud-only baseline in Figure 11 shows that MEC-assisted split-user offloading substantially reduces the average system delay over the considered load range. The relative delay reduction remains above
, decreasing from about
at
to about
at
, with an improvement of approximately
near the baseline load. This confirms that even a capacity-constrained MEC node can provide a large delay gain when its resources are allocated selectively according to the proposed delay-aware policy.
Third, the DES validation confirms that the analytical CTMC model accurately reproduces the event-driven system dynamics under the Markovian assumptions. The Poisson/exponential DES results are nearly indistinguishable from the analytical values across the full load range; at , for example, the analytical mean system delay is ms, while the corresponding DES estimate is ms. This agreement supports the product-form stationary analysis and validates the closed-form computation of the main performance metrics.
The distributional sensitivity analysis clarifies the role of the Poisson/exponential assumptions. Replacing the arrival or service-time distribution changes the absolute delay level but preserves the monotone growth of the mean system delay with the load factor. The largest delays are observed for the “exp/unif” configuration, which reaches ms at and ms at . Therefore, the Markovian model should be interpreted as an analytically tractable baseline rather than as a complete representation of all possible industrial traffic patterns. At the same time, the qualitative behavior of the proposed policy remains stable under the considered distributional perturbations.
The present study is subject to several modeling limitations. First, migration and MEC reconfiguration are modeled as instantaneous and costless. This idealization simplifies the CTMC analysis but excludes the latency, bandwidth cost, and control-plane overhead associated with live session transfer, session rerouting, or container warm-up. Therefore, the reported delay values should be interpreted as an idealized lower-delay reference case. Incorporating explicit migration overhead, for example, through a switching cost or a finite-duration migration phase, is a natural direction for follow-up work.
Second, the model uses class-dependent effective delays and rather than a detailed channel-aware or computation-aware delay decomposition. This is appropriate for the service-orchestration layer considered in this paper, where and can be interpreted as measured or precomputed mean end-to-end delays. More detailed wireless-channel, task-size, and processing-delay models can be coupled to the proposed framework by periodically updating these effective delay parameters.
Third, the baseline assumptions of Poisson arrivals and exponential service times enable the product-form result, but real industrial traffic may be bursty, correlated, or non-stationary. The DES results with alternative distributions provide a first sensitivity check, while more realistic extensions could use MMPP arrivals [35], phase-type service-time distributions, or stochastic network calculus bounds [36]. Finally, the single-MEC-node scope isolates the split-user offloading decision under one resource bottleneck. Multi-MEC topologies, shared edge resources, and inter-edge migration remain important directions for future investigation, building on multi-server frameworks [13].
6. Conclusions
This paper proposed a queueing-theoretic framework for adaptive service migration in hybrid MEC–cloud environments with split-user offloading. The system was modeled as a CTMC over the vector of active users, while the MEC placement was determined by a delay-aware greedy policy. Under the stated assumptions, the stationary distribution admits a product-form representation, which enables closed-form computation of the main performance metrics, including service hosting probabilities, MEC saturation, average delay, and delay saving.
The numerical study considered a five-service industrial MEC–cloud scenario with heterogeneous arrival rates, service durations, cloud delays, and MEC capacities. The results showed that the proposed policy produces a naturally stratified allocation of MEC resources. At the baseline load, the MEC node is occupied for approximately of the time, and its hosting time is mainly shared by Services 2, 4, and 5. This behavior follows directly from the policy criterion : the selected service is not determined by load or latency alone, but by their aggregate instantaneous delay-saving effect.
The split-user mechanism differs from all-or-nothing migration and provides a substantial gain over cloud-only operation. If the number of active users of the selected service does not exceed its MEC capacity, all users of the active service can be placed at the MEC; otherwise, the MEC hosts as many users as its capacity allows, while the remaining users of the same service continue to be served by the cloud. This configuration is unavailable under monolithic migration, where a service must either be placed at the MEC entirely or remain in the cloud. The comparison with the cloud-only baseline showed that MEC-assisted split-user offloading reduces the average system delay by more than over the considered load range, with an improvement of approximately near the baseline load.
The analytical results were further validated by discrete-event simulation. Under the Poisson/exponential assumptions, the DES results closely matched the closed-form CTMC values, with relative discrepancy below across the considered load range. Additional simulations with alternative inter-arrival and service-time distributions showed that non-Markovian inputs change the absolute delay level but preserve the qualitative load-dependent behavior of the proposed policy. Thus, the Markovian model serves as a tractable analytical baseline while still capturing the main structural effects of split-user offloading.
Future work will address three main extensions: relaxing the Poisson/exponential assumptions to phase-type or Markovian Arrival Process models; incorporating explicit migration overhead as a finite-duration service phase; and generalizing the architecture to multi-node MEC topologies with shared coverage, building on multi-server analytical frameworks [13] and energy-aware orchestration for next-generation networks [23].