Article

Workload-Aware Edge Node Orchestration and Dynamic Resource Scaling in MEC

by
Efthymios Oikonomou
and
Angelos Rouskas
*,†
Department of Digital Systems, University of Piraeus, 18532 Piraeus, Greece
*
Author to whom correspondence should be addressed.
Current address: M. Karaoli & A. Dimitriou 80, 18534 Piraeus, Greece.
Future Internet 2026, 18(4), 184; https://doi.org/10.3390/fi18040184
Submission received: 8 February 2026 / Revised: 19 March 2026 / Accepted: 27 March 2026 / Published: 1 April 2026
(This article belongs to the Special Issue Edge and Fog Computing for the Internet of Things, 2nd Edition)

Abstract

The emergence of edge computing introduces significant opportunities to improve real-time responsiveness and reduce latency by deploying computational resources closer to end users, at the edge, compared to traditional centralized cloud computing. However, stochastic and fluctuating workloads pose challenges in maintaining Quality of Service, often leading to resource fragmentation, service node saturation, and energy inefficiencies. In addition, imbalances in service node utilization, arising from either under-utilization or over-utilization, degrade the overall system performance and lead to unnecessary operational costs. Furthermore, finding an optimal balance between total latency cost and load balancing in different network topologies remains a significant challenge. In this research, we propose and evaluate a workload-aware orchestration framework that integrates short-term workload forecasting with dynamic resource scaling to efficiently manage edge node infrastructure under dynamic processing demands. The framework employs heuristic schemes that consider both workload distribution and service proximity throughout the edge network to optimize the distribution of edge users’ service requests across service nodes. Simulation results on grid and irregular edge network topologies, utilizing both synthetic and real-world datasets, demonstrate that the proposed framework and the integrated heuristics outperform other benchmark approaches. Specifically, our framework achieves up to 20% lower load imbalance variance, maintains high resource utilization, reduces system reconfigurations, and increases service reliability, providing a robust, low-overhead, and adaptive solution for dynamic orchestration in edge computing environments and infrastructures.

Graphical Abstract

1. Introduction

The rapid and widespread deployment of Internet of Things (IoT) technologies, together with the emergence of 5G and upcoming 6G networks, has driven a significant shift from traditional centralized cloud architectures toward Multi-Access Edge Computing (MEC) [1,2]. By deploying computational resources at the edge of the Radio Access Network (RAN), in close proximity to end-users, MEC significantly reduces propagation latency and backhaul congestion, enabling delay-sensitive applications such as autonomous driving, Augmented Reality (AR), and real-time video analytics [3,4].
However, the effective orchestration of edge resources introduces complex challenges distinct from traditional cloud environments. Edge nodes (e.g., servers colocated with Base Stations) are characterized by limited computational capacity and strict energy constraints [5]. Furthermore, edge workloads are inherently dynamic, exhibiting sharp fluctuations in both temporal intensity and spatial distribution, driven by user mobility and diurnal traffic patterns. In such a volatile environment, static resource provisioning approaches are insufficient, inevitably resulting in either resource wastage due to under-utilization or service outages and Service Level Agreement (SLA) violations during traffic peaks. Consequently, operators require robust Dynamic Edge Node Orchestration strategies that can continuously right-size the network by activating only the necessary number of Service Nodes (SNs) and efficiently distributing the Access Nodes’ (ANs) workload among them to maintain optimal performance.
A fundamental challenge in this domain is the inherent trade-off between minimizing communication latency and maintaining load balance. Naive latency minimization strategies, which assign users to the geometrically closest server, modeled as the classic capacitated K-median problem [6], often result in “hotspots” where central nodes become saturated while peripheral nodes remain idle. Conversely, aggressive load balancing may route traffic to distant nodes, violating latency constraints.
In our previous work, we have progressively addressed specific aspects of this challenge. In [7], we addressed the spatial aspect by formulating the static placement problem as a bi-objective optimization task and proposing Service node Selection and Access node Assignment (SSAA) schemes to balance latency and load. In [8], we explored Machine Learning-based workload prediction to anticipate demand, while in [9], we addressed the temporal aspect by introducing a hysteresis-based stability control to manage time-varying workloads.
While ref. [9] focused on temporal load variations to perform system reconfigurations, adapting the active service nodes and redirecting load, it did not consider spatial load variations during those reconfigurations. In this study, we extend the work presented in [9] to address the orchestration problem across a fully dynamic spatio-temporal domain. Here, we propose a robust Workload-Aware Edge Node Orchestration framework that integrates the spatially-optimized SSAA schemes [7] with the predictive temporal stability mechanisms of [9]. By combining spatial load balancing with temporal forecasting (Exponentially Weighted Moving Average—EWMA) and hysteresis, this framework enables the system to proactively scale resources and redistribute load in time, avoiding congestion and overloading when demand increases and under-utilization when it decreases, thus ensuring operational stability without the “flapping” often observed in purely reactive systems. Unlike computationally intensive Deep Reinforcement Learning (DRL) approaches, our framework prioritizes low complexity, making it suitable for real-time execution on resource-constrained edge controllers.
While prior studies, including our previous works [7,9], have addressed spatial allocation or temporal scaling in isolation, their performance under fully dynamic, combined spatio-temporal volatility is questionable. The core problem addressed in this paper is how to maintain QoS and operational stability in a continuously fluctuating environment with limited computational overhead, and its contribution lies in the practical, system-level integration of lightweight components, such as EWMA forecasting, hysteresis, and heuristics (e.g., SSAA), into a closed-loop, fully dynamic orchestration framework capable of real-time edge node management.
The main contributions of this article are summarized as follows:
  • We formulate the edge node orchestration problem as a dynamic variation of the capacitated K-median clustering problem, targeting both communication cost minimization and load balance optimization in space and time.
  • We extend our previous temporal stability mechanism [9] by integrating it with the SSAA schemes. This combination allows the system to handle spatial shifts in demand, as well as temporal variations.
  • We implement a closed-loop controller using EWMA forecasting to drive dynamic resource scaling, minimizing the operational overhead of service migrations and avoiding SLA violations.
  • We conduct extensive simulations on a Manhattan Grid topology under uniform and random traffic patterns as well as on an irregular graph using real-world CPU utilization traces from BitBrains. The results demonstrate that our Workload-Aware Edge Node Orchestration framework, integrated with the SSAA schemes, achieves superior performance in terms of load balancing, latency and service outages compared to other proposed schemes.
The remainder of this paper is organized as follows: Section 2 reviews the related research work. Section 3 describes the system model and formulates the problem. Section 4 presents the proposed heuristic approaches. Section 5 details the predictive monitoring and dynamic scaling mechanisms. Section 6 provides a comprehensive performance evaluation. Finally, Section 7 concludes our study.

2. Related Work

The problem of Edge Server Placement (ESP) and User Allocation has been studied extensively, evolving from static combinatorial optimization to dynamic, AI-driven orchestration.

2.1. Static Placement and Latency Minimization

Early research primarily focused on minimizing average response time in static environments, often treating the problem as a variation of the facility location or K-median problem. In [10], the authors proposed heuristics to place cloudlets in Wireless Metropolitan Area Networks (WMANs) to minimize latency. Similarly, the study in [11] utilized mixed-integer programming to optimize server placement, while the authors in [12] applied heuristic approaches for industrial cellular networks. While effective for reducing propagation delay, these static approaches fail to adapt to time-varying workloads.
However, focusing solely on average latency can conceal critical performance issues. As highlighted by the authors in [13], minimizing average delay does not guarantee real-time performance for all users where optimizing the maximum delay reduction is crucial to prevent long-tail latency events in delay-sensitive applications. Thus, these approaches often fail to account for the rapid temporal variance of edge workloads.

2.2. Load Balancing and Energy Efficiency

Recognizing the limitations of latency optimization, recent works have integrated load balancing and energy metrics. In our previous work [7], we introduced the concept of “Proximity-Aware” load balancing, formalized as a weighted bi-objective optimization task. This aligns with findings in [14], where the authors proposed adaptive thresholds using K-means clustering to dynamically distinguish between overloaded and underloaded servers. Furthermore, the works in [15,16] treated energy consumption as a primary cost factor alongside delay. More recently, the study in [17] extended this by integrating Quality of Experience (QoE) awareness directly into the energy efficiency optimization, demonstrating that energy savings should not come at the cost of user-perceived quality. Our proposed framework aligns with these trends by treating load balance not as a constraint, but as a primary optimization objective along with latency.

2.3. Predictive, Dynamic, and Resilient Orchestration

To address dynamic workloads, the literature has shifted heavily toward predictive models and Deep Reinforcement Learning. The authors in [18] proposed a “Spectral-DDQN” strategy that combines spectral clustering with Double Deep Q-Networks to optimize edge server deployment, effectively reducing decision space dimensionality. Similarly, in [19], a DRL-based framework was introduced for resilient service placement in vehicular networks, explicitly optimizing for fault tolerance and latency. Another cooperative scheme, presented in [20], allows overlapping server coverage to handle peak demands, reducing deployment costs by over 38%.
However, the complexity of these systems often introduces new challenges. For instance, the survey in [21] highlights that while heterogeneity in edge resources allows for flexible scheduling, it makes dynamic allocation particularly prone to SLA violations if not managed with predictive foresight. To mitigate this, the study in [22] proposed a Digital Twin-assisted strategy to predict resource demands and preemptively migrate Virtual Network Functions (VNFs), thereby reducing service outage time. Similarly, the work in [23] demonstrated that AI-based workload prediction models (e.g., using LSTM) significantly outperform static thresholds in reducing latency and optimizing costs. To ensure real-time responsiveness, our work adopts a lightweight algorithmic design, employing EWMA for workload forecasting and minimizing inference latency and load imbalance while reducing processing overhead at the controller level.
Despite these advancements, ensuring operational stability remains a critical concern. As argued in [24], budgeted edge server placement must explicitly account for fairness to ensure equitable service across different user groups. In our recent study [9], we addressed the instability inherent in dynamic scaling by introducing a hysteresis-based stability control. This mechanism prevents the “flapping” of active nodes during transient traffic spikes. This conceptual approach is supported by the authors in [25], who developed a moderate handover method using predictive scoring to balance latency gains against the reliability costs of frequent handovers.
Recent literature has also explored dynamic resource allocation to handle end-user mobility and fluctuating channel conditions. For instance, advanced online optimization models, such as the two-timescale approach presented in [26], expertly manage user handovers by continuously deciding between real-time service migration and task rerouting. While highly effective for fine-grained operational tuning, these online methods rely on complex algorithms that often incur significant computational overhead due to the constant live migration of services. In contrast, our ENOM framework tackles this overhead by establishing a structurally robust offline topological placement as the system naturally absorbs stochastic workload variations through simple dynamic resource scaling.
Our current research bridges the gap between these computationally intensive AI-driven models and traditional static optimization heuristics. By integrating lightweight EWMA forecasting with the bi-objective heuristics established in [7], we propose a robust Workload-Aware Edge Node Orchestration framework that delivers a solution combining the responsiveness of dynamic auto-scaling with the strict constraints on latency sensitivity, performance stability, and service continuity that characterize MEC infrastructures.

3. System Model and Problem Statement

3.1. Network Architecture and Dynamic Context

We consider a wireless edge-computing architecture operated by a mobile network provider. The access network comprises M fixed interconnected Access Nodes (ANs), modeled as a connected graph G = (V, E), where V = {v_1, …, v_M} represents the nodes and E the backhaul links connecting them. Any two ANs can reach each other via multi-hop paths.
Each AN v_j, j = 1, 2, …, M, is capable of hosting a Service Node (SN) equipped with computational resources, including CPU, memory, and storage. To offer edge services, the operator selectively instantiates SNs colocated with ANs, enabling computation to be performed close to end users. Unlike static deployments, we consider a fully dynamic environment where resource provisioning adapts to space-time varying demands.
Let the time horizon T be divided into control intervals (epochs) t = 1, 2, …, T. At any time epoch t:
  • The Offered Workload, represented by a non-negative weight w_j(t), is the computational demand (measured in MIPS) originating from end-users attached to AN v_j, which varies in both time (temporal intensity) and space (spatial location), reflecting a dynamic environment. Both uniform and non-uniform distributions of w_j(t) across ANs at the network edge are assumed in our model.
  • The Total Offered Workload W(t) = Σ_{j=1}^{M} w_j(t), is the aggregate computational demand (measured in MIPS) originating from all end-users attached to each AN in the network.
  • The Dynamic Dimensioning N(t), is the minimum number of active SNs required at time t, chosen from the set of available ANs (each AN may host a SN) to accommodate the aggregate workload demand, such that 1 ≤ N(t) ≤ M.
  • The Active Service Node Set S(t), is the subset of ANs chosen to host active SNs, where S(t) ⊆ V and |S(t)| = N(t).
Activating only a subset of SNs (N(t) < M) is an energy-aware operating strategy, which allows the operator to optimize energy consumption by matching active resources to the offered (or forecasted) aggregate workload W(t) while keeping headroom for short-term spikes and avoiding the system’s under-utilization. Under these considerations, fundamental challenges are to dynamically determine the optimal deployment size N(t), the active service node set S(t), and the user assignment mapping in order to minimize latency and ensure load balancing while both preserving overall system performance and meeting Quality of Service (QoS) guarantees.
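The dimensioning rule implied above, the smallest N(t) whose aggregate capacity N(t) · C covers W(t), can be sketched as follows; the function name and demo values are illustrative, not part of the paper's implementation:

```python
import math

def dynamic_dimensioning(workloads, capacity, m):
    """Smallest N(t) such that N(t) * C covers the aggregate demand W(t).

    workloads: per-AN offered load w_j(t) in MIPS (illustrative values).
    capacity:  homogeneous per-SN capacity C in MIPS.
    m:         number of ANs, the upper bound on active SNs.
    """
    total = sum(workloads)                   # W(t) = sum_j w_j(t)
    n = max(1, math.ceil(total / capacity))  # ensure W(t) <= N(t) * C
    return min(n, m)

# Example: W(t) = 1000 MIPS, C = 400 MIPS -> 3 active SNs are needed.
print(dynamic_dimensioning([300, 500, 200], 400, 9))  # -> 3
```

Headroom beyond this bare minimum is handled later by the utilization window of Section 5.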
Figure 1 illustrates the system model considered in this study. At the lowest layer, six (6) end-user devices connect via wireless links to their nearest AN, requesting edge resources and services. Only nine (9) of the ANs in this grid network are shown, and only three of them host an active SN. As a result, users attached to the remaining six (6) ANs of the figure, whose collocated SNs are inactive, must obtain edge resources and services from one of the three depicted SNs or some other operational SN. All SNs are similarly dimensioned with per-node computational capacity C (e.g., in MIPS).

3.2. Traffic and Communication Costs

Let d_jk be the minimum-delay path cost between ANs v_j and v_k. Since we focus on computation-side decisions, inter-AN links are assumed to share the same capacity and latency characteristics and to be adequately provisioned; consequently, d_jk is determined by the hop count (or an equivalently normalized delay) on the shortest path. Furthermore, this study focuses on the interconnections among the network nodes rather than the wireless UE–AN links. These links are assumed to be sufficiently provisioned to accommodate computational service traffic, ensuring that radio access conditions do not affect the SN placement and AN assignment.

3.3. Problem Formulation

From the user point of view, for each time epoch t, the system must provide services within acceptable time thresholds, while from the operator point of view (a) the established capacity should match the offered workload with minimum server over-provisioning and (b) server infrastructures should be adequately utilized, avoiding utilization imbalances, so that situations with overloaded and underutilized equipment are avoided as much as possible.
Let C denote the homogeneous processing capacity of each SN, so that the active processing capacity N(t) · C can satisfy the current aggregate workload demand W(t) (W(t) ≤ N(t) · C). We define two sets of binary decision variables for each time epoch t, where y_k(t) ∈ {0, 1} equals 1 (0) if an SN is active (not active) at AN v_k at time t, and x_jk(t) ∈ {0, 1} equals 1 (0) if the workload of AN v_j is assigned (not assigned) to AN v_k that hosts a SN at time t.
The constraints for any feasible configuration at time t are:
$\sum_{k=1}^{M} x_{jk}(t) = 1, \quad \forall j = 1, \ldots, M \qquad (1)$
$x_{jk}(t) \le y_k(t), \quad \forall j, k = 1, \ldots, M \qquad (2)$
$\sum_{k=1}^{M} y_k(t) = N(t) \qquad (3)$
$\sum_{j=1}^{M} w_j(t) \cdot x_{jk}(t) \le C \cdot y_k(t), \quad \forall k = 1, \ldots, M \qquad (4)$
Constraint (1) ensures every AN is served by exactly one SN. Constraint (2) ensures users are only assigned to active SNs. Constraint (3) limits the number of active servers to the dynamically determined value N ( t ) . Constraint (4) enforces the server capacity limit, namely the load directed to some SN cannot exceed its capacity.
  • Minimization of Communication Latency
From the user point of view, this objective represents the classic instance of the capacitated K-median (a.k.a. capacitated facility-location) problem [6] with the goal of reducing the weighted sum of distances between users and edge servers at every time epoch t.
$\min D(t) = \sum_{j=1}^{M} \sum_{k=1}^{M} w_j(t) \cdot d_{jk} \cdot x_{jk}(t)$
Subject to (1)–(4).
  • Minimization of Load Imbalance
From the operator point of view, the near equal distribution of load across the N ( t ) SNs is equivalent to minimizing the maximum load assigned to any single edge server at every time epoch t.
$\min w_{max}(t) = \max_{k \in \{1, \ldots, M\}} \sum_{j=1}^{M} w_j(t) \cdot x_{jk}(t)$
Subject to (1)–(4).
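For concreteness, the two objectives can be evaluated for any candidate assignment as in this minimal sketch; the three-AN toy instance and the dictionary encoding of x_jk are hypothetical demo inputs:

```python
def communication_cost(w, d, assign):
    """D(t): weighted sum of path costs, with assign[j] = serving SN of AN j."""
    return sum(w[j] * d[j][k] for j, k in assign.items())

def max_sn_load(w, assign):
    """w_max(t): heaviest aggregate load placed on any single SN."""
    loads = {}
    for j, k in assign.items():
        loads[k] = loads.get(k, 0) + w[j]
    return max(loads.values())

# Toy instance: 3 ANs, hop-count distances, SNs hosted at ANs 0 and 2.
w = [4, 2, 3]
d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
assign = {0: 0, 1: 0, 2: 2}
print(communication_cost(w, d, assign), max_sn_load(w, assign))  # -> 2 6
```

The example makes the trade-off visible: routing AN 1 to SN 2 instead would raise D(t) by the same hop cost while lowering w_max(t) from 6 to 5.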
The K-median problem is NP-hard, and a bi-objective optimization problem that combines both objectives into a single weighted normalized sum would also be computationally intractable. In addition, solving these problems repeatedly at every time epoch t is not a practical choice, especially when load conditions do not significantly change and the current SN capacity configuration can absorb instant workload variations.
Thus, given the aggregate processing demand at epoch t across ANs and the per-SN capacity C, our approach determines the optimal edge node configuration via a two-stage decomposition:
A.
Compute-Node Selection Set (CSS) Subproblem
Given N ( t ) that can accommodate W ( t ) , determine the optimal subset S ( t ) V of N ( t ) ANs that will host the SNs. The selection of S ( t ) aims to reduce the aggregate latency path costs for the entire topology while creating favorable conditions for load balancing. So, both latency and load balance are targeted in this step.
B.
Access Node-to-Compute Node Assignment (ACA) Subproblem
Given the subset S(t), partition the remaining M − N(t) ANs into N(t) clusters and assign each AN v_j (with weight w_j(t)) to exactly one preferred active SN located at v_k ∈ S(t). During this stage, the assignment also aims to minimize communication cost subject to capacity limits while promoting balanced utilization across the operational SNs.
C.
Edge Node Orchestration Module
To bypass the impractical situation of formulating and solving the above problems at each time epoch while effectively managing the highly dynamic operating conditions of the edge environment, we introduce the Edge Node Orchestration Module (ENOM), which operates as a closed-loop control entity deployed at the network edge or within a regional controller. Specifically, ENOM is responsible for:
  • Predictive Sensing: Monitoring whether the offered workload conditions have significantly changed, by measuring the global system utilization U_s(t), forecasting future demand, and filtering out short-lived peaks or dips.
  • Temporal Stability: Determining the number of active nodes N(t) while applying hysteresis constraints to prevent unnecessary “ping-pong” reconfigurations.
  • Spatial Optimization: Solving the CSS and ACA subproblems to activate the most suitable N(t) SNs and allocate the remaining ANs to these SNs, or using precomputed (offline) active sets S(t) and AN (workload) assignments to accelerate reconfiguration phases.
ENOM operates as a centralized orchestration entity but ensures low computational complexity and is highly efficient for small- to medium-sized MEC networks. However, as the network expands to large-scale MEC deployments, a single control plane may encounter control signaling bottlenecks. In such large-scale scenarios, the ENOM framework may operate as a local domain controller. The overall network orchestration would scale by partitioning the extensive infrastructure into smaller, manageable edge domains, each governed by its own ENOM instance, cooperating under a federated or hierarchical control-plane architecture. Finally, the system model used in our simulations assumes a reliable and mature infrastructure. Factors such as unexpected SN hardware failures, AN disconnections, or link outages fall outside the scope of this study, as our primary focus is on handling workload volatility and dynamic resource orchestration under stable, fault-free operational conditions.

4. Proposed Schemes

As established in our previous work [7], two computationally efficient heuristic schemes were introduced for the CSS and ACA subproblems. While the heuristics were validated for static snapshots, in this work, they are employed during the real-time reconfiguration mechanism triggered by the ENOM controller when operator-defined performance, utilization and QoS constraints are violated, or can be executed offline to produce network configurations that can be applied in real-time when workload demands change substantially.

4.1. Heuristic for CSS Subproblem

We utilize the heuristic detailed in [7], whose main idea is to (a) distribute SNs as evenly as possible throughout the whole network of ANs and (b) promote the selection of SNs that are close to highly loaded ANs. Its operation is briefly described below:
  • Scoring: Each candidate node v_k is scored based on the centrality measure $D_k = \sum_{j=1}^{M} w_j(t) \cdot d_{jk}$, which is the weighted distance of all other ANs to v_k. Nodes are listed in increasing order of this measure. Intuitively, nodes with low (high) D_k scores are topologically central (peripheral) to the current spatial distribution of the workload.
  • Selection with Dispersion: The algorithm iteratively selects the best candidate from the sorted list starting from the nodes with low values towards the higher ones. However, to prevent the clustering of servers in heavily loaded central locations, the selection procedure applies a minimum distance between selected nodes constraint that favors the dispersion of the selected servers, thus ensuring a wider coverage area.
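A compact sketch of this scoring-plus-dispersion selection is given below, assuming distances are supplied as a hop-count matrix; the fallback that relaxes the dispersion constraint when it would leave fewer than N(t) hosts is our illustration, not a detail from [7]:

```python
def css_select(w, d, n, min_sep):
    """Select n SN hosts: weighted-centrality scoring plus dispersion.

    w: per-AN workloads, d: pairwise hop-count distances,
    min_sep: minimum allowed distance between selected hosts.
    """
    m = len(w)
    # Score D_k = sum_j w_j * d_jk; a low score means topologically central.
    scores = sorted((sum(w[j] * d[j][k] for j in range(m)), k)
                    for k in range(m))
    selected = []
    for _, k in scores:
        if all(d[k][s] >= min_sep for s in selected):
            selected.append(k)
            if len(selected) == n:
                return selected
    # Illustrative relaxation: fill up if dispersion was too strict.
    for _, k in scores:
        if k not in selected:
            selected.append(k)
            if len(selected) == n:
                break
    return selected

# 5 ANs on a line with unit loads: the central node, then a dispersed peer.
d = [[abs(i - j) for j in range(5)] for i in range(5)]
print(css_select([1] * 5, d, 2, 2))  # -> [2, 0]
```

With min_sep = 2, the immediate neighbors of the central node 2 are rejected, so the second host lands at the periphery, widening coverage.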

4.2. Heuristic for ACA Subproblem

Once the locations S(t) are fixed, this stage assigns every AN v_j to a specific SN v_k ∈ S(t). A naive nearest-neighbor assignment often leads to capacity violations and severe load imbalances. To avoid this, we apply the Proximity-Aware Load Balancing logic proposed in [7], which assigns ANs to SNs in a round-robin (RR) fashion, but with a different order of visiting SNs in each cycle that favors SNs with very close unassigned ANs. Its operation is briefly described as follows:
  • Determining the visiting order in a RR cycle: First, the normalized distance of every not-yet-assigned AN to every SN is computed, defined as the actual AN-to-SN distance divided by the sum of the distances from this AN to all SNs. A normalized distance lower (greater) than 1/N(t) indicates an AN close to (far away from) a SN. For every SN we find the smallest such normalized distance, and we sort SNs in increasing order of these minimum normalized distances. This is the order of visiting SNs in this cycle.
  • Assignments during a RR cycle: For each SN visited, if its minimum normalized distance is lower than 1/N(t), we assign the AN that corresponds to this minimum normalized distance to this SN and remove the AN from further consideration, provided that the load of the SN will not exceed its capacity. If, however, the SN minimum normalized distance is greater than 1/N(t), this SN and all remaining SNs are skipped to avoid allocating very distant ANs to them.
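The cycle logic of the two bullets above can be sketched as follows; the final relaxation branch, which sends leftover ANs to the nearest feasible SN so the sketch always terminates, is our own simplification and not part of the scheme in [7]:

```python
def aca_assign(w, d, sns, cap):
    """Round-robin Proximity-Aware assignment of ANs to active SNs (sketch)."""
    n = len(sns)
    unassigned = set(range(len(w)))
    load = {k: 0 for k in sns}
    assign = {}
    while unassigned:
        progress = False
        # Per SN: the smallest normalized distance among unassigned ANs.
        order = []
        for k in sns:
            nd, j = min((d[j][k] / max(sum(d[j][s] for s in sns), 1e-9), j)
                        for j in unassigned)
            order.append((nd, k, j))
        order.sort()  # visiting order for this RR cycle
        for nd, k, j in order:
            if nd > 1.0 / n:
                break  # skip this SN and all remaining SNs this cycle
            if j in unassigned and load[k] + w[j] <= cap:
                assign[j] = k
                load[k] += w[j]
                unassigned.discard(j)
                progress = True
        if not progress:  # our relaxation: nearest feasible SN for leftovers
            j = next(iter(unassigned))
            k = min(sns, key=lambda s: (d[j][s], load[s]))
            assign[j] = k
            load[k] += w[j]
            unassigned.discard(j)
    return assign

# 4 ANs on a line, SNs hosted at ANs 0 and 3, capacity 2 units each.
d = [[abs(i - j) for j in range(4)] for i in range(4)]
print(aca_assign([1, 1, 1, 1], d, [0, 3], 2))
```

In the demo, each SN first claims its collocated AN (normalized distance 0), and the second cycle splits the middle ANs evenly, yielding a perfectly balanced load of 2 per SN.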
As proven in [7], both heuristics have polynomial execution times with respect to the number of ANs M and SNs N(t). In the following, we will refer to these heuristics as the SSAA schemes.

5. Predictive Utilization Monitoring and System Reconfigurations

To address the volatility of edge workloads while preventing the operational instability known as the ping-pong effect, the ENOM module implements a closed-loop control cycle. This controller continuously monitors system utilization, forecasts short-term aggregate demand, and dynamically adjusts the active SN set S ( t ) . The controller is designed to achieve two conflicting operational goals: (i) maintaining average per-SN utilization within a target efficiency range to absorb transient bursts without significant over-provisioning, and (ii) minimizing service disruptions by triggering reconfigurations only when demand shifts are sustained for longer periods.

5.1. Sensing and Forecasting

Accurate workload forecasting is a prerequisite for proactive scaling. While Deep Learning models (e.g., Long Short-Term Memory—LSTM) offer high long-term accuracy, they impose significant overheads that can violate the strict timing constraints of real-time edge controllers. Therefore, we adopt an Exponentially Weighted Moving Average (EWMA) predictor for system utilization, a standard choice for smoothing noisy measurements in time-varying systems while rapidly tracking evolving trends with minimal computational complexity. Compared to other lightweight predictors like Simple Moving Average (SMA) or Windowed Linear Regression [27], EWMA is exceptionally suited for a real-time edge node orchestrator due to its O(1) computational complexity and minimal memory footprint, as it does not require storing historical data windows.
Let U_s(t) denote the observed system utilization at time t, defined as the ratio of aggregate offered load to the aggregate capacity of the currently active SNs, and U_e(t) its EWMA estimate:
$U_e(t) = (1 - \alpha) \cdot U_s(t) + \alpha \cdot U_e(t-1), \quad 0 \le \alpha \le 1,$
where a smaller α reacts faster to changes, while a larger α weights historical data more heavily, thus providing smoother trends. In our traces, values of α between 0.70 and 0.80 smooth out transient spikes, while the weight 1 − α placed on the most recent observation U_s(t) still allows the estimate to track sustained traffic shifts promptly, providing stable yet responsive tracking.
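One EWMA update step, matching the recurrence above, is a one-liner; the surrounding names and the spike example are illustrative:

```python
def ewma_update(u_s, u_prev, alpha=0.75):
    """U_e(t) = (1 - alpha) * U_s(t) + alpha * U_e(t - 1)."""
    return (1 - alpha) * u_s + alpha * u_prev

# A sudden one-epoch spike to 100% utilization only moves the estimate
# to 25% when alpha = 0.75, filtering out momentary outliers.
print(ewma_update(1.0, 0.0))  # -> 0.25
```

Feeding successive observations through this function is all the controller needs: no history window is stored, which is the O(1) footprint argued for above.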

5.2. Dynamic Scaling and Threshold Policy

Based on the forecasted utilization U e ( t ) , which corresponds to the expected workload, the orchestration module ENOM determines an estimate of the optimal number of SNs N e ( t + 1 ) required for the next control interval. The system iteratively evaluates the current configuration to identify the minimum number of SNs that can maintain the average system utilization U a v g within a specific safe operating window:
$U_{min} \le U_{avg} \le U_{max}$
In our experiments, we define the lower bound U_min = 35% to prevent resource wastage and the upper bound U_max = 75% to ensure sufficient headroom for processing bursts without violating Service Level Agreements (SLAs). If the estimated load would push utilization outside these bounds, and this condition persists over consecutive intervals, a reconfiguration is triggered to either scale up or scale down N(t + 1), as explained below.
The selected thresholds of 35% and 75% are strategically based on established server performance and energy efficiency profiles. Specifically, operating a typical edge server consistently below 35% utilization is highly power-inefficient. Therefore, a lower bound of 35% ensures that under-utilized nodes are promptly deactivated to conserve energy. Conversely, an upper threshold above 75% leaves insufficient computational headroom to safely absorb load micro-bursts, risking transient saturation and SLA violations. Furthermore, lowering this upper bound (e.g., to 60%) would trigger premature scaling and excessive reconfiguration overhead. Thus, the 35–75% operational band consistently provides the optimal balance between minimizing energy waste and preserving robust service reliability. Finally, these thresholds remain fully configurable, allowing network operators to dynamically adjust them according to specific hardware characteristics, operational policies or network conditions.
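A possible sketch of the per-epoch scaling estimate, assuming the forecast aggregate load is recovered as U_e(t) · N(t) · C and picking the fewest SNs that keep average utilization at or below U_max (the helper name and demo numbers are hypothetical):

```python
import math

def target_sn_count(u_e, n_current, capacity, m, u_max=0.75):
    """Estimate N_e(t+1): fewest SNs keeping forecast utilization <= u_max.

    The forecast aggregate load is recovered as u_e * n_current * capacity.
    Choosing the minimum feasible count also keeps utilization as close to
    the [U_min, U_max] band as the per-SN granularity allows.
    """
    load = u_e * n_current * capacity
    n = max(1, math.ceil(load / (u_max * capacity)))
    return min(n, m)

# 4 SNs at 30% forecast utilization consolidate onto 2 SNs (60% each).
print(target_sn_count(0.30, 4, 100, 9))  # -> 2
# 4 SNs at 90% forecast utilization scale out to 5 SNs (72% each).
print(target_sn_count(0.90, 4, 100, 9))  # -> 5
```

Because the minimum feasible count maximizes utilization subject to the U_max cap, a result below U_min simply means the band is unattainable at this load granularity, in which case the estimate is left as-is.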

5.3. Hysteresis and Stability Control

To avoid “flapping”, where the system oscillates rapidly between configurations due to short-term load fluctuations, we implement a temporal hysteresis mechanism. The ENOM does not apply a new configuration immediately upon detecting a violation of the utilization thresholds. Instead, it utilizes a stability counter i and a monitoring window of length L. The system tentatively calculates the required number of SNs for each time epoch t (timestamp), but commits to a reconfiguration only after the estimation remains consistent for L consecutive timestamps. Once this stability criterion is met, the module calculates the ceiling of the average number of SNs requested over the window to determine the final new configuration:
N(t+1) = ⌈(1/L) ∑_{i=0}^{L−1} N_e(t−i)⌉
Once N(t+1) ≠ N(t) is confirmed, the ENOM triggers the SSAA schemes to physically identify the optimal set S of size N(t+1) and re-map the ANs, thereby closing the control loop. This approach ensures that reconfigurations are driven by sustained trends rather than momentary outliers, thereby minimizing the operational overhead associated with unnecessary activations or deactivations of edge SNs.
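As an illustration, the hysteresis logic can be sketched as follows. This is a minimal interpretation, assuming the stability counter resets whenever the tentative estimate N_e(t) agrees with the current configuration; the exact reset policy of the ENOM may differ:

```python
import math
from collections import deque

def hysteresis_commit(estimates, L, initial):
    """Commit a reconfiguration only after the tentative per-epoch SN
    estimate N_e(t) deviates from the current configuration for L
    consecutive timestamps; the new N(t+1) is the ceiling of the mean
    estimate over the monitoring window."""
    window = deque(maxlen=L)
    current = initial
    committed = [current]
    for n_e in estimates:
        if n_e == current:
            window.clear()               # trend broken: reset stability counter
        else:
            window.append(n_e)
            if len(window) == L:         # sustained deviation: reconfigure
                current = math.ceil(sum(window) / L)
                committed.append(current)
                window.clear()
    return committed
```

With L = 5, a single-epoch spike never triggers a reconfiguration; only a trend sustained over five consecutive one-minute slots does.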
Figure 2 details the whole process of the ENOM module.

6. Evaluation: Results, Analysis and Discussion

We compare our proposed SSAA schemes against two benchmark algorithms enhanced with a load-balance criterion: (i) Forward Greedy with Load Balance (FGLB), a K-median-oriented selector [28,29] and (ii) Betweenness Centrality with Load Balance (BCLB), a centrality-driven selector [30]. In both schemes, each AN is assigned to its closest/central SN within the selected set S(t) of size N(t). When multiple SN selections yield identical communication cost, ties are resolved using a common load-balance rule.
  • Forward Greedy with Load Balance
FGLB follows the classic forward-greedy K-median strategy: it initiates with the single SN that minimizes the weighted sum of communication costs over all ANs, then iteratively adds the SN v_k that induces the smallest marginal increase in total communication cost, after which ANs are reassigned to their closest SNs in S ∪ {v_k}. When multiple candidates yield the same minimum cost, FGLB breaks ties by favoring load balance: it picks the candidate that minimizes the variance of SN workloads.
  • Betweenness Centrality with Load Balance
BCLB ranks nodes by betweenness centrality, i.e., how often a node lies on shortest paths between other node pairs, capturing their potential to serve as transit/aggregation hubs in the topology. BCLB selects the top-N ANs by centrality as SNs, with ties decided using the same load-balance rule: among tied candidates, choose the set that, after assigning ANs to their closest SNs, minimizes the variance of SN workloads. This centrality-first selection tends to place SNs at structurally critical locations, while the load-balance feature mitigates over-concentration of workloads on a few hubs.
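The two benchmark selectors described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the graph, distance, and workload representations are assumptions, and the BCLB sketch omits the variance-based tie-break for brevity.

```python
import statistics
from collections import deque

def fglb_select(dist, weights, K):
    """FGLB sketch. dist[j][c]: hop distance from AN j to candidate c;
    weights[j]: AN workload. Greedily adds the SN with the smallest marginal
    K-median cost; the load-balance tie-break (lower variance of SN
    workloads) is applied via lexicographic comparison of (cost, variance)."""
    n = len(weights)

    def cost_and_variance(S):
        loads = {s: 0.0 for s in S}
        total = 0.0
        for j in range(n):
            s = min(S, key=lambda c: dist[j][c])   # assign AN to closest SN
            total += weights[j] * dist[j][s]
            loads[s] += weights[j]
        return total, statistics.pvariance(loads.values())

    S = []
    while len(S) < K:
        best = min((c for c in range(n) if c not in S),
                   key=lambda c: cost_and_variance(S + [c]))
        S.append(best)
    return sorted(S)

def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted graph
    (adj: node -> neighbour list); undirected pairs are counted twice."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}; sigma[s] = 1      # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:                                   # BFS from s
            v = queue.popleft(); order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):                      # dependency accumulation
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

def bclb_top_n(adj, n):
    """BCLB selection sketch: pick the top-n nodes by betweenness centrality."""
    bc = betweenness(adj)
    return sorted(adj, key=lambda v: -bc[v])[:n]
```

On a small line graph, for example, `bclb_top_n` selects the middle node first, which is exactly the over-concentration tendency the load-balance rule is meant to mitigate.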

6.1. Experimental Setup: Environment, Topology and Workload Modeling

To evaluate the performance of the proposed ENOM framework, we developed a discrete-event simulation environment that captures the dynamic interactions between time-varying and space-shifted traffic loads within an edge network environment. The simulation assesses the system’s ability to maintain SLAs while optimizing response time, energy efficiency and load balancing under stochastic demands.
The edge network is modeled as a set of M = 25 ANs arranged in a 5 × 5 Manhattan Grid topology. This grid-based layout represents dense urban infrastructures and captures the flat, mesh-oriented architectures of current 5G and beyond-5G/6G edge deployments. Figure 3 illustrates the layout: in this example, the large circles mark the seven operational SNs co-located with the corresponding ANs, and the numbers in the circles denote the corresponding AN indices.
The simulation spans a continuous 24-h operational horizon, divided into T = 1440 control intervals corresponding to one-minute slots. The offered workload w_j(t), measured in Million Instructions Per Second (MIPS), represents the aggregate computational demand generated by users attached to AN v_j. To emulate the stochastic nature of mobile edge traffic, we model the daily load variation using eight distinct Traffic Intensity Periods, P1–P8. The synthetic workloads utilized in this study are designed to emulate distinct periods of traffic intensity and burstiness. The offered workload profile is detailed in Table 1 and visualized in Figure 4.
We evaluate the framework under two distinct load distribution scenarios:
  • Uniform Offered Workload Distribution: The aggregate workload W(t) is evenly distributed across all ANs, w_j(t) = W(t)/M, during each time interval. This scenario tests the system's ability to minimize latency when demand is geographically homogeneous.
  • Random Offered Workload Distribution: The offered workload over time is stochastically distributed to emulate heterogeneous demand and create a fully dynamic environment. This scenario introduces high spatial variance, forcing the orchestrator to balance the trade-off between proximity and load balancing.
The orchestration controller utilizes an EWMA smoothing factor of α = 0.75 for workload forecasting and enforces a hysteresis window of L = 5 to prevent rapidly fluctuating reconfigurations. The smoothing factor was empirically tuned: lower values (e.g., α = 0.5) over-smoothed the data, causing the system to miss genuine traffic bursts, while higher values (e.g., α = 0.9) made the system overly sensitive to transient spikes, triggering unnecessary reconfigurations and resulting in a ping-pong effect. The hysteresis window length L fundamentally controls the stability-delay tradeoff. A smaller L reduces the delay in scaling down resources, saving energy but increasing the risk of ‘flapping’ (rapid scale-up/scale-down cycles). Conversely, a larger L maximizes system stability but forces the system to consume unnecessary energy for longer periods after a traffic burst has subsided. The target utilization band is set to [35%, 75%] to optimize energy efficiency while reserving headroom for load bursts. These values are configurable parameters, allowing the network operator to dynamically adjust the orchestration strategy in alignment with real-time network monitoring, traffic demands, and specific operational priorities. The complete set of system parameters is summarized in Table 2.
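The EWMA forecaster referenced above follows the standard recursion, with α weighting the newest sample (a sketch; the controller's exact update form is assumed to match this textbook definition):

```python
def ewma_forecast(history, alpha=0.75):
    """One-step-ahead EWMA workload forecast:
    S_t = alpha * w_t + (1 - alpha) * S_{t-1}.
    Higher alpha reacts faster to bursts but is more sensitive to spikes."""
    forecast = history[0]                # initialize with the first sample
    for w in history[1:]:
        forecast = alpha * w + (1 - alpha) * forecast
    return forecast
```

For instance, after samples of 100 and 200 MIPS, the forecast with α = 0.75 is 0.75·200 + 0.25·100 = 175 MIPS, illustrating the heavy weighting of the most recent observation.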
To eliminate real-time computational overhead, the sets of active SNs utilized in our simulation experiments were calculated and dimensioned offline assuming a uniform workload distribution. During this offline phase, the optimal topological locations of the active SNs are precomputed for every potential network scaling level (1 ≤ N ≤ 25). These baseline sets are dimensioned using the SSAA, FGLB, and BCLB schemes based strictly on the graph topology and capacity constraints, assuming a uniform baseline workload. Subsequently, during the online (real-time) operational phase, the ENOM module simply maps the dynamically forecasted traffic demand to these pre-calculated configurations. This decoupling essentially reduces the real-time control-plane orchestration to highly efficient O(1) table lookups. To rigorously validate the operational resilience of these offline-dimensioned sets, they were subsequently stress-tested against both uniform and highly stochastic, random workload scenarios.
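The offline/online decoupling can be sketched as below. The sizing rule (dividing the forecast demand by C · U_max) is our illustrative assumption, and `select_fn` stands in for any of the SSAA/FGLB/BCLB offline selectors:

```python
import math

def build_placement_table(select_fn, max_n):
    """Offline phase: precompute the SN set for every scaling level 1..max_n."""
    return {n: select_fn(n) for n in range(1, max_n + 1)}

def online_lookup(table, forecast_mips, capacity_mips, u_max=0.75):
    """Online phase: size the active set so average utilization stays below
    U_max, then fetch the pre-dimensioned SN set with an O(1) dict lookup."""
    n_needed = max(1, math.ceil(forecast_mips / (capacity_mips * u_max)))
    return table[min(n_needed, max(table))]   # clamp to the largest level
```

With C = 500 MIPS and U_max = 75%, a forecast of 1500 MIPS maps to four SNs, since each active SN should carry at most 375 MIPS on average.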
To ensure statistical validity of the results, each simulation scenario was executed for 30 independent iterations using distinct random seeds to generate the offered workload and distribution (Table 1 and Figure 4). The results presented in the following figures depict the mean values calculated across these 30 runs.
Our experiments were conducted on a workstation equipped with an Intel® Core™ Ultra 7 processor (up to 5.0 GHz), 32 GB of RAM and Windows 11 Pro (64-bit). Both the simulation framework and all the evaluated algorithms were implemented in Python 3.12.

6.2. Resource Utilization Efficiency

We first evaluate the resource utilization efficiency of the active SNs as the network scales in response to varying demand. The orchestration module dynamically adjusts the number of active SNs N(t), incrementing the active set only when the aggregate computational capacity N(t) × C becomes insufficient to meet the offered load; likewise, the active set is decreased if the installed capacity is underutilized. Figure 5 reports the average utilization of the active SNs, defined as the fraction of available capacity that is actively used, under both (a) uniform and (b) random workload distributions.
In the uniform offered workload scenario, presented in Figure 5a, SSAA maintains utilization levels higher than the FGLB and BCLB baselines. This sustained high utilization indicates that more workload is accommodated, resulting in fewer SLA violations, as well as a more efficient exploitation of active capacity within the safe operating window and minimal operational expenditure on underutilized active servers. Similarly, in Figure 5b for the random workload distribution, SSAA continues to exhibit superior performance, ensuring equitable workload distribution across the active SNs and likewise fewer SLA violations on the most loaded SNs, even in the presence of demand asymmetry.

6.3. Latency Performance

Figure 6 presents the average end-to-end latency, measured in hop count using the Manhattan distance metric, as the number of active SNs varies. In both load distribution scenarios, although FGLB by design strictly optimizes the K-median objective by iteratively selecting the nearest available resources, the SSAA schemes incur only a marginal latency penalty relative to FGLB, typically an average increase of 0.1–0.2 hops as the number of SNs rises.
The behavior of SSAA represents a strategic design trade-off: the algorithm prioritizes load distribution by assigning ANs to slightly more distant, yet under-utilized, SNs when the nearest neighbors approach capacity. This marginal decrease in latency performance is acceptable, as the improved load balancing mitigates the risk of processor saturation and the queuing delays associated with hotspots, ensuring consistent service reliability.

6.4. Load Balancing Capability

Figure 7 illustrates the proposed framework’s contribution to even resource utilization by plotting the standard deviation σ of the workload processed by active SNs. A lower σ value denotes a more uniform distribution of computational tasks across the infrastructure. The results indicate that the SSAA schemes consistently outperform both the FGLB and BCLB baselines, achieving significantly lower load dispersion across all tested configurations of active SNs N. Notably, in the random workload distribution scenario presented in Figure 7b, SSAA reduces the standard deviation by approximately 40% compared to FGLB when N = 6 , thereby validating its superior capability to maintain system stability under severe demand asymmetry.
Both FGLB and BCLB exhibit a proximity-driven allocation bias, where central and nearby nodes absorb most requests, resulting in processing hotspots while peripheral nodes remain underutilized. In contrast, SSAA employs a proximity-aware round-robin assignment logic that effectively distributes excess load to adjacent tiers of available SNs. This mechanism eliminates hotspots, preventing the structural inefficiency where saturated and idle nodes coexist, and ensuring that the aggregate workload is shared evenly among the active resources.
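A heavily hedged sketch of a proximity-aware assignment with spill-over, in the spirit described above (the actual SSAA round-robin logic is specified in the authors' earlier work [7]; this version simply lets each AN fall back to the next-nearest SN with spare capacity):

```python
def spill_assign(dist, demands, sn_set, capacity):
    """Assign each AN to the nearest SN that still has spare capacity,
    spilling excess demand outward to the next-nearest tier.
    dist[an][sn]: hop distance; demands: AN -> offered MIPS."""
    assignment = {}
    used = {s: 0.0 for s in sn_set}
    for an, demand in demands.items():
        for sn in sorted(sn_set, key=lambda s: dist[an][s]):
            if used[sn] + demand <= capacity:    # capacity-aware check
                assignment[an] = sn
                used[sn] += demand
                break
        else:
            assignment[an] = None                # no SN can absorb: outage
    return assignment, used
```

The capacity check is what prevents the coexistence of saturated and idle nodes: once the nearest SN is full, further demand flows to its neighbors instead of queuing.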

6.5. Service Reliability

We quantify the service and system reliability using the SLAV-Service Outage metric, defined as the total compute demand, measured in MIPS, that remains unprocessed due to the capacity limits of active SNs, thus triggering an SLA violation. While constraint (4) represents a hard constraint in the optimization formulation, in our dynamic simulation environment instantaneous violations of C, where utilization exceeds 100%, are penalized as SLA violations.
Figure 8 reports these service outages (i) as a function of the number of active SNs N (Figure 8a,b) and (ii) as a function of time over the 24-h simulation horizon (Figure 8c,d), under the same dynamic offered workload conditions. The scaling analysis shows that, as N increases, FGLB and BCLB exhibit significant service drops even when the aggregate system capacity is theoretically sufficient, in contrast to the SSAA schemes. This behavior reveals that service failures in edge networks are primarily driven by load imbalance rather than a simple lack of resources. Because FGLB and BCLB assign tasks to the nearest or most central nodes, they create “hotspots” that saturate specific servers while others remain underutilized; SSAA eliminates these outages by distributing excess demand to adjacent, non-saturated nodes, thereby effectively exploiting the available capacity across the entire active set S(t).
The operational superiority of the proposed framework is further evidenced in the time-series analysis, particularly during the transition periods between traffic intensity levels (e.g., P2 to P3). As illustrated in Figure 8c,d, the FGLB and BCLB strategies suffer severe spikes in SLA violations during these peak intervals, as they struggle to accommodate bursts reactively. Conversely, the SSAA schemes maintain extremely low outage levels throughout the 24-h horizon, regardless of the workload distribution type.

6.6. Energy Efficiency

We model the SN’s power consumption as a step function of the measured CPU utilization, using the load-power curve published by the SPECpower_ssj2008 [31] benchmark for the Lenovo ThinkSystem SR655 V3. Although this model focuses exclusively on active compute power and does not account for baseline idle power draw, network-side energy consumption (e.g., switches, access points, routers), or the potential energy overhead associated with service migration during reconfiguration events, the SPECpower-based step function effectively captures high-level trends in server-side CPU power consumption under varying utilization states and represents a simplified abstraction of real-world MEC power dynamics. For each SN with utilization u_s ∈ [0, 100]%, we map u_s to a 10-percentage-point load range (0–10, …, 90–100%) with corresponding input power {128, 160, 187, 213, 236, 258, 280, 302, 340, 365} in Watts. Thus, the aggregate electrical power required during operation is computed as P_agg = Σ_s P(u_s), where u_s is the utilization of SN s.
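The step-function power model described above maps each utilization value to its 10-point load band; a direct sketch (the handling of exact band boundaries is a convention we assume):

```python
# SPECpower_ssj2008-derived input power (Watts) for the ten load bands
# 0-10%, 10-20%, ..., 90-100% (Lenovo ThinkSystem SR655 V3, per the text).
POWER_STEPS_W = [128, 160, 187, 213, 236, 258, 280, 302, 340, 365]

def sn_power(utilization_pct):
    """Map an SN's utilization to its band's input power; 100% is folded
    into the top (90-100%) band."""
    band = min(int(utilization_pct // 10), 9)
    return POWER_STEPS_W[band]

def aggregate_power(utilizations):
    """P_agg = sum over all active SNs of P(u_s)."""
    return sum(sn_power(u) for u in utilizations)
```

For example, two SNs running at 35% and 75% utilization draw 213 W and 302 W respectively, for an aggregate of 515 W.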
Figure 9 illustrates the aggregate electrical consumption of the different schemes as the network scales across active SN subsets S ⊆ V. The results indicate that the proposed SSAA schemes incur marginally higher energy consumption than FGLB and BCLB, a consequence of their load-balancing logic and an acceptable trade-off for the improved reliability shown in Figure 8. Consistent with the system utilization depicted in Figure 5, this higher power consumption is inherently associated with increased server utilization, as more processing load is accommodated to reduce SLAV occurrences. Moreover, by maintaining a more balanced utilization across the active service pool, SSAA keeps edge nodes within a power-efficient operating band, avoiding the energy waste associated with local processing saturation. Consequently, the proposed framework delivers superior reliability and performance stability without introducing additional energy-related operational expenditure (OPEX), demonstrating that intelligent load distribution can simultaneously enhance service quality and preserve power efficiency. SSAA therefore achieves a desirable MEC operating point, with higher SLA compliance and better fairness at comparable energy cost.

6.7. Performance Evaluation Under Irregular Topology and Real-World Traces

In this section, we evaluate the ENOM framework on a 49-node irregular graph that better captures the gradual and irregular expansion of a real-world evolving network architecture. Additionally, we utilize real-world CPU utilization traces from the BitBrains dataset [32], which capture the highly stochastic and temporal CPU utilization of servers over time. Furthermore, we conduct a sensitivity analysis across different combinations of the hysteresis stability window (L) and the EWMA smoothing factor ( α ) to study their impact on system performance under constantly varying workloads.
Figure 10 illustrates this topology layout, presenting a snapshot of the system configuration in which 13 SNs are strategically activated to handle the offered workload.
The GWA-T-12 dataset from BitBrains contains detailed performance metrics and resource utilization traces of numerous virtual machines from distributed enterprise environments, providing a realistic representation of stochastic user traffic. Before mapping the dataset to our simulated MEC topology, data sanitization was applied. We eliminated abnormal traces, including records indicating over 100% server CPU utilization, incomplete or empty files, and files with a sparse number of timestamps. From the refined dataset, 49 distinct and continuous workload traces were randomly selected to feed the 49 ANs of our topology. Furthermore, to align the dataset with our specific MEC environment, the raw utilization metrics of each node were mapped and normalized as “offered workload”, according to the maximum processing capacity of the edge nodes (500 MIPS).
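The pre-processing steps above might look like the following sketch (the trace container format and function name are assumptions; the filtering thresholds follow the description in the text):

```python
def sanitize_and_map(traces, min_len, capacity_mips=500):
    """Drop abnormal BitBrains traces (CPU readings outside 0-100% or too
    few timestamps), then rescale the surviving utilization percentages to
    offered workload in MIPS against the edge node capacity (500 MIPS)."""
    clean = {}
    for name, cpu_pct in traces.items():
        if len(cpu_pct) < min_len:                  # sparse or empty trace
            continue
        if any(u < 0 or u > 100 for u in cpu_pct):  # abnormal readings
            continue
        clean[name] = [u / 100 * capacity_mips for u in cpu_pct]
    return clean
```

A 40% CPU reading thus becomes an offered workload of 200 MIPS on a 500 MIPS edge node, keeping the traces directly comparable with the synthetic workload profiles.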
The simulation timeline was extended to 7200 timestamps per simulation run. We performed 10 independent iterations, each using a distinct random seed to select the 49 workload traces from the BitBrains dataset, configuring the offered workload profile illustrated in Figure 11. The results presented in the following figures depict the mean values calculated across these 10 independent runs.
The complete set of system parameters for this irregular deployment is summarized in Table 3.
Figure 12 details the operational performance of the three evaluated algorithms (FGLB, BCLB, and the proposed SSAA) across varying numbers of active SNs. For this experimental scenario, the hysteresis stability window and the EWMA smoothing factor are set to L = 10 and α = 0.75, respectively. The analysis, as illustrated in Figure 12, further highlights the robustness of both the ENOM framework and the FGLB and SSAA schemes over the BCLB algorithm under this stochastic workload within the 49-node irregular topology. Notably, based on the load profile of the BitBrains dataset, the dynamic scaling mechanism mainly configures the system to operate with 8, 9, or 10 active SNs to efficiently satisfy the offered demand.
A.
Mean Latency
As expected by its greedy K-median design, FGLB achieves the lowest absolute latency (averaging between 0.6 and 1.0 hops). The proposed SSAA incurs a very marginal latency penalty compared to FGLB. This occurs because SSAA deliberately routes some traffic to slightly more distant nodes to prevent forming localized bottlenecks due to its balancing logic.
B.
Load Balancing Performance
The SSAA scheme consistently demonstrates a significantly lower standard deviation in resource utilization across almost all network sizes compared to both FGLB and BCLB. Although there are specific instances such as when the system is provisioned with 12 or 17 active SNs, where its load imbalance exceeds that of FGLB, its overall variance remains lower, as shown in Figure 13b ( L = 10 , α = 0.75 ). This is also justified by the fact that the system predominantly operates within the 8 to 10 SNs range, where SSAA excels.
C.
Average Resource Utilization
The SSAA maintains consistently higher and more stable average utilization percentages across the active SNs. While there are specific instances where its utilization equals or falls below that of FGLB, its overall efficiency remains superior, as depicted in Figure 13c ( L = 10 , α = 0.75 ). Because the dynamic scaling mechanism mainly configures the system to operate within the 8 to 10 SN range, where SSAA achieves better exploitation of active capacity, overall utilization under SSAA is higher.
D.
Service Reliability
As illustrated in Figure 12d, both FGLB and BCLB experience massive spikes in unprocessed MIPS (SLA violations/outages), even when the aggregate active capacity is theoretically sufficient. Although there may be instances (e.g., when the system is provisioned with 12 or 17 active SNs) where SSAA records higher SLA violations, its overall reliability remains superior, as shown in Figure 13d ( L = 10 , α = 0.75 ). This behavior confirms that service failures are mainly driven by poor load distribution (hotspots) rather than a strict lack of resources. Ultimately, SSAA effectively minimizes these outages, validating its superiority in maintaining QoS.
E.
Hysteresis Stability Window and EWMA Smoothing Factor
Figure 13 evaluates the proposed framework under various configurations of the hysteresis stability window ( L { 5 , 10 , 20 } ) and the EWMA smoothing factor ( α { 0.5 , 0.75 , 0.9 } ), effectively illustrating the trade-off between system stability and responsiveness. A key observation from this sensitivity analysis is that the SSAA scheme maintains its superiority across all variations of L and α . Specifically, it consistently delivers the lowest load balance variance, the highest average utilization, and superior service reliability (Figure 13b–d). This proves that the structural approach of SSAA remains robust regardless of how aggressively the temporal forecasting controller is tuned, further validating the overall stability of the ENOM framework.
Furthermore, Figure 13e highlights that the length of the hysteresis window, L, strictly determines the frequency of system reconfigurations. A smaller window ( L = 5 ) renders the system highly sensitive to the temporary traffic micro-bursts present in the BitBrains data traces, triggering an excessive number of reconfigurations (averaging 640 events). Conversely, a larger window ( L = 20 ) significantly enhances system stability, reducing reconfigurations to approximately 156.
However, this stability has an impact on resource utilization and service reliability. As shown in Figure 13c, as L increases, utilization deteriorates across all schemes because the system waits longer before deactivating SNs, thus underutilizing resource capacity. Furthermore, as depicted in Figure 13d, while a larger window ( L = 20 ) minimizes reconfiguration overhead, it inevitably introduces delayed resource scaling. Since the system waits longer to confirm a sustained load increase before activating new SNs, existing nodes become temporarily saturated, leading to a higher rate of service outages.
Finally, the total power consumption (Figure 13f) varies slightly across configurations, depending primarily on the duration that SNs remain active and their respective resource utilization levels. A larger, more conservative window ( L = 20 ) keeps nodes powered on longer during scale-down phases. This delayed deactivation, combined with the higher average resource utilization depicted in Figure 13c, leads to a marginal increase in total power consumption compared to highly sensitive configurations.
Beyond the empirical performance demonstrated in our evaluations, traditional facility location algorithms, such as the K-median or K-means clustering models, aim strictly to minimize the sum of routing distances. Even with the incorporation of a load balancing adaptation, these models tend to create highly centralized clusters, leading to severe localized network bottlenecks around the chosen center nodes during traffic bursts. In contrast, the SSAA logic inherently acts as a spatial dispersion mechanism. By iteratively selecting and placing active SNs based on a combination of geographical spread and localized demand, the heuristic naturally distributes the computational load across the network graph. Furthermore, during the assignment phase of ANs to SNs, the heuristic balances user proximity against the maximum utilization thresholds of the SNs. This capacity-aware assignment enforces fairness in resource allocation, preventing any single node from reaching saturation while neighboring nodes remain idle.

7. Conclusions

In this research, we presented a Workload-Aware Edge Node Orchestration (ENOM) framework that integrates short-term workload forecasting (EWMA), hysteresis-based stability control, and the SSAA placement-assignment heuristics to efficiently manage dynamic processing demands in MEC environments. The proposed approach targets the core MEC trade-off between proximity (latency) and even resource utilization (load balance) while explicitly accounting for operational stability by avoiding unnecessary reconfiguration “flapping”. Extensive simulations on grid and irregular network topologies, under synthetic (uniform and non-uniform) and real-world (BitBrains) workload distributions, demonstrate that our mechanism is highly robust and reliable. Compared to FGLB and BCLB, the SSAA scheme achieves more stable and efficient utilization of the active service pool, substantially improved load balancing, and significantly reduced SLAVs (service outages), including during high-demand periods, with no observable energy penalty under the adopted utilization-to-power model. These gains are achieved while incurring only a small hop-count latency increase relative to the strictly latency-optimized baseline, a trade-off that is operationally justified by the improvements in reliability and hotspot avoidance.
These findings confirm that the proposed framework offers a practical, lightweight, and scalable solution for next-generation edge networks, capable of delivering consistent QoS without the computational overhead of complex learning-based models. Directions for future work include extending the ENOM framework by incorporating robust fault-tolerance and recovery mechanisms to effectively manage unexpected hardware failures, AN disconnections, and link outages. Furthermore, to support large-scale orchestration in highly complex, heterogeneous edge infrastructures, we will investigate decentralized and federated coordination architectures. This will be coupled with the development of lightweight, multi-step forecasting methods designed to preserve controller simplicity while further enhancing responsiveness to rapid temporal fluctuations.

Author Contributions

Conceptualization, E.O. and A.R.; methodology, E.O. and A.R.; software, E.O.; validation, E.O. and A.R.; formal analysis, E.O. and A.R.; investigation, E.O.; writing—original draft preparation, E.O. and A.R.; writing—review and editing, E.O. and A.R.; supervision, A.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ACA  Access Node-to-Compute Node Assignment
AN  Access Node
AR  Augmented Reality
BCLB  Betweenness Centrality with Load Balancing
CSS  Compute-Node Selection Set
DL  Deep Learning
DRL  Deep Reinforcement Learning
ENOM  Edge Node Orchestration Module
ESP  Edge Server Placement
EWMA  Exponentially Weighted Moving Average
FGLB  Forward Greedy with Load Balancing
ILP  Integer Linear Programming
IoT  Internet of Things
LSTM  Long Short-Term Memory
MEC  Multi-Access Edge Computing
MIPS  Million Instructions Per Second
OPEX  Operational Expenditure
QoE  Quality of Experience
QoS  Quality of Service
RAN  Radio Access Network
RR  Round Robin
SSAA  Service-Node Selection and Access-Node Assignment
SLA  Service Level Agreement
SN  Service Node

References

  1. Satyanarayanan, M. The Emergence of Edge Computing. Computer 2017, 50, 30–39. [Google Scholar] [CrossRef]
  2. Wu, Q.; Wang, W.; Fan, P.; Fan, Q.; Wang, J.; Letaief, K.B. URLLC-Aware Resource Allocation for Heterogeneous Vehicular Edge Computing. IEEE Trans. Veh. Technol. 2024, 73, 11789–11805. [Google Scholar] [CrossRef]
  3. Satyanarayanan, M.; Bahl, P.; Caceres, R.; Davies, N. The Case for VM-Based Cloudlets in Mobile Computing. IEEE Pervasive Comput. 2009, 8, 14–23. [Google Scholar] [CrossRef]
  4. Hu, Y.C.; Patel, M.; Sabella, D.; Sprecher, N.; Young, V. Mobile Edge Computing: A Key Technology Towards 5G. ETSI White Pap. 2015, 11, 1–16. [Google Scholar]
  5. Cao, K.; Liu, Y.; Meng, G.; Sun, Q. An Overview on Edge Computing Research. IEEE Access 2020, 8, 85714–85728. [Google Scholar] [CrossRef]
  6. Li, S. On Uniform Capacitated k-Median Beyond the Natural LP Relaxation. ACM Trans. Algorithms 2017, 13, 1–18. [Google Scholar] [CrossRef]
  7. Oikonomou, E.; Rouskas, A. Efficient Schemes for Optimizing Load Balancing and Communication Cost in Edge Computing Networks. Information 2024, 15, 670. [Google Scholar] [CrossRef]
  8. Oikonomou, E.; Plastras, S.; Tsoumatidis, D.; Skoutas, D.N.; Rouskas, A. Workload Prediction for Efficient Node Management in Mobile Edge Computing. In Proceedings of the 2024 IFIP Networking Conference (IFIP Networking), Thessaloniki, Greece, 3–6 June 2024; pp. 461–467. [Google Scholar]
  9. Oikonomou, E.; Rouskas, A.; Skoutas, D.N. Adaptive Node Management in Edge Networks Under Time-Varying Workloads. In Proceedings of the 2024 IEEE Virtual Conference on Communications (VCC), NY, USA, 3–5 December 2024; pp. 1–6. [Google Scholar]
  10. Jia, M.; Cao, J.; Liang, W. Optimal Cloudlet Placement and User to Cloudlet Allocation in Wireless Metropolitan Area Networks. IEEE Trans. Cloud Comput. 2017, 5, 725–737. [Google Scholar] [CrossRef]
  11. Wang, S.; Zhao, Y.; Xu, J.; Yuan, J.; Hsu, C.-H. Edge server placement in mobile edge computing. J. Parallel Distrib. Comput. 2019, 127, 160–168. [Google Scholar] [CrossRef]
  12. Kasi, S.K.; Kasi, M.K.; Ali, K.; Raza, M.; Afzal, H.; Lasebae, A.; Naeem, B.; Islam, S.; Rodrigues, J.J. Heuristic Edge Server Placement in Industrial Internet of Things and Cellular Networks. IEEE Internet Things J. 2021, 8, 10308–10317. [Google Scholar] [CrossRef]
  13. Shibata, K.; Miyata, S. Edge Server Placement and Task Allocation for Maximum Delay Reduction. IEEE Open J. Commun. Soc. 2025, 6, 6207–6217.
  14. Maqsood, T.; Zaman, S.K.U.; Qayyum, A.; Rehman, F.; Mustafa, S.; Shuja, J. Adaptive thresholds for improved load balancing in mobile edge computing using K-means clustering. Telecommun. Syst. 2024, 86, 519–532.
  15. Li, Y.; Wang, S. An Energy-Aware Edge Server Placement Algorithm in Mobile Edge Computing. In Proceedings of the 2018 IEEE International Conference on Edge Computing (EDGE), San Francisco, CA, USA, 2–7 July 2018; pp. 66–73.
  16. Fan, Q.; Ansari, N. Towards Workload Balancing in Fog Computing Empowered IoT. IEEE Trans. Netw. Sci. Eng. 2020, 7, 253–262.
  17. Xie, Z.; Xia, X.; Hu, B.; Khalil, I.; Wang, Z.; Cui, G.; Xie, G.; Xue, M. Optimizing Energy Efficiency with QoE-Awareness in Multi-Access Edge Computing. In Proceedings of the 2025 IEEE/ACM International Symposium on Quality of Service (IWQoS), Gold Coast, Australia, 2–4 July 2025; pp. 1–10.
  18. Tang, Y.; Zhang, H.; Li, Y.; Wang, Z. SDD: Spectral clustering and double deep Q-network based edge server deployment strategy. Front. Comput. Sci. 2025, 7, 1668495.
  19. Zeng, Y.; Ye, H.; Wang, S. A Deep Reinforcement Learning-Based Approach to Resilient Service Placement for Mobile Edge Computing. IEEE Access 2025, 13, 159231–159239.
  20. Yu, P.; Wang, X.; Zhou, L.; Li, K. Cost-Effective Server Deployment for Multi-Access Edge Networks: A Cooperative Scheme. IEEE Trans. Parallel Distrib. Syst. 2024, 35, 1583–1597.
  21. Xu, W. A Survey on Dynamic Heterogeneous Resource Scheduling Optimization in Mobile Edge Computing. Sci. J. Intell. Syst. Res. 2025, 7, 54–62.
  22. Tang, L.; Hou, Q.; Wen, W.; Fang, D.; Chen, Q. Digital-Twin-Assisted VNF Migration Through Resource Prediction in SDN/NVF-Enabled IoT Networks. IEEE Internet Things J. 2024, 11, 35445–35464.
  23. Nadukuru, S. AI-Based Workload Prediction Models for Optimizing Serverless Resource Allocation. J. Quantum Sci. Technol. 2024, 1, 45–54.
  24. Xu, W.; Li, Y.; Chen, M.; Wu, J. Fairness-Aware Budgeted Edge Server Placement for Connected Autonomous Vehicles. IEEE Trans. Mob. Comput. 2025, 24, 4762–4776.
  25. Burbano, J.S.; Abdullah, A.; Zhantileuov, E.; Liyanage, M.; Schuster, R. Dynamic Edge Server Selection in Time-Varying Environments: A Reliability-Aware Predictive Approach. arXiv 2025, arXiv:2511.10146.
  26. Shi, Y.; Yi, C.; Wang, R.; Wu, Q.; Chen, B.; Cai, J. Service Migration or Task Rerouting: A Two-Timescale Online Resource Optimization for MEC. IEEE Trans. Wirel. Commun. 2024, 23, 1503–1519.
  27. Montgomery, D.C.; Jennings, C.L.; Kulahci, M. Introduction to Time Series Analysis and Forecasting, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2015.
  28. Dohan, D.; Karp, S.; Matejek, B. K-Median Algorithms: Theory in Practice. Working Paper, Princeton University, 2015. Available online: https://www.cs.princeton.edu/courses/archive/fall14/cos521/projects/kmedian.pdf (accessed on 26 March 2026).
  29. Karaca, O.; Tihanyi, D.; Kamgarpour, M. Performance guarantees of forward and reverse greedy algorithms for minimizing nonsupermodular nonsubmodular functions on a matroid. Oper. Res. Lett. 2021, 49, 855–861.
  30. Chernoskutov, M.; Ineichen, Y.; Bekas, C. Heuristic Algorithm for Approximation Betweenness Centrality Using Graph Coarsening. Procedia Comput. Sci. 2015, 66, 83–92.
  31. SPECpower_ssj2008. Available online: https://spec.org/power_ssj2008/results/res2024q4/power_ssj2008-20241118-01475.html (accessed on 11 February 2025).
  32. BitBrains. GWA-T-12 BitBrains. Available online: https://atlarge-research.com/gwa-t-12 (accessed on 28 February 2026).
Figure 1. Three-layer architecture (Cloud/Edge/End Device). The Edge layer shows a grid network of ANs; green nodes host active (operational) SNs, and orange nodes denote ANs without an active SN. Solid and dashed lines depict physical inter-AN links. End devices connect wirelessly to their nearest AN and offload their service requests to the assigned SN.
Figure 2. Flow of the ENOM module.
Figure 3. Manhattan grid graph topology.
Figure 4. Day-time load profile (expressed in MIPS).
Figure 5. Average resource Utilization at varying number of active SNs. (a) Uniform offered workload distribution. (b) Random offered workload distribution.
Figure 6. Mean end-to-end Latency in hop count at varying number of active SNs. (a) Uniform offered workload distribution. (b) Random offered workload distribution.
Figure 7. Load Balancing performance at varying number of active SNs. (a) Uniform offered workload distribution. (b) Random offered workload distribution.
Figure 8. Service Reliability quantified by Service Outage expressed in MIPS for (a) Uniform offered workload distribution, at varying number of SNs. (b) Random offered workload distribution, at varying number of SNs. (c) Uniform offered workload distribution, over the 24-h simulation horizon. (d) Random offered workload distribution, over the 24-h simulation horizon.
Figure 9. Power Consumption across different capacity needs. (a) Uniform offered workload distribution. (b) Random offered workload distribution.
Figure 10. Irregular edge network topology.
Figure 11. BitBrains dataset load profile (expressed in MIPS).
Figure 12. Performance metrics at varying number of active SNs. (a) Mean end-to-end Latency. (b) Load Balancing performance. (c) Average resource Utilization. (d) Service Reliability.
Figure 13. Sensitivity Analysis across different L and α configurations. (a) Mean end-to-end Latency. (b) Load Balancing performance. (c) Average resource Utilization. (d) Service Reliability. (e) System Reconfigurations. (f) Power Consumption.
Table 1. Offered workload profile per time-of-day period (expressed in MIPS).
Period | Time Window | Load Intensity | Avg. Load (MIPS) | Load Dispersion
P1 | 00:00–02:00 | Medium–High | 215.0 | 90
P2 | 02:00–07:00 | Medium | 145.0 | 90
P3 | 07:00–10:00 | High | 275.0 | 90
P4 | 10:00–12:00 | Medium | 145.0 | 90
P5 | 12:00–14:00 | Medium–Low | 85.0 | 70
P6 | 14:00–16:00 | Low | 30.0 | 60
P7 | 16:00–21:00 | High | 275.0 | 90
P8 | 21:00–24:00 | Medium | 145.0 | 90
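The time-of-day profile in Table 1 can be reproduced with a simple generator. The sketch below is illustrative only: it assumes the dispersion column bounds a uniform spread around each period mean, which is an assumption, since the sampling distribution is not specified in the table itself.

```python
import random

# (start hour, end hour, mean load in MIPS, dispersion) per Table 1
PERIODS = [
    (0, 2, 215.0, 90), (2, 7, 145.0, 90), (7, 10, 275.0, 90),
    (10, 12, 145.0, 90), (12, 14, 85.0, 70), (14, 16, 30.0, 60),
    (16, 21, 275.0, 90), (21, 24, 145.0, 90),
]

def offered_load(hour, rng=random):
    """Draw one offered-load sample (MIPS) for the given hour of day."""
    for start, end, mean, disp in PERIODS:
        if start <= hour < end:
            # Uniform spread around the period mean, clipped at zero.
            return max(0.0, mean + rng.uniform(-disp, disp))
    raise ValueError("hour must be in [0, 24)")
```

Sampling this generator once per epoch for each AN yields a 24-h synthetic workload trace comparable to the day-time profile of Figure 4.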
Table 2. System simulation parameters.
Parameter | Value | Definition
M | 25 | Total number of ANs
N(t) | 1 ≤ N(t) ≤ 25 | Number of Active SNs
C | 500 | SN Processing Capacity (MIPS)
d_adj | 1 | Manhattan distance between adjacent ANs (hop)
U_min, U_max | 35%, 75% | Target Utilization Bounds (Low–High)
α | 0.75 | EWMA smoothing factor
L | 5 | Hysteresis stability window (time epochs)
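To illustrate how the α and L parameters in Table 2 interact, the sketch below combines an EWMA short-term forecast with a hysteresis-gated scaling decision. The function names and the exact trigger rule (all L recent utilization samples out of bounds) are assumptions for illustration, not the paper's orchestration algorithm.

```python
def ewma_forecast(history, alpha=0.75):
    """Exponentially weighted moving average over observed load samples.

    Higher alpha weights recent samples more heavily (Table 2 uses 0.75).
    """
    s = history[0]
    for x in history[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

def scaling_decision(util_history, u_min=0.35, u_max=0.75, window=5):
    """Scale only if utilization stays out of bounds for a full window.

    The hysteresis window (L = 5 epochs in Table 2) suppresses
    reconfigurations triggered by transient load spikes or dips.
    """
    recent = util_history[-window:]
    if len(recent) < window:
        return "hold"
    if all(u > u_max for u in recent):
        return "scale_out"   # sustained over-utilization
    if all(u < u_min for u in recent):
        return "scale_in"    # sustained under-utilization
    return "hold"
```

With this gating, a single epoch of high utilization does not activate a new SN; only persistent pressure across the full stability window does, which is consistent with the reduced reconfiguration counts reported for the framework.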
Table 3. System simulation parameters for the irregular topology.
Parameter | Value | Definition
M | 49 | Total number of ANs
N(t) | 1 ≤ N(t) ≤ 49 | Number of Active SNs
C | 500 | SN Processing Capacity (MIPS)
d_adj | 1 | Manhattan distance between adjacent ANs (hop)
U_min, U_max | 35%, 75% | Target Utilization Bounds (Low–High)
α | 0.5, 0.75, 0.9 | EWMA smoothing factor
L | 5, 10, 20 | Hysteresis stability window (time epochs)