Article

A Lightweight Learning-Based Approach for Online Edge-to-Cloud Service Placement

by Mohammadsadeq Garshasbi Herabad 1, Javid Taheri 1,2,*, Bestoun S. Ahmed 1 and Calin Curescu 3

1 Department of Mathematics and Computer Science, Karlstad University, 651 88 Karlstad, Sweden
2 School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast BT7 1NN, UK
3 Ericsson AB, 164 83 Stockholm, Sweden
* Author to whom correspondence should be addressed.
Electronics 2026, 15(1), 65; https://doi.org/10.3390/electronics15010065
Submission received: 14 November 2025 / Revised: 15 December 2025 / Accepted: 18 December 2025 / Published: 23 December 2025

Abstract

The integration of edge and cloud computing is critical for resource-intensive applications that require low-latency communication, high reliability, and efficient resource utilisation. The service placement problem in these environments poses significant challenges owing to dynamic network conditions, heterogeneous resource availability, and the necessity for real-time decision-making. Because determining an optimal service placement in such networks is an NP-complete problem, existing solutions rely on fast but suboptimal heuristics or computationally intensive metaheuristics. Neither approach meets the real-time demands of online scenarios, owing to suboptimal solution quality in the former and high computational overhead in the latter. In this study, we propose a lightweight learning-based approach for the online placement of services with multi-version components in edge-to-cloud computing. The proposed approach utilises a Shallow Neural Network (SNN) whose weight and power coefficients are optimised using a Genetic Algorithm (GA). The use of an SNN ensures low computational overhead during the training phase and almost instant inference when deployed, making it well suited for real-time and online service placement in edge-to-cloud environments where rapid decision-making is crucial. The proposed method (SNN-GA) is evaluated in AR/VR-based remote repair and maintenance scenarios, developed in collaboration with our industrial partner, and demonstrates robust performance and scalability across a wide range of problem sizes. The experimental results show that SNN-GA reduces the service response time by up to 27% compared with metaheuristics and 55% compared with heuristics at larger scales. It also achieves over 95% platform reliability, outperforming heuristics (which remain below 85%) and metaheuristics (which decline to 90% at larger scales).

Graphical Abstract

1. Introduction

The integration of edge and cloud computing improves performance and fulfils critical latency requirements for resource-intensive systems such as Augmented Reality (AR) and Virtual Reality (VR). Edge computing processes data near its source, allowing low-latency handling of time-sensitive tasks on local devices or on nearby servers. The cloud complements the edge by providing support for more demanding workloads [1]. In edge-to-cloud computing, real-time decision-making is essential for effectively allocating resources online to services while fulfilling the Quality of Service (QoS) requirements. The complexity of service placement in such environments is amplified by the demanding nature of real-time multimedia applications. The heterogeneity of devices and computing nodes in the edge-to-cloud network (which varies significantly in communication protocols, processing power, storage capacity, and other resource attributes) adds further complexity. Additionally, dynamic conditions such as variable task arrival rates and fluctuations in resource availability render the development of online service placement algorithms even more challenging in edge-to-cloud computing [2,3,4].
The service placement problem in edge-to-cloud networks is identified as a combinatorial optimisation problem [5] that involves selecting and placing services on computing nodes (in an edge-to-cloud platform) under various constraints. Because the solution space expands exponentially with the number of services and computing nodes (identified as NP-complete [3,6,7]), exploring all possible solutions in polynomial time is infeasible. In addition, resource and latency constraints, interdependencies between service components, and the need to optimise multiple objectives (e.g., latency and system reliability) make it more difficult when compared with conventional resource allocation and scheduling problems. The addition of more flexibility further expands the solution space, making the problem even more difficult. For example, in our previous study [3] (also in this study), we considered a situation in which each service component can have multiple implementation versions. Each version differs in terms of resource demand, performance characteristics, and QoS. This expands the solution space significantly, as the placement algorithm must simultaneously decide not only where to place each service component but also which version to deploy. This versioning aspect introduces additional decision variables and constraints, rendering the service placement problem even more challenging and computationally intensive.
Service placement optimisation in edge-to-cloud environments can be handled using exact, heuristic, and metaheuristic techniques. Exact methods provide optimal solutions if adequate time is available; however, their runtimes grow exponentially with the problem size. Heuristic methods do not provide optimal solutions but usually yield reasonable results almost instantly. Metaheuristics can provide near-optimal solutions, although applying them directly to the service placement problem significantly increases the runtime compared with heuristics, making them suitable only for offline resource allocation problems. Consequently, only heuristics (despite their serious shortcomings in solution quality) remain viable for online scenarios that require real-time decision-making for service placement [5].
Learning-based techniques are another class of approaches that have recently gained attention for solving service placement problems [8,9]. These methods follow a training-inference framework: a model is first trained during a typically time-consuming offline phase and is then applied to solve new problem instances. Inference is ideally as fast as a simple greedy algorithm (i.e., almost instant), while providing higher-quality solutions than heuristics. These methods are also significantly more efficient than metaheuristics and exact methods in terms of computational resource usage during the inference phase. Their data-driven nature makes them powerful tools for modern edge-to-cloud systems that require intelligent and robust placement decisions. However, these methods have several limitations. First, although training is performed offline, it requires a large volume of high-quality data, which may not always be available. Second, these models (especially deep learning models) act as black boxes, lacking transparency and interpretability in decision-making. Third, models trained in one environment may not generalise well to others without retraining or adaptation, which can be costly and time-consuming. To address these challenges and develop reliable and efficient learning-based solvers, several design considerations are essential: such models must effectively balance exploration and exploitation to prevent overfitting or underfitting to specific problem instances while maintaining strong generalisation capabilities, and they must be carefully designed because learning-based approaches tend to be sensitive to their hyperparameters.
Although service placement has been extensively studied, most existing approaches rely on simplified assumptions, such as single-version service components, offline placement with full system knowledge, homogeneous computing nodes or resource demands, or optimisation focused primarily on latency. Recent learning-based methods also depend on computationally expensive deep reinforcement learning or deep neural networks, making them unsuitable for real-time decision-making. In contrast, this study considers a more complex and realistic online edge-to-cloud scenario in which each service component has multiple implementation versions with different resource demands and reliability scores. This multi-version architecture, combined with heterogeneous computing nodes and the absence of future system-state knowledge, significantly expands the search space and introduces new challenges. These include selecting an appropriate version of each service component, choosing the most suitable node for placement, jointly optimising the response time and both software and hardware reliability, and achieving generalisability without requiring large training datasets.
In this study, we propose a novel lightweight learning-based approach (SNN-GA) to address the online service placement problem. We design a shallow yet highly efficient Neural Network (NN) that accepts various features of the edge-to-cloud system as input and produces an optimal (or near-optimal) solution almost instantly. To handle nonlinearity, SNN-GA uses both power and weight coefficients (instead of the commonly used activation functions) to process input features from edge-to-cloud environments. During the training phase, a Genetic Algorithm (GA) optimiser determines the optimal values of both the weight and power coefficients in the NN layer of SNN-GA. The optimisation process simultaneously considers multiple objectives (minimising service response time and maximising system reliability). Once SNN-GA is fully trained, it can be employed in the inference phase to make online service placement decisions in other edge-to-cloud environments. SNN-GA was evaluated on a wide range of problem instances and proved efficient, robust, and generalisable, solving problem instances multiple times larger than those used for its training.
The SNN-GA differs from the existing learning-based methods. First, it employs a shallow neural network, which reduces the number of parameters to be learned, resulting in faster training and inference compared with other methods, such as commonly used deep learning models. Second, the training process is unsupervised and does not require large volumes of high-quality labelled data, which are often difficult to obtain in edge-to-cloud environments. These differences significantly enhance the interpretability and explainability of the SNN-GA, making it more suitable for online scenarios. The main contributions of this study are as follows:
  • We model an online service placement scenario with multi-version service components in edge-to-cloud environments. We evaluate our approach using an AR/VR application (for remote repair and maintenance) with unique requirements.
  • We design a lightweight learning-based approach, called SNN-GA, to solve the stated problem with the primary objectives of minimising service response time while maximising system reliability.
  • We use a Genetic Optimiser to train a shallow neural network with both weight and power coefficients (instead of commonly used activation functions) to help it handle nonlinearity in the environment.
  • We perform a comprehensive analysis to evaluate SNN-GA (in terms of effectiveness, efficiency, robustness, and generalisability) and compare it against several state-of-the-art heuristic and metaheuristic approaches.
The remainder of this paper is organised as follows. Section 2 reviews the related work. Section 3 and Section 4 describe the proposed system and objective function, respectively. Section 5 details our approach (SNN-GA) to solving the online service placement problem. Section 6 discusses the experimental setup. Section 7 and Section 8 evaluate SNN-GA and compare it with other approaches. Section 9 concludes the study and highlights future work.

2. Related Work

Approaches for optimally placing services in edge-to-cloud computing environments can be categorised into three main classes.

2.1. Heuristic-Based Solutions

Several heuristic approaches for service placement and resource allocation have been introduced in the literature. For example, Brogi et al. [10] presented latency-aware heuristic algorithms for deploying multicomponent IoT applications on an edge infrastructure. Li et al. [11] proposed a proactive graph-colouring heuristic to optimise task offloading and resource allocation in mobile edge computing to improve virtual reality users’ quality of experience (QoE). Mahjoubi et al. [12] introduced a set of heuristic algorithms to handle service chain placement in three-layer IoT–edge–cloud architectures, aiming to minimise total service delay through Mixed-Integer Linear Programming (MILP). Khan et al. [13] presented a computation offloading framework for edge systems with two methods (Maximum Offloading with Delay Constraint and Minimum Delay Offloading) to handle sudden video streaming spikes. Xu et al. [2] developed a heuristic-based optimisation model for multi-user edge computing networks to minimise task delays by incorporating elements of genetic algorithms to refine the solution. Wu et al. [14] designed a decentralised strategy for resource allocation using fuzzy control systems that allow edge users to utilise local information for decision-making.
Although the above studies provide valuable insights into optimising service placement in edge-to-cloud networks, most of them focus on a single objective, often simply minimising latency. This narrow focus neglects other crucial Quality of Service (QoS) metrics. Many also employ heuristics that may not scale effectively to large and complex network environments, potentially leading to suboptimal solutions. Some techniques rely heavily on the local information available at individual devices or edge nodes, which may hinder the achievement of globally optimal solutions. Consequently, heuristic service placement approaches encounter challenges in terms of solution quality and generalisability when applied to edge-to-cloud environments.

2.2. Metaheuristic-Based Solutions

In our earlier work [3], we proposed a Multi-Objective Genetic Algorithm (MOGA) for AR/VR service placement in remote repair and maintenance applications. Extensive experiments revealed that metaheuristic algorithms significantly outperform conventional heuristics in terms of solution quality; however, their runtimes increase exponentially with the problem size. de Souza et al. [6] developed a Bee Colony optimisation strategy to reduce application execution time by offloading tasks to the network edge. Hosseinzadeh et al. [7] introduced a discrete Butterfly Optimisation Algorithm for task scheduling in edge-computing environments. Apat et al. [15] investigated service placement optimisation in IoT use cases by considering makespan and energy consumption objectives. They developed various population-based metaheuristics (Genetic Algorithm, Simulated Annealing, and Particle Swarm Optimisation), as well as hybrid versions (GA-SA and GA-PSO). Their results indicated that hybrid metaheuristics outperform simpler greedy solutions. Bey et al. [16] introduced a Quantum-inspired PSO scheme for IoT-driven service placement. Furthermore, Huang et al. [17] proposed a multi-objective Ant Colony Optimisation technique for container placement across edge-to-cloud infrastructures. Ghobaei-Arani et al. [18] presented a Whale Optimisation Algorithm-based approach to solve service placement challenges in IoT environments.
Although recent studies have demonstrated growing interest in employing metaheuristic algorithms to address service placement problems, several limitations persist. These approaches, while capable of producing near-optimal solutions, often exhibit generalisability issues. In addition, their execution times can increase exponentially as problem complexity increases, making them impractical for online scenarios and large-scale deployments. Furthermore, achieving optimal performance frequently necessitates careful tuning of algorithm-specific hyperparameters, which can be time consuming and may lead to suboptimal outcomes if not properly configured. Consequently, metaheuristic service placement techniques face challenges in terms of computational efficiency, robustness, and generalisability.

2.3. Learning-Based Solutions

Liu et al. [19] utilised Deep Reinforcement Learning (DRL) to make online decisions about service deployment and computational resource allocation in a 5G-supported edge computing framework. Their work focuses only on latency minimisation, without modelling the service heterogeneity, reliability, or multi-version computational behaviours that arise in real-world edge-to-cloud systems. Fahimullah et al. [20] investigated how different learning-based techniques, such as NNs and RL, could address service placement challenges by predicting user demands and resource availability in edge/fog computing. However, their study remains mostly high-level and does not analyse fine-grained online decision-making challenges (such as multi-version service heterogeneity, real-time constraints, and placement reliability trade-offs), which limits its applicability to practical edge-to-cloud scenarios. Sharma et al. [21] proposed a dynamic placement algorithm for IoT services in an edge-to-cloud setting, employing a Double Deep Q-Network combined with Prioritized Experience Replay. Nevertheless, it treats each service as a single monolithic unit and does not capture execution-level heterogeneity, inter-component dependencies, or reliability variations. Wang et al. [22] modelled the service placement problem as a Markov Decision Process and used deep Q-learning to decide where various service components should be allocated in an edge network. The method carries the online decision-making overhead of a deep Q-learning architecture and does not evaluate real-time inference latency or the practicality of deploying such multi-step RL models in latency-critical edge environments. Chen et al. [23] focused on unsupervised deep learning for binary offloading in mobile edge computing, using a deep neural network to optimise offloading decisions. However, it operates on atomic tasks and does not consider multi-component service graphs, version heterogeneity, or the reliability factors required in modern edge-to-cloud applications. Truong et al. [24] also employed DRL for partial computational offloading in similar environments. Their approach has limited scalability because its RL state and action spaces grow with all user–subchannel combinations, making training and inference increasingly expensive as the network size expands. Meanwhile, Lingayya et al. [25] applied multi-agent collaborative RL to dynamically assign tasks in edge computing systems. However, its overall complexity and computational overhead scale poorly as the number of IoT devices, tasks, and edge nodes increases. Pang et al. [26] introduced a multi-agent DRL framework for task offloading in heterogeneous edge networks; the high-dimensional joint state–action space of the method limits its scalability. Li et al. [27] integrated DRL with Lyapunov optimisation to further improve task offloading in mobile edge computing scenarios. Their model assumes that each task corresponds to a single service type and does not support multi-component service graphs or inter-service dependencies. Zhang et al. [28] proposed a distributed Stackelberg game framework for task offloading and bandwidth allocation in MEC-enabled C-ITS, and introduced a multi-agent reinforcement learning algorithm (SG-MAPG) to approximate the Stackelberg equilibrium and improve the computation rate. However, the approach is limited by its binary offloading model and does not address heterogeneous services with multi-version components.
A significant drawback of many existing learning-based approaches for service placement optimisation is their reliance on complex deep learning algorithms (particularly DRL). These models often require substantial amounts of data and extensive training time, making them resource-intensive and challenging to scale to large and complex networks. Furthermore, they often fall short when it comes to the rigorous evaluation of model generalisability. In many cases, a large portion of the data is used for training (with only a small subset reserved for testing), which makes the evaluation of these approaches less rigorous and may prevent them from adequately generalising to unseen scenarios. Many of these techniques also involve numerous hyperparameters that significantly influence performance, requiring careful tuning. RL-based methods, especially multi-agent approaches, can struggle to provide high-quality solutions given that they rely on local data. Therefore, current learning-based approaches for service placement optimisation may face challenges in terms of generalisability and resource efficiency (particularly resource consumption in the training phase).
In addition to the aforementioned shortcomings, all existing strategies assume homogeneous, single-version components, an assumption that does not hold in advanced edge-to-cloud scenarios involving heterogeneous environments. Consequently, there is a clear need for further research to develop novel algorithms that are generalisable and more efficient, in terms of both resource consumption and solution quality, for online service placement in larger-scale edge-to-cloud systems.

3. System Model

In this study, we consider service placement in a specific edge-to-cloud scenario that uses AR and VR technology for remote repair and maintenance tasks [3]. This use case, formulated in collaboration with our industrial partner (Ericsson), considers a scenario in which an industrial device malfunctions and no expert is available onsite. In such cases, a local technician uses an AR/VR application to connect with a remote expert and share video footage of the malfunctioning device to facilitate identification and troubleshooting of the issue. Based on our industrial partner’s requirements, the system must provide private, real-time, high-definition video streaming with low-latency communication and efficient task distribution across an edge-to-cloud infrastructure to maintain high system reliability. In addition, unlike centralised cloud-based video-calling systems that share components and risk privacy breaches, our system must assign dedicated services to each user-helper pair. This eliminates shared components and creates a lightweight, decentralised architecture that supports deployment across diverse hardware, from large servers to small edge devices, providing flexibility, scalability, and privacy.
The edge-to-cloud AR/VR-based remote repair and maintenance use case studied in this paper was formulated in detail in our earlier work [3], which focused on offline one-shot placement of services for the stated problem. For clarity, we provide an overview of the updated model for the online scenarios. The notations relevant to the system model are summarised in Table 1.

3.1. Infrastructure

A three-tier infrastructure is considered in this study: Tier-1 consists of access points (APs) that act as entry points for devices to connect to the network; Tier-2 comprises edge nodes close to the APs; and Tier-3 is the cloud that provides high-capacity computational resources and storage for tasks requiring extensive processing. Figure 1 shows the components of our edge-to-cloud architecture.
Each computing node within the infrastructure is described by a unique set of characteristics defined as $CN = \{CN_1^t, CN_2^t, \ldots, CN_k^t, \ldots, CN_K^t\}$, where $CN_k^t = \langle CC_k^t, MC_k^t, DC_k^t, RS_k^t \rangle$. The total number of computing nodes is denoted by $K$, whereas $CC_k^t$, $MC_k^t$, and $DC_k^t$ correspond to the available computational, memory, and disk capacities of computing node $k$ at time $t$, respectively. $RS_k^t$ represents the reliability score of computing node $k$ at time $t$. The characteristics of all computing nodes at time $t$ are denoted as $CN_{all_{ch}}^t$ in Equation (1).
$$ CN_{all_{ch}}^t = \begin{bmatrix} CC_1^t & MC_1^t & DC_1^t & RS_1^t \\ CC_2^t & MC_2^t & DC_2^t & RS_2^t \\ \vdots & \vdots & \vdots & \vdots \\ CC_K^t & MC_K^t & DC_K^t & RS_K^t \end{bmatrix} \quad (1) $$
Moving upward through tiers in the infrastructure, nodes provide more computational and memory capacity; however, this improvement is accompanied by increased network delays. We assume that the current available bandwidth (BW) and transmission delays (LD) for communication links between computing nodes are known to model the interaction between entities within the infrastructure. To formalise this, Equation (2) is introduced to capture the available bandwidth and observed delay at time t, where the delay is approximated as half of the round-trip time between the two nodes. In this formulation, rows and columns indexed from 1 to K correspond to computing nodes, columns from K + 1 to K + N represent connected user nodes, and columns from K + N + 1 to K + N + M represent connected helper nodes.
$$ CN_{all_{bw}}^t = \begin{bmatrix} 0,\,0 & \cdots & BW_{1,K+N+M}^t,\, LD_{1,K+N+M}^t \\ BW_{2,1}^t,\, LD_{2,1}^t & \cdots & BW_{2,K+N+M}^t,\, LD_{2,K+N+M}^t \\ \vdots & \ddots & \vdots \\ BW_{K,1}^t,\, LD_{K,1}^t & \cdots & BW_{K,K+N+M}^t,\, LD_{K,K+N+M}^t \end{bmatrix} \quad (2) $$
In the infrastructure, computing nodes in each tier can establish connections with computing nodes in other tiers (inter-tier communication), as well as within the same tier (intra-tier communication). The communication bandwidth decreases as we move toward higher tiers.

3.2. Services and Applications

In our use case, “users” connect with remote “helpers” through AR/VR applications for repair or maintenance tasks in industrial settings. Users and helpers use their personal devices, which vary in characteristics such as computational capacity (CC), memory capacity (MC), disk capacity (DC), and reliability score (RS). The device characteristics are represented as $u_i = \langle CC_i, MC_i, DC_i, RS_i \rangle$ for users and $h_j = \langle CC_j, MC_j, DC_j, RS_j \rangle$ for helpers.
The platform hosting a diverse set of AR/VR services is represented as $S^t = \{S_1, S_2, \ldots, S_x, \ldots, S_X\}$, where each service $S_x$ consists of multiple components denoted by $S_x = \{SC_1^x, SC_2^x, \ldots, SC_y^x, \ldots, SC_Y^x\}$. Every component $SC_y^x$ is available in multiple versions and is described by $SC_y^x = \{SC_{y,1}^x, SC_{y,2}^x, \ldots, SC_{y,v}^x, \ldots, SC_{y,V}^x\}$. These versions are distinguished by their requirements, including the computational power, memory, disk space, and data transfer specifications necessary for interaction with other components. We assume that service component versions are provided by various providers. Accordingly, each version is also characterised by unique attributes, including the codec type and a reliability score, which indicates the failure probability of a component. The resource requirements and characteristics of a service component are represented by $SC_{y_{ch}}^x$ (Equation (3)), encompassing the computational requirements (CR), memory requirements (MR), disk requirements (DR), data size (DS), provider (PR), codec type (CT), and reliability score (RS).
Services are modelled as Directed Acyclic Graphs (DAGs), where nodes represent the individual service components and edges signify their interdependencies. This DAG-based representation provides a clear framework for understanding the relationships and workflows between the service components [3].
$$ SC_{y_{ch}}^x = \begin{bmatrix} CR_1^y & MR_1^y & DR_1^y & DS_1^y & PR_1^y & CT_1^y & RS_1^y \\ CR_2^y & MR_2^y & DR_2^y & DS_2^y & PR_2^y & CT_2^y & RS_2^y \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ CR_V^y & MR_V^y & DR_V^y & DS_V^y & PR_V^y & CT_V^y & RS_V^y \end{bmatrix} \quad (3) $$
Because we consider an online scenario, we assume that we only have data on the characteristics of the currently connected user-helper pairs and lack prior knowledge of the characteristics of pairs that will connect to the system. Therefore, our approach focuses only on the current state of the system when making decisions. This assumption aligns with the real-time nature of online environments, where the system must handle unpredictable and continuously changing user-helper interactions.

4. Performance Metrics and Objective Function

4.1. Service Response Time

The response time of $SC_{y,v}^x$ (denoted as $RT_{SC_{y,v}^x}$) is calculated using Equation (4), where $TD_{SC_{y,v}^x}$, $ET_{SC_{y,v}^x}$, $PD_{SC_{y,v}^x}$, and $CD_{SC_{y,v}^x}$ represent the network delay, execution time, provider delay, and coding delay, respectively. Similarly, the response time of service $x$ ($RT_{S^x}$) and the total response time of all services ($RT_S^t$) at time $t$ are determined using Equations (5) and (6), respectively.
$$ RT_{SC_{y,v}^x} = TD_{SC_{y,v}^x} + ET_{SC_{y,v}^x} + PD_{SC_{y,v}^x} + CD_{SC_{y,v}^x} \quad (4) $$
$$ RT_{S^x} = \sum_{y=0,\, v \in V}^{Y} RT_{SC_{y,v}^x} \quad (5) $$
$$ RT_S^t = \sum_{x=0}^{X} RT_{S^x} \quad (6) $$
Provider delays and encoding/decoding delays are obtained through passive measurements and/or by platform providers. The modelling of the data transmission and execution times for a service component is provided below.

4.1.1. Network Delay

The network delay is calculated using Equation (7), where $DS_{SC_{y,v}^x}$ denotes the data size of version $v$ of service component $y$ in service $x$, $BW_l^t$ reflects the available bandwidth of communication link $l$, and $PR_{ij}$ is the propagation delay, which represents the travel time from node $i$ to node $j$ across the communication link.
$$ TD_{SC_{y,v}^x} = \frac{DS_{SC_{y,v}^x}}{BW_l^t} + PR_{ij} \quad (7) $$

4.1.2. Execution Time

Equation (8) calculates the execution time of $SC_{y,v}^x$. $CR_{SC_{y,v}^x}$ denotes the total number of instructions required to execute version $v$ of service component $y$, and $CC_k^t$ represents the computational capacity of computing node $k$ at time $t$. The equation also accounts for the waiting time ($W$), incorporating delays when the service component is queued for execution at the node.
$$ ET_{SC_{y,v}^x} = \frac{CR_{SC_{y,v}^x}}{CC_k^t} + W \quad (8) $$
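To make these latency formulas concrete, the following minimal sketch (in Python; the function names, units, and numbers are illustrative choices of ours, not taken from the paper’s simulator) composes Equations (4), (7), and (8) for a single service component:

```python
# Minimal sketch of Equations (4), (7), and (8); units and values are illustrative.

def network_delay(data_size_mb: float, bandwidth_mbps: float, prop_delay_s: float) -> float:
    """Equation (7): transmission time over link l plus propagation delay."""
    return (data_size_mb * 8) / bandwidth_mbps + prop_delay_s

def execution_time(instructions_mi: float, capacity_mips: float, wait_s: float) -> float:
    """Equation (8): processing time on node k plus queueing delay W."""
    return instructions_mi / capacity_mips + wait_s

def component_response_time(td: float, et: float, pd: float, cd: float) -> float:
    """Equation (4): network + execution + provider + coding delay."""
    return td + et + pd + cd

# Example: a 2 MB transfer over a 100 Mbps link with 5 ms propagation delay,
# and 400 MI of work on a 2000 MIPS node with 10 ms of queueing.
td = network_delay(2.0, 100.0, 0.005)       # 0.165 s
et = execution_time(400.0, 2000.0, 0.010)   # 0.210 s
rt = component_response_time(td, et, pd=0.02, cd=0.03)  # 0.425 s
```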

4.2. System Reliability

To estimate the reliability of the system, we consider both software (service components) and hardware (computing nodes) reliability, derived from their historical behaviour and performance. Following the reliability model presented in [29], we compute the reliability score of each component using the exponential function $e^{-\lambda t}$, where $\lambda$ is the failure rate and $t$ is the observed time period. By applying this model to historical data, we obtain reliability scores for both service components and computing nodes.
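For illustration (with values of our own choosing, not from the paper), a computing node with an observed failure rate of $\lambda = 0.001$ failures per hour over a period of $t = 100$ h would receive a reliability score of $RS = e^{-0.001 \times 100} = e^{-0.1} \approx 0.905$.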

4.2.1. Service Reliability

Given the interdependencies between the components of a service, the reliability of $S_x$ (between 0.0 and 1.0) is determined using Equation (9) [29]. This equation estimates the probability of successfully completing a service, where $RS_{SC_{y,v}^x}$ denotes the reliability score of $SC_{y,v}^x$; higher values reflect greater reliability.
$$ RS_{S^x} = \prod_{y=0,\, v \in V}^{Y} RS_{SC_{y,v}^x} \quad (9) $$
The average reliability across all services at time t is calculated using Equation (10).
$$ RS_S^t = \frac{\sum_{x=0}^{X} RS_{S^x}}{X} \quad (10) $$

4.2.2. Platform Reliability

Platform reliability, based on the independence of computing nodes, is calculated using Equation (11) [29], where $RS_k$ (between 0.0 and 1.0) denotes the reliability score of computing node $k$; higher values reflect greater reliability.
$$ RS_{CN}^t = 1 - \prod_{k=1}^{K} (1 - RS_k) \quad (11) $$
Therefore, given that the failure of any part (whether the user node, helper node, or computing nodes) can negatively impact the others, the platform reliability of a user-helper pair is calculated as $RS_p = RS_i \times RS_j \times RS_{CN}^t$, where $RS_i$ and $RS_j$ represent the hardware reliability of user $i$ and helper $j$, respectively. The average platform reliability of all pairs at time $t$ is calculated using Equation (12).
$$ RS_P^t = \frac{\sum_{p=0}^{X} RS_p}{X} \quad (12) $$
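The following minimal sketch (Python; all scores are illustrative values of our own) mirrors how Equations (9) and (11) and the pair-level product $RS_p$ combine component and node scores:

```python
import math
from functools import reduce

# Illustrative sketch of Equations (9), (11), and the pair-level reliability.

def reliability_score(failure_rate: float, hours: float) -> float:
    """Reliability from historical failure data: e^(-lambda * t)."""
    return math.exp(-failure_rate * hours)

def service_reliability(component_scores: list) -> float:
    """Equation (9): interdependent components form a series system."""
    return reduce(lambda a, b: a * b, component_scores, 1.0)

def platform_reliability(node_scores: list) -> float:
    """Equation (11): the platform fails only if every node fails."""
    prod_fail = 1.0
    for rs in node_scores:
        prod_fail *= (1.0 - rs)
    return 1.0 - prod_fail

# Hypothetical scores for three components and three nodes.
rs_service = service_reliability([0.99, 0.97, 0.95])   # ~0.912
rs_cn = platform_reliability([0.90, 0.85, 0.80])       # 1 - 0.1*0.15*0.2 = 0.997
rs_pair = 0.98 * 0.96 * rs_cn                          # RS_p = RS_i * RS_j * RS_CN
```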

4.3. Objective Function

The primary objective of the system is to minimise the service response time and maximise both the hardware and software reliability. To achieve this, the values of the service response time ($RT_S^t$), software reliability ($RS_S^t$), and hardware reliability ($RS_P^t$) are first normalised to ensure comparability across metrics. The service placement cost ($SP_{cost}$) is then computed using the weighted-sum method, as expressed in Equation (13). By default, equal weights are assigned to each factor ($w_1 = w_2 = w_3$, where $w_1 + w_2 + w_3 = 1$), balancing the importance of the response time, software reliability, and hardware reliability. However, this approach is flexible, allowing customisation of the weights based on specific priorities or system requirements.
$$ SP_{cost} = w_1 \times RT_S^t + w_2 \times (1 - RS_S^t) + w_3 \times (1 - RS_P^t) \quad (13) $$
The objective function is represented by Equation (14), which focuses on determining the optimal solution under the defined constraints.
$$ \text{Objective function:} \quad \min\; SP_{cost} \quad (14) $$
subjected to:
$$ \sum_{v \in V} SC_{y,v}^x = 1, \quad \forall y \in Y,\; SC_{y,v}^x \in \{0, 1\} \quad (15) $$
$$ \sum_{x,y,v}^{X,Y,V} res(SC_{y,v}^x) < res(CN_k), \quad \forall k \in K \quad (16) $$
$$ SC_{y,v}^x = u_x, \quad \forall x \in X \quad (17) $$
$$ SC_{y,v}^x = h_x, \quad \forall x \in X \quad (18) $$
The objective function is constrained by several conditions. Constraint (15) ensures that no additional copies of a service component are created. Constraint (16) ensures that the total resource consumption of all service components running on a node does not exceed its available resources. Constraints (17) and (18) ensure that users and helpers execute only the service components directly linked to them.
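As a minimal illustration of Equation (13), the sketch below follows our reading that the reliabilities enter the minimised cost negated (all names and values are ours; constraint handling from Equations (15)–(18) is omitted):

```python
# Sketch of Equation (13) with min-max normalised inputs in [0, 1].
# The (1 - RS) terms reflect our reading that minimising the cost
# should maximise both reliabilities; weights default to 1/3 each.

def sp_cost(rt_norm: float, rs_service: float, rs_platform: float,
            w1: float = 1/3, w2: float = 1/3, w3: float = 1/3) -> float:
    return w1 * rt_norm + w2 * (1.0 - rs_service) + w3 * (1.0 - rs_platform)

# A placement with low normalised latency and high reliabilities scores cheaply:
cost = sp_cost(rt_norm=0.2, rs_service=0.95, rs_platform=0.97)  # ~0.093
```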

5. SNN-GA: The Proposed Solution

In this section, we introduce the proposed learning-based approach. The SNN-GA is divided into two phases: training and inference. In the training phase, a shallow NN is trained using a GA optimiser. In the inference phase, the trained model is employed to estimate the suitability of the computing nodes for hosting service components when making online service placement decisions. The notations relevant to this section are summarised in Table 2.

5.1. SNN-GA: Inputs and Outputs

We assume that data regarding the current system state (both requirements and characteristics) are available and are taken as input every time a new user-helper pair is added to the network at time $t$. Consequently, the input layer of the designed NN encodes the current state of the system, that is, the characteristics of the nodes, such as the available computational capacity ($CC_k^t$), memory usage ($MC_k^t$), reliability score ($RS_k^t$), and average available bandwidth of the communication links of computing node $k$ ($BW_k^t$). It also considers the characteristics of the service components to be added to the network, that is, the computational ($CR_{SC_{y,v}^x}$) and memory ($MR_{SC_{y,v}^x}$) requirements, data transfer size ($DS_{SC_{y,v}^x}$), and reliability score ($RS_{SC_{y,v}^x}$) of each version of the service components.
The proposed shallow NN uses Equation (19) to estimate the suitability of each computing node. Therefore, the output layer is designed with a single neuron that estimates (computes) the suitability value as the final output for each computing node.
In this equation, $CC_k^t$, $MC_k^t$, $RS_k^t$, and $BW_k^t$ represent the normalised features of node $k$ at time $t$, while $CR_{SC_{y,v}^x}$, $MR_{SC_{y,v}^x}$, $DS_{SC_{y,v}^x}$, and $RS_{SC_{y,v}^x}$ represent the normalised features of $SC_{y,v}^x$. $\alpha_1, \ldots, \alpha_8$ and $\beta_1, \ldots, \beta_8$ are learnable parameters comprising the weight and power coefficients. The power coefficients apply a nonlinear transformation to the features, enabling the model to capture nonlinear and non-polynomial relationships. Specifically, the combination of weight and power coefficients not only scales the features but also adjusts their influence based on their importance and interaction patterns. This flexibility makes the model well suited for handling diverse feature distributions and capturing strong nonlinear dependencies. Because we use power coefficients to provide nonlinearity, we do not include an activation function in the output layer. The optimal values of the learnable parameters are determined using a GA-based optimiser during the training phase.
$$ SV_k = \alpha_1 (CC_k^t)^{\beta_1} + \alpha_2 (MC_k^t)^{\beta_2} + \alpha_3 (RS_k^t)^{\beta_3} + \alpha_4 (BW_k^t)^{\beta_4} + \alpha_5 (CR_{SC_{y,v}^x})^{\beta_5} + \alpha_6 (MR_{SC_{y,v}^x})^{\beta_6} + \alpha_7 (DS_{SC_{y,v}^x})^{\beta_7} + \alpha_8 (RS_{SC_{y,v}^x})^{\beta_8} \quad (19) $$
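A minimal sketch of this estimator follows (Python/NumPy; the feature vectors and parameter values are placeholders of our own, not trained values):

```python
import numpy as np

# Sketch of the suitability estimator in Equation (19). The 16 learnable
# parameters (alphas, betas) are found by the GA during training; the
# feature values below are placeholders, not trained or measured data.

def suitability(node_feats: np.ndarray, comp_feats: np.ndarray,
                alphas: np.ndarray, betas: np.ndarray) -> float:
    feats = np.concatenate([node_feats, comp_feats])   # 8 normalised features
    # Power coefficients supply the nonlinearity, so no activation is applied.
    return float(np.sum(alphas * feats ** betas))

node = np.array([0.7, 0.6, 0.95, 0.8])   # CC, MC, RS, BW of node k at time t
comp = np.array([0.3, 0.2, 0.4, 0.9])    # CR, MR, DS, RS of a component version
sv = suitability(node, comp, alphas=np.ones(8), betas=np.ones(8))
```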

5.2. SNN-GA: The Training Phase

To train the model for service placement decision-making, we use a GA to identify the optimal values of the learnable parameters $\alpha_1, \ldots, \alpha_8$ and $\beta_1, \ldots, \beta_8$. The GA tunes the model parameters to ensure that the model produces the most suitable value for selecting the optimal node on which to place the current service components. The GA, as a population-based metaheuristic, was chosen owing to its ability to explore large search spaces effectively and identify global optima. It evolves solutions through its crossover, mutation, and selection operators across multiple iterations.
It is worth noting that because the proposed model (SNN-GA) introduces nonlinearity through power coefficients, the optimisation space becomes non-smooth and non-differentiable. This makes traditional gradient-based optimisers such as SGD or Adam unsuitable because they rely on well-defined gradients and stable activation functions. For this reason, we employ a GA, a derivative-free global optimiser capable of exploring highly irregular search spaces while avoiding poor local minima. These characteristics make GA more appropriate than conventional DNN optimisers for the model architecture used in this study. Deep learning models were deliberately avoided due to their higher training complexity, sensitivity to hyperparameter tuning, and limited suitability for real-time online deployment in dynamic edge environments. The proposed shallow neural network significantly reduces the number of learnable parameters, lowering the risk of overfitting when training data is limited or synthetically generated.

5.2.1. GA Solution Encoding

To apply the GA to an optimisation function, it is essential to encode the initial solutions (called chromosomes). This encoding represents solutions in a format that can be manipulated by the algorithm during the optimisation process. As illustrated in Figure 2, the initial population of the GA comprises several chromosomes (each representing a unique solution). Each value in a chromosome array corresponds to a specific learnable parameter in the model. Considering the size of our input, which reflects the type of information that must be considered when making a placement decision, the length of the array is set to 16: the first eight elements encode $\alpha_1, \ldots, \alpha_8$, followed by another eight elements encoding $\beta_1, \ldots, \beta_8$.
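A hypothetical encoding sketch follows (the sampling ranges are our assumptions; the paper does not specify the initialisation intervals):

```python
import numpy as np

# Hypothetical chromosome encoding: 16 genes, the first 8 mapped to the
# weights alpha_1..alpha_8 and the last 8 to the powers beta_1..beta_8.

rng = np.random.default_rng(seed=42)

def random_chromosome() -> np.ndarray:
    alphas = rng.uniform(-1.0, 1.0, size=8)  # illustrative sampling range
    betas = rng.uniform(0.0, 3.0, size=8)    # non-negative powers keep features real
    return np.concatenate([alphas, betas])

population = [random_chromosome() for _ in range(50)]  # initial GA population
```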

5.2.2. GA Cost Function

The cost function in GA, also known as the fitness function, is designed based on the defined objective function (Equation (14)) to evaluate the performance of a given solution within the search space. Specifically, solutions that produce smaller values for the objective function are associated with reduced placement costs, and are considered superior. This type of cost function for our GA ensures that the optimisation process consistently prioritises solutions that lead to improved service placement decision-making policies.

5.2.3. GA Operators

The GA optimises solutions by evolving them through crossover, mutation, and selection operators across multiple iterations. We adopt a single-point crossover operator to combine parent chromosomes. This operator swaps parts of two selected parent solutions at a randomly determined crossover point to build offspring that inherit characteristics from both parents. The crossover rate ($cr$) determines the probability of performing this operation on each selected pair of chromosomes. Mutation provides exploration by randomly changing parts of candidate solutions. Specifically, because the solution elements (chromosome genes) are continuous, we use a Gaussian mutation operator that modifies solution elements at a mutation rate ($mr$) by adding or subtracting small random values drawn from a Gaussian distribution. For the selection operator, we use a tournament-based mechanism, which selects solutions for the next iteration (generation) by comparing a subset of the population and choosing the best-performing ones. The selection size ($ss$) determines the number of candidates considered in each tournament. The GA terminates when either a predefined number of iterations is reached or no improvement occurs for $p$ percent of consecutive iterations.
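The sketch below illustrates the three operators as described (Python; the default rates and the mutation sigma are placeholders of ours, not the tuned values in Table 4):

```python
import numpy as np

rng = np.random.default_rng()

def single_point_crossover(p1: np.ndarray, p2: np.ndarray, cr: float = 0.8):
    """Swap gene tails at a random point with probability cr."""
    if rng.random() < cr:
        point = int(rng.integers(1, len(p1)))
        return (np.concatenate([p1[:point], p2[point:]]),
                np.concatenate([p2[:point], p1[point:]]))
    return p1.copy(), p2.copy()

def gaussian_mutation(chrom: np.ndarray, mr: float = 0.1, sigma: float = 0.1):
    """Perturb each gene with probability mr by small Gaussian noise."""
    out = chrom.copy()
    mask = rng.random(len(out)) < mr
    out[mask] += rng.normal(0.0, sigma, size=int(mask.sum()))
    return out

def tournament_select(population: list, costs: list, ss: int = 3):
    """Pick ss random candidates and return the one with the lowest SP_cost."""
    idx = rng.choice(len(population), size=ss, replace=False)
    best = min(idx, key=lambda i: costs[i])
    return population[best]
```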

5.3. SNN-GA: The Inference Phase (Online Service Placement)

The trained model is used for online service placement in unseen edge-to-cloud environments. In the proposed approach, before assigning a service component to a computing node, the trained model estimates the suitability value of each computing node for hosting that specific service component. Because each service component is available in multiple versions, the suitability value ($SV$) is calculated separately for each version on each computing node. Once the suitability values are determined for all versions, the version of the service component that achieves the highest suitability value on a given computing node is identified as the most favourable candidate for allocation to that node. This process is repeated for all computing nodes in the network, so that each computing node identifies its best-suited version of the service component based on the calculated suitability values. Finally, the computing node with the highest overall suitability value (among all computing nodes) is selected for the placement of the corresponding service component version. This process is iterated until all components of the service are placed on computing nodes.
To handle the constraints during service placement, if a computing node cannot satisfy one of the required constraints (Equations (15)–(18)) before the assignment of a candidate service component, its suitability value is set to $SV = -\infty$. This ensures that only suitable nodes are considered when hosting a given service component.
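Putting the two preceding steps together, a sketch of this greedy inference loop might look as follows (Python; `suitability` refers to the Equation (19) sketch above, and `satisfies_constraints` is a hypothetical stand-in for the checks in Equations (15)–(18)):

```python
import numpy as np

def place_component(component_versions, nodes, alphas, betas,
                    satisfies_constraints):
    """Greedy placement: score every (node, version) pair and keep the best."""
    best_node, best_version, best_sv = -1, -1, -np.inf
    for k, node_feats in enumerate(nodes):
        for v, comp_feats in enumerate(component_versions):
            if not satisfies_constraints(k, v):
                sv = -np.inf     # infeasible placements are never selected
            else:
                sv = suitability(node_feats, comp_feats, alphas, betas)
            if sv > best_sv:
                best_node, best_version, best_sv = k, v, sv
    return best_node, best_version
```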
Figure 3 shows the flowchart of the proposed approach, where training is performed offline and inference is conducted online.

6. Experimental Setup

In this section, we discuss the experimental implementation and evaluation.

6.1. The Edge-to-Cloud Simulator

We used the simulator developed in our previous work for implementation and evaluation [3,4]. This cloud-native simulator models the entire infrastructure, along with its associated services. For reproducibility, we shared all study and research materials through a GitHub repository [30]. These materials include comprehensive documentation, Wikis, and YAML configuration files required for re-deploying our containerised simulator on the Kubernetes platform. It also includes the complete set of problem instances utilised to evaluate all implemented algorithms in the current and previous studies.
Our cloud-native edge-to-cloud simulator provides a complete and faithful representation of the three-tier infrastructure, network characteristics, and AR/VR service structures described in Section 3. It supports fine-grained configuration of all system elements, including computing nodes (CPU, memory, disk capacity, and reliability scores), communication links (available bandwidth and latency), and multi-version service components (resource requirements, codec type, provider, and reliability). The simulator also generates user-helper pairs, detailed service DAGs, and heterogeneous system states. All system specifications (including node capabilities, component-level resource demands, and network parameters) conform to the definitions provided in [3,4,30]. The problem instances used in our evaluation span multiple scales (from small to xxLarge) and are accompanied by configuration files that explicitly define all resource and network attributes.

6.2. Problem Instances

Two sets of instances were used for evaluation. The first set, which included instances at small, medium, large, and extra-large (xLarge) scales, was used for the training. Each instance represents a different level of complexity, enabling a thorough assessment of all the algorithms under varying conditions. For testing, we utilised the second set of instances, which also included problems of the same size as the training set but with different characteristics, along with an additional extra-extra-large (xxLarge) instance to further evaluate all algorithms. Table 3 presents the size of each instance. The specific characteristics of these instances, such as the computational capacity, memory requirements, and other resource specifications, were designed to align with the detailed specifications outlined in [30].
The training data used in this work are synthetically generated by our simulator. The simulator parameters are derived from the detailed requirements provided by our industrial partner and from the system specifications reported in [3,4,30]. This ensures that the resource ranges, network characteristics, data sizes, and reliability metrics reflect realistic edge-to-cloud systems. Moreover, the simulator introduces heterogeneity and randomness in node capabilities, bandwidth, latency, and system states, which exposes the model to a broad range of operating conditions and increases its robustness.

6.3. Comparing Algorithms

To evaluate the performance of the SNN-GA and establish a comparative analysis, we implemented various heuristic and metaheuristic algorithms. Existing approaches that precisely address the specific details of the proposed service placement problem are limited. Therefore, we compared SNN-GA with heuristic and metaheuristic algorithms (introduced in our previous works [3,4]) specifically designed for service placement in AR/VR-based remote repair and maintenance scenarios in edge-to-cloud systems, where each service component has multiple versions.
We implemented five heuristic solvers: (1) TCA–Task Continuation Affinity, (2) LRC–Least-Required CPU, (3) MDS–Most Data Size, (4) MR–Most Reliability, and (5) LP–Least-Powerful. TCA prioritises placing services on user nodes, moving to higher tiers if resources are insufficient. LRC selects the version that requires the least CPU. MDS prioritises components with larger data sizes for user or edge nodes. MR chooses the most reliable version on the most reliable node. LP executes the most demanding version on the least powerful node. We also implemented three metaheuristic algorithms: (6) GA–Genetic Algorithm, (7) PSO–Particle Swarm Optimisation, and (8) hybrid PSO-GA. Both sets of heuristics and metaheuristics were selected because they demonstrated strong performance in previous studies [3,4]. Their configurations were adapted based on the conditions specified in the original works to ensure that they operate under optimal settings and achieve their best possible performance.
To demonstrate that our design choices (activation functions, etc.) in developing the SNN-GA led to the best results, we also implemented three shallow neural networks that utilise standard activation functions, rather than power coefficients, and compared them with the proposed SNN-GA model in terms of both training efficiency and solution quality.

6.4. SNN-GA Hyperparameters

The proposed SNN-GA approach requires the selection of only a limited number of hyperparameters, which are primarily related to the GA optimiser. We followed the procedure introduced in our previous work [3] and set the GA hyperparameters as outlined in Table 4.

6.5. SNN-GA Trained Models

Four SNN-GA models were trained, each corresponding to a specific instance scale. The SNN-GA-SM model was trained using small-scale instances, while SNN-GA-MM, SNN-GA-LM, and SNN-GA-xLM were trained using medium-, large-, and xLarge-scale instances, respectively. Each sample instance includes: (a) overall network characteristics (e.g., available bandwidth, link latency), (b) node characteristics (e.g., available CPU, memory capacity, node reliability score), and (c) resource requirements (e.g., CPU demand, memory demand, data size, service component reliability score).

7. Performance Analysis

In addition to reporting the main objective of all scheduling algorithms (i.e., the cost function in Equation (14)), we also analysed how well each algorithm optimises each individual metric (response time, software reliability, and hardware reliability) for the whole system, as well as for each individual service. The former highlights that SNN-GA simultaneously optimises all three metrics in its cost function (i.e., it does not sacrifice one metric to improve the others). The latter highlights that all deployed services (between all user-helper pairs) experience the same quality (i.e., some services are not sacrificed in favour of others).

7.1. Overall Placement Cost Analysis

Figure 4 shows a heatmap of the performance of all the algorithms across different scales. The values within the heatmap describe the placement cost ($SP_{cost}$) obtained when all the services are placed across the infrastructure. Based on these results, it is evident that SNN-based algorithms (SNN-GA-xx) demonstrate a significant reduction in placement cost compared with heuristic algorithms across all scales and metaheuristic algorithms across larger scales. Although metaheuristics (particularly PSO-GA) demonstrate relatively superior performance on smaller scales, their performance declines as the problem size increases.
The results demonstrate the scalability and generalisability of our proposed SNN-GA across different problem sizes, regardless of their training scale. For example, SNN-GA-MM, which was trained using medium-scale problem instances, achieved superior performance not only on medium scales but also on smaller and larger scales.

7.2. Overall Response Time Analysis

Figure 5 shows a comparative analysis of the average service response times for all the algorithms. The results demonstrate that SNN-GA models achieve lower service response times than other algorithms, especially at larger and more complex scales. Although metaheuristics perform well on small and medium scales, their service response times increase as the scale of the problem increases. In contrast, the SNN-based models demonstrated consistently stable response times, maintaining a high performance across all scales.

7.3. Overall Platform Reliability Analysis

The platform reliability achieved by all the algorithms at various scales is shown in Figure 6. The SNN-GA models consistently demonstrated a platform reliability of more than 95% across all scales. Metaheuristic algorithms also perform well, with approximately 90% to 95% reliability, although they face slight declines at the xxLarge scale. Heuristic methods demonstrated significantly lower reliability for all the problem instances.

7.4. Overall Service Reliability Analysis

Similar to platform reliability, Figure 7 indicates the consistency of the SNN-GA models, demonstrating their efficiency in achieving higher service reliability compared with both metaheuristic and heuristic algorithms. However, the MR heuristic achieves the best service reliability among all the algorithms (because it prioritises service components with higher reliability scores). Despite this advantage, MR performs poorly in terms of response time and platform reliability. The GA and PSO-GA metaheuristics also maintain high service reliability, but show declines as the problem scale increases.

7.5. Per-Service Performance Analysis

We select SNN-GA-LM as a representative of the SNN-GA models, given their similar performance (as shown in Figure 4, Figure 5, Figure 6 and Figure 7), to analyse the performance metrics per service. In addition, to avoid redundancy, the results in this subsection are presented for xLarge-scale scenarios.
Figure 8 shows the distribution and variability of the response time per service for the various algorithms at the xLarge scale. The results indicate that SNN-GA-LM achieves the lowest response times, with smaller differences in response times across services compared with the heuristic algorithms. Metaheuristics also show a balanced response time per service, although their response times are slightly higher than those achieved by SNN-GA.
SNN-GA-LM maintains response times mostly between 150 and 350 ms for all services, which is lower than those of the other algorithms. Heuristic algorithms exhibit higher and more dispersed response times, indicating inconsistent service placement efficiency. These findings demonstrate the superiority of SNN-GA-LM in optimising response times and achieving balanced response times across services.
Figure 9 shows the variability of platform reliability per service for the various service placement algorithms at the xLarge scale. The results demonstrate that SNN-GA-LM consistently achieves high platform reliability for all services, with values mostly exceeding 0.94. In contrast, metaheuristic algorithms achieve lower platform reliability than SNN-GA-LM. Heuristics also show a wider spread in values, indicating greater variability in their ability to ensure consistently high platform reliability per service. These findings highlight the superiority of SNN-GA-LM in terms of hardware reliability per service, outperforming both the metaheuristic and heuristic approaches.
Figure 10 compares the service reliability per service for the different placement algorithms at the xLarge scale. As with response time and platform reliability, SNN-GA-LM consistently achieves the highest software reliability, with values mostly ranging between 0.90 and 0.97. Metaheuristic algorithms also demonstrate good performance, although their reliability distributions exhibit high variability. Heuristic algorithms (except for MR, which is reliability-aware) exhibit a broader spread of reliability values, indicating fluctuations in their ability to maintain consistent reliability across services. These findings indicate the significant advantage of SNN-GA-LM in providing superior software reliability for each service compared with the other algorithms investigated.

7.6. Algorithm Computational Time and Complexity Analysis

In addition to solution quality, the execution time required by each algorithm to obtain a solution must be considered. This factor is particularly critical in optimisation problems where time constraints are a key concern. Table 5 presents the execution times of the various algorithms for solving the service placement problem at different scales. For all heuristics and metaheuristics, the reported time reflects the total time taken by each algorithm to find a placement solution for a given problem instance. For SNN-GA, the execution time reflects the time spent during the inference phase.
Table 5 shows that the SNN-GA models and heuristics require only milliseconds to solve the problem for scales ranging from small to xLarge. At the xxLarge scale, heuristics keep their execution times below 2 s, whereas the SNN-GA models require approximately 6 s to assign 500 × 6 service components for all 500 user-helper pairs across 250 + 125 + 64 computing nodes. Despite their fast execution times, the quality of the solutions produced by the heuristic algorithms is significantly lower than that computed by the SNN-GA models. Metaheuristic algorithms, on the other hand, continue to produce high-quality solutions but face a significant increase in execution time as the problem size grows, making them inefficient for larger-scale scenarios. For instance, at the xxLarge scale, metaheuristics require approximately 20 min to solve a problem instance and find a solution on par with those computed by the SNN-GA models.
Unlike heuristic and metaheuristic algorithms, which explore the solution space and produce a solution at the same time, a learning-based algorithm performs ‘solution exploration’ and ‘solution production’ in different phases, referred to as the ‘training’ and ‘inference’ phases, respectively. Therefore, to fairly compare the execution times of all algorithms, we also report the training times of the SNN-GA approaches. Table 6 lists the training times of the four SNN-GA-xx models, showing a reasonable training time for all models. Given that a key advantage of learning-based models is their repeated use, provided they are generalisable, this one-time cost can easily be amortised if the trained model is used to solve numerous problem instances. For example, computing high-quality solutions for 10 xxLarge problem instances would take 22.4 m × 10 = 224 m = 3 h:44 m for GA vs. 300.5 m + 6.2 s × 10 = 301.53 m = 5 h:1.53 m for SNN-GA (using the SNN-GA-xLM model); whereas solving 100 xxLarge problem instances would take 22.4 m × 100 = 37 h:20 m for GA vs. 300.5 m + 6.2 s × 100 = 310.83 m = 5 h:10.83 m for SNN-GA, clearly showing SNN-GA’s advantage under repeated use. If smaller SNN-GA models are used (e.g., SNN-GA-MM with only 13.1 m of training), the advantage becomes even more prominent (37 h:20 m for GA vs. 23.43 m for SNN-GA in this case).
During the training phase, the time complexity of SNN-GA is O(G × N × Cost), where G is the number of GA generations (iterations), N is the population size, and Cost is the cost of evaluating one chromosome. Because training is performed offline and only once per model, this cost is amortised over the model's many inference uses. During the inference phase, for each service component, the model evaluates all V versions across K computing nodes, resulting in V × K evaluations per component. Given that a service has C components, the total number of evaluations is C × V × K. Because each evaluation is a single forward pass through the model with complexity O(1), the overall inference complexity for placing the components of a service is O(C × V × K). Table 7 compares the time complexities of the evaluated algorithms. This analysis confirms that although SNN-GA incurs a one-time training cost, its inference complexity is comparable to that of the heuristic methods and significantly lower than that of the metaheuristic algorithms, making it better suited for large-scale, real-time decision-making.
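As a minimal illustration of this O(C × V × K) inference procedure, the sketch below enumerates all (version, node) candidates per component and keeps the highest-scoring one. The greedy per-component selection and the `suitability` callable are our assumptions about the interface, not a verbatim excerpt of our implementation (which is available in the repository cited in the Data Availability Statement); capacity checks are omitted for brevity.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

def place_service(
    components: Sequence[Hashable],
    versions: Dict[Hashable, Sequence[Hashable]],
    nodes: Sequence[Hashable],
    suitability: Callable[[Hashable, Hashable, Hashable], float],
) -> Dict[Hashable, Tuple[Hashable, Hashable]]:
    """Pick, for each component, the (version, node) pair with the
    highest suitability value."""
    placement = {}
    for c in components:                      # C components
        best, best_sv = None, float("-inf")
        for v in versions[c]:                 # V versions per component
            for k in nodes:                   # K computing nodes
                sv = suitability(c, v, k)     # one O(1) forward pass
                if sv > best_sv:
                    best, best_sv = (v, k), sv
        placement[c] = best                   # C * V * K evaluations in total
    return placement
```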

7.7. System Entropy Analysis

To assess the balance of resource utilisation across computing nodes, we employed entropy as a quantitative measure. Entropy captures the randomness in the distribution of service components and is commonly used as an indicator of workload balance among computing nodes. Higher entropy values signify a more uniform distribution of service components, which in turn indicates balanced resource usage; lower entropy values indicate that certain computing nodes are disproportionately loaded. For our calculations, we used the entropy model outlined in [31], which was developed specifically for discrete heterogeneous systems.
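For intuition, a generic normalised Shannon entropy over per-node load shares can be computed as in the sketch below. Note that this is a simplified illustration, not the exact discrete-heterogeneous model of [31] used in our experiments.

```python
import math

def load_entropy(loads: list[float]) -> float:
    """Normalised Shannon entropy of the per-node load shares: 1.0 means a
    perfectly uniform distribution; values near 0 mean a few nodes carry
    almost everything."""
    total = sum(loads)
    probs = [l / total for l in loads if l > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(loads))        # divide by the maximum entropy

print(load_entropy([10, 10, 10, 10]))  # 1.0   (balanced placement)
print(load_entropy([37, 1, 1, 1]))     # ~0.25 (one node is overloaded)
```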
Given the similar performance of the SNN-GA models, we selected SNN-GA-LM as their representative for evaluating system entropy. Figure 11 and Figure 12 illustrate the system entropy, in terms of computational resources, achieved by the various algorithms at the small and xxLarge scales, respectively. As these figures show, the SNN-GA model and the metaheuristics yield higher entropy than the heuristic algorithms as soon as the initial pairs are added to the system, at both scales. This indicates their superior capability to quickly achieve a balanced resource distribution and manage system complexity, even in the early stages of system scaling. All heuristic methods, particularly MR and MDS, significantly underperformed in this regard, with notably low entropy values.

8. SNN-GA: Model Explainability

8.1. The Choice of Activation Function

Figure 13 illustrates the convergence process of SNN-GA-LM and shows how its performance would have changed if its NN had used conventional activation functions (LeakyReLU, sigmoid, or hyperbolic tangent) instead of the power coefficients utilised in all SNN-GA models. The models using the sigmoid, LeakyReLU, and tanh activation functions show a small reduction in cost during the initial training iterations but quickly plateau without significant further improvement. Furthermore, Figure 14, Figure 15 and Figure 16 present the quantitative comparisons, showing that SNN-GA improves the service response time, platform reliability, and service reliability compared with models that use standard activation functions. The results clearly indicate the superior performance of SNN-GA-LM in terms of both convergence speed and final cost, and suggest that these standard activation functions cannot capture enough of the nonlinearity in our setting to optimise the cost function effectively.
Unlike most deep learning-based placement methods, which use 3-4-layer deep neural networks trained with backpropagation on large labelled datasets, SNN-GA employs a shallow neural network with only a single learnable layer. Nonlinearity is introduced through GA-optimised power coefficients rather than traditional activation functions. This eliminates the computationally expensive gradient-descent training process and significantly reduces the number of parameters, the memory usage, and the training time. As a result, both the training and inference phases are substantially lighter than those of conventional DNN architectures.
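To make this architecture concrete, the sketch below shows one plausible reading of the design: the suitability value of a candidate node is a weighted power sum of the normalised input features, with the 14 learnable parameters (α_1, …, α_7 and β_1, …, β_7 from Table 2) encoded as a single GA chromosome. The exact functional form and the sampling ranges are our assumptions based on the description above, not a verbatim excerpt of the implementation.

```python
import numpy as np

def suitability(x: np.ndarray, alpha: np.ndarray, beta: np.ndarray) -> float:
    """Suitability value of one (component, node) candidate.
    x     : normalised input features in (0, 1] (see Table 2)
    alpha : weight coefficients alpha_1..alpha_7, learned by the GA
    beta  : power coefficients beta_1..beta_7, learned by the GA and acting
            as the source of nonlinearity instead of an activation function."""
    return float(np.sum(alpha * np.power(x, beta)))

# A GA chromosome is simply the concatenation of the 14 learnable parameters.
rng = np.random.default_rng(0)
chromosome = np.concatenate([rng.uniform(-1.0, 1.0, 7),   # alpha_1..alpha_7
                             rng.uniform(0.1, 3.0, 7)])   # beta_1..beta_7
alpha, beta = chromosome[:7], chromosome[7:]
print(suitability(np.full(7, 0.5), alpha, beta))
```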

8.2. Feature Impacts

To investigate how different inputs affect (i.e., determine) the overall suitability of a computing node to host a given service component, we performed a series of hypothetical computations in which all inputs (features) were held constant while one was swept from a minimum to a maximum value. This reveals the importance of each feature in determining the suitability of the computing nodes to host service components.
In Figure 17, for example, all features except "Available Memory" are set to 0.5, while "Available Memory" is swept from 0.1 to 1.0. Figure 17 thus illustrates the relationship between each feature's value and its impact on overall node suitability. Based on this figure, the available memory of a computing node positively affects its suitability, with a sharp increase in suitability as the available memory capacity grows; computing nodes with more available memory are thus more desirable for service placement. By contrast, the available CPU capacity of a computing node negatively affects its suitability. This trend reflects the model's aim of balancing the prioritisation of memory and computational capacities across computing nodes. Similar to memory capacity, the reliability scores of computing nodes and service components show a strong positive correlation with each node's suitability value: higher node and service component reliabilities lead to increased suitability, which is reasonable.
On the other hand, features such as "Memory requirement", "CPU requirement", and "Data size" show negative correlations with the suitability value: higher values for these features reduce the suitability, indicating that service components with lower memory and CPU requirements and smaller data sizes are preferred because of their reduced resource usage and overhead. Features related to available bandwidth show a positive correlation, underlining the importance of sufficient bandwidth for optimal service placement. In fact, because lower tiers (e.g., Tier-1) have higher bandwidth capacities than the upper tiers (e.g., Tier-3), the model prefers to assign service components to computing nodes with greater bandwidth, thereby prioritising edge nodes.
The feature-impact analysis shows that the system prioritises node reliability, available memory, service component reliability, and available bandwidth as the most influential factors in determining suitability scores for service placement. These features exhibit a steep increase in impact with increasing feature values, meaning that the model is highly sensitive to improvements in these attributes. Specifically, the node and component reliabilities reflect the system's preference for stable and failure-resilient environments, which are critical to maintaining service continuity. Similarly, high available memory and bandwidth indicate that the system seeks nodes capable of handling resource-intensive tasks and supporting smooth data transmission. In contrast, less influential features (e.g., available CPU) contribute to the overall evaluation but act more as supporting indicators than as primary determinants. Thus, the model reflects a design that emphasises stability, capacity, and communication efficiency over raw computing power alone.
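The one-at-a-time sweep behind Figure 17 is straightforward to reproduce. The sketch below mirrors the procedure described above, reusing the hypothetical `suitability`, `alpha`, and `beta` from the sketch in Section 8.1; the feature names and their order are illustrative, and Table 2 defines the model's actual inputs.

```python
import numpy as np

# Illustrative feature order; not the model's exact feature vector.
FEATURES = ["avail_mem", "avail_cpu", "node_reliability", "avail_bw",
            "mem_req", "cpu_req", "data_size"]

def sweep(idx: int, steps: int = 10) -> list[float]:
    """Suitability as feature `idx` sweeps 0.1..1.0, others fixed at 0.5."""
    scores = []
    for value in np.linspace(0.1, 1.0, steps):
        x = np.full(len(FEATURES), 0.5)   # hold every other feature at 0.5
        x[idx] = value                    # sweep only the feature under study
        scores.append(suitability(x, alpha, beta))
    return scores

for i, name in enumerate(FEATURES):
    s = sweep(i)
    print(f"{name:>16}: net impact {s[-1] - s[0]:+.3f}")
```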

9. Conclusions

In this study, we introduced a novel learning-based approach, SNN-GA, to address the online service placement problem in edge-to-cloud systems, with a focus on the critical requirements of AR/VR applications for remote repair and maintenance. SNN-GA uses a shallow NN to compute the suitability of each node for hosting the given service components. The proposed NN utilises weight and power coefficients (instead of a commonly used activation function) to capture the nonlinearity and complex relationships between system characteristics, and SNN-GA employs a GA-based optimisation technique to determine the best combination of these coefficients, which in turn governs the efficiency of service placement on online edge-to-cloud platforms. Through comprehensive evaluation, SNN-GA demonstrated superior performance in minimising the service response time while maximising system reliability, as well as robust generalisability across various problem scales. Compared with the other heuristic and metaheuristic algorithms, SNN-GA consistently achieves lower placement costs with faster execution times while maintaining high-quality solutions. The results confirm the potential of lightweight learning-based methods to address the dynamic challenges of service placement in edge-to-cloud computing. In future work, we will extend SNN-GA to enhance system fault tolerance by addressing node failures and network disruptions in our edge-to-cloud system, developing both reactive and proactive fault-tolerant strategies to ensure uninterrupted service delivery.

Author Contributions

Conceptualization, M.G.H., J.T., B.S.A. and C.C.; methodology, M.G.H., J.T., B.S.A. and C.C.; software, M.G.H.; validation, M.G.H., J.T., B.S.A. and C.C.; formal analysis, M.G.H.; investigation, M.G.H.; resources, M.G.H.; data curation, M.G.H.; writing—original draft preparation, M.G.H.; writing—review and editing, M.G.H., J.T., B.S.A. and C.C.; visualization, M.G.H.; supervision, J.T., B.S.A. and C.C.; project administration, J.T.; funding acquisition, J.T. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partially supported by the Knowledge Foundation of Sweden (KKS).

Data Availability Statement

All data and source codes for all algorithms are publicly available through GitHub at https://github.com/ms-garshasbi/service-placement-simulator (accessed on 10 December 2025).

Conflicts of Interest

Author Calin Curescu was employed by the company Ericsson. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Asghari, A.; Sohrabi, M.K. Server placement in mobile cloud computing: A comprehensive survey for edge computing, fog computing and cloudlet. Comput. Sci. Rev. 2024, 51, 100616.
2. Xu, L.; Liu, Y.; Fan, B.; Xu, X.; Mei, Y.; Feng, W. An Improved Gravitational Search Algorithm for Task Offloading in a Mobile Edge Computing Network with Task Priority. Electronics 2024, 13, 540.
3. Garshasbi Herabad, M.; Taheri, J.; Ahmed, B.S.; Curescu, C. Optimizing Service Placement in Edge-to-Cloud AR/VR Systems using a Multi-Objective Genetic Algorithm. In Proceedings of the 14th International Conference on Cloud Computing and Services Science (CLOSER 2024), Angers, France, 2–4 May 2024.
4. Herabad, M.G.; Taheri, J.; Ahmed, B.S.; Curescu, C. E-PSOGA: An Enhanced Hybrid Metaheuristic for Optimal Edge-to-Cloud Placement of Services with Multi-Version Components. IEEE Access 2025, 13, 151170–151188.
5. Jiang, Q.; Zhang, Y.; Yan, J. Neural combinatorial optimization for energy-efficient offloading in mobile edge computing. IEEE Access 2020, 8, 35077–35089.
6. de Souza, A.B.; Rego, P.A.L.; Chamola, V.; Carneiro, T.; Rocha, P.H.G.; de Souza, J.N. A bee colony-based algorithm for task offloading in vehicular edge computing. IEEE Syst. J. 2023, 17, 4165–4176.
7. Hosseinzadeh, M.; Masdari, M.; Rahmani, A.M.; Mohammadi, M.; Aldalwie, A.H.M.; Majeed, M.K.; Karim, S.H.T. Improved butterfly optimization algorithm for data placement and scheduling in edge computing environments. J. Grid Comput. 2021, 19, 14.
8. Liu, S.; Zhang, Y.; Tang, K.; Yao, X. How good is neural combinatorial optimization? A systematic evaluation on the traveling salesman problem. IEEE Comput. Intell. Mag. 2023, 18, 14–28.
9. Irazusta Garmendia, A.; Ceberio, J.; Mendiburu, A. Applicability of neural combinatorial optimization: A critical view. ACM Trans. Evol. Learn. Optim. 2024, 4, 1–26.
10. Brogi, A.; Forti, S. QoS-aware deployment of IoT applications through the fog. IEEE Internet Things J. 2017, 4, 1185–1192.
11. Li, S.; Lin, P.; Song, J.; Song, Q. Computing-assisted task offloading and resource allocation for wireless VR systems. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; pp. 368–372.
12. Mahjoubi, A.; Taheri, J.; Grinnemo, K.J.; Deng, S. Optimal placement of recurrent service chains on distributed edge-cloud infrastructures. In Proceedings of the 2021 IEEE 46th Conference on Local Computer Networks (LCN), Edmonton, AB, Canada, 4–7 October 2021; pp. 495–502.
13. Khan, M.A.; Baccour, E.; Erbad, A.; Hamila, R.; Hamdi, M. CODE: Computation Offloading in D2D-Edge System for Video Streaming. IEEE Syst. J. 2022, 17, 4014–4025.
14. Wu, C.; Peng, Q.; Xia, Y.; Ma, Y.; Zheng, W.; Xie, H.; Pang, S.; Li, F.; Fu, X.; Li, X.; et al. Online user allocation in mobile edge computing environments: A decentralized reactive approach. J. Syst. Archit. 2021, 113, 101904.
15. Apat, H.K.; Sahoo, B.; Goswami, V.; Barik, R.K. A hybrid meta-heuristic algorithm for multi-objective IoT service placement in fog computing environments. Decis. Anal. J. 2024, 10, 100379.
16. Bey, M.; Kuila, P.; Naik, B.B.; Ghosh, S. Quantum-inspired particle swarm optimization for efficient IoT service placement in edge computing systems. Expert Syst. Appl. 2024, 236, 121270.
17. Huang, T.; Lin, W.; Xiong, C.; Pan, R.; Huang, J. An ant colony optimization-based multiobjective service replicas placement strategy for fog computing. IEEE Trans. Cybern. 2020, 51, 5595–5608.
18. Ghobaei-Arani, M.; Shahidinejad, A. A cost-efficient IoT service placement approach using whale optimization algorithm in fog computing environment. Expert Syst. Appl. 2022, 200, 117012.
19. Liu, T.; Ni, S.; Li, X.; Zhu, Y.; Kong, L.; Yang, Y. Deep reinforcement learning based approach for online service placement and computation resource allocation in edge computing. IEEE Trans. Mob. Comput. 2022, 22, 3870–3881.
20. Fahimullah, M.; Ahvar, S.; Agarwal, M.; Trocan, M. Machine learning-based solutions for resource management in fog computing. Multimed. Tools Appl. 2024, 83, 23019–23045.
21. Sharma, A.; Thangaraj, V. Intelligent service placement algorithm based on DDQN and prioritized experience replay in IoT-Fog computing environment. Internet Things 2024, 25, 101112.
22. Wang, Y.; Li, Y.; Lan, T.; Choi, N. A reinforcement learning approach for online service tree placement in edge computing. In Proceedings of the 2019 IEEE 27th International Conference on Network Protocols (ICNP), Chicago, IL, USA, 7–10 October 2019; pp. 1–6.
23. Chen, X.; Xu, H.; Zhang, G.; Chen, Y.; Li, R. Unsupervised deep learning for binary offloading in mobile edge computation network. Wirel. Pers. Commun. 2022, 124, 1841–1860.
24. Tuong, V.D.; Truong, T.P.; Nguyen, T.V.; Noh, W.; Cho, S. Partial computation offloading in NOMA-assisted mobile-edge computing systems using deep reinforcement learning. IEEE Internet Things J. 2021, 8, 13196–13208.
25. Lingayya, S.; Jodumutt, S.B.; Pawar, S.R.; Vylala, A.; Chandrasekaran, S. Dynamic task offloading for resource allocation and privacy-preserving framework in Kubeedge-based edge computing using machine learning. Clust. Comput. 2024, 27, 9415–9431.
26. Pang, S.; Wang, T.; Gui, H.; He, X.; Hou, L. An intelligent task offloading method based on multi-agent deep reinforcement learning in ultra-dense heterogeneous network with mobile edge computing. Comput. Netw. 2024, 250, 110555.
27. Li, N.; Zhai, L.; Ma, Z.; Zhu, X.; Li, Y. Lyapunov-guided Deep Reinforcement Learning for service caching and task offloading in Mobile Edge Computing. Comput. Netw. 2024, 250, 110593.
28. Zhang, S.; Tong, X.; Chi, K.; Gao, W.; Chen, X.; Shi, Z. Stackelberg game-based multi-agent algorithm for resource allocation and task offloading in MEC-enabled C-ITS. IEEE Trans. Intell. Transp. Syst. 2025, 26, 17940–17951.
29. Maciel, P.; Dantas, J.; Melo, C.; Pereira, P.; Oliveira, F.; Araujo, J.; Matos, R. A survey on reliability and availability modeling of edge, fog, and cloud computing. J. Reliab. Intell. Environ. 2021, 8, 227–245.
30. Herabad, M.G. Service-Placement-Simulator. 2024. Available online: https://github.com/ms-garshasbi/service-placement-simulator (accessed on 10 December 2025).
31. Politanskyi, R.; Bobalo, Y.; Zarytska, O.; Kiselychnyk, M.; Vistak, M. Entropy calculation for networks with determined values of flows in nodes. Math. Model. Comput. 2022, 9, 936–944.
Figure 1. The edge-to-cloud infrastructure.
Figure 2. GA solution encoding.
Figure 3. The flowchart of SNN-GA.
Figure 4. Placement cost of the algorithms across various scales.
Figure 5. Service response time achieved by the algorithms.
Figure 6. Platform reliability achieved by the algorithms.
Figure 7. Service reliability achieved by the algorithms.
Figure 8. Distribution of response time for services in the xLarge scale.
Figure 9. Distribution of platform reliability for services in the xLarge scale.
Figure 10. Distribution of service reliability for services in the xLarge scale.
Figure 11. Entropy analysis of the algorithms on the small scale.
Figure 12. Entropy analysis of the algorithms on the xxLarge scale.
Figure 13. Convergence process.
Figure 14. Response time analysis (SNN-GA vs. standard activation functions).
Figure 15. Platform reliability analysis (SNN-GA vs. standard activation functions).
Figure 16. Service reliability analysis (SNN-GA vs. standard activation functions).
Figure 17. Normalised impact of each feature on the suitability value.
Table 1. System model-related notation.

CC_k^t: Available CPU capacity of computing node k at time t
MC_k^t: Available memory of computing node k at time t
DC_k^t: Available disk capacity of computing node k at time t
RS_k^t: Reliability score of computing node k at time t
CR_v^y: CPU requirement of component y with version v
MR_v^y: Memory requirement of component y with version v
DS_v^y: Data size of component y with version v
PR_v^y: Provider of component y with version v
CT_v^y: Encoding/decoding type of component y with version v
RS_v^y: Reliability score of component y with version v
u_i: The characteristics of user i
h_j: The characteristics of helper j
CN_k^t: The characteristics of computing node k at time t
CN_{all,ch}^t: The characteristics of all computing nodes at time t
CN_{all,bw}^t: The BW and LD of the communication links at time t
S^t: The set of services at time t
S_x: The x-th service
SC_y^x: The y-th service component of S_x
SC_{y,v}^x: The y-th service component with version v of S_x
SC_{y,ch}^x: The characteristics of the service components
DS_{SC_{y,v}^x}: Data size of SC_{y,v}^x
BW_l^t: Bandwidth of communication link l at time t
RTT: Round-trip time of the communication link
TD_{SC_{y,v}^x}: Data transmission delay of SC_{y,v}^x
CR_{SC_{y,v}^x}: Computational requirement of SC_{y,v}^x
W: Waiting time
ET_{SC_{y,v}^x}: Execution time of SC_{y,v}^x
PD_{SC_{y,v}^x}: Service provider delay of SC_{y,v}^x
CD_{SC_{y,v}^x}: Encoding and decoding delay of SC_{y,v}^x
RT_{SC_{y,v}^x}: Response time of SC_{y,v}^x
RT_{S_x}: Response time of S_x
RT_{S^t}: Response time of S^t
RS_{SC_{y,v}^x}: Reliability score of SC_{y,v}^x
RS_{S_x}: Reliability score of S_x
RS_{S^t}: Reliability score of S^t
RS_k: Reliability score of computing node k
RS_i: Reliability score of user node i
RS_j: Reliability score of helper node j
RS_p: Reliability score of user-helper pair p
RS_{CN}^t: The total reliability score of the computing nodes at time t
U/H/K: The total number of users/helpers/computing nodes
X/Y/V: The total number of services/service components/versions
LD^t: The link delay at time t
BW^t: The link bandwidth at time t
t: Time
SP_cost: The total service placement cost
Table 2. Proposed approach-related notation.

CC_k^t: Available CPU capacity of computing node k at time t
MC_k^t: Available memory of computing node k at time t
RS_k^t: Reliability score of computing node k at time t
BW_k^t: Average available bandwidth of computing node k at time t
CR_{SC_{y,v}^x}: Computational requirement of SC_{y,v}^x
MR_{SC_{y,v}^x}: Memory requirement of SC_{y,v}^x
DS_{SC_{y,v}^x}: Data size transferred by SC_{y,v}^x
RS_{SC_{y,v}^x}: Reliability score of SC_{y,v}^x
SV_k: Suitability value of computing node k
α_1, …, α_7: Learnable parameters (weight coefficients)
β_1, …, β_7: Learnable parameters (power coefficients)
cr: Crossover rate
mr: Mutation rate
ss: Selection size
ps: Population size
p: Percentage of iterations with no improvement
Table 3. Problem instances.

Specification | Small | Medium | Large | xLarge | xxLarge
U/H | 30/15 | 60/30 | 120/60 | 250/125 | 500/250
K in Tier-1/2/3 | 15/8/4 | 30/15/8 | 60/30/16 | 125/62/32 | 250/125/64
X/Y/V | 30/6/5 | 60/6/6 | 120/6/7 | 250/6/8 | 500/6/9
RS for CNs | >70% (all scales)
RS for SCs | >70% (all scales)
Table 4. Configurations of the GA.

Conf. | Value
ps | 100
cr | 70%
mr | 5%
ss | 10%
it | 100
p | 20%
Table 5. Execution time of the algorithms.

Algorithm | Small | Medium | Large | xLarge | xxLarge
SNN-GA-SM | <1 s | <1 s | <1 s | <1 s | 6.3 s
SNN-GA-MM | <1 s | <1 s | <1 s | <1 s | 6.1 s
SNN-GA-LM | <1 s | <1 s | <1 s | <1 s | 6.1 s
SNN-GA-xLM | <1 s | <1 s | <1 s | <1 s | 6.2 s
GA | 10 s | 32 s | 1.6 m | 5.5 m | 22.4 m
PSO | 7 s | 18 s | 53 s | 3.8 m | 18.11 m
PSO-GA | 9 s | 26 s | 1.3 m | 4.2 m | 20.1 m
TCA | <1 s | <1 s | <1 s | <1 s | 1.6 s
LRC | <1 s | <1 s | <1 s | <1 s | 1.7 s
MDS | <1 s | <1 s | <1 s | <1 s | 1.8 s
MR | <1 s | <1 s | <1 s | <1 s | 1.6 s
LP | <1 s | <1 s | <1 s | <1 s | 1.9 s
Table 6. Training time of the SNN-GA models.

SNN-GA-SM | SNN-GA-MM | SNN-GA-LM | SNN-GA-xLM
4.7 m | 13.1 m | 51.8 m | 300.5 m
Table 7. Training and inference time complexity of the algorithms.

Algorithm | Training Time Complexity (Offline) | Inference Time Complexity (Online)
Heuristics | None | O(C × V × K)
Metaheuristics | None | O(G × N × Cost)
SNN-GA | O(G × N × Cost) | O(C × V × K)