1. Introduction
The integration of edge and cloud computing improves performance and fulfils critical latency requirements for resource-intensive systems such as Augmented Reality (AR) and Virtual Reality (VR). Edge computing processes data near its source, allowing low-latency handling of time-sensitive tasks on local devices or on nearby servers. The cloud complements the edge by providing support for more demanding workloads [1]. In edge-to-cloud computing, real-time decision-making is essential for effectively allocating resources online to services while fulfilling the Quality of Service (QoS) requirements. The complexity of service placement in such environments is amplified by the demanding nature of real-time multimedia applications. The heterogeneity of devices and computing nodes in the edge-to-cloud network (which vary significantly in communication protocols, processing power, storage capacity, and other resource attributes) adds further complexity. Additionally, dynamic conditions such as variable task arrival rates and fluctuations in resource availability render the development of online service placement algorithms even more challenging in edge-to-cloud computing [2,3,4].
The service placement problem in edge-to-cloud networks is identified as a combinatorial optimisation problem [5] that involves selecting and placing services on computing nodes (in an edge-to-cloud platform) under various constraints. Because the solution space expands exponentially with the number of services and computing nodes (the problem is identified as NP-complete [3,6,7]), exploring all possible solutions in polynomial time is infeasible. In addition, resource and latency constraints, interdependencies between service components, and the need to optimise multiple objectives (e.g., latency and system reliability) make it more difficult than conventional resource allocation and scheduling problems. Adding more flexibility further expands the solution space, making the problem even harder. For example, in our previous study [3] (and also in this study), we considered a situation in which each service component can have multiple implementation versions. Each version differs in terms of resource demand, performance characteristics, and QoS. This expands the solution space significantly, as the placement algorithm must decide not only where to place each service component but also which version to deploy. This versioning aspect introduces additional decision variables and constraints, rendering the service placement problem even more challenging and computationally intensive.
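To make the combinatorial growth concrete, the sketch below (our own illustration, not part of the paper's model) counts candidate placements when each component independently picks a (node, version) pair; real constraints would prune this space but not change its exponential growth.

```python
# Illustrative only: size of the placement search space when each of
# `num_components` service components can be deployed in one of
# `num_versions` versions on any of `num_nodes` computing nodes.
def placement_search_space(num_nodes: int, num_versions: int, num_components: int) -> int:
    # Each component independently picks one (node, version) pair.
    return (num_nodes * num_versions) ** num_components

# Even a modest scenario explodes quickly once versioning is introduced:
single_version = placement_search_space(num_nodes=10, num_versions=1, num_components=5)
multi_version = placement_search_space(num_nodes=10, num_versions=3, num_components=5)
# With 3 versions per component, the space grows by a factor of 3**5 = 243.
```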
Service placement optimisation in edge-to-cloud environments can be handled using exact, heuristic, and metaheuristic techniques. Exact methods provide optimal solutions if adequate time is available; however, their runtimes increase exponentially with the problem size. Heuristic methods do not guarantee optimal solutions but usually yield reasonable results almost instantly. Metaheuristics can provide near-optimal solutions, although applying them directly to the service placement problem significantly increases the runtime compared with heuristics, which restricts them to offline resource allocation problems. Consequently, only heuristics are suitable for online scenarios that require real-time decision-making for service placement, despite the often limited quality of their solutions [5].
Learning-based techniques are other approaches that have recently gained attention for solving service placement problems [8,9]. These methods use a training-inference framework: a model is first trained during a typically time-consuming training phase and is then applied to solve new problem instances. The inference phase is nearly as fast as a simple greedy algorithm (i.e., almost instant) while providing the best possible solution. These methods provide higher-quality solutions than heuristics. They are also significantly more efficient than metaheuristics and exact methods in their use of computational resources during the inference phase. Their data-driven nature makes them powerful tools for modern edge-to-cloud systems that require intelligent and robust placement decisions. However, these methods have several limitations. First, although training is performed offline, a large volume of high-quality data is required, which may not always be available. Second, these models (especially deep learning models) act as black boxes, lacking transparency and interpretability in decision-making. Third, models trained in one environment may not generalise well to others without retraining or adaptation, which can be costly and time-consuming. To address these challenges and develop reliable and efficient learning-based solvers, several design considerations are essential. Such models must effectively balance exploration and exploitation to prevent overfitting or underfitting to specific problem instances while maintaining strong generalisation capabilities. They must also be designed carefully because learning-based approaches tend to be more sensitive to their hyperparameters.
Although service placement has been extensively studied, most existing approaches rely on simplified assumptions, such as single-version service components, offline placement with full system knowledge, homogeneous computing nodes or resource demands, or optimisation focused primarily on latency. Recent learning-based methods also depend on computationally expensive deep reinforcement learning or deep neural networks, making them unsuitable for real-time decision-making. In contrast, this study considers a more complex and realistic online edge-to-cloud scenario in which each service component has multiple implementation versions with different resource demands and reliability scores. This multi-version architecture, combined with heterogeneous computing nodes and the absence of future system-state knowledge, significantly expands the search space and introduces new challenges. These include selecting an appropriate version of a service component, choosing the most suitable node for placement, jointly optimising response time together with software and hardware reliability, and achieving generalisability without requiring large training datasets.
In this study, we propose a novel lightweight learning-based approach (SNN-GA) to address the online service placement problem. We designed a shallow yet extremely efficient Neural Network (NN) that accepts various features of the edge-to-cloud system as input and produces an optimal (or near-optimal) solution almost instantly. We used a Genetic Algorithm (GA) to set the hyperparameters. To handle nonlinearity, SNN-GA uses both power and weight coefficients (instead of the commonly used activation function) to process input features from edge-to-cloud environments. During the training phase, the GA optimiser determines the optimal values for both the weight and power coefficients in the NN layer of the SNN-GA. The optimisation process simultaneously considers multiple objectives (minimising service response time and maximising system reliability). Once the SNN-GA is fully trained, it can be employed in the inference phase to make online service placement decisions in other edge-to-cloud environments. SNN-GA was evaluated using a wide range of problem instances and proved to be efficient, robust, and generalisable, solving problem instances multiple times larger than those used for its training.
The SNN-GA differs from the existing learning-based methods. First, it employs a shallow neural network, which reduces the number of parameters to be learned, resulting in faster training and inference compared with other methods, such as commonly used deep learning models. Second, the training process is unsupervised and does not require large volumes of high-quality labelled data, which are often difficult to obtain in edge-to-cloud environments. These differences significantly enhance the interpretability and explainability of the SNN-GA, making it more suitable for online scenarios. The main contributions of this study are as follows:
We model an online service placement scenario with multi-version service components in edge-to-cloud environments. We evaluate our approach using an AR/VR application (for remote repair and maintenance) with unique requirements.
We design a lightweight learning-based approach, called SNN-GA, to solve the stated problem with the primary objectives of minimising service response time while maximising system reliability.
We use a Genetic Optimiser to train a shallow neural network with both weight and power coefficients (instead of commonly used activation functions) to help it handle nonlinearity in the environment.
We perform a comprehensive analysis to evaluate SNN-GA (in terms of effectiveness, efficiency, robustness, and generalisability) and compare it against several state-of-the-art heuristic and metaheuristic approaches.
The remainder of this paper is organised as follows. Section 2 reviews the related work. Section 3 and Section 4 describe the proposed system and objective function, respectively. Section 5 details our approach (SNN-GA) for solving the online service placement problem. Section 6 discusses the experimental setup. Section 7 and Section 8 evaluate the SNN-GA and compare it with other approaches. Section 9 concludes the study and highlights future work.
2. Related Work
Approaches for optimally placing services in edge-to-cloud computing environments can be categorised into three main classes.
2.1. Heuristic-Based Solutions
Several heuristic approaches for service placement and resource allocation have been introduced in the literature. For example, Brogi et al. [10] presented latency-aware heuristic algorithms for deploying multicomponent IoT applications on an edge infrastructure. Li et al. [11] proposed a proactive graph-colouring heuristic to optimise task offloading and resource allocation in mobile edge computing to improve virtual reality users' quality of experience (QoE). Mahjoubi et al. [12] introduced a set of heuristic algorithms to handle service chain placement in three-layer IoT–edge–cloud architectures, aiming to minimise total service delay through Mixed-Integer Linear Programming (MILP). Khan et al. [13] introduced a computation offloading framework for edge systems with two methods (Maximum Offloading with Delay Constraint and Minimum Delay Offloading) to handle sudden video streaming spikes. Xu et al. [2] developed a heuristic-based optimisation model for multi-user edge computing networks to minimise task delays, incorporating elements of genetic algorithms to refine the solution. Wu et al. [14] designed a decentralised strategy for resource allocation using fuzzy control systems that allow edge users to utilise local information for decision-making.
Although the above studies provide valuable insights into optimising service placement in edge-to-cloud networks, most of them focus on a single objective, often simply minimising latency. This narrow focus neglects other crucial QoS metrics. Many also employ heuristics that may not scale effectively to large and complex network environments, potentially leading to suboptimal solutions. Some techniques rely heavily on the local information available at individual devices or edge nodes, which may hinder the achievement of globally optimal solutions. Consequently, heuristic service placement approaches encounter challenges in terms of solution quality and generalisability when applied to edge-to-cloud environments.
2.2. Metaheuristic-Based Solutions
In our earlier work [3], we proposed a Multi-Objective Genetic Algorithm (MOGA) for AR/VR service placement for remote repair and maintenance applications. Extensive experiments revealed that metaheuristic algorithms significantly outperform conventional heuristics in terms of solution quality; however, their runtimes increase exponentially as the problem size grows. de Souza et al. [6] developed a Bee Colony optimisation strategy to reduce application execution time by offloading tasks to the network edge. Hosseinzadeh et al. [7] introduced a discrete Butterfly Optimisation Algorithm for task scheduling in edge-computing environments. Apat et al. [15] investigated service placement optimisation in IoT use cases by considering makespan and energy consumption objectives. They developed various population-based metaheuristics (Genetic Algorithm, Simulated Annealing, and Particle Swarm Optimisation), as well as hybrid versions (GA-SA and GA-PSO). Their results indicated that hybrid metaheuristics outperform simpler greedy solutions. Bey et al. [16] introduced a Quantum-inspired PSO scheme for IoT-driven service placement. Furthermore, Huang et al. [17] proposed a multi-objective Ant Colony Optimisation technique for container placement across edge-to-cloud infrastructures. Ghobaei-Arani et al. [18] presented a Whale Optimisation Algorithm-based approach to solve service placement challenges in IoT environments.
Although recent studies have demonstrated growing interest in employing metaheuristic algorithms to address service placement problems, several limitations persist. These approaches, while capable of producing near-optimal solutions, often exhibit generalisability issues. In addition, their execution times can increase exponentially as problem complexity increases, making them impractical for online scenarios and large-scale deployments. Furthermore, achieving optimal performance frequently necessitates careful tuning of algorithm-specific hyperparameters, which can be time consuming and may lead to suboptimal outcomes if not properly configured. Consequently, metaheuristic service placement techniques face challenges in terms of computational efficiency, robustness, and generalisability.
2.3. Learning-Based Solutions
Liu et al. [19] utilised Deep Reinforcement Learning (DRL) to make online decisions about service deployment and computational resource allocation in a 5G-supported edge computing framework. The work focuses only on latency minimisation, without modelling the service heterogeneity, reliability, or multi-version computational behaviours that arise in real-world edge-to-cloud systems. Fahimullah et al. [20] investigated how different learning-based techniques, such as NNs and RL, could address service placement challenges by predicting user demands and resource availability in edge/fog computing. However, the study remains mostly high-level and does not analyse fine-grained online decision-making challenges (such as multi-version service heterogeneity, real-time constraints, and placement reliability trade-offs), which limits its applicability to practical edge-to-cloud scenarios. Sharma et al. [21] proposed a dynamic placement algorithm for IoT services in an edge-to-cloud setting, employing a Double Deep Q-Network combined with Prioritized Experience Replay. Nevertheless, it treats each service as a single monolithic unit and does not capture execution-level heterogeneity, inter-component dependencies, or reliability variations. Wang et al. [22] modelled the service placement problem as a Markov Decision Process and used deep Q-learning to decide where various service components should be allocated in an edge network. The method incurs online decision-making overhead from its deep Q-learning architecture and does not evaluate real-time inference latency or the practicality of deploying such multi-step RL models in latency-critical edge environments. Chen et al. [23] focused on unsupervised deep learning for binary offloading in mobile edge computing, using a deep neural network to optimise offloading decisions. However, it operates on atomic tasks and does not consider multi-component service graphs, version heterogeneity, or the reliability factors required in modern edge-to-cloud applications. Truong et al. [24] also employed DRL for partial computational offloading in similar environments. Their approach has limited scalability because its RL state and action spaces grow with all user–subchannel combinations, making training and inference increasingly expensive as the network size expands. Meanwhile, Lingayya et al. [25] applied multi-agent collaborative RL to dynamically assign tasks in edge computing systems. However, its overall complexity and computational overhead scale poorly as the number of IoT devices, tasks, and edge nodes increases. Pang et al. [26] introduced a multi-agent DRL framework for task offloading in heterogeneous edge networks; the method's high-dimensional joint state–action space limits its scalability. Li et al. [27] integrated DRL with Lyapunov optimisation to further improve task offloading in mobile edge computing scenarios. Their approach assumes that each task corresponds to a single service type and does not support multi-component service graphs or inter-service dependencies. Zhang et al. [28] proposed a distributed Stackelberg game framework for task offloading and bandwidth allocation in MEC-enabled C-ITS, and introduced a multi-agent reinforcement learning algorithm (SG-MAPG) to approximate the Stackelberg equilibrium and improve the computation rate. However, the approach is limited by its binary offloading model and does not address heterogeneous services with multi-version components.
A significant drawback of many existing learning-based approaches for service placement optimisation is their reliance on complex deep learning algorithms (particularly DRL). These models often require substantial amounts of data and extensive training time, making them resource-intensive and challenging to scale to large and complex networks. Furthermore, they often fall short when it comes to rigorous evaluation of model generalisability: in many cases, a large portion of the data is used for training, with only a small subset reserved for testing, which makes the evaluation of these approaches less rigorous and may prevent them from adequately generalising to unseen scenarios. Many of these techniques also involve numerous hyperparameters that significantly influence performance and require careful tuning. RL-based methods, especially multi-agent approaches, can struggle to provide high-quality solutions because they rely on local data. Therefore, current learning-based approaches for service placement optimisation may face challenges in terms of generalisability and resource efficiency (particularly resource consumption during the training phase).
In addition to the aforementioned shortcomings, all existing strategies assume homogeneous, single-version components, an assumption that does not hold in advanced edge-to-cloud scenarios involving heterogeneous environments. Consequently, there is a clear need for further research to develop novel algorithms that are generalisable and more efficient, both in terms of resource consumption and solution quality, for online service placement in larger-scale edge-to-cloud systems.
3. System Model
In this study, we considered service placement in a specific edge-to-cloud scenario that uses AR and VR technology for remote repair and maintenance tasks [3]. This use case, which was formulated in collaboration with our industrial partner (Ericsson), considers a scenario in which an industrial device malfunctions and no expert is available onsite. In such cases, a local technician uses an AR/VR application to connect with a remote expert and share video footage of the malfunctioning device to facilitate identification and troubleshooting of the issue. Based on our industrial partner's requirements, this system must provide private, real-time, high-definition video streaming with low-latency communication and efficient task distribution across an edge-to-cloud infrastructure to maintain high system reliability. In addition, unlike centralised cloud-based video-calling systems that share components and risk privacy breaches, our system must assign dedicated services to each user-helper pair. This eliminates shared components and creates a lightweight, decentralised architecture that supports deployment across diverse hardware, from large servers to small edge devices, providing flexibility, scalability, and privacy.
The edge-to-cloud AR/VR-based remote repair and maintenance use case studied in this paper was formulated in detail in our earlier work [3], which focused on offline one-shot placement of services for the stated problem. For clarity, we provide an overview of the updated model for the online scenarios. The notations relevant to the system model are summarised in Table 1.
3.1. Infrastructure
A three-tier infrastructure is considered in this study: Tier-1 consists of access points (APs) that act as entry points for devices to connect to the network; Tier-2 comprises edge nodes close to the APs; and Tier-3 is the cloud that provides high-capacity computational resources and storage for tasks requiring extensive processing.
Figure 1 shows the components of our edge-to-cloud architecture.
Each computing node within the infrastructure is described by a unique set of characteristics defined as $N_k^t = \langle CC_k^t, MC_k^t, DC_k^t, RS_k^t \rangle$, where $k \in \{1, \dots, K\}$. The total number of computing nodes is denoted by $K$, whereas $CC_k^t$, $MC_k^t$, and $DC_k^t$ correspond to the available computational, memory, and disk capacities of computing node $k$ at time $t$, respectively. $RS_k^t$ represents the reliability score of computing node $k$ at time $t$. The characteristics of all the computing nodes at time $t$ are denoted as $N^t = \{N_1^t, N_2^t, \dots, N_K^t\}$.
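As a concrete illustration, the per-node characteristics described above could be represented as follows (a minimal sketch; the field names and values are our own, not the simulator's API):

```python
from dataclasses import dataclass

# Hypothetical container mirroring the per-node characteristics
# N_k^t = <CC, MC, DC, RS> described above (field names are ours).
@dataclass
class NodeState:
    cc: float  # available computational capacity at time t
    mc: float  # available memory capacity at time t
    dc: float  # available disk capacity at time t
    rs: float  # reliability score in [0, 1]

# The system state N^t is then simply a list of K NodeState records.
nodes_t = [NodeState(cc=8.0, mc=16.0, dc=100.0, rs=0.99),
           NodeState(cc=32.0, mc=64.0, dc=500.0, rs=0.95)]
```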
Moving upward through the tiers of the infrastructure, nodes provide more computational and memory capacity; however, this improvement is accompanied by increased network delays. We assume that the current available bandwidth (BW) and transmission delays (LD) of the communication links between computing nodes are known, in order to model the interaction between entities within the infrastructure. To formalise this, Equation (2) is introduced to capture the available bandwidth and observed delay at time $t$, where the delay is approximated as half of the round-trip time between the two nodes. In this formulation, rows and columns indexed from 1 to $K$ correspond to computing nodes, the next group of columns represents the connected user nodes, and the final group of columns represents the connected helper nodes.
In the infrastructure, computing nodes in each tier can establish connections with computing nodes in other tiers (inter-tier communication), as well as within the same tier (intra-tier communication). The communication bandwidth decreases as we move toward higher tiers.
3.2. Services and Applications
In our use case, “users” connect with remote “helpers” through AR/VR applications for repair or maintenance tasks in industrial settings. Users and helpers use their personal devices, which vary in characteristics such as computational capacity (CC), memory capacity (MC), disk capacity (DC), and reliability score (RS). The device characteristics are represented as $U^t$ for users and $H^t$ for helpers, analogously to the computing nodes.
The platform hosting a diverse set of AR/VR services is represented as $S$, where each service $s \in S$ consists of multiple components denoted by $C_s$. Every component $c \in C_s$ is available in multiple versions, each described by its own set of attributes. These versions are defined by their distinct requirements, including the computational power, memory, disk space, and data transfer specifications necessary for interaction with other components. We assume that service component versions are provided by various providers. Accordingly, each version is also denoted by unique characteristics, including the codec type and reliability score, which indicates the failure probability of the component. The resource requirements and characteristics of a service component version are represented as a tuple (Equation (3)) encompassing the computational requirements (CR), memory requirements (MR), disk requirements (DR), data size (DS), provider (PR), codec type (CT), and reliability score (RS).
Services are modelled as Directed Acyclic Graphs (DAGs), where nodes represent the individual service components and edges signify their interdependencies. This DAG-based representation provides a clear framework for understanding the relationships and workflows between the service components [3].
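The DAG-based service model described above can be sketched as follows; the component names, version attributes, and dictionary layout are illustrative assumptions rather than the paper's implementation:

```python
# A minimal sketch (our own structure, not the paper's code) of a service
# modelled as a DAG whose nodes are components, each with several versions.
service = {
    "components": {
        "capture": {"versions": [{"cr": 1.0, "mr": 0.5, "ds": 2.0, "rs": 0.99},
                                 {"cr": 2.0, "mr": 1.0, "ds": 1.0, "rs": 0.97}]},
        "encode":  {"versions": [{"cr": 4.0, "mr": 2.0, "ds": 0.5, "rs": 0.98}]},
        "stream":  {"versions": [{"cr": 1.0, "mr": 1.0, "ds": 0.5, "rs": 0.99}]},
    },
    # Directed edges: data flows capture -> encode -> stream (acyclic).
    "edges": [("capture", "encode"), ("encode", "stream")],
}

def topological_order(dag):
    """Return components in dependency order (valid because the graph is acyclic)."""
    deps = {c: set() for c in dag["components"]}
    for src, dst in dag["edges"]:
        deps[dst].add(src)
    order, placed = [], set()
    while len(order) < len(deps):
        ready = [c for c, d in deps.items() if d <= placed and c not in placed]
        order.extend(ready)
        placed.update(ready)
    return order
```

A placement algorithm can walk the components in this order so that every component is placed after the components it depends on.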
Because we consider an online scenario, we assume that we only have data on the characteristics of the currently connected user-helper pairs and lack prior knowledge of the characteristics of pairs that will connect to the system. Therefore, our approach focuses only on the current state of the system when making decisions. This assumption aligns with the real-time nature of online environments, where the system must handle unpredictable and continuously changing user-helper interactions.
5. SNN-GA: The Proposed Solution
In this section, we introduce the proposed learning-based approach. The SNN-GA is divided into two phases: training and inference. In the training phase, a shallow NN is trained using a GA optimiser. In the inference phase, the trained model is employed to estimate the suitability of the computing nodes for hosting service components when making online service placement decisions. The notations relevant to this section are summarised in Table 2.
5.1. SNN-GA: Inputs and Outputs
We assume that data regarding the current system state (both requirements and characteristics) are available and are taken as input every time a new user-helper pair is added to the network at time $t$. Consequently, the input layer of the designed NN receives the current state of the system, that is, the characteristics of the nodes, such as the available computational capacity ($CC_k^t$), available memory ($MC_k^t$), reliability score ($RS_k^t$), and average available bandwidth of the communication links of the computing node ($\overline{BW}_k^t$). It also considers the characteristics of the service components to be added to the network, that is, the computational ($CR$) and memory ($MR$) requirements, data transfer size ($DS$), and reliability score ($RS$) for each version of the service components.
The proposed shallow NN uses Equation (19) to estimate the suitability of each computing node. Therefore, the output layer is designed with a single neuron that computes the suitability value as the final output for each computing node.
In this equation, $CC_k^t$, $MC_k^t$, $RS_k^t$, and $\overline{BW}_k^t$ represent the normalised features of node $k$ at time $t$, while $CR$, $MR$, $DS$, and $RS$ represent the normalised features of the candidate service component version. $W$ and $P$ are the learnable parameters, comprising the weight and power coefficients, respectively. The power coefficients apply a nonlinear transformation to the features that enables the model to capture nonlinear and non-polynomial relationships. Specifically, the combination of weight and power coefficients not only scales the features but also adjusts their influence based on their importance and interaction patterns. This flexibility makes the model well suited to handling diverse feature distributions and capturing strong nonlinear dependencies. Because we used power coefficients to provide nonlinearity, we did not include an activation function in the output layer. The optimal values of the learnable parameters were determined using a GA-based optimiser during the training phase.
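Although Equation (19) itself is not reproduced here, a minimal sketch consistent with the description above (a weighted sum of normalised features, each first raised to a learnable power; the feature ordering and value ranges are our assumptions) could look like this:

```python
# Hedged sketch in the spirit of Equation (19): each of the eight normalised
# features is raised to a learnable power and multiplied by a learnable weight,
# then summed into a single suitability score. Feature names are assumptions.
def suitability(node_feats, comp_feats, weights, powers):
    """node_feats/comp_feats: four normalised values each (in (0, 1]);
    weights/powers: eight learnable coefficients each."""
    feats = list(node_feats) + list(comp_feats)   # 8 features in total
    assert len(weights) == len(powers) == len(feats) == 8
    return sum(w * (x ** p) for w, x, p in zip(weights, feats, powers))

score = suitability(
    node_feats=[0.8, 0.6, 0.99, 0.7],    # CC, MC, RS, mean BW (normalised)
    comp_feats=[0.2, 0.1, 0.3, 0.97],    # CR, MR, DS, RS of the version
    weights=[1.0] * 8, powers=[1.0] * 8, # identity case: a plain sum
)
```

With all weights and powers set to 1, the score degenerates to a plain sum of the features; training moves these 16 coefficients away from the identity to emphasise the features that matter.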
5.2. SNN-GA: The Training Phase
To train the model for service placement decision-making, we use a GA to identify the optimal values of the learnable parameters (the weight and power coefficients). The GA tunes the model parameters to ensure that the model produces the most suitable value for selecting the optimal node on which to place the current service components. The GA, as a population-based metaheuristic, was chosen owing to its ability to explore large search spaces effectively and identify global optima. It finds solutions through its crossover, mutation, and selection operators across multiple iterations.
It is worth noting that because the proposed model (SNN-GA) introduces nonlinearity through power coefficients, the optimisation space becomes non-smooth and non-differentiable. This makes traditional gradient-based optimisers such as SGD or Adam unsuitable because they rely on well-defined gradients and stable activation functions. For this reason, we employ a GA, a derivative-free global optimiser capable of exploring highly irregular search spaces while avoiding poor local minima. These characteristics make GA more appropriate than conventional DNN optimisers for the model architecture used in this study. Deep learning models were deliberately avoided due to their higher training complexity, sensitivity to hyperparameter tuning, and limited suitability for real-time online deployment in dynamic edge environments. The proposed shallow neural network significantly reduces the number of learnable parameters, lowering the risk of overfitting when training data is limited or synthetically generated.
5.2.1. GA Solution Encoding
To apply the GA to an optimisation problem, it is essential to encode the initial solutions (called chromosomes). This encoding represents solutions in a format that can be manipulated by the algorithm during the optimisation process. As illustrated in Figure 2, the initial population of the GA comprises several chromosomes, each representing a unique solution. Each value in the chromosome array corresponds to a specific learnable parameter in the model. Considering the size of our input, which reflects the type of information that must be considered when making a placement decision, the length of the array was set to 16: the first eight elements hold the weight coefficients, followed by another eight elements for the power coefficients.
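A minimal sketch of this 16-gene encoding (the gene value ranges are illustrative assumptions, not the paper's tuned bounds):

```python
import random

# Sketch of the chromosome described above: the first eight genes are the
# weight coefficients, the last eight the power coefficients.
def random_chromosome(rng=random):
    weights = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # assumed range
    powers = [rng.uniform(0.1, 3.0) for _ in range(8)]    # assumed range
    return weights + powers

def decode(chromosome):
    """Split a 16-gene chromosome back into (weights, powers)."""
    assert len(chromosome) == 16
    return chromosome[:8], chromosome[8:]
```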
5.2.2. GA Cost Function
The cost function in the GA, also known as the fitness function, is designed based on the defined objective function (Equation (14)) to evaluate the performance of a given solution within the search space. Specifically, solutions that produce smaller values of the objective function are associated with reduced placement costs and are considered superior. This cost function ensures that the optimisation process consistently prioritises solutions that lead to improved service placement decision-making policies.
5.2.3. GA Operators
The GA optimises solutions by evolving them through crossover, mutation, and selection operators across multiple iterations. We adopted a single-point crossover operator to combine parent chromosomes. This operator swaps parts of two selected parent solutions at a randomly determined crossover point to build offspring that inherit characteristics from both parents. The crossover rate determines the probability of performing this operation on each selected pair of chromosomes. Mutation provides exploration by randomly changing parts of candidate solutions. Specifically, because the solution elements (chromosome genes) are continuous, we use a Gaussian mutation operator that modifies the solution elements at a given mutation rate by adding or subtracting small random values drawn from a Gaussian distribution. We used a tournament-based selection mechanism for the selection operator. This method selects solutions for the next iteration (generation) by comparing a subset of the population and choosing the best-performing ones. The selection size determines the number of candidates considered in each tournament. The GA terminates when either a predefined number of iterations is reached or no improvement occurs for $p$ percent of consecutive iterations.
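The three operators described above can be sketched as follows (the parameter defaults are illustrative examples, not the paper's tuned settings):

```python
import random

# Illustrative implementations of the GA operators described in the text.
def single_point_crossover(p1, p2, rng=random):
    # Swap the tails of two parents at a random crossover point.
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def gaussian_mutation(chrom, rate=0.1, sigma=0.05, rng=random):
    # Perturb each gene with probability `rate` by a small Gaussian step.
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in chrom]

def tournament_select(population, costs, size=3, rng=random):
    # Lower cost is better, matching the minimisation objective.
    contenders = rng.sample(range(len(population)), size)
    best = min(contenders, key=lambda i: costs[i])
    return population[best]
```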
5.3. SNN-GA: The Inference Phase (Online Service Placement)
The trained model is used for online service placement in unseen edge-to-cloud environments. In the proposed approach, before assigning a service component to a computing node, the trained model is used to estimate the suitability value of each computing node for hosting that specific service component. As each service component is available in multiple versions, the suitability value is calculated separately for each version on each computing node. Once the suitability values are determined for all versions, the version of the service component that achieves the highest suitability value on a given computing node is identified as the most favourable candidate for allocation to that node. This process is repeated for all computing nodes in the network, so that each computing node identifies its best-suited version of the service component based on the calculated suitability values. Finally, the computing node with the highest overall suitability value (among all computing nodes) is selected for the placement of that service component version. This process iterates until all the components of the service are placed on the computing nodes.
To handle the constraints during service placement, if a computing node cannot satisfy one of the required constraints (Equations (15)–(18)) for a candidate service component, its suitability value is set to the lowest possible value before the assignment. This ensures that only suitable nodes are considered when hosting a given service component.
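A minimal sketch of this inference loop, with hypothetical `predict_suitability` and `satisfies_constraints` callables standing in for the trained model and the constraint checks of Equations (15)–(18); infeasible pairs are masked with negative infinity as one conventional choice of "lowest possible value":

```python
NEG_INF = float("-inf")

def place_component(component_versions, nodes, predict_suitability, satisfies_constraints):
    """Greedy placement of one service component: every node scores every
    version with the trained model; infeasible (node, version) pairs are
    masked out; the globally best pair wins."""
    best = (None, None, NEG_INF)            # (node, version, suitability)
    for node in nodes:
        for version in component_versions:
            if not satisfies_constraints(node, version):
                s = NEG_INF                 # constraint violated: never selected
            else:
                s = predict_suitability(node, version)
            if s > best[2]:
                best = (node, version, s)
    return best                             # (None, None, -inf) if nothing feasible
```

Repeating this call for each component of the service DAG yields the full placement.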
Figure 3 shows the flowchart of the proposed approach, where training is performed offline and inference is conducted online.
6. Experimental Setup
In this section, we discuss the experimental implementation and evaluation.
6.1. The Edge-to-Cloud Simulator
We used the simulator developed in our previous work for implementation and evaluation [
3,
4]. This cloud-native simulator models the entire infrastructure, along with its associated services. For reproducibility, we shared all study and research materials through a GitHub repository [
30]. These materials include comprehensive documentation, wikis, and the YAML configuration files required to re-deploy our containerised simulator on the Kubernetes platform. The repository also includes the complete set of problem instances used to evaluate all implemented algorithms in the current and previous studies.
Our cloud-native edge-to-cloud simulator provides a complete and faithful representation of the three-tier infrastructure, network characteristics, and AR/VR service structures described in
Section 3. It supports fine-grained configuration of all system elements, including computing nodes (CPU, memory, disk capacity, and reliability scores), communication links (available bandwidth and latency), and multi-version service components (resource requirements, codec type, provider, and reliability). The simulator also generates user-helper pairs, detailed service DAGs, and heterogeneous system states. All system specifications (including node capabilities, component-level resource demands, and network parameters) conform to the definitions provided in [
3,
4,
30]. The problem instances used in our evaluation span multiple scales (from small to xxLarge) and are accompanied by configuration files that explicitly define all resource and network attributes.
6.2. Problem Instances
Two sets of instances were used for evaluation. The first set, which included instances at small, medium, large, and extra-large (xLarge) scales, was used for training. Each instance represents a different level of complexity, enabling a thorough assessment of all algorithms under varying conditions. For testing, we utilised the second set of instances, which included problems of the same sizes as the training set but with different characteristics, along with an additional extra-extra-large (xxLarge) instance to further evaluate all algorithms.
Table 3 presents the size of each instance. The specific characteristics of these instances, such as the computational capacity, memory requirements, and other resource specifications, were designed to align with the detailed specifications outlined in [
30].
The training data used in this work is synthetically generated by our simulator. The simulator parameters are derived from the detailed requirements provided by our industrial partner and from system specifications reported in [
3,
4,
30]. This ensures that the resource ranges, network characteristics, data sizes, and reliability metrics reflect realistic edge-to-cloud systems. Moreover, the simulator introduces heterogeneity and randomness into node capabilities, bandwidth, latency, and system states, which exposes the model to a broad range of operating conditions and increases its robustness.
6.3. Comparing Algorithms
To evaluate the performance of the SNN-GA and establish a comparative analysis, we implemented various heuristic and metaheuristic algorithms. Existing approaches that precisely address the specific details of the proposed service placement problem are limited. Therefore, we compared SNN-GA with heuristic and metaheuristic algorithms (introduced in our previous works [
3,
4]) specifically designed for service placement in AR/VR-based remote repair and maintenance scenarios in edge-to-cloud systems, where each service component has multiple versions.
We implemented five heuristic solvers: (1) TCA–Task Continuation Affinity, (2) LRC–Least-Required CPU, (3) MDS–Most Data Size, (4) MR–Most Reliability, and (5) LP–Least-Powerful. TCA prioritises placing services on user nodes, moving to higher tiers if resources are insufficient. LRC selects the version that requires the least CPU. MDS prioritises components with larger data sizes for user or edge nodes. MR chooses the most reliable version on the most reliable node. LP executes the most demanding version on the least powerful node. We also implemented three metaheuristic algorithms: (6) GA–Genetic Algorithm, (7) PSO–Particle Swarm Optimisation, and (8) hybrid PSO-GA. Both sets of heuristics and metaheuristics were selected because they demonstrated strong performance in previous studies [
3,
4]. Their configurations were adapted based on the conditions specified in the original work to ensure that they operate under optimal settings and achieve their best possible performance.
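For illustration, one of these baselines (LRC) can be sketched as follows; the dict-based node and version records and their field names are simplified stand-ins for the simulator's actual data structures:

```python
def lrc_place(component_versions, nodes):
    """LRC baseline: pick the version with the least CPU demand, then the
    first node (scanned from lower to higher tiers) with enough free CPU
    and memory."""
    version = min(component_versions, key=lambda v: v["cpu"])
    for node in nodes:                      # nodes assumed ordered: user, edge, cloud
        if node["free_cpu"] >= version["cpu"] and node["free_mem"] >= version["mem"]:
            node["free_cpu"] -= version["cpu"]   # reserve the resources
            node["free_mem"] -= version["mem"]
            return node["name"], version["name"]
    return None                             # no feasible node for this component
```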
To demonstrate that our design choices (such as the use of power coefficients in place of standard activation functions) in developing the SNN-GA led to the best results, we also implemented three shallow neural networks that use standard activation functions rather than power coefficients, and compared them with the proposed SNN-GA model in terms of both training efficiency and solution quality.
6.4. SNN-GA Hyperparameters
The proposed SNN-GA approach requires the selection of a limited number of hyperparameters, which are primarily related to the SNN-GA optimiser. We followed the procedure introduced in our previous approach [
3] and set the GA hyperparameters, as outlined in
Table 4.
6.5. SNN-GA Trained Models
Four SNN-GA models were trained, each corresponding to a specific instance scale. The SNN-GA-SM model was trained using small-scale instances, while SNN-GA-MM, SNN-GA-LM, and SNN-GA-xLM were trained using medium-, large-, and xLarge-scale instances, respectively. Each sample instance includes: (a) overall network characteristics (e.g., available bandwidth, link latency), (b) node characteristics (e.g., available CPU, memory capacity, node reliability score), and (c) resource requirements (e.g., CPU demand, memory demand, data size, service component reliability score).
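The three feature groups above can be assembled into a single model input as in this sketch; the field names are illustrative, not the simulator's actual identifiers:

```python
def build_features(network, node, version):
    """Assemble the SNN input vector for one (node, component-version) pair,
    following the three feature groups (a)-(c) listed above."""
    return [
        # (a) overall network characteristics
        network["bandwidth"], network["latency"],
        # (b) node characteristics
        node["cpu"], node["mem"], node["reliability"],
        # (c) component-version resource requirements
        version["cpu_demand"], version["mem_demand"],
        version["data_size"], version["reliability"],
    ]
```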
8. SNN-GA: Model Explainability
8.1. The Choice of Activation Function
Figure 13 illustrates the convergence process of SNN-GA-LM and shows how its performance would have changed if its NN had used standard activation functions (e.g., Leaky ReLU, sigmoid, and hyperbolic tangent) instead of the power coefficients utilised in all SNN-GA models. The models that use the sigmoid, Leaky ReLU, and tanh activation functions show a small reduction in cost during the initial training iterations but quickly plateau without significant further improvement. Furthermore,
Figure 14,
Figure 15 and
Figure 16 present quantitative comparisons, showing that SNN-GA improves service response time, platform reliability, and service reliability compared with models that use standard activation functions. The results clearly indicate the superior performance of SNN-GA-LM in terms of both convergence speed and final cost. This indicates that these standard activation functions are insufficient to handle the nonlinearity in our NN and thus cannot effectively optimise the cost function.
Unlike most deep learning-based placement methods, which use 3–4-layer deep neural networks trained with backpropagation and large labelled datasets, SNN-GA employs a shallow neural network with only a single learnable layer. The nonlinearity is introduced through GA-optimised power coefficients rather than traditional activation functions. This eliminates the computationally expensive gradient-descent training process and significantly reduces the number of parameters, memory usage, and training time. As a result, both the training and inference phases become substantially lighter than those of conventional DNN architectures.
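A minimal sketch of such a scorer, assuming the suitability is a weighted sum of normalised (positive) features raised to learnable exponents; the exact functional form of SNN-GA may differ, and here the concatenated weights and powers would form one GA chromosome:

```python
def suitability(features, weights, powers):
    """Single-layer scorer: each normalised feature is raised to its learned
    power coefficient, then linearly combined. No activation function and
    no backpropagation: weights + powers are optimised jointly by the GA."""
    assert len(features) == len(weights) == len(powers)
    return sum(w * (x ** p) for x, w, p in zip(features, weights, powers))
```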
8.2. Feature Impacts
To investigate how different input features affect the overall suitability of a computing node to host a given service component, we performed a series of hypothetical computations in which all features were held constant while one was swept from a minimum to a maximum value. This reveals the importance of each feature in determining the suitability of computing nodes to host service components.
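This one-at-a-time sweep can be sketched as follows, with `score_fn` standing in for the trained suitability model:

```python
def sweep_feature(score_fn, baseline, index, lo, hi, steps=50):
    """One-at-a-time sensitivity sweep: hold all features at their baseline
    values and vary feature `index` from lo to hi, recording the score."""
    impacts = []
    for k in range(steps + 1):
        x = list(baseline)                      # copy the constant baseline
        x[index] = lo + (hi - lo) * k / steps   # sweep only one feature
        impacts.append((x[index], score_fn(x)))
    return impacts                              # (feature value, suitability) pairs
```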
In
Figure 17, for example, all features except “Available Memory” are held at a fixed baseline value, and “Available Memory” is swept from its minimum to its maximum value. Accordingly,
Figure 17 illustrates the relationship between each feature’s value and its impact on overall node suitability. Based on this figure, the available memory of a computing node positively affects its suitability, with a sharp increase in suitability as available memory capacity increases. This indicates that computing nodes with higher available memory are more desirable for service placement. By contrast, the available CPU capacity of a computing node negatively affects its suitability. This trend reflects the model’s aim of balancing the prioritisation of memory and computational capacities across computing nodes. Similar to memory capacity, the reliability scores associated with computing nodes and service components show a strong positive correlation with each node’s suitability value. Higher node and service component reliabilities lead to increased suitability, which is reasonable.
On the other hand, features such as “Memory requirement”, “CPU requirement”, and “Data size” show negative correlations with the suitability value. Higher values for these features reduce a node’s suitability, indicating that lower memory and CPU requirements, along with smaller data sizes for service components, are more desirable because of the reduced resource usage and overhead. Features related to available bandwidth demonstrate a positive correlation, underlining the importance of sufficient bandwidth for optimal service placement. In fact, because lower tiers (e.g., Tier-1) have higher bandwidth capacities than upper tiers (e.g., Tier-3), the model prefers to assign service components to computing nodes with greater bandwidth, thereby prioritising edge nodes.
The analysis of feature impact shows that the system prioritises node reliability, available memory, service component reliability, and available bandwidth as the most influential factors in determining suitability scores for service placement. These features show a steep increase in impact as their values grow, indicating that the model is highly sensitive to improvements in these attributes. Specifically, node and component reliabilities reflect the system’s preference for stable, failure-resilient environments, which are critical for maintaining service continuity. Similarly, high available memory and bandwidth indicate that the system seeks nodes capable of handling resource-intensive tasks and supporting smooth data transmission. In contrast, less influential features (e.g., available CPU) contribute to the overall evaluation but act more as supporting indicators than primary determinants. Thus, the model reflects a design that emphasises stability, capacity, and communication efficiency over raw computing power alone.