Article

Self-Organized Neural Network Inference in Dynamic Edge Networks

Fraunhofer Institute for Integrated Circuits (IIS), 91058 Erlangen, Germany
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(23), 12615; https://doi.org/10.3390/app152312615
Submission received: 9 October 2025 / Revised: 25 November 2025 / Accepted: 26 November 2025 / Published: 28 November 2025
(This article belongs to the Special Issue Advances of Edge Computing in Distributed Systems)

Abstract

Inference of large machine learning models can quickly exceed the capabilities of edge devices in terms of performance, memory or energy consumption. When offloading computations to a cloud server is not possible or feasible, for instance, due to data sovereignty concerns or latency constraints, a solution can be to distribute the inference load across multiple devices in a local edge network. We propose an approach which is capable of orchestrating multi-stage inference tasks in a mobile ad-hoc network consisting of heterogeneous devices in a self-organized and fully distributed manner. As individual edge devices may be battery-powered and volatile, the framework ensures a high degree of reliability even in dynamic environments. In particular, new nodes are automatically and seamlessly integrated into the ensemble, rendering the approach highly scalable. Moreover, resilience against spontaneous node dropouts or connection failures is implemented through adaptive task rerouting. Finally, by enabling complex inference tasks to be processed in small segments on the most suitable hardware available in the network, the ensemble is able to attain considerable pipelining performance and energy efficiency.

1. Introduction

Edge computing allows data to be processed closer to its source, aiming to reduce latency, conserve bandwidth and ensure data sovereignty, as (sensitive) information is not transferred to remote systems [1,2,3]. Small, low-powered edge devices often operate in ensembles, such as swarms of robots, drones or groups of IoT devices. Distributing computational tasks across multiple nodes rather than relying on specific single units offers several advantages. First, in an edge environment, tasks may exceed the capacity of a single device, require special hardware or benefit from less frequent context switches, since frequent switching can degrade energy efficiency and performance [4,5]. Second, it allows the integration of specialized hardware for executing specific sub-tasks (e.g., accelerator chips for certain portions of a neural network inference). Third, distributed systems provide considerable flexibility and scalability; adding or removing nodes is often more cost-effective than upgrading a single system to meet increasing demands. Lastly, a distributed approach facilitates data throughput by enabling pipelining, i.e., streamlining processing stages to ensure continuous data flow. One use case where all of the aforementioned points can be particularly relevant is modern machine learning applications. Typically, deep neural network architectures, but also classical machine learning algorithms, comprise a vast number of computational parameters. Tiny machine learning systems often handle high volumes of individually small inference elements, such as low-resolution images or time series of sensor data. By distributing the computation across multiple devices [6], as sketched in Figure 1, hardware limitations can be alleviated by partitioning the task into steps that align with the capabilities of each device, taking advantage of specialized hardware where possible.
Moreover, given that inference elements are individually small compared to the size of the neural network, efficient pipelining can be accomplished.
In dynamic edge networks with mobile, volatile participants, such as, for instance, swarms of autonomous robots or drones, reliable cooperative task processing becomes challenging. Connectivity instabilities caused by technical or environmental factors, together with nodes failing due to hardware damage, power loss or (temporary) disconnection from the network due to poor positioning, make the network topology unpredictable. Wireless connectivity can also be disrupted by signal interference or network congestion.
Specific applications for AI-driven data analysis directly on dynamic edge networks can be envisioned, such as simultaneous localization and mapping (SLAM) on a fleet of autonomous mobile robots (AMRs) [7,8,9,10] and navigation for autonomous swarms of drones, in particular the determination of points of interest during flight time beyond the line of sight [11,12,13,14,15,16]. Other use cases include local preprocessing of data on wireless sensor networks (WSNs) in massive IoT applications, such as agriculture monitoring [17,18,19], and disaster relief and emergency operations, where temporary ad-hoc communication networks are installed when traditional infrastructure is compromised [20,21,22,23,24,25]. Finally, industrial IoT [26], smart city [27] or home automation can be further relevant fields for this technology.
Applications within dynamic edge networks, where nodes work cooperatively, must address the challenge of how to maintain reliable, distributed task execution when the set of available devices, their connectivity, and their computational capabilities are constantly shifting. Unlike static clusters, these networks lack stable routing paths and persistent neighbors, meaning that any assignment of computation must be resilient to intermittent links or unreachable nodes. The heterogeneity of hardware further complicates coordination, as nodes differ not only in compute power and in the types of operations they can efficiently support, but–especially in the machine learning regime–also in their ability to handle the substantial memory demands of parameter-heavy models. Consequently, a distributed computing framework is required which ensures that multi-stage machine learning pipelines progress automatically and without interruption and which features mechanisms that can reassess resources continuously, hence being able to react to failures instantly and reconfigure task flows on the fly.
In this manuscript, we present DATOR (Distributed Automatic Task ORchestration), a conceptual algorithmic framework capable of processing sensor data directly within a volatile cluster of heterogeneous edge devices. In particular, inference tasks can be processed in multiple sequential stages in a fully distributed and self-organized manner, without the need for a central orchestration unit and hence without a single point of failure. Broadly speaking, the collective ensemble of multiple edge devices ensures that the system as a whole can continue to operate even when one or more nodes fail or leave the network, or when connections disrupt, by appropriately rerouting tasks. Conversely, the system is designed to automatically adapt when new nodes are added, seamlessly integrating them into the orchestration process. The machine learning inference task is split into distinct, sequential computation steps, which are allocated dynamically across various nodes within the ensemble. Our solution is generic, capable of application to any network comprising multiple nodes engaged in cooperative tasks. However, its lightweight and fully distributed design renders it particularly suitable for low-power mobile edge networks.

2. Related Work

Edge networks that rely on small embedded devices face significant constraints in processing power, memory, storage and battery capacity. These devices are typically designed for simple, specific tasks, limiting the complexity of applications they can execute and the amount of data they can process. This study investigates a connected cluster composed entirely of resource-constrained devices, defining an extreme edge computing environment. This “extreme” context is characterized by, first, the absence of any reliable nodes with considerable performance above the level of low-power microcontroller units, and second, the dominance of mobile, volatile devices, thus demanding a decentralized approach. To the best of our knowledge, no existing orchestration framework explicitly targets these edge networks.
Research in the related domain of mobility-aware edge computing [28,29,30] typically relies on trajectory prediction, usage patterns, or geometric models to proactively migrate services and cache data across nearby devices. These approaches generally assume the presence of a stable backbone, such as edge servers or cloudlets, to orchestrate the handover. In contrast, we consider scenarios whose dynamics are non-deterministic and volatile rather than following predictable patterns. Moreover, our architecture is strictly decentralized; by design, we cannot rely on a hierarchical controller or a stable super-node to maintain a consistent global view of the network state. Consequently, our problem statement requires a reactive, lightweight mechanism capable of functioning locally.
Also, there are conceptual similarities with fog orchestration. In the literature [31,32,33,34], the term fog colony is used to refer to an ensemble of computational resources confined to a given geographical region [35,36]. Fog nodes are typically located closer to the data source than cloud services and encompass a wide range of devices, including dedicated servers, routers, smartphones or edge devices. Fog computing aligns well with modern IoT applications such as smart cities and industrial IoT, where at least some computationally capable and stationary hardware is available to handle processing and coordination tasks. Many fog orchestration approaches adopt a centralized or hierarchical architecture. In a centralized setup, a dedicated orchestrator unit manages resource allocation and service deployment. In contrast, hierarchical models distribute orchestration responsibilities across multiple layers, where higher-tier nodes coordinate lower-tier ones. The more centralized an architecture, the greater the risk of single points of failure and scalability bottlenecks, even as resource management becomes potentially easier. Hierarchical models improve fault tolerance by partially distributing the orchestration responsibilities but still depend on higher-tier nodes, which can create bottlenecks.
Table 1 summarizes the key differences compared to our approach, which eliminates the need for a central controller by distributing the decision-making process across all nodes. This distinction is critical in edge environments, where unpredictable network topology changes can render a centralized orchestrator unreliable. Some decentralized approaches introduce area orchestrators responsible for local resource management, with role rotation mechanisms to enhance robustness [37]. Our solution, in contrast, manages to orchestrate computation payloads across all nodes without predefined roles or hierarchical coordination, avoiding dependencies on any kind of designated orchestrator unit or role.
Moreover, virtualization and containerization are widely used in fog computing to abstract resources and enable flexible deployment [38]. These approaches are generally impractical for tiny embedded devices due to severe resource constraints. Naturally, hypervisors and container runtimes, which require a full software stack (kernel, operating system, networking) and specific hardware virtualization support such as Intel VT-x, are infeasible in these scenarios. Additionally, limited bandwidth further complicates deployment, as containers are often too large to be efficiently transferred or updated. As a result, edge orchestration must rely on lightweight deployment strategies, such as bare-metal execution, where the neural network is optimized and pre-deployed directly on the embedded device and where optimizations enable efficient execution with minimal resource consumption. In particular, DATOR is tailored to orchestrate already pre-segmented inference steps of machine learning architectures. Parameters (typically weights and biases) are selectively and statically loaded from persistent storage into fast memory. To minimize the overhead of frequent context switches and to improve pipeline efficiency, these parameters should be retained in fast memory for the duration of the relevant task or function.
The orchestration concept presented in this work is designed to be paired with a generic network backend that allows for stable communication within a group of devices, enabling peer-to-peer transmission as well as message routing. The decentralized nature of our approach makes it particularly well-suited for application in mobile mesh networks, in particular Mobile Ad-hoc Networks (MANETs) [39,40], which are self-configuring networks of mobile devices that communicate directly without relying on fixed infrastructure or dedicated master nodes. While the framework allows for the implementation of significant fault tolerance (such as acknowledgments and dynamic rescheduling) at the package level, additional fault tolerance in the backend can further enhance stability, especially for routed communication over several hops in volatile environments.
MANETs are used in applications like disaster recovery, environmental monitoring, sensor networks or vehicular networks. Notable examples are Meshmerize, a mesh networking solution designed for industrial applications such as automated warehouses and robotics, providing high-throughput and low-latency communication in environments with frequent obstacles and changing conditions [41], and BATMAN, a decentralized mesh networking protocol used for large-scale community-driven networks, where nodes proactively exchange routing information to adapt quickly to network changes, ensuring continuous connectivity in urban environments [42]. Moreover, Meshtastic is an open-source, low-power wireless communication platform that enables ad-hoc mesh networking over long distances using LoRa transceivers, though its LPWAN-based design inherently limits data throughput and message size. Additionally, recent work has proposed an ad-hoc network concept for low-energy Bluetooth edge devices [43], featuring a particularly lightweight, table-driven routing mechanism, incorporated into standard Bluetooth advertisement messages. For the remainder of this article, we assume the existence of a suitable mesh network backend, enabling stable inter-node communication when connectivity is available, without specifying a particular implementation.
Our approach is particularly well-suited for sequential computations, a characteristic inherent to many deep neural network architectures. This allows us to adopt horizontal layer splitting, a technique that distributes consecutive layers across multiple devices to enable efficient pipelined execution. As shown in Figure 1, one device receives the input data, processes Layers 1 to j, and communicates the resulting data to device 2, which continues to process layers j + 1 to j + N, and so on. There is no uniform term for this technique in the literature; however, terms like Horizontal Partitioning [44], Layerwise Splitting [45], Layer Pipelining [46] or Sequential Layer Mapping [47] describe the same principle. An excellent survey is provided by Reference [6].
A key advantage of this approach is the immediate reduction of memory requirements per device. Since each device only requires a subset of the neural network parameters, weights can reside in the local cache. This minimizes the need for frequent reloading, thereby reducing energy consumption and latency, since permanent storage access is slow. Because different layers in a deep learning system naturally have varying numbers of parameters, this method is well suited for distribution across heterogeneous hardware [48]. Finally, the implementation and deployment overhead of layer-split models is low, since we only deploy the lightweight, inference-ready model in the production phase. Training is done, as usual, on stronger hardware and before deployment, without any additional overhead.
The identification of optimal split points is a standard multi-objective optimization problem and has been discussed in the literature (compare [6] and references therein). We define an inference step as the sum of computation performed by a group of layers between two split points. The final inference result is available once a data package has traversed all steps in the correct sequential order. Note that a large number of split points poses no significant overhead, as consecutive steps can be loaded onto the same device, effectively treating them as a single, larger sequence of layers. Finally, we note that the layer splitting approach is obviously limited for neural networks with non-sequential architectures, such as multi-modal inputs, branching or long-distance skip connections. While our concept could, in principle, be extended to handle arbitrary directed acyclic graph (DAG) architectures, i.e., to execute parallel branches, this is left for future work.
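To make the notion of split points and inference steps concrete, the following toy sketch (our own illustration; the layers-as-functions model and the helper name `split_into_steps` are hypothetical, not part of DATOR) partitions a sequential model at given split points and verifies that traversing the steps in order reproduces the unsplit computation:

```python
def split_into_steps(layers, split_points):
    """Group consecutive layers into inference steps.

    split_points are indices after which the model is cut, e.g.
    [2, 5] turns 7 layers into steps layers[0:2], [2:5], [5:7].
    """
    bounds = [0] + list(split_points) + [len(layers)]
    return [layers[a:b] for a, b in zip(bounds, bounds[1:])]

def run_step(step, activation):
    """Execute one step: apply its layers in order to the activation."""
    for layer in step:
        activation = layer(activation)
    return activation

# Toy model: each "layer" is simply a function acting on a number.
layers = [lambda x, k=k: x + k for k in range(7)]
steps = split_into_steps(layers, [2, 5])

# A package traverses all steps in sequential order; the result equals
# that of the unsplit model.
x = 0
for step in steps:
    x = run_step(step, x)
assert x == sum(range(7))  # 21
```

In the framework, each element of `steps` would reside on a different device (or several consecutive elements on the same device), with only the intermediate activation transmitted between them.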

3. Materials and Methods

A core characteristic of our concept is that each device within an ensemble of network nodes operates independently. In other words, the orchestration system is fully distributed and does not rely on a central unit. In a cluster, all nodes operate on the same functional core, irrespective of their hardware capabilities. Relevant node roles (e.g., sensor node, inference worker, routing unit, or cloud/satellite gateway) are pre-defined but not statically assigned; instead, they are dynamically assumed at runtime. Hardware limitations may restrict certain specialized roles (e.g., camera input, satellite uplink, neural accelerator) to nodes possessing the necessary equipment. The adaptive role adjustment is enabled by an event-driven architecture, in which operations are always initiated in response to incoming trigger messages, as sketched in Figure 2. Decisions are made locally on the respective device. This guarantees a lightweight execution with negligible algorithmic and memory overhead. It requires messages to be formalized in a way that they can be processed effectively at the relevant decision points. For this purpose, packages are aware of the currently required step of the computation as well as the intended final target of the inference result.
At its core, we follow the principles of the Contract Net Protocol (CNP) [49,50,51], employing an auction-based broadcast–response mechanism for decentralized task orchestration. Whenever a cluster participant identifies a new data package in its local queue, it issues periodic request messages describing the required inference step for that specific payload. These requests are broadcast to all reachable nodes within the network, independent of whether the payload originates from a sensor input or from the output of a prior computation stage. Upon receiving such a request, each node evaluates its local capabilities and current workload. If the corresponding computation module is available and the node is idle or within acceptable utilization bounds, it replies with an acceptance message. Once the original requester collects one or more confirmations, it dispatches the data payload to the selected responder(s) for execution. Beyond this basic request-accept scheme, the framework allows for the incorporation of more elaborate bidding strategies–such as weighted auctions–to account for heterogeneous processing capabilities, network latency, and energy constraints across the swarm.
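The request/accept auction described above can be sketched as follows. The `Node` class, the message fields, and the first-responder selection rule are our own simplifications for illustration, not the framework's actual interfaces:

```python
class Node:
    """A cluster participant that may hold some inference steps."""
    def __init__(self, name, loaded_steps, busy=False):
        self.name = name
        self.loaded_steps = set(loaded_steps)  # steps deployed on this node
        self.busy = busy

    def on_request(self, step):
        """Reply with an acceptance iff the step is loaded and capacity allows."""
        if step in self.loaded_steps and not self.busy:
            return {"from": self.name, "accept": True}
        return None  # silently decline

def auction(requester, step, reachable_nodes):
    """Broadcast a request for `step`; dispatch to the first accepting node."""
    for node in reachable_nodes:
        reply = node.on_request(step)
        if reply and reply["accept"]:
            return node  # the requester sends the payload here
    return None  # no taker yet: the requester retries periodically

# Node 2 holds steps 1-2 but is busy; Node 5 holds steps 1-3 and is idle.
nodes = [Node("n2", {1, 2}, busy=True), Node("n5", {1, 2, 3})]
winner = auction("n1", 1, nodes)
assert winner.name == "n5"  # n2 is busy, so n5 wins this round
```

A weighted auction, as mentioned above, would replace the first-responder rule with a scoring of the collected replies (e.g., by hop count, load, or battery level) before dispatching.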
The recipient node immediately begins its part of the inference process, executing consecutive steps until the next required step is not loaded locally. The package is then moved to the output queue, and the node initiates another auction round to find other network participants that can process this intermediate result by performing the next required inference step. This process continues until all steps have been successfully executed and the result has been sent to the final target, which is encapsulated within the package metadata.
We emphasize that an ensemble of such nodes can operate with an arbitrary and dynamically changing distribution of inference steps. Packages automatically find their path “through the network”, ensuring that the steps are executed in the correct order and that the result is eventually sent to the intended target. It is clear that if the target node (e.g., equipped with cloud uplink) permanently leaves the system, proper operation is compromised. To address this possibility, one may employ backup target nodes or utilize persistent storage for inference results. Temporary target node outages present no issues due to the system’s ability to seamlessly reintegrate the node upon return.
Since it makes no functional difference whether a data packet is received from another node as the result of a previous computation or directly from an input device, any node can be fed with suitably shaped data at any time. Incoming data is formally encapsulated by a work package manager (compare Figure 2), which provides queuing functionality for the sending and receiving processes and can filter duplicates. It can also be used for monitoring work packages, or for issuing retransmissions if packages appear to have been lost (indicated by, e.g., time limits or failed acknowledgments).
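A minimal sketch of such a work package manager is shown below, assuming an id-based duplicate filter and a timeout-driven retransmission policy; both details, as well as all names, are hypothetical illustrations rather than the framework's implementation:

```python
import time
from collections import deque

class WorkPackageManager:
    """Queues incoming work, filters duplicates, tracks unacknowledged sends."""
    def __init__(self, retransmit_after=5.0):
        self.inbox = deque()          # packages awaiting local processing
        self.seen = set()             # package ids already accepted
        self.pending = {}             # id -> (package, send time), awaiting ack
        self.retransmit_after = retransmit_after

    def receive(self, pkg):
        """Queue an incoming package unless it is a duplicate."""
        if pkg["id"] in self.seen:
            return False
        self.seen.add(pkg["id"])
        self.inbox.append(pkg)
        return True

    def sent(self, pkg):
        """Track an outgoing package until it is acknowledged."""
        self.pending[pkg["id"]] = (pkg, time.monotonic())

    def ack(self, pkg_id):
        """Drop a package from the pending set once acknowledged."""
        self.pending.pop(pkg_id, None)

    def due_for_retransmission(self):
        """Return packages whose acknowledgment has timed out."""
        now = time.monotonic()
        return [p for p, t in self.pending.values()
                if now - t > self.retransmit_after]

wpm = WorkPackageManager()
assert wpm.receive({"id": "A", "step": 2})
assert not wpm.receive({"id": "A", "step": 2})  # duplicate filtered
```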

4. Results

The following simulations provide an initial proof of concept demonstrating the viability of the proposed approach; a full experimental validation will be addressed in future work. In order to demonstrate how the system achieves pipelining, with nodes working concurrently, we simulate a cluster of five devices in Figure 3. The principal network topology is depicted in the lower right of the figure. For clarity, all orchestration messages (requests, responses, acknowledgments) have been omitted from the figure, which hence shows only actual payload data transfers and computation events. We simulate a scenario where the data pipeline is segmented into six steps, starting with feed (purple), i.e., gathering and preprocessing sensor data, followed by four sequential inference blocks (orange to red) and eventually an upload of the result (grey). Colored boxes in front of specific rows indicate which steps of this task are currently loaded on the respective device. Only one of the six steps can be executed at a time, while sending and receiving are, in the examples presented here, handled concurrently. Specifically, Node 1 is equipped with both a sensor and an upload gateway but cannot perform any part of the actual inference computation. Five work packages (A, B, C, …) are sequentially introduced into the system. If we follow package D, entering the system around t = 17 on Node 1 (purple event), we find that it is first processed by Node 5, as the closer Node 2 is currently busy. Even though consecutive execution of three steps on the same device might be efficient, the framework is not inherently aware of such optimizations. Instead, package routing decisions are made locally, in this case based solely on availability and the order of accept arrivals.
After the first three inference steps are completed, the package is routed through Nodes 3 and 1 before reaching Node 4, which handles the final computation step and sends the package back to Node 1 for result handling. Note that redundant allocation of certain computation steps increases the degree of parallel execution. In this example, only Step 4 is available on a single node (Node 4), possibly creating a bottleneck.
In another set of simulations, we evaluate the distributed inference performance depending on the rate of incoming inference elements (e.g., the frequency of images captured by a camera connected to Node 1). We employ the same network topology and step configuration as above (compare Figure 3). Node 1 is continuously fed with inference packages, separated by defined time intervals; the inverse of this interval constitutes the incoming package rate λ. Durations of the individual steps are modeled so as to mimic a generic CNN-based vision or object detection architecture [52]. In particular, we set the step computing durations to t_c = (t_c^(1), t_c^(2), t_c^(3), t_c^(4)) = (50, 30, 20, 40) s, with the most computational load (in terms of FLOPs) residing on the first few convolution layers, which typically feature a large number of channels, as well as on the classification section at the end, which often comprises multiple fully connected layers. The transmission durations of intermediate results between the steps are modeled as t_send = (t_send^(1→2), t_send^(2→3), t_send^(3→4)) = (5.0, 3.0, 2.0) s, reflecting a generic VGG-like architecture with a larger number of activations in earlier layers and progressively fewer towards the classifier. Sensor data acquisition and preprocessing (only performed by Node 1) account for t_pre = 10 s. Transmission after preprocessing to inference step 1, as well as from step 4 to the target node, is considered small, t_send^(feed→1) = t_send^(4→target) = 1.0 s, since only little data is transferred (a low-resolution image and a few bytes encoding the result, respectively). Since orchestration messages are designed to be particularly lightweight and are efficiently handled by the framework's message handling architecture discussed in Figure 2, their transmission times t_orch are also significantly shorter than those of other processes.
In summary, this leads to the following comparison of timescales: t_orch ≪ t_send ≪ t_c. Note that transmission times are given for each link in the path (i.e., per hop). Moreover, all timings have been randomized in order to more realistically capture environmental influences present in a real-world setting.
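Under the timescales above, a back-of-the-envelope saturation estimate follows from the fact that, with perfect pipelining, the sustainable package rate is bounded by the slowest stage, and replicating a step across k nodes divides its effective service time by k. The replica counts below are our assumption based on the Figure 3 configuration (step 4 only on Node 4); the calculation is illustrative, not part of the simulation:

```python
# Per-stage timings from the simulation setup (seconds).
t_pre = 10.0                      # preprocessing on Node 1
t_c = [50.0, 30.0, 20.0, 40.0]    # compute durations of inference steps 1-4
replicas = [2, 2, 2, 1]           # assumed number of nodes holding each step

# Effective service time per step when replicas process packages in parallel.
effective = [t / k for t, k in zip(t_c, replicas)]

# The pipeline saturates when the incoming rate exceeds the slowest stage.
bottleneck = max([t_pre] + effective)   # 40 s: step 4, held by a single node
lambda_max = 1.0 / bottleneck           # saturation rate in packages/s

print(f"bottleneck: {bottleneck} s, lambda_max = {lambda_max}")  # 2.5e-2
```

This crude bound of λ ≈ 2.5 × 10⁻², dominated by the unreplicated step 4, is consistent with the congestion regime observed around λ ≈ 2 × 10⁻² in the simulations; queuing, transmission and orchestration overheads account for the remaining gap.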
We evaluate system performance parameters under four distinct network scenarios, detailed in Table 2. We begin by analyzing the stable scenario, represented by the blue curves in Figure 4 and Figure 5. This baseline scenario simulates a network with no failures. At low incoming packet rates λ, the average packet latency (the time from packet submission to result handling) remains constant. In this regime, packets are processed sequentially without queuing or mutual interference. Consequently, throughput increases approximately linearly with λ. Figure 5e shows that Node 5 is initially inactive. This is because Node 2, which also performs the first inference step, is located closer to Node 1 (fewer hops). However, as the system becomes more congested (around λ ≈ 8 × 10⁻³), Node 5 begins processing packets and quickly assumes a share of the workload. Average system latency remains low until approximately λ ≈ 2 × 10⁻², where the incoming packet rate exceeds the combined processing capacity of Nodes 2 and 5 for inference step 1. At this point, Node 4 also becomes a bottleneck, as can be seen in Figure 5d. Further increases in λ lead to a traffic congestion regime, where both latency and throughput saturate, indicating the maximum achievable performance for this configuration.
In the unstable scenario (orange curves), the unreliability of Node 2 leads to immediate involvement of Node 5 in packet processing. The average latency begins to increase at lower values of λ compared to the stable scenario. This is because the intermittent unavailability of Node 2 not only reduces the efficiency of inference step 1 but also temporarily disrupts routing paths. The unstable system eventually reaches saturation; however, it does so at a higher λ value and lower throughput than the stable configuration.
The 5% message loss scenario (green curves) exhibits similar characteristics. The baseline latency is elevated due to the delays introduced by message loss and the subsequent rescheduling of orchestration messages or payloads. Given that a four-step inference process requires five successful transmissions (including preprocessing and result gathering), the probability of failure somewhere in a chain of n consecutive messages is 1 − 0.95^n, which is approximately 22.6% for n = 5. Similar to the unstable scenario, Node 5 becomes active already at small λ. However, the average idle time is lower than in the unstable scenario. This difference likely stems from the fact that the unstable scenario reduces available processing options due to node failures, whereas in the loss scenario, devices can continue their regular operations even if some messages are lost.
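The end-to-end failure probability quoted above follows directly from treating the n transmissions as independent, each succeeding with probability p = 0.95:

```python
# Probability that at least one of n independent transmissions fails,
# each succeeding with probability p.
p = 0.95
n = 5  # preprocessing -> 4 inference steps -> result gathering
p_fail = 1 - p ** n
print(f"{p_fail:.3f}")  # 0.226, i.e. about 22.6%
```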
Finally, we consider the slow network scenario (red curves). As expected, the baseline latency for purely sequential computation is higher than in the fast stable case, which can be directly attributed to the increased transmission latency. Node 5 becomes active even at very low packet rates and shows a significant increase in activity around λ ≈ 2 × 10⁻². This slow network configuration reaches the traffic congestion regime earlier (around λ ≈ 2 × 10⁻²) and exhibits a higher average latency and lower throughput, similar to the fast but unstable or lossy scenarios. The average idle time is notably lower compared to all other scenarios. This is particularly evident for Node 1, which increases its activity from approximately 10–30% in the fast scenarios to about 60% in the slow network. This increased workload is attributed to the fact that Node 1 not only preprocesses and distributes initial packets but also serves as a central routing hub for communication between Nodes 5 or 4 and Node 3. A similar increase in activity is observed for Nodes 2 and 3. Only the “end nodes”, 4 and 5, maintain similar activity levels compared to the fast scenarios.

5. Discussion

We present a concept for distributed machine learning inference in a mobile ad-hoc network of low-powered edge devices. By deploying an inference task as pre-defined steps, our approach becomes particularly lightweight, as only the neuron activations at the split points, along with minimal header information, are transmitted between computations. Appropriate split points can minimize activation sizes and hence transmission loads. For optimal pipelining performance and energy efficiency, we recommend partitioning the network such that the neural network weights and biases associated with each inference step fit within the cache of the specific edge devices used in the application, thereby mitigating the overhead of frequent context switching. Our approach can be well complemented by established neural network compression methods, such as pruning or quantization.
It is evident that, even in the context of only a few devices, volatile ad-hoc networks exhibit a large number of degrees of freedom. There can be temporary activation and deactivation of nodes, dynamic addition and removal of nodes or connections, dynamic allocation of computational tasks, transmission interference leading to spontaneous message loss, or injection of data packages anywhere and at any time. These factors create combinatorially vast possibilities for the realization of task paths in the orchestration process. A self-organized design principle that can handle these dynamic conditions at runtime and route packages accordingly is hence not merely a convenience but a necessity.
A cluster of DATOR devices is not restricted to inference of a single machine learning task but can handle an entire procedure of tasks. Consider a wildlife research scenario with lightweight mobile nodes, e.g., drones or tagged animals [53,54]. Particular nodes, equipped with a camera, periodically capture images. An object detection architecture (Task 1) identifies animals in these images. Upon detection, a second neural network, e.g., a Super-Resolution Generative Adversarial Network (SRGAN), enhances details in the corresponding region of interest (Task 2). The feed-forward architecture of the generator part (convolution layers, upsampling layers, residual blocks) makes it suitable for integration within our sequential step design. If the researcher is, for instance, interested in specific birds, a third classification task (Task 3, possibly comprising several steps) can be employed. The recognized species is then communicated via satellite (Task 4, a single step). Figure 6 illustrates the conceptual process diagram for this multistage distributed on-edge AI classification system using our framework.
The present work provides a proof of concept, successfully showcasing the desired functionality in simulations. It should be viewed as a starting point for future research, with several avenues for extension and refinement. Currently, inference steps are statically assigned and manually adjusted via trigger messages. Simple heuristics are currently being implemented, such as a device automatically assuming a step after repeated rejections by others, or an initiating node delegating tasks. A more advanced approach, currently under development, involves adaptive and self-organized task reassignment based on the resources and hardware constraints available on demand. Furthermore, we are investigating a short-term memory mechanism that prioritizes stable connections and paths over unstable ones.
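The first of these heuristics, a device assuming a step after repeated rejections by others, might look as follows. The threshold value and the bookkeeping structure are our assumptions for illustration, not the framework's implementation:

```python
# Hedged sketch of a self-assignment heuristic: a node volunteers for an
# inference step after observing repeated rejections of requests for that
# step. REJECTION_THRESHOLD is an assumed value, not taken from the paper.

REJECTION_THRESHOLD = 3

class Node:
    def __init__(self, loadable_steps):
        self.loadable_steps = set(loadable_steps)  # steps this hardware could host
        self.active_steps = set()                  # steps currently assigned here
        self.rejections = {}                       # step -> observed rejection count

    def observe_rejection(self, step):
        """Called when a network-wide request for `step` was rejected.
        Returns True if this node self-assigns the orphaned step."""
        self.rejections[step] = self.rejections.get(step, 0) + 1
        if (self.rejections[step] >= REJECTION_THRESHOLD
                and step in self.loadable_steps
                and step not in self.active_steps):
            self.active_steps.add(step)   # assume the step locally
            self.rejections[step] = 0
            return True
        return False
```

Because every node applies the rule independently on locally observed traffic, no central coordinator is needed; race conditions between simultaneously volunteering nodes would still have to be resolved by the orchestration layer.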
Our systematic evaluations have thus far been limited to small- to medium-sized networks (up to approximately ten nodes). Much larger meshes are feasible, and in principle their performance would benefit from the redundancy provided by the distributed computational steps. However, a key concern when scaling up is the management of communication overhead, as network-wide request messages can quickly saturate the channel. Appropriate lightweight strategies to mitigate this effect are therefore currently under development.
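One standard lightweight mitigation, offered here purely as an illustration and not as the strategy under development, is to scope broadcast requests with a time-to-live (TTL) so that a request floods only a k-hop neighborhood instead of the entire mesh:

```python
# Illustrative TTL-scoped flooding (an assumption, not the paper's method):
# a request message carries a hop budget and a unique id; nodes rebroadcast
# it only while the budget lasts and suppress duplicates they have seen.

def forward_request(msg, neighbors, seen):
    """msg: dict with at least "id" and "ttl". Returns the list of
    (neighbor, forwarded_message) pairs; empty if the message is dropped."""
    if msg["id"] in seen or msg["ttl"] <= 0:
        return []                      # duplicate or exhausted hop budget
    seen.add(msg["id"])
    out = dict(msg, ttl=msg["ttl"] - 1)  # decrement TTL before rebroadcast
    return [(n, out) for n in neighbors]
```

Choosing the initial TTL trades reachability against channel load: a small value confines orchestration traffic to nearby candidates, which matches the locality of most task handovers.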
We also aim to further enhance fault tolerance and load balancing. More elaborate acknowledgment mechanisms, well suited to our lightweight messaging pattern, are planned to further improve robustness against disruptions. While the current framework inherently balances load (occupied resources become unavailable for other computations), we envision further refinements. Including weights or minimal hardware information (e.g., battery levels, possibly already provided by the network backend) in orchestration messages will enable prioritized task path decisions and hence more nuanced load management.
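A minimal sketch of such a prioritized path decision, assuming responders report a battery level and hop count in their orchestration replies (the message fields and scoring weights are placeholders, not the framework's format):

```python
# Hypothetical responder selection: among nodes that offered to take over the
# next inference step, prefer high battery charge and short paths. Fields and
# weights are illustrative assumptions.

def choose_responder(responses, w_battery=1.0, w_hops=0.5):
    """responses: list of dicts like {"node": id, "battery": 0..1, "hops": int}.
    Returns the node id with the best score (high battery, few hops win)."""
    best = max(responses,
               key=lambda r: w_battery * r["battery"] - w_hops * r["hops"])
    return best["node"]
```

Because the score is computed locally from information already carried in the replies, this refinement adds no extra message round-trips to the broadcast-response mechanism.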
Most importantly, we are in the process of deploying the system on actual hardware in a real use case. This step will enable a more quantitative empirical evaluation, including measurements of energy consumption, latency, communication overhead, and system behavior under realistic environmental volatility. Such validation will be essential to assess practical performance and to guide possible improvements of the orchestration concept. Ultimately, within an ongoing project, we plan to deploy the system in an existing use case in the field by equipping animal tags designed for AI-driven behavioral analysis with this technology, as part of the GAIA biodiversity protection project located in Etosha National Park in Namibia [53,54].

Author Contributions

Conceptualization, M.S., M.T., T.O. and F.K.; methodology, M.S. and T.O.; software, M.S. and M.T.; validation, M.S.; formal analysis, M.S.; investigation, M.S.; resources, F.K.; writing—original draft preparation, M.S.; writing—review and editing, M.S., M.T., T.O. and F.K.; visualization, M.S.; project administration, M.S., F.K. and T.O. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the German Aerospace Center (DLR) and the German Federal Ministry for Economic Affairs and Climate Action (BMWK) through grant number 50YB2202B.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to the ongoing development of the proprietary framework and its involvement in pending patent applications/intellectual property rights.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Khan, W.Z.; Ahmed, E.; Hakak, S.; Yaqoob, I.; Ahmed, A. Edge computing: A survey. Future Gener. Comput. Syst. 2019, 97, 219–235. [Google Scholar] [CrossRef]
  2. Wang, X.; Han, Y.; Leung, V.C.M.; Niyato, D.; Yan, X.; Chen, X. Convergence of Edge Computing and Deep Learning: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2020, 22, 869–904. [Google Scholar] [CrossRef]
  3. Portilla, J.; Mujica, G.; Lee, J.S.; Riesgo, T. The Extreme Edge at the Bottom of the Internet of Things: A Review. IEEE Sens. J. 2019, 19, 3179–3190. [Google Scholar] [CrossRef]
  4. David, F.M.; Carlyle, J.C.; Campbell, R.H. Context switch overheads for Linux on ARM platforms. In Proceedings of the 2007 Workshop on Experimental Computer Science, San Diego, CA, USA, 13–14 June 2007. [Google Scholar]
  5. Suo, K.; Shi, Y.; Hung, C.C.; Bobbie, P. Quantifying context switch overhead of artificial intelligence workloads on the cloud and edges. In Proceedings of the 36th Annual ACM Symposium on Applied Computing, SAC ’21, Virtual, 22–26 March 2021. [Google Scholar] [CrossRef]
  6. Rodriguez-Conde, I.; Campos, C.; Fdez-Riverola, F. Horizontally distributed inference of deep neural networks for AI-enabled IoT. Sensors 2023, 23, 1911. [Google Scholar] [CrossRef]
  7. Kegeleirs, M.; Grisetti, G.; Birattari, M. Swarm SLAM: Challenges and Perspectives. Front. Robot. AI 2021, 8, 618268. [Google Scholar] [CrossRef]
  8. Majcherczyk, N.; Nallathambi, D.J.; Antonelli, T.; Pinciroli, C. Distributed Data Storage and Fusion for Collective Perception in Resource-Limited Mobile Robot Swarms. IEEE Robot. Autom. Lett. 2021, 6, 5549–5556. [Google Scholar] [CrossRef]
  9. Ramachandran, R.K.; Kakish, Z.; Berman, S. Information Correlated Lévy Walk Exploration and Distributed Mapping Using a Swarm of Robots. IEEE Trans. Robot. 2020, 36, 1422–1441. [Google Scholar] [CrossRef]
  10. Saeedi, S.; Trentini, M.; Seto, M.; Li, H. Multiple-Robot Simultaneous Localization and Mapping: A Review: Multiple-Robot Simultaneous Localization and Mapping. J. Field Robot. 2015, 33, 3–46. [Google Scholar] [CrossRef]
  11. Bezas, K.; Tsoumanis, G.; Angelis, C.T.; Oikonomou, K. Coverage Path Planning and Point-of-Interest Detection Using Autonomous Drone Swarms. Sensors 2022, 22, 7551. [Google Scholar] [CrossRef]
  12. McGuire, K.N.; De Wagter, C.; Tuyls, K.; Kappen, H.J.; de Croon, G.C.H.E. Minimal navigation solution for a swarm of tiny flying robots to explore an unknown environment. Sci. Robot. 2019, 4, eaaw9710. [Google Scholar] [CrossRef]
  13. Soria, E.; Schiano, F.; Floreano, D. Distributed Predictive Drone Swarms in Cluttered Environments. IEEE Robot. Autom. Lett. 2022, 7, 73–80. [Google Scholar] [CrossRef]
  14. Schilling, F.; Schiano, F.; Floreano, D. Vision-Based Drone Flocking in Outdoor Environments. IEEE Robot. Autom. Lett. 2021, 6, 2954–2961. [Google Scholar] [CrossRef]
  15. Schilling, F.; Soria, E.; Floreano, D. On the Scalability of Vision-Based Drone Swarms in the Presence of Occlusions. IEEE Access 2022, 10, 28133–28146. [Google Scholar] [CrossRef]
  16. Gielis, J.; Shankar, A.; Prorok, A. A Critical Review of Communications in Multi-robot Systems. Curr. Robot. Rep. 2022, 3, 213–225. [Google Scholar] [CrossRef]
  17. Xu, J.; Gu, B.; Tian, G. Review of agricultural IoT technology. Artif. Intell. Agric. 2022, 6, 10–22. [Google Scholar] [CrossRef]
  18. Kalyani, Y.; Collier, R. A systematic survey on the role of cloud, fog, and edge computing combination in smart agriculture. Sensors 2021, 21, 5922. [Google Scholar] [CrossRef]
  19. Julian Hagert, B.S. Adaptive Mesh-Netzwerke zur Steigerung der Konnektivität von Landmaschinen. In 44. GIL—Jahrestagung, Biodiversität Fördern Durch Digitale Landwirtschaft; Gesellschaft für Informatik e.V.: Bonn, Germany, 2024; pp. 281–286. [Google Scholar] [CrossRef]
  20. Raffelsberger, C.; Hellwagner, H. Evaluation of MANET routing protocols in a realistic emergency response scenario. In Proceedings of the 10th International Workshop on Intelligent Solutions in Embedded Systems, Klagenfurt, Austria, 5–6 July 2012; pp. 88–92. [Google Scholar]
  21. Anjum, S.S.; Noor, R.M.; Anisi, M.H. Review on MANET based communication for search and rescue operations. Wirel. Pers. Commun. 2017, 94, 31–52. [Google Scholar] [CrossRef]
  22. Panda, K.G.; Das, S.; Sen, D.; Arif, W. Design and deployment of UAV-aided post-disaster emergency network. IEEE Access 2019, 7, 102985–102999. [Google Scholar] [CrossRef]
  23. Peer, M.; Bohara, V.A.; Srivastava, A. Multi-UAV Placement Strategy for Disaster-Resilient Communication Network. In Proceedings of the 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), Virtual, 18 November–16 December 2020; pp. 1–7. [Google Scholar] [CrossRef]
  24. Debnath, S.; Arif, W.; Roy, S.; Baishya, S.; Sen, D. A Comprehensive Survey of Emergency Communication Network and Management. Wirel. Pers. Commun. 2022, 124, 1375–1421. [Google Scholar] [CrossRef]
  25. Waheed, M.; Ahmad, R.; Ahmed, W.; Mahtab Alam, M.; Magarini, M. On coverage of critical nodes in UAV-assisted emergency networks. Sensors 2023, 23, 1586. [Google Scholar] [CrossRef]
  26. Mao, W.; Zhao, Z.; Chang, Z.; Min, G.; Gao, W. Energy-efficient industrial Internet of Things: Overview and open issues. IEEE Trans. Ind. Inform. 2021, 17, 7225–7237. [Google Scholar] [CrossRef]
  27. Syed, A.S.; Sierra-Sosa, D.; Kumar, A.; Elmaghraby, A. IoT in smart cities: A survey of technologies, practices and challenges. Smart Cities 2021, 4, 429–475. [Google Scholar] [CrossRef]
  28. Zaman, S.K.U.; Jehangiri, A.I.; Maqsood, T.; Ahmad, Z.; Umar, A.I.; Shuja, J.; Alanazi, E.; Alasmary, W. Mobility-aware computational offloading in mobile edge networks: A survey. Clust. Comput. 2021, 24, 2735–2756. [Google Scholar] [CrossRef]
  29. Abkenar, F.S.; Ramezani, P.; Iranmanesh, S.; Murali, S.; Chulerttiyawong, D.; Wan, X.; Jamalipour, A.; Raad, R. A survey on mobility of edge computing networks in IoT: State-of-the-art, architectures, and challenges. IEEE Commun. Surv. Tutor. 2022, 24, 2329–2365. [Google Scholar] [CrossRef]
  30. Cao, Y.; Maghsudi, S.; Ohtsuki, T.; Quek, T.Q.S. Mobility-Aware Routing and Caching in Small Cell Networks Using Federated Learning. IEEE Trans. Commun. 2024, 72, 815–829. [Google Scholar] [CrossRef]
  31. Wen, Z.; Yang, R.; Garraghan, P.; Lin, T.; Xu, J.; Rovatsos, M. Fog orchestration for internet of things services. IEEE Internet Comput. 2017, 21, 16–24. [Google Scholar] [CrossRef]
  32. De Brito, M.S.; Hoque, S.; Magedanz, T.; Steinke, R.; Willner, A.; Nehls, D.; Keils, O.; Schreiner, F. A service orchestration architecture for fog-enabled infrastructures. In Proceedings of the 2017 Second International Conference on Fog and Mobile Edge Computing (FMEC), Valencia, Spain, 8–11 May 2017; pp. 127–132. [Google Scholar]
  33. Costa, B.; Bachiega, J., Jr.; De Carvalho, L.R.; Araujo, A.P. Orchestration in fog computing: A comprehensive survey. ACM Comput. Surv. (CSUR) 2022, 55, 1–34. [Google Scholar] [CrossRef]
  34. Kashani, M.H.; Mahdipour, E. Load Balancing Algorithms in Fog Computing. IEEE Trans. Serv. Comput. 2023, 16, 1505–1521. [Google Scholar] [CrossRef]
  35. Vaquero, L.M.; Cuadrado, F.; Elkhatib, Y.; Bernal-Bernabe, J.; Srirama, S.N.; Zhani, M.F. Research challenges in nextgen service orchestration. Future Gener. Comput. Syst. 2019, 90, 20–38. [Google Scholar] [CrossRef]
  36. Mann, Z.A. Decentralized application placement in fog computing. IEEE Trans. Parallel Distrib. Syst. 2022, 33, 3262–3273. [Google Scholar] [CrossRef]
  37. Toczé, K.; Nadjm-Tehrani, S. ORCH: Distributed Orchestration Framework using Mobile Edge Devices. In Proceedings of the 2019 IEEE 3rd International Conference on Fog and Edge Computing (ICFEC), Larnaca, Cyprus, 14–17 May 2019; pp. 1–10. [Google Scholar] [CrossRef]
  38. Jalali, F.; Lynar, T.; Smith, O.J.; Kolluri, R.R.; Hardgrove, C.V.; Waywood, N.; Suits, F. Dynamic edge fabric environment: Seamless and automatic switching among resources at the edge of iot network and cloud. In Proceedings of the 2019 IEEE International Conference on Edge Computing (EDGE), San Diego, CA, USA, 25–30 June 2019; pp. 77–86. [Google Scholar]
  39. Corson, M.S.; Macker, J.P. Mobile Ad hoc Networking (MANET): Routing Protocol Performance Issues and Evaluation Considerations. RFC 1999, 2501, 1–12. [Google Scholar]
  40. Sharmila, S.; Shanthi, T.K. A survey on wireless ad hoc network: Issues and implementation. In Proceedings of the 2016 International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS), Pudukkottai, India, 24–26 February 2016; pp. 1–6. [Google Scholar]
  41. Meshmerize. User Guide. 2024. Available online: https://docs.meshmerize.net (accessed on 25 November 2025).
  42. Seither, D.; König, A.; Hollick, M. Routing performance of Wireless Mesh Networks: A practical evaluation of BATMAN advanced. In Proceedings of the 2011 IEEE 36th Conference on Local Computer Networks, Bonn, Germany, 4–7 October 2011; pp. 897–904. [Google Scholar] [CrossRef]
  43. Ohlenforst, T.; Schreiber, M.; Kreyß, F.; Schrauth, M. Enabling Distributed Inference of Large Neural Networks on Resource Constrained Edge Devices using Ad Hoc Networks. In Proceedings of the International Symposium on Distributed Computing and Artificial Intelligence, Guimarães, Portugal, 12–14 July 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 145–154. [Google Scholar]
  44. Hou, X.; Guan, Y.; Han, T.; Zhang, N. Distredge: Speeding up convolutional neural network inference on distributed edge devices. In Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, 30 May–3 June 2022; pp. 1097–1107. [Google Scholar]
  45. Parthasarathy, A.; Krishnamachari, B. DEFER: Distributed edge inference for deep neural networks. In Proceedings of the 2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS), Bengaluru, India, 3–9 January 2022; pp. 749–753. [Google Scholar]
  46. Ben-Nun, T.; Hoefler, T. Demystifying parallel and distributed deep learning: An in-depth concurrency analysis. ACM Comput. Surv. (CSUR) 2019, 52, 1–43. [Google Scholar] [CrossRef]
  47. Stahl, R.; Hoffman, A.; Mueller-Gritschneder, D.; Gerstlauer, A.; Schlichtmann, U. Deeperthings: Fully distributed cnn inference on resource-constrained edge devices. Int. J. Parallel Program. 2021, 49, 600–624. [Google Scholar] [CrossRef]
  48. Hu, C.; Li, B. Distributed inference with deep learning models across heterogeneous edge devices. In Proceedings of the IEEE INFOCOM 2022-IEEE Conference on Computer Communications, Virtual, 2–5 May 2022; pp. 330–339. [Google Scholar]
  49. Skaltsis, G.M.; Shin, H.S.; Tsourdos, A. A review of task allocation methods for UAVs. J. Intell. Robot. Syst. 2023, 109, 76. [Google Scholar] [CrossRef]
  50. Khamis, A.; Hussein, A.; Elmogy, A. Multi-robot task allocation: A review of the state-of-the-art. In Cooperative Robots and Sensor Networks; Springer: Berlin/Heidelberg, Germany, 2015; pp. 31–51. [Google Scholar]
  51. Smith, R.G. The contract net protocol: High-level communication and control in a distributed problem solver. IEEE Trans. Comput. 1980, 29, 1104–1113. [Google Scholar] [CrossRef]
  52. Li, H.; Wang, Z.; Yue, X.; Wang, W.; Tomiyama, H.; Meng, L. An architecture-level analysis on deep learning models for low-impact computations. Artif. Intell. Rev. 2023, 56, 1971–2010. [Google Scholar] [CrossRef]
  53. Rast, W.; Portas, R.; Shatumbu, G.I.; Berger, A.; Cloete, C.; Curk, T.; Götz, T.; Aschenborn, O.; Melzheimer, J. Death detector: Using vultures as sentinels to detect carcasses by combining bio-logging and machine learning. J. Appl. Ecol. 2024, 61, 2936–2945. [Google Scholar] [CrossRef]
  54. Ingaleshwar, S.; Thasharofi, F.; Pava, M.A.; Vaishya, H.; Tabak, Y.; Ernst, J.; Portas, R.; Rast, W.; Melzheimer, J.; Aschenborn, O.; et al. Wildlife Species Classification on the Edge: A Deep Learning Perspective. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence, ICAART 2024, Rome, Italy, 24–26 February 2024; Volume 3, pp. 600–608. [Google Scholar]
Figure 1. (a) Conceptual illustration of splitting a sequential neural network into four groups of layers. Panels (b,c) show examples of distributed data processing from the input (INP) through the four computation steps (e.g., groups of layers in a neural network inference) to result handling (RES), requiring several transmissions (arrows, annotated with the next required stage). In this simplified example, nodes can carry up to two stages.
Figure 2. Illustration of the software architecture that runs independently on each device in a cluster. Distinguished by their type, incoming messages trigger the corresponding specialized handler unit. Configuration messages act directly on step assignments and variables, whereas orchestration messages interact with the work package manager and drive the broadcast-response mechanism through transmission of further messages. A work package typically holds one inference element or an associated intermediate result.
Figure 3. Timeline diagram showcasing the paths of five packages (A, B, C, D, E) through a network of five nodes. Colors represent different actions (sending, receiving, computing, etc.). The cluster operates a six-stage process, starting with data acquisition (purple), followed by four consecutive inference blocks (from orange to red), and concluding with cloud upload (gray). The filled or empty boxes in the corresponding rows on the left indicate which steps are available on each node. Green and blue colors are reserved for data transmission; they do not count as processing stages.
Figure 4. Simulation results for the different scenarios listed in Table 2. Latency is the average time an inference element spends in the system. Throughput is the number of packages processed per unit of time. Shaded regions denote approximate uncertainty ranges. Dashed lines indicate baseline latency values for uncrowded systems.
Figure 5. Further simulation results for the scenarios listed in Table 2. Panels (ae) visualize the amount of idle time for each node, whereas panel (f) shows the overall average, with shaded regions showing the standard error of the average.
Figure 6. Illustrative example of a sequential pipeline with several tasks, involving simple logic branching points: A wildlife researcher is operating an edge cluster which captures images and performs a multi-stage object detection task in order to identify animal species.
Table 1. Comparison of our extreme edge task orchestration framework vs. common fog orchestration approaches.
Feature | Our Approach | Fog Orchestration
Architecture | Decentralized, self-organizing | Centralized, hierarchical, or partially decentralized
Control | Local decision-making, autonomous nodes | Dedicated central or domain-based controller
Task Allocation | Dynamic, event-driven | Often centralized scheduling
Resource Availability | Mobile devices, high volatility | Relatively stable devices, low to moderate mobility
Resource Management | Discover-and-use | Central or domain-based management
Deployment | Pre-deployed, optimized embedded algorithms | Dynamic containers, VMs, or serverless computing
Target Workloads | Parameter-intensive, predominantly sequential tasks | Broad, service-centric applications
Table 2. Scenarios simulated in this work.
Name | Description
stable | Static network topology, no node or link failures. Baseline scenario.
unstable | Node 2 experiences intermittent failures, resulting in 50% unavailability.
loss 5% | 5% packet loss rate for all messages (orchestration and payload).
slow | Message transmission time (t_send) increased by a factor of 10, i.e., t_send ∼ t_c.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
