HGA-DP: Optimal Partitioning of Multimodal DNNs Enabling Real-Time Image Inference for AR-Assisted Communication Maintenance on Cloud-Edge-End Systems
Abstract
1. Introduction
- We design an automated Cloud-Edge-End collaborative inference framework that intelligently partitions complex multimodal DNNs to co-optimize for inference latency and terminal energy consumption.
- We propose a novel Hybrid Genetic Algorithm for DNN Partitioning (HGA-DP), featuring a unique hybrid search strategy and a constraint-aware repair mechanism to efficiently find near-optimal solutions for this NP-hard problem.
- We construct a simulation platform covering cloud, edge, and terminal nodes with heterogeneous computational capabilities (40–600 GFLOPs at the terminal, 10 TFLOPs at the edge, and 100 TFLOPs at the cloud) and realistic network bandwidths (300 Mbps to 1 Gbps). Experiments on this platform show that our framework achieves significant performance gains in both latency and energy, validating its practical effectiveness.
2. Related Work
2.1. Cloud-Edge-End Collaborative Inference for DNNs
2.2. DNN Model Partitioning for Computation Offloading
3. System Model and Problem Formulation
3.1. System Architecture
- Terminal Device: The end-user device (e.g., AR glasses), which incurs no communication latency for local execution but is severely constrained in computational power and operates under a finite energy budget.
- Edge Server: An intermediate computing node geographically close to the terminal, providing moderate computational resources.
- Cloud Server: A centralized data center with virtually unlimited computational resources, but subject to significant communication latency due to its remote location.
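The trade-off among the three tiers can be sketched numerically. The snippet below is an illustrative model only: the compute capabilities and link bandwidths follow the simulation parameters reported later in the paper, while the workload size, input data volume, and terminal power figures are hypothetical.

```python
# Illustrative three-tier trade-off: run a single 60-GFLOP workload with
# 4 MB of input data entirely on each tier. Capabilities/bandwidths follow
# the paper's simulation setup; workload and power figures are hypothetical.
GFLOP = 1e9
WORKLOAD = 60 * GFLOP          # total FLOPs of the model (hypothetical)
DATA_BITS = 4 * 8e6            # 4 MB of raw input data, in bits

CAPABILITY = {"terminal": 300 * GFLOP, "edge": 10e12, "cloud": 100e12}
BW_T2E = 300e6                 # terminal-to-edge link, bits/s
BW_E2C = 1000e6                # edge-to-cloud link, bits/s
P_COMPUTE_TERM = 2.0           # terminal compute power draw, W (hypothetical)
P_TX_TERM = 1.0                # terminal radio power draw, W (hypothetical)

def all_terminal():
    t = WORKLOAD / CAPABILITY["terminal"]
    return t, P_COMPUTE_TERM * t                 # latency (s), terminal energy (J)

def all_edge():
    t_tx = DATA_BITS / BW_T2E
    return t_tx + WORKLOAD / CAPABILITY["edge"], P_TX_TERM * t_tx

def all_cloud():
    t_tx = DATA_BITS / BW_T2E + DATA_BITS / BW_E2C
    # Terminal only spends radio energy on its own uplink hop.
    return t_tx + WORKLOAD / CAPABILITY["cloud"], P_TX_TERM * (DATA_BITS / BW_T2E)

for name, fn in [("terminal", all_terminal), ("edge", all_edge), ("cloud", all_cloud)]:
    latency, energy = fn()
    print(f"all-{name}: latency={latency*1e3:.1f} ms, terminal energy={energy:.3f} J")
```

Under these toy numbers the edge wins on latency while the cloud minimizes terminal energy, which is exactly the tension the partitioning framework exploits.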
3.2. Application Model as a Directed Acyclic Graph (DAG)
3.3. Model Partitioning Scheme
3.4. Performance Models
3.4.1. Inference Latency
1. Computation Latency of a Subgraph
2. Communication Latency between Subgraphs
3. Critical Path Latency Calculation
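These three components combine into an end-to-end estimate via a longest-path (critical-path) computation over the subgraph DAG. A minimal sketch, assuming hypothetical per-subgraph FLOP counts and edge data volumes, with device speeds and bandwidths taken from the simulation setup:

```python
# Critical-path latency of a partitioned subgraph DAG: each subgraph pays
# computation time on its assigned device, plus transfer time on every edge
# whose endpoints sit on different devices. Workload numbers are hypothetical.
from graphlib import TopologicalSorter  # Python 3.9+

SPEED = {"terminal": 300e9, "edge": 10e12, "cloud": 100e12}    # FLOPs/s
BW = {("terminal", "edge"): 300e6, ("edge", "cloud"): 1000e6}  # bits/s

def link_time(src_dev, dst_dev, bits):
    """Transfer time across the tier hierarchy (0 if co-located)."""
    order = ["terminal", "edge", "cloud"]
    i, j = order.index(src_dev), order.index(dst_dev)
    t = 0.0
    for k in range(min(i, j), max(i, j)):  # traverse intermediate hops
        t += bits / BW[(order[k], order[k + 1])]
    return t

def critical_path_latency(flops, edges, placement):
    """flops: {node: FLOPs}; edges: {(u, v): bits}; placement: {node: device}."""
    preds = {n: set() for n in flops}
    for (u, v) in edges:
        preds[v].add(u)
    finish = {}
    for n in TopologicalSorter(preds).static_order():
        ready = max(
            (finish[u] + link_time(placement[u], placement[n], edges[(u, n)])
             for u in preds[n]),
            default=0.0,
        )
        finish[n] = ready + flops[n] / SPEED[placement[n]]
    return max(finish.values())

# Toy 3-subgraph chain: vision encoder -> fusion -> head
flops = {"v": 20e9, "f": 30e9, "h": 10e9}
edges = {("v", "f"): 8e6, ("f", "h"): 4e6}
placement = {"v": "terminal", "f": "edge", "h": "edge"}
print(f"{critical_path_latency(flops, edges, placement)*1e3:.2f} ms")
```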
3.4.2. Terminal Energy Consumption
3.5. Problem Formulation
4. A Hybrid Genetic Algorithm for DNN Partitioning
4.1. Chromosome Design and Fitness Function
4.2. Evolutionary Operators
4.3. Constraint-Aware Repair Mechanism
| Algorithm 1: Constraint-Aware Repair Mechanism (Repair_Chromosome) | |
| Input: Chromosome C, Subgraph DAG G | |
| Output: A valid (repaired) chromosome C | |
| 1: | for each subgraph v in topological order of G do |
| 2: | d_min ← maximum device level assigned to the predecessors of v |
| 3: | if C[v] < d_min then |
| 4: | C[v] ← d_min |
| 5: | end if |
| 6: | end for |
| 7: | return C |
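A minimal Python rendering of this repair, under the assumption that the constraint being enforced is forward-only data flow through the tier hierarchy (a subgraph's device level may not fall below that of any predecessor); all names are illustrative:

```python
# Constraint-aware repair: walk the subgraph DAG in topological order and
# lift any subgraph whose device level falls below that of a predecessor,
# so data never flows "backwards" (e.g. cloud -> terminal).
# Device levels: 0 = terminal, 1 = edge, 2 = cloud. Names are illustrative.
from graphlib import TopologicalSorter

def repair_chromosome(chromosome, preds):
    """chromosome: {subgraph: level}; preds: {subgraph: set of predecessors}."""
    repaired = dict(chromosome)
    for v in TopologicalSorter(preds).static_order():
        floor = max((repaired[u] for u in preds.get(v, ())), default=0)
        if repaired[v] < floor:          # precedence constraint violated
            repaired[v] = floor          # minimal fix: lift to the floor
    return repaired

preds = {"a": set(), "b": {"a"}, "c": {"b"}}
broken = {"a": 1, "b": 0, "c": 2}        # b sits below its predecessor's tier
print(repair_chromosome(broken, preds))  # b is lifted to level 1
```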
4.4. Hybridization with Neighborhood Search
| Algorithm 2: Neighborhood Search (Neighborhood_Search) | |
| Input: Elite chromosome C_e, Subgraph DAG G, Number to inject k | |
| Output: List of top improved neighbors N_best | |
| 1: | neighbors ← ∅ |
| 2: | f_base ← Fitness(C_e, G) |
| 3: | for each gene g in C_e do |
| 4: | for each level offset δ ∈ {−1, +1} do |
| 5: | C′ ← copy of C_e |
| 6: | new_level ← C′[g] + δ |
| 7: | if new_level is a valid device level then |
| 8: | C′[g] ← new_level |
| 9: | C′ ← Repair(C′, G) |
| 10: | f′ ← Fitness(C′, G) |
| 11: | if f′ > f_base then |
| 12: | Add (C′, f′) to neighbors |
| 13: | end if |
| 14: | end if |
| 15: | end for |
| 16: | end for |
| 17: | Sort neighbors in descending order of fitness |
| 18: | N_best ← top k unique solutions from neighbors |
| 19: | return N_best |
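The neighborhood search translates almost line-for-line into Python: perturb each gene by one tier in either direction, repair, and keep only strictly improving neighbors. The fitness and repair functions below are simple stand-ins, not the paper's actual objective:

```python
# Neighborhood search around an elite chromosome: try moving each subgraph
# one tier up or down, repair the result, and return the k best neighbors
# that strictly improve fitness. Fitness/repair here are stand-ins.
LEVELS = (0, 1, 2)  # terminal, edge, cloud

def neighborhood_search(elite, fitness, repair, k):
    base = fitness(elite)
    neighbors = []
    for gene in elite:                      # each subgraph's assignment
        for delta in (-1, +1):              # one tier up / down
            new_level = elite[gene] + delta
            if new_level in LEVELS:         # valid device level
                cand = dict(elite)
                cand[gene] = new_level
                cand = repair(cand)
                f = fitness(cand)
                if f > base:                # keep only strict improvements
                    neighbors.append((f, cand))
    neighbors.sort(key=lambda p: p[0], reverse=True)
    unique, seen = [], set()
    for f, cand in neighbors:               # top-k unique solutions
        key = tuple(sorted(cand.items()))
        if key not in seen:
            seen.add(key)
            unique.append(cand)
        if len(unique) == k:
            break
    return unique

# Toy fitness: prefer everything on the edge tier (level 1); identity repair.
fit = lambda c: -sum(abs(l - 1) for l in c.values())
best = neighborhood_search({"a": 0, "b": 2}, fit, lambda c: c, k=2)
print(best)
```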
| Algorithm 3: Main Steps of the Hybrid Genetic Algorithm for DNN Partitioning (HGA-DP) | |
| Input: Subgraph DAG G, population size N_p, max generations G_max, etc. | |
| Output: The best deployment solution C* | |
| 1: | // 1. Initialization Phase |
| 2: | P ← InitializePopulation(G, N_p) |
| 3: | Evaluate fitness for all individuals in P using the objective function |
| 4: | C* ← Fittest individual from P |
| 5: | // 2. Evolutionary Loop |
| 6: | for generation = 1 to G_max do |
| 7: | C_e ← Fittest individual in P |
| 8: | N ← Neighborhood_Search(C_e, G, k) |
| 9: | P_new ← ∅ |
| 10: | Add top n_e individuals from P to P_new (elitism) |
| 11: | Add all individuals from N to P_new |
| 12: | while |P_new| < N_p do |
| 13: | p_1, p_2 ← Tournament_Selection(P) |
| 14: | c ← Crossover(p_1, p_2) |
| 15: | c ← Mutate(c) |
| 16: | c ← Repair(c, G) |
| 17: | Add c to P_new |
| 18: | end while |
| 19: | P ← P_new |
| 20: | Evaluate fitness for all new individuals in P |
| 21: | // Update best-so-far solution C* |
| 22: | end for |
| 23: | return C* |
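The main evolutionary loop can be condensed into a schematic sketch with simplified stand-in operators (random initialization, uniform crossover, per-gene mutation). It omits the neighborhood-search injection, repair, and stagnation handling, and is not the authors' implementation:

```python
# Schematic GA main loop in the style of HGA-DP: elitism seeds each
# generation, then tournament selection, crossover, and mutation fill the
# rest of the population. All operators are simplified stand-ins.
import random

def hga_dp(genes, levels, fitness, pop_size=20, generations=30,
           elitism=2, tournament=3, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    def random_ind():
        return {g: rng.choice(levels) for g in genes}
    def select(pop):
        return max(rng.sample(pop, tournament), key=fitness)
    def crossover(p1, p2):  # uniform crossover over gene positions
        return {g: (p1 if rng.random() < 0.5 else p2)[g] for g in genes}
    def mutate(ind):        # per-gene reset with probability mutation_rate
        return {g: (rng.choice(levels) if rng.random() < mutation_rate else l)
                for g, l in ind.items()}

    pop = [random_ind() for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = sorted(pop, key=fitness, reverse=True)[:elitism]   # elitism
        while len(nxt) < pop_size:
            child = mutate(crossover(select(pop), select(pop)))
            nxt.append(child)     # a full HGA-DP run would Repair(child) here
        pop = nxt
        best = max(pop + [best], key=fitness)  # track best-so-far
    return best

# Toy objective: place early subgraphs low in the hierarchy, late ones high.
genes = list("abcd")
target = {"a": 0, "b": 1, "c": 2, "d": 2}
fit = lambda ind: -sum(abs(ind[g] - target[g]) for g in genes)
print(hga_dp(genes, (0, 1, 2), fit))
```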
5. Results
5.1. Experimental Setup
1. Benchmark Model and Dataset
2. Simulated Environment and Parameters
3. Baseline Methods
- All-Terminal: The entire LXMERT model is executed on the terminal device. This strategy eliminates all communication latency and would minimize end-to-end latency if the device were sufficiently powerful, but it incurs the maximum terminal energy consumption.
- All-Edge: The entire model is offloaded and executed on the edge server. This strategy serves as an intermediate option, offloading computation from the terminal while potentially offering lower network latency than the cloud. Its effectiveness, however, is contingent on the edge server’s processing power and the quality of the local network connection.
- All-Cloud: The model is executed entirely on the cloud server, with only raw data being transmitted from the terminal. This approach minimizes terminal energy consumption but typically results in high end-to-end latency due to network communication overhead.
- DADS (Dynamic Adaptive DNN Surgery): We also benchmark against DADS, a prominent heuristic that models DNN partitioning as a min-cut problem with the sole objective of minimizing end-to-end latency. Because this single-objective formulation ignores terminal energy consumption entirely, DADS serves as a strong latency-centric baseline. We adapt its original two-tier design to our three-tier system using a hierarchical approach.
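The gap the monolithic baselines leave open can be illustrated on a toy two-subgraph chain, where a split placement beats every all-on-one-tier strategy on latency. The workload and data sizes below are hypothetical; speeds and bandwidths follow the simulation parameters.

```python
# Compare monolithic placements against one split placement on a toy
# two-subgraph chain (front-end + back-end). Workload/data sizes are
# hypothetical; speeds and bandwidths follow the simulation setup.
SPEED = {"terminal": 300e9, "edge": 10e12, "cloud": 100e12}   # FLOPs/s
BW_T2E, BW_E2C = 300e6, 1000e6                                 # bits/s
FLOPS = [20e9, 40e9]        # front-end, back-end subgraphs
RAW_BITS = 24e6             # raw input (3 MB)
FEAT_BITS = 2e6             # compact features after the front-end

def uplink(bits, dev):      # transfer time from terminal up to `dev`
    hops = {"terminal": 0.0, "edge": bits / BW_T2E,
            "cloud": bits / BW_T2E + bits / BW_E2C}
    return hops[dev]

def latency(d0, d1):        # device assignment for each subgraph, chain model
    t = uplink(RAW_BITS, d0) + FLOPS[0] / SPEED[d0]
    if d1 != d0:            # inter-subgraph transfer only if devices differ
        t += uplink(FEAT_BITS, d1) - uplink(FEAT_BITS, d0)
    return t + FLOPS[1] / SPEED[d1]

plans = {"All-Terminal": ("terminal", "terminal"),
         "All-Edge": ("edge", "edge"),
         "All-Cloud": ("cloud", "cloud"),
         "Split (terminal|edge)": ("terminal", "edge")}
for name, (d0, d1) in plans.items():
    print(f"{name:22s} {latency(d0, d1)*1e3:7.2f} ms")
```

Because the front-end shrinks the raw input into compact features before offloading, the split avoids the expensive raw-data uplink that penalizes All-Edge and All-Cloud.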
5.2. Results Comparison
1. Total Inference Latency
2. Terminal Energy Consumption
3. Overall Fitness and Global Optimality
4. Impact of Terminal Capability
5. Algorithm Efficiency
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Siriwardhana, Y.; Porambage, P.; Liyanage, M.; Ylianttila, M. A Survey on Mobile Augmented Reality with 5G Mobile Edge Computing: Architectures, Applications, and Technical Aspects. IEEE Commun. Surv. Tutor. 2021, 23, 1160–1192. [Google Scholar] [CrossRef]
- Guo, W.; Wang, J.; Wang, S. Deep Multimodal Representation Learning: A Survey. IEEE Access 2019, 7, 63373–63394. [Google Scholar] [CrossRef]
- Oh, S.; Kim, M.; Kim, D.; Jeong, M.; Lee, M. Investigation on performance and energy efficiency of CNN-based object detection on embedded device. In Proceedings of the 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT), Bali, Indonesia, 8–10 August 2017; pp. 1–4. [Google Scholar]
- Fernando, N.; Loke, S.W.; Rahayu, W. Mobile cloud computing: A survey. Future Gener. Comput. Syst. 2013, 29, 84–106. [Google Scholar] [CrossRef]
- Wang, Y.; Yang, C.; Lan, S.; Zhu, L.; Zhang, Y. End-Edge-Cloud Collaborative Computing for Deep Learning: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2024, 26, 2647–2683. [Google Scholar] [CrossRef]
- Kar, B.; Yahya, W.; Lin, Y.-D.; Ali, A. A Survey on Offloading in Federated Cloud-Edge-Fog Systems with Traditional Optimization and Machine Learning. arXiv 2022, arXiv:2202.10628. [Google Scholar] [CrossRef]
- Zeng, C.; Wang, X.; Zeng, R.; Li, Y.; Shi, J.; Huang, M. Joint optimization of multi-dimensional resource allocation and task offloading for QoE enhancement in Cloud-Edge-End collaboration. Future Gener. Comput. Syst. 2024, 155, 121–131. [Google Scholar] [CrossRef]
- Huo, D.; Zhou, Y.; Hao, Y.; Hu, L.; Mo, Y.; Chen, M.; Humar, I. Multi-modal model partition strategy for end-edge collaborative inference. J. Parallel Distrib. Comput. 2026, 208, 105189. [Google Scholar] [CrossRef]
- Li, Q.; Zhou, M.-T.; Ren, T.-F.; Jiang, C.-B.; Chen, Y. Partitioning multi-layer edge network for neural network collaborative computing. EURASIP J. Wirel. Commun. Netw. 2023, 2023, 80. [Google Scholar] [CrossRef]
- Zhang, S.-F.; Zhai, J.-H.; Xie, B.-J.; Zhan, Y.; Wang, X. Multimodal Representation Learning: Advances, Trends and Challenges. In Proceedings of the 2019 International Conference on Machine Learning and Cybernetics (ICMLC), Kobe, Japan, 7–10 July 2019; pp. 1–6. [Google Scholar] [CrossRef]
- Liu, G.; Dai, F.; Xu, X.; Fu, X.; Dou, W.; Kumar, N.; Bilal, M. An adaptive DNN inference acceleration framework with end–edge–cloud collaborative computing. Future Gener. Comput. Syst. 2023, 140, 422–435. [Google Scholar] [CrossRef]
- Saeik, F.; Avgeris, M.; Spatharakis, D.; Santi, N.; Dechouniotis, D.; Violos, J.; Leivadeas, A.; Athanasopoulos, N.; Mitton, N.; Papavassiliou, S. Task offloading in Edge and Cloud Computing: A survey on mathematical, artificial intelligence and control theory solutions. Comput. Netw. 2021, 195, 108177. [Google Scholar] [CrossRef]
- Hua, H.; Li, Y.; Wang, T.; Dong, N.; Li, W.; Cao, J. Edge Computing with Artificial Intelligence: A Machine Learning Perspective. ACM Comput Surv 2023, 55, 1–35. [Google Scholar] [CrossRef]
- Guo, T. Cloud-Based or On-Device: An Empirical Study of Mobile Deep Inference. In Proceedings of the 2018 IEEE International Conference on Cloud Engineering (IC2E), Orlando, FL, USA, 17–20 April 2018; pp. 184–190. [Google Scholar] [CrossRef]
- Teerapittayanon, S.; McDanel, B.; Kung, H.T. Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices. In Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, USA, 5–8 June 2017; pp. 328–339. [Google Scholar] [CrossRef]
- Zhao, Z.; Barijough, K.M.; Gerstlauer, A. DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2018, 37, 2348–2359. [Google Scholar] [CrossRef]
- Hu, S.; Dong, C.; Wen, W. Enable Pipeline Processing of DNN Co-inference Tasks in the Mobile-Edge Cloud. In Proceedings of the 2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS), Chengdu, China, 23–26 April 2021; pp. 186–192. [Google Scholar] [CrossRef]
- Kar, B.; Yahya, W.; Lin, Y.-D.; Ali, A. Offloading Using Traditional Optimization and Machine Learning in Federated Cloud–Edge–Fog Systems: A Survey. IEEE Commun. Surv. Tutor. 2023, 25, 1199–1226. [Google Scholar] [CrossRef]
- Younis, A.; Tran, T.X.; Pompili, D. Energy-Latency-Aware Task Offloading and Approximate Computing at the Mobile Edge. In Proceedings of the 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), Monterey, CA, USA, 4–7 November 2019; pp. 299–307. [Google Scholar] [CrossRef]
- Na, J.; Zhang, H.; Lian, J.; Zhang, B. Partitioning DNNs for Optimizing Distributed Inference Performance on Cooperative Edge Devices: A Genetic Algorithm Approach. Appl. Sci. 2022, 12, 10619. [Google Scholar] [CrossRef]
- Kang, Y.; Hauswald, J.; Gao, C.; Rovinski, A.; Mudge, T.; Mars, J.; Tang, L. Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge. ACM SIGARCH Comput. Archit. News 2017, 45, 615–629. [Google Scholar] [CrossRef]
- Banitalebi-Dehkordi, A.; Vedula, N.; Pei, J.; Xia, F.; Wang, L.; Zhang, Y. Auto-Split: A General Framework of Collaborative Edge-Cloud AI. arXiv 2021, arXiv:2108.13041. [Google Scholar] [CrossRef]
- Zhao, Z.; Wang, K.; Ling, N.; Xing, G. EdgeML: An AutoML Framework for Real-Time Deep Learning on the Edge. In Proceedings of the International Conference on Internet-of-Things Design and Implementation, Nashville, TN, USA, 18–21 May 2021; ACM: Charlottesville, VA, USA, 2021; pp. 133–144. [Google Scholar] [CrossRef]
- Li, H.; Li, X.; Fan, Q.; He, Q.; Wang, X.; Leung, V.C.M. Distributed DNN Inference with Fine-Grained Model Partitioning in Mobile Edge Computing Networks. IEEE Trans. Mob. Comput. 2024, 23, 9060–9074. [Google Scholar] [CrossRef]
- Hu, C.; Bao, W.; Wang, D.; Liu, F. Dynamic Adaptive DNN Surgery for Inference Acceleration on the Edge. In Proceedings of the IEEE INFOCOM 2019-IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019; pp. 1423–1431. [Google Scholar] [CrossRef]
- Zhang, S.; Li, Y.; Liu, X.; Guo, S.; Wang, W.; Wang, J.; Ding, B.; Wu, D. Towards Real-time Cooperative Deep Inference over the Cloud and Edge End Devices. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2020, 4, 1–24. [Google Scholar] [CrossRef]
- Parthasarathy, A.; Krishnamachari, B. Partitioning and Placement of Deep Neural Networks on Distributed Edge Devices to Maximize Inference Throughput. In Proceedings of the 2022 32nd International Telecommunication Networks and Applications Conference (ITNAC), Wellington, New Zealand, 30 November–2 December 2022; pp. 239–246. [Google Scholar] [CrossRef]
- Sada, A.B.; Khelloufi, A.; Naouri, A.; Ning, H.; Dhelim, S. Selective Task offloading for Maximum Inference Accuracy and Energy efficient Real-Time IoT Sensing Systems. arXiv 2024, arXiv:2402.16904. [Google Scholar] [CrossRef]
- Li, Y.; Liu, Z.; Kou, Z.; Wang, Y.; Zhang, G.; Li, Y.; Sun, Y. Real-Time Adaptive Partition and Resource Allocation for Multi-User End-Cloud Inference Collaboration in Mobile Environment. IEEE Trans. Mob. Comput. 2024, 23, 13076–13094. [Google Scholar] [CrossRef]
- Fudala, T.; Tsouvalas, V.; Meratnia, N. Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach. arXiv 2025, arXiv:2502.06355. [Google Scholar] [CrossRef]
- Sardellitti, S.; Scutari, G.; Barbarossa, S. Joint Optimization of Radio and Computational Resources for Multicell Mobile-Edge Computing. IEEE Trans. Signal Inf. Process. Netw. 2015, 1, 89–103. [Google Scholar] [CrossRef]
- Zeng, W.; Zheng, J.; Gao, L.; Niu, J.; Ren, J.; Wang, H.; Cao, R.; Ji, S. Generative AI-Aided Multimodal Parallel Offloading for AIGC Metaverse Service in IoT Networks. IEEE Internet Things J. 2025, 12, 13273–13285. [Google Scholar] [CrossRef]
| Platform | Computational Capability (FLOPs/s) | Inter-Device Bandwidth 1 |
|---|---|---|
| Terminal | 40–600 GFLOPs | 300 Mbps (to Edge) |
| Edge Server | 10 TFLOPs | 1 Gbps (to Cloud) |
| Cloud Server | 100 TFLOPs | N/A |
| Category | Parameter | Value |
|---|---|---|
| Computational Tiers | Computational Capability (Terminal) | 300 GFLOPs |
| | Computational Capability (Edge) | 10 TFLOPs |
| | Computational Capability (Cloud) | 100 TFLOPs |
| Network Links | Bandwidth (Terminal to Edge) | 300 Mbps |
| | Bandwidth (Edge to Cloud) | 1000 Mbps |
| HGA-DP Hyperparameters | Population Size | 100 |
| | Number of Generations | 100 |
| | Elitism Count | 2 |
| | Tournament Size | 5 |
| | Mutation Rate (Dynamic) | 0.2 → 0.01 (Linear Decay) |
| | Stagnation Threshold | 15 Generations |
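The dynamic mutation rate decays linearly from 0.2 at the first generation to 0.01 at the last; as a one-line formula (the schedule shape beyond "linear decay" is as stated in the hyperparameter table):

```python
# Linear decay of the mutation rate from 0.2 (generation 0) to 0.01
# (final generation), matching the hyperparameters listed above.
def mutation_rate(gen, max_gen=100, start=0.2, end=0.01):
    return start + (end - start) * gen / max_gen

print(mutation_rate(0), mutation_rate(50), mutation_rate(100))
```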
| Algorithm Configuration | Total Runtime (s) | Overhead vs. Standard GA |
|---|---|---|
| Standard GA (Baseline) | 0.6953 | - |
| GA + Repair | 0.7157 | +2.9% |
| HGA-DP (GA + Repair + NS) | 0.8649 | +24.4% |
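The overhead column follows directly from the measured runtimes in the first column:

```python
# Recompute the overhead column of the runtime table from its runtimes.
baseline = 0.6953                       # Standard GA runtime (s)
for name, t in [("GA + Repair", 0.7157), ("HGA-DP (GA + Repair + NS)", 0.8649)]:
    print(f"{name}: +{(t - baseline) / baseline * 100:.1f}%")
```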
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ye, C.; Zhang, R.; Li, X.; Deng, W.; Wang, J.; Shao, S. HGA-DP: Optimal Partitioning of Multimodal DNNs Enabling Real-Time Image Inference for AR-Assisted Communication Maintenance on Cloud-Edge-End Systems. Information 2025, 16, 1091. https://doi.org/10.3390/info16121091