NUDIF: A Non-Uniform Deployment Framework for Distributed Inference in Heterogeneous Edge Clusters
Abstract
1. Introduction
- Non-Uniform Deployment Framework: We propose a non-uniform deployment-based inference framework (NUDIF) that allocates sub-model instances across heterogeneous devices so that the processing speeds of adjacent pipeline stages are aligned, thereby improving overall resource utilization.
- Optimization Modeling and Solving: A mixed-integer nonlinear programming (MINLP) model is formulated to maximize system throughput through the joint optimization of sub-model deployment, load balancing, and communication efficiency, providing theoretical guidance for heterogeneous resource adaptation.
- Experimental Validation: Evaluations on real-world edge device clusters demonstrate an average throughput improvement of 9.95% compared to traditional single-pipeline optimization methods, validating the effectiveness of non-uniform deployment strategies in batch inference scenarios.
2. Related Work
3. Non-Uniform Deployment Framework
3.1. Framework Overview
3.2. MINLP-Based Deployment Optimization Modeling
Let the binary variable $x_{i,j}$ indicate whether sub-model $i$ is deployed on device $j$, and let the binary variable $z_{s,j}$ indicate whether device $j$ belongs to stage $s$ (notation as in the table below). The objective is to maximize the system throughput, i.e., the throughput of the slowest stage:

$$\max \ \theta_{\min} \qquad (1)$$

subject to the following constraints:

- Sub-model $i$ can only be assigned to the devices of stage $i$:
$$x_{i,j} \le z_{i,j}, \quad \forall i,\ \forall j \qquad (2)$$
- The total memory requirement of all sub-models on device $j$ cannot exceed its memory capacity:
$$\sum_{i} m_i\, x_{i,j} \le M_j, \quad \forall j \qquad (3)$$
- A device can host at most one sub-model:
$$\sum_{i} x_{i,j} \le 1, \quad \forall j \qquad (4)$$
- A binary auxiliary variable $y_j$ indicates whether device $j$ is assigned a sub-model ($y_j = 1$ if device $j$ is assigned a sub-model, otherwise $y_j = 0$); a device that hosts a sub-model belongs to exactly one stage:
$$y_j = \sum_{i} x_{i,j}, \qquad \sum_{s} z_{s,j} = y_j, \quad \forall j \qquad (5)$$
- Each stage contains at least one device:
$$\sum_{j} z_{s,j} \ge 1, \quad \forall s \qquad (6)$$
- In each stage $s$, at least one device processes sub-model $s$:
$$\sum_{j} x_{s,j} \ge 1, \quad \forall s \qquad (7)$$
- Sub-models from other stages cannot be executed on the devices of stage $s$:
$$x_{i,j} + z_{s,j} \le 1, \quad \forall j,\ \forall i \ne s \qquad (8)$$
- The system throughput $\theta_{\min}$ is not greater than the throughput of any stage:
$$\theta_{\min} \le \theta_s, \quad \forall s \qquad (9)$$
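To make the model concrete, the following sketch (ours, not the paper's implementation; it assumes the per-device inference times $T_{s,j}$ are given and that devices within a stage serve batches in parallel, so their rates add) checks the feasibility of a candidate assignment and evaluates its bottleneck throughput:

```python
# Sketch: feasibility check and throughput evaluation for a candidate
# deployment. Names (assign, mem_req, infer_time, ...) are illustrative.

def evaluate(assign, mem_req, mem_cap, infer_time):
    """assign[j] = stage/sub-model index of device j, or None if unused.
    infer_time[s][j] = total inference time T_{s,j} (computation plus
    communication). Returns the bottleneck throughput, or None if the
    deployment violates a constraint."""
    num_stages = len(mem_req)
    # Memory constraint: the assigned sub-model must fit on its device.
    for j, s in enumerate(assign):
        if s is not None and mem_req[s] > mem_cap[j]:
            return None
    # Coverage constraint: every stage hosts at least one device.
    for s in range(num_stages):
        if not any(a == s for a in assign):
            return None
    # Stage throughput: parallel devices' rates (1 / T_{s,j}) add;
    # system throughput is the minimum stage rate.
    rates = [sum(1.0 / infer_time[s][j]
                 for j, a in enumerate(assign) if a == s)
             for s in range(num_stages)]
    return min(rates)
```

For example, with two stages and three devices, assigning one device to stage 0 and two to stage 1 balances a slow second sub-model, which is exactly the non-uniform replication NUDIF exploits.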
3.3. Optimization Problem Solving
4. Experimental Results
4.1. Experimental Setting
4.2. Experimental Results Analysis
4.3. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhou, Z.; Chen, X.; Li, E.; Zeng, L.; Luo, K.; Zhang, J. Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proc. IEEE 2019, 107, 1738–1762.
- Eshratifar, A.E.; Abrishami, M.S.; Pedram, M. JointDNN: An efficient training and inference engine for intelligent mobile cloud computing services. IEEE Trans. Mob. Comput. 2019, 20, 565–576.
- Hu, Y.; Imes, C.; Zhao, X.; Kundu, S.; Beerel, P.A.; Crago, S.P.; Walters, J.P. PipeEdge: Pipeline parallelism for large-scale model inference on heterogeneous edge devices. In Proceedings of the 2022 25th IEEE Euromicro Conference on Digital System Design (DSD), Maspalomas, Spain, 31 August–2 September 2022; pp. 298–307.
- Duan, S.; Wang, D.; Ren, J.; Lyu, F.; Zhang, Y.; Wu, H.; Shen, X. Distributed artificial intelligence empowered by end-edge-cloud computing: A survey. IEEE Commun. Surv. Tutor. 2022, 25, 591–624.
- Zhao, J.; Wan, B.; Peng, Y.; Lin, H.; Wu, C. LLM-PQ: Serving LLM on heterogeneous clusters with phase-aware partition and adaptive quantization. arXiv 2024, arXiv:2403.01136.
- Zhang, M.; Shen, X.; Cao, J.; Cui, Z.; Jiang, S. EdgeShard: Efficient LLM inference via collaborative edge computing. IEEE Internet Things J. 2024.
- Feng, C.; Han, P.; Zhang, X.; Yang, B.; Liu, Y.; Guo, L. Computation offloading in mobile edge computing networks: A survey. J. Netw. Comput. Appl. 2022, 202, 103366.
- Boukouvala, F.; Misener, R.; Floudas, C.A. Global optimization advances in mixed-integer nonlinear programming, MINLP, and constrained derivative-free optimization, CDFO. Eur. J. Oper. Res. 2016, 252, 701–727.
- Yang, C.Y.; Kuo, J.J.; Sheu, J.P.; Zheng, K.J. Cooperative distributed deep neural network deployment with edge computing. In Proceedings of the ICC 2021-IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6.
- Liu, H.; Zheng, H.; Jiao, M.; Chi, G. SCADS: Simultaneous computing and distribution strategy for task offloading in mobile-edge computing system. In Proceedings of the 2018 IEEE 18th International Conference on Communication Technology (ICCT), Chongqing, China, 8–11 October 2018; pp. 1286–1290.
- Xue, M.; Wu, H.; Li, R.; Xu, M.; Jiao, P. EosDNN: An efficient offloading scheme for DNN inference acceleration in local-edge-cloud collaborative environments. IEEE Trans. Green Commun. Netw. 2021, 6, 248–264.
- Mohammed, T.; Joe-Wong, C.; Babbar, R.; Di Francesco, M. Distributed inference acceleration with adaptive DNN partitioning and offloading. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 854–863.
- Lin, C.Y.; Wang, T.C.; Chen, K.C.; Lee, B.Y.; Kuo, J.J. Distributed deep neural network deployment for smart devices from the edge to the cloud. In Proceedings of the ACM MobiHoc Workshop on Pervasive Systems in the IoT Era, Catania, Italy, 2 July 2019; pp. 43–48.
- Dhuheir, M.; Baccour, E.; Erbad, A.; Sabeeh, S.; Hamdi, M. Efficient real-time image recognition using collaborative swarm of UAVs and convolutional networks. In Proceedings of the 2021 IEEE International Wireless Communications and Mobile Computing (IWCMC), Harbin, China, 28 June–2 July 2021; pp. 1954–1959.
- Jouhari, M.; Al-Ali, A.K.; Baccour, E.; Mohamed, A.; Erbad, A.; Guizani, M.; Hamdi, M. Distributed CNN inference on resource-constrained UAVs for surveillance systems: Design and optimization. IEEE Internet Things J. 2021, 9, 1227–1242.
- Hemmat, M.; Davoodi, A.; Hu, Y.H. EdgenAI: Distributed Inference with Local Edge Devices and Minimal Latency. In Proceedings of the 2022 27th IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), Taipei, Taiwan, 17–20 January 2022; pp. 544–549.
- Zhao, Z.; Barijough, K.M.; Gerstlauer, A. DeepThings: Distributed adaptive deep learning inference on resource-constrained IoT edge clusters. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2018, 37, 2348–2359.
- Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual. 2024. Available online: https://www.gurobi.com (accessed on 6 April 2025).
Notations | Explanations
---|---
$N$ | Number of devices
$S$ | Number of sub-models
$o_i$ | Output size of sub-model $i$
$m_i$ | Memory requirement of sub-model $i$
$M_j$ | Memory capacity of device $j$
$B_{j,j'}$ | Bandwidth between device $j$ and device $j'$
$t^{\mathrm{comp}}_{i,j}$ | Computation time of the $i$-th sub-model on device $j$
$t^{\mathrm{comm}}_{i,j,j'}$ | Communication time for the output of the $i$-th sub-model from device $j$ to device $j'$
$T_{s,j}$ | Total inference time for sub-model $s$ executed by device $j$
$\theta_s$ | Throughput of stage $s$
$\theta_{\min}$ | Throughput of the stage with the minimum throughput
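Reading the notation together, the total inference time and the stage throughput can be related as follows (our interpretation: devices within a stage serve batches in parallel, so their per-device rates add; the paper's exact definition may differ):

```latex
T_{s,j} = t^{\mathrm{comp}}_{s,j} + t^{\mathrm{comm}}_{s,j,j'}, \qquad
\theta_s = \sum_{j \,:\, x_{s,j}=1} \frac{1}{T_{s,j}}, \qquad
\theta_{\min} = \min_{s}\ \theta_s
```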
Description | Data |
---|---|
Memory requirement of each sub-model (GB) | [5, 6, 7, 6] |
Memory capacity of each device (GB) | [8, 10, 9, 9, 8, 9, 8, 9] |
Computation time of each sub-model on each device (s; 32 values covering 4 sub-models × 8 devices) | [2.31, 1.87, 2.45, 2.02, 2.15, 2.05, 1.88, 1.75, 1.94, 2.62, 1.79, 2.35, 2.20, 1.73, 1.72, 1.70, 2.58, 1.68, 2.21, 1.90, 2.00, 1.87, 1.82, 1.75, 1.85, 2.49, 2.07, 1.74, 1.92, 1.98, 1.98, 1.90] |
Output size of each sub-model (MB) | [8, 6, 9, 7] |
Bandwidth between each pair of devices (Mbps; 8 × 8 symmetric matrix flattened row by row, zero diagonal) | [0, 562, 708, 345, 590, 620, 655, 700, 562, 0, 823, 412, 678, 730, 780, 820, 708, 823, 0, 678, 745, 810, 850, 890, 345, 412, 678, 0, 510, 590, 625, 675, 590, 678, 745, 510, 0, 680, 710, 760, 620, 730, 810, 590, 680, 0, 720, 770, 655, 780, 850, 625, 710, 720, 0, 750, 700, 820, 890, 675, 760, 770, 750, 0] |
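The flattened matrices in the table can be reshaped and sanity-checked directly (a sketch; variable names are ours, values transcribed from the table):

```python
# Sanity checks on the experimental data above. The reshape convention
# (row by row) is our assumption, consistent with a symmetric matrix.
mem_req = [5, 6, 7, 6]               # GB, per sub-model
mem_cap = [8, 10, 9, 9, 8, 9, 8, 9]  # GB, per device

bandwidth_flat = [
    0, 562, 708, 345, 590, 620, 655, 700,
    562, 0, 823, 412, 678, 730, 780, 820,
    708, 823, 0, 678, 745, 810, 850, 890,
    345, 412, 678, 0, 510, 590, 625, 675,
    590, 678, 745, 510, 0, 680, 710, 760,
    620, 730, 810, 590, 680, 0, 720, 770,
    655, 780, 850, 625, 710, 720, 0, 750,
    700, 820, 890, 675, 760, 770, 750, 0,
]
B = [bandwidth_flat[8 * j:8 * (j + 1)] for j in range(8)]  # 8x8, Mbps

# Bandwidth should be symmetric with a zero diagonal
# (no self-communication cost).
assert all(B[j][j] == 0 for j in range(8))
assert all(B[j][k] == B[k][j] for j in range(8) for k in range(8))

# Every sub-model fits on every device, so the memory constraint never
# excludes a single-sub-model placement in this setup.
assert max(mem_req) <= min(mem_cap)
```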
Sub-Model 0 | Sub-Model 1 | Sub-Model 2 | Sub-Model 3 | Throughput |
---|---|---|---|---|
Device 1 | Device 2 | Device 3 | Device 0 | 0.5192 |
Number of Devices | Grouping | Sub-Model 0 | Sub-Model 1 | Sub-Model 2 | Sub-Model 3 | Group Throughput | Total Throughput
---|---|---|---|---|---|---|---
12 | Group 1 | Device 10 | Device 9 | Device 1 | Device 11 | 0.5905 | 1.6563
 | Group 2 | Device 8 | Device 2 | Device 7 | Device 3 | 0.5529 |
 | Group 3 | Device 6 | Device 0 | Device 5 | Device 4 | 0.5129 |
13 | Group 1 | Device 11 | Device 9 | Device 10 | Device 12 | 0.8623 | 1.9597
 | Group 2 | Device 7 | Device 8 | Device 1 | Device 3 | 0.5681 |
 | Group 3 | Device 6 | Device 2 | Device 5 | Device 0 | 0.5293 |
14 | Group 1 | Device 12 | Device 9 | Device 10 | Device 11 | 0.8623 | 1.9775
 | Group 2 | Device 13 | Device 6 | Device 1 | Device 3 | 0.5747 |
 | Group 3 | Device 8 | Device 2 | Device 7 | Device 0 | 0.5405 |
Number of Devices | Stage | Used Devices | Stage Throughput | Total Throughput
---|---|---|---|---
12 | Stage 0 | 0, 3, 4, 8 | 1.9162 | 1.7295
 | Stage 1 | 5, 6, 7 | 1.7295 |
 | Stage 2 | 9, 10 | 1.7365 |
 | Stage 3 | 1, 2, 11 | 1.8106 |
13 | Stage 0 | 7, 8, 9 | 2.0256 | 2.0256
 | Stage 1 | 2, 11, 12 | 2.0756 |
 | Stage 2 | 1, 4, 5, 6 | 2.1257 |
 | Stage 3 | 0, 3, 10 | 2.0676 |
14 | Stage 0 | 1, 9, 11 | 2.2893 | 2.2748
 | Stage 1 | 5, 6, 7, 8 | 2.2748 |
 | Stage 2 | 2, 3, 4, 10 | 2.3068 |
 | Stage 3 | 0, 12, 13 | 2.3030 |
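The per-configuration gain of NUDIF over the grouped single-pipeline baseline can be recomputed from the totals in the two tables above (variable names are ours; the 9.95% average reported in the abstract presumably covers additional configurations beyond these three):

```python
# Throughput gains of NUDIF over the grouped baseline, from the
# "Total Throughput" columns of the two tables above.
baseline = {12: 1.6563, 13: 1.9597, 14: 1.9775}  # grouped deployment
nudif = {12: 1.7295, 13: 2.0256, 14: 2.2748}     # NUDIF deployment

gain = {n: nudif[n] / baseline[n] - 1.0 for n in baseline}
# gain[14] is roughly a 15% improvement for 14 devices, where the
# non-uniform allocation differs most from the grouped baseline.
```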
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, P.; Qing, C.; Liu, H. NUDIF: A Non-Uniform Deployment Framework for Distributed Inference in Heterogeneous Edge Clusters. Future Internet 2025, 17, 168. https://doi.org/10.3390/fi17040168