# Horizontally Distributed Inference of Deep Neural Networks for AI-Enabled IoT


## Abstract


## 1. Introduction

## 2. Background: Towards Deep Learning at the Edge

## 3. In Situ Distributed Intelligence

- The centralized architecture [70,71,72,78,80,81,84,86] stands out as the most widely implemented communication pattern for in-cluster co-inference. A MapReduce-like distributed programming model provides synchronized coordination of CNN inference computations across a given number of mobile and embedded devices. Distributed programming models such as MapReduce have been found to ease parallel data processing during DNN execution [76], and their effectiveness has been proven in many ML applications on mobile platforms [77,78], maximizing the usage of the computing resources of the nodes in a cluster [77]. To enable parallel processing, the Map procedure breaks a job down into smaller, more manageable chunks that can be executed in parallel, while the Reduce procedure brings together the data produced in the Map procedure’s intermediate stages. Regarding infrastructure, the model is implemented over a distributed network cluster formed by a central node, commonly referred to as the master node in the studies analyzed, but also as the group owner, gateway device, host device, or group leader, and multiple supplementary worker nodes, also known as slave devices, assisting devices, or follower nodes. A single device takes on the role of the central node and is responsible for coordinating the partitioning [80] and co-inference [72,86], which, according to the frameworks analyzed, may entail a broad spectrum of tasks: analyzing, splitting, and distributing input data [70]; registering participating devices and setting up communications with them [86]; and managing the mechanisms underlying the data structures used for node coordination [73], such as IP address tables [72]. Worker nodes assist the master device by performing some of the required computations. During inference, each IoT device produces its own partial results, which are then aggregated to produce the final output of the system. Each worker node is assigned only some of the partitions, which are processed and then reduced back to the central node, thus generating the input for the next layer considered in the Map procedure.
- The pipelined architecture [77,79,85,91] conceives the workflow as a sequence of n computation stages, corresponding to the n nodes in the hardware infrastructure, and n-1 communication steps for transferring intermediate results between adjacent devices [77]. Both the computation nodes and the execution flow are typically fixed at configuration time, which simplifies task assignment to a mere sequential ordering and narrows the optimization objective to finding the split points that maximize the performance of the deployed CNN. In this regard, whereas in [85], for example, each stage refers to the processing of a group of DNN layers on a subcluster, i.e., a subset of the devices constituting the overall system, in [79], partitioning is reduced to its minimal expression on a setup consisting of two devices, requiring only a decision as to which portion of the DNN model should be executed locally and which should be offloaded to a second paired device. Finally, as far as node roles are concerned, and contrary to centralized architectures, there is generally no such explicit distinction, except that the first stage typically receives the input data and the final stage generates the inference result [85].
- The decentralized architecture [74,89], as its name suggests, diverges even further from the centralized communication pattern. In a centralized framework, as described above, worker nodes commonly need to register only with the central node, and the communication between them is typically bidirectional. Although bidirectional communication remains pertinent in the decentralized pattern, the latter requires communication between every pair of nodes, thereby dramatically increasing the number of point-to-point exchanges. Moreover, as there is no central node coordinating the other devices, each one must register with every other device, making that management far more complicated [74]. As made explicit in [89], each node holds information about how to coordinate itself with the rest of the computing entities in compliance with the intricate DNN layer dependencies and, accordingly, each one is ultimately responsible not only for calculating the corresponding intermediate results, but also for forwarding them to the next intended device.
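The centralized, MapReduce-like workflow described above can be sketched in a few lines. The example below is a minimal single-process simulation, not code from any of the surveyed frameworks: `map_partitions`, `worker_compute`, and `reduce_results` are hypothetical names, and a simple row-sum stands in for the partial DNN computation each worker would perform.

```python
# Minimal sketch of the centralized, MapReduce-like co-inference pattern:
# the master maps input partitions to workers and reduces their partial results.
from concurrent.futures import ThreadPoolExecutor


def map_partitions(feature_map, n_workers):
    """Map step: the master splits the input row-wise, one chunk per worker."""
    rows_per_worker = -(-len(feature_map) // n_workers)  # ceiling division
    return [feature_map[i:i + rows_per_worker]
            for i in range(0, len(feature_map), rows_per_worker)]


def worker_compute(partition):
    """Stand-in for a worker running its slice of a DNN layer (here: row sums)."""
    return [sum(row) for row in partition]


def reduce_results(partials):
    """Reduce step: the master concatenates partial results into the layer output."""
    return [value for partial in partials for value in partial]


def co_inference(feature_map, n_workers=3):
    """One coordination round: map, parallel worker computation, reduce."""
    partitions = map_partitions(feature_map, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_compute, partitions))
    return reduce_results(partials)


if __name__ == "__main__":
    fmap = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(co_inference(fmap))  # each worker sums its rows; master concatenates
```

In a real deployment, the `pool.map` call would be replaced by network transfers to the worker devices, and the reduced output would feed the Map step of the next layer, as described above.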

## 4. DNN Partitioning and Parallelism for Collaborative Inference

#### 4.1. Taxonomy of Parallelism Strategies and Partitioning Schemes

#### 4.2. Decision Making for Partitioning Scheme Generation

#### 4.3. Major Challenges and Specific Strategies Explored

## 5. Discussion

- Regarding experimental design, there is considerable heterogeneity, not only in the evaluation metrics considered or the hardware and software infrastructure used, but also, and most importantly, in the DNN models adopted, the competing schemes, and the specific aspects analyzed when studying the proposed solutions in depth or comparing them with existing options. In this sense, the development of a common evaluation benchmark, at least for the most representative frameworks, would help build a more solid and scientifically rigorous groundwork in this field of study, enabling a unified comparison of these frameworks and the eventual abandonment of the current practice of adopting the local execution of models on a single device as the baseline for comparison [71,72,79,81,83,86,88,91].
- In a significant number of studies, certain factors, most of them closely related to the hardware configuration used in each case, are not adequately evaluated or analyzed. Up to a third of the reviewed papers do not address issues such as accuracy [71,74,75,77,83] (fundamental in user-centric intelligent systems), memory footprint [71,73,74,81,87,89], communication overhead [70,71,73,82,84,90,91] (prevalent as major challenges in the smart IoT research reviewed), or energy consumption [72,75,79,81,86,88,91] (not unique to IoT solutions, but particularly problematic in these contexts). It therefore becomes evident that the study of such factors should be generalized to the extent possible.
- Although execution time is assessed in nearly all of the reviewed studies, with the exception of [77,89], the vast majority of these assessments lack depth and detail. In general, the analysis focuses almost exclusively on end-to-end inference latency, omitting its breakdown across tasks of great importance that have emerged throughout this document and that only very few works examine in detail, such as data transmission [70,71], pre-communication data transformation [91], and decision making for generating or updating the partitioning scheme [90].
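As a concrete illustration of the finer-grained latency accounting argued for above, the sketch below times each stage of one simulated inference round separately instead of reporting a single end-to-end figure. The stage names and toy workloads are hypothetical; they only indicate where real decision-making, transformation, transmission, and computation code would sit.

```python
# Per-stage latency instrumentation for one distributed inference round,
# breaking the end-to-end figure into its contributing tasks.
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def stage(name):
    """Record the wall-clock duration of one pipeline stage under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


def run_round(payload):
    with stage("decision_making"):       # partitioning-scheme generation/update
        parts = [payload[i::2] for i in range(2)]
    with stage("data_transformation"):   # e.g., encoding/compression before sending
        encoded = [bytes(p) for p in parts]
    with stage("transmission"):          # the network transfer would happen here
        received = encoded
    with stage("computation"):           # the partial DNN computations proper
        return sum(sum(chunk) for chunk in received)


if __name__ == "__main__":
    result = run_round(list(range(10)))
    total = max(sum(timings.values()), 1e-9)
    for name, seconds in timings.items():
        print(f"{name}: {100 * seconds / total:.1f}% of measured time")
```

Reporting such a breakdown alongside the aggregate latency would make the comparisons called for in this section considerably more informative.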

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Zheng, L.-R.; Tenhunen, H.; Zou, Z. Smart Electronic Systems: Heterogeneous Integration of Silicon and Printed Electronics; John Wiley & Sons: Hoboken, NJ, USA, 2018. [Google Scholar]
- Nord, J.H.; Koohang, A.; Paliszkiewicz, J. The Internet of Things: Review and theoretical framework. Expert Syst. Appl. **2019**, 133, 97–108. [Google Scholar] [CrossRef]
- Veres, M.; Moussa, M. Deep Learning for Intelligent Transportation Systems: A Survey of Emerging Trends. IEEE Trans. Intell. Transp. Syst. **2020**, 21, 3152–3168. [Google Scholar] [CrossRef]
- Farooq, M.S.; Riaz, S.; Abid, A.; Abid, K.; Naeem, M.A. A Survey on the Role of IoT in Agriculture for the Implementation of Smart Farming. IEEE Access **2019**, 7, 156237–156271. [Google Scholar] [CrossRef]
- Lu, Y.; Xu, X.; Wang, L. Smart manufacturing process and system automation—A critical review of the standards and envisioned scenarios. J. Manuf. Syst. **2020**, 56, 312–325. [Google Scholar] [CrossRef]
- Baker, S.B.; Xiang, W.; Atkinson, I. Internet of Things for Smart Healthcare: Technologies, Challenges, and Opportunities. IEEE Access **2017**, 5, 26521–26544. [Google Scholar] [CrossRef]
- Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. **2020**, 53, 5455–5516. [Google Scholar] [CrossRef]
- Abbas, N.; Zhang, Y.; Taherkordi, A.; Skeie, T. Mobile Edge Computing: A Survey. IEEE Internet Things J. **2018**, 5, 450–465. [Google Scholar] [CrossRef]
- Rodriguez-Conde, I.; Campos, C.; Fdez-Riverola, F. Cloud-Assisted Collaborative Inference of Convolutional Neural Networks for Vision Tasks on Resource-Constrained Devices. Neurocomputing, 2022; submitted for publication. [Google Scholar]
- Chen, M.; Hao, Y.; Li, Y.; Lai, C.-F.; Wu, D. On the computation offloading at ad hoc cloudlet: Architecture and service modes. IEEE Commun. Mag. **2015**, 53, 18–24. [Google Scholar] [CrossRef]
- Zhou, Z.; Chen, X.; Li, E.; Zeng, L.; Luo, K.; Zhang, J. Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing. Proc. IEEE **2019**, 107, 1738–1762. [Google Scholar] [CrossRef]
- Chen, J.; Ran, X. Deep Learning with Edge Computing: A Review. Proc. IEEE **2019**, 107, 1655–1674. [Google Scholar] [CrossRef]
- Mejías, B.; Roy, P.V. From Mini-clouds to Cloud Computing. In Proceedings of the 2010 Fourth IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshop, Budapest, Hungary, 27–28 September 2010. [Google Scholar]
- Elkhatib, Y.; Porter, B.; Ribeiro, H.B.; Zhani, M.F.; Qadir, J.; Riviere, E. On Using Micro-Clouds to Deliver the Fog. IEEE Internet Comput. **2017**, 21, 8–15. [Google Scholar] [CrossRef]
- Bonomi, F.; Milito, R.; Zhu, J.; Addepalli, S. Fog Computing and its Role in the Internet of Things. In Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, Helsinki, Finland, 17 August 2012. [Google Scholar]
- Yao, J.; Zhang, S.; Yao, Y.; Wang, F.; Ma, J.; Zhang, J.; Chu, Y.; Ji, L.; Jia, K.; Shen, T.; et al. Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI. IEEE Trans. Knowl. Data Eng. **2022**, 1. [Google Scholar] [CrossRef]
- Filho, C.P.; Marques, E.; Chang, V.; dos Santos, L.; Bernardini, F.; Pires, P.F.; Ochi, L.; Delicato, F.C. A Systematic Literature Review on Distributed Machine Learning in Edge Computing. Sensors **2022**, 22, 2665. [Google Scholar] [CrossRef]
- Wang, X.; Han, Y.; Leung, V.C.M.; Niyato, D.; Yan, X.; Chen, X. Convergence of Edge Computing and Deep Learning: A Comprehensive Survey. IEEE Commun. Surv. Tutor. **2020**, 22, 869–904. [Google Scholar] [CrossRef]
- Matsubara, Y.; Levorato, M.; Restuccia, F. Split computing and early exiting for deep learning applications: Survey and research challenges. ACM Comput. Surv. **2022**, 55, 1–30. [Google Scholar] [CrossRef]
- Rausch, T.; Dustdar, S. Edge Intelligence: The Convergence of Humans, Things, and AI. In Proceedings of the 2019 IEEE International Conference on Cloud Engineering (IC2E), Milan, Italy, 24–27 June 2019. [Google Scholar]
- Murshed, M.G.S.; Murphy, C.; Hou, D.; Khan, N.; Ananthanarayanan, G.; Hussain, F. Machine Learning at the Network Edge: A Survey. ACM Comput. Surv. **2021**, 54, 1–37. [Google Scholar] [CrossRef]
- Deng, S.; Zhao, H.; Fang, W.; Yin, J.; Dustdar, S.; Zomaya, A.Y. Edge Intelligence: The Confluence of Edge Computing and Artificial Intelligence. IEEE Internet Things J. **2020**, 7, 7457–7469. [Google Scholar] [CrossRef]
- Xu, D.; Li, T.; Li, Y.; Su, X.; Tarkoma, S.; Jiang, T. Edge intelligence: Architectures, challenges, and applications. arXiv **2020**, arXiv:2003.12172. [Google Scholar]
- Verbraeken, J.; Wolting, M.; Katzy, J.; Kloppenburg, J.; Verbelen, T.; Rellermeyer, J.S. A Survey on Distributed Machine Learning. ACM Comput. Surv. **2020**, 53, 1–33. [Google Scholar] [CrossRef]
- Wang, J.; Pan, J.; Esposito, F.; Calyam, P.; Yang, Z.; Mohapatra, P. Edge cloud offloading algorithms: Issues, methods, and perspectives. ACM Comput. Surv. **2019**, 52, 1–23. [Google Scholar] [CrossRef]
- Shi, Y.; Yang, K.; Jiang, T.; Zhang, J.; Letaief, K.B. Communication-Efficient Edge AI: Algorithms and Systems. IEEE Commun. Surv. Tutor. **2020**, 22, 2167–2191. [Google Scholar] [CrossRef]
- Lin, L.; Liao, X.; Jin, H.; Li, P. Computation Offloading Toward Edge Computing. Proc. IEEE **2019**, 107, 1584–1607. [Google Scholar] [CrossRef]
- Zou, Z.; Jin, Y.; Nevalainen, P.; Huan, Y.; Heikkonen, J.; Westerlund, T. Edge and Fog Computing Enabled AI for IoT-An Overview. In Proceedings of the 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Hsinchu, Taiwan, 18–20 March 2019. [Google Scholar]
- Rosendo, D.; Costan, A.; Valduriez, P.; Antoniu, G. Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review. J. Parallel Distrib. Comput. **2022**, 166, 71–94. [Google Scholar] [CrossRef]
- Lane, N.D.; Bhattacharya, S.; Georgiev, P.; Forlivesi, C.; Jiao, L.; Qendro, L.; Kawsar, F. DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices. In Proceedings of the 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), Vienna, Austria, 11–14 April 2016. [Google Scholar]
- Li, H.; Ng, J.K.; Abdelzaher, T. Enabling Real-time AI Inference on Mobile Devices via GPU-CPU Collaborative Execution. In Proceedings of the 2022 IEEE 28th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Taipei, Taiwan, 23–25 August 2022. [Google Scholar]
- Dagli, I.; Cieslewicz, A.; McClurg, J.; Belviranli, M.E. AxoNN: Energy-aware execution of neural network inference on multi-accelerator heterogeneous SoCs. In Proceedings of the 59th ACM/IEEE Design Automation Conference, San Francisco, CA, USA, 5–9 December 2021; Association for Computing Machinery: New York, NY, USA, 2021. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv **2014**, arXiv:1409.1556. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM **2017**, 60, 84–90. [Google Scholar] [CrossRef]
- Liu, S.; Deng, W. Very deep convolutional neural network based image classification using small training sample size. In Proceedings of the 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, 3–6 November 2015. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Chen, T.; Du, Z.; Sun, N.; Wang, J.; Wu, C.; Chen, Y.; Temam, O. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, UT, USA, 1–5 March 2014. [Google Scholar]
- Chen, Y.-H.; Yang, T.-J.; Emer, J.S.; Sze, V. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. IEEE J. Emerg. Sel. Top. Circuits Syst. **2019**, 9, 292–308. [Google Scholar] [CrossRef]
- Yin, X.; Chen, L.; Zhang, X.; Gao, Z. Object Detection Implementation and Optimization on Embedded GPU System. In Proceedings of the 2018 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Valencia, Spain, 6–8 June 2018. [Google Scholar]
- Andargie, F.A.; Rose, J.; Austin, T.; Bertacco, V. Energy efficient object detection on the mobile GP-GPU. IEEE Africon **2017**, 945–950. [Google Scholar] [CrossRef]
- Zhang, C.; Li, P.; Sun, G.; Guan, Y.; Xiao, B.; Cong, J. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2015; Association for Computing Machinery: New York, NY, USA, 2015. [Google Scholar]
- Guo, K.; Zeng, S.; Yu, J.; Wang, Y.; Yang, H. [DL] A Survey of FPGA-based Neural Network Inference Accelerators. ACM Trans. Reconfigurable Technol. Syst. **2019**, 12, 1–26. [Google Scholar] [CrossRef]
- Cheng, J.; Wang, P.-S.; Li, G.; Hu, Q.-H.; Lu, H.-Q. Recent advances in efficient computation of deep convolutional neural networks. Front. Inf. Technol. Electron. Eng. **2018**, 19, 64–77. [Google Scholar] [CrossRef]
- Deng, B.L.; Li, G.; Han, S.; Shi, L.; Xie, Y. Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey. Proc. IEEE **2020**, 108, 485–532. [Google Scholar] [CrossRef]
- Bhattacharya, S.; Lane, N.D. Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables. In Proceedings of the 14th ACM Conference on Embedded Networked Sensor Systems (SenSys), Stanford, CA, USA, 14–16 November 2016; pp. 176–189. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv **2017**, arXiv:1704.04861. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Rodriguez-Conde, I.; Campos, C.; Fdez-Riverola, F. Optimized convolutional neural network architectures for efficient on-device vision-based object detection. Neural Comput. Appl. **2022**, 34, 10469–10501. [Google Scholar] [CrossRef]
- Rodriguez-Conde, I.; Campos, C.; Fdez-Riverola, F. On-Device Object Detection for More Efficient and Privacy-Compliant Visual Perception in Context-Aware Systems. Appl. Sci. **2021**, 11, 9173. [Google Scholar] [CrossRef]
- Norouzi, N.; Bruder, G.; Belna, B.; Mutter, S.; Turgut, D.; Welch, G. A Systematic Review of the Convergence of Augmented Reality, Intelligent Virtual Agents, and the Internet of Things. In Artificial Intelligence in IoT; Al-Turjman, F., Ed.; Springer: Berlin/Heidelberg, Germany, 2019; pp. 1–24. [Google Scholar]
- Lu, H.; Liu, Q.; Tian, D.; Li, Y.; Kim, H.; Serikawa, S. The Cognitive Internet of Vehicles for Autonomous Driving. IEEE Netw. **2019**, 33, 65–73. [Google Scholar] [CrossRef]
- Strom, N. Scalable distributed DNN training using commodity GPU cloud computing. In Proceedings of the Interspeech 2015, Dresden, Germany, 6–10 September 2015; pp. 1488–1492. [Google Scholar] [CrossRef]
- Khan, A.U.R.; Othman, M.; Madani, S.A.; Khan, S.U. A Survey of Mobile Cloud Computing Application Models. IEEE Commun. Surv. Tutor. **2014**, 16, 393–413. [Google Scholar] [CrossRef]
- Premsankar, G.; Francesco, M.D.; Taleb, T. Edge Computing for the Internet of Things: A Case Study. IEEE Internet Things J. **2018**, 5, 1275–1284. [Google Scholar] [CrossRef]
- Meng, N.; Lam, E.Y.; Tsia, K.K.; So, H.K.-H. Large-Scale Multi-Class Image-Based Cell Classification with Deep Learning. IEEE J. Biomed. Health Inform. **2018**, 23, 2091–2098. [Google Scholar] [CrossRef]
- Hauswald, J.; Kang, Y.; Laurenzano, M.A.; Chen, Q.; Li, C.; Mudge, T. DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers. In Proceedings of the 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, USA, 13–17 June 2015. [Google Scholar]
- Jauro, F.; Chiroma, H.; Gital, A.Y.; Almutairi, M.; Abdulhamid, S.M.; Abawajy, J.H. Deep learning architectures in emerging cloud computing architectures: Recent development, challenges and next research trend. Appl. Soft Comput. **2020**, 96, 106582. [Google Scholar] [CrossRef]
- Varghese, B.; Buyya, R. Next generation cloud computing: New trends and research directions. Future Gener. Comput. Syst. **2018**, 79, 849–861. [Google Scholar] [CrossRef]
- Wu, H.; Zhang, Z.; Guan, C.; Wolter, K.; Xu, M. Collaborate Edge and Cloud Computing with Distributed Deep Learning for Smart City Internet of Things. IEEE Internet Things J. **2020**, 7, 8099–8110. [Google Scholar] [CrossRef]
- Qayyum, A.; Ijaz, A.; Usama, M.; Iqbal, W.; Qadir, J.; Elkhatib, Y.; Al-Fuqaha, A. Securing Machine Learning in the Cloud: A Systematic Review of Cloud Machine Learning Security. Front. Big Data **2020**, 3, 587139. [Google Scholar] [CrossRef] [PubMed]
- Huang, D.; Wu, H. Chapter 1—Mobile Cloud Computing Taxonomy. In Mobile Cloud Computing; Huang, D., Wu, H., Eds.; Morgan Kaufmann: Burlington, MA, USA, 2018; pp. 5–29. [Google Scholar]
- Fernando, N.; Loke, S.W.; Rahayu, W. Mobile cloud computing: A survey. Future Gener. Comput. Syst. **2013**, 29, 84–106. [Google Scholar] [CrossRef]
- Satyanarayanan, M.; Bahl, P.; Caceres, R.; Davies, N. The Case for VM-Based Cloudlets in Mobile Computing. IEEE Pervasive Comput. **2009**, 8, 14–23. [Google Scholar] [CrossRef]
- Tong, L.; Li, Y.; Gao, W. A Hierarchical Edge Cloud Architecture for Mobile Computing. In Proceedings of the IEEE INFOCOM 2016—The 35th Annual IEEE International Conference on Computer Communications, San Francisco, CA, USA, 10–14 April 2016. [Google Scholar]
- Kang, Y.; Hauswald, J.; Gao, C.; Rovinski, A.; Mudge, T.; Mars, J.; Tang, L. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. ACM SIGPLAN Not. **2017**, 52, 615–629. [Google Scholar] [CrossRef]
- Jeong, H.-J.; Lee, H.-J.; Shin, C.H.; Moon, S.-M. IONN: Incremental Offloading of Neural Network Computations from Mobile Devices to Edge Servers. In Proceedings of the ACM Symposium on Cloud Computing; Association for Computing Machinery: Carlsbad, CA, USA, 2018; pp. 401–411. [Google Scholar]
- Hu, C.; Bao, W.; Wang, D.; Liu, F. Dynamic Adaptive DNN Surgery for Inference Acceleration on the Edge. In Proceedings of the IEEE INFOCOM 2019—IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019. [Google Scholar]
- Zhang, S.; Li, Y.; Liu, X.; Guo, S.; Wang, W.; Wang, J.; Ding, B.; Wu, D. Towards Real-time Cooperative Deep Inference over the Cloud and Edge End Devices. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. **2020**, 4, 1–24. [Google Scholar] [CrossRef]
- Mao, J.; Chen, X.; Nixon, K.W.; Krieger, C. MoDNN: Local distributed mobile computing system for Deep Neural Network. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Lausanne, Switzerland, 27–31 March 2017. [Google Scholar] [CrossRef]
- Mao, J.; Yang, Z.; Wen, W.; Wu, C.; Song, L.; Nixon, K.W.; Chen, X.; Li, H.; Chen, Y. MeDNN: A distributed mobile system with enhanced partition and deployment for large-scale DNNs. In Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Irvine, CA, USA, 13–16 November 2017. [Google Scholar] [CrossRef]
- Hadidi, R.; Cao, J.; Woodward, M.; Ryoo, M.S.; Kim, H. Musical chair: Efficient real-time recognition using collaborative iot devices. arXiv **2018**, arXiv:1802.02138. [Google Scholar]
- Zhao, Z.; Barijough, K.M.; Gerstlauer, A. DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters. IEEE Trans. Comput. Des. Integr. Circuits Syst. **2018**, 37, 2348–2359. [Google Scholar] [CrossRef]
- Du, J.; Shen, M.; Du, Y. A Distributed In-Situ CNN Inference System for IoT Applications. In Proceedings of the 2020 IEEE 38th International Conference on Computer Design (ICCD), Hartford, CT, USA, 18–21 October 2020. [Google Scholar]
- Hadidi, R.; Asgari, B.; Cao, J.; Bae, Y.; Shim, D.E.; Kim, H. LCP: A low-communication parallelization method for fast neural network inference in image recognition. arXiv **2020**, arXiv:2003.06464. [Google Scholar]
- Hadidi, R.; Cao, J.; Ryoo, M.S.; Kim, H. Toward Collaborative Inferencing of Deep Neural Networks on Internet-of-Things Devices. IEEE Internet Things J. **2020**, 7, 4950–4960. [Google Scholar] [CrossRef]
- Hu, D.; Krishnamachari, B. Fast and Accurate Streaming CNN Inference via Communication Compression on the Edge. In Proceedings of the 2020 IEEE/ACM Fifth International Conference on Internet-of-Things Design and Implementation (IoTDI), Sydney, Australia, 21–24 April 2020. [Google Scholar]
- Miao, W.; Zeng, Z.; Wei, L.; Li, S.; Jiang, C.; Zhang, Z. Adaptive DNN Partition in Edge Computing Environments. In Proceedings of the 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), Hong Kong, China, 2–4 December 2020. [Google Scholar]
- Xu, M.; Qian, F.; Zhu, M.; Huang, F.; Pushp, S.; Liu, X. DeepWear: Adaptive Local Offloading for On-Wearable Deep Learning. IEEE Trans. Mob. Comput. **2019**, 19, 314–330. [Google Scholar] [CrossRef]
- Xue, F.; Fang, W.; Xu, W.; Wang, Q.; Ma, X.; Ding, Y. EdgeLD: Locally Distributed Deep Learning Inference on Edge Device Clusters. In Proceedings of the 2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Yanuca Island, Cuvu, Fiji, 14–16 December 2020. [Google Scholar]
- Zhang, S.Q.; Lin, J.; Zhang, Q. Adaptive distributed convolutional neural network inference at the network edge with ADCNN. In Proceedings of the 49th International Conference on Parallel Processing-ICPP, Edmonton, AB, Canada, 17–20 August 2020. [Google Scholar]
- Dhuheir, M.; Baccour, E.; Erbad, A.; Sabeeh, S.; Hamdi, M. Efficient Real-Time Image Recognition Using Collaborative Swarm of UAVs and Convolutional Networks. In Proceedings of the 2021 International Wireless Communications and Mobile Computing (IWCMC), Harbin, China, 28 June–2 July 2021. [Google Scholar] [CrossRef]
- Du, J.; Zhu, X.; Shen, M.; Du, Y.; Lu, Y.; Xiao, N.; Liao, X. Model Parallelism Optimization for Distributed Inference via Decoupled CNN Structure. IEEE Trans. Parallel Distrib. Syst. **2020**, 32, 1665–1676. [Google Scholar] [CrossRef]
- Naveen, S.; Kounte, M.R.; Ahmed, M.R. Low Latency Deep Learning Inference Model for Distributed Intelligent IoT Edge Clusters. IEEE Access **2021**, 9, 160607–160621. [Google Scholar] [CrossRef]
- Yang, X.; Qi, Q.; Wang, J.; Guo, S.; Liao, J. Towards Efficient Inference: Adaptively Cooperate in Heterogeneous IoT Edge Cluster. In Proceedings of the 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS), Washington, DC, USA, 7–10 July 2021. [Google Scholar] [CrossRef]
- Zeng, L.; Chen, X.; Zhou, Z.; Yang, L.; Zhang, J. CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning Over Heterogeneous Edge Devices. IEEE/ACM Trans. Netw. **2020**, 29, 595–608. [Google Scholar] [CrossRef]
- Miao, W.; Zeng, Z.; Wei, L.; Li, S.; Jiang, C.; Zhang, Z. DeepSlicing: Collaborative and Adaptive CNN Inference with Low Latency. IEEE Trans. Parallel Distrib. Syst. **2021**, 32, 2175–2187. [Google Scholar]
- Goel, A.; Tung, C.; Hu, X.; Thiruvathukal, G.K.; Davis, J.C.; Lu, Y.H. Efficient Computer Vision on Edge Devices with Pipeline-Parallel Hierarchical Neural Networks. In Proceedings of the 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC), Taipei, Taiwan, 17–20 January 2022. [Google Scholar]
- Hu, C.; Li, B. Distributed Inference with Deep Learning Models across Heterogeneous Edge Devices. In Proceedings of the IEEE INFOCOM 2022—IEEE Conference on Computer Communications, Virtual, 2–5 May 2022. [Google Scholar]
- Jouhari, M.; Al-Ali, A.K.; Baccour, E.; Mohamed, A.; Erbad, A.; Guizani, M.; Hamdi, M. Distributed CNN Inference on Resource-Constrained UAVs for Surveillance Systems: Design and Optimization. IEEE Internet Things J. **2022**, 9, 1227–1242. [Google Scholar] [CrossRef]
- Parthasarathy, A.; Krishnamachari, B. DEFER: Distributed Edge Inference for Deep Neural Networks. In Proceedings of the 2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS), Bengaluru, India, 4–8 January 2022. [Google Scholar]
- Reddy, P.K.; Babu, R. An Evolutionary Secure Energy Efficient Routing Protocol in Internet of Things. Int. J. Intell. Eng. Syst. **2017**, 10, 337–346. [Google Scholar] [CrossRef]
- Coates, A.; Huval, B.; Wang, T.; Wu, D.; Catanzaro, B.; Andrew, N. Deep learning with COTS HPC systems. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013. [Google Scholar]
- He, Y.; Liu, X.; Zhong, H.; Ma, Y. AddressNet: Shift-based primitives for efficient convolutional neural networks. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1213–1222. [Google Scholar]
- Xie, X.; Zhou, Y.; Kung, S.Y. Exploring Highly Efficient Compact Neural Networks for Image Classification. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020. [Google Scholar]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856. [Google Scholar]
- Fiergolla, S.; Wolf, P. Improving Run Length Encoding by Preprocessing. In Proceedings of the 2021 Data Compression Conference (DCC), Virtual, 23–26 March 2021. [Google Scholar]
- Gia, T.N.; Qingqing, L.; Queralta, J.P.; Tenhunen, H.; Zou, Z.; Westerlund, T. Lossless Compression Techniques in Edge Computing for Mission-Critical Applications in the IoT. In Proceedings of the 2019 Twelfth International Conference on Mobile Computing and Ubiquitous Network (ICMU), Kathmandu, Nepal, 4–6 November 2019. [Google Scholar]
- Merenda, M.; Porcaro, C.; Iero, D. Edge Machine Learning for AI-Enabled IoT Devices: A Review. Sensors **2020**, 20, 2533. [Google Scholar] [CrossRef]

Work | Year | Domain | Focus | Scope |
---|---|---|---|---|
[12] | 2019 | EC ∩ AI | Methods for fast inference | LTWT DNN, MOD ADPT, INF ACC, DIST INF, INF PRIV, DIST TRAIN, APPS |
[27] | 2019 | EC | Computation offloading | CORE, APP PART, TSK ALLOC, DIST TSK, APPS |
[20] | 2019 | EC ∩ AI | Operational challenges | HW, APPS, MULTI TEN, SCHED, MOB, SCAL, PRV, AI LC |
[11] | 2019 | EC ∩ AI | - | CORE, DIST TRAIN, METR, MOD PART, MOD ADPT, MOD SEL, CACHE |
[28] | 2019 | EC ∩ AI | Specialized hardware | HW |
[22] | 2019 | EC ∩ AI | - | CORE, DIST TRAIN, MOD PART, MOD ADPT, INF ACC |
[25] | 2019 | ECC | Computation offloading | MULTI TEN, WKLD BAL, MOB, PART TYPES |
[26] | 2020 | EC ∩ AI | Communication challenges | COMM CHLG, COMM-EFF TRAIN, COMM-EFF INF |
[24] | 2020 | DIST ML | - | CORE, ML WF, TOPO, DIST FMW, DIST ML TRAIN |
[18] | 2020 | EC ∩ AI | - | CORE, HW, DL FWK, DIST TRAIN, CACHE, MOD ADPT, MOD SEL, MOD PART, OFLD TYPES, APPS |
[23] | 2020 | EC ∩ AI | - | DIST TRAIN, LTWT DNN, MOD ADPT, INF ACC, OFLD TYPES, APPS |
[21] | 2021 | EC ∩ AI | - | DIST TRAIN, LTWT DNN, MOD ADPT, DIST INF, DL FWK, HW, APPS |
[17] | 2022 | EC ∩ AI | Training techniques | DIST TRAIN, MOD PART, MOD ADPT, PRETRAIN, EDG PREPROC, BC, LTWT DNN, APPS |
[19] | 2022 | EC ∩ AI | - | CORE, LTWT DNN, MOD ADPT, MOD PART |
[29] | 2022 | ECC ∩ AI | ML-based analytics | DT ANAL, DIST TRAIN, MOD PART, PARAL, TSTB |
[16] | 2022 | ECC ∩ AI | Architectures for collaborative learning | CORE, DIST TRAIN, INF OFLD, BI COLLAB, LTWT DNN, MOD ADPT, RL, APPS |

| Work | Year | Approach | Objectives | DL Tasks | Applications | Architecture: Communication | Architecture: Tier | Workflow: Offline Setup | Workflow: Runtime |
|---|---|---|---|---|---|---|---|---|---|
| MoDNN [70] | 2017 | FWK | Lower OVR LAT | IMG CLASS | GEN | Centralized | LAN | PART | CO-INF |
| MeDNN [71] | 2017 | FWK | Lower OVR LAT, Lower TX OVHD | IMG CLASS | GEN | Centralized | LAN | PART, TSK ASSG | CO-INF |
| Musical Chair [72] | 2018 | FWK | Real-time DNN CMPT | IMG CLASS, AR | GEN | Centralized | LAN | PART, TSK ASSG | TSK ASSG, CO-INF |
| DeepThings [73] | 2018 | FWK | Lower MEM FP | OBJ DET | GEN | Centralized | Edge | PART, TSK ASSG | TSK ASSG, CO-INF |
| [74] | 2020 | FWK | High ACC | IMG CLASS | GEN | - | LAN | DNN TWK, PART, TSK ASSG | CO-INF |
| LCP [75] | 2020 | DNN SPT | Lower TX OVHD | IMG CLASS | GEN | Decentralized | Edge | - | - |
| [76] | 2020 | DNN DIST | Higher TPUT, Lower OVR LAT | IMG CLASS, VID CLASS | GEN | - | Edge | PART, TSK ASSG | TSK ASSG |
| [77] | 2020 | FWK | Higher TPUT, Lower OVR LAT | IMG CLASS, IMG ST ANAL | GEN | Pipelined | Edge | PART, TSK ASSG, DNN TWK | DNN TWK, CO-INF |
| [78] | 2020 | TSK ASSG | Lower OVR LAT | IMG CLASS | GEN | Centralized | Edge | TSK ASSG | TSK ASSG |
| DeepWear [79] | 2020 | FWK | No ACC loss | IMG CLASS, AR, DOC CLASS, EMO RECOG, SP RECOG | WEAR ANAL | Pipelined | LAN | - | PART, TSK ASSG, CO-INF |
| EdgeLD [80] | 2020 | FWK | Lower OVR LAT | IMG CLASS | GEN | Centralized | LAN | - | PART, TSK ASSG, CO-INF |
| ADCNN [81] | 2020 | FWK | ACC-LAT TO | IMG CLASS | GEN | Centralized | Edge | DNN TWK, PART | TSK ASSG, CO-INF |
| [82] | 2021 | TSK ASSG | Lower DM LAT | IMG CLASS | UAV SV | - | Edge | - | TSK ASSG |
| DeCNN [83] | 2021 | FWK | Lower OVR LAT | IMG CLASS | GEN | - | LAN | DNN TWK | PART, TSK ASSG, CO-INF |
| [84] | 2021 | FWK | Lower OVR LAT, Lower TX OVHD | IMG CLASS, OBJ DET | GEN | Centralized | Edge | DNN TWK | PART, TSK ASSG, CO-INF |
| PICO [85] | 2021 | FWK | Lower OVR LAT | IMG CLASS, OBJ DET | GEN | Pipelined | LAN | - | PART, TSK ASSG, CO-INF |
| CoEdge [86] | 2021 | FWK | Lower OVR LAT, Lower POW | IMG CLASS | GEN | Centralized | Edge | - | PART, TSK ASSG, CO-INF |
| DeepSlicing [87] | 2021 | FWK | Lower OVR LAT | IMG CLASS | GEN | - | Edge | USER-DEF PART | TSK ASSG, CO-INF |
| [88] | 2022 | DNN DIST | Higher TPUT | - | GEN | - | Edge | - | - |
| EdgeFlow [89] | 2022 | FWK | Lower OVR LAT | IMG CLASS, OBJ DET | GEN | Decentralized | Edge | PART | TSK ASSG, CO-INF |
| [90] | 2022 | TSK ASSG | Lower OVR LAT, No ACC loss | IMG CLASS | UAV SV | - | Edge | - | TSK ASSG |
| DEFER [91] | 2022 | FWK | Higher TPUT | IMG CLASS | GEN | Pipelined | Edge | - | PART, TSK ASSG, CO-INF |
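Several of the pipelined frameworks above (e.g., [77,85]) partition the DNN layer-wise across devices, with dynamic programming used to choose stage boundaries that maximize throughput. As an illustrative sketch only (not the algorithm of any surveyed framework), the classic contiguous-partition dynamic program below takes profiled per-layer latencies and a device count as assumed inputs, and minimizes the latency of the slowest pipeline stage:

```python
def min_bottleneck_split(costs, k):
    """Split per-layer latencies `costs` into k contiguous pipeline stages,
    minimizing the slowest stage (the pipeline's throughput bottleneck).
    Returns (bottleneck latency, list of stage end indices)."""
    n = len(costs)
    prefix = [0.0]
    for c in costs:                       # prefix sums for O(1) stage cost
        prefix.append(prefix[-1] + c)
    INF = float("inf")
    # dp[j][i]: best bottleneck when the first i layers form j stages
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for m in range(j - 1, i):     # m = layers in the first j-1 stages
                cand = max(dp[j - 1][m], prefix[i] - prefix[m])
                if cand < dp[j][i]:
                    dp[j][i], cut[j][i] = cand, m
    # recover the stage boundaries by walking the cut table backwards
    cuts, i = [], n
    for j in range(k, 0, -1):
        cuts.append(i)
        i = cut[j][i]
    return dp[k][n], list(reversed(cuts))

# Two devices, five profiled layers: stages [0:3] and [3:5]
bottleneck, cuts = min_bottleneck_split([4, 2, 6, 3, 5], 2)
# bottleneck = max(4 + 2 + 6, 3 + 5) = 12.0, cuts = [3, 5]
```

The same formulation underlies the "Dynamic programming" entries in Table 3; the surveyed frameworks additionally account for inter-device transfer costs, which this sketch omits.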

**Table 3.** Overview of the most relevant aspects of parallelism and partitioning for collaborative inference across IoT devices.

| Work | Parallelism | Partitioning | Decision-Making: Problem Model | Decision-Making: Objectives | Decision-Making: Constraints | Decision-Making: Solution |
|---|---|---|---|---|---|---|
| [70] | Model PAR | 1D input PART for CL | WKLD BAL | MIN LAT | DEV CMPT | - |
| [71] | Model PAR | 2D grid input PART for CL | WKLD BAL | MIN LAT | DEV CMPT | Greedy algorithm |
| [72] | Data PAR, Model PAR | Output-based PART for FL | WKLD BAL | MAX TPUT | DEV MEM SZ, LAT | Exhaustive search |
| [73] | Model PAR | 2D grid input PART for CL | - | - | - | - |
| [74] | Model PAR | Group-wise PART for CL, Input-based PART for FL | WKLD BAL | - | - | - |
| [75] | Model PAR | Inter-branch PART | WKLD BAL | MIN LAT | DEV MEM SZ, DEV CMPT | Exhaustive search |
| [76] | Model PAR | - | WKLD BAL | MAX TPUT | DEV MEM SZ | Heuristic algorithm |
| [77] | Pipeline PAR | Layer-wise PART | WKLD BAL | MAX TPUT | - | Dynamic programming |
| [78] | Model PAR | Inter-branch PART | WKLD BAL | MIN LAT | - | Greedy algorithm |
| [79] | Pipeline PAR | Layer-wise PART | DAG SPLIT | MIN POW, MIN LAT | - | Heuristic algorithm |
| [80] | Model PAR | 1D input PART for CL | WKLD BAL | MIN LAT | - | Greedy algorithm |
| [81] | Model PAR | 2D grid input PART for CL | SCHED | MIN LAT | DEV STOR | Greedy algorithm |
| [82] | Pipeline PAR | Layer-wise PART | ILP | MIN LAT | MAX MEM, MAX WKLD, Layer per node, Binary control | Greedy algorithm |
| [83] | Model PAR | Inter-channel PART for CL, Input-based PART for FL | - | - | - | - |
| [84] | Model PAR | 2D grid input PART for CL | - | - | - | - |
| [85] | Pipeline PAR | Layer-wise PART | SCHED | MAX TPUT | - | Dynamic programming, Greedy algorithm |
| [86] | Model PAR | 1D input PART | INLP | MIN POW | LAT | Primal simplex algorithm |
| [87] | Model PAR | 1D input PART | WKLD BAL | MIN LAT | - | - |
| [88] | Pipelining | Layer-wise PART | WKLD BAL | MIN LAT | - | Exhaustive search, Heuristic algorithm |
| [89] | Model PAR | 1D input PART | ILP | MIN LAT | No overlapping | - |
| [90] | Pipeline PAR | Layer-wise PART | NLP | MIN LAT | DEV MEM SZ, DEV CMPT | - |
| [91] | Pipeline PAR | Layer-wise PART | - | - | - | - |
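The "1D input PART for CL" entries in Table 3 ([70,80,86]) split the input feature map of a convolutional layer (CL) into horizontal strips, one per worker, with each strip padded by an overlap (halo) so that every worker can compute its share of the output locally. The sketch below is a minimal NumPy illustration of this idea under simplifying assumptions (a single channel, a "valid" convolution, and a naive reference implementation standing in for a real CL); it is not taken from any of the surveyed frameworks:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 'valid' 2D cross-correlation of input x with kernel k."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def strip_partition(x, k, n_workers):
    """1D input partitioning: each worker receives a horizontal strip of x
    plus (kh - 1) halo rows, computes its slice of the output independently,
    and the central node concatenates the partial results."""
    kh = k.shape[0]
    oh = x.shape[0] - kh + 1                          # total output rows
    bounds = np.linspace(0, oh, n_workers + 1, dtype=int)
    parts = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # output rows [lo, hi) only depend on input rows [lo, hi + kh - 1)
        parts.append(conv2d_valid(x[lo:hi + kh - 1], k))
    return np.vstack(parts)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
kernel = rng.standard_normal((3, 3))
# the tiled result matches the full-layer computation exactly
assert np.allclose(strip_partition(x, kernel, 4), conv2d_valid(x, kernel))
```

The halo rows are what the "Lower TX OVHD" objectives in Table 2 target: 2D grid partitioning ([71,73,81,84]) trades smaller tiles for halos on more edges, so the partition shape directly controls the redundant data exchanged between workers.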

| Strategy | Memory Footprint | Communication Overhead | Computational Burden | Inference Accuracy | Inference Efficiency | System Variability | Decision-Making |
|---|---|---|---|---|---|---|---|
| Scheduling strategies | [87] | [73,74,83] | [73] | - | - | [73,77,87] | - |
| Partitioning strategies | [80] | [73,74,80,81,83,84,86] | [81] | - | [84] | - | - |
| DNN structure tailoring | [75,83,84] | [71,74,75,77,83] | [71,75,83] | [74,75,83] | [71] | - | - |
| DNN retraining | - | - | - | [71,81] | - | - | - |
| Compression methods | - | [81,91] | - | - | - | - | - |
| Profiling | - | - | - | - | - | [72,78,79,80,81,86,87,90] | - |
| Problem reformulation | - | - | - | - | - | [71,85] | [82,86,89,90] |
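Among the strategies above, compression methods [81,91] reduce the communication overhead of co-inference by shrinking the intermediate feature maps exchanged between nodes; the lossless techniques surveyed in the references (e.g., run-length encoding) are a natural fit because post-ReLU activations are typically zero-heavy. The following is a toy, framework-agnostic sketch (not the pipeline of any surveyed work) combining 8-bit affine quantization with run-length encoding; all function names and the 255-run cap are illustrative assumptions:

```python
import numpy as np

def quantize(x):
    """Affine-quantize a float feature map to uint8 before transmission.
    Returns the codes plus the (zero-point, scale) needed to decode."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255 if hi > lo else 1.0
    return np.round((x - lo) / scale).astype(np.uint8), lo, scale

def dequantize(q, lo, scale):
    """Invert quantize() on the receiving node (lossy, error <= scale/2)."""
    return q.astype(np.float32) * scale + lo

def rle_encode(codes):
    """Run-length encode a flat uint8 sequence as (value, run) pairs,
    capping runs at 255 so each pair would fit in two bytes."""
    pairs, i = [], 0
    while i < len(codes):
        j = i
        while j < len(codes) and codes[j] == codes[i] and j - i < 255:
            j += 1
        pairs.append((int(codes[i]), j - i))
        i = j
    return pairs

def rle_decode(pairs):
    """Losslessly reconstruct the uint8 sequence from (value, run) pairs."""
    return np.array([v for v, n in pairs for _ in range(n)], dtype=np.uint8)

# Sparse post-ReLU activations compress well: long runs of zeros
rng = np.random.default_rng(0)
fmap = np.maximum(rng.standard_normal((8, 8)).astype(np.float32), 0)
q, lo, scale = quantize(fmap)
pairs = rle_encode(q.ravel())                 # what would go on the wire
restored = dequantize(rle_decode(pairs).reshape(fmap.shape), lo, scale)
```

The RLE stage is exactly reversible; only the quantization step loses precision, bounded by half a quantization step, which is why [81] pairs such schemes with retraining to recover accuracy.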

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Rodriguez-Conde, I.; Campos, C.; Fdez-Riverola, F.
Horizontally Distributed Inference of Deep Neural Networks for AI-Enabled IoT. *Sensors* **2023**, *23*, 1911.
https://doi.org/10.3390/s23041911
