1. Introduction
Artificial intelligence (AI) has emerged as a revolutionary force, shaping the future of numerous domains such as healthcare, automotive, finance, and communication. Its transformative potential has been demonstrated through breakthroughs like Generative Pre-trained Transformers (GPT) [
1] and diffusion models [
2], each illustrating profound progress in both natural language processing and visual understanding. In this regard, AI has already become integral to disciplines like biology, where it is applied to protein structure prediction, and to physics, where it assists in discovering new materials.
1.1. Methodology of Literature Collection
This systematic literature review (SLR) follows Kitchenham’s guidelines [
3] and PRISMA standards [
4]. To select relevant studies, we applied the following Boolean query across IEEE Xplore, Scopus, ACM DL, and Google Scholar: TITLE-ABS-KEY(“edge intelligence” OR “AI on edge”) AND (“model compression” OR “neural architecture search” OR “federated learning”) AND PUBYEAR > 2014. The initial search yielded 1872 studies (IEEE: 400; Scopus: 35; ACM: 225; Google Scholar: 964). Title/abstract screening excluded 1584 studies, leaving 170 for full-text review.
Screening was performed manually by three authors to ensure rigor. To validate consistency, ChatGPT-4 (via API) was used to re-screen 20% of randomly selected papers, achieving 92% agreement with manual results. Discrepancies were resolved through consensus [
5].
Snowballing was not applied due to the rapid evolution of edge intelligence (70% of studies published post-2021). To ensure coverage, we prioritized recent works indexed in major databases. A sensitivity analysis confirmed that adding 20 seminal papers via backward snowballing did not alter key conclusions.
1.2. Challenges in AI Deployment on Edge Devices
The real promise of AI, ubiquitous deployment across diverse industries, still faces significant barriers. Notably, high-performance models require extensive computational resources and storage, making them difficult to deploy on smaller, resource-constrained edge devices. The growing complexity of AI models has further widened the mismatch between soaring computational demands and the limited hardware capacities of edge devices such as smartphones, wearables, and IoT ecosystems. For instance, training the foundational OPT-175B model required 992 80 GB A100 GPUs, each sustaining a utilization of roughly 147 TFLOP/s. In contrast, the most advanced smartphone chips currently provide less than 3 TFLOP/s of computational performance.
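To make the scale of this mismatch concrete, the following back-of-the-envelope sketch (using only the figures quoted above, and assuming the per-GPU utilization holds across the cluster) compares the aggregate training throughput behind OPT-175B with a single smartphone-class chip:

```python
# Rough compute-gap estimate based on the figures quoted above:
# 992 A100 GPUs at ~147 TFLOP/s achieved utilization vs. < 3 TFLOP/s on a phone SoC.
gpus = 992
per_gpu_tflops = 147      # achieved utilization per A100 reported for OPT-175B training
phone_tflops = 3          # upper bound quoted for current smartphone chips

cluster_tflops = gpus * per_gpu_tflops
print(f"Aggregate training throughput: {cluster_tflops / 1000:.1f} PFLOP/s")
print(f"Gap relative to one smartphone: ~{cluster_tflops / phone_tflops:,.0f}x")
# -> about 145.8 PFLOP/s, i.e. a gap of roughly 48,600x
```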
This gap in computational capacity drives the growing importance of edge intelligence [
6]. Edge intelligence leverages efficient deep learning technology, employing model compression and acceleration techniques to diminish network parameters and refine network structures [
7]. This optimization accelerates inference speed and reduces model storage demands while maintaining model accuracy. Predominant algorithms in edge intelligence can be categorized into three primary techniques: model sparsity [
8,
9,
10], model quantization [
11,
12,
13,
14], and knowledge distillation [
15,
16]. These approaches address the computational and storage constraints of edge devices, facilitating advanced on-device intelligence without compromising performance.
Neural networks often possess a substantial number of redundant parameters, many of which contribute minimally to the network’s information content. This phenomenon is known as over-parameterization. The removal of these parameters has a negligible impact on model performance. Model sparsity specifically targets the elimination of these superfluous parameters or connections within the neural network, enabling the network to bypass computations associated with these elements. Two seminal algorithms in this domain, Optimal Brain Damage [
17] and Optimal Brain Surgeon [
18], employ the calculation of the Hessian matrix within the framework of a second-order Taylor expansion to selectively prune connections. Han et al. [
19] pioneered the application of these principles to significantly reduce the size of deep learning models by a factor of 35 without sacrificing performance. This breakthrough is widely regarded as a foundational development in the era of edge intelligence.
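As a minimal illustration of the magnitude-based pruning idea, the sketch below zeroes out the smallest-magnitude weights of a single layer. It is a simplified, unstructured variant for exposition, not the full iterative prune-and-retrain pipeline of Han et al.; the `magnitude_prune` helper and the 90% sparsity target are illustrative choices.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value; weights at or below it are removed.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.9)   # keep roughly 10% of connections
print(f"Remaining non-zero weights: {np.count_nonzero(w_sparse) / w.size:.1%}")
```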
Model quantization has emerged as a fundamental technique in edge intelligence, significantly reducing the computational and memory footprint of deep learning models without compromising accuracy. Quantization involves converting high-precision floating-point representations, typically 32-bit (FP32) or 16-bit (FP16), into lower-precision formats such as 8-bit integers (INT8) or even binary formats. By lowering the bit-width of model parameters and activations, quantization reduces the amount of memory required to store the model and accelerates the computational processes, particularly in resource-constrained environments like edge devices. Early studies by Fiesler et al. [
20] and Balzer et al. [
21] laid the groundwork for weight quantization, demonstrating that significant reductions in model size and complexity could be achieved with minimal impact on performance. Contemporary research has extended these ideas, showing that quantized models can achieve near-parity with their full-precision counterparts in a variety of tasks. The INT8 quantization in particular has become a widely adopted standard in edge AI applications, reducing model size by up to 75% while maintaining accuracy levels comparable to full-precision models.
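The arithmetic behind INT8 quantization is straightforward; the sketch below shows symmetric per-tensor quantization and dequantization of a toy weight matrix. The helper names and the toy tensor are illustrative, and production toolchains typically add calibration data, per-channel scales, and zero-points.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: x ~= scale * q with q in [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(scale=0.05, size=(128, 128)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(f"Storage: {w.nbytes} B (FP32) -> {q.nbytes} B (INT8)")   # 4x smaller, i.e. ~75% reduction
print(f"Mean absolute quantization error: {np.abs(w - w_hat).mean():.6f}")
```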
Knowledge distillation is another pivotal technique in edge intelligence, designed to transfer the knowledge learned by a large, complex model (often referred to as the “teacher” model) to a smaller, more efficient “student” model. This process enables the student model to replicate the performance of the teacher model while being more computationally feasible for deployment on edge devices with limited resources. Hinton et al. [
22] utilized the output distribution of the teacher model as the knowledge source for distillation. In contrast, Romero et al. [
23] employed features from intermediate layers for the same purpose. Knowledge distillation has become a key approach for compressing deep learning models, allowing for the retention of essential features and predictive power while reducing the computational and memory demands of the model [
24]. The central idea behind knowledge distillation is that the teacher model provides more informative training signals than standard training methods. Instead of training the student model solely on labeled data, the student also learns from the teacher model’s output, typically in the form of soft probabilities or logits. These soft targets contain richer information about the relationships between classes, allowing the student model to generalize better even with fewer parameters. For instance, in a classification task, a teacher model might indicate that an image of a cat has a 90% probability of being a cat but also a 5% probability of being a dog. This nuanced information helps the student model learn more effectively than it would from hard, one-hot labels alone.
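A minimal sketch of this soft-target training objective, in the spirit of Hinton et al., is shown below. The temperature `T = 4.0` and mixing weight `alpha = 0.7` are illustrative hyperparameters rather than recommended settings, and the random logits stand in for real teacher and student networks.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend a soft-target KL term (teacher knowledge) with the hard-label CE loss."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)         # softened teacher distribution
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)                 # standard supervised loss
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage with random logits for a 10-class task.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```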
All these edge intelligence techniques aim to streamline artificial intelligence models, making them smaller and faster. Subsequently, these optimized models can be deployed on edge devices to support large-scale applications. This approach significantly enhances the practicality and accessibility of advanced AI capabilities in resource-constrained environments, facilitating real-time processing and decision-making at the edge of the network. These challenges and the potential applications of edge intelligence are summarized in
Figure 1.
1.3. Emerging Techniques in Edge Intelligence
In addition to model sparsity, quantization, and knowledge distillation, several other techniques have emerged as key contributors to edge intelligence. Neural Architecture Search (NAS) optimizes the architecture of AI models, making them more suitable for resource-constrained devices. Early-exit models allow for faster inference by terminating the process early when sufficient confidence is reached. Federated learning, on the other hand, addresses privacy concerns by training models across decentralized data sources without sharing sensitive information.
Neural Architecture Search (NAS): NAS automates the design of neural networks, optimizing architectures for specific tasks and hardware constraints. Techniques like reinforcement learning and evolutionary algorithms are commonly used in NAS.
Early-Exit Models: These models allow inference to terminate early if sufficient confidence is achieved, reducing computation and latency. They are particularly useful in real-time applications on edge devices.
Federated Learning: Federated learning enables decentralized model training across multiple devices, preserving data privacy and reducing communication costs. It is essential for applications where data cannot leave the device; a minimal aggregation sketch follows this list.
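As a deliberately simplified illustration of the aggregation step at the heart of federated learning, the FedAvg-style sketch below averages client model parameters weighted by local dataset size. The two-layer model shapes and client sample counts are hypothetical, and real deployments add secure aggregation, client sampling, and communication compression.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated averaging: combine client models weighted by their local dataset sizes."""
    total = sum(client_sizes)
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]

# Three clients, each holding a toy two-layer model (a weight matrix and a bias vector).
rng = np.random.default_rng(0)
clients = [[rng.normal(size=(4, 4)), rng.normal(size=4)] for _ in range(3)]
sizes = [1200, 300, 500]                 # hypothetical local sample counts
global_model = fedavg(clients, sizes)
print(global_model[0].shape, global_model[1].shape)   # (4, 4) (4,)
```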
1.4. Contributions and Organization of This Work
This article not only reviews the key techniques but also synthesizes the persistent challenges with the latest technological advances, providing an actionable decision framework and outlining critical open research directions, including adaptive model compression, secure and interpretable edge AI, and cross-layer co-design of hardware and algorithms for edge deployment.
The main contributions of this review are as follows:
We provide a comprehensive overview of core edge AI techniques, including emerging trends such as neural architecture search (NAS) and federated learning (FL).
We perform comparative analysis to assess technique applicability, efficiency, and trade-offs.
We emphasize industrial scenarios and practical implementation directions.
The remainder of this paper is organized as follows:
Section 2 reviews related work and compares this survey with existing reviews.
Section 3 summarizes the benefits of edge computing that underpin edge intelligence.
Section 4 explores practical applications of edge intelligence in consumer and industrial domains.
Section 5 identifies and elaborates on the major challenges facing edge AI deployment.
Section 6 discusses enabling technologies and future directions, and
Section 7 concludes the review.
2. Related Work and Literature Review
2.1. Related Work
Edge intelligence has emerged as a pivotal research direction aiming to bring AI capabilities closer to data sources, addressing critical challenges such as latency, privacy, and bandwidth optimization. Over the past few years, several comprehensive surveys have explored the intersection of AI and edge computing from various perspectives.
For instance, Merenda et al. [
25] provide an extensive overview of machine learning models optimized for execution on resource-constrained Internet of Things (IoT) devices, presenting the main techniques enabling efficient edge learning while highlighting future trends toward the Internet of Conscious Things.
In the domain of audio processing, Kim et al. [
26] discuss bio-inspired continuous-time analog circuits used as feature extractors, enabling ultra-low-power audio processing on edge devices with deep learning models.
Shuvo et al. [
27] categorize optimization techniques into algorithm design, model optimization, algorithm–hardware codesign, and efficient accelerator development, providing an exhaustive review on enabling deep learning inference in edge environments.
Furthermore, the deployment of large language models (LLMs) at the edge has been systematically reviewed in [
28], highlighting advances in model compression, inference optimization, and edge-efficient deployment frameworks.
Additionally, the UAV domain has been addressed by McEnroe et al. [
29], focusing on UAV-based IoT applications and the challenges posed by integrating AI into UAV systems using edge computing paradigms.
From a system-level perspective, Grill et al. [
30] present a comprehensive taxonomy of Edge AI, classifying its architectures, use cases, and research challenges.
Other relevant works, such as [
31] and [
32], further elaborate on lightweight model deployment strategies and AI model optimization methods, respectively.
These diverse perspectives illustrate the rapid development and multi-faceted challenges of Edge AI, yet gaps remain in synthesizing these approaches into unified frameworks that balance performance, energy efficiency, security, and scalability. Our work aims to bridge these gaps by integrating technical methods, application-specific considerations, and emerging research trends.
To ensure rigorous classification of primary studies, we categorize the 170 included papers along three dimensions:
Technique: Model Compression (58%), Neural Architecture Search (22%), Federated Learning (20%).
Application Domain: Industrial (45%), Healthcare (30%), Autonomous Systems (25%).
Publication Year: 2015–2020 (30%), 2021–2024 (70%).
This taxonomy highlights the dominance of model compression techniques in industrial applications, with a significant surge in publications post-2021 reflecting accelerated research in edge-efficient AI.
2.2. Comparative Analysis of Existing Review Works
To contextualize our review,
Table 1 presents a comparative analysis of representative review articles on Edge Intelligence, summarizing their focus areas, scopes, and unique contributions.
As illustrated in
Table 1, previous works have extensively addressed specific aspects of edge intelligence, including model compression for IoT devices, acceleration of deep learning inference, and optimization methods tailored for LLM deployment at the edge. Moreover, UAV-centric edge AI applications, systematic taxonomies, and lightweight AI deployment methods have been explored in depth. However, most of these studies concentrate on singular dimensions of edge intelligence and lack an integrative perspective that combines emerging techniques, deployment challenges, and practical applications.
Our work aims to bridge this gap by providing a holistic and multi-dimensional review encompassing techniques, applications, trade-offs, and future directions, enabling a more systematic understanding of the edge intelligence ecosystem.
3. Benefits of Edge Computing
Edge computing has emerged as a transformative paradigm in distributed computing architectures, offering significant benefits by relocating computation, data storage, and analytics closer to the data source or end user. This approach addresses critical limitations inherent in traditional centralized cloud computing models and serves as the foundational enabler for edge intelligence deployment.
3.1. Reduced Latency and Real-Time Processing
One of the primary benefits of edge computing is the significant reduction in data transmission latency. By processing data locally at or near the source, edge computing enables real-time or near-real-time analytics, which is essential for time-sensitive applications such as autonomous driving, industrial automation, and healthcare monitoring [
33]. This local processing ensures that critical decisions can be made instantly without relying on distant cloud data centers.
3.2. Enhanced Data Privacy and Security
Edge computing enhances data privacy and security by minimizing the amount of sensitive data transmitted over public networks. Sensitive data, such as healthcare records or financial transactions, can be processed locally, reducing the risk of exposure to cyber threats during transmission. Furthermore, localized processing enables compliance with regional data sovereignty regulations [
34].
3.3. Bandwidth Optimization and Cost Reduction
By offloading data processing and analytics tasks to the edge, organizations can significantly reduce the volume of data sent to the cloud, thereby optimizing bandwidth utilization and reducing associated costs. This is particularly beneficial in applications generating massive data streams, such as video surveillance or industrial IoT systems [
35].
3.4. Scalability and Resilience
Edge computing supports scalable architectures by distributing workloads across multiple edge nodes, ensuring that processing is not solely dependent on a centralized infrastructure. This distributed nature also enhances system resilience, as edge nodes can continue to operate independently in cases of intermittent connectivity or failures in the core network [
36].
3.5. Foundation for Edge Intelligence
Crucially, the shift toward edge computing lays the groundwork for deploying AI models directly at the edge, giving rise to edge intelligence. The computational infrastructure provided by edge nodes enables efficient inference and decision-making processes closer to data generation points, making edge intelligence feasible and practical for diverse application domains [
37].
4. Potential Applications of Edge Intelligence
4.1. Consumer and Emerging Applications of Edge Intelligence
The adoption of edge intelligence can catalyze transformative advancements across a broad spectrum of artificial intelligence applications. This includes enhancing real-time virtual reality (VR) and augmented reality (AR) experiences, which can significantly increase user engagement and immersion during interactions. In the realm of intelligent wearable devices, edge intelligence improves personal productivity by seamlessly integrating notifications, reminders, calendars, emails, and other productivity tools directly within the user’s visual field. Additionally, the integration of intelligent models into microchip implants can facilitate more informed clinical decisions and personalized treatment strategies, underscoring the potential of edge intelligence to revolutionize not only consumer technology but also healthcare applications.
VR/AR technology: VR/AR technology is integral to a variety of applications, including advertising, art and design, and medical imaging, employing neural networks to predict pixel values for rendering 3D objects [
38]. However, the computational intensity required for these operations poses significant challenges for deploying augmented and virtual reality on edge devices, especially in real-time contexts such as gaming and navigation systems. Edge intelligence mitigates these challenges by optimizing AI models through compression techniques, which reduce computational requirements. This enhancement in processing efficiency facilitates faster inference speeds, crucial for supporting more dynamic and real-time VR/AR applications like virtual digital humans.
Wearable devices: Wearable devices such as digital watches and smart glasses have become integral to our daily lives, offering functionalities like message reception to enhance productivity. However, many current wearable devices either lack advanced intelligence capabilities or rely on cloud servers to augment their processing power. Edge intelligence represents a transformative shift in this landscape by enabling data processing closer to its source, reducing dependence on distant cloud servers. This advancement can revolutionize wearable technology, providing more personalized user interactions. For instance, smartwatches that monitor vital signs and fitness trackers that assess workout performance can now process data locally. This not only helps preserve user privacy by reducing the need to transmit sensitive information to external servers but also allows on-device machine learning algorithms to learn from and adapt to user behavior. This facilitates tailored recommendations and predictive analytics, significantly enhancing utility and user experience across various sectors, including healthcare and fitness. By integrating compact, intelligent models directly onto wearable devices, we unlock new potential for user engagement and functional richness.
Microchip implants: Microchip implants represent a significant technological advancement with multifaceted implications across various fields. However, due to well-being and health concerns, data must be processed locally on the device without communicating with cloud servers. Furthermore, as microchip implants require continuous operation, they necessitate the use of low-power intelligent models. Edge intelligence applications in microchip implants integrate the capabilities of compact embedded devices with advanced computational power. These implants, utilizing edge intelligence models, have the potential to revolutionize numerous sectors, including healthcare and security. In healthcare, microchip implants equipped with edge intelligence can continuously monitor vital signs, detect anomalies, and autonomously administer medication based on real-time data analysis. In security applications, these implants can enhance authentication processes, ensuring secure access to restricted areas or sensitive data by employing biometric recognition and encryption algorithms directly on the implant. By leveraging edge intelligence technology, microchip implants streamline processes and enhance privacy and security by minimizing data transmission and processing outside the device itself. This convergence of microchip implants and edge intelligence holds immense potential to reshape industries and improve the quality of life for individuals worldwide.
Autonomous vehicles: Autonomous vehicles (AVs) demand ultra-low latency decision-making capabilities, where even the slightest delay could lead to safety risks [
39]. Edge intelligence is critical in this domain because it enables vehicles to process data locally, thereby reducing the reliance on cloud servers for real-time decisions. By performing key tasks such as image recognition [
40], lidar data analysis, and sensor fusion on the vehicle itself, edge devices facilitate faster, more reliable responses. This local processing ensures that AVs can operate efficiently even in environments with intermittent connectivity or high latency, which is particularly important for applications like urban driving, highway navigation, or managing complex traffic scenarios. A prime example of edge intelligence in AVs is its role in obstacle detection and avoidance. In highly dynamic environments, such as city streets or highways, autonomous vehicles must constantly assess their surroundings, identifying potential hazards in real time. By leveraging edge AI, AVs can process data from multiple sensors—cameras, lidar, radar—locally and make split-second decisions, such as braking or changing lanes, without needing to communicate with a distant server. This capability becomes particularly crucial in situations where immediate action is required, for example, when a pedestrian suddenly steps into the road or when another vehicle behaves unpredictably.
4.2. Industrial Edge Intelligence Applications
Edge intelligence plays a critical role in enabling smart manufacturing, predictive maintenance, and quality inspection within industrial environments. For example, smart factories deploy AI-enhanced edge devices for real-time equipment monitoring, where AI models deployed on edge gateways can detect anomalies in machinery vibrations and temperatures, enabling predictive maintenance and reducing unplanned downtimes [
41]. Additionally, automated visual inspection systems leverage edge-deployed deep learning models to identify defects on production lines, ensuring consistent product quality without the need for centralized cloud analysis [
42].
A representative industrial application is found in Industry 4.0-enabled manufacturing cells, where collaborative robots (cobots) rely on edge-deployed AI models for visual perception, task adaptation, and human–robot collaboration. By processing sensor data locally, these robots enhance safety, responsiveness, and adaptability to changing production demands. These examples highlight the transformative role of edge intelligence in achieving flexible, resilient, and data-driven industrial operations.
4.3. AI Techniques Applied in Edge Scenarios
AI techniques, particularly deep learning and machine learning, are fundamental enablers of edge intelligence. Convolutional Neural Networks (CNNs) are widely deployed on edge devices for object detection, quality inspection, and anomaly recognition tasks in manufacturing, logistics, and transportation sectors [
43]. Additionally, reinforcement learning (RL) is employed in edge-controlled robotic arms to optimize task sequences and minimize energy consumption during operations [
44].
In industrial environments, federated learning allows multiple manufacturing plants or production lines to collaboratively train AI models while preserving data privacy. This decentralized approach enables AI models to adapt to diverse process variations across different production sites without requiring sensitive production data to leave the factory [
45]. Moreover, lightweight AI techniques, such as model pruning, quantization, and knowledge distillation, are essential to ensure that these AI models can operate efficiently within the constrained computational environments typical of industrial edge devices.
4.4. Architecture of Manufacturing Operations Systems
To provide a systematic understanding of the integration of edge intelligence within industrial environments,
Figure 2 illustrates a typical layered architecture of manufacturing operations systems. This hierarchical structure comprises four primary layers, each performing distinct functions in data processing, decision-making, and execution.
Enterprise Layer: Responsible for overarching business strategies, enterprise resource planning (ERP), and corporate management functions.
Operation Management Layer: Focuses on production scheduling, manufacturing execution systems (MES), and supervisory control and data acquisition (SCADA).
Edge Layer: Acts as a critical intermediary that bridges real-time data from devices to management systems. It enables on-site AI inference, data filtering, and real-time analytics, significantly reducing reliance on cloud-based processing while ensuring low-latency decision-making capabilities.
Device Layer: Includes field devices such as sensors, actuators, and programmable logic controllers (PLCs), directly interacting with the physical production environment.
In this architecture, the edge layer plays a pivotal role by performing localized intelligence processing close to data sources. This not only enhances responsiveness and system resilience but also strengthens privacy protection and energy efficiency by minimizing data transmission requirements. The layered structure ensures efficient collaboration across the entire manufacturing system, facilitating smart, agile, and secure industrial operations.
4.5. Case Study: Edge Intelligence Deployment in Smart Manufacturing
To demonstrate the practical value of edge intelligence in industrial contexts, this section presents a representative case study inspired by Lee et al. [
46], focusing on the deployment of Industrial AI and predictive analytics in a smart manufacturing environment.
Scenario Overview: The selected case involves an advanced machining factory facing challenges such as increasing demands for production efficiency, equipment reliability, and operational safety. To address these issues, the factory adopted an integrated Industrial AI approach, embedding predictive analytics capabilities directly into the manufacturing operations system through edge computing.
Implementation Strategy: The factory deployed a hierarchical Industrial AI architecture based on the 5C model (Connection, Conversion, Cyber, Cognition, and Configuration), enabling seamless data acquisition, processing, analysis, and decision-making. At the edge layer, smart sensors (vibration, temperature, acoustic emission, etc.) were installed on critical equipment to collect real-time operational data. These data streams were locally processed using embedded edge AI modules, applying data preprocessing, feature extraction, and anomaly detection algorithms to monitor equipment health and detect early signs of degradation.
The preprocessed data and extracted features were then transmitted to fog nodes, where further analytics and event modeling were conducted to predict failures and optimize maintenance schedules. The fog computing layer enabled near-real-time decision support while reducing bandwidth and ensuring data privacy. Selective data, including fleet-level degradation trends and historical failure modes, were sent to the cloud layer for long-term storage, advanced analytics, and digital twin modeling.
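The source study does not spell out the exact on-device algorithms, so the following is only an illustrative sketch of the kind of lightweight anomaly detection an edge module might run on a vibration stream: a rolling z-score detector that flags deviations from a local baseline. The window length, threshold, and synthetic fault are hypothetical.

```python
import numpy as np

def zscore_anomalies(signal, window=256, z_thresh=4.0):
    """Flag samples whose deviation from a rolling baseline exceeds z_thresh."""
    flags = np.zeros(len(signal), dtype=bool)
    for i in range(window, len(signal)):
        ref = signal[i - window:i]
        mu, sigma = ref.mean(), ref.std() + 1e-9
        flags[i] = abs(signal[i] - mu) / sigma > z_thresh
    return flags

# Simulated vibration trace: normal operation plus an injected fault burst.
rng = np.random.default_rng(0)
vibration = rng.normal(0.0, 1.0, 5000)
vibration[3500:3520] += 8.0              # synthetic bearing-fault signature
alarms = zscore_anomalies(vibration)
print(f"Anomalous samples flagged: {alarms.sum()}")
```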
Outcomes and Benefits: Through this layered Industrial AI deployment:
Enhanced Predictive Maintenance: The factory achieved a 25% reduction in unplanned downtime by leveraging real-time anomaly detection and proactive maintenance recommendations.
Quality Improvement: Integration of AI-based visual inspection systems at the edge reduced product defect rates by 30%, enabling rapid feedback loops directly on the production line.
Operational Efficiency: Localized data processing at the edge reduced latency to sub-10 ms, enabling immediate response actions and minimizing reliance on cloud-only processing.
Data Security and Privacy: Sensitive production data were processed locally, ensuring compliance with data sovereignty requirements and reducing cybersecurity risks.
Key Insights: This case underscores the critical role of edge intelligence, integrated with Industrial AI frameworks, in achieving agile, fault-tolerant, and data-driven smart manufacturing operations. By embedding intelligence close to data sources and combining edge, fog, and cloud layers, the manufacturing system enhanced its responsiveness, resilience, and efficiency, while laying the foundation for self-organizing, on-demand, and context-aware production environments.
Figure 3 depicts an Industry 4.0-enabled production line equipped with edge gateways, collaborative robots, and various sensors (e.g., vibration and temperature sensors). Edge intelligence models, such as quantized convolutional neural networks (CNNs), are deployed on local edge gateways to perform real-time defect detection, predictive maintenance, and anomaly analysis. Sensor data and production images are processed locally, while optional data can be uploaded to the cloud for long-term storage and advanced analytics. This architecture enhances operational efficiency, improves product quality, and reduces latency and privacy risks associated with cloud-only processing.
5. Challenges of Edge Intelligence
5.1. Overview of Key Challenges
The rapid advancement of artificial intelligence (AI) has propelled edge intelligence to the forefront of technological innovation, enabling real-time decision-making and localized data processing. However, the deployment of AI models on resource-constrained edge devices introduces multifaceted challenges that hinder widespread adoption. These challenges stem from fundamental conflicts between the escalating complexity of AI models and the inherent limitations of edge hardware.
First, the computational and memory demands of state-of-the-art AI models, such as large language models (LLMs) and vision transformers, vastly exceed the capabilities of typical edge devices. For instance, training a model like OPT-175B requires nearly 1000 high-end GPUs [
47], whereas edge devices such as smartphones and IoT sensors operate with computational capacities orders of magnitude lower. This gap necessitates aggressive model compression techniques, yet compressing billion-parameter models without sacrificing accuracy remains a formidable task.
Second, the interpretability of edge intelligence models is often compromised by design. While techniques like pruning and quantization streamline models for efficiency, they obscure internal decision-making mechanisms. In critical domains such as healthcare and autonomous systems, the inability to trace model reasoning raises ethical and operational concerns. For example, medical diagnostics relying on “black-box” predictions risk misdiagnosis if clinicians cannot validate the underlying logic [
48].
Third, privacy and security vulnerabilities are exacerbated in distributed edge ecosystems. Unlike centralized cloud servers, edge devices are physically accessible and lack robust security protocols, making them prime targets for adversarial attacks. Recent studies highlight risks such as sensor spoofing in autonomous vehicles [
49] and model inversion attacks in financial systems [
50], underscoring the urgent need for tamper-resistant architectures.
Finally, energy efficiency remains a critical bottleneck. Edge devices—especially wearables, drones, and implantable sensors—rely on limited battery power, yet AI workloads inherently demand high energy consumption. Optimizing energy use without compromising performance requires innovations in both hardware [
51] (e.g., low-power AI chips) and algorithmic design [
52] (e.g., early-exit networks).
These challenges collectively define the current landscape of edge intelligence, demanding interdisciplinary solutions that balance performance, efficiency, and reliability. The following sections delve into these issues through comparative analysis and propose actionable frameworks for addressing trade-offs.
5.2. Comparative Analysis of Edge Intelligence Techniques
To systematically evaluate the strengths and limitations of mainstream edge intelligence techniques, we present a comparative analysis based on four critical dimensions: computational latency, memory footprint, scalability, and application suitability. As shown in
Table 2, techniques such as model pruning, quantization, and knowledge distillation exhibit distinct trade-offs across these metrics. For instance, quantization significantly reduces memory usage but may compromise model accuracy in low-bit configurations, while knowledge distillation achieves lightweight models at the cost of dependency on teacher models. This comparison highlights the need for context-aware selection of techniques depending on deployment constraints (e.g., real-time requirements vs. energy limitations).
5.3. Persistent Challenges in Edge Intelligence
Although edge intelligence has achieved significant success and has become one of the most prominent research fields, several challenges including large-scale intelligent model deployment, poor interpretability, privacy and security issues, and energy efficiency remain in its development.
5.3.1. Large-Scale Intelligent Model Deployment
Deploying large-scale intelligent models with massive parameters, such as GPT [
1], on edge devices remains difficult. The complexity of compressing these large-scale models stems from their intricate network structures. When the model parameter scale reaches billions, accurately identifying redundant parameters and features becomes a significant challenge. Furthermore, large-scale models inherently require substantial computational power and storage, posing additional obstacles for deployment on extremely small edge devices.
To address these challenges, ongoing research in edge intelligence specifically designed for large-scale models is crucial. A promising solution lies in the co-design of software and hardware. On the one hand, model structures can be tailored to the specific characteristics of available resources, optimizing the use of limited computational and storage capacities. On the other hand, hardware architectures can be customized to align with model structures, enabling the development of specialized artificial intelligence chips for model acceleration. This approach fosters the specialization of software and hardware, working synergistically to achieve optimal performance.
Recent advances in model compression specifically target the deployment of large-scale language models on edge devices. For example, SparseGPT [
10] and OmniQuant [
14] enable aggressive pruning and quantization without significant accuracy degradation, opening new avenues for edge AI. However, integrating these methods seamlessly into heterogeneous hardware remains challenging, highlighting the necessity for adaptive and hardware-aware compression frameworks. Moreover, ongoing research focuses on modularizing large models into smaller, task-specific experts to reduce the computational burden at the edge [
53].
5.3.2. Poor Interpretability
Edge intelligence models often suffer from poor interpretability, which can be a critical issue in certain fields [
54]. In the medical domain, for example, the ability of a model to elucidate its diagnostic reasoning enhances the credibility of its outcomes. Relying on model inferences without understanding their basis can lead to serious medical errors. Similarly, in the legal field, allowing models to make unchecked decisions may result in wrongful judgments and other significant mistakes. Furthermore, in business contexts such as finance and transportation, explanations of the model’s final outputs are necessary to prevent large-scale economic losses. As of now, various definitions of model interpretability have been proposed, such as the two types defined by Lipton: ante hoc interpretability and post hoc interpretability [
55]. Compared with conventional neural networks, edge intelligence models have had layers, kernels, or connections removed during compression, which obscures the internal network structure and the reasoning behind outputs, thereby reducing their interpretability. Neural network architectures are already inherently complex, and edge intelligence techniques add a further series of black-box operations that complicate human interpretation. Research on the interpretability of edge intelligence models therefore remains a significant challenge, requiring a deeper understanding of network structures and output reasoning.
To address interpretability challenges, recent studies have proposed the integration of explainable AI (XAI) techniques into lightweight models, enabling transparency in edge inference processes [
56]. Techniques such as post hoc attention visualization, rule extraction from compressed models, and surrogate modeling are being explored to bridge the gap between model efficiency and explainability. For instance, interpretable pruning techniques aim to retain semantically critical pathways during compression [
57]. However, achieving human-understandable explanations under strict resource constraints remains an open problem, demanding the co-design of interpretable architectures and explanation-friendly compression algorithms specifically for edge intelligence.
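As one concrete reading of the surrogate-modeling idea mentioned above, the sketch below fits a shallow decision tree to mimic the predictions of a compressed black-box model, yielding human-readable rules together with a fidelity score. The stand-in `edge_model_predict` function and the feature names are purely hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def edge_model_predict(x):
    """Stand-in for a compressed edge model whose internals are opaque."""
    return (x[:, 0] + 0.5 * x[:, 1] > 0).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = edge_model_predict(X)                                    # query the black-box model

surrogate = DecisionTreeClassifier(max_depth=3).fit(X, y)    # interpretable proxy
print(f"Surrogate fidelity on probe data: {surrogate.score(X, y):.2%}")
print(export_text(surrogate, feature_names=["f0", "f1", "f2", "f3"]))
```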
5.3.3. Privacy and Security
Privacy and security issues present another significant challenge for edge intelligence technology [
58]. As edge intelligence models are deployed on edge devices for real-world applications, it is vital to protect these models from malicious attacks. Unlike cloud servers, which can be centrally monitored and secured, edge devices are widely distributed and often less protected, making them vulnerable to physical tampering or cyberattacks. In particular, edge AI in autonomous vehicles has already faced attacks that exploit sensor fusion techniques, where deliberately altered sensor data lead to catastrophic system failures. Deployed systems must therefore validate incoming data and detect tampering so that malicious inputs do not silently degrade model performance. For example, in the autonomous driving domain, Cao et al. [
59] identified serious technical vulnerabilities in the multi-sensor fusion perception technology currently used in autonomous vehicles. Malicious attackers could introduce deliberately designed obstacles that the model fails to detect, potentially leading to accidents. In the financial sector, attackers might reverse-engineer corporate confidential data through the confidence information outputted by the model [
60]. Edge intelligence models are particularly vulnerable to these attacks due to their fewer parameters and simpler network structures. Therefore, addressing privacy and security concerns is crucial when deploying edge intelligence technology to ensure the safe and effective use of edge models in practical applications.
Recent research emphasizes the growing threat landscape facing edge AI deployments. Model inversion, membership inference, and side-channel attacks are increasingly targeting lightweight models deployed at the edge [
61,
62,
63]. Federated learning combined with differential privacy and secure aggregation protocols has emerged as a potential solution to mitigate such risks [
64]. Moreover, recent work explores adversarially robust model compression and physically tamper-resistant AI chips to enhance edge security [
65]. However, ensuring comprehensive security across diverse and physically exposed edge nodes still requires advances in decentralized trust frameworks and lightweight yet effective defense mechanisms.
5.3.4. Energy Efficiency
One of the most pressing challenges for edge intelligence is achieving energy efficiency, particularly in edge devices like drones, wearables, and IoT systems that are often battery-powered. These devices must balance the need for high computational performance with the constraints of limited power availability. Traditional AI models require significant computational resources, which, when deployed on edge devices, can rapidly drain battery life and limit operational duration. This makes optimizing energy consumption a key priority in edge AI research. The deployment of edge intelligence in energy-constrained environments also raises the issue of thermal management. High computational loads can generate significant heat, leading to overheating and reduced device efficiency. Efficient thermal design, alongside energy-aware algorithms, is essential to maintaining device performance while minimizing energy consumption.
Addressing energy efficiency challenges has become a focal point of current research. Techniques like early-exit networks, event-driven processing, and neuromorphic computing offer promising avenues for reducing energy consumption while maintaining AI performance [
66,
67,
68]. Moreover, adaptive inference strategies, where models dynamically adjust computation based on the complexity of inputs or task requirements, have shown effectiveness in prolonging device battery life [
69]. Hardware-level innovations, such as near-memory processing and AI-specific accelerators (e.g., Eyeriss, Edge TPU), also contribute to reducing both energy and latency footprints. Nonetheless, balancing energy efficiency, model accuracy, and inference latency remains a complex multi-objective optimization problem that calls for joint hardware-software co-design.
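To make the early-exit idea concrete, the sketch below attaches a cheap auxiliary classifier after the first stage of a toy network and stops inference when its confidence exceeds a threshold; the architecture, dimensions, and the 0.9 confidence threshold are illustrative choices rather than a specific published design.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Two-stage classifier that exits after the cheap stage when confidence is high."""
    def __init__(self, in_dim=64, hidden=128, classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.exit1 = nn.Linear(hidden, classes)          # cheap auxiliary head
        self.stage2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.exit2 = nn.Linear(hidden, classes)          # full-depth head

    def forward(self, x, conf_threshold=0.9):
        h = self.stage1(x)
        p1 = torch.softmax(self.exit1(h), dim=-1)
        if p1.max().item() >= conf_threshold:             # confident enough: stop early
            return p1, "early"
        return torch.softmax(self.exit2(self.stage2(h)), dim=-1), "full"

model = EarlyExitNet().eval()
with torch.no_grad():
    probs, path = model(torch.randn(1, 64))
print(path, probs.argmax(dim=-1).item())
```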
5.4. Analytical Insights and Trade-Off Analysis
The deployment of edge intelligence necessitates careful consideration of trade-offs between performance, resource consumption, and application requirements. Based on our comparative analysis, we propose a decision framework (
Figure 4) that maps techniques to deployment scenarios.
Latency-Critical Systems (e.g., autonomous vehicles): Quantization and hardware-aware NAS are prioritized for low-latency inference.
Memory-Constrained Devices (e.g., microchip implants): Sparse models combined with binary quantization minimize memory footprint.
Privacy-Sensitive Applications (e.g., healthcare): Knowledge distillation avoids raw data transmission but requires robust teacher–student alignment.
A key insight is that no single technique universally optimizes all metrics. For example, while pruning reduces model size, it may increase training complexity for large-scale models. Similarly, federated learning enhances privacy but introduces communication overhead. Future research should focus on hybrid approaches (e.g., quantized sparse models) and dynamic adaptation mechanisms to balance these trade-offs.
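Read operationally, the framework amounts to a rule-based mapping from deployment constraints to candidate techniques. The toy sketch below makes that mapping explicit; the constraint names, the 8 MB memory threshold, and the suggested technique lists are illustrative simplifications of Figure 4, not a prescriptive tool.

```python
def suggest_techniques(latency_critical: bool, memory_budget_mb: float,
                       privacy_sensitive: bool) -> list[str]:
    """Toy rule-based mapping from deployment constraints to candidate edge AI techniques."""
    techniques = []
    if latency_critical:
        techniques += ["INT8 quantization", "hardware-aware NAS"]
    if memory_budget_mb < 8:
        techniques += ["pruning / sparse models", "binary quantization"]
    if privacy_sensitive:
        techniques += ["knowledge distillation", "federated learning"]
    return techniques or ["standard compression pipeline (prune, quantize, distill)"]

# Example: a latency-critical, memory-constrained device without privacy restrictions.
print(suggest_techniques(latency_critical=True, memory_budget_mb=4, privacy_sensitive=False))
```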
5.5. Emerging Advances in Edge Intelligence Technologies
Emerging research trends continue to push the boundaries of edge intelligence. Advanced post-training quantization methods, such as SmoothQuant [
12] and AWQ [
70], have demonstrated superior performance in large language model deployment with reduced latency and energy costs. In addition, techniques like federated distillation [
71], enabling collaborative model compression without data sharing, address privacy-preserving model training across distributed edge nodes.
On the hardware front, the rise of neuromorphic chips and in-memory computing architectures offers promising solutions for ultra-low-power AI inference at the edge [
72]. These hardware advances, coupled with adaptive learning algorithms capable of on-device personalization, are expected to shape the future of edge intelligence. Furthermore, the convergence of 6G networks and edge AI will create new paradigms for decentralized intelligence, integrating real-time AI processing into the network fabric [
73].
These emerging directions highlight the dynamic evolution of edge intelligence, suggesting that future systems will be highly adaptive, privacy-preserving, and energy-aware.
Figure 5 presents the enhanced decision framework for edge intelligence techniques. This framework incorporates emerging approaches such as federated distillation, post-training quantization, and neuromorphic computing, mapping them to suitable deployment scenarios and emphasizing hybrid and adaptive methods to balance accuracy, latency, privacy, and energy efficiency.
6. The Future of Edge Intelligence
The evolution of edge intelligence is closely tied to innovations in hardware design, advanced battery materials, and next-generation network connectivity, which together enable efficient, real-time processing across diverse applications.
Hardware design innovations for edge intelligence. Effective deployment of AI at the edge demands hardware designed for high-performance, low-power computation. Conventional hardware architectures, such as general-purpose CPUs and GPUs, have limitations in energy efficiency and computational density for edge applications. To address this, specialized AI hardware, including application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs), has been developed to support complex AI processing on edge devices with significantly lower power consumption.
In particular, AI accelerators tailored for edge tasks are evolving with compact and efficient architectures, enabling real-time processing on low-resource devices. Techniques like in-memory computing reduce latency by allowing computation directly in memory, thus bypassing data transfer bottlenecks. Additionally, neuromorphic computing, which mimics neural processes in the brain, is a promising field aimed at enabling energy-efficient, adaptive learning on edge devices. Such hardware advances are crucial for applications requiring high-speed decision-making, such as autonomous vehicles, drones, and real-time health monitoring.
Advanced battery materials and energy efficiency: Battery life remains a critical factor in the practicality of edge devices, particularly in mobile and remote settings. Advances in battery technology are vital to supporting these energy-intensive AI operations sustainably. Emerging materials, such as solid-state batteries, offer increased energy density and safety compared to conventional lithium-ion options. Solid-state batteries also promise faster charging times, allowing edge devices to operate longer and recharge less frequently, which is essential for applications like remote environmental sensors or autonomous systems where manual recharging is challenging.
Beyond battery advancements, energy-harvesting technologies like solar and thermal capture are becoming more feasible for edge devices, enabling autonomous operation without continuous battery dependency. These power solutions are particularly valuable for low-power, continuous monitoring applications, such as agricultural IoT systems and wearable health trackers, which can operate efficiently with periodic solar or ambient thermal energy.
Next-generation network connectivity: Network connectivity underpins edge intelligence by enabling data communication and synchronization across devices. The deployment of 5G networks has been transformative, offering ultra-reliable, low-latency connections that support the high bandwidth required by edge applications. For instance, 5G enables real-time coordination in autonomous drones, smart manufacturing, and smart cities, where distributed edge devices need to communicate and make decisions quickly.
Looking further, 6G networks are anticipated to expand these capabilities by introducing even higher data transfer speeds and lower latency, leveraging terahertz (THz) frequencies. These networks will also integrate AI capabilities directly within the network architecture, enabling distributed AI processing and reducing dependency on central cloud servers. For edge applications like connected vehicles, this low-latency, high-bandwidth connectivity will allow devices to respond to environmental changes almost instantaneously, a crucial requirement for safety in autonomous driving and robotics applications.
Integrated future of edge intelligence: The future of edge intelligence will be marked by the integration of advanced hardware, sustainable power sources, and robust connectivity solutions, creating an ecosystem of capable, autonomous, and energy-efficient edge devices. These technological advancements will expand the range of AI applications across various fields, from healthcare and environmental monitoring to smart cities and autonomous systems. For example, autonomous medical devices, powered by durable solid-state batteries and connected via 5G, could provide continuous, reliable monitoring and emergency response capabilities without reliance on cloud infrastructure. Similarly, industrial IoT systems could leverage energy-harvesting sensors and neuromorphic processors to enable real-time predictive maintenance and resource optimization.
In summary, advancements in hardware design, battery technology, and network connectivity are essential for realizing the full potential of edge intelligence. Together, they promise to make intelligent, responsive, and privacy-preserving AI applications feasible on an unprecedented scale, marking a new era of ubiquitous AI-driven innovation.
7. Conclusions
Edge intelligence has emerged as a pivotal technology to address the growing demand for deploying artificial intelligence (AI) on resource-constrained edge devices. This paper has reviewed the key advancements in edge intelligence in recent years. The applications of edge intelligence in domains like healthcare, industrial IoT, autonomous vehicles, and consumer technology highlight its potential to revolutionize various sectors by enabling real-time decision-making closer to the data source.
Despite its promise, edge intelligence faces significant challenges, particularly in scaling large AI models for edge deployment, improving model interpretability, and addressing privacy and security concerns. Energy efficiency remains a critical constraint, especially for battery-operated devices, and requires ongoing innovations in both hardware design and algorithmic efficiency. Furthermore, the interpretability of edge models is crucial for their adoption in sensitive fields such as healthcare and law, where understanding the decision-making process is essential. Security threats, including model tampering and adversarial attacks, also pose substantial risks to the widespread deployment of edge AI.
Looking ahead, the future of edge intelligence will be shaped by advances in specialized AI hardware, durable battery technologies, and high-speed connectivity. Low-power AI chips, solid-state and energy-harvesting batteries, and 5G/6G networks will enable efficient, real-time processing and communication among edge devices. These developments will drive privacy-preserving, intelligent applications across healthcare, autonomous systems, and smart cities, unlocking scalable and sustainable edge intelligence.
As AI continues to evolve, edge intelligence will play an increasingly important role in bringing intelligent systems closer to the end user, facilitating faster, more secure, and efficient processing in real-world environments. The synergy between edge intelligence and ongoing technological advancements heralds a new era of ubiquitous AI, poised to transform industries and improve the quality of life globally.
This review thus contributes by bridging existing knowledge gaps, offering a comprehensive framework that connects edge intelligence techniques, deployment scenarios, and future directions. It provides valuable insights for researchers and practitioners aiming to tackle the scalability, security, and interpretability challenges of AI at the edge.