Key Considerations for Real-Time Object Recognition on Edge Computing Devices

Surantha, Nico; Sutisna, Nana

doi:10.3390/app15137533

Open AccessReview

Key Considerations for Real-Time Object Recognition on Edge Computing Devices

by

Nico Surantha

^1,*

and

Nana Sutisna

²

¹

Department of Electrical, Electronics and Communication Engineering, Faculty Science and Engineering, Tokyo City University, Setagaya-ku, Tokyo 158-8557, Japan

²

School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung 40132, Indonesia

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(13), 7533; https://doi.org/10.3390/app15137533

Submission received: 20 February 2025 / Revised: 21 June 2025 / Accepted: 1 July 2025 / Published: 4 July 2025

(This article belongs to the Special Issue Edge AI for Real-Time Object Recognition: Innovations, Applications, and Challenges)

Download

Browse Figures

Versions Notes

Abstract

The rapid growth of the Internet of Things (IoT) and smart devices has led to an increasing demand for real-time data processing at the edge of networks closer to the source of data generation. This review paper introduces how artificial intelligence (AI) can be integrated with edge computing to enable efficient and scalable object recognition applications. It covers the key considerations of employing deep learning on edge computing devices, such as selecting edge devices, deep learning frameworks, lightweight deep learning models, hardware optimization, and performance metrics. An example of an application is also presented in this article, which is about real-time power transmission line detection using edge computing devices. The evaluation results show the significance of implementing lightweight models and model compression techniques such as quantized Tiny YOLOv7. It also shows the hardware performance on some edge devices, such as Raspberry Pi and Jetson platforms. Through practical examples, readers will gain insights into designing and implementing AI-powered edge solutions for various object recognition use cases, including smart surveillance, autonomous vehicles, and industrial automation. The review concludes by addressing emerging trends, such as federated learning and hardware accelerators, which are set to shape the future of AI on edge computing for object recognition.

Keywords:

edge AI; object recognition; deep learning; edge computing; real-time inspection

1. Introduction

The proliferation of Internet of Things (IoT) devices and the increasing demand for real-time applications are driving the need for efficient object recognition capabilities at the network edge [1,2]. Traditional cloud-based approaches face challenges in meeting the latency, bandwidth, and privacy requirements of these applications. Edge computing, which brings computation and data storage closer to the data sources, offers a promising solution by enabling local processing and reducing reliance on the cloud [3]. The proximity of edge computing can support scalability and privacy-policy enforcement for the IoT, highly responsive cloud services for mobile computing, and the capacity to conceal temporary cloud outages [4].

Integrating artificial intelligence (AI), particularly deep learning techniques, with edge computing architectures is transforming object recognition capabilities across various domains [5]. This powerful combination, known as edge AI or edge intelligence, allows for real-time object detection, tracking, and classification on resource-constrained edge devices, opening up new possibilities for intelligent systems. From smart surveillance systems that can identify suspicious activities to autonomous vehicles that can navigate complex environments, edge AI is crucial in enhancing safety, efficiency, and automation [1,6]. The emergence of edge AI is motivated by rapid improvement in edge devices and single-board computer technology. Some edge devices with high-performance hardware accelerators are on the market, such as NVIDIA Jetson, Google Coral, FPGA, mobile device, and drone platform [7]. This emergence of edge devices is also accompanied by the emergence of special deep learning frameworks for edge devices, such as TensorflowLite [8], TensorRT [9], ONNX [10], OpenCL [11], and VitisAI [12].

Despite their potential, deploying and optimizing deep neural networks for object recognition on edge devices presents unique challenges related to the edge device, object recognition, security and performance evaluation aspects. Edge devices often have limited processing power, memory, and battery life, making it essential to design and implement AI models that are lightweight, efficient, and adaptable to these constraints [13]. Techniques like model compression, network pruning, and hardware-aware neural architecture search are crucial for tailoring deep learning models to the specific requirements of edge hardware [14]. Furthermore, strategies for task offloading, model partitioning, and distributed inference are essential for optimizing resource utilization and achieving real-time performance [15]. In terms of object recognition, there are several well-known challenges. For examples, in terms of aerial object detection using drones, there are challenges related to the object detection in challenging weather conditions [16], small object detection [17], and object detection with varying orientations [18].

In terms of security, some problems have been raised about privacy policy enforcement. Edge computing offers the opportunity to preserve privacy by not uploading all sensitive data to cloud infrastructure. The federated learning approach is one enabler which encourages the training process to be performed on the edge device, with only training weight being uploaded to cloud platform to update the global model [19]. On the other hand, there are also concerns about the reliability of edge AI systems against some well-known attacks, such as adversarial attacks and overload attacks, which can reduce the accuracy and latency performance of the systems [20,21]. In terms of evaluation performance, edge AI is different to traditional AI, which is only evaluated based on detection accuracy performance. An excellent edge AI solution needs to balance between detection accuracy, real-time performance, energy consumption, security, and reliability, depending on the application [22,23,24].

This article provides a comprehensive exploration of AI in edge computing for object recognition applications, covering the fundamentals, challenges, and emerging trends in this rapidly evolving field. It examines the benefits, challenges, and deployment strategies for effectively integrating AI algorithms with edge computing infrastructure. Through practical examples and case studies, readers will gain insights into designing and implementing AI-powered edge solutions for various object recognition use cases, including smart surveillance, autonomous vehicles, and industrial automation. This article also addresses emerging trends, such as federated learning and hardware accelerators, which are set to further enhance the capabilities and efficiency of AI in edge computing for object recognition. The contributions of this paper are summarized as follows

Key considerations of object recognition in edge computing based on literature study and our research experience in this field.
Case study of object recognition in real-time power line inspection with edge computing. In this part, a new evaluation metric, such as the edge–AI deployment score (EADS), is also proposed.
Future research suggestions for object recognition in edge computing.

This paper is organized as follows. In Section 2, a system overview of the edge computing architecture is discussed. Section 3 discusses the key considerations of object recognition on edge computing platforms. Section 4 discusses a case study of power line transmission using edge computing platforms. Section 5 discusses the future research direction. Finally, the conclusion is presented in Section 6.

2. System Overview

In this section, a system overview of edge AI, including the general architecture, benefits, and challenges of the systems, are discussed. Edge computing architectures are designed to bring computation closer to the data sources, reducing latency and improving the performance of AI applications. They involve a hierarchical structure that distributes processing tasks across different layers, including end devices, edge servers, and the cloud. This is shown in Figure 1.

Edge devices: This layer comprises devices like smartphones, IoT sensors, wearable devices, drones, and surveillance cameras. These devices may have limited computational capabilities but play a crucial role in data collection and potential pre-processing. Edge devices can perform lightweight AI computations locally, reducing reliance on the cloud. Examples include smartphones running facial recognition apps or smart speakers responding to voice commands [25]. Edge AI implementation includes several components, i.e., deep learning models, deep learning frameworks for edge devices, hardware accelerators, task processors, power, and storage. In this paper, we try to emphasize how to optimize edge AI implementation in each of these components.
Edge servers: This layer consists of more powerful computing resources located closer to end devices, such as cloudlets, fog nodes, edge servers, routers, base stations, and IoT gateways. These nodes handle more complex AI tasks that are offloaded from end devices or require the aggregation of data from multiple sources. They can perform tasks like object detection in video streams, data filtering, and local model updates [26].
Cloud: The cloud layer acts as a central hub with substantial computational and storage resources [27]. It is responsible for tasks that require extensive processing power or involve large datasets. These tasks may include training complex deep learning models, performing large-scale data analytics, and managing and orchestrating the overall edge computing infrastructure. The cloud can also provide backup and support for edge nodes, ensuring service continuity.

Communication between these layers is crucial for efficient data flow and task distribution. Edge computing frameworks include KubeEdge, OpenEdge, and Azure IoT [28]. Edge computing facilitates this communication and enables the deployment and management of AI services across the edge computing ecosystem. The specific architecture and task distribution depend on the application requirements, device capabilities, and network conditions. It is important to design edge computing architectures that are scalable, flexible, and adaptive to the dynamic nature of edge environments.

There are some benefits of using edge computing for object recognition applications [1,5].

Reduced latency and agile service response: Edge computing reduces latency by processing data closer to the data source, leading to faster response times. This is particularly beneficial for AI applications that require real-time interactions, like autonomous driving, smart cities, and industrial automation. Edge systems serve as nearby computing platforms that process data locally and quickly. Some studies have been developed to optimize latency reduction in edge AI implementation [29,30]
Reduced bandwidth: Edge computing alleviates the burden on the core network by processing data locally, reducing the amount of data transmitted to the cloud. This is especially important with the explosive growth in data generated by IoT devices. It can also reduce reliance on expensive cloud computing resources. A recent study by Rouf et al. reported that the implementation of AI in edge computing can reduce bandwidth usage by 51% compared to implementation on the cloud [31]
Enhanced privacy and security: Processing sensitive data locally on edge devices reduces the risk of data breaches during transmission to the cloud. Edge AI can analyze information locally without exposing it to humans, potentially improving data security. Some studies have reported efforts to improve the security and privacy of edge AI implementation against several threat models [32,33].
Improved reliability and availability: Edge computing enables AI systems to operate even with intermittent or no internet connectivity. This decentralized nature enhances the reliability and availability of critical AI applications. A recent study by Aral et al. proposed dependency- and topology-aware failure resilience (DTFR), a two-stage scheduler that minimizes failure probability while maintaining low network delay to achieve superior availability performance [34].
Real-time data analysis: Edge AI systems can analyze data in real time, allowing for faster decision making and responses to changing conditions. This is crucial for applications like autonomous vehicles and smart homes, which require immediate actions based on sensor data. A recent study by Al Amin et al. reported an inference latency of 15 frames per second (FPS) on the FPGA platform [35].
Energy efficiency: Edge AI can improve efficiency by processing data locally, significantly reducing the amount of data transmitted to a cloud server. It also reduces reliance on energy-intensive data centers. Finally, the deep learning model is executed on resource-constrained edge devices with a lower power consumption. For example, Xiangjie et al. proposed a deep reinforcement learning-based edge computing scheme that can reduce energy costs by more than 15% while ensuring the task can be completed on time [36]. On the other hand, Mendula et al. proposed a novel middleware for adaptive and efficient split computing that can preserve up to 30% energy while achieving a 16% higher accuracy rate [37].

However, there are some potential challenges in system development that require us to reconsider every approach or method that will be used in the development process, such as the following.

Development cost: There is a need to reduce development costs to produce a product with a competitive price. The cost depends on several factors, such the selection of the device, the development time, and the hiring of engineers with special skills.
The demand of the applications: Every application demands different requirements. For example, autonomous driving systems requires the real-time performance [6], while the autonomous drones require an energy-efficient system [38].
Human resources: There are some development approaches that requires special expertise. For example, hardware design expertise is needed for FPGA system development. This expertise comes with a steep learning curve compared to other systems’ development, which will impact the development cost.
Ever-changing systems requirements: There rapid evolution of industry standards requires the system to be adaptable to requirement changes. Some requirements include accuracy performance, hardware performance, security and privacy, and interoperability [39].
Performance evaluation: There is a need to achieve a good balance some parameters, such as accuracy, latency, energy consumption, security, and reliability, depending on the application. Therefore, there is a need for good metrics that can represent an ideal implementation of edge AI [40].
Thermal driven performance collapse: Mobile GPU on edge devices can be leveraged to improve the processing performance of deep neural networks (DNNs). However, this consumes a large amount of energy. After a short period of time, the mobile device may become overheated and the processors are forced to reduce the clock speed, significantly reducing the inference speed [41].
Accuracy and latency attack on edge AI: Some studies reveal that the implementation of edge AI for objective recognition is also prone to the accuracy and latency attacks. Adversarial attacks introduce carefully crafted changes to the input data that can reduce the accuracy performance of a deep learning model [20]. Meanwhile, overload attacks can escalate the required computing costs during the inference time, consequently leading to an extended inference time for object detection. They present a significant threat, especially to systems with limited computing resources [21]

3. Key Considerations of Object Recognition on Edge Computing

In Section 2, the challenges of edge AI development are discussed. In this section, the key considerations to solving the challenges are discussed. There are some key considerations for efficiently deploying and optimizing deep learning for object detection on resource-constrained edge computing devices that we defined based on a couple years of research experience in this field [42,43,44,45,46]. In terms of computing resources, edge computing devices have limited procession capabilities compared to the cloud, making it challenging to run complex AI models for object detection, such as deep neural networks or the multistage processing pipeline [47]. Another consideration is the real-time processing requirements. Time-critical applications, such as object recognition, need low latency and high throughput, demanding efficient AI models on edge devices. As shown in Figure 2, some points that need to be considered when deploying deep learning on edge computing devices are defined in detail in Section 3.1, Section 3.2, Section 3.3, Section 3.4 and Section 3.5.

3.1. Edge Devices

An appropriate edge device with the necessary processing power, memory, and connectivity is crucial for efficient AI deployment. Edge AI usually contains a hardware accelerator, which is hardware that is designed to perform specific tasks more efficiently compared to the general-purpose processor. There are three well-known hardware accelerators for edge AI, which are a graphics processing unit (GPU), tensor processing unit (TPU), and field programmable gate array (FPGA) [48]. The GPU was initially designed to render graphics and perform parallel processing tasks. The main advantage of the GPU is its highly parallel architecture, which is ideal for matrix computations in deep learning [49]. The high performance of the GPU makes it excellent for training and inference tasks. However, it consumes significant power, making it less ideal for resource-constrained applications and typically more expensive than TPUs and FPGAs.

On the other hand, a TPU is a specialized hardware developed by Google for accelerating tensor-based operations, specifically for machine learning workloads [50]. It is highly optimized for matrix multiplications and tensor operations used in neural networks. It is also more energy efficient than a GPU. The limitation of the TPUs is its low flexibility, which means it is not well suited for general-purpose computations or tasks outside AI. It also has limited support for frameworks other than TensorFlow.

Finally, FPGA is a reconfigurable hardware that can be programmed to execute specific tasks efficiently. It is highly customizable for specific workloads, including AI and non-AI tasks [51]. It is also excellent for real-time applications and energy efficient for tailored AI inference tasks. However, developing an FPGA-based solution requires hardware design expertise with a steep learning curve compared to GPUs and TPUs. The development cycle is usually more prolonged due to the need for custom configuration. A comparison of the GPU, TPU, and FPGA is presented in Table 1.

The actual edge AI devices are presented in Table 2. NVIDIA Jetson Nano (Santa Clara, CA, USA) [52] and Jetson Orin Nano (Santa Clara, CA, USA) [53] are devices empowered by GPU. Google Coral Dev board [54] and Xilinx Kria KV 260 are empowered by TPU and FPGA, respectively. Raspberry Pi 4 (Cambridge, UK) is a device with no specific hardware accelerator but it is sometimes also used for edge AI applications [43]. Some end devices contain the embedded CPU and hardware accelerator, for example, the unmanned aerial vehicle (UAV) platform and mobile phones. Flight RB5 5G is a UAV platform produced by Qualcomm, with an embedded CPU and GPU [38]. In this platform, developers can easily implement TensorFlow, ONNX, or PyTorch-trained deep learning models by using the Qualcomm neural processing SDK for AI. On the other hand, the Samsung Galaxy S24 ultra is one of the mobile devices empowered by Qualcomm Snapdragon octa-core and Adreno. The deep learning model can be implemented on the mobile device using Qualcomm Neural Processing SDK or TensorFlow Lite. This capability can be used for various applications such as real-time object and scene recognition or portrait/background segmentation.

The selection of edge devices depends on the system requirement, such as the required system performance, development cost, and development time. For example, NVIDIA Jetson Orin Nano might produce a significant improvement in performance because it is equipped with a high-specs GPU from NVIDIA. It might also be easier to develop due to a wealth of resources and a well-developed community [53]. However, the cost of the device is the highest among all devices presented in Table 2. On the other, Raspberry Pi 4 might present the cheapest solution among all of them, but the performance is limited, especially for high-complexity and time-constrained applications [22].

3.2. Deep Learning Frameworks for Edge Implementation

Deep learning frameworks play a significant role in enabling edge AI for object recognition applications. These frameworks are designed to optimize the deployment of deep learning models in resource-constrained edge devices, ensuring real-time inference and efficient implementation on resource-constrained edge devices.

One of the well-known deep learning frameworks on edge devices is TensorFlow Lite. It supports model quantization for efficient deployment that can convert the model to lower-precision formats, such as floating point 16 bits (FP16) or integer 8 bits (INT8), to reduce memory usage and speed up inference [58]. Two types of quantization methods can be supported by TensorFlow Lite, which are post-training quantization and quantization-aware training (QAT) [59]. Post-quantization is a model optimization technique applied after training, where model weights and activations are reduced in precision (e.g., from FP32 to INT8) to improve inference efficiency; it is more straightforward but may lead to slight accuracy degradation. On the other hand, QAT simulates quantization during training by incorporating quantization effects into the loss function, resulting in a model that is more robust to precision reduction and retains higher accuracy after quantization.

For hardware acceleration, TensorFlow Lite can support layer fusion, fusing operations, e.g., convolution, activation, and batch normalization, into single kernels to minimize memory transfers and computation overhead. It also has the capability to offload computations to hardware accelerators, such as GPU or TPU, for real-time inference. TensorFlow Lite is ideal for TensorFlow-trained models and specific edge platforms, such as Google Coral and mobile devices. TensorRT from NVIDIA is also another framework that can support model quantization and layer fusion [9].

Some of the well-known deep learning frameworks are vendor-specific frameworks, such as TensorRT (NVIDIA), OpenVINO (Intel) [60], and VitisAI (Xilinx) [12]. If the flexibility in development or cross-compatibility to several hardware becomes a concern, then an ONNX framework can provide a solution. It provides cross-platform compatibility by acting as an open standard for representing deep learning models. This compatibility enables the seamless deployment and interoperability of models across various deep learning frameworks and hardware platforms [23]. For example, ONNX defines a standardized format (.onnx) for representing deep learning models, independent of the framework in which they were trained. Models from popular frameworks such as TensorFlow, PyTorch, and Keras can be converted to ONNX format. ONNX Runtime also supports various execution providers, which enable hardware-specific optimizations while maintaining compatibility; for example, CUDA execution provides optimized inference on NVIDIA GPUS, and OpenVINO execution provides model optimization in Intel-based edge devices.

Finally, for FPGA implementation, there are several well-known frameworks, such as Vitis AI, OpenCL [61], and FINN, that enable the efficient deployment of AI models on FPGA platforms by leveraging hardware-specific optimizations and parallel processing capabilities. Vitis AI, developed by Xilinx, offers pre-built tools and libraries for deploying optimized models on Xilinx FPGAs, supporting frameworks like TensorFlow and PyTorch while providing model pruning, quantization, and real-time performance [62]. OpenCL provides a standardized programming interface for creating portable FPGA implementations, allowing developers to write high-level, parallel code that is compiled into FPGA-compatible logic, making it versatile for custom AI applications [11]. FINN, a framework from Xilinx, focuses on ultra-low-latency, quantized neural networks, generating FPGA-optimized designs tailored for resource-constrained tasks like edge computing [63]. These frameworks make FPGAs accessible for deep learning, offering high energy efficiency, real-time processing, and customizable solutions for AI workloads in applications like autonomous systems, healthcare, and robotics. A summary of deep learning frameworks for edge implementation is shown by Table 3.

3.3. Lightweight Deep Learning Model

Lightweight deep learning models, as shown by Table 4, are specifically designed for efficient object recognition on resource-constrained edge devices. MobileNet uses depthwise separable convolutions, which factorize a standard convolution into a depthwise convolution and a 1 × 1 pointwise convolution to reduce computational complexity while maintaining significantly good accuracy, making it ideal for mobile and embedded systems [64]. MobileNet V2 [65] adds inverted residual blocks and linear bottlenecks, enhancing performance and memory efficiency while improving representational power for tasks like object detection and segmentation. MobileNet V3 [66] combined neural architecture search (NAS) and advances like SE (Squeeze-and-Excitation) blocks to optimize accuracy, latency, and energy consumption, providing both small (MobileNetV3-Small) and large (MobileNetV3-Large) variants for different use cases. The common characteristic across all versions of MobileNet is its focus on efficiency and scalability for mobile and edge devices, enabling high-performance deep learning with minimal computational and memory requirements.

Tiny YOLO, a smaller variant of the YOLO architecture, prioritizes real-time detection with reduced model complexity [67]. There are some commonly used Tiny YOLO model versions. Tiny YOLO v3 is the reduced version of the YOLOv3. YOLOv3 was developed based on a robust Darknet-53 framework employing 53 convolution layers [68] to improve performance. YOLOv3-tiny introduced much-reduced network depth, seven convolution layers, and six max-pooling layers, significantly reducing the processing time [69]. Tiny YOLO v7 [70] is part of the YOLOv7 series [71]. The main idea of YOLOv7 is to introduce trainable auxiliary architectures and optimize the network for real-time object detection. The modifications to create YOLOv7-tiny involved reducing the model size and using a simplified architecture. This significantly reduced the number of parameters to 6.2 million, compared to 36.9 million in YOLOv7. It also uses a leaky rectified linear unit (RELU), which has less complexity than the sigmoid linear unit (SILU) used in YOLOV7. Throughout their development, Tiny YOLO models have consistently aimed to balance speed, accuracy, and computational efficiency for surveillance, robotics, and IoT applications.

There are also some well-known lightweight deep learning models other than MobileNet and Tiny YOLO. EfficientNet-Lite [72], a streamlined version of EfficientNet [73], employs compound scaling to balance model size, accuracy, and speed, offering excellent performance for classification and detection tasks on edge platforms. NASNet-Mobile [74], developed through neural architecture search (NAS) techniques, provides high accuracy with an optimized structure for mobile devices, though it is slightly more resource-intensive than other models. SqueezeNet and ShuffleNet focus on extreme compactness; SqueezeNet [75] achieves high efficiency through 1 × 1 convolutions and parameter reduction, while ShuffleNet [76] leverages grouped convolutions and channel shuffling for fast and lightweight inference. These models are highly optimized for edge environments, enabling real-time and power-efficient object recognition in applications like IoT, autonomous systems, and AR/VR.

Table 4. Lightweight deep learning model.

Model	Key Features	Advantages	Limitations	Common Use Cases
MobileNet [64] MobileNetV2 [65] MobileNetV3 [66]	Designed for mobile and edge devices, focusing on efficiency.	Low computational cost; supports quantization	Slightly lower accuracy compared to larger models for complex tasks	Real-time image classification; object detection
TinyYOLO [69]	Simplified YOLO model with fewer layers and parameters.	High inference speed; supports smaller devices	Accuracy trade-off for speed	Real-time surveillance; autonomous driving
EfficientNetLite [73]	Optimized version of EfficientNet for edge devices with compound scaling.	Excellent trade-off between accuracy and efficiency	Requires specific tuning for edge deployment	Image classification; object detection demanding high accuracy
NASNetMobile [74]	AutoML-designed lightweight architecture for mobile devices.	High accuracy; optimized structure	More resource-intensive than MobileNet	Mobile classification and detection tasks
SqueezeNet [75]	Compact architecture using 1 × 1 convolutions to reduce parameters.	Very small model size; low latency.	Lower accuracy than modern architectures	Resource-constrained applications; resource constraint hardware like FPGA
ShuffleNet [76]	Uses pointwise group convolution and channel shuffle for efficiency.	High speed; low computational cost	Slightly lower accuracy compared to MobileNet, may require specialized hardware for optimal performance	Embedded systems, AR/VR applications

Another technique to building a lightweight deep learning model is using model compression techniques. Model compression techniques are essential for deploying efficient object recognition models on edge AI devices, where resources like memory, storage, and processing power are limited [77]. There are five well-known model compression techniques, as listed in Table 5. Model quantization reduces parameter precision from floating-point (e.g., floating points (FPs) 32 bits) to lower-bit formats (e.g., integer (INT) 8 bits), significantly shrinking memory usage and accelerating inference on constrained edge devices [78]. On the other hand, model pruning and sparsity techniques focus on removing redundant weights, channels, or layers to compress the network structure, thus reducing storage and computational resource usage [79]. Knowledge distillation transfers knowledge from a larger, complex model (the teacher) to a smaller, more efficient model (the student), allowing the student model to achieve similar performance while using significantly fewer parameters [80]. Additionally, low-rank factorization breaks large-weight matrices into products of smaller matrices, lowering the total parameter count and decreasing the required number of multiply-accumulate operations [81]. Finally, layer fusion can combine multiple layers into one to reduce computational overheads and improve efficiency [82]. Model optimization techniques can be applied individually or in combination to achieve the best balance between accuracy and resource consumption for specific edge devices.

3.4. Hardware Optimization

Hardware optimization is crucial for implementing edge AI, given the resource constraints of edge devices. There are some well-known hardware optimization techniques, as defined in Table 6. Table 6 also cites the references and highlights the achievements in the research. Pipeline design divides tasks into sequential stages that can execute concurrently, maximizing throughput and reducing latency for real-time inference [83]. Parallel processing leverages multiple hardware resources to perform operations simultaneously, significantly improving inference speed but requiring careful synchronization [84]. Dataflow optimization focuses on minimizing data transfer delays by optimizing memory access patterns, which enhances processing efficiency, particularly in data-heavy tasks [85]. Winograd convolution accelerates convolutional layers by reducing the number of multiplications, making it ideal for optimizing CNN-based object recognition models [86]. Memory hierarchy optimization customizes on-chip memory usage to minimize reliance on slower off-chip memory, improving data locality and reducing access latency [87]. Approximate computing minorly sacrifices accuracy for faster, lower-power operations by reducing computation precision or complexity, making it suitable for energy-constrained devices like drones and IoT cameras [88,89]. These techniques, when combined, enable real-time, scalable AI deployment on edge devices with limited resources. For example, Tatar et al. [83]. and Rezk et al. [84] achieved a throughput of 24,715 GOP and 3458 Gbps for a proposing pipeline and parallel processing techniques, respectively. In another work, Shubham et al. achieved an average reduction of 49% in energy consumption for their proposed tunable compressor, and an average reduction of 36% in energy consumption for the proposed multiply and accumulate (MAC) unit [89]. However, each optimization requires careful tuning to balance accuracy, speed, and hardware constraints.

Another approach in hardware design for edge AI is hardware–software (HW-SW) co-design and co-optimization [90]. HW-SW co-design and co-optimization is a development approach that integrates both hardware and software optimizations to create efficient edge AI solutions for object recognition on FPGAs. This approach optimizes the AI model and algorithms alongside the hardware design to maximize performance, resource utilization, and power efficiency. Software optimizations involve model compression techniques such as quantization, pruning, and layer fusion, which reduce the computational load and memory usage for efficient inference. On the hardware side, some hardware optimization techniques mentioned in Table 6 can be utilized [91]. Additionally, on the hardware side, some ready-made FPGA-specific accelerators like the Deep Learning Processing Unit (DPU) can also be used to optimize tasks such as convolutions, activations, and data movement, enabling parallelism and pipelining.

Frameworks like Vitis AI simplify the co-design process by allowing pre-trained models to be compiled and deployed on FPGA hardware while providing hardware-specific optimizations. Additionally, developers can create custom accelerators using High-Level Synthesis (HLS) to tailor the hardware for specific AI workloads. The integration of both hardware and software improvements ensures low-latency, real-time object recognition, making this co-design approach ideal for applications like autonomous vehicles, surveillance, and industrial automation. By aligning hardware capabilities with model requirements, edge AI systems achieve both high performance and scalability in resource-constrained environments.

Another term that is also used is cross-layer optimization [92]. It involves optimizing multiple layers of the system stack, which are the application layer, the model layer, the framework/runtime layer, firmware, and the hardware layer [93]. They are all optimized jointly rather than in isolation. This ensures efficiency, low latency, and reduced power consumption. Cross-layer optimization is essential in edge AI because it aligns AI models, hardware, and software to work together efficiently. Instead of optimizing each layer separately, collaborative optimization across layers leads to better power efficiency, lower latency, and improved overall system performance.

The smart surveillance system serves as a use case example for cross-layer optimization. In the application layer, we can simplify the detection operations to detect only suspicious human movements to remove unnecessary computations. In the model layer, we can utilize a lightweight deep learning model (Table 4) and model compression technique (Table 5) to achieve real-time inference. In the framework/runtime layer, the framework, such as VitisAI or TensorRT, can optimize data flow and fuse layers to accelerate processing. Finally, in the hardware layer, we can use hardware optimization techniques (Table 6) or a custom-made hardware accelerator such as DPU or GPU to process the deep neural network operations efficiently. As a result of joint optimization, we obtain the low-power and real-time object recognition with minimal latency.

3.5. Edge Performance Metrics

It is essential to evaluate the development of edge AI solutions on metrics beyond just accuracy, including processing time, hardware utilization model size, and power consumption. There are several categories of metric performance, which are AI model performance (Table 7), hardware performance (Table 8), communication and network performance (Table 9), security and reliability (Table 10). Inference accuracy denotes the percentage of correctly classified or detected objects, which ensures reliable recognition results [94]. Some commonly used accuracy metrics are accuracy, F1-score, precision, and recall [95]. Processing time measures the real-time performance of the proposed solutions. It is critical for real-time applications, such as self-driving vehicles, industrial automation, and precision farming. The processing time can be represented by some metrics, such as inference time or latency [96]. Other than processing time, the compute efficiency can also be evaluated using the ratio of floating-point operations per second (FLOPs) and multiply-accumulate operations (MACs) [23]. Lower FLOPs/MACs indicates better efficiency on edge devices. Finally, the model size metrics confirm if the deep learning model is deployable on devices with limited storage [97]. A summary of AI model performance metrics is shown in Table 7.

Hardware performance metrics measure the computing speed and hardware utilization efficiency. Compute throughput measures the number of inferences per second that hardware can perform. Hardware utilization (such as GPU, TPU, CPU, and RAM utilization) and power consumption (consumed power frame and battery life) are standard metrics that can measure the resource-efficient implementation of edge AI solutions. Thermal efficiency is also an important metric to ensure that the designed hardware operates without overheating. A summary of the key performance metrics is shown in Table 8. Other than the two categories already discussed above, two other metric categories can also be evaluated depending on the implementation case. The communication and network metrics, as shown in Table 9, can be considered for hybrid edge–cloud implementation, especially for remote IoT applications. Meanwhile, the security and reliability metrics, as shown in Table 10, are significant for applications that contains sensitive data, for example, the digital health applications.

Table 11 summarizes existing related research and the performance metrics used in the evaluation [23,44,45,107,108,109,110,111]. Listed research in Table 11 is conducted on different hardware platforms, i.e., Jetson platform [23,107,109,110,111], Raspberry Pi [108], and FPGA [44]. mAP and F1-score are the most common accuracy parameters used. While a few studies also evaluated accuracy, precision, recall, for processing time performance, most studies in Table 11 evaluated inference time (ms), FPS, Giga FLOPS (FLOPS), and throughput in giga operations per seconds (GOPSs). Meanwhile, for hardware utilization, most of the studies in Table 11 evaluated energy consumption (Watt hour (WH)), RAM/CPU/GPU utilization for studies conducted on the Jetson platform, and block random access memory (BRAM), digital signal processing (DSP) elements, look-up table (LUT), and flip-flop (FF) utilizations for studies conducted on FPGA.

4. Case Study: Real-Time Power Line Inspection

In Section 3, the key considerations in object recognition development on edge computing devices have been discussed. In order to understand the real-implementation of the concepts and issues presented in previous sections, an example of an application in power transmission line inspection is presented in this section. This case study was selected by considering the need for real-time object recognition to be conducted on the edge devices that are installed on the drone [112,113,114]. The power transmission line should be regularly inspected to promptly identify and fix any damage to guarantee the effective and dependable transmission of high-voltage electricity. The traditional method, i.e., line crawling, is not safe because the possibility of the engineers falling from the wire or getting electrocuted is very high. Some other methods have been developed, such as using telescopes to observe the lines from the ground, power transmission line inspection robots (PTLIRs) [115], helicopter-assisted inspection, and automated helicopter-assisted inspection [116]. However, these methods mainly depend on human observation, are less effective, and present high risk [55].

In our study, we develop a power transmission line inspection system using edge computing platforms [22]. The developed deep learning framework for this research is shown in Figure 3. It primarily consists of two stages, which are the training phase and the real-time detection phase. In the training phase, the data generation, data annotation, and training are conducted. In this study, YOLOv7 and Tiny YOLOv7 are trained. Both are state-of-the-art methods for the object detection task. In the real-time detection phase, the trained model is evaluated on edge computing devices, which are Raspberry Pi 4B, Jetson Nano, and Jetson Orin Nano. The specification of each device is defined in Table 12. Some deep learning frameworks are also evaluated in this research, i.e., OpenVINO 2023.1.0, PyTorch 2.1.2, and TensorRT 8.2.1. The model quantization to FP16 and INT8 are also evaluated in this research. Finally, the final evaluation results are the mAP, inference time, RAM utilization, and power consumption.

The evaluation results are shown in Table 13. It shows that the detection processing time in Raspberry Pi 4B is significantly longer compared to other platforms, and is 16.4 seconds (s). From this result, we can understand that the YOLOv7 is not suitable for real-time detection as works on the Raspberry Pi 4B platform. However, some approaches can be taken to improve the performance, for example, by using the different edge devices such as Jetson Nano and Jetson Orin Nano. Our results show that YOLOv7 can achieve 0.33 s inference time using the FP16 quantized model on Jetson Nano with the TensorRT framework. Meanwhile, it can achieve 0.02 s inference time using the INT8 quantized model on Jetson Orin Nano with the TensorRT framework.

On the other hand, using YOLOv7-tiny on Raspberry Pi4B can reduce the inference time to 2.74 s. YOLOv7-tiny can achieve 0.06 s inference time using the FP16 quantized model on Jetson Nano with the TensorRT framework, and it can achieve a 0.008 s inference time using the INT8 quantized model on Jetson Orin Nano with the TensorRT framework. In terms of accuracy, the usage of YOLOv7-tiny and model quantization technique does not reduce the mAP significantly. YOLOv7-Tiny INT8 model implementation on Jetson Orin Nano with TensorRT It can still achieve 0.936 mAP. Finally, in terms of RAM utilization and power consumption, the implementation with Raspberry Pi4B achieves the lowest RAM utilization and power consumption.

From this result, we can also understand the impact of deep learning framework selection, for example, in Jetson Nano implementation. The implementation with TensorRT can reduce the inference time for every implementation case compared to PyTorch. For example, it can reduce the inference time for quantized YOLOv7 from 1.24 s to 0.33 s. For quantized YOLOv7-tiny, it can reduce the inference time from 0.23 s. to 0.06 s. This result can be obtained due to the hardware-specific optimization performed by TensortRT for NVIDIA GPUs, including those in Jetson devices [9].

In some previous work, researchers defined the metrics that define the balance between accuracy and complexity for practical usage. Canziani et al. defines an information density metric that evaluates the practical usage of an AI model by considering its accuracy and number of parameters [117]. On the other hand, Wong proposed the NetScore metric, which asses the performance of a deep neural network in practical usage by considering the accuracy of the network, number of parameters in the network, and the number of multiply-accumulate operation during network inference. Inspired by these works, we propose the edge-AI deployment score (EADS) (denoted by

Ε (N)

) metric, which asses the performance of edge AI deployment by considering the edge implementation accuracy (mAP), inference time, RAM utilization, and power consumption. The proposed EADS can be defined as follows:

Ε (N) = \frac{{a (N)}^{α}}{{t (N)}^{β} {u (N)}^{γ} {p (N)}^{δ}}

(1)

Ε_{s c a l e d} (N) = \frac{Ε (N) - Ε_{m i n} (N)}{Ε_{m a x} (N) - Ε_{m i n} (N)}

(2)

where

a (N)

is the accuracy of the network,

t (N)

is the inference time,

u (N)

is edge resource utilization (e.g., RAM, CPU), and

p (N)

is power consumption. On the other hand,

α, β, γ, δ

are coefficients that control the influences of accuracy, inference time, resource utilization, and power consumption, respectively. Finaly,

Ε_{s c a l e d} (N)

is the scaled-value of EADS that ensures the final results are scaled from 0 to 1. Table 13 shows the EADS results. Quantized YOLOv7-Tiny (INT8) implementation on Jetson Orin Nano shows the best score, 1.000, due to acceptable accuracy result and outstanding inference time results, while YOLO-v7 (FP-32) implementation shows the lowest score, 0.000, due to the very large value of inference time.

Although the implementation on Jetson Orin Nano achieves the lowest inference time, the cost the device is, however, much more expensive than Jetson Nano and Raspberry Pi. Therefore, further optimization of Raspberry Pi and Jetson Nano is required to achieve the best solution for object recognition on edge computing devices at a lower cost.

5. Future Directions

The previous sections discussed the current state of research and development in edge AI for object recognition. In this section, future issues and potential research directions are discussed. Some potential future research directions for future research and development in edge AI for object recognition are as follows:

Optimizing deep neural networks for edge devices: Edge AI systems for object detection often rely on deep neural networks (DNNs), which can be computationally intensive and require significant memory resources. Research should focus on developing more efficient DNN models specifically designed for edge devices with limited resources. This includes exploring lightweight models, which is to design DNN architectures with fewer parameters and operations, such as the RTSD-Net model proposed in [23], which simplifies the YOLOv4-tiny architecture for real-time strawberry detection on a Jetson Nano. In another work, Sabbir et al. proposed DeepCompress-VIT, which can achieve a high compression ratio (more than 14 times) to enable the entire model of vision transformer (VIT) to be stored in the limited storage space of edge devices [118].
Addressing real-time processing requirements: Object detection in many edge AI applications, such as autonomous driving, requires real-time or near real-time performance. Future research should focus on optimizing AI algorithms for speed, such as designing algorithms that can process data quickly and efficiently, such as the Edge YOLO system presented in [15], which modifies the YOLOv4 architecture for real-time object detection on edge devices in autonomous vehicles. On the other hand, it is also necessary to leverage hardware accelerators. Hardware accelerators can be designed to speed up specific parts of the object detection pipeline, such as convolutional operations or object tracking. There are many studies developed in this theme. For example, Adiono et al. proposed a fast and scalable multicore YOLOv3-Tiny accelerator using input stationery systolic architecture, which improved the FPS up to 2.23 times, compared to the existing method [45].
Improving energy efficiency: Edge devices often rely on battery power, making energy efficiency a crucial consideration for object detection systems. Research should focus on designing energy-efficient algorithms that minimize energy consumption without sacrificing accuracy. This may involve using lightweight models [43], optimizing the inference process [119], and employing techniques like model compression [77]. Another direction is to optimize hardware for low-power use, which is to develop energy-efficient hardware platforms and components specifically designed for edge AI applications. This includes utilizing low-power processors, optimizing memory access patterns, employing energy harvesting techniques, and employing cross-layer optimization techniques [92]. The edge AI architecture consists of a cloud server, edge server, and edge devices. Therefore, research related to task allocation, scheduling, and offloading among the architecture components can be developed further to improve the energy efficiency [37,107].
Addressing security and privacy concerns: Edge AI systems for object detection can be vulnerable to security threats and privacy breaches, for example, adversarial and overload attacks [120]. Future research should focus on developing secure and privacy-preserving algorithms that protect sensitive data and prevent malicious attacks. This includes techniques like federated learning [121], differential privacy, homomorphic encryption, or a combination of them. For example, Rahbari et al. proposed leveraging homomorphic encryption to design an efficient edge AI architecture with federated learning [122]. Another direction is to implement secure hardware and software platforms, as well as secure communication protocols for data transmission. For example, Wang et al. proposed a cost-efficient micro AI for securing edge devices against cyber attacks at hardware level, such as malware and side-channel attacks (SCAs) [123].

These research directions require collaboration from researchers in artificial intelligence, edge computing, hardware design, and security to develop innovative solutions in this rapidly evolving field.

6. Conclusions

In this paper, we present artificial intelligence for object recognition applications. The benefits, challenges, and deployment considerations are discussed. In this paper, we consider the key considerations in employing deep learning on edge computing devices, such as selecting edge devices, deep learning frameworks, lightweight deep learning models, hardware optimization, and performance metrics. The available techniques for all aspects are also discussed in this paper. A potential application for real-time power line inspection is presented in order to give a better understanding of a real implementation of the issues discussed. The evaluation results in several edge computing platforms show that the selections of a lightweight method, model compression method, deep learning framework, and appropriate edge hardware are significant in achieving the real-time requirement. A new metric, EADS, is also proposed in the paper. This metric asses the performance of edge AI deployment by considering the edge implementation accuracy, inference time, resource utilization, and power consumption. Finally, some recommendations are made for future research in this area.

Author Contributions

Conceptualization, N.S. (Nico Surantha); methodology, N.S. (Nico Surantha); investigation, N.S. (Nico Surantha) and N.S. (Nana Sutisna); resources, N.S. (Nico Surantha); writing—original draft preparation, N.S. (Nico Surantha); writing—review and editing, N.S. (Nana Sutisna); visualization, N.S. (Nico Surantha); funding acquisition, N.S. (Nico Surantha) and N.S. (Nana Sutisna). All authors have read and agreed to the published version of the manuscript.

Funding

The research was conducted as part of the Tokyo City University Prioritized Studies.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Singh, R.; Gill, S.S. Edge AI: A survey. Internet Things Cyber-Phys. Syst. 2023, 3, 71–92. [Google Scholar] [CrossRef]
Chang, Z.; Liu, S.; Xiong, X.; Cai, Z.; Tu, G. A Survey of Recent Advances in Edge-Computing-Powered Artificial Intelligence of Things. IEEE Internet Things J. 2021, 8, 13849–13875. [Google Scholar] [CrossRef]
Wang, X.; Han, Y.; Leung, V.C.M.; Niyato, D.; Yan, X.; Chen, X. Edge AI: Convergence of Edge Computing and Artificial Intelligence; Springer Nature: Berlin/Heidelberg, Germany, 2020; ISBN 9789811561863. [Google Scholar]
Satyanarayanan, M. The emergence of edge computing. Computer 2017, 50, 30–39. [Google Scholar] [CrossRef]
Liu, D.; Kong, H.; Luo, X.; Liu, W.; Subramaniam, R. Bringing AI to edge: From deep learning’s perspective. Neurocomputing 2022, 485, 297–320. [Google Scholar] [CrossRef]
Chen, C.; Wang, C.; Liu, B.; He, C.; Cong, L.; Wan, S. Edge Intelligence Empowered Vehicle Detection and Image Segmentation for Autonomous Vehicles. IEEE Trans. Intell. Transp. Syst. 2023, 24, 13023–13034. [Google Scholar] [CrossRef]
Kljucaric, L.; George, A.D. Deep Learning Inferencing with High-performance Hardware Accelerators. ACM Trans. Intell. Syst. Technol. 2023, 14, 1–25. [Google Scholar] [CrossRef]
Manor, E.; Greenberg, S. Custom Hardware Inference Accelerator for TensorFlow Lite for Microcontrollers. IEEE Access 2022, 10, 73484–73493. [Google Scholar] [CrossRef]
Jeong, E.; Kim, J.; Ha, S. TensorRT-Based Framework and Optimization Methodology for Deep Learning Inference on Jetson Boards. ACM Trans. Embed. Comput. Syst. 2022, 21, 1–26. [Google Scholar] [CrossRef]
Jajal, P.; Jiang, W.; Tewari, A.; Kocinare, E.; Woo, J.; Sarraf, A.; Lu, Y.H.; Thiruvathukal, G.K.; Davis, J.C. Interoperability in Deep Learning: A User Survey and Failure Analysis of ONNX Model Converters. In Proceedings of the ISSTA 2024: 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria, 16–20 September 2024; pp. 1466–1478. [Google Scholar] [CrossRef]
Prerad, D.; Ivanovic, Z.; Avramovic, A. Edge AI acceleration using OpenCL framework on NXP i.MX 8M Nano SoM. In Proceedings of the 2024 XV International Symposium on Industrial Electronics and Applications (INDEL), Banja Luka, Bosnia and Herzegovina, 6–8 November 2024; pp. 1–5. [Google Scholar] [CrossRef]
Wang, J.; Gu, S. FPGA Implementation of Object Detection Accelerator Based on Vitis-AI. In Proceedings of the 2021 11th International Conference on Information Science and Technology (ICIST), Chengdu, China, 21–23 May 2021; pp. 571–577. [Google Scholar] [CrossRef]
Foukalas, F.; Tziouvaras, A. Edge Artificial Intelligence for Industrial Internet of Things Applications: An Industrial Edge Intelligence Solution. IEEE Ind. Electron. Mag. 2021, 15, 28–36. [Google Scholar] [CrossRef]
Chen, J.; Ran, X. Deep Learning with Edge Computing: A Review. Proc. IEEE 2019, 107, 1655–1674. [Google Scholar] [CrossRef]
Liang, S.; Wu, H.; Zhen, L.; Hua, Q.; Garg, S.; Kaddoum, G.; Hassan, M.M.; Yu, K. Edge YOLO: Real-Time Intelligent Object Detection System Based on Edge-Cloud Cooperation in Autonomous Vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25345–25360. [Google Scholar] [CrossRef]
Gupta, H.; Kotlyar, O.; Andreasson, H.; Lilienthal, A.J. Robust Object Detection in Challenging Weather Conditions. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; pp. 7508–7517. [Google Scholar] [CrossRef]
Shao, X.Y.; Guo, Y.; Wang, Y.W.; Bao, Z.W.; Wang, J.Y. A small object detection algorithm based on feature interaction and guided learning. J. Vis. Commun. Image Represent. 2024, 98, 104011. [Google Scholar] [CrossRef]
Chen, S.; Ye, M.; Du, B. Rotation Invariant Transformer for Recognizing Object in UAVs. In Proceedings of the MM ’22: 30th ACM International Conference on Multimedia, Lisbon, Portugal, 10–14 October 2022; pp. 2565–2574. [Google Scholar] [CrossRef]
Khoa, T.A.; Van Nguyen, D.; Dao, M.S.; Zettsu, K. Fed xData: A Federated Learning Framework for Enabling Contextual Health Monitoring in a Cloud-Edge Network. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 4979–4988. [Google Scholar] [CrossRef]
Liang, H.; He, E.; Zhao, Y.; Jia, Z.; Li, H. Adversarial Attack and Defense: A Survey. Electronics 2022, 11, 1283. [Google Scholar] [CrossRef]
Chen, E.-C.; Chen, P.-Y.; Chung, I.-H.; Lee, C. Overload: Latency Attacks on Object Detection for Edge Devices. arXiv 2023, arXiv:2304.05370. [Google Scholar]
Surantha, N.; Yamashina, E.; Sato, Y.; Iwao, T. Power Transmission Line Component Detection Using YOLOv7 on Single-Board Computer Platforms. In Proceedings of the 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring), Singapore, 24–27 June 2024; pp. 1–6. [Google Scholar] [CrossRef]
Zhang, Y.; Yu, J.; Chen, Y.; Yang, W.; Zhang, W.; He, Y. Real-time strawberry detection using deep neural networks on embedded system (rtsd-net): An edge AI application. Comput. Electron. Agric. 2022, 192, 106586. [Google Scholar] [CrossRef]
Ullah, F.U.M.; Muhammad, K.; Haq, I.U.; Khan, N.; Heidari, A.A.; Baik, S.W.; De Albuquerque, V.H.C. AI-Assisted Edge Vision for Violence Detection in IoT-Based Industrial Surveillance Networks. IEEE Trans. Ind. Inform. 2022, 18, 5359–5370. [Google Scholar] [CrossRef]
Hua, H.; Li, Y.; Wang, T.; Dong, N.; Li, W.; Cao, J. Edge Computing with Artificial Intelligence: A Machine Learning Perspective. ACM Comput. Surv. 2023, 55, 1–35. [Google Scholar] [CrossRef]
Wu, Y.; Guo, H.; Chakraborty, C.; Khosravi, M.R.; Berretti, S.; Wan, S. Edge Computing Driven Low-Light Image Dynamic Enhancement for Object Detection. IEEE Trans. Netw. Sci. Eng. 2023, 10, 3086–3098. [Google Scholar] [CrossRef]
Kalyani, Y.; Collier, R. A systematic survey on the role of cloud, fog, and edge computing combination in smart agriculture. Sensors 2021, 21, 5922. [Google Scholar] [CrossRef]
Lingayya, S.; Jodumutt, S.B.; Pawar, S.R.; Vylala, A.; Chandrasekaran, S. Dynamic task offloading for resource allocation and privacy-preserving framework in Kubeedge-based edge computing using machine learning. Cluster Comput. 2024, 27, 9415–9431. [Google Scholar] [CrossRef]
Li, E.; Zeng, L.; Zhou, Z.; Chen, X. Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing. IEEE Trans. Wirel. Commun. 2020, 19, 447–457. [Google Scholar] [CrossRef]
Hawlader, F.; Robinet, F.; Frank, R. Leveraging the edge and cloud for V2X-based real-time object detection in autonomous driving. Comput. Commun. 2024, 213, 372–381. [Google Scholar] [CrossRef]
Rouf, A.; Arslan, E.; Charyyev, B. Latency and Bandwidth Benefits of Edge Computing for Scientific Applications. In Proceedings of the IEEE INFOCOM 2024—IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Vancouver, BC, Canada, 20 May 2024; pp. 1–6. [Google Scholar] [CrossRef]
Ahmad, I.; Kim, J.; Shin, S. Privacy-Preserving Uncertainty Calibration using Perceptual Encryption in Cloud-Edge Collaborative Artificial Intelligence of Things. IEEE Internet Things J. 2025, 12, 25424–25441. [Google Scholar] [CrossRef]
Wang, Y.; Nakachi, T. A privacy-preserving learning framework for face recognition in edge and cloud networks. IEEE Access 2020, 8, 136056–136070. [Google Scholar] [CrossRef]
Aral, A.; Brandic, I. Learning Spatiotemporal Failure Dependencies for Resilient Edge Computing Services. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 1578–1590. [Google Scholar] [CrossRef]
Al Amin, R.; Hasan, M.; Wiese, V.; Obermaisser, R. FPGA-Based Real-Time Object Detection and Classification System Using YOLO for Edge Computing. IEEE Access 2024, 12, 73268–73278. [Google Scholar] [CrossRef]
Kong, X.; Duan, G.; Hou, M.; Shen, G.; Wang, H.; Yan, X.; Collotta, M. Deep Reinforcement Learning-Based Energy-Efficient Edge Computing for Internet of Vehicles. IEEE Trans. Ind. Inform. 2022, 18, 6308–6316. [Google Scholar] [CrossRef]
Mendula, M.; Bellavista, P.; Levorato, M.; Contreras, S.L.d.G. A novel middleware for adaptive and efficient split computing for real-time object detection. Pervasive Mob. Comput. 2025, 108, 102028. [Google Scholar] [CrossRef]
Bai, Z.; Lin, Y.; Cao, Y.; Wang, W. Delay-Aware Cooperative Task Offloading for Multi-UAV Enabled Edge-Cloud Computing. IEEE Trans. Mob. Comput. 2024, 23, 1034–1049. [Google Scholar] [CrossRef]
Setyanto, A.; Sasongko, T.B.; Fikri, M.A.; Kim, I.K. Near-Edge Computing Aware Object Detection: A Review. IEEE Access 2024, 12, 2989–3011. [Google Scholar] [CrossRef]
Wong, A. Netscore: Towards universal metrics for large-scale performance analysis of deep neural networks for practical on-device edge usage. In Image Analysis and Recognition. ICIAR 2019. Lecture Notes in Computer Science; Karray, F., Campilho, A., Yu, A., Eds.; Springer: Cham, Switzerland, 2019; Volume 11663, pp. 15–26. [Google Scholar] [CrossRef]
Tan, T.; Cao, G. Thermal-Aware Scheduling for Deep Learning on Mobile Devices With NPU. IEEE Trans. Mob. Comput. 2024, 23, 10706–10719. [Google Scholar] [CrossRef]
Vincent; Darian, G.; Surantha, N. Performance Evaluation of Convolutional Neural Network (CNN) for Skin Cancer Detection on Edge Computing Devices. Appl. Sci. 2025, 15, 3077. [Google Scholar] [CrossRef]
Surantha, N.; Sugijakko, B. Lightweight face recognition-based portable attendance system with liveness detection. Internet Things 2024, 25, 101089. [Google Scholar] [CrossRef]
Adiono, T.; Putra, A.; Sutisna, N.; Syafalni, I.; Mulyawan, R. Low Latency YOLOv3-Tiny Accelerator for Low-Cost FPGA Using General Matrix Multiplication Principle. IEEE Access 2021, 9, 141890–141913. [Google Scholar] [CrossRef]
Adiono, T.; Ramadhan, R.M.; Sutisna, N.; Syafalni, I.; Mulyawan, R.; Lin, C.H. Fast and Scalable Multicore YOLOv3-Tiny Accelerator Using Input Stationary Systolic Architecture. IEEE Trans. Very Large Scale Integr. Syst. 2023, 31, 1774–1787. [Google Scholar] [CrossRef]
Surantha, N.; Wicaksono, W.R. Design of Smart Home Security System using Object Recognition and PIR Sensor. Procedia Comput. Sci. 2018, 135, 465–472. [Google Scholar] [CrossRef]
Tsirtsakis, P.; Zacharis, G.; Maraslidis, G.S.; Fragulis, G.F. Deep learning for object recognition: A comprehensive review of models and algorithms. Int. J. Cogn. Comput. Eng. 2025, 6, 298–312. [Google Scholar] [CrossRef]
Vandendriessche, J.; Wouters, N.; da Silva, B.; Lamrini, M.; Chkouri, M.Y.; Touhafi, A. Environmental sound recognition on embedded systems: From fpgas to tpus. Electronics 2021, 10, 2622. [Google Scholar] [CrossRef]
Guney, E.; Bayilmis, C.; Cakan, B. Correction: An implementation of real-time traffic signs and road objects detection based on mobile GPU platforms. IEEE Access 2022, 10, 86191–86203, Erratum in IEEE Access 2022, 10, 103587. [Google Scholar] [CrossRef]
Shahid, A.; Mushtaq, M. A Survey Comparing Specialized Hardware and Evolution in TPUs for Neural Networks. In Proceedings of the 2020 IEEE 23rd International Multitopic Conference (INMIC), Bahawalpur, Pakistan, 5–7 November 2020; pp. 1–6. [Google Scholar] [CrossRef]
Hu, Y.; Liu, Y.; Liu, Z. A Survey on Convolutional Neural Network Accelerators: GPU, FPGA and ASIC. In Proceedings of the 2022 14th International Conference on Computer Research and Development (ICCRD), Shenzhen, China, 7–9 January 2022; pp. 100–107. [Google Scholar] [CrossRef]
Swaminathan, T.P.; Silver, C.; Akilan, T. Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation. Procedia Comput. Sci. 2024, 260, 906–913. [Google Scholar] [CrossRef]
Scalcon, F.P.; Tahal, R.; Ahrabi, M.; Huangfu, Y.; Ahmed, R.; Nahid-Mobarakeh, B.; Shirani, S.; Vidal, C.; Emadi, A. AI-Powered Video Monitoring: Assessing the NVIDIA Jetson Orin Devices for Edge Computing Applications. In Proceedings of the 2024 IEEE Transportation Electrification Conference and Expo (ITEC), Chicago, IL, USA, 19–21 June 2024; pp. 1–6. [Google Scholar] [CrossRef]
Winzig, J.; Almanza, J.C.A.; Mendoza, M.G.; Schumann, T. Edge AI—Use Case on Google Coral Dev Board Mini. In Proceedings of the 2022 IET International Conference on Engineering Technologies and Applications (IET-ICETA), Changhua, Taiwan, 14–16 October 2022; pp. 2–3. [Google Scholar] [CrossRef]
Surantha, N.; Sukizaki, Y.; Yamashina, E.; Iwao, T. Power Transmission Line Component Detection using YOLO V3 on Raspberry Pi. In Proceedings of the 2023 22nd International Symposium on Communications and Information Technologies (ISCIT), Sydney, Australia, 16–18 October 2023; pp. 139–140. [Google Scholar] [CrossRef]
Ramesh, S.N.; Sarkar, M.; Audette, M.; Paolini, C. Efficient Real-time Fall Prediction and Detection using Privacy-Centric Vision-based Human Pose Estimation on the Xilinx^® Kria^TM K26 SOM. In Proceedings of the 2023 IEEE Biomedical Circuits and Systems Conference (BioCAS), Toronto, ON, Canada, 19–21 October 2023; pp. 1–5. [Google Scholar] [CrossRef]
Choi, P.; Kim, J.; Kwak, J. Impact of Joint Heat and Memory Constraints of Mobile Device in Edge-Assisted On-Device Artificial Intelligence. In Proceedings of the NetAISys ’24: 2nd International Workshop on Networked AI Systems, Tokyo, Japan, 3–7 June 2024; pp. 31–36. [Google Scholar] [CrossRef]
Pandey, J.; Asati, A.R. Lightweight convolutional neural network architecture implementation using TensorFlow lite. Int. J. Inf. Technol. 2023, 15, 2489–2498. [Google Scholar] [CrossRef]
Iqbal, U.; Davies, T.; Perez, P. A Review of Recent Hardware and Software Advances in GPU-Accelerated Edge-Computing Single-Board Computers (SBCs) for Computer Vision. Sensors 2024, 24, 4830. [Google Scholar] [CrossRef] [PubMed]
Zunin, V.V. Intel OpenVINO Toolkit for Computer Vision: Object Detection and Semantic Segmentation. In Proceedings of the 2021 International Russian Automation Conference (RusAutoCon), Sochi, Russia, 5–11 September 2021; pp. 847–851. [Google Scholar] [CrossRef]
Koo, Y.; Kim, S.; Ha, Y. guk OpenCL-Darknet: Implementation and optimization of OpenCL-based deep learning object detection framework. World Wide Web 2021, 24, 1299–1319. [Google Scholar] [CrossRef]
Fukuda, Y.; Yoshida, K.; Fujino, T. Evaluation of Model Quantization Method on Vitis-AI for Mitigating Adversarial Examples. IEEE Access 2023, 11, 87200–87209. [Google Scholar] [CrossRef]
Stanisz, J.; Lis, K.; Gorgon, M. Implementation of the PointPillars Network for 3D Object Detection in Reprogrammable Heterogeneous Devices Using FINN. J. Signal Process. Syst. 2022, 94, 659–674. [Google Scholar] [CrossRef]
Andreetto, M.; Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Automatic 3-D Modeling of Textured Cultural Heritage Objects View project MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. 2017. Available online: https://www.researchgate.net/publication/316184205 (accessed on 3 May 2025).
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
Chu, G.; Chen, L.; Chen, B.; Tan, M.; Howard, A.; Wang, W.; Sandler, M. Searching for MobileNetV3 Accuracy vs MAdds vs model size. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
Hussain, M. YOLOv1 to v8: Unveiling Each Variant-A Comprehensive Review of YOLO. IEEE Access 2024, 12, 42816–42833. [Google Scholar] [CrossRef]
Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. Comput. Vis Pattern Recognit. 2018, 1804, 1–8. [Google Scholar]
Adarsh, P.; Rathi, P.; Kumar, M. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; pp. 687–694. [Google Scholar] [CrossRef]
She, F.; Hong, Z.; Zeng, Z.; Yu, W. Improved Traffic Sign Detection Model Based on YOLOv7-Tiny. IEEE Access 2023, 11, 126555–126567. [Google Scholar] [CrossRef]
Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
Ab Wahab, M.N.; Nazir, A.; Ren, A.T.Z.; Noor, M.H.M.; Akbar, M.F.; Mohamed, A.S.A. Efficientnet-Lite and Hybrid CNN-KNN Implementation for Facial Expression Recognition on Raspberry Pi. IEEE Access 2021, 9, 134065–134080. [Google Scholar] [CrossRef]
Mingxing Tan, Q.V. Le EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Mingxing. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 6105–6114. Available online: https://proceedings.mlr.press/v97/tan19a/tan19a.pdf (accessed on 8 May 2025).
Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710. [Google Scholar] [CrossRef]
Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-Level Accuracy with 50× Fewer Parameters and Less Than 0.5MB Model Size. arXiv 2017, arXiv:1602.07360. [Google Scholar]
Zhang, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
Dantas, P.V.; Sabino da Silva, W.; Cordeiro, L.C.; Carvalho, C.B. A comprehensive review of model compression techniques in machine learning. Appl. Intell. 2024, 54, 11804–11844. [Google Scholar] [CrossRef]
Rokh, B.; Azarpeyvand, A.; Khanteymoori, A. A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification. ACM Trans. Intell. Syst. Technol. 2023, 14, 1–48. [Google Scholar] [CrossRef]
Jiang, Y.; Wang, S.; Valls, V.; Ko, B.J.; Lee, W.H.; Leung, K.K.; Tassiulas, L. Model Pruning Enables Efficient Federated Learning on Edge Devices. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 10374–10386. [Google Scholar] [CrossRef]
Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge Distillation: A Survey. Int. J. Comput. Vis. 2021, 129, 1789–1819. [Google Scholar] [CrossRef]
Chen, L.; Jiang, X.; Liu, X.; Zhou, Z. Logarithmic Norm Regularized Low-Rank Factorization for Matrix and Tensor Completion. IEEE Trans. Image Process. 2021, 30, 3434–3449. [Google Scholar] [CrossRef]
Muhammad, G.; Shamim Hossain, M. COVID-19 and Non-COVID-19 Classification using Multi-layers Fusion From Lung Ultrasound Images. Inf. Fusion. 2021, 72, 80–88. [Google Scholar] [CrossRef]
Tatar, G.; Bayar, S.; Çiçek, I.; Çiçek, Ł. Real-Time Multi-Learning Deep Neural Network on an MPSoC-FPGA for Intelligent Vehicles: Harnessing Hardware Acceleration With Pipeline. IEEE Trans. Intell. Veh. 2024, 9, 5021–5032. [Google Scholar] [CrossRef]
Rezk, A.A.; Madian, A.H.; Radwan, A.G.; Soliman, A.M. On-the-Fly Parallel Processing IP-Core for Image Blur Detection, Compression, and Chaotic Encryption Based on FPGA. IEEE Access 2021, 9, 82726–82746. [Google Scholar] [CrossRef]
Chen, T.; Tan, Y.-A.; Zhang, Z.; Luo, N.; Li, B.; Li, Y. Dataflow optimization with layer-wise design variables estimation method for enflame CNN accelerators. J. Parallel Distrib. Comput. 2024, 189, 104869. [Google Scholar] [CrossRef]
Alam, S.A.; Anderson, A.; Barabasz, B.; Gregg, D. Winograd Convolution for Deep Neural Networks: Efficient Point Selection. ACM Trans. Embed. Comput. Syst. 2022, 21, 1–28. [Google Scholar] [CrossRef]
Sethi, G.; Acun, B.; Agarwal, N.; Kozyrakis, C.; Trippel, C.; Wu, C.J. RecShard: Statistical Feature-Based Memory Optimization for Industry-Scale Neural Recommendation. In Proceedings of the ASPLOS ’22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February–4 March 2022; pp. 344–358. [Google Scholar] [CrossRef]
Dalloo, A.M.; Humaidi, A.J.; Al Mhdawi, A.K.; Al-Raweshidy, H. Approximate Computing: Concepts, Architectures, Challenges, Applications, and Future Directions. IEEE Access 2024, 12, 146022–146088. [Google Scholar] [CrossRef]
Garg, S.; Monga, K.; Chaturvedi, N.; Gurunarayanan, S. Tunable Energy-Efficient Approximate Circuits for Self-Powered AI and Autonomous Edge Computing Systems. IEEE Access 2025, 13, 43607–43630. [Google Scholar] [CrossRef]
Rashidi, B.; Gao, C.; Lu, S.; Wang, Z.; Zhou, C.; Niu, D.; Sun, F. UNICO: Unified Hardware Software Co-Optimization for Robust Neural Network Acceleration. In Proceedings of the MICRO ’23: 56th Annual IEEE/ACM International Symposium on Microarchitecture, Toronto, ON, Canada, 28 October–1 November 2023; pp. 77–90. [Google Scholar] [CrossRef]
Marchisio, A.; Hanif, M.A.; Putra, R.V.W.; Shafique, M. HW/SW co-design and co-optimizations for deep learning. In Proceedings of the INTESA ’18: Workshop on INTelligent Embedded Systems Architectures and Applications, Turin, Italy, 4 October 2018; pp. 13–18. [Google Scholar] [CrossRef]
Shafique, M.; Marchisio, A.; Putra, R.V.W.; Hanif, M.A. Towards Energy-Efficient and Secure Edge AI: A Cross-Layer Framework. In Proceedings of the 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), Munich, Germany, 1–4 November 2021; pp. 1–9. [Google Scholar] [CrossRef]
Hanif, M.A.; Shafique, M. Cross-Layer Optimizations for Efficient Deep Learning Inference at the Edge. In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing; Springer: Cham, Switzerland, 2024; pp. 225–248. Available online: https://link.springer.com/chapter/10.1007/978-3-031-39932-9_9 (accessed on 17 May 2025).
Colin, J.; Surantha, N. Interpretable Deep Learning for Pneumonia Detection Using Chest X-Ray Images. Information 2025, 16, 53. [Google Scholar] [CrossRef]
Padilla, R.; Passos, W.L.; Dias, T.L.B.; Netto, S.L.; Da Silva, E.A.B. A comparative analysis of object detection metrics with a companion open-source toolkit. Electronics 2021, 10, 279. [Google Scholar] [CrossRef]
Yokomine, Y.; Surantha, N. Evaluation of Real-Time Train Overhead Line Component Detection on Edge Device. In Proceedings of the 2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 18–21 February 2025; pp. 55–59. [Google Scholar] [CrossRef]
Dong, W.; Lv, J.; Chen, G.; Wang, Y.; Li, H.; Gao, Y.; Bharadia, D. TinyNet: A Lightweight, Modular, and Unified Network Architecture for the Internet of Things. In Proceedings of the MobiSys ’22: 20th Annual International Conference on Mobile Systems, Applications and Services, Portland, Oregon, 27 June–1 July 2022; Volume 1, pp. 248–260. [Google Scholar] [CrossRef]
Moosmann, J.; Bonazzi, P.; Li, Y.; Bian, S.; Mayer, P.; Benini, L.; Magno, M. Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO; Springer Nature: Cham, Switzerland, 2023; Volume 1, ISBN 9783031919893. [Google Scholar]
Yoo, Y.; Niu, Z.; Yoo, C.; Cheng, P.; Xiong, Y. SegaNet: An Advanced IoT Cloud Gateway for Performant and Priority-Oriented Message Delivery. In Proceedings of the APNet ’23: 7th Asia-Pacific Workshop on Networking, Hong Kong, China, 29–30 June 2023; pp. 54–60. [Google Scholar] [CrossRef]
Chen, C.; Min, H.; Peng, Y.; Yang, Y.; Wang, Z. An Intelligent Real-Time Object Detection System on Drones. Appl. Sci. 2022, 12, 10227. [Google Scholar] [CrossRef]
Wang, Y.; Wang, W.; Liu, D.; Jin, X.; Jiang, J.; Chen, K. Enabling Edge-Cloud Video Analytics for Robotics Applications. IEEE Trans. Cloud Comput. 2023, 11, 1500–1513. [Google Scholar] [CrossRef]
Maresch, M.; Nastic, S. VATE: Edge-Cloud System for Object Detection in Real-Time Video Streams. In Proceedings of the 2024 IEEE 8th International Conference on Fog and Edge Computing (ICFEC), Philadelphia, PA, USA, 6–9 May 2024; pp. 27–34. [Google Scholar] [CrossRef]
Vigil-Hayes, M.; Hossain, M.N.; Elliott, A.K.; Belding, E.M.; Zegura, E. LoRaX: Repurposing LoRa as a Low Data Rate Messaging System to Extend Internet Boundaries. In Proceedings of the COMPASS ’22: 5th ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies, Seattle, WA, USA, 29 June–1 July 2022; pp. 195–213. [Google Scholar] [CrossRef]
Khalid, N.; Qayyum, A.; Bilal, M.; Al-Fuqaha, A.; Qadir, J. Privacy-preserving artificial intelligence in healthcare: Techniques and applications. Comput. Biol. Med. 2023, 158, 106848. [Google Scholar] [CrossRef]
Wu, Y.; Peng, P.; Cai, B.; Li, L. Batch-in-Batch: A new adversarial training framework for initial perturbation and sample selection. Complex. Intell. Syst. 2025, 11, 132. [Google Scholar] [CrossRef]
Rech, P. Artificial Neural Networks for Space and Safety-Critical Applications: Reliability Issues and Potential Solutions. IEEE Trans. Nucl. Sci. 2024, 71, 377–404. [Google Scholar] [CrossRef]
Zhu, S.; Ota, K.; Dong, M. Energy-Efficient Artificial Intelligence of Things With Intelligent Edge. IEEE Internet Things J. 2022, 9, 7525–7532. [Google Scholar] [CrossRef]
Salim, R.J.; Sürantha, N. Masked face recognition by zeroing the masked region without model retraining. Int. J. Innov. Comput. Inf. Control 2023, 19, 1087–1101. [Google Scholar] [CrossRef]
Surantha, N.; Yose, E.; Isa, S.M. Low-Resolution Face Recognition for CCTV and Edge-Powered Smart Attendance Systems. In Proceedings of the 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), Osaka, Japan, 2–4 July 2024; pp. 676–681. [Google Scholar] [CrossRef]
Berardini, D.; Migliorelli, L.; Galdelli, A.; Marín-Jiménez, M.J. Edge artificial intelligence and super-resolution for enhanced weapon detection in video surveillance. Eng. Appl. Artif. Intell. 2025, 140, 109684. [Google Scholar] [CrossRef]
Yuan, Y.; Gao, S.; Zhang, Z.; Wang, W.; Xu, Z.; Liu, Z. Edge-Cloud Collaborative UAV Object Detection: Edge-Embedded Lightweight Algorithm Design and Task Offloading Using Fuzzy Neural Network. IEEE Trans. Cloud Comput. 2024, 12, 306–318. [Google Scholar] [CrossRef]
Ayoub, N.; Schneider-Kamp, P. Real-time on-board deep learning fault detection for autonomous UAV inspections. Electronics 2021, 10, 1091. [Google Scholar] [CrossRef]
Surantha, N.; Iwao, T.; Ren, Z.; Morishita, H. Digital Transformation on Power Transmission Line Inspection using Autonomous Drone and Deep Learning. In Proceedings of the 2022 2nd International Conference on Robotics, Automation and Artificial Intelligence (RAAI), Singapore, 9–11 December 2022; pp. 80–86. [Google Scholar] [CrossRef]
Schofield, O.B.; Iversen, N.; Ebeid, E. Autonomous power line detection and tracking system using UAVs. Microprocess. Microsyst. 2022, 94, 104609. [Google Scholar] [CrossRef]
Sawada, J.; Kusumoto, K.; Munakata, T.; Maikawa, Y.; Ishikawa, Y. A Mobile Robot for Inspection of Power Transmission Lines. IEEE Power Eng. Rev. 1991, 11, 57. [Google Scholar] [CrossRef]
Luque-Vega, L.F.; Castillo-Toledo, B.; Loukianov, A.; Gonzalez-Jimenez, L.E. Power line inspection via an unmanned aerial system based on the quadrotor helicopter. In Proceedings of the MELECON 2014—2014 17th IEEE Mediterranean Electrotechnical Conference, Beirut, Lebanon, 13–16 April 2014; pp. 393–397. [Google Scholar] [CrossRef]
Canziani, A.; Paszke, A.; Culurciello, E. An Analysis of Deep Neural Network Models for Practical Applications. arXiv 2016, arXiv:1605.07678. [Google Scholar]
Ahmed, S.; Al Arafat, A.; Najafi, D.; Mahmood, A.; Rizve, M.N.; Al Nahian, M.; Zhou, R.; Angizi, S.; Rakin, A.S. DeepCompress-ViT: Rethinking Model Compression to Enhance Efficiency of Vision Transformers at the Edge. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 30147–30156. [Google Scholar]
Achmadiah, M.N.; Ahamad, A.; Sun, C.C.; Kuo, W.K. Energy-Efficient Fast Object Detection on Edge Devices for IoT Systems. IEEE Internet Things J. 2025, 12, 16681–16694. [Google Scholar] [CrossRef]
Rupanetti, D.; Kaabouch, N. Combining Edge Computing-Assisted Internet of Things Security with Artificial Intelligence: Applications, Challenges, and Opportunities. Appl. Sci. 2024, 14, 7104. [Google Scholar] [CrossRef]
Aminifar, A.; Shokri, M.; Aminifar, A. Privacy-preserving edge federated learning for intelligent mobile-health systems. Futur. Gener. Comput. Syst. 2024, 161, 625–637. [Google Scholar] [CrossRef]
Rahbari, D.; Daneshtalab, M.; Jenihhin, M. An Efficient Architecture for Edge AI Federated Learning with Homomorphic Encryption. IEEE Access 2025, 13, 97919–97929. [Google Scholar] [CrossRef]
Wang, H.; Sayadi, H.; Pudukotai Dinakarrao, S.M.; Sasan, A.; Rafatirad, S.; Homayoun, H. Enabling Micro AI for Securing Edge Devices at Hardware Level. IEEE J. Emerg. Sel. Top. Circuits Syst. 2021, 11, 803–815. [Google Scholar] [CrossRef]

Figure 1. General architecture.

Figure 2. Key considerations of object recognition on edge computing.

Figure 3. Deep learning framework for real-time power line inspection.

Table 1. TPU vs. GPU vs. FPGA-based devices.

Features	GPU [49]	TPU [50]	FPGA [51]
Flexibility	High—general purpose computing tasks	Moderate—not well suited for other tasks outside AI	Very high—highly customizable at hardware level
Performance	High—good for training and inference	High—optimized for machine learning inference	Moderate to High—Can achieve high performance if optimized correctly
Ease of use	High—Easy with frameworks	High—Easy with frameworks	Low—requires hardware design expertise
Power efficiency	Moderate—Can consume significant power	High—Designed for low power consumption	Very high—Can be optimized for low power consumption

Table 2. Selection of edge devices.

Devices	Processor	Hardware Accelerator	Computer Vision Frameworks	Remarks
NVIDIA Jetson Nano [52]	Quad-core ARM Cortex-A57	NVIDIA Maxwell GPU (128 CUDA cores)	TensorFlow, PyTorch, TensorRT	Affordable GPU
NVIDIA Jetson Orin Nano [53]	6-core ARM Cortex-A78AE	NVIDIA Maxwell GPU (128 CUDA cores)	TensorFlow, PyTorch, TensorRT	High-performance GPU
Google Coral Dev board [54]	Quad-core ARM Cortex-A53	Edge TPU	TensorFlow Lite, Edge TPU Runtime	Optimized for TensorFlow models
Raspberry Pi 4 [55]	Quad-core ARM Cortex-A72	No built-in accelerator	TensorFlow Lite, Onnx	No dedicated AI accelerator
Xilinx Kria KV 260 [56]	Quad-core ARM Cortex-A53	Xilinx FPGA (Zynq Ultrascale+)	Vitis AI	Ideal for real-time applications
Flight RB5 5G Qualcomm UAV platform [38]	Qualcomm^® Kryo 585	Qualcomm^® Adreno 650	Qualcomm Neural Processing SDK	Ideal for real-time applications using UAV
Samsung Galaxy S24 Ultra [57]	Qualcomm Snapdragon octa-core	Qualcomm Snapdragon Adreno 750	Qualcomm Neural Processing SDK, TensorFlow Lite	Ideal for real-time applications using mobile phone

Table 3. Selection of deep learning frameworks for edge implementation.

Framework	Key Features	Advantages	Limitations	Common Edge Hardware
TensorFlow Lite [58]	Optimized version of TensorFlow for mobile and edge devices.	Lightweight, supports quantization and hardware acceleration.	Designed primarily for models trained in TensorFlow.	Google Coral, Raspberry Pi, NVIDIA Jetson
ONNX runtime [23]	Open runtime for deploying ONNX models across platforms.	Cross-framework compatibility, hardware acceleration support.	Requires converting models to ONNX format.	NVIDIA Jetson, Raspberry Pi
TensorRT [9]	NVIDIA’s deep learning inference optimizer and runtime.	High performance, low-latency inference for NVIDIA GPUs.	Limited to NVIDIA hardware.	NVIDIA Jetson
OpenVINO [60]	Intel’s inference optimization framework.	Optimized for Intel CPUs, GPUs, and VPUs.	Focused on Intel hardware ecosystem.	Intel Neural Compute Stick, Intel FPGA
Open CL [61]	Open standard for programming heterogeneous systems, including FPGAs.	Enables parallel computing, hardware-agnostic design, and portability.	Requires FPGA expertise and long compilation times.	Intel FPGAs, Xilinx FPGAs
VitisAI [12]	Xilinx’s AI framework for deploying machine learning on FPGA.	Optimized for Xilinx FPGAs, prebuilt AI model support, and easy hardware acceleration.	Requires Xilinx-specific tools and hardware.	Xilinx Kria KV260, Alveo U50, U280
FINN [63]	Framework for generating FPGA-optimized deep learning accelerators.	Focused on low-latency, quantized neural networks for FPGAs.	Limited to specific model architectures (e.g., quantized models).	Xilinx FPGAs

Table 5. Model compression techniques.

Model	Key Features	Advantages	Limitations	Common Use Cases
Model quantization [78]	Reduces the precision of weights and activations (e.g., from 32-bit floating-point to 8-bit integer).	Reduces memory usage, accelerates inference, and lowers power consumption.	May cause slight accuracy degradation.	Battery-powered edge devices, real-time systems.
Pruning and Sparsity [79]	Removes unnecessary weights or neurons from the model.	Reduces model size and speeds up computation.	Requires retraining or fine-tuning to minimize accuracy loss.	Lightweight edge models, IoT devices.
Knowledge Distillation [80]	Transfers knowledge from a large, complex model (teacher) to a smaller, efficient model (student).	Fits large models into edge hardware constraints.	Requires careful tuning to balance performance and accuracy.	Edge inference in low-memory devices.
Low-rank factorization [81]	Decomposes large-weight matrices into smaller, lower-rank components.	Reduces the number of parameters and computations.	May reduce model capacity, leading to accuracy degradation on complex data.	Efficient object recognition in video analytics and edge inference.
Layer Fusion [82]	Combines multiple operations into a single layer to reduce redundant computation.	Lowers inference latency and increases efficiency.	Limited to compatible operations and frameworks.	Optimized edge AI pipelines.

Table 6. Hardware optimization techniques.

Model	Key Features	Advantages	Limitations	Obtained Results
Pipeline design [83]	Breaks the computation into stages, with each stage processed in parallel.	Maximizes throughput and reduces latency.	Requires careful resource allocation to avoid bottlenecks.	Throughput of 24,715 GOP 4% lower power consumption (6.920 W) 18.86% less logic resource consumption
Parallel Processing [84]	Executes multiple tasks or operations simultaneously by leveraging the FPGA’s parallel architecture.	Improves computation speed significantly.	Increases resource usage and complexity.	Throughput of 3.458 Gbps 5736 slice LUT (25% utilization) 30 dB PSNR at compression ratios in the range of (0.08–0.38)
Dataflow Optimization [85]	Customizes data movement patterns to optimize memory access and reduce bottlenecks.	Reduces latency and enhances performance.	Requires detailed hardware knowledge.	1862.39 GFLOPS for 32-bit data width 3296.66 GFLOPS for 16-bit data width 2.08 times general computing unit (GCU) utilization compared
Winograd Convolution [86]	Optimizes convolution operations by reducing the number of multiplications required in CNNs.	High accuracy; optimized structure.	Effective only for certain kernel sizes and structures.	Reduction in error of 22–63%
Memory Hierarchy Optimization [87]	Customizes on-chip memory usage to reduce latency and increase data reuse.	Improves data throughput and reduces off-chip memory access.	Requires careful hardware planning.	Over 6-times-higher EMB training throughput on average for capacity-constrained deep learning recommendation models (DRLMs) The reduced access to the slower memory by over 87 times
Approximate computing [89]	Reduces precision or complexity in operations to improve efficiency.	Enhances energy efficiency and reduces latency.	May cause slight accuracy loss and requires application-specific tuning.	An average reduction of 49% in energy consumption and 30% in delay for tunable compressor compared to the state-of-the-art compressor 36% in energy consumption and 18% in delay for the MAC unit compared with the conventional MAC

Table 7. AI model performance metrics.

Metrics	Key Features	Importance	Considerations	Common Metrics
Inference Accuracy [94]	The percentage of correctly classified objects or detections.	Ensures reliable recognition results.	Must be balanced with efficiency goals; lower accuracy may be acceptable for lightweight models.	Accuracy, F1-score, precision, recall, mean average precision (mAP).
Processing time [96]	The real-time performance.	Improves computation speed significantly.	Critical for real-time applications.	Inference time (latency), inference time per image
Compute Efficiency [23]	FLOPs and MACs measure computational complexity.	Measures the efficiency of computing process.	Lower FLOPs/MACs indicate better efficiency on edge devices.	FLOPs/MACs.
Model size [97]	The total memory size of the model (in MB or GB).	Affects deployability on devices with limited storage.	Smaller models are preferred for edge devices.	Stored model size (in MB or GB).

Table 8. Hardware performance metrics.

Performance	Key Features	Importance	Considerations	Common Metrics
Compute throughput [98]	Measures the number of inferences per second that the hardware can perform.	Ensures the computing speed.	Critical for real-time applications.	Inference per second (IPS) or frames per second (FPS)
Hardware utilization [99]	The percentage of computational resources used during inference.	Measures the efficiency of hardware deployment.	High utilization indicates optimal use of resources.	GPU, CPU, TPU, RAM utilization
Power consumption (energy efficiency) [100]	The amount of energy consumed per inference.	Crucial for battery-operated or energy-constrained devices.	Must balance energy use with performance.	Consumed power per frame (watts), battery life (hours or minutes)
Thermal efficiency (heat dissipation) [41]	Thermal impact of the systems.	Edge devices must operate without overheating.	Efficient AI inference minimizes thermal impact.	Temperature (°C) under load

Table 9. Communication and network metrics.

Performance	Key Features	Importance	Considerations	Common Metrics
Bandwidth usage [101]	Measures the amount of data transferred between edge devices and the cloud.	Ensures efficient data processing to reduce network congestion.	Low bandwidth usage is crucial for remote IoT applications.	Data transfer in megabits per second (Mbps).
Edge–cloud offloading ratio [102]	The ratio of computing that occurs in edge or cloud.	Ensures efficient resource allocation.	Higher edge computation = lower latency and better privacy.	Percentage of inference performed on-device vs. cloud.
Connectivity latency [103]	Data transfer delay from edge to cloud.	Is critical for real-time applications.	It is influenced by data transfer size and network condition	Round-trip time (ms) for cloud interactions.
Success transmission rate [99]	Measures success rate of packet delivery.	Ensures successful transmission of data packet.	It is influenced by network condition, QoS level, and data packet size.	Success rate (%).

Table 10. Security and reliability metrics.

Performance	Key Features	Importance	Considerations	Common Metrics
Data privacy and security compliance [104]	Measure of data privacy level and compliance to security standards	Ensures sensitive AI data are processed locally instead of being exposed to security risks in the cloud.	Minimize data transmission to cloud, implementing local encryption and access control.	Compliance with GDPR, HIPAA, or ISO 27001 security standards. [104]
Model robustness [105]	Resilience of AI model against noise and outliers	AI models should handle adversarial attacks or noisy inputs effectively.	A robust model can handle unexpected situations on the edge without significantly compromising its performance.	Error rate under adversarial conditions (%).
System uptime and failure rate [106]	The availability of system	Measures the reliability and fault tolerance of the edge AI system.	How long it remains active and processing data without experiencing downtime	Mean Time Between Failures (MTBF) in hours.

Table 11. Key performance metrics used in previous research.

Reference	Title	Used Metrics
[107]	Energy-Efficient artificial intelligence of Things with Intelligent Edge objects or detections	Power consumption (watt), model loading time (ms), average processing time (ms), frame rate (FPS), energy consumption (Wh)
[23]	Real-time strawberry detection using deep neural networks on embedded system (rtsd-net): An edge AI application	mAP, F1-score, FPS, FLOPS
[108]	Masked Face Recognition by Zeroing the Masked Region without Model Retraining	Accuracy, F1-score, feature extraction time, inference time
[109]	Low-Resolution Face Recognition for CCTV and Edge-Powered Smart Attendance Systems	Accuracy, precision, recall, F1-score, inference time (ms), RAM/CPU/GPU utilization
[110]	Edge artificial intelligence and super-resolution for enhanced weapon detection in video surveillance	Average precision 50 (AP50), Giga FLOPS (GFLOPS), latency (ms/img), F1-score
[111]	Edge-Cloud Collaborative UAV Object Detection: Edge-Embedded Lightweight Algorithm Design and Task Offloading Using Fuzzy Neural Network	mAP, inference time
[44]	Low Latency YOLOv3-Tiny Accelerator for Low-Cost FPGA Using General Matrix Multiplication Principle	Hardware Utilization (BRAM, DSP, LUT, FF), gate counts conversion, latency, mAP, throughput (GOPS), power consumption
[45]	Fast and Scalable Multicore YOLOv3-Tiny Accelerator using Input Stationary Systolic Architecture	Hardware Utilization (BRAM, DSP, LUT, FF), gate counts conversion, latency, mAP, throughput (GOPS), power consumption

Table 12. Evaluation platforms specifications.

Specifications	Evaluation Platforms
Specifications	Raspberry Pi 4B	Jetson Nano	Jetson Orin Nano
CPU	1.5 GHz quad-core Cortex A-72	Quad-core ARM A57 @1.43 GHz	Quad-core ARM A57 @1.43 GHz
Hardware Accelerator	-	128-core Maxwell GPU	1024-core Ampere GPU
RAM	8 GB LPDDR4	4 GB 64-bit LPDDR4	8 GB 128-bit LPDDR5
Storage	microSDHC card 64 GB	microSDHC 64 GB.	microSDHC 64 GB

Table 13. Evaluation results.

Method	Edge Devices	Quantization Mode	Edge Framework	mAP	Inference Time (s)	RAM Utilization (MB)	Power Consumption (mW)	Scaled-EADS $Ε_{s c a l e d} (N)$
YOLOv7	Rasp. Pi 4B	FP 32	OpenVINO	0.959	16.40	3119	4590	0.000
	J. Nano	FP 16	PyTorch	0.966	1.24	6676	5540	0.005
	J. Nano	FP 16	TensorRT	0.962	0.33	5672	7290	0.019
	J. Orin Nano	FP 16	PyTorch	0.966	0.08	6775	5712	0.086
		FP 16	Tensor RT	0.962	0.03	4866	8043	0.227
		INT 8	TensorRT	0.943	0.02	4767	7402	0.346
YOLOV7-Tiny	Rasp. Pi 4B	FP 32	OpenVINO	0.956	2.74	1104	4210	0.006
	J. Nano	FP16	PyTorch	0.953	0.23	5961	4205	0.036
	J. Nano	FP16	TensorRT	0.949	0.06	5228	5393	0.130
	J. Orin Nano	FP16	PyTorch	0.953	0.03	6511	5096	0.243
		FP16	TensorRT	0.949	0.01	4780	6591	0.742
		INT8	TensorRT	0.936	0.008	4693	5464	1.000

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Surantha, N.; Sutisna, N. Key Considerations for Real-Time Object Recognition on Edge Computing Devices. Appl. Sci. 2025, 15, 7533. https://doi.org/10.3390/app15137533

AMA Style

Surantha N, Sutisna N. Key Considerations for Real-Time Object Recognition on Edge Computing Devices. Applied Sciences. 2025; 15(13):7533. https://doi.org/10.3390/app15137533

Chicago/Turabian Style

Surantha, Nico, and Nana Sutisna. 2025. "Key Considerations for Real-Time Object Recognition on Edge Computing Devices" Applied Sciences 15, no. 13: 7533. https://doi.org/10.3390/app15137533

APA Style

Surantha, N., & Sutisna, N. (2025). Key Considerations for Real-Time Object Recognition on Edge Computing Devices. Applied Sciences, 15(13), 7533. https://doi.org/10.3390/app15137533

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Key Considerations for Real-Time Object Recognition on Edge Computing Devices

Abstract

1. Introduction

2. System Overview

3. Key Considerations of Object Recognition on Edge Computing

3.1. Edge Devices

3.2. Deep Learning Frameworks for Edge Implementation

3.3. Lightweight Deep Learning Model

3.4. Hardware Optimization

3.5. Edge Performance Metrics

4. Case Study: Real-Time Power Line Inspection

5. Future Directions

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI