Search Results (11)

Search Parameters:
Keywords = TFLite

21 pages, 875 KiB  
Article
Comprehensive Analysis of Neural Network Inference on Embedded Systems: Response Time, Calibration, and Model Optimisation
by Patrick Huber, Ulrich Göhner, Mario Trapp, Jonathan Zender and Rabea Lichtenberg
Sensors 2025, 25(15), 4769; https://doi.org/10.3390/s25154769 - 2 Aug 2025
Viewed by 184
Abstract
The response time of Artificial Neural Network (ANN) inference is critical in embedded systems processing sensor data close to the source. This is particularly important in applications such as predictive maintenance, which rely on timely state change predictions. This study enables estimation of model response times based on the underlying platform, highlighting the importance of benchmarking generic ANN applications on edge devices. We analyze the impact of network parameters, activation functions, and single- versus multi-threading on response times. Additionally, potential hardware-related influences, such as clock rate variances, are discussed. The results underline the complexity of task partitioning and scheduling strategies, stressing the need for precise parameter coordination to optimise performance across platforms. This study shows that cutting-edge frameworks do not necessarily perform the required operations automatically for all configurations, which may negatively impact performance. This paper further investigates the influence of network structure on model calibration, quantified using the Expected Calibration Error (ECE), and the limits of potential optimisation opportunities. It also examines the effects of model conversion to TensorFlow Lite (TFLite), highlighting the necessity of considering both performance and calibration when deploying models on embedded systems.
(This article belongs to the Section Fault Diagnosis & Sensors)
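
The abstract quantifies calibration with the Expected Calibration Error (ECE). For orientation, here is a minimal sketch of the standard binned ECE in NumPy; the bin count and equal-width binning are assumptions, since the paper does not state its binning scheme:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - confidence| over
    equal-width confidence bins (bin count assumed, not from the paper)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].mean()        # empirical accuracy in the bin
            conf = confidences[in_bin].mean()   # mean predicted confidence
            ece += in_bin.mean() * abs(acc - conf)
    return ece

# Usage: confidences = max softmax probability per sample,
# correct = boolean array marking whether the argmax class was right.
```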

21 pages, 3250 KiB  
Article
Deploying Optimized Deep Vision Models for Eyeglasses Detection on Low-Power Platforms
by Henrikas Giedra, Tomyslav Sledevič and Dalius Matuzevičius
Electronics 2025, 14(14), 2796; https://doi.org/10.3390/electronics14142796 - 11 Jul 2025
Viewed by 489
Abstract
This research addresses the optimization and deployment of convolutional neural networks for eyeglasses detection on low-power edge devices. Multiple convolutional neural network architectures were trained and evaluated using the FFHQ dataset, which contains annotated eyeglasses in the context of faces with diverse facial features and eyewear styles. Several post-training quantization techniques, including Float16, dynamic range, and full integer quantization, were applied to reduce model size and computational demand while preserving detection accuracy. The impact of model architecture and quantization methods on detection accuracy and inference latency was systematically evaluated. The optimized models were deployed and benchmarked on Raspberry Pi 5 and NVIDIA Jetson Orin Nano platforms. Experimental results show that full integer quantization reduces model size by up to 75% while maintaining competitive detection accuracy. Among the evaluated models, MobileNet architectures achieved the most favorable balance between inference speed and accuracy, demonstrating their suitability for real-time eyeglasses detection in resource-constrained environments. These findings enable efficient on-device eyeglasses detection, supporting applications such as virtual try-ons and IoT-based facial analysis systems.
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, 4th Edition)
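
For readers unfamiliar with the post-training quantization modes named here, a minimal sketch of the TFLite converter settings for Float16 and dynamic range quantization follows; the SavedModel path is hypothetical, and full integer quantization additionally requires a representative dataset (see the sketch under the glucose-forecasting entry below):

```python
import tensorflow as tf

SAVED_MODEL_DIR = "eyeglasses_detector"  # hypothetical export directory

# Float16 quantization: weights stored as float16, roughly halving model size.
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
with open("detector_f16.tflite", "wb") as f:
    f.write(converter.convert())

# Dynamic range quantization: int8 weights, activations quantized at runtime.
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
with open("detector_dynrange.tflite", "wb") as f:
    f.write(converter.convert())
```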

18 pages, 1837 KiB  
Article
Real-Time Dolphin Whistle Detection on Raspberry Pi Zero 2 W with a TFLite Convolutional Neural Network
by Rocco De Marco, Francesco Di Nardo, Alessandro Rongoni, Laura Screpanti and David Scaradozzi
Robotics 2025, 14(5), 67; https://doi.org/10.3390/robotics14050067 - 19 May 2025
Cited by 1 | Viewed by 1040
Abstract
The escalating conflict between cetaceans and fisheries underscores the need for efficient mitigation strategies that balance conservation priorities with economic viability. This study presents a TinyML-driven approach deploying an optimized Convolutional Neural Network (CNN) on a Raspberry Pi Zero 2 W for real-time detection of bottlenose dolphin whistles, leveraging spectrogram analysis to address acoustic monitoring challenges. Specifically, a CNN model previously developed for classifying dolphins’ vocalizations and originally implemented with TensorFlow was converted to TensorFlow Lite (TFLite) with architectural optimizations, reducing the model size by 76%. Both TensorFlow and TFLite models were trained on 22 h of underwater recordings taken in controlled environments and processed into 0.8 s spectrogram segments (300 × 150 pixels). Despite the reduced model size, the TFLite models maintained accuracy on par with the original TensorFlow model (87.8% vs. 87.0%). Throughput and latency were evaluated by varying the thread allocation (1–8 threads), revealing the best performance at 4 threads (quad-core alignment), achieving an inference latency of 120 ms and a sustained throughput of 8 spectrograms/second. The system demonstrated robustness in 120 h of continuous stress tests without failure, underscoring its reliability in marine environments. This work achieved a critical balance between computational efficiency and detection fidelity (F1-score: 86.9%) by leveraging quantized, multithreaded inference. These advancements enable low-cost devices for real-time cetacean presence detection, offering transformative potential for bycatch reduction and adaptive deterrence systems. This study bridges artificial intelligence innovation with ecological stewardship, providing a scalable framework for deploying machine learning in resource-constrained settings while addressing urgent conservation challenges.
(This article belongs to the Section Sensors and Control in Robotics)
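
The thread-allocation experiment described above maps directly onto the TFLite interpreter's num_threads option. A minimal benchmarking sketch under assumed names (the model file and iteration count are illustrative; the input shape is read from the model):

```python
import time
import numpy as np
import tensorflow as tf

for threads in (1, 2, 4, 8):
    interpreter = tf.lite.Interpreter(model_path="whistle_cnn.tflite",  # hypothetical file
                                      num_threads=threads)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    x = np.random.rand(*inp["shape"]).astype(np.float32)  # stand-in spectrogram
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()  # warm-up run

    runs = 50
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.set_tensor(inp["index"], x)
        interpreter.invoke()
    latency = (time.perf_counter() - start) / runs
    print(f"{threads} threads: {latency * 1e3:.1f} ms, {1 / latency:.1f} spectrograms/s")
```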

19 pages, 5545 KiB  
Article
Edge Computing for AI-Based Brain MRI Applications: A Critical Evaluation of Real-Time Classification and Segmentation
by Khuhed Memon, Norashikin Yahya, Mohd Zuki Yusoff, Rabani Remli, Aida-Widure Mustapha Mohd Mustapha, Hilwati Hashim, Syed Saad Azhar Ali and Shahabuddin Siddiqui
Sensors 2024, 24(21), 7091; https://doi.org/10.3390/s24217091 - 4 Nov 2024
Cited by 2 | Viewed by 2827
Abstract
Medical imaging plays a pivotal role in diagnostic medicine, with technologies like Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), and ultrasound scans being widely used to assist radiologists and medical experts in reaching a concrete diagnosis. Given the recent massive uplift in the storage and processing capabilities of computers, and the publicly available big data, Artificial Intelligence (AI) has also started contributing to improving diagnostic radiology. Edge computing devices and handheld gadgets can serve as useful tools to process medical data in remote areas with limited network and computational resources. In this research, the capabilities of multiple platforms are evaluated for the real-time deployment of diagnostic tools. MRI classification and segmentation applications developed in previous studies are used for testing the performance using different hardware and software configurations. A cost–benefit analysis is carried out using a workstation with an NVIDIA Graphics Processing Unit (GPU), a Jetson Xavier NX, a Raspberry Pi 4B, and an Android phone, using MATLAB, Python, and Android Studio. The mean computational times for the classification app on the PC, Jetson Xavier NX, and Raspberry Pi are 1.2074, 3.7627, and 3.4747 s, respectively. On the low-cost Android phone, this time is observed to be 0.1068 s using the Dynamic Range Quantized TFLite version of the baseline model, with slight degradation in accuracy. For the segmentation app, the times are 1.8241, 5.2641, 6.2162, and 3.2023 s, respectively, when using JPEG inputs. The Jetson Xavier NX and the Android phone stand out as the best platforms due to their compact size, fast inference times, and affordability.
(This article belongs to the Section Biomedical Sensors)
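
Since the reported segmentation timings depend on JPEG inputs, a minimal sketch of the corresponding TFLite input pipeline follows; the model file, channel count, and [0, 1] normalisation are assumptions, while the input resolution is read from the model:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mri_seg_drq.tflite")  # hypothetical model
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
height, width = int(inp["shape"][1]), int(inp["shape"][2])

# Decode and preprocess a JPEG slice (normalisation scheme assumed).
image = tf.io.decode_jpeg(tf.io.read_file("slice.jpg"), channels=3)
image = tf.image.resize(image, (height, width)) / 255.0
batch = image[tf.newaxis, ...].numpy().astype(np.float32)

interpreter.set_tensor(inp["index"], batch)
interpreter.invoke()
mask = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
```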

22 pages, 6602 KiB  
Article
Performance Analysis of YOLO and Detectron2 Models for Detecting Corn and Soybean Pests Employing Customized Dataset
by Guilherme Pires Silva de Almeida, Leonardo Nazário Silva dos Santos, Leandro Rodrigues da Silva Souza, Pablo da Costa Gontijo, Ruy de Oliveira, Matheus Cândido Teixeira, Mario De Oliveira, Marconi Batista Teixeira and Heyde Francielle do Carmo França
Agronomy 2024, 14(10), 2194; https://doi.org/10.3390/agronomy14102194 - 24 Sep 2024
Cited by 3 | Viewed by 4267
Abstract
One of the most challenging aspects of agricultural pest control is accurate detection of insects in crops. Inadequate control measures for insect pests can seriously impact the production of corn and soybean plantations. In recent years, artificial intelligence (AI) algorithms have been extensively used for detecting insect pests in the field. In this line of research, this paper introduces a method to detect four key insect species that are predominant in Brazilian agriculture. Our model relies on computer vision techniques, including You Only Look Once (YOLO) and Detectron2, and adapts them to lightweight formats—TensorFlow Lite (TFLite) and Open Neural Network Exchange (ONNX)—for resource-constrained devices. Our method leverages two datasets: a comprehensive one and a smaller sample for comparison purposes. With this setup, we evaluated the performance of the computer vision models on both datasets and subsequently converted the best-performing models into TFLite and ONNX formats, facilitating their deployment on edge devices. The results are promising. Even in the worst-case scenario, where the ONNX model with the reduced dataset was compared to the YOLOv9-gelan model with the full dataset, the precision reached 87.3%, and the accuracy achieved was 95.0%.
(This article belongs to the Section Precision and Digital Agriculture)
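
For YOLO-family checkpoints, the kind of TFLite/ONNX conversion described above can be done with the Ultralytics exporter; a sketch under the assumption that the checkpoint is in Ultralytics format (the paper's exact export tooling is not stated, and Detectron2 models follow a different ONNX export path):

```python
from ultralytics import YOLO

model = YOLO("pest_detector.pt")  # hypothetical trained checkpoint
model.export(format="onnx")       # writes an .onnx file for ONNX Runtime
model.export(format="tflite")     # writes a .tflite file for edge deployment
```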

20 pages, 4604 KiB  
Article
On-Edge Deployment of Vision Transformers for Medical Diagnostics Using the Kvasir-Capsule Dataset
by Dara Varam, Lujain Khalil and Tamer Shanableh
Appl. Sci. 2024, 14(18), 8115; https://doi.org/10.3390/app14188115 - 10 Sep 2024
Cited by 6 | Viewed by 2631
Abstract
This paper aims to explore the possibility of utilizing vision transformers (ViTs) for on-edge medical diagnostics by experimenting with the Kvasir-Capsule image classification dataset, a large-scale image dataset of gastrointestinal diseases. Quantization techniques made available through TensorFlow Lite (TFLite), including post-training float-16 (F16) quantization and quantization-aware training (QAT), are applied to achieve reductions in model size, without compromising performance. The seven ViT models selected for this study are EfficientFormerV2S2, EfficientViT_B0, EfficientViT_M4, MobileViT_V2_050, MobileViT_V2_100, MobileViT_V2_175, and RepViT_M11. Three metrics are considered when analyzing a model: (i) F1-score, (ii) model size, and (iii) performance-to-size ratio, where performance is the F1-score and size is the model size in megabytes (MB). In terms of F1-score, we show that MobileViT_V2_175 with F16 quantization outperforms all other models with an F1-score of 0.9534. On the other hand, MobileViT_V2_050 trained using QAT was scaled down to a model size of 1.70 MB, making it the smallest model amongst the variations this paper examined. MobileViT_V2_050 also achieved the highest performance-to-size ratio of 41.25. Although smaller models are preferred for latency and memory reasons, medical diagnostics cannot afford poor-performing models. We conclude that MobileViT_V2_175 with F16 quantization is our best-performing model, with a small size of 27.47 MB, providing a benchmark for lightweight models on the Kvasir-Capsule dataset.
(This article belongs to the Special Issue AI Technologies for eHealth and mHealth)
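
Of the two quantization routes named in the abstract, QAT is the less obvious one. A minimal Keras sketch with the TensorFlow Model Optimization toolkit (the backbone and class count are stand-ins, since quantize_model does not annotate every ViT layer out of the box and the paper's exact training setup is not reproduced here):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Stand-in backbone: quantize_model covers common Keras CNN layers;
# transformer blocks may need per-layer annotation instead.
base = tf.keras.applications.MobileNetV2(weights=None, classes=14)  # class count assumed
qat_model = tfmot.quantization.keras.quantize_model(base)
qat_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
# qat_model.fit(train_ds, epochs=...)  # fine-tune with fake-quant ops in place

converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()
size_mb = len(tflite_bytes) / 2**20
print(f"model size: {size_mb:.2f} MB")  # denominator of the performance-to-size ratio
```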

17 pages, 8221 KiB  
Article
A Mobile Image Aesthetics Processing System with Intelligent Scene Perception
by Xiaoyan Zhao, Ling Shi, Zhao Han and Peiyan Yuan
Appl. Sci. 2024, 14(2), 822; https://doi.org/10.3390/app14020822 - 18 Jan 2024
Viewed by 1845
Abstract
Image aesthetics processing (IAP) is used primarily to enhance the aesthetic quality of images. However, IAP faces several issues, including its failure to analyze the influence of visual scene information and the difficulty of deploying IAP capabilities to mobile devices. This study proposes an automatic IAP system (IAPS) for mobile devices that integrates machine learning and traditional image-processing methods. First, we employ an extremely computation-efficient deep learning model, ShuffleNet, designed for mobile devices as our scene recognition model. Then, to enable inference on resource-constrained edge devices, we use a modern mobile machine-learning library, TensorFlow Lite, to convert the model to the TFLite format. Subsequently, we adjust the image contrast and color saturation using group filtering. These methods enable us to achieve maximal aesthetic enhancement of images with minimal parameter adjustments. Finally, we use the InceptionResNet-v2 aesthetic evaluation model to rate the images. Even with a benchmark evaluation model of only 70% accuracy, images processed by the IAPS are verified to score higher than those produced by a state-of-the-art smartphone's beautification function. Additionally, an anonymous questionnaire survey with 100 participants is conducted, and the result shows that the IAPS enhances the aesthetic appeal of images based on the public's preferences.
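
The contrast and saturation adjustments described above correspond to standard image enhancement operations; a minimal Pillow sketch with illustrative factors (in the paper the factors follow from the recognised scene class rather than being fixed):

```python
from PIL import Image, ImageEnhance

image = Image.open("photo.jpg")                      # hypothetical input
image = ImageEnhance.Contrast(image).enhance(1.15)   # factor > 1 raises contrast
image = ImageEnhance.Color(image).enhance(1.25)      # factor > 1 raises saturation
image.save("photo_enhanced.jpg")
```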

18 pages, 3154 KiB  
Article
A Methodology and Open-Source Tools to Implement Convolutional Neural Networks Quantized with TensorFlow Lite on FPGAs
by Dorfell Parra, David Escobar Sanabria and Carlos Camargo
Electronics 2023, 12(20), 4367; https://doi.org/10.3390/electronics12204367 - 21 Oct 2023
Cited by 4 | Viewed by 2511
Abstract
Convolutional neural networks (CNNs) are used for classification, as they can extract complex features from input data. The training and inference of these networks typically require platforms with CPUs and GPUs. To execute the forward propagation of neural networks in low-power devices with limited resources, TensorFlow introduced TFLite. This library enables the inference process on microcontrollers by quantizing the network parameters and utilizing integer arithmetic. A limitation of TFLite is that it does not support CNN inference on FPGAs, a critical need for embedded applications that require parallelism. Here, we present a methodology and open-source tools for implementing CNNs quantized with TFLite on FPGAs. We developed a customizable accelerator for AXI-Lite-based systems on chips (SoCs), and we tested it on a Digilent Zybo-Z7 board featuring the XC7Z020 FPGA and an ARM processor at 667 MHz. Moreover, we evaluated this approach by employing CNNs trained to identify handwritten characters using the MNIST dataset and facial expressions with the JAFFE database. We validated the accelerator results with TFLite running on a laptop with an AMD 16-thread CPU running at 4.2 GHz and 16 GB RAM. The accelerator's power consumption was 11× lower than the laptop's while keeping a reasonable execution time.
(This article belongs to the Topic Machine Learning in Internet of Things)
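
The integer arithmetic such an accelerator must reproduce is fixed by each tensor's quantization parameters, which can be read straight out of the .tflite file; a minimal inspection sketch (the model file is hypothetical, and per-axis quantization exposes additional per-channel scales):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mnist_cnn_int8.tflite")  # hypothetical
interpreter.allocate_tensors()
for detail in interpreter.get_tensor_details():
    scale, zero_point = detail["quantization"]
    if scale:  # quantized tensor: real_value = scale * (int_value - zero_point)
        print(f'{detail["name"]}: scale={scale:.6g}, zero_point={zero_point}')
```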

19 pages, 4975 KiB  
Article
Deep Learning-Based Yoga Posture Recognition Using the Y_PN-MSSD Model for Yoga Practitioners
by Aman Upadhyay, Niha Kamal Basha and Balasundaram Ananthakrishnan
Healthcare 2023, 11(4), 609; https://doi.org/10.3390/healthcare11040609 - 17 Feb 2023
Cited by 40 | Viewed by 8175
Abstract
In today’s digital world, and in light of the growing pandemic, many yoga instructors opt to teach online. However, even after learning or being trained by the best sources available, such as videos, blogs, journals, or essays, there is no live tracking available to the user to see if he or she is holding poses appropriately, which can lead to body posture issues and health issues later in life. Existing technology can assist in this regard; however, beginner-level yoga practitioners have no means of knowing whether their position is good or poor without the instructor’s help. As a result, the automatic assessment of yoga postures is proposed for yoga posture recognition, which can alert practitioners by using the Y_PN-MSSD model, in which Pose-Net and Mobile-Net SSD (jointly referred to as TFLite MoveNet) play a major role. The Pose-Net layer takes care of the feature point detection, while the Mobile-Net SSD layer performs human detection in each frame. The model is categorized into three stages. Initially, there is the data collection/preparation stage, where the yoga postures are captured from four users as well as an open-source dataset with seven yoga poses. Then, by using these collected data, the model undergoes training where the feature extraction takes place by connecting key points of the human body. Finally, the yoga posture is recognized and the model assists the user through yoga poses by live-tracking them, as well as correcting them on the fly with 99.88% accuracy. Comparatively, this model outperforms the performance of the Pose-Net CNN model. As a result, the model can be used as a starting point for creating a system that will help humans practice yoga with the help of a clever, inexpensive, and impressive virtual yoga trainer.
(This article belongs to the Special Issue Information Technologies Applied on Healthcare)
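
For orientation, keypoint inference with a MoveNet-style TFLite model looks roughly as follows; a sketch with a hypothetical model file (the paper's Y_PN-MSSD pipeline additionally runs an SSD human detector on each frame):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="movenet_lightning.tflite")  # hypothetical file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # stand-in for a camera frame
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()

# MoveNet returns [1, 1, 17, 3]: (y, x, confidence) for 17 body keypoints,
# which a pose layer can connect into a skeleton for posture matching.
keypoints = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
```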

15 pages, 827 KiB  
Article
Prediction of Glucose Concentration in Children with Type 1 Diabetes Using Neural Networks: An Edge Computing Application
by Federico D’Antoni, Lorenzo Petrosino, Fabiola Sgarro, Antonio Pagano, Luca Vollero, Vincenzo Piemonte and Mario Merone
Bioengineering 2022, 9(5), 183; https://doi.org/10.3390/bioengineering9050183 - 21 Apr 2022
Cited by 17 | Viewed by 2958
Abstract
Background: Type 1 Diabetes Mellitus (T1D) is an autoimmune disease that can cause serious complications that can be avoided by preventing the glycemic levels from exceeding the physiological range. Accordingly, many data-driven models have been developed to forecast future glycemic levels and to allow patients to avoid adverse events. Most models are tuned on data of adult patients, whereas the prediction of glycemic levels of pediatric patients has been rarely investigated, as they represent the most challenging T1D population. Methods: A Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) Recurrent Neural Network were optimized on glucose, insulin, and meal data of 10 virtual pediatric patients. The trained models were then implemented on two edge-computing boards to evaluate the feasibility of an edge system for glucose forecasting in terms of prediction accuracy and inference time. Results: The LSTM model achieved the best numeric and clinical accuracy when tested in the .tflite format, whereas the CNN achieved the best clinical accuracy in uint8. The inference time for each prediction was far under the limit represented by the sampling period. Conclusion: Both models effectively predict glucose in pediatric patients in terms of numerical and clinical accuracy. The edge implementation did not show a significant performance decrease, and the inference time was largely adequate for a real-time application.
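
The uint8 variant evaluated here corresponds to full-integer post-training quantization, which needs a representative dataset for activation calibration; a minimal sketch under assumed shapes (the history window and feature count are illustrative, and full-integer conversion of recurrent layers can require extra care, e.g. unrolling):

```python
import numpy as np
import tensorflow as tf

def representative_data():
    # Hypothetical calibration windows of (glucose, insulin, meal) history.
    for _ in range(100):
        yield [np.random.rand(1, 24, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(trained_model)  # 'trained_model' assumed
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
with open("glucose_uint8.tflite", "wb") as f:
    f.write(converter.convert())
```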

19 pages, 2045 KiB  
Article
A Deep Learning Framework Performance Evaluation to Use YOLO in Nvidia Jetson Platform
by Dong-Jin Shin and Jeong-Joon Kim
Appl. Sci. 2022, 12(8), 3734; https://doi.org/10.3390/app12083734 - 7 Apr 2022
Cited by 49 | Viewed by 10606
Abstract
Deep learning-based object detection technology can efficiently infer results by utilizing graphics processing units (GPU). However, when using general deep learning frameworks in embedded systems and mobile devices, processing functionality is limited. This limitation has driven the optimization of deep learning frameworks such as TensorFlow-Lite (TF-Lite) and TensorRT (TRT) for different hardware. Therefore, this paper introduces a performance inference method that fuses the Jetson monitoring tool with TensorFlow and TRT source code on the Nvidia Jetson AGX Xavier platform. In addition, the central processing unit (CPU) utilization, GPU utilization, object accuracy, latency, and power consumption of each deep learning framework were compared and analyzed. The model is You Only Look Once version 4 (YOLOv4), and the datasets are Common Objects in Context (COCO) and PASCAL Visual Object Classes (VOC). We confirmed that using TensorFlow results in high latency. We also confirmed that TensorFlow-TensorRT (TF-TRT) and TRT using Tensor Cores provide the best efficiency. However, TF-Lite showed the lowest performance because its GPU support targets mobile devices. We expect these measurement results to support efficient development of deep learning-based object detection services and research on the Nvidia Jetson platform and in desktop environments.
(This article belongs to the Special Issue Data Analysis and Artificial Intelligence for IoT)
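
The TF-TRT path compared in the paper wraps a SavedModel with TensorRT-optimized segments; a minimal conversion sketch (this requires a TensorRT-enabled TensorFlow build such as NVIDIA's Jetson wheels, and the SavedModel directory and FP16 precision are assumptions):

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16)  # precision choice assumed
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="yolov4_saved_model",  # hypothetical export
    conversion_params=params)
converter.convert()
converter.save("yolov4_tftrt")  # reload with tf.saved_model.load() for inference
```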
