Article

Deploying Optimized Deep Vision Models for Eyeglasses Detection on Low-Power Platforms

by Henrikas Giedra *, Tomyslav Sledevič and Dalius Matuzevičius

Department of Electronic Systems, Vilnius Gediminas Technical University (VILNIUS TECH), 10105 Vilnius, Lithuania

* Author to whom correspondence should be addressed.
Electronics 2025, 14(14), 2796; https://doi.org/10.3390/electronics14142796
Submission received: 17 June 2025 / Revised: 9 July 2025 / Accepted: 9 July 2025 / Published: 11 July 2025
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, 4th Edition)

Abstract

This research addresses the optimization and deployment of convolutional neural networks for eyeglasses detection on low-power edge devices. Multiple convolutional neural network architectures were trained and evaluated using the FFHQ dataset, which contains annotated eyeglasses in the context of faces with diverse facial features and eyewear styles. Several post-training quantization techniques, including Float16, dynamic range, and full integer quantization, were applied to reduce model size and computational demand while preserving detection accuracy. The impact of model architecture and quantization methods on detection accuracy and inference latency was systematically evaluated. The optimized models were deployed and benchmarked on Raspberry Pi 5 and NVIDIA Jetson Orin Nano platforms. Experimental results show that full integer quantization reduces model size by up to 75% while maintaining competitive detection accuracy. Among the evaluated models, MobileNet architectures achieved the most favorable balance between inference speed and accuracy, demonstrating their suitability for real-time eyeglasses detection in resource-constrained environments. These findings enable efficient on-device eyeglasses detection, supporting applications such as virtual try-ons and IoT-based facial analysis systems.

1. Introduction

In recent years, there has been a paradigm shift in artificial intelligence towards edge computing, where data is processed directly on-device (e.g., smartphones, IoT sensors, wearables) instead of in the cloud [1,2]. Running computer vision models on edge devices can drastically reduce latency and improve privacy, enabling real-time visual data processing for critical applications across industries [1,3]. However, deploying state-of-the-art deep learning models on resource-constrained edge devices remains highly challenging [4,5]. Modern deep neural networks often demand considerable memory and computational power—typically requiring GPU-class hardware—which prohibits their use on typical edge platforms with limited hardware and power supply [4,5,6]. For instance, traditional deep learning models can draw on the order of 1–30 W of power, making them impractical for ultra-low-power wearable systems like smart glasses [7,8,9]. While emerging TinyML techniques can execute neural networks at far lower power, they often come at the cost of reduced accuracy [10]. This creates a pressing need for strategies to maintain high accuracy of vision models while dramatically improving their efficiency for on-device inference [4,5,11].
To address these constraints, researchers have devoted significant effort to optimizing deep vision models for low-power and embedded devices. Broadly, two complementary approaches have emerged: designing efficient model architectures from the ground up, and applying model compression and optimization techniques to shrink existing models [12,13]. On the architecture side, the goal is to craft neural networks that achieve the best possible accuracy with a much smaller footprint in terms of parameters and operations [13,14,15]. Notable examples include MobileNet [16,17,18] and SqueezeNet [19]. MobileNet is a family of convolutional neural network (CNN) architectures explicitly designed for small size, low latency, and low power consumption, making them well-suited for on-device inference on mobile phones and embedded systems [20]. By using depth-wise separable convolutions, MobileNet achieves substantial speed-ups while maintaining accuracy and was intended to run efficiently on-device, for example, with TensorFlow Lite [20]. Similarly, SqueezeNet was engineered to reach AlexNet-level image classification performance with 50× fewer parameters and, when combined with model compression, a model roughly 510× smaller than AlexNet without accuracy loss [19]. These and other compact architectures (such as ShuffleNet and EfficientNet) demonstrate that significant redundancy in deep networks can be eliminated through careful design, yielding models that are inherently more efficient for edge deployment [12,14,15,21].
Various model compression and optimization techniques can be applied to existing deep networks to make them lighter and faster [22]. Key techniques include pruning, quantization, and knowledge distillation [23,24,25]. Pruning removes unnecessary weights or filters from the network (e.g., those with small magnitudes), thereby sparsifying the model and dramatically reducing its size and computations by cutting out redundant connections, often with minimal impact on accuracy [26,27]. Quantization reduces the numerical precision of model parameters and operations (e.g., from 32-bit floating-point to 8-bit integers), reducing memory usage and accelerating inference by using integer arithmetic [28]. When combined with pruning, it can further shrink the model’s footprint significantly. Many edge AI frameworks now support 8-bit quantized inference, seeking a balance between speed and accuracy. Knowledge distillation involves training a smaller “student” model to replicate the outputs of a large “teacher” model. This technique transfers knowledge from a complex model into a compact one, often preserving much of the accuracy of the original model while using far fewer parameters [29]. Together with other methods (like low-rank factorization), pruning, quantization, and distillation can shrink deep models with only minor accuracy degradation, which is crucial for deploying advanced vision algorithms on devices with limited computing power and memory [30]. Such model compression techniques are considered essential for bringing heavy AI models from the cloud to the edge [23,24,31].
Another optimization direction is leveraging hardware-software co-design for edge AI [32,33]. Efficient implementations in inference engines, together with dedicated hardware accelerators (Figure 1), can significantly improve performance per watt [34]. For instance, vision processing units (VPUs) and neural processing units (NPUs) have been introduced to accelerate deep vision tasks on-device with minimal power usage [35]. An example is Intel’s Neural Compute Stick 2, powered by a Movidius Myriad X VPU, which runs vision inference (e.g., object or face recognition) on a USB-sized device at about 2–3 W [35]. Edge TPUs (Google’s Tensor Processing Units for edge) similarly provide hardware acceleration for neural networks on embedded systems [36], and embedded GPUs/DSPs on mobile SoCs are increasingly optimized for running neural networks. These specialized chips, together with efficient frameworks such as TensorFlow Lite (Google LLC, Mountain View, CA, USA) or PyTorch Mobile (Meta Platforms, Inc., Menlo Park, CA, USA), form part of the overall optimization toolkit, enabling complex computer vision models to run in real time on low-power hardware [37]. The choice of model architecture and optimizations often involves a trade-off between accuracy and efficiency. Pushing for maximum accuracy with larger models can overwhelm an edge device, whereas aggressive compression or tiny models may sacrifice accuracy. Researchers actively seek balanced solutions that provide both high accuracy and low resource usage [38,39]. Generally, combining smart architectural choices (e.g., using a lightweight model such as MobileNet instead of a heavy ResNet) with compression techniques (pruning, quantization, etc.) makes it possible to deploy tasks like object detection and recognition directly on low-power devices [37,40]. One-stage object detectors such as YOLO and Single Shot MultiBox Detector (SSD) are examples of this philosophy: they forgo the heavy multi-stage processing of earlier R-CNN methods and therefore require far fewer computational resources, making them much better suited for edge deployment [41,42,43,44]. YOLO’s lightweight architecture and SSD’s balance of accuracy and speed have made them popular for real-time vision in embedded applications ranging from surveillance cameras to autonomous drones [41,45,46,47,48].
Amid these advances in efficient vision AI, a compelling application area, and the focus of this work, is detecting eyeglasses (Figure 2). The goal here is to localize the eyeglasses on a face. Although eyeglasses detection may seem like a narrow task, it plays an important role in several contexts [49,50]. In face recognition and soft biometrics for person identification, knowing if a subject is wearing glasses is useful both as an identifying attribute and a way to handle variations in facial appearance [51,52]. In surveillance and forensic analysis, automatically detecting glasses can aid in recognizing or tracking individuals [53,54,55]. Glasses are considered a key component in facial/ocular analysis for biometrics and security systems [56]. Furthermore, in augmented reality and human–computer interaction (e.g., virtual try-on applications or smart eyewear), detecting the presence of glasses is a prerequisite for subsequent processing [57,58,59,60]. For example, a wearable device may need to detect if the user already has prescription glasses on, in order to adjust an AR headset’s display or to issue alerts [61]. Despite its usefulness, automatic glasses detection is a surprisingly challenging computer vision problem in practice. The difficulty arises from a combination of factors: face pose variations, lighting conditions (glare/reflections on lenses), occlusions, and the often subtle appearance of transparent eyeglasses frames [62,63]. These factors can significantly perturb the visual features needed to distinguish glasses, especially in unconstrained “in the wild” images. Under favorable conditions, glasses detection can be largely solved [64,65]. However, many of the high-performance results were obtained on relatively standardized or high-quality images. When these algorithms face real-world, non-standard imagery (e.g., casual selfie photos from mobile cameras), performance tends to drop noticeably [66]. Nevertheless, with sufficient training data and network capacity, deep models can handle the nuances of glasses in diverse environments to a large extent.
Although improvements in the accuracy of eyeglasses detection models have been achieved, relatively little attention has been devoted to the efficiency and deployment of these models on low-power devices. Most existing works focus on improving accuracy in laboratory settings, using powerful GPUs or cloud-based inference. The computational load of the deep models can be significant. This hints at an accuracy–efficiency trade-off: a smaller model may be more feasible for real-time use on a phone or smart glasses, but might underperform compared to large networks. For deployment in wearable devices—such as smart glasses capable of detecting the presence of eyeglasses—models must be designed to be both lightweight and effective. Battery-powered devices have severe constraints on processing (often a few hundred MHz CPU or a modest neural accelerator) and power (operating on a few watts or less). High computational overhead is untenable in such scenarios, as it would drain the battery and overheat the device. Consequently, a strong motivation exists for the application of comprehensive edge model optimizations to the eyeglasses detection problem. In this paper, the gap between accuracy and efficiency in eyeglasses detection is addressed. Advances in model optimization are leveraged to build a solution that can run in real time on low-power hardware without sacrificing the accuracy needed for reliable glasses detection. By optimizing a deep vision model for the eyeglasses detection task and targeting a representative edge platform, robust performance is demonstrated within the constraints of limited computational resources.
The novelty and contributions of this work are summarized as follows:
  • Performing a systematic study on the optimization of CNNs for the specific task of eyeglasses detection on low-power edge devices, a domain where the trade-offs between accuracy and computational efficiency have not been comprehensively evaluated in the context of this application.
  • Systematically evaluating multiple CNN architectures using the high-quality and diverse FFHQ dataset, annotated with eyeglass bounding boxes, to determine model generalization across various facial attributes and eyewear styles.
  • Applying and comparing several quantization techniques, including Float16, dynamic range, and full integer quantization, to reduce model size and computational requirements while preserving detection accuracy.
  • Deploying and benchmarking the optimized models on Raspberry Pi 5 and NVIDIA Jetson Orin Nano platforms, providing a comprehensive assessment of real-world inference latency, detection accuracy in terms of IoU, and memory footprint in practical edge computing scenarios.
The eyeglasses detection problem and the low-power platforms used for deploying and benchmarking the optimized models (Raspberry Pi 5 and NVIDIA Jetson Orin Nano) are illustrated in Figure 2 and Figure 1, respectively.
The structure of the paper is organized as follows. Section 2 details the experimental setup, including a summary of evaluated convolutional neural network architectures, model optimization strategies for edge deployment, and the performance evaluation methodology. This section also describes the data preparation process, annotation protocols, and software and hardware platforms used for benchmarking. Section 3 presents a comparative analysis of model accuracy, resource efficiency, and quantization effects on inference speed and detection performance across two representative low-power platforms. The practical implications of the findings, platform-specific considerations, and error analysis are discussed in depth. Finally, Section 4 concludes the paper by summarizing the principal results and outlining directions for future research.

2. Materials and Methods

This section describes the methodology employed to develop and evaluate deep learning models for eyeglasses detection on embedded edge platforms. The experimental design encompasses model selection and optimization strategies, dataset preparation and annotation procedures, and performance evaluation metrics. Additionally, the hardware platforms used for both training and deployment benchmarking are detailed to provide a comprehensive context for the reported results.

2.1. Experimental Setup

2.1.1. Model Summary

To systematically identify an optimal network for efficient eyeglasses detection, a set of CNN architectures was trained and evaluated, all implemented in TensorFlow/Keras using Keras Applications as backbones and initialized from ImageNet pre-trained weights when available. The models compared included MobileNet (width multiplier α = 0.5) [16], MobileNetV2 (α = 0.5) [17], DenseNet121 [68], EfficientNetB0–B5 [69], and EfficientNetV2B0–B3 [70] (Table 1). MobileNet and MobileNetV2 are families of lightweight CNNs that leverage depthwise separable convolutions (and, for V2, inverted residuals and linear bottlenecks) to minimize computational load, making them well suited for mobile and embedded inference. EfficientNet models employ compound scaling of depth, width, and input resolution to achieve superior accuracy–efficiency tradeoffs, with B0 being the smallest and B5 the largest in the selection; EfficientNetV2 variants offer even better parameter efficiency and training speed via architectural refinements and fused-MBConv blocks. DenseNet121, a densely connected CNN with 121 layers, served as a higher-accuracy, higher-complexity baseline, utilizing feature reuse via dense connections to potentially improve gradient flow and accuracy at a higher memory cost.
Each backbone had its original classification head replaced by a custom detection head for bounding box regression (Figure 3). The output of the backbone’s final convolutional feature map was processed through global average pooling (or flattening), followed by fully connected layers that output the four normalized bounding box coordinates (xmin, ymin, xmax, ymax). The task was formulated as a single-object localization problem: if an image contained no eyeglasses, no bounding box was provided as ground truth, and the model was expected to output none. During training, only images with bounding boxes contributed to the regression loss.
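For illustration, a minimal sketch of this backbone-plus-regression-head construction in TensorFlow/Keras is given below for the MobileNet (α = 0.5) backbone. The widths of the fully connected layers and the sigmoid output activation are assumptions for illustration only, not the exact configuration used in the experiments.

    import tensorflow as tf
    from tensorflow import keras

    def build_detector(input_size=384):
        # Keras Applications backbone, ImageNet-initialized, without its classification head
        backbone = keras.applications.MobileNet(
            input_shape=(input_size, input_size, 3),
            alpha=0.5, include_top=False, weights="imagenet")
        # Custom detection head: global average pooling followed by fully connected layers
        x = keras.layers.GlobalAveragePooling2D()(backbone.output)
        x = keras.layers.Dense(256, activation="relu")(x)   # illustrative head width
        x = keras.layers.Dense(128, activation="relu")(x)
        # Four normalized bounding box coordinates (xmin, ymin, xmax, ymax)
        boxes = keras.layers.Dense(4, activation="sigmoid", name="bbox")(x)
        return keras.Model(backbone.input, boxes)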

2.1.2. Model Training

All models were trained on a workstation with an NVIDIA RTX 4090 GPU (24 GB VRAM), using TensorFlow 2.x and Keras (see Section 2.3 and Section 2.4). Training hyperparameters were fixed across all architectures for a fair comparison: an input size of 384 × 384 pixels, a batch size of 64, 200 epochs (each comprising 500 steps), and an initial learning rate of 0.002 decaying gradually to 0.0001. The Adam optimizer was used with default momentum settings. The loss function for bounding box regression was smooth L1 (Huber loss), which combines the robustness of L1 with the stability of L2 losses. No objectness loss was used, given that the dataset primarily comprised images containing eyeglasses. All layers were fine-tuned, but pre-trained layers used a reduced learning rate to prevent early loss of useful features.
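Under these settings, the training setup can be sketched as follows. The polynomial decay schedule and the tf.data pipelines (train_ds, val_ds) are assumptions used for illustration, since only the start and end learning rates are specified above.

    from tensorflow import keras

    EPOCHS, STEPS = 200, 500
    schedule = keras.optimizers.schedules.PolynomialDecay(
        initial_learning_rate=2e-3,
        decay_steps=EPOCHS * STEPS,
        end_learning_rate=1e-4)          # gradual decay from 0.002 to 0.0001

    model = build_detector(384)          # sketch from Section 2.1.1
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=schedule),
        loss=keras.losses.Huber())       # smooth L1 (Huber) loss on the box coordinates

    # train_ds and val_ds are assumed tf.data pipelines yielding (image, box) pairs
    model.fit(train_ds, validation_data=val_ds,
              epochs=EPOCHS, steps_per_epoch=STEPS)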

2.1.3. Model Optimizations for Edge Deployment

TensorFlow Lite (TFLite) supports several quantization strategies, each with distinct trade-offs concerning model size, computational efficiency, and deployment compatibility. Post-training float16 quantization (denoted “float16” in this research) is a straightforward approach where the model weights are converted from 32-bit floating point (float32) to 16-bit floating point (float16) values. This conversion is performed without the need for a calibration or representative dataset and does not quantize activations, inputs, or outputs. The primary benefit of float16 quantization is a reduction in model size—typically by approximately 50%—while retaining most of the model’s original accuracy, as float16 can represent a similar dynamic range as float32, though with less precision. During inference, however, calculations often revert to float32 unless the target hardware natively supports float16 operations, limiting computational speedup but enhancing model portability and reducing memory bandwidth requirements.
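A minimal float16 conversion sketch using the TFLite converter is shown below; the trained Keras model is assumed to be available as model, and the output file name is illustrative.

    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]   # store weights as float16
    tflite_float16 = converter.convert()
    open("model_float16.tflite", "wb").write(tflite_float16)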
Dynamic range quantization (“dynamic_int8”) offers more aggressive compression and performance optimization by quantizing the model’s weights from float32 to 8-bit integers (int8), while keeping activations, inputs, and outputs in float32. This approach is performed post-training and does not require a representative dataset, as the quantization parameters for weights are statically determined from the trained values. Dynamic range quantization typically achieves a 4x reduction in model size. Inference speedups are often observed, especially when the hardware can efficiently handle int8 data loads; however, the computational graph remains mixed-precision, with some operations, particularly those involving activations, still executing in float32. As such, this method provides a balanced trade-off between compression and computational efficiency, and is compatible with a broad range of hardware, including those lacking dedicated integer computation units.
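Dynamic range quantization requires only the default optimization flag, as sketched below under the same assumptions (trained Keras model available as model).

    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]   # int8 weights, float32 activations
    tflite_dynamic_int8 = converter.convert()
    open("model_dynamic_int8.tflite", "wb").write(tflite_dynamic_int8)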
For maximum optimization, full integer quantization (“full_int8”) converts not only the weights but also activations, inputs, and outputs to the int8 data type, producing a fully integer model. This comprehensive quantization process necessitates a small representative dataset to calibrate the dynamic ranges of activations, ensuring that quantization parameters accurately reflect real-world data distributions encountered during inference. Full integer quantization delivers substantial model size reductions, typically by a factor of four, and enables the deployment of TFLite models on specialized accelerators such as Edge TPUs or ARM CPUs supporting integer-only arithmetic. However, this method may introduce a marginal decrease in predictive accuracy, especially if the representative dataset does not adequately reflect the diversity of the target data distribution. Despite this potential accuracy loss, full integer quantization is essential for use cases demanding high-throughput, low-latency, and low-power execution on edge devices.
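Full integer quantization additionally requires a representative dataset generator for activation calibration, as in the sketch below. The calibration_images collection, the number of calibration samples, and the int8 input/output types are illustrative assumptions.

    import numpy as np
    import tensorflow as tf

    def representative_dataset():
        # A few hundred preprocessed 384x384 training images, batch size one (assumed)
        for image in calibration_images[:200]:
            yield [np.expand_dims(image, axis=0).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8     # fully integer inputs and outputs
    converter.inference_output_type = tf.int8
    tflite_full_int8 = converter.convert()
    open("model_full_int8.tflite", "wb").write(tflite_full_int8)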

2.1.4. Performance Evaluation

Model performance was evaluated using intersection over union (IoU) and inference rate (frames per second, FPS). IoU, a standard object detection metric, measures the overlap between predicted and ground truth bounding boxes as the ratio of their intersection area to their union area, ranging from 0 (no overlap) to 1 (perfect alignment) [71]. Mean IoU calculated over the validation sets of 3-fold cross-validation splits was used to summarize model localization accuracy.
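For reference, IoU for axis-aligned boxes in the [xmin, ymin, xmax, ymax] format can be computed as in the following sketch.

    def iou(box_a, box_b):
        # Boxes given as [xmin, ymin, xmax, ymax]
        ix_min = max(box_a[0], box_b[0])
        iy_min = max(box_a[1], box_b[1])
        ix_max = min(box_a[2], box_b[2])
        iy_max = min(box_a[3], box_b[3])
        inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0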
After training, all models were exported from Keras to TFLite format to enable efficient deployment on embedded devices. Three post-training quantization strategies were applied for each model: (1) float16 quantization, which stores weights in 16-bit floating point format, reducing model size by roughly half while typically preserving accuracy; (2) dynamic range quantization, which converts weights to 8-bit integers and leaves activations as floating point, usually reducing model size by a factor of four and improving CPU speed with minimal accuracy drop; and (3) full integer quantization, which converts both weights and activations to 8-bit integers after calibrating with a representative dataset, maximizing compression and inference speed on ARM CPUs or compatible NPUs. The impact of each quantization mode was measured by comparing the model’s file size, speed, and accuracy before and after quantization.
Optimized TFLite models were deployed and benchmarked on two representative low-power platforms: Raspberry Pi 5 and NVIDIA Jetson Orin Nano. The Raspberry Pi 5 features a Broadcom BCM2712 SoC with a quad-core Cortex-A76 CPU at 2.4 GHz and 4–8 GB RAM, running 64-bit Raspberry Pi OS; inference used the TFLite runtime and all available CPU cores, and the system was set to “performance” governor mode to avoid CPU throttling. The Jetson Orin Nano contains a 6-core ARM Cortex-A78AE CPU and an Ampere GPU with 1024 CUDA cores and 32 Tensor Cores, running Ubuntu with JetPack SDK. For consistency and to isolate the effect of quantization, inference was run on the CPU using TFLite on both devices, without resorting to GPU or TensorRT acceleration. Devices were operated in headless mode via SSH, with the graphical user interface and most unnecessary system services disabled. This helped reduce system noise and ensured a more controlled and repeatable environment for benchmarking.
For benchmarking, a Python (version 3.11.9) script loaded each TFLite model and processed the validation set of each 3-fold cross-validation split, measuring wall-clock time for each inference with a batch size of one. Multiple runs were used (100 forward passes), ignoring 20 initial warm-up iterations, and the per-image average inference time was converted to FPS. For each device and model variant, the following were recorded: (1) IoU on the validation split of a 3-fold cross-validation setup, (2) inference rate (FPS), and (3) model size on disk. This enabled a detailed analysis of trade-offs between accuracy, speed, and memory use for different architectures and quantization methods. The results were summarized with plots of IoU versus FPS, both pre- and post-quantization, highlighting the shift in the accuracy-speed curve due to model optimization. The changes in model size, inference rate, and power draw due to quantization were also plotted. These empirical measurements provided insight into which model architectures and quantization strategies offer the best balance of performance and efficiency for real-time eyeglasses detection on embedded edge devices.
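The timing procedure can be approximated by the sketch below. The model file name, thread count, and the synthetic input tensor are illustrative assumptions; the benchmark described above iterated over the actual validation images.

    import time
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="model_full_int8.tflite",  # hypothetical file
                                      num_threads=4)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]

    # Synthetic input with the expected shape and dtype (validation images in practice)
    dummy = np.zeros(inp["shape"], dtype=inp["dtype"])

    latencies = []
    for _ in range(100):                                   # 100 forward passes
        interpreter.set_tensor(inp["index"], dummy)
        start = time.perf_counter()
        interpreter.invoke()
        latencies.append(time.perf_counter() - start)

    mean_latency = np.mean(latencies[20:])                 # discard 20 warm-up runs
    print(f"{mean_latency * 1e3:.2f} ms per image, {1.0 / mean_latency:.1f} FPS")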

2.2. Data Preparation

This research used the Flickr-Faces-HQ (FFHQ) dataset as the primary source of images for training and evaluation. FFHQ is a large-scale collection of high-quality human face images, originally comprising 70,000 photographs at 1024 × 1024 resolution, notable for its diversity in age, ethnicity, and facial accessories, including a wide variety of eyeglasses styles [67]. For the experiments, a subset of approximately 16,000 images was used, each annotated with an eyeglasses bounding box in the format [xmin, ymin, xmax, ymax], tightly enclosing the eyeglasses on the face. The bounding boxes were generated by a semi-automated labeling process, where an object detector provided initial candidates, and human annotators verified or corrected these proposals [58]. This approach enabled the creation of a robust ground truth set.
Prior to training, the dataset was randomly partitioned into three subsets to facilitate benchmarking using a 3-fold cross-validation approach. All images were resized to 384 × 384 pixels to match the input size required by the models, balancing sufficient spatial resolution for detecting eyeglasses with computational efficiency suitable for embedded deployment. Intensities were normalized to the range [0,1] and further preprocessed as needed for models initialized from ImageNet weights.
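A preprocessing sketch consistent with this description is given below; the function and variable names are illustrative, and the augmentation pipeline (Albumentations, Section 2.3) is omitted.

    import tensorflow as tf

    IMG_SIZE = 384

    def load_example(image_path, box_pixels):
        # Decode, resize to 384x384, and scale intensities to [0, 1]
        raw = tf.io.read_file(image_path)
        image = tf.image.decode_png(raw, channels=3)
        height = tf.cast(tf.shape(image)[0], tf.float32)
        width = tf.cast(tf.shape(image)[1], tf.float32)
        image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) / 255.0
        # Convert the [xmin, ymin, xmax, ymax] annotation from pixels to normalized coordinates
        box = tf.stack([box_pixels[0] / width, box_pixels[1] / height,
                        box_pixels[2] / width, box_pixels[3] / height])
        return image, box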
In summary, the dataset used to benchmark the deep vision models for embedded eyeglasses detection consisted of approximately 16,000 annotated FFHQ images, resized to 384 × 384 pixels and partitioned into three folds for cross-validation.

2.3. Software Used

The software tools and programming languages used in this research are as follows:
  • Python (version 3.11.9) (https://www.python.org, (accessed on 16 April 2025)) [72], an interpreted, high-level, general-purpose programming language. Used for the machine learning applications.
  • TensorFlow (version 2.17.0) with Keras (version 3.5.0) and KerasCV (version 0.9.0) (https://www.tensorflow.org, (accessed on 16 April 2025)) [73,74,75], an open-source platform for machine learning. Used for the training of deep vision models for eyeglasses detection.
  • Albumentations (version 1.4.14) (https://albumentations.ai, (accessed on 16 April 2025)) [76], a Python library for fast and flexible image augmentation. Used for the image augmentations during deep vision model training.
  • OpenCV (version 4.10.0) (https://opencv.org/, (accessed on 16 April 2025)) [77], an open source computer vision library. Used for image input/output and manipulations.

2.4. Hardware Used

The platforms used in this research are as follows:
  • Workstation for deep vision model training: NVIDIA GeForce RTX 4090 GPU, Intel Core i7 12700K CPU, 128 GB RAM.
  • Low-power system 1 for deployment testing: Raspberry Pi 5 single-board computer with quad-core ARM Cortex-A76 CPU, 8 GB RAM. OS: 64-bit Raspberry Pi OS (Debian GNU/Linux 12, bookworm).
  • Low-power system 2 for deployment testing: NVIDIA Jetson Orin Nano development board with a 6-core ARM Cortex-A78AE CPU, 4 GB RAM. OS: Ubuntu 20.04.6 LTS, JetPack v5.1.1, L4T v35.3.1.

3. Results and Discussion

Experiments were performed on two representative low-power platforms: the Raspberry Pi 5, featuring a quad-core ARM Cortex-A76 CPU, and the NVIDIA Jetson Orin Nano, equipped with a 6-core ARM Cortex-A78AE CPU. For both systems, a range of CNN backbones were evaluated, including MobileNet, MobileNetV2, DenseNet121, six variants of EfficientNet, and four of EfficientNetV2. Results are summarized in Table 2.
The experimental results, summarized in Table 2, provide a comprehensive comparison of several deep CNN architectures for the task of eyeglasses detection, with all models evaluated on the validation sets of 3-fold cross-validation splits of the FFHQ dataset. Performance was assessed according to three principal metrics: model file size (MB), detection accuracy measured as IoU, and inference rate in frames per second (FPS) on two representative low-power platforms: Raspberry Pi 5 and NVIDIA Jetson Orin Nano. Three post-training quantization strategies were applied: float16 quantization (“float16”), dynamic range quantization (“dynamic_int8”), and full integer quantization (“full_int8”). The effect of quantization is presented both in absolute terms and as a ratio (“change”) relative to the non-quantized (“none”) baseline.
Among the evaluated architectures, lightweight models such as MobileNet (0.5) and MobileNetV2 (0.5) demonstrated the lowest memory footprints, with sizes of 21.14 MB and 47.64 MB, respectively. The more complex DenseNet121, EfficientNet-Bx, and EfficientNetV2-Bx models ranged from approximately 60 MB (EfficientNetB0) to 180 MB (EfficientNetB5), reflecting increased depth and parameter counts (Table 2, Figure 4).
Detection accuracy, expressed as mean IoU, remained consistently high for all models. MobileNet (0.5) and MobileNetV2 (0.5) achieved mean IoU values of 0.901 and 0.893, respectively. Several EfficientNet-based models slightly outperformed the MobileNet family, with IoU values peaking at 0.913 for EfficientNetB2 and staying within 0.903–0.911 for the other high-performing variants. EfficientNetV2 models also exhibited strong accuracy, with EfficientNetV2B3 achieving an IoU of 0.908.
Inference rates on the low-power systems, expressed in frames per second, were measured with the TensorFlow Lite XNNPACK CPU delegate both enabled and disabled. Inference latency was measured by averaging over 100 forward passes per configuration, discarding the initial 20 runs for cache warm-up (Table 2, Figure 5 and Figure 6). Enabling the TensorFlow Lite XNNPACK CPU delegate consistently improved the inference rate. The trade-off between detection accuracy (IoU) and inference rate (FPS) is shown in Figure 5.
Inference speed on the Raspberry Pi 5 and Jetson Orin Nano still showed a clear trade-off between model complexity and real-time performance. MobileNetV2 (0.5) delivered the highest throughput, reaching 28.46 FPS on the Pi 5 and 26.73 FPS on the Jetson. In contrast, larger EfficientNet variants were markedly slower: EfficientNetB5 managed only 1.25 FPS on the Pi 5 and 1.04 FPS on the Jetson, while EfficientNetV2B3 achieved 2.54 FPS on the Pi 5 and 1.96 FPS on the Jetson, underscoring the cost of increased model capacity.

3.1. Impact of Quantization

Quantization strategies produced the anticipated reduction in model size (Figure 4). Across all architectures, float16 quantization halved the model size (change ≈ 0.50), while both dynamic int8 and full int8 quantization reduced model size by approximately 75% (change ≈ 0.25–0.27). These reductions substantially lower storage and memory requirements, which is critical for embedded deployment.
Inference rate, as measured by FPS, was significantly improved by quantization (Figure 6), especially on the Raspberry Pi 5. For all models, float16 quantization yielded only marginal changes in FPS (change ≈ 0.99–1.02), consistent with the limited native hardware support for float16 operations. In contrast, dynamic int8 and full int8 quantization produced more pronounced speed-ups when the TensorFlow Lite XNNPACK CPU delegate was enabled. With the delegate disabled, full int8 quantization led to a smaller increase in FPS, and in the case of EfficientNet models, it even resulted in a decrease in FPS. MobileNet (0.5) achieved a 1.57-fold increase in FPS (from 26.25 to 41.18) on the Raspberry Pi 5 with dynamic int8, and a 2.77-fold increase (to 72.63) under full int8 quantization. EfficientNetB0 demonstrated a similar trend, with FPS rising from 5.22 (none) to 6.18 (dynamic int8, change 1.18-fold) and 15.64 (full int8, change 3.00-fold). Across all architectures on the Pi 5, dynamic int8 provided 1.14–1.79-fold speed-ups, whereas full int8 delivered 2.31–3.33-fold acceleration. Jetson Orin Nano (CPU-only) results mirrored these trends but with generally smaller gains (e.g., MobileNet (0.5): 1.82-fold and 2.26-fold for dynamic and full int8, respectively), reflecting differences in CPU architecture and memory bandwidth.
Detection accuracy, as measured by IoU, was largely preserved following quantization. For most architectures, both float16 and dynamic int8 quantization had negligible impact on IoU, with change ratios of 1.00 relative to the baseline. Full int8 quantization induced a slight decrease in accuracy in certain cases. For example, MobileNetV2 (0.5) exhibited an IoU reduction from 0.893 (none) to 0.878 (full int8), corresponding to a change ratio of 0.98. Some EfficientNet-based models, such as EfficientNetB5, exhibited reduced resilience to quantization, with the IoU under full int8 quantization decreasing to 65% of the baseline value. For the DenseNet121 model, full int8 quantization reduced the IoU to 29% of the baseline, corresponding to an IoU value of 0.257. This represents the largest decrease in IoU observed among the evaluated models. The overall results confirm that post-training quantization enables substantial model compression with minimal accuracy loss, validating its suitability for resource-constrained platforms (Figure 5).
The results highlight clear trade-offs between model complexity, detection accuracy, and edge-deployment efficiency. MobileNet (0.5) and MobileNetV2 (0.5) still delivered the fastest real-time throughput—peaking at 72.6 FPS and 65.7 FPS on the Raspberry Pi 5 under full-int8 quantization—but at the cost of slightly reduced localization accuracy (mean IoU 0.873 and 0.878). Within the EfficientNet line-up, EfficientNetB0 and EfficientNetV2B0 offered the most practical compromise, retaining high accuracy (IoU ≈ 0.86) while sustaining 15–16 FPS once fully quantized. Scaling further (EfficientNetB4, B5, V2B3) yielded only marginal accuracy gains (up to IoU 0.911) yet imposed steep latency penalties: baseline inference fell to 1.3–2.5 FPS (none) and, even after full-int8 conversion, rarely exceeded 8.3 FPS on low-power devices—well below the 25 FPS real-time threshold.
Dynamic int8 and full int8 quantization both provided significant speed and size benefits. Full int8 models achieved the highest acceleration, but occasionally at the cost of minor accuracy reductions. Nevertheless, for most EfficientNet variants, the decrease in IoU was ≤3%, supporting the use of integer quantization for deployment scenarios requiring strict memory and latency constraints.
Quantization-induced speedups were consistently more pronounced on the Raspberry Pi 5 than on the Jetson Orin Nano without GPU acceleration. For example, full int8 quantization increased MobileNet (0.5) FPS by a factor of 2.77 on Pi 5 (from 26.25 to 72.63 FPS) but only 2.26 on Jetson (from 22.39 to 50.53 FPS). EfficientNetB0 showed a similar pattern, rising from 5.22 to 15.64 FPS on Pi 5 (change by a factor of 3.00) versus an increase from 5.01 to 11.59 FPS on Jetson (change of 2.31). These differences reflect both the architectural characteristics of each platform and the efficiency of TFLite’s backend implementation on ARM CPUs.
Additional experiments were conducted to measure the power consumption during model inference directly (Figure 7). These power measurements were performed exclusively on the NVIDIA Jetson Orin Nano, as it features built-in hardware sensors for monitoring power draw on individual rails (Board, CPU/GPU/CV). Using the Jetson’s integrated “tegrastats” utility and custom high-frequency sampling scripts, we captured real-time power usage data during inference runs. Due to the lack of equivalent onboard sensors, direct power measurements could not be obtained on the Raspberry Pi 5, and thus, energy efficiency results presented here are limited to the Jetson platform. To mitigate measurement inaccuracies caused by the brief duration of individual inferences (often close to the sensor’s sampling interval), each model was executed sequentially multiple times, and average power consumption, along with standard deviation, were recorded. The resulting data, visualized in the provided figures (Figure 7), confirm that higher quantization levels effectively reduce average power draw, thus demonstrating the effectiveness of quantization in improving power efficiency on low-power hardware.
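A simplified version of the sampling procedure is sketched below. The rail name and the "NNNmW" output pattern depend on the JetPack release, so both the regular expression and the rail label are assumptions to be adapted to the actual tegrastats output.

    import re
    import statistics
    import subprocess

    def sample_power(rail="VDD_CPU_GPU_CV", interval_ms=100, n_samples=300):
        # Launch tegrastats and parse the requested power rail while inference runs elsewhere
        proc = subprocess.Popen(["tegrastats", "--interval", str(interval_ms)],
                                stdout=subprocess.PIPE, text=True)
        pattern = re.compile(rf"{rail} (\d+)mW")
        samples = []
        try:
            for line in proc.stdout:
                match = pattern.search(line)
                if match:
                    samples.append(int(match.group(1)))
                if len(samples) >= n_samples:
                    break
        finally:
            proc.terminate()
        return statistics.mean(samples), statistics.stdev(samples)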

3.2. Failure Case Analysis

Since most quantized models achieved an average IoU of approximately 0.85, we defined a threshold of IoU ≤ 0.80 to identify clearly underperforming predictions. A manual review of failure cases revealed several consistent error modes (Figure 8). The most common was “Partial temple only” (42.9%), typically resulting from extreme viewing angles where eyeglass bounding boxes overlapped the face edge, complicating the localization of temple arms. The second most frequent category was “Not on face” (39.9%), involving detections on the head, in hands, or elsewhere off the face. Less frequent but visually challenging cases included “Rare or novelty designs” (2.7%) and “Rimless or low-contrast frames” (1.6%). The remaining 12.8% of failures were grouped under “Other”. To address these failure modes, future work could explore data augmentation with synthetic [78,79] or adversarially lit samples and incorporate contrastive learning techniques to improve model robustness.

3.3. Deployment Guidelines

Based on the empirical results, the following deployment guidelines are suggested:
  • For real-time eyeglasses detection on embedded platforms, MobileNet (0.5) and MobileNetV2 (0.5) are recommended due to their balance of accuracy, compactness, and throughput after quantization.
  • Full int8 quantization should be favored for scenarios with stringent memory and latency requirements, accepting minimal accuracy loss (≤5%). Dynamic int8 serves as an effective compromise when slightly higher accuracy preservation is needed.
  • Raspberry Pi 5 demonstrated greater FPS improvements from quantization than Jetson Orin Nano without GPU acceleration, but absolute throughput was comparable. Both platforms are suitable for the deployment of quantized models, with model selection tailored to application-specific accuracy and speed constraints.

3.4. Limitations and Future Work

In this research, the evaluation focused on CPU inference with a batch size of one and did not explore mixed-precision hybrids or hardware-specific compiler stacks such as TensorRT or TVM; these could provide further gains. Selective float-precision retention might reclaim the last few points of accuracy without consuming a large amount of memory if layers are differently sensitive to quantization noise. Future research will explore automated Neural Architecture Search under latency and power constraints, pruning-plus-quantization pipelines, and extending the workflow to multi-object detection tasks (e.g., masks, hats) in the same unified framework.

4. Conclusions

This study systematically evaluated the deployment of optimized deep vision models for eyeglasses detection on low-power embedded platforms. Experimental results demonstrated that post-training quantization strategies, particularly full integer quantization, substantially reduced model size and improved inference speed with only minimal accuracy degradation. Lightweight architectures such as MobileNet (0.5) and MobileNetV2 (0.5) provided the most favorable balance of detection accuracy, model compactness, and real-time performance on both Raspberry Pi 5 and NVIDIA Jetson Orin Nano devices. These findings confirm that quantized CNN models are well-suited for real-time eyeglasses detection in resource-constrained environments. Future work will address further optimization using hardware-specific acceleration and evaluate energy efficiency across broader deployment scenarios.

Author Contributions

Conceptualization, D.M. and T.S.; methodology, D.M. and T.S.; software, H.G.; validation, H.G., T.S. and D.M.; resources, D.M.; data curation, D.M. and H.G.; writing—original draft preparation, H.G.; writing—review and editing, H.G., D.M. and T.S.; visualization, H.G.; project administration, D.M.; funding acquisition, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This project has received funding from the Research Council of Lithuania (LMTLT), agreement No S-ITP-24-12.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The extension of the FFHQ dataset for the eyewear detection can be found at Zenodo (https://doi.org/10.5281/zenodo.14252074) (accessed on 2 June 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
FFHQ	Flickr-Faces-HQ
CNN	Convolutional neural network
SSD	Single Shot MultiBox Detector
FPS	Frames per second
IoU	Intersection over union
VPU	Vision processing unit
NPU	Neural processing unit
TFLite	TensorFlow Lite

References

  1. Merenda, M.; Porcaro, C.; Iero, D. Edge machine learning for ai-enabled iot devices: A review. Sensors 2020, 20, 2533. [Google Scholar] [CrossRef] [PubMed]
  2. Shi, W.; Cao, J.; Zhang, Q.; Li, Y.; Xu, L. Edge computing: Vision and challenges. IEEE Internet Things J. 2016, 3, 637–646. [Google Scholar] [CrossRef]
  3. Rancea, A.; Anghel, I.; Cioara, T. Edge computing in healthcare: Innovations, opportunities, and challenges. Future Internet 2024, 16, 329. [Google Scholar] [CrossRef]
  4. Chen, J.; Ran, X. Deep learning with edge computing: A review. Proc. IEEE 2019, 107, 1655–1674. [Google Scholar] [CrossRef]
  5. Goethals, T.; Volckaert, B.; De Turck, F. Enabling and leveraging AI in the intelligent edge: A review of current trends and future directions. IEEE Open J. Commun. Soc. 2021, 2, 2311–2341. [Google Scholar] [CrossRef]
  6. Žuraulis, V.; Matuzevičius, D.; Serackis, A. A method for automatic image rectification and stitching for vehicle yaw marks trajectory estimation. Promet-Traffic Transp. 2016, 28, 23–30. [Google Scholar] [CrossRef]
  7. Dutta, L.; Bharali, S. Tinyml meets iot: A comprehensive survey. Internet Things 2021, 16, 100461. [Google Scholar] [CrossRef]
  8. Novac, P.E.; Boukli Hacene, G.; Pegatoquet, A.; Miramond, B.; Gripon, V. Quantization and deployment of deep neural networks on microcontrollers. Sensors 2021, 21, 2984. [Google Scholar] [CrossRef] [PubMed]
  9. Tang, J.; Sun, D.; Liu, S.; Gaudiot, J.L. Enabling Deep Learning on IoT Devices. Computer 2017, 50, 92–96. [Google Scholar] [CrossRef]
  10. Heydari, S.; Mahmoud, Q.H. Tiny Machine Learning and On-Device Inference: A Survey of Applications, Challenges, and Future Directions. Sensors 2025, 25, 3191. [Google Scholar] [CrossRef]
  11. Suwannaphong, T.; Jovan, F.; Craddock, I.; McConville, R. Optimising TinyML with quantization and distillation of transformer and mamba models for indoor localisation on edge devices. Sci. Rep. 2025, 15, 10081. [Google Scholar] [CrossRef] [PubMed]
  12. Deng, L.; Li, G.; Han, S.; Shi, L.; Xie, Y. Model compression and hardware acceleration for neural networks: A comprehensive survey. Proc. IEEE 2020, 108, 485–532. [Google Scholar] [CrossRef]
  13. Roth, W.; Schindler, G.; Klein, B.; Peharz, R.; Tschiatschek, S.; Fröning, H.; Pernkopf, F.; Ghahramani, Z. Resource-efficient neural networks for embedded systems. J. Mach. Learn. Res. 2024, 25, 1–51. [Google Scholar]
  14. Ghimire, D.; Kil, D.; Kim, S.h. A survey on efficient convolutional neural networks and hardware acceleration. Electronics 2022, 11, 945. [Google Scholar] [CrossRef]
  15. Liu, L.; Wang, L.; Ma, Z. Improved lightweight YOLOv5 based on ShuffleNet and its application on traffic signs detection. PLoS ONE 2024, 19, e0310269. [Google Scholar] [CrossRef] [PubMed]
  16. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  17. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  18. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  19. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
  20. Zhu, Q.; Zhuang, H.; Zhao, M.; Xu, S.; Meng, R. A study on expression recognition based on improved mobilenetV2 network. Sci. Rep. 2024, 14, 8121. [Google Scholar] [CrossRef] [PubMed]
  21. Arora, L.; Singh, S.K.; Kumar, S.; Gupta, H.; Alhalabi, W.; Arya, V.; Bansal, S.; Chui, K.T.; Gupta, B.B. Ensemble deep learning and EfficientNet for accurate diagnosis of diabetic retinopathy. Sci. Rep. 2024, 14, 30554. [Google Scholar] [CrossRef]
  22. Liu, D.; Zhu, Y.; Liu, Z.; Liu, Y.; Han, C.; Tian, J.; Li, R.; Yi, W. A survey of model compression techniques: Past, present, and future. Front. Robot. AI 2025, 12, 1518965. [Google Scholar] [CrossRef]
  23. He, Y.; Xiao, L. Structured pruning for deep convolutional neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 2900–2919. [Google Scholar] [CrossRef]
  24. Liang, T.; Glossner, J.; Wang, L.; Shi, S.; Zhang, X. Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing 2021, 461, 370–403. [Google Scholar] [CrossRef]
  25. Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge distillation: A survey. Int. J. Comput. Vis. 2021, 129, 1789–1819. [Google Scholar] [CrossRef]
  26. Zhu, K.; Hu, F.; Ding, Y.; Zhou, W.; Wang, R. A comprehensive review of network pruning based on pruning granularity and pruning time perspectives. Neurocomputing 2025, 626, 129382. [Google Scholar] [CrossRef]
  27. Li, Z.; Li, H.; Meng, L. Model compression for deep neural networks: A survey. Computers 2023, 12, 60. [Google Scholar] [CrossRef]
  28. Peng, P.; You, M.; Xu, W.; Li, J. Fully integer-based quantization for mobile convolutional neural network inference. Neurocomputing 2021, 432, 194–205. [Google Scholar] [CrossRef]
  29. Alkhulaifi, A.; Alsahli, F.; Ahmad, I. Knowledge distillation in deep learning and its applications. PeerJ Comput. Sci. 2021, 7, e474. [Google Scholar] [CrossRef]
  30. Sze, V.; Chen, Y.H.; Yang, T.J.; Emer, J.S. Efficient Processing of Deep Neural Networks; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
  31. Dantas, P.V.; Sabino da Silva Jr, W.; Cordeiro, L.C.; Carvalho, C.B. A comprehensive review of model compression techniques in machine learning. Appl. Intell. 2024, 54, 11804–11844. [Google Scholar] [CrossRef]
  32. Dai, S.; Luo, Z.; Luo, W.; Wang, S.; Dai, C.; Guo, B.; Zhou, X. Energy-Efficient Inference With Software-Hardware Co-Design for Sustainable Artificial Intelligence of Things. IEEE Internet Things J. 2024, 11, 39170–39182. [Google Scholar] [CrossRef]
  33. Singh, R.; Gill, S.S. Edge AI: A survey. Internet Things Cyber-Phys. Syst. 2023, 3, 71–92. [Google Scholar] [CrossRef]
  34. Cittadini, E.; Marinoni, M.; Buttazzo, G. A hardware accelerator to support deep learning processor units in real-time image processing. Eng. Appl. Artif. Intell. 2025, 145, 110159. [Google Scholar] [CrossRef]
  35. Dunkel, E.R.; Swope, J.; Candela, A.; West, L.; Chien, S.A.; Towfic, Z.; Buckley, L.; Romero-Cañas, J.; Espinosa-Aranda, J.L.; Hervas-Martin, E.; et al. Benchmarking deep learning models on myriad and snapdragon processors for space applications. J. Aerosp. Inf. Syst. 2023, 20, 660–674. [Google Scholar] [CrossRef]
  36. Corral, J.M.R.; Civit-Masot, J.; Luna-Perejón, F.; Díaz-Cano, I.; Morgado-Estévez, A.; Domínguez-Morales, M. Energy efficiency in edge TPU vs. embedded GPU for computer-aided medical imaging segmentation and classification. Eng. Appl. Artif. Intell. 2024, 127, 107298. [Google Scholar] [CrossRef]
  37. Kirtas, M.; Oikonomou, A.; Passalis, N.; Mourgias-Alexandris, G.; Moralis-Pegios, M.; Pleros, N.; Tefas, A. Quantization-aware training for low precision photonic neural networks. Neural Netw. 2022, 155, 561–573. [Google Scholar] [CrossRef]
  38. Cherniuk, D.; Abukhovich, S.; Phan, A.H.; Oseledets, I.; Cichocki, A.; Gusak, J. Quantization Aware Factorization for Deep Neural Network Compression. J. Artif. Intell. Res. 2024, 81, 973–988. [Google Scholar] [CrossRef]
  39. Jiang, B.; Chen, J.; Liu, Y. Single-shot pruning and quantization for hardware-friendly neural network acceleration. Eng. Appl. Artif. Intell. 2023, 126, 106816. [Google Scholar] [CrossRef]
  40. Sledevič, T.; Serackis, A.; Plonis, D. FPGA Implementation of a Convolutional Neural Network and Its Application for Pollen Detection upon Entrance to the Beehive. Agriculture 2022, 12, 1849. [Google Scholar] [CrossRef]
  41. Song, B.; Chen, J.; Liu, W.; Fang, J.; Xue, Y.; Liu, X. YOLO-ELWNet: A lightweight object detection network. Neurocomputing 2025, 636, 129904. [Google Scholar] [CrossRef]
  42. Chen, R.; Wang, P.; Lin, B.; Wang, L.; Zeng, X.; Hu, X.; Yuan, J.; Li, J.; Ren, J.; Zhao, H. An optimized lightweight real-time detection network model for IoT embedded devices. Sci. Rep. 2025, 15, 3839. [Google Scholar] [CrossRef]
  43. Varna, D.; Abromavičius, V. A System for a Real-Time Electronic Component Detection and Classification on a Conveyor Belt. Appl. Sci. 2022, 12, 5608. [Google Scholar] [CrossRef]
  44. Murthy, C.B.; Hashmi, M.F.; Bokde, N.D.; Geem, Z.W. Investigations of object detection in images/videos using various deep learning techniques and embedded platforms—A comprehensive review. Appl. Sci. 2020, 10, 3280. [Google Scholar] [CrossRef]
  45. Ahmed, M.; Hashmi, K.A.; Pagani, A.; Liwicki, M.; Stricker, D.; Afzal, M.Z. Survey and performance analysis of deep learning based object detection in challenging environments. Sensors 2021, 21, 5116. [Google Scholar] [CrossRef] [PubMed]
  46. Wang, D.; Wang, J.G.; Xu, K. Deep learning for object detection, classification and tracking in industry applications. Sensors 2021, 21, 7349. [Google Scholar] [CrossRef]
  47. Jiang, D.; Wang, H.; Li, T.; Gouda, M.A.; Zhou, B. Real-time tracker of chicken for poultry based on attention mechanism-enhanced YOLO-Chicken algorithm. Comput. Electron. Agric. 2025, 237, 110640. [Google Scholar] [CrossRef]
  48. Du, Y.; Pan, N.; Xu, Z.; Deng, F.; Shen, Y.; Kang, H. Pavement distress detection and classification based on YOLO network. Int. J. Pavement Eng. 2021, 22, 1659–1672. [Google Scholar] [CrossRef]
  49. Lee, Y.W.; Kim, K.W.; Hoang, T.M.; Arsalan, M.; Park, K.R. Deep residual CNN-based ocular recognition based on rough pupil detection in the images by NIR camera sensor. Sensors 2019, 19, 842. [Google Scholar] [CrossRef]
  50. Le, N.T.; Wang, J.W.; Wang, C.C.; Nguyen, T.N. Automatic defect inspection for coated eyeglass based on symmetrized energy analysis of color channels. Symmetry 2019, 11, 1518. [Google Scholar] [CrossRef]
  51. Bekhet, S.; Alahmer, H. A robust deep learning approach for glasses detection in non-standard facial images. IET Biom. 2021, 10, 74–86. [Google Scholar] [CrossRef]
  52. Huo, D.; Wang, J.; Qian, Y.; Yang, Y.H. Glass segmentation with RGB-thermal image pairs. IEEE Trans. Image Process. 2023, 32, 1911–1926. [Google Scholar] [CrossRef]
  53. Jain, R.; Goyal, A.; Venkatesan, K. Real-time eyeglass detection using transfer learning for non-standard facial data. Int. J. Electr. Comput. Eng. 2022, 12, 3709–3720. [Google Scholar] [CrossRef]
  54. Matuzevičius, D. A Retrospective Analysis of Automated Image Labeling for Eyewear Detection Using Zero-Shot Object Detectors. Electronics 2024, 13, 4763. [Google Scholar] [CrossRef]
  55. Al Qudah, M.; Mohamed, A.; Lutfi, S. Analysis of Facial Occlusion Challenge in Thermal Images for Human Affective State Recognition. Sensors 2023, 23, 3513. [Google Scholar] [CrossRef] [PubMed]
  56. Zhang, H.; Guo, J. ERAT: Eyeglasses removal with attention. Pattern Recognit. 2025, 158, 110970. [Google Scholar] [CrossRef]
  57. Wang, J.; Liu, P.; Liu, J.; Xu, W. Text-guided Eyeglasses Manipulation with Spatial Constraints. IEEE Trans. Multimed. 2023, 26, 4375–4388. [Google Scholar] [CrossRef]
  58. Matuzevičius, D. Diverse Dataset for Eyeglasses Detection: Extending the Flickr-Faces-HQ (FFHQ) Dataset. Sensors 2024, 24, 7697.
  59. Abromavicius, V.; Serackis, A.; Katkevicius, A.; Plonis, D. Evaluation of EEG-based Complementary Features for Assessment of Visual Discomfort based on Stable Depth Perception Time. Radioengineering 2018, 27, 1138–1146.
  60. Alionte, C.G.; Ungureanu, L.M.; Alexandru, T.M. Innovation process for optical face scanner used to customize 3D printed spectacles. Materials 2022, 15, 3496.
  61. Marelli, D.; Bianco, S.; Ciocca, G. Designing an AI-based virtual try-on web application. Sensors 2022, 22, 3832.
  62. Yu, R.; Ren, W.; Zhao, M.; Wang, J.; Wu, D.; Xie, Y. Transparent objects segmentation based on polarization imaging and deep learning. Opt. Commun. 2024, 555, 130246.
  63. Yan, T.; Xu, S.; Huang, H.; Li, H.; Tan, L.; Chang, X.; Lau, R.W. NRGlassNet: Glass surface detection from visible and near-infrared image pairs. Knowl.-Based Syst. 2024, 294, 111722.
  64. Basbrain, A.M.; Al-Taie, I.; Azeez, N.; Gan, J.Q.; Clark, A. Shallow convolutional neural network for eyeglasses detection in facial images. In Proceedings of the 2017 9th Computer Science and Electronic Engineering (CEEC), London, UK, 27–29 September 2017; pp. 157–161.
  65. Zhang, B.; Wang, Z.; Ling, Y.; Guan, Y.; Zhang, S.; Li, W.; Wei, L.; Zhang, C. ShuffleTrans: Patch-wise weight shuffle for transparent object segmentation. Neural Netw. 2023, 167, 199–212.
  66. Tamulionis, M.; Sledevič, T.; Abromavičius, V.; Kurpytė-Lipnickė, D.; Navakauskas, D.; Serackis, A.; Matuzevičius, D. Finding the Least Motion-Blurred Image by Reusing Early Features of Object Detection Network. Appl. Sci. 2023, 13, 1264.
  67. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4401–4410.
  68. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
  69. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
  70. Tan, M.; Le, Q. EfficientNetV2: Smaller Models and Faster Training. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 10096–10106.
  71. Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimed. Tools Appl. 2023, 82, 9243–9275.
  72. Van Rossum, G.; Drake, F.L. Python 3 Reference Manual; CreateSpace: Scotts Valley, CA, USA, 2009.
  73. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: https://www.tensorflow.org/ (accessed on 16 April 2025).
  74. Chollet, F. Keras. Available online: https://keras.io (accessed on 16 April 2025).
  75. Wood, L.; Tan, Z.; Stenbit, I.; Bischof, J.; Zhu, S.; Chollet, F.; Sreepathihalli, D.; Sampath, R. KerasCV. Available online: https://github.com/keras-team/keras-cv (accessed on 16 April 2025).
  76. Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and Flexible Image Augmentations. Information 2020, 11, 125.
  77. OpenCV Team. OpenCV: Open Source Computer Vision Library. Available online: https://opencv.org/ (accessed on 16 April 2025).
  78. Matuzevičius, D. Synthetic Data Generation for the Development of 2D Gel Electrophoresis Protein Spot Models. Appl. Sci. 2022, 12, 4393.
  79. Matuzevičius, D. Rulers2023: An Annotated Dataset of Synthetic and Real Images for Ruler Detection Using Deep Learning. Electronics 2023, 12, 4924.
Figure 1. Deployment platforms for optimized eyeglasses detection on edge devices: (a) the Raspberry Pi 5 single-board computer, featuring a quad-core ARM Cortex-A76 CPU, represents a low-power hardware configuration; (b) the NVIDIA Jetson Orin Nano development board, equipped with a six-core ARM Cortex-A78AE CPU. Both platforms were used as target devices for benchmarking quantized deep convolutional neural network models trained for eyeglasses detection. These setups support real-time on-device eyeglasses detection in embedded applications, demonstrating the practical feasibility of efficient, resource-constrained inference on commercially available edge hardware.
Figure 2. The problem of eyeglasses detection in an image. Sample images are from the FFHQ dataset [58,67] used for eyeglasses detection. The bounding-box annotation covers all components of the eyeglasses.
Figure 3. Block diagram of the eyeglasses detection model architecture. A CNN backbone is followed by a detection head regressing the bounding box.
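To make the layout in Figure 3 concrete, the following minimal Keras sketch builds a backbone-plus-head detector of the kind described here. The choice of MobileNetV2 with alpha = 0.5 as the example backbone, the 256-unit head width, global average pooling, the sigmoid output, and the MSE loss are illustrative assumptions rather than the authors' exact configuration; only the overall structure (a CNN backbone followed by a four-value bounding-box regression head) follows the figure.

```python
import tensorflow as tf
from tensorflow import keras

def build_detector(input_size=384, alpha=0.5):
    # CNN backbone (illustrative choice); weights="imagenet" would typically be
    # used for transfer learning, weights=None keeps the sketch self-contained.
    backbone = keras.applications.MobileNetV2(
        input_shape=(input_size, input_size, 3),
        alpha=alpha,
        include_top=False,
        weights=None,
    )
    # Detection head: pool backbone features and regress one box
    # (x_min, y_min, x_max, y_max), normalized to [0, 1] via a sigmoid.
    x = keras.layers.GlobalAveragePooling2D()(backbone.output)
    x = keras.layers.Dense(256, activation="relu")(x)
    bbox = keras.layers.Dense(4, activation="sigmoid", name="bbox")(x)
    return keras.Model(backbone.input, bbox, name="eyeglasses_detector")

model = build_detector()
model.compile(optimizer="adam", loss="mse")  # placeholder box-regression loss
model.summary()
```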
Figure 4. Comparison of model sizes for various neural network architectures under different quantization schemes.
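The optimized variants compared in Figure 4 correspond to the standard TensorFlow Lite post-training quantization modes (Float16, dynamic range, and full integer). The sketch below shows one way to produce them from a trained Keras model (here the `model` from the previous sketch); the random representative-dataset generator and the uint8 input/output types are placeholders, not the authors' calibration setup.

```python
import numpy as np
import tensorflow as tf

def representative_data_gen():
    # Placeholder calibration data; in practice this should yield preprocessed
    # training images, one batch of shape (1, 384, 384, 3) at a time.
    for _ in range(100):
        yield [np.random.rand(1, 384, 384, 3).astype(np.float32)]

def convert(model, mode):
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    if mode == "float16":
        converter.target_spec.supported_types = [tf.float16]
    elif mode == "full_int8":
        converter.representative_dataset = representative_data_gen
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
        converter.inference_input_type = tf.uint8
        converter.inference_output_type = tf.uint8
    # mode == "dynamic": Optimize.DEFAULT alone yields dynamic-range quantization
    return converter.convert()

for mode in ("dynamic", "float16", "full_int8"):
    with open(f"detector_{mode}.tflite", "wb") as f:
        f.write(convert(model, mode))
```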
Figure 5. Trade-off between detection accuracy (IoU) and inference rate (FPS), measured with the CPU-enabled TensorFlow Lite XNNPACK delegate, across models and platforms.
Figure 6. Comparison of inference speed (frames per second, mean ± standard deviation) on the Raspberry Pi 5 (a) and Jetson Orin Nano (b) platforms using different quantization methods. Solid lines indicate measurements with the CPU-enabled TensorFlow Lite XNNPACK delegate; dashed lines indicate measurements with the delegate disabled.
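Figures 5 and 6 report inference rates with the XNNPACK delegate enabled and disabled. A rough benchmarking sketch is given below, using the full-integer model produced above; toggling the delegate through the experimental op-resolver option and the thread count of 4 are assumptions about how such a measurement could be set up, not a description of the authors' exact harness.

```python
import time
import numpy as np
import tensorflow as tf

def benchmark(model_path, use_xnnpack=True, num_threads=4, runs=200):
    # Recent TensorFlow Lite builds apply the XNNPACK delegate by default;
    # requesting the resolver without default delegates is one way to disable it.
    resolver = (tf.lite.experimental.OpResolverType.AUTO if use_xnnpack else
                tf.lite.experimental.OpResolverType.BUILTIN_WITHOUT_DEFAULT_DELEGATES)
    interpreter = tf.lite.Interpreter(model_path=model_path,
                                      num_threads=num_threads,
                                      experimental_op_resolver_type=resolver)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    dummy = np.zeros(inp["shape"], dtype=inp["dtype"])

    # Warm-up runs, then timed runs.
    for _ in range(10):
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()
    return runs / (time.perf_counter() - start)  # frames per second

print("XNNPACK on :", benchmark("detector_full_int8.tflite", use_xnnpack=True))
print("XNNPACK off:", benchmark("detector_full_int8.tflite", use_xnnpack=False))
```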
Figure 7. Comparison of power draw for the whole board (a) and the CPU/GPU/CV rail (b) across different quantization types (mean ± standard deviation). Solid lines indicate measurements with the CPU-enabled TensorFlow Lite XNNPACK delegate; dashed lines indicate measurements with the delegate disabled.
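Board-level and CPU/GPU/CV rail power readings of the kind shown in Figure 7 can be sampled on the Jetson from its on-board rail monitors, for example via the tegrastats utility. The sketch below illustrates one possible sampling approach while an inference benchmark is running; the rail names (VDD_IN, VDD_CPU_GPU_CV) and the "<instantaneous>mW/<average>mW" output format are assumptions tied to this platform and JetPack version, and the sketch covers only the Jetson device, not how power would be measured on the Raspberry Pi.

```python
import re
import subprocess

# Matches e.g. "VDD_IN 4954mW/4950mW VDD_CPU_GPU_CV 1040mW/1038mW ..." (assumed format).
RAIL_RE = re.compile(r"(VDD_IN|VDD_CPU_GPU_CV) (\d+)mW/(\d+)mW")

def sample_power(num_lines=30):
    # Run this while the inference benchmark is executing in another process.
    proc = subprocess.Popen(["tegrastats", "--interval", "500"],
                            stdout=subprocess.PIPE, text=True)
    readings = {"VDD_IN": [], "VDD_CPU_GPU_CV": []}
    try:
        for _ in range(num_lines):
            line = proc.stdout.readline()
            for rail, inst, _avg in RAIL_RE.findall(line):
                readings[rail].append(int(inst))  # instantaneous power, mW
    finally:
        proc.terminate()
    return {rail: sum(v) / len(v) for rail, v in readings.items() if v}

print(sample_power())  # e.g. {'VDD_IN': 6234.0, 'VDD_CPU_GPU_CV': 2150.3}
```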
Figure 8. Qualitative analysis of common detection failure cases using the MobileNetV2 “full_int8” model (IoU ≤ 0.80). Rows illustrate: (a) missed temples, where the temple arms are not detected despite both lenses being visible; (b) glasses not worn on the face; (c) other cases that do not match the predefined categories; (d) rare, novelty, or designer glasses (unusual or decorative eyewear types); (e) rimless or low-contrast frames (thin, wire, or rimless frames blending into the face or background).
Table 1. Summary of the deep vision models.
Backbone | Parameters ¹ | Depth ² | Size (MB) ³ | Input ⁴
MobileNet (0.5) ⁵ | 5.55 M | 27 | 21.14 | 384 × 384
MobileNetV2 (0.5) ⁵ | 12.51 M | 52 | 47.64 | 384 × 384
DenseNet121 | 16.48 M | 120 | 62.64 | 384 × 384
EfficientNetB0 | 15.85 M | 81 | 60.33 | 384 × 384
EfficientNetB1 | 18.38 M | 115 | 69.83 | 384 × 384
EfficientNetB2 | 20.74 M | 115 | 78.91 | 384 × 384
EfficientNetB3 | 24.94 M | 130 | 94.80 | 384 × 384
EfficientNetB4 | 34.19 M | 160 | 129.82 | 384 × 384
EfficientNetB5 | 47.39 M | 194 | 179.99 | 384 × 384
EfficientNetV2B0 | 17.72 M | 91 | 67.36 | 384 × 384
EfficientNetV2B1 | 18.73 M | 111 | 71.17 | 384 × 384
EfficientNetV2B2 | 21.75 M | 116 | 82.62 | 384 × 384
EfficientNetV2B3 | 27.09 M | 136 | 102.84 | 384 × 384
¹ Total number of model parameters (backbone and head); ² number of convolutional layers in the backbone; ³ size of the model; ⁴ input image size (resolution); ⁵ the alpha (width multiplier) parameter is given in parentheses.
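All backbones in Table 1 are available through keras.applications, so comparable per-backbone figures can be approximated as sketched below. The printed numbers will not match the table exactly: Table 1 counts the backbone plus the detection head and uses the authors' own layer-counting convention, whereas the Conv2D check here is only a rough proxy for depth.

```python
from tensorflow import keras

# A few representative backbones from Table 1 (weights=None keeps this offline).
BACKBONES = {
    "MobileNet (0.5)": lambda: keras.applications.MobileNet(
        input_shape=(384, 384, 3), alpha=0.5, include_top=False, weights=None),
    "DenseNet121": lambda: keras.applications.DenseNet121(
        input_shape=(384, 384, 3), include_top=False, weights=None),
    "EfficientNetB0": lambda: keras.applications.EfficientNetB0(
        input_shape=(384, 384, 3), include_top=False, weights=None),
}

for name, build in BACKBONES.items():
    m = build()
    conv_layers = sum(isinstance(l, keras.layers.Conv2D) for l in m.layers)
    print(f"{name}: {m.count_params() / 1e6:.2f} M backbone parameters, "
          f"{conv_layers} Conv2D layers")
```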
Table 2. Performance summary of the models, evaluated on the validation sets across 3-fold cross-validation splits of the FFHQ dataset. Each cell reports the mean ± standard deviation, with the change relative to the unoptimized baseline in parentheses. Values with a darker gray background indicate the best results; a lighter gray background denotes the second-best results.
Model / Metric | None ⁵ | Float16 | Dynamic int8 | Full int8

MobileNet (0.5)
  Size (MB) ¹ | 21.14 | 10.58 (0.50) | 5.39 (0.25) | 5.44 (0.26)
  IoU ² | 0.901 ± 0.002 | 0.901 ± 0.002 (1.00) | 0.899 ± 0.002 (1.00) | 0.873 ± 0.019 (0.97)
  RPi5 On ³ | 26.25 ± 0.622 | 26.51 ± 0.108 (1.01) | 41.18 ± 0.733 (1.57) | 72.63 ± 0.219 (2.77)
  RPi5 Off ⁴ | 17.80 ± 0.205 | 18.44 ± 0.227 (1.04) | 26.04 ± 0.455 (1.46) | 39.32 ± 0.037 (2.21)
  Jetson On ³ | 22.39 ± 0.024 | 22.37 ± 0.160 (1.00) | 40.66 ± 0.127 (1.82) | 50.53 ± 0.057 (2.26)
  Jetson Off ⁴ | 17.13 ± 0.053 | 17.32 ± 0.026 (1.01) | 28.25 ± 0.047 (1.65) | 30.34 ± 0.104 (1.77)

MobileNetV2 (0.5)
  Size (MB) ¹ | 47.64 | 23.85 (0.50) | 12.08 (0.25) | 12.18 (0.26)
  IoU ² | 0.893 ± 0.004 | 0.893 ± 0.004 (1.00) | 0.893 ± 0.003 (1.00) | 0.878 ± 0.013 (0.98)
  RPi5 On ³ | 28.46 ± 0.180 | 28.82 ± 0.285 (1.01) | 32.38 ± 0.821 (1.14) | 65.73 ± 0.132 (2.31)
  RPi5 Off ⁴ | 17.24 ± 0.234 | 17.90 ± 0.244 (1.04) | 21.16 ± 0.384 (1.23) | 34.11 ± 0.140 (1.98)
  Jetson On ³ | 26.73 ± 0.050 | 26.76 ± 0.062 (1.00) | 38.14 ± 0.150 (1.43) | 46.45 ± 0.054 (1.74)
  Jetson Off ⁴ | 18.63 ± 0.051 | 18.69 ± 0.057 (1.00) | 23.86 ± 0.088 (1.28) | 26.76 ± 0.092 (1.44)

DenseNet121
  Size (MB) ¹ | 62.64 | 31.41 (0.50) | 16.08 (0.26) | 16.02 (0.26)
  IoU ² | 0.891 ± 0.010 | 0.891 ± 0.010 (1.00) | 0.892 ± 0.012 (1.00) | 0.257 ± 0.196 (0.29)
  RPi5 On ³ | 1.488 ± 0.004 | 1.486 ± 0.005 (1.00) | 2.572 ± 0.008 (1.73) | 4.883 ± 0.007 (3.28)
  RPi5 Off ⁴ | 0.697 ± 0.002 | 0.694 ± 0.001 (1.00) | 2.037 ± 0.038 (2.92) | 2.581 ± 0.005 (3.70)
  Jetson On ³ | 1.268 ± 0.000 | 1.269 ± 0.009 (1.00) | 3.349 ± 0.002 (2.64) | 3.612 ± 0.002 (2.85)
  Jetson Off ⁴ | 0.989 ± 0.000 | 0.989 ± 0.001 (1.00) | 2.810 ± 0.002 (2.84) | 2.699 ± 0.002 (2.73)

EfficientNetB0
  Size (MB) ¹ | 60.33 | 30.22 (0.50) | 15.58 (0.26) | 15.94 (0.26)
  IoU ² | 0.905 ± 0.007 | 0.904 ± 0.007 (1.00) | 0.904 ± 0.006 (1.00) | 0.848 ± 0.047 (0.94)
  RPi5 On ³ | 5.217 ± 0.039 | 5.201 ± 0.038 (1.00) | 6.179 ± 0.035 (1.18) | 15.64 ± 0.040 (3.00)
  RPi5 Off ⁴ | 4.094 ± 0.021 | 4.213 ± 0.015 (1.03) | 4.933 ± 0.059 (1.21) | 4.516 ± 0.009 (1.10)
  Jetson On ³ | 5.012 ± 0.006 | 5.014 ± 0.007 (1.00) | 6.522 ± 0.014 (1.30) | 11.59 ± 0.004 (2.31)
  Jetson Off ⁴ | 4.257 ± 0.190 | 4.341 ± 0.074 (1.02) | 5.778 ± 0.012 (1.36) | 4.726 ± 0.006 (1.11)

EfficientNetB1
  Size (MB) ¹ | 69.83 | 35.00 (0.50) | 18.22 (0.26) | 18.74 (0.27)
  IoU ² | 0.904 ± 0.010 | 0.904 ± 0.010 (1.00) | 0.905 ± 0.007 (1.00) | 0.790 ± 0.164 (0.87)
  RPi5 On ³ | 3.748 ± 0.035 | 3.779 ± 0.020 (1.01) | 4.387 ± 0.027 (1.17) | 11.19 ± 0.020 (2.99)
  RPi5 Off ⁴ | 2.961 ± 0.010 | 3.007 ± 0.015 (1.02) | 3.651 ± 0.029 (1.23) | 3.061 ± 0.006 (1.03)
  Jetson On ³ | 3.533 ± 0.004 | 3.536 ± 0.004 (1.00) | 4.636 ± 0.008 (1.31) | 8.186 ± 0.011 (2.32)
  Jetson Off ⁴ | 3.049 ± 0.004 | 3.079 ± 0.004 (1.01) | 4.166 ± 0.006 (1.37) | 3.235 ± 0.003 (1.06)

EfficientNetB2
  Size (MB) ¹ | 78.91 | 39.52 (0.50) | 20.53 (0.26) | 21.09 (0.27)
  IoU ² | 0.913 ± 0.002 | 0.913 ± 0.002 (1.00) | 0.911 ± 0.002 (1.00) | 0.895 ± 0.008 (0.98)
  RPi5 On ³ | 3.410 ± 0.026 | 3.405 ± 0.021 (1.00) | 4.169 ± 0.024 (1.22) | 10.37 ± 0.017 (3.04)
  RPi5 Off ⁴ | 2.634 ± 0.006 | 2.662 ± 0.012 (1.01) | 3.379 ± 0.030 (1.28) | 2.878 ± 0.003 (1.09)
  Jetson On ³ | 3.189 ± 0.007 | 3.190 ± 0.004 (1.00) | 4.321 ± 0.007 (1.35) | 7.566 ± 0.003 (2.37)
  Jetson Off ⁴ | 2.730 ± 0.082 | 2.785 ± 0.003 (1.02) | 3.886 ± 0.005 (1.42) | 3.034 ± 0.001 (1.11)

EfficientNetB3
  Size (MB) ¹ | 94.80 | 47.50 (0.50) | 24.72 (0.26) | 25.46 (0.27)
  IoU ² | 0.905 ± 0.002 | 0.905 ± 0.002 (1.00) | 0.903 ± 0.002 (1.00) | 0.878 ± 0.013 (0.97)
  RPi5 On ³ | 2.487 ± 0.013 | 2.511 ± 0.016 (1.01) | 3.039 ± 0.013 (1.22) | 7.444 ± 0.023 (2.99)
  RPi5 Off ⁴ | 1.908 ± 0.007 | 1.938 ± 0.006 (1.02) | 2.491 ± 0.011 (1.31) | 2.117 ± 0.001 (1.11)
  Jetson On ³ | 2.269 ± 0.004 | 2.264 ± 0.005 (1.00) | 3.125 ± 0.004 (1.38) | 5.439 ± 0.001 (2.40)
  Jetson Off ⁴ | 1.971 ± 0.001 | 1.988 ± 0.012 (1.01) | 2.857 ± 0.004 (1.45) | 2.206 ± 0.001 (1.12)

EfficientNetB4
  Size (MB) ¹ | 129.8 | 65.03 (0.50) | 33.92 (0.26) | 34.96 (0.27)
  IoU ² | 0.903 ± 0.010 | 0.903 ± 0.010 (1.00) | 0.901 ± 0.009 (1.00) | 0.776 ± 0.190 (0.86)
  RPi5 On ³ | 1.816 ± 0.007 | 1.796 ± 0.004 (0.99) | 2.248 ± 0.006 (1.24) | 5.493 ± 0.011 (3.03)
  RPi5 Off ⁴ | 1.356 ± 0.005 | 1.372 ± 0.004 (1.01) | 1.845 ± 0.012 (1.36) | 1.562 ± 0.001 (1.15)
  Jetson On ³ | 1.555 ± 0.003 | 1.553 ± 0.002 (1.00) | 2.289 ± 0.013 (1.47) | 3.985 ± 0.001 (2.56)
  Jetson Off ⁴ | 1.369 ± 0.001 | 1.382 ± 0.001 (1.01) | 2.121 ± 0.002 (1.55) | 1.650 ± 0.000 (1.21)

EfficientNetB5
  Size (MB) ¹ | 180.0 | 90.13 (0.50) | 46.97 (0.26) | 48.43 (0.27)
  IoU ² | 0.912 ± 0.003 | 0.912 ± 0.003 (1.00) | 0.908 ± 0.001 (1.00) | 0.680 ± 0.339 (0.75)
  RPi5 On ³ | 1.247 ± 0.003 | 1.246 ± 0.004 (1.00) | 1.632 ± 0.004 (1.31) | 3.830 ± 0.006 (3.07)
  RPi5 Off ⁴ | 0.929 ± 0.002 | 0.928 ± 0.002 (1.00) | 1.317 ± 0.006 (1.42) | 1.090 ± 0.001 (1.17)
  Jetson On ³ | 1.045 ± 0.001 | 1.045 ± 0.001 (1.00) | 1.612 ± 0.010 (1.54) | 2.748 ± 0.001 (2.63)
  Jetson Off ⁴ | 0.922 ± 0.001 | 0.926 ± 0.001 (1.00) | 1.496 ± 0.001 (1.62) | 1.153 ± 0.000 (1.25)

EfficientNetV2B0
  Size (MB) ¹ | 67.36 | 33.74 (0.50) | 17.53 (0.26) | 18.05 (0.27)
  IoU ² | 0.899 ± 0.008 | 0.899 ± 0.008 (1.00) | 0.900 ± 0.009 (1.00) | 0.862 ± 0.046 (0.96)
  RPi5 On ³ | 5.081 ± 0.010 | 5.080 ± 0.010 (1.00) | 8.670 ± 0.068 (1.71) | 16.37 ± 0.028 (3.22)
  RPi5 Off ⁴ | 2.978 ± 0.010 | 3.112 ± 0.038 (1.04) | 6.018 ± 0.050 (2.02) | 5.848 ± 0.034 (1.96)
  Jetson On ³ | 4.121 ± 0.006 | 4.125 ± 0.004 (1.00) | 7.905 ± 0.014 (1.92) | 11.43 ± 0.004 (2.77)
  Jetson Off ⁴ | 3.390 ± 0.003 | 3.403 ± 0.004 (1.00) | 6.368 ± 0.008 (1.88) | 5.702 ± 0.005 (1.68)

EfficientNetV2B1
  Size (MB) ¹ | 71.17 | 35.65 (0.50) | 18.61 (0.26) | 19.21 (0.27)
  IoU ² | 0.905 ± 0.003 | 0.905 ± 0.003 (1.00) | 0.905 ± 0.003 (1.00) | 0.894 ± 0.002 (0.99)
  RPi5 On ³ | 3.698 ± 0.008 | 3.698 ± 0.007 (1.00) | 6.238 ± 0.046 (1.69) | 12.31 ± 0.014 (3.33)
  RPi5 Off ⁴ | 2.176 ± 0.021 | 2.235 ± 0.007 (1.03) | 4.387 ± 0.035 (2.02) | 4.540 ± 0.040 (2.09)
  Jetson On ³ | 2.998 ± 0.002 | 3.000 ± 0.003 (1.00) | 5.969 ± 0.010 (1.99) | 8.547 ± 0.003 (2.85)
  Jetson Off ⁴ | 2.460 ± 0.003 | 2.459 ± 0.002 (1.00) | 4.749 ± 0.006 (1.93) | 4.424 ± 0.004 (1.80)

EfficientNetV2B2
  Size (MB) ¹ | 82.62 | 41.39 (0.50) | 21.59 (0.26) | 22.29 (0.27)
  IoU ² | 0.907 ± 0.003 | 0.907 ± 0.003 (1.00) | 0.905 ± 0.002 (1.00) | 0.892 ± 0.001 (0.98)
  RPi5 On ³ | 3.305 ± 0.010 | 3.361 ± 0.021 (1.02) | 5.614 ± 0.027 (1.70) | 10.99 ± 0.015 (3.33)
  RPi5 Off ⁴ | 2.049 ± 0.008 | 2.061 ± 0.010 (1.01) | 4.115 ± 0.050 (2.01) | 4.047 ± 0.007 (1.97)
  Jetson On ³ | 2.647 ± 0.005 | 2.647 ± 0.002 (1.00) | 5.362 ± 0.008 (2.03) | 7.261 ± 0.441 (2.74)
  Jetson Off ⁴ | 2.194 ± 0.002 | 2.197 ± 0.002 (1.00) | 4.337 ± 0.007 (1.98) | 4.009 ± 0.004 (1.83)

EfficientNetV2B3
  Size (MB) ¹ | 102.8 | 51.51 (0.50) | 26.95 (0.26) | 27.88 (0.27)
  IoU ² | 0.908 ± 0.002 | 0.908 ± 0.002 (1.00) | 0.907 ± 0.002 (1.00) | 0.895 ± 0.001 (0.99)
  RPi5 On ³ | 2.539 ± 0.007 | 2.571 ± 0.006 (1.01) | 4.548 ± 0.044 (1.79) | 8.297 ± 0.012 (3.27)
  RPi5 Off ⁴ | 1.571 ± 0.019 | 1.539 ± 0.009 (0.98) | 3.239 ± 0.040 (2.06) | 3.138 ± 0.023 (2.00)
  Jetson On ³ | 1.962 ± 0.002 | 1.962 ± 0.002 (1.00) | 4.106 ± 0.006 (2.09) | 5.742 ± 0.001 (2.93)
  Jetson Off ⁴ | 1.615 ± 0.001 | 1.616 ± 0.001 (1.00) | 3.464 ± 0.003 (2.15) | 3.106 ± 0.003 (1.92)
¹ The size of the model weights (MB); ² detection accuracy measured as intersection over union (IoU); ³ inference rate on the low-power platforms in frames per second with the TensorFlow Lite XNNPACK delegate enabled; ⁴ inference rate with the delegate disabled; ⁵ optimization types, where “none” is the baseline (no optimization); ⁶ impact of optimization expressed as a ratio to the baseline.
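Detection accuracy in Table 2 is the intersection over union (IoU) between the predicted and annotated bounding boxes (footnote ²). A minimal sketch of that metric is given below; the corner-format (x_min, y_min, x_max, y_max) box convention is an assumption for illustration.

```python
def iou(box_a, box_b):
    # Boxes in (x_min, y_min, x_max, y_max) format.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0.10, 0.20, 0.60, 0.50), (0.15, 0.25, 0.65, 0.55)))  # 0.6
```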
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

