Proceeding Paper

Design of a Lightweight Video-Based Ear Biometric System on Raspberry Pi 5 Using You Only Look Once Version 12 and EfficientNet-4 †

by Kristian Emmanuel Padilla, Michael Robin Saculsan and John Paul Cruz *
School of Electrical, Electronics, and Computer Engineering, Mapúa University, Manila 1002, Philippines
* Author to whom correspondence should be addressed.
Presented at the 7th Eurasia Conference on IoT, Communication and Engineering 2025 (ECICE 2025), Yunlin, Taiwan, 14–16 November 2025.
Eng. Proc. 2026, 134(1), 50; https://doi.org/10.3390/engproc2026134050
Published: 14 April 2026

Abstract

Recent advances in ear biometrics have yielded increasingly accurate detection and recognition methods, driven by the ear’s uniqueness and permanence as a non-invasive biometric modality. Nonetheless, several limitations persist, including computationally demanding models, inconsistent evaluation metrics, and portable systems restricted by manual capture and limited datasets. To address these challenges, we developed a lightweight, video-based ear biometric system implemented on the Raspberry Pi 5. The system integrates You Only Look Once Version 12 (YOLOv12) for ear detection, EfficientNet-4 for feature extraction, and k-Nearest Neighbors (k-NN) for recognition. Its robust hardware platform combines the Raspberry Pi 5 with the Raspberry Pi AI Camera and AI HAT+. To train, fine-tune, and optimize YOLOv12 and EfficientNet-4, we used the Visual Geometry Group (VGG)Face-Ear dataset for training and the Unconstrained Ear Recognition Challenge 2019 dataset for validation, with k-NN employed for classification. The system was evaluated for classification accuracy and system-level performance. Thirteen participants, comprising 10 enrolled and 3 unenrolled subjects, tested the system. Enrolled participants were correctly identified, whereas unenrolled participants were rejected. The system achieved 92.31% accuracy, 95.45% precision, 96.97% recall, and an F1-score of 0.95, confirming the feasibility of deploying advanced ear biometric methods on embedded, resource-constrained devices.

1. Introduction

The human ear is a robust and reliable biometric, characterized by its anatomical uniqueness, universality, and permanence, with the ear’s cartilage remaining remarkably stable and predictable throughout most of an individual’s life [1,2,3,4,5]. In contrast to facial or fingerprint recognition, ear biometrics is non-intrusive, therefore passive and autonomous of the subject, and unaffected by changes in facial expression or the use of face coverings [2,6]. However, the widespread adoption of ear biometric systems is hindered by the high computational demands of advanced recognition methods, inconsistent performance under occlusions, and limited compatibility with low-power embedded devices, making them impractical for portable use [1,2,7].
Nevertheless, among the significant developments in the literature on ear biometrics, Ref. [8] demonstrated the viability of portable ear biometric systems, contrasting state-of-the-art traditional and deep learning-based ear detection and recognition approaches synthesized in studies such as [3,4,6,9,10,11,12]. Convolutional Neural Networks (CNNs) in ear biometrics have been widely used, including their use for occluded ear recognition [2]. Despite their effectiveness, CNNs remain computationally demanding, limiting their applicability to resource-constrained platforms such as the Raspberry Pi. While server-based offloading can mitigate these demands, it is unsuitable for portable, standalone security devices [13]. Furthermore, existing studies are constrained by the limited availability of open-source ear datasets, inconsistent evaluation metrics, and a disconnect between academic research and practical industry applications.
In this context, Ref. [8] implemented a portable ear biometric system on the Raspberry Pi, representing one of the few attempts to bridge academic research with real-world application. The system reported an accuracy of 98.04%. However, this performance was achieved using a small, non-public dataset of 40 subjects under controlled conditions (restricted to right-ear images and manual capture). The identification threshold was set at 60% through trial and error, and evaluation was limited to accuracy alone. Consequently, the system’s generalizability, real-world applicability, and reliability remain limited.
To address the lack of automation, we developed a real-time ear detection and recognition model and evaluated it using precision, recall, F1-score, and system-level measures, including processing time and CPU and memory utilization. We used publicly available, unconstrained datasets: the Visual Geometry Group (VGG)Face-Ear dataset for training and the Unconstrained Ear Recognition Challenge (UERC) 2019 dataset for validation. These enhancements enable an ear biometric system with modern, automated, and near-instantaneous recognition capabilities, trained on data representative of real-world conditions and evaluated using comprehensive model- and system-level metrics.
We developed a lightweight, video-based ear biometric system on the Raspberry Pi 5 using You Only Look Once Version 12 (YOLOv12), EfficientNet-4, and k-Nearest Neighbors (k-NN) classification. Leveraging CNNs, the proposed system demonstrates the feasibility of video input for portable ear biometrics, enabling fully automated, real-time detection and recognition. A robust hardware platform was also developed for real-time ear biometric acquisition by integrating the Raspberry Pi 5 with the Raspberry Pi AI Camera for video capture and the Raspberry Pi AI HAT+ with its Hailo-8 accelerator for efficient, low-latency edge processing. To train, fine-tune, and optimize YOLOv12 for ear detection and EfficientNet-4 for feature extraction and recognition, we used the large-scale VGGFace-Ear dataset for training and the UERC 2019 dataset for validation, with k-NN employed for identification matching. We also conducted a comprehensive performance evaluation using standard classification metrics (accuracy, precision, recall, and F1-score) and system-level metrics (processing time, CPU and random access memory utilization, and power consumption) to assess practical viability in real-world applications. Testing involved 13 participants under semi-controlled conditions, with informed consent and data confidentiality ensured. The lightweight 8-megabyte YOLO model compatible with the Raspberry Pi AI Camera reduces computational load; however, the system processes one ear at a time and does not address occlusion or 3D geometry.

2. Related Literature

A biometric system automates individual identification by analyzing biometric traits—physical, behavioral, or physiological attributes whose distinctive features are extracted and stored for recognition [4,14,15,16]. These traits are mainly evaluated based on universality, uniqueness, permanence, measurability, performance, acceptability, and resistance to circumvention [11,16]. The human ear satisfies these criteria through its uniqueness [8,11,17], universality [3], anatomical permanence [9,11,17], and non-intrusive measurability [2,18], enabling remote capture and establishing it as a passive biometric [14]. An ear biometric system requires a dataset of ears and their identities, an ear-detection method, preprocessing and feature-extraction procedures, and a recognition or classification algorithm [7]. Such systems are commonly evaluated using accuracy, precision, recall, and F1-score [4]. These metrics are consistently reported in CNN-based ear recognition studies [7,12,19,20,21,22,23,24,25,26]. Additional evaluation measures include the receiver operating characteristic (ROC) curve [1,7], mean average precision (mAP) for detection tasks [27], training and validation loss [21], and specificity [26]. This image-based modality contrasts with time-series methods that employ Recurrent Neural Networks [28].
Among image-based models, CNNs achieve state-of-the-art performance in detection, segmentation, and classification [4,12,29]. Two CNN families are used in this study. First, YOLO is a state-of-the-art real-time object detection framework [30,31]. Its lightweight variants have been successfully deployed on embedded platforms such as the Raspberry Pi [32]. YOLO has also been applied to biomedical detection tasks [33], demonstrating its adaptability across domains. Second, EfficientNet was employed for ear recognition through uniform scaling of width, depth, and resolution [34]. EfficientNet has been widely adopted across diverse image classification and detection tasks [20,21,22,23,24,25,26,35,36,37,38,39,40,41]. In biometric applications, it has been applied to ear recognition [39] and finger-vein identification [40]. EfficientNet is frequently compared with MobileNet in lightweight recognition systems [20,21,22,23,24,25,26,36,37], where comparative analyses report higher accuracy for EfficientNet in several scenarios [20,21,22,23,24,36]. At the same time, MobileNet-based architectures remain favorable for faster inference and reduced computational complexity [25,26,42]. These findings highlight the accuracy–efficiency trade-off relevant to portable and embedded deployments [36,37]. CNNs have also been utilized purely as feature extractors in ear biometric studies [43,44,45].
While model architecture strongly influences performance, the quality and diversity of datasets used for training and fine-tuning are equally critical. Limited datasets have been shown to degrade CNN performance [36]. However, open-source ear datasets remain scarce, particularly with respect to ethnic diversity, and are often collected under constrained conditions that fail to reflect real-world scenarios. The VGGFace-Ear dataset (234,651 images, 660 subjects) and the UERC 2019 dataset (11,804 images, 3704 subjects) are among the few publicly available unconstrained datasets [4].
At the system level, deploying CNN models on resource-constrained edge devices requires systematic optimization [46]. Single-board computers such as the Raspberry Pi 5 possess limited computational resources, particularly for live video processing, and must balance low power consumption, thermal stability, and real-time responsiveness [47,48,49]. Despite these constraints, CNNs such as YOLOv5 and YOLOv4-tiny have been successfully implemented on the Raspberry Pi 4/B [27,32,50], while GFP-GAN and custom CNN architectures have been demonstrated on the Raspberry Pi 5 [51,52]. Key evaluation metrics include CPU and random access memory (RAM) utilization, power consumption, and average processing time or frame rate [50,52].

3. Methodology

We developed a lightweight, video-based ear biometric system on the Raspberry Pi 5, leveraging YOLOv12 and EfficientNet-4 as state-of-the-art techniques for portable biometric systems. Model- and system-level performance were evaluated using the metrics described in Section 3.4.
The developed ear biometric system processes live video input through a YOLO-enabled camera for real-time ear detection and cropping. Each detected ear frame is sent to the Raspberry Pi 5, where a fine-tuned EfficientNet model extracts features to generate an averaged feature vector. This vector is compared with stored representations in the database using a k-NN classifier to identify the subject or output a no-match result (Figure 1).
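The identification step above amounts to a nearest-template search over the stored database. A minimal pure-Python sketch follows; the cosine distance metric and the rejection threshold of 0.4 are illustrative assumptions, as the paper does not specify either:

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity between two feature vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def identify(query, templates, threshold=0.4):
    """Return the enrolled identity whose stored template is nearest to
    the averaged query vector, or None (no match) if even the best
    candidate exceeds the distance threshold."""
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        d = cosine_distance(query, template)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None
```

A 1-nearest-neighbor search with a distance threshold is the simplest k-NN configuration consistent with the match/no-match output described above; larger k would vote among the k closest templates instead.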

3.1. System Components

The system includes a Raspberry Pi 5 as the computing platform, a Raspberry Pi AI Camera running the YOLO model for ear detection, and a Raspberry Pi AI HAT+ hosting the EfficientNet model for feature extraction (Figure 2). Modules for cooling, power, storage, and interfacing are summarized in Table 1.

3.2. Software Development

The system software (Figure 3 and Figure 4) is responsible for the enrollment mode for registering users by processing cropped ear frames into extracted feature vectors, which are paired with user names and stored as biometric templates, and the identification mode that executes the complete recognition pipeline. The system conducts ear detection with YOLOv12, feature extraction with EfficientNet, and classification using k-NN. These processes constitute the backend, while a graphical user interface (GUI) facilitates user interaction. The software operates on a customized 64-bit Raspberry Pi OS (Bookworm), optimized for startup execution and implemented entirely in Python 3.11.8.
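The enrollment mode described above can be sketched as averaging per-frame feature vectors into a single template stored under the user's name; the function name and the in-memory dictionary standing in for the template database are illustrative assumptions:

```python
def enroll(name, frame_features, database):
    """Average the feature vectors extracted from several cropped ear
    frames into one biometric template and store it under the user's
    name. `frame_features` is a list of equal-length vectors."""
    n = len(frame_features)
    dim = len(frame_features[0])
    template = [sum(vec[i] for vec in frame_features) / n
                for i in range(dim)]
    database[name] = template
    return template
```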

3.3. Model Fine-Tuning Process

To fine-tune both YOLOv12 and EfficientNet-4 models, the VGGFace-Ear dataset was split 90:10 into training and test subsets (660 subjects; 213,318 and 21,333 images, respectively), and the UERC 2019 test set (9500 images) was used as an external test set to evaluate their generalization capabilities. The VGGFace-Ear dataset was provided by the Concytec-World Bank Project (ProCiencia) [53,54], while the UERC 2019 dataset was provided by the University of Ljubljana, Slovenia [9,43,44,45,55,56,57]. YOLOv12’s fine-tuning parameters were adjusted until its performance was satisfactory on the VGGFace-Ear and UERC 2019 test sets. Once overall satisfactory performance was achieved, the fine-tuned model was converted, packaged, and deployed to the RPi AI Camera.
The EfficientNet-4 fine-tuning process included an additional preprocessing step on the VGGFace-Ear and UERC 2019 datasets, where cropped ear images were generated using the fine-tuned YOLOv12 model. During fine-tuning on the VGGFace-Ear training set, repeated collapse checks and automatic parameter adjustments were applied, continuing for 400 epochs until satisfactory performance was achieved on both the VGGFace-Ear and UERC 2019 test sets. The final model was then converted, packaged, and deployed to the Raspberry Pi AI HAT+ (Hailo-8).
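The 90:10 split can be sketched as a per-subject partition, an assumption consistent with all 660 subjects appearing in both subsets; the helper name and the fixed seed are illustrative:

```python
import random

def split_per_subject(images_by_subject, train_frac=0.9, seed=0):
    """Split each subject's image list 90:10 into train/test subsets so
    that every subject appears in both (a closed-set identification
    split). Returns two dicts keyed by subject."""
    rng = random.Random(seed)
    train, test = {}, {}
    for subject, images in images_by_subject.items():
        shuffled = list(images)
        rng.shuffle(shuffled)  # randomize before cutting
        cut = max(1, round(train_frac * len(shuffled)))
        train[subject] = shuffled[:cut]
        test[subject] = shuffled[cut:]
    return train, test
```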

3.4. Experimental Design and Statistical Treatment

Before system evaluation, the fine-tuned YOLOv12 and EfficientNet models were assessed using four confusion-matrix-based metrics: accuracy, precision, recall, and F1-score. The same metrics determined whether performance on the VGGFace-Ear and UERC 2019 datasets was satisfactory. The system testing involved 13 participants across the following phases.
  • Enrollment phase: Ten participants were enrolled. Upon successful detection, five frames were recorded, feature-extracted, and averaged into biometric templates.
  • Identification phase: All 13 participants underwent recognition. Upon detection, a three-second video stream (up to 45 frames) of the ear was recorded and feature-extracted. Identification results from valid frames were aggregated to determine the outcome (match or no match).
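The frame-level aggregation in the identification phase can be sketched as a majority vote over valid frames; the paper does not state its exact aggregation rule, so the voting scheme and threshold below are illustrative assumptions:

```python
from collections import Counter

def aggregate_frames(frame_results, min_votes=0.5):
    """Aggregate per-frame identification results (an identity string,
    or None for a rejected/invalid frame) into a final decision:
    the winning identity on a match, or None on no match."""
    valid = [r for r in frame_results if r is not None]
    if not valid:
        return None  # no valid frames at all
    identity, votes = Counter(valid).most_common(1)[0]
    # require a majority of the valid frames to agree
    return identity if votes / len(valid) >= min_votes else None
```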
Figure 5 illustrates participant positioning during both phases. The results were summarized in a confusion matrix to derive system-level accuracy, precision, recall, and F1-score. Additionally, overall processing time, CPU and RAM utilization, and power consumption were measured using Python’s psutil library.
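The four confusion-matrix-based metrics can be computed from paired true/predicted labels as sketched below; macro-averaging of precision and recall across classes is an assumption, since the paper does not state its averaging scheme:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score
    derived from a multi-class confusion matrix implied by the paired
    true/predicted label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == p == c for t, p in zip(y_true, y_pred))
        fp = sum(p == c and t != c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / len(labels)
    recall = sum(recalls) / len(labels)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```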

4. Results and Discussion

4.1. Model Fine-Tuning Results

The fine-tuned YOLOv12 and EfficientNet-4 models achieved high detection and feature-extraction performance, respectively; Table 2 and Table 3 summarize the metrics. The YOLOv12 model demonstrated reliable ear-region detection and minimal false classifications. The EfficientNet-4 model yielded strong feature-extraction results on both datasets, confirming its discriminative capability for ear-based identification. Overall, these metrics indicate that the proposed models effectively balance accuracy, precision, and recall, providing a robust foundation for the developed biometric system.

4.2. System Testing Results

The system evaluation achieved a 92.31% accuracy, a 95.45% precision, a 96.97% recall, and an F1-score of 0.95. These results indicate that the system accurately identifies enrolled individuals while minimizing recognition of non-enrolled subjects, with only a single false positive observed in the non-registered class. Overall, the evaluation results confirm that the integration of YOLOv12 and EfficientNet-4 within the ear biometric framework provides robust identification performance on the Raspberry Pi 5 platform, thereby validating its feasibility for lightweight biometric applications.
Table 4 provides the confusion matrix for the stated evaluation, where participants not enrolled in the system are grouped under a single class labeled “Not Recognized” (NR). A representative subset of five frames per participant was used for the matrix, although the actual number of frames processed per inference may range from 30 to 45 (10–15 FPS over 3 s).
Supporting the system’s model-level analysis, its hardware-based performance results indicate that the developed ear biometric system sustained an average processing rate of 12.8 frames per second (FPS) (78 ms per inference) during typical operation on the Raspberry Pi 5 with the Hailo-8 accelerator. The total power draw averaged 12.35 W (2.47 A at 5 V), corresponding to 310 mJ of energy per identification and a power efficiency of 1.04 IDs/W. CPU utilization remained below 50%, while the Hailo-8 utilization remained at 35% and RAM usage reached 1.38 GB (17% of available memory). The measured thermal output of 42.1 British thermal units per hour (BTU/h) resulted in steady operating temperatures of 52 °C for the system and 48 °C for the accelerator. These results confirm that the hardware platform delivers real-time biometric identification performance while maintaining efficient power consumption and stable thermal characteristics suitable for sustained embedded deployment.

5. Conclusions

The gap between academic research and practical applications in ear biometrics continues to hinder real-world deployment. We developed a lightweight, video-based system employing YOLOv12 and EfficientNet-4, enabling real-time ear capture and low-latency recognition on portable hardware. The evaluation results demonstrated strong performance, achieving 92.31% accuracy, 95.45% precision, 96.97% recall, and an F1-score of 0.95, while operating at 85 ms per frame (11.76 FPS), with Hailo-8 utilization at 35%, RAM usage of 1.02 GB (15.2%), and power consumption of 12.35 W (2.47 A at 5 V) under typical load. These results confirm the feasibility of deploying advanced biometric models, fine-tuned and validated with publicly available unconstrained datasets, on portable, resource-constrained systems.
Future work should test the system with larger and more diverse participant pools and evaluate performance across demographic and environmental factors such as ethnicity, ear occlusion, age group, lighting conditions, background complexity, crowd density, capture angle, distance, and video quality. The models and software should also be further optimized to reduce resource usage on the Raspberry Pi and similar platforms, and both constrained and unconstrained datasets, along with the fine-tuning of alternative pre-trained models, should be explored for ear biometrics.

Author Contributions

Conceptualization, J.P.C., K.E.P. and M.R.S.; methodology, K.E.P. and M.R.S.; software, M.R.S.; validation, J.P.C.; formal analysis, K.E.P. and M.R.S.; investigation, J.P.C., K.E.P. and M.R.S.; resources, J.P.C. and K.E.P.; data curation, K.E.P. and M.R.S.; writing—original draft preparation, J.P.C., K.E.P. and M.R.S.; writing—review and editing, J.P.C., K.E.P. and M.R.S.; visualization, K.E.P. and M.R.S.; supervision, J.P.C.; project administration, J.P.C. and K.E.P.; funding acquisition, K.E.P. and M.R.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained, both online and in person, from all participants involved in the study.

Data Availability Statement

The dataset used to train the models in this study is available upon request. The VGGFace-Ear dataset can be requested in VGGFace-Ear at https://github.com/grisellycooper/VGGFace-Ear?tab=readme-ov-file, accessed on 29 June 2025, and the UERC 2019 dataset can be requested in Ear Recognition Research at http://uerc.fri.uni-lj.si/index.html, accessed on 26 July 2025. Restrictions and obligations apply when requesting these datasets. The raw data used for system testing is not readily or publicly available to protect participant confidentiality and comply with ethical research standards; however, supplementary data analysis will be made available by the authors on request.

Acknowledgments

The authors thank their colleagues from the Mapúa University School of Electrical, Electronics, and Computer Engineering (EECE) for their support and assistance in this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mursalin, M.; Ahmed, M.; Haskell-Dowland, P. Biometric security: A novel ear recognition approach using a 3D morphable ear model. Sensors 2022, 22, 8988. [Google Scholar] [CrossRef]
  2. Tey, H.-C.; Chong, L.Y.; Chong, S.-C. Comparative analysis of VGG-16 and ResNet-50 for occluded ear recognition. J. Inform. Vis. 2023, 7, 2247–2254. [Google Scholar]
  3. Singh, D.; Singh, S. A survey on human ear recognition system based on 2D and 3D ear images. Open J. Inf. Secur. Appl. 2014, 2, 21–30. [Google Scholar] [CrossRef]
  4. Oyebiyi, O.G.; Abayomi-Alli, A.; Arogundade, O.T.; Qazi, A.; Imoize, A.L.; Awotunde, J.B. A systematic literature review on human ear biometrics: Approaches, algorithms, and trends in the last decade. Information 2023, 14, 192. [Google Scholar] [CrossRef]
  5. Burge, M.; Burger, W. Ear biometrics. In Biometrics; Li, S.Z., Jain, A.K., Eds.; Springer: Boston, MA, USA, 2006; pp. 273–285. [Google Scholar]
  6. Abaza, A.; Ross, A.; Hebert, C.; Harrison, M.A.F.; Nixon, M.S. A survey on ear biometrics. ACM Comput. Surv. 2013, 45, 1–36. [Google Scholar] [CrossRef]
  7. Kamboj, A.; Rani, R.; Nigam, A.; Jha, R.R. CED-Net: Context-aware ear detection network for unconstrained images. Pattern Anal. Appl. 2021, 24, 779–800. [Google Scholar] [CrossRef]
  8. Balangue, R.D.; Padilla, C.D.M.; Linsangan, N.B.; Cruz, J.P.T.; Juanatas, R.A.; Juanatas, I.C. Ear recognition for ear biometrics using integrated image processing techniques via Raspberry Pi. In Proceedings of the 2022 IEEE 14th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), Boracay Island, Philippines, 1–4 December 2022. [Google Scholar]
  9. Emeršič, Ž.; Štruc, V.; Peer, P. Ear recognition: More than a survey. Neurocomputing 2016, 255, 26–39. [Google Scholar] [CrossRef]
  10. Chutani, P.; Sharma, N. Ear recognition advancements and trends: A scientometrics and systematic review. In Proceedings of the 2024 IEEE 5th India Council International Subsections Conference (INDISCON), Chandigarh, India, 22–24 August 2024. [Google Scholar]
  11. Chutani, P.; Sharma, N. Ear recognition: Advancements and trends in biometric identification. In Proceedings of the 2024 International Conference on Expert Clouds and Applications (ICOECA), Bengaluru, India, 18–19 April 2024; pp. 956–963. [Google Scholar]
  12. Zhang, Y.; Mu, Z. Ear detection under uncontrolled conditions with multiple scale faster region-based convolutional neural networks. Symmetry 2017, 9, 53. [Google Scholar] [CrossRef]
  13. Mehta, R.; Singh, K.K. Deep convolutional neural network-based effective model for 2D ear recognition using data augmentation. Imaging Sci. J. 2024, 72, 403–420. [Google Scholar] [CrossRef]
  14. Suryawanshi, S.H. The ear as a biometric. Int. J. Comput. Sci. Mob. Comput. 2015, 4, 61–65. [Google Scholar]
  15. Dantcheva, A.; Elia, P.; Ross, A. What else does your biometric data reveal? A survey on soft biometrics. IEEE Trans. Inf. Forensics Secur. 2016, 11, 441–467. [Google Scholar] [CrossRef]
  16. Wayman, J.; Jain, A.; Maltoni, D.; Maio, D. (Eds.) Biometric Systems; Springer: London, UK, 2005. [Google Scholar]
  17. Xu, X.; Liu, Y.; Cao, S.; Lu, L. An efficient and lightweight method for human ear recognition based on MobileNet. Wirel. Commun. Mob. Comput. 2022, 2022, 9069007. [Google Scholar] [CrossRef]
  18. Sarangi, P.P.; Mishra, B.S.P.; Dehuri, S.; Cho, S.B. An evaluation of ear biometric system based on enhanced Jaya algorithm and SURF descriptors. Evol. Intell. 2020, 13, 443–461. [Google Scholar] [CrossRef]
  19. Raveane, W.; Galdámez, P.L.; Arrieta, M.A.G. Ear detection and localization with convolutional neural networks in natural images and videos. Processes 2019, 7, 457. [Google Scholar] [CrossRef]
  20. Shams, M.Y.; Hassan, E.; Gamil, S.; Ibrahim, A.; Gabr, E.; Gamal, S.; Ibrahim, E.; Abbas, F.; Mohammed, A.; Khamis, A.; et al. Skin disease classification: A comparison of ResNet50, MobileNet, and Efficient-B0. J. Curr. Multidiscip. Res. 2025, 1, 1–7. [Google Scholar] [CrossRef]
  21. Harahap, S.A.F.; Irmawan, I. Performance comparison of MobileNet, EfficientNet, and Inception for predicting crop disease. Sriwijaya Electr. Comput. Eng. J. 2024, 1, 30–36. [Google Scholar] [CrossRef]
  22. Muthulakshmi, M.; Venkatesan, K.; Harigaran, R.; Jeevanantham, K.; Vineeth, M.S.; Rahayu, S.B.; Sakthivel, V. Comparative study of EfficientNet and MobileNet models for lung cancer classification using chest CT scan images. In Proceedings of the 2024 Second International Conference on Emerging Trends in Information Technology and Engineering (ICETITE), Vellore, India, 22–23 February 2024. [Google Scholar]
  23. Choudhury, B.; Rajakumar, K.; Badhale, A.A.; Roy, A.; Sahoo, R.; Margret, I.N. Comparative analysis of advanced models for satellite-based aircraft identification. In Proceedings of the 2024 International Conference on Smart Systems for Electrical, Electronics, Communication and Computer Engineering (ICSSEECC), Coimbatore, India, 28–29 June 2024; pp. 483–488. [Google Scholar]
  24. Rajkumar, M.; Siriyala, S.R.; Sharma, A.; Kumar, D.N.S. Comparative analysis of ResNet, EfficientNet, and MobileNetV2 for detecting skin conditions in deep learning models. In Proceedings of the 2024 International Conference on Sustainable Communication Networks and Application (ICSCNA), Theni, India, 11–13 December 2024; pp. 1487–1492. [Google Scholar]
  25. Thapliyal, N.; Aeri, M.; Kukreja, V.; Sharma, R. Navigating landscapes through AI: A comparative study of EfficientNet and MobileNetV2 in image classification. In Proceedings of the 2024 International Conference on Emerging Technologies in Computer Science for Interdisciplinary Applications (ICETCS), Bengaluru, India, 22–23 April 2024. [Google Scholar]
  26. Mia, Y.; Rahman, M.S.; Basak, A.; Hossain, S.M.; Zulfiker, M.S.; Sumaia, M.A. Detecting helmets of the bike riders using deep learning algorithms. In Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023. [Google Scholar]
  27. Magayones, L.I.B.; Tañag, R.D.B.; Villaverde, J.F. Asian currency identification and recognition using lightweight convolutional neural network. In Proceedings of the TENCON 2024 IEEE Region 10 Annual International Conference, Singapore, 1–4 December 2024; pp. 1710–1715. [Google Scholar]
  28. Padilla, D.A.; Magwili, G.V.; Mercado, L.B.Z.; Reyes, J.T.L. Air quality prediction using recurrent air quality predictor with ensemble learning. In Proceedings of the 2020 IEEE 12th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), Manila, Philippines, 3–7 December 2020. [Google Scholar]
  29. Manlises, C.O.; Santos, J.B.; Adviento, P.A.; Padilla, D.A. Expiry date character recognition on canned goods using convolutional neural network VGG16 architecture. In Proceedings of the 2023 15th International Conference on Computer and Automation Engineering (ICCAE), Sydney, Australia, 3–5 March 2023; pp. 394–399. [Google Scholar]
  30. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  31. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  32. Chan, C.J.L.; Reyes, E.J.A.; Linsangan, N.B.; Juanatas, R.A. Real-time detection of aquarium fish species using YOLOv4-tiny on Raspberry Pi 4. In Proceedings of the 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), Kota Kinabalu, Malaysia, 13–15 September 2022. [Google Scholar]
  33. Pellegrino, R.V.; Lacuesta, J.H.T.; Dela Cuesta, C.F.L. Nail abnormality identification using Roboflow and YOLOv8. In Proceedings of the 2024 14th International Conference on Biomedical Engineering and Technology, Seoul, Republic of Korea, 14–17 June 2024; pp. 121–127. [Google Scholar]
  34. Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 10691–10700. [Google Scholar]
  35. Vanitha, P.; Mohana Priya, T.; Navasakthi, P.; Rakshana Devi, V.S.; Aarthi, R. Identification of fake logo detection using deep learning. In Proceedings of the 2024 3rd International Conference on Artificial Intelligence for Internet of Things (AIIoT), Vellore, India, 3–4 May 2024. [Google Scholar]
  36. Pu, H.; Yi, K. A comparative analysis of EfficientNet and MobileNet models’ performance on limited datasets: An example of American Sign Language alphabet detection. Highlights Sci. Eng. Technol. 2024, 94, 558–564. [Google Scholar] [CrossRef]
  37. Arjun, P.A.; Suryanarayan, S.; Viswamanav, R.S.; Abhishek, S.; Anjali, T. Unveiling underwater structures: MobileNet vs. EfficientNet in sonar image detection. Procedia Comput. Sci. 2024, 233, 518–527. [Google Scholar] [CrossRef]
  38. Gattu, A. Real-Time Facial Expression Recognition Using EfficientNetB7 Model Architecture. Doctoral Dissertation, Dublin Business School, Dublin, Ireland, 2024. [Google Scholar]
  39. Booysens, A.; Viriri, S. Exploration of ear biometrics using EfficientNet. Comput. Intell. Neurosci. 2022, 2022, 3514807. [Google Scholar] [CrossRef]
  40. Singh, M.; Singla, S.K. EFI-SATL: An EfficientNet and self-attention based biometric recognition for finger-vein using deep transfer learning. Comput. Model. Eng. Sci. 2025, 142, 3003–3029. [Google Scholar] [CrossRef]
  41. Ul Haq, H.F.D.; Ismail, R.; Ismail, S.; Purnama, S.R.; Warsito, B.; Setiawan, J.D.; Wibowo, A. EfficientNet optimization on heartbeats sound classification. In Proceedings of the 2021 5th International Conference on Informatics and Computational Sciences (ICICoS), Semarang, Indonesia, 24–25 November 2021; pp. 216–221. [Google Scholar]
  42. Padilla, D.; Yumang, A.; Diaz, A.L.; Inlong, G. Differentiating atopic dermatitis and psoriasis chronic plaque using convolutional neural network MobileNet architecture. In Proceedings of the 2019 IEEE 11th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), Laoag, Philippines, 29 November–1 December 2019. [Google Scholar]
  43. Emeršič, Ž.; Štepec, D.; Štruc, V.; Peer, P.; George, A.; Ahmad, A.; Omar, E.; Boult, T.E.; Safdaii, R.; Zhou, Y.; et al. The unconstrained ear recognition challenge. In Proceedings of the 2017 IEEE International Joint Conference on Biometrics (IJCB), Denver, CO, USA, 1–4 October 2017; pp. 715–724. [Google Scholar]
  44. Emeršič, Ž.; SV, A.K.; Harish, B.S.; Gutfeter, W.; Khiarak, J.N.; Pacut, A.; Hansley, E.; Segundo, M.P.; Sarkar, S.; Park, H.J.; et al. The unconstrained ear recognition challenge 2019. In Proceedings of the 2019 International Conference on Biometrics (ICB), Crete, Greece, 4–7 June 2019. [Google Scholar]
  45. Emeršič, Ž.; Ohki, T.; Akasaka, M.; Arakawa, T.; Maeda, S.; Okano, M.; Sato, Y.; George, A.; Marcel, S.; Ganapathi, I.I.; et al. The unconstrained ear recognition challenge 2023: Maximizing performance and minimizing bias. In Proceedings of the 2023 IEEE International Joint Conference on Biometrics (IJCB), Ljubljana, Slovenia, 25–28 September 2023. [Google Scholar]
  46. Ameen, S.; Siriwardana, K.; Theodoridis, T. Optimizing deep learning models for Raspberry Pi. arXiv 2023, arXiv:2304.13039. [Google Scholar] [CrossRef]
  47. Manlises, C.O.; Martinez, J.M.; Belenzo, J.L.; Perez, C.K.; Postrero, M.K.T.A. Real-time integrated CCTV using face and pedestrian detection image processing algorithm for automatic traffic light transitions. In Proceedings of the 2015 International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), Cebu, Philippines, 9–12 December 2015. [Google Scholar]
  48. Yumang, A.N.; Avendano, G.O.; Talisic, G.C. Greening data communications and computer networks through the Networking Academy. In Proceedings of the World Congress on Engineering, London, UK, 6–8 July 2011; pp. 355–359. [Google Scholar]
  49. Ghosh, A. YOLO11 on Raspberry Pi: Optimizing Object Detection for Edge. Available online: https://learnopencv.com/yolo11-on-raspberry-pi/ (accessed on 22 June 2025).
  50. Valencia, C.A.A.; Suliva, R.S.S.; Villaverde, J.F. Hardware performance evaluation of different computing devices on YOLOv5 ship detection model. In Proceedings of the 2022 IEEE 14th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), Boracay Island, Philippines, 1–4 December 2022. [Google Scholar]
  51. Anonas, A.C.; Mercado, G.S.J.; Villamor, I.V. Enhancing low-resolution images for FaceNet facial recognition using GFPGAN on Raspberry Pi 5. In Proceedings of the 2025 17th International Conference on Computer and Automation Engineering (ICCAE), Perth, Australia, 20–22 March 2025; pp. 12–16. [Google Scholar]
  52. Alaidi, A.H.M.; Ramadhan, Z.A.; Alrubaye, J.S.; Alrikabi, H.T.S.; Mutar, H.A.; Svyd, I. AI-based monkeypox detection model using Raspberry Pi 5 AI Kit. Sustain. Eng. Innov. 2025, 7, 1–14. [Google Scholar] [CrossRef]
  53. Ramos-Cooper, S.; Gomez-Nieto, E.; Camara-Chavez, G. VGGFace-Ear: An extended dataset for unconstrained ear recognition. Sensors 2022, 22, 1752. [Google Scholar] [CrossRef]
  54. Ramos-Cooper, S.; Camara-Chavez, G. Ear recognition in the wild with convolutional neural networks. In Proceedings of the 2021 XLVII Latin American Computing Conference (CLEI), Cartago, Costa Rica, 25–29 October 2021. [Google Scholar]
  55. Emeršič, Ž.; Meden, B.; Peer, P.; Štruc, V. Evaluation and analysis of ear recognition models: Performance, complexity and resource requirements. Neural Comput. Appl. 2020, 32, 15785–15800. [Google Scholar] [CrossRef]
  56. Emeršič, Ž.; Križaj, J.; Štruc, V.; Peer, P. Deep ear recognition pipeline. Stud. Comput. Intell. 2019, 804, 333–362. [Google Scholar]
  57. Emeršič, Ž.; Gabriel, L.L.; Štruc, V.; Peer, P. Convolutional encoder–decoder networks for pixel-wise ear detection and segmentation. IET Biom. 2018, 7, 175–184. [Google Scholar] [CrossRef]
Figure 1. System workflow.
Figure 2. Hardware design of the developed system.
Figure 3. System software flowchart.
Figure 4. Enrollment and identification mode flowcharts.
Figure 5. Experimental setup.
Table 1. Specifications of the system components.

Component | Specification
RPi 5 | 2.4 GHz CPU, 16 GB RAM
RPi AI HAT+ | Hailo-8 accelerator (26 TOPS)
RPi AI Camera | 12.3 MP Sony IMX500, manual focus
Ring Light | Small, rechargeable
RPi Active Cooler | 8000 RPM ±15% max fan speed
Lafvin 5″ IPS Display | 800 × 480 px, 5-point touch, 60 Hz
DFRobot RPi 5 UPS | 5 V @ 5 A output, 4 × 18650 cells
Micro SD Card | XC1/U3/V30/A2 rated
USB Storage | USB 3.0 (minimum rated speed)
3D-Printed Case | Vented; detachable tripod mount
Table 2. Fine-tuned YOLOv12 performance metrics.

Metric | VGGFace-Ear | UERC 2019
Accuracy | 98.00% | 100.00%
Precision | 100.00% | 100.00%
Recall | 97.14% | 100.00%
F1-score | 0.9855 | 1.0000
Table 3. Fine-tuned EfficientNet-4 performance metrics.

Metric | VGGFace-Ear | UERC 2019
Accuracy | 97.81% | 98.25%
Precision | 98.25% | 100.00%
Recall | 95.62% | 97.14%
F1-score | 0.9691 | 0.9855
Table 4. System evaluation confusion matrix. Rows give the observed (true) class; columns give the predicted class.

Observation | ID 1 | ID 2 | ID 3 | ID 4 | ID 5 | ID 6 | ID 7 | ID 8 | ID 9 | ID 10 | NR | Total
ID 1 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5
ID 2 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5
ID 3 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5
ID 4 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5
ID 5 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 5
ID 6 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 5
ID 7 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 5
ID 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 5
ID 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 5
ID 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 5
NR * | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 15
Total | 10 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 10 | 65
* ID: participant identification; NR: not recognized. Three subjects were not enrolled in the system.
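The headline figures in the abstract (92.31% accuracy, 95.45% precision, 96.97% recall, F1-score 0.95) can be reproduced from Table 4 if precision, recall, and F1 are macro-averaged over the 11 classes (10 enrolled identities plus NR); the paper does not state the averaging scheme, so macro-averaging is an assumption, but it matches all four reported values. A minimal sketch, with the matrix transcribed from Table 4:

```python
# Reproduce the system-level metrics from the Table 4 confusion matrix,
# assuming macro-averaging over the 11 classes (ID 1 .. ID 10, NR).
# cm[row][col]: row = observed class, col = predicted class.
n = 11
cm = [[0] * n for _ in range(n)]
for i in range(10):
    cm[i][i] = 5          # each enrolled subject: 5/5 trials correct
cm[10][0] = 5             # 5 unenrolled trials misidentified as ID 1
cm[10][10] = 10           # 10 unenrolled trials correctly rejected

total = sum(map(sum, cm))                                  # 65 trials
accuracy = sum(cm[i][i] for i in range(n)) / total         # (50 + 10) / 65

col = [sum(cm[r][c] for r in range(n)) for c in range(n)]  # predicted counts
row = [sum(cm[r]) for r in range(n)]                       # observed counts
prec = [cm[i][i] / col[i] for i in range(n)]
rec = [cm[i][i] / row[i] for i in range(n)]
f1 = [2 * p * r / (p + r) for p, r in zip(prec, rec)]

macro = lambda xs: sum(xs) / len(xs)
print(f"accuracy  {accuracy:.2%}")     # 92.31%
print(f"precision {macro(prec):.2%}")  # 95.45%
print(f"recall    {macro(rec):.2%}")   # 96.97%
print(f"F1-score  {macro(f1):.2f}")    # 0.95
```

Only ID 1 loses precision (5 of its 10 predictions come from unenrolled subjects), and only NR loses recall (10 of 15 unenrolled trials rejected), which is why the macro averages sit just below 100%.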