Recent Advances and Applications of Machine Learning in Pattern Recognition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 June 2026 | Viewed by 16,560

Special Issue Editors


Guest Editor
Coimbra Institute of Engineering, Polytechnic University of Coimbra, 3045-093 Coimbra, Portugal
Interests: pattern recognition; machine learning; image processing; biomedical applications

Guest Editor
Applied Research Institute, Polytechnic of Coimbra, 3045-093 Coimbra, Portugal
Interests: artificial intelligence; bioinformatics; computational biology; pattern recognition; machine learning; multi-objective optimization algorithms

Special Issue Information

Dear Colleagues,

In recent years, pattern recognition has undergone remarkable development, driven above all by machine learning techniques. These algorithms have been applied in areas such as medical image analysis, visual recognition, biometrics, remote sensing, communications, and computer vision for autonomous vehicles, enabling pattern recognition systems of greater precision and efficiency. Recent approaches make it possible to process and analyze large volumes of data, and they foster innovative solutions to complex problems.

In this Special Issue, we invite researchers, academics, and practitioners to submit original research articles, reviews, and case studies that explore the recent advances in and applications of machine learning in pattern recognition.

Dr. Verónica Vasconcelos
Dr. Maryam Abbasi
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • pattern recognition
  • machine learning
  • deep learning
  • image processing
  • image segmentation
  • image detection
  • image classification
  • biometrics
  • computer vision

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (14 papers)


Research

18 pages, 1838 KB  
Article
A Deep Learning Model for Wave V Peak Detection in Auditory Brainstem Response Data
by Jun Ma, Nak-Jun Sung, Sungjun Choi, Min Hong and Sungyeup Kim
Electronics 2026, 15(3), 511; https://doi.org/10.3390/electronics15030511 - 25 Jan 2026
Viewed by 244
Abstract
In this study, we propose a YOLO-based object detection algorithm for the automated and accurate identification of the fifth wave (Wave V) in auditory brainstem response (ABR) graphs. The ABR test plays a critical role in the diagnosis of hearing disorders, with the fifth wave serving as a key marker for clinical assessment. However, conventional manual detection is time-consuming and subject to variability depending on the examiner’s expertise. To address these limitations, we developed a real-time detection method that utilizes a YOLO object detection model applied to ABR graph images. Prior to YOLO training, we employed a U-Net-based preprocessing algorithm to automatically remove existing annotated peaks from the ABR images, thereby generating training data suitable for peak detection. The proposed model was evaluated in terms of precision, recall, and mean average precision (mAP). The experimental results demonstrate that the YOLO-based approach achieves high detection performance across these metrics, indicating its potential as an effective tool for reliable Wave V peak localization in audiological applications. Full article
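For readers who want a concrete baseline to compare against, Wave V picking can be reduced to a windowed peak search. The sketch below is a naive stand-in for the paper's YOLO pipeline; the latency window and sampling rate are illustrative assumptions, not values from the study.

```python
# Naive Wave V baseline: largest sample inside a typical latency window.
# NOT the paper's YOLO pipeline -- a windowed-argmax sketch with assumed
# window (5.1-6.9 ms) and an assumed sampling rate.

def pick_wave_v(amplitudes, sample_rate_hz, window_ms=(5.1, 6.9)):
    """Return (latency_ms, amplitude) of the largest sample in the window."""
    lo = int(window_ms[0] * 1e-3 * sample_rate_hz)
    hi = int(window_ms[1] * 1e-3 * sample_rate_hz)
    segment = amplitudes[lo:hi]
    offset = max(range(len(segment)), key=segment.__getitem__)
    idx = lo + offset
    return idx / sample_rate_hz * 1e3, amplitudes[idx]
```

A learned detector earns its keep precisely where this baseline fails: noisy traces, multiple local maxima, and atypical latencies.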

17 pages, 1423 KB  
Article
Residual Motion Correction in Low-Dose Myocardial CT Perfusion Using CNN-Based Deformable Registration
by Mahmud Hasan, Aaron So and Mahmoud R. El-Sakka
Electronics 2026, 15(2), 450; https://doi.org/10.3390/electronics15020450 - 20 Jan 2026
Viewed by 270
Abstract
Dynamic myocardial CT perfusion imaging enables functional assessment of coronary artery stenosis and myocardial microvascular disease. However, it is susceptible to residual motion artifacts arising from cardiac and respiratory activity. These artifacts introduce temporal misalignments, distorting Time-Enhancement Curves (TECs) and leading to inaccurate myocardial perfusion measurements. Traditional nonrigid registration methods can address such motion but are often computationally expensive and less effective when applied to low-dose images, which are prone to increased noise and structural degradation. In this work, we present a CNN-based motion-correction framework specifically trained for low-dose cardiac CT perfusion imaging. The model leverages spatiotemporal patterns to estimate and correct residual motion between time frames, aligning anatomical structures while preserving dynamic contrast behaviour. Unlike conventional methods, our approach avoids iterative optimization and manually defined similarity metrics, enabling faster, more robust corrections. Quantitative evaluation demonstrates significant improvements in temporal alignment, with reduced Target Registration Error (TRE) and increased correlation between voxel-wise TECs and reference curves. These enhancements enable more accurate myocardial perfusion measurements. Noise from low-dose scans affects registration performance, but this remains an open challenge. This work emphasizes the potential of learning-based methods to perform effective residual motion correction under challenging acquisition conditions, thereby improving the reliability of myocardial perfusion assessment. Full article
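One of the evaluation quantities above, the correlation between a voxel's Time-Enhancement Curve (TEC) and a reference curve, is an ordinary Pearson correlation; a minimal version on toy curves rather than CT data:

```python
# Pearson correlation between two sampled curves (e.g. a voxel TEC and a
# reference TEC). Toy sketch; real TECs come from the perfusion series.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den
```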

28 pages, 2342 KB  
Article
Machine Learning-Based Blood Pressure Prediction Using Cardiovascular Disease Data: A Comprehensive Comparative Study
by Irina Naskinova, Mikhail Kolev, Dilyana Karova and Mariyan Milev
Electronics 2026, 15(2), 312; https://doi.org/10.3390/electronics15020312 - 10 Jan 2026
Viewed by 549
Abstract
Hypertension remains one of the most pressing public health challenges worldwide, affecting more than one billion individuals and serving as a principal risk factor for cardiovascular morbidity and mortality. Whilst blood pressure measurement constitutes a routine component of clinical practice, the capacity to predict blood pressure values from readily obtainable patient characteristics could substantially enhance preventive care strategies and facilitate timely intervention. The present study examines whether machine learning methodologies can reliably forecast blood pressure measurements utilizing cardiovascular risk factors in conjunction with demographic and anthropometric data. We have analyzed data from 68,616 individuals following rigorous quality assessment of 70,000 patient records obtained from Kaggle’s cardiovascular disease repository. Beyond the 10 original variables, we engineered additional features encompassing demographic patterns, body composition indices, clinical risk indicators, and their interactions. Nine distinct predictive models were systematically evaluated, spanning from elementary baseline approaches through to sophisticated gradient boosting ensembles. CatBoost demonstrated superior performance, yielding systolic blood pressure predictions with a root mean squared error (RMSE) of 14.37 mmHg and coefficient of determination (R2) of 0.265, alongside diastolic blood pressure predictions with RMSE of 8.57 mmHg and R2 of 0.187. These modest explained variance values—substantially below unity—reveal a fundamental limitation: blood pressure proves remarkably resistant to prediction from the demographic, anthropometric, and clinical variables typically available in epidemiological datasets. These findings illuminate a sobering reality regarding blood pressure prediction from routinely collected clinical data. 
The observation that standard variables account for merely one-quarter of blood pressure variance should temper expectations for machine learning applications within this domain, whilst simultaneously underscoring the necessity for richer data sources or novel biomarkers to achieve clinically meaningful predictive accuracy. Full article
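The reported RMSE and R² can be reproduced from first principles; a minimal sketch with toy numbers (not the study's data):

```python
# RMSE and coefficient of determination (R^2), the two metrics reported above.
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot
```

An R² of 0.265 means the residual sum of squares is still about three-quarters of the total variance, which is exactly the "sobering reality" the authors describe.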

20 pages, 1176 KB  
Article
DnCNN-Based Denoising Model for Low-Dose Myocardial CT Perfusion Imaging
by Mahmud Hasan, Aaron So and Mahmoud R. El-Sakka
Electronics 2026, 15(1), 124; https://doi.org/10.3390/electronics15010124 - 26 Dec 2025
Cited by 1 | Viewed by 370
Abstract
Unlike high-dose scans, low-dose cardiac CT perfusion imaging reduces patient radiation exposure and thereby the risk of potential health effects. However, it introduces significant image noise, degrading diagnostic quality and limiting clinical assessment. Denoising is thus a critical preprocessing step to enhance image quality without compromising anatomical or perfusion details. Traditionally used reconstruction-domain methods, such as Iterative Reconstruction and Compressed Sensing, are often limited by algorithmic complexity, dependence on raw sinogram data, and restricted adaptability. Conversely, image-domain methods offer more adaptable denoising options. Recently, learning-based approaches have further expanded this flexibility and demonstrated state-of-the-art performance across various denoising tasks. In this work, we present a deep learning-based denoising method specifically tuned for low-dose cardiac CT perfusion imaging. Our model is trained to reduce noise while preserving structural integrity and temporal contrast dynamics, which are critical for downstream analysis. Unlike many existing methods, our approach is optimized for perfusion data, where temporal consistency is essential. Residual cardiac motion remains a separate challenge, which we aim to address in our future work. Experimental results show significant improvements in quantitative image quality, using both reference-based and no-reference metrics, such as MSE/PSNR/SSIM and NIQE/FID/KID, as well as improved accuracy of perfusion measurements. Full article
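Of the reference-based metrics listed, MSE and PSNR are simple enough to state exactly; a pure-Python sketch for images flattened to pixel lists (the 8-bit `max_val` is an assumption):

```python
# Reference-based denoising metrics: mean squared error and peak
# signal-to-noise ratio, for flattened images with values in [0, max_val].
import math

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, max_val=255.0):
    err = mse(a, b)
    if err == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / err)
```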

25 pages, 3370 KB  
Article
A SimAM-Enhanced Multi-Resolution CNN with BiGRU for EEG Emotion Recognition: 4D-MRSimNet
by Yutao Huang and Jijie Deng
Electronics 2026, 15(1), 39; https://doi.org/10.3390/electronics15010039 - 22 Dec 2025
Viewed by 328
Abstract
This study proposes 4D-MRSimNet, a framework that employs attention mechanisms to focus on distinct dimensions. The approach enhances key responses in the spatial and spectral domains and characterizes dynamic evolution in the temporal domain, extracting and integrating complementary emotional features to facilitate final classification. At the feature level, differential entropy (DE) and power spectral density (PSD) are combined within four core frequency bands (θ, α, β, and γ), which are recognized as closely related to emotional processing. This integration constructs a complementary feature representation that preserves both energy distribution and entropy variability. These features are organized into a 4D representation that integrates the electrode topology, frequency characteristics, and temporal dependencies inherent in EEG signals. At the network level, a multi-resolution convolutional module embedded with SimAM attention extracts spatial and spectral features at different scales and adaptively emphasizes key information. A bidirectional GRU (BiGRU) integrated with temporal attention further emphasizes critical time segments and strengthens the modeling of temporal dependencies. Experiments show that our method achieves an accuracy of 97.68% for valence and 97.61% for arousal on the DEAP dataset, and 99.60% for valence and 99.46% for arousal on the DREAMER dataset. The results demonstrate the effectiveness of complementary feature fusion, multidimensional feature representation, and the complementary dual-attention enhancement strategy for EEG emotion recognition. Full article
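The differential-entropy feature is conventionally computed per band under a Gaussianity assumption, where DE = ½·ln(2πeσ²); a minimal sketch:

```python
# Differential entropy (DE) of a band-limited EEG segment, assuming the
# band signal is approximately Gaussian: DE = 0.5 * ln(2 * pi * e * sigma^2).
import math

def differential_entropy(samples):
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n  # population variance
    return 0.5 * math.log(2 * math.pi * math.e * var)
```

In a pipeline like the one described, this would be evaluated separately for each electrode and each of the θ, α, β, and γ bands.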

17 pages, 1639 KB  
Article
Context-Aware Tourism Recommendations Using Retrieval-Augmented Large Language Models and Semantic Re-Ranking
by Ratomir Karlović, Mia Rovis, Alma Smajić, Luka Sever and Ivan Lorencin
Electronics 2025, 14(22), 4448; https://doi.org/10.3390/electronics14224448 - 14 Nov 2025
Viewed by 1240
Abstract
This study evaluates the performance of seven large language models (LLMs) in generating context-aware recommendations. The system is built on a collection of PDF documents (brochures) describing local events and activities, which are embedded into a FAISS vector store to support semantic retrieval. Synthetic user profiles are defined to simulate diverse preferences, while static weather conditions are incorporated to enhance the contextual relevance of recommendations. To further improve output quality, a reranking step, utilizing Cohere’s API, is used to refine the top retrieved results before passing them to the LLMs for final response generation. This allows better semantic organization of relevant content in line with user context. The main aim of this research is to identify which models best integrate multimodal inputs, such as user intent, profile attributes, and environmental context, and how these insights can inform the development of adaptive, personalized recommendation systems. The main contribution of this study is a structured comparative analysis of seven LLMs applied to a tourism-specific RAG framework, providing practical insights into how effectively different models integrate contextual factors to produce personalized recommendations. The evaluation revealed notable differences in model performance, with Qwen and Phi emerging as the strongest performers, whereas LLaMA frequently produced irrelevant recommendations. Moreover, many models favored gastronomy-related venues over other types of attractions. These findings indicate that although the RAG framework provides a solid foundation, the selection of underlying models plays an important role in achieving high-quality recommendations. Full article
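The retrieve-then-rerank control flow can be sketched independently of FAISS and Cohere; both stages below are toy stand-ins (dot-product retrieval over given vectors, a supplied relevance score) so the two-stage structure is runnable without external services:

```python
# Two-stage retrieval sketch: a cheap first-stage similarity search produces
# a shortlist, and a (stand-in) reranker reorders it. Real systems would use
# a FAISS index for stage one and a learned reranker for stage two.

def retrieve(query_vec, doc_vecs, k=3):
    """First stage: indices of the top-k documents by dot-product similarity."""
    scored = sorted(range(len(doc_vecs)),
                    key=lambda i: -sum(q * d for q, d in zip(query_vec, doc_vecs[i])))
    return scored[:k]

def rerank(candidates, relevance):
    """Second stage: reorder the shortlist by a reranker's relevance score."""
    return sorted(candidates, key=lambda i: -relevance[i])
```

The point of the split is cost: the expensive reranker only ever sees the shortlist, never the full corpus.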

18 pages, 3213 KB  
Article
Automating Code Recognition for Cargo Containers
by José Santos, Daniel Canedo and António J. R. Neves
Electronics 2025, 14(22), 4437; https://doi.org/10.3390/electronics14224437 - 14 Nov 2025
Cited by 1 | Viewed by 776
Abstract
Maritime transport plays a pivotal role in global trade, where efficiency and accuracy in port operations are crucial. Among the various tasks carried out in ports, container code recognition is essential for tracking and handling cargo. Manual inspections of container codes are becoming increasingly impractical, as they induce delays and raise the risk of human error. To address these issues, this work proposes a hybrid Optical Character Recognition system that integrates YOLOv7 for text detection with the transformer-based TrOCR for recognition of the container codes, enabling accurate and efficient automated recognition. This design addresses real-world challenges such as varying light, distortions, and multi-oriented container codes. We conducted a comprehensive evaluation on datasets that simulate the conditions found in port environments. The results demonstrate that the proposed hybrid model delivers significant improvements in detection and recognition accuracy and robustness compared to traditional OCR methods. In particular, its reliability in recognizing multi-oriented codes marks a notable advancement over existing solutions. Overall, this study presents an approach to automating container code recognition, contributing to the modernization of port operations, with the potential to streamline workflows, reduce human error, and enhance the overall logistics chain. Full article
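Independently of the OCR model, container codes carry an ISO 6346 check digit, so a recognized code can be cheaply sanity-checked downstream of the recognition stage. The check below implements the standard's algorithm (letter values skip multiples of 11; positional weights are powers of two); it is a generic guard from the container-code standard, not a component the paper claims:

```python
# ISO 6346 check-digit validation for 11-character container codes
# (4 owner/category letters + 6 serial digits + 1 check digit).

def iso6346_check_digit(code10):
    """Check digit implied by the first 10 characters of a container code."""
    vals = {str(d): d for d in range(10)}
    v = 10
    for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if v % 11 == 0:      # the standard skips values 11, 22 and 33
            v += 1
        vals[ch] = v
        v += 1
    total = sum(vals[ch] << i for i, ch in enumerate(code10))  # weight 2**i
    return total % 11 % 10

def iso6346_valid(code11):
    return iso6346_check_digit(code11[:10]) == int(code11[10])
```

Rejecting OCR outputs that fail this check is a near-free way to filter single-character recognition errors before they propagate into logistics systems.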

17 pages, 11184 KB  
Article
Automated Crack Detection in Micro-CT Scanning for Fiber-Reinforced Concrete Using Super-Resolution and Deep Learning
by João Pedro Gomes de Souza, Aristófanes Corrêa Silva, Marcello Congro, Deane Roehl, Anselmo Cardoso de Paiva, Sandra Pereira and António Cunha
Electronics 2025, 14(21), 4208; https://doi.org/10.3390/electronics14214208 - 28 Oct 2025
Cited by 1 | Viewed by 1018
Abstract
Fiber-reinforced concrete is a crucial material for civil construction, and monitoring its health is important for preserving structures and preventing accidents and financial losses. Among non-destructive monitoring methods, Micro Computed Tomography (Micro-CT) imaging stands out as an inexpensive method that is free from noise and external interference. However, manual inspection of these images is subjective and requires significant human effort. In recent years, several studies have successfully utilized Deep Learning models for the automatic detection of cracks in concrete. However, according to the literature, a gap remains in the context of detecting cracks using Micro-CT images of fiber-reinforced concrete. Therefore, this work proposes a framework for automatic crack detection that combines the following: (a) super-resolution-based preprocessing to generate, for each image, versions with double and quadruple the original resolution; (b) a classification step using EfficientNetB0 to classify the type of concrete matrix; (c) specific training of Detection Transformer (DETR) models for each type of matrix and resolution; and (d) a voting-committee-based post-processing step among the models trained for each resolution to reduce false positives. The model was trained on a new publicly available dataset, the FIRECON dataset, which consists of 4064 images annotated by an expert, achieving metrics of 86.098% Intersection over Union, 89.37% Precision, 83.26% Recall, 84.99% F1-Score, and 44.69% Average Precision. The framework, therefore, significantly reduces analysis time and improves consistency compared to the manual methods used in previous studies. The results demonstrate the potential of Deep Learning to aid image analysis in damage assessments, providing valuable insights into the damage mechanisms of fiber-reinforced concrete and contributing to the development of durable, high-performance engineering materials. Full article
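The committee idea in step (d) can be sketched as majority voting over overlapping boxes from the per-resolution models; the IoU threshold and majority count below are illustrative assumptions, not the paper's settings:

```python
# Voting-committee sketch: keep a detected box only if enough models
# independently propose a sufficiently overlapping box. Boxes are
# (x1, y1, x2, y2); the 0.5 IoU threshold is illustrative.

def iou(a, b):
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def committee_keep(box, other_model_boxes, thr=0.5, majority=2):
    votes = sum(any(iou(box, b) >= thr for b in boxes)
                for boxes in other_model_boxes)
    return votes + 1 >= majority  # +1 for the proposing model's own vote
```

A detection with no corroborating box from any other resolution is discarded, which is how such committees suppress false positives.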

18 pages, 43842 KB  
Article
DPO-ESRGAN: Perceptually Enhanced Super-Resolution Using Direct Preference Optimization
by Wonwoo Yun and Hanhoon Park
Electronics 2025, 14(17), 3357; https://doi.org/10.3390/electronics14173357 - 23 Aug 2025
Viewed by 2123
Abstract
Super-resolution (SR) is a long-standing task in the field of computer vision that aims to improve the quality and resolution of an image. ESRGAN is a representative generative adversarial network specialized in producing perceptually convincing SR images. However, it often fails to recover local details and still produces blurry or unnatural visual artifacts, resulting in SR images that people do not prefer. To address this problem, we propose adopting Direct Preference Optimization (DPO), which was originally devised to fine-tune large language models based on human preferences. To this end, we develop a method for applying DPO to ESRGAN and add a DPO loss for training the ESRGAN generator. Through ×4 SR experiments on benchmark datasets, we demonstrate that the proposed method produces SR images with significantly higher perceptual quality and human preference than ESRGAN and other ESRGAN variants that modify ESRGAN's loss or network structure. Specifically, compared to ESRGAN, the proposed method achieved, on average, 0.32 lower PieAPP values, 0.79 lower NIQE values, and 0.05 higher PSNR values on the BSD100 dataset, as well as 0.32 lower PieAPP values, 0.32 lower NIQE values, and 0.17 higher PSNR values on the Set14 dataset. Full article
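The DPO objective being transplanted has a compact scalar form: given log-probabilities of a preferred output y_w and a dispreferred output y_l under the trained model and a frozen reference model, the loss is the negative log-sigmoid of the scaled margin of log-ratios. A minimal sketch follows; how the ESRGAN generator exposes these log-probabilities for image pairs is the paper's contribution and is not modeled here.

```python
# Generic DPO loss on a single preference pair:
#   L = -log(sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))))
# where *_w is the human-preferred output and *_l the dispreferred one.
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

The loss is ln 2 when the model and reference agree (zero margin) and falls as the trained model assigns relatively more probability to the preferred output.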

30 pages, 4741 KB  
Article
TriViT-Lite: A Compact Vision Transformer–MobileNet Model with Texture-Aware Attention for Real-Time Facial Emotion Recognition in Healthcare
by Waqar Riaz, Jiancheng (Charles) Ji and Asif Ullah
Electronics 2025, 14(16), 3256; https://doi.org/10.3390/electronics14163256 - 16 Aug 2025
Cited by 2 | Viewed by 1059
Abstract
Facial emotion recognition has become increasingly important in healthcare, where understanding delicate cues like pain, discomfort, or unconsciousness can support more timely and responsive care. Yet, recognizing facial expressions in real-world settings remains challenging due to varying lighting, facial occlusions, and hardware limitations in clinical environments. To address this, we propose TriViT-Lite, a lightweight yet powerful model that blends three complementary components: MobileNet, for capturing fine-grained local features efficiently; Vision Transformers (ViT), for modeling global facial patterns; and handcrafted texture descriptors, such as Local Binary Patterns (LBP) and Histograms of Oriented Gradients (HOG), for added robustness. These multi-scale features are brought together through a texture-aware cross-attention fusion mechanism that helps the model focus dynamically on the most relevant facial regions. TriViT-Lite is evaluated on both benchmark datasets (FER2013, AffectNet) and a custom healthcare-oriented dataset covering seven critical emotional states, including pain and unconsciousness. It achieves a competitive accuracy of 91.8% on FER2013 and 87.5% on the custom dataset while maintaining real-time performance (~15 FPS) on resource-constrained edge devices. Our results show that TriViT-Lite offers a practical and accurate solution for real-time emotion recognition, particularly in healthcare settings. It strikes a balance between performance, interpretability, and efficiency, making it a strong candidate for machine-learning-driven pattern recognition in patient-monitoring applications. Full article
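Of the handcrafted descriptors mentioned, LBP is small enough to show whole: each pixel receives an 8-bit code by thresholding its 3×3 neighborhood against the center. A minimal sketch on a single patch (bit ordering is a convention; the clockwise order here is one common choice):

```python
# Local Binary Pattern code for the center pixel of a 3x3 grayscale patch.
# Each neighbor >= center contributes one bit; the 8-bit result indexes a
# texture histogram in a full LBP descriptor.

def lbp_code(patch3x3):
    c = patch3x3[1][1]
    neighbors = [patch3x3[0][0], patch3x3[0][1], patch3x3[0][2],
                 patch3x3[1][2], patch3x3[2][2], patch3x3[2][1],
                 patch3x3[2][0], patch3x3[1][0]]  # clockwise from top-left
    return sum(1 << i for i, n in enumerate(neighbors) if n >= c)
```

Because the code depends only on sign comparisons, it is invariant to monotonic illumination changes, which is exactly the robustness the abstract credits to the texture branch.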

25 pages, 8862 KB  
Article
Building a Self-Explanatory Social Robot on the Basis of an Explanation-Oriented Runtime Knowledge Model
by José Galeas, Alberto Tudela, Óscar Pons, Suna Bensch, Thomas Hellström and Antonio Bandera
Electronics 2025, 14(16), 3178; https://doi.org/10.3390/electronics14163178 - 10 Aug 2025
Cited by 1 | Viewed by 1350
Abstract
In recent years, there has been growing interest in developing robots capable of explaining their behavior, thereby improving their acceptance by humans with whom they share their environment. Proposed software designs are typically based on the advances being made in conversational systems built on deep learning techniques. However, apart from the ability to formulate explanations, the robot also needs an internal episodic memory, where it stores information from the continuous stream of experiences. Most previous proposals are designed to deal with short streams of episodic data (several minutes long). With the aim of managing larger experiences, we propose in this work a high-level episodic memory, where relevant events are abstracted to natural language concepts. The proposed framework is intimately linked to a software architecture in which the explanations, whether externalized or not, are shaped internally in a collaborative process involving the task-oriented software agents that make up the architecture. The core of this process is a runtime knowledge model, employed as working memory whose evolution allows for capturing the causal events stored in the episodic memory. We present several use cases that illustrate how the suggested framework allows an autonomous robot to generate correct and relevant explanations of its actions and behavior. Full article
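A toy reading of the high-level episodic memory idea: raw events are abstracted to natural-language concepts before storage, keeping long experience streams compact and queryable. The abstraction table below is an invented placeholder, not the paper's runtime knowledge model:

```python
# Minimal episodic memory with event-to-concept abstraction. Events whose
# raw form has no abstraction are considered irrelevant and dropped.

class EpisodicMemory:
    def __init__(self, abstractions):
        self.abstractions = abstractions  # raw event id -> natural-language concept
        self.episodes = []

    def record(self, timestamp, raw_event):
        concept = self.abstractions.get(raw_event)
        if concept:  # store only events that abstract to a known concept
            self.episodes.append((timestamp, concept))

    def recall(self, concept):
        """Timestamps at which this concept was experienced."""
        return [t for t, c in self.episodes if c == concept]
```

Storing concepts instead of raw sensor streams is what lets such a memory span hours of experience rather than minutes.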

14 pages, 1712 KB  
Article
Machine Learning-Based Predictive Model for Risk Stratification of Multiple Myeloma from Monoclonal Gammopathy of Undetermined Significance
by Amparo Santamaría, Marcos Alfaro, Cristina Antón, Beatriz Sánchez-Quiñones, Nataly Ibarra, Arturo Gil, Oscar Reinoso and Luis Payá
Electronics 2025, 14(15), 3014; https://doi.org/10.3390/electronics14153014 - 29 Jul 2025
Cited by 3 | Viewed by 1212
Abstract
Monoclonal Gammopathy of Undetermined Significance (MGUS) is a precursor to hematologic malignancies such as Multiple Myeloma (MM) and Waldenström Macroglobulinemia (WM). Accurate risk stratification of MGUS patients remains a clinical and computational challenge, with existing models often misclassifying both high-risk and low-risk individuals, leading to inefficient healthcare resource allocation. This study presents a machine learning (ML)-based approach for early prediction of MM/WM progression, using routinely collected hematological data, which are selected based on clinical relevance. A retrospective cohort of 292 MGUS patients, including 7 who progressed to malignancy, was analyzed. For each patient, a feature descriptor was constructed incorporating the latest biomarker values, their temporal trends over the previous year, age, and immunoglobulin subtype. To address the inherent class imbalance, data augmentation techniques were applied. Multiple ML classifiers were evaluated, with the Support Vector Machine (SVM) achieving the highest performance (94.3% accuracy and F1-score). The model demonstrates that a compact set of clinically relevant features can yield robust predictive performance. These findings highlight the potential of ML-driven decision-support systems in electronic health applications, offering a scalable solution for improving MGUS risk stratification, optimizing clinical workflows, and enabling earlier interventions. Full article
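The per-patient feature descriptor described above pairs each biomarker's latest value with its trend over the preceding year; a sketch using a least-squares slope, with invented field names:

```python
# Per-patient feature descriptor: for each biomarker, the latest value plus
# its least-squares slope over recent measurements, followed by age and a
# one-hot immunoglobulin subtype. Field names are illustrative only.

def slope(times, values):
    """Least-squares slope of value versus time."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

def descriptor(biomarkers, age, subtype_onehot):
    feats = []
    for times, values in biomarkers.values():
        feats.append(values[-1])            # latest measurement
        feats.append(slope(times, values))  # temporal trend
    return feats + [age] + subtype_onehot
```

A vector like this would then be fed to the classifiers compared in the study (SVM among them); the compactness of the representation is part of what the abstract highlights.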

24 pages, 9767 KB  
Article
Improved Binary Classification of Underwater Images Using a Modified ResNet-18 Model
by Mehrunnisa, Mikolaj Leszczuk, Dawid Juszka and Yi Zhang
Electronics 2025, 14(15), 2954; https://doi.org/10.3390/electronics14152954 - 24 Jul 2025
Cited by 3 | Viewed by 2462
Abstract
In recent years, the classification of underwater images has become one of the most active areas of research in computer vision, owing to its applications in marine science, aquatic robotics, and sea exploration. Underwater imaging is pivotal for evaluating marine ecosystems, analyzing biological habitats, and monitoring underwater infrastructure. Extracting useful information from underwater images is highly challenging due to factors such as light distortion, scattering, poor contrast, and complex foreground patterns. These difficulties cause traditional image processing and machine learning techniques to struggle to analyze such images accurately, making classification hard to perform well. Recently, deep learning techniques, especially convolutional neural networks (CNNs), have emerged as influential tools for underwater image classification, delivering noteworthy improvements in accuracy and performance despite these challenges. In this paper, we propose a modified ResNet-18 model for the binary classification of underwater images into raw and enhanced images. In the proposed model, we add new layers, namely linear, rectified linear unit (ReLU), and dropout layers, arranged in a block that is repeated three times to enhance feature extraction and improve learning. This enables the model to learn the complex patterns present in the images in greater detail, which helps it perform the classification well. Owing to these newly added layers, the proposed model copes with various complexities, such as noise, distortion, varying illumination conditions, and complex patterns, by learning robust features from underwater image datasets. To handle the class imbalance present in the dataset, we applied a data augmentation technique.
The proposed model achieved outstanding performance, with 96% accuracy, 99% precision, 92% sensitivity, 99% specificity, a 95% F1-score, and a 96% area under the receiver operating characteristic curve (AUC-ROC). These results demonstrate the strength and reliability of the proposed model in handling the challenges posed by underwater imagery, making it a favorable solution for advancing underwater image classification tasks. Full article
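The architectural change the abstract describes (a linear, ReLU, and dropout block repeated three times, appended to ResNet-18's features) can be sketched as a classifier head in PyTorch. This is a minimal illustration under stated assumptions: the layer widths, dropout rate, and placement after the backbone's 512-dimensional pooled features are guesses, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the modified classifier head: a
# (Linear -> ReLU -> Dropout) block repeated three times, followed by a
# final linear layer for the two classes (raw vs. enhanced images).
# Hidden width and dropout probability are assumed values.
def make_head(in_features=512, hidden=256, num_classes=2, p=0.5):
    layers = []
    width = in_features
    for _ in range(3):  # the block repeated three times
        layers += [nn.Linear(width, hidden), nn.ReLU(), nn.Dropout(p)]
        width = hidden
    layers.append(nn.Linear(width, num_classes))
    return nn.Sequential(*layers)

head = make_head()
features = torch.randn(4, 512)  # stand-in for ResNet-18 pooled features
logits = head(features)
print(logits.shape)  # torch.Size([4, 2])
```

In practice, such a head would replace the `fc` layer of a torchvision ResNet-18, with the dropout layers active only in training mode.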

16 pages, 7057 KB  
Article
VRBiom: A New Periocular Dataset for Biometric Applications of Head-Mounted Display
by Ketan Kotwal, Ibrahim Ulucan, Gökhan Özbulak, Janani Selliah and Sébastien Marcel
Electronics 2025, 14(9), 1835; https://doi.org/10.3390/electronics14091835 - 30 Apr 2025
Cited by 1 | Viewed by 2035
Abstract
With advancements in hardware, high-quality head-mounted display (HMD) devices are being developed by numerous companies, driving increased consumer interest in AR, VR, and MR applications. This proliferation of HMD devices opens up possibilities for a wide range of applications beyond entertainment. Most commercially available HMD devices are equipped with internal inward-facing cameras to record the periocular areas. Given the nature of these devices and the captured data, many applications, such as biometric authentication and gaze analysis, become feasible. To effectively explore the potential of HMDs for these diverse use cases and to enhance the corresponding techniques, it is essential to have an HMD dataset that captures realistic scenarios. In this work, we present VRBiom, a new dataset of periocular videos acquired using a virtual reality headset. The dataset, targeted at biometric applications, consists of 900 short videos acquired from 25 individuals recorded in the NIR spectrum. These 10-second videos were captured using the internal tracking cameras of a Meta Quest Pro at 72 FPS. To encompass real-world variations, the dataset includes recordings under three gaze conditions: steady, moving, and partially closed eyes. We have also ensured an equal split of recordings with and without glasses to facilitate the analysis of eyewear. These videos, characterized by non-frontal views of the eye and relatively low spatial resolution (400×400), can be instrumental in advancing state-of-the-art research across various biometric applications. The VRBiom dataset can be utilized to evaluate, train, or adapt models for biometric use cases such as iris and/or periocular recognition and associated sub-tasks such as detection and semantic segmentation. In addition to data from real individuals, we have included around 1100 presentation attacks constructed from 92 presentation attack instruments (PAIs).
These PAIs fall into six categories, constructed through combinations of print attacks (real and synthetic identities), fake 3D eyeballs, plastic eyes, and various types of masks and mannequins. These PA videos, combined with genuine (bona fide) data, can be utilized to address concerns related to spoofing, which is a significant threat if these devices are to be used for authentication. The VRBiom dataset is publicly available for research purposes related to biometric applications only. Full article
