Machine Learning Applications in Pattern Recognition

A special issue of Computers (ISSN 2073-431X).

Deadline for manuscript submissions: 30 June 2025 | Viewed by 24017

Special Issue Editor


Dr. Xiaochen Lu
Guest Editor
School of Information Science and Technology, Donghua University, Shanghai 201620, China
Interests: image processing; pattern recognition; hyperspectral data analysis and processing; multi-source information fusion and application

Special Issue Information

Dear Colleagues,

The field of pattern recognition has seen significant advancements in recent years largely due to the integration of machine learning techniques. Machine learning algorithms have enabled the development of more accurate and efficient pattern recognition systems across a wide range of applications, including image and speech recognition, biometrics, medical imaging, remote sensing, communication, and more.

This Special Issue aims to showcase the latest research and developments in the area of machine learning applications in pattern recognition. We invite researchers, academics, and practitioners to submit original research articles, reviews, and case studies that explore the use of machine learning algorithms in pattern recognition tasks. Extended conference papers are also welcome, but they should contain at least 50% of new material, e.g., in the form of technical extensions, more in-depth evaluations, or additional use cases.

Topics of interest include, but are not limited to, the following:

  • Deep learning for pattern recognition;
  • Feature selection and extraction techniques;
  • Ensemble learning methods in pattern recognition;
  • Transfer learning for pattern recognition;
  • Supervised, unsupervised, and semi-supervised techniques;
  • Applications of machine learning in biometrics;
  • Machine learning approaches for medical image analyses;
  • Pattern recognition in natural language processing;
  • Machine learning approaches in remote sensing image analyses;
  • Application of machine learning to communication systems;
  • Ethical considerations in machine learning applications in pattern recognition;
  • Machine learning for multi-source data fusion.

Dr. Xiaochen Lu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Computers is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • feature extraction
  • ensemble learning
  • transfer learning
  • semi-supervised learning
  • unsupervised learning
  • biometrics
  • medical imaging
  • natural language processing
  • data fusion

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (16 papers)

Research

26 pages, 12177 KiB  
Article
An Efficient Hybrid 3D Computer-Aided Cephalometric Analysis for Lateral Cephalometric and Cone-Beam Computed Tomography (CBCT) Systems
by Laurine A. Ashame, Sherin M. Youssef, Mazen Nabil Elagamy and Sahar M. El-Sheikh
Computers 2025, 14(6), 223; https://doi.org/10.3390/computers14060223 - 7 Jun 2025
Viewed by 121
Abstract
Lateral cephalometric analysis is commonly used in orthodontics for skeletal classification to ensure an accurate and reliable diagnosis for treatment planning. However, most current research depends on analyzing different types of radiographs, which requires more computational time than 3D analysis. Consequently, this study addresses fully automatic orthodontics tracing based on the usage of artificial intelligence (AI) applied to 2D and 3D images, by designing a cephalometric system that analyzes the significant landmarks and regions of interest (ROI) needed in orthodontics tracing, especially for the mandible and maxilla teeth. In this research, a computerized system is developed to automate the tasks of orthodontics evaluation during 2D and Cone-Beam Computed Tomography (CBCT or 3D) systems measurements. This work was tested on a dataset that contains images of males and females obtained from dental hospitals with patient-informed consent. The dataset consists of 2D lateral cephalometric, panorama and CBCT radiographs. Many scenarios were applied to test the proposed system in landmark prediction and detection. Moreover, this study integrates the Grad-CAM (Gradient-Weighted Class Activation Mapping) technique to generate heat maps, providing transparent visualization of the regions the model focuses on during its decision-making process. By enhancing the interpretability of deep learning predictions, Grad-CAM strengthens clinical confidence in the system’s outputs, ensuring that ROI detection aligns with orthodontic diagnostic standards. This explainability is crucial in medical AI applications, where understanding model behavior is as important as achieving high accuracy. The experimental results achieved an accuracy exceeding 98.9%. This research evaluates and differentiates between the two-dimensional and the three-dimensional tracing analyses applied to measurements based on the practices of the European Board of Orthodontics. The results demonstrate the proposed methodology’s robustness when applied to cephalometric images. Furthermore, the evaluation of 3D analysis usage provides a clear understanding of the significance of integrated deep-learning techniques in orthodontics.
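
The Grad-CAM heat maps mentioned in this abstract can be illustrated with a minimal sketch of the standard Grad-CAM weighting step. This is not the authors' implementation: the function name `grad_cam`, the nested-list layout `[channel][row][column]`, and the final peak normalization are assumptions made for illustration, and the activations and gradients are taken as already computed by some deep learning framework.

```python
def grad_cam(activations, gradients):
    """Standard Grad-CAM map from one layer's feature maps.

    activations, gradients: nested lists shaped [K][H][W], where gradients
    holds d(class score)/d(activation) for one image and one target class.
    Returns an H x W heat map scaled to [0, 1] (scaling is an assumption).
    """
    num_channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of the gradients over each feature map.
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]

    # Weighted sum of the feature maps across channels.
    cam = [[0.0] * width for _ in range(height)]
    for k in range(num_channels):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weights[k] * activations[k][i][j]

    # ReLU keeps only the regions that push the class score up.
    cam = [[max(value, 0.0) for value in row] for row in cam]

    # Scale by the peak so the map can be rendered as a heat map.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam
```

In practice the map would be upsampled to the input resolution and overlaid on the radiograph; that step is omitted here.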
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

15 pages, 7036 KiB  
Article
Detection of Fiber-Flaw on Pill Surface Based on Lightweight Network SA-MGhost-DVGG
by Jipei Lou, Hongyi Wang, Haodong Liang and Ziwei Wu
Computers 2025, 14(5), 200; https://doi.org/10.3390/computers14050200 - 21 May 2025
Viewed by 154
Abstract
Fiber-flaw detection on pill surfaces is a critical yet challenging task in industrial pharmacy due to diverse defect characteristics. To overcome the limitations of traditional methods in accuracy and real-time performance, this study introduces SA-MGhost-DVGG, a novel lightweight network for enhanced detection. The proposed network integrates an MGhost module for reducing parameters and computational load, a mixed-channel spatial attention (SA) module to refine features specific to fiber regions, and depthwise separable convolutions (DepSepConv) for efficient dimensionality reduction while preserving feature information. Experimental evaluations demonstrate that SA-MGhost-DVGG achieves a mean detection accuracy of 99.01% with an average inference time of 2.23 ms per pill. The findings confirm that SA-MGhost-DVGG effectively balances high accuracy with computational efficiency, offering a robust solution for industrial applications.
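
To make the parameter saving from depthwise separable convolutions concrete, the following sketch counts weights for a single layer. The layer shape (3 x 3 kernel, 64 input channels, 128 output channels) is an illustrative assumption and is not taken from SA-MGhost-DVGG; biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # hypothetical layer shape
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, round(sep / std, 3))  # prints: 73728 8768 0.119
```

The ratio equals 1/c_out + 1/k², so for 3 x 3 kernels the separable form needs roughly one ninth of the weights once the output channel count is large.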

18 pages, 10587 KiB  
Article
M18K: A Multi-Purpose Real-World Dataset for Mushroom Detection, 3D Pose Estimation, and Growth Monitoring
by Abdollah Zakeri, Mulham Fawakherji, Jiming Kang, Bikram Koirala, Venkatesh Balan, Weihang Zhu, Driss Benhaddou and Fatima A. Merchant
Computers 2025, 14(5), 199; https://doi.org/10.3390/computers14050199 - 20 May 2025
Viewed by 314
Abstract
Automating agricultural processes holds significant promise for enhancing efficiency and sustainability in various farming practices. This paper contributes to the automation of agricultural processes by providing a dedicated mushroom detection dataset related to automated harvesting, 3D pose estimation, and growth monitoring of the button mushroom produced using Agaricus bisporus fungi. With a total of 2000 images for object detection, instance segmentation, and 3D pose estimation (containing over 100,000 mushroom instances) and an additional 3838 images for yield estimation featuring eight mushroom scenes covering the complete growth period, it fills the gap in mushroom-specific datasets and serves as a benchmark for detection and instance segmentation as well as 3D pose estimation algorithms in smart mushroom agriculture. The dataset, featuring realistic growth environment scenarios with comprehensive 2D and 3D annotations, is assessed using advanced detection and instance segmentation algorithms. This paper details the dataset’s characteristics, presents detailed statistics on mushroom growth and yield, evaluates algorithmic performance, and, for broader applicability, makes all resources publicly available, including images, code, and trained models, via our GitHub repository (accessed on 22 March 2025).

17 pages, 11121 KiB  
Article
Few-Shot Data Augmentation by Morphology-Constrained Latent Diffusion for Enhanced Nematode Recognition
by Xiong Ouyang, Jiayan Zhuang, Jianfeng Gu and Sichao Ye
Computers 2025, 14(5), 198; https://doi.org/10.3390/computers14050198 - 19 May 2025
Viewed by 189
Abstract
Plant-parasitic nematodes represent a significant biosecurity threat in cross-border plant quarantine, necessitating precise identification for effective border control. While DL models have demonstrated success in nematode image classification based on morphological features, the limited availability of high-quality samples and the species-specific nature of nematodes result in insufficient training data, which constrains model performance. Although generative models have shown promise in data augmentation, they often struggle to balance morphological fidelity and phenotypic diversity. This paper proposes a novel few-shot data augmentation framework based on a morphology-constrained latent diffusion model, which, for the first time, integrates morphological constraints into the latent diffusion process. By geometrically parameterizing nematode morphology, the proposed approach enhances topological fidelity in the generated images and addresses key limitations of traditional generative models in controlling biological shapes. This framework is designed to augment nematode image datasets and improve classification performance under limited data conditions. The framework consists of three key components: First, we incorporate a fine-tuning strategy that preserves the generalization capability of the model in few-shot settings. Second, we extract morphological constraints from nematode images using edge detection and a moving least squares method, capturing key structural details. Finally, we embed these constraints into the latent space of the diffusion model, ensuring generated images maintain both fidelity and diversity. Experimental results demonstrate that our approach significantly enhances classification accuracy. For imbalanced datasets, the Top-1 accuracy of multiple classification models improved by 7.34–14.66% compared to models trained without augmentation, and by 2.0–5.67% compared to models using traditional data augmentation. Additionally, when replacing up to 25% of real images with generated ones in a balanced dataset, model performance remained nearly unchanged, indicating the robustness and effectiveness of the method. Ablation experiments demonstrate that the morphology-guided strategy achieves superior image quality compared to both unconstrained and edge-based constraint methods, with a Fréchet Inception Distance of 12.95 and an Inception Score of 1.21 ± 0.057. These results indicate that the proposed method effectively balances morphological fidelity and phenotypic diversity in image generation.

14 pages, 4391 KiB  
Article
AFQSeg: An Adaptive Feature Quantization Network for Instance-Level Surface Crack Segmentation
by Shaoliang Fang, Lu Lu, Zhu Lin, Zhanyu Yang and Shaosheng Wang
Computers 2025, 14(5), 182; https://doi.org/10.3390/computers14050182 - 9 May 2025
Viewed by 272
Abstract
Concrete surface crack detection plays a crucial role in infrastructure maintenance and safety. Deep learning-based methods have shown great potential in this task. However, under real-world conditions such as poor image quality, environmental interference, and complex crack patterns, existing models still face challenges in detecting fine cracks and often rely on large training parameters, limiting their practicality in complex environments. To address these issues, this paper proposes a crack detection model based on adaptive feature quantization, which primarily consists of a maximum soft pooling module, an adaptive crack feature quantization module, and a trainable crack post-processing module. Specifically, the maximum soft pooling module improves the continuity and integrity of detected cracks. The adaptive crack feature quantization module enhances the contrast between cracks and background features and strengthens the model’s focus on critical regions through spatial feature fusion. The trainable crack post-processing module incorporates edge-guided post-processing algorithms to correct false predictions and refine segmentation results. Experiments conducted on the Crack500 Road Crack Dataset show that the proposed model achieves notable improvements in detection accuracy and efficiency, with an average F1-score improvement of 2.81% and a precision gain of 2.20% over the baseline methods. In addition, the model significantly reduces computational cost, achieving a 78.5–88.7% reduction in parameter size and up to 96.8% improvement in inference speed, making it more efficient and deployable for real-world crack detection applications.

23 pages, 3368 KiB  
Article
SDKU-Net: A Novel Architecture with Dynamic Kernels and Optimizer Switching for Enhanced Shadow Detection in Remote Sensing
by Gilberto Alvarado-Robles, Isac Andres Espinosa-Vizcaino, Carlos Gustavo Manriquez-Padilla and Juan Jose Saucedo-Dorantes
Computers 2025, 14(3), 80; https://doi.org/10.3390/computers14030080 - 23 Feb 2025
Viewed by 1804
Abstract
Shadows in remote sensing images often introduce challenges in accurate segmentation due to their variability in shape, size, and texture. To address these issues, this study proposes the Supervised Dynamic Kernel U-Net (SDKU-Net), a novel architecture designed to enhance shadow detection in complex remote sensing scenarios. SDKU-Net integrates dynamic kernel adjustment, a combined loss function incorporating Focal and Tversky Loss, and optimizer switching to effectively tackle class imbalance and improve segmentation quality. Using the AISD dataset, the proposed method achieved state-of-the-art performance with an Intersection over Union (IoU) of 0.8552, an F1-Score of 0.9219, an Overall Accuracy (OA) of 96.50%, and a Balanced Error Rate (BER) of 5.08%. Comparative analyses demonstrate SDKU-Net’s superior performance against established methods such as U-Net, U-Net++, MSASDNet, and CADDN. Additionally, the model’s efficient training process, requiring only 75 epochs, highlights its potential for resource-constrained applications. These results underscore the robustness and adaptability of SDKU-Net, paving the way for advancements in shadow detection and segmentation across diverse fields.
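
A combined Focal and Tversky loss of the kind named in this abstract might look roughly like the sketch below for binary segmentation. The abstract does not state how SDKU-Net weights the two terms or which hyperparameters it uses, so the function names, the mixing `weight`, and the defaults for `gamma`, `alpha`, and `beta` are all assumptions.

```python
import math


def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)         # clamp to avoid log(0)
        p_t = p if t == 1 else 1.0 - p          # probability of the true class
        a_t = alpha if t == 1 else 1.0 - alpha  # class-balancing weight
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def tversky_loss(probs, targets, alpha=0.3, beta=0.7, eps=1e-7):
    """1 - Tversky index; alpha weights false positives, beta false negatives."""
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def combined_loss(probs, targets, weight=0.5):
    """Convex mix of the two terms; the mixing weight is a hypothetical choice."""
    return (weight * focal_loss(probs, targets)
            + (1.0 - weight) * tversky_loss(probs, targets))
```

Setting `beta` above `alpha` penalizes missed shadow pixels more than false alarms, which is one common way to counter class imbalance; with `alpha = beta = 0.5` the Tversky term reduces to the Dice loss.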

32 pages, 8818 KiB  
Article
Latent Outlier Exposure in Real-Time Anomaly Detection at the Large Hadron Collider
by Thomas Dartnall Stern, Amit Kumar Mishra and James Michael Keaveney
Computers 2025, 14(3), 79; https://doi.org/10.3390/computers14030079 - 20 Feb 2025
Viewed by 825
Abstract
We propose a novel approach to real-time anomaly detection at the Large Hadron Collider, aimed at enhancing the discovery potential for new fundamental phenomena in particle physics. Our method leverages the Latent Outlier Exposure technique and is evaluated using three distinct anomaly detection models. Among these is a novel adaptation of the variational autoencoder’s reparameterisation trick, specifically optimised for anomaly detection. The models are validated on simulated datasets representing collider processes from the Standard Model and hypothetical Beyond the Standard Model scenarios. The results demonstrate significant advantages, particularly in addressing the formidable challenge of developing a signal-agnostic, hardware-level anomaly detection trigger for experiments at the Large Hadron Collider.

27 pages, 5537 KiB  
Article
Real-Time Gaze Estimation Using Webcam-Based CNN Models for Human–Computer Interactions
by Visal Vidhya and Diego Resende Faria
Computers 2025, 14(2), 57; https://doi.org/10.3390/computers14020057 - 10 Feb 2025
Viewed by 2078
Abstract
Gaze tracking and estimation are essential for understanding human behavior and enhancing human–computer interactions. This study introduces an innovative, cost-effective solution for real-time gaze tracking using a standard webcam, providing a practical alternative to conventional methods that rely on expensive infrared (IR) cameras. Traditional approaches, such as Pupil Center Corneal Reflection (PCCR), require IR cameras to capture corneal reflections and iris glints, demanding high-resolution images and controlled environments. In contrast, the proposed method utilizes a convolutional neural network (CNN) trained on webcam-captured images to achieve precise gaze estimation. The developed deep learning model achieves a mean squared error (MSE) of 0.0112 and an accuracy of 90.98% through a novel trajectory-based accuracy evaluation system. This system involves an animation of a ball moving across the screen, with the user’s gaze following the ball’s motion. Accuracy is determined by calculating the proportion of gaze points falling within a predefined threshold based on the ball’s radius, ensuring a comprehensive evaluation of the system’s performance across all screen regions. Data collection is both simplified and effective, capturing images of the user’s right eye while they focus on the screen. Additionally, the system includes advanced gaze analysis tools, such as heat maps, gaze fixation tracking, and blink rate monitoring, which are all integrated into an intuitive user interface. The robustness of this approach is further enhanced by incorporating Google’s Mediapipe model for facial landmark detection, improving accuracy and reliability. The evaluation results demonstrate that the proposed method delivers high-accuracy gaze prediction without the need for expensive equipment, making it a practical and accessible solution for diverse applications in human–computer interactions and behavioral research.
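
The trajectory-based accuracy measure described in this abstract, the proportion of gaze points that fall within a threshold derived from the ball's radius, can be sketched as follows. The function name, the pixel coordinate convention, and the `tolerance` multiplier are assumptions; the abstract does not give the exact threshold rule.

```python
import math


def trajectory_accuracy(gaze_points, ball_centers, ball_radius, tolerance=1.0):
    """Fraction of gaze samples that land within the ball-based threshold.

    gaze_points, ball_centers: equal-length lists of (x, y) screen coordinates
    sampled at the same time steps. The threshold is tolerance * ball_radius,
    where tolerance is a hypothetical tuning factor.
    """
    if len(gaze_points) != len(ball_centers):
        raise ValueError("gaze and ball trajectories must have the same length")
    if not gaze_points:
        return 0.0

    threshold = tolerance * ball_radius
    hits = sum(
        1
        for (gx, gy), (bx, by) in zip(gaze_points, ball_centers)
        if math.hypot(gx - bx, gy - by) <= threshold
    )
    return hits / len(gaze_points)
```

For example, with a radius of 5 pixels, gaze samples at distances 3, 0, and about 70.7 pixels from the ball centre give two hits out of three.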

24 pages, 11018 KiB  
Article
Integrating Few-Shot Learning and Multimodal Image Enhancement in GNut: A Novel Approach to Groundnut Leaf Disease Detection
by Imran Qureshi
Computers 2024, 13(12), 306; https://doi.org/10.3390/computers13120306 - 22 Nov 2024
Viewed by 1437
Abstract
Groundnut is a vital crop worldwide, but its production is significantly threatened by various leaf diseases. Early identification of such diseases is vital for maintaining agricultural productivity. Deep learning techniques have been employed to address this challenge and enhance the detection, recognition, and classification of groundnut leaf diseases, ensuring better management and protection of this important crop. This paper presents a new approach to the detection and classification of groundnut leaf diseases using an advanced deep learning model, GNut, which integrates ResNet50 and DenseNet121 architectures for feature extraction and Few-Shot Learning (FSL) for classification. The proposed model addresses groundnut crop diseases by providing an efficient and highly accurate method of managing diseases in agriculture. Evaluated on a novel Pak-Nuts dataset collected from groundnut fields in Pakistan, the GNut model achieves promising accuracy rates of 99% with FSL and 95% without it. Advanced image preprocessing techniques, such as Multi-Scale Retinex with Color Restoration, Adaptive Histogram Equalization, and Multimodal Image Enhancement for Vegetative Feature Isolation, were employed to enhance the quality of input data, further improving classification accuracy. These results illustrate the robustness of the proposed model in real agricultural applications, establishing a new benchmark for groundnut leaf disease detection and highlighting the potential of AI-powered solutions to play a role in encouraging sustainable agricultural practices.

20 pages, 4520 KiB  
Article
Employing Different Algorithms of Lightweight Convolutional Neural Network Models in Image Distortion Classification
by Ismail Taha Ahmed, Falah Amer Abdulazeez and Baraa Tareq Hammad
Computers 2024, 13(10), 268; https://doi.org/10.3390/computers13100268 - 12 Oct 2024
Cited by 1 | Viewed by 1639
Abstract
The majority of applications use automatic image recognition technologies to carry out a range of tasks. Therefore, it is crucial to identify and classify image distortions to improve image quality. Despite efforts in this area, there are still many challenges in accurately and reliably classifying distorted images. In this paper, we offer a comprehensive analysis of models of both non-lightweight and lightweight deep convolutional neural networks (CNNs) for the classification of distorted images. Subsequently, an effective method is proposed to enhance the overall performance of distorted image classification. This method involves selecting features from the pretrained models’ capabilities and using a strong classifier. The experiments utilized the KADID-10k dataset to assess the effectiveness of the results. The K-nearest neighbor (KNN) classifier showed better performance than the naïve classifier in terms of accuracy, precision, error rate, recall and F1 score. Additionally, SqueezeNet outperformed other deep CNN models, both lightweight and non-lightweight, across every evaluation metric. The experimental results demonstrate that combining SqueezeNet with KNN can effectively and accurately classify distorted images into the correct categories. The proposed SqueezeNet-KNN method achieved an accuracy rate of 89%. As detailed in the results section, the proposed method outperforms state-of-the-art methods in accuracy, precision, error, recall, and F1 score measures.

18 pages, 8530 KiB  
Article
Spatiotemporal Bayesian Machine Learning for Estimation of an Empirical Lower Bound for Probability of Detection with Applications to Stationary Wildlife Photography
by Mohamed Jaber, Robert D. Breininger, Farag Hamad and Nezamoddin N. Kachouie
Computers 2024, 13(10), 255; https://doi.org/10.3390/computers13100255 - 8 Oct 2024
Viewed by 991
Abstract
An important parameter in monitoring and surveillance systems is the probability of detection. Advanced wildlife monitoring systems rely on camera traps for stationary wildlife photography and have been broadly used for estimation of population size and density. Camera encounters are collected for estimation and management of a growing population size using spatial capture models. The accuracy of the estimated population size relies on the detection probability of the individual animals, and in turn depends on the observed frequency of the animal encounters with the camera traps. Therefore, optimal coverage by the camera grid is essential for reliable estimation of the population size and density. The goal of this research is to implement a spatiotemporal Bayesian machine learning model to estimate a lower bound for the probability of detection of a monitoring system. To obtain an accurate estimate of population size in this study, an empirical lower bound for probability of detection is realized considering the sensitivity of the model to the augmented sample size. The monitoring system must attain a probability of detection greater than the established empirical lower bound to achieve a pertinent estimation accuracy. It was found that for stationary wildlife photography, a camera grid with a detection probability of at least 0.3 is required for accurate estimation of the population size. A notable outcome is that a moderate probability of detection or better is required to obtain a reliable estimate of the population size using spatiotemporal machine learning. As a result, the required probability of detection is recommended when designing an automated monitoring system. The number and location of cameras in the camera grid will determine the camera coverage. Consequently, camera coverage and the individual home-range verify the probability of detection.

17 pages, 3728 KiB  
Article
YOLOv8-Based Drone Detection: Performance Analysis and Optimization
by Betul Yilmaz and Ugurhan Kutbay
Computers 2024, 13(9), 234; https://doi.org/10.3390/computers13090234 - 17 Sep 2024
Cited by 1 | Viewed by 5122
Abstract
The extensive utilization of drones has led to numerous scenarios that encompass both advantageous and perilous outcomes. By using deep learning techniques, this study aimed to reduce the dangerous effects of drone use through early detection of drones. The purpose of this study is the evaluation of deep learning approaches such as pre-trained YOLOv8 drone detection for security issues. This study focuses on the YOLOv8 model to achieve optimal performance in object detection tasks using a publicly available dataset collected by Mehdi Özel for a UAV competition that is sourced from GitHub. These images are labeled using Roboflow, and the model is trained on Google Colab. YOLOv8, known for its advanced architecture, was selected due to its suitability for real-time detection applications and its ability to process complex visual data. Hyperparameter tuning and data augmentation techniques were applied to maximize the performance of the model. Basic hyperparameters such as learning rate, batch size, and optimization settings were optimized through iterative experiments to provide the best performance. In addition to hyperparameter tuning, various data augmentation strategies were used to increase the robustness and generalization ability of the model. Techniques such as rotation, scaling, flipping, and color adjustments were applied to the dataset to simulate different conditions and variations. Among the augmentation techniques applied to the specific dataset in this study, rotation was found to deliver the highest performance. Blurring and cropping methods were observed to follow closely behind. The combination of optimized hyperparameters and strategic data augmentation allowed YOLOv8 to achieve high detection accuracy and reliable performance on the publicly available dataset. This method demonstrates the effectiveness of YOLOv8 in real-world scenarios, while also highlighting the importance of hyperparameter tuning and data augmentation in increasing model capabilities. To enhance model performance, dataset augmentation techniques including rotation and blurring are implemented. Following these steps, a significant precision value of 0.946, a notable recall value of 0.9605, and a considerable precision–recall curve value of 0.978 are achieved, surpassing many popular models such as Mask CNN, CNN, and YOLOv5.
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

18 pages, 5905 KiB  
Article
Detection of Bus Driver Mobile Phone Usage Using Kolmogorov-Arnold Networks
by János Hollósi, Áron Ballagi, Gábor Kovács, Szabolcs Fischer and Viktor Nagy
Computers 2024, 13(9), 218; https://doi.org/10.3390/computers13090218 - 3 Sep 2024
Cited by 4 | Viewed by 1776
Abstract
This research introduces a new approach for detecting mobile phone use by drivers, exploiting the capabilities of Kolmogorov-Arnold Networks (KAN) to improve road safety and comply with regulations prohibiting phone use while driving. To address the lack of available data for this specific task, a unique dataset was constructed consisting of images of bus drivers in two scenarios: driving without phone interaction and driving while on a phone call. This dataset provides the basis for the current research. Different KAN-based networks were developed for custom action recognition tailored to the nuanced task of identifying drivers holding phones. The system’s performance was evaluated against convolutional neural network-based solutions, and differences in accuracy and robustness were observed. The aim was to propose an appropriate solution for professional Driver Monitoring Systems (DMS) in research and development and to investigate the efficiency of KAN solutions for this specific sub-task. The implications of this work extend beyond enforcement, providing a foundational technology for automating monitoring and improving safety protocols in the commercial and public transport sectors. In conclusion, this study demonstrates the efficacy of KAN layers in neural network designs for driver monitoring applications.
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

20 pages, 17178 KiB  
Article
Stego-STFAN: A Novel Neural Network for Video Steganography
by Guilherme Fay Vergara, Pedro Giacomelli, André Luiz Marques Serrano, Fábio Lúcio Lopes de Mendonça, Gabriel Arquelau Pimenta Rodrigues, Guilherme Dantas Bispo, Vinícius Pereira Gonçalves, Robson de Oliveira Albuquerque and Rafael Timóteo de Sousa Júnior
Computers 2024, 13(7), 180; https://doi.org/10.3390/computers13070180 - 19 Jul 2024
Viewed by 2314
Abstract
This article presents Stego-STFAN, an innovative approach to video steganography. By using a low-cost model that processes the temporal and spatial domains together, it makes fine adjustments in each frame. Stego-STFAN achieved a PSNRc of 27.03 and a PSNRS of 23.09, which is close to the state of the art. Steganography is the practice of hiding a message so that third parties cannot perceive that communication is taking place. One of the main concerns in steganography is therefore the size of the message to be hidden, as the security of the message is inversely proportional to its size. Inspired by this principle, video steganography expands the available channel so that more data can be embedded in a message. To construct better stego-frames and recovered secrets, we propose a new architecture for video steganography derived from the Spatial-Temporal Adaptive Filter Network (STFAN) in conjunction with an attention mechanism, which together generate filters and map dynamic frames to increase the efficiency and effectiveness of frame processing. The architecture exploits the redundancy present in the temporal dimension of the video, as well as fine details such as edges, fast-moving pixels, and the context of secret and cover frames. It also uses the DWT method as an additional feature extraction level, which has the same characteristics as when applied to an image file.
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

19 pages, 1015 KiB  
Article
A Regularized Physics-Informed Neural Network to Support Data-Driven Nonlinear Constrained Optimization
by Diego Armando Perez-Rosero, Andrés Marino Álvarez-Meza and Cesar German Castellanos-Dominguez
Computers 2024, 13(7), 176; https://doi.org/10.3390/computers13070176 - 18 Jul 2024
Cited by 2 | Viewed by 1768
Abstract
Nonlinear optimization (NOPT) is a valuable tool for solving complex tasks in fields like engineering, economics, and operations research, among others. However, NOPT struggles with data variability and noisy input measurements, which lead to incorrect solutions. Furthermore, nonlinear constraints may result in outcomes that are either infeasible or suboptimal, as in nonconvex optimization. This paper introduces a novel regularized physics-informed neural network (RPINN) framework as a new NOPT tool for both supervised and unsupervised data-driven scenarios. The contribution of our RPINN is threefold. First, by using custom activation functions and regularization penalties in an artificial neural network (ANN), RPINN can handle data variability and noisy inputs. Second, it employs physics principles to construct the network architecture, computing the optimization variables based on network weights and learned features. Third, it uses automatic differentiation training to make the system scalable and cut down on computation time through batch-based back-propagation. The test results for both supervised and unsupervised NOPT tasks show that our RPINN can provide solutions that are competitive with state-of-the-art solvers. In turn, the robustness of RPINN against noisy input measurements makes it particularly valuable in environments with fluctuating information. Specifically, we test a uniform mixture model and a gas-powered system as NOPT scenarios. Overall, the ANN-based foundation of RPINN offers significant flexibility and scalability.
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

Other

40 pages, 5965 KiB  
Systematic Review
A Systematic Review and Comparative Analysis Approach to Boom Gate Access Using Plate Number Recognition
by Asaju Christine Bukola, Pius Adewale Owolawi, Chuling Du and Etienne Van Wyk
Computers 2024, 13(11), 286; https://doi.org/10.3390/computers13110286 - 4 Nov 2024
Cited by 1 | Viewed by 1584
Abstract
Security has been paramount to many organizations for many years, with access control being one of the critical measures to ensure security. Among various approaches to access control, vehicle plate number recognition has received wide attention. However, its application to boom gate access has not been adequately explored. This study proposes a method to access the boom gate by optimizing vehicle plate number recognition. Given the speed and accuracy of the YOLO (You Only Look Once) object detection algorithm, this study proposes using the YOLO deep learning algorithm for plate number detection to access a boom gate. To identify the gap and the most suitable YOLO variant, the study systematically surveyed the publication database to identify peer-reviewed articles published between 2020 and 2024 on plate number recognition using different YOLO versions. In addition, experiments were performed on four YOLO versions (YOLOv5, YOLOv7, YOLOv8, and YOLOv9), focusing on vehicle plate number recognition. The experiments, using an open-source dataset with 699 samples in total, reported accuracies of 81%, 82%, 83%, and 73% for YOLOv5, YOLOv7, YOLOv8, and YOLOv9, respectively. This comparative analysis aims to determine the most appropriate YOLO version for the task, optimizing both security and efficiency in boom gate access control systems. By exploiting the capabilities of advanced YOLO algorithms, the proposed method seeks to improve the reliability and effectiveness of access control through precise and rapid plate number recognition. The analysis reveals that each YOLO version has distinct advantages depending on the application’s specific requirements. In complex detection conditions with changing lighting and shadows, YOLOv8 performed better in terms of reduced loss rates and increased precision and recall metrics.
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)