Computers

Research

27 pages, 5866 KB

Open Access Article

DCGAN Feature-Enhancement-Based YOLOv8n Model in Small-Sample Target Detection

by Peng Zheng, Yun Cheng, Wei Zhu, Bo Liu, Chenhao Ye, Shijie Wang, Shuhong Liu and Jinyin Bai

Computers 2025, 14(9), 389; https://doi.org/10.3390/computers14090389 - 15 Sep 2025

Viewed by 756

Abstract

This paper proposes DCGAN-YOLOv8n, an integrated framework that significantly advances small-sample target detection by synergizing generative adversarial feature enhancement with multi-scale representation learning. The model’s core contribution lies in its novel adversarial feature enhancement module (AFEM), which leverages conditional generative adversarial networks to reconstruct discriminative multi-scale features while effectively mitigating mode collapse. Furthermore, the architecture incorporates a deformable multi-scale feature pyramid that dynamically fuses generated high-resolution features with hierarchical semantic representations through an attention mechanism. The proposed triple marginal constraint optimization jointly enhances intra-class compactness and inter-class separation, thereby structuring a highly discriminative feature space. Extensive experiments on the NWPU VHR-10 dataset demonstrate state-of-the-art performance, with the model achieving an mAP50 of 90.46% and an mAP50-95 of 57.06%, representing significant improvements of 4.52% and 4.08% over the baseline YOLOv8n, respectively. These results validate the framework’s effectiveness in addressing critical challenges of feature representation scarcity and cross-scale adaptation in data-limited scenarios.
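The abstract does not define the triple marginal constraint. As a rough illustration of how a margin-based objective can pull same-class features together while pushing different-class features apart, the sketch below uses the standard triplet margin loss as a stand-in. It is not the paper's formulation; the function names, the margin value, and the toy feature vectors are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D features: the positive is close, the negative is far.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

loss_separated = triplet_margin_loss(anchor, positive, negative)
# Swapping the roles simulates a badly structured feature space.
loss_violating = triplet_margin_loss(anchor, negative, positive)
print(loss_separated, loss_violating)
```

A well-separated triplet contributes no loss, so training effort concentrates on triplets that still violate the margin.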

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

18 pages, 3961 KB

Open Access Article

Multi-Task Graph Attention Net for Electricity Consumption Prediction and Anomaly Detection

by Na Bai, Jian Zhang and Zhaoli Wu

Computers 2025, 14(9), 350; https://doi.org/10.3390/computers14090350 - 26 Aug 2025

Viewed by 836

Abstract

Precise electricity consumption forecasting and anomaly detection constitute fundamental requirements for maintaining grid reliability in smart power systems. While consumption patterns demonstrate quasi-periodic behavior with region-specific fluctuations influenced by environmental factors, existing approaches may fail to systematically model these dynamic variations or quantify environmental impacts. This limitation results in compromised prediction accuracy and ambiguous anomaly identification. To overcome these challenges, we propose a novel Multi-Task Graph Attention Network (MGAT) framework leveraging adaptive entropy analysis. Our methodology comprises four key innovations: (1) the temporal decomposition of consumption data with entropy-based adaptive clustering into predictable low-entropy components (processed via multi-scale attention networks) and volatile high-entropy components; (2) the graph-based representation of high-entropy fluctuations through numerical correlation encoding, complemented by temporal environmental graphs quantifying external influences; (3) the hierarchical fusion of environmental and fluctuation graphs via a specialized Graph Attention Autoencoder that jointly models dynamic patterns and environmental dependencies; (4) the integrated synthesis of all components for simultaneous consumption prediction and anomaly detection. Experiments verify MGAT’s performance in both forecasting precision and anomaly identification relative to conventional methods.

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

19 pages, 2569 KB

Open Access Article

CNN-Random Forest Hybrid Method for Phenology-Based Paddy Rice Mapping Using Sentinel-2 and Landsat-8 Satellite Images

by Dodi Sudiana, Sayyidah Hanifah Putri, Dony Kushardono, Anton Satria Prabuwono, Josaphat Tetuko Sri Sumantyo and Mia Rizkinia

Computers 2025, 14(8), 336; https://doi.org/10.3390/computers14080336 - 18 Aug 2025

Cited by 2 | Viewed by 1331

Abstract

The agricultural sector plays a vital role in achieving the second Sustainable Development Goal: “Zero Hunger”. To ensure food security, agriculture must remain resilient and productive. In Indonesia, a major rice-producing country, the conversion of agricultural land for non-agricultural uses poses a serious threat to food availability. Accurate and timely mapping of paddy rice is therefore crucial. This study proposes a phenology-based mapping approach using a Convolutional Neural Network-Random Forest (CNN-RF) Hybrid model with multi-temporal Sentinel-2 and Landsat-8 imagery. Image processing and analysis were conducted using the Google Earth Engine platform. Raw spectral bands and four vegetation indices (NDVI, EVI, LSWI, and RGVI) were extracted as input features for classification. The CNN-RF Hybrid classifier demonstrated strong performance, achieving an overall accuracy of 0.950 and a Cohen’s Kappa coefficient of 0.893. These results confirm the effectiveness of the proposed method for mapping paddy rice in Indramayu Regency, West Java, using medium-resolution optical remote sensing data. The integration of phenological characteristics and deep learning significantly enhances classification accuracy. This research supports efforts to monitor and preserve paddy rice cultivation areas amid increasing land use pressures, contributing to national food security and sustainable agricultural practices.
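Three of the four indices named in the abstract have widely used textbook definitions, sketched below for a single pixel. The abstract does not state which formulations the authors used, so these are the common forms rather than the paper's exact ones. RGVI is omitted because the abstract does not define it. The function names and reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients
    (gain 2.5, C1 = 6, C2 = 7.5, L = 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def lswi(nir, swir):
    """Land Surface Water Index: (NIR - SWIR) / (NIR + SWIR)."""
    return (nir - swir) / (nir + swir)


# Hypothetical surface reflectances (0..1) for one vegetated pixel.
pixel = {"blue": 0.04, "red": 0.10, "nir": 0.50, "swir": 0.25}

features = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "EVI": evi(pixel["nir"], pixel["red"], pixel["blue"]),
    "LSWI": lswi(pixel["nir"], pixel["swir"]),
}
print(features)
```

In a phenology-based workflow, such indices would be computed per acquisition date and stacked with the raw bands so that the classifier sees the seasonal trajectory of each pixel.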

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

21 pages, 1179 KB

Open Access Article

ELFA-Log: Cross-System Log Anomaly Detection via Enhanced Pseudo-Labeling and Feature Alignment

by Xiaowei Zhao, Kaiwei Guo, Mingting Huang, Shaojian Qiu and Lu Lu

Computers 2025, 14(7), 272; https://doi.org/10.3390/computers14070272 - 10 Jul 2025

Cited by 1 | Viewed by 1469

Abstract

Existing log-based anomaly detection methods typically require large volumes of labeled data for training, presenting significant challenges when applied to new systems with limited labeled data. This limitation has spurred the need for cross-system log anomaly detection (CSLAD) methods. However, current CSLAD approaches often face challenges in effectively handling distributional differences in log data across systems. To address this issue, we propose ELFA-Log, a transfer learning-based approach for cross-system log anomaly detection. By enhancing pseudo-label generation with uncertainty estimation and feature alignment, ELFA-Log improves detection performance even in the presence of data distribution shifts. It uses entropy-based metrics to generate high-confidence pseudo-labels, minimizing reliance on labeled data. Additionally, a distance-based loss function optimizes the shared representation of cross-system log features. Experimental results on benchmark datasets demonstrate that ELFA-Log enhances the performance of CSLAD, offering a practical solution to the challenge of high labeling costs in real-world applications.
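One common way to turn an entropy-based metric into high-confidence pseudo-labels is to keep only predictions whose Shannon entropy falls below a threshold. The sketch below illustrates that general idea. The abstract does not give ELFA-Log's exact criterion, so the function names, the threshold, and the toy probabilities are assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_pseudo_labels(predictions, max_entropy=0.3):
    """Return (sample_index, predicted_class) for low-entropy predictions.

    `max_entropy` is a hypothetical cut-off; lower values keep fewer,
    more confident pseudo-labels.
    """
    selected = []
    for index, probs in enumerate(predictions):
        if entropy(probs) <= max_entropy:
            label = max(range(len(probs)), key=lambda k: probs[k])
            selected.append((index, label))
    return selected


# Hypothetical [P(normal), P(anomalous)] outputs on unlabeled target-system logs.
predictions = [
    [0.98, 0.02],  # confident: normal
    [0.55, 0.45],  # uncertain: discarded
    [0.03, 0.97],  # confident: anomalous
]
pseudo = select_pseudo_labels(predictions)
print(pseudo)
```

Only the confident samples would then be added to the training set for the target system, which limits the noise that wrong pseudo-labels introduce.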

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

16 pages, 2358 KB

Open Access Article

A Hybrid Content-Aware Network for Single Image Deraining

by Guoqiang Chai, Rui Yang, Jin Ge and Yulei Chen

Computers 2025, 14(7), 262; https://doi.org/10.3390/computers14070262 - 4 Jul 2025

Viewed by 896

Abstract

Rain streaks degrade the quality of optical images and seriously affect the effectiveness of subsequent vision-based algorithms. Although convolutional neural networks (CNNs) and self-attention (SA) mechanisms have shown great success in single image deraining, unresolved issues remain regarding deraining performance and the large computational load. This paper combines the complementary advantages of CNNs and SA and proposes a hybrid content-aware deraining network (CAD) to reduce complexity and generate high-quality results. Specifically, we construct the CADBlock, which includes the content-aware convolution and attention mixer module (CAMM) and the multi-scale double-gated feed-forward module (MDFM). In CAMM, the attention mechanism is applied to intricate windows to generate abundant features, and simple convolution is applied to plain windows to reduce computational costs. In MDFM, multi-scale spatial features are fused through double gating to preserve local detail features and enhance image restoration capabilities. Furthermore, a four-token contextual attention module (FTCA) is introduced to explore the content information among neighboring keys and improve the representation ability. Both qualitative and quantitative validations on synthetic and real-world rain images demonstrate that the proposed CAD achieves competitive deraining performance.

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

26 pages, 12177 KB

Open Access Article

An Efficient Hybrid 3D Computer-Aided Cephalometric Analysis for Lateral Cephalometric and Cone-Beam Computed Tomography (CBCT) Systems

by Laurine A. Ashame, Sherin M. Youssef, Mazen Nabil Elagamy and Sahar M. El-Sheikh

Computers 2025, 14(6), 223; https://doi.org/10.3390/computers14060223 - 7 Jun 2025

Viewed by 2401

Abstract

Lateral cephalometric analysis is commonly used in orthodontics for skeletal classification to ensure an accurate and reliable diagnosis for treatment planning. However, most current research depends on analyzing different types of radiographs, which requires more computational time than 3D analysis. Consequently, this study addresses fully automatic orthodontic tracing based on artificial intelligence (AI) applied to 2D and 3D images, by designing a cephalometric system that analyzes the significant landmarks and regions of interest (ROI) needed in orthodontic tracing, especially for the mandibular and maxillary teeth. In this research, a computerized system is developed to automate orthodontic evaluation tasks for measurements from 2D and Cone-Beam Computed Tomography (CBCT, or 3D) systems. This work was tested on a dataset that contains images of males and females obtained from dental hospitals with patient-informed consent. The dataset consists of 2D lateral cephalometric, panoramic, and CBCT radiographs. Many scenarios were applied to test the proposed system in landmark prediction and detection. Moreover, this study integrates the Grad-CAM (Gradient-Weighted Class Activation Mapping) technique to generate heat maps, providing transparent visualization of the regions the model focuses on during its decision-making process. By enhancing the interpretability of deep learning predictions, Grad-CAM strengthens clinical confidence in the system’s outputs, ensuring that ROI detection aligns with orthodontic diagnostic standards. This explainability is crucial in medical AI applications, where understanding model behavior is as important as achieving high accuracy. The experimental results achieved an accuracy exceeding 98.9%. This research evaluates and differentiates between the two-dimensional and the three-dimensional tracing analyses applied to measurements based on the practices of the European Board of Orthodontics.
The results demonstrate the proposed methodology’s robustness when applied to cephalometric images. Furthermore, the evaluation of 3D analysis usage provides a clear understanding of the significance of integrated deep-learning techniques in orthodontics.

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

15 pages, 7036 KB

Open Access Article

Detection of Fiber-Flaw on Pill Surface Based on Lightweight Network SA-MGhost-DVGG

by Jipei Lou, Hongyi Wang, Haodong Liang and Ziwei Wu

Computers 2025, 14(5), 200; https://doi.org/10.3390/computers14050200 - 21 May 2025

Viewed by 658

Abstract

Fiber-flaw detection on pill surfaces is a critical yet challenging task in industrial pharmacy due to diverse defect characteristics. To overcome the limitations of traditional methods in accuracy and real-time performance, this study introduces SA-MGhost-DVGG, a novel lightweight network for enhanced detection. The proposed network integrates an MGhost module for reducing parameters and computational load, a mixed-channel spatial attention (SA) module to refine features specific to fiber regions, and depthwise separable convolutions (DepSepConv) for efficient dimensionality reduction while preserving feature information. Experimental evaluations demonstrate that SA-MGhost-DVGG achieves a mean detection accuracy of 99.01% with an average inference time of 2.23 ms per pill. The findings confirm that SA-MGhost-DVGG effectively balances high accuracy with computational efficiency, offering a robust solution for industrial applications.
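To make the parameter saving from depthwise separable convolutions concrete, the short calculation below compares weight counts (biases ignored) for a standard convolution and its depthwise separable counterpart. The layer sizes are hypothetical and are not taken from SA-MGhost-DVGG.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / standard
print(standard, separable, round(ratio, 4))  # 73728 8768 0.1189
```

The ratio equals 1/c_out + 1/k², so for 3 x 3 kernels the separable form needs roughly one ninth of the weights once the output channel count is large.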

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

18 pages, 10587 KB

Open Access Article

M18K: A Multi-Purpose Real-World Dataset for Mushroom Detection, 3D Pose Estimation, and Growth Monitoring

by Abdollah Zakeri, Mulham Fawakherji, Jiming Kang, Bikram Koirala, Venkatesh Balan, Weihang Zhu, Driss Benhaddou and Fatima A. Merchant

Computers 2025, 14(5), 199; https://doi.org/10.3390/computers14050199 - 20 May 2025

Cited by 1 | Viewed by 2264

Abstract

Automating agricultural processes holds significant promise for enhancing efficiency and sustainability in various farming practices. This paper contributes to the automation of agricultural processes by providing a dedicated mushroom detection dataset related to automated harvesting, 3D pose estimation, and growth monitoring of the button mushroom produced using Agaricus bisporus fungi. With a total of 2000 images for object detection, instance segmentation, and 3D pose estimation (containing over 100,000 mushroom instances) and an additional 3838 images for yield estimation featuring eight mushroom scenes covering the complete growth period, it fills the gap in mushroom-specific datasets and serves as a benchmark for detection and instance segmentation as well as 3D pose estimation algorithms in smart mushroom agriculture. The dataset, featuring realistic growth environment scenarios with comprehensive 2D and 3D annotations, is assessed using advanced detection and instance segmentation algorithms. This paper details the dataset’s characteristics, presents detailed statistics on mushroom growth and yield, evaluates algorithmic performance, and, for broader applicability, makes all resources publicly available, including images, code, and trained models, via our GitHub repository (accessed on 22 March 2025).

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

17 pages, 11121 KB

Open Access Article

Few-Shot Data Augmentation by Morphology-Constrained Latent Diffusion for Enhanced Nematode Recognition

by Xiong Ouyang, Jiayan Zhuang, Jianfeng Gu and Sichao Ye

Computers 2025, 14(5), 198; https://doi.org/10.3390/computers14050198 - 19 May 2025

Viewed by 1968

Abstract

Plant-parasitic nematodes represent a significant biosecurity threat in cross-border plant quarantine, necessitating precise identification for effective border control. While deep learning (DL) models have demonstrated success in nematode image classification based on morphological features, the limited availability of high-quality samples and the species-specific nature of nematodes result in insufficient training data, which constrains model performance. Although generative models have shown promise in data augmentation, they often struggle to balance morphological fidelity and phenotypic diversity. This paper proposes a novel few-shot data augmentation framework based on a morphology-constrained latent diffusion model, which, for the first time, integrates morphological constraints into the latent diffusion process. By geometrically parameterizing nematode morphology, the proposed approach enhances topological fidelity in the generated images and addresses key limitations of traditional generative models in controlling biological shapes. This framework is designed to augment nematode image datasets and improve classification performance under limited data conditions. The framework consists of three key components: First, we incorporate a fine-tuning strategy that preserves the generalization capability of the model in few-shot settings. Second, we extract morphological constraints from nematode images using edge detection and a moving least squares method, capturing key structural details. Finally, we embed these constraints into the latent space of the diffusion model, ensuring generated images maintain both fidelity and diversity. Experimental results demonstrate that our approach significantly enhances classification accuracy. For imbalanced datasets, the Top-1 accuracy of multiple classification models improved by 7.34–14.66% compared to models trained without augmentation, and by 2.0–5.67% compared to models using traditional data augmentation.
Additionally, when replacing up to 25% of real images with generated ones in a balanced dataset, model performance remained nearly unchanged, indicating the robustness and effectiveness of the method. Ablation experiments demonstrate that the morphology-guided strategy achieves superior image quality compared to both unconstrained and edge-based constraint methods, with a Fréchet Inception Distance of 12.95 and an Inception Score of 1.21 ± 0.057. These results indicate that the proposed method effectively balances morphological fidelity and phenotypic diversity in image generation.

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

14 pages, 4391 KB

Open Access Article

AFQSeg: An Adaptive Feature Quantization Network for Instance-Level Surface Crack Segmentation

by Shaoliang Fang, Lu Lu, Zhu Lin, Zhanyu Yang and Shaosheng Wang

Computers 2025, 14(5), 182; https://doi.org/10.3390/computers14050182 - 9 May 2025

Viewed by 825

Abstract

Concrete surface crack detection plays a crucial role in infrastructure maintenance and safety. Deep learning-based methods have shown great potential in this task. However, under real-world conditions such as poor image quality, environmental interference, and complex crack patterns, existing models still face challenges in detecting fine cracks and often rely on large numbers of trainable parameters, limiting their practicality in complex environments. To address these issues, this paper proposes a crack detection model based on adaptive feature quantization, which primarily consists of a maximum soft pooling module, an adaptive crack feature quantization module, and a trainable crack post-processing module. Specifically, the maximum soft pooling module improves the continuity and integrity of detected cracks. The adaptive crack feature quantization module enhances the contrast between cracks and background features and strengthens the model’s focus on critical regions through spatial feature fusion. The trainable crack post-processing module incorporates edge-guided post-processing algorithms to correct false predictions and refine segmentation results. Experiments conducted on the Crack500 Road Crack Dataset show that the proposed model achieves notable improvements in detection accuracy and efficiency, with an average F1-score improvement of 2.81% and a precision gain of 2.20% over the baseline methods. In addition, the model significantly reduces computational cost, achieving a 78.5–88.7% reduction in parameter size and up to 96.8% improvement in inference speed, making it more efficient and deployable for real-world crack detection applications.

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

23 pages, 3368 KB

Open Access Article

SDKU-Net: A Novel Architecture with Dynamic Kernels and Optimizer Switching for Enhanced Shadow Detection in Remote Sensing

by Gilberto Alvarado-Robles, Isac Andres Espinosa-Vizcaino, Carlos Gustavo Manriquez-Padilla and Juan Jose Saucedo-Dorantes

Computers 2025, 14(3), 80; https://doi.org/10.3390/computers14030080 - 23 Feb 2025

Cited by 1 | Viewed by 3073

Abstract

Shadows in remote sensing images often introduce challenges in accurate segmentation due to their variability in shape, size, and texture. To address these issues, this study proposes the Supervised Dynamic Kernel U-Net (SDKU-Net), a novel architecture designed to enhance shadow detection in complex remote sensing scenarios. SDKU-Net integrates dynamic kernel adjustment, a combined loss function incorporating Focal and Tversky Loss, and optimizer switching to effectively tackle class imbalance and improve segmentation quality. Using the AISD dataset, the proposed method achieved state-of-the-art performance with an Intersection over Union (IoU) of 0.8552, an F1-Score of 0.9219, an Overall Accuracy (OA) of 96.50%, and a Balanced Error Rate (BER) of 5.08%. Comparative analyses demonstrate SDKU-Net’s superior performance against established methods such as U-Net, U-Net++, MSASDNet, and CADDN. Additionally, the model’s efficient training process, requiring only 75 epochs, highlights its potential for resource-constrained applications. These results underscore the robustness and adaptability of SDKU-Net, paving the way for advancements in shadow detection and segmentation across diverse fields.
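A combined Focal and Tversky loss for binary segmentation can be sketched as a weighted sum of the two standard terms. The abstract does not give SDKU-Net's weighting or hyperparameters, so `alpha`, `beta`, `gamma`, the mixing `weight`, and the toy pixel values below are all assumptions.

```python
import math


def tversky_loss(probs, targets, alpha=0.5, beta=0.5, eps=1e-7):
    """1 - Tversky index, using soft counts of TP, FP and FN.

    alpha weights false positives and beta weights false negatives;
    alpha = beta = 0.5 reduces to the Dice loss.
    """
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean binary focal loss without class weighting: -(1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p
        p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def combined_loss(probs, targets, weight=0.5):
    """Hypothetical mix: weight * focal + (1 - weight) * Tversky."""
    return weight * focal_loss(probs, targets) + (1.0 - weight) * tversky_loss(probs, targets)


# Toy flattened mask: 1 = shadow pixel, 0 = background.
targets = [1, 1, 0, 0]
good = [0.9, 0.8, 0.1, 0.2]
bad = [0.4, 0.3, 0.7, 0.6]
print(combined_loss(good, targets), combined_loss(bad, targets))
```

The focal term down-weights easy pixels, while the Tversky term scores region overlap and lets false negatives and false positives be penalized unequally, which is why the pair is a common choice for imbalanced masks.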

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

32 pages, 8818 KB

Open Access Article

Latent Outlier Exposure in Real-Time Anomaly Detection at the Large Hadron Collider

by Thomas Dartnall Stern, Amit Kumar Mishra and James Michael Keaveney

Computers 2025, 14(3), 79; https://doi.org/10.3390/computers14030079 - 20 Feb 2025

Cited by 1 | Viewed by 2287

Abstract

We propose a novel approach to real-time anomaly detection at the Large Hadron Collider, aimed at enhancing the discovery potential for new fundamental phenomena in particle physics. Our method leverages the Latent Outlier Exposure technique and is evaluated using three distinct anomaly detection models. Among these is a novel adaptation of the variational autoencoder’s reparameterisation trick, specifically optimised for anomaly detection. The models are validated on simulated datasets representing collider processes from the Standard Model and hypothetical Beyond the Standard Model scenarios. The results demonstrate significant advantages, particularly in addressing the formidable challenge of developing a signal-agnostic, hardware-level anomaly detection trigger for experiments at the Large Hadron Collider.
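For readers unfamiliar with it, the standard reparameterisation trick samples a latent vector as z = mu + sigma * eps with eps drawn from a unit Gaussian, so that the randomness is separated from the learned parameters. The sketch below shows only this standard form; the paper's anomaly-detection adaptation is not described in the abstract and is not reproduced here. The function name and the example values are assumptions.

```python
import math
import random


def reparameterise(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with sigma = exp(0.5 * log_var), eps ~ N(0, 1).

    In a trained variational autoencoder, mu and log_var would come from
    the encoder; here they are plain lists of floats.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


rng = random.Random(0)   # seeded for repeatability
mu = [0.5, -1.0]         # hypothetical encoder means
log_var = [0.0, -2.0]    # hypothetical encoder log-variances
z = reparameterise(mu, log_var, rng)
print(z)
```

As the variance shrinks towards zero the sample collapses onto the mean, so the encoder's mean alone can serve as a deterministic latent code when sampling is undesirable.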

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

27 pages, 5537 KB

Open Access Article

Real-Time Gaze Estimation Using Webcam-Based CNN Models for Human–Computer Interactions

by Visal Vidhya and Diego Resende Faria

Computers 2025, 14(2), 57; https://doi.org/10.3390/computers14020057 - 10 Feb 2025

Cited by 4 | Viewed by 6703

Abstract

Gaze tracking and estimation are essential for understanding human behavior and enhancing human–computer interactions. This study introduces an innovative, cost-effective solution for real-time gaze tracking using a standard webcam, providing a practical alternative to conventional methods that rely on expensive infrared (IR) cameras. Traditional approaches, such as Pupil Center Corneal Reflection (PCCR), require IR cameras to capture corneal reflections and iris glints, demanding high-resolution images and controlled environments. In contrast, the proposed method utilizes a convolutional neural network (CNN) trained on webcam-captured images to achieve precise gaze estimation. The developed deep learning model achieves a mean squared error (MSE) of 0.0112 and an accuracy of 90.98% through a novel trajectory-based accuracy evaluation system. This system involves an animation of a ball moving across the screen, with the user’s gaze following the ball’s motion. Accuracy is determined by calculating the proportion of gaze points falling within a predefined threshold based on the ball’s radius, ensuring a comprehensive evaluation of the system’s performance across all screen regions. Data collection is both simplified and effective, capturing images of the user’s right eye while they focus on the screen. Additionally, the system includes advanced gaze analysis tools, such as heat maps, gaze fixation tracking, and blink rate monitoring, which are all integrated into an intuitive user interface. The robustness of this approach is further enhanced by incorporating Google’s MediaPipe model for facial landmark detection, improving accuracy and reliability. The evaluation results demonstrate that the proposed method delivers high-accuracy gaze prediction without the need for expensive equipment, making it a practical and accessible solution for diverse applications in human–computer interactions and behavioral research.
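The trajectory-based accuracy described above can be sketched as the fraction of predicted gaze points that land within a distance threshold of the ball's centre at the same time step. The abstract says only that the threshold is based on the ball's radius, so the factor of two used here, the function name, and the pixel coordinates are assumptions.

```python
import math


def trajectory_accuracy(gaze_points, ball_centres, threshold):
    """Proportion of gaze points within `threshold` pixels of the ball centre."""
    hits = sum(
        1
        for (gx, gy), (bx, by) in zip(gaze_points, ball_centres)
        if math.hypot(gx - bx, gy - by) <= threshold
    )
    return hits / len(gaze_points)


ball_radius = 20.0
threshold = 2.0 * ball_radius  # assumed relation between threshold and radius

# Hypothetical screen coordinates (pixels), one pair per animation frame.
ball_centres = [(100, 100), (200, 100), (300, 100), (400, 100)]
gaze_points = [(110, 105), (190, 130), (360, 100), (400, 139)]

accuracy = trajectory_accuracy(gaze_points, ball_centres, threshold)
print(accuracy)  # 0.75: the third point is 60 px away and counts as a miss
```

Because the ball sweeps the whole screen, the same metric can also be computed per screen region to reveal where the estimator is weakest.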

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

24 pages, 11018 KB

Open Access Article

Integrating Few-Shot Learning and Multimodal Image Enhancement in GNut: A Novel Approach to Groundnut Leaf Disease Detection

by Imran Qureshi

Computers 2024, 13(12), 306; https://doi.org/10.3390/computers13120306 - 22 Nov 2024

Cited by 2 | Viewed by 2690

Abstract

Groundnut is a vital crop worldwide, but its production is significantly threatened by various leaf diseases. Early identification of such diseases is vital for maintaining agricultural productivity. Deep learning techniques have been employed to address this challenge and enhance the detection, recognition, and classification of groundnut leaf diseases, ensuring better management and protection of this important crop. This paper presents a new approach to the detection and classification of groundnut leaf diseases using an advanced deep learning model, GNut, which integrates ResNet50 and DenseNet121 architectures for feature extraction and Few-Shot Learning (FSL) for classification. The proposed model offers an efficient and highly accurate method for managing groundnut crop diseases. Evaluated on a novel Pak-Nuts dataset collected from groundnut fields in Pakistan, the GNut model achieves promising accuracy rates of 99% with FSL and 95% without it. Advanced image preprocessing techniques, such as Multi-Scale Retinex with Color Restoration, Adaptive Histogram Equalization, and Multimodal Image Enhancement for Vegetative Feature Isolation, were employed to enhance the quality of input data, further improving classification accuracy. These results illustrate the robustness of the proposed model in real agricultural applications, establishing a new benchmark for groundnut leaf disease detection and highlighting the potential of AI-powered solutions to play a role in encouraging sustainable agricultural practices.

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

20 pages, 4520 KB

Open AccessArticle

Employing Different Algorithms of Lightweight Convolutional Neural Network Models in Image Distortion Classification

by Ismail Taha Ahmed, Falah Amer Abdulazeez and Baraa Tareq Hammad

Computers 2024, 13(10), 268; https://doi.org/10.3390/computers13100268 - 12 Oct 2024

Cited by 3 | Viewed by 2366

Abstract

The majority of applications use automatic image recognition technologies to carry out a range of tasks. Therefore, it is crucial to identify and classify image distortions to improve image quality. Despite efforts in this area, there are still many challenges in accurately and reliably classifying distorted images. In this paper, we offer a comprehensive analysis of both non-lightweight and lightweight deep convolutional neural network (CNN) models for the classification of distorted images. Subsequently, an effective method is proposed to enhance the overall performance of distorted image classification. This method involves selecting features extracted by the pretrained models and using a strong classifier. The experiments used the KADID-10k dataset to assess the effectiveness of the results. The K-nearest neighbor (KNN) classifier showed better performance than the naïve classifier in terms of accuracy, precision, error rate, recall, and F1 score. Additionally, SqueezeNet outperformed other deep CNN models, both lightweight and non-lightweight, across every evaluation metric. The experimental results demonstrate that combining SqueezeNet with KNN can effectively and accurately classify distorted images into the correct categories. The proposed SqueezeNet-KNN method achieved an accuracy rate of 89%. As detailed in the results section, the proposed method outperforms state-of-the-art methods in accuracy, precision, error rate, recall, and F1 score. Full article
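
The classification stage summarized in this abstract (a KNN classifier applied to feature vectors taken from a pretrained CNN) might look roughly like the sketch below. It is an illustration only, not the authors' implementation: the function name `knn_predict`, the value of `k`, the use of Euclidean distance, and the tiny two-dimensional feature vectors and labels are all assumptions made for the example. In practice the vectors would come from a network such as SqueezeNet.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    # Euclidean distance from the query to every training vector
    distances = [
        (math.dist(vec, query), label)
        for vec, label in zip(train_features, train_labels)
    ]
    # Keep the k closest neighbours and let them vote
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors standing in for CNN features
features = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
            [1.0, 0.9], [0.9, 1.0], [1.1, 1.0]]
labels = ["blur", "blur", "blur", "noise", "noise", "noise"]

print(knn_predict(features, labels, [0.05, 0.05]))
print(knn_predict(features, labels, [1.0, 1.0]))
```

Keeping the classifier separate from the feature extractor means the same sketch applies regardless of which pretrained model supplies the vectors.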

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

18 pages, 8530 KB

Open AccessArticle

Spatiotemporal Bayesian Machine Learning for Estimation of an Empirical Lower Bound for Probability of Detection with Applications to Stationary Wildlife Photography

by Mohamed Jaber, Robert D. Breininger, Farag Hamad and Nezamoddin N. Kachouie

Computers 2024, 13(10), 255; https://doi.org/10.3390/computers13100255 - 8 Oct 2024

Viewed by 1382

Abstract

An important parameter in monitoring and surveillance systems is the probability of detection. Advanced wildlife monitoring systems rely on camera traps for stationary wildlife photography and have been broadly used for estimation of population size and density. Camera encounters are collected for estimation and management of a growing population size using spatial capture models. The accuracy of the estimated population size relies on the detection probability of the individual animals, which in turn depends on the observed frequency of animal encounters with the camera traps. Therefore, optimal coverage by the camera grid is essential for reliable estimation of the population size and density. The goal of this research is to implement a spatiotemporal Bayesian machine learning model to estimate a lower bound for the probability of detection of a monitoring system. To obtain an accurate estimate of population size in this study, an empirical lower bound for the probability of detection is established by considering the sensitivity of the model to the augmented sample size. The monitoring system must attain a probability of detection greater than the established empirical lower bound to achieve a pertinent estimation accuracy. It was found that for stationary wildlife photography, a camera grid with a detection probability of at least 0.3 is required for accurate estimation of the population size. A notable outcome is that a moderate probability of detection or better is required to obtain a reliable estimate of the population size using spatiotemporal machine learning. As a result, the required probability of detection is recommended when designing an automated monitoring system. The number and location of cameras in the camera grid will determine the camera coverage. Consequently, camera coverage and the individual home-range verify the probability of detection. Full article

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

17 pages, 3728 KB

Open AccessArticle

YOLOv8-Based Drone Detection: Performance Analysis and Optimization

by Betul Yilmaz and Ugurhan Kutbay

Computers 2024, 13(9), 234; https://doi.org/10.3390/computers13090234 - 17 Sep 2024

Cited by 10 | Viewed by 9508

Abstract

The extensive utilization of drones has led to numerous scenarios that encompass both advantageous and perilous outcomes. By using deep learning techniques, this study aimed to reduce the dangerous effects of drone use through early detection of drones. The purpose of this study is to evaluate deep learning approaches such as pre-trained YOLOv8 drone detection for security issues. This study focuses on the YOLOv8 model to achieve optimal performance in object detection tasks using a publicly available dataset, collected by Mehdi Özel for a UAV competition and sourced from GitHub. The images are labeled using Roboflow, and the model is trained on Google Colab. YOLOv8, known for its advanced architecture, was selected due to its suitability for real-time detection applications and its ability to process complex visual data. Hyperparameter tuning and data augmentation techniques were applied to maximize the performance of the model. Basic hyperparameters such as learning rate, batch size, and optimization settings were optimized through iterative experiments to provide the best performance. In addition to hyperparameter tuning, various data augmentation strategies were used to increase the robustness and generalization ability of the model. Techniques such as rotation, scaling, flipping, and color adjustments were applied to the dataset to simulate different conditions and variations. Among the augmentation techniques applied to the specific dataset in this study, rotation was found to deliver the highest performance. Blurring and cropping methods were observed to follow closely behind. The combination of optimized hyperparameters and strategic data augmentation allowed YOLOv8 to achieve high detection accuracy and reliable performance on the publicly available dataset. This method demonstrates the effectiveness of YOLOv8 in real-world scenarios, while also highlighting the importance of hyperparameter tuning and data augmentation in increasing model capabilities. To enhance model performance, dataset augmentation techniques including rotation and blurring were implemented. Following these steps, a precision value of 0.946, a recall value of 0.9605, and a precision–recall curve value of 0.978 were achieved, surpassing many popular models such as Mask CNN, CNN, and YOLOv5. Full article

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

18 pages, 5905 KB

Open AccessArticle

Detection of Bus Driver Mobile Phone Usage Using Kolmogorov-Arnold Networks

by János Hollósi, Áron Ballagi, Gábor Kovács, Szabolcs Fischer and Viktor Nagy

Computers 2024, 13(9), 218; https://doi.org/10.3390/computers13090218 - 3 Sep 2024

Cited by 6 | Viewed by 2955

Abstract

This research introduces a new approach for detecting mobile phone use by drivers, exploiting the capabilities of Kolmogorov-Arnold Networks (KAN) to improve road safety and comply with regulations prohibiting phone use while driving. To address the lack of available data for this specific task, a unique dataset was constructed consisting of images of bus drivers in two scenarios: driving without phone interaction and driving while on a phone call. This dataset provides the basis for the current research. Different KAN-based networks were developed for custom action recognition tailored to the nuanced task of identifying drivers holding phones. The system’s performance was evaluated against convolutional neural network-based solutions, and differences in accuracy and robustness were observed. The aim was to propose an appropriate solution for professional Driver Monitoring Systems (DMS) in research and development and to investigate the efficiency of KAN solutions for this specific sub-task. The implications of this work extend beyond enforcement, providing a foundational technology for automating monitoring and improving safety protocols in the commercial and public transport sectors. In conclusion, this study demonstrates the efficacy of KAN network layers in neural network designs for driver monitoring applications. Full article

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

20 pages, 17178 KB

Open AccessArticle

Stego-STFAN: A Novel Neural Network for Video Steganography

by Guilherme Fay Vergara, Pedro Giacomelli, André Luiz Marques Serrano, Fábio Lúcio Lopes de Mendonça, Gabriel Arquelau Pimenta Rodrigues, Guilherme Dantas Bispo, Vinícius Pereira Gonçalves, Robson de Oliveira Albuquerque and Rafael Timóteo de Sousa Júnior

Computers 2024, 13(7), 180; https://doi.org/10.3390/computers13070180 - 19 Jul 2024

Cited by 3 | Viewed by 3575

Abstract

This article presents Stego-STFAN, an innovative approach to video steganography. By using a computationally cheap model that processes the temporal and spatial domains together, it makes fine adjustments in each frame. Stego-STFAN achieved a $PSNR_{c}$ of 27.03 and a $PSNR_{S}$ of 23.09, which is close to the state of the art. Steganography is the ability to hide a message so that third parties cannot perceive that communication is taking place. Thus, one of the precautions in steganography is the size of the message to be hidden, as the security of the message is inversely proportional to its size. Inspired by this principle, video steganography appears to expand channels further and incorporate data into a message. To improve the construction of stego-frames and recovered secrets, we propose a new architecture for video steganography derived from the Spatial-Temporal Adaptive Filter Network (STFAN) in conjunction with the Attention mechanism. Together, these generate filters and map dynamic frames to increase the efficiency and effectiveness of frame processing, exploiting the redundancy present in the temporal dimension of the video as well as fine details such as edges, fast-moving pixels, and the context of secret and cover frames. The DWT method is used as another feature extraction level, having the same characteristics as when applied to an image file. Full article
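
For readers unfamiliar with the metric reported above, peak signal-to-noise ratio is conventionally defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between two signals and MAX is the largest possible sample value. The sketch below computes it for flat sequences of 8-bit pixel values. It is a generic illustration, not the paper's evaluation code: the function name `psnr`, the `max_value` default of 255, and the sample `cover` and `stego` values are assumptions, and the abstract does not define the subscripts c and S (they presumably refer to the cover and secret frames).

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values: every sample differs by 10 grey levels
cover = [100, 100, 100, 100]
stego = [110, 110, 110, 110]
print(round(psnr(cover, stego), 2))
```

Higher values mean the two signals are closer, so a higher PSNR between a cover frame and its stego-frame indicates a less visible embedding.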

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

19 pages, 1015 KB

Open AccessArticle

A Regularized Physics-Informed Neural Network to Support Data-Driven Nonlinear Constrained Optimization

by Diego Armando Perez-Rosero, Andrés Marino Álvarez-Meza and Cesar German Castellanos-Dominguez

Computers 2024, 13(7), 176; https://doi.org/10.3390/computers13070176 - 18 Jul 2024

Cited by 6 | Viewed by 2789

Abstract

Nonlinear optimization (NOPT) is a meaningful tool for solving complex tasks in fields like engineering, economics, and operations research, among others. However, NOPT struggles with data variability and noisy input measurements, which lead to incorrect solutions. Furthermore, nonlinear constraints may result in outcomes that are either infeasible or suboptimal, as in nonconvex optimization. This paper introduces a novel regularized physics-informed neural network (RPINN) framework as a new NOPT tool for both supervised and unsupervised data-driven scenarios. Our RPINN is threefold: By using custom activation functions and regularization penalties in an artificial neural network (ANN), RPINN can handle data variability and noisy inputs. Furthermore, it employs physics principles to construct the network architecture, computing the optimization variables based on network weights and learned features. In addition, it uses automatic differentiation training to make the system scalable and cut down on computation time through batch-based back-propagation. The test results for both supervised and unsupervised NOPT tasks show that our RPINN can provide solutions that are competitive with state-of-the-art solvers. In turn, the robustness of RPINN against noisy input measurements makes it particularly valuable in environments with fluctuating information. Specifically, we test a uniform mixture model and a gas-powered system as NOPT scenarios. Overall, the ANN-based foundation of RPINN offers significant flexibility and scalability. Full article

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

Other

Jump to: Research

40 pages, 5965 KB

Open AccessSystematic Review

A Systematic Review and Comparative Analysis Approach to Boom Gate Access Using Plate Number Recognition

by Asaju Christine Bukola, Pius Adewale Owolawi, Chuling Du and Etienne Van Wyk

Computers 2024, 13(11), 286; https://doi.org/10.3390/computers13110286 - 4 Nov 2024

Cited by 4 | Viewed by 3027

Abstract

Security has been paramount to many organizations for many years, with access control being one of the critical measures to ensure security. Among various approaches to access control, vehicle plate number recognition has received wide attention. However, its application to boom gate access has not been adequately explored. This study proposes a method to access the boom gate by optimizing vehicle plate number recognition. Given the speed and accuracy of the YOLO (You Only Look Once) object detection algorithm, this study proposes using the YOLO deep learning algorithm for plate number detection to access a boom gate. To identify the gap and the most suitable YOLO variant, the study systematically surveyed the publication database to identify peer-reviewed articles published between 2020 and 2024 on plate number recognition using different YOLO versions. In addition, experiments were performed on four YOLO versions: YOLOv5, YOLOv7, YOLOv8, and YOLOv9, focusing on vehicle plate number recognition. The experiments, using an open-source dataset with 699 samples in total, reported accuracies of 81%, 82%, 83%, and 73% for YOLOv5, YOLOv7, YOLOv8, and YOLOv9, respectively. This comparative analysis aims to determine the most appropriate YOLO version for the task, optimizing both security and efficiency in boom gate access control systems. By optimizing the capabilities of advanced YOLO algorithms, the proposed method seeks to improve the reliability and effectiveness of access control through precise and rapid plate number recognition. The analysis reveals that each YOLO version has distinct advantages depending on the application’s specific requirements. In complex detection conditions with changing lighting and shadows, YOLOv8 performed better in terms of reduced loss rates and increased precision and recall metrics. Full article

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

Published Papers (21 papers)
