Search Results (6,103)

Search Parameters:
Keywords = YOLOv5

34 pages, 3563 KB  
Article
Computer Vision Applied to the Analysis of Pig Behavior Patterns in an Air-Conditioned Environment
by Maria de Fatima Araújo Alves, Héliton Pandorfi, Rodrigo Gabriel Ferreira Soares, Victor Wanderley Costa de Medeiros, Taíze Calvacante Santana, Vitoria Katarina Grobner, Gabriel Thales Barboza Marinho, Gledson Luiz Pontes de Almeida, Maria Beatriz Ferreira and Marcos Vinícius da Silva
Animals 2026, 16(9), 1353; https://doi.org/10.3390/ani16091353 - 28 Apr 2026
Abstract
Observing pig behavior, such as feed intake, water intake, and resting behavior, is essential for improving the well-being of these animals. However, monitoring such behaviors by traditional methods can be exhausting for both humans and animals, interfering with their development. This research aimed to identify behavioral patterns of pigs in an air-conditioned environment through computer vision. Microcameras were installed in the animals’ stalls to record video over an experimental period of 92 days, and air temperature and humidity were logged simultaneously. Physiological variables of the animals were collected to determine whether they were under heat stress. To recognize the drinking, eating, standing, and lying behavior of pigs, a YOLOv5 model was trained and then used to detect the animals. Regions of the images corresponding to the feeders and drinkers were defined, and feeding behavior and water intake were identified by criteria based on occupation of those zones by pigs detected in the standing position. The results showed that the trained model achieved an average accuracy rate of 97.3% and an average recall of 96.1% in animal detection, and exhibited 97.5% accuracy and 97.0% recall in recognizing the feeding behavior and water consumption of pigs. The proposed method works on videos or images and minimizes the need for manual intervention, offering an efficient means of monitoring pig behavior in agricultural environments and contributing to the productivity of pig farming operations.
(This article belongs to the Section Pigs)
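The feeding-zone occupancy rule described above reduces to a simple geometric test on detector output. A minimal sketch, assuming axis-aligned boxes and a hand-drawn feeder region; the zone coordinates, the "standing" label, and the 0.5 occupancy threshold are illustrative assumptions, not the authors' settings.

def overlap_fraction(box, zone):
    # box, zone: (x1, y1, x2, y2); fraction of the box's area inside the zone
    ix1, iy1 = max(box[0], zone[0]), max(box[1], zone[1])
    ix2, iy2 = min(box[2], zone[2]), min(box[3], zone[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area > 0 else 0.0

FEEDER_ZONE = (120, 40, 260, 110)  # hypothetical pixel coordinates of the feeder

def is_feeding(detection):
    # detection: {'box': (x1, y1, x2, y2), 'label': str} from a YOLOv5-style model
    return (detection["label"] == "standing"
            and overlap_fraction(detection["box"], FEEDER_ZONE) > 0.5)  # assumed threshold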
24 pages, 8644 KB  
Article
YOLO-REFB: Rectangular Edge Fusion for Cardboard Box Detection in Warehouse Environments Using Mobile Robot
by Narendra Kumar Kolla and Pandu Ranga Vundavilli
Modelling 2026, 7(3), 83; https://doi.org/10.3390/modelling7030083 - 28 Apr 2026
Abstract
Accurate detection of cardboard boxes is essential for mobile manipulators performing pick-and-place operations in warehouses. Conventional object detection methods like YOLOv11 struggle in low-texture and occluded environments. This paper presents YOLO-REFB, a novel object detection framework for real-time cardboard box detection in robotic manipulation using a dual-arm mobile robot (DAMR) operating in indoor warehouse environments. The proposed approach enhances the network by integrating the Rectangular Edge Fusion Block (REFB) into the YOLOv11 architecture, focusing on learning the geometric and structural features of cardboard boxes. Enhanced edge information extraction and feature fusion improve training stability and localization accuracy. A custom dataset of 3501 annotated images, collected under varied conditions, was used. The images were randomly assigned to training and validation sets in an 80:20 ratio and manually annotated with Roboflow, ensuring precise alignment of bounding boxes with cardboard box edges for accurate comparison with existing YOLO models. The model outperformed existing YOLO variants (YOLOv8n and YOLOv5n) in precision (89.29%), recall (83.95%), and F1-score (86.54%). YOLO-REFB also achieved improved localization metrics, including mean Average Precision (mAP)@0.5 (91.68%) and mAP@0.5:0.95 (68.61%). The inclusion of REFB was essential to these gains, enabling effective detection of objects in challenging environments. Future developments may include 3D pose estimation and multi-object grasp planning for advanced robotic manipulation.
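The 80:20 random split mentioned above is easy to reproduce. A minimal sketch, assuming a flat directory of JPEG images; the path and seed are placeholders, not the paper's setup.

import random
from pathlib import Path

random.seed(42)  # assumed seed, purely for reproducibility
images = sorted(Path("dataset/images").glob("*.jpg"))  # hypothetical layout
random.shuffle(images)
cut = int(0.8 * len(images))  # 80:20 train/validation ratio from the paper
train_set, val_set = images[:cut], images[cut:]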
35 pages, 1539 KB  
Article
PGT-Net: A Physics-Guided Transformer–CNN Hybrid Network for Low-Light Image Enhancement and Object Detection in Traffic Scenes
by Bin Chen, Jian Qiao, Baowei Li, Shipeng Liu and Wei She
J. Imaging 2026, 12(5), 191; https://doi.org/10.3390/jimaging12050191 - 28 Apr 2026
Abstract
In autonomous driving and intelligent transportation systems, the degradation of image quality under low-light conditions severely impacts the reliability of subsequent object detection. Existing methods predominantly employ data-driven deep learning models for image enhancement, often lacking physical interpretability and struggling to maintain robustness in complex traffic scenarios with varying illumination. To address this, this paper proposes a Physically Guided Transformer–CNN Hybrid Network (PGT-Net) for end-to-end joint optimization of low-light enhancement and object detection. PGT-Net innovatively integrates the atmospheric scattering physical model with a deep learning architecture: first, a learnable physical guidance branch estimates the scene’s atmospheric illumination map and transmittance map, providing explicit physical priors for the network; second, a dual-branch enhancement backbone is designed, in which a local CNN branch (based on an improved UNet) restores fine textures while a global Transformer branch (based on Swin Transformer) models long-range dependencies to correct uneven global illumination, with features adaptively combined via a Physical Fusion Module so that enhancement results align with physical laws while retaining rich visual features; finally, the enhanced images are fed directly into a lightweight detection head (e.g., YOLOv7) for joint training and optimization. Comprehensive experiments on public datasets (ExDark, BDD100K-night, etc.) demonstrate that PGT-Net significantly outperforms mainstream methods (e.g., RetinexNet, KinD, Zero-DCE) in both low-light image enhancement quality (PSNR/SSIM) and object detection accuracy (mAP), while maintaining high inference efficiency. This research offers an interpretable, high-performance solution for visual perception tasks under adverse lighting conditions, holding strong theoretical significance and practical value.
(This article belongs to the Section AI in Imaging)
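The physical prior referenced above is the standard atmospheric scattering model, I(x) = J(x)·t(x) + A·(1 − t(x)), where I is the observed image, J the scene radiance, t the transmittance, and A the atmospheric light. A minimal numpy sketch of inverting it once t and A have been estimated; the clipping floor on t is our assumption to keep the division stable, not a detail from the paper.

import numpy as np

def recover_radiance(I, t, A, t_floor=0.1):
    # I: observed image (H, W, 3) in [0, 1]; t: transmittance map (H, W, 1)
    # A: atmospheric light (3,); solves J = (I - A * (1 - t)) / t
    t = np.clip(t, t_floor, 1.0)  # assumed floor avoids division blow-up
    return np.clip((I - A * (1.0 - t)) / t, 0.0, 1.0)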
41 pages, 16618 KB  
Article
Multi-Type Ship Detection in Complex Marine Backgrounds Using an Enhanced YOLO-Based Network
by Anran Du, Huiqi Xu and Wenqiang Yao
Sensors 2026, 26(9), 2718; https://doi.org/10.3390/s26092718 - 28 Apr 2026
Abstract
Accurate detection of ship targets in complex marine environments is fundamental to ensuring maritime security and safeguarding maritime rights. With the increasing diversity of vessel types and configurations, achieving precise identification of multiple ship classes amidst dynamic interference and cluttered backgrounds has emerged as a formidable challenge in marine surveillance. To address three pervasive issues in ship target detection—namely, high false-negative rates for small targets, inadequate feature discrimination, and imprecise localization—this paper proposes AK-DSAM-YOLOv13, a multi-scale detection algorithm specifically tailored for complex marine scenarios. Built upon the YOLOv13n architecture, the proposed algorithm implements integrated optimizations across the backbone network, neck structure, and loss function. First, a lightweight cross-scale feature extraction module, AKC3k2, is constructed by incorporating Alterable Kernel Convolutions (AKConv) to reconstruct the feature extraction path, thereby significantly enhancing the representation of multi-scale targets. Second, a Dynamic Up-Sampling Dual-Stream Attention Merging (DyDSAM) structure is designed, which integrates the DySample operator with a Dual-Stream Attention Mechanism (DSAM) to effectively suppress background clutter and improve feature fusion accuracy. Third, an Accuracy-Intersection-over-Union (AIoU) loss function is introduced to jointly optimize overlap area, center distance, and aspect ratio, enhancing localization robustness for small-scale objects. Experimental results on the self-built CM-Ships dataset, as well as the public SeaShips and McShips datasets, demonstrate that AK-DSAM-YOLOv13 significantly outperforms baseline models in detection accuracy, recall, and generalization capability while maintaining a low computational overhead. This research provides an efficient and reliable technical framework for intelligent maritime visual monitoring in complex environments.
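The AIoU loss is described as jointly penalizing overlap area, center distance, and aspect ratio, the same three terms that make up the widely used CIoU loss. The sketch below implements that CIoU-style structure for orientation only; the paper's exact AIoU formulation is not reproduced here.

import math
import torch

def ciou_style_loss(pred, target, eps=1e-7):
    # pred, target: (N, 4) boxes as (x1, y1, x2, y2)
    ix1 = torch.max(pred[:, 0], target[:, 0]); iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2]); iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)  # term 1: overlap
    # term 2: center distance, normalized by the enclosing box diagonal
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    dist = ((cxp - cxt) ** 2 + (cyp - cyt) ** 2) / ((ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps)
    # term 3: aspect-ratio consistency
    v = (4 / math.pi ** 2) * (
        torch.atan((target[:, 2] - target[:, 0]) / (target[:, 3] - target[:, 1] + eps))
        - torch.atan((pred[:, 2] - pred[:, 0]) / (pred[:, 3] - pred[:, 1] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + dist + alpha * v  # per-box loss, shape (N,)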
21 pages, 3220 KB  
Article
Enhanced Non-Invasive Estimation of Pig Body Weight in Growth Stage Based on Computer Vision
by Franck Morais de Oliveira, Verónica González Cadavid, Jairo Alexander Osorio Saraz, Felipe Andrés Obando Vega, Gabriel Araújo e Silva Ferraz and Patrícia Ferreira Ponciano Ferraz
AgriEngineering 2026, 8(5), 165; https://doi.org/10.3390/agriengineering8050165 - 28 Apr 2026
Abstract
Pig weighing is an essential procedure for monitoring growth and animal health; however, conventional methods are often labor-intensive, costly, and potentially stressful. In this context, this study proposes a non-invasive approach for estimating the body weight of pigs during the growing stage based on computer vision and the YOLOv11 algorithm, enabling automatic segmentation and individual identification in multi-animal environments. The study used RGB images of 10 group-housed pigs captured throughout the growing phase, in which automatic dorsal segmentation was combined with individual identification through numerical markings. From the generated binary masks, the segmented dorsal area was extracted and used as a predictor variable in Linear Regression and a Multilayer Perceptron (MLP) Artificial Neural Network. The YOLOv11 model showed consistent performance in the segmentation task, achieving test-set metrics of Precision = 0.849, Recall = 0.886, mAP@0.50 = 0.936, and mAP@0.50–0.95 = 0.819, demonstrating good generalization capability in scenarios with intense animal interaction. In the weight prediction stage, Linear Regression and the MLP achieved high coefficients of determination (R2 = 0.96 and 0.95, respectively) with low errors (RMSE = 1.52 kg and 1.63 kg; MAE = 1.20 kg and 1.25 kg), indicating a strong correlation between segmented dorsal area and actual body weight. Class-wise analysis revealed superior performance for classes 7 and 9, with R2 values up to 0.98 and RMSE below 1.1 kg, whereas class 8 showed greater error dispersion, associated with higher morphological variability and a smaller number of available samples. These results demonstrate that the direct use of morphometric information extracted from segmented masks in 2D images constitutes a robust, accurate, and low-cost approach for automatic pig body-weight estimation. Moreover, this study is among the few addressing this task specifically during the growing stage, highlighting its potential for future deployment in embedded systems and intelligent monitoring platforms for precision pig farming, although further evaluation of computational efficiency and real-time performance is still required.
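Regressing body weight on segmented dorsal area is a one-feature linear model. A minimal scikit-learn sketch with made-up numbers purely to show the shape of the pipeline; real inputs would be areas extracted from the YOLOv11 masks.

import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical (dorsal area in pixels^2, body weight in kg) pairs
areas = np.array([[31000], [34500], [39000], [45200]])
weights = np.array([42.0, 47.5, 54.0, 63.5])

model = LinearRegression().fit(areas, weights)
print(model.predict([[41000]]))  # estimated weight for a new segmented area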
0 pages, 845 KB  
Proceeding Paper
You Only Look Once-Based Bitter Melon Size Classification Enhanced by Harris Corner Detection and Douglas–Peucker Algorithm
by Julian Marc B. Surara, Charles Ivan Matthew C. Nangit, Analyn N. Yumang and Charmaine C. Paglinawan
Eng. Proc. 2026, 134(1), 85; https://doi.org/10.3390/engproc2026134085 - 27 Apr 2026
Abstract
Accurate size classification remains a persistent challenge for agricultural products with irregular morphology, such as bitter melon (Momordica charantia). Proper grading is essential for fair pricing, efficient packaging, and compliance with the Association of Southeast Asian Nations and Philippine National Standards, yet traditional manual sorting often results in inconsistencies. To address this, we introduce an automated classification framework built on the You Only Look Once Version 8 (YOLOv8) model. The system integrates Harris Corner Detection to enhance feature extraction and the Douglas–Peucker algorithm to simplify contour representations, thereby reducing noise and improving shape analysis. The model was trained on a dataset of Ampalaya images and used to detect and categorize fruit sizes, with evaluation conducted through a confusion matrix. Experimental results showed an overall classification accuracy of 93.75%, demonstrating that the combined approach effectively balances precision with computational efficiency. Beyond improving classification accuracy, the findings highlight the broader potential of combining deep learning and contour-based methods to advance agricultural automation, optimize post-harvest workflows, and strengthen competitiveness in both local and international markets.
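Both classical steps named here have direct OpenCV counterparts. A rough sketch, assuming a single fruit against a plain background; the file name, Harris parameters, and the 1% epsilon factor are guesses, not the paper's settings.

import cv2
import numpy as np

img = cv2.imread("ampalaya.jpg")  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Harris response highlights the warty surface features of the fruit
corners = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
n_strong = int((corners > 0.01 * corners.max()).sum())  # crude corner count

# Douglas-Peucker simplifies the fruit contour before size measurement
mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
outline = max(contours, key=cv2.contourArea)
approx = cv2.approxPolyDP(outline, 0.01 * cv2.arcLength(outline, True), True)
print(n_strong, len(approx))  # simplified vertex count feeds shape analysis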
21 pages, 2785 KB  
Article
Comparative Evaluation of Deep Learning Object Detectors for Embedded Weed Detection on Resource-Constrained Platforms
by Nurtay Albanbay, Yerik Nugman, Mukhagali Sagyntay, Azamat Mustafa, Ramona Blanes, Algazy Zhauyt, Rustem Kaiyrov and Nurgali Nurgozhayev
Technologies 2026, 14(5), 265; https://doi.org/10.3390/technologies14050265 - 27 Apr 2026
Abstract
Computer vision–based weed detection plays a critical role in agricultural robotics, enabling accurate, selective weeding. These systems operate on resource-constrained embedded platforms, which introduces a significant trade-off between accuracy and efficiency. This study presents a comparative evaluation of six detection models (YOLOv11n, YOLOv11s, SSD-Lite, NanoDet, Faster R-CNN, RT-DETR) for agro-robotic applications, measuring precision, recall, mAP@0.5, and runtime on low-power hardware. NanoDet achieved the highest detection accuracy (precision 98.6%, recall 94.2%, mAP@0.5 97.7%). YOLOv11s demonstrated similar performance (mAP@0.5: 96.1%) but required more computation. YOLOv11n provided the most favourable balance between accuracy and throughput (mAP@0.5: 94.6%, 207 FPS on a workstation). On a Raspberry Pi 5, the lightweight models achieved 3–5 FPS, whereas RT-DETR and Faster R-CNN exhibited high latency (3112–6500 ms/frame), which prevents real-time operation. In summary, NanoDet excelled in detection accuracy, while YOLOv11n offers the best balance between accuracy and efficiency on constrained devices.
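Latency figures like these come from timing repeated forward passes. A generic sketch of that measurement for any PyTorch model; the input shape, warm-up count, and iteration count are arbitrary choices, not the study's protocol.

import time
import torch

def benchmark(model, shape=(1, 3, 640, 640), warmup=10, iters=100):
    model.eval()
    x = torch.randn(shape)
    with torch.no_grad():
        for _ in range(warmup):  # warm-up stabilizes caches and clocks
            model(x)
        t0 = time.perf_counter()
        for _ in range(iters):
            model(x)
        dt = (time.perf_counter() - t0) / iters
    return dt * 1000.0, 1.0 / dt  # ms per frame, frames per second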
33 pages, 2418 KB  
Article
Comparative Evaluation of YOLOv11n Neck-Level Modifications for Precast Component and PPE Object Detection in Construction Environments
by Teerapun Saeheaw
Buildings 2026, 16(9), 1728; https://doi.org/10.3390/buildings16091728 - 27 Apr 2026
Abstract
Construction monitoring systems can benefit from automated detection of both structural components and personal protective equipment (PPE). While previous studies have focused on single-task applications using YOLOv5-YOLOv10 architectures, this study presents a systematic comparative evaluation of four neck-level architectural modifications within the YOLOv11n framework: depthwise separable convolutions (DSC) for computational efficiency, multi-scale dilated convolutions (MSDC) for expanded receptive fields, feature refinement interfaces (FRI) for learnable feature adaptation, and dual attention mechanisms (DAM) for enhanced feature discriminability. Controlled experiments were conducted on precast component (3771 images, 6 classes) and PPE (5201 images, 3 classes) datasets. DAM-YOLO achieved the highest performance, with mAP@50 of 0.972 (precast) and 0.968 (PPE), while performance across all variants spanned 0.942–0.972 (precast) and 0.936–0.968 (PPE). All variants demonstrated robust detection capabilities, with mAP@50 ≥ 0.936 and mAP@50–95 spanning 0.760–0.825. The narrow performance differences indicate that the different neck-level modifications yield comparable detection accuracy, providing empirical evidence to support architecture selection within the two construction object detection scenarios evaluated.
(This article belongs to the Section Construction Management, and Computers & Digitization)
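Of the four variants compared, the depthwise separable convolution is the simplest to illustrate: a per-channel spatial filter followed by a 1x1 pointwise mix. A minimal PyTorch sketch of the primitive, not the paper's exact neck module; the SiLU activation is an assumption in keeping with YOLO conventions.

import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        # depthwise: one spatial filter per input channel (groups=c_in)
        self.dw = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in, bias=False)
        # pointwise: 1x1 convolution mixes channels at much lower cost
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))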
36 pages, 9864 KB  
Article
Orchard-YOLO: A Robust Deep Learning Framework for Fruit Detection under Complex Optical and Environmental Degradation
by Yichen Wang, Hongjun Tian, Yuhan Zhou, Yang Xiong, Yichen Li, Manlin Wang, Yijie Yin, Xiaoyin Guo, Jiani Wu, Jiesen Zhang, Ying Tang and Shuai Huang
Photonics 2026, 13(5), 429; https://doi.org/10.3390/photonics13050429 - 27 Apr 2026
Abstract
Accurate target perception in unstructured outdoor environments remains a fundamental challenge in computational imaging and machine vision, primarily due to severe optical degradation caused by variable illumination, specular highlights, and dense foliage occlusion. Existing optical sensing systems often struggle to maintain robustness under these physical constraints, especially when deployed on edge devices with strict computational limits. To address these challenges, this paper proposes Orchard-YOLO, a lightweight, computationally efficient object detection network designed to maintain robustness against environmental and optical noise in complex orchard environments. Unlike generic architectures, Orchard-YOLO introduces three architectural enhancements for robust detection: (1) a High-Resolution P2 Detection Head to preserve high-frequency optical details and fine-grained texture cues often lost during digital downsampling; (2) Coordinate Attention (CA) mechanisms integrated into the feature fusion pathway to filter out background optical interference and enhance spatial discrimination for heavily occluded targets; and (3) a Ghost-convolution-based backbone to optimize the inference pipeline for real-time edge processing. Evaluated on a comprehensive multi-fruit dataset under simulated optical stress (including ±50% illumination variation and up to 70% occlusion), Orchard-YOLO achieves 94.8% mAP@0.5. It shows improved robustness under illumination variation and occlusion compared to baseline models, while achieving up to 25 FPS on an NVIDIA Jetson Nano edge device. These results suggest that Orchard-YOLO offers a detection framework suitable for resource-constrained orchard perception.
(This article belongs to the Special Issue Computational Imaging: Photonics and Optical Applications)
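The Ghost-convolution backbone mentioned here follows the GhostNet idea: generate half the feature maps with an ordinary convolution and derive the rest with cheap depthwise operations. A minimal sketch of that primitive, assuming an even channel split; not the authors' implementation.

import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        c_half = c_out // 2  # assumes c_out is even
        # primary convolution produces half of the output channels
        self.primary = nn.Conv2d(c_in, c_half, k, padding=k // 2, bias=False)
        # "ghost" maps: cheap 3x3 depthwise features derived from the primary ones
        self.cheap = nn.Conv2d(c_half, c_half, 3, padding=1, groups=c_half, bias=False)

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)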
15 pages, 4149 KB  
Article
LRNet: A Lightweight Detection Model for Foreign Objects on Coal Mine Conveyor
by Lili Xu, Youli Yao and Airan Zhang
Electronics 2026, 15(9), 1848; https://doi.org/10.3390/electronics15091848 - 27 Apr 2026
Abstract
Foreign object detection on coal mine conveyor belts is critical to belt-based coal transportation. Existing detection models for this task carry large parameter counts, consume substantial computing resources, and cover few types of foreign objects; to address this, the original YOLOv13 object detection algorithm is optimized for a lightweight, high-precision design. Specifically, a lightweight YOLO network named LRNet is proposed, based on the original YOLOv13 and tailored for foreign object detection on coal mine conveyor belts. First, the lightweight ShuffleNetv2 is used as the backbone network of YOLOv13 to reduce computational cost and parameter count and to improve network parallelism. Second, the Bidirectional Feature Pyramid Network (BiFPN) serves as the feature fusion network to effectively fuse global deep and shallow key detail information. Finally, a Coordinate Attention (CA) mechanism is added to strengthen the extraction of key features and the attention paid to foreign object targets, improving detection accuracy. Experimental results show that LRNet reaches an average detection accuracy of 91.0% with only 3.6 M parameters. The proposed method can quickly and accurately detect foreign objects on coal mine conveyor belts with limited computational resources while showing strong adaptability and interference resistance, demonstrating the effectiveness and advantages of the LRNet model.
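BiFPN's characteristic operation is fast normalized fusion: each input feature map gets a learnable non-negative weight, normalized by the sum of all weights. A minimal sketch of that step, assuming the inputs have already been resized to a common shape; this is the generic BiFPN mechanism, not LRNet's code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FastNormalizedFusion(nn.Module):
    def __init__(self, n_inputs=2, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))  # one weight per input map
        self.eps = eps

    def forward(self, feats):
        w = F.relu(self.w)            # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)  # normalize so they sum to roughly 1
        return sum(wi * f for wi, f in zip(w, feats))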
20 pages, 5788 KB  
Article
YOLO-ESO: A Lightweight YOLOv10-Based Model for Individual Pig Identification in Complex Farming Environments
by Juanhua Zhu, Lele Song, Tong Fu, Yan Wang, Miao Wang and Ang Wu
Information 2026, 17(5), 421; https://doi.org/10.3390/info17050421 - 27 Apr 2026
Abstract
In intensive farming, contactless individual pig identification is crucial for precision feeding and health monitoring. However, real-world barn conditions—such as fluctuating illumination, severe occlusions, non-rigid poses, and high inter-individual similarity—pose significant challenges, and existing models struggle to balance high accuracy with lightweight deployment. To address this, we propose YOLO-ESO, an optimized detection framework based on YOLOv10n. YOLO-ESO introduces three core innovations: (1) integrating the C2f_ODConv module into the backbone to strengthen feature learning under complex poses via dynamic convolution; (2) redesigning the neck with a Semantics and Detail Infusion (SDI) module to improve multi-scale fusion while suppressing background noise; and (3) embedding an Efficient Multi-Scale Attention (EMA) mechanism before the detection head to capture fine-grained identity cues such as texture and contours. Evaluated on a real-world pig dataset, YOLO-ESO achieves an mAP@0.5 of 96.6%, an mAP@0.5:0.95 of 71.1%, and an F1 of 92.0%, surpassing state-of-the-art detectors including YOLOv8, YOLOv11, and RT-DETR while requiring only 8.7 GFLOPs and 3.48 million parameters. Overall, the proposed YOLO-ESO provides an accurate and lightweight solution for robust individual pig identification in complex farming environments, showing strong potential for practical deployment in precision livestock farming.
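GFLOP and parameter counts such as these are the standard complexity metrics for lightweight detectors. A quick sketch of computing them for any PyTorch model with the thop profiler, which is one common choice (an assumption; the authors do not name their tool).

import torch
from thop import profile  # pip install thop

def complexity(model, size=640):
    x = torch.randn(1, 3, size, size)
    macs, params = profile(model, inputs=(x,))
    # thop reports multiply-accumulates; these are often quoted as FLOPs
    return macs / 1e9, params / 1e6  # GFLOPs-scale figure, millions of parameters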
26 pages, 13053 KB  
Article
GLAFC-YOLO: Multimodal Object Detection of Personnel for Indoor Fire Rescue in Smoke-Obscured Environments
by Chengyao Hou and Pingshan Liu
Fire 2026, 9(5), 182; https://doi.org/10.3390/fire9050182 - 27 Apr 2026
Abstract
Reliable detection of personnel is critical for situational awareness and life-saving interventions during indoor fire rescue operations, where dense smoke rapidly obscures visibility and compromises conventional vision systems. Visible-light cameras fail under such conditions due to severe Mie scattering, while thermal infrared (TIR) imaging—though capable of penetrating smoke—often lacks the fine-grained texture needed to distinguish human forms from background clutter. Furthermore, practical deployment of multimodal sensors is hindered by spatial misalignment between modalities, which degrades fusion efficacy and detection accuracy. To address these challenges, this paper proposes GLAFC-YOLO (Global-Local Alignment and Frequency-aware Cross-attention Fusion), a dual-stream multimodal detection framework specifically designed for personnel localization in smoke-obscured indoor fires. GLAFC-YOLO fuses near-infrared (NIR) and TIR imagery through three novel components: (1) a Global-Local Feature Alignment Subnet (GL-FAS) that rectifies geometric misalignment across modalities; (2) a Modality-Adaptive Frequency Channel Attention (MA-FCA) module that enhances complementary smoke-penetrating thermal signatures and NIR texture cues in the frequency domain; and (3) a Confidence-Aware Transposed Cross-Attention (CA-TCA) mechanism that suppresses smoke-induced artifacts and restores weakened human-centric features. Evaluated on a newly collected multimodal dataset of indoor fire scenarios with annotated personnel, GLAFC-YOLO achieves substantial improvements over the baseline YOLOv11 architecture: Recall gains of 43.2% and 0.5% over the unimodal NIR and TIR baselines, respectively, along with gains of 37.4% and 3.9% in mAP@50 and of 17.3% and 17.0% in mAP@50:95. Experimental results indicate that GLAFC-YOLO outperforms competitive models and reduces personnel miss rates, demonstrating its robustness and readiness for real-world fire-rescue assistance.
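Frequency-domain channel attention of the kind MA-FCA builds on typically summarizes each channel by its spectral content and uses that summary to gate the channels. A loose sketch of the general pattern, not the paper's MA-FCA module; the pooling choice and reduction ratio are assumptions.

import torch
import torch.nn as nn

class FrequencyChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        mag = torch.fft.fft2(x).abs()          # per-channel frequency magnitudes
        desc = mag.mean(dim=(2, 3))            # (B, C) spectral descriptor
        gate = self.fc(desc)[..., None, None]  # channel gates in (0, 1)
        return x * gate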
20 pages, 6122 KB  
Article
Automated Detection and Classification of Lunar Linear Tectonic Features Using a Deep Learning Method
by Xiaoyang Liu, Yang Luo, Jianhui Wang, Denggao Qiu, Jianguo Yan, Wensong Zhang and Yaowen Luo
Remote Sens. 2026, 18(9), 1330; https://doi.org/10.3390/rs18091330 - 26 Apr 2026
Abstract
On the lunar surface, wrinkle ridges, grabens, and lobate scarps represent key tectonic landforms that reflect the evolution of the Moon’s stress field and its tectonic processes. However, these linear structures often exhibit weak textures, low contrast, and large variations in scale, making manual interpretation inefficient and subjective. To address this issue, this study introduces an improved YOLOv8 model, termed HL-YOLOv8, for the automated detection of lunar linear features. The model incorporates a multiscale lightweight channel attention (C2f_MLCA) module into the backbone network to enhance the extraction of fine-grained and weak-texture features and integrates a multihead self-attention (C2f_MHSA) module in the feature fusion stage to improve the modelling of long-range spatial dependencies. In addition, the combination of a dual focal loss and a diversified data augmentation strategy effectively mitigates the detection difficulties caused by class imbalance and weak-feature samples. Experimental results obtained on the global LROC-WAC image dataset demonstrate that HL-YOLOv8 significantly outperforms the baseline YOLOv8 and other comparative models in precision, recall, and mAP@0.5. Specifically, the proposed model achieved an average precision of 73.5%, an average recall of 73.1%, and an average mAP@0.5 of 74.6% on the evaluation dataset, showing particularly strong performance in detecting elongated grabens and boundary-blurred lobate scarps. Global distribution maps derived from the model predictions indicate that HL-YOLOv8 can comprehensively reconstruct the spatial patterns of the three types of linear structures and identify potential new features in high-latitude and geologically complex regions, demonstrating excellent generalizability and robustness. This study provides an efficient and reliable framework for the automated identification and global mapping of lunar linear features and offers a transferable methodological reference for the tectonic interpretation of terrestrial planets.
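The dual focal loss cited for handling class imbalance extends the standard focal loss, which scales cross-entropy by (1 - p_t)^gamma so that easy examples contribute less. A sketch of the standard binary form for reference; the paper's dual variant is not reproduced here.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # targets: 0/1 tensor shaped like logits
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)          # prob of the true class
    a_t = alpha * targets + (1 - alpha) * (1 - targets)  # class balancing weight
    return (a_t * (1 - p_t) ** gamma * ce).mean()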
23 pages, 5919 KB  
Article
Backbone and Feature Fusion Design for YOLOv8-Based Bacterial Microcolony Detection in Microscopy Images
by Malek Rababa, Anas AlSobeh, Namariq Dhahir and Amer AbuGhazaleh
Appl. Sci. 2026, 16(9), 4241; https://doi.org/10.3390/app16094241 - 26 Apr 2026
Abstract
Foodborne bacterial contamination creates significant public health and economic challenges. In the United States, the CDC estimates that foodborne disease causes approximately 48 million illnesses and 3000 deaths annually. Rapid screening is important because conventional confirmation methods are time- and labor-intensive. Microscopy-based analysis of early bacterial microcolonies can enable detection within hours rather than days, yet manual inspection is slow, subjective, and impractical at scale. Although deep learning object detectors such as YOLO offer a promising solution, the impact of architectural design choices on microscopy-based bacterial detection has not been systematically characterized under controlled conditions. In this work, we conducted a controlled architectural evaluation of YOLOv8 for detecting bacterial microcolonies in high-resolution microscopy images. We replaced the CSP-Darknet backbone with EfficientNetV2 variants and evaluated three feature fusion designs: no neck, the original PAN-FPN neck, and a NAS-FPN-inspired neck. All experiments were performed under identical conditions on a two-class dataset of Salmonella and E. coli. Our results show that the EfficientNetV2 architectures consistently outperform the YOLOv8x baseline, which achieved 0.891 precision, 0.867 recall, and 0.898 mAP@50. The best overall performance was obtained with EfficientNetV2-S and the original YOLOv8 neck, reaching 0.976 precision, 0.968 recall, and 0.987 mAP@50, with a comparable 0.986 mAP@50 achieved by EfficientNetV2-S + NAS-FPN. The highest precision, 0.978, was obtained with EfficientNetV2-L + NAS-FPN. These findings demonstrate that effective bacterial detection depends on the interaction between backbone capacity and feature fusion design rather than backbone scaling alone.
(This article belongs to the Special Issue Innovative Computer Vision and Deep Learning Applications)
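Swapping a YOLO backbone for EfficientNetV2 amounts to extracting multi-scale feature maps for the neck to fuse. A sketch using the timm library's features_only mode to pull the three pyramid levels a YOLO-style neck expects; the model name and indices are illustrative, not the paper's configuration.

import timm
import torch

# EfficientNetV2-S as a multi-scale feature extractor (strides 8, 16, 32)
backbone = timm.create_model("tf_efficientnetv2_s", pretrained=True,
                             features_only=True, out_indices=(2, 3, 4))
feats = backbone(torch.randn(1, 3, 640, 640))
print([f.shape for f in feats])  # three pyramid levels for the neck to fuse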
30 pages, 6414 KB  
Article
Research on Distracted and Fatigue-Related Driving Behavior Detection Based on YOLOv12-LAD
by Xiyao Liu, Zhiwei Guan, Qiang Chen and Yi Ren
Electronics 2026, 15(9), 1838; https://doi.org/10.3390/electronics15091838 - 26 Apr 2026
Abstract
Distracted and fatigue-related driving behaviors are major causes of road traffic accidents, creating an urgent need for reliable driver monitoring systems. Vision-based detection methods have garnered widespread attention due to their low deployment cost and practical applicability. However, existing lightweight models often suffer from limited global contextual perception and insufficient preservation of fine details. Motivated by these challenges, this study introduces an improved distracted and fatigue-related driving behavior detection model, YOLOv12-LAD, built on the YOLOv12 architecture. The proposed framework integrates a Large Separable Kernel Attention (LSKA) module to enhance global contextual perception, an Adaptive Downsampling (ADown) module to mitigate information loss during feature compression, and a Dynamic Sampling (DySample) module to enable content-adaptive feature reconstruction and improve multi-scale behavior representation. Experimental results show that YOLOv12-LAD achieved 97.5% precision, 96.3% recall, and 98.4% mAP@50 with only 2.5 million parameters, 6.2 GFLOPs, and an inference speed of 249 FPS. Ablation studies, comparisons with representative models, cross-dataset evaluation, and real-vehicle tests further verify the effectiveness and robustness of the proposed method, which maintains computational efficiency suitable for real-time vision-based driver monitoring applications.
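Large separable kernel attention of the kind LSKA popularized approximates an expensive k-by-k depthwise convolution with cascaded 1-by-k and k-by-1 depthwise strips, then uses the result as a spatial gate. A loose sketch of that decomposition, not the exact module in YOLOv12-LAD; the kernel size is an assumption.

import torch.nn as nn

class LargeSeparableKernelAttention(nn.Module):
    def __init__(self, c, k=11):
        super().__init__()
        # horizontal + vertical depthwise strips emulate a k-by-k kernel cheaply
        self.h = nn.Conv2d(c, c, (1, k), padding=(0, k // 2), groups=c)
        self.v = nn.Conv2d(c, c, (k, 1), padding=(k // 2, 0), groups=c)
        self.proj = nn.Conv2d(c, c, 1)

    def forward(self, x):
        attn = self.proj(self.v(self.h(x)))  # large-receptive-field gate
        return x * attn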