Search Results (6,115)

Search Parameters:
Keywords = YOLOv5

16 pages, 13549 KB  
Article
YOLO-ALD: An Efficient and Robust Lightweight Model for Apple Leaf Disease Detection in Complex Orchard Environments
by Lei Liu, Yinyin Li, Qingyu Liu, Huihui Sun, Yeguo Sun and Xiaobo Shen
Horticulturae 2026, 12(5), 550; https://doi.org/10.3390/horticulturae12050550 (registering DOI) - 30 Apr 2026
Abstract
Real-time detection of apple leaf diseases in orchard environments faces ongoing challenges, particularly in preserving fine-grained disease features with limited computing resources. To address these issues, we propose a high-precision lightweight model based on YOLOv10n, called YOLO-ALD. First, we introduce Spatial and Channel Reconstruction Convolution into deeper backbone networks to replace standard downsampling layers and convolutions. This suppresses spatial and channel redundancy caused by environmental noise and optimizes feature representation. Second, we design a new C2f-Faster-SimAM module for the neck network. This module combines the inference efficiency of FasterNet with a parameter-free 3D attention mechanism to adaptively focus on early lesions, effectively distinguishing them from leaf veins without increasing model complexity. Third, in the detection head, we use the Focaler-ShapeIoU loss function to optimize bounding box regression. It utilizes a dynamic focusing mechanism and geometric constraints to ensure the localization accuracy of irregular shapes and hard-to-detect samples. Experimental results on our self-built dataset covering four specific diseases and healthy leaves showed that, compared with YOLOv10n, the mAP@0.5 of YOLO-ALD reached 92.1%, an increase of 2.1 percentage points. In addition, the model has an inference speed of 105 FPS, with only 2.1 M parameters and 5.6 GFLOPs. Therefore, YOLO-ALD achieves a good balance between efficiency and robustness, showing strong potential for resource-constrained mobile agricultural diagnosis.
(This article belongs to the Special Issue Emerging Technologies in Smart Agriculture)
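The parameter-free 3D attention in the C2f-Faster-SimAM module is SimAM-style: each activation is weighted by an inverse-energy term derived from its deviation from the channel mean, so no learnable parameters are added. A minimal PyTorch sketch of that mechanism (not the authors' released code):

```python
import torch

def simam(x: torch.Tensor, e_lambda: float = 1e-4) -> torch.Tensor:
    """Parameter-free SimAM attention: weight each activation by an
    inverse-energy term computed from feature-map statistics alone."""
    n = x.shape[2] * x.shape[3] - 1                      # spatial size - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)    # squared deviation
    v = d.sum(dim=(2, 3), keepdim=True) / n              # channel variance
    e_inv = d / (4 * (v + e_lambda)) + 0.5               # inverse energy
    return x * torch.sigmoid(e_inv)                      # 3D attention mask
```

Because the weights come from the feature map itself, the module costs only a few elementwise ops, which is why it pairs well with a lightweight FasterNet-based block.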

24 pages, 4665 KB  
Article
Human Fall Detection with Infrared Imaging: A Comparison of Graph Convolutional Networks and YOLO
by Karol Perliński, Artur Faltyński and Aleksandra Świetlicka
Sensors 2026, 26(9), 2794; https://doi.org/10.3390/s26092794 (registering DOI) - 30 Apr 2026
Abstract
This paper presents a comparative study of two artificial intelligence approaches—graph convolutional networks (GCNs) and the YOLO object detection algorithm—for analyzing human fall events using infrared imaging. From the AI perspective, the study introduces a GCN model that achieves over 99% classification accuracy by modeling 2D and 3D skeletal data as graph structures and evaluates the real-time detection capabilities of YOLOv8 on infrared video frames. On the engineering side, the research addresses practical challenges in elderly care and healthcare monitoring systems by demonstrating how these AI methods can accurately detect and classify fall directions under infrared conditions. The results highlight each model’s strengths and propose a hybrid framework combining YOLO’s spatial localization with GCN’s motion-pattern analysis for future real-world applications.
(This article belongs to the Section Sensing and Imaging)
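For the GCN side of the comparison, fall classification from skeletal data reduces to message passing over the joint graph. A minimal sketch of one Kipf–Welling graph-convolution layer; the 17-joint COCO-style skeleton and its edges are illustrative assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One Kipf-Welling graph convolution: H' = D^-1/2 (A+I) D^-1/2 H W."""
    def __init__(self, in_feats: int, out_feats: int, adj: torch.Tensor):
        super().__init__()
        a_hat = adj + torch.eye(adj.size(0))          # add self-loops
        d = a_hat.sum(dim=1).pow(-0.5)                # D^-1/2
        self.register_buffer("a_norm", d[:, None] * a_hat * d[None, :])
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, joints, features); propagate along skeleton edges
        return torch.relu(self.linear(self.a_norm @ h))

# Example: adj = torch.zeros(17, 17) with adj[i, j] = 1 for each bone (i, j)
```

Stacking a few such layers over per-frame joint coordinates, then pooling, gives the kind of graph classifier the paper compares against YOLO.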

20 pages, 12707 KB  
Article
SWUAV-DANet: A Severe-Weather UAV Dataset and Dynamic AlignAir Network for Robust Aerial Vehicle Detection
by Longze Zhang and Yihong Li
Sensors 2026, 26(9), 2793; https://doi.org/10.3390/s26092793 (registering DOI) - 30 Apr 2026
Abstract
Unmanned aerial vehicle (UAV) aerial object detection is increasingly important for traffic monitoring, emergency rescue, and environmental perception. However, vehicle detection in heavy rain, dense fog, blizzards, and backlit night scenes suffers from target information loss, feature misalignment, and unstable performance. We, therefore, construct a new severe-weather UAV dataset, Severe-Weather UAV (SWUAV), and propose the real-time Dynamic AlignAir Network (DANet). SWUAV contains 18,195 red–green–blue (RGB) aerial images covering 12 adverse weather/illumination conditions with 236,392 vehicle instances. After the high-resolution backbone features, we insert a cross-scale adaptive alignment module that performs adaptive channel calibration, contrastive self-attention, and geometric/semantic remapping to reduce scale drift/mismatch, suppress noise, and strengthen degraded target cues; we then design a dynamic adaptive alignment head (DAAH) with a shared encoder and a deformable regression branch to mitigate classification–regression mismatch under adverse conditions while further reducing complexity. On SWUAV, DANet raises the YOLOv11-s baseline average precision (AP)/AP50 (AP at intersection over union, IoU = 0.50) from 43.9%/62.6% to 46.9%/64.8%, with only 8.65 M parameters, 22.7 giga floating-point operations (GFLOPs), and a 323.47 frames-per-second (FPS) end-to-end throughput (3.09 ms per image at batch size 16), outperforming EdgeYOLO-s and RT-DETR. The dataset and code are publicly available.
(This article belongs to the Section Vehicular Sensing)
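The throughput figures (323.47 FPS end-to-end, 3.09 ms per image at batch size 16) follow from per-image latency = batch time / batch size. A sketch of how such a GPU measurement is typically taken; the model handle, image size, and iteration counts are placeholders:

```python
import time
import torch

@torch.no_grad()
def benchmark(model, batch_size=16, img_size=640, iters=100, warmup=20):
    """Measure end-to-end throughput (FPS) and per-image latency (ms)
    for any torch module already placed on the GPU."""
    x = torch.randn(batch_size, 3, img_size, img_size, device="cuda")
    for _ in range(warmup):            # warm-up: stabilise clocks/caches
        model(x)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()           # wait for all kernels to finish
    per_image_s = (time.perf_counter() - t0) / (iters * batch_size)
    return 1.0 / per_image_s, per_image_s * 1e3   # FPS, ms per image
```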

18 pages, 2135 KB  
Article
A Non-Destructive Early Sex Identification Method for Chicken Embryos Based on Improved MobileViT-V3
by Qian Yan, Chengyu Yu, Zhoushi Tan, Zesheng Wang and Qiaohua Wang
Animals 2026, 16(9), 1377; https://doi.org/10.3390/ani16091377 - 30 Apr 2026
Abstract
The global poultry hatching industry faces severe challenges of resource waste and animal ethics issues due to the routine culling of day-old male chicks. Meanwhile, early sex identification of 4-day-incubated chicken embryos is limited by low accuracy, as embryos at this stage have weak, low-contrast blood vessels that are highly susceptible to interference from the eggshell’s texture. To address these issues, this paper proposes a non-destructive early sex identification method for chicken embryos based on an improved MobileViT-V3 model. Taking the lightweight hybrid architecture MobileViT-V3 as the backbone, we embedded a Micro Feature Enhancement module (MFE-Module) in Stage 3 to strengthen the extraction of fine vascular details, and a Multi-Scale Adaptive Attention Fusion module (MSAAF-Module) in Stage 4 to realize adaptive weighted screening of multi-source features. Experiments on the self-constructed dataset of 4-day-incubated embryos show that the improved model achieves a test set classification accuracy of 92.26%, with an F1-score of 92.15%, a recall rate of 92.12%, and a Kappa coefficient of 0.845. It outperforms mainstream models such as YOLOv12, ShuffleNetV2, ConvNeXt-T, ResNet, and Swin-ViT, with only 2.98 M parameters and an inference speed of 97.6 FPS, well exceeding the 30 FPS real-time requirement of industrial sorting lines and showing high potential for practical industrial deployment. This method provides a new scheme for non-destructive, high-precision, and high-efficiency early sex identification in poultry hatching.
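The Kappa coefficient reported alongside accuracy corrects for chance agreement. A short sketch of its computation; note that for a roughly balanced two-class task such as sex identification, 92.26% accuracy is consistent with the reported 0.845:

```python
import numpy as np

def cohens_kappa(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Kappa = (p_o - p_e) / (1 - p_e): accuracy corrected for chance."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    p_o = np.mean(y_true == y_pred)                  # observed agreement
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    return (p_o - p_e) / (1 - p_e)

# Balanced binary case: (0.9226 - 0.5) / (1 - 0.5) ~= 0.845,
# matching the reported Kappa for 92.26% accuracy.
```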

14 pages, 3627 KB  
Article
Efficient YOLOv11 with a FasterNet Backbone and Attention for Multi-Class Underwater Object Detection in Nearshore Waters
by Yinghao He, Wenjie Yin, Ruomiao Song, Siyi Zhou, Shimin Shan and Shuo Liu
J. Mar. Sci. Eng. 2026, 14(9), 827; https://doi.org/10.3390/jmse14090827 - 29 Apr 2026
Abstract
Underwater multi-class object detection in nearshore waters is essential for intelligent cleaning operations and ecological monitoring. However, strong reflection and scattering interference, color attenuation, frequent occlusion, and non-rigid deformation often cause fine-grained information loss and feature misalignment in conventional detectors, leading to missed and false detections. To address these challenges, we propose an enhanced YOLOv11 framework integrating FasterNet and attention mechanisms. Specifically, we adopt FasterNet in place of the YOLOv11 baseline backbone, improving fine-grained feature preservation while reducing computational redundancy. Furthermore, a Deformable Underwater Attention Module (DUAM) is introduced to capture local texture variations and deformation-aware features, enhancing discrimination among heterogeneous categories. Additionally, a Submerged Occlusion-Aware Head (SOAH) is designed to recalibrate features based on occlusion visibility, improving the detection of small-scale and partially occluded objects in the high-resolution P2 layer. Performance gains mainly stem from the recalibration strategy and its synergy with multi-scale optimization objectives. Experiments on a nearshore underwater multi-class dataset (8610 images across 40 classes) show that the proposed method increases mAP from 66.9% to 82.3%, a 15.4-percentage-point improvement over the YOLOv11 baseline, with superior robustness under complex backgrounds.
(This article belongs to the Special Issue Assessment and Monitoring of Coastal Water Quality)
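FasterNet's efficiency comes from partial convolution (PConv), which convolves only a fraction of the channels and forwards the rest untouched, cutting both FLOPs and memory access. A minimal PyTorch sketch of the block (the 1/4 ratio is the value commonly used, stated here as an assumption):

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """FasterNet-style partial convolution: convolve only the first
    channels/ratio channels; the remaining channels pass through."""
    def __init__(self, channels: int, ratio: int = 4):
        super().__init__()
        self.c_conv = channels // ratio
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.c_conv, x.size(1) - self.c_conv], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)
```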
34 pages, 36077 KB  
Article
Modular Multi-Attribute Vehicle Analysis by Color, License Plate, Make and Sub-Model Using YOLO and OCR: A Benchmark Across YOLO Versions
by Cristian Japhet Islas-Yañez, Viridiana Hernández-Herrera and Moisés Márquez-Olivera
Sensors 2026, 26(9), 2785; https://doi.org/10.3390/s26092785 - 29 Apr 2026
Abstract
We present a modular multi-attribute vehicle analysis pipeline that integrates YOLO-based models and an OCR engine into a single workflow. The system detects vehicles, classifies color, recognizes make and sub-model, detects license plates, and extracts plate characters to generate a structured vehicle record. Vehicle detection is reported with standard metrics (precision, recall, and mAP@0.5), while license plate detection is reported at IoU = 0.3 to reflect the small-object nature of plates and downstream OCR usability. Among the evaluated versions, YOLOv8 provides the most balanced overall performance across modules, while maintaining real-time-equivalent throughput of approximately 18–22 FPS for the full pipeline on recorded traffic videos, depending on scene complexity. We emphasize module-level evaluation and runtime benchmarking; instance-level end-to-end identification across unique vehicles is defined as future work once track-based ground truth becomes available.
(This article belongs to the Topic Deep Visual Recognition: Methods, and Applications)
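The modular design chains independently trained detectors and an OCR engine. A rough sketch of such a pipeline using the ultralytics API and EasyOCR; the weight filenames, the choice of EasyOCR, and the single-plate-per-vehicle simplification are all illustrative assumptions, not the paper's implementation:

```python
from ultralytics import YOLO
import easyocr

# Hypothetical weight files -- the paper's trained modules are not public
vehicle_det = YOLO("vehicle_det.pt")      # vehicle detector
plate_det = YOLO("plate_det.pt")          # plate detector (eval at IoU=0.3)
make_cls = YOLO("make_submodel_cls.pt")   # make / sub-model classifier
reader = easyocr.Reader(["en"])           # OCR engine (assumed)

def analyze(frame):
    """Return one structured record per detected vehicle in a BGR frame."""
    records = []
    for box in vehicle_det(frame)[0].boxes.xyxy.int().tolist():
        x1, y1, x2, y2 = box
        crop = frame[y1:y2, x1:x2]
        record = {"make_id": make_cls(crop)[0].probs.top1}
        plates = plate_det(crop)[0].boxes.xyxy.int().tolist()
        if plates:                         # read the first plate found
            px1, py1, px2, py2 = plates[0]
            record["plate"] = "".join(
                t for _, t, _ in reader.readtext(crop[py1:py2, px1:px2]))
        records.append(record)
    return records
```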

26 pages, 54080 KB  
Article
MPES-YOLO: A Multi-Scale Lightweight Framework with Selective Edge Enhancement for Loess Landslide Detection
by Hanyu Cheng, Jiali Su, Jiangbo Xi, Haixing Shang, Zhen Zhang, Bingkun Wang and Pan Li
Remote Sens. 2026, 18(9), 1374; https://doi.org/10.3390/rs18091374 - 29 Apr 2026
Abstract
Loess landslides in northwestern China are highly unstable and difficult to distinguish due to sparse vegetation and their spectral and morphological similarity to the surrounding terrain. These landslides demonstrate considerable diversity in manifestation, encompassing shallow translational slides, small-scale features, partially obscured formations, and instances with irregular or poorly defined boundaries. To address the above issues, we propose MPES-YOLO, a multi-scale lightweight YOLO-based framework with selective edge enhancement to detect loess landslides. This model is based on the YOLOv8 architecture and incorporates a multi-scale partial convolution and exponential moving average (MPCE) module to improve multi-scale feature representation while reducing computational cost and enhancing small-target sensitivity. Additionally, to address ambiguous boundaries, a selective edge enhancement (SEE) module is introduced to extract authentic object edges from original images and inject them into key training layers, improving boundary perception. Finally, SIoU is adopted to improve geometric consistency for irregular landslide boundary localization. We first verified the detection performance of MPES-YOLO on the publicly available Bijie landslide dataset and then conducted an experimental study on the loess landslides of Yan’an City, Shaanxi Province, where the mAP@0.5 reached 91.9% and the parameter count was reduced by 23.3% compared with the baseline model. A generalization experiment on landslides in the Ningxia region yielded an mAP@0.5 of 97.4%. The results show that MPES-YOLO achieves a strong balance between detection accuracy and computational efficiency, providing an effective and scalable solution for automated loess landslide detection and geological disaster early warning.
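The selective edge enhancement idea, extracting authentic object edges from the original image and injecting them alongside the network's features, can be illustrated with a plain Sobel-magnitude stand-in; the paper's SEE module is more selective than this sketch:

```python
import cv2
import numpy as np

def edge_channel(image_bgr: np.ndarray) -> np.ndarray:
    """Normalized edge map to inject as an extra feature channel.
    A simple Sobel-magnitude stand-in for the paper's SEE module."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    mag = cv2.magnitude(gx, gy)
    return (mag / (mag.max() + 1e-6)).astype(np.float32)  # scaled to [0, 1]

# img = cv2.imread("landslide_tile.png")              # hypothetical tile
# stacked = np.dstack([img.astype(np.float32) / 255, edge_channel(img)])
```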

20 pages, 5162 KB  
Article
Toward Intelligent Emergency Triage: A Feasibility Study of Real-Time Facial Expression-Based Chest Pain Intensity Assessment
by Yu-Tse Tsan, Rita Wiryasaputra, Yi-Jun Hsieh, Qi-Xiang Zhang, Hsing-Hung Liu and Chao-Tung Yang
Diagnostics 2026, 16(9), 1346; https://doi.org/10.3390/diagnostics16091346 - 29 Apr 2026
Abstract
Objectives: Ensuring effective triage for patients with chest pain in emergency settings is critical, but it can often be challenging, particularly when patients wear face masks or are unable to clearly communicate their pain. To address this limitation, this study presents a real-time facial expression–based system for chest pain intensity assessment as an initial step toward realizing intelligent emergency triage. The proposed system integrates deep learning with real-time video analysis to provide objective and rapid pain level recognition. Methods: A YOLOv12-based facial expression recognition model was trained using annotated facial images of patients experiencing chest pain, and the model categorizes pain into three intensity levels: no pain, slight pain, and moderate to severe pain. Multiple YOLOv12 variants were systematically evaluated to identify an optimal configuration for potential clinical use. The developed system supports two operational modes: real-time recognition, which analyzes continuous video streams and delivers immediate visual feedback through an interactive interface, and a manual upload mode for offline video analysis, review of results, and playback. Additional usability features, including error prompts and data reset functions, were implemented to enhance system stability and user experience. Results: Among the evaluated models, the YOLOv12-L model achieved the best performance with an accuracy of 98.81%, sensitivity of 98.76%, specificity of 98.79%, precision of 98.04%, and an F1-score of 98.41%, demonstrating stable and accurate recognition. The proposed system is designed to support the triage process of assessing patients with chest pain, particularly in cases where patients wear masks or cannot clearly express their pain. By providing real-time and objective pain intensity assessment, the system shows potential to assist healthcare professionals in identifying patients who may require priority attention and to serve as a supportive tool for emergency triage workflows. Conclusions: Future work will incorporate edge computing with a lightweight model to enable real-time pain assessment in ambulances, facilitating faster intervention and treatment.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
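Sensitivity, specificity, and precision for a three-level pain scale are normally derived one-vs-rest from the confusion matrix and macro-averaged across classes, which is presumably how figures like those above are obtained. A short sketch:

```python
import numpy as np

def class_metrics(cm: np.ndarray, k: int):
    """Per-class (sensitivity, specificity, precision) from a confusion
    matrix cm, where cm[i, j] counts true class i predicted as class j."""
    tp = cm[k, k]
    fn = cm[k].sum() - tp          # class-k samples predicted elsewhere
    fp = cm[:, k].sum() - tp       # other samples predicted as class k
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp)

# Classes 0..2: no pain, slight pain, moderate-to-severe pain;
# macro scores average class_metrics(cm, k) over k = 0, 1, 2.
```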

26 pages, 4074 KB  
Article
Early Diagnosis of Blood Disorders via Enhanced Image Preprocessing and Deep Learning Modeling
by Alpamis Kutlimuratov, Dilshod Eshmurodov, Fotima Tulaganova, Akhmet Utegenov, Piratdin Allayarov, Jamshid Khamzaev, Islambek Saymanov and Fazliddin Makhmudov
BioMedInformatics 2026, 6(3), 25; https://doi.org/10.3390/biomedinformatics6030025 - 29 Apr 2026
Abstract
Background: Accurate and early detection of hematological disorders from microscopic peripheral blood smear images remains a technically challenging task due to inherent imaging limitations, including noise contamination, low contrast, staining variability, and significant cellular overlap. Conventional deep learning-based object detection frameworks often exhibit limited robustness under such conditions and demonstrate reduced sensitivity to small-scale morphological structures, particularly platelets and abnormal cell variants. Methods: To address these challenges, this study proposes a hybrid detection framework that integrates a fuzzy logic-driven image preprocessing module with the YOLOv11 object detection architecture. The proposed preprocessing pipeline employs adaptive fuzzy membership functions to normalize pixel intensity distributions, suppress high-frequency noise, and enhance edge-defined cellular boundaries. This transformation produces a structurally optimized feature representation, improving downstream feature extraction and localization performance. The proposed framework was evaluated on a curated dataset of 3000 annotated microscopic blood smear images spanning five hematological classes. Results: Experimental results show that the fuzzy logic module improves mAP@0.5 by +3.4% and mAP@0.5:0.95 by +3.6%, confirming its effectiveness in enhancing both classification and localization accuracy. Conclusions: These findings demonstrate the robustness and practical applicability of the proposed hybrid approach under challenging imaging conditions.
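The fuzzy preprocessing stage maps pixel intensities to membership values, reshapes them, and defuzzifies back to an image. A minimal sketch in the spirit of classic fuzzy intensity enhancement; the paper's adaptive membership functions are more elaborate than this fixed-crossover version:

```python
import numpy as np

def fuzzy_enhance(img: np.ndarray, crossover: float = 0.5) -> np.ndarray:
    """Fuzzy intensity normalization: fuzzify pixels to [0, 1], apply the
    classic INT contrast operator around the crossover, then defuzzify."""
    img = img.astype(np.float32)
    mu = (img - img.min()) / (img.max() - img.min() + 1e-6)  # fuzzify
    low = mu < crossover
    mu[low] = 2 * mu[low] ** 2                     # suppress dark noise
    mu[~low] = 1 - 2 * (1 - mu[~low]) ** 2         # sharpen cell boundaries
    return (mu * 255).astype(np.uint8)             # defuzzify to 8-bit
```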

25 pages, 6442 KB  
Article
YOLOv12-WCIRS: An Improved YOLOv12-Based Framework for Small Intestinal Lesion Detection in WCE
by Shiren Ye, Liangjing Li, Zetong Zhang and Haipeng Ma
Computers 2026, 15(5), 283; https://doi.org/10.3390/computers15050283 - 29 Apr 2026
Abstract
Accurate detection of small intestinal lesions in wireless capsule endoscopy (WCE) images remains challenging because lesions are often small, weakly contrasted, irregular in shape, and easily confused with complex mucosal backgrounds. To address these difficulties, this study proposes YOLOv12-WCIRS, a WCE-oriented improvement of YOLOv12 that jointly enhances local feature extraction, selective multi-scale fusion, background suppression, localization sensitivity, and scale-aware optimization. The proposed framework incorporates a Weighted Convolution (WConv) module, a Contextual Selection Fusion Module (CSFM), an Information Integration Attention Fusion (IIA_Fusion) module, a Receptive Field Attention-based detection head (RFAHeadDetect), and a Scale Dynamic Loss (SD Loss). Experiments on the SEE-AI dataset show that YOLOv12-WCIRS achieves 83.4% mAP@0.5 and 61.1% mAP@0.5:0.95, improving mAP@0.5 from 76.9% to 83.4% over the direct baseline YOLOv12 while maintaining competitive efficiency. Additional analyses, including cross-dataset validation on overlapping categories in Kvasir-Capsule, normal-frame false-alarm evaluation, false-positive/false-negative breakdown, and repeated-run statistical testing, further support the robustness and practical value of the proposed framework. These results indicate that YOLOv12-WCIRS provides an effective solution for automated lesion detection in WCE images and shows promise for computer-aided capsule endoscopy analysis.
(This article belongs to the Special Issue Artificial Intelligence (AI) in Medical Informatics)
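For readers comparing the two headline numbers: mAP@0.5 scores detections at a single IoU cutoff, while mAP@0.5:0.95 averages AP over ten cutoffs from 0.50 to 0.95, which is why it is substantially lower. A sketch of the averaging step:

```python
import numpy as np

def map_50_95(ap_per_iou: dict) -> float:
    """COCO-style mAP@0.5:0.95: mean AP over IoU thresholds 0.50..0.95,
    where ap_per_iou maps each threshold (rounded to 2 dp) to its AP."""
    thresholds = np.arange(0.50, 0.96, 0.05)       # 0.50, 0.55, ..., 0.95
    return float(np.mean([ap_per_iou[round(t, 2)] for t in thresholds]))

# Here mAP@0.5 alone is 83.4%, while the ten-threshold mean is 61.1%,
# reflecting the stricter localization required at high IoU.
```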

25 pages, 3336 KB  
Article
Automated Identification from CT Using Sphenoid Sinus Geometry as an Anatomical Biometric
by Nataliya Bilous, Vladyslav Malko, Dmytro Tkachenko and Marcus Frohme
Appl. Syst. Innov. 2026, 9(5), 89; https://doi.org/10.3390/asi9050089 - 29 Apr 2026
Abstract
Reliable identification of deceased individuals may be difficult when conventional biometric methods such as facial recognition, fingerprint analysis, or DNA profiling cannot be applied. In such cases, medical imaging records acquired during a person’s lifetime may serve as an alternative source of identifying information. Certain anatomical structures visible in computed tomography (CT), including the sphenoid sinus, exhibit considerable inter-individual variability while remaining relatively stable within the same individual. This study investigates the feasibility of using sphenoid sinus morphology as an anatomical biometric for automated identification from head CT scans. Identification is formulated as a ranking problem in which a query CT examination is compared with a reference database using geometric descriptors derived from segmentation masks, reducing dependence on CT intensity values. The dataset consisted of CT scans from 816 individuals acquired in two patient positioning modes: Head First Supine (HFS) and Head First Prone (HFP). Several deep learning architectures, including YOLOv8 variants, YOLO11L-seg, UNet++, DeepLabV3+, HRNet, and SegFormer-B2, were evaluated for sphenoid sinus segmentation. Based on F1-score performance and cross-mode stability, YOLO11L-seg was selected and further trained to construct a database of binary masks representing individual sphenoid sinus anatomy. Identification was performed using pairwise mask comparison based on the Intersection over Union (IoU) metric. To reduce the influence of segmentation artifacts and slice-level variability, the final similarity score for each candidate was computed as the average of the four highest IoU values across slice comparisons. Individuals were ranked according to similarity, and identification was considered successful if the correct subject appeared among the top five candidates and exceeded a predefined similarity threshold. The proposed approach achieved Top-5 identification accuracies of 97.27% for HFP and 87.67% for HFS acquisitions. These results demonstrate the feasibility of using sphenoid sinus geometry as a stable anatomical biometric for automated identification. The key contribution of this study is the introduction of a ranking-based identification framework that utilizes anatomical biometrics derived from CT data for reliable patient matching.
(This article belongs to the Section Artificial Intelligence)
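The ranking step described here, scoring each candidate by the mean of its four highest slice-level IoUs and accepting top-5 matches above a threshold, can be sketched directly; the threshold value below is illustrative, not the paper's calibrated one:

```python
import numpy as np

def slice_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two binary sphenoid-sinus masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def identify(query_masks, database, top_k=5, threshold=0.6):
    """Rank candidates by the mean of the 4 highest slice IoUs; keep
    top-k candidates that clear the similarity threshold (assumed)."""
    scores = {}
    for subject_id, ref_masks in database.items():
        ious = sorted(
            (slice_iou(q, r) for q in query_masks for r in ref_masks),
            reverse=True)
        scores[subject_id] = float(np.mean(ious[:4]))   # top-4 average
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [(s, scores[s]) for s in ranked if scores[s] >= threshold]
```

Averaging only the best four slice comparisons, rather than all of them, is what damps the effect of occasional segmentation artifacts on the final score.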

31 pages, 3603 KB  
Article
High-Throughput Citrus Detection via Citrus-SGYOLOv2: A Symmetric Ghost-Based Architecture with High-Resolution Feature Fusion
by Jinfeng Li, Yutian Miao, Wenxuan Guo, Yuxiang Li, Qian Xu, Yue Xiang, Yanyu Chen, Xianyao Wang, Yunsen Liang and Jun Li
Agronomy 2026, 16(9), 894; https://doi.org/10.3390/agronomy16090894 - 28 Apr 2026
Abstract
Accurate high-throughput fruit detection is the core prerequisite for precision citrus management. Existing models face a critical trade-off between accuracy for small fruits and computational efficiency, restricting large-scale industrial adoption. To resolve this, we propose Citrus-SGYOLOv2, an optimized deep learning architecture specifically engineered for high-throughput phenotypic monitoring. The primary contribution of this work lies in three synergistic innovations: a novel Symmetric Ghost Backbone that prunes architectural redundancy while maintaining hierarchical feature depth; a Citrus Color Prior Calibration Attention Mechanism (Citrus_SE) that embeds physiological chromaticity priors to suppress complex spectral noise from foliage; and a P2-layer-based full-scale fusion strategy designed to recover fine-grained spatial details lost during downsampling. Experiments on our self-built dataset show that Citrus-SGYOLOv2 achieves 95.54% mAP@50 and 77.13% mAP@50–95, outperforming YOLOv11s by 5.03 and 9.90 percentage points, respectively. Notably, the model achieves a 48.8% reduction in parameters (4.84 M) while sustaining a high-throughput inference speed of 139.00 FPS. This research provides a robust and efficient foundational framework for intelligent yield estimation and precision orchard management.
(This article belongs to the Special Issue Novel Studies in High-Throughput Plant Phenomics)
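The Symmetric Ghost Backbone builds on GhostNet-style modules, which halve the cost of a convolution by synthesizing part of the output with cheap depthwise operations. A minimal PyTorch sketch of the underlying Ghost block (ratio 2; not the paper's symmetric variant):

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """GhostNet-style block: a slim primary convolution plus a cheap
    depthwise op that synthesizes the remaining 'ghost' feature maps."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        c_primary = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_primary, 1, bias=False),
            nn.BatchNorm2d(c_primary), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(      # depthwise 3x3 generates ghosts
            nn.Conv2d(c_primary, c_primary, 3, padding=1,
                      groups=c_primary, bias=False),
            nn.BatchNorm2d(c_primary), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```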
34 pages, 3563 KB  
Article
Computer Vision Applied to the Analysis of Pig Behavior Patterns in an Air-Conditioned Environment
by Maria de Fatima Araújo Alves, Héliton Pandorfi, Rodrigo Gabriel Ferreira Soares, Victor Wanderley Costa de Medeiros, Taíze Calvacante Santana, Vitoria Katarina Grobner, Gabriel Thales Barboza Marinho, Gledson Luiz Pontes de Almeida, Maria Beatriz Ferreira and Marcos Vinícius da Silva
Animals 2026, 16(9), 1353; https://doi.org/10.3390/ani16091353 - 28 Apr 2026
Abstract
Observing pig behavior, such as feed intake, water intake, and resting behavior, is essential for improving the well-being of these animals. However, monitoring such behaviors by traditional methods can be exhausting for both humans and animals, interfering with their development. The research aimed to identify behavioral patterns of pigs in an air-conditioned environment through computer vision. Microcameras were installed in the animals’ stalls to record video over a 92-day experimental period, while air temperature and humidity were logged simultaneously. The physiological variables of the animals were collected to identify whether they were under heat stress. To recognize the drinking, eating, standing, and lying behavior of pigs, a YOLOv5 model was trained and then used to detect the animals. Regions in the images corresponding to the feeders and drinkers were established. To identify feeding behavior and water intake, criteria based on the occupation of the feeding zone by pigs detected in the standing position were established. The results showed that the trained model achieved an average accuracy rate of 97.3% and an average recall of 96.1% in animal detection. The model exhibited 97.5% accuracy and 97.0% recall rates in recognizing the feeding behavior and water consumption of pigs. The proposed method can be used on videos or images and minimizes the need for manual intervention, offering an efficient means of monitoring pig behavior in agricultural environments and contributing to the productivity of pig farming operations.
(This article belongs to the Section Pigs)
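The feeding/drinking criterion, a standing pig whose detection box occupies the feeder or drinker zone, is a simple geometric rule. A sketch with an assumed overlap cutoff (the paper does not state the exact value used):

```python
def overlap_ratio(box, zone):
    """Fraction of a pig's bounding box lying inside a fixed image zone.
    Boxes and zones are (x1, y1, x2, y2) in pixels."""
    x1 = max(box[0], zone[0]); y1 = max(box[1], zone[1])
    x2 = min(box[2], zone[2]); y2 = min(box[3], zone[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area else 0.0

def is_feeding(box, posture, feeder_zone, min_overlap=0.3):
    """Feeding rule in the spirit of the paper: a pig detected standing
    whose box occupies the feeder zone (overlap cutoff assumed)."""
    return posture == "standing" and overlap_ratio(box, feeder_zone) >= min_overlap
```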

24 pages, 8644 KB  
Article
YOLO-REFB: Rectangular Edge Fusion for Cardboard Box Detection in Warehouse Environments Using Mobile Robot
by Narendra Kumar Kolla and Pandu Ranga Vundavilli
Modelling 2026, 7(3), 83; https://doi.org/10.3390/modelling7030083 - 28 Apr 2026
Abstract
Accurate detection of cardboard boxes is essential for mobile manipulators performing pick-and-place operations in warehouses. Conventional object detection methods like YOLOv11 struggle in low-texture and occluded environments. This paper presents YOLO-REFB, a novel object detection framework for real-time cardboard box detection in robotic manipulation using a dual-arm mobile robot (DAMR) operating in indoor warehouse environments. The proposed approach enhances the network by integrating the Rectangular Edge Fusion Block (REFB) into the YOLOv11 architecture; it focuses on learning the geometric and structural features of cardboard boxes. Enhanced edge information extraction and feature fusion improve training stability and localization accuracy. A custom dataset of 3501 annotated images, collected under varied conditions, was utilized. The images were randomly assigned to training and validation sets at an 80:20 ratio and manually annotated using Roboflow, ensuring precise alignment of bounding boxes with cardboard box edges for accurate comparison with existing YOLO models. The model outperformed existing YOLO variants (YOLOv8n and YOLOv5n) in terms of precision (89.29%), recall (83.95%), and F1-score (86.54%). YOLO-REFB achieved improved localization metrics, including mean Average Precision (mAP)@0.5 (91.68%) and mAP@0.5:0.95 (68.61%). The inclusion of REFB was essential to the performance gains, enabling effective detection of objects in challenging environments. Future developments may include 3D pose estimation and multi-object grasp planning for advanced robotic manipulation. Full article
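As a quick consistency check, the reported F1-score follows from the stated precision and recall as their harmonic mean:

```python
precision, recall = 0.8929, 0.8395     # reported YOLO-REFB scores

# Harmonic mean reproduces the reported F1-score of 86.54%
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")                # -> 0.8654
```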

36 pages, 1539 KB  
Article
PGT-Net: A Physics-Guided Transformer–CNN Hybrid Network for Low-Light Image Enhancement and Object Detection in Traffic Scenes
by Bin Chen, Jian Qiao, Baowei Li, Shipeng Liu and Wei She
J. Imaging 2026, 12(5), 191; https://doi.org/10.3390/jimaging12050191 - 28 Apr 2026
Abstract
In autonomous driving and intelligent transportation systems, the degradation of image quality under low-light conditions severely impacts the reliability of subsequent object detection. Existing methods predominantly employ data-driven deep learning models for image enhancement, often lacking physical interpretability and struggling to maintain robustness in complex lighting-varying traffic scenarios. To address this, this paper proposes a Physically Guided Transformer–CNN Hybrid Network (PGT-Net) for end-to-end joint optimization of low-light enhancement and object detection. PGT-Net innovatively integrates the atmospheric scattering physical model with a deep learning architecture: first, a learnable physical guidance branch estimates the scene’s atmospheric illumination map and transmittance map, providing explicit physical priors for the network; second, a dual-branch enhancement backbone is designed, where the local CNN branch (based on an improved UNet) restores fine textures, while the Global Transformer Branch (based on Swin Transformer) models long-range dependencies to correct global uneven illumination, with features adaptively combined via a Physical Fusion Module to ensure enhancement results align with physical laws while retaining rich visual features; finally, the enhanced images are directly fed into a lightweight detection head (e.g., YOLOv7) for joint training and optimization. Comprehensive experiments on public datasets (ExDark, BDD100K-night, etc.) demonstrate that PGT-Net significantly outperforms mainstream methods (e.g., RetinexNet, KinD, Zero-DCE) in both low-light image enhancement quality (PSNR/SSIM) and object detection accuracy (mAP), while maintaining high inference efficiency. This research offers an interpretable, high-performance solution for visual perception tasks under adverse lighting conditions, holding strong theoretical significance and practical value.
(This article belongs to the Section AI in Imaging)
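The physical prior here is the classic atmospheric scattering model, I = J·t + A·(1 − t): once a network estimates the illumination map A and transmittance t, the scene radiance J can be recovered by inverting it. A sketch of the model and the standard clamped inversion used in dehazing-style pipelines (the clamp value is illustrative):

```python
import numpy as np

def scatter_forward(J, t, A):
    """Atmospheric scattering model: I = J*t + A*(1 - t), where J is
    scene radiance, t the transmittance map, A atmospheric illumination."""
    return J * t + A * (1.0 - t)

def scatter_invert(I, t, A, t_min=0.1):
    """Physics-guided restoration: J = (I - A) / max(t, t_min) + A.
    Clamping t avoids amplifying noise where transmission is near zero."""
    return (I - A) / np.maximum(t, t_min) + A
```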