Search Results (173)

Search Parameters:
Keywords = YOLOv5-Seg

32 pages, 18066 KB  
Article
Grapevine Winter Pruning Point Localization Using YOLO-Based Instance Segmentation
by Magdalena Kapłan and Kamil Buczyński
Agriculture 2026, 16(9), 943; https://doi.org/10.3390/agriculture16090943 - 24 Apr 2026
Abstract
Winter pruning is a key management practice in viticulture that directly affects vine architecture, yield balance, and grape quality. At the same time, it is a highly labor-intensive operation, and the selective identification of appropriate cutting locations remains one of the main challenges limiting the automation of pruning in vineyards. Advances in machine vision provide new opportunities to support the development of robotic pruning systems. The objective of this study was to develop and evaluate a vision-based method for estimating grapevine pruning points and cutting lines using instance segmentation outputs generated by YOLO models. A dataset of 1500 RGB images of dormant grapevines was collected under field conditions in the Nobilis vineyard located in southeastern Poland. Two annotation strategies were implemented to define pruning regions. YOLO-based instance segmentation models were trained and evaluated for detecting cutting-related structures. Based on the predicted segmentation masks, a geometry-based method termed PCAcutSeg-V was developed to estimate class-dependent cutting points and cutting lines using principal component analysis applied to object contours. The results indicate that YOLOv8 and YOLO11 architectures achieved the highest segmentation performance among the evaluated models. The simplified annotation strategy provided more stable geometric inputs for the PCAcutSeg-V method, enabling more reliable estimation of cutting points and cutting lines compared with the extended annotation approach. When combined with the PCAcutSeg-V method, the proposed perception–geometry pipeline achieved high effectiveness in pruning decision estimation. The method was further implemented in a real-time processing pipeline using an RGB camera and an edge computing platform, where it maintained performance consistent with the results obtained from offline image analysis. 
These findings demonstrate that combining deep learning-based instance segmentation with deterministic geometric reasoning enables accurate and interpretable estimation of grapevine pruning locations and provides a promising foundation for future autonomous pruning systems. Full article
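The PCAcutSeg-V geometry is not detailed in the abstract; as an illustration of the underlying idea, fitting a principal axis to mask-contour points via PCA yields an elongation direction, with the perpendicular direction serving as a candidate cutting-line orientation. This is a minimal sketch under that assumption, not the authors' implementation; `principal_axis` and the synthetic contour are hypothetical.

```python
import numpy as np

def principal_axis(contour: np.ndarray):
    """Fit the dominant axis of a 2D contour (N x 2 points) via PCA.

    Returns the centroid plus unit vectors for the major (elongation)
    and minor (perpendicular) directions; the minor direction is a
    candidate cutting-line orientation across an elongated cane.
    """
    centroid = contour.mean(axis=0)
    centered = contour - centroid
    # Eigen-decomposition of the 2x2 covariance matrix (eigh: ascending).
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    major = eigvecs[:, np.argmax(eigvals)]   # elongation direction
    minor = eigvecs[:, np.argmin(eigvals)]   # perpendicular: cut direction
    return centroid, major, minor

# Synthetic elongated contour roughly aligned with the x-axis.
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(0, 100, 500), rng.uniform(0, 5, 500)])
centroid, major, minor = principal_axis(pts)
```

For a contour stretched along x, the major axis comes out nearly horizontal and the cutting direction nearly vertical.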

26 pages, 13181 KB  
Article
QHAWAY: An Instance Segmentation and Monocular Distance Estimation ADAS for Vulnerable Road Users in Informal Andean Urban Corridors
by Abel De la Cruz-Moran, Hemerson Lizarbe-Alarcon, Wilmer Moncada, Victor Bellido-Aedo, Carlos Carrasco-Badajoz, Carolina Rayme-Chalco, Cristhian Aldana Yarlequé, Yesenia Saavedra, Edwin Saavedra and Alex Pereda
Sensors 2026, 26(8), 2569; https://doi.org/10.3390/s26082569 - 21 Apr 2026
Viewed by 154
Abstract
Vulnerable road users in informal urban environments confront a distinct set of hazards that standard computer vision datasets are ill-equipped to represent: artisanal speed bumps constructed without regulatory compliance, deteriorated road markings, and the mototaxi—a three-wheeled motorized vehicle that constitutes the primary informal transport mode in intermediate Andean cities yet is absent from all major international repositories. This paper presents QHAWAY—from Quechua qhaway, a transitive verb meaning “to look; to observe”—an Advanced Driver Assistance System (ADAS) predicated on instance segmentation, monocular distance estimation via the pinhole camera model, and Time-to-Collision (TTC) computation, developed for the road environment of Ayacucho, Peru (2761 m a.s.l.), a city recognised by UNESCO as a Creative City of Crafts and Folk Art since 2019. A hybrid dataset comprising 25,602 images with 127,525 annotated instances across 12 classes was assembled by combining an original local collection of 4598 images (10,701 instances) captured through four complementary acquisition methods across the five urban districts of the Huamanga province with three established international datasets (BDD100K, BSTLD, RLMD; 21,004 images, 116,824 instances). A three-phase progressive training strategy with monotonically increasing resolution (640, 800, and 1024 pixels) was evaluated as an ablation study. A multi-architecture comparison spanning YOLOv8L-seg and the YOLO26 family (nano, small, large) identified YOLO26L-seg as the best-performing model, attaining mAP50 Box of 0.829 and mAP50 Mask of 0.788 at epoch 179. The integration of ByteTrack multi-object tracking with the pinhole equation D=(Hreal×f)/hpx delineates operational risk zones aligned with the NHTSA forward collision warning standard (danger: <3 m; caution: 3–7 m; TTC threshold ≤ 2.4 s). The system sustains processing rates of 19.2–25.4 FPS on an NVIDIA RTX 5080 GPU. 
A systematic field survey established that 96% of the audited speed bumps fail to comply with MTC Directive No. 01-2011-MTC/14, constituting the first quantitative record of informal road infrastructure non-compliance in the Andean region. Validation was conducted under naturalistic driving conditions without staged scenarios. Grad-CAM explainability analysis, encompassing three complementary visualisation algorithms (Grad-CAM, Grad-CAM++, and EigenCAM), confirmed that model attention concentrates consistently on safety-critical objects. Full article
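The abstract gives the pinhole relation D = (Hreal × f)/hpx and the NHTSA-aligned thresholds (danger < 3 m; caution 3–7 m; TTC ≤ 2.4 s) directly; a minimal sketch of how these could combine into a risk-zone classifier follows. The way the TTC check is merged with the distance zones, and the function names, are assumptions, not the QHAWAY implementation.

```python
def pinhole_distance(real_height_m: float, focal_px: float,
                     bbox_height_px: float) -> float:
    """Monocular distance from the pinhole model: D = (H_real * f) / h_px."""
    return real_height_m * focal_px / bbox_height_px

def risk_zone(distance_m: float, closing_speed_mps: float,
              ttc_threshold_s: float = 2.4) -> str:
    """Zones per the thresholds quoted in the abstract: danger < 3 m,
    caution 3-7 m, with a TTC <= 2.4 s override (the override is assumed)."""
    ttc = distance_m / closing_speed_mps if closing_speed_mps > 0 else float("inf")
    if distance_m < 3.0 or ttc <= ttc_threshold_s:
        return "danger"
    if distance_m < 7.0:
        return "caution"
    return "safe"

# A 1.7 m pedestrian imaged at 200 px with a 1000 px focal length is 8.5 m away.
d = pinhole_distance(1.7, 1000.0, 200.0)
```

A fast-closing object can thus be flagged "danger" even outside the 3 m zone, which is the point of combining distance bands with TTC.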

20 pages, 3012 KB  
Article
A Two-Stage Deep Learning Framework for Automated Corrosion Detection and Severity Estimation in High-Resolution SEM Images
by Satyabrata Aich, Sudipta Mohapatra, Shrabani Nanda, Taqdees Khan, Ayushi Bharti, Hajra Sultana, Umashankari Kalaiarsan, Chea Senghuy, Okpete Uchenna Esther Ada, Proloy Kumar Mondal and Yong-Ki Lee
Automation 2026, 7(2), 65; https://doi.org/10.3390/automation7020065 - 20 Apr 2026
Viewed by 272
Abstract
Accurate detection and severity estimation of corrosion on metallic surfaces are essential for maintaining material integrity and ensuring operational safety in industrial systems. To address limitations in manual inspection methods, this study presents a two-stage deep learning pipeline tailored for high-resolution scanning electron microscopy images. The framework combines instance-level corrosion segmentation using the YOLOv8-seg architecture with subsequent severity classification performed by EfficientNet-B0 and ResNet18. In the segmentation stage, models are trained using both manually annotated and automatically generated binary masks, enabling robust instance mask prediction through prototype-based mask decoding. The classification stage assesses the severity of corrosion by analyzing localized regions based on morphological features, leveraging convolutional neural networks optimized for binary output. The experimental results demonstrate strong performance: the segmentation model trained on manual annotations achieves a Mean Intersection over Union (mIoU) of 89.91, a mask mAP@50 of 98.6, and an ROC-AUC of 94.69. For severity classification, EfficientNet-B0 achieves an accuracy of 93.75% and an F1-score of 93.29, outperforming ResNet18. The proposed framework connects advanced SEM imaging with state-of-the-art machine learning, providing a scalable, annotation-efficient approach to intelligent and automated corrosion characterization in materials science and industrial applications. Full article
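mIoU and Dice-style overlap metrics recur throughout these abstracts; on binary masks they reduce to a few lines. This is a generic reference sketch, not any study's evaluation code.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over Union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty masks agree perfectly
    return np.logical_and(a, b).sum() / union

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient: 2 * |A & B| / (|A| + |B|)."""
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total

a = np.zeros((10, 10), dtype=bool); a[:, :6] = True   # 60 px
b = np.zeros((10, 10), dtype=bool); b[:, 4:] = True   # 60 px, 20 px overlap
```

With 20 overlapping pixels out of a 100-pixel union, IoU is 0.2 while Dice is 1/3, illustrating that Dice is systematically more forgiving than IoU on partial overlaps.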

24 pages, 5270 KB  
Article
Decoupled Detection and Category-Level 6D Pose Estimation for Robot Grasping
by Chia-Tse Lai, Chen-Chien Hsu, Shao-Kang Huang and Yin-Tien Wang
Electronics 2026, 15(8), 1706; https://doi.org/10.3390/electronics15081706 - 17 Apr 2026
Viewed by 168
Abstract
6D object pose estimation is an essential component for robotic grasping. Most existing deep learning-based approaches focus on instance-level pose estimation, which requires prior object models and consequently limits their applicability on unseen objects in real-world scenarios. In contrast, category-level 6D pose estimation adopts Normalized Object Coordinate Space (NOCS) maps to represent intra-class object geometry, enabling pose prediction without relying on predefined object models and thus improving generalization to unseen instances. However, the original NOCS-based category-level framework typically trains NOCS prediction and object classification in a joint manner, which introduces NOCS regression error among inter-class instances with similar appearances, thereby degrading pose estimation accuracy. To address this issue, we integrate the YOLOv8 object detection with SegFormer and propose a novel Category-Level SegFormer for 6D Object Pose Estimation (CLSF-6DPE). By decoupling object classification from NOCS regression through independent learning branches, the proposed framework significantly improves pose estimation performance. Furthermore, we validate the practical feasibility of CLSF-6DPE by integrating it with a robotic gripper via the Robot Operating System (ROS) in a Real-World grasping setup. Experimental results on the CAMERA and Real-World datasets demonstrate that the proposed method achieves mAP scores of 93.8% and 81.1%, respectively. Overall, the proposed method provides a modular and effective solution for category-level pose estimation in real-world robotic grasping applications. Full article
(This article belongs to the Special Issue Robotics: From Technologies to Applications)

21 pages, 23093 KB  
Article
Keyframe-Guided Crack Segmentation and 3D Localization for UAV-Based Monocular Inspection
by Feifei Tang, Wuyuntana Gongzhabayier, Jing Li, Tao Zhou, Yue Qiu, Yong Zhan and Qiulin Song
Symmetry 2026, 18(4), 657; https://doi.org/10.3390/sym18040657 - 15 Apr 2026
Viewed by 246
Abstract
In unmanned aerial vehicle (UAV)-based monocular inspection, cracks typically present as geometrically asymmetric, elongated, low-contrast weak targets, making accurate segmentation and spatial localization challenging. Existing methods are susceptible to missed detections and false positives when handling slender cracks, and monocular 3D reconstruction for localization is often burdened by redundant frames, resulting in limited modeling efficiency. To mitigate these issues, we propose a high-precision framework for crack segmentation and spatial localization from UAV imagery. First, Oriented FAST and Rotated BRIEF–Simultaneous Localization and Mapping, version 3 (ORB-SLAM3) is adopted for keyframe selection to suppress data redundancy and improve reconstruction stability. Second, we develop an enhanced YOLOv11-seg model by integrating the Dilation-wise Residual Segmentation (DWRSeg) module, the Weighted IoU (WIoU) loss, and the Lightweight shared convolutional separator batch-normalization detection head (LSCSBD) to strengthen feature discrimination and segmentation robustness for slender cracks, yielding high-quality crack masks. Finally, the predicted masks are projected onto the reconstructed 3D surface to obtain precise spatial localization. Our experimental results demonstrate that the proposed approach improves the segmentation mAP@50 by 7.2% over the baseline while reducing computational complexity from 10.2 to 9.8 GFLOPs. In addition, keyframe-based processing reduces the 3D modeling time by 59.4% compared to that with full-frame reconstruction. Overall, the proposed framework jointly enhances crack segmentation accuracy and substantially accelerates 3D modeling and localization, providing an effective solution for efficient UAV-based crack inspection. Full article
(This article belongs to the Special Issue Symmetry/Asymmetry in Intelligent Transportation)

15 pages, 2304 KB  
Article
A Reproducible COCO-Polygon Quality-Control Pipeline Improves Segmentation Stability in Endoscopic Airway Imaging
by Medine Atmaca, Ilkay Sibel Kervancı and Necati Olgun
Diagnostics 2026, 16(8), 1160; https://doi.org/10.3390/diagnostics16081160 - 14 Apr 2026
Viewed by 285
Abstract
Background/Objectives: Endoscopic airway imaging, used in endoscopy-guided airway management, enables geometry-based assessment of the airways. However, COCO-format datasets often include fragmented regions and geometrically inconsistent polygon annotations. Such inconsistencies may reduce reproducibility and spatial stability in deep learning-based image segmentation. This study proposes a systematic annotation quality-control (QC) workflow to improve dataset integrity before model training. Methods: The phantom subset of the Upper Airway Anatomical Landmark (UAAL) dataset containing 4526 polygon instances across 2746 frames (2267 training; 479 validation) was analyzed. The QC pipeline validated polygon structure, generated masks at native image resolution, and removed small noise-like instances using an area threshold. A YOLOv8-seg model was trained using (i) original annotations and (ii) QC-refined annotations. Performance was evaluated using precision, recall, mAP@0.5, mAP@0.5:0.95, and Dice similarity coefficient (DSC). Frame-level DSC values were compared using the Wilcoxon signed-rank test. Results: Annotation QC improved boundary consistency and reduced mask fragmentation. Training with QC-refined annotations increased box mAP@0.5:0.95 from 0.602 to 0.628 and mean DSC from 0.823 to 0.830 (p < 0.05). A pilot evaluation on the UAAL clinical subset also showed improved performance, with Box mAP@0.5 increasing from 0.635 to 0.706 and Mask mAP@0.5 from 0.631 to 0.704. Conclusions: Annotation-level QC enhances segmentation robustness without modifying the network architecture. The proposed workflow improves the reproducibility of results in endoscopic image segmentation and may support more stable geometry-based airway analysis in deep learning applications. Full article
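The area-threshold step of such a QC pipeline can be sketched with the shoelace formula on COCO-style flat polygons. The threshold value, minimum point count, and function names below are illustrative assumptions, not the UAAL pipeline's actual parameters.

```python
def polygon_area(coords):
    """Shoelace area of a COCO-style flat polygon [x1, y1, x2, y2, ...]."""
    xs, ys = coords[0::2], coords[1::2]
    n = len(xs)
    s = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(s) / 2.0

def qc_filter(polygons, min_area=25.0, min_points=3):
    """Drop structurally invalid or noise-like instances below an area threshold."""
    return [p for p in polygons
            if len(p) >= 2 * min_points and len(p) % 2 == 0
            and polygon_area(p) >= min_area]

square = [0, 0, 10, 0, 10, 10, 0, 10]   # area 100: kept
sliver = [0, 0, 2, 0, 0, 2]             # area 2: dropped as noise
```

Running the filter before mask generation removes the fragmented, noise-like instances that the study identifies as a source of unstable training.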
(This article belongs to the Section Medical Imaging and Theranostics)

6 pages, 1313 KB  
Proceeding Paper
Detection and Segmentation of Surface Corrosion in Steel-Based Hand Tools Using You Only Look Once Version 8
by Allen Gabriel B. Tolentino, Vynz Alfred B. Raynes and Analyn N. Yumang
Eng. Proc. 2026, 134(1), 56; https://doi.org/10.3390/engproc2026134056 - 13 Apr 2026
Viewed by 139
Abstract
In this study, a YOLOv8-based instance segmentation model was developed for detecting and segmenting surface corrosion in steel hand tools. The model was trained using a 7000-image custom dataset and deployed on a Raspberry Pi 5-based setup with a controlled lighting environment. The testing results showed an overall segmentation precision of 86.62%, a recall of 82.90%, and an F1-score of 84.72%, demonstrating effective tool detection and classification. Corrosion segmentation, however, reached only a precision of 75.02%, a recall of 17.40%, an F1-score of 28.25%, and a Dice coefficient of 72.21%, as the model struggles on small corrosion patches, emphasizing the need for architectural and dataset enhancements. Full article

25 pages, 16496 KB  
Article
MassSeg-Framework: A Breast Mass Detection and Segmentation Framework Based on Deep Learning and an Active Contour Model
by Camila Zambrano, Noel Pérez-Pérez, Miguel Coimbra, Maria Baldeon-Calisto, Ricardo Flores-Moyano, José Ramón Mora, Oscar Camacho and Diego Benítez
Life 2026, 16(4), 653; https://doi.org/10.3390/life16040653 - 12 Apr 2026
Viewed by 484
Abstract
This work introduces the MassSeg-Framework, a fully automatic two-stage pipeline for breast mass analysis in mammography that integrates YOLOv11-based detection with Chan–Vese ACM refinement to achieve accurate mass localization and segmentation with a lightweight computational footprint. The framework was trained and evaluated on two publicly available datasets using consistent experimental protocols. In the detection stage, YOLOv11-nano was the most effective architecture, with a confidence threshold of 0.4, achieving statistically significant mAP50 values of 0.862 and 0.709 on the dINbreast and dCBIS datasets, respectively. These results confirm that a moderate threshold preserves clinically relevant true-positive candidates, which is particularly important for screening-oriented settings where missed lesions are costly. In the segmentation stage, the proposed framework achieved mean DICE scores of 0.721 and 0.700 on the test sets of the same datasets, demonstrating consistent overlap with expert annotations. Compared with state-of-the-art approaches that commonly assume lesion-centered ROIs or rely on heavier backbones, the proposed pipeline addresses a more realistic scenario by performing automatic detection followed by segmentation while maintaining substantially lower computational requirements. This balance between performance and efficiency makes the MassSeg-Framework a promising tool for scalable mammography analysis, particularly in resource-constrained environments or high-throughput screening workflows that require rapid processing. Full article

22 pages, 3840 KB  
Article
An Integrated Vision–Mobile Fusion Framework for Real-Time Smart Parking Navigation
by Oleksandr Laptiev, Ananthakrishnan Thuruthel Murali, Nathalie Saab, Nihad Soltanov and Agnė Paulauskaitė-Tarasevičienė
Logistics 2026, 10(4), 84; https://doi.org/10.3390/logistics10040084 - 9 Apr 2026
Viewed by 582
Abstract
Background: Efficient parking navigation in large and dynamic parking areas requires systems that can adapt to real-time conditions and provide precise vehicle localization. Methods: This paper presents a smart car parking navigation module that integrates camera-based vehicle perception, homography-based ground-plane localization, mobile GNSS positioning, and dynamic route planning into a unified framework. Instance segmentation (YOLOv8n-seg) is used to detect vehicles and extract ground-contact regions, which are associated with parking slots defined in a GeoJSON-based site model. Mobile GNSS data are fused with visual observations via spatio-temporal proximity scoring to enable robust user–vehicle matching without optical identification. An A* routing algorithm dynamically computes and updates navigation paths, adapting to lane obstructions and slot availability in real time. Results: Experimental evaluation on a real six-camera parking facility shows that the proposed segmentation-based localization reduces mean error from 0.732 m to 0.283 m (61.3% improvement), with the 95th-percentile error dropping from 1.892 m to 0.908 m, and outperforming the bounding-box baseline in 85.3% of detections. Conclusions: These results demonstrate that sub-meter vehicle localization and reliable user–vehicle association are achievable using standard surveillance cameras without specialized infrastructure, offering a scalable and cost-effective solution for intelligent parking navigation. Full article
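Homography-based ground-plane localization maps a segmented vehicle's ground-contact pixel into metric site coordinates. A minimal sketch of the projective mapping follows; the calibration matrix here is a hypothetical pure-scale example, whereas a real deployment would estimate H per camera from surveyed point correspondences.

```python
import numpy as np

def project_to_ground(H: np.ndarray, u: float, v: float) -> tuple:
    """Map image pixel (u, v) to ground-plane coordinates via a 3x3
    homography, applying the perspective divide."""
    x, y, w = H @ np.array([u, v, 1.0])
    return (x / w, y / w)

# Hypothetical calibration: 100 px per metre, no rotation or translation.
H = np.array([[0.01, 0.0, 0.0],
              [0.0, 0.01, 0.0],
              [0.0, 0.0, 1.0]])
gx, gy = project_to_ground(H, 500.0, 300.0)
```

Using the mask's ground-contact region rather than the bounding-box bottom edge is what drives the reported drop in localization error, since box bottoms often fall on wheel arches or shadows rather than the pavement.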

21 pages, 5808 KB  
Article
Segmentation of Skin Lesions Using Deep YOLO-Family Networks: A Comparison of the Performance of Selected Models on a New Dataset
by Zbigniew Omiotek, Natalia Krukar, Aleksandra Olejarz, Piotr Lichograj, Miłosz Komada and Magda Konieczna
Electronics 2026, 15(8), 1545; https://doi.org/10.3390/electronics15081545 - 8 Apr 2026
Viewed by 460
Abstract
The aim of this study was to develop an effective and fast tool to support the automatic segmentation of skin lesions, with particular emphasis on the precise differentiation between malignant and benign lesions. In response to the problem of high false positive rates in existing CAD systems, modern neural network architectures from the YOLO family (YOLOv8, YOLOv9, YOLOv11, YOLOv12, and YOLOv26) were used in this research. The models were trained and evaluated on a new, balanced dataset (7000 images) based on the ISIC archive, where the key innovation was the introduction of a dedicated background class representing healthy skin. Through a multi-stage, rigorous optimization process, it was demonstrated that the yolov11s-seg model is highly effective for this task. It achieved a strong balance between effectiveness and processing speed, obtaining an mAP@50 score of 0.840 and an overall precision of 0.852. From a clinical perspective, the model’s high sensitivity (85.9%) in detecting the most aggressive lesion, invasive melanoma (MI), is particularly noteworthy. Thanks to its extremely short inference time (only 4.8 ms), the proposed yolov11s-seg variant overcomes the limitations of heavy hybrid architecture, providing a stable and highly efficient solution showing significant potential for deployment in real-time medical mobile applications. Full article

18 pages, 28028 KB  
Article
SCEA-YOLO: A General-Purpose Maturity Grading Model of Multi-Crop Greenhouse Robots
by Tianyuan Li, Ping Liu, Dongfang Song, Xingtian Zhao, Xiangyu Lyu and Kun Zhang
Plants 2026, 15(7), 1102; https://doi.org/10.3390/plants15071102 - 3 Apr 2026
Viewed by 359
Abstract
Accurate classification of fruit maturity is essential for automated grading and robotic manipulation in modern greenhouse cultivation. Most existing methods rely on crop-specific models, severely restricting their scalability in multi-crop scenarios. To overcome this limitation, this study presents SCEA-YOLO, a unified and efficient instance segmentation framework built on YOLOv11s-seg, for simultaneous maturity classification of tomatoes and sweet peppers. To boost feature discrimination, reduce computational redundancy, and alleviate class imbalance, SCEA-YOLO integrates spatial-channel reconstruction convolution and an efficient multi-scale attention mechanism, while replacing the original detection head with the proposed EA-Head. The model is evaluated on a hybrid dataset captured under diverse greenhouse conditions, including varying illumination, fruit occlusion, and overlapping canopies. Its robustness to different viewing angles and camera distances is further validated via deployment on an automated grading robot. Compared with the baseline, SCEA-YOLO enhances classification precision and mAP50–95 by 5.3% and 2.3% for tomatoes, and 1.2% and 1.4% for sweet peppers, respectively. With only 33.2 GFLOPs, the model satisfies real-time inference demands. Benefiting from its lightweight structure and real-time performance, SCEA-YOLO can be readily deployed on embedded systems and robotic platforms. It offers a practical, unified, and scalable solution for intelligent fruit maturity evaluation in multi-crop greenhouse production. Full article
(This article belongs to the Special Issue Advanced Remote Sensing and AI Techniques in Agriculture and Forestry)

46 pages, 13181 KB  
Article
Passable Area Evaluation of Tractor Road Based on Improved YOLOv5s and Multi-Factor Fusion
by Qian Zhang, Wenjie Xu, Wenfei Wu, Lizhang Xu, Zhenghui Zhao and Shaowei Liang
Agriculture 2026, 16(7), 752; https://doi.org/10.3390/agriculture16070752 - 28 Mar 2026
Viewed by 320
Abstract
The tractor road, as the core scene for autonomous driving of grain transport vehicles, is unstructured, complex, and obstacle-rich, leading to poor real-time performance and accuracy of joint road and obstacle detection with existing YOLOv5s. Furthermore, the reliability of passable area evaluation is low solely based on environmental factors. Therefore, YOLOv5s-C2S is proposed, fusing multi-scale features, attention mechanism, and dynamic features for joint detection. Firstly, YOLOv5s-CC is proposed for road detection by fusing context and spatial details and introducing Criss-Cross attention. Secondly, YOLOv5s-SGA is proposed for obstacle detection by grouped and spatial convolution, parameter-free attention, and adaptive feature fusion. By reusing YOLOv5s-CC weights, YOLOv5s-C2S shares low-level features and decouples high-level specificity. Based on the tractor road and obstacle information, combined with vehicle factors, a weighted scoring–based comprehensive method for passable area evaluation is proposed. Finally, the method was verified through experiments with an intelligent tracked grain transport vehicle using self-constructed datasets, including VOC_Road (11,927 images) and VOC_Obstacle (21,779 images). Compared with existing YOLOv5s, Deeplabv3+, FCN, Unet and SegNet, the mAP50 of road detection by YOLOv5s-CC increased by over 1.2%. Compared with existing YOLOv5s, R-CNN, YOLOv7, SSD and YOLOv8n, the mAP50 of obstacle detection by YOLOv5s-SGA increased by over 2%. Compared with YOLOv5s-SD, the mAP50 of joint detection by YOLOv5s-C2S increased by 9.3%, and the frame rate increased by 7.0 FPS. The proposed passable area evaluation method exhibits strong robustness and reliability in complex environments, meeting the accuracy and real-time requirements in autonomous driving of grain transport vehicles. Full article
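The weighted scoring method for passable-area evaluation is described only at a high level; one plausible minimal form is a convex combination of normalized factors. The factor names and weights below are purely hypothetical placeholders, not values from the study.

```python
def passable_score(factors: dict, weights: dict) -> float:
    """Weighted scoring over factors normalized to [0, 1]; weights sum to 1.
    Higher scores indicate a more passable area."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[k] * factors[k] for k in weights)

# Hypothetical factors: road-width margin, obstacle clearance, vehicle fit.
weights = {"road": 0.4, "obstacle": 0.4, "vehicle": 0.2}
score = passable_score({"road": 0.9, "obstacle": 0.5, "vehicle": 1.0}, weights)
```

A threshold on the combined score would then gate whether the transport vehicle treats a candidate region as passable.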
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

22 pages, 4435 KB  
Article
Semantic Mapping in Public Indoor Environments Using Improved Instance Segmentation and Continuous-Frame Dynamic Constraint
by Yumin Lu, Xueyu Feng, Zonghuan Guo, Jianchao Wang, Lin Zhou and Yingcheng Lin
Electronics 2026, 15(7), 1392; https://doi.org/10.3390/electronics15071392 - 26 Mar 2026
Viewed by 432
Abstract
Reliable semantic perception is crucial for service robots operating in complex public indoor environments. However, existing semantic mapping approaches often face the dual challenges of high computational overhead and semantic redundancy in maps. To address these limitations, this paper proposes a low-resource semantic mapping framework based on improved instance segmentation and dynamic constraints from consecutive frames. First, we design the lightweight model MS-YOLO, which adopts MobileNetV4 as its backbone network and incorporates the SHViT neck module, effectively optimizing the balance between detection accuracy and computational cost. Second, we propose a consecutive frame dynamic constraint method that eliminates redundant object annotations through consecutive frame stability verification. Experimental results relating to both fusion and custom datasets demonstrate that compared to YOLOv8n-seg, MS-YOLO achieves improvements in accuracy, recall, and mAP@0.5, while reducing the number of parameters by 11.7% and floating-point operations (FLOPs) by 32.2%. Furthermore, compared to YOLOv11n-seg and YOLOv5n-seg, its FLOPs are reduced by 17.2% and 25.5%, respectively. Finally, the successful deployment and field validation of this system on the Jetson Orin NX platform demonstrate its real-time capability and engineering practicality for edge computing in public indoor service robots. Full article
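The consecutive-frame stability verification that prunes redundant annotations can be sketched as follows: an object ID survives only if it is observed in a minimum number of consecutive frames. The function names and the threshold of 3 frames are assumptions for illustration, not the paper's parameters.

```python
def max_consecutive_run(frames) -> int:
    """Length of the longest run of consecutive frame indices."""
    frames = sorted(set(frames))
    if not frames:
        return 0
    best = run = 1
    for prev, cur in zip(frames, frames[1:]):
        run = run + 1 if cur == prev + 1 else 1
        best = max(best, run)
    return best

def stable_ids(observations: dict, min_frames: int = 3) -> set:
    """Keep only object IDs seen in at least min_frames consecutive frames,
    discarding transient (likely spurious) detections."""
    return {oid for oid, frames in observations.items()
            if max_consecutive_run(frames) >= min_frames}

obs = {"chair": [1, 2, 3, 4], "flicker": [1, 5, 9]}
```

A detection that flickers in and out across non-adjacent frames is treated as noise and never enters the semantic map.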
(This article belongs to the Section Artificial Intelligence)

20 pages, 2662 KB  
Article
A Synthetic Data-Driven Approach for Oil Spill Detection: Fine-Tuning YOLOv11-Seg with LIC-Based Ocean Flow Modeling
by Farkhod Akhmedov, Khujakulov Toshtemir Abdikhafizovich, Furkat Bolikulov and Fazliddin Makhmudov
J. Mar. Sci. Eng. 2026, 14(7), 608; https://doi.org/10.3390/jmse14070608 - 26 Mar 2026
Abstract
Oil spills represent a severe environmental hazard, threatening marine and coastal ecosystems, biodiversity, and socio-economic stability. Timely and accurate detection of such incidents is critical for mitigating their ecological and economic consequences. Conventional detection techniques, including manual inspection and satellite-based observation, remain limited by high operational costs, temporal delays, and restricted spatial coverage. To overcome these limitations, this study introduces a comprehensive computer vision framework that addresses two core challenges: (i) the construction of a large-scale, high-quality synthetic oil spill dataset through mask extraction and seamless blending of oil spill regions with diverse oceanic backgrounds, and (ii) the development of a fine-tuned YOLOv11m-seg detection model trained on this enriched dataset. To further enhance the realism and spatial distinctiveness of oil spill textures, Line Integral Convolution (LIC) is applied to estimate and visualize ocean surface flow patterns, generating coherent streamline textures that simulate the natural diffusion and transport of oil in water. The model exhibited strong generalization and precision, achieving a training accuracy (IoU@0.50–0.95) exceeding 85% over 50 epochs. Evaluation metrics confirmed its reliability, with an F1 score of 94%, a precision of 94%, and a recall (mAP@0.50) of 94%. These results demonstrate that the developed approach not only enhances dataset diversity but also substantially improves the accuracy and representativeness of real-time oil spill detection in marine environments. Full article
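Line Integral Convolution, as used above to generate flow-aligned streamline textures, averages a noise texture along the local streamline of a vector field at each pixel. The abstract gives no implementation details; the following is a minimal, illustrative NumPy sketch, in which the fixed streamline length, the unit-speed stepping, and the toroidal wraparound are all assumptions made for brevity.

```python
import numpy as np

def lic(vx, vy, noise, length=10):
    """Minimal Line Integral Convolution: for each pixel, average the
    noise texture along the local streamline of the (vx, vy) flow field."""
    h, w = noise.shape
    out = np.zeros_like(noise, dtype=float)
    for i in range(h):
        for j in range(w):
            acc, n = 0.0, 0
            for sign in (1.0, -1.0):          # trace both directions
                y, x = float(i), float(j)
                for _ in range(length):
                    yi, xi = int(y) % h, int(x) % w   # wrap at borders
                    acc += noise[yi, xi]
                    n += 1
                    dx, dy = vx[yi, xi], vy[yi, xi]
                    norm = np.hypot(dx, dy) or 1.0    # unit-speed step
                    x += sign * dx / norm
                    y += sign * dy / norm
            out[i, j] = acc / n
    return out
```

Because each output pixel is an average of noise samples along its streamline, the result is smoothed along the flow direction while contrast across streamlines is preserved, which is what makes the texture read as coherent flow.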

28 pages, 14283 KB  
Article
FSD-YOLO: A Fusion Framework for Region Segmentation and Deformable Object Detection in Container Yards
by Linghao Dai, Zhihong Liang, Qi Feng, Shihuan Xie and Hongxu Li
Sensors 2026, 26(7), 2029; https://doi.org/10.3390/s26072029 - 24 Mar 2026
Abstract
Safety monitoring in container hoisting operations within rail-road intermodal logistics parks is a critical task in industrial safety management. Such scenarios are characterized by complex environments, large variations in target scales, deformable object shapes, and frequent occlusions, which pose significant challenges to visual perception systems. Conventional single-task models suffer from inherent limitations, including low recall for distant small targets and insufficient adaptability to geometric deformations, making them inadequate for high-precision, real-time safety warning applications. To address these challenges, this study proposes a unified visual analysis framework that integrates semantic segmentation and object detection to enhance the recognition of small and deformable targets in complex operational environments, enabling real-time perception and safety warning for key objects and hazardous regions within container yards. Specifically, we introduce FSD-YOLO, a fusion-based architecture composed of the following key components. First, a SegFormer-based semantic segmentation module is employed to achieve pixel-level delineation of the different operational regions. Second, an improved object detection network is developed based on the YOLOv8n architecture, incorporating (1) C2f modules in the shallow layers of the backbone to enhance high-resolution feature extraction; (2) C2fDCN modules within the detection head to improve the modeling of deformable objects via deformable convolution; (3) CARAFE upsampling operators to optimize multi-scale feature fusion; and (4) a dynamic loss-weighting strategy for small objects, in which loss weights are adaptively adjusted according to target area to increase the training emphasis on small-scale targets. Finally, a decision-level fusion strategy combines the segmentation and detection outputs, enabling real-time safety judgments based on semantic rules. Experimental results on a self-constructed container yard dataset demonstrate that the proposed detection model achieves an mAP50-95 of 0.6433 and an mAP50 of 0.9565, significantly outperforming the baseline YOLOv8n model (mAP50-95: 0.5394; mAP50: 0.8435), thereby validating the effectiveness of the proposed framework. Full article
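The area-adaptive loss weighting for small objects described above, item (4), can be illustrated with a minimal sketch. The abstract does not specify the weighting function; the inverse-area power law, the 32×32-pixel reference area, the exponent `gamma`, and the cap `max_w` below are all assumptions chosen only to show the shape of such a scheme.

```python
def small_object_weight(area, ref_area=32 * 32, gamma=0.5, max_w=4.0):
    """Adaptive per-box loss weight: boxes smaller than `ref_area`
    receive a larger weight (capped at `max_w`); boxes at or above
    the reference area keep the baseline weight of 1.0."""
    w = (ref_area / max(area, 1)) ** gamma
    return min(max(w, 1.0), max_w)
```

Under these assumptions, a 16×16 box contributes twice the baseline loss, a 64×64 box stays at the baseline, and degenerate one-pixel boxes are clamped at the cap rather than dominating training.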
