Precise detection of the maize root–stem junction is crucial for hole fertilization in maize cultivation. However, detection under field conditions is severely affected by soil clods, crop residues, and weeds, and is further complicated by variations in plant morphology, the small scale of targets, and their sparse spatial distribution. To address these issues, this study proposes an improved model named PGi-YOLO, based on YOLOv11n-OBB. A P2 high-resolution detection layer is introduced to improve multi-scale feature representation and enhance small-target localization. The C2PSA-iRMB module replaces the original attention module by integrating an inverted residual mobile block (iRMB) mechanism, thereby strengthening global contextual information fusion while preserving the lightweight design. In addition, the Group Shuffle Convolution (GSConv) module replaces part of the standard convolution operations, reducing computational redundancy and improving inference efficiency. Experimental results show that PGi-YOLO achieves a precision of 92.0%, a recall of 93.4%, and an mAP@0.5 of 96.9%, with 2.61 M parameters, a model size of 6.0 MB, and an inference time of 5.1 ms. Overall, PGi-YOLO achieves a favorable balance between accuracy and efficiency, demonstrating strong robustness for maize root–stem junction detection in complex field environments and providing reliable support for precision agriculture applications. Full article

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

28 pages, 5551 KB

Open Access Article

Capacity-Aware Lightweight Object Detection for UAV Remote Sensing: Dynamic Coupling Regularity and the SP-YOLO Model Family

by Shihao Yin and Weiqiang Tang

Appl. Sci. 2026, 16(11), 5249; https://doi.org/10.3390/app16115249 (registering DOI) - 23 May 2026

Abstract

Object detection in UAV remote sensing imagery is confronted with three primary challenges: severe scale variation, densely clustered small targets, and constrained computational resources. This work introduces a family of lightweight detection models guided by the “Capacity-Aware Configuration Regularity” and incorporates a Feature-Refinement C2f module to enhance representational efficiency. A dynamic coupling mechanism is identified between detection head capacity and the representational quality of backbone features, which is further validated through systematic ablation studies spanning three parameter magnitudes. Evaluated on the VisDrone2019 benchmark, the proposed model family exhibits a progressive parameter scaling from 1.67 M to 6.15 M. The nano variant achieves 31.7% mAP₅₀ using only 55% of the parameter budget of YOLOv8n, surpassing it by 0.7 percentage points. The small variant, with a parameter budget comparable to YOLOv8n, attains 36.7% mAP₅₀, exceeding it by 5.7 points. The medium variant reaches 43.1% mAP₅₀ with 58% of the parameters of YOLOv8s, outperforming it by 4.1 points. The improvements are pronounced under the stricter mAP₅₀–₉₅ metric, where the small variant outperforms YOLOv8n by 3.3 points and the medium variant surpasses YOLOv8s by 2.8 points, demonstrating robust localization accuracy across a wide range of IoU thresholds. This consistent superiority in the accuracy–efficiency trade-off extends to the DIOR dataset, confirming the robust generalization of the proposed models across diverse remote sensing scenarios. Moreover, the uncovered capacity-matching regularity offers transferable methodological guidance for designing lightweight detection models tailored to resource-constrained platforms. Full article

(This article belongs to the Section Applied Industrial Technologies)

26 pages, 1353 KB

Open Access Article

Keypoint-Based Forest Musk Deer Behavioral Recognition Method

by Dequan Guo, Chuankang Chen, Chengli Zheng, Zhenyu Wang, Dapeng Zhang and Dening Luo

Animals 2026, 16(11), 1594; https://doi.org/10.3390/ani16111594 (registering DOI) - 23 May 2026

Abstract

The traditional monitoring of forest musk deer behavior primarily relies on direct human observation or the post hoc playback analysis of ordinary surveillance videos. This approach is not only time-consuming and labor-intensive but also highly subjective, easily leading to missing or misjudged critical behavioral information. Moreover, it is difficult to achieve real-time monitoring and anomaly warning. These limitations severely constrain the efficiency of the large-scale artificial breeding of forest musk deer and the effective advancement of wild population conservation. Thus, this study proposes a forest musk deer behavioral recognition method based on an improved YOLOv8-Pose. A forest musk deer behavior image dataset covering four typical behaviors was constructed, and 18 keypoints were systematically annotated. This study designs a Dilated Spatial Pyramid Pooling-Fast (DILATED-SPPF) module and a Multi-scale Depthwise Separable Context Mixer (MDSC-Mixer) module, and integrates them into YOLOv8-Pose. Experimental results show that the improved model outperforms the original YOLOv8-Pose and comparison models such as YOLOv11/v12-Pose on key metrics of object detection (Box-mAP50 0.929, Box-mAP50-95 0.814) and pose estimation (Pose-mAP50 0.879, Pose-mAP50-95 0.565). This study further develops a visual interactive interface that intuitively presents detection results and skeleton structures. This work provides a high-precision, low-cost automated behavior analysis tool for the artificial breeding and wild conservation of forest musk deer with significant application value for enhancing the intelligence level of endangered species protection. Full article

(This article belongs to the Special Issue The Application of Artificial Intelligence (AI) in Animal Behavior, Emotion and Health)

22 pages, 3661 KB

Open Access Article

Industrial Weld Defect Detection Based on Monocular Depth Estimation and Dual-Attention Point Cloud Network

by Nannan Zhao and Shijie Chen

Sensors 2026, 26(11), 3321; https://doi.org/10.3390/s26113321 (registering DOI) - 23 May 2026

Abstract

In industrial quality control, the precise identification of severe structural weld defects is paramount. Traditional 2D image-based detection methods are susceptible to illumination and texture interference, while high-precision 3D laser scanning solutions are costly and impractical for large-scale deployment. To achieve reliable geometric defect detection at low cost, this paper proposes a detection framework based on monocular depth estimation and a dual-attention point cloud network. First, YOLOv8 is employed for rapid region-of-interest extraction, and an advanced monocular depth estimation model generates 3D pseudo-point clouds containing geometric information. Second, because missed weld defects have distinct spatial orientation features that are prone to confusion, this paper introduces a dual-attention-enhanced point cloud classification network named DA-PointNet++. This model embeds dual-attention modules within the PointNet++ backbone network, enhancing key feature representation in both the channel and spatial dimensions. Experimental results demonstrate that this approach achieves an accuracy of 93.67% and a recall of 90.51% in a unified binary classification task for general weld defect detection, effectively identifying both normal welds and complex missed weld defects. Compared to PointConv, Dynamic Graph Convolutional Neural Network (DGCNN), and the mainstream Point Cloud Transformer, this method significantly reduces false negative rates while maintaining low computational costs, offering a cost-effective solution for industrial automation. Full article

(This article belongs to the Section Industrial Sensors)

21 pages, 4832 KB

Open Access Article

YOLOv9-Based Detection of Diseases in Poplar Trees Using Histogram Equalization and Computer Vision

by Fazliddin Makhmudov, Kudratjon Zohirov, Jura Kuvandikov, Zavqiddin Temirov, Akmalbek Abdusalomov Bobomirzayevich, Mukhriddin Mukhiddinov, Khodisakhon Muraeva, Jasur Sevinov and Furkat Bolikulov

Sensors 2026, 26(11), 3320; https://doi.org/10.3390/s26113320 (registering DOI) - 23 May 2026

Abstract

Poplar (Populus) trees are indispensable to various industries and environmental sustainability efforts. They are widely utilized for paper production, timber, and windbreaks, while also playing a significant role in carbon sequestration. Given their economic and ecological importance, the effective management of diseases is crucial. Convolutional Neural Networks (CNNs), renowned for their ability to process visual data, are pivotal in accurately detecting and classifying plant diseases. This study presents a domain-specific dataset of manually collected images of diseased poplar leaves from Uzbekistan and South Korea, ensuring geographic diversity and broader applicability. The dataset includes four disease classes, i.e., “Parsha (Scab),” “Brown spotting,” “White-Gray spotting,” and “Rust,” which represent common afflictions in these regions. To advance research efforts, this dataset will be made publicly accessible, providing a valuable resource for the scientific community. Leveraging the cutting-edge YOLOv9c model, a state-of-the-art CNN architecture, we applied the Histogram Equalization technique as a preprocessing step to enhance the image quality to increase the accuracy of disease detection. This method not only improves the diagnostic performance of the model but also provides a scalable solution for monitoring and managing poplar diseases. By ensuring the health of poplar trees, this approach supports the sustainability of these critical resources. To our knowledge, this is the first publicly available dataset specifically focused on diseased poplar leaves, making it a significant contribution to global research efforts. It offers an invaluable resource for researchers and practitioners, enabling further advancements in early disease detection and sustainable forestry management. Full article

(This article belongs to the Section Intelligent Sensors)
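
The histogram equalization preprocessing mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which variant the authors used (global, adaptive, or per-channel), so the code below assumes plain global equalization of an 8-bit grayscale image; the function name `equalize_histogram` is hypothetical and NumPy is the only dependency.

```python
import numpy as np


def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit grayscale image.

    Assumption: this is the textbook global variant, not necessarily
    the exact preprocessing used in the paper.
    """
    if gray.dtype != np.uint8 or gray.size == 0:
        raise ValueError("expected a non-empty 8-bit grayscale image")

    # Count how often each of the 256 intensity levels occurs.
    hist = np.bincount(gray.ravel(), minlength=256)
    # Cumulative distribution of intensities.
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]   # CDF value at the darkest occupied level
    total = cdf[-1]             # number of pixels

    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return gray.copy()

    # Map each level so the occupied range spreads over 0..255.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


if __name__ == "__main__":
    low_contrast = np.array([[100, 101], [102, 103]], dtype=np.uint8)
    print(equalize_histogram(low_contrast))
```

In a detection pipeline such a step would typically run on each image before it is passed to the detector; for color images one common choice is to equalize only a luminance channel, but that is also an assumption here.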

24 pages, 7825 KB

Open Access Article

SY-SLAM: Real-Time Dynamic Indoor RGB-D SLAM with SuperPoint Detection and Asynchronous YOLOv8s-Based Keypoint Suppression

by Shaoshuai Zhi, Shuangfeng Wei, Shan Zhou, Yulan Lao, Mingyang Zhai, Tianyu Yang, Keming Qu and Boyan Jiang

Sensors 2026, 26(11), 3315; https://doi.org/10.3390/s26113315 (registering DOI) - 23 May 2026

Abstract

Traditional visual SLAM pipelines are typically designed under the static-world assumption and often degrade severely in indoor environments with frequent human motion. To improve trajectory accuracy and front-end stability in such scenarios while maintaining real-time throughput, we present SY-SLAM, an RGB-D SLAM system for dynamic indoor environments with frequent human motion. (S stands for SuperPoint, which is used as a detector-only learned keypoint front-end, and Y stands for YOLO, which provides asynchronous person-aware keypoint suppression based on detected human bounding boxes.) We integrate a TensorRT-deployed detector-only SuperPoint module to improve keypoint repeatability and robustness while retaining ORB binary descriptors for efficient matching and place recognition within the ORB-SLAM3 framework. To avoid feature starvation while preserving keypoint quality, we further introduce an adaptive SuperPoint keypoint selection strategy that applies stricter filtering when keypoints are abundant and relaxes the selection constraints when they are scarce. In parallel, an asynchronous YOLOv8s TensorRT thread performs person detection with temporal bounding-box memory, and keypoints inside detected person regions are removed before ORB descriptor computation and matching to reduce dynamic-feature contamination in the front end. We evaluate SY-SLAM on five dynamic TUM RGB-D fr3 sequences using ATE and RPE metrics. Compared with ORB-SLAM3, SY-SLAM reduces ATE RMSE by 93.45% across four dynamic walking sequences. On the widely reported fr3/w/x sequence, SY-SLAM achieves competitive accuracy with recent dynamic SLAM methods while maintaining real-time performance. The system runs in real time at 46.8 Hz (21.36 ms per frame) on an Intel i9-13900H CPU with an NVIDIA RTX 4070 Laptop GPU. Full article

(This article belongs to the Section Sensors and Robotics)
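
The person-aware keypoint suppression described in the abstract above (removing keypoints that fall inside detected person bounding boxes before descriptor computation) can be sketched as follows. This is an illustrative sketch only: the function name `suppress_keypoints`, the `(x1, y1, x2, y2)` box layout, and the inclusive box borders are assumptions, and the asynchronous detection thread and temporal bounding-box memory of SY-SLAM are not modelled.

```python
import numpy as np


def suppress_keypoints(keypoints: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """Return the keypoints that lie outside every bounding box.

    keypoints: (N, 2) array of (x, y) pixel coordinates.
    boxes:     (M, 4) array of (x1, y1, x2, y2) boxes (assumed layout).
    """
    keypoints = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    if len(boxes) == 0:
        return keypoints

    x = keypoints[:, 0:1]  # shape (N, 1), broadcasts against (M,)
    y = keypoints[:, 1:2]
    # inside[i, j] is True when keypoint i lies in box j (borders included).
    inside = (
        (x >= boxes[:, 0]) & (x <= boxes[:, 2])
        & (y >= boxes[:, 1]) & (y <= boxes[:, 3])
    )
    # Keep only keypoints that are inside no box at all.
    return keypoints[~inside.any(axis=1)]


if __name__ == "__main__":
    kps = np.array([[10.0, 10.0], [50.0, 50.0], [200.0, 120.0]])
    person_boxes = np.array([[40.0, 40.0, 100.0, 100.0]])
    print(suppress_keypoints(kps, person_boxes))
```

Filtering before descriptor computation, as the abstract describes, avoids spending time on features that would be discarded anyway.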

44 pages, 2331 KB

Open Access Feature Paper Article

Image-Based Classification of Concrete Carbonation Using YOLO Models

by Yaren Aydın, Ümit Işıkdağ, Sinan Melih Nigdeli, Gebrail Bekdaş and Celal Cakiroglu

Materials 2026, 19(11), 2198; https://doi.org/10.3390/ma19112198 (registering DOI) - 23 May 2026

Abstract

Detecting the presence of carbonation is critical for monitoring structural safety and durability. Identifying the presence of carbonation reveals the risk of chemical changes within the concrete and the potential for reinforcement corrosion. This detection allows for a reliable and prioritized assessment of the structure’s current condition. Therefore, checking for the presence or absence of carbonation is a critical indicator in determining structural safety and maintenance priorities. This study explicitly addresses a critical gap in the literature, where existing carbonation research predominantly focuses on regression-based estimation of carbonation depth, while the problem of direct visual classification of carbonation presence for rapid decision-making remains underexplored. The study aims to fill this gap by developing a robust and field-applicable deep learning-based classification framework for the automated detection of carbonation presence on concrete surfaces using images, while systematically comparing the performance of different YOLO architectures and assessing the suitability of a previously unused dataset (ConcreteCARB) for carbonation classification tasks. To find the most suitable model, YOLOv8m, YOLOv11m, YOLOv12m, and YOLOv26m were compared for concrete carbonation classification. The results show that YOLOv8m and YOLOv11m achieve near-perfect accuracy (Accuracy = 0.9981, Precision = 1, Recall = 0.9964, Specificity = 1, AUC-ROC = 1). In inference efficiency analyses, the YOLOv11m model was identified as the fastest model, with the lowest latency and highest FPS. While YOLOv8m and YOLOv26m offered balanced speed-performance results, YOLOv12m showed a relatively lower processing speed. The findings indicate that YOLOv11m is the most suitable option for real-time applications. Full article

(This article belongs to the Section Construction and Building Materials)

22 pages, 1106 KB

Open Access Article

Heliocot: A Field RGB Imaging Approach for Diurnal Canopy Orientation Dynamics in Early-Season Cotton

by Uğur Çakaloğulları and Deniz İştipliler

Agriculture 2026, 16(11), 1141; https://doi.org/10.3390/agriculture16111141 - 22 May 2026

Abstract

Understanding diurnal canopy orientation in crops is important for interpreting plant responses to light and environmental conditions, yet field-based quantification remains limited. In this study, we present Heliocot, a field RGB imaging approach that converts time-resolved images into reference-area standardized projected leaf area (PLA) time series to quantify within-day canopy orientation dynamics in early-season cotton. Leaf instance segmentation was performed using YOLOv8m-seg and refined through a 144-combination post-processing optimization. On the held-out early-stage validation/tuning set, the selected workflow showed strong agreement with manual ground truth (R² = 0.948; NRMSE = 0.082) and destructive leaf area measurements (R² = 0.836). Derived diurnal metrics, including Daily Orientation Amplitude (DOA) and Peak Orientation Index (POI), consistently revealed a midday maximum (13:15) in canopy projection. Exploratory genotype-level analysis suggested negative associations between orientation indices and selected plant traits, including specific leaf area (SLA) versus DOA (r = −0.71, p = 0.021, R² = 0.508), destructive leaf area (LA) versus DOA (r = −0.69, p = 0.028, R² = 0.471), and stem dry weight (SDW) versus POI (r = −0.74, p = 0.014, R² = 0.554), while plant height was not significantly associated with POI and DOA (p > 0.05). Although currently limited to early-season conditions and two field-imaging dates, this approach provides a practical workflow for field-based monitoring of canopy projection dynamics in cotton, while broader temporal and environmental validation remains necessary. Full article

(This article belongs to the Special Issue Field Phenotyping for Precise Crop Management)

22 pages, 18195 KB

Open Access Article

A Modular Vision System for Practical Object Detection on Resource-Constrained Humanoid Robots

by MengCheng Lau and Nicolas Pottier

Biomimetics 2026, 11(6), 363; https://doi.org/10.3390/biomimetics11060363 - 22 May 2026

Abstract

Deploying modern deep learning-based vision systems on humanoid robots remains challenging due to limited onboard computational resources and legacy software constraints. This paper presents a modular vision system for practical object detection on resource-constrained humanoid platforms, based on the YOLOv9 framework. The proposed architecture adopts a dual-environment design, decoupling the perception pipeline from the robot control system to enable compatibility between modern deep learning libraries and a ROS-based platform. To support efficient deployment, task-specific lightweight models are trained and integrated into a modular pipeline optimized for CPU-only inference. The system is evaluated across multiple task scenarios derived from the FIRA RoboWorld Cup (Hurocup) competition, including Marathon, Basketball, and Archery. Performance is assessed in terms of detection accuracy and computational efficiency, demonstrating that reliable perception can be achieved at 4–8 FPS under constrained hardware conditions. The results show that the proposed approach improves robustness compared to traditional geometric vision methods, particularly in dynamic and visually complex environments, while maintaining practical, responsive task-level perception for robotic decision-making. The work highlights the trade-offs between accuracy, computational cost, and system responsiveness and demonstrates the feasibility of deploying modern object detection models on embedded humanoid platforms. Full article

(This article belongs to the Special Issue Bio-Inspired Intelligent Robot)

19 pages, 12590 KB

Open Access Article

OPTP-System: A Lightweight Pedestrian Trajectory Prediction System for Complex Occlusion Environments

by Zijian Lin, Hong Huang, Yirui Zhang and Wenfeng Zhao

Electronics 2026, 15(11), 2247; https://doi.org/10.3390/electronics15112247 - 22 May 2026

Abstract

Pedestrian trajectory prediction in complex occlusion environments remains a critical challenge for autonomous driving systems. Although high-precision prediction models have achieved notable success, they often entail substantial computational overhead and struggle to maintain both accuracy and physical plausibility under real-world occluded conditions. To address these limitations, this paper proposes OPTP-System, a lightweight prediction framework that integrates YOLOv11 with DeepSORT for robust multi-pedestrian tracking in occluded scenes. An extended Kalman filter (EKF)-based motion prediction module is employed to generate trajectory forecasts, while the EKF-derived prior knowledge guides detection re-searching in occluded regions. Furthermore, feedback from trajectory smoothing refines detection confidence, substantially enhancing the model’s capability for continuous tracking and prediction under severe occlusion. Experimental results under challenging occlusion settings (exceeding 50% occlusion) show that the proposed model reduces ADE and FDE by 30.0% and 29.3%, respectively, compared to state-of-the-art methods. These findings demonstrate that OPTP-System achieves superior prediction accuracy while maintaining computational efficiency, offering a practical solution for reliable pedestrian trajectory prediction in complex traffic environments. Full article
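
The ADE and FDE figures quoted in the abstract above are standard trajectory-prediction metrics: the average and final Euclidean displacement between predicted and ground-truth positions. The sketch below shows the usual single-trajectory definition; the function name `ade_fde` and the `(T, 2)` array layout are assumptions for illustration, and the paper's exact evaluation protocol (horizon length, averaging over pedestrians) is not given in the abstract.

```python
import numpy as np


def ade_fde(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """Average and final displacement error for one trajectory.

    pred, truth: (T, 2) arrays of (x, y) positions over T future steps.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape or pred.ndim != 2 or pred.shape[1] != 2:
        raise ValueError("pred and truth must both have shape (T, 2)")
    if pred.shape[0] == 0:
        raise ValueError("trajectories must contain at least one step")

    # Euclidean distance between prediction and ground truth at each step.
    dist = np.linalg.norm(pred - truth, axis=1)
    ade = float(dist.mean())  # mean error over all steps
    fde = float(dist[-1])     # error at the final step only
    return ade, fde


if __name__ == "__main__":
    truth = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
    pred = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
    print(ade_fde(pred, truth))
```

A relative reduction such as the ones reported is then `(baseline - proposed) / baseline` computed on the respective metric.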

27 pages, 1685 KB

Open Access Article

EMWMS-YOLO: Efficient Multi-Scale Detection Framework for Small Objects in Challenging Remote Sensing Scenes

by Shuo Tian, Yuguo Li, Jian Li, Wenzheng Sun, Longfa Chen and Na Meng

Remote Sens. 2026, 18(11), 1682; https://doi.org/10.3390/rs18111682 - 22 May 2026

Abstract

Nowadays, remote sensing images are characterized by significant scale variations, a high density of small targets, and complex background conditions, which pose substantial challenges for small-object detection. To address these issues, we propose EMWMS-YOLO, a lightweight and efficient detection framework built upon YOLOv11n. Specifically, an Efficient Multi-Scale Cross-Layer Extraction (EMSCLE) backbone is designed by integrating the Dual-Branch Feature Extraction (DBFE), Multi-Scale Feature Perception (MSFP), and Spatial Pyramid Pooling Fast with Large Separable Kernel Attention (SPPF-LSKA) modules, enabling effective multi-scale feature extraction and cross-channel interaction. Furthermore, a Multi-Scale Adaptive Feature Fusion (MSAFF) neck architecture, composed of the Channel-Enhanced Convolution (CEC) and Multi-Scale Gated Feature Fusion (MSGFF) modules, is introduced to dynamically fuse cross-scale features and enhance salient target responses while suppressing background noise. In addition, the WaveletPool module replaces conventional pooling operations to reduce information loss and feature aliasing while preserving structural details. A Detect-MultiSEAM detection head is constructed by embedding a multi-scale spatial enhancement attention mechanism, which improves feature representation under complex conditions and reduces missed detections and false positives. Finally, the ShapeIoU loss function is employed to better model geometric and morphological properties, thereby improving localization accuracy. Experimental results on the VEDAI and NWPU-VHR-10 datasets demonstrate that the proposed method achieves improvements of 9.8% and 4.1% in mAP50 over the YOLOv11n baseline, respectively, verifying its effectiveness in small-object detection. Full article

(This article belongs to the Section Remote Sensing Image Processing)

23 pages, 1978 KB

Open Access Article

A Multi-Scale Attention-Enhanced YOLOv26 Framework for Steel Structure Corrosion Detection and Segmentation

by Hongmei Hou, Zhixin Wang, Jianbo Zheng, Jinzhen Xi and Libin Tian

Buildings 2026, 16(11), 2057; https://doi.org/10.3390/buildings16112057 - 22 May 2026

Abstract

Steel structures in complex service environments are highly susceptible to corrosion, making accurate detection challenging. This study proposes an improved YOLOv26-based method for corrosion damage segmentation. A diverse dataset is constructed by combining field-collected and public data with varying lighting conditions and multi-scale features. Enhancements to the YOLOv26-seg architecture include integrating Efficient Channel Attention (ECA) in the backbone to strengthen low-contrast feature representation, designing a multi-branch attention mechanism (ECA + CBAM) in the detection head to improve small- and medium-scale target recognition, and introducing Selective Kernel Attention (SKA) in the segmentation branch to refine boundary details. The resulting YOLOv26-ECS model achieves an mAP50 of 0.920 and mAP50–95 of 0.851 on the self-constructed dataset, outperforming the baseline by 5.0% and 6.0%, respectively, while maintaining 28.34 FPS. Experiments on public datasets further demonstrate strong generalization. A GUI system is also developed for visualization and practical deployment. Overall, the proposed method delivers accurate and efficient corrosion detection and segmentation, showing strong potential for engineering applications. Full article

(This article belongs to the Section Building Structures)

30 pages, 4499 KB

Open Access Article

Gap Measurement Method for Railway Switch Machines Based on the Fusion of Deep Vision and Geometric Features

by Wenxuan Zhi, Qingsheng Feng, Shuai Xiao, Xilong He, Haowei Liu, Yiyang Zou and Hong Li

Sensors 2026, 26(11), 3280; https://doi.org/10.3390/s26113280 - 22 May 2026

Abstract

The gap dimension of a railway switch machine is a critical physical quantity for determining the locking status of railway turnouts. Under operating conditions characterized by heavy oil contamination, complex illumination, and equipment vibration, existing visual measurement methods often struggle to maintain stability and achieve sub-pixel precision. To address this issue, this paper proposes a gap measurement method based on the fusion of vision and geometric features (G-VFM). The method first utilizes a confidence-aware optimized YOLOv8 model to achieve robust localization of the gap region. Subsequently, an improved multi-channel U-Net is employed to extract soft-edge probability maps, based on which a 20-dimensional structured geometric descriptor is constructed. Finally, visual semantic features and geometric priors are fused for regression through an R34-Fusion two-stream residual network, and systematic errors are corrected using a weighted Huber loss combined with a piecewise linear calibration strategy. Test results on a constructed field dataset show that the proposed method achieves a Mean Absolute Error (MAE) of 0.0076 mm and a maximum error of 0.0193 mm. It achieves a 100% pass rate under an industrial tolerance of 0.02 mm, with an end-to-end inference time of 52.23 ms (~19.15 FPS), balancing both precision and efficiency. Further tests on illumination degradation, noise interference, and cross-batch evaluations indicate that the method maintains relatively stable performance across various complex scenarios. However, performance decreases significantly under extremely low-light conditions, suggesting that actual deployment may require integration with active lighting or multi-sensor fusion to ensure system reliability across all working conditions. Overall, this method achieves high-precision gap measurement under current experimental conditions and provides a feasible solution for vision-based switch machine status monitoring. Full article

(This article belongs to the Special Issue Advanced Sensing Technologies for Sustainable and Resilient Railway Infrastructures)

16 pages, 3229 KB

Open Access Article

Design of a Rapid License Plate Localization Algorithm Utilizing Color Statistical Features

by Mingjin Li, Xianfeng Tang, Ying Xiong, Huajie Guo, Jingqian Wu, Chao Jiang, Rui Han, Hengjia Xiang, Zhe Wang, Zhongfu Zhang and Juan Gao

Electronics 2026, 15(11), 2232; https://doi.org/10.3390/electronics15112232 - 22 May 2026

Abstract

Some traditional license plate localization algorithms adapt poorly to varied backgrounds, depend heavily on edge features, and carry high computational complexity, while deep learning-based localization models are costly to deploy and depend strongly on training. To address these problems, this paper proposes a fast license plate localization algorithm based on statistical color features. The algorithm uses the HSV color space as the main processing channel and quantifies regional color distribution characteristics by constructing the hue histogram and calculating its standard deviation and other statistics, which significantly improves the discrimination and illumination adaptability of the license plate mask against complex backgrounds. Compared with lightweight deep learning models such as “You Only Look Once Version 12 Nano” (YOLOv12n), the algorithm requires neither GPU acceleration nor model loading, requires no model training, significantly reduces deployment cost and complexity, and runs efficiently on general-purpose computing platforms. The experimental results show that, compared with the YOLOv12n model, the average processing time of the algorithm is shortened by 30.81% (when YOLOv12n is evaluated on a GPU) or 48.42% (when YOLOv12n is evaluated on a CPU), at the cost of about 5.8% in positioning accuracy. The positioning accuracy still reaches 93.7%, demonstrating high processing efficiency and excellent platform adaptability. The algorithm is lightweight, efficient, and interpretable, and is especially suitable for intelligent parking lots, edge devices, and other scenarios sensitive to real-time performance, cost, and energy consumption. Full article
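
The hue-histogram statistics mentioned in the abstract above can be illustrated with a short sketch. Everything beyond "build a hue histogram and compute its standard deviation and other statistics" is an assumption: the function names `rgb_to_hue` and `hue_statistics`, the 36-bin histogram, and the particular statistics returned are hypothetical choices, not the authors' algorithm. The standard deviation here is a plain linear one over hue values, so it ignores the wrap-around of hue near red.

```python
import numpy as np


def rgb_to_hue(rgb: np.ndarray) -> np.ndarray:
    """Hue in degrees [0, 360) for an (..., 3) RGB array with values 0..255."""
    rgb = np.asarray(rgb, dtype=float) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx = rgb.max(axis=-1)
    delta = mx - rgb.min(axis=-1)
    safe = np.where(delta == 0, 1.0, delta)  # avoid division by zero for grays
    sector = np.where(
        mx == r, ((g - b) / safe) % 6.0,
        np.where(mx == g, (b - r) / safe + 2.0, (r - g) / safe + 4.0),
    )
    # Gray pixels have no defined hue; report 0 for them.
    return np.where(delta == 0, 0.0, sector * 60.0)


def hue_statistics(region_rgb: np.ndarray, bins: int = 36) -> dict:
    """Hue histogram plus a few summary statistics for an image region."""
    hue = rgb_to_hue(region_rgb).ravel()
    hist, edges = np.histogram(hue, bins=bins, range=(0.0, 360.0))
    peak = int(hist.argmax())
    return {
        "histogram": hist,
        # Center of the most populated hue bin.
        "dominant_hue": float((edges[peak] + edges[peak + 1]) / 2.0),
        # Linear (non-circular) spread of hue values in the region.
        "hue_std": float(hue.std()),
        # Share of pixels in the dominant bin; high for uniform plates.
        "peak_fraction": float(hist[peak] / hue.size),
    }


if __name__ == "__main__":
    blue_patch = np.zeros((4, 4, 3), dtype=np.uint8)
    blue_patch[..., 2] = 255
    stats = hue_statistics(blue_patch)
    print(stats["dominant_hue"], stats["hue_std"], stats["peak_fraction"])
```

A region dominated by one plate color would be expected to show a low hue spread and a high peak fraction, which is the kind of cue a color-statistics mask could threshold on.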

30 pages, 1998 KB

Open Access Article

Tomato-Adaptive Attention YOLOv8 for Accurate and Interpretable Maturity Detection Across Diverse Environments

by Umme Fawzia Rahim, Md. Mushibur Rahman and Hiroshi Mineno

Agriculture 2026, 16(10), 1130; https://doi.org/10.3390/agriculture16101130 - 21 May 2026

Abstract

Accurate tomato maturity detection is critical for optimizing key agricultural operations in precision agriculture, including harvesting, grading, and quality control. Despite advances in deep learning and machine vision, reliable detection in real-world environments remains challenging due to cluttered backgrounds, dense fruit clustering, and subtle color differences between maturity stages. In response to these challenges, we present TAA-YOLOv8, an attention-enhanced detection architecture integrating a novel Tomato-Adaptive Attention (TAA) module that performs sequential channel–spatial feature refinement using an adaptive 1D convolution for channel recalibration and a balanced 5 × 5 spatial kernel for improved localization, enhancing discriminative representation while preserving computational efficiency. The framework is evaluated on three datasets representing diverse agricultural environments: a newly introduced Cross-Regional Tomato dataset collected from open-field farms in Bangladesh and greenhouse facilities in Japan, and two public benchmarks, Laboro Tomato and Tomato Plantfactory. TAA-YOLOv8m outperforms baseline YOLOv8m, achieving mAP@50–95 improvements of +9.29%, +9.00%, and +6.65% with F1-scores of 0.968, 0.976, and 0.955, respectively. It further surpasses attention-enhanced variants and RT-DETR-L, and remains competitive with YOLOv11m. Gradient-Weighted Class Activation Mapping (Grad-CAM) shows concentrated fruit-centered activations, providing transparent decision-making evidence and supporting stakeholder confidence in practical deployment within vision-based agricultural management systems. Full article

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
