Small object detection is fundamentally constrained by the lack of discriminative fine-grained features. Although introducing higher resolution detection scales can improve performance, it also amplifies background noise. In addition, the independently decoupled design of conventional detection heads is insufficient to address the persistent challenges of missed detections and false positives for small objects. To this end, we propose MSIA-YOLO, a YOLOv11-based detector with multi-scale semantic interaction and alignment, optimized from three complementary perspectives: feature modeling, high resolution semantic compensation, and task coordinated alignment. First, Receptive Field Attention Convolution (RFAConv) is integrated into the backbone to enhance critical local details, such as edge and texture cues, via receptive field aware attention. Second, to alleviate fine detail attenuation caused by repeated downsampling, we construct a CHSP-P2 small object detection framework with an additional P2 branch. A scale sequence fusion mechanism is further introduced to perform high resolution semantic compensation through cross scale hybrid inputs. Finally, we design a DTIA-Head (Dynamic Task Interaction and Alignment Head), which promotes joint optimization of classification and localization through dynamic task interaction and spatial alignment. Extensive experiments on the public datasets VisDrone, TinyPerson, and RSOD show that, compared with the YOLOv11n baseline, MSIA-YOLO improves mAP50 by 7.7%, 10.3%, and 1.0%, respectively, while also outperforming several advanced detectors. These results demonstrate the effectiveness and generalization capability of the proposed method in small object, dense object, and complex scene object detection scenarios. Full article

(This article belongs to the Special Issue Target Detection, Recognition, Tracking, and Positioning Using Remote Sensing and AI Techniques (Second Edition))

23 pages, 5420 KB

Open Access Article

Real-Time Detection of Rare Traffic Situations Using RGB-LiDAR Fusion and a Rule-Based Safety Agent in CARLA

by Matúš Čávojský, Matúš Dopiriak, Eugen Šlapak, Arisha Al Faruque, Tomáš Doboš and Gabriel Bugár

Appl. Sci. 2026, 16(13), 6722; https://doi.org/10.3390/app16136722 (registering DOI) - 5 Jul 2026

Abstract

Rare and safety-critical traffic situations remain challenging for autonomous driving (AD) because they are underrepresented in common training data and may include objects outside standard detector classes. This paper presents a real-time RGB-LiDAR fusion framework for detecting and reacting to rare traffic situations in CARLA (Car Learning to Act), a reproducible simulator for AD research. The approach combines YOLOv8n-based RGB perception, bird’s-eye-view (BEV) LiDAR clustering, decision-level fusion, an interpretable rule-based safety agent with hysteresis, Time-to-Collision (TTC)-aware escalation, and an automatic emergency braking (AEB) override above the CARLA autopilot. Fused observations are classified as semantic–geometric detections, semantic-only detections, or geometric-only obstacle candidates, where unmatched LiDAR clusters are treated conservatively as candidate-level physical evidence rather than confirmed rare objects. The framework was evaluated on three CARLA maps and 3CSim-inspired corner-case scenarios comprising 19,253 frames, with additional weather/lighting stress tests and a public nuScenes mini cross-platform check. On a manually annotated subset of 4800 CARLA frames, corresponding to approximately 24.9% of the recorded CARLA log, the full framework achieved 96.2% precision, 97.3% recall, and a 96.7% F1-score for safety-relevant threat detection. The control experiments show that the fusion-based safety agent reduced unnecessary braking to 1.7% compared with 8.6% for the LiDAR-only baseline and achieved event-level success on the annotated critical intervals. The proposed CPU-only implementation maintained real-time performance, with an average processing time of 34.7 ms. Full article
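The TTC-aware escalation with hysteresis mentioned in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the class name `HysteresisBrake` and both threshold values are hypothetical, chosen only to show how two separate thresholds keep a rule-based agent from toggling the brake on noisy TTC estimates.

```python
import math
from dataclasses import dataclass


def time_to_collision(distance_m: float, closing_speed_mps: float) -> float:
    """TTC = distance / closing speed; infinite when the gap is not closing."""
    if closing_speed_mps <= 0.0:
        return math.inf
    return distance_m / closing_speed_mps


@dataclass
class HysteresisBrake:
    # Hypothetical thresholds (seconds), not taken from the paper.
    brake_ttc: float = 2.0    # engage braking when TTC drops below this
    release_ttc: float = 3.5  # release only once TTC rises above this
    braking: bool = False

    def update(self, ttc: float) -> bool:
        """Advance the agent by one frame and return the braking state."""
        if not self.braking and ttc < self.brake_ttc:
            self.braking = True
        elif self.braking and ttc > self.release_ttc:
            self.braking = False
        return self.braking


agent = HysteresisBrake()
# Distances to an obstacle (m) at a constant 10 m/s closing speed.
ttcs = [time_to_collision(d, 10.0) for d in (50.0, 15.0, 30.0, 40.0)]
states = [agent.update(t) for t in ttcs]
print(states)
```

With these assumed thresholds the agent keeps braking at a TTC of 3.0 s, which lies between the two thresholds, and releases only at 4.0 s.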

19 pages, 5338 KB

Open Access Article

Enhanced DINO for Cross-Domain Transmission Tower Detection Using Remote Sensing Images

by Junjie Wang, Jiahe Tian, Lingli Zhao, Ningsheng Liao, Jie Yang and Lei Shi

Remote Sens. 2026, 18(13), 2198; https://doi.org/10.3390/rs18132198 (registering DOI) - 5 Jul 2026

Abstract

Transmission towers are fundamental components of electric power networks. However, their structure, scale and background textures vary substantially across remote sensing images acquired from different geographic regions. These discrepancies often reduce the detection accuracy of a model trained in one region when it is applied to another region. This paper proposes an enhanced DINO-based framework for cross-domain transmission tower detection that incorporates three lightweight optimisation modules. First, a Query-level Objectness Gating (QOG) module adaptively reweights decoder queries by estimating per-query objectness scores, thereby suppressing background-dominated queries. Second, MPDIoU regression is used to improve the localisation accuracy of elongated transmission tower targets. Third, a Quality-aware Scoring Module (QSM) calibrates classification confidence using predicted localisation-quality logits, thereby reducing high-confidence false detections caused by poor box alignment. Experiments are conducted on two remote sensing image datasets from different geographic regions. Under the 10% target-domain annotation setting, the proposed method achieves a precision of 0.8947, a recall of 0.8199, an F1-score of 0.8556 and an mAP@0.5 of 0.8684, outperforming the original DINO baseline and mainstream detectors including YOLOv8, YOLOv9 and YOLOv11. The results demonstrate that the proposed framework provides an effective solution for robust cross-domain detection of slender transmission tower targets in remote sensing images. Full article
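For readers unfamiliar with MPDIoU regression, the sketch below follows the commonly cited definition: IoU minus the squared distances between the two boxes' top-left corners and bottom-right corners, each normalized by the squared image diagonal. Function names and the example boxes are illustrative assumptions, and the abstract does not confirm that this exact variant is the one used.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mpdiou(pred, gt, img_w, img_h):
    """IoU penalized by normalized corner-point distances."""
    norm = img_w ** 2 + img_h ** 2
    d_top_left = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d_bottom_right = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    return box_iou(pred, gt) - d_top_left / norm - d_bottom_right / norm


def mpdiou_loss(pred, gt, img_w, img_h):
    """Regression loss: zero only when the boxes coincide."""
    return 1.0 - mpdiou(pred, gt, img_w, img_h)


# Illustrative boxes on a 100 x 100 image: the prediction is shifted 5 px.
gt_box = (5.0, 0.0, 15.0, 10.0)
pred_box = (0.0, 0.0, 10.0, 10.0)
print(mpdiou(pred_box, gt_box, 100, 100))
```

Because the corner terms stay informative even when the overlap changes little, a penalty of this form is often argued to suit thin, elongated targets such as towers.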

26 pages, 13514 KB

Open Access Article

Diffusion-Model-Based Data Augmentation for Target Detection in Side-Scan Sonar Images

by Yuanxu Yang and Tao Zhang

Remote Sens. 2026, 18(13), 2193; https://doi.org/10.3390/rs18132193 (registering DOI) - 4 Jul 2026

Abstract

Side-scan sonar images play an important role in underwater target detection, seabed mapping, and marine environment monitoring. However, the performance of deep learning-based detectors is often limited by the small scale of available sonar datasets, the high cost of data acquisition, and class imbalance among target categories. To address these issues, this paper proposes a diffusion-model-based data augmentation method for side-scan sonar target detection. A FLUX.1 diffusion model is adopted as the base generative framework and is fine-tuned using low-rank adaptation (LoRA) to adapt the pretrained model to the side-scan sonar image domain under limited training data conditions. The generated samples are further filtered and added only to the training set, while the validation and test sets are kept unchanged and contain only real sonar images. To ensure a fair evaluation of the augmentation strategy, all detection experiments are conducted using a fixed YOLOv8n (You Only Look Once version 8 nano) detector under the same training hyperparameters and three random seeds. Compared with training on the original dataset, the proposed FLUX+LoRA augmentation improves mean average precision (mAP)@0.5 from 0.7400 ± 0.0132 to 0.8582 ± 0.0328 and mAP@0.5:0.95 from 0.3994 ± 0.0187 to 0.5115 ± 0.0164. It also outperforms conventional augmentation methods under the same real-only validation/test protocol. In addition, Fréchet Inception Distance (FID)/Kernel Inception Distance (KID)-based image quality evaluation, generated-sample amount ablation, screening-strategy ablation, LoRA-rank sensitivity analysis, and a controlled 600-sample diffusion-backbone comparison are conducted. The results show that the 600-sample manually annotated FLUX+LoRA subset selected from generated samples achieves better image quality and detection performance than FLUX-base and SD1.5+LoRA under the same annotation budget. 
These findings demonstrate that FLUX+LoRA-generated sonar images can provide useful structural diversity for detector training and improve target detection performance under limited-data conditions. Full article

(This article belongs to the Section Remote Sensing Image Processing)

19 pages, 24929 KB

Open Access Article

MFFDet: Enhancing Multi-Scale Forest Fire Detection in UAV Imagery

by Zhengshen Huang, Rui Wang, Xin Li, Weili Kou, Qinyan Gu, Zengxing Li, Jiangxia Ye and Qiuhua Wang

Fire 2026, 9(7), 278; https://doi.org/10.3390/fire9070278 (registering DOI) - 4 Jul 2026

Abstract

In unmanned aerial vehicle (UAV) forest fire detection, flames and smoke exhibit dramatic scale variations. Existing methods often struggle with multi-scale feature extraction, fusion quality, and localization reliability, resulting in limited accuracy improvements. To address this issue, this study optimizes the backbone, neck, and head of YOLOv11n to propose a novel multi-scale forest fire detector (MFFDet), which consists of three key modules: (1) the Multi-Scale Feature Calibration Module (MFCM) is designed to improve multi-scale feature representation by context aggregation and detail calibration; (2) the Cross-Scale Semantic Alignment Module (CSAM) is proposed to enhance fusion quality by applying channel reorganization and local spatial refinement; and (3) the Location Quality Estimator Head (LQEH) is presented for reliable localization by mapping the statistical information of regression distributions into a localization quality score, which systematically boosts the accuracy and stability of multi-scale object detection. In addition, to alleviate the scarcity of UAV forest fire detection data, this study constructs a UAV Forest Fire Dataset (UF²D), providing important data support for UAV-based fire detection. Experiments on UF²D show that MFFDet achieves an mAP@0.5 of 70.1%, the best among all compared models, representing a 4.4% improvement over the baseline. Moreover, it attains the top performance on small, medium, and large objects, with AP_s of 20.3%, AP_m of 31.5%, and AP_l of 44.8%, highlighting MFFDet’s robustness and accuracy for multi-scale flame and smoke detection in a complex forest fire environment, which bears important practical significance for the intelligent upgrade of forest fire prevention and control. Full article

(This article belongs to the Special Issue Computer Vision and Artificial Intelligence in Fire and Flame Detection)

28 pages, 11757 KB

Open Access Article

A Structure-Aware Deep Learning Framework for Automated Bridge Inspection Integrating SegFormer-Based Structural Member Segmentation and YOLOv8 Damage Detection

by Sushama De Silva and Pang-jo Chun

Sensors 2026, 26(13), 4255; https://doi.org/10.3390/s26134255 (registering DOI) - 4 Jul 2026

Abstract

Aging bridge infrastructure and limited inspection resources have created an urgent need for automated and reliable bridge condition assessment systems. Most existing deep learning-based inspection approaches detect damage types from images without considering the structural member on which the damage occurs, limiting their practical utility for maintenance decision-making. As a pilot-scale feasibility study, this work proposes a structure-aware deep learning framework for automated bridge inspection that integrates structural member segmentation, two-class damage detection, and spatial damage-to-member association within a unified pipeline. A SegFormer-based semantic segmentation model was trained on a custom bridge inspection dataset comprising 1339 images to identify three primary structural member classes (main girder, deck slab, and abutment), achieving a test mean Intersection over Union (mIoU) of 0.851. Boundary refinement using the Segment Anything Model (SAM) in mask-prompt mode was applied to improve mask precision during training data preparation. A YOLOv8s object detection model was trained on a custom bridge damage dataset of 9142 annotated images (6531 training, 1740 validation, and 871 test images) to detect two damage classes (crack and corrosion), achieving a mean Average Precision (mAP50) of 0.445 at a confidence threshold of 0.30. The framework associates detected damage with segmented structural members using a region-based spatial assignment strategy, enabling structure-aware outputs such as “crack on main girder” and “corrosion on deck slab.” Manual evaluation on 100 bridge inspection images demonstrated a fully correct damage detection accuracy of 70.0% and a fully correct member assignment accuracy of 62.0%. When partially correct predictions were additionally considered for qualitative analysis, the corresponding accuracies increased to 84.0% and 87.0%, respectively. 
The main girder class achieved the highest combined accuracy for both damage detection (90.9%) and member assignment (93.9%). These results demonstrate the potential of the proposed framework as a first layer for AI-assisted bridge inspection by associating detected damage with structural members, providing structured inspection information to support subsequent maintenance assessment and infrastructure monitoring. Full article
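One plausible reading of the region-based damage-to-member assignment is a majority vote over the segmentation labels that fall inside each damage box. The sketch below illustrates that reading only: the class-ID mapping, the `assign_member` name, and the majority rule are assumptions, and the abstract does not specify the actual strategy.

```python
from collections import Counter

# Hypothetical label IDs for the segmentation mask (0 = background).
MEMBERS = {1: "main girder", 2: "deck slab", 3: "abutment"}


def assign_member(mask, box):
    """Return the member class covering most of the box, or None.

    mask: 2D list of integer labels indexed as mask[y][x].
    box:  (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = box
    counts = Counter(
        mask[y][x]
        for y in range(y1, y2)
        for x in range(x1, x2)
        if mask[y][x] != 0
    )
    if not counts:
        return None  # the box lies entirely on background
    label, _ = counts.most_common(1)[0]
    return MEMBERS[label]


# Toy 4 x 6 mask: left half is main girder, right half is deck slab.
mask = [[1, 1, 1, 2, 2, 2] for _ in range(4)]
damage_class, damage_box = "crack", (1, 0, 4, 2)
member = assign_member(mask, damage_box)
print(f"{damage_class} on {member}")
```

A box that straddles two members is assigned to whichever member covers more of its pixels, which is one way partially correct assignments could arise.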

(This article belongs to the Section Fault Diagnosis & Sensors)

31 pages, 3034 KB

Open Access Article

Multi-Feature Fusion and Optimization for Micropterus salmoides Tracking and Body Length Monitoring in Complex Aquaculture Environments

by Ziyi Yin, Guanxu Li, Zhiyi Liu, Feng Liu, Mai Li and Chengguo Wang

Sensors 2026, 26(13), 4250; https://doi.org/10.3390/s26134250 (registering DOI) - 4 Jul 2026

Abstract

To achieve non-contact and continuous monitoring of body length in Micropterus salmoides and overcome the stress damage and subjective error associated with traditional manual measurement, this paper proposes an improved YOLOv8-based multi-target tracking framework for intensive recirculating aquaculture systems. The system employs a geometric measurement framework based on monocular vision that achieves conversion from pixel coordinates to actual body length through camera calibration, water-surface refraction correction, and pose projection correction. Under a collaborative optimization framework integrating detection and tracking, the model incorporates multi-scale feature enhancement, lightweight re-identification (ReID), and a robust data association mechanism, which improves system stability under conditions of high fish density, variable illumination, and turbid water. A shallow feature fusion path is introduced to enhance small-target perception, and a MobileNetV3_ReID model is adopted to extract highly discriminative appearance features, which improves identity consistency while maintaining model compactness. In the data association stage, a hybrid cost matrix integrating IoU, cosine similarity, and motion consistency is constructed, and optimal matching is realized through the Hungarian algorithm. Dynamic threshold adjustment and an exponential moving-average feature-update strategy are introduced to effectively suppress identity switching. Experiments were conducted on an overhead video dataset of Micropterus salmoides collected at a recirculating aquaculture system facility. The results show that the proposed method achieves 82.7% mAP50 while maintaining a real-time throughput of 88 FPS, with MOTA reaching 76.9% and IDF1 reaching 81.5%—the latter representing an improvement of 3.2 percentage points over BoT-SORT and 5.3 percentage points over the YOLOv8 baseline tracker. 
The number of identity switches (IDSW) decreased from 89 in the baseline configuration to 39, a reduction of 56.2%. Crucially, these component-level improvements translate into a body length error (BLE) of 5.2 ± 1.8% (MAE = 1.35 cm, Pearson r = 0.972), representing a 38.8% improvement over the baseline BLE of 8.5% and satisfying the 5–10% tolerance required for aquaculture growth monitoring. Ablation analysis confirms that both detection enhancements (contributing −1.3% BLE) and tracking optimizations (contributing −2.0% BLE) are necessary to achieve this application-level accuracy. Full article
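The data association step described in this abstract (a hybrid cost over IoU, cosine similarity, and motion consistency, followed by optimal matching and an exponential moving-average feature update) can be sketched as below. The weights, the centre-distance motion term, and all names are assumptions. An exhaustive search over permutations stands in for the Hungarian algorithm so that the example needs only the standard library.

```python
import math
from itertools import permutations


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def motion_distance(pred_box, det_box):
    """Centre distance scaled by the track box diagonal, clipped to [0, 1]."""
    px, py = (pred_box[0] + pred_box[2]) / 2, (pred_box[1] + pred_box[3]) / 2
    dx, dy = (det_box[0] + det_box[2]) / 2, (det_box[1] + det_box[3]) / 2
    diag = math.hypot(pred_box[2] - pred_box[0], pred_box[3] - pred_box[1])
    return min(1.0, math.hypot(dx - px, dy - py) / diag) if diag else 1.0


def hybrid_cost(track, det, w_iou=0.4, w_app=0.4, w_motion=0.2):
    """Weighted sum of the three cues; the weights are hypothetical."""
    return (w_iou * (1.0 - iou(track["box"], det["box"]))
            + w_app * (1.0 - cosine(track["feat"], det["feat"]))
            + w_motion * motion_distance(track["box"], det["box"]))


def best_assignment(cost):
    """Exhaustive minimum-cost matching for tiny inputs.

    Stand-in for the Hungarian algorithm; assumes a non-empty cost matrix
    with at least as many detections (columns) as tracks (rows).
    """
    n_tracks, n_dets = len(cost), len(cost[0])
    best = min(permutations(range(n_dets), n_tracks),
               key=lambda p: sum(cost[i][p[i]] for i in range(n_tracks)))
    return list(enumerate(best))


def ema_update(old_feat, new_feat, alpha=0.9):
    """Exponential moving-average update of a track's appearance feature."""
    return [alpha * o + (1.0 - alpha) * n for o, n in zip(old_feat, new_feat)]


tracks = [{"box": (0, 0, 10, 10), "feat": [1.0, 0.0]},
          {"box": (20, 20, 30, 30), "feat": [0.0, 1.0]}]
dets = [{"box": (21, 20, 31, 30), "feat": [0.1, 0.9]},
        {"box": (1, 0, 11, 10), "feat": [0.9, 0.1]}]
cost = [[hybrid_cost(t, d) for d in dets] for t in tracks]
matches = best_assignment(cost)
print(matches)
```

In practice a library routine such as SciPy's `linear_sum_assignment` would replace the exhaustive search, which is only workable for a handful of targets.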

(This article belongs to the Section Smart Agriculture)

25 pages, 15657 KB

Open Access Article

YOLO-DC: A Crop Detection and Counting Network for UAV-Based Agricultural Scenes

by Haotian Bai, Lei Liu, Haocheng Kong, Xiaoyu Li and Yuefeng Du

Remote Sens. 2026, 18(13), 2187; https://doi.org/10.3390/rs18132187 (registering DOI) - 4 Jul 2026

Abstract

Crop targets in UAV aerial images are typically characterized by small scale, dense distribution, severe mutual occlusion, and complex backgrounds, which often lead to low detection accuracy and large counting errors for existing deep learning models. To address these issues, this study proposes an improved YOLOv12-based crop detection and counting model, named YOLO-DC. By introducing an attention mechanism (LGCB-AM) and a multi-scale detection head (MS-DH), the proposed model effectively enhances local texture extraction, global modeling, foreground–background contrast, and boundary perception for dense small objects. Subsequently, a series of comparative experiments, ablation studies, and transfer experiments were conducted on the wheat and rice datasets. The results show that YOLO-DC achieves a favorable balance among detection accuracy, counting error, and model efficiency and overall outperforms the other comparison models. Ablation studies further verify the effectiveness of the proposed design, showing that LGCB-AM is the key contributor to the performance improvement, while the boundary branch and repulsion branch play critical roles in dense-target discrimination. In addition, an appropriate module insertion strategy can effectively balance high-level semantic enhancement and feature fusion stability. Transfer experiments demonstrate that pretraining on the wheat dataset and fine-tuning on the rice dataset significantly outperform training from scratch, indicating strong cross-crop transfer potential. Overall, the proposed YOLO-DC provides an effective solution for high-precision crop detection and counting in agricultural scenarios. Full article

(This article belongs to the Special Issue Application of UAV Images in Precision Agriculture)

33 pages, 17421 KB

Open Access Article

A Diffusion-Regularized Object Detection Framework for Agricultural Target Detection with Theoretical Analysis

by Yung-Hsiang Chen, Wan-Ju Lin, Kuang-Yueh Pan and Yi-Hong Lin

Mathematics 2026, 14(13), 2373; https://doi.org/10.3390/math14132373 - 3 Jul 2026

Abstract

Accurate object detection in agricultural environments remains challenging due to illumination variation, background clutter, partial occlusion, and overlapping fruits. Conventional object detection methods mainly rely on deterministic data augmentation strategies or feature-level refinement, which often exhibit limited robustness under complex field conditions. To address this issue, this paper proposes a Diffusion-Regularized Object Detection (DROD) framework for robust pineapple target detection in agricultural imagery. The proposed framework introduces a mathematically grounded forward diffusion and diffusion-guided representation mechanism directly in the image domain, where stochastic perturbations are generated through forward diffusion and semantically meaningful image representations are learned via diffusion-guided representation. A unified optimization framework and theoretical analyses of perturbation propagation, Lipschitz stability, and training convergence are further established to provide mathematical support for the proposed method. Extensive experiments were conducted on a self-constructed dataset containing 1600 real-world pineapple images collected under practical agricultural conditions. Comparative evaluations involving YOLOv8-s, YOLOv8-L, traditional data augmentation, and the recent JTA:GAN method demonstrate that the proposed DROD framework consistently achieves the best detection performance in terms of Precision, Recall, mAP@0.5, and mAP@0.5:0.95 while maintaining computational complexity and inference speed comparable to the original YOLOv8 architecture. Furthermore, ablation studies, diffusion parameter sensitivity analysis, visualization analysis, and experimental validation under different perturbation levels consistently verify the effectiveness and robustness of the proposed diffusion mechanism. 
These results demonstrate that diffusion-based regularization provides an effective and computationally efficient solution for robust agricultural object detection and offers a practical framework for intelligent precision agriculture applications. Full article

(This article belongs to the Special Issue Mathematics Methods of Robotics and Intelligent Systems)

21 pages, 21481 KB

Open Access Article

Computer Vision-Based Airport Turnaround Monitoring Using YOLOv11, Multi-Object Tracking, and Motion-Based Passenger and Baggage Activity Detection

by Nutchanon Suvittawat and De Wen Soh

Sensors 2026, 26(13), 4231; https://doi.org/10.3390/s26134231 - 3 Jul 2026

Abstract

Airport turnaround is an important operational process that directly affects flight punctuality, airport capacity, and ground-handling efficiency. However, many turnaround activities are still monitored manually or through fragmented operational records, which can limit real-time visibility and delay identification. This study proposes a computer vision-based airport turnaround monitoring pipeline that integrates YOLOv11 object detection, Norfair multi-object tracking, and frame differencing-based motion analysis to extract key operational events from airport video footage. Publicly available turnaround footage from Shinshu Matsumoto Airport, Japan, was collected under different environmental conditions, including daytime, nighttime, rainy, after-rain, and transition lighting conditions. From selected videos, 1446 images were labeled into 11 airport turnaround object classes: tow tug, aerobridge, airplane, baggage container, belt loader, belt loader roof, fuel line, fuel tanker, fuel tube, tractor, and window. The dataset was divided into training, validation, and testing sets using a 70:20:10 ratio. The trained YOLOv11 model achieved strong detection performance, with an overall test precision of 0.9609, recall of 0.9445, and mAP50 of 0.9617. To support activity-level interpretation beyond object detection, the proposed pipeline applies frame differencing within specific regions of interest, including the aerobridge window region for passenger deboarding and boarding detection, and the belt loader roof region for baggage unloading and loading detection. The extracted object detections, motion spikes, and temporal logs are then converted into a Gantt chart that summarizes major turnaround activities, including airplane parking, deboarding, baggage unloading, refueling, baggage loading, boarding, and pushback. 
The results demonstrate that the proposed modified YOLO-based pipeline can transform ordinary airport video footage into structured operational timelines, supporting more transparent, data-driven, and automated monitoring of airport turnaround processes. Full article
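Frame differencing inside a region of interest, as described in this abstract, can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's pipeline: frames are plain 2D lists of grayscale values, and `diff_thresh` and `spike_thresh` are hypothetical parameters.

```python
def roi_motion_score(prev, curr, roi, diff_thresh=25):
    """Fraction of ROI pixels whose intensity changed by more than diff_thresh.

    prev, curr: 2D lists of grayscale values indexed as frame[y][x].
    roi: (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    area = (x2 - x1) * (y2 - y1)
    if area <= 0:
        return 0.0
    changed = sum(
        1
        for y in range(y1, y2)
        for x in range(x1, x2)
        if abs(curr[y][x] - prev[y][x]) > diff_thresh
    )
    return changed / area


def motion_events(scores, spike_thresh=0.2):
    """Group consecutive frames with score >= spike_thresh into intervals."""
    events, start = [], None
    for i, score in enumerate(scores):
        if score >= spike_thresh and start is None:
            start = i
        elif score < spike_thresh and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(scores) - 1))
    return events


# Toy 4 x 4 frames: two pixels change inside a 2 x 2 ROI.
prev_frame = [[0] * 4 for _ in range(4)]
curr_frame = [[0] * 4 for _ in range(4)]
curr_frame[0][0] = 200
curr_frame[1][1] = 200
score = roi_motion_score(prev_frame, curr_frame, (0, 0, 2, 2))
events = motion_events([0.0, 0.5, 0.6, 0.0, 0.3])
print(score, events)
```

Each resulting interval of frame indices could then be mapped to timestamps and drawn as one bar in a Gantt-style timeline.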

(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems—2nd Edition)

24 pages, 4429 KB

Open Access Article

From Detection to Functional Analysis: Evaluating Vehicle Detection Models in High-Resolution Earth Observation Imagery

by Damian Wierzbicki, Kinga Karwowska, Wojciech Karwowski and Vladimir Kovarik

Remote Sens. 2026, 18(13), 2166; https://doi.org/10.3390/rs18132166 - 3 Jul 2026

Abstract

The rapid development of deep learning methods has significantly improved the effectiveness of object detection in Earth Observation (EO) imagery. However, standard metrics such as Mean Average Precision (mAP) do not fully reflect their utility in operational analyses. This paper proposes a multi-stage methodology for evaluating vehicle detection models, combining classical evaluation with functional analysis encompassing object counting, density estimation, and occupancy index. The research was conducted on high-resolution imagery (WorldView, Pleiades) and the xView dataset, evaluating five YOLO variants alongside transformer-based and two-stage detectors under three training strategies, including fine-tuning. The results show that models achieving high mAP values (up to 0.952) can simultaneously produce significant errors in object count estimation. Models trained exclusively on xView exhibit a substantial performance drop (mAP@0.50 ≈ 0.45) under domain shift conditions. The best results were obtained using a fusion-based approach combining YOLOv9 and YOLOv12, which reduced the mean relative error to 0.14 and the counting error to 13 objects, maintaining a low density error (0.0023). Functional validation across 20 parking areas confirmed the stability of the proposed approach. The findings confirm that functional analysis constitutes a critical complement to classical evaluation in remote sensing applications. Full article
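The functional measures named in this abstract (object counting, density estimation, and occupancy index) can be illustrated with simple definitions. The formulas below are assumptions made for illustration, since the abstract does not define them, and the counts are toy values rather than results from the paper.

```python
def count_error(pred_count, true_count):
    """Absolute difference between detected and reference vehicle counts."""
    return abs(pred_count - true_count)


def mean_relative_error(pred_counts, true_counts):
    """Mean of |pred - true| / true over areas with a non-zero reference."""
    ratios = [abs(p - t) / t for p, t in zip(pred_counts, true_counts) if t > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


def density(count, area_m2):
    """Vehicles per square metre of the analysed area."""
    return count / area_m2


def occupancy_index(count, capacity):
    """Share of available parking spaces that are occupied."""
    return count / capacity


# Toy values for two parking areas (not taken from the paper).
predicted = [90, 110]
reference = [100, 100]
mre = mean_relative_error(predicted, reference)
print(mre, count_error(predicted[0], reference[0]))
```

Metrics of this kind are sensitive to every missed or duplicated detection, which is one reason a model with a high mAP can still miscount.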

(This article belongs to the Special Issue Object Detection in Remote Sensing Images Based on Artificial Intelligence)

27 pages, 12344 KB

Open Access Article

A Lightweight Small-Object Detector for UAV Imagery via Multi-Scale Feature Enhancement and Saliency-Guided Cross-Layer Fusion

by Hao Zhen, Guijun Chen, Fangli Guan, Liqi Yan, Zhixiang Fang, Jianhui Zhang, Haosheng Huang and Pan Li

Remote Sens. 2026, 18(13), 2164; https://doi.org/10.3390/rs18132164 - 3 Jul 2026

Abstract

As unmanned aerial vehicles (UAVs) become central to traffic inspection, urban security, and emergency response, UAV-based environmental perception requires both high accuracy and real-time efficiency. However, UAV imagery remains challenging due to three primary factors: detail loss, where small targets occupy minimal pixels and weak edges are diluted by downsampling; ineffective cross-scale fusion, where semantic gaps between shallow and deep features lead to scale misalignment and small-object suppression; and environmental interference, where clutter, occlusion, and dense layouts cause localization drift. To address these challenges, we propose an optimized efficient detector built upon the YOLOv8s framework, incorporating multi-scale feature enhancement and saliency-guided cross-layer fusion. Specifically, we integrate RFCAConv and RGCSP modules into the backbone to strengthen local detail and spatial structure modeling. Furthermore, we design a Multi-Scale Adaptive Fusion Module (MSAFM) to align deep and shallow cues through dual-pooling and adaptive channel recalibration. To handle complex backgrounds, a Saliency-Guided Contextual Attention Module (CASM) is introduced to emphasize target regions, alongside a dynamic detection head for adaptive feature modulation. Evaluated on the VisDrone2019 dataset, our method achieves 48.3% mAP@0.5 and 29.0% mAP@[0.5:0.95], outperforming YOLOv8s by 10.2 and 6.3 points, respectively, while keeping the model compact with 7.2M parameters and a 14.4 MB model size. Full article

(This article belongs to the Special Issue Small Target Detection, Recognition, and Tracking in Remote Sensing)

26 pages, 48368 KB

Open Access Article

Foreign Object Detection Model for Retail Cabinets Under Complex Backgrounds

by Zhenshuo Zhou, Kai Xie, Wei Zhang and Jianbiao He

Electronics 2026, 15(13), 2920; https://doi.org/10.3390/electronics15132920 - 3 Jul 2026

Abstract

With the rapid expansion of the unmanned retail ecosystem, real-time foreign object detection (FOD) in smart vending cabinets has become a critical technology for ensuring equipment safety and protecting user rights. However, existing models often face bottlenecks in accuracy when dealing with small targets and occlusion scenarios, and struggle to balance accuracy with speed on edge devices. To address these challenges, this paper proposes an improved model specifically designed for foreign object detection based on the YOLOv11n framework, named YOLOv11n-FOD (foreign object detection). In terms of algorithm design, this paper reconstructs the feature extraction and fusion paradigm. Specifically, the original C3K2 module in the backbone network is replaced with a C3K2-SAC (Spatial Attention Convolution) module incorporating an attention mechanism, which enhances global context modeling capabilities. Subsequently, the CARAFE (Content-Aware ReAssembly of Features) operator is introduced to replace traditional interpolation, significantly improving sensitivity to small targets and textural details. Furthermore, the CBAM (Convolutional Block Attention Module) is integrated into the downsampling stage to suppress background noise while reducing computational redundancy. Notably, these improvements maintain an extremely lightweight architecture, increasing computational overhead by only 0.3 GFLOPs. Experimental results demonstrate that the proposed YOLOv11n-FOD achieves significant performance gains: mAP@50 is increased by 0.4%, and mAP@50-95 is improved by 1.0%. Extensive experiments on the SKU-110K dataset further verify the superior performance of the proposed model. In conclusion, this study effectively balances detection accuracy, model complexity, and inference speed, providing an efficient solution for foreign object detection in smart retail cabinets. Full article

(This article belongs to the Special Issue Intelligent Sensing Empowered by Artificial Intelligence)

33 pages, 7330 KB

Open Access Article

Lightweight Small-Object Defect Detection for Industrial Small Transformers Based on an Improved YOLOv12 Network

by Jitao Zou, Fan Zhang and Changlong Wang

Appl. Sci. 2026, 16(13), 6664; https://doi.org/10.3390/app16136664 - 3 Jul 2026

Abstract

Appearance defect detection of small industrial transformers is challenging because defects such as bent pins, missing pins, wire breakage, and missing wires are usually small in size and weak in visual features. To improve detection accuracy while maintaining real-time deployment capability, this study proposes an improved lightweight object detection model, named YOLOv12-Optimized, for small transformer quality inspection. First, re-parameterized ghost (RepGhost) modules are introduced into the backbone network to enhance fine-grained feature extraction and reduce computational redundancy. Second, an improved convolutional block attention module (CBAM) is embedded in the neck network to strengthen the response to weak defect features and suppress background interference. Third, an improved wise intersection over union (WIoU) loss function with numerical stability constraints is adopted to improve bounding-box regression robustness for dense small targets. A dedicated small transformer defect dataset was constructed using industrial camera images and data augmentation. Ablation experiments demonstrate that RepGhost, improved CBAM, and improved WIoU each contribute to performance improvement, and their combination achieves the best overall results. Compared with the baseline YOLOv12 model, YOLOv12-Optimized improves mean average precision at an intersection over union threshold of 0.5 (mAP@0.5) from 77.48% to 89.17%, with precision and recall reaching 88.61% and 84.07%, respectively. The model maintains a lightweight structure with 1.98 M parameters and 5.15 giga floating-point operations (GFLOPs), while satisfying real-time inspection requirements. The results indicate that the proposed method effectively balances detection accuracy, model complexity, and industrial applicability, providing a feasible solution for automated appearance quality inspection of small transformers. Full article

(This article belongs to the Section Computing and Artificial Intelligence)

34 pages, 41500 KB

Open Access Article

Training-Free Defect Image Generation with Multi-Domain Consistency and Geometric-Semantic Constraints for Industrial Visual Sensing Inspection

by Yushen Wang, Dengbiao Jiang, Yiming Wang, Kelong Zhu and Guoquan Yao

Sensors 2026, 26(13), 4216; https://doi.org/10.3390/s26134216 - 3 Jul 2026

Abstract

Industrial defect generation has long been challenged by the scarcity of real anomaly samples and the imbalance of defect categories, particularly in complex industrial scenarios involving transparent containers. Taking vials as an example, glass reflection, specular highlights, and fine-grained defects make continuous defect acquisition difficult, thereby making the realism and controllability of augmented samples critical to downstream detection performance. Although existing diffusion-based generation methods can improve synthetic image quality, they often require additional training or lightweight fine-tuning, which limits their efficiency in sample-limited industrial scenarios. To address this issue, this paper builds upon the TF-IDG framework and proposes a training-free industrial defect generation method based on multi-domain consistency and geometric-semantic constraints. To alleviate the unnatural texture details, boundary transitions, and background blending commonly observed in generated defects, a multi-domain consistency constraint is introduced to enhance generation realism from both frequency-domain structures and cross-domain contextual representations, thereby improving anomaly texture expression and overall visual coherence. To further mitigate unstable defect contours, spatial deviation, and structural mismatch with target objects, a geometric-semantic constraint is designed to regulate the generation process through elastic shape constraints and semantic region-anchored attention, enhancing the rationality of defect morphology evolution and spatial localization. Experimental results on both the MVTec AD dataset and a self-built vial defect dataset demonstrate that the proposed method outperforms comparative approaches. Specifically, when YOLOv11 is used as the downstream detector, the mAP@50 on the MVTec AD dataset and the self-built vial defect dataset is improved from 88.5% and 98.0% for the TF-IDG baseline to 89.6% and 98.8%, respectively. 
Full article

(This article belongs to the Section Industrial Sensors)
