Search Results (387)

Search Parameters:
Keywords = deep instance segmentation

20 pages, 4637 KB  
Article
A Lightweight YOLOv13-G Framework for High-Precision Building Instance Segmentation in Complex UAV Scenes
by Yao Qu, Libin Tian, Jijun Miao, Sergei Leonovich, Yanchun Liu, Caiwei Liu and Panfeng Ba
Buildings 2026, 16(3), 559; https://doi.org/10.3390/buildings16030559 - 29 Jan 2026
Abstract
Accurate building instance segmentation from UAV imagery remains a challenging task due to significant scale variations, complex backgrounds, and frequent occlusions. To tackle these issues, this paper proposes an improved lightweight YOLOv13-G-based framework for building extraction in UAV imagery. The backbone network is enhanced by incorporating cross-stage lightweight connections and dilated convolutions, which improve multi-scale feature representation and expand the receptive field with minimal computational cost. Furthermore, a coordinate attention mechanism and an adaptive feature fusion module are introduced to enhance spatial awareness and dynamically balance multi-level features. Extensive experiments on a large-scale dataset, which includes both public benchmarks and real UAV images, demonstrate that the proposed method achieves superior segmentation accuracy with a mean intersection over union of 93.12% and a real-time inference speed of 38.46 frames per second while maintaining a compact model size of 5.66 MB. Ablation studies and cross-dataset experiments further validate the effectiveness and generalization capability of the framework, highlighting its strong potential for practical UAV-based urban applications.
(This article belongs to the Topic Application of Smart Technologies in Buildings)
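The efficiency figures quoted above (frames per second and on-disk model size) are straightforward to reproduce for any PyTorch segmentation model. The sketch below is a generic, hypothetical harness, not the authors' code; the model, dummy input, and checkpoint path are placeholders.

```python
# Hypothetical timing harness for FPS and model-size metrics of the kind
# reported above. Works with any torch.nn.Module and fixed-size input.
import os
import time
import torch

def measure_fps(model: torch.nn.Module, dummy: torch.Tensor,
                warmup: int = 10, runs: int = 100) -> float:
    """Average forward passes per second on a fixed dummy input."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warm-up stabilizes clocks/caches
            model(dummy)
        start = time.perf_counter()
        for _ in range(runs):
            model(dummy)
        elapsed = time.perf_counter() - start
    return runs / elapsed

def model_size_mb(checkpoint_path: str) -> float:
    """On-disk checkpoint size in MB (the '5.66 MB'-style figure)."""
    return os.path.getsize(checkpoint_path) / (1024 ** 2)
```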

29 pages, 1843 KB  
Systematic Review
Deep Learning for Tree Crown Detection and Delineation Using UAV and High-Resolution Imagery for Biometric Parameter Extraction: A Systematic Review
by Abdulrahman Sufyan Taha Mohammed Aldaeri, Chan Yee Kit, Lim Sin Ting and Mohamad Razmil Bin Abdul Rahman
Forests 2026, 17(2), 179; https://doi.org/10.3390/f17020179 - 29 Jan 2026
Abstract
Mapping individual-tree crowns (ITCs) along with extracting tree morphological attributes provides the core parameters required for estimating thermal stress and carbon emission functions. However, calculating morphological attributes relies on the prior delineation of ITCs. Using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework, this review synthesizes how deep-learning (DL)-based methods enable the conversion of crown geometry into reliable biometric parameter extraction (BPE) from high-resolution imagery. This addresses a gap often overlooked in studies focused solely on detection by providing a direct link to forest inventory metrics. Our review showed that instance segmentation dominates (approximately 46% of studies), producing the most accurate pixel-level masks for BPE, while RGB imagery is most common (73%), often integrated with canopy-height models (CHMs) to enhance accuracy. Newer architectures such as StarDist outperform Mask R-CNN by 6% in dense canopies. However, performance differs with crown overlap, occlusion, species diversity, and the poor transferability of allometric equations. Future work could prioritize multisensor data fusion, develop end-to-end biomass modeling to minimize allometric dependence, build open datasets to address model generalizability, and enhance and test models like StarDist for higher accuracy in dense forests.

41 pages, 5796 KB  
Article
Comparative Analysis of R-CNN and YOLOv8 Segmentation Features for Tomato Ripening Stage Classification and Quality Estimation
by Ali Ahmad, Jaime Lloret, Lorena Parra, Sandra Sendra and Francesco Di Gioia
Horticulturae 2026, 12(2), 127; https://doi.org/10.3390/horticulturae12020127 - 23 Jan 2026
Viewed by 171
Abstract
Accurate classification of tomato ripening stages and quality estimation is pivotal for optimizing post-harvest management and ensuring market value. This study presents a rigorous comparative analysis of morphological and colorimetric features extracted via two state-of-the-art deep learning-based instance segmentation frameworks—Mask R-CNN and YOLOv8n-seg—and their efficacy in machine learning-driven ripening stage classification and quality prediction. Using 216 fresh-market tomato fruits across four defined ripening stages, we extracted 27 image-derived features per model, alongside 12 laboratory-measured physio-morphological traits. Multivariate analyses revealed that R-CNN features capture nuanced colorimetric and structural variations, while YOLOv8 emphasizes morphological characteristics. Machine learning classifiers trained with stratified 10-fold cross-validation achieved up to 95.3% F1-score when combining both feature sets, with R-CNN and YOLOv8 alone attaining 96.9% and 90.8% accuracy, respectively. These findings highlight a trade-off between the superior precision of R-CNN and the real-time scalability of YOLOv8. Our results demonstrate the potential of integrating complementary segmentation-derived features with laboratory metrics to enable robust, non-destructive phenotyping. This work advances the application of vision-based machine learning in precision agriculture, facilitating automated, scalable, and accurate monitoring of fruit maturity and quality.
(This article belongs to the Special Issue Sustainable Practices in Smart Greenhouses)
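The evaluation protocol named above, stratified 10-fold cross-validation over combined segmentation-derived feature sets, maps directly onto scikit-learn. The following minimal sketch uses synthetic stand-ins for the two 27-feature arrays and an arbitrary classifier choice; neither is the study's actual data or model.

```python
# Sketch: stratified 10-fold CV on concatenated feature sets (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X_rcnn = rng.normal(size=(216, 27))   # stand-in for 27 Mask R-CNN features
X_yolo = rng.normal(size=(216, 27))   # stand-in for 27 YOLOv8 features
y = rng.integers(0, 4, size=216)      # four ripening stages

X = np.hstack([X_rcnn, X_yolo])       # combine both feature sets
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=cv, scoring="f1_macro")
print(f"macro-F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```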

27 pages, 4802 KB  
Article
Fine-Grained Radar Hand Gesture Recognition Method Based on Variable-Channel DRSN
by Penghui Chen, Siben Li, Chenchen Yuan, Yujing Bai and Jun Wang
Electronics 2026, 15(2), 437; https://doi.org/10.3390/electronics15020437 - 19 Jan 2026
Viewed by 148
Abstract
With the ongoing miniaturization of smart devices, fine-grained hand gesture recognition using millimeter-wave radar has attracted increasing attention, yet practical deployment remains challenging in continuous-gesture segmentation, robust feature extraction, and reliable classification. This paper presents an end-to-end fine-grained gesture recognition framework based on frequency-modulated continuous-wave (FMCW) millimeter-wave radar, including gesture design, data acquisition, feature construction, and neural network-based classification. Ten gesture types are recorded (eight valid gestures and two return-to-neutral gestures); for classification, the two return-to-neutral gesture types are merged into a single invalid class, yielding a nine-class task. A sliding-window segmentation method is developed using short-time Fourier transform (STFT)-based Doppler-time representations, and a dataset of 4050 labeled samples is collected. Multiple signal classification (MUSIC)-based super-resolution estimation is adopted to construct range–time and angle–time representations, and instance-wise normalization is applied to Doppler and range features to mitigate inter-individual variability without test leakage. For recognition, a variable-channel deep residual shrinkage network (DRSN) is employed to improve robustness to noise, supporting single-, dual-, and triple-channel feature inputs. Results under both subject-dependent evaluation with repeated random splits and subject-independent leave-one-subject-out (LOSO) cross-validation show that the DRSN architecture consistently outperforms the RefineNet-based baseline, and the triple-channel configuration achieves the best performance (98.88% accuracy). Overall, the variable-channel design enables flexible feature selection to meet diverse application requirements.
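A Doppler-time map of the kind used for the sliding-window segmentation can be sketched with scipy's STFT. The simulated beat signal, window length, and energy threshold below are illustrative assumptions, not the paper's settings.

```python
# Toy STFT-based Doppler-time map plus a sliding-window activity detector.
import numpy as np
from scipy.signal import stft

fs = 1000.0                                  # slow-time sampling rate (assumed)
t = np.arange(0, 2.0, 1 / fs)
beat = np.exp(1j * 2 * np.pi * 80 * t)       # toy target at 80 Hz Doppler

freqs, frames, Z = stft(beat, fs=fs, nperseg=128, noverlap=96,
                        return_onesided=False)
doppler_time = np.abs(np.fft.fftshift(Z, axes=0))   # |STFT|, Doppler x time

# Sliding-window segmentation: flag frames whose energy exceeds a threshold.
energy = doppler_time.sum(axis=0)
active = energy > 1.5 * np.median(energy)
print(f"{active.sum()} of {active.size} frames flagged as gesture activity")
```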

17 pages, 11104 KB  
Article
Lightweight Improvements to the Pomelo Image Segmentation Method for YOLOv8n-seg
by Zhen Li, Baiwei Cao, Zhengwei Yu, Qingting Jin, Shilei Lyu, Xiaoyi Chen and Danting Mao
Agriculture 2026, 16(2), 186; https://doi.org/10.3390/agriculture16020186 - 12 Jan 2026
Viewed by 304
Abstract
Instance segmentation in agricultural robotics requires a balance between real-time performance and accuracy. This study proposes a lightweight pomelo image segmentation method based on the YOLOv8n-seg model integrated with the RepGhost module. A pomelo dataset consisting of 5076 samples was constructed through systematic image acquisition, annotation, and data augmentation. The RepGhost architecture was incorporated into the C2f module of the YOLOv8-seg backbone network to enhance feature reuse capabilities while reducing computational complexity. Experimental results demonstrate that the YOLOv8-seg-RepGhost model enhances efficiency without compromising accuracy: parameter count is reduced by 16.5% (from 3.41 M to 2.84 M), computational load decreases by 14.8% (from 12.8 GFLOPs to 10.9 GFLOPs), and inference time is shortened by 6.3% (to 15 ms). The model maintains excellent detection performance with bounding box mAP50 at 97.75% and mask mAP50 at 97.51%. The research achieves both high segmentation efficiency and detection accuracy, offering core support for developing visual systems in harvesting robots and providing an effective solution for deep learning-based fruit target recognition and automated harvesting applications.
(This article belongs to the Special Issue Advances in Precision Agriculture in Orchard)
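The unmodified YOLOv8n-seg baseline the authors start from is distributed through the public ultralytics package; a minimal sketch of loading it and reading off the parameter/GFLOPs summary follows. The RepGhost-modified variant is not assumed to be packaged, and the image path is a placeholder.

```python
# Baseline YOLOv8n-seg inference with the public ultralytics package.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")       # pretrained baseline weights
model.info()                         # prints layers, parameters, GFLOPs
results = model("pomelo.jpg")        # placeholder image path
masks = results[0].masks             # per-instance segmentation masks
```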

28 pages, 11495 KB  
Article
A Pipeline for Mushroom Mass Estimation Based on Phenotypic Parameters: A Multiple Oudemansiella raphanipies Model
by Hua Yin, Danying Lei, Anping Xiong, Lu Yuan, Minghui Chen, Yilu Xu, Yinglong Wang, Hui Xiao and Quan Wei
Agronomy 2026, 16(1), 124; https://doi.org/10.3390/agronomy16010124 - 4 Jan 2026
Viewed by 242
Abstract
Estimating the mass of Oudemansiella raphanipies quickly and accurately is indispensable in optimizing post-harvest packaging processes. Traditional methods typically involve manual grading followed by weighing with a balance, which is inefficient and labor-intensive. To address the challenges encountered in actual production scenarios, in this work, we developed a novel pipeline for estimating the mass of multiple Oudemansiella raphanipies. To achieve this goal, an enhanced deep learning (DL) algorithm for instance segmentation and a machine learning (ML) model for mass prediction were introduced. On one hand, to segment multiple samples in the same image, a novel instance segmentation network named FinePoint-ORSeg was applied to obtain the finer edges of samples, by integrating an edge attention module to improve the fineness of the edges. On the other hand, for individual samples, a novel cap–stem segmentation approach was applied and 18 phenotypic parameters were obtained. Furthermore, principal component analysis (PCA) was utilized to reduce the redundancy among features. Combining the two aspects mentioned above, the mass was computed by an exponential GPR model with seven principal components. In terms of segmentation performance, our model outperforms the original Mask R-CNN; the AP, AP50, AP75, and APs are improved by 2%, 0.7%, 1.9%, and 0.3%, respectively. Additionally, our model outperforms other networks such as YOLACT, SOLOv2, and Mask R-CNN with Swin. As for mass estimation, the results show that the average coefficient of variation (CV) of a single sample mass in different attitudes is 6.81%. Moreover, the average mean absolute percentage error (MAPE) for multiple samples is 8.53%. Overall, the experimental results indicate that the proposed method is time-saving, non-destructive, and accurate. This can provide a reference for research on post-harvest packaging technology for Oudemansiella raphanipies.
(This article belongs to the Special Issue Novel Studies in High-Throughput Plant Phenomics)
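The mass-prediction stage (18 phenotypic parameters, compressed by PCA to 7 components, feeding a GPR) can be sketched with scikit-learn. Data below are synthetic, and Matern(nu=0.5), the absolute-exponential kernel, stands in for the paper's "exponential GPR", which may differ in detail.

```python
# PCA-compressed Gaussian process regression for mass estimation (sketch).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 18))                    # 18 phenotypic parameters
mass = 2.0 * X[:, 0] + rng.normal(0.0, 0.1, 200)  # toy mass target (g)

model = make_pipeline(
    StandardScaler(),                                 # PCA expects centered data
    PCA(n_components=7),                              # seven principal components
    GaussianProcessRegressor(kernel=Matern(nu=0.5)),  # exponential-type kernel
)
model.fit(X, mass)
print(model.predict(X[:3]))
```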

29 pages, 6257 KB  
Article
WGMG-Net: A Wavelet-Guided Real-Time Instance Segmentation Framework for Automated Post-Harvest Grape Quality Assessment
by Haoyuan Hao, Lvhan Zhuang, Yi Yang, Chongchong Yu, Xinting Yang and Jiangbo Li
Agriculture 2026, 16(1), 121; https://doi.org/10.3390/agriculture16010121 - 2 Jan 2026
Viewed by 293
Abstract
Grading of table grapes depends on reliable berry-level phenotyping, yet manual inspection is subjective and slow. A wavelet-guided instance segmentation network named WGMG-Net is introduced for automated assessment of post-harvest grape clusters. A multi-scale feature merging module based on discrete wavelet transform is used to preserve edges under dense occlusion, and a bivariate fusion enhanced attention mechanism is used to strengthen channel and spatial cues. Instance masks are produced for all berries, a regression head estimates the total berry count, and a mask-derived compactness index assigns clusters to three tightness grades. On a Shine Muscat dataset with 252 cluster images acquired on a simulated sorting line, the WGMG-Net variant attains a mean average precision at Intersection over Union (IoU) 0.5 of 98.98 percent and at IoU 0.5 to 0.95 of 87.76 percent, outperforming Mask R-CNN, PointRend and YOLO models with fewer parameters. For berry counting, a mean absolute error of 1.10 berries, root mean square error of 1.48 berries, mean absolute percentage error of 2.82 percent, accuracy within two berries of 92.86 percent and Pearson correlation of 0.986 are achieved. Compactness grading reaches Top-1 accuracy of 98.04 percent and Top-2 accuracy of 100 percent, supporting the use of WGMG-Net for grape quality evaluation.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
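The paper's mask-derived compactness index is not spelled out in the abstract, so the sketch below uses the classic isoperimetric ratio as a plausible stand-in, with illustrative grade thresholds; none of this is the authors' formula.

```python
# Stand-in compactness index (isoperimetric ratio) and a three-grade rule.
import cv2
import numpy as np

def compactness(mask: np.ndarray) -> float:
    """Isoperimetric ratio 4*pi*A/P^2 in (0, 1]; 1 means a perfect disk."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    c = max(contours, key=cv2.contourArea)        # largest component
    area = cv2.contourArea(c)
    perim = cv2.arcLength(c, True)
    return 4.0 * np.pi * area / (perim ** 2)

def tightness_grade(mask: np.ndarray) -> str:
    """Three illustrative grades; thresholds are assumptions, not the paper's."""
    c = compactness(mask)
    return "tight" if c >= 0.75 else ("medium" if c >= 0.55 else "loose")
```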

32 pages, 10287 KB  
Article
Shape-Aware Refinement of Deep Learning Detections from UAS Imagery for Tornado-Induced Treefall Mapping
by Mitra Nasimi and Richard L. Wood
Remote Sens. 2026, 18(1), 141; https://doi.org/10.3390/rs18010141 - 31 Dec 2025
Viewed by 303
Abstract
This study presents a geometry-based post-processing framework developed to refine deep-learning detections of tornado-damaged trees. The YOLO11-based instance segmentation framework served as the baseline, but its predictions often included multiple masks for a single tree or incomplete fragments of the same trunk, particularly in dense canopy areas or within tiled orthomosaics. Overlapping masks led to duplicated predictions of the same tree, while fragmentation broke a single fallen trunk into disconnected parts. Both issues reduced the accuracy of tree-count estimates and weakened orientation analysis, two factors that are critical for treefall methods. To resolve these problems, a Shape-Aware Non-Maximum Suppression (SA-NMS) procedure was introduced. The method evaluated each mask’s collinearity and, based on its geometric condition, decided whether segments should be merged, separated, or suppressed. A spatial assessment then aggregated prediction vectors within a defined Region of Interest (ROI), reconnecting trunks that were divided by obstacles or tile boundaries. The proposed method, applied to high-resolution orthomosaics from the December 2021 Land Between the Lakes tornado, achieved 76.4% and 77.1% instance-level orientation agreement accuracy in two validation zones.
(This article belongs to the Special Issue Advances in GIS and Remote Sensing Applications in Natural Hazards)
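The collinearity check at the core of SA-NMS can be approximated by the eigenvalue ratio of a mask's pixel-coordinate covariance: a fallen trunk is nearly one-dimensional, so the dominant eigenvalue captures most of the variance. The helpers below are a hedged reconstruction, not the authors' implementation; thresholds and the merge/suppress policy are omitted.

```python
# Mask collinearity and dominant-axis orientation via coordinate covariance.
import numpy as np

def collinearity(mask: np.ndarray) -> float:
    """Fraction of pixel-coordinate variance on the dominant axis (0.5..1]."""
    ys, xs = np.nonzero(mask)
    cov = np.cov(np.stack([xs, ys]).astype(float))   # 2x2 covariance
    eig = np.linalg.eigvalsh(cov)                    # ascending eigenvalues
    return eig[-1] / eig.sum()

def orientation_deg(mask: np.ndarray) -> float:
    """Dominant-axis angle in [0, 180), for treefall direction analysis."""
    ys, xs = np.nonzero(mask)
    cov = np.cov(np.stack([xs, ys]).astype(float))
    _, vecs = np.linalg.eigh(cov)
    vx, vy = vecs[:, -1]                             # dominant eigenvector
    return float(np.degrees(np.arctan2(vy, vx)) % 180.0)

# A highly collinear mask (near 1.0) is a trunk candidate; overlapping masks
# with similar orientations would then be merged rather than suppressed.
```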

26 pages, 48691 KB  
Article
A Multi-Channel Convolutional Neural Network Model for Detecting Active Landslides Using Multi-Source Fusion Images
by Jun Wang, Hongdong Fan, Wanbing Tuo and Yiru Ren
Remote Sens. 2026, 18(1), 126; https://doi.org/10.3390/rs18010126 - 30 Dec 2025
Viewed by 346
Abstract
Synthetic Aperture Radar Interferometry (InSAR) has demonstrated significant advantages in detecting active landslides. The proliferation of computing technology has enabled the combination of InSAR and deep learning, offering an innovative approach to the automation of landslide detection. However, InSAR-based detection faces two persistent challenges: (1) the difficulty in distinguishing active landslides from other deformation phenomena, which leads to high false alarm rates; and (2) insufficient accuracy in delineating precise landslide boundaries due to low image contrast. The incorporation of multi-source data and multi-branch feature extraction networks can alleviate this issue, yet it inevitably increases computational cost and model complexity. To address these issues, this study first constructs a multi-source fusion image dataset combining optical remote sensing imagery, DEM-derived slope information, and InSAR deformation data. Subsequently, it proposes a multi-channel instance segmentation framework named MCLD R-CNN (Multi-Channel Landslide Detection R-CNN). The proposed network is designed to accept multi-channel inputs and integrates a landslide-focused attention mechanism, which enhances the model’s ability to capture landslide-specific features. The experimental findings indicate that the proposed strategy effectively addresses the aforementioned challenges. Moreover, the proposed MCLD R-CNN achieves superior detection accuracy and generalization ability compared to other benchmark models.
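Assembling the multi-channel input the abstract describes amounts to stacking co-registered layers. A toy sketch follows, with random arrays standing in for the optical, slope, and InSAR rasters, plus a note on the backbone change a 5-channel input implies; shapes and the channel layout are assumptions.

```python
# Toy multi-source fusion: RGB + DEM slope + InSAR deformation -> 5 channels.
import numpy as np
import torch

rgb = np.random.rand(512, 512, 3).astype(np.float32)      # optical image
slope = np.random.rand(512, 512, 1).astype(np.float32)    # DEM-derived slope
insar = np.random.rand(512, 512, 1).astype(np.float32)    # deformation map

fused = np.concatenate([rgb, slope, insar], axis=-1)       # H x W x 5
x = torch.from_numpy(fused).permute(2, 0, 1).unsqueeze(0)  # 1 x 5 x H x W

# A Mask R-CNN-style backbone would then need its stem widened to 5 input
# channels, e.g. torch.nn.Conv2d(5, 64, kernel_size=7, stride=2, padding=3).
```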

19 pages, 5799 KB  
Article
An Improved Single-Stage Object Detection Model and Its Application to Oil Seal Defect Detection
by Yangzhuo Chen, Yuhang Wu, Xiaoliang Wu, Weiwei He, Guangtian He and Xiaowen Cai
Electronics 2026, 15(1), 128; https://doi.org/10.3390/electronics15010128 - 26 Dec 2025
Viewed by 321
Abstract
Oil seals, as core industrial components, often exhibit defects with sparse features and low contrast, posing significant challenges for traditional vision-based inspection methods. Although deep learning facilitates automatic feature extraction for defect detection, many instance segmentation models are computationally expensive, hindering their deployment in real-time edge applications. In this paper, we present an efficient oil seal defect detection model based on an enhanced YOLOv11n architecture (YOLOv11n_CDK). The proposed approach introduces several dynamic convolution variants and integrates the Kolmogorov–Arnold Network (KAN) into the backbone. A newly designed parallel module, the nested asynchronous pooling convolutional module (NAPConv), is also incorporated to form a lightweight yet powerful feature extraction network. Experimental results demonstrate that, compared to the baseline YOLOv11n, our model reduces computational cost by 4.76% and increases mAP@0.5 by 2.14%. When deployed on a Jetson Nano embedded device, the model achieves an average processing time of 6.3 ms per image, corresponding to a frame rate of 105–110 FPS. These outcomes highlight the model’s strong potential for high-performance, real-time industrial deployment, effectively balancing detection accuracy with low computational complexity.

16 pages, 2697 KB  
Article
Real-Time Callus Instance Segmentation in Plant Tissue Culture Using Successive Generations of YOLO Architectures
by Yunus Egi, Tülay Oter, Mortaza Hajyzadeh and Muammer Catak
Plants 2026, 15(1), 47; https://doi.org/10.3390/plants15010047 - 23 Dec 2025
Viewed by 473
Abstract
Callus induction is a complex procedure in plant organ, cell, and tissue culture that underpins processes such as metabolite production, regeneration, and genetic transformation. It is important to monitor callus formation alongside subjective evaluations, which require labor-intensive care. In this research, the first curated lentil (Lens culinaris) callus dataset for instance segmentation was experimentally generated using three genotypes as one dataset: Firat-87, Cagil, and Tigris. Leaf explants were cultured on MS medium fortified with different concentrations of the growth regulators BA and NAA to induce callus formation. Three biologically relevant stages, the leaf stage, the green callus, and the necrotic callus, were produced. During this process, 122 high-resolution images were obtained, resulting in 1185 total annotations across them. The dataset was evaluated across four successive generations (v5/7/8/11) of YOLO deep learning models under identical conditions using mAP, Dice coefficient, Precision, Recall, and IoU, together with efficiency metrics including parameter counts, FLOPs, and inference speed. The results show that anchor-based variants (YOLOv5/7) relied on predefined priors and showed limited boundary precision, whereas anchor-free designs (YOLOv8/11) used decoupled heads and direct center/boundary regression that provided clear advantages for callus structures. YOLOv8 reached the highest instance segmentation precision with an mAP50 of 0.855, while YOLOv11 matched this accuracy with greater efficiency, achieving real-time inference at 166 FPS.
(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research—2nd Edition)
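For reference, the two overlap metrics in this comparison, Dice and IoU, are computed from binary masks as below (generic helpers, not the study's evaluation code). Note that Dice = 2·IoU/(1+IoU), so the two rank models identically on a single mask.

```python
# Generic Dice coefficient and IoU for binary segmentation masks.
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """2|A∩B| / (|A|+|B|) for boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """|A∩B| / |A∪B| for boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union
```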

22 pages, 17762 KB  
Article
Highway Reconstruction Through Fine-Grained Semantic Segmentation of Mobile Laser Scanning Data
by Yuyu Chen, Zhou Yang, Huijing Zhang and Jinhu Wang
Sensors 2026, 26(1), 40; https://doi.org/10.3390/s26010040 - 20 Dec 2025
Cited by 1 | Viewed by 435
Abstract
The highway is a crucial component of modern transportation systems, and its efficient management is essential for ensuring safety and facilitating communication. The automatic understanding and reconstruction of highway environments are therefore pivotal for advanced traffic management and intelligent transportation systems. This work introduces a methodology for the fine-grained semantic segmentation and reconstruction of highway environments using dense 3D point cloud data acquired via mobile laser scanning. First, a multi-scale, object-based data augmentation and down-sampling method is introduced to address the issue of training sample imbalance. Subsequently, a deep learning approach utilizing the KPConv convolutional network is proposed to achieve fine-grained semantic segmentation. The segmentation results are then used to reconstruct a 3D model of the highway environment. The methodology is validated on a 32 km stretch of highway, achieving semantic segmentation across 27 categories of environmental features. When evaluated against a manually annotated ground truth, the results exhibit a mean Intersection over Union (mIoU) of 87.27%. These findings demonstrate that the proposed methodology is effective for fine-grained semantic segmentation and instance-level reconstruction of highways in practical scenarios.
(This article belongs to the Special Issue Application of LiDAR Remote Sensing and Mapping)
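The reported mIoU over 27 categories is the mean of per-class intersection-over-union values, conveniently read off a confusion matrix; the helper below is a generic sketch with toy labels, not the authors' evaluation code.

```python
# Mean IoU over semantic classes from a confusion matrix (generic sketch).
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 27):
    """Per-class IoU from a confusion matrix, then the mean over classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)   # rows: truth, cols: pred
    inter = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)
    ious = inter / np.maximum(union, 1)            # guard 0/0 for absent classes
    return ious.mean(), ious

# Toy usage: 10,000 random point labels over 27 classes.
rng = np.random.default_rng(0)
gt = rng.integers(0, 27, 10_000)
pred = rng.integers(0, 27, 10_000)
print(mean_iou(pred, gt)[0])
```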

24 pages, 27907 KB  
Article
Efficient Object-Related Scene Text Grouping Pipeline for Visual Scene Analysis in Large-Scale Investigative Data
by Enrique Shinohara, Jorge García, Luis Unzueta and Peter Leškovský
Electronics 2026, 15(1), 12; https://doi.org/10.3390/electronics15010012 - 19 Dec 2025
Viewed by 296
Abstract
Law Enforcement Agencies (LEAs) typically analyse vast collections of media files, extracting visual information that helps them to advance investigations. While recent advancements in deep learning-based computer vision algorithms have revolutionised the ability to detect multi-class objects and text instances (characters, words, numbers) from in-the-wild scenes, their association remains relatively unexplored. Previous studies focus on clustering text given its semantic relationship or layout, rather than its relationship with objects. In this paper, we present an efficient, modular pipeline for contextual scene text grouping with three complementary strategies: 2D planar segmentation, multi-class instance segmentation and promptable segmentation. The strategies address common scenes where related text instances frequently share the same 2D planar surface and object (vehicle, banner, etc.). Evaluated on a custom dataset of 1100 images, the overall grouping performance remained consistently high across all three strategies (B-Cubed F1 92–95%; Pairwise F1 80–82%), with adjusted Rand indices between 0.08 and 0.23. Our results demonstrate clear trade-offs between computational efficiency and contextual generalisation, where geometric methods offer reliability, semantic approaches provide scalability and class-agnostic strategies offer the most robust generalisation. The dataset used for testing will be made available upon request.
(This article belongs to the Special Issue Deep Learning-Based Scene Text Detection)
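B-Cubed F1, the headline grouping metric here, averages per-item precision and recall over predicted versus ground-truth groups. A small O(n²) reference implementation follows (a generic scorer, not the authors' code).

```python
# B-Cubed F1 for comparing a predicted grouping against a gold grouping.
def bcubed_f1(pred: list, gold: list) -> float:
    """pred/gold map item index -> cluster id; returns B-Cubed F1."""
    n = len(pred)
    precision = recall = 0.0
    for i in range(n):
        pred_cluster = [j for j in range(n) if pred[j] == pred[i]]
        gold_cluster = [j for j in range(n) if gold[j] == gold[i]]
        hits_p = sum(1 for j in pred_cluster if gold[j] == gold[i])
        hits_r = sum(1 for j in gold_cluster if pred[j] == pred[i])
        precision += hits_p / len(pred_cluster)   # per-item precision
        recall += hits_r / len(gold_cluster)      # per-item recall
    p, r = precision / n, recall / n
    return 2 * p * r / (p + r)

# Toy usage: three items grouped two ways.
print(bcubed_f1([0, 0, 1], [0, 1, 1]))
```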

21 pages, 16491 KB  
Article
Glue Strips Measurement and Breakage Detection Based on YOLOv11 and Pixel Geometric Analysis
by Yukai Lu, Xihang Li, Jingran Kang, Shusheng Xiong and Shaopeng Zhu
Sensors 2025, 25(24), 7624; https://doi.org/10.3390/s25247624 - 16 Dec 2025
Viewed by 395
Abstract
With the rapid development of the new energy vehicle industry, the quality control of battery pack glue application processes has become a critical factor in ensuring the sealing, insulation, and structural stability of the battery. However, existing detection methods face numerous challenges in complex industrial environments, such as metal reflections, interference from heating film grids, inconsistent orientations of glue strips, and the difficulty of accurately segmenting elongated targets, leading to insufficient precision and robustness in glue dimension measurement and glue break detection. To address these challenges, this paper proposes a battery pack glue application detection method that integrates the YOLOv11 deep learning model with pixel-level geometric analysis. The method first uses YOLOv11 to precisely extract the glue region and identify and block the heating film interference area. Glue strip orientation correction and image normalization are performed through adaptive binarization and Hough transformation. Next, high-precision pixel-level measurement of glue strip width and length is achieved by combining connected component analysis and multi-line statistical strategies. Finally, glue break and wire drawing defects are reliably detected based on image slicing and pixel ratio analysis. Experimental results show that the average measurement errors in glue strip width and length are only 1.5% and 2.3%, respectively, with a 100% accuracy rate in glue break detection, significantly outperforming traditional vision methods and mainstream instance segmentation models. Ablation experiments further validate the effectiveness and synergy of the modules. This study provides a high-precision and robust automated detection solution for glue application processes in complex industrial scenarios, with significant engineering application value.
(This article belongs to the Section Sensing and Imaging)
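Two of the classical steps named above, Hough-based orientation correction and per-column pixel statistics for width, can be sketched with OpenCV. The thresholds, block sizes, and input path below are placeholders, not the paper's settings, and the sketch assumes at least one line is detected.

```python
# Orientation correction via Hough transform, then per-column width stats.
import cv2
import numpy as np

img = cv2.imread("glue_region.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 31, 5)

# Dominant line angle from the probabilistic Hough transform.
lines = cv2.HoughLinesP(binary, 1, np.pi / 180, threshold=80,
                        minLineLength=100, maxLineGap=10)
angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1))
          for x1, y1, x2, y2 in lines[:, 0]]
angle = float(np.median(angles))

# Rotate the strip horizontal; foreground count per column estimates width.
h, w = binary.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
aligned = cv2.warpAffine(binary, M, (w, h))
widths = (aligned > 0).sum(axis=0)
print("median strip width (px):", np.median(widths[widths > 0]))
```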

23 pages, 2564 KB  
Article
Research on Morphometric Methods for Larimichthys crocea Based on YOLOv11-CBAM X-Ray Imaging
by Yatong Yao, Guangde Qiao, Shengmao Zhang, Chong Wu, Zuli Wu, Tianfei Cheng and Hanfeng Zheng
Fishes 2025, 10(12), 641; https://doi.org/10.3390/fishes10120641 - 11 Dec 2025
Viewed by 378
Abstract
Traditional morphometric analysis of Large Yellow Croaker (Larimichthys crocea) relies heavily on manual dissection, which is time-consuming, labor-intensive, and prone to subjectivity. To address these limitations, we propose an automated quantitative approach based on deep-learning–driven instance segmentation. A dataset comprising 160 X-ray images of L. crocea was established, encompassing five anatomical categories: whole fish, air bladder, spine, eyes, and otoliths. Building upon the baseline YOLOv11-Seg model, we integrated a lightweight Convolutional Block Attention Module (CBAM) to construct an improved YOLOv11-CBAM network, thereby enhancing segmentation accuracy for complex backgrounds and fine-grained targets. Experimental results demonstrated that the modified model achieved superior performance in both mAP50 and mAP50–95 compared with the baseline, with particularly notable improvements in the segmentation of small-scale structures such as the air bladder and spine. By introducing coin-based calibration, pixel counts were converted into absolute areas and relative proportions. The measured area ratios of the air bladder, otoliths, eyes, and spine were 7.72%, 0.59%, 2.20%, and 8.48%, respectively, with standard deviations remaining within acceptable ranges, thus validating the robustness of the proposed method. Collectively, this study establishes a standardized, efficient, and non-destructive workflow for X-ray image-based morphometric analysis, providing practical applications for aquaculture management, germplasm conservation, and fundamental biological research.
(This article belongs to the Special Issue Biodiversity and Spatial Distribution of Fishes, Second Edition)
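Coin-based calibration converts pixel counts into absolute areas from a single known diameter. The arithmetic is short; the coin diameter and pixel counts below are placeholder values, not the study's measurements.

```python
# Pixel-to-area conversion from a reference coin of known diameter.
import numpy as np

COIN_DIAMETER_MM = 25.0        # assumed reference coin
coin_pixel_area = 12_000       # pixels inside the detected coin mask

# The coin's pixel radius is sqrt(area/pi); its known diameter fixes mm/px.
mm_per_px = COIN_DIAMETER_MM / (2 * np.sqrt(coin_pixel_area / np.pi))

bladder_pixels = 48_000        # pixels in the air-bladder mask (placeholder)
bladder_area_mm2 = bladder_pixels * mm_per_px ** 2
print(f"air bladder area: {bladder_area_mm2:.1f} mm^2")
```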
