Search Results (1,505)

Search Parameters:
Keywords = mIoU

16 pages, 1011 KB  
Article
Point Cloud Semantic Segmentation Network Design with Neighborhood Feature Enhancement
by Shi He and Xiang Li
Appl. Sci. 2025, 15(21), 11700; https://doi.org/10.3390/app152111700 (registering DOI) - 1 Nov 2025
Abstract
The complex structures and diverse object categories in indoor environments pose significant challenges for point cloud semantic segmentation. To address the insufficient capability of extracting local features in complex scenes, this paper proposes a point cloud segmentation network based on neighborhood feature enhancement termed PKA-Net. First, to obtain richer and more discriminative feature representations, we design a local feature encoding module that extracts geometric features, color information, and spatial information from local regions of the point cloud for joint feature encoding. Furthermore, we enhance the hierarchical feature extraction by integrating Kolmogorov–Arnold Networks (KAN) to form the SAPK module, improving the network’s ability to fit complex geometric structures. A residual structure is also adopted to optimize feature propagation and alleviate the problem of gradient vanishing. Finally, we propose the dual attention mechanism C-MSCA, which dynamically selects and strengthens key features through the synergistic action of channel and spatial attention, enhancing the network’s perception of local details and global structure. To evaluate the performance of the proposed PKA-Net, extensive experiments were conducted on the S3DIS dataset. Experimental results demonstrate that PKA-Net improves OA by 2.1%, mAcc by 2.9%, and mIoU by 4% compared to the baseline model. It outperforms other mainstream models, delivering enhanced overall segmentation performance. Full article
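
The OA, mAcc, and mIoU figures quoted above follow the standard confusion-matrix definitions; below is a minimal sketch of how these metrics are typically computed (illustrative only, not the authors' evaluation code).

```python
# Hedged sketch: OA, mAcc, and mIoU from a class confusion matrix.
import numpy as np

def segmentation_metrics(conf: np.ndarray):
    """conf[i, j] = number of points with ground-truth class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    gt_per_class = conf.sum(axis=1)      # ground-truth points per class
    pred_per_class = conf.sum(axis=0)    # predicted points per class

    oa = tp.sum() / conf.sum()                            # overall accuracy
    macc = np.nanmean(tp / gt_per_class)                  # mean per-class accuracy
    iou = tp / (gt_per_class + pred_per_class - tp)       # per-class IoU
    miou = np.nanmean(iou)                                # mean IoU
    return oa, macc, miou

# Toy 3-class example
conf = np.array([[50, 2, 1],
                 [3, 40, 5],
                 [0, 4, 45]])
print(segmentation_metrics(conf))
```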

16 pages, 2913 KB  
Article
OGS-YOLOv8: Coffee Bean Maturity Detection Algorithm Based on Improved YOLOv8
by Nannan Zhao and Yongsheng Wen
Appl. Sci. 2025, 15(21), 11632; https://doi.org/10.3390/app152111632 (registering DOI) - 31 Oct 2025
Abstract
This study presents the OGS-YOLOv8 model for coffee bean maturity identification, designed to enhance accuracy in identifying coffee beans at different maturity stages in complicated contexts, utilizing an upgraded version of YOLOv8. Initially, the ODConv (full-dimensional dynamic convolution) substitutes the convolutional layers in the backbone and neck networks to augment the network’s capacity to capture attributes of coffee bean images. Second, we replace the C2f layer in the neck networks with the CSGSPC (Convolutional Split Group-Shuffle Partial Convolution) module to reduce the computational load of the model. Lastly, to improve bounding box regression accuracy by concentrating on challenging samples, we substitute the Inner-FocalerIoU function for the CIoU loss function. According to experimental results, OGS-YOLO v8 outperforms the original model by 7.4%, achieving a detection accuracy of 73.7% for coffee bean maturity. Reaching 76% at mAP@0.5, it represents a 3.2% increase over the initial model. Furthermore, GFLOPs dropped 26.8%, from 8.2 to 6.0. For applications like coffee bean maturity monitoring and intelligent harvesting, OGS-YOLOv8 offers strong technical support and reference by striking a good balance between high detection accuracy and low computational cost. Full article
(This article belongs to the Section Agricultural Science and Technology)
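
The mAP@0.5 figure and the CIoU / Inner-FocalerIoU losses discussed above all build on plain axis-aligned box IoU; a minimal sketch of that base quantity follows (illustrative only, the paper's exact loss is not reproduced).

```python
# Hedged sketch: pairwise IoU for axis-aligned boxes in (x1, y1, x2, y2) format.
import torch

def box_iou(box1: torch.Tensor, box2: torch.Tensor) -> torch.Tensor:
    x1 = torch.max(box1[:, 0], box2[:, 0])
    y1 = torch.max(box1[:, 1], box2[:, 1])
    x2 = torch.min(box1[:, 2], box2[:, 2])
    y2 = torch.min(box1[:, 3], box2[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])
    area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])
    return inter / (area1 + area2 - inter + 1e-7)

pred = torch.tensor([[10., 10., 50., 50.]])
gt = torch.tensor([[20., 20., 60., 60.]])
print(box_iou(pred, gt))  # element-wise IoU between matched boxes
```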

19 pages, 16806 KB  
Article
Refined Extraction of Sugarcane Planting Areas in Guangxi Using an Improved U-Net Model
by Tao Yue, Zijun Ling, Yuebiao Tang, Jingjin Huang, Hongteng Fang, Siyuan Ma, Jie Tang, Yun Chen and Hong Huang
Drones 2025, 9(11), 754; https://doi.org/10.3390/drones9110754 (registering DOI) - 30 Oct 2025
Abstract
Sugarcane, a vital economic crop and renewable energy source, requires precise monitoring of the area in which it has been planted to ensure sugar industry security, optimize agricultural resource allocation, and allow the assessment of ecological benefits. Guangxi Zhuang Autonomous Region, leveraging its subtropical climate and abundant solar thermal resources, accounts for over 63% of China’s total sugarcane cultivation area. In this study, we constructed an enhanced RCAU-net model and developed a refined extraction framework that considers different growth stages to enable rapid identification of sugarcane planting areas. This study addresses key challenges in remote-sensing-based sugarcane extraction, namely, the difficulty of distinguishing spectrally similar objects, significant background interference, and insufficient multi-scale feature fusion. To significantly enhance the accuracy and robustness of sugarcane identification, an improved RCAU-net model based on the U-net architecture was designed. The model incorporates three key improvements: it replaces the original encoder with ResNet50 residual modules to enhance discrimination of similar crops; it integrates a Convolutional Block Attention Module (CBAM) to focus on critical features and effectively suppress background interference; and it employs an Atrous Spatial Pyramid Pooling (ASPP) module to bridge the encoder and decoder, thereby optimizing the extraction of multi-scale contextual information. A refined extraction framework that accounts for different growth stages was ultimately constructed to achieve rapid identification of sugarcane planting areas in Guangxi. The experimental results demonstrate that the RCAU-net model performed excellently, achieving an Overall Accuracy (OA) of 97.19%, a Mean Intersection over Union (mIoU) of 94.47%, a Precision of 97.31%, and an F1 Score of 97.16%. These results represent significant improvements of 7.20, 10.02, 6.82, and 7.28 percentage points in OA, mIoU, Precision, and F1 Score, respectively, relative to the original U-net. The model also achieved a Kappa coefficient of 0.9419 and a Recall rate of 96.99%. The incorporation of residual structures significantly reduced the misclassification of similar crops, while the CBAM and ASPP modules minimized holes within large continuous patches and false extractions of small patches, resulting in smoother boundaries for the extracted areas. This work provides reliable data support for the accurate calculation of sugarcane planting area and greatly enhances the decision-making value of remote sensing monitoring in modern agricultural management of sugarcane. Full article
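
A minimal PyTorch sketch of a standard CBAM block of the kind the abstract integrates into the encoder; the reduction ratio and spatial kernel size below are assumptions, not values taken from the paper.

```python
# Hedged sketch: a standard CBAM block (channel attention followed by spatial attention).
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: shared MLP over globally average- and max-pooled features
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: conv over channel-wise average and max maps
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        sa = torch.sigmoid(self.spatial(
            torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

x = torch.randn(2, 64, 32, 32)
print(CBAM(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```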

22 pages, 4198 KB  
Article
CGHP: Component-Guided Hierarchical Progressive Point Cloud Unsupervised Segmentation Framework
by Shuo Shi, Haifeng Zhao, Wei Gong and Sifu Bi
Remote Sens. 2025, 17(21), 3589; https://doi.org/10.3390/rs17213589 - 30 Oct 2025
Abstract
With the rapid development of airborne LiDAR and photogrammetric techniques, massive amounts of high-resolution 3D point cloud data have become increasingly available. However, extracting meaningful semantic information from such unstructured and noisy point clouds remains a challenging task, particularly in the absence of manually annotated labels. We present CGHP, a novel component-guided hierarchical progressive framework that addresses this challenge through a two-stage learning approach. Our method first decomposes point clouds into components using geometric and appearance consistency, constructing comprehensive geometric-appearance descriptors that capture shape, scale, and gravity-aligned distribution information to guide initial feature learning. These component-level features then undergo progressive growth through an adjacency-constrained clustering algorithm that gradually merges components into object-level semantic clusters. Extensive experiments on the publicly available S3DIS and ScanNet++ point cloud datasets demonstrate the effectiveness of the proposed method. On the S3DIS dataset, our method achieves state-of-the-art performance, with 48.69% mIoU and 79.68% OA, without using any annotations, closely approaching the results of fully supervised PointNet++ (50.1% mIoU, 77.5% OA). On the more challenging ScanNet++ benchmark, our approach also demonstrates competitive performance in terms of both mAcc and mIoU. Full article
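
As a rough illustration of a component-level geometric-appearance descriptor, the sketch below combines centroid, extent, mean color, and gravity-aligned height statistics; these concrete features are assumptions, not the paper's definition.

```python
# Hedged sketch: a simple geometric-appearance descriptor for one point-cloud component.
import numpy as np

def component_descriptor(xyz: np.ndarray, rgb: np.ndarray) -> np.ndarray:
    centroid = xyz.mean(axis=0)
    extent = xyz.max(axis=0) - xyz.min(axis=0)     # rough scale along x, y, z
    color = rgb.mean(axis=0)                       # mean appearance
    z = xyz[:, 2]
    height_stats = np.array([z.mean(), z.std()])   # gravity-aligned distribution
    return np.concatenate([centroid, extent, color, height_stats])

pts = np.random.rand(500, 3)    # placeholder component points
cols = np.random.rand(500, 3)   # placeholder per-point colors
print(component_descriptor(pts, cols).shape)  # (11,)
```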

23 pages, 3485 KB  
Article
MMA-Net: A Semantic Segmentation Network for High-Resolution Remote Sensing Images Based on Multimodal Fusion and Multi-Scale Multi-Attention Mechanisms
by Xuanxuan Huang, Xuejie Zhang, Longbao Wang, Dandan Yuan, Shufang Xu, Fengguang Zhou and Zhijun Zhou
Remote Sens. 2025, 17(21), 3572; https://doi.org/10.3390/rs17213572 - 28 Oct 2025
Abstract
Semantic segmentation of high-resolution remote sensing images is of great application value in fields like natural disaster monitoring. Current multimodal semantic segmentation methods have improved the model’s ability to recognize different ground objects and complex scenes by integrating multi-source remote sensing data. However, these methods still face challenges such as blurred boundary segmentation and insufficient perception of multi-scale ground objects when achieving high-precision classification. To address these issues, this paper proposes MMA-Net, a semantic segmentation network enhanced by two key modules: cross-layer multimodal fusion module and multi-scale multi-attention module. These modules effectively improve the model’s ability to capture detailed features and model multi-scale ground objects, thereby enhancing boundary segmentation accuracy, detail feature preservation, and consistency in multi-scale object segmentation. Specifically, the cross-layer multimodal fusion module adopts a staged fusion strategy to integrate detailed information and multimodal features, realizing detail preservation and modal synergy enhancement. The multi-scale multi-attention module combines cross-attention and self-attention to leverage long-range dependencies and inter-modal complementary relationships, strengthening the model’s feature representation for multi-scale ground objects. Experimental results show that MMA-Net outperforms state-of-the-art methods on the Potsdam and Vaihingen datasets. Its mIoU reaches 88.74% and 84.92% on the two datasets, respectively. Ablation experiments further verify that each proposed module contributes to the final performance. Full article
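
A minimal sketch of pairing cross-attention (optical tokens querying a second modality) with self-attention over the fused tokens, in the spirit of the multi-scale multi-attention module described above; token shapes and layer sizes are assumptions.

```python
# Hedged sketch: cross-attention between two modalities followed by self-attention.
import torch
import torch.nn as nn

class CrossThenSelfAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, aux_tokens):
        # Optical tokens attend to the auxiliary modality (cross-attention)
        fused, _ = self.cross(query=rgb_tokens, key=aux_tokens, value=aux_tokens)
        fused = self.norm1(rgb_tokens + fused)
        # Self-attention captures long-range dependencies within the fused features
        out, _ = self.self_attn(fused, fused, fused)
        return self.norm2(fused + out)

rgb = torch.randn(2, 1024, 256)  # (batch, tokens, dim), e.g. 32 x 32 patch tokens
dsm = torch.randn(2, 1024, 256)  # second modality (placeholder)
print(CrossThenSelfAttention()(rgb, dsm).shape)
```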

22 pages, 3835 KB  
Article
Phenology-Guided Wheat and Corn Identification in Xinjiang: An Improved U-Net Semantic Segmentation Model Using PCA and CBAM-ASPP
by Yang Wei, Xian Guo, Yiling Lu, Hongjiang Hu, Fei Wang, Rongrong Li and Xiaojing Li
Remote Sens. 2025, 17(21), 3563; https://doi.org/10.3390/rs17213563 - 28 Oct 2025
Abstract
Wheat and corn are two major food crops in Xinjiang. However, the spectral similarity between these crop types and the complexity of their spatial distribution has posed significant challenges to accurate crop identification. To this end, the study aimed to improve the accuracy of crop distribution identification in complex environments in three ways. First, by analysing the kNDVI and EVI time series, the optimal identification window was determined to be days 156–176—a period when wheat is in the grain-filling to milk-ripening phase and maize is in the jointing to tillering phase—during which, the strongest spectral differences between the two crops occurs. Second, principal component analysis (PCA) was applied to Sentinel-2 data. The top three principal components were extracted to construct the input dataset, effectively integrating visible and near-infrared band information. This approach suppressed redundancy and noise while replacing traditional RGB datasets. Finally, the Convolutional Block Attention Module (CBAM) was integrated into the U-Net model to enhance feature focusing on key crop areas. An improved Atrous Spatial Pyramid Pooling (ASPP) module based on deep separable convolutions was adopted to reduce the computational load while boosting multi-scale context awareness. The experimental results showed the following: (1) Wheat and corn exhibit obvious phenological differences between the 156th and 176th days of the year, which can be used as the optimal time window for identifying their spatial distributions. (2) The method proposed by this research had the best performance, with its mIoU, mPA, F1-score, and overall accuracy (OA) reaching 83.03%, 91.34%, 90.73%, and 90.91%, respectively. Compared to DeeplabV3+, PSPnet, HRnet, Segformer, and U-Net, the OA improved by 5.97%, 4.55%, 2.03%, 8.99%, and 1.5%, respectively. The recognition accuracy of the PCA dataset improved by approximately 2% compared to the RGB dataset. (3) This strategy still had high accuracy when predicting wheat and corn yields in Qitai County, Xinjiang, and had a certain degree of generalisability. In summary, the improved strategy proposed in this study holds considerable application potential for identifying the spatial distribution of wheat and corn in arid regions. Full article
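
A minimal sketch of the PCA step described above, projecting a Sentinel-2 band stack onto its top three principal components as a replacement for an RGB composite; the band count and array shapes are placeholders.

```python
# Hedged sketch: top-3 principal components of a multi-band pixel stack.
import numpy as np
from sklearn.decomposition import PCA

bands = np.random.rand(10, 512, 512)             # placeholder: 10 bands, H x W
h, w = bands.shape[1:]
pixels = bands.reshape(bands.shape[0], -1).T     # (H*W, n_bands)

pca = PCA(n_components=3)
top3 = pca.fit_transform(pixels).T.reshape(3, h, w)
print(top3.shape, pca.explained_variance_ratio_)
```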

31 pages, 34773 KB  
Article
Learning Domain-Invariant Representations for Event-Based Motion Segmentation: An Unsupervised Domain Adaptation Approach
by Mohammed Jeryo and Ahad Harati
J. Imaging 2025, 11(11), 377; https://doi.org/10.3390/jimaging11110377 - 27 Oct 2025
Abstract
Event cameras provide microsecond temporal resolution, high dynamic range, and low latency by asynchronously capturing per-pixel luminance changes, thereby introducing a novel sensing paradigm. These advantages render them well-suited for high-speed applications such as autonomous vehicles and dynamic environments. Nevertheless, the sparsity of event data and the absence of dense annotations are significant obstacles to supervised learning for motion segmentation from event streams. Domain adaptation is also challenging due to the considerable domain shift in intensity images. To address these challenges, we propose a two-phase cross-modality adaptation framework that translates motion segmentation knowledge from labeled RGB-flow data to unlabeled event streams. A dual-branch encoder extracts modality-specific motion and appearance features from RGB and optical flow in the source domain. Using reconstruction networks, event voxel grids are converted into pseudo-image and pseudo-flow modalities in the target domain. These modalities are subsequently re-encoded using frozen RGB-trained encoders. Multi-level consistency losses are implemented on features, predictions, and outputs to enforce domain alignment. Our design enables the model to acquire domain-invariant, semantically rich features through the use of shallow architectures, thereby reducing training costs and facilitating real-time inference with a lightweight prediction path. The proposed architecture, alongside the utilized hybrid loss function, effectively bridges the domain and modality gap. We evaluate our method on two challenging benchmarks: EVIMO2, which incorporates real-world dynamics, high-speed motion, illumination variation, and multiple independently moving objects; and MOD++, which features complex object dynamics, collisions, and dense 1kHz supervision in synthetic scenes. The proposed UDA framework achieves 83.1% and 79.4% accuracy on EVIMO2 and MOD++, respectively, outperforming existing state-of-the-art approaches, such as EV-Transfer and SHOT, by up to 3.6%. Additionally, it is lighter and faster and also delivers enhanced mIoU and F1 Score. Full article
(This article belongs to the Section Image and Video Processing)
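
A minimal sketch of multi-level consistency terms between the frozen RGB-flow branch and the event (pseudo-modality) branch, illustrating the alignment idea rather than the paper's exact hybrid loss; the loss weights are assumptions.

```python
# Hedged sketch: feature- and prediction-level consistency between source and target branches.
import torch
import torch.nn.functional as F

def consistency_loss(feat_src, feat_tgt, logits_src, logits_tgt,
                     w_feat: float = 1.0, w_pred: float = 1.0):
    feat_term = F.mse_loss(feat_tgt, feat_src.detach())            # align intermediate features
    pred_term = F.kl_div(F.log_softmax(logits_tgt, dim=1),
                         F.softmax(logits_src.detach(), dim=1),
                         reduction="batchmean")                    # align class predictions
    return w_feat * feat_term + w_pred * pred_term

f_src, f_tgt = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
p_src, p_tgt = torch.randn(2, 2, 32, 32), torch.randn(2, 2, 32, 32)
print(consistency_loss(f_src, f_tgt, p_src, p_tgt))
```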

17 pages, 16728 KB  
Article
Semantic and Sketch-Guided Diffusion Model for Fine-Grained Restoration of Damaged Ancient Paintings
by Li Zhao, Yingzhi Chen, Guangqi Du and Xiaojun Wu
Electronics 2025, 14(21), 4187; https://doi.org/10.3390/electronics14214187 - 27 Oct 2025
Abstract
Ancient paintings, as invaluable cultural heritage, often suffer from damages like creases, mold, and missing regions. Current restoration methods, while effective for natural images, struggle with the fine-grained control required for ancient paintings’ artistic styles and brushstroke patterns. We propose the Semantic and Sketch-Guided Restoration (SSGR) framework, which uses pixel-level semantic maps to restore missing and mold-affected areas and depth-aware sketch maps to ensure texture continuity in creased regions. The sketch maps are automatically extracted using advanced methods that preserve original brushstroke styles while conveying geometry and semantics. SSGR employs a semantic segmentation network to categorize painting regions and depth-sensitive sketch extraction to guide a diffusion model. To enhance style controllability, we cluster diverse attributes of landscape paintings and incorporate a Semantic-Sketch-Attribute-Normalization (SSAN) block that explores consistent patterns across styles through spatial semantic and attribute-adaptive normalization modules. Evaluated on the CLP-2K dataset, SSGR achieves an mIoU of 53.30%, SSIM of 0.42, and PSNR of 13.11, outperforming state-of-the-art methods. This approach not only preserves historical aesthetics but also advances digital heritage preservation with a tailored, controllable technique for ancient paintings. Full article
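
A minimal sketch of grouping paintings into attribute clusters with k-means, as a stand-in for the attribute-clustering step mentioned above; the feature source and cluster count are assumptions.

```python
# Hedged sketch: k-means over placeholder style embeddings, one vector per painting.
import numpy as np
from sklearn.cluster import KMeans

style_features = np.random.rand(200, 512)        # placeholder embeddings
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(style_features)
print(np.bincount(kmeans.labels_))               # paintings per attribute cluster
```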

20 pages, 13884 KB  
Article
Prototype-Guided Zero-Shot Medical Image Segmentation with Large Vision-Language Models
by Huong Pham and Samuel Cheng
Appl. Sci. 2025, 15(21), 11441; https://doi.org/10.3390/app152111441 - 26 Oct 2025
Abstract
Building on advances in promptable segmentation models, this work introduces a framework that integrates Large Vision-Language Model (LVLM) bounding box priors with prototype-based region of interest (ROI) selection to improve zero-shot medical image segmentation. Unlike prior methods such as SaLIP, which often misidentify regions due to reliance on text–image CLIP similarity, the proposed approach leverages visual prototypes to mitigate language bias and enhance ROI ranking, resulting in more accurate segmentation. Bounding box estimation is further strengthened through systematic prompt engineering to optimize LVLM performance across diverse datasets and imaging modalities. Evaluation was conducted on three publicly available benchmark datasets—CC359 (brain MRI), HC18 (fetal head ultrasound), and CXRMAL (chest X-ray)—without any task-specific fine-tuning. The proposed method achieved substantial improvements over prior approaches. On CC359, it reached a Dice score of 0.95 ± 0.06 and a mean Intersection-over-Union (mIoU) of 0.91 ± 0.10. On HC18, it attained a Dice score of 0.82 ± 0.20 and mIoU of 0.74 ± 0.22. On CXRMAL, the model achieved a Dice score of 0.90 ± 0.08 and mIoU of 0.83 ± 0.12. These standard deviations reflect variability across test images within each dataset, indicating the robustness of the proposed zero-shot framework. These results demonstrate that integrating LVLM-derived bounding box priors with prototype-based selection substantially advances zero-shot medical image segmentation. Full article
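
The Dice and mIoU values above are reported as mean ± standard deviation across test images; a minimal sketch of that aggregation for binary masks (illustrative only, not the authors' evaluation code).

```python
# Hedged sketch: per-image Dice and IoU for binary masks, then mean and std over a test set.
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

rng = np.random.default_rng(0)
scores = np.array([dice_iou(rng.random((64, 64)) > 0.5, rng.random((64, 64)) > 0.5)
                   for _ in range(3)])            # toy test set of 3 images
print(scores.mean(axis=0), scores.std(axis=0))    # (mean Dice, mean IoU), (std Dice, std IoU)
```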

15 pages, 2225 KB  
Article
An Automatic Pixel-Level Segmentation Method for Coal-Crack CT Images Based on U2-Net
by Yimin Zhang, Chengyi Wu, Jinxia Yu, Guoqiang Wang and Yingying Li
Electronics 2025, 14(21), 4179; https://doi.org/10.3390/electronics14214179 - 26 Oct 2025
Abstract
Automatically segmenting coal cracks in CT images is crucial for 3D reconstruction and the physical properties of mines. This paper proposes an automatic pixel-level deep learning method called Attention Double U2-Net to enhance the segmentation accuracy of coal cracks in CT images. Due to the lack of public datasets of coal CT images, a pixel-level labeled coal crack dataset is first established through industrial CT scanning experiments and post-processing. Then, the proposed method utilizes a Double Residual U-Block structure (DRSU) based on U2-Net to improve feature extraction and fusion capabilities. Moreover, an attention mechanism module is proposed, which is called Atrous Asymmetric Fusion Non-Local Block (AAFNB). The AAFNB module is based on the idea of Asymmetric Non-Local, which enables the collection of global information to enhance the segmentation results. Compared with previous state-of-the-art models, the proposed Attention Double U2-Net method exhibits better performance over the coal crack CT image dataset in various evaluation metrics such as PA, mPA, MIoU, IoU, Precision, Recall, and Dice scores. The crack segmentation results obtained from this method are more accurate and efficient, which provides experimental data and theoretical support to the field of CBM exploration and damage of coal. Full article
(This article belongs to the Section Artificial Intelligence)

22 pages, 6682 KB  
Article
Multimodal Fire Salient Object Detection for Unregistered Data in Real-World Scenarios
by Ning Sun, Jianmeng Zhou, Kai Hu, Chen Wei, Zihao Wang and Lipeng Song
Fire 2025, 8(11), 415; https://doi.org/10.3390/fire8110415 - 26 Oct 2025
Abstract
In real-world fire scenarios, complex lighting conditions and smoke interference significantly challenge the accuracy and robustness of traditional fire detection systems. Fusion of complementary modalities, such as visible light (RGB) and infrared (IR), is essential to enhance detection robustness. However, spatial shifts and geometric distortions occur in multi-modal image pairs collected by multi-source sensors due to installation deviations and inconsistent intrinsic parameters. Existing multi-modal fire detection frameworks typically depend on pre-registered data, which struggles to handle modal misalignment in practical deployment. To overcome this limitation, we propose an end-to-end multi-modal Fire Salient Object Detection framework capable of dynamically fusing cross-modal features without pre-registration. Specifically, the Channel Cross-enhancement Module (CCM) facilitates semantic interaction across modalities in salient regions, suppressing noise from spatial misalignment. The Deformable Alignment Module (DAM) achieves adaptive correction of geometric deviations through cascaded deformation compensation and dynamic offset learning. For validation, we constructed an unregistered indoor fire dataset (Indoor-Fire) covering common fire scenarios. Generalizability was further evaluated on an outdoor dataset (RGB-T Wildfire). To fully validate the effectiveness of the method in complex building fire scenarios, we conducted experiments using the Fire in historic buildings dataset. Experimental results demonstrate that the F1-score reaches 83% on both datasets, with the IoU maintained above 70%. Notably, while maintaining high accuracy, the number of parameters (91.91 M) is only 28.1% of the second-best SACNet (327 M). This method provides a robust solution for unaligned or weakly aligned modal fusion caused by sensor differences and is highly suitable for deployment in intelligent firefighting systems. Full article

30 pages, 7695 KB  
Article
RTUAV-YOLO: A Family of Efficient and Lightweight Models for Real-Time Object Detection in UAV Aerial Imagery
by Ruizhi Zhang, Jinghua Hou, Le Li, Ke Zhang, Li Zhao and Shuo Gao
Sensors 2025, 25(21), 6573; https://doi.org/10.3390/s25216573 - 25 Oct 2025
Abstract
Real-time object detection in Unmanned Aerial Vehicle (UAV) imagery is critical yet challenging, requiring high accuracy amidst complex scenes with multi-scale and small objects, under stringent onboard computational constraints. While existing methods struggle to balance accuracy and efficiency, we propose RTUAV-YOLO, a family of lightweight models based on YOLOv11 tailored for UAV real-time object detection. First, to mitigate the feature imbalance and progressive information degradation of small objects in current architectures multi-scale processing, we developed a Multi-Scale Feature Adaptive Modulation module (MSFAM) that enhances small-target feature extraction capabilities through adaptive weight generation mechanisms and dual-pathway heterogeneous feature aggregation. Second, to overcome the limitations in contextual information acquisition exhibited by current architectures in complex scene analysis, we propose a Progressive Dilated Separable Convolution Module (PDSCM) that achieves effective aggregation of multi-scale target contextual information through continuous receptive field expansion. Third, to preserve fine-grained spatial information of small objects during feature map downsampling operations, we engineered a Lightweight DownSampling Module (LDSM) to replace the traditional convolutional module. Finally, to rectify the insensitivity of current Intersection over Union (IoU) metrics toward small objects, we introduce the Minimum Point Distance Wise IoU (MPDWIoU) loss function, which enhances small-target localization precision through the integration of distance-aware penalty terms and adaptive weighting mechanisms. Comprehensive experiments on the VisDrone2019 dataset show that RTUAV-YOLO achieves an average improvement of 3.4% and 2.4% in mAP50 and mAP50-95, respectively, compared to the baseline model, while reducing the number of parameters by 65.3%. Its generalization capability for UAV object detection is further validated on the UAVDT and UAVVaste datasets. The proposed model is deployed on a typical airborne platform, Jetson Orin Nano, providing an effective solution for real-time object detection scenarios in actual UAVs. Full article
(This article belongs to the Special Issue Image Processing and Analysis for Object Detection: 3rd Edition)

27 pages, 4104 KB  
Article
CropCLR-Wheat: A Label-Efficient Contrastive Learning Architecture for Lightweight Wheat Pest Detection
by Yan Wang, Chengze Li, Chenlu Jiang, Mingyu Liu, Shengzhe Xu, Binghua Yang and Min Dong
Insects 2025, 16(11), 1096; https://doi.org/10.3390/insects16111096 - 25 Oct 2025
Abstract
To address prevalent challenges in field-based wheat pest recognition—namely, viewpoint perturbations, sample scarcity, and heterogeneous data distributions—a pest identification framework named CropCLR-Wheat is proposed, which integrates self-supervised contrastive learning with an attention-enhanced mechanism. By incorporating a viewpoint-invariant feature encoder and a diffusion-based feature filtering module, the model significantly enhances pest damage localization and feature consistency, enabling high-accuracy recognition under limited-sample conditions. In 5-shot classification tasks, CropCLR-Wheat achieves a precision of 89.4%, a recall of 87.1%, and an accuracy of 88.2%; these metrics further improve to 92.3%, 90.5%, and 91.2%, respectively, under the 10-shot setting. In the semantic segmentation of wheat pest damage regions, the model attains a mean intersection over union (mIoU) of 82.7%, with precision and recall reaching 85.2% and 82.4%, respectively, markedly outperforming advanced models such as SegFormer and Mask R-CNN. In robustness evaluation under viewpoint disturbances, a prediction consistency rate of 88.7%, a confidence variation of only 7.8%, and a prediction consistency score (PCS) of 0.914 are recorded, indicating strong stability and adaptability. Deployment results further demonstrate the framework’s practical viability: on the Jetson Nano device, an inference latency of 84 ms, a frame rate of 11.9 FPS, and an accuracy of 88.2% are achieved. These results confirm the efficiency of the proposed approach in edge computing environments. By balancing generalization performance with deployability, the proposed method provides robust support for intelligent agricultural terminal systems and holds substantial potential for wide-scale application. Full article
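
A minimal sketch of a SimCLR-style NT-Xent objective of the kind self-supervised contrastive pretraining typically uses; the temperature and batch layout are assumptions, not the paper's exact loss.

```python
# Hedged sketch: NT-Xent loss over two augmented views of the same batch of images.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5):
    """z1, z2: (N, D) embeddings of two views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D)
    sim = z @ z.T / temperature                          # cosine similarities
    n = z1.shape[0]
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-pairs
    # The positive for view i is view i + N (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z1, z2))
```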

23 pages, 11997 KB  
Article
Deep Learning-Driven Automatic Segmentation of Weeds and Crops in UAV Imagery
by Jianghan Tao, Qian Qiao, Jian Song, Shan Sun, Yijia Chen, Qingyang Wu, Yongying Liu, Feng Xue, Hao Wu and Fan Zhao
Sensors 2025, 25(21), 6576; https://doi.org/10.3390/s25216576 - 25 Oct 2025
Abstract
Accurate segmentation of crops and weeds is essential for enhancing crop yield, optimizing herbicide usage, and mitigating environmental impacts. Traditional weed management practices, such as manual weeding or broad-spectrum herbicide application, are labor-intensive, environmentally harmful, and economically inefficient. In response, this study introduces a novel precision agriculture framework integrating Unmanned Aerial Vehicle (UAV)-based remote sensing with advanced deep learning techniques, combining Super-Resolution Reconstruction (SRR) and semantic segmentation. This study is the first to integrate UAV-based SRR and semantic segmentation for tobacco fields, systematically evaluate recent Transformer and Mamba-based models alongside traditional CNNs, and release an annotated dataset that not only ensures reproducibility but also provides a resource for the research community to develop and benchmark future models. Initially, SRR enhanced the resolution of low-quality UAV imagery, significantly improving detailed feature extraction. Subsequently, to identify the optimal segmentation model for the proposed framework, semantic segmentation models incorporating CNN, Transformer, and Mamba architectures were used to differentiate crops from weeds. Among evaluated SRR methods, RCAN achieved the optimal reconstruction performance, reaching a Peak Signal-to-Noise Ratio (PSNR) of 24.98 dB and a Structural Similarity Index (SSIM) of 69.48%. In semantic segmentation, the ensemble model integrating Transformer (DPT with DINOv2) and Mamba-based architectures achieved the highest mean Intersection over Union (mIoU) of 90.75%, demonstrating superior robustness across diverse field conditions. Additionally, comprehensive experiments quantified the impact of magnification factors, Gaussian blur, and Gaussian noise, identifying an optimal magnification factor of 4×, proving that the method was robust to common environmental disturbances at optimal parameters. Overall, this research established an efficient, precise framework for crop cultivation management, offering valuable insights for precision agriculture and sustainable farming practices. Full article
(This article belongs to the Special Issue Smart Sensing and Control for Autonomous Intelligent Unmanned Systems)
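
A minimal sketch of the PSNR and SSIM metrics reported for the super-resolution stage, computed with scikit-image; random arrays stand in for a reference image and its reconstruction.

```python
# Hedged sketch: PSNR and SSIM between a reference image and a (noisy) reconstruction.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

reference = np.random.rand(256, 256, 3)                       # stand-in for the high-res image
restored = np.clip(reference + np.random.normal(0, 0.05, reference.shape), 0, 1)

psnr = peak_signal_noise_ratio(reference, restored, data_range=1.0)
ssim = structural_similarity(reference, restored, channel_axis=-1, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```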

22 pages, 9453 KB  
Article
A Hybrid YOLO and Segment Anything Model Pipeline for Multi-Damage Segmentation in UAV Inspection Imagery
by Rafael Cabral, Ricardo Santos, José A. F. O. Correia and Diogo Ribeiro
Sensors 2025, 25(21), 6568; https://doi.org/10.3390/s25216568 - 25 Oct 2025
Abstract
The automated inspection of civil infrastructure with Unmanned Aerial Vehicles (UAVs) is hampered by the challenge of accurately segmenting multi-damage in high-resolution imagery. While foundational models like the Segment Anything Model (SAM) offer data-efficient segmentation, their effectiveness is constrained by prompting strategies, especially for geometrically complex defects. This paper presents a comprehensive comparative analysis of deep learning strategies to identify an optimal deep learning pipeline for segmenting cracks, efflorescences, and exposed rebars. It systematically evaluates three distinct end-to-end segmentation frameworks: the native output of a YOLO11 model; the Segment Anything Model (SAM), prompted by bounding boxes; and SAM, guided by a point-prompting mechanism derived from the detector’s probability map. Based on these findings, a final, optimized hybrid pipeline is proposed: for linear cracks, the native segmentation output of the SAHI-trained YOLO model is used, while for efflorescence and exposed rebar, the model’s bounding boxes are used to prompt SAM for a refined segmentation. This class-specific strategy yielded a final mean Average Precision (mAP50) of 0.593, with class-specific Intersection over Union (IoU) scores of 0.495 (cracks), 0.331 (efflorescence), and 0.205 (exposed rebar). The results establish that the future of automated inspection lies in intelligent frameworks that leverage the respective strengths of specialized detectors and powerful foundation models in a context-aware manner. Full article
(This article belongs to the Special Issue Intelligent Sensors and Artificial Intelligence in Building)
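
A minimal sketch of prompting SAM with a detector bounding box, as in the hybrid pipeline above; the checkpoint path, image, and box coordinates are placeholders, and the snippet assumes the segment-anything package and a downloaded ViT-H checkpoint.

```python
# Hedged sketch: box-prompted SAM segmentation; paths, image, and box are placeholders.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder checkpoint path
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for an RGB inspection image
predictor.set_image(image)

box = np.array([120, 80, 300, 220])               # detector box as (x1, y1, x2, y2)
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)                        # (1, 480, 640) mask for the prompted box
```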
