Search Results (823)

Search Parameters:
Keywords = mean intersection over union
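The metric behind this query, mean Intersection over Union, can be sketched in a few lines. The function name and NumPy layout below are illustrative, not taken from any of the listed papers:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU from two integer label maps of equal shape.

    Per class c: IoU_c = |pred==c AND target==c| / |pred==c OR target==c|.
    Classes absent from both maps are skipped rather than scored as 0.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

Note that skipping classes absent from both maps is one common convention; some benchmarks instead accumulate a confusion matrix over the whole dataset before dividing, which gives slightly different numbers.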

21 pages, 6892 KiB  
Article
Enhanced Temporal Action Localization with Separated Bidirectional Mamba and Boundary Correction Strategy
by Xiangbin Liu and Qian Peng
Mathematics 2025, 13(15), 2458; https://doi.org/10.3390/math13152458 - 30 Jul 2025
Abstract
Temporal action localization (TAL) is a research hotspot in video understanding that aims to locate and classify actions in videos. However, existing methods struggle to capture long-term actions because they focus on local temporal information, which leads to poor performance when localizing long temporal sequences. In addition, most methods ignore the importance of boundaries for action instances, resulting in inaccurately localized boundaries. To address these issues, this paper proposes a state space model for temporal action localization, called Separated Bidirectional Mamba (SBM), which innovatively understands frame changes from the perspective of state transformation. It adapts to different sequence lengths and incorporates forward and backward state information for each frame through a forward Mamba and a backward Mamba, obtaining more comprehensive action representations and enhancing the modeling of long temporal sequences. Moreover, this paper designs a Boundary Correction Strategy (BCS). It calculates the contribution of each frame to action instances based on the pre-localized results, then adjusts the weights of frames in boundary regression so that boundaries shift towards frames with higher contributions, leading to more accurate boundaries. To demonstrate the effectiveness of the proposed method, this paper reports mean Average Precision (mAP) under temporal Intersection over Union (tIoU) thresholds on four challenging benchmarks: THUMOS13, ActivityNet-1.3, HACS, and FineAction, where the proposed method achieves mAPs of 73.7%, 42.0%, 45.2%, and 29.1%, respectively, surpassing state-of-the-art approaches. Full article
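The tIoU used in the abstract above compares 1-D temporal segments rather than 2-D masks. A minimal sketch, with an assumed helper name not taken from the paper:

```python
def temporal_iou(seg_a, seg_b):
    """tIoU of two 1-D segments given as (start, end) in seconds or frames."""
    (s1, e1), (s2, e2) = seg_a, seg_b
    inter = max(0.0, min(e1, e2) - max(s1, s2))   # overlap length, clamped at 0
    union = (e1 - s1) + (e2 - s2) - inter         # total covered length
    return inter / union if union > 0 else 0.0
```

A predicted action is typically counted as a true positive at threshold t when its tIoU with an unmatched ground-truth segment is at least t; mAP is then averaged over a set of such thresholds.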
(This article belongs to the Special Issue Advances in Applied Mathematics in Computer Vision)

27 pages, 6715 KiB  
Article
Structural Component Identification and Damage Localization of Civil Infrastructure Using Semantic Segmentation
by Piotr Tauzowski, Mariusz Ostrowski, Dominik Bogucki, Piotr Jarosik and Bartłomiej Błachowski
Sensors 2025, 25(15), 4698; https://doi.org/10.3390/s25154698 - 30 Jul 2025
Abstract
Visual inspection of civil infrastructure for structural health assessment, as performed by structural engineers, is expensive and time-consuming. Automating this process is therefore highly attractive and has received significant attention in recent years. With the increasing capabilities of computers, deep neural networks have become a standard tool and can be used for structural health inspections. A key challenge, however, is the availability of reliable datasets. In this work, the U-net and DeepLab v3+ convolutional neural networks are trained on the synthetic Tokaido dataset. This dataset comprises images representative of data acquired by unmanned aerial vehicle (UAV) imagery and corresponding ground truth data. The data include semantic segmentation masks both for categorizing structural elements (slabs, beams, and columns) and for assessing structural damage (concrete spalling or exposed rebars). Data augmentation, including both image quality degradation (e.g., brightness modification, added noise) and image transformations (e.g., image flipping), is applied to the synthetic dataset. The selected neural network architectures achieve excellent performance, reaching 97% accuracy and 87% Mean Intersection over Union (mIoU) on the validation data. They also demonstrate promising results in the semantic segmentation of real-world structures captured in photographs, despite being trained solely on synthetic data. Additionally, based on the obtained semantic segmentation results, it can be concluded that DeepLabV3+ outperforms U-net in structural component identification; however, this is not the case in the damage identification task. Full article
(This article belongs to the Special Issue AI-Assisted Condition Monitoring and Fault Diagnosis)

16 pages, 5245 KiB  
Article
Automatic Detection of Foraging Hens in a Cage-Free Environment with Computer Vision Technology
by Samin Dahal, Xiao Yang, Bidur Paneru, Anjan Dhungana and Lilong Chai
Poultry 2025, 4(3), 34; https://doi.org/10.3390/poultry4030034 - 30 Jul 2025
Abstract
Foraging behavior in hens is an important indicator of animal welfare. It involves both the search for food and exploration of the environment, which provides necessary enrichment. In addition, it has been inversely linked to damaging behaviors such as severe feather pecking. Conventional studies rely on manual observation to investigate foraging location, duration, timing, and frequency. However, this approach is labor-intensive, time-consuming, and subject to human bias. Our study developed computer vision-based methods to automatically detect foraging hens in a cage-free research environment and compared their performance. A cage-free room was divided into four pens, two larger pens measuring 2.9 m × 2.3 m with 30 hens each and two smaller pens measuring 2.3 m × 1.8 m with 18 hens each. Cameras were positioned vertically, 2.75 m above the floor, recording the videos at 15 frames per second. Out of 4886 images, 70% were used for model training, 20% for validation, and 10% for testing. We trained multiple You Only Look Once (YOLO) object detection models from YOLOv9, YOLOv10, and YOLO11 series for 100 epochs each. All the models achieved precision, recall, and mean average precision at 0.5 intersection over union (mAP@0.5) above 75%. YOLOv9c achieved the highest precision (83.9%), YOLO11x achieved the highest recall (86.7%), and YOLO11m achieved the highest mAP@0.5 (89.5%). These results demonstrate the use of computer vision to automatically detect complex poultry behavior, such as foraging, making it more efficient. Full article
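The mAP@0.5 reported above counts a detection as correct when its bounding box overlaps a ground-truth box with IoU of at least 0.5. The box IoU itself can be sketched as follows (function name illustrative, not from the paper):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) pixel coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0
```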

20 pages, 19642 KiB  
Article
SIRI-MOGA-UNet: A Synergistic Framework for Subsurface Latent Damage Detection in ‘Korla’ Pears via Structured-Illumination Reflectance Imaging and Multi-Order Gated Attention
by Baishao Zhan, Jiawei Liao, Hailiang Zhang, Wei Luo, Shizhao Wang, Qiangqiang Zeng and Yongxian Lai
Spectrosc. J. 2025, 3(3), 22; https://doi.org/10.3390/spectroscj3030022 - 29 Jul 2025
Abstract
Bruising in ‘Korla’ pears represents a prevalent phenomenon that leads to progressive fruit decay and substantial economic losses. The detection of early-stage bruising proves challenging due to the absence of visible external characteristics, and existing deep learning models have limitations in weak feature extraction under complex optical interference. To address the postharvest latent damage detection challenges in ‘Korla’ pears, this study proposes a collaborative detection framework integrating structured-illumination reflectance imaging (SIRI) with multi-order gated attention mechanisms. Initially, an SIRI optical system was constructed, employing 150 cycles·m−1 spatial frequency modulation and a three-phase demodulation algorithm to extract subtle interference signal variations, thereby generating RT (Relative Transmission) images with significantly enhanced contrast in subsurface damage regions. To improve the detection accuracy of latent damage areas, the MOGA-UNet model was developed with three key innovations: 1. Integrate the lightweight VGG16 encoder structure into the feature extraction network to improve computational efficiency while retaining details. 2. Add a multi-order gated aggregation module at the end of the encoder to realize the fusion of features at different scales through a special convolution method. 3. Embed the channel attention mechanism in the decoding stage to dynamically enhance the weight of feature channels related to damage. Experimental results demonstrate that the proposed model achieves 94.38% mean Intersection over Union (mIoU) and 97.02% Dice coefficient on RT images, outperforming the baseline UNet model by 2.80% with superior segmentation accuracy and boundary localization capabilities compared with mainstream models. This approach provides an efficient and reliable technical solution for intelligent postharvest agricultural product sorting. Full article

31 pages, 103100 KiB  
Article
Semantic Segmentation of Small Target Diseases on Tobacco Leaves
by Yanze Zou, Zhenping Qiang, Shuang Zhang and Hong Lin
Agronomy 2025, 15(8), 1825; https://doi.org/10.3390/agronomy15081825 - 28 Jul 2025
Abstract
The application of image recognition technology plays a vital role in agricultural disease identification. Existing approaches primarily rely on image classification, object detection, or semantic segmentation. However, a major challenge in current semantic segmentation methods lies in accurately identifying small target objects. In this study, common tobacco leaf diseases—such as frog-eye disease, climate spots, and wildfire disease—are characterized by small lesion areas, with an average target size of only 32 pixels. This poses significant challenges for existing techniques to achieve precise segmentation. To address this issue, we propose integrating two attention mechanisms, namely cross-feature map attention and dual-branch attention, which are incorporated into the semantic segmentation network to enhance performance on small lesion segmentation. Moreover, considering the lack of publicly available datasets for tobacco leaf disease segmentation, we constructed a training dataset via image splicing. Extensive experiments were conducted on baseline segmentation models, including UNet, DeepLab, and HRNet. Experimental results demonstrate that the proposed method improves the mean Intersection over Union (mIoU) by 4.75% on the constructed dataset, with only a 15.07% increase in computational cost. These results validate the effectiveness of our novel attention-based strategy in the specific context of tobacco leaf disease segmentation. Full article
(This article belongs to the Section Pest and Disease Management)

25 pages, 2518 KiB  
Article
An Efficient Semantic Segmentation Framework with Attention-Driven Context Enhancement and Dynamic Fusion for Autonomous Driving
by Jia Tian, Peizeng Xin, Xinlu Bai, Zhiguo Xiao and Nianfeng Li
Appl. Sci. 2025, 15(15), 8373; https://doi.org/10.3390/app15158373 - 28 Jul 2025
Abstract
In recent years, a growing number of real-time semantic segmentation networks have been developed to improve segmentation accuracy. However, these advancements often come at the cost of increased computational complexity, which limits their inference efficiency, particularly in scenarios such as autonomous driving, where strict real-time performance is essential. Achieving an effective balance between speed and accuracy has thus become a central challenge in this field. To address this issue, we present a lightweight semantic segmentation model tailored for the perception requirements of autonomous vehicles. The architecture follows an encoder–decoder paradigm, which not only preserves the capability for deep feature extraction but also facilitates multi-scale information integration. The encoder leverages a high-efficiency backbone, while the decoder introduces a dynamic fusion mechanism designed to enhance information interaction between different feature branches. Recognizing the limitations of convolutional networks in modeling long-range dependencies and capturing global semantic context, the model incorporates an attention-based feature extraction component. This is further augmented by positional encoding, enabling better awareness of spatial structures and local details. The dynamic fusion mechanism employs an adaptive weighting strategy, adjusting the contribution of each feature channel to reduce redundancy and improve representation quality. To validate the effectiveness of the proposed network, experiments were conducted on a single RTX 3090 GPU. The Dynamic Real-time Integrated Vision Encoder–Segmenter Network (DriveSegNet) achieved a mean Intersection over Union (mIoU) of 76.9% and an inference speed of 70.5 FPS on the Cityscapes test dataset, 74.6% mIoU and 139.8 FPS on the CamVid test dataset, and 35.8% mIoU with 108.4 FPS on the ADE20K dataset. 
The experimental results demonstrate that the proposed method achieves an excellent balance between inference speed, segmentation accuracy, and model size. Full article

21 pages, 5527 KiB  
Article
SGNet: A Structure-Guided Network with Dual-Domain Boundary Enhancement and Semantic Fusion for Skin Lesion Segmentation
by Haijiao Yun, Qingyu Du, Ziqing Han, Mingjing Li, Le Yang, Xinyang Liu, Chao Wang and Weitian Ma
Sensors 2025, 25(15), 4652; https://doi.org/10.3390/s25154652 - 27 Jul 2025
Abstract
Segmentation of skin lesions in dermoscopic images is critical for the accurate diagnosis of skin cancers, particularly malignant melanoma, yet it is hindered by irregular lesion shapes, blurred boundaries, low contrast, and artifacts, such as hair interference. Conventional deep learning methods, typically based on UNet or Transformer architectures, often face limitations in regard to fully exploiting lesion features and incur high computational costs, compromising precise lesion delineation. To overcome these challenges, we propose SGNet, a structure-guided network, integrating a hybrid CNN–Mamba framework for robust skin lesion segmentation. The SGNet employs the Visual Mamba (VMamba) encoder to efficiently extract multi-scale features, followed by the Dual-Domain Boundary Enhancer (DDBE), which refines boundary representations and suppresses noise through spatial and frequency-domain processing. The Semantic-Texture Fusion Unit (STFU) adaptively integrates low-level texture with high-level semantic features, while the Structure-Aware Guidance Module (SAGM) generates coarse segmentation maps to provide global structural guidance. The Guided Multi-Scale Refiner (GMSR) further optimizes boundary details through a multi-scale semantic attention mechanism. Comprehensive experiments based on the ISIC2017, ISIC2018, and PH2 datasets demonstrate SGNet’s superior performance, with average improvements of 3.30% in terms of the mean Intersection over Union (mIoU) value and 1.77% in regard to the Dice Similarity Coefficient (DSC) compared to state-of-the-art methods. Ablation studies confirm the effectiveness of each component, highlighting SGNet’s exceptional accuracy and robust generalization for computer-aided dermatological diagnosis. Full article
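For a single prediction–ground-truth mask pair, the Dice Similarity Coefficient and IoU reported above are linked algebraically: Dice = 2·IoU / (1 + IoU). A one-line illustrative helper (name assumed, not from the paper):

```python
def dice_from_iou(iou):
    """Dice (F1) coefficient for the same mask pair, given its IoU.

    From Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|.
    """
    return 2.0 * iou / (1.0 + iou)
```

Note that this identity holds per sample; it does not hold exactly between dataset-averaged mIoU and average DSC, since the means are taken over different nonlinear quantities.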
(This article belongs to the Section Biomedical Sensors)

21 pages, 3463 KiB  
Article
Apple Rootstock Cutting Drought-Stress-Monitoring Model Based on IMYOLOv11n-Seg
by Xu Wang, Hongjie Liu, Pengfei Wang, Long Gao and Xin Yang
Agriculture 2025, 15(15), 1598; https://doi.org/10.3390/agriculture15151598 - 24 Jul 2025
Abstract
To ensure the normal water status of apple rootstock softwood cuttings during the initial stage of cutting, a drought stress monitoring model was designed. The model is optimized based on the YOLOv11n-seg instance segmentation model, using the leaf curl degree of cuttings as the classification basis for drought-stress grades. The backbone structure of the IMYOLOv11n-seg model is improved by the C3K2_CMUNeXt module and the multi-head self-attention (MHSA) mechanism module. The neck part is optimized by the KFHA module (Kalman filter and Hungarian algorithm model), and the head part enhances post-processing effects through HIoU-SD (hierarchical IoU–spatial distance filtering algorithm). The IMYOLOv11-seg model achieves an average inference speed of 33.53 FPS (frames per second) and the mean intersection over union (MIoU) value of 0.927. The average recognition accuracies for cuttings under normal water status, mild drought stress, moderate drought stress, and severe drought stress are 94.39%, 93.27%, 94.31%, and 94.71%, respectively. The IMYOLOv11n-seg model demonstrates the best comprehensive performance in ablation and comparative experiments. The automatic humidification system equipped with the IMYOLOv11n-seg model saves 6.14% more water than the labor group. This study provides a design approach for an automatic humidification system in protected agriculture during apple rootstock cutting propagation. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

23 pages, 9603 KiB  
Article
Label-Efficient Fine-Tuning for Remote Sensing Imagery Segmentation with Diffusion Models
by Yiyun Luo, Jinnian Wang, Jean Sequeira, Xiankun Yang, Dakang Wang, Jiabin Liu, Grekou Yao and Sébastien Mavromatis
Remote Sens. 2025, 17(15), 2579; https://doi.org/10.3390/rs17152579 - 24 Jul 2025
Abstract
High-resolution remote sensing imagery plays an essential role in urban management and environmental monitoring, providing detailed insights for applications ranging from land cover mapping to disaster response. Semantic segmentation methods are among the most effective techniques for comprehensive land cover mapping, and they commonly employ ImageNet-based pre-training semantics. However, traditional fine-tuning processes exhibit poor transferability across different downstream tasks and require large amounts of labeled data. To address these challenges, we introduce Denoising Diffusion Probabilistic Models (DDPMs) as a generative pre-training approach for semantic features extraction in remote sensing imagery. We pre-trained a DDPM on extensive unlabeled imagery, obtaining features at multiple noise levels and resolutions. In order to integrate and optimize these features efficiently, we designed a multi-layer perceptron module with residual connections. It performs channel-wise optimization to suppress feature redundancy and refine representations. Additionally, we froze the feature extractor during fine-tuning. This strategy significantly reduces computational consumption and facilitates fast transfer and deployment across various interpretation tasks on homogeneous imagery. Our comprehensive evaluation on the sparsely labeled dataset MiniFrance-S and the fully labeled Gaofen Image Dataset achieved mean intersection over union scores of 42.7% and 66.5%, respectively, outperforming previous works. This demonstrates that our approach effectively reduces reliance on labeled imagery and increases transferability to downstream remote sensing tasks. Full article
(This article belongs to the Special Issue AI-Driven Mapping Using Remote Sensing Data)

25 pages, 6462 KiB  
Article
Phenotypic Trait Acquisition Method for Tomato Plants Based on RGB-D SLAM
by Penggang Wang, Yuejun He, Jiguang Zhang, Jiandong Liu, Ran Chen and Xiang Zhuang
Agriculture 2025, 15(15), 1574; https://doi.org/10.3390/agriculture15151574 - 22 Jul 2025
Abstract
The acquisition of plant phenotypic traits is essential for selecting superior varieties, improving crop yield, and supporting precision agriculture and agricultural decision-making. Therefore, it plays a significant role in modern agriculture and plant science research. Traditional manual measurements of phenotypic traits are labor-intensive and inefficient. In contrast, combining 3D reconstruction technologies with autonomous vehicles enables more intuitive and efficient trait acquisition. This study proposes a 3D semantic reconstruction system based on an improved ORB-SLAM3 framework, which is mounted on an unmanned vehicle to acquire phenotypic traits in tomato cultivation scenarios. The vehicle is also equipped with the A * algorithm for autonomous navigation. To enhance the semantic representation of the point cloud map, we integrate the BiSeNetV2 network into the ORB-SLAM3 system as a semantic segmentation module. Furthermore, a two-stage filtering strategy is employed to remove outliers and improve the map accuracy, and OctoMap is adopted to store the point cloud data, significantly reducing the memory consumption. A spherical fitting method is applied to estimate the number of tomato fruits. The experimental results demonstrate that BiSeNetV2 achieves a mean intersection over union (mIoU) of 95.37% and a frame rate of 61.98 FPS on the tomato dataset, enabling real-time segmentation. The use of OctoMap reduces the memory consumption by an average of 96.70%. The relative errors when predicting the plant height, canopy width, and volume are 3.86%, 14.34%, and 27.14%, respectively, while the errors concerning the fruit count and fruit volume are 14.36% and 14.25%. Localization experiments on a field dataset show that the proposed system achieves a mean absolute trajectory error (mATE) of 0.16 m and a root mean square error (RMSE) of 0.21 m, indicating high localization accuracy. 
Therefore, the proposed system can accurately acquire the phenotypic traits of tomato plants, providing data support for precision agriculture and agricultural decision-making. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

18 pages, 10000 KiB  
Article
Predicting Neoadjuvant Chemotherapy Response in Triple-Negative Breast Cancer Using Pre-Treatment Histopathologic Images
by Hikmat Khan, Ziyu Su, Huina Zhang, Yihong Wang, Bohan Ning, Shi Wei, Hua Guo, Zaibo Li and Muhammad Khalid Khan Niazi
Cancers 2025, 17(15), 2423; https://doi.org/10.3390/cancers17152423 - 22 Jul 2025
Abstract
Triple-negative breast cancer (TNBC) remains a major clinical challenge due to its aggressive behavior and lack of targeted therapies. Accurate early prediction of response to neoadjuvant chemotherapy (NACT) is essential for guiding personalized treatment strategies and improving patient outcomes. In this study, we present an attention-based multiple instance learning (MIL) framework designed to predict pathologic complete response (pCR) directly from pre-treatment hematoxylin and eosin (H&E)-stained biopsy slides. The model was trained on a retrospective in-house cohort of 174 TNBC patients and externally validated on an independent cohort (n = 30). It achieved a mean area under the curve (AUC) of 0.85 during five-fold cross-validation and 0.78 on external testing, demonstrating robust predictive performance and generalizability. To enhance model interpretability, attention maps were spatially co-registered with multiplex immunohistochemistry (mIHC) data stained for PD-L1, CD8+ T cells, and CD163+ macrophages. The attention regions exhibited moderate spatial overlap with immune-enriched areas, with mean Intersection over Union (IoU) scores of 0.47 for PD-L1, 0.45 for CD8+ T cells, and 0.46 for CD163+ macrophages. The presence of these biomarkers in high-attention regions supports their biological relevance to NACT response in TNBC. This not only improves model interpretability but may also inform future efforts to identify clinically actionable histological biomarkers directly from H&E-stained biopsy slides, further supporting the utility of this approach for accurate NACT response prediction and advancing precision oncology in TNBC. Full article
(This article belongs to the Section Cancer Informatics and Big Data)

24 pages, 9379 KiB  
Article
Performance Evaluation of YOLOv11 and YOLOv12 Deep Learning Architectures for Automated Detection and Classification of Immature Macauba (Acrocomia aculeata) Fruits
by David Ribeiro, Dennis Tavares, Eduardo Tiradentes, Fabio Santos and Demostenes Rodriguez
Agriculture 2025, 15(15), 1571; https://doi.org/10.3390/agriculture15151571 - 22 Jul 2025
Abstract
The automated detection and classification of immature macauba (Acrocomia aculeata) fruits is critical for improving post-harvest processing and quality control. In this study, we present a comparative evaluation of two state-of-the-art YOLO architectures, YOLOv11x and YOLOv12x, trained on the newly constructed VIC01 dataset comprising 1600 annotated images captured under both background-free and natural background conditions. Both models were implemented in PyTorch and trained until the convergence of box regression, classification, and distribution-focal losses. Under an IoU (intersection over union) threshold of 0.50, YOLOv11x and YOLOv12x achieved an identical mean average precision (mAP50) of 0.995 with perfect precision and recall or TPR (true positive rate). Averaged over IoU thresholds from 0.50 to 0.95, YOLOv11x demonstrated superior spatial localization performance (mAP50–95 = 0.973), while YOLOv12x exhibited robust performance in complex background scenarios, achieving a competitive mAP50–95. Inference throughput averaged 3.9 ms per image for YOLOv11x and 6.7 ms for YOLOv12x, highlighting a trade-off between speed and architectural complexity. Fused model representations revealed optimized layer fusion and reduced computational overhead (GFLOPs), facilitating efficient deployment. Confusion-matrix analyses confirmed YOLOv11x’s ability to reject background clutter more effectively than YOLOv12x, whereas precision–recall and F1-score curves indicated both models maintain near-perfect detection balance across thresholds. The public release of the VIC01 dataset and trained weights ensures reproducibility and supports future research. Our results underscore the importance of selecting architectures based on application-specific requirements, balancing detection accuracy, background discrimination, and computational constraints. 
Future work will extend this framework to additional maturation stages, sensor fusion modalities, and lightweight edge-deployment variants. By facilitating precise immature fruit identification, this work contributes to sustainable production and value addition in macauba processing. Full article
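The mAP50–95 figures quoted above are the COCO-style average of AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal sketch of that averaging, where the `ap_at` callback is a stand-in for a full per-threshold AP computation:

```python
def map_50_95(ap_at):
    """COCO-style mAP50-95: average AP over IoU thresholds 0.50:0.05:0.95.

    `ap_at` maps an IoU threshold to the AP measured at that threshold.
    """
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(ap_at(t) for t in thresholds) / len(thresholds)
```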
(This article belongs to the Section Agricultural Technology)

28 pages, 4950 KiB  
Article
A Method for Auto Generating a Remote Sensing Building Detection Sample Dataset Based on OpenStreetMap and Bing Maps
by Jiawei Gu, Chen Ji, Houlin Chen, Xiangtian Zheng, Liangbao Jiao and Liang Cheng
Remote Sens. 2025, 17(14), 2534; https://doi.org/10.3390/rs17142534 - 21 Jul 2025
Abstract
In remote sensing building detection tasks, data acquisition remains a critical bottleneck that limits both model performance and large-scale deployment. Due to the high cost of manual annotation, limited geographic coverage, and constraints of image acquisition conditions, obtaining large-scale, high-quality labeled datasets remains a significant challenge. To address this issue, this study proposes an automatic semantic labeling framework for remote sensing imagery. The framework leverages geospatial vector data provided by OpenStreetMap, precisely aligns it with high-resolution satellite imagery from Bing Maps through projection transformation, and incorporates a quality-aware sample filtering strategy to automatically generate accurate annotations for building detection. The resulting dataset comprises 36,647 samples, covering buildings in both urban and suburban areas across multiple cities. To evaluate its effectiveness, we selected three publicly available datasets—WHU, INRIA, and DZU—and conducted three types of experiments using the following four representative object detection models: SSD, Faster R-CNN, DETR, and YOLOv11s. The experiments include benchmark performance evaluation, input perturbation robustness testing, and cross-dataset generalization analysis. Results show that our dataset achieved a mAP at 0.5 intersection over union of up to 93.2%, with a precision of 89.4% and a recall of 90.6%, outperforming the open-source benchmarks across all four models. Furthermore, when simulating real-world noise in satellite image acquisition—such as motion blur and brightness variation—our dataset maintained a mean average precision of 90.4% under the most severe perturbation, indicating strong robustness. In addition, it demonstrated superior cross-dataset stability compared to the benchmarks. Finally, comparative experiments conducted on public test areas further validated the effectiveness and reliability of the proposed annotation framework. Full article

17 pages, 4914 KiB  
Article
Large-Scale Point Cloud Semantic Segmentation with Density-Based Grid Decimation
by Liangcun Jiang, Jiacheng Ma, Han Zhou, Boyi Shangguan, Hongyu Xiao and Zeqiang Chen
ISPRS Int. J. Geo-Inf. 2025, 14(7), 279; https://doi.org/10.3390/ijgi14070279 - 17 Jul 2025
Viewed by 444
Abstract
Accurate segmentation of point clouds into categories such as roads, buildings, and trees is critical for applications in 3D reconstruction and autonomous driving. However, large-scale point cloud segmentation encounters challenges such as uneven density distribution, inefficient sampling, and limited feature extraction capabilities. To [...] Read more.
Accurate segmentation of point clouds into categories such as roads, buildings, and trees is critical for applications in 3D reconstruction and autonomous driving. However, large-scale point cloud segmentation encounters challenges such as uneven density distribution, inefficient sampling, and limited feature extraction capabilities. To address these issues, this paper proposes RT-Net, a novel framework that incorporates a density-based grid decimation algorithm for efficient preprocessing of outdoor point clouds. The proposed framework helps alleviate the problem of uneven density distribution and improves computational efficiency. RT-Net also introduces two modules: Local Attention Aggregation, which extracts local detailed features of points using an attention mechanism, enhancing the model’s recognition ability for small-sized objects; and Attention Residual, which integrates local details of point clouds with global features by an attention mechanism to improve the model’s generalization ability. Experimental results on the Toronto3D, Semantic3D, and SemanticKITTI datasets demonstrate the superiority of RT-Net for small-sized object segmentation, achieving state-of-the-art mean Intersection over Union (mIoU) scores of 86.79% on Toronto3D and 79.88% on Semantic3D. Full article
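The paper's exact density-based grid decimation algorithm is not given in the abstract; a generic sketch of grid decimation, assuming the common scheme of hashing points into voxel cells and capping the number of points kept per occupied cell so dense regions are thinned while sparse ones are preserved, is:

```python
import random
from collections import defaultdict

def grid_decimate(points, cell_size, max_per_cell=1, seed=0):
    """Hash each 3-D point into a voxel cell and keep at most
    `max_per_cell` points per occupied cell, evening out the
    density imbalance between crowded and sparse regions."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(p)
    rng = random.Random(seed)
    kept = []
    for cell_points in cells.values():
        if len(cell_points) <= max_per_cell:
            kept.extend(cell_points)
        else:
            kept.extend(rng.sample(cell_points, max_per_cell))
    return kept

# a dense cluster and a lone point: one survivor per occupied cell
pts = [(0.10, 0.10, 0.0), (0.20, 0.10, 0.0),
       (0.15, 0.20, 0.0), (5.00, 5.00, 0.0)]
thinned = grid_decimate(pts, cell_size=1.0)
```

The function name and the per-cell random sampling rule are assumptions for illustration; RT-Net's actual decimation may weight cells by local density rather than sampling uniformly.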

21 pages, 4008 KiB  
Article
Enhancing Suburban Lane Detection Through Improved DeepLabV3+ Semantic Segmentation
by Shuwan Cui, Bo Yang, Zhifu Wang, Yi Zhang, Hao Li, Hui Gao and Haijun Xu
Electronics 2025, 14(14), 2865; https://doi.org/10.3390/electronics14142865 - 17 Jul 2025
Viewed by 279
Abstract
Lane detection is a key technology in automatic driving environment perception, and its accuracy directly affects vehicle positioning, path planning, and driving safety. In this study, an enhanced real-time model for lane detection based on an improved DeepLabV3+ architecture is proposed to address [...] Read more.
Lane detection is a key technology in automatic driving environment perception, and its accuracy directly affects vehicle positioning, path planning, and driving safety. In this study, an enhanced real-time model for lane detection based on an improved DeepLabV3+ architecture is proposed to address the challenges posed by complex dynamic backgrounds and blurred road boundaries in suburban road scenarios. To address the lack of feature correlation in the traditional Atrous Spatial Pyramid Pooling (ASPP) module of the DeepLabV3+ model, we propose an improved LC-DenseASPP module. First, inspired by DenseASPP, the number of dilated convolution layers is reduced from six to three by adopting a dense connection to enhance feature reuse, significantly reducing computational complexity. Second, the convolutional block attention module (CBAM) attention mechanism is embedded after the LC-DenseASPP dilated convolution operation. This effectively improves the model’s ability to focus on key features through the adaptive refinement of channel and spatial attention features. Finally, an image-pooling operation is introduced in the last layer of the LC-DenseASPP to further enhance the ability to capture global context information. DySample is introduced to replace bilinear upsampling in the decoder, ensuring model performance while reducing computational resource consumption. The experimental results show that the model achieves a good balance between segmentation accuracy and computational efficiency, with a mean intersection over union (mIoU) of 95.48% and an inference speed of 128 frames per second (FPS). Additionally, a new lane-detection dataset, SubLane, is constructed to fill the gap in the research field of lane detection in suburban road scenarios. Full article
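The mIoU figure quoted above is a standard segmentation metric; a minimal sketch of how such a score is computed from flat per-pixel predictions and labels (not code from the paper) is:

```python
def mean_iou(preds, labels, num_classes):
    """Mean intersection over union across classes, from flat
    per-pixel predicted and ground-truth class indices."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(preds, labels):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # a disagreement enlarges both classes' unions
            union[p] += 1
            union[t] += 1
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious)

# toy two-class example: background (0) vs. lane (1)
pred  = [0, 0, 1, 1, 1, 0]
truth = [0, 0, 1, 1, 0, 0]
score = mean_iou(pred, truth, num_classes=2)  # mean of 3/4 and 2/3
```

In practice the per-class counts would be accumulated over the whole test set before dividing, as a single image may not contain every class.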
