MDPI - Publisher of Open Access Journals

17 pages, 11141 KB

Open Access Article

Dynamic Fine-Tuning Rotation Network for Semantic Segmentation of Rock Paintings

by Chuanping Bai, Donglin Jing, Zhixue Wang and Fangqin Zhang

Algorithms 2026, 19(5), 349; https://doi.org/10.3390/a19050349 - 1 May 2026

Viewed by 270

Abstract

The scale features of rock art exhibit significant diversity and gradual variation. Among existing semantic segmentation methods for rock art, some models have noted the scale differences among rock art patterns and the complexity of their directional features and have proposed targeted improvements, but most of these methods treat scale adaptation and directional representation as unrelated problems. They fail to model the intrinsic correlation between the two and, in particular, overlook how scale accuracy constrains the extraction of directional features. This ultimately leads to “spatial representation misalignment” in the semantic segmentation of rock art. To address these problems, this paper proposes a Dynamic Fine-tuning Rotation Network (DFTR-Net), which aims to resolve imprecise scale feature extraction and directional misalignment for rock art patterns with arbitrary orientations. The network consists of a dynamic selective convolution structure and a shape-aware spatial feature extraction module. Specifically, the dynamic selective convolution dynamically adjusts the coverage of the receptive field through inter-layer feature aggregation. It uses stacked small dilated convolution kernels in place of large kernels with the same receptive field to extract the neighborhood details of patterns. Combined with feature aggregation, it then constructs spatial feature differences and performs intra-layer dynamic weighted fusion, thereby achieving accurate scale feature extraction. After fine-grained scale features are obtained, the shape-aware module first corrects the initial segmentation candidate regions of the patterns to generate directional guide boxes. It then drives the rotational sampling of convolution kernels based on the angles of the guide boxes, forming region-constrained deformable convolutions that adapt to the shape of the patterns. These convolution kernels receive strong supervision from pixel-level annotations, which increases their sensitivity to the directional features of the patterns and effectively alleviates directional misalignment. Extensive experiments show that DFTR-Net achieves higher performance than existing methods on the 3D-pitoti and Petroglyph Annotation datasets.
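The claim that stacked small dilated kernels can match the receptive field of a large kernel can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions and a hypothetical layer configuration (not taken from DFTR-Net): one 9x9 kernel versus a 3x3 kernel followed by a 3x3 kernel at dilation 3.

```python
def stacked_receptive_field(layers):
    """Receptive field (pixels per axis) of stride-1 convolutions applied in sequence.

    `layers` is a list of (kernel_size, dilation) pairs; each layer widens the
    field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


def conv_weight_count(layers, channels):
    """Weights (biases ignored) for square kernels mapping `channels` -> `channels`."""
    return sum(k * k * channels * channels for k, _ in layers)


# Hypothetical configuration, chosen only for illustration.
large = [(9, 1)]
stacked = [(3, 1), (3, 3)]
channels = 64

print(stacked_receptive_field(large), stacked_receptive_field(stacked))  # 9 9
print(conv_weight_count(large, channels), conv_weight_count(stacked, channels))  # 331776 73728
```

The dilation pair 1 then 3 also leaves no gaps: the first layer covers offsets -1 to 1 and the second adds -3, 0 and 3, so every offset from -4 to 4 is reached. Two 3x3 layers both at dilation 3 would reach only -6, -3, 0, 3 and 6 (the "gridding" effect).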

(This article belongs to the Special Issue Advances in Deep Learning-Based Data Analysis)

25 pages, 23737 KB

Open Access Article

A Soybean Rust Resistance Evaluation Approach Based on a Novel Spectral Index SRSI

by Shuxin Zhu, Jiarui Feng, Hongfeng Yu, Xianglin Dou, Huanliang Xu and Zhaoyu Zhai

Agriculture 2026, 16(9), 951; https://doi.org/10.3390/agriculture16090951 - 26 Apr 2026

Viewed by 747

Abstract

Soybean rust is a widespread and rapidly spreading fungal disease that poses a serious threat to both the yield and the quality of soybeans. Traditional vegetation indices struggle to assess disease severity across infection stages, particularly during early or mild stages, because the spectral response is weak. In this study, we propose a soybean rust resistance identification model, RustNet-3D (Soybean Rust Disease Diagnosis Network-3D), which integrates a 3D deformable convolution module and a spectral dilated convolution module to classify disease severity levels accurately. We further introduce a spectral feature band extraction module, iBSAM (improved Band Selection and Attention Module), built on a modified depthwise separable convolution architecture. iBSAM applies band-wise independent convolution to model each spectral band individually, uses a hard thresholding strategy to remove redundant information, and integrates a channel attention mechanism to increase the model’s sensitivity to discriminative wavelengths. Modeling time-series hyperspectral data of soybean rust identifies five highly sensitive spectral bands (581 nm, 605 nm, 596 nm, 609 nm, and 628 nm), which are then used to construct the Soybean Rust Spectral Index (SRSI). Experimental results show that RustNet-3D achieves an overall accuracy (OA) of 92.74%, and the correlation coefficient between SRSI and disease severity reaches 0.89, supporting the effectiveness of the selected spectral features. This study provides a rapid and accurate solution for soybean rust severity evaluation and an efficient, automated approach to resistance identification and intelligent breeding.
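The abstract reports a correlation coefficient of 0.89 between SRSI and disease severity without naming the coefficient. The sketch below assumes Pearson's r, the usual choice. The SRSI formula itself is not given in the abstract, so the index values in the usage example are hypothetical placeholders, as are the severity grades.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Usage with hypothetical per-plot values: index scores vs. graded severity levels.
index_values = [0.12, 0.18, 0.25, 0.31, 0.44]
severity_levels = [0, 1, 1, 2, 3]
print(round(pearson_r(index_values, severity_levels), 3))
```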

(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)

18 pages, 2199 KB

Open Access Article

Brain-Oct-Pvt: A Physics-Guided Transformer with Radial Prior and Deformable Alignment for Neurovascular Segmentation

by Quan Lan, Jianuo Huang, Chenxi Huang, Songyuan Song, Yuhao Shi, Zijun Zhao, Wenwen Wu, Hongbin Chen and Nan Liu

Bioengineering 2026, 13(3), 332; https://doi.org/10.3390/bioengineering13030332 - 13 Mar 2026

Viewed by 696

Abstract

The primary objective of this study is to develop a deep learning framework adapted to the physical characteristics of neurovascular Optical Coherence Tomography (OCT) imaging. Although Polyp-PVT, originally designed for polyp segmentation, shows promise for OCT analysis, it has limitations in neurovascular applications. Its default RGB input wastes resources on grayscale data duplicated across three channels, and its fixed-scale fusion struggles with variations in vascular curvature. Furthermore, its attention mechanism fails to capture radial vessel patterns, and its geometric constraints limit the detection of thin boundaries. To address these challenges, we propose Brain-OCT-PVT, whose key innovations are a single-channel input stem that reduces input-stem parameters by two-thirds; a Radial Intensity Module (RIM) that uses polar transforms and angular convolution to model annular structures; and a Deformable Cross-scale Fusion Module (D-CFM) with learnable offsets. The Boundary-aware Attention Module (BAM) combines Laplacian edge detection with a Swin Transformer for sub-pixel consistency. A specialized loss function combines the Dice Similarity Coefficient (Dice), BoundaryIoU on 2-pixel dilated edges, and the Focal Tversky loss to handle extreme class imbalance. Evaluation on 13 clinical cases yields a Dice score of 95.06% and a 95th-percentile Hausdorff Distance (HD95) of 0.269 mm, outperforming existing approaches.
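Two of the three loss terms named above have standard closed forms. The sketch below computes soft Dice and the Focal Tversky loss on flattened probability maps. The weights alpha = 0.7, beta = 0.3 and the exponent 0.75 are common defaults from the Focal Tversky literature, not values reported for Brain-OCT-PVT; the BoundaryIoU term and the paper's weighting of the three terms are not reproduced, and `combined_loss` is a hypothetical equal-weight combination.

```python
def soft_dice(pred, target, eps=1e-7):
    """Soft Dice coefficient between predicted probabilities and a binary mask."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """(1 - Tversky index) ** gamma; alpha weights false negatives, beta false positives.

    Some papers write the exponent as 1 / gamma with gamma = 4/3, which equals 0.75 here.
    """
    tp = sum(p * t for p, t in zip(pred, target))
    fn = sum((1.0 - p) * t for p, t in zip(pred, target))
    fp = sum(p * (1.0 - t) for p, t in zip(pred, target))
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    # Clamp so rounding can never give a negative base (a negative float ** 0.75 is complex).
    return max(0.0, 1.0 - tversky) ** gamma


def combined_loss(pred, target):
    """Hypothetical equal-weight sum of Dice loss and Focal Tversky loss."""
    return (1.0 - soft_dice(pred, target)) + focal_tversky_loss(pred, target)


# Usage: a thin vessel wall (few positive pixels) with one poorly predicted pixel.
target = [0, 0, 1, 1, 0, 0, 0, 0]
pred = [0.1, 0.0, 0.9, 0.2, 0.0, 0.1, 0.0, 0.0]
print(combined_loss(pred, target))
```

Because alpha > beta, a missed positive costs more than a spurious one, which is the property that makes Tversky-style losses useful for thin, sparse structures.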

(This article belongs to the Special Issue AI-Driven Imaging and Analysis for Biomedical Applications)

24 pages, 18324 KB

Open Access Article

DTRFR: A Unified Detector for Diverse Target Detection in High-Spatial-Resolution Spaceborne Infrared Video

by Xiaoying Wu, Dandan Li, Xin Chen, Kai Hu and Peng Rao

Remote Sens. 2026, 18(5), 780; https://doi.org/10.3390/rs18050780 - 4 Mar 2026

Viewed by 497

Abstract

Spaceborne infrared small-target detection plays a critical role in space-sky early warning, disaster rescue, and reconnaissance tracking, benefiting from all-time, all-weather, and wide-area monitoring capabilities. The deployment of high-spatial-resolution infrared payloads (ground sampling distance, GSD < 10 m) has introduced pronounced scale diversity among targets, leading to size-sensitive performance degradation in existing detectors and heightened risks of missed detections or false alarms in mixed-size scenarios. Furthermore, multi-frame infrared small-target detection methods often struggle to maintain temporal coherence during feature propagation across sequences. To overcome these limitations in high-resolution spaceborne infrared videos, we propose DTRFR, an end-to-end unified detection framework built on an enhanced recurrent feature refinement architecture. The approach comprises three parts: a realistic SITP-QLSD dataset derived from QLSAT-2 infrared backgrounds, featuring diverse scenes, multi-size small targets, and a dedicated generalization sub-test set with extremely small targets partially unseen in training; a multi-scale IRFeatureExtractor that uses parallel convolutions and dilated receptive fields to improve cross-scale discrimination and clutter suppression; and an adaptive gating pyramid deformable alignment module that optimizes sequence alignment and enhances temporal consistency, enabling robust performance across various clutter levels and dynamic backgrounds. Extensive evaluations on SITP-QLSD show that DTRFR attains competitive performance, achieving an mIoU of 74.32% and a probability of detection (Pd) of 94.51% on the main set, with strong robustness on the generalization sub-test set (Pd = 92.37%). Compared with single-frame and multi-frame baselines, the proposed method achieves higher detection accuracy with significantly fewer false alarms, benefiting from multi-scale feature extraction that enables robust detection of small targets of different sizes in infrared videos.
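The abstract does not detail the adaptive gating pyramid deformable alignment module. As a rough illustration of what "adaptive gating" usually means in feature alignment, the sketch below blends a reference-frame feature with an aligned neighboring-frame feature through a per-position sigmoid gate. In a real module the gate logits would be predicted by a small convolution; here they are passed in directly, and `gated_fuse` is a hypothetical name.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_fuse(reference, aligned, gate_logits):
    """Per-position convex blend g * aligned + (1 - g) * reference, with g = sigmoid(logit)."""
    if not (len(reference) == len(aligned) == len(gate_logits)):
        raise ValueError("inputs must have the same length")
    fused = []
    for r, a, z in zip(reference, aligned, gate_logits):
        g = sigmoid(z)
        fused.append(g * a + (1.0 - g) * r)
    return fused


# Usage: trust the aligned frame at position 0, the reference at position 1, split at 2.
reference = [0.2, 0.8, 0.5]
aligned = [0.6, 0.4, 0.5]
logits = [2.0, -2.0, 0.0]
print(gated_fuse(reference, aligned, logits))
```

The gate lets the network fall back on the reference frame wherever alignment is unreliable, which is one plausible route to the temporal consistency the abstract describes.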

(This article belongs to the Section Remote Sensing Image Processing)

25 pages, 24156 KB

Open Access Article

MLCANet: Multi-Level Composite Attention-Guided Network for Non-Homogeneous Image Dehazing in Adverse Weather Conditions

by Yongsheng Qiu

Sensors 2026, 26(5), 1505; https://doi.org/10.3390/s26051505 - 27 Feb 2026

Viewed by 424

Abstract

Image dehazing is a challenging ill-posed problem in low-level computer vision that requires restoring high-quality, haze-free images from complex, foggy scenes. Deep learning-based dehazing methods struggle to remove non-homogeneous fog because fog patches are uneven and dense, which makes real-world fog variations difficult to clear. A key challenge for non-homogeneous dehazing algorithms is efficiently capturing the spatial distribution of haze across regions of varying fog density while restoring fine image details. To address these challenges, we propose MLCANet, a multi-level composite attention-guided network for non-homogeneous image dehazing. MLCANet mitigates the impact of uneven haze through two main components: the Multi-level Composite Attention Generation Network (MCAGN) and the Dehazed Image Reconstruction Network (DIRN). The MCAGN integrates channel attention (CA), spatial attention (SA), and multi-scale pixel attention (MSPA) to capture haze features at different spatial scales. The DIRN, based on an encoder–decoder architecture, combines multi-scale dilated convolutions and deformable convolutions to restore fine image details more flexibly and efficiently. Extensive qualitative and quantitative experiments, along with ablation studies, demonstrate the effectiveness and feasibility of this method for non-homogeneous image dehazing.
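Spatial attention of the kind listed for MCAGN is often implemented in a CBAM-like way: pool across channels, turn the pooled map into a per-pixel weight with a sigmoid, and rescale every channel. The sketch below follows that pattern but, as a simplifying assumption, mixes the average- and max-pooled maps with two scalar weights instead of the 7x7 convolution CBAM uses. It is not MLCANet's SA design, which the abstract does not specify; the weights `w_avg`, `w_max` and `bias` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spatial_attention(feature, w_avg=1.0, w_max=1.0, bias=0.0):
    """Rescale a C x H x W feature (nested lists) by a per-pixel attention map.

    The map is sigmoid(w_avg * channel-mean + w_max * channel-max + bias) at each pixel.
    Returns (reweighted feature, attention map).
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    attn = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            column = [feature[c][i][j] for c in range(channels)]
            z = w_avg * sum(column) / channels + w_max * max(column) + bias
            attn[i][j] = sigmoid(z)
    out = [
        [[feature[c][i][j] * attn[i][j] for j in range(width)] for i in range(height)]
        for c in range(channels)
    ]
    return out, attn


# Usage: two channels on a 2x2 grid; the pixel with the strongest response gets the largest weight.
feature = [[[0.0, 0.1], [0.2, 2.0]], [[0.0, 0.3], [0.1, 1.5]]]
_, attn = spatial_attention(feature)
print(attn)
```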

(This article belongs to the Special Issue Image Processing and Visual Recognition for Adverse Weather Sensing and Monitoring)

20 pages, 1304 KB

Open Access Article

LSDA-YOLO: Enhanced SAR Target Detection with Large Kernel and SimAM Dual Attention

by Jingtian Yang and Lei Zhu

Symmetry 2026, 18(1), 23; https://doi.org/10.3390/sym18010023 - 23 Dec 2025

Cited by 1 | Viewed by 825

Abstract

Synthetic Aperture Radar (SAR) target detection faces significant challenges, including speckle noise interference, weak features of small objects, and multi-category imbalance. To address these issues, this paper proposes LSDA-YOLO, an enhanced SAR target detection framework built on the YOLO architecture that integrates Large Kernel Attention and SimAM dual attention mechanisms. The method combines global context modeling with local detail enhancement to improve robustness and accuracy. Notably, the framework leverages the inherent symmetry of typical SAR targets (e.g., the geometric symmetry of ships and bridges) to strengthen feature consistency, thereby reducing interference from asymmetric background clutter. By replacing the baseline C2PSA module with Deformable Large Kernel Attention and incorporating parameter-free SimAM attention throughout the detection network, the approach improves detection accuracy while maintaining computational efficiency. The deformable large kernel attention module expands the receptive field by combining deformable and dilated convolutions, improving the geometric modeling of complex-shaped targets. The SimAM attention mechanism, grounded in visual neuroscience principles, adaptively enhances features across channel and spatial dimensions, improving discriminability for small targets in noisy SAR environments. Experimental results on the RSAR dataset show that LSDA-YOLO achieves 80.8% mAP50, 53.2% mAP50-95, and a 77.6% F1 score at a computational cost of 7.3 GFLOPs, a significant improvement over baseline models and other attention variants while remaining lightweight enough for real-time applications.
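SimAM is parameter-free because each neuron's weight comes from a closed-form energy that measures how much it stands out from the other neurons in its channel. The single-channel sketch below follows the formula as it is commonly implemented from the original SimAM work (treat the exact form as an assumption to check against that paper): the squared deviation from the channel mean, divided by 4 (variance + lambda), plus 0.5, passed through a sigmoid. It is written in plain Python for readability; a network applies it to every channel of a tensor, and `simam_channel` is a hypothetical name.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simam_channel(channel, e_lambda=1e-4):
    """Apply SimAM-style weighting to one H x W channel (nested lists, at least 2 pixels).

    Returns (weighted channel, per-pixel weights).
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1
    if n < 1:
        raise ValueError("channel needs at least two pixels")
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(sum(row) for row in sq_dev) / n
    # Distinctive pixels (large squared deviation) get a larger sigmoid weight.
    weights = [[sigmoid(d / (4.0 * (variance + e_lambda)) + 0.5) for d in row] for row in sq_dev]
    out = [[v * w for v, w in zip(row, wrow)] for row, wrow in zip(channel, weights)]
    return out, weights


# Usage: a lone bright return (for example a small target) in a flat 3x3 patch.
patch = [[1.0, 1.0, 1.0], [1.0, 6.0, 1.0], [1.0, 1.0, 1.0]]
_, weights = simam_channel(patch)
print(weights[1][1] > weights[0][0])  # True
```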

17 pages, 2692 KB

Open Access Article

MSDTCN-Net: A Multi-Scale Dual-Encoder Network for Skin Lesion Segmentation

by Da Li, Xinyang Wu and Qin Wei

Diagnostics 2025, 15(22), 2924; https://doi.org/10.3390/diagnostics15222924 - 19 Nov 2025

Cited by 1 | Viewed by 975

Abstract

Background/Objectives: Accurate segmentation of skin lesions is essential for early skin cancer detection. However, traditional CNNs are limited in modeling long-range dependencies, leading to poor performance on lesions with complex shapes. Methods: We propose MSDTCN-Net, a dual-encoder network that integrates ConvNeXt and Deformable Transformer to extract both local details and global semantic information. A Squeeze-and-Excitation (SE) mechanism is introduced to adaptively emphasize important channels. To address scale variation in lesions, we design a Multi-Scale Receptive Field (MSRF) module combining multi-branch and dilated convolutions. Furthermore, a Hierarchical Feature Transfer (HFT) mechanism progressively guides high-level semantics to shallow layers, enhancing boundary reconstruction in the decoder. Results: Extensive experiments on the ISIC 2016, ISIC 2017, ISIC 2018, and PH2 datasets show that MSDTCN-Net achieves competitive performance across metrics including IoU, Dice, and accuracy (ACC), validating its effectiveness and generalization in skin lesion segmentation. Conclusions: MSDTCN-Net effectively combines local and global feature extraction, multi-scale adaptability, and semantic guidance to achieve high-accuracy skin lesion segmentation, demonstrating its potential in clinical diagnostic applications.
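The Squeeze-and-Excitation step mentioned above has a well-known structure: global average pooling per channel, a bottleneck of two fully connected layers (ReLU, then sigmoid), and channel-wise rescaling. The sketch below shows that structure in plain Python with biases omitted and weight matrices supplied by the caller; the weights in the usage example are arbitrary, not trained values from MSDTCN-Net.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def squeeze_excite(feature, w1, w2):
    """SE block on a C x H x W feature given as nested lists (biases omitted).

    w1 has shape (C // r) x C and w2 has shape C x (C // r), for a reduction ratio r.
    Returns (rescaled feature, per-channel scales).
    """
    # Squeeze: global average pooling per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    scales = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value of channel c by scales[c].
    rescaled = [[[v * s for v in row] for row in ch] for ch, s in zip(feature, scales)]
    return rescaled, scales


# Usage with C = 2, r = 2 and arbitrary weights.
feature = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 4.0]]]
w1 = [[0.5, 0.5]]      # 1 x 2
w2 = [[1.0], [-1.0]]   # 2 x 1
_, scales = squeeze_excite(feature, w1, w2)
print(scales)  # first channel emphasized (> 0.5), second suppressed (< 0.5)
```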

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

17 pages, 2779 KB

Open Access Article

Image Restoration Based on Semantic Prior Aware Hierarchical Network and Multi-Scale Fusion Generator

by Yapei Feng, Yuxiang Tang and Hua Zhong

Technologies 2025, 13(11), 521; https://doi.org/10.3390/technologies13110521 - 13 Nov 2025

Viewed by 988

Abstract

As a fundamental low-level vision task, image restoration plays a pivotal role in reconstructing authentic visual information from corrupted inputs, directly affecting the performance of downstream high-level vision systems. Current approaches frequently exhibit two critical limitations: (1) progressive texture degradation and blurring during iterative refinement, particularly with irregular damage patterns; and (2) structural incoherence when handling cross-domain artifacts. To address these challenges, we present a semantic-aware hierarchical network (SAHN) that integrates multi-scale semantic guidance with structural consistency constraints. First, we construct a Dual-Stream Feature Extractor. Based on a modified U-Net backbone with dilated residual blocks, this skip-connected encoder–decoder module simultaneously captures hierarchical semantic contexts and fine-grained texture details. Second, we propose a semantic prior mapper that establishes spatial–semantic correspondences between damaged areas and multi-scale features using predefined semantic prototypes and adaptive attention pooling. Third, we construct a multi-scale fusion generator that employs cascaded association blocks with structural similarity constraints. This unit progressively aggregates features from different semantic levels using deformable convolution kernels, effectively bridging the gap between global structure and local texture reconstruction. Compared with existing methods, our algorithm attains the highest overall PSNR (34.99 dB) and the best visual authenticity (the lowest FID, 11.56). Comprehensive evaluations on three datasets demonstrate its leading performance in restoring visual realism.
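PSNR, the first metric quoted above, is 10 log10(MAX² / MSE) in decibels. The sketch below computes it for flattened images. MAX = 255 assumes 8-bit data; because the abstract does not state the data range, color space, or evaluation crop behind the 34.99 figure, values computed this way are comparable only under the same protocol. FID needs a pretrained Inception network and is not sketched.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Usage: an error of one gray level at every pixel of an 8-bit image.
print(round(psnr([10, 20, 30, 40], [11, 21, 31, 41]), 2))  # 48.13
```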

23 pages, 3467 KB

Open Access Article

YOLO-LDFI: A Lightweight Deformable Feature-Integrated Detector for SAR Ship Detection

by Wendong Bao, Shuoying Chen, Jiansen Zhao and Xinyue Lin

J. Mar. Sci. Eng. 2025, 13(9), 1724; https://doi.org/10.3390/jmse13091724 - 6 Sep 2025

Cited by 5 | Viewed by 1600

Abstract

A lightweight enhanced detection model named YOLO-LDFI is proposed in this study for ship target detection in SAR images, aiming to improve detection accuracy and deployment efficiency in complex maritime environments. Built on YOLOv11n, the model incorporates four progressive architectural improvements: linear deformable convolution (LDConv), a deformable context-aware attention mechanism (DCAM), a frequency-adaptive dilated convolution detection head (FAHead), and Inner-EIoU. Experiments on the public SAR ship detection dataset HRSID show that the proposed model achieves an AP50 of 90.7% and an F1 score of 87.0%, with only 2.63 M parameters and a computational complexity of 6.7 GFLOPs. Ablation experiments validate the contribution of each component to improved feature alignment, reduced background interference, and more accurate target localization. Overall, the results indicate that the proposed model offers a reasonable trade-off between detection performance and computational efficiency in SAR ship detection tasks.
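Inner-EIoU builds on the Inner-IoU idea of also measuring overlap between auxiliary "inner" boxes that share each box's center but are rescaled by a ratio. The sketch below computes plain IoU and that inner IoU for center-format boxes. The combination usually reported for Inner-IoU variants is L_EIoU + IoU - IoU_inner; treat that formula, and the ratio of 0.8 used here, as assumptions rather than YOLO-LDFI's settings. The EIoU term itself is not reproduced.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def inner_iou(a, b, ratio=0.8):
    """IoU of auxiliary boxes with the same centers and width/height scaled by `ratio`."""
    def rescale(box):
        return (box[0], box[1], box[2] * ratio, box[3] * ratio)
    return box_iou(rescale(a), rescale(b))


pred, gt = (0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0)
print(round(box_iou(pred, gt), 4), round(inner_iou(pred, gt), 4))  # 0.3333 0.2308
```

With a ratio below 1 the inner IoU falls faster than plain IoU as boxes drift apart; the Inner-IoU authors reportedly use smaller ratios to sharpen regression for high-overlap samples and larger ratios for low-overlap ones.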

(This article belongs to the Special Issue Applications of Sensors and Artificial Intelligence Techniques in Ships)

23 pages, 9065 KB

Open Access Article

Multi-Scale Guided Context-Aware Transformer for Remote Sensing Building Extraction

by Mengxuan Yu, Jiepan Li and Wei He

Sensors 2025, 25(17), 5356; https://doi.org/10.3390/s25175356 - 29 Aug 2025

Cited by 2 | Viewed by 1525

Abstract

Building extraction from high-resolution remote sensing imagery is critical for urban planning and disaster management, yet it remains challenging due to significant intra-class variability in architectural styles and the multi-scale distribution of buildings. To address these limitations, we propose the Multi-Scale Guided Context-Aware Network (MSGCANet), a Transformer-based architecture. Our framework integrates a Contextual Exploration Module (CEM) that combines asymmetric and progressive dilated convolutions to expand receptive fields hierarchically, enhancing the discriminability of dense building features. We further design a Window-Guided Multi-Scale Attention Mechanism (WGMSAM) that dynamically establishes cross-scale spatial dependencies through adaptive window partitioning, enabling precise fusion of local geometric details and global contextual semantics. Additionally, a cross-level Transformer decoder uses deformable convolutions for spatially adaptive feature alignment and joint channel-spatial modeling. Experimental results show that MSGCANet achieves IoU values of 75.47%, 91.53%, and 83.10%, and F1-scores of 86.03%, 95.59%, and 90.78% on the Massachusetts, WHU, and Inria datasets, respectively, demonstrating robust performance across these datasets.
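WGMSAM is described as establishing cross-scale dependencies through adaptive window partitioning, but the abstract does not say how windows are chosen. The sketch below shows only the underlying operation: splitting an H x W map into non-overlapping windows, here at two fixed, hypothetical sizes, so that attention could then be computed inside each window at two scales.

```python
def window_partition(grid, window):
    """Split an H x W grid (nested lists) into non-overlapping window x window tiles, row-major."""
    height, width = len(grid), len(grid[0])
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            tiles.append([row[left:left + window] for row in grid[top:top + window]])
    return tiles


# Usage: a 4x4 map partitioned at two hypothetical window sizes.
grid = [[4 * i + j for j in range(4)] for i in range(4)]
small = window_partition(grid, 2)  # four 2x2 windows
large = window_partition(grid, 4)  # one window covering the whole map
print(len(small), len(large))  # 4 1
print(small[1])  # [[2, 3], [6, 7]]
```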

(This article belongs to the Section Optical Sensors)

18 pages, 2565 KB

Open Access Article

Rock Joint Segmentation in Drill Core Images via a Boundary-Aware Token-Mixing Network

by Seungjoo Lee, Yongjin Kim, Yongseong Kim, Jongseol Park and Bongjun Ji

Buildings 2025, 15(17), 3022; https://doi.org/10.3390/buildings15173022 - 25 Aug 2025

Cited by 3 | Viewed by 1394

Abstract

The precise mapping of rock joint traces is fundamental to the design and safety assessment of foundations, retaining structures, and underground cavities in building and civil engineering. Existing deep learning approaches either impose computational demands too high for on-site deployment or disrupt the topological continuity of the subpixel lineaments that govern rock mass behavior. This study presents BATNet-Lite, a lightweight encoder–decoder architecture optimized for joint segmentation on resource-constrained devices. The encoder introduces a Boundary-Aware Token-Mixing (BATM) block that separates feature maps into patch tokens and directionally pooled stripe tokens; a bidirectional attention mechanism then transfers global context to local descriptors while refining the stripe features, capturing long-range connectivity with negligible overhead. A complementary Multi-Scale Line Enhancement (MLE) module combines depth-wise dilated and deformable convolutions to produce scale-invariant responses to joints of varying apertures. In the decoder, a Skeletal-Contrastive Decoder (SCD) uses dual heads to predict segmentation and skeleton maps simultaneously, while an InfoNCE-based contrastive loss enforces their topological consistency without requiring explicit skeleton labels. Training uses a composite focal Tversky and edge IoU loss under a curriculum-thinning schedule, improving edge adherence and continuity. Ablation experiments confirm that BATM, MLE, and SCD each contribute substantial gains in boundary accuracy and connectivity preservation. By delivering topology-preserving joint maps with a small parameter count, BATNet-Lite facilitates rapid geological data acquisition for tunnel face mapping, slope inspection, and subsurface digital twin development, supporting safer and more efficient building and underground engineering practice.
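The InfoNCE loss named above scores one positive pair against a set of negatives with a softmax over similarity logits. The sketch below computes it for a single anchor with cosine similarity and a temperature. How BATNet-Lite forms positives and negatives from its segmentation and skeleton heads is not described in the abstract, so the vectors here are hypothetical; pairing the two heads' features at the same pixel as positives would be one assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log( exp(s_pos / t) / sum of exp(s / t) over the positive and all negatives )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denominator = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_denominator - logits[0]


anchor = [1.0, 0.0]
print(info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))  # small: positive is closest
print(info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]]))  # larger: a negative is closest
```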

16 pages, 3211 KB

Open Access Article

Exploiting a Deformable and Dilated Feature Fusion Module for Object Detection

by Xiaoxia Qi, Md Gapar Md Johar, Ali Khatibi and Jacquline Tham

Electronics 2025, 14(13), 2716; https://doi.org/10.3390/electronics14132716 - 4 Jul 2025

Cited by 1 | Viewed by 1263

Abstract

We propose the Deformable and Dilated Feature Fusion Module (D2FM) to make feature extraction in object detection more adaptable and flexible. Unlike traditional convolutions and Deformable Convolutional Networks (DCNs), D2FM dynamically predicts both dilation coefficients and spatial offsets, computing the offsets from the features at the dilated positions, so that it better captures multi-scale and context-dependent patterns. A self-attention mechanism then fuses the geometry-aware and enhanced local features. To integrate D2FM efficiently into detection frameworks, we design the D2FM-HierarchyEncoder, which uses hierarchical channel reduction and depth-dependent stacking of D2FM blocks to balance representational capability and computational cost. Applying this design to the YOLOv11 detector yields the D2YOLOv11 model. On the COCO 2017 dataset, our method achieves 47.9 AP with the YOLOv11s backbone, a 1.0 AP improvement over the baseline YOLOv11.
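The sampling rule described for D2FM can be pictured as a 3x3 kernel whose taps sit at p0 + d·pk + Δpk: a dilation d spreads the regular grid pk, a learned offset Δpk shifts each tap, and bilinear interpolation reads features at fractional positions. In the sketch below the dilation and offsets are passed in as plain numbers, whereas D2FM predicts them; the self-attention fusion is omitted, and both function names are hypothetical.

```python
import math


def bilinear(img, y, x):
    """Sample a 2D list at fractional (y, x); positions outside the image read as 0."""
    height, width = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    total = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < height and 0 <= xx < width:
                total += wy * wx * img[yy][xx]
    return total


def deformable_dilated_conv_at(img, center, kernel, dilation, offsets):
    """One output of a 3x3 conv whose k-th tap is read at center + dilation * grid_k + offset_k."""
    cy, cx = center
    out = 0.0
    k = 0
    for gy in (-1, 0, 1):
        for gx in (-1, 0, 1):
            oy, ox = offsets[k]
            out += kernel[k] * bilinear(img, cy + dilation * gy + oy, cx + dilation * gx + ox)
            k += 1
    return out


img = [[5 * i + j for j in range(5)] for i in range(5)]
ones = [1.0] * 9
print(deformable_dilated_conv_at(img, (2, 2), ones, 1.0, [(0.0, 0.0)] * 9))  # 108.0 (plain 3x3 sum)
print(deformable_dilated_conv_at(img, (2, 2), ones, 1.0, [(0.5, 0.0)] * 9))  # 130.5 (taps shifted half a row down)
```

With zero offsets and dilation 1 the function reduces to an ordinary 3x3 convolution, which is a useful sanity check before letting a network predict d and the offsets.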

18 pages, 5323 KB

Open Access Article

Surface Defect and Malformation Characteristics Detection for Fresh Sweet Cherries Based on YOLOv8-DCPF Method

by Yilin Liu, Xiang Han, Longlong Ren, Wei Ma, Baoyou Liu, Changrong Sheng, Yuepeng Song and Qingda Li

Agronomy 2025, 15(5), 1234; https://doi.org/10.3390/agronomy15051234 - 19 May 2025

Cited by 7 | Viewed by 2044

Abstract

Damaged and deformed fruit severely limits the economic value of fresh berries, and accurate identification and grading methods have become a global research hotspot. To address the challenge of rapid and accurate defect detection in intelligent cherry sorting systems, this study proposes an enhanced YOLOv8n-based framework for identifying sweet cherry defects. First, the dilation-wise residual (DWR) module replaces the conventional C2f structure, allowing local and global features to be captured adaptively through multi-scale convolution. This improves recognition of both subtle surface defects and large-scale damage on cherries. Second, a channel attention feature fusion mechanism (CAFM) is placed at the front end of the detection head, enhancing the model’s ability to identify fine defects on the cherry surface. Third, to improve bounding box regression accuracy, powerful-IoU (PIoU) replaces the traditional CIoU loss function. Finally, self-distillation is introduced to further improve the model’s generalization capability and detection accuracy through knowledge transfer. Experimental results show that the YOLOv8-DCPF model achieves precision, mAP, recall, and F1 score of 92.6%, 91.2%, 89.4%, and 89.0%, respectively, improvements of 6.9%, 5.6%, 6.1%, and 5.0% over the original YOLOv8n baseline network. The proposed model detects cherry defects with high accuracy, providing an efficient and precise solution for intelligent cherry sorting in agricultural engineering applications.
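The DWR module is described as capturing local and global context through multi-scale convolution inside a residual block. The 1D sketch below shows only that core idea: parallel dilated branches at hypothetical rates 1, 3 and 5, averaged and added back to the input. The published module's additional convolutions and channel fusion are not reproduced, so this should not be read as the DWR implementation; the function names are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D cross-correlation with zero padding and the given dilation."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def multi_rate_residual(signal, kernel, rates=(1, 3, 5)):
    """Average parallel dilated branches and add the input back (residual connection)."""
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [x + sum(vals) / len(rates) for x, vals in zip(signal, zip(*branches))]


# Usage: an impulse spreads to a small neighborhood (rate 1) and to distant context (rates 3, 5).
impulse = [0.0] * 11
impulse[5] = 1.0
out = multi_rate_residual(impulse, [1.0, 1.0, 1.0])
print([round(v, 3) for v in out])
# [0.333, 0.0, 0.333, 0.0, 0.333, 2.0, 0.333, 0.0, 0.333, 0.0, 0.333]
```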

(This article belongs to the Special Issue Facility Agriculture Robots and Autonomous Unmanned Management for Crops)

29 pages, 9314 KB

Open Access Article

SFRADNet: Object Detection Network with Angle Fine-Tuning Under Feature Matching

by Keliang Liu, Yantao Xi, Donglin Jing, Xue Zhang and Mingfei Xu

Remote Sens. 2025, 17(9), 1622; https://doi.org/10.3390/rs17091622 - 2 May 2025

Viewed by 1386

Abstract

Because remote sensing images are acquired from a distance and from a bird’s-eye perspective, ground objects appear at arbitrary scales and in multiple orientations. Existing detectors often use feature pyramid networks (FPN) and deformable (or rotated) convolutions to adapt to variations in object scale and orientation. However, these methods address scale and orientation separately and ignore their deeper coupling. When the scale features extracted by the network are significantly mismatched with the object, the detection head struggles to capture the object’s orientation, resulting in misalignment between the object and its bounding box. We therefore propose a one-stage detector, the Scale First Refinement-Angle Detection Network (SFRADNet), which fine-tunes the rotation angle under precise scale feature matching. We introduce the Group Learning Large Kernel Network (GL²KNet) as the backbone of SFRADNet and employ a Shape-Aware Spatial Feature Extraction Module (SA-SFEM) as the primary component of the detection head. Specifically, within GL²KNet, we construct diverse receptive fields with varying dilation rates to capture features over different spatial ranges. Building on this, we use multi-scale features within the layers and apply weighted aggregation based on a Scale Selection Matrix (SSMatrix), which dynamically adjusts the receptive field coverage according to the target size, enabling more refined selection of scale features. Based on the precise scale features captured, we first design a Directed Guiding Box (DGBox) within the SA-SFEM, whose shape and position information supervises the sampling points of the convolution kernels so that they fit the object’s deformations. This facilitates the extraction of orientation features near the object region, allowing accurate refinement of both scale and orientation. Experiments show that our network achieves an mAP of 80.10% on the DOTA-v1.0 dataset while reducing computational complexity compared with the baseline model.
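One way to picture kernel sampling guided by an oriented box (the DGBox above) is to stretch a regular k x k grid to the box's width and height and rotate it by the box angle. The sketch below does exactly that and nothing more; how SFRADNet actually uses the box to supervise sampling points is not specified in the abstract, so `box_guided_sampling_points` is a hypothetical illustration.

```python
import math


def box_guided_sampling_points(cx, cy, width, height, angle_rad, k=3):
    """k x k sampling points spread over an oriented box, returned as (x, y) image coordinates.

    Before rotation the grid spans [-width/2, width/2] x [-height/2, height/2] around the center.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    cos_t, sin_t = math.cos(angle_rad), math.sin(angle_rad)
    points = []
    for i in range(k):
        for j in range(k):
            u = (j / (k - 1) - 0.5) * width   # position along the box's width axis
            v = (i / (k - 1) - 0.5) * height  # position along the box's height axis
            points.append((cx + u * cos_t - v * sin_t, cy + u * sin_t + v * cos_t))
    return points


# Usage: a 4 x 2 box centered at (10, 10), rotated by 90 degrees.
pts = box_guided_sampling_points(10.0, 10.0, 4.0, 2.0, math.pi / 2)
print([(round(x, 2), round(y, 2)) for x, y in pts])
# [(11.0, 8.0), (11.0, 10.0), (11.0, 12.0), (10.0, 8.0), (10.0, 10.0), (10.0, 12.0), (9.0, 8.0), (9.0, 10.0), (9.0, 12.0)]
```

After the 90-degree rotation the box's long axis runs along y, so the nine taps line up with the object instead of an axis-aligned square.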

(This article belongs to the Special Issue Remote Sensing of Target Object Detection and Identification (Third Edition))

19 pages, 10010 KB

Open Access Article

MCANet: An Unsupervised Multi-Constraint Cascaded Attention Network for Accurate and Smooth Brain Medical Image Registration

by Min Huang, Haoyu Wang and Guanyu Ren

Appl. Sci. 2025, 15(9), 4629; https://doi.org/10.3390/app15094629 - 22 Apr 2025

Viewed by 1175

Abstract

Brain medical image registration is a fundamental prerequisite for the computer-assisted treatment of brain diseases. The brain is one of the most important and complex organs of the human body, and registering it accurately and quickly is very challenging. To address voxel folding in the deformation field and low registration accuracy on complex, fine structures, this paper proposes a fully convolutional multi-constraint cascaded attention network (MCANet). The network cascades two registration sub-networks and performs coarse-to-fine registration of input image pairs iteratively. The registration sub-network, called the dilated self-attention network (DSNet), combines dilated convolutions with different dilation rates and attention gate modules. During training, a double regularization constraint specifically penalizes excessive deformation, so that the network generates relatively smooth deformations while maintaining high registration accuracy. Experimental results on the Mindboggle101 dataset show that MCANet’s registration accuracy is significantly better than that of several existing advanced registration methods and that its registrations are relatively smooth.
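The abstract does not spell out MCANet's two regularization terms. A common ingredient of such constraints is a diffusion-style smoothness penalty on the displacement field, sketched below in 2D as the mean squared forward difference between neighboring displacement vectors; brain volumes are 3D, where the same idea adds a third axis. This is an illustrative assumption, not MCANet's loss, and `smoothness_penalty` is a hypothetical name.

```python
def smoothness_penalty(disp):
    """Mean squared forward difference of a 2D displacement field.

    disp[i][j] is a (dy, dx) tuple; differences are taken along both grid axes.
    """
    height, width = len(disp), len(disp[0])
    total, count = 0.0, 0
    for i in range(height):
        for j in range(width):
            if i + 1 < height:  # vertical neighbor
                total += sum((a - b) ** 2 for a, b in zip(disp[i + 1][j], disp[i][j]))
                count += 1
            if j + 1 < width:  # horizontal neighbor
                total += sum((a - b) ** 2 for a, b in zip(disp[i][j + 1], disp[i][j]))
                count += 1
    return total / count if count else 0.0


# Usage: a uniform shift is perfectly smooth; a column-wise jump is penalized.
uniform = [[(1.0, 2.0), (1.0, 2.0)], [(1.0, 2.0), (1.0, 2.0)]]
jump = [[(0.0, 0.0), (0.0, 1.0)], [(0.0, 0.0), (0.0, 1.0)]]
print(smoothness_penalty(uniform), smoothness_penalty(jump))  # 0.0 0.5
```

Penalizing large neighbor-to-neighbor changes discourages the abrupt displacement reversals that produce voxel folding, at the cost of some flexibility near true anatomical boundaries.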

Search Results (34)
