MDPI - Publisher of Open Access Journals

23 pages, 3952 KB

Open Access Article

WASP-Mamba: A Wavelet-Enhanced Mamba Framework for Remote Sensing Semantic Segmentation

by Yuhao Zhang, Guolong Zhang, Yi Li, Yongbo Wu and Zhiguo Zhou

Remote Sens. 2026, 18(13), 2188; https://doi.org/10.3390/rs18132188 (registering DOI) - 4 Jul 2026

Semantic segmentation of high-resolution remote sensing (HRRS) imagery is a fundamental task in Earth observation but remains challenging due to severe scale variations and complex boundary structures. Although State Space Models (SSMs), particularly Mamba, have shown strong potential for efficient global modeling with linear complexity, existing methods are still limited in capturing multi-scale context and fine-grained spatial details. To address these issues, we propose a novel framework named WASP-Mamba for HRRS image segmentation. A Mamba-based spatial pyramid pooling module is introduced at the encoder bottleneck to enhance multi-scale feature aggregation and improve robustness to scale variations. In addition, a wavelet-inspired Mamba decoder is designed to decouple and reconstruct low-frequency semantic information and high-frequency details, mitigating boundary degradation caused by conventional upsampling. Extensive experiments on the LoveDA, ISPRS Vaihingen and ISPRS Potsdam datasets demonstrate that the proposed method achieves 53.90%, 84.32% and 87.06% mIoU, respectively. Compared with recent state-of-the-art methods, WASP-Mamba achieves superior segmentation performance. Full article
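As a rough illustration of the "linear complexity" attributed to state space models above, the sketch below runs a discretized linear recurrence over a 1-D sequence in a single pass. It is not the WASP-Mamba architecture or Mamba's selective scan (where the parameters depend on the input). The function name `ssm_scan` and the scalar parameters `a`, `b`, `c` are hypothetical and chosen only for illustration.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=0.5):
    """Run h_t = a*h_{t-1} + b*x_t and y_t = c*h_t over a sequence.

    a, b, c are fixed scalars here (an assumption for illustration);
    the cost is one update per element, i.e. linear in len(xs).
    """
    h = 0.0          # hidden state carried across the sequence
    ys = []
    for x in xs:
        h = a * h + b * x   # state update
        ys.append(c * h)    # readout
    return ys
```

An impulse input such as `ssm_scan([1.0, 0.0, 0.0])` shows the state decaying geometrically with factor `a`, which is how earlier inputs keep influencing later outputs.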

(This article belongs to the Special Issue Deep Learning-Based Interpretation and Processing of Remote Sensing Images)

28 pages, 27420 KB

Open Access Article

A Carbon Trace Detection Method for Oil-Immersed Transformers Based on Superimposed Illumination Estimation and Multi-Scale Feature Fusion

by Hongxin Ji, Zhennan Shi, Jiaqi Li, Xinghua Liu and Liqing Liu

Sensors 2026, 26(13), 4223; https://doi.org/10.3390/s26134223 - 3 Jul 2026

Abstract

Accurately locating and reliably diagnosing insulation defects in oil-immersed transformers remains challenging. To overcome this, a micro-robot is employed to autonomously identify partial discharge (PD)-induced carbon traces on the insulation surface of the core components. Accurately capturing the multi-scale complex features of surface-discharge carbon traces under low-illumination conditions is critical for effective defect detection. Therefore, to address the obscurity of carbon trace features caused by insufficient illumination inside oil-immersed transformers, a Retinex-based image enhancement algorithm with superimposed illumination estimation is proposed. By transforming the original image into the HSI color space and integrating negative-image illumination fusion, this algorithm decouples brightness from chromaticity and preserves dark-region details, thereby reducing color distortion and enhancing carbon trace features. Furthermore, to handle the significant scale variations in carbon traces, a C2f module integrated with spatial and channel synergistic attention (SCSA) is designed. This module employs multi-scale depthwise separable convolutions and wide-channel self-attention to enhance cross-scale feature representation and reduce redundancy. Moreover, to address the feature resolution degradation in the fast spatial pyramid pooling module, which hinders the accurate perception of tiny carbon traces, a poly kernel inception atrous spatial pyramid pooling module (PKI-ASPP) is adopted. This preserves precise morphological details and minimizes the missed and false detection rates for tiny carbon traces. Finally, to tackle the difficulties in fusing complex morphological features, a deformable large kernel attention (DLKA) module is introduced into the neck network. This adapts to irregular carbon trace shapes, significantly improving the localization and learning of complex morphologies. 
Experiments on a transformer PD carbon trace dataset demonstrate that the proposed model significantly improves perceptual capabilities for carbon traces with massive scale variation. The improved model outperforms the baseline across all evaluation metrics, with mAP50 improved by 2.7% and mAP50-95 improved by 7.9%. These results indicate that the proposed method is highly reliable, providing solid technical support for internal surface discharge intensity detection and insulation condition assessment in oil-immersed transformer maintenance. Full article

(This article belongs to the Section Sensing and Imaging)

27 pages, 13814 KB

Open Access Article

BFFPN-YOLO: Detection of Cow Estrus Behavior Under Fisheye Imaging via Boundary Enhancement and Frequency-Domain Compensation

by Xiaohan Yang, Rong Wang, Qifeng Li, Weiwei Huang, Yujiao Rong, Xuwen Li, Tonghui Wu and Ronghua Gao

Agriculture 2026, 16(13), 1458; https://doi.org/10.3390/agriculture16131458 - 2 Jul 2026

Viewed by 173

Abstract

In modern farm management, accurate detection of estrus behavior in dairy cows is essential for improving reproductive efficiency and enabling intelligent decision-making. Although fisheye lenses offer a wider field of view, they often introduce image distortion. This leads to geometric and scale deformation of cow mounting behavior features, which reduces detection accuracy. To address this issue, a lightweight model called Boundary-Enhanced Frequency-Domain Feature Pyramid Network YOLO (BFFPN-YOLO) was developed. It is designed for detecting dairy cow mounting behavior under fisheye imaging, incorporating boundary enhancement and frequency-domain compensation. Initially, the backbone network was equipped with the multi-scale dilated fusion structure SPPELAN. This structure expands the receptive field and preserves detailed information, thereby enhancing boundary modeling for targets with scale variations. Subsequently, a boundary-enhanced frequency-domain feature pyramid network (BFFPN) module was designed for reconstructing the top-down transmission path in the Neck. The module is composed of the frequency-domain detail compensation FreqFusion and the spatial attention enhancement SEAM. By strengthening boundary responses, compensating for high-frequency details, and replacing the traditional upsampling and concatenation operations, it effectively mitigates blurred target boundaries in images of dairy cow mounting behavior. The improved algorithm demonstrates strong detection performance, achieving a Precision of 88%, a Recall of 84.5%, and an mAP@0.5 of 92.7%. Compared with the original YOLOv11, these metrics were increased by 3.8, 2.3, and 4.6 percentage points, respectively. The model parameter count was reduced by 1.10 × 10⁶. In complex scenarios, edge features and high-frequency details of dairy cow mounting behavior are more accurately captured by the improved model. 
These improvements provide a reliable technical basis for the intelligent detection of estrus behavior. Full article

(This article belongs to the Section Farm Animal Production)

22 pages, 102126 KB

Open Access Article

A Lightweight Insulator Defect Detection Model for Edge Computing Devices: PEBL-YOLO

by Hao Wang, Jie Li and Qi Xing

Sensors 2026, 26(13), 4169; https://doi.org/10.3390/s26134169 (registering DOI) - 2 Jul 2026

Viewed by 83

Abstract

Insulators are critical insulation components in power transmission lines; however, long-term exposure to adverse environmental conditions may threaten the safety and stability of power delivery. Existing studies primarily emphasize detection accuracy, while deployment efficiency and inference speed have received insufficient attention, limiting their applicability to CPU-based edge computing devices. To address these limitations, this paper proposes PEBL-YOLO, a lightweight model for insulator defect detection. The proposed model retains the external C3k2 structure of YOLOv11 while simplifying its internal bottleneck module, in which PConv is embedded to improve spatial feature extraction and fusion efficiency. In the neck, the original Path Aggregation Feature Pyramid Network (PAFPN) is reconstructed by integrating a Bidirectional Feature Pyramid Network (BiFPN) with Efficient Channel Attention (ECA), enabling more effective aggregation of multi-scale features and stronger focus on defect-related regions with minimal parameter increase. Moreover, a lightweight shared decoupled detection head is designed to decouple classification and regression branches. By combining parameter sharing with Group Normalization (GN), the detection head further reduces model complexity while maintaining accurate localization capability. Experimental results show that PEBL-YOLO contains only 1.68 M parameters. It achieves Precision, Recall, mAP@0.5, and mAP@0.5:0.95 of 95.0%, 92.1%, 94.4%, and 53.6%, respectively. These results demonstrate that PEBL-YOLO achieves a favorable trade-off between detection accuracy and parameter efficiency, providing a practical solution for lightweight insulator defect detection in edge computing scenarios. Full article
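The Efficient Channel Attention (ECA) step mentioned above can be sketched in a few lines: each channel is reduced to its global average, a small 1-D convolution mixes neighboring channel descriptors, and a sigmoid gate rescales each channel. This is a generic sketch of the ECA idea in pure Python, not the PEBL-YOLO implementation. The function name `eca` and the example kernel weights are hypothetical (in practice the kernel is learned).

```python
import math

def eca(feature_maps, kernel):
    """Apply an ECA-style channel gate.

    feature_maps: list of C channels, each an H x W nested list.
    kernel: odd-length list of 1-D conv weights (hypothetical values;
            learned during training in a real network).
    Returns (rescaled feature maps, per-channel gates).
    """
    c = len(feature_maps)
    # Global average pooling: one scalar descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_maps]
    pad = len(kernel) // 2
    gates = []
    for i in range(c):
        s = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - pad            # neighboring channel index
            if 0 <= j < c:             # zero padding at the ends
                s += weight * pooled[j]
        gates.append(1.0 / (1.0 + math.exp(-s)))   # sigmoid gate
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_maps, gates)]
    return scaled, gates
```

Because the gate uses only a length-`k` kernel shared across channels, the added parameter count is `k`, which is consistent with the abstract's remark about minimal parameter increase.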

(This article belongs to the Special Issue Vision Based Defect Detection in Power Systems)

25 pages, 7164 KB

Open Access Article

Underwater Image Enhancement and Small Object Detection Method Based on RBE-CycleGAN and MSFDC-Net

by Zongren Li, Chundong Xu, Wenjun Hui, Rui Chen and Xiaofang Kong

Sustainability 2026, 18(13), 6659; https://doi.org/10.3390/su18136659 - 1 Jul 2026

Viewed by 106

Abstract

Underwater object detection plays a vital role in marine exploration and resource exploitation. However, the complex underwater environment leads to severe color deviation, blurring, and information loss of small targets, which greatly restrict detection performance. To address these problems, this paper integrates the Channel Attention and Spatial Attention Block (CASAB) attention mechanism into residual blocks based on generative adversarial networks to correct color distortion and improve the clarity of degraded underwater images. For underwater small object detection, MobileNetV2 is selected as the backbone network within the Faster R-CNN framework, and a multi-scale feature fusion strategy is adopted to reduce feature loss caused by repeated downsampling. In the detection head, coordinate attention and parallel dilated convolution are further integrated to suppress background noise and expand the receptive field of feature extraction. Experimental results on the Underwater Robot Professional Contest (URPC) dataset demonstrate that the proposed method yields gains of 10.06%, 9.43%, and 12.29% in three evaluation metrics: Underwater Image Quality Measure (UIQM), Underwater Colour Image Quality Evaluation (UCIQE) and Natural Image Quality Evaluator (NIQE), together with a 7.81% increase in Mean Average Precision (mAP) and an 8.57% increase in Mean Recall (mRecall). These results demonstrate the effectiveness of all improvements. Full article

(This article belongs to the Special Issue Sustainability of Intelligent Detection and New Sensor Technology)

24 pages, 16853 KB

Open Access Article

Sedimentary Microfacies Analysis and Reservoir Prediction of Braided River Delta Reservoirs in Central Asia’s S Gas Field

by Feilong Li, Yungui Xu, Haotong Liu, Youheng Leng, Zhanjun Wei, Nini Zhang, Ronghe Liu, Boyong Liao and Xuri Huang

Appl. Sci. 2026, 16(13), 6523; https://doi.org/10.3390/app16136523 - 30 Jun 2026

Viewed by 175

Abstract

The prediction of thin-bedded, favorable sand bodies within the Middle-Lower Jurassic braided river delta–lacustrine succession of Block S (Amu Darya Right Bank) is challenging because of strong spatial heterogeneity, deep burial, and limited seismic resolution near the acoustic basement. To address this, we propose an integrated workflow that combines sedimentological characterization with geologically constrained seismic inversion. The study uses core, grain-size data, wireline logs, and 3D seismic surveys. Core–log–seismic integration first delineates three subfacies and nine numbered microfacies (MF1–MF9), with the delta front dominated by underwater distributary channels (MF1), mouth bars (MF2), and interdistributary bays (MF3). Planar microfacies distribution maps and electrofacies boundaries are then used as geological constraints for reservoir prediction. Steerable pyramid enhancement (K = 4 scales, N = 6 orientations) improves channel-reflection continuity, and PDF-regularized stochastic optimization inversion (λ = 0.8) is performed to identify thin sand reservoirs. Sand-ratio and GR cutoffs were validated against 412 core–log contacts in five wells. Discretization sensitivity tests confirm stable inversion under 2 ms and 4 ms sampling. The results show that (1) favorable Type I and Type II reservoirs occur preferentially in MF1 and MF2 (average porosities of 12.7% and 10.1%, respectively); (2) vertically, two sand-rich progradational intervals (Lower Member and late Upper Member) are separated by a transgressive mud-prone middle–early Upper Member; and (3) inversion low-impedance anomalies delineate strip-like and lobate channel–mouth-bar sand belts with thickness up to 14 m, consistent with well control. Fault-controlled graben–horst paleotopography influenced sand fairway distribution. The workflow highlights the value of integrating sedimentary microfacies boundaries as geological constraints in seismic inversion for heterogeneous deep clastic gas reservoirs. Full article

(This article belongs to the Special Issue New Technologies and Theories Applied in Oil and Gas Development Under Complex Conditions)

32 pages, 270887 KB

Open Access Article

DCFP-YOLO: A Dual-Backbone Feature Fusion Network for Multi-Pose Chili Flower Recognition and Edge Deployment

by Minqiu Kuang, Xiaojian Li, Fangping Xie, Shang Chen, Dawei Liu, Yang Xiang, Bei Wu, Feng Liu, Yuxuan Zhang and Xu Li

Agriculture 2026, 16(13), 1422; https://doi.org/10.3390/agriculture16131422 - 29 Jun 2026

Viewed by 196

Abstract

To address the challenges of difficult feature extraction and insufficient recognition accuracy caused by the small size of chili flowers, occlusion by branches and leaves, and illumination variations in complex field environments, a dual-backbone-based chili flower pose estimation algorithm, termed DCFP-YOLO, is proposed. Built upon the YOLO11n framework, the proposed method performs classification and recognition of five typical upward-oriented chili flower poses. To alleviate the loss of local detail features of small chili flowers under complex backgrounds, a dual-backbone feature extraction network composed of StarNet and ShuffleNetV2 is constructed. Specifically, the StarNet backbone enhances the extraction of fine-grained local features from key floral regions, while the ShuffleNetV2 backbone improves the perception of global spatial structural information. The complementary fusion of dual-backbone features strengthens the representation capability of chili flower pose features in complex environments. To mitigate the attenuation of shallow detail information during multi-scale feature transmission, a Bidirectional Multi-branch Auxiliary Feature Pyramid Network (BiMAFPN) is designed to enhance feature propagation through cross-scale feature interaction, thereby improving pose recognition performance under occlusion and overlapping conditions. Furthermore, a Programmable Gradient Information (PGI)-assisted training mechanism is introduced to optimize gradient propagation paths and alleviate information bottlenecks in deep networks, thereby enhancing the robustness of multi-pose feature extraction under occlusion, blur, and complex illumination conditions. Experimental results demonstrate that DCFP-YOLO achieves recall, mAP50, and mAP50-95 values of 87.4%, 92.0%, and 66.9%, respectively, representing improvements of 1.7, 1.3, and 3.5 percentage points over the baseline model. Overall performance surpasses that of current mainstream object detection algorithms. 
After deployment on the NVIDIA Jetson AGX Orin platform, the model achieves an inference speed of 20.9 frames/s, which can basically satisfy the real-time perception requirements of chili flower pose recognition in complex agricultural environments. The proposed method provides an effective visual perception framework for chili flower pose recognition in complex agricultural environments. Rather than constituting a complete robotic pollination solution, the developed model serves as a potential perception component for future intelligent pollination robotic systems, providing reliable flower pose information for subsequent research on target localization, end-effector alignment, and robotic pollination in unstructured greenhouse environments. Full article

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

32 pages, 10531 KB

Open Access Article

A Hybrid ResNet U-Net++ Architecture with ASPP and SE for Fish Histological Image Segmentation

by Antonio Fhillipi Maciel Silva, Yanna Leidy Ketley Fernandes Cruz, Kayla Rocha Braga, Wesley Batista Dominices de Araujo, Raimunda Nonata Fortes Carvalho Neta and Ewaldo Eder Carvalho Santana

Eng 2026, 7(7), 310; https://doi.org/10.3390/eng7070310 - 27 Jun 2026

Viewed by 149

Abstract

The histological segmentation of fish gill lesions is a crucial step in environmental biomarker analysis, as morphological alterations in bioindicator species, such as Sciades herzbergii, provide biologically meaningful evidence of exposure to aquatic contaminants. In this context, gill histology enables the assessment of biomarkers; however, manual lesion quantification remains time-consuming, observer-dependent, and challenging to scale for environmental monitoring programs. Moreover, this task remains challenging due to the presence of heterogeneous textures, fragmented lesion boundaries, low-contrast regions, and staining variability. To address these issues, this study proposes a deep learning framework for the semantic segmentation of epithelial lifting (EL) and hyperplasia (HY) in gill histological images. The proposed model combines a ResNet-50 encoder, an ASPP bottleneck for multiscale contextual aggregation, squeeze-and-excitation-based channel recalibration at the bridge, and a nested U-Net++ decoder with deep supervision. The GillHistDB dataset was also developed for this study, comprising 447 RGB histological images and 29,730 annotated lesions, including 16,855 EL and 12,875 HY instances. The proposed method achieved the best overall performance among the evaluated models in the main overlap-based metrics. At the class level, it obtained Dice values of (0.842 ± 0.055) for EL and (0.684 ± 0.190) for HY, with corresponding IoU values of (0.731 ± 0.080) and (0.548 ± 0.196), respectively. For EL, the method also achieved the highest recall (0.848 ± 0.074), while for HY it reached the highest precision (0.653 ± 0.205) and maintained a high recall (0.767 ± 0.139). 
These results indicate that the proposed architecture provides an effective and robust solution for gill histological lesion segmentation, while GillHistDB establishes a relevant benchmark to support future studies on environmental biomonitoring, histological biomarkers, and the assessment of aquatic pollution. Full article
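For readers less familiar with the overlap metrics quoted above, the following sketch computes Dice and IoU for one binary mask pair. It is a generic illustration rather than the authors' evaluation code. The function name `dice_and_iou` and the flat 0/1 mask representation are assumptions made here.

```python
def dice_and_iou(pred, target):
    """Compute (Dice, IoU) for two flat binary masks of equal length.

    pred, target: sequences of 0/1 labels (assumed representation).
    Empty-vs-empty masks are scored 1.0 by convention in this sketch.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    dice_den = 2 * tp + fp + fn
    iou_den = tp + fp + fn
    dice = 2 * tp / dice_den if dice_den else 1.0
    iou = tp / iou_den if iou_den else 1.0
    return dice, iou
```

For a single mask pair the two metrics are tied by Dice = 2·IoU / (1 + IoU). Averages of per-sample values need not satisfy this identity exactly, which may explain why reported mean Dice and mean IoU values can be close to, but not exactly on, that curve.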

(This article belongs to the Special Issue Transfer Learning and Data Augmentation in Engineering: Bridging Gaps for Smart Industrial Solutions)

13 pages, 2083 KB

Open Access Article

On-Chip Mid-Infrared Wavefront Sensing Based on Vectorial Photocurrent Manipulation

by Tao Ye, Xiaofei He, Jun Ning, Xueling Guo, Xianda Zhang, Ziao Li, Wei Lu, Xiaoshuang Chen and Jing Zhou

Sensors 2026, 26(13), 4022; https://doi.org/10.3390/s26134022 - 24 Jun 2026

Viewed by 266

Abstract

Wavefront sensing (WFS) is fundamental to adaptive optics, astronomical observation, biological microscopy, and free-space optical communications. However, conventional approaches—including Shack–Hartmann sensors, shearing interferometers, and transport of intensity equation-based methods—are inherently limited by trade-offs among spatial sampling density, angular dynamic range, and device compactness and have rarely been extended to the mid-infrared range. Here, we propose an on-chip mid-infrared wavefront sensing scheme operating based on vectorial photocurrent manipulation and analyze the properties of the proposed device through finite-element simulations. The proposed device comprises a hexagonal array of antenna-integrated graphene pixels, each equipped with three contacts and a microlens. Based on the antenna-induced vectorial photocurrent manipulation, angle-dependent absorption is translated into photocurrent signals, potentially enabling unambiguous recovery of both the elevation and azimuth angles of the incident light over an effective angular dynamic range of ±28°. The hexagonal layout provides a high spatial sampling density of 11,547 mm⁻². Southwell algorithm-based wavefront reconstruction and numerical simulations yield faithful recovery of parabolic, conical, and quadrangular pyramidal wavefronts. In addition, simulation results indicate that this approach can enable high-fidelity reconstruction of both the phase and intensity distributions of an object based on angular-spectrum diffraction theory. Overall, this work theoretically demonstrates a new route toward high-density wavefront measurement and complex light field imaging in the mid-infrared range without a conventional imaging lens. Full article
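The quoted sampling density can be related to pixel pitch with the standard lattice formula: a hexagonal lattice with nearest-neighbor pitch a has 2 / (√3 · a²) points per unit area. The sketch below applies that formula. The abstract does not state the pitch, so the 10 µm value used in the usage note is an assumption chosen because it reproduces the quoted figure. The function names are hypothetical.

```python
import math

def hex_density_per_mm2(pitch_um):
    """Points per mm^2 for a hexagonal lattice with the given pitch (um)."""
    pitch_mm = pitch_um / 1000.0
    return 2.0 / (math.sqrt(3.0) * pitch_mm ** 2)

def pitch_um_from_density(density_per_mm2):
    """Inverse relation: pitch (um) implied by a hexagonal density (mm^-2)."""
    pitch_mm = math.sqrt(2.0 / (math.sqrt(3.0) * density_per_mm2))
    return 1000.0 * pitch_mm
```

Under this formula, `hex_density_per_mm2(10.0)` rounds to 11547, so the stated 11,547 mm⁻² would correspond to a pitch of about 10 µm if the density refers to an ideal hexagonal lattice.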

(This article belongs to the Section Optical Sensors)

18 pages, 8476 KB

Open Access Article

Dual-Pathway Wavelet-Attention Framework for Image-Only AI-Generated Image Quality Assessment

by Yang Li, Yu Zheng and Dong Sui

Mathematics 2026, 14(13), 2249; https://doi.org/10.3390/math14132249 - 23 Jun 2026

Viewed by 175

Abstract

AI-generated images (AIGIs) often contain perceptual defects that differ from the distortions commonly studied in conventional no-reference image quality assessment (NR-IQA). This work investigates image-only AIGC image quality assessment, where no prompt text is used and the quality score must be inferred from visual evidence such as artifacts, structure, and semantic plausibility. We propose a dual-pathway wavelet-attention framework built on a Swin Transformer V2-Base backbone. The artifact pathway employs a Noise Perceptive Attention Module (NPAM) with fixed Haar wavelet decomposition to describe generation-related sub-band degradation cues, whereas the image-perception pathway models semantic, structural, and contextual quality evidence using multi-scale attention, global–local spatial-channel attention, and pyramid pooling. The two pathways are integrated through adaptive fusion and a spatially weighted regression head with an auxiliary global prediction. Experiments on AGIQA-1K, AGIQA-3K, and AIGCIQA2023 demonstrate competitive in-domain performance, including SRCC values of 0.8418 on AGIQA-3K and 0.8445 on the quality dimension of AIGCIQA2023. The evaluation further covers individual module ablations, score-fusion variants, seed stability, qualitative error analysis, and cross-database transfer, revealing both the contribution of the proposed components and the remaining difficulty of source-disjoint generalization. Full article
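The fixed Haar wavelet decomposition mentioned above can be illustrated with a single-level 2-D transform that splits an image into one approximation sub-band and three detail sub-bands. This is only the generic decomposition step, not the NPAM module. The function name `haar_dwt2`, the sub-band labels, and the orthonormal scaling by 1/2 are choices made for this sketch (naming conventions for the detail bands vary between libraries).

```python
def haar_dwt2(image):
    """Single-level 2-D Haar transform of an H x W nested list.

    H and W must be even. Returns (ll, lh, hl, hh), each H/2 x W/2:
      ll: local average (approximation)
      lh: top-minus-bottom difference (responds to horizontal edges)
      hl: left-minus-right difference (responds to vertical edges)
      hh: diagonal difference
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = image[i][j], image[i][j + 1]          # top row of block
            c, d = image[i + 1][j], image[i + 1][j + 1]  # bottom row
            row_ll.append((a + b + c + d) / 2.0)
            row_lh.append((a + b - c - d) / 2.0)
            row_hl.append((a - b + c - d) / 2.0)
            row_hh.append((a - b - c + d) / 2.0)
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh
```

With this scaling the transform preserves energy: for each 2 × 2 block, the squared sub-band coefficients sum to the squared pixel values, so smooth regions concentrate in `ll` while edges and fine artifacts show up in the detail bands.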

(This article belongs to the Section E1: Mathematics and Computer Science)

19 pages, 7335 KB

Open Access Article

MSA-DET: A Multi-Scale Attention Network with Adaptive Feature Fusion for SAR Ship Detection

by Sai Wan, Zhiyong Tao and Lu Chen

Sensors 2026, 26(13), 3970; https://doi.org/10.3390/s26133970 - 23 Jun 2026

Viewed by 246

Abstract

Synthetic aperture radar (SAR) ship detection faces three persistent challenges: coherent speckle noise that obscures target boundaries, heterogeneous background clutter in coastal and harbor scenes, and ship targets whose spatial extent varies by more than an order of magnitude within the same image. To address these issues jointly, this paper proposes MSA-DET, an improved SAR ship detection network built upon YOLOv11. In the backbone, a Multi-Scale Cross-axis Attention module (MSCAttention) runs horizontal and vertical axial attention branches in parallel across multiple receptive-field scales, sharpening feature representations for ship targets that vary widely in size and orientation. In the neck, the standard C3k2 block is redesigned as C3k2_SSA by embedding sparse self-attention, which selectively focuses on the most discriminative spatial tokens while suppressing speckle interference and reducing computational overhead. An Adaptive Spatial Feature Fusion detection head (ASFF) replaces fixed pyramid-level aggregation with learned per-pixel blending weights, resolving gradient conflicts across scales and improving localization consistency for both small and large ships. On the HRSID dataset, MSA-DET achieves an mAP@0.5:0.95 of 63.6% and mAP@0.5 of 88.1%, representing gains of 4.0% and 1.6% over the YOLOv11n baseline; on SSDD, it reaches 69.6% and 97.7%, surpassing the baseline by 7.2% and 2.1%, respectively. These results demonstrate that coordinated multi-stage redesign—rather than isolated module substitution—is an effective strategy for SAR-oriented ship detection. The accuracy gains are accompanied by a moderate increase in model size (8.9 M parameters versus 2.6 M for YOLOv11n) and computational cost (9.6 G FLOPs versus 6.3 G), a trade-off that is justified by the substantial improvement in detection quality. Full article
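The "learned per-pixel blending weights" idea behind adaptive spatial feature fusion can be sketched as a softmax over pyramid levels at each pixel, followed by a weighted sum. The sketch assumes the level features have already been resized to a common resolution and uses a single channel. The function name `blend_levels` is hypothetical, and the `logits` would be predicted by small learned layers in a real network rather than supplied by hand as they are here.

```python
import math

def blend_levels(features, logits):
    """Fuse per-level feature maps with per-pixel softmax weights.

    features, logits: lists (one entry per pyramid level) of H x W
    nested lists, all with the same H and W (assumed pre-resized).
    """
    n_levels = len(features)
    h, w = len(features[0]), len(features[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Subtract the max logit for a numerically stable softmax.
            m = max(logits[k][i][j] for k in range(n_levels))
            exps = [math.exp(logits[k][i][j] - m) for k in range(n_levels)]
            total = sum(exps)
            fused[i][j] = sum((e / total) * features[k][i][j]
                              for k, e in enumerate(exps))
    return fused
```

Equal logits reduce to a plain average of the levels, while a dominant logit lets one level take over at that pixel, which is the sense in which the fusion is no longer a fixed aggregation.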

(This article belongs to the Section Remote Sensors)

18 pages, 30352 KB

Open Access Article

An Intelligent Building Recognition Method in Remote Sensing Images Based on Cascade R-CNN

by Mingguang Diao, Changyuan Shen, Jikang Jiang, Wenji Li and Zheng Lian

Appl. Sci. 2026, 16(12), 6277; https://doi.org/10.3390/app16126277 - 22 Jun 2026

Viewed by 163

Abstract

Building recognition and detection in remote sensing images are of great significance for urban planning, spatial database updating, and the construction of urban geographic information systems. For remote sensing images with complex background information, variations in the size of building objects make automatic building detection and recognition challenging, thereby affecting the recognition accuracy of deep learning models. At the same time, the lack of a standardized workflow for converting detection results into vector data formats makes it difficult to directly transform building detection results into usable GIS-compatible vector data. Based on the Cascade R-CNN model, an intelligent building recognition model for remote sensing images and a vectorization workflow for the recognition results are proposed. To address the issue of building recognition accuracy in remote sensing images, an intelligent building recognition model comprising ResNet101, a Feature Pyramid Network (FPN), a Region Proposal Network (RPN), and a cascade detector is proposed, which enhances the recognition precision and localization capability of building objects in multi-scale remote sensing images. To address the efficiency issue of vectorizing detection results, a procedural conversion method for building detection results in remote sensing images is proposed, which converts raster recognition results into GIS-compatible vector files through data verification, information extraction, boundary construction, polygon generation, and format conversion. Experiments show that the intelligent recognition model achieves a recall of 0.958, a miss rate of 0.042, a precision of 0.963, and an F1-score of 0.960. In addition, mAP@0.5, mAP@0.5:0.95, and mean IoU reach 0.954, 0.793, and 0.742, respectively, indicating good performance in building detection and localization. 
Compared with manual vectorization, the automated workflow reduces the processing time for 57 raster files from 25.4 min to 3.1 min, corresponding to an 87.8% reduction in processing time. These results indicate that the proposed method improves building recognition accuracy while enhancing the efficiency of converting recognition results into GIS vector data, showing application potential for urban spatial information extraction. Full article
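The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the standard definitions (F1 as the harmonic mean of precision and recall, miss rate as 1 − recall). The function names are illustrative and do not come from the article.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def miss_rate(recall: float) -> float:
    """Fraction of true objects that were not detected (1 - recall)."""
    return 1.0 - recall


def relative_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before


if __name__ == "__main__":
    # Values quoted in the abstract: precision 0.963, recall 0.958,
    # 25.4 min (manual) vs. 3.1 min (automated) for 57 raster files.
    print(round(f1_score(0.963, 0.958), 3))
    print(round(miss_rate(0.958), 3))
    print(round(100 * relative_reduction(25.4, 3.1), 1))
```

Under these definitions, the quoted F1-score, miss rate, and 87.8% time reduction follow directly from the quoted precision, recall, and processing times.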

(This article belongs to the Topic Artificial Intelligence, Remote Sensing and Digital Twin Driving Innovation in Sustainable Natural Resources and Ecology)

24 pages, 4627 KB

Open AccessArticle

A State Space Model-Driven Feature Disentanglement Network for Real-Time Detection of Morphologically Complex Insect Pests in Agricultural Fields

by Jiaren Sun, Yating Jiang, Shuai Teng, Zongchao Liu and Nuo Chen

Modelling 2026, 7(3), 122; https://doi.org/10.3390/modelling7030122 - 21 Jun 2026

Viewed by 222

Abstract

Accurate detection of field insect pests remains a significant challenge for precision agriculture due to the elongated and variable morphology of the target organisms, their frequent resemblance to complex background textures, and the long-tail distribution of species in natural datasets. While deep convolutional neural networks (CNNs) have advanced the field, they are often constrained by a limited effective receptive field and the entanglement of semantic and spatial features, which can lead to elevated false-positive rates and missed detections for low-contrast or rare targets. This paper introduces a novel detection framework that integrates state space modeling with multi-stream feature disentanglement to address these limitations. First, a visual state space module is employed as the backbone feature extractor, enabling the establishment of a global receptive field with linear computational complexity and thereby improving the perception of long-range morphological structures. Second, a Topological Feature Disentanglement Pyramid Network is proposed. This architecture explicitly separates feature representations into semantic and spatial streams and recombines them through graph convolutional interactions, which serves to suppress background interference and enhance localization precision. A meta-auxiliary detection head, active only during training, is introduced to amplify supervision signals for hard, low-contrast samples via adversarial gradient modulation. Furthermore, an implicit neural radiance field augmentation pipeline is used to generate physically consistent synthetic views of underrepresented pest classes, mitigating the negative effects of long-tail data distributions. 
Experimental evaluations on the public BAU-Insectv2 benchmark demonstrate that the proposed method achieves a mean average precision (mAP@0.5) of 81.8%, representing a 4.4-percentage-point improvement over a comparable baseline, while maintaining a compact parameter count of 2.33 M and an inference speed of 178.6 FPS. The framework exhibits particular efficacy in detecting elongated, minute, and rare pests, suggesting a promising technical approach for real-time, field-based pest surveillance in precision agriculture. Full article

26 pages, 8518 KB

Open AccessArticle

CVA-Net: Multi-View 3D Reconstruction for Fringe Projection Profilometry via Cross-View Attention and Sim2Real Learning

by Zuqiong Chen, Xiaopin Zhong and Yibin Tian

Photonics 2026, 13(6), 601; https://doi.org/10.3390/photonics13060601 - 21 Jun 2026

Viewed by 298

Abstract

Fringe projection profilometry (FPP) is widely used for 3D reconstruction, but conventional single-view FPP systems suffer from inherent occlusions and shadow regions, leading to incomplete surface recovery. In this study, we propose CVA-Net, an end-to-end deep learning framework with cross-view attention (CVA) that directly reconstructs dense depth maps from multi-view fringe patterns. CVA-Net simultaneously processes four fringe images acquired from orthogonal projection directions and leverages a CVA module to explicitly model inter-view dependencies, enabling adaptive fusion of complementary information. A 3D U-Net backbone with attention gates, atrous spatial pyramid pooling (ASPP), and an auxiliary parameter estimation branch further enhances reconstruction accuracy and structural consistency via multitask learning. To support Sim2Real network training, we build a Blender-based digital twin of a multi-view FPP system and generate a large-scale synthetic dataset with perfect ground truth. Extensive experiments on both synthetic and real-world objects demonstrate that CVA-Net significantly outperforms state-of-the-art single-view methods. With a symmetric four-view configuration and fringe period of 8, CVA-Net achieves an MAE of 0.0359 mm, an MSE of 0.0379 mm² and an RMSE of 0.1947 mm, reducing the MAE, MSE, and RMSE by 32.8%, 54.1%, and 32.2%, respectively, compared to the best single-view competitor. Ablation studies validate the contribution of each architectural component, while real-system experiments demonstrate the feasibility of transferring a network trained purely on synthetic data to practical FPP measurements without domain adaptation. Although further improvements are required to enhance reconstruction accuracy under real imaging conditions, the proposed framework provides an effective initial step toward bridging the gap between digital-twin-based training and real-world multi-view FPP applications. 
CVA-Net provides a robust, occlusion-aware solution for multi-view FPP reconstruction. Full article
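For readers comparing the three error figures, the sketch below shows how MAE, MSE, and RMSE are commonly computed from predicted and ground-truth depths. It assumes the usual per-pixel definitions, with depths supplied as flat lists in millimetres. The function name `depth_errors` is hypothetical, and this is not the authors' evaluation code.

```python
import math
from typing import Sequence, Tuple


def depth_errors(pred: Sequence[float], truth: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MAE, MSE, RMSE) for two equal-length sequences of depths in mm."""
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("pred and truth must be non-empty and of equal length")
    diffs = [p - t for p, t in zip(pred, truth)]
    mae = sum(abs(d) for d in diffs) / len(diffs)   # mean absolute error, mm
    mse = sum(d * d for d in diffs) / len(diffs)    # mean squared error, mm^2
    rmse = math.sqrt(mse)                           # root mean squared error, mm
    return mae, mse, rmse


if __name__ == "__main__":
    # Toy depths, not data from the article.
    print(depth_errors([1.0, 2.0, 4.0], [1.0, 1.0, 2.0]))
    # Consistency check on the quoted figures: RMSE should equal sqrt(MSE).
    print(round(math.sqrt(0.0379), 4))
```

Because RMSE is the square root of MSE, the quoted MSE of 0.0379 mm² and RMSE of 0.1947 mm agree with each other to four decimal places.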

(This article belongs to the Special Issue Optical Imaging for 3D Surface and Phase Recovery: Techniques and Applications)

25 pages, 35295 KB

Open AccessArticle

A Lightweight Framework for Tea Shoot Detection and Plucking Point Localization Enabled by Modified YOLOv11s-Seg Model

by Yongmao Huang, Yuankai Luo, Yuanxi Mu and Haiyan Jin

Agriculture 2026, 16(12), 1357; https://doi.org/10.3390/agriculture16121357 - 20 Jun 2026

Viewed by 298

Abstract

In this work, a lightweight framework based on a modified YOLOv11s-seg model is proposed for tea shoot detection and plucking point localization. Detecting tea shoots and localizing plucking points with higher accuracy generally requires a larger model with more parameters, making it difficult to balance accuracy against model size. To overcome this limitation, a modified lightweight YOLOv11s-seg model is developed. First, multi-scale edge information enhancement is introduced into the conventional YOLOv11s-seg to better extract edge features and improve the detection accuracy of tea shoots. Meanwhile, context anchor attention is used to modify the cross stage partial spatial attention module in the backbone network to improve the detection capability for small objects. Moreover, a detail calibration reconstruction feature pyramid network is proposed. It uses spatial and contextual semantic information to reconstruct and calibrate features in key regions, enhancing the fusion and recognition of objects at various scales. Furthermore, the modified model performs instance segmentation to obtain the contour of each tea shoot, and the plucking point is localized as the average of the coordinates of the three lowest pixel points in the contour. In addition, the layer-adaptive magnitude-based pruning (LAMP) method is used to compress the model. The experimental results show that the LAMP-pruned modified YOLOv11s-seg model with a speedup ratio of 1.5 achieves a mAP@0.5 of 86.5% for tea shoot detection, a 4.7-percentage-point improvement over the conventional YOLOv11s-seg model. Moreover, it achieves an accuracy of 81.9% for plucking point localization on the validation and test subsets (232 images in total), and its number of parameters, model size, and floating point operations (FLOPs) are reduced by 67.3%, 66.2%, and 24.9%, respectively, relative to the conventional model. 
Therefore, the proposed LAMP-pruned modified model strikes a good balance between model size and detection accuracy. Finally, the LAMP-pruned modified YOLOv11s-seg model is deployed on a Jetson Orin NX edge module and tested in a tea plantation, where it reaches a detection speed of 34.1 FPS, verifying its suitability for practical applications. Full article
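The plucking-point rule described in the abstract (average the coordinates of the three lowest contour pixels) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes image coordinates with the origin at the top-left, so "lowest" means the largest y value, and it assumes the contour is given as a list of (x, y) pairs. The function name `plucking_point` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates, origin at top-left


def plucking_point(contour: List[Point], k: int = 3) -> Point:
    """Average the k lowest contour points (largest y) to estimate the plucking point."""
    if len(contour) < k:
        raise ValueError(f"contour must contain at least {k} points")
    # Assumption: y grows downward, so the lowest pixels have the largest y.
    lowest = sorted(contour, key=lambda p: p[1], reverse=True)[:k]
    x_mean = sum(p[0] for p in lowest) / k
    y_mean = sum(p[1] for p in lowest) / k
    return (x_mean, y_mean)


if __name__ == "__main__":
    # Toy contour, not data from the article.
    shoot = [(10, 5), (12, 40), (14, 42), (16, 41), (20, 8)]
    print(plucking_point(shoot))
```

Averaging several of the lowest points rather than taking a single one makes the estimate less sensitive to a stray pixel on the segmentation boundary, which is presumably why three points are used.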

(This article belongs to the Special Issue Advances in Precision Agriculture in Orchard)

Search Results (1,233)
