Search Results (1,226)

Search Parameters:
Keywords = multi-difference image fusion

26 pages, 14761 KB  
Article
Surface Defect Detection in Liquid Crystal Display Polariser Coating Manufacturing Based on an Enhanced YOLOv10-N Approach
by Jiayue Zhang, Shanhui Liu, Minghui Chen, Kezhan Zhang, Yinfeng Li, Ming Peng and Yeting Teng
Coatings 2026, 16(4), 451; https://doi.org/10.3390/coatings16040451 - 8 Apr 2026
Abstract
To address the issues of uneven grayscale distribution, weak defect features, and small target scales on the coating surface of LCD polarizers during manufacturing, an improved YOLOv10-N-based method is proposed for surface defect detection. First, a polarizer coating defect dataset is constructed based on the LCD polarizer coating process and the characteristics of coating defects. Adaptive median filtering is then employed for image denoising, while a particle-swarm-optimization-based improved histogram equalization method is adopted for image enhancement. Next, the Scale-aware Pyramid Pooling (SCPP) module is introduced into the C2f module of the backbone network to construct the C2f_SCPP feature extraction module, thereby improving the model’s ability to detect coating defects with different morphologies through multi-scale semantic feature fusion. In addition, rotation-equivariant convolution PreCM is incorporated into the SPPF module of the backbone network to build the SPPF_PreCM module, which effectively suppresses feature redundancy and scale conflicts while strengthening the representation of tiny defects. Finally, while retaining the original Distribution Focal Loss (DFL) branch of YOLOv10, WIoU is used to replace CIoU as the IoU loss term in bounding box regression, thereby improving localization accuracy and accelerating model convergence during training. Experimental results show that, compared with YOLOv10-N, the proposed method improves mAP@0.5 and mAP@0.5:0.95 by 1.8 and 2.8 percentage points, respectively, demonstrating its effectiveness for polarizer coating defect detection. However, its generalization capability under diverse production environments, varying illumination conditions, and complex noise scenarios still requires further investigation. Full article
(This article belongs to the Section High-Energy Beam Surface Engineering and Coatings)
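The adaptive median filtering step named in this abstract is a textbook algorithm; the following is a minimal NumPy sketch of that standard filter, not the authors' code (the window limit and impulse test are the usual defaults).

```python
import numpy as np

def adaptive_median_filter(img, max_window=7):
    """Textbook adaptive median filter on a 2-D grayscale array: grow the
    window until its median is not an impulse, then replace the centre
    pixel only if it is itself an impulse."""
    pad = max_window // 2
    padded = np.pad(img, pad, mode="reflect")
    out = img.copy()
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            for k in range(3, max_window + 1, 2):
                r = k // 2
                win = padded[y + pad - r: y + pad + r + 1,
                             x + pad - r: x + pad + r + 1]
                zmin, zmed, zmax = win.min(), np.median(win), win.max()
                if zmin < zmed < zmax:                 # median is usable
                    if not (zmin < img[y, x] < zmax):  # centre is an impulse
                        out[y, x] = zmed
                    break  # no break at max window -> pixel kept as-is
    return out
```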
23 pages, 9838 KB  
Article
Bimodal Image Fusion and Brightness Piecewise Linear Enhancement for Crack Segmentation
by Yong Li, Nian Ji, Fuzhe Zhao, Huaiwen Zhang, Zeqi Liu, Laxmisha Rai and Zhaopeng Deng
Mathematics 2026, 14(7), 1235; https://doi.org/10.3390/math14071235 - 7 Apr 2026
Abstract
Accurate segmentation of structural cracks is a core prerequisite for quantifying crack parameters, assessing damage severity, and providing early warning for structural safety. However, different types of structures exhibit significant individual variations in features such as color, texture, and brightness. Consequently, commonly used image segmentation algorithms struggle to establish a universal mathematical model, making it challenging to robustly identify and precisely segment crack targets amidst multi-feature disparities. To address this issue, this paper proposes a crack-segmentation algorithm based on bimodal image fusion and brightness piecewise linear enhancement (CSA-BB), and further enables parameter extraction and crack monitoring. The algorithm utilizes the complementary properties of visible-light and pseudo-color images for bimodal image fusion, thereby enhancing the detailed features of cracks. Furthermore, a brightness piecewise linear function is devised that automatically selects appropriate parameters for image enhancement of structural cracks across varying background brightness. Subsequently, the crack region is effectively segmented using the bottom-hat transform and the OTSU algorithm. Ultimately, the crack’s safety level is determined from the acquired crack parameters, thereby enabling effective monitoring and assessment of the crack development process. Among the compared algorithms, the proposed method achieves the best segmentation performance, with a Dice coefficient of 0.4511 and a Jaccard index of 0.2981. Compared to the second-best algorithm, it yields significant improvements of 26.9% and 34.5%, respectively, demonstrating higher consistency with the ground truth. Moreover, superior computational efficiency and robustness are achieved, fulfilling the operational demands of real-world engineering environments. Full article
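The bottom-hat transform and Otsu thresholding used in the final segmentation step map directly onto standard OpenCV calls; a minimal sketch, with the input path and kernel size as illustrative assumptions:

```python
import cv2

# Dark cracks on a brighter background respond to the bottom-hat
# (black-hat) transform; Otsu then selects the binarisation threshold.
gray = cv2.imread("crack.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))  # assumed size
bothat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
_, mask = cv2.threshold(bothat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```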
25 pages, 7467 KB  
Article
Double Cost-Volume Stereo Matching with Entropy-Difference-Guided Fusion
by Huanchun Yang, Hongshe Dang, Xuande Zhang and Quanping Chen
Electronics 2026, 15(7), 1525; https://doi.org/10.3390/electronics15071525 - 6 Apr 2026
Viewed by 64
Abstract
To address the reduced accuracy of stereo matching networks near object boundaries and disparity discontinuities, a double cost-volume stereo matching network with entropy-difference-guided fusion is proposed. The proposed network is built on RAFT-Stereo. It employs a pretrained backbone to extract multi-scale features and uses deformable attention for cross-scale feature fusion. A shallow image-guided branch generates pixel-wise constraint information to limit the magnitude of sampling offsets and alleviate cross-structure sampling. Based on the extracted features, a group-wise correlation cost volume and a normalized correlation cost volume are constructed. Both cost volumes are regularized by 3D Hourglass networks, and a structure-consistent intra-scale aggregation module is introduced during the regularization of the group-wise correlation cost volume. The two aggregated results are then fused by the entropy-difference-guided fusion module to obtain the final cost volume. The experimental results show the effectiveness of the proposed network on the Scene Flow, KITTI, and ETH3D datasets, achieving an endpoint error of 0.45 px and a >3 px error rate of 2.41% on the Scene Flow dataset. Full article
(This article belongs to the Section Artificial Intelligence)
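Group-wise correlation cost volumes follow a well-known construction (introduced in GwcNet); the PyTorch sketch below shows that generic construction, not this paper's exact implementation:

```python
import torch

def groupwise_correlation_volume(fl, fr, max_disp, groups):
    """Build a group-wise correlation cost volume (GwcNet-style).
    fl, fr: left/right features of shape (B, C, H, W), C divisible by groups."""
    b, c, h, w = fl.shape
    fl = fl.view(b, groups, c // groups, h, w)
    fr = fr.view(b, groups, c // groups, h, w)
    volume = fl.new_zeros(b, groups, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, :, d] = (fl * fr).mean(dim=2)
        else:
            # correlate left pixels with right pixels shifted by disparity d
            volume[:, :, d, :, d:] = (fl[..., d:] * fr[..., :-d]).mean(dim=2)
    return volume
```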
32 pages, 43664 KB  
Article
MVFF: Multi-View Feature Fusion Network for Small UAV Detection
by Kunlin Zou, Haitao Zhao, Xingwei Yan, Wei Wang, Yan Zhang and Yaxiu Zhang
Drones 2026, 10(4), 264; https://doi.org/10.3390/drones10040264 - 4 Apr 2026
Viewed by 287
Abstract
With the widespread adoption of various types of Unmanned Aerial Vehicles (UAVs), their non-compliant operations pose a severe challenge to public safety, necessitating the urgent identification and detection of UAV targets. However, in complex backgrounds, UAV targets exhibit small-scale dimensions and low contrast, coupled with extremely low signal-to-noise ratios. This forces conventional target detection methods to confront issues such as feature convergence, missed detections, and false alarms. To address these challenges, we propose a Multi-View Feature Fusion Network (MVFF) that achieves precise identification of small, low-contrast UAV targets by leveraging complementary multi-view information. First, we design a collaborative view alignment fusion module. This module employs a cross-map feature fusion attention mechanism to establish pixel-level mapping relationships and perform deep fusion, effectively resolving geometric distortion and semantic overlap caused by imaging angle differences. Furthermore, we introduce a view feature smoothing module that employs displacement operators to construct a lightweight long-range modeling mechanism. This overcomes the limitations of traditional convolutional local receptive fields, effectively eliminating ghosting artifacts and response discontinuities arising from multi-view fusion. Additionally, we develop a small-object binary cross-entropy loss function. By incorporating scale-adaptive gain factors and confidence-aware weights, this function enhances the learning capability of edge features in small objects, significantly reducing prediction uncertainty caused by background noise. Comparative experiments conducted on a multi-perspective UAV dataset demonstrate that our approach consistently outperforms existing state-of-the-art methods across multiple performance metrics. Specifically, it achieves a Structure-measure of 91.50% and an F-measure of 85.14%, validating the effectiveness and superiority of the proposed method. Full article
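The small-object BCE loss is described only qualitatively; a hedged PyTorch sketch of one plausible form, where the gain and confidence terms are illustrative assumptions rather than the paper's definitions:

```python
import torch
import torch.nn.functional as F

def small_object_bce(pred, target, eps=1e-6):
    """Sketch of a small-object-weighted BCE: pred/target are (B, 1, H, W)
    probability and binary maps.  The gain grows as the foreground fraction
    shrinks; uncertain pixels (pred near 0.5) are down-weighted."""
    fg_frac = target.mean(dim=(1, 2, 3), keepdim=True).clamp_min(eps)
    scale_gain = (1.0 / fg_frac).sqrt()                  # smaller object -> larger gain
    confidence = (2.0 * (pred.detach() - 0.5)).abs()     # 0 at p=0.5, 1 at p in {0,1}
    weight = 1.0 + scale_gain * target * confidence      # boost confident foreground
    return F.binary_cross_entropy(pred, target, weight=weight)
```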
37 pages, 33258 KB  
Article
An Intelligent Gated Fusion Network for Waterbody Recognition in Multispectral Remote Sensing Imagery
by Tong Zhao, Chuanxun Hou, Zhili Zhang and Zhaofa Zhou
Remote Sens. 2026, 18(7), 1088; https://doi.org/10.3390/rs18071088 - 4 Apr 2026
Viewed by 169
Abstract
Accurate water body segmentation from multispectral remote sensing imagery is critical for hydrological monitoring and environmental management. However, leveraging transfer learning with pre-trained models remains challenging due to the dimensional mismatch between three-channel RGB-based architectures and multi-band spectral data. To address this, this study proposes a novel segmentation network, termed Intelligent Gated Fusion Network (IGF-Net), built upon a dual-branch feature encoder module and a core Intelligent Gated Fusion Module (IGFM). The IGFM achieves adaptive fusion of visual and spectral features through a cascaded mechanism integrating differences-and-commonalities parallel modeling, channel-context priors, and adaptive temperature control. We evaluate IGF-Net on the newly constructed Tiangong-2 remote sensing image water body semantic segmentation dataset, which comprises 3776 meticulously annotated multispectral image patches. Comprehensive experiments demonstrate that IGF-Net achieves strong and consistent performance on this dataset, with an Intersection over Union of 0.8742 and a Dice coefficient of 0.9239, consistently outperforming the evaluated baseline methods, such as FCN, U-Net, and DeepLabv3+. It also exhibits strong cross-dataset generalization capabilities on an independent Sentinel-2 water segmentation dataset. Ablation studies and visualization analyses confirm that the proposed fusion strategy significantly enhances segmentation accuracy and stability, particularly in complex scenarios. Full article
(This article belongs to the Topic Advances in Hydrological Remote Sensing)
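The abstract describes adaptive gating between the visual and spectral branches; a minimal PyTorch sketch of generic channel-wise gated fusion follows (the IGFM's actual cascade with temperature control is not reproduced here):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Minimal gated fusion: a learned per-channel gate, driven by global
    channel context, decides how much of the visual vs. the spectral
    branch passes through.  Illustrates the idea only."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # channel-context prior
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, visual, spectral):
        g = self.gate(torch.cat([visual, spectral], dim=1))  # (B, C, 1, 1)
        return g * visual + (1.0 - g) * spectral
```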
27 pages, 24041 KB  
Article
PMDet: Patch-Aware Enhancement and Fusion for Multispectral Object Detection
by Jie Li, Chenhong Sui, Jing Wang and Jun Zhou
Remote Sens. 2026, 18(7), 1068; https://doi.org/10.3390/rs18071068 - 2 Apr 2026
Viewed by 168
Abstract
Multispectral object detection addresses the limitations of single-modal approaches by fusing complementary information from visible and infrared images, thereby improving robustness in complex environments. However, the inter-modal representations are inherently misaligned due to sensing discrepancies, and the complementary cues they provide are often imbalanced, making it difficult to exploit modality-specific information effectively. Moreover, directly merging features from different modalities can introduce noise and artifacts that deteriorate the detection performance. To this end, this paper proposes a patch-aware enhancement and fusion network for multispectral object detection (PMDet). This method employs a dual-stream backbone equipped with the patch-aware Feature Enhancer (FE) module for cross-modal feature alignment and enhancement. FE not only reinforces the feature representation of key regions but also helps to suppress local noise and enhance the model’s perception of fine textures and differences. Building on these enriched features, the patch-based Feature Aggregator (FA) module allows for efficient inter-modal feature interaction and semantic fusion with noise resistance. Specifically, both FE and FA modules leverage the shifted-patch design to preserve computational efficiency while enabling long-range modeling. In this regard, PMDet couples multi-scale cross-modal semantic enhancement with deep semantic fusion to form a stable and discriminative multimodal representation pipeline. Experiments on FLIR, LLVIP, and VEDAI demonstrate that the method outperforms mainstream approaches in detection accuracy and robustness, and ablation studies further verify the effectiveness of each module. Full article
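The shifted-patch design is not specified beyond the abstract; a Swin-style guess at it in PyTorch, with patch size and shift amount as assumptions:

```python
import torch

def shifted_patches(x, patch=8, shift=True):
    """Partition a feature map (B, C, H, W) into non-overlapping patches,
    optionally rolling the map by half a patch first so that information
    crosses patch borders on alternating blocks (a Swin-style reading of
    the 'shifted-patch' design).  H and W must be divisible by patch."""
    if shift:
        x = torch.roll(x, shifts=(patch // 2, patch // 2), dims=(2, 3))
    b, c, h, w = x.shape
    x = x.view(b, c, h // patch, patch, w // patch, patch)
    # -> (B, num_patches, C, patch, patch), ready for per-patch attention
    return x.permute(0, 2, 4, 1, 3, 5).reshape(b, -1, c, patch, patch)
```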
19 pages, 8523 KB  
Article
DAMFusion: Multi-Spectral Image Segmentation via Competitive Query and Boundary Region Attention
by Miao Yu, Xing Lu, Ziyao Yang, Daoxing Gao and Guoqiang Zhong
Remote Sens. 2026, 18(7), 1064; https://doi.org/10.3390/rs18071064 - 2 Apr 2026
Viewed by 229
Abstract
To address the challenges of modal differences in multimodal farmland images and insufficient segmentation accuracy for small targets, this paper proposes a multi-source image fusion branch (DAMFusion) based on modal competitive selection. The branch dynamically selects infrared and visible light features through the Competitive Query Module (CQM) using Top-K screening, combined with IoU-aware loss optimization to avoid cross-modal interference. The multimodal fusion module (MMFormer) employs cross-modal attention and symmetric mechanisms, enhancing single-modal features through a self-enhancement module and unifying multimodal distributions via linear projection. The Boundary Region Attention Multi-level Fusion Module (BRM) extracts boundary information through feature differencing, strengthens it with spatial attention, and fuses it with shallow features to achieve cross-layer detail recovery. Through the collaborative design of dynamic modal feature selection, cross-modal distribution unification, and boundary region enhancement, DAMFusion effectively solves the problems of multimodal differences and small target segmentation in multispectral images, providing precise feature representation for fine farmland segmentation. Experiments on the OUC-UAV-MSEG dataset show that DAMFusion achieves 93.25% OA, 91.71% F1, and 89.70% mIoU, demonstrating clear advantages over representative comparison methods. In addition, ablation results verify the effectiveness of the proposed modules, where CQM improves OA from 91.00% to 93.25%, confirming the importance of discriminative modality selection before fusion. Full article
(This article belongs to the Section AI Remote Sensing)
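Top-K competitive selection of modality features can be sketched with torch.topk; the scoring inputs and masking scheme below are assumptions, not the CQM definition:

```python
import torch

def topk_competitive_select(ir_feat, vis_feat, scores_ir, scores_vis, k):
    """Per-modality Top-K screening sketch: only the k highest-scoring
    spatial tokens of each modality contribute; the rest are masked out.
    ir_feat/vis_feat: (B, C, N) flattened features; scores_*: (B, N); k <= N."""
    def keep_topk(feat, scores):
        idx = scores.topk(k, dim=-1).indices                   # (B, k)
        mask = torch.zeros_like(scores).scatter_(-1, idx, 1.0) # 1 on winners
        return feat * mask.unsqueeze(1)                        # zero the losers
    return keep_topk(ir_feat, scores_ir) + keep_topk(vis_feat, scores_vis)
```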
23 pages, 2936 KB  
Article
Lightweight Transient-Source Detection Method for Edge Computing
by Jiahao Zhang, Yutian Fu, Feng Dong and Lingfeng Huang
Universe 2026, 12(4), 101; https://doi.org/10.3390/universe12040101 - 1 Apr 2026
Viewed by 193
Abstract
Transient-source detection without relying on difference images still faces challenges in achieving high accuracy, especially under practical space-based astronomical survey conditions where the data volume is enormous, on-orbit transmission bandwidth is limited, and real-time response is required for rapid follow-up observations. To address these issues, this paper proposes a lightweight detection network that integrates multi-scale feature fusion with contextual feature extraction, enabling efficient real-time processing on resource-constrained edge devices. The proposed model enhances robustness to point-spread-function variations across observation conditions and to complex background environments, while simultaneously improving detection accuracy. To evaluate performance comprehensively, lightweight VGG and ResNet architectures, along with other models commonly used as baselines for transient-source detection, are adopted for comparison. Experimental results show that under the condition that the models have approximately the same number of parameters, the proposed network achieves the best accuracy, obtaining nearly 1% improvement compared with the best-performing baseline model. Based on this design, an ultra-lightweight version with only 7k parameters is further developed by incorporating a compact multi-scale module, improving accuracy by 1% over the version without the multi-scale structure. Moreover, through heterogeneous knowledge distillation and adaptive iterative training, the accuracy of the ultra-lightweight model is further increased from 93.3% to 94.0%. Finally, the model is deployed and validated on an AI hardware acceleration platform. The results demonstrate that the proposed method substantially improves inference throughput while maintaining high accuracy, providing a practical solution for real-time, low-latency, on-device transient-source detection under large data volume and limited transmission conditions. Specifically, the proposed models are trained offline on a high-performance GPU and subsequently deployed on the Fudan Microelectronics 7100 AI board to evaluate their real-world inference efficiency on resource-constrained edge devices. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Modern Astronomy)
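The distillation step presumably builds on the standard soft-target loss; a sketch of that baseline follows (temperature and mixing weight are arbitrary choices, and the paper's heterogeneous variant is not detailed here):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Standard knowledge-distillation objective (Hinton et al.):
    temperature-softened KL term plus the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                               # rescale gradient magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```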
21 pages, 56996 KB  
Article
Comprehensive Analysis of Multimodal Fusion Techniques for Ocular Disease Detection
by Veena K. M., Pragya Gupta, Ruthvik Avadhanam, Rashmi Naveen Raj, Sulatha V. Bhandary, Varadraj Gurupur and Veena Mayya
AI 2026, 7(4), 126; https://doi.org/10.3390/ai7040126 - 1 Apr 2026
Viewed by 327
Abstract
Accurate and early identification of ocular diseases is essential to prevent vision impairment and enable timely medical intervention. In routine clinical practice, ophthalmologists rely on a structured diagnostic workflow that incorporates multiple imaging modalities to manually assess and diagnose ocular diseases. However, interpreting each modality requires significant clinical experience and can be time-consuming. These limitations can be effectively addressed through the application of artificial intelligence (AI)-driven multimodal fusion techniques. In this study, we conducted an empirical investigation to assess the impact of different fusion strategies—including early, intermediate, and late fusion—on diagnostic performance, training requirements, and interpretability. The proposed methodology was evaluated using three publicly available datasets: FFA-Fundus (Fundus fluorescein angiography), GAMMA (Glaucoma Analysis and Multi-Modal Assessment), and OLIVES (Ophthalmic Labels to Investigate Visual Eye Semantics). Experimental results demonstrate that multimodal feature fusion improves disease detection performance. Although fused models typically required an increase in training parameters compared to single-modality models, they provided interpretability on par with that of individual single-modal networks. However, inference time increased by approximately 50% for multimodal architectures. These findings underscore the value of integrating diverse ophthalmic imaging modalities to enhance diagnostic accuracy in automated disease detection systems. At the same time, the results highlight that unimodal models containing highly discriminative features can also perform competitively, particularly when a single modality is sufficient for disease identification. Multimodal fusion provides the greatest benefit in scenarios where complementary information across modalities contributes distinct and non-redundant features. Furthermore, fusing all available modalities may not be optimal due to increased computational cost and reduced inference efficiency; thus, selective modality integration and lightweight fusion strategies are essential to balance accuracy, interpretability, and efficiency in clinical deployment. Full article
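Early versus late fusion reduces to where the modalities are combined; a skeletal PyTorch illustration with placeholder encoders, heads, feature sizes, and class count (none of which come from the study):

```python
import torch
import torch.nn as nn

# Placeholder components: two 512-d modality inputs, 4 example classes.
enc_a, enc_b = nn.Linear(512, 128), nn.Linear(512, 128)
early_head = nn.Linear(1024, 4)
late_head_a, late_head_b = nn.Linear(128, 4), nn.Linear(128, 4)

def early_fusion(xa, xb):
    """Concatenate modality inputs before any modality-specific processing."""
    return early_head(torch.cat([xa, xb], dim=-1))

def late_fusion(xa, xb):
    """Classify each modality independently, then average the decisions."""
    return 0.5 * (late_head_a(enc_a(xa)) + late_head_b(enc_b(xb)))
```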
20 pages, 60255 KB  
Article
A Multi-Atlas Dynamic Connectivity Transformer Fused with 4D Spatiotemporal Modeling for Autism Spectrum Disorder Recognition
by Monan Wang, Jiujiang Guo and Xiaojing Guo
Brain Sci. 2026, 16(4), 378; https://doi.org/10.3390/brainsci16040378 - 30 Mar 2026
Viewed by 271
Abstract
Background: The recognition of autism spectrum disorder (ASD) has been a challenge due to the heterogeneity in symptoms and complex variations in brain function. Resting-state functional magnetic resonance imaging (rs-fMRI) has become instrumental in studying these disorders by accessing underlying abnormal neural activity and connectivity. Recently, deep learning approaches have shifted the analysis of brain networks by capturing spatiotemporal information from fMRI sequences. Nonetheless, most existing studies are limited by relying on a single representational scale, typically restricting analysis to either voxel-level spatiotemporal patterns or static connectivity matrices. Additionally, the dynamic reconfiguration of functional coupling and its variations across different anatomical parcellations are often ignored, which obscures neurobiologically meaningful dynamics. Methods: In this regard, we propose a multi-atlas dynamic connectivity transformer fused with 4D spatiotemporal modeling for ASD recognition (MADCT-4D). Specifically, the framework comprises two complementary branches. The 4D spatiotemporal branch encodes raw rs-fMRI volumes to learn hierarchical representations of evolving neural activity, while the dynamic-connectivity branch models time-resolved functional connectivity sequences constructed from multiple atlases, enabling the network to capture dynamic reconfiguration at the connectome level under different parcellation granularities. Moreover, we perform late fusion by combining the branch-specific decision scores with a learnable gate, allowing the model to adaptively weight voxel-level dynamics and multi-atlas connectivity evidence for each subject. Results: Extensive experiments on the publicly available ABIDE dataset demonstrate that the proposed method achieves 90.2% accuracy for ASD recognition, outperforming multiple competitive baselines. Conclusions: The proposed framework yields interpretable biomarkers based on learned dynamic connectivity patterns that are consistent with altered functional coupling in ASD. Full article
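The learnable gate over the two branches' decision scores can be sketched compactly; the gate's input and form below are assumptions, not the MADCT-4D definition:

```python
import torch
import torch.nn as nn

class GatedLateFusion(nn.Module):
    """Late fusion with a learnable, per-subject gate over two branches'
    decision scores (B, num_classes): the gate weighs voxel-level dynamics
    against multi-atlas connectivity evidence."""
    def __init__(self, num_classes):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * num_classes, 1), nn.Sigmoid())

    def forward(self, score_4d, score_conn):
        g = self.gate(torch.cat([score_4d, score_conn], dim=-1))  # (B, 1)
        return g * score_4d + (1.0 - g) * score_conn
```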
19 pages, 1666 KB  
Article
MTLL: A Novel Multi-Task Learning Approach for Lymphocytic Leukemia Classification and Nucleus Segmentation
by Cuisi Ou, Zhigang Hu, Xinzheng Wang, Kaiwen Cao and Yipei Wang
Electronics 2026, 15(7), 1419; https://doi.org/10.3390/electronics15071419 - 28 Mar 2026
Viewed by 228
Abstract
Bone marrow cell classification and nucleus segmentation in microscopic images are fundamental tasks for computer-aided diagnosis of lymphocytic leukemia. However, bone marrow cells from different subtypes exhibit high morphological similarity, and structural information is often constrained under optical microscopic imaging, posing challenges for stable and effective feature representation. To address this issue, we propose MTLL (Multitask Model on Lymphocytic Leukemia), a novel multitask approach that performs cell classification and nucleus segmentation within a unified network to exploit their complementary information. The model constructs a hybrid backbone for shared feature representation based on a CNN-Transformer architecture, in which Fuse-MBConv modules are tightly integrated with multilayer multi-scale transformers to enable deep fusion of local texture and global semantic information. For the segmentation branch, we design an AM (Atrous Multilayer Perceptron) decoder that combines atrous spatial pyramid pooling with multilayer perceptrons to fuse multi-scale information and accurately delineate nucleus boundaries. The classification branch incorporates prior knowledge of cell nuclei structures to capture subtle variations in cellular morphology and texture, thereby enhancing the model’s ability to distinguish between leukemia subtypes. Experimental results demonstrate that the MTLL model significantly outperforms existing advanced single-task and multi-task models in both lymphocytic leukemia classification and cell nucleus segmentation. These results validate the effectiveness of the multi-task feature-sharing strategy for lymphocytic leukemia diagnosis using bone marrow microscopic images. Full article
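A joint classification-plus-segmentation objective is the core of any such multi-task setup; a sketch with an assumed cross-entropy + Dice combination and arbitrary task weights (the paper's exact losses are not given here):

```python
import torch
import torch.nn as nn

def multitask_loss(cls_logits, cls_target, seg_logits, seg_target,
                   w_cls=1.0, w_seg=1.0):
    """Joint objective sketch: cross-entropy for subtype classification
    plus a soft Dice term for nucleus segmentation.
    seg_logits/seg_target: (B, 1, H, W); cls_logits: (B, num_classes)."""
    ce = nn.functional.cross_entropy(cls_logits, cls_target)
    probs = torch.sigmoid(seg_logits)
    inter = (probs * seg_target).sum(dim=(1, 2, 3))
    dice = 1.0 - (2 * inter + 1.0) / (probs.sum(dim=(1, 2, 3))
                                      + seg_target.sum(dim=(1, 2, 3)) + 1.0)
    return w_cls * ce + w_seg * dice.mean()
```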
28 pages, 10414 KB  
Article
MBFTFuse: A Triple-Path Adversarial Network Based on Modality Balancing and Feature-Tracing Compensation for Infrared and Visible Image Fusion
by Mingxi Chen, Bingting Zha, Rui Yang, Yuran Tan, Shaojie Ma and Zhen Zheng
Sensors 2026, 26(7), 2109; https://doi.org/10.3390/s26072109 - 28 Mar 2026
Viewed by 287
Abstract
Infrared and visible image fusion aims to integrate complementary information from heterogeneous images captured by different optical sensors based on distinct imaging principles; however, existing methods often exhibit modality bias, leading to weakened targets or the loss of crucial texture details. To address this, we propose MBFTFuse, an adversarial fusion network based on modality balancing and feature tracing, which consists of a triple-path generator and dual discriminators. The generator’s three paths comprise a central modality-balancing path for deep feature fusion and dual edge feature-tracing paths for modality-specific enhancement. Specifically, a multi-cognitive modality-balancing module is introduced to achieve feature weight equilibrium, while a Feature-Tracing Attention Module self-enhances single-modality features to compensate for information loss in the fusion results. Furthermore, a pixel loss based on intensity histograms is designed to optimize inter-modal balance at the pixel level. Comparative experiments against nine state-of-the-art methods across three public datasets demonstrate that MBFTFuse effectively highlights infrared targets while preserving intricate visible textures. The superior performance of this method in both quantitative metrics and downstream object detection tasks contributes to extending the boundaries of sensor-driven computer vision technologies. Full article
(This article belongs to the Special Issue Sensing and Imaging in Computer Vision)
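A histogram-based pixel loss requires a differentiable histogram; one plausible reading uses soft (Gaussian-kernel) histograms, with the bin count and the averaged target as assumptions rather than the paper's formulation:

```python
import torch

def soft_histogram(x, bins=32):
    """Differentiable intensity histogram via Gaussian kernels; x in [0, 1],
    shape (B, ...). Returns a normalized (B, bins) histogram."""
    centers = torch.linspace(0.0, 1.0, bins, device=x.device)
    d = x.reshape(x.shape[0], -1, 1) - centers            # (B, N, bins)
    weights = torch.exp(-(d ** 2) / (2 * (1.0 / bins) ** 2))
    h = weights.sum(dim=1)
    return h / h.sum(dim=-1, keepdim=True)

def histogram_pixel_loss(fused, ir, vis):
    """Pull the fused image's histogram toward the mean of the two source
    histograms so that neither modality dominates the intensity balance."""
    target = 0.5 * (soft_histogram(ir) + soft_histogram(vis))
    return (soft_histogram(fused) - target).abs().sum(dim=-1).mean()
```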
16 pages, 10364 KB  
Article
A Method for Filling Blank Stripes in Electrical Imaging Based on the Fusion of Arbitrary Kernel Convolution and Generative Adversarial Networks
by Ruhan A, Die Liu, Ge Cao, Kun Meng, Taiping Zhao, Lili Tian, Bin Zhao, Guilan Lin and Sinan Fang
Appl. Sci. 2026, 16(7), 3267; https://doi.org/10.3390/app16073267 - 27 Mar 2026
Viewed by 310
Abstract
Electrical imaging logging images play a crucial role in petroleum exploration; however, in practical applications, blank strips frequently appear due to instrument malfunctions or data transmission failures, severely compromising geological interpretation and hydrocarbon evaluation. Existing image inpainting methods have limited adaptability to blank strips at different depth scales and exhibit blurred high-resolution geological textures. To address these issues, this paper proposes a blank strip filling method that integrates Arbitrary Kernel Convolution (AKConv) with the Aggregated Contextual-Transformations Generative Adversarial Network (AOT-GAN). Specifically, the adaptive sampling mechanism of AKConv is incorporated into the generator network of AOT-GAN, enabling the model to effectively capture long-range contextual information and adaptively handle blank strips of varying scales and shapes through multi-scale feature fusion. Experimental results on real oilfield datasets demonstrate that the proposed method achieves significant improvements in PSNR, SSIM, and MAE, exhibiting superior structural preservation and texture sharpness, especially in restoring deep and large-scale blank strips. Furthermore, visual comparisons confirm the method’s superior performance in recovering key geological features, such as bedding continuity and fracture structures, thus providing an effective approach for electrical imaging logging image restoration. Full article
(This article belongs to the Special Issue Applied Geophysical Imaging and Data Processing, 2nd Edition)
21 pages, 11455 KB  
Article
Cross-Scale Spectral Calibration for Spatiotemporal Fusion of Remote Sensing Images
by Yishuo Tian, Xiaorong Xue, Jingtong Yang, Wen Zhang, Bingyan Lu, Xin Zhao and Wancheng Wang
Sensors 2026, 26(7), 2090; https://doi.org/10.3390/s26072090 - 27 Mar 2026
Viewed by 358
Abstract
Spatiotemporal fusion aims to generate remote sensing images with both high spatial and high temporal resolution by integrating multi-source observations. However, significant spectral inconsistencies often arise when fusing images acquired at different spatial scales, which severely degrade the radiometric fidelity and temporal reliability of the fused results. Most existing methods focus on enhancing spatial details or temporal consistency, while the cross-scale spectral discrepancy between coarse- and fine-resolution images has not been sufficiently addressed. To tackle this issue, we propose a cross-scale spectral calibration framework for spatiotemporal fusion (XSC-Net), which explicitly models and corrects spectral responses across different spatial scales. The proposed method introduces a spatial feature refinement block to enhance spatially discriminative structures and a hierarchical spectral refinement block to adaptively calibrate channel-wise spectral representations. By jointly exploiting spatial and spectral correlations, the proposed framework effectively suppresses spectral distortion while preserving fine spatial details. Extensive experiments on the public CIA and LGC datasets indicate that XSC-Net compares favorably with state-of-the-art methods, demonstrating superior performance over established baselines. Furthermore, ablation studies verify the efficacy and contribution of the proposed architectural components. Full article
(This article belongs to the Section Remote Sensors)
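Channel-wise spectral calibration resembles squeeze-and-excitation re-weighting; the sketch below shows that generic mechanism, not XSC-Net's hierarchical spectral refinement block:

```python
import torch
import torch.nn as nn

class ChannelSpectralCalibration(nn.Module):
    """SE-style sketch: global context produces a per-channel weight that
    re-scales each spectral band, suppressing channels whose response is
    inconsistent across scales."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global spectral context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)      # per-channel calibration weights in (0, 1)
```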
18 pages, 6071 KB  
Article
DFENet: A Novel Dual-Path Feature Extraction Network for Semantic Segmentation of Remote Sensing Images
by Li Cao, Zishang Liu, Yan Wang and Run Gao
J. Imaging 2026, 12(3), 141; https://doi.org/10.3390/jimaging12030141 - 23 Mar 2026
Viewed by 295
Abstract
Semantic segmentation of remote sensing images (RSIs) is a fundamental task in geoscience research. However, designing efficient feature fusion modules remains challenging for existing dual-branch or multi-branch architectures. Furthermore, existing deep learning-based architectures predominantly concentrate on spatial feature modeling and context capturing while inherently neglecting the exploration and utilization of critical frequency-domain features, which are crucial for addressing issues of semantic confusion and blurred boundaries in complex remote sensing scenes. To address the challenges of feature fusion and the lack of frequency-domain information, we propose a novel dual-path feature extraction network (DFENet) in this paper. Specifically, a dual-path module (DPM) is developed in DFENet to extract global and local features along its two paths. In the global path, after applying the channel splitting strategy, four feature extraction strategies are innovatively integrated to extract global features from different granularities. To supplement frequency-domain information, a frequency-domain feature extraction block (FFEB) dominated by the discrete wavelet transform (DWT) is designed to effectively capture both high- and low-frequency components. Experimental results show that our method outperforms existing state-of-the-art methods in terms of segmentation performance, achieving a mean intersection over union (mIoU) of 83.09% on the ISPRS Vaihingen dataset and 86.05% on the ISPRS Potsdam dataset. Full article
(This article belongs to the Section Image and Video Processing)
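The DWT split at the heart of the FFEB can be shown with PyWavelets; a one-level Haar example on a stand-in array (the FFEB's actual processing of the bands is not reproduced):

```python
import numpy as np
import pywt

# One-level 2-D Haar DWT splits an image into a low-frequency approximation
# and three high-frequency detail bands, which a frequency-domain block can
# then process separately.
img = np.random.rand(256, 256).astype(np.float32)   # stand-in for a feature map
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
low_freq = cA                        # coarse structure / semantics
high_freq = np.stack([cH, cV, cD])   # edges and fine boundary detail
```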