Search Results (41)

Search Parameters:
Keywords = multiscale layered contrast feature

24 pages, 20337 KiB  
Article
MEAC: A Multi-Scale Edge-Aware Convolution Module for Robust Infrared Small-Target Detection
by Jinlong Hu, Tian Zhang and Ming Zhao
Sensors 2025, 25(14), 4442; https://doi.org/10.3390/s25144442 - 16 Jul 2025
Viewed by 373
Abstract
Infrared small-target detection remains a critical challenge in military reconnaissance, environmental monitoring, forest-fire prevention, and search-and-rescue operations, owing to the targets’ extremely small size, sparse texture, low signal-to-noise ratio, and complex background interference. Traditional convolutional neural networks (CNNs) struggle to detect such weak, low-contrast objects due to their limited receptive fields and insufficient feature extraction capabilities. To overcome these limitations, we propose a Multi-Scale Edge-Aware Convolution (MEAC) module that enhances feature representation for small infrared targets without increasing parameter count or computational cost. Specifically, MEAC fuses (1) original local features, (2) multi-scale context captured via dilated convolutions, and (3) high-contrast edge cues derived from differential Gaussian filters. After fusing these branches, channel and spatial attention mechanisms are applied to adaptively emphasize critical regions, further improving feature discrimination. The MEAC module is fully compatible with standard convolutional layers and can be seamlessly embedded into various network architectures. Extensive experiments on three public infrared small-target datasets (SIRSTD-UAVB, IRSTDv1, and IRSTD-1K) demonstrate that networks augmented with MEAC significantly outperform baseline models using standard convolutions. Compared with ten mainstream convolution modules (ACmix, AKConv, DRConv, DSConv, LSKConv, MixConv, PConv, ODConv, GConv, and Involution), our method consistently achieves the highest detection accuracy and robustness. Experiments conducted across multiple versions, including YOLOv10, YOLOv11, and YOLOv12, as well as various network levels, show that the MEAC module achieves stable improvements in performance metrics while only slightly increasing computational and parameter complexity. These results validate MEAC’s effectiveness in enhancing the detection of small, weak targets and suppressing complex background interference, highlighting its strong generalization ability and practical application potential. Full article
(This article belongs to the Section Sensing and Imaging)
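
The branch structure described above (local features, dilated multi-scale context, and a difference-of-Gaussian edge cue, followed by channel and spatial attention) can be sketched in PyTorch roughly as below; the kernel sizes, dilation rates, Gaussian sigmas, and attention layout are illustrative assumptions rather than the authors' exact MEAC design.

```python
# Hedged sketch of a MEAC-style block, not the authors' exact implementation.
import torch
import torch.nn as nn


class MEACSketch(nn.Module):
    """Fuses local, dilated multi-scale, and difference-of-Gaussian edge branches,
    then reweights the result with simple channel and spatial attention."""

    def __init__(self, channels: int, dilations=(2, 4)):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        self.dilated = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations
        )
        # Two fixed depthwise Gaussian blurs; their difference approximates a DoG edge response.
        self.blur_small = nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False)
        self.blur_large = nn.Conv2d(channels, channels, 5, padding=2, groups=channels, bias=False)
        self._init_gaussian(self.blur_small, sigma=0.8)
        self._init_gaussian(self.blur_large, sigma=1.6)
        n_branches = 2 + len(dilations)
        self.fuse = nn.Conv2d(n_branches * channels, channels, 1)
        # Channel attention (squeeze-and-excitation style) and a single-map spatial attention.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels // 4, 1), nn.ReLU(),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid()
        )
        self.spatial_att = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    @staticmethod
    def _init_gaussian(conv: nn.Conv2d, sigma: float):
        k = conv.kernel_size[0]
        ax = torch.arange(k, dtype=torch.float32) - (k - 1) / 2
        g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
        kernel = torch.outer(g, g)
        kernel = kernel / kernel.sum()
        conv.weight.data.copy_(kernel.expand(conv.out_channels, 1, k, k))
        conv.weight.requires_grad_(False)

    def forward(self, x):
        edges = self.blur_small(x) - self.blur_large(x)           # high-contrast edge cue
        branches = [self.local(x)] + [m(x) for m in self.dilated] + [edges]
        fused = self.fuse(torch.cat(branches, dim=1))
        fused = fused * self.channel_att(fused)
        fused = fused * self.spatial_att(fused)
        return fused + x                                          # residual connection


if __name__ == "__main__":
    y = MEACSketch(32)(torch.randn(1, 32, 64, 64))
    print(y.shape)  # torch.Size([1, 32, 64, 64])
```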

32 pages, 7048 KiB  
Article
DCMC-UNet: A Novel Segmentation Model for Carbon Traces in Oil-Immersed Transformers Improved with Dynamic Feature Fusion and Adaptive Illumination Enhancement
by Hongxin Ji, Jiaqi Li, Zhennan Shi, Zijian Tang, Xinghua Liu and Peilin Han
Sensors 2025, 25(13), 3904; https://doi.org/10.3390/s25133904 - 23 Jun 2025
Viewed by 304
Abstract
For large oil-immersed transformers, their metal-enclosed structure poses significant challenges for direct visual inspection of internal defects. To ensure the effective detection of internal insulation defects, this study employs a self-developed micro-robot for internal visual inspection. Given the substantial morphological and dimensional variations of target defects (e.g., carbon traces produced by surface discharge inside the transformer), the intelligent and efficient extraction of carbon trace features from complex backgrounds becomes critical for robotic inspection. To address these challenges, we propose DCMC-UNet, a semantic segmentation model for carbon traces that combines adaptive illumination enhancement with dynamic feature fusion. For blurred carbon trace images caused by unstable light reflection and illumination in transformer oil, an improved CLAHE algorithm is developed, incorporating learnable parameters to balance luminance and contrast while enhancing the edge features of carbon traces. To handle the morphological diversity and edge complexity of carbon traces, a dynamic deformable encoder (DDE) is integrated into the encoder, leveraging deformable convolutional kernels to improve carbon trace feature extraction. An edge-aware decoder (EAD) is integrated into the decoder, which extracts edge details from predicted segmentation maps and fuses them with encoded features to enrich edge representations. To mitigate the semantic gap between the encoder and the decoder, we replace the standard skip connection with a cross-level attention connection fusion layer (CLFC), enhancing the multi-scale fusion of morphological and edge features. Furthermore, a multi-scale atrous feature aggregation module (MAFA) is designed in the neck to enhance the integration of deep semantic and shallow visual features, improving multi-dimensional feature fusion. Experimental results demonstrate that DCMC-UNet outperforms U-Net, U-Net++, and other benchmarks in carbon trace segmentation. On the transformer carbon trace dataset, it improves on the baseline U-Net by 14.04% in mIoU, 10.87% in Dice, 10.97% in pixel accuracy (P), and 5.77% in overall accuracy (Acc). The proposed model provides reliable technical support for surface discharge intensity assessment and insulation condition evaluation in oil-immersed transformers. Full article
(This article belongs to the Section Industrial Sensors)
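
The abstract names a multi-scale atrous feature aggregation module (MAFA) but does not detail it; an ASPP-style stand-in with parallel dilated convolutions and a pooled global branch is sketched below, with the dilation rates and channel widths as assumptions.

```python
# ASPP-style stand-in for the MAFA idea (parallel atrous convolutions aggregated
# with a global-context branch); not the exact DCMC-UNet design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AtrousAggregation(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.global_branch = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.global_branch(x), size=(h, w), mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [g], dim=1))


if __name__ == "__main__":
    print(AtrousAggregation(64, 64)(torch.randn(1, 64, 32, 32)).shape)
```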

20 pages, 2150 KiB  
Article
Industrial Image Anomaly Detection via Synthetic-Anomaly Contrastive Distillation
by Junxian Li, Mingxing Li, Shucheng Huang, Gang Wang and Xinjing Zhao
Sensors 2025, 25(12), 3721; https://doi.org/10.3390/s25123721 - 13 Jun 2025
Viewed by 599
Abstract
Industrial image anomaly detection plays a critical role in intelligent manufacturing by automatically identifying defective products through visual inspection. While unsupervised approaches eliminate dependency on annotated anomaly samples, current teacher–student framework-based methods still face two fundamental limitations: insufficient discriminative capability for structural anomalies and suboptimal anomaly feature decoupling efficiency. To address these challenges, we propose a Synthetic-Anomaly Contrastive Distillation (SACD) framework for industrial anomaly detection. SACD comprises two pivotal components: (1) a reverse distillation (RD) paradigm whereby a pre-trained teacher network extracts hierarchically structured representations, subsequently guiding a student network with an inverse architectural configuration to establish hierarchical feature alignment; (2) a group of feature calibration (FeaCali) modules designed to refine the student’s outputs by eliminating anomalous feature responses. During training, SACD adopts a dual-branch strategy, where one branch encodes multi-scale features from defect-free images, while a Siamese anomaly branch processes synthetically corrupted counterparts. The FeaCali modules are trained to strip out the student’s anomalous patterns in the anomaly branch, enhancing the student network’s exclusive modeling of normal patterns. We construct a dual-objective optimization integrating a cross-model distillation loss and an intra-model contrastive loss to train SACD for feature alignment and discrepancy amplification. At the inference stage, pixel-wise anomaly scores are computed from multi-layer feature discrepancies between the teacher’s representations and the student’s refined outputs. Comprehensive evaluations on the MVTec AD and BTAD benchmarks demonstrate that our method is effective and superior to current knowledge distillation-based approaches. Full article
(This article belongs to the Section Industrial Sensors)
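
The inference step (pixel-wise scores from multi-layer teacher–student feature discrepancies) can be illustrated with a short cosine-distance accumulation; the choice of cosine distance and sum aggregation is an assumption drawn from common reverse-distillation practice, not necessarily SACD's exact rule.

```python
# Hedged sketch of the scoring step: per-layer cosine discrepancy between teacher
# and student features, upsampled and accumulated into a pixel-wise anomaly map.
import torch
import torch.nn.functional as F


def anomaly_map(teacher_feats, student_feats, out_size):
    """teacher_feats / student_feats: lists of (B, C_l, H_l, W_l) tensors."""
    score = 0.0
    for t, s in zip(teacher_feats, student_feats):
        d = 1.0 - F.cosine_similarity(t, s, dim=1, eps=1e-6)        # (B, H_l, W_l)
        d = F.interpolate(d.unsqueeze(1), size=out_size, mode="bilinear", align_corners=False)
        score = score + d
    return score                                                     # (B, 1, H, W)


if __name__ == "__main__":
    t = [torch.randn(2, 64, 64, 64), torch.randn(2, 128, 32, 32)]
    s = [f + 0.1 * torch.randn_like(f) for f in t]
    print(anomaly_map(t, s, (256, 256)).shape)  # torch.Size([2, 1, 256, 256])
```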

19 pages, 1486 KiB  
Article
A Dual-Enhanced Hierarchical Alignment Framework for Multimodal Named Entity Recognition
by Jian Wang, Yanan Zhou, Qi He and Wenbo Zhang
Appl. Sci. 2025, 15(11), 6034; https://doi.org/10.3390/app15116034 - 27 May 2025
Viewed by 463
Abstract
Multimodal named entity recognition (MNER) is a natural language processing technique that integrates text and visual modalities to detect and segment entity boundaries and their types from unstructured multimodal data. Although existing methods alleviate semantic deficiencies by optimizing image and text feature extraction and fusion, a fundamental challenge remains due to the lack of fine-grained alignment caused by cross-modal semantic deviations and image noise interference. To address these issues, this paper proposes a dual-enhanced hierarchical alignment (DEHA) framework that achieves dual semantic and spatial enhancement via global–local cooperative alignment optimization. The proposed framework incorporates a dual enhancement strategy comprising Semantic-Augmented Global Contrast (SAGC) and Multi-scale Spatial Local Contrast (MS-SLC), which reinforce the alignment of image and text modalities at the global sample level and local feature level, respectively, thereby reducing image noise. Additionally, a cross-modal feature fusion and vision-constrained CRF prediction layer is designed to achieve adaptive aggregation of global and local features. Experimental results on the Twitter-2015 and Twitter-2017 datasets yield F1 scores of 77.42% and 88.79%, respectively, outperforming baseline models. These results demonstrate that the global–local complementary mechanism effectively balances alignment precision and noise robustness, thereby enhancing entity recognition accuracy in social media and advancing multimodal semantic understanding. Full article
(This article belongs to the Special Issue Intelligence Image Processing and Patterns Recognition)
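
As a stand-in for the global sample-level alignment that SAGC performs, a symmetric InfoNCE image–text contrast over matched pairs is sketched below; the temperature value and the absence of DEHA's semantic augmentation are assumptions.

```python
# Minimal InfoNCE-style global image–text contrast (CLIP-like), as a stand-in for
# sample-level alignment; not the exact DEHA loss.
import torch
import torch.nn.functional as F


def global_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (B, D) embeddings of matched image/text pairs."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric cross-entropy: each image should match its own caption and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    print(global_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256)).item())
```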

25 pages, 6796 KiB  
Article
SEANet: Semantic Enhancement and Amplification for Underwater Object Detection in Complex Visual Scenarios
by Ke Yang, Xiao Wang, Wei Wang, Xin Yuan and Xin Xu
Sensors 2025, 25(10), 3078; https://doi.org/10.3390/s25103078 - 13 May 2025
Viewed by 471
Abstract
Detecting underwater objects is a complex task due to the inherent challenges of low contrast and intricate backgrounds. The wide range of object scales further complicates detection accuracy. To address these issues, we propose a Semantic Enhancement and Amplification Network (SEANet), a framework designed to enhance underwater object detection in complex visual scenarios. SEANet integrates three core components: the Multi-Scale Detail Amplification Module (MDAM), the Semantic Enhancement Feature Pyramid (SE-FPN), and the Contrast Enhancement Module (CEM). MDAM expands the receptive field across multiple scales, enabling the capture of subtle features that are often masked by background similarities. SE-FPN combines multi-scale features, optimizing feature representation and improving the synthesis of information across layers. CEM incorporates Fore-Background Contrast Attention (FBC) to amplify the contrast between foreground and background objects, thereby improving focus on low-contrast features. These components collectively enhance the network’s ability to effectively identify critical underwater features. Extensive experiments on three distinct underwater object detection datasets demonstrate the efficacy and robustness of SEANet. Specifically, the framework achieves the highest AP (Average Precision) of 67.0% on the RUOD dataset, 53.0% on the URPC2021 dataset, and 71.5% on the DUO dataset. Full article
(This article belongs to the Section Sensing and Imaging)
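
FBC's internals are not given in the abstract; one loosely hedged way to "amplify the contrast between foreground and background" is to gate an amplified deviation from background statistics with a learned soft foreground map, as in this sketch (the mask head and gating rule are assumptions, not SEANet's design).

```python
# Assumption-heavy sketch of a foreground–background contrast attention.
import torch
import torch.nn as nn


class ForeBackgroundContrast(nn.Module):
    """A soft foreground map gates an amplified deviation from background statistics."""

    def __init__(self, channels: int):
        super().__init__()
        self.fg_mask = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        m = self.fg_mask(x)                                           # (B, 1, H, W), soft foreground
        bg_weight = 1.0 - m
        bg_mean = (x * bg_weight).sum(dim=(2, 3), keepdim=True) / (
            bg_weight.sum(dim=(2, 3), keepdim=True) + 1e-6)           # per-channel background mean
        return x + m * (x - bg_mean)                                  # amplify foreground deviation


if __name__ == "__main__":
    print(ForeBackgroundContrast(64)(torch.randn(2, 64, 32, 32)).shape)
```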

19 pages, 4766 KiB  
Article
Research on Soil Pore Segmentation of CT Images Based on MMLFR-UNet Hybrid Network
by Changfeng Qin, Jie Zhang, Yu Duan, Chenyang Li, Shanzhi Dong, Feng Mu, Chengquan Chi and Ying Han
Agronomy 2025, 15(5), 1170; https://doi.org/10.3390/agronomy15051170 - 11 May 2025
Viewed by 563
Abstract
Accurate segmentation of soil pore structure is crucial for studying soil water migration, nutrient cycling, and gas exchange. However, the low-contrast, high-noise CT images produced in complex soil environments leave traditional segmentation methods with obvious deficiencies in accuracy and robustness. This paper proposes a hybrid model combining a Multi-Modal Low-Frequency Reconstruction algorithm (MMLFR) and UNet (MMLFR-UNet). MMLFR enhances the expression of key features by extracting the image's low-frequency signals and suppressing noise interference through multi-scale spectral decomposition, whereas UNet excels at restoring segmentation detail and handling complex boundaries by virtue of its encoder–decoder structure and skip connection mechanism. In this paper, an undisturbed soil column was collected in Hainan Province, China, classified as Ferralsols (FAO/UNESCO), and CT scans were utilized to acquire high-resolution images and generate high-quality datasets suitable for deep learning through preprocessing operations such as fixed-layer sampling, cropping, and enhancement. The results show that MMLFR-UNet outperforms UNet and traditional methods (e.g., Otsu and Fuzzy C-Means (FCM)) in terms of Intersection over Union (IoU), Dice Similarity Coefficient (DSC), Pixel Accuracy (PA), and boundary similarity. Notably, the model exhibits exceptional robustness and precision in segmentation tasks involving complex pore structures and low-contrast images. Full article
(This article belongs to the Section Precision and Digital Agriculture)
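
MMLFR itself is not specified here; purely as an illustration of extracting low-frequency content via spectral decomposition, the sketch below averages FFT low-pass reconstructions at several cutoff radii (the cutoffs and the averaging rule are assumptions, not the authors' algorithm).

```python
# Rough illustration of a multi-scale low-frequency reconstruction via FFT low-pass filtering.
import numpy as np


def multiscale_lowfreq(img: np.ndarray, cutoffs=(0.05, 0.1, 0.2)) -> np.ndarray:
    """img: 2-D grayscale array; returns a denoised low-frequency reconstruction."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    spectrum = np.fft.fft2(img)
    recons = []
    for c in cutoffs:
        mask = (radius <= c).astype(float)          # ideal low-pass at this scale
        recons.append(np.real(np.fft.ifft2(spectrum * mask)))
    return np.mean(recons, axis=0)


if __name__ == "__main__":
    noisy = np.random.rand(128, 128)
    print(multiscale_lowfreq(noisy).shape)  # (128, 128)
```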

16 pages, 2845 KiB  
Article
HPANet: Hierarchical Path Aggregation Network with Pyramid Vision Transformers for Colorectal Polyp Segmentation
by Yuhong Ying, Haoyuan Li, Yiwen Zhong and Min Lin
Algorithms 2025, 18(5), 281; https://doi.org/10.3390/a18050281 - 11 May 2025
Viewed by 449
Abstract
The automatic segmentation technique for colorectal polyps in colonoscopy is considered critical for aiding physicians in real-time lesion identification and minimizing diagnostic errors such as false positives and missed lesions. Despite significant progress in existing research, accurate segmentation of colorectal polyps remains technically challenging due to persistent issues such as low contrast between polyps and mucosa, significant morphological heterogeneity, and susceptibility to imaging artifacts caused by bubbles in the colorectal lumen and poor lighting conditions. To address these limitations, this study proposes a novel pyramid vision transformer-based hierarchical path aggregation network (HPANet) for polyp segmentation. Specifically, the backward multi-scale feature fusion module (BMFM) was first developed to enhance the ability to process polyps at different scales. Second, the forward noise reduction module (FNRM) was designed to learn the texture features of the upper and lower layers, reducing the influence of noise such as bubbles. Finally, to address the boundary ambiguity caused by repeated up- and down-sampling, the boundary feature refinement module (BFRM) was developed to further refine boundaries. The proposed network was compared with several representative networks on five public polyp datasets. Experimental results show that the proposed network achieves better segmentation performance, especially on the Kvasir-SEG dataset, where the mDice and mIoU coefficients reach 0.9204 and 0.8655. Full article
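
The reported mDice and mIoU can be computed per image and averaged over a dataset; a minimal binary-mask version is sketched below (the 0.5 threshold and the per-image averaging convention are assumptions).

```python
# Minimal Dice / IoU computation for binary segmentation masks.
import numpy as np


def dice_iou(pred: np.ndarray, gt: np.ndarray, thr: float = 0.5):
    """pred: probability map in [0, 1]; gt: binary ground-truth mask."""
    p = (pred >= thr).astype(bool)
    g = gt.astype(bool)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    dice = 2.0 * inter / (p.sum() + g.sum() + 1e-8)
    iou = inter / (union + 1e-8)
    return dice, iou


if __name__ == "__main__":
    pred = np.random.rand(256, 256)
    gt = np.random.rand(256, 256) > 0.5
    print(dice_iou(pred, gt))
```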

17 pages, 7809 KiB  
Article
Research on X-Ray Weld Defect Detection of Steel Pipes by Integrating ECA and EMA Dual Attention Mechanisms
by Guanli Su, Xuanhe Su, Qunkai Wang, Weihong Luo and Wei Lu
Appl. Sci. 2025, 15(8), 4519; https://doi.org/10.3390/app15084519 - 19 Apr 2025
Cited by 1 | Viewed by 752
Abstract
The welding quality of industrial pipelines directly impacts structural safety. X-ray non-destructive testing (NDT), known for its non-invasive and efficient characteristics, is widely used for weld defect detection. However, challenges such as low contrast between defects and background, as well as large variations in defect scales, reduce the accuracy of existing object detection models. To address these challenges, an optimized detection model based on You Only Look Once (YOLO) v5 is proposed. Firstly, the Efficient Multi-Scale Attention (EMA) mechanism is integrated into the first Cross Stage Partial (C3) module of the backbone to enlarge the model’s receptive field and strengthen initial feature extraction. Secondly, the Efficient Channel Attention (ECA) mechanism is embedded before the Spatial Pyramid Pooling Fast (SPPF) layer to enhance the model’s ability to extract small targets and key features. Finally, the Complete Intersection over Union (CIoU) loss is replaced with Wise Intersection over Union (WIoU) to improve localization accuracy and multi-scale detection performance. The experimental results show that the optimized model achieves a precision of 94.1%, a recall of 89.2%, and an mAP@0.5 of 94.6%, representing improvements of 11.5%, 5.4%, and 6.9%, respectively, over the original YOLOv5. The optimized model also outperforms several mainstream object detection models in weld defect detection. In terms of mAP@0.5, the optimized YOLOv5 model shows improvements of 14.89%, 13.02%, 6.1%, 19.37%, 7.1%, 7.5%, and 10.7% compared with the Faster-RCNN, SSD, RT-DETR, YOLOv3, YOLOv8, YOLOv9, and YOLOv10 models, respectively. The optimized model significantly enhances X-ray weld defect detection accuracy, meeting industrial application requirements and offering a high-precision solution for weld defect detection. Full article
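
ECA is a published, widely reimplemented module (global average pooling, a 1-D convolution across channels, and a sigmoid gate); the sketch below follows that common form, while its hyperparameters and placement details inside this YOLOv5 variant, beyond "before the SPPF layer", are assumptions.

```python
# Efficient Channel Attention (ECA) as commonly implemented.
import math
import torch
import torch.nn as nn


class ECA(nn.Module):
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        k = int(abs((math.log2(channels) + b) / gamma))
        k = k if k % 2 else k + 1                       # force an odd kernel size
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.avg_pool(x)                            # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))  # 1-D conv across the channel axis
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y                                    # channel-wise reweighting


if __name__ == "__main__":
    print(ECA(256)(torch.randn(2, 256, 20, 20)).shape)
```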

31 pages, 24332 KiB  
Article
IDDNet: Infrared Object Detection Network Based on Multi-Scale Fusion Dehazing
by Shizun Sun, Shuo Han, Junwei Xu, Jie Zhao, Ziyu Xu, Lingjie Li, Zhaoming Han and Bo Mo
Sensors 2025, 25(7), 2169; https://doi.org/10.3390/s25072169 - 29 Mar 2025
Viewed by 572
Abstract
In foggy environments, infrared images suffer from reduced contrast, degraded details, and blurred objects, which impair detection accuracy and real-time performance. To tackle these issues, we propose IDDNet, a lightweight infrared object detection network that integrates multi-scale fusion dehazing. IDDNet includes a multi-scale fusion dehazing (MSFD) module, which uses multi-scale feature fusion to eliminate haze interference while preserving key object details. A dedicated dehazing loss function, DhLoss, further improves the dehazing effect. In addition to MSFD, IDDNet incorporates three main components: (1) bidirectional polarized self-attention, (2) a weighted bidirectional feature pyramid network, and (3) multi-scale object detection layers. This architecture ensures high detection accuracy and computational efficiency. A two-stage training strategy optimizes the model’s performance, enhancing its accuracy and robustness in foggy environments. Extensive experiments on public datasets demonstrate that IDDNet achieves 89.4% precision and 83.9% AP, showing its superior accuracy, processing speed, generalization, and robust detection performance. Full article
(This article belongs to the Section Sensing and Imaging)
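
A weighted bidirectional feature pyramid typically fuses its inputs with learnable, ReLU-normalized weights ("fast normalized fusion"); one such fusion node is sketched below, with the convolution and activation choices as assumptions since IDDNet's node layout is not given in the abstract.

```python
# Sketch of a single weighted feature-fusion node in the BiFPN style.
import torch
import torch.nn as nn


class WeightedFusion(nn.Module):
    def __init__(self, num_inputs: int, channels: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.SiLU()
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)                    # fast normalized fusion
        fused = sum(wi * x for wi, x in zip(w, inputs))
        return self.conv(self.act(fused))


if __name__ == "__main__":
    feats = [torch.randn(1, 64, 40, 40) for _ in range(3)]
    print(WeightedFusion(3, 64)(feats).shape)
```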

28 pages, 4266 KiB  
Article
Hierarchical Vision–Language Pre-Training with Freezing Strategy for Multi-Level Semantic Alignment
by Huiming Xie, Yang Qin and Shuxue Ding
Electronics 2025, 14(4), 816; https://doi.org/10.3390/electronics14040816 - 19 Feb 2025
Viewed by 995
Abstract
Vision–language pre-training (VLP) faces challenges in aligning hierarchical textual semantics (words/phrases/sentences) with multi-scale visual features (objects/relations/global context). We propose a hierarchical VLP model (HieVLP) that addresses such challenges through semantic decomposition and progressive alignment. Textually, a semantic parser deconstructs captions into word-, phrase-, and sentence-level components, which are encoded via hierarchical BERT layers. Visually, a Swin Transformer extracts object- (local), relation- (mid-scale), and global-level features through shifted window hierarchies. During pre-training, a freezing strategy sequentially activates text layers (sentence→phrase→word), aligning each with the corresponding visual scales via contrastive and language modeling losses. The experimental evaluations demonstrate that HieVLP outperforms hierarchical baselines across various tasks, with the performance improvements ranging from approximately 3.2% to 11.2%. In the image captioning task, HieVLP exhibits an average CIDEr improvement of around 7.2% and a 2.1% improvement in the SPICE metric. For image–text retrieval, it achieves recall increases of 4.7–6.8%. In reasoning tasks, HieVLP boosts accuracy by 2.96–5.8%. These results validate that explicit multi-level alignment enables contextually coherent caption generation and precise cross-modal reasoning. Full article
(This article belongs to the Section Artificial Intelligence)
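
The staged freezing strategy (sentence, then phrase, then word level) can be expressed by toggling requires_grad on the corresponding text-encoder layer groups between stages; the sketch below uses placeholder module names, not HieVLP's actual attributes.

```python
# Sketch of a stage-wise freezing schedule over hierarchical text layers.
import torch.nn as nn


def set_trainable(module: nn.Module, trainable: bool) -> None:
    for p in module.parameters():
        p.requires_grad = trainable


def apply_stage(text_layers: dict, stage: str) -> None:
    """text_layers: {'sentence': nn.Module, 'phrase': nn.Module, 'word': nn.Module}."""
    # Each stage activates one level; the others stay frozen (schedule assumed).
    schedule = {
        "stage1": {"sentence": True, "phrase": False, "word": False},
        "stage2": {"sentence": False, "phrase": True, "word": False},
        "stage3": {"sentence": False, "phrase": False, "word": True},
    }
    for name, module in text_layers.items():
        set_trainable(module, schedule[stage][name])


if __name__ == "__main__":
    layers = {k: nn.Linear(8, 8) for k in ("sentence", "phrase", "word")}
    apply_stage(layers, "stage2")
    print([p.requires_grad for p in layers["phrase"].parameters()])  # [True, True]
```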

24 pages, 10147 KiB  
Article
A Siamese Network via Cross-Domain Robust Feature Decoupling for Multi-Source Remote Sensing Image Registration
by Qichao Han, Xiyang Zhi, Shikai Jiang, Wenbin Chen, Yuanxin Huang, Lijian Yu and Wei Zhang
Remote Sens. 2025, 17(4), 646; https://doi.org/10.3390/rs17040646 - 13 Feb 2025
Cited by 1 | Viewed by 1216
Abstract
Image registration is a prerequisite for many multi-source remote sensing image fusion applications. However, due to differences in imaging factors such as sensor type, imaging time, resolution, and viewing angle, multi-source image registration faces challenges of multidimensional coupling such as radiation, scale, and directional differences. To address this issue, this paper proposes a Siamese network based on cross-domain robust feature decoupling as an image registration framework (CRS-Net), aiming to improve the robustness of multi-source image features across domains, scales, and rotations. Firstly, we design Siamese multiscale encoders and introduce a rotation-invariant convolutional layer without additional training parameters, achieving natural invariance to any rotation. Secondly, we propose a modality-independent decoder that utilizes the self-similarity of feature neighborhoods to excavate stable high-order structural information. Thirdly, we introduce cluster-aware contrastive constraints to learn discriminative and stable keypoint pairs. Finally, we design three multi-source remote sensing datasets and conduct sufficient experiments. Numerous experimental results show that our proposed method outperforms other SOTA methods and achieves more accurate registration in complex multi-source remote sensing scenes. Full article
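
One standard way to obtain rotation invariance "without additional training parameters" is to share a single kernel across rotated copies and pool the responses over orientations; a 90-degree-rotation version is sketched below, which only approximates the unspecified CRS-Net layer.

```python
# Sketch of a parameter-free rotation-invariant convolution via kernel rotation and max-pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RotInvConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.05)
        self.pad = k // 2

    def forward(self, x):
        responses = []
        for r in range(4):                                  # 0, 90, 180, 270 degrees
            w = torch.rot90(self.weight, r, dims=(2, 3))    # rotate the shared kernel
            responses.append(F.conv2d(x, w, padding=self.pad))
        return torch.stack(responses, dim=0).max(dim=0).values   # orientation max-pool


if __name__ == "__main__":
    print(RotInvConv2d(3, 16)(torch.randn(1, 3, 64, 64)).shape)
```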

19 pages, 8290 KiB  
Article
Multi-Scale Contrastive Learning with Hierarchical Knowledge Synergy for Visible-Infrared Person Re-Identification
by Yongheng Qian and Su-Kit Tang
Sensors 2025, 25(1), 192; https://doi.org/10.3390/s25010192 - 1 Jan 2025
Cited by 1 | Viewed by 1323
Abstract
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality retrieval task to match a person across different spectral camera views. Most existing works focus on learning shared feature representations from the final embedding space of advanced networks to alleviate modality differences between visible and infrared images. However, exclusively relying on high-level semantic information from the network’s final layers can restrict shared feature representations and overlook the benefits of low-level details. Different from these methods, we propose a multi-scale contrastive learning network (MCLNet) with hierarchical knowledge synergy for VI-ReID. MCLNet is a novel two-stream contrastive deep supervision framework designed to train low-level details and high-level semantic representations simultaneously. MCLNet utilizes supervised contrastive learning (SCL) at each intermediate layer to strengthen visual representations and enhance cross-modality feature learning. Furthermore, a hierarchical knowledge synergy (HKS) strategy for pairwise knowledge matching promotes explicit information interaction across multi-scale features and improves information consistency. Extensive experiments on three benchmarks demonstrate the effectiveness of MCLNet. Full article
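
Supervised contrastive learning at an intermediate layer amounts to pulling same-identity embeddings together regardless of modality; a compact SupCon-style loss is sketched below (the temperature and masking conventions are assumptions).

```python
# SupCon-style supervised contrastive loss over one layer's embeddings.
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """features: (N, D) embeddings from one intermediate layer; labels: (N,) identity ids."""
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                                   # (N, N)
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask    # same identity, not itself
    sim = sim.masked_fill(self_mask, float("-inf"))                 # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)      # log-softmax over other samples
    log_prob = log_prob.masked_fill(self_mask, 0.0)                 # diagonal never counted
    pos_counts = pos_mask.sum(1).clamp(min=1)
    return -(log_prob * pos_mask.float()).sum(1).div(pos_counts).mean()


if __name__ == "__main__":
    feats = torch.randn(16, 128)
    ids = torch.randint(0, 4, (16,))
    print(supervised_contrastive_loss(feats, ids).item())
```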

20 pages, 4647 KiB  
Article
DeSPPNet: A Multiscale Deep Learning Model for Cardiac Segmentation
by Elizar Elizar, Rusdha Muharar and Mohd Asyraf Zulkifley
Diagnostics 2024, 14(24), 2820; https://doi.org/10.3390/diagnostics14242820 - 14 Dec 2024
Viewed by 1289
Abstract
Background: Cardiac magnetic resonance imaging (MRI) plays a crucial role in monitoring disease progression and evaluating the effectiveness of treatment interventions. Cardiac MRI allows medical practitioners to assess cardiac function accurately by providing comprehensive and quantitative information about the structure and function, hence making it an indispensable tool for monitoring the disease and treatment response. Deep learning-based segmentation enables the precise delineation of cardiac structures including the myocardium, right ventricle, and left ventricle. The accurate segmentation of these structures helps in the diagnosis of heart failure, cardiac functional response to therapies, and understanding the state of the heart functions after treatment. Objectives: The objective of this study is to develop a multiscale deep learning model to segment cardiac organs based on MRI imaging data. Good segmentation performance is difficult to achieve due to the complex nature of the cardiac structure, which includes a variety of chambers, arteries, and tissues. Furthermore, the human heart is also constantly beating, leading to motion artifacts that reduce image clarity and consistency. As a result, a multiscale method is explored to overcome various challenges in segmenting cardiac MRI images. Methods: This paper proposes DeSPPNet, a multiscale-based deep learning network. Its foundation follows encoder–decoder pair architecture that utilizes the Spatial Pyramid Pooling (SPP) layer to improve the performance of cardiac semantic segmentation. The SPP layer is designed to pool features from densely convolutional layers at different scales or sizes, which will be combined to maintain a set of spatial information. By processing features at different spatial resolutions, the multiscale densely connected layer in the form of the Pyramid Pooling Dense Module (PPDM) helps the network to capture both local and global context, preserving finer details of the cardiac structure while also capturing the broader context required to accurately segment larger cardiac structures. The PPDM is incorporated into the deeper layer of the encoder section of the deep learning network to allow it to recognize complex semantic features. Results: An analysis of multiple PPDM placement scenarios and structural variations revealed that the 3-path PPDM, positioned at the encoder layer 5, yielded optimal segmentation performance, achieving dice, intersection over union (IoU), and accuracy scores of 0.859, 0.800, and 0.993, respectively. Conclusions: Different PPDM configurations produce a different effect on the network; as such, a shallower layer placement, like encoder layer 4, retains more spatial data that need more parallel paths to gather the optimal set of multiscale features. In contrast, deeper layers contain more informative features but at a lower spatial resolution, which reduces the number of parallel paths required to provide optimal multiscale context. Full article
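
The pyramid-pooling idea (pool the map at several grid sizes, then recombine the pooled context with the input) can be sketched as below; the bin sizes and channel widths are assumptions and do not reproduce the PPDM's dense connectivity.

```python
# Sketch of a segmentation-style spatial pyramid pooling block.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPooling(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, bins=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b), nn.Conv2d(in_ch, in_ch // len(bins), 1))
            for b in bins
        )
        self.project = nn.Conv2d(in_ch + (in_ch // len(bins)) * len(bins), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = [x]
        for branch in self.branches:
            y = branch(x)                                                     # pooled context
            outs.append(F.interpolate(y, size=(h, w), mode="bilinear", align_corners=False))
        return self.project(torch.cat(outs, dim=1))


if __name__ == "__main__":
    print(PyramidPooling(96, 96)(torch.randn(1, 96, 28, 28)).shape)
```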

23 pages, 9520 KiB  
Article
Visual Feature-Guided Diamond Convolutional Network for Finger Vein Recognition
by Qiong Yao, Dan Song, Xiang Xu and Kun Zou
Sensors 2024, 24(18), 6097; https://doi.org/10.3390/s24186097 - 20 Sep 2024
Cited by 2 | Viewed by 974
Abstract
Finger vein (FV) biometrics have garnered considerable attention due to their inherent non-contact nature and high security, exhibiting tremendous potential in identity authentication and beyond. Nevertheless, challenges pertaining to the scarcity of training data and inconsistent image quality continue to impede the effectiveness of finger vein recognition (FVR) systems. To tackle these challenges, we introduce the visual feature-guided diamond convolutional network (dubbed ‘VF-DCN’), a uniquely configured multi-scale and multi-orientation convolutional neural network. VF-DCN showcases three pivotal innovations. Firstly, it meticulously tunes the convolutional kernels through multi-scale Log-Gabor filters. Secondly, it implements a distinctive diamond-shaped convolutional kernel architecture inspired by human visual perception; this design intelligently allocates more orientational filters to medium scales, which inherently carry richer information, whereas at extreme scales the use of orientational filters is minimized to simulate the natural blurring of objects at extreme focal lengths. Thirdly, the network adopts a deliberate three-layer configuration and a fully unsupervised training process, prioritizing simplicity and optimal performance. Extensive experiments are conducted on four FV databases: MMCBNU_6000, FV_USM, HKPU, and ZSC_FV. The experimental results reveal that VF-DCN achieves remarkable improvements, with equal error rates (EERs) of 0.17%, 0.19%, 2.11%, and 0.65%, and accuracy rates (ACC) of 100%, 99.97%, 98.92%, and 99.36%, respectively. These results indicate that, compared with some existing FVR approaches, the proposed VF-DCN not only achieves notable recognition accuracy but also requires fewer parameters and exhibits lower model complexity. Moreover, VF-DCN exhibits superior robustness across diverse FV databases. Full article
(This article belongs to the Section Sensing and Imaging)
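
A standard frequency-domain Log-Gabor bank, of the kind used to tune multi-scale, multi-orientation kernels, is sketched below; the scale and orientation counts and the bandwidth parameters are illustrative, not the VF-DCN settings.

```python
# Sketch of a multi-scale, multi-orientation Log-Gabor filter bank in the frequency domain.
import numpy as np


def log_gabor_bank(size=32, n_scales=3, n_orient=6, min_wavelength=4.0,
                   mult=2.0, sigma_ratio=0.65, sigma_theta=0.4):
    """Returns an array of shape (n_scales, n_orient, size, size) of frequency-domain filters."""
    fy = np.fft.fftshift(np.fft.fftfreq(size))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(size))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    radius[size // 2, size // 2] = 1.0                   # avoid log(0) at the DC component
    theta = np.arctan2(-fy, fx)
    bank = np.zeros((n_scales, n_orient, size, size))
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** s)          # centre frequency of this scale
        radial = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_ratio) ** 2))
        radial[size // 2, size // 2] = 0.0               # zero response at DC
        for o in range(n_orient):
            angle = o * np.pi / n_orient
            d_theta = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
            angular = np.exp(-(d_theta ** 2) / (2 * sigma_theta ** 2))
            bank[s, o] = radial * angular
    return bank


if __name__ == "__main__":
    print(log_gabor_bank().shape)  # (3, 6, 32, 32)
```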

21 pages, 2923 KiB  
Article
Multi-Scale Classification and Contrastive Regularization: Weakly Supervised Large-Scale 3D Point Cloud Semantic Segmentation
by Jingyi Wang, Jingyang He, Yu Liu, Chen Chen, Maojun Zhang and Hanlin Tan
Remote Sens. 2024, 16(17), 3319; https://doi.org/10.3390/rs16173319 - 7 Sep 2024
Viewed by 1751
Abstract
With the proliferation of large-scale 3D point cloud datasets, the high cost of per-point annotation has spurred the development of weakly supervised semantic segmentation methods. Current popular research mainly focuses on single-scale classification, which fails to address the significant feature scale differences between background and objects in large scenes. Therefore, we propose MCCR (Multi-scale Classification and Contrastive Regularization), an end-to-end semantic segmentation framework for large-scale 3D scenes under weak supervision. MCCR first aggregates features and applies random downsampling to the input data. It then captures the local features of a random point based on multi-layer features and the input coordinates. These features are fed into the network to obtain the initial and final prediction results, and MCCR iteratively trains the model using strategies such as contrastive learning. Notably, MCCR combines multi-scale classification with contrastive regularization to fully exploit multi-scale features and weakly labeled information. We investigate both point-level and local contrastive regularization to leverage point cloud augmentation and local semantic information, and introduce a Decoupling Layer to guide the loss optimization in different spaces. Results on three popular large-scale datasets, S3DIS, SemanticKITTI, and SensatUrban, demonstrate that our model achieves state-of-the-art (SOTA) performance on large-scale outdoor datasets with only 0.1% labeled points for supervision, while maintaining strong performance on indoor datasets. Full article
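
Point-level contrastive regularization can be read as an InfoNCE term between features of the same point under two augmentations of the cloud; the compact version below assumes the two views are already index-aligned, which MCCR may handle differently.

```python
# Sketch of a point-level contrastive term between two augmented views of a point cloud.
import torch
import torch.nn.functional as F


def point_contrastive_loss(feat_a, feat_b, temperature=0.1):
    """feat_a, feat_b: (N, D) features of the same N points from two augmented views."""
    a = F.normalize(feat_a, dim=1)
    b = F.normalize(feat_b, dim=1)
    logits = a @ b.t() / temperature                 # (N, N); diagonal entries are positives
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    f1 = torch.randn(1024, 32)
    f2 = f1 + 0.05 * torch.randn_like(f1)
    print(point_contrastive_loss(f1, f2).item())
```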
