Advances in Image Recognition, Image Segmentation, Image Fusion, and Signal Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 June 2026 | Viewed by 7291

Special Issue Editor

Foundation Model Group, Artificial Intelligence Department, Brookhaven National Laboratory, Upton, NY 11741, USA
Interests: weakly supervised image segmentation; domain generalization; information-theoretic learning; visual-language models; LLM agents

Special Issue Information

Dear Colleagues,

We are pleased to invite you to contribute to our Special Issue titled "Advances in Image Recognition, Image Segmentation, Image Fusion, and Signal Processing". This research area stands at the forefront of modern computational techniques, offering transformative applications in medical imaging, remote sensing, autonomous vehicles, and more. With rapid advancements in machine learning and deep learning, innovative approaches to image processing and analysis are becoming increasingly critical. In particular, visual-language models are transforming how systems understand visual content and relate it to natural language, while novel grounding segmentation techniques are enhancing object localization by leveraging textual and contextual cues. This Special Issue aims to highlight the latest methodologies and breakthroughs that address both the theoretical and practical challenges in these domains.

We invite high-quality articles that explore advanced techniques and novel applications in image recognition, segmentation, fusion, and signal processing. We also aim to foster interdisciplinary collaboration and to present research that bridges cutting-edge approaches and real-world applications.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • Advanced Algorithms for Image Recognition and Classification: Cutting-edge approaches in feature extraction, deep learning models, and performance optimization.
  • Innovative Techniques in Image Segmentation: Including traditional methods and deep learning-based segmentation, with a focus on both semantic and instance segmentation.
  • Visual-Language Models: Research on models that integrate visual data with natural language processing, enabling richer interpretation of scenes and enhanced human-machine interaction.
  • Grounding Segmentation Techniques: Approaches that leverage textual cues and contextual information to improve segmentation accuracy and object localization.
  • Methods and Applications in Image Fusion: Integrating data from multiple sources to generate comprehensive images for improved decision-making.
  • Novel Developments in Signal Processing: Advanced methodologies for image and multimedia analysis across diverse applications.
  • Case Studies and Comparative Evaluations: Empirical studies demonstrating the practical impacts and performance comparisons of state-of-the-art methods.

We look forward to receiving your contributions.

Dr. Xi Yu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image and video segmentation
  • image and video understanding
  • visual-language model
  • grounding segmentation
  • multi-modality (e.g., image, text, and video) learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (6 papers)


Research

21 pages, 1719 KB  
Article
DA-UNet: A Direction-Aware U-Net for Leaf Vein Segmentation in Tissue-Cultured Plantlets
by Qiuze Wu, Qing Yang, Dong Meng and Xiaofei Yan
Electronics 2026, 15(7), 1531; https://doi.org/10.3390/electronics15071531 - 6 Apr 2026
Viewed by 458
Abstract
For the automation of Agrobacterium-mediated genetic transformation of tissue-cultured plantlets, accurate leaf vein segmentation is essential. Although various methods for vein segmentation have been proposed, the thin, low-contrast structure of leaf veins frequently leads to fragmented segmentation outputs. To address this issue, we propose Direction-Aware U-Net (DA-UNet), an improved U-Net architecture that incorporates a Direction-Aware Context Pooling (DACPool) module and a Topology-aware Segmentation loss (TopoSeg loss). The DACPool module explicitly exploits vein orientation to aggregate directional contextual information, while the TopoSeg loss jointly optimizes pixel-level accuracy and topological continuity. Evaluations on the self-constructed Tissue-Cultured Plantlet Vein Dataset 2025 (TCPVD2025) show that DA-UNet achieves efficient leaf vein segmentation with improved continuity and structural integrity. In comparative experiments, the improved model outperforms PSPNet, DeepLabV3+, U-Net, TransUNet, Swin-UNet, CCNet, and SegNeXt, achieving Recall, Dice, and CONNECT scores of 71.35%, 69.08%, and −2.25, respectively, while maintaining a competitive Precision of 66.98%. Ablation experiments provide further evidence for the efficacy of the TopoSeg loss and the DACPool module. The results demonstrate that the proposed framework produces vein segmentations that are both accurate and structurally consistent, thus enabling reliable automated processes for plant genetic transformation.
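
The overlap metrics quoted above (Precision, Recall, Dice) can be computed from pixel counts on binary masks. The sketch below is a generic illustration, not the authors' evaluation code. The toy masks are invented, and the connected-component count is a crude stand-in for continuity, not the CONNECT score used in the paper (whose definition is not given here). It shows why thin structures are hard: one missed pixel barely moves Dice but splits the vein in two.

```python
from collections import deque


def segmentation_scores(pred, truth):
    """Pixel-level precision, recall and Dice for two same-size binary masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # vein pixel correctly predicted
            elif p:
                fp += 1          # predicted vein where there is none
            elif t:
                fn += 1          # missed vein pixel
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return precision, recall, dice


def count_components(mask):
    """Number of 8-connected foreground components (a crude continuity check)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                components += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill over the 8 neighbors
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return components


# Toy 4x4 masks: a diagonal "vein" and a prediction that misses one pixel
truth = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
pred = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1]]
print(segmentation_scores(pred, truth))  # precision 1.0, recall 0.75, Dice ~0.857
print(count_components(truth), count_components(pred))  # 1 2
```

Per image, Dice equals the harmonic mean of precision and recall, 2PR/(P+R); averages taken over a dataset need not satisfy this identity exactly.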

16 pages, 829 KB  
Article
Hyperspectral Images Anomaly Detection Based on Rapid Collaborative Representation and EMP
by Jiaxin Li, Xiaowei Shen, Fang He, Jianwei Zhao, Haojie Hu and Weimin Jia
Electronics 2025, 14(24), 4878; https://doi.org/10.3390/electronics14244878 - 11 Dec 2025
Viewed by 734
Abstract
Hyperspectral anomaly detection (HAD) identifies abnormal targets by exploiting the spectral differences between anomalies and background clutter. It plays a significant role in fields such as commercial agriculture, for instance in pest and disease monitoring, and in environmental monitoring. The collaborative representation detector (CRD) is a classic HAD method; however, its sliding dual window leads to high computational complexity and, consequently, long run times. To address this deficiency of CRD, we propose a method that first extracts extended morphological profiles (EMP) and then uses the resulting feature images to construct a K-means CRD (EMPKCRD). This method reconstructs the window over complex hyperspectral background pixels with the K-means clustering algorithm, separating out abnormal pixels with similar features and obtaining the background dictionary matrix. It leverages the observation that background pixels can be effectively approximated by a linear combination of their spatially adjacent pixels, whereas anomalous pixels, owing to their distinct nature, cannot be similarly reconstructed from their local neighborhood. This fundamental disparity in reconstructibility is then exploited to separate anomalies from the background. Anomaly detection can then be carried out on this matrix more quickly, avoiding the high computational complexity of the sliding dual window. Comparative simulation experiments against seven widely used algorithms on three real-world datasets show that the method achieves excellent performance while offering a favorable balance between detection precision and operational speed.
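
For readers unfamiliar with CRD, its core scoring step is a regularized least-squares problem: a pixel y is represented by a background dictionary X with weights α = (XᵀX + λΓᵀΓ)⁻¹Xᵀy, where Γ is diagonal with entries ‖y − xᵢ‖, and the residual ‖y − Xα‖ is the anomaly score. The sketch below assumes this classic CRD formulation and leaves out how the dictionary is built (a sliding dual window in CRD, K-means on EMP features in EMPKCRD). The toy spectra and the value of λ are hypothetical, and a small hand-written solver keeps the example dependency-free.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[r][k] -= factor * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def crd_score(pixel, dictionary, lam=0.01):
    """Collaborative-representation anomaly score of one pixel.

    pixel: spectrum of the test pixel (list of band values)
    dictionary: background spectra, i.e. the columns x_i of X
    lam: regularization weight (hypothetical default)
    """
    k = len(dictionary)
    # Gram matrix X^T X
    gram = [[sum(a * b for a, b in zip(dictionary[i], dictionary[j]))
             for j in range(k)] for i in range(k)]
    # Distance-weighted Tikhonov term: lam * ||y - x_i||^2 on the diagonal
    for i, atom in enumerate(dictionary):
        gram[i][i] += lam * sum((p - a) ** 2 for p, a in zip(pixel, atom))
    rhs = [sum(a * p for a, p in zip(atom, pixel)) for atom in dictionary]
    alpha = solve(gram, rhs)
    # Reconstruct X @ alpha and use the residual norm as the anomaly score
    recon = [sum(alpha[i] * dictionary[i][b] for i in range(k))
             for b in range(len(pixel))]
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pixel, recon)))


background = [[1.0, 2.0, 3.0, 4.0, 5.0],   # hypothetical background atoms
              [1.1, 2.0, 3.1, 3.9, 5.0]]
print(crd_score([1.05, 2.0, 3.05, 3.95, 5.0], background))  # background-like: small
print(crd_score([5.0, 4.0, 3.0, 2.0, 1.0], background))      # anomalous shape: large
```

A spectrum lying in the span of the background atoms gets a near-zero residual, while one with a different spectral shape cannot be reconstructed and scores high, which is the reconstructibility gap the abstract describes.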

16 pages, 3897 KB  
Article
Beyond RGB: Early Stage Fusion of Thermal and Visual Modalities for Robust Maritime Perception
by Ondrej Kafka, Christian Rankl and David Moser
Electronics 2025, 14(23), 4746; https://doi.org/10.3390/electronics14234746 - 2 Dec 2025
Viewed by 1240
Abstract
In maritime environments, reliable object detection and semantic segmentation are essential for navigation and collision avoidance, especially under adverse conditions. This paper benchmarks early stage RGB–thermal (RGBT) fusion architectures for these tasks using a novel, pixel-aligned maritime dataset. We evaluate transformer-based, attention-driven, and lightweight convolutional models, analyzing trade-offs between accuracy and efficiency for edge deployment. Our results show that RGBT fusion significantly improves detection robustness, with transformer models achieving the highest accuracy and lightweight models such as WNet-S offering strong performance at lower computational cost. We also introduce a modular, open-source fusion framework to support reproducible research and practical deployment in maritime and other safety-critical domains.
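
Early-stage fusion combines the modalities at or near the input, before a shared backbone. Its simplest form, shown below as an assumption rather than as any of the benchmarked architectures, appends a normalized thermal channel to the RGB channels so that a single network sees a 4-channel image. This only makes sense when the frames are pixel-aligned, as the abstract states for its dataset. The thermal value range used for rescaling is a hypothetical parameter; real sensors need their own calibration.

```python
def early_fuse(rgb, thermal, thermal_range=(0.0, 255.0)):
    """Stack a pixel-aligned thermal channel onto an RGB image.

    rgb: H x W list of (r, g, b) tuples with values in 0..255
    thermal: H x W list of raw thermal values
    thermal_range: (min, max) used to rescale thermal to [0, 1] (assumed values)
    Returns an H x W list of 4-channel pixels [r, g, b, t], all in [0, 1].
    """
    if len(rgb) != len(thermal) or any(len(a) != len(b) for a, b in zip(rgb, thermal)):
        raise ValueError("early fusion needs pixel-aligned inputs of equal size")
    t_min, t_max = thermal_range
    fused = []
    for rgb_row, t_row in zip(rgb, thermal):
        row = []
        for (r, g, b), t in zip(rgb_row, t_row):
            # Clamp so out-of-range thermal readings stay within [0, 1]
            t_norm = min(max((t - t_min) / (t_max - t_min), 0.0), 1.0)
            row.append([r / 255.0, g / 255.0, b / 255.0, t_norm])
        fused.append(row)
    return fused


rgb = [[(255, 0, 0), (0, 0, 0)]]      # hypothetical 1 x 2 RGB image
thermal = [[255.0, 0.0]]              # matching thermal frame
print(early_fuse(rgb, thermal))       # [[[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]]
```

Later-stage designs instead run separate encoders per modality and merge their features; the benchmarked attention-driven and transformer models may fuse in more elaborate ways than plain concatenation.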

23 pages, 27054 KB  
Article
ActionMamba: Action Spatial–Temporal Aggregation Network Based on Mamba and GCN for Skeleton-Based Action Recognition
by Jinglong Wen, Dan Liu and Bin Zheng
Electronics 2025, 14(18), 3610; https://doi.org/10.3390/electronics14183610 - 11 Sep 2025
Cited by 2 | Viewed by 2539
Abstract
Skeleton-based action recognition networks have widely adopted Graph Convolutional Networks (GCNs) owing to their superior ability to model data topology, but several key issues still require further investigation. First, GCNs extract action features by applying temporal convolution to each key point, which causes the model to ignore the temporal connections between different key points. Second, the local receptive field of GCNs limits their ability to capture correlations between non-adjacent joints. Motivated by the State Space Model (SSM), we propose an Action Spatio-temporal Aggregation Network, named ActionMamba. Specifically, we introduce a novel embedding module called the Action Characteristic Encoder (ACE), which enhances the coupling of temporal and spatial information in skeletal features by combining intrinsic spatio-temporal encoding with extrinsic space encoding. Additionally, we design an Action Perception Model (APM) based on Mamba and GCN. By combining the strong feature processing capability of GCN with the global information modeling capability of Mamba, APM captures the hidden features between different joints and selectively filters information from various joints. Extensive experimental results demonstrate that ActionMamba achieves highly competitive performance on three challenging benchmark datasets: NTU-RGB+D 60, NTU-RGB+D 120, and UAV–Human.
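
The second limitation named above, the local receptive field, follows from how a spatial graph convolution mixes joint features. One layer computes ÂXW with Â = D^(−1/2)(A + I)D^(−1/2), so each joint only aggregates its direct neighbors. The sketch below uses this common symmetric-normalized form on a hypothetical three-joint chain; it does not reproduce ActionMamba's APM or its Mamba branch, and the bone list and weights are invented for illustration.

```python
import math


def normalized_adjacency(num_joints, bones):
    """D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degree including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def graph_conv(features, adj, weight):
    """One spatial graph-convolution layer: adj @ features @ weight.

    features: num_joints x in_dim, weight: in_dim x out_dim.
    """
    # Aggregate each joint's neighborhood (1-hop, via the adjacency matrix)
    mixed = [[sum(adj[i][k] * features[k][c] for k in range(len(features)))
              for c in range(len(features[0]))] for i in range(len(adj))]
    # Shared linear projection of the aggregated features
    return [[sum(row[c] * weight[c][o] for c in range(len(row)))
             for o in range(len(weight[0]))] for row in mixed]


adj = normalized_adjacency(3, [(0, 1), (1, 2)])    # hypothetical chain: joints 0-1-2
w = [[1.0]]                                        # identity weight, 1-D features
still = graph_conv([[1.0], [1.0], [1.0]], adj, w)
moved = graph_conv([[1.0], [1.0], [5.0]], adj, w)  # only joint 2 changes
print(still[0] == moved[0])  # True: joint 0 cannot see joint 2 after one layer
```

Stacking layers widens the receptive field one hop at a time, which is why distant joints interact only in deep GCNs; a global sequence model such as Mamba is one way to relate them directly.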

30 pages, 59872 KB  
Article
Advancing 3D Seismic Fault Identification with SwiftSeis-AWNet: A Lightweight Architecture Featuring Attention-Weighted Multi-Scale Semantics and Detail Infusion
by Ang Li, Rui Li, Yuhao Zhang, Shanyi Li, Yali Guo, Liyan Zhang and Yuqing Shi
Electronics 2025, 14(15), 3078; https://doi.org/10.3390/electronics14153078 - 31 Jul 2025
Viewed by 977
Abstract
The accurate identification of seismic faults, which serve as crucial fluid migration pathways in hydrocarbon reservoirs, is of paramount importance for reservoir characterization. Traditional interpretation is inefficient and struggles with complex geometries, failing to meet current exploration demands. Deep learning significantly improves fault identification but still struggles with edge accuracy and noise robustness. To overcome these limitations, this research introduces SwiftSeis-AWNet, a novel lightweight and high-precision network. The network is based on an optimized MedNeXt architecture for better fault edge detection. To suppress the noise introduced by simple feature fusion, a Semantics and Detail Infusion (SDI) module is integrated. Because the Hadamard product in SDI can cause information loss, we design an Attention-Weighted Semantics and Detail Infusion (AWSDI) module that uses dynamic multi-scale feature fusion to preserve details. Validation on field seismic datasets from the Netherlands F3 and New Zealand Kerry blocks shows that SwiftSeis-AWNet mitigates challenges such as the loss of small-scale fault features and the misidentification of fault intersection zones, enhancing the accuracy and geological reliability of automated fault identification.
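
The information-loss argument about the Hadamard product can be illustrated with toy numbers. Assume the multi-scale feature maps have already been resized and channel-aligned to a common shape (flattened to short vectors here). SDI-style fusion then multiplies them element-wise, so a near-zero response at any one scale suppresses that position entirely. A softmax-weighted sum, one plausible reading of "attention-weighted" fusion and not the published AWSDI module, keeps it. The per-scale scores below are fixed, hypothetical values rather than learned attention.

```python
import math


def hadamard_fusion(features):
    """SDI-style fusion: element-wise product of same-shape feature maps."""
    fused = [1.0] * len(features[0])
    for feat in features:
        fused = [f * x for f, x in zip(fused, feat)]
    return fused


def attention_weighted_fusion(features, scores):
    """Weighted sum of feature maps with weights = softmax(scores) over scales.

    scores: one scalar per scale; a real network would predict these from the
    features, here they are fixed, hypothetical values.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * feat[i] for w, feat in zip(weights, features))
            for i in range(len(features[0]))]


feats = [[0.9, 0.8, 0.7],   # scale 1 (hypothetical activations)
         [0.0, 0.6, 0.5],   # scale 2: no response at position 0
         [0.8, 0.9, 0.6]]   # scale 3
print(hadamard_fusion(feats)[0])                             # 0.0: position 0 wiped out
print(attention_weighted_fusion(feats, [0.0, 0.0, 0.0])[0])  # ~0.567: still present
```

Multiplication acts like a logical AND across scales, which suppresses noise but also erases details seen at only one scale; a weighted sum behaves more like a soft OR whose balance the weights control.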

16 pages, 6397 KB  
Article
Heterogenous Image Matching Fusion Based on Cumulative Structural Similarity
by Nan Zhu, Shiman Yang and Zhongxun Wang
Electronics 2025, 14(13), 2693; https://doi.org/10.3390/electronics14132693 - 3 Jul 2025
Viewed by 674
Abstract
To address the limited capability of multimodal image feature descriptors constructed from gradient information and the phase consistency principle, this paper proposes a method for constructing a rotation-invariant cumulative structure feature descriptor. First, we use the Log-Gabor odd-symmetric filter to extract the directions of feature point edges at multiple scales and orientations, and we calculate the amplitude of pixel edges based on the phase consistency principle. Then, the main direction of each key point is determined from the edge direction feature map, and a coordinate system is established along this main direction so that the feature point descriptor is rotation invariant. Finally, the Log-Gabor odd-symmetric filter is used to calculate the cumulative structural response in the maximum direction, yielding a highly distinctive, rotation-invariant descriptor. We select several representative heterogeneous images as test data and compare the matching performance of the proposed algorithm with that of several well-established descriptors. The results indicate that the proposed descriptor is more robust than the other descriptors for heterogeneous-source images with rotation changes.
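
One common way to realize the "main direction plus local coordinates" step is borrowed here from SIFT-style descriptors, as an assumption rather than taken from the paper. It builds an amplitude-weighted histogram of edge directions around the key point, takes the strongest bin as the main direction, and rotates neighborhood offsets by minus that angle. The sketch omits the Log-Gabor filtering itself, and the angles, amplitudes, and 10° bin width are hypothetical.

```python
import math


def main_direction(angles_deg, amplitudes, bin_width=10):
    """Dominant orientation from an amplitude-weighted orientation histogram.

    angles_deg: edge directions around a key point, in degrees
    amplitudes: matching edge amplitudes (e.g. from the phase consistency step)
    bin_width: histogram bin size in degrees; must divide 360
    Returns the center of the strongest bin, in degrees.
    """
    bins = [0.0] * (360 // bin_width)
    for angle, amp in zip(angles_deg, amplitudes):
        bins[int(angle % 360) // bin_width] += amp
    best = max(range(len(bins)), key=bins.__getitem__)
    return best * bin_width + bin_width / 2


def to_canonical(dx, dy, theta_deg):
    """Rotate an offset by -theta so the main direction maps onto the +x axis."""
    t = math.radians(theta_deg)
    return (dx * math.cos(t) + dy * math.sin(t),
            -dx * math.sin(t) + dy * math.cos(t))


theta = main_direction([42, 44, 47, 130, 300], [1.0, 2.0, 1.5, 0.5, 0.8])
print(theta)                         # 45.0
print(to_canonical(1.0, 2.0, theta)) # offset expressed in key-point coordinates
```

Because offsets are expressed relative to the main direction, rotating the image (and hence both the offsets and the main direction) by the same angle leaves the canonical coordinates unchanged, up to the histogram's bin quantization.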
