
Structural Networks for Image Application

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "E1: Mathematics and Computer Science".

Deadline for manuscript submissions: closed (30 April 2026) | Viewed by 7863

Special Issue Editors


Guest Editor
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Interests: video/image restoration and recognition; image generation; speech processing and intelligent transportation; big model technology and multimodality

Guest Editor
School of Computer Science and Technology, Xidian University, Xi’an 710071, China
Interests: object detection; computer vision; medical image analysis and processing; pattern recognition

Guest Editor
1. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
2. Guangdong Province Key Laboratory of Information Security Technology, Guangzhou, China
3. Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Beijing, China
Interests: computer vision; 3D human video prediction and generation; multimodal video understanding; multimodal large models; ocean large models; trajectory prediction

Guest Editor
College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing 210023, China
Interests: artificial intelligence security; autonomous driving security; IoT security; computational visual security; software vulnerability analysis; formal methods; embedded systems; edge AI; microservices

Guest Editor
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
Interests: deep learning; artificial intelligence; 3D vision; image super-resolution; video enhancement

Special Issue Information

Dear Colleagues,

Structural networks are a transformative approach to image application, integrating principles from graph theory, topology, computational geometry, deep learning, and more. They focus on modeling complex relationships within image data. This paradigm enables the capture of hierarchical, spatial, and semantic dependencies critical for advanced image understanding. As an important component of neural networks, structural networks enable models to parse intricate visual scenes, infer contextual relationships, and generate robust representations that transcend pixel-level analysis.

The scope of this Special Issue encompasses advances in structural network methodologies and their innovative applications in imaging domains, including, but not limited to, computer vision, pattern recognition, natural language understanding, intelligent robotics, and deep learning.

This Special Issue aims to build a collaborative community for researchers to present cutting-edge developments, propose novel frameworks, and bridge theoretical innovations with real-world implementations. We invite papers on topics including, but not limited to, architectural design, biomedical diagnostics, autonomous navigation, satellite imagery analysis, augmented reality, and efficient learning for large-scale/high-resolution images. We welcome theoretically rigorous and experimentally validated contributions that advance the frontier of structural networks in imaging. Papers may span algorithms, architectures, benchmarks, and interdisciplinary case studies.

Dr. Chunwei Tian
Dr. Shuai Wu
Dr. Jian-Fang Hu
Dr. Yinbo Yu
Dr. Xin Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • deep networks
  • image processing
  • optimization methods
  • statistical methods

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research

Jump to: Review

27 pages, 1334 KB  
Article
ETR: Event-Centric Temporal Reasoning for Question-Conditioned Video Question Answering
by Lingmin Pan, Ziyi Gao, Yueming Zhu, Fuchen Chen, Chengyuan Zhang, Dan Yin, Yong Cai, Siqiao Tan and Lei Zhu
Mathematics 2026, 14(5), 913; https://doi.org/10.3390/math14050913 - 7 Mar 2026
Viewed by 579
Abstract
Video Question Answering (VideoQA) requires a deep understanding of dynamic video content, integrating spatial reasoning, temporal dependencies, and language comprehension. Existing methods often struggle with long or semantically complex videos due to the lack of question-guided keyframe weight adjustment and the absence of question-aligned cross-modal description generation. To address these challenges, we propose ETR (Event-centric Temporal Reasoning), an adaptive framework for VideoQA. ETR introduces three key mechanisms: (i) a hierarchical weight adjustment selector to identify questions requiring event-centric temporal reasoning; (ii) a T-Route that segments videos into semantically coherent events and dynamically adjusts keyframe weights with question intent; and (iii) a question-conditioned prompting strategy that focuses on key objects to generate textual prompts aligned with a question’s semantics. This hierarchical and adaptive design effectively balances visual and textual information, enhances temporal reasoning, and improves object-centric alignment. Experiments on two datasets demonstrate that ETR achieves competitive performance in fine question-aware VideoQA.
(This article belongs to the Special Issue Structural Networks for Image Application)
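
The question-guided keyframe weighting that the abstract describes can be illustrated with a minimal sketch: score each frame embedding against the question embedding and softmax the scores into weights. This is toy NumPy code under assumed embedding shapes; the cosine scoring and the `temperature` parameter are illustrative choices, not the authors' implementation.

```python
import numpy as np

def keyframe_weights(frame_feats, question_feat, temperature=0.5):
    """Score each frame by cosine similarity to the question embedding,
    then softmax the scores into per-frame keyframe weights."""
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    q = question_feat / np.linalg.norm(question_feat)
    scores = f @ q / temperature           # (n_frames,)
    e = np.exp(scores - scores.max())      # stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 64))    # 8 hypothetical frame embeddings
question = rng.normal(size=64)       # hypothetical question embedding
w = keyframe_weights(frames, question)
```

Frames whose embeddings align with the question receive larger weights, so downstream reasoning can attend to question-relevant segments.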

19 pages, 3664 KB  
Article
Hybrid-Frequency-Aware Mixture-of-Experts Method for CT Metal Artifact Reduction
by Pengju Liu, Hongzhi Zhang, Chuanhao Zhang and Feng Jiang
Mathematics 2026, 14(3), 494; https://doi.org/10.3390/math14030494 - 30 Jan 2026
Viewed by 623
Abstract
In clinical CT imaging, high-density metallic implants often induce severe metal artifacts that obscure critical anatomical structures and degrade image quality, thereby hindering accurate diagnosis. Although deep learning has advanced CT metal artifact reduction (CT-MAR), many methods do not effectively use frequency information, which can limit the recovery of both fine details and overall image structure. To address this limitation, we propose a Hybrid-Frequency-Aware Mixture-of-Experts (HFMoE) network for CT-MAR. The proposed method synergizes the spatial-frequency localization of the wavelet transform with the global spectral representation of the Fourier transform to achieve precise multi-scale modeling of artifact characteristics. Specifically, we design a hybrid-frequency interaction encoder with three specialized branches, incorporating wavelet-domain, Fourier-domain, and cascaded wavelet–Fourier modulation, to distinctively refine local details, global structures, and complex cross-domain features. Then, they are fused via channel attention to yield a comprehensive representation. Furthermore, a Frequency-Aware Mixture-of-Experts (MoE) mechanism is introduced to dynamically route features to specific frequency experts based on the degradation severity, thereby adaptively assigning appropriate receptive fields to handle varying metal artifacts. Evaluations on synthetic (DeepLesion) and clinical (SpineWeb, CLINIC-metal) datasets show that HFMoE outperforms existing methods in both quantitative metrics and visual quality. Our method demonstrates the value of explicit frequency-domain adaptation for CT-MAR and could inform the design of other image restoration tasks.
(This article belongs to the Special Issue Structural Networks for Image Application)
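
The expert-routing mechanism named in the abstract follows the general mixture-of-experts pattern: a gate scores each expert from the input feature, and expert outputs are blended by the softmax of those scores. The sketch below is a generic soft-MoE illustration in NumPy; the three toy "frequency experts" and the linear gate are stand-ins, not the paper's network.

```python
import numpy as np

def moe_route(x, experts, gate_w):
    """Soft mixture-of-experts: compute gate logits from the input,
    softmax them, and return the gate-weighted blend of expert outputs."""
    logits = gate_w @ x                         # (n_experts,)
    e = np.exp(logits - logits.max())
    gates = e / e.sum()                         # routing weights, sum to 1
    outs = np.stack([f(x) for f in experts])    # (n_experts, dim)
    return gates @ outs, gates

# Three toy "experts" standing in for frequency-specific branches:
experts = [lambda x: x * 0.5,   # attenuate (low-pass-like)
           lambda x: x,         # pass through
           lambda x: x * 2.0]   # amplify (high-boost-like)

rng = np.random.default_rng(1)
x = rng.normal(size=16)               # hypothetical feature vector
gate_w = rng.normal(size=(3, 16))     # hypothetical gate parameters
y, g = moe_route(x, experts, gate_w)
```

In a trained network the gate would learn to send severely degraded features to experts with the appropriate receptive field, which is the adaptivity the abstract claims.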

15 pages, 4977 KB  
Article
Ensuring Consistency for In-Image Translation
by Chengpeng Fu, Xiaocheng Feng, Yichong Huang, Wenshuai Huo, Baohang Li, Yang Xiang, Hui Wang and Ting Liu
Mathematics 2026, 14(3), 490; https://doi.org/10.3390/math14030490 - 30 Jan 2026
Viewed by 536
Abstract
The in-image machine translation task involves translating text embedded within images, with the translated results presented in image format. While this task has numerous applications in various scenarios such as film poster translation and everyday scene image translation, existing methods frequently neglect the aspect of consistency throughout this process. We propose the need to uphold two types of consistency in this task: translation consistency and image generation consistency. The former entails incorporating image information during translation, while the latter involves maintaining consistency between the style of the text image and the original image, ensuring background coherence. To address these consistency requirements, we introduce a novel two-stage framework named HCIIT (High-Consistency In-Image Translation), which involves text image translation using a multimodal multilingual large language model in the first stage and image backfilling with a diffusion model in the second stage. Chain-of-thought learning is employed in the first stage to enhance the model’s ability to effectively leverage visual information during translation. Subsequently, a diffusion model trained for style-consistent text–image generation is adopted. We further modify the structural network of the conventional diffusion model by introducing a style latent module, which ensures uniformity of text style within images while preserving fine-grained background details. The results obtained on both curated test sets and authentic image test sets validate the effectiveness of our framework in ensuring consistency and producing high-quality translated images.
(This article belongs to the Special Issue Structural Networks for Image Application)

17 pages, 1079 KB  
Article
Prototype-Based Two-Stage Few-Shot Instance Segmentation with Flexible Novel Class Adaptation
by Qinying Zhu, Yilin Zhang, Peng Xiao, Mengxi Ying, Lei Zhu and Chengyuan Zhang
Mathematics 2025, 13(17), 2889; https://doi.org/10.3390/math13172889 - 7 Sep 2025
Viewed by 2040
Abstract
Few-shot instance segmentation (FSIS) is devised to address the intricate challenge of instance segmentation when labeled data for novel classes is scant. Nevertheless, existing methodologies encounter notable constraints in the agile expansion of novel classes and the management of memory overhead. The integration workflow for novel classes is inflexible, and given the necessity of retaining class exemplars during both training and inference stages, considerable memory consumption ensues. To surmount these challenges, this study introduces an innovative framework encompassing a two-stage “base training-novel class fine-tuning” paradigm. It acquires discriminative instance-level embedding representations. Concretely, instance embeddings are aggregated into class prototypes, and the storage of embedding vectors as opposed to images inherently mitigates the issue of memory overload. Via a Region of Interest (RoI)-level cosine similarity matching mechanism, the flexible augmentation of novel classes is realized, devoid of the requirement for supplementary training and independent of historical data. Experimental validations attest that this approach significantly outperforms state-of-the-art techniques in mainstream benchmark evaluations. More crucially, its memory-optimized attributes facilitate, for the first time, the conjoint assessment of FSIS performance across all classes within the COCO dataset. Visualized instances (incorporating colored masks and class annotations of objects across diverse scenarios) further substantiate the efficacy of the method in real-world complex contexts.
(This article belongs to the Special Issue Structural Networks for Image Application)
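
The prototype-and-cosine-matching idea in the abstract, storing embedding vectors rather than images and adding novel classes without retraining, can be sketched generically. The code below is a toy NumPy illustration; the embedding dimension, class names, and noise model are hypothetical, and it omits the RoI extraction that the paper performs.

```python
import numpy as np

def build_prototypes(embeddings, labels):
    """Aggregate instance embeddings into one L2-normalized prototype
    per class; only these vectors (not images) need to be stored."""
    protos = {}
    for c in set(labels):
        m = np.mean([e for e, l in zip(embeddings, labels) if l == c], axis=0)
        protos[c] = m / np.linalg.norm(m)
    return protos

def classify(embedding, protos):
    """Assign the class whose prototype has the highest cosine similarity."""
    e = embedding / np.linalg.norm(embedding)
    return max(protos, key=lambda c: float(e @ protos[c]))

rng = np.random.default_rng(2)
base = {"cat": rng.normal(size=32), "dog": rng.normal(size=32)}  # toy class centers
embs = [base["cat"] + rng.normal(scale=0.1, size=32) for _ in range(5)]
protos = build_prototypes(embs, ["cat"] * 5)

# A novel class is added by simply inserting a new prototype vector,
# with no extra training pass:
protos["dog"] = base["dog"] / np.linalg.norm(base["dog"])

query = base["cat"] + rng.normal(scale=0.1, size=32)
pred = classify(query, protos)
```

Because matching is a dictionary lookup plus a dot product, extending the class set is just inserting a vector, which mirrors the flexible novel-class adaptation the abstract claims.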

19 pages, 7161 KB  
Article
Dynamic Snake Convolution Neural Network for Enhanced Image Super-Resolution
by Weiqiang Xin, Ziang Wu, Qi Zhu, Tingting Bi, Bing Li and Chunwei Tian
Mathematics 2025, 13(15), 2457; https://doi.org/10.3390/math13152457 - 30 Jul 2025
Cited by 2 | Viewed by 2332
Abstract
Image super-resolution (SR) is essential for enhancing image quality in critical applications, such as medical imaging and satellite remote sensing. However, existing methods are often limited in their ability to effectively process and integrate multi-scale information from fine textures to global structures. To address these limitations, this paper proposes DSCNN, a dynamic snake convolution neural network for enhanced image super-resolution. DSCNN optimizes feature extraction and network architecture to enhance both performance and efficiency: To improve feature extraction, the core innovation is a feature extraction and enhancement module with dynamic snake convolution that dynamically adjusts the convolution kernel’s shape and position to better fit the image’s geometric structures, significantly improving feature extraction. To optimize the network’s structure, DSCNN employs an enhanced residual network framework. This framework utilizes parallel convolutional layers and a global feature fusion mechanism to further strengthen feature extraction capability and gradient flow efficiency. Additionally, the network incorporates a SwishReLU-based activation function and a multi-scale convolutional concatenation structure. This multi-scale design effectively captures both local details and global image structure, enhancing SR reconstruction. In summary, the proposed DSCNN outperforms existing methods in both objective metrics and visual perception (e.g., our method achieved optimal PSNR and SSIM results on the Set5 ×4 dataset).
(This article belongs to the Special Issue Structural Networks for Image Application)
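
The multi-scale convolutional concatenation mentioned in the abstract can be illustrated in one dimension: filter the same signal at several kernel widths and concatenate the responses, so downstream layers see both fine and coarse structure. This toy NumPy sketch uses box kernels as a stand-in; the paper's learned 2D convolutions and snake kernels are not reproduced here.

```python
import numpy as np

def multi_scale_concat(x, kernel_sizes=(3, 5, 7)):
    """Filter a 1D signal with box kernels of several widths and
    concatenate the responses (toy analogue of multi-scale
    convolutional concatenation)."""
    feats = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k                       # box (averaging) filter
        feats.append(np.convolve(x, kernel, mode="same"))
    return np.concatenate(feats)

x = np.linspace(0.0, 1.0, 32)   # hypothetical 1D signal
f = multi_scale_concat(x)       # 3 scales x 32 samples
```

Small kernels preserve local detail while large kernels summarize broader context; concatenation lets the network weigh both.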

Review

Jump to: Research

43 pages, 1250 KB  
Review
Challenges and Opportunities in Tomato Leaf Disease Detection with Limited and Multimodal Data: A Review
by Yingbiao Hu, Huinian Li, Chengcheng Yang, Ningxia Chen, Zhenfu Pan and Wei Ke
Mathematics 2026, 14(3), 422; https://doi.org/10.3390/math14030422 - 26 Jan 2026
Cited by 1 | Viewed by 920
Abstract
Tomato leaf diseases cause substantial yield and quality losses worldwide, yet reliable detection in real fields remains challenging. Two practical bottlenecks dominate current research: (i) limited data, including small samples for rare diseases, class imbalance, and noisy field images, and (ii) multimodal heterogeneity, where RGB images, textual symptom descriptions, spectral cues, and optional molecular assays provide complementary but hard-to-align evidence. This review summarizes recent advances in tomato leaf disease detection under these constraints. We first formalize the problem settings of limited and multimodal data and analyze their impacts on model generalization. We then survey representative solutions for limited data (transfer learning, data augmentation, few-/zero-shot learning, self-supervised learning, and knowledge distillation) and multimodal fusion (feature-, decision-, and hybrid-level strategies, with attention-based alignment). Typical model–dataset pairs are compared, with emphasis on cross-domain robustness and deployment cost. Finally, we outline open challenges—including weak generalization in complex field environments, limited interpretability of multimodal models, and the absence of unified multimodal benchmarks—and discuss future opportunities toward lightweight, edge-ready, and scalable multimodal systems for precision agriculture.
(This article belongs to the Special Issue Structural Networks for Image Application)
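
The feature-level and decision-level fusion strategies the review surveys differ in where modalities are combined. A minimal sketch (toy NumPy code; the embedding sizes, class counts, and fixed fusion weight are illustrative assumptions):

```python
import numpy as np

def feature_fusion(rgb_feat, text_feat):
    """Feature-level fusion: concatenate per-modality embeddings into
    one joint vector before classification."""
    return np.concatenate([rgb_feat, text_feat])

def decision_fusion(p_rgb, p_text, w=0.6):
    """Decision-level fusion: weighted average of per-modality class
    probability vectors after separate classification."""
    return w * p_rgb + (1 - w) * p_text

rgb = np.full(128, 0.1)    # hypothetical RGB-image embedding
txt = np.full(64, 0.2)     # hypothetical symptom-text embedding
joint = feature_fusion(rgb, txt)

p1 = np.array([0.7, 0.2, 0.1])   # per-modality class probabilities
p2 = np.array([0.5, 0.3, 0.2])
p = decision_fusion(p1, p2)
```

Feature-level fusion lets a single classifier learn cross-modal interactions, while decision-level fusion keeps the modality pipelines independent and is more robust when one modality is missing; hybrid strategies combine both.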
