Big Model Techniques for Image Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Electronic Multimedia".

Deadline for manuscript submissions: 15 October 2024 | Viewed by 12365

Special Issue Editors

School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
Interests: visual information processing; multimedia content analysis and retrieval

School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Interests: intelligent decision and control of UAVs; deep reinforcement learning; uncertain information processing; image processing

School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen 518107, China
Interests: deep learning; machine vision; human-robot interaction

Special Issue Information

Dear Colleagues,

In recent years, there has been a growing interest in leveraging large-scale models to address various challenges in image processing and analysis. The proliferation of digital images and the advent of the Internet of Things (IoT) have significantly increased the demand for efficient and accurate image processing techniques. However, traditional image processing methods often face limitations in handling complex data distributions, large-scale datasets, and high-dimensional feature representations. Big model techniques, on the other hand, harness the potential of large-scale neural networks, leveraging their ability to automatically learn and extract intricate patterns and features from images.

The rise of deep learning and the availability of vast amounts of data have paved the way for significant advancements in this field. By harnessing the power of big models, we can potentially achieve breakthroughs in image-related tasks such as image recognition, segmentation, generation, and enhancement.

Topics of interest include, but are not limited to, the following:

  • Machine learning and deep learning;
  • Big model;
  • Image processing;
  • Image generation;
  • Object detection.

Dr. Chunwei Tian
Dr. Xian Zhong
Dr. Bo Li
Dr. Qing Gao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • speech denoising 
  • speech recognition 
  • CNN 
  • computer vision 
  • pattern recognition 
  • deep learning 
  • signal processing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)


Research

16 pages, 5861 KiB  
Article
NRPerson: A Non-Registered Multi-Modal Benchmark for Tiny Person Detection and Localization
by Yi Yang, Xumeng Han, Kuiran Wang, Xuehui Yu, Wenwen Yu, Zipeng Wang, Guorong Li, Zhenjun Han and Jianbin Jiao
Electronics 2024, 13(9), 1697; https://doi.org/10.3390/electronics13091697 - 27 Apr 2024
Viewed by 686
Abstract
In recent years, the detection and localization of tiny persons have garnered significant attention due to their critical applications in various surveillance and security scenarios. Traditional multi-modal methods predominantly rely on well-registered image pairs, necessitating the use of sophisticated sensors and extensive manual effort for registration, which restricts their practical utility in dynamic, real-world environments. Addressing this gap, this paper introduces a novel non-registered multi-modal benchmark named NRPerson, specifically designed to advance the field of tiny person detection and localization by accommodating the complexities of real-world scenarios. The NRPerson dataset comprises 8548 RGB-IR image pairs, meticulously collected and filtered from 22 video sequences, enriched with 889,207 high-quality annotations that have been manually verified for accuracy. Utilizing NRPerson, we evaluate several leading detection and localization models across both mono-modal and non-registered multi-modal frameworks. Furthermore, we develop a comprehensive set of natural multi-modal baselines for the innovative non-registered track, aiming to enhance the detection and localization of unregistered multi-modal data using a cohesive and generalized approach. This benchmark is poised to facilitate significant strides in the practical deployment of detection and localization technologies by mitigating the reliance on stringent registration requirements.
(This article belongs to the Special Issue Big Model Techniques for Image Processing)
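
To make the non-registered setting concrete, the sketch below (Python; purely illustrative, with hypothetical file paths and field names rather than the actual NRPerson release format) shows a data structure in which the RGB and infrared frames of a pair carry independent annotations and no registration links them, so any fusion has to happen at the detection level.

```python
# Hypothetical illustration of a non-registered RGB-IR pair: each modality has
# its own image and its own annotations, and no homography ties them together.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in that modality's pixels

@dataclass
class Frame:
    image_path: str
    boxes: List[Box]          # annotations are per modality, not shared

@dataclass
class NonRegisteredPair:
    rgb: Frame
    infrared: Frame           # no pixel-level alignment with the RGB frame

def detect_per_modality(pair: NonRegisteredPair,
                        rgb_detector: Callable[[str], list],
                        ir_detector: Callable[[str], list]):
    """Run independent detectors; results can only be fused at the box level,
    because pixel correspondences between the two streams are unavailable."""
    return rgb_detector(pair.rgb.image_path), ir_detector(pair.infrared.image_path)

pair = NonRegisteredPair(
    rgb=Frame("seq01/rgb/000001.jpg", [(110.0, 220.0, 8.0, 21.0)]),
    infrared=Frame("seq01/ir/000001.jpg", [(96.0, 201.0, 9.0, 22.0)]),
)
print(detect_per_modality(pair, lambda p: ["rgb-person"], lambda p: ["ir-person"]))
```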

14 pages, 6912 KiB  
Article
A Dynamic Network with Transformer for Image Denoising
by Mingjian Song, Wenbo Wang and Yue Zhao
Electronics 2024, 13(9), 1676; https://doi.org/10.3390/electronics13091676 - 26 Apr 2024
Cited by 1 | Viewed by 906
Abstract
Deep convolutional neural networks (CNNs) can achieve good performance in image denoising due to their superiority in extracting structural information. However, they may ignore the relationships between pixels, which limits their denoising performance. A Transformer, which focuses on pixel-to-pixel relationships, can effectively address this problem. This article aims to make a CNN and a Transformer complement each other in image denoising. In this study, we propose a dynamic network with Transformer for image denoising (DTNet), comprising a residual block (RB), a multi-head self-attention block (MSAB), and a multidimensional dynamic enhancement block (MDEB). Firstly, the RB not only utilizes a CNN but also lays the foundation for the combination with the Transformer. Then, the MSAB adds positional encoding and applies multi-head self-attention, which preserves sequential positional information while using the Transformer to obtain global information. Finally, the MDEB uses dimension enhancement and dynamic convolution to improve adaptive ability. The experiments show that our DTNet is superior to some existing methods for image denoising.
(This article belongs to the Special Issue Big Model Techniques for Image Processing)
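
The abstract above pairs a convolutional residual block with a Transformer-style self-attention block. The PyTorch sketch below illustrates that general combination under stated assumptions: the block names, sizes, learned positional encoding, and residual-noise prediction are illustrative choices, not the authors' DTNet implementation (whose dynamic enhancement block is not reproduced here).

```python
# Minimal sketch of combining a conv residual block with multi-head
# self-attention over spatial positions for denoising (illustrative only).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Plain convolutional residual block (stands in for the paper's RB)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class SelfAttentionBlock(nn.Module):
    """Multi-head self-attention over flattened spatial positions with a
    learned positional encoding (stands in for the paper's MSAB)."""
    def __init__(self, channels: int, heads: int = 4, max_tokens: int = 4096):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, max_tokens, channels))
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        tokens = tokens + self.pos[:, : h * w]          # add positional encoding
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class TinyDenoiser(nn.Module):
    """Toy CNN + attention denoiser that predicts and subtracts the noise."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.rb = ResidualBlock(channels)
        self.msab = SelfAttentionBlock(channels)
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, noisy):
        feat = self.msab(self.rb(self.head(noisy)))
        return noisy - self.tail(feat)                  # residual learning

x = torch.randn(1, 3, 64, 64)      # 64*64 = 4096 tokens fits max_tokens
print(TinyDenoiser()(x).shape)     # torch.Size([1, 3, 64, 64])
```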

14 pages, 3268 KiB  
Article
An Adaptive Atrous Spatial Pyramid Pooling Network for Hyperspectral Classification
by Tianxing Zhu, Qin Liu and Lixiang Zhang
Electronics 2023, 12(24), 5013; https://doi.org/10.3390/electronics12245013 - 15 Dec 2023
Cited by 3 | Viewed by 1230
Abstract
Hyperspectral imaging (HSI) offers rich spectral and spatial data, beneficial for a variety of applications. However, challenges persist in HSI classification due to spectral variability, non-linearity, limited samples, and a dearth of spatial information in conventional spectral classifiers. While various spectral–spatial classifiers and dimension reduction techniques have been developed to mitigate these issues, they are often constrained by the utilization of handcrafted features. Deep learning has been introduced to HSI classification, with pixel- and patch-level deep learning (DL) classifiers gaining substantial attention. Yet, existing patch-level DL classifiers encounter difficulties in concentrating on long-distance dependencies and managing category areas of diverse sizes. The proposed Self-Adaptive 3D atrous spatial pyramid pooling (ASPP) Multi-Scale Feature Fusion Network (SAAFN) addresses these challenges by simultaneously preserving high-resolution spatial detail data and high-level semantic information. This method integrates a modified hyperspectral superpixel segmentation technique, a multi-scale 3D ASPP convolution block, and an end-to-end framework to extract and fuse multi-scale features at a self-adaptive rate for HSI classification. This method significantly enhances the classification accuracy of HSI with limited samples.
(This article belongs to the Special Issue Big Model Techniques for Image Processing)
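
The core building block named in the abstract, 3D atrous spatial pyramid pooling, can be sketched as parallel dilated 3D convolutions over a hyperspectral patch whose outputs are concatenated and fused. The PyTorch snippet below is a minimal sketch of that idea only; the dilation rates, channel counts, and 1x1x1 fusion are assumptions, and the paper's self-adaptive rate selection and superpixel segmentation are not shown.

```python
# Minimal sketch of a 3D ASPP block: parallel dilated Conv3d branches fused
# along the channel axis (illustrative, not the authors' SAAFN code).
import torch
import torch.nn as nn

class ASPP3D(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv3d(in_ch, out_ch, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm3d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        # 1x1x1 convolution fuses the multi-scale branches
        self.fuse = nn.Conv3d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):  # x: (B, C, bands, H, W)
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

# Toy hyperspectral patch: 1 input channel, 16 bands, 9x9 spatial window
patch = torch.randn(2, 1, 16, 9, 9)
print(ASPP3D(1, 8)(patch).shape)   # torch.Size([2, 8, 16, 9, 9])
```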

18 pages, 5422 KiB  
Article
D2StarGAN: A Near-Far End Noise Adaptive StarGAN for Speech Intelligibility Enhancement
by Dengshi Li, Chenyi Zhu and Lanxin Zhao
Electronics 2023, 12(17), 3620; https://doi.org/10.3390/electronics12173620 - 27 Aug 2023
Viewed by 1887
Abstract
When using mobile communication, the voice output from the device is already relatively clear, but in a noisy environment it is difficult for the listener to clearly obtain the information expressed by the speaker. Consequently, speech intelligibility enhancement technology has emerged to help alleviate this problem. Speech intelligibility enhancement (IENH) is a technique that enhances speech intelligibility during the reception phase. Previous research has focused on IENH through the conversion of normal speech to different levels of Lombard speech, inspired by a well-known acoustic mechanism called the Lombard effect. However, these methods often lead to speech distortion and impair the overall speech quality. To address this quality degradation problem, we propose an improved StarGAN-based IENH framework that combines StarGAN networks with the dual-discriminator idea to construct the conversion framework. This approach offers two main advantages: (1) a speech metric discriminator is added on top of StarGAN to optimize multiple intelligibility- and quality-related metrics simultaneously; and (2) the framework is adaptive to different distal and proximal noise levels and noise types. Objective experiments and subjective preference tests show that our approach outperforms the baseline approach, enabling IENH to be more widely used.
(This article belongs to the Special Issue Big Model Techniques for Image Processing)
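
The dual-discriminator idea described above can be illustrated, in heavily simplified form, by a generator objective that combines an adversarial real/fake term with the output of a second discriminator trained to predict an intelligibility/quality score. The Python sketch below is purely illustrative: the tiny MLPs on fixed-size feature frames, the `metric_weight` coefficient, and the absence of StarGAN-style domain conditioning are all assumptions, not the paper's implementation.

```python
# Illustrative generator update against two discriminators: an adversarial
# real/fake critic and a "metric" discriminator predicting a quality score.
import torch
import torch.nn as nn

feat_dim = 80  # e.g. mel-spectrogram frame size (assumed)

G = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
D_adv = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
D_metric = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)

def generator_step(normal_frames, metric_weight=0.5):
    """One generator update: fool D_adv and raise the predicted metric score."""
    fake = G(normal_frames)
    adv_loss = bce(D_adv(fake), torch.ones(fake.size(0), 1))  # look "real"
    metric_loss = -D_metric(fake).mean()                      # push score upward
    loss = adv_loss + metric_weight * metric_loss
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss.item()

print(generator_step(torch.randn(8, feat_dim)))
```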

26 pages, 7083 KiB  
Article
MemoryGAN: GAN Generator as Heterogeneous Memory for Compositional Image Synthesis
by Zongtao Wang, Jiajie Peng and Zhiming Liu
Electronics 2023, 12(13), 2927; https://doi.org/10.3390/electronics12132927 - 3 Jul 2023
Viewed by 1282
Abstract
The Generative Adversarial Network (GAN) has recently experienced great progress in compositional image synthesis. Unfortunately, the models proposed in the literature usually require a set of pre-defined local generators and use a separate generator to model each part object. This makes the model inflexible and also limits its scalability. Inspired by humans’ structured memory system, we propose MemoryGAN to eliminate these disadvantages. MemoryGAN uses a single generator as a shared memory to hold the heterogeneous information of the parts, and it uses a recurrent neural network to model the dependency between the parts and provide the query code for the memory. The shared memory structure and the query and feedback mechanism make MemoryGAN flexible and scalable. Our experiments show that although MemoryGAN only uses a single generator for all the parts, it achieves performance comparable to the state of the art, which uses multiple generators, in terms of synthesized image quality, compositional ability, and disentanglement. We believe that our result of using the GAN generator as a memory model will inspire future work on both bio-friendly models and memory-augmented models.
(This article belongs to the Special Issue Big Model Techniques for Image Processing)
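
The abstract's central design, a single shared generator queried part by part through a recurrent network, can be sketched as a toy PyTorch example. The latent sizes, GRU query mechanism, number of parts, and additive compositing below are assumptions rather than the authors' architecture.

```python
# Illustrative "generator as shared memory": a GRU emits one query code per
# part, and the same generator renders every part before compositing.
import torch
import torch.nn as nn

class SharedPartGenerator(nn.Module):
    def __init__(self, code_dim=32, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, img_size * img_size), nn.Tanh(),
        )

    def forward(self, code):  # (B, code_dim) -> (B, 1, H, W)
        return self.net(code).view(-1, 1, self.img_size, self.img_size)

class MemoryComposer(nn.Module):
    def __init__(self, z_dim=64, code_dim=32, num_parts=4):
        super().__init__()
        self.num_parts = num_parts
        self.rnn = nn.GRUCell(z_dim, code_dim)   # models part-to-part dependency
        self.generator = SharedPartGenerator(code_dim)

    def forward(self, z):
        h = torch.zeros(z.size(0), self.rnn.hidden_size, device=z.device)
        canvas = 0
        for _ in range(self.num_parts):
            h = self.rnn(z, h)                   # next query code for the memory
            canvas = canvas + self.generator(h)  # same generator for every part
        return canvas

print(MemoryComposer()(torch.randn(2, 64)).shape)  # torch.Size([2, 1, 32, 32])
```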

17 pages, 813 KiB  
Article
NFSP-PLT: Solving Games with a Weighted NFSP-PER-Based Method
by Huale Li, Shuhan Qi, Jiajia Zhang, Dandan Zhang, Lin Yao, Xuan Wang, Qi Li and Jing Xiao
Electronics 2023, 12(11), 2396; https://doi.org/10.3390/electronics12112396 - 25 May 2023
Viewed by 1162
Abstract
The Nash equilibrium strategy is a typical goal when solving two-player imperfect-information games (IIGs). Neural fictitious self-play (NFSP) is a popular method for finding the Nash equilibrium in IIGs and was the first end-to-end method used to compute the Nash equilibrium strategy. However, training NFSP requires a large amount of sample data, and the interactive cost of obtaining such data is often very high. Realizing efficient training of the network with limited samples is therefore an urgent problem. In this paper, we first propose a new NFSP-based method, NFSP with prioritized experience replay (NFSP-PER), to improve sample training efficiency. Then, a weighted NFSP-PER with learning time (NFSP-PLT) is proposed to control the utilization degree of priority-weighted samples. Furthermore, based on NFSP-PLT, an adaptive upper confidence bound applied to trees (UCT) is used to solve the optimal response strategy, which makes the solved strategy more accurate. Extensive experimental results show that the proposed NFSP-PLT effectively improves sample learning efficiency compared with existing works.
(This article belongs to the Special Issue Big Model Techniques for Image Processing)
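
The sample-efficiency ingredient named in the abstract, prioritized experience replay, is a standard technique and can be sketched independently of NFSP. Below is a minimal Python sketch using proportional priorities with importance-sampling weights; the extra `time_weight` factor only gestures at NFSP-PLT's learning-time weighting and is an assumption, not the paper's formula.

```python
# Minimal prioritized experience replay buffer (illustrative only): sample in
# proportion to priority^alpha and correct the bias with importance weights.
import numpy as np

class PrioritizedReplay:
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, transition, priority=1.0):
        if len(self.data) >= self.capacity:          # drop the oldest entry
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size, beta=0.4, time_weight=1.0):
        p = np.asarray(self.priorities) ** self.alpha
        p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        # importance-sampling weights, optionally rescaled by a learning-time factor
        w = (len(self.data) * p[idx]) ** (-beta) * time_weight
        w = w / w.max()
        return [self.data[i] for i in idx], idx, w

    def update(self, idx, td_errors, eps=1e-3):
        for i, e in zip(idx, td_errors):             # refresh priorities after learning
            self.priorities[i] = abs(float(e)) + eps

buf = PrioritizedReplay()
for t in range(100):
    buf.add(("state", "action", "reward"), priority=np.random.rand() + 1e-3)
batch, idx, weights = buf.sample(8)
print(len(batch), weights.round(2))
```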

12 pages, 20154 KiB  
Article
Denoising and Reducing Inner Disorder in Point Clouds for Improved 3D Object Detection in Autonomous Driving
by Weifan Xu, Jin Jin, Fenglei Xu, Ze Li and Chongben Tao
Electronics 2023, 12(11), 2364; https://doi.org/10.3390/electronics12112364 - 23 May 2023
Viewed by 1344
Abstract
In the field of autonomous driving, precise spatial positioning and 3D object detection have become increasingly critical due to advancements in LiDAR technology and its extensive applications. Traditional detection models for RGB images face challenges in handling the intrinsic disorder present in LiDAR point clouds. Although point clouds are typically perceived as irregular and disordered, an implicit order actually exists, owing to laser arrangement and sequential scanning. Therefore, we propose Frustumformer, a novel framework that leverages the inherent order of LiDAR point clouds, reducing disorder and enhancing representation. Our approach consists of a frustum-based method that relies on the results of a 2D image detector, a frustum patch embedding that exploits the new data representation format, and a single-stride transformer network for original resolution feature fusion. By incorporating these components, Frustumformer effectively exploits the intrinsic order of point clouds and models long-range dependencies to further improve performance. Ablation studies verify the efficacy of the single-stride transformer component and the overall model architecture. We conduct experiments on the KITTI dataset, and Frustumformer outperforms existing methods.
(This article belongs to the Special Issue Big Model Techniques for Image Processing)
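
The frustum-based first stage described above, keeping only the LiDAR points whose image projections fall inside a 2D detection box, can be sketched with a simple camera projection. The NumPy snippet below uses toy, KITTI-style values and is not the authors' code; the patch embedding and single-stride transformer stages of Frustumformer are not shown.

```python
# Illustrative frustum extraction: project points into the image plane and
# keep those landing inside a 2D detection box (toy projection matrix).
import numpy as np

def frustum_points(points_xyz, P, box_xyxy):
    """points_xyz: (N, 3) points already in the camera frame.
    P: (3, 4) camera projection matrix.  box_xyxy: (x1, y1, x2, y2) in pixels."""
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])        # (N, 4)
    uvw = homo @ P.T                                       # (N, 3)
    in_front = uvw[:, 2] > 0.1                             # keep points ahead of camera
    u = uvw[:, 0] / np.clip(uvw[:, 2], 1e-6, None)
    v = uvw[:, 1] / np.clip(uvw[:, 2], 1e-6, None)
    x1, y1, x2, y2 = box_xyxy
    inside = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return points_xyz[in_front & inside]

# Toy pinhole camera (fx = fy = 700, principal point at 600, 200) and a 2D box
P = np.array([[700.0, 0.0, 600.0, 0.0],
              [0.0, 700.0, 200.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
pts = np.random.uniform([-20, -2, 1], [20, 2, 60], size=(5000, 3))
print(frustum_points(pts, P, (550, 150, 700, 250)).shape)
```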

19 pages, 4845 KiB  
Article
Using CNN with Multi-Level Information Fusion for Image Denoising
by Shaodong Xie, Jiagang Song, Yuxuan Hu, Chengyuan Zhang and Shichao Zhang
Electronics 2023, 12(9), 2146; https://doi.org/10.3390/electronics12092146 - 8 May 2023
Cited by 1 | Viewed by 2384
Abstract
Deep convolutional neural networks (CNNs) with hierarchical architectures have obtained good results for image denoising. However, when the noise level is unknown and the image background is complex, it is challenging to obtain robust information through a CNN. In this paper, we present a multi-level information fusion CNN (MLIFCNN) for image denoising, containing a fine information extraction block (FIEB), a multi-level information interaction block (MIIB), a coarse information refinement block (CIRB), and a reconstruction block (RB). To adapt to more complex image backgrounds, the FIEB uses parallel group convolution to extract wide-channel information. To enhance the robustness of the obtained information, the MIIB uses residual operations in two sub-networks to implement the interaction of wide and deep information and adapt to the distributions of different noise levels. To enhance the stability of training the denoiser, the CIRB stacks common and group convolutions to refine the obtained information. Finally, the RB uses a residual operation on a single convolution to obtain the resulting clean image. Experimental results show that our method outperforms many other excellent methods in both quantitative and qualitative terms.
(This article belongs to the Special Issue Big Model Techniques for Image Processing)
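
The wide-channel extraction described above, parallel group convolutions whose outputs are fused, can be sketched as a small PyTorch block. The channel counts, group sizes, and residual fusion below are illustrative assumptions, not the paper's FIEB definition.

```python
# Illustrative parallel group-convolution block with residual fusion.
import torch
import torch.nn as nn

class ParallelGroupBlock(nn.Module):
    def __init__(self, channels: int = 64, groups: int = 4):
        super().__init__()
        self.branch_a = nn.Conv2d(channels, channels // 2, 3, padding=1, groups=groups)
        self.branch_b = nn.Conv2d(channels, channels // 2, 3, padding=1, groups=groups)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # two parallel group convolutions widen the channel view, then concat
        wide = torch.cat([self.branch_a(x), self.branch_b(x)], dim=1)
        return self.act(x + wide)          # residual fusion of the wide features

x = torch.randn(1, 64, 48, 48)
print(ParallelGroupBlock()(x).shape)       # torch.Size([1, 64, 48, 48])
```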