Multimodal Signal, Image and Video Analysis and Application

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (30 April 2023) | Viewed by 11146

Special Issue Editors

Guest Editor
School of Computer Science, Wuhan University, Wuhan 430072, China
Interests: computer vision; pattern recognition; machine learning

Guest Editor
School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
Interests: computer vision; machine learning; data-, label- and computation-efficient deep learning for visual recognition

Guest Editor
School of Mobile Information Engineering, Sun Yat-Sen University, Guangzhou 510275, China
Interests: autonomous vehicle perception systems; image and point cloud processing; machine learning

Special Issue Information

Dear Colleagues,

With the increasing demand for fine-scale representations of the physical world, a variety of data-capturing techniques have been developed, leading to a wide range of modalities for signals, images and videos. It is of particular importance to study how multimodal information can be used to understand events in real-world application scenarios, e.g., object detection using RGB-D and LiDAR data, surface defect inspection using optical and radar data, mural painting analysis using X-ray and multi-spectral data, information retrieval using images and texts, etc. This Special Issue aims to promote cutting-edge research in multimodal information processing and to offer a timely collection of works that benefit researchers in both academia and industry. We welcome high-quality original submissions addressing either theoretical or practical issues. Topics of interest include, but are not limited to:

  • Multimodal data acquisition techniques;
  • Multimodal data storage and indexing techniques;
  • Modeling, representation, and learning with multimodal data;
  • Multimodal data fusion;
  • Cross-modal data mining and retrieval;
  • Visualization and interpretation techniques for multimodal data;
  • Machine learning with multimodal data;
  • Open issues for multimodal data analysis and application.

Dr. Qin Zou
Prof. Dr. Xinggang Wang
Dr. Long Chen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multimodal data
  • image processing
  • video processing
  • machine learning
  • information retrieval
  • data fusion

Published Papers (6 papers)


Research

20 pages, 1650 KiB  
Article
Analysis of Scale Sensitivity of Ship Detection in an Anchor-Free Deep Learning Framework
by Yongxin Jiang, Li Huang, Zhiyou Zhang, Bu Nie and Fan Zhang
Electronics 2023, 12(1), 38; https://doi.org/10.3390/electronics12010038 - 22 Dec 2022
Cited by 1 | Viewed by 1174
Abstract
Ship detection is an important task in sea surveillance. In the past decade, deep learning-based methods have been proposed for ship detection from images and videos, and convolutional features have proven very effective in representing ship objects. However, the scale of the convolution often determines the representational capacity of the features, and it is unclear how this scale influences the performance of deep learning methods in ship detection. To this end, this paper studies the scale sensitivity of ship detection in an anchor-free deep learning framework. Specifically, we employ the classical CenterNet as the base and analyze the influence of the size, the depth, and the fusion strategy of convolutional features on multi-scale ship detection. Experiments show that, for small targets, features obtained from top-down path fusion improve detection performance more significantly than those from bottom-up path fusion; conversely, bottom-up path fusion achieves better detection performance on larger targets.
(This article belongs to the Special Issue Multimodal Signal, Image and Video Analysis and Application)
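The paper's central finding — top-down fusion helping small targets, bottom-up fusion helping large ones — can be pictured as two opposite passes over a feature pyramid. Below is a minimal PyTorch sketch of the two fusion directions; it illustrates the general idea only, not the authors' actual network, and the `PathFusion` module, channel width and pyramid depth are assumptions chosen for the demonstration.

```python
# A minimal sketch (not the paper's network) contrasting top-down and
# bottom-up feature-path fusion over a multi-scale feature pyramid.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PathFusion(nn.Module):
    def __init__(self, channels=64, levels=3, top_down=True):
        super().__init__()
        self.top_down = top_down
        self.smooth = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(levels)
        )

    def forward(self, feats):  # feats: list from high-res to low-res maps
        feats = list(feats)
        if self.top_down:
            # Propagate coarse semantics down to fine maps (helps small ships).
            for i in range(len(feats) - 2, -1, -1):
                up = F.interpolate(feats[i + 1], size=feats[i].shape[-2:], mode="nearest")
                feats[i] = self.smooth[i](feats[i] + up)
        else:
            # Propagate fine detail up to coarse maps (helps large ships).
            for i in range(1, len(feats)):
                down = F.max_pool2d(feats[i - 1], kernel_size=2)
                down = F.interpolate(down, size=feats[i].shape[-2:], mode="nearest")
                feats[i] = self.smooth[i](feats[i] + down)
        return feats

# Usage: three pyramid levels with a shared channel width.
pyramid = [torch.randn(1, 64, s, s) for s in (64, 32, 16)]
fused = PathFusion(top_down=True)(pyramid)
```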

12 pages, 1561 KiB  
Article
MFVC: Urban Traffic Scene Video Caption Based on Multimodal Fusion
by Mingxing Li, Hao Zhang, Cheng Xu, Chenyang Yan, Hongzhe Liu and Xuewei Li
Electronics 2022, 11(19), 2999; https://doi.org/10.3390/electronics11192999 - 21 Sep 2022
Cited by 2 | Viewed by 1614
Abstract
With the development of electronic technology, intelligent cars can gradually run more complex artificial intelligence algorithms, of which video captioning is one. However, current video caption algorithms consider only visual information when applied to urban traffic scenes, which leads to inaccurate captions for complex scenes. Multimodal fusion based on the Transformer is one solution to this problem, but existing algorithms suffer from low fusion performance and high computational complexity. We propose a new Transformer-based video caption model, MFVC (Multimodal Fusion for Video Caption), to address these issues. We introduce audio-modal data and an attention bottleneck module to increase the information available to the caption-generation model while keeping the computational cost low. Experiments are conducted on the public datasets MSR-VTT and MSVD and, to verify the model in urban traffic scenes, on the self-built traffic caption dataset BUUISE; the evaluation metrics confirm the model's effectiveness. The model achieves good results on both the public datasets and the urban traffic dataset and has promising applications in the intelligent driving industry.
(This article belongs to the Special Issue Multimodal Signal, Image and Video Analysis and Application)
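The abstract does not spell out the fusion layer, but an attention bottleneck is commonly realized by letting each modality exchange information only through a small set of shared latent tokens, which caps the cost of cross-modal attention. The sketch below is a hedged PyTorch illustration of that pattern; `BottleneckFusion`, the token count and the dimensions are hypothetical and not the MFVC implementation.

```python
import torch
import torch.nn as nn

class BottleneckFusion(nn.Module):
    # Each modality attends only through a few shared bottleneck tokens,
    # avoiding full cross-attention between all visual and audio tokens.
    def __init__(self, dim=256, n_bottleneck=4, n_heads=4):
        super().__init__()
        self.bottleneck = nn.Parameter(torch.randn(1, n_bottleneck, dim))
        layer = lambda: nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.vis_layer, self.aud_layer = layer(), layer()

    def forward(self, vis, aud):  # vis: (B, Tv, D), aud: (B, Ta, D)
        B = vis.size(0)
        btk = self.bottleneck.expand(B, -1, -1)
        nb = btk.size(1)
        # The visual stream updates its own tokens plus the bottleneck tokens...
        v = self.vis_layer(torch.cat([vis, btk], dim=1))
        vis, btk = v[:, :-nb], v[:, -nb:]
        # ...then the audio stream reads the visually updated bottleneck tokens.
        a = self.aud_layer(torch.cat([aud, btk], dim=1))
        return vis, a[:, :-nb]

# Usage: fuse 32 visual tokens with 20 audio tokens.
vis, aud = BottleneckFusion()(torch.randn(2, 32, 256), torch.randn(2, 20, 256))
```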

17 pages, 31864 KiB  
Article
RI-MFM: A Novel Infrared and Visible Image Registration with Rotation Invariance and Multilevel Feature Matching
by Depeng Zhu, Weida Zhan, Jingqi Fu, Yichun Jiang, Xiaoyu Xu, Renzhong Guo and Yu Chen
Electronics 2022, 11(18), 2866; https://doi.org/10.3390/electronics11182866 - 10 Sep 2022
Cited by 2 | Viewed by 1442
Abstract
Over the past ten years, multimodal image registration technology has developed continuously, and many researchers have studied the registration of infrared and visible images. Due to differences in grayscale distribution, resolution and viewpoint between the two images, most existing infrared and visible image registration methods are still insufficiently accurate. To solve this problem, we propose a new robust and accurate infrared and visible image registration method. To generate more robust feature descriptors, we propose a concentric-circle-based feature-description algorithm, which enhances the estimation of the main direction of feature points by introducing centroids and, at the same time, uses concentric circles to ensure the rotation invariance of the descriptors. To match feature points quickly and accurately, we propose a multi-level feature-matching algorithm with improved offset consistency, redesigning the matching algorithm around the offset-consistency principle. Comparison experiments with several state-of-the-art registration methods on the CVC dataset and a self-built dataset show that our method has significant advantages in both feature-point localization accuracy and correct matching rate.
(This article belongs to the Special Issue Multimodal Signal, Image and Video Analysis and Application)
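As a rough illustration of the descriptor idea — an intensity centroid fixes a reference direction, and concentric rings are sampled relative to it so the descriptor is rotation-invariant — here is a toy NumPy sketch. It is not the paper's RI-MFM algorithm; `ring_descriptor`, the radii and the bin count are illustrative assumptions.

```python
import numpy as np

def ring_descriptor(img, y, x, radii=(4, 8, 12), bins=8):
    # Toy concentric-ring descriptor (illustrative only): the patch centroid
    # fixes a reference angle, and sampling each ring relative to that angle
    # makes the descriptor invariant to in-plane rotation.
    # Assumes (y, x) lies at least max(radii) pixels from the image border.
    size = max(radii)
    patch = img[y - size:y + size + 1, x - size:x + size + 1].astype(float)
    ys, xs = np.mgrid[-size:size + 1, -size:size + 1]
    m = patch.sum() + 1e-9
    # Main direction from the intensity centroid of the patch.
    theta0 = np.arctan2((ys * patch).sum() / m, (xs * patch).sum() / m)
    desc = []
    for r in radii:
        angles = theta0 + np.linspace(0, 2 * np.pi, bins, endpoint=False)
        desc.extend(patch[size + int(round(r * np.sin(a))),
                          size + int(round(r * np.cos(a)))] for a in angles)
    d = np.asarray(desc)
    return (d - d.mean()) / (d.std() + 1e-9)  # normalize against intensity shifts
```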

22 pages, 4281 KiB  
Article
Data Augmentation Based on Generative Adversarial Network with Mixed Attention Mechanism
by Yu Yang, Lei Sun, Xiuqing Mao and Min Zhao
Electronics 2022, 11(11), 1718; https://doi.org/10.3390/electronics11111718 - 27 May 2022
Cited by 2 | Viewed by 2346
Abstract
Downstream tasks in deep learning often require sufficient training data, yet data are hard to acquire in some fields. Generative Adversarial Networks (GANs) have been used extensively for data augmentation, but they still suffer from unstable training and low-quality generated images. This paper proposes data augmentation based on a Generative Adversarial Network with a mixed attention mechanism (MA-GAN) to solve these problems. The method generates consistent objects or scenes by correlating remote features in the image, thus improving the ability to create details. First, channel-attention and self-attention mechanisms are added to the generator and discriminator. Then, spectral normalization is introduced into the generator and discriminator so that the parameter matrices satisfy the Lipschitz constraint, improving the stability of the training process. Qualitative and quantitative evaluations on small-scale benchmarks (CelebA, MNIST, and CIFAR-10) show that the proposed method outperforms other methods. Compared with WGAN-GP (Improved Training of Wasserstein GANs) and SAGAN (Self-Attention Generative Adversarial Networks), the proposed method yields higher classification accuracy, indicating that it can effectively augment small-sample data.
(This article belongs to the Special Issue Multimodal Signal, Image and Video Analysis and Application)
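The two stabilizing ingredients named in the abstract — self-attention for correlating remote features and spectral normalization for the Lipschitz constraint — can be combined in a single block. The PyTorch sketch below shows a SAGAN-style self-attention layer wrapped in spectral normalization; it is a generic illustration under those assumptions, not the MA-GAN code, and the channel-attention branch of the mixed mechanism is omitted.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    # SAGAN-style self-attention: correlates distant spatial positions so the
    # generator can keep remote parts of an object or scene consistent.
    def __init__(self, ch):
        super().__init__()
        sn = nn.utils.spectral_norm  # Lipschitz constraint stabilizes training
        self.q = sn(nn.Conv2d(ch, ch // 8, 1))
        self.k = sn(nn.Conv2d(ch, ch // 8, 1))
        self.v = sn(nn.Conv2d(ch, ch, 1))
        self.gamma = nn.Parameter(torch.zeros(1))  # starts as an identity map

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (B, HW, C/8)
        k = self.k(x).flatten(2)                   # (B, C/8, HW)
        attn = torch.softmax(q @ k, dim=-1)        # (B, HW, HW) attention map
        v = self.v(x).flatten(2)                   # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out

# Usage: drop the block into a generator between convolutional stages.
y = SelfAttention2d(64)(torch.randn(2, 64, 16, 16))
```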

13 pages, 15193 KiB  
Article
Multi-Site and Multi-Scale Unbalanced Ship Detection Based on CenterNet
by Feihu Zhang and Xujia Hou
Electronics 2022, 11(11), 1713; https://doi.org/10.3390/electronics11111713 - 27 May 2022
Cited by 1 | Viewed by 1480
Abstract
Object detection plays an essential role in the computer vision domain, especially machine learning-based approaches, which have developed rapidly in the past decades. However, the adoption of convolutional neural networks in the marine field, such as for ship classification and tracking, has been relatively slow. In this paper, we formulate ship detection as a center-point classification and regression task and discard the non-maximum suppression operation. We first improve the deep layer aggregation network to enhance the feature extraction capability for tiny targets, then reduce the number of parameters through a lightweight convolution module, and finally employ a unique activation function to enhance the nonlinearity of the model. The improved network not only handles unbalanced class ratios, but is also more robust in scenarios where both the number and the resolution of samples are unstable. Experimental results demonstrate that the proposed approach achieves outstanding performance, especially on tiny-object detection, compared with current advanced methods. Furthermore, relative to the original CenterNet framework, the mAP of the proposed approach increases by 5.6%.
(This article belongs to the Special Issue Multimodal Signal, Image and Video Analysis and Application)
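The key design choice here — center-point detection without non-maximum suppression — is typically implemented by keeping only local maxima of the center heatmap via a 3×3 max-pool, as in the original CenterNet. The sketch below illustrates that standard decoding step in PyTorch; it is not the paper's improved network, and `decode_centers` with its tensor layouts is an assumption for demonstration.

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, wh, k=100):
    # CenterNet-style decoding: a 3x3 max-pool keeps only local peaks of the
    # class heatmap, which replaces NMS entirely.
    # heatmap: (B, C, H, W) per-class center scores; wh: (B, 2, H, W) box sizes.
    peaks = (heatmap == F.max_pool2d(heatmap, 3, stride=1, padding=1)).float()
    scores, idx = (heatmap * peaks).flatten(1).topk(k)   # top-k over C*H*W
    C, H, W = heatmap.shape[1:]
    cls = idx // (H * W)                                 # class of each peak
    spatial = idx % (H * W)
    ys, xs = spatial // W, spatial % W                   # peak coordinates
    w = wh[:, 0].flatten(1).gather(1, spatial)
    h = wh[:, 1].flatten(1).gather(1, spatial)
    boxes = torch.stack([xs - w / 2, ys - h / 2, xs + w / 2, ys + h / 2], dim=-1)
    return scores, cls, boxes

# Usage: decode the 100 strongest centers from raw network outputs.
scores, cls, boxes = decode_centers(torch.rand(1, 3, 96, 96), torch.rand(1, 2, 96, 96))
```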

16 pages, 43300 KiB  
Article
A Novel Path Voting Algorithm for Surface Crack Detection
by Jianwei Yu, Zhipeng Chen and Zhiming Xiong
Electronics 2022, 11(3), 501; https://doi.org/10.3390/electronics11030501 - 8 Feb 2022
Cited by 1 | Viewed by 1794
Abstract
Path voting is a widely used technique for detecting line structures in images. Traditional path voting either relies on minimal paths or tracks paths by growing seeds; the minimal-path variant requires a starting point and an end point to be set, so its performance depends on the initialization. However, high-quality initialization often requires human interaction, which limits practical applications. In this paper, a fully automatic path-voting method is proposed and applied to crack detection. The proposed path voting segments images by partitioning an image patch along the potential crack path and integrating over paths to form a crack probability map. After path voting, crack seeds are sampled and modeled as a graph whose edge weights are assigned by an attraction-field algorithm. Finally, cracks are extracted using spanning-tree and tree-pruning algorithms. Experimental results demonstrate that the proposed approach can effectively infer cracks from both 2D optical images and 3D depth images.
(This article belongs to the Special Issue Multimodal Signal, Image and Video Analysis and Application)
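To make the final seed-linking stage concrete, here is a toy Python sketch that connects crack seeds with a minimum spanning tree and prunes short dangling branches. It simplifies the paper's pipeline: edge weights are plain Euclidean distances rather than the attraction-field weights, and `link_crack_seeds` and its parameters are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def link_crack_seeds(seeds, prune_len=3.0):
    # Toy seed-linking stage: connect crack seeds with a minimum spanning
    # tree, then iteratively prune leaf edges shorter than prune_len.
    seeds = np.asarray(seeds, dtype=float)             # (N, 2) pixel coords
    mst = minimum_spanning_tree(cdist(seeds, seeds)).toarray()
    adj = (mst + mst.T) > 0
    edges = {tuple(sorted(e)) for e in zip(*np.nonzero(adj))}
    changed = True
    while changed:                                     # strip short dangling branches
        changed = False
        deg = np.zeros(len(seeds), int)
        for a, b in edges:
            deg[a] += 1; deg[b] += 1
        for a, b in list(edges):
            if (deg[a] == 1 or deg[b] == 1) and np.linalg.norm(seeds[a] - seeds[b]) < prune_len:
                edges.remove((a, b)); changed = True
    return edges  # remaining edges trace the crack skeleton

# Usage: link a handful of sampled seed points.
print(link_crack_seeds([(0, 0), (2, 1), (4, 2), (4, 3.5), (30, 30)]))
```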
