Special Issue "Artificial Intelligence for Multimedia Signal Processing"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 31 December 2020.

Special Issue Editors

Prof. Dr. Byung-Gyu Kim
Guest Editor
Department of IT Engineering, Sookmyung Women's University, Seoul, Korea
Interests: image/video signal processing; pattern recognition; computer vision; deep learning; artificial intelligence
Prof. Dr. Dongsan Jun
Guest Editor
Department of Information and Communication Engineering, Kyungnam University, Changwon 51767, Korea
Interests: media; video coding; video compression; video encoder; image processing; realistic digital broadcasting system

Special Issue Information

Dear Colleagues,

At the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a global image recognition contest, the University of Toronto's SuperVision team, led by Prof. Geoffrey Hinton, won first place by a landslide, sparking an explosion of interest in deep learning. Since then, global experts and companies such as Google, Microsoft, NVIDIA, and Intel have been competing to lead artificial intelligence technologies such as deep learning. They are now developing deep-learning-based technologies for application across all industries and are solving many classification and recognition problems.

These artificial intelligence technologies are also being actively applied to broadcasting and multimedia processing. A great deal of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years these efforts have extended to improving the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. In addition, technologies for media creation, processing, editing, and scenario generation are very important research areas in multimedia processing and engineering.

While this Special Issue invites contributions broadly across advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, specific topics include, but are not limited to:

- Signal/image/video processing algorithms for advanced deep learning;

- Fast processing and complexity reduction mechanisms based on deep neural networks;

- Protection technologies for privacy and personalized media data;

- Advanced circuit/system design and analysis based on deep neural networks;

- Image/video-based recognition algorithms using deep neural networks;

- Deep-learning-based speech and audio processing;

- Efficient multimedia sharing schemes using artificial intelligence;

- Artificial intelligence technologies for multimedia creation, processing, editing, and scenario generation;

- Deep-learning-based web data mining and representation.

Prof. Dr. Byung-Gyu Kim
Prof. Dr. Dongsan Jun
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Artificial/computational intelligence
  • Image/video/speech signal processing
  • Advanced deep learning
  • Learning mechanism
  • Multimedia processing

Published Papers (3 papers)


Research

Open Access Article
Recommendations for Different Tasks Based on the Uniform Multimodal Joint Representation
Appl. Sci. 2020, 10(18), 6170; https://doi.org/10.3390/app10186170 - 04 Sep 2020
Abstract
Content curation social networks (CCSNs), such as Pinterest and Huaban, are interest-driven and content-centric. On CCSNs, user interests are represented by a set of boards, and a board is composed of various pins; a pin is an image with a description. All entities, such as users, boards, and categories, can be represented as sets of pins. It is therefore possible to implement entity representation, and the corresponding recommendations, in a uniform representation space built from pins. Furthermore, many pins are re-pinned from others, and the re-pin sequences are recorded on CCSNs. In this paper, a framework that learns the multimodal joint representation of pins, covering text representation, image representation, and multimodal fusion, is proposed. Image representations are extracted from a multilabel convolutional neural network; the multiple labels of pins are obtained automatically from the category distributions in the re-pin sequences, which benefits from the network architecture. Text representations are obtained with the word2vec tool. The two modalities are fused with a multimodal deep Boltzmann machine. On the basis of the pin representation, different recommendation tasks are implemented, including recommending pins or boards to users, recommending thumbnails to boards, and recommending categories to boards. Experimental results on a dataset from Huaban demonstrate that the multimodal joint representation of pins captures user interests, and that it outperforms unimodal representations in the different recommendation tasks. Experiments were also performed to validate the effectiveness of the proposed recommendation methods.
(This article belongs to the Special Issue Artificial Intelligence for Multimedia Signal Processing)
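For readers who want a concrete starting point, the pin-representation pipeline described in the abstract (CNN image features plus averaged word2vec text vectors) can be sketched in Python as follows. This is a minimal sketch, not the authors' code: simple feature concatenation stands in for the paper's multimodal deep Boltzmann machine, and the model choice and dimensions are assumptions.

```python
# Simplified sketch: represent a pin by fusing CNN image features with
# averaged word2vec text vectors. Concatenation is a stand-in for the
# paper's multimodal DBM; model names and dimensions are assumptions.
import numpy as np
import torch
from torchvision import models, transforms
from gensim.models import Word2Vec

# Image branch: a pretrained CNN with its classifier head removed.
cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn.fc = torch.nn.Identity()  # expose the 512-d penultimate features
cnn.eval()
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def image_embedding(pil_image) -> np.ndarray:
    with torch.no_grad():
        return cnn(preprocess(pil_image).unsqueeze(0)).squeeze(0).numpy()

# Text branch: average the word2vec vectors of the pin description.
def text_embedding(tokens, w2v: Word2Vec) -> np.ndarray:
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

def pin_representation(pil_image, tokens, w2v) -> np.ndarray:
    # Late fusion by concatenation of the two modality vectors.
    return np.concatenate([image_embedding(pil_image),
                           text_embedding(tokens, w2v)])
```

In this shared space, a board or user can then be represented as the mean of its pin vectors, and recommendations ranked by cosine similarity, mirroring the uniform-representation idea of the paper.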

Open Access Article
The Application and Improvement of Deep Neural Networks in Environmental Sound Recognition
Appl. Sci. 2020, 10(17), 5965; https://doi.org/10.3390/app10175965 - 28 Aug 2020
Abstract
Neural networks have achieved great results in sound recognition, and many kinds of acoustic features have been tried as training input for such networks. However, there is still doubt about whether a neural network can efficiently extract features from a raw audio signal input. This study improves on the raw-signal-input networks of previous studies by using deeper network architectures, allowing the raw signals to be analyzed more effectively. We also present a discussion of several network settings; with a spectrogram-like conversion, our network reaches an accuracy of 73.55% on the open audio dataset "Dataset for Environmental Sound Classification 50" (ESC50). This study also proposes a network architecture that can combine branches fed with different features. With the help of global pooling, a flexible fusion scheme is integrated into the network. Our experiment successfully combined two networks with different audio feature inputs (the raw audio signal and the log-mel spectrum). With these settings, the proposed ParallelNet reaches an accuracy of 81.55% on ESC50, which also reaches the recognition level of human beings.
(This article belongs to the Special Issue Artificial Intelligence for Multimedia Signal Processing)
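The fusion idea in the abstract, joining branches with different input shapes via global pooling, can be illustrated with the following PyTorch sketch. It is a minimal two-branch network in the spirit of ParallelNet; the layer sizes and kernel choices are illustrative assumptions, not the authors' configuration.

```python
# Sketch of a two-branch audio network fused by global average pooling.
# Layer sizes are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class TwoBranchAudioNet(nn.Module):
    def __init__(self, n_classes: int = 50):
        super().__init__()
        # Branch 1: 1D convolutions directly on the raw waveform (B, 1, T).
        self.raw = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=32, stride=4), nn.ReLU(),
        )
        # Branch 2: 2D convolutions on the log-mel spectrogram (B, 1, F, T').
        self.mel = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(64 + 64, n_classes)

    def forward(self, wave: torch.Tensor, mel: torch.Tensor) -> torch.Tensor:
        # Global average pooling collapses the time/frequency axes, so the
        # branches can be fused by concatenation regardless of input length.
        a = self.raw(wave).mean(dim=2)        # (B, 64)
        b = self.mel(mel).mean(dim=(2, 3))    # (B, 64)
        return self.fc(torch.cat([a, b], dim=1))

# Example: one second of 44.1 kHz audio plus a 64-band log-mel spectrogram.
net = TwoBranchAudioNet()
logits = net(torch.randn(8, 1, 44100), torch.randn(8, 1, 64, 173))
```

Because each branch is pooled to a fixed-length vector before fusion, the two feature types can differ in sampling rate, resolution, and dimensionality, which is what makes this fusion scheme flexible.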

Open Access Article
Human Height Estimation by Color Deep Learning and Depth 3D Conversion
Appl. Sci. 2020, 10(16), 5531; https://doi.org/10.3390/app10165531 - 10 Aug 2020
Abstract
In this study, a method for estimating human height from color and depth information is proposed. Color images are processed by a deep learning model (Mask R-CNN) to detect the human body and the human head separately. If color images are not available for extracting the human body region, for example in a low-light environment, the body region is instead extracted by comparing the current frame of the depth video with a pre-stored background depth image. The topmost point of the head region is taken as the top of the head, and the bottommost point of the body region as the bottom of the feet. The depth value of the head-top point is corrected to a pixel value with high similarity to neighboring pixels, and the position of the body-bottom point is corrected by calculating the depth gradient between vertically adjacent pixels. The head-top and foot-bottom points are then converted into 3D real-world coordinates using the depth information, and human height is estimated as the Euclidean distance between the two coordinates. Estimation errors are further reduced by averaging the accumulated height estimates. Experimental results show that the estimation errors for a standing person are 0.7% and 2.2% when the human body region is extracted by Mask R-CNN and by the background depth image, respectively.
(This article belongs to the Special Issue Artificial Intelligence for Multimedia Signal Processing)
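The depth-to-3D conversion and distance steps in the abstract follow standard pinhole-camera back-projection, which the short Python sketch below illustrates. The camera intrinsics (fx, fy, cx, cy) and the depth lookup function are generic placeholders, not the paper's calibration or code.

```python
# Sketch of depth-to-3D back-projection and height measurement.
# Intrinsics and the depth lookup are illustrative placeholders.
import numpy as np

def pixel_to_3d(u, v, depth, fx, fy, cx, cy):
    # Pinhole model: pixel (u, v) with depth Z maps to camera-space
    # coordinates X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    z = float(depth)
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def estimate_height(head_px, foot_px, depth_at, intrinsics):
    fx, fy, cx, cy = intrinsics
    head = pixel_to_3d(*head_px, depth_at(*head_px), fx, fy, cx, cy)
    foot = pixel_to_3d(*foot_px, depth_at(*foot_px), fx, fy, cx, cy)
    # Height is the Euclidean distance between the two 3D points; the
    # paper additionally averages accumulated estimates to reduce error.
    return float(np.linalg.norm(head - foot))
```

The accuracy of this measurement hinges on the depth corrections described in the abstract: a noisy depth value at the head top or a foot point on a depth discontinuity would otherwise propagate directly into the 3D coordinates.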
