Special Issue "Digital Signal, Image and Video Processing for Emerging Multimedia Technology, Volume II"

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 31 December 2021.

Special Issue Editor

Special Issue Information

Dear Colleagues,

Recent developments in image- and video-based deep learning have enabled new services in the field of multimedia and recognition technology. These recognition and emerging services are built on essential signal and image processing algorithms. In addition, recent realistic media services, including mixed reality, augmented reality, and virtual reality, require very high-definition media creation, personalization, and transmission technologies, and this demand continues to grow. To accommodate these needs, international standardization bodies and industry are studying various digital signal and image processing technologies to provide a variety of new and future media services.

While this Special Issue invites contributions broadly across advanced signal, image, and video processing algorithms and technologies for emerging multimedia services, specific topics include, but are not limited to:

  • Signal/image/video processing algorithms for advanced machine learning
  • Fast, complexity-reducing mechanisms to support real-time systems
  • Technologies for protecting privacy and personal information
  • Advanced circuit and system design and implementation for emerging multimedia services
  • Image/video-based recognition algorithms using deep neural networks
  • Novel applications for emerging multimedia services
  • Efficient media sharing schemes in distributed environments

Prof. Dr. Byung-Gyu Kim
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Emerging multimedia
  • Signal/image/video processing
  • Real-time systems
  • Advanced machine learning
  • Image/video-based deep learning

Published Papers (6 papers)


Research

Article
Context-Based Inter Mode Decision Method for Fast Affine Prediction in Versatile Video Coding
Electronics 2021, 10(11), 1243; https://doi.org/10.3390/electronics10111243 - 24 May 2021
Abstract
Versatile Video Coding (VVC) is the most recent video coding standard, developed by the Joint Video Experts Team (JVET), and achieves a bit-rate reduction of about 50% at perceptually similar quality compared with its predecessor, High Efficiency Video Coding (HEVC). Although VVC delivers this significant coding gain, it does so at the cost of a tremendous increase in encoder complexity. In particular, VVC newly adopted an affine motion estimation (AME) method to overcome the limitations of the translational motion model, at the expense of higher encoding complexity. In this paper, we propose a context-based inter mode decision method for fast affine prediction that determines whether AME is performed during rate-distortion (RD) optimization for the optimal CU-mode decision. Experimental results show that the proposed method reduces the encoding complexity of AME by up to 33% with unnoticeable coding loss compared with the VVC Test Model (VTM).
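The 4-parameter affine model that AME builds on derives a motion vector for every sub-block position from two control-point motion vectors (top-left and top-right of the block). The sketch below is a minimal plain-float illustration of that model, not the VTM implementation; the function name is ours, and VTM uses fixed-point sub-block derivation.

```python
def affine_mv_4param(v0, v1, w, x, y):
    """Motion vector at sub-block position (x, y) of a block of width w,
    derived from control-point MVs v0 (top-left) and v1 (top-right)
    using the 4-parameter affine model."""
    a = (v1[0] - v0[0]) / w  # scaling/rotation term from horizontal MV delta
    b = (v1[1] - v0[1]) / w  # scaling/rotation term from vertical MV delta
    mvx = a * x - b * y + v0[0]
    mvy = b * x + a * y + v0[1]
    return (mvx, mvy)
```

By construction the field reproduces the control points: at (0, 0) it returns v0, and at (w, 0) it returns v1.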

Article
Two-Dimensional Audio Compression Method Using Video Coding Schemes
Electronics 2021, 10(9), 1094; https://doi.org/10.3390/electronics10091094 - 06 May 2021
Abstract
As video compression is one of the core technologies enabling seamless media streaming within the available network bandwidth, it is crucial that media codecs deliver powerful coding performance and high visual quality. Versatile Video Coding (VVC) is the latest video coding standard, developed by the Joint Video Experts Team (JVET), and can compress image or video data by factors in the hundreds; the latest audio coding standard, Unified Speech and Audio Coding (USAC), achieves a compression ratio of about 20:1 for audio or speech data. In this paper, we propose a pre-processing method that generates a two-dimensional (2D) audio signal as the input to a VVC encoder, and investigate the applicability of video coding schemes to 2D audio compression. To evaluate coding performance, we measure both the signal-to-noise ratio (SNR) and bits per sample (bps). The experimental results show the potential of 2D audio encoding using video coding schemes.
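As a rough sketch of the kind of pre-processing the paper describes, a 1-D audio signal can be quantized to the sample range of an image and packed row by row into a 2-D array suitable as an encoder input. The function name, frame width, and normalization below are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def audio_to_frame(samples, width=64, bit_depth=8):
    """Pack a 1-D audio signal into a 2-D uint array so it can be fed
    to an image/video encoder: normalize to the unsigned sample range,
    zero-pad to a multiple of `width`, then reshape row by row."""
    x = np.asarray(samples, dtype=np.float64)
    lo, hi = x.min(), x.max()
    # Map to [0, 2**bit_depth - 1]; guard against a constant signal.
    scale = (2 ** bit_depth - 1) / (hi - lo) if hi > lo else 0.0
    q = np.round((x - lo) * scale).astype(np.uint8)
    q = np.pad(q, (0, (-len(q)) % width))  # zero-pad the last row
    return q.reshape(-1, width)
```

The inverse mapping (de-quantization back to the original amplitude range) would need `lo`, `hi`, and the original length as side information.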

Article
New Image Encryption Algorithm Using Hyperchaotic System and Fibonacci Q-Matrix
Electronics 2021, 10(9), 1066; https://doi.org/10.3390/electronics10091066 - 30 Apr 2021
Abstract
In the age of information technology, daily life requires transmitting millions of images between users, and securing these images is essential. Digital image encryption is a well-known technique for securing image content: digital images are converted into noise-like images using secret keys, and restoring the originals requires the same keys. Most image encryption techniques rely on two steps, confusion and diffusion. In this work, a new image encryption algorithm is presented that uses a hyperchaotic system and the Fibonacci Q-matrix. The original image is first confused using random numbers generated by a six-dimensional hyperchaotic system; the permuted image is then diffused using the Fibonacci Q-matrix. The proposed algorithm was tested against noise and data-cut attacks and evaluated with histograms, key space, and key sensitivity. Moreover, its performance was compared with several existing algorithms in terms of entropy, correlation coefficients, and robustness against attack. The proposed algorithm achieves an excellent security level and outperforms the existing image encryption algorithms.
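The Fibonacci Q-matrix at the heart of the diffusion step is Q = [[1, 1], [1, 0]], whose n-th power contains consecutive Fibonacci numbers: Q**n = [[F(n+1), F(n)], [F(n), F(n-1)]]. Since det(Q**n) = (-1)**n, block-wise multiplication modulo 256 is invertible, which is what makes it usable for diffusion. A minimal sketch (function names and block scanning are ours; the paper's exact key schedule is not reproduced):

```python
import numpy as np

def fib_q_power(n):
    """Q**n = [[F(n+1), F(n)], [F(n), F(n-1)]] for Q = [[1, 1], [1, 0]]."""
    Q = np.array([[1, 1], [1, 0]], dtype=np.int64)
    return np.linalg.matrix_power(Q, n)

def diffuse(image, n=10):
    """Diffuse a uint8 image with even dimensions by multiplying each
    2x2 block by Q**n modulo 256. Decryption applies the inverse of
    Q**n mod 256, which exists because det(Q**n) = +/-1."""
    Qn = fib_q_power(n) % 256
    out = image.astype(np.int64)
    h, w = image.shape
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i:i + 2, j:j + 2] = (Qn @ out[i:i + 2, j:j + 2]) % 256
    return out.astype(np.uint8)
```

In the full algorithm this step operates on the image that was already permuted by the hyperchaotic sequence, not directly on the plain image.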

Article
WMNet: A Lossless Watermarking Technique Using Deep Learning for Medical Image Authentication
Electronics 2021, 10(8), 932; https://doi.org/10.3390/electronics10080932 - 14 Apr 2021
Abstract
Traditional watermarking techniques extract the watermark from a suspected image so that the copyright information identifying the image owner can be judged by the naked eye or by similarity measures such as bit error rate and normalized correlation. However, this process should be more objective. In this paper, we implement WMNet, a deep learning model that accurately identifies watermark copyright information. Building deep learning models normally requires collecting a large amount of training data; while constructing WMNet, we instead simulated the distortion process to generate a large number of distorted watermarks and collected them into a training dataset. However, not all watermarks in the training dataset could properly convey copyright information, so, according to a set of restrictions, we divided them into two categories. WMNet could thus learn to identify the copyright information the watermarks contained and assist in the copyright verification process. Even when the retrieved watermark is incomplete, the copyright information it carries can still be interpreted objectively and accurately. The results show that the proposed method is effective.

Article
Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval
Electronics 2020, 9(12), 2125; https://doi.org/10.3390/electronics9122125 - 11 Dec 2020
Abstract
Text-video retrieval faces a great challenge in the semantic gap between cross-modal information. Some existing methods transform the text or video into a shared subspace to measure their similarity; however, they do not impose a semantic consistency constraint when associating the semantic encodings of the two modalities, so the association is poor. In this paper, we propose a multi-modal retrieval algorithm based on semantic association and multi-task learning. First, multi-level features of the video and text are extracted with multiple deep networks, so that the information of both modalities is fully encoded. Then, in the common feature space into which both modalities are mapped, we propose a multi-task framework combining semantic similarity measurement with semantic consistency classification based on text-video features. The consistency classification task constrains the learning of the semantic association task, so multi-task learning guides better feature mapping of the two modalities and optimizes the construction of the unified feature subspace. Finally, experimental results on the Microsoft Video Description (MSVD) and MSR-Video to Text (MSR-VTT) datasets surpass existing work, demonstrating that our algorithm improves cross-modal retrieval performance.
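A toy version of the two-task objective described above combines a pairwise ranking loss over a batch of text-video embeddings with a binary semantic-consistency term. All names, the margin, and the task weight `alpha` are illustrative assumptions; the authors' actual losses and networks are not reproduced here.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between two sets of row vectors."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def multitask_loss(text_emb, video_emb, margin=0.2, alpha=0.5):
    """Ranking loss pulling matched text-video pairs above mismatched
    ones by `margin`, plus a binary consistency term (matched pairs
    are positives, all mismatched pairs negatives), weighted by alpha."""
    S = cosine_sim(text_emb, video_emb)  # (n, n); diagonal = matched pairs
    pos = np.diag(S)
    viol = np.maximum(0.0, margin + S - pos[:, None])  # margin violations
    np.fill_diagonal(viol, 0.0)
    rank_loss = viol.mean()
    # Consistency classification: sigmoid of similarity, binary
    # cross-entropy against the identity matrix of pair labels.
    p = 1.0 / (1.0 + np.exp(-S))
    y = np.eye(len(S))
    bce = -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)).mean()
    return (1 - alpha) * rank_loss + alpha * bce
```

In the paper's framework the embeddings would come from the learned multi-level text and video encoders rather than being given directly.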

Article
A Robust Forgery Detection Method for Copy–Move and Splicing Attacks in Images
Electronics 2020, 9(9), 1500; https://doi.org/10.3390/electronics9091500 - 12 Sep 2020
Abstract
Internet of Things (IoT) image sensors, social media, and smartphones generate huge volumes of digital images every day. The easy availability and usability of photo-editing tools have made forgery attacks, primarily splicing and copy–move attacks, effortless, causing cybercrime to rise. While several models have been proposed in the literature for detecting these attacks, their robustness has not been investigated when (i) only a small number of tampered images are available for model building, or (ii) images from IoT sensors are distorted by rotation or scaling caused by unwanted or unexpected changes in the sensors' physical set-up. Moreover, further improvement in detection accuracy is needed for real-world security management systems. To address these limitations, this paper proposes an innovative image forgery detection method based on the Discrete Cosine Transform (DCT), the Local Binary Pattern (LBP), and a new feature extraction step using the mean operator. First, images are divided into non-overlapping fixed-size blocks and a 2D block DCT is applied to capture changes due to forgery. LBP is then applied to the magnitude of the DCT array to enhance forgery artifacts. Finally, the mean value of each cell across all LBP blocks is computed, which yields a fixed number of features and a more computationally efficient method. Using a Support Vector Machine (SVM), the proposed method was extensively tested on four well-known, publicly available grayscale and color image forgery datasets, and additionally on an IoT-based image forgery dataset that we built. Experimental results show the superiority of the proposed method over recent state-of-the-art methods in widely used performance metrics and computational time, and demonstrate robustness against low availability of forged training samples.
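The feature pipeline described above (block DCT, LBP on the DCT magnitudes, per-cell mean across blocks) can be sketched as follows. This is our own minimal reconstruction with an illustrative block size and a basic LBP variant, not the authors' implementation.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows = frequencies)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] *= 1 / np.sqrt(2)
    return C * np.sqrt(2 / n)

def lbp(block):
    """Basic 8-neighbour LBP code per cell (neighbours wrap around at
    the block edges for simplicity)."""
    out = np.zeros(block.shape, dtype=np.uint8)
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (di, dj) in enumerate(offs):
        shifted = np.roll(np.roll(block, -di, 0), -dj, 1)
        out |= ((shifted >= block) << bit).astype(np.uint8)
    return out

def forgery_features(img, bs=8):
    """Block DCT -> LBP of |DCT| -> per-cell mean across all blocks,
    giving a fixed-length bs*bs feature vector for an SVM."""
    C = dct_matrix(bs)
    h, w = (d - d % bs for d in img.shape)  # crop to whole blocks
    cells = []
    for i in range(0, h, bs):
        for j in range(0, w, bs):
            d = C @ img[i:i + bs, j:j + bs] @ C.T  # 2D block DCT
            cells.append(lbp(np.abs(d)))
    return np.mean(cells, axis=0).ravel()
```

The mean over blocks is what keeps the feature dimension fixed regardless of image size, which is the efficiency point the abstract makes.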
