Signal Processing Based on Convolutional Neural Network

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Processes".

Deadline for manuscript submissions: closed (31 March 2023) | Viewed by 22735

Special Issue Editor


Guest Editor
Department of Electrical, Electronic and Computer Engineering, University of Catania, 95125 Catania, Italy
Interests: biomedical informatics; EEG; biometrics; signal theory; RMI

Special Issue Information

Dear Colleagues,

Machine Learning (ML) has recently attracted a great deal of attention in the area of signal processing due to its intrinsic ability to analyze signals in both the time and frequency domains. Convolutional Neural Networks (CNNs) are a class of machine learning algorithms used in many fields, especially pattern recognition, signal classification, signal processing, computer vision, and biomedical technologies.

In recent years, research on signal processing has extended towards artificial intelligence techniques and, in particular, towards recent machine learning methods, including modern Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs). Their main advantages are greater accuracy, in terms of robustness to signal degradation, and lower computational complexity, thanks to the possibility of processing data directly in the time domain without necessarily having to engineer feature sets, which are typically obtained in the frequency domain. The application fields are numerous: speech recognition and identification, speech synthesis, classification of signals (image, speech, audio, and medical), emotion recognition, automatic diagnosis, and advanced methods and algorithms for smart sensors.
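As the paragraph above notes, CNNs can operate directly on time-domain samples instead of handcrafted frequency-domain features. A minimal sketch of that idea follows: a one-layer 1-D convolution over a raw signal in NumPy. The filter width, filter count, and the random (untrained) kernels are illustrative assumptions, not any specific model from this issue.

```python
import numpy as np

def conv1d(x, kernels, stride=1):
    """Valid cross-correlation of a 1-D signal with a bank of kernels."""
    k = kernels.shape[1]
    n = (len(x) - k) // stride + 1
    windows = np.stack([x[i * stride : i * stride + k] for i in range(n)])
    return windows @ kernels.T                       # shape: (n, num_kernels)

def tiny_cnn_features(signal, kernels):
    """One convolution layer + ReLU + global average pooling over time."""
    activations = np.maximum(conv1d(signal, kernels), 0.0)   # ReLU
    return activations.mean(axis=0)                  # one feature per kernel

rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 800))   # toy time-domain signal
kernels = rng.standard_normal((8, 16))               # 8 untrained filters, width 16
features = tiny_cnn_features(x, kernels)             # feature vector of length 8
```

In a trained network, the kernels would be learned end-to-end from labeled signals rather than drawn at random.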

This Special Issue is devoted to reporting novel scientific ideas, approaches, results, and (prototype) solutions/applications on signal processing algorithms based on CNNs. Contributions are solicited in the wide spectrum of topics listed below:

  • Digital Signal Processing based on Machine Learning;
  • Signal Processing Algorithms and Neural Networks;
  • Artificial Intelligence for Multimedia Signal Processing;
  • Signal Detection using Machine Learning;
  • CNNs and DNNs for Signal Classification and Coding;
  • Application of CNNs to the Diagnosis of Biomedical Signals;
  • Audio Forensics Analysis based on Machine Learning;
  • Video Signal Processing and CNNs;
  • Computer Vision based on CNNs;
  • Pattern Recognition and Machine Learning;
  • Rainfall Estimation using Convolutional Neural Network.

Dr. Francesco Beritelli
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (7 papers)


Research

22 pages, 1002 KiB  
Article
Digital Audio Tampering Detection Based on Deep Temporal–Spatial Features of Electrical Network Frequency
by Chunyan Zeng, Shuai Kong, Zhifeng Wang, Kun Li and Yuhao Zhao
Information 2023, 14(5), 253; https://doi.org/10.3390/info14050253 - 22 Apr 2023
Cited by 8 | Viewed by 2140
Abstract
In recent years, digital audio tampering detection methods that extract electrical network frequency (ENF) features from audio have been widely applied. However, most ENF-based detection methods focus on spatial features only, without an effective representation of temporal features, and do not fully exploit the information in the shallow ENF features, which leads to low detection accuracy. Therefore, this paper proposes a new method for digital audio tampering detection based on deep temporal–spatial ENF features. To extract these features, a high-accuracy ENF phase sequence is first obtained using the first-order Discrete Fourier Transform (DFT); then, different frame processing methods are used to extract shallow temporal and spatial features from the ENF phase. To fully exploit the information in the shallow features, a parallel RDTCN-CNN network is constructed, in which a Residual Dense Temporal Convolutional Network (RDTCN) processes temporal information and a Convolutional Neural Network (CNN) processes spatial information. A branch attention mechanism adaptively weights the deep temporal and spatial features to obtain a temporal–spatial representation with greater capacity, and an MLP network finally decides whether the audio has been tampered with. Experimental results show that the proposed method outperforms four baseline methods in terms of accuracy and F1-score.
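The phase-based ENF extraction described in the abstract can be illustrated with a simplified sketch: each frame is correlated against a complex exponential at the nominal mains frequency (a single-bin DFT), and the angle of the result gives the per-frame ENF phase. This is only a toy version of the paper's first-order DFT estimator; the sampling rate, frame length, and 50 Hz nominal frequency below are assumptions for the example.

```python
import numpy as np

def enf_phase_sequence(x, fs, f_enf=50.0, frame_s=1.0, hop_s=0.5):
    """Per-frame ENF phase: angle of the single-bin DFT at f_enf."""
    n, h = int(frame_s * fs), int(hop_s * fs)
    probe = np.exp(-2j * np.pi * f_enf * np.arange(n) / fs)  # DFT basis vector
    phases = [np.angle(np.sum(x[s:s + n] * probe))
              for s in range(0, len(x) - n + 1, h)]
    return np.unwrap(np.array(phases))

fs = 1000
t = np.arange(0, 10, 1 / fs)
hum = np.cos(2 * np.pi * 50.0 * t + 0.3)   # synthetic mains hum, phase 0.3 rad
phi = enf_phase_sequence(hum, fs)          # near-constant ~0.3 for a clean hum
```

For tampering detection, the point is that splices and deletions show up as abrupt jumps in this phase sequence, while an unedited recording yields a smooth one.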
(This article belongs to the Special Issue Signal Processing Based on Convolutional Neural Network)

23 pages, 4407 KiB  
Article
Empirical Comparison between Deep and Classical Classifiers for Speaker Verification in Emotional Talking Environments
by Ali Bou Nassif, Ismail Shahin, Mohammed Lataifeh, Ashraf Elnagar and Nawel Nemmour
Information 2022, 13(10), 456; https://doi.org/10.3390/info13100456 - 27 Sep 2022
Cited by 3 | Viewed by 1430
Abstract
Speech signals carry various kinds of information about the speaker, such as age, gender, accent, language, health, and emotions. Emotions are conveyed through modulations of facial and vocal expressions. This paper conducts an empirical comparison between classical classifiers, namely the Gaussian Mixture Model (GMM), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Artificial Neural Networks (ANN), and deep learning classifiers, i.e., Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Gated Recurrent Unit (GRU), in addition to the i-vector approach, for a text-independent speaker verification task in neutral and emotional talking environments. The deep models undergo hyperparameter tuning using grid search. The models are trained and tested on a private Arabic Emirati speech database, the Ryerson Audio–Visual Database of Emotional Speech and Song (RAVDESS), and the public Crowd-Sourced Emotional Multimodal Actors (CREMA) database. Evaluation is carried out using the Equal Error Rate (EER) and Area Under the Curve (AUC) scores. The findings reveal that deep architectures do not necessarily outperform classical classifiers: among the classical classifiers, the GMM yields the lowest EER values and the best AUC scores across all datasets, and the i-vector model surpasses all the fine-tuned deep models (CNN, LSTM, and GRU) on both evaluation metrics for neutral as well as emotional speech. In addition, the GMM outperforms the i-vector approach on the Emirati and RAVDESS databases.
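The abstract's main metric, the Equal Error Rate (EER), is the operating point where the false-acceptance and false-rejection rates coincide. A minimal sketch of its computation follows; the Gaussian score distributions for genuine and impostor trials are purely illustrative, not data from the paper.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: sweep thresholds until false-accept rate meets false-reject rate."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= th).mean() for th in thresholds])  # false accepts
    frr = np.array([(genuine < th).mean() for th in thresholds])    # false rejects
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

rng = np.random.default_rng(1)
genuine = rng.normal(2.0, 1.0, 1000)     # same-speaker trial scores (synthetic)
impostor = rng.normal(-2.0, 1.0, 1000)   # different-speaker trial scores
eer = equal_error_rate(genuine, impostor)  # well-separated scores give a low EER
```

A lower EER means the verifier separates target and non-target speakers better, which is why the paper reports it alongside AUC.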

24 pages, 3279 KiB  
Article
Saliency-Enabled Coding Unit Partitioning and Quantization Control for Versatile Video Coding
by Wei Li, Xiantao Jiang, Jiayuan Jin, Tian Song and Fei Richard Yu
Information 2022, 13(8), 394; https://doi.org/10.3390/info13080394 - 19 Aug 2022
Cited by 4 | Viewed by 1850
Abstract
The latest video coding standard, Versatile Video Coding (VVC), has greatly improved coding efficiency over its predecessor, High Efficiency Video Coding (HEVC), but at the expense of sharply increased complexity. In the context of perceptual video coding (PVC), visual saliency models that exploit the characteristics of the human visual system to improve coding efficiency have become a reliable approach, thanks to advances in computer performance and visual algorithms. In this paper, a novel VVC optimization scheme compliant with the PVC framework is proposed, consisting of a fast coding unit (CU) partition algorithm and a quantization control algorithm. First, based on the visual saliency model, we propose a fast CU partition scheme that re-determines the CU partition depth by computing the Scharr operator and the variance, together with an execution decision for intra sub-partitions (ISP), to reduce coding complexity. Second, a quantization control algorithm is proposed that adjusts the quantization parameter based on a multi-level classification of saliency values at the CU level to reduce the bitrate. Compared with the reference model, experimental results indicate that the proposed method reduces computational complexity by about 47.19% and achieves an average bitrate saving of 3.68%, with reasonable peak signal-to-noise ratio losses and nearly the same subjective perceptual quality.
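The fast CU-partition idea, deciding whether a block is worth splitting further from its Scharr gradient strength and variance, can be sketched as follows. The 3×3 Scharr kernels are standard; the thresholds and the split rule are illustrative assumptions, not the paper's actual decision logic, which also involves saliency and the ISP decision.

```python
import numpy as np

SCHARR_X = np.array([[ 3, 0,  -3],
                     [10, 0, -10],
                     [ 3, 0,  -3]], dtype=float)
SCHARR_Y = SCHARR_X.T

def conv2d_valid(img, kernel):
    """Naive 3x3 valid convolution (cross-correlation)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * kernel)
    return out

def keep_splitting(block, grad_thresh, var_thresh):
    """Split a CU further only if it shows enough edges or texture."""
    grad = np.mean(np.abs(conv2d_valid(block, SCHARR_X))
                   + np.abs(conv2d_valid(block, SCHARR_Y)))
    return grad > grad_thresh or block.var() > var_thresh

rng = np.random.default_rng(2)
flat = np.full((16, 16), 128.0)            # homogeneous block: stop splitting
textured = rng.uniform(0, 255, (16, 16))   # detailed block: keep splitting
```

Skipping the recursive partition search on flat blocks is where the complexity saving comes from.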

14 pages, 1941 KiB  
Article
An Explainable Fake News Detector Based on Named Entity Recognition and Stance Classification Applied to COVID-19
by Giorgio De Magistris, Samuele Russo, Paolo Roma, Janusz T. Starczewski and Christian Napoli
Information 2022, 13(3), 137; https://doi.org/10.3390/info13030137 - 7 Mar 2022
Cited by 32 | Viewed by 5031
Abstract
Over the last few years, the phenomenon of fake news has become an important issue, especially during the worldwide COVID-19 pandemic, and a serious risk to public health. Due to the huge amount of information produced on social media platforms such as Facebook and Twitter, it is becoming difficult to check the produced content manually. This study proposes an automatic fake news detection system that supports or disproves a dubious claim while returning a set of documents from verified sources. The system is composed of multiple modules and makes use of techniques from machine learning, deep learning, and natural language processing. These techniques are used to select relevant documents, to find among them the ones most similar to the tested claim, and to determine their stances. The proposed system is intended to check medical news and, in particular, the trustworthiness of posts related to the COVID-19 pandemic, vaccines, and cures.
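The document-selection step the abstract mentions, finding verified-source documents similar to a tested claim, can be illustrated with a minimal bag-of-words cosine-similarity ranker. The real system uses far richer NLP techniques (named entity recognition, stance classification); the toy corpus below is invented for the example.

```python
import numpy as np
from collections import Counter

def bow_vector(text, vocab):
    """Raw term-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0.0 else float(a @ b / denom)

def rank_documents(claim, documents):
    """Indices of documents, most similar to the claim first."""
    vocab = sorted({w for text in documents + [claim] for w in text.lower().split()})
    c = bow_vector(claim, vocab)
    scores = [cosine(c, bow_vector(d, vocab)) for d in documents]
    return sorted(range(len(documents)), key=lambda i: -scores[i])

docs = ["the vaccine was tested in clinical trials",
        "a new smartphone was released today",
        "clinical trials confirm the vaccine is safe"]
order = rank_documents("is the vaccine safe after trials", docs)  # best match first
```

The retrieved top-ranked documents would then feed the stance-classification stage, which decides whether each one supports or contradicts the claim.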

29 pages, 1237 KiB  
Article
Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet
by Axel Roebel and Frederik Bous
Information 2022, 13(3), 103; https://doi.org/10.3390/info13030103 - 23 Feb 2022
Cited by 5 | Viewed by 5046
Abstract
The use of the mel spectrogram as a signal parameterization for voice generation is quite recent and linked to the development of neural vocoders: deep neural networks that reconstruct high-quality speech from a given mel spectrogram. While initially developed for speech synthesis, neural vocoders have also been studied in the context of voice attribute manipulation, opening new means for voice processing in audio production. However, to apply neural vocoders in real-world applications, two problems need to be addressed: (1) to support use in professional audio workstations, the computational complexity should be small; (2) the vocoder needs to support a large variety of speakers, differences in voice quality, and a wide range of intensities potentially encountered during audio production. In this context, the present study provides a detailed description of the Multi-Band Excited WaveNet, a fully convolutional neural vocoder built around signal processing blocks. It evaluates the performance of the vocoder when trained on a variety of multi-speaker and multi-singer databases, including an experimental evaluation of the neural vocoder trained on speech and singing voices. Addressing the problem of intensity variation, the study introduces a new adaptive signal normalization scheme that allows robust compensation for dynamic and static gain variations. Evaluations are performed using objective measures and a number of perceptual tests that include different neural vocoder algorithms known from the literature. The results confirm that the proposed vocoder compares favorably to the state of the art in its capacity to generalize to unseen voices and voice qualities. The remaining challenges are discussed.
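The adaptive signal normalization mentioned above can be illustrated generically: estimate short-term signal power with a sliding window and rescale toward a target RMS, compensating both a static gain offset and slow gain drift. The window length and target level below are assumptions for the example; the paper's actual scheme is specific to its vocoder.

```python
import numpy as np

def adaptive_gain_normalize(x, fs, win_s=0.4, target_rms=0.1, eps=1e-8):
    """Rescale toward a target short-term RMS using a sliding power estimate."""
    n = max(1, int(win_s * fs))
    local_power = np.convolve(x ** 2, np.ones(n) / n, mode="same")
    gain = target_rms / np.sqrt(local_power + eps)  # larger gain where signal is weak
    return x * gain

fs = 8000
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 220 * t) * (0.2 + 0.8 * t)   # tone with steadily rising gain
y = adaptive_gain_normalize(x, fs)                  # roughly constant loudness
```

Such normalization lets a vocoder see inputs at a consistent level regardless of how the recording was gained.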

19 pages, 2004 KiB  
Article
A Bottleneck Auto-Encoder for F0 Transformations on Speech and Singing Voice
by Frederik Bous and Axel Roebel
Information 2022, 13(3), 102; https://doi.org/10.3390/info13030102 - 23 Feb 2022
Cited by 3 | Viewed by 3864
Abstract
In this publication, we present a deep learning-based method to transform the f0 in speech and singing voice recordings. The f0 transformation is performed by training an auto-encoder on the voice signal's mel spectrogram and conditioning the auto-encoder on the f0. Inspired by AutoVC/F0, we apply an information bottleneck to disentangle the f0 from the latent code. The resulting model successfully applies the desired f0 to the input mel spectrograms and adapts the speaker identity when necessary, e.g., if the requested f0 falls outside the range of the source speaker or singer. Using the mean f0 error in the transformed mel spectrograms, we define a disentanglement measure and perform a study of the required bottleneck size. The study reveals that, to remove the f0 from the auto-encoder's latent code, the bottleneck size should be smaller than four for singing and smaller than nine for speech. Through a perceptual test, we compare the audio quality of the proposed auto-encoder to f0 transformations obtained with a classical vocoder; the test confirms that the audio quality is better for the auto-encoder. Finally, a visual analysis of the latent code for the two-dimensional case is carried out. We observe that the auto-encoder encodes phonemes as repeated discontinuous temporal gestures within the latent code.
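The mean f0 error underlying the paper's disentanglement measure can be sketched as a frame-wise pitch deviation, here expressed in cents (hundredths of a semitone). The unvoiced-frame handling and the example contours are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def mean_f0_error_cents(f0_target, f0_measured):
    """Mean absolute pitch deviation in cents; unvoiced frames (f0 == 0) skipped."""
    voiced = (f0_target > 0) & (f0_measured > 0)
    cents = 1200.0 * np.log2(f0_measured[voiced] / f0_target[voiced])
    return float(np.mean(np.abs(cents)))

f0_target = np.array([220.0, 220.0, 247.0, 0.0])    # requested contour (0 = unvoiced)
f0_measured = np.array([221.0, 219.0, 247.0, 0.0])  # contour of the transformed audio
err = mean_f0_error_cents(f0_target, f0_measured)   # a few cents of deviation
```

Intuitively, if the latent code still carries f0 information, the decoder can ignore the requested f0 and this error grows, which is what the bottleneck-size study probes.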

13 pages, 2785 KiB  
Article
Adaptive Feature Pyramid Network to Predict Crisp Boundaries via NMS Layer and ODS F-Measure Loss Function
by Gang Sun, Hancheng Yu, Xiangtao Jiang and Mingkui Feng
Information 2022, 13(1), 32; https://doi.org/10.3390/info13010032 - 12 Jan 2022
Cited by 1 | Viewed by 1775
Abstract
Edge detection is one of the fundamental computer vision tasks. Recent methods for edge detection based on convolutional neural networks (CNNs) typically employ a weighted cross-entropy loss; their predicted edge maps are thick and need post-processing before the optimal dataset scale (ODS) F-measure can be calculated for evaluation. To achieve end-to-end training, we propose a non-maximum suppression (NMS) layer that produces sharp boundaries without post-processing, so that the ODS F-measure can be calculated directly on them; an ODS F-measure loss function is then proposed to train the network. In addition, we propose an adaptive multi-level feature pyramid network (AFPN) to better fuse features from different levels. Furthermore, to enrich the multi-scale features learned by AFPN, we introduce a pyramid context module (PCM) that uses dilated convolutions to extract multi-scale features. Experimental results indicate that the proposed AFPN achieves state-of-the-art performance on the BSDS500 dataset (ODS F-score of 0.837) and the NYUDv2 dataset (ODS F-score of 0.780).
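The NMS layer's purpose, thinning thick edge responses into sharp boundaries, can be illustrated with a non-differentiable toy version that keeps a pixel only if it is a local maximum along its dominant gradient axis. The paper's layer is differentiable and trained end-to-end; the two-direction quantization here is a deliberate simplification.

```python
import numpy as np

def nms_thin(edge_prob, gx, gy):
    """Suppress non-maxima along the dominant gradient axis (2-direction toy NMS)."""
    h, w = edge_prob.shape
    out = np.zeros_like(edge_prob)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if abs(gx[i, j]) >= abs(gy[i, j]):        # mostly horizontal gradient
                n1, n2 = edge_prob[i, j - 1], edge_prob[i, j + 1]
            else:                                     # mostly vertical gradient
                n1, n2 = edge_prob[i - 1, j], edge_prob[i + 1, j]
            if edge_prob[i, j] >= n1 and edge_prob[i, j] >= n2:
                out[i, j] = edge_prob[i, j]           # keep local ridge maxima only
    return out

edge_prob = np.zeros((7, 7))
edge_prob[:, 3], edge_prob[:, 4], edge_prob[:, 5] = 0.4, 0.9, 0.4  # thick vertical edge
gx, gy = np.ones((7, 7)), np.zeros((7, 7))  # gradient points across the edge
thin = nms_thin(edge_prob, gx, gy)          # only the 0.9 ridge survives
```

Once the boundary is one pixel wide, a boundary-matching score such as the ODS F-measure can be computed on the prediction itself, which is what enables the proposed loss.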
