Deep Learning for Applications in Acoustics: Modeling, Synthesis, and Listening

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics and Vibrations".

Deadline for manuscript submissions: closed (31 July 2020) | Viewed by 53535

Special Issue Editors


Dr. Leonardo Gabrielli
Guest Editor
Department of Information Engineering, Università Politecnica delle Marche, Ancona, Italy

Dr. George Fazekas
Guest Editor
Queen Mary University of London, London, UK

Dr. Juhan Nam
Guest Editor
Korea Advanced Institute of Science and Technology, Korea

Special Issue Information

Dear Colleagues,

The recent introduction of Deep Learning has led to a vast array of breakthroughs in many fields of science and engineering. The data-driven approach has attracted the attention of research communities and has often been successful in yielding solutions to very complex classification and regression problems.

In the fields of audio analysis, audio processing, and acoustic modelling, Deep Learning was initially adopted by borrowing methods from the image processing and computer vision field, and has since developed creative and innovative solutions to suit the domain-specific needs of acoustic research. In this process, researchers face two big challenges: learning meaningful spatio-temporal representations of audio signals and making sense of the black-box nature of neural networks, i.e., extracting knowledge that is useful for scientific advancement.

In this Special Issue, we welcome the submission of papers dealing with novel computational methods involving the modelling, parametrization, and knowledge extraction of acoustic data. Topics of interest include, among others:

  • Applications of Deep Learning to sound synthesis
  • Control and estimation problems in physical modeling
  • Intelligent music production and novel digital audio effects
  • Representation learning and/or transfer of musical composition and performance characteristics, including timbre, style, and playing technique
  • Analysis and modelling of acoustic phenomena, including musical acoustics, speech signals, room acoustics, and environmental, ecological, medical, and machine sounds
  • Machine listening and perception models inspired by human hearing
  • Application of Deep Learning to wave propagation problems in fluids and solids

We aim to foster good research practices in Deep Learning. Considering current scientific and ethical concerns with Deep Learning, including reproducibility and explainability, we strongly support works that are based on open datasets and source code, works that follow a rigorous scientific method, and works that provide evidence and explanations for the observed phenomena.

Dr. Leonardo Gabrielli
Dr. George Fazekas
Dr. Juhan Nam
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website and then going to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Deep learning
  • Sound synthesis
  • Machine listening
  • Audio signal processing
  • Sound event detection
  • Acoustic modelling
  • Digital audio effects
  • Audio style transfer

Published Papers (11 papers)


Editorial

Jump to: Research, Review

4 pages, 164 KiB  
Editorial
Special Issue on Deep Learning for Applications in Acoustics: Modeling, Synthesis, and Listening
by Leonardo Gabrielli, György Fazekas and Juhan Nam
Appl. Sci. 2021, 11(2), 473; https://doi.org/10.3390/app11020473 - 06 Jan 2021
Cited by 2 | Viewed by 2197
Abstract
The recent introduction of Deep Learning has led to a vast array of breakthroughs in many fields of science and engineering [...] Full article

Research

Jump to: Editorial, Review

16 pages, 2685 KiB  
Article
Synthesis of Normal Heart Sounds Using Generative Adversarial Networks and Empirical Wavelet Transform
by Pedro Narváez and Winston S. Percybrooks
Appl. Sci. 2020, 10(19), 7003; https://doi.org/10.3390/app10197003 - 08 Oct 2020
Cited by 13 | Viewed by 3726
Abstract
Currently, there are many works in the literature focused on the analysis of heart sounds, specifically on the development of intelligent systems for the classification of normal and abnormal heart sounds. However, the available heart sound databases are not yet large enough to train generalized machine learning models. Therefore, there is interest in the development of algorithms capable of generating heart sounds that could augment current databases. In this article, we propose a model based on generative adversarial networks (GANs) to generate normal synthetic heart sounds. Additionally, a denoising algorithm is implemented using the empirical wavelet transform (EWT), allowing a decrease in the number of epochs and the computational cost that the GAN model requires. A distortion metric (mel-cepstral distortion) was used to objectively assess the quality of the synthetic heart sounds. The proposed method compared favorably with a state-of-the-art mathematical model based on the morphology of the phonocardiography (PCG) signal. Additionally, several state-of-the-art heart sound classification models were used to test their performance when the GAN-generated synthetic signals were used as the test dataset. In this experiment, good accuracy results were obtained with most of the implemented models, suggesting that the GAN-generated sounds correctly capture the characteristics of natural heart sounds.
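
As an illustration of the evaluation metric mentioned in this abstract, the following is a minimal Python sketch of a mel-cepstral distortion computation between a natural and a synthetic phonocardiogram. It uses librosa MFCCs as a stand-in for the authors' cepstral features; the file names and parameter choices are assumptions, not the paper's exact implementation.

```python
# Minimal sketch of mel-cepstral distortion (MCD) between two signals.
# librosa MFCCs stand in for the paper's cepstral features.
import numpy as np
import librosa

def mel_cepstral_distortion(ref, syn, sr, n_mfcc=13):
    """Frame-averaged MCD (dB) between two signals at the same sampling rate."""
    c_ref = librosa.feature.mfcc(y=ref, sr=sr, n_mfcc=n_mfcc)
    c_syn = librosa.feature.mfcc(y=syn, sr=sr, n_mfcc=n_mfcc)
    n = min(c_ref.shape[1], c_syn.shape[1])          # align frame counts
    diff = c_ref[1:, :n] - c_syn[1:, :n]              # drop the energy term c0
    return np.mean(10.0 / np.log(10) * np.sqrt(2.0 * np.sum(diff ** 2, axis=0)))

# Usage with hypothetical file names:
# ref, sr = librosa.load("natural_pcg.wav", sr=None)
# syn, _  = librosa.load("gan_pcg.wav", sr=sr)
# print(mel_cepstral_distortion(ref, syn, sr))
```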

24 pages, 2907 KiB  
Article
BassNet: A Variational Gated Autoencoder for Conditional Generation of Bass Guitar Tracks with Learned Interactive Control
by Maarten Grachten, Stefan Lattner and Emmanuel Deruty
Appl. Sci. 2020, 10(18), 6627; https://doi.org/10.3390/app10186627 - 22 Sep 2020
Cited by 4 | Viewed by 4798
Abstract
Deep learning has given AI-based methods for music creation a boost over the past years. An important challenge in this field is to balance user control and autonomy in music generation systems. In this work, we present BassNet, a deep learning model for generating bass guitar tracks based on musical source material. An innovative aspect of our work is that the model is trained to learn a temporally stable, two-dimensional latent space variable that offers interactive user control. We empirically show that the model can disentangle bass patterns that require sensitivity to harmony, instrument timbre, and rhythm. An ablation study reveals that this capability is due to the temporal stability constraint on latent space trajectories during training. We also demonstrate that models trained on pop/rock music learn a latent space that offers control over the diatonic characteristics of the output, among other things. Lastly, we present and discuss generated bass tracks for three different music fragments. The work presented here is a step toward the integration of AI-based technology into the workflow of musical content creators.
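
To make the notion of a temporal stability constraint concrete, here is a minimal PyTorch sketch of one way such a penalty on latent trajectories could be written. The two-dimensional latent matches the abstract; everything else (tensor shapes, weighting, the rest of the model) is an assumption, not BassNet's actual training code.

```python
# Sketch of a temporal stability penalty on a 2-D latent trajectory:
# consecutive latent frames are penalised for moving too far apart.
import torch
import torch.nn.functional as F

def temporal_stability_penalty(z: torch.Tensor) -> torch.Tensor:
    """Mean squared difference between consecutive latent frames.
    z is assumed to have shape (batch, time, 2)."""
    return F.mse_loss(z[:, 1:, :], z[:, :-1, :])

# Toy check: a smooth trajectory yields a smaller penalty than a noisy one.
smooth = torch.cumsum(0.01 * torch.randn(1, 100, 2), dim=1)
noisy = torch.randn(1, 100, 2)
print(temporal_stability_penalty(smooth), temporal_stability_penalty(noisy))
```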

21 pages, 2050 KiB  
Article
Assistive Model to Generate Chord Progressions Using Genetic Programming with Artificial Immune Properties
by María Navarro-Cáceres, Javier Félix Merchán Sánchez-Jara, Valderi Reis Quietinho Leithardt and Raúl García-Ovejero
Appl. Sci. 2020, 10(17), 6039; https://doi.org/10.3390/app10176039 - 31 Aug 2020
Cited by 1 | Viewed by 2975
Abstract
In Western tonal music, tension in chord progressions plays an important role in defining the path that a musical composition should follow. The creation of chord progressions that reflect such tension profiles can be challenging for novice composers, as it depends on many subjective factors and is also regulated by multiple theoretical principles. This work presents ChordAIS-Gen, a tool that assists users in generating chord progressions that comply with a given tension profile. We propose an objective measure capable of capturing the tension profile of a chord progression according to different tonal music parameters, namely consonance, hierarchical tension, voice leading, and perceptual distance. This measure is optimized by a genetic programming algorithm combined with an artificial immune system called Opt-aiNet. Opt-aiNet is capable of finding multiple optima in parallel, resulting in multiple candidate solutions for the next chord in a sequence. To validate the objective function, we performed a listening test to evaluate the perceptual quality of the candidate solutions proposed by our system. Most listeners rated the chord progressions proposed by ChordAIS-Gen as better candidates than the discarded progressions. Thus, we propose using the objective values as a proxy for the perceptual evaluation of chord progressions, and we compare the performance of ChordAIS-Gen with other chord progression generators.
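
The sketch below illustrates, in hypothetical form, the kind of weighted multi-term tension objective the abstract describes: a candidate chord's fitness is its deviation from a target tension value, where tension aggregates consonance, hierarchical tension, voice leading, and perceptual distance. The term functions and weights are placeholders, not the measures defined in the paper.

```python
# Hypothetical weighted tension objective for evaluating a candidate chord.
from typing import Callable, Sequence

def tension(chord, context,
            terms: Sequence[Callable], weights: Sequence[float]) -> float:
    """Weighted sum of tension-related terms for a chord in its context."""
    return sum(w * t(chord, context) for w, t in zip(weights, terms))

def fitness(chord, context, target_tension, terms, weights) -> float:
    """Lower is better: deviation from the desired tension profile value."""
    return abs(tension(chord, context, terms, weights) - target_tension)

# Toy usage with dummy terms (each returning a value in [0, 1]):
terms = [lambda c, ctx: 0.2, lambda c, ctx: 0.5]
print(fitness(chord=None, context=None, target_tension=0.4,
              terms=terms, weights=[0.5, 0.5]))
```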

24 pages, 5190 KiB  
Article
A Comparison of Human against Machine-Classification of Spatial Audio Scenes in Binaural Recordings of Music
by Sławomir K. Zieliński, Hyunkook Lee, Paweł Antoniuk and Oskar Dadan
Appl. Sci. 2020, 10(17), 5956; https://doi.org/10.3390/app10175956 - 28 Aug 2020
Cited by 9 | Viewed by 3930
Abstract
The purpose of this paper is to compare the performance of human listeners against selected machine learning algorithms in the task of classifying spatial audio scenes in binaural recordings of music under practical conditions. Three scenes were subject to classification: (1) a music ensemble (a group of musical sources) located in the front, (2) a music ensemble located at the back, and (3) a music ensemble distributed around the listener. In the listening test, undertaken remotely over the Internet, human listeners reached a classification accuracy of 42.5%. For the listeners who passed the post-screening test, the accuracy was greater, approaching 60%. The same classification task was also undertaken automatically using four machine learning algorithms: a convolutional neural network, support vector machines, an extreme gradient boosting framework, and logistic regression. The machine learning algorithms substantially outperformed the human listeners, with the classification accuracy reaching 84% when tested under binaural-room-impulse-response (BRIR) matched conditions. However, when the algorithms were tested under the BRIR mismatched scenario, the accuracy obtained was comparable to that of the listeners who passed the post-screening test, implying that the machine learning algorithms' ability to perform in unknown electro-acoustic conditions still needs to be improved.
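
For readers unfamiliar with the non-neural classifiers in this comparison, the following scikit-learn sketch shows a support vector machine classifying three spatial scene labels from pre-computed binaural features. The random feature matrix is a placeholder for the cues extracted from BRIR-convolved music excerpts; it is not the paper's feature set.

```python
# Sketch: SVM classification of three spatial scenes (front, back, around)
# from placeholder binaural features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))        # 300 excerpts, 40 binaural features each
y = rng.integers(0, 3, size=300)      # 0 = front, 1 = back, 2 = around

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))   # ~chance level on random features
```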

24 pages, 2773 KiB  
Article
Low-Order Spherical Harmonic HRTF Restoration Using a Neural Network Approach
by Benjamin Tsui, William A. P. Smith and Gavin Kearney
Appl. Sci. 2020, 10(17), 5764; https://doi.org/10.3390/app10175764 - 20 Aug 2020
Cited by 3 | Viewed by 2652
Abstract
Spherical harmonic (SH) interpolation is a commonly used method to spatially up-sample sparse head-related transfer function (HRTF) datasets to denser HRTF datasets. However, depending on the number of sparse HRTF measurements and the SH order, this process can introduce distortions into the high-frequency representation of the HRTFs. This paper investigates whether it is possible to restore some of the distorted high-frequency HRTF components using machine learning algorithms. A combination of convolutional auto-encoder (CAE) and denoising auto-encoder (DAE) models is proposed to restore the high-frequency distortion in SH-interpolated HRTFs. Results were evaluated using both perceptual spectral difference (PSD) and localisation prediction models, both of which demonstrated significant improvement after the restoration process.
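
As background to the abstract, the sketch below shows a bare-bones spherical harmonic interpolation: SH coefficients are fitted to sparse measurement directions by least squares and then evaluated on a denser grid. The real-valued toy magnitudes on the horizontal plane and the chosen SH order are illustrative assumptions, not the paper's HRTF processing.

```python
# Sketch: least-squares SH fit on sparse directions, evaluation on a dense grid.
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, azi, col):
    """Complex SH basis evaluated at azimuth/colatitude angles (radians)."""
    cols = [sph_harm(m, n, azi, col)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)

order = 2
azi_sparse = np.linspace(0, 2 * np.pi, 12, endpoint=False)
col_sparse = np.full(12, np.pi / 2)            # measurements on the horizon
h_sparse = np.cos(2 * azi_sparse)              # toy "HRTF" magnitude data

Y = sh_matrix(order, azi_sparse, col_sparse)
coeffs, *_ = np.linalg.lstsq(Y, h_sparse.astype(complex), rcond=None)

azi_dense = np.linspace(0, 2 * np.pi, 72, endpoint=False)
h_dense = (sh_matrix(order, azi_dense, np.full(72, np.pi / 2)) @ coeffs).real
print(h_dense.shape)                           # (72,) up-sampled directions
```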

22 pages, 2689 KiB  
Article
Bioacoustic Classification of Antillean Manatee Vocalization Spectrograms Using Deep Convolutional Neural Networks
by Fernando Merchan, Ariel Guerra, Héctor Poveda, Héctor M. Guzmán and Javier E. Sanchez-Galan
Appl. Sci. 2020, 10(9), 3286; https://doi.org/10.3390/app10093286 - 08 May 2020
Cited by 11 | Viewed by 3632
Abstract
We evaluated the potential of using convolutional neural networks for classifying spectrograms of Antillean manatee (Trichechus manatus manatus) vocalizations. Spectrograms using binary, linear, and logarithmic amplitude formats were considered. Two deep convolutional neural network (DCNN) architectures were tested: linear (fixed filter size) and pyramidal (incremental filter size). Six experiments were devised to test the accuracy obtained for each combination of spectrogram representation and architecture. Results show that binary spectrograms with both linear and pyramidal architectures with dropout provide a classification rate of 94–99% on the training set and 92–98% on the testing set, respectively. The pyramidal network has shorter training and inference times. Results from the convolutional neural networks (CNN) are substantially better than those of a signal processing fast Fourier transform (FFT)-based harmonic search approach in terms of accuracy and F1 score. Taken together, these results demonstrate the validity of using spectrograms and DCNNs for manatee vocalization classification. These results can be used to improve future software and hardware implementations for the estimation of the manatee population in Panama.
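
The following PyTorch sketch gives a rough idea of a "pyramidal" convolutional classifier for fixed-size vocalization spectrograms, where the number of filters grows with depth. Layer widths, kernel sizes, dropout rate, and input shape are assumptions for illustration, not the architectures evaluated in the paper.

```python
# Sketch of a pyramidal CNN for binary classification of spectrograms.
import torch
import torch.nn as nn

class PyramidalCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.5),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes),
        )

    def forward(self, x):              # x: (batch, 1, freq_bins, time_frames)
        return self.classifier(self.features(x))

model = PyramidalCNN()
logits = model(torch.randn(4, 1, 128, 128))    # four dummy spectrograms
print(logits.shape)                             # torch.Size([4, 2])
```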

21 pages, 4045 KiB  
Article
Designing Audio Equalization Filters by Deep Neural Networks
by Giovanni Pepe, Leonardo Gabrielli, Stefano Squartini and Luca Cattani
Appl. Sci. 2020, 10(7), 2483; https://doi.org/10.3390/app10072483 - 04 Apr 2020
Cited by 16 | Viewed by 5090
Abstract
Audio equalization is an active research topic that aims at improving the audio quality of a loudspeaker system by correcting the overall frequency response using linear filters. The estimation of their coefficients is not an easy task, especially in binaural and multipoint scenarios, due to the contribution of multiple impulse responses to each listening point. This paper presents a deep learning approach for tuning filter coefficients employing three different neural network architectures: the Multilayer Perceptron, the Convolutional Neural Network, and the Convolutional Autoencoder. Suitable loss functions are proposed for each architecture and are formulated in terms of the spectral Euclidean distance. The experiments were conducted in an automotive scenario, considering several loudspeakers and microphones. The obtained results show that deep learning techniques give superior performance compared to baseline methods, achieving an almost flat magnitude frequency response.
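
To illustrate the kind of loss the abstract refers to, here is a minimal PyTorch sketch of a spectral Euclidean distance between the magnitude response of an FIR equalization filter cascaded with a measured impulse response and a flat target. The single-channel setting, filter length, and FFT size are simplifying assumptions rather than the paper's multipoint formulation.

```python
# Sketch: spectral Euclidean distance of an equalized response from a flat target.
import torch

def spectral_euclidean_loss(fir_coeffs, room_ir, n_fft=4096):
    """Distance of the equalized magnitude response from a flat (0 dB) target."""
    h_eq = torch.fft.rfft(fir_coeffs, n=n_fft)
    h_room = torch.fft.rfft(room_ir, n=n_fft)
    mag = torch.abs(h_eq * h_room)      # cascade = multiplication in frequency
    target = torch.ones_like(mag)
    return torch.linalg.vector_norm(mag - target)

fir = torch.cat([torch.ones(1), torch.zeros(255)]).requires_grad_()  # identity init
room_ir = torch.randn(1024) * 0.1       # placeholder measured impulse response
room_ir[0] = 1.0
print(spectral_euclidean_loss(fir, room_ir))   # differentiable w.r.t. fir
```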

22 pages, 2090 KiB  
Article
An Analysis of Rhythmic Patterns with Unsupervised Learning
by Matevž Pesek, Aleš Leonardis and Matija Marolt
Appl. Sci. 2020, 10(1), 178; https://doi.org/10.3390/app10010178 - 25 Dec 2019
Cited by 7 | Viewed by 6138
Abstract
This paper presents a model capable of learning the rhythmic characteristics of a music signal through unsupervised learning. The model learns a multi-layer hierarchy of rhythmic patterns ranging from simple structures on lower layers to more complex patterns on higher layers. The learned hierarchy is fully transparent, which enables observation and explanation of the structure of the learned patterns. The model employs a tempo-invariant encoding of patterns and can thus learn and perform inference on tempo-varying and noisy input data. We demonstrate the model's capability of learning distinctive rhythmic structures of different music genres using unsupervised learning. To test its robustness, we show how the model can efficiently extract rhythmic structures from songs with changing time signatures and from live recordings. Additionally, the model's time complexity is empirically tested to show its usability for analysis-related applications.
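
As a simple illustration of tempo-invariant rhythm encoding (not the hierarchical model proposed in the paper), the numpy sketch below maps onset times to positions on a fixed grid of beat subdivisions, so that the same rhythm played at different tempi yields the same pattern.

```python
# Sketch: tempo-invariant encoding of a one-bar rhythmic pattern.
import numpy as np

def tempo_invariant_pattern(onset_times, beat_period, subdivisions=16):
    """Quantize onsets to a fixed grid of subdivisions of a 4-beat bar."""
    phases = (np.asarray(onset_times) / beat_period) % 4.0   # position in beats
    bins = np.floor(phases / 4.0 * subdivisions).astype(int)
    pattern = np.zeros(subdivisions)
    pattern[bins] = 1.0
    return pattern

# The same rhythm at two tempi maps to the same pattern:
slow = tempo_invariant_pattern([0.0, 0.5, 1.0, 1.75], beat_period=0.5)
fast = tempo_invariant_pattern([0.0, 0.25, 0.5, 0.875], beat_period=0.25)
print(np.array_equal(slow, fast))    # True
```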

13 pages, 2806 KiB  
Article
Noise-Robust Voice Conversion Using High-Quefrency Boosting via Sub-Band Cepstrum Conversion and Fusion
by Xiaokong Miao, Meng Sun, Xiongwei Zhang and Yimin Wang
Appl. Sci. 2020, 10(1), 151; https://doi.org/10.3390/app10010151 - 23 Dec 2019
Cited by 10 | Viewed by 3413
Abstract
This paper presents a noise-robust voice conversion method with high-quefrency boosting via sub-band cepstrum conversion and fusion, based on bidirectional long short-term memory (BLSTM) neural networks that convert the vocal tract parameters of a source speaker into those of a target speaker. With the implementation of state-of-the-art machine learning methods, voice conversion has achieved good performance given abundant clean training data. However, the quality and similarity of the converted voice are significantly degraded compared to those of a natural target voice due to various factors, such as limited training data and noisy input speech from the source speaker. To address the problem of noisy input speech, an architecture of voice conversion with statistical filtering and sub-band cepstrum conversion and fusion is introduced. The impact of noise on the converted voice is reduced by the accurate reconstruction of the sub-band cepstrum and the subsequent statistical filtering. By normalizing the mean and variance of the converted cepstrum to those of the target cepstrum in the training phase, a cepstrum filter was constructed to further improve the quality of the converted voice. The experimental results showed that the proposed method significantly improved the naturalness and similarity of the converted voice compared to the baselines, even with noisy inputs from the source speakers.
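
The central mapping step described in the abstract can be sketched as a bidirectional LSTM that converts source-speaker cepstral (sub-band) features into target-speaker features, as in the minimal PyTorch example below. Feature dimensions and layer sizes are assumptions, and the statistical filtering and fusion stages are omitted.

```python
# Sketch: BLSTM mapping from source-speaker cepstral frames to target-speaker frames.
import torch
import torch.nn as nn

class BLSTMConverter(nn.Module):
    def __init__(self, feat_dim=24, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, feat_dim)   # 2x for both directions

    def forward(self, source_cepstrum):               # (batch, frames, feat_dim)
        hidden_states, _ = self.blstm(source_cepstrum)
        return self.proj(hidden_states)                # converted cepstrum

model = BLSTMConverter()
converted = model(torch.randn(2, 200, 24))            # two utterances, 200 frames
print(converted.shape)                                 # torch.Size([2, 200, 24])
```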

Review

Jump to: Editorial, Research

16 pages, 231 KiB  
Review
A Review of Deep Learning Based Methods for Acoustic Scene Classification
by Jakob Abeßer
Appl. Sci. 2020, 10(6), 2020; https://doi.org/10.3390/app10062020 - 16 Mar 2020
Cited by 97 | Viewed by 11841
Abstract
The number of publications on acoustic scene classification (ASC) in environmental audio recordings has constantly increased over the last few years. This growth was mainly stimulated by the annual Detection and Classification of Acoustic Scenes and Events (DCASE) competition, first held in 2013. All competitions so far have involved one or multiple ASC tasks. With a focus on deep learning based ASC algorithms, this article summarizes and groups existing approaches for data preparation, i.e., feature representations, feature pre-processing, and data augmentation, and for data modeling, i.e., neural network architectures and learning paradigms. Finally, the paper discusses current algorithmic limitations and open challenges in order to preview possible future developments towards the real-life application of ASC systems.
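
As a concrete example of the data-preparation stage surveyed in this review, the sketch below computes a log-mel spectrogram with librosa, the feature representation most commonly fed to ASC networks. Parameter values and the file name are illustrative assumptions.

```python
# Sketch: log-mel spectrogram extraction, a typical ASC front end.
import numpy as np
import librosa

def log_mel(path, sr=44100, n_fft=2048, hop_length=1024, n_mels=64):
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)       # (n_mels, frames) in dB

# features = log_mel("street_traffic_scene.wav")      # hypothetical recording
# The features would then be standardized, possibly augmented (e.g., mixup or
# SpecAugment), and passed to a neural network classifier.
```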
