Open Access | Feature Paper | Article
Appl. Sci. 2017, 7(12), 1313; doi:10.3390/app7121313

A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs

Music Technology Group, Universitat Pompeu Fabra, 08012 Barcelona, Spain
This paper is an extended version of our paper published in Blaauw, M.; Bonada, J. A neural parametric singing synthesizer. In Proceedings of the 18th Annual Conference of the International Speech Communication Association (Interspeech), Stockholm, Sweden, 20–24 August 2017.
These authors contributed equally to this work.
* Author to whom correspondence should be addressed.
Academic Editor: Vesa Välimäki
Received: 3 November 2017 / Revised: 30 November 2017 / Accepted: 12 December 2017 / Published: 18 December 2017
(This article belongs to the Special Issue Sound and Music Computing)

Abstract

We recently presented a new model for singing synthesis based on a modified version of the WaveNet architecture. Instead of modeling the raw waveform, we model features produced by a parametric vocoder that separates the influence of pitch and timbre. This allows the pitch to be conveniently modified to match any target melody, facilitates training on more modest dataset sizes, and significantly reduces training and generation times. However, compared to modeling the waveform directly, our approach places greater importance on effectively handling higher-dimensional outputs, multiple feature streams, and regularization. In this work, we extend our proposed system with additional components for predicting F0 and phonetic timings from a musical score with lyrics. These expression-related features are learned together with timbral features from a single set of natural songs. We compare our method to existing statistical parametric, concatenative, and neural network-based approaches using quantitative metrics as well as listening tests.
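To make the architectural idea concrete, the following is a minimal sketch in PyTorch of a WaveNet-style stack of dilated causal convolutions that predicts frame-level vocoder features instead of raw waveform samples. All names (VocoderFrameModel, n_features, and so on) are illustrative rather than taken from the paper, and for brevity the sketch predicts a point estimate per frame and omits the conditioning on linguistic and score features that the actual system uses.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        # Left-padding of (kernel_size - 1) * dilation keeps the
        # convolution causal: each output frame sees only past frames.
        self.pad = dilation  # kernel_size is fixed at 2 below
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size=2,
                              dilation=dilation)
        self.res = nn.Conv1d(channels, channels, kernel_size=1)
        self.skip = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        h = F.pad(x, (self.pad, 0))          # pad on the left only
        a, b = self.conv(h).chunk(2, dim=1)  # gated activation unit
        h = torch.tanh(a) * torch.sigmoid(b)
        return x + self.res(h), self.skip(h)

class VocoderFrameModel(nn.Module):
    """Autoregressively predicts the next vocoder feature frame
    (illustrative stand-in for a WaveNet-like frame model)."""
    def __init__(self, n_features=60, channels=64,
                 dilations=(1, 2, 4, 8, 16)):
        super().__init__()
        self.inp = nn.Conv1d(n_features, channels, kernel_size=1)
        self.blocks = nn.ModuleList(
            ResidualBlock(channels, d) for d in dilations)
        self.out = nn.Sequential(
            nn.ReLU(), nn.Conv1d(channels, n_features, kernel_size=1))

    def forward(self, frames):  # frames: (batch, n_features, time)
        x = self.inp(frames)
        skips = 0
        for block in self.blocks:
            x, s = block(x)
            skips = skips + s   # sum skip connections across blocks
        return self.out(skips)  # per-step prediction of the next frame

Under these assumptions the model would be trained with teacher forcing (targets shifted one frame ahead of the inputs), and at synthesis time frames would be generated one at a time and fed back as input. Because vocoder frames are far fewer than waveform samples, such a stack needs only a modest receptive field, which is one reason training and generation are much faster than with sample-level models.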
Keywords: singing synthesis; machine learning; deep learning; conditional generative models; autoregressive models

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Share & Cite This Article

MDPI and ACS Style

Blaauw, M.; Bonada, J. A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs. Appl. Sci. 2017, 7, 1313.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
