Open Access Article

Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks

Escuela de Ingeniería Eléctrica, Universidad de Costa Rica, San José 11501-2060, Costa Rica
Biomimetics 2019, 4(2), 39; https://doi.org/10.3390/biomimetics4020039
Received: 14 March 2019 / Revised: 16 May 2019 / Accepted: 22 May 2019 / Published: 28 May 2019
(This article belongs to the Special Issue Bioinspired Intelligence)
Several researchers have proposed deep learning-based post-filters to increase the quality of statistical parametric speech synthesis. These post-filters map synthetic speech to natural speech, treating the different parameters separately and trying to reduce the gap between them. Long Short-Term Memory (LSTM) neural networks have been applied successfully for this purpose, but many aspects of the results and of the process itself can still be improved. In this paper, we introduce a new pre-training approach for the LSTM, with the objective of enhancing the quality of the synthesized speech, particularly in the spectrum, in a more efficient manner. Our approach begins with auto-associative training of one LSTM network, which is then used as the initialization for the post-filters. We show the advantages of this initialization for enhancing the Mel-frequency cepstral parameters of synthetic speech. In most cases, the initialization achieves better results in enhancing the statistical parametric speech spectrum than the common random initialization of the networks.
Keywords: deep learning; LSTM; machine learning; post-filtering; signal processing; speech synthesis
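
The two-stage idea described in the abstract (auto-associative training first, then reuse of the trained weights as the starting point of the post-filter) can be illustrated with a minimal sketch. It is not the paper's implementation: a small linear layer stands in for the LSTM, the three-dimensional frames are random toy data rather than Mel-frequency cepstral coefficients, and the choice to run the auto-associative stage on the synthetic frames is an assumption. All names (`train`, `mse`, `autoencoder`, `pretrained_filter`, `random_filter`) are hypothetical.

```python
import random

def predict(weights, frame):
    # Linear map standing in for the LSTM: one output per weight row.
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]

def mse(weights, inputs, targets):
    # Mean squared error over all frames and all coefficients.
    total, count = 0.0, 0
    for frame, target in zip(inputs, targets):
        for y, t in zip(predict(weights, frame), target):
            total += (y - t) ** 2
            count += 1
    return total / count

def train(weights, inputs, targets, lr=0.5, epochs=50):
    # Full-batch gradient descent on the MSE; returns a new weight matrix.
    dim = len(weights)
    weights = [row[:] for row in weights]
    scale = 2.0 / (len(inputs) * dim)
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(dim)]
        for frame, target in zip(inputs, targets):
            output = predict(weights, frame)
            for i in range(dim):
                err = output[i] - target[i]
                for j in range(dim):
                    grad[i][j] += scale * err * frame[j]
        for i in range(dim):
            for j in range(dim):
                weights[i][j] -= lr * grad[i][j]
    return weights

random.seed(0)
DIM = 3  # toy frame size (assumption); real cepstral vectors are larger

# Toy data: "synthetic" frames are attenuated, noisy copies of "natural" ones.
natural = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(200)]
synthetic = [[0.8 * x + random.uniform(-0.05, 0.05) for x in frame]
             for frame in natural]

def random_weights():
    return [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(DIM)]

# Stage 1: auto-associative training (the network learns to reproduce its input).
autoencoder = train(random_weights(), synthetic, synthetic)

# Stage 2: post-filter (synthetic -> natural) under a short training budget,
# once starting from the stage-1 weights and once from random weights.
pretrained_filter = train(autoencoder, synthetic, natural, epochs=20)
random_filter = train(random_weights(), synthetic, natural, epochs=20)

print("pretrained init:", mse(pretrained_filter, synthetic, natural))
print("random init:    ", mse(random_filter, synthetic, natural))
```

In this toy setting the pretrained start is already close to a useful mapping, so it needs fewer updates to reach a given error. That observation is specific to the sketch and says nothing about the magnitude of the gains reported in the paper.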

MDPI and ACS Style

Coto-Jiménez, M. Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks. Biomimetics 2019, 4, 39.

