Review

A Review of Deep Learning Based Speech Synthesis

by Yishuang Ning 1,2,3,4, Sheng He 1,2,3,4, Zhiyong Wu 5,6,*, Chunxiao Xing 1,2 and Liang-Jie Zhang 3,4
1 Research Institute of Information Technology, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
2 Department of Computer Science and Technology, Institute of Internet Industry, Tsinghua University, Beijing 100084, China
3 National Engineering Research Center for Supporting Software of Enterprise Internet Services, Shenzhen 518057, China
4 Kingdee Research, Kingdee International Software Group Company Limited, Shenzhen 518057, China
5 Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Shenzhen Key Laboratory of Information Science and Technology, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
6 Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2019, 9(19), 4050; https://doi.org/10.3390/app9194050
Received: 1 August 2019 / Revised: 6 September 2019 / Accepted: 20 September 2019 / Published: 27 September 2019
(This article belongs to the Section Computing and Artificial Intelligence)
Speech synthesis, also known as text-to-speech (TTS), has attracted increasing attention. Recent advances in speech synthesis have come overwhelmingly from deep learning, including end-to-end techniques, which have been used to enhance a wide range of application scenarios such as intelligent speech interaction, chatbots, and conversational artificial intelligence (AI). For speech synthesis, deep learning based techniques can leverage large numbers of <text, speech> pairs to learn effective feature representations that bridge the gap between text and speech, thus better characterizing the properties of events. To better understand the research dynamics in the speech synthesis field, this paper first introduces the traditional speech synthesis methods and highlights the importance of acoustic modeling within the composition of the statistical parametric speech synthesis (SPSS) system. It then gives an overview of the advances in deep learning based speech synthesis, including the end-to-end approaches that have achieved state-of-the-art performance in recent years. Finally, it discusses the problems of deep learning methods for speech synthesis and points out some appealing research directions that could bring speech synthesis research to a new frontier.
Keywords: deep learning; speech synthesis; end-to-end; text analysis
MDPI and ACS Style

Ning, Y.; He, S.; Wu, Z.; Xing, C.; Zhang, L.-J. A Review of Deep Learning Based Speech Synthesis. Appl. Sci. 2019, 9, 4050. https://doi.org/10.3390/app9194050

AMA Style

Ning Y, He S, Wu Z, Xing C, Zhang L-J. A Review of Deep Learning Based Speech Synthesis. Applied Sciences. 2019; 9(19):4050. https://doi.org/10.3390/app9194050

Chicago/Turabian Style

Ning, Yishuang, Sheng He, Zhiyong Wu, Chunxiao Xing, and Liang-Jie Zhang. 2019. "A Review of Deep Learning Based Speech Synthesis" Applied Sciences 9, no. 19: 4050. https://doi.org/10.3390/app9194050
