A Multi-Resolution Approach to GAN-Based Speech Enhancement
Department of Electrical and Computer Engineering and the Institute of New Media and Communications, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(2), 721; https://doi.org/10.3390/app11020721
Received: 2 December 2020 / Revised: 8 January 2021 / Accepted: 10 January 2021 / Published: 13 January 2021
(This article belongs to the Special Issue Artificial Intelligence for Multimedia Signal Processing)
Recently, generative adversarial networks (GANs) have been successfully applied to speech enhancement. However, two issues remain to be addressed: (1) GAN-based training is typically unstable because of its non-convex nature, and (2) most conventional methods do not fully exploit the characteristics of speech, which can lead to a sub-optimal solution. To deal with these problems, we propose a progressive generator that handles speech in a multi-resolution fashion. Additionally, to stabilize GAN training, we propose a multi-scale discriminator that discriminates between real and generated speech at various sampling rates. The proposed structure was compared with conventional GAN-based speech enhancement algorithms on the VoiceBank-DEMAND dataset. Experimental results showed that the proposed approach makes training faster and more stable, which improves performance on various speech enhancement metrics.
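To make the idea of judging speech "at various sampling rates" concrete, the following is a minimal sketch of how one waveform could be turned into several lower-rate views, each of which could be passed to its own discriminator. It is not the authors' implementation: the abstract does not say how the signals are resampled, so the pair-averaging decimation, the 16 kHz base rate, the number of scales, and the function names `downsample_by_two` and `multi_rate_views` are all assumptions made for illustration.

```python
import numpy as np


def downsample_by_two(wave: np.ndarray) -> np.ndarray:
    """Halve the sampling rate by averaging adjacent sample pairs.

    Assumption: a crude low-pass-and-decimate step chosen for brevity;
    the source does not specify the resampling method.
    """
    even_length = len(wave) - (len(wave) % 2)  # drop a trailing odd sample
    return wave[:even_length].reshape(-1, 2).mean(axis=1)


def multi_rate_views(wave: np.ndarray, base_rate: int, num_scales: int) -> dict:
    """Return {sampling_rate: waveform}, halving the rate at each scale."""
    views = {}
    current, rate = wave, base_rate
    for _ in range(num_scales):
        views[rate] = current
        current = downsample_by_two(current)
        rate //= 2
    return views


# Hypothetical input: one second of a 440 Hz tone at an assumed 16 kHz rate.
one_second = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
views = multi_rate_views(one_second, base_rate=16000, num_scales=3)
for rate, signal in views.items():
    print(rate, signal.shape)
```

In a multi-scale setup of this kind, the real and the generated waveform would each be expanded into the same set of views, so that every discriminator compares signals at a single, matching rate.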
Keywords: speech enhancement; generative adversarial network; relativistic GAN; convolutional neural network
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
MDPI and ACS Style
Kim, H.Y.; Yoon, J.W.; Cheon, S.J.; Kang, W.H.; Kim, N.S. A Multi-Resolution Approach to GAN-Based Speech Enhancement. Appl. Sci. 2021, 11, 721. https://doi.org/10.3390/app11020721
AMA Style
Kim HY, Yoon JW, Cheon SJ, Kang WH, Kim NS. A Multi-Resolution Approach to GAN-Based Speech Enhancement. Applied Sciences. 2021; 11(2):721. https://doi.org/10.3390/app11020721
Chicago/Turabian Style
Kim, Hyung Y.; Yoon, Ji W.; Cheon, Sung J.; Kang, Woo H.; Kim, Nam S. 2021. "A Multi-Resolution Approach to GAN-Based Speech Enhancement" Appl. Sci. 11, no. 2: 721. https://doi.org/10.3390/app11020721