Article

Target Speaker Localization Based on the Complex Watson Mixture Model and Time-Frequency Selection Neural Network

by Ziteng Wang 1,2,*, Junfeng Li 1,2 and Yonghong Yan 1,2,3
1
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2
University of Chinese Academy of Sciences, Beijing 100190, China
3
Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumchi 830001, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2018, 8(11), 2326; https://doi.org/10.3390/app8112326
Received: 25 October 2018 / Revised: 18 November 2018 / Accepted: 19 November 2018 / Published: 21 November 2018
(This article belongs to the Section Acoustics and Vibrations)
Abstract: Common sound source localization algorithms focus on localizing all the active sources in the environment. Because the source identities are generally unknown, retrieving the location of a speaker of interest requires extra effort. This paper addresses the problem of localizing a speaker of interest from a novel perspective by performing time-frequency selection before localization. The speaker of interest, namely the target speaker, is assumed to be sparsely active in the signal spectra. The target speaker-dominant time-frequency regions are separated by a speaker-aware Long Short-Term Memory (LSTM) neural network, and they are sufficient to determine the Direction of Arrival (DoA) of the target speaker. Speaker-awareness is achieved by utilizing a short target utterance to adapt the hidden layer outputs of the neural network. The instantaneous DoA estimator is based on the probabilistic complex Watson Mixture Model (cWMM), and a weighted maximum likelihood estimation of the model parameters is accordingly derived. Simulation experiments show that the proposed algorithm works well in various noisy conditions and remains robust when the signal-to-noise ratio is low and when a competing speaker exists.
Keywords: target speaker localization; Watson mixture model; time-frequency selection; deep neural network
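To make the idea of mask-weighted likelihood estimation concrete, the following is a minimal NumPy sketch and not the paper's derivation. It assumes a single complex Watson component per candidate direction with a fixed concentration `kappa`, a far-field uniform linear array, and a grid search over angles instead of the mixture-model parameter updates the paper derives. The function names (`steering_vectors`, `weighted_watson_doa`), the array geometry, and the oracle mask standing in for the LSTM output are all illustrative assumptions.

```python
import numpy as np


def steering_vectors(freqs, angles_deg, n_mics, spacing, c=343.0):
    """Unit-norm far-field steering vectors for a linear array, shape (F, D, M)."""
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mic_pos = np.arange(n_mics) * spacing                      # (M,)
    delays = np.cos(angles)[:, None] * mic_pos[None, :] / c    # (D, M)
    phase = -2j * np.pi * np.asarray(freqs)[:, None, None] * delays[None, :, :]
    return np.exp(phase) / np.sqrt(n_mics)


def weighted_watson_doa(Y, weights, freqs, angles_deg, spacing, kappa=1.0, c=343.0):
    """Grid-search DoA from a weighted complex Watson log-likelihood.

    Y: (T, F, M) complex multichannel spectra; weights: (T, F) values in [0, 1].
    With kappa fixed and shared by all candidates, the Watson normalizer is a
    constant, so only the term kappa * |a^H z|^2 matters for the argmax.
    """
    angles_deg = np.asarray(angles_deg)
    # Directional statistics: project each observation onto the unit sphere.
    norms = np.linalg.norm(Y, axis=-1, keepdims=True)
    Z = Y / np.maximum(norms, 1e-12)
    A = steering_vectors(freqs, angles_deg, Y.shape[-1], spacing, c)  # (F, D, M)
    # |a^H z|^2 for every frame, frequency bin, and candidate direction.
    proj = np.abs(np.einsum("fdm,tfm->tfd", A.conj(), Z)) ** 2        # (T, F, D)
    score = kappa * np.einsum("tf,tfd->d", weights, proj)             # (D,)
    return float(angles_deg[int(np.argmax(score))]), score


# Synthetic demo (all values are assumptions): target at 60 degrees is active
# in about 30% of the bins, an interferer at 120 degrees fills the rest.
rng = np.random.default_rng(0)
freqs = np.linspace(200.0, 4000.0, 40)   # below spatial aliasing for 4 cm spacing
grid = np.arange(0, 181)
n_frames, n_mics, spacing = 50, 6, 0.04


def cgauss(*shape):
    """Complex Gaussian samples of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


a_tgt = steering_vectors(freqs, [60], n_mics, spacing)[:, 0, :]    # (F, M)
a_int = steering_vectors(freqs, [120], n_mics, spacing)[:, 0, :]   # (F, M)
mask = (rng.random((n_frames, freqs.size)) < 0.3).astype(float)    # oracle mask
src = cgauss(n_frames, freqs.size)[..., None]                      # (T, F, 1)
Y = src * np.where(mask[..., None] > 0, a_tgt[None], a_int[None])
Y = Y + 0.05 * cgauss(n_frames, freqs.size, n_mics)                # sensor noise

est_masked, _ = weighted_watson_doa(Y, mask, freqs, grid, spacing)
est_uniform, _ = weighted_watson_doa(Y, np.ones_like(mask), freqs, grid, spacing)
print("masked estimate:", est_masked, "uniform-weight estimate:", est_uniform)
```

In this toy setup the mask-weighted estimate should land on or near the target direction, whereas uniform weights are expected to favor the more active interferer. That contrast is the motivation for selecting target-dominant time-frequency regions before localization.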
MDPI and ACS Style

Wang, Z.; Li, J.; Yan, Y. Target Speaker Localization Based on the Complex Watson Mixture Model and Time-Frequency Selection Neural Network. Appl. Sci. 2018, 8, 2326. https://doi.org/10.3390/app8112326

AMA Style

Wang Z, Li J, Yan Y. Target Speaker Localization Based on the Complex Watson Mixture Model and Time-Frequency Selection Neural Network. Applied Sciences. 2018; 8(11):2326. https://doi.org/10.3390/app8112326

Chicago/Turabian Style

Wang, Ziteng, Junfeng Li, and Yonghong Yan. 2018. "Target Speaker Localization Based on the Complex Watson Mixture Model and Time-Frequency Selection Neural Network" Applied Sciences 8, no. 11: 2326. https://doi.org/10.3390/app8112326

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
