Open Access Article
Appl. Sci. 2018, 8(11), 2326; https://doi.org/10.3390/app8112326

Target Speaker Localization Based on the Complex Watson Mixture Model and Time-Frequency Selection Neural Network

Z. Wang 1,2,*, J. Li 1,2 and Y. Yan 1,2,3
1 Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100190, China
3 Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumchi 830001, China
* Author to whom correspondence should be addressed.
Received: 25 October 2018 / Revised: 18 November 2018 / Accepted: 19 November 2018 / Published: 21 November 2018
(This article belongs to the Section Acoustics and Vibrations)

Abstract

Common sound source localization algorithms localize all active sources in the environment. Because the source identities are generally unknown, retrieving the location of a particular speaker of interest requires extra effort. This paper addresses the problem from a novel perspective by performing time-frequency selection before localization. The speaker of interest, namely the target speaker, is assumed to be sparsely active in the signal spectra. The time-frequency regions dominated by the target speaker are separated by a speaker-aware Long Short-Term Memory (LSTM) neural network, and these regions are sufficient to determine the Direction of Arrival (DoA) of the target speaker. Speaker-awareness is achieved by using a short target utterance to adapt the hidden layer outputs of the neural network. The instantaneous DoA estimator is based on the probabilistic complex Watson Mixture Model (cWMM), and a weighted maximum likelihood estimation of the model parameters is derived accordingly. Simulation experiments show that the proposed algorithm works well in various noisy conditions and remains robust when the signal-to-noise ratio is low and when a competing speaker is present.
Keywords: target speaker localization; Watson mixture model; time-frequency selection; deep neural network
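
The abstract names a weighted maximum likelihood estimate for the cWMM but does not give its formulas. As a rough illustration only, and not the authors' estimator, the sketch below assumes a single complex Watson component with positive concentration. Under that assumption, the maximum likelihood mode vector is the principal eigenvector of the scatter matrix of the unit-norm observations, and weighting each observation (for example by a time-frequency mask value) turns this into a weighted scatter matrix. The function name `weighted_watson_mode` and its argument layout are hypothetical.

```python
import numpy as np


def weighted_watson_mode(z, w):
    """Weighted ML mode vector for ONE complex Watson component (illustrative).

    z : (N, M) complex array, one multichannel observation per row.
    w : (N,) nonnegative weights, e.g. time-frequency mask values (assumed).
    Returns the unit-norm mode vector and the largest eigenvalue of the
    weighted scatter matrix.
    """
    z = np.asarray(z, dtype=complex)
    w = np.asarray(w, dtype=float)
    if w.sum() <= 0:
        raise ValueError("weights must have a positive sum")

    # The complex Watson distribution is defined on unit-norm vectors.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)

    # Weighted scatter matrix R = sum_n w_n z_n z_n^H / sum_n w_n.
    scatter = np.einsum("n,ni,nj->ij", w, z, z.conj()) / w.sum()

    # eigh returns eigenvalues in ascending order for a Hermitian matrix,
    # so the last column is the principal eigenvector.
    eigvals, eigvecs = np.linalg.eigh(scatter)
    return eigvecs[:, -1], eigvals[-1]
```

The mode vector is only defined up to a complex phase, so any comparison with a steering vector should use the magnitude of their inner product. Mapping the mode vector to a DoA and handling several mixture components are outside this sketch.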

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

Cite This Article (MDPI and ACS Style)

Wang, Z.; Li, J.; Yan, Y. Target Speaker Localization Based on the Complex Watson Mixture Model and Time-Frequency Selection Neural Network. Appl. Sci. 2018, 8, 2326.

Appl. Sci. EISSN 2076-3417, published by MDPI AG, Basel, Switzerland