MDPI - Publisher of Open Access Journals

13 pages, 601 KB

Open Access Article

Target Speaker Localization Based on the Complex Watson Mixture Model and Time-Frequency Selection Neural Network

by Ziteng Wang, Junfeng Li and Yonghong Yan

Appl. Sci. 2018, 8(11), 2326; https://doi.org/10.3390/app8112326 - 21 Nov 2018

Cited by 11 | Viewed by 6446

Common sound source localization algorithms focus on localizing all the active sources in the environment. Because the source identities are generally unknown, retrieving the location of a speaker of interest requires extra effort. This paper addresses the problem of localizing a speaker of interest from a novel perspective by performing time-frequency selection before localization. The speaker of interest, namely the target speaker, is assumed to be sparsely active in the signal spectra. The time-frequency regions dominated by the target speaker are separated by a speaker-aware Long Short-Term Memory (LSTM) neural network, and they are sufficient to determine the Direction of Arrival (DoA) of the target speaker. Speaker-awareness is achieved by utilizing a short target utterance to adapt the hidden layer outputs of the neural network. The instantaneous DoA estimator is based on the probabilistic complex Watson Mixture Model (cWMM), and a weighted maximum likelihood estimation of the model parameters is derived accordingly. Simulation experiments show that the proposed algorithm works well in various noisy conditions and remains robust when the signal-to-noise ratio is low and when a competing speaker exists.

(This article belongs to the Section Acoustics and Vibrations)
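
The idea of weighting a Watson-type likelihood by a time-frequency mask can be illustrated with a minimal sketch. It is not the authors' derivation: it assumes a single frequency bin, a uniform linear array, one candidate steering vector per grid angle, and a shared fixed concentration, so that maximizing the weighted log-likelihood reduces to maximizing the mask-weighted sum of |a(θ)ᴴz|². The function names, array geometry, and mask values are hypothetical and chosen only for this illustration.

```python
import cmath
import math


def steering_vector(theta_deg, n_mics, spacing_m, freq_hz, c=343.0):
    """Unit-norm far-field steering vector of a uniform linear array (assumed geometry)."""
    delay = spacing_m * math.cos(math.radians(theta_deg)) / c
    scale = 1.0 / math.sqrt(n_mics)
    return [scale * cmath.exp(-2j * math.pi * freq_hz * m * delay) for m in range(n_mics)]


def normalize(z):
    """Scale an observation to unit norm, as directional (Watson-type) models require."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in z))
    return [x / norm for x in z]


def weighted_doa(observations, weights, grid_deg, n_mics, spacing_m, freq_hz):
    """Return the grid angle that maximizes sum_t w_t * |a(theta)^H z_t|^2.

    With a fixed concentration, this is the mask-weighted log-likelihood
    of a complex Watson component up to constants (simplifying assumption).
    """
    best_theta, best_score = None, float("-inf")
    for theta in grid_deg:
        a = steering_vector(theta, n_mics, spacing_m, freq_hz)
        score = 0.0
        for z, w in zip(observations, weights):
            inner = sum(ai.conjugate() * zi for ai, zi in zip(a, z))
            score += w * abs(inner) ** 2
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta


# Toy, noise-free input: 2 frames from a target at 60 degrees and
# 3 frames from a competing speaker at 120 degrees (hypothetical values).
N_MICS, SPACING_M, FREQ_HZ = 4, 0.05, 1000.0
GRID = list(range(0, 181, 5))
PHASE = 0.3 + 0.4j  # arbitrary complex gain; the score is invariant to it

target_frames = [normalize([PHASE * x for x in steering_vector(60, N_MICS, SPACING_M, FREQ_HZ)])
                 for _ in range(2)]
interferer_frames = [normalize([PHASE * x for x in steering_vector(120, N_MICS, SPACING_M, FREQ_HZ)])
                     for _ in range(3)]
frames = target_frames + interferer_frames

target_mask = [1.0, 1.0, 0.0, 0.0, 0.0]  # idealized binary mask selecting target frames
no_mask = [1.0] * len(frames)            # no time-frequency selection

masked_estimate = weighted_doa(frames, target_mask, GRID, N_MICS, SPACING_M, FREQ_HZ)
unmasked_estimate = weighted_doa(frames, no_mask, GRID, N_MICS, SPACING_M, FREQ_HZ)
print(masked_estimate, unmasked_estimate)
```

In this toy setting the masked estimate is 60 degrees, while the unmasked estimate is pulled toward the competing speaker. A full mixture model would also estimate component weights and concentrations, which this sketch deliberately omits.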
