Open Access Article

Restricted Boltzmann Machine Vectors for Speaker Clustering and Tracking Tasks in TV Broadcast Shows

TALP Research Center, Department of Signal Theory and Communications, Universitat Politecnica de Catalunya—BarcelonaTech, 08034 Barcelona, Spain
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in IberSPEECH-2018.
Appl. Sci. 2019, 9(13), 2761; https://doi.org/10.3390/app9132761
Received: 21 May 2019 / Revised: 23 June 2019 / Accepted: 2 July 2019 / Published: 9 July 2019

Abstract

Restricted Boltzmann Machines (RBMs) have shown success in both the front end and the back end of speaker verification systems. In this paper, we propose applying RBMs at the front end for speaker clustering and speaker tracking in TV broadcast shows. RBMs are trained to transform utterances into a vector-based representation. Because little data is available for each test speaker, we propose adapting a global RBM model to each speaker. First, the global model, referred to as the universal RBM, is trained with all the available background data. Then an adapted RBM model is trained with the data of each test speaker. The visible-to-hidden weight matrices of the adapted models are concatenated with the bias vectors and whitened to generate the vector representation of speakers. These vectors, referred to as RBM vectors, were shown to preserve speaker-specific information and are used in the speaker clustering and speaker tracking tasks. The evaluation was performed on audio recordings of Catalan TV broadcast shows. The experimental results show that the proposed speaker clustering system achieves up to 12% relative improvement, in terms of Equal Impurity (EI), over the baseline system. In the speaker tracking task, our system achieves relative improvements of 11% and 7% over the baseline system using cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring, respectively.
Keywords: speaker tracking; speaker clustering; speaker segmentation; restricted Boltzmann machine adaptation; agglomerative hierarchical clustering
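
The RBM vector extraction summarized in the abstract (concatenate the adapted weights and biases, whiten, then score) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the choice to concatenate both the hidden and visible bias vectors, the PCA-style whitening estimated from background vectors, and the random toy parameters standing in for adapted RBMs are all assumptions made for illustration. RBM training and adaptation are not shown.

```python
import numpy as np


def rbm_vector_raw(weights, hidden_bias, visible_bias):
    """Flatten one adapted RBM's parameters into a single supervector.

    Assumption: both bias vectors are appended; the abstract only says
    "along with the bias vectors".
    """
    return np.concatenate([weights.ravel(), hidden_bias.ravel(), visible_bias.ravel()])


def fit_whitening(background, rel_tol=1e-8):
    """Estimate a PCA whitening transform from background supervectors (n, d)."""
    mean = background.mean(axis=0)
    centered = background - mean
    # Thin SVD avoids forming the d x d covariance matrix explicitly.
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    keep = sing > rel_tol * sing.max()  # drop numerically null directions
    scale = np.sqrt(background.shape[0] - 1) / sing[keep]
    return mean, vt[keep] * scale[:, None]


def whiten(vector, mean, projection):
    """Project one supervector into the whitened space."""
    return projection @ (vector - mean)


def cosine_score(a, b):
    """Cosine similarity between two whitened vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy usage: random parameters stand in for adapted RBMs (hypothetical sizes).
rng = np.random.default_rng(0)
n_visible, n_hidden, n_background = 6, 4, 20
background = np.stack([
    rbm_vector_raw(
        rng.normal(size=(n_visible, n_hidden)),
        rng.normal(size=n_hidden),
        rng.normal(size=n_visible),
    )
    for _ in range(n_background)
])
mean, projection = fit_whitening(background)
rbm_vectors = (background - mean) @ projection.T
score = cosine_score(rbm_vectors[0], rbm_vectors[1])
```

In this sketch the whitening is fitted on the background vectors only, so the whitened background set has identity sample covariance; in a clustering or tracking setting the same `mean` and `projection` would then be applied to the vectors of test segments before cosine scoring.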
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
MDPI and ACS Style

Khan, U.; Safari, P.; Hernando, J. Restricted Boltzmann Machine Vectors for Speaker Clustering and Tracking Tasks in TV Broadcast Shows. Appl. Sci. 2019, 9, 2761.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Appl. Sci. EISSN 2076-3417. Published by MDPI AG, Basel, Switzerland.