An Expectation–MaximizationBased IVA Algorithm for Speech Source Separation Using Student’s t Mixture Model Based Source Priors
Abstract
:1. Introduction
2. Related Work
3. Proposed Method
3.1. Maximum Likelihood Estimation of SMM
3.2. The ExpectationMaximization Algorithm
3.3. The Expectation Step
3.4. The Maximization Step
Algorithm 1 EM algorithm for Student’s t Mixtures 
Require: Given a Student’s t mixture model, the aim is to maximize the log likelihood function with respect to the parameters $\theta =\{{\mathbf{W}}_{i},{\mathbf{\Lambda}}_{i},p\left({q}_{i}\right)\}$.

4. Experimentations and Results
4.1. Case I: Simulations with the Image Method
4.2. Case II: Simulations with Real RIRs
4.3. Case III: Simulations with Binaural Room Impulse Responses
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A. The EM framework for SMIVA
References
 Haykin, S. Unsupervised Adaptive Filtering (Volume I: Blind Source Separation); Wiley: Hoboken, NJ, USA, 2000. [Google Scholar]
 Cherry, C. Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 1953, 25, 975–979. [Google Scholar] [CrossRef]
 Haykin, S.; Chen, Z. The cocktail party problem. Neural Comput. 2005, 17, 1875–1902. [Google Scholar] [CrossRef] [PubMed]
 Cichocki, A.; Amari, S. Adaptive Blind Signal and Image Processing; John Wiley: Hoboken, NJ, USA, 2002. [Google Scholar]
 McDermott, J.H. The cocktail party problem. Curr. Biol. 2009, 19, R1024–R1027. [Google Scholar] [CrossRef] [PubMed] [Green Version]
 Wang, D.; Brown, G. Fundamentals of computational auditory scene analysis. In Computational Auditory Scene Analysis: Principles, Algorithms and Applications; John Wiley and Sons: Hoboken, NJ, USA, 2006; pp. 1–44. [Google Scholar]
 Adali, T.; Anderson, M.; GengShen, F. Diversity in independent component and vector analyses: Identiability, algorithms, and applications in medical imaging. IEEE Signal Process. Mag. 2014, 31, 18–33. [Google Scholar] [CrossRef]
 Parra, L.; Alvino, C. Geometric source separation: merging convolutive source separation with geometric beamforming. IEEE Trans. Speech Audio Process. 2002, 10, 352–362. [Google Scholar] [CrossRef]
 Pedersen, M.S.; Larsen, J.U.; Kjems, U.; Parra, L.C. A survey of convolutive blind source separation methods. Springer Handb. Speech Process. Speech Commun. 2007, 8, 1–34. [Google Scholar]
 Jutten, C.; Herault, J. Blind Seperation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Process. 1991, 24, 1–10. [Google Scholar] [CrossRef]
 Jutten, C.; Comon, P. Handbook of Blind Source Separation: Independent Component Analysis and Applications; Academic Press: Cambridge, MA, USA, 2010. [Google Scholar]
 Lee, T.W. Independent Component Analysis: Theory and Applications; Kluwer Academic: Norwell, MA, USA, 2000. [Google Scholar]
 Hyvrinen, A. Fast and robust fixedpoint algorithms for independent component analysis. IEEE Trans. Neural Netw. 1999, 10, 626–634. [Google Scholar] [CrossRef] [Green Version]
 Parra, L.; Spence, C. Convolutive blind separation of nonstationary sources. IEEE Trans. Speech Audio Process. 2000, 8, 320–327. [Google Scholar] [CrossRef]
 Kim, T.; Attias, H.; Lee, S.; Lee, T. Blind source separation exploiting higherorder frequency dependencies. IEEE Trans. Audio Speech Lang. Process. 2007, 15, 70–79. [Google Scholar] [CrossRef]
 Kim, T.; Lee, I.; Lee, T.W. Independent vector analysis: Definition and algorithms. In Proceedings of the Fortieth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 29 October–1 November 2006. [Google Scholar]
 Simonyan, K.; Ackermann, H.; Chang, E.F.; Greenlee, J.D. New developments in understanding the complexity of human speech production. J. Neurosci. 2016, 36, 11440–11448. [Google Scholar] [CrossRef] [PubMed]
 Cooke, M.; Ellis, D. The auditory orgnization of speech and other sources in listeners and computational models. Speech Commun. 2001, 35, 141–177. [Google Scholar] [CrossRef]
 Sun, Y.; Rafique, W.; Chambers, J.A.; Naqvi, S.M. Underdetermined source separation using timefrequency masks and an adaptive combined GaussianStudent’s t probabilistic model. In Proceedings of the 2017 IEEE ICASSP, New Orleans, LA, USA, 5–9 March 2017; pp. 4187–4191. [Google Scholar]
 Sundar, H.; Seelamantula, C.S.; Sreenivas, T. A mixture model approach for formant tracking and the robustness of Student’s t distribution. IEEE Trans. Audio Speech Lang. Process. 2012, 20, 2626–2636. [Google Scholar] [CrossRef]
 Rafique, W.; Naqvi, S.M.; Jackson, P.J.B.; Chambers, J.A. IVA algorithms using a multivariate Student’s t source prior for speech source separation in real room environments. In Proceedings of the IEEE ICASSP, South Brisbane, QLD, Australia, 19–24 April 2015; pp. 474–478. [Google Scholar]
 Rafique, W. Enhanced Independent Vector Analysis for Speech Separation in Room Environments. Ph.D. Thesis, Newcastle University, Newcastle upon Tyne, UK, 2017. [Google Scholar]
 Bishop, C.M. Pattern Recognition and Machine Learning; SpringerVerlag: Secaucus, NJ, USA, 2006. [Google Scholar]
 Hao, J.; Lee, I.; Lee, T.W.; Sejnowski, T.J. Independent Vector Analysis for Source Separation Using a Mixture of Gaussians Prior. Neural Comput. 2010, 22, 1646–1673. [Google Scholar] [CrossRef] [Green Version]
 Rafique, W.; Erateb, S.; Naqvi, S.M.; Dlay, S.S.; Chambers, J.A. Independent vector analysis for source separation using an energy driven mixed Student’s t and super Gaussian source prior. In Proceedings of the 2016 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, 29 August–2 September 2016. [Google Scholar]
 Liang, Y. Enhanced Independent Vector Analysis for Audio Separation in a Room Environment. Ph.D. Thesis, Loughborough University, Loughborough, UK, 2013. [Google Scholar]
 Peel, D.; McLachlan, G.J. Robust mixture modelling using the t distribution. Stat. Comput. 2000, 10, 339–348. [Google Scholar] [CrossRef]
 Rafique, W.; Naqvi, S.M.; Chambers, J.A. Speech source separation using the IVA algorithm with multivariate mixed super Gaussian Student’s t source prior in real room environment. In Proceedings of the IET Conference Proceedings, London, UK, 1–2 December 2015. [Google Scholar]
 Rafique, W.; Naqvi, S.M.; Chambers, J.A. Mixed source prior for the fast independent vector analysis algorithm. In Proceedings of the IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Rio de Janeiro, Brazil, 10–13 July 2016; pp. 1–5. [Google Scholar]
 Aroudi, A.; Veisi, H.; Sameti, H.; Mafakheri, Z. Speech signal modeling using multivariate distributions. EURASIP J. Audio Speech Music Process. 2015, 2015, 35. [Google Scholar] [CrossRef]
 Bauchau, O.A.; Trainelli, L. The vectorial parametrization of rotation. J. Nonlinear Dyn. 2003, 32, 71–92. [Google Scholar] [CrossRef]
 Dragmor, S.S.; Goh, C.J. Some counterpart inequalities in for a functional associated with Jensen’s inequality. J. Inequal. Appl. 1997, 1, 311–325. [Google Scholar] [CrossRef]
 Garofolo, J.S.; Lamel, L.F.; Fisher, W.M.; Fiscus, J.G.; Pallett, D.S.; Dahlgren, N.L.; Zue, V. TIMIT AcousticPhonetic Continuous Speech Corpus; Linguistic Data Consortium: Philadelphia, PA, USA, 1993. [Google Scholar]
 Vincent, E.; Fevotte, C.; Gribonval, R. Performance measurement in blind audio source separation. IEEE Trans. Audio Speech Lang. Process. 2006, 14, 1462–1469. [Google Scholar] [CrossRef] [Green Version]
 Allen, J.B.; Berkley, D.A. Image method for efficiently simulating smallroom acoustics. J. Acoust. Soc. Am. 1979, 65, 943–950. [Google Scholar] [CrossRef]
 Andreson, M.; Adali, T.; Li, X.L. Joint blind source separation with multivariate Gaussian model: Algorithms and performance analysis. IEEE Trans. Signal Process. 2012, 60, 1672–1682. [Google Scholar] [CrossRef]
 Boukouvalas, Z.; Fu, G.S.; Adali, T. An efficient multivariate generalized Gaussian distribution estimator: Application to IVA. In Proceedings of the 2015 49th Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, USA, 18–20 March 2015. [Google Scholar]
 Hummersone, C. A Psychopsychoacoustic Engineering Approach to Machine Sound Source Separation in Reverberant Environments. Ph.D. Thesis, University of Surrey, Guildford, UK, 2011. [Google Scholar]
 ISO 33822: 2008. Acoustics. Measurements of Room Acoustics Parameters, Part 2; ISO: Geneva, Switzerland, 2008. [Google Scholar]
 ShinnCunningham, B.; Kopco, N.; Martin, T. Localizing nearby sound sources in a classroom: Binaural room impulse responses. J. Acoust. Soc. Am. 2005, 117, 3100–3115. [Google Scholar] [CrossRef]
 Hu, Y.; Loizou, P.C. Evaluation of Objective Quality Measures for Speech Enhancement. IEEE Trans. Audio Speech Lang. Process. 2008, 16, 229–238. [Google Scholar] [CrossRef]
Sampling rate  8 kHz 
STFT frame length  1024 
Reverberation time  200 ms 
Room dimensions  7 m × 5 m × 3 m 
Source signal duration  4 s (TIMIT) 
Room impulse responses  Image method 
Objective measure  Signal to Distortion Ratio (SDR) 
Original Super Gaussian  Student’s t Distribution  SMM Source Prior  

Set1  9.09  9.84  10.27 
Set2  8.98  9.72  10.24 
Set3  9.26  10.11  10.87 
Set4  9.02  9.95  10.49 
Set5  9.53  10.21  10.62 
Set6  9.51  10.14  10.74 
Set7  8.91  9.67  10.09 
Set8  9.86  10.48  11.05 
Set9  9.94  10.66  11.24 
Set10  10.02  10.56  11.01 
Sampling rate  8 kHz 
STFT frame length  1024 
Velocity of sound  343 m/s 
Reverberation time  565 ms (BRIRs) 
Room dimensions  9 m × 5 m × 3.5 m 
Source signal duration  3.5 s (TIMIT) 
GMM Source Prior  SMM Source Prior  Percentage Improvement  

Angle${15}^{\circ}$  4.51  4.82  6.87% 
Angle${30}^{\circ}$  4.62  4.97  7.56% 
Angle${45}^{\circ}$  4.77  5.09  6.70% 
Angle${60}^{\circ}$  4.97  5.32  7.04% 
Angle${75}^{\circ}$  4.91  5.28  7.73% 
Angle${90}^{\circ}$  4.77  5.12  7.34% 
GMM Source Prior  SMM Source Prior  

Set1  1.85  2.02 
Set2  1.98  2.11 
Set3  1.96  2.13 
Set4  2.02  2.19 
Set5  1.93  2.14 
Set6  2.08  2.21 
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Rafique, W.; Chambers, J.; Sunny, A.I. An Expectation–MaximizationBased IVA Algorithm for Speech Source Separation Using Student’s t Mixture Model Based Source Priors. Acoustics 2019, 1, 117136. https://doi.org/10.3390/acoustics1010009
Rafique W, Chambers J, Sunny AI. An Expectation–MaximizationBased IVA Algorithm for Speech Source Separation Using Student’s t Mixture Model Based Source Priors. Acoustics. 2019; 1(1):117136. https://doi.org/10.3390/acoustics1010009
Chicago/Turabian StyleRafique, Waqas, Jonathon Chambers, and Ali Imam Sunny. 2019. "An Expectation–MaximizationBased IVA Algorithm for Speech Source Separation Using Student’s t Mixture Model Based Source Priors" Acoustics 1, no. 1: 117136. https://doi.org/10.3390/acoustics1010009