Open Access Article
Appl. Sci. 2018, 8(1), 123

Multiple Speech Source Separation Using Inter-Channel Correlation and Relaxed Sparsity

Beijing Key Laboratory of Computational Intelligence and Intelligent System, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Faculty of Engineering & Information Sciences, University of Wollongong, Wollongong NSW2522, Australia
Author to whom correspondence should be addressed.
Current address: Beijing University of Technology, No. 100, Pingleyuan, Chaoyang District, Beijing, China.
These authors contributed equally to this work.
Received: 5 December 2017 / Revised: 27 December 2017 / Accepted: 14 January 2018 / Published: 16 January 2018
(This article belongs to the Section Acoustics and Vibrations)


In this work, a multiple speech source separation method using inter-channel correlation and relaxed sparsity is proposed. A B-format microphone with four spatially located channels is adopted, owing to the size of the microphone array, to preserve the spatial parameter integrity of the original signal. Specifically, we first measure the proportion of overlapped components among multiple sources and find that many time-frequency (TF) components overlap as the number of sources increases. Then, considering the relaxed sparsity of speech sources, we propose a dynamic threshold-based approach for separating sparse components, where the threshold is determined by the inter-channel correlation among the recorded signals. After a statistical analysis of the number of active sources at each TF instant, a form of relaxed sparsity called the half-K assumption is proposed, under which the number of active sources in a given TF bin does not exceed half the total number of simultaneously occurring sources. Under the half-K assumption, the non-sparse components are recovered by using the extracted sparse components as a guide, combined with vector decomposition and matrix factorization. Finally, the TF coefficients of each source are recovered by synthesizing the sparse and non-sparse components. The proposed method has been evaluated using up to six simultaneous speech sources under both anechoic and reverberant conditions. Both objective and subjective evaluations show that the proposed approach yields separated speech of higher perceptual quality than existing blind source separation (BSS) approaches. In addition, the method is robust to different speech signals, and all separated speech signals have similar perceptual quality.
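The overlap measurement and the half-K condition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the array layout (K sources by frequency by time), and the activity criterion (a source counts as active in a TF bin when its magnitude is within `rel_db` decibels of the strongest source in that bin) are all assumptions made for illustration only.

```python
import numpy as np


def active_source_counts(mags, rel_db=-20.0):
    """Count the active sources in every TF bin.

    mags: array of shape (K, F, T) holding non-negative STFT magnitudes,
          one spectrogram per source (hypothetical layout).
    rel_db: assumed activity criterion; a source is 'active' in a bin if
            its magnitude is within rel_db dB of the bin's strongest source.
    """
    mags = np.asarray(mags, dtype=float)
    peak = mags.max(axis=0)                   # strongest source per bin, (F, T)
    thresh = peak * 10.0 ** (rel_db / 20.0)   # relative threshold per bin
    active = (mags >= thresh) & (mags > 0.0)  # silent bins have no active source
    return active.sum(axis=0)                 # integer counts, shape (F, T)


def sparsity_report(mags, rel_db=-20.0):
    """Summarise overlap and half-K compliance over non-silent TF bins."""
    mags = np.asarray(mags, dtype=float)
    k = mags.shape[0]
    counts = active_source_counts(mags, rel_db)
    occupied = counts > 0
    if not occupied.any():
        return {"overlap_ratio": 0.0, "half_k_ratio": 1.0}
    occ = counts[occupied]
    return {
        # share of non-silent bins in which more than one source is active
        "overlap_ratio": float(np.mean(occ > 1)),
        # share of non-silent bins with at most K/2 active sources (half-K)
        "half_k_ratio": float(np.mean(occ <= k / 2)),
    }


# Toy example: K = 4 sources, 1 frequency bin, 4 time frames.
toy = np.array([
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])[:, None, :]
counts = active_source_counts(toy)
report = sparsity_report(toy)
```

In the toy example the four frames contain 1, 2, 3 and 0 active sources, so two of the three non-silent bins overlap and two of the three satisfy the half-K condition for K = 4. On real recordings the ratios depend strongly on the chosen activity criterion, which is why it is exposed as a parameter here.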
Keywords: multiple speech source separation; sparsity; B-format microphone


This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


MDPI and ACS Style

Jia, M.; Sun, J.; Zheng, X. Multiple Speech Source Separation Using Inter-Channel Correlation and Relaxed Sparsity. Appl. Sci. 2018, 8, 123.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Appl. Sci. EISSN 2076-3417. Published by MDPI AG, Basel, Switzerland.