# Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation

## Abstract

## 1. Introduction

## 2. System Overview

## 3. Extended Kalman Filter-Based Relative Transfer Function Estimation

- Dynamic model for the RTF $\mathbf{a}_{21,t}$: We assume that the state vector $\mathbf{a}_{21,t}$ is a random-walk stochastic process, which can be expressed as
  $$\mathbf{a}_{21,t}=\mathbf{a}_{21,t-1}+\mathbf{w}_{t},$$
  with
  $$\mathbf{w}_{t}\sim \mathcal{N}\left(\mathbf{0},\mathbf{Q}\right).$$
- Observation model for the noisy speech at the secondary microphone, $\mathbf{y}_{2,t}$: It is defined using the distortion model in (8) as
  $$\begin{aligned}\mathbf{y}_{2,t} &= \mathbf{h}\left(\mathbf{a}_{21,t},\mathbf{n}_{1,t};\mathbf{y}_{1,t}\right)+\mathbf{n}_{2,t}\\ &= \left(\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}\left(Y_{1,t}^{r}-N_{1,t}^{r}\right)+\begin{bmatrix}0 & -1\\ 1 & 0\end{bmatrix}\left(Y_{1,t}^{i}-N_{1,t}^{i}\right)\right)\mathbf{a}_{21,t}+\mathbf{n}_{2,t},\end{aligned}$$
  with
  $$\begin{bmatrix}\mathbf{n}_{1,t}\\ \mathbf{n}_{2,t}\end{bmatrix}\sim \mathcal{N}\left(\mathbf{0},\begin{bmatrix}\mathsf{\Sigma}_{n_{11},t} & \mathsf{\Sigma}_{n_{12},t}\\ \mathsf{\Sigma}_{n_{21},t} & \mathsf{\Sigma}_{n_{22},t}\end{bmatrix}\right).$$
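The real-valued stacking in the observation model can be checked numerically. The following minimal sketch, in plain Python, builds the 2×2 matrix that multiplies $\mathbf{a}_{21,t}$ and compares the result with ordinary complex multiplication. The function names (`observation_matrix`, `predict_y2`) and the sample values are illustrative assumptions, not taken from the paper, and the additive noise $\mathbf{n}_{2,t}$ is left out.

```python
def observation_matrix(y1: complex, n1: complex = 0j):
    """2x2 real matrix multiplying a_21 = [real, imag] in the observation model.

    Implements I * (Y1r - N1r) + J * (Y1i - N1i), with J = [[0, -1], [1, 0]].
    """
    x = y1 - n1  # reference-channel coefficient with the noise removed
    return [[x.real, -x.imag],
            [x.imag, x.real]]


def predict_y2(a21, y1: complex, n1: complex = 0j):
    """Noise-free part of y_2 as a stacked [real, imag] vector."""
    h = observation_matrix(y1, n1)
    return [h[0][0] * a21[0] + h[0][1] * a21[1],
            h[1][0] * a21[0] + h[1][1] * a21[1]]


# Illustrative values (assumed): RTF 0.5 - 0.2j, reference coefficient 1 + 2j.
a21 = [0.5, -0.2]
y2 = predict_y2(a21, y1=1 + 2j, n1=0.1 - 0.1j)

# Same quantity computed directly with complex arithmetic.
expected = (1 + 2j - (0.1 - 0.1j)) * complex(a21[0], a21[1])
```

Both computations give the same real and imaginary parts, which is why the complex product can be written as a real matrix-vector product in the filter.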

- The prediction step, using the model (12), is applied for every frame $t>0$:
  $$\widehat{\mathbf{a}}_{21,t|t-1}=\widehat{\mathbf{a}}_{21,t-1},$$
  $$\mathbf{P}_{t|t-1}=\mathbf{P}_{t-1}+\mathbf{Q},$$
  where the error covariance matrices are defined as
  $$\mathbf{P}_{t}=E\left[\left(\mathbf{a}_{21,t}-\widehat{\mathbf{a}}_{21,t}\right)\left(\mathbf{a}_{21,t}-\widehat{\mathbf{a}}_{21,t}\right)^{\top}\right],$$
  $$\mathbf{P}_{t|t-1}=E\left[\left(\mathbf{a}_{21,t}-\widehat{\mathbf{a}}_{21,t|t-1}\right)\left(\mathbf{a}_{21,t}-\widehat{\mathbf{a}}_{21,t|t-1}\right)^{\top}\right].$$
- The updating step corrects the predicted estimate with the observations $\mathbf{y}_{1,t}$ and $\mathbf{y}_{2,t}$ (whose relationship is given by Equation (14)):
  $$\widehat{\mathbf{a}}_{21,t}=\widehat{\mathbf{a}}_{21,t|t-1}+\mathbf{K}_{t}\left(\mathbf{y}_{2,t}-\mathit{\mu}_{y,t}\right),$$
  $$\mathbf{P}_{t}=\mathbf{P}_{t|t-1}-\mathbf{K}_{t}\mathbf{S}_{y,t}\mathbf{K}_{t}^{\top},$$
  with the gain
  $$\mathbf{K}_{t}=\mathbf{C}_{ay,t}\mathbf{S}_{y,t}^{-1},$$
  where
  $$\mathit{\mu}_{y,t}=E\left[\mathbf{y}_{2,t}\right],$$
  $$\mathbf{S}_{y,t}=E\left[\left(\mathbf{y}_{2,t}-\mathit{\mu}_{y,t}\right)\left(\mathbf{y}_{2,t}-\mathit{\mu}_{y,t}\right)^{\top}\right],$$
  $$\mathbf{C}_{ay,t}=E\left[\left(\mathbf{a}_{21,t}-\widehat{\mathbf{a}}_{21,t|t-1}\right)\left(\mathbf{y}_{2,t}-\mathit{\mu}_{y,t}\right)^{\top}\right].$$
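The prediction and updating steps above can be sketched for a single time-frequency bin as follows. This is a minimal illustration in plain Python, not the authors' implementation: the helper names are hypothetical, and the demo makes the simplifying assumption that $\mathbf{n}_{1,t}=\mathbf{0}$, so the observation is linear in $\mathbf{a}_{21,t}$ and the statistics $\mathit{\mu}_{y,t}$, $\mathbf{S}_{y,t}$ and $\mathbf{C}_{ay,t}$ have a closed form. In the general case these statistics would have to come from an approximation such as the one named in Section 3.1.

```python
def matmul(a, b):
    """Product of two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def add(a, b, sign=1.0):
    """Element-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def predict(a_prev, P_prev, Q):
    """Random-walk prediction: the mean is kept, the covariance grows by Q."""
    return list(a_prev), add(P_prev, Q)


def update(a_pred, P_pred, y2, mu_y, S_y, C_ay):
    """Correct the prediction with the innovation y2 - mu_y."""
    K = matmul(C_ay, inv2(S_y))  # K_t = C_ay S_y^{-1}
    innov = [y - m for y, m in zip(y2, mu_y)]
    a_new = [a + sum(k * e for k, e in zip(row, innov))
             for a, row in zip(a_pred, K)]
    P_new = add(P_pred, matmul(matmul(K, S_y), transpose(K)), sign=-1.0)
    return a_new, P_new


# Demo under a simplifying assumption (not the paper's setting): n_1 = 0, so
# y2 = H a + n2 is linear and the required statistics are exact.
H = [[1.0, -2.0], [2.0, 1.0]]    # built from an assumed Y_1 = 1 + 2j
R = [[0.01, 0.0], [0.0, 0.01]]   # assumed covariance of n_2
Q = [[0.01, 0.0], [0.0, 0.01]]   # assumed process noise covariance
a_true = [0.5, -0.2]             # assumed true RTF (real, imag)
y2 = [sum(h * a for h, a in zip(row, a_true)) for row in H]

a_pred, P_pred = predict([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], Q)
mu_y = [sum(h * a for h, a in zip(row, a_pred)) for row in H]
C_ay = matmul(P_pred, transpose(H))   # P H^T
S_y = add(matmul(H, C_ay), R)         # H P H^T + R
a_new, P_new = update(a_pred, P_pred, y2, mu_y, S_y, C_ay)
```

After one update the estimate `a_new` moves from the zero initial guess to values close to `a_true`, and the diagonal of `P_new` is smaller than that of `P_pred`, reflecting the information gained from the observation.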

#### 3.1. Vector Taylor Series Approximation

#### 3.2. A Priori RTF Statistics

## 4. Speech Presence Probability-Based Noise Statistics Estimation

#### 4.1. A Posteriori SPP Estimation

- The estimation of the RTF presented in the previous section is only accurate in time-frequency bins where speech is present. The a posteriori SPP indicates those bins where speech presence is more likely. Therefore, in our implementation we only update the eKF in those bins where ${p}_{x}(t,f)>{p}_{\mathrm{thr}}$, with ${p}_{\mathrm{thr}}$ being a predefined probability threshold. Otherwise, the previous values are preserved.
- The postfiltering performance can be improved if additional information about SPP is provided, as shown later in Section 5.
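The gating rule in the first item above can be sketched as a per-bin conditional update. The function name `gated_ekf_step`, the callback `update_bin`, and the threshold value used in the example are hypothetical; the text only states that a predefined threshold ${p}_{\mathrm{thr}}$ is used.

```python
from typing import Callable, List, Sequence


def gated_ekf_step(states: Sequence, spp: Sequence[float],
                   update_bin: Callable, p_thr: float) -> List:
    """Run the per-bin update only where the a posteriori SPP exceeds p_thr.

    states     -- one filter state per frequency bin (any object)
    spp        -- a posteriori speech presence probability per bin
    update_bin -- hypothetical callback (bin_index, state) -> new state
    """
    new_states = []
    for f, (state, p) in enumerate(zip(states, spp)):
        if p > p_thr:
            new_states.append(update_bin(f, state))  # speech likely: update
        else:
            new_states.append(state)                 # otherwise keep previous value
    return new_states


# Toy example: the "state" is a counter of how often each bin was updated.
counts = gated_ekf_step([0, 0, 0], [0.95, 0.20, 0.90],
                        update_bin=lambda f, s: s + 1, p_thr=0.9)
```

Here only the first bin is updated, because the comparison is strict and the third bin sits exactly at the threshold.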

- Initialization: Estimate the noisy SCM by recursive updating:
  $$\widehat{\mathsf{\Phi}}_{Y}(t,f)=\tilde{\alpha}\,\widehat{\mathsf{\Phi}}_{Y}(t-1,f)+\left(1-\tilde{\alpha}\right)\mathbf{Y}(t,f)\mathbf{Y}^{H}(t,f).$$
- Second iteration: Re-estimate ${p}_{x}(t,f)$, now using ${\widehat{\mathsf{\Phi}}}_{N}(t,f)$ in (49). Finally, re-estimate ${\widehat{\mathsf{\Phi}}}_{N}(t,f)$ using the updated ${p}_{x}(t,f)$.
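The recursive SCM update used in the initialization can be sketched for one time-frequency bin as below. The function names and the smoothing value passed in the example are assumptions for illustration; the text does not give a value for $\tilde{\alpha}$.

```python
def outer_hermitian(y):
    """Y Y^H for a vector of complex STFT coefficients (one per microphone)."""
    return [[yi * yj.conjugate() for yj in y] for yi in y]


def update_scm(phi_prev, y, alpha):
    """One recursive step: alpha * Phi(t-1) + (1 - alpha) * Y Y^H."""
    yyh = outer_hermitian(y)
    n = len(y)
    return [[alpha * phi_prev[i][j] + (1.0 - alpha) * yyh[i][j]
             for j in range(n)] for i in range(n)]


# Dual-channel toy example with an assumed smoothing factor of 0.5.
phi0 = [[0j, 0j], [0j, 0j]]
phi1 = update_scm(phi0, [1 + 1j, 2j], alpha=0.5)
```

Each step keeps the estimate Hermitian, with real, non-negative diagonal entries that track the per-channel power.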

#### 4.2. A Priori SAP Estimation

## 5. Postfiltering Approaches for Dual-Microphone Smartphones

#### 5.1. Parametric Wiener Filtering

#### 5.2. Optimally Modified Log-Spectral Amplitude Estimator

#### 5.3. Single-Channel Speech and Noise PSD Estimators

#### 5.3.1. Power Level Difference-Based Estimation

#### 5.3.2. Minimum Variance Distortionless Response-Based Estimation

## 6. Experimental Evaluation

- The Perceptual Evaluation of Speech Quality (PESQ) [37] metric is used to evaluate the quality of the enhanced speech signal. It gives a mean opinion score between one and five; the higher the PESQ value, the better the speech quality.
- The Short-Time Objective Intelligibility (STOI) [38] metric is used to evaluate the intelligibility of the enhanced speech signal. The resulting score is a value between zero and one; the higher the STOI value, the better the speech intelligibility.

#### 6.1. Experimental Results: Performance of SAP Estimators

#### 6.2. Experimental Results: Performance of RTF Estimators

#### 6.3. Experimental Results: Performance of Single-Channel Clean Speech PSD Estimators

#### 6.4. Experimental Results: Performance of Postfiltering Approaches

## 7. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Parchami, M.; Zhu, W.P.; Champagne, B.; Plourde, E. Recent developments in speech enhancement in the short-time Fourier transform domain. IEEE Circuits Syst. Mag. **2016**, 16, 45–77.
2. Benesty, J.; Chen, J.; Huang, Y. Microphone Array Signal Processing; Springer: Berlin, Germany, 2008; Volume 1.
3. Kumatani, K.; McDonough, J.; Raj, B. Microphone array processing for distant speech recognition: From close-talking microphones to far-field sensors. IEEE Signal Process. Mag. **2012**, 29, 127–140.
4. Souden, M.; Benesty, J.; Affes, S.; Chen, J. An Integrated Solution for Online Multichannel Noise Tracking and Reduction. IEEE Trans. Audio Speech Lang. Process. **2011**, 19, 2159–2169.
5. Taseska, M.; Habets, E. Nonstationary noise PSD matrix estimation for multichannel blind speech extraction. IEEE/ACM Trans. Audio Speech Lang. Process. **2017**, 25, 2223–2236.
6. Gannot, S.; Vincent, E.; Markovich-Golan, S.; Ozerov, A. A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation. IEEE/ACM Trans. Audio Speech Lang. Process. **2017**, 25, 692–730.
7. Cohen, I. Relative transfer function identification using speech signals. IEEE Trans. Speech Audio Process. **2004**, 12, 451–459.
8. Markovich, S.; Gannot, S.; Cohen, I. Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals. IEEE Trans. Audio Speech Lang. Process. **2009**, 17, 1071–1086.
9. Markovich-Golan, S.; Gannot, S. Performance analysis of the covariance subtraction method for relative transfer function estimation and comparison to the covariance whitening method. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QLD, Australia, 19–24 April 2015; pp. 544–548.
10. Serizel, R.; Moonen, M.; Van Dijk, B.; Wouters, J. Low-rank approximation based multichannel Wiener filter algorithms for noise reduction with application in cochlear implants. IEEE Trans. Audio Speech Lang. Process. **2014**, 22, 785–799.
11. Varzandeh, R.; Taseska, M.; Habets, E.A.P. An iterative multichannel subspace-based covariance subtraction method for relative transfer function estimation. In Proceedings of the 2017 Hands-Free Speech Communications and Microphone Arrays (HSCMA), San Francisco, CA, USA, 1–3 March 2017; pp. 11–15.
12. Koldovský, Z.; Malek, J.; Gannot, S. Spatial source subtraction based on incomplete measurements of relative transfer function. IEEE/ACM Trans. Audio Speech Lang. Process. **2015**, 23, 1335–1347.
13. Schmid, D.; Malik, S.; Enzner, G. An expectation-maximization algorithm for multichannel adaptive speech dereverberation in the frequency-domain. In Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; pp. 17–20.
14. Schwartz, B.; Gannot, S.; Habets, E.A.P. Online Speech Dereverberation Using Kalman Filter and EM Algorithm. IEEE/ACM Trans. Audio Speech Lang. Process. **2015**, 23, 394–406.
15. Martín-Doñas, J.M.; López-Espejo, I.; Gomez, A.M.; Peinado, A.M. An Extended Kalman Filter for RTF Estimation in Dual-Microphone Smartphones. In Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–7 September 2018; pp. 2488–2492.
16. Tashev, I.; Mihov, S.; Gleghorn, T.; Acero, A. Sound capture system and spatial filter for small devices. In Proceedings of the Interspeech, Brisbane, Australia, 22–26 September 2008; pp. 435–438.
17. Jeub, M.; Herglotz, C.; Nelke, C.; Beaugeant, C.; Vary, P. Noise reduction for dual-microphone mobile phones exploiting power level differences. In Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; pp. 1693–1696.
18. Nelke, C.M.; Beaugeant, C.; Vary, P. Dual microphone noise PSD estimation for mobile phones in hands-free position exploiting the coherence and speech presence probability. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 7279–7283.
19. Jin, W.; Taghizadeh, M.J.; Chen, K.; Xiao, W. Multi-channel noise reduction for hands-free voice communication on mobile phones. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 506–510.
20. Zelinski, R. A microphone array with adaptive post-filtering for noise reduction in reverberant rooms. In Proceedings of the ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing, New York, NY, USA, 11–14 April 1988; pp. 2578–2581.
21. Marro, C.; Mahieux, Y.; Simmer, K.U. Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering. IEEE Trans. Speech Audio Process. **1998**, 6, 240–259.
22. McCowan, I.A.; Bourlard, H. Microphone Array Post-Filter Based on Noise Field Coherence. IEEE Trans. Speech Audio Process. **2003**, 11, 709–716.
23. Lefkimmiatis, S.; Maragos, P. A generalized estimation approach for linear and nonlinear microphone array post-filters. Speech Commun. **2007**, 49, 657–666.
24. Gannot, S.; Cohen, I. Speech enhancement based on the general transfer function GSC and postfiltering. IEEE Trans. Speech Audio Process. **2004**, 12, 561–571.
25. Habets, E.; Gannot, S.; Cohen, I. Dual-microphone speech dereverberation in a noisy environment. In Proceedings of the 2006 IEEE International Symposium on Signal Processing and Information Technology, Vancouver, BC, Canada, 27–30 August 2006; pp. 651–655.
26. Zheng, C.; Liu, H.; Peng, R.; Li, X. A statistical analysis of two-channel post-filter estimators in isotropic noise fields. IEEE Trans. Audio Speech Lang. Process. **2013**, 21, 336–342.
27. Martín-Doñas, J.M.; López-Espejo, I.; Gomez, A.M.; Peinado, A.M. A postfiltering approach for dual-microphone smartphones. In Proceedings of the IberSpeech, Barcelona, Spain, 21–23 November 2018; pp. 142–146.
28. Esch, T.; Vary, P. Efficient musical noise suppression for speech enhancement systems. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; pp. 4409–4412.
29. Julier, S.; Uhlmann, J. A new extension of the Kalman filter to nonlinear systems. In Proceedings of the SPIE, Orlando, FL, USA, 28 July 1997; pp. 182–193.
30. Ducharme, G.R.; Lafaye de Micheaux, P.; Marchina, B. The complex multinormal distribution, quadratic forms in complex random vectors and an omnibus goodness-of-fit test for the complex normal distribution. Ann. Inst. Stat. Math. **2016**, 68, 77–104.
31. Gerkmann, T.; Breithaupt, C.; Martin, R. Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors. IEEE Trans. Audio Speech Lang. Process. **2008**, 16, 910–919.
32. Cohen, I. Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging. IEEE Trans. Speech Audio Process. **2003**, 11, 466–475.
33. Schwarz, A.; Kellermann, W. Coherent-to-Diffuse Power Ratio Estimation for Dereverberation. IEEE/ACM Trans. Audio Speech Lang. Process. **2015**, 23, 1006–1018.
34. Doclo, S.; Spriet, A.; Wouters, J.; Moonen, M. Speech Distortion Weighted Multichannel Wiener Filtering Techniques for Noise Reduction. In Speech Enhancement; Springer: Berlin, Germany, 2005; pp. 199–228.
35. Ephraim, Y.; Malah, D. Speech enhancement using a minimum mean-square error-log-spectral amplitude estimator. IEEE Trans. Acoustics Speech Signal Process. **1985**, 33, 443–445.
36. Cohen, I.; Berdugo, B. Speech enhancement for non-stationary noise environments. Signal Process. **2001**, 81, 2403–2418.
37. P.862.2: Wideband Extension to Recommendation P.862 for the Assessment Of Wideband Telephone Networks and Speech Codec; ITU-T Std. P.862.2; International Telecommunication Union: Geneva, Switzerland, 2007.
38. Taal, C.H.; Hendriks, R.C.; Heusdens, R.; Jensen, J. An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Trans. Audio Speech Lang. Process. **2011**, 19, 2125–2136.
39. López-Espejo, I.; Peinado, A.M.; Gomez, A.M.; González, J.A. Dual-channel spectral weighting for robust speech recognition in mobile devices. Digit. Signal Process. **2018**, 75, 13–24.
40. Garofolo, J. Getting Started With the DARPA TIMIT CD-ROM: An Acoustic Phonetic Continuous Speech Database; NIST Tech. Rep.; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 1988.
41. Lamel, L.; Kassel, R.; Seneff, S. Speech database development: Design and analysis of the acoustic-phonetic corpus. In Proceedings of the DARPA Speech Recognition Workshop, Noordwijkerhout, The Netherlands, 20–23 September 1989; pp. 2161–2170.
42. Martín-Doñas, J.M.; Peinado, A.M.; López-Espejo, I.; Gomez, A.M. Dual-Channel Postfiltering and eKF-RTF Estimation: Source Code and Audio Examples. 2019. Available online: http://sigmat.ugr.es/dc-ekf-rtf (accessed on 1 May 2019).

**Table 1.** Predefined acoustic environments: each environment combines a reverberation environment with a given noise.

| Reverberation | Noise (Test Only) |
|---|---|
| (A) No reverb. | Car, Street, Pedestrian street |
| (B) Low | Bus, Cafe |
| (C) Medium | Babble, Bus station |
| (D) High | Mall |

|  | N° Utterances | N° Speakers |
|---|---|---|
| Training set | 700 | 440 |
| Test set | 150 | 93 |

**Table 3.** Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) results for different speech absence probability (SAP) estimators when combined with speech presence probability (SPP)-based extended Kalman filter relative transfer function (eKF-RTF) estimation for Minimum Variance Distortionless Response (MVDR) beamforming. Results are broken down by both signal-to-noise ratio (SNR) and device placement.

| Place. | Method | PESQ 20 dB | PESQ 15 dB | PESQ 10 dB | PESQ 5 dB | PESQ 0 dB | PESQ −5 dB | STOI (%) 20 dB | STOI (%) 15 dB | STOI (%) 10 dB | STOI (%) 5 dB | STOI (%) 0 dB | STOI (%) −5 dB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT | Noisy | 2.26 | 1.80 | 1.46 | 1.23 | 1.11 | 1.07 | 95.36 | 91.01 | 83.82 | 74.05 | 62.48 | 51.46 |
| CT | eKF-MCRA | 2.28 | 1.84 | 1.49 | 1.26 | 1.13 | 1.08 | 95.58 | 91.91 | 85.37 | 75.57 | 63.65 | 52.20 |
| CT | eKF-CDR | 2.42 | 2.00 | 1.63 | 1.35 | 1.18 | 1.12 | 93.37 | 89.66 | 83.13 | 73.80 | 62.11 | 50.90 |
| CT | eKF-PLDn | 2.60 | 2.09 | 1.67 | 1.38 | 1.20 | 1.11 | 96.99 | 93.77 | 87.96 | 79.12 | 67.56 | 55.61 |
| CT | eKF-P&C | 2.59 | 2.07 | 1.66 | 1.37 | 1.19 | 1.11 | 96.90 | 93.59 | 87.60 | 78.56 | 66.85 | 54.84 |
| CT | eKF-OracleN | 2.76 | 2.21 | 1.76 | 1.44 | 1.23 | 1.12 | 97.76 | 95.18 | 90.26 | 82.35 | 71.43 | 59.49 |
| FT | Noisy | 2.38 | 1.89 | 1.51 | 1.26 | 1.11 | 1.07 | 94.65 | 89.91 | 82.52 | 72.69 | 61.09 | 50.09 |
| FT | eKF-MCRA | 2.35 | 1.90 | 1.52 | 1.27 | 1.13 | 1.07 | 94.47 | 90.48 | 83.71 | 73.65 | 61.11 | 49.69 |
| FT | eKF-CDR | 2.57 | 2.08 | 1.66 | 1.36 | 1.16 | 1.08 | 94.80 | 90.79 | 83.77 | 73.75 | 61.24 | 49.41 |
| FT | eKF-PLDn | 2.43 | 2.03 | 1.65 | 1.37 | 1.19 | 1.10 | 92.62 | 89.34 | 83.41 | 74.64 | 63.46 | 52.20 |
| FT | eKF-P&C | 2.65 | 2.11 | 1.67 | 1.37 | 1.18 | 1.09 | 95.78 | 91.96 | 85.45 | 76.01 | 64.01 | 52.03 |
| FT | eKF-OracleN | 2.99 | 2.41 | 1.88 | 1.51 | 1.26 | 1.13 | 97.25 | 94.68 | 89.88 | 82.29 | 71.64 | 59.85 |

**Table 4.** Speech distortion (SD) and STOI results for different RTF estimators when used for MVDR beamforming. Results are broken down by both SNR and device placement.

| Place. | Method | SD (%) 20 dB | SD (%) 15 dB | SD (%) 10 dB | SD (%) 5 dB | SD (%) 0 dB | SD (%) −5 dB | STOI (%) 20 dB | STOI (%) 15 dB | STOI (%) 10 dB | STOI (%) 5 dB | STOI (%) 0 dB | STOI (%) −5 dB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT | EVD-PLDn | 0.52 | 0.66 | 0.98 | 1.53 | 2.46 | 3.64 | 96.96 | 93.72 | 87.76 | 78.70 | 66.89 | 54.88 |
| CT | CW-PLDn | 0.52 | 0.63 | 0.92 | 1.41 | 2.27 | 3.43 | 97.00 | 93.77 | 87.87 | 78.89 | 67.10 | 55.08 |
| CT | eKF-PLDn | 0.52 | 0.58 | 0.72 | 0.90 | 1.16 | 1.44 | 96.99 | 93.77 | 87.96 | 79.12 | 67.56 | 55.61 |
| CT | OracleC-PLDn | 0.07 | 0.11 | 0.16 | 0.22 | 0.28 | 0.34 | 97.33 | 94.32 | 88.72 | 80.05 | 68.57 | 56.60 |
| FT | EVD-P&C | 3.64 | 3.56 | 4.03 | 5.12 | 7.11 | 10.04 | 95.56 | 91.84 | 85.38 | 75.80 | 63.69 | 51.69 |
| FT | CW-P&C | 3.96 | 3.79 | 4.19 | 5.22 | 7.15 | 10.01 | 95.54 | 91.88 | 85.46 | 75.94 | 63.85 | 51.78 |
| FT | eKF-P&C | 2.09 | 2.63 | 3.32 | 4.23 | 5.57 | 7.49 | 95.78 | 91.96 | 85.45 | 76.01 | 64.01 | 52.03 |
| FT | OracleC-P&C | 0.24 | 0.45 | 0.81 | 1.26 | 1.82 | 2.43 | 97.05 | 93.89 | 88.23 | 79.66 | 68.18 | 56.17 |

**Table 5.** SD results for different RTF estimators when used for MVDR beamforming. Results are broken down by both reverberation environment and device placement. The noise environments are grouped in terms of the reverberant environment as in Table 1: A (Car, Street, Pedestrian street), B (Bus, Cafe), C (Bus station, Babble) and D (Mall).

| Place. | Method | SD (%) Env. A | SD (%) Env. B | SD (%) Env. C | SD (%) Env. D |
|---|---|---|---|---|---|
| CT | EVD-PLDn | 1.16 | 1.97 | 1.73 | 2.17 |
| CT | CW-PLDn | 1.08 | 1.87 | 1.59 | 2.08 |
| CT | eKF-PLDn | 0.59 | 1.24 | 0.87 | 1.09 |
| CT | OracleC-PLDn | 0.03 | 0.39 | 0.17 | 0.36 |
| FT | EVD-P&C | 3.62 | 7.29 | 5.54 | 8.13 |
| FT | CW-P&C | 3.62 | 7.78 | 5.55 | 8.26 |
| FT | eKF-P&C | 2.92 | 5.12 | 4.32 | 6.13 |
| FT | OracleC-P&C | 0.45 | 1.80 | 1.06 | 2.30 |

**Table 6.** PESQ and STOI results for different clean speech power spectral density (PSD) estimators when combined with Wiener postfiltering applied to the MVDR beamformer output. Results are broken down by SNR and device placement.

| Place. | Method | PESQ 20 dB | PESQ 15 dB | PESQ 10 dB | PESQ 5 dB | PESQ 0 dB | PESQ −5 dB | STOI (%) 20 dB | STOI (%) 15 dB | STOI (%) 10 dB | STOI (%) 5 dB | STOI (%) 0 dB | STOI (%) −5 dB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT | eKF-PLDn | 2.60 | 2.09 | 1.67 | 1.38 | 1.20 | 1.11 | 96.99 | 93.77 | 87.96 | 79.12 | 67.56 | 55.61 |
| CT | WF-Ps-eKF-PLDn | 2.79 | 2.32 | 1.91 | 1.56 | 1.31 | 1.16 | 97.17 | 94.18 | 88.66 | 80.02 | 67.95 | 54.88 |
| CT | WF-Ms-eKF-PLDn | 2.81 | 2.34 | 1.95 | 1.62 | 1.36 | 1.20 | 97.18 | 94.20 | 88.72 | 80.11 | 67.97 | 54.51 |
| FT | eKF-P&C | 2.65 | 2.11 | 1.67 | 1.37 | 1.18 | 1.09 | 95.78 | 91.96 | 85.45 | 76.01 | 64.01 | 52.03 |
| FT | WF-Ps-eKF-P&C | 2.59 | 2.22 | 1.85 | 1.54 | 1.31 | 1.17 | 92.59 | 89.56 | 83.75 | 74.60 | 62.70 | 50.25 |
| FT | WF-Ms-eKF-P&C | 2.95 | 2.45 | 1.99 | 1.64 | 1.36 | 1.21 | 96.10 | 92.64 | 86.51 | 76.91 | 64.20 | 50.80 |

**Table 7.** PESQ and STOI results for different postfilters applied to the MVDR beamformer output and for other related state-of-the-art approaches. Results are broken down by SNR and device placement.

| Place. | Method | PESQ 20 dB | PESQ 15 dB | PESQ 10 dB | PESQ 5 dB | PESQ 0 dB | PESQ −5 dB | STOI (%) 20 dB | STOI (%) 15 dB | STOI (%) 10 dB | STOI (%) 5 dB | STOI (%) 0 dB | STOI (%) −5 dB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CT | PLDwf | 2.81 | 2.38 | 1.98 | 1.64 | 1.36 | 1.20 | 95.94 | 92.21 | 85.70 | 76.11 | 63.53 | 50.42 |
| CT | WF-Ms-eKF-PLDn | 2.81 | 2.34 | 1.95 | 1.62 | 1.36 | 1.20 | 97.18 | 94.20 | 88.72 | 80.11 | 67.97 | 54.51 |
| CT | pWF-Ms-eKF-PLDn | 2.86 | 2.40 | 2.00 | 1.64 | 1.37 | 1.20 | 97.24 | 94.35 | 89.00 | 80.53 | 68.41 | 54.54 |
| CT | OMLSA-Ms-eKF-PLDn | 2.96 | 2.49 | 2.06 | 1.68 | 1.40 | 1.23 | 97.25 | 94.45 | 89.24 | 80.86 | 68.72 | 55.11 |
| FT | SPPCwf | 2.74 | 2.26 | 1.81 | 1.48 | 1.25 | 1.12 | 94.43 | 90.26 | 83.27 | 73.28 | 61.16 | 49.34 |
| FT | WF-Ms-eKF-P&C | 2.95 | 2.45 | 1.99 | 1.64 | 1.36 | 1.21 | 96.10 | 92.64 | 86.51 | 76.91 | 64.20 | 50.80 |
| FT | pWF-Ms-eKF-P&C | 2.99 | 2.49 | 2.01 | 1.63 | 1.36 | 1.21 | 96.12 | 92.73 | 86.68 | 77.08 | 64.18 | 50.36 |
| FT | OMLSA-Ms-eKF-P&C | 2.85 | 2.38 | 1.94 | 1.60 | 1.35 | 1.20 | 95.98 | 92.64 | 86.63 | 77.03 | 64.13 | 50.54 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Martín-Doñas, J.M.; Peinado, A.M.; López-Espejo, I.; Gomez, A.
Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation. *Appl. Sci.* **2019**, *9*, 2520.
https://doi.org/10.3390/app9122520

**Chicago/Turabian Style**

Martín-Doñas, Juan M., Antonio M. Peinado, Iván López-Espejo, and Angel Gomez.
2019. "Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation" *Applied Sciences* 9, no. 12: 2520.
https://doi.org/10.3390/app9122520