Individualized Interaural Feature Learning and Personalized Binaural Localization Model

## Abstract

## 1. Introduction

## 2. Individualized Feature Selection Using Mutual Information

#### 2.1. Mutual Information Computation

#### 2.2. Analysis of Mutual Information in Interaural Cues

#### 2.3. Spatial Feature Learning and Selected Feature Vector

Algorithm 1: Spatial feature learning for robust localization

## 3. Probabilistic Localization Model and System Design

## 4. Feature Dependency Analysis and Assembled Data Partition Model

#### 4.1. Data Partition and Tree-Structured Model

#### 4.2. Random Forest Bagging and Unbiased Probability Estimation

## 5. Model Training and Interpretation

#### 5.1. Model Training and Parameter Selection

#### 5.2. Trained Model Interpretation

## 6. Experiments With Simulated Data

#### 6.1. 3-D Space Localization with Mutual Information–Based Feature Selection

#### 6.1.1. Simulation Configuration

#### 6.1.2. Performance Impact of the Feature Vector Length

#### 6.1.3. Localization Performance

#### 6.2. 3-D Space Localization with Probabilistic Model

#### 6.2.1. Performance Measurements and Simulation Configuration

#### 6.2.2. Localization Performance with Different Training Environment

#### 6.2.3. Localization Performance with Additive Noise

#### 6.2.4. Localization Performance with Reverberations

## 7. Experiment in Laboratory Environment

#### 7.1. Experiment Facility and Room Configurations

#### 7.2. Testing Positions and Microphone Data Pre-Processing

#### 7.3. Experiment Result

## 8. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

**Figure 1.** Mutual information between the spatial cues and the elevation for a range of azimuths, frequencies, and noise conditions.
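The quantity plotted in Figure 1 can be illustrated with a generic histogram (plug-in) estimator of mutual information between one scalar cue and a discrete elevation label. This is only a minimal sketch: the function name `mutual_information`, the equal-width binning, and the default `n_bins` are illustrative assumptions and are not taken from the paper, whose estimator may differ.

```python
import numpy as np

def mutual_information(cue, labels, n_bins=16):
    """Plug-in estimate of I(cue; label) in bits from paired samples.

    Assumption: the continuous cue is discretized into equal-width bins.
    """
    cue = np.asarray(cue, dtype=float)
    labels = np.asarray(labels)
    # Discretize the cue; interior edges map samples to bins 0..n_bins-1.
    edges = np.histogram_bin_edges(cue, bins=n_bins)
    cue_idx = np.clip(np.digitize(cue, edges[1:-1]), 0, n_bins - 1)
    # Map each distinct label (e.g., an elevation angle) to an integer index.
    _, lab_idx = np.unique(labels, return_inverse=True)
    # Joint histogram normalized to a joint probability table.
    joint = np.zeros((n_bins, lab_idx.max() + 1))
    np.add.at(joint, (cue_idx, lab_idx), 1.0)
    joint /= joint.sum()
    p_cue = joint.sum(axis=1, keepdims=True)
    p_lab = joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # skip empty cells, whose contribution is zero
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_cue @ p_lab)[nz])))

# A cue that fully determines a binary label carries 1 bit about it.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1], n_bins=2))  # 1.0
```

A cue that is statistically independent of the label yields an estimate of 0 bits with this estimator when the empirical joint table factorizes exactly.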

**Figure 4.** Feature usage accounts for training conditions (**a**) SNR = ∞; (**b**) SNR = 20 dB and (**c**) SNR = 10 dB. The first two columns show the feature usage for the azimuth model and the last two columns show the average feature usage for the elevation model.

**Figure 5.** (**a**) Split feature learning for $\theta = 30^{\circ}$ and $\varphi = 45^{\circ}$; (**b**) Split feature learning for $\theta = 30^{\circ}$ and $\varphi = 135^{\circ}$. Split feature value comparison between (**a**) $\theta = 30^{\circ}$, $\varphi = 45^{\circ}$ and (**b**) $\theta = 30^{\circ}$, $\varphi = 135^{\circ}$.

**Figure 7.** Comparison of the localization accuracy under different training conditions. The angular error tolerance is $2.5^{\circ}$.

**Figure 8.** Comparison of the localization accuracy between the proposed and PPAM methods using different feature vector types. The angular error tolerance is $2.5^{\circ}$.

**Figure 9.** Comparison of the localization accuracy between the proposed method and PPAM for different $T_{60}$. The angular error tolerance is $2.5^{\circ}$.

**Figure 11.** The hardware setup and the loudspeaker positions. (**a**) The loudspeakers are positioned at the midpoints of the edges of a dodecahedron frame, and the dummy head simulator with two microphones is placed in the center of the speaker array; (**b**) The ground truth positions of the sound sources and their corresponding labels.

Localization Approach | Mean Angular Localization Error (10 dB) | Mean Angular Localization Error (20 dB) | Mean Angular Localization Error (30 dB) |
---|---|---|---|
Proposed learning | $5.63^{\circ}$ | $0.89^{\circ}$ | $0.14^{\circ}$ |
Composite feature [25] | $24.30^{\circ}$ | $5.11^{\circ}$ | $0.85^{\circ}$ |
Cross-Correlation [31] | $67.65^{\circ}$ | $58.55^{\circ}$ | $51.58^{\circ}$ |

(a) Azimuth Accuracy Comparison

SNR | No Noise | No Noise | 30 dB | 30 dB | 20 dB | 20 dB | 10 dB | 10 dB |
---|---|---|---|---|---|---|---|---|
Tolerance | ≤$2.5^{\circ}$ | ≤$5^{\circ}$ | ≤$2.5^{\circ}$ | ≤$5^{\circ}$ | ≤$2.5^{\circ}$ | ≤$5^{\circ}$ | ≤$2.5^{\circ}$ | ≤$5^{\circ}$ |
Proposed - FULL | $99.44\%$ | $100.0\%$ | $98.88\%$ | $100.0\%$ | $97.20\%$ | $99.60\%$ | $94.40\%$ | $97.92\%$ |
PPAM - FULL | $79.92\%$ | $91.12\%$ | $79.52\%$ | $90.72\%$ | $75.36\%$ | $87.6\%$ | $61.36\%$ | $73.84\%$ |
Proposed - ILPD | $95.68\%$ | $97.12\%$ | $87.68\%$ | $90.56\%$ | $78.96\%$ | $86.00\%$ | $68.64\%$ | $79.44\%$ |
PPAM - ILPD | $89.28\%$ | $96.96\%$ | $86.64\%$ | $95.68\%$ | $73.84\%$ | $89.44\%$ | $55.04\%$ | $73.84\%$ |

(b) Elevation Accuracy Comparison

SNR | No Noise | No Noise | 30 dB | 30 dB | 20 dB | 20 dB | 10 dB | 10 dB |
---|---|---|---|---|---|---|---|---|
Tolerance | ≤$2.5^{\circ}$ | ≤$6^{\circ}$ | ≤$2.5^{\circ}$ | ≤$6^{\circ}$ | ≤$2.5^{\circ}$ | ≤$6^{\circ}$ | ≤$2.5^{\circ}$ | ≤$6^{\circ}$ |
Proposed - FULL | $96.08\%$ | $99.52\%$ | $89.60\%$ | $96.72\%$ | $72.64\%$ | $83.84\%$ | $37.04\%$ | $51.20\%$ |
PPAM - FULL | $28.08\%$ | $47.12\%$ | $24.48\%$ | $44.80\%$ | $19.52\%$ | $34.64\%$ | $9.20\%$ | $17.20\%$ |
Proposed - ILPD | $94.40\%$ | $97.12\%$ | $76.96\%$ | $84.00\%$ | $48.72\%$ | $58.88\%$ | $17.76\%$ | $27.04\%$ |
PPAM - ILPD | $44.72\%$ | $71.92\%$ | $40.72\%$ | $63.28\%$ | $24.26\%$ | $41.44\%$ | $9.84\%$ | $18.24\%$ |

(a) Azimuth Accuracy Comparison

$T_{60}$ | 200 ms | 200 ms | 300 ms | 300 ms | 400 ms | 400 ms | 500 ms | 500 ms |
---|---|---|---|---|---|---|---|---|
Tolerance | ≤$2.5^{\circ}$ | ≤$5^{\circ}$ | ≤$2.5^{\circ}$ | ≤$5^{\circ}$ | ≤$2.5^{\circ}$ | ≤$5^{\circ}$ | ≤$2.5^{\circ}$ | ≤$5^{\circ}$ |
Proposed - FULL | $94.32\%$ | $97.92\%$ | $91.44\%$ | $96.72\%$ | $89.44\%$ | $95.76\%$ | $78.88\%$ | $89.12\%$ |
PPAM - FULL | $81.04\%$ | $90.64\%$ | $79.84\%$ | $90.32\%$ | $77.84\%$ | $88.88\%$ | $66.35\%$ | $85.76\%$ |
Proposed - ILPD | $74.24\%$ | $84.96\%$ | $72.08\%$ | $83.28\%$ | $66.32\%$ | $80.4\%$ | $53.04\%$ | $74.88\%$ |
PPAM - ILPD | $86.72\%$ | $96.88\%$ | $96.40\%$ | $77.20\%$ | $77.2\%$ | $94.96\%$ | $58.00\%$ | $83.92\%$ |

(b) Elevation Accuracy Comparison

$T_{60}$ | 200 ms | 200 ms | 300 ms | 300 ms | 400 ms | 400 ms | 500 ms | 500 ms |
---|---|---|---|---|---|---|---|---|
Tolerance | ≤$2.5^{\circ}$ | ≤$6^{\circ}$ | ≤$2.5^{\circ}$ | ≤$6^{\circ}$ | ≤$2.5^{\circ}$ | ≤$6^{\circ}$ | ≤$2.5^{\circ}$ | ≤$6^{\circ}$ |
Proposed - FULL | $75.84\%$ | $89.04\%$ | $68.48\%$ | $82.72\%$ | $55.52\%$ | $72.56\%$ | $42.64\%$ | $62.80\%$ |
PPAM - FULL | $26.16\%$ | $96.88\%$ | $23.92\%$ | $40.00\%$ | $21.12\%$ | $35.52\%$ | $14.12\%$ | $24.80\%$ |
Proposed - ILPD | $61.92\%$ | $42.64\%$ | $53.76\%$ | $70.32\%$ | $42.08\%$ | $61.04\%$ | $22.48\%$ | $40.24\%$ |
PPAM - ILPD | $37.36\%$ | $61.60\%$ | $34.08\%$ | $55.44\%$ | $28.96\%$ | $46.80\%$ | $15.68\%$ | $28.00\%$ |
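The accuracy figures in the comparison tables are reported against an angular error tolerance. A minimal sketch of such a score is given here, assuming (this is not stated in the surviving text) that accuracy means the percentage of test trials whose absolute angle error is within the tolerance. The function name `accuracy_within_tolerance` and the sample values are hypothetical.

```python
def accuracy_within_tolerance(true_deg, est_deg, tol_deg):
    """Percentage of trials whose absolute angle error is <= tol_deg.

    Assumption: errors are plain absolute differences in degrees,
    with no wrap-around handling.
    """
    if len(true_deg) != len(est_deg) or not true_deg:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(t - e) <= tol_deg for t, e in zip(true_deg, est_deg))
    return 100.0 * hits / len(true_deg)

# Hypothetical trials with absolute errors of 1, 4, 0 and 6 degrees.
true_az = [0.0, 10.0, 20.0, 30.0]
est_az = [1.0, 14.0, 20.0, 36.0]
print(accuracy_within_tolerance(true_az, est_az, 2.5))  # 50.0
print(accuracy_within_tolerance(true_az, est_az, 5.0))  # 75.0
```

Under this definition, a looser tolerance can only keep or raise the score for a fixed set of trials, because every trial counted at the tighter tolerance is also counted at the looser one.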

**Table 4.** The performance of the passive model. The ground truth loudspeaker locations are represented in $\Theta_{\mathrm{IP}}$.

Loudspeaker No. | True Azimuth | True Elevation | Estimated Azimuth | Estimated Elevation | Estimated Error |
---|---|---|---|---|---|
1 | $-18.00^{\circ}$ | $63.44^{\circ}$ | $-22.00^{\circ}$ | $81.68^{\circ}$ | $35.25^{\circ}$ |
2 | $18.00^{\circ}$ | $63.44^{\circ}$ | $20.00^{\circ}$ | $67.50^{\circ}$ | $4.33^{\circ}$ |
3 | $30.00^{\circ}$ | $100.82^{\circ}$ | $30.00^{\circ}$ | $71.43^{\circ}$ | $31.53^{\circ}$ |
4 | $0.00^{\circ}$ | $121.72^{\circ}$ | $-5.00^{\circ}$ | $123.75^{\circ}$ | $2.03^{\circ}$ |
5 | $-30.00^{\circ}$ | $100.82^{\circ}$ | $-30.00^{\circ}$ | $84.38^{\circ}$ | $14.04^{\circ}$ |
6 | $0.00^{\circ}$ | $31.72^{\circ}$ | $0.00^{\circ}$ | $33.75^{\circ}$ | $2.03^{\circ}$ |
7 | $54.00^{\circ}$ | $63.44^{\circ}$ | $55.00^{\circ}$ | $60.19^{\circ}$ | $3.56^{\circ}$ |
8 | $30.00^{\circ}$ | $142.62^{\circ}$ | $30.00^{\circ}$ | $140.63^{\circ}$ | $1.73^{\circ}$ |
9 | $-30.00^{\circ}$ | $142.62^{\circ}$ | $-30.00^{\circ}$ | $135.56^{\circ}$ | $6.11^{\circ}$ |
10 | $-54.00^{\circ}$ | $63.44^{\circ}$ | $-55.00^{\circ}$ | $59.06^{\circ}$ | $2.82^{\circ}$ |
11 | $-18.00^{\circ}$ | $0.00^{\circ}$ | $-20.00^{\circ}$ | $0.00^{\circ}$ | $2.00^{\circ}$ |
12 | $18.00^{\circ}$ | $0.00^{\circ}$ | $19.50^{\circ}$ | $10.69^{\circ}$ | $11.88^{\circ}$ |
13 | $54.00^{\circ}$ | $0.00^{\circ}$ | $55.00^{\circ}$ | $0.00^{\circ}$ | $1.48^{\circ}$ |
14 | $90.00^{\circ}$ | $0.00^{\circ}$ | $80.00^{\circ}$ | $-39.94^{\circ}$ | $10.00^{\circ}$ |
15 | $54.00^{\circ}$ | $180.00^{\circ}$ | $55.00^{\circ}$ | $194.06^{\circ}$ | $9.61^{\circ}$ |
16 | $18.00^{\circ}$ | $180.00^{\circ}$ | $20.00^{\circ}$ | $177.75^{\circ}$ | $3.47^{\circ}$ |
17 | $-18.00^{\circ}$ | $180.00^{\circ}$ | $-18.00^{\circ}$ | $157.50^{\circ}$ | $24.36^{\circ}$ |
18 | $-54.00^{\circ}$ | $180.00^{\circ}$ | $-55.00^{\circ}$ | $182.81^{\circ}$ | $2.21^{\circ}$ |
19 | $-90.00^{\circ}$ | $0.00^{\circ}$ | $-80.00^{\circ}$ | $-39.94^{\circ}$ | $10.00^{\circ}$ |
20 | $-54.00^{\circ}$ | $0.00^{\circ}$ | $-55.00^{\circ}$ | $1.69^{\circ}$ | $2.69^{\circ}$ |
21 | $-30.00^{\circ}$ | $-37.38^{\circ}$ | $-30.00^{\circ}$ | $-38.81^{\circ}$ | $1.87^{\circ}$ |
22 | $30.00^{\circ}$ | $-37.38^{\circ}$ | $30.00^{\circ}$ | $-33.75^{\circ}$ | $3.14^{\circ}$ |
23$^{*}$ | $54.00^{\circ}$ | $243.43^{\circ}$ | $55.00^{\circ}$ | $168.75^{\circ}$ | $41.26^{\circ}$ |
24 | $0.00^{\circ}$ | $211.74^{\circ}$ | $-5.00^{\circ}$ | $219.38^{\circ}$ | $9.14^{\circ}$ |
25$^{*}$ | $-54.00^{\circ}$ | $243.43^{\circ}$ | $-55.00^{\circ}$ | $101.25^{\circ}$ | $66.65^{\circ}$ |
26$^{*}$ | $0.00^{\circ}$ | $-58.29^{\circ}$ | $0.00^{\circ}$ | $-33.75^{\circ}$ | $24.54^{\circ}$ |
27$^{*}$ | $30.00^{\circ}$ | $-79.19^{\circ}$ | $30.00^{\circ}$ | $73.13^{\circ}$ | $114.47^{\circ}$ |
28$^{*}$ | $18.00^{\circ}$ | $243.43^{\circ}$ | $20.00^{\circ}$ | $196.88^{\circ}$ | $43.93^{\circ}$ |
29$^{*}$ | $-18.00^{\circ}$ | $243.43^{\circ}$ | $-20.00^{\circ}$ | $157.50^{\circ}$ | $80.27^{\circ}$ |
30$^{*}$ | $-30.00^{\circ}$ | $-79.19^{\circ}$ | $-30.00^{\circ}$ | $-33.75^{\circ}$ | $39.08^{\circ}$ |
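An angular error between a true and an estimated direction can be sketched as the great-circle angle between two unit vectors. The sketch assumes that $\Theta_{\mathrm{IP}}$ denotes interaural-polar coordinates, with azimuth as the lateral angle and elevation as the polar angle about the interaural axis. Both the axis assignment and the function names are assumptions made for illustration. Under this assumption the computation agrees with, for example, the entries for loudspeakers 2 and 6.

```python
import math

def ip_to_unit_vector(azimuth_deg, elevation_deg):
    """Unit direction vector from interaural-polar angles in degrees.

    Assumed axes: y is the interaural axis, x points to the front, z up.
    """
    theta = math.radians(azimuth_deg)    # lateral angle
    phi = math.radians(elevation_deg)    # polar angle about the interaural axis
    return (math.cos(theta) * math.cos(phi),
            math.sin(theta),
            math.cos(theta) * math.sin(phi))

def angular_error_deg(az_true, el_true, az_est, el_est):
    """Great-circle angle in degrees between true and estimated directions."""
    u = ip_to_unit_vector(az_true, el_true)
    v = ip_to_unit_vector(az_est, el_est)
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] so rounding cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

# Loudspeaker 2: true (18.00, 63.44), estimated (20.00, 67.50).
print(round(angular_error_deg(18.00, 63.44, 20.00, 67.50), 2))  # 4.33
# Loudspeaker 6: true (0.00, 31.72), estimated (0.00, 33.75).
print(round(angular_error_deg(0.00, 31.72, 0.00, 33.75), 2))    # 2.03
```

Measuring the angle between direction vectors, rather than differencing azimuth and elevation separately, avoids overweighting elevation errors at large lateral angles, where a given change in polar angle moves the source over a shorter arc.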

