# Optimization of Virtual Loudspeakers for Spatial Room Acoustics Reproduction with Headphones


## Abstract


## 1. Introduction

## 2. Virtual Loudspeaker Position Optimization

#### 2.1. Sample Weighting

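- Energy weighting is the most traditional form of weighting. Each SDM data point is weighted according to its energy, normalized by the largest energy in the response:$$w_{\mathrm{E},n}=\frac{p_{n}^{2}}{\max_{n}\left(p_{n}^{2}\right)}$$where $p_{n}$ is the omnidirectional pressure value of the $n$th SDM data point.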
- Delay weighting emphasizes the SDM data points located in the earlier part of the RIR. The weight is computed as a normalized backward Schroeder integral [16] over the RIR:$$w_{\mathrm{T},n}=\frac{\sum_{k=n}^{N}p_{k}^{2}}{\sum_{k=1}^{N}p_{k}^{2}}$$As a result, the direct sound and the early reflections, which carry distinctly more energy, are weighted more than samples in the later part of the response. This choice is motivated by psychoacoustics, as the early part of the RIR is perceptually more important than the late reverberation [15].
- Gradient weighting is used as a reliability measure for the SDM data points. The reliability of one SDM data point depends on the data points directly before and after it in time. If the DOAs of the neighboring data points are close to the DOA of the current point, the point is given a large weight; the greater the distance to its neighbors, the smaller the weight. The weight of one SDM data point is computed from the mean of the distances between the DOA of the point and the DOAs of the previous and next data points:$$w_{\mathrm{G},n}=10^{\left(\min_{n}\left(d_{\mathrm{G},n}\right)-d_{\mathrm{G},n}\right)/10}$$$$d_{\mathrm{G},n}=\frac{\lVert\mathbf{u}_{\mathrm{doa},n}-\mathbf{u}_{\mathrm{doa},n-1}\rVert+\lVert\mathbf{u}_{\mathrm{doa},n+1}-\mathbf{u}_{\mathrm{doa},n}\rVert}{2h}$$
- Direction weighting emphasizes the data points that have a lot of energy arriving from their general direction, regardless of the temporal information. The operation can be thought of as a low-pass filter for directions: a single high-energy sample does not get a large weight unless there are more high-energy samples in the same spherical sector. Conversely, sectors with mainly low-energy samples are given a small weight. Perceptually, this operation can be thought of as simulating the limits of perception. A single high-energy sample cannot be heard separately; a longer period of time is needed to generate a perceivable acoustic event. Therefore, directions with more high-energy samples should be prioritized when searching for optimal loudspeaker positions. There is also an algorithmic benefit: the influence of high-energy sectors is spread, making it easier for the optimization algorithm to iterate toward the directions with higher energy density. Similar to the spatial downsampling presented later in Section 2.3, the general energy directions are approximated by calculating an energy map. First, an equidistant grid of points is created on the surface of the unit sphere, representing DOAs in the listener space. Then, each SDM data point is assigned to the closest point in this grid, measuring the distance from the DOA of the data point to the DOA of the grid point. When this nearest-neighbor search is complete, the energies of the assigned data points are accumulated per grid point. The operation results in an energy map that approximates the energy arriving from different directions.
  Finally, the weight of the SDM data point is calculated from this map by interpolating over the three nearest grid points:$$w_{\mathrm{D},n}=\frac{\sum_{i=1}^{3}E_{\min(n,i)}\,d_{\min(n,i)}}{\sum_{i=1}^{3}d_{\min(n,i)}}$$$$d_{\min(n,i)}=i\text{th smallest value of the set }\left\{\lVert\mathbf{u}_{\mathrm{doa},n}-\mathbf{u}_{\mathrm{grid},m}\rVert,\ m=1,\dots,M\right\}$$where $E_{\min(n,i)}$ is the energy-map value at the grid point corresponding to $d_{\min(n,i)}$.
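
The four weights defined above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, `h` is treated as a user-supplied scale factor because it is not defined in this section, the first and last data points reuse themselves as the missing neighbor, and a Fibonacci lattice stands in for the equidistant grid. The direction weight follows the interpolation formula exactly as printed.

```python
import math

def energy_weights(p):
    """w_E,n = p_n^2 / max_n(p_n^2)."""
    e = [x * x for x in p]
    e_max = max(e)
    return [x / e_max for x in e]

def delay_weights(p):
    """w_T,n: backward (Schroeder) integral of p^2, normalized by the total energy."""
    e = [x * x for x in p]
    total = sum(e)
    w, tail = [0.0] * len(e), 0.0
    for n in range(len(e) - 1, -1, -1):
        tail += e[n]              # energy from sample n to the end
        w[n] = tail / total
    return w

def gradient_weights(doa, h=1.0):
    """w_G,n = 10^((min(d_G) - d_G,n) / 10); h is an assumed scale factor."""
    last = len(doa) - 1
    d = []
    for n, u in enumerate(doa):
        prev = doa[max(n - 1, 0)]     # assumption: clamp at the first sample
        nxt = doa[min(n + 1, last)]   # assumption: clamp at the last sample
        d.append((math.dist(u, prev) + math.dist(nxt, u)) / (2.0 * h))
    d_min = min(d)
    return [10.0 ** ((d_min - x) / 10.0) for x in d]

def fibonacci_grid(M):
    """Approximately uniform points on the unit sphere (assumed grid type)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for m in range(M):
        z = 1.0 - 2.0 * (m + 0.5) / M
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * m), r * math.sin(golden * m), z))
    return pts

def direction_weights(p, doa, grid):
    """w_D,n from an energy map accumulated on the grid (nearest-neighbor binning)."""
    energy_map = [0.0] * len(grid)
    for x, u in zip(p, doa):
        nearest = min(range(len(grid)), key=lambda m: math.dist(u, grid[m]))
        energy_map[nearest] += x * x
    w = []
    for u in doa:
        # Three smallest distances and the indices of their grid points.
        three = sorted((math.dist(u, g), m) for m, g in enumerate(grid))[:3]
        den = sum(d for d, _ in three)
        num = sum(energy_map[m] * d for d, m in three)
        w.append(num / den)
    return w

if __name__ == "__main__":
    p = [1.0, 0.5, 0.25, 0.1]
    doa = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    print(energy_weights(p))
    print(delay_weights(p))
    print(gradient_weights(doa))
    print(direction_weights(p, doa, fibonacci_grid(100)))
```

How the individual weights are combined into the single weight vector used by the optimization is not specified in this section, so the sketch leaves them separate.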

#### 2.2. Initialization and Weighted DOA Clustering

**Algorithm 1.** Virtual loudspeaker optimization by using weighted DOA clustering.

| Line | Statement | Comment |
|---|---|---|
| 1 | **function** OptimizeLoudspeakerPositions($\mathbf{w},\mathbf{u}_{\mathrm{doa}},N_{\mathrm{ls}},d_{\mathrm{repulsion}}$) | |
| 2 | $\mathbf{W}_{\mathrm{map}}\leftarrow$ CalculateWeightMap($\mathbf{w},\mathbf{u}_{\mathrm{doa}}$) | Initialize loudspeaker positions |
| 3 | $\mathbf{u}_{\mathrm{ls}}\leftarrow$ SampleWeightMap($\mathbf{W}_{\mathrm{map}},N_{\mathrm{ls}}$) | |
| 4 | **repeat** | Calculate weighted K-means |
| 5 | $\mathbf{u}_{\mathrm{ls},\mathrm{old}}\leftarrow\mathbf{u}_{\mathrm{ls}}$ | |
| 6 | $\mathbf{c}_{\mathrm{ls}}\leftarrow\mathbf{0}^{N\times 1}$ | |
| 7 | **for** $n\leftarrow 1$ **to** $N$ **do** | |
| 8 | $c_{\mathrm{ls},n}\leftarrow\arg\min_{i}\left(\lVert\mathbf{u}_{\mathrm{ls},i}-\mathbf{u}_{\mathrm{doa},n}\rVert\right)$ | Find the closest loudspeaker to the SDM data point |
| 9 | **end for** | |
| 10 | **for** $i\leftarrow 1$ **to** $N_{\mathrm{ls}}$ **do** | |
| 11 | $(\mathbf{w}_{\mathrm{cls}},\mathbf{u}_{\mathrm{cls}})\leftarrow(\mathbf{w},\mathbf{u}_{\mathrm{doa}})\rvert_{c_{\mathrm{ls},n}=i}$ | Select the data points assigned to the $i$th loudspeaker |
| 12 | $\mathbf{u}_{\mathrm{ls},i}\leftarrow\sum_{k}\left(w_{\mathrm{cls},k}\mathbf{u}_{\mathrm{cls},k}\right)/\sum_{k}\left(w_{\mathrm{cls},k}\right)$ | Weighted mean of the assigned DOAs |
| 13 | **end for** | |
| 14 | $\mathbf{d}_{\mathrm{closest}}\leftarrow\mathbf{0}^{N_{\mathrm{ls}}\times 1}$ | Apply repulsion area |
| 15 | **for** $i\leftarrow 1$ **to** $N_{\mathrm{ls}}$ **do** | |
| 16 | $d_{\mathrm{closest},i}\leftarrow\min_{j\ne i}\left(\lVert\mathbf{u}_{\mathrm{ls},i}-\mathbf{u}_{\mathrm{ls},j}\rVert\right)$ | Distance to the closest neighbor |
| 17 | **end for** | |
| 18 | **for all** $d_{\mathrm{closest},i}<d_{\mathrm{repulsion}}$, from smallest to largest **do** | |
| 19 | $\mathbf{u}_{\mathrm{v}}\leftarrow$ vertices of a Voronoi diagram of $\mathbf{u}_{\mathrm{ls}}$ | Potential furthest points |
| 20 | $\mathbf{u}_{\mathrm{ls},i}\leftarrow\arg\max_{\mathbf{u}_{\mathrm{v},j}}\left(\min_{k}\left(\lVert\mathbf{u}_{\mathrm{v},j}-\mathbf{u}_{\mathrm{ls},k}\rVert\right)\right)$ | Select the $\mathbf{u}_{\mathrm{v},j}$ furthest from all $\mathbf{u}_{\mathrm{ls}}$ |
| 21 | **end for** | |
| 22 | reduce $d_{\mathrm{repulsion}}$ | |
| 23 | **until** all $\lVert\mathbf{u}_{\mathrm{ls}}-\mathbf{u}_{\mathrm{ls},\mathrm{old}}\rVert<$ threshold | |
| 24 | **return** $\mathbf{u}_{\mathrm{ls}}$ | |
| 25 | **end function** | |
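
The clustering core of Algorithm 1 (nearest-loudspeaker assignment followed by a weighted mean of the assigned DOAs, repeated until the positions stop moving) can be sketched as below. This is a simplified illustration and not the authors' code: the function names are hypothetical, the initial positions are passed in by the caller instead of being sampled from a weight map, the weighted mean is projected back onto the unit sphere (an assumption, since the pseudocode states only the weighted mean), a loudspeaker with no assigned points keeps its previous position, and the Voronoi-based repulsion step is omitted entirely.

```python
import math

def normalize(v):
    """Scale a 3-D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def weighted_doa_kmeans(w, u_doa, u_ls_init, threshold=1e-6, max_iter=100):
    """Weighted K-means on unit DOA vectors; repulsion step not included."""
    u_ls = [normalize(u) for u in u_ls_init]
    for _ in range(max_iter):
        u_old = list(u_ls)
        # Assignment: index of the closest loudspeaker for every data point.
        c_ls = [min(range(len(u_ls)), key=lambda i: math.dist(u_ls[i], u))
                for u in u_doa]
        # Update: weighted mean of the DOAs assigned to each loudspeaker.
        for i in range(len(u_ls)):
            members = [(wn, u) for wn, u, c in zip(w, u_doa, c_ls) if c == i]
            w_sum = sum(wn for wn, _ in members)
            if w_sum == 0.0:
                continue  # assumption: an empty cluster keeps its position
            mean = tuple(sum(wn * u[k] for wn, u in members) / w_sum
                         for k in range(3))
            if any(abs(c) > 1e-12 for c in mean):
                u_ls[i] = normalize(mean)  # assumption: stay on the unit sphere
        # Stop when every loudspeaker moved less than the threshold.
        if all(math.dist(a, b) < threshold for a, b in zip(u_ls, u_old)):
            break
    return u_ls

if __name__ == "__main__":
    doa = [(1.0, 0.0, 0.0), normalize((1.0, 0.2, 0.0)),
           (-1.0, 0.0, 0.0), normalize((-1.0, 0.2, 0.0))]
    print(weighted_doa_kmeans([1.0, 1.0, 1.0, 1.0], doa,
                              [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0)]))
```

Raising the weight of one data point pulls its loudspeaker toward that point, which is the mechanism by which the sample weights steer the final positions.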

#### 2.3. Spatial Downsampling

## 3. Perceptual Evaluation with a Listening Test

#### 3.1. Listening Test Setup and Sound Signals

#### 3.2. Listening Test Method

#### 3.3. Statistical Analysis

## 4. Results

#### 4.1. Discrimination

#### 4.2. Discrimination Criteria

## 5. Discussion

## 6. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| SDM | Spatial Decomposition Method |
| IR | Impulse Response |
| RIR | Room Impulse Response |
| NLS | Nearest Loudspeaker Synthesis |
| HRTF | Head Related Transfer Function |
| VBAP | Vector Base Amplitude Panning |
| O14/O16/O22/O24 | Optimized loudspeaker setup with 14/16/22/24 loudspeakers |
| U14/U16/U22/U24 | Uniform loudspeaker setup with 14/16/22/24 loudspeakers |
| CI | Confidence interval |
| PDF | Probability distribution function |

## References

1. Brandenburg, K.; Werner, S.; Klein, F.; Sladeczek, C. The Technology of Binaural Listening & Understanding: Auditory illusion through headphones: History, challenges and new solutions. In Proceedings of the 22nd International Congress on Acoustics, Buenos Aires, Argentina, 5–9 September 2016.
2. Hacihabiboglu, H.; De Sena, E.; Cvetkovic, Z.; Johnston, J.; Smith, J.O., III. Perceptual Spatial Audio Recording, Simulation, and Rendering: An overview of spatial-audio techniques based on psychoacoustics. IEEE Signal Process. Mag. **2017**, 34, 36–54.
3. Möller, H. Fundamentals of binaural technology. Appl. Acoust. **1992**, 36, 171–218.
4. Brimijoin, W.O.; Boyd, A.W.; Akeroyd, M.A. The contribution of head movement to the externalization and internalization of sounds. PLoS ONE **2013**, 8, e83068.
5. Hendrickx, E.; Stitt, P.; Messonnier, J.C.; Lyzwa, J.M.; Katz, B.F.; de Boishéraud, C. Influence of head tracking on the externalization of speech stimuli for non-individualized binaural synthesis. J. Acoust. Soc. Am. **2017**, 141, 2011–2023.
6. Tervo, S.; Pätynen, J.; Kuusinen, A.; Lokki, T. Spatial decomposition method for room impulse responses. J. Audio Eng. Soc. **2013**, 61, 17–28.
7. Lokki, T.; Pätynen, J.; Kuusinen, A.; Tervo, S. Concert hall acoustics: Repertoire, listening position and individual taste of the listeners influence the qualitative attributes and preferences. J. Acoust. Soc. Am. **2016**, 140, 551–562.
8. Pätynen, J.; Lokki, T. Concert halls with strong and lateral sound increase the emotional impact of orchestra music. J. Acoust. Soc. Am. **2016**, 139, 1214–1224.
9. Tervo, S.; Laukkanen, P.; Pätynen, J.; Lokki, T. Preference of critical listening environment among sound engineers. J. Audio Eng. Soc. **2014**, 62, 300–314.
10. Tervo, S.; Pätynen, J.; Kaplanis, N.; Lydolf, M.; Bech, S.; Lokki, T. Spatial Analysis and Synthesis of Car Audio System and Car-Cabin Acoustics with a Compact Microphone Array. J. Audio Eng. Soc. **2015**, 63, 914–925.
11. Kaplanis, N.; Bech, S.; Tervo, S.; Pätynen, J.; Lokki, T.; Van Waterschoot, T.; Jensen, S.H. A rapid sensory analysis method for perceptual assessment of automotive audio. J. Audio Eng. Soc. **2017**, 65, 130–146.
12. Amengual Gari, S.; Kob, M.; Lokki, T.; Pätynen, J.; Välimäki, V. Investigations on Stage Acoustic Preferences of Solo Trumpet Players using Virtual Acoustics. In Proceedings of the 14th Sound and Music Computing Conference, Espoo, Finland, 5–8 July 2017.
13. Pulkki, V. Virtual Sound Source Positioning Using Vector Base Amplitude Panning. J. Audio Eng. Soc. **1997**, 45, 456–466.
14. Pätynen, J.; Tervo, S.; Lokki, T. Amplitude panning decreases spectral brightness with concert hall auralizations. In Proceedings of the 55th International Conference of the Audio Engineering Society on Spatial Audio, Helsinki, Finland, 27–29 August 2014; pp. 1–8.
15. Haapaniemi, A.; Lokki, T. Identifying concert halls from source presence vs room presence. J. Acoust. Soc. Am. **2014**, 135, EL311–EL317.
16. Schroeder, M.R. New Method of Measuring Reverberation Time. J. Acoust. Soc. Am. **1965**, 37, 409–412.
17. Fisher, N.I.; Lewis, T.; Embleton, B.J. Statistical Analysis of Spherical Data; Cambridge University Press: Cambridge, UK, 1987.
18. Huttunen, T.; Vanne, A. End-to-End Process for HRTF Personalization; Audio Engineering Society Convention 142; Audio Engineering Society: Berlin, Germany, 2017.
19. Lawless, H.T.; Heymann, H. Sensory Evaluation of Food: Principles and Practices, 2nd ed.; Springer Science & Business Media: Berlin, Germany, 2010.
20. Kuusinen, A.; Lokki, T. Wheel of Concert Hall Acoustics. Acta Acust. United Acust. **2017**, 103, 185–188.

**Figure 1.** Position optimization system outline. Data generated by the Spatial Decomposition Method (SDM), containing the directions of arrival (DOA) and the omnidirectional pressure values for each sample, are provided to the algorithm, from which the optimized loudspeaker positions are approximated. The numbers inside the boxes refer to the corresponding sections in this paper. The spatial downsampling part (in blue) works as an extra acceleration component for the main algorithm (in black).

**Figure 2.** A sketch of the listening test setup. The listener (in the middle) is surrounded by six infrared cameras (red) that track the movements of the noise-canceling headphones. The field of view has been obscured with a curtain.

**Figure 3.** Loudspeaker setups (white circles) overlaid with the spatial map of overall sound energy in the small room case: (**a**) uniform setup with 14 loudspeakers; (**b**) uniform setup with 22 loudspeakers; (**c**) optimized setup with 14 loudspeakers; and (**d**) optimized setup with 22 loudspeakers.

**Figure 4.** Loudspeaker setups used in the concert hall samples: (**a**) uniform setup with 16 loudspeakers; (**b**) uniform setup with 24 loudspeakers; (**c**) optimized setup with 16 loudspeakers; and (**d**) optimized setup with 24 loudspeakers.

**Figure 5.** The discrimination rates of the different listening conditions and their one-sided 95% confidence intervals (P${}_{\mathrm{d}}$: set discrimination level; P${}_{0}$: chance rate).

**Table 1.** Results of the listening test and the frequencies of the attributes elicited by 15 subjects for correctly rated pairs.

| Space | Small Room | | | | | | Concert Hall | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample A | O14 | O14 | O14 | O22 | O22 | U14 | O16 | O16 | O16 | O24 | O24 | U16 |
| Sample B | O22 | U14 | U22 | U14 | U22 | U22 | O24 | U16 | U24 | U16 | U24 | U24 |
| Subjects 4/4 correct | 2 | 13 | 7 | 10 | 7 | 14 | 3 | 0 | 1 | 3 | 2 | 4 |
| 4/4 proportion (%) | 13.3 | 86.7 | 46.7 | 66.7 | 46.7 | 93.3 | 20.0 | 0.0 | 6.7 | 20.0 | 13.3 | 26.7 |
| P${}_{\mathrm{adj}}$ (%) | 7.6 | 85.8 | 43.1 | 64.4 | 43.1 | 92.9 | 14.7 | 0.0 | 0.4 | 14.7 | 7.6 | 21.8 |
| CI lower (%) | - | 70.9 | 22.1 | 44.1 | 22.1 | 82.0 | - | - | - | - | - | - |
| CI upper (%) | 18.8 | - | - | - | - | - | 29.7 | 0.0 | 3.3 | 29.7 | 18.8 | 39.3 |
| **Attribute** | **Frequency of attributes elicited on the difference between samples A and B** | | | | | | | | | | | |
| image shift | 4 | 15 | 9 | 13 | 9 | 12 | 4 | 2 | 6 | 2 | 3 | 2 |
| reverberance | - | 7 | 1 | 5 | 7 | 15 | 4 | 3 | 5 | 4 | 3 | 2 |
| width | - | 6 | 2 | 4 | 6 | 4 | 2 | 7 | 3 | 4 | 3 | 3 |
| spectral balance | 2 | 9 | 5 | 3 | 1 | 2 | 4 | 1 | 2 | - | 1 | 7 |
| timbre | 1 | 3 | 3 | 4 | 4 | 3 | - | 1 | 2 | 4 | 4 | 3 |
| spatial impression | 1 | 3 | 6 | 2 | 2 | 5 | 3 | 1 | - | - | 2 | 4 |
| envelopment | - | 4 | 3 | 5 | 1 | 3 | 2 | 2 | 3 | 1 | 2 | 3 |
| bass | 1 | 3 | 2 | 2 | 4 | 4 | 1 | 1 | 2 | - | - | 1 |
| loudness | 1 | 1 | - | - | 1 | - | 6 | 3 | 1 | 3 | 1 | 2 |
| brightness | 1 | 1 | - | 2 | - | 3 | 2 | 1 | - | 1 | 1 | - |
| distance | - | 1 | - | 2 | 1 | 1 | 1 | - | 1 | 1 | 2 | - |
| size of space | - | - | - | - | - | - | 4 | 1 | - | 1 | 1 | - |
| focus | 1 | - | - | - | 2 | 2 | - | - | - | - | - | - |
| warmth | - | - | - | 1 | - | 2 | - | - | - | - | - | - |
| openness | - | - | - | 1 | - | - | - | - | - | - | 2 | - |
| dynamic range | - | - | - | - | - | - | - | - | - | 1 | 1 | - |
| clarity | - | - | - | - | - | 1 | - | - | - | - | - | - |
| source presence | - | - | 1 | - | - | - | - | - | - | - | - | - |
| spatial balance | - | - | - | - | 1 | - | - | - | - | - | - | - |
| Total | 12 | 53 | 32 | 44 | 39 | 57 | 33 | 23 | 25 | 22 | 26 | 27 |
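
The P${}_{\mathrm{adj}}$ row of Table 1 appears to be consistent with the standard correction for guessing, $P_{\mathrm{adj}}=(P_{\mathrm{obs}}-P_{0})/(1-P_{0})$, with a chance rate of $P_{0}=1/16$ (four correct answers in a row at probability 1/2 each). Both the formula and the chance rate are inferred here from the tabulated numbers, not quoted from the text, and the function name and the clipping of negative values to zero are assumptions of this sketch.

```python
def chance_corrected_proportion(n_correct, n_subjects, p_chance=1.0 / 16.0):
    """Correct an observed proportion for guessing; negative values clip to 0."""
    p_obs = n_correct / n_subjects
    return max(0.0, (p_obs - p_chance) / (1.0 - p_chance))

if __name__ == "__main__":
    # "Subjects 4/4 correct" row of Table 1, 15 subjects per pair.
    for n_correct in (2, 13, 7, 10, 7, 14, 3, 0, 1, 3, 2, 4):
        print(round(100.0 * chance_corrected_proportion(n_correct, 15), 1))
```

For example, 13 of 15 subjects give an observed proportion of 86.7%, which this correction maps to 85.8%, matching the second column of the table.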

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Puomio, O.; Pätynen, J.; Lokki, T.
Optimization of Virtual Loudspeakers for Spatial Room Acoustics Reproduction with Headphones. *Appl. Sci.* **2017**, *7*, 1282.
https://doi.org/10.3390/app7121282
