Low Frequency Interactive Auralization Based on a Plane Wave Expansion

This paper addresses the problem of interactive auralization of enclosures based on a finite superposition of plane waves. For this, room acoustic simulations are performed using the Finite Element (FE) method. From the FE solution, a virtual microphone array is created and an inverse method is implemented to estimate the complex amplitudes of the plane waves. The effects of Tikhonov regularization are also considered in the formulation of the inverse problem, which leads to a more efficient solution in terms of the energy used to reconstruct the acoustic field. Based on this sound field representation, translation and rotation operators are derived, enabling the listener to move within the enclosure and listen to the changes in the acoustic field. An implementation of an auralization system based on the proposed methodology is presented. The results suggest that the plane wave expansion is a suitable approach to synthesize sound fields. Its advantage lies in the flexibility it offers to implement several sound reproduction techniques for auralization applications. Furthermore, features such as translation and rotation of the acoustic field make it convenient for interactive acoustic renderings.

Figure 1 shows the vector x_0, which identifies the origin of a relative coordinate system corresponding to the centre of the listener's head. The vector x_rel defines the same point in space in the relative coordinate system as is identified by the vector x in absolute coordinates.
Figure 1. Vector x is represented as x_rel in the relative coordinate system centred at x_0.
The sound field translation operator is derived by considering two plane wave expansions of the same sound field, but centred at different points in space, specifically at the origins of the absolute and relative coordinate systems. In this case, the difference between the two plane wave expansions is given by the plane wave amplitude densities q(ŷ, ω) and q_rel(ŷ, ω), respectively. Therefore, the objective is to express one density in terms of the other. This is achieved by expanding x_rel as x − x_0, leading to:

q_rel(ŷ, ω) = q(ŷ, ω) e^{jk x_0·ŷ}. (2)

Equation (2) indicates that e^{jk x_0·ŷ} is the translation operator for the plane wave expansion from the origin to x_0. Its equivalent in the time domain is easily found by using the shifting property of the Fourier transform:

q_rel(ŷ, t) = q(ŷ, t + (x_0·ŷ)/c). (3)
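The frequency-domain form of Equation (2) can be sketched numerically. The following is a minimal illustration, not the authors' implementation; the array names `q`, `y_hat` and the helper `translate_expansion` are hypothetical:

```python
import numpy as np

def translate_expansion(q, y_hat, x0, k):
    """Translate a discrete plane wave expansion to a new origin x0.

    q     : (L,) complex plane wave amplitudes about the original origin
    y_hat : (L, 3) unit direction vectors of the plane waves
    x0    : (3,) vector from the old origin to the new one
    k     : wavenumber omega / c

    Implements Equation (2): q_rel = q * exp(j k x0 . y_hat).
    """
    return q * np.exp(1j * k * (y_hat @ x0))

# Example: one plane wave along +x, listener moved 0.5 m along +x.
q = np.array([1.0 + 0j])
y_hat = np.array([[1.0, 0.0, 0.0]])
q_rel = translate_expansion(q, y_hat, np.array([0.5, 0.0, 0.0]), k=2.0)
# The magnitude is unchanged; only the phase is shifted by k * (x0 . y_hat).
```

Note that translation only rephases each plane wave, which is why it is inexpensive enough for interactive use.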

A spherical harmonic transformation of the plane wave expansion can be performed based on the Jacobi-Anger relation [24]. From the spherical harmonic representation, the rotation can be generated by the implementation of a rotation matrix. The derivation of a sound field rotation operator in the spherical harmonic domain proceeds as follows. A rotation in the azimuthal plane by φ_0 can be expressed as:

q_rot(θ, φ, ω) = q(θ, φ − φ_0, ω). (4)

Expanding the right-hand side of Equation (4) in spherical harmonics gives:

q(θ, φ − φ_0, ω) = ∑_{n=0}^{∞} ∑_{m=−n}^{n} q_nm(ω) √[((2n + 1)/4π) ((n − m)!/(n + m)!)] P_n^m(cos θ) e^{jmφ} e^{−jmφ_0}, (5)

which yields:

q_rot(θ, φ, ω) = ∑_{n=0}^{∞} ∑_{m=−n}^{n} q̃_nm(ω) Y_n^m(θ, φ), (6)

in which:

q̃_nm(ω) = q_nm(ω) e^{−jmφ_0}. (7)

Equation (7) indicates that the azimuthal rotation of the sound field can be performed by taking the product of the complex spherical harmonic coefficients and a complex exponential whose argument depends on the angle of rotation. An inverse spherical harmonic transformation can be implemented to return to the plane wave domain after the rotation has been carried out.
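Equation (7) amounts to a per-coefficient phase factor. A minimal sketch, assuming the coefficients are stored flat with a parallel array of azimuthal orders m (names are illustrative):

```python
import numpy as np

def rotate_azimuth(q_nm, m_indices, phi0):
    """Rotate spherical harmonic coefficients by phi0 about the z axis.

    q_nm      : (K,) complex coefficients, one per (n, m) pair
    m_indices : (K,) azimuthal order m of each coefficient
    phi0      : rotation angle in radians

    Implements Equation (7): q~_nm = q_nm * exp(-j m phi0).
    """
    return np.asarray(q_nm) * np.exp(-1j * np.asarray(m_indices) * phi0)

# Coefficients for (n, m) = (0,0), (1,-1), (1,0), (1,1), rotated by 90 degrees.
q_nm = np.array([1.0 + 0j, 0.5 + 0j, 0.2 + 0j, 0.5j])
m = np.array([0, -1, 0, 1])
q_rot = rotate_azimuth(q_nm, m, np.pi / 2)
# Coefficients with m = 0 are unaffected by an azimuthal rotation.
```

Because only a diagonal phase factor is applied, the cost is linear in the number of coefficients, which suits real-time head tracking.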

Although the plane wave expansion is an integral representation, for the implementation of the method, Equation (1) is discretized into a finite number of L plane waves, namely:

p(x, ω) = ∑_{l=1}^{L} e^{jk x·ŷ_l} q_l(ω) ΔΩ_l, (8)

where ΔΩ_l corresponds to the portion of the unit sphere that is associated with the plane wave l. The discretization of Equation (1) is performed by using a predefined uniform distribution of L plane waves over a unit sphere [25]. Based on a finite plane wave expansion, an inverse method can be implemented to estimate a discrete set of plane waves whose complex amplitudes synthesize the target acoustic field. To that end, the acoustic pressure calculated with the FEM at a specific location of the domain can be understood as the output of an omnidirectional microphone. The combination of acoustic pressures at discrete points generates a virtual microphone array that is used to extract spatial information from the sound field. Based on that information, the amplitude q_l of each plane wave is determined by the inversion of the transfer function matrix between the microphones and the plane waves [26]. This principle is explained as follows: the complex acoustic pressure predicted with the FE model at M virtual microphone positions is denoted in vector notation as:

p(ω) = [p_1(ω), p_2(ω), ..., p_M(ω)]^T, (9)

where p_m is the acoustic pressure at the m-th virtual microphone. Likewise, the complex amplitudes of the L plane waves used to reconstruct the sound field are represented by the vector:

q(ω) = [q_1(ω), q_2(ω), ..., q_L(ω)]^T. (10)

Finally, the transfer function that describes the sound propagation from each plane wave to each virtual microphone is arranged as an M × L matrix H(ω) in which h_ml = e^{jk x_m·ŷ_l}. Consequently, the relationship between the plane wave amplitudes and the virtual microphone signals is:

p(ω) = H(ω) q(ω). (11)

The amplitude of the plane waves is calculated by solving Equation (11) for q(ω).
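Equation (8) is a direct summation and can be evaluated as follows. This is a sketch with illustrative names (`y_hat` for the directions, `d_omega` for the solid angle weights ΔΩ_l):

```python
import numpy as np

def pressure_from_expansion(x, q, y_hat, k, d_omega):
    """Evaluate Equation (8): p(x) = sum_l exp(j k x . y_l) q_l dOmega_l.

    x       : (3,) evaluation point
    q       : (L,) complex plane wave amplitudes
    y_hat   : (L, 3) unit direction vectors
    k       : wavenumber
    d_omega : (L,) solid angle associated with each plane wave
              (4*pi/L for a uniform distribution)
    """
    return np.sum(np.exp(1j * k * (y_hat @ x)) * q * d_omega)

# Single unit-amplitude wave along +x, evaluated half a wavelength away:
p = pressure_from_expansion(np.array([np.pi, 0.0, 0.0]),
                            np.array([1.0 + 0j]),
                            np.array([[1.0, 0.0, 0.0]]),
                            k=1.0, d_omega=np.array([1.0]))
# A half-wavelength offset flips the sign of the pressure.
```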
This is carried out in terms of a least squares solution, which minimizes the sum of the squared errors between the reconstructed and the target sound field [26]. In the case of an overdetermined problem (more virtual microphones than plane waves), the error vector can be expressed as:

e(ω) = p̂(ω) − p(ω), (12)

where p̂(ω) is the pressure reconstructed by the plane wave expansion and p(ω) is the target pressure from the FE model. The least squares solution is achieved by the minimization of a cost function J(ω) = e^H(ω)e(ω), in which (·)^H indicates the Hermitian transpose. The minimization of the cost function J(ω) is given by [26]:

q(ω) = H^†(ω) p(ω), (13)

in which H^†(ω) is the Moore-Penrose pseudo-inverse of the propagation matrix H(ω) [26].
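The inversion of Equations (11) and (13) can be sketched on a synthetic system. The geometry, wavenumber and variable names below are arbitrary test values for illustration, not those used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, k = 16, 64, 20.0          # plane waves, virtual microphones, wavenumber (arbitrary)

y_hat = rng.normal(size=(L, 3))
y_hat /= np.linalg.norm(y_hat, axis=1, keepdims=True)   # unit direction vectors
mic_pos = rng.uniform(-0.5, 0.5, size=(M, 3))           # virtual microphone array

H = np.exp(1j * k * (mic_pos @ y_hat.T))                # h_ml = exp(j k x_m . y_l)
q_true = rng.normal(size=L) + 1j * rng.normal(size=L)   # "target" amplitudes
p = H @ q_true                                          # Equation (11)

q_est = np.linalg.pinv(H) @ p                           # Equation (13): q = H^+ p
```

Since the problem is overdetermined (M > L) and noiseless, the pseudo-inverse recovers the amplitudes up to numerical precision; with FE data as the target, a residual error remains instead.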

The use of a finite number of plane waves leads to artefacts in the sound field reconstruction. Ward and Abhayapala [27] proposed the following relation between the area of accurate reconstruction, the number of plane waves and the frequency of the field:

L = (⌈2πR/λ⌉ + 1)^2, (14)

in which L is the number of plane waves, ⌈·⌉ indicates the ceiling operator, λ is the wavelength and R is the radius of a sphere within which the reconstruction is accurate. Numerical simulations have been performed to evaluate the effects of discretizing the plane wave expansion. Beyond the number given by Equation (14), additional plane waves only make the inversion of the propagation matrix H(ω) more difficult.
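Equation (14) is easy to evaluate; a sketch, assuming the ceiling interpretation of the operator, with a hypothetical helper name:

```python
import numpy as np

def num_plane_waves(R, wavelength):
    """Equation (14): L = (ceil(2*pi*R / wavelength) + 1)**2.

    R          : radius of the sphere of accurate reconstruction (m)
    wavelength : acoustic wavelength (m)
    """
    return int(np.ceil(2.0 * np.pi * R / wavelength) + 1) ** 2

# e.g. a 0.1 m head-sized region at a 1 m wavelength (~343 Hz) needs few waves,
# while a 0.5 m region at the same wavelength needs substantially more.
```

The quadratic growth in L with R/λ is what limits the method to low frequencies or small listening regions.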

In addition, the use of a higher number of plane waves makes the method computationally expensive. The accuracy of the reconstruction was evaluated in terms of the amplitude and phase error between the reconstructed pressure p̂(x) and the target pressure p(x), in which (·)^* indicates the complex conjugate. To improve the conditioning of the inverse problem, a Tikhonov regularization term is added to the cost function:

J(ω) = e^H(ω)e(ω) + β(ω) q^H(ω)q(ω), (17)

where β(ω) is the regularization parameter. The minimization of the cost function J(ω) of Equation (17) is given by [26]:

q(ω) = [H^H(ω)H(ω) + β(ω)I]^{−1} H^H(ω) p(ω). (18)

Equation (17) indicates that the minimization of the cost function takes into account the sum of the squared errors between the reconstructed and target acoustic pressure and, in addition, the sum of the squared norm of the plane wave amplitude vector. Figures 8-11 show the results obtained for the reference problem when Tikhonov regularization is applied. The value of β, which is given in each case, was calculated as:

β(ω) = Γ ||H(ω)||^2, (19)

in which ||·|| is the spectral norm (the largest singular value) of the propagation matrix H(ω) and Γ is an arbitrary constant whose value is selected between 1 × 10^{−3} and 1 × 10^{−6}.

An additional analysis of the energy distribution of the plane wave density q_l was conducted to evaluate the effects of regularization in the formulation of the inverse problem. Figures 12 and 13 show an interpolated distribution of the complex amplitude of the plane waves. This is plotted in two dimensions by unwrapping the unit sphere onto a 2D plane whose axes represent the elevation and azimuth angles in degrees. The total energy of the plane wave expansion is calculated from the expression:

E(ω) = ∑_{l=1}^{L} |q_l(ω)|^2, (20)

in which q_l is the complex amplitude of the l-th plane wave. This value is noted in the figure captions.
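The regularized inversion of Equations (17)-(20) can be sketched as follows, assuming β = Γ‖H‖² as described above; the random test system and names are illustrative only:

```python
import numpy as np

def solve_regularized(H, p, gamma):
    """Tikhonov-regularized estimate of the plane wave amplitudes.

    Implements Equation (18), with beta = gamma * ||H||^2 (Equation (19)),
    where ||.|| is the spectral norm of H. Returns the amplitudes and
    their total energy, sum |q_l|^2 (Equation (20)).
    """
    beta = gamma * np.linalg.norm(H, 2) ** 2        # spectral norm squared
    n = H.shape[1]
    q = np.linalg.solve(H.conj().T @ H + beta * np.eye(n), H.conj().T @ p)
    return q, np.sum(np.abs(q) ** 2)

# Compare against the non-regularized least squares solution.
rng = np.random.default_rng(1)
H = rng.normal(size=(8, 4)) + 1j * rng.normal(size=(8, 4))
p = rng.normal(size=8) + 1j * rng.normal(size=8)

q_ls = np.linalg.pinv(H) @ p                        # Equation (13)
q_reg, e_reg = solve_regularized(H, p, 1e-2)
# Regularization always reduces the energy of the amplitude vector,
# at the cost of a larger reconstruction error.
```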

These figures indicate that regularization has an important effect on the spatial distribution of the energy of the plane wave density at 63 Hz. For this frequency, a more concentrated directional representation was found when regularization is implemented. Figure 14 shows the energy of each plane wave.

Figure 15 shows the general architecture of the signal processing chain. It is composed of four main blocks. The first block is the convolution of anechoic audio material with a plane wave expansion.

The second module refers to the implementation of the translation operator in the plane wave domain.

The third block corresponds to the application of a rotation operator in the spherical harmonic domain, and finally, the last stage is the sound reproduction using a headphone-based binaural system with non-individual equalization. The implementation has been made using the commercial packages Max v.7.2 and Unity v.5.0. Max is a visual programming language oriented toward audio and video processing. In contrast, Unity is a game engine dedicated to video game development. It was used in the current research to create a graphical interface for the interactive auralization.
Figure 15. General architecture for a real-time auralization based on the plane wave expansion.

It is important to point out that the reconstructed field will be exact for the target field if an integral plane wave expansion (Equation (1)) is used for the synthesis. This outcome means that the translation operator will lead to the correct field regardless of the location where the translation is intended. However, if the plane wave expansion is approximated by a finite sum, as in Equation (8), the reconstructed field will contain errors, and the translation operator will lead to the correct field only in the area where the discretized plane wave expansion matches the target field. An indication of the amount of translation that can be applied before noticeable sound artefacts occur can be estimated from Equation (14) as:

r_t = (√L − 1) λ / (2π) − r_l, (21)

where r_t is the translation distance, L is the number of plane waves used for the synthesis, λ is the wavelength and r_l is the radius of the listener's head (e.g., 0.1 m) within which it is desirable that the sound field is reproduced accurately. Indeed, r_l must be taken into consideration in order to preserve the binaural cues. An example of the translation radius is given in Figure 16.

In the implementation, the rotation is applied by computing the difference between the angles of the directional impulse responses ŷ_l(θ_l, φ_l) and the rotation angle, rather than by multiplying the spherical harmonic coefficients by a rotation operator. Figure 17 shows the signal processing chain for the rotation operator.
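The translation budget above follows from solving Equation (14) for R and subtracting the head radius. A sketch, with a hypothetical helper name:

```python
import numpy as np

def max_translation(L, wavelength, r_l=0.1):
    """Estimate the usable translation distance r_t.

    Solving Equation (14), L = (ceil(2*pi*R/wavelength) + 1)**2, for R
    gives R = (sqrt(L) - 1) * wavelength / (2*pi); the listener head
    radius r_l is subtracted so that the binaural cues are preserved.
    The result is clamped at zero when the accurate region is smaller
    than the head itself.
    """
    R = (np.sqrt(L) - 1.0) * wavelength / (2.0 * np.pi)
    return max(R - r_l, 0.0)
```

For a fixed L, the usable translation shrinks with frequency, which is consistent with the frequency limits reported for each receiver position below.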
One relevant consideration in the use of this approach is the number of audio files to be processed. The use of the FE simulation as a reference is due to the lack of measurements and spatial information across the enclosure to evaluate different positions.

Five receiver positions were selected, as shown in Figure 19. The central point of the expansion corresponds to Location 01. The predicted frequency responses in narrow band and in 1/3 octave band resolution for each receiver are illustrated in Figures 20-24. The vertical cyan line indicates the cut-off frequency (355 Hz) of the low-pass filter, and the vertical black line indicates the maximum frequency below which the translation is expected to provide accurate results. The maximum frequency was estimated by solving Equation (14) for R, taking into account the distance between the central point of the expansion and each receiver's position. The mean error displayed in the figures was selected as a metric, and it is defined as:

ME = (1/n) ∑_{i=1}^{n} |10 log_10(|p_i|^2) − 10 log_10(|p̂_i|^2)|,

in which n is the number of 1/3 octave frequency bands and |p_i|^2 and |p̂_i|^2 are the predicted and reference energy of the acoustic pressure in the 1/3 octave band i, respectively. This error is based on an equal contribution from all of the 1/3 octave bands, being analogous to a model in which pink noise is used as the input signal. It was created to provide insight into how dissimilar, on average, the reconstructed field is from the reference one. A summary of the mean errors according to the receiver location is presented in Table 2.

The results indicate that the changes in the modal response of the enclosure are correctly reproduced. However, this approach is a preliminary analysis, and further investigation is required.
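The mean band error above can be computed as follows; a sketch with illustrative names, where the inputs are the per-band energies of the two fields:

```python
import numpy as np

def mean_band_error(p_pred, p_ref):
    """Mean absolute level difference over n 1/3 octave bands, in dB.

    p_pred, p_ref : (n,) band energies |p_i|^2 of the predicted and
                    reference fields, respectively.
    """
    p_pred = np.asarray(p_pred, dtype=float)
    p_ref = np.asarray(p_ref, dtype=float)
    return np.mean(np.abs(10.0 * np.log10(p_pred) - 10.0 * np.log10(p_ref)))

# A uniform factor-of-10 energy offset in every band gives a 10 dB mean error.
```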

The B-format reference signals were estimated by the implementation of an inverse method according to the formulation proposed by the authors [34]. For this, virtual microphone arrays were used to sample the FE data at the positions where the B-format signals were intended to be synthesized.

The B-format signals from the auralization system were obtained by recording the output of the rotation module (see Figure 15), which is based on a spherical harmonic transformation. The zero and first order components were recorded and compared to the numerical references.

The B-format consists of four signals, which correspond to an omnidirectional (zero order) component and three figure-of-eight (first order) components. The results are summarized in Table 3. For the remaining coefficients, the match is not as good as for the zero order, but there is good agreement in terms of the envelope of the frequency response. This indicates that the interactive auralization system based on a plane wave representation is able to synthesize the acoustic field at low frequencies correctly, making it suitable for the auralization of enclosures whose sound field is characterized by a modal behaviour.
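For illustration, the zero and first order components can be formed from a discrete plane wave expansion by weighting the amplitudes with the corresponding direction cosines. This sketch assumes the traditional B-format convention (W attenuated by 1/√2) and that ŷ_l points from the listener toward the source; neither detail is specified in the text:

```python
import numpy as np

def bformat_from_expansion(q, y_hat):
    """Approximate B-format components from plane wave amplitudes.

    q     : (L,) complex plane wave amplitudes
    y_hat : (L, 3) unit direction vectors (assumed listener-to-source)

    W is the omnidirectional (zero order) component; X, Y, Z are the
    first order figure-of-eight components along each axis. The 1/sqrt(2)
    weight on W follows the traditional B-format convention (assumption).
    """
    W = np.sum(q) / np.sqrt(2.0)
    X = np.sum(q * y_hat[:, 0])
    Y = np.sum(q * y_hat[:, 1])
    Z = np.sum(q * y_hat[:, 2])
    return W, X, Y, Z

# A single wave from the front (+x) excites only W and X.
W, X, Y, Z = bformat_from_expansion(np.array([1.0 + 0j]),
                                    np.array([[1.0, 0.0, 0.0]]))
```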

The suitability of inverse methods to estimate the amplitude of a set of plane waves that reconstruct a target field has been demonstrated. This technique is useful to extract directional information from data obtained from FE simulations. However, the discretization of the integral representation into a finite number of plane waves limits the spatial accuracy of the sound field reconstruction.

The extent of the region in which the synthesis is accurate depends on the number of plane waves and the frequency of the field.

The use of Tikhonov regularization has three main effects on the sound field representation.

The first is that the energy of the plane wave density used for the synthesis of the sound fields is considerably lower than for the non-regularized solution. The second consequence is that the energy distribution of the plane wave density is much more directionally concentrated. The last effect is a reduction of the area where the sound field reconstruction is accurate compared to the non-regularized solution. Nevertheless, the implementation of regularization is convenient for an interactive auralization system because the translation operator can generate zones with high acoustic pressure if no regularization is applied.

Future work will include the combination of finite element and geometrical acoustic results to extend the method outlined here to middle and high frequencies.

The code developed for the interactive system can be downloaded from "https://drive.google.com/file/d/0BwuuNpQpY5UKZ2htSW9JbXN5OVU/view?usp=sharing".