Time-Domain Sound Field Reproduction with Pressure and Particle Velocity Jointly Controlled

Hu, Xuanqi; Wang, Jiale; Zhang, Wen; Zhang, Lijun

doi:10.3390/app112210880

Center of Intelligent Acoustics and Immersive Communications, School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China

Appl. Sci. 2021, 11(22), 10880; https://doi.org/10.3390/app112210880

Submission received: 30 September 2021 / Revised: 7 November 2021 / Accepted: 10 November 2021 / Published: 18 November 2021

(This article belongs to the Special Issue Sound Field Control)

Abstract

Particle velocity has been introduced to improve the performance of spatial sound field reproduction systems with an irregular loudspeaker array setup. However, existing systems have only been developed in the frequency domain. In this work, we propose a time-domain sound field reproduction method with both sound pressure and particle velocity components jointly controlled. To solve the computational complexity problem associated with the multi-channel setup and the long-length filter design, we adopt the general eigenvalue decomposition-based approach and the conjugate gradient method. The performance of the proposed method is evaluated through numerical simulations with both a regular loudspeaker array layout and an irregular loudspeaker array layout in a room environment.

Keywords:

sound field reproduction; pressure matching; particle velocity; time domain

1. Introduction

Spatial sound field reproduction aims at faithfully reproducing the sound field within an extended region of space so that listeners inside the region can experience a replica of the original sound field that is as realistic as possible. Such a system normally uses multiple loudspeakers as secondary sources to control sound radiation and is now widely used in audio applications such as theaters, cinemas, concerts, and home entertainment systems.

Research on spatial sound field reproduction is ongoing, and many approaches have been developed to date, including wave field synthesis (WFS), Ambisonics, and least-squares (LS)-based multi-point control [1]. The WFS approach, first proposed by Berkhout, is based on the Huygens–Fresnel integral and represents a sound field within a bounded region of space by a continuous distribution of monopole and dipole secondary sources arranged on the boundary of that region [2,3]. For practical implementation, an array of equally spaced loudspeakers is used to approximate the continuous distribution of secondary sources. Reproduction artifacts due to the spatial discretization of the continuously distributed secondary sources and the finite size of the array have been investigated [4]. In WFS, a large number of closely spaced loudspeakers is necessary for accurate sound field reproduction.

Another well-known sound field reproduction technique, Ambisonics, was designed based on Huygens' principle [5,6]. The system decomposes the original sound field into its zeroth- and first-order spherical harmonics, which gives four channels, and derives the loudspeaker driving signals from a linear combination of these four channels. This low-order system is optimal only at low frequencies and when the listener stays at the central sweet spot. Later, Higher-Order Ambisonics (HOA), based on the higher-order spherical harmonic decomposition of a sound field, was developed for high reproduction frequencies and large reproduction regions [7,8,9]. A typical Ambisonics or HOA system uses a circular or spherical loudspeaker array geometry. In addition, the spherical harmonic expansion order increases with the reproduction frequency and the radius of the reproduction region; thus, in most cases, HOA also requires densely distributed loudspeakers to match all the spherical harmonics up to the given order and avoid spatial aliasing [10].

For arbitrarily placed loudspeakers, a simple approach to sound reproduction is the least-squares (LS)-based multi-point control, which uses multiple microphones as matching points and takes the least-squares solution as the loudspeaker weights [11]. This is a typical inverse problem, and Tikhonov regularization is commonly adopted to limit the energy of the loudspeaker weights and to improve the system robustness. With this method, the placement of both loudspeakers and matching microphones greatly affects control accuracy and filter stability [12]. In addition, the acoustic characteristics of the loudspeakers affect the reproduction results, and their modeling through measurements has been discussed in previous work [13].

While most existing work focuses on controlling sound pressure only, some recent work has started to investigate controlling the particle velocity in sound field reproduction [14]. Joint control of sound pressure and particle velocity has also been proposed for single-zone [15] and multi-zone sound field reproduction [16]. A general finding is that particle velocity-assisted sound field reproduction is especially suitable for a non-uniformly spaced loudspeaker array, as it requires a reduced number of loudspeakers and control points. The extension to intensity-based sound field reproduction has also been investigated [17,18].

So far, the particle velocity assisted sound field reproduction system has only been developed in the frequency domain. In this work, we propose a time-domain sound field reproduction algorithm with both sound pressure and particle velocity jointly controlled. As demonstrated in many works, time-domain processing in spatial sound recording and reproduction is suited for real-time applications [19,20]; however, it is also computationally expensive as long-tap room impulse response (RIR) filters are usually involved for sound field reproduction inside reverberant rooms. We adopt the eigenvalue decomposition (EVD)-based approach and the conjugate gradient (CG) method [21,22] in this work to reduce the computational complexity.

The paper is organized as follows. Frequency-domain velocity-assisted sound field reproduction is reviewed in Section 2. In Section 3, the proposed time-domain sound field reproduction with joint control of sound pressure and particle velocity is introduced, together with implementation details. In Section 4, the effectiveness of the proposed method is evaluated through numerical simulations in a room environment with different reverberation times. Finally, Section 5 concludes this paper.

Notations: italic letters denote scalars, lower case boldface letters denote vectors, and upper case boldface letters denote matrices.

2. Review: Frequency-Domain Velocity-Assisted Sound Field Reproduction

As a starting point, we briefly review the concept of frequency-domain velocity-assisted sound field reproduction. At an arbitrary observation position $x$, the particle velocity $v (x, ω)$ and the complex-valued sound pressure $p (x, ω)$ with time dependency $e^{i ω t}$ are related by Euler's equation,

$- \nabla p (x, ω) = i ω ρ v (x, ω),$

(1)

where i is the imaginary unit, $ω = 2 π f$ denotes the angular frequency, $ρ$ is the density of the propagation medium, and ∇ represents the gradient operation along the direction of the particle velocity vector. The components of the particle velocity vector can be defined either in Cartesian coordinates, i.e., $v \equiv \{v_{x}, v_{y}, v_{z}\}$, or in polar coordinates, i.e., $v \equiv \{v_{rad}, v_{θ}, v_{ϕ}\}$ along the radial, elevation, and azimuth directions, respectively.

In sound field reconstruction, we consider the reproduced sound generated by an array of L loudspeakers positioned at $y_{ℓ}$ with $ℓ = 1, \dots, L$ surrounding the listening area. We define the acoustic transfer function (ATF) for the sound pressure component from the ℓth loudspeaker to the control point $x$ as $T_{p} (x | y_{ℓ}, ω)$. A special case is when the loudspeakers are modeled as point sources; assuming free-field propagation, the ATF is then given by the Green's function, that is,

$T_{p}^{free-field} (x | y_{ℓ}, ω) = \frac{1}{4 π} \frac{e^{i k {∥ y_{ℓ} - x ∥}_{2}}}{{∥ y_{ℓ} - x ∥}_{2}},$

(2)

where $k = ω / c_{0}$ is the wave number, $c_{0}$ denotes the sound speed, and ${∥ \cdot ∥}_{2}$ denotes the L2-norm.
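As an illustration only, the free-field pressure ATF of Equation (2) and the particle velocity ATF implied by Equation (1) can be sketched in Python with NumPy. The function names, the default sound speed `c0 = 343.0` m/s, and the air density `rho = 1.21` kg/m³ are assumptions made for this sketch and are not taken from the paper; the exponent sign follows Equation (2) as printed.

```python
import numpy as np

def pressure_atf(x, y, omega, c0=343.0):
    """Free-field pressure ATF of Equation (2) from a point source at y to x.

    c0 is an assumed sound speed in m/s.
    """
    k = omega / c0                                   # wave number
    r = np.linalg.norm(np.asarray(y, float) - np.asarray(x, float))
    return np.exp(1j * k * r) / (4.0 * np.pi * r)

def velocity_atf(x, y, omega, c0=343.0, rho=1.21):
    """Particle velocity ATF from Equation (1): v = -grad(p) / (i * omega * rho).

    rho is an assumed medium density in kg/m^3. The gradient is taken with
    respect to the observation position x.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    k = omega / c0
    d = x - y
    r = np.linalg.norm(d)
    g = pressure_atf(x, y, omega, c0)
    grad = (1j * k - 1.0 / r) * g * d / r            # analytic gradient of Eq. (2)
    return -grad / (1j * omega * rho)
```

The velocity ATF returns the three Cartesian components at once, so stacking it over all loudspeakers gives the per-point velocity ATF matrix used later in this section.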

Then, the reproduced sound pressure at position $x$ can be expressed as

$\begin{aligned} p (x, ω) & = \sum_{ℓ = 1}^{L} T_{p} (x | y_{ℓ}, ω) w_{ℓ} (ω) S (ω) \\ & = t_{p}^{T} (x, ω) w (ω) S (ω), \end{aligned}$

(3)

where $t_{p} (x, ω) = {[T_{p} (x | y_{1}, ω), \dots, T_{p} (x | y_{L}, ω)]}^{T}$ is a column vector containing the ATFs from all the loudspeakers to the position $x$, $w (ω) = {[w_{1} (ω), \dots, w_{L} (ω)]}^{T}$ is the vector of frequency-domain loudspeaker weights, $S (ω)$ is the source audio signal, and ${(\cdot)}^{T}$ denotes the transpose operator.

Similarly, we can define the ATF for the particle velocity, i.e., $t_{v} (x | y_{ℓ}, ω)$, which is a column vector of length 3 with one entry for each component of $v$. The reproduced particle velocity then has the following representation:

$\begin{aligned} v (x, ω) & = \sum_{ℓ = 1}^{L} t_{v} (x | y_{ℓ}, ω) w_{ℓ} (ω) S (ω) \\ & = T_{v}^{T} (x, ω) w (ω) S (ω), \end{aligned}$

(4)

where $T_{v} (x, ω) = {[t_{v} (x | y_{1}, ω), \dots, t_{v} (x | y_{L}, ω)]}^{T}$ is a matrix of size $L \times 3$.

In sound field reproduction applications, we aim to reproduce the desired sound within a certain region of interest by matching the sound pressure and particle velocity at multiple control points. That is, we have the matrix form representation of (3) and (4) as

$p (ω) = T_{p} (ω) w (ω) S (ω)$

(5)

$v (ω) = T_{v} (ω) w (ω) S (ω),$

(6)

where, given M control points $x_{m}$ with $m = 1, \dots, M$, $p (ω) = {[p (x_{1}, ω), \dots, p (x_{M}, ω)]}^{T}$ and $v (ω) = {[v^{T} (x_{1}, ω), \dots, v^{T} (x_{M}, ω)]}^{T}$ are column vectors of length M and $3 M$, respectively. The ATF matrices $T_{p} (ω) = {[t_{p} (x_{1}, ω), \dots, t_{p} (x_{M}, ω)]}^{T}$ and $T_{v} (ω) = {[T_{v} (x_{1}, ω), \dots, T_{v} (x_{M}, ω)]}^{T}$ are of size $M \times L$ and $3 M \times L$, respectively.

Based on Equations (5) and (6), and assuming a unit-amplitude source audio signal, i.e., $S (ω) = 1$, the cost function for jointly controlling the reproduced sound pressure and particle velocity is formulated as follows:

$\min_{w (ω)} \left\{ τ (ω) {∥ T_{p} (ω) w (ω) - p_{d} (ω) ∥}_{2}^{2} + (1 - τ (ω)) {∥ T_{v} (ω) w (ω) - v_{d} (ω) ∥}_{2}^{2} \right\}$

(7)

where $p_{d} (ω)$ and $v_{d} (ω)$ represent the desired pressure and particle velocity, respectively. The control strategy is to minimize the reproduction error of both components, and $τ (ω) \in [0, 1]$ is the parameter that adjusts the relative weights for matching pressure and velocity. Equation (7) is a weighted least-squares problem and can be solved using a Moore–Penrose pseudoinverse with Tikhonov regularization.
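A minimal sketch of how the weighted problem in Equation (7) could be solved at a single frequency is given below, assuming squared L2 norms and a Tikhonov term added to the normal equations. The function name `joint_weights` and the regularization parameter `lam` are hypothetical and are not defined in the paper.

```python
import numpy as np

def joint_weights(Tp, Tv, pd, vd, tau=0.5, lam=1e-6):
    """Regularized weighted least-squares loudspeaker weights at one frequency.

    Tp: (M, L) pressure ATF matrix, Tv: (3M, L) velocity ATF matrix,
    pd: (M,) desired pressure, vd: (3M,) desired particle velocity.
    tau weights the pressure term as in Equation (7); lam is an assumed
    Tikhonov regularization parameter.
    """
    L = Tp.shape[1]
    # Normal equations of the weighted, regularized cost.
    A = tau * Tp.conj().T @ Tp + (1.0 - tau) * Tv.conj().T @ Tv + lam * np.eye(L)
    b = tau * Tp.conj().T @ pd + (1.0 - tau) * Tv.conj().T @ vd
    return np.linalg.solve(A, b)
```

Solving the regularized normal equations directly is equivalent to applying a regularized pseudoinverse to the stacked, weighted system; for broadband reproduction the function would be called once per frequency bin.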

3. Proposed: Time-Domain Sound Field Reproduction with Joint Control of Sound Pressure and Particle Velocity

3.1. System Formulation

Now, we discuss the problem of sound field reproduction in the time domain. Assuming the room impulse responses (RIRs) are pre-calibrated through measurements, the reproduced sound pressure and particle velocity at the mth ($1 ⩽ m ⩽ M$) control point $x_{m}$, generated by L loudspeakers located at $y_{1}, \dots, y_{L}$, can be expressed as

$p_{n} (x_{m}) = \sum_{ℓ = 1}^{L} s_{n} * q_{n}^{ℓ} * h_{p, n} (x_{m} | y_{ℓ})$

(8)

$v_{n} (x_{m}) = \sum_{ℓ = 1}^{L} s_{n} * q_{n}^{ℓ} * h_{v, n} (x_{m} | y_{ℓ}),$

(9)

where * denotes the linear convolution operator and n denotes the sampling index. $s_{n}$ denotes the input sound signal, $q_{n}^{ℓ}$ denotes the control filter for the ℓth loudspeaker, and $h_{p, n} (x_{m} | y_{ℓ})$ and $h_{v, n} (x_{m} | y_{ℓ})$ denote the RIRs of the pressure and velocity components, respectively, from the ℓth loudspeaker to the mth control point. Note that for the particle velocity vector, we follow the convention of defining three components along the x, y, and z axes, that is, $v_{n} (x_{m}) = {[v_{n, x} (x_{m}), v_{n, y} (x_{m}), v_{n, z} (x_{m})]}^{T}$ and $h_{v, n} (x_{m} | y_{ℓ}) = {[h_{v, n}^{x} (x_{m} | y_{ℓ}), h_{v, n}^{y} (x_{m} | y_{ℓ}), h_{v, n}^{z} (x_{m} | y_{ℓ})]}^{T}$.

Representing (8) and (9) in matrix form, we have

$p_{n} (x_{m}) = \sum_{ℓ = 1}^{L} q_{ℓ}^{T} H_{p} (x_{m} | y_{ℓ}) s_{n} = q^{T} H_{p} (x_{m}) s_{n}$

(10)

$v_{n, c} (x_{m}) = \sum_{ℓ = 1}^{L} q_{ℓ}^{T} H_{v, c} (x_{m} | y_{ℓ}) s_{n} = q^{T} H_{v, c} (x_{m}) s_{n}, \quad c \in \{x, y, z\},$

(11)

where, given the K-tap RIR and the J-tap control filter,

$s_{n} = {[s_{n}, s_{n - 1}, \dots, s_{n - (K + J - 2)}]}^{T},$

$q_{ℓ} = {[q_{1}^{ℓ}, q_{2}^{ℓ}, \dots, q_{J}^{ℓ}]}^{T},$

and the RIR matrices $H_{p} (x_{m} | y_{ℓ})$ and $H_{v, c} (x_{m} | y_{ℓ})$ are Toeplitz matrices of size $J \times (K + J - 1)$. The first row vector and the first column vector of $H_{p} (x_{m} | y_{ℓ})$ are defined as $[h_{p, 1} (x_{m} | y_{ℓ}), \dots, h_{p, K} (x_{m} | y_{ℓ}), \underbrace{0, \dots, 0}_{J - 1}]$ and ${[h_{p, 1} (x_{m} | y_{ℓ}), \underbrace{0, \dots, 0}_{J - 1}]}^{T}$, respectively. The same formulation is adopted for each particle velocity component.
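The Toeplitz structure described above can be sketched as follows. This is an illustrative construction under the stated first-row and first-column definition, with the hypothetical function name `rir_toeplitz`; it shows that the product of the filter vector, the Toeplitz matrix, and the signal vector reproduces one output sample of the cascaded convolution in Equation (8).

```python
import numpy as np

def rir_toeplitz(h, J):
    """Build the J x (K+J-1) Toeplitz matrix of a K-tap RIR h.

    Row j holds the RIR delayed by j samples, so the first row is
    [h_1, ..., h_K, 0, ..., 0] and the first column is [h_1, 0, ..., 0]^T.
    """
    h = np.asarray(h, float)
    K = h.size
    H = np.zeros((J, K + J - 1))
    for j in range(J):
        H[j, j:j + K] = h
    return H

# Toy check with random data (all values are illustrative).
rng = np.random.default_rng(0)
K, J, n = 5, 3, 12
h = rng.standard_normal(K)            # RIR from one loudspeaker to one point
q = rng.standard_normal(J)            # control filter of that loudspeaker
s = rng.standard_normal(20)           # input signal
H = rir_toeplitz(h, J)
s_vec = s[n::-1][:K + J - 1]          # [s_n, s_{n-1}, ..., s_{n-(K+J-2)}]
p_matrix = q @ H @ s_vec              # matrix form, Equation (10) for L = 1
p_conv = np.convolve(np.convolve(s, q), h)[n]   # direct form, Equation (8)
```

In this toy case `p_matrix` and `p_conv` agree up to rounding, since `q @ H` is simply the convolution of the control filter with the RIR.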

Then, we have

$q = {[q_{1}^{T}, q_{2}^{T}, \dots, q_{L}^{T}]}^{T},$

$H_{p} (x_{m}) = {[H_{p}^{T} (x_{m} | y_{1}), H_{p}^{T} (x_{m} | y_{2}), \dots, H_{p}^{T} (x_{m} | y_{L})]}^{T},$

$H_{v, c} (x_{m}) = {[H_{v, c}^{T} (x_{m} | y_{1}), H_{v, c}^{T} (x_{m} | y_{2}), \dots, H_{v, c}^{T} (x_{m} | y_{L})]}^{T}, \quad c \in \{x, y, z\},$

which are of size $L J \times 1$, $L J \times (K + J - 1)$, and $L J \times (K + J - 1)$, respectively.

In a similar way, the desired sound pressure and particle velocity at position $x_{m}$ can be expressed as

$p_{n}^{d} (x_{m}) = s_{n} * g_{p, n} (x_{m}) = g_{p}^{T} (x_{m}) s_{n}$

(12)

$v_{n}^{d} (x_{m}) = s_{n} * g_{v, n} (x_{m}) = g_{v}^{T} (x_{m}) s_{n}$

(13)

where $g_{p} (x_{m})$ and $g_{v} (x_{m})$ denote the RIRs of pressure and particle velocity, respectively, from the virtual desired source to the mth control point.

While the conventional pressure-matching-based method aims to minimize the error between the desired and reproduced sound pressure only, in this work, a joint control of sound pressure and particle velocity is investigated. That is, the mean squared errors (MSEs) between the desired and reproduced sound pressure and between the desired and reproduced particle velocity are minimized simultaneously over N time samples and M control points. The cost function is formulated as follows:

$J = \frac{1}{N M} \sum_{m = 1}^{M} \sum_{n = 1}^{N} \left[ (1 - τ) {(p_{n} (x_{m}) - p_{n}^{d} (x_{m}))}^{2} + τ {∥ v_{n} (x_{m}) - v_{n}^{d} (x_{m}) ∥}_{2}^{2} \right]$

(14)

where $p_{n}^{d} (x_{m})$, $p_{n} (x_{m})$, $v_{n}^{d} (x_{m})$, and $v_{n} (x_{m})$ denote the desired and the reproduced sound pressure and the desired and the reproduced particle velocity at the control point $x_{m}$, respectively. Note that in (14), the pressure term involves a scalar error, while the particle velocity term involves a vector error, whose squared norm can be further expanded as

$\begin{aligned} {∥ v_{n} (x_{m}) - v_{n}^{d} (x_{m}) ∥}_{2}^{2} = {} & {(v_{n, x} (x_{m}) - v_{n, x}^{d} (x_{m}))}^{2} + {(v_{n, y} (x_{m}) - v_{n, y}^{d} (x_{m}))}^{2} \\ & + {(v_{n, z} (x_{m}) - v_{n, z}^{d} (x_{m}))}^{2} . \end{aligned}$

Equation (14) can also be represented in matrix form, i.e.,

$\begin{aligned} J (q) & = (1 - τ) (q^{T} R_{p} q - 2 q^{T} r_{p} + σ_{p}) + τ (q^{T} R_{v} q - 2 q^{T} r_{v} + σ_{v}) \\ & = q^{T} R q - 2 q^{T} r + σ, \end{aligned}$

(15)

where $τ$ denotes the weighting parameter.

The spatial autocorrelation matrix is defined as

$R = (1 - τ) R_{p} + τ R_{v},$

(16)

with $R_{p} = \frac{1}{M} \sum_{m = 1}^{M} H_{p} (x_{m}) R_{s} H_{p}^{T} (x_{m})$, $R_{s} = \frac{1}{N} \sum_{n = 1}^{N} s_{n} s_{n}^{T}$, and the same formulation for each particle velocity component, that is, $R_{c} = \frac{1}{M} \sum_{m = 1}^{M} H_{v, c} (x_{m}) R_{s} H_{v, c}^{T} (x_{m})$, $c \in \{x, y, z\}$, and $R_{v} = R_{x} + R_{y} + R_{z}$.

The spatial cross-correlation vector is defined as

$r = (1 - τ) r_{p} + τ r_{v},$

(17)

with $r_{p} = \frac{1}{M} \sum_{m = 1}^{M} H_{p} (x_{m}) R_{s} g_{p} (x_{m})$ and the same formulation for each particle velocity component, $r_{c} = \frac{1}{M} \sum_{m = 1}^{M} H_{v, c} (x_{m}) R_{s} g_{v, c} (x_{m})$, $c \in \{x, y, z\}$, and $r_{v} = r_{x} + r_{y} + r_{z}$. Here $g_{p} (x_{m})$ and $g_{v, c} (x_{m})$ are column vectors, as in (12) and (13), so that $r$ has the same size as $q$.

The constant term is defined as

$σ = (1 - τ) σ_{p} + τ σ_{v},$

(18)

with $σ_{p} = \frac{1}{M} \sum_{m = 1}^{M} g_{p}^{T} (x_{m}) R_{s} g_{p} (x_{m})$ and the same formulation for each particle velocity component, $σ_{c} = \frac{1}{M} \sum_{m = 1}^{M} g_{v, c}^{T} (x_{m}) R_{s} g_{v, c} (x_{m})$, $c \in \{x, y, z\}$, and $σ_{v} = σ_{x} + σ_{y} + σ_{z}$.
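As a hedged sketch of where these three quantities come from, the pressure term of Equation (14) can be expanded using (10) and (12). The expansion treats $g_{p} (x_{m})$ as a column vector zero-padded to the length of $s_{n}$; the padding is an assumption made here so that the dimensions agree, since the text does not state it.

```latex
\begin{aligned}
\frac{1}{N M} \sum_{m=1}^{M} \sum_{n=1}^{N}
  \left( q^{T} H_{p}(x_{m}) s_{n} - g_{p}^{T}(x_{m}) s_{n} \right)^{2}
&= \frac{1}{M} \sum_{m=1}^{M} \Big[
     q^{T} H_{p}(x_{m}) R_{s} H_{p}^{T}(x_{m}) q
     - 2\, q^{T} H_{p}(x_{m}) R_{s}\, g_{p}(x_{m})
     + g_{p}^{T}(x_{m}) R_{s}\, g_{p}(x_{m}) \Big] \\
&= q^{T} R_{p}\, q - 2\, q^{T} r_{p} + \sigma_{p},
\qquad R_{s} = \frac{1}{N} \sum_{n=1}^{N} s_{n} s_{n}^{T} .
\end{aligned}
```

Each particle velocity component expands in the same way with $H_{v, c}$ and $g_{v, c}$ in place of $H_{p}$ and $g_{p}$, and weighting the pressure and velocity parts by $(1 - τ)$ and $τ$ gives the compact quadratic form of Equation (15).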

Minimizing the cost function in Equation (15) by setting its gradient with respect to $q$ equal to zero, we obtain the solution

$\hat{q} = R^{- 1} r .$

(19)
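The assembly of $R$ and $r$ and the direct solution (19) can be sketched on a toy problem as follows. All sizes, the random stand-in RIRs, and the helper names are hypothetical; the desired responses are constructed from a known filter `q_true` purely so that the result can be sanity-checked, and they are taken as column vectors of length K + J − 1.

```python
import numpy as np

def rir_toeplitz(h, J):
    """J x (K+J-1) Toeplitz matrix whose row j is the RIR h delayed by j samples."""
    H = np.zeros((J, h.size + J - 1))
    for j in range(J):
        H[j, j:j + h.size] = h
    return H

def stack_loudspeakers(h_all, J):
    """Stack the per-loudspeaker Toeplitz blocks into an LJ x (K+J-1) matrix."""
    return np.vstack([rir_toeplitz(h, J) for h in h_all])

def normal_equations(H_p, H_v, g_p, g_v, R_s, tau):
    """Assemble R and r of Equations (16) and (17).

    H_p[m] and H_v[m][c] are LJ x (K+J-1) matrices; g_p[m] and g_v[m][c]
    are desired responses of length K+J-1.
    """
    M = len(H_p)
    R_p = sum(H @ R_s @ H.T for H in H_p) / M
    r_p = sum(H @ R_s @ g for H, g in zip(H_p, g_p)) / M
    R_v = sum(H @ R_s @ H.T for Hm in H_v for H in Hm) / M
    r_v = sum(H @ R_s @ g for Hm, gm in zip(H_v, g_v) for H, g in zip(Hm, gm)) / M
    return (1 - tau) * R_p + tau * R_v, (1 - tau) * r_p + tau * r_v

# Toy sizes and random stand-in data (illustrative only).
rng = np.random.default_rng(0)
L, J, K, M, N = 2, 3, 4, 4, 2000
s = rng.standard_normal(N + K + J - 2)
S = np.stack([s[n:n + K + J - 1][::-1] for n in range(N)])  # rows are s_n^T
R_s = S.T @ S / N                                           # signal autocorrelation
H_p = [stack_loudspeakers(rng.standard_normal((L, K)), J) for _ in range(M)]
H_v = [[stack_loudspeakers(rng.standard_normal((L, K)), J) for _ in range(3)]
       for _ in range(M)]
q_true = rng.standard_normal(L * J)
g_p = [H.T @ q_true for H in H_p]                # desired responses reachable by q_true
g_v = [[H.T @ q_true for H in Hm] for Hm in H_v]
R, r = normal_equations(H_p, H_v, g_p, g_v, R_s, tau=0.5)
q_hat = np.linalg.solve(R, r)                    # Equation (19) without an explicit inverse
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a common numerical choice; it still scales cubically with the size of $R$.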

3.2. EVD-Based Approach with Conjugate Gradient Algorithm

The joint control of sound pressure and particle velocity is especially suited to non-uniform loudspeaker array setups with a reduced number of loudspeakers and control points, and the proposed time-domain reproduction method has the potential to be used in real-time applications. However, for long RIRs and control filters, the inverse solution in (19) has very high computational complexity. To solve this problem, the eigenvalue decomposition (EVD)-based approach is adopted with the conjugate gradient (CG) method to search for the optimal solution iteratively.

In (19), the matrix $R$ is a symmetric positive definite matrix of size $L J \times L J$, where L is the number of loudspeakers used for reproduction and J is the length of the control filter for each loudspeaker. Solving the problem with the direct inverse operation requires $O ({(L J)}^{3})$ operations. Instead, we assume that the space spanned by the spatial autocorrelation matrix $R$ can be approximated by its I dominant eigenvectors, where $I \leq L J$. Then, the CG method, which searches for the solution along a set of mutually conjugate ($R$-orthogonal) directions, can be used to find the solution iteratively. The CG method adopted in this work has the advantage of reducing the computational complexity to $O (I {(L J)}^{2})$ by setting the dimension of the search direction to I. The flow of the algorithm is summarized in Table 1.
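For orientation, a textbook conjugate gradient iteration for a symmetric positive definite system is sketched below. This is not necessarily the exact procedure of Table 1; the function name, the zero initial guess, and the stopping tolerance `tol` are assumptions. Each iteration needs one matrix-vector product with $R$, which is where the $O (I {(L J)}^{2})$ cost for I iterations comes from.

```python
import numpy as np

def conjugate_gradient(R, r, num_iters, tol=1e-10):
    """Approximately solve R q = r for symmetric positive definite R.

    Runs at most num_iters iterations (the role played by I in the text)
    and stops early once the residual norm falls below tol.
    """
    q = np.zeros_like(r, dtype=float)
    res = r - R @ q                   # initial residual
    d = res.copy()                    # first search direction
    rs_old = res @ res
    for _ in range(num_iters):
        if np.sqrt(rs_old) < tol:
            break
        Rd = R @ d                    # the only O((LJ)^2) step per iteration
        alpha = rs_old / (d @ Rd)     # exact line search along d
        q = q + alpha * d
        res = res - alpha * Rd
        rs_new = res @ res
        d = res + (rs_new / rs_old) * d   # next R-conjugate direction
        rs_old = rs_new
    return q
```

Truncating the loop early trades accuracy for run time, which is the kind of trade-off examined in Section 4.4.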

4. Simulation

In this section, we verify the effectiveness of the proposed reproduction method through numerical simulations in a room environment. For convenience, we treat each loudspeaker as a simple point source in this simulation. Two different loudspeaker layouts are simulated, i.e., a regular and an irregular loudspeaker array on the horizontal plane. Therefore, we consider both the desired sound field and the reproduced sound field on the horizontal plane, for which only the two components of particle velocity along the x and y axes are matched. In the simulation setup, the origin of the coordinate system coincides with the bottom-left corner of the room, and we use a segment of speech as the input signal of the system.

4.1. Performance Evaluation Metrics

The two performance evaluation metrics adopted are as follows:

The normalized mean squared error (NMSE) of reproduced sound intensity, which is defined as

$ϵ = \frac{\frac{1}{N M} \sum_{m = 1}^{M} \sum_{n = 1}^{N} {∥ I_{n} (x_{m}) - I_{n}^{d} (x_{m}) ∥}^{2}}{\frac{1}{N M} \sum_{m = 1}^{M} \sum_{n = 1}^{N} {∥ I_{n}^{d} (x_{m}) ∥}^{2}}$

(20)

with the sound intensity vector calculated as follows [23]:

$I_{n} (x_{m}) = p_{n} (x_{m}) v_{n} (x_{m}) .$

(21)

Here, $I_{n} (x_{m})$ and $I_{n}^{d} (x_{m})$ denote the reproduced and desired sound intensity at the point $x_{m}$ , respectively. The results over N time samples and M points are averaged in Equation (20).
Specifically, the intensity reproduction NMSE along the c axis ( $c \in \{x, y\}$ ) is investigated separately, that is,

$ϵ_{c} = \frac{\frac{1}{N M} \sum_{m = 1}^{M} \sum_{n = 1}^{N} {∥ I_{n, c} (x_{m}) - I_{n, c}^{d} (x_{m}) ∥}^{2}}{\frac{1}{N M} \sum_{m = 1}^{M} \sum_{n = 1}^{N} {∥ I_{n, c}^{d} (x_{m}) ∥}^{2}}$

(22)

Note that, as shown in psycho-acoustic experiments, the sound intensity measure is closely linked with the human perception of sound locations [24].
The NMSE of the reproduced sound pressure, which is defined as

$η = \frac{\frac{1}{N M} \sum_{m = 1}^{M} \sum_{n = 1}^{N} {∥ p_{n} (x_{m}) - p_{n}^{d} (x_{m}) ∥}^{2}}{\frac{1}{N M} \sum_{m = 1}^{M} \sum_{n = 1}^{N} {∥ p_{n}^{d} (x_{m}) ∥}^{2}},$

(23)

where $p_{n} (x_{m})$ and $p_{n}^{d} (x_{m})$ denote the reproduced and desired sound pressure at the point $x_{m}$ , respectively. This measure is commonly used for evaluating the accuracy of sound field reproduction systems.
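The two metrics can be sketched as below, under assumed array layouts: pressure signals of shape (N, M) and velocity signals of shape (N, M, C), where C is the number of matched velocity components. The function names are hypothetical, and the conversion to decibels as 10 log10 of the NMSE is an assumption, since the text reports differences in dB without stating the conversion.

```python
import numpy as np

def intensity(p, v):
    """Instantaneous intensity of Equation (21): p of shape (N, M), v of shape (N, M, C)."""
    return p[..., None] * v

def nmse(reproduced, desired):
    """Normalized mean squared error; the 1/(NM) factors of (20) and (23) cancel."""
    return np.sum((reproduced - desired) ** 2) / np.sum(desired ** 2)

def nmse_db(reproduced, desired):
    """NMSE expressed in dB (assumed convention: 10 * log10)."""
    return 10.0 * np.log10(nmse(reproduced, desired))
```

With these helpers, Equation (20) corresponds to `nmse(intensity(p, v), intensity(p_d, v_d))`, Equation (22) to the same call restricted to one component such as `[..., 0]`, and Equation (23) to `nmse(p, p_d)`.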

4.2. Regular Loudspeaker Array

We first simulate the case of a regular circular loudspeaker array consisting of 8 evenly distributed loudspeakers, whose center is located at the center of the room and whose radius is 2 m. The dimensions of the room are 8 m × 6 m × 4 m. Six control points (or matching microphones) are arranged concentrically with the loudspeaker array and define the sound reproduction zone. We have one matching point at the center of the circle, and the other five points are evenly distributed on a circle with a radius of 0.2 m, as shown in Figure 1. The sampling frequency is set to 16 kHz. The RIRs are generated using the RIR Generator toolbox [25], which is based on the image source method [26]. The desired sound field comes from a point source located at $y$ = (6 m, 5 m, 2 m).
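The geometry described above can be written down as a short sketch. The array height of 2 m (matching the height of the desired source) and the angular offset of the first element are assumptions, since the text only specifies the horizontal layout; the function name `circle_points` is hypothetical.

```python
import numpy as np

def circle_points(center, radius, count, phase=0.0):
    """Return `count` points evenly spaced on a horizontal circle around `center`."""
    angles = phase + 2.0 * np.pi * np.arange(count) / count
    return np.column_stack([
        center[0] + radius * np.cos(angles),
        center[1] + radius * np.sin(angles),
        np.full(count, center[2]),
    ])

room = np.array([8.0, 6.0, 4.0])             # room dimensions in meters
center = room / 2.0                          # assumed array center (4, 3, 2)
loudspeakers = circle_points(center, 2.0, 8)
control_points = np.vstack([center, circle_points(center, 0.2, 5)])
virtual_source = np.array([6.0, 5.0, 2.0])   # desired point source
```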

In Figure 2, we plot the NMSE of the reproduced intensity $ϵ$ as a function of the tuning parameter $τ$. The cases $τ = 0$ and $τ = 1$ correspond to controlling only the pressure and only the particle velocity, respectively, for which the reproduced sound intensity has a large error. The minimum intensity reproduction error occurs at $τ = 0.5$ and is about 5 dB and 3 dB lower than in the cases of controlling only the pressure and only the particle velocity, respectively. These results demonstrate that the best reproduction performance is obtained when the pressure and velocity are controlled with equal weights, which approximates sound intensity control. Since sound intensity is closely related to source location perception, as stated in the literature [15], the proposed method also achieves the optimal reproduction results with sound intensity control under the regular loudspeaker array layout.

We further investigate the validity of the proposed method in different reverberation conditions and with varying control filter length. Figure 3 plots the NMSE of the reproduced sound intensity with the reverberation time $RT_{60}$ increasing from 0.2 s to 0.7 s while the control filter length is kept constant at $J = 400$. As the reverberation time $RT_{60}$ increases, the reproduction error also increases. In Figure 4, we change the control filter length, i.e., $J = 100, 200, 400, 800, 1000, 1600$, to examine the reproduction performance. When $J < 800$, the error decreases rapidly with increasing control filter length; however, when J exceeds 800, the error levels off.

We then investigate the reproduction performance within the entire reproduction region, especially the reproduction results at uncontrolled points. We randomly selected 20 uncontrolled points, whose positions are shown in Table 2. The corresponding reproduction errors $ϵ$ and $η$ for different values of the tuning parameter $τ$ are shown in Figure 5 and Figure 6, respectively. It can be seen that, at these uncontrolled points, the NMSE of both the reproduced intensity and the reproduced sound pressure first decreases and then increases with $τ$. The minimum error for the intensity reproduction and the sound pressure reproduction occurs at $τ = 0.6$ and $τ = 0.4$, respectively. Compared with previous works based on multi-point pressure matching, which obtain the optimal reproduction performance only at the matching (or control) points, the above results demonstrate that the proposed method with an appropriate value of $τ$ can achieve an accurate reproduction over an enlarged area, even within the entire control region. In other words, jointly controlling the sound pressure and particle velocity helps to enlarge the sound reproduction area.

4.3. Irregular Loudspeaker Array

Next, we investigate the proposed method on an irregular loudspeaker array. We adopt the widely used ITU-R standard 5.1 layout without the subwoofer to validate our method, as shown in Figure 7. In this setup, the angle between the left (right) and the center loudspeaker is $30^{\circ}$, and the angle between the surround left (surround right) and the center loudspeaker is $110^{\circ}$. The rest of the simulation setup is the same as in Section 4.2.
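A minimal sketch of the five loudspeaker positions implied by these angles is given below. The array radius of 2 m and the center position (4 m, 3 m, 2 m) are carried over from Section 4.2 as an assumption, and the facing direction `front_deg` of the center loudspeaker is hypothetical, since the text does not state the orientation of the layout in the room.

```python
import numpy as np

def layout_5_1(center, radius=2.0, front_deg=90.0):
    """Positions of C, L, R, SL, SR on a horizontal circle around `center`.

    Angles are measured from the center loudspeaker direction, counter-clockwise
    positive when viewed from above, so the left loudspeakers have positive offsets.
    """
    offsets_deg = np.array([0.0, 30.0, -30.0, 110.0, -110.0])   # C, L, R, SL, SR
    ang = np.deg2rad(front_deg + offsets_deg)
    cx, cy, cz = center
    return np.column_stack([
        cx + radius * np.cos(ang),
        cy + radius * np.sin(ang),
        np.full(offsets_deg.size, cz),
    ])

speakers = layout_5_1((4.0, 3.0, 2.0))
```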

Figure 8 and Figure 9 show the NMSE of the reproduced intensity $ϵ$ as a function of the tuning parameter $τ$ at the control and the uncontrolled points, respectively. At both the controlled and uncontrolled points, the reproduction error $ϵ$ shows a consistent trend, that is, it first decreases and then increases as the tuning parameter $τ$ increases. The minimum error occurs at $τ = 0.7$ in both Figure 8 and Figure 9, indicating that a joint control of sound pressure and particle velocity with an adjustable weighting parameter has the flexibility to adapt to the irregular loudspeaker array layout.

Figure 10 and Figure 11 show the NMSE of the reproduced sound pressure $η$ as a function of the tuning parameter $τ$ at the control and the uncontrolled points, respectively. Figure 10 indicates that, at the control points, the NMSE of the reproduced sound pressure increases monotonically as $τ$ increases. The plots in Figure 11, on the other hand, demonstrate that, at the uncontrolled points, the minimum sound pressure reproduction error occurs for $τ$ between 0.2 and 0.4, and is lower than that of $τ = 0$ or $τ = 1$, i.e., the sole control of pressure or particle velocity. We can conclude that a joint control of sound pressure and particle velocity is beneficial for improving both sound pressure and sound intensity reproduction within the entire reproduction region.

4.4. Computation Complexity Performance

Finally, we examine the computational complexity of the proposed method, which uses the CG method to avoid a large matrix inverse operation. We compared the processing time of the direct inverse operation and the CG method for implementing Equation (19). The run times were measured on a laptop with a 2.4 GHz Intel(R) Core(TM) i5-1135G7 CPU, with the algorithm implemented in MATLAB R2020b. Different iteration numbers, i.e., $I = 100, 200, 400, 800$, are simulated and compared with the direct inverse operation, and the results are shown in Table 3.

Figure 12 plots the NMSE of the reproduced intensity $ϵ$ using the direct inverse operation and the CG method with different iteration numbers, i.e., $I = 100, 200, 400, 800$. Combined with the results in Table 3, we can see that the CG method has almost the same reproduction accuracy as the direct inverse operation but significantly reduces the processing time.

5. Conclusions

We have proposed a time-domain sound field reproduction method with sound pressure and particle velocity jointly controlled. The control was formulated using a Lagrangian cost function with a tuning parameter to adjust the control weights, which gives the flexibility to achieve the optimal control for different loudspeaker array layouts. While most existing works implement particle velocity- or sound intensity-assisted sound field reproduction in the frequency domain, the present work focused on time-domain reproduction and adopted the conjugate gradient method to reduce computational complexity. The proposed method was evaluated on both a regular loudspeaker array layout and an irregular loudspeaker array layout. We demonstrated that the proposed method improves both sound pressure and sound intensity reproduction with reduced computational complexity. Given that a reproduction system that controls the particle velocity is especially suitable for a non-uniformly spaced loudspeaker array and requires a reduced number of loudspeakers and control points, the present work has potential for real-time sound field reproduction applications in which the reproduction environment is time-varying, such as in-car audio systems.

Author Contributions

Conceptualization, W.Z. and X.H.; methodology, X.H.; validation, X.H. and J.W.; writing—original draft preparation, X.H. and J.W.; writing—review and editing, W.Z. and L.Z.; supervision, L.Z.; funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61671380.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Zhang, W.; Parasanga, N.S.; Chen, H.; Abhayapala, T.D. Surround by sound: A review of spatial audio recording and reproduction. Appl. Sci. 2017, 7, 532. [Google Scholar] [CrossRef]
Berkhout, A.J. A holographic approach to acoustic control. J. Audio Eng. Soc. 1988, 36, 977–995. [Google Scholar]
Berkhout, A.J.; de Vries, D.; Vogel, P. Acoustic control by wave field synthesis. J. Acoust. Soc. Am. 1993, 93, 2764–2778. [Google Scholar] [CrossRef]
Spors, S.; Rabenstein, R. Spatial aliasing aritifacts produced by linear and circular loudspeaker arrays used for wave field synthesis. In Proceedings of the 120th Audio Engineering Society Convention, Paris, France, 20–23 May 2006. [Google Scholar]
Gerzon, M.A. Periphony: With-height sound reproduction. J. Audio Eng. Soc. 1973, 21, 2–10. [Google Scholar]
Gerzon, M.A. Ambisonics in multichannel broadcasting video. J. Audio Eng. Soc. 1985, 33, 859–871. [Google Scholar]
Poletti, M.A. Three-dimensional surround sound systems based on spherical harmonics. J. Audio Eng. Soc. 2005, 53, 1004–1025.
Daniel, J. Spatial sound encoding including near field effect: Introducing distance coding filters and a viable, new ambisonic format. In Proceedings of the 23rd AES International Conference: Signal Processing in Audio Recording and Reproduction, Copenhagen, Denmark, 23–25 May 2003.
Ahrens, J.; Spors, S. Applying the ambisonics approach to planar and linear distributions of secondary sources and combinations thereof. Acta Acust. United Acust. 2012, 98, 28–36.
Ward, D.B.; Abhayapala, T.D. Reproduction of a plane-wave sound field using an array of loudspeakers. IEEE Trans. Speech Audio Process. 2001, 9, 697–707.
Kirkeby, O.; Nelson, P.A. Reproduction of plane wave sound fields. J. Acoust. Soc. Am. 1993, 94, 2992–3000.
Koyama, S.; Chardon, G.; Daudet, L. Optimizing source and sensor placement for sound field control: An Overview. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 696–714.
Bianco, F.; Teti, L.; Licitra, G.; Cerchiai, M. Loudspeaker FEM modelling: Characterisation of critical aspects in acoustic impedance measure through electrical impedance. Appl. Acoust. 2017, 124, 20–29.
Shin, M.; Nelson, P.A.; Fazi, F.M.; Seo, J. Velocity controlled sound field reproduction by non-uniformly spaced loudspeakers. J. Sound Vib. 2016, 370, 444–464.
Zuo, H.; Abhayapala, T.D.; Samarasinghe, P.N. Particle velocity assisted three dimensional sound field reproduction using a modal-domain approach. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 2119–2133.
Buerger, M.; Hofmann, C.; Kellermann, W. Broadband multizone sound rendering by jointly optimizing the sound pressure and particle velocity. J. Acoust. Soc. Am. 2018, 143, 1477–1490.
Zuo, H.; Abhayapala, T.D.; Samarasinghe, P.N. Intensity based spatial soundfield reproduction using an irregular loudspeaker array. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 1356–1369.
Zuo, H.; Abhayapala, T.D.; Samarasinghe, P.N. 3D multizone soundfield reproduction in a reverberant room using intensity matching method. In Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021.
Feng, Q.; Yang, F.; Yang, J. Time-domain sound field reproduction using the group Lasso. J. Acoust. Soc. Am. 2018, 143, EL55–EL60.
Molés-Cases, V.; Piñero, G.; de-Diego, M.; Gonzalez, A. Personal sound zones by subband filtering and time domain optimization. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 2684–2696.
Lee, T.; Shi, L.M.; Nielsen, J.K.; Christensen, M.D. Fast generation of sound zones using variable span trade-off filters in the DFT-domain. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 363–378.
Shi, L.M.; Lee, T.; Zhang, L.; Nielsen, J.K.; Christensen, M.D. Generation of personal sound zones with physical meaningful constraints and conjugate gradient method. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 823–837.
Williams, E.G. Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography; Academic Press: London, UK, 1999.
Gerzon, M.A. General metatheory of auditory localisation. In Proceedings of the 92nd Convention of the Audio Engineering Society, Vienna, Austria, 24–27 March 1992.
Habets, E.A.P. Room Impulse Response Generator. 2010. Available online: http://home.tiscali.nl/ehabets/rir_generator.html (accessed on 20 September 2010).
Allen, J.B.; Berkley, D.A. Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 1979, 65, 943–950.

Figure 1. Simulation setup: 8 loudspeakers, denoted by the black squares, are located on a circle with a radius of 2 m. The red dots denote the control points. The blue star indicates the location of the virtual sound source.

Figure 2. The NMSE of the reproduced intensity at control points with different tuning parameter $τ$. The reverberation time $RT_{60} = 200$ ms and the control filter length $J = 400$.

Figure 3. The NMSE of the reproduced intensity at control points under different reverberation times. The tuning parameter $τ = 0.5$ and the control filter length $J = 400$.

Figure 4. The NMSE of the reproduced intensity at controlled points with varying filter length. The tuning parameter $τ = 0.5$ and reverberation time $RT_{60} = 200$ ms.

Figure 5. The NMSE of the reproduced intensity at uncontrolled points. The reverberation time $RT_{60} = 200$ ms and the control filter length $J = 400$.

Figure 6. The NMSE of the reproduced sound pressure at uncontrolled points. The reverberation time $RT_{60} = 200$ ms and the control filter length $J = 400$.

Figure 7. Simulation setup: The 5 loudspeakers, denoted by the black squares, are configured according to the ITU-T standard 5.1 layout. The red dots denote the control points. The blue star indicates the location of the virtual sound source.

Figure 8. The NMSE of the reproduced intensity at controlled points. The reverberation time $RT_{60} = 200$ ms and the control filter length $J = 400$.

Figure 9. The NMSE of the reproduced intensity at uncontrolled points. The reverberation time $RT_{60} = 200$ ms and the control filter length $J = 400$.

Figure 10. The NMSE of the sound pressure $η$ at controlled points. The reverberation time $RT_{60} = 200$ ms and the control filter length $J = 400$.

Figure 11. The NMSE of the sound pressure $η$ at uncontrolled points. The reverberation time $RT_{60} = 200$ ms and the control filter length $J = 400$.

Figure 12. The NMSE of the reproduced intensity using the direct inverse and the adopted CG method. $I = 100, 200, 400, 800$ is the iteration number of CG. The reverberation time $RT_{60} = 200$ ms and the control filter length $J = 400$.

Table 1. Conjugate gradient algorithm for implementing the proposed time-domain sound field reproduction system.

INITIALIZATION:
1. Calculate the spatial autocorrelation matrix $R$ and the spatial cross-correlation vector $r$ using (16) and (17).
2. Set the initial value of the filter ${\hat{q}}_{1} = 0_{L J \times 1}$, the initial search direction vector $d_{1} = r$, and the initial residual error $e_{1} = r$.
3. Set the number of iterations $I$.
LOOP: for $i = 1, 2, \dots, I$
1. Determine the step size $α_{i}$ of the $i$th iteration according to $α_{i} = \frac{e_{i}^{T} e_{i}}{d_{i}^{T} R d_{i}}$.
2. Update the estimate of the control filter ${\hat{q}}_{i + 1} = {\hat{q}}_{i} + α_{i} d_{i}$ and the residual error $e_{i + 1} = e_{i} - α_{i} R d_{i}$.
3. Calculate the factor $β_{i + 1}$ that satisfies the conjugacy condition, that is, $β_{i + 1} = \frac{e_{i + 1}^{T} e_{i + 1}}{e_{i}^{T} e_{i}}$.
4. Calculate the $(i + 1)$th search direction vector $d_{i + 1} = e_{i + 1} + β_{i + 1} d_{i}$.
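
The steps of Table 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `conjugate_gradient`, the early-stopping tolerance `tol`, and the small random test system are introduced here for demonstration only. In the paper, $R$ and $r$ are obtained from (16) and (17), and $R$ is assumed here to be symmetric positive definite.

```python
import numpy as np


def conjugate_gradient(R, r, num_iters, tol=1e-12):
    """Approximately solve R q = r following the steps of Table 1.

    R is assumed symmetric positive definite. `tol` is an extra safeguard
    that is not part of Table 1: the loop stops early once the residual
    norm falls below tol * ||r||.
    """
    q = np.zeros_like(r, dtype=float)  # initial filter estimate q_1 = 0
    e = r.astype(float).copy()         # initial residual e_1 = r
    d = e.copy()                       # initial search direction d_1 = r
    ee = e @ e                         # e_i^T e_i, reused across steps
    rr = ee
    for _ in range(num_iters):
        if ee <= (tol ** 2) * rr:      # residual negligible: stop early
            break
        Rd = R @ d
        alpha = ee / (d @ Rd)          # step size alpha_i
        q = q + alpha * d              # filter update
        e = e - alpha * Rd             # residual update
        ee_next = e @ e
        beta = ee_next / ee            # conjugacy factor beta_{i+1}
        d = e + beta * d               # next search direction
        ee = ee_next
    return q


# Hypothetical test system (not from the paper): a random SPD matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 40))
R = A.T @ A + 0.1 * np.eye(40)
r = rng.standard_normal(40)

q_cg = conjugate_gradient(R, r, num_iters=200)
q_direct = np.linalg.solve(R, r)
rel_err = np.linalg.norm(q_cg - q_direct) / np.linalg.norm(q_direct)
print(f"relative difference between CG and direct solve: {rel_err:.2e}")
```

Each iteration costs one matrix-vector product with $R$, which is the source of the $O(I (L J)^{2})$ complexity of the CG method, compared with $O((L J)^{3})$ for the direct inverse.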

Table 2. The 20 uncontrolled point positions.

Uncontrolled Point No.	x (m)	y (m)	z (m)
1	4.0492	3.0087	2.0000
2	4.0070	3.0495	2.0000
3	3.9551	3.0219	2.0000
4	3.9653	2.9640	2.0000
5	4.0235	2.9559	2.0000
6	4.0693	3.0400	2.0000
7	3.9834	3.0783	2.0000
8	3.9204	3.0084	2.0000
9	3.9675	2.9269	2.0000
10	4.0595	2.9465	2.0000
11	4.0376	3.1034	2.0000
12	3.9133	3.0677	2.0000
13	3.9088	2.9385	2.0000
14	4.0303	2.8943	2.0000
15	4.1099	2.9962	2.0000
16	4.0900	3.1072	2.0000
17	3.9258	3.1187	2.0000
18	3.8642	2.9661	2.0000
19	3.9902	2.8603	2.0000
20	4.1298	2.9476	2.0000

Table 3. The processing time of the direct inverse operation and the CG method, where the corresponding computational complexity is $O((L J)^{3})$ and $O(I (L J)^{2})$, respectively. $I$ is the iteration number of the CG method. The reverberation time $RT_{60} = 200$ ms and the control filter length $J = 400$.

Method	Direct inverse	CG, I = 100	CG, I = 200	CG, I = 400	CG, I = 800
Time (s)	203.97	25.53	29.90	39.02	70.75

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

