A Measure Based on Beamforming Power for Evaluation of Sound Field Reproduction Performance

Abstract: This paper proposes a measure to evaluate sound field reproduction systems with an array of loudspeakers. The spatially-averaged squared error of the sound pressure between the desired and the reproduced field, namely the spatial error, has been widely used, but it has considerable problems in two conditions. First, in non-anechoic conditions, room reflections substantially deteriorate the spatial error, although these room reflections affect human localization to a lesser degree. Second, for 2.5-dimensional reproduction of spherical waves, the spatial error increases consistently due to the difference in the amplitude decay rate, whereas the degradation of human localization performance is limited. The measure proposed in this study is based on the beamforming powers of the desired and the reproduced fields. Simulation and experimental results show that the proposed measure is less sensitive to room reflections and the amplitude decay than the spatial error, and is therefore likely to agree better with the human perception of source localization.

For the physical evaluation of the spatial feature, the spatially-averaged squared pressure error between the desired and the reproduced fields, called the spatial error in what follows, has been widely used [5][6][7][8,12,14,15,25,34], whilst listening tests are conducted for perceptual evaluation of human localization. However, the spatial error does not always correspond to human localization [29]. For example, if a sound source is reproduced in a non-anechoic room, room reflections significantly increase the spatial error, although human localization is affected to a lesser degree. In addition, differences in the amplitude decay with distance increase the spatial error consistently, but play a less important role from a perceptual viewpoint [35][36][37][38]. The aim of this paper is to propose a physical measure for evaluating the spatial feature of reproduced sound fields that potentially agrees better with human localization than the spatial error does. Since a physical measure can hardly reflect human localization perfectly as long as the human perception of source localization is not fully understood, this paper attempts to reduce the effects of room reflections and amplitude decay with distance while evaluating the directions of sound waves as effectively as the spatial error.
Room reflections and spatial amplitude decay are relevant issues to practical reproduction systems. Most reproduction systems are installed in highly-damped rooms, not in anechoic conditions, although most reproduction methods strictly assume a free-field condition. There are a few studies that attempt to make use of room reflections [39,40], but these methods are feasible only if the room geometry and absorption properties are known, which is not always possible in practice. Moreover, linear arrays and circular arrays are implemented in many cases, which is called a 2.5-dimensional (2.5D) reproduction. When a 2.5D reproduction is used to reproduce a monopole source, the reproduced monopole does not have the amplitude decay of 1/r, where r is the distance to the monopole. Unwanted reflections and unsuccessful reproduction of the amplitude decay would consistently magnify the spatial error, of which the perceptual effect is found to be rather limited [35][36][37][38].
In order to overcome the limitations of the spatial error, the present study proposes a measure based on the beamforming power, or the beam-power. The beam-powers are calculated from the sound field in a control region, whereas binaural measures, such as the inter-aural time difference (ITD) or the inter-aural level difference (ILD), are specific to the position and direction of the listener at each time. The main reason for using beamforming is that it concentrates the direct sound energy in a control region along its propagating direction, whereas the reflections are, more or less, uniformly distributed over direction. Thus, the effect of the reflections can be suppressed by a directional filter that gives a higher weighting around the intended direction of a reproduced source. After applying the directional filter, the error of the beam-power between the desired and the reproduced fields is quantified and termed the beam-power error. The effect of the difference in the amplitude decay vanishes when the beam-power is normalized (see Section 3). A typical circular loudspeaker array with 2.5D higher-order Ambisonics (HOA) reproduction is used as an example.
The research problems of this paper are defined in Section 2. The proposed measure and its basic properties are investigated via simulations in Section 3. In Section 4, the beam-power error is compared with the spatial error in a simple pre-experiment and in experiments with 2.5D HOA with a circular array of 16 loudspeakers in a damped room.

Spatial Error in Sound Field Reproduction
A plane wave or a sound wave generated by a monopole in a free-field condition is defined as the desired field in many studies because any sound field can be expressed as a superposition of plane waves or monopoles [5][6][7][9][10][11]. In this study, the desired field is defined by a monopole of amplitude A, where R_v is the distance between the position of the monopole r_v and a point in the control region.
The spatially-averaged error e_f is defined as the error between the desired and the reproduced fields, averaged over the control region. This measure quantifies how similar the reproduced field is to the desired field in the space domain in terms of the pressure distribution.
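As an illustration of the two definitions above, the following minimal sketch evaluates a monopole field and a spatially-averaged squared error on a sampled control region. The `exp(-jkR)/R` sign convention, the normalization by the desired-field energy, the speed of sound, and the 0.3 m disc are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def monopole_field(points, source, k, amplitude=1.0):
    """Free-field monopole pressure A * exp(-j k R) / R at each point.

    The exp(-j k R) sign convention is an assumption; the text does not
    state its time convention.
    """
    R = np.linalg.norm(points - source, axis=-1)
    return amplitude * np.exp(-1j * k * R) / R

def spatial_error_db(p_desired, p_reproduced):
    """Spatially averaged squared error in dB.

    Normalizing by the desired-field energy is an assumption.
    """
    num = np.mean(np.abs(p_desired - p_reproduced) ** 2)
    den = np.mean(np.abs(p_desired) ** 2)
    return 10.0 * np.log10(num / den)

# Hypothetical control region: grid points inside a disc of radius 0.3 m.
k = 2 * np.pi * 1000.0 / 343.0           # wavenumber at 1 kHz, c = 343 m/s assumed
g = np.linspace(-0.3, 0.3, 25)
X, Y = np.meshgrid(g, g)
inside = X**2 + Y**2 <= 0.3**2
points = np.stack([X[inside], Y[inside]], axis=-1)

source = np.array([0.0, 1.8])            # monopole position r_v (toy value)
p_f = monopole_field(points, source, k)  # desired field
p_r = 0.9 * p_f                          # reproduced field with a 10% amplitude error
err_db = spatial_error_db(p_f, p_r)
print(round(err_db, 1))
```

With this normalization, a uniform 10% amplitude error alone already produces an error of about -20 dB, which illustrates how directly the spatial error responds to amplitude differences.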

Effect of Room Reflections
Loudspeaker-based reproduction systems are installed in non-anechoic conditions in most practical cases, and consequently unintended room reflections are included in the transfer functions of the loudspeakers, which increases the spatial error. In contrast, room reflections are known to affect human localization of the sound source to a lesser degree, according to the law of the first wavefront [35]. Late reflections hardly affect human localization of the sound source, and early reflections up to 10 ms can smear or shift the image of the perceived source location [35], but do not significantly change the localized direction. By removing the late reflections with a time window [41,42], the effect of reflections on the spatial error can be suppressed to some extent. Yet, some early reflections cannot be perfectly removed due to overlap in time, and they increase the spatial error.
Hence, in non-anechoic conditions, the spatial error gives little information about the reproduced field [29]. This means that the spatial error cannot show whether or not there are experimental errors, such as positioning errors of microphones and loudspeakers, or differences between the loudspeaker models and the actual loudspeaker responses.

Effect of Amplitude Decay with Distance
The amplitude of a monopole decays with distance as 1/r, where r is the distance to the monopole position. If a linear array or a circular array of loudspeakers is used, i.e., in a 2.5D reproduction, the amplitude decay differs. Figure 2 shows the amplitude decay in a computer simulation where the reproduced field is generated with a circular array of 16 loudspeakers at 1 kHz. The radius of the circular array is 1.5 m, and the desired monopole is located at (10 m, 0). The desired and the reproduced decays are compared with a special focus on the controllable region (9.6 m < x < 10.4 m). The magnitude is normalized to 0 dB at the center of the controllable region (x = 10 m). The reproduced decay clearly differs from the desired decay, and this difference would significantly increase the spatial error. In contrast, the amplitude decay hardly affects human localization in terms of perception; directional perception is mainly due to the inter-aural time difference (ITD) and the inter-aural level difference (ILD), and distance perception is attributed more to the loudness and coloration than to the amplitude decay [35][36][37][38].

Beam-Power and Directional Filter
Beamforming methods can map the acoustic energy in a sound field onto its propagating directions. When the beamforming methods are applied to a successfully-reproduced field in a free-field condition, the beam-power should have its maximum value at the intended direction of the virtual monopole source. Even in non-anechoic conditions, the direction of the maximum value does not vary considerably because the beam-power of the direct sound is still dominant, while room reflections are distributed in many directions. The effect of the reflections can be reduced by using a directional filter that gives a greater weighting towards the monopole direction.
To obtain a consistent measure across frequency, the main lobe needs to have a constant width. This width depends on the beamforming method and the aperture size of the sensor array. If HOA is used, the reproducible region is determined by kr < N [5], where k is the wave number, N is the order of the HOA, and r is the radius of the region. This means that the aperture size decreases with frequency. Delay-and-sum (DAS) beamforming is chosen [43] because, for a constant aperture size, its main lobe becomes narrower as the frequency increases. The shrinking aperture and the narrowing main lobe compensate each other, so the main-lobe width stays constant when HOA is used together with DAS beamforming. If the reproducible region is frequency-invariant, constant-directivity beamforming methods need to be used instead.
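The condition kr < N fixes the aperture radius as a = N/k, so the product ka stays equal to N at every frequency. The short sketch below evaluates this radius for N = 7; the speed of sound of 343 m/s and the chosen frequencies are assumptions for illustration.

```python
import math

def control_radius(order, freq_hz, c=343.0):
    """Radius a = N / k of the region where kr < N holds (c = 343 m/s assumed)."""
    k = 2.0 * math.pi * freq_hz / c
    return order / k

# Aperture radius for N = 7 at a few example frequencies.
for f in (340.0, 1000.0, 3000.0):
    a = control_radius(7, f)
    k = 2.0 * math.pi * f / 343.0
    print(f, round(a, 3), round(k * a, 3))   # k * a stays at N = 7
```

Under these assumptions the radius is roughly 1.12 m at 340 Hz and shrinks in inverse proportion to the frequency.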
The desired beamformer output with respect to the assumed azimuth angle φ_c for the DAS beamforming can be calculated in the frequency domain by integrating the steered pressure over the controllable region (Equation (7)). Here, a is the radius of the controllable region of the N-th order HOA (a = N/k), R_c is the distance between the measurement point (r, φ) and the assumed source position (r_c, φ_c), and the radius r_c is assumed to be identical to the distance of the monopole r_v. If r_v is assumed to be much greater than the radius of the reproducible region a (r_v >> a) for simplicity, inserting Equation (2) into Equation (7) leads to a closed-form expression, Equation (8), that involves J_1, the first-order Bessel function of the first kind. See Appendix A for more details.
This beamformer output is frequency-invariant because ka is constant and equal to N. To make it independent of the distance, the beam-power is normalized, which gives the normalized beam-power β_N. The effect of the amplitude decay with distance then vanishes.
The beamformer output of the reproduced field, B(φ_c), can be obtained in the same way. Unlike the ideal case (Equation (8)), the outcome of this integration is a complex number in general. The reproduced field contains the direct sound and reflections. The effect of these reflections can be reduced by a time window that excludes late reflections and by a directional filter W(φ_c) that reduces the weighting at directions remote from φ_v. In this study, a directional filter that has its maximum at φ_c = φ_v is used. The beam-power error is then defined as the error between the normalized beam-powers of the desired and the reproduced fields after the directional filter has been applied.
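The chain described above (DAS beamformer output, normalized beam-power, directional filter, beam-power error) can be sketched numerically as follows. This is a minimal sketch under stated assumptions and not the paper's exact formulation: the phase-only steering focused at distance r_c = r_v, the normalization by the maximum, the Gaussian filter with a 30° width, and the squared-difference form of the error are all hypothetical choices, as are the helper names.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

def polar_to_xy(r, phi):
    return np.array([r * np.cos(phi), r * np.sin(phi)])

def monopole(points, src, k, amp=1.0):
    """Monopole pressure at `points`; exp(-jkR)/R convention assumed."""
    R = np.linalg.norm(points - src, axis=-1)
    return amp * np.exp(-1j * k * R) / R

def das_beam_power(p, points, k, r_c, angles):
    """Normalized DAS beam-power over candidate azimuths.

    Steering is phase-only and focused on a point at distance r_c
    (assumed weighting); the result is divided by its maximum.
    """
    power = np.empty(len(angles))
    for i, phi_c in enumerate(angles):
        R_c = np.linalg.norm(points - polar_to_xy(r_c, phi_c), axis=-1)
        power[i] = np.abs(np.mean(p * np.exp(1j * k * R_c))) ** 2
    return power / power.max()

def directional_filter(angles, phi_v, sigma_deg=30.0):
    """Hypothetical Gaussian weighting with its maximum at phi_v."""
    d = np.angle(np.exp(1j * (angles - phi_v)))   # wrapped angle difference
    return np.exp(-0.5 * (d / np.deg2rad(sigma_deg)) ** 2)

def beam_power_error(beta_f, beta_r, w):
    """Weighted, normalized squared difference (assumed form of the error)."""
    return np.sum(w * (beta_f - beta_r) ** 2) / np.sum(w * beta_f ** 2)

# Control region: grid points inside the disc kr < N at 1 kHz, N = 7.
N, f = 7, 1000.0
k = 2 * np.pi * f / C
a = N / k
g = np.linspace(-a, a, 21)
X, Y = np.meshgrid(g, g)
inside = X**2 + Y**2 <= a**2
points = np.stack([X[inside], Y[inside]], axis=-1)

r_v, phi_v = 1.8, np.deg2rad(90.0)
angles = np.deg2rad(np.arange(0.0, 360.0, 2.0))
p_f = monopole(points, polar_to_xy(r_v, phi_v), k)

beta_f = das_beam_power(p_f, points, k, r_v, angles)
w = directional_filter(angles, phi_v)

# Case 1: pure amplitude scaling, which the normalization removes.
beta_scaled = das_beam_power(0.5 * p_f, points, k, r_v, angles)
err_scaled = beam_power_error(beta_f, beta_scaled, w)

# Case 2: one extra wave from another direction, standing in for a reflection.
p_refl = monopole(points, polar_to_xy(3.0, np.deg2rad(200.0)), k, amp=0.5)
beta_r = das_beam_power(p_f + p_refl, points, k, r_v, angles)
err_refl = beam_power_error(beta_f, beta_r, w)

peak_deg = np.rad2deg(angles[np.argmax(beta_f)])
print(peak_deg, err_scaled, 10 * np.log10(err_refl))
```

In this sketch the beam-power of the desired field peaks at the monopole direction, and a pure amplitude scaling leaves the normalized beam-power unchanged. The second case can be used to explore how the filter width changes the weight given to energy arriving from remote directions.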

Properties of the Beam-Power Error
In order to investigate the properties of the beam-power error, two sound fields are evaluated with the spatial error and the beam-power error via numerical simulations. A reference sound field has a monopole in a free field, corresponding to the desired field. To compare with the reference condition, two modifications are made: inclusion of room reflections and a change in the monopole position, i.e., a change in the wavefronts. The position of the monopole r_v is (0, 0.9 m), and the order of HOA, N, is 7.
In the first modification, the reproduced field has room reverberation on top of the desired field. A small room (4.5 m × 4.4 m × 2.5 m) is assumed, and reflections are simulated by the image source method [44,45] with the pressure reflection coefficient, ρ, varying from 0.1 to 0.9. Late reflections that arrive after the mixing time (t_mix = √V ≈ 7.0 ms, with the room volume V in m³) are excluded with a time window. The mixing time has been proposed as a criterion that separates early reflections from late reflections [46][47][48]. Figure 4 shows the beam-power distributions of the desired and the reproduced fields when the reflection coefficient is 0.9. The side lobes of the reproduced field have greater values than those of the desired field due to the strong reflections, but the main lobes of the two curves differ only slightly, which keeps the beam-power error low. Figure 5 compares the beam-power error with the spatial error with respect to the reflection coefficient. The beam-power error is at least 20 dB lower than the spatial error and stays below −20 dB even for high reflection coefficients.
In the second modification, a wrong monopole location is used for the reproduced field in an anechoic condition. This error should be effectively detected because it is induced by an error in the direction of the sound waves, which can be easily perceived by human listeners. Figure 6 shows the spatial error and the beam-power error with respect to the difference of the monopole direction, ∆φ, where the two errors differ by 1.7 dB on average and by 2.0 dB at most. This means that the beam-power error can detect the error due to the change in the direction of the monopole as effectively as the spatial error does.
In summary, the beam-power error is much less sensitive to room reflections than the spatial error, but detects faults in the direction of arrival as effectively as the spatial error does.
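The time window used in the first modification can be illustrated with a short sketch. It computes the mixing time from the room volume as t_mix = √V in ms and zeroes an impulse response after that time. The rectangular window shape, the reference of the cut to the direct sound, the 16,384 Hz sampling rate (borrowed from the measurement section), and the toy impulse response are assumptions made for this example.

```python
import numpy as np

def mixing_time_ms(volume_m3):
    """Mixing time estimate t_mix = sqrt(V) in ms, with V in cubic metres."""
    return float(np.sqrt(volume_m3))

def window_early_part(impulse_response, fs_hz, t_cut_ms, t_direct_ms=0.0):
    """Zero everything later than t_cut_ms after the direct sound.

    A rectangular window is an assumed shape; the paper does not specify it here.
    """
    n_cut = int(round((t_direct_ms + t_cut_ms) * 1e-3 * fs_hz))
    out = np.array(impulse_response, dtype=float, copy=True)
    out[n_cut:] = 0.0
    return out

volume = 4.5 * 4.4 * 2.5            # room dimensions from the text, in m^3
t_mix = mixing_time_ms(volume)      # about 7.0 ms

fs = 16384                          # sampling rate in Hz (assumed for this sketch)
h = np.zeros(512)
h[0] = 1.0                          # direct sound
h[50] = 0.5                         # toy early reflection at about 3.1 ms
h[300] = 0.3                        # toy late reflection at about 18.3 ms
h_early = window_early_part(h, fs, t_mix)
print(round(t_mix, 2), h_early[50], h_early[300])
```

The early reflection survives the window while the late one is removed, which is why early reflections still contribute to both error measures.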

Pre-Experiment with One Source On
As a pre-experiment, the proposed measure was compared with the spatial error in a real reproduction room where only one loudspeaker was turned on. The reproduction room was an acoustically-damped room (4.5 m × 4.4 m × 2.5 m) [27,28,30] built at the Technical University of Denmark, where the reverberation time T30 was 0.16 s in the 125 Hz octave band, and below 0.1 s for higher frequencies. The loudspeaker was located at (0, 1.8 m). Figure 7 illustrates this setup, and the loudspeaker is indicated as #1.
The measurement was conducted with a planar array of 60 (6 × 10) microphones (B&K type 4957) at 15 positions (5 × 3). The measurement region was a square in the horizontal plane (2.25 m × 2.25 m) indicated in Figure 7, and the spacing of the measurement points was 7.5 cm. The total number of measurement points was, thus, 900 (30 × 30). The microphones were calibrated, and the phase difference among the microphones was within ±5° up to 3 kHz. The microphone signals were recorded with a multi-channel analyzer (B&K frontend frame 3560D, modules 7537A and 3038B). To synchronize the measurement at each position, the signal fed into the loudspeaker was also measured. The background noise was lower than 10 dB SPL above 80 Hz, and the temperature and the humidity were 21 °C and 42%. The sampling frequency was 16,384 Hz.
The control region was assumed to be a circular region (kr < N) with the maximum order N = 7. The measurement points located in the control region were used for the beamforming and, thus, the number of measurement points varied with the frequency. For frequencies lower than 340 Hz, the control region was not measured entirely because part of it lay outside the measurement region. The late reflections that arrived after the mixing time were excluded by the same time window as in the simulation. Figure 8 shows the error measures as a function of frequency. The spatial error varies between about −10 and 0 dB over the whole frequency range, which is a considerable amount. Since only one loudspeaker was used, the sound field was close to that generated by a monopole at the loudspeaker position, except for room reflections, and human localization is expected to be stable because the room is acoustically damped. Informal listening tests also confirmed a stable localization performance. In contrast, the beam-power error is lower than −20 dB above 200 Hz. The spatial error fluctuates periodically, approximately every 280 Hz. This period corresponds to the time delay between the direct sound and the reflection from the floor. Hence, the floor reflection is the main cause of the periodically increased spatial error, whereas the beam-power error is hardly affected by such reflections. The beam-power error is mainly affected by the difference in the angle φ_c of the maximum beam-power between the desired and the reproduced fields.
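The link between the ripple period and the floor reflection can be checked with simple arithmetic: a reflection delayed by τ against the direct sound produces a ripple that repeats every 1/τ in frequency. The sketch below converts the 280 Hz period into a delay and an extra path length. The speed of sound and the example geometry (source and receiver both 1.2 m above the floor, 1.8 m apart) are hypothetical, since the heights are not given in the text.

```python
import math

C = 343.0  # speed of sound in m/s (assumed)

def delay_from_ripple_period(period_hz):
    """Time delay (s) implied by a ripple that repeats every period_hz."""
    return 1.0 / period_hz

def floor_path_difference(distance_m, height_m):
    """Extra path of the floor reflection for source and receiver at the same height."""
    return math.sqrt(distance_m**2 + (2.0 * height_m) ** 2) - distance_m

delay_s = delay_from_ripple_period(280.0)
extra_path_m = C * delay_s

# Hypothetical geometry: 1.8 m horizontal distance, both at 1.2 m height.
toy_path_m = floor_path_difference(1.8, 1.2)
toy_period_hz = C / toy_path_m
print(round(delay_s * 1e3, 2), round(extra_path_m, 3), round(toy_period_hz, 1))
```

A 280 Hz period corresponds to a delay of about 3.6 ms, i.e., an extra path of roughly 1.2 m, which is of the same order as the extra path in the hypothetical geometry.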

Experiment with 2.5D HOA
The spatial error and the beam-power error were compared in the same room, this time with 2.5D HOA [6]. A circular array of 16 loudspeakers at equiangular positions was used, and the radius of the circle was 1.8 m (Figure 7). The loudspeaker responses deviated by up to ±3 dB from the mean value over the entire frequency range of interest. The details of the filter used are described in Appendix B. The loudspeakers were modeled as monopoles.
Two virtual source positions were targeted, as illustrated in Figure 7. A monopole was reproduced in each experiment at the positions indicated by 'a' and 'b' in Figure 7. The position 'a' is (1.8 m, 90 degrees), and the position 'b' is (1.8 m, 101.25 degrees). The first position corresponds to that of the first loudspeaker, and the second one is midway between the first and the second loudspeakers. Figure 9 compares the spatial error and the beam-power error for the desired monopole located at 'a' (left) and 'b' (right). As the position 'a' coincides with loudspeaker #1, one might expect a better reproduction performance than that for the position 'b'. The spatial error, however, shows similar values and fluctuations with frequency for the two cases. The beam-power error for position 'a' turns out to be smaller than that for position 'b'.
As explained for Figure 8, the peaks in the spatial error are ascribed to the floor reflection. On the other hand, the beam-power error is less affected by the floor reflection.

Discussion
A time window was used in the experiment to exclude the late reflections based on the concept of mixing time. In general, as the window becomes shorter, both the spatial error and the beam-power error become smaller because the direct sound becomes more predominant. Controlling the window length can be useful to check whether or not the sound field is close to a free-field condition. Early reflections were included in this study, which occasionally shifted the peak in the beam-power. This might be related to the fact that early reflections of 5 to 10 ms can shift the perceived source position in human listening [35,49]. This relation needs to be further studied, but it is beyond the scope of the present study.
As shown in Figure 4, the beam-power distribution shows the difference between the desired and the reproduced fields with respect to the assumed direction. The shift of the propagating direction of the direct sound is shown in the main-lobe, and the effect of excessive reflections is shown mainly in the side lobes. This means that the beam-power distribution gives more information than the spatial distribution does.
In addition, as shown in Figure 6, if there are no reflections, it is expected that the beam-power error can detect the difference in the main lobe as effectively as the spatial error. Thus, for reproduced sound fields by simple panning methods, such as VBAP (vector-based amplitude panning) [50], the beam-power error is expected to be as large as the spatial error. On the other hand, because of the directional filter, the difference in the side lobes can be underestimated compared with the spatial error.
Recently, another measure called planarity has been proposed for evaluation of the reproduced fields [51]. The planarity quantifies the similarity to a plane wave sound field using the ratio between the intensity component in the direction of the plane wave and the total energy flux, obtained by the beamforming technique. Although the planarity and the proposed beam-power error both use the beamforming method, the main difference is that the planarity does not take the shape of the main lobe into consideration because it is calculated as the weighted sum of the beam-power. This makes the planarity insensitive to differences in the wavefronts. Figure 10 shows the planarity for the case considered in Figure 6. The planarity decreases as the angle difference increases, as expected, but the change is not significant. For example, the spatial error and the beam-power error increase by around 15 dB for a shift of five degrees, while the planarity decreases only from 0.970 to 0.967. Consequently, it is concluded that the beam-power error is more effective at detecting differences in the wavefronts.
The experimental example used a specific reproduction method (2.5D HOA) in a specific listening room, and sound fields in the controllable region on the horizontal plane, which is frequency dependent (kr < N), were used to obtain the beam-power distribution. However, this measure can be used in other circumstances. For example, a frequency-independent spherical region that just includes the listener's head can be chosen, because the sound field in this region is directly related to the sound pressure that the listener would have at the two ears. If 3D reproduction methods are used, and the elevation angle of the virtual source is also of concern, the beamformer output can be extended to the two-dimensional case, in which it is expressed as a function of both the azimuth and the elevation angles.
The present study is limited to the physical evaluation of sound fields. Subjective tests are needed in future work to prove that the proposed measure agrees well with human localization.

Conclusions
The present study proposed a measure that can quantify how well a sound field is reproduced based on a beamforming technique. Instead of directly comparing the reproduced field pressure with that of the desired field, as the spatial error does, the proposed measure compares the beam-power between the reproduced and desired sound fields. The spatial error is overly affected by the amplitude decay with distance and early reflections in rooms, and is overestimated for a simple sound field generated by one loudspeaker in a highly-damped room. The proposed beam-power measure is less sensitive to the amplitude decay over distance and room reflections, which could better correspond with human perception. The beam-power error can pick up the difference in wavefronts of the direct sound as effectively as the spatial error. This measure can be useful particularly when a reproduction system is installed in non-anechoic rooms or a 2.5D reproduction is conducted with a linear or a circular array of loudspeakers.

Appendix A
The desired field in free field, P_f(r, φ), can also be approximated (Equation (A2)). Inserting Equations (A1) and (A2) into Equation (7) leads to Equation (8).

Appendix B. Near-Field Compensated Higher-Order Ambisonics
The reproduction method, 2.5D NFC-HOA, can generate a monopole source with a circular array of loudspeakers [30]. The distance and the angle of the monopole source can be controlled in the horizontal plane of the loudspeakers. The loudspeakers are modeled as monopoles.
The NFC-HOA filter can be written with complex cylindrical harmonics for each monopole source at a given position in the horizontal plane. The number of coefficients for a given N in Equation (A6) is 2N + 1. It is well known that the number of loudspeakers should be greater than 2N + 1 for an accurate reproduction [5]. In this experiment, 16 loudspeakers were used, which allows for a maximum order of N = 7.
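The relation between the loudspeaker count and the maximum order can be written as a one-line check. The helper name below is hypothetical, and the strict inequality follows the wording of the text.

```python
def max_hoa_order(num_loudspeakers):
    """Largest N with 2N + 1 < num_loudspeakers (strict inequality, as worded in the text)."""
    return (num_loudspeakers - 2) // 2

order = max_hoa_order(16)
num_coefficients = 2 * order + 1   # cylindrical-harmonic coefficients for this order
print(order, num_coefficients)
```

For 16 loudspeakers this gives N = 7 with 15 coefficients, in line with the order used in the experiment.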