Stereophonic Microphone Array for the Recording of the Direct Sound Field in a Reverberant Environment

State-of-the-art stereo recording techniques using two microphones have two main disadvantages: first, a limited reduction of the reverberation in the direct sound component, and second, compression or expansion of the angular position of sound sources. To address these disadvantages, the aim of this study is the development of a true stereo recording microphone array that aims to record the direct and reverberant sound field separately. This array can be used within the recording and playback configuration developed in Grosse and van de Par, 2015. Instead of using only two microphones, the proposed method combines two logarithmically-spaced microphone arrays, whose directivity patterns are optimized with a superdirective beamforming algorithm. The optimization allows us to have a better control of the overall beam pattern and of interchannel level differences. A comparison between the newly-proposed system and existing microphone techniques shows a lower percentage of the recorded reverberance within the sound field.


Introduction
Sound reproduction systems play an important role in our everyday life. They allow us to listen to recordings from a different place and a past time. Many different methods for the recording and playback of sound exist, utilizing different combinations of microphone and loudspeaker setups. The most common one is simple stereo reproduction, but there are more complex reproduction techniques, such as wave field synthesis [1] or ambisonics [2]. Even though the state-of-the-art methods achieve very good accuracy in reproducing sound fields, they do not consider the interaction between the acoustics of the recording and playback environments. In particular, extra reverberation is created by the playback environment, and in addition, there is no control over the spatial distribution of the reverberant sound field, which may influence the apparent source width and perceived listener envelopment. For this reason, ongoing investigations aim to improve the performance of these methods.
In particular, Grosse and van de Par proposed a new way of recording and playing back sound fields [3]. The main idea behind their research was to record the direct and reverberant sound fields separately in order to be able to render them in a playback room while optimizing certain perceptually-motivated criteria for authentic audio reproduction. These criteria aim to recreate the reverberant sound field in the playback environment as faithfully as possible by optimizing the amount and spectral shape of the reverberation, as well as the interaural cross-correlation created by the reproduced reverberant sound field as it arises in the reproduction room, including the reverberation added by that room. In their paper, Grosse and van de Par assumed that optimizing these perceptual criteria is sufficient for an authentic reproduction of the sound field present in the recording room, which is created by a single source. This claim was supported by subjective evaluations. The playback and recording configuration can be seen in Figure 1. In addition to the two basic stereo loudspeakers, the proposed approach used two dipole loudspeakers to excite and equalize the reverberant sound field. For the optimized rendering, the system relies on the presence of a relatively dry direct signal to be rendered on the frontal loudspeakers and a reverberant signal to be optimized and rendered on the dipole loudspeakers. To record the direct sound, a microphone (C) was positioned close to the sound source. This also avoided early reflections, which could cause a change in coloration [4,5]. For recording the reverberant sound field, two microphones (B_l, B_r) were placed at two distant positions in the diffuse field.
Since the method of Grosse and van de Par [3] has so far been limited to a single source and records the direct sound field with only one microphone, an extension is needed to also represent the spatial distribution of sources within the direct sound field signals as perceived at the listener position. Although this could in principle be achieved by using multiple close microphones and an appropriate mixing scheme, in this contribution, we want to provide a method with only a single 'true-stereo' microphone setup that is placed at the intended listener position within the recording room. Particular attention has to be paid to reducing the reverberant sound field in the direct sound field signals in order to be able to separately optimize the rendering of the direct and reverberant sound fields according to perceptual criteria within the playback room [3].
Although the specific design criteria for the proposed microphone array are intended for the audio reproduction system of Grosse and van de Par [3], the array can also be used to record a relatively dry spatial image of the sound sources on stage, to be combined with a reverberant track that is mixed at a level the recording engineer deems suitable. In this case, however, it will not necessarily fulfill the optimization criteria formulated in Grosse and van de Par [3] that create a faithful audio reproduction.
The state-of-the-art true stereo systems combine two microphones with a characteristic directivity pattern, placed at different distances and under different angles relative to one another. Depending on these parameters, a deviating spatial rendering of the distributed sources can be observed [6]. In addition, these systems record a high percentage of reverberant sound, which should be avoided in the system of Grosse and van de Par [3] and makes them unsuitable for use in this specific sound reproduction system.
We overcome these disadvantages by developing a new true stereo microphone array that applies a superdirective beamforming algorithm to two logarithmically-spaced microphone arrays. Correct, frequency-dependent interchannel level differences are captured by optimizing the shape of the two main lobes of the arrays. Together, they create the proper interchannel level difference required for an accurate spatial reproduction of the sound field while ensuring that no interchannel phase differences occur that can result in unintended changes in the perceived location of sound sources. Additionally, an optimal side lobe suppression is applied to reduce the influence of the reverberant sound field on the recording of the direct sound. The proposed stereo microphone array is compared to the state-of-the-art stereo microphone configurations mentioned earlier and shows a clearly reduced level of the reverberant sound field.

Methods
The following section is divided into five parts. Section 2.1 gives a brief introduction to the most relevant theory on beamforming needed for our proposed method. Section 2.2 focuses on the issue of the robustness of beamforming algorithms. The desired directivity pattern is specified in Section 2.3, which is based on a stereo intensity-panning rule related to the auditory processing of the interaural level differences. Section 2.4 introduces an optimal array design to suppress side lobes and, in this way, reduce the influence of the reverberant sound field on the recording of the direct sound. Further, a specific filter design is proposed in Section 2.5, which will be used and evaluated throughout this study. The design is based on a superdirective beamforming algorithm and describes how the directivity pattern that is specified in Section 2.3 can be used for the optimization.

Beamforming
Beamforming describes the process of forming the directivity pattern of several microphones, which are arranged into an array, with signal processing techniques to obtain a specific, frequency-dependent directivity pattern. The directivity pattern b(f, φ) of a linear discrete microphone array consisting of N microphones is calculated as follows [7]:

$$b(f, \varphi) = \sum_{n} w_n(f)\, G_n(f, \varphi) \qquad (1)$$

where the sum runs over all N microphones, φ denotes the angle ranging from −π to π, f the frequency, w_n(f) the frequency-dependent complex weighting filter applied to microphone n and G_n(f, φ) the steering vector denoting the direction- and frequency-dependent transfer function from the sound source to microphone n. Such a microphone array is illustrated in Figure 2.
Figure 2. Microphone array receiving a signal with frequency f and angle of incidence φ. The incoming wavefront is captured by each microphone n, modified with the respective filter w_n and finally summed up to form the directivity pattern b(f, φ).
Assuming a far-field condition and microphones with an omnidirectional directivity pattern, the transfer function is given by: where c is the speed of sound and x_n represents the distance of the n-th microphone to the center of the array [7].
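As an illustration of how the directivity pattern of Equation (1) can be evaluated numerically, the following is a minimal sketch in plain Python. It assumes the usual far-field plane-wave phase term exp(−j2πf x_n cos(φ)/c) for omnidirectional microphones; the sign of the exponent, the speed of sound of 340 m/s, the example positions and all function names are assumptions made for this sketch and are not taken from the original equations.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value for this sketch


def steering_vector(freq_hz, angle_rad, positions_m, c=SPEED_OF_SOUND):
    """Far-field transfer functions G_n(f, phi) for omnidirectional microphones.

    Assumed sign convention: exp(-j * 2*pi * f * x_n * cos(phi) / c).
    """
    return [
        cmath.exp(-2j * math.pi * freq_hz * x * math.cos(angle_rad) / c)
        for x in positions_m
    ]


def directivity(weights, freq_hz, angle_rad, positions_m, c=SPEED_OF_SOUND):
    """Directivity b(f, phi) as the weighted sum over all microphones."""
    g = steering_vector(freq_hz, angle_rad, positions_m, c)
    return sum(w * g_n for w, g_n in zip(weights, g))


# Hypothetical example: five equally spaced microphones, symmetric about x = 0,
# with uniform real weights (a plain sum without any steering delays).
positions = [-0.10, -0.05, 0.0, 0.05, 0.10]
weights = [1.0 / len(positions)] * len(positions)

broadside = directivity(weights, 1000.0, math.pi / 2, positions)  # phi = 90 deg
endfire = directivity(weights, 1000.0, 0.0, positions)            # phi = 0 deg
```

With these unsteered uniform weights the response has unit magnitude at broadside and is attenuated along the array axis. Because the positions and weights are symmetric about the center, the response is real valued up to rounding errors.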
The influence of the directivity patterns of the individual microphones in the array can be taken into account by changing the transfer function G_n. The filter optimization used to match the directivity pattern of the array with a desired one is called beamforming. The look direction of the microphone array is defined as the angle of the main lobe of the desired directivity pattern, which is also called the steering angle.
Some beamforming algorithms are based on an analytic solution for the optimal filter w_n(f), whereas others rely on a numerical approximation. Analytic solutions allow us to set N constraints on the directivity pattern for a finite number of frequencies, as described, for example, in [8]. Since our problem involves a higher number of constraints, we use numerical methods, which can accommodate more constraints to control the directivity pattern.
Equation (1) will be solved numerically, and for this purpose, the frequency range is discretized into P frequencies f_p, p = 0, . . ., P − 1, and the angular range into M angles φ_m, m = 0, . . ., M − 1:

$$b_m(f_p) = \sum_{n} w_n(f_p)\, G_{mn}(f_p) \qquad (3)$$

Equation (3) is reformulated in matrix notation as:

$$\mathbf{b}(f_p) = \mathbf{G}(f_p)\, \mathbf{w}(f_p) \qquad (4)$$

where the directivity pattern b(f_p) is an M × 1 vector, the transfer function G(f_p) an M × N matrix and the filter w(f_p) an N × 1 vector [7]. All bold variables are either vectors or matrices in the remainder of this manuscript.

Robustness and White Noise Gain
One of the problems that beamforming algorithms often have is their lack of robustness. This property is related to a resistance to the presence of spatially white noise and can be impaired by deviations from the specified microphone characteristics and microphone position errors. These imperfections affect the beamformer in a manner similar to a recorded spatially white noise that is amplified. Hence, the White Noise Gain (WNG) is a measure commonly used for quantifying the robustness of a beamformer design. The WNG shows the ability of a beamformer to suppress spatially white noise, because it expresses the gain of the beamformer in the desired look direction relative to the amplification of spatially white noise.
The WNG A(f_p) is defined as follows:

$$A(f_p) = \frac{\left| b_{\mathrm{steer}}(f_p) \right|^2}{\mathbf{w}^{H}(f_p)\, \mathbf{w}(f_p)} \qquad (5)$$

where b_steer(f_p) denotes the value of the directivity pattern in the steering direction [7]. A high value of the WNG, A(f_p) > 1, corresponds to a robust beamforming design, whereas a small value, A(f_p) < 1, effectively corresponds to an amplification of spatially white noise [7]. The maximum possible value of the WNG is equal to the number of microphones used:

$$A_{\max}(f_p) = N \qquad (6)$$

which corresponds to a uniform filter [7]:
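The statement that the WNG cannot exceed the number of microphones can be checked with a few lines of Python. This is a minimal sketch that assumes the common definition of the WNG as the squared magnitude of the response in the steering direction divided by the sum of the squared filter magnitudes; the function name and the example filters are hypothetical.

```python
def white_noise_gain(weights, response_in_steer_direction):
    """WNG = |b_steer|^2 / sum_n |w_n|^2 (assumed standard definition)."""
    noise_power = sum(abs(w) ** 2 for w in weights)
    return abs(response_in_steer_direction) ** 2 / noise_power


n_mics = 9

# Uniform filter: every microphone is weighted with 1/N. For a look direction
# in which all transfer functions equal 1, the response is simply sum(w).
uniform = [1.0 / n_mics] * n_mics
wng_uniform = white_noise_gain(uniform, sum(uniform))

# A filter that uses only one microphone has no noise averaging at all.
single = [1.0] + [0.0] * (n_mics - 1)
wng_single = white_noise_gain(single, sum(single))
```

Under these assumptions, the uniform filter reaches the upper limit N = 9, while the single-microphone filter yields a WNG of 1.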

Desired Directivity Pattern
The playback of the recorded signals should be in a stereophonic configuration, as mentioned in Section 1 and illustrated in Figure 3a. The playback approach proposed by Grosse and van de Par [3] uses two loudspeakers for the direct sound reproduction with a typical base angle of φ_base = 60° relative to the listener's position [9]. There are several approaches to shift a phantom source from one loudspeaker to the other, utilizing phase differences Δphase = phase_1 − phase_2 and/or level differences (amplitude panning) ΔLevel = Level_1 − Level_2 applied to the two loudspeaker signals.
Based on this playback configuration, the recording configuration presented in this paper consists of two crossed end-fire microphone arrays with a 60° opening angle, sharing one center microphone and using omnidirectional microphones, as illustrated in Figure 3b. The microphone positions in this figure are only a sketch; the absolute positions are given in Section 3. The phantom-source shifting approaches of the playback configuration can be used to formulate the correct phase and/or level differences between the two arrays. In this way, the perceived location of the sound source in the playback situation is identical to the one in the recording, provided that the distribution of recorded sound sources does not span an angle of more than 60°. Although not evaluated here, in principle, a different opening angle could be used for the microphone arrays, thus effectively compressing or expanding the reproduced sound stage. We restrict our proposed method to level differences only, and for this reason, the desired directivity pattern b̂ is purely real valued. With this desired directivity pattern, the phase of the directivity pattern is mainly controlled by the array design, which will be explained in Section 2.4.
In this paper, the phantom source shifting approach of amplitude panning is used for formulating the desired directivity patterns of Array 1, b̂_array1, and Array 2, b̂_array2 [9]: The angle area φ_δ between both arrays is defined by: with the constant φ_b = φ_base = 60°. The derivation of the desired directivity patterns according to [9] gives two possible recording room assumptions: an anechoic chamber or a real room. The latter is chosen for Equation (8), since the microphone array configuration will be used in real rooms, such as concert halls.
The desired directivity pattern of one array is the mirrored version of that of the other array. This symmetry of the recording configuration makes it possible to formulate one desired directivity pattern, which is the same for both arrays. The following parts of the desired directivity pattern, the first, b̂_beam, valid for the beam area and the second, b̂_steer, valid for the steering angle, consider a microphone array aligned on the 0° axis, corresponding to the steering angle φ_steer = 0°: In the following subsections, an optimal array design in terms of optimal microphone positions and an optimal filter design is proposed to achieve the desired directivity pattern.

Array Design
The positions of the microphones have an influence both on the filter w_n(f_p) and the transfer function G_mn(f_p), and thus, on the directivity pattern itself. The optimal microphone positions selected for this paper maximize the spatial aliasing frequency and, at the same time, minimize the frequency from which beamforming is effectively possible. The spatial aliasing frequency f_al is the lowest frequency at which aliasing effects occur; these are caused by a spatial undersampling of sound waves at high frequencies by the array. The aliasing leads to side lobes with the same amplitude as the main lobe. The spatial aliasing frequency of an array with linear microphone spacing is usually given in the literature as:

$$f_{\mathrm{al}} = \frac{c}{2\, \Delta x} \qquad (11)$$

with Δx as the spacing between the microphones [10].
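The relation between microphone spacing and spatial aliasing frequency can be made concrete with a short calculation. The sketch below assumes the common half-wavelength rule f_al = c / (2 Δx) and a speed of sound of 340 m/s; both are assumptions of this example, chosen because they reproduce the value of roughly 17,000 Hz quoted for a spacing of 0.01 m in Section 3. The function name is hypothetical.

```python
def spatial_aliasing_frequency(spacing_m, c=340.0):
    """Lowest frequency with spatial aliasing for a uniform spacing (assumed c/(2*dx))."""
    if spacing_m <= 0:
        raise ValueError("microphone spacing must be positive")
    return c / (2.0 * spacing_m)


# Smallest spacing used at the center of the proposed array.
f_al_center = spatial_aliasing_frequency(0.01)

# Halving the spacing doubles the aliasing frequency.
f_al_half = spatial_aliasing_frequency(0.005)
```

The inverse proportionality illustrates why a small spacing at the center of the array is needed to keep aliasing out of the audible frequency range.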
A small microphone spacing sets an upper limit to the spatial aliasing frequency. In contrast, a large microphone spacing sets a lower limit to the frequency from which beamforming is effectively possible. In order to have good directional properties of the microphone array across a wide frequency range, an irregularly-spaced microphone array is used in which both kinds of spacing occur. A linear, logarithmically-spaced array that is symmetrical about the reference microphone (n = 0) is used in this paper. Consequently, the number of microphones N has to be odd (N ∈ N_U).
The symmetry around one central microphone ensures a purely real directivity. The microphone positions are calculated as follows [11]: with: where Length is the total length of the array. The array parameter l_spread ∈ R_{>0} is a free variable describing the ratio between the spacing of the microphones at the extremities of the array and the spacing of the microphones at the center of the array. Linear microphone spacings are achieved with l_spread = 1. If l_spread < 1, the spacing of the microphones at the extremities of the array is smaller than the one at the center of the array. In the case of l_spread > 1, it is the opposite.

Filter Design
In this section, an optimal filter design is proposed to fit the directivity pattern of the array, whose design was specified in Section 2.4, to the desired directivity pattern specified in Section 2.3. The following filter design is based on numerical convex optimization and has the advantage that only one global minimum exists. In general, this end-fire design can also be used with different desired directivity patterns and array designs. In Section 3, we indicate the ideal values of the constants for the desired directivity pattern and array design proposed in this study.
The aim of this algorithm is to minimize the quadratic error error_m between the directivity pattern obtained by a microphone array, b_m(f_p), and a desired frequency-independent directivity pattern b̂_m [7]:

$$\mathrm{error}_m(f_p) = \left| b_m(f_p) - \hat{b}_m \right|^2 \qquad (14)$$

This minimization task will be subjected to additional constraints, and therefore, the beamformer will be termed the Constrained Least-Squares Beamformer (CLSB).
In the following subsections, the main minimization task and the constraints used will be explained, paying particular attention to the WNG and the different spatial areas. These areas are shown in Figure 4.
Additionally, this optimization process is placed within an optimization loop in order to optimize several important constants. This optimization procedure will be explained in the last subsection of this section.
Figure 4. Spatial areas around the steering angle φ_steer of the end-fire microphone array.

White Noise Gain
Such a convex optimization procedure allows including a frequency-dependent lower bound γ(f_p) for the WNG when optimizing the filters w_n(f_p) [7]:

$$A(f_p) \geq \gamma(f_p) \qquad (15)$$

This constraint has a direct influence on the robustness and on how well the desired directivity pattern can be achieved. A high value for the lower bound reduces the accuracy of forming the directivity pattern because the filter is too restricted by this constraint, whereas a low value leads to a non-robust filter. In Section 3, an optimal value for this lower bound will be discussed.

Steering Angle
In the direction of the steering angle φ_steer, representing the direction of the main lobe of the microphone array, the directivity pattern obtained by the array is constrained to the value of the desired directivity pattern [7]:

$$b_{\mathrm{steer}}(f_p) = \hat{b}_{\mathrm{steer}} \qquad (16)$$

In this way, the directivity pattern is normalized to b̂_steer. The steering angle is limited to the array axis, since the goal is an end-fire array.

Beam Area
The area around the steering angle is the beam area, which defines the main lobe of the directivity pattern: φ_steer−1 and φ_steer+1 indicate one discrete angle before and after the steering angle, respectively. The constant φ_b can be chosen freely and defines the width of the beam area. To fit the directivity pattern to the desired one, an angle-dependent upper bound ε_beam is set to the error (cf. Equation (14)) in this area:

$$\mathrm{abs}\!\left( \mathbf{b}_{\mathrm{beam}}(f_p) - \hat{\mathbf{b}}_{\mathrm{beam}} \right) \leq \boldsymbol{\varepsilon}_{\mathrm{beam}} \qquad (18)$$

where abs() denotes the absolute value of every entry of the vector argument. In this case, ε_beam is a column vector with as many entries as the directivity pattern in the beam area.

Unconstrained Area
An angle area without any constraints is defined to avoid an effective discontinuity in the intermediate zone between the beam and the stop area, which would have a negative impact on the optimized solution: The constant φ_u can be chosen freely and defines the width of the unconstrained area.

Stop Area
The remaining area is called the stop area: The main optimization task is applied to this area. In the context of this work, the sound from this direction can be assumed to be mainly reverberant sound that does not belong to the direct sound and is therefore undesired. For this reason, the desired directivity pattern in this area is set to zero to suppress sound coming from this area as much as possible [7]:

$$\min_{\mathbf{w}(f_p)} \left\| \mathbf{b}_{\mathrm{stop}}(f_p) - \hat{\mathbf{b}}_{\mathrm{stop}} \right\|_2^2 \quad \text{with} \quad \hat{\mathbf{b}}_{\mathrm{stop}} = \mathbf{0} \qquad (21)$$

In addition to this optimization, an upper bound ε_stop is set to the uniform norm of the directivity pattern:

$$\left\| \mathbf{b}_{\mathrm{stop}}(f_p) \right\|_{\infty} \leq \varepsilon_{\mathrm{stop}} \qquad (22)$$

Because of the uniform norm, this upper bound is not angle-dependent, but it is restricted to the stop area. It will play an important role in the following loop design.

Loop Design
Choosing the correct upper bound for the beam area is difficult. On the one hand, a low upper bound for the beam area leads to a good fit in this area (low error_beam values), but to undesired side lobes in the stop area (high error_stop values). Consequently, the direct sound will be recorded correctly, but is mixed with the undesired reverberant sound field, which should ideally be suppressed. On the other hand, a high upper bound for the beam area leads to the opposite: a bad fit in the beam area (high error_beam values), but low undesired side lobes (low error_stop values). The following loop design finds a frequency-dependent optimal upper bound for the beam area, which is a compromise between a good fit in the beam area and only small side lobes in the stop area.
As a first step in the loop design, the upper bound of the beam area is initialized in matrix notation: The rows cover the beam area, whereas the columns cover the different iterations of the following loops with k as the counter, where k = K indicates the last iteration. The upper bound starts in the first iteration with ε_beam^(k=1) = 0 and continues linearly spaced with step size α. The step size is designed in such a way that the maximum value of the upper bound of the beam area, b̂_steer − b̂(φ_steer ± φ_b), is reached in K steps overall. Either b̂(φ_steer − φ_b) or b̂(φ_steer + φ_b) can be chosen to calculate α, since they are equal according to the symmetry of the desired directivity pattern. The upper bound then ends with the difference between b̂_steer and b̂_beam at the row-specific angle. If this difference is reached before the last iteration (k < K), this value is kept until the last iteration is reached. This will be the case for every row, except the first and the last one. This procedure ensures that b̂_steer stays the maximum value of the directivity pattern.
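In contrast to the upper bound of the beam area, the bound of the stop area is initialized as a vector, since there is no angle dependency:

$$\boldsymbol{\varepsilon}_{\mathrm{stop}} = \left[ \varepsilon_{\mathrm{stop}}^{l=1}, \ldots, \varepsilon_{\mathrm{stop}}^{l=L} \right], \qquad \varepsilon_{\mathrm{stop}}^{l} = b_{\mathrm{stop}}^{\mathrm{first}}\, \hat{b}_{\mathrm{steer}} + (l - 1)\, \frac{\hat{b}_{\mathrm{steer}} - b_{\mathrm{stop}}^{\mathrm{first}}\, \hat{b}_{\mathrm{steer}}}{L - 1} \qquad (24)$$

The entries with the counter l, where l = L indicates the last iteration, correspond to the iterations of the following loops and are linearly spaced. The constant b_stop^first controls the maximum allowed value of the directivity pattern in the stop area for the first iteration.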
The loop design itself can be seen in Figure 5 and is repeated for every frequency f_p, where the constants K_temp and K_step can be chosen freely so that K/K_temp ∈ N and K_temp/K_step ∈ N, respectively. These two constants regulate the part of the upper bound of the beam area that is used in the looped optimization process. Loop 1 repeats the optimization with the first part of the upper bound of the beam area (from ε_beam^(k=1) to ε_beam^(k=K_temp), with K_temp ≤ K) until Equation (22) with ε_stop^(l=1) is true. A result of the optimization that fulfills Equation (22) is denoted as valid. If no valid result is found, Loop 2 repeats Loop 1 with different upper bounds of the stop area (from ε_stop^(l=2) to ε_stop^(l=L)). If still no valid result is found, Loop 3 increases K_temp by the step width K_step. The upper bounds for which the loop design finds a valid solution are denoted as optimal, ε_beam^opt and ε_stop^opt. The filter w that corresponds to these upper bounds is also denoted as optimal, w^opt. In the case that K_step would increase K_temp beyond K (K_temp + K_step > K), the last calculated result of the optimization (k = K) is taken as a valid solution.
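The three nested loops can be sketched in a few lines of Python. This is only one possible reading of the description above, not the authors' implementation: the callback `solve` is a hypothetical stand-in for the constrained convex optimization, and it is assumed that Loop 1 rescans the beam bounds from the first one whenever Loop 2 or Loop 3 advances. In the toy example the bounds are scalars in integer percent, whereas the beam bound in the text is an angle-dependent vector.

```python
def loop_design(solve, beam_bounds, stop_bounds, k_temp, k_step):
    """Return (filter, beam bound, stop bound) for the first valid solution.

    solve(eps_beam, eps_stop) is a hypothetical callback that runs the
    constrained optimization and returns (filter, is_valid).
    """
    k_total = len(beam_bounds)
    while True:
        for eps_stop in stop_bounds:               # Loop 2: relax the stop bound
            for eps_beam in beam_bounds[:k_temp]:  # Loop 1: scan the beam bounds
                filt, is_valid = solve(eps_beam, eps_stop)
                if is_valid:
                    return filt, eps_beam, eps_stop
        if k_temp + k_step > k_total:
            # Nothing valid was found: accept the result for the last bounds.
            filt, _ = solve(beam_bounds[-1], stop_bounds[-1])
            return filt, beam_bounds[-1], stop_bounds[-1]
        k_temp += k_step                           # Loop 3: widen the scan


def toy_solve(eps_beam, eps_stop):
    """Toy model: a looser beam bound lowers the side lobes in the stop area."""
    side_lobe = 60 - 2 * eps_beam
    return "filter@%d" % eps_beam, side_lobe <= eps_stop


beam_bounds = list(range(30))           # K = 30 candidate beam bounds
stop_bounds = list(range(20, 101, 10))  # L = 9 candidate stop bounds
result = loop_design(toy_solve, beam_bounds, stop_bounds, k_temp=10, k_step=5)
```

In this toy setting, the search returns the tightest beam bound that is valid for the lowest feasible stop bound, which mirrors the intended compromise between a good fit in the beam area and small side lobes.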

Setup
The following setup is used for the numerical simulations, whose results are described in Sections 4 and 5. The angular range is discretized into M = 360 linearly-spaced angles {φ_0 = 0°, φ_1 = 1°, . . ., φ_359 = 359°}. The frequency range extends from f_(p=0) = 0 Hz to f_(p=256) = 24 kHz, generated at a sampling rate of f_s = 48 kHz using a filter length of 512 samples. This results in P = 257 linearly-spaced frequency bins. This frequency range covers the spectral content of music [12] that is to be recorded by these microphone arrays. To obtain impulse responses of the filters, the complex spectrum was mirrored, conjugated and transformed to the time domain via an inverse fast Fourier transform (ifft).
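The frequency grid and the conversion from a one-sided spectrum to a real impulse response can be illustrated as follows. This is a minimal sketch in plain Python that uses a naive inverse discrete Fourier transform instead of an optimized ifft; the function names and the short example spectrum are hypothetical.

```python
import cmath
import math


def frequency_bins(fs_hz, filter_length):
    """Linearly spaced bins from 0 Hz up to the Nyquist frequency."""
    n_bins = filter_length // 2 + 1
    return [p * fs_hz / filter_length for p in range(n_bins)]


def impulse_response(half_spectrum):
    """Mirror and conjugate a one-sided spectrum, then apply an inverse DFT."""
    # Bins 1 .. N/2-1 are appended in reverse order as complex conjugates.
    mirrored = [z.conjugate() for z in half_spectrum[-2:0:-1]]
    full = list(half_spectrum) + mirrored
    n = len(full)
    return [
        sum(full[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


bins = frequency_bins(48000.0, 512)

# Short hypothetical one-sided spectrum (5 bins, i.e. filter length 8).
# The 0 Hz and Nyquist bins are real, as required for a real impulse response.
half = [1.0 + 0j, 0.5 - 0.5j, 0.25j, 0.1 + 0j, 0.2 + 0j]
h = impulse_response(half)
```

For the setup above this yields 257 bins with a spacing of 93.75 Hz, so that the third bin lies at 187.5 Hz. The imaginary parts of the resulting impulse response vanish up to rounding errors.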
The microphone array consists of N = 9 omnidirectional microphones and has a total length of Length = 1 m. The array design is done with l_spread ≈ 35, so that the smallest microphone spacing s in the center of the array is s = 0.01 m. With this spacing, the spatial aliasing frequency is maximized to f_al ≈ 17,000 Hz. For practical reasons, the limit is set to s = 0.01 m to ensure enough space for the microphones. The absolute microphone positions are set as follows (displayed in millimeter precision): After having specified the microphone positions, the convex functions of the CLSB, shown in Section 2.5, are solved utilizing CVX, a package for specifying and solving convex programs [13,14]. Parts of these convex functions are the WNG constraint and the loop design.
For the WNG constraint, the lower bound γ for the WNG A(f_p) is set up as follows:

$$\gamma(f_p) = \begin{cases} 5 & \text{for } f_p = 0\ \text{Hz} \\ \text{CSI} & \text{for } 0\ \text{Hz} < f_p < 187.5\ \text{Hz} \\ 1 & \text{for } f_p \geq 187.5\ \text{Hz} \end{cases}$$

The lower bound starts with γ(f_p = 0 Hz) = 5 and ends with γ(f_p ≥ 187.5 Hz) = 1. In the intermediate zone, a Cubic Spline Interpolation (CSI) connects both points. The CSI in the intermediate zone avoids rapid changes of the directivity pattern across frequency below 187.5 Hz. In the high frequency range (f_p ≥ 187.5 Hz), a lower bound of γ = 1 ensures a robust beamforming design.
For the loop design, the constants are set up as follows: The constants φ_b and φ_steer, as well as the parts of the desired directivity pattern b̂_beam and b̂_steer, are set up according to Section 2.3.
The values of the constants K, K_temp and K_step are chosen in such a way that Loop 1 scans the beam area from ε_beam^(k=1) = 0 in steps of α = 0.01 till: If necessary, Loop 3 increases the value of the upper bound of the beam area according to the value of the constant K_step (cf. Section 2.5).
An increase of the value of the constant K leads to an improvement in the beam area (lower error_beam values), because the step size α is smaller. The loop design then checks the validity (cf. Section 2.5) of more candidate directivity patterns with small error_beam values. To find a valid solution, however, Loop 2 has to increase ε_stop further than before, which leads to a worsening in the stop area (higher error_stop values). A decrease of the value of the constant K consequently leads to the opposite effect.
An increase of the values of the constants K_temp and K_step leads to a worsening in the beam area (higher error_beam values), because the first end point of Loop 1, as well as all of the subsequent ones, ε_beam^(k=K_temp+K_step+K_step+…), is now higher. More candidate directivity patterns with high error_beam values are checked by the loop design, and Loop 2 does not have to increase ε_stop as much as before, because these directivity patterns are in general more likely to be valid. This then leads to an improvement in the stop area (lower error_stop values). A decrease of the values of the constants K_temp and K_step consequently leads to the opposite effect.
The values of the constants L and b_stop^first are chosen in such a way that Loop 2 scans the stop area from ε_stop^(l=1) = 0.2 in steps of (b̂_steer − b_stop^first · b̂_steer)/(L − 1) = 0.1 till ε_stop^(l=L) = b̂_steer = 1. An increase of the value of the constant b_stop^first together with a decrease of the value of the constant L, preserving the step width of 0.1 mentioned above, leads to a worsening in the stop area. The start point of Loop 2 is now higher, allowing higher error_stop values from the beginning. It is now easier for Loop 1 to find a valid solution, which leads to an improvement in the beam area. A decrease of the value of the constant b_stop^first and a corresponding increase of the value of the constant L lead to the opposite effect.
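The schedule of stop-area bounds scanned by Loop 2 follows directly from these numbers. The sketch below generates the linearly spaced bounds from the start value, the end value and the number of steps; note that a start of 0.2, an end of 1 and a step width of 0.1 imply L = 9, which is inferred here rather than quoted. The function and variable names are hypothetical.

```python
def stop_bound_schedule(b_steer, b_first_stop, n_steps):
    """Linearly spaced stop-area bounds from b_first_stop * b_steer up to b_steer."""
    start = b_first_stop * b_steer
    step = (b_steer - start) / (n_steps - 1)
    return [start + index * step for index in range(n_steps)]


# Values taken from the text: first bound 0.2, last bound 1, step width 0.1.
schedule = stop_bound_schedule(b_steer=1.0, b_first_stop=0.2, n_steps=9)
step_width = schedule[1] - schedule[0]
```

Raising `b_first_stop` while lowering `n_steps` so that the step width stays at 0.1 reproduces the trade-off described above: the scan starts from a more permissive bound.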
Overall, a variation of the values of the constants K, K_temp, K_step, L and b_stop^first changes the balance with which the constraints are fulfilled in the beam and the stop area. Optimal values have to be found separately for every desired directivity pattern and intended purpose of the microphone array. A variation of the value of the constant φ_u does not significantly change the results in terms of the error in the beam and the stop area. Nevertheless, the value should not be chosen too large, to avoid undesired results (very large differences between the obtained and the desired directivity pattern), since there is no control over the directivity pattern in the unconstrained area. The maximum value of φ_u up to which no undesired results occur depends in a complex manner on the number of microphones used and on the desired directivity pattern.
With the setup shown in Equation (26), we achieved the best results in fitting the directivity pattern to the desired one. Different initializations of the constants are also possible, as mentioned before (a detailed analysis of how varying the values of the constants given in Equation (26) affects the results is beyond the scope of this article). Our results are discussed in the following Sections 4 and 5.

Objective Evaluation
The following section is divided into four parts. In Section 4.1, two array designs are compared to each other to show the improvement of the spatial aliasing of a logarithmically-spaced array over a linearly-spaced one. In Section 4.2, the new stereo system proposed in this study is compared to the state-of-the-art ones, which utilize two microphones. In Section 4.3, the WNG constraint and the frequency response are analyzed. Finally, in Section 4.4, the angular constraints, as well as the phase of the directivity pattern, are investigated.

Directivity Index Comparison
The directivity pattern of the logarithmically-spaced array (l_spread ≈ 35, s = 0.01 m) is more directive for high frequencies than the one of a linearly-spaced array (l_spread = 1, s = 0.125 m) having the same total length of Length = 1 m. Less reverberant sound is recorded by the first type of array than by the latter one. As a measure, we choose the directivity index DI, which is the logarithm of the directivity D [15]: In fact, Figure 6 shows that the linearly-spaced array has lower DI values for high frequencies (f_p > 1200 Hz) than the logarithmically-spaced one. This is caused by aliasing effects, as the aliasing frequency for the linearly-spaced array is f_al ≈ 1460 Hz. There is a big drop of the DI values (DI < 7 dB) for the logarithmically-spaced array for very high frequencies (f_p > 10,500 Hz), which is also caused by aliasing effects. The lowest values of the DI for the logarithmically-spaced array are located around the aliasing frequency f_al(Δx = s) ≈ 17,000 Hz.
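To give a feeling for the directivity index, the following sketch evaluates it for two simple patterns. It assumes that the directivity D is the squared response in the look direction divided by the mean squared response over equally spaced angles in the horizontal plane, and that DI = 10 log10(D); the exact definition used in [15] may differ (for example, by averaging over the full sphere), so the function below is an illustrative assumption.

```python
import math


def directivity_index_db(pattern, steer_index):
    """DI in dB for a pattern sampled at equally spaced angles (in-plane average)."""
    mean_power = sum(abs(b) ** 2 for b in pattern) / len(pattern)
    directivity = abs(pattern[steer_index]) ** 2 / mean_power
    return 10.0 * math.log10(directivity)


angles_deg = range(360)

# An omnidirectional pattern has no directional gain at all.
omni = [1.0 for _ in angles_deg]
di_omni = directivity_index_db(omni, steer_index=0)

# A cardioid pattern looking towards 0 degrees.
cardioid = [0.5 * (1.0 + math.cos(math.radians(a))) for a in angles_deg]
di_cardioid = directivity_index_db(cardioid, steer_index=0)
```

Under this in-plane definition, the omnidirectional pattern yields 0 dB and the cardioid about 4.3 dB. A more directive pattern picks up a smaller share of a diffuse reverberant field and therefore yields a higher DI.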

Comparison Stereo Systems
The necessary phase and/or level differences for a stereophonic recording, as mentioned in Section 2.3, can also be obtained with only two microphones. Different angles and distances between these two microphones, as well as different microphone directivity patterns, are possible, as described, for example, by the A-B or the X-Y technique [12]. A unified theory of these two-microphone systems for stereophonic sound recording can be found in [6].
Assuming no phase differences, this theory states that a level difference of ΔLevel = ±15 dB determines the left or right lateral shift towards the loudspeakers of a phantom sound source in the playback situation. This level difference is achieved in the recording situation with different angles between two microphones with specific directivity patterns. The angle covering this level difference is called the recording angle φ_rec. If φ_rec > φ_base, the recorded sound scene is compressed in the playback configuration, whereas if φ_rec < φ_base, the recorded sound scene is expanded [6]. Therefore, we can assume that if φ_rec = φ_base, the recorded spatial properties are the same after playback. Table 1 shows the possible microphone directivities and base angles between the microphone pairs.
The microphone array stereo system described in this study records less reverberant sound than these state-of-the-art two-microphone stereo systems. As a measure, we choose a modified directivity index DI_mod, which is the logarithm of a modified directivity D_mod based on the directivity introduced in Section 4.1. Here, b_mic1(φ_m) and b_mic2(φ_m) are the directivity patterns of the first and the second microphone, respectively. The modified directivity index includes the sum of the directivity patterns of the two microphones. It therefore accounts for the angle between these two directivity patterns, which, in addition to the directivity pattern itself, determines the percentage of recorded reverberant sound. As shown in Table 1, the proposed microphone array stereo system is, in fact, more directive than the two-microphone stereo systems, even when the angle between the two microphone arrays is taken into account.
Table 1. The modified directivity index DI_mod of the state-of-the-art two-microphone stereo systems and the microphone array stereo system described in this study. For the latter, the desired directivity patterns are used. Only stereo systems with φ_rec = φ_base are displayed. This angle constraint avoids angular compression or angular expansion in the playback situation.

WNG and Frequency Response
The algorithm successfully fits the WNG A(f_p) to the lower bound γ(f_p) specified in Section 3, as shown in Figure 7a. This ensures a robust beamforming design. For high frequencies f_p ≥ 7031 Hz, the algorithm even finds WNG values above the lower bound.
Figure 7b shows the frequency response of both arrays for the configuration shown in Figure 3. The responses of both arrays were calculated for a sound source at φ = 30° (resulting in a sound source perceived at the location of the left loudspeaker; solid and dashed lines) and at φ = 0° (resulting in a phantom source between both loudspeakers; dotted and dash-dotted lines). For φ = 0°, the responses of both arrays are very similar in level and show only minor fluctuations of approximately ±2 dB above 1000 Hz. Below 1000 Hz, there is a boost of approximately 3 dB, which might be attributed to a violation of a constraint at low frequencies. When the sound source is at φ = 30°, Array 1 (on axis) shows a flat frequency response with minor fluctuations of approximately 1 dB across frequency. Array 2 shows a considerably lower level, but larger fluctuations. It can be assumed that these fluctuations will not be perceivable because the location of the sound source will be determined by Array 1.
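To illustrate what such a WNG constraint checks, the following sketch evaluates the white noise gain at a single frequency using the common definition WNG = |w^H d|² / (w^H w). This definition, the phase lags, the eight-microphone array and the bound `gamma_db` are assumptions made for illustration; the study's own formulation is given in its earlier sections.

```python
import numpy as np

def white_noise_gain(weights, steering):
    """WNG = |w^H d|^2 / (w^H w) for filter weights w and steering vector d."""
    w = np.asarray(weights, dtype=complex)
    d = np.asarray(steering, dtype=complex)
    # np.vdot conjugates its first argument, so vdot(w, d) equals w^H d.
    return float(np.abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, w)))

def wng_db(weights, steering):
    return 10.0 * np.log10(white_noise_gain(weights, steering))

# Delay-and-sum weights for a hypothetical 8-microphone array at one frequency.
n_mics = 8
phases = np.linspace(0.0, 1.4, n_mics)  # hypothetical phase lags in rad
d = np.exp(-1j * phases)                # steering vector of the look direction
w_das = d / n_mics                      # distortionless weights: w^H d = 1

gamma_db = 0.0                          # hypothetical lower bound in dB
meets_bound = wng_db(w_das, d) >= gamma_db
```

Delay-and-sum weights reach the maximum WNG, which equals the number of microphones. Superdirective weights trade part of this robustness for higher directivity, which is why a lower bound is imposed.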

Beam and Stop Area Constraints
The results of the loop design mentioned in Section 2.5 are shown in Figure 8. This loop design finds a compromise between a good fit in the beam area and low directivity pattern values in the stop area. For low frequencies f_p < 187.5 Hz, the directivity pattern is nearly omnidirectional (‖error_stop‖_∞ > 0.2 and ‖error_beam‖_∞ > 0.1), so that Loop 3 has to raise the upper bound of the beam area above 0.1. For higher frequencies f_p ≥ 187.5 Hz, there is a good fit in the beam area (‖error_beam‖_∞ ≤ 0.1), so that Loop 1 and Loop 2 find the ideal upper bounds for the beam and the stop area. Overall, the best result is found in the frequency range 281.3 Hz ≤ f_p ≤ 1969 Hz: a good fit in the beam area combined with low directivity pattern values in the stop area (‖error_stop‖_∞ ≤ 0.2). At high frequencies (f_p ≥ 16,690 Hz), Figure 8b shows aliasing effects (‖error_stop‖_∞ = 1), which are expected, since the aliasing frequency of the logarithmically-spaced array is f_al(Δx = s) ≈ 17,000 Hz.
Figure 9 shows a polar plot of the desired directivity pattern together with the absolute value of the obtained directivity patterns at the frequencies f_p = 250 Hz, f_p = 1000 Hz, f_p = 4000 Hz and f_p = 8000 Hz. For all frequencies, there is a good fit (a small difference between the desired and the obtained directivity pattern) in the beam area, as already quantified in Figure 8a. Comparing the side-lobe levels at the different frequencies shows the following: the side-lobe level decreases from f_p = 250 Hz to f_p = 1000 Hz; there is no large difference in side-lobe level between f_p = 1000 Hz and f_p = 4000 Hz; and the side-lobe level increases from f_p = 4000 Hz to f_p = 8000 Hz. This behavior is quantified in Figure 8b. Figure 10a allows for a more detailed analysis, as it shows the absolute value of the difference between the obtained directivity pattern and the desired one over the whole angular range, |error(f_p, φ_m)|. The omnidirectional behavior of the directivity pattern up to f_p = 187.5 Hz can also be seen there. For higher frequencies, side lobes appear at φ_m = ±180° and move with increasing frequency towards the beam area −60° ≤ φ_m ≤ 60°. Aliasing effects can be seen in Figure 10a, as in Figure 8b. In addition to the absolute value of the directivity pattern, the phase arg(b(f_p, φ_m)) is shown in Figure 10b.
The directivity pattern is purely real: the phase takes only the three values arg(b) ∈ {−π, 0, π}, as mentioned in Section 2.3. In the beam area, the phase is in fact always arg(b) = 0, so that there are no phase differences between the two arrays in the recording configuration described in Section 2.3.
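The aliasing frequency quoted for the smallest microphone spacing can be checked with a short calculation. The sketch below assumes the half-wavelength criterion f_al = c / (2 Δx) and a speed of sound of c = 340 m/s. Both are assumptions, since neither the exact criterion nor the value of c is stated in this section.

```python
def spatial_aliasing_frequency(spacing_m, speed_of_sound=340.0):
    """Half-wavelength criterion (assumed): f_al = c / (2 * spacing)."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return speed_of_sound / (2.0 * spacing_m)

# Smallest spacing of the logarithmically-spaced array, s = 0.01 m.
f_al = spatial_aliasing_frequency(0.01)
print(f"f_al = {f_al:.0f} Hz")
```

With these assumed values, the result for s = 0.01 m agrees with the approximately 17,000 Hz stated above.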

Subjective Evaluation
In this section, the proposed microphone array is subjectively evaluated. For this purpose, a listening experiment was performed, the results of which are reported below.

Subjective Evaluation: Localization Accuracy
To evaluate the localization accuracy of the proposed stereophonic microphone array when simulating spatially-distributed sound sources, subjective data were obtained from listeners in a localization experiment in a real room. The loudspeaker signals were generated from a single sound source by simulating the delays between the sound source and the microphones. The optimized filters w_opt were applied to each microphone signal to obtain the output signals of the left and the right array, which were then played back via the two loudspeakers during the listening experiment. The loudspeaker and array configurations are shown in Figure 3.
The sound sources were placed at virtual locations between −30° and +30° in steps of 5°, resulting in a phantom-source stereo image based on intensity panning between the left and the right loudspeaker. The evaluation took place in a reverberant room with dimensions of (7.5, 7.1, 2.97) m and a reverberation time of T_60 = 0.45 s. The distance between the loudspeakers was 3 m, and the listeners were seated at the position that created a 60° stereo triangle with the loudspeakers (cf. Figure 3). As a source signal, three short pink noise bursts with a total length of 1.1 s were presented to the listeners. The noise covered a frequency range from 100 Hz to f_s/2, covering the spectral content of musical signals. Data were obtained from seven listeners, and the 13 source position angles were presented in random order. For each subject, the experiment comprised one training session and three measurement sessions. The task of the participants was to indicate the perceived direction using indicators placed between the loudspeakers in steps of 5°.
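The signal generation described above can be sketched at a single frequency as follows. The example assumes a far-field (plane-wave) source, microphones placed on the array axis, a speed of sound of 340 m/s and hypothetical microphone positions. Simple delay-and-sum weights stand in for the optimized filters w_opt, which are not reproduced here.

```python
import numpy as np

def far_field_delays(mic_positions_m, source_deg, speed_of_sound=340.0):
    """Arrival-time offsets (s) relative to the array origin for a plane wave.

    Microphones lie on the array axis; source_deg is measured from that axis.
    A microphone closer to the source receives the wave earlier (negative delay).
    """
    x = np.asarray(mic_positions_m, dtype=float)
    return -x * np.cos(np.radians(source_deg)) / speed_of_sound

def array_response(weights, mic_positions_m, source_deg, freq_hz):
    """Filter-and-sum output b = w^H d at one frequency for a unit plane wave."""
    tau = far_field_delays(mic_positions_m, source_deg)
    d = np.exp(-2j * np.pi * freq_hz * tau)  # phase shift per microphone
    return np.vdot(np.asarray(weights, dtype=complex), d)

positions = [0.0, 0.01, 0.03, 0.07]  # hypothetical positions in m
freq = 1000.0                        # evaluation frequency in Hz

# Delay-and-sum weights steered to the array axis (placeholder for w_opt).
d_look = np.exp(-2j * np.pi * freq * far_field_delays(positions, 0.0))
w_das = d_look / len(positions)

on_axis = array_response(w_das, positions, 0.0, freq)
off_axis = array_response(w_das, positions, 90.0, freq)
```

In a full simulation, this evaluation would be repeated for every frequency bin and for both arrays, and the two resulting output signals would feed the left and the right loudspeaker.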

Subjective Evaluation: Results
Figure 11 shows the perceived directions obtained in the subjective evaluation. The dotted line indicates perfect correspondence between the simulated source location and the perceived location. Circles show the average perceived location as a function of the simulated source location. The relationship is approximately linear, indicating a mostly accurate representation of the presented directions. Exceptions can be observed around ±20°, where the presented source is perceived as more lateral than the simulated source location. The maximum localization error of ≈6° can probably be attributed to the target functions that were used to optimize the directivity pattern, which may cause excessive level differences when both arrays are used in combination.

Discussion and Conclusions
In this study, a new approach for intensity stereophonic recording has been investigated. Guided by the playback situation and its auditory requirements, we proposed a setup consisting of two crossed end-fire microphone arrays and a matching desired directivity pattern. The difference between the obtained directivity pattern and the desired one was minimized by a superdirective beamforming algorithm. The algorithm is based on convex numerical optimization and includes a frequency-dependent WNG constraint to ensure a robust beamforming design.
In addition to designing the microphone filters via beamforming algorithms, we found an ideal array design. This design maximizes the spatial aliasing frequency and also takes into account practical issues that arise in a physical realization of the arrays. The physical size of the microphones imposes a particular spacing, which also avoids interference between them.
A comparison between the new stereo system and the state-of-the-art systems, which use two microphones, has shown that the former has the advantage of recording less reverberant sound, as it is more directive in the look direction than the latter. This matches the requirements of the recording method proposed in Grosse and van de Par [3], which requires separate dry and reverberant representations of the audio signal. The reverberant sound field can be taken from single microphone signals.
Future research could develop a method to optimize the directivity patterns of both arrays as one system rather than handling them separately. Furthermore, two additional beams pointing into the diffuse field could be introduced in the optimization to replace the two microphones placed in that field, so that only the array system is needed.
A final assessment of the proposed recording and playback system requires listening tests that investigate the perception of the recording and playback rooms.

Figure 1. Recording and playback configuration with a processing stage in between to maintain the acoustical perception of a recording room. The microphone (C) records the direct sound, which is played later by two conventional loudspeakers, whereas the two microphones (B_l) and (B_r) record the reverberant sound field, which is played later by two dipole loudspeakers. Figure reproduced with permission from [3], Copyright IEEE, 2015.

Figure 3. The stereophonic recording configuration is based on the playback one. Level and phase differences recorded with the two end-fire microphone arrays generate a phantom source between the two loudspeakers in the playback configuration. The signal emitted from Loudspeaker 1 has the level Level_1 and the phase phase_1. The signal emitted from Loudspeaker 2 has the level Level_2 and the phase phase_2. (a) Typical stereophonic playback configuration [9]; (b) proposed stereophonic recording configuration with sketched microphone positions. The absolute microphone positions are shown in Section 3.

Figure 4. Different spatial areas in the directivity pattern optimization problem: the steering angle φ_steer, the beam area φ_beam (indicated by horizontal hash lines), an area without any constraints φ_unconstrained (indicated by crossed hash lines) and the stop area φ_stop (indicated by vertical hash lines).

Figure 5. Loop design to determine the optimal filter, as well as the optimal upper bound for the beam and the stop area.

Figure 6. Directivity index DI(f_p) of a linearly-spaced array (l_spread = 1, s = 0.125 m) (dashed line) and the logarithmically-spaced one (l_spread ≈ 35, s = 0.01 m) (solid line) with the same total length of 1 m.

Figure 7. (a) White Noise Gain (WNG) A(f_p), as well as the lower bound for the WNG γ(f_p), across frequency; (b) frequency responses of both arrays for two sound sources at φ = 30° and φ = 0° according to the configuration illustrated in Figure 3.

Figure 8. The difference between the simulated directivity pattern and the desired one (error) in the beam (a) and the stop (b) area, as well as the corresponding upper bounds of both areas, as a function of frequency.

Figure 9. Polar plot of the desired directivity pattern (grey markers) and the absolute value of the obtained directivity patterns at the frequencies f_p = 250 Hz (solid line), f_p = 1000 Hz (dashed line), f_p = 4000 Hz (dash-dotted line) and f_p = 8000 Hz (dotted line).


Figure 10. The difference between the directivity pattern and the desired one |error(f_p, φ_m)| (a), as well as the phase of the directivity pattern arg(b(f_p, φ_m)) (b).

Figure 11. Mean values of the perceived angle of incidence, with the standard deviation across the seven participants' means. The x-axis represents the simulated angle of incidence φ of the presented noise sources. The dotted line indicates a perfect match between simulated and perceived localization.