Power Response and Modal Decay Estimation of Room Reﬂections from Spherical Microphone Array Measurements Using Eigenbeam Spatial Correlation Model

: Modal decays and modal power distribution in acoustic environments are key factors in deciding the perceptual quality and performance accuracy of audio applications. This paper presents the application of the eigenbeam spatial correlation method in estimating the time-frequency-dependent directional reﬂection powers and modal decay times. The experimental results evaluate the application of the proposed technique for two rooms with distinct environments using their room impulse response (RIR) measurements recorded by a spherical microphone array. The paper discusses the classical concepts behind room mode distribution and the reasons behind their complex behavior in real environments. The time-frequency spectrum of room reﬂections, the dominant reﬂection locations, and the directional decay rates emulate a realistic response with respect to the theoretical expectations. The experimental observations prove that our model is a promising tool in characterizing early and late reﬂections, which will be beneﬁcial in controlling the perceptual factors of room acoustics.


Introduction
In any enclosed acoustic space, the sound received by a listener is the superposition of the direct sound from the source and the reflected sounds from the surrounding surfaces. The numerous reflections termed reverberation cause persistence of sound even after the source ceases, until these reflected waves decay due to absorption by the surrounding surfaces. The intricate sound field generated by these reflected waves provides the sense of acoustic space to the perceived sound. However, severe reverberation can cause spectral distortions and reduce speech intelligibility. The study of reverberation is complicated since it is a product of many factors like sound frequency, room shape, room size, room geometry, source and receiver locations, source and receiver directivity, etc. A comprehensive understanding of the reflection sound field distribution, resonant frequencies, and modal decay rates is necessary to control audible artifacts and achieve desired sound perception quality in room acoustic applications.
Initially, the objective parameters like reverberation time, percentage articulation (PA) [1], decay rates [2], and statistical measures of room impulse responses (RIR) [3] were the only measures of reverberation. However, later studies [4,5] found that these measures vary with the sound frequency and wall surface properties. This necessitated the frequencydependent spatio-temporal analysis of sound fields for accurate characterization of room acoustics. The existing 3D room acoustic parameter estimation methods either depend on predictions based on computational acoustics or derive the parameters directly from real sound field measurements. The room acoustic analysis using prominent computational models like ray/geometrical [6,7], wave/element [8], statistical energy [9], or synthetic RIR [10,11] methods are computationally complex and applicable to limited frequency ranges. The lack of proper consideration of the source and environment factors, frequencydependent wave behavior, and precise reflection methods reduce the estimation accuracy of these computational approaches, especially in highly reverberant environments [12]. Furthermore, the analysis of intermediate frequencies using these computational models is complicated because of the dominant diffraction effects and the influence of both wave and ray acoustic behaviors.
The characterization of real acoustic environments requires 3D acoustic scene analysis using spatial sound field measurements. This led to the development of several microphone arrays designs [13][14][15] and processing methods like sound intensity mapping [16], plane-wave decomposition (PWD) and steered beamforming [17][18][19], sound intensity vector analysis [20], and multi-channel correlation model [21]. Gover et al. used PWD beamforming in [18] to estimate the angular distribution and anisotropy index of the spatial sound field from the RIRs recorded by a spherical microphone array. The recent works in [22][23][24] allow similar analysis in terms of isotropy measures and directional energy decays using Schroeder integration [25] and PWD of directional RIRs. However, these methods require a large number of RIR measurements for an accurate analysis of the room acoustic field. This problem was overcome with the introduction of higher-order spherical harmonic (eigenbeam)-based processing of spherical microphone array measurements [12,[26][27][28], which provided higher spatial resolution for analysis compared to the previous methods. Subsequently, more robust techniques [29][30][31] were developed to achieve efficient parameterization of the spatial sound field using modal decomposition. In [32], the eigenbeam rotational invariance technique (EB-SPRIT) was used to identify room modes and damping parameters from RIRs. In [33,34], Samarasinghe et al. used the spatial correlation of higher-order eigenbeams to estimate the directional characteristics of the reverberant field, and this approach was able to achieve an accurate estimation of direct-to-reverberant energy ratio and dominant reflection directions.
The majority of the existing methods of directional characterization of room reflections derive the parameters from the aggregate sound field formed by the direct and reflected waves. Even though the direct path can be removed from the RIRs, the spatial resolution for directional analysis will be limited by the number of microphones. Moreover, a fine-scale separation of the spatial components of the direct path and reflected path is difficult without the knowledge of the source directivity. Additionally, the lack of incorporation of frequency-dependent surface reflectivities with distinct decay times can cause severe errors in the reflected sound field power distribution estimated by the existing methods [18,24,32]. Hence, a competent room characterization tool should integrate the frequency, time, and spatial dependencies in the formulation of the reflected sound field.
In this paper, we utilize the spatial correlation of higher-order eigenbeams to estimate the directional power response of room reflections by processing the RIR measurements. The proposed technique further facilitates room mode analysis and directional decay rate estimation. In comparison to the previous version of this method in [33,34], we model the reflection power as a function of time, frequency, and direction for comprehending the influence of frequency-dependent wall absorption properties of the room surfaces. This method allows the estimation of the directional features of reflections with higher spatial resolution independent of the direct sound component. The room mode features, directional decay rates and dominant reflection locations generated from the proposed tool can serve many applications like room response equalization, acoustic treatment design, architectural design simulations, room geometry inference, auralization of historic buildings, archaeoacoustics, and other machine hearing technologies.
The remainder of this paper is organized as follows: Section 2 discusses the formulation and implementation procedure of the eigenbeam spatial correlation model for estimating the reflection power response. Section 3 presents the experimental results including the time-frequency spectrum of reflection power, directional decay rates, and dominant reflection directions. Section 4 concludes the paper with a summary of the key findings and mentions the future research plans.

Reflection Power Estimation Using Eigenbeam Spatial Correlation Model
In this section, we present the formulation and synthesis of reflection power as a function of time, frequency, and space in the spherical harmonics domain.

Problem Formulation
Consider a convex room with a single sound source and a spherical microphone array of radius R with Q omnidirectional microphones centered at a location O, as shown in Figure 1. Let the spherical coordinate y o = (r o , θ o , φ o ) denote the sound source location with respect to O. Similarly, the qth microphone element is located at x q = (R, θ q , φ q ) for q ∈ {1, 2, · · · , Q}. In this paper, all the elevation angles are ∈ [0, π] downwards from the Z-axis and the azimuth angles are ∈ [0, 2π) counterclockwise from the X-axis. We treat the room as a linear time-invariant (LTI) acoustic transmission system whose dynamic behavior is represented by the RIRs derived from the spherical microphone array measurements. Let H(x q , y o , t, k) be the room transfer function (RTF), between the source at y o and the microphone element at x q , obtained from the short-time Fourier transform (STFT) of the RIR. Here, t is the STFT temporal frame index and k = 2π f /c is the wavenumber with f and c representing the frequency and speed of sound, respectively. Since the incident sound field at the receiver contains the direct sound and the reflections, we can decompose the RTF H(x q , y o , t, k) as where H d (x q , y o , t, k) and H r (x q , y o , t, k) are the direct path and reflected path components, respectively. Assuming that the distance between y o and x q is significantly larger than the aperture size of the microphone array, we can represent H d (x q , y o , t, k) and H r (x q , y o , t, k) as a composition of plane waves in the spatial domain as where G D (t, k|y o ) is the direct path gain with respect to O,ŷ o is the unit vector along the source direction, i = √ −1, G R (t, k,ŷ|y o ) is the gain of the reflected plane wave arriving from the directionŷ = (1, θ, φ), and ŷ dŷ = 2π 0 π the reflection gain G R as a non-isotropic directional distribution function that varies with frequency and time to comprehend a real room with inhomogeneous surfaces that have frequency-dependent wall impedance and damping coefficients.
By examining E H d H * d based on (2), where E{·} represents the statistical expectation operator, we can express the direct path power as where |·| denotes the absolute value. Similarly, by examining E H r H * r based on (3), we can write the power of the reflected sound field component incoming from the directionŷ as We aim to estimate the reflection power P R (t, k,ŷ|y o ) from the RTFs H(x q , y o , t, k) ∀ q obtained using a spherical microphone array. Since P R (t, k,ŷ|y o ) is a spherical function, we can simplify its estimation using the spherical harmonic decomposition [35] given by where γ vu (t, k|y o ) are the reflection power coefficients and Y vu (·) is the spherical harmonic function of vth order and uth mode. Thus, we can calculate the reflection power for any incoming direction and time-frequency bin once we estimate γ vu (t, k|y o ) coefficients.

Methodology
For determining the γ vu (t, k|y o ) coefficients, we utilize the spatial correlation of higherorder spherical harmonic (eigenbeam) coefficients of the incident sound field. The estimation of the reflection power response involves two main steps: Step 1: Estimating spherical harmonic coefficients of the incident sound field In this work, since we are interested in characterizing the room response independent of the source power spectrum, we assume a sound source emitting an impulse signal and treat H(x q , y o , t, k) as the incident sound field on the spherical microphone array. For deducing the higher-order spherical harmonic coefficients of the incident sound field, we represent H(x q , y o , t, k) as the spherical harmonic decomposition of Helmholtz wave equation solution to the interior sound field problem [12] as where α nm (t, k|y o ) are the modal coefficients of the spatial sound field,x q is the unit vector in the direction of the qth microphone, and for an open array h n (kR) for a rigid array (8) with j n (·) and h n (·) denoting the spherical Bessel and Hankel functions of order n, respectively, and (·) represents the first derivative operation. From (7), we can estimate α nm (t, k|y o ) coefficients using the orthogonal property of spherical harmonics [36] as where (·) * denotes the complex conjugation operation. Practically, we truncate α nm (t, k|y o ) to an order N, such that N = kR and Q ≥ (N + 1) 2 , where · denotes the ceiling operation, to avoid errors due to spatial aliasing and high-pass nature of higher-order Bessel functions [36].
Step 2: Estimating reflection gains using the spatial correlation model We can now estimate γ vu (t, k|y o ) from the α nm (t, k|y o ) coefficients using the spatial correlation matrix expression [33] given by where representing the Wigner 3j symbols [37]. The elements in Λ(t, k|y o ) and B(k, y o ) can be generated from the α nm (t, k|y o ) coefficients and source direction information, respectively. Now, we can solve (10) to estimate where[ ·] and [·] † indicate estimated values and pseudo-inversion operator, respectively. While solving (14), the order of γ vu (t, where · indicate flooring operation, to avoid an underdetermined system [34]. Once the γ vu (t, k|y o ) coefficients are extracted fromΩ(t, k|y o ), we can generate the reflection power using Equation (6) for different incoming directionsŷ and time-frequency bins. From P R (t, k,ŷ|y o ), we can estimate the total reflected power in any time-frequency bin as Substituting (6) in (15) and using the symmetrical property of spherical harmonics [35] P T (t, k|y o ) = γ 00 (t, k|y o ). We can now use P R (t, k,ŷ|y o ) and P T (t, k|y o ) to analyze the reflection power variations with time, frequency, and direction.

Experimental Analysis
In this section, we present the analysis of the reflection power response of two rooms from their RIR datasets recorded using an em32 Eigenmike [38], which is a Q = 32 element rigid spherical microphone array of radius R = 0.042 m. Both the RIR datasets were measured using a source signal generated from a directional loudspeaker. The first RIR dataset available from the work in [39] is for a small audio laboratory room of size 3.54 × 4.06 × 2.70 m, hereafter referred to as Room-1. The second RIR dataset from [40] pertains to a larger classroom of size 6.5 × 8.3 × 2.9 m, hereafter referred to as Room-2. According to these datasets, the reverberation time (T 60 ) of Room-1 and Room-2 are 0.329 s and 1.12 s, respectively. From the datasets, we have selected the RIRs for different source positions in the XY plane, i.e., θ o = 90 • at different φ o angles, and at 1 m distance from the microphone array center. The direct path component from the source arrives at the receiver around 0.0026 s and 0.0028 s for Room-1 and Room-2, respectively.
From the selected 32-channel RIRs, we obtain H(x q , y o , t, k) using the STFT operation with a 1024-sample Hanning window with 50% overlap, 2048-point fast Fourier transform (FFT), and 48 KHz sampling frequency. We then follow the process described in Section 2.2 to generate P R (t, k,ŷ|y o ) for 500 uniformly distributedŷ directions derived from spiralbased sampling [41] ∀ t, k bins in the frequency band of 20 to 1500 Hz. These 500 spiral sampled directions provide sufficient spatial resolution to assimilate the sound reflectivity variations across the room surfaces at a reasonable computation cost. Finally, we estimate P T (t, k|y o ) for analyzing the time-frequency spectrum of the reflection power of the two rooms. While dealing with the temporal response in the following sections, the 0 s in the time-index indicates the moment of sound event occurrence. However, the reflection power response is calculated only from 0.01 s which is the center of the first STFT frame. This frame size was selected after considering a reasonable time-frequency resolution for proper spectral and temporal analysis of reflections in both rooms.

Theoretical Background
Here we discuss important theoretical concepts of room acoustics and room response characteristics according to prevalent literature [5,[42][43][44] to validate the experimental analysis.

Modal Decay
The reverberation field inside a room leads to the persistence of sound even after the source ceases. The duration of this sound persistence, called the reverberation time R T [5], is the most commonly used measure of room acoustic quality. In practical applications, acousticians calculate R T as the 60 dB decay time since source cessation and is referred to as T 60 [43]. Typically, such estimations assume diffuse sound field conditions and average wall absorption and calculate R T as a single value to characterize the room acoustics. However, in reality, the wall absorption factors change with frequency [5,44], and hence accurate R T estimates should be frequency-dependent. Furthermore, the room architecture, variations in surface materials, and source-receiver properties affect the reflection path length [44] and magnitude, which, in turn, influence the decay of different frequency components. Therefore, decay times should be a function of frequency and direction. Since an analytical solution to decay rate estimation is complex, we can derive them numerically through reflection sound field analysis.

Room Modes
The sound propagation in any acoustic enclosure follows different wave characteristic phenomena like reflection, scattering, diffraction, and interference. Such a complex interaction of innumerous waves is characterized through the acoustical wave equation [5]. The frequencies corresponding to the eigenvalues of the acoustic wave equation can form standing waves inside the room to create a resonant behavior leading to non-uniform distribution of reflection power and extended reverberation [5,43,44]. These frequencies are often referred to as room modes or eigenfrequencies.
According to [5,43], at low frequency ranges, the number of resonant frequencies will be small, and they can be excited individually. Hence, the room response will be quite irregular and anisotropic for these frequencies. When we move towards the higher frequencies, the eigenvalues are densely spaced, so they cannot be independently excited. Even though the higher frequencies contribute to the reflected sound pressure, the lack of independent resonance combined with increased scattering makes them relatively uniform and less prominent compared to the lower frequencies. Hence, in a typical room response, we expect high reflection powers with some resonant peaks for low (<300 Hz) to mid (300 to 600 Hz) audible frequencies and decaying magnitude towards the high (>600 Hz) frequencies. The cross-over frequency [5,43] that separates the resonant low-frequency response and the high-frequency diffused reflections is termed as Schroeder frequency (ν S ). It can be calculated using the empirical formula where ∆ is the room volume. From the dimensions and T 60 of the test rooms, (16) gives ν S ≈ 184 Hz and ≈169 Hz for Room-1 and Room-2, respectively. For a rectangular enclosure, we can calculate the eigenvalues of the wave equation [5,[42][43][44] as ν n x n y n z = c 2 where {n x , n y , n z } are non-negative integers and l x × l y × l z are the room dimensions. When two of {n x , n y , n z } equals zero, the solution of (17) gives the axial modes which are considered to be stronger with low decay rates compared to other modes [42]. We can calculate the tangential modes with two non-zero integers in {n x , n y , n z } and oblique modes by substituting all non-zero integers in {n x , n y , n z }. Figure 2 shows the room mode distribution in Room-1 and Room-2. The axial and tangential modes are calculated from (17), and the line heights in Figure 2 represent the number of resonances occurring at a frequency since different {n x , n y , n z } combinations can result in the same ν n x n y n z frequency. The axial modes were given a higher nominal weight [44] while calculating this distribution due to their inherent prominence. Theoretically, an empty rectangular room of the same dimensions should replicate this trend in their frequency response. However, in a real room environment, the interference of normal modes of different decay rates [44] and the influence of inhomogeneous surfaces and source directivity alter the assumptions behind (17). Therefore, the real room response may vary from the predicted distribution. For practical validation of the real acoustic phenomenon, we will use the power response generated using the proposed technique to identify the variations in the room mode distribution and modal decays compared to the above theoretical expectations. Figures 3 and 4 show the spectrogram of P T (t, k|y o ) for different source positions in Room-1 and Room-2, respectively. For both rooms, the lower frequencies show some irregular peaks, and the reflection power of late reverberation clearly decays towards the higher frequencies as we predicted in Section 3.1.2. Additionally, the reflection power is maximum in the initial time instants, and then the power decays with time for all frequencies due to surface absorption. It should be noted that the power decay trend is varying with the frequencies due to the frequency-dependent wall impedance property [5]. Apart from some magnitude variations, the time-frequency spectrum trend is maintained for all source positions in both rooms. In the following sections, we will analyze the reflections power variations with frequency and time in more detail.   Figure 5. Similarly, some of the observed peaks in Room-2 around 164 Hz, 304 Hz, 328 Hz, and 492 Hz vary from the theoretical room mode estimates shown in Figure 2b. Additionally, the identification of ν S is difficult from these responses, but is clearly greater than the predicted ν S values mentioned in Section 3.1.2. This error is caused by the approximation in (16) by use of frequency-averaged T 60 and from the influence of source directivity.  It should also be noted that there are no substantial variations in the frequency response of Room-2 for different source positions. Additionally, in Room-1, the differences are not drastic as should be expected in a smaller room with significant reverberation. This is the result of the formulation of reflection gains with respect to a common listening position (O) and the separation of the direct path component from the reflections. A direct analysis of the frequency response of RIR will show significant differences with the change in source positions. Therefore, the proposed technique can be used to predict the room response behavior independent of the source positions. Figures 7 and 8 show the temporal response of P T (t, k|y o ) at different frequencies for different source positions in Room-1 and Room-2, respectively. As evident from these figures, the reflection power decays due to surface absorption, and the decay trend is similar for all source positions. Since the damping constants of room surfaces are frequencydependent, each frequency in Figures 7 and 8 decays at different rates. The lower frequencies like 70 Hz, 141 Hz, and 211 Hz have slower decay rates compared to the other frequencies. As we move from 281 Hz to 633 Hz in Figures 7 and 8, the decay rate stabilizes towards the higher frequencies. Furthermore, the decay of higher frequencies is nearly linear, whereas the lower frequencies (70 Hz to 211 Hz) exhibit a non-linear decay, especially in Room-2. This can be attributed to the highly non-uniform power distribution of the lower frequency resonant modes, which leads to the concentration of sound absorption to certain surfaces [42,43]. In comparison, the high frequencies have more diffused distribution of reflection power, and hence the decay behavior is averaged over broader surface areas.

Decay Time
From the time-frequency spectrum of reflection power, we can estimate the decay time of each frequency to predict the strong room modes in a real room environment. Figures 9 and 10 show the 60 dB decay time of each frequency estimated from the P T (t, k|y o ) values for different source positions in Room-1 and Room-2, respectively. Even though the temporal response at each frequency in Figures 7 and 8 seems relatively independent of the source positions, the decay times of the frequencies is slightly different for each source position according to Figures 9 and 10. The average decay time, maximum decay time, and the corresponding frequency for each source position in both rooms are summarized under Table 1. We can say that the strongest modes in Room-1 are ≈140 Hz, ≈164 Hz, and ≈258 Hz, which are closer to the peak power frequencies observed in Figure 5. However, in Room-2, the frequencies with maximum decay time are different from the frequencies with maximum reflection power. Hence, we need a deeper insight into the directional variations of power and decay time which we will analyze in the next section.

Directional Decays and Dominant Reflection Directions
As we discussed in Section 3.1.1, decay times are a function of frequency and direction. Additionally, from Section 3.3, we found that the modes with higher decay times can be different from the modes with high reflection powers. Therefore, a more comprehensive analysis of the spatial spectrum of these reflections is necessary to identify room surfaces causing the observed behaviors for the frequencies of interest. Figure 11a,b shows the directional decay times of Room-1 for y o = (1, 90 • , 40 • ) and y o = (1, 90 • , 120 • ), respectively, obtained from the 60 dB decay time of P R (t, k,ŷ|y o ) in eachŷ direction. Figure 12a,b shows the directions with high reflection powers in Room-1 for y o = (1, 90 • , 40 • ) and y o = (1, 90 • , 120 • ), respectively. The letters indicated near the locations of highest reflection powers in Figure 12 are coarsely mapped onto the real Room-1 environment in Figure 13. As evident from this figure, the locations around 'A', 'C', 'D', and 'E' have glass surfaces with high reflectivity, and hence the observed dominant power directions are valid. Furthermore, there is no evident pattern between the distributions in Figures 11 and 12 for the given modal frequencies, and hence the feature predictions based on computational room acoustic models can be imprecise. In such cases, we can employ the proposed technique to reproduce authentic spatio-temporal room responses.
According to Figures 11 and 12, the directions of high decay times and dominant reflections are different from each other for every frequency and source position. Even though the dominant reflection locations and directional decay distribution have many common factors of influence, the reflection power in a direction strongly depends on the source directivity and source-to-wall distance, whereas the directional decay is mainly a function of the wall impedance coefficients and reflection paths. Hence, as seen in Figure 11, the directional decay will be different between the frequencies due to wall impedance variations, as well as for different source positions due to change in reflection path. In contrast, if we observe Figures 12 and 13   Based on the above observations, the analysis of both directional decay and directional power is essential in characterizing the room reflections. This is particularly important while managing the features of early reflections and late reverberations to achieve desired perception quality. Since the early reflections undergo very few boundary reflections [45], they are mainly defined by the source directivity and source-to-wall distance. Hence, we can use the dominant reflection directions to characterize the behavior of early reflections. The late reverberation undergoes multiple boundary reflections, and they are integrated both spatially and temporally before reaching the receiver [45]. Since the late reverberation characteristics are primarily characterized by the surface absorption and room shape [45,46], we can analyze the directional decay rates to study their behavior. We can further visualize the power spectrum of P R (t, k,ŷ|y o ) across time for an extensive analysis of the variations in the anisotropic spatial properties between the early reflections and late reverberations.
The precise knowledge of frequencies and surfaces contributing to the salient features of these reflections will be useful for defining the perceptual targets for modal control methods [47], optimizing room mode redistribution to improve acoustic quality [48], and devising active [49] and passive [50,51] room acoustic treatment methods.

Conclusions
In this paper, we presented a reflection power response estimation technique utilizing the spatial correlation of higher-order eigenbeams derived from spherical microphone array measurements. The formulation of the reflection gain as a function of time, frequency, and direction helps in comprehending a faithful room response for a realistic non-diffuse sound field. The experimental results validate the frequency response and temporal response of the reflection power against the theoretical expectations.
The proposed technique can estimate the resonant frequencies and modal decays caused by directional speakers and complex room environments. Furthermore, the directional decay times and dominant reflection directions facilitate the distinction of early and late reflection features. The insights from this room acoustic evaluation technique will be beneficial in controlling the acoustic quality while designing performance spaces. Particularly, the findings from this method will be more reliable than computational room models while deciding acoustic treatment schemes compatible with the source directivity. Additionally, the room mode features identified from this method can be incorporated in spectral equalization algorithms to improve speech intelligibility and remove audible artifacts. The dominant reflection locations and directional decay spectrum can aid in the inference of room geometry and calibration of the room acoustics in virtual reality-based rendering of heritage sites.
The method can also be adapted for blind estimation of the discussed characteristics from the direct processing of microphone recordings for any arbitrary source signal, since we can separate the reflected power from the direct path power. Moreover, apart from spherical microphone arrays, any arbitrary array designs that can generate accurate spatial sound field coefficients can be integrated with the proposed algorithm. The future work shall expand the method to include multiple sources in noisy environments to conceive more real-world applications.

Conflicts of Interest:
The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript: