Time Domain Spherical Harmonic Processing with Open Spherical Microphones Recording

: Spherical harmonic analysis has been a widely used approach for spatial audio processing in recent years. Among all applications that beneﬁt from spatial processing, spatial Active Noise Control (ANC) remains unique with its requirement for open spherical microphone arrays to record the residual sound ﬁeld throughout the continuous region. Ideally, a low delay spherical harmonic recording algorithm for open spherical microphone arrays is desired for real-time spatial ANC systems. Currently, frequency domain algorithms for spherical harmonic decomposition of microphone array recordings are applied in a spatial ANC system. However, a Short Time Fourier Transform is required, which introduces undesirable system delay for ANC systems. In this paper, we develop a time domain spherical harmonic decomposition algorithm for the application of spatial audio recording mainly with beneﬁt to ANC with an open spherical microphone array. Microphone signals are processed by a series of pre-designed ﬁnite impulse response (FIR) ﬁlters to obtain a set of time domain spherical harmonic coefﬁcients. The time domain coefﬁcients contain the continuous spatial information of the residual sound ﬁeld. We corroborate the time domain algorithm with a numerical simulation of a fourth order system, and show the proposed method to have lower delay than existing approaches.


Introduction
Spherical harmonic analysis has been widely used for spatial acoustic signal processing for years [1]. Sound field recordings can be decomposed into a set of orthogonal spatial basis functions and respective coefficients when an appropriately designed spherical microphone array is used [2,3]. The spherical harmonic decomposition has the advantage that a given sound field can be analyzed over a continuous spatial region rather than a set of distributed points [4]. This has embraced a wide range of algorithms in three-dimensional (3D) audio signal processing such as: sound intensity analysis [5], sound field diffusive analysis [6], beamforming [7,8], source localization [9,10], and spatial Active Noise Control (ANC) [11,12]. A spatial ANC system aims to reduce the unwanted acoustic noise [13] over a space in order to create a silent zone for people. Multiple microphones are used to record the residual noise, and multiple loudspeakers are used to generate the anti-noise field. The recording's accuracy of the residual sound field can highly influence the performance of an ANC system. Furthermore, recording efficiency is also important, as ANC usually focuses on low frequency and time-variant noise. As a result, an accurate and low latency algorithm for residual sound field recording is desired [14].
The sound field recording step in a spatial ANC system focuses on obtaining the location independent spherical harmonic coefficients that represent the residual sound field inside a region of interest. This is different to real time spherical harmonic beamforming or directivity analysis which focuses on extracting source location information from the spatial recording. Moreover, spatial ANC mainly focuses on reducing the sound field inside the spherical microphone array (the region of interest). While other spatial recording applications may focus on analysing the sound field exterior to the array. Additionally, although most of the spatial audio applications utilize a rigid spherical array [15][16][17] for its convenience to build and use, an open spherical array is considered to be more suitable for a spatial ANC system. This is because users should be able to enter and move within the ANC region of interest that is surrounded by the spherical microphone array [12,18]. Furthermore, there exists previous work focusing on optimising the open array for spherical harmonic recording [19,20], and for spatial ANC systems [21]. However, we consider the optimisation of the open microphone array design to be outside of the scope of this paper, and instead focus on a time-domain recording algorithm.
Real-time spatial beamforming systems illustrate that applications with strict delay requirements can highly benefit from the small latency and efficient computation of time domain processing [22,23]. By posing the signal processing algorithm in the time domain, system performance can be optimized with real-valued lower order filters [24], and lower modeling delays [25]. Specifically, for a spatial ANC system, the system delay which includes the filter group delay (signal processing algorithm), the A/D and D/A converter, and the data processing delay, should be less than the acoustic delay from the reference microphones to the secondary loudspeakers in order to satisfy causality [26]. Furthermore, a longer signal processing delay slows down the convergence speed of the adaptive filtering and may lead to an unstable system [27,28]. Therefore, it is worthwhile to consider a time domain spherical harmonic decomposition method to achieve sound field recording with an open spherical array for the application of spatial ANC.
Frequency domain spherical harmonic recording has been well developed with various optimised filters [29][30][31]. One benefit of developing the method in the frequency domain is that the influence of the spherical Bessel zeros can be easily removed by avoiding the estimation of the coefficients at these erroneous frequency bins [19,[32][33][34]. However, when we consider a time domain method, we can not simply avoid the Bessel zeros because we do not apply a Fourier Transformation to separate the Bessel zero frequency components from the others.
Meanwhile, there are also several works relate to time domain spatial audio signal processing. In [35], Poletti and Abhayapala give a time domain description of the free-space Green's function in the spherical harmonic domain. This provides a solution to decompose the free-space channel between a loudspeaker and microphone into the time-space domain. This work only targets the free-space Green's function, and as a result, the method is highly limited to the application of free space sound field reproduction. In [36], a time domain wave field synthesis method is presented. Although an IFFT is applied to derive the time domain solution, the work still demonstrates that time-domain wave field synthesis can be beneficial to time-varying spatial acoustic applications. In [37], Hahn and Spors offer a time domain representation of the spherical harmonic equation. They relate the time domain spherical harmonic coefficients to the sound pressure, but do not include the method of obtaining the time domain coefficients from a given recording. Time domain beamformers are designed in [38,39] with the IFFT of spherical harmonics. These papers show certain advantages for finite impulse response (FIR) filtering based signal processing systems. Overall, these time domain approaches illustrate the advantages of time domain signal processing, however, they remain unable to obtain location-independent spherical harmonic coefficients. This makes them ill-suited for spatial ANC systems, as these location-independent coefficients provide necessary information about the continuous residual sound field inside the region of interest.
In this paper, we propose a FIR filter based time domain spherical harmonic analysis method to accurately record spatial sound fields with an open spherical microphone array for the purpose of spatial ANC. We note that this work focuses solely on the problem of sound field recording, and that the spatial ANC application acts purely as motivation to our problem. Therefore, with spatial ANC in mind, the recording method prioritizes a minimum processing delay, a bandwidth of interest (low frequencies for typical noise scenarios), and a practical array geometry (open sphere surrounding a quite zone). Employing the recording method in an actual ANC system, and its evaluation, is out of the scope of this paper. The novelty of the presented work is the investigation of time domain spherical harmonic coefficients. These time domain coefficients match the properties of conventional frequency domain spherical harmonic coefficients. That is, the coefficients are location independent within the region of interest, and they represent the continuous sound field over the space. Additionally, these coefficients are obtained in time domain, which relieves the block processing constraint (and can do sample-by-sample processing) and results in lower system delay. Hence, the proposed method is considered to be highly beneficial to spatial ANC systems.
We organise the main body of this paper as follows: In Section 2 we detail the background of the frequency domain spherical harmonic algorithm for spatial sound field recording. Additionally, we introduce the time domain equation of spherical harmonic decomposition, while addressing the challenges of recording time domain spherical harmonic coefficients. The filter's design and implementation to obtain time domain spherical harmonic coefficients is presented in Section 3, along with error analysis. Effects of truncation and filter length are shown in Section 4 via initial simulations of filter performance. Section 5 presents simulation results for the proposed method's estimation of spherical harmonic coefficients, as well as sound field reconstruction performance at a point and over space, verifying the effectiveness of the proposed theory and design. We conclude the findings and insights gained from this work in Section 6.

Problem Formulation
We begin this section by reviewing the well-known frequency domain spherical harmonic decomposition method. We then introduce the corresponding time domain formulation, and detail the Fourier Transform relationship between the components in the frequency domain equation and the time domain equivalent. Finally, we show the difficulties in obtaining spherical harmonic coefficients in the time domain.

Spherical Harmonic Decomposition of Sound Field in Frequency-Space Domain
An incident sound field at any arbitrary point x = (r, θ, φ) inside a source free 3D spherical region Ω, where r refers to the distance between the point x and the origin, θ and φ denote elevation and azimuth angles, respectively [40], can be expressed in the frequency domain as [1,41] where order n (n ≥ 0) and mode m are integers, N = kR [1], k = 2π f /c is the wave number, f is frequency, c is the speed of sound, R is the radius of Ω, α nm (k) is a set of spherical harmonic coefficients representing the sound field inside Ω, j n (kr) is the nth order spherical Bessel function of the first kind, Y nm (θ, φ) are the spherical harmonic functions. For convenience, we use real spherical harmonics in this paper, given by [42] Y nm (θ, φ) =(−1) |m| 2n + 1 4π where P nm (·) is the associated Legendre function. Real spherical harmonics have the orthogonality property of If the spherical harmonic coefficients α nm (k) are available for a sound field, then these coefficients can fully describe the sound field over the continuous spatial region of interest. Traditionally, when spatial harmonic processing is used to record a spatial sound field S(x, k), it is recorded over a spherical surface of radius R Q (R Q ≥ r). The corresponding α nm (k) are extracted by integrating (1) over the spherical surface while exploiting the orthogonality property of Y nm (·) in (3), which gives [2] α nm (k) = 1 j n (kr) In practice, this integration is realized using an equivalent discrete summation of spatial samples over the sphere.

Equivalent Spherical Harmonic Decomposition of a Sound Field in Time-Space Domain
While the frequency domain spatial sound field capture is well established as explained in Section 2.1, in this paper, our objective is to investigate the possibility of an analogous spherical harmonic analysis in time domain. In a similar fashion to (1) and (4), we now consider the relationship between sound pressure s(x, t) recorded by a spherical microphone array and the time domain spherical harmonic coefficients, denoted as a nm (t). It is desirable to have these time domain coefficients a nm (t) independent of the measurement radius. Thus, we only need to record a nm (t) to obtain the sound field over the entire region of interest Ω. A time domain method can directly extract a nm (t), thus avoiding the Fourier transformation of signals.
As a time domain analysis is usually with real-valued components, we rewrite (1) in the form of where i = √ −1, in order to make the inverse Fourier transform of all terms to be real. Taking the inverse Fourier transformation of (5), we obtain where * denotes the convolution operation, where F −→ denotes the Fourier transform operator, which is given by where P n (·) is the Legendre function. The proof of (9) is given in Appendix A. We note that every component in (6) is real valued. Equation (6) shows how to reconstruct the sound pressure at x = (r, θ, φ) with the recorded time domain spherical harmonic coefficients a nm (t). We consider an alternative time domain filter to obtain a nm (t) from the recorded signals rather than taking the inverse Fourier transform of (4) since 1/j n (kr) is unbounded when j n (kr) = 0. Note that j n (kr) as a filter has order dependent zeros when j n (kr) = 0. As a result, 1/j n (kr) approaches infinity at these frequencies, making it unstable to have an inverse Fourier transform. In other words, the z-transform of p n (t, r) given in (9), has zeros on the unit circle because of Bessel zeros, refers to a non-minimum phase system. In this case, the inverse system of p n (t, r), with the frequency response of 1/j n (kr) is not stable. As a result, we first define which has a frequency response of Since Y nm (θ, φ) is independent to both frequency and time, b nm (t, r) can be obtained by integrating (6) over a sphere of radius r such that If we regularly place Q ≥ (N + 1) 2 omni-directional microphones on a sphere of radius R Q , we can estimate the integration in (12) with a finite summation such that To simplify the implementation, we sample the signals with sampling time T such that t = νT = ν/F s , where ν is the time index and F s is the sampling frequency. We rewrite (10) where is a time limited function with p n (ν, With (14) in hand, our problem reduces to obtaining a nm (ν) from the measured b nm (ν, R Q ). This is not achievable since it is an under-determined problem. We always have 2L p + 1 more unknowns (a nm (ν)) than knowns (b nm (ν, R Q )). Moreover, this is not practically feasible because the z-transform of p n (ν, R Q ) has zeros on the unit circle, resulting in poles on the unit circle in its direct inverse, making the system unstable. Alternatively, a nm (ν) can be extracted from b nm (ν, R Q ) using an appropriately designed filter.
In this paper, we attempt to design a filtering solution while overcoming the above challenges. It is important to note that the Fourier transform relationship discussed in this section were solely used to formulate the definition of the time-domain spherical harmonic decomposition of a sound field. From this point onward, we will focus on signal processing of the captured sound field only in the time domain.

Filter Design for Obtaining Time Domain Spherical Coefficients
In Section 2, we have presented a method to obtain b nm (ν, R Q ) from recorded sound pressure S(x q , ν) with a spherical microphone array. In this section, we design a series of FIR filters to obtain a nm (ν) from given b nm (ν, R Q ).

Stability of Ideal Inverse Filter
Due to the challenges mentioned in Section 2, rather than directly using (14), we pre-design a series of filters ρ n (ν, r) such that where We note here that ρ n (ν, r) should be order n dependent but mode m independent, as is the same property with p n (ν, r). However, we can never achieve a precise δ(ν) in (17), as the energy of measured sound pressure at the frequency bins of Bessel zeros has been filtered to zero by p n (ν, R Q ). Therefore, we refrain from designing the inverse filter at these zero positions. Instead, we modify δ(ν) toẑ n (ν) such that its frequency responseẐ n ( f ) is given bŷ where is a small positive constant threshold which satisfies j n (kr) ≈ 0 when |j n (kr)| < . For a fixed R Q , both j n (2π f R Q /c) andẐ n ( f ) can be seen as a function of f . Figure 1 shows From Figure 1, we can see thatẐ n ( f ) is a superposition of a series of rectangular windows, meaning its inverse Fourier transformation,ẑ n (ν), should be a superposition of sinc functions. In practice, due to inherent properties of j n (2π f R Q /c), for a given maximum frequency f max , the number of active spherical harmonic orders is up to N ≈ kR Q [1]. We use the same truncation limit when designingẐ n ( f ), resulting inẑ n (ν) to be a superposition with a finite number of sinc functions. The necessity and the influence of this truncation on frequency f max will be further discussed in Section 4.1.
Let us define w (n) in radian (rad), such that j n (w (n) F s R Q /c) = , where is the positive threshold we explained in the last paragraph. Therefore, w (n) can be considered as the edges of window inẐ n ( f ) (see Figure 1). Given the vector of [w (n) where S is the number of rectangular windows inẐ Furthermore, w (n) s are dependent on the radius of the microphone array R Q , sampling frequency F s and the speed of sound c, but the value of w (n) F s R Q /c remains constant for each order n such that |j n (w (n) F s R Q /c)| = . The first four order of w (n) is given in Table 1 with the highest frequency limit of f max = 2047 Hz and sampling frequency F s = 48, 000 Hz. Note that for the zero-th order, we set w 1 = 8.9 × 10 −4 to block DC component in practice.
If we have a series of concentric spherical microphone arrays with the radii of r 1 , r 2 , · · · , the value of w (n) F s r q /c would be different from a single sphere model, which can be calculated by |j n (w (n) r 1 /F s c) + j n (w (n) r 2 /F s c) + · · · | = .

Modified Inverse Filter
Now that the design forẑ n (ν) is established, our next step is to design filters ρ n (ν, R Q ) which satisfies We notice in (20) that p n (ν, R Q ) is a finite length vector and we would like ρ n (ν, R Q ) also to be a finite length vector. However,ẑ n (ν) is infinitely long with a series of sinc functions. If we perform linear convolution of p n (ν, R Q ) with ρ n (ν, R Q ), we would obtain a vector with the length of 2(L + L p ) + 1 samples, where 2L + 1 is the filter length of ρ n , such that ρ n (ν, R Q ) has none-zero values for −L ≤ ν ≤ L. Thus, we need to truncate the infinite lengthẑ n (ν) to 2(L + L p ) + 1 samples for every order of n where We can then write (20) in a finite summation form as We rewrite (22) into matrix form z n = P n ρ n , where z n = [z n (−(L + L p )), z n (−(L + L p ) + 1), · · · , z n ((L + L p ))] T , ρ n = [ρ n (−L, R Q ), ρ n (−L + 1, R Q ), · · · , ρ n (L, R Q )] T , and P n is the convolution matrix based on the Toeplitz structure of p n (ν, R Q ), given in (24).
The size of matrix P n is [2(L + L p ) + 1, 2L + 1], where we choose the filter length 2L + 1 of ρ n (ν, R Q ) to be significantly larger than both 2L p + 1 and the main lobe width of function z n (ν), to avoid P n being ill-conditioned and minimize the error of truncating z n (ν) into a finite length signal. The influence of choosing L will be detailed in Section 4.2.
Since (23) is an over-determined system of equations, we apply LMS method to (23) to obtain ρ n = P + n z n , where P + n refers to the Moore-Penrose inverse of P n . As a result, with (16) and (25), a nm (ν) can be estimated by In this way we obtain a nm (ν) while overcoming the challenges listed in Section 2.

Practical Considerations of Filter Implementation
In (26), Naturally because of the Legendre function in p n (ν, R Q ). However, with the influence of sinc functions in z n (ν) in our proposed filters ρ n (ν, R Q ), we now need the past L samples and the future L samples of b nm (ν, R Q ) to obtain a nm (ν) at time index ν. For offline signal processing, L samples of zeros should be added both in the beginning and the end of the vector of b nm before filtering it with pre-designed ρ n (ν, R Q ). Moreover, an overlap of 2L + 1 samples is needed for frame based signal processing. For on-line real time signal processing, we cannot obtain future samples of b nm (ν, R Q ). As a result, we add L samples of zeros in front of the filter ρ n (ν, R Q ), and create a buffer of the past 2L + 1 samples of b nm (ν, R Q ). At time index ν, we obtain a nm (ν − L) with the buffer of [b nm (ν − 2L, R Q ), · · · , b nm (ν, R Q )]. Thus, there is a L samples of group delay of the system. We further discuss and compare the group delay with frequency domain method in Section 5.5.

Error Analysis
We define the error e nm (v) as the difference between the desired time domain spherical harmonic and the coefficients we obtained by the proposed method, which can be decomposed to: e nm (v) = e filter (n, m, ν) + e position (n, m) + e truncation (n), (28) where e filter (n, m, ν) is filtering error introduced by ρ n (ν, R Q ), e truncation (n) is the truncation error of order N, and e position (n, m) is due to the microphones position error. The qualitative analysis of e truncation (n) and e position (n, m) based on the frequency domain method are addressed in [3], where we draw a similar conclusion in time domain that with increasing number of microphones and fixed N, e truncation (n) decreases. Meanwhile, e position (n, m) depends on the nature of inaccurate microphone positioning, referring to the distance between the desired point and microphone location. We mainly focus on e filter (n, m, ν) here as it is the main error contribution due to the proposed filtering approach. According to (27), e filter (n, m, ν) at a specific order n and mode m can be expressed as where e n (ν) δ(ν) − z n (ν).
Using (18) and (21), the Fourier transform of e n (ν) is with the same truncation in frequency asẐ n ( f ). Thus, e n (ν) can be expressed as where S and w (n) have the same definition as in (19) and w (n) 0 = 0. With (29) and (32) we can quantitatively calculate the filtering error e filter (n, m, ν) introduced by ρ n (ν, R Q ). The total error caused by filtering can be calculated by a summation of e filter (n, m, ν) over every order of n and mode of m. As this filtering error is mainly due to Bessel zeros, it can be reduced by limiting the highest order N of the system, where a smaller N results in lower Bessel zeros hence a smaller e filter (n, m, ν). Also, N depends on the highest wave number k and the radius of the microphone array R Q . By choosing N with a pre-knowledge of the frequency limit of the input signals and R Q also helps to minimize the filtering error e filter (n, m, ν).

A Filter Design Example
To provide a further understanding of the filter design process, we present a design example of a fourth (N = 4) order spherical microphone array of R Q = 0.16 m, designed to record the time domain spherical harmonic coefficients within the spatial region enclosed by the array with a desired frequency band of [20,1360] Hz. Let F s = 48, 000 Hz and c = 343 m/s. Before we apply the proposed method to recording signals, we first analyze the influence of several steps in designing the proposed filter ρ n (ν, R Q ).

Effect of Frequency Truncation of Z n (ν)
As audio signals are often band limited in ANC applications [14], we can have a finite truncation on spherical harmonic decomposition with order N = kR Q . In other words, if we have a fixed N-th order system, the highest frequency that the system can successfully capture is given by f max = Nc/(2πR Q ) ≈ 1360 Hz. Figure 2 shows the frequency response of ρ n (ν, R Q ), refers to Φ n ( f , R Q ), which is designed using (25) with z n (ν) truncated at f 1 = 1023.6 Hz (Figure 2a), f 2 = 1364.8 Hz (Figure 2b), f 3 = 2047.1 Hz (Figure 2c), respectively. The filter length is set to be 500. To obtain the frequency response of ρ n (ν, R Q ), a FFT of I = 4096 points is applied with zero padding to ρ n (ν, R Q ). We remind here that z n (ν) is given by (19) in time domain, which does not rely on any frequency domain processing. 10 0 We observe that for a N = 4th order system, the truncation at f 1 is not enough to get an accurate frequency response of ρ n (ν, R Q ), as the frequency response Φ n ( f , R Q ) begins to decline at f 1 . In this case, ρ n (ν, R Q ) can not provide an acceptable filtering result with signals containing higher frequency components. Truncation at both f 2 and f 3 can give a satisfied frequency response when f < f max . As the frequency range of the system is also limited by N = kR Q , it is not necessary to look at the frequency response when f > f max . So in both cases ρ n (ν, R Q ) can give an acceptable filtering output. As a result, we choose to truncate z n (ν) at f max , where 2π f max R Q /c = N. If the recorded signal is known as a band limited signal where its highest frequency component is less than f max , an alternative choice of the frequency truncation of z n (ν) is at this highest frequency to reduce the computation complexity. Meanwhile, if z n (ν) has been designed with a higher frequency truncation, it can also be used in a lower order system with a lower requirement of frequency truncation.

Choice of Filter Length of ρ n (ν, R Q )
Intuitively, a longer filter often brings us less error and better performance. Figure 3 supports this idea by showing the result of ρ n (ν, R Q ) * p n (ν, R Q ) − z n (ν) with different choices of L, which refers to the error introduced into the system by the filtering processing. We observe that the filtering error decreases across all of the orders with a higher L. This is due to the time truncation of z n (ν) (length of vector z n in (25)), being related to L. Thus, a higher L leads to less information loss in the time truncation of z n (ν), hence smaller error in ρ n (ν, R Q ). However, Figure 4 shows the time domain filter ρ n (ν, R Q ) with different lengths. We observe that a longer filter results in a higher group delay of filtering. This is not desirable because it leads to a higher system delay of our proposed method, while lowering the system delay is one of the most important motivations that we develop the proposed time domain method.  As a result, we need to balance the noise tolerance, group delay, and the filtering error when we choose L. We suggest that filter length 2L + 1 should be significantly larger than the main lobe width of z n (ν) and 2L P + 1, the length of p n (ν, R Q ), but no more than 50 times of 2L P + 1. Additionally, L should be less than the maximum tolerance of the delay of the system. Based on these guidelines, for the current example, we choose 2L + 1 = 501.

Simulation Results and Analysis
In this section, we evaluate the result of the proposed algorithm for time domain spherical harmonic analysis using a fourth order (N = 4) system. We consider 32 microphones regularly placed on an open spherical array of R Q = 0.16 m, where the analysis region of interest is inside the array. A point source is placed at [1, 2, 1] m with respect to the origin which coincides with the origin of the microphone array. The sampling frequency is 48,000 Hz, and the filter length 2L + 1 is 501. A noise signal at 40 dB SNR is added to each microphone to reflect thermal noise. Considering the application of the proposed method to be spatial ANC, we construct the desired frequency band to cover the target noise band, and construct the radius of the region to be wide enough to fit one human head.
It is difficult to validate our method in time domain directly because the coefficients are time dependent and no ground truth has been given. Therefore, we first validate our proposed time domain spherical harmonic coefficients in the frequency domain. Thus, we compare the Fourier transformation of the time domain coefficients to the theoretical frequency domain coefficients given in (4). Next, to clarify that our proposed method has the ability to record a sound field in the region of interest in the time domain, we reconstruct sound pressure at an arbitrary point as well as over a plane inside the region of interest with the captured time domain spherical harmonic coefficients by (6). Finally, the time delay of the proposed method is given.

Comparison between the Time Domain and the Frequency Domain Spherical Harmonic Coefficients
We use a narrow band signal at 1200 Hz to test if our proposed method can obtain time domain spherical harmonic coefficients a nm (ν) correctly with (26). In (11), we give the relationship between a nm (ν) and α nm (k). We compare the Fourier transformation result of our obtained time domain spherical harmonic coefficients F T {a nm (ν)} with the desired frequency domain spherical harmonic coefficients α nm (k), obtained by Equation (4) in frequency domain. Fourier transformations use J = 1024 points. We do not compare the phase of these coefficients since the group delay of the time domain method and the frequency domain method is different. Instead, we compare the phase difference, given by α nm (k) − α n(m−1) (k). The results of both amplitude and phase difference are shown in Figure 5. In Figure 5 we see that there is little to no difference on both amplitude and phase difference between the Fourier Transformed time domain coefficients and the frequency domain coefficients over all the order and modes. Thus, our proposed time domain method successfully obtained the time domain spherical harmonic coefficients, which can be related to the frequency domain coefficients by Fourier transformation.
Next, we compare the coefficients over different frequencies with a wide band test signal within the frequency limited of [20,1300] Hz. In Figure 6, we show the comparison of amplitude at FT{a 00 (ν)} and α 00 (k), FT{a 11 (ν)} and α 11 (k), and FT{a 31 (ν)} and α 31 (k) over frequencies respectively while Figure 7 shows the phase difference. A huge error is observed in Figure 6a at the 46th frequency bin. This error is due to that there is a Bessel zero of the zeroth order at this frequency bin (around 1072 Hz). We see the frequency domain spherical harmonic coefficients α 00 (k) has a much higher amplitude, while our proposed method suppressed the amplitude at this certain frequency bin. Meanwhile, we can see in Figures 6 and 7 that the error at a 31 (ν) is higher compared to the other two modes. As order increases, the error increases. This error can be decreased by applying more microphones on the array. We also obtain a non-negligible error before the 30th frequency bin of the coefficients amplitude for (n, m) = (3, 1) in Figures 6c and 7c. This error is because our time domain proposed method and conventional frequency domain method have different processing on suppressing Bessel zeros. During the reconstruction process, the high pass property of spherical Bessel function removes the information at this frequency bin. Thus, this error will not influence the reconstruction of the sound field.

Sound Pressure Comparison at a Point Of Interest
In this section, we reconstruct the sound field with the captured time domain spherical harmonic coefficients at a point in the region of interest, and compare it with the desired sound field at the same point of interest. We use a signal containing three frequency components of 600 Hz, 850 Hz, and 1300 Hz. Figure  We note here that when reconstructing the sound-field with (6), we face the problem that at a point x = (r, θ, φ) where the radius r is very small, the filter p n (ν, r), whose filter length dependents on rF s /c, is too short to perform efficient filtering. To overcome this problem, we up-sample the obtained a nm (ν) with a rate of R Q /r and construct corresponding p n (ν, r) with the same length of L p = 2 * R Q F s /c + 1. We then down-sample the resulting b nm (ν, r) with a rate of r/R Q to keep the sampling frequency consistent with F s . We can see that the obtained a nm (ν) by our proposed method can successfully reconstruct the sound pressure at a point inside the region of interest with a tolerable error. This supports that our time domain coefficients contain certain spatial information of the sound field that the sound pressure at an arbitrary point inside the region of interest can be properly calculated with the measurements only being taken on the boundary of the region.

Sound Field Comparison over a Plane
To further evaluate our method on reconstructing sound field over space, we now reconstruct the sound field by a nm (ν) over a plane. We use a narrow band signal of 1200Hz here that the sound field in the region of interest is simple and clearly understood. Although the sound field is reconstructed over time, a 2D plot can only show the result of one time index. Figure 9 shows the reconstructed sound field and the desired sound field over the plane parallel to the x-y plane, with z = 0.02 m at t = 0.3 s. The 272 samples group delay is manually fixed and will be discussed later in the next subsection. The white line in Figure 9 bounds the region of interest. We can see that the reconstructed sound field inside this region in Figure 9a is roughly the same as the desired sound field in Figure 9b. This confirms that the coefficients recorded by our proposed method are able to capture the sound field inside the region of interest.

Sound Field Error Estimation over The Region
To evaluate the reconstructed sound field over time, we calculate the instantaneous average squared spatial error over time, which is defined by Figure 10 shows how the error fluctuates with time in a tolerable range (no more than 5 × 10 −4 ) with a 900 Hz tone and a 1072 Hz tone. We have already observed in Figure 8 that the error of the sound pressure at a point of interest is proportional to the desired sound pressure. We observe the same trend when we evaluate the error over the region that the error increases when the sound field inside the region of interest is at peak amplitude. We also observe in Figure 10 that the error with 1072 Hz signal is higher than 900 Hz signal. This is due to that there is a Bessel zero of the zeroth order (j 0 (kr)) at 1072 Hz in the proposed spatial ANC system. Hence, the amplitude of a 00 (ν) is suppressed by the proposed method, leading to a higher error in reconstructing the sound field.

Processing Delay Analysis
In this section, we indicate the group delay of our method. Figure 11 shows the desired sound pressure and the reconstructed sound pressure of a signal containing three frequency components of [600, 850, 1300] Hz at the point [−0.13, 0.07, 0.02] m. We can obtain from Figure 11 that the processing delay of the system is 1046 − 774 = 272 samples, which equals to L + R Q F s /c . The L samples of the delay is from the group delay of the proposed filter ρ n (ν, R Q ), while R Q F s /c is the delay introducing by the Legendre function within filter p n (ν, r) to reconstruct the sound pressure at a point with the time domain spherical harmonic coefficients. Comparing to a conventional frequency domain scenario with 512 frame-size and 75% of overlap Short Time Fourier transformation, which refers to a 2048 samples of delay [43], our proposed method can significantly reduce the processing delay.
Comparing to one of the start-of-art frequency domain spherical harmonic filter designs [44], which states a 75 ms delay with a 900 sample long filter, our method can achieve a 972 samples (20.25 ms with 48k Hz sampling frequency) delay with the same length of filter. Meanwhile, as our method is processed in time domain, there is nothing to stop us from doing a sample by sample signal processing instead of frame based signal processing. This sample based processing considerably extends the application of spherical harmonic analysis.

Conclusions
In this paper, a time domain spherical harmonic analysis method for spatial sound field recording over 3D space has been developed with the goal to minimize processing delay. This favours the specific application of spatial ANC. With the proposed FIR filter design, the time domain spherical harmonic coefficients can be obtained from the sound pressure measurements of an open spherical microphone array. The filters are designed based on the inverse of the Legendre function. Additionally, the filters are modified with considerations of stability and practical implementation. We have provided simulation results proving the validity of the proposed method.
We note that by obtaining the proposed time domain spherical harmonic coefficients, the desired sound field can be efficiently captured and reconstructed over space. The proposed time domain spherical harmonic coefficients can be related to the conventional frequency domain coefficients, where both have the same location independent property. The proposed method has the prominent advantage of lower delay since it is developed in the time domain without the introduction of a Fourier transformation or inverse Fourier transformation. Furthermore, the proposed time domain filtering method can support sample based signal processing instead of frame based, which indicates that the frame size can be one sample if necessary. As a result, we consider the proposed time domain spherical harmonic analysis method to be highly suitable for a spatial ANC system where accurate spatial recording with low delay is desired.
The most important future work is practically introducing the proposed spatial recording method to a spatial ANC system. Currently the proposed method utilizes open spherical microphone arrays, where the difficulties of constructing this array limit the potential applications. Hence, applying the proposed method to alternative optimised open microphone arrays is another direction for future work.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: ANC Active noise control 3D Three-dimensional IFFT Inverse fast fourier transform FIR Finite impulse response SNR Signal to noise ratio