Efficient Sigma–Delta Sensor Array Beamforming

Sensors with built-in sigma–delta modulators (ΣΔMs) are widely used in consumer, industrial, automotive, and medical applications because they are a cost-effective and convenient way to deliver data to digital processors. One example is the micro-electro-mechanical system (MEMS) digital microphone, which converts analog audio to a pulse-density modulated (PDM) bitstream. However, because ΣΔMs output a PDM signal, these sensors require either built-in or external high-order decimation filters to demodulate the PDM signal to a baseband multi-bit pulse-code modulated (PCM) signal. Because of this extra circuitry, implementing sensor array algorithms such as beamforming becomes especially expensive in embedded systems (where processing resources are critical) and in very large-scale integration (VLSI) circuits (where power and area are crucial), since a large number of parallel decimation filters is required. This article proposes a novel architecture for beamforming algorithm implementation that fuses delay and decimation operations based on maximally flat (MAXFLAT) filters to make array processing more affordable. As proof of concept, we present an implementation example of a delay-and-sum (DAS) beamformer at given spatial and frequency requirements using this novel approach. Under these specifications, the proposed architecture requires 52% lower storage resources and 19% lower computational resources than the most efficient state-of-the-art architecture.


Introduction
In the last decades, sensor array processing has emerged as an active area of research in estimating space-time parameters. Array processing is applied to many real-world problems. In telecommunications, for example, antenna arrays are steered in one user direction to reduce user interference. Radar and sonar use arrays of antennas and hydrophones, respectively, to calculate parameters like direction of arrival (DoA), velocity, and range. In medicine, sensor arrays are used for medical imaging, and planar biomagnetic sensor arrays are used in electrocardiograms to localize brain activity. In industry, sensor arrays are used in automatic monitoring and fault detection [1].
More recently, microphone array processing has emerged to increase the audio quality in consumer devices like mobile phones, speakerphones, and smart speakers, which are broadly used in conference rooms, desktop devices, and intelligent virtual assistants (IVA), in both consumer and industrial devices. Most frequently, the signals from several microphones are combined via a beamforming algorithm to enhance the sound coming from a desired direction while attenuating ambient noise and interference [1].
However, microphone array implementations are still expensive due to the complex characteristics of speech signals (non-static source, intermittent, and broadband) and the usual environmental conditions (reverberation and non-stationary additive noise). Adding an extra microphone to the design requires new routing, new placement conditions, and more processing resources, increasing the system cost and power consumption, a critical factor for internet of things (IoT) and mobile applications.
Digital MEMS microphones (introduced in 2006 [2]) have emerged as an alternative to overcome the size and cost limitations. As these microphones have an analog-to-digital converter (ADC) incorporated as a pre-amplifier [3], they have a single-line PDM output; because of that, they are also known as PDM microphones (PDM-mics). A decimation filter (also known as a PDM-to-PCM converter) demodulates this PDM bitstream output to a PCM signal. Unfortunately, implementing this decimation filter is still not cheap, as its cost (measured in die area and power) increases with the quality of the desired audio signal. Take, for example, the case of a microphone array using these PDM-mics. This architecture requires a decimation filter for each microphone input, so the implementation cost and power consumption increase proportionally with the number of microphones, making it even more expensive for practical applications.
This paper proposes a novel and economical method to implement beamforming algorithms with arrays of MEMS digital microphones. We apply the new architecture to a DAS beamformer as a proof of concept, but it can also be used with other beamforming strategies. This method merges a conventional beamformer's filtering and delay operations into a single structure, dubbed the delayed decimation filter. We propose a J-stage decimation filter whose penultimate stage (J − 1) is a Samadi filter and whose last stage (J) is an equiripple filter. The Samadi filter controls the overall filter delay by adjusting a single parameter, and the last equiripple stage compensates for the magnitude and phase distortion caused by the Samadi filter under a specific limit.
In the end, the proposed delayed decimation filter is an "all-in-one" filter that performs the same filtering and downsampling operations as any state-of-the-art decimation filter, can alter its group delay without any change in its structure or any additional delay chain, and saves storage and computational resources in comparison to state-of-the-art architectures.
To explain the working principle of the proposal, we first recapitulate the implementation of a DAS beamformer in Section 2. We then present a novel beamformer based on delayed decimation filters in Section 3, where we introduce multirate and decimation filters, as well as how a Samadi filter can be used with these structures. To conclude, as a proof of concept, we present in Section 4 an implementation of this novel architecture, and in Section 5, we compare it to state-of-the-art DAS beamformer architectures.

DAS Beamformer
The DAS beamformer is the oldest and simplest array signal processing algorithm [1]. The underlying idea is to delay each microphone input by an appropriate time delay and then add all delayed microphone signals together. In this sense, the audio signal arriving from a particular direction at the array is reinforced in relation to signals coming from different directions and incoherent noise.
The traditional or discrete-time DAS beamformer is the result of

$$y[k] = \sum_{m=1}^{M} w_m \, y_m[k - k_m], \tag{1}$$

where $y_m$ is the $m$th microphone's output in PCM representation and $k_m$ is the integer delay associated with the $m$th microphone, such that

$$k_m = [\Delta_m f_o] = [\Delta_m / T], \tag{2}$$

where $\Delta_m$ is the required delay in the $m$th microphone, $[x]$ means the nearest integer to $x$, and $f_o$ and $T$ are the sampling rate and period in $y_m$, respectively. (In the literature, the traditional DAS does not have the weights $w_m$ in its temporal representation because these weights only show up if you use a "weighted DAS" or a frequency representation; however, in this work the "weighted DAS" is referred to as the traditional DAS, as $w_m$ can implement the averaging process.)
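The delay-and-sum operation described above can be illustrated with a minimal pure-Python sketch. The function names, the 16 kHz output rate, and the two-channel test signals are assumptions made for illustration; they are not taken from the paper. Each required delay is rounded to the nearest whole sample, and samples that would be read before the start of a channel are treated as zero.

```python
import math

def integer_delays(delays_s, f_o):
    """Quantize per-microphone delays (seconds) to whole samples: k_m = [delta_m * f_o]."""
    return [math.floor(d * f_o + 0.5) for d in delays_s]

def das_beamformer(channels, k, weights):
    """y[n] = sum_m w_m * y_m[n - k_m]; samples before the start of a channel count as 0."""
    n_samples = len(channels[0])
    out = [0.0] * n_samples
    for y_m, k_m, w_m in zip(channels, k, weights):
        for n in range(n_samples):
            src = n - k_m
            if 0 <= src < n_samples:
                out[n] += w_m * y_m[src]
    return out

# Hypothetical example: the same impulse reaches mic 1 two samples before mic 0.
f_o = 16_000.0                 # assumed PCM sampling rate
delays = [0.0, 2 / f_o]        # mic 1 must be delayed by two samples to align with mic 0
k = integer_delays(delays, f_o)
ch0 = [0, 0, 0, 1, 0, 0]
ch1 = [0, 1, 0, 0, 0, 0]
y = das_beamformer([ch0, ch1], k, [0.5, 0.5])
```

With averaging weights of 0.5, the two aligned impulses add up to a single unit sample at index 3, which is the reinforcement effect the algorithm relies on.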
In the case of PDM-mics, Equation (1) can be represented as shown in Figure 1, such that $y_m$ is the decimation filter's output and $x_m$ is the PDM bitstream coming from the respective $m$th PDM-mic. Due to the integer nature of $k_m$, the DAS beamformer does not allow one to form sums that involve noninteger multiples of $T$. Consequently, beams cannot be steered in arbitrary directions, resulting in a directivity pattern with a stepped response, which limits the beamformer resolution (as exemplified in Figure 2). Also, if one assumes uncorrelated noise at the locations of the sensors and that the beamformer's delays are appropriately matched to the wave's DoA, it can be proven [4] that the beamformer gain ($G$) depends only on the weights $w_m$ and the number of microphones:

$$G = \frac{\left|\sum_{m=1}^{M} w_m\right|^2}{\sum_{m=1}^{M} |w_m|^2}, \tag{3}$$

so that, for the beamformer in Figure 2, with $M = 40$ and $w_m = 1$, the white noise gain will be $G = 40$ or 32 dB. Furthermore, the dynamic range depends only on the number of elements in the array. The array used for the current example provides a dynamic range of 13 dB.
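As a quick numerical check of the gain figures quoted above, the sketch below assumes the standard white-noise-gain expression for real weights, G = (Σ w_m)² / Σ w_m², which reduces to G = M for uniform weights. The function name is hypothetical.

```python
import math

def white_noise_gain(weights):
    """G = (sum w_m)^2 / sum(w_m^2) for real weights (assumed standard expression)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

g = white_noise_gain([1.0] * 40)      # M = 40, w_m = 1, as in the text
g_db = 20 * math.log10(g)             # the 32 dB quoted in the text matches 20*log10(G)
```

Non-uniform weights (for example, a taper used to lower sidelobes) give a value below M, which is why the gain depends on the weights as well as on the number of microphones.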

Beamformer Based on Delayed Decimation Filter
Figure 1 describes a typical architecture for implementing DAS beamformers with PDM-mics. For each PDM-mic, there is an associated decimation filter to convert the PDM bitstream into a PCM bitstream and a delay line to steer the beamformer. To devise a more economical implementation of this architecture, we propose to merge the decimation filtering and the delaying operations into a single structure. To explain how a Samadi filter can be used for this purpose, we first review the concept of multirate and decimation filters, present the Samadi filter structure, show how it can be used as a multirate filter, and finally propose a new beamforming architecture based on this multirate filter (delayed decimation filter).

Multirate and Decimation Filters
Multirate filters are digital filters whose different parts operate at different rates. The most obvious application of such a filter is when the input and output sample rates must differ (decimation or interpolation). A decimation filter is a class of multirate filters [5] that decreases a signal's sampling rate by an integer or fractional factor. Figure 3 shows a generic decimation filter structure, where the input signal at sampling rate $f_i$ passes through a low-pass filter (LPF) with impulse response $H(z)$ and is then downsampled by a factor $R$ to an output sampling rate $f_o = f_i/R$. In the case of a PDM-mic, $x[n]$ usually has a one-bit width only, while $y[k]$ is a multi-bit output. For a given application, there are many design parameters to be taken into account for the LPF design, such as the filter passband frequency $F_p$, stopband frequency $F_s$, passband ripple $\delta_p$, and stopband ripple $\delta_s$, as exemplified in Figure 4. Those LPF design parameters are related as follows:

$$1 - \delta_p \le \left|H\!\left(e^{j 2\pi f / f_i}\right)\right| \le 1 + \delta_p, \quad f \in U_p = [0, F_p]; \qquad \left|H\!\left(e^{j 2\pi f / f_i}\right)\right| \le \delta_s, \quad f \in U_s = [F_s, f_i/2], \tag{4}$$

where $U_p$ and $U_s$ are the passband and stopband frequency ranges, respectively. Also, the angular passband and stopband frequencies can be expressed as

$$\omega_p = \frac{2\pi F_p}{f_i}, \qquad \omega_s = \frac{2\pi F_s}{f_i}, \tag{5}$$

and the $U_p$ and $U_s$ intervals can be scaled to the angular frequency domain as

$$V_p = [0, \omega_p], \qquad V_s = [\omega_s, \pi]. \tag{6}$$

Figure 4. Low-pass filter design parameters. The passband and stopband regions are defined by $F_p$ and $F_s$, respectively, and their respective ripples are defined by $\delta_p$ and $\delta_s$. The whole filter frequency response is constrained to the input sampling rate ($f_i$).
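The generic structure of Figure 3 (low-pass filter followed by downsampling by R) can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the Hamming-windowed sinc design, the 65 taps, and R = 8 are arbitrary choices, not the filters designed in this paper, and the one-bit input is represented as a constant stream of +1.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc FIR; cutoff is normalized to the input Nyquist rate (0..1)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = cutoff if t == 0 else math.sin(math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [c / gain for c in taps]           # normalize to unity gain at DC

def decimate(x, taps, r):
    """Filter x with the FIR taps and keep every r-th output sample (f_o = f_i / r)."""
    out = []
    for n in range(0, len(x), r):             # compute only the samples that survive
        acc = 0.0
        for j, c in enumerate(taps):
            if 0 <= n - j < len(x):
                acc += c * x[n - j]
        out.append(acc)
    return out

R = 8                                          # illustrative decimation factor
taps = windowed_sinc_lowpass(num_taps=65, cutoff=1.0 / R)
y = decimate([1.0] * 256, taps, R)             # constant one-bit stream mapped to +1
```

Because only every R-th output is evaluated, the filter effectively runs at the output rate, which is the usual motivation for polyphase and multi-stage decimator implementations.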
In the case of audio sensors such as MEMS microphones, a decimation filter is required to convert the oversampled output from the internal ADC to a standard audio PCM output. Baseband signal quality parameters such as linearity, signal-to-noise ratio (SNR), total harmonic distortion (THD), and total harmonic distortion plus noise (THD+N) can be worsened at the filter output if the LPF is not properly designed [6]. Also, the LPF structure should be carefully chosen to obtain a proper phase response. A Finite Impulse Response (FIR) structure, for example, can be used if a linear phase is required; otherwise, Infinite Impulse Response (IIR) filters are preferred, as IIR filters are usually smaller than their equivalent FIR implementations. Moreover, some applications tolerate some degree of non-linearity in phase; in this case, quasi-linear filters, a mixture of FIR and IIR filters, can be used.

Universal Maximally Flat Samadi Filter
As derived in [7], the transfer function of a Samadi filter is defined by (7), where $K$ is the number of zeros at $z = -1$, $N$ is the filter order, and the delay parameter $d$ is a real number defined as

$$d = \alpha - \frac{N}{2}. \tag{9}$$

For a given group delay $\alpha$, such that $0 \le \alpha \le N$, from (9), one can verify that

$$|d| \le d_{\max}, \tag{11}$$

where $d_{\max} = N/2$ is the maximum allowed delay parameter, and the binomial coefficients in (8) are defined as in (12).
This filter becomes a maximally flat (MAXFLAT) linear-phase FIR when $d = 0$. As shown in [8,9], the angular passband frequency ($\omega_p$) of these linear-phase filters is related to $N$ through $L$ and the quantity $N\omega_p/\pi + 0.5$ (13), where $L$ is defined for convenience in (14). The cutoff frequency of these linear-phase filters increases almost linearly with $L$, as shown in Figure 5 for different values of $N$. Also, as demonstrated in [7] and shown in Figure 5a, for linear-phase filters ($d = 0$), the coefficient of (7) is given by (15). Then, the magnitude frequency spectra of $L = 2j$ and $L = 2j + 1$ are the same for $j \in \{0, \ldots, N/2 - 1\}$.
On the other hand, when $d \neq 0$, the Samadi filter becomes a MAXFLAT nonlinear-phase filter. The most interesting characteristic of this filter class is the ability to modify its group delay with the filter delay parameter ($d$), as given by (9).
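The linear-phase case can be checked numerically without the full Samadi coefficient formulas. The sketch below uses a deliberately simple stand-in, the pure binomial filter (1 + z⁻¹)^N, which is a maximally flat low-pass with all N zeros at z = −1; it is not the general Samadi design. The group delay is evaluated with the standard identity τ(ω) = Re{ F[n·h[n]] / F[h[n]] }, and the function names are hypothetical.

```python
import cmath
import math

def binomial_lowpass(order):
    """Taps of (1 + z^-1)^order, normalized to unity gain at DC."""
    return [math.comb(order, n) / 2 ** order for n in range(order + 1)]

def group_delay(taps, omega):
    """Group delay in samples at angular frequency omega (rad/sample)."""
    num = sum(n * h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps))
    den = sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps))
    return (num / den).real

N = 10
h = binomial_lowpass(N)
tau = [group_delay(h, w * math.pi) for w in (0.0, 0.05, 0.15, 0.3)]
```

For this symmetric filter, every entry of `tau` equals N/2 = 5 samples, which is the behaviour expected for d = 0; a nonzero d would shift the passband group delay away from N/2.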
Figure 6 shows how the flatness of the magnitude and phase of the filter's frequency response is affected when $d$ increases: the passband ripple $\delta_p$ worsens as $d$ increases. However, it is also shown that the phase is still linear inside the passband region for $\omega < 0.15\pi$ and that the decimation filter continues under the same specification for all values $|d| \le 5$. This suggests that this filter can be used as an intermediary stage in a multirate filter chain to adjust the overall filter delay ($\Delta$) and perform low-pass filtering at the same time, as discussed in the following sections.
Finally, we propose Algorithm 1 to calculate the minimum $K$ and $N$ Samadi filter values for a given $d$, matching a given filter specification with the following parameters: $V_p$, $V_s$, $\delta_p$, and $\delta_s$. In lines 2-4, the algorithm initializes the $\omega_p$, $L$, and $N$ values to the minimum possible ones. Then, in line 5, it starts to iterate to calculate the minimum $K$ and $N$ values. In line 6, $K$ is updated. In lines 7-8, $\delta_p$ and $\delta_s$ are calculated from the filter frequency response for the $V_p$ and $V_s$ ranges, respectively, and for the current $K$ and $N$ values. If $\delta_p$ and $\delta_s$ meet the specification, it returns the parameter values in line 10. Otherwise, in lines 12-26, it increases the $N$ or $L$ value, depending on the $d$ weight or on whether the filter parameters are inside the ranges defined in (13)-(15).
Algorithm 1. Samadi filter minimum N and K calculation algorithm.
Also, it is essential to remark that, if the Samadi filter is designed for $d_{\max}$, the decimation filter continues under the same specification for values $|d| \le d_{\max}$. This effect can be observed in Figure 6a, where $\delta_p$ decreases for lower values of $d$, and in Figure 7, where, for $d \ge 3$, if $N$ is kept constant and $d$ is decreased, $\omega_p$ tends to increase so that the flatness is improved.

Delayed Decimation Filter
Because of its configurable group delay property, a single Samadi filter could be used as the LPF of a multirate filter with adjustable overall filter delay, as shown in Figure 8a; this structure is dubbed the delayed decimation filter in this paper. However, as a Samadi filter does not have the flexibility to be designed for specific $F_p$ and $F_s$ values without changing other filtering parameters, its frequency response needs to be compensated to keep the overall decimation filter's parameters under specification for different delay values ($d$). For this reason, we propose a J-stage decimation filter architecture whose penultimate stage (J − 1) is a Samadi filter and whose last stage (J) is an equiripple filter, as shown in Figure 8b. The Samadi filter can then be decomposed into its binomial components, as shown in Figure 8c.
The Samadi filter controls the overall filter delay ($\Delta$) by setting its respective $d$ parameter, and the last equiripple stage compensates for the magnitude and phase distortion caused by the Samadi filter under a specified limit. Also, as this is a multi-stage filter, other filtering stages (1 to J − 2) can be optionally added to help with decimation and filtering.
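The overall filter delay $\Delta$ depends on the $d$, $R_{J-1}$, and $R_J$ parameters in such a way:

$$\Delta = \frac{d}{R_J R_{J-1} f_o}. \tag{16}$$

If we replace (16) in (11), it is observed that the maximum required delay ($\Delta_{\max}$) is limited by the $d_{\max}$ parameter as follows:

$$|\Delta| \le \Delta_{\max} = \frac{d_{\max}}{R_J R_{J-1} f_o}. \tag{17}$$

Therefore, since $d_{\max} = \Delta_{\max} R_J R_{J-1} f_o$, the minimum $K$ and $N$ parameters can be calculated using Algorithm 1 for $d = d_{\max}$ and the desired filter specification parameters ($V_p$, $V_s$, $\delta_p$, and $\delta_s$).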
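The relation stated in the text, d_max = Δ_max · R_J · R_{J−1} · f_o, amounts to expressing a delay in samples at the input rate of the Samadi stage. A minimal sketch of the conversion in both directions follows; the function names, the 16 kHz output rate, and the 250 µs delay are assumptions chosen for illustration.

```python
def delay_to_d(delay_s, r_last, r_penultimate, f_o):
    """Delay parameter d for a delay in seconds: d = delta * R_J * R_{J-1} * f_o."""
    return delay_s * r_last * r_penultimate * f_o

def d_to_delay(d, r_last, r_penultimate, f_o):
    """Inverse mapping: delay in seconds produced by a given d."""
    return d / (r_last * r_penultimate * f_o)

f_o = 16_000.0                                # assumed output (PCM) rate
d = delay_to_d(250e-6, 2, 2, f_o)             # a hypothetical 250 microsecond steering delay
pcm_samples = 250e-6 * f_o                    # the same delay counted in output samples
```

Here the delay corresponds to d = 16 at the Samadi stage input but to 4 samples at the output rate; since d is a real number, delays that are not whole output samples can also be represented, unlike with the integer delay line of the traditional DAS beamformer.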

Optimized Beamformer Structure
Since the Samadi filter is a binomial filter sequence (as first proposed by Haddad in [10]), (7) can be rearranged to allow the filter to be expressed as in (18). The binomial filter in Equation (18) can be realized as a cascade of two filters, $B_{N,K,d}(z)$ and $A_N(z)$:

$$H_{N,K,d}(z) = B_{N,K,d}(z)\, A_N(z). \tag{19}$$

The Samadi filter stage in a delayed decimation filter in Figure 8c can be expressed in its binomial representation in such a way that the latter part of the filter chain does not depend on $\Delta$, as $d$ is used only for the calculation of $c_j$. Therefore, if $M$ delayed decimation filters are placed in parallel, the weightings by $w_m$ are placed just before the $A_N(z)$ filter, and their outputs are added to form a beamformer. Note that the latter part after $B_{N,K,d}(z)$ can be shared between all microphone channels, as shown in Figure 9.
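The sharing argument rests on linearity: filtering every channel with a common tail and then summing gives the same output as summing first and filtering once. The sketch below checks this with small integer stand-ins; the tap values are hypothetical placeholders for the delay-dependent and delay-independent parts of the filter, not actual Samadi coefficients.

```python
def convolve(a, b):
    """Full linear convolution of two coefficient or sample lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def weighted_sum(signals, weights):
    """Element-wise sum of equally long signals, each scaled by its weight."""
    return [sum(w * s[n] for s, w in zip(signals, weights))
            for n in range(len(signals[0]))]

# Hypothetical stand-ins; integer taps keep the comparison exact.
a_shared = [1, 3, 3, 1]                       # plays the role of the shared A_N(z) part
b_per_channel = [[1, 0, -1], [2, 1, 0]]       # play the role of B_{N,K,d}(z), one per microphone
x = [[1, -1, 1, 1, -1], [-1, -1, 1, -1, 1]]   # one-bit inputs mapped to +/-1
w = [1, 1]

# M complete filters: the common part is applied inside every channel.
per_channel = weighted_sum(
    [convolve(a_shared, convolve(b, xm)) for b, xm in zip(b_per_channel, x)], w)

# Shared tail: weight and sum right after B, then run the common part once.
shared = convolve(a_shared, weighted_sum(
    [convolve(b, xm) for b, xm in zip(b_per_channel, x)], w))
```

Both results are identical, but the second form runs the common filter once instead of M times, which is where the storage and computation savings of a shared structure come from.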

Proof of Concept
We now evaluate the proposed architecture. We determine the delayed decimation filter parameters for a given specification and compare the proposed architecture to state-of-the-art DAS beamformer architectures.

Decimation Filter Specifications
Filter specifications and array geometries change depending on the beamformer application. Therefore, to compare the efficiency between the proposed method and the straightforward DAS beamformer implementation, we use the specification shown in Table 1 as the basis of all our decimation filter designs, as it is considered enough for most PDM-mic types and speech-processing applications.

Beamformer Specification
The delay from the array center to the mth microphone (∆ m ) in an array is constrained to where xmax is the furthest sensor location in relation to xc (which is the array's center reference), M is the number of microphones, and c is the sound speed (typically 343.0 m/s).
Assume that we require a microphone array for hands-free applications that, when placed 80 cm from the voice source, would attain the same SNR as the SNR obtained by a single microphone placed 2 cm from the same source [11].Then, by (3), the desired microphone array requires M = 40 microphones.
Also, as the minimum distance between microphones should be $D_{\min} \le c/(2F_p)$ to avoid spatial aliasing, if the frequency range is limited to $F_p = 7.5$ kHz, then the desired microphone array will require $D_{\min} \le 2$ cm. Finally, as $M = 40$, if a 5 × 8 microphone array is assumed, then $\Delta_{\max}$ can be calculated using (22), with the resulting value shown in Table 2.
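The spacing bound quoted above follows directly from the half-wavelength rule. A short check, using the sound speed and passband frequency given in the text (the function name is an assumption):

```python
def max_spacing(c, f_p):
    """Largest microphone spacing (m) that avoids spatial aliasing up to f_p: c / (2 * f_p)."""
    return c / (2.0 * f_p)

limit_m = max_spacing(343.0, 7500.0)   # c = 343 m/s, F_p = 7.5 kHz
spacing_ok = 0.02 <= limit_m           # the 2 cm spacing used in the text
```

The bound evaluates to roughly 2.29 cm, so the 2 cm spacing adopted for the example array satisfies it with some margin.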

Filter Design
A delayed decimation filter was designed according to the specifications listed in Table 1. The filter has a three-stage architecture ([lthband, maxflat, equir]) with respective decimation rates [48, 2, 2]. The lthband stage is an LPF whose cutoff frequency is $\pi/L$ and whose impulse response is zero for every $L$-th sample [5]. The second stage is a maxflat Samadi filter, and the last is an equiripple filter [12]. As $R_J = R_{J-1} = 2$, by (17), $d_{\max} = 20.13$; the parameters $N$ and $K$ of the maxflat stage are calculated using Algorithm 1 so that the overall filter specification is kept for all $|d| \le 20.13$.
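Two properties mentioned above can be illustrated with a short sketch: the sampling rate seen by each stage for the decimation rates [48, 2, 2], and the zero-valued taps of an L-th band low-pass filter. The 16 kHz output rate, the Hamming-windowed sinc design, and the filter length are assumptions made for illustration; they are not the Table 1 values or the filters designed in this work.

```python
import math

def stage_input_rates(f_o, rates):
    """Input sampling rate of every stage (first stage first) for a given output rate."""
    out = []
    f = f_o
    for r in reversed(rates):
        f *= r
        out.append(f)
    return list(reversed(out))

def lth_band_taps(L, half_len):
    """Hamming-windowed sinc with cutoff pi/L; the taps at nonzero multiples of L vanish."""
    taps = []
    for n in range(-half_len, half_len + 1):
        ideal = 1.0 / L if n == 0 else math.sin(math.pi * n / L) / (math.pi * n)
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_len)
        taps.append(ideal * window)
    return taps

rates = [48, 2, 2]                       # [lthband, maxflat, equir]
f_in = stage_input_rates(16_000.0, rates)

L, half_len = 48, 96
taps = lth_band_taps(L, half_len)
zero_taps = [taps[half_len + k * L] for k in (-2, -1, 1, 2)]
```

Under the assumed output rate, the Samadi (maxflat) stage would run at 64 kHz, which is why the delay parameter d counts samples at four times the output rate. The vanishing taps of the L-th band stage need no multiplications, one reason such filters are attractive for a high-rate first stage.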
Figure 10a shows the individual frequency spectrum of each internal stage for $d_{\max} = 20.13$, and Figure 10b zooms in on the passband frequency region. Note that even though the maxflat stage has a bumpy frequency spectrum above the passband frequency ($F_p$), this is compensated by the last-stage equiripple filter (equir). Figure 11a also shows that the magnitude in the overall frequency spectrum of the delayed decimation filter is inside the required passband and stopband filter specifications, while Figure 11b,c show that the filter phase and magnitude response is almost linear in the passband range.
The advantage of using a Samadi filter is that it allows one to change its group delay by changing some coefficients, i.e., without changing the whole filter structure. Figure 12 shows the group delay of this multi-stage filter for many values of its $d$ parameter. It is easy to see how the group delay is directly proportional to the $d$ parameter. Table 3 shows the resources required to implement a DAS beamformer based on this three-stage delayed decimation filter designed for the array specifications listed in Table 2, and Table 4 shows the breakdown of resources required per filter stage.

Results
Results from Table 3 are compared to other state-of-the-art DAS beamformer architectures (more details in [13]) in Table 5.
The pcm_multi architecture is the same as shown in Figure 1 but uses a multi-stage decimation filter structure for each channel. It has a higher beamformer storage requirement and more additions per second because of the parallel architecture for delaying and filtering.
The pcm_single_memsav architecture is also the same as shown in Figure 1 but uses a single-stage decimation filter with a memory-saving polyphase implementation [14] for each channel. This architecture has the lowest beamformer storage requirement because of the polyphase implementation. Conversely, it also has the most additions per second because more operations are performed at higher sampling rates before downsampling.
The pdm_multi architecture is the same as shown in Figure 13 but uses a multi-stage decimation filter structure at the output. It is the most efficient state-of-the-art architecture because only a single decimation filter is required, and the delaying operations require only a few bits per channel.
The pdm_single_memsav architecture is also the same as shown in Figure 13 but uses a single-stage decimation filter with a memory-saving polyphase implementation [14] at the output. It has a lower beamformer storage requirement because of the polyphase implementation, but, conversely, it also requires more additions per second because more operations are performed at higher sampling rates before downsampling.
Table 5 shows that, for the given specification and because of the shared resources for delaying and filtering, the proposed architecture (delayed_bf) requires about 19% lower computational resources (additions per second) and 52% lower storage (beamformer's storage requirement) than the most efficient state-of-the-art architecture (pdm_multi).
It is also observed that the proposed architecture's storage efficiency is ranked just after the pcm_single_memsav architecture.However, as the pcm_single_memsav architecture also requires a prohibitive quantity of computational resources (about 697% more), it can be concluded that the proposed beamformer based on delayed decimation filters is the most resource-efficient beamformer architecture for the given specification.
Finally, we see that, because of the lowest computational resources requirement, in practical cases such as implementing the beamformer either in a single-core/single-adder CPU, in a Field-Programmable Gate Array (FPGA) running at 64 MHz, or in an integrated circuit (VLSI) running at 10 MHz, the proposed architecture will be, in all cases, about 19% more efficient.
Table 5. Comparison of the proposed beamformer architecture based on delayed decimation filter (delayed_bf) and other state-of-the-art beamformer architectures implementing a DAS beamformer, as specified in Tables 1 and 2. All percentages are related to the respective value for the pdm_multi architecture, the most efficient state-of-the-art architecture found for the given specification [13].

Conclusions
In this study, we proposed combining the decimation filters found in PDM-mics with the delay line required in the traditional DAS beamformer. This was achieved by designing a decimation filter that includes a stage realized with the Samadi filter structure, which easily allows its group delay to be altered by varying a single parameter.
We evaluated the proposed architecture by comparing it to other state-of-the-art DAS beamformer architectures. To facilitate the comparison, we established a set of filter specifications as a baseline for all decimation filter designs. These specifications were sufficient for various PDM-mics and speech-processing applications.
The designed filter demonstrated satisfactory performance, as exemplified in the frequency response and group delay plots. Furthermore, using a Samadi filter provided flexibility in adjusting the group delay without altering the overall filter structure.
Overall, the proposed architecture showed promising results in both filter design and resource requirements, providing the best trade-off between storage and computational resources. For the presented specification, it requires 52% lower storage resources and 19% lower computational resources than the most efficient state-of-the-art architecture. The findings support the feasibility and effectiveness of the proposed approach for beamforming applications including, but not limited to, DAS beamformers.

Figure 1. PDM microphones' DAS beamformers. Each PDM-mic requires a decimation filter with H(z) frequency response and R downsampling. Then, each filter output y m [k] is delayed by a ∆ m factor. Finally, all delayed signals are weighted (factor w m ) and summed together.

Figure 2. Normalized power (polar) of a uniform linear array of an M = 40 microphones DAS beamformer. Three audio sources of 1 kHz, 3 kHz, and 5 kHz are located at 20, 60, and 110 degrees, respectively, all three with equal strength. The beamformer is placed on the X-axis. Therefore, its directivity pattern is symmetric about this axis.

Figure 3. Generic decimation filter structure. In order to avoid aliasing, the input data x[n] at f i sampling rate is low-pass filtered and then downsampled by R. If correctly filtered, the output data y[k] at f o sampling rate contain the same information as x[n] decimated by R.

Figure 5 also shows that the filter has a linear phase and that the group delay for d = 0 is α = N/2, as expected by (9).

Figure 5. Normalized frequency spectra of linear-phase Samadi filters (d = 0) with N = 9 and N = 12: (a) magnitude, (b) phase, and (c) group delay. It is observed that, in the d = 0 case, ω p changes linearly with L, that the phase is linear for both N values, and that the group delay is proportional to N.

Figure 6. Normalized frequency spectra of Samadi filters with N = 10 and d ∈ {−5, . . ., 5}: (a) magnitude, (b) phase, and (c) group delay. It is observed that, approximately until ω/π < 0.15, the magnitude is flat, the phase is linear, and the group delay is proportional to d. For ω/π ≥ 0.15, the frequency response is nonlinear in magnitude, phase, and group delay.

Figure 7 shows minimum N and K values, calculated using Algorithm 1 for d ∈ {0, . . ., 26} and different values of ω p . It is shown that the minimum N, required for any d, decreases with ω p increments, and it is almost three times d when ω p /π = 0.28.

Figure 7. Minimum (a) N and (b) K values calculated using Algorithm 1 for δ s = −80 dB and different values of d and ω p . It is observed that ω p and d have a negative correlation for a given N value, i.e., when ω p increases, d decreases.

Figure 8. (a) Delayed decimation filter, (b) its version as a multi-stage decimation filter with the J − 1 stage being a Samadi filter, and (c) its version with the Samadi filter decomposed into its binomial components. The Samadi filter stage is meant to control the overall filter delay (∆) and the equiripple filter to compensate for the non-linear response of the Samadi filter in its non-flat band. The optional Stages 1 to J − 2 are meant to compensate and downsample the overall frequency response.

Figure 10. (a) Magnitude frequency spectrum of internal stages of the delayed decimation filter in the whole input range, and (b) the same frequency spectrum in the 0 kHz to 50 kHz range.

Figure 13. PDM microphones' DAS beamformer at PDM domain. Each PDM-mic output x m [n] is delayed by a ∆ m factor, then all delayed signals are weighted (factor w m ) and summed together. Finally, the resulting sum is filtered and downsampled.

Table 3. Required resources to implement a beamformer using 40 shared delayed decimation filters.

Table 4. Delayed decimation filter resource requirements breakdown. The first row corresponds to the Lth-band filter stage, the second and third ones to the B N,K,d (z) and A N (z) parts of the Samadi filter, respectively, and the last one to the equiripple filter.
H N,K,d (z)    Samadi filter impulse response.
H(z)    low-pass filter impulse response.
S z bf    beamformer's storage requirement.
K    number of zeros at z = −1 in a Samadi filter.
L frame    frame length (for frequency domain implementations).