# Audlet Filter Banks: A Versatile Analysis/Synthesis Framework Using Auditory Frequency Scales

## Abstract


## Featured Application

**The proposed framework is highly suitable for audio applications that require analysis–synthesis systems with the following properties: stability, perfect reconstruction, and a flexible choice of redundancy.**


## 1. Introduction

## 2. Preliminaries

#### 2.1. Notations and Definition

#### 2.2. Filter Banks and Frames

#### 2.3. Auditory Frequency Scales

## 3. The Proposed Approach

#### 3.1. Analysis Filter Bank

#### 3.1.1. Construction of the Set $\left\{{f}_{k}\right\}$

#### 3.1.2. Construction of ${H}_{0}$ and ${H}_{K}$

**Remark 1.**

#### 3.1.3. Construction of the Set $\left\{{d}_{k}\right\}$

#### 3.2. Invertibility Test

- An eigenvalue analysis of the linear operator corresponding to analysis with ${({H}_{k},{d}_{k})}_{k}$ followed by synthesis with the same FB ${({H}_{k},{d}_{k})}_{k}$. The frame bounds A and B correspond to the smallest (infimum) and largest (supremum) eigenvalues of the resulting operator, respectively. The largest eigenvalue can be estimated by numerical methods with reasonable efficiency, but estimating the smallest eigenvalue directly is computationally very expensive. In the next section, we discuss an alternative method that consists in approximating the inverse operator and estimating its largest eigenvalue, the reciprocal of which is the desired lower frame bound A (see also Section 5 for an example frame bounds analysis).
- Computation of A and B directly from the overall FB response, i.e., verification that $0<A\le {\mathcal{H}}_{0}(\xi )\le B<\infty $ for some constants $A,B$ and almost every $\xi \in (-1/2,1/2]$.
- Checking whether the overall aliasing is dominated by ${\mathcal{H}}_{0}$, i.e., whether there exist $0<{A}_{0}\le {B}_{0}<\infty $ that satisfy$${A}_{0}\le {\mathcal{H}}_{0}(\xi )\pm \sum _{j=1}^{D-1}\left|{\mathcal{H}}_{j}(\xi )\right|\le {B}_{0}$$for almost every $\xi \in (-1/2,1/2]$.
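In the painless case, the frame bounds can be read off directly from the extrema of the total FB response. The following NumPy sketch illustrates this under the standard assumption (not restated explicitly here) that the total response is $\mathcal{H}_0(\xi) = \sum_k d_k^{-1}\left|H_k(\xi)\right|^2$ sampled on a common frequency grid:

```python
import numpy as np

def painless_frame_bounds(filters, downsampling):
    """Estimate frame bounds (A, B) of a painless filter bank.

    `filters`: list of frequency responses H_k sampled on a common grid.
    `downsampling`: list of downsampling factors d_k.
    In the painless case, A and B are the extrema of the total response
    H0(xi) = sum_k |H_k(xi)|^2 / d_k over the frequency grid.
    """
    H0 = np.zeros(len(filters[0]))
    for Hk, dk in zip(filters, downsampling):
        H0 += np.abs(np.asarray(Hk)) ** 2 / dk
    return H0.min(), H0.max()
```

A ratio $B/A$ close to 1 then indicates a snug (nearly tight) frame, as in the evaluation of Section 5.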

#### 3.3. Synthesis Stage

**Algorithm 1** Synthesis by means of conjugate gradients

1. Initialize ${({H}_{k},{d}_{k})}_{k}$, ${({y}_{k})}_{k}$
2. ${x}_{0}\in {\ell}_{2}(\mathbb{Z})$ (arbitrary)
3. $j=0$ and $\epsilon >0$ (error tolerance)
4. ${\mathcal{H}}_{0}\leftarrow {\sum}_{k}{d}_{k}^{-1}{\left|{H}_{k}\right|}^{2}$
5. **for** $k=0,\dots ,K+1$ **do**
6. $\quad {G}_{k}\leftarrow {H}_{k}/{\mathcal{H}}_{0}$
7. **end for**
8. $b\leftarrow \tilde{\mathcal{S}}({({y}_{k})}_{k},{({G}_{k},{d}_{k})}_{k})$
9. ${r}_{0}\leftarrow b-\tilde{\mathcal{S}}(\mathcal{A}({x}_{0},{({H}_{k},{d}_{k})}_{k}),{({G}_{k},{d}_{k})}_{k})$
10. ${p}_{0}\leftarrow {r}_{0}$
11. **while** $\Vert {r}_{j}\Vert >\epsilon $ **do**
12. $\quad {q}_{j}\leftarrow \tilde{\mathcal{S}}(\mathcal{A}({p}_{j},{({H}_{k},{d}_{k})}_{k}),{({G}_{k},{d}_{k})}_{k})$
13. $\quad {a}_{j}\leftarrow {\Vert {r}_{j}\Vert}^{2}/\langle {p}_{j},{q}_{j}\rangle $
14. $\quad {x}_{j+1}\leftarrow {x}_{j}+{a}_{j}{p}_{j}$
15. $\quad {r}_{j+1}\leftarrow {r}_{j}-{a}_{j}{q}_{j}$
16. $\quad {b}_{j}\leftarrow {\Vert {r}_{j+1}\Vert}^{2}/{\Vert {r}_{j}\Vert}^{2}$
17. $\quad {p}_{j+1}\leftarrow {r}_{j+1}+{b}_{j}{p}_{j}$
18. $\quad j\leftarrow j+1$
19. **end while**
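Stripped of the filter bank operators, Algorithm 1 is a standard conjugate-gradient iteration for a symmetric positive definite system. The following NumPy sketch shows its core; `apply_S` is a placeholder callable standing for analysis with ${({H}_{k},{d}_{k})}_{k}$ followed by synthesis with the preconditioned FB ${({G}_{k},{d}_{k})}_{k}$:

```python
import numpy as np

def cg_solve(apply_S, b, x0=None, tol=1e-10, maxit=200):
    """Conjugate gradients for S x = b, S symmetric positive definite.

    apply_S: callable computing S @ x (here: analysis followed by
    preconditioned synthesis, as in steps 8-12 of Algorithm 1).
    """
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - apply_S(x)          # initial residual r_0
    p = r.copy()                # initial search direction p_0
    rs = np.dot(r, r)
    for _ in range(maxit):
        if np.sqrt(rs) <= tol:  # while ||r_j|| > eps
            break
        q = apply_S(p)
        a = rs / np.dot(p, q)   # step length a_j
        x = x + a * p
        r = r - a * q
        rs_new = np.dot(r, r)
        p = r + (rs_new / rs) * p  # b_j = ||r_{j+1}||^2 / ||r_j||^2
        rs = rs_new
    return x
```

The preconditioning with ${G}_{k}={H}_{k}/{\mathcal{H}}_{0}$ clusters the eigenvalues of the iterated operator near 1, which is what makes the iteration converge quickly [53,55].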

## 4. Implementation

#### 4.1. Practical Issues

#### 4.2. Code

The Audlet analysis FB is constructed by the function `audfilters`. The function allows the construction of uniform or non-uniform Audlet FBs with integer or rational downsampling factors, thus offering flexibility in FB design. Rational downsampling factors can be achieved in the time domain by properly combining upsamplers and downsamplers (e.g., [19]). In LTFAT, the sampling rate changes are performed directly in the frequency domain by periodizing and folding the ${Y}_{k}(z)$’s and then computing an inverse $\mathrm{DFT}$ [63]. This technique achieves rational downsampling factors at low computational cost. The desired number of channels in the frequency range $[{f}_{\mathrm{min}},{f}_{\mathrm{max}}]$ can be set by specifying either K or V. The function `audfilters` also accepts the parameters $\mathrm{Scale}$, $\beta $, w, and ${R}_{\mathrm{t}}$. Currently, three scales are available: ERB (the default), Bark, and Mel. Possible choices of w include (but are not limited to) Hann (default), Blackman, Nuttall, gammatone, and Gaussian. If ${R}_{\mathrm{t}}$ is specified, ${c}_{bw}$ is inferred from ${R}_{\mathrm{t}}$ according to (15)–(18); otherwise, ${c}_{bw}=1$.

The analysis of a signal is performed by `filterbank`. The synthesis is performed by `ifilterbankiter`, which implements Algorithm 1. In the painless case, a more computationally efficient synthesis can be achieved by first computing the exact synthesis FB with `filterbankdual` and then synthesizing the signal with `ifilterbank`. The function `filterbankdual` can also be used to check whether a given analysis FB qualifies for the painless case.
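The frequency-domain resampling idea can be illustrated for an integer factor: folding (periodizing) the DFT of a sub-band signal and taking a shorter inverse DFT is equivalent to time-domain downsampling. A minimal NumPy sketch (not LTFAT's implementation, which also handles the rational case):

```python
import numpy as np

def downsample_freq(x, d):
    """Downsample x by an integer factor d in the frequency domain:
    fold (periodize) the spectrum into d copies, then take a shorter
    inverse DFT. Equivalent to x[::d] when d divides len(x)."""
    X = np.fft.fft(x)
    N = len(X)
    assert N % d == 0, "illustration assumes d divides the signal length"
    Xd = X.reshape(d, N // d).sum(axis=0) / d  # spectral folding
    return np.fft.ifft(Xd)
```

Since the sub-band signals are already available in the frequency domain after filtering, this folding step comes almost for free, which is the source of the low computational cost mentioned above.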

#### 4.3. Computational Complexity

## 5. Evaluation

- The construction of uniform and non-uniform gammatone FBs and the examination of their stability and reconstruction properties at low and high redundancies. For this purpose, we replicated the simulations described in [44] (Section IV), which we consider the state of the art.
- The construction of various analysis–synthesis systems and their use for sub-band processing. For this purpose, we considered the example application of audio source separation because it is intuitive, clear, and easily demonstrates the behavior of the system when an audio signal is modified. In this application, we assess the effects of perfect reconstruction, of the bandwidth and shape of the filters, and of the auditory scale on the quality of sub-band processing.
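The source-separation experiment relies on binary masking of the sub-band components (cf. Figure 5). As an illustration of the principle, not of the exact masking rule used in the text, one common oracle rule keeps a mixture coefficient when the target source carries more than half of the mixture energy in that bin:

```python
import numpy as np

def binary_mask_separate(mix_subbands, target_subbands):
    """Toy sub-band binary masking: keep a mixture coefficient when
    the target is the dominant source in that time-frequency bin.
    Inputs are lists of per-channel coefficient arrays."""
    masked = []
    for ym, yt in zip(mix_subbands, target_subbands):
        ym, yt = np.asarray(ym), np.asarray(yt)
        mask = (np.abs(yt) ** 2 > 0.5 * np.abs(ym) ** 2).astype(float)
        masked.append(mask * ym)
    return masked
```

Feeding the masked coefficients to the synthesis stage then yields the separated target; with a perfect-reconstruction FB, any degradation is attributable to the mask itself rather than to the analysis–synthesis system.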

#### 5.1. Construction of Perfect-Reconstruction Gammatone FBs

#### 5.1.1. Method

#### 5.1.2. Results and Discussion

#### 5.2. Utility for Audio Applications

#### 5.2.1. Method

- **trev_gfb**: a state-of-the-art gammatone FB with approximate reconstruction (the acronym **trev** stands for “time reversal”). The ${H}_{k}$’s followed (7) with $w(\xi )={w}_{csGT,4,1.019}(\xi )$ (22) and a threshold $\epsilon ={10}^{-5}$. The synthesis filters were ${G}_{k}({e}^{2i\pi \xi})=\overline{{H}_{k}}({e}^{2i\pi \xi})$. This corresponds to the baseline system used in audio applications like [11,28,29].
- **Audlet_gfb**: an Audlet FB with a gammatone prototype. The ${H}_{k}$’s were computed as in **trev_gfb**, but the synthesis stage was Algorithm 1. This system allows a comparison to the baseline system and an assessment of the effect of perfect reconstruction.
- **Audlet_hann**: an Audlet FB with a Hann prototype. This system aims to assess the effect of filter shape.
- **STFT_hann**: an STFT using a 1024-point Hann window. Synthesis was achieved by the dual window [2]. The time step was adapted to match the desired redundancy ${R}_{\mathrm{t}}$. This corresponds to the baseline system used in most audio applications (e.g., [10,66]). This system aims to assess the use of an auditory frequency scale.
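The gammatone prototype used in trev_gfb and Audlet_gfb can be sketched as follows. This is an illustrative NumPy approximation of the classic fourth-order gammatone magnitude response, using the ERB formula of Glasberg and Moore [30] and the conventional bandwidth factor 1.019; the exact frequency-domain prototype $w_{csGT,4,1.019}$ is the one defined in (22) of the text:

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter
    at centre frequency f (Hz), after Glasberg and Moore [30]."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_magnitude(f, fc, order=4, alpha=1.019):
    """Approximate magnitude response of an order-n gammatone filter
    centred at fc (Hz); alpha * erb(fc) sets the bandwidth."""
    b = alpha * erb(fc)
    return (1.0 + ((f - fc) / b) ** 2) ** (-order / 2.0)
```

The response peaks at $f_c$ and decays polynomially away from it, which is what distinguishes the gammatone shape from the compactly supported Hann prototype of Audlet_hann.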

#### 5.2.2. Results and Discussion

## 6. Conclusions

## Supplementary Materials

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Flandrin, P. Time-Frequency/Time-Scale Analysis; Wavelet Analysis and Its Application; Academic Press: San Diego, CA, USA, 1999; Volume 10.
2. Gröchenig, K. Foundations of Time-Frequency Analysis; Birkhäuser: Boston, MA, USA, 2001.
3. Kamath, S.; Loizou, P. A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA, 13–17 May 2002; Volume 4.
4. Majdak, P.; Balazs, P.; Kreuzer, W.; Dörfler, M. A time-frequency method for increasing the signal-to-noise ratio in system identification with exponential sweeps. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011.
5. International Organization for Standardization. ISO/IEC 11172-3: Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbits/s, Part 3: Audio; Technical Report; ISO: Geneva, Switzerland, 1993.
6. International Organization for Standardization. ISO/IEC 13818-7: Generic Coding of Moving Pictures and Associated Audio: Advanced Audio Coding; Technical Report; ISO: Geneva, Switzerland, 1997.
7. International Organization for Standardization. ISO/IEC 14496-3/AMD-2: Information Technology—Coding of Audio-Visual Objects, Amendment 2: New Audio Profiles; Technical Report; ISO: Geneva, Switzerland, 2006.
8. Průša, Z.; Holighaus, N. Phase vocoder done right. In Proceedings of the 25th European Signal Processing Conference (EUSIPCO-2017), Kos Island, Greece, 28 August–2 September 2017; pp. 1006–1010.
9. Sirdey, A.; Derrien, O.; Kronland-Martinet, R. Adjusting the spectral envelope evolution of transposed sounds with Gabor mask prototypes. In Proceedings of the 13th International Conference on Digital Audio Effects (DAFx-10), Graz, Austria, 10 September 2010; pp. 1–7.
10. Leglaive, S.; Badeau, R.; Richard, G. Multichannel Audio Source Separation with Probabilistic Reverberation Priors. IEEE/ACM Trans. Audio Speech Lang. Process. 2016, 24, 2453–2465.
11. Gao, B.; Woo, W.L.; Khor, L.C. Cochleagram-based audio pattern separation using two-dimensional non-negative matrix factorization with automatic sparsity adaptation. J. Acoust. Soc. Am. 2014, 135, 1171–1185.
12. Unoki, M.; Akagi, M. A method of signal extraction from noisy signal based on auditory scene analysis. Speech Commun. 1999, 27, 261–279.
13. Bertin, N.; Badeau, R.; Vincent, E. Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription. IEEE Trans. Audio Speech Lang. Process. 2010, 18, 538–549.
14. Cvetković, Z.; Johnston, J.D. Nonuniform oversampled filter banks for audio signal processing. IEEE Trans. Speech Audio Process. 2003, 11, 393–399.
15. Smith, J.O. Spectral Audio Signal Processing. Online Book, 2011. Available online: http://ccrma.stanford.edu/~jos/sasp/ (accessed on 9 January 2018).
16. Akkarakaran, S.; Vaidyanathan, P. Nonuniform filter banks: New results and open problems. In Beyond Wavelets; Studies in Computational Mathematics; Elsevier: Amsterdam, The Netherlands, 2003; Volume 10, pp. 259–301.
17. Vaidyanathan, P. Multirate Systems and Filter Banks; Electrical Engineering, Electronic and Digital Design; Prentice Hall: Englewood Cliffs, NJ, USA, 1993.
18. Vetterli, M.; Kovačević, J. Wavelets and Subband Coding; Prentice Hall PTR: Englewood Cliffs, NJ, USA, 1995.
19. Kovačević, J.; Vetterli, M. Perfect reconstruction filter banks with rational sampling factors. IEEE Trans. Signal Process. 1993, 41, 2047–2066.
20. Balazs, P.; Holighaus, N.; Necciari, T.; Stoeva, D. Frame theory for signal processing in psychoacoustics. In Excursions in Harmonic Analysis; Applied and Numerical Harmonic Analysis; Birkhäuser: Basel, Switzerland, 2017; Volume 5, pp. 225–268.
21. Bölcskei, H.; Hlawatsch, F.; Feichtinger, H. Frame-theoretic analysis of oversampled filter banks. IEEE Trans. Signal Process. 1998, 46, 3256–3268.
22. Cvetković, Z.; Vetterli, M. Oversampled filter banks. IEEE Trans. Signal Process. 1998, 46, 1245–1255.
23. Strohmer, T. Numerical algorithms for discrete Gabor expansions. In Gabor Analysis and Algorithms: Theory and Applications; Feichtinger, H.G., Strohmer, T., Eds.; Birkhäuser: Boston, MA, USA, 1998; pp. 267–294.
24. Härmä, A.; Karjalainen, M.; Savioja, L.; Välimäki, V.; Laine, U.K.; Huopaniemi, J. Frequency-Warped Signal Processing for Audio Applications. J. Audio Eng. Soc. 2000, 48, 1011–1031.
25. Gunawan, T.S.; Ambikairajah, E.; Epps, J. Perceptual speech enhancement exploiting temporal masking properties of human auditory system. Speech Commun. 2010, 52, 381–393.
26. Balazs, P.; Laback, B.; Eckel, G.; Deutsch, W.A. Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking. IEEE Trans. Audio Speech Lang. Process. 2010, 18, 34–49.
27. Chardon, G.; Necciari, T.; Balazs, P. Perceptual matching pursuit with Gabor dictionaries and time-frequency masking. In Proceedings of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), Florence, Italy, 4–9 May 2014.
28. Wang, D.; Brown, G.J. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications; Wiley-IEEE Press: Hoboken, NJ, USA, 2006.
29. Li, P.; Guan, Y.; Xu, B.; Liu, W. Monaural Speech Separation Based on Computational Auditory Scene Analysis and Objective Quality Assessment of Speech. IEEE Trans. Audio Speech Lang. Process. 2006, 14, 2014–2023.
30. Glasberg, B.R.; Moore, B.C.J. Derivation of auditory filter shapes from notched-noise data. Hear. Res. 1990, 47, 103–138.
31. Rosen, S.; Baker, R.J. Characterising auditory filter nonlinearity. Hear. Res. 1994, 73, 231–243.
32. Lyon, R. All-pole models of auditory filtering. Divers. Audit. Mech. 1997, 205–211.
33. Irino, T.; Patterson, R.D. A Dynamic Compressive Gammachirp Auditory Filterbank. IEEE Trans. Audio Speech Lang. Process. 2006, 14, 2222–2232.
34. Verhulst, S.; Dau, T.; Shera, C.A. Nonlinear time-domain cochlear model for transient stimulation and human otoacoustic emission. J. Acoust. Soc. Am. 2012, 132, 3842–3848.
35. Feldbauer, C.; Kubin, G.; Kleijn, W.B. Anthropomorphic coding of speech and audio: A model inversion approach. EURASIP J. Adv. Signal Process. 2005, 2005, 1334–1349.
36. Decorsière, R.; Søndergaard, P.L.; MacDonald, E.N.; Dau, T. Inversion of Auditory Spectrograms, Traditional Spectrograms, and Other Envelope Representations. IEEE Trans. Audio Speech Lang. Process. 2015, 23, 46–56.
37. Lyon, R.; Katsiamis, A.; Drakakis, E. History and future of auditory filter models. In Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS), Paris, France, 30 May–2 June 2010; pp. 3809–3812.
38. Patterson, R.D.; Robinson, K.; Holdsworth, J.; McKeown, D.; Zhang, C.; Allerhand, M.H. Complex sounds and auditory images. In Proceedings of the Auditory Physiology and Perception: 9th International Symposium on Hearing, Carcans, France, 9–14 June 1991; pp. 429–446.
39. Hohmann, V. Frequency analysis and synthesis using a Gammatone filterbank. Acta Acust. United Acust. 2002, 88, 433–442.
40. Lin, L.; Holmes, W.; Ambikairajah, E. Auditory filter bank inversion. In Proceedings of the 2001 IEEE International Symposium on Circuits and Systems (ISCAS 2001), Sydney, Australia, 6–9 May 2001; Volume 2, pp. 537–540.
41. Slaney, M. An Efficient Implementation of the Patterson–Holdsworth Auditory Filter Bank; Apple Computer Technical Report No. 35; Apple Computer, Inc.: Cupertino, CA, USA, 1993; pp. 1–42.
42. Holdsworth, J.; Nimmo-Smith, I.; Patterson, R.D.; Rice, P. Implementing a Gammatone Filter Bank; Annex C of the SVOS Final Report (Part A: The Auditory Filterbank); MRC Applied Psychology Unit: Cambridge, UK, 1988.
43. Darling, A. Properties and Implementation of the Gammatone Filter: A Tutorial; Technical Report; University College London, Department of Phonetics and Linguistics: London, UK, 1991; pp. 43–61.
44. Strahl, S.; Mertins, A. Analysis and design of gammatone signal models. J. Acoust. Soc. Am. 2009, 126, 2379–2389.
45. Balazs, P.; Dörfler, M.; Holighaus, N.; Jaillet, F.; Velasco, G. Theory, Implementation and Applications of Nonstationary Gabor Frames. J. Comput. Appl. Math. 2011, 236, 1481–1496.
46. Holighaus, N.; Dörfler, M.; Velasco, G.; Grill, T. A framework for invertible, real-time constant-Q transforms. IEEE Trans. Audio Speech Lang. Process. 2013, 21, 775–785.
47. Holighaus, N.; Wiesmeyr, C.; Průša, Z. A class of warped filter bank frames tailored to non-linear frequency scales. arXiv 2016, arXiv:1409.7203.
48. Necciari, T.; Balazs, P.; Holighaus, N.; Søndergaard, P. The ERBlet transform: An auditory-based time-frequency representation with perfect reconstruction. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013; pp. 498–502.
49. Trefethen, L.N.; Bau, D., III. Numerical Linear Algebra; SIAM: Philadelphia, PA, USA, 1997.
50. Moore, B.C.J. An Introduction to the Psychology of Hearing, 6th ed.; Emerald Group Publishing: Bingley, UK, 2012.
51. Zwicker, E.; Terhardt, E. Analytical expressions for critical-band rate and critical bandwidth as a function of frequency. J. Acoust. Soc. Am. 1980, 68, 1523–1525.
52. O'Shaughnessy, D. Speech Communication: Human and Machine; Addison-Wesley: Boston, MA, USA, 1987.
53. Daubechies, I.; Grossmann, A.; Meyer, Y. Painless nonorthogonal expansions. J. Math. Phys. 1986, 27, 1271–1283.
54. Průša, Z.; Søndergaard, P.L.; Rajmic, P. Discrete Wavelet Transforms in the Large Time-Frequency Analysis Toolbox for Matlab/GNU Octave. ACM Trans. Math. Softw. 2016, 42, 32:1–32:23.
55. Hestenes, M.R.; Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 1952, 49, 409–436.
56. Gröchenig, K. Acceleration of the frame algorithm. IEEE Trans. Signal Process. 1993, 41, 3331–3340.
57. Eisenstat, S.C. Efficient implementation of a class of preconditioned conjugate gradient methods. SIAM J. Sci. Stat. Comput. 1981, 2, 1–4.
58. Balazs, P.; Feichtinger, H.G.; Hampejs, M.; Kracher, G. Double preconditioning for Gabor frames. IEEE Trans. Signal Process. 2006, 54, 4597–4610.
59. Christensen, O. An Introduction to Frames and Riesz Bases; Applied and Numerical Harmonic Analysis; Birkhäuser: Boston, MA, USA, 2016.
60. Smith, J.O. Audio FFT filter banks. In Proceedings of the 12th International Conference on Digital Audio Effects (DAFx-09), Como, Italy, 1–4 September 2009; pp. 1–8.
61. Søndergaard, P.L.; Torrésani, B.; Balazs, P. The Linear Time Frequency Analysis Toolbox. Int. J. Wavelets Multiresolut. Inf. Process. 2012, 10, 1250032.
62. Průša, Z.; Søndergaard, P.L.; Holighaus, N.; Wiesmeyr, C.; Balazs, P. The large time-frequency analysis toolbox 2.0. In Sound, Music, and Motion; Springer: Berlin, Germany, 2014; pp. 419–442.
63. Schörkhuber, C.; Klapuri, A.; Holighaus, N.; Dörfler, M. A Matlab toolbox for efficient perfect reconstruction time-frequency transforms with log-frequency resolution. In Proceedings of the Audio Engineering Society 53rd International Conference on Semantic Audio, London, UK, 27–29 January 2014.
64. Velasco, G.A.; Holighaus, N.; Dörfler, M.; Grill, T. Constructing an invertible constant-Q transform with nonstationary Gabor frames. In Proceedings of the 14th International Conference on Digital Audio Effects (DAFx-11), Paris, France, 19–23 September 2011; pp. 93–99.
65. Lehoucq, R.; Sorensen, D.C. Deflation Techniques for an Implicitly Re-Started Arnoldi Iteration. SIAM J. Matrix Anal. Appl. 1996, 17, 789–821.
66. Le Roux, J.; Vincent, E. Consistent Wiener Filtering for Audio Source Separation. IEEE Signal Process. Lett. 2013, 20, 217–220.
67. Emiya, V.; Vincent, E.; Harlander, N.; Hohmann, V. Subjective and Objective Quality Assessment of Audio Source Separation. IEEE Trans. Audio Speech Lang. Process. 2011, 19, 2046–2057.
68. Balazs, P. Basic Definition and Properties of Bessel Multipliers. J. Math. Anal. Appl. 2007, 325, 571–585.

**Figure 1.** General structure of a non-uniform analysis filter bank (FB) ${({H}_{k},{d}_{k})}_{k}$ with ${H}_{k}$ being the z-transform of the impulse response ${h}_{k}\left[n\right]$ of the filter, also denoted as $\mathcal{A}(\cdot ,{({H}_{k},{d}_{k})}_{k})$.

**Figure 2.** General structure of a non-uniform synthesis FB ${({G}_{k},{d}_{k})}_{k}$, also denoted by $\tilde{\mathcal{S}}(\cdot ,{({G}_{k},{d}_{k})}_{k})$.

**Figure 3.** General structure of a synthesis system. $\mathcal{S}$ is a linear operator that maps the sub-band components ${y}_{k}$ to an output signal $\tilde{x}$.

**Figure 4.** Illustration of the frequency allocations of the filters ${H}_{0}$ (red line) and ${H}_{K}$ (green line) given the restricted frequency response ${\mathcal{H}}_{0}^{(r)}(\xi )$ (dashed line) of an FB.

**Figure 5.** Source separation for ${R}_{\mathrm{t}}$ = 4 and $\beta $ = 1/6 displayed as time-frequency (TF) plots: the magnitude of each sub-band component (in dB) as a function of time (in s). (**a**) The mixture analyzed by a gammatone FB; (**b**) the target (voice) analyzed by a gammatone FB; (**c**) the binary mask obtained for Audlet_hann; (**d**) the binary mask obtained for trev_gfb and Audlet_gfb; the black and white dots in the masks represent ‘1’ and ‘0’ entries, respectively; (**e**,**f**) the target separated by Audlet_hann and Audlet_gfb, respectively.

**Table 1.**Ratios $B/A$ for various combinations of D and K obtained for the proposed Audlet framework and reported in [44] (S–M).

| K | Framework | $D=1$ | $D=2$ | $D=4$ | $D=6$ | $D=8$ |
|---|---|---|---|---|---|---|
| 51 | Audlet | 1.124 | 1.124 | 1.125 | 1.134 | 1.157 |
| | S–M | 1.100 | >10 | >10 | >10 | >10 |
| 76 | Audlet | 1.007 | 1.007 | 1.009 | 1.021 | 1.073 |
| | S–M | 1.100 | 2 | 2 | 3 | 6 |
| 101 | Audlet | 1.003 | 1.003 | 1.005 | 1.017 | 1.068 |
| | S–M | 1.003 | 1.003 | 1.003 | 2 | 4 |
| 151 | Audlet | 1.015 | 1.015 | 1.016 | 1.025 | 1.066 |
| | S–M | 1.003 | 1.003 | 1.003 | 1.100 | 2 |

**Table 2.**Signal-to-noise ratios (SNRs; in dB) obtained for the Audlet framework and reported in [44] (Figure 10) (S–M).

The left three data columns use ${d}_{k}$ based on (15)–(18); the right three use ${d}_{k}$ from [44].

| ${R}_{\mathrm{t}}$ | R | Audlet | S–M | R | Audlet | S–M |
|---|---|---|---|---|---|---|
| 2 | 2.40 | $>180$ | 5 | 2.38 | $>170$ | 10 |
| 4 | 4.46 | $>180$ | 7 | 4.38 | $>190$ | 13 |
| 8 | 8.60 | $>180$ | 10 | 8.38 | $>200$ | 17 |
| 12 | 12.73 | $>220$ | 9 | 12.38 | $>210$ | 18 |
| 16 | 16.87 | $>260$ | 15 | 16.38 | $>200$ | 19 |

**Table 3.**Objective quality measures for the separated voice signal. The signal-to-distortion ratio (SDR) and signal-to-artifact ratio (SAR) are in dB; the larger the ratio, the better the separation result. Overall perceptual score (OPS) and target perceptual score (TPS) are without unit; they indicate scores between 0 (bad quality) and 1 (excellent quality). The corresponding audio files are available on the companion webpage. STFT: short-time Fourier transform.

For each measure, the left column corresponds to $\beta =1$ and the right column to $\beta =1/6$; STFT_hann uses a fixed 1024-point Hann window, so it yields a single value per measure.

| System | ${R}_{\mathrm{t}}$ | SDR ($\beta =1$) | SDR ($\beta =1/6$) | SAR ($\beta =1$) | SAR ($\beta =1/6$) | OPS ($\beta =1$) | OPS ($\beta =1/6$) | TPS ($\beta =1$) | TPS ($\beta =1/6$) |
|---|---|---|---|---|---|---|---|---|---|
| trev_gfb | 1.1 | 0.1 | 5.8 | 3.2 | 9.2 | 0.26 | 0.26 | 0.06 | 0.12 |
| Audlet_gfb | | 4.7 | 10.7 | 8.5 | 19.0 | 0.25 | 0.31 | 0.11 | 0.20 |
| Audlet_hann | | 4.7 | 11.8 | 7.6 | 18.3 | 0.26 | 0.34 | 0.05 | 0.26 |
| STFT_hann | | −1.7 | | 0.5 | | 0.46 | | 0.02 | |
| trev_gfb | 1.5 | 2.4 | 8.5 | 5.7 | 13.5 | 0.24 | 0.30 | 0.11 | 0.17 |
| Audlet_gfb | | 6.9 | 11.1 | 12.5 | 20.5 | 0.24 | 0.35 | 0.13 | 0.29 |
| Audlet_hann | | 7.0 | 12.8 | 11.1 | 20.1 | 0.22 | 0.36 | 0.07 | 0.35 |
| STFT_hann | | 2.4 | | 9.2 | | 0.22 | | 0.04 | |
| trev_gfb | 4 | 7.0 | 10.7 | 12.0 | 18.9 | 0.24 | 0.37 | 0.24 | 0.34 |
| Audlet_gfb | | 9.0 | 11.4 | 18.3 | 21.6 | 0.27 | 0.38 | 0.32 | 0.39 |
| Audlet_hann | | 11.1 | 13.1 | 19.4 | 21.7 | 0.25 | 0.37 | 0.21 | 0.32 |
| STFT_hann | | 11.4 | | 20.5 | | 0.38 | | 0.34 | |
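For intuition about the distortion measures in the table, a simplified SNR-style signal-to-distortion ratio can be computed as below; the reported values come from the full evaluation framework of [67], which additionally decomposes the error into interference and artifact terms:

```python
import numpy as np

def sdr_db(reference, estimate):
    """Simplified SDR in dB: energy of the reference signal over the
    energy of the estimation error (a plain SNR, not full BSS Eval)."""
    reference = np.asarray(reference, dtype=float)
    err = np.asarray(estimate, dtype=float) - reference
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(err**2))
```

Larger values indicate less distortion of the separated target, matching the reading of the SDR and SAR columns above.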

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Necciari, T.; Holighaus, N.; Balazs, P.; Průša, Z.; Majdak, P.; Derrien, O.
Audlet Filter Banks: A Versatile Analysis/Synthesis Framework Using Auditory Frequency Scales. *Appl. Sci.* **2018**, *8*, 96.
https://doi.org/10.3390/app8010096
