# Designing Audio Equalization Filters by Deep Neural Networks

## Abstract

## 1. Introduction

## 2. Problem Statement

## 3. Proposed Method

#### 3.1. Multilayer Perceptron

#### 3.2. Convolutional Neural Networks

#### 3.3. Autoencoder

## 4. Baseline Methods

#### 4.1. Frequency Deconvolution Method

#### 4.2. Steepest Descent Method

## 5. Experiments

## 6. Results

#### 6.1. Alfa Romeo Giulia

#### 6.2. Jeep Renegade

#### 6.3. Sensitivity to Head Movements

#### 6.4. Sensitivity to the Input

#### 6.5. Over-Determined Case

#### 6.6. Remarks

#### 6.7. Results Summary

## 7. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest


**Figure 1.** Multi-point equalization problem: $\mathcal{S}$ loudspeakers are placed in an environment together with $\mathcal{M}$ microphones. The equalizing filters ${g}_{s}$ are designed to invert the environment impulse responses ${h}_{m,s}$.
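The setup in Figure 1 can be illustrated with a minimal sketch, assuming the usual multi-point model in which the signal at microphone $m$ is the sum over loudspeakers of ${g}_{s}$ convolved with ${h}_{m,s}$. The function name `equalized_responses`, the array layout, and the toy values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def equalized_responses(h, g):
    """Overall impulse response seen at each microphone (assumed model).

    h: array of shape (M, S, L), h[m, s] is the impulse response from
       loudspeaker s to microphone m.
    g: array of shape (S, K), g[s] is the FIR filter for loudspeaker s.
    Returns an array of shape (M, L + K - 1) holding, for each m,
    the sum over s of conv(h[m, s], g[s]).
    """
    h = np.asarray(h, dtype=float)
    g = np.asarray(g, dtype=float)
    n_mics, n_spk, length = h.shape
    taps = g.shape[1]
    y = np.zeros((n_mics, length + taps - 1))
    for m in range(n_mics):
        for s in range(n_spk):
            y[m] += np.convolve(h[m, s], g[s])
    return y

# Toy case: one loudspeaker, two microphones (hypothetical values).
h = np.zeros((2, 1, 3))
h[0, 0] = [1.0, 0.5, 0.0]
h[1, 0] = [0.0, 1.0, 0.5]
g = np.array([[1.0, -0.5]])
y = equalized_responses(h, g)
```

In this toy case the single filter cancels the first-order echo at both microphones, leaving only a small residual tap at the end of each response.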

**Figure 2.** Scheme of the proposed method using a Multilayer Perceptron (MLP). The impulse responses are all concatenated into a vector and fed to the first layer, which must have size $\mathcal{S}\times \mathcal{M}\times L$.
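As a rough sketch of the input arrangement described in Figure 2, the impulse responses can be stored in an array of shape $(\mathcal{S}, \mathcal{M}, L)$ and flattened into a single vector. The concatenation order, the helper name `flatten_impulse_responses`, and the toy sizes are assumptions made for illustration only.

```python
import numpy as np

def flatten_impulse_responses(h):
    """Concatenate all impulse responses into one input vector.

    h: array of shape (S, M, L). The row-major ordering used here is an
    assumption; the caption only states the resulting size S * M * L.
    """
    h = np.asarray(h, dtype=np.float32)
    if h.ndim != 3:
        raise ValueError("expected an array of shape (S, M, L)")
    return h.reshape(-1)

# Toy sizes (hypothetical): 7 loudspeakers, 2 microphones, 16 samples.
S, M, L = 7, 2, 16
rng = np.random.default_rng(0)
h = rng.standard_normal((S, M, L))
x = flatten_impulse_responses(h)
```

The resulting vector `x` has `S * M * L` entries, which is the size the first MLP layer must accept.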

**Figure 5.** Top view of the Alfa Romeo Giulia (**a**) and the Jeep Renegade (**b**) showing the placement of the $\mathcal{S}$ loudspeakers and the $\mathcal{M}$ microphones. D indicates the dummy head. The three yellow labels around M2 mark the proximity test microphones PM1, PM2 and PM3.

**Figure 6.** Magnitude frequency responses of the 1024-th order FIR filters designed by the CNN for each of the Alfa Romeo Giulia loudspeakers S1-S7 shown in Figure 5a.

**Figure 7.** Magnitude frequency responses at the left and right microphones of the dummy head in the Alfa Romeo Giulia after applying filters obtained from the CNN (**a**,**b**), Frequency Deconvolution (**c**,**d**) and Steepest Descent (**e**,**f**) methods. The original magnitude frequency response is shown in green, the equalized frequency response in blue, and the target magnitude response in black.

**Figure 8.** Frequency response at microphone M2 (**a**); microphones PM1 and PM2 (**b**,**c**), corresponding to small forward and backward head movements; microphone PM3 (**d**), corresponding to a large lateral head movement.

**Figure 9.** Phase response of one of the filters obtained with the CNN method (FIR order 1024) and its linear fit. Frequency is normalized to the Nyquist frequency.
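A comparison like the one in Figure 9 can be approximated as follows: compute the unwrapped phase of an FIR filter on a frequency axis normalized to Nyquist and fit a straight line to it. This is only a sketch; the function name `phase_linear_fit`, the FFT size, and the pure-delay test filter are assumptions and do not reproduce the paper's filters.

```python
import numpy as np

def phase_linear_fit(g, n_fft=4096):
    """Unwrapped phase of FIR filter g and its least-squares linear fit.

    Returns (w, phase, fit, slope), where w is the frequency axis
    normalized so that 1.0 corresponds to Nyquist.
    """
    spectrum = np.fft.rfft(g, n_fft)
    w = np.linspace(0.0, 1.0, spectrum.size)
    phase = np.unwrap(np.angle(spectrum))
    slope, intercept = np.polyfit(w, phase, 1)
    fit = slope * w + intercept
    return w, phase, fit, slope

# Hypothetical test filter: a pure delay of 10 samples, whose phase is
# exactly linear with slope -pi * delay on the normalized axis.
delay = 10
g = np.zeros(65)
g[delay] = 1.0
w, phase, fit, slope = phase_linear_fit(g)
estimated_delay = -slope / np.pi  # slope of the fit expressed in samples
```

For a filter with exactly linear phase the fit coincides with the phase curve; the maximum of `abs(phase - fit)` gives a simple measure of how far a designed filter departs from that ideal.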

**Table 1.** The CNN and MLP configurations used in the experiments. The numbers of parameters refer to 1024-th order filters.

| CNN Configuration | CNN Number of Kernels | CNN Number of Units | CNN Trainable Parameters | MLP Configuration | MLP Number of Units | MLP Trainable Parameters |
|---|---|---|---|---|---|---|
| Conv #1 | [48, 24] | [10] | 7,481,943 | MLP #1 | [10] | 6,798,935 |
| Conv #2 | [10, 5] | [100, 10] | 3,826,153 | MLP #2 | [100, 10] | 67,280,035 |
| Conv #3 | [100, 25] | [100, 100] | 12,483,433 | MLP #3 | [100, 100] | 67,934,875 |
| Conv #4 | [10] | [1000] | 3,825,863 | MLP #4 | [1000] | 679,183,175 |
| | | | | MLP #5 | [100] | 67,924,775 |
| | | | | MLP #6 | [100, 100, 100] | 67,944,975 |
| | | | | MLP #7 | [5] | 36,003,713 |
| | | | | MLP #8 | [10, 1000, 1000] | 14,914,185 |
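The MLP parameter counts in Table 1 can be checked with simple arithmetic for fully connected layers (weights plus biases). The sketch below assumes an input of $7\times 2\times 48{,}000$ samples and an output of $7\times 1025$ filter coefficients; these sizes are not stated in the table and were chosen because they reproduce the listed counts for MLP #1 to #6 and #8. The listed count for MLP #7 is not reproduced under these assumptions.

```python
def dense_param_count(n_in, hidden, n_out):
    """Trainable parameters of a fully connected network with biases."""
    sizes = [n_in, *hidden, n_out]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

# Assumed sizes (not given in the table): 7 loudspeakers, 2 microphones,
# 48,000-sample impulse responses, and 1025 taps per 1024-th order filter.
N_IN = 7 * 2 * 48_000
N_OUT = 7 * 1025

counts = {
    "MLP #1": dense_param_count(N_IN, [10], N_OUT),
    "MLP #2": dense_param_count(N_IN, [100, 10], N_OUT),
    "MLP #5": dense_param_count(N_IN, [100], N_OUT),
    "MLP #8": dense_param_count(N_IN, [10, 1000, 1000], N_OUT),
}
```

Almost all of the parameters sit in the first layer, which explains why the MLP counts grow roughly in proportion to the width of the first hidden layer.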

**Table 2.** Audio equalization results for the Alfa Romeo Giulia with binaural microphones. Please note that the $\overline{MSE}$ in the absence of equalization is 2.19, with $\overline{\sigma}$ 3.52. Best results for each column are highlighted in bold.

| Filter Order | MLP Conf. | MLP $\overline{MSE}$ | MLP $\overline{\sigma}$ | AE Conf. | AE $\overline{MSE}$ | AE $\overline{\sigma}$ | CNN Conf. | CNN $\overline{MSE}$ | CNN $\overline{\sigma}$ | FD ($\beta=0.1$) $\overline{MSE}$ | FD ($\beta=0.1$) $\overline{\sigma}$ | SD $\overline{MSE}$ | SD $\overline{\sigma}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 512 | MLP #5 | 0.32 | 2.877 | Conv #1 | $9.72\cdot{10}^{-4}$ | 0.136 | Conv #2 | $7.90\cdot{10}^{-4}$ | 0.122 | 0.18 | 2.52 | 0.40 | 1.95 |
| 640 | MLP #8 | 0.36 | 2.730 | Conv #1 | $3.80\cdot{10}^{-4}$ | 0.085 | Conv #2 | $3.74\cdot{10}^{-4}$ | 0.084 | 0.15 | 2.34 | 0.35 | 1.72 |
| 768 | MLP #5 | 0.46 | 2.796 | Conv #1 | $1.66\cdot{10}^{-4}$ | 0.056 | Conv #2 | $1.79\cdot{10}^{-4}$ | 0.058 | 0.14 | 2.23 | 0.33 | 1.60 |
| 896 | MLP #2 | 0.45 | 2.799 | Conv #1 | $1.07\cdot{10}^{-4}$ | 0.045 | Conv #1 | $1.02\cdot{10}^{-4}$ | 0.044 | 0.12 | 2.07 | 0.31 | 1.54 |
| 1024 | MLP #7 | 0.32 | 2.746 | Conv #1 | $\mathbf{6.85\cdot{10}^{-5}}$ | 0.036 | Conv #1 | $\mathbf{6.31\cdot{10}^{-5}}$ | 0.034 | 0.10 | 1.93 | 0.30 | 1.50 |

**Table 3.** Effect of the parameter $\beta$ on the performance. The V-shaped configuration refers to a frequency-dependent $\beta$ with a minimum of ${10}^{-4}$ at 1 kHz and maxima of ${10}^{-1}$ at DC and Nyquist, varying linearly on a dB scale. The U-shaped configuration takes a value of ${10}^{-4}$ in the range 100 Hz–10 kHz and 1 elsewhere. Best results for each column are highlighted in bold.

| $\beta$ | $\overline{MSE}$ | $\overline{\sigma}$ |
|---|---|---|
| ${10}^{-4}$ | 0.123 | 1.83 |
| ${10}^{-3}$ | 0.118 | 1.82 |
| ${10}^{-2}$ | 0.108 | 1.81 |
| ${10}^{-1}$ | 0.108 | 1.93 |
| 1 | 0.281 | 2.71 |
| 10 | 0.686 | 4.2 |
| 100 | 0.937 | 5.09 |
| V-shaped | 0.101 | 1.829 |
| U-shaped | 0.124 | 1.86 |
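The two frequency-dependent $\beta$ profiles described in the caption of Table 3 could be generated along the following lines. This is a sketch under stated assumptions: the dB value is interpolated linearly over a linear frequency axis, the sampling rate `fs` of 48 kHz and the FFT size are hypothetical, and the function names are illustrative.

```python
import numpy as np

def beta_v_shaped(freqs, f_min=1000.0, beta_min=1e-4, beta_max=1e-1):
    """V-shaped profile: beta_max at DC and at the last frequency
    (taken as Nyquist), beta_min at f_min, linear in log10(beta)."""
    freqs = np.asarray(freqs, dtype=float)
    nyquist = freqs[-1]
    log_beta = np.interp(
        freqs,
        [0.0, f_min, nyquist],
        [np.log10(beta_max), np.log10(beta_min), np.log10(beta_max)],
    )
    return 10.0 ** log_beta

def beta_u_shaped(freqs, f_lo=100.0, f_hi=10_000.0, beta_in=1e-4, beta_out=1.0):
    """U-shaped profile: beta_in inside [f_lo, f_hi], beta_out elsewhere."""
    freqs = np.asarray(freqs, dtype=float)
    inside = (freqs >= f_lo) & (freqs <= f_hi)
    return np.where(inside, beta_in, beta_out)

fs = 48_000.0   # assumed sampling rate, not stated in the caption
n_fft = 4096    # assumed FFT size
freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
beta_v = beta_v_shaped(freqs)
beta_u = beta_u_shaped(freqs)
```

Because a straight line in dB is also a straight line in $\log_{10}\beta$, the choice between a 10 or 20 scaling factor for the dB conversion does not affect the resulting profile.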

**Table 4.** Audio equalization results for the Jeep Renegade with binaural microphones and four microphones (one per seat). The FIR order is 1024.

| Setup | CNN Conf. | CNN $\overline{MSE}$ | CNN $\overline{\sigma}$ | FD ($\beta=0.1$) $\overline{MSE}$ | FD ($\beta=0.1$) $\overline{\sigma}$ |
|---|---|---|---|---|---|
| Binaural | #1 | $6.19\cdot{10}^{-5}$ | 0.035 | 0.05 | 1.21 |
| 4 seats | #1 | $5.7\cdot{10}^{-4}$ | 0.106 | 0.15 | 1.95 |

**Table 5.** Audio equalization results for microphone M2 and microphones PM1, PM2 and PM3. The evaluation uses the Jeep Renegade experiments with four microphones (see Table 4).

| Mic. | CNN $\overline{MSE}$ | CNN $\overline{\sigma}$ | FD $\overline{MSE}$ | FD $\overline{\sigma}$ |
|---|---|---|---|---|
| M2 | $5.07\cdot{10}^{-4}$ | 0.10 | 0.14 | 1.82 |
| PM1 | 0.61 | 2.88 | 1.2 | 2.9 |
| PM2 | 0.50 | 3.31 | 0.57 | 3.07 |
| PM3 | 0.80 | 3.09 | 0.84 | 3.12 |

**Table 6.** Effect of the input type on the results of the CNN (filter order 1024). For each case, the best result and the related configuration are reported.

| Input | $\overline{MSE}$ | $\overline{\sigma}$ | Conf. |
|---|---|---|---|
| Impulse Responses | $6.31\cdot{10}^{-5}$ | 0.034 | Conv #1 |
| Random Iteration | $0.14$ | 2.152 | Conv #1 |
| Random Fixed | $1.35\cdot{10}^{-4}$ | 0.052 | Conv #1 |
| All 1s | $1.17\cdot{10}^{-4}$ | 0.049 | Conv #1 |
| All 0s | ill-conditioned | | |

**Table 7.** Audio equalization in the single-channel and over-determined cases. Setup is $\mathcal{M}\times \mathcal{S}$.

| Car | Setup | CNN $\overline{MSE}$ | CNN $\overline{\sigma}$ | FD $\overline{MSE}$ | FD $\overline{\sigma}$ |
|---|---|---|---|---|---|
| Giulia | $1\times 1$ | 0.52 | 8.57 | 0.62 | 9.84 |
| Giulia | $2\times 1$ | 0.57 | 7.81 | 0.64 | 9.19 |
| Renegade | $1\times 1$ | 0.03 | 1.34 | 0.12 | 2.01 |
| Renegade | $4\times 1$ | 0.22 | 2.76 | 0.44 | 3.62 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Pepe, G.; Gabrielli, L.; Squartini, S.; Cattani, L.
Designing Audio Equalization Filters by Deep Neural Networks. *Appl. Sci.* **2020**, *10*, 2483.
https://doi.org/10.3390/app10072483
