# A Comparative Study on Denoising Algorithms for Footsteps Sounds as Biometric in Noisy Environments


## Abstract


## 1. Introduction

#### Related Work

## 2. Materials and Methods

#### 2.1. Footsteps Sound Analysis

- Zero Crossing Rate: The rate of sign changes of the signal within a frame, defined as [21]$$ZCR=\sum _{m=-\infty}^{\infty}\left|\mathrm{sgn}\left(x\left(m\right)\right)-\mathrm{sgn}\left(x\left(m-1\right)\right)\right|w(n-m),$$ where $\mathrm{sgn}(\cdot)$ is the sign function and $w(n)$ is a window function.
- Energy: The energy of the signal is computed as the sum of squares of the signal samples:$$E\left(x\left(n\right)\right)=\sum _{n=-\infty}^{\infty}{\left|x\left(n\right)\right|}^{2}.$$
- Entropy of Energy: The entropy of the energies of sub-frames, used as a measure of abrupt changes in the energy of a frame.
- Spectral Centroid: This is a measure that represents the center of mass of the signal’s spectrum.
- Spectral Spread: A measure of the variance of the signal’s spectrum around the spectral centroid.
- Spectral Entropy: The entropy computed over the spectrum, quantifying the spectral complexity of the signal. It can be obtained as [22]$$SE=\sum _{f}{p}_{f}\log\frac{1}{{p}_{f}},$$ where ${p}_{f}$ is the normalized spectral energy at frequency bin $f$.
- Spectral Flux: A measure of how quickly the spectrum changes, computed as the squared difference between the normalized spectral magnitudes of successive frames.
- Spectral Rolloff: The frequency below which 90% of the energy of the spectrum is concentrated.
- Mel Frequency Cepstral Coefficients (MFCCs): A representation of the short-term power spectrum, based on the Fourier transform mapped onto the nonlinear mel scale of frequency. MFCC vectors are commonly applied in speech recognition tasks. A detailed description of the MFCCs can be found in [23].
- Chroma Vector: A representation of the spectrum mapped into the twelve pitch classes of traditional tonal music.
- Chroma Deviation: The standard deviation of the chroma coefficients.
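As a rough illustration of the first two features, a per-frame ZCR and energy can be computed in a few lines of Python. This is a simplified sketch with NumPy, not the pyAudioAnalysis implementation, and it omits the window term $w(n-m)$ of the ZCR definition:

```python
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of consecutive sample pairs whose sign differs."""
    signs = np.sign(frame)
    return float(np.sum(np.abs(np.diff(signs)) > 0)) / (len(frame) - 1)

def energy(frame: np.ndarray) -> float:
    """Sum of squared samples, as in the energy equation above."""
    return float(np.sum(frame ** 2))

# Toy frame: one full cycle of a sine wave sampled at 8 points.
frame = np.sin(2 * np.pi * np.arange(8) / 8)
zcr = zero_crossing_rate(frame)
e = energy(frame)
```

In practice these quantities are computed over short overlapping frames (e.g., 50 ms with a 25 ms step) and collected into a feature matrix for the classifier.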

#### 2.2. Experimental Setup

- Binary classification: The binary case of classification was defined for these experiments. This means that the data come from the recordings of two persons, and the identification task consists of distinguishing between the two.
- Noise: As mentioned in the Introduction, the presence of noise has to be contemplated in any real-life scenario of sound recording and processing. For our experiments, we consider both natural and artificially generated noise. The Babble and Office noises, obtained from mynoise.net (accessed on 12 February 2022), provide realistic scenarios where a biometric system could be implemented. On the other hand, White noise is commonly analyzed in signal denoising tasks. For every type of noise, we add five signal-to-noise ratio (SNR) levels (−10, −5, 0, 5, and 10) to cover light to heavy noise degradation of the footstep sound signals.
- Denoising Algorithms: The problem of denoising signals has been explored for decades, and comparing algorithms is a common task in sound enhancement studies. For this experimental setup, we chose three of the most commonly applied algorithms based on classical signal processing, along with a deep learning-based approach. Details of the algorithms are presented in Section 2.2.2.
- Classifiers: For this first exploration of a system combining classification and denoising algorithms, we chose the Support Vector Machine (SVM) classifier. Using the implementation in pyAudioAnalysis, a cross-validation procedure is performed to select the optimal classifier parameters, such as the margin parameter C of the SVM.
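The noise-degradation step described above can be sketched as follows. `mix_at_snr` is a hypothetical helper, not taken from the authors' code, that scales a noise recording so that the mixture reaches a target SNR in dB before adding it to the clean signal:

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture clean + noise has the requested SNR (dB)."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain such that p_clean / (gain^2 * p_noise) equals 10^(snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone at 16 kHz
white = rng.standard_normal(16000)                          # synthetic white noise
noisy = mix_at_snr(clean, white, snr_db=0)                  # heavy degradation
```

The same routine applied with SNR values of −10, −5, 0, 5, and 10 produces the five degradation levels used in the experiments.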

#### 2.2.1. Dataset

#### 2.2.2. Sound Classification in Noisy Environments

- 1. MMSE: As usual in several denoising methods, the Minimum Mean Square Error (MMSE) algorithm models the presence of noise as an additive process,$$y\left(t\right)=x\left(t\right)+n\left(t\right),$$ where $x(t)$ is the clean signal and $n(t)$ the noise. In the mel-cepstral formulation of [24], the clean cepstral coefficients are estimated from the noisy mel spectrum ${m}_{y}$ as$$\hat{{c}_{x}}\left(k\right)=E\left\{{c}_{x}\left(k\right)|{m}_{y}\right\}=\sum _{b}{a}_{k,b}E\left\{\log{m}_{x}\left(b\right)|{m}_{y}\right\}.$$
- 2. Spectral subtraction: Using the same additive noise model of the previous case (Equation (4)), the power spectrum of the noisy signal can be estimated as [25]$${\left|Y\left(k\right)\right|}^{2}\approx {\left|X\left(k\right)\right|}^{2}+{\left|N\left(k\right)\right|}^{2},$$ and the clean power spectrum is recovered by subtracting a noise estimate $\hat{D}\left(k\right)$ scaled by an over-subtraction factor $\alpha$:$${\left|\hat{X}\left(k\right)\right|}^{2}={\left|Y\left(k\right)\right|}^{2}-\alpha {\left|\hat{D}\left(k\right)\right|}^{2}.$$
- 3. Wiener filter: Wiener filtering is one of the most successful and most commonly implemented algorithms for denoising speech signals. The filtering is performed by minimizing the mean square error. In the description presented in [26], the minimization in the frequency domain can be formulated using the transfer function$$H\left(\omega \right)=\frac{{P}_{x}\left(\omega \right)}{{P}_{x}\left(\omega \right)+{P}_{n}\left(\omega \right)},$$ so that the clean power spectrum is estimated as$${\hat{P}}_{x}\left(\omega \right)=H\left(\omega \right){P}_{y}\left(\omega \right).$$
- 4. Deep learning: Deep learning-based algorithms have recently been applied with success to denoising sound signals. Among the different approaches and types of deep learning models, recurrent neural networks such as Long Short-Term Memory (LSTM) networks stand out for their results and capacity to model sequential information. For our experiments, we chose the PyTorch implementation of LSTM-based autoencoders presented by Facebook Research (https://github.com/facebookresearch/denoiser, accessed on 12 February 2022). This implementation is based on an encoder/decoder architecture that combines convolutional and LSTM layers with skip U-Net connections, and it works with raw waveforms. Further details can be found in [28], from which we took the parameters of the neural network.
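As an illustration of the second algorithm, a minimal power spectral subtraction can be sketched as below. This is a simplified, hypothetical version, not the method of [25]: real implementations use overlapping windowed frames, per-band subtraction factors, and spectral floors rather than a hard zero clip:

```python
import numpy as np

def spectral_subtract(noisy: np.ndarray, noise_est: np.ndarray,
                      frame_len: int = 512, alpha: float = 2.0) -> np.ndarray:
    """Basic power spectral subtraction on non-overlapping frames."""
    # Average noise power spectrum |D(k)|^2 from a noise-only excerpt.
    noise_frames = noise_est[: len(noise_est) // frame_len * frame_len]
    noise_frames = noise_frames.reshape(-1, frame_len)
    d_pow = np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)) ** 2, axis=0)

    out = np.zeros(len(noisy) // frame_len * frame_len)
    for i in range(len(out) // frame_len):
        frame = noisy[i * frame_len:(i + 1) * frame_len]
        spec = np.fft.rfft(frame)
        # |X_hat(k)|^2 = |Y(k)|^2 - alpha * |D(k)|^2, floored at zero.
        clean_pow = np.maximum(np.abs(spec) ** 2 - alpha * d_pow, 0.0)
        # Combine the enhanced magnitude with the noisy phase.
        out[i * frame_len:(i + 1) * frame_len] = np.fft.irfft(
            np.sqrt(clean_pow) * np.exp(1j * np.angle(spec)), n=frame_len)
    return out

rng = np.random.default_rng(1)
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = tone + 0.3 * rng.standard_normal(16000)
denoised = spectral_subtract(noisy, 0.3 * rng.standard_normal(16000))
```

Keeping the noisy phase is standard in magnitude-domain enhancers; only the per-bin power is attenuated according to the subtraction rule.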

#### 2.2.3. Evaluation

## 3. Results

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.60 | 0.65 | 0.59 | 0.62 |
MMSE [24] | 0.56 | 0.50 | 0.57 | 0.53 |
Spectral subtraction [25] | 0.67 | 0.58 | 0.71 | 0.64 |
Wiener [27] | 0.77 * | 0.77 * | 0.77 * | 0.77 * |
Deep learning [28] | 0.43 | 0.40 | 0.42 | 0.41 |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.75 | 0.77 | 0.74 | 0.75 |
MMSE [24] | 0.65 | 0.58 | 0.68 | 0.63 |
Spectral subtraction [25] | 0.77 * | 0.81 | 0.75 | 0.78 * |
Wiener [27] | 0.75 | 0.85 * | 0.71 | 0.77 |
Deep learning [28] | 0.73 | 0.55 | 0.85 * | 0.67 |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.92 | 0.88 | 0.96 | 0.92 |
MMSE [24] | 0.79 | 0.77 | 0.80 | 0.78 |
Spectral subtraction [25] | 0.92 | 0.92 * | 0.92 | 0.92 |
Wiener [27] | 0.94 * | 0.88 | 1.00 * | 0.94 * |
Deep learning [28] | 0.78 | 0.55 | 1.00 * | 0.71 |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.96 | 0.92 | 1.00 * | 0.96 |
MMSE [24] | 0.90 | 0.85 | 0.96 | 0.90 |
Spectral subtraction [25] | 0.98 * | 0.96 * | 1.00 * | 0.98 * |
Wiener [27] | 0.94 | 0.92 | 0.96 | 0.94 |
Deep learning [28] | 0.88 | 0.75 | 1.00 * | 0.86 |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.96 * | 0.92 * | 1.00 * | 0.96 * |
MMSE [24] | 0.92 | 0.85 | 1.00 * | 0.92 |
Spectral subtraction [25] | 0.94 | 0.92 * | 0.96 | 0.94 |
Wiener [27] | 0.96 * | 0.92 * | 1.00 * | 0.96 * |
Deep learning [28] | 0.95 | 0.90 | 1.00 * | 0.95 |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.96 | 0.92 | 1.00 * | 0.96 |
MMSE [24] | 0.92 | 0.85 | 1.00 * | 0.92 |
Spectral subtraction [25] | 0.88 | 0.77 | 1.00 | 0.87 |
Wiener [27] | 0.94 | 0.88 | 1.00 * | 0.94 |
Deep learning [28] | 0.98 * | 0.95 * | 1.00 * | 0.97 * |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.96 | 0.92 | 1.00 * | 0.96 |
MMSE [24] | 0.96 | 0.92 | 1.00 * | 0.96 |
Spectral subtraction [25] | 0.96 | 0.92 | 1.00 * | 0.96 |
Wiener [27] | 1.00 * | 1.00 * | 1.00 * | 1.00 * |
Deep learning [28] | 1.00 * | 1.00 * | 1.00 * | 1.00 * |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 1.00 * | 1.00 * | 1.00 * | 1.00 * |
MMSE [24] | 0.96 | 0.92 | 1.00 * | 0.96 |
Spectral subtraction [25] | 0.96 | 0.92 | 1.00 * | 0.96 |
Wiener [27] | 0.98 | 0.96 | 1.00 * | 0.98 |
Deep learning [28] | 1.00 * | 1.00 * | 1.00 * | 1.00 * |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 1.00 * | 1.00 * | 1.00 * | 1.00 * |
MMSE [24] | 0.98 | 0.96 | 1.00 * | 0.98 |
Spectral subtraction [25] | 0.98 | 0.96 | 1.00 * | 0.98 |
Wiener [27] | 0.96 | 0.92 | 1.00 * | 0.96 |
Deep learning [28] | 1.00 * | 1.00 * | 1.00 * | 1.00 * |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 1.00 * | 1.00 * | 1.00 * | 1.00 * |
MMSE [24] | 0.96 | 0.92 | 1.00 * | 0.96 |
Spectral subtraction [25] | 0.96 | 0.92 | 1.00 * | 0.96 |
Wiener [27] | 0.98 | 0.96 | 1.00 * | 0.98 |
Deep learning [28] | 1.00 * | 1.00 * | 1.00 * | 1.00 * |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.60 | 0.65 | 0.59 | 0.62 |
MMSE [24] | 0.56 | 0.50 | 0.57 | 0.53 |
Spectral subtraction [25] | 0.67 | 0.58 | 0.71 | 0.64 |
Wiener [27] | 0.71 * | 0.73 * | 0.70 * | 0.72 * |
Deep learning [28] | 0.65 | 0.65 | 0.65 | 0.65 |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.75 | 0.77 | 0.74 | 0.75 |
MMSE [24] | 0.65 | 0.58 | 0.68 | 0.63 |
Spectral subtraction [25] | 0.79 * | 0.85 * | 0.76 * | 0.80 * |
Wiener [27] | 0.75 | 0.85 * | 0.71 | 0.77 |
Deep learning [28] | 0.60 | 0.65 | 0.59 | 0.62 |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.92 | 0.88 | 0.96 | 0.92 |
MMSE [24] | 0.79 | 0.77 | 0.80 | 0.78 |
Spectral subtraction [25] | 0.96 * | 0.92 * | 1.00 * | 0.98 * |
Wiener [27] | 0.94 | 0.88 | 1.00 * | 0.94 |
Deep learning [28] | 0.75 | 0.55 | 0.92 | 0.69 |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.96 | 0.92 | 1.00 * | 0.96 |
MMSE [24] | 0.90 | 0.85 | 0.96 | 0.90 |
Spectral subtraction [25] | 0.98 * | 0.96 * | 1.00 * | 0.98 * |
Wiener [27] | 0.94 | 0.92 | 0.96 | 0.94 |
Deep learning [28] | 0.88 | 0.75 | 1.00 * | 0.86 |

Algorithm | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
No filter | 0.98 * | 0.96 * | 1.00 * | 0.98 * |
MMSE [24] | 0.92 | 0.85 | 1.00 * | 0.92 |
Spectral subtraction [25] | 0.94 | 0.92 | 0.96 | 0.94 |
Wiener [27] | 0.96 | 0.92 | 1.00 * | 0.96 |
Deep learning [28] | 0.93 | 0.85 | 1.00 * | 0.92 |

## 4. Discussion

Algorithm | Advantages | Disadvantages |
---|---|---|
MMSE [24] | Competitive results at the lowest level of noise (SNR 10). | Did not achieve good results in four of the five SNR levels for all kinds of noise. |
Spectral subtraction [25] | Ease of implementation. Achieved very good results for natural noises. | In the presence of White noise, the algorithm degrades the signals and significantly lowers the accuracy and precision. |
Wiener [27] | Obtained the best accuracy results for Babble noise, and competitive results for White noise. | A tendency to lower the accuracy at low levels of noise (SNR 10) was observed. |
Deep learning [28] | Obtained the best performance in all SNR levels of White noise. | Long training time. May require much larger datasets to enhance natural noises. |

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

MDPI | Multidisciplinary Digital Publishing Institute |
SNR | Signal-to-noise ratio |
LSTM | Long Short-Term Memory neural networks |
SVM | Support Vector Machine classification algorithm |
MMSE | Minimum Mean Square Error algorithm |
MFCC | Mel-frequency Cepstral Coefficients |

## References

1. Down, M.P.; Sands, R.J. Biometrics: An overview of the technology, challenges and control considerations. Inf. Syst. Control J. **2004**, 4, 53–56.
2. AbdElaziz, A.A. A survey of smartphone-based Face Recognition systems for Security Purposes. Kafrelsheikh J. Inf. Sci. **2021**, 2, 1–7.
3. Vera-Rodriguez, R.; Lewis, R.P.; Mason, J.; Evans, N. Footstep recognition for a smart home environment. Int. J. Smart Home **2008**, 2, 95–110.
4. Alsaadi, I.M. Study On Most Popular Behavioral Biometrics, Advantages, Disadvantages and Recent Applications: A Review. Int. J. Sci. Technol. Res. **2021**, 10, 15–21.
5. Thomas, P.A.; Mathew, K.P. A broad review on non-intrusive active user authentication in biometrics. J. Ambient. Intell. Humaniz. Comput. **2021**, 1–22, online ahead of print.
6. Gomez-Alanis, A.; Gonzalez-Lopez, J.A.; Peinado, A.M. GANBA: Generative Adversarial Network for Biometric Anti-Spoofing. Appl. Sci. **2022**, 12, 1454.
7. Pedotti, A. Simple equipment used in clinical practice for evaluation of locomotion. IEEE Trans. Biomed. Eng. **1977**, 5, 456–461.
8. Addlesee, M.D.; Jones, A.L.; Livesey, F.; Samaria, F. The ORL active floor [sensor system]. IEEE Pers. Commun. **1997**, 4, 35–41.
9. Rodriguez, R.V.; Evans, N.W.D.; Lewis, R.P.; Fauve, B.; Mason, J.S.D. An experimental study on the feasibility of footsteps as a biometric. In Proceedings of the 2007 15th European Signal Processing Conference, Poznan, Poland, 3–7 September 2007.
10. Algermissen, S.; Hörnlein, M. Person Identification by Footstep Sound Using Convolutional Neural Networks. Appl. Mech. **2021**, 2, 257–273.
11. Hori, Y.; Ando, T.; Fukuda, A. Personal Identification Methods Using Footsteps of One Step. In Proceedings of the 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 19–21 February 2020.
12. Mason, J.E.; Traoré, I.; Woungang, I. Gait Biometric Recognition. In Machine Learning Techniques for Gait Biometric Recognition; Springer: Cham, Switzerland, 2016; pp. 9–35.
13. Connor, P.; Ross, A. Biometric recognition by gait: A survey of modalities and features. Comput. Vis. Image Underst. **2018**, 167, 1–27.
14. Vera-Rodriguez, R.; Mason, J.S.D.; Ortega-Garcia, J.F.J. Analysis of time domain information for footstep recognition. In International Symposium on Visual Computing; Springer: Berlin/Heidelberg, Germany, 2010.
15. Shoji, Y.; Takasuka, T.; Yasukawa, H. Personal identification using footstep detection. In Proceedings of the 2004 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS 2004), Seoul, Korea, 18–19 November 2004.
16. Tsuji, K.; Takao, R.; Yamada, M.; Harada, K.; Kamiya, Y. Multiple Person Detection by Footsteps Sounds Using GMRS. IEEE Sens. J. **2020**, 21, 6543–6554.
17. Naqvi, S.Z.H.; Choudhry, M.A. An automated system for classification of chronic obstructive pulmonary disease and pneumonia patients using lung sound analysis. Sensors **2020**, 20, 6512.
18. García-Domínguez, A.; Galván-Tejada, C.E.; Brena, R.F.; Aguileta, A.A.; Galván-Tejada, J.I.; Gamboa-Rosales, H.; Celaya-Padilla, J.M.; Luna-García, H. Children’s Activity Classification for Domestic Risk Scenarios Using Environmental Sound and a Bayesian Network. Healthcare **2021**, 9, 884.
19. Leng, L.; Li, M.; Kim, C.; Bi, X. Dual-source discrimination power analysis for multi-instance contactless palmprint recognition. Multimed. Tools Appl. **2017**, 76, 333–354.
20. Giannakopoulos, T. pyAudioAnalysis: An open-source Python library for audio signal analysis. PLoS ONE **2015**, 10, e0144610.
21. Bachu, R.G.; Adapa, B.K.; Kopparthi, S.; Barkana, B.D. Separation of voiced and unvoiced using zero crossing rate and energy of the speech signal. In Proceedings of the American Society for Engineering Education (ASEE) Zone Conference, Pittsburgh, PA, USA, 22–25 June 2008.
22. Acharya, U.R.; Fujita, H.; Sudarshan, V.K.; Bhat, S.; Koh, J.E.W. Application of entropies for automated diagnosis of epilepsy using EEG signals: A review. Knowl.-Based Syst. **2015**, 88, 85–96.
23. Tiwari, V. MFCC and its applications in speaker recognition. Int. J. Emerg. Technol. **2010**, 1, 19–22.
24. Yu, D.; Deng, L.; Droppo, J.; Wu, J.; Gong, Y.; Acero, A. A minimum-mean-square-error noise reduction algorithm on mel-frequency cepstra for robust speech recognition. In Proceedings of the 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, Las Vegas, NV, USA, 31 March–4 April 2008.
25. Kamath, S.; Loizou, P. A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA, 13–17 May 2002; Volume 4.
26. El-Fattah, M.A.A.; Dessouky, M.I.; Abbas, A.M.; Diab, S.M.; El-Rabaie, E.L.M.; Al-Nuaimy, W.; Alshebeili, S.A.; El-Samie, F.E.A. Speech enhancement with an adaptive Wiener filter. Int. J. Speech Technol. **2014**, 17, 53–64.
27. Upadhyay, N.; Jaiswal, R.K. Single channel speech enhancement: Using Wiener filtering with recursive noise estimation. Proc. Comput. Sci. **2016**, 84, 22–30.
28. Defossez, A.; Synnaeve, G.; Adi, Y. Real time speech enhancement in the waveform domain. arXiv **2020**, arXiv:2006.12847.
29. Altaf, M.U.B.; Butko, T.; Juang, B.-H. Person identification using biometric markers from footsteps sound. In Proceedings of INTERSPEECH, Lyon, France, 25–29 August 2013.
30. Riwurohi, J.E.; Mustofa, K.; Putra, A.E. People recognition through footstep sound using MFCC extraction method of artificial neural network back propagation. Int. J. Comput. Sci. Netw. Secur. **2018**, 18, 28–35.

**Figure 2.** Sample waveform and spectrogram of a five-second segment from the first volunteer, at several stages of the experimental process. (**a**) Clean segment. (**b**) Noise-degraded with White noise, SNR 0. (**c**) After the deep learning-based denoising.

**Figure 3.** Sample waveform and spectrogram of a five-second segment from the first volunteer, at several stages of the experimental process. (**a**) Clean segment. (**b**) Noise-degraded with Office noise, SNR 0. (**c**) After the deep learning-based denoising.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Caravaca-Mora, R.; Brenes-Jiménez, C.; Coto-Jiménez, M.
A Comparative Study on Denoising Algorithms for Footsteps Sounds as Biometric in Noisy Environments. *Computation* **2022**, *10*, 133.
https://doi.org/10.3390/computation10080133
