Binaural Acoustic Scene Classification Using Wavelet Scattering, Parallel Ensemble Classifiers and Nonlinear Fusion
Abstract
1. Introduction
2. Related Work
2.1. Feature Extraction and Preprocessing
2.2. Classification Schemes
2.2.1. Ensemble Methods
2.2.2. Deep Learning Methods
3. Wavelet Scattering
3.1. Wavelet Filter Bank
3.2. Nonlinearity Modulus
3.3. Averaging
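The three stages named in Sections 3.1 to 3.3 (wavelet filter bank, modulus nonlinearity, averaging) can be illustrated with a minimal first-order sketch. This is not the authors' implementation: the Gaussian band-pass filters, the function name `first_order_scattering`, and the parameters `centers`, `q` and `lowpass_bw` are assumptions chosen for illustration. A practical scattering network would use a proper wavelet family and typically a second order as well.

```python
import numpy as np

def first_order_scattering(x, centers, q=4.0, lowpass_bw=0.01):
    """Illustrative first-order scattering: |x * psi_k| * phi for each band k.

    x          : 1-D real signal
    centers    : band centre frequencies in cycles/sample (0 < fc < 0.5)
    q          : assumed quality factor; the width of band k is fc / q
    lowpass_bw : assumed width of the Gaussian averaging filter phi
    """
    x = np.asarray(x, dtype=float)
    freqs = np.fft.fftfreq(x.size)          # signed frequencies, cycles/sample
    x_hat = np.fft.fft(x)
    phi_hat = np.exp(-0.5 * (freqs / lowpass_bw) ** 2)   # low-pass (averaging)

    coeffs = []
    for fc in centers:
        # Step 1 (filter bank): Gaussian band-pass centred on a positive
        # frequency, used here as a stand-in for an analytic wavelet.
        psi_hat = np.exp(-0.5 * ((freqs - fc) / (fc / q)) ** 2)
        band = np.fft.ifft(x_hat * psi_hat)
        # Step 2 (nonlinearity): the complex modulus discards the phase.
        envelope = np.abs(band)
        # Step 3 (averaging): low-pass filtering gives local time invariance.
        smooth = np.fft.ifft(np.fft.fft(envelope) * phi_hat).real
        coeffs.append(smooth)
    return np.stack(coeffs)                 # shape: (len(centers), len(x))

# Usage: a pure tone at 0.1 cycles/sample excites the band centred at 0.1 most.
t = np.arange(1000)
tone = np.cos(2 * np.pi * 0.1 * t)
S = first_order_scattering(tone, centers=[0.05, 0.1, 0.2])
strongest_band = int(np.argmax(S.mean(axis=1)))
```

The filtering is done in the frequency domain for brevity, which makes the convolutions circular; the output is not subsampled, although the averaging step would allow it.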
4. Proposed Method
4.1. Training Scheme
4.1.1. Feature Extraction
4.1.2. Ensemble Classifiers
4.1.3. Fusion Step
- If the two classifiers agree, the fusion output should attain its maximum at the class they both selected;
- If the two classifiers disagree, the fusion output should attain its maximum at the true class.
- The sum of the two classifiers' probabilities for each class;
- The sum of the squared class probabilities of the two classifiers;
- The product of the two classifiers' probabilities for each class;
- The absolute value of the difference between the two classifiers' probabilities for each class.
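The four quantities listed above can be computed directly from the two classifiers' class-probability vectors. The following is a minimal sketch, not the authors' code: the function name `fusion_features`, the order in which the four groups are concatenated, and the reading of the second item as the sum of squares (rather than the square of the sum) are assumptions. How the resulting vector is mapped to a final decision is not stated in this section and is left out.

```python
import numpy as np

def fusion_features(p_a, p_b):
    """Build nonlinear fusion features from two class-probability vectors.

    p_a, p_b : arrays of shape (n_classes,), one per channel classifier.
    Returns an array of shape (4 * n_classes,).
    """
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    if p_a.shape != p_b.shape:
        raise ValueError("both classifiers must score the same set of classes")
    return np.concatenate([
        p_a + p_b,            # per-class sum
        p_a ** 2 + p_b ** 2,  # per-class sum of squares (assumed reading)
        p_a * p_b,            # per-class product
        np.abs(p_a - p_b),    # per-class absolute difference
    ])

# Usage with three classes; both classifiers favour class 0.
p_a = np.array([0.7, 0.2, 0.1])
p_b = np.array([0.5, 0.4, 0.1])
features = fusion_features(p_a, p_b)
```

When both classifiers pick the same class, the sum, sum-of-squares and product groups all peak at that class, which is consistent with the first requirement stated for the fusion step. With the 15 scene classes of the evaluation dataset, the vector would have 60 entries.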
4.2. Test Scheme
4.3. Evaluation Methodology
- Bus: travelling by bus in the city (vehicle);
- Cafe/Restaurant: small cafe/restaurant (indoor);
- Car: driving or travelling as a passenger, in the city (vehicle);
- City centre (outdoor);
- Forest path (outdoor);
- Grocery store: medium size grocery store (indoor);
- Home (indoor);
- Lakeside beach (outdoor);
- Library (indoor);
- Metro station (indoor);
- Office: multiple persons, typical workday (indoor);
- Residential area (outdoor);
- Train (travelling, vehicle);
- Tram (travelling, vehicle);
- Urban park (outdoor).
5. Results and Discussion
5.1. Results of Ensemble Classifiers and of the Proposed Method
5.2. Sensitivity Analysis
5.3. Result Comparison with Previous Studies
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Waldekar, S.; Saha, G. Two-level fusion-based acoustic scene classification. Appl. Acoust. 2020, 170, 107502.
- Ren, Z.; Kong, Q.; Han, J.; Plumbley, M.D.; Schuller, B.W. CAA-Net: Conditional Atrous CNNs with Attention for Explainable Device-robust Acoustic Scene Classification. IEEE Trans. Multimed. 2020, 23, 10–15.
- Abeßer, J. A Review of Deep Learning Based Methods for Acoustic Scene Classification. Appl. Sci. 2020, 10, 2020.
- Liu, Y.; Jiang, S.; Shi, C.; Li, H. Acoustic scene classification using ensembles of deep residual networks and spectrogram decompositions. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE 2019), New York, NY, USA, 25–26 October 2019.
- Naranjo-Alcazar, J.; Perez-Castanos, S.; Zuccarello, P.; Cobos, M. Acoustic Scene Classification with Squeeze-Excitation Residual Networks. IEEE Access 2020, 8, 112287–112296.
- Peeters, G.; Richard, G. Deep Learning for Audio and Music. In Multi-Faceted Deep Learning; Springer: Cham, Switzerland, 2021; pp. 231–266.
- Serizel, R.; Bisot, V.; Essid, S.; Richard, G. Acoustic Features for Environmental Sound Analysis. In Computational Analysis of Sound Scenes and Events; Springer International Publishing: Berlin/Heidelberg, Germany, 2017; pp. 71–101.
- Vilouras, K. Acoustic scene classification using fully convolutional neural networks and per-channel energy normalization. Technical Report, Detection and Classification of Acoustic Scenes and Events 2020 Challenge, 1 March–1 July 2020.
- Hajihashemi, V.; Alavigharahbagh, A.; Oliveira, H.S.; Cruz, P.M.; Tavares, J.M.R. Novel Time-Frequency Based Scheme for Detecting Sound Events from Sound Background in Audio Segments. In Iberoamerican Congress on Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2021; pp. 402–416.
- McDonnell, M.; UniSA, S. Low-Complexity Acoustic Scene Classification Using One-Bit-per-Weight Deep Convolutional Neural Networks, Technical Report, Detection and Classification of Acoustic Scenes and Events 2020 Challenge, 1 March–1 July 2020.
- Jiang, S.; Shi, C.; Li, H. Acoustic Scene Classification Technique for Active Noise Control. In Proceedings of the 2019 International Conference on Control, Automation and Information Sciences (ICCAIS), Chengdu, China, 23–26 October 2019; pp. 1–5.
- Ma, X.; Shao, Y.; Ma, Y.; Zhang, W.Q. Deep Semantic Encoder-Decoder Network for Acoustic Scene Classification with Multiple Devices. In Proceedings of the 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand, 7–10 December 2020; pp. 365–370.
- Zhang, T.; Liang, J.; Ding, B. Acoustic scene classification using deep CNN with fine-resolution feature. Expert Syst. Appl. 2020, 143, 113067.
- Yang, L.; Tao, L.; Chen, X.; Gu, X. Multi-scale semantic feature fusion and data augmentation for acoustic scene classification. Appl. Acoust. 2020, 163, 107238.
- He, N.; Zhu, J. A Weighted Partial Domain Adaptation for Acoustic Scene Classification and Its Application in Fiber Optic Security System. IEEE Access 2021, 9, 2244–2250.
- Nguyen, T.; Pernkopf, F.; Kosmider, M. Acoustic Scene Classification for Mismatched Recording Devices Using Heated-Up Softmax and Spectrum Correction. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020.
- Zhang, L.; Han, J.; Shi, Z. Learning Temporal Relations from Semantic Neighbors for Acoustic Scene Classification. IEEE Signal Process. Lett. 2020, 27, 950–954.
- Mezza, A.I.; Habets, E.A.; Müller, M.; Sarti, A. Feature Projection-Based Unsupervised Domain Adaptation for Acoustic Scene Classification. In Proceedings of the 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), Espoo, Finland, 21–24 September 2020; pp. 1–6.
- Mezza, A.I.; Habets, E.A.P.; Muller, M.; Sarti, A. Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching. In Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 18–21 January 2021.
- Takeyama, S.; Komatsu, T.; Miyazaki, K.; Togami, M.; Ono, S. Robust Acoustic Scene Classification to Multiple Devices Using Maximum Classifier Discrepancy and Knowledge Distillation. In Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 18–21 January 2021.
- Ooi, K.; Peksi, S.; Gan, W.S. Ensemble of Pruned Low-Complexity Models for Acoustic Scene Classification. In Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), Tokyo, Japan, 2–4 November 2020.
- Kwiatkowska, Z.; Kalinowski, B.; Kośmider, M.; Rykaczewski, K. Deep Learning Based Open Set Acoustic Scene Classification. In Interspeech 2020; ISCA: Singapore, 2020; pp. 1216–1220.
- Alamir, M.A. A novel acoustic scene classification model using the late fusion of convolutional neural networks and different ensemble classifiers. Appl. Acoust. 2021, 175, 107829.
- Abrol, V.; Sharma, P. Learning Hierarchy Aware Embedding from Raw Audio for Acoustic Scene Classification. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 1964–1973.
- Wu, Y.; Lee, T. Time-Frequency Feature Decomposition Based on Sound Duration for Acoustic Scene Classification. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 716–720.
- Leng, Y.; Zhao, W.; Lin, C.; Sun, C.; Wang, R.; Yuan, Q.; Li, D. LDA-based data augmentation algorithm for acoustic scene classification. Knowl.-Based Syst. 2020, 195, 105600.
- Pham, L.; Phan, H.; Nguyen, T.; Palaniappan, R.; Mertins, A.; McLoughlin, I. Robust acoustic scene classification using a multi-spectrogram encoder-decoder framework. Digit. Signal Process. 2021, 110, 102943.
- Nguyen, T.; Ngo, D.; Pham, L.; Tran, L.; Hoang, T. A Re-trained Model Based On Multi-kernel Convolutional Neural Network for Acoustic Scene Classification. In Proceedings of the 2020 RIVF International Conference on Computing and Communication Technologies (RIVF), Ho Chi Minh City, Vietnam, 14–15 October 2020.
- Gao, W.; McDonnell, M.; UniSA, S. Acoustic Scene Classification Using Deep Residual Networks with Focal Loss and Mild Domain Adaptation, Technical Report, Detection and Classification of Acoustic Scenes and Events 2020 Challenge, 1 March–1 July 2020.
- Lee, Y.; Lim, S.; Kwak, I.Y. CNN-Based Acoustic Scene Classification System. Electronics 2021, 10, 371.
- Seo, S.; Kim, C.; Kim, J.H. Multi-Channel Feature Using Inter-Class and Inter-Device Standard Deviations for Acoustic Scene Classification, Technical Report, Detection and Classification of Acoustic Scenes and Events 2020 Challenge, 1 March–1 July 2020.
- Hu, H.; Yang, C.H.H.; Xia, X.; Bai, X.; Tang, X.; Wang, Y.; Niu, S.; Chai, L.; Li, J.; Zhu, H.; et al. A Two-Stage Approach to Device-Robust Acoustic Scene Classification. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 1–5.
- McDonnell, M.D.; Gao, W. Acoustic Scene Classification Using Deep Residual Networks with Late Fusion of Separated High and Low Frequency Paths. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 141–145.
- Hu, H.; Yang, C.H.H.; Xia, X.; Bai, X.; Tang, X.; Wang, Y.; Niu, S.; Chai, L.; Li, J.; Zhu, H.; et al. Device-robust acoustic scene classification based on two-stage categorization and data augmentation. arXiv 2020, arXiv:2007.08389.
- Bai, X.; Du, J.; Pan, J.; Zhou, H.-s.; Tu, Y.H.; Lee, C.H. High-Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 656–660.
- Singh, A.; Rajan, P.; Bhavsar, A. SVD-based redundancy removal in 1D CNNs for acoustic scene classification. Pattern Recognit. Lett. 2020, 131, 383–389.
- Paseddula, C.; Gangashetty, S.V. Acoustic Scene Classification using Single Frequency Filtering Cepstral Coefficients and DNN. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–6.
- Lostanlen, V.; Andén, J. Binaural scene classification with wavelet scattering. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), Tampere University of Technology, Tampere, Finland, September 2016.
- Shim, H.J.; Jung, J.W.; Kim, J.H.; Yu, H.J. Capturing scattered discriminative information using a deep architecture in acoustic scene classification. arXiv 2020, arXiv:2007.04631.
- Jung, J.W.; Heo, H.S.; Shim, H.J.; Yu, H.J. Knowledge Distillation in Acoustic Scene Classification. IEEE Access 2020, 8, 166870–166879.
- Nguyen, T.; Pernkopf, F. Acoustic Scene Classification Using a Convolutional Neural Network Ensemble and Nearest Neighbor Filters. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, Surrey, UK, 19–20 November 2018.
- Jung, J.W.; Heo, H.S.; Shim, H.J.; Yu, H. DNN based multi-level feature ensemble for acoustic scene classification. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), Surrey, UK, 19–20 November 2018; pp. 113–117.
- Singh, A.; Thakur, A.; Rajan, P.; Bhavsar, A. A layer-wise score level ensemble framework for acoustic scene classification. In Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–7 September 2018; pp. 837–841.
- Sakashita, Y.; Aono, M. Acoustic scene classification by ensemble of spectrograms based on adaptive temporal divisions. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE 2018), Surrey, UK, 19–20 November 2018.
- Mars, R.; Pratik, P.; Nagisetty, S.; Lim, C. Acoustic scene classification from binaural signals using convolutional neural networks. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), New York, NY, USA, 25–26 October 2019.
- Huang, J.; Lu, H.; Lopez Meyer, P.; Cordourier, H.; Del Hoyo Ontiveros, J. Acoustic scene classification using deep learning-based ensemble averaging. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), New York, NY, USA, 25–26 October 2019.
- Wang, W.; Liu, M.; Li, Y. The SEIE-SCUT systems for acoustic scene classification using CNN ensemble. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), New York, NY, USA, 25–26 October 2019.
- Ding, B.; Liu, G.; Liang, J. Acoustic scene classification based on ensemble system. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), New York, NY, USA, 25–26 October 2019.
- Xu, K.; Zhu, B.; Kong, Q.; Mi, H.; Ding, B.; Wang, D.; Wang, H. General audio tagging with ensembling convolutional neural networks and statistical features. J. Acoust. Soc. Am. 2019, 145, EL521–EL527.
- Gao, L.; Xu, K.; Wang, H.; Peng, Y. Multi-representation knowledge distillation for audio classification. Multimed. Tools Appl. 2022, 1–24.
- Wang, M.; Wang, R.; Zhang, X.L.; Rahardja, S. Hybrid constant-Q transform based CNN ensemble for acoustic scene classification. In Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, China, 18–21 November 2019; pp. 1511–1516.
- Lopez-Meyer, P.; Ontiveros, J.d.H.; Stemmer, G.; Nachman, L.; Huang, J. Ensemble of convolutional neural networks for the DCASE 2020 acoustic scene classification challenge. In Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), Tokyo, Japan, 2–4 November 2020.
- Chin, C.S.; Kek, X.Y.; Chan, T.K. Scattering Transform of Averaged Data Augmentation for Ensemble Random Subspace Discriminant Classifiers in Audio Recognition. In Proceedings of the 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 19–20 March 2021; Volume 1, pp. 454–458.
- Wang, Q.; Zheng, S.; Li, Y.; Wang, Y.; Wu, Y.; Hu, H.; Yang, C.H.H.; Siniscalchi, S.M.; Wang, Y.; Du, J.; et al. A Model Ensemble Approach for Audio-Visual Scene Classification. In Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), Online, 15–19 November 2021.
- Sarman, S.; Sert, M. Audio based violent scene classification using ensemble learning. In Proceedings of the 2018 6th International Symposium on Digital Forensic and Security (ISDFS), Antalya, Turkey, 22–25 March 2018; pp. 1–5.
- Paseddula, C.; Gangashetty, S.V. Late fusion framework for Acoustic Scene Classification using LPCC, SCMC, and log-Mel band energies with Deep Neural Networks. Appl. Acoust. 2021, 172, 107568.
- Mallat, S. Group Invariant Scattering. Commun. Pure Appl. Math. 2012, 65, 1331–1398.
- Anden, J.; Mallat, S. Deep Scattering Spectrum. IEEE Trans. Signal Process. 2014, 62, 4114–4128.
- Zhu, H.; Wong, T.; Lin, N.; Lung, H.; Li, Z.; Thedoridis, S. A New Target Classification Method for Synthetic Aperture Radar Images based on Wavelet Scattering Transform. In Proceedings of the 2020 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Macau, China, 21–24 August 2020; pp. 1–6.
- Ghezaiel, W.; Brun, L.; Lezoray, O. Wavelet Scattering Transform and CNN for Closed Set Speaker Identification. In Proceedings of the 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), Tampere, Finland, 21–24 September 2020; pp. 1–6.
- Adiga, A.; Magimai, M.; Seelamantula, C.S. Gammatone wavelet Cepstral Coefficients for robust speech recognition. In Proceedings of the 2013 IEEE International Conference of IEEE Region 10 (TENCON 2013), Xi’an, China, 22–25 October 2013; pp. 1–4.
- Anden, J.; Lostanlen, V.; Mallat, S. Joint time-frequency scattering for audio classification. In Proceedings of the 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), Boston, MA, USA, 17–20 September 2015; pp. 1–6.
- Kreyszig, E. Advanced Engineering Mathematics, 10th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2009.
- Chaparro, L.; Akan, A. Signals and Systems Using MATLAB; Academic Press: Cambridge, MA, USA, 2018.
- Slaney, M. An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank; Apple Computer Technical Report #35; Perception Group, Advanced Technology Group, Apple Computer Inc.: Cupertino, Santa Clara County, CA, USA, 1993; Volume 35, pp. 57–64.
- Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844.
- Mesaros, A.; Heittola, T.; Diment, A.; Elizalde, B.; Shah, A.; Vincent, E.; Raj, B.; Virtanen, T. DCASE 2017 challenge setup: Tasks, datasets and baseline system. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2017, Munich, Germany, 16–17 November 2017.
- Zhao, S.; Nguyen, T.N.T.; Gan, W.S.; Jones, D.L. ADSC submission for DCASE 2017: Acoustic scene classification using deep residual convolutional neural networks. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2017, Munich, Germany, 16–17 November 2017.
- Jung, J.W.; Heo, H.S.; Yang, I.; Yoon, S.H.; Shim, H.J.; Yu, H.J. DNN-based audio scene classification for DCASE 2017: Dual input features, balancing cost, and stochastic data duplication. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2017, Munich, Germany, 16–17 November 2017.
- Piczak, K.J. The details that matter: Frequency resolution of spectrograms in acoustic scene classification. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2017, Munich, Germany, 16–17 November 2017.
- Kukanov, I.; Hautamäki, V.; Lee, K.A. Recurrent neural network and maximal figure of merit for acoustic event detection. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2017, Munich, Germany, 16–17 November 2017.
- Park, S.; Mun, S.; Lee, Y.; Ko, H. Acoustic scene classification based on convolutional neural network using double image features. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE 2017), Munich, Germany, 16–17 November 2017; pp. 98–102.
- Lehner, B.; Eghbal-Zadeh, H.; Dorfer, M.; Korzeniowski, F.; Koutini, K.; Widmer, G. Classifying short acoustic scenes with I-vectors and CNNs: Challenges and optimisations for the 2017 DCASE ASC task. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2017, Munich, Germany, 16–17 November 2017.
- Hyder, R.; Ghaffarzadegan, S.; Feng, Z.; Hasan, T. Buet Bosch consortium (B2C) acoustic scene classification systems for DCASE 2017 challenge. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2017, Munich, Germany, 16–17 November 2017.
- Zheng, W.; Jiantao, Y.; Xing, X.; Liu, X.; Peng, S. Acoustic scene classification using deep convolutional neural network and multiple spectrograms fusion. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2017, Munich, Germany, 16–17 November 2017.
- Han, Y.; Park, J.; Lee, K. Convolutional neural networks with binaural representations and background subtraction for acoustic scene classification. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2017, Munich, Germany, 16–17 November 2017.
- Mun, S.; Park, S.; Han, D.K.; Ko, H. Generative adversarial network based acoustic scene training set augmentation and selection using SVM hyper-plane. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, DCASE 2017, Munich, Germany, 16–17 November 2017.
- Ren, Z.; Qian, K.; Zhang, Z.; Pandit, V.; Baird, A.; Schuller, B. Deep Scalogram Representations for Acoustic Scene Classification. IEEE/CAA J. Autom. Sin. 2018, 5, 662–669.
- Waldekar, S.; Saha, G. Wavelet Transform Based Mel-scaled Features for Acoustic Scene Classification. In Interspeech 2018; ISCA: Singapore, 2018; pp. 3323–3327.
- Yang, Y.; Zhang, H.; Tu, W.; Ai, H.; Cai, L.; Hu, R.; Xiang, F. Kullback–Leibler Divergence Frequency Warping Scale for Acoustic Scene Classification Using Convolutional Neural Network. In Proceedings of the ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 840–844.
- Wu, Y.; Lee, T. Enhancing Sound Texture in CNN-based Acoustic Scene Classification. In Proceedings of the ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 815–819.
- Chen, H.; Zhang, P.; Yan, Y. An Audio Scene Classification Framework with Embedded Filters and a DCT-based Temporal Module. In Proceedings of the ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 835–839.
- Mesaros, A.; Heittola, T.; Benetos, E.; Foster, P.; Lagrange, M.; Virtanen, T.; Plumbley, M.D. Detection and classification of acoustic scenes and events: Outcome of the DCASE 2016 challenge. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 26, 379–393.
Property | Development | Evaluation |
---|---|---|
Number of Files | 4680 | 1620 |
Number of Classes | 15 | |
Duration per audio signal | 10 s | |
Data Format | 44.1 kHz, 24 bit resolution, Binaural stereo wave files | |
Location | Dataset was recorded in different cities, including London and Paris. | |
Device | Roland Edirol R-09 wave recorder | |
Task | Acoustic Scene Classification | |
Ref. | Year | Test Accuracy (%) | Detection Approach |
---|---|---|---|
[68] | 2017 | 70 | Deep residual CNN |
[69] | 2017 | 70.6 | DNN |
[70] | 2017 | 70.6 | CNN |
[71] | 2017 | 71.7 | Recurrent Neural Network (RNN) |
[72] | 2017 | 72.6 | CNN |
[73] | 2017 | 73.8 | CNN |
[74] | 2017 | 74.1 | CNN |
[75] | 2017 | 77.7 | DCNN |
[76] | 2017 | 80.4 | CNN |
[77] | 2017 | 83.3 | GAN |
[78] | 2018 | 64 | Deep scalogram representations |
[79] | 2018 | 69.9 | SVM |
[80] | 2019 | 69.3 | CNN |
[81] | 2019 | 75.4 | CNN |
[82] | 2019 | 77.1 | DCNN |
[1] | 2020 | 70 | SVM |
[23] | 2021 | 80 | CNN and Ensemble classifiers |
[27] | 2021 | 72.6 | DNN |
Channel A | | 72.04 | One Ensemble classifier |
Channel B | | 76.36 | One Ensemble classifier |
Our Method | | 95 | Two Ensemble classifiers and Fusion |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hajihashemi, V.; Gharahbagh, A.A.; Cruz, P.M.; Ferreira, M.C.; Machado, J.J.M.; Tavares, J.M.R.S. Binaural Acoustic Scene Classification Using Wavelet Scattering, Parallel Ensemble Classifiers and Nonlinear Fusion. Sensors 2022, 22, 1535. https://doi.org/10.3390/s22041535