Open Access Article

Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion

1. School of Astronautics, Northwestern Polytechnical University (NPU), 127 Youyi Xilu, Xi’an 710072, China
2. Signals, Images, and Intelligent Systems Laboratory (LISSI / EA 3956), University Paris-Est Creteil, Senart-FB Institute of Technology, 36-37 rue Charpak, 77127 Lieusaint, France
* Author to whom correspondence should be addressed.
Sensors 2019, 19(7), 1733; https://doi.org/10.3390/s19071733
Received: 13 March 2019 / Revised: 9 April 2019 / Accepted: 9 April 2019 / Published: 11 April 2019
PDF [2597 KB, uploaded 11 April 2019]

Abstract

With the popularity of deep learning-based models in various classification problems, and their proven robustness compared with conventional methods, a growing number of researchers have applied such methods to environment sound classification (ESC) in recent years. However, the performance of existing models that train deep neural networks on auditory features such as the log-mel spectrogram (LM) and mel-frequency cepstral coefficients (MFCC), or on raw waveforms, remains unsatisfactory. In this paper, we first propose two combined features to give a more comprehensive representation of environment sounds. Then, a four-layer convolutional neural network (CNN) is presented to improve ESC performance with the proposed aggregated features. Finally, the CNNs trained with the different features are fused using the Dempster–Shafer evidence theory to compose the TSCNN-DS model. The experimental results indicate that our combined features with the four-layer CNN are well suited to environment sound classification problems and dramatically outperform other conventional methods. The proposed TSCNN-DS model achieves a classification accuracy of 97.2%, the highest reported accuracy on the UrbanSound8K dataset compared with existing models.
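The decision-level fusion described above combines the class-probability outputs of the two CNN streams with the Dempster–Shafer rule of combination. As a minimal illustrative sketch (not the paper's exact implementation), the code below applies Dempster's rule with each class treated as a singleton hypothesis, so the fused belief is the element-wise product of the two probability vectors renormalized by one minus the conflict mass; the function name `ds_fuse` and the example probabilities are hypothetical.

```python
def ds_fuse(p1, p2):
    """Dempster's rule of combination restricted to singleton hypotheses:
    the fused mass of each class is the element-wise product of the two
    sources' masses, renormalized by 1 minus the conflict mass K."""
    joint = [a * b for a, b in zip(p1, p2)]
    norm = sum(joint)  # = 1 - K, where K is the mass assigned to conflict
    if norm == 0.0:
        raise ValueError("the two sources are in total conflict")
    return [j / norm for j in joint]

# Hypothetical softmax outputs from an LM-stream and an MFCC-stream CNN
lm_stream = [0.70, 0.20, 0.10]
mfcc_stream = [0.60, 0.30, 0.10]
fused = ds_fuse(lm_stream, mfcc_stream)
```

Note how agreement between the two streams sharpens the belief in the winning class (0.70 and 0.60 fuse to roughly 0.86), which is the effect a decision-level fusion scheme of this kind exploits.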
Keywords: Auditory Cognition; Environment Sound Classification; Convolutional Neural Network; Dempster–Shafer Evidence Theory; Fusion Model
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Share & Cite This Article

MDPI and ACS Style

Su, Y.; Zhang, K.; Wang, J.; Madani, K. Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion. Sensors 2019, 19, 1733.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Sensors EISSN 1424-8220, published by MDPI AG, Basel, Switzerland.