Article

Efficient Classification of Environmental Sounds through Multiple Features Aggregation and Data Enhancement Techniques for Spectrogram Images

Department of Electrical Engineering, National Taiwan University of Science & Technology (NTUST), Taipei 106, Taiwan
Symmetry 2020, 12(11), 1822; https://doi.org/10.3390/sym12111822
Received: 6 October 2020 / Revised: 27 October 2020 / Accepted: 29 October 2020 / Published: 3 November 2020
(This article belongs to the Section Computer Science and Symmetry/Asymmetry)
Over the past few years, environmental sound classification (ESC) has attracted growing research interest because of the intricate nature of environmental sounds. This paper reports a study of several acoustic feature aggregation and data enhancement approaches for classifying environmental sounds effectively. The proposed data augmentation techniques mix reinforcement, aggregation, and combination of distinct acoustic features. These features, called spectrogram image features (SIFs), are obtained with different audio feature extraction techniques. All audio features used in this paper fall into two groups: general features and Mel filter bank-based acoustic features. Two novel features, obtained by repeatedly applying a logarithm to the Mel spectrogram (Mel), are introduced: Log(Log-Mel) and Log(Log(Log-Mel)), denoted L2M and L3M, respectively. Three prevailing ESC benchmark datasets are used: ESC-10, ESC-50, and UrbanSound8K (Us8k). Because most audio clips in these datasets are not filled with sound for their full duration and contain silent segments, silence trimming is applied as a pre-processing step. Training uses the DenseNet-161 model through transfer learning, and the model is fine-tuned with discriminative learning, i.e., an individually chosen optimal learning rate for each group of layers. The proposed methods achieve state-of-the-art results on all three datasets: 99.22% on ESC-10, 98.52% on ESC-50, and 97.98% on Us8k. The techniques are also evaluated on real-time audio data to assess their performance and efficiency, where they likewise achieve competitive results.
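The abstract names silence trimming, the Log-Mel, L2M and L3M features, and feature aggregation without giving formulas. The following is a minimal pure-Python sketch of how these steps could fit together, assuming a Mel power spectrogram has already been computed. Several details are illustrative assumptions, not the authors' published procedure: trimming only leading and trailing frames whose RMS energy falls below a fixed threshold, shifting each map so its minimum becomes 1 before taking another logarithm (log values can be zero or negative), min-max normalization, and "aggregation" as stacking three normalized features as image channels for a CNN such as DenseNet-161. All function names are hypothetical.

```python
import math
from typing import List

# A spectrogram is stored as rows (Mel bands) x columns (time frames).
Matrix = List[List[float]]


def trim_silence(samples: List[float], frame_len: int = 512,
                 threshold: float = 0.01) -> List[float]:
    """Remove leading and trailing frames whose RMS energy is below threshold.

    Hypothetical rule: frame_len and threshold are illustrative values,
    not settings taken from the paper.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    rms = [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]
    voiced = [i for i, r in enumerate(rms) if r >= threshold]
    if not voiced:
        return []  # the whole clip is below the threshold
    start, end = voiced[0], voiced[-1] + 1
    return samples[start * frame_len:end * frame_len]


def log_mel(mel: Matrix, eps: float = 1e-10) -> Matrix:
    """Log-Mel: natural log of a Mel power spectrogram, floored by eps."""
    return [[math.log(v + eps) for v in row] for row in mel]


def shifted_log(m: Matrix) -> Matrix:
    """Log after shifting so the smallest entry maps to log(1) = 0.

    Assumption: the abstract does not state how zero or negative log values
    are handled before taking another log; this shift is one choice.
    """
    lo = min(min(row) for row in m)
    return [[math.log(v - lo + 1.0) for v in row] for row in m]


def l2m(mel: Matrix) -> Matrix:
    """L2M = Log(Log-Mel), under the shifted_log assumption."""
    return shifted_log(log_mel(mel))


def l3m(mel: Matrix) -> Matrix:
    """L3M = Log(Log(Log-Mel)), under the shifted_log assumption."""
    return shifted_log(l2m(mel))


def min_max(m: Matrix) -> Matrix:
    """Scale a feature map to [0, 1] so different features are comparable."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    span = (hi - lo) or 1.0  # avoid division by zero for constant maps
    return [[(v - lo) / span for v in row] for row in m]


def aggregate(*features: Matrix) -> List[Matrix]:
    """Hypothetical aggregation: stack normalized features as image channels."""
    return [min_max(f) for f in features]


if __name__ == "__main__":
    clip = [0.0] * 1024 + [0.5, -0.5] * 512 + [0.0] * 1024
    print(len(trim_silence(clip)))  # 1024: only the two loud frames remain
    mel = [[1.0, 10.0], [100.0, 1000.0]]  # toy 2-band x 2-frame Mel power map
    image = aggregate(mel, l2m(mel), l3m(mel))  # 3 channels, e.g. for a CNN
    print(len(image), len(image[0]), len(image[0][0]))  # 3 2 2
```

Because the logarithm is monotonic, the shift in `shifted_log` keeps every value inside the log's domain while preserving the ordering of time-frequency bins; converting to decibels first, or clipping at a floor, would be equally plausible alternatives.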
Keywords: data augmentation; environmental sound classification; transfer learning; features aggregation; ESC-10; ESC-50; urban sound 8k

MDPI and ACS Style

Mushtaq, Z.; Su, S.-F. Efficient Classification of Environmental Sounds through Multiple Features Aggregation and Data Enhancement Techniques for Spectrogram Images. Symmetry 2020, 12, 1822. https://doi.org/10.3390/sym12111822
