Open Access | Feature Paper | Article
Appl. Sci. 2018, 8(1), 150; doi:10.3390/app8010150

SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification

Graduate School of Culture Technology, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea
This article is a rewritten and extended version of "Sample-level Deep Convolutional Neural Networks for music auto-tagging using raw waveforms", presented at SMC 2017 in Espoo, Finland, on 5 July 2017.
* Author to whom correspondence should be addressed.
Academic Editor: Meinard Müller
Received: 3 November 2017 / Revised: 14 January 2018 / Accepted: 17 January 2018 / Published: 22 January 2018
(This article belongs to the Special Issue Sound and Music Computing)

Abstract

Convolutional Neural Networks (CNNs) have been applied to diverse machine learning tasks for different modalities of raw data in an end-to-end fashion. In the audio domain, raw waveform-based approaches have been explored to directly learn hierarchical characteristics of audio. However, the majority of previous studies have limited their model capacity by taking a frame-level structure similar to short-time Fourier transforms. We previously proposed a CNN architecture that learns representations using sample-level filters beyond typical frame-level input representations. The architecture showed performance comparable to that of a spectrogram-based CNN model in music auto-tagging. In this paper, we extend the previous work in three ways. First, considering that the sample-level model requires a much longer training time, we progressively downsample the input signals and examine how this affects performance. Second, we extend the model using a multi-level and multi-scale feature aggregation technique and subsequently conduct transfer learning for several music classification tasks. Finally, we visualize the filters learned by the sample-level CNN in each layer to identify hierarchically learned features and show that they are sensitive to log-scaled frequency.
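
To make the sample-level idea concrete, the following is a minimal sketch, written in PyTorch, of a CNN front end that applies 3-sample filters with stride 3 directly to raw waveforms, in the spirit of the architecture described above. This is not the authors' released implementation; the class name, channel widths, number of blocks, input length, and 50-tag output are illustrative assumptions.

import torch
import torch.nn as nn

class SampleCNNSketch(nn.Module):
    """Illustrative sketch of a sample-level CNN (not the paper's exact model)."""
    def __init__(self, n_classes=50):
        super().__init__()
        # Input layer operates on the raw waveform itself: 3-sample filters
        # with stride 3, instead of a frame-level (STFT-sized) first layer.
        self.input_conv = nn.Sequential(
            nn.Conv1d(1, 128, kernel_size=3, stride=3),
            nn.BatchNorm1d(128),
            nn.ReLU(),
        )

        # Each block reduces the temporal length by 3x, building the
        # hierarchical features the abstract refers to.
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
                nn.MaxPool1d(3),
            )

        self.blocks = nn.Sequential(
            block(128, 128), block(128, 128), block(128, 256),
            block(256, 256), block(256, 256), block(256, 256),
        )

        # Global pooling + sigmoid head for multi-label auto-tagging.
        self.head = nn.Sequential(
            nn.AdaptiveMaxPool1d(1),
            nn.Flatten(),
            nn.Linear(256, n_classes),
            nn.Sigmoid(),
        )

    def forward(self, x):  # x: (batch, 1, n_samples)
        return self.head(self.blocks(self.input_conv(x)))

# A 3^8-sample excerpt passes cleanly through the seven 3x reductions.
model = SampleCNNSketch()
out = model(torch.randn(2, 1, 3 ** 8))
print(out.shape)  # torch.Size([2, 50])

Because every layer shortens the time axis by a factor of 3, input lengths that are powers of 3 avoid boundary effects; downsampling the input, as studied in the paper, simply shortens this stack.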
Keywords: convolutional neural networks; music classification; raw waveforms; sample-level filters; downsampling; filter visualization; transfer learning

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Cite This Article

MDPI and ACS Style

Lee, J.; Park, J.; Kim, K.L.; Nam, J. SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification. Appl. Sci. 2018, 8, 150.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
