Article

A Comparison of Deep Learning Methods for Timbre Analysis in Polyphonic Automatic Music Transcription

Department of Electronic Engineering and Communications, University of Zaragoza, 50018 Zaragoza, Spain
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Academic Editors: Alexander Lerch and Peter Knees
Electronics 2021, 10(7), 810; https://doi.org/10.3390/electronics10070810
Received: 26 February 2021 / Revised: 24 March 2021 / Accepted: 25 March 2021 / Published: 29 March 2021
(This article belongs to the Special Issue Machine Learning Applied to Music/Audio Signal Processing)
Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When AMT is approached with deep neural networks, the variety of timbres across instruments is an issue that has not yet been studied in depth. The goal of this work is to address AMT by first analyzing how timbre affects monophonic transcription, using an approach based on the CREPE neural network, and then to improve the results by performing polyphonic transcription of different timbres with a second approach based on the Deep Salience model, which operates on the Constant-Q Transform. The results of the first method show that the timbre and envelope of the onsets have a high impact on the AMT results, and the second method shows that the developed model is less dependent on onset strength than other state-of-the-art models that perform AMT on piano sounds, such as Google Magenta Onsets and Frames (OaF). Our polyphonic transcription model outperforms the state of the art on non-piano instruments; for bass instruments, for example, it achieves an F-score of 0.9516 versus 0.7102. In a final experiment we also show how adding an onset detector to our model can further improve the results reported in this work.
Keywords: music transcription; music information retrieval; deep learning
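
As a rough illustration of the two front ends mentioned in the abstract, the sketch below uses the open-source crepe and librosa Python packages: CREPE for frame-level monophonic pitch estimation, and a Constant-Q Transform of the kind a Deep-Salience-style polyphonic model takes as input. The file name, sample rate, and CQT parameters (fmin = C1, 6 octaves, 60 bins per octave) are illustrative assumptions, not the exact configuration used in the paper.

```python
# Illustrative sketch only (not the authors' code): the two kinds of front end
# mentioned in the abstract, built with the open-source crepe and librosa packages.
import numpy as np
import librosa
import crepe

# Load a short audio excerpt (hypothetical file name, assumed sample rate).
audio, sr = librosa.load("example.wav", sr=16000, mono=True)

# (1) Monophonic pitch tracking with CREPE: a frame-level f0 contour with a
# per-frame voicing confidence value.
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)

# (2) A Constant-Q Transform, the kind of time-frequency representation a
# Deep-Salience-style polyphonic model consumes. These parameters
# (fmin = C1 ~ 32.7 Hz, 6 octaves, 60 bins per octave) are common choices for
# salience models and are assumptions here, not the paper's exact settings.
cqt = librosa.cqt(
    audio,
    sr=sr,
    hop_length=512,
    fmin=librosa.note_to_hz("C1"),
    n_bins=360,
    bins_per_octave=60,
)
log_cqt = librosa.amplitude_to_db(np.abs(cqt), ref=np.max)
print(frequency.shape, log_cqt.shape)  # f0 frames; (CQT bins, CQT frames)
```

In a salience-based pipeline, the log-scaled CQT (or a harmonic stack of several CQTs) would then be fed to a convolutional network that outputs a pitch-salience map, which is thresholded to obtain note estimates.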
Citation

Hernandez-Olivan, C.; Zay Pinilla, I.; Hernandez-Lopez, C.; Beltran, J.R. A Comparison of Deep Learning Methods for Timbre Analysis in Polyphonic Automatic Music Transcription. Electronics 2021, 10, 810. https://doi.org/10.3390/electronics10070810