Search Results (3)

Search Parameters:
Keywords = automatic drum transcription

20 pages, 686 KiB  
Article
High-Quality and Reproducible Automatic Drum Transcription from Crowdsourced Data
by Mickaël Zehren, Marco Alunno and Paolo Bientinesi
Signals 2023, 4(4), 768-787; https://doi.org/10.3390/signals4040042 - 10 Nov 2023
Cited by 2 | Viewed by 2919
Abstract
Within the broad problem known as automatic music transcription, we considered the specific task of automatic drum transcription (ADT). This is a complex task that has recently shown significant advances thanks to deep learning (DL) techniques. Most notably, massive amounts of labeled data obtained from crowds of annotators have made it possible to implement large-scale supervised learning architectures for ADT. In this study, we explored the untapped potential of these new datasets by addressing three key points: First, we reviewed recent trends in DL architectures and focused on two techniques, self-attention mechanisms and tatum-synchronous convolutions. Then, to mitigate the noise and bias that are inherent in crowdsourced data, we extended the training data with additional annotations. Finally, to quantify the potential of the data, we compared many training scenarios by combining up to six different datasets, including zero-shot evaluations. Our findings revealed that crowdsourced datasets outperform previously utilized datasets, and regardless of the DL architecture employed, they are sufficient in size and quality to train accurate models. By fully exploiting this data source, our models produced high-quality drum transcriptions, achieving state-of-the-art results. Thanks to this accuracy, our work can be more successfully used by musicians (e.g., to learn new musical pieces by reading, or to convert their performances to MIDI) and researchers in music information retrieval (e.g., to retrieve information from the notes instead of audio, such as the rhythm or structure of a piece).
(This article belongs to the Topic Research on the Application of Digital Signal Processing)

19 pages, 10088 KiB  
Article
Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms
by Ryoto Ishizuka, Ryo Nishikimi and Kazuyoshi Yoshii
Signals 2021, 2(3), 508-526; https://doi.org/10.3390/signals2030031 - 13 Aug 2021
Cited by 8 | Viewed by 3639
Abstract
This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal in contrast to most conventional ADT methods that estimate the frame-level onset probabilities of drums. To estimate a tatum-level score, we propose a deep transcription model that consists of a frame-level encoder for extracting the latent features from a music signal and a tatum-level decoder for estimating a drum score from the latent features pooled at the tatum level. To capture the global repetitive structure of drum scores, which is difficult to learn with a recurrent neural network (RNN), we introduce a self-attention mechanism with tatum-synchronous positional encoding into the decoder. To mitigate the difficulty of training the self-attention-based model from an insufficient amount of paired data and to improve the musical naturalness of the estimated scores, we propose a regularized training method that uses a global structure-aware masked language (score) model with a self-attention mechanism pretrained from an extensive collection of drum scores. The experimental results showed that the proposed regularized model outperformed the conventional RNN-based model in terms of the tatum-level error rate and the frame-level F-measure, even when only a limited amount of paired data was available so that the non-regularized model underperformed the RNN-based model.
(This article belongs to the Special Issue Advances in Processing and Understanding of Music Signals)
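The abstract above mentions latent features "pooled at the tatum level". As a rough illustration only, the following sketch averages frame-level feature vectors inside each tatum interval. The function name, the choice of mean pooling, and the zero vector used for empty intervals are all assumptions made for this example; the abstract does not specify how the authors pool.

```python
from bisect import bisect_left


def pool_frames_to_tatums(frame_features, frame_times, tatum_times):
    """Average frame-level feature vectors within each tatum interval.

    frame_features: list of equal-length feature vectors, one per frame
    frame_times:    time (seconds) of each frame, in ascending order
    tatum_times:    tatum boundary times (seconds), ascending, length >= 2
    Returns one pooled vector per interval [tatum_times[i], tatum_times[i + 1]).
    Mean pooling is an assumption of this sketch, not the paper's stated method.
    """
    dim = len(frame_features[0])
    pooled = []
    for start, end in zip(tatum_times, tatum_times[1:]):
        # Frames lo..hi-1 are exactly those with start <= time < end.
        lo = bisect_left(frame_times, start)
        hi = bisect_left(frame_times, end)
        segment = frame_features[lo:hi]
        if segment:
            # Column-wise mean over the frames in this tatum.
            pooled.append([sum(col) / len(segment) for col in zip(*segment)])
        else:
            # Assumption: an interval containing no frames yields a zero vector.
            pooled.append([0.0] * dim)
    return pooled


if __name__ == "__main__":
    features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
    times = [0.0, 0.1, 0.2, 0.3]
    print(pool_frames_to_tatums(features, times, [0.0, 0.2, 0.4]))
```

The result has one vector per tatum rather than one per frame, which is the kind of sequence a tatum-level decoder would consume.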

26 pages, 880 KiB  
Article
Sigmoidal NMFD: Convolutional NMF with Saturating Activations for Drum Mixture Decomposition
by Len Vande Veire, Cedric De Boom and Tijl De Bie
Electronics 2021, 10(3), 284; https://doi.org/10.3390/electronics10030284 - 25 Jan 2021
Cited by 9 | Viewed by 3681
Abstract
In many types of music, percussion plays an essential role to establish the rhythm and the groove of the music. Algorithms that can decompose the percussive signal into its constituent components would therefore be very useful, as they would enable many analytical and creative applications. This paper describes a method for the unsupervised decomposition of percussive recordings, building on the non-negative matrix factor deconvolution (NMFD) algorithm. Given a percussive music recording, NMFD discovers a dictionary of time-varying spectral templates and corresponding activation functions, representing its constituent sounds and their positions in the mix. We observe, however, that the activation functions discovered using NMFD do not show the expected impulse-like behavior for percussive instruments. We therefore enforce this behavior by specifying that the activations should take on binary values: either an instrument is hit, or it is not. To this end, we rewrite the activations as the output of a sigmoidal function, multiplied with a per-component amplitude factor. We furthermore define a regularization term that biases the decomposition to solutions with saturated activations, leading to the desired binary behavior. We evaluate several optimization strategies and techniques that are designed to avoid poor local minima. We show that incentivizing the activations to be binary indeed leads to the desired impulse-like behavior, and that the resulting components are better separated, leading to more interpretable decompositions.
(This article belongs to the Special Issue Machine Learning Applied to Music/Audio Signal Processing)
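The abstract above describes activations written as the output of a sigmoidal function multiplied by a per-component amplitude, inside a convolutional (NMFD-style) model. The sketch below illustrates that parameterization and the convolutive reconstruction in plain Python. All names are hypothetical, and the `saturation_penalty` function is only one plausible way to favor saturated activations; it is an assumption of this example, not the regularization term defined in the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoidal_activations(logits, amplitudes):
    """H[k][t] = amplitudes[k] * sigmoid(logits[k][t])."""
    return [[a * sigmoid(g) for g in row] for row, a in zip(logits, amplitudes)]


def nmfd_reconstruct(templates, activations, n_frames):
    """V_hat[f][t] = sum_k sum_tau templates[k][tau][f] * activations[k][t - tau].

    templates[k][tau][f]: spectrum of component k at time lag tau, frequency bin f.
    """
    n_bins = len(templates[0][0])
    v_hat = [[0.0] * n_frames for _ in range(n_bins)]
    for k, template in enumerate(templates):
        for tau, spectrum in enumerate(template):
            for t in range(tau, n_frames):
                h = activations[k][t - tau]
                for f, w in enumerate(spectrum):
                    v_hat[f][t] += w * h
    return v_hat


def saturation_penalty(logits):
    """Hypothetical penalty: s * (1 - s) vanishes as s approaches 0 or 1.

    Illustrative assumption only; the paper's actual term may differ.
    """
    return sum(sigmoid(g) * (1.0 - sigmoid(g)) for row in logits for g in row)


if __name__ == "__main__":
    logits = [[20.0, -20.0, -20.0]]             # one component, hit at frame 0 only
    templates = [[[1.0, 0.5], [0.5, 0.25]]]     # one component, 2 lags, 2 bins
    H = sigmoidal_activations(logits, [2.0])
    print(nmfd_reconstruct(templates, H, n_frames=3))
    print(saturation_penalty(logits))
```

With strongly positive or negative logits the activations are close to either the amplitude or zero, which matches the "hit or not hit" behavior the abstract aims for; logits near zero give intermediate activations and a larger penalty.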