Search Results (5)

Search Parameters:
Keywords = automatic music transcription (AMT)

10 pages, 451 KiB  
Article
PF2N: Periodicity–Frequency Fusion Network for Multi-Instrument Music Transcription
by Taehyeon Kim, Man-Je Kim and Chang Wook Ahn
Mathematics 2025, 13(11), 1708; https://doi.org/10.3390/math13111708 - 23 May 2025
Viewed by 567
Abstract
Automatic music transcription in multi-instrument settings remains a highly challenging task due to overlapping harmonics and diverse timbres. To address this, we propose the Periodicity–Frequency Fusion Network (PF2N), a lightweight and modular component that enhances transcription performance by integrating both spectral and periodicity-domain representations. Inspired by traditional combined frequency and periodicity (CFP) methods, the PF2N reformulates CFP as a neural module that jointly learns harmonically correlated features across the frequency and cepstral domains. Unlike handcrafted alignments in classical approaches, the PF2N performs data-driven fusion using a learnable joint feature extractor. Extensive experiments on three benchmark datasets (Slakh2100, MusicNet, and MAESTRO) demonstrate that the PF2N consistently improves transcription accuracy when incorporated into state-of-the-art models. The results confirm the effectiveness and adaptability of the PF2N, highlighting its potential as a general-purpose enhancement for multi-instrument AMT systems.
(This article belongs to the Section E1: Mathematics and Computer Science)
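As context for the CFP idea that the PF2N turns into a learnable module, the sketch below is a minimal, purely illustrative NumPy version of classical combined frequency and periodicity: a compressed power spectrum (which peaks at harmonics) is multiplied by a periodicity representation mapped onto the frequency axis (which peaks at subharmonics), so only bins supported by both survive. All names and parameters are illustrative; the paper's learnable joint feature extractor replaces this handcrafted fusion.

```python
# Illustrative classical-CFP fusion (not the PF2N module itself).
import numpy as np

def cfp_frame(frame, sr, gamma=0.6):
    """Fuse frequency- and periodicity-domain evidence for a single frame."""
    n = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hanning(n))) ** 2   # power spectrum (peaks at harmonics)
    spec_c = np.maximum(spec, 1e-12) ** gamma                # nonlinear compression
    ceps = np.fft.irfft(spec_c)[: n // 2]                    # periodicity (autocorrelation-like)
    ceps = np.maximum(ceps, 0.0)                             # half-wave rectification
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    lag_freqs = sr / np.arange(1, len(ceps))                 # lag of k samples -> frequency sr/k
    ceps_on_freq = np.interp(freqs, lag_freqs[::-1], ceps[1:][::-1], left=0.0, right=0.0)
    return spec_c * ceps_on_freq                             # peaks only where both domains agree

# Toy check: a 440 Hz tone should yield a fused peak near 440 Hz.
sr, n = 16000, 2048
t = np.arange(n) / sr
fused = cfp_frame(np.sin(2 * np.pi * 440.0 * t), sr)
print(np.fft.rfftfreq(n, 1.0 / sr)[np.argmax(fused)])
```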

16 pages, 2532 KiB  
Article
Towards Automatic Expressive Pipa Music Transcription Using Morphological Analysis of Photoelectric Signals
by Yuancheng Wang, Xuanzhe Li, Yunxiao Zhang and Qiao Wang
Sensors 2025, 25(5), 1361; https://doi.org/10.3390/s25051361 - 23 Feb 2025
Viewed by 599
Abstract
The musical signal produced by plucked instruments often exhibits non-stationarity due to variations in pitch and amplitude, making pitch estimation a challenge. In this paper, we assess different transcription processes and algorithms applied to signals captured by optical sensors mounted on a pipa—a traditional Chinese plucked instrument—played using a range of techniques. The captured signal demonstrates a distinctive arched feature during plucking. This facilitates onset detection and avoids the impact of the spurious energy peaks within vibration areas that arise from pitch-shift playing techniques. Subsequently, we developed a novel time–frequency feature, known as continuous time-period mapping (CTPM), which contains pitch curves. The proposed process can also be applied to playing techniques that mix pitch shifts and tremolo. When evaluated on four renowned pipa music pieces of varying difficulty levels, our fully time-domain onset detectors outperformed four short-time methods, particularly during tremolo. Our zero-crossing-based pitch estimator achieved performance comparable to that of short-time methods with far better computational efficiency, demonstrating its suitability for use in a lightweight algorithm in future work.
(This article belongs to the Special Issue Recent Advances in Smart Mobile Sensing Technology)
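The zero-crossing pitch estimator the abstract mentions rests on a simple principle: a quasi-sinusoidal segment, such as a plucked-string displacement signal between onsets, crosses zero twice per period, so f0 ≈ sr · crossings / (2N). The sketch below illustrates only that generic principle, not the authors' morphology-based pipeline for photoelectric signals.

```python
# Generic zero-crossing f0 estimate (illustration only, not the authors' estimator).
import numpy as np

def zero_crossing_f0(segment, sr):
    """Estimate f0 from the zero-crossing count of a quasi-sinusoidal segment."""
    segment = segment - np.mean(segment)                  # remove any DC offset
    signs = np.signbit(segment)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return sr * crossings / (2.0 * len(segment))          # two crossings per period

sr = 8000
t = np.arange(4000) / sr                                  # a 0.5 s synthetic segment
print(zero_crossing_f0(np.sin(2 * np.pi * 196.0 * t), sr))  # ~196 Hz
```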

15 pages, 856 KiB  
Article
DAFE-MSGAT: Dual-Attention Feature Extraction and Multi-Scale Graph Attention Network for Polyphonic Piano Transcription
by Rui Cao, Zushuang Liang, Zheng Yan and Bing Liu
Electronics 2024, 13(19), 3939; https://doi.org/10.3390/electronics13193939 - 5 Oct 2024
Viewed by 1348
Abstract
Automatic music transcription (AMT) aims to convert raw audio signals into symbolic music. This is a highly challenging task in the fields of signal processing and artificial intelligence, and it holds significant application value in music information retrieval (MIR). Existing methods based on convolutional neural networks (CNNs) often fall short in capturing the time–frequency characteristics of audio signals and tend to overlook the interdependencies between notes when processing polyphonic piano music with multiple simultaneous notes. To address these issues, we propose a dual-attention feature extraction and multi-scale graph attention network (DAFE-MSGAT). Specifically, we design a dual-attention feature extraction module (DAFE) to enhance the frequency- and time-domain features of the audio signal, and we utilize a long short-term memory network (LSTM) to capture the temporal features within the audio signal. We introduce a multi-scale graph attention network (MSGAT), which leverages the various implicit relationships between notes to enhance the interaction between different notes. Experimental results demonstrate that our model achieves high accuracy in detecting the onsets and offsets of notes on public datasets. In both frame-level and note-level metrics, DAFE-MSGAT achieves performance comparable to state-of-the-art methods, showcasing exceptional transcription capabilities.
(This article belongs to the Section Artificial Intelligence)
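For intuition about the graph-attention component, the sketch below implements one generic single-head graph-attention step in which note nodes aggregate their neighbours' features weighted by learned attention coefficients. The weights are random placeholders and the code is not the DAFE-MSGAT architecture, only an illustration of the mechanism it builds on.

```python
# Generic single-head graph-attention step over note nodes (illustration only).
import numpy as np

def gat_layer(h, adj, W, a, leaky=0.2):
    """h: (N, F) node features; adj: (N, N) adjacency with self-loops."""
    z = h @ W                                              # (N, F') projected node features
    # Attention logits e_ij = LeakyReLU(a^T [z_i || z_j]), with a split into two halves.
    e = (z @ a[: z.shape[1]])[:, None] + (z @ a[z.shape[1]:])[None, :]
    e = np.where(e > 0, e, leaky * e)                      # LeakyReLU
    e = np.where(adj > 0, e, -1e9)                         # keep only actual edges
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)       # softmax over each node's neighbours
    return np.maximum(alpha @ z, 0.0)                      # aggregate neighbours, then ReLU

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))                                # 5 note nodes with 8-dim features
adj = np.eye(5) + np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)  # notes chained in time
out = gat_layer(h, adj, W=rng.normal(size=(8, 4)), a=rng.normal(size=8))
print(out.shape)                                           # (5, 4)
```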

16 pages, 914 KiB  
Article
A Comparison of Deep Learning Methods for Timbre Analysis in Polyphonic Automatic Music Transcription
by Carlos Hernandez-Olivan, Ignacio Zay Pinilla, Carlos Hernandez-Lopez and Jose R. Beltran
Electronics 2021, 10(7), 810; https://doi.org/10.3390/electronics10070810 - 29 Mar 2021
Cited by 22 | Viewed by 5587
Abstract
Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When AMT is approached with deep neural networks, the variety of timbres of different instruments is an issue that has not yet been studied in depth. The goal of this work is to address AMT by first analyzing how timbre affects monophonic transcription, with an approach based on the CREPE neural network, and then improving the results by performing polyphonic music transcription across different timbres with a second approach based on the Deep Salience model, which operates on the Constant-Q Transform. The results of the first method show that the timbre and envelope of the onsets have a high impact on the AMT results, while the second method shows that the developed model is less dependent on the strength of the onsets than other state-of-the-art models that deal with AMT of piano sounds, such as Google Magenta Onsets and Frames (OaF). Our polyphonic transcription model outperforms the state-of-the-art model on non-piano instruments; for bass instruments, for example, it achieves an F-score of 0.9516 versus 0.7102. In our latest experiment, we also show how adding an onset detector to our model can improve on the results reported in this work.
(This article belongs to the Special Issue Machine Learning Applied to Music/Audio Signal Processing)
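The second approach described above is driven by a Constant-Q Transform (CQT) representation, as in the Deep Salience model. The sketch below, assuming librosa is available, computes such a CQT front end and substitutes a plain magnitude threshold for the learned salience network, purely to show the kind of time-frequency input and per-bin activation map involved; the bin resolution and threshold are illustrative, not the paper's settings.

```python
# CQT front end with a crude threshold standing in for a learned salience model.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))            # any mono recording works here
C = np.abs(librosa.cqt(y, sr=sr, hop_length=512,
                       fmin=librosa.note_to_hz("C1"),
                       n_bins=72, bins_per_octave=12))
C_db = librosa.amplitude_to_db(C, ref=np.max)
salience = (C_db > -30).astype(float)                   # stand-in for the learned network
print(C.shape, salience.sum(axis=1)[:5])                # (72 bins, n_frames), active frames per bin
```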

19 pages, 1373 KiB  
Article
Estimating the Rank of a Nonnegative Matrix Factorization Model for Automatic Music Transcription Based on Stein’s Unbiased Risk Estimator
by Seokjin Lee
Appl. Sci. 2020, 10(8), 2911; https://doi.org/10.3390/app10082911 - 23 Apr 2020
Cited by 8 | Viewed by 3185
Abstract
In this paper, methods to estimate the number of basis vectors in the nonnegative matrix factorization (NMF) used by automatic music transcription (AMT) systems are proposed. Previously, studies on NMF-based AMT have demonstrated that the number of basis vectors affects the performance and that the number of note events can be a good choice for the NMF rank. However, many NMF-based AMT methods do not provide a way to estimate the appropriate number of basis vectors; instead, the number is assumed to be given in advance, even though it significantly affects the algorithm’s performance. Recently, certain Bayesian estimation algorithms for the number of basis vectors have been proposed; however, they are not designed as standalone music transcription tools but as components of specific NMF methods, and thus cannot be used generally with NMF-based transcription algorithms. Our proposed estimation algorithms are based on eigenvalue decomposition and Stein’s unbiased risk estimator (SURE). Because the SURE method requires the variance of the undesired components as a priori knowledge, the proposed algorithms estimate this value using random matrix theory and the first and second onset information in the input music signal. Experiments were then conducted on the AMT task using the MIDI-aligned piano sounds (MAPS) database, and the proposed algorithms were compared with variational NMF, gamma process NMF, and NMF with automatic relevance determination. Based on the experimental results, the conventional NMF-based transcription algorithm with the proposed rank estimation demonstrated F1-score improvements of 2–3% over these algorithms. While the performance advantages are not large, the results are meaningful because the proposed algorithms are lightweight, are easy to combine with any other NMF method that requires an a priori rank parameter, and do not have tuning parameters that considerably affect the performance.
(This article belongs to the Special Issue Intelligent Speech and Acoustic Signal Processing)
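As rough intuition for eigenvalue-based rank selection, the sketch below counts covariance eigenvalues that exceed the bulk edge predicted by random matrix theory for pure noise, given a noise variance. It illustrates the general idea only; the authors' SURE-based estimator, including its onset-based noise-variance estimation, is not reproduced here, and the noise variance is simply assumed known.

```python
# Simplified eigenvalue-thresholding rank estimate (illustration only).
import numpy as np

def estimate_rank(V, noise_var, tol=1.05):
    """V: magnitude spectrogram of shape (n_freq_bins, n_frames)."""
    f, t = V.shape
    cov = (V @ V.T) / t                                  # (f, f) sample covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]              # sorted descending
    # Largest eigenvalue pure noise would produce (Marchenko-Pastur edge), with a small margin.
    bulk_edge = tol * noise_var * (1.0 + np.sqrt(f / t)) ** 2
    return int(np.sum(eigvals > bulk_edge))

rng = np.random.default_rng(0)
W = np.abs(rng.normal(size=(256, 4)))                    # 4 synthetic note templates
H = np.abs(rng.normal(size=(4, 1000)))                   # their activations over time
V = np.maximum(W @ H + rng.normal(scale=0.1, size=(256, 1000)), 0.0)
print(estimate_rank(V, noise_var=0.01))                  # recovers the rank, 4
```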