Search Results (5)

Search Parameters:
Keywords = automatic music transcription (AMT)

10 pages, 451 KiB  
Article
PF2N: Periodicity–Frequency Fusion Network for Multi-Instrument Music Transcription
by Taehyeon Kim, Man-Je Kim and Chang Wook Ahn
Mathematics 2025, 13(11), 1708; https://doi.org/10.3390/math13111708 - 23 May 2025
Viewed by 567
Abstract
Automatic music transcription in multi-instrument settings remains a highly challenging task due to overlapping harmonics and diverse timbres. To address this, we propose the Periodicity–Frequency Fusion Network (PF2N), a lightweight and modular component that enhances transcription performance by integrating both spectral and periodicity-domain representations. Inspired by traditional combined frequency and periodicity (CFP) methods, the PF2N reformulates CFP as a neural module that jointly learns harmonically correlated features across the frequency and cepstral domains. Unlike handcrafted alignments in classical approaches, the PF2N performs data-driven fusion using a learnable joint feature extractor. Extensive experiments on three benchmark datasets (Slakh2100, MusicNet, and MAESTRO) demonstrate that the PF2N consistently improves transcription accuracy when incorporated into state-of-the-art models. The results confirm the effectiveness and adaptability of the PF2N, highlighting its potential as a general-purpose enhancement for multi-instrument AMT systems.
(This article belongs to the Section E1: Mathematics and Computer Science)
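As context for the CFP idea that the PF2N turns into a learnable module, the sketch below is a minimal, purely illustrative NumPy version of classical combined frequency and periodicity: a compressed power spectrum (which peaks at harmonics) is multiplied by a periodicity representation mapped onto the frequency axis (which peaks at subharmonics), so only bins supported by both survive. All names and parameters are illustrative; the paper's learnable joint feature extractor replaces this handcrafted fusion.

```python
# Illustrative classical-CFP fusion (not the PF2N module itself).
import numpy as np

def cfp_frame(frame, sr, gamma=0.6):
    """Fuse frequency- and periodicity-domain evidence for a single frame."""
    n = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hanning(n))) ** 2   # power spectrum (peaks at harmonics)
    spec_c = np.maximum(spec, 1e-12) ** gamma                # nonlinear compression
    ceps = np.fft.irfft(spec_c)[: n // 2]                    # periodicity (autocorrelation-like)
    ceps = np.maximum(ceps, 0.0)                             # half-wave rectification
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    lag_freqs = sr / np.arange(1, len(ceps))                 # lag of k samples -> frequency sr/k
    ceps_on_freq = np.interp(freqs, lag_freqs[::-1], ceps[1:][::-1], left=0.0, right=0.0)
    return spec_c * ceps_on_freq                             # peaks only where both domains agree

# Toy check: a 440 Hz tone should yield a fused peak near 440 Hz.
sr, n = 16000, 2048
t = np.arange(n) / sr
fused = cfp_frame(np.sin(2 * np.pi * 440.0 * t), sr)
print(np.fft.rfftfreq(n, 1.0 / sr)[np.argmax(fused)])
```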

16 pages, 2532 KiB  
Article
Towards Automatic Expressive Pipa Music Transcription Using Morphological Analysis of Photoelectric Signals
by Yuancheng Wang, Xuanzhe Li, Yunxiao Zhang and Qiao Wang
Sensors 2025, 25(5), 1361; https://doi.org/10.3390/s25051361 - 23 Feb 2025
Viewed by 599
Abstract
The musical signal produced by plucked instruments often exhibits non-stationarity due to variations in pitch and amplitude, making pitch estimation a challenge. In this paper, we assess different transcription processes and algorithms applied to signals captured by optical sensors mounted on a pipa—a traditional Chinese plucked instrument—played using a range of techniques. The captured signal demonstrates a distinctive arched feature during plucking. This facilitates onset detection and avoids the impact of the spurious energy peaks within vibration areas that arise from pitch-shift playing techniques. Subsequently, we developed a novel time–frequency feature, known as continuous time-period mapping (CTPM), which contains pitch curves. The proposed process can also be applied to playing techniques that mix pitch shifts and tremolo. When evaluated on four renowned pipa music pieces of varying difficulty levels, our fully time-domain onset detectors outperformed four short-time methods, particularly during tremolo. Our zero-crossing-based pitch estimator achieved performance comparable to that of short-time methods with far better computational efficiency, demonstrating its suitability for use in a lightweight algorithm in future work.
(This article belongs to the Special Issue Recent Advances in Smart Mobile Sensing Technology)
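The zero-crossing pitch estimator the abstract mentions rests on a simple principle: a quasi-sinusoidal segment, such as a plucked-string displacement signal between onsets, crosses zero twice per period, so f0 ≈ sr · crossings / (2N). The sketch below illustrates only that generic principle, not the authors' morphology-based pipeline for photoelectric signals.

```python
# Generic zero-crossing f0 estimate (illustration only, not the authors' estimator).
import numpy as np

def zero_crossing_f0(segment, sr):
    """Estimate f0 from the zero-crossing count of a quasi-sinusoidal segment."""
    segment = segment - np.mean(segment)                  # remove any DC offset
    signs = np.signbit(segment)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return sr * crossings / (2.0 * len(segment))          # two crossings per period

sr = 8000
t = np.arange(4000) / sr                                  # a 0.5 s synthetic segment
print(zero_crossing_f0(np.sin(2 * np.pi * 196.0 * t), sr))  # ~196 Hz
```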

15 pages, 856 KiB  
Article
DAFE-MSGAT: Dual-Attention Feature Extraction and Multi-Scale Graph Attention Network for Polyphonic Piano Transcription
by Rui Cao, Zushuang Liang, Zheng Yan and Bing Liu
Electronics 2024, 13(19), 3939; https://doi.org/10.3390/electronics13193939 - 5 Oct 2024
Viewed by 1348
Abstract
Automatic music transcription (AMT) aims to convert raw audio signals into symbolic music. This is a highly challenging task in the fields of signal processing and artificial intelligence, and it holds significant application value in music information retrieval (MIR). Existing methods based on convolutional neural networks (CNNs) often fall short in capturing the time–frequency characteristics of audio signals and tend to overlook the interdependencies between notes when processing polyphonic piano music with multiple simultaneous notes. To address these issues, we propose a dual-attention feature extraction and multi-scale graph attention network (DAFE-MSGAT). Specifically, we design a dual-attention feature extraction module (DAFE) to enhance the frequency- and time-domain features of the audio signal, and we utilize a long short-term memory network (LSTM) to capture the temporal features within the audio signal. We introduce a multi-scale graph attention network (MSGAT), which leverages the various implicit relationships between notes to enhance the interaction between different notes. Experimental results demonstrate that our model achieves high accuracy in detecting the onsets and offsets of notes on public datasets. In both frame-level and note-level metrics, DAFE-MSGAT achieves performance comparable to state-of-the-art methods, showcasing exceptional transcription capabilities.
(This article belongs to the Section Artificial Intelligence)
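For intuition about the graph-attention component, the sketch below implements one generic single-head graph-attention step in which note nodes aggregate their neighbours' features weighted by learned attention coefficients. The weights are random placeholders and the code is not the DAFE-MSGAT architecture, only an illustration of the mechanism it builds on.

```python
# Generic single-head graph-attention step over note nodes (illustration only).
import numpy as np

def gat_layer(h, adj, W, a, leaky=0.2):
    """h: (N, F) node features; adj: (N, N) adjacency with self-loops."""
    z = h @ W                                              # (N, F') projected node features
    # Attention logits e_ij = LeakyReLU(a^T [z_i || z_j]), with a split into two halves.
    e = (z @ a[: z.shape[1]])[:, None] + (z @ a[z.shape[1]:])[None, :]
    e = np.where(e > 0, e, leaky * e)                      # LeakyReLU
    e = np.where(adj > 0, e, -1e9)                         # keep only actual edges
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)       # softmax over each node's neighbours
    return np.maximum(alpha @ z, 0.0)                      # aggregate neighbours, then ReLU

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))                                # 5 note nodes with 8-dim features
adj = np.eye(5) + np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)  # notes chained in time
out = gat_layer(h, adj, W=rng.normal(size=(8, 4)), a=rng.normal(size=8))
print(out.shape)                                           # (5, 4)
```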

16 pages, 914 KiB  
Article
A Comparison of Deep Learning Methods for Timbre Analysis in Polyphonic Automatic Music Transcription
by Carlos Hernandez-Olivan, Ignacio Zay Pinilla, Carlos Hernandez-Lopez and Jose R. Beltran
Electronics 2021, 10(7), 810; https://doi.org/10.3390/electronics10070810 - 29 Mar 2021
Cited by 22 | Viewed by 5587
Abstract
Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When AMT is approached with deep neural networks, the variety of timbres of different instruments is an issue that has not yet been studied in depth. The goal of this work is to address AMT by first analyzing how timbre affects monophonic transcription, with an approach based on the CREPE neural network, and then improving the results by performing polyphonic music transcription across different timbres with a second approach based on the Deep Salience model, which operates on the Constant-Q Transform. The results of the first method show that the timbre and envelope of the onsets have a high impact on the AMT results, while the second method shows that the developed model is less dependent on the strength of the onsets than other state-of-the-art models that deal with AMT of piano sounds, such as Google Magenta Onsets and Frames (OaF). Our polyphonic transcription model outperforms the state-of-the-art model on non-piano instruments; for bass instruments, for example, it achieves an F-score of 0.9516 versus 0.7102. In our latest experiment, we also show how adding an onset detector to our model can improve on the results reported in this work.
(This article belongs to the Special Issue Machine Learning Applied to Music/Audio Signal Processing)
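The second approach described above is driven by a Constant-Q Transform (CQT) representation, as in the Deep Salience model. The sketch below, assuming librosa is available, computes such a CQT front end and substitutes a plain magnitude threshold for the learned salience network, purely to show the kind of time-frequency input and per-bin activation map involved; the bin resolution and threshold are illustrative, not the paper's settings.

```python
# CQT front end with a crude threshold standing in for a learned salience model.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))            # any mono recording works here
C = np.abs(librosa.cqt(y, sr=sr, hop_length=512,
                       fmin=librosa.note_to_hz("C1"),
                       n_bins=72, bins_per_octave=12))
C_db = librosa.amplitude_to_db(C, ref=np.max)
salience = (C_db > -30).astype(float)                   # stand-in for the learned network
print(C.shape, salience.sum(axis=1)[:5])                # (72 bins, n_frames), active frames per bin
```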

19 pages, 1373 KiB  
Article
Estimating the Rank of a Nonnegative Matrix Factorization Model for Automatic Music Transcription Based on Stein’s Unbiased Risk Estimator
by Seokjin Lee
Appl. Sci. 2020, 10(8), 2911; https://doi.org/10.3390/app10082911 - 23 Apr 2020
Cited by 8 | Viewed by 3185
Abstract
In this paper, methods to estimate the number of basis vectors in the nonnegative matrix factorization (NMF) used by automatic music transcription (AMT) systems are proposed. Previously, studies on NMF-based AMT have demonstrated that the number of basis vectors affects the performance and that the number of note events can be a good choice for the NMF rank. However, many NMF-based AMT methods do not provide a way to estimate the appropriate number of basis vectors; instead, the number is assumed to be given in advance, even though it significantly affects the algorithm’s performance. Recently, certain Bayesian estimation algorithms for the number of basis vectors have been proposed; however, they are not designed as standalone music transcription tools but as components of specific NMF methods, and thus cannot be used generally with NMF-based transcription algorithms. Our proposed estimation algorithms are based on eigenvalue decomposition and Stein’s unbiased risk estimator (SURE). Because the SURE method requires the variance of the undesired components as a priori knowledge, the proposed algorithms estimate this value using random matrix theory and the first and second onset information in the input music signal. Experiments were then conducted on the AMT task using the MIDI-aligned piano sounds (MAPS) database, and the proposed algorithms were compared with variational NMF, gamma process NMF, and NMF with automatic relevance determination. Based on the experimental results, the conventional NMF-based transcription algorithm with the proposed rank estimation demonstrated F1-score improvements of 2–3% over these algorithms. While the performance advantages are not large, the results are meaningful because the proposed algorithms are lightweight, are easy to combine with any other NMF method that requires an a priori rank parameter, and do not have tuning parameters that considerably affect the performance.
(This article belongs to the Special Issue Intelligent Speech and Acoustic Signal Processing)
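As rough intuition for eigenvalue-based rank selection, the sketch below counts covariance eigenvalues that exceed the bulk edge predicted by random matrix theory for pure noise, given a noise variance. It illustrates the general idea only; the authors' SURE-based estimator, including its onset-based noise-variance estimation, is not reproduced here, and the noise variance is simply assumed known.

```python
# Simplified eigenvalue-thresholding rank estimate (illustration only).
import numpy as np

def estimate_rank(V, noise_var, tol=1.05):
    """V: magnitude spectrogram of shape (n_freq_bins, n_frames)."""
    f, t = V.shape
    cov = (V @ V.T) / t                                  # (f, f) sample covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]              # sorted descending
    # Largest eigenvalue pure noise would produce (Marchenko-Pastur edge), with a small margin.
    bulk_edge = tol * noise_var * (1.0 + np.sqrt(f / t)) ** 2
    return int(np.sum(eigvals > bulk_edge))

rng = np.random.default_rng(0)
W = np.abs(rng.normal(size=(256, 4)))                    # 4 synthetic note templates
H = np.abs(rng.normal(size=(4, 1000)))                   # their activations over time
V = np.maximum(W @ H + rng.normal(scale=0.1, size=(256, 1000)), 0.0)
print(estimate_rank(V, noise_var=0.01))                  # recovers the rank, 4
```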