MDPI - Publisher of Open Access Journals

24 pages, 3882 KiB

Open Access Article

Open-Set Recognition of Pansori Rhythm Patterns Based on Audio Segmentation

by Jie You and Joonwhoan Lee

Appl. Sci. 2024, 14(16), 6893; https://doi.org/10.3390/app14166893 - 6 Aug 2024

Cited by 1 | Viewed by 1207

Abstract

Pansori, a traditional Korean form of musical storytelling, is performed by a vocalist and a drummer. It is well known for the singer’s expressive narrative (aniri) and delicate gestures with a fan in hand. The classical Pansori repertoire mostly tells stories of love, satire, and humor, along with some social lessons. Performances can last from three to five hours and require the vocalist to adhere to precise rhythmic structures. These distinctive rhythms are crucial for conveying both the narrative and the musical expression effectively. This paper addresses open-set recognition, aiming to efficiently identify unknown Pansori rhythm patterns and to apply the same methodology to other acoustic datasets, such as sound events and music genres. We propose a lightweight deep learning-based encoder–decoder segmentation model that takes a 2-D log-Mel spectrogram as encoder input and produces a frame-based 1-D decision along the temporal axis. This segmentation approach, which processes 2-D inputs to classify rhythm patterns frame by frame, proves effective in detecting unknown patterns within time-varying sound streams encountered in daily life. During training, center loss and supervised contrastive loss are minimized together with cross-entropy loss. This strategy aims to form compact clusters for the known classes in the embedded feature space, leaving ample room for unknown rhythm patterns and thereby making them easier to recognize. Comprehensive experiments on various datasets, including Pansori rhythm patterns (91.8%), synthetic datasets of instrument sounds (95.1%), music genres (76.9%), and sound datasets from DCASE challenges (73.0%), demonstrate the efficacy of the proposed method in detecting unknown events, as measured by AUROC.

(This article belongs to the Special Issue Algorithmic Music and Sound Computing)
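The abstract above pairs a compact-cluster objective (center loss) with AUROC-based evaluation of unknown detection. The following is a minimal pure-Python sketch of those two ideas only, not the authors' implementation: the function names, the use of the distance to the nearest class center as an "unknown" score, and the toy 2-D features are all assumptions made for illustration.

```python
import math

def class_centers(features, labels):
    """Mean feature vector per known class (hypothetical helper)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def center_loss(features, labels, centers):
    """Half the mean squared distance of each feature to its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(x, centers[y]))
    return 0.5 * total / len(features)

def unknown_score(x, centers):
    """Assumed scoring rule: distance to the nearest known-class center.
    A larger value suggests the sample belongs to no known class."""
    return min(math.dist(x, c) for c in centers.values())

def auroc(scores_unknown, scores_known):
    """Probability that a random unknown sample outscores a random known
    sample; ties count as one half."""
    wins = 0.0
    for u in scores_unknown:
        for k in scores_known:
            wins += 1.0 if u > k else 0.5 if u == k else 0.0
    return wins / (len(scores_unknown) * len(scores_known))

# Toy embedded features for two known classes and two unknown samples.
known = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.0, 4.2]]
known_labels = [0, 0, 1, 1]
unknown = [[2.0, 2.0], [10.0, 10.0]]

centers = class_centers(known, known_labels)
loss = center_loss(known, known_labels, centers)
score = auroc([unknown_score(x, centers) for x in unknown],
              [unknown_score(x, centers) for x in known])
```

The tighter the known-class clusters (the smaller `center_loss`), the more cleanly a distance-based score can separate unknown samples, which is the intuition the abstract gives for combining these losses.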

15 pages, 3038 KiB

Open Access Article

Korean Pansori Vocal Note Transcription Using Attention-Based Segmentation and Viterbi Decoding

by Bhuwan Bhattarai and Joonwhoan Lee

Appl. Sci. 2024, 14(2), 492; https://doi.org/10.3390/app14020492 - 5 Jan 2024

Viewed by 1527

Abstract

In this paper, we first compare various attention mechanisms within a semantic pixel-wise segmentation framework for frame-level transcription. Second, we apply the Viterbi algorithm, transferring the knowledge of the frame-level transcription model, to obtain the vocal notes of Korean Pansori. The semantic pixel-wise segmentation framework for frame-level transcription is the source task, and the Viterbi algorithm-based note-level transcription of Korean Pansori is the target task. The primary goal is to transcribe the vocal notes of Pansori music, a traditional Korean art form. To this end, we first conducted experiments on the source task, training a model for vocal melody extraction. We then combined the Viterbi algorithm with the frame-level transcription model to obtain the vocal note transcription for the target task. We compared our attention-based segmentation methods for frame-level transcription with various algorithms on the vocal melody task of the MedleyDB dataset, measuring voicing recall, voicing false alarm, raw pitch accuracy, raw chroma accuracy, and overall accuracy. The results highlight the significance of attention mechanisms for enhancing the performance of frame-level music transcription models. Because no ground-truth vocal notes exist for Pansori, we evaluated the target task through a visual and subjective comparison. This analysis provides valuable insights into the preservation and appreciation of this culturally rich art form.

(This article belongs to the Section Computing and Artificial Intelligence)
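The abstract above describes turning frame-level predictions into note-level output with the Viterbi algorithm. The sketch below illustrates that general idea under simplifying assumptions that do not come from the paper: each state is a hypothetical pitch label, the initial prior is uniform, and a single `stay_prob` parameter controls how strongly the decoder prefers to remain in the same state between frames.

```python
import math

def viterbi(frame_probs, stay_prob=0.9):
    """Most likely state path given per-frame state probabilities.
    Assumes at least two states and a uniform initial prior."""
    n_states = len(frame_probs[0])
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))
    eps = 1e-12  # avoids log(0) for zero-probability frames

    def trans(i, j):
        return log_stay if i == j else log_switch

    score = [math.log(p + eps) for p in frame_probs[0]]
    back = []
    for probs in frame_probs[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + trans(i, j))
            new_score.append(score[best_i] + trans(best_i, j)
                             + math.log(probs[j] + eps))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def frames_to_notes(path):
    """Merge runs of identical states into (state, start, end) segments,
    with end exclusive."""
    notes, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            notes.append((path[start], start, t))
            start = t
    return notes

# Frame 2 weakly favors state 1, but the smoothed path ignores the blip.
probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1],
         [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]
notes = frames_to_notes(viterbi(probs))  # [(0, 0, 4), (1, 4, 7)]
```

A plain per-frame argmax would split the first note at frame 2; the transition penalty is what lets the decoder report two sustained notes instead.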

11 pages, 2875 KiB

Open Access Article

Tracking the Rhythm: Pansori Rhythm Segmentation and Classification Methods and Datasets

by Yagya Raj Pandeya, Bhuwan Bhattarai and Joonwhoan Lee

Appl. Sci. 2022, 12(19), 9571; https://doi.org/10.3390/app12199571 - 23 Sep 2022

Cited by 4 | Viewed by 2425

Abstract

This paper presents two methods for understanding the rhythmic patterns of the voice in Pansori, a form of Korean traditional music. We used semantic segmentation and classification-based structural analysis to segment the seven rhythmic categories of Pansori. We propose two datasets: one for rhythm classification and one for rhythm segmentation. Two classification networks and two segmentation networks are trained and tested in an end-to-end manner. The standard HR network and the DeepLabV3+ network are used for rhythm segmentation. A modified HR network and a novel GlocalMuseNet are used for the classification of music rhythm. GlocalMuseNet outperforms the HR network for Pansori rhythm classification. A novel segmentation model (a modified HR network) is proposed for Pansori rhythm segmentation. The results show that the DeepLabV3+ network is superior to the HR network. The classifier networks are applied to time-varying rhythm classification, which acts as segmentation by using overlapping window frames over a spectral representation of the audio. Semantic segmentation with DeepLabV3+ and the HR network gives better results than the classification-based structural analysis methods used in this work; however, its annotation process is relatively time-consuming and costly.

(This article belongs to the Special Issue Scale Space and Variational Methods in Computer Vision)
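The abstract above notes that a classifier applied to overlapping window frames behaves like a segmenter. One way this could work is sketched below: classify each window, then let every frame take the majority label of the windows covering it. The window and hop sizes, the voting rule, and the stand-in `majority_label` classifier are illustrative assumptions, not details from the paper.

```python
from collections import Counter

def sliding_windows(n_frames, win, hop):
    """(start, end) index pairs of overlapping windows, end exclusive.
    A final window is added if the regular hops leave trailing frames."""
    if n_frames <= win:
        return [(0, n_frames)]
    starts = list(range(0, n_frames - win + 1, hop))
    if starts[-1] + win < n_frames:
        starts.append(n_frames - win)
    return [(s, s + win) for s in starts]

def segment_by_classification(frames, classify, win, hop):
    """Label each frame by majority vote over the windows that contain it.
    `classify` maps a list of frames to one label; ties go to the label
    that was voted first."""
    votes = [Counter() for _ in frames]
    for start, end in sliding_windows(len(frames), win, hop):
        label = classify(frames[start:end])
        for t in range(start, end):
            votes[t][label] += 1
    return [v.most_common(1)[0][0] for v in votes]

def majority_label(window):
    """Toy stand-in for a trained rhythm classifier: the frames here are
    already labels, so the 'prediction' is the most common one."""
    return Counter(window).most_common(1)[0][0]

# Twelve frames with a rhythm change halfway through.
frames = ["a"] * 6 + ["b"] * 6
segmentation = segment_by_classification(frames, majority_label, win=3, hop=1)
```

A smaller hop gives each frame more votes and a sharper boundary at the cost of more classifier calls, which hints at why this route needs only window-level labels while dense segmentation needs frame-level annotation.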

Search Results (3)
