1. Introduction
In Korean traditional music, rhythmic structure is organized through recurring cyclic patterns known as jangdan. While the term jangdan primarily refers to specific named rhythmic types (e.g., Jinyangjo, Jungmori, Jajinmori), it simultaneously defines the temporal interval between two successive structural accents, a unit directly analogous to the downbeat period in Western music. In this study, we adopt this perspective: each jangdan cycle corresponds to one downbeat-to-downbeat interval. Consequently, the task of detecting jangdan boundaries is equivalent to conventional downbeat tracking. This conceptual alignment allows state-of-the-art downbeat tracking techniques to be applied to Pansori without requiring a fundamental redefinition of the temporal inference problem, while clearly distinguishing rhythmic pattern classification from temporal segmentation.
The perception and organization of musical time constitute a fundamental research question across auditory science, ethnomusicology, and music information retrieval (MIR). Beats and downbeats serve as critical temporal anchors that guide bodily entrainment, build musical expectation, and support expressive interpretation. Accurate and robust identification of these anchors is therefore essential for downstream applications, including automatic music transcription, performance analysis, computer-assisted pedagogy, digital archiving, and interactive music systems. Moreover, computational modeling of rhythm and timing provides powerful tools for investigating how temporal structure contributes to emotional expression, cultural identity, and stylistic diversity in different musical traditions [1].
While beat and downbeat tracking systems have reached high maturity for Western popular and classical music, most models are trained under assumptions of isochronous meter, relatively stable tempo, and strong percussive reinforcement, characteristics that rarely hold in many non-Western traditions. In repertoires featuring asymmetric meters, flexible micro-timing, polyrhythmic layering, or predominantly vocal articulation (e.g., Carnatic music, Turkish aksak rhythms, Uruguayan Candombe, or Korean Pansori), conventional systems frequently exhibit poor generalization [2,3,4]. Growing evidence suggests that culturally and stylistically specific modeling incorporating domain knowledge about rhythm, tempo plasticity, and instrumentation is often necessary to achieve reliable performance outside the Western canon.
Korean Pansori, a sung narrative art form combining virtuosic vocal expression (sori), dramatic storytelling, and minimal percussion accompaniment by the gosu (drummer), exemplifies these challenges. Pansori performances are structured around jangdan cycles that govern dramatic pacing, emotional contour, and interaction between singer (sorikkun) and drummer (see Table 1). The extreme sparsity of accompaniment, frequent shifts between singing and speech-like delivery (aniri), and highly flexible timing make automatic rhythm analysis particularly difficult. Despite increasing interest in computational ethnomusicology, systematic MIR research on Pansori rhythm remains limited, and existing general-purpose beat/downbeat trackers often fail to capture its essential temporal characteristics.
To address these issues, the present study introduces a jangdan-aware downbeat tracking framework specifically designed for Pansori. By explicitly treating jangdan cycles as downbeat periods, we adapt contemporary deep learning architectures and incorporate rhythm-specific tempo constraints within a Dynamic Bayesian Network (DBN) decoder. The improved reliability of jangdan/downbeat detection then enables novel downstream analyses of two central dimensions of Pansori expressivity: beat-level vocal energy contours and intra-cycle tempo variation. The main contributions of this work are the following:
We formalize the functional equivalence between jangdan cycles and Western downbeat periods, enabling direct application and adaptation of modern downbeat tracking methods to Pansori.
We propose a rhythm-pattern-aware downbeat tracking system that combines Temporal Convolutional Networks (TCNs) and RoFormer transformers (with rotary positional embeddings) with a jangdan-conditioned DBN decoder using culturally-informed tempo priors.
We report significantly superior downbeat tracking performance compared to general-purpose systems, both on full mixture recordings and more challengingly on isolated vocal tracks.
Utilizing the enhanced jangdan detection, we conduct the first systematic computational study of characteristic vocal energy profiles and tempo shaping strategies across major jangdan patterns, offering new quantitative insight into Pansori performance practice.
The remainder of the paper is organized as follows.
Section 2 reviews prior work on beat and downbeat tracking in Western and non-Western musical traditions, with emphasis on existing computational approaches to Pansori. Section 3 describes the dataset, input representation, model architectures, post-processing, and training and inference procedures. Section 4 reports quantitative downbeat tracking results and presents analyses of beat-level vocal energy and tempo variation within jangdan cycles. Section 5 concludes the study and outlines directions for future work.
3. Materials and Methods
3.1. Data
A total of 80,743 s (22.4 h) of professionally produced Pansori recordings were collected from commercial CDs in collaboration with Jeonbuk National University’s Department of Music. All recordings were in .wav format digitized at 44.1 kHz and 16-bit resolution. The corpus covers the principal jangdan rhythmic patterns used in traditional Pansori performance, including Jinyangjo, Jungmori, Jungjungmori, Jajinmori, Hwimori, Eotmori, and Eotjungmori.
Each performance was manually annotated by expert musicians using Sonic Visualiser. Annotators identified downbeat positions and labeled the corresponding jangdan pattern. The final configuration allocates 80% of recordings for training (17.92 h), 10% for validation, and 10% for testing (2.24 h). Durations broken down by rhythm pattern are summarised in Table 2.
3.2. Input Representation
In this study, the input representation uses a log-magnitude spectrogram with a 2048-sample Hann window and a 100 ms hop size at a 22,050 Hz sampling rate, resulting in a 10 Hz frame rate. This setup is tailored to the Pansori genre, whose highly expressive, irregular, and sparse downbeat patterns develop over long timescales (often several seconds), unlike the regular, fast pulses common in Western music. Typical MIR beat/downbeat systems employ finer resolutions (e.g., 10–20 ms hops, or 50–100 Hz frame rates) to align with narrow tolerances (±70 ms), but these can oversample irrelevant micro-timing details in Pansori and add noise that masks broader, narrative-driven rhythms. Instead, the 100 ms hop provides sufficient sampling for slow-changing structures, aligning with MIR practices that use 5–20 Hz frame rates for slowly evolving features in low-tempo music, while improving efficiency and matching the genre's jangdan cycles. For spectrogram extraction, all audio was converted to mono.
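As a concrete check of these parameters, the hop size and frame rate follow directly from the stated sampling rate; the variable names below are illustrative, and the log-magnitude step is sketched rather than taken from the paper's code:

```python
import math

SR = 22050          # sampling rate in Hz
N_FFT = 2048        # Hann window length in samples
HOP_SECONDS = 0.1   # 100 ms hop

hop_length = int(round(SR * HOP_SECONDS))   # 2205 samples
frame_rate = SR / hop_length                # 10.0 Hz, as stated in the text

# Frames covering one 60 s training segment at this frame rate:
frames_per_minute = math.ceil(60 * frame_rate)

# Log-magnitude compression (applied per STFT bin) is typically of the form:
log_mag = lambda m: math.log1p(m)           # log(1 + |X|), one common choice
```

With librosa, the equivalent call would be along the lines of `librosa.stft(y, n_fft=2048, hop_length=2205, window="hann")`, though the exact extraction code is not specified in the text.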
3.3. Model Architectures and Postprocessing
Two model families were evaluated for downbeat tracking: a Temporal Convolutional Network (TCN) [5] and a RoFormer [6]. Both architectures share a three-layer convolutional front-end using 3 × 3 kernels with ELU activations. The TCN employs 16 filters in each convolutional layer, whereas the RoFormer uses 64. A 1 × 3 max-pooling operation reduces frequency resolution and produces a sequence of feature vectors suitable for temporal modeling.
The TCN architecture, shown in Figure 1, comprises ten dilated one-dimensional convolutional layers with exponentially increasing dilation factors from 2^0 to 2^9. The offline variant uses symmetric padding to provide non-causal context, whereas the online variant employs causal convolutions with look-ahead cropping to ensure real-time feasibility. This structure enables the model to integrate both short-range rhythmic cues and longer-range cyclical dependencies typical of jangdan patterns.
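The long-range context such a dilation schedule affords can be quantified by the receptive field of the stack. The kernel length below is an assumption for illustration (the text specifies only the exponential dilation growth), and the frame rate comes from the input representation:

```python
# Receptive field of a stack of dilated 1-D convolutions:
# each layer adds (kernel_size - 1) * dilation frames of context.
KERNEL_SIZE = 5                             # assumed kernel length, not from the paper
FRAME_RATE = 10.0                           # Hz, from the 100 ms hop

dilations = [2 ** i for i in range(10)]     # 1, 2, 4, ..., 512 (doubling per layer)
receptive_field = 1 + (KERNEL_SIZE - 1) * sum(dilations)   # in frames
context_seconds = receptive_field / FRAME_RATE
```

Even at the coarse 10 Hz frame rate, the doubling schedule yields several minutes of temporal context, which is what allows the model to span the slow, multi-second jangdan cycles.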
The RoFormer model, shown in Figure 2, employs a six-layer transformer encoder with a hidden size of 64 and four attention heads. Rotary positional embeddings (RoPE) are used to encode temporal relationships in a way that accommodates varying tempi, and a feed-forward expansion factor of four increases representational capacity. As with the TCN, the online RoFormer applies causal masking, whereas the offline version operates without temporal restrictions.
Frame-level downbeat activation probabilities produced by both models were refined using a rhythm-aware Dynamic Bayesian Network (DBN) based on the implementation in the madmom library [20]. The DBN incorporates jangdan-specific tempo constraints between 5 and 60 DBPM and models the latent rhythmic state using bar position, tempo, and rhythm type. Tempo transitions follow a Gaussian prior, while bar positions evolve cyclically according to the tempo. Observation likelihoods are conditioned on the model activations and vary depending on whether the DBN state corresponds to a downbeat, beat, or non-beat position. Offline predictions were obtained through Viterbi decoding; online predictions used a fixed-lag smoothing window to approximate real-time performance.
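The core idea of tempo-constrained decoding can be illustrated with a greatly simplified dynamic program: pick downbeat frames that maximize accumulated activation while keeping every inter-downbeat gap inside a jangdan-specific range. The real DBN additionally models tempo transitions, intra-bar beat positions, and rhythm type; this sketch keeps only the constraint mechanism:

```python
def decode_downbeats(act, min_gap, max_gap):
    """Pick downbeat frames maximizing summed activation, with every
    inter-downbeat gap in [min_gap, max_gap] frames (min_gap >= 1)."""
    n = len(act)
    best = [float("-inf")] * n   # best[i]: best score ending with a downbeat at i
    prev = [-1] * n
    for i in range(n):
        best[i] = act[i]         # a path may start at frame i
        for j in range(max(0, i - max_gap), i - min_gap + 1):
            if best[j] + act[i] > best[i]:
                best[i] = best[j] + act[i]
                prev[i] = j
    # Backtrack from the best-scoring final downbeat.
    i = max(range(n), key=lambda k: best[k])
    path = []
    while i >= 0:
        path.append(i)
        i = prev[i]
    return path[::-1]
```

For a 10 Hz frame rate, the 5–60 DBPM prior translates to gaps between 10 frames (1 s) and 120 frames (12 s), which is how implausibly fast or slow downbeat sequences are pruned during decoding.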
3.4. Training and Inference
All models were trained using binary cross-entropy loss optimized with the Adam algorithm. Training segments were one minute in length to expose the models to diverse tempo fluctuations and expressive phrasing across jangdan. Early stopping based on validation loss prevented overfitting. During inference, predicted downbeat activations were constrained to the allowed tempo range and decoded with the jangdan-aware DBN. Evaluation was conducted on entire recordings rather than the 1 min chunks used during training.
4. Experiments and Results
4.1. Downbeat Tracking
4.1.1. Experimental Setup
During training, Pansori recordings were segmented into 1 min chunks to expose models to a variety of rhythmic transitions within and across jangdan patterns. For evaluation, the models were tested on full-length recordings to assess their robustness to tempo modulations and metric shifts.
To evaluate tracking performance, we employed the F-measure. The F-measure is defined as the harmonic mean of precision and recall, where an estimated beat is considered a true positive if it falls within a predefined temporal tolerance window of a reference beat [21]. While conventional beat-tracking evaluations in Western music typically use a ±70 ms window, our study adopts a wider tolerance of ±1.5 s to better reflect the expressive timing characteristics of Pansori. Accordingly, a predicted downbeat was deemed correct if it occurred within this window of the ground-truth annotation. The metrics were computed using the mir_eval Python library (Python 3.8.12, mir_eval 0.8.2).
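The tolerance-window criterion can be made concrete with a small sketch; the greedy two-pointer matching below is a simplification of mir_eval's one-to-one event matching, not its exact algorithm:

```python
def f_measure(reference, estimated, tolerance=1.5):
    """F-measure with a +/- tolerance window (seconds), simplified matching."""
    ref, est = sorted(reference), sorted(estimated)
    hits, i, j = 0, 0, 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            hits += 1          # match this pair, consume both events
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1             # estimate too early to match anything later
        else:
            i += 1             # reference has no estimate within the window
    if not ref or not est or hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Widening `tolerance` from 0.07 s to 1.5 s is exactly the knob discussed above: with expressive Pansori timing, many predictions fall outside a ±70 ms window yet still mark the correct jangdan boundary.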
All experiments were run on an Ubuntu Linux server (x86_64 architecture, single CPU socket, 1 NUMA node) equipped with an Intel Core i9-9900K CPU (8 cores/16 threads, 3.6–5.0 GHz), 64 GB RAM, and an NVIDIA GeForce RTX 4080 GPU. Models were implemented in Python with PyTorch 1.7.1; audio preprocessing, feature extraction, baseline beat/downbeat tracking, and evaluation pipelines relied on librosa and madmom.
We compared our models against the RNN-DBN model from madmom [20], using it as a baseline after adapting its decoder to match jangdan-specific tempo constraints. For each rhythm pattern, we estimated beat BPMs by assuming three to four beats per bar and derived the corresponding minimum and maximum downbeat BPM (DBPM) ranges. These tempo priors were integrated into the DBN decoder to constrain state transitions during inference.
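The conversion from beat tempo to downbeat tempo is a single division; the beat-tempo values in the usage example are illustrative placeholders, not the paper's actual priors:

```python
def dbpm_range(beat_bpm_min, beat_bpm_max, beats_per_bar):
    """One downbeat per bar, so DBPM = beat BPM / beats per bar."""
    return beat_bpm_min / beats_per_bar, beat_bpm_max / beats_per_bar

# e.g., a hypothetical pattern counted in 4 with beat tempos of 80-160 BPM:
lo, hi = dbpm_range(80, 160, 4)
```

Ranges derived this way per jangdan are what populate the 5–60 DBPM interval used by the decoder.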
4.1.2. Results
Table 3 shows the F-measures obtained by each model across the different jangdan patterns. While the madmom baseline performed reasonably well on faster rhythms such as Jajinmori and Jungjungmori (F = 0.724), it struggled with slower, expressive cycles such as Jinyangjo (F = 0.257) and Eotjungmori (F = 0.478). In contrast, the Offline TCN achieved F-measures above 0.96 for all patterns, and the Offline RoFormer peaked at 0.991 on Jungjungmori.
The Offline RoFormer consistently outperformed all other models, particularly on slow, expressive patterns such as Jinyangjo (0.990) and Jungjungmori (0.991). The Offline TCN also performed well, achieving 0.989 on Jungjungmori with a simpler architecture. Online models, though slightly less accurate due to their causal constraints, remained competitive with their offline counterparts.
Table 4 reports results from exactly the same setup, but evaluated on isolated vocals only.
Several jangdan patterns, including Hwimori, Eotmori, and Eotjungmori, were intentionally excluded from training and used exclusively for evaluation due to data availability constraints. This experimental design was adopted to explicitly assess cross-pattern generalization: we aimed to examine whether models trained on a limited subset of jangdan could transfer to unseen rhythmic structures encountered in real performance settings.
While we do not claim that such generalization will always hold, the empirical results indicate that the proposed framework maintains stable performance on these unseen patterns. This suggests that the model learns rhythm-invariant acoustic cues beyond memorizing pattern-specific templates. At the same time, we acknowledge that tempo priors embedded in the DBN decoder play a significant role in this transferability. By constraining decoding with plausible tempo ranges, the system avoids implausible predictions in rubato passages, which may partially account for the observed robustness. We emphasize that these findings reflect empirical observations on the available dataset. Further validation on larger and more diverse corpora will be required to confirm the generality of this behavior.
While evaluating multiple tolerance windows can provide additional insight into temporal accuracy, reporting such metrics for all model variants would substantially increase result complexity without proportional analytical benefit. We therefore conduct tolerance-window analysis only on the best-performing model. As shown in Table 5, the F-measure decreases monotonically as the tolerance narrows, indicating that the model captures coarse temporal structure rather than millisecond-level precision. The sharp drop at ±70 ms reflects substantial expressive deviation in Pansori timing, making frame-accurate alignment unrealistic. Faster patterns such as Hwimori and Eotjungmori retain higher scores under strict tolerances, whereas slower patterns such as Jinyangjo degrade rapidly due to stronger rubato. These results justify our use of wider tolerance windows and highlight fundamental differences between Pansori and Western metrical timing, shaped by the underlying jangdan structure.
4.2. Beat Level Energy Analysis
To analyze beat-level energy dynamics, we implemented A-weighted energy labeling. The frame-wise RMS energy was calculated using the STFT and weighted by a perceptual A-weighting filter. The beat-level energies were then aggregated using Gaussian smoothing. A pause threshold was defined using energy percentiles. Beats below this threshold were labeled as pauses; the others were binned into five energy levels (very low to very high) using log-scaled percentile edges. The per-beat distributions were computed by histogramming the frame energies within each beat. A pseudocode for the steps involved in this analysis is presented in Algorithm 1.
Algorithm 1: A-weighted Beat Energy Labeling
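The labeling steps can be sketched in plain Python. The A-weighting curve below is the standard IEC 61672 formula; the pause percentile and the five log-spaced bin edges are illustrative assumptions, since the exact thresholds are not given in the text:

```python
import math

def a_weight_db(f):
    """Standard A-weighting gain in dB at frequency f (Hz), IEC 61672."""
    ra = (12194.0 ** 2 * f ** 4) / (
        (f ** 2 + 20.6 ** 2)
        * math.sqrt((f ** 2 + 107.7 ** 2) * (f ** 2 + 737.9 ** 2))
        * (f ** 2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0   # +2.0 dB normalizes to 0 dB at 1 kHz

def label_beats(beat_energies, pause_percentile=10):
    """Label per-beat energies as 'pause' or one of five levels VL..VH.
    Percentile choices are illustrative, not the paper's exact values."""
    ordered = sorted(beat_energies)
    idx = min(len(ordered) - 1, int(pause_percentile / 100.0 * len(ordered)))
    pause_thr = ordered[idx]
    active = [e for e in beat_energies if e > pause_thr]
    # Five bins with log-scaled edges over the non-pause energy range.
    lo, hi = math.log(min(active)), math.log(max(active))
    edges = [lo + k * (hi - lo) / 5 for k in range(1, 5)]
    names = ["VL", "L", "M", "H", "VH"]
    labels = []
    for e in beat_energies:
        if e <= pause_thr:
            labels.append("pause")
        else:
            labels.append(names[sum(math.log(e) > edge for edge in edges)])
    return labels
```

In the full pipeline, `beat_energies` would come from A-weighted, Gaussian-smoothed frame RMS values aggregated per beat, as described above.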
Figure 3 shows the distribution of vocal energy labels across relative beat positions within jangdan units. Each stacked bar represents the average proportion of pause, VL, L, M, H, and VH labels at each beat position, where position 0 corresponds to the downbeat, i.e., the jangdan boundary. The analysis is conducted on annotated jangdan sections whose distribution across rhythmic patterns is detailed in Table 6.
Dominant labels are annotated above the bars, revealing characteristic energy contours over the course of each jangdan cycle and highlighting contrasts between low-energy and accented beats. These contours reflect singers’ interpretive choices in aligning vocal intensity with narrative content and emotional intent within each jangdan. For example, prominent high-energy accents at specific beat positions in faster cycles such as Jajinmori and Hwimori often coincide with dramatic climaxes or emphatic textual delivery, whereas the broader low-to-mid energy distribution in slower cycles such as Jinyangjo supports sustained sorrowful or majestic expression. These patterns illustrate how performers systematically modulate vocal dynamics across a downbeat period (equivalently, across a jangdan) to enhance storytelling and affective impact.
The subdivision of each downbeat (jangdan) into beats is performed by uniformly dividing each jangdan interval into 18 parts for Jinyangjo and 12 parts for all other patterns, following established cultural priors.
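This subdivision rule reduces to uniform division of each downbeat interval; the helper below is a minimal sketch of that step (function and variable names are illustrative):

```python
SUBDIVISIONS = {"Jinyangjo": 18}   # all other jangdan patterns default to 12

def beat_times(downbeat_start, downbeat_end, pattern):
    """Uniformly subdivide one jangdan interval into its beat positions."""
    n = SUBDIVISIONS.get(pattern, 12)
    step = (downbeat_end - downbeat_start) / n
    return [downbeat_start + k * step for k in range(n)]
```

Applied between every pair of consecutive tracked downbeats, this yields the relative beat positions (0 to n - 1) used in Figure 3, with position 0 on the downbeat itself.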
This analysis was motivated by the observation that each jangdan exhibits a distinct pattern of lyrical alignment, in which syllables often correspond to specific beat positions with characteristic energy levels. Studying the overlap between vocal syllables and beat-level energy distributions may help to uncover the latent rhythmic structures or phrase boundaries inherent to each pattern. A detailed investigation of this alignment is left for future work.
4.3. Tempo Variation Across Downbeat Sections
Tempo analysis of Pansori performances is critical for understanding the dynamic interplay between jangdan rhythmic cycles and narrative progression. The jangdan patterns, such as Jinyangjo, Jajinmori, Jungmori, and Hwimori, exhibit distinct tempo characteristics that reflect the emotional and structural nuances of the music. These cycles, ranging from slow and meditative (e.g., Jinyangjo, typically 5–20 downbeats per minute) to fast and energetic (e.g., Hwimori, up to 60 downbeats per minute), introduce significant variability in tempo across downbeat sections, posing unique challenges for automated tracking systems.
To analyze tempo variations, our framework extracts downbeat-aligned tempo estimates from annotated Pansori recordings. We employ a multi-stage approach: first, we track downbeats using the best-performing downbeat tracking model; we then estimate tempo from the inter-downbeat intervals, each of which we treat as one jangdan, over the duration of the recording. Tempo was computed as the reciprocal of the median interval between consecutive downbeats within each section, expressed in downbeats per minute (DBPM). This approach accounts for the asymmetric and expressive timing inherent in Pansori, where microtiming deviations and vocal ornamentation can obscure the regular pulse structures that a simple beats-per-minute measurement would assume.
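The median-interval tempo estimate described above amounts to a one-line computation over the tracked downbeat timestamps:

```python
from statistics import median

def section_dbpm(downbeat_times):
    """Section tempo in downbeats per minute from downbeat timestamps (s),
    using the median inter-downbeat interval for robustness to rubato."""
    intervals = [b - a for a, b in zip(downbeat_times, downbeat_times[1:])]
    return 60.0 / median(intervals)
```

Using the median rather than the mean keeps a single stretched or compressed cycle (e.g., around a dramatic pause) from skewing the section's tempo estimate.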
The tempo distributions in Figure 4 present examples drawn from selected representative segments of the dataset, illustrating characteristic intra-cycle tempo fluctuations within individual jangdan patterns. These examples are intended as qualitative exemplars rather than statistical norms. Pansori performers exhibit substantial variability in tempo shaping depending on narrative context, emotional intent, and individual interpretative style; therefore, the displayed distributions should not be interpreted as population-level averages.
The proposed jangdan-aware Dynamic Bayesian Network (DBN) further refines tempo estimation by incorporating rhythmic priors specific to each jangdan type. By explicitly modeling expected tempo ranges and transition probabilities between cycle sections, the DBN mitigates estimation errors caused by irregular subdivisions and drummer interjections (chuimsae). To avoid overgeneralization, we clarify that the samples shown in Figure 4 correspond to specific selected passages rather than aggregated statistics across the full corpus. They serve as illustrative case studies of typical expressive tempo behavior within individual performances and of how the presented method can be applied.
This tempo analysis enables several downstream applications. First, it supports precise alignment of lyric boundary events (buchimsae) by correlating tempo shifts with narrative transitions. Second, it facilitates the development of interactive performance systems, such as an artificial gosu that adapts to real-time tempo variation. Finally, it contributes to musicological research by providing quantitative evidence of the expressive role of jangdan tempo in Pansori, supporting digital archiving and pedagogical applications. Although such a system is not implemented in the present study, the reported gains establish its technical feasibility. This direction is further supported by prior work such as JukeDrummer [22], which demonstrates that beat-embedding representations can successfully drive adaptive virtual percussion agents. Similar beat-aware accompaniment and generation systems have been explored in related literature, confirming the viability of rhythm-conditioned interactive music systems.
Finally, the statistically consistent tempo contours and intra-cycle variation patterns extracted across trials provide quantitative evidence of the expressive role of jangdan tempo in Pansori. While these applications are left for future work, the present results establish a validated analytical foundation for musicological research, as well as for digital archiving and pedagogical tools.
5. Conclusions
This study introduced a rhythm-aware downbeat tracking framework tailored for Korean Pansori music, effectively addressing its expressive timing, sparse accompaniment, and diverse jangdan cycles. By integrating Temporal Convolutional Networks (TCNs), RoFormer-based transformers, and a jangdan-aware DBN, our approach achieved state-of-the-art performance, with the offline RoFormer model significantly outperforming general-purpose trackers even when those baselines were given the same jangdan-aware post-processing. The A-weighted energy labeling method further revealed distinct intensity patterns, deepening insights into Pansori's rhythmic and expressive dynamics. In particular, the observed tempo and energy variations across jangdan cycles highlight singers' interpretive strategies for aligning rhythmic structure with narrative content and emotional expression. This framework advances music information retrieval (MIR) for non-Western traditions, offering a robust foundation for musicological analysis, digital archiving, and interactive performance systems. By leveraging culturally informed priors, our work underscores the potential of MIR to preserve and promote intangible cultural heritage such as Pansori, paving the way for further innovations in culturally sensitive music processing.