Search Results (7)

Search Parameters:
Keywords = melody transcription

21 pages, 564 KiB  
Article
Sounding Identity: A Technical Analysis of Singing Styles in the Traditional Music of Sub-Saharan Africa
by Alfred Patrick Addaquay
Arts 2025, 14(3), 68; https://doi.org/10.3390/arts14030068 - 16 Jun 2025
Viewed by 963
Abstract
This article presents an in-depth examination of the technical and cultural dimensions of singing practices within the traditional music of sub-Saharan Africa. Drawing on an extensive body of theoretical and ethnomusicological research, comparative transcription, and culturally situated observation, it develops a comprehensive framework for understanding the significance of the human voice in various performance contexts. The study revolves around a tripartite model—auditory clarity, ambiguous auditory clarity, and occlusion—that delineates the varying levels of audibility of vocal lines amidst intricate instrumental arrangements. The article examines case studies from West, East, and Southern Africa, highlighting essential vocal techniques such as straight tone, nasal resonance, ululation, and controlled (or delayed) vibrato, and underscores the complex interplay between language, melody, and rhythm in tonal languages. The analysis delves into the influence of sound reinforcement technologies on vocal presence and cultural authenticity, positing that PA systems can either enhance or disrupt the equilibrium between traditional aesthetics and modern requirements. This research is firmly rooted in a blend of African and Western theoretical frameworks, drawing upon the contributions of Nketia, Agawu, Chernoff, and Kubik, and proposes a nuanced methodology that integrates technical analysis with cultural significance. It argues that singing in African traditional music transcends mere expression, serving as a vessel for collective memory, identity, and the socio-musical framework. The article concludes by emphasizing the enduring strength and flexibility of African vocal traditions, illustrating their capacity for evolution while preserving fundamental communicative and artistic values.
15 pages, 3038 KiB  
Article
Korean Pansori Vocal Note Transcription Using Attention-Based Segmentation and Viterbi Decoding
by Bhuwan Bhattarai and Joonwhoan Lee
Appl. Sci. 2024, 14(2), 492; https://doi.org/10.3390/app14020492 - 5 Jan 2024
Viewed by 1526
Abstract
In this paper, we first compared various attention mechanisms within a semantic pixel-wise segmentation framework for frame-level transcription. Second, the Viterbi algorithm was applied to the output of the frame-level transcription model to obtain the vocal notes of Korean Pansori. We treated semantic pixel-wise segmentation for frame-level transcription as the source task and Viterbi-based note-level transcription of Korean Pansori as the target task. The primary goal of this paper was to transcribe the vocal notes of Pansori music, a traditional Korean art form. The initial step involved training a model for vocal melody extraction on the source task; the Viterbi algorithm was then combined with this frame-level transcription model to produce the desired note-level transcription. The effectiveness of our attention-based segmentation methods for frame-level transcription was compared with various algorithms on the vocal melody task of the MedleyDB dataset, measured by voicing recall, voicing false alarm, raw pitch accuracy, raw chroma accuracy, and overall accuracy. The results highlight the significance of attention mechanisms for enhancing the performance of frame-level music transcription models. Because no ground-truth vocal note annotations exist for Pansori, we also conducted a visual and subjective comparison of the note-level results, contributing to the preservation and appreciation of this culturally rich art form.
(This article belongs to the Section Computing and Artificial Intelligence)
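As an editorial illustration of the decoding step described above, here is a minimal sketch, assuming a frame-level model that outputs a (frames × pitches) matrix of pitch probabilities; the function name and the switch penalty are our own, not the authors':

```python
# A minimal sketch (not the authors' code) of Viterbi decoding over
# frame-level pitch posteriors to obtain a smoothed note sequence.
# `frame_probs` is assumed to be a (T, P) array of per-frame pitch
# probabilities produced by a frame-level transcription model.
import numpy as np

def viterbi_notes(frame_probs, switch_penalty=4.0):
    """Decode the most likely pitch path, penalizing pitch switches."""
    T, P = frame_probs.shape
    log_obs = np.log(frame_probs + 1e-12)
    delta = log_obs[0].copy()               # best log-score ending at each pitch
    backptr = np.zeros((T, P), dtype=int)
    for t in range(1, T):
        stay = delta                        # remain on the same pitch (cost 0)
        move = delta.max() - switch_penalty # jump from the best other pitch
        backptr[t] = np.where(stay >= move, np.arange(P), delta.argmax())
        delta = np.maximum(stay, move) + log_obs[t]
    # Backtrace the optimal path.
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path
```

The constant switch penalty discourages spurious single-frame pitch jumps, which is what turns a noisy frame-level estimate into cleaner note segments.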

20 pages, 1278 KiB  
Review
A Comprehensive Review on Music Transcription
by Bhuwan Bhattarai and Joonwhoan Lee
Appl. Sci. 2023, 13(21), 11882; https://doi.org/10.3390/app132111882 - 30 Oct 2023
Cited by 6 | Viewed by 7698
Abstract
Music transcription is the process of transforming recorded sound of musical performances into symbolic representations such as sheet music or MIDI files. Extensive research and development have been carried out in this field. This comprehensive review surveys the diverse methodologies, techniques, and advancements that have shaped the landscape of music transcription. The paper outlines the significance of music transcription in preserving, analyzing, and disseminating musical compositions across various genres and cultures, and provides a historical perspective by tracing its evolution from traditional manual methods to modern automated approaches. It highlights the challenges posed by complex singing techniques, variations in instrumentation, ambiguity in pitch, tempo changes, rhythm, and dynamics. The review categorizes transcription techniques into four types: frame-level, note-level, stream-level, and notation-level, discussing their strengths and limitations. It covers the various research domains of music transcription, from general melody extraction to vocal melody extraction, note-level monophonic to polyphonic vocal transcription, single-instrument to multi-instrument transcription, and multi-pitch estimation. The survey further examines a broad spectrum of applications in music production and creation, reviews state-of-the-art open-source and commercial transcription tools for pitch estimation, onset and offset detection, general melody detection, and vocal melody detection, and lists the currently available Python libraries that can be used for music transcription. Finally, it highlights open-source benchmark datasets for different areas of music transcription and provides a wide range of references covering the historical context, theoretical frameworks, and foundational concepts behind the field.
(This article belongs to the Section Computing and Artificial Intelligence)
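To make the frame-level category concrete, here is a minimal example using librosa's pYIN implementation, one of the open-source Python tools reviews of this kind typically cover; the input file name is a placeholder:

```python
# Minimal frame-level pitch estimation with librosa's pYIN implementation.
import librosa

y, sr = librosa.load("performance.wav")  # hypothetical input recording
f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C7"),
    sr=sr,
)
# f0 holds one fundamental-frequency estimate per analysis frame (NaN where
# unvoiced); converting to MIDI numbers is a first step toward a note-level view.
midi = librosa.hz_to_midi(f0)
```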

9 pages, 2724 KiB  
Data Descriptor
Manual Conversion of Sadhukarn to Thai and Western Music Notations and Their Translation into a Rhyme Structure for Music Analysis
by Sumetus Eambangyung, Gretel Schwörer-Kohl and Witoon Purahong
Data 2022, 7(11), 150; https://doi.org/10.3390/data7110150 - 31 Oct 2022
Cited by 3 | Viewed by 3291
Abstract
Sadhukarn is the most sacred musical composition in the Thai, Cambodian, and Lao music cultural areas. Because numerous unverified versions of the Sadhukarn main melody circulate in these three countries, notating the melodies in suitable formats with a systematic method is necessary. This work provides a data descriptor for music transcriptions of 25 different versions of the Sadhukarn main melody collected in Thailand, Cambodia, and Laos. Furthermore, we introduce a new procedure for music analysis based on rhyme structure. The aims of the study are to (1) provide Thai/Western musical note comprehension in the forms of Western staff and Thai notation, and (2) describe the procedures for translating musical notation into a rhyme structure. To generate the rhyme structure, we apply a Thai poetic and linguistic approach. The rhyme structure is composed of melodic structures, the pillar tones (Look-Tok), and a melodic rhyming outline.

17 pages, 610 KiB  
Review
Singing Voice Detection: A Survey
by Ramy Monir, Daniel Kostrzewa and Dariusz Mrozek
Entropy 2022, 24(1), 114; https://doi.org/10.3390/e24010114 - 12 Jan 2022
Cited by 18 | Viewed by 5631
Abstract
Singing voice detection, or vocal detection, is a classification task that determines whether a given audio segment contains a singing voice. It is a crucial preprocessing step that can improve the performance of downstream tasks such as automatic lyrics alignment, singing melody transcription, singing voice separation, and vocal melody extraction. This paper surveys techniques for singing voice detection, with a deep focus on state-of-the-art algorithms such as convolutional LSTM and GRU-RNN models. It compares existing methods, mainly on the Jamendo and RWC datasets, where long-term recurrent convolutional networks have achieved impressive results. The main goal of the present paper is to investigate both classical and state-of-the-art approaches to singing voice detection.
(This article belongs to the Special Issue Methods in Artificial Intelligence and Information Processing)
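As a sketch of the kind of convolutional-recurrent detector such surveys discuss, the following PyTorch module (our illustration, with assumed layer sizes, not a model from the paper) maps a log-mel spectrogram to per-frame singing/no-singing probabilities:

```python
# A minimal convolutional-recurrent singing voice detector (editorial sketch):
# log-mel patches in, per-frame voicing probabilities out.
import torch
import torch.nn as nn

class SingingVoiceDetector(nn.Module):
    def __init__(self, n_mels=80, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                 # pool over frequency only
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.gru = nn.GRU(32 * (n_mels // 4), hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, mel):                       # mel: (batch, n_mels, frames)
        x = self.conv(mel.unsqueeze(1))           # (batch, 32, n_mels//4, frames)
        x = x.flatten(1, 2).transpose(1, 2)       # (batch, frames, features)
        x, _ = self.gru(x)                        # temporal context in both directions
        return torch.sigmoid(self.head(x)).squeeze(-1)  # (batch, frames)
```

The convolutional front end summarizes local time-frequency patterns while the bidirectional GRU supplies the temporal context that distinguishes sustained singing from transient instrumental events.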

19 pages, 4204 KiB  
Article
Singing Transcription from Polyphonic Music Using Melody Contour Filtering
by Zhuang He and Yin Feng
Appl. Sci. 2021, 11(13), 5913; https://doi.org/10.3390/app11135913 - 25 Jun 2021
Cited by 2 | Viewed by 3254
Abstract
Automatic singing transcription and analysis from polyphonic music recordings are essential to a number of indexing techniques for computational auditory scene analysis. In this work, we divide the singing transcription task into two subtasks, melody extraction and note transcription, to obtain a note-level sequence. We construct a salience function in terms of harmonic and rhythmic similarity together with a measurement of spectral balance. Central to our proposed method is the measurement of melody contours, which are calculated using edge searching based on their continuity properties. We calculate the mean contour salience by separating melody analysis from the adjacent-breakpoint connective-strength matrix, and we select the final melody contour to determine MIDI notes. This method, which combines audio signal processing with image edge analysis, provides a more interpretable analysis platform for continuous singing signals. Experiments on Music Information Retrieval Evaluation eXchange (MIREX) datasets show that our technique achieves promising results for both audio melody extraction and polyphonic singing transcription.
(This article belongs to the Special Issue Advances in Computer Music)
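The salience function in the abstract is the authors' own construction; as a generic illustration of the idea, this toy harmonic salience function (our sketch, with hypothetical parameters) scores each candidate pitch by the weighted energy at its harmonics in an STFT magnitude spectrogram:

```python
# A toy harmonic salience function (editorial sketch, not the authors' method):
# each candidate f0 is scored by summing decayed energy at its harmonics.
import numpy as np

def harmonic_salience(mag, freqs, candidates, n_harmonics=5, decay=0.8):
    """mag: (F, T) magnitude spectrogram; freqs: (F,) bin frequencies in Hz;
    candidates: (P,) candidate f0s in Hz. Returns a (P, T) salience map."""
    salience = np.zeros((len(candidates), mag.shape[1]))
    for i, f0 in enumerate(candidates):
        for h in range(1, n_harmonics + 1):
            # Nearest spectrogram bin to the h-th harmonic of this candidate.
            bin_idx = np.argmin(np.abs(freqs - h * f0))
            salience[i] += (decay ** (h - 1)) * mag[bin_idx]
    return salience
```

Contour-based methods then trace connected ridges through such a salience map over time and pick the ridge most likely to be the melody.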

11 pages, 297 KiB  
Article
Jazz Bass Transcription Using a U-Net Architecture
by Jakob Abeßer and Meinard Müller
Electronics 2021, 10(6), 670; https://doi.org/10.3390/electronics10060670 - 12 Mar 2021
Cited by 16 | Viewed by 3293
Abstract
In this paper, we adapt a recently proposed U-net deep neural network architecture from melody transcription to bass transcription. We investigate pitch shifting and random equalization as data augmentation techniques. In a parameter importance study, we examine how the skip connection strategy between the encoder and decoder layers, the data augmentation strategy, and the overall model capacity influence the system's performance. Using a training set that covers various music genres and a validation set that includes jazz ensemble recordings, we obtain the best transcription performance with a downscaled version of the reference algorithm combined with skip connections that transfer intermediate activations between the encoder and decoder. The U-net based method outperforms previous knowledge-driven and data-driven bass transcription algorithms by around five percentage points in overall accuracy. Beyond improved pitch estimation, voicing estimation performance is clearly enhanced.
(This article belongs to the Special Issue Machine Learning Applied to Music/Audio Signal Processing)
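As an editorial sketch of the skip-connection strategy the study varies, this toy U-net (our illustration, not the authors' architecture) concatenates encoder activations into the matching decoder stages:

```python
# A toy U-net with encoder-to-decoder skip connections (editorial sketch).
# Input/output is a (batch, 1, freq, time) spectrogram-like tensor whose
# spatial dimensions are assumed divisible by 4.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.mid  = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.Conv2d(32 + 32, 16, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.Conv2d(16 + 16, 1, 3, padding=1), nn.Sigmoid())
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2)

    def forward(self, x):                           # x: (batch, 1, freq, time)
        e1 = self.enc1(x)                           # first skip source
        e2 = self.enc2(self.pool(e1))               # second skip source
        m = self.up(self.mid(self.pool(e2)))        # bottleneck, upsampled
        d2 = self.up(self.dec2(torch.cat([m, e2], dim=1)))   # skip from e2
        return self.dec1(torch.cat([d2, e1], dim=1))         # skip from e1
```

The concatenations are the "skip connections that transfer intermediate activations" mentioned above: they let the decoder reuse fine time-frequency detail the pooling stages would otherwise discard.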
