Search Results (9)

Search Parameters:
Keywords = vocal transcription

26 pages, 18583 KiB  
Article
Transforming Pedagogical Practices and Teacher Identity Through Multimodal (Inter)action Analysis: A Case Study of Novice EFL Teachers in China
by Jing Zhou, Chengfei Li and Yan Cheng
Behav. Sci. 2025, 15(8), 1050; https://doi.org/10.3390/bs15081050 - 3 Aug 2025
Viewed by 410
Abstract
This study investigates the evolving pedagogical strategies and professional identity development of two novice college English teachers in China through a semester-long classroom-based inquiry. Drawing on Norris’s Multimodal (Inter)action Analysis (MIA), it analyzes 270 min of video-recorded lessons across three instructional stages, supported by visual transcripts and pitch-intensity spectrograms. The analysis reveals each teacher’s transformation from textbook-reliant instruction to student-centered pedagogy, facilitated by multimodal strategies such as gaze, vocal pitch, gesture, and head movement. These shifts unfold across the following three evolving identity configurations: compliance, experimentation, and dialogic enactment. Rather than following a linear path, identity development is shown as a negotiated process shaped by institutional demands and classroom interactional realities. By foregrounding the multimodal enactment of self in a non-Western educational context, this study offers insights into how novice EFL teachers navigate tensions between traditional discourse norms and reform-driven pedagogical expectations, contributing to broader understandings of identity formation in global higher education.

21 pages, 564 KiB  
Article
Sounding Identity: A Technical Analysis of Singing Styles in the Traditional Music of Sub-Saharan Africa
by Alfred Patrick Addaquay
Arts 2025, 14(3), 68; https://doi.org/10.3390/arts14030068 - 16 Jun 2025
Viewed by 1191
Abstract
This article presents an in-depth examination of the technical and cultural dimensions of singing practices within the traditional music of sub-Saharan Africa. Utilizing an extensive body of theoretical and ethnomusicological research, comparative transcription, and culturally situated observation, it presents a comprehensive framework for understanding the significance of the human voice in various performance contexts. The study revolves around a tripartite model—auditory clarity, ambiguous auditory clarity, and occlusion—that delineates the varying levels of audibility of vocal lines amidst intricate instrumental arrangements. The article examines case studies from West, East, and Southern Africa, highlighting essential vocal techniques such as straight tone, nasal resonance, ululation, and controlled (or delayed) vibrato. It underscores the complex interplay between language, melody, and rhythm in tonal languages. The analysis delves into the influence of sound reinforcement technologies on vocal presence and cultural authenticity, positing that PA systems have the capacity to either enhance or disrupt the equilibrium between traditional aesthetics and modern requirements. This research is firmly rooted in a blend of African and Western theoretical frameworks, drawing upon the contributions of Nketia, Agawu, Chernoff, and Kubik. It proposes a nuanced methodology that integrates technical analysis with cultural significance. It posits that singing in African traditional music transcends mere expression, serving as a vessel for collective memory, identity, and the socio-musical framework. The article concludes by emphasizing the enduring strength and flexibility of African vocal traditions, illustrating their capacity for evolution while preserving fundamental communicative and artistic values.
22 pages, 1596 KiB  
Article
Fuzzy Frequencies: Finding Tonal Structures in Audio Recordings of Renaissance Polyphony
by Mirjam Visscher and Frans Wiering
Heritage 2025, 8(5), 164; https://doi.org/10.3390/heritage8050164 - 6 May 2025
Viewed by 678
Abstract
Understanding tonal structures in Renaissance music has been a long-standing musicological problem. Computational analysis on a large scale could shed new light on this. Encoded scores provide easy access to pitch content, but the availability of such data is low. This paper addresses this shortage of data by exploring the potential of audio recordings. Analysing audio, however, is challenging due to the presence of harmonics, reverb and noise, which may obscure the pitch content. We test several multiple pitch estimation models on audio recordings, using encoded scores from the Josquin Research Project (JRP) as a benchmark for evaluation. We present a dataset of multiple pitch estimations from 611 compositions in the JRP. We use the pitch estimations to create pitch profiles and pitch class profiles, and to estimate the lowest final pitch of each recording. Our findings indicate that the Multif0 model yields pitch profiles, pitch class profiles and finals most closely aligned with symbolic encodings. Furthermore, we found no effect of year of recording, number of voices and ensemble composition on the accuracy of pitch estimations. Finally, we demonstrate how these models can be applied to gain insight into tonal structures in early polyphony.
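The pitch class profiles this abstract describes can be sketched in a few lines; the following is a minimal illustration only, assuming pitch estimates have already been folded to MIDI note numbers (the paper's actual estimation models and data formats are not shown here):

```python
from collections import Counter

def pitch_class_profile(midi_pitches):
    """Fold MIDI pitch estimates into a 12-bin pitch class profile.

    Bin 0 = C, 1 = C#, ..., 11 = B; values are relative frequencies,
    so the bins sum to 1.
    """
    counts = Counter(p % 12 for p in midi_pitches)
    total = sum(counts.values())
    return [counts.get(pc, 0) / total for pc in range(12)]

# Toy example: estimates dominated by D (62), F (65) and A (69)
profile = pitch_class_profile([62, 65, 69, 62, 65, 62, 69, 62])
```

Comparing such a profile from audio-derived estimates against one computed from an encoded score is one simple way to quantify how closely the estimation model tracks the symbolic benchmark.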

20 pages, 629 KiB  
Article
Lessons in Developing a Behavioral Coding Protocol to Analyze In-the-Wild Child–Robot Interaction Events and Experiments
by Xela Indurkhya and Gentiane Venture
Electronics 2024, 13(7), 1175; https://doi.org/10.3390/electronics13071175 - 22 Mar 2024
Cited by 2 | Viewed by 2000
Abstract
Behavioral analyses of in-the-wild HRI studies generally rely on interviews or visual information from videos. This can be very limiting in settings where video recordings are not allowed or limited. We designed and tested a vocalization-based protocol to analyze in-the-wild child–robot interactions based upon a behavioral coding scheme utilized in wildlife biology, specifically in studies of wild dolphin populations. The audio of a video or audio recording is converted into a transcript, which is then analyzed using a behavioral coding protocol consisting of 5–6 categories (one indicating non-robot-related behavior, and 4–5 categories of robot-related behavior). Refining the code categories and training coders resulted in increased agreement between coders, but only to a level of moderate reliability, leading to our recommendation that it be used with three coders to assess where there is majority consensus, and thereby correct for subjectivity. We discuss lessons learned in the design and implementation of this protocol and the potential for future child–robot experiments analyzed through vocalization behavior. We also perform a few observational behavior analyses from vocalizations alone to demonstrate the potential of this field.
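The three-coder majority-consensus recommendation can be sketched as follows; this is an illustrative sketch only, with hypothetical label names — the paper's actual code categories and tooling are not reproduced here:

```python
def majority_codes(coder_a, coder_b, coder_c):
    """Return the majority label per utterance, or None when all three
    coders disagree (no consensus).

    Each argument is a list of behavior-code labels, one per utterance.
    A majority requires at least 2 of the 3 votes.
    """
    consensus = []
    for labels in zip(coder_a, coder_b, coder_c):
        winner = max(set(labels), key=labels.count)
        consensus.append(winner if labels.count(winner) >= 2 else None)
    return consensus

# Hypothetical labels for three utterances coded by three coders
a = ["robot-directed", "other", "robot-directed"]
b = ["robot-directed", "other", "other"]
c = ["other", "other", "imitation"]
print(majority_codes(a, b, c))  # → ['robot-directed', 'other', None]
```

Utterances that come back as `None` are exactly the cases the authors flag: spans where subjectivity dominates and no majority reading exists.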

15 pages, 3038 KiB  
Article
Korean Pansori Vocal Note Transcription Using Attention-Based Segmentation and Viterbi Decoding
by Bhuwan Bhattarai and Joonwhoan Lee
Appl. Sci. 2024, 14(2), 492; https://doi.org/10.3390/app14020492 - 5 Jan 2024
Viewed by 1564
Abstract
In this paper, first, we delved into the experiment by comparing various attention mechanisms in the semantic pixel-wise segmentation framework to perform frame-level transcription tasks. Second, the Viterbi algorithm was utilized by transferring the knowledge of the frame-level transcription model to obtain the vocal notes of Korean Pansori. We considered a semantic pixel-wise segmentation framework for frame-level transcription as the source task and a Viterbi algorithm-based Korean Pansori note-level transcription as the target task. The primary goal of this paper was to transcribe the vocal notes of Pansori music, a traditional Korean art form. To achieve this goal, the initial step involved conducting the experiments with the source task, where a trained model was employed for vocal melody extraction. To achieve the desired vocal note transcription for the target task, the Viterbi algorithm was utilized with the frame-level transcription model. By leveraging this approach, we sought to accurately transcribe the vocal notes present in Pansori performances. The effectiveness of our attention-based segmentation methods for frame-level transcription in the source task has been compared with various algorithms using the vocal melody task of the MedleyDB dataset, enabling us to measure the voicing recall, voicing false alarm, raw pitch accuracy, raw chroma accuracy, and overall accuracy. The results of our experiments highlight the significance of attention mechanisms for enhancing the performance of frame-level music transcription models. We also conducted a visual and subjective comparison to evaluate the results of the target task for vocal note transcription. Since there was no ground truth vocal note for Pansori, this analysis provides valuable insights into the preservation and appreciation of this culturally rich art form.
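The Viterbi decoding step — turning noisy frame-level pitch posteriors into a smooth note sequence — can be illustrated with a generic sketch. This is not the paper's system: the state set, switch penalty, and input format below are assumptions for illustration only.

```python
import math

def viterbi_notes(frame_probs, switch_penalty=2.0):
    """Decode a smooth pitch-state path from frame-level posteriors.

    frame_probs: list of dicts mapping pitch state -> probability, one
    dict per frame. switch_penalty is a log-domain cost for changing
    state between frames, discouraging spurious frame-to-frame jumps.
    """
    states = list(frame_probs[0])
    # Log-score of the best path ending in each state, plus backpointers.
    score = {s: math.log(frame_probs[0][s] + 1e-12) for s in states}
    back = []
    for probs in frame_probs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda p: score[p] - (switch_penalty if p != s else 0.0))
            pointers[s] = prev
            new_score[s] = (score[prev]
                            - (switch_penalty if prev != s else 0.0)
                            + math.log(probs[s] + 1e-12))
        score, back = new_score, back + [pointers]
    # Trace the best final state back to the start.
    path = [max(states, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

frames = [{"A4": 0.9, "B4": 0.1},
          {"A4": 0.6, "B4": 0.4},
          {"A4": 0.2, "B4": 0.8}]
```

With a high switch penalty the decoder holds A4 through the final frame despite its lower posterior; lowering the penalty lets the path follow the frame-wise maxima more closely.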
(This article belongs to the Section Computing and Artificial Intelligence)

20 pages, 1278 KiB  
Review
A Comprehensive Review on Music Transcription
by Bhuwan Bhattarai and Joonwhoan Lee
Appl. Sci. 2023, 13(21), 11882; https://doi.org/10.3390/app132111882 - 30 Oct 2023
Cited by 6 | Viewed by 7978
Abstract
Music transcription is the process of transforming recorded sound of musical performances into symbolic representations such as sheet music or MIDI files. Extensive research and development have been carried out in the field of music transcription and technology. This comprehensive review paper surveys the diverse methodologies, techniques, and advancements that have shaped the landscape of music transcription. The paper outlines the significance of music transcription in preserving, analyzing, and disseminating musical compositions across various genres and cultures. It also provides a historical perspective by tracing the evolution of music transcription from traditional manual methods to modern automated approaches. It also highlights the challenges in transcription posed by complex singing techniques, variations in instrumentation, ambiguity in pitch, tempo changes, rhythm, and dynamics. The review also categorizes four different types of transcription techniques, frame-level, note-level, stream-level, and notation-level, discussing their strengths and limitations. It also encompasses the various research domains of music transcription from general melody extraction to vocal melody, note-level monophonic to polyphonic vocal transcription, single-instrument to multi-instrument transcription, and multi-pitch estimation. The survey further covers a broad spectrum of music transcription applications in music production and creation. It also reviews state-of-the-art open-source as well as commercial music transcription tools for pitch estimation, onset and offset detection, general melody detection, and vocal melody detection. In addition, it also encompasses the currently available Python libraries that can be used for music transcription. Furthermore, the review highlights the various open-source benchmark datasets for different areas of music transcription. It also provides a wide range of references supporting the historical context, theoretical frameworks, and foundational concepts to help readers understand the background of music transcription and the context of our paper.
(This article belongs to the Section Computing and Artificial Intelligence)

28 pages, 6362 KiB  
Article
A Novel In Vivo Model of Laryngeal Papillomavirus-Associated Disease Using Mus musculus Papillomavirus
by Renee E. King, Andrea Bilger, Josef Rademacher, Ella T. Ward-Shaw, Rong Hu, Paul F. Lambert and Susan L. Thibeault
Viruses 2022, 14(5), 1000; https://doi.org/10.3390/v14051000 - 8 May 2022
Cited by 11 | Viewed by 3596
Abstract
Recurrent respiratory papillomatosis (RRP), caused by laryngeal infection with low-risk human papillomaviruses, has devastating effects on vocal communication and quality of life. Factors in RRP onset, other than viral presence in the airway, are poorly understood. RRP research has been stalled by limited preclinical models. The only known papillomavirus able to infect laboratory mice, Mus musculus papillomavirus (MmuPV1), induces disease in a variety of tissues. We hypothesized that MmuPV1 could infect the larynx as a foundation for a preclinical model of RRP. We further hypothesized that epithelial injury would enhance the ability of MmuPV1 to cause laryngeal disease, because injury is a potential factor in RRP and promotes MmuPV1 infection in other tissues. In this report, we infected larynges of NOD scid gamma mice with MmuPV1 with and without vocal fold abrasion and measured infection and disease pathogenesis over 12 weeks. Laryngeal disease incidence and severity increased earlier in mice that underwent injury in addition to infection. However, laryngeal disease emerged in all infected mice by week 12, with or without injury. Secondary laryngeal infections and disease arose in nude mice after MmuPV1 skin infections, confirming that experimentally induced injury is dispensable for laryngeal MmuPV1 infection and disease in immunocompromised mice. Unlike RRP, lesions were relatively flat dysplasias and they could progress to cancer. Similar to RRP, MmuPV1 transcript was detected in all laryngeal disease and in clinically normal larynges. MmuPV1 capsid protein was largely absent from the larynx, but productive infection arose in a case of squamous metaplasia at the level of the cricoid cartilage. Similar to RRP, disease spread beyond the larynx to the trachea and bronchi. This first report of laryngeal MmuPV1 infection provides a foundation for a preclinical model of RRP.

17 pages, 610 KiB  
Review
Singing Voice Detection: A Survey
by Ramy Monir, Daniel Kostrzewa and Dariusz Mrozek
Entropy 2022, 24(1), 114; https://doi.org/10.3390/e24010114 - 12 Jan 2022
Cited by 18 | Viewed by 5711
Abstract
Singing voice detection or vocal detection is a classification task that determines whether there is a singing voice in a given audio segment. This process is a crucial preprocessing step that can be used to improve the performance of other tasks such as automatic lyrics alignment, singing melody transcription, singing voice separation, vocal melody extraction, and many more. This paper presents a survey on the techniques of singing voice detection with a deep focus on state-of-the-art algorithms such as convolutional LSTM and GRU-RNN. It illustrates a comparison between existing methods for singing voice detection, mainly based on the Jamendo and RWC datasets. Long-term recurrent convolutional networks have reached impressive results on public datasets. The main goal of the present paper is to investigate both classical and state-of-the-art approaches to singing voice detection.
(This article belongs to the Special Issue Methods in Artificial Intelligence and Information Processing)

21 pages, 647 KiB  
Article
Automatic Transcription of Polyphonic Vocal Music
by Andrew McLeod, Rodrigo Schramm, Mark Steedman and Emmanouil Benetos
Appl. Sci. 2017, 7(12), 1285; https://doi.org/10.3390/app7121285 - 11 Dec 2017
Cited by 22 | Viewed by 6921
Abstract
This paper presents a method for automatic music transcription applied to audio recordings of a cappella performances with multiple singers. We propose a system for multi-pitch detection and voice assignment that integrates an acoustic and a music language model. The acoustic model performs spectrogram decomposition, extending probabilistic latent component analysis (PLCA) using a six-dimensional dictionary with pre-extracted log-spectral templates. The music language model performs voice separation and assignment using hidden Markov models that apply musicological assumptions. By integrating the two models, the system is able to detect multiple concurrent pitches in polyphonic vocal music and assign each detected pitch to a specific voice type such as soprano, alto, tenor or bass (SATB). We compare our system against multiple baselines, achieving state-of-the-art results for both multi-pitch detection and voice assignment on a dataset of Bach chorales and another of barbershop quartets. We also present an additional evaluation of our system using varied pitch tolerance levels to investigate its performance at 20-cent pitch resolution.
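The pitch-tolerance evaluation mentioned at the end works by measuring estimate-to-reference distance in cents (100 cents = 1 semitone) and counting frames within a tolerance. A minimal sketch, assuming frame-aligned frequency lists — the function names and data here are illustrative, not the paper's evaluation code:

```python
import math

def cents_diff(f_est, f_ref):
    """Signed distance between two frequencies in cents."""
    return 1200.0 * math.log2(f_est / f_ref)

def raw_pitch_accuracy(estimates_hz, references_hz, tolerance_cents=50.0):
    """Fraction of frames whose estimated pitch falls within the tolerance
    of the reference pitch. A common default tolerance is 50 cents; a
    stricter 20-cent tolerance probes fine pitch resolution."""
    hits = sum(1 for e, r in zip(estimates_hz, references_hz)
               if abs(cents_diff(e, r)) <= tolerance_cents)
    return hits / len(references_hz)

ref = [440.0, 440.0, 440.0]
est = [440.0, 450.0, 466.16]  # in tune, ~39 cents sharp, ~a semitone sharp
```

Here `raw_pitch_accuracy(est, ref)` scores 2/3 at the 50-cent default but only 1/3 at 20 cents, which is exactly the kind of gap a finer-resolution evaluation exposes.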
(This article belongs to the Special Issue Sound and Music Computing)
