Search Results (3)

Search Parameters:
Keywords = audio melody extraction

11 pages, 3250 KiB  
Article
A Deep Attention Model for Environmental Sound Classification from Multi-Feature Data
by Jinming Guo, Chuankun Li, Zepeng Sun, Jian Li and Pan Wang
Appl. Sci. 2022, 12(12), 5988; https://doi.org/10.3390/app12125988 - 12 Jun 2022
Cited by 13 | Viewed by 5013
Abstract
Automated environmental sound recognition has clear engineering benefits: it allows audio to be sorted, curated, and searched. Unlike music and language, environmental sound is loaded with noise and lacks the rhythm and melody of music or the semantic sequence of language, which makes it difficult to find features representative of the wide variety of environmental sound signals. To improve the accuracy of environmental sound recognition, this paper proposes a recognition method based on multi-feature parameters and a time–frequency attention module. It begins with a multi-feature preprocessing stage that supplements the phase information discarded by the Log-Mel spectrogram used in current mainstream methods and enhances the expressiveness of the input features. A time–frequency attention module built from multiple convolutions is designed to compute attention weights over the input feature spectrogram and to suppress interference from background noise and irrelevant frequency bands in the audio. Comparative experiments were conducted on three widely used datasets: the environmental sound classification datasets ESC-10 and ESC-50, and the UrbanSound8K dataset. The experiments demonstrate that the proposed method outperforms the compared approaches.
(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Deep Learning)
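As a rough illustration of the kind of pipeline this abstract describes (and not the authors' code), a minimal Python sketch might stack a Log-Mel spectrogram with STFT phase as a two-channel input and pass it through a small convolutional time-frequency attention block. The feature sizes, layer counts, and the librosa/PyTorch usage below are assumptions for illustration only.

```python
# Sketch only: multi-feature input (Log-Mel + phase) with a convolutional
# time-frequency attention block. Architecture details are assumptions.
import librosa
import numpy as np
import torch
import torch.nn as nn

def multi_feature(path, sr=22050, n_mels=64, n_fft=1024, hop=512):
    y, _ = librosa.load(path, sr=sr)
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    log_mel = librosa.power_to_db(
        librosa.feature.melspectrogram(S=np.abs(stft) ** 2, sr=sr, n_mels=n_mels))
    # Phase information that a Log-Mel representation alone discards.
    phase = np.angle(stft)[:n_mels, :]        # crude truncation to match shape
    return np.stack([log_mel, phase])         # (2, n_mels, frames)

class TFAttention(nn.Module):
    """Convolutions that predict a time-frequency attention map."""
    def __init__(self, ch=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(ch, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, x):                     # x: (batch, ch, mels, frames)
        att = self.conv(x)                    # (batch, 1, mels, frames)
        return x * att                        # down-weight noisy T-F bins
```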

17 pages, 610 KiB  
Review
Singing Voice Detection: A Survey
by Ramy Monir, Daniel Kostrzewa and Dariusz Mrozek
Entropy 2022, 24(1), 114; https://doi.org/10.3390/e24010114 - 12 Jan 2022
Cited by 17 | Viewed by 5571
Abstract
Singing voice detection, or vocal detection, is a classification task that determines whether a given audio segment contains a singing voice. It is a crucial preprocessing step that can improve the performance of other tasks such as automatic lyrics alignment, singing melody transcription, singing voice separation, vocal melody extraction, and many more. This paper presents a survey of singing voice detection techniques, with a particular focus on state-of-the-art algorithms such as convolutional LSTM and GRU-RNN models. It compares existing methods for singing voice detection, evaluated mainly on the Jamendo and RWC datasets; long-term recurrent convolutional networks have reached impressive results on these public datasets. The main goal of the present paper is to investigate both classical and state-of-the-art approaches to singing voice detection.
(This article belongs to the Special Issue Methods in Artificial Intelligence and Information Processing)
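For context, the convolutional-recurrent detectors this survey covers can be sketched as a tiny CRNN that convolves a Log-Mel spectrogram and feeds the result to an LSTM to predict frame-wise vocal presence. The layer sizes and input dimensions below are assumptions, not taken from any of the surveyed papers.

```python
# Illustrative sketch of a CRNN-style singing voice detector (assumed sizes).
import torch
import torch.nn as nn

class CRNNVocalDetector(nn.Module):
    def __init__(self, n_mels=80, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)))
        self.rnn = nn.LSTM(32 * (n_mels // 4), hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)       # singing voice vs. no voice

    def forward(self, spec):                  # spec: (batch, 1, n_mels, frames)
        h = self.conv(spec)                   # (batch, 32, n_mels // 4, frames)
        h = h.permute(0, 3, 1, 2).flatten(2)  # (batch, frames, features)
        h, _ = self.rnn(h)
        return torch.sigmoid(self.out(h))     # frame-wise vocal probability

probs = CRNNVocalDetector()(torch.randn(2, 1, 80, 128))   # shape (2, 128, 1)
```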

19 pages, 4204 KiB  
Article
Singing Transcription from Polyphonic Music Using Melody Contour Filtering
by Zhuang He and Yin Feng
Appl. Sci. 2021, 11(13), 5913; https://doi.org/10.3390/app11135913 - 25 Jun 2021
Cited by 2 | Viewed by 3206
Abstract
Automatic singing transcription and analysis from polyphonic music recordings are essential to a number of indexing techniques for computational auditory scenes. In this work, to obtain a note-level sequence, we divide the singing transcription task into two subtasks: melody extraction and note transcription. We construct a salience function in terms of harmonic and rhythmic similarity together with a measure of spectral balance. Central to the proposed method is the measurement of melody contours, which are calculated using edge searching based on their continuity properties. We calculate the mean contour salience by separating melody analysis from the adjacent-breakpoint connective strength matrix, and we select the final melody contour to determine MIDI notes. This method, which combines audio signal processing with image edge analysis, provides a more interpretable analysis platform for continuous singing signals. Experiments on Music Information Retrieval Evaluation Exchange (MIREX) datasets show that the technique achieves promising results for both audio melody extraction and polyphonic singing transcription.
(This article belongs to the Special Issue Advances in Computer Music)
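A common starting point for salience-based melody extraction of the kind mentioned here is harmonic summation over a constant-Q transform; the sketch below shows only that generic step (the paper's salience additionally uses rhythmic similarity and spectral balance, which are not reproduced). Function names, bin resolution, and harmonic weighting are assumptions.

```python
# Generic harmonic-summation salience sketch (not the paper's exact method).
import librosa
import numpy as np

def salience(y, sr=22050, fmin=55.0, n_bins=360, bins_per_octave=60, harmonics=5):
    cqt = np.abs(librosa.cqt(y, sr=sr, fmin=fmin,
                             n_bins=n_bins, bins_per_octave=bins_per_octave))
    sal = np.zeros_like(cqt)
    for h in range(1, harmonics + 1):
        # The h-th harmonic lies log2(h) octaves above the fundamental.
        shift = int(round(bins_per_octave * np.log2(h)))
        sal[: n_bins - shift, :] += cqt[shift:, :] / h   # weight higher harmonics less
    return sal                                           # (pitch bins, frames)

def pitch_candidates(sal, fmin=55.0, bins_per_octave=60):
    best = sal.argmax(axis=0)                            # strongest bin per frame
    return fmin * 2.0 ** (best / bins_per_octave)        # frame-wise f0 estimates in Hz
```

Contour tracking (the edge-searching step described in the abstract) would then group these frame-wise candidates into continuous melody contours before note segmentation.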
