Search Results (3)

Search Parameters:
Keywords = audio melody extraction

11 pages, 3250 KiB  
Article
A Deep Attention Model for Environmental Sound Classification from Multi-Feature Data
by Jinming Guo, Chuankun Li, Zepeng Sun, Jian Li and Pan Wang
Appl. Sci. 2022, 12(12), 5988; https://doi.org/10.3390/app12125988 - 12 Jun 2022
Cited by 13 | Viewed by 5013
Abstract
Automated environmental sound recognition has clear engineering benefits: it allows audio to be sorted, curated, and searched. Unlike music and language, environmental sound is loaded with noise and lacks the rhythm and melody of music or the semantic sequence of language, which makes it difficult to find features representative of the wide variety of environmental sound signals. To improve the accuracy of environmental sound recognition, this paper proposes a recognition method based on multi-feature parameters and a time–frequency attention module. It begins with a multi-feature preprocessing stage that supplements the phase information discarded by the Log-Mel spectrogram used in current mainstream methods and enhances the expressiveness of the input features. A time–frequency attention module built from multiple convolutions is designed to compute attention weights over the input feature spectrogram and to suppress interference from background noise and irrelevant frequency bands in the audio. Comparative experiments were conducted on three widely used datasets: the environmental sound classification datasets ESC-10 and ESC-50, and the UrbanSound8K dataset. The experiments demonstrate that the proposed method outperforms the compared approaches.
(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Deep Learning)
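As a rough illustration of the kind of pipeline this abstract describes (and not the authors' code), a minimal Python sketch might stack a Log-Mel spectrogram with STFT phase as a two-channel input and pass it through a small convolutional time-frequency attention block. The feature sizes, layer counts, and the librosa/PyTorch usage below are assumptions for illustration only.

```python
# Sketch only: multi-feature input (Log-Mel + phase) with a convolutional
# time-frequency attention block. Architecture details are assumptions.
import librosa
import numpy as np
import torch
import torch.nn as nn

def multi_feature(path, sr=22050, n_mels=64, n_fft=1024, hop=512):
    y, _ = librosa.load(path, sr=sr)
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    log_mel = librosa.power_to_db(
        librosa.feature.melspectrogram(S=np.abs(stft) ** 2, sr=sr, n_mels=n_mels))
    # Phase information that a Log-Mel representation alone discards.
    phase = np.angle(stft)[:n_mels, :]        # crude truncation to match shape
    return np.stack([log_mel, phase])         # (2, n_mels, frames)

class TFAttention(nn.Module):
    """Convolutions that predict a time-frequency attention map."""
    def __init__(self, ch=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(ch, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, x):                     # x: (batch, ch, mels, frames)
        att = self.conv(x)                    # (batch, 1, mels, frames)
        return x * att                        # down-weight noisy T-F bins
```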

17 pages, 610 KiB  
Review
Singing Voice Detection: A Survey
by Ramy Monir, Daniel Kostrzewa and Dariusz Mrozek
Entropy 2022, 24(1), 114; https://doi.org/10.3390/e24010114 - 12 Jan 2022
Cited by 17 | Viewed by 5571
Abstract
Singing voice detection, or vocal detection, is a classification task that determines whether a given audio segment contains a singing voice. It is a crucial preprocessing step that can improve the performance of other tasks such as automatic lyrics alignment, singing melody transcription, singing voice separation, vocal melody extraction, and many more. This paper presents a survey of singing voice detection techniques, with a particular focus on state-of-the-art algorithms such as convolutional LSTM and GRU-RNN models. It compares existing methods for singing voice detection, evaluated mainly on the Jamendo and RWC datasets; long-term recurrent convolutional networks have reached impressive results on these public datasets. The main goal of the present paper is to investigate both classical and state-of-the-art approaches to singing voice detection.
(This article belongs to the Special Issue Methods in Artificial Intelligence and Information Processing)
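For context, the convolutional-recurrent detectors this survey covers can be sketched as a tiny CRNN that convolves a Log-Mel spectrogram and feeds the result to an LSTM to predict frame-wise vocal presence. The layer sizes and input dimensions below are assumptions, not taken from any of the surveyed papers.

```python
# Illustrative sketch of a CRNN-style singing voice detector (assumed sizes).
import torch
import torch.nn as nn

class CRNNVocalDetector(nn.Module):
    def __init__(self, n_mels=80, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)))
        self.rnn = nn.LSTM(32 * (n_mels // 4), hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)       # singing voice vs. no voice

    def forward(self, spec):                  # spec: (batch, 1, n_mels, frames)
        h = self.conv(spec)                   # (batch, 32, n_mels // 4, frames)
        h = h.permute(0, 3, 1, 2).flatten(2)  # (batch, frames, features)
        h, _ = self.rnn(h)
        return torch.sigmoid(self.out(h))     # frame-wise vocal probability

probs = CRNNVocalDetector()(torch.randn(2, 1, 80, 128))   # shape (2, 128, 1)
```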

19 pages, 4204 KiB  
Article
Singing Transcription from Polyphonic Music Using Melody Contour Filtering
by Zhuang He and Yin Feng
Appl. Sci. 2021, 11(13), 5913; https://doi.org/10.3390/app11135913 - 25 Jun 2021
Cited by 2 | Viewed by 3206
Abstract
Automatic singing transcription and analysis from polyphonic music recordings are essential to a number of indexing techniques for computational auditory scenes. In this work, to obtain a note-level sequence, we divide the singing transcription task into two subtasks: melody extraction and note transcription. We construct a salience function in terms of harmonic and rhythmic similarity together with a measure of spectral balance. Central to the proposed method is the measurement of melody contours, which are calculated using edge searching based on their continuity properties. We calculate the mean contour salience by separating melody analysis from the adjacent-breakpoint connective strength matrix, and we select the final melody contour to determine MIDI notes. This method, which combines audio signal processing with image edge analysis, provides a more interpretable analysis platform for continuous singing signals. Experiments on Music Information Retrieval Evaluation Exchange (MIREX) datasets show that the technique achieves promising results for both audio melody extraction and polyphonic singing transcription.
(This article belongs to the Special Issue Advances in Computer Music)
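A common starting point for salience-based melody extraction of the kind mentioned here is harmonic summation over a constant-Q transform; the sketch below shows only that generic step (the paper's salience additionally uses rhythmic similarity and spectral balance, which are not reproduced). Function names, bin resolution, and harmonic weighting are assumptions.

```python
# Generic harmonic-summation salience sketch (not the paper's exact method).
import librosa
import numpy as np

def salience(y, sr=22050, fmin=55.0, n_bins=360, bins_per_octave=60, harmonics=5):
    cqt = np.abs(librosa.cqt(y, sr=sr, fmin=fmin,
                             n_bins=n_bins, bins_per_octave=bins_per_octave))
    sal = np.zeros_like(cqt)
    for h in range(1, harmonics + 1):
        # The h-th harmonic lies log2(h) octaves above the fundamental.
        shift = int(round(bins_per_octave * np.log2(h)))
        sal[: n_bins - shift, :] += cqt[shift:, :] / h   # weight higher harmonics less
    return sal                                           # (pitch bins, frames)

def pitch_candidates(sal, fmin=55.0, bins_per_octave=60):
    best = sal.argmax(axis=0)                            # strongest bin per frame
    return fmin * 2.0 ** (best / bins_per_octave)        # frame-wise f0 estimates in Hz
```

Contour tracking (the edge-searching step described in the abstract) would then group these frame-wise candidates into continuous melody contours before note segmentation.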
