Article

Exploring Channel Properties to Improve Singing Voice Detection with Convolutional Neural Networks

1 School of Software Engineering, Jinling Institute of Technology, Nanjing 211169, China
2 Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education, Nanjing 210003, China
3 Centre for Digital Music, Queen Mary University of London, London E1 4NS, UK
4 Faculty of Science, Queensland University of Technology, Brisbane, QLD 4001, Australia
* Author to whom correspondence should be addressed.
Academic Editor: Philippe Esling
Appl. Sci. 2021, 11(24), 11838; https://doi.org/10.3390/app112411838
Received: 20 October 2021 / Revised: 26 November 2021 / Accepted: 9 December 2021 / Published: 13 December 2021
(This article belongs to the Special Issue Advances in Computer Music)
Singing voice detection remains a challenging task because the voice can be obscured by instruments that occupy the same frequency band and may even share its timbre, since some instruments produce sound by mimicking the mechanism of human singing. Because feature engineering is complex and adapts poorly, there is a recent trend towards feature learning, in which deep neural networks perform both feature extraction and classification. In this paper, we present two methods that exploit the channel properties of the convolutional neural network to improve singing voice detection through feature learning. First, channel attention learning is used to measure the importance of each feature map, with two attention mechanisms exploited: scaled dot-product and squeeze-and-excitation. Learning these importance values lets the neurons place more attention on the more important feature maps. Second, multi-scale representations are fed to the input channels to add information in terms of scale. Different songs are generally best represented by spectrograms at different scales, and providing several scales lets the network choose the one best suited to the task. In experiments on three public datasets, both methods proved effective, with accuracy increasing by up to 2.13 percent over an already high baseline.
Keywords: singing voice detection; convolutional neural network; scaled dot-product; squeeze-and-excitation; multi-scale channels
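
The abstract names squeeze-and-excitation as one of the two channel attention mechanisms. The following is a minimal, dependency-free sketch of a generic squeeze-and-excitation step, not the authors' network: the function name `squeeze_excite`, the helper `random_matrix`, the reduction ratio `r`, the tensor sizes, and the random weights are all assumptions made for illustration. In a trained model the two weight matrices would be learned.

```python
import math
import random


def squeeze_excite(feature_maps, w1, w2):
    """Reweight the channels of `feature_maps` (C x H x W nested lists).

    w1: (C // r) x C weights of the bottleneck layer (hypothetical values).
    w2: C x (C // r) weights of the expansion layer (hypothetical values).
    Returns (scaled_maps, channel_weights).
    """
    # Squeeze: global average pooling collapses each H x W map to one scalar.
    squeezed = [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * z for w, z in zip(row, squeezed)))
        for row in w1
    ]
    # Excitation, step 2: expand back to C values and apply a sigmoid,
    # giving one importance weight in (0, 1) per channel.
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [
        [[s * v for v in row] for row in fmap]
        for fmap, s in zip(feature_maps, weights)
    ]
    return scaled, weights


def random_matrix(rows, cols, rng):
    """Illustrative helper: a rows x cols matrix of values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Assumed toy sizes: 4 channels of 3 x 5 feature maps, reduction ratio 2.
rng = random.Random(0)
C, H, W, r = 4, 3, 5, 2
x = [random_matrix(H, W, rng) for _ in range(C)]
w1 = random_matrix(C // r, C, rng)
w2 = random_matrix(C, C // r, rng)
y, s = squeeze_excite(x, w1, w2)
```

Here `s` holds one weight per channel and `y` has the same shape as `x`, with each feature map scaled by its weight, which is the sense in which the network can "place more attention on the more important feature maps".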
MDPI and ACS Style

Gui, W.; Li, Y.; Zang, X.; Zhang, J. Exploring Channel Properties to Improve Singing Voice Detection with Convolutional Neural Networks. Appl. Sci. 2021, 11, 11838. https://doi.org/10.3390/app112411838

AMA Style

Gui W, Li Y, Zang X, Zhang J. Exploring Channel Properties to Improve Singing Voice Detection with Convolutional Neural Networks. Applied Sciences. 2021; 11(24):11838. https://doi.org/10.3390/app112411838

Chicago/Turabian Style

Gui, Wenming, Yukun Li, Xian Zang, and Jinglan Zhang. 2021. "Exploring Channel Properties to Improve Singing Voice Detection with Convolutional Neural Networks" Applied Sciences 11, no. 24: 11838. https://doi.org/10.3390/app112411838

