Article

Investigating the Effects of Training Set Synthesis for Audio Segmentation of Radio Broadcast

Interdisciplinary Centre for Computer Music Research, University of Plymouth, Plymouth PL4 8AA, UK
* Author to whom correspondence should be addressed.
Academic Editors: Alexander Lerch and Peter Knees
Electronics 2021, 10(7), 827; https://doi.org/10.3390/electronics10070827
Received: 26 February 2021 / Revised: 26 March 2021 / Accepted: 28 March 2021 / Published: 31 March 2021
(This article belongs to the Special Issue Machine Learning Applied to Music/Audio Signal Processing)
Music and speech detection provides valuable information about the nature of content in broadcast audio. It helps detect acoustic regions that contain speech, voice over music, only music, or silence. In recent years, machine learning algorithms have been developed to accomplish this task. However, broadcast audio is generally well-mixed and copyrighted, which makes it challenging to share across research groups. In this study, we address the challenges encountered in automatically synthesising data that resembles a radio broadcast. First, we compare state-of-the-art neural network architectures such as CNN, GRU, LSTM, TCN, and CRNN. Second, we investigate how audio ducking of background music impacts the precision and recall of the machine learning algorithm. Third, we examine how the quantity of synthetic training data impacts the results. Finally, we evaluate the effectiveness of synthesised, real-world, and combined approaches for training models, to understand whether the synthetic data presents any additional value. Amongst the network architectures, CRNN was the best-performing network. Results also show that the minimum level of audio ducking preferred by the machine learning algorithm was similar to that of human listeners. After testing our model on in-house and public datasets, we observe that our proposed synthesis technique outperforms real-world data in some cases and serves as a promising alternative.
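The training-set synthesis described in the abstract involves mixing speech over background music with the music attenuated ("ducked") by a controllable amount. As a minimal sketch of that mixing step, the following hypothetical function (the name `duck_and_mix` and the default ducking level are illustrative assumptions, not taken from the paper) applies a decibel gain to the music track before summing it with the speech track:

```python
import numpy as np

def duck_and_mix(speech, music, duck_db=-15.0):
    """Mix speech over background music, attenuating (ducking)
    the music by `duck_db` decibels before summing.

    Illustrative sketch only; the paper's actual synthesis
    pipeline and parameter ranges may differ."""
    gain = 10.0 ** (duck_db / 20.0)  # convert dB to linear amplitude
    n = min(len(speech), len(music))  # align track lengths
    mix = speech[:n] + gain * music[:n]
    # Normalise only if the sum exceeds full scale, to avoid clipping
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix
```

Sweeping `duck_db` over a range of values and regenerating the training set is one way the effect of ducking level on precision and recall could be studied.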
Keywords: audio segmentation; training set synthesis; audio classification; music-speech detection; deep learning; Convolutional Recurrent Neural Network; radio; music information retrieval; automatic mixing; audio ducking

Figure 1

MDPI and ACS Style

Venkatesh, S.; Moffat, D.; Miranda, E.R. Investigating the Effects of Training Set Synthesis for Audio Segmentation of Radio Broadcast. Electronics 2021, 10, 827. https://doi.org/10.3390/electronics10070827

