Search Results (2)

Search Parameters:
Keywords = single-channel music separation

17 pages, 2731 KiB  
Article
VAT-SNet: A Convolutional Music-Separation Network Based on Vocal and Accompaniment Time-Domain Features
by Xiaoman Qiao, Min Luo, Fengjing Shao, Yi Sui, Xiaowei Yin and Rencheng Sun
Electronics 2022, 11(24), 4078; https://doi.org/10.3390/electronics11244078 - 8 Dec 2022
Cited by 3 | Viewed by 2411
Abstract
Separating the vocal from the accompaniment in single-channel music is a foundational and critical problem in the field of music information retrieval (MIR). Mainstream music-separation methods are usually based on the frequency-domain characteristics of music signals, and the phase information of the music is lost during time–frequency decomposition. In recent years, deep learning models based on time-domain speech signals, such as Conv-TasNet, have shown great potential. However, there is no suitable time-domain model for separating the vocal from the accompaniment: because the vocal and the accompaniment in music have higher synergy and similarity than the voices of two speakers in speech, separating them with a speech-separation model is not ideal. Motivated by this, we propose VAT-SNet, which optimizes the network structure of Conv-TasNet by using sample-level convolution in the encoder and decoder to preserve deep acoustic features, and by taking vocal and accompaniment embeddings generated by an auxiliary network as references to improve the purity of the separated vocal and accompaniment. Results on public music datasets show that the vocal and accompaniment separated by VAT-SNet improve on Conv-TasNet and mainstream separation methods such as U-Net and SH-4stack in terms of GSNR, GSIR, and GSAR.
(This article belongs to the Special Issue Machine Learning in Music/Audio Signal Processing)
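As a rough illustration of the time-domain encoder–separator–decoder pattern the abstract describes, the sketch below conditions a Conv-TasNet-style masking network on a reference embedding produced by an auxiliary branch. It is not the published VAT-SNet architecture; the layer sizes, the use of a reference clip, and the masking scheme are assumptions made for illustration.

```python
# Minimal sketch of a time-domain source separator in the Conv-TasNet style
# described in the abstract. Layer sizes, the auxiliary embedding branch, and
# the masking scheme are illustrative assumptions, not the VAT-SNet design.
import torch
import torch.nn as nn


class TimeDomainSeparator(nn.Module):
    def __init__(self, n_filters=256, kernel_size=16, stride=8, emb_dim=128):
        super().__init__()
        # Learned 1-D convolution replaces the STFT front end, so no phase
        # information is discarded during analysis.
        self.encoder = nn.Conv1d(1, n_filters, kernel_size, stride=stride)
        # Auxiliary branch producing a reference embedding for the target source.
        self.aux = nn.Sequential(
            nn.Conv1d(n_filters, emb_dim, 1),
            nn.AdaptiveAvgPool1d(1),
        )
        # Separator predicts a mask over the encoded mixture, conditioned on
        # the reference embedding.
        self.separator = nn.Sequential(
            nn.Conv1d(n_filters + emb_dim, n_filters, 1),
            nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, 3, padding=1),
            nn.Sigmoid(),
        )
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size, stride=stride)

    def forward(self, mixture, reference):
        # mixture, reference: (batch, 1, samples)
        mix_feats = torch.relu(self.encoder(mixture))
        ref_feats = torch.relu(self.encoder(reference))
        emb = self.aux(ref_feats)                      # (batch, emb_dim, 1)
        emb = emb.expand(-1, -1, mix_feats.size(-1))   # broadcast over time
        mask = self.separator(torch.cat([mix_feats, emb], dim=1))
        return self.decoder(mix_feats * mask)          # estimated source waveform


if __name__ == "__main__":
    model = TimeDomainSeparator()
    mix = torch.randn(2, 1, 16000)   # 1 s of 16 kHz mixture audio
    ref = torch.randn(2, 1, 16000)   # reference clip for the target source
    print(model(mix, ref).shape)     # torch.Size([2, 1, 16000])
```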
23 pages, 952 KiB  
Article
Multi-Modal Song Mood Detection with Deep Learning
by Konstantinos Pyrovolakis, Paraskevi Tzouveli and Giorgos Stamou
Sensors 2022, 22(3), 1065; https://doi.org/10.3390/s22031065 - 29 Jan 2022
Cited by 24 | Viewed by 7074
Abstract
The production and consumption of music in the contemporary era result in big-data generation and create new needs for automated and more effective management of these data. Automated music mood detection is an active task in the field of MIR (Music Information Retrieval). The first approach to correlating music and mood was made in 1990 by Gordon Bruner, who researched the way that musical emotion affects marketing. In 2016, Lidy and Schindler trained a CNN for the task of genre and mood classification based on audio. In 2018, Delbouys et al. developed a multi-modal deep learning system combining CNN and LSTM architectures and concluded that multi-modal approaches outperform single-channel models. This work examines and compares single-channel and multi-modal approaches to music mood detection using deep learning architectures. Our first approach utilizes the audio signal and the lyrics of a musical track separately, while the second applies a unified multi-modal analysis to classify the given data into mood classes. The data we use to train and evaluate our models come from the MoodyLyrics dataset, which includes 2000 song titles labeled with four mood classes: {happy, angry, sad, relaxed}. The result of this work is a unified prediction of the mood that represents a music track, which has uses in many applications.
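For context, here is a minimal sketch of the kind of multi-modal fusion the abstract outlines: a CNN branch over an audio spectrogram and an LSTM branch over lyric tokens, concatenated and classified into the four MoodyLyrics mood classes. The layer sizes, vocabulary size, and fusion strategy are illustrative assumptions, not the authors' implementation.

```python
# Rough sketch of a multi-modal mood classifier: CNN over a spectrogram,
# LSTM over lyric token ids, late fusion into four mood classes.
# All sizes and the fusion strategy are assumptions for illustration.
import torch
import torch.nn as nn


class MultiModalMoodClassifier(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=128, n_classes=4):
        super().__init__()
        # Audio branch: small CNN over a (1, mel_bins, frames) spectrogram.
        self.audio = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (batch, 32)
        )
        # Lyrics branch: token embedding followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, 64, batch_first=True)
        # Fusion head over the concatenated audio and lyric features.
        self.head = nn.Linear(32 + 64, n_classes)

    def forward(self, spectrogram, token_ids):
        a = self.audio(spectrogram)                       # (batch, 32)
        _, (h, _) = self.lstm(self.embed(token_ids))      # h: (1, batch, 64)
        return self.head(torch.cat([a, h[-1]], dim=1))    # logits over 4 moods


if __name__ == "__main__":
    model = MultiModalMoodClassifier()
    spec = torch.randn(8, 1, 64, 256)                     # batch of spectrograms
    tokens = torch.randint(0, 20000, (8, 120))            # batch of lyric ids
    print(model(spec, tokens).shape)                      # torch.Size([8, 4])
```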