Article

Statistical and Visual Analysis of Audio, Text, and Image Features for Multi-Modal Music Genre Recognition

Ben Wilkes, Igor Vatolkin, and Heinrich Müller

Department of Computer Science, Technische Universität Dortmund, 44227 Dortmund, Germany
* Author to whom correspondence should be addressed.
Academic Editor: Gholamreza Anbarjafari
Entropy 2021, 23(11), 1502; https://doi.org/10.3390/e23111502
Received: 1 October 2021 / Revised: 7 November 2021 / Accepted: 9 November 2021 / Published: 12 November 2021
We present a multi-modal genre recognition framework that covers the modalities audio, text, and image through features extracted from the audio signals, album cover images, and lyrics of music tracks. In contrast to related work, which relies purely on features learned by neural networks, we also integrate handcrafted features designed for each modality; this makes the resulting models more interpretable and permits further theoretical analysis of how individual features affect genre prediction. Genre recognition is performed by binary classification of a music track with respect to each genre, based on combinations of elementary features. Features are combined with a two-level technique that joins aggregation into fixed-length feature vectors with confidence-based fusion of classification results. Extensive experiments have been conducted for three classifier models (Naïve Bayes, Support Vector Machine, and Random Forest) and numerous feature combinations. The results are presented visually; to improve perceptibility, the data are reduced by multi-objective analysis and restricted to non-dominated results. Feature- and classifier-related hypotheses are formulated from the data, and their statistical significance is formally analyzed. The statistical analysis shows that combining two modalities almost always leads to a significant increase in performance, and combining all three modalities does so in several cases.
Keywords: music genre recognition; multi-modal classification; feature evaluation; audio signal features; album cover images; lyrics
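The abstract describes the feature combination and the data reduction only at a high level. The following Python sketch illustrates the three ideas it names, under assumptions that are not taken from the article. Level 1 aggregates a variable-length sequence of frame-level features into a fixed-length vector using per-dimension mean and standard deviation. Level 2 fuses per-modality classifier probabilities for one genre by following the modality whose probability lies farthest from 0.5. The non-dominated filter keeps only results that no other result beats in every objective. The function names `aggregate`, `fuse`, and `non_dominated`, the choice of statistics, the fusion rule, and all example numbers are hypothetical.

```python
from typing import Dict

import numpy as np


# Level 1 (assumed statistics): aggregate a variable-length sequence of
# frame-level features, shape (n_frames, n_dims), into one fixed-length
# vector of length 2 * n_dims by concatenating per-dimension mean and std.
def aggregate(frame_features: np.ndarray) -> np.ndarray:
    return np.concatenate([frame_features.mean(axis=0), frame_features.std(axis=0)])


# Level 2 (assumed rule): fuse the per-modality probabilities P(genre | track)
# for one genre by following the most confident modality, where confidence
# is the distance of its probability from the 0.5 decision boundary.
def fuse(probabilities: Dict[str, float]) -> bool:
    most_confident = max(probabilities.values(), key=lambda p: abs(p - 0.5))
    return bool(most_confident >= 0.5)


# Non-dominated filter: row i survives if no other row is at least as good in
# every objective and strictly better in at least one (all objectives maximized).
def non_dominated(points: np.ndarray) -> np.ndarray:
    mask = np.ones(len(points), dtype=bool)
    for i, p in enumerate(points):
        others = np.delete(points, i, axis=0)
        beats_p = np.all(others >= p, axis=1) & np.any(others > p, axis=1)
        mask[i] = not beats_p.any()
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio_frames = rng.normal(size=(100, 13))  # 100 frames of 13 hypothetical audio features
    print(aggregate(audio_frames).shape)  # (26,)

    # The image classifier (0.2) is farthest from 0.5, so its negative decision wins.
    print(fuse({"audio": 0.7, "text": 0.45, "image": 0.2}))  # False

    # Rows: two hypothetical performance measures per feature/classifier setting.
    results = np.array([[0.8, 0.6], [0.7, 0.7], [0.6, 0.5], [0.8, 0.5]])
    print(non_dominated(results).tolist())  # [True, True, False, False]
```

Negating any objective that should be minimized (for example an error rate) lets the same `non_dominated` function handle mixed objectives.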
MDPI and ACS Style

Wilkes, B.; Vatolkin, I.; Müller, H. Statistical and Visual Analysis of Audio, Text, and Image Features for Multi-Modal Music Genre Recognition. Entropy 2021, 23, 1502. https://doi.org/10.3390/e23111502

