Article

Ontology-Guided Multimodal Framework for Explainable Music Similarity and Recommendation

by
Mikhail Rumiantcev
Faculty of Information Technology, University of Jyväskylä, FI-40014 Jyväskylä, Finland
Big Data Cogn. Comput. 2026, 10(4), 122; https://doi.org/10.3390/bdcc10040122
Submission received: 2 February 2026 / Revised: 4 April 2026 / Accepted: 13 April 2026 / Published: 15 April 2026
(This article belongs to the Section Cognitive System)

Abstract

Analyzing music similarity in large catalogs is challenging because listeners perceive music differently and relevant cues are spread across audio, text, and metadata. This article introduces a multimodal framework that uses an ontology to make music similarity and recommendation more explainable. The framework combines learned features from audio, lyrics, and other text with structured metadata in a shared similarity space, and then refines ranking with a music ontology that captures relationships between songs, artists, genres, and moods. The design works with any encoder that produces fixed-size features; this study uses strong neural audio and text encoders, mainly transformer-based, which lets the system handle different input types while remaining reliable across datasets. The framework is evaluated on several open music and audio datasets using content-based retrieval tasks and standard ranking measures. In addition to Configurations C1–C4, the study includes an external content-based reference baseline built on conventional MIR audio descriptors. This baseline represents a signal-level retrieval approach that models complementary aspects of the audio signal, such as timbre, harmony, and spectral characteristics, and is evaluated under the same retrieval protocol as the main framework, providing an external comparison point outside the proposed C1–C4 design. Compared to audio-only and non-ontological variants within the same framework, the proposed multimodal and ontology-guided configurations achieve better precision, recall, and mean average precision, and also cover more rare content. Visualizations and case studies show that combining different data types and using ontology-based reranking can improve performance and make results easier to interpret. This work lays the groundwork for explainable, cognitively informed music recommendation systems and points to future work in modeling user behavior over time and adapting to different cultures.
Keywords: cognitive computing; multimodal representation learning; ontology-driven reasoning; explainable recommendation; music similarity; music information retrieval; recommender systems
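The pipeline the abstract describes — fuse fixed-size embeddings from several encoders into a shared similarity space, rank by cosine similarity, then rerank with ontology-derived relations — can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the function names, the modality weights, and the additive ontology-bonus form are all assumptions introduced here for clarity.

```python
import numpy as np

def fuse(audio_emb, text_emb, meta_emb, weights=(0.5, 0.3, 0.2)):
    """Fuse per-modality embeddings into one unit-norm vector.

    Each modality is L2-normalized and weighted before summation
    (illustrative fusion scheme; the paper's actual fusion may differ).
    """
    parts = []
    for emb, w in zip((audio_emb, text_emb, meta_emb), weights):
        emb = emb / (np.linalg.norm(emb) + 1e-9)  # normalize modality
        parts.append(w * emb)
    fused = np.sum(parts, axis=0)
    return fused / (np.linalg.norm(fused) + 1e-9)  # unit-norm output

def rerank(query, catalog, ontology_bonus, alpha=0.1):
    """Rank catalog tracks by cosine similarity plus an ontology term.

    `catalog` maps track id -> unit-norm fused vector; `ontology_bonus`
    maps track id -> a score derived from ontology relations such as
    shared artist, genre, or mood (hypothetical scoring, for illustration).
    """
    scores = {tid: float(vec @ query) + alpha * ontology_bonus.get(tid, 0.0)
              for tid, vec in catalog.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Under this sketch, a track that is weakly similar in the embedding space can still be promoted when the ontology links it strongly to the query (for example, same artist and mood), which is the mechanism that makes the final ranking explainable in terms of named relations rather than raw vector distances.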

Share and Cite

MDPI and ACS Style

Rumiantcev, M. Ontology-Guided Multimodal Framework for Explainable Music Similarity and Recommendation. Big Data Cogn. Comput. 2026, 10, 122. https://doi.org/10.3390/bdcc10040122

