Article

Ontology-Guided Multimodal Framework for Explainable Music Similarity and Recommendation

by
Mikhail Rumiantcev
Faculty of Information Technology, University of Jyväskylä, FI-40014 Jyväskylä, Finland
Big Data Cogn. Comput. 2026, 10(4), 122; https://doi.org/10.3390/bdcc10040122
Submission received: 2 February 2026 / Revised: 4 April 2026 / Accepted: 13 April 2026 / Published: 15 April 2026
(This article belongs to the Section Cognitive System)

Abstract

Analyzing music similarity in large catalogs is challenging because listeners perceive music differently and relevant cues are spread across audio, text, and metadata. This article introduces a multimodal framework that uses an ontology to make music similarity and recommendation more explainable. The framework combines learned features from audio, lyrics, and other text with structured metadata in a shared similarity space, and then refines ranking with a music ontology that captures relationships between songs, artists, genres, and moods. The design works with any encoder that produces fixed-size features; this study uses strong neural audio and text encoders, mainly transformer-based, which lets the system handle different input types while remaining reliable across datasets. The framework is evaluated on several open music and audio datasets using content-based retrieval tasks and standard ranking measures. In addition to Configurations C1–C4, the study includes an external content-based reference baseline built on conventional MIR audio descriptors. This baseline represents a signal-level retrieval approach that models complementary aspects of the audio signal, such as timbre, harmony, and spectral characteristics, and is evaluated under the same retrieval protocol as the main framework, providing an external comparison point outside the proposed C1–C4 design. Compared to audio-only and non-ontological variants within the same framework, the proposed multimodal and ontology-guided configurations achieve better precision, recall, and mean average precision, and also cover more rare content. Visualizations and case studies show that combining different data types and using ontology-based reranking can improve performance and make results easier to interpret. This work lays the groundwork for explainable, cognitively informed music recommendation systems and points to future work in modeling user behavior over time and adapting to different cultures.
Keywords: cognitive computing; multimodal representation learning; ontology-driven reasoning; explainable recommendation; music similarity; music information retrieval; recommender systems
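The pipeline the abstract describes — fuse fixed-size embeddings from several encoders into a shared similarity space, rank by cosine similarity, then rerank with ontology-derived relations — can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the function names, the modality weights, and the additive ontology-bonus form are all assumptions introduced here for clarity.

```python
import numpy as np

def fuse(audio_emb, text_emb, meta_emb, weights=(0.5, 0.3, 0.2)):
    """Fuse per-modality embeddings into one unit-norm vector.

    Each modality is L2-normalized and weighted before summation
    (illustrative fusion scheme; the paper's actual fusion may differ).
    """
    parts = []
    for emb, w in zip((audio_emb, text_emb, meta_emb), weights):
        emb = emb / (np.linalg.norm(emb) + 1e-9)  # normalize modality
        parts.append(w * emb)
    fused = np.sum(parts, axis=0)
    return fused / (np.linalg.norm(fused) + 1e-9)  # unit-norm output

def rerank(query, catalog, ontology_bonus, alpha=0.1):
    """Rank catalog tracks by cosine similarity plus an ontology term.

    `catalog` maps track id -> unit-norm fused vector; `ontology_bonus`
    maps track id -> a score derived from ontology relations such as
    shared artist, genre, or mood (hypothetical scoring, for illustration).
    """
    scores = {tid: float(vec @ query) + alpha * ontology_bonus.get(tid, 0.0)
              for tid, vec in catalog.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Under this sketch, a track that is weakly similar in the embedding space can still be promoted when the ontology links it strongly to the query (for example, same artist and mood), which is the mechanism that makes the final ranking explainable in terms of named relations rather than raw vector distances.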

Share and Cite

MDPI and ACS Style

Rumiantcev, M. Ontology-Guided Multimodal Framework for Explainable Music Similarity and Recommendation. Big Data Cogn. Comput. 2026, 10, 122. https://doi.org/10.3390/bdcc10040122

