This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

Explainable Instrument Classification: From MFCC Mean-Vector Models to CNNs on MFCC and Mel-Spectrograms with t-SNE and Grad-CAM Insights

Department of Civil, Computer Science and Aeronautical Technologies Engineering, Università degli Studi Roma Tre, Via V. Volterra 62, 00146 Roma, Italy
*
Author to whom correspondence should be addressed.
Information 2025, 16(10), 864; https://doi.org/10.3390/info16100864 (registering DOI)
Submission received: 26 August 2025 / Revised: 24 September 2025 / Accepted: 3 October 2025 / Published: 5 October 2025
(This article belongs to the Special Issue Artificial Intelligence for Acoustics and Audio Signal Processing)

Abstract

This paper presents an automatic system for classifying musical instruments from audio recordings. Three classification approaches are explored, each based on a different input representation. In the first, Mel-Frequency Cepstral Coefficients (MFCCs) are extracted from the audio files and fed into a two-dimensional convolutional neural network (Conv2D). The second uses mel-spectrogram images as input to a similar Conv2D architecture. The third employs conventional machine learning (ML) classifiers, namely Logistic Regression, K-Nearest Neighbors, and Random Forest, trained on MFCC-derived feature vectors. To gain insight into the behavior of the deep learning (DL) model, explainability techniques were applied to the Conv2D model trained on mel-spectrograms, to better understand how the network interprets the features relevant to classification. In addition, t-distributed stochastic neighbor embedding (t-SNE) was applied to the MFCC vectors to visualize how the instrument classes are organized in the feature space. One of the main challenges was class imbalance within the dataset, which was addressed by assigning class-specific weights during training. Classification accuracy was high across all approaches: the convolutional models and Random Forest reached approximately 97–98%, while Logistic Regression performed slightly lower. The proposed methods therefore proved effective on the selected dataset, and future work may focus on further improving class-balancing techniques.
Keywords: audio signal processing; deep learning; convolutional neural networks (CNN); machine learning; explainability; Mel-Frequency Cepstral Coefficients (MFCC); mel-spectrogram
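
The class-weighting idea mentioned in the abstract can be illustrated with a minimal Python sketch. It computes one common choice of class-specific weights, the inverse-frequency ("balanced") heuristic w_c = N / (K · n_c), where N is the number of samples, K the number of classes, and n_c the number of samples in class c. The abstract does not state which weighting formula was used, so the formula, the function name `balanced_class_weights`, and the toy label list are assumptions for illustration only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using weight = N / (K * n_c).

    Assumption: inverse-frequency weighting; the article's abstract does
    not specify the exact scheme used during training.
    """
    counts = Counter(labels)      # n_c for every class c
    n_samples = len(labels)       # N
    n_classes = len(counts)       # K
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical, deliberately imbalanced labels (not the paper's dataset).
labels = ["guitar"] * 6 + ["piano"] * 3 + ["violin"] * 1
weights = balanced_class_weights(labels)

for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.3f}")
```

With this heuristic, rarer classes receive larger weights, so each class contributes the same total weight (N / K) to a weighted loss. A dictionary of this form is the kind of input that training routines accepting per-class weights typically expect.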

Share and Cite

MDPI and ACS Style

Senatori, T.; Nardone, D.; Lo Giudice, M.; Salvini, A. Explainable Instrument Classification: From MFCC Mean-Vector Models to CNNs on MFCC and Mel-Spectrograms with t-SNE and Grad-CAM Insights. Information 2025, 16, 864. https://doi.org/10.3390/info16100864

AMA Style

Senatori T, Nardone D, Lo Giudice M, Salvini A. Explainable Instrument Classification: From MFCC Mean-Vector Models to CNNs on MFCC and Mel-Spectrograms with t-SNE and Grad-CAM Insights. Information. 2025; 16(10):864. https://doi.org/10.3390/info16100864

Chicago/Turabian Style

Senatori, Tommaso, Daniela Nardone, Michele Lo Giudice, and Alessandro Salvini. 2025. "Explainable Instrument Classification: From MFCC Mean-Vector Models to CNNs on MFCC and Mel-Spectrograms with t-SNE and Grad-CAM Insights" Information 16, no. 10: 864. https://doi.org/10.3390/info16100864

APA Style

Senatori, T., Nardone, D., Lo Giudice, M., & Salvini, A. (2025). Explainable Instrument Classification: From MFCC Mean-Vector Models to CNNs on MFCC and Mel-Spectrograms with t-SNE and Grad-CAM Insights. Information, 16(10), 864. https://doi.org/10.3390/info16100864

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
