Open Access Article
Explainable Instrument Classification: From MFCC Mean-Vector Models to CNNs on MFCC and Mel-Spectrograms with t-SNE and Grad-CAM Insights
by Tommaso Senatori, Daniela Nardone, Michele Lo Giudice * and Alessandro Salvini
Department of Civil, Computer Science and Aeronautical Technologies Engineering, Università degli Studi Roma Tre, Via V. Volterra 62, 00146 Roma, Italy
* Author to whom correspondence should be addressed.
Information 2025, 16(10), 864; https://doi.org/10.3390/info16100864 (registering DOI)
Submission received: 26 August 2025 / Revised: 24 September 2025 / Accepted: 3 October 2025 / Published: 5 October 2025
Abstract
This paper presents an automatic system for classifying musical instruments from audio recordings. The system explores three classification approaches based on distinct input representations, two of which rely on deep learning (DL). The first extracts Mel-Frequency Cepstral Coefficients (MFCCs) from the audio files and feeds them into a two-dimensional convolutional neural network (Conv2D). The second uses mel-spectrogram images as input to a similar Conv2D architecture. The third employs conventional machine learning (ML) classifiers, including Logistic Regression, K-Nearest Neighbors, and Random Forest, trained on MFCC-derived feature vectors. To gain insight into the behavior of the DL model, explainability techniques were applied to the Conv2D model trained on mel-spectrograms, clarifying which features the network relies on for classification. Additionally, t-distributed stochastic neighbor embedding (t-SNE) was applied to the MFCC vectors to visualize how instrument classes are organized in the feature space. One of the main challenges was class imbalance within the dataset, which was addressed by assigning class-specific weights during training. Classification accuracy was high across all approaches: the convolutional models and Random Forest reached around 97–98%, while Logistic Regression performed slightly lower. In conclusion, the proposed methods proved effective for the selected dataset, and future work may focus on further improving class-balancing techniques.
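As an illustration of the class-weighting idea mentioned in the abstract, the following is a minimal sketch of one common heuristic, inverse-frequency ("balanced") weights, where each class c receives the weight N / (K · n_c) for N samples, K classes, and n_c samples in class c. The abstract does not state which weighting formula the authors used, so the formula, the function name `balanced_class_weights`, and the label list are all assumptions made for illustration only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using weight = N / (K * n_c).

    Assumed heuristic (not confirmed by the paper): rarer classes get
    proportionally larger weights, so each class contributes equally
    to a weighted loss.
    """
    counts = Counter(labels)      # n_c for every class
    n_samples = len(labels)       # N
    n_classes = len(counts)       # K
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical, deliberately imbalanced labels (not the paper's dataset).
labels = ["guitar"] * 6 + ["piano"] * 3 + ["violin"] * 1
weights = balanced_class_weights(labels)

for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.3f}")
```

A dictionary like this could then be mapped to integer class indices and passed to a training routine that accepts per-class weights. With these weights, every class carries the same total weight (N / K) regardless of how many samples it has.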