Artificial Intelligence for Acoustics and Audio Signal Processing

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Applications".

Deadline for manuscript submissions: 1 September 2026

Special Issue Editors


Guest Editor
Department of Civil, Computer Science and Aeronautical Technologies Engineering, Università degli Studi Roma Tre, 00146 Roma, Italy
Interests: artificial intelligence; deep learning; signal processing; time–frequency analysis; explainability

Guest Editor
Department of Civil, Computer Science and Aeronautical Technologies Engineering, Università degli Studi Roma Tre, 00146 Roma, Italy
Interests: sensors; electronic systems; signal processing; time–frequency analysis; artificial intelligence

Guest Editor
Department of Civil, Computer Science and Aeronautical Technologies Engineering, Università degli Studi Roma Tre, 00146 Roma, Italy
Interests: computational intelligence; optimization; signal processing; time–frequency analysis; explainability

Special Issue Information

Dear Colleagues,

The processing of audio signals through Artificial Intelligence (AI)-driven approaches has gained significant importance in recent years, thanks to its ability to enhance human–machine interaction in its most natural and immediate form: sound. A wide range of applications has emerged, from fault prediction and acoustic-based defect detection in cultural heritage to medical auscultation, modelling of the acoustic response of architectural spaces, music interpretation and generation, and speech emotion recognition, among many others.

In these tasks, time–frequency representations have proven to be a crucial preprocessing step for raw signals, providing an effective input format for neural networks. Whether based on linear-, logarithmic-, or mel-scale spectrograms, or on mel-frequency cepstral coefficients (MFCCs), these 2D projections offer a rich domain from which AI models can extract meaningful features for classification, detection, or synthesis.
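As an illustration of this preprocessing stage, the sketch below computes a log-mel spectrogram from a raw waveform using only NumPy. The filterbank follows the common HTK-style mel formula; the frame length, hop size, number of mel bands, and the toy 440 Hz test tone are arbitrary choices for the example, not values prescribed by this Special Issue.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(y, sr, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, and take the power spectrum.
    frames = np.array([y[s:s + n_fft] * np.hanning(n_fft)
                       for s in range(0, len(y) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (frames, freq bins)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T      # project onto mel bands
    return np.log(mel + 1e-10)                             # (frames, n_mels)

# Toy input: a 440 Hz tone, 1 second at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(y, sr)
print(S.shape)  # (61, 40): 61 frames, 40 mel bands
```

In practice, libraries such as librosa provide equivalent (and more thoroughly tested) routines; applying a discrete cosine transform across the mel bands of each frame would yield MFCCs.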

This Special Issue aims to explore and extend the field of spectrogram-based and time–frequency neural recognition by inviting high-quality, original research contributions related (but not limited) to the following topics:

  • AI and deep learning applied to time–frequency analysis of audio signals
  • Spectrogram-based classification and pattern recognition
  • Audio signal preprocessing for neural network input
  • Audio-based anomaly or defect detection in industrial or cultural heritage domains
  • Speech-based emotion or health state recognition
  • Music genre recognition, melody generation, and song generation
  • Multimodal fusion involving time–frequency features
  • Explainable AI for time–frequency models
  • Novel architectures for spectrogram understanding (e.g., CNNs, Transformers, Attention models)

We look forward to receiving your submissions and advancing the state of the art in AI-based sound analysis.

Dr. Michele Lo Giudice
Prof. Giosue Caliano
Prof. Alessandro Salvini
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • deep learning
  • sound recognition
  • audio signal processing
  • time–frequency analysis
  • explainability

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (1 paper)


Research

22 pages, 1556 KB  
Article
Explainable Instrument Classification: From MFCC Mean-Vector Models to CNNs on MFCC and Mel-Spectrograms with t-SNE and Grad-CAM Insights
by Tommaso Senatori, Daniela Nardone, Michele Lo Giudice and Alessandro Salvini
Information 2025, 16(10), 864; https://doi.org/10.3390/info16100864 - 5 Oct 2025
Abstract
This paper presents an automatic system for the classification of musical instruments from audio recordings. The project leverages deep learning (DL) techniques to achieve its objective, exploring three different classification approaches based on distinct input representations. The first method involves the extraction of Mel-Frequency Cepstral Coefficients (MFCCs) from the audio files, which are then fed into a two-dimensional convolutional neural network (Conv2D). The second approach makes use of mel-spectrogram images as input to a similar Conv2D architecture. The third approach employs conventional machine learning (ML) classifiers, including Logistic Regression, K-Nearest Neighbors, and Random Forest, trained on MFCC-derived feature vectors. To gain insight into the behavior of the DL model, explainability techniques were applied to the Conv2D model using mel-spectrograms, allowing for a better understanding of how the network interprets relevant features for classification. Additionally, t-distributed stochastic neighbor embedding (t-SNE) was employed on the MFCC vectors to visualize how instrument classes are organized in the feature space. One of the main challenges encountered was the class imbalance within the dataset, which was addressed by assigning class-specific weights during training. The results, in terms of classification accuracy, were very satisfactory across all approaches, with the convolutional models and Random Forest achieving around 97–98%, and Logistic Regression yielding slightly lower performance. In conclusion, the proposed methods proved effective for the selected dataset, and future work may focus on further improving class balance techniques.
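The class-specific weighting mentioned in the abstract can be sketched as follows. The inverse-frequency ("balanced") heuristic and the hypothetical label counts below are illustrative assumptions; the paper's exact weighting scheme is not given here.

```python
import numpy as np

# Hypothetical label array for a 4-class instrument dataset,
# with class 3 deliberately under-represented.
labels = np.array([0] * 500 + [1] * 400 + [2] * 300 + [3] * 50)

# Inverse-frequency weights, following the common "balanced" heuristic:
# weight_c = n_samples / (n_classes * count_c).
classes, counts = np.unique(labels, return_counts=True)
weights = len(labels) / (len(classes) * counts)
class_weight = dict(zip(classes.tolist(), weights.tolist()))
print(class_weight)  # class 3 receives the largest weight (6.25)
```

These per-class weights can then be passed to a training loop (e.g., as a weighted loss) so that errors on rare classes contribute proportionally more to the gradient.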
(This article belongs to the Special Issue Artificial Intelligence for Acoustics and Audio Signal Processing)
