Machine Learning in Audio Signal Processing and Music Information Retrieval

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 June 2024 | Viewed by 3960

Special Issue Editors


Guest Editor
Application of Information and Communication Technologies (ATIC) Research Group, ETSI Telecomunicación, Campus Universitario de Teatinos s/n, 29071 Malaga, Spain
Interests: serious games; digital audio and image processing; pattern analysis and recognition; applications of signal processing techniques and methods

Guest Editor
Application of Information and Communication Technologies (ATIC) Research Group, ETSI Telecomunicación, Campus Universitario de Teatinos s/n, 29071 Malaga, Spain
Interests: music information retrieval; audio signal processing; machine learning; musical acoustics; serious games; EEG signal processing; multimedia applications

Special Issue Information

Dear Colleagues,

Machine learning methods and applications have been in use for some time. Recently, the growth of available multimedia content and databases, together with advances in computing, artificial intelligence techniques and, especially, deep learning methods, has spread these approaches across all application areas of multimedia signal processing and, within this context, to audio and music signal research, including music information retrieval.

These methods range from classical machine learning techniques to recently developed deep neural networks, and they are applied to a wide variety of tasks, including audio classification, source separation, enhancement, transcription, indexation, content creation, entertainment, and gaming.

In this context, there is still ample room for research and innovation. This Special Issue aims to provide the research community with a space to share its recent findings and advances. The topics of interest include, but are not limited to, the following:

  • Machine learning methods for music/audio information retrieval, indexation, and querying;
  • Musical instrument identification, synthesis, transformation, and classification;
  • Symbolic music processing;
  • Machine learning for the discovery of musical structure, segmentation, and form: melody and motives, harmony, chords and tonality, rhythm, beat, tempo, timbre, instrumentation and voice, style, and genre;
  • Musical content creation: melodies, accompaniment, orchestration, etc.;
  • Machine learning methods for natural language processing, text, and web mining;
  • Sound source separation;
  • Music transcription and annotation, alignment, synchronization, and score following; optical music recognition;
  • Audio fingerprinting;
  • Machine learning approaches for visualization, auralization, and sonification;
  • Music recommendation and playlist generation;
  • Music and health, wellbeing, therapy, music training, and education;
  • Machine learning methods for music and audio in gaming.

Prof. Dr. Lorenzo J. Tardón
Prof. Dr. Isabel Barbancho
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • music information retrieval
  • machine learning for audio and music
  • intelligent audio signal processing
  • audio analysis and transformation

Published Papers (4 papers)


Research

13 pages, 385 KiB  
Article
Attributes Relevance in Content-Based Music Recommendation System
by Daniel Kostrzewa, Jonatan Chrobak and Robert Brzeski
Appl. Sci. 2024, 14(2), 855; https://doi.org/10.3390/app14020855 - 19 Jan 2024
Cited by 1 | Viewed by 758
Abstract
Song recommendation is becoming increasingly necessary because online databases contain millions of users and songs. Effective methods that address this problem automatically are therefore needed. In this paper, the task is solved using three basic factors: genre classification performed by a neural network, Mel-frequency cepstral coefficients (MFCCs), and the tempo of the song. The recommendation system is built using a probability function based on these three factors. The authors' contribution to the development of an automatic content-based recommendation system consists of methods built on these three factors. Using different combinations of them, four strategies were created. All four strategies were evaluated based on the feedback scores of 37 users, who created a total of 300 surveys. The proposed recommendation methods show a definite improvement over a random method. The results indicate that the MFCC parameters have the greatest impact on the quality of recommendations.
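The abstract does not give the exact probability function, so the following is only a minimal sketch of how three per-song factors (genre probabilities, a mean MFCC vector, and tempo) might be combined into a recommendation distribution. The `Track` fields, the Bhattacharyya genre similarity, the exponential distance-to-similarity mappings, the `scale` values, and the weighted-sum normalization are all assumptions for illustration, not the authors' method.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Track:
    """Hypothetical per-song features; the paper's exact representation is not given."""
    title: str
    genre_probs: Sequence[float]  # genre classifier output, assumed to sum to 1
    mfcc_mean: Sequence[float]    # mean MFCC vector over the whole song
    tempo_bpm: float


def genre_similarity(a: Track, b: Track) -> float:
    # Bhattacharyya coefficient between two genre distributions, in [0, 1].
    return sum(math.sqrt(p * q) for p, q in zip(a.genre_probs, b.genre_probs))


def mfcc_similarity(a: Track, b: Track, scale: float = 10.0) -> float:
    # Euclidean distance between mean MFCC vectors, mapped to (0, 1].
    return math.exp(-math.dist(a.mfcc_mean, b.mfcc_mean) / scale)


def tempo_similarity(a: Track, b: Track, scale: float = 20.0) -> float:
    # Absolute tempo difference in BPM, mapped to (0, 1].
    return math.exp(-abs(a.tempo_bpm - b.tempo_bpm) / scale)


def recommendation_probabilities(
    seed: Track,
    candidates: List[Track],
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> Dict[str, float]:
    """Normalize weighted similarities to the seed song into probabilities."""
    w_genre, w_mfcc, w_tempo = weights
    scores = {
        c.title: w_genre * genre_similarity(seed, c)
        + w_mfcc * mfcc_similarity(seed, c)
        + w_tempo * tempo_similarity(seed, c)
        for c in candidates
    }
    total = sum(scores.values())
    if total == 0:  # e.g. genre-only weights and no genre overlap
        return {title: 1.0 / len(scores) for title in scores}
    return {title: s / total for title, s in scores.items()}
```

Different "strategies" in this sketch are just different `weights` tuples; for example, `(0.0, 1.0, 0.0)` ranks candidates by MFCC similarity alone.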

16 pages, 1082 KiB  
Article
Analyzing the Influence of Diverse Background Noises on Voice Transmission: A Deep Learning Approach to Noise Suppression
by Alberto Nogales, Javier Caracuel-Cayuela and Álvaro J. García-Tejedor
Appl. Sci. 2024, 14(2), 740; https://doi.org/10.3390/app14020740 - 15 Jan 2024
Viewed by 638
Abstract
This paper presents an approach to enhancing the clarity and intelligibility of speech in digital communications compromised by various background noises. Utilizing deep learning techniques, specifically a Variational Autoencoder (VAE) with 2D convolutional filters, we aim to suppress background noise in audio signals. Our method focuses on four simulated environmental noise scenarios: storms, wind, traffic, and aircraft. The training dataset was built from public sources (the TED-LIUM 3 dataset, which includes audio recordings from the popular TED Talks series) combined with these background noises. The audio signals were transformed into 2D power spectrograms, on which our VAE model was trained to filter out the noise and reconstruct clean audio. Our results demonstrate that the model outperforms existing state-of-the-art solutions in noise suppression. Although differences between noise types were observed, it was difficult to conclude definitively which background noise most adversely affects speech quality. The results were assessed with objective (mathematical metrics) and subjective (human listeners rating a set of audio clips) methods. Notably, wind noise showed the smallest deviation between the noisy and cleaned audio and was perceived subjectively as the most improved scenario. Future work should involve refining the phase calculation of the cleaned audio and creating a more balanced dataset to minimize differences in audio quality across scenarios. Additionally, practical applications of the model to real-time streaming audio are envisaged. This research contributes to the field of audio signal processing by offering a deep learning solution tailored to various noise conditions, enhancing digital communication quality.
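The preprocessing described above (speech combined with background noise, then converted to a 2D power spectrogram) can be sketched with NumPy. This is an illustrative sketch only: mixing at a chosen signal-to-noise ratio, the Hann window, and the `n_fft`/`hop` values are assumptions, since the abstract does not state how the noises were combined or which STFT settings were used.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that the mixture has the requested SNR (assumed setup)."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve p_clean / (gain**2 * p_noise) = 10**(snr_db / 10) for gain.
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise


def power_spectrogram(x: np.ndarray, n_fft: int = 512, hop: int = 128):
    """Return (power, phase), each shaped (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1).T  # frequency bins x time frames
    return np.abs(spectrum) ** 2, np.angle(spectrum)
```

The phase is returned separately because a model that only sees the power spectrogram cannot recover it; the abstract flags the phase calculation of the cleaned audio as future work.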

15 pages, 5957 KiB  
Article
Deformer: Denoising Transformer for Improved Audio Music Genre Classification
by Jigang Wang, Shuyu Li and Yunsick Sung
Appl. Sci. 2023, 13(23), 12673; https://doi.org/10.3390/app132312673 - 25 Nov 2023
Viewed by 1166
Abstract
Audio music genre classification is performed to categorize audio music into various genres. Traditional approaches based on convolutional recurrent neural networks do not capture long-range temporal information, and their sequential structures result in longer training times and convergence difficulties. To overcome these problems, a transformer-based approach was introduced. However, this approach employs pre-training based on momentum contrast (MoCo), a technique that increases computational costs owing to its reliance on extracting many negative samples and its use of highly sensitive hyperparameters. This complicates the training process and increases the risk of learning imbalances between positive and negative sample sets. In this paper, a method for audio music genre classification called Deformer is proposed. Deformer learns deep representations of audio music data through a denoising process, eliminating the need for MoCo and additional hyperparameters and thus reducing computational costs. In the denoising process, it employs a prior decoder to reconstruct the audio patches, thereby enhancing the interpretability of the representations. By calculating the mean squared error loss between the reconstructed and real patches, Deformer can learn a more refined representation of the audio data. The performance of the proposed method was experimentally compared with that of two distinct baseline models: one based on S3T and one employing a residual neural network-bidirectional gated recurrent unit (ResNet-BiGRU). Deformer achieved an accuracy of 84.5%, surpassing both the ResNet-BiGRU-based (81%) and S3T-based (81.1%) models, highlighting its superior performance in audio classification.
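The denoising objective described above (reconstruct audio patches and score them with mean squared error against the real patches) can be illustrated independently of the transformer itself. In this sketch the patch size, the additive Gaussian corruption, and the `reconstruct` callable (a hypothetical stand-in for Deformer's encoder and prior decoder) are all assumptions; the abstract does not specify how the patches are corrupted.

```python
import numpy as np


def patchify(spec: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split a (freq, time) spectrogram into flattened, non-overlapping patch x patch tiles."""
    f, t = spec.shape
    f, t = f - f % patch, t - t % patch  # drop ragged edges
    tiles = spec[:f, :t].reshape(f // patch, patch, t // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


def denoising_loss(patches, reconstruct, noise_std=0.1, rng=None) -> float:
    """MSE between the reconstruction of noisy patches and the clean patches.

    `reconstruct` is a hypothetical stand-in for encoder + decoder; it only sees the noisy input.
    """
    if rng is None:
        rng = np.random.default_rng()
    noisy = patches + noise_std * rng.standard_normal(patches.shape)
    recon = reconstruct(noisy)
    return float(np.mean((recon - patches) ** 2))
```

With an identity "model", the loss is roughly `noise_std ** 2`; a useful denoiser should drive it lower. Unlike a contrastive objective such as MoCo, this loss needs no negative samples.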

19 pages, 20258 KiB  
Article
Design of a Semantic Understanding System for Optical Staff Symbols
by Fengbin Lou, Yaling Lu and Guangyu Wang
Appl. Sci. 2023, 13(23), 12627; https://doi.org/10.3390/app132312627 - 23 Nov 2023
Viewed by 718
Abstract
Symbolic semantic understanding of staff images is an important technological support for achieving "intelligent score flipping". Due to the complex composition of staff symbols and the strong semantic correlation between symbol spaces, it is difficult to understand the pitch and duration of each note when the staff is performed. In this paper, we design a semantic understanding system for optical staff symbols. The system uses YOLOv5 to implement the low-level semantic understanding stage of the optical staff, which understands the pitch and duration in natural scales and other symbols that affect the pitch and duration. The proposed note encoding reconstruction algorithm is used to implement the high-level semantic understanding stage. This algorithm understands the logical, spatial, and temporal relationships between natural scales and other symbols based on music theory and outputs digital codes for the pitch and duration of the main notes during performances. The model is trained with a self-constructed SUSN dataset. Experimental results with YOLOv5 show a precision of 0.989 and a recall of 0.972. The system's error rate is 0.031, and the omission rate is 0.021. The paper concludes by analyzing the causes of semantic understanding errors and offers recommendations for further research. The results of this paper provide a method for multimodal music artificial intelligence applications such as notation recognition through listening, intelligent score flipping, and automatic performance.
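For object detectors such as YOLOv5, precision and recall are typically obtained by matching predicted boxes to ground-truth boxes. The sketch below is not the YOLOv5 evaluation code; it assumes greedy one-to-one matching in prediction order, a fixed IoU threshold of 0.5, and exact label agreement, and the symbol labels in the usage line are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[str, Box]              # (symbol label, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_precision_recall(
    preds: List[Detection], truths: List[Detection], thr: float = 0.5
) -> Tuple[float, float]:
    """Greedily match each prediction to the best unmatched same-label truth box."""
    matched = set()
    tp = 0
    for label, box in preds:  # real evaluators sort predictions by confidence first
        best, best_iou = None, thr
        for j, (t_label, t_box) in enumerate(truths):
            if j in matched or t_label != label:
                continue
            overlap = iou(box, t_box)
            if overlap >= best_iou:
                best, best_iou = j, overlap
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(preds) if preds else 0.0   # tp / (tp + fp)
    recall = tp / len(truths) if truths else 0.0    # tp / (tp + fn)
    return precision, recall
```

For example, `detection_precision_recall([("quarter", (1, 1, 10, 10))], [("quarter", (0, 0, 10, 10))])` counts one true positive, because the IoU is 0.81.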
