Digital Audio and Image Processing with Focus on Music Research

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics and Vibrations".

Deadline for manuscript submissions: closed (31 July 2018) | Viewed by 91879

Special Issue Editors

ATIC Research Group, Universidad de Málaga, 29007 Málaga, Spain
Interests: serious games; digital audio and image processing; pattern analysis and recognition; applications of signal processing techniques and methods
Application of Information and Communication Technologies (ATIC) Research Group, ETSI Telecomunicación, Campus Universitario de Teatinos s/n, 29071 Malaga, Spain
Interests: music information retrieval; audio signal processing; machine learning; musical acoustics; serious games; EEG signal processing; multimedia applications

Special Issue Information

Dear Colleagues,

Nowadays, massive digital data collections are within our reach, and technology is essential for getting the most benefit from them. A very significant share of these data corresponds to musical content in different forms, such as audio, images (scores or other images related to music and music representations) and textual data, among others.

Music has accompanied human beings throughout all epochs of our history, and is essential in our lives: billions of people enjoy music; our mood is affected by the music we listen to and, at the same time, the type of music we choose depends on our mood and the moment in which we enjoy it; learning, playing and composing music fills many hours of professional and amateur musicians; music is used as therapy for various kinds of diseases; and music is of key importance in creating immersive experiences for different purposes.

Hence, a deeper understanding of music, in the most general sense, benefits a very wide range of aspects of our lives.

This Special Issue is aimed at providing the research community with novel algorithms, methods, developments, applications and understanding of music in the widest sense. Thus, among others, the following concepts are covered:

  • Music signal processing techniques: automatic transcription, source separation, optical music recognition, etc.
  • Musical tools and methods for new personal and immersive experiences.
  • Music signal processing in relation to brain activity analysis and mood.
  • Music creation and entertainment, and the synergies arising among individuals connected through musical social media.
  • Music learning with new technology-driven methods.
  • Music tools based on data mining, machine learning, deep learning and big data.
  • Ubiquitous interaction with musical content in multiplatform scenarios.

The manuscripts in this Applied Sciences Special Issue on “Digital Audio and Image Processing with Focus on Music Research” will foster growing interest in all kinds of music-related tasks, towards a better technological, cultural, and humanistic future.

Prof. Dr. Lorenzo J. Tardón
Prof. Dr. Isabel Barbancho
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Music Information Retrieval (MIR)
  • Music transcription
  • Musical therapy
  • Music learning
  • Musical interaction
  • Music creation
  • Musical environment
  • Music and the brain

Published Papers (13 papers)


Research

17 pages, 1516 KiB  
Article
Joint Detection and Classification of Singing Voice Melody Using Convolutional Recurrent Neural Networks
by Sangeun Kum and Juhan Nam
Appl. Sci. 2019, 9(7), 1324; https://doi.org/10.3390/app9071324 - 29 Mar 2019
Cited by 51 | Viewed by 6432
Abstract
Singing melody extraction essentially involves two tasks: one is detecting the activity of a singing voice in polyphonic music, and the other is estimating the pitch of a singing voice in the detected voiced segments. In this paper, we present a joint detection and classification (JDC) network that conducts the singing voice detection and the pitch estimation simultaneously. The JDC network is composed of the main network that predicts the pitch contours of the singing melody and an auxiliary network that facilitates the detection of the singing voice. The main network is built with a convolutional recurrent neural network with residual connections and predicts pitch labels that cover the vocal range with a high resolution, as well as non-voice status. The auxiliary network is trained to detect the singing voice using multi-level features shared from the main network. The two optimization processes are tied with a joint melody loss function. We evaluate the proposed model on multiple melody extraction and vocal detection datasets, including cross-dataset evaluation. The experiments demonstrate how the auxiliary network and the joint melody loss function improve the melody extraction performance. Furthermore, the results show that our method outperforms state-of-the-art algorithms on the datasets. Full article
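
As an illustration of the kind of architecture described in this abstract, below is a minimal PyTorch sketch of a CRNN with a pitch head and an auxiliary voicing head tied by a joint loss. The layer sizes, the number of pitch bins and the 0.5 voicing weight are placeholder assumptions for the sketch, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class JointMelodyNet(nn.Module):
    """Toy CRNN with a pitch head (main task) and a voicing head (auxiliary task)."""
    def __init__(self, n_pitch_bins=721, n_mels=128):
        super().__init__()
        # convolutional front end over (batch, 1, time, mel) spectrogram excerpts
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),                 # pool along frequency only
        )
        rnn_in = 32 * (n_mels // 4)
        self.rnn = nn.GRU(rnn_in, 128, batch_first=True, bidirectional=True)
        self.pitch_head = nn.Linear(256, n_pitch_bins + 1)  # pitch bins + "non-voice"
        self.voice_head = nn.Linear(256, 2)                  # voiced / unvoiced

    def forward(self, spec):                      # spec: (batch, 1, time, mel)
        h = self.conv(spec)                       # (batch, 32, time, mel // 4)
        b, c, t, f = h.shape
        h = h.permute(0, 2, 1, 3).reshape(b, t, c * f)
        h, _ = self.rnn(h)                        # (batch, time, 256)
        return self.pitch_head(h), self.voice_head(h)

def joint_loss(pitch_logits, voice_logits, pitch_targets, voice_targets, alpha=0.5):
    """Tie the two tasks together: melody loss plus a weighted voicing loss."""
    ce = nn.CrossEntropyLoss()
    loss_pitch = ce(pitch_logits.flatten(0, 1), pitch_targets.flatten())
    loss_voice = ce(voice_logits.flatten(0, 1), voice_targets.flatten())
    return loss_pitch + alpha * loss_voice
```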

17 pages, 7279 KiB  
Article
Real-Time Musical Conducting Gesture Recognition Based on a Dynamic Time Warping Classifier Using a Single-Depth Camera
by Fahn Chin-Shyurng, Shih-En Lee and Meng-Luen Wu
Appl. Sci. 2019, 9(3), 528; https://doi.org/10.3390/app9030528 - 04 Feb 2019
Cited by 22 | Viewed by 9846
Abstract
Gesture recognition is a human–computer interaction method, which is widely used for educational, medical, and entertainment purposes. Humans also use gestures to communicate with each other, and musical conducting uses gestures in this way. In musical conducting, conductors wave their hands to control the speed and strength of the music played. However, beginners may have a limited comprehension of the gestures and might not be able to properly follow the ensembles. Therefore, this paper proposes a real-time musical conducting gesture recognition system to help music players improve their performance. We used a single-depth camera to capture image inputs and establish a real-time dynamic gesture recognition system. The Kinect software development kit created a skeleton model by capturing the palm position. Different palm gestures were collected to develop training templates for musical conducting. The dynamic time warping algorithm was applied to recognize the different conducting gestures at various conducting speeds, thereby achieving real-time dynamic musical conducting gesture recognition. In the experiment, we used 5600 examples of three basic types of musical conducting gestures, including seven capturing angles and five performing speeds for evaluation. The experimental results showed that the average accuracy was 89.17% at 30 frames per second. Full article
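
For readers unfamiliar with the classifier used here, a bare-bones dynamic time warping distance and nearest-template decision can be sketched as follows. The palm-trajectory features and the dictionary-of-templates layout are assumptions for illustration; the Kinect capture and skeleton model are not shown.

```python
import numpy as np

def dtw_distance(query, template):
    """Classic dynamic time warping cost between two trajectories,
    each an array of shape (n_frames, n_dims), e.g. palm x/y positions."""
    n, m = len(query), len(template)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(query[i - 1] - template[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def classify_gesture(query, templates):
    """templates: dict mapping gesture label -> list of template trajectories.
    Label the query with the class of its nearest template under DTW."""
    return min(templates, key=lambda lbl: min(dtw_distance(query, t) for t in templates[lbl]))
```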

11 pages, 432 KiB  
Article
Creative Chord Sequence Generation for Electronic Dance Music
by Darrell Conklin, Martin Gasser and Stefan Oertl
Appl. Sci. 2018, 8(9), 1704; https://doi.org/10.3390/app8091704 - 19 Sep 2018
Cited by 5 | Viewed by 5019
Abstract
This paper describes the theory and implementation of a digital audio workstation plug-in for chord sequence generation. The plug-in is intended to encourage and inspire a composer of electronic dance music to explore loops through chord sequence pattern definition, position locking and generation into unlocked positions. A basic cyclic first-order statistical model is extended with latent diatonicity variables which permits sequences to depart from a specified key. Degrees of diatonicity of generated sequences can be explored and parameters for voicing the sequences can be manipulated. Feedback on the concepts, interface, and usability was given by a small focus group of musicians and music producers. Full article
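
A toy illustration of cyclic first-order generation with position locking is sketched below. The chord vocabulary and transition probabilities are invented placeholders; the latent diatonicity variables and voicing parameters of the actual plug-in are not modelled.

```python
import numpy as np

# toy chord vocabulary and first-order transition matrix (rows sum to 1);
# the real plug-in learns these statistics and adds latent diatonicity variables
CHORDS = ["Am", "F", "C", "G"]
TRANS = np.array([[0.1, 0.4, 0.3, 0.2],
                  [0.3, 0.1, 0.4, 0.2],
                  [0.2, 0.3, 0.1, 0.4],
                  [0.4, 0.2, 0.3, 0.1]])

def generate_loop(length=8, locked=None, rng=np.random.default_rng(0)):
    """Fill the unlocked positions of a cyclic chord loop, one pass left to right.
    `locked` maps position -> chord index fixed by the user; the loop is cyclic,
    so the last position (if locked) conditions the first."""
    locked = locked or {}
    seq = [None] * length
    for pos, idx in locked.items():
        seq[pos] = idx
    prev = seq[-1] if seq[-1] is not None else int(rng.integers(len(CHORDS)))
    for pos in range(length):
        if seq[pos] is None:                      # only generate into unlocked slots
            seq[pos] = int(rng.choice(len(CHORDS), p=TRANS[prev]))
        prev = seq[pos]
    return [CHORDS[i] for i in seq]

print(generate_loop(locked={0: 2, 4: 3}))         # lock bar 1 to C and bar 5 to G
```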

15 pages, 3753 KiB  
Article
Melody Extraction Using Chroma-Level Note Tracking and Pitch Mapping
by Weiwei Zhang, Zhe Chen and Fuliang Yin
Appl. Sci. 2018, 8(9), 1618; https://doi.org/10.3390/app8091618 - 11 Sep 2018
Cited by 2 | Viewed by 3549
Abstract
A new architecture for melody extraction from polyphonic music is explored in this paper. Specifically, chromagrams are first constructed through the harmonic pitch class profile (HPCP) to measure the salience of melody, and chroma-level notes are tracked by dynamic programming. Then, note detection is performed according to chroma-level note differences between adjacent frames. Next, note pitches are coarsely mapped by maximizing the salience of each note, followed by a fine tuning to fit the dynamic variation within each note. Finally, voicing detection is carried out to determine the presence of melody according to the salience of fine-tuned notes. Note-level pitch mapping and fine tuning avoid pitch shifting between different octaves or notes within one note duration. Several experiments have been conducted to evaluate the performance of the proposed method. The experimental results show that the proposed method can track the dynamic pitch changes within each note, and performs well at different signal-to-accompaniment ratios. However, its performance for deep vibratos and pitch glides still needs to be improved. Full article
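
The chroma-level note tracking step can be illustrated with a small Viterbi-style dynamic program over a per-frame chroma salience matrix. The HPCP computation, note detection and pitch-mapping/fine-tuning stages are omitted, and the switch penalty is an arbitrary value chosen for the sketch.

```python
import numpy as np

def track_chroma_notes(salience, switch_penalty=0.4):
    """Dynamic-programming path through a (n_frames, 12) chroma salience matrix:
    maximise accumulated salience while penalising chroma changes between frames."""
    n_frames, n_chroma = salience.shape
    score = np.zeros_like(salience, dtype=float)
    back = np.zeros((n_frames, n_chroma), dtype=int)
    score[0] = salience[0]
    for t in range(1, n_frames):
        # cost of arriving at each chroma from every chroma of the previous frame
        trans = score[t - 1][:, None] - switch_penalty * (1 - np.eye(n_chroma))
        back[t] = np.argmax(trans, axis=0)
        score[t] = salience[t] + np.max(trans, axis=0)
    path = [int(np.argmax(score[-1]))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return np.array(path[::-1])                   # one chroma index per frame
```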

21 pages, 5551 KiB  
Article
A Baseline for General Music Object Detection with Deep Learning
by Alexander Pacha, Jan Hajič, Jr. and Jorge Calvo-Zaragoza
Appl. Sci. 2018, 8(9), 1488; https://doi.org/10.3390/app8091488 - 29 Aug 2018
Cited by 37 | Viewed by 11472
Abstract
Deep learning is bringing breakthroughs to many computer vision subfields including Optical Music Recognition (OMR), which has seen a series of improvements to musical symbol detection achieved by using generic deep learning models. However, so far, each such proposal has been based on a specific dataset and different evaluation criteria, which made it difficult to quantify the new deep learning-based state of the art and assess the relative merits of these detection models on music scores. In this paper, a baseline for general detection of musical symbols with deep learning is presented. We consider three datasets of heterogeneous typology but with the same annotation format, three neural models of different nature, and establish their performance in terms of a common evaluation standard. The experimental results confirm that direct music object detection with deep learning is indeed promising, but at the same time illustrate some of the domain-specific shortcomings of the general detectors. A qualitative comparison then suggests avenues for OMR improvement, based both on properties of the detection model and on how the datasets are defined. To the best of our knowledge, this is the first time that competing music object detection systems from the machine learning paradigm are directly compared to each other. We hope that this work will serve as a reference to measure the progress of future developments of OMR in music object detection. Full article

13 pages, 1359 KiB  
Article
A Robust Cover Song Identification System with Two-Level Similarity Fusion and Post-Processing
by Mingyu Li and Ning Chen
Appl. Sci. 2018, 8(8), 1383; https://doi.org/10.3390/app8081383 - 16 Aug 2018
Cited by 4 | Viewed by 4679
Abstract
Similarity measurement plays an important role in various information retrieval tasks. In this paper, a music information retrieval scheme based on two-level similarity fusion and post-processing is proposed. At the similarity fusion level, to take full advantage of the common and complementary properties among different descriptors and different similarity functions, first, the track-by-track similarity graphs generated from the same descriptor but different similarity functions are fused with the similarity network fusion (SNF) technique. Then, the obtained first-level fused similarities based on different descriptors are further fused with the mixture Markov model (MMM) technique. At the post-processing level, diffusion is first performed on the two-level fused similarity graph to utilize the underlying track manifold contained within it. Then, a mutual proximity (MP) algorithm is adopted to refine the diffused similarity scores, which helps to reduce the bad influence caused by the “hubness” phenomenon contained in the scores. The performance of the proposed scheme is tested in the cover song identification (CSI) task on three cover song datasets (Covers80, Covers40, and Second Hand Songs (SHS)). The experimental results demonstrate that the proposed scheme outperforms state-of-the-art CSI schemes based on single similarity or similarity fusion. Full article
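
The hubness-reducing step mentioned at the post-processing level can be illustrated with an empirical mutual proximity rescaling of a distance matrix; this is a generic sketch of the idea, not the authors' implementation, and the SNF, MMM fusion and diffusion stages are not shown.

```python
import numpy as np

def mutual_proximity(dist):
    """Empirical mutual proximity: rescale a (n, n) distance matrix so that two
    tracks count as close only if each lies among the other's nearest neighbours.
    This suppresses 'hub' tracks that appear spuriously close to everything."""
    n = dist.shape[0]
    mp = np.zeros_like(dist, dtype=float)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # fraction of tracks farther from i than j is AND farther from j than i is
            farther = np.sum((dist[i] > dist[i, j]) & (dist[j] > dist[j, i]))
            mp[i, j] = farther / (n - 2)
    np.fill_diagonal(mp, 1.0)
    return 1.0 - mp                               # back to a distance-like score
```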

15 pages, 3628 KiB  
Article
Applying Acoustical and Musicological Analysis to Detect Brain Responses to Realistic Music: A Case Study
by Niels Trusbak Haumann, Marina Kliuchko, Peter Vuust and Elvira Brattico
Appl. Sci. 2018, 8(5), 716; https://doi.org/10.3390/app8050716 - 04 May 2018
Cited by 8 | Viewed by 4823
Abstract
Music information retrieval (MIR) methods offer interesting possibilities for automatically identifying time points in music recordings that relate to specific brain responses. However, how the acoustical features and the novelty of the music structure affect the brain response is not yet clear. In the present study, we tested a new method for automatically identifying time points of brain responses based on MIR analysis. We utilized an existing database including brain recordings of 48 healthy listeners measured with electroencephalography (EEG) and magnetoencephalography (MEG). While we succeeded in capturing brain responses related to acoustical changes in the modern tango piece Adios Nonino, we obtained less reliable brain responses with a metal rock piece and a modern symphony orchestra musical composition. However, brain responses might also relate to the novelty of the music structure. Hence, we added a manual musicological analysis of novelty in the musical structure to the computational acoustic analysis, obtaining strong brain responses even to the rock and modern pieces. Although no standardized method yet exists, these preliminary results suggest that analysis of novelty in music is an important aid to MIR analysis for investigating brain responses to realistic music. Full article

21 pages, 4386 KiB  
Article
Deep Neural Networks for Document Processing of Music Score Images
by Jorge Calvo-Zaragoza, Francisco J. Castellanos, Gabriel Vigliensoni and Ichiro Fujinaga
Appl. Sci. 2018, 8(5), 654; https://doi.org/10.3390/app8050654 - 24 Apr 2018
Cited by 31 | Viewed by 6095
Abstract
There is an increasing interest in the automatic digitization of medieval music documents. Despite efforts in this field, the detection of the different layers of information in these documents still poses difficulties. The use of deep neural network techniques has yielded outstanding results in many areas related to computer vision. Consequently, in this paper, we study the so-called Convolutional Neural Networks (CNN) for performing the automatic document processing of music score images. This process is focused on layering the image into its constituent parts (namely, background, staff lines, music notes, and text) by training a classifier with examples of these parts. A comprehensive experimentation in terms of the configuration of the networks was carried out, which illustrates interesting results regarding both the efficiency and effectiveness of these models. In addition, a cross-manuscript adaptation experiment is presented in which the networks are evaluated on a different manuscript from the one on which they were trained. The results suggest that the CNN is capable of adapting its knowledge, and so starting from a pre-trained CNN reduces (or eliminates) the need for new labeled data. Full article
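
A minimal sketch of the kind of patch-wise CNN classifier described here is given below, assuming greyscale 28×28 patches and four layer classes; the architecture and patch size are illustrative choices, not the configurations studied in the paper.

```python
import torch
import torch.nn as nn

class PatchClassifier(nn.Module):
    """Toy CNN that labels a small greyscale patch as background, staff line,
    note or text (4 classes); the 28x28 patch size is arbitrary for the sketch."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, patches):                   # patches: (batch, 1, 28, 28)
        return self.net(patches)

# labelling a page then reduces to classifying the patch around every pixel
model = PatchClassifier()
logits = model(torch.rand(8, 1, 28, 28))
labels = logits.argmax(dim=1)                     # one layer label per patch
```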

23 pages, 3681 KiB  
Article
End-to-End Neural Optical Music Recognition of Monophonic Scores
by Jorge Calvo-Zaragoza and David Rizo
Appl. Sci. 2018, 8(4), 606; https://doi.org/10.3390/app8040606 - 11 Apr 2018
Cited by 50 | Viewed by 17861
Abstract
Optical Music Recognition is a field of research that investigates how to computationally decode music notation from images. Despite the efforts made so far, there are hardly any complete solutions to the problem. In this work, we study the use of neural networks that work in an end-to-end manner. This is achieved by using a neural model that combines the capabilities of convolutional neural networks, which work on the input image, and recurrent neural networks, which deal with the sequential nature of the problem. Thanks to the use of the so-called Connectionist Temporal Classification loss function, these models can be directly trained from input images accompanied by their corresponding transcripts into music symbol sequences. We also present the Printed Music Scores dataset, containing more than 80,000 monodic single-staff real scores in common western notation, that is used to train and evaluate the neural approach. In our experiments, it is demonstrated that this formulation can be carried out successfully. Additionally, we study several considerations about the codification of the output musical sequences, the convergence and scalability of the neural models, as well as the ability of this approach to locate symbols in the input score. Full article
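
The CTC part of such an end-to-end pipeline can be sketched with PyTorch's nn.CTCLoss. The sequence length, batch size and symbol alphabet below are toy values, and the CNN+RNN stack that would produce the per-frame log-probabilities is replaced by a random tensor for illustration.

```python
import torch
import torch.nn as nn

# toy setup: 100 time steps of recurrent output, batch of 2 staves,
# an alphabet of 80 music symbols plus the CTC blank at index 0
T, N, C = 100, 2, 81
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)  # (time, batch, classes)
targets = torch.randint(1, C, (N, 20), dtype=torch.long)   # symbol indices per staff
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()   # gradients would flow back into the CNN+RNN that produced log_probs
```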

19 pages, 34192 KiB  
Article
A Novel Tempogram Generating Algorithm Based on Matching Pursuit
by Wenming Gui, Yao Sun, Yuting Tao, Yanping Li, Lun Meng and Jinglan Zhang
Appl. Sci. 2018, 8(4), 561; https://doi.org/10.3390/app8040561 - 04 Apr 2018
Viewed by 3724
Abstract
Tempogram is one of the most useful representations for tempo, which has many applications, such as music tempo estimation, music structure analysis, music classification, and beat tracking. This paper presents a novel tempogram generating algorithm, which is based on matching pursuit. First, a tempo dictionary is designed in the light of the characteristics of tempo and note onset, then matching pursuit based on the tempo dictionary is executed on the resampled novelty curve, and finally the tempogram is created by assembling the coefficients of matching pursuit. The tempogram created by this algorithm has better resolution, stronger sparsity, and flexibility than those of the traditional algorithms. We demonstrate the properties of the algorithm through experiments and provide an application example for tempo estimation. Full article
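
A generic matching pursuit over a dictionary of tempo atoms, applied to one segment of a novelty curve, might look like the sketch below; the cosine atoms, sampling rate and iteration count are illustrative choices rather than the dictionary design proposed in the paper.

```python
import numpy as np

def tempo_dictionary(tempi_bpm, n, sr=100):
    """One unit-norm cosine atom per candidate tempo, for a novelty curve
    segment of n samples taken at `sr` Hz."""
    t = np.arange(n) / sr
    atoms = np.stack([np.cos(2 * np.pi * (bpm / 60.0) * t) for bpm in tempi_bpm])
    return atoms / np.linalg.norm(atoms, axis=1, keepdims=True)

def matching_pursuit(signal, atoms, n_iter=5):
    """Greedy matching pursuit: repeatedly pick the atom most correlated with the
    residual; the resulting coefficients form one column of the tempogram."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(len(atoms))
    for _ in range(n_iter):
        corr = atoms @ residual
        k = int(np.argmax(np.abs(corr)))
        coeffs[k] += corr[k]
        residual -= corr[k] * atoms[k]
    return coeffs
```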

18 pages, 2309 KiB  
Article
Assessment of Student Music Performances Using Deep Neural Networks
by Kumar Ashis Pati, Siddharth Gururani and Alexander Lerch
Appl. Sci. 2018, 8(4), 507; https://doi.org/10.3390/app8040507 - 27 Mar 2018
Cited by 30 | Viewed by 8312
Abstract
Music performance assessment is a highly subjective task often relying on experts to gauge both the technical and aesthetic aspects of the performance from the audio signal. This article explores the task of building computational models for music performance assessment, i.e., analyzing an audio recording of a performance and rating it along several criteria such as musicality, note accuracy, etc. Much of the earlier work in this area has been centered around using hand-crafted features intended to capture relevant aspects of a performance. However, such features are based on our limited understanding of music perception and may not be optimal. In this article, we propose using Deep Neural Networks (DNNs) for the task and compare their performance against a baseline model using standard and hand-crafted features. We show that, using input representations at different levels of abstraction, DNNs can outperform the baseline models across all assessment criteria. In addition, we use model analysis techniques to further explain the model predictions in an attempt to gain useful insights into the assessment process. The results demonstrate the potential of using supervised feature learning techniques to better characterize music performances. Full article

15 pages, 2277 KiB  
Article
Polyphonic Piano Transcription with a Note-Based Music Language Model
by Qi Wang, Ruohua Zhou and Yonghong Yan
Appl. Sci. 2018, 8(3), 470; https://doi.org/10.3390/app8030470 - 19 Mar 2018
Cited by 9 | Viewed by 4398
Abstract
This paper proposes a note-based music language model (MLM) for improving note-level polyphonic piano transcription. The MLM is based on the recurrent structure, which could model the temporal correlations between notes in music sequences. To combine the outputs of the note-based MLM and acoustic model directly, an integrated architecture is adopted in this paper. We also propose an inference algorithm, in which the note-based MLM is used to predict notes at the blank onsets in the thresholding transcription results. The experimental results show that the proposed inference algorithm improves the performance of note-level transcription. We also observe that the combination of the restricted Boltzmann machine (RBM) and recurrent structure outperforms a single recurrent neural network (RNN) or long short-term memory network (LSTM) in modeling the high-dimensional note sequences. Among all the MLMs, LSTM-RBM helps the system yield the best results on all evaluation metrics regardless of the performance of acoustic models. Full article
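
A toy recurrent note-based language model in PyTorch, predicting the binary note vector at the next onset from the preceding ones, is sketched below; the 88-note range, hidden size and binary cross-entropy objective are assumptions for illustration, and the RBM output layer of the LSTM-RBM variant is not shown.

```python
import torch
import torch.nn as nn

class NoteLanguageModel(nn.Module):
    """Toy recurrent MLM over 88-dimensional binary note vectors: given the notes
    sounding at previous onsets, predict which notes occur at the next onset."""
    def __init__(self, n_notes=88, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(n_notes, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_notes)

    def forward(self, notes):                     # notes: (batch, steps, 88) in {0, 1}
        h, _ = self.lstm(notes)
        return self.out(h)                        # logits for the notes at the next step

model = NoteLanguageModel()
seq = torch.bernoulli(torch.full((4, 16, 88), 0.05))   # random toy note sequences
logits = model(seq[:, :-1])                             # predict step t+1 from steps 1..t
loss = nn.BCEWithLogitsLoss()(logits, seq[:, 1:])
loss.backward()
```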

15 pages, 1174 KiB  
Article
Constraint-Based Time-Scale Modification of Music Recordings for Noise Beautification
by Meinard Müller, Helmut Hedwig, Frank Zalkow and Stefan Popescu
Appl. Sci. 2018, 8(3), 436; https://doi.org/10.3390/app8030436 - 14 Mar 2018
Viewed by 3478
Abstract
In magnetic resonance imaging (MRI), a patient is exposed to beat-like knocking sounds, often interrupted by periods of silence, which are caused by pulsing currents of the MRI scanner. In order to increase the patient’s comfort, one strategy is to play back ambient music to induce positive emotions and to reduce stress during the MRI scanning process. To create an overall acceptable acoustic environment, one idea is to adapt the music to the locally periodic acoustic MRI noise. Motivated by this scenario, we consider in this paper the general problem of adapting a given music recording to fulfill certain temporal constraints. More concretely, the constraints are given by a reference time axis with specified time points (e.g., the time positions of the MRI scanner’s knocking sounds). Then, the goal is to temporally modify a suitable music recording such that its beat positions align with the specified time points. As one technical contribution, we model this alignment task as an optimization problem with the objective to fulfill the constraints while avoiding strong local distortions in the music. Furthermore, we introduce an efficient algorithm based on dynamic programming for solving this task. Based on the computed alignment, we use existing time-scale modification procedures for locally adapting the music recording. To illustrate the outcome of our procedure, we discuss representative synthetic and real-world examples, which can be accessed via an interactive website. In particular, these examples indicate the potential of automated methods for noise beautification within the MRI application scenario. Full article
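
The beat-to-constraint assignment can be illustrated with a small dynamic program that picks one beat per reference time point while keeping the stretch factor of every segment in between close to 1. This is a simplified sketch under assumed monotonically increasing beat and reference times, not the optimization formulated in the paper, and the actual time-scale modification is not shown.

```python
import numpy as np

def assign_beats(beats, refs):
    """Order-preserving assignment of one beat to each reference time point,
    minimising the sum of |log stretch factor| over the segments in between.
    `beats` and `refs` are increasing arrays of times in seconds; the first
    anchor is free of cost in this sketch."""
    M, K = len(beats), len(refs)
    INF = float("inf")
    cost = np.full((K, M), INF)
    back = np.zeros((K, M), dtype=int)
    cost[0, :] = 0.0
    for k in range(1, K):
        for m in range(k, M):
            best, arg = INF, k - 1
            for p in range(k - 1, m):
                stretch = (refs[k] - refs[k - 1]) / (beats[m] - beats[p])
                c = cost[k - 1, p] + abs(np.log(max(stretch, 1e-9)))
                if c < best:
                    best, arg = c, p
            cost[k, m], back[k, m] = best, arg
    m = int(np.argmin(cost[K - 1]))
    anchors = [m]
    for k in range(K - 1, 0, -1):
        m = int(back[k, m])
        anchors.append(m)
    return [beats[i] for i in reversed(anchors)]   # one beat per reference point
```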
