Special Issue "Digital Audio and Image Processing with Focus on Music Research"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics and Vibrations".

Deadline for manuscript submissions: closed (31 July 2018)

Special Issue Editor

Guest Editor
Prof. Dr. Lorenzo J. Tardón

ATIC Research Group, Dept. Ingeniería de Comunicaciones, E.T.S.I. Telecomunicación, Universidad de Málaga, Andalucía Tech, Campus de Teatinos s/n, Málaga 29071, Spain
Interests: serious games; digital audio and image processing; pattern analysis and recognition; applications of signal processing techniques and methods

Special Issue Information

Dear Colleagues,

Nowadays, massive digital data collections are within our reach, and technology is essential to getting the most benefit from them. A very significant share of these data corresponds to musical content in different forms, such as audio, images (scores or other images related to music and its representations) and textual data, among others.

Music has accompanied human beings throughout all epochs of our history, and it is essential in our lives: billions of people enjoy music; our mood is affected by the music we listen to and, at the same time, the music we choose depends on our mood and on the moment; learning, playing and composing music fills many hours for professional and amateur musicians; music is used as therapy for various kinds of conditions; and music is of key importance in creating immersive experiences for different purposes.

Hence, a deeper understanding of music, in the most general sense, benefits a very wide range of aspects of our lives.

This Special Issue aims to provide the research community with novel algorithms, methods, developments, applications and a deeper understanding of music in the widest sense. Thus, the following topics, among others, are covered:

  • Music signal processing techniques: automatic transcription, source separation, optical music recognition, etc.
  • Musical tools and methods for new personal and immersive experiences.
  • Music signal processing in relation to brain activity analysis and mood.
  • Music creation and entertainment; and the synergies arising between individuals connected through musical social media.
  • Music learning with new technology-driven methods.
  • Music tools based on data mining, machine learning, deep learning and big data.
  • Ubiquitous interaction with musical content in multiplatform scenarios.

The manuscripts in this Applied Sciences Special Issue on “Digital Audio and Image Processing with Focus on Music Research” will foster growing interest in all kinds of music-related tasks, working towards a better technological, cultural, and humanistic future.

Prof. Dr. Lorenzo J. Tardón
Prof. Dr. Isabel Barbancho
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Music Information Retrieval (MIR)
  • Music transcription
  • Musical therapy
  • Music learning
  • Musical interaction
  • Music creation
  • Musical environment
  • Music and the brain

Published Papers (11 papers)


Research

Open Access Article: Creative Chord Sequence Generation for Electronic Dance Music
Appl. Sci. 2018, 8(9), 1704; https://doi.org/10.3390/app8091704
Received: 31 July 2018 / Revised: 16 September 2018 / Accepted: 17 September 2018 / Published: 19 September 2018
PDF Full-text (432 KB) | HTML Full-text | XML Full-text
Abstract
This paper describes the theory and implementation of a digital audio workstation plug-in for chord sequence generation. The plug-in is intended to encourage and inspire a composer of electronic dance music to explore loops through chord sequence pattern definition, position locking and generation into unlocked positions. A basic cyclic first-order statistical model is extended with latent diatonicity variables, which permit sequences to depart from a specified key. Degrees of diatonicity of generated sequences can be explored, and parameters for voicing the sequences can be manipulated. Feedback on the concepts, interface, and usability was given by a small focus group of musicians and music producers. Full article
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)
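As a rough illustration of the abstract above, a cyclic first-order chord model with position locking can be sketched as follows. The chord vocabulary, transition probabilities, and function names are illustrative assumptions, not taken from the paper, and the latent diatonicity extension is omitted:

```python
import random

# Hypothetical first-order transition probabilities between chord symbols
# (illustrative values only, not taken from the paper).
TRANSITIONS = {
    "Am": {"F": 0.5, "C": 0.3, "G": 0.2},
    "F":  {"C": 0.5, "G": 0.3, "Am": 0.2},
    "C":  {"G": 0.6, "Am": 0.4},
    "G":  {"Am": 0.7, "F": 0.3},
}

def generate_loop(length, locked, seed=0):
    """Fill the unlocked positions of a cyclic chord loop.

    `locked` maps position -> chord symbol; locked chords are kept
    verbatim, and each unlocked position is sampled from the transition
    row of the chord at the previous position (the loop is cyclic, so
    position 0 follows the last position once it is known)."""
    rng = random.Random(seed)
    seq = [locked.get(i) for i in range(length)]
    for i in range(length):
        if seq[i] is None:
            prev = seq[i - 1] if seq[i - 1] is not None else "Am"
            chords, probs = zip(*TRANSITIONS[prev].items())
            seq[i] = rng.choices(chords, weights=probs)[0]
    return seq

loop = generate_loop(8, {0: "Am", 4: "F"})  # positions 0 and 4 stay locked
```

Regenerating with different seeds while keeping `locked` fixed mimics the workflow the abstract describes: exploring loop variations around the positions the composer has pinned down.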

Open Access Article: Melody Extraction Using Chroma-Level Note Tracking and Pitch Mapping
Appl. Sci. 2018, 8(9), 1618; https://doi.org/10.3390/app8091618
Received: 31 July 2018 / Revised: 8 September 2018 / Accepted: 10 September 2018 / Published: 11 September 2018
PDF Full-text (3753 KB) | HTML Full-text | XML Full-text
Abstract
A new architecture for melody extraction from polyphonic music is explored in this paper. Specifically, chromagrams are first constructed through the harmonic pitch class profile (HPCP) to measure the salience of melody, and chroma-level notes are tracked by dynamic programming. Then, note detection is performed according to chroma-level note differences between adjacent frames. Next, note pitches are coarsely mapped by maximizing the salience of each note, followed by a fine tuning to fit the dynamic variation within each note. Finally, voicing detection is carried out to determine the presence of melody according to the salience of fine-tuned notes. Note-level pitch mapping and fine tuning avoid pitch shifting between different octaves or notes within a single note duration. Several experiments have been conducted to evaluate the performance of the proposed method. The experimental results show that the proposed method can track the dynamic pitch changing within each note, and performs well at different signal-to-accompaniment ratios. However, its performance for deep vibratos and pitch glides still needs to be improved. Full article
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)
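The chroma-level note tracking by dynamic programming mentioned in the abstract above can be illustrated with a generic Viterbi-style sketch. The toy 3-bin salience values and the jump penalty are assumptions for illustration only; the paper's actual salience is derived from the HPCP chromagram:

```python
def track_notes(salience, jump_penalty=0.5):
    """Pick one chroma bin per frame, maximizing total salience minus a
    penalty for jumping between bins: a simplified stand-in for
    chroma-level note tracking by dynamic programming.

    `salience[t][p]` is the salience of chroma bin p at frame t."""
    n_frames, n_bins = len(salience), len(salience[0])
    score = [list(salience[0])]   # best cumulative score ending in bin p
    back = []                     # backpointers per frame
    for t in range(1, n_frames):
        row, brow = [], []
        for p in range(n_bins):
            best_prev = max(
                range(n_bins),
                key=lambda q: score[-1][q] - (jump_penalty if q != p else 0.0),
            )
            row.append(salience[t][p] + score[-1][best_prev]
                       - (jump_penalty if best_prev != p else 0.0))
            brow.append(best_prev)
        score.append(row)
        back.append(brow)
    # Backtrace from the best final bin.
    path = [max(range(n_bins), key=lambda p: score[-1][p])]
    for brow in reversed(back):
        path.append(brow[path[-1]])
    return list(reversed(path))

# Toy salience with three chroma bins over three frames: bin 2 dominates.
path = track_notes([[0.0, 0.0, 1.0],
                    [0.0, 0.2, 1.0],
                    [0.0, 0.0, 1.0]])
```

The jump penalty is what makes the tracked line temporally smooth rather than simply the per-frame salience maximum.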

Open Access Article: A Baseline for General Music Object Detection with Deep Learning
Appl. Sci. 2018, 8(9), 1488; https://doi.org/10.3390/app8091488
Received: 31 July 2018 / Revised: 23 August 2018 / Accepted: 26 August 2018 / Published: 29 August 2018
PDF Full-text (5551 KB) | HTML Full-text | XML Full-text
Abstract
Deep learning is bringing breakthroughs to many computer vision subfields, including Optical Music Recognition (OMR), which has seen a series of improvements to musical symbol detection achieved by using generic deep learning models. However, so far, each such proposal has been based on a specific dataset and different evaluation criteria, which has made it difficult to quantify the new deep learning-based state of the art and assess the relative merits of these detection models on music scores. In this paper, a baseline for general detection of musical symbols with deep learning is presented. We consider three datasets of heterogeneous typology but with the same annotation format, and three neural models of different nature, and establish their performance in terms of a common evaluation standard. The experimental results confirm that direct music object detection with deep learning is indeed promising, but at the same time illustrate some of the domain-specific shortcomings of general detectors. A qualitative comparison then suggests avenues for OMR improvement, based both on properties of the detection model and on how the datasets are defined. To the best of our knowledge, this is the first time that competing music object detection systems from the machine learning paradigm are directly compared to each other. We hope that this work will serve as a reference to measure the progress of future developments of OMR in music object detection. Full article
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)

Open Access Article: A Robust Cover Song Identification System with Two-Level Similarity Fusion and Post-Processing
Appl. Sci. 2018, 8(8), 1383; https://doi.org/10.3390/app8081383
Received: 23 July 2018 / Revised: 13 August 2018 / Accepted: 14 August 2018 / Published: 16 August 2018
PDF Full-text (1359 KB) | HTML Full-text | XML Full-text
Abstract
Similarity measurement plays an important role in various information retrieval tasks. In this paper, a music information retrieval scheme based on two-level similarity fusion and post-processing is proposed. At the similarity fusion level, to take full advantage of the common and complementary properties among different descriptors and different similarity functions, first, the track-by-track similarity graphs generated from the same descriptor but different similarity functions are fused with the similarity network fusion (SNF) technique. Then, the obtained first-level fused similarities based on different descriptors are further fused with the mixture Markov model (MMM) technique. At the post-processing level, diffusion is first performed on the two-level fused similarity graph to utilize the underlying track manifold contained within it. Then, a mutual proximity (MP) algorithm is adopted to refine the diffused similarity scores, which helps to reduce the negative influence of the “hubness” phenomenon present in the scores. The performance of the proposed scheme is tested in the cover song identification (CSI) task on three cover song datasets (Covers80, Covers40, and Second Hand Songs (SHS)). The experimental results demonstrate that the proposed scheme outperforms state-of-the-art CSI schemes based on single similarity or similarity fusion. Full article
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)
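The mutual proximity (MP) rescaling mentioned in the abstract above can be sketched in a simple empirical form. This is a generic sketch of the MP idea (mutual rank agreement between two items), not necessarily the exact formulation used in the paper:

```python
import numpy as np

def mutual_proximity(dist):
    """Empirical mutual proximity rescaling of a symmetric distance matrix.

    MP(x, y) = P(d(x, .) > d(x, y)) * P(d(y, .) > d(x, y)): both items must
    rank each other among their nearest neighbours for the score to be high,
    which counteracts 'hub' items that appear close to everything."""
    n = dist.shape[0]
    mp = np.zeros_like(dist, dtype=float)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p_i = np.mean(dist[i] > dist[i, j])  # fraction farther from i than j
            p_j = np.mean(dist[j] > dist[i, j])  # fraction farther from j than i
            mp[i, j] = p_i * p_j
    return mp

# Toy distance matrix: tracks 0 and 1 are mutually close, track 2 is far.
D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 4.0],
              [4.0, 4.0, 0.0]])
mp = mutual_proximity(D)
```

Note that MP turns a distance into a similarity-like score: larger values indicate mutually close pairs.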

Open Access Feature Paper Article: Applying Acoustical and Musicological Analysis to Detect Brain Responses to Realistic Music: A Case Study
Appl. Sci. 2018, 8(5), 716; https://doi.org/10.3390/app8050716
Received: 19 March 2018 / Revised: 24 April 2018 / Accepted: 30 April 2018 / Published: 4 May 2018
PDF Full-text (3628 KB) | HTML Full-text | XML Full-text
Abstract
Music information retrieval (MIR) methods offer interesting possibilities for automatically identifying time points in music recordings that relate to specific brain responses. However, how the acoustical features and the novelty of the music structure affect the brain response is not yet clear. In the present study, we tested a new method for automatically identifying time points of brain responses based on MIR analysis. We utilized an existing database including brain recordings of 48 healthy listeners measured with electroencephalography (EEG) and magnetoencephalography (MEG). While we succeeded in capturing brain responses related to acoustical changes in the modern tango piece Adios Nonino, we obtained less reliable brain responses with a metal rock piece and a modern symphony orchestra musical composition. However, brain responses might also relate to the novelty of the music structure. Hence, we added a manual musicological analysis of novelty in the musical structure to the computational acoustic analysis, obtaining strong brain responses even to the rock and modern pieces. Although no standardized method yet exists, these preliminary results suggest that analysis of novelty in music is an important aid to MIR analysis for investigating brain responses to realistic music. Full article
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)

Open Access Feature Paper Article: Deep Neural Networks for Document Processing of Music Score Images
Appl. Sci. 2018, 8(5), 654; https://doi.org/10.3390/app8050654
Received: 28 February 2018 / Revised: 13 April 2018 / Accepted: 20 April 2018 / Published: 24 April 2018
Cited by 4 | PDF Full-text (4386 KB) | HTML Full-text | XML Full-text
Abstract
There is an increasing interest in the automatic digitization of medieval music documents. Despite efforts in this field, the detection of the different layers of information in these documents still poses difficulties. Deep neural network techniques have achieved outstanding results in many areas related to computer vision. Consequently, in this paper, we study so-called Convolutional Neural Networks (CNN) for performing the automatic document processing of music score images. This process is focused on layering the image into its constituent parts (namely, background, staff lines, music notes, and text) by training a classifier with examples of these parts. Comprehensive experimentation on the configuration of the networks was carried out, which shows interesting results with regard to both the efficiency and effectiveness of these models. In addition, a cross-manuscript adaptation experiment is presented, in which the networks are evaluated on a different manuscript from the one on which they were trained. The results suggest that the CNN is capable of adapting its knowledge, so starting from a pre-trained CNN reduces (or eliminates) the need for new labeled data. Full article
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)

Open Access Article: End-to-End Neural Optical Music Recognition of Monophonic Scores
Appl. Sci. 2018, 8(4), 606; https://doi.org/10.3390/app8040606
Received: 28 February 2018 / Revised: 22 March 2018 / Accepted: 8 April 2018 / Published: 11 April 2018
Cited by 2 | PDF Full-text (3681 KB) | HTML Full-text | XML Full-text
Abstract
Optical Music Recognition is a field of research that investigates how to computationally decode music notation from images. Despite the efforts made so far, there are hardly any complete solutions to the problem. In this work, we study the use of neural networks that work in an end-to-end manner. This is achieved by using a neural model that combines the capabilities of convolutional neural networks, which work on the input image, and recurrent neural networks, which deal with the sequential nature of the problem. Thanks to the use of the so-called Connectionist Temporal Classification loss function, these models can be directly trained from input images accompanied by their corresponding transcripts into music symbol sequences. We also present the Printed Music Scores dataset, containing more than 80,000 monodic single-staff real scores in common western notation, which is used to train and evaluate the neural approach. In our experiments, it is demonstrated that this formulation can be carried out successfully. Additionally, we study several considerations about the codification of the output musical sequences, the convergence and scalability of the neural models, as well as the ability of this approach to locate symbols in the input score. Full article
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)

Open Access Article: A Novel Tempogram Generating Algorithm Based on Matching Pursuit
Appl. Sci. 2018, 8(4), 561; https://doi.org/10.3390/app8040561
Received: 14 February 2018 / Revised: 26 March 2018 / Accepted: 3 April 2018 / Published: 4 April 2018
PDF Full-text (34192 KB) | HTML Full-text | XML Full-text
Abstract
The tempogram is one of the most useful representations of tempo and has many applications, such as music tempo estimation, music structure analysis, music classification, and beat tracking. This paper presents a novel tempogram generating algorithm based on matching pursuit. First, a tempo dictionary is designed in light of the characteristics of tempo and note onsets; then, matching pursuit based on the tempo dictionary is executed on the resampled novelty curve; and finally, the tempogram is created by assembling the coefficients of matching pursuit. The tempogram created by this algorithm has better resolution, stronger sparsity, and greater flexibility than those created by traditional algorithms. We demonstrate the properties of the algorithm through experiments and provide an application example for tempo estimation. Full article
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)
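The core matching-pursuit step described in the abstract above can be sketched generically. The sinusoidal "tempo dictionary" below is a toy stand-in (the paper's actual dictionary is designed around tempo and note-onset characteristics), and the novelty curve is synthetic:

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy matching pursuit: repeatedly pick the dictionary atom most
    correlated with the residual and subtract its projection.

    `dictionary` holds unit-norm atoms as rows; returns the coefficient
    accumulated per atom and the final residual."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(len(dictionary))
    for _ in range(n_iter):
        corr = dictionary @ residual          # correlation with each atom
        k = int(np.argmax(np.abs(corr)))      # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * dictionary[k]   # remove its contribution
    return coeffs, residual

# Toy "tempo dictionary": unit-norm sinusoids at a few candidate rates.
t = np.linspace(0, 1, 200, endpoint=False)
atoms = np.array([np.sin(2 * np.pi * f * t) for f in (2, 3, 5)])
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)

# Synthetic novelty curve dominated by the 3 Hz atom.
novelty = 4.0 * atoms[1] + 0.1 * atoms[2]
coeffs, residual = matching_pursuit(novelty, atoms, n_iter=2)
```

Assembling `coeffs` over successive analysis windows, one column per window, would yield a sparse tempogram-like matrix in the spirit of the abstract.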

Open Access Article: Assessment of Student Music Performances Using Deep Neural Networks
Appl. Sci. 2018, 8(4), 507; https://doi.org/10.3390/app8040507
Received: 28 February 2018 / Revised: 19 March 2018 / Accepted: 22 March 2018 / Published: 27 March 2018
Cited by 1 | PDF Full-text (2309 KB) | HTML Full-text | XML Full-text
Abstract
Music performance assessment is a highly subjective task often relying on experts to gauge both the technical and aesthetic aspects of the performance from the audio signal. This article explores the task of building computational models for music performance assessment, i.e., analyzing an audio recording of a performance and rating it along several criteria such as musicality, note accuracy, etc. Much of the earlier work in this area has been centered around using hand-crafted features intended to capture relevant aspects of a performance. However, such features are based on our limited understanding of music perception and may not be optimal. In this article, we propose using Deep Neural Networks (DNNs) for the task and compare their performance against a baseline model using standard and hand-crafted features. We show that, using input representations at different levels of abstraction, DNNs can outperform the baseline models across all assessment criteria. In addition, we use model analysis techniques to further explain the model predictions in an attempt to gain useful insights into the assessment process. The results demonstrate the potential of using supervised feature learning techniques to better characterize music performances. Full article
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)

Open Access Article: Polyphonic Piano Transcription with a Note-Based Music Language Model
Appl. Sci. 2018, 8(3), 470; https://doi.org/10.3390/app8030470
Received: 18 January 2018 / Revised: 7 March 2018 / Accepted: 16 March 2018 / Published: 19 March 2018
PDF Full-text (2277 KB) | HTML Full-text | XML Full-text
Abstract
This paper proposes a note-based music language model (MLM) for improving note-level polyphonic piano transcription. The MLM is based on a recurrent structure, which can model the temporal correlations between notes in music sequences. To combine the outputs of the note-based MLM and the acoustic model directly, an integrated architecture is adopted in this paper. We also propose an inference algorithm, in which the note-based MLM is used to predict notes at blank onsets in the thresholded transcription results. The experimental results show that the proposed inference algorithm improves the performance of note-level transcription. We also observe that the combination of the restricted Boltzmann machine (RBM) and recurrent structure outperforms a single recurrent neural network (RNN) or long short-term memory network (LSTM) in modeling the high-dimensional note sequences. Among all the MLMs, LSTM-RBM helps the system yield the best results on all evaluation metrics regardless of the performance of acoustic models. Full article
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)

Open Access Article: Constraint-Based Time-Scale Modification of Music Recordings for Noise Beautification
Appl. Sci. 2018, 8(3), 436; https://doi.org/10.3390/app8030436
Received: 9 February 2018 / Revised: 27 February 2018 / Accepted: 7 March 2018 / Published: 14 March 2018
PDF Full-text (1174 KB) | HTML Full-text | XML Full-text
Abstract
In magnetic resonance imaging (MRI), a patient is exposed to beat-like knocking sounds, often interrupted by periods of silence, which are caused by pulsing currents of the MRI scanner. In order to increase the patient’s comfort, one strategy is to play back ambient music to induce positive emotions and to reduce stress during the MRI scanning process. To create an overall acceptable acoustic environment, one idea is to adapt the music to the locally periodic acoustic MRI noise. Motivated by this scenario, we consider in this paper the general problem of adapting a given music recording to fulfill certain temporal constraints. More concretely, the constraints are given by a reference time axis with specified time points (e.g., the time positions of the MRI scanner’s knocking sounds). Then, the goal is to temporally modify a suitable music recording such that its beat positions align with the specified time points. As one technical contribution, we model this alignment task as an optimization problem with the objective to fulfill the constraints while avoiding strong local distortions in the music. Furthermore, we introduce an efficient algorithm based on dynamic programming for solving this task. Based on the computed alignment, we use existing time-scale modification procedures for locally adapting the music recording. To illustrate the outcome of our procedure, we discuss representative synthetic and real-world examples, which can be accessed via an interactive website. In particular, these examples indicate the potential of automated methods for noise beautification within the MRI application scenario. Full article
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)
