Special Issue "Sound and Music Computing"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics".

Deadline for manuscript submissions: closed (3 November 2017)

Special Issue Editors

Guest Editor
Prof. Dr. Tapio Lokki

Department of Computer Science, Aalto University, Espoo 02150, Finland
Interests: virtual acoustics; spatial sound; psychoacoustics
Co-Guest Editor
Prof. Dr. Stefania Serafin

Department of Architecture, Design and Media Technology, Aalborg University, 2450 Copenhagen SV, Denmark
Interests: multimodal interfaces; sonic interaction design
Co-Guest Editor
Prof. Dr. Meinard Müller

International Audio Laboratories Erlangen, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen 91058, Germany
Interests: music information retrieval; music processing; audio signal processing
Co-Guest Editor
Prof. Dr. Vesa Välimäki

Department of Signal Processing and Acoustics, Aalto University, Espoo 02150, Finland
Interests: audio signal processing; sound synthesis

Special Issue Information

Dear Colleagues,

Sound and music computing is a young and highly multidisciplinary research field. It combines scientific, technological, and artistic methods to produce, model, and understand audio and sonic arts with the help of computers. Sound and music computing borrows methods, for example, from computer science, electrical engineering, mathematics, musicology, and psychology.

In this Special Issue, we want to address recent advances in the following topics:

• Analysis, synthesis, and modification of sound

• Automatic composition, accompaniment, and improvisation

• Computational musicology and mathematical music theory

• Computer-based music analysis

• Computer music languages and software

• High-performance computing for audio

• Interactive performance systems and new interfaces

• Multi-modal perception and emotion

• Music information retrieval

• Music games and educational tools

• Music performance analysis and rendering

• Robotics and music

• Room acoustics modeling and auralization

• Social interaction in sound and music computing

• Sonic interaction design

• Sonification

• Soundscapes and environmental arts

• Spatial sound

• Virtual reality applications and technologies for sound and music

Submissions are invited for both original research and review articles. Additionally, invited papers based on excellent contributions to recent conferences in this field, for example the 2017 Sound and Music Computing Conference (SMC-17), will be included in this Special Issue. We hope that this collection of papers will serve as an inspiration for those interested in sound and music computing.

Prof. Dr. Tapio Lokki,
Prof. Dr. Stefania Serafin,
Prof. Dr. Meinard Müller,
Prof. Dr. Vesa Välimäki
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1200 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • audio signal processing

  • computer interfaces

  • computer music

  • multimedia

  • music cognition

  • music control and performance

  • music information retrieval

  • music technology

  • sonic interaction design

  • virtual reality

Published Papers (12 papers)


Research

Open Access Feature Paper Article: Automatic Transcription of Polyphonic Vocal Music
Appl. Sci. 2017, 7(12), 1285; doi:10.3390/app7121285
Received: 31 October 2017 / Revised: 1 December 2017 / Accepted: 4 December 2017 / Published: 11 December 2017
Abstract
This paper presents a method for automatic music transcription applied to audio recordings of a cappella performances with multiple singers. We propose a system for multi-pitch detection and voice assignment that integrates an acoustic and a music language model. The acoustic model performs spectrogram decomposition, extending probabilistic latent component analysis (PLCA) using a six-dimensional dictionary with pre-extracted log-spectral templates. The music language model performs voice separation and assignment using hidden Markov models that apply musicological assumptions. By integrating the two models, the system is able to detect multiple concurrent pitches in polyphonic vocal music and assign each detected pitch to a specific voice type such as soprano, alto, tenor or bass (SATB). We compare our system against multiple baselines, achieving state-of-the-art results for both multi-pitch detection and voice assignment on a dataset of Bach chorales and another of barbershop quartets. We also present an additional evaluation of our system using varied pitch tolerance levels to investigate its performance at 20-cent pitch resolution.
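
As a rough illustration of the spectrogram-decomposition step in the acoustic model, the sketch below runs a stripped-down, two-factor PLCA in which fixed spectral templates are given and only the per-frame activations are estimated. It is not the authors' six-dimensional formulation; all names, shapes, and iteration counts are illustrative assumptions.

```python
import numpy as np

def plca_activations(V, W, n_iter=50, eps=1e-12):
    """Estimate activations H (Z x T) for fixed spectral templates W (F x Z)
    so that V (F x T) ~ W @ H, using the EM updates of a two-factor PLCA.
    Only a toy reduction of the dictionary-based model described above."""
    W = W / (W.sum(axis=0, keepdims=True) + eps)       # normalise templates: P(f|z)
    Z, T = W.shape[1], V.shape[1]
    H = np.full((Z, T), 1.0 / Z)                        # P(z|t), uniform initialisation
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)                       # V(f,t) / model(f,t)
        H *= W.T @ ratio                                # posterior-weighted counts
        H /= H.sum(axis=0, keepdims=True) + eps         # renormalise each frame
    return H

# toy usage: a random "spectrogram" decomposed onto four random templates
rng = np.random.default_rng(0)
H = plca_activations(rng.random((256, 100)), rng.random((256, 4)))
```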

Open Access Article: The Effects of Musical Experience and Hearing Loss on Solving an Audio-Based Gaming Task
Appl. Sci. 2017, 7(12), 1278; doi:10.3390/app7121278
Received: 23 October 2017 / Revised: 2 December 2017 / Accepted: 5 December 2017 / Published: 10 December 2017
Abstract
We conducted an experiment using a purposefully designed audio-based game called the Music Puzzle with Japanese university students with different levels of hearing acuity and experience with music, in order to determine the effects of these factors on solving such games. A group of hearing-impaired students (n = 12) was compared with two hearing control groups characterized by high (n = 12) or low (n = 12) engagement in musical activities. The game was played with three sound sets or modes: speech, music, and a mix of the two. The results showed that people with hearing loss had longer processing times for sounds when playing the game. Solving the game task in the speech mode was found to be particularly difficult for the group with hearing loss, and while they found the game difficult in general, they expressed a fondness for the game and a preference for music. Participants with less musical experience showed difficulties in playing the game with musical material. We were able to explain the impacts of hearing acuity and musical experience; furthermore, we can promote this kind of tool as a viable way to train hearing by focused listening to sound, particularly with music.

Open Access Feature Paper Article: Optimization of Virtual Loudspeakers for Spatial Room Acoustics Reproduction with Headphones
Appl. Sci. 2017, 7(12), 1282; doi:10.3390/app7121282
Received: 31 October 2017 / Revised: 24 November 2017 / Accepted: 5 December 2017 / Published: 9 December 2017
Abstract
The use of headphones in reproducing spatial sound is becoming increasingly popular. For instance, virtual reality applications often use head-tracking to keep the binaurally reproduced auditory environment stable and to improve externalization. Here, we study one spatial sound reproduction method over headphones, in particular the positioning of the virtual loudspeakers. The paper presents an algorithm that optimizes the positioning of virtual reproduction loudspeakers to reduce the computational cost in head-tracked real-time rendering. The listening test results suggest that listeners could discriminate the optimized loudspeaker arrays from equally spaced ones for renderings that reproduced relatively simple acoustic conditions, but the optimized array was not significantly different from the equally spaced array in the reproduction of a more complex case. Moreover, the optimization seems to change the perceived openness and timbre, according to the verbal feedback of the test subjects.
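
To see why the number and placement of virtual loudspeakers drive the rendering cost, consider the basic binaural downmix sketched below: every virtual loudspeaker feed is convolved with a left/right pair of head-related impulse responses (HRIRs) and summed, so each additional loudspeaker adds two convolutions per update. This is a generic illustration with assumed array shapes, not the optimization algorithm of the paper.

```python
import numpy as np

def render_virtual_loudspeakers(speaker_signals, hrirs):
    """Binaural downmix of N virtual loudspeakers.
    speaker_signals: (N, num_samples) array, one feed per virtual loudspeaker.
    hrirs: (N, 2, hrir_len) array of left/right HRIRs for each loudspeaker
    direction (assumed given, e.g. selected for the current head orientation).
    Returns a (2, num_samples + hrir_len - 1) binaural signal."""
    n_spk, n_samp = speaker_signals.shape
    hrir_len = hrirs.shape[2]
    out = np.zeros((2, n_samp + hrir_len - 1))
    for i in range(n_spk):                      # cost grows linearly with array size
        for ear in (0, 1):
            out[ear] += np.convolve(speaker_signals[i], hrirs[i, ear])
    return out

binaural = render_virtual_loudspeakers(np.random.randn(8, 4800), np.random.randn(8, 2, 256))
```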

Open Access Article: Melodic Similarity and Applications Using Biologically-Inspired Techniques
Appl. Sci. 2017, 7(12), 1242; doi:10.3390/app7121242
Received: 30 September 2017 / Revised: 23 November 2017 / Accepted: 27 November 2017 / Published: 1 December 2017
Abstract
Music similarity is a complex concept that manifests itself in areas such as Music Information Retrieval (MIR), musicological analysis, and music cognition. Modelling the similarity of two music items is key for a number of music-related applications, such as cover song detection and query-by-humming. Typically, similarity models are based on intuition, heuristics, or small-scale cognitive experiments; thus, applicability to broader contexts cannot be guaranteed. We argue that data-driven tools and analysis methods, applied to songs known to be related, can potentially provide us with information regarding the fine-grained nature of music similarity. Interestingly, music and biological sequences share a number of parallel concepts, from their natural sequence representation to their mechanisms of generating variations, i.e., oral transmission and evolution, respectively. As such, there is great potential for applying scientific methods and tools from bioinformatics to music. Stripped of biology-specific heuristics, certain bioinformatics approaches can be generalized to any type of sequence. Consequently, reliable and unbiased data-driven solutions to problems such as biological sequence similarity and conservation analysis can be applied to music similarity and stability analysis. Our paper relies on such an approach to tackle a number of tasks, most notably to model global melodic similarity.
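
As a minimal example of the bioinformatics-style sequence comparison the paper draws on, the snippet below scores two melodies, encoded as pitch-interval sequences, with plain Needleman-Wunsch global alignment. The scoring values are placeholders; the paper's similarity model is considerably more elaborate.

```python
def global_alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Needleman-Wunsch global alignment score between two melodies given as
    pitch-interval sequences (semitones). A toy stand-in for sequence-alignment
    methods borrowed from bioinformatics."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # substitution / match
                           dp[i - 1][j] + gap,       # gap in melody b
                           dp[i][j - 1] + gap)       # gap in melody a
    return dp[n][m]

# two variants of the same phrase: the second inserts one extra step
print(global_alignment_score([2, 2, 1, -5], [2, 2, 2, 1, -5]))
```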

Open Access Article: Exploring the Effects of Pitch Layout on Learning a New Musical Instrument
Appl. Sci. 2017, 7(12), 1218; doi:10.3390/app7121218
Received: 27 October 2017 / Revised: 21 November 2017 / Accepted: 21 November 2017 / Published: 24 November 2017
Abstract
Although isomorphic pitch layouts are proposed to afford various advantages for musicians playing new musical instruments, this paper details the first substantive set of empirical tests on how two fundamental aspects of isomorphic pitch layouts affect motor learning: shear, which makes the pitch axis vertical, and the adjacency (or nonadjacency) of pitches a major second apart. After receiving audio-visual training tasks for a scale and arpeggios, performance accuracies of 24 experienced musicians were assessed in immediate retention tasks (same as the training tasks, but without the audio-visual guidance) and in a transfer task (performance of a previously untrained nursery rhyme). Each participant performed the same tasks with three different pitch layouts and, in total, four different layouts were tested. Results show that, so long as the performance ceiling has not already been reached (due to ease of the task or repeated practice), adjacency strongly improves performance accuracy in the training and retention tasks. They also show that shearing the layout, to make the pitch axis vertical, worsens performance accuracy for the training tasks but, crucially, it strongly improves performance accuracy in the transfer task when the participant needs to perform a new, but related, task. These results can inform the design of pitch layouts in new musical instruments.

Open Access Article: EigenScape: A Database of Spatial Acoustic Scene Recordings
Appl. Sci. 2017, 7(11), 1204; doi:10.3390/app7111204
Received: 23 October 2017 / Revised: 21 November 2017 / Accepted: 8 November 2017 / Published: 22 November 2017
Abstract
The classification of acoustic scenes and events is an emerging area of research in the field of machine listening. Most of the research conducted so far uses spectral features extracted from monaural or stereophonic audio rather than spatial features extracted from multichannel recordings. This is partly due to the lack thus far of a substantial body of spatial recordings of acoustic scenes. This paper formally introduces EigenScape, a new database of fourth-order Ambisonic recordings of eight different acoustic scene classes. The potential applications of a spatial machine listening system are discussed before detailed information on the recording process and dataset is provided. A baseline spatial classification system using directional audio coding (DirAC) techniques is detailed and results from this classifier are presented. The classifier is shown to give good overall scene classification accuracy across the dataset, with 7 of the 8 scenes classified with greater than 60% accuracy and an 11% improvement in overall accuracy compared to the use of Mel-frequency cepstral coefficient (MFCC) features. Further analysis of the results shows potential improvements to the classifier. It is concluded that the results validate the new database and show that spatial features can characterise acoustic scenes and, as such, are worthy of further investigation.
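
The sketch below hints at the kind of spatial feature a DirAC-style front end can provide: frame-wise direction-of-arrival estimates derived from products of the omnidirectional and figure-of-eight components of a B-format signal. It uses only first-order components and one common sign convention (these vary between references), whereas the paper works with fourth-order Ambisonic material.

```python
import numpy as np

def direction_features(w, x, y, z, frame_len=1024):
    """Frame-wise azimuth/elevation estimates (radians) from first-order
    B-format signals w, x, y, z, in the spirit of directional audio coding.
    For an ideal plane wave encoded with the usual B-format convention, the
    time-averaged products w*x, w*y, w*z point toward the source."""
    n_frames = len(w) // frame_len
    az = np.zeros(n_frames)
    el = np.zeros(n_frames)
    for k in range(n_frames):
        s = slice(k * frame_len, (k + 1) * frame_len)
        ix, iy, iz = np.mean(w[s] * x[s]), np.mean(w[s] * y[s]), np.mean(w[s] * z[s])
        az[k] = np.arctan2(iy, ix)
        el[k] = np.arctan2(iz, np.hypot(ix, iy))
    return az, el
```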

Open Access Article: Identifying Single Trial Event-Related Potentials in an Earphone-Based Auditory Brain-Computer Interface
Appl. Sci. 2017, 7(11), 1197; doi:10.3390/app7111197
Received: 20 October 2017 / Accepted: 17 November 2017 / Published: 21 November 2017
Abstract
As brain-computer interfaces (BCI) must provide reliable ways for end users to accomplish a specific task, methods to secure the best possible translation of the intention of the users are constantly being explored. In this paper, we propose and test a number of convolutional neural network (CNN) structures to identify and classify single-trial P300 in electroencephalogram (EEG) readings of an auditory BCI. The recorded data correspond to nine subjects in a series of experiment sessions in which auditory stimuli following the oddball paradigm were presented via earphones from six different virtual directions at time intervals of 200, 300, 400 and 500 ms. Using three different approaches for the pooling process, we report the average accuracy for 18 CNN structures. The results obtained for most of the CNN models show clear improvement over past studies in similar contexts, as well as over other commonly-used classifiers. We found that the models that consider data from the time and space domains and those that overlap in the pooling process usually offer better results regardless of the number of layers. Additionally, patterns of improvement with single-layered CNN models can be observed.
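
For readers who want a concrete picture of the kind of model structure being compared, here is a small PyTorch sketch of a single-trial ERP classifier with separate temporal and spatial convolutions followed by average pooling over time. Layer sizes and epoch dimensions are arbitrary assumptions; it does not reproduce any of the 18 structures evaluated in the paper.

```python
import torch
import torch.nn as nn

class ERPNet(nn.Module):
    """Toy convolutional classifier for EEG epochs shaped (batch, 1, channels, samples)."""
    def __init__(self, n_channels=64, n_samples=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(1, 16), padding=(0, 8)),   # temporal filters
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=(n_channels, 1)),          # spatial filters
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=(1, 4)),                       # pooling over time
        )
        with torch.no_grad():
            n_feat = self.features(torch.zeros(1, 1, n_channels, n_samples)).numel()
        self.classifier = nn.Linear(n_feat, 2)                      # target vs. non-target

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = ERPNet()(torch.randn(4, 1, 64, 256))   # toy forward pass on random epochs
```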

Open Access Feature Paper Article: Sound Synthesis of Objects Swinging through Air Using Physical Models
Appl. Sci. 2017, 7(11), 1177; doi:10.3390/app7111177
Received: 12 October 2017 / Accepted: 10 November 2017 / Published: 16 November 2017
Abstract
A real-time physically-derived sound synthesis model is presented that replicates the sounds generated as an object swings through the air. Equations obtained from fluid dynamics are used to determine the sounds generated while exposing practical parameters for a user or game engine to vary. Listening tests reveal that, for the majority of objects modelled, participants rated the sounds from our model to be as plausible as actual recordings. The sword sound effect performed worse than the others, and it is speculated that one cause may be linked to the difference between expectations of a sound and the actual sound for a given object.
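
A key fluid-dynamics relation behind models of this kind is the Aeolian tone produced by vortex shedding, whose frequency is f = St * v / d with a Strouhal number St of roughly 0.2 for a cylinder over a wide range of conditions. The toy function below evaluates that relation; the abstract does not state the exact parameterisation used by the authors, so treat this as background rather than their model.

```python
def aeolian_tone_hz(speed_mps, diameter_m, strouhal=0.2):
    """Vortex-shedding (Aeolian tone) frequency for a cylindrical object moving
    through air: f = St * v / d. The value St ~ 0.2 is a standard approximation."""
    return strouhal * speed_mps / diameter_m

# a 1 cm rod swung at 10 m/s produces a tone near 200 Hz
print(aeolian_tone_hz(10.0, 0.01))
```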

Open Access Article: SymCHM—An Unsupervised Approach for Pattern Discovery in Symbolic Music with a Compositional Hierarchical Model
Appl. Sci. 2017, 7(11), 1135; doi:10.3390/app7111135
Received: 13 September 2017 / Revised: 25 October 2017 / Accepted: 1 November 2017 / Published: 4 November 2017
Abstract
This paper presents a compositional hierarchical model for pattern discovery in symbolic music. The model can be regarded as a deep architecture with a transparent structure. It can learn a set of repeated patterns within individual works or larger corpora in an unsupervised manner, relying on statistics of pattern occurrences, and robustly infer the learned patterns in new, unknown works. A learned model contains representations of patterns on different layers, from the simple short structures on lower layers to the longer and more complex music structures on higher layers. A pattern selection procedure can be used to extract the most frequent patterns from the model. We evaluate the model on the publicly available JKU Patterns Dataset and compare the results to other approaches.
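
To make the pattern-discovery task concrete, the snippet below counts exact repeated pitch-interval n-grams in a toy melody. SymCHM goes much further, learning hierarchical and noise-tolerant patterns across corpora, so this is only a naive baseline for intuition.

```python
from collections import Counter

def repeated_interval_patterns(pitches, min_len=3, max_len=8, min_count=2):
    """Count exact repeated pitch-interval n-grams (a deliberately naive
    stand-in for the model's unsupervised pattern learning)."""
    intervals = tuple(b - a for a, b in zip(pitches, pitches[1:]))
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(intervals) - n + 1):
            counts[intervals[i:i + n]] += 1
    return {pattern: c for pattern, c in counts.items() if c >= min_count}

# the four-note opening figure repeats, so its interval patterns are reported
print(repeated_interval_patterns([60, 62, 64, 65, 60, 62, 64, 65, 67]))
```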

Open Access Article: Supporting an Object-Oriented Approach to Unit Generator Development: The Csound Plugin Opcode Framework
Appl. Sci. 2017, 7(10), 970; doi:10.3390/app7100970
Received: 31 July 2017 / Revised: 8 September 2017 / Accepted: 18 September 2017 / Published: 21 September 2017
Abstract
This article presents a new framework for unit generator development for Csound, supporting a full object-oriented programming approach. It introduces the concept of unit generators and opcodes and their centrality with regard to music programming languages in general, and to Csound in particular. The layout of an opcode from the perspective of the Csound C-language API is presented, with some outline code examples. This is followed by a discussion that places the unit generator within the object-oriented paradigm and the motivation for full C++ programming support, which is provided by the Csound Plugin Opcode Framework (CPOF). The design of CPOF is then explored in detail, supported by several opcode examples. The article concludes by discussing two key applications of object-orientation and their respective instances in the Csound code base.
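
CPOF itself is a C++ API and is described in the article; the Python sketch below is deliberately not that API, but it illustrates the object-oriented idea the framework builds on: a unit generator as a class whose instance holds the state and whose methods separate initialisation from per-block processing.

```python
import numpy as np

class OnePoleLowpass:
    """A unit generator written as a class: state lives in the instance, and
    setup and per-block processing are separate methods. Plain Python for
    illustration only, not the C++ CPOF interface discussed in the article."""
    def __init__(self, cutoff_hz, sample_rate=44100.0):
        self.sample_rate = sample_rate
        self.state = 0.0
        self.set_cutoff(cutoff_hz)

    def set_cutoff(self, cutoff_hz):
        # recompute the one-pole coefficient whenever the control value changes
        self.coeff = np.exp(-2.0 * np.pi * cutoff_hz / self.sample_rate)

    def process(self, block):
        out = np.empty_like(block)
        for i, sample in enumerate(block):
            self.state = (1.0 - self.coeff) * sample + self.coeff * self.state
            out[i] = self.state
        return out

lp = OnePoleLowpass(1000.0)
y = lp.process(np.random.randn(64))   # one processing block
```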

Open Access Article: A Two-Stage Approach to Note-Level Transcription of a Specific Piano
Appl. Sci. 2017, 7(9), 901; doi:10.3390/app7090901
Received: 22 July 2017 / Revised: 25 August 2017 / Accepted: 29 August 2017 / Published: 2 September 2017
Abstract
This paper presents a two-stage transcription framework for a specific piano, which combines deep learning and spectrogram factorization techniques. In the first stage, two convolutional neural networks (CNNs) are adopted to produce a preliminary recognition of the notes, and note verification for the specific instrument is conducted in the second stage. The note recognition stage is independent of the individual piano: one CNN is used to detect onsets and another to estimate the probabilities of pitches at each detected onset. Hence, candidate pitches at candidate onsets are obtained in the first stage. During note verification, templates for the specific piano are generated to model the attack of each pitch. Then, the spectrogram of the segment around each candidate onset is factorized using the attack templates of the candidate pitches. In this way, the pitches are picked up from the note activations and the onsets are also refined. Experiments show that CNNs outperform other types of neural networks in both onset detection and pitch estimation, and that the combination of two CNNs yields better performance than a single CNN in note recognition. We also observe that note verification further improves the performance of the transcription. In the transcription of a specific piano, the proposed system achieves a note-wise F-measure of 82%, which outperforms the state-of-the-art.
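
A greatly simplified view of the verification stage is sketched below: the magnitude spectrogram around a candidate onset is factorised against fixed per-pitch attack templates using a multiplicative KL-NMF activation update, and pitches whose activations stay weak are discarded. Template extraction, thresholds, and dimensions are placeholder assumptions, not the paper's settings.

```python
import numpy as np

def verify_candidate_pitches(segment, attack_templates, n_iter=30,
                             rel_threshold=0.1, eps=1e-12):
    """segment: (F, T) magnitude spectrogram around one candidate onset.
    attack_templates: (F, P) fixed spectral templates, one per candidate pitch.
    Returns indices of pitches whose activation energy is significant."""
    W = attack_templates / (attack_templates.sum(axis=0, keepdims=True) + eps)
    H = np.ones((W.shape[1], segment.shape[1]))
    for _ in range(n_iter):            # multiplicative KL-NMF update with W held fixed
        H *= W.T @ (segment / (W @ H + eps))
    energy = H.sum(axis=1)
    return np.where(energy / (energy.max() + eps) > rel_threshold)[0]

# toy usage with random data: 5 candidate pitches, 20 frames, 513 bins
pitches = verify_candidate_pitches(np.random.rand(513, 20), np.random.rand(513, 5))
```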

Open Access Article: A Low Cost Wireless Acoustic Sensor for Ambient Assisted Living Systems
Appl. Sci. 2017, 7(9), 877; doi:10.3390/app7090877
Received: 31 July 2017 / Revised: 24 August 2017 / Accepted: 25 August 2017 / Published: 27 August 2017
Abstract
Ambient Assisted Living (AAL) has become an attractive research topic due to growing interest in the remote monitoring of older people. Developments in sensor technologies and advances in wireless communications allow smart assistance to be offered remotely and those people to be monitored in their own homes, increasing their quality of life. In this context, Wireless Acoustic Sensor Networks (WASN) provide a suitable way of implementing AAL systems that can be used to infer hazardous situations by identifying environmental sounds. Nevertheless, sensor solutions that combine low cost with high performance have so far been lacking. In this paper, we report the design and implementation of a wireless acoustic sensor, located at the edge of a WASN, for recording and processing environmental sounds; it is suitable for AAL systems for personal healthcare because it offers the following significant advantages: low cost, small size, and audio sampling and computation capabilities for audio processing. The proposed wireless acoustic sensor is able to record audio samples at a sampling frequency of at least 10 kHz with 12-bit resolution. It is also capable of performing audio signal processing without compromising the sample rate or the energy consumption, thanks to a new microcontroller released in the last quarter of 2016. The proposed low cost wireless acoustic sensor has been verified using four randomness tests for statistical analysis and a classification system for the recorded sounds based on audio fingerprints.
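
As a flavour of what lightweight on-device audio fingerprinting can look like, the sketch below turns per-band energy changes into a small binary code and compares two codes by Hamming distance. It is illustrative only, with assumed frame sizes and band counts; the sensor described in the paper uses its own fingerprinting and classification scheme.

```python
import numpy as np

def band_energy_fingerprint(signal, frame_len=256, n_bands=16):
    """A tiny audio fingerprint: per-frame log band energies reduced to a binary
    code (1 where a band gained energy relative to the previous frame)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    bands = np.array_split(spectra, n_bands, axis=1)
    energies = np.log(np.stack([b.sum(axis=1) for b in bands], axis=1) + 1e-12)
    return (np.diff(energies, axis=0) > 0).astype(np.uint8)

def fingerprint_distance(fp_a, fp_b):
    """Mean Hamming distance between two fingerprints of equal shape."""
    return float(np.mean(fp_a != fp_b))

fp1 = band_energy_fingerprint(np.random.randn(10000))
fp2 = band_energy_fingerprint(np.random.randn(10000))
print(fingerprint_distance(fp1, fp2))
```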
