Special Issue "Sound and Music Computing"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics".

Deadline for manuscript submissions: closed (3 November 2017)

Special Issue Editors

Guest Editor
Prof. Dr. Tapio Lokki

Department of Computer Science, Aalto University, Espoo 02150, Finland
Interests: virtual acoustics; spatial sound; psychoacoustics
Co-Guest Editor
Prof. Dr. Stefania Serafin

Department of Architecture, Design and Media Technology, Aalborg University, 2450 Copenhagen SV, Denmark
Interests: multimodal interfaces; sonic interaction design
Co-Guest Editor
Prof. Dr. Meinard Müller

International Audio Laboratories Erlangen, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen 91058, Germany
Interests: music information retrieval; music processing; audio signal processing
Co-Guest Editor
Prof. Dr. Vesa Välimäki

Department of Signal Processing and Acoustics, Aalto University, Espoo 02150, Finland
Interests: audio signal processing; sound synthesis

Special Issue Information

Dear Colleagues,

Sound and music computing is a young and highly multidisciplinary research field. It combines scientific, technological, and artistic methods to produce, model, and understand audio and sonic arts with the help of computers. Sound and music computing borrows methods from, among other fields, computer science, electrical engineering, mathematics, musicology, and psychology.

In this Special Issue, we want to address recent advances in the following topics:

- Analysis, synthesis, and modification of sound
- Automatic composition, accompaniment, and improvisation
- Computational musicology and mathematical music theory
- Computer-based music analysis
- Computer music languages and software
- High-performance computing for audio
- Interactive performance systems and new interfaces
- Multi-modal perception and emotion
- Music information retrieval
- Music games and educational tools
- Music performance analysis and rendering
- Robotics and music
- Room acoustics modeling and auralization
- Social interaction in sound and music computing
- Sonic interaction design
- Sonification
- Soundscapes and environmental arts
- Spatial sound
- Virtual reality applications and technologies for sound and music

Submissions are invited for both original research and review articles. In addition, invited papers based on excellent contributions to recent conferences in this field, such as the 2017 Sound and Music Computing Conference (SMC-17), will be included in this Special Issue. We hope that this collection of papers will serve as an inspiration for those interested in sound and music computing.

Prof. Dr. Tapio Lokki,
Prof. Dr. Stefania Serafin,
Prof. Dr. Meinard Müller,
Prof. Dr. Vesa Välimäki
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

- audio signal processing
- computer interfaces
- computer music
- multimedia
- music cognition
- music control and performance
- music information retrieval
- music technology
- sonic interaction design
- virtual reality

Published Papers (27 papers)


Research


Open Access Article: Analyzing Free-Hand Sound-Tracings of Melodic Phrases
Appl. Sci. 2018, 8(1), 135; doi:10.3390/app8010135
Received: 31 October 2017 / Revised: 9 December 2017 / Accepted: 15 January 2018 / Published: 18 January 2018
Abstract
In this paper, we report on a free-hand motion capture study in which 32 participants ‘traced’ 16 melodic vocal phrases with their hands in the air in two experimental conditions. Melodic contours are often thought of as correlated with vertical movement (up and down) in time, and this was also our initial expectation. We did find an arch shape for most of the tracings, although this did not correspond directly to the melodic contours. Furthermore, representation of pitch in the vertical dimension was but one of a diverse range of movement strategies used to trace the melodies. Six different mapping strategies were observed, and these strategies have been quantified and statistically tested. The conclusion is that metaphorical representation is much more common than a ‘graph-like’ rendering for such a melodic sound-tracing task. Other findings include a clear gender difference for some of the tracing strategies and an unexpected representation of melodies in terms of a small object for some of the Hindustani music examples. The data also show a tendency of participants moving within a shared ‘social box’. Full article
(This article belongs to the Special Issue Sound and Music Computing)

Open Access Article: Desert and Sonic Ecosystems: Incorporating Environmental Factors within Site-Responsive Sonic Art
Appl. Sci. 2018, 8(1), 111; doi:10.3390/app8010111
Received: 3 November 2017 / Revised: 4 January 2018 / Accepted: 9 January 2018 / Published: 14 January 2018
Abstract
Advancements in embedded computer platforms have allowed data to be collected and shared between objects—or smart devices—in a network. While this has resulted in highly functional outcomes in fields such as automation and monitoring, there are also implications for artistic and expressive systems. In this paper we present a pluralistic approach to incorporating environmental factors within the field of site-responsive sonic art using embedded audio and data processing techniques. In particular, we focus on the role of such systems within an ecosystemic framework, both in terms of incorporating systems of living organisms, as well as sonic interaction design. We describe the implementation of such a system within a large-scale site-responsive sonic art installation that took place in the subtropical desert climate of Arizona in 2017. Full article
(This article belongs to the Special Issue Sound and Music Computing)

Open Access Article: Application of Machine Learning for the Spatial Analysis of Binaural Room Impulse Responses
Appl. Sci. 2018, 8(1), 105; doi:10.3390/app8010105
Received: 30 October 2017 / Revised: 12 December 2017 / Accepted: 26 December 2017 / Published: 12 January 2018
Abstract
Spatial impulse response analysis techniques are commonly used in the field of acoustics, as they help to characterise the interaction of sound with an enclosed environment. This paper presents a novel approach for spatial analyses of binaural impulse responses, using a binaural model fronted neural network. The proposed method uses binaural cues utilised by the human auditory system, which are mapped by the neural network to the azimuth direction of arrival classes. A cascade-correlation neural network was trained using a multi-conditional training dataset of head-related impulse responses with added noise. The neural network is tested using a set of binaural impulse responses captured using two dummy head microphones in an anechoic chamber, with a reflective boundary positioned to produce a reflection with a known direction of arrival. Results showed that the neural network was generalisable for the direct sound of the binaural room impulse responses for both dummy head microphones. However, it was found to be less accurate at predicting the direction of arrival of the reflections. The work indicates the potential of using such an algorithm for the spatial analysis of binaural impulse responses, while indicating where the method applied needs to be made more robust for more general application. Full article
(This article belongs to the Special Issue Sound and Music Computing)
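The binaural cues this abstract refers to can be illustrated with a toy example (this is our sketch, not the authors' neural-network model): an interaural time difference (ITD) estimated as the lag that maximizes the cross-correlation between the two ear signals.

```python
def itd_samples(left, right, max_lag):
    """Toy interaural time difference estimate: the lag (in samples)
    that maximizes the cross-correlation of the two ear signals."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = sum(l * right[i + lag]
                   for i, l in enumerate(left)
                   if 0 <= i + lag < len(right))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# A click arriving 3 samples earlier at the left ear than at the right.
n = 64
left = [1.0 if i == 10 else 0.0 for i in range(n)]
right = [1.0 if i == 13 else 0.0 for i in range(n)]
print(itd_samples(left, right, 8))  # 3
```

A model such as the one in the paper would map cues like this (together with interaural level differences) to azimuth classes; the point here is only what the raw cue looks like.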

Open Access Article: Live Convolution with Time-Varying Filters
Appl. Sci. 2018, 8(1), 103; doi:10.3390/app8010103
Received: 31 October 2017 / Revised: 29 December 2017 / Accepted: 3 January 2018 / Published: 12 January 2018
Abstract
The paper presents two new approaches to artefact-free real-time updates of the impulse response in convolution. Both approaches are based on incremental updates of the filter. This can be useful for several applications within digital audio processing: parametrisation of convolution reverbs, dynamic filters, and live convolution. The development of these techniques has been done within the framework of a research project on crossadaptive audio processing methods for live performance. Our main motivation has thus been live convolution, where the signals from two music performers are convolved with each other, allowing the musicians to “play through each other’s sound”. Full article
(This article belongs to the Special Issue Sound and Music Computing)
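The paper's incremental-update schemes are not reproduced here, but the simple baseline they improve on can be sketched: convolve a block with both the old and the new impulse response and crossfade between the two outputs, which avoids discontinuities at the cost of running two convolutions (all names below are ours).

```python
def convolve(x, h):
    """Direct-form FIR convolution (full length)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def crossfade_update(x, h_old, h_new):
    """Convolve a block with both filters and crossfade linearly,
    so that switching filters produces no discontinuity."""
    y_old = convolve(x, h_old)
    y_new = convolve(x, h_new)
    n = len(y_old)
    return [((n - 1 - i) * a + i * b) / (n - 1)
            for i, (a, b) in enumerate(zip(y_old, y_new))]

y = crossfade_update([1.0, 0.0, 0.0, 0.0], [1.0, 0.5], [0.2, 0.1])
print(y)
```

Incremental filter updates, as in the paper, aim for the same artefact-free result without paying for the second convolution.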

Open Access Article: Audlet Filter Banks: A Versatile Analysis/Synthesis Framework Using Auditory Frequency Scales
Appl. Sci. 2018, 8(1), 96; doi:10.3390/app8010096
Received: 3 November 2017 / Revised: 15 December 2017 / Accepted: 3 January 2018 / Published: 11 January 2018
Abstract
Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. For these applications, an important property of the analysis–synthesis system is the reconstruction error; it has to be minimized to avoid audible artifacts. Other advantageous properties include stability and low redundancy. To exploit some aspects of auditory perception in the signal chain, some applications rely on FBs that approximate the frequency analysis performed in the auditory periphery, the gammatone FB being a popular example. However, current gammatone FBs only allow partial reconstruction and stability at high redundancies. In this article, we construct an analysis–synthesis system for audio applications. The proposed system, referred to as Audlet, is an oversampled FB with filters distributed on auditory frequency scales. It allows perfect reconstruction for a wide range of FB settings (e.g., the shape and density of filters), efficient FB design, and adaptable redundancy. In particular, we show how to construct a gammatone FB with perfect reconstruction. Experiments demonstrate performance improvements of the proposed gammatone FB when compared to current gammatone FBs in terms of reconstruction error and stability, especially at low redundancies. An application of the framework to audio source separation illustrates its utility for audio processing. Full article
(This article belongs to the Special Issue Sound and Music Computing)
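The auditory frequency scales on which such filters are distributed can be computed directly. Below is a sketch using the standard Glasberg–Moore ERB-number formula; the function names are ours, and this says nothing about the Audlet filter shapes themselves.

```python
import math

def hz_to_erb_number(f):
    """Glasberg & Moore ERB-number (in Cams) for a frequency f in Hz."""
    return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number."""
    return (10 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def erb_center_freqs(f_lo, f_hi, n):
    """n filter center frequencies uniformly spaced on the ERB scale."""
    e_lo, e_hi = hz_to_erb_number(f_lo), hz_to_erb_number(f_hi)
    return [erb_number_to_hz(e_lo + k * (e_hi - e_lo) / (n - 1))
            for k in range(n)]

freqs = erb_center_freqs(100.0, 8000.0, 16)
print([round(f) for f in (freqs[0], freqs[-1])])  # endpoints: [100, 8000]
```

Uniform spacing on the ERB scale yields filters that are narrow at low frequencies and progressively wider at high frequencies, mimicking the auditory periphery.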

Open Access Article: A Real-Time Sound Field Rendering Processor
Appl. Sci. 2018, 8(1), 35; doi:10.3390/app8010035
Received: 3 November 2017 / Revised: 5 December 2017 / Accepted: 18 December 2017 / Published: 28 December 2017
Abstract
Real-time sound field renderings are computationally intensive and memory-intensive. Traditional rendering systems based on computer simulations suffer from memory bandwidth and arithmetic units. The computation is time-consuming, and the sample rate of the output sound is low because of the long computation time at each time step. In this work, a processor with a hybrid architecture is proposed to speed up computation and improve the sample rate of the output sound, and an interface is developed for system scalability through simply cascading many chips to enlarge the simulated area. To render a three-minute Beethoven wave sound in a small shoe-box room with dimensions of 1.28 m × 1.28 m × 0.64 m, the field programming gate array (FPGA)-based prototype machine with the proposed architecture carries out the sound rendering at run-time while the software simulation with the OpenMP parallelization takes about 12.70 min on a personal computer (PC) with 32 GB random access memory (RAM) and an Intel i7-6800K six-core processor running at 3.4 GHz. The throughput in the software simulation is about 194 M grids/s while it is 51.2 G grids/s in the prototype machine even if the clock frequency of the prototype machine is much lower than that of the PC. The rendering processor with a processing element (PE) and interfaces consumes about 238,515 gates after fabricated by the 0.18 µm processing technology from the ROHM semiconductor Co., Ltd. (Kyoto Japan), and the power consumption is about 143.8 mW. Full article
(This article belongs to the Special Issue Sound and Music Computing)
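The kind of grid computation being accelerated here can be sketched as a toy one-dimensional finite-difference time-domain (FDTD) update (the processor in the paper implements a much larger three-dimensional scheme; this sketch and its names are ours). The "grids/s" throughput figures divide the number of grid updates by the runtime.

```python
def fdtd_1d(n_cells, n_steps, courant=1.0):
    """Toy 1-D FDTD update of the wave equation u_tt = c^2 u_xx with
    rigid ends; returns the final field and the number of grid updates."""
    prev = [0.0] * n_cells
    curr = [0.0] * n_cells
    curr[n_cells // 2] = 1.0          # impulse excitation
    updates = 0
    c2 = courant * courant
    for _ in range(n_steps):
        nxt = [0.0] * n_cells          # rigid boundaries stay at zero
        for i in range(1, n_cells - 1):
            nxt[i] = (2 * curr[i] - prev[i]
                      + c2 * (curr[i - 1] - 2 * curr[i] + curr[i + 1]))
            updates += 1
        prev, curr = curr, nxt
    return curr, updates

field, updates = fdtd_1d(64, 32)
print(updates)  # 32 steps x 62 interior cells = 1984
```

In three dimensions the update touches six neighbours per cell, which is why memory bandwidth, as the abstract notes, becomes the bottleneck for software implementations.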

Open Access Article: Populating the Mix Space: Parametric Methods for Generating Multitrack Audio Mixtures
Appl. Sci. 2017, 7(12), 1329; doi:10.3390/app7121329
Received: 31 October 2017 / Revised: 24 November 2017 / Accepted: 4 December 2017 / Published: 20 December 2017
Abstract
The creation of multitrack mixes by audio engineers is a time-consuming activity and creating high-quality mixes requires a great deal of knowledge and experience. Previous studies on the perception of music mixes have been limited by the relatively small number of human-made mixes analysed. This paper describes a novel “mix-space”, a parameter space which contains all possible mixes using a finite set of tools, as well as methods for the parametric generation of artificial mixes in this space. Mixes that use track gain, panning and equalisation are considered. This allows statistical methods to be used in the study of music mixing practice, such as Monte Carlo simulations or population-based optimisation methods. Two applications are described: an investigation into the robustness and accuracy of tempo-estimation algorithms and an experiment to estimate distributions of spectral centroid values within sets of mixes. The potential for further work is also described. Full article
(This article belongs to the Special Issue Sound and Music Computing)
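A minimal sketch of drawing a random point from such a mix-space, under our own simplifying assumptions (not necessarily the paper's exact parametrisation): track gains are energy-normalised, so the gain vector lies on the positive orthant of a unit hypersphere, and panning is a per-track scalar.

```python
import math
import random

def random_mix(n_tracks, rng=random):
    """Draw a random gain vector with unit overall energy (a point on the
    positive orthant of the unit sphere, so sum(g_i^2) == 1) plus a
    random pan position per track (-1 = left, +1 = right)."""
    raw = [abs(rng.gauss(0.0, 1.0)) for _ in range(n_tracks)]
    norm = math.sqrt(sum(x * x for x in raw))
    gains = [x / norm for x in raw]
    pans = [rng.uniform(-1.0, 1.0) for _ in range(n_tracks)]
    return gains, pans

random.seed(0)
gains, pans = random_mix(8)
print(round(sum(g * g for g in gains), 6))  # 1.0
```

Sampling mixes this way is what makes Monte Carlo experiments over "all possible mixes", as described in the abstract, tractable.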

Open Access Feature Paper Article: Virtual Analog Models of the Lockhart and Serge Wavefolders
Appl. Sci. 2017, 7(12), 1328; doi:10.3390/app7121328
Received: 12 October 2017 / Revised: 10 November 2017 / Accepted: 13 December 2017 / Published: 20 December 2017
Abstract
Wavefolders are a particular class of nonlinear waveshaping circuits, and a staple of the “West Coast” tradition of analog sound synthesis. In this paper, we present analyses of two popular wavefolding circuits—the Lockhart and Serge wavefolders—and show that they achieve a very similar audio effect. We digitally model the input–output relationship of both circuits using the Lambert-W function, and examine their time- and frequency-domain behavior. To ameliorate the issue of aliasing distortion introduced by the nonlinear nature of wavefolding, we propose the use of the first-order antiderivative method. This method allows us to implement the proposed digital models in real-time without having to resort to high oversampling factors. The practical synthesis usage of both circuits is discussed by considering the case of multiple wavefolder stages arranged in series. Full article
(This article belongs to the Special Issue Sound and Music Computing)
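The first-order antiderivative method mentioned in the abstract can be sketched generically: replace f(x[n]) with the average of f over the segment between consecutive samples, computed from f's antiderivative. Here a sine fold stands in for the actual Lockhart/Serge transfer functions (which in the paper involve the Lambert-W function); the function names are ours.

```python
import math

def fold(x):
    """Toy wavefolding nonlinearity (a sine fold, not the papers' circuits)."""
    return math.sin(x)

def fold_antiderivative(x):
    """Antiderivative of the fold: d/dx (1 - cos x) = sin x."""
    return 1.0 - math.cos(x)

def adaa_process(xs):
    """First-order antiderivative antialiasing: output the mean of the
    nonlinearity over [x[n-1], x[n]] instead of its pointwise value."""
    ys, eps = [], 1e-9
    for n in range(1, len(xs)):
        dx = xs[n] - xs[n - 1]
        if abs(dx) < eps:             # ill-conditioned: use the midpoint value
            ys.append(fold(0.5 * (xs[n] + xs[n - 1])))
        else:
            ys.append((fold_antiderivative(xs[n])
                       - fold_antiderivative(xs[n - 1])) / dx)
    return ys

print(adaa_process([0.0, math.pi]))  # average of sin over [0, pi] ~ 0.6366
```

Averaging the nonlinearity over each sample interval suppresses the high-frequency content that would otherwise alias, which is why the method avoids heavy oversampling.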

Open Access Article: Playing for a Virtual Audience: The Impact of a Social Factor on Gestures, Sounds and Expressive Intents
Appl. Sci. 2017, 7(12), 1321; doi:10.3390/app7121321
Received: 30 October 2017 / Revised: 14 November 2017 / Accepted: 13 December 2017 / Published: 19 December 2017
Abstract
Can we measure the impact of the presence of an audience on musicians’ performances? By exploring both acoustic and motion features for performances in Immersive Virtual Environments (IVEs), this study highlights the impact of the presence of a virtual audience on both the performance and the perception of authenticity and emotional intensity by listeners. Gestures and sounds produced were impacted differently when musicians performed at different expressive intents. The social factor made features converge towards values related to a habitual way of playing regardless of the expressive intent. This could be due to musicians’ habits to perform in a certain way in front of a crowd. On the listeners’ side, when comparing different expressive conditions, only one congruent condition (projected expressive intent in front of an audience) boosted the participants’ ratings for both authenticity and emotional intensity. At different values for kinetic energy and metrical centroid, stimuli recorded with an audience showed a different distribution of ratings, challenging the ecological validity of artificially created expressive intents. Finally, this study highlights the use of IVEs as a research tool and a training assistant for musicians who are eager to learn how to cope with their anxiety in front of an audience. Full article
(This article belongs to the Special Issue Sound and Music Computing)

Open Access Article: Mobile Music, Sensors, Physical Modeling, and Digital Fabrication: Articulating the Augmented Mobile Instrument
Appl. Sci. 2017, 7(12), 1311; doi:10.3390/app7121311
Received: 31 October 2017 / Revised: 12 December 2017 / Accepted: 13 December 2017 / Published: 19 December 2017
Abstract
Two concepts are presented, extended, and unified in this paper: mobile device augmentation towards musical instruments design and the concept of hybrid instruments. The first consists of using mobile devices at the heart of novel musical instruments. Smartphones and tablets are augmented with passive and active elements that can take part in the production of sound (e.g., resonators, exciter, etc.), add new affordances to the device, or change its global aesthetics and shape. Hybrid instruments combine physical/acoustical and “physically informed” virtual/digital elements. Recent progress in physical modeling of musical instruments and digital fabrication is exploited to treat instrument parts in a multidimensional way, allowing any physical element to be substituted with a virtual one and vice versa (as long as it is physically possible). A wide range of tools to design mobile hybrid instruments is introduced and evaluated. Aesthetic and design considerations when making such instruments are also presented through a series of examples. Full article
(This article belongs to the Special Issue Sound and Music Computing)

Open Access Feature Paper Article: A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs
Appl. Sci. 2017, 7(12), 1313; doi:10.3390/app7121313
Received: 3 November 2017 / Revised: 30 November 2017 / Accepted: 12 December 2017 / Published: 18 December 2017
Abstract
We recently presented a new model for singing synthesis based on a modified version of the WaveNet architecture. Instead of modeling raw waveform, we model features produced by a parametric vocoder that separates the influence of pitch and timbre. This allows conveniently modifying pitch to match any target melody, facilitates training on more modest dataset sizes, and significantly reduces training and generation times. Nonetheless, compared to modeling waveform directly, ways of effectively handling higher-dimensional outputs, multiple feature streams and regularization become more important with our approach. In this work, we extend our proposed system to include additional components for predicting F0 and phonetic timings from a musical score with lyrics. These expression-related features are learned together with timbrical features from a single set of natural songs. We compare our method to existing statistical parametric, concatenative, and neural network-based approaches using quantitative metrics as well as listening tests. Full article
(This article belongs to the Special Issue Sound and Music Computing)

Open Access Article: A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity
Appl. Sci. 2017, 7(12), 1301; doi:10.3390/app7121301
Received: 29 October 2017 / Revised: 3 December 2017 / Accepted: 12 December 2017 / Published: 14 December 2017
Abstract
Rendering spatial sound scenes via audio objects has become popular in recent years, since it can provide more flexibility for different auditory scenarios, such as 3D movies, spatial audio communication and virtual classrooms. To facilitate high-quality bitrate-efficient distribution for spatial audio objects, an encoding scheme based on intra-object sparsity (approximate k-sparsity of the audio object itself) is proposed in this paper. The statistical analysis is presented to validate the notion that the audio object has a stronger sparseness in the Modified Discrete Cosine Transform (MDCT) domain than in the Short Time Fourier Transform (STFT) domain. By exploiting intra-object sparsity in the MDCT domain, multiple simultaneously occurring audio objects are compressed into a mono downmix signal with side information. To ensure a balanced perception quality of audio objects, a Psychoacoustic-based time-frequency instants sorting algorithm and an energy equalized Number of Preserved Time-Frequency Bins (NPTF) allocation strategy are proposed, which are employed in the underlying compression framework. The downmix signal can be further encoded via Scalar Quantized Vector Huffman Coding (SQVH) technique at a desirable bitrate, and the side information is transmitted in a lossless manner. Both objective and subjective evaluations show that the proposed encoding scheme outperforms the Sparsity Analysis (SPA) approach and Spatial Audio Object Coding (SAOC) in cases where eight objects were jointly encoded. Full article
(This article belongs to the Special Issue Sound and Music Computing)
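The notion of approximate k-sparsity used in this abstract can be made concrete with a toy measure (ours, not the paper's exact definition): the smallest number of transform coefficients whose largest magnitudes retain a given fraction of the total signal energy.

```python
def effective_sparsity(coeffs, energy_frac=0.95):
    """Smallest k such that the k largest-magnitude coefficients retain
    the given fraction of the total energy (approximate k-sparsity)."""
    energies = sorted((c * c for c in coeffs), reverse=True)
    total = sum(energies)
    acc, k = 0.0, 0
    for e in energies:
        acc += e
        k += 1
        if acc >= energy_frac * total:
            return k
    return k

# A spectrum with two dominant partials plus low-level noise is ~2-sparse.
coeffs = [10.0, -8.0, 0.1, 0.05, -0.02, 0.01]
print(effective_sparsity(coeffs))  # 2
```

The abstract's claim that audio objects are sparser in the MDCT domain than in the STFT domain amounts to saying this k is smaller for MDCT coefficients of the same frame.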

Open Access Article: Wearable Vibration Based Computer Interaction and Communication System for Deaf
Appl. Sci. 2017, 7(12), 1296; doi:10.3390/app7121296
Received: 29 September 2017 / Revised: 25 November 2017 / Accepted: 11 December 2017 / Published: 13 December 2017
Abstract
In individuals with impaired hearing, determining the direction of sound is a significant problem. The direction of sound was determined in this study, which allowed hearing impaired individuals to perceive where sounds originated. This study also determined whether something was being spoken loudly near the hearing impaired individual. In this manner, it was intended that they should be able to recognize panic conditions more quickly. The developed wearable system has four microphone inlets, two vibration motor outlets, and four Light Emitting Diode (LED) outlets. The vibration of motors placed on the right and left fingertips permits the indication of the direction of sound through specific vibration frequencies. This study applies the ReliefF feature selection method to evaluate every feature in comparison to other features and determine which features are more effective in the classification phase. This study primarily selects the best feature extraction and classification methods. Then, the prototype device has been tested using these selected methods on themselves. ReliefF feature selection methods are used in the studies; the success of K nearest neighborhood (Knn) classification had a 93% success rate and classification with Support Vector Machine (SVM) had a 94% success rate. At close range, SVM and two of the best feature methods were used and returned a 98% success rate. When testing our wearable devices on users in real time, we used a classification technique to detect the direction and our wearable devices responded in 0.68 s; this saves power in comparison to traditional direction detection methods. Meanwhile, if there was an echo in an indoor environment, the success rate increased; the echo canceller was disabled in environments without an echo to save power. We also compared our system with the localization algorithm based on the microphone array; the wearable device that we developed had a high success rate and it produced faster results at lower cost than other methods. This study provides a new idea for the benefit of deaf individuals that is preferable to a computer environment. Full article
(This article belongs to the Special Issue Sound and Music Computing)

Open Access Feature Paper Article: Audio Time Stretching Using Fuzzy Classification of Spectral Bins
Appl. Sci. 2017, 7(12), 1293; doi:10.3390/app7121293
Received: 3 November 2017 / Revised: 3 December 2017 / Accepted: 7 December 2017 / Published: 12 December 2017
Abstract
A novel method for audio time stretching has been developed. In time stretching, the audio signal’s duration is expanded, whereas its frequency content remains unchanged. The proposed time stretching method employs the new concept of fuzzy classification of time-frequency points, or bins, in the spectrogram of the signal. Each time-frequency bin is assigned, using a continuous membership function, to three signal classes: tonalness, noisiness, and transientness. The method does not require the signal to be explicitly decomposed into different components, but instead, the computing of phase propagation, which is required for time stretching, is handled differently in each time-frequency point according to the fuzzy membership values. The new method is compared with three previous time-stretching methods by means of a listening test. The test results show that the proposed method yields slightly better sound quality for large stretching factors as compared to a state-of-the-art algorithm, and practically the same quality as a commercial algorithm. The sound quality of all tested methods is dependent on the audio signal type. According to this study, the proposed method performs well on music signals consisting of mixed tonal, noisy, and transient components, such as singing, techno music, and a jazz recording containing vocals. It performs less well on music containing only noisy and transient sounds, such as a drum solo. The proposed method is applicable to the high-quality time stretching of a wide variety of music signals. Full article
(This article belongs to the Special Issue Sound and Music Computing)
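The fuzzy bin classification at the heart of this method can be sketched in a few lines. The membership functions below (per-bin spectral flux for transientness, frequency-neighbourhood peakiness for tonalness, the remainder for noisiness) are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def fuzzy_bin_membership(mag):
    """Fuzzy membership of each spectrogram bin in three classes:
    tonalness, noisiness, transientness (each in [0, 1], summing to 1 per bin).

    mag: magnitude spectrogram, shape (freq_bins, frames).
    """
    eps = 1e-12
    # Transientness: relative energy increase over time (per-bin spectral flux).
    flux = np.maximum(np.diff(mag, axis=1, prepend=mag[:, :1]), 0.0)
    transient = np.clip(flux / (mag + eps), 0.0, 1.0)
    # Tonalness: prominence over the neighbouring frequency bins; a sharp
    # peak along frequency suggests a sinusoidal partial.
    neighbours = (np.roll(mag, 1, axis=0) + np.roll(mag, -1, axis=0)) / 2
    tonal = mag / (mag + neighbours + eps)
    # Noisiness: the membership not accounted for by tonalness.
    noisy = 1.0 - tonal
    # Normalize so the three memberships sum to one per bin.
    raw = np.stack([tonal, noisy, transient])
    return raw / (raw.sum(axis=0) + eps)
```

A time-stretching algorithm could then blend three phase-propagation rules per bin, weighting each rule by the corresponding membership.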

Open AccessFeature PaperArticle Automatic Transcription of Polyphonic Vocal Music
Appl. Sci. 2017, 7(12), 1285; doi:10.3390/app7121285
Received: 31 October 2017 / Revised: 1 December 2017 / Accepted: 4 December 2017 / Published: 11 December 2017
Abstract
This paper presents a method for automatic music transcription applied to audio recordings of a cappella performances with multiple singers. We propose a system for multi-pitch detection and voice assignment that integrates an acoustic and a music language model. The acoustic model performs spectrogram decomposition, extending probabilistic latent component analysis (PLCA) using a six-dimensional dictionary with pre-extracted log-spectral templates. The music language model performs voice separation and assignment using hidden Markov models that apply musicological assumptions. By integrating the two models, the system is able to detect multiple concurrent pitches in polyphonic vocal music and assign each detected pitch to a specific voice type such as soprano, alto, tenor or bass (SATB). We compare our system against multiple baselines, achieving state-of-the-art results for both multi-pitch detection and voice assignment on a dataset of Bach chorales and another of barbershop quartets. We also present an additional evaluation of our system using varied pitch tolerance levels to investigate its performance at 20-cent pitch resolution.
(This article belongs to the Special Issue Sound and Music Computing)
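The spectrogram-decomposition core can be illustrated with a plain two-factor model. Two-factor KL-NMF with multiplicative updates is numerically equivalent to basic PLCA; the paper's six-dimensional dictionary and voice-assignment HMMs are deliberately not modelled in this sketch:

```python
import numpy as np

def plca_like_decompose(V, K, n_iter=100, seed=0):
    """Decompose a nonnegative spectrogram V ≈ W @ H using the classic
    KL-divergence multiplicative updates (a simplified stand-in for PLCA).

    V: nonnegative matrix (freq_bins, frames); K: number of components.
    Returns spectral templates W (freq_bins, K) and activations H (K, frames).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + 0.1
    H = rng.random((K, T)) + 0.1
    eps = 1e-12
    for _ in range(n_iter):
        WH = W @ H + eps
        # Standard Lee–Seung updates for the KL objective.
        W *= ((V / WH) @ H.T) / (H.sum(axis=1) + eps)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return W, H
```

In the full system, fixed pre-extracted templates would replace the random `W`, and the activations would feed the music language model.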

Open AccessArticle The Effects of Musical Experience and Hearing Loss on Solving an Audio-Based Gaming Task
Appl. Sci. 2017, 7(12), 1278; doi:10.3390/app7121278
Received: 23 October 2017 / Revised: 2 December 2017 / Accepted: 5 December 2017 / Published: 10 December 2017
Abstract
We conducted an experiment using a purposefully designed audio-based game called the Music Puzzle with Japanese university students with different levels of hearing acuity and experience with music, in order to determine the effects of these factors on solving such games. A group of hearing-impaired students (n = 12) was compared with two hearing control groups characterized by high (n = 12) or low (n = 12) engagement in musical activities. The game was played with three sound sets or modes: speech, music, and a mix of the two. The results showed that people with hearing loss had longer processing times for sounds when playing the game. Solving the game task in the speech mode was found to be particularly difficult for the group with hearing loss, and while they found the game difficult in general, they expressed a fondness for the game and a preference for music. Participants with less musical experience showed difficulties in playing the game with musical material. We were able to explain the impacts of hearing acuity and musical experience; furthermore, we can promote this kind of tool as a viable way to train hearing through focused listening to sound, particularly music.
(This article belongs to the Special Issue Sound and Music Computing)

Open AccessFeature PaperArticle Optimization of Virtual Loudspeakers for Spatial Room Acoustics Reproduction with Headphones
Appl. Sci. 2017, 7(12), 1282; doi:10.3390/app7121282
Received: 31 October 2017 / Revised: 24 November 2017 / Accepted: 5 December 2017 / Published: 9 December 2017
Abstract
The use of headphones for reproducing spatial sound is becoming more and more popular. For instance, virtual reality applications often use head-tracking to keep the binaurally reproduced auditory environment stable and to improve externalization. Here, we study one spatial sound reproduction method over headphones, in particular the positioning of the virtual loudspeakers. The paper presents an algorithm that optimizes the positioning of virtual reproduction loudspeakers to reduce the computational cost of head-tracked real-time rendering. The listening test results suggest that listeners could discriminate the optimized loudspeaker arrays in renderings that reproduced a relatively simple acoustic condition, but the optimized array was not significantly different from an equally spaced array in the reproduction of a more complex case. Moreover, according to the verbal feedback of the test subjects, the optimization seems to change the perceived openness and timbre.
(This article belongs to the Special Issue Sound and Music Computing)
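The rendering scheme being optimized can be sketched directly: each virtual loudspeaker feed is convolved with the left/right head-related impulse responses (HRIRs) for its direction and summed, so the per-sample cost grows linearly with the number of virtual loudspeakers. The function below is a generic sketch of that signal path, not the paper's implementation:

```python
import numpy as np

def render_binaural(feeds, hrirs):
    """Binaural rendering over virtual loudspeakers.

    feeds: list of equal-length 1-D loudspeaker feed signals.
    hrirs: list of (left_ir, right_ir) pairs, one per virtual loudspeaker.
    Returns a (2, N) array of left/right ear signals.
    """
    n_out = len(feeds[0]) + len(hrirs[0][0]) - 1
    out = np.zeros((2, n_out))
    for feed, (h_left, h_right) in zip(feeds, hrirs):
        # One convolution pair per virtual loudspeaker: this loop is the
        # cost that shrinks when the loudspeaker count is optimized down.
        out[0] += np.convolve(feed, h_left)
        out[1] += np.convolve(feed, h_right)
    return out
```

With head-tracking, the HRIR set changes as the listener rotates, so fewer virtual loudspeakers directly means fewer real-time convolutions.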

Open AccessArticle Melodic Similarity and Applications Using Biologically-Inspired Techniques
Appl. Sci. 2017, 7(12), 1242; doi:10.3390/app7121242
Received: 30 September 2017 / Revised: 23 November 2017 / Accepted: 27 November 2017 / Published: 1 December 2017
Abstract
Music similarity is a complex concept that manifests itself in areas such as Music Information Retrieval (MIR), musicological analysis and music cognition. Modelling the similarity of two music items is key for a number of music-related applications, such as cover song detection and query-by-humming. Typically, similarity models are based on intuition, heuristics or small-scale cognitive experiments; thus, applicability to broader contexts cannot be guaranteed. We argue that data-driven tools and analysis methods, applied to songs known to be related, can potentially provide us with information regarding the fine-grained nature of music similarity. Interestingly, music and biological sequences share a number of parallel concepts, from their natural sequence representation to their mechanisms of generating variations, i.e., oral transmission and evolution respectively. As such, there is great potential for applying scientific methods and tools from bioinformatics to music. Stripped of biology-specific heuristics, certain bioinformatics approaches can be generalized to any type of sequence. Consequently, reliable and unbiased data-driven solutions to problems such as biological sequence similarity and conservation analysis can be applied to music similarity and stability analysis. Our paper relies on such an approach to tackle a number of tasks and, most notably, to model global melodic similarity.
(This article belongs to the Special Issue Sound and Music Computing)
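A concrete example of a bioinformatics tool carried over to melody is global sequence alignment (Needleman–Wunsch). Applied to pitch-interval sequences rather than raw pitches, the score is transposition-invariant; the match/mismatch/gap values below are illustrative, not the paper's tuned parameters:

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman–Wunsch global alignment score between two sequences
    (e.g. melodies encoded as pitch-interval lists)."""
    n, m = len(a), len(b)
    # D[i][j] = best score aligning a[:i] with b[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + s,   # align both symbols
                          D[i - 1][j] + gap,     # gap in b
                          D[i][j - 1] + gap)     # gap in a
    return D[n][m]
```

Two variants of the same tune then score highly even when one has been transposed or slightly ornamented.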

Open AccessArticle Exploring the Effects of Pitch Layout on Learning a New Musical Instrument
Appl. Sci. 2017, 7(12), 1218; doi:10.3390/app7121218
Received: 27 October 2017 / Revised: 21 November 2017 / Accepted: 21 November 2017 / Published: 24 November 2017
Abstract
Although isomorphic pitch layouts are proposed to afford various advantages for musicians playing new musical instruments, this paper details the first substantive set of empirical tests on how two fundamental aspects of isomorphic pitch layouts affect motor learning: shear, which makes the pitch axis vertical, and the adjacency (or nonadjacency) of pitches a major second apart. After receiving audio-visual training tasks for a scale and arpeggios, performance accuracies of 24 experienced musicians were assessed in immediate retention tasks (same as the training tasks, but without the audio-visual guidance) and in a transfer task (performance of a previously untrained nursery rhyme). Each participant performed the same tasks with three different pitch layouts and, in total, four different layouts were tested. Results show that, so long as the performance ceiling has not already been reached (due to ease of the task or repeated practice), adjacency strongly improves performance accuracy in the training and retention tasks. They also show that shearing the layout, to make the pitch axis vertical, worsens performance accuracy for the training tasks but, crucially, it strongly improves performance accuracy in the transfer task when the participant needs to perform a new, but related, task. These results can inform the design of pitch layouts in new musical instruments.
(This article belongs to the Special Issue Sound and Music Computing)
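An isomorphic layout assigns pitches linearly over a button lattice, so every interval has the same geometric shape wherever it is played; this is the invariant whose variants (shear, major-second adjacency) the study manipulates. A minimal sketch, with illustrative step sizes rather than any of the four layouts actually tested:

```python
def isomorphic_pitch(x, y, base=60, right=2, up=7):
    """MIDI pitch of the button at lattice position (x, y).

    Moving one step right adds `right` semitones (2 = a major second, making
    major seconds adjacent); moving one step up adds `up` semitones.
    Because the mapping is linear, transposition is a pure translation.
    """
    return base + right * x + up * y
```

Shearing such a layout changes the button geometry (e.g. to make equal pitches lie on a horizontal line) without changing this underlying linear map.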

Open AccessArticle EigenScape: A Database of Spatial Acoustic Scene Recordings
Appl. Sci. 2017, 7(11), 1204; doi:10.3390/app7111204
Received: 23 October 2017 / Revised: 21 November 2017 / Accepted: 8 November 2017 / Published: 22 November 2017
Abstract
The classification of acoustic scenes and events is an emerging area of research in the field of machine listening. Most of the research conducted so far uses spectral features extracted from monaural or stereophonic audio rather than spatial features extracted from multichannel recordings. This is partly due to the lack thus far of a substantial body of spatial recordings of acoustic scenes. This paper formally introduces EigenScape, a new database of fourth-order Ambisonic recordings of eight different acoustic scene classes. The potential applications of a spatial machine listening system are discussed before detailed information on the recording process and dataset is provided. A baseline spatial classification system using directional audio coding (DirAC) techniques is detailed, and results from this classifier are presented. The classifier is shown to give good overall scene classification accuracy across the dataset, with 7 of 8 scenes classified with greater than 60% accuracy and an 11% improvement in overall accuracy compared to using Mel-frequency cepstral coefficient (MFCC) features. Further analysis of the results shows potential improvements to the classifier. It is concluded that the results validate the new database and show that spatial features can characterise acoustic scenes and, as such, are worthy of further investigation.
(This article belongs to the Special Issue Sound and Music Computing)
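The DirAC-style spatial features mentioned here come from the active intensity vector of the Ambisonic signals. A heavily simplified first-order, horizontal-only sketch (single broadband estimate, no diffuseness or time-frequency tiling, unlike a full DirAC analysis):

```python
import numpy as np

def doa_azimuth_deg(w, x, y):
    """Estimate the horizontal direction of arrival (degrees) from
    first-order Ambisonic signals via the time-averaged active intensity.

    w: omnidirectional channel; x, y: first-order directional channels.
    """
    # Active intensity components are proportional to E[w * x] and E[w * y].
    ix = np.mean(w * x)
    iy = np.mean(w * y)
    return float(np.degrees(np.arctan2(iy, ix)))
```

Per-band statistics of such direction (and diffuseness) estimates are the kind of spatial feature that outperformed MFCCs in the baseline classifier.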

Open AccessArticle Identifying Single Trial Event-Related Potentials in an Earphone-Based Auditory Brain-Computer Interface
Appl. Sci. 2017, 7(11), 1197; doi:10.3390/app7111197
Received: 20 October 2017 / Accepted: 17 November 2017 / Published: 21 November 2017
Abstract
As brain-computer interfaces (BCI) must provide reliable ways for end users to accomplish a specific task, methods to secure the best possible translation of the intention of the users are constantly being explored. In this paper, we propose and test a number of convolutional neural network (CNN) structures to identify and classify single-trial P300 in electroencephalogram (EEG) readings of an auditory BCI. The recorded data correspond to nine subjects in a series of experiment sessions in which auditory stimuli following the oddball paradigm were presented via earphones from six different virtual directions at time intervals of 200, 300, 400 and 500 ms. Using three different approaches for the pooling process, we report the average accuracy for 18 CNN structures. The results obtained for most of the CNN models show clear improvement over past studies in similar contexts, as well as over other commonly-used classifiers. We found that the models that consider data from the time and space domains and those that overlap in the pooling process usually offer better results regardless of the number of layers. Additionally, patterns of improvement with single-layered CNN models can be observed.
(This article belongs to the Special Issue Sound and Music Computing)
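The pooling variants compared in the study differ mainly in whether the pooling windows overlap (stride smaller than the window size). The distinction is easy to see on a 1-D feature map; this sketch is framework-free and not tied to the paper's CNN definitions:

```python
import numpy as np

def max_pool_1d(x, size, stride):
    """Max pooling over a 1-D feature map.

    Windows overlap when stride < size (one of the pooling variants
    compared in the paper); stride == size gives non-overlapping pooling.
    """
    n = (len(x) - size) // stride + 1
    return np.array([x[i * stride : i * stride + size].max() for i in range(n)])
```

Overlapping pooling keeps more of the temporal detail of the EEG feature maps at the cost of larger outputs, which is one plausible reason it helped here.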

Open AccessFeature PaperArticle Sound Synthesis of Objects Swinging through Air Using Physical Models
Appl. Sci. 2017, 7(11), 1177; doi:10.3390/app7111177
Received: 12 October 2017 / Accepted: 10 November 2017 / Published: 16 November 2017
Abstract
A real-time physically-derived sound synthesis model is presented that replicates the sounds generated as an object swings through the air. Equations obtained from fluid dynamics are used to determine the sounds generated, while exposing practical parameters for a user or game engine to vary. Listening tests reveal that, for the majority of objects modelled, participants rated the sounds from our model as being as plausible as actual recordings. The sword sound effect performed worse than the others, and it is speculated that one cause may be linked to the difference between listeners’ expectations of a sound and the actual sound for a given object.
(This article belongs to the Special Issue Sound and Music Computing)
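A central fluid-dynamics relation for swoosh-like sounds of this kind is the Strouhal relation for vortex shedding behind a cylinder, f = St · v / d. Whether the paper uses exactly this form is not stated in the abstract, so treat the following as a sketch of the standard relation rather than the paper's model:

```python
def shedding_frequency(speed, diameter, strouhal=0.2):
    """Fundamental frequency (Hz) of vortex shedding behind a cylinder:
    f = St * v / d.

    St ≈ 0.2 is a good approximation for a cylinder over a wide range of
    Reynolds numbers; speed in m/s, diameter in m.
    """
    return strouhal * speed / diameter
```

Driving a narrow bandpass filter on noise with this frequency, as the swing speed varies over time, already yields a recognisable "whoosh"; thinner or faster objects whistle at higher pitch, as the formula predicts.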

Open AccessArticle SymCHM—An Unsupervised Approach for Pattern Discovery in Symbolic Music with a Compositional Hierarchical Model
Appl. Sci. 2017, 7(11), 1135; doi:10.3390/app7111135
Received: 13 September 2017 / Revised: 25 October 2017 / Accepted: 1 November 2017 / Published: 4 November 2017
Abstract
This paper presents a compositional hierarchical model for pattern discovery in symbolic music. The model can be regarded as a deep architecture with a transparent structure. It can learn a set of repeated patterns within individual works or larger corpora in an unsupervised manner, relying on statistics of pattern occurrences, and robustly infer the learned patterns in new, unknown works. A learned model contains representations of patterns on different layers, from simple short structures on lower layers to longer and more complex music structures on higher layers. A pattern selection procedure can be used to extract the most frequent patterns from the model. We evaluate the model on the publicly available JKU Patterns Datasets and compare the results to other approaches.
(This article belongs to the Special Issue Sound and Music Computing)
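The statistics the lower layers rely on can be approximated by counting repeated pitch-interval n-grams. This is a flat, single-layer analogue only; SymCHM itself composes shorter patterns into longer ones hierarchically:

```python
from collections import Counter

def repeated_interval_patterns(pitches, n=3, min_count=2):
    """Return length-n pitch-interval patterns that occur at least
    min_count times in a melody given as a list of MIDI pitches.

    Using intervals instead of absolute pitches makes the patterns
    transposition-invariant.
    """
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    grams = Counter(tuple(intervals[i:i + n])
                    for i in range(len(intervals) - n + 1))
    return {g: c for g, c in grams.items() if c >= min_count}
```

A hierarchical model would then treat frequent short patterns as parts and look for frequent co-occurrences of those parts on the next layer up.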

Open AccessArticle Supporting an Object-Oriented Approach to Unit Generator Development: The Csound Plugin Opcode Framework
Appl. Sci. 2017, 7(10), 970; doi:10.3390/app7100970
Received: 31 July 2017 / Revised: 8 September 2017 / Accepted: 18 September 2017 / Published: 21 September 2017
Abstract
This article presents a new framework for unit generator development for Csound, supporting a fully object-oriented programming approach. It introduces the concept of unit generators and opcodes, and their centrality to music programming languages in general, and to Csound in particular. The layout of an opcode from the perspective of the Csound C-language API is presented, with some outline code examples. This is followed by a discussion which places the unit generator within the object-oriented paradigm, and the motivation for full C++ programming support, which is provided by the Csound Plugin Opcode Framework (CPOF). The design of CPOF is then explored in detail, supported by several opcode examples. The article concludes by discussing two key applications of object-orientation and their respective instances in the Csound code base.
(This article belongs to the Special Issue Sound and Music Computing)
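The underlying idea — a unit generator as an object whose state lives in attributes and whose per-block processing is a method — can be shown in a few lines of Python. This is only a language-neutral illustration of the concept; actual CPOF opcodes are C++ classes registered with the Csound API, and the names below are not part of CPOF:

```python
class UnitGenerator:
    """Base class: a unit generator holds its state (here, the sample rate)
    and processes audio one block at a time."""
    def __init__(self, sample_rate):
        self.sample_rate = sample_rate

    def process(self, block):
        raise NotImplementedError

class Gain(UnitGenerator):
    """A trivial opcode-like unit generator that scales its input block."""
    def __init__(self, sample_rate, gain):
        super().__init__(sample_rate)
        self.gain = gain

    def process(self, block):
        return [self.gain * sample for sample in block]
```

The framework's contribution is making this object model available without the boilerplate of the C-level opcode struct and function-pointer registration.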

Open AccessArticle A Two-Stage Approach to Note-Level Transcription of a Specific Piano
Appl. Sci. 2017, 7(9), 901; doi:10.3390/app7090901
Received: 22 July 2017 / Revised: 25 August 2017 / Accepted: 29 August 2017 / Published: 2 September 2017
Abstract
This paper presents a two-stage transcription framework for a specific piano, which combines deep learning and spectrogram factorization techniques. In the first stage, two convolutional neural networks (CNNs) are adopted to recognize the notes of the piano preliminarily, and note verification for the specific individual piano is conducted in the second stage. The note recognition stage is independent of the individual piano: one CNN is used to detect onsets and another is used to estimate the probabilities of pitches at each detected onset. Hence, candidate pitches at candidate onsets are obtained in the first stage. During note verification, templates for the specific piano are generated to model the note attack for each pitch. Then, the spectrogram of the segment around each candidate onset is factorized using the attack templates of the candidate pitches. In this way, not only are the pitches picked up from the note activations, but the onsets are also refined. Experiments show that CNNs outperform other types of neural networks in both onset detection and pitch estimation, and that the combination of two CNNs yields better performance than a single CNN in note recognition. We also observe that note verification further improves the transcription performance. In the transcription of a specific piano, the proposed system achieves an 82% note-wise F-measure, outperforming the state of the art.
(This article belongs to the Special Issue Sound and Music Computing)

Open AccessArticle A Low Cost Wireless Acoustic Sensor for Ambient Assisted Living Systems
Appl. Sci. 2017, 7(9), 877; doi:10.3390/app7090877
Received: 31 July 2017 / Revised: 24 August 2017 / Accepted: 25 August 2017 / Published: 27 August 2017
Abstract
Ambient Assisted Living (AAL) has become an attractive research topic due to growing interest in the remote monitoring of older people. Developments in sensor technologies and advances in wireless communications make it possible to remotely offer smart assistance and to monitor those people in their own homes, increasing their quality of life. In this context, Wireless Acoustic Sensor Networks (WASN) provide a suitable way to implement AAL systems, which can be used to infer hazardous situations via the identification of environmental sounds. Nevertheless, no satisfying sensor solution has yet combined low cost with high performance. In this paper, we report the design and implementation of a wireless acoustic sensor, to be located at the edge of a WASN, for recording and processing environmental sounds; it can be applied to AAL systems for personal healthcare because it has the following significant advantages: low cost, small size, and audio sampling and computation capabilities for audio processing. The proposed wireless acoustic sensor is able to record audio samples at a sampling frequency of at least 10 kHz with 12-bit resolution. It is also capable of performing audio signal processing without compromising the sample rate or the energy consumption, by using a new microcontroller released in the last quarter of 2016. The proposed low cost wireless acoustic sensor has been verified using four randomness tests for statistical analysis and a classification system for the recorded sounds based on audio fingerprints.
(This article belongs to the Special Issue Sound and Music Computing)
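An audio fingerprint of the kind usable for on-sensor classification can be as small as a handful of bits per frame. The scheme below (one bit per band, set when the band's energy exceeds the frame's median band energy, compared by Hamming distance) is a toy illustration, not the paper's fingerprinting method:

```python
import numpy as np

def spectral_fingerprint(frame, n_bands=16):
    """Toy spectral fingerprint of one audio frame: a 0/1 vector with one
    bit per frequency band, set when the band's energy exceeds the median
    band energy of the frame."""
    spectrum = np.abs(np.fft.rfft(frame))
    bands = np.array_split(spectrum, n_bands)
    energies = np.array([band.sum() for band in bands])
    return (energies > np.median(energies)).astype(int)

def hamming_distance(fp_a, fp_b):
    """Number of differing bits between two fingerprints; small distances
    indicate similar sounds."""
    return int(np.sum(fp_a != fp_b))
```

Such bit signatures are cheap enough to compute and transmit from a microcontroller-class node, which is the design constraint the sensor targets.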

Review

Open AccessReview Room Response Equalization—A Review
Appl. Sci. 2018, 8(1), 16; doi:10.3390/app8010016
Received: 3 November 2017 / Revised: 7 December 2017 / Accepted: 12 December 2017 / Published: 23 December 2017
Abstract
Room response equalization aims at improving sound reproduction in rooms by applying advanced digital signal processing techniques to design an equalizer on the basis of one or more measurements of the room response. This topic has been intensively studied over the last 40 years, resulting in a number of effective techniques addressing different aspects of the problem. This review paper aims at giving an overview of the existing methods, following their historical evolution and discussing the pros and cons of each approach in relation to room characteristics, as well as instrumental and perceptual measures. The review concludes with a discussion of emerging topics and new trends.
(This article belongs to the Special Issue Sound and Music Computing)
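One classic building block surveyed in this literature is single-point inverse filtering with Tikhonov regularization, H_eq = conj(H) / (|H|² + β), which avoids the huge gains a plain 1/H would apply at the deep notches of a measured room response. A minimal frequency-domain sketch, with an illustrative regularization constant:

```python
import numpy as np

def regularized_inverse_filter(h, n_fft=1024, beta=1e-3):
    """Design an FIR equalizer for a measured impulse response h by
    regularized frequency-domain inversion:

        H_eq(w) = conj(H(w)) / (|H(w)|^2 + beta)

    beta > 0 limits the gain where |H| is small (deep room-response notches).
    Returns the length-n_fft equalizer impulse response.
    """
    H = np.fft.rfft(h, n_fft)
    H_eq = np.conj(H) / (np.abs(H) ** 2 + beta)
    return np.fft.irfft(H_eq, n_fft)
```

The equalized response H · H_eq = |H|² / (|H|² + β) approaches unity wherever the room response is strong, and gracefully backs off at the notches; much of the literature reviewed here refines exactly this trade-off (multi-point, perceptually weighted, or minimum-phase variants).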
