Special Issue "Audio Signal Processing"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics".

Deadline for manuscript submissions: closed (15 March 2016)

A printed edition of this Special Issue is available.

Special Issue Editor

Guest Editor
Prof. Dr. Vesa Valimaki

Department of Signal Processing and Acoustics, School of Electrical Engineering, Aalto University, P.O. Box 13000 FI-00076 Aalto, Espoo, Finland
Interests: acoustic signal processing; audio signal processing; audio systems; music technology

Special Issue Information

Dear Colleagues,

Audio signal processing is a highly active research field where digital signal processing theory meets human sound perception and real-time programming requirements. It has a wide range of applications in computers, gaming, and music technology, to name a few of the largest areas. Successful applications include, for example, perceptual audio coding, digital music synthesizers, and music recognition software. The fact that music is now often listened to on headphones from a mobile device leads to new problems related to background noise control and signal enhancement. Developments in processor technology, such as parallel computing, are changing the way signal-processing algorithms are designed for audio.

In this Special Issue we want to address recent advances in the following topics:

- Audio signal analysis
- Music information retrieval
- Enhancement and restoration of audio
- Audio equalization and filtering
- Audio effects processing
- Sound synthesis and modeling
- Audio coding
- Sound capture and noise control
- Sound source separation
- Room acoustics and spatial audio
- Signal processing for headphones and loudspeakers
- High-performance computing in audio

Submissions are invited for both original research and review articles. Additionally, invited papers based on excellent contributions to recent conferences in this field will be included in this Special Issue. It is hoped that this collection of high-quality works in audio signal processing will serve as an inspiration for future research in this field.

Prof. Dr. Vesa Valimaki
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1200 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • audio signal analysis
  • music information retrieval
  • enhancement and restoration of audio
  • audio equalization and filtering
  • audio effects processing
  • sound synthesis and modeling
  • audio coding
  • sound capture and noise control
  • sound source separation
  • room acoustics and spatial audio
  • signal processing for headphones and loudspeakers
  • high-performance computing in audio

Published Papers (20 papers)


Research

Open Access Article: Sinusoidal Parameter Estimation Using Quadratic Interpolation around Power-Scaled Magnitude Spectrum Peaks
Appl. Sci. 2016, 6(10), 306; doi:10.3390/app6100306
Received: 16 March 2016 / Accepted: 11 October 2016 / Published: 21 October 2016
Abstract
The magnitude of the Discrete Fourier Transform (DFT) of a discrete-time signal has a limited frequency definition. Quadratic interpolation over the three DFT samples surrounding magnitude peaks improves the estimation of parameters (frequency and amplitude) of resolved sinusoids beyond that limit. Interpolating on a rescaled magnitude spectrum using a logarithmic scale has been shown to improve those estimates. In this article, we show how to heuristically tune a power scaling parameter to outperform linear and logarithmic scaling at an equivalent computational cost. Although this power scaling factor is computed heuristically rather than analytically, it is shown to depend in a structured way on window parameters. Invariance properties of this family of estimators are studied and the existence of a bias due to noise is shown. Comparing to two state-of-the-art estimators, we show that an optimized power scaling has a lower systematic bias and lower mean-squared-error in noisy conditions for ten out of twelve common windowing functions. Full article
(This article belongs to the Special Issue Audio Signal Processing) Printed Edition available
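The quadratic interpolation idea in the abstract above can be illustrated with a minimal Python sketch that fits a parabola through the three power-scaled DFT magnitudes around a peak. The exponent p = 0.25, the Hann window, the test signal, and all function names are assumptions chosen for illustration; the article tunes the exponent per window, and those tuned values are not reproduced here.

```python
import cmath
import math


def hann(n_samples):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_samples)
            for n in range(n_samples)]


def dft_magnitude(x):
    """Magnitudes of DFT bins 0..N/2 (direct O(N^2) evaluation)."""
    n_samples = len(x)
    mags = []
    for k in range(n_samples // 2 + 1):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        mags.append(abs(acc))
    return mags


def interpolate_peak(mags, k, p):
    """Parabolic interpolation on |X|^p around bin k.

    Returns (fractional bin position, interpolated magnitude).
    p is a hypothetical scaling exponent, not a value from the article.
    """
    a, b, c = mags[k - 1] ** p, mags[k] ** p, mags[k + 1] ** p
    delta = 0.5 * (a - c) / (a - 2 * b + c)   # vertex offset in bins
    peak = b - 0.25 * (a - c) * delta         # parabola value at the vertex
    return k + delta, peak ** (1.0 / p)       # undo the power scaling


# Illustrative test signal: unit-amplitude cosine between two bins.
n_samples, true_bin = 256, 40.3
w = hann(n_samples)
x = [w[n] * math.cos(2 * math.pi * true_bin * n / n_samples)
     for n in range(n_samples)]

mags = dft_magnitude(x)
k_peak = max(range(1, len(mags) - 1), key=mags.__getitem__)
bin_est, mag_est = interpolate_peak(mags, k_peak, p=0.25)
amp_est = mag_est / (sum(w) / 2)              # compensate the window gain
```

Here `bin_est` refines the integer peak index `k_peak`, and multiplying it by the sampling rate divided by `n_samples` would give the frequency in hertz.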

Open Access Article: Passive Guaranteed Simulation of Analog Audio Circuits: A Port-Hamiltonian Approach
Appl. Sci. 2016, 6(10), 273; doi:10.3390/app6100273
Received: 25 April 2016 / Revised: 7 September 2016 / Accepted: 13 September 2016 / Published: 24 September 2016
Abstract
We present a method that generates passive-guaranteed stable simulations of analog audio circuits from electronic schematics for real-time applications. On the one hand, this method is based on a continuous-time power-balanced state-space representation structured into its energy-storing parts, dissipative parts, and external sources. On the other hand, a numerical scheme is especially designed to preserve this structure and the power balance. These state-space structures define the class of port-Hamiltonian systems. The derivation of this structured system associated with the electronic circuit is achieved by an automated analysis of the interconnection network combined with a dictionary of models for each elementary component. The numerical scheme is based on the combination of finite differences applied on the state (with respect to the time variable) and on the total energy (with respect to the state). This combination provides a discrete-time version of the power balance. This set of algorithms is valid for both the linear and nonlinear cases. Finally, three applications of increasing complexity are given: a diode clipper, a common-emitter bipolar-junction transistor amplifier, and a wah pedal. The results are compared to offline simulations obtained from a popular circuit simulator. Full article

Open Access Article: Adaptive Wavelet Threshold Denoising Method for Machinery Sound Based on Improved Fruit Fly Optimization Algorithm
Appl. Sci. 2016, 6(7), 199; doi:10.3390/app6070199
Received: 5 May 2016 / Revised: 27 June 2016 / Accepted: 1 July 2016 / Published: 6 July 2016
Abstract
As the sound signal of a machine contains abundant information and is easy to measure, acoustic-based monitoring or diagnosis systems exhibit obvious superiority, especially in some extreme conditions. However, the sound directly collected in the industrial field is always polluted. In order to eliminate noise components from machinery sound, a wavelet threshold denoising method optimized by an improved fruit fly optimization algorithm (WTD-IFOA) is proposed in this paper. The sound is first decomposed by wavelet transform (WT) to obtain the coefficients of each level. As the wavelet threshold functions proposed by Donoho were discontinuous, many modified functions with continuous first- and second-order derivatives were presented to realize adaptive denoising. However, the function-based denoising process is time-consuming and it is difficult to find optimal thresholds. To overcome these problems, the fruit fly optimization algorithm (FOA) was introduced to the process. Moreover, to avoid falling into local extremes, an improved fly distance range obeying a normal distribution was proposed on the basis of the original FOA. Then, the sound signal of a motor was recorded in a soundproof laboratory, and Gaussian white noise was added to the signal. The simulation results illustrated the effectiveness and superiority of the proposed approach by a comprehensive comparison among five typical methods. Finally, an industrial application on a shearer in a coal mining working face was performed to demonstrate the practical effect. Full article
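For readers unfamiliar with wavelet threshold denoising, the sketch below shows the classic hard and soft threshold functions applied to the detail coefficients of a one-level Haar transform. It is only an illustration under simplifying assumptions: the Haar wavelet, the single decomposition level, the fixed threshold value, and all function names are hypothetical choices, and the fruit fly optimization of thresholds described in the abstract is not implemented.

```python
import math


def hard_threshold(w, t):
    """Keep a coefficient unchanged if it exceeds the threshold, else zero it."""
    return w if abs(w) > t else 0.0


def soft_threshold(w, t):
    """Shrink a coefficient toward zero by t (zero inside [-t, t])."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length sequence."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def denoise(x, t):
    """Soft-threshold the detail coefficients and reconstruct the signal."""
    approx, detail = haar_forward(x)
    detail = [soft_threshold(d, t) for d in detail]
    return haar_inverse(approx, detail)


# Small fluctuations within each sample pair are removed, large steps survive.
cleaned = denoise([1.0, 1.2, 2.0, 1.9], t=0.5)
```

In a practical system the threshold would be chosen per decomposition level, which is the search problem the article addresses with its optimizer.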
Open Access Article: Eluding the Physical Constraints in a Nonlinear Interaction Sound Synthesis Model for Gesture Guidance
Appl. Sci. 2016, 6(7), 192; doi:10.3390/app6070192
Received: 15 March 2016 / Revised: 20 June 2016 / Accepted: 21 June 2016 / Published: 30 June 2016
Abstract
In this paper, a flexible control strategy for a synthesis model dedicated to nonlinear friction phenomena is proposed. This model makes it possible to synthesize different types of sound sources, such as creaky doors, singing glasses, squeaking wet plates or bowed strings. Based on the perceptual stance that a sound is perceived as the result of an action on an object, we propose a genuine source/filter synthesis approach that makes it possible to elude physical constraints induced by the coupling between the interacting objects. This approach makes it possible to independently control and freely combine the action and the object. Different implementations and applications related to computer animation, gesture learning for rehabilitation and expert gestures are presented at the end of this paper. Full article

Open Access Article: Modal Processor Effects Inspired by Hammond Tonewheel Organs
Appl. Sci. 2016, 6(7), 185; doi:10.3390/app6070185
Received: 16 March 2016 / Accepted: 13 June 2016 / Published: 28 June 2016
Abstract
In this design study, we introduce a novel class of digital audio effects that extend the recently introduced modal processor approach to artificial reverberation and effects processing. These pitch and distortion processing effects mimic the design and sonics of a classic additive-synthesis-based electromechanical musical instrument, the Hammond tonewheel organ. As a reverb effect, the modal processor simulates a room response as the sum of resonant filter responses. This architecture provides precise, interactive control over the frequency, damping, and complex amplitude of each mode. Into this framework, we introduce two types of processing effects: pitch effects inspired by the Hammond organ’s equal tempered “tonewheels”, “drawbar” tone controls, vibrato/chorus circuit, and distortion effects inspired by the pseudo-sinusoidal shape of its tonewheels and electromagnetic pickup distortion. The result is an effects processor that imprints the Hammond organ’s sonics onto any audio input. Full article

Open Access Article: Metrics for Polyphonic Sound Event Detection
Appl. Sci. 2016, 6(6), 162; doi:10.3390/app6060162
Received: 26 February 2016 / Revised: 22 April 2016 / Accepted: 18 May 2016 / Published: 25 May 2016
Abstract
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources active simultaneously. The system output in this case contains overlapping events, marked as multiple sounds detected as being active at the same time. The polyphonic system output requires a suitable procedure for evaluation against a reference. Metrics from neighboring fields such as speech recognition and speaker diarization can be used, but they need to be partially redefined to deal with the overlapping events. We present a review of the most common metrics in the field and the way they are adapted and interpreted in the polyphonic case. We discuss segment-based and event-based definitions of each metric and explain the consequences of instance-based and class-based averaging using a case study. In parallel, we provide a toolbox containing implementations of presented metrics. Full article
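To make the segment-based evaluation idea concrete, the following Python sketch computes an F-score and an error rate from per-segment sets of active event labels. It follows one common formulation (substitutions, deletions, and insertions counted per segment) and is an assumption-laden illustration, not the API of the toolbox mentioned in the abstract; the function name and the example labels are hypothetical.

```python
def segment_based_scores(reference, estimate):
    """Segment-based F-score and error rate for polyphonic event detection.

    reference, estimate: lists of sets, one set of active labels per segment.
    """
    tp = fp = fn = 0
    subs = dels = ins = n_ref = 0
    for ref, est in zip(reference, estimate):
        seg_tp = len(ref & est)          # labels active in both
        seg_fp = len(est - ref)          # detected but not in the reference
        seg_fn = len(ref - est)          # in the reference but missed
        tp += seg_tp
        fp += seg_fp
        fn += seg_fn
        subs += min(seg_fn, seg_fp)      # a miss paired with a false alarm
        dels += max(0, seg_fn - seg_fp)  # remaining misses
        ins += max(0, seg_fp - seg_fn)   # remaining false alarms
        n_ref += len(ref)
    denom = 2 * tp + fp + fn
    f_score = 2 * tp / denom if denom else 0.0
    error_rate = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f_score, error_rate


# Three one-second segments with overlapping events (illustrative data).
reference = [{"speech"}, {"speech", "car"}, set()]
estimate = [{"speech"}, {"car", "dog"}, {"dog"}]
f_score, error_rate = segment_based_scores(reference, estimate)
```

In this toy example the second segment contributes one substitution and the third one insertion, so the error rate is 2/3 while the F-score is 4/7.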

Open Access Article: Chord Recognition Based on Temporal Correlation Support Vector Machine
Appl. Sci. 2016, 6(5), 157; doi:10.3390/app6050157
Received: 11 February 2016 / Revised: 6 May 2016 / Accepted: 6 May 2016 / Published: 19 May 2016
Abstract
In this paper, we propose a method called temporal correlation support vector machine (TCSVM) for automatic major-minor chord recognition in audio music. We first use robust principal component analysis to separate the singing voice from the music to reduce the influence of the singing voice and consider the temporal correlations of the chord features. Using robust principal component analysis, we expect the low-rank component of the spectrogram matrix to contain the musical accompaniment and the sparse component to contain the vocal signals. Then, we extract a new logarithmic pitch class profile (LPCP) feature called enhanced LPCP from the low-rank part. To exploit the temporal correlation among the LPCP features of chords, we propose an improved support vector machine algorithm called TCSVM. We perform this study using the MIREX’09 (Music Information Retrieval Evaluation eXchange) Audio Chord Estimation dataset. Furthermore, we conduct comprehensive experiments using different pitch class profile feature vectors to examine the performance of TCSVM. The results of our method are comparable to the state-of-the-art methods that entered the MIREX in 2013 and 2014 for the MIREX’09 Audio Chord Estimation task dataset. Full article

Open Access Article: Dynamical Systems for Audio Synthesis: Embracing Nonlinearities and Delay-Free Loops
Appl. Sci. 2016, 6(5), 134; doi:10.3390/app6050134
Received: 1 February 2016 / Revised: 19 April 2016 / Accepted: 27 April 2016 / Published: 10 May 2016
Abstract
Many systems featuring nonlinearities and delay-free loops are of interest in digital audio, particularly in virtual analog and physical modeling applications. Many of these systems can be posed as systems of implicitly related ordinary differential equations. Provided each equation in the network is itself an explicit one, straightforward numerical solvers may be employed to compute the output of such systems without resorting to linearization or matrix inversions for every parameter change. This is a cheap and effective means for synthesizing delay-free, nonlinear systems without resorting to large lookup tables, iterative methods, or the insertion of fictitious delay and is therefore suitable for real-time applications. Several examples are shown to illustrate the efficacy of this approach. Full article

Open Access Article: Two-Polarisation Physical Model of Bowed Strings with Nonlinear Contact and Friction Forces, and Application to Gesture-Based Sound Synthesis
Appl. Sci. 2016, 6(5), 135; doi:10.3390/app6050135
Received: 15 March 2016 / Revised: 15 April 2016 / Accepted: 26 April 2016 / Published: 10 May 2016
Abstract
Recent bowed string sound synthesis has relied on physical modelling techniques; the achievable realism and flexibility of gestural control are appealing, and the heavier computational cost becomes less significant as technology improves. A bowed string sound synthesis algorithm is designed, by simulating two-polarisation string motion, discretising the partial differential equations governing the string’s behaviour with the finite difference method. A globally energy balanced scheme is used, as a guarantee of numerical stability under highly nonlinear conditions. In one polarisation, a nonlinear contact model is used for the normal forces exerted by the dynamic bow hair, left hand fingers, and fingerboard. In the other polarisation, a force-velocity friction curve is used for the resulting tangential forces. The scheme update requires the solution of two nonlinear vector equations. The dynamic input parameters allow for simulating a wide range of gestures; some typical bow and left hand gestures are presented, along with synthetic sound and video demonstrations. Full article

Open Access Article: Psychoacoustic Approaches for Harmonic Music Mixing
Appl. Sci. 2016, 6(5), 123; doi:10.3390/app6050123
Received: 29 February 2016 / Revised: 20 April 2016 / Accepted: 25 April 2016 / Published: 3 May 2016
Abstract
The practice of harmonic mixing is a technique used by DJs for the beat-synchronous and harmonic alignment of two or more pieces of music. In this paper, we present a new harmonic mixing method based on psychoacoustic principles. Unlike existing commercial DJ-mixing software, which determines compatible matches between songs via key estimation and harmonic relationships in the circle of fifths, our approach is built around the measurement of musical consonance. Given two tracks, we first extract a set of partials using a sinusoidal model and average this information over sixteenth note temporal frames. By scaling the partials of one track over ±6 semitones (in 1/8th semitone steps), we determine the pitch-shift that maximizes the consonance of the resulting mix. For this, we measure the consonance between all combinations of dyads within each frame according to psychoacoustic models of roughness and pitch commonality. To evaluate our method, we conducted a listening test where short musical excerpts were mixed together under different pitch shifts and rated according to consonance and pleasantness. Results demonstrate that sensory roughness computed from a small number of partials in each of the musical audio signals constitutes a reliable indicator to yield maximum perceptual consonance and pleasantness ratings by musically-trained listeners. Full article

Open Access Article: Full-Band Quasi-Harmonic Analysis and Synthesis of Musical Instrument Sounds with Adaptive Sinusoids
Appl. Sci. 2016, 6(5), 127; doi:10.3390/app6050127
Received: 16 February 2016 / Revised: 18 April 2016 / Accepted: 19 April 2016 / Published: 2 May 2016
Abstract
Sinusoids are widely used to represent the oscillatory modes of musical instrument sounds in both analysis and synthesis. However, musical instrument sounds feature transients and instrumental noise that are poorly modeled with quasi-stationary sinusoids, requiring spectral decomposition and further dedicated modeling. In this work, we propose a full-band representation that fits sinusoids across the entire spectrum. We use the extended adaptive Quasi-Harmonic Model (eaQHM) to iteratively estimate amplitude- and frequency-modulated (AM–FM) sinusoids able to capture challenging features such as sharp attacks, transients, and instrumental noise. We use the signal-to-reconstruction-error ratio (SRER) as the objective measure for the analysis and synthesis of 89 musical instrument sounds from different instrumental families. We compare against quasi-stationary sinusoids and exponentially damped sinusoids. First, we show that the SRER increases with adaptation in eaQHM. Then, we show that full-band modeling with eaQHM captures partials at the higher frequency end of the spectrum that are neglected by spectral decomposition. Finally, we demonstrate that a frame size equal to three periods of the fundamental frequency results in the highest SRER with AM–FM sinusoids from eaQHM. A listening test confirmed that the musical instrument sounds resynthesized from full-band analysis with eaQHM are virtually perceptually indistinguishable from the original recordings. Full article
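The objective measure named in the abstract can be sketched in a few lines of Python. The definition used here, 20 log10 of the ratio between the standard deviation of the original signal and that of the reconstruction error, is the one commonly associated with SRER and is stated as an assumption because the abstract does not spell it out; the function name and sample values are hypothetical.

```python
import math
import statistics


def srer_db(original, reconstruction):
    """Signal-to-reconstruction-error ratio in decibels (assumed definition)."""
    residual = [x - y for x, y in zip(original, reconstruction)]
    sigma_x = statistics.pstdev(original)
    sigma_e = statistics.pstdev(residual)
    if sigma_e == 0.0:
        return math.inf                  # perfect reconstruction
    return 20.0 * math.log10(sigma_x / sigma_e)


# A reconstruction that is off by 10% everywhere gives about 20 dB.
x = [1.0, -1.0, 1.0, -1.0]
x_hat = [0.9, -0.9, 0.9, -0.9]
value = srer_db(x, x_hat)
```

A higher value means the sinusoidal model leaves less residual energy, which is the sense in which the abstract reports that SRER increases with adaptation.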

Open Access Article: Augmenting Environmental Interaction in Audio Feedback Systems
Appl. Sci. 2016, 6(5), 125; doi:10.3390/app6050125
Received: 2 March 2016 / Accepted: 18 April 2016 / Published: 28 April 2016
Abstract
Audio feedback is defined as a positive feedback of acoustic signals where an audio input and output form a loop, and may be utilized artistically. This article presents new context-based controls over audio feedback, leading to the generation of desired sonic behaviors by enriching the influence of existing acoustic information such as room response and ambient noise. This ecological approach to audio feedback emphasizes mutual sonic interaction between signal processing and the acoustic environment. Mappings from analyses of the received signal to signal-processing parameters are designed to emphasize this specificity as an aesthetic goal. Our feedback system presents four types of mappings: approximate analyses of room reverberation to tempo-scale characteristics, ambient noise to amplitude and two different approximations of resonances to timbre. These mappings are validated computationally and evaluated experimentally in different acoustic conditions. Full article
Open Access Article: Blockwise Frequency Domain Active Noise Controller Over Distributed Networks
Appl. Sci. 2016, 6(5), 124; doi:10.3390/app6050124
Received: 2 March 2016 / Accepted: 20 April 2016 / Published: 28 April 2016
Abstract
This work presents a practical active noise control system composed of distributed and collaborative acoustic nodes. To this end, experimental tests have been carried out in a listening room with acoustic nodes equipped with loudspeakers and microphones. The communication among the nodes is simulated by software. We have considered a distributed algorithm based on the Filtered-x Least Mean Square (FxLMS) method that introduces collaboration between nodes following an incremental strategy. For improving the processing efficiency in practical scenarios where data acquisition systems work by blocks of samples, the frequency-domain partitioned block technique has been used. Implementation aspects such as computational complexity, processing time of the network and convergence of the algorithm have been analyzed. Experimental results show that, without constraints in the network communications, the proposed distributed algorithm achieves the same performance as the centralized version. The performance of the proposed algorithm over a network with a given communication delay is also included. Full article
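As background for the FxLMS method named in the abstract, the sketch below implements the basic single-channel, sample-by-sample FxLMS update in plain Python. It is deliberately not the distributed, frequency-domain partitioned block algorithm of the article: the single node, the time-domain update, the perfectly known secondary path, the path coefficients, the step size, and all names are assumptions made for illustration.

```python
import math


def fxlms(x, d, s, n_taps, mu):
    """Single-channel time-domain FxLMS; returns the residual error signal.

    x: reference signal, d: disturbance at the error microphone,
    s: secondary-path impulse response (assumed known exactly here),
    n_taps: controller length, mu: step size.
    """
    w = [0.0] * n_taps        # adaptive controller coefficients
    x_buf = [0.0] * n_taps    # recent reference samples, newest first
    fx_buf = [0.0] * n_taps   # recent filtered-reference samples
    xs_buf = [0.0] * len(s)   # reference history for filtering through s
    y_buf = [0.0] * len(s)    # controller output history
    errors = []
    for n in range(len(x)):
        x_buf = [x[n]] + x_buf[:-1]
        xs_buf = [x[n]] + xs_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))
        y_buf = [y] + y_buf[:-1]
        # Residual: disturbance minus controller output seen through s.
        e = d[n] - sum(sj * yj for sj, yj in zip(s, y_buf))
        # Reference filtered through the secondary-path model.
        fx = sum(sj * xj for sj, xj in zip(s, xs_buf))
        fx_buf = [fx] + fx_buf[:-1]
        # LMS update driven by the filtered reference.
        w = [wi + mu * e * fi for wi, fi in zip(w, fx_buf)]
        errors.append(e)
    return errors


# Illustrative setup: tonal noise, short primary and secondary paths.
n_samples = 4000
x = [math.sin(2 * math.pi * 0.05 * n) for n in range(n_samples)]
primary = [0.0, 0.8, 0.3]
d = [sum(primary[j] * x[n - j] for j in range(len(primary)) if n - j >= 0)
     for n in range(n_samples)]
errors = fxlms(x, d, s=[0.5, 0.25], n_taps=8, mu=0.01)

early_power = sum(e * e for e in errors[:200]) / 200
late_power = sum(e * e for e in errors[-200:]) / 200
```

The sign convention here subtracts the anti-noise from the disturbance; acoustic formulations that add the two signals simply flip the sign of the update. Comparing `late_power` with `early_power` gives a rough view of how much the residual decays as the controller adapts.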

Open Access Article: Influence of the Quality of Consumer Headphones in the Perception of Spatial Audio
Appl. Sci. 2016, 6(4), 117; doi:10.3390/app6040117
Received: 29 February 2016 / Revised: 4 April 2016 / Accepted: 12 April 2016 / Published: 22 April 2016
Abstract
High quality headphones can generate a realistic sound immersion reproducing binaural recordings. However, most people commonly use consumer headphones of inferior quality, such as the ones provided with smartphones or music players. Factors such as weak frequency response, distortion and the sensitivity disparity between the left and right transducers could be some of the degrading factors. In this work, we are studying how these factors affect spatial perception. For this purpose, a series of perceptual tests have been carried out with a virtual headphone listening test methodology. The first experiment focuses on the analysis of how the disparity of sensitivity between the two transducers affects the final result. The second test studies the influence of the frequency response relating quality and spatial impression. The third test analyzes the effects of distortion using a Volterra kernels scheme for the simulation of the distortion using convolutions. Finally, the fourth tries to relate the quality of the frequency response with the accuracy of azimuth localization. The conclusions of the experiments are: the disparity between both transducers can affect the localization of the source; the perception of quality and spatial impression has a high correlation; the distortion produced by the range of headphones tested at a fixed level does not affect the perception of binaural sound; and that some frequency bands have an important role in the front-back confusions. Full article
Open Access Article: Semantically Controlled Adaptive Equalisation in Reduced Dimensionality Parameter Space
Appl. Sci. 2016, 6(4), 116; doi:10.3390/app6040116
Received: 24 February 2016 / Revised: 4 April 2016 / Accepted: 5 April 2016 / Published: 20 April 2016
Abstract
Equalisation is one of the most commonly-used tools in sound production, allowing users to control the gains of different frequency components in an audio signal. In this paper we present a model for mapping a set of equalisation parameters to a reduced dimensionality space. The purpose of this approach is to allow a user to interact with the system in an intuitive way through both the reduction of the number of parameters and the elimination of technical knowledge required to creatively equalise the input audio. The proposed model represents 13 equaliser parameters on a two-dimensional plane, which is trained with data extracted from a semantic equalisation plug-in, using the timbral adjectives warm and bright. We also include a parameter weighting stage in order to scale the input parameters to spectral features of the audio signal, making the system adaptive. To maximise the efficacy of the model, we evaluate a variety of dimensionality reduction and regression techniques, assessing the performance of both parameter reconstruction and structural preservation in low-dimensional space. After selecting an appropriate model based on the evaluation criteria, we conclude by subjectively evaluating the system using listening tests. Full article
(This article belongs to the Special Issue Audio Signal Processing) Printed Edition available

Open AccessArticle Frequency-Dependent Amplitude Panning for the Stereophonic Image Enhancement of Audio Recorded Using Two Closely Spaced Microphones
Appl. Sci. 2016, 6(2), 39; doi:10.3390/app6020039
Received: 19 November 2015 / Revised: 19 January 2016 / Accepted: 21 January 2016 / Published: 1 February 2016
PDF Full-text (1736 KB) | HTML Full-text | XML Full-text
Abstract
In this paper, we propose a new frequency-dependent amplitude panning method for stereophonic image enhancement applied to a sound source recorded using two closely spaced omni-directional microphones. The ability to detect the direction of such a sound source is limited due to weak spatial information, such as the inter-channel time difference (ICTD) and inter-channel level difference (ICLD). Moreover, when sound sources are recorded in a convolutive or a real room environment, the detection of sources is affected by reverberation effects. Thus, the proposed method first tries to estimate the source direction depending on the frequency using azimuth-frequency analysis. Then, a frequency-dependent amplitude panning technique is proposed to enhance the stereophonic image by modifying the stereophonic law of sines. To demonstrate the effectiveness of the proposed method, we compare its performance with that of a conventional method based on the beamforming technique in terms of directivity pattern, perceived direction, and quality degradation under three different recording conditions (anechoic, convolutive, and real reverberant). The comparison shows that the proposed method gives us better stereophonic images in a stereo loudspeaker reproduction than the conventional method without any annoying effects. Full article
(This article belongs to the Special Issue Audio Signal Processing) Printed Edition available
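The abstract above builds on the stereophonic law of sines. As a rough illustration only, the sketch below computes loudspeaker gains from the classical, frequency-independent form of that law, sin(θ_v)/sin(θ_0) = (g_L − g_R)/(g_L + g_R). It is not the modified, frequency-dependent panning proposed in the article. The function name `sine_law_gains`, the sign convention (positive angles toward the left loudspeaker), the default 30-degree half-angle, and the constant-power normalisation are all assumptions made for this example.

```python
import math


def sine_law_gains(source_deg, speaker_deg=30.0):
    """Return (g_left, g_right) for a phantom source at source_deg.

    Assumptions for this sketch: loudspeakers sit at +/- speaker_deg,
    positive angles point toward the left loudspeaker, and the gains
    are normalised so that g_left**2 + g_right**2 == 1.
    """
    if not 0.0 < speaker_deg < 90.0:
        raise ValueError("speaker_deg must lie strictly between 0 and 90")
    if abs(source_deg) > speaker_deg:
        raise ValueError("source must lie between the loudspeakers")

    # Classical law of sines: sin(theta_v) / sin(theta_0) = (gL - gR) / (gL + gR)
    ratio = math.sin(math.radians(source_deg)) / math.sin(math.radians(speaker_deg))

    # Any pair proportional to (1 + ratio, 1 - ratio) satisfies the law.
    g_left, g_right = 1.0 + ratio, 1.0 - ratio

    # Constant-power normalisation.
    norm = math.hypot(g_left, g_right)
    return g_left / norm, g_right / norm
```

With these conventions, a source at 0 degrees gets equal gains, and a source placed exactly at the left loudspeaker angle gets all of the signal in the left channel.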
Open AccessArticle Auralization of Accelerating Passenger Cars Using Spectral Modeling Synthesis
Appl. Sci. 2016, 6(1), 5; doi:10.3390/app6010005
Received: 28 September 2015 / Revised: 14 December 2015 / Accepted: 15 December 2015 / Published: 24 December 2015
Cited by 1 | PDF Full-text (4064 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
While the technique of auralization has been in use for quite some time in architectural acoustics, the application to environmental noise has been discovered only recently. With road traffic noise being the dominant noise source in most countries, particular interest lies in the synthesis of realistic pass-by sounds. This article describes an auralizator for pass-bys of accelerating passenger cars. The key element is a synthesizer that simulates the acoustical emission of different vehicles, driving on different surfaces, under different operating conditions. Audio signals for the emitted tire noise and the propulsion noise are generated using spectral modeling synthesis, which gives complete control of the signal characteristics. The sound of propulsion is synthesized as a function of instantaneous engine speed, engine load and emission angle, whereas the sound of tires is created as a function of vehicle speed and emission angle. The sound propagation is simulated by applying a series of time-variant digital filters. To obtain the corresponding steering parameters of the synthesizer, controlled experiments were carried out. The tire noise parameters were determined from coast-by measurements of passenger cars with idling engines. To obtain the propulsion noise parameters, measurements at different engine speeds, engine loads and emission angles were performed using a chassis dynamometer. The article shows how, from the measured data, the synthesizer parameters are calculated using audio signal processing. Full article
(This article belongs to the Special Issue Audio Signal Processing) Printed Edition available

Review

Open AccessReview A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds
Appl. Sci. 2016, 6(5), 143; doi:10.3390/app6050143
Received: 15 March 2016 / Revised: 22 April 2016 / Accepted: 28 April 2016 / Published: 12 May 2016
Cited by 4 | PDF Full-text (789 KB) | HTML Full-text | XML Full-text
Abstract
Endowing machines with sensing capabilities similar to those of humans is a prevalent quest in engineering and computer science. In the pursuit of making computers sense their surroundings, a huge effort has been made to allow machines and computers to acquire, process, analyze and understand their environment in a human-like way. Focusing on the sense of hearing, the ability of computers to sense their acoustic environment as humans do goes by the name of machine hearing. To achieve this ambitious aim, the representation of the audio signal is of paramount importance. In this paper, we present an up-to-date review of the most relevant audio feature extraction techniques developed to analyze the most usual audio signals: speech, music and environmental sounds. Besides revisiting classic approaches for completeness, we include the latest advances in the field based on new domains of analysis together with novel bio-inspired proposals. These approaches are described following a taxonomy that organizes them according to their physical or perceptual basis, and then subdivides them by domain of computation (time, frequency, wavelet, image-based, cepstral, or other domains). The description of the approaches is accompanied by recent examples of their application to machine hearing problems. Full article
(This article belongs to the Special Issue Audio Signal Processing) Printed Edition available

Open AccessReview All About Audio Equalization: Solutions and Frontiers
Appl. Sci. 2016, 6(5), 129; doi:10.3390/app6050129
Received: 15 March 2016 / Revised: 15 April 2016 / Accepted: 19 April 2016 / Published: 6 May 2016
Cited by 3 | PDF Full-text (1883 KB) | HTML Full-text | XML Full-text
Abstract
Audio equalization is a vast and active research area. The extent of research means that one often cannot identify the preferred technique for a particular problem. This review paper bridges those gaps, systemically providing a deep understanding of the problems and approaches in audio equalization, their relative merits and applications. Digital signal processing techniques for modifying the spectral balance in audio signals and applications of these techniques are reviewed, ranging from classic equalizers to emerging designs based on new advances in signal processing and machine learning. Emphasis is placed on putting the range of approaches within a common mathematical and conceptual framework. The application areas discussed herein are diverse, and include well-defined, solvable problems of filter design subject to constraints, as well as newly emerging challenges that touch on problems in semantics, perception and human computer interaction. Case studies are given in order to illustrate key concepts and how they are applied in practice. We also recommend preferred signal processing approaches for important audio equalization problems. Finally, we discuss current challenges and the uncharted frontiers in this field. The source code for methods discussed in this paper is made available at https://code.soundsoftware.ac.uk/projects/allaboutaudioeq. Full article
(This article belongs to the Special Issue Audio Signal Processing) Printed Edition available

Open AccessReview A Review of Time-Scale Modification of Music Signals
Appl. Sci. 2016, 6(2), 57; doi:10.3390/app6020057
Received: 22 December 2015 / Revised: 22 January 2016 / Accepted: 25 January 2016 / Published: 18 February 2016
Cited by 1 | PDF Full-text (1618 KB) | HTML Full-text | XML Full-text
Abstract
Time-scale modification (TSM) is the task of speeding up or slowing down an audio signal’s playback speed without changing its pitch. In digital music production, TSM has become an indispensable tool, which is nowadays integrated in a wide range of music production software. Music signals are diverse—they comprise harmonic, percussive, and transient components, among others. Because of this wide range of acoustic and musical characteristics, there is no single TSM method that can cope with all kinds of audio signals equally well. Our main objective is to foster a better understanding of the capabilities and limitations of TSM procedures. To this end, we review fundamental TSM methods, discuss typical challenges, and indicate potential solutions that combine different strategies. In particular, we discuss a fusion approach that involves recent techniques for harmonic-percussive separation along with time-domain and frequency-domain TSM procedures. Full article
(This article belongs to the Special Issue Audio Signal Processing) Printed Edition available
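To make the idea of time-scale modification more concrete, here is a minimal sketch of plain overlap-add (OLA), one basic time-domain approach. It is an illustration under stated assumptions, not the fusion approach discussed in the review. The function name `ola_time_stretch`, the Hann window, and the default frame and hop sizes are choices made for this example. Frames are read from the input at an analysis hop of roughly `synth_hop / alpha` and written to the output at `synth_hop`, so `alpha > 1` lengthens the signal and `alpha < 1` shortens it.

```python
import math


def ola_time_stretch(x, alpha, frame_len=256, synth_hop=128):
    """Time-stretch the sample list x by a factor alpha using plain OLA.

    Illustrative sketch: frame_len, synth_hop and the Hann window are
    assumptions of this example, not values taken from the review.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if len(x) < frame_len:
        raise ValueError("input must contain at least one full frame")

    # Analysis hop: how far the read position advances per frame.
    ana_hop = max(1, round(synth_hop / alpha))
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]

    n_frames = 1 + (len(x) - frame_len) // ana_hop
    out_len = (n_frames - 1) * synth_hop + frame_len
    y = [0.0] * out_len
    weight = [0.0] * out_len  # running sum of window values per output sample

    for m in range(n_frames):
        read, write = m * ana_hop, m * synth_hop
        for n in range(frame_len):
            y[write + n] += window[n] * x[read + n]
            weight[write + n] += window[n]

    # Divide by the accumulated window so overlapping frames do not change the level.
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, weight)]
```

Because plain OLA ignores the phase relationship between successive frames, it tends to produce audible artifacts on harmonic material, which is one reason more elaborate time-domain and frequency-domain procedures exist.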

Journal Contact

MDPI AG
Applied Sciences Editorial Office
St. Alban-Anlage 66, 4052 Basel, Switzerland
E-Mail: 
Tel. +41 61 683 77 34
Fax: +41 61 302 89 18