Applied Sciences

Research

22 pages, 714 KiB

Open Access Article

Sinusoidal Parameter Estimation Using Quadratic Interpolation around Power-Scaled Magnitude Spectrum Peaks

by Kurt James Werner and François Georges Germain

Appl. Sci. 2016, 6(10), 306; https://doi.org/10.3390/app6100306 - 21 Oct 2016

Cited by 12 | Viewed by 8480

Abstract

The magnitude of the Discrete Fourier Transform (DFT) of a discrete-time signal has a limited frequency definition. Quadratic interpolation over the three DFT samples surrounding magnitude peaks improves the estimation of parameters (frequency and amplitude) of resolved sinusoids beyond that limit. Interpolating on a rescaled magnitude spectrum using a logarithmic scale has been shown to improve those estimates. In this article, we show how to heuristically tune a power scaling parameter to outperform linear and logarithmic scaling at an equivalent computational cost. Although this power scaling factor is computed heuristically rather than analytically, it is shown to depend in a structured way on window parameters. Invariance properties of this family of estimators are studied and the existence of a bias due to noise is shown. Comparing to two state-of-the-art estimators, we show that an optimized power scaling has a lower systematic bias and lower mean-squared-error in noisy conditions for ten out of twelve common windowing functions.

(This article belongs to the Special Issue Audio Signal Processing)
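The interpolation idea in the abstract above can be illustrated with a minimal sketch: fit a parabola through the three DFT magnitude samples around a peak, after raising them to a power `p`. The function name, the test signal, and the exponent `p = 0.23` are assumptions chosen for illustration with a Hann window; they are not taken from the article, which tunes the exponent per window.

```python
import numpy as np

def interpolate_peak(mag, k, p=1.0):
    """Parabolic interpolation around bin k on the power-scaled magnitudes mag**p.

    Returns (fractional peak bin, interpolated peak magnitude on the linear scale).
    """
    a, b, c = mag[k - 1:k + 2] ** p          # three samples around the peak
    delta = 0.5 * (a - c) / (a - 2.0 * b + c)  # vertex offset in bins
    peak = (b - 0.25 * (a - c) * delta) ** (1.0 / p)  # undo the power scaling
    return k + delta, peak

# Hypothetical test signal: one unit-amplitude sinusoid at a non-integer bin.
n = 1024
true_bin = 100.3
window = np.hanning(n)
x = np.cos(2.0 * np.pi * true_bin * np.arange(n) / n)
mag = np.abs(np.fft.rfft(window * x))
k = int(np.argmax(mag))

bin_linear, _ = interpolate_peak(mag, k, p=1.0)            # linear magnitude
bin_power, peak_power = interpolate_peak(mag, k, p=0.23)   # assumed exponent
amp_power = 2.0 * peak_power / window.sum()                # amplitude estimate
```

In this noiseless example the power-scaled estimate lands closer to the true bin than the linear-scale one, which is the qualitative behaviour the abstract describes. Other windows would need a different exponent.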

26 pages, 2739 KiB

Open Access Article

Passive Guaranteed Simulation of Analog Audio Circuits: A Port-Hamiltonian Approach

by Antoine Falaize and Thomas Hélie

Appl. Sci. 2016, 6(10), 273; https://doi.org/10.3390/app6100273 - 24 Sep 2016

Cited by 60 | Viewed by 8388

Abstract

We present a method that generates passive-guaranteed stable simulations of analog audio circuits from electronic schematics for real-time issues. On one hand, this method is based on a continuous-time power-balanced state-space representation structured into its energy-storing parts, dissipative parts, and external sources. On the other hand, a numerical scheme is especially designed to preserve this structure and the power balance. These state-space structures define the class of port-Hamiltonian systems. The derivation of this structured system associated with the electronic circuit is achieved by an automated analysis of the interconnection network combined with a dictionary of models for each elementary component. The numerical scheme is based on the combination of finite differences applied on the state (with respect to the time variable) and on the total energy (with respect to the state). This combination provides a discrete-time version of the power balance. This set of algorithms is valid for both the linear and nonlinear case. Finally, three applications of increasing complexities are given: a diode clipper, a common-emitter bipolar-junction transistor amplifier, and a wah pedal. The results are compared to offline simulations obtained from a popular circuit simulator.

(This article belongs to the Special Issue Audio Signal Processing)

16 pages, 8755 KiB

Open Access Article

Adaptive Wavelet Threshold Denoising Method for Machinery Sound Based on Improved Fruit Fly Optimization Algorithm

by Jing Xu, Zhongbin Wang, Chao Tan, Lei Si, Lin Zhang and Xinhua Liu

Appl. Sci. 2016, 6(7), 199; https://doi.org/10.3390/app6070199 - 6 Jul 2016

Cited by 35 | Viewed by 8402

Abstract

As the sound signal of a machine contains abundant information and is easy to measure, acoustic-based monitoring or diagnosis systems exhibit obvious superiority, especially in some extreme conditions. However, sound collected directly in the industrial field is always polluted by noise. In order to eliminate noise components from machinery sound, a wavelet threshold denoising method optimized by an improved fruit fly optimization algorithm (WTD-IFOA) is proposed in this paper. The sound is first decomposed by wavelet transform (WT) to obtain the coefficients of each level. As the wavelet threshold functions proposed by Donoho were discontinuous, many modified functions with continuous first and second order derivatives were presented to realize adaptive denoising. However, the function-based denoising process is time-consuming and it is difficult to find optimal thresholds. To overcome these problems, the fruit fly optimization algorithm (FOA) was introduced to the process. Moreover, to avoid falling into local extremes, an improved fly distance range obeying a normal distribution was proposed on the basis of the original FOA. Then, the sound signal of a motor was recorded in a soundproof laboratory, and Gaussian white noise was added to the signal. The simulation results illustrated the effectiveness and superiority of the proposed approach by a comprehensive comparison among five typical methods. Finally, an industrial application on a shearer at a coal mining working face was performed to demonstrate the practical effect.

(This article belongs to the Special Issue Audio Signal Processing)
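For readers unfamiliar with the threshold functions mentioned in the abstract above, the following minimal sketch shows the two classic rules applied to wavelet detail coefficients. It covers only the textbook hard and soft rules, not the modified threshold functions or the fruit fly optimization described in the article; the function names and the sample coefficients are assumptions for illustration.

```python
import numpy as np

def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude exceeds t; zero the rest.

    The output jumps at |c| = t, so the rule is discontinuous.
    """
    c = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(c) > t, c, 0.0)

def soft_threshold(coeffs, t):
    """Shrink every coefficient toward zero by t.

    Continuous in c, but its derivative has a kink at |c| = t.
    """
    c = np.asarray(coeffs, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

# Hypothetical detail coefficients from one decomposition level.
detail = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
hard = hard_threshold(detail, 1.0)
soft = soft_threshold(detail, 1.0)
```

Choosing the threshold `t` per level is the part the article hands to an optimizer; here it is simply fixed at 1.0.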

14 pages, 1219 KiB

Open Access Article

Eluding the Physical Constraints in a Nonlinear Interaction Sound Synthesis Model for Gesture Guidance

by Etienne Thoret, Mitsuko Aramaki, Charles Gondre, Sølvi Ystad and Richard Kronland-Martinet

Appl. Sci. 2016, 6(7), 192; https://doi.org/10.3390/app6070192 - 30 Jun 2016

Cited by 5 | Viewed by 5624

Abstract

In this paper, a flexible control strategy for a synthesis model dedicated to nonlinear friction phenomena is proposed. This model makes it possible to synthesize different types of sound sources, such as creaky doors, singing glasses, squeaking wet plates or bowed strings. Based on the perceptual stance that a sound is perceived as the result of an action on an object, we propose a genuine source/filter synthesis approach that makes it possible to elude the physical constraints induced by the coupling between the interacting objects. This approach makes it possible to independently control and freely combine the action and the object. Different implementations and applications related to computer animation, gesture learning for rehabilitation and expert gestures are presented at the end of this paper.

(This article belongs to the Special Issue Audio Signal Processing)

18 pages, 3651 KiB

Open Access Article

Modal Processor Effects Inspired by Hammond Tonewheel Organs

by Kurt James Werner and Jonathan S. Abel

Appl. Sci. 2016, 6(7), 185; https://doi.org/10.3390/app6070185 - 28 Jun 2016

Cited by 5 | Viewed by 8997

Abstract

In this design study, we introduce a novel class of digital audio effects that extend the recently introduced modal processor approach to artificial reverberation and effects processing. These pitch and distortion processing effects mimic the design and sonics of a classic additive-synthesis-based electromechanical musical instrument, the Hammond tonewheel organ. As a reverb effect, the modal processor simulates a room response as the sum of resonant filter responses. This architecture provides precise, interactive control over the frequency, damping, and complex amplitude of each mode. Into this framework, we introduce two types of processing effects: pitch effects inspired by the Hammond organ’s equal tempered “tonewheels”, “drawbar” tone controls, vibrato/chorus circuit, and distortion effects inspired by the pseudo-sinusoidal shape of its tonewheels and electromagnetic pickup distortion. The result is an effects processor that imprints the Hammond organ’s sonics onto any audio input.

(This article belongs to the Special Issue Audio Signal Processing)

17 pages, 651 KiB

Open Access Article

Metrics for Polyphonic Sound Event Detection

by Annamaria Mesaros, Toni Heittola and Tuomas Virtanen

Appl. Sci. 2016, 6(6), 162; https://doi.org/10.3390/app6060162 - 25 May 2016

Cited by 456 | Viewed by 24220

Abstract

This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources active simultaneously. The system output in this case contains overlapping events, marked as multiple sounds detected as being active at the same time. The polyphonic system output requires a suitable procedure for evaluation against a reference. Metrics from neighboring fields such as speech recognition and speaker diarization can be used, but they need to be partially redefined to deal with the overlapping events. We present a review of the most common metrics in the field and the way they are adapted and interpreted in the polyphonic case. We discuss segment-based and event-based definitions of each metric and explain the consequences of instance-based and class-based averaging using a case study. In parallel, we provide a toolbox containing implementations of presented metrics.

(This article belongs to the Special Issue Audio Signal Processing)
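As a rough illustration of what a segment-based metric looks like in the polyphonic case described above, the sketch below computes an instance-based F-score and error rate from binary activity matrices (rows are time segments, columns are event classes). The definitions used here (substitutions as the per-segment minimum of false positives and false negatives, with the remainder counted as deletions or insertions) are the commonly used ones and are an assumption of this sketch, since the abstract does not spell them out. The function name and example data are hypothetical, and this is not the toolbox the article provides.

```python
import numpy as np

def segment_based_scores(reference, estimate):
    """Return (F-score, error rate) from binary segment-by-class activity matrices.

    Assumes at least one active reference event and at least one detection,
    so neither denominator is zero.
    """
    ref = np.asarray(reference, dtype=bool)
    est = np.asarray(estimate, dtype=bool)

    tp = np.sum(ref & est)
    fp_seg = np.sum(~ref & est, axis=1)   # false positives per segment
    fn_seg = np.sum(ref & ~est, axis=1)   # false negatives per segment
    fp, fn = fp_seg.sum(), fn_seg.sum()

    f_score = 2 * tp / (2 * tp + fp + fn)

    subs = np.minimum(fp_seg, fn_seg).sum()          # substitutions
    dels = np.maximum(0, fn_seg - fp_seg).sum()      # deletions
    ins = np.maximum(0, fp_seg - fn_seg).sum()       # insertions
    error_rate = (subs + dels + ins) / ref.sum()

    return float(f_score), float(error_rate)

# Hypothetical example: three segments, two event classes.
reference = [[1, 0], [1, 1], [0, 0]]
estimate = [[1, 0], [0, 1], [0, 1]]
f_score, error_rate = segment_based_scores(reference, estimate)
```

Because the counts are pooled over all classes before the ratio is taken, this corresponds to instance-based averaging; class-based averaging would compute the scores per column and then take the mean.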

14 pages, 1977 KiB

Open Access Article

Chord Recognition Based on Temporal Correlation Support Vector Machine

by Zhongyang Rao, Xin Guan and Jianfu Teng

Appl. Sci. 2016, 6(5), 157; https://doi.org/10.3390/app6050157 - 19 May 2016

Cited by 8 | Viewed by 8306

Abstract

In this paper, we propose a method called temporal correlation support vector machine (TCSVM) for automatic major-minor chord recognition in audio music. We first use robust principal component analysis to separate the singing voice from the music to reduce the influence of the singing voice and consider the temporal correlations of the chord features. Using robust principal component analysis, we expect the low-rank component of the spectrogram matrix to contain the musical accompaniment and the sparse component to contain the vocal signals. Then, we extract a new logarithmic pitch class profile (LPCP) feature called enhanced LPCP from the low-rank part. To exploit the temporal correlation among the LPCP features of chords, we propose an improved support vector machine algorithm called TCSVM. We perform this study using the MIREX’09 (Music Information Retrieval Evaluation eXchange) Audio Chord Estimation dataset. Furthermore, we conduct comprehensive experiments using different pitch class profile feature vectors to examine the performance of TCSVM. The results of our method are comparable to the state-of-the-art methods that entered the MIREX in 2013 and 2014 for the MIREX’09 Audio Chord Estimation task dataset.

(This article belongs to the Special Issue Audio Signal Processing)

32 pages, 975 KiB

Open Access Article

Two-Polarisation Physical Model of Bowed Strings with Nonlinear Contact and Friction Forces, and Application to Gesture-Based Sound Synthesis

by Charlotte Desvages and Stefan Bilbao

Appl. Sci. 2016, 6(5), 135; https://doi.org/10.3390/app6050135 - 10 May 2016

Cited by 20 | Viewed by 10972

Abstract

Recent bowed string sound synthesis has relied on physical modelling techniques; the achievable realism and flexibility of gestural control are appealing, and the heavier computational cost becomes less significant as technology improves. A bowed string sound synthesis algorithm is designed, by simulating two-polarisation string motion, discretising the partial differential equations governing the string’s behaviour with the finite difference method. A globally energy balanced scheme is used, as a guarantee of numerical stability under highly nonlinear conditions. In one polarisation, a nonlinear contact model is used for the normal forces exerted by the dynamic bow hair, left hand fingers, and fingerboard. In the other polarisation, a force-velocity friction curve is used for the resulting tangential forces. The scheme update requires the solution of two nonlinear vector equations. The dynamic input parameters allow for simulating a wide range of gestures; some typical bow and left hand gestures are presented, along with synthetic sound and video demonstrations.

(This article belongs to the Special Issue Audio Signal Processing)

12 pages, 1211 KiB

Open Access Article

Dynamical Systems for Audio Synthesis: Embracing Nonlinearities and Delay-Free Loops

by David Medine

Appl. Sci. 2016, 6(5), 134; https://doi.org/10.3390/app6050134 - 10 May 2016

Cited by 7 | Viewed by 6001

Abstract

Many systems featuring nonlinearities and delay-free loops are of interest in digital audio, particularly in virtual analog and physical modeling applications. Many of these systems can be posed as systems of implicitly related ordinary differential equations. Provided each equation in the network is itself an explicit one, straightforward numerical solvers may be employed to compute the output of such systems without resorting to linearization or matrix inversions for every parameter change. This is a cheap and effective means for synthesizing delay-free, nonlinear systems without resorting to large lookup tables, iterative methods, or the insertion of fictitious delay and is therefore suitable for real-time applications. Several examples are shown to illustrate the efficacy of this approach.

(This article belongs to the Special Issue Audio Signal Processing)

21 pages, 1876 KiB

Open Access Article

Psychoacoustic Approaches for Harmonic Music Mixing

by Roman B. Gebhardt, Matthew E. P. Davies and Bernhard U. Seeber

Appl. Sci. 2016, 6(5), 123; https://doi.org/10.3390/app6050123 - 3 May 2016

Cited by 12 | Viewed by 8472

Abstract

The practice of harmonic mixing is a technique used by DJs for the beat-synchronous and harmonic alignment of two or more pieces of music. In this paper, we present a new harmonic mixing method based on psychoacoustic principles. Unlike existing commercial DJ-mixing software, which determines compatible matches between songs via key estimation and harmonic relationships in the circle of fifths, our approach is built around the measurement of musical consonance. Given two tracks, we first extract a set of partials using a sinusoidal model and average this information over sixteenth note temporal frames. By scaling the partials of one track over ±6 semitones (in 1/8th semitone steps), we determine the pitch-shift that maximizes the consonance of the resulting mix. For this, we measure the consonance between all combinations of dyads within each frame according to psychoacoustic models of roughness and pitch commonality. To evaluate our method, we conducted a listening test where short musical excerpts were mixed together under different pitch shifts and rated according to consonance and pleasantness. Results demonstrate that sensory roughness computed from a small number of partials in each of the musical audio signals constitutes a reliable indicator to yield maximum perceptual consonance and pleasantness ratings by musically-trained listeners.

(This article belongs to the Special Issue Audio Signal Processing)
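The search grid mentioned in the abstract above (±6 semitones in 1/8th semitone steps) can be made concrete with a small worked example. It only shows how the candidate pitch shifts map to frequency ratios under equal temperament; the consonance models themselves are not reproduced, and the partial frequencies are hypothetical.

```python
import numpy as np

# Candidate shifts from -6 to +6 semitones in 1/8 semitone steps: 97 values.
shifts = np.linspace(-6.0, 6.0, 97)

# Equal-tempered frequency ratio for each shift (12 semitones per octave).
ratios = 2.0 ** (shifts / 12.0)

# Hypothetical partials of one frame of the track being shifted, in Hz.
partials_hz = np.array([220.0, 440.0, 660.0])

# One row of scaled partials per candidate shift; each row would then be
# scored against the other track's partials by a consonance model.
shifted = np.outer(ratios, partials_hz)
```

The extreme shifts correspond to ratios of about 0.707 and 1.414, that is, a tritone down and a tritone up.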

20 pages, 1223 KiB

Open Access Article

Full-Band Quasi-Harmonic Analysis and Synthesis of Musical Instrument Sounds with Adaptive Sinusoids

by Marcelo Caetano, George P. Kafentzis, Athanasios Mouchtaris and Yannis Stylianou

Appl. Sci. 2016, 6(5), 127; https://doi.org/10.3390/app6050127 - 2 May 2016

Cited by 9 | Viewed by 7957

Abstract

Sinusoids are widely used to represent the oscillatory modes of musical instrument sounds in both analysis and synthesis. However, musical instrument sounds feature transients and instrumental noise that are poorly modeled with quasi-stationary sinusoids, requiring spectral decomposition and further dedicated modeling. In this work, we propose a full-band representation that fits sinusoids across the entire spectrum. We use the extended adaptive Quasi-Harmonic Model (eaQHM) to iteratively estimate amplitude- and frequency-modulated (AM–FM) sinusoids able to capture challenging features such as sharp attacks, transients, and instrumental noise. We use the signal-to-reconstruction-error ratio (SRER) as the objective measure for the analysis and synthesis of 89 musical instrument sounds from different instrumental families. We compare against quasi-stationary sinusoids and exponentially damped sinusoids. First, we show that the SRER increases with adaptation in eaQHM. Then, we show that full-band modeling with eaQHM captures partials at the higher frequency end of the spectrum that are neglected by spectral decomposition. Finally, we demonstrate that a frame size equal to three periods of the fundamental frequency results in the highest SRER with AM–FM sinusoids from eaQHM. A listening test confirmed that the musical instrument sounds resynthesized from full-band analysis with eaQHM are virtually perceptually indistinguishable from the original recordings.

(This article belongs to the Special Issue Audio Signal Processing)

15 pages, 1119 KiB

Open Access Article

Augmenting Environmental Interaction in Audio Feedback Systems

by Seunghun Kim, Graham Wakefield and Juhan Nam

Appl. Sci. 2016, 6(5), 125; https://doi.org/10.3390/app6050125 - 28 Apr 2016

Cited by 3 | Viewed by 6393

Abstract

Audio feedback is defined as a positive feedback of acoustic signals where an audio input and output form a loop, and may be utilized artistically. This article presents new context-based controls over audio feedback, leading to the generation of desired sonic behaviors by enriching the influence of existing acoustic information such as room response and ambient noise. This ecological approach to audio feedback emphasizes mutual sonic interaction between signal processing and the acoustic environment. Mappings from analyses of the received signal to signal-processing parameters are designed to emphasize this specificity as an aesthetic goal. Our feedback system presents four types of mappings: approximate analyses of room reverberation to tempo-scale characteristics, ambient noise to amplitude and two different approximations of resonances to timbre. These mappings are validated computationally and evaluated experimentally in different acoustic conditions.

(This article belongs to the Special Issue Audio Signal Processing)

17 pages, 11253 KiB

Open Access Article

Blockwise Frequency Domain Active Noise Controller Over Distributed Networks

by Christian Antoñanzas, Miguel Ferrer, Maria De Diego and Alberto Gonzalez

Appl. Sci. 2016, 6(5), 124; https://doi.org/10.3390/app6050124 - 28 Apr 2016

Cited by 14 | Viewed by 4982

Abstract

This work presents a practical active noise control system composed of distributed and collaborative acoustic nodes. To this end, experimental tests have been carried out in a listening room with acoustic nodes equipped with loudspeakers and microphones. The communication among the nodes is simulated by software. We have considered a distributed algorithm based on the Filtered-x Least Mean Square (FxLMS) method that introduces collaboration between nodes following an incremental strategy. For improving the processing efficiency in practical scenarios where data acquisition systems work by blocks of samples, the frequency-domain partitioned block technique has been used. Implementation aspects such as computational complexity, processing time of the network and convergence of the algorithm have been analyzed. Experimental results show that, without constraints in the network communications, the proposed distributed algorithm achieves the same performance as the centralized version. The performance of the proposed algorithm over a network with a given communication delay is also included.

(This article belongs to the Special Issue Audio Signal Processing)

18 pages, 1309 KiB

Open Access Article

Influence of the Quality of Consumer Headphones in the Perception of Spatial Audio

by Pablo Gutierrez-Parera and Jose J. Lopez

Appl. Sci. 2016, 6(4), 117; https://doi.org/10.3390/app6040117 - 22 Apr 2016

Cited by 6 | Viewed by 9866

Abstract

High quality headphones can generate a realistic sound immersion reproducing binaural recordings. However, most people commonly use consumer headphones of inferior quality, such as the ones provided with smartphones or music players. Factors such as weak frequency response, distortion and the sensitivity disparity between the left and right transducers could be some of the degrading factors. In this work, we study how these factors affect spatial perception. To this end, a series of perceptual tests has been carried out with a virtual headphone listening test methodology. The first experiment focuses on the analysis of how the disparity of sensitivity between the two transducers affects the final result. The second test studies the influence of the frequency response, relating quality and spatial impression. The third test analyzes the effects of distortion using a Volterra kernels scheme for the simulation of the distortion using convolutions. Finally, the fourth tries to relate the quality of the frequency response with the accuracy of azimuth localization. The conclusions of the experiments are: the disparity between both transducers can affect the localization of the source; the perception of quality and spatial impression has a high correlation; the distortion produced by the range of headphones tested at a fixed level does not affect the perception of binaural sound; and some frequency bands have an important role in the front-back confusions.

(This article belongs to the Special Issue Audio Signal Processing)

19 pages, 1413 KiB

Open Access Article

Semantically Controlled Adaptive Equalisation in Reduced Dimensionality Parameter Space

by Spyridon Stasis, Ryan Stables and Jason Hockman

Appl. Sci. 2016, 6(4), 116; https://doi.org/10.3390/app6040116 - 20 Apr 2016

Cited by 19 | Viewed by 6841

Abstract

Equalisation is one of the most commonly-used tools in sound production, allowing users to control the gains of different frequency components in an audio signal. In this paper we present a model for mapping a set of equalisation parameters to a reduced dimensionality space. The purpose of this approach is to allow a user to interact with the system in an intuitive way through both the reduction of the number of parameters and the elimination of technical knowledge required to creatively equalise the input audio. The proposed model represents 13 equaliser parameters on a two-dimensional plane, which is trained with data extracted from a semantic equalisation plug-in, using the timbral adjectives warm and bright. We also include a parameter weighting stage in order to scale the input parameters to spectral features of the audio signal, making the system adaptive. To maximise the efficacy of the model, we evaluate a variety of dimensionality reduction and regression techniques, assessing the performance of both parameter reconstruction and structural preservation in low-dimensional space. After selecting an appropriate model based on the evaluation criteria, we conclude by subjectively evaluating the system using listening tests.

(This article belongs to the Special Issue Audio Signal Processing)

13 pages, 1736 KiB

Open Access Article

Frequency-Dependent Amplitude Panning for the Stereophonic Image Enhancement of Audio Recorded Using Two Closely Spaced Microphones

by Chan Jun Chun and Hong Kook Kim

Appl. Sci. 2016, 6(2), 39; https://doi.org/10.3390/app6020039 - 1 Feb 2016

Cited by 5 | Viewed by 5881

Abstract

In this paper, we propose a new frequency-dependent amplitude panning method for stereophonic image enhancement applied to a sound source recorded using two closely spaced omni-directional microphones. The ability to detect the direction of such a sound source is limited due to weak spatial information, such as the inter-channel time difference (ICTD) and inter-channel level difference (ICLD). Moreover, when sound sources are recorded in a convolutive or a real room environment, the detection of sources is affected by reverberation effects. Thus, the proposed method first tries to estimate the source direction depending on the frequency using azimuth-frequency analysis. Then, a frequency-dependent amplitude panning technique is proposed to enhance the stereophonic image by modifying the stereophonic law of sines. To demonstrate the effectiveness of the proposed method, we compare its performance with that of a conventional method based on the beamforming technique in terms of directivity pattern, perceived direction, and quality degradation under three different recording conditions (anechoic, convolutive, and real reverberant). The comparison shows that the proposed method gives us better stereophonic images in a stereo loudspeaker reproduction than the conventional method without any annoying effects.

(This article belongs to the Special Issue Audio Signal Processing)
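The stereophonic law of sines referred to in the abstract above relates the perceived source angle to the two loudspeaker gains by sin(φ)/sin(φ0) = (gL − gR)/(gL + gR), where φ0 is the half-angle between the loudspeakers. The sketch below shows only this classic broadband form with constant-power normalisation. The frequency-dependent modification proposed in the article is not reproduced, and the function name, the sign convention (positive angles toward the left loudspeaker), and the default 30 degree base angle are assumptions.

```python
import math

def law_of_sines_gains(source_deg, base_deg=30.0):
    """Return (g_left, g_right) that pan a source to source_deg.

    Solves sin(phi)/sin(phi0) = (gL - gR)/(gL + gR) and normalises so that
    gL**2 + gR**2 == 1. Angles beyond the loudspeaker base are clamped.
    """
    ratio = math.sin(math.radians(source_deg)) / math.sin(math.radians(base_deg))
    ratio = max(-1.0, min(1.0, ratio))          # stay between the loudspeakers
    g_left, g_right = 1.0 + ratio, 1.0 - ratio  # satisfies the law of sines
    norm = math.hypot(g_left, g_right)
    return g_left / norm, g_right / norm

centre = law_of_sines_gains(0.0)       # equal gains for a centred source
full_left = law_of_sines_gains(30.0)   # source at the left loudspeaker
```

A frequency-dependent variant would evaluate such a gain pair separately for each frequency band, using the direction estimated for that band.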

27 pages, 4064 KiB

Open Access Article

Auralization of Accelerating Passenger Cars Using Spectral Modeling Synthesis

by Reto Pieren, Thomas Bütler and Kurt Heutschi

Appl. Sci. 2016, 6(1), 5; https://doi.org/10.3390/app6010005 - 24 Dec 2015

Cited by 48 | Viewed by 9784

Abstract

While the technique of auralization has been in use for quite some time in architectural acoustics, the application to environmental noise has been discovered only recently. With road traffic noise being the dominant noise source in most countries, particular interest lies in the synthesis of realistic pass-by sounds. This article describes an auralizator for pass-bys of accelerating passenger cars. The key element is a synthesizer that simulates the acoustical emission of different vehicles, driving on different surfaces, under different operating conditions. Audio signals for the emitted tire noise, as well as the propulsion noise are generated using spectral modeling synthesis, which gives complete control of the signal characteristics. The sound of propulsion is synthesized as a function of instantaneous engine speed, engine load and emission angle, whereas the sound of tires is created in dependence of vehicle speed and emission angle. The sound propagation is simulated by applying a series of time-variant digital filters. To obtain the corresponding steering parameters of the synthesizer, controlled experiments were carried out. The tire noise parameters were determined from coast-by measurements of passenger cars with idling engines. To obtain the propulsion noise parameters, measurements at different engine speeds, engine loads and emission angles were performed using a chassis dynamometer. The article shows how, from the measured data, the synthesizer parameters are calculated using audio signal processing.

(This article belongs to the Special Issue Audio Signal Processing)

Review

44 pages, 789 KiB

Open Access Review

A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds

by Francesc Alías, Joan Claudi Socoró and Xavier Sevillano

Appl. Sci. 2016, 6(5), 143; https://doi.org/10.3390/app6050143 - 12 May 2016

Cited by 191 | Viewed by 22242

Abstract

Endowing machines with sensing capabilities similar to those of humans is a prevalent quest in engineering and computer science. In the pursuit of making computers sense their surroundings, a huge effort has been conducted to allow machines and computers to acquire, process, analyze and understand their environment in a human-like way. Focusing on the sense of hearing, the ability of computers to sense their acoustic environment as humans do goes by the name of machine hearing. To achieve this ambitious aim, the representation of the audio signal is of paramount importance. In this paper, we present an up-to-date review of the most relevant audio feature extraction techniques developed to analyze the most usual audio signals: speech, music and environmental sounds. Besides revisiting classic approaches for completeness, we include the latest advances in the field based on new domains of analysis together with novel bio-inspired proposals. These approaches are described following a taxonomy that organizes them according to their physical or perceptual basis, being subsequently divided depending on the domain of computation (time, frequency, wavelet, image-based, cepstral, or other domains). The description of the approaches is accompanied with recent examples of their application to machine hearing related problems.

(This article belongs to the Special Issue Audio Signal Processing)

46 pages, 1883 KiB

Open Access Review

All About Audio Equalization: Solutions and Frontiers

by Vesa Välimäki and Joshua D. Reiss

Appl. Sci. 2016, 6(5), 129; https://doi.org/10.3390/app6050129 - 6 May 2016

Cited by 127 | Viewed by 38247

Abstract

Audio equalization is a vast and active research area. The extent of research means that one often cannot identify the preferred technique for a particular problem. This review paper bridges those gaps, systemically providing a deep understanding of the problems and approaches in audio equalization, their relative merits and applications. Digital signal processing techniques for modifying the spectral balance in audio signals and applications of these techniques are reviewed, ranging from classic equalizers to emerging designs based on new advances in signal processing and machine learning. Emphasis is placed on putting the range of approaches within a common mathematical and conceptual framework. The application areas discussed herein are diverse, and include well-defined, solvable problems of filter design subject to constraints, as well as newly emerging challenges that touch on problems in semantics, perception and human computer interaction. Case studies are given in order to illustrate key concepts and how they are applied in practice. We also recommend preferred signal processing approaches for important audio equalization problems. Finally, we discuss current challenges and the uncharted frontiers in this field. The source code for methods discussed in this paper is made available at https://code.soundsoftware.ac.uk/projects/allaboutaudioeq.

(This article belongs to the Special Issue Audio Signal Processing)

26 pages, 1618 KiB

Open Access Review

A Review of Time-Scale Modification of Music Signals

by Jonathan Driedger and Meinard Müller

Appl. Sci. 2016, 6(2), 57; https://doi.org/10.3390/app6020057 - 18 Feb 2016

Cited by 70 | Viewed by 29916

Abstract

Time-scale modification (TSM) is the task of speeding up or slowing down an audio signal’s playback speed without changing its pitch. In digital music production, TSM has become an indispensable tool, which is nowadays integrated in a wide range of music production software. Music signals are diverse: they comprise harmonic, percussive, and transient components, among others. Because of this wide range of acoustic and musical characteristics, there is no single TSM method that can cope with all kinds of audio signals equally well. Our main objective is to foster a better understanding of the capabilities and limitations of TSM procedures. To this end, we review fundamental TSM methods, discuss typical challenges, and indicate potential solutions that combine different strategies. In particular, we discuss a fusion approach that involves recent techniques for harmonic-percussive separation along with time-domain and frequency-domain TSM procedures.

(This article belongs to the Special Issue Audio Signal Processing)

Published Papers (20 papers)
