Auralization of Accelerating Passenger Cars Using Spectral Modeling Synthesis

While the technique of auralization has been in use for quite some time in architectural acoustics, the application to environmental noise has been discovered only recently. With road traffic noise being the dominant noise source in most countries, particular interest lies in the synthesis of realistic pass-by sounds. This article describes an auralizator for pass-bys of accelerating passenger cars. The key element is a synthesizer that simulates the acoustical emission of different vehicles, driving on different surfaces, under different operating conditions. Audio signals for the emitted tire noise, as well as the propulsion noise are generated using spectral modeling synthesis, which gives complete control of the signal characteristics. The sound of propulsion is synthesized as a function of instantaneous engine speed, engine load and emission angle, whereas the sound of tires is created in dependence of vehicle speed and emission angle. The sound propagation is simulated by applying a series of time-variant digital filters. To obtain the corresponding steering parameters of the synthesizer, controlled experiments were carried out. The tire noise parameters were determined from coast-by measurements of passenger cars with idling engines. To obtain the propulsion noise parameters, measurements at different engine speeds, engine loads and emission angles were performed using a chassis dynamometer. The article shows how, from the measured data, the synthesizer parameters are calculated using audio signal processing.


Introduction
Noise caused by traffic is a relevant health factor in urban environments, along major transport routes and in the vicinity of airports.Noise, in contrast to sound, can principally not be measured, but has to be assessed.For the most relevant noise sources, objective quantities have been derived that correlate with the annoyance as reported by people.However, these correlations are usually weak.One reason for this is the fact that the describing quantities used so far represent the acoustic situation only in a very simplified manner.A method to further investigate the signal properties relevant to noise is to conduct listening experiments where different stimuli are presented to test persons.Relying on audio recordings allows for little variation of different signal aspects only.A more versatile method with a much higher degree of freedom, as well as full control of the influencing signal parameters is to synthesize the stimuli and, thus, to auralize an acoustical environment.
Auralization has been in use for quite some time in architectural acoustics, namely in the fields of room and building acoustics [1][2][3], but it has only recently been discovered for environmental acoustical applications.Today, most auralizations are generated based on computer models and digital signal processing.However, between applications, the individual simulation steps may emission angle with respect to the observer is required.For the rendering, a variety of different methods exists [19].Further, it has to be assured that the reproduction system has a linear frequency response and that it is correctly calibrated.
In the ongoing research project "TAURA", a traffic noise auralizator is developed that covers road traffic and railway noise.It will form the basis for future experiments to refine the characterization of noise.The key element is a synthesizer that simulates the acoustical emission of a great many different vehicles, operating on a wide variety of surfaces and under different operating conditions.In the TAURA model, road traffic noise is created by the superposition of individual vehicle pass-by sounds.The objective of this article is to describe how these single pass-by sounds can be generated in the case of passenger cars.
This paper extends the work presented in [20] and is structured in two main parts: In Section 2, the auralization model of accelerating passenger cars is developed step by step and presented.Thereby, an emission synthesizer, which is based on spectral modeling synthesis, and propagation filtering algorithms are elucidated.Section 3 shows how the model parameters can be estimated based on controlled measurements.On that account, a series of signal analysis steps to obtain the steering parameters of the synthesizer is proposed.The article ends with conclusions in Section 4.

Overview
This section presents an overview of the model to auralize accelerating passenger cars.Further, the key assumptions and motivations used for the model development are presented.In the model, each car is represented by two moving point sources.The geometrical situation is depicted in Figure 1, in which the distance of the straight driving lane to the receiver is D, the emission angle ϕ, the angle of inclination α and the point source positions S1 and S2, respectively.Describing the kinematics of the vehicle, its speed v(t) in km/h as a function of time t (at the source) is used throughout this paper.In correspondence with the Harmonoise model [21], the point sources are vertically stacked and located at heights of 0.01 and 0.3 m above ground.By not attributing separate point sources to each vehicle axle, we limit the applicability of the model to situations with source-receiver distances clearly larger than the axial distance, while in return, saving computational costs.Road traffic noise is mainly composed of propulsion noise and tire noise [18,[21][22][23].Both contributions differ in their relevance for the total noise, depending on the vehicle, its operating Appl.Sci.2016, 6, 5 4 of 27 conditions and the pavement type.This motivates that in the presented model, the contributions of propulsion noise and tire noise are simulated individually.
In accordance with Vorländer's definition of the "principle of auralization" [2], the presented model to auralize accelerating passenger cars comprises a separate emission, a propagation and a reproduction module.The emission module is described in Section 2.2, the propagation module in Section 2.3 and the reproduction module in Section 2.4. Figure 2 shows the block diagram of the model.The input variables describe the vehicle, driver, road surface, geometry, ground type and the weather; the input variables marked by * are time dependent.For both moving source positions, S1 and S2, the sound propagation to the (static) receiver position is simulated in the propagation module.To generate the receiver signals, both corresponding source signals, s 1 and s 2 , are filtered by a series of time-varying digital filters, as described in Section 2.3.These filters depend on the instantaneous propagation geometry, the ground type and the weather conditions (cf. Figure 2).Finally, in the reproduction module, the receiver signals are summed up and rendered for multi-channel reproduction using the instantaneous immission angle.Section 2.4 exemplifies a possible stereo rendering procedure.

Emission Module
The emission module describes the emitted sounds of an individual passenger car.Its structure is depicted in Figure 2. As described above, the acoustical emission of the passenger car is assumed to consist of the two contributions: tire noise and propulsion noise.Their corresponding emission signals are denoted as s tire and s prop , respectively.
Tire noise strongly depends on tire type [22,24], road surface type [18,21,22,25] and vehicle speed [18,[21][22][23].Further, the horn effect mainly determines the horizontal directivity of tire noise [21,23,26].To model these effects, the tire noise contribution is assumed to depend on the road and tire type, as well as on vehicle speed v and the emission angle ϕ.Section 2.2.1 shows how the signals s tire are calculated based on these input parameters.In current noise prediction models, propulsion noise is commonly calculated as a function of vehicle speed, acceleration and road inclination [18,21,27].This is due to the fact that these models are developed and used in cases for which the engaged gear is not known.The gear, however, strongly influences the sound of propulsion [23].For a given speed, acceleration and road inclination, by changing the gear, the engine speed, as well as the engine load change.From an engine's viewpoint, it is these two parameters that are sufficient to fully describe the engine condition.Section 2.2.2 explains how in the auralization model, engine speed n and engine load Γ are calculated by simulating the driving dynamics of the vehicle.These simulations require information on the vehicle and the driving style, the road inclination α and the vehicle speed v(t) as a function of time t.Further, propulsion noise features a directivity [23], which is also taken into account in the auralization model.Section 2.2.3 shows how, based on n, Γ and the emission angle ϕ, the signal s prop is calculated.
The audible emission signals s tire and s prop are generated artificially by two digital sound synthesizers.The synthesizers are based on a combination of additive and subtractive synthesis.In additive synthesis, the signal is constructed by the sum of sinusoids, each having a time-varying amplitude and phase [28][29][30].On the other hand, subtractive synthesis uses filters to shape a more complex source signal, e.g., a sawtooth wave or white noise [28,29].The combination of both techniques is known as spectral modeling synthesis [30][31][32].However, in contrast to the applications presented in [31] and [32], in the presented model, the sounds are not synthesized using the short-time Fourier transform (STFT), but directly in the time domain.The structure of the synthesizer is similar to the one recently published for wind turbine sounds [6].
The signal of propulsion noise, s prop , is fully attributed to the upper point source S2.However, the sound power of the tire noise contribution is attributed to the point sources by 80%/20% [21].This translates to a ratio of 2:1 of their respective sound pressure signals.The conditions of incoherent signals and energy conservation yield a normalization factor of 1/ √ 5. Thus, the sound pressure source signals are: at reference distance r 0 = 1 m for source positions S2 and S1, respectively.Indices 2 and 1 indicate that different, uncorrelated signals for the sound of tires are generated for the two source positions.

Sound of Tires
The emission signal of the sound of tires is assumed to consist of broadband noise only, i.e., discrete tones due to, e.g., tire tread resonances or discrete vibrational tire resonances are not taken into account.
The spectral shaping of the broadband noise components is performed in 1/3 octave bands.For each 1/3 octave band i, white noise is generated and filtered by a digital pink filter.This pre-shaping helps to produce a smoother spectrum of the resulting signal [6].The output of the pink filter is bandpass filtered by an eighth order Butterworth filter (Class 0 according to the standard IEC 1260:1995 [33]) and normalized to unit signal power to obtain the signal ξ i (t).For stability reasons, also at low frequencies, the filters are implemented as cascaded second-order sections (SOS).
The sound pressure emission signals of the sound of tires component are thus calculated by [6]: with N b being the number of considered 1/3 octave bands, the reference pressure p 0 = 20 µPa and normalized bandpass filtered pink noise signals ξ i (t).A total of N b = 29 bands from 20 Hz to 12.5 kHz are used.For the sound pressure level L tire,i of band i, a common logarithmic speed relationship [22] with additive correction terms is assumed: with reference speed v 0 = 70 km/h, regression parameters A i and B i , the road surface correction ∆L road,i and a horizontal directivity ∆L dir,i .For the road surface correction, the Swiss "sonRoad" model [18] offers the parameter ∆ BG for 10 surface types.However, ∆ BG does not depend on frequency or vehicle speed.The recently-published EU directive on establishing common noise assessment methods (CNOSSOS-EU) [27] contains spectral corrections in octave bands in the form of: with experimental regression parameters α i and β, which are tabulated for 15 different road surface types.The horizontal directivity simulates the horn effect [26] and only applies for signal s 1 (t), i.e., for the lower source position (S1).The empirically-obtained relationship [21]: with the 1/3 octave band center frequencies f c,i and the correction C is employed.C accounts for a limited emission angle range during emission measurements, e.g., it amounts to 0.9 dB for an angle range 45

Driving Dynamics
Figure 2 shows that as a first step of the propulsion noise simulation, the driving dynamics of the car are calculated in order to obtain the instantaneous engine speed n(t) and engine load M(t).The engine speed in engaged mode reads [34]: with the instantaneous vehicle speed v given in km/h, the gear ratio i gear , the axle ratio i ax and the dynamic tire radius r tire,dyn ≈ 0.3 m.The traction F T is modeled by: [34,35] with the vehicle mass m, gravity g, the inclination angle of the road α, the translational acceleration a of the car and a mean equivalent mass factor ē = 1.15 for the rotational accelerations for each individual gear.The basic driving resistance F B (consisting of rolling resistance and aerodynamic drag) is modeled by the coast-down parameters F 0 , F 1 and F 2 with units N, N/(km/h) and N/(km/h) 2 , respectively.These parameters have to be provided by the manufacturer during the type approval procedure.The engine load (torque) is formulated by [34]: with a globally-set efficiency factor η = 0.9 for the power transmission from the engine to the wheels.The engine load in percent is defined by [36]: with Γ = 100% at full load.At idling engine, M = 0 Nm and Γ = 0%.In engine overrun operation (e.g., while engine braking), the engine delivers a negative torque to the crankshaft, which means that the engine load M becomes negative.In the model, this state is approximated as idle, i.e., M is set to zero.Gearbox shifts are modeled by three consecutive processes: the clutch is disengaged; a new gear is put in (at idling engine); and the clutch is engaged again.In dependence of the driving style, these processes vary in their respective durations.In the model, for a sporty driving style, the total gear change takes 0.6 s, whereas for a cozy, economic driving style, the gear change takes 1.3 s.Furthermore, the moments of a gear change strongly depend on the driving style and can be formulated as a function of engine speed and engine load, which is also the basic working principle of an automatic gearbox.
Figure 3a and 3b show two simulated engine condition courses within an engine load vs. engine speed diagram.The black lines show the simulated temporal progression of the engine condition during a virtual pass-by.Both simulations start at the same initial engine condition of 900 rpm and 3 Nm, marked by green stars.In both cases, the passenger car starts in first gear and accelerates from v = 7 km/h to 50 km/h, however with differing accelerations a and driving styles.For the medium driving style and an acceleration of a = 1 m/s 2 (Figure 3a), three gear changes occur at around 2000 rpm (Sample 1); whereas for the sporty driving style and an acceleration of a = 2 m/s 2 (Figure 3b), only two gear changes happen, but at higher engine speeds of 3000 to 4000 rpm (Sample 2).The temporal behaviors of the engine states for these two examples are published as supplementary data (see the videos in Supplementary File).

Sound of Propulsion
The structure of the emission synthesizer for the sound of propulsion is depicted in Figure 4.The sound pressure emission signal of the sound of propulsion is assumed to consist of a deterministic signal representing the most important engine orders and a quasi-stochastic signal: Engine order ν corresponds to an event taking place ν times per engine revolution.The engine order signal is composed of the sum of the engine orders ν, which are generated using additive synthesis [29,30].The engine order signal is thus calculated by [6]: A proper selection of the essential orders ν strongly depends on the specific vehicle type and its condition.In the context of sound design, it is known that at least orders up to ν = 18 are relevant [23].Further, the sound characteristics can be influenced by half-orders [23].In this model, it was decided to synthesize orders ν = 1 to 30 in half-order steps, resulting in a total of 59 orders.This somewhat arbitrary, but safe choice leaves room for optimization.In Equation ( 14), L † prop,ord,ν denotes the order level and the instantaneous order phase: with the order phase φ † ν and the order frequency: Listening tests revealed that in this application, the order phase is a relevant synthesizer parameter.For a four-stroke engine with N cyl cylinders, the engine order corresponding to the ignition, and mostly the predominant order, is ν ign = N cyl /2 [23].Thus, the ignition frequency reads [5]: For time-discrete signals, Equations ( 14) and ( 15) can be interpreted as a modified numerically-controlled oscillator (NCO) [28], whereas Equation ( 15) corresponds to the phase accumulator (PA), and the phase-to-amplitude converter (PAC) is realized in Equation ( 14).This formulation concurrently implements a frequency modulation by F and a phase modulation by φ.
The noise signal component of the sound of propulsion is synthesized similarly as the sound of tires (Equation ( 3)) by: A total of N b = 29 bands from 20 Hz to 12.5 kHz are used.The 1/3 octave band level function is formulated as: with the level L † eq,prop,noise,i , a level standard deviation σ † i and a level fluctuation function R(t) with zero mean and unit power.The constant K ensures that despite the level fluctuations, the equivalent continuous level (Leq) is not altered.This level modulation simulates the rattling sound component that elicits a roughness sensation, which is particularly characteristic for low engine speeds and diesel engines.Motivated by measurement data that showed the strongest level fluctuations at the ignition frequency, R is modeled by a quasi-periodic function with period 1/F ign (t).The first half-period of R is composed of a Hann window, whereas the second half-period is held constant.
In summary, the presented synthesizer needs about 180 input parameters to generate a stationary signal for the sound of propulsion.However, during a pass-by, the sound of propulsion may considerably vary, and so do these parameters.These parameters, which are marked by † in the above equations, simultaneously depend on the engine speed n, the engine load M and the emission angle ϕ and are hence time dependent.They are calculated by a triangulation-based linear 3D interpolation of measurement data.Measurements were taken on a discrete grid, typically n ≈ {1000, 2000, 3000, 4000} rpm, Γ ≈ {0, 40, 70, 100}% and ϕ = {0, 60, 120, 180} • .The measuring point pairs n, M of a measurement performed on a Ford Focus 1.8i are depicted as circles in Figure 3.
The topmost points at 1000 to 4000 rpm are at full load, i.e., Γ = 100%.Furthermore, the adopted Delaunay triangulation, which is used for the interpolation, is shown with gray lines.The synthesizer parameters are evaluated with a temporal resolution of 20 ms and linearly interpolated to the audio sampling rate, f s .For the interpolation of the order phase, φ, its cyclic behavior has to be considered in order to avoid spurious phase fluctuations.

Propagation Filtering
The sound propagation model described in this section incorporates the following effects: • Propagation delay • Doppler effect (frequency shift and amplification) Other outdoor sound propagation effects that may be relevant in certain situations are screening [37][38][39], foliage attenuation [38], meteorological effects due to an inhomogeneous atmosphere [38][39][40][41][42], as well as reflections at artificial [38,43] and natural surfaces [42,44,45].Most published environmental noise auralization models simulate some of the above listed effects by applying a 1/3 octave filter bank and adjusting the filter gains [8,14,15].In this model, however, all of these effects are applied in the time domain, i.e., by time-variant digital filters.Sound propagation is modeled by two paths, namely for direct sound and a single ground reflection (in the following account indicated by subscripts "dir" and "gr", respectively).The sound pressure of a point source has a 1/r distance dependency.Thus, to model geometrical spreading, the emitted sound pressure signals, x, are divided by their path length r dir or r gr , respectively.The interaction of a sound wave with the ground influences its amplitude and phase as a function of frequency.This effect can be modeled by convolution of the ground-reflected signal with a time-variant filter [46].Furthermore, the attenuation due to air absorption can be efficiently modeled using a filter [46].Considering these aspects, the receiver signal y is calculated by: where * denotes linear convolution, t the receiver time axis, r dir is the source-receiver distance, r gr is the distance source-ground reflection point-receiver, x dir and x gr are delayed versions of the emitted sounds and h air,t and h gr,t denote impulse responses of time-dependent filters described in Sections 2.3.2 and 2.3.3.The modeling of effects due to source motion and the propagation delay are explained in Section 2.3.1.Note that the immission angle θ(t ), which is needed for surround reproduction, has to be evaluated on the receiver axis, as well.
Figure 3 shows normalized spectrograms of two synthesized pass-by sounds.For the synthesis of the sound of propulsion, the respective engine condition courses depicted in a and b of Figure 3 were used.They are described in Section 2.2.2.The temporal behavior of the engine states, the spectrograms, as well as the auralizations are published as supplementary data (see the videos in the Supplementary File).In both simulations, the car passes the receiver at Time 0 at 30 km/h.The receiver is located 1.2 m above a hard ground at a distance D = 7.5 m.As at the pass-by, the engine speed still increases, the Doppler frequency shift is not directly observable in the course of the order frequencies.However, the gear change moments can be well observed as local decreases of order frequencies.As a consequence of the used engine condition courses, these frequency drops occur at higher frequencies for the sporty driving style (d).

Effects Due to Source Motion and Propagation Delay
Due to the travel time of sound and the movement of the source, the source and the receiver have differing time axis.By neglecting wind and turbulence, the warped time axis at the receiver is given by: where r dir/gr denotes the sound propagation distance of the direct sound or the ground reflected sound, respectively, and c 0 is sound speed in still air.A constant sound speed of c 0 = 340 m/s is assumed.Since the receiver signal is supposed to have a constant sampling rate of f s , the corresponding times, t s , on the emission time axis have to be found.This is achieved by linear interpolation of Equation ( 21).The emission signals x for the direct and the ground-reflected path, respectively, with respect to the receiver time t are: with the Doppler factor: Equation ( 22) describes the kinematic and the aerodynamic effect of source motion.The former is known as the Doppler effect, i.e., the Doppler frequency shift and amplification.The latter is known as convective amplification.The exponent two of the Doppler factor indicates that a Lighthill [47] monopole [48] and/or dipole source [49] is assumed.
The change of the time axis in Equation ( 22) realizes the propagation delay, as well as the Doppler frequency shift.For digital signals, this change corresponds to an asynchronous resampling process.It can by implemented using a variable delay-line with delay ∆t [50].If ∆t is just rounded to the nearest sample, audible artifacts occur, so-called "zipper noise".Therefore, an interpolation strategy has to be used.As we are only interested in sequential access to the emission signal, a fractional delay filter can be used [51].In [18] and [50], a linear interpolator is proposed.This however produces high frequency attenuation, as well as strong nonlinear distortions due to aliasing.Therefore, here, we introduce a band-limited interpolation or, respectively, a windowed sinc interpolation [51]: with the floor function ., the integer sample index k, the non-integer sample index k s = t s f s and the Hamming kernel: with an integer b describing the filter length.To keep the computational effort low, in the implementation of Equation ( 24), values of the kernel K are stored in a look-up table.The Doppler factor D in Equation ( 22) is implemented by approximating the derivative in Equation ( 23) by finite differences as: with index i.
In order to validate different implementations of Equation ( 22), numerical simulations were performed.Figure 5 compares the signal attenuation introduced by different interpolation schemes.The high frequency attenuation of the linear interpolation can be improved by a windowed sinc interpolation and controlled by parameter b. Figure 6 shows spectrograms of receiver signals calculated by the same three interpolation schemes.As an extreme case, a virtual source emitting a 1 kHz pure tone travels at constant speed v = 150 km/h and passes a static receiver at a distance of D = 7.5 m. Figure 6 shows that by introducing a windowed sinc interpolation of sufficient filter length, artifacts due to aliasing can be significantly reduced compared to a linear interpolation (a).The minimal kernel size b required for a decent sound quality cannot be stated in general, as it strongly depends on the application, i.e., the source signal, the propagation situation and, not least, the sampling frequency.However, in the example of Figure 6, already, b = 10 reaches a good sound quality, without audible artifacts.Nevertheless, to be on the safe side, a value of b = 100 was adopted.The careful choice of b, however, provides the potential for optimization in terms of sound quality and computational cost.
For sound speed c 0 = 340 m/s, the Mach number M ≡ v/c 0 ≈ 0.12.At times t = ±∞, the received frequencies f due to the Doppler shift are given by [2,52]: with f being the emitted frequency.In our example, according to Equation ( 27), the received frequency changes by a factor of 1.28 across the pass-by, which corresponds to a musical interval that is larger than a major third.The sum of the Doppler and the convective amplification amounts to [48]: Equation ( 28) yields an amplification of 2.3 dB at t = −∞ and an attenuation by 2.0 dB at t = ∞, resulting in a level difference of 4.3 dB across the pass-by.The numerical implementations of Equation ( 22) corresponded well with these theoretical values.

Ground Effect
In Equation (20), the ground effect is modeled in a physical way as the interference between direct and ground reflected sound.A flat topography is assumed, i.e., only one ground-reflected path is modeled, which is implemented by adding a second signal path.The ground-reflected sound differs from direct sound by scaling with its propagation distance and a complex reflection factor, as well as an additional delay.The complex reflection factor depends on frequency, geometry and ground surface type and is realized by the filter h gr,t .h gr,t is the impulse response of the spherical wave reflection coefficient at an infinite locally-reacting surface.The ground surface is acoustically described by a frequency-depending surface impedance, for which the widely-used empirical model of Delany and Bazley [53] was used.
In [46], the additional delay of the ground-reflected sound was modeled by a digital delay of integer length.However, in this application, due to the higher relative source speed and short delays, audible artifacts ("zipper noise") occur.Therefore, a separate resampling is performed in Equation ( 22) for the ground-reflected sound.Furthermore, this type of processing eliminates the spectrally-fluctuating errors (see Figure 1 in [46]).
The spherical wave reflection coefficient filter h gr,t is implemented by an FIR filter designed using the inverse FFT, as described in [46].It has to be made sure that the filter lag is compensated.However, compared to [46] for this application, substantially more filter taps are required to reproduce the correct interference pattern.Figure 7 shows simulation results for the standard configuration of road traffic noise emission measurements (a) and a receiver point at distance D = 100 m and height 2 m with sound propagation over grassy ground (b).For the former case a filter with 40 taps is sufficient, as the difference to the simulation with a filter with 400 taps stays well below 1 dB for all frequencies (nearly perfect coincidence of curves in Figure 7a).For the latter case, however, Figure 7b shows that such a short filter is not able to correctly reproduce the interference pattern and creates large errors at mid and low frequencies.A filter length of 400 taps allows simulations that are in good agreement with the exact solution for both cases.Large errors only occur near the Nyquist frequency.An update interval of the filter coefficients of 200 ms is used.

Air Absorption
For performance reasons, the identical air absorption filter h air,t is applied to the direct and ground reflected path in Equation (20).h air,t are linear-phase FIR filters designed using the inverse FFT, as described in [46].The frequency-dependent sound attenuation coefficients for atmospheric absorption as a function of relative humidity and temperature are calculated according to the standard ISO 9613-1 [54].A filter length of 30 taps is used with an update interval of the filter coefficients of 200 ms.

Reproduction Rendering
The rendering of the immission signals for reproduction strongly depends on the type of reproduction system.For surround reproduction via multiple loudspeakers, techniques, such as Ambisonics [55,56] or amplitude panning (e.g., Vector Base Amplitude Panning (VBAP) [57] or Multiple-Direction Amplitude Panning (MDAP) [58]), are possible candidates.For binaural reproduction over headphones, generally, head-related transfer functions (HRTF) should be applied.In this paper, for simplicity, a simulation of the "ORTF" stereo technique [59,60] is used.If the listener is facing the road, this allows for a reproduction with sufficient accuracy via headphones and a stereo speaker set-up.The cardioid microphone pattern and the time difference between the left and right channel are modeled by: with the time-varying time difference: The immission angle θ(t ) has to be evaluated on the receiver axis (see Equation ( 21)).y(t + u) is calculated using a windowed sinc interpolation strategy according to Equation (24).As a consequence of this interpolation, high-frequency attenuation, as shown in Figure 5, and nonlinear distortions are introduced to channel L.

Model Parameter Estimation
This section presents procedures to obtain the model parameters of the emission synthesizer described in Section 2.2.The procedures are based on controlled measurements.The following sections describe the measurements, as well as the signal processing that is applied to the acquired data.

Tire Noise
The emission parameters for tire noise were obtained from pass-by measurements with idling engine.For an individual tire type, pass-bys by the same passenger car at different speeds were recorded at a sampling frequency of f s = 44.1 kHz with a calibrated measurement microphone in a set-up referring to the standard ISO 11819-1 [25] and depicted in Figure 8a.The pass-by speed was measured by radar, and the pass-by time was determined from synchronous video.Under the assumption of constant speed, a time-dependent backpropagation to the source was performed.Thereby, two equal incoherent point sources at the nearby wheels were assumed, i.e., placed at the side of the car, horizontally separated by the wheelbase and set on the ground.For the temporal accordance, the sound propagation delay, as well as the filter group delays of the 1/3 octave band filters have to be taken into account.Consequently, emission levels at reference distances r 0 were obtained by integration over an emission angle range of 90 • .Applying a logarithmic transformation to the measured pass-by speeds, the linear regression parameters A i and B i of Equation ( 4) were fitted in a least-squares sense.Despite the idling engine, some low 1/3 octave bands were contaminated by the engine sound.To correct for this, in the first step, for each band, a quality criterion based on the correlation coefficient and the slope of the regression line was deployed.Adverse bands were identified and imputed based on the values of adjacent valid bands.In the second step, low-frequency peaks of A i were smoothed by a nonlinear method.Figure 9

Propulsion Noise
To obtain the emission synthesizer parameters of the propulsion noise, controlled measurements on a chassis dynamometer (see Figure 8b) and at idling engine under free field conditions were performed.Calibrated audio recordings at a sampling frequency of f s = 44.1 kHz at different microphone positions around the vehicle and at different engine conditions were taken.During the measurements on the chassis dynamometer, four microphones were placed on the ground at emission angles ϕ ≈ {0, 60, 120, 180} • at distances r = 1 to 2 m from the vehicle.During the free field measurements, four additional microphones were placed on the ground at the identical emission angles, but at larger distances of r = 4.5 to 7 m.The free field measurements were used to correct for the room influences of the lab, as explained in Section 3.2.5.On the chassis dynamometer, measurements were typically taken at engine speeds of n ≈ {1000, 2000, 3000, 4000} rpm and engine loads of Γ ≈ {0, 40, 70, 100}%.The measuring point pairs n, M of a measurement performed on a Ford Focus 1.8i are depicted as circles in Figure 3.The topmost points at 1000 to 4000 rpm are at full load, i.e., Γ = 100%.To confine tire noise, low vehicle speeds were aimed for by choosing low driving gears.Mostly, it was the second gear, which resulted in vehicle speeds <50 km/h.
To these recordings, a series of signal analysis steps were applied, which are outlined in a signal flowchart in Figure 10.These steps are further explained in the following sections.For the signal processing, a signal length of 4 s is used.

Resampling
The emission synthesizer uses detailed information about the engine orders.These parameters are obtained by a narrowband analysis, which is described in the following section.Although during the measurements, the engine speed was kept fairly constant, the instantaneous order frequencies slightly fluctuate as exemplarily shown in Figure 11.This figure shows the spectrogram of a recording made at the rear of a car with an inline, four cylinder engine idling at 1100 rpm.To be able to separate engine orders and broadband noise by the narrowband analysis, a preceding resampling of the slightly non-stationary signals is performed.In order to actuate the resampling process, the instantaneous ignition frequency, F ign (t), of the engine is required.This data are extracted from the audio recordings.
In the first step, the average ignition frequency is estimated.From the signal taken closest to the exhaust, the power spectral density (PSD) with a frequency resolution <1 Hz is calculated.Based on the rough indication of the engine speed taken from the car's tachometer, a first estimate of the ignition frequency, F ign , is obtained using Equation (17).The location of the maximum value of the PSD within a search range around this frequency yields a better, second estimate.Particularly, for low engine speeds, at which the ignition frequency can be as low as 20 Hz, this estimate is still not precise enough due to the low relative resolution at low frequencies.Thus, this estimate is further enhanced by considering the double ignition frequency, 2F ign , (i.e., engine order ν = 4 in Equation ( 16) for a four-cylinder engine) within a smaller range of the PSD.In the second step, this information is used to track the course of the ignition frequency, F ign (t).This task is generally known as pitch detection [61][62][63].A wide variety of algorithms exist that work in the time or frequency domain or a combination of them.In our application, a spectral method was established, in which the course of one discrete frequency component (i.e., an engine order) is tracked in a spectrogram.The spectrogram S(t, f ) expressed in decibels is computed by the short-time Fourier transform (STFT): STFT is calculated using the FFT.Windows of 200 ms with a 50% overlap, i.e., a temporal resolution of ∆t = 100 ms, are multiplied by a Hann window function.To obtain a high frequency resolution of ∆ f < 0.5 Hz, the signals are zero padded.A section of such a spectrogram is depicted in Figure 11.Within the spectrogram, the "highest cost" path between time t = 0 and t = T, within a certain frequency range around a reference frequency, F r , is sought.F r is chosen to be the first multiple of the mean ignition frequency above 55 Hz.This is a compromise between signal power and frequency localization: typically, the power decreases for increasing even orders (see Figure 11b), but higher orders exhibit larger absolute frequency variations (see Figure 11a).In the example of Figure 11, F r is 71 Hz (corresponding to the fourth engine order), as the mean ignition frequency lies at 35 Hz.
The optimization task is solved by dynamic programming, which breaks the complex problem down into many simple subproblems.This method prevents taking possible wrong local decisions and guarantees that the best solution is found.A well-known algorithm that uses dynamic programming is dynamic time warping (DTW), which is often applied in, e.g., automatic speech recognition (ASR).Additionally, we make use of the a priori knowledge that the engine speed does not change rapidly over time.This is introduced as a requirement on the slope of the optimal path, F opt .The algorithm described below is based on an algorithm developed for object tracking in video data [64].Within the search section of the discrete spectrogram, the local score q is calculated by: for which holds q ≥ 0. From q for each time step m = {1...M} and frequency bin l, the global score Q(m, l) is recursively computed by: with a positive integer c realizing the requirement: on the absolute value of the slope C of F opt given in Hertz per second.For the forward processing described by Equation (34), the starting condition is that the initial global score, Q(1, l), is set to the local score, i.e., Q(1, l) = q(1, l).During the evaluation of Equation (34), it is essential that the back pointers: B(m, l) = arg max to the optimal predecessors are stored.From the global score Q, the end point of the optimal path is found by: Using the back pointers B, the optimal path can be traced back using a recursive procedure known as backtracking: In Appendix, we provide a simple MATLAB code, which solves Equations ( 34) to (38). Figure 11 shows the optimal path drawn as a black line following the frequency component around 70 Hz.
In the third step, the sound pressure signal is asynchronously resampled based on the course of the tracked engine order.The warped time axis is calculated by: where F opt (t) is the linearly-interpolated version of F opt (m).For the resampling of the sound pressure signals, a windowed sinc interpolation, as described by Equation ( 24), is adopted.Figure 11 illustrates the effect of the asynchronous resampling on the power spectral density.In contrast to the original signal, for the resampled signal, all even engine orders from two to 12 can be clearly identified as equidistant, narrow peaks.

Order Analysis
From the resampled signals, information about the engine orders is extracted.Therefore, a filter bank consisting of one bandpass filter per considered engine order is generated and applied to the signal.Eighth order Butterworth filters centered around the engine order frequency F ν with a 6-Hz bandwidth are employed.Figure 12 shows the magnitude frequency response of the filter bank.At the output of each filter, the corresponding order level, L prop,ord,ν in Equation (14), is calculated as an equivalent continuous level (Leq).Figure 13 exemplifies measured order levels at idling engine and full load recorded at the rear of a VW Touran running at 1000 and 3000 rpm.The order phases are detected using the cross-correlation function.Since the above-described infinite impulse response (IIR) filter bank introduces phase shifts, the outputs of the filter bank are time reversed and sent once again through the same filter bank and, finally, time reversed.In doing so, a zero-phase forward and reverse digital IIR filtering is implemented.This signal, g ν (t), is cross-correlated with a prototype function cos(2πF ν t) to obtain the time shift: from which the phase shift of Equation ( 15) can be derived as:

Noise Analysis
The noise levels and their short-term level fluctuations are obtained by a series of filtering operations.Starting with the resampled signal, in a first attempt, the engine orders are suppressed using cascaded notch filters.These filters are designed analogously to the engine order filter bank from the previous section, except that instead of bandpass filters, band-stop filters are generated (see Figure 12).Figure 15 shows two power spectral densities, which illustrate the effect of the order suppression filter.After this operation, the signal is split into sub-bands for further analysis.The signal is therefore decomposed into 1/3 octave bands using a 1/3 octave band filter bank.Each of the N b filters yields a signal q i (t).From q i (t), the noise levels, L eq,prop,noise,i in Equation (19), are calculated as Leqs.Moreover, from q i (t), using a moving average filter, smoothed level-time curves: are calculated using a window length of K = 4 ms.Subsequently, from L q,i , the mean value is subtracted to obtain a DC-free level fluctuation signal: Figure 16a exemplifies such a fluctuation signal for the 2.5-kHz band recorded at the front of a diesel engine car.The periodic structure is clearly visible.Following [6], the autocorrelation function (ACF) is used to estimate the standard deviations σ i (used in Equation ( 19)) of the level fluctuations with period 1/F ign by: Figure 16b shows the square root of the ACF of the level fluctuation signal depicted in Figure 16a.Clear peaks can be observed at lag zero and multiples of the ignition period of 34 ms.The standard deviation σ amounts to about 5 dB.The fact that a higher peak appears at the double ignition period, at 68 ms, indicates that the signal contains an additional level modulation with a modulation frequency equal to the half ignition frequency.This can also be observed in Figure 16a in which every second peak is about 5 dB higher than the previous one.
Figure 17 shows measured spectra of the standard deviations σ i .The measurements were performed in front of five cars idling at low engine speeds.It can be seen that diesel cars feature higher values compared to gasoline engine cars.This finding corresponds to the increased rattling sound noticed in the field.

Background Noise Corrections
On the test rig, the main background noise sources were the tire noise, the airstream fan, the room ventilation and the dynamometer itself.Firstly, to confine the tire noise, the measurements were performed at low vehicle speeds, i.e., low gears.Secondly, during the measurements, the airstream fan (depicted in Figure 8) was briefly switched off for periods of about 10 s.However, the dropping tonal components of the fan still strongly interfered with the propulsion noise of the car (see Figure 18).Therefore, several shifted analysis time windows were deployed, and the minimal levels and the maximum level standard deviations, σ, across these windows were exploited.Thirdly, background noise measurements with a switched off engine at different vehicle speeds were performed.For each ordinary measurement, the corresponding background noise was identically analyzed and used for level corrections.As the interface to the propagation model, the emission signals are defined at a (virtual) reference distance of r 0 = 1 meter from the source position.For the measured levels L lab , the following inverse sound propagation model is used: L Em,1 m = L lab + 20 log r Ac r 0 + A room + A gr (45) with the ground effect A gr = −6 dB, as all microphones were mounted on the ground.For the microphones placed close to the room edge (emission angles ϕ = 60 • and 120 • ), the room correction A room was set to −6 dB for frequency bands below 1 kHz and to −3 dB otherwise.As for the microphones placed in front and at the back of the car, the distance to the closest wall was about three meters; A room was set to 0 dB for these signals.r Ac is the distance to the acoustical center, which, by assuming geometrical spreading of a point source, is obtained by simultaneous free field measurements at two points at distances r and r (r > r) with: Parameter r Ac is evaluated separately for each emission angle, engine speed and frequency band (or engine order, respectively).

Conclusions
In the proposed auralization model, emission sounds of accelerating passenger cars are artificially generated based on spectral modeling synthesis.Whereas the sound of tires is synthesized as stationary noise, which is time-dependently shaped in third octave bands, the realistic synthesis of sounds of propulsion requires more subtlety.
It is synthesized as the superposition of a noise component and tones.Frequency-dependent periodic short-term modulations are applied to the noise component in order to create a rattling sound eliciting a roughness sensation.The tones are related to the engine orders.It was found that a large number of engine orders are needed (50...100) to convincingly represent different engine speeds and loads.Moreover auralizations revealed that the order phases have to be included as synthesizer parameters.In conclusion, the presented emission synthesizer gives complete control over the signal characteristics, but is computationally much more demanding than a synthesizer based on granular synthesis [5] with limited flexibility.However, a hybrid approach could profit from both advantages, i.e., by pre-creating relevant signal grains using spectral modeling synthesis and usage for granular synthesis in real-time applications.
Analysis of the propagation filtering algorithms yielded two main insights.Aliasing, arising from the simulation of the Doppler effect, can be reduced by incorporating a band-limited resampling strategy, such as the windowed sinc interpolation.Furthermore, due to the low source height, a significantly higher number of filter taps is needed to correctly simulate the ground effect in relevant situations, as compared to elevated sources, such as airplanes or wind turbines [46].
We conclude that with the presented synthesizer structure, audio signals from vehicle pass-bys can be represented in a compact and elegant manner.To give the reader an impression of the subjective quality of the proposed model, auralizations of two examples are published as supplementary data (see the videos in the Supplementary File and Section 2.3 for details).

Figure 1 .
Figure 1.Sketch of the geometrical situation showing the two source positions S1 and S2, the inclination angle of the road α, the distance D, the instantaneous vehicle speed v, emission angle ϕ, immmission angle θ and source-receiver distance r.

Figure 2 .
Figure 2. Simulation flowchart of the auralization of accelerating passenger cars.The input variables marked by a * are time dependent.

Figure 3 .
Figure 3. Simulation results: The upper graphs (a,b) show two simulated engine condition courses of an accelerating Ford Focus 1.8i with different accelerations and driving styles.The gray triangles show the interpolation grid spanned by the measuring points marked as circles, as introduced in Section 2.2.3.The lower two graphs (c,d) show the spectrograms of the corresponding synthesized pass-by sounds (normalized to 0 dB).Their calculation is elucidated in Section 2.3.

Figure 4 .
Figure 4. Signal flow chart of the synthesizer for the sound of propulsion.

Figure 7 .
Figure 7. Simulated ground effect spectra for a point source at a height of 0.3 m in the reference situation (a) with a receiver at a height of 1.2 m at a horizontal distance of D = 7.5 m and propagation over hard ground (flow resistivity 20,000 kPa•s•m −2 ); and a distant situation (b) with a receiver at a height of 2 m at horizontal distance D = 100 m and propagation over grassy ground (flow resistivity 200 kPa•s•m −2 ).

Figure 8 .
Figure 8. Photographs showing the measurement set-ups for tire noise (a) and propulsion noise (b).In (a), the coast-by situation is depicted with two measurement microphones placed at different distances and a camera connected to a laptop; (b) shows the lab with a passenger car on the chassis dynamometer, the airstream fan in front of the car and two microphones on the floor at the left-hand room edge (emission angles ϕ = 60 • and 120 • ).

Figure 9 .
Figure 9. Measured tire noise regression parameters A i (a) and B i (b) of 13 tires and the values according to the Harmonoise model [21] (dotted lines).

Figure 10 .
Figure 10.Signal analysis flowchart to obtain the synthesizer parameters of propulsion noise as described in Section 2.2.3 from audio recordings.

Figure 11 .
Figure 11.Normalized spectrogram (a) of the measured sound pressure signal with tracked double ignition frequency (drawn as a black line) and power spectral density (b) of the original and asynchronously resampled sound pressure signal, respectively.The recording was conducted at the rear of the BMW with an inline, four-cylinder engine idling at 1100 rpm.

Figure 13 .
Figure 13.Comparison of engine order levels with idling engine (white) and full load (black) at 1000 (a) and 3000 rpm (b).Recorded at the rear of a VW Touran 1.6 FSI.

Figure 14 comparesFigure 14 .
Figure 14 compares the sound pressure signals of a recording and the corresponding synthesis consisting of engine orders with constant phases, which were estimated by Equation (41).0 50 100 150 200 −2

Figure 15 .
Figure15.Power spectral densities illustrating the effect of the order suppression filter, which is applied to a recording of an inline, four-cylinder engine idling at 4000 rpm.

Figure 16 .Figure 17 .
Figure 16.Level fluctuation signal (a) of the 2.5-kHz 1/3 octave band and its square root of the autocorrelation function (b) from a recording taken at the front of an idling four-cylinder diesel engine at 870 rpm, corresponding to an ignition period of 34 ms.The right plot indicates that for this band, the level standard deviation, σ, amounts to about 5 dB.

Figure 18 .
Figure 18.Normalized spectrogram of a microphone position in front of a Ford Focus 1.8i at 4000 rpm and full load on the dynamometer.The dropping tonal component around 100 Hz stems from the briefly switched off airstream fan of the lab.