Article

Binaural Modelling and Spatial Auditory Cue Analysis of 3D-Printed Ears

1 School of Electrical & Electronic Engineering, Universiti Sains Malaysia, Nibong Tebal 14300, Penang, Malaysia
2 Flextronics Systems Sdn. Bhd., Batu Kawan Industrial Park PMT 719 Lingkaran Cassia Selatan, Simpang Ampat 14110, Penang, Malaysia
* Author to whom correspondence should be addressed.
Sensors 2021, 21(1), 227; https://doi.org/10.3390/s21010227
Submission received: 2 October 2020 / Revised: 7 November 2020 / Accepted: 11 November 2020 / Published: 1 January 2021
(This article belongs to the Section Electronic Sensors)

Abstract

In this work, a binaural model resembling the human auditory system was built using a pair of three-dimensional (3D)-printed ears to localize a sound source in both the vertical and horizontal directions. An analysis of the proposed model was first conducted to study the correlations between the spatial auditory cues and the 3D polar coordinates of the source. Apart from the estimation techniques via interaural and spectral cues, a property of the combined direct and reverberant energy decay curve is also introduced as part of the localization strategy. The preliminary analysis reveals that the latter provides a much more accurate distance estimation than approximations via the sound pressure level approach, but alone it is not sufficient to disambiguate the front-rear confusions. For vertical localization, it is also shown that the elevation angle can be robustly encoded through the spectral notches. By analyzing the strengths and shortcomings of each estimation method, a new algorithm is formulated to localize the sound source, which is further improved by cross-correlating the interaural and spectral cues. The proposed technique has been validated via a series of experiments in which the sound source was randomly placed at 30 different locations in an outdoor environment up to a distance of 19 m. Based on the experimental and numerical evaluations, the localization performance was significantly improved, with an average distance estimation error of 0.5 m and a considerable reduction of the total ambiguous points to 3.3%.

1. Introduction

1.1. Background

In the field of acoustics and robotics, a minimum of three microphones is required to triangulate a sound source in a two-dimensional (2D) space [1,2]. With only two omnidirectional microphones, the loci of candidate positions intersect at two points, implying that there are two possible locations from which the sound could originate. Adding a third microphone reveals the unique position of the sound source by eliminating the other possibility. Despite having only two ears, i.e., binaural hearing, humans and animals are able to localize a sound source not only in a 2D space, but also in a three-dimensional (3D) space by analyzing different auditory cues. The brain deciphers the audio cues to predict the direction and distance of the sound source [3]. The job of the ears is to capture and send natural acoustic signals to the brain for processing. The shape of the ear and head additionally plays a role in localizing the sound source by reflecting and diffracting the sound to help the brain identify the direction [4]. The gaming and recording industries have begun using ear-shaped recording devices to make binaural recordings, giving a more natural hearing experience for listeners [5]. In the gaming industry, binaural audio enables the player to identify where a sound is coming from in the game, giving the listener a virtual sense of space [6].
Acoustic triangulation is based on the physical phenomenon that sound waves in the far field can be treated as plane waves; if the source is not exactly equidistant from all microphones, there will be a time delay between the first microphone and the subsequent microphones [7]. In binaural hearing specifically, this is known as the Interaural Time Difference (ITD) [8]. The ITD is the difference in the arrival time of a sound between the two ears. It is crucial in the localization of sounds, as it provides a cue to the direction or angle of the sound source from the head. The brain registers the time lag and informs the listener of the direction of the sound [9]. ITD analysis is one of the techniques used to predict the angle of arrival of the sound source with respect to the receiver on the azimuth plane.
The Interaural Level Difference (ILD) is another spatial auditory cue that helps a human localize a sound source. The ILD is defined as the difference in amplitude between the two ears [10]. When a sound source is closer to one ear, the sound level will be louder in that ear than in the other, as sound is attenuated by distance and by the head. The direction of the sound source can be localized by comparing the level difference between the two ears. The ITD is primarily used for low-frequency localization below about 800 Hz. Head shadow effects increase with frequency, and loudness differences are therefore the primary horizontal localization cues for frequencies above 1500 Hz [11]. In many applications, the ILD and ITD are used in tandem for a more accurate position estimation on the horizontal plane.
Vertical localization is essential if one is to estimate the position of a sound source in a 3D space, and for binaural auditory systems, this can only be realized with the presence of the ears. The shape of the ears is such that the amplitude and frequency response change depending on where the sound source is located around the head. The pinna, the outer part of the ear, acts as a filter that attenuates certain frequency ranges and plays a major role in helping the human auditory system localize the angle and distance of the sound source [12]. Since the shape of the pinna is very complex and asymmetrical, different pinna resonances become active both vertically and horizontally, depending on the location of the source. These resonances add direction-specific patterns to the frequency response of the ears, which are then recognized by the auditory system for direction localization [13].

1.2. Related Work, Motivation and Contributions

In a binaural system, there exists a set of points that are equidistant from the left and right ears, which results in ILD values that are almost identical, creating a zone called the cone of confusion. This sound ambiguity, typically referred to as “front-rear” confusion, commonly occurs when localizing a sound source with binaural audition, where it is difficult to determine whether the source is behind or in front of the receiver [14]. One way to resolve this is by introducing dummy heads, such as the Head and Torso Simulator (HATS), with a microphone placed inside each ear canal to allow for the creation of various acoustic scenes [6]. Via this strategy, several researchers proposed Head-Related Transfer Function (HRTF) estimations, where the transfer functions of the left and right ears were preliminarily analyzed and constructed before the binaural model was reproduced [15]. However, some techniques via HRTF have been experimentally outperformed by another approach in [16], which used artificial pinnae made with silicone to remove the ambiguity by comparing the mean intensity of the sound source signal in a specific frequency range against a threshold value.
For binaural localization in targeted rooms, statistical relationships between sound signals and room transfer functions can be analyzed prior to real-time location estimations, such as the work presented in [17]. The accuracy can be further enhanced by jointly estimating the azimuth and the distance of binaural signals using artificial neural network [18,19]. Another approach utilizing the room’s reverberation properties has been proposed in [20], where the reverberation weighting is used to separately attenuate the early and late reverberations while preserving the interaural cues. This allows the direct-to-reverberant (DRR) energy ratio to be calculated, which contains the information for performing absolute distance measurement [21].
Most of the aforementioned work nonetheless gravitates toward sound source localization and disambiguation on the horizontal or azimuth plane, with greater focus on indoor environments. In order to estimate the vertical direction of the sound source, spectral cue analysis is required, as the direction heavily depends on the spectral composition of the sound. The shoulders, head, and especially the pinnae act as filters interfering with incident sound waves through reflection, diffraction, and absorption [22,23]. Careful selection of the head dampening factor, the materials for the ears/pinnae, and the location of the microphones is equally important for a realistic distortion of the sounds [24]. In [25], artificial human ears made from silicone were used to capture binaural and spectral cues for localization in both the azimuth and elevation planes. To enhance the localization and disambiguation performance while retaining the binaural hearing technique and structure, a number of recent works have proposed using active ears, inspired by animals such as bats, which are able to change the shape of their pinnae [26,27,28]. In this regard, the ears act as actuators that can induce dynamic binaural cues for a better prediction.
While reproducing the auditory model of binaural hearing may be a challenging problem, the past decade has seen a renewed interest in binaural approaches to sound localization, which have been applied in a wide area of research and development, including rescue and surveillance robots, animal acoustics, as well as human-robot interactions [29,30,31,32,33,34,35]. Unique or predetermined sound sources, for instance, can be embedded in search and rescue robots for ad-hoc localization in hazardous or cluttered environments, as well as for emergency signaling in remote or unknown areas [36,37]. This approach is particularly useful when vision-based searching is occluded by obstacles that still allow sound to pass through [38].
Inspired by the intricacies of human ears and how they can benefit a plethora of applications if successfully reproduced, this research aims to build a binaural model that is similar to the human auditory system for both vertical and horizontal localization using a pair of ears 3D-printed out of Polylactic Acid (PLA). Unlike the silicones mostly used by past researchers for binaural modelling [39], PLA is generally more rigid and is a more common material in domestic 3D printers. Using 3D-printed ears with PLA will also allow for cheaper and quicker replication of this work in future studies. The ears, anatomically modeled after an average human ear, were additionally mounted on the Styrofoam head of a mannequin to approximate the shape and size of an average human. The purpose is to build a binaural recording system similar to the HATS that is able to capture all the different auditory cues, and for the system to have a head shadow that changes the spectral components. The HATS replica may not be as good as the actual simulator, but it provides a cheap and quick alternative for simple measurements.
In this work, an analysis of the proposed model was first conducted to study the correlations between the spatial auditory cues and the 3D polar coordinates (i.e., distance, azimuth and elevation angles) of the targeted sound source. Apart from the techniques via interaural and spectral cues, the time for the sound pressure level (SPL) resulting from the combined direct and reverberant intensities to decay by 60 dB (hereafter denoted as DRT60) is also introduced as part of the localization strategy. The preliminary analysis reveals that the latter provides a much more accurate distance estimation than the SPL approach, but alone it is not sufficient to disambiguate the front-rear confusions. For vertical localization, it is also shown that the elevation angle can be robustly encoded through the spectral notches. By analyzing the strengths and shortcomings of each estimation method, a new algorithm is formulated to localize the sound source, which is further improved with induced secondary cues via cross-correlation between the interaural and spectral cues. The contributions of this paper can thus be summarized as follows:
(a) auditory cue analysis of ears 3D-printed out of cheap off-the-shelf materials, which remains underexplored; and
(b) a computationally less taxing binaural localization strategy with DRT60 and an improved disambiguation mechanism (via the induced secondary cues).
This work is motivated by recent studies on binaural localization for indoor environments that utilized the ITD in a 3D space [40], both the ILD and ITD in a 2D space [41], and the DRR for distance estimation of up to 3 m [42]. The aforementioned studies, however, did not use pinnae for front-rear disambiguation, hence requiring either a servo system to integrate rotational and translational movements of the receiver or other algorithms to solve for unobservable states in the front-rear confusion areas until the source can be correctly localized. Instead of targeting indoor environments with additional control systems to reduce the estimation errors, the focus of this work is on an outdoor environment with a relatively larger 3D space. As both the DRR and the reverberation time (RT) change with the distance between the source and the receiver [43], particularly in outdoor spaces [44,45], this property has been exploited and correlated with the distance to further improve the estimation accuracy. The proposed technique has been validated via a series of experiments in which the sound source was randomly placed at 30 different locations with source-receiver distances of up to 19 m.

2. Sound Ambiguity

Human binaural hearing is able to approximate the location of a sound source in a spherical or 3D coordinate system (i.e., azimuth and elevation planes). This is achieved by the shape and position of the ears on the head, and by the auditory cues interpreted by the brain. As illustrated in Figure 1, the azimuth angle is represented by θ, while the elevation angle is represented by ϕ. Moving the sound source from left to right changes θ, moving it up and down changes ϕ, and each varies from 0° to 360°.
With only two microphones in a binaural system, there will be localization ambiguity on both the azimuth and elevation planes. With regard to the azimuth plane, for every measurement there will be an ambiguous data point located at the mirrored position along the interaural axis where the two microphones are placed, as illustrated in Figure 2a. Localizing the sound source on the elevation plane is relatively more difficult, as there will be an infinite number of ambiguous positions, as depicted in Figure 2b. This paper looks into finding the actual sound location by using auditory cues. When the distance between a sound source and the microphones is significantly greater than the distance between the microphones, we can consider the sound a plane wave and the sound incidence reaching each microphone as parallel.
For the elevation angle ϕ, when the sound source is located at θ = 0°, the ITD and ILD at each ear would theoretically be the same. For an omnidirectional microphone, this would be impossible to solve, as there is an infinite number of possibilities for where the actual sound source could be located and no difference in the measured values. Nevertheless, with the addition of the head and ear, sound is reflected and attenuated differently as it is moved around the head. Attenuation happens in both the time and frequency domains over different frequency ranges. This work aims to localize a sound source in the azimuth and elevation planes while retaining only the actual location by removing the ambiguous points. The proposed method is based on the analysis of different auditory cues and the characterization of their properties in order to estimate the location of the sound source relative to the receiver.

3. Materials, Methods and Analysis

The experimental setup in this work follows the HATS, where the geometry is the same as that of an average adult. The ear model, 3D-printed out of PLA, was scaled to the dimensions shown in Figure 3a to fit the application (for the pinna shape, we referred to https://pubmed.ncbi.nlm.nih.gov/18835852/, which provides a database containing 3D files of human parts for biomedical research). A microphone slot for each side of the head model was also designed, as depicted in Figure 3a,b, and the two microphones were connected to a two-channel sound card (Focusrite Scarlett 2i2) for simultaneous recording (Figure 3c). The hardware consists of two parts, namely the microphone bias and the sound card, as shown in Figure 3d. The gain was adjusted for each ear to balance the left and right channels. The ears were also polished with a solvent to smooth out the plastic parts, and treated with the fumes from the solvent to smooth out the internal parts. A mechanical mesh was then placed on top of each microphone when assembling the 3D-printed ear to act as a filter. For printing, a DreamMaker Overlord 3D printer was used. Details on the printing parameters are listed in Table 1 (the printer is available at https://www.dfrobot.com/product-1299.html, while the STL file is available from the Supplementary Materials). The total cost for this setup is approximately USD 175 (i.e., USD 11.4 for the 3D-printed ears, USD 2.89 for the Styrofoam head, USD 156.5 for the sound card, and USD 4.2 for the bias circuit).
Figure 4 shows the binaural processing chain within the device under test (DUT) used to localize the sound source. The first stage is the data acquisition from the left and right microphone inputs. To ensure no sound leakage, each microphone is sealed to the printed ear with silicone. The next stage is the amplifier stage, which biases the signal to 1.5 V. The microphones used are rated to 3.0 V, so a standard lithium battery was enough to prevent the microphones from saturating at the reference point. Following the amplifier (after the analog-to-digital converter (ADC)) is the filtering stage, which consists of a bandpass filter with a cut-off frequency f_c = 3.5 kHz and a bandwidth BW = 1.2 kHz to attenuate the effects of environmental noise. The sound source considered has a frequency range from 2.8 kHz to 4.0 kHz, hence frequencies beyond this range can be filtered out.
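As a concrete illustration of this filtering stage, the sketch below builds such a bandpass filter in Python with SciPy. The Butterworth design, the filter order, and the zero-phase filtering are our assumptions; the text only specifies f_c = 3.5 kHz and BW = 1.2 kHz at a 44.1 kHz sampling rate.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 44_100          # sampling rate (Hz), as used for the recordings
FC = 3_500.0         # centre frequency of the bandpass (Hz)
BW = 1_200.0         # bandwidth (Hz)

def bandpass(signal: np.ndarray, fs: float = FS) -> np.ndarray:
    """Attenuate environmental noise outside the source's frequency range."""
    low, high = FC - BW / 2, FC + BW / 2
    # 4th-order Butterworth bandpass (the order is an assumption; the paper
    # only specifies f_c and BW).
    sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
    # Zero-phase filtering so later time-delay estimates are not biased
    # by the filter's group delay.
    return sosfiltfilt(sos, signal)
```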
The following block is the analysis of the auditory cues, split into four categories: spectral cues (SC), DRT60 (explained further in Section 3.2), SPL, and ITD. A Fast Fourier Transform (FFT) is performed on the filtered signal to find the spectral components of the audio source in the frequency domain, which are used for the SC. DRT60, SPL, and ITD, on the other hand, are measured in the time domain. The ITD and SC are essential for the azimuth and elevation angle estimations, respectively. To estimate the Euclidean distance between the center of the DUT and the sound source, both DRT60 and SPL are used. Ambiguous data points are then filtered before the sound source is localized.
Before the actual experiment was conducted, a preliminary analysis was performed in order to ensure its feasibility. To observe the SPL and frequency responses with respect to azimuth and elevation angles, a sound source was placed at d = 110  cm from the receiver and positioned on a rotating jig as depicted in Figure 5. A Bluetooth speaker was used in order to play a recording of the sound source intended for the actual experiment. Figure 6 shows how θ and ϕ are measured on their respective planes.
During the pilot testing (a controlled environment with a noise floor below −60 dB was selected), the sound source was rotated along the azimuth and elevation planes at steps of 15°. The jig was used to ensure consistent angle increments and to keep the sound source at a fixed distance. In this test, both left and right audio were captured simultaneously, and each test was repeated three times to analyze the consistency of the measurement setup. Figure 7 shows the polar plots of the SPL measured at the left and right ears for the three trials on both the azimuth and elevation planes. For every azimuth and elevation angle, an FFT was applied to the signal and the peak at each desired frequency point was measured. Figure 8 illustrates the frequency responses of the spectral components of the sound source.
Figure 9 illustrates the variations of the SPL and the frequency response in a 3D Cartesian plane, where x_0, y_0 and z_0 correspond to d cos θ, d sin θ and d sin ϕ, respectively. Based on the SPL response, it is observed that the variations of the amplitude are relatively much smaller on the azimuth plane (i.e., y vs. x) than on the elevation plane (i.e., z vs. y). With regard to the frequency response, it can be seen that the amplitude on the azimuth and elevation planes changes significantly enough to be distinguishable from other coordinates. This is due to the shape of the ears as well as reflections around the head, which induce notches into the spectrum. This signifies the suitability of these cues for horizontal localization, and of the notches in the frequency response for vertical localization. The following sections describe in greater detail how these properties, along with the ITD, DRT60, and SC, are exploited to localize the sound source.

3.1. Interaural Time Difference (ITD)

In order to estimate the direction of the sound source, the angle of the incident wave with respect to the DUT, also known as the angle of arrival (AoA), needs to be found. This is done by comparing the delay between the sound signals of the two microphones, which is termed the ITD in the context of binaural hearing. To this purpose, let the ITD be written as τ_d = |t_R − t_L|, where t_R and t_L refer to the times of arrival of the sound at the right and left microphones, respectively. Let Δx denote the distance between the two microphones, which is 0.20 m, and ν_s the speed of sound, i.e., 343 m/s. From the illustration shown in Figure 10, it can be intuitively seen that the wave front will arrive at Mic L later than it does at Mic R. The AoA, β, as seen by Mic L relative to Mic R can be calculated using Equation (1) below.
$$\Delta d = \nu_s \tau_d; \quad \Delta d = \Delta x \sin\beta; \quad \beta = \arcsin\!\left(\frac{\Delta d}{\Delta x}\right). \tag{1}$$
To quantify the phase shift, the cross-correlation in Equation (2) is applied to measure the ITD between the two signals, where N = 44,100 refers to the total number of observations, m_a(i) is the signal received by Mic R, and m_b(i) is the signal received by Mic L. The notations m̄_a and m̄_b denote the means of m_a(i) and m_b(i), respectively. The cross-correlation coefficient R_ab can then be calculated as follows
$$R_{ab}(\tau_d) = \frac{\sum_{i=1}^{N}\left[(m_a(i) - \bar{m}_a)\,(m_b(i - \tau_d) - \bar{m}_b)\right]}{\sqrt{\sum_{i=1}^{N}(m_a(i) - \bar{m}_a)^2}\;\sqrt{\sum_{i=1}^{N}(m_b(i - \tau_d) - \bar{m}_b)^2}} \tag{2}$$
which returns a value ranging from −1 to 1. Audio was captured at a sampling rate, f_s, of 44,100 samples per second. The lag at which the cross-correlation coefficient peaks denotes how many samples apart the two waveforms are.
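As an illustration of this step, the sketch below estimates the ITD with numpy and converts it to an AoA via Equation (1). The global normalization (instead of the lag-dependent denominator of Equation (2)) and the full search over all lags are simplifying assumptions of ours.

```python
import numpy as np

FS = 44_100      # samples per second
V_S = 343.0      # speed of sound (m/s)
DX = 0.20        # microphone spacing (m)

def estimate_aoa(mic_r: np.ndarray, mic_l: np.ndarray) -> float:
    """Estimate the AoA (degrees) from the ITD between two equal-length channels."""
    a = mic_r - mic_r.mean()
    b = mic_l - mic_l.mean()
    # Cross-correlation over all lags; the lag of the peak is the ITD in samples.
    corr = np.correlate(a, b, mode="full")
    # Global normalization (a simplification of Equation (2), whose
    # denominator is lag-dependent).
    corr /= np.sqrt(np.sum(a**2) * np.sum(b**2))
    lag = np.argmax(corr) - (len(b) - 1)      # signed lag in samples
    tau_d = lag / FS                          # ITD in seconds
    # Equation (1): delta_d = v_s * tau_d, beta = arcsin(delta_d / delta_x)
    delta_d = np.clip(V_S * tau_d, -DX, DX)   # guard the arcsin domain
    return float(np.degrees(np.arcsin(delta_d / DX)))
```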
It is worth noting that the ILD can also be used as a means of measuring the AoA by comparing the ratio of attenuation between the two ears. The amount of attenuation and the left-right ratio were characterized by placing the sound source at θ = 90° and θ = 270°. The ILD is able to capture the AoA by comparing the attenuation, but it is not as accurate as using the ITD. As an example, when the audio source is closer to the left ear at θ = 45°, the amplitude is higher on the left than on the right, and vice versa. When the sound source is at θ = 0°, the amplitudes are roughly at the same level. Estimating the angle from the ILD is inaccurate and unreliable compared to the cross-correlation of the ITD: many factors affect the attenuation of sound, such as the environment, the distance from the sound source, and reflections, which can make an estimate based on this parameter erratic. Since the cross-correlation looks at the similarity of the audio signals between left and right, it is more robust and less susceptible to interference. In this work, the cross-correlation-based ITD was more consistent at determining the AoA than the attenuation-ratio method based on the ILD: from the testing, the ILD estimation method has an error of ±20°, while the ITD has an error of ±10°. Although the ILD is not directly used in the estimation of the angle in this study, the SPLs at each ear are instrumental for distance estimation and front-rear disambiguation. The subsequent sections present the analyses of the DRT60 and the SPL, along with the proposed methods to estimate the distance and direction of the sound source.

3.2. Direct and Reverberant Energy Fields

While the RT is predicted to be constant by Sabine's equation in many enclosed acoustical environments, it has been shown in [43] that it can vary with the distance between the sound source and the receiver under certain circumstances, thus contributing to the variation of the DRR with distance. The dependency of the RT on distance is also more prominent in outdoor spaces, as reported in [44,45]. As a consequence, the SPL measured at the receiver is usually a combination of energies from both the direct and reverberant fields, which is consistent with the theoretical conclusion in [21]. Hence, depending on the application, considering the combined pressure level is relatively more practical, given the observed dynamics of both the DRR and RT in past studies.
In this work, a car honk was used as the targeted sound source, as it creates distinctive acoustic characteristics that are suitable for outdoor spaces. The impulse-to-noise ratio (INR) for this sound is above 44.2 dB, which is sufficient according to ISO 3382-2 for accurate RT measurement in outdoor spaces within a 50 m range [45]. Its unique identity was represented by its frequency components, where the range varied from 2.9 kHz to 4.0 kHz with peaks at every 200 Hz interval. In this analysis, where the setup was outdoors, the sound source was initially placed in front of the DUT on the azimuth plane (i.e., θ = 0°, ϕ = 0°), and data were captured at varying distances ranging from 1 m to 19 m. Figure 11a shows the time response of the measured sound amplitude after the source was abruptly switched off at different distances. To calculate the DRT60, which refers to the time for the combined direct and reverberant energy level to decay by 60 dB, the perceived signal was first band-passed to the desired frequency range of 2–4 kHz. Considering $E(t) = \int_t^{\infty} h^2(\tau)\, d\tau$ as the energy decay curve from time t, where h(t) is the impulse response from the band-passed signal, a linear regression was performed to estimate the slope, S, between the −5 dB and −25 dB levels (similar to RT estimation via the T20 method: https://www.acoustics-engineering.com/files/TN007.pdf). The DRT60 can then be estimated as 60/S. The corresponding DRT60 against distance is depicted in Figure 11b, which shows the average DRT60 of five trials along with the error bars.
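The DRT60 computation just described can be sketched as follows, assuming the decaying portion of the band-passed signal is available as an array h; the Schroeder backward integration and the use of scipy.stats.linregress for the T20-style fit are our assumptions about implementation details.

```python
import numpy as np
from scipy.stats import linregress

def drt60(h: np.ndarray, fs: int = 44_100) -> float:
    """Estimate DRT60 (ms) from a band-passed decay response h(t)."""
    # E(t) = integral from t to infinity of h^2: cumulative sum from the tail.
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    decay_db = 10.0 * np.log10(energy / energy.max())
    t = np.arange(len(h)) / fs
    # Fit only the -5 dB to -25 dB portion of the decay curve (T20-style).
    mask = (decay_db <= -5.0) & (decay_db >= -25.0)
    slope, *_ = linregress(t[mask], decay_db[mask])   # slope in dB/s (negative)
    # DRT60 = 60 / S per the text, with S read as the decay rate magnitude.
    return float(-60.0 / slope * 1000.0)              # milliseconds
```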
To analyze the variation of DRT60 further, the same test was conducted with θ varied from 0° to 360° at steps of 45°. Figure 12 shows the DRT60 against the azimuth angle. The measured DRT60, however, did not reveal any distinctive trend, showing only small deviations at different angles.
Comparing Figure 11 and Figure 12, it can be concluded that the DRT60 value changes most significantly with distance, while the variation against θ is negligibly small. The next section explains how the DRT60 response is used along with the SPL to estimate the distance and treat the ambiguity issue.

3.3. Ambiguity Elimination and Distance Estimation

Apart from the DRT60 test, another test to investigate the variation of the SPL was also conducted. Figure 13 shows the variations of the average SPL from both ears against distance when the sound source was located at the front (blue line) and back (orange line) positions. The average amplitude and error bars are represented by the curve and vertical lines, respectively. Theoretically, the sound intensity changes with distance following the inverse square law, as represented by the yellow line in the figure.
A large difference can be seen between the theoretical and measured SPL curves due to the existence of the ears and head as well as environmental effects. The amplitude attenuation is also relatively higher when the sound source is located at the back of the head than at the front. Based on the SPL measurements, the following correlation can be derived:
$$\alpha_j(d) = p_j d^{\,q_j} + r_j; \quad p_j \in \mathbb{R}^-,\; q_j, r_j \in \mathbb{R}^+;\; 0 < d \le 30;\; -30 < \alpha_j < 0; \quad \text{for } j = f, b; \tag{3}$$
where α_f and α_b represent the average SPL for the front and back positions, respectively. Via curve fitting, one obtains (p_f, q_f, r_f) = (−5.2, 0.4689, 7.085) and (p_b, q_b, r_b) = (−7.7, 0.4599, 19.989). It is worth noting that the sound amplitude alone is insufficient to determine both the distance and the direction. To treat this issue, the DRT60 is used together with the attenuation to eliminate the ambiguity of the sound's location, since the DRT60 value is relatively consistent for all values of θ and ϕ. Via regression, Equation (4), which provides a lower mean squared error than other polynomial fits, can be derived with (p_R, q_R, r_R) = (0.01693, 8.3494, 204.1312); it represents the correlation between the DRT60 (denoted by τ_R, in milliseconds) and the distance d.
$$\tau_R(d) = p_R d^2 + q_R d + r_R; \quad p_R, q_R, r_R \in \mathbb{R}^+;\; 0 < d \le 30;\; 100 < \tau_R < 400. \tag{4}$$
Hence, the inverse function of Equation (4) can be attained as follows:
$$d_R = -\frac{q_R}{2 p_R} + \frac{1}{2 p_R}\sqrt{q_R^2 - 4 p_R (r_R - \tau_R)} \tag{5}$$
which returns the distance estimated based on the value of τ_R measured from the received signal.
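As a quick worked example (ours, using the fitted coefficients above): a measured decay time of τ_R = 300 ms yields

$$d_R = \frac{-8.3494 + \sqrt{8.3494^2 - 4(0.01693)(204.1312 - 300)}}{2(0.01693)} \approx 11.2\ \mathrm{m},$$

and substituting d = 11.2 m back into Equation (4) indeed returns τ_R ≈ 300 ms.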
Likewise, the estimated distance based on the SPL measurements can be obtained in a similar manner from Equation (3), which leads to
$$d_j = \left(\frac{\alpha - r_j}{p_j}\right)^{1/q_j}; \quad j = f, b \tag{6}$$
where α is the SPL, and d_f and d_b are the predicted distance values for the front and back locations. In order to eliminate the sound source ambiguity, two parameters need to be observed: the first is the difference between d_j and d_R, and the second is the elevation angle ϕ (the method to estimate it is presented in Section 3.4). For the first, the values of d_b and d_f are compared against the value of d_R; the one with the closer value returns the estimated distance and direction based on the SPL, denoted by d_α, and the other is the ambiguity to be eliminated. With regard to the second parameter, two sets of angles can first be defined as follows:
$$\Omega_f = \{\phi \in \mathbb{R} \mid \phi \in [0^\circ, 90^\circ] \cup (270^\circ, 360^\circ]\}; \quad \Omega_b = \{\phi \in \mathbb{R} \mid \phi \in (90^\circ, 270^\circ]\} \tag{7}$$
where Ω_f and Ω_b refer to the yellow and blue areas in Figure 6b, respectively. The ambiguity checker can then be written as:
$$(d_\alpha, \eta) = \begin{cases} (d_f, 1) & \text{if } \{d_{bR} > d_{fR}\} \wedge \{\phi \in \Omega_f\} \\ (d_b, 0) & \text{if } \{d_{bR} \le d_{fR}\} \wedge \{\phi \in \Omega_b\} \end{cases} \tag{8}$$
where d_bR = |d_b − d_R|, d_fR = |d_f − d_R|, and η = 1 and η = 0 indicate that the sound source is located at the front and back positions with respect to the DUT, respectively. As there are now two methods for estimating the value of d (i.e., via DRT60 and via SPL), the following technique is proposed:
$$\hat{d} = \nu d_\alpha + (1 - \nu) d_R; \quad \nu \in [0, 1] \tag{9}$$
with ν representing the weighting parameter that varies between 0 and 1. To find the optimal value of ν, a further analysis was conducted based on 16 datasets, as presented in Table A1 (in Appendix A), where half of them refer to the case when ϕ ∈ Ω_f, while the other half refer to the case when ϕ ∈ Ω_b. In this analysis, the distance between the DUT and the source varied between 6 m and 19 m. The cumulative distance error, which reads
$$E_{cum} = \sum_{k=1}^{8} e_k; \quad e_k = |d - \hat{d}| \tag{10}$$
with d being the actual distance, is considered. Figure 14 and Figure 15 show the corresponding plots when ϕ ∈ Ω_f and when ϕ ∈ Ω_b, respectively. By observing the value of ν at which E_cum is minimum, it is found that the distance error is minimized with ν = 0.36 when d_α = d_f, and ν = 0 when d_α = d_b. The latter indicates that the distance estimated from the DRT60 is generally much closer to the actual value when the sound source is located at the back of the DUT, thus only d_R is considered in this scenario.
Combining Equations (8) and (9) and solutions from Figure 14 and Figure 15, the distance estimation along with ambiguity elimination can be further derived, as follows:
$$\hat{d} = \begin{cases} 0.36\, d_f + 0.64\, d_R & \text{if } \epsilon = 1 \\ d_R & \text{if } \epsilon = 0 \end{cases} \tag{11}$$
where
$$\epsilon = \begin{cases} 1 & \text{if } \{d_{bR} > d_{fR}\} \wedge \{\phi \in \Omega_f\} \\ 0 & \text{if } \{d_{bR} \le d_{fR}\} \wedge \{\phi \in \Omega_b\} \end{cases} \tag{12}$$
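The distance-estimation branch of this procedure can be summarized in a few lines of Python. The sketch below is a minimal rendering of Equations (5), (6), (11) and (12), under our reconstruction of the fitted coefficients (including the signs of p_f and p_b, which the extracted text leaves ambiguous); the function and variable names are illustrative.

```python
import numpy as np

# Curve-fit coefficients from Section 3.3 (signs follow our reconstruction
# of Equations (3) and (4)).
P_F, Q_F, R_F = -5.2, 0.4689, 7.085        # front SPL model, Equation (3)
P_B, Q_B, R_B = -7.7, 0.4599, 19.989       # back SPL model, Equation (3)
P_R, Q_R, R_R = 0.01693, 8.3494, 204.1312  # DRT60 model, Equation (4)

def dist_from_drt60(tau_r: float) -> float:
    """Equation (5): d_R from the measured DRT60 (in ms)."""
    return (-Q_R + np.sqrt(Q_R**2 - 4 * P_R * (R_R - tau_r))) / (2 * P_R)

def dist_from_spl(alpha: float, p: float, q: float, r: float) -> float:
    """Equation (6): distance from the measured SPL alpha (dB)."""
    return ((alpha - r) / p) ** (1.0 / q)

def estimate_distance(alpha: float, tau_r: float, phi: float):
    """Equations (11)-(12): fuse the SPL- and DRT60-based estimates."""
    d_r = dist_from_drt60(tau_r)
    d_f = dist_from_spl(alpha, P_F, Q_F, R_F)
    d_b = dist_from_spl(alpha, P_B, Q_B, R_B)
    front_phi = (0 <= phi <= 90) or (270 < phi <= 360)   # phi in Omega_f
    # Equation (12); the else-branch collapses its second condition.
    eps = 1 if (abs(d_b - d_r) > abs(d_f - d_r) and front_phi) else 0
    d_hat = 0.36 * d_f + 0.64 * d_r if eps == 1 else d_r  # Equation (11)
    return d_hat, eps
```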

3.4. Spectral Cues (SC)

The clues to sound location that come from sound frequency are called spectral cues. These cues derive from the acoustical filtering of an individual's auditory periphery. Since the angle and distance on the azimuth plane can be calculated using the ITD, SPL and DRT60, but the elevation angle ϕ cannot, the spectral cues are vital in determining the elevation of the sound source. The ambiguous data points in the cone of confusion can be reduced using mathematical estimation. This work addresses the cone of confusion by characterizing the attenuation of different frequency components against ϕ. Figure 16 depicts the amplitude (A_p) at each peak frequency, f_p, when the sound source was placed at ϕ = 0° (blue line), ϕ = 90° (orange line), ϕ = 180° (yellow line), and ϕ = 270° (purple line). The data were also captured at three different distances: d = 6 m (a), d = 13 m (b), and d = 19 m (c).
In order to characterize the amplitude response against ϕ at each peak frequency, a regression was performed based on the average values of A_p in Figure 16, which led to the following:
$$A_p = \begin{cases} \gamma_1(\phi) & \text{if } f_p = 2.9\ \mathrm{kHz} \\ \gamma_2(\phi) & \text{if } f_p = 3.1\ \mathrm{kHz} \\ \gamma_3(\phi) & \text{if } f_p = 3.3\ \mathrm{kHz} \\ \gamma_4(\phi) & \text{if } f_p = 3.5\ \mathrm{kHz} \\ \gamma_5(\phi) & \text{if } f_p = 3.7\ \mathrm{kHz} \\ \gamma_6(\phi) & \text{if } f_p = 3.9\ \mathrm{kHz} \\ \text{undefined} & \text{otherwise} \end{cases} \tag{13}$$
where
$$\gamma_i(\phi) = a_i \phi^2 + b_i \phi + c_i \alpha_0; \quad a_i, c_i \in \mathbb{R}^+,\; b_i \in \mathbb{R};\; i = 1, 2, \ldots, 6, \tag{14}$$
with α_0 ∈ ℝ being the amplitude in dBFS of the received signal, and a_i, c_i ∈ ℝ⁺ and b_i ∈ ℝ being coefficients that depend on α_0.
By measuring A_p and α_0 from the incoming signal's spectral components, the angle ϕ_i can then be calculated by solving the inverse function of Equation (14), which reduces to
$$\phi_i = -\frac{b_i}{2 a_i} + \frac{1}{2 a_i}\sqrt{b_i^2 - 4 a_i (c_i \alpha_0 - \gamma_i)}; \quad i = 1, 2, \ldots, 6. \tag{15}$$
To obtain the estimated ϕ when the sound source is placed at a particular location, the calculated angle is averaged over all peak frequencies. Figure 17 shows the results from a simple test in which the source was placed at ϕ = (0°, 45°, 90°, 135°, 180°). The left plot corresponds to the case when the source was 6 m away from the DUT, while the right plot corresponds to 13 m. From the test, it was found that the magnitude of the error only varied between 1.34° and 6.22°, which can be considered small, as the average error is less than 3.5%. Thus, the close relationship between the spectral cues and the elevation angle allows the vertical direction of the source to be robustly localized.
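A minimal sketch of this elevation encoder is given below. Since the per-peak regression tuples (a_i, b_i, c_i) are not published in the text, they are passed in as assumed inputs obtained from the characterization in Figure 16; the discriminant guard is also our addition.

```python
import numpy as np

PEAK_FREQS = [2.9e3, 3.1e3, 3.3e3, 3.5e3, 3.7e3, 3.9e3]  # Hz, Equation (13)

def elevation_from_spectral_cues(A_p, alpha_0, coeffs):
    """Average the per-peak solutions of Equation (15) into one estimate of phi.

    A_p:     measured amplitudes at the six peak frequencies (dBFS)
    alpha_0: overall received amplitude (dBFS)
    coeffs:  maps each peak frequency to its regression tuple (a_i, b_i, c_i)
    """
    phis = []
    for f_p, amp in zip(PEAK_FREQS, A_p):
        a, b, c = coeffs[f_p]
        disc = b**2 - 4 * a * (c * alpha_0 - amp)  # discriminant of Equation (15)
        if disc < 0:
            continue                               # no real solution at this peak
        phis.append((-b + np.sqrt(disc)) / (2 * a))
    return float(np.mean(phis)) if phis else float("nan")
```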

3.5. Binaural Localization Strategy

In summary, the direction of the sound source on the azimuth plane can be calculated using the ITD cue via cross-correlation of the incident signals. The resulting AoA can then be used to estimate the value of θ. To predict the actual distance of the source from the DUT, the properties of the SPL cues can be exploited. Nevertheless, due to the structure of the head and the 3D-printed ears, estimation via the SPL is not sufficient; the estimation via the DRT60 auditory cue, which varies less with angle, is needed together with the weighting parameter derived in the preceding section to remove ambiguous data points. With regard to the elevation angle, the SC are exploited by finding the amplitude and peak frequencies from the signal's spectral components.
To improve the performance during real-time experiments, induced secondary cues are introduced based on the estimated distance and elevation angle, represented by η and μ, respectively. Specifically, η = 1 when the sound source is estimated to be at the front side of the DUT (based on the SPL), and μ = 1 when the estimated ϕ is within Ω_f. Hence, the parameter ϵ is unity when both η and μ are one, which corresponds to Equation (12). This is the first stage of the ambiguity elimination technique. To treat the front-rear confusion further on the resulting azimuth angle, the values of η and μ are cross-checked at the second stage; i.e., if η = 0 and μ = 1, then the sound source is expected to be at the mirrored position along the interaural axis (i.e., the front side). This was formulated based on the idea that the prediction based on μ is more accurate, owing to the small position errors presented in Section 3.4. However, an exception is imposed for the border cases: estimated angles within the margin areas, i.e., (85°, 95°) and (265°, 275°), remain unchanged. The whole procedure for the binaural localization with ambiguity elimination, partitioned into two stages, is summarized in Algorithm 1. For clarity, θ̂, ϕ̂ and d̂ will be used to denote the estimated values of θ, ϕ and d, respectively.
Algorithm 1 Binaural Localization via Spatial Auditory Cues

Require: SPL, DRT60, SC
Ensure: θ̂, ϕ̂, d̂ and x, y, z                 ▹ Estimated coordinates

1:  while true do
2:      procedure Distance Estimations(SPL, DRT60)
3:          {d_R} ← Equation (5) ← {α}
4:          {d_b, d_f} ← Equation (6) ← {α}
5:      end procedure
6:      procedure Azimuth Angle Encoding(SPL)
7:          {τ_d} ← Equation (2)
8:          {β} ← Equation (1)
9:          {θ_0} ← {β}                        ▹ Estimated θ (before correction)
10:     end procedure
11:     procedure Elevation Angle Encoding(SC)
12:         {ϕ_i (i = 1, …, 6)} ← Equation (15) ← {A_p, α_0}
13:         ϕ̂ = (1/6) Σ_{i=1}^{6} ϕ_i          ▹ Estimated ϕ
14:     end procedure
15:     procedure Ambiguity Elimination(ϕ̂, d_b, d_f, d_R)
16:         d_bR = |d_b − d_R|; d_fR = |d_f − d_R|
17:         if d_fR < d_bR then                ▹ Stage 1
18:             η = 1;
19:         else η = 0;
20:         end if
21:         if ϕ̂ ∈ Ω_f then
22:             μ ← 1;
23:         else
24:             μ ← 0;
25:         end if
26:         ϵ = μ × η;
27:         {d̂} ← Equation (11) ← {ϵ, d_f, d_R}  ▹ Estimated d
28:         if (μ = 1 and η = 0) then          ▹ Stage 2
29:             if (0° ≤ θ̂ ≤ 85°) then
30:                 θ̂ ← 180° − θ_0;            ▹ Mirrored angle (left side)
31:             else if (275° ≤ θ̂ < 360°) then
32:                 θ̂ ← 540° − θ_0;            ▹ Mirrored angle (right side)
33:             end if
34:         else
35:             θ̂ ← θ_0;
36:         end if
37:     end procedure
38:     procedure Localization((θ̂, ϕ̂, d̂))      ▹ Polar to 3D Cartesian coordinates
39:         x = d̂ cos(ϕ̂) cos(θ̂); y = d̂ cos(ϕ̂) sin(θ̂); z = d̂ sin(ϕ̂)
40:     end procedure
41: end while
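To make the flow of Algorithm 1 concrete, the following sketch (our own Python rendering, not the authors' code) implements one reading of the Stage 2 mirroring and the final coordinate conversion; the function names are assumptions, and the Cartesian mapping follows the x_0 = d cos θ, y_0 = d sin θ, z_0 = d sin ϕ convention stated with Figure 9.

```python
import numpy as np

def stage2_correct_azimuth(theta0: float, mu: int, eta: int) -> float:
    """One reading of Stage 2 of Algorithm 1: mirror the ITD-based azimuth
    theta0 along the interaural axis when the elevation cue (mu = 1, front)
    contradicts the SPL cue (eta = 0, back), leaving the margin areas
    (85-95 deg and 265-275 deg) untouched."""
    in_margin = 85 < theta0 < 95 or 265 < theta0 < 275
    if mu == 1 and eta == 0 and not in_margin:
        # Left side maps via 180 - theta0, right side via 540 - theta0.
        return 180.0 - theta0 if theta0 <= 180 else 540.0 - theta0
    return theta0

def to_cartesian(theta_deg: float, phi_deg: float, d: float):
    """Polar to 3D Cartesian coordinates, following the convention
    x0 = d cos(theta), y0 = d sin(theta), z0 = d sin(phi) from Section 3."""
    th, ph = np.radians(theta_deg), np.radians(phi_deg)
    return (d * np.cos(ph) * np.cos(th),
            d * np.cos(ph) * np.sin(th),
            d * np.sin(ph))
```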

4. Experiments and Performance Evaluations

This section presents the results from real-time experiments in which the sound source was placed at 30 different locations in the 3D space. The tests were conducted in a car park area with the model placed on the road, as shown in Figure A1 (in Appendix A), which has existing linear markers that allow for accurate distance and direction measurements. Three different distances, i.e., d = 6 m, d = 13 m, and d = 19 m, with various sets of θ and ϕ, were randomly selected for the performance evaluations. Without loss of generality, measurements for θ and ϕ were taken by rotating the receiver instead of the sound source, as this was relatively easier to control.
The values of d̂, θ̂ and ϕ̂ when Algorithm 1 was applied are presented in Table 2, partitioned according to the values of d. All of the captured data, including the secondary cues η, μ and ϵ used for ambiguity elimination, can be found in Table A3 in Appendix A. For clarity, the variable k is used to represent the experiment number for each distance considered. Figure 18 shows the estimated and actual locations of the sound source with respect to the DUT in a 3D Cartesian plane, also plotted according to the values of d, i.e., (a) d = 6 m, (b) d = 13 m, and (c) d = 19 m. The actual coordinates are represented by the colored circles, while the corresponding predicted coordinates are represented by the “diamonds” of the same color. The numbers next to the circles denote the values of k from Table 2. By observing the plots, all of the coordinates considered were correctly localized with small position errors, except for k = 6 in (a). This was caused by the value of η, which was supposed to be 1 instead of 0; hence, the estimated azimuth was interpreted at the mirrored position of the captured angle, which explains the large difference. Nevertheless, when comparing with the results without the application of Algorithm 1 in Table 3 (complete individual data in Table A3), we can see that the total number of ambiguous points (AP) is 9. This demonstrates that the proposed method significantly reduces the total number of AP.
In order to evaluate the localization performance, the following errors are defined:
$$e(j) = j - \hat{j}; \quad j = d, \theta, \phi \tag{16}$$
which calculates the deviation of the estimated from the actual values, and
$$E_{av}(j) = \frac{1}{10}\sum_{k=1}^{10} |e_k(j)|; \quad j = d, \theta, \phi \tag{17}$$
which is the average of the absolute errors. Figure 19 shows the plot of e(d) (represented by the blue line), which is also compared against the corresponding errors when d_b, d_f, and d_R are used as the estimated distance. From the plot, it is observed that the proposed method successfully kept the error to a minimum for all experiments when compared to the other three methods.
With regard to the accuracy of the estimated angles, Figure 20, which shows the plots of e(θ) and e(ϕ), also includes a comparison against the error before the azimuth angle was amended in Stage 2 of Algorithm 1, i.e., θ_0. The large peaks in the orange plots correspond to the ambiguous data points where the mirrored positions of the source were not corrected by the secondary cues of the proposed method. Otherwise, it is observed that e(ϕ) is consistently close to zero for all experiments, which is also a contributing factor to the success of the ambiguity elimination technique. The overall average errors from both figures are summarized in Table 3, where Ẽ_av = (E_av,d=6 + E_av,d=13 + E_av,d=19)/3. From the data presented, the proposed method has significantly improved the performance by reducing the errors in the distance and angle estimations. It is also worth noting that, without the DRT60 and SC measurements as well as the secondary cues, the estimated sound source locations on the azimuth plane would be 100% ambiguous. In particular, with only Stage 1 of Algorithm 1, which also relies heavily on the ITD method (refer to θ_0 in Table 3), the total ambiguous points (AP) were reduced to 30%, but when combined with Stage 2 (refer to θ̂ in Table 3), the total AP was considerably reduced to 3.3%. Table 3 also shows that, due to the large number of AP from θ_0, the average error Ẽ_av is approximately 28.3°, which is significantly higher than when the complete Algorithm 1 is applied, giving an average error of only 9.6°.

5. Discussion

The results presented in Table 3 demonstrate significant improvements in the distance and angle estimations, showing that using PLA-based 3D-printed ears is practical, particularly for front-rear disambiguation in outdoor environments. While this might work in several other environments, modifications to the strategy may be needed if there is a sudden or drastic change in the acoustic scene. Thus, to detect and identify such changes, machine learning can be used, and the resulting mechanism can be embedded into the system to ensure the proposed strategy adapts to the changes. Apart from that, as the reverberation properties of outdoor spaces can be modeled according to the sound source frequency as well as the nature of the spaces, the DRT60-based distance estimation technique in Section 3.2 can always be tuned to make it applicable to other environments.

6. Conclusions and Future Work

This paper contributes its findings to binaural localization using auditory cues. Instead of using a HATS (which costs approximately USD 20k, or USD 120 per day to rent) or an ear simulator, this work uses a pair of cheap PLA-based 3D-printed ears with mechanical acoustic dampers and filters covering the microphones. The analysis obtained from this work shows the feasibility of using cheap 3D-printed materials to simulate an actual ear. Other benefits of using a 3D-printed ear include the ability to quickly replicate this work, and to modify the existing design to study how different shapes would affect the result.
From the conducted experiments, it has been demonstrated that the proposed strategy can considerably improve the binaural localization performance with average errors of 0.5 m for distance, 9.6 ° for azimuth angle, 10.3 ° for elevation angle, and, most importantly, a significant reduction of total ambiguous points to 3.3%. The results also reveal that the proposed model and methodology can provide a promising framework for further enhancement of binaural localization strategy.
Having dynamic cues, in addition to what this work has presented, can help enhance the accuracy, particularly when there is a drastic change in the acoustic scene or when the targeted sound source is moving. Tracking a moving source or multiple sources is significantly more complex, as Doppler effects come into play and the spectral cues thus have to account for this phenomenon. Dynamic cues are useful to help further improve how the receiver perceives sound by essentially gathering more sets of data. As discussed in Section 5, the method can be paired with advanced algorithms, such as deep learning, in future work to help improve the detection of acoustic cues in different situations.

Supplementary Materials

The following are available online at https://www.mdpi.com/1424-8220/21/1/227/s1.

Author Contributions

Conceptualization, T.M.T. and N.S.A.; Methodology, T.M.T. and N.S.A.; Software, T.M.T.; Validation, T.M.T., N.S.A., P.G. and J.M.-S.; Formal analysis, N.S.A.; Investigation, T.M.T. and N.S.A.; Resources, T.M.T.; Data curation, T.M.T.; Writing—original draft preparation, T.M.T.; Writing—review and editing, N.S.A.; Visualization, N.S.A., P.G. and J.M.-S.; Supervision, N.S.A.; Project administration, N.S.A.; Funding acquisition, J.M.-S. All authors have read and agreed to the published version of the manuscript.

Funding

Universiti Sains Malaysia, Research University-Individual (RUI) Grant (1001/PELECT/8014104).

Acknowledgments

The authors would like to thank Universiti Sains Malaysia for the financial support under the Research University-Individual (RUI) Grant Scheme (1001/PELECT/8014104).

Conflicts of Interest

The authors declare no conflict of interest.

Notations and Acronyms

The following notations and acronyms are used in this manuscript:
PLA: Polylactic Acid
HATS: Head and Torso Simulator
HRTF: Head-Related Transfer Function
DRR: direct-to-reverberant ratio
SPL, ILD, ITD: sound pressure level, interaural level difference, interaural time difference
AoA: angle of arrival
f_s: sampling rate
Mic L, Mic R: left microphone, right microphone
RT: reverberation time
DRT60: the estimated time for the combined direct and reverberant energy decay curve to drop by 60 dB
SC: spectral cues
ADC: analog-to-digital converter
DUT: device under test
FFT: Fast Fourier Transform
dB, dBFS: decibels, decibels relative to full scale
θ, ϕ, d: actual azimuth angle, elevation angle, and Euclidean distance of the sound source from the DUT
θ̂, ϕ̂, d̂: estimated azimuth angle, elevation angle, and Euclidean distance of the sound source from the DUT
β: notation for AoA
θ_0: estimated azimuth angle before correction
μ, η, ϵ: induced secondary cues (as described in Algorithm 1)
x, y, z: estimated coordinates of the sound source in the 3D space
τ_d: notation for ITD
d_f, d_b: estimated distance based on the SPL regression curve when the sound source is in the front/back of the receiver
d_R: estimated distance based on DRT60
τ_R: notation for DRT60 (in milliseconds)
Ω_f, Ω_b: sets of elevation angles defined in Equation (7)
ν: weighting parameter for the estimated distance
e: deviation of the estimated from the actual values (for θ, ϕ, and d)
E_cum, E_av: cumulative error, average of absolute errors
ℝ: field of real numbers
ℝ⁺, ℝ⁻: fields of positive and negative real numbers

Appendix A

Table A1. Datasets for the analysis in Section 3.3.

             ϕ ∈ Ω_f                               ϕ ∈ Ω_b
k     1    2    3    4    5    6    7    8   |  1    2    3    4    5    6    7    8
d     6    6    8    10   10   12   16   19  |  6    6    8    10   10   12   16   19
ϕ     0    0    15   30   30   45   80   80  |  110  110  135  180  180  180  260  260
θ     0    45   90   270  135  0    0    270 |  0    45   90   270  135  0    0    270
Figure A1. Experimental area and setup; (a) Illustration on how the DUT was rotated to estimate different values of ϕ; (b) Illustration on the test for (d, θ, ϕ) = (13, 0°, 45°); (c) Illustration on the test for (d, θ, ϕ) = (19, 0°, 45°); (d) Illustration on the test for (d, θ, ϕ) = (6, 270°, 0°).
Table A2. Absolute errors from the estimated distance for all experiments. Note that the SPL and DRT60 methods represent the individual components of the proposed method (Algorithm 1).

d = 6 m:
k            1     2     3     4     5     6     7     8     9     10
SPL (d_f)    0.14  4.41  0.46  4.92  1.04  6.47  3.6   4.69  4.74  5.78
SPL (d_b)    1.42  1.46  1.03  1.77  0.66  2.73  0.96  1.64  1.67  2.31
DRT60        0.3   0.27  0.31  0.38  0.47  0.23  0.06  0.53  0.14  0.15
Algorithm 1  0.25  0.27  0.31  0.38  0.67  0.23  0.06  0.53  0.14  0.15

d = 13 m:
k            1     2     3     4     5     6     7     8     9     10
SPL (d_f)    2.34  1.18  0.97  4.65  3.8   4.24  8.74  7.75  18.1  6.71
SPL (d_b)    5.39  4.67  4.54  1.12  1.63  1.36  1.34  0.75  6.92  0.12
DRT60        1.45  0.91  1.31  0.72  1.5   0.8   1.62  1.27  0.54  0.57
Algorithm 1  0.03  0.12  0.45  0.72  2.32  0.8   1.62  1.27  0.54  0.57

d = 19 m:
k            1     2     3     4     5     6     7     8     9     10
SPL (d_f)    0.79  2.8   4.73  9.91  9.89  0.36  2.06  1.46  7.87  6.2
SPL (d_b)    5.82  4.62  3.47  0.4   0.41  6.09  5.06  5.42  1.6   2.59
DRT60        0.14  0.81  0.87  0.92  0.78  0.04  0.6   0.73  0.08  0.01
Algorithm 1  0.17  0.49  0.87  0.92  0.78  0.08  0.35  0.05  0.08  0.01
Table A3. The estimated azimuth angles, elevation angles, secondary cues and ambiguous points (AP) for each experiment, for d = 6 m, d = 13 m, and d = 19 m, respectively.

d = 6 m:
k                      1     2     3      4     5     6      7      8     9      10
Actual azimuth θ       0     90    135    270   0     45     0      0     225    0
Estimated azimuth θ_0  9.65  85.4  48.8   276   9.21  36.3   173.4  9.16  212.5  5.35
θ̂ (Alg. 1)             9.65  85.4  131.2  264   9.21  143.7  6.31   9.16  212.5  5.35
Estimated elevation ϕ̂  6.2   9.6   7.7    8.22  31.2  34.8   86.9   172   166    260
Secondary cue μ        1     1     1      1     1     1      1      0     0      0
Secondary cue η        1     0     0      0     1     0      0      0     0      0
Secondary cue ϵ        1     0     0      0     1     0      0      0     0      0
AP without Alg. 1      0     0     1      1     0     0      1      0     0      0
AP with Alg. 1         0     0     0      0     0     1      0      0     0      0

d = 13 m:
k                      1     2     3     4     5     6     7      8     9     10
Actual azimuth θ       0     45    90    135   270   135   0      0     0     270
Estimated azimuth θ_0  1.54  42.2  87.7  49.1  268   53.9  174.3  7.66  5.69  246
θ̂ (Alg. 1)             1.54  42.2  87.7  131   268   126   5.69   7.66  5.69  246
Estimated elevation ϕ̂  5.81  10.2  11.1  9.25  8.22  21.6  85.7   151   256   156
Secondary cue μ        1     1     1     1     1     1     1      0     0     0
Secondary cue η        1     1     1     0     1     0     0      0     0     0
Secondary cue ϵ        1     1     1     0     1     0     0      0     0     0
AP without Alg. 1      0     0     0     1     0     1     1      0     0     0
AP with Alg. 1         0     0     0     0     0     0     0      0     0     0

d = 19 m:
k                      1     2     3     4     5     6     7      8     9     10
Actual azimuth θ       0     45    90    225   270   315   45     0     0     0
Estimated azimuth θ_0  5.42  49.6  96.7  309   255   305   42.32  1.35  3.22  2.32
θ̂ (Alg. 1)             5.42  49.6  83.3  231   285   305   42.32  1.35  3.22  2.32
Estimated elevation ϕ̂  9.25  5.32  7.93  8.39  14.1  7.73  6.32   71.3  190   262
Secondary cue μ        1     1     1     1     1     1     1      1     0     0
Secondary cue η        1     1     0     0     0     1     1      1     0     0
Secondary cue ϵ        1     1     0     0     0     1     1      1     0     0
AP without Alg. 1      0     0     1     1     1     0     0      0     0     0
AP with Alg. 1         0     0     0     0     0     0     0      0     0     0

References

1. Argentieri, S.; Danès, P.; Souères, P. A survey on sound source localization in robotics: From binaural to array processing methods. Comput. Speech Lang. 2015, 34, 87–112.
2. Zhong, X.; Sun, L.; Yost, W. Active Binaural Localization of Multiple Sound Sources. Robot. Auton. Syst. 2016, 85, 83–92.
3. Kumpik, D.P.; Campbell, C.; Schnupp, J.W.H.; King, A.J. Re-weighting of Sound Localization Cues by Audiovisual Training. Front. Neurosci. 2019, 13, 1164.
4. Zhang, P.; Hartmann, W. On the ability of human listeners to distinguish between front and back. Hear. Res. 2009, 260, 30–46.
5. Paul, S. Binaural Recording Technology: A Historical Review and Possible Future Developments. Acta Acust. United Acust. 2009, 95, 767–788.
6. Zhang, W.; Samarasinghe, P.N.; Chen, H.; Abhayapala, T.D. Surround by Sound: A Review of Spatial Audio Recording and Reproduction. Appl. Sci. 2017, 7, 532.
7. Yang, Y.; Chu, Z.; Shen, L.; Xu, Z. Functional delay and sum beamforming for three-dimensional acoustic source identification with solid spherical arrays. J. Sound Vib. 2016, 373, 340–359.
8. Fischer, B.J.; Seidl, A.H. Resolution of interaural time differences in the avian sound localization circuit—A modeling study. Front. Comput. Neurosci. 2014, 8, 99.
9. Du, R.; Liu, J.; Zhou, D.; Meng, G. Adaptive Kalman filter enhanced with spectrum analysis to estimate guidance law parameters with unknown prior statistics. Proc. Inst. Mech. Eng. Part G J. Aerosp. Eng. 2018, 232, 3078–3099.
10. Dorman, M.; Loiselle, L.; Stohl, J.; Yost, W.; Spahr, T.; Brown, C.; Natale, S. Interaural Level Differences and Sound Source Localization for Bilateral Cochlear Implant Patients. Ear Hear. 2014, 35, 633.
11. Fischer, R.; Weber, J. Real World Assessment of Auditory Localization Using Hearing Aids. Available online: https://www.audiologyonline.com/articles/real-world-assessment-of-auditory-localization-~using-hearing-aids-11719 (accessed on 30 May 2020).
12. Spagnol, S. On distance dependence of pinna spectral patterns in head-related transfer functions. J. Acoust. Soc. Am. 2015, 137, EL58–EL64.
13. Ahveninen, J.; Kopco, N.; Jääskeläinen, I. Psychophysics and Neuronal Bases of Sound Localization in Humans. Hear. Res. 2013, 307, 86–97.
14. Risoud, M.; Jean Noel, H.; Gauvrit, F.; Renard, C.; Bonne, N.X.; Vincent, C. Azimuthal sound source localization of various sound stimuli under different conditions. Eur. Ann. Otorhinolaryngol. Head Neck Dis. 2019, 137, 21–29.
15. Zhong, X.L.; Xie, B.S. Head-Related Transfer Functions and Virtual Auditory Display. Soundscape Semiot. Localization Categ. 2014.
16. Kim, E.; Nakadai, K.; Okuno, H. Improved sound source localization in horizontal plane for binaural robot audition. Appl. Intell. 2014, 42, 63–74.
17. Georganti, E.; Mourjopoulos, J. Statistical relationships of Room Transfer Functions and Signals. In Proceedings of the Forum Acusticum, Aalborg, Denmark, 27 June–1 July 2011.
18. Lovedee-Turner, M.; Murphy, D. Application of Machine Learning for the Spatial Analysis of Binaural Room Impulse Responses. Appl. Sci. 2018, 8, 105.
19. Ding, J.; Ke, Y.; Cheng, L.; Zheng, C.; Li, X. Joint estimation of binaural distance and azimuth by exploiting deep neural networks. J. Acoust. Soc. Am. 2020, 147, 2625–2635.
20. Pang, C.; Liu, H.; Zhang, J.; Li, X. Binaural Sound Localization Based on Reverberation Weighting and Generalized Parametric Mapping. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 1618–1632.
21. Larsen, E.; Iyer, N.; Lansing, C.; Feng, A. On the minimum audible difference in direct-to-reverberant energy ratio. J. Acoust. Soc. Am. 2008, 124, 450–461.
22. Garas, J.; Sommen, P. Improving virtual sound source robustness using multiresolution spectral analysis and synthesis. In Proceedings of the Audio Engineering Society Convention 105, San Francisco, CA, USA, 26–29 September 1998.
23. Iida, K. Head-Related Transfer Function and Acoustic Virtual Reality; Springer: Singapore, 2019.
24. Fingerhuth, S.; Bravo, J.L.; Bustamante, M.; Pizarro, F. Experimental Study of the Transfer Function of Replicas of Pinnae of Individuals Manufactured with Alginate. IEEE Lat. Am. Trans. 2020, 18, 16–23.
25. Rodemann, T.; Ince, G.; Joublin, F.; Goerick, C. Using binaural and spectral cues for azimuth and elevation localization. In Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, France, 22–26 September 2008; pp. 2185–2190.
26. Heffner, R.; Koay, G.; Heffner, H. Use of binaural cues for sound localization in large and small non-echolocating bats: Eidolon helvum and Cynopterus brachyotis. J. Acoust. Soc. Am. 2010, 127, 3837–3845.
27. Schillebeeckx, F.; Mey, F.D.; Vanderelst, D.; Peremans, H. Biomimetic Sonar: Binaural 3D Localization using Artificial Bat Pinnae. Int. J. Robot. Res. 2011, 30, 975–987.
28. Odo, W.; Kimoto, D.; Kumon, M.; Furukawa, T. Active Sound Source Localization by Pinnae with Recursive Bayesian Estimation. J. Robot. Mechatron. 2017, 29, 49–58.
29. Grothe, B.; Pecka, M. The natural history of sound localization in mammals—A story of neuronal inhibition. Front. Neural Circuits 2014, 8, 116.
30. Heffner, H.; Heffner, R. The evolution of mammalian hearing. AIP Conf. Proc. 2018, 1965, 130001.
31. Kulaib, A.; Al-Mualla, M.; Vernon, D. 2D Binaural Sound Localization: For Urban Search and Rescue Robotics. Mob. Robot. Solut. Chall. 2009, 423–445.
32. Rascon, C.; Meza, I. Localization of sound sources in robotics: A review. Robot. Auton. Syst. 2017, 96, 184–210.
33. Kerzel, M.; Strahl, E.; Magg, S.; Navarro-Guerrero, N.; Heinrich, S.; Wermter, S. NICO—Neuro-Inspired COmpanion: A Developmental Humanoid Robot Platform for Multimodal Interaction. In Proceedings of the 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Lisbon, Portugal, 28–31 August 2017.
  33. Kerzel, M.; Strahl, E.; Magg, S.; Navarro-Guerrero, N.; Heinrich, S.; Wermter, S. NICO—Neuro-Inspired COmpanion: A Developmental Humanoid Robot Platform for Multimodal Interaction. In Proceedings of the 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Lisbon, Portugal, 28–31 August 2017. [Google Scholar]
  34. Deshpande, N.; Braasch, J. Detection of early reflections from a binaural activity map using neural networks. J. Acoust. Soc. Am. 2019, 146, 2529–2539. [Google Scholar] [CrossRef]
  35. Wang, M.; Zhang, X.L.; Rahardja, S. An Unsupervised Deep Learning System for Acoustic Scene Analysis. Appl. Sci. 2020, 10, 2076. [Google Scholar] [CrossRef] [Green Version]
  36. Argentieri, S.; Portello, A.; Bernard, M.; Danès, P.; Gas, B. Binaural Systems in Robotics. In The Technology of Binaural Listening; Blauert, J., Ed.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 225–253. [Google Scholar]
  37. Ma, N.; Gonzalez, J.; Brown, G. Robust Binaural Localization of a Target Sound Source by Combining Spectral Source Models and Deep Neural Networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2018, 26, 1. [Google Scholar] [CrossRef]
  38. Scharine, A.; Letowski, T.; Sampson, J. Auditory situation awareness in urban operations. J. Mil. Strateg. Stud. 2009, 11, 1–24. [Google Scholar]
  39. Sebastian Mannoor, M.; Jiang, Z.; James, T.; Kong, Y.; Malatesta, K.; Soboyejo, W.; Verma, N.; Gracias, D.; McAlpine, M. 3D Printed Bionic Ears. Nano Lett. 2013, 13, 2634–2639. [Google Scholar] [CrossRef] [Green Version]
  40. Gala, D.; Lindsay, N.; Sun, L. Realtime Active Sound Source Localization for Unmanned Ground Robots Using a Self-Rotational Bi-Microphone Array. J. Intell. Robot. Syst. 2019, 95, 935–954. [Google Scholar] [CrossRef] [Green Version]
  41. Magassouba, A.; Bertin, N.; Chaumette, F. Aural Servo: Sensor-Based Control From Robot Audition. IEEE Trans. Robot. 2018, 34, 572–585. [Google Scholar] [CrossRef]
  42. Zohourian, M.; Martin, R. Binaural Direct-to-Reverberant Energy Ratio and Speaker Distance Estimation. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 92–104. [Google Scholar] [CrossRef]
  43. Lu, Y.C.; Cooke, M. Binaural distance perception based on direct-to-reverberant energy ratio. In Proceedings of the International Workshop on Acoustic Echo and Noise Control, Washington, DC, USA, 14–17 September 2008. [Google Scholar]
  44. Thomas, P.; Van Renterghem, T.; De Boeck, E.; Dragonetti, L.; Botteldooren, D. Reverberation-based urban street sound level prediction. J. Acoust. Soc. Am. 2013, 133, 3929–3939. [Google Scholar] [CrossRef] [Green Version]
  45. Yang, H.S.; Kang, J.; Kim, M.J. An experimental study on the acoustic characteristics of outdoor spaces surrounded by multi-residential buildings. Appl. Acoust. 2017, 127, 147–159. [Google Scholar] [CrossRef]
Figure 1. Illustration of the azimuth angle, θ, and the elevation angle, ϕ, with respect to the model in three-dimensional (3D) space.
Figure 2. (a) Ambiguity on the azimuth plane; (b) ambiguity on the elevation plane.
Figure 3. (a) Illustration of the ear model from the STL file; (b) left view of the HATS with the 3D-printed ear; (c) sketch of the setup with microphones on the left and right ears (i.e., Mic L and Mic R); (d) detailed connections between the microphones and the computer.
Figure 4. Binaural processing chain within the device under test (DUT) to localize the sound source, where 'DRT60', 'SC', 'SPL', and 'ITD' refer to the auditory cues.
Figure 5. A Bluetooth speaker placed 110 cm from the model was rotated around it during the measurements.
Figure 6. Illustration of (a) θ on the azimuth plane and (b) ϕ on the elevation plane.
Figure 7. Sound pressure level (SPL) measured from the three trials for the right and left ears on (a) the azimuth plane and (b) the elevation plane.
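To make the SPL cue concrete, the sketch below shows a generic root-mean-square computation of the sound pressure level from calibrated microphone samples. The 20 µPa reference and the routine itself are textbook conventions, not the paper's exact measurement procedure.

```python
import numpy as np

def spl_db(samples: np.ndarray, p_ref: float = 20e-6) -> float:
    """Sound pressure level (dB) of calibrated pressure samples in pascals.

    Generic textbook computation, shown only to make the SPL cue of
    Figure 7 concrete; the paper does not list its exact routine.
    """
    rms = np.sqrt(np.mean(samples ** 2))
    return 20.0 * np.log10(rms / p_ref)

# Example: a 1 kHz tone with 1 Pa amplitude is roughly 91 dB SPL.
t = np.arange(0, 0.1, 1 / 48000)
print(spl_db(np.sin(2 * np.pi * 1000 * t)))  # ~90.97
```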
Figure 8. Frequency responses on (a) the azimuth plane and (b) the elevation plane.
Figure 9. Illustrations of SPL (a) and frequency response (b) in the 3D Cartesian plane relative to the DUT (represented by the head icon).
Figure 10. Illustration of angle of arrival (AoA) calculation (not to scale).
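The AoA geometry of Figure 10 follows the standard far-field relation sin θ = c·ITD/d_mic. The following sketch estimates the ITD from the cross-correlation peak between the two ear signals and inverts that relation; the 0.18 m microphone spacing and the channel sign convention are assumptions for illustration, not values from the paper.

```python
import numpy as np

def aoa_from_itd(left: np.ndarray, right: np.ndarray, fs: int,
                 mic_spacing: float = 0.18, c: float = 343.0) -> float:
    """Angle of arrival (degrees from the median plane) from the ITD.

    Sketch of the standard far-field relation sin(theta) = c*ITD/d_mic;
    the spacing is an assumed head width, not a value from the paper.
    """
    # ITD from the peak of the cross-correlation between the two ears.
    # With this ordering, a positive lag means the left channel is delayed,
    # i.e., the source sits toward the right ear.
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    itd = lag / fs
    # Clip to the physically valid range before inverting the sine.
    return np.degrees(np.arcsin(np.clip(c * itd / mic_spacing, -1.0, 1.0)))
```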
Figure 11. (a) Time response of the sound amplitude at θ = 0°, ϕ = 0° after the sound source was abruptly switched off at varying distances; (b) DRT60 against distance, where the blue line denotes the average value over five trials and the vertical yellow lines denote the corresponding error bars.
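The DRT60 cue is read off the energy decay after the source is switched off. As an illustration of the underlying energy-decay idea only (not the authors' exact routine), the sketch below applies Schroeder backward integration to the recorded tail and extrapolates a linear fit of the decay curve to −60 dB.

```python
import numpy as np

def decay_time_60db(tail: np.ndarray, fs: int) -> float:
    """Estimate the time for the energy decay curve to fall by 60 dB.

    Generic sketch: Schroeder backward integration of the recording tail
    after switch-off, then a linear fit over the -5 to -25 dB span
    extrapolated to -60 dB (a T20-style estimate). The paper's DRT60 is
    derived from a combined direct/reverberant decay; this only shows
    the standard energy-decay computation. Assumes the tail contains at
    least 25 dB of decay.
    """
    edc = np.cumsum(tail[::-1] ** 2)[::-1]           # Schroeder integral
    edc_db = 10 * np.log10(edc / edc[0] + 1e-12)     # normalised, in dB
    t = np.arange(len(tail)) / fs
    mask = (edc_db <= -5) & (edc_db >= -25)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second
    return -60.0 / slope
```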
Figure 12. DRT60 against θ at varying distances.
Figure 13. Measured amplitude against distance when θ = 0 ° (front) and when θ = 180 ° (back). The theoretical curve based on the inverse square law is represented by the yellow line.
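The theoretical curve in Figure 13 is the inverse square law, under which the level falls by 6 dB per doubling of distance, so a distance can be read from a measured level once a reference measurement is fixed. A minimal sketch, assuming a hypothetical reference of 80 dB at 1 m:

```python
def distance_from_level(spl_db: float, spl_ref_db: float,
                        d_ref: float = 1.0) -> float:
    """Distance implied by the inverse square law (6 dB per doubling).

    spl_ref_db is the level measured at the reference distance d_ref;
    both are illustrative assumptions, not the paper's calibration.
    """
    return d_ref * 10 ** ((spl_ref_db - spl_db) / 20.0)

# Example: a source measured at 80 dB at 1 m that now reads 54.4 dB
# lies at roughly 19 m.
print(distance_from_level(54.4, 80.0))  # ~19.05
```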
Figure 14. Distance error, e_k (left), and cumulative distance error, E_cum (right), when ϕ ∈ Ω_f. E_cum is at its minimum when ν = 0.37.
Figure 15. Distance error, e_k (left), and cumulative distance error, E_cum (right), when ϕ ∈ Ω_b. E_cum is at its minimum when ν = 0.
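Figures 14 and 15 tune the weight ν by minimizing the cumulative distance error E_cum over known training positions. One way to picture this is a one-dimensional grid search; the convex blend d̂ = ν·d_SPL + (1 − ν)·d_R used below is an assumed combination rule for illustration, and the paper's actual estimator may differ.

```python
import numpy as np

def best_blend_weight(d_spl, d_drt, d_true,
                      grid=np.linspace(0.0, 1.0, 101)):
    """Grid-search the weight nu minimising the cumulative distance error.

    Assumes a simple convex blend d_hat = nu*d_spl + (1 - nu)*d_drt of the
    SPL-based and DRT60-based estimates; treat this as a sketch of the
    tuning pictured in Figures 14 and 15, not the paper's exact rule.
    """
    d_spl, d_drt, d_true = map(np.asarray, (d_spl, d_drt, d_true))
    e_cum = [np.sum(np.abs(nu * d_spl + (1 - nu) * d_drt - d_true))
             for nu in grid]
    return grid[int(np.argmin(e_cum))]
```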
Figure 16. Illustrations of the frequency response at varying ϕ when (a) d = 6 m; (b) d = 13 m; (c) d = 19 m. The z-axis denotes the average amplitude (A_p) at the peak frequency f_p. The blue, orange, yellow, and purple lines denote A_p at ϕ = 0°, 90°, 180°, and 270°, respectively.
Figure 17. Estimated elevation angle, ϕ̂, based on SC when the sound source was placed at a distance of (a) 6 m and (b) 13 m from the DUT. The green line corresponds to the error, while the scatter plots correspond to the values of ϕ_i at different peak frequencies.
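One way to realize an SC-based elevation estimate of the kind shown in Figure 17 is a nearest-template vote over detected spectral peaks. The code below is a speculative sketch: the template table, the peak-prominence threshold, and the averaging of the per-peak votes ϕ_i are all assumptions, not the paper's algorithm.

```python
import numpy as np
from scipy.signal import find_peaks

def elevation_from_spectrum(spectrum_db: np.ndarray, freqs: np.ndarray,
                            templates: dict) -> float:
    """Estimate elevation by matching spectral peak frequencies.

    templates maps a candidate elevation (deg) to the peak frequencies
    measured for it beforehand; its contents are hypothetical. Each
    detected peak votes for its closest template (the phi_i of Figure 17)
    and the votes are averaged.
    """
    peaks, _ = find_peaks(spectrum_db, prominence=3.0)  # assumed threshold
    votes = []
    for f in freqs[peaks]:
        # phi_i: elevation whose template frequency lies nearest this peak
        phi_i = min(templates,
                    key=lambda phi: np.min(np.abs(np.asarray(templates[phi]) - f)))
        votes.append(phi_i)
    return float(np.mean(votes)) if votes else float("nan")
```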
Figure 18. Illustrations of the actual positions (denoted by the colored circles) and the corresponding estimated positions (denoted by diamonds of the same color) in 3D space when the sound source was placed (a) 6 m; (b) 13 m; and (c) 19 m away from the DUT (represented by the head icon). All of the positions considered were correctly localized with small position errors, except for one point in (a) (at k = 6), which resulted from the ambiguity issue.
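Plotting the estimated triples (d̂, θ̂, ϕ̂) as in Figure 18 requires a conversion to Cartesian coordinates. A minimal sketch under a conventional spherical mapping; the paper's own axis convention (its ϕ spans the full 0° to 360° circle) may differ, so this is for illustration only.

```python
import numpy as np

def polar_to_cartesian(d: float, theta_deg: float, phi_deg: float):
    """Convert a (distance, azimuth, elevation) triple to (x, y, z).

    Uses the common convention with phi measured up from the horizontal
    plane; an assumption, since the paper's axes are not reproduced here.
    """
    th, ph = np.radians(theta_deg), np.radians(phi_deg)
    return (d * np.cos(ph) * np.cos(th),   # forward
            d * np.cos(ph) * np.sin(th),   # lateral
            d * np.sin(ph))                # vertical
```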
Figure 19. Distance error based on d f , d b , d R , and d ^ when d = 6 m (left plot), d = 13 m (middle plot), and d = 19 m (right plot) from the 30 experiments. The distance estimation via application of Algorithm 1 is represented by the blue line.
Figure 20. Errors in the azimuth and elevation planes when d = 6 m (left plot), d = 13 m (middle plot), and d = 19 m (right plot) from the 30 experiments. The estimated azimuth angles with and without application of Algorithm 1 are represented by the blue and orange lines, respectively. The yellow line corresponds to the error in the estimated elevation angle, which remains consistently close to zero across all experiments.
Table 1. DreamMaker OverLord printing parameters.
Parameter          | Description
Slicer             | Cura 15.04.6
Material           | PLA
Layer height       | 0.15 mm
Shell thickness    | 0.8 mm
Enable retraction  | Yes
Bottom thickness   | 0.6 mm
Fill density       | 100%
Print speed        | 60 mm/s
Nozzle temperature | 210 °C
Nozzle size        | 0.4 mm
Layer thickness    | 0.1 mm
Extrusion overlap  | 0.15 mm
Travel speed       | 100 mm/s
Bottom layer speed | 20 mm/s
Outer shell speed  | 50 mm/s
Inner shell speed  | 60 mm/s
Minimal layer time | 5 s
Table 2. Numerical Results.
Actual angles (θ, ϕ) and estimated values (d̂, θ̂, ϕ̂) at each position k.

d = 6 m:
 k | θ (°) | ϕ (°) | d̂ (m) | θ̂ (°)  | ϕ̂ (°)
 1 |   0   |   0   | 5.75  |   9.65 |   6.21
 2 |  90   |   0   | 5.73  |  85.35 |   9.55
 3 | 135   |   0   | 5.69  | 131.22 |   7.65
 4 | 270   |   0   | 6.38  | 264.12 |   8.22
 5 |   0   |  45   | 6.67  |   9.21 |  31.21
 6 |  45   |  45   | 6.23  | 143.68 |  34.79
 7 |   0   |  90   | 6.06  |   6.31 |  86.90
 8 |   0   | 180   | 5.47  |   9.16 | 171.88
 9 | 225   | 180   | 5.86  | 212.46 | 165.68
10 |   0   | 270   | 6.15  |   5.35 | 259.63

d = 13 m:
 k | θ (°) | ϕ (°) | d̂ (m) | θ̂ (°)  | ϕ̂ (°)
 1 |   0   |   0   | 13.03 |   1.54 |   5.81
 2 |  45   |   0   | 13.12 |  42.15 |  10.22
 3 |  90   |   0   | 13.45 |  87.66 |  11.12
 4 | 135   |   0   | 13.72 |  130.9 |   9.25
 5 | 270   |   0   | 15.32 | 267.96 |   8.22
 6 | 135   |  15   |  13.8 | 126.12 |  21.57
 7 |   0   |  90   | 14.62 |   5.69 |  85.74
 8 |   0   | 180   | 14.27 |   7.66 | 150.66
 9 |   0   | 270   | 13.54 |   5.69 | 255.66
10 | 270   | 135   | 13.57 | 245.69 | 155.69

d = 19 m:
 k | θ (°) | ϕ (°) | d̂ (m) | θ̂ (°)  | ϕ̂ (°)
 1 |   0   |   0   | 19.17 |   5.42 |   9.25
 2 |  45   |   0   | 19.49 |  49.62 |   5.32
 3 |  90   |   0   | 18.13 |  83.32 |   7.93
 4 | 225   |   0   | 18.08 | 231.12 |   8.39
 5 | 270   |   0   | 18.22 | 285.22 |  14.07
 6 | 315   |   0   | 19.08 | 305.32 |    7.7
 7 |  45   |  45   | 19.35 |  42.32 |  36.32
 8 |   0   |  90   | 19.05 |   1.35 |  71.31
 9 |   0   | 180   | 18.92 |   3.22 | 190.21
10 |   0   | 270   | 18.99 |   2.32 | 262.32
Table 3. Performance evaluations in terms of errors and total number of ambiguous points (AP).
Distance error:
Index    | d_f (SPL) | d_b (SPL) | d_R (DRT60) | d̂ (Alg. 1)
Ẽ_av (m) | 4.7       | 2.6       | 0.6         | 0.5

Angle error:
Index        | θ̂ (Alg. 1) | θ_0  | ϕ̂ (SC)
Ẽ_av (°)     | 9.59       | 28.3 | 10.3
Total AP     | 1          | 9    | 0
Total AP (%) | 3.3        | 30   | 0
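The ambiguous-point percentages in Table 3 follow directly from the 30 test positions (10 per distance, at 6, 13, and 19 m); a quick arithmetic check:

```python
# Arithmetic check of the Table 3 ambiguous-point (AP) percentages.
n_positions = 30                      # 10 positions per distance
ap_without_alg1, ap_with_alg1 = 9, 1  # counts taken from Table 3
print(100 * ap_without_alg1 / n_positions)  # 30.0 %
print(100 * ap_with_alg1 / n_positions)     # 3.33... %, reported as 3.3 %
```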
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
