## 1. Introduction

An aspect of importance in some classes of robots is an awareness and perception of the acoustic environment in which a robot is immersed. A prerequisite to the application of higher-level processes acting on acoustic signals, such as acoustic source classification and speech recognition/interpretation, is a process to locate acoustic sources in space [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]. A method for a binaural robotic system to locate directions to acoustic sources, based on a synthetic aperture computation, is described in Tamsett [24]. A far-range approximation in that paper is relaxed in this one, and the method is generalized for potentially determining range to an acoustic source as well as its direction.

With two ears, humans extract information on the direction to acoustic sources over a spherical field of audition based on differences in the arrival times of sound at the ears (interaural time difference, or ITD) for frequencies less than approximately 1500 Hz [25,26,27]. Measurement of the arrival time difference might be made by applying a short time-base cross-correlation process to the sounds received at the ears [28], or by a functionally equivalent process [29,30,31,32,33,34,35]. For pure tones at higher frequencies, locating acoustic sources is dominated by the use of interaural level difference (ILD) [25,26,27].
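As an illustration of the cross-correlation approach, the following minimal Python sketch (my construction; the function and signal names are not from the cited works) estimates an ITD as the lag of the peak of the cross-correlation between two synchronized microphone channels:

```python
import numpy as np

def estimate_itd(left: np.ndarray, right: np.ndarray, fs: float) -> float:
    """Estimate the interaural time difference (seconds) by cross-correlation.

    A positive result means the left channel lags the right, i.e., the
    source is nearer the right ear/antenna.
    """
    corr = np.correlate(left, right, mode="full")   # all lags of left vs. right
    lags = np.arange(-(len(right) - 1), len(left))  # lag (samples) per element
    return lags[np.argmax(corr)] / fs               # peak lag converted to seconds

# Toy check: delay one noise burst by 12 samples relative to the other.
rng = np.random.default_rng(0)
sig = rng.standard_normal(2048)
delay = 12
itd = estimate_itd(np.r_[np.zeros(delay), sig], np.r_[sig, np.zeros(delay)], fs=48_000.0)
print(itd * 48_000.0)   # -> 12.0 samples of lag
```

For an interaural distance $d$ and transmission velocity $c$, the measured $\Delta t$ is necessarily bounded by $\pm d/c$.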

Other aural information is also integrated into an interpretation of the direction to an acoustic source. The spectral content of a signal arriving at the ears is affected by the shape of the pinnae and the head around which sound has to diffract (the so-called head-related transfer function, or HRTF) [36,37,38,39], and this effect provides information that can be exploited for aural direction finding.

Measurement of an instantaneous time or level difference between acoustic signals arriving at a pair of listening antennae or ears allows an estimate of the angle $\lambda$ between the auditory axis and the direction to an acoustic source. This ambiguously locates the acoustic source on the surface of a cone with an apex at the auditory center and an axis shared with the auditory axis. The cone projects onto a circle of colatitude of a spherical shell sharing a center and an axis with the auditory axis (a lambda circle). Based on observations of human behavior, Wallach [40] inferred that humans disambiguate the direction to an acoustic source by “dynamically” integrating information received at the ears as the head is turned while listening to a sound (see also [41]). A solution to the binaural location of the directions to acoustic sources in both azimuth and elevation, based on a synthetic aperture computation, is described in Tamsett [24], representing the “dynamic” process posited by Wallach [40].

The synthetic aperture computation (SAC) approach to finding directions to acoustic sources, applied to a pair of listening antennae as the head is turned, provides an elegant solution. This process, or in nature a neural implementation equivalent to it, is analogous to those performed in anthropic synthetic aperture radar, sonar, and seismic technologies [42]. In the same way that SAC in these technologies considerably improves the resolution of targets in processed images over that in unprocessed “raw” images, the ambiguity inherent in locating an acoustic source with just two omnidirectional acoustic antennae collapses from a multiplicity of points on a circle, for a stationary head, to a single point along a line when a SAC process is deployed as the head is turned while listening to a sound.
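To make the idea concrete, a minimal sketch follows of how such an integration might be organized in software (my construction; the grid resolution, names, and one-degree tolerance are arbitrary choices): the acoustic energy measured at each head pose is spread over the corresponding lambda circle on a direction grid, and the circles reinforce one another only at the true source direction.

```python
import numpy as np

# Direction grid over the field of audition (azimuth/elevation, radians).
az = np.radians(np.arange(0.0, 360.0, 1.0))
el = np.radians(np.arange(-90.0, 91.0, 1.0))
AZ, EL = np.meshgrid(az, el)
image = np.zeros_like(AZ)          # the virtual acoustic image

def accumulate(image, head_az, lam, energy, tol=np.radians(1.0)):
    """Spread `energy` over the lambda circle seen at one head pose.

    The auditory axis is taken to lie in the horizontal plane, pointing
    toward azimuth `head_az`; `lam` is the measured angle to the source.
    """
    # Angle between each grid direction and the current auditory axis.
    cos_ang = np.cos(EL) * np.cos(AZ - head_az)
    ang = np.arccos(np.clip(cos_ang, -1.0, 1.0))
    image[np.abs(ang - lam) < tol] += energy   # reinforce cells on the circle
    return image
```

After accumulating over a sequence of head poses, `np.unravel_index(image.argmax(), image.shape)` picks out the grid cell where the lambda circles intersect, i.e., the estimated source direction.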

Only relatively recently has binaural sensing in robotic systems, in emulation of binaural sensing in animals, developed sufficiently for the deployment of processes for estimating directions to acoustic sources [1,2,3,4,5,6,7,8,9,10,11,12,13]. Acoustic source localization has been largely restricted to estimating azimuth [1,2,3,4,5,6,7,8,9,10,11,12,13] on the assumption of zero elevation, except where audition has been fused with vision to estimate elevation as well [14,15,17,18]. Information gathered as the head is turned has been exploited either to locate the azimuth at which the ITD reduces to zero, thereby determining the azimuthal direction to a source, or to resolve the front–back ambiguity associated with estimating only the azimuth [4,5,6,7,8,9,10,11,12,13,14,19,20]. Recently, Kalman filters acting on a changing ITD have also been applied in robotic systems for acoustic localization [21,22,23].

The second aspect of locating acoustic sources is estimating range. Some species of animal, for example cats, possess ears with pinnae whose response to sound is strongly directional [43] and which are independently orientable. Such animals can explore sounds by rotating the pinnae while keeping their heads still. In this way they are able to determine the direction to an acoustic source with a single ear and, in principle, to estimate range by triangulation using both ears.

Animals without this facility (e.g., humans and owls) must use other mechanisms or cues for range finding. Owls are known to be able to catch prey in total darkness [44], suggesting they are equipped to measure distance to an acoustic source accurately. An owl flying past a source of sound could use multiple determinations of direction to estimate range by triangulation. Alternatively, in flying directly towards a point acoustic source, an owl will experience a change in intensity governed by an inverse-square relationship with range, which could be exploited for monaural estimates of range. For example, a quadrupling in intensity during a swoop on a point acoustic source would indicate that the distance to the source has halved, and in this way an estimate of instantaneous current range could be made. In addition, sounds that are familiar to an animal have an expected intensity and spectral distribution as a function of range, and this learned information can be exploited for estimating range [45,46]. Range in robots has been estimated on the basis of triangulation to an acoustic source involving lateral movement of the head [6].
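Formally, for two points on a direct approach to a point source, the inverse-square law fixes the ratio of the ranges:

$$\frac{I_2}{I_1} = \left(\frac{r_1}{r_2}\right)^2 \quad\Rightarrow\quad r_2 = r_1\sqrt{\frac{I_1}{I_2}},$$

so $I_2 = 4I_1$ gives $r_2 = r_1/2$. The intensity ratio alone yields only the relative range; since the baseline flown between the two measurements is $r_1 - r_2$, combining the ratio with the distance flown yields the absolute range.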

A far-range approximation in Tamsett [24] restricts acoustic source localization to direction. It was suggested there [24] that a SAC could in principle provide an estimate of range from near-field deviations from far-field expectations. In relaxing the far-range approximation in the current paper, a synthetic aperture computation as a function of range is formulated, enabling range to an acoustic source, as well as direction, to be estimated.

A proof of concept demonstrating the principle and potential utility of the method is provided through the use of simulated experimental data. Multiple SACs as a function of range are performed on the simulated data, and the distance to the acoustic source is estimated by finding the range at which the set of lambda circles of colatitude generated best converges, or focuses, to a point of intersection. Employing a SAC process in this way adds a dimension to the SAC process that finds only direction to source: acoustic energy maxima are sought in a three-dimensional virtual acoustic volume rather than over a two-dimensional acoustic surface.
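The following self-contained Python sketch (my construction, not the paper's simulation code) illustrates the idea in its one-dimensional, horizontal-plane analog, using Equations (8) and (9) derived in Section 2: values of $f$ are simulated as the head turns, candidate ranges are scanned, and the range at which the implied source azimuths converge (the analog of lambda circles focusing to a point) is taken as the estimate.

```python
import numpy as np

def f_from_lambda(lam, n):
    """Path-difference fraction f for angle lam and range n*d (Equation (8))."""
    return (np.sqrt(n**2 + 0.25 + n * np.cos(lam))
            - np.sqrt(n**2 + 0.25 - n * np.cos(lam)))

def lambda_from_f(f, n):
    """Angle lambda implied by f if the source were at range n*d (Equation (9))."""
    return np.arccos((f / n) * np.sqrt(n**2 + (1.0 - f**2) / 4.0))

true_n, true_az = 3.5, np.radians(40.0)         # source range (units of d) and azimuth
poses = np.radians(np.arange(0.0, 36.0, 5.0))   # head azimuths while turning
f_obs = f_from_lambda(true_az - poses, true_n)  # simulated noise-free measurements

def focus_spread(n):
    """Scatter of implied source azimuths at candidate range n (0 = perfect focus)."""
    return np.var(poses + lambda_from_f(f_obs, n))

candidates = np.linspace(1.5, 20.0, 371)        # trial ranges, step 0.05 d
best_n = candidates[np.argmin([focus_spread(n) for n in candidates])]
print(f"estimated range = {best_n:.2f} d (true = {true_n} d)")   # -> 3.50 d
```

With noise-free measurements the spread vanishes exactly at the true range; as the true range grows, the spread becomes nearly flat in $n$, which foreshadows the measurement-accuracy demands discussed in Section 4.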

The solution could be implemented in a binaural robotic system capable of generating a series of sufficiently accurate values of $\lambda $ estimated from measurements of arrival time/level differences as the head is turned. Any implementation in nature will involve biological/neural components to provide an equivalent to the mathematics-based functions deployed in an anthropic robotic system.

## 2. Arrival Time Difference, Angle to Acoustic Source, and Range

A straight-line simplification of the relationship between arrival time difference, angle to source, and range is illustrated in Figure 1. A more elaborate model might allow for diffraction around the head to reach the more distant ear [19,47].

- L represents the position of the left ear, and R the right ear; the line LR lies on the auditory axis;
- C represents the position of the auditory center;
- S is the position of the acoustic source;
- $\lambda$ is the angle at the auditory center between the auditory axis and the direction to the acoustic source;
- $d$ represents the distance between the ears (the length of the line LR);
- $nd$ represents the distance of the acoustic source from the auditory center as a multiple $n$ of the length $d$ (the length of the line CS);
- $fd$ represents the difference in the acoustic ray path lengths from the source to the ears as a proportion of the length $d$ ($-1\le f\le 1$); and
- $a$ is the distance from the acoustic source to the right ear (the length of the line SR).

The distance $fd$ is related to the difference in arrival times at the ears measured by the auditory system by:

$$fd = c\,\Delta t \qquad (1)$$

where $c$ is the acoustic transmission velocity (e.g., 330 m s$^{-1}$ for air) and $\Delta t$ is the difference in the arrival time of sound received at the ears.

Applying the cosine rule to the triangle SCR (in which the angle SCR is $\lambda$):

$$a^2 = (nd)^2 + \left(\frac{d}{2}\right)^2 - nd^2\cos\lambda \qquad (2)$$

so that

$$a = d\sqrt{n^2 + \frac{1}{4} - n\cos\lambda} \qquad (3)$$

Applying the cosine rule to the triangle SLC (in which the angle SCL is $\pi - \lambda$):

$$(a + fd)^2 = (nd)^2 + \left(\frac{d}{2}\right)^2 - nd^2\cos(\pi - \lambda) \qquad (4)$$

that is,

$$(a + fd)^2 = (nd)^2 + \left(\frac{d}{2}\right)^2 + nd^2\cos\lambda \qquad (5)$$

Substituting for $a$ in Equation (5), using Equation (3), yields:

$$f^2 + 2f\sqrt{n^2 + \frac{1}{4} - n\cos\lambda} - 2n\cos\lambda = 0 \qquad (6)$$

For infinite range ($n = \infty$) Equation (6) reduces to:

$$f = \cos\lambda \qquad (7)$$

Equation (7) is the relationship between $f$ and $\lambda$ for the far-range approximation [24], in which acoustic rays incident on the ears are parallel rather than diverging from a point a finite distance from the ears.

Equation (6) is quadratic in $f$, yielding a single physically realizable solution:

$$f = \sqrt{n^2 + \frac{1}{4} + n\cos\lambda} - \sqrt{n^2 + \frac{1}{4} - n\cos\lambda} \qquad (8)$$

Equation (8) may be rearranged for a solution to $\lambda$ in terms of $n$ and $f$:

$$\lambda = \cos^{-1}\left(\frac{f}{n}\sqrt{n^2 + \frac{1 - f^2}{4}}\right) \qquad (9)$$

Equation (9) reduces to Equation (7) for infinite range, as expected. A value for $\lambda$, computed for values of $f$ and $n$, defines a circle of colatitude on a spherical surface of radius $nd$ sharing a center and axis with the auditory axis.
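As a numerical illustration (the values of $d$, $c$, and $\Delta t$ are my own example figures, not from the paper), the following sketch converts a measured arrival-time difference into $\lambda$ under several assumed ranges, showing how the near-field geometry perturbs the far-range answer:

```python
import numpy as np

d, c, dt = 0.18, 330.0, 3.5e-4   # ear spacing (m), sound speed (m/s), measured dt (s)
f = c * dt / d                   # Equation (1): fd = c * dt

for n in (1.0, 2.0, 5.0, np.inf):
    if np.isinf(n):
        lam = np.arccos(f)       # far-range limit, Equation (7)
    else:
        lam = np.arccos((f / n) * np.sqrt(n**2 + (1.0 - f**2) / 4.0))  # Equation (9)
    print(f"n = {n:>4}: lambda = {np.degrees(lam):.2f} deg")
```

For these values the assumed range shifts $\lambda$ by roughly 3.5° between $n = 1$ and the far-range limit; it is this range-dependent shift that a SAC performed as a function of range exploits.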

## 4. Discussion

The method described for estimating the range to an acoustic source could in principle form the basis of an implementation in a binaural robotic system. However, the challenge in an implementation is the need for very accurate measurement of arrival time/level differences between the antennae, to achieve the spatial resolution required to discriminate the effect of range in a SAC so that an estimate of range can be made.

It is apparent from Figure 2 and Figure 3 that a spatial resolution of acoustic location better than 1/4° would appear to be necessary to achieve an estimate of a range even as short as 3.5 times the distance between the ears, and then only to an accuracy of ~30%. This is beyond human capability, which can locate direction with a resolution estimated to be 1.0° [48] to 1.5° [26] in the direction the head is facing, and would also appear to be beyond the capability of owls, estimated to be able to resolve direction only to approximately 3° [49,50], despite their being able to catch prey in total darkness from audition alone [44].

However, it might be possible in principle for the auditory capability of a robotic system to exceed those of humans and owls, allowing estimates of range based on the three-dimensional SAC method described and demonstrated here. Highly accurate estimates of the difference in arrival times at the ears are required: better than 0.5% of the travel time for the distance between the ears. An obvious ploy for approaching this accuracy in a robotic system intended to explore the possibilities of the method would be to make the distance between the listening antennae large, at least in the first instance. Because differences in arrival times are related to differences in acoustic ray path lengths via the acoustic transmission velocity, the velocity of sound will also be required to be known to a similar degree of accuracy.

It will be possible to calibrate an auditory system’s estimate of the acoustic transmission velocity by performing a three-dimensional (3D) SAC analogous to the one described here for estimating range, but listening instead to a sound from a source at a sufficiently large range for the far-range approximation to hold, and optimizing the acoustic transmission velocity for the strongest response.
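A sketch of how such a calibration might look in its horizontal-plane analog follows (my construction; a real implementation would scan the full 3D SAC response, and the antenna separation and pose schedule here are arbitrary assumptions): with a far-range source, candidate velocities are scanned and the velocity whose implied directions focus best is kept.

```python
import numpy as np

def spread(c, dt_obs, poses, d):
    """Scatter of implied source azimuths for a candidate sound speed c."""
    f = np.clip(c * dt_obs / d, -1.0, 1.0)     # Equation (1)
    return np.var(poses + np.arccos(f))        # far range: lambda = arccos(f), Eq. (7)

d = 0.3                                        # antenna separation (m), assumed
poses = np.radians(np.arange(0.0, 35.0, 5.0)) # head azimuths while listening
true_c, true_az = 343.0, np.radians(50.0)
dt_obs = d * np.cos(true_az - poses) / true_c # simulated far-range measurements

candidates = np.linspace(300.0, 380.0, 161)   # trial velocities, step 0.5 m/s
best_c = candidates[np.argmin([spread(c, dt_obs, poses, d) for c in candidates])]
print(f"calibrated c = {best_c:.1f} m/s (true = {true_c} m/s)")   # -> 343.0 m/s
```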

Any implementation in nature of acoustic range estimation based on the SAC principle will incorporate biological neural-ware components to achieve the equivalent of mathematical operations in the software/firmware components of a robotic system.

## 5. Summary

A method for determining the direction to an acoustic source with a pair of omnidirectional listening antennae in a robotic system, or by an animal’s ears, as the head is turned, is described in Tamsett [24], in which a measure of the acoustic energy received at the antennae/ears is extended over lambda circles of colatitude and integrated in a virtual/subconscious acoustic image of the field of audition. This constitutes a synthetic aperture computation (SAC) analogous to the data processes in anthropic synthetic aperture radar, sonar, and seismic technologies [42]. The method has been extended in this paper, and the far-range approximation [24] relaxed, to allow SAC as a function of range. By optimizing range to find maxima in the integrated energy over multiple sets of lambda circles generated as a function of range, or alternatively to find a best focus of the lambda circles to a point of intersection, range as well as direction to acoustic sources can in principle be estimated.

This embellished SAC process promotes the direction-finding capability of a pair of antennae to that of a large, stationary, two-dimensional array of acoustic antennae, with not only beam-forming direction-finding capability but also range-finding capability. The method appears to be beyond the acoustic direction-finding capabilities of human audition; however, it might nevertheless find utility in a binaural robotic system capable of sufficiently accurate measurement of arrival time/level differences between the listening antennae.