Localization of Sound Sources: A Systematic Review

Abstract: Sound localization is a vast field of research and advancement which is used in many useful applications to facilitate communication, radars, medical aid, and speech enhancement, to name but a few. Many different methods have been presented in this field in recent times. Various types of microphone arrays serve the purpose of sensing the incoming sound. This paper presents an overview of the importance of using sound localization in different applications, along with the use and limitations of ad-hoc microphones over other microphones. Certain approaches to overcome these limitations are also presented. Some of the existing methods used for sound localization with microphone arrays in the recent literature are explained in detail. Existing methods are studied in a comparative fashion, along with the factors that influence the choice of one method over the others. This review is done in order to form a basis for choosing the best-fit method for our use.


Introduction
Sound localization deals with finding the source of sound with respect to an array of microphones. In practice, sound source localization is done using two types of cues: binaural and monaural. Binaural cues are determined from differences between the sound signals reaching the two ears [1]. This difference is calculated using either the time or the intensity of the incident sound signal. Monaural cues are measured through the angle of incidence of the sound signal on the ear. The ability to distinguish and identify particular sounds from the surrounding noise is an important aspect of the normal auditory system. People with hearing loss are unable to interpret speech in the presence of background noise and cannot recognize and distinguish between multiple speakers [2]. Hence, this mechanism is implemented in hearing aids for people who are suffering from hearing loss in one or both ears. For the last two to three decades, sound source localization using microphone arrays has been a major topic of interest for researchers and has been discussed in a number of noteworthy studies [3][4][5][6].
To this day, this problem receives considerable attention from researchers in the fields of medicine, robotics and signal processing. One of the many challenges faced in this domain is the problem of acoustic localization in reverberant environments [7]. Apart from that, the number of microphones in the arrangement as well as their geometry is also a matter of ongoing research, as there is a need to limit the number of microphones to make the system compact, reduce complexity and minimize resource consumption. Sound source localization using microphones is still largely a theoretical concept that is being researched vigorously [8].
Sound localization has many applications in modern technologies and helps in producing better systems which are used in various fields. One of its most important applications is in hearing aids for disabled people. A large amount of research has been done on personal guidance systems that help blind people become familiar with their environment. Such a guidance system includes headphones, an electronic compass, a transmitter and a receiver. It uses the sense of hearing in place of sight so that the user can move around easily [9]. Sound localization is also used for navigation. Sonar uses this technique to find the location of a target. In addition, localization helps in creating better virtual reality (VR) scenarios, which greatly increases their realism [10]. Other uses of sound localization are audio surveillance, teleconferencing, improved speech recognition and speech enhancement [11][12][13][14][15][16].
Sound source localization now has a wide range of applications in a variety of fields. It is applied frequently in industry as well as in domestic and military applications. In audio communication, this technology has become crucial for the development of smart devices used for voice enhancement. For instance, sound source localization technology is integrated in cameras used for video conferencing [17][18][19]. Using source localization, the camera automatically moves in the direction of the speaker [20]. This technology is also used in hearing aids, which assist people with hearing disabilities. In such a device, the location of the sound source is determined and then passed to an integrated array technology for enhancing the voice. Sound coming from all other directions is minimized, making the source voice stronger and more distinct [21]. Sound source localization has been successfully implemented in both speech recognition and speech enhancement systems. The smart water mine used in naval warfare uses sound localization technology for automatic identification of a target and its location. This data is communicated to the control system, which attacks the identified target [22][23][24][25][26][27].
Sound source localization has many prospects in the field of robotics [28][29][30]. Apart from having basic senses like sight, hearing and touch, robots also have some power to think logically, due to which they are being used in a wide range of intelligent applications and are gaining increased acceptance. Sound source localization using microphones is still largely a theoretical concept that is being researched vigorously and has not yet reached the field of robotics. Locating the source of a sound using just two microphones has been considered in the past [31][32][33]. Two microphones have been used as the left and right ears of a robot to locate the source of a sound. However, two microphones alone are not enough to meet this objective because the required spatial arrangement cannot be achieved. SR-SLOMA is a new sound localization technology which consists of a microphone array with a system of speakers. This system has been used to recognize verbal patterns [34][35][36][37]. This technology has applications in teleconferencing and interactive classrooms. It uses both sound source localization and a voice recognition mechanism to identify the recording process of input audio and video files. This helps to improve the working of a standard transmitter and receiver, making it smarter and more human.
As discussed earlier, source localization is a growing field with many related advancements. Every development in this field gives rise to many research options. The ongoing research and development has opened the door to numerous facilities for people, whether as hearing aids for disabled people or as audio surveillance used by armed forces to locate targets. One such example is sonar, which locates the position of a target by using sound waves.
In this paper, some state-of-the-art methods used for sound localization are discussed in detail. More precisely, this paper targets the following research questions:
RQ-1. Which sound localization methods have been recently presented in literature?
RQ-2. Which factors affect the sound localization methods?
RQ-3. How can the limitations in the existing sound localization methods be overcome using the current technologies?

Materials and Methods
This paper reviews sound source localization technologies. It probes the current limitations of the methods and presents insight into how the process can be improved to enhance the precision of sound source localization. In this section, comprehensive details of the materials, methods and resources required to conduct this study are presented. The overall process followed for the study is illustrated in Figure 1.
As sound localization is a vast field, it is necessary to define the boundary of this document. Therefore, the data collection step was facilitated by first forming three main categories of data. These categories were created on the basis of the three research questions specified in Section 1. They are listed as follows:
C-1: Sound source localization methods
C-1(a): Sound source localization in 3D space
C-2: Factors influencing sound source localization
C-3: Improvement over sound source localization
The major sources consulted to retrieve recent research articles related to each category are the websites of journals and conferences like Elsevier, the Institute of Electrical and Electronics Engineers (IEEE) and ResearchGate. Other websites include MDPI, Google Scholar and arXiv. Table 1 gives the restricted domains for this purpose along with the number of articles retrieved from each domain. It gives the number of papers studied for the literature and the keywords for easy access. The search keywords and phrases entered on the search engines of these websites were formulated so as to completely exhaust the database of each website and obtain the maximum number of articles relevant to the topic of interest. To facilitate the generation of keywords, first a set of basic keywords was formed. This initial set included phrases like "sound source localization", "influencing factors" and "improvements".
Once this set was created, more keywords relevant to those in the list were searched. VOSviewer software was used to conduct this search using the keywords from the initial set. As a result of this search, the most relevant keywords in the literature related to the three categories were found, which include words like "signal", "microphone" and "array". A detailed diagram of the cluster of keywords is given in Figure 2. The generated keywords were then used to form phrases which were used on the search engines of the websites. Hence, this process yielded the maximum number of relevant research articles from the search platforms. For the category C-1, the keywords designed consist of phrases like "direction of arrival", "sound localization" and "ad-hoc microphone array". For C-2, the defined keywords include "sound localization dependencies" and "factors affecting sound localization". For C-3, the keywords included phrases such as "improving sound localization" and "enhance sound localization precision".
After this step, a set of ranked research papers was obtained from the selected search platforms. The next step was to screen these articles to filter them. For this purpose, some assessment criteria were defined. These criteria for article screening are given below:
(1) Published between the years 2011 and 2021.
(3) No duplicate papers.
(4) Articles must be research papers, reviews or book chapters.
The research papers passing the screening criteria were downloaded. The content of each article was then studied and examined to confirm its relevance to the defined categories. For this purpose, abstract, introduction and methodology sections of the research articles were carefully analyzed. Hence, a finalized version of the research articles for the study was prepared which contained only the articles passing the final content assessment criteria. The total number of articles in this list was 28. These papers are shown in Table 2 along with the name of their source journal or conference. From these articles, 19 belonged to C-1, 3 belonged to C-2 while 6 papers belonged to C-3.
Table 2. Articles selected for the study and their source journal or conference.

Title | Source | Reference
"Using multiple microphone arrays and reflections for 3D localization of sound sources" | IEEE/RSJ International Conference on Intelligent Robots and Systems | [29]
"Special issue on wireless acoustic sensor networks and ad hoc microphone arrays" | Signal Processing | [30]
"Classification of reverberant audio signals using clustered ad hoc distributed microphones" | Signal Processing | [31]
"Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees" | International Conference on Digital Signal Processing | [32]
"Binaural sound localization based on reverberation weighting and generalized parametric mapping" | IEEE/ACM Transactions on Audio, Speech, and Language Processing | [33]
"Sound source localization" | European Annals of Otorhinolaryngology, Head and Neck Diseases | [7]
"A Survey of Sound Source Localization Methods in Wireless Acoustic Sensor Networks" | Wireless Communications and Mobile Computing | [9]
"Energy-based acoustic source localization methods: a survey" | Sensors | [10]
"Localization of sound sources in robotics: A review" | Robotics and Autonomous Systems | [12]
"DOA estimation based on MUSIC algorithm" | Digitala Vetenskapliga Arkivet DiVA, Småland | [13]
"Direction of Arrival Estimation via ESPRIT Algorithm for Smart Antenna System" | International Journal of Computer Applications | [16]
"On-Grid Doa Estimation Method Using Orthogonal Matching Pursuit" | International Conference on Signal Processing and Communication (ICSPC) | [17]
"Localizing multiple audio sources in a wireless acoustic sensor network" | Signal Processing | [18]
"Robust 3D Localization and Tracking of Sound Sources Using Beamforming and Particle Filtering" | Acoustics, Speech and Signal Processing | [27]
"3D Sound Source Localization System Based on Learning of Binaural Hearing" | IEEE International Conference on Systems, Man and Cybernetics | [24]
"Revisiting trilateration for robot localization" | IEEE Transactions of the Source | [11]
"Three ring microphone array for 3-D sound localization and separation for mobile robot audition" | |
"Time-delay estimation for TOA-based localization of multiple sensors" | IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | [22]
"Sensation and Perception (Eighth Edition)" | Cengage Learning | [23]
C-3
"Real-time implementation and performance optimization of 3D sound localization on GPUs" | Automation and Test in Europe Conference and Exhibition | [28]
"High performance 3D sound localization for surveillance applications" | IEEE Conference on Advanced Video and Signal Based Surveillance | [34]
"SNR improvement with speech enhancement techniques" | Proceedings of the ICWET | [4]
"Cooperative integrated noise reduction and node-specific direction-of-arrival estimation in a fully connected wireless acoustic sensor network" | Signal Processing | [14]
"Array Signal Processing for Maximum Likelihood Direction-of-Arrival Estimation" | Electrical Electronic System | [35]
"Smart room: participant and speaker localization and identification" | IEEE International Conference on Acoustics, Speech, and Signal Processing | [6]

Energies 2021, 14, 3910

Energy-Based Localization
Most energy-based methods are used for wireless acoustic sensor networks (WASNs) because of the low variation of acoustic power. The sound is sensed by microphones, which are represented by nodes. Using the sound energy for localization depends on the averaged readings acquired by each microphone over a defined number of signal samples. Energy-based techniques do not have the issue of synchronization, and they do not need multiple microphones at each node [38][39][40][41]. The energy differences between microphones of the same node are minor. The basic idea of the energy-based localization method is to use the energy ratios of the sensors, each of which restricts the target to a hypersphere. Increasing the number of sensors increases the number of hyperspheres, and the target lies at the point where the hyperspheres intersect.
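As a minimal sketch of the energy-ratio idea, the following Python example assumes a noise-free inverse-square decay model with equal sensor gains. This model is an assumption made here for illustration and is not taken from the reviewed papers. Under it, the ratio of the energies at two sensors depends only on the ratio of their distances to the source, so a coarse grid search can look for the point that best matches all measured ratios. The function names, sensor layout and grid parameters are hypothetical.

```python
import math
from itertools import combinations


def simulate_energies(source, sensors, source_energy=1.0):
    """Noise-free sensor energies under an ASSUMED inverse-square decay model."""
    return [source_energy / ((source[0] - x) ** 2 + (source[1] - y) ** 2)
            for x, y in sensors]


def locate_by_energy_ratio(sensors, energies, extent=10.0, step=0.1):
    """Grid search over [0, extent]^2 for the point whose predicted
    pairwise energy ratios best match the measured ones."""
    best, best_cost = None, math.inf
    n = int(round(extent / step))
    for ix in range(n + 1):
        for iy in range(n + 1):
            p = (ix * step, iy * step)
            d2 = [(p[0] - x) ** 2 + (p[1] - y) ** 2 for x, y in sensors]
            if min(d2) < 1e-12:
                continue  # skip grid points that coincide with a sensor
            cost = 0.0
            for i, j in combinations(range(len(sensors)), 2):
                # Under the assumed model E_i / E_j = d_j^2 / d_i^2.
                # The mismatch is measured in the log domain.
                measured = math.log(energies[i] / energies[j])
                predicted = math.log(d2[j] / d2[i])
                cost += (measured - predicted) ** 2
            if cost < best_cost:
                best, best_cost = p, cost
    return best


# Hypothetical layout: four sensors that do not lie on a common circle.
sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (7.0, 8.0)]
energies = simulate_energies((3.0, 7.0), sensors)
print(locate_by_energy_ratio(sensors, energies))  # a grid point close to (3, 7)
```

The unknown source energy cancels in every ratio, which is why the search needs no knowledge of how loud the source is. With real, noisy energies the minimum would be less sharp and the grid resolution would limit the accuracy.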

Time of Arrival (TOA)
The time instant at which the source signal is detected by the microphones is called the time of arrival (TOA). Time of flight (TOF) is a technique to determine the distance between the microphones and an object. It is calculated by finding the time taken by the signal to reach the microphone after being emitted by the source and reflected by an object [42]. Direct mapping from TOA to source-node distance is not possible because TOA and TOF may or may not be equivalent.
For TOA measurements, the source and sensor nodes cooperate so that the propagation time of the signal is easily detected by the nodes. Without this cooperation, the initial transmission time is unknown, and TOA alone cannot determine the propagation time of the signal [42][43][44]. TOA uses the method of trilateration: for each anchor, an equation is formed that represents a circle whose radius equals the distance from the source. The solution to these equations gives the intersection point, which is the location of the source [45][46][47].
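The trilateration step can be illustrated with a short sketch. It assumes a cooperative setup in which the emission time is known, a 2D geometry, and a speed of sound of 343 m/s. Subtracting the first circle equation from the others removes the quadratic terms and leaves a linear system, which is solved here in the least-squares sense. The function name and the anchor layout are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def trilaterate(anchors, toas, emit_time=0.0, c=SPEED_OF_SOUND):
    """Estimate a 2D source position from TOAs at three or more anchors.

    Each anchor i defines a circle (x - xi)^2 + (y - yi)^2 = di^2 with
    di = c * (toa_i - emit_time). Subtracting the circle of anchor 0 from
    the others gives linear equations in x and y.
    """
    dists = [c * (t - emit_time) for t in toas]
    (x0, y0), d0 = anchors[0], dists[0]
    rows, rhs = [], []
    for (xi, yi), di in zip(anchors[1:], dists[1:]):
        rows.append((2.0 * (xi - x0), 2.0 * (yi - y0)))
        rhs.append(d0 ** 2 - di ** 2 + xi ** 2 + yi ** 2 - x0 ** 2 - y0 ** 2)
    # Normal equations (A^T A) p = A^T b for the 2 unknowns.
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * v for r, v in zip(rows, rhs))
    b2 = sum(r[1] * v for r, v in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Hypothetical example: three anchors and a source at (1, 2).
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
toas = [math.dist((1.0, 2.0), a) / SPEED_OF_SOUND for a in anchors]
print(trilaterate(anchors, toas))
```

If the emission time were unknown, as in the non-cooperative case described above, the distances could not be formed this way and a time-difference formulation would be needed instead.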

Time Difference of Arrival (TDOA)
Time difference of arrival (TDOA) works with the time difference between the signals. This can be measured as the time difference between zero-level crossings or between the onset times of the two signals. TDOA can also be calculated by assuming that the sound source signal is narrowband. Another popular way of estimating TDOA is by calculating the cross-correlation vector between the signals, which can be sensitive to noise sources [12]. The algorithms using TDOA or TOA are designed such that they localize the sound source using nodes whose positions are known. To implement these methods, a reference node has to be chosen to nullify the noise factors and ease the synchronization needs. Hence, the accuracy of sound source localization greatly depends on the choice of this reference, and the performance of such systems often suffers when the reference is poorly chosen. To overcome this issue, Wang et al. proposed a setup where the nodes with known positions are synchronized while the clock of the sound source runs independently [48]. Through rigorous experimentation, the authors demonstrated that in such a configuration, there is no need to choose and rely on a reference node.
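The cross-correlation approach to TDOA estimation can be sketched as follows. This is a plain time-domain cross-correlation written for clarity rather than speed, assuming two equally sampled, noise-free signals. The function name, the test pulse and the lag range are hypothetical.

```python
def estimate_tdoa(sig_a, sig_b, fs, max_lag):
    """Return the delay of sig_b relative to sig_a in seconds.

    The result is positive when sig_b lags sig_a. The lag that maximizes
    sum_i sig_a[i] * sig_b[i + lag] is taken as the delay in samples.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                acc += a * sig_b[j]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag / fs


# Hypothetical example: a short pulse and a copy delayed by 7 samples.
fs = 16000
a = [0.0] * 64
a[20:25] = [1.0, 3.0, 5.0, 3.0, 1.0]
b = [0.0] * 7 + a[:-7]
print(estimate_tdoa(a, b, fs, max_lag=20) * fs)  # delay expressed in samples
```

In reverberant or noisy recordings the correlation peak becomes broader and spurious peaks can appear, which is the sensitivity mentioned in the text.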

Direction of Arrival (DOA)
Each node in this approach estimates the direction of arrival (DOA) of the sources and transmits these estimates to the center. As each node does the estimation individually, there is no need for synchronization. The approach works with unsynchronized inputs as long as the motion of the source is very slow. It uses triangulation to locate the source. This approach needs more computational power along with multiple microphones [9].
The basic principle of DOA estimation is that if the incoming wave meets the far-field narrowband conditions, then the angle between the array normal and the direction vector of the plane wave gives the angle of arrival. For far-field wideband signals, the wave-path difference between different array elements for the same signal provides the angle of arrival [13]. The entire structure for DOA estimation is composed of three stages. Figure 3 illustrates the spaces that make up the entire architecture for DOA estimation.
The first stage is the target space, which comprises the signal source and also includes the environment with its complexities. Unknown parameters of the signal are estimated at this stage.
The second stage is the observation space, which receives the information from the target space. The received information contains environmental characteristics such as noise and interference.
The third and final stage applies the estimation techniques, which may be array correction or filtering techniques. This stage essentially reconstructs the target space signal, with an accuracy that depends on many factors. The energy distribution of the signal over space forms the spatial spectrum, and estimating this spectrum is essentially DOA estimation [13].
A technique to find the DOA has been proposed in [49], where the phase differences among signals are calculated to determine the angle of arrival of the source signal. In this method, first a fast Fourier transform (FFT) is applied to the signal received by each microphone. After this step, the frequency and phase values of each signal at the peak points are measured [49]. For each node, the phase differences at the peak points are then calculated to determine the phase delays.
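The phase-difference approach to DOA estimation attributed to [49] can be illustrated with a simplified sketch. It is not the implementation of [49]. It assumes two microphones, a far-field source emitting a single tone, a microphone spacing below half a wavelength, and a speed of sound of 343 m/s. A naive discrete Fourier transform stands in for the FFT to keep the example dependency-free, and all names are hypothetical.

```python
import cmath
import math


def dft_bin(signal, k):
    """Single bin k of the discrete Fourier transform (naive O(N) sum)."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n)
               for i, x in enumerate(signal))


def doa_from_phase(sig_ref, sig_other, fs, spacing, c=343.0):
    """Angle of arrival in degrees, measured from the array normal.

    Positive angles mean the wavefront reaches the reference microphone
    first. Assumes one dominant tone and spacing < wavelength / 2.
    """
    n = len(sig_ref)
    # Step 1: spectrum of the reference channel (DC bin skipped).
    spectrum = [dft_bin(sig_ref, k) for k in range(1, n // 2)]
    # Step 2: frequency of the spectral peak.
    k_peak = 1 + max(range(len(spectrum)), key=lambda idx: abs(spectrum[idx]))
    freq = k_peak * fs / n
    # Step 3: phase difference between the channels at the peak.
    dphi = cmath.phase(dft_bin(sig_other, k_peak) * spectrum[k_peak - 1].conjugate())
    # A delay tau on the other channel appears as a phase of -2*pi*f*tau.
    tau = -dphi / (2.0 * math.pi * freq)
    sin_theta = max(-1.0, min(1.0, c * tau / spacing))
    return math.degrees(math.asin(sin_theta))


# Hypothetical example: 1 kHz tone arriving at 30 degrees, microphones 10 cm apart.
fs, n, f, d = 16000, 256, 1000.0, 0.1
tau = d * math.sin(math.radians(30.0)) / 343.0
ref = [math.cos(2 * math.pi * f * i / fs) for i in range(n)]
other = [math.cos(2 * math.pi * f * (i / fs - tau)) for i in range(n)]
print(doa_from_phase(ref, other, fs, d))  # close to 30 for this noise-free tone
```

The spacing condition matters because a phase difference is only known modulo 2π; with a larger spacing, several angles would produce the same measured phase.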
Various factors affect the DOA estimation results, which are briefly discussed here:

(1) Array elements
For the same array parameters, increasing the number of array elements increases the estimation performance. Here, parameters refer to the sensor properties, the sensors' physical positions and the errors in the calculated positions.

(2) Signal-to-noise ratio (SNR)
A low SNR degrades the performance of DOA estimation, as the incoming sound is contaminated by noise and interference to a larger extent.

(3) Coherence in source signal
Signals which have the same frequency and propagate with a constant phase offset are called coherent signals. Such signals are difficult to differentiate from one another, which in turn degrades the performance of the DOA estimates [50].

(4) Position of sensors
The location of the sensors is also very important. They should be within range of the sound-producing source so that they can easily detect the sound and then carry out the localization task. The sensors or microphones are placed in a geometrical shape. Previously, sensors were arranged in the form of an equilateral triangle [51]. By calculating the time delays with which the source signal reaches each microphone, the distance of each microphone from the source along with the angle of the source can be estimated.
Many different types of algorithms are used for DOA estimation. Among them, the multiple signal classification (MUSIC) and estimation of signal parameters via rotational invariance technique (ESPRIT) algorithms are subspace-based methods which work by eigenvalue decomposition of the correlation matrix of the signals coming from the microphones. The MUSIC algorithm works when the array geometry is completely known and calibrated, and it is computationally complex [14]. ESPRIT, on the other hand, is more robust and does not have to search over all possible DOAs, which reduces its computational complexity [15]. In addition to the subspace methods briefly described above, other methods for finding DOA estimates exist. Some of them are described in Table 3.
Table 3. Methods for DOA estimation.

Method | Advantages | Disadvantages
Conventional beamforming | Produces the maximum output power needed for estimation in a certain time [13]. | Limited by beam width and side-lobe height, giving low resolution [13].
ESPRIT algorithm | Computational complexity and storage requirements are lower than for MUSIC [16]. | Noise affects the precision of the arrival angle. Multipath fading is also seen [16]. Prone to errors.
MUSIC algorithm | Measures multiple signals simultaneously. High precision. Real-time processing is achieved by the use of high-speed processing technology. | A small difference in incident angle with low SNR while moving decreases the performance of the algorithm. Increasing the array element spacing gives false peaks in the spatial spectrum [13].
Non-linear least squares | Gives superior results in the presence of low SNR, coherent sources and short data samples compared with its counterpart methods [35]. | Good initial estimates are required. High time complexity [17].
Grid-based method | Does not require initial points. Low computational burden. | Accuracy is limited by the density of grid points [18].

Beamforming
Beamforming uses a microphone array in the far field, which is defined as being farther away from the source than the array diameter. Sound waves which hit the microphone array in the far field are planar waves, so it is easy to propagate the incoming sound directly back to the test object. The signals from the array microphones are summed after compensating for the propagation delay of each one [19].
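A delay-and-sum beamformer of the kind described above can be sketched in a few lines. The example assumes a uniform linear array, a far-field plane wave, and steering delays that are whole numbers of samples, so that no fractional-delay filtering is needed. These are simplifications chosen for illustration. The function name, sampling rate and spacing are hypothetical.

```python
import math


def delay_and_sum_scan(channels, fs, spacing, c=343.0):
    """Scan integer inter-element delays for a uniform linear array.

    channels[m] is the signal of microphone m. For each candidate delay
    (in samples per element) the channels are time-aligned, summed, and the
    output power is measured. Returns (angle_deg, power) of the strongest
    beam, with the angle measured from the array normal.
    """
    n = len(channels[0])
    # Largest physically possible delay between neighbouring microphones.
    max_shift = int(math.floor(spacing / c * fs + 1e-9))
    best = (0.0, -1.0)
    for shift in range(-max_shift, max_shift + 1):
        power = 0.0
        for t in range(n):
            total = 0.0
            for m, ch in enumerate(channels):
                idx = t + m * shift  # undo the assumed delay of channel m
                if 0 <= idx < n:
                    total += ch[idx]
            power += total * total
        sin_theta = max(-1.0, min(1.0, shift * c / (fs * spacing)))
        if power > best[1]:
            best = (math.degrees(math.asin(sin_theta)), power)
    return best


# Hypothetical example: 4 microphones, 10 cm apart, pulse arriving with a
# delay of 5 samples per element (30 degrees at fs = 34300 Hz).
fs, spacing, n = 34300, 0.1, 128
pulse = [1.0, 3.0, 5.0, 3.0, 1.0]
channels = []
for m in range(4):
    ch = [0.0] * n
    start = 40 + 5 * m
    ch[start:start + len(pulse)] = pulse
    channels.append(ch)
print(delay_and_sum_scan(channels, fs, spacing))
```

Only when the steering delay matches the true one do the four copies of the pulse add coherently, which is what makes the output power peak in the source direction.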

Inter-Microphone Intensity Difference (IID)
This method works for a two-microphone array and measures the difference in energy between the two signals at any instant. The obtained time-domain signal helps in determining whether the source is to the right, to the left or in front of the microphones. To increase the resolution, a greater number of microphones can be used. The time-domain signal can be converted to its frequency-domain version, the inter-microphone level difference (ILD), which uses the difference spectrum of the signals [12]. A set of logarithmically spaced filters in the frequency domain, called a filter bank, provides a feature similar to ILD but is more robust against noise.
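The coarse left, right or front decision from the intensity difference can be sketched as follows. The 1 dB decision threshold, the function names and the test signals are assumptions made for illustration and are not taken from [12].

```python
import math


def intensity_difference_db(left, right, eps=1e-12):
    """Energy of the left channel relative to the right channel, in dB."""
    e_left = sum(x * x for x in left) / len(left)
    e_right = sum(x * x for x in right) / len(right)
    # eps avoids a division by zero or log of zero for silent frames.
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))


def coarse_side(left, right, threshold_db=1.0):
    """Classify the source as 'left', 'right' or 'front' from the IID."""
    iid = intensity_difference_db(left, right)
    if iid > threshold_db:
        return "left"
    if iid < -threshold_db:
        return "right"
    return "front"


# Hypothetical frame: the left microphone receives twice the amplitude.
left = [1.0, -1.0] * 50
right = [0.5, -0.5] * 50
print(coarse_side(left, right))  # → left
```

Doubling the amplitude quadruples the energy, which corresponds to roughly 6 dB and is well above the assumed threshold.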

Steered Response Power (SRP)
SRP is a beamforming technique that computes the power of a filter-and-sum beamformer steered to a set of candidate source locations defined by a spatial grid [9]. Generalized cross-correlation (GCC) data from multiple microphone pairs are accumulated and used for the computation. The estimated source location is the grid point with the highest value in the SRP power map, which is the set of SRP values over the grid [52][53][54].
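The construction of an SRP power map can be illustrated with a small sketch. For simplicity it accumulates plain time-domain cross-correlations instead of a weighted GCC such as GCC-PHAT, and it assumes noise-free signals and a 2D grid. These are simplifications relative to the methods cited above. All names and the geometry are hypothetical.

```python
import math
from itertools import combinations


def cross_correlation(a, b, lag):
    """Value of sum_t a[t] * b[t + lag] over the overlapping samples."""
    start = max(0, -lag)
    stop = min(len(a), len(b) - lag)
    return sum(a[t] * b[t + lag] for t in range(start, stop))


def srp_map(mics, signals, grid, fs, c=343.0):
    """Steered response power for every candidate point in the grid.

    For each point, the expected delay between every microphone pair is
    computed, and the cross-correlation at that delay is accumulated.
    """
    power = {}
    for p in grid:
        total = 0.0
        for i, j in combinations(range(len(mics)), 2):
            # Extra travel time to microphone j relative to i, in samples.
            lag = round((math.dist(p, mics[j]) - math.dist(p, mics[i])) / c * fs)
            total += cross_correlation(signals[i], signals[j], lag)
        power[p] = total
    return power


# Hypothetical setup: four microphones on a 2 m square, source at (0.5, 1.5).
fs = 34300
mics = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
pulse = [1.0, 3.0, 5.0, 3.0, 1.0]
signals = []
for mic in mics:
    sig = [0.0] * 256
    delay = round(math.dist((0.5, 1.5), mic) / 343.0 * fs)
    sig[delay:delay + len(pulse)] = pulse
    signals.append(sig)
grid = [(x * 0.5, y * 0.5) for x in range(5) for y in range(5)]
power = srp_map(mics, signals, grid, fs)
print(max(power, key=power.get))  # grid point with the highest SRP value
```

The grid spacing sets the spatial resolution of the map, so a finer grid gives a more precise estimate at a proportionally higher computational cost.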
Many different methods are used for sound localization. A particular method is selected according to the user's application. Some of the methods were reviewed in detail and compared in order to build a clear basis for choosing the method best suited for our application. Table 4 shows a comparative analysis for a better understanding of the different methods.
Table 4. Comparative study of sound localization methods.

Method | Synchronization requirements | Advantages | Drawbacks
Energy-based methods | Direction of arrival, ad-hoc microphone arrays, wireless acoustic sensor network, audio signal classification, location estimation. | Simpler capturing and transmission devices. | Gain calibration is required at nodes for high energy ratio [10].
Beamforming | Source localization, direction of arrival, trilateration, time delay estimation, position calibration. Simultaneous measurement of data is a requirement. | Results have good spatial resolution [19]. Fast analysis speed. Robust in noisy environments [21]. | Graphics processing units (GPUs) are needed for implementation.
TOA | Precise synchronization. Precise timing hardware is a requirement. | With reasonable assumptions, higher accuracy along with reduced execution time can be achieved [22]. | Unknown internal delays that need to be dealt with by data fitting.
DOA | Works easily with unsynchronized inputs at slow rates of sources. | Low bandwidth usage. | Data association needs to be done for false alarms [9]. Complex computation.
Inter-microphone intensity difference | In the frequency domain, correlation exists. | Incorporation of learning-based mapping. | Works for two-microphone arrays.

Sound Source Localization in 3D Space
Locating the sound source in 3D space is referred to as 3D sound localization. The method involves analyzing both the horizontal and vertical angles of arrival of the sound waves along with the distance between the sound source and the microphones. This requires the microphones to be arranged in a particular structure with respect to the sound source. Usually the 3D coordinates are determined by applying signal processing techniques [55][56][57]. Many mammals, including human beings, make use of binaural hearing for sound source localization. In this process, the information received from each ear undergoes a comparative analysis which is part of a synthesis process. In the experimental setting, the binaural hearing functionality is achieved through the use of two microphones [33]. In this project, two microphone nodes, each containing an array of two microphones, are introduced [58-61].

Technologies
Sound source localization technology is mostly used in audio and acoustics applications such as direction navigation, speech enhancement, surveillance and hearing aids [34]. Current sound localization routines make use of the time difference of arrival of each sound signal. These systems mostly limit localization to two-dimensional space and are therefore not viable for solving practical sound localization problems [56][57][58].

Sound Localization Features
The sound source is identified through the use of certain features, or cues [23]. These cues can be binaural or monaural. Vertical sound localization can be done using monaural cues, which are obtained through spectral analysis. Binaural cues are used for horizontal sound source localization: the difference in hearing between the left and right ears is analyzed. Both the time difference between the arrival of the sound wave at the two ears and the difference in intensity are taken into account during the analysis [59][60][61].
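The two binaural cues can be illustrated with a short sketch that estimates the inter-channel time difference by cross-correlation and the intensity difference as an energy ratio in decibels. This is only an example under stated assumptions: the function name `interaural_cues`, the sign conventions and the synthetic test signal are hypothetical and do not come from the cited studies.

```python
import math

def interaural_cues(left, right, sample_rate_hz, max_lag):
    """Return (time_difference_s, level_difference_db) for two equal-length channels.

    Assumed sign conventions: a positive time difference means the right
    channel lags the left; a positive level difference means the left is louder.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples.
        score = sum(left[i] * right[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    energy_left = sum(v * v for v in left)
    energy_right = sum(v * v for v in right)
    level_db = 10.0 * math.log10(energy_left / energy_right)
    return best_lag / sample_rate_hz, level_db

# Synthetic check: the right channel is the left one, 4 samples late and at half amplitude.
left = [0.0] * 64
left[20:23] = [0.5, 1.0, 0.5]
right = [0.0] * 4 + [0.5 * v for v in left[:-4]]
itd, ild_db = interaural_cues(left, right, sample_rate_hz=16000, max_lag=8)
```

With real recordings the correlation peak is broadened by noise and reverberation, so the integer-lag search shown here would normally be refined or replaced by a more robust estimator.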

Methods
The most common methods used for 3D sound localization are listed below: (1) A structure comprising multiple sensors, such as microphones or hearing robots, can be used to mimic the sound localization technique used biologically by mammals [24]. (2) Acoustic vector sensor (AVS) arrays [25] can be used for real-time sound localization. (3) Offline methods. (4) Result optimization, in which classification techniques, neural networks and maximum likelihood methods are applied.
The following sections provide an overview of the different methods that are employed for localization of sound source in 3D space.
(1) Steered beam-former method
The steered beam-former method uses microphones whose signals are combined by a steered beam-former. The DoA is detected by a robotic sensor network. The incoming signals are then filtered to reduce the noise. This method is considered useful in speech recognition applications in complex environments, where sound entropy has to be reduced for successful localization [62].
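The idea of steering a beam-former over candidate directions can be sketched for the simplest case: a narrowband plane wave arriving at a uniform linear array. This is an illustrative delay-and-sum (phase-shift) example, not the method of [62]; the array geometry, the frequency, the value of `SPEED_OF_SOUND` and all function names are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def steering_phase(mic_x, freq_hz, angle_deg):
    """Phase lead (radians) at position mic_x for a plane wave from angle_deg.

    Assumed convention: the angle is measured from broadside towards +x.
    """
    return (2.0 * math.pi * freq_hz * mic_x
            * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND)

def steered_power(snapshot, mic_positions, freq_hz, angle_deg):
    """Output power after phase-aligning the channels for one look direction."""
    total = 0j
    for x_m, value in zip(mic_positions, snapshot):
        total += value * cmath.exp(-1j * steering_phase(x_m, freq_hz, angle_deg))
    return abs(total) ** 2

def estimate_doa(snapshot, mic_positions, freq_hz):
    """Scan -90..90 degrees in 1-degree steps and return the strongest direction."""
    return max(range(-90, 91),
               key=lambda a: steered_power(snapshot, mic_positions, freq_hz, a))

# Synthetic noise-free snapshot: four microphones 10 cm apart, 1 kHz source at 25 degrees.
mic_positions = [0.0, 0.1, 0.2, 0.3]
freq_hz = 1000.0
snapshot = [cmath.exp(1j * steering_phase(x, freq_hz, 25)) for x in mic_positions]
estimated_angle = estimate_doa(snapshot, mic_positions, freq_hz)
```

The microphone spacing is kept below half a wavelength so that the scan has a single main peak; a wider spacing would introduce ambiguous directions.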
(2) Beam-former method
The beam-former method relies on generating pulses towards a projector at multiple time points, such that all the pulses hit the projector at the same time, creating a large sound impact. This method is used as a basis for a multiple input multiple output (MIMO) model for improving the performance of sound localization systems, such as cellular technologies. Such a system is suggested to reduce the bit-error rate in sound transmission. The presence of multiple transmission (Tx) and receiving (Rx) channels adds sub-channels that increase channel capacity without increasing the overall bandwidth of the system. The use of multiple channels has proven to be efficient at providing a focused sound beam without the need to increase design complexity [63].

(3) Acoustic vector sensor (AVS) array
The acoustic vector sensor (AVS) array is used to measure the acoustic pressure. An AVS contains three velocity sensors along with a pressure sensor. These sensors detect signals in the form of an XYZO array. The DoA of the sound can then be estimated using these arrays. The DoA performance of AVS has been deemed to be better than that of other methods reported in the literature. The key feature of AVS is that it utilizes all the information available about acoustics in a defined space. This feature makes AVS a desirable method on platforms where space is limited. Figure 4 shows an AVS array configuration [64].
The underlying principle of the multiple microphone array method is to record the time difference between the arrivals of the sound in order to determine its direction. To accurately determine the spatial distribution of the sound beams, triangulation is applied using the distances between the microphone placements and the ratio of the distances between the microphones.
Mobility in the multiple microphone array is an advantage as it helps in identifying the source of sound by determining the distance between different microphones. However, sound coming from multiple sources can cause difficulty in determining the source of the signal. Moreover, the identification of DoA becomes more complex in the case of moving objects [65].
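The triangulation step mentioned for the multiple microphone array can be sketched in the plane as the intersection of two bearing lines, one from each node. This is a simplified illustration and not the procedure of [65]; the node positions, the bearing convention (degrees counter-clockwise from the +x axis) and the function name `triangulate` are assumptions.

```python
import math

def triangulate(p1, bearing1_deg, p2, bearing2_deg):
    """Intersect two bearing lines in the plane.

    p1 and p2 are (x, y) node positions. Returns the (x, y) intersection,
    or None when the two bearings are parallel and no unique point exists.
    """
    d1 = (math.cos(math.radians(bearing1_deg)), math.sin(math.radians(bearing1_deg)))
    d2 = (math.cos(math.radians(bearing2_deg)), math.sin(math.radians(bearing2_deg)))
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 using the 2D cross product.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Hypothetical nodes 2 m apart, each reporting a bearing towards the same source.
source = triangulate((0.0, 0.0), 45.0, (2.0, 0.0), 135.0)
parallel = triangulate((0.0, 0.0), 90.0, (2.0, 0.0), 90.0)
```

When the bearings are nearly parallel the intersection becomes very sensitive to small angle errors, which is one reason the distance between the nodes matters.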

RQ-2. Which factors affect the choice of sound localization methods?

Factors Affecting the Choice
Choosing one method among many comes with challenges. Some methods are cost efficient but lack accuracy, while others provide good results but use bandwidth inefficiently, so the tradeoff best suited to the application has to be found. Some of the factors that affect the choice of a method are as follows.

Cost Efficiency
In order to achieve high accuracy in sound localization, many methods use advanced techniques that provide high processing speeds and ease the computational complexity. This can be ensured by using high-end hardware systems, which are quite expensive. With this in mind, many single-board computing devices have been developed that reduce this problem, but managing the cost of the whole system and keeping it within budget remains very important.

Measurement Errors
All sound localization methods are subject to errors caused by surrounding noise and interference. Sound waves are subject to diffraction, echoes, reflection and deflection, which produce measurement errors and incorrect localization. Many errors also arise from a lack of synchronization between the nodes, which is crucial in methods such as TOA and TDOA [9]. It is therefore necessary to choose a method that minimizes this problem, is less susceptible to noise and provides better results.
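To give a sense of scale for the synchronization errors mentioned here, the following sketch converts a clock offset into a TOA range error and shows how a small timing skew shifts a TDOA-based bearing for a two-microphone pair. The numbers are purely illustrative, and `SPEED_OF_SOUND`, the microphone spacing and the function names are assumptions rather than values from [9].

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def toa_range_error(clock_offset_s):
    """Range error (metres) caused by an unsynchronized clock in a TOA measurement."""
    return SPEED_OF_SOUND * clock_offset_s

def tdoa_bearing(tdoa_s, spacing_m):
    """Far-field bearing (degrees from broadside) of a two-microphone pair."""
    ratio = SPEED_OF_SOUND * tdoa_s / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))

# A 1 ms clock offset between nodes.
range_error = toa_range_error(1e-3)

# A true delay of 100 microseconds across a 20 cm pair, then the same delay
# measured with an additional 50 microsecond synchronization skew.
true_bearing = tdoa_bearing(100e-6, 0.2)
skewed_bearing = tdoa_bearing(150e-6, 0.2)
```

Under these assumed values a millisecond of clock offset corresponds to roughly a third of a metre of range error, and a skew of a few tens of microseconds moves the bearing by several degrees.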

Power Dissipation
To achieve better system performance, many battery-powered nodes are used in the hardware, which leads to energy dissipation. This loss needs to be monitored and minimized as far as possible.

Deployment Issues
Every method has particular hardware requirements that must be met in order to use it in the best possible way. These requirements differ between methods: in TOA, synchronization of the nodes is of primary importance, while energy-based methods need calibrated gains. Some methods also require physical administration and must account for seasonal variation. Calibrating for these issues can be time consuming [9].

System Flexibility
System flexibility is measured by how easily the system copes when an issue arises within the network. As sound localization methods are applied in open environments, they face many physical challenges. A part or a node (microphone) may sometimes fail, so a backup is needed so that the localization estimates are still measured properly even if a part fails [9].

Scalability
With the wide use of sound localization in many different applications, systems may need to be deployed in very small spaces as well as larger ones. Changing the hardware accordingly and scaling it to the requirement is very important. With correct knowledge of the user's application, different sound localization techniques must be scaled up or down.

Discussion
RQ-3. How can the limitations in the existing sound localization methods be overcome using the current technologies?
After a detailed study of the different localization methods, along with the factors that should be kept in mind when choosing among them, it is clear that choosing one method is a crucial task and that there will always be a tradeoff between the attributes. For the application of designing a stable system using only two microphones, DOA has been chosen because it does not require the inputs to be synchronized; such synchronization is essential in many other methods and is very difficult to achieve precisely [66][67][68]. The absence of a synchronization requirement gives the user the freedom to work with various inputs. DOA also consumes less bandwidth than other methods, which is highly beneficial as bandwidth is a costly resource. Bandwidth needs to be conserved in order to have a budget-friendly product that can readily be used by people [9]. DOA estimation is a broad field of signal processing that is divided into two categories: self-adaptive array signal processing and spatial spectrum estimation. The spatial spectrum gives the distribution of the signal arriving at the receiver from all directions, so obtaining the signal's spatial spectrum yields the DOA [13]. Along with these advantages there is also a drawback, namely complex computation. This can be overcome by using powerful single-board processors. Many single-board devices for in-node signal processing are being made to ease the computation, and they are readily available on the market [9].
Given the advancements in and importance of sound localization, many different methods have been proposed to determine the source location correctly. For this purpose, many different arrangements of microphones have been tested by changing the distances between the microphones, in order to understand how changing one parameter changes the performance of the system and how it can be further improved. To date, many different types of microphone arrays have been used, including circular arrays [39], hexagonal arrays [48], 2D arrays [49], linear arrays [50] and ad-hoc arrays [51]. Each of these arrangements has advantages over the others and is chosen depending on the user's application. Single microphone arrays have certain limitations of physical size and processing power, which are overcome by the use of ad-hoc microphone arrays: because their nodes are smaller, they can cover a larger area and thus increase the spatial information [30].
The concept of ad-hoc microphone arrays will be very helpful in conferences because different devices can easily be connected to form an instant network. This will improve the experience of people attending conferences and provide them with great convenience. They need not worry about any special arrangement or devices, as readily available everyday devices such as tablets and mobile phones can also be connected to form an ad-hoc array.
However, the use of ad-hoc microphones comes with certain challenges. Each microphone in the array has its own clock, which gives rise to a sampling rate mismatch that affects the performance of traditional multichannel sound enhancement algorithms. A mismatch also exists between the test data and the anechoic training data, which can be reduced by exploiting the spatial distribution of the nodes in the ad-hoc array [31]. In ad-hoc arrays the nodes are uncalibrated and their relative positions are unknown, so no information is available about the geometry of the array and only partial information about the distances between microphones. This problem can be solved by the use of a Euclidean distance matrix completion algorithm [32].
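To illustrate why inter-microphone distances are enough to recover the array geometry, the sketch below rebuilds 2D coordinates from a distance matrix that is already complete. It is not the completion algorithm of [32], which addresses the harder case of missing entries; it shows only the final embedding step, using a simple anchor-based construction. The function name, the test layout and the requirement that the first three microphones are not collinear are assumptions.

```python
import math

def positions_from_distances(dist):
    """Recover 2D coordinates from a complete, noise-free distance matrix.

    The result is defined only up to translation, rotation and reflection:
    microphone 0 is placed at the origin, microphone 1 on the +x axis and
    microphone 2 in the upper half-plane. Microphones 0, 1 and 2 must not
    be collinear.
    """
    d01 = dist[0][1]
    coords = [(0.0, 0.0), (d01, 0.0)]
    for k in range(2, len(dist)):
        # Law of cosines gives x; Pythagoras gives |y|.
        x = (dist[0][k] ** 2 + d01 ** 2 - dist[1][k] ** 2) / (2.0 * d01)
        y = math.sqrt(max(0.0, dist[0][k] ** 2 - x ** 2))
        if k > 2:
            # Pick the sign of y that agrees with the distance to microphone 2.
            x2, y2 = coords[2]
            err_pos = abs(math.hypot(x - x2, y - y2) - dist[2][k])
            err_neg = abs(math.hypot(x - x2, -y - y2) - dist[2][k])
            if err_neg < err_pos:
                y = -y
        coords.append((x, y))
    return coords

# Hypothetical layout (metres), used only to build a test distance matrix.
layout = [(0.0, 0.0), (3.0, 0.0), (1.0, 2.0), (2.0, -1.0)]
dist = [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in layout] for a in layout]
recovered = positions_from_distances(dist)
```

With noisy or incomplete distances this direct construction is fragile, which is why completion and least-squares embedding methods are used in practice.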
In order to bring novelty to the field of sound source localization and enhance precision, a new geometrical arrangement of microphones can be proposed to locate the sound source in 3D space with maximum accuracy. This configuration should require a minimum number of microphones to avoid complexity and excessive use of resources. One possible way is to estimate the DOA using the phase difference between the sound waves received by different microphones [52,53]. The use of a small number of microphones will make the system a good choice for embedded systems and wearable devices such as hearing aids and assistive devices for the blind. The sound source localization problem also needs to be tackled in real environments with less than ideal conditions due to the presence of noise, echo and other reverberations. Figure 5 shows how various signal processing techniques have been combined and applied to measure both the DOA and the exact 3D location of the sound source [68][69][70].
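The phase-difference approach to DOA estimation can be sketched for a single pair of microphones and a narrowband far-field source. This is a generic textbook relation and not necessarily the formulation of [52,53]; the function name `doa_from_phase`, the value of `SPEED_OF_SOUND`, the spacing and the sign convention are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def doa_from_phase(x_ref, x_other, freq_hz, spacing_m):
    """Estimate the arrival angle (degrees from broadside) from two complex samples.

    x_ref and x_other are narrowband (single-frequency) complex amplitudes at
    the two microphones. Assumed convention: a positive angle means the wave
    reaches x_other first. The spacing must not exceed half a wavelength,
    otherwise the wrapped phase becomes ambiguous.
    """
    phase_diff = cmath.phase(x_other * x_ref.conjugate())  # wrapped to (-pi, pi]
    ratio = phase_diff * SPEED_OF_SOUND / (2.0 * math.pi * freq_hz * spacing_m)
    ratio = max(-1.0, min(1.0, ratio))  # clamp against noise
    return math.degrees(math.asin(ratio))

# Synthetic noise-free example: 1 kHz tone, 15 cm spacing, source at 40 degrees.
freq_hz, spacing_m, true_angle = 1000.0, 0.15, 40.0
lead = 2.0 * math.pi * freq_hz * spacing_m * math.sin(math.radians(true_angle)) / SPEED_OF_SOUND
estimated_angle = doa_from_phase(1 + 0j, cmath.exp(1j * lead), freq_hz, spacing_m)
```

A single pair resolves only one angle and cannot distinguish front from back, so a 3D estimate would need additional pairs or nodes.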

Conclusions
This paper has discussed the most recent sound source localization techniques presented in the literature. Three major research questions have been proposed and elaborated extensively. These are: RQ-1. Which sound localization methods have been recently presented in literature? RQ-2. Which factors affect the choice of sound localization methods? RQ-3. How can the limitations in the existing sound localization methods be overcome using the current technologies? In the light of the results presented in the paper, it can be concluded that the most common and effective methods for sound localization include energy-based localization, TOA, TDOA, DOA, beamforming, IID and SRP. For sound localization specifically in 3D space, the most common technologies are the steered beam-former method, the beam-former method, the AVS array, the advanced microphone array and the multiple microphone array. The most common factors affecting the choice of sound source localization method include cost efficiency, measurement errors, power dissipation, deployment issues, system flexibility and scalability. Minimizing the number of microphones in the configuration can reduce resource consumption while presenting a more cost-effective solution. This can be achieved by using only two microphones, mimicking the human auditory system with its two ears. For such a system, DOA is identified as an effective approach as it does not require the inputs to be synchronized, which can be difficult to achieve precisely. Moreover, the benefits of using ad-hoc microphones have been highlighted, as such an arrangement does not require special geometry and an instant network can be formed for application of the sound localization system in conferences. Finally, a system has been proposed which targets the issues faced by current sound source localization technologies. This system is composed of a minimum number of microphones. A DOA method is proposed which calculates the phase difference between the sound waves received by the microphones. Such a system will be compact in size and can be used efficiently in embedded systems and hearing aids.