A Feed-Forward Neural Network Approach for Energy-Based Acoustic Source Localization

Abstract: The localization of an acoustic source has attracted much attention in the scientific community and has been applied in several real-life applications. At the same time, the use of neural networks in the acoustic source localization problem is not common; hence, this work aims to show their potential in this field of application. As such, the present work proposes a deep feed-forward neural network for solving the acoustic source localization problem based on energy measurements. Several network topologies are trained under ideal noise-free conditions, which simplifies the usually heavy training process, and a low mean squared error is obtained. The networks are implemented, simulated, and compared with conventional algorithms, namely, deterministic and metaheuristic methods, and our results indicate improved performance when noise is added to the measurements. Therefore, the developed scheme opens up a new horizon for energy-based acoustic localization, a field where machine learning algorithms have not been applied in the past.


Introduction
The localization of an acoustic source in Wireless Sensor Networks has been commonly employed in several real-life problems. Examples of its application can be found in energy control of buildings [1], ambient assisted living [2], underwater acoustic networks [3], wildlife monitoring [4], smart surveillance [5], shooter detection [6], or as a complementary source of information to other locating platforms [7,8]. The solution to the problem consists of obtaining measurements that represent the distances from an acoustic source to the sensors that acquire them. Contrary to range-free methods, physical measurements such as time-of-arrival [9], time-difference-of-arrival [10], or direction-of-arrival [11] have shown promising results for acquiring distance measures; however, they rely either on high-precision hardware for timing purposes or on microphone sensor arrays for angle perception. Range-free methods, in turn, rely on information about connectivity and propagation patterns and are, thus, highly dependent on environmental conditions [12,13]. The acoustic energy decay model as an indirect measure of distance was initially proposed by empirically analyzing the sound emitted from an engine [14]. The localization approach averages the energy of the received acoustic signal data samples and stands out for its lower bandwidth, since the energy is sampled at a much lower rate [15]. Additionally, the required hardware becomes very simple, having as its main part a single microphone that converts acoustic pressure into an electrical signal. This model is considered here, that is, the energy measured at each sensor is proportional to the transmitted power and inversely proportional to the squared distance between the sensor and the acoustic source.
With regard to artificial intelligence, Artificial Neural Networks (ANNs) have reached a point of maturity that allows for wider use. Regarding their employment in location and positioning, the use of a Multilayer Perceptron (MLP) was first evaluated in terms of accuracy, memory, and computational requirements, showing promising results [16]. It was shown that the MLP could potentially achieve higher localization accuracy, requiring less computational effort and lower memory resources when compared with an Extended Kalman filter, for example. More recently, ANNs have reported promising results in applications such as simultaneous localization and mapping [17] or Wi-Fi fingerprinting [18]. However, their major drawback lies in the training stage, where the physical presence of training devices is necessary and tedious. Further, Convolutional Neural Network designs were studied with application to localization, by simulating the hydrodynamic flow caused by the target objects to be located, with meritorious accuracy results [19]. Hence, although the application of neural networks can be seen as an option for localization problems, challenges remain to be overcome with regard to their training and structure.
It is worth mentioning that the learning procedure of a neural network can be either supervised or unsupervised. The related works presented above consider that the desired output is already known. As such, during the learning process, the units (or weight values) of such a network are determined given pairs of input/output values. Depending on the difference between the current iterative output and its target (known desired output), an error value is computed with the goal of being minimized. This procedure is called supervised learning, as the current output is being supervised, or monitored, to match the desired one [20]. On the contrary, unsupervised learning does not consider the desired output to be known and has the goal of arranging groups of similar inputs close together. This effect can be used efficiently for pattern classification purposes. Examples of sound classification events can be found in [21], where different sources are separated and classified. Considering the present problem of acoustic localization, a supervised approach is considered, where the network will be trained taking into account observations of an acoustic model, and the corresponding coordinates in a predefined search-space.
Based on the previous discussion, the present work proposes a Deep Feed-Forward Neural Network (DFNN) for solving the energy-based acoustic source localization problem. Several hyperparameters, such as the number of hidden layers, perceptrons, and epochs, are surveyed to build an effective training model. Moreover, the proposed DFNN is trained with noise-free input data and validated against different measurement noise levels. This corresponds to a training stage independent of physical devices, requiring only partial knowledge of the localization layout (dimension of the search space, number of sensors, and an environmental decay factor). The dataset is applied to the new DFNN, where the root-mean-squared error (RMSE) is calculated and compared with several state-of-the-art algorithms. The obtained results indicate that the proposed DFNN is a promising method for determining the location of an acoustic source through energy-based measurements, attaining a lower RMSE for a wide range of measurement noise levels while keeping a low implementation complexity. Besides the simplified methodology for network training, and to the best of the authors' knowledge, artificial neural networks have not been applied to the energy-based acoustic source localization problem in the scientific literature. While the existing works on different localization schemes consider hardware-based methodologies for network training, the current work represents the first complete study of the energy-based localization problem via neural networks.
The main insights and contributions of the present work are summarized as follows: (1) a DFNN architecture for solving the energy-based acoustic source localization problem is proposed, where several hyperparameters are tuned; (2) the proposed DFNN is trained with noise-free input data and, later on, validated against different measurement noise levels; (3) the proposed training stage allows for offline network training, without any complementary hardware or previous data acquisition; (4) the proposed architecture outperforms traditional methods for a wide range of noise levels; (5) the present work marks the application of artificial neural networks to the energy-based acoustic source localization problem.
The remaining paper is organized as follows. Section 2 summarizes previous related work and Section 3 provides the theoretical background on energy-based acoustic localization and DFNNs. Section 4 describes the proposed methodology, while Section 5 assesses its performance. Finally, Section 6 concludes the work.

Related Work
Traditionally, the energy-based acoustic source localization problem has been addressed by the use of deterministic methods [22]. For this purpose, a weighted least-squares method was applied in [23], which was enhanced with a correction technique and presented good results for low values of noise, but its performance is degraded in noisy environments. A closed-form solution was proposed in [24], which exhibits good performance for low noise power, but also suffers considerable degradation for higher levels of noise power. The method proposed in [25] stands out for its simplicity, consisting of a bisection approach, where good performance is obtained for low values of the measurement noise. Considering the fact that the problem is highly nonconvex, the use of convex optimization methods was proposed in [26,27] through semidefinite programming relaxations. Considering that the problem is not approached directly, but rather through approximations, its enhancement was proposed in [28–30] by applying second-order cone programming. The methods based on convex optimization performed well, even in noisy environments, with their only drawback being their computational complexity, which increases geometrically with the size of the network. Besides the deterministic methods presented, the use of swarm-based optimization, namely, Elephant Herding Optimization (EHO) [31,32], was initially proposed in [33,34]. The methodology was enhanced in [35,36] by new population initialization strategies, combining computational simplicity with low positioning error. The improved EHO [35,36] demonstrated high suitability for embedded implementations in real-time applications, mainly due to its low latency, although knowledge of the noise statistics is assumed by the estimator.
The current state of the art in acoustic source localization is mostly based on triangulation methods that rely on Euclidean geometry. To this end, some measure of distance, namely, energy, angle, or time, is acquired by a sensor network with the purpose of inferring geometric properties of the source location [37]. The correlation between the measured quantity and the coordinates of the physical location in space is generally non-linear, non-convex, and subject to different sources of noise. Under near-ideal conditions, these models can estimate the physical location of the acoustic source very accurately, without error bounds and across different environments. Nevertheless, sources of measurement noise, simplifications in the ranging models, or complex environmental conditions have a great impact on the accuracy and reliability of the system performance. Basically, model-based methods become less reliable when more effects are present that were not considered in their physical model foundations, i.e., when one has less confidence in the model itself. Theoretically, the Cramer-Rao lower bound is usually applied to demonstrate and state the estimation bounds; their dependencies on the physical model; and, of major importance, the geometry of the problem [15,24,38–41]. Unlike model-based methodologies, data-driven (or learning-based) methods are an alternative for overcoming these constraints, constructing a mapping function from acquired knowledge. More specifically, ANNs can behave as universal approximators and almost automatically discover features relevant to the localization problem by exploiting the increased amount of sensor data and computational power in an offline stage. The localization problem is treated as a regression one, where distances are learned and mapped to coordinates in a search space, with modest or nonexistent physical knowledge intrinsic to the problem under scrutiny [42–45].

Theoretical Background
The current section aims to provide the theoretical foundations of both the energy-based acoustic localization problem and the deployment of deep feed-forward networks.

Energy-Based Acoustic Localization
Location of an acoustic source by exploiting energy measurements acquired by sensors was first addressed in [14]. The proposed model considers M noisy measurements within a time window $T = M/f_s$, where $f_s$ is the acoustic sampling frequency, and averages energy signatures over the time window $[t - T/2, t + T/2]$. The obtained measure at the ith sensor is then modeled as follows [14,15]:

$$y_i = \frac{g_i P}{\|\mathbf{x} - \mathbf{s}_i\|^{\beta}} + \nu_i, \quad i = 1, \ldots, N, \qquad (1)$$

where $y_i$ is the observation at sensor i, $g_i$ is the gain of sensor i, P is the transmitted power, $\mathbf{x} = (x_x, x_y)^T$ is the unknown location, $\mathbf{s}_i = (S_{ix}, S_{iy})$ denotes the known location of sensor i, $\nu_i$ represents the measurement noise modeled as Gaussian, N is the total number of sensors, and β is a decay factor dependent on environmental conditions. For the sake of simplicity, the decay factor is considered β = 2 [14,15], which corresponds to an outdoor setting. In Expression (1), the observed measurements represent the power received per unit surface area where the emitted energy falls, and thus, their unit is W/m². With the purpose of generalizing the physical conditions of the problem, the observations will be numerically treated as dimensionless in the present work. When performing several measurements (a minimum of 3 for a bidimensional space), the unknown position would be determined as the intersection point of circumferences, centered on the sensor coordinates, with radii obtained from Expression (1). Due to noise corruption of the measurements, the intersection of the circumferences will likely form an area, rather than a single point (Figure 1). Hence, this work will train the proposed DFNN with noise-free data only, avoiding in this way the need for characterizing the statistical behavior of $\nu_i$. This implies a simple procedure, independent of real terrain acquisition.
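The energy decay model of Expression (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `sensors_on_circle` and `energy_observations`, the origin-centered layout, and the default values (P = 5, unit gains, β = 2, sensors on a 50 m circle, taken from the simulation setup described later in the text) are assumptions made for the example.

```python
import math
import random

def sensors_on_circle(n, center=(0.0, 0.0), radius=50.0):
    """Place n sensors evenly on a circle (hypothetical layout helper)."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def energy_observations(source, sensors, power=5.0, gains=None, beta=2.0,
                        noise_std=0.0, rng=None):
    """Return g_i * P / ||x - s_i||^beta + nu_i for every sensor."""
    rng = rng or random.Random(0)
    gains = gains or [1.0] * len(sensors)
    observations = []
    for gain, (sx, sy) in zip(gains, sensors):
        dist = math.hypot(source[0] - sx, source[1] - sy)
        # Gaussian measurement noise; skipped for the noise-free case
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        observations.append(gain * power / dist ** beta + noise)
    return observations

sensors = sensors_on_circle(3)
y = energy_observations((0.0, 0.0), sensors)  # source at the circle centre
```

With the source at the centre of the circle, every sensor is 50 m away, so each noise-free observation equals 5/50² = 0.002.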
Regarding both deterministic and metaheuristic algorithms, all observations from the multiple sensors are aggregated as an estimator of $\mathbf{x}$, where the solution of the localization problem is the argument (pair of coordinates) that minimizes the expression [14,15]:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \sum_{i=1}^{N} \left( y_i - \frac{g_i P}{\|\mathbf{x} - \mathbf{s}_i\|^{\beta}} \right)^2, \qquad (2)$$

where $y_i$ denotes the observation acquired by sensor i in Expression (1). The estimator in Expression (2) is highly nonconvex, with singularities at each sensor's coordinates, several suboptimal solutions, and saddle regions. All the enumerated features make the problem very challenging in the field of numerical optimization, making it a good candidate in the context of regression and ANNs. The considered energy model (Expression (1)) relies on the fact that the acoustic source is stationary. Targeting moving sound sources is mostly considered with direction or angle measurements [46,47], combined with microphone arrays [48], or mostly relying on the measurement of propagation time [49], given the achievement of very satisfactory accuracy, despite the complexity of the physical devices employed. While in the state-space domain Kalman filters and maximum a posteriori estimation are commonly used [50], ANNs have also been considered [51,52]. In this case, time series prediction is performed by means of recurrent neural networks composed of long short-term memory cells [53]. Although network structures are currently well defined in the literature, finding the appropriate sample rate, identifying an appropriately sized input window, and achieving stability remain challenging.
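To make the estimator more concrete, the sketch below evaluates a least-squares cost over a coarse grid. It assumes that the cost is the sum of squared differences between the observations and the noise-free model of Expression (1), with unit gains, P = 5, and β = 2; the function names, the sensor coordinates, and the brute-force grid search are illustrative choices and are not any of the algorithms compared in this work.

```python
import math

def ls_cost(x, sensors, observations, power=5.0, gain=1.0, beta=2.0):
    """Sum of squared residuals between observed and modelled energies."""
    total = 0.0
    for (sx, sy), y in zip(sensors, observations):
        dist = math.hypot(x[0] - sx, x[1] - sy)
        if dist == 0.0:  # singularity: candidate sits exactly on a sensor
            return math.inf
        total += (y - gain * power / dist ** beta) ** 2
    return total

def grid_search(sensors, observations, lower, upper, step=1.0):
    """Exhaustively evaluate the cost on a square grid and keep the minimum."""
    best, best_cost = None, math.inf
    steps = int(round((upper - lower) / step))
    for i in range(steps + 1):
        for j in range(steps + 1):
            candidate = (lower + i * step, lower + j * step)
            cost = ls_cost(candidate, sensors, observations)
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best, best_cost

# Hypothetical three-sensor layout and noise-free observations
sensors = [(50.0, 0.0), (-25.0, 43.3), (-25.0, -43.3)]
true_source = (10.0, -20.0)
observations = [5.0 / math.hypot(true_source[0] - sx, true_source[1] - sy) ** 2
                for sx, sy in sensors]
estimate, cost = grid_search(sensors, observations, -50.0, 50.0)
```

In the noise-free case the grid point that coincides with the true source has zero cost. With noisy observations the cost surface keeps its singularities at the sensor positions, which is part of what makes the problem hard for numerical optimizers.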

Deep Feed-Forward Neural Network
Theoretical results on ANNs, known as the Universal Approximation Theorem, assert that a single hidden layer in a sigmoidal feed-forward ANN with a sufficient number of nodes is capable of approximating any continuous function with admissible accuracy [54]. The theorem was generalized to feed-forward multilayer architectures in 1991 by Kurt Hornik [55] and, more recently, it was shown that universal approximation also holds for unbounded activation functions such as the rectified linear unit (ReLU) [56]. The mentioned theorem allows the following hypothesis: if energy values generated by a source at a given point in 2-dimensional space are measured at three or more sensor positions, then a sigmoidal feed-forward artificial neural network may be established, which takes as input the energy values measured at the sensors and predicts the coordinates of the source propagating the acoustic signal. According to this hypothesis, the key task of this work is to correctly set the network topology and its training. The fundamentals of ANNs rely on modeling the biological neuron and its intercommunication cells called synapses, from which an artificial neuron (or perceptron) is obtained. A deep neural network will have an input layer, two or more hidden layers, and one output layer. Each layer is connected to the next one through synaptic weights, forming a flow of information in one direction, from the inputs to the outputs.
The Sigmoid (or Fermi) activation function was one of the first to be applied in ANNs. The function maps the input to a value between 0 and 1 and has a simple derivative, which allows efficient network training through gradient-based algorithms [54]. Since a continuous signal is expected as output (coordinate values with regard to the search space), the hyperbolic tangent function is considered. The function is similar to the sigmoid and shares many of its properties. However, it maps the input to a value between −1 and 1 [57]. Putting a network to work consists of determining the weights that best fit the training data, that is, the set of pairs of inputs and outputs. Essentially, it consists of a regression problem that can reach a large number of dimensions, a setting in which ANNs are expected to perform well [58]. The learning problem is formulated in terms of the minimization of an objective function, which measures the performance of a neural network on a predefined data set. Concerning the network training, the backpropagation algorithm is one of the most popular methods to train neural networks [59]. Nonetheless, the original backpropagation suffers from slow convergence, and thus, several variants have been proposed in the literature. The Levenberg-Marquardt (LM) algorithm is an alternative for training ANNs based on nonlinear optimization [60]. The method employs an approximation of the second-order derivatives of the objective function so that better convergence behavior can be obtained [61]. Based on the above discussion, our artificial neuron model will rely on the hyperbolic tangent activation function and the network will be trained by the LM algorithm.

Proposed Method
The proposed approach for solving the energy-based acoustic source localization problem relies on a DFNN (Figure 2), where the inputs consist of the measures taken from the microphones in the sensor network. The structure of the network is therefore dependent on the problem layout. For a layout composed of N acoustic sensors, N inputs will be present in the DFNN. Regarding the outputs, independent of the number of network inputs, these will always be two, corresponding to the estimates of the source coordinates $(x_x, x_y)$. Identifying the network topology, namely, the number of hidden layers and the number of perceptrons in each layer, has been traditionally based on trial and error [62]. Although it was proven that one single hidden layer can approximate any continuous function [54], the number of required perceptrons in each layer can be as high as the number of training samples [63]. In fact, when considering m output neurons, the number of perceptrons needed to train N samples with some reduced error is given by $2\sqrt{(m + 2)N}$ [63]. In addition, it can be seen that a three-layer feed-forward neural network with k hidden perceptrons can assign arbitrary analogue values to j arbitrary inputs [64]. To cover the physical dimension of the search space and the properties of the model (Expression (1)), a high number of samples must be used to train the proposed network. This situation would lead to a high complexity of the network structure, according to [63]. As such, a two-phase method is considered to determine the optimal number of perceptrons in the hidden layers of a 5-layer network [65].
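As an illustrative calculation (not taken from [63]), the bound on the number of perceptrons, which reads $2\sqrt{(m+2)N}$, can be evaluated for m = 2 outputs and, as an assumed example, N = 7000 training samples, i.e., 70% of the 10,000 coordinates generated in this work:

$$2\sqrt{(m + 2)N} = 2\sqrt{(2 + 2) \cdot 7000} = 2\sqrt{28000} \approx 335.$$

Under this assumption, the bound would call for several hundred hidden perceptrons, which illustrates why a large training set leads to a complex network structure.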
The overall proposed network consists of one input layer, one output layer, and three hidden layers. The number of layers and the number of perceptrons in each layer are hyperparameters that must be initially specified [66]. Three hidden layers are empirically established, considering one first layer for nonlinear separation, one processing layer, and one aggregation layer to provide the outputs. The number of perceptrons in each layer will be under scrutiny and further analyzed and discussed. As mentioned earlier, one of the outcomes of the methodology is its training stage, where random samples are generated over the search space and the observations (or inputs) are calculated under ideal conditions, i.e., when no noise is present. This is because we propose generating the training data numerically, which removes the need for data acquisition in the environment where the network will be deployed. The training stage is summarized in Algorithm 1. Since it is the topology of the network and its training strategy that are under scrutiny, the remaining parameters and tools are taken from well-established methods in the literature. The LM optimization method is applied, using the mean squared error (MSE) as a performance metric.
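The data-generation part of the training stage can be sketched as follows. This is a hedged illustration and not a reproduction of Algorithm 1: the function name `generate_training_set`, the square search-space bounds, and the sensor coordinates are assumptions, while P = 5, unit gain, β = 2, and the 5 m exclusion distance around each sensor follow the values reported in this work.

```python
import math
import random

def generate_training_set(sensors, n_samples, lower, upper, power=5.0,
                          gain=1.0, beta=2.0, min_dist=5.0, seed=0):
    """Draw random source positions and compute noise-free observations."""
    rng = random.Random(seed)
    inputs, targets = [], []
    while len(targets) < n_samples:
        x = (rng.uniform(lower, upper), rng.uniform(lower, upper))
        dists = [math.hypot(x[0] - sx, x[1] - sy) for sx, sy in sensors]
        if min(dists) < min_dist:
            continue  # too close to a sensor: avoid the model singularity
        # Noise-free observations (nu_i = 0) form the network inputs
        inputs.append([gain * power / d ** beta for d in dists])
        targets.append(x)  # source coordinates form the network targets
    return inputs, targets

# Hypothetical layout: three sensors on a 50 m circle around the origin
sensors = [(50.0, 0.0), (-25.0, 43.3), (-25.0, -43.3)]
inputs, targets = generate_training_set(sensors, 1000, -50.0, 50.0)
```

Because the pairs are produced from the model alone, no field measurements are required before the weights are fitted.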
Looking at the inference stage of a feed-forward neural network, the computational complexity is related to the number of inputs, the number of layers, and the number of perceptrons in each layer. Let $W_{ij}$ be the matrix of weights connecting layer i to layer j. Obtaining the outputs of layer j implies computing $S_j = W_{ij} \cdot Z_i$. From this point, the activation function f(x) is applied as $Z_j = f(S_j)$, where $Z_j$ is the output of layer j obtained from the outputs of the previous layer i, $Z_i$. Thus, if the network has L layers (including an input and an output layer), the expression is evaluated (L − 1) times, where, for each layer, a matrix multiplication and an activation function are computed. Considering the MLP case, the weight matrix $W_{12} \in \mathbb{R}^{N \times p}$, where p is the number of perceptrons in the hidden layer, while $W_{23} \in \mathbb{R}^{p \times 2}$, since there are only two outputs. When considering the DFNN case, the matrices linking hidden layers belong to $\mathbb{R}^{p \times p}$. Taking the complexity with regard to the dimension of the sensor network, one can see that only $W_{12}$ will increase accordingly. This situation is valid for both the MLP and DFNN cases. As such, the complexity will be of order O(n) in the number of sensors, regardless of the network structure.
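The inference stage described above can be illustrated with a small pure-Python forward pass. The sketch assumes randomly initialised weights, no bias terms (matching the expression for the layer output given in the text), and the hyperbolic tangent applied at every layer; the function names are hypothetical. The helper `multiplications` counts the scalar products per inference to show how the cost grows with the number of sensors.

```python
import math
import random

def init_weights(layer_sizes, seed=0):
    """One weight matrix per pair of consecutive layers (random values)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
             for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def forward(weights, inputs):
    """Propagate the inputs through every layer: Z_j = tanh(W_ij . Z_i)."""
    z = list(inputs)
    for w in weights:
        z = [math.tanh(sum(wij * zi for wij, zi in zip(row, z))) for row in w]
    return z

def multiplications(layer_sizes):
    """Number of scalar multiplications needed for one inference."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

mlp_3 = multiplications([3, 27, 2])             # one hidden layer, N = 3
dfnn_3 = multiplications([3, 27, 27, 27, 2])    # three hidden layers, N = 3
dfnn_15 = multiplications([15, 27, 27, 27, 2])  # three hidden layers, N = 15

weights = init_weights([3, 27, 27, 27, 2])
output = forward(weights, [0.1, -0.2, 0.3])
```

Going from N = 3 to N = 15 only enlarges the first weight matrix, so the count grows by 12 × 27 = 324 multiplications, which is linear in the number of sensors.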
In summary, Figure 3 shows the two stages that establish the proposed method. Initially, a set of data is generated, where pairs of observations with the corresponding coordinates (lines 2 and 4 of Algorithm 1) are obtained. This data construction depends only on the model (gain, transmitted power, and sensors' positions) and the layout (upper and lower bounds) features, and thus does not need an environment-dependent data acquisition. In this offline stage, the weights of the network are obtained. The second stage corresponds to the online processing phase. Here, noisy measurements are acquired through the sensors and fed to the network for inference, obtaining the coordinates of the acoustic source. It should be noted that, firstly, no assumption regarding the statistical features of the noise was made, contrary to the methods presented in Section 2. Secondly, the inference of the network was performed with noisy measurements, while it was trained with a noise-free collection of pairs of simulated coordinates and observations.

Results and Discussion
The first hyperparameter to be addressed concerns the number of layers and the number of perceptrons in each layer. In order to study this hyperparameter, several scenarios were created that include arranging N (N = 3, 6, 9, 12, and 15) acoustic sensors on a circle centered at the middle of the search space, with a radius equal to 50 m. The training data are randomly generated over the search space and made up of 10,000 coordinates $(X_{x_i}, X_{y_i})$, where 70% of the samples form the training set, and the remaining 30% form the validation set. Firstly, all of the samples are generated using MATLAB® R2020b on an AMD Ryzen™ R7-4700U Octa-Core processor, featuring 2 GHz and 16 GB of RAM, running on a Windows™ 10 operating system. The generated coordinates and the sensors' distribution over the search space are represented in Figure 4 when considering N = 6, where a distance of 5 m from the sensors is considered to avoid the singularities that would arise when the source matches the coordinates of a sensor (Expression (1)). Secondly, the corresponding observations are calculated under ideal noise-free conditions, by applying Expression (1) with $\nu_i = 0$. This strategy implies that the network is trained offline, with "measurements" obtained numerically, and as such is not dependent on field acquisitions. Thirdly, the input data are scaled and normalized through Batch Normalization [67]. With this procedure over the input values, a faster and more stable network is obtained, by recentering and rescaling the observation values [68]. Basically, the Batch process performs a rescaling of the input data over the domain of the activation function. As it is a linear mathematical operation, the neighboring information is preserved, and a centralizing and stretching effect is observed.
The histogram of Figure 5 shows the distribution of the data obtained with the model (Expression (1)) in red and the result of applying the Batch Normalization algorithm in blue, corresponding to the total dataset when N = 3. It should be noticed that without this rescaling, the hyperbolic tangent activation function would behave as a linear one, since all values would tend to zero from positive values. The rescaling also highlights the fact that the input data approaches a Rayleigh distribution. This behavior is expected since the three sensor readings are independent and identically distributed Gaussian with equal variance and zero mean [69].
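The recentering and rescaling of the inputs can be illustrated with a plain per-sensor standardization. This is an assumption made for the sketch: it applies only the normalization step (subtract the mean, divide by the standard deviation) to the whole input set, without the learned scale and shift parameters of a full Batch Normalization layer. The function name `standardize` and the small constant `eps` are hypothetical, and `eps` is kept very small because the observations themselves are of the order of 10⁻³.

```python
import math

def standardize(rows, eps=1e-12):
    """Recentre and rescale each input column to zero mean, unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n
                 for j in range(d)]
    scaled = [[(r[j] - means[j]) / math.sqrt(variances[j] + eps)
               for j in range(d)] for r in rows]
    return scaled, means, variances

# Three hypothetical observation vectors from a two-sensor layout
rows = [[0.002, 0.2], [0.004, 0.1], [0.006, 0.3]]
scaled, means, variances = standardize(rows)
```

Since the operation is linear, the ordering of the samples along each input is preserved, while their values are spread over the active region of the hyperbolic tangent. The same `means` and `variances` would have to be reused when scaling measurements at inference time.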
Regarding the training parameters, the LM optimization method is applied, using the mean squared error (MSE) as a performance metric, over 1000 epochs, with a minimum performance gradient of $10^{-7}$ and a learning rate of 0.001. We consider that the training is finished when the total number of epochs is reached, the performance gradient falls below $10^{-7}$, or the performance is minimized to its goal, which would hypothetically be null (MSE → 0). The transmitted power is set to P = 5 and the sensors' gain to $g_i = 1$ for i = 1, …, N. To assess the performance of the proposed DFNN, all scenarios (N = 3 to N = 15 with increments of 3) are trained with network structures of 3, 9, and 27 perceptrons per layer (Figure 6). To demonstrate the effectiveness of the Deep Learning methodology, the same procedure is carried out on an MLP, consisting of only one input layer, one hidden layer, and one output layer (Figure 7). This comparison of the DFNN with the MLP indirectly assesses the number of layers in the network, since networks with one and three hidden layers are being weighed against each other. The case of using only one hidden layer is identified as MLP₁ throughout the results discussion, and the DFNN with 3 hidden layers is identified as DFNN₃. It should be noted that the acronym DFNN is used here to distinguish the two networks in terms of the number of layers and that a DFNN is also an MLP network, only with a higher number of hidden layers. When analyzing the obtained MSE over the 1000 epochs of the MLP₁ (Figure 7), one can see that, with 3 and 9 perceptrons, the training stopped prematurely (due to the performance gradient) for all considered values of the number of sensors N. The networks consisting of 27 perceptrons per layer reached 1000 epochs with an MSE of approximately $6.5 \times 10^{-3}$ and $3 \times 10^{-7}$ for N = 3 and N = 15, respectively.
Concerning the DFNN₃, only the cases of N = 3, N = 6, and N = 9 with 3 perceptrons per layer stopped prematurely, and the DFNN₃ with 27 perceptrons per layer reached much lower MSE values ($6 \times 10^{-7}$ for N = 3 and $9 \times 10^{-12}$ for N = 15). This situation demonstrates the superiority of the proposed DFNN over a standard MLP, justifying its implementation, although with higher complexity in training or network structure. Since the training is performed in an offline stage, and online processing involves mostly matrix summation and multiplication, this complexity increase does not directly affect the network behavior, increasing linearly with the number of sensors, as demonstrated. The performance of the newly presented localization method will be compared in terms of the root-mean-squared error (RMSE), defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \|\mathbf{x}_i - \hat{\mathbf{x}}_i\|^2},$$

where $\mathbf{x}_i$ is the true (unknown) source location of the ith sample, $\hat{\mathbf{x}}_i$ is the coordinate estimated through the DFNN, and M is the size of the data set. Since the RMSE measures the standard deviation of the residuals, the error maintains the dimension of the variables, allowing cross-checking of the error with the search space in meters. For that purpose, the MLP₁ is considered for comparison as a simpler neural network structure (composed of one hidden layer). Based on the previous analysis (Figure 7), 27 perceptrons are considered for the hidden layer. Additionally, two more methods are selected from state-of-the-art deterministic and metaheuristic algorithms. With regard to the deterministic one, although second-order cone programming can be considered as the current state of the art [28,29], providing good results even for noisy environments, the bisection method proposed in [25] is considered due to its simplicity and reliability, denoted here by "EXACT". With regard to metaheuristic algorithms, the most recent version of EHO for acoustic localization is considered [32], and named "EEHO".
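For reference, the RMSE used in the comparison can be computed as in the following minimal sketch, which assumes the usual definition: the square root of the mean squared Euclidean distance between true and estimated positions. The function name `rmse` and the sample coordinates are illustrative only.

```python
import math

def rmse(true_positions, estimated_positions):
    """Root-mean-squared Euclidean error over all samples (in metres)."""
    squared = [(tx - ex) ** 2 + (ty - ey) ** 2
               for (tx, ty), (ex, ey) in zip(true_positions,
                                             estimated_positions)]
    return math.sqrt(sum(squared) / len(squared))

# Two hypothetical samples: one estimate is 5 m off, the other is exact
error = rmse([(0.0, 0.0), (0.0, 0.0)], [(3.0, 4.0), (0.0, 0.0)])
```

Because the squared distances are averaged before the square root is taken, a few large errors weigh more heavily on the result than many small ones.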
Both methods that are compared with the present work have a complexity in the order of O(n) [33]. The strategy employed to validate the four algorithms consists of, firstly, generating 10,000 samples randomly distributed over the search space (Figure 4). Secondly, seven new sample sets are created by adding different noise levels, generated from a normal distribution. More specifically, the noise term $\nu_i$ in Expression (1) follows a normal distribution with zero mean and variance $\sigma^2_{\nu_i}$, that is, $\nu_i \sim \mathcal{N}(0, \sigma^2_{\nu_i})$. The variance is considered in a logarithmic scale ($\sigma^2_{\nu_i}(\mathrm{dB}) = 10 \log_{10}(\sigma^2_{\nu_i})$) on the interval [−80, −50] dB, with increments of 5 dB. The performance of the compared methods is evaluated in terms of RMSE. The noise variance interval intends to cover a wide range of the signal-to-noise ratio (SNR). When considering a source at 1 m from the sensor, the SNR varies from 74 dB to 104 dB (dB ref. 1 W/m²), but when considering a distance of 50 m, which would correspond to a source located at the edge of the search space, the SNR varies from approximately −5 dB to 30 dB (dB ref. 1 W/m²).
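The conversion between the noise variance in dB and the linear scale, together with the generation of the noisy sample sets, can be sketched as below. The helper names `db_to_variance` and `add_noise` and the example observation values are assumptions for illustration; the noise levels follow the interval [−80, −50] dB with 5 dB increments stated in the text.

```python
import math
import random

def db_to_variance(variance_db):
    """Invert sigma^2 (dB) = 10 * log10(sigma^2)."""
    return 10.0 ** (variance_db / 10.0)

def add_noise(observations, variance_db, rng):
    """Add zero-mean Gaussian noise of the given variance (in dB)."""
    std = math.sqrt(db_to_variance(variance_db))
    return [y + rng.gauss(0.0, std) for y in observations]

# Seven noise levels: -80, -75, ..., -50 dB
noise_levels_db = list(range(-80, -45, 5))

rng = random.Random(0)
clean = [0.002, 0.002, 0.002]  # hypothetical noise-free observations
noisy_sets = {level: add_noise(clean, level, rng) for level in noise_levels_db}
```

On the linear scale the variance therefore ranges from 10⁻⁸ to 10⁻⁵, i.e., standard deviations from 10⁻⁴ to about 3.2 × 10⁻³, which is comparable to or larger than the noise-free observation of a source 50 m away.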
The obtained results for both layouts (N = 3 and N = 12 sensors) are represented in Figure 8, with regard to the newly proposed DFNN₃ and the three algorithms chosen for comparison (MLP₁, EEHO, and EXACT). Concerning lower values of the measurement noise, the performance of the considered methods is quite diverse. In this situation, the EXACT method has the highest error (8.45 m for $\sigma^2_{\nu_i}$ = −80 dB and N = 3); followed by MLP₁ (3.41 m for $\sigma^2_{\nu_i}$ = −80 dB and N = 3); EEHO (1.57 m for $\sigma^2_{\nu_i}$ = −80 dB and N = 3); and finally, with the lowest error, the proposed DFNN₃ (0.53 m for $\sigma^2_{\nu_i}$ = −80 dB and N = 3). When considering N = 12, the proposed DFNN₃ reaches error values as low as 0.096 m for $\sigma^2_{\nu_i}$ = −80 dB. When analyzing the methods' behavior for higher values of measurement noise, the results obtained tend to overlap and the differences between the errors are not as pronounced. Considering N = 12 and $\sigma^2_{\nu_i}$ = −50 dB, while the EXACT method shows an error of 17.53 m, MLP₁, DFNN₃, and EEHO show 9.80 m, 8.77 m, and 8.50 m, respectively. This means that, although the EXACT method stands out in a negative sense, the remaining methods show an equivalent behavior. Even so, the proposed DFNN₃ is simpler to implement than EEHO. As a last remark, it is worth mentioning that DFNN₃ for N = 3 performs comparably to its counterparts when considering N = 12. This situation implies that the proposed method needs fewer sensors to achieve the performance of the others. This behavior of the DFNN₃ network in noisy conditions is not surprising, since the network has been trained with synthetic or ideal observations. Even so, it works relatively well in scenarios where the noise is not small (or, at least, not worse than the other methods).
As can be seen from Figure 8, when the noise power is low to medium, there is quite a performance margin between the proposed method and the remaining ones (e.g., for N = 3 and a noise variance of −80 dB, the proposed method outperforms EEHO, MLP₁, and EXACT by around 1, 3, and 8 m, respectively). This behavior can be explained by the fact that the proposed method is trained with noise-free data, which allows for better accuracy for low-to-medium noise power. As the noise power gets higher, one can see that the performance of all methods deteriorates significantly, and that all of them have roughly the same location accuracy.
With the goal of getting an even better insight into the performance of the proposed DFNN₃, we employ another performance metric in Figures 9 and 10. The figures illustrate the Cumulative Distribution Function (CDF) of the localization error (LE), $\mathrm{LE} = \|\mathbf{x}_i - \hat{\mathbf{x}}_i\|_2$, over the M samples, when $\sigma^2_{\nu_i}$ = −80 dB for N = 3 and N = 12. From Figure 9, for N = 3, one can see that the proposed solution achieves LE < 1 m in 90% of the cases, while LE < 2.5 m and LE < 5 m are achieved by EEHO and MLP₁ in the same percentage of cases, respectively, whereas EXACT achieves LE ≤ 10 m in around 75% of the cases. When N = 12 (Figure 10), one can observe LE < 0.15 m for DFNN₃, while the EEHO, MLP₁, and EXACT methods achieve only LE < 0.32 m, LE < 0.45 m, and LE < 0.5 m in the same percentage of cases, respectively. The obtained results are in line with the previous analysis regarding RMSE performance, confirming the effectiveness of the proposed solution.
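An empirical CDF of the localization error can be obtained as in the sketch below. It assumes that LE is the Euclidean distance between the true and the estimated positions; the function names and the sample error values are hypothetical and do not reproduce the reported results.

```python
import math

def localization_errors(true_positions, estimated_positions):
    """Euclidean distance between each true and estimated position."""
    return [math.hypot(tx - ex, ty - ey)
            for (tx, ty), (ex, ey) in zip(true_positions, estimated_positions)]

def empirical_cdf(errors, threshold):
    """Fraction of samples whose error does not exceed the threshold."""
    return sum(1 for e in errors if e <= threshold) / len(errors)

def percentile_error(errors, fraction):
    """Smallest error value that covers at least the given fraction."""
    ordered = sorted(errors)
    index = max(math.ceil(fraction * len(ordered)) - 1, 0)
    return ordered[index]

errors = localization_errors([(0.0, 0.0)] * 4,
                             [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 4.0)])
share_within_2m = empirical_cdf(errors, 2.0)
error_at_90 = percentile_error(errors, 0.9)
```

Reading the CDF at a fixed percentage, as done for the 90% level in the text, corresponds to `percentile_error`, whereas reading it at a fixed error value corresponds to `empirical_cdf`.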

Conclusions
A new method based on DFNNs for solving the energy-based acoustic localization problem is proposed in the present work. Its particularity consists of a reliable and straightforward training stage, where the dataset is generated under ideal environmental conditions and noise-free measurements. The simulation results demonstrate that the methodology exceeds the performance of the state of the art for lower values of measurement noise, while it matches the performance of its counterparts in environments with a higher noise level. The new methodology paves the way for machine learning and, more specifically, for the use of neural networks in the acoustic localization problem.
This work considered the localization of a single stationary source. Generalizing the presented algorithm to the localization of nonstationary sources, by extending the feed-forward network to process time-serialized data, is to be considered as future work.

Conflicts of Interest:
The authors declare no conflict of interest.