Acoustic Indoor Localization Augmentation by Self-Calibration and Machine Learning

An acoustic transmitter can be located using multiple static microphones. These microphones are synchronized and measure the time differences of arrival (TDoA). Usually, the positions of the microphones are assumed to be known in advance. In practice, however, this means they have to be measured manually, which is cumbersome and prone to errors. In this paper, we present two novel approaches which do not require manual measurement of the receiver positions. The first method uses an inertial measurement unit (IMU), in addition to the acoustic transmitter, to estimate the positions of the receivers. With the IMU as an additional source of information, the non-convex optimizers are less likely to fall into local minima. Consequently, the success rate is increased and measurements with large errors have less influence on the final estimation. The second method uses machine learning to learn the TDoA signatures of certain regions of the localization area. In this way, the target can be located without knowing where the microphones are and whether the received signals are in line-of-sight or not. We use an artificial neural network and random forest classification for this purpose.


Introduction
Indoor localization systems are envisioned for a wide range of applications. Among others, these systems can be used in intralogistics, the domain of in-house logistics of commercial enterprises, which covers the entire range of organization, execution and optimization of in-house material flow and warehousing. While material transportation has traditionally relied on a static infrastructure involving conveyor belts, manually operated cranes and forklifts, there is a trend towards autonomous agents. An important factor is the localization and tracking of these autonomous systems [1,2]. Today, many indoor localization systems based on different methods are available. A large number of them use the propagation of radio frequency (RF) signals for localization [3,4,5,6]. Another possibility is to use the propagation of sound [7,8]. Indoor localization systems work with indoor satellites (anchor nodes), which have to be installed inside the indoor environment. Most traditional indoor localization methods assume that the positions of these anchor nodes are known in advance, so that the position of the target can be estimated from them. However, in a real-life scenario this means that the anchor positions have to be measured manually, which is cumbersome and prone to errors. Therefore, a system capable of locating the target without manual measurement of the anchor node positions is necessary. We propose two different methods. The first method estimates the positions of the anchor nodes and the target using non-linear optimization. This allows locating the target with an error on the order of centimeters. This method increases the success rate of traditional self-calibration methods by using an inertial measurement unit (IMU) as an additional source of information.
Moreover, it is robust against non-line-of-sight measurements, as it discretizes the solution space, choosing the discretized solutions which comply with the information gathered by the IMU.
We propose another method which uses machine learning to identify when the target is in predefined regions. This does not allow localization outside these regions. However, it ensures proper localization in the zones of interest, because the target can be located independently of whether the received signals are in line-of-sight or non-line-of-sight: the algorithm learns from whatever measurements are received in the region. Moreover, the installation effort is even lower than when estimating the anchor node positions. When estimating the anchor node positions, one needs to ensure that all receivers get enough line-of-sight measurements from different positions in the localization area, so that their positions can be estimated. In addition, one needs to ensure that the graph of estimated positions is rigid. This means that there need to be enough sender positions so that all receivers are constrained to the same coordinate system without allowing any relative rotation or translation.
In this work, we present a novel method for self-calibration, which determines the anchor positions before the localization of the target. The goal is to improve the success rate of traditional self-calibration methods. The novel algorithm presented in this study simplifies the calibration of acoustic localization systems without compromising the robustness of the system, because the information gathered from the inertial measurement unit is used to reject unlikely receiver configurations.
Additionally, in this manuscript, we show how machine learning algorithms can be used for locating a target when the receiver positions are unknown. This is especially useful in situations where one wants to know whether a speaker is in a certain area. By training the algorithm over different regions, one can then identify in which of the predefined regions the target is located. The results suggest that this method can be applied for zone detection in real-life scenarios.

State of the Art
In classic hyperbolic TDoA localization, the positions of static receivers are assumed to be known in order to locate a moving sender. In the scope of this work, we refer to the stationary nodes as anchors and to the localized nodes as beacons. Some publications suggest compensating for erroneous reference anchor positions [9,10], but in general, the positions of anchors are assumed to be precisely determined by external means in an external coordinate system. Manual measurement of anchor positions may be done with a measuring tape, a laser range finder, a grid arrangement or a specifically structured sensor array, or by geodetic methods.
The problem of calibration-free TDoA is challenging. First, the number of unknown variables to be determined is higher than in conventional TDoA, as the anchor positions need to be estimated. Second, the beacon and anchor positions depend on each other, which is adverse to the robustness of the estimation. For instance, one failed measurement at a specific time, i.e., a large outlier, affects the estimation of the anchor positions, which in turn affects the estimation of other sender positions. In contrast, in conventional TDoA, a failed measurement can only influence a single sender position. Third, due to the high dimensionality of the problem, many strategies to linearize the problem [11,12] cannot be applied in general, so iterative approaches are required, which are time-consuming and require careful initialization.
For self-calibration, different methods are available. In [13], Biswas and Thrun apply a maximum likelihood estimation algorithm to localize the anchors and the beacon simultaneously. In [14], Wendeberg et al. present a non-linear optimization approach for TDoA self-calibration. In the self-calibration problem, both beacon locations and anchor positions are unknown variables. Starting from randomized initial values, Wendeberg et al. use gradient descent and the Gauss-Newton method to search for a minimum of the TDoA objective function.
Due to the non-convexity of the hyperbolic function, there may be multiple local minima. Thus, the non-linear optimization method does not guarantee a global minimum. In [14], Wendeberg et al. repeat the randomized initialization to increase the probability of finding the global minimum. Furthermore, in [15], Wendeberg et al. solve the self-calibration problem with a branch-and-bound algorithm. In comparison to the optimization-based method, the branch-and-bound method requires more computational power, as revealed in [16]. In [17,18], a far-field assumption is used to simplify the equations and initialize the variables. In contrast, here we present an initialization technique which utilizes the gyroscope measurements to generate an initial guess close to the true positions. It improves the success rate of self-calibration and increases the convergence rate during the optimization process. For the sake of convenience, we name this technique direction adjudgment, or DA for short, and we use DA-SC to refer to the self-calibration initialized by direction adjudgment. Correspondingly, RI refers to the randomized initialization and RI-SC refers to the self-calibration initialized by randomized initialization.
In this work, we also explore the possibility of using machine learning for detecting whether a speaker is located in some predefined areas, without having to measure or estimate the positions of the anchors. Some authors have explored the use of such algorithms for localization in the past. For example, Niitsoo et al. [19] use the channel impulse responses for localization. Feig et al. [20] use recurrent neural networks on drifting time-of-flight measurements. Zhang et al. [21] use a support vector machine (SVM) for identifying whether an acoustic measurement is in line-of-sight or non-line-of-sight. Ebrahimkhanlou et al. [22] use deep learning to locate acoustic emission sources in plate-like structures.

Localization by TDoA with Machine Learning
A simple but effective approach to avoid the tedious manual measurement of the three-dimensional positions of all $N_k$ anchors is to use a machine learning algorithm to learn the characteristic TDoAs at each position. This is especially useful in situations where the positional error is less important than a correct association of the position information, e.g., whether a customer is at a certain table. Therefore, we define two-dimensional rectangular clusters that mimic this situation. In order to train the algorithms, we put the transmitting beacon in multiple positions $b$ inside every cluster and save the first reception time $t_r$ of every anchor. We train a neural network and a random forest classifier with TDoA vectors $\tau$ that contain all the possible anchor pairing permutations. We train both so that we can compare their performance under different circumstances.
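As an illustration of the input data described above, the following minimal sketch builds a TDoA vector from one reception timestamp per anchor. The function name `tdoa_feature_vector` and the timestamp values are hypothetical, and "all pairing permutations" is interpreted here as all ordered anchor pairs.

```python
from itertools import permutations


def tdoa_feature_vector(reception_times):
    """Return t_i - t_j for every ordered anchor pair (i, j) with i != j."""
    indices = range(len(reception_times))
    return [reception_times[i] - reception_times[j]
            for i, j in permutations(indices, 2)]


# Hypothetical first-reception times (seconds) of one signal at four anchors.
times = [0.0100, 0.0123, 0.0141, 0.0117]
features = tdoa_feature_vector(times)

# A common offset, such as the unknown sending time, cancels in every difference.
shifted = tdoa_feature_vector([t + 5.0 for t in times])
```

With four anchors this yields 12 features per measurement; each such vector, labeled with its cluster, would form one training sample.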

Time Difference of Arrival (TDoA)
Consider one of the $N_k$ anchors, with index $i$, located at the three-dimensional position $k_i$, and a speaker located at position $b$. The distance between the two nodes is described in Equation (1):

$$r_{bi} = \lVert b - k_i \rVert \tag{1}$$

Consequently, a signal traveling from $b$ to $k_i$ is received at time $t_i$ as in Equation (2):

$$t_i = t_b + \frac{r_{bi}}{c} \tag{2}$$

where $c$ is the sound velocity and $t_b$ is the starting time of the signal transmission. Then, with another one of those anchors with index $j$ at position $k_j$, one can calculate the TDoA $\tau_{ij}$ as formulated in Equation (3), which does not depend on the sending time:

$$\tau_{ij} = t_i - t_j = \frac{\lVert b - k_i \rVert - \lVert b - k_j \rVert}{c} \tag{3}$$
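The TDoA model can be sketched in a few lines. The value of the sound velocity, the sign convention $\tau_{ij} = t_i - t_j$ and the example coordinates are assumptions made for illustration only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees Celsius


def reception_time(b, k_i, t_b, c=SPEED_OF_SOUND):
    """Reception time at an anchor: sending time plus propagation delay."""
    return t_b + math.dist(b, k_i) / c


def tdoa(b, k_i, k_j, c=SPEED_OF_SOUND):
    """TDoA between anchors i and j; the sending time does not appear."""
    return (math.dist(b, k_i) - math.dist(b, k_j)) / c


beacon = (1.0, 2.0, 1.0)
anchor_i = (0.0, 0.0, 3.0)  # 3 m away from the beacon
anchor_j = (4.0, 0.0, 3.0)
tau_ij = tdoa(beacon, anchor_i, anchor_j)
```

Subtracting two reception times gives the same `tau_ij` for any sending time, which is why the sender does not need to be synchronized with the anchors.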
To get a notion of how large the clusters can be, one can study the minimum achievable position error using TDoA, assuming that every timestamp has additive Gaussian noise with standard deviation $\sigma_n$.
When using hyperbolic multilateration, two timestamps are used for every measurement, which means the noise terms of different measurements are correlated. As shown in [23], one can define an $(N_k - 1) \times N_k$ matrix $D$ as:

$$D = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 \\ -1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -1 & 0 & 0 & \cdots & 1 \end{bmatrix} \tag{4}$$

The first column corresponds to the reference anchor, whose timestamp is subtracted from the others. The other $N_k - 1$ columns correspond to the other anchors. Then the $(N_k - 1) \times (N_k - 1)$ noise matrix of the TDoA measurements is:

$$\Sigma = \sigma_n^2 \, D D^T \tag{5}$$

The matrix $A$, which contains an estimation of the covariance of the estimated variables, can then be calculated as follows:

$$A = \left( H^T \Sigma^{-1} H \right)^{-1} \tag{6}$$

where $H$ is the Jacobian matrix of the TDoA sensor model. The square root of the trace of $A$ is a lower bound for the achievable root mean squared error, assuming an unbiased estimator:

$$\sqrt{E\left[ \lVert \hat{b} - b \rVert^2 \right]} \geq \sqrt{\operatorname{tr}(A)} \tag{7}$$

where $\hat{b}$ is the estimate of $b$. Moreover, if this term is normalized by the noise, it is known as the dilution of precision (DOP) and is used to determine how the geometrical distribution of anchors affects the final position estimation:

$$\mathrm{DOP} = \frac{\sqrt{\operatorname{tr}(A)}}{\sigma_n} \tag{8}$$
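The dilution of precision can be evaluated numerically for a given geometry. The sketch below is one possible implementation under stated assumptions: the first anchor is the reference, the noise is expressed in range units (timestamp noise multiplied by the sound velocity) so that the result is dimensionless, and the helper names and the square anchor layout are hypothetical.

```python
import math


def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mat_inv(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def tdoa_dop(beacon, anchors):
    """Dilution of precision for TDoA with anchors[0] as the reference anchor."""
    n = len(anchors)
    # Differencing matrix D: -1 in the reference column, +1 for the other anchor.
    D = [[-1.0] + [1.0 if j == i else 0.0 for j in range(n - 1)]
         for i in range(n - 1)]
    # Covariance of the differences for unit timestamp noise: D D^T.
    sigma = mat_mul(D, transpose(D))
    # Unit vectors pointing from each anchor to the beacon.
    units = []
    for k in anchors:
        d = math.dist(beacon, k)
        units.append([(b - a) / d for b, a in zip(beacon, k)])
    # Jacobian of the range differences with respect to the beacon position.
    H = [[u - u0 for u, u0 in zip(units[i], units[0])] for i in range(1, n)]
    A = mat_inv(mat_mul(mat_mul(transpose(H), mat_inv(sigma)), H))
    return math.sqrt(sum(A[i][i] for i in range(len(A))))


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
dop_center = tdoa_dop((5.0, 5.0), square)
dop_outside = tdoa_dop((30.0, 30.0), square)
```

In this two-dimensional example the value at the center of the square is much smaller than the value far outside it, which illustrates how the anchor geometry limits the useful cluster size.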

Neural Network Classification
An artificial neural network consists of a large number of neurons $n$ which are interconnected over a certain number $M$ of layers. The layers between the input and the output layer are the hidden layers and determine the complexity of the functions which the network can approximate (see Figures 1 and 2). The connection between a neuron $i$ of the layer $k - 1$ and another neuron $j$ of the next layer $k$ is weighted with a weight $w_{i,j}$ before being added to the other weighted inputs of the neuron. Afterwards, each neuron adds a bias $b_{j,k}$. The output $o_{j,k}$ of the neuron $j$ in the layer $k$ is calculated by the activation function $g(\cdot)$. This can be written as follows [19]:

$$o_{j,k} = g\left( \sum_{i=1}^{N} w_{i,j} \, n_{i,k} + b_{j,k} \right) \tag{9}$$

where $N$ is the number of neurons per layer. The value of $n_{i,k}$ is the activation of neuron $i$ in the previous layer.
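The computation of a single layer can be sketched as follows. The function name `layer_output`, the choice of `tanh` as the default activation and the numerical values are assumptions for illustration; the source does not state which activation function was used.

```python
import math


def layer_output(prev_activations, weights, biases, g=math.tanh):
    """Outputs of one layer; weights[i][j] links previous neuron i to neuron j."""
    outputs = []
    for j, bias in enumerate(biases):
        # Weighted sum of the previous activations plus the bias of neuron j.
        z = sum(weights[i][j] * n_i for i, n_i in enumerate(prev_activations)) + bias
        outputs.append(g(z))
    return outputs


prev = [1.0, -2.0]
weights = [[0.5, 1.0],
           [0.25, -1.0]]
biases = [0.0, 0.5]
activated = layer_output(prev, weights, biases)
linear = layer_output(prev, weights, biases, g=lambda z: z)
```

Stacking several such layers, with the TDoA vector as the first input and one output neuron per cluster, gives a classifier of the kind described above.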

Random Forest Classification
The random forest consists of multiple trained trees. These trees have multiple splits where the data are separated into two groups of similar size. Possible criteria for the split are the Gini index or the entropy. See Figure 3 for a simple example with five clusters $C_1$ to $C_5$. In each branch, we have a simple condition based on one TDoA measurement. If one used only one tree, a problem would arise: if we do not have any data for the root node or any node along the way, the tree cannot make a prediction, or at least, it can only be used to analyze a subset of clusters. To work around this problem, we do not train one tree but a forest of trees. All trees are trained on different subsets of the training data and use different splits to distinguish between the clusters. As well as reducing our reliance on each TDoA measurement, the ensemble of trees also reduces the bias that a single trained tree has. Another advantage of the random forest is that it is easy to interpret, as we can look into the splitting points and see how the clusters are separated.
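To make the split criterion concrete, the following sketch computes the Gini index and searches for the best threshold on a single TDoA feature. It illustrates one node of one tree only, not a full random forest; the function names and the sample values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for idx in range(1, len(pairs)):
        if pairs[idx - 1][0] == pairs[idx][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[idx - 1][0] + pairs[idx][0]) / 2.0
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical TDoA values (ms) of one anchor pair, recorded in two clusters.
tdoa_ms = [-2.1, -1.9, -2.0, 1.0, 1.2, 0.9]
clusters = ["C1", "C1", "C1", "C2", "C2", "C2"]
threshold, score = best_split(tdoa_ms, clusters)
```

Here a single threshold on one TDoA separates the two clusters perfectly, so the weighted Gini index of the split is zero.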

IMU-Assisted Self-Calibration Algorithm
In this case, we aim to estimate the unknown variables: the position of the beacon $b$ and the positions of the anchors $k_i$. To prevent the non-linear optimization algorithms from getting stuck in local minima, we first estimate a suitable initial value for these variables by additionally measuring the rotational speeds with a gyroscope, which in our case is embedded in an IMU. While the additional sensor generally requires a separate calibration of its own and increases the number of error sources, the proposed method only uses the gyroscope data for a rough estimation that is less influenced by the sensor error. Further signal pre-conditioning of the gyroscope data can be implemented to increase the accuracy of this estimation and limit the search interval for each anchor even further [24].
The proposed initialization algorithm assumes that, at the beginning, the beacon is located under an anchor. Consequently, one can estimate the distance to three other anchors. By knowing these distances, one can define the coordinate system in such a way that two anchors are fixed and the other two must each lie on a different circle. The solutions on these circles are discretized into equally spaced points. When the target moves, it generates measurements which must be consistent with the estimated positions. The measurements from the anchors and those from the gyroscope are used to decide the most likely solutions inside the discretized solution space. Once the positions of the first four anchors have been estimated, the other anchors can also be located using the estimated sender positions. The details of this algorithm are provided in the next sections.

Anchor Distances and Possible Space Reduction
The unknown anchors are assumed to be located on two-dimensional planes with known fixed heights. Hence, the possible space of each anchor is a horizontal plane, which has two degrees of freedom. If we take one anchor at position $k_i$ as the pivotal reference point and estimate the horizontal distance to another, encompassing anchor at position $k_j$, we can reduce the possible space of the latter to a circle whose midpoint is $k_i$ and whose radius $r_{ij}$ equals the geometric distance between the two positions, as formulated in Equation (10). Note that $r_{ij}$ is the projection of the geometric distance onto the lateral plane at the height of the beacon, in contrast to the distance between beacon and anchors (compare Equation (1)). This is also illustrated in Figure 4. In order to estimate the distance on the two-dimensional plane between those two anchor positions, we place the sender vertically below the anchor at the reference position. This allows for a simplified calculation of the time difference of arrival between the reference anchor and the encompassing anchor.
Let $\bar{\tau}_{ij}$ be the mean value of the TDoA measurements, taken as $t_i - t_j$, and let $z_i$, $z_j$ and $z_b$ be the heights of the two anchors and the beacon. In this specific arrangement, we can regard the distance $r_{bi}$ as an entirely vertical and one-dimensional problem, as shown in Equation (14):

$$r_{bi} = z_i - z_b \tag{14}$$

The distance from the beacon to the encompassing anchor then follows from the measured TDoA:

$$r_{bj} = r_{bi} - c \, \bar{\tau}_{ij} \tag{15}$$

and thus we can express the horizontal distance $r_{ij}$ (see Figure 5) as the geometric distance formulated in Equation (16):

$$r_{ij} = \sqrt{r_{bj}^2 - (z_j - z_b)^2} \tag{16}$$

As the anchors are on the ceiling and our vertical coordinate increases from floor to ceiling, we assume that $z_i \approx z_j > z_b$ always holds. Equations (14) and (16) are only valid when the beacon is right under the pivotal anchor. We observed with a Motion Capture system [25] that the placement error due to our visual estimation remains below approximately 0.3 m relative to the actual position of the anchors. It is important to note that the error of human observation in this phase only influences the initialization of the variables. The aim is to find an initial value for the variables of the scenario which can then be refined with non-linear optimization without getting stuck in local minima.
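The distance estimate with the beacon placed under the pivotal anchor can be sketched as below. The sign convention (TDoA taken as pivotal minus encompassing reception time), the sound velocity and the example heights are assumptions made for this illustration.

```python
import math


def horizontal_anchor_distance(tau_ij, z_i, z_j, z_b, c=343.0):
    """Horizontal distance between anchors i and j, beacon directly below anchor i.

    tau_ij is assumed to be t_i - t_j in seconds, so it is negative when the
    pivotal anchor i receives the signal first.
    """
    r_bi = z_i - z_b             # purely vertical distance to the pivotal anchor
    r_bj = r_bi - c * tau_ij     # slant distance to the encompassing anchor
    squared = r_bj ** 2 - (z_j - z_b) ** 2
    return math.sqrt(max(squared, 0.0))  # guard against noise-induced negatives


# Synthetic check: anchors at 4.9 m, beacon at 1.0 m, true horizontal distance 6 m.
z_anchor, z_beacon, true_r = 4.9, 1.0, 6.0
r_bj_true = math.hypot(true_r, z_anchor - z_beacon)
tau = ((z_anchor - z_beacon) - r_bj_true) / 343.0
estimate = horizontal_anchor_distance(tau, z_anchor, z_anchor, z_beacon)
```

In practice `tau` would be the mean of several TDoA measurements taken while the beacon rests below the pivotal anchor.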
As mentioned before, the possible space of an anchor can be reduced to a circle if its distance $r_{1i}$ to the pivotal anchor is known. We further discretize the circle into discrete points. As the pivotal anchor is at the origin of the x and y coordinates, the discretized positions $k_i^-$ of the encompassing anchors are defined in Equation (17):

$$k_i^- = \left[ r_{1i} \cos\varphi_{i,k}, \; r_{1i} \sin\varphi_{i,k}, \; z_i \right]^T \tag{17}$$

where $z_i$ is the known height of the anchor, $r_{1i}$ is the estimated inter-anchor distance and $-\pi \leq \varphi_{i,k} < \pi$.
We discretize the angle in steps of $\Delta\varphi$:

$$\varphi_{i,k} = k \, \Delta\varphi \tag{18}$$

where $k \in \mathbb{Z}$. Here $\Delta\varphi$ determines the resolution of the discretized approximation.

Figure 5. Knowing the distance between the pivotal anchor and the encompassing anchor, the encompassing anchor must lie on a circle whose center is the pivotal anchor.
In order to reduce the computational complexity, we avoid congruent transformations of anchor configurations. We establish a coordinate system where the pivotal anchor is fixed at the origin and one of the encompassing anchors lies on the positive half of the x-axis, i.e., at the point $k_2 = [r_{12}, 0]$, where $r_{12}$ is the estimated distance between the pivotal anchor $k_1$ and the first encompassing anchor $k_2$. The possible spaces of the other two anchors $k_3$ and $k_4$ are circles around the pivotal anchor. Figure 6 gives an example for the first four anchors.
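The discretized solution space of one encompassing anchor can be generated as in the following sketch. The function name, the parameterization by a number of steps instead of the angular step itself, and the example values are assumptions for illustration.

```python
import math


def discretize_circle(radius, height, n_steps):
    """Equally spaced candidate positions on a circle around the pivotal anchor."""
    delta_phi = 2.0 * math.pi / n_steps
    k_start = -(n_steps // 2)  # so that the angles start at about -pi
    candidates = []
    for k in range(k_start, k_start + n_steps):
        phi = k * delta_phi
        candidates.append((radius * math.cos(phi), radius * math.sin(phi), height))
    return candidates


# Hypothetical anchor: 6 m from the pivotal anchor, mounted at 4.9 m, 10 degree steps.
candidates = discretize_circle(6.0, 4.9, 36)
```

A smaller angular step gives a finer approximation but increases the number of anchor combinations that have to be tested later.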

Straight Movement Detection
In order to further reduce the subspace of possible solutions $k^-$, we use combinations of four anchors from the subspace and estimate the sender positions. Afterwards, we compare the results with the measurements from the inertial measurement unit and discard the anchor positions which lead to sender positions that contradict the IMU estimations. To do so, we use the inertial measurement unit to detect when the target is moving in a straight line. The heading direction remains unchanged when this happens, so the horizontal rotation rate is close to zero. We first smooth the rotation rate measurement $\omega_z$ and then apply a threshold to the smoothed curve. The endpoints of the straight-moving intervals can be found around the intersections of the threshold line and the smoothed $\omega_z$ curve.
An example is given in Figure 7. We can observe several peaks and saddles in the $\omega_z$ curve. The peaks occur when the vehicle changes its direction and the saddles indicate that the beacon is moving in a straight direction. The magenta points denote the endpoints of the straight-moving intervals, while the green points are the midpoints of the time intervals. In Figure 8, we can see how the straight lines are successfully detected in a real trajectory. We call the endpoints of a time interval in which the target moves in a line the critical time points, their locations the critical waypoints and the corresponding TDoA measurements the critical TDoA sequences. In this work, a waypoint refers to a location sample on the trajectory. Each waypoint is associated with a TDoA sequence.
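The smoothing and thresholding step can be sketched as follows. The moving-average filter, the window size, the minimum interval length and the synthetic gyroscope signal are assumptions; the source does not specify which smoothing filter or threshold value was used.

```python
def moving_average(samples, window):
    """Centered moving average; the window shrinks at the signal borders."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


def straight_intervals(omega_z, threshold, window=5, min_length=3):
    """Index pairs (start, end) where the smoothed |omega_z| stays below threshold."""
    smoothed = moving_average(omega_z, window)
    intervals, start = [], None
    for i, w in enumerate(smoothed):
        if abs(w) < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                intervals.append((start, i - 1))
            start = None
    if start is not None and len(smoothed) - start >= min_length:
        intervals.append((start, len(smoothed) - 1))
    return intervals


# Synthetic yaw rate (rad/s): straight, one turn, straight again.
omega = [0.0] * 10 + [1.0] * 5 + [0.0] * 10
segments = straight_intervals(omega, threshold=0.1, window=3)
```

The endpoints of each returned interval correspond to the critical time points, and the interval midpoints can serve as integration limits for the rotation between segments.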

Direction Adjudgment
In order to reduce the computational complexity and the calibration effort, we first assume that we have estimated the distance from the pivotal anchor to three encompassing anchors. Then, we have four anchors, which are enough to estimate the position of the target using TDoA. Two of these anchors have a fixed position and two of them are located on circles, as can be seen in Figure 6. In order to find the most likely positions of the anchors on the circles, we use the IMU data. For every possible position of the anchors, we estimate the positions of the target using TDoA with a closed-form formulation [26]. From the TDoA estimations, we extract the forward directions and compare them with the rotations measured by the gyroscope. The similarity between the trajectory measured by the IMU and the one estimated with the TDoA measurements indicates the most likely anchor positions.
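If a target turns, then moves straight and then turns again, the line is defined by the critical points $\hat{b}_j^h$ and $\hat{b}_j^e$, which denote the starting and ending points of the line. Then, we can estimate the direction $\hat{\theta}_j$ for the straight segment $j$ with

$$\hat{\theta}_j = \operatorname{atan2}\left( \hat{b}_{j,y}^e - \hat{b}_{j,y}^h, \; \hat{b}_{j,x}^e - \hat{b}_{j,x}^h \right) \tag{19}$$

where $\hat{\theta}_j$ is the estimated direction of the straight segment $j$, $\hat{b}_j^h$ is the estimation of the start critical point and $\hat{b}_j^e$ is the estimation of the end critical point.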
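A minimal sketch of the direction estimate, assuming the heading is measured in the horizontal plane with the two-argument arctangent; the function name and the example points are hypothetical.

```python
import math


def segment_direction(start_point, end_point):
    """Heading (rad) of the straight segment from start_point to end_point."""
    return math.atan2(end_point[1] - start_point[1],
                      end_point[0] - start_point[0])


heading_diagonal = segment_direction((0.0, 0.0), (1.0, 1.0))
heading_north = segment_direction((0.0, 0.0), (0.0, 2.0))
heading_west = segment_direction((1.0, 1.0), (0.0, 1.0))
```

Using `atan2` rather than a plain arctangent keeps the correct quadrant, so opposite driving directions are not confused.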

Rotation Calculation
In order to estimate the rotation, we integrate the horizontal rotation rate over the time interval $[t_{m1}, t_{m2}]$. For a generalized time interval $[t_{m,j}, t_{m,j+1}]$, we formulate it as

$$\Delta\theta_j = \int_{t_{m,j}}^{t_{m,j+1}} \omega_z(t) \, dt \tag{20}$$

where $\Delta\theta_j$ is the measured rotation between the straight segments $j$ and $j + 1$, and $\omega_z$ is the rotation rate around the z-axis of the vehicle frame.
In the end, we apply

$$\left| \left( \hat{\theta}_{j+1} - \hat{\theta}_j \right) - \Delta\theta_j \right| < \theta_{th} \tag{21}$$

to judge the direction, where $\theta_{th}$ is a chosen threshold. If the measurements fulfill Equation (21), we consider the guessed anchor positions to be feasible. The trajectory may consist of several straight segments. Each pair of adjacent straight segments forms a constraint of the form formulated in Equation (21). We sum up the number of satisfied constraints for each of the guessed configurations. In the end, we keep only the guesses with the largest number of satisfied constraints and exclude the others. We provide an example of the reduced possible spaces after direction adjudgment in Figure 9. Note that the arc-shaped possible space is still not the initialization. We average the remaining angles $\varphi_{i,k}$ of an anchor $i$ and initialize it as

$$k_i^- = \left[ r_{1i} \cos\bar{\varphi}_i, \; r_{1i} \sin\bar{\varphi}_i, \; z_i \right]^T$$

where $\bar{\varphi}_i$ is the mean of the feasible angles. In Figure 9, the hollow circles are the remaining anchor positions, while the points used for initialization are denoted as filled circles.

In this section, we use the closed-form TDoA solution presented by Bucher and Misra in [26] to estimate the target positions. This closed-form solution requires four anchors. When the number of anchors increases, the number of possible solutions increases exponentially, which makes the direction adjudgment inefficient. Therefore, we introduce a new method to initialize the additional anchors.
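The rotation integral and the counting of satisfied direction constraints can be sketched as follows. The trapezoidal integration rule, the wrapping of angle differences and all names and values are assumptions for illustration and are not taken from the source.

```python
import math


def integrate_rotation(timestamps, omega_z):
    """Trapezoidal integral of the yaw rate (rad/s) over the given samples."""
    total = 0.0
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        total += 0.5 * (omega_z[i] + omega_z[i - 1]) * dt
    return total


def wrap_angle(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def count_satisfied(directions, rotations, theta_th):
    """Count adjacent segment pairs whose TDoA-based turn matches the gyroscope."""
    satisfied = 0
    for j, delta in enumerate(rotations):
        turn = directions[j + 1] - directions[j]
        if abs(wrap_angle(turn - delta)) < theta_th:
            satisfied += 1
    return satisfied


quarter = math.pi / 2
rotation = integrate_rotation([0.0, 1.0, 2.0], [0.5, 0.5, 0.5])
# Headings of three straight segments for two hypothetical anchor guesses.
good_guess = count_satisfied([0.0, quarter, math.pi], [quarter, quarter], 0.2)
bad_guess = count_satisfied([0.0, quarter, 0.0], [quarter, quarter], 0.2)
```

Only the guessed configurations with the highest count would be kept, and their angles averaged to obtain the initial anchor position.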

Initialization of Additional Anchors
With the first four anchors initialized, we can use the initialized positions to approximate the true positions of the first four anchors. The subspaces of possible positions for the additional anchors are also circles. First, we randomly select several TDoA sequences and estimate the corresponding sender positions with the approximation of the first four anchors. With the estimated sender positions, we reconstruct the TDoA measurements that each additional anchor would receive and compare them with the measured TDoAs. Repeating this process over all the possible positions of the additional anchors, we choose the possible positions whose resulting forward directions comply best with the gyroscope measurements for initialization.
To initialize the trajectory, we randomly choose a configuration from the possible spaces for each of the waypoints on the trajectory and estimate the sender positions with the closed-form solution. Figure 10 gives an example of the initialization results. We choose a central anchor $k_1$ as the origin and a randomly selected encompassing anchor $k_2$, through which we define the positive x-axis. The position estimates of the next anchors $k_3$ and $k_4$ are selected from the reduced solution subspace. We estimate the initial positions of these latter two anchors from the average coordinates in the polar system.

Optimization
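Once we have an initialization for the positions of the anchor nodes and the trajectory of the sender, we minimize a quadratic error function. The objective function is the sum of squared TDoA residuals, which is formulated as

$$\arg\min_{\{k_i\}, \{b_m\}} \sum_{m} \sum_{(i,j)} \left( \tau_{ij,m} - \frac{\lVert b_m - k_i \rVert - \lVert b_m - k_j \rVert}{c} \right)^2$$

where $\tau_{ij,m}$ is the TDoA measured between anchors $i$ and $j$ at waypoint $m$. In order to search for the minimum of the objective function, we apply the Levenberg-Marquardt algorithm.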
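The objective function can be evaluated as in the sketch below. Only the cost is shown; the minimization itself would be delegated to a Levenberg-Marquardt solver. The data layout of `measurements`, the sound velocity and the example geometry are assumptions for illustration.

```python
import math


def tdoa_cost(anchors, beacons, measurements, c=343.0):
    """Sum of squared TDoA residuals.

    measurements holds tuples (m, i, j, tau): at waypoint m, the measured TDoA
    between anchors i and j was tau seconds.
    """
    cost = 0.0
    for m, i, j, tau in measurements:
        predicted = (math.dist(beacons[m], anchors[i])
                     - math.dist(beacons[m], anchors[j])) / c
        cost += (tau - predicted) ** 2
    return cost


# Noise-free synthetic measurements from a hypothetical configuration.
true_anchors = [(0.0, 0.0, 3.0), (6.0, 0.0, 3.0), (0.0, 5.0, 3.0)]
waypoints = [(1.0, 1.0, 1.0), (3.0, 2.0, 1.0)]
pairs = [(0, 1), (0, 2), (1, 2)]
measurements = [
    (m, i, j, (math.dist(b, true_anchors[i]) - math.dist(b, true_anchors[j])) / 343.0)
    for m, b in enumerate(waypoints) for i, j in pairs
]
cost_true = tdoa_cost(true_anchors, waypoints, measurements)
shifted_anchors = [(x + 0.5, y, z) for x, y, z in true_anchors]
cost_shifted = tdoa_cost(shifted_anchors, waypoints, measurements)
```

The cost vanishes at the true configuration and grows when the anchors are displaced, which is the property the optimizer exploits once the initialization is close enough to the global minimum.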

Simulations
We designed and performed a simulation in order to verify that the direction adjudgment can be generalized to different anchor topologies.
The simulation depends on both the IMU and TDoA inputs. However, we do not simulate the IMU measurements, since the theoretical models often differ considerably from reality. Therefore, we use real IMU measurements. Based on the randomly generated anchor positions and the recorded beacon trajectories, we calculate the theoretical values of the TDoA measurements and impose Gaussian distributed errors on them. The assumption of a Gaussian distribution does not hold strictly, as was also shown by Cui et al. [27], but for this work we estimated the resulting error to be small enough to be neglected at this stage.

System Error Distribution
First, we have to determine the error level of the TDoA localization system before starting the simulation. Therefore, we calculate the theoretically expected TDoA values using the real trajectories recorded by a Motion Capture system, which has sub-millimeter accuracy and precision, and the true anchor positions. We compared the measured and the expected values for 200,138 TDoA measurements to estimate the measurement errors (see Figure 11). Fitting a Gaussian distribution to the error distribution, we obtained $\mu_{TDoA} = -0.006$ ms and $\sigma_{TDoA} = 0.637$ ms. Since $|\mu_{TDoA}| \ll \sigma_{TDoA}$, we regard the offset of the error distribution to be 0. The localization system actually measures the timestamp of the arrival of the signal. The TDoA is the difference of two timestamps measured by different anchors. Assuming that the errors of the timestamp measurements from all the anchors comply with the same Gaussian distribution, i.e., $e_t \sim \mathcal{N}(\mu_t, \sigma_t^2)$, we have:

$$e_{TDoA} = e_{t,i} - e_{t,j} \sim \mathcal{N}(0, 2\sigma_t^2)$$

Therefore, the standard deviation is calculated as:

$$\sigma_t = \frac{\sigma_{TDoA}}{\sqrt{2}}$$
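The relation between the timestamp noise and the TDoA noise can be checked numerically. The sketch below derives the per-timestamp standard deviation from the reported TDoA value and draws synthetic errors; the fixed random seed, the sample count and the function name are assumptions.

```python
import math
import random
import statistics

SIGMA_TDOA_MS = 0.637  # TDoA error standard deviation reported in the text
sigma_t_ms = SIGMA_TDOA_MS / math.sqrt(2.0)  # per-timestamp standard deviation

rng = random.Random(0)  # fixed seed so that the sketch is repeatable


def noisy_tdoa(true_tdoa_ms):
    """Add independent Gaussian timestamp errors to both anchors of a pair."""
    e_i = rng.gauss(0.0, sigma_t_ms)
    e_j = rng.gauss(0.0, sigma_t_ms)
    return true_tdoa_ms + e_i - e_j


errors = [noisy_tdoa(0.0) for _ in range(20000)]
empirical_sigma = statistics.pstdev(errors)
```

The per-timestamp standard deviation evaluates to about 0.450 ms, and the empirical spread of the simulated TDoA errors comes out close to the 0.637 ms fitted from the measurements.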

Performance Comparison of DA vs. RI
We randomly generated 1000 different anchor configurations. Each configuration contains nine anchors. In reality, one would not install two anchors very close to each other. Therefore, we set the minimum distance between two arbitrary anchors to three meters. Figure 12 gives examples of simulated anchor configurations. In the simulation, we use two target trajectories (see Figure 13). Since we generate 1000 anchor configurations, we have 2000 different simulation scenarios. In the end, we compare the performance of DA-SC and RI-SC over all the scenarios. One of the performance indicators is the success rate. In the self-calibration problem, an anchor positioning error larger than 0.5 m is mainly due to local minima. Therefore, we have the following definition of success: Definition 1. Success of Self-calibration: A self-calibration attempt succeeds if the following condition holds, where $\hat{k}_i$ is the estimation of the anchor position and $k_i$ denotes the true position.
The direct self-calibration results are likely to be in a different coordinate system from the true anchor positions. Therefore, the estimations are rotated and translated into the reference system before the comparison.
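One way to carry out such a rotation and translation in the horizontal plane is a least-squares rigid alignment, sketched below. This is an illustrative choice, not necessarily the procedure used by the authors; it does not handle mirrored solutions, and the function name and example coordinates are hypothetical.

```python
import math


def align_rigid_2d(estimated, reference):
    """Rotate and translate estimated 2D points onto the reference points."""
    n = len(estimated)
    ex = sum(p[0] for p in estimated) / n
    ey = sum(p[1] for p in estimated) / n
    rx = sum(q[0] for q in reference) / n
    ry = sum(q[1] for q in reference) / n
    s_cos = s_sin = 0.0
    for (px, py), (qx, qy) in zip(estimated, reference):
        ax, ay = px - ex, py - ey
        bx, by = qx - rx, qy - ry
        s_cos += ax * bx + ay * by  # sum of dot products
        s_sin += ax * by - ay * bx  # sum of 2D cross products
    theta = math.atan2(s_sin, s_cos)  # least-squares rotation angle
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    aligned = []
    for px, py in estimated:
        ax, ay = px - ex, py - ey
        aligned.append((cos_t * ax - sin_t * ay + rx,
                        sin_t * ax + cos_t * ay + ry))
    return aligned


# Hypothetical anchors; the estimate is the reference rotated by 30 degrees and shifted.
reference = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
alpha = math.radians(30.0)
estimated = [(math.cos(alpha) * x - math.sin(alpha) * y + 2.0,
              math.sin(alpha) * x + math.cos(alpha) * y - 1.0)
             for x, y in reference]
aligned = align_rigid_2d(estimated, reference)
errors = [math.dist(p, q) for p, q in zip(aligned, reference)]
```

After the alignment, the remaining per-anchor distances in `errors` are the positioning errors that a success criterion would be evaluated on.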
Since direction adjudgment is a deterministic process, we evaluate only its single success rate from the first attempt. Table 1 gives the success rate at the first attempt of direction adjudgment (DA-SC) and random initialization (RI-SC) for the two different trajectories.

Table 1. Comparison between success rates at the first attempt of the two self-calibration methods Direction Adjudgment (DA-SC) and Random Initialization (RI-SC).

In addition, Figure 14 shows the difference in performance between DA-SC and RI-SC. The corresponding success rates given in Figure 14 and Table 1 are the percentage of simulated topologies that were successfully estimated by DA-SC. In contrast, randomized initialization is a stochastic process. We repeat RI-SC 20 times, redoing the unsuccessful cases, and count the cumulative success rate after each repetition. We observe that the success rates of DA-SC are nearly 100%. In comparison, the success rates of RI-SC only reach this level after 10-20 repetitions. As shown in Figure 15, since we use the same objective function, the calibration results have a similar error and the resulting mean errors have nearly the same value.

Experiments of IMU-Assisted Self-Calibration
We performed experiments for the verification and further analysis of the simulation results.

Locales and Set-Up
We present measurements from the two separate locales Hangar (see Section 6.1.1) and Messe (see Section 6.1.2). These installations have different characteristics and, therefore, allow us to test our proposed algorithm under different conditions. The receivers are placed at a larger height in the Messe, which increases the dilution of precision. However, in the Hangar experiment there are more obstacles and, therefore, the signals can be reflected or occluded.

Experiments in the Hangar
The anchors in the Hangar are mounted on steel beam girders at a height of approximately $z_H \approx 4.9$ m above the concrete floor and are powered permanently by wire. We measured the real positions of the anchors with a Total Station Topcon GPT-8203A for reference. A part of the anchor set-up is shown in Figure 16a for illustration. We use the Motion Capture (MoCap) system MotionAnalysis Cortex with Raptor-E cameras to record the true trajectory with sub-millimeter precision. The MoCap system only provides reliable positions in the area between the cameras, which results in a limited coverage area that is considerably smaller than that of ASSIST. Figure 16b

Experimental Vehicle
We use an HTC One smartphone for both ASSIST and IMU tracking. The smartphone is fixed to a wooden support rack along with the motion capturing (MoCap) markers, as shown in Figure 18a. We align the smartphone with the midpoint between the rear wheels, with the y-axis of the IMU perpendicular to the rear axle. Moreover, we fix it horizontally with its screen facing upwards. As a result, the speaker, which is located above the top of the device's screen, is suspended in air. This is done to roughly mimic the smartphone being held in front of the chest while moving and to reduce offset errors. This set-up is then fixed to a push-cart as shown in Figure 18b, which we slowly moved through the experimental areas. The cart has two freely rotating front wheels and two fixed-heading rear wheels, known as an Ackermann steering geometry, which is common for wheeled vehicles [28]. All the wheels are nonholonomic, which prevents side slipping. The heading direction of the vehicle's body is always perpendicular to the rear axle [28], i.e., the moving direction is perpendicular to the rear axle.

Experimental Results
In the anchor self-calibration experiments in the Hangar, we calibrate nine anchors. We also validate the direction adjudgment with 17 real-world experiments. Table 2 summarizes these experiments, of which Exp. 01-12 were conducted in the Hangar and Exp. 13-17 in Messe Freiburg. We applied both direction adjudgment self-calibration (DA-SC) and randomized initialization self-calibration (RI-SC) to all experiments and compare them in Table 2. For DA-SC, which we executed only once per experiment, we list the results of both the initialization and the optimization. For RI-SC, we repeated the calibration 100 times per experiment and report the percentage of successful attempts as the success rate. Note that we again use Definition 1 to define the success of a self-calibration. The results show that the success rate is increased while the mean error is similar in both cases, as expected from the simulations. Of special interest are the experiments in the Hangar, whose success rate with randomized initialization is limited due to the large height of the receivers, which increases the dilution of precision. It is worth mentioning that the mean error in these cases is calculated from a reduced number of valid attempts. This is why one observes a larger discrepancy between the mean errors of the two initialization strategies.
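The way the success rate and the mean error over valid attempts interact can be sketched as follows. Definition 1 is not restated in this section, so the threshold used below is only a placeholder for its criterion, and the error values are made up rather than measured.

```python
def summarize_attempts(mean_errors_m, success_threshold_m):
    """Success rate and mean error over successful attempts only.

    mean_errors_m: mean anchor position error of each attempt, in metres.
    success_threshold_m: placeholder for the success criterion of Definition 1.
    """
    successes = [e for e in mean_errors_m if e <= success_threshold_m]
    rate = len(successes) / len(mean_errors_m)
    mean_error = sum(successes) / len(successes) if successes else float("nan")
    return rate, mean_error

# Made-up errors of five randomized attempts, not measured data.
rate, mean_error = summarize_attempts([0.1, 0.3, 5.0, 0.2, 8.0], success_threshold_m=1.0)
print(rate, mean_error)
```

When only a few attempts succeed, the mean error is averaged over a small sample, which makes it less comparable to the single DA-SC run.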

Experiments of Zone Detection with Machine Learning
The aim of this experiment is to compare how reliably each algorithm associates the presence of the beacon with the correct zone.

Experimental Setup
In order to test the capability of an artificial neural network to locate the target with unknown anchor positions in a realistic environment, we perform a real experiment in a cafeteria (see Figure 19). We mount seven anchors on the ceiling. Then, we place a sender in 29 different zones on tables, which are marked by tape. The received timestamps at each zone and the zone number serve as the training set for the machine learning algorithms.
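One plausible way to turn the received timestamps into training samples is sketched below. It assumes that the features are the time differences relative to one reference anchor, which is an illustrative choice and not necessarily the exact preprocessing used in the experiment. The function names and the timestamps are hypothetical.

```python
def tdoa_features(timestamps_s, reference=0):
    """Time differences of arrival relative to one reference anchor.

    timestamps_s: reception time of one signal at each anchor, in seconds.
    The unknown emission time cancels out in the differences.
    """
    t_ref = timestamps_s[reference]
    return [t - t_ref for i, t in enumerate(timestamps_s) if i != reference]

def build_training_set(recordings):
    """recordings: iterable of (zone_number, timestamps_s) pairs."""
    features = [tdoa_features(ts) for _, ts in recordings]
    labels = [zone for zone, _ in recordings]
    return features, labels

# Made-up reception times at seven anchors for two zones.
recordings = [
    (1, [0.010, 0.012, 0.015, 0.011, 0.013, 0.018, 0.016]),
    (2, [0.020, 0.019, 0.024, 0.026, 0.021, 0.023, 0.022]),
]
X, y = build_training_set(recordings)
print(len(X[0]), y)
```

With seven anchors, each sample has six features, and the resulting lists can be passed to any standard classifier.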

Test Set Creation
We create a test set by placing the sender near each of the trained points in order to estimate the percentage of correct associations. Placing the sender near, rather than exactly at, every trained point emulates a real environment, in which the sender does not coincide with the trained points.

Experimental Results
After evaluating the data, we obtained 91% correct associations using the random forest and 88% using the neural network. Note that most of the misclassifications occur between zones in close geometric proximity, which consequently have similar TDoAs. This is indicated by the arrows in Figures 20 and 21. If one allows a certain geometrical error in the estimations, the number of correct assignments is higher. For example, if only the correct table is to be identified, but not the individual position at the table, only the larger misclassifications beyond the table zones need to be considered erroneous. The effect of increased cluster sizes can be seen in Table 3. Throughout the whole experiment, the random forest achieves a larger number of correct assignments than the artificial neural network. This difference is most noticeable for the smallest cluster size. From the results in Table 3, we can conclude that machine learning can be used for localization in situations where one wants to avoid manually measuring the positions of the anchors and only needs to detect whether the speaker is in certain regions.
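The idea of tolerating small geometrical errors can be sketched as follows. A prediction is counted as correct if the center of the predicted zone lies within a given distance of the center of the true zone. This distance criterion is an assumption made for illustration and may differ from the cluster definition behind Table 3. The zone centers and predictions are made up.

```python
import math

def accuracy_with_tolerance(true_zones, predicted_zones, zone_centers, tolerance_m):
    """Fraction of predictions whose zone center lies within tolerance_m of the true one.

    tolerance_m = 0 counts exact zone matches only.
    """
    hits = 0
    for true_zone, predicted_zone in zip(true_zones, predicted_zones):
        tx, ty = zone_centers[true_zone]
        px, py = zone_centers[predicted_zone]
        if math.hypot(px - tx, py - ty) <= tolerance_m:
            hits += 1
    return hits / len(true_zones)

# Hypothetical zone centers in metres and made-up predictions.
centers = {1: (0.0, 0.0), 2: (0.5, 0.0), 3: (4.0, 0.0)}
truth = [1, 1, 2, 3]
predicted = [1, 2, 2, 1]
exact = accuracy_with_tolerance(truth, predicted, centers, 0.0)
relaxed = accuracy_with_tolerance(truth, predicted, centers, 0.6)
print(exact, relaxed)
```

In this toy example, the confusion between the two neighboring zones is forgiven by the tolerance, whereas the confusion with the distant zone is still counted as an error.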

Conclusions
In this work, we have presented two methods for acoustic indoor localization which do not require manually measuring the positions of the microphones. The first method estimates the anchor positions using an inertial measurement unit as an additional source of information. It provides an initial estimate for non-linear optimization algorithms, which often fall into local minima. By doing this, we increase the single-shot success rate. We have shown the performance of our method in both simulations and real-world experiments. The other approach presented in this work uses machine learning to identify the region in which a speaker is located. We have shown experimentally that, using random forest classification, the sender can be located with 94% accuracy for regions of around 0.8 m².
However, various open lines for future work remain. When estimating the position of the microphones, we have multiple restrictions on the trajectory for initialization, as one has to start directly beyond one of the anchor nodes and the trajectory must contain straight line segments. In the future, it would be desirable to overcome those restrictions and allow for more general trajectories.
So far, we have assumed that all the anchors are at a known height and have performed two-dimensional localization only. An obvious line for future work is, thus, to extend the methods to three dimensions.
Furthermore, the machine learning approach could be extended by training the algorithm with other data such as the amplitude of the received signals or all the reception times of the reflections.