Indoor Navigation—User Requirements, State-of-the-Art and Developments for Smartphone Localization

Abstract: A variety of positioning systems have emerged for indoor localization which are based on several system strategies, location methods, and technologies while using different signals, such as radio frequency (RF) signals. Demands regarding positioning in terms of performance, robustness, availability and positioning accuracy are increasing. The overall goal of indoor positioning is to provide GNSS-like functionality in places where GNSS signals are not available. Analysis of the state-of-the-art indicates that although a lot of work is being done to combine outdoor and indoor positioning systems, there are still many problems and challenges to be solved. Most people moving on city streets and through the interiors of public facilities have a smartphone, and most professionals working in public facilities or construction sites are equipped with tablets or smartphone devices. If users already have the necessary equipment, they should be provided with further functionalities that will help them in day-to-day life and work. In this review study, user requirements and the state-of-the-art in system development for smartphone localization are discussed. In particular, localization with current and upcoming 'signals-of-opportunity' (SoP) for use in mobile devices is the main focus of this paper.


Introduction
Indoor positioning, localization and navigation are gaining more attention in society, industry and business. To date, positioning systems have mainly been based on satellite observations with Global Navigation Satellite Systems (GNSS), which cannot be used inside buildings. With the increasing ubiquity of smartphones and other mobile devices, users now routinely carry a variety of sensors with them wherever they go. These devices are enabling technologies for ubiquitous computing, facilitating continuous updates of a user's context [1]. Cell phones can nowadays receive signals from multi-constellation GNSS satellites on two frequency bands (L1 and L5 in the case of the US Navstar Global Positioning System GPS; see, e.g., [2,3]) as well as dual-band Wi-Fi on 2.4 and 5 GHz. Such technologies are predestined for use in Location-based Services (LBS). Moreover, it can be expected that simple tasks of applied surveying can be performed with smartphones in the near future. This saves time and cost, since no additional hardware has to be purchased as the smartphone is a constant companion anyway. In order to investigate to what extent smartphones are suitable for measurement tasks, the achievable accuracy, the measurement effort, the repeatability of the measurement results and the quality of the measurement data are of particular interest. In this paper, especially their usage for positioning in indoor as well as GNSS-challenged and -denied environments is investigated.
Starting from a book chapter on 'Indoor Navigation' written by the author and published in the Encyclopedia of Geodesy in 2016 [4], this paper provides an update on the current state-of-the-art and advances in indoor navigation. That introductory book chapter provides a comprehensive and concise overview of technologies and techniques which can be employed for indoor positioning and navigation. Indoor positioning is defined in this work as any system which attempts to provide accurate positioning inside a covered structure using radio waves, acoustic signals, or other sensory information collected by mobile devices. It is primarily used for real-time location of people or objects in large buildings and in closed areas/spaces. Several types of location-sensing systems exist, each of which has its own strengths and limitations.
Based on a classification of Li and Rizos [5] in the Editorial of the Journal of Location Based Services in 2014 for a special issue of the International Conference on Indoor Positioning and Navigation (IPIN) 2012, the following differentiation of indoor localization technologies and techniques was made. These authors have identified three classes in indoor navigation, i.e., (1) designated technologies based on pre-deployed signal transmission infrastructure, (2) technologies based on so-called 'signals-of-opportunity' (SoP), and (3) technologies not based on signals. Infrastructure-based technologies started with the development of systems using infrared (see, e.g., [6]) or ultrasonic signals (see, e.g., [7,8]), followed by the usage of geomagnetic and/or induced magnetic fields [9], Bluetooth Low Energy (BLE), Wireless Fidelity (Wi-Fi), Zigbee, Radio Frequency Identification (RFID), Ultra-wide Band (UWB), or other RF-based (radio frequency based) systems. These wireless technologies are under rapid development also in relation to smartphone localization. Types of wireless technologies being developed range from simple IrDA that uses infrared light for short-range, point-to-point communications, to wireless personal area network (WPAN) for short-range, point-to-multi-point communications, such as Bluetooth and ZigBee, to mid-range, multi-hop wireless local area network (WLAN or usually referred to as Wi-Fi in the case of positioning), to long-distance cellular phone systems, such as 5G [1]. Thereby the most commonly employed SoP is the usage of Wi-Fi for localization [10]. Apart from Wi-Fi, also mobile telephony, FM radio, digital television, and others are SoP [5]. The third category includes mainly sensors for relative positioning providing continuous localization from a given start position using techniques such as dead reckoning (DR). 
The most usable sensors of this kind embedded into smartphones or other mobile devices are accelerometers and gyroscopes based on MEMS (Micro-electro Mechanical System) technology. With these very low-cost sensors, inertial navigation can be carried out; in combination they are therefore referred to as an inertial navigation system (INS) [11]. In addition, an embedded magnetometer or digital compass in the mobile device can be employed for determination of the direction or heading of the user. Moreover, barometric pressure sensors are found in smartphones which, together with temperature sensors, enable altitude determination, e.g., to estimate the correct floor of a multi-storey building where the user is currently located. In addition, vision/camera systems also belong to the third category, employing scene analysis and visual odometry [12]. Figure 1 summarizes the main technologies and techniques for indoor and outdoor relative or absolute positioning and Figure 2 visualizes their state of maturity and adoption. The 2018 edition of the GNSS User Technology Reports [13], published every year by the European Union Agency for the Space Programme (EUSPA), served as a main source for this overview and summary. In 2022, EUSPA published the first edition of a joint EO (Earth Observation) and GNSS Market Report [14] which provides further information. As can be seen from these two Figures, the technologies and techniques vary depending on the used signals and sensors.
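Floor estimation from barometric pressure can be sketched as follows; this is a minimal illustration (not taken from the cited works) assuming the international barometric formula and a hypothetical floor height of 3 m:

```python
import math

def pressure_to_altitude(p_hpa, p0_hpa=1013.25):
    # International barometric formula (ISA), valid in the troposphere;
    # p0_hpa is the reference sea-level pressure.
    return 44330.0 * (1.0 - (p_hpa / p0_hpa) ** (1.0 / 5.255))

def estimate_floor(p_hpa, p_ground_hpa, floor_height_m=3.0):
    # Relative altitude between the current and the ground-level pressure,
    # mapped to a floor index (3 m per floor is an assumed value).
    dh = pressure_to_altitude(p_hpa) - pressure_to_altitude(p_ground_hpa)
    return round(dh / floor_height_m)
```

In practice the ground-level pressure must be updated continuously (e.g., from a reference sensor) because weather-induced pressure changes would otherwise be misinterpreted as altitude changes.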
In a second book chapter co-authored by the author of this article, entitled 'Navigation Based on Sensors in Smartphones' [11], especially the use of sensors based on MEMS technology together with their integration with wireless options in smartphones is elaborated. Furthermore, several review-type papers from the literature summarizing types of sensors and technologies for indoor navigation, such as [8,15,16], form the basis for the preparation of this article. The paper is organized as follows: In Section 2, the user requirements with their key performance parameters are discussed, followed by the description of the main localization topologies and methods in Section 3. Section 4 is dedicated to inertial navigation (IN), where a change in philosophy is proposed to use the IN sensors as the primary localization technique, updated by absolute localization technologies and techniques to account for the IN sensor errors and drifts. This is followed by Section 5, where the combination of sensors and techniques is discussed. Thereby an emphasis is laid on the use of radio frequency (RF) based wireless techniques (Section 5.1). Section 6 identifies the main smartphone-based localization capabilities and describes their usage in modern indoor navigation systems. A comparison of systems is then provided in Section 7, arranged in three Tables. Finally, concluding remarks are given in Section 8.

User Requirements
If someone talks about localization in general, the user requirements for a certain type of application need to be considered and defined. In this section, the key performance parameters derived from GNSS positioning are identified based on the directive of the GNSS User Technology Report [13] and their relationship is discussed. One very important parameter thereby is the integrity of the solution as indicated in Section 2.3.

Key Performance Parameters
In the GNSS User Technology Report from 2018 [13], the main four dimensions of PNT (Positioning, Navigation and Timing) systems technology development that enable the future of automated intelligent positioning systems are also presented. These are, apart from positioning accuracy and ubiquity, also security and connectivity, forming the foundation visualized in a PNT technology drivers pyramid as shown in Figure 3. Thus, the core statement in [13] is that reliable and robust location systems must be ubiquitous, secure, accurate and connected to provide the basis for modern automation and ambient intelligence. The following descriptions are given in the report:
• Accuracy is obtained thanks to multi-constellation, multi-frequency GNSS, augmented by PPP-RTK (Precise Point Positioning - Real Time Kinematic) services and hybridized with INS and other sensors;
• Connectivity relies on the integration with both satellite and terrestrial networks, such as the mobile 5G networks, LEO (Low Earth Orbit) satellites or LPWANs (Low Power Wide Area Networks);
• Ubiquity is provided by complementary positioning technologies and sensors; and
• Security is provided by the combination of independent redundant technologies, cybersecurity and authentication.
Achieving the goal of continuous navigation in mixed (transition) environments, such as open areas, partially obstructed areas and indoors, requires the fusion of multiple positioning technologies and sensors. Ubiquitous navigation is then achievable, which requires the usage of multi-sensor, low-cost and robust navigation solutions.
In 2022, in the first published combined edition of the EO (Earth Observation) and GNSS Market Report [14], the definitions of the key parameters for navigation are stated more precisely. Annex 2 of the report provides the definitions and characteristics. The most important parameters which are applicable for any type of PNT are: (1) availability; (2) accuracy; (3) continuity; (4) integrity; (5) robustness; (6) authentication and (7) time-to-first-fix (TTFF). These key parameters have been defined comprehensively in the report. Table 1 provides an overview of these key performance parameters and their priorities for mass market solutions and safety and liability critical applications. Other performance parameters are also given which are especially relevant if smartphones or other low-cost devices are used. These parameters are: (1) power consumption; (2) resiliency; (3) connectivity; (4) interoperability and (5) traceability. In the author's opinion these key performance parameters are relevant and have to be applied to any type of localization solution and technology. As presented at the FIG Working Week in 2020 [17], the parameters apply to any PNT application, involving not only GNSS but also other sensors and technologies which are additionally and independently used. In the following, examples are given. For instance, in Wi-Fi or UWB positioning similar key requirements and performance parameters can be formulated and applied. In the case of availability, the number of stationary transmitters (UWB stationary units or Wi-Fi Access Points) plays a similar role to the number of GNSS satellites. These stationary units are thereby referred to as infrastructure nodes or anchors, and their placement is as decisive as the current location of the satellites if one thinks about geometry effects on positioning accuracy.
TTFF in the case of Wi-Fi signal strength-based positioning is highly correlated with the RSSI (Received Signal Strength Indicator) scan duration of a certain mobile device. This is especially important in kinematic positioning. As seen in experiments of the author [18], the RSSI scan durations can vary significantly for different smartphones or other mobile devices, resulting in significantly different achievable positioning accuracies. In the case of pedestrian navigation, the result thereby depends decisively on the user's walking speed. Robustness may have a different meaning for different users, such as the ability of the solution to respond following a serious shadowing event; here, robustness is defined as the ability of the solution to mitigate interference. Especially integrity is often neglected and not given full attention; Section 2.3 is dedicated to this important key parameter. An important performance parameter is also power consumption; especially in the case of mobile devices, power consumption is still very critical for providing a long-term solution. For solutions where ranges are derived from travel time measurements, for instance, continuous measurements are very power consuming. The ability to prepare for and adapt to changing conditions, as described by the parameter resiliency, has to be considered in addition. For instance, signal strength variations and fluctuations, such as is the case for Wi-Fi RSSI-based positioning, have a significant impact on the positioning result. To account for their influence, new robust schemes are necessary and need to be developed. Table 1 provides an overview of the key performance parameters and their priorities for mass market solutions and safety and liability critical applications, whereas Table 2 highlights them for lower and higher performance applications.
As can be seen, the requirements are quite different and therefore different priorities must be considered and applied depending on the type of application. It can be very substantial and critical to decide on the key parameters which have to be achieved in any case for the application in mind [17]. Thus, maintaining overall performance requires the fusion of multiple positioning technologies and sensors. GNSS-only solutions are difficult or even impossible, for instance, in urban canyons or non-line-of-sight (NLoS) conditions leading to multipath effects and a reduction of the number of satellites in view. The gap in satellite coverage or GNSS performance is not acceptable for many applications and is addressed by using complementary technologies (compare their state of maturity and adoption in Figure 2). The different technologies differ quite significantly with respect to their state of maturity. Especially signals-of-opportunity (SoP) have to be highlighted because their usage is one of the great opportunities for future ubiquitous user localization in any environment. Automated systems have progressed very rapidly recently thanks to the development alongside all four dimensions of the PNT drivers pyramid base (Figure 3). The main aims in any type of application are therefore to deliver GNSS-like performance anywhere, anytime, under any operating conditions, as well as to exceed the performance levels of GNSS for safety and liability critical applications.

The Key Parameter Integrity
According to [20] integrity is the most important performance metric from the point of safety. To recall, integrity is mainly defined as the ability of the positioning system to provide warnings to users when it should not be used. Gabela et al. [21] define the integrity of the positioning system solution as a measure of trust one can put in the value of the estimated position [22][23][24]. Thus, integrity means 'a guarantee of safety' practically [25]. As it is stated in [21], 'a guarantee of safety', however, cannot be given without any risk of misleading information associated with it. This risk exists due to the different error sources of the positioning system, such as the signal errors in GNSS positioning that affect the measurement system and need to be limited to a specified tolerable level that differs depending on the application.
Integrity has become relevant in addition to the development of robust positioning systems, to support the further development of any localization system. The way that integrity is ensured and assessed, and the means of delivering integrity-related information to the user, are highly application dependent. A distinction can be made between integrity monitoring and position integrity. The definitions are given as follows:
• Integrity monitoring is the ability of a system to provide timely warnings to users when the system should not be used for navigation [26]; and
• Position integrity is the general performance feature referring to the level of trust a user can have in the value of a given position or velocity as provided by a location system [22].
Integrity monitoring was first used in civil aviation [20,26,27] and has now become a key performance metric for developing more robust integrity monitoring algorithms for applications in GNSS-challenged/denied environments [24,28]. Integrity parameters are threefold, i.e., (1) the integrity risk IR; (2) the alarm limit AL; and (3) the protection level PL. Figure 4 shows the integrity parameters PL and AL and their relationship to an estimated positioning solution and the ground truth. The IR is the probability that, at any moment in a certain reference time interval, the position error (PE) exceeds a confidence interval. That confidence interval can be called PL. The PL can be defined as a 'radius of an interval (of a circle in a plane), with its centre being at the true position, which describes the region which is assured to contain the estimated quantity' with probability 1 − IR. It means that the estimated position solution (by the positioning system) is bounded by the PL (estimated by the integrity algorithm) with the probability of 1 − IR. AL, on the other hand, is a radius of an interval (of a circle in a plane), with its centre being at the true position, which describes the region which is required to contain the indicated position with a probability 1 − IR.
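The interplay of the position error (PE), protection level (PL) and alarm limit (AL) can be illustrated with a small sketch. The classification below mirrors the logic of a Stanford-style integrity diagram; it is an illustrative simplification for a single epoch, not an integrity algorithm from the cited literature:

```python
def integrity_state(pe, pl, al):
    # Classify one epoch given (all in metres):
    #   pe: position error (unknown in real operation, known in evaluation),
    #   pl: protection level estimated by the integrity algorithm,
    #   al: alarm limit required by the application.
    if pl >= al:
        return "unavailable"   # system declares itself unusable for this application
    if pe <= pl:
        return "nominal"       # error is safely bounded by the protection level
    if pe <= al:
        return "misleading"    # PE exceeds PL but is still below the alarm limit
    return "hazardous"         # PE exceeds AL without any warning to the user
```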

Relationship between Key Parameters
The relationship between the key parameters accuracy, integrity, continuity and availability is depicted in Figure 5. From bottom to top the relationship between all four parameters is shown. The relationships start from accuracy to integrity followed by continuity and availability. They are crucial for any kind of development in the PNT field.
As aforementioned, integrity has become relevant in addition to the development of robust positioning systems, to support further development in PNT. As can be seen from Figure 5, the integrity parameter is directly linked to continuity and availability, similarly to accuracy. Moreover, a system has to provide continuous localization capabilities and a high availability [29]. Looking back at the PNT technology drivers pyramid in Figure 3, the most important key points for PNT challenges and their possible solutions are identified. Together with the identified key parameters, it will be possible to develop robust positioning solutions. This requires, however, the fusion of multiple positioning technologies and sensors for maintaining performance in all contexts.

Localization Topologies and Methods
Apart from positioning and tracking as well as absolute and relative positioning, physical or geometrical and symbolic localization are distinguished when referring to common localization methods [8]. The description of a certain location with coordinates that identify a location on a map is the most typical example of geometrical localization. If an address is provided and/or landmarks are used in positioning and navigation, symbolic localization is applied. Moreover, four different location system topologies are distinguishable; they are based on self- and remote-positioning. In the first, self-positioning topology, a mobile station (MS) or device uses measurements from outside transmitters which are placed at known locations to locate itself. Two forms are commonly employed depending on where the location is calculated, either in the mobile device or in a network, such as a cell phone network. This is referred to as MS-based for the first self-positioning topology and MS-assisted for the latter. Remote-positioning concerns the topology where measurements are carried out in a network. In this case, the mobile unit serves as a signal transmitter and several fixed stations at known coordinates take the measurements. For both topologies, either direct or indirect positioning is distinguished depending on where the measurements and position calculations take place. The MS-based approach is described by direct self-positioning, as both the measurements and the position estimation are carried out in the mobile unit. The MS-assisted approach is indirect remote-positioning, as the measurements are taken in the mobile unit but the calculation of the position is performed, e.g., on a central server in the network. Indirect self-positioning, on the other hand, denotes the topology where the measurements made in the network are sent via a data link to the mobile unit which estimates its position. Direct remote-positioning is then, in consequence, the topology where both the measurements and the position estimation are carried out centrally on a network server.
Moreover, localization can be classified into cell-based positioning, proximity technique, (tri/multi)lateration and angulation, hyperbolic lateration, scene analysis, location fingerprinting, dead reckoning (DR) and hybrid solutions. The author provides a comprehensive overview of these techniques applied to indoor localization in [4]. In the following, only the principles of the most relevant methods for the localization of smartphones are briefly discussed and new developments are highlighted.

Cell-Based Positioning
The cell-based approach, referred to as cell-ID (identification) or cell-of-origin (CoO), is the simplest and most straightforward method. It is based on the cell identity and the location associated with it [30,31]. The location of a smartphone, for instance, is described in relation to its vicinity to known object(s), such as cell towers (base transceiver stations, BTS) in a cellular phone network, Wi-Fi APs (Section 5.1.1) or Bluetooth tags or iBeacons (Section 5.1.2); the location of the known object, or the symbolic cell-ID, then defines the user's location. Obviously, the size of the associated cell determines the achievable positioning accuracy [32].
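As a minimal illustration of the cell-of-origin principle, the user's position is simply set to the location of the strongest (i.e., presumably nearest) transmitter heard in a scan; the identifiers and data structures below are hypothetical:

```python
def cell_of_origin(scans, cell_locations):
    # scans: {cell_id: rssi_dbm} observed by the mobile device;
    # cell_locations: {cell_id: (x, y)} of the known transmitters.
    # Adopt the location of the strongest transmitter in the scan.
    best = max((c for c in scans if c in cell_locations),
               key=lambda c: scans[c])
    return cell_locations[best]
```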

Lateration
Lateration uses range measurements to known locations, similar as in conventional surveying. Using the intersection of at least three spherical surfaces, where the centers are the known locations and the radii the measured ranges, the user's location can be estimated. Depending on the number of measured ranges, tri- and multilateration can be distinguished. To obtain a range, either one- or two-way travel times between the unknown and known position are measured. The former is also referred to as time-of-arrival (ToA) or time-of-flight (ToF) and the latter as round trip time of flight (RToF) or short round trip time (RTT) [8]. Wi-Fi RTT is an example of this approach; see Section 5.1.1 for further details. As an alternative to direct measurement of travel times, ranges to known locations can also be derived from signal strength measurements. A typical example is RSSI-based Wi-Fi positioning. Theoretically, the RSSI decreases with the transmitted energy propagating into space. Path loss models, such as modelling the propagation on the logarithmic scale, can be employed to establish the relationship between the RSSI and propagating distances [32]. The relationship can be defined as given in Equation (1):

P [dBm] = 10 · log10(P [mW] / 1 mW) (1)

where P is the transmission power emitted by a transmitter, such as a Wi-Fi AP, given in the logarithmic unit decibel-milliwatt (dBm). The unit Bel is a logarithmic quantity and is defined by reference to a certain value. For dBm the reference value is 1 milliwatt (mW), which corresponds to 0 dBm. Thereby values above 1 mW result in positive dBm values, values below 1 mW in negative dBm values. The transmission power of a Wi-Fi AP depends, among other things, on the frequency band on which the signal is transmitted [10].
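The intersection of spherical surfaces described above can be solved in closed form by linearizing the range equations, here by subtracting the first anchor's equation from the others. This is an illustrative least-squares sketch, not an implementation from the cited references:

```python
import numpy as np

def trilaterate(anchors, ranges):
    # anchors: (n, 2) known coordinates; ranges: (n,) measured distances.
    # Subtracting the first anchor's sphere equation from the others
    # yields a linear system 2 (a_i - a_0) . x = r_0^2 - r_i^2 + |a_i|^2 - |a_0|^2.
    anchors = np.asarray(anchors, float)
    r = np.asarray(ranges, float)
    a0, r0 = anchors[0], r[0]
    A = 2.0 * (anchors[1:] - a0)
    b = (r0**2 - r[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares solution
    return pos
```

With more than three ranges the redundant observations are averaged in the least-squares sense, which mitigates individual range errors.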
A simple path loss model is the so-called one-slope model, which can be described as given in Equation (2) [33]:

P = P_0 - 10 · γ · log10(d) (2)

This one-slope model is a very simple empiric model based on the principle of free space loss of the signals. The modelled loss depends thereby only on the damping factor γ and the logarithmic distance d between the transmitter and receiver. P is the received empirical RSSI and P_0 the reference RSSI at a distance of 1 m.
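Equation (2) can be evaluated and inverted to derive a range from a measured RSSI; the reference RSSI P_0 = -40 dBm and damping factor γ = 2 used below are purely illustrative values:

```python
import math

def one_slope_rssi(d, p0=-40.0, gamma=2.0):
    # Predicted RSSI (dBm) at distance d (m); p0 is the reference RSSI
    # at 1 m, gamma the damping factor (values here are illustrative).
    return p0 - 10.0 * gamma * math.log10(d)

def rssi_to_distance(p, p0=-40.0, gamma=2.0):
    # Invert the one-slope model to derive a range from a measured RSSI.
    return 10.0 ** ((p0 - p) / (10.0 * gamma))
```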
Further refinements, applicable especially in indoor environments, can be made by modelling the walls between the user and the transmitter or even by using ray launching and ray tracing. For the first, the so-called multi-wall model is a straightforward realization. It is given by the following form:

P_rec = P_0 - 10 · γ · log10(d) - Σ_i D_i (3)

with the additional parameters D_i, which are the damping values of the i-th wall along the direct path, and P_rec, the received RSSI. As in the one-slope model, the other parameters can be determined with a least-squares adjustment in a practical scenario by means of measured RSSI values. It is a semi-empiric model, which means in this context that the current surroundings are incorporated in the signal propagation estimation model. Simply speaking, the model considers the damping characteristics of the existing walls between transmitter and receiver, whereby only the direct path between them is treated. More realistic results than with the one-slope model are achievable, although the limitations of the multi-wall model are soon reached if small structures, such as corridors, columns, narrow staircases, etc., and different materials in the walls exist. Alternatives are ray launching and ray tracing, which are deterministic, ray-optical propagation models. They model the physical propagation laws (absorption, diffraction and reflection) on the basis of objects. Processing power, however, can be very high if a large number of objects in the path are considered. In ray launching, on the one hand, all isotropic rays from a transmitter and their resultant signal strengths are modelled and calculated, whereas in ray tracing, on the other hand, the possible rays to the transmitter are determined in reverse from the receiver in order to calculate the energy loss afterwards. To achieve acceptable results with these models, however, the physical properties of the objects need to be known accurately.
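The multi-wall model above extends the one-slope term by the damping values of the walls crossed on the direct transmitter-receiver path; the following sketch uses illustrative parameter values:

```python
import math

def multi_wall_rssi(d, wall_dampings, p0=-40.0, gamma=2.0):
    # Predicted received RSSI (dBm): one-slope term minus the damping
    # value D_i (dB) of every wall crossed on the direct path.
    # p0 and gamma are illustrative, determined by adjustment in practice.
    return p0 - 10.0 * gamma * math.log10(d) - sum(wall_dampings)
```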
Physical changes of the environment, as simple as opening doors or windows, and the presence of people, however, cause an uncertainty for the modelling of the ray propagation [33].
With a differential approach for Wi-Fi positioning developed by the author [33,34], the impact of spatial and temporal signal variations on the result is reduced. This approach was termed Differential Wi-Fi (DWi-Fi) in analogy to the well-known Differential GPS (DGPS or DGNSS) operational principle. Instead of theoretical path loss models, continuous RSSI scans carried out at reference stations are utilized. With these measurements, the positioning accuracy and reliability of the user can be improved. Wi-Fi scanners as reference stations may be collocated with APs and serve the purpose of deriving range corrections in a network. Low-cost computers, such as Raspberry Pi units, are used in this approach to operate simultaneously as both APs and reference stations. They emit and scan Wi-Fi signals at the same time. The result is that DWi-Fi outperforms common RSSI-based lateration methods in complex environments.
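The operational principle can be sketched as follows: a reference station at a known distance to an AP observes the difference between measured and modelled RSSI, and this correction is applied to the rover's measurement before inverting the path loss model. This is a strongly simplified, hypothetical illustration of the principle described in [33,34], not the actual DWi-Fi implementation:

```python
import math

def model_rssi(d, p0=-40.0, gamma=2.0):
    # A priori one-slope prediction (illustrative parameter values).
    return p0 - 10.0 * gamma * math.log10(d)

def dwifi_correction(ref_measured, ref_distance):
    # Correction observed at a reference station whose distance to the
    # AP is known: measured minus modelled RSSI (in dB).
    return ref_measured - model_rssi(ref_distance)

def corrected_range(rover_rssi, correction, p0=-40.0, gamma=2.0):
    # Remove the network-derived correction before inverting the model.
    return 10.0 ** ((p0 - (rover_rssi - correction)) / (10.0 * gamma))
```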

Hyperbolic Lateration
If distance differences are measured, the lines-of-position (LOPs) are hyperbolas. In other words, the resulting hyperbolas of measurements between two transmitters are lines of constant distance difference, with each transmitter, such as a base transceiver station (BTS) in a cellular network, located in one of its foci. To obtain a unique 2D position fix, measurements to at least three transmitters or receivers are necessary, or additional information such as the approximate position needs to be known. The technique is also referred to as measurement of the time difference of arrival (TDoA). Further information about hyperbolic lateration may be found in [4].
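Hyperbolic lateration can be solved iteratively; the sketch below applies a Gauss-Newton adjustment to 2D range differences measured relative to the first station, and is an illustrative solver rather than a production TDoA algorithm:

```python
import numpy as np

def tdoa_fix(stations, ddiffs, x0, iters=20):
    # stations: (n, 2) transmitter coordinates; ddiffs: (n-1,) measured
    # range differences d_i - d_0 relative to the first station;
    # x0: approximate position (selects the correct hyperbola branch).
    s = np.asarray(stations, float)
    dd = np.asarray(ddiffs, float)
    x = np.asarray(x0, float)
    for _ in range(iters):
        d = np.linalg.norm(s - x, axis=1)                  # ranges to all stations
        res = dd - (d[1:] - d[0])                          # observed minus computed
        J = (x - s[1:]) / d[1:, None] - (x - s[0]) / d[0]  # Jacobian of d_i - d_0
        dx, *_ = np.linalg.lstsq(J, res, rcond=None)
        x = x + dx
        if np.linalg.norm(dx) < 1e-10:
            break
    return x
```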

Location Fingerprinting
Location fingerprinting, however, is the most commonly applied technique when it comes to RSSI-based localization using pattern recognition [8]. It can be considered a feature-based positioning technique, as a spatially varying feature, in this case the RSSI, is used for user localization. Fingerprinting always includes two phases, i.e., the off-line training phase and the on-line positioning phase. Figure 6 illustrates these two phases. The estimation of the user's location can be based either on deterministic or probabilistic approaches [8]. Deterministic location estimation is based on the similarity of the RSSI measurements and the fingerprints in a database of RSSI fingerprints. Each RSSI sample is not used separately, but the sample averages of different transmitters are collected into a vector and used to estimate the mobile device's location [35]. A vector distance between the database entries and the currently measured RSSI values is estimated for localization. Most commonly, the Euclidean vector distance d is selected (see, e.g., [36][37][38][39]). The vector distance d can be given by:

d = || f_obs - f_map || (4)

where f_obs is the observed fingerprint and f_map the respective fingerprint in the fingerprint radio map. The RSSI distributions in the database are visualized in so-called radio maps indicating the RSSI values for a certain AP (compare Figure 6). The vector distance d between the observed fingerprint f_obs and the fingerprint f_map^i in the radio map is calculated, and the position with the shortest distance in the radio map, i.e., the nearest neighbour (NN), yields the unknown location:

X_NN = arg min_i d(f_obs, f_map^i) (5)

In this NN algorithm, the Euclidean distance d given in Equation (4) is thus calculated in the positioning phase for the database entries of all APs.
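The nearest-neighbour search over Euclidean fingerprint distances can be sketched in a few lines; the radio map layout (reference positions mapped to per-AP RSSI vectors) is an illustrative simplification:

```python
import math

def nn_position(f_obs, radio_map):
    # radio_map: {(x, y): [rssi per AP]} collected in the training phase;
    # f_obs: RSSI vector observed in the positioning phase.
    # Return the reference position with the smallest Euclidean distance.
    return min(radio_map, key=lambda pos: math.dist(f_obs, radio_map[pos]))
```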
If several neighbours with a small distance to the location to be determined are found, they can be considered for the estimation of the location; the K-nearest neighbour (KNN) or the weighted K-nearest neighbour (WKNN) algorithm can then be applied. In this case, K reference points (RP) in the RSSI distribution in the fingerprint database are compared to the observed measurements to select the K RPs with the nearest RSSI values. The value K, larger than 2, can be an arbitrarily selected number or can be determined by a threshold value for a certain minimum distance. In the KNN approach, the location of the user is usually the centre of gravity of the K positions X_NN,j with the K smallest vector distances [10]:

X = (1/K) · Σ_{j=1..K} X_NN,j (6)

This KNN approach can still be optimized by calculating a weight for each of the K fingerprints, on the basis of which the centre of gravity of all K fingerprints can be estimated as the location of the smartphone user, i.e., the WKNN approach. An empirical determination of K values revealed that no significant improvement of the mean deviations from the ground truth is achieved by increasing K while measuring along waypoints of predefined trajectories. Contrary to expectations, the mean deviations even increased slightly as the value of K increased over a value of 5. However, this increase was only in the centimetre range [18].
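A WKNN variant of the centre-of-gravity estimate, with weights inversely proportional to the fingerprint distance, can be sketched as follows (inverse-distance weighting is one common choice, not necessarily the one used in [10,18]):

```python
import math

def wknn_position(f_obs, radio_map, k=3, eps=1e-6):
    # Weighted centre of gravity of the K reference points with the
    # smallest fingerprint distances; weights ~ 1 / distance, with a
    # small eps guarding against division by zero for exact matches.
    nearest = sorted(
        (math.dist(f_obs, rssi), pos) for pos, rssi in radio_map.items()
    )[:k]
    w = [1.0 / (d + eps) for d, _ in nearest]
    wsum = sum(w)
    x = sum(wi * pos[0] for wi, (_, pos) in zip(w, nearest)) / wsum
    y = sum(wi * pos[1] for wi, (_, pos) in zip(w, nearest)) / wsum
    return (x, y)
```

With equal weights (w_j = 1/K) this reduces to the plain KNN centre of gravity.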
As a suitable alternative, machine learning algorithms, such as neural networks, random forests, decision trees, or support vector machines (SVM), may also be applied in deterministic fingerprinting [40]. Figuera et al. [41] provide a good overview and discussion of neural network and SVM algorithms for Wi-Fi positioning. They found, however, that commonly employed learning algorithms do not significantly outperform the KNN approach in most cases. They achieved an improvement in the performance of the employed SVM algorithm only if a priori information was used in addition within the learning machine, such as the spectral information of the training data set and a complex output that exploits the cross information between the two dimensions of the location. A greater performance improvement for fingerprinting is mainly achievable only if probabilistic positioning approaches are employed. Two selected approaches are presented in the following.
Usually in probabilistic approaches [42], the sample of measurements collected during the training phase is exploited more efficiently. In these approaches, the conditional probability density function (PDF) of the unknown position is calculated, whereby the prior distribution is usually assumed to be uniform. Using Bayes' theorem (see, e.g., [43]) and the measurements, the posterior PDF can be estimated. The fingerprints contain information about the signal characteristics across the cells. The normalized histogram pattern measured at a reference point from a transmitter can then be interpreted as the distribution of the RSSI sample from that transmitter. Several approaches for the calculation are available, e.g., the histogram method and the kernel method (see [36,42]). On the other hand, a simple relationship can be derived using the Mahalanobis distance as given in Equation (7):

d_M = \sqrt{\left( f_{obs} - f_{map} \right)^{T} C^{-1} \left( f_{obs} - f_{map} \right)} (7)

where C is the empirical covariance matrix of the RSSI observations.
In the case where the covariance matrix C is the unit matrix, the Mahalanobis distance given in Equation (7) corresponds to the Euclidean distance used in the deterministic fingerprinting approach (see Equation (4)). As the inverse of the covariance matrix serves as the weight matrix, the weighted square sum of the RSSI differences between the off-line training and the on-line positioning phase is calculated to obtain the Mahalanobis distance. The weights are thereby inversely proportional to the variances of the corresponding fingerprints. A large number of studies have shown that probabilistic fingerprinting offers higher accuracy than the deterministic approaches in indoor positioning, as it takes better account of signal fluctuations. Leb and Retscher [44] showed that in kinematic positioning, where the pedestrian user walked at normal step speed, positioning accuracies of 2 m on average are achievable using the Mahalanobis distance. This accuracy is comparable with commonly employed algorithms. Further details can be found in [10]. A particle filter (see, e.g., [45]), for instance, can then be used for localization.
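The reduction of the Mahalanobis distance to the Euclidean distance for a unit covariance matrix, and the down-weighting of noisy APs, can be illustrated with a minimal sketch (the fingerprints and variances are invented for illustration):

```python
import numpy as np

def mahalanobis_distance(f_obs, f_map, cov):
    """Mahalanobis vector distance between an observed fingerprint and a
    radio-map entry; the inverse covariance acts as the weight matrix."""
    diff = np.asarray(f_obs, dtype=float) - np.asarray(f_map, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

f_obs = [-58.0, -57.0]   # on-line RSSI measurement (dBm), two APs
f_map = [-60.0, -55.0]   # fingerprint from the training phase

# With the identity matrix as covariance the distance equals the Euclidean one
d_eucl = mahalanobis_distance(f_obs, f_map, np.eye(2))

# A noisy AP (variance 16 dBm^2 vs. 1 dBm^2) is down-weighted, shrinking d
d_weighted = mahalanobis_distance(f_obs, f_map, np.diag([1.0, 16.0]))
print(d_eucl, d_weighted)
```

The second call returns a smaller distance because the difference at the noisy AP contributes with weight 1/16 instead of 1.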
In [46], experiments are presented where the probabilistic fingerprinting approach based on Mahalanobis distance was applied for localization using Wi-Fi in a University library. As an example, Figure 7 shows the estimated vector distances in the unit dBm for five different reference points (waypoint 1 to 5) which are located in the University library.
The horizontal axis in Figure 7 shows the number of the waypoints along the trajectory. As can be seen from the Figure, the positions at waypoints 1, 2, 4 and 5 have been correctly determined in the positioning phase, because the minimum of the Mahalanobis distance is estimated at these locations. The on-line measurement in the positioning phase at waypoint 3, however, has its minimum at point 5, which means that this position has not been assigned correctly. This simple example demonstrates the advantages of using the Mahalanobis vector distance for probabilistic fingerprinting. The main reason for this result is the knowledge of the covariance matrix, whereby the standard deviations of each fingerprint must, however, be known. The waypoint with the shortest vector distance is then the desired location. Figure 7. Positioning using the Mahalanobis distances at five different reference points CP01 to CP05 which are located in the University library of TU Wien (Source: [46]).
For validating the achievable positioning performance of location fingerprinting, the Cramér-Rao Lower Bound (CRLB) (see, e.g., [47][48][49][50]) on the Root Mean Square Error (RMSE) is applied in [46] to analyze the resulting deviations from the ground truth. Figure 8 shows a visualization of the resulting CRLB on the RMSE for the ground floor in the library for one smartphone in two opposite user orientations. Low CRLB values in dark blue indicate higher positioning accuracies during the on-line positioning phase, while higher values in red mean lower accuracy. As seen from the Figure, two areas exist where the CRLB is 2 to 3 m (green-yellow areas), while in the other parts of the area it has only values of 0.5 to 1 m. The CRLB on the RMSE is a suitable variable for performance analysis of localization systems in general if it is applied to errors of the positioning solution, such as deviations of estimated locations from a ground truth.

Scene Analysis
Scene analysis is a technique in which a video/image or the electromagnetic characteristics viewed or sensed from a target object are examined and matched [15]. Smartphone cameras can be used for vision-based positioning. Specific tags or significant patterns, for instance, captured in the images are then used to determine the user's location. Edge detection or the determination of vanishing points in the images is commonly employed. In addition, perspective images of the environment can be compared with pre-recorded images stored in a database to obtain the current location. Because of this analogy, the technique can be seen as similar to location fingerprinting. During the off-line phase, images from reference points are captured to build up the database. As these image databases require a lot of memory, various features such as edges, corners, blobs, ridges, etc., are extracted from the images and stored in the database. In the on-line phase, the user captures images. Feature extraction then takes place before the matching process with the images in the database. For this purpose, often image histograms, edges, blobs, and their spatial relationships are extracted. The selection of efficient feature extraction techniques and an appropriate feature matching measure is thereby essential for a good vision-based localization system [16]. If RF signals are used instead, this leads to location fingerprinting, where a spatially variable feature, such as the RSSI, is used (see Section 3.4). As mentioned above, this was first applied in 2000 for Wi-Fi positioning in the RADAR system [35]. A further description of Wi-Fi positioning follows in Section 5.1.1.

Promising Alternative Techniques
In this section, only two promising alternative techniques for indoor localization are briefly mentioned. No comprehensive discussion of all possible techniques is carried out here, as this would be far beyond the scope of this paper.
An alternative to the aforementioned techniques are QR codes distributed in the area of interest, such as in a warehouse or office building. They can be detected in images, leading to localization and/or route guidance. These codes can also serve as landmarks for symbolic localization. A simple smartphone App-based solution is presented in [51]. Here, the user has to install an App on their mobile device, and as the user enters a building, all information needed for navigation (maps, contact lists, etc.) is automatically downloaded and updated. An alternative is an online solution where the requested information is retrieved via a web browser by reading a URL encoded in the QR code. In this case, the URL includes the address of the maps on an external server, and an internet connection is required at all times during the visit to the building. The advantage of this second strategy is that the storage of the device used is not overloaded with a large amount of data. A similar approach was implemented by TU Wien for contact tracing during the COVID-19 pandemic. Users scan the QR code at the building entrance and are then taken to a website where their entry to the University building is recorded in the TU Wien information system. A wide range of other applications of QR codes is possible. The connection to vision-based positioning will be further discussed in Section 5.2.
Moreover, white LEDs (light-emitting diodes) can serve as another possibility for indoor localization. In this case, position information can be derived from a range of properties of the received signal, such as the received signal power or the angle, i.e., angle-of-arrival (AoA), at which the signal reaches the receiver [52,53]. Another approach is the combination of ToA and AoA measurements. From the direction angles and the distances, the location of the user can be estimated. A system developed by the BU Center for Information and Systems Engineering called 'ByteLight' turns LED light sources into positioning beacons [54]. ByteLight-enabled lights transmit proprietary signals which can be picked up by camera-equipped mobile devices. Once signals are detected, the device calculates its position without the need for an active network connection. The manufacturer claims that the indoor positioning solution is accurate to less than one metre and takes less than a second to compute.

Inertial Navigation (IN) as Primary Sensor
Previously, INS were mostly employed to bridge gaps in GNSS positioning for a short, limited time. Trajectory estimation is thereby usually based on filtering techniques, such as the popular Kalman filter or extended Kalman filter [55]. Due to the high drift, which causes rapid error growth of the INS, the period over which GNSS outages can be bridged is very short, and frequent updates with known absolute positions are required. This strategy of bridging GNSS gaps can be seen as the classical approach to positioning and navigation. In new developments, a changed navigation philosophy is applied: the INS is considered the primary navigation sensor and the focus lies on bounding the INS error growth. This approach allows a flexible and adaptive blend with other sensors, including unconventional sensors not designed for navigation. Here, wireless technologies come into play, as they can provide absolute positioning capabilities. MEMS tri-axial accelerometers and gyroscopes have become standard features in mobile devices, which enables the INS observations in smartphones to be combined and integrated with absolute positions coming from Wi-Fi or other sources [11]. Research challenges in this context include the development and application of flexible software architectures, adaptive data filtering and sensor fusion, stochastic transitions between different hybridizations, as well as the usage of intelligent algorithms, such as machine learning. Further details are provided in the following sub-sections.

INS Trajectory Estimation
The trajectory of a mobile user can be estimated from position, velocity, acceleration and orientation measurements over time. The sensor package employed is referred to as an Inertial Measurement Unit (IMU). With MEMS-based accelerometers, gyroscopes, and magnetometers, the distance travelled as well as the direction of movement (heading or azimuth α) is determined. GNSS and other wireless positioning techniques are then utilized to update the estimated INS trajectory of the IMU. Time synchronization of the sensors is thereby also an essential aspect of successful sensor fusion. Estimation theory in general, and Kalman filtering (see, e.g., [55][56][57]) in particular, provide a theoretical framework for combining information from various sensors. By properly combining the information from an INS and other absolute positioning systems, the errors in position and velocity are compensated for. In the following, the operational principles of IN, dead reckoning (DR) and map matching (MM) are briefly reviewed, followed by a discussion of activity detection for pedestrian users as well as altitude determination in indoor environments.

Inertial Navigation (IN)
IN is based on DR, where a position change between consecutive measurement epochs is derived from the measured distance travelled and the corresponding azimuth. Hence, the position, orientation, and velocity (direction and speed of movement) of a moving user can be continuously estimated, and the trajectory is obtained by sequential calculation. In general, an INS consists of three orthogonally arranged motion sensors, i.e., accelerometers, for determination of the acceleration vector, and three orthogonally arranged rotation sensors, i.e., rate gyroscopes, for attitude determination; they measure linear acceleration and angular velocity, respectively. Their observations are converted into new estimates of the position and orientation of the system. Six degrees of freedom, i.e., three translations and three rotations, have to be estimated to be able to determine a movement in space [11].
The main drawback is that IMUs suffer from integration drift: small errors in the measurement of acceleration and angular velocity are integrated into errors in velocity, finally leading to ever greater errors in position. Since each new position is calculated from the previously calculated position and the measured acceleration and angular velocity, these errors accumulate over time. An error in the acceleration measurements causes an error in the distance which grows quadratically with time due to the double integration. Hence, the estimated position must be periodically updated by observations from absolute positioning systems.
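The quadratic error growth from double integration can be demonstrated numerically; the bias value and sampling rate below are illustrative assumptions:

```python
def position_error_from_bias(bias, dt, steps):
    """Double-integrate a constant accelerometer bias (m/s^2) over time.
    The resulting position error grows roughly as 0.5 * bias * t**2."""
    v = x = 0.0
    for _ in range(steps):
        v += bias * dt   # first integration: bias accumulates into velocity
        x += v * dt      # second integration: velocity error drifts the position
    return x

# A small 0.01 m/s^2 bias at 100 Hz over one minute (6000 samples)
err = position_error_from_bias(0.01, 0.01, 6000)
print(round(err, 2))  # → 18.0 m, matching 0.5 * 0.01 * 60**2
```

Even this tiny bias drifts the position estimate by roughly 18 m after a single minute, which is why MEMS IMUs alone cannot sustain indoor positioning.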

Dead Reckoning (DR) Principle
In DR, one starts from a known location, e.g., determined by GNSS or another wireless absolute positioning technique. Using the observations of the inertial sensors, the current position of the user can then be dead reckoned by projecting course and speed from the known previous position [58,59]. Hence, an INS can be used to obtain a relative position estimate by means of DR. The current position is always calculated relative to the previously calculated position, and no direct relation to the true position can be established. This process, however, is subject to significant cumulative errors due to many factors, as both velocity and direction must be accurately known at all times. All errors and uncertainties of the process accumulate, so that the error and uncertainty of the position grow with time [11].
In the case of pedestrian navigation, PDR (Pedestrian Dead Reckoning) is applied to utilize the observations of the smartphone IN sensors, for instance. The operational principle is the same as in DR for vehicle navigation. PDR simply consists of counting steps and estimating a step length (or walking speed) as well as the course over ground (or direction of walking). The PDR technique is most effective if a rigid mounting point on the pedestrian is used. MEMS-based IMUs can be worn on the human body, as illustrated in Figure 9.
Optimum IMU locations are foot-mounted, such as on the shoe, or on a belt on the hips at the back of the person, for instance, if step detection (see Section 4.5) is carried out. Still acceptable locations are holding the IMU (e.g., in the smartphone) in the hand or carrying it in the trousers pocket. The IMU must thereby be worn by the same user, since the step model is trained with a particular individual's walking patterns [60]; otherwise, a calibration for different users is required.
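The PDR position update itself is a simple trigonometric propagation per detected step. A minimal sketch, with step length and headings as assumed example values:

```python
import math

def pdr_update(x, y, step_length, heading_deg):
    """Advance the position by one detected step along the azimuth (heading),
    measured clockwise from north: east += L*sin(a), north += L*cos(a)."""
    a = math.radians(heading_deg)
    return x + step_length * math.sin(a), y + step_length * math.cos(a)

# Four 0.7 m steps heading due east (azimuth 90 degrees) from the origin
x, y = 0.0, 0.0
for _ in range(4):
    x, y = pdr_update(x, y, 0.7, 90.0)
print(round(x, 2), round(y, 2))  # → 2.8 0.0
```

Each epoch only needs a step event, a step-length estimate and a heading, which is exactly why errors in any of the three accumulate directly into the trajectory.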

Map Matching (MM)
Map matching (MM) describes the process in which an estimated trajectory of a vehicle, e.g., obtained via DR, is matched to a digital road database. Thus, MM is defined as the technique that combines an electronic map with location information to obtain the real position of vehicles. It uses spatial road network data to determine the spatial reference of the vehicle location [61]. MM algorithms can be used not only to place the vehicle's position in the correct road segment but also to improve the positioning accuracy if good spatial road network data are available [62]. Different algorithms are applicable, which have in common that they use spatial information in an attempt to improve navigation accuracy. Either the current vehicle location, the distance travelled, or the curvature pattern of the road elements and their mutual allocation is utilized by cross correlation [63]. Cameras, GNSS, etc., can improve positioning accuracy. For Intelligent Transportation System (ITS) applications, however, these algorithms are mostly not yet capable of supporting the navigation requirements [11]. High-definition (HD) maps with centimetre-level accuracy and correspondingly ultra-high resolution are key to autonomous driving [64]. The creation of HD maps is a challenge. Currently, outdoor mobile mapping systems equipped with high-end LiDAR, cameras, RADARs, GNSS and INS are employed for the creation and updating of these HD maps.
In buildings, MM algorithms may employ the topology of polygons derived from indoor maps. Furthermore, Building Information Models (BIM) can support pedestrian navigation in an indoor environment. BIM is based on an electronic record of the full knowledge and data about a building object and establishes an intelligent 3D model-based process that gives architecture, engineering, and construction professionals the insight and tools to more efficiently plan, design, construct and manage buildings and infrastructure. Information extracted from the BIM can be reorganised, modified, updated and stored to facilitate access, and model information can be displayed directly on mobile devices or the Web [65,66]. With the increasing number of smartphone users and the growing number of low-cost sensors found in smartphones, there is an opportunity to utilize them to enhance BIM technology. The sensors and receivers found in modern smartphones can enable and improve PNT solutions for Location-based Services (LBS). If a BIM exists, navigation in the 3D building model can also be carried out.

Activity and Step Detection
For pedestrian navigation, dynamic activity and step detection are needed [67]. Human activity recognition aims to recognize the motion of a person from a series of observations of the user's body and environment. For example, in [68] a single biaxial accelerometer is employed for classifying six activities, i.e., walking, running, sitting, walking upstairs, walking downstairs, and standing; and in [69] an activity and location recognition system using a combination of a biaxial accelerometer, compass, and gyroscope is presented. In other studies, sensors are also worn on the human body for activity detection [67].
For step detection, usually peak detection [58,70] or zero-crossing counting [60,71] on low-pass filtered accelerometer signals is used to count steps. The cycle of walking extends across a stride, i.e., two steps, as can be seen in Figure 10, with typical stride frequencies of around 1 to 2 Hz. In general, the vertical acceleration provides better performance [72], but this depends on being able to isolate the orthogonal accelerations in the global frame, which is difficult to achieve on a smartphone that is not firmly attached to the body. Instead, the signal magnitude is often substituted [73]. As an example, Figure 11 shows a typical acceleration recording in the X-, Y- and Z-axes on a smartphone. As can be seen from the Figure, the acceleration values show significant maxima and minima. From these peaks it is possible to count the steps of the walking user. An easy approach to counting steps is to recognize a step from the exceedance of a certain threshold: a step is detected when the signal first drops below a defined lower threshold and subsequently exceeds a second, upper threshold. Correction for the gravity effect on the X-, Y- and Z-axes of the smartphone's local coordinate system is essential for the correct determination of the accelerometer-derived distance travelled [1]. On the other hand, zero-crossings can also be employed for step detection. Further methods make use of the sinusoidal-type pattern of the acceleration signals to detect step events. In these cases, frequency domain analysis, such as the fast Fourier transform (FFT), is applied to analyze the acceleration signal series [74]. FFT results usually show a strong frequency peak in the range of 0.5 to 2 Hz when walking, corresponding to a walking frequency of 0.5 to 2 steps per second [75]. To obtain the distance travelled from step detection, the step length also needs to be estimated, which is not a trivial task.
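The two-threshold idea can be sketched as a hysteresis counter on the acceleration magnitude. The thresholds and the synthetic test signal below are illustrative assumptions and would need tuning per user and device:

```python
import math

def count_steps(accel_magnitude, low=9.0, high=10.6):
    """Hysteresis step counter: a step is registered when the signal magnitude
    (m/s^2) first drops below `low` and subsequently exceeds `high`."""
    steps, armed = 0, False
    for a in accel_magnitude:
        if a < low:
            armed = True          # lower threshold crossed: arm the detector
        elif armed and a > high:
            steps += 1            # upper threshold crossed afterwards: one step
            armed = False
    return steps

# Synthetic 1.5 Hz walking signal around gravity, sampled at 50 Hz for 4 s
signal = [9.81 + 1.5 * math.sin(2 * math.pi * 1.5 * i / 50) for i in range(200)]
print(count_steps(signal))
```

The hysteresis (requiring both a trough and a subsequent peak) prevents noise around a single threshold from producing spurious step counts.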
In [74], four models are distinguished: (1) the constant model; (2) a cluster of linear models; (3) non-linear multivariable models; and (4) non-parametric analytical techniques. The constant model is the least accurate, as, for example, an error of 5 cm in step length already represents a relative error of around 15 percent for a 35 cm step length. The second, linear model with clustering uses a correlation between step length and temporal variables, such as step interval and frequency. Consequently, the resulting step-length estimate only follows a time-variant process, leading to a limited improvement in accuracy for the estimation of the distance travelled, as terrain slope, body height of the person and trajectory curvature are not considered. Non-linear models try to take these influencing factors into account by treating step length as a non-linear function of these variables. In addition to these parametric models, non-parametric analytical techniques use, for example, fuzzy logic, wavelet analysis or artificial neural networks (ANNs) [76].
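The second model family can be sketched as a linear relation between step frequency and step length; the coefficients below are purely illustrative assumptions and must be calibrated per user:

```python
def step_length_linear(step_frequency_hz, a=0.37, b=0.23):
    """Linear step-length model L = a*f + b (metres): faster cadence implies
    longer steps. The coefficients a, b are hypothetical calibration values."""
    return a * step_frequency_hz + b

# Slow vs. brisk walking cadence (steps per second)
print(round(step_length_linear(1.2), 2), round(step_length_linear(2.0), 2))
```

Because the model depends on cadence only, it cannot react to slope, body height or curved trajectories, which is exactly the limitation the non-linear models address.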

Indoor Altitude Determination
For 3D positioning in indoor environments, the system must also be capable of determining the correct floor on which the user is currently located. In [77], the use of the barometric pressure sensor, which is now increasingly found in smartphones as well, is proposed and investigated. Altitude determination with a barometric pressure sensor can be performed relative to a given start height. For the conversion of the observed air pressure p into a height difference ∆h, the following barometric height equation may be employed:

\Delta h = \frac{T_{ref}}{dT} \left( 1 - \left( \frac{p}{p_{ref}} \right)^{\frac{R \cdot dT}{g}} \right) (8)

where T_ref is the reference temperature, dT the temperature gradient, p_ref a reference air pressure, such as derived from a standard atmospheric model, R the specific gas constant of dry air and g the gravitational acceleration.
As an example, Figure 12 shows the air pressure observations and the derived height in a multi-storey office building. The readings were smoothed while walking along the trajectory between different floors. As the user walks up to higher floors, the air pressure decreases. Using this inverse relationship, the altitude can be estimated. For the conversion of the air pressure into a height difference, the mean temperature at both stations is also required (see Equation (8)). MEMS infrared thermometers are also increasingly found in smartphones, and they can provide the mean temperature differences between different locations. As can be seen from Figure 12, the smartphone's altimeter is capable of determining the correct floor and the relative height between floors.
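A minimal sketch of the pressure-to-height conversion, assuming standard-atmosphere values for the reference temperature, temperature gradient, gas constant and gravity (the pressure readings are invented example values):

```python
def baro_height_diff(p, p_ref, t_ref=288.15, dt=0.0065, r=287.06, g=9.80665):
    """Height difference (m) from a barometric height formula of the form of
    Equation (8); standard-atmosphere values assumed for T_ref, dT, R and g."""
    return (t_ref / dt) * (1.0 - (p / p_ref) ** (r * dt / g))

# Walking up roughly three floors: pressure drops by about 1.2 hPa
dh = baro_height_diff(101205.0, 101325.0)
print(round(dh, 1))  # → ~10 m (about 8.3 m per hPa near sea level)
```

A ~1.2 hPa drop maps to roughly 10 m of ascent, which is coarse enough that floor detection works best relative to a calibrated start height rather than in absolute terms.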

Combination of MEMS-Based INS with Other Sensor Systems
The majority of navigation systems require a form of combination of the MEMS-based sensors with other sensor systems and technologies, such as wireless localization techniques or also vision-aided positioning via scene analysis or visual odometry. In this section, these wireless options and image-based systems together with their integration potential with MEMS-based sensors are discussed.

RF-Based Wireless Options
Wireless options based on RF signals, such as Wi-Fi and UWB (see, e.g., [78][79][80]), can be alternatives to GNSS indoors. Integration of these wireless technologies with INS allows taking advantage of the strengths of both, leading to a more robust positioning solution. Since INS location estimates are obtained by the mathematical integration of sensor measurements, e.g., forming the double integral of the accelerations measured by the smartphone accelerometer to derive the distance travelled, they suffer from an unbounded accumulation of location errors over time. Location estimates determined from absolute positioning technologies, on the other hand, are independent of any previous estimates, because the location is obtained from a single measurement or a set of measurements without integrating over time [11]. In the following, three RF-based technologies, i.e., Wi-Fi, Bluetooth Low Energy (BLE) and UWB, are deliberately selected and briefly reviewed as the main alternatives. A further discussion of all other possible wireless options would be beyond the scope of this paper. The first two wireless options, Wi-Fi and BLE, are available in every smartphone nowadays, although they were not originally embedded into mobile devices with localization in mind. Various wireless standards have been established. Among them, the Wi-Fi standard IEEE 802.11 and the wireless PAN standard IEEE 802.15.1 (Bluetooth) are used most widely for measurement and localization. UWB is an upcoming and promising technology as it finds its way into smartphones and mobile devices. It has inherent advantages compared to the previously mentioned technologies, which will be discussed in Section 5.1.3.

Wireless Fidelity (Wi-Fi)
A comprehensive discussion by the author of this contribution of the fundamentals of Wi-Fi localization may be found in a paper published in Sensors in 2020 [10]. Wi-Fi is by far the most prominent signal-of-opportunity until now and therefore widely adopted. Wi-Fi refers to a local wireless network, which is classified under the IEEE (Institute of Electrical and Electronics Engineers) standard 802.11. One of the main advantages of the utilization of Wi-Fi is that no dedicated infrastructure has to be built up. Wi-Fi networks, originally designed for short-range wireless data communication and typically deployed as ad hoc networks, are built by attaching a device called an access point (AP) to the edge of a wired network. The transmission power is the major factor with direct influence on the effective range. Hence, if the Wi-Fi signal strength is measured, the RSSIs together with the associated MAC (Media Access Control) addresses of the APs are location-dependent information that can be adopted for positioning purposes. An observable associated with the MAC address of an AP consists of the following information: (1) the unique MAC address of the RF transmitter, (2) the location of the RF transmitter, and (3) the effective range of the signal, or the size of the signal coverage area of the RF transmitter [78]. Dual-band Wi-Fi technology on the 2.4 and 5 GHz frequencies is usually employed. For localization of a mobile device, either cell-based solutions or lateration and location fingerprinting are commonly employed (see Section 3). The most widely adopted positioning method in RSSI-based approaches is thereby location fingerprinting (compare Section 3.4). In the following, this section concentrates on lateration-based approaches, especially on the novel Wi-Fi RTT technology, where the double distance between the AP and the mobile unit is measured, allowing for higher positioning accuracies.
Introduced by [81] in the GPS World Magazine in 2018, the IEEE 802.11mc standard allows the measurement of the RTT between APs and mobile clients with the Fine Timing Measurement (FTM) protocol. For that purpose, several hardware design changes in the existing Wi-Fi chipsets are necessary to increase the timing resolution from the microsecond level to the nanosecond level (or even sub-nanosecond level). FTM is carried out using an exchange of multiple message frames between an initiating station (ISTA) and a responding station (RSTA). If the smartphone sends a localization request, it serves as the ISTA and measures its range to the RSTA, i.e., a certain AP. If the localization request comes from the network, on the other hand, the ISTA is the AP and the mobile device the RSTA. Thus, Wi-Fi RTT FTM is a point-to-point (P2P) single-user protocol. The measurements are carried out as follows:
1. the ISTA sends an FTM request to the RSTA;
2. the RSTA receives the request and returns an acknowledgement (ACK) signal to the ISTA;
3. several FTM feedbacks are then sent from the RSTA to the ISTA; and
4. the mean RTT measurement is used for range calculation.
This procedure yields the calculation given in Equation (9):

RTT = \frac{1}{N} \sum_{i=1}^{N} \left[ \left( t_{4_i} - t_{1_i} \right) - \left( t_{3_i} - t_{2_i} \right) \right] (9)

where t_1i is the timestamp when the FTM request is first sent by the ISTA, t_2i the timestamp when the FTM signal arrives at the RSTA, t_3i the timestamp when the RSTA returns the acknowledgement (ACK) signal to the ISTA, t_4i the timestamp when the ACK signal is finally received by the ISTA, and N the number of successful bursts (where N > 0 and N ≤ B, with B the total burst number, i.e., the burst size, B = 8 by default). A burst, or more accurately a data burst, is the broadcast of a relatively high-bandwidth transmission over a short period. Burst transmission can be intentional, broadcasting a compressed message at a very high data signalling rate within a very short transmission time.
The protocol excludes the processing time at the ISTA by subtracting t_3i − t_2i from the total round-trip time t_4i − t_1i, which represents the time from the instant the FTM message is sent (t_1i) to the instant the ACK is received (t_4i). This calculation is repeated for each FTM-ACK exchange, and the final RTT is the average over the successful FTM-ACK bursts, as seen from Equation (9). The estimated range r_est can then be calculated using Equation (10):

r_{est} = \frac{c \cdot RTT}{2} (10)

where c is the propagation speed of the RF signal. Challenges for Wi-Fi RTT arise in dense multipath environments and in NLoS (non-line-of-sight) conditions. In these cases, accurate time-delay estimation might be difficult to achieve, as it requires precise detection of the first signal path with an LoS condition between the two stations and the estimation of its arrival time [82,83]. That is why the RTT protocol is currently not fully suited to NLoS surrounding environments.
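The FTM timestamp arithmetic described above can be sketched directly; the timestamps in the simulated burst are invented values for a true AP distance of 15 m:

```python
C = 299_792_458.0  # propagation speed of the RF signal (m/s)

def ftm_range(bursts, c=C):
    """Mean RTT over successful FTM-ACK bursts per Equation (9) and the
    resulting range r_est = c * RTT / 2 per Equation (10). Each burst is a
    (t1, t2, t3, t4) tuple of timestamps in seconds."""
    rtts = [(t4 - t1) - (t3 - t2) for t1, t2, t3, t4 in bursts]
    mean_rtt = sum(rtts) / len(rtts)
    return c * mean_rtt / 2.0

# One simulated burst: 15 m one-way flight time plus 0.1 ms RSTA processing
tau = 15.0 / C
burst = (0.0, tau, tau + 1e-4, tau + 1e-4 + tau)
print(round(ftm_range([burst]), 3))  # → 15.0
```

The subtraction of t3 − t2 removes the (comparatively huge) processing delay at the responder, leaving only the two nanosecond-scale flight times.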
The measured ranges to at least three APs are then required for 2D lateration. Achievable accuracies for the ranges lie in the cm to dm range, and for the estimated positions in the dm to m range [84], which is significantly better than for RSSI-based lateration or location fingerprinting. RSSI-based approaches are still needed, however, since coverage with the new AP hardware in the surrounding environment cannot always be guaranteed. Therefore, a combination and integration of technologies as a hybrid solution is needed [10].
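The final position fix from three or more RTT ranges is a standard non-linear least-squares problem; a simplified Gauss-Newton sketch (AP coordinates and ranges are invented example values):

```python
import numpy as np

def laterate_2d(anchors, ranges, iters=10):
    """Gauss-Newton 2D lateration: estimate a position from measured ranges
    to at least three APs at known coordinates (a simplified sketch that
    assumes the initial guess is not located exactly on an AP)."""
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    x = anchors.mean(axis=0)                 # initial guess: centroid of the APs
    for _ in range(iters):
        d = np.linalg.norm(anchors - x, axis=1)
        J = (x - anchors) / d[:, None]       # Jacobian of the range equations
        dx = np.linalg.lstsq(J, ranges - d, rcond=None)[0]
        x = x + dx                           # update towards the least-squares fix
    return x

# Three APs and ranges simulated from a true position at (3, 4)
aps = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true = np.array([3.0, 4.0])
meas = [float(np.linalg.norm(true - np.array(a))) for a in aps]
print(np.round(laterate_2d(aps, meas), 3))
```

With exact ranges the iteration converges to the true position; with noisy RTT ranges the same scheme returns the least-squares fix, and redundant APs (more than three) improve robustness.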

Bluetooth
Bluetooth is a technology that operates wirelessly and transfers data between compatible devices. It is considered a cable replacement for the old RS-232 data wires of mobile devices and is mainly designed to maximize ad hoc networking functionality [85]. Bluetooth uses UHF radio waves in the ISM band between 2.402 and 2.48 GHz. Compared to Wi-Fi, the gross bit rate is lower (1 Mb/s), with a transmission power of just 2.5 mW, usually resulting in shorter ranges between the devices. It is a 'lighter' standard, highly ubiquitous (because it is embedded in cell phones, etc.) and supports several other networking services in addition to IP. For positioning, either tags or Bluetooth Low Energy (BLE) iBeacons are common. Bluetooth tags are small-sized transceivers. As with any other Bluetooth device, each tag has a unique ID, which can be used for localization of the tag [86]. iBeacon is a BLE protocol developed by Apple for compatible hardware transmitters, typically so-called beacons, which broadcast their identifier to nearby portable electronic devices. The technology enables mobile devices to perform actions when they are in close proximity to an iBeacon. If this is the case, a universally unique identifier is transmitted and picked up by a compatible App or operating system. The identifier and several bytes sent with it can be used to determine the device's physical location. Localization is based on proximity sensing and cell-based solutions [11].
Bluetooth has a core specification that has undergone several versions from 4.0 up to 5.3. The versions are backward compatible in the sense that a device supporting, for example, version 5.1 can work with devices running lower versions, obviously as long as the newer features are not used [87]. Two years after Bluetooth 4.0, the Bluetooth Special Interest Group released Bluetooth 4.1. With this version, each device can serve both as a client and as a server. For Bluetooth 4.2, introduced at the end of 2014, corresponding smart devices needed new chips, and a software update from Bluetooth 4.0 or 4.1 was not possible [87]. The most significant change was that data packets are smaller under Bluetooth 4.2, leading to a faster exchange between server and client. Theoretically, a speed 2.5 times as high as with the low-energy version of Bluetooth 4.0 and 4.1 is possible. Furthermore, the battery life is in principle also longer [88]. Bluetooth 5.2 was the latest Bluetooth version as of 2020, offering improvements in the audio area in particular [89].
Localization is based on proximity sensing and cell-based solutions. Essential for positioning was the development of BLE. Released in 2010 as part of the Bluetooth 4 radio specification, BLE expanded the Bluetooth ecosystem to the Internet of Things [90]. As a highly energy-saving wireless technology for short and medium ranges of up to 50 m based on Bluetooth Classic, BLE version 4.0 was specially designed for low-power solutions in control and monitoring applications. There are two types of BLE devices: single mode and dual mode. Single-mode devices only use BLE and therefore have a lower power requirement; these devices, such as beacons, have chips without complicated functionalities. Dual-mode devices have a classic Bluetooth chip with BLE integrated; these include, for example, smartphones, tablets, and computers. The iPhone 4S was the first device to support BLE in 2011. All current smartphones, tablets, etc., are now BLE-capable.
Proximity detection can be performed with BLE using a simple mechanism. Each BLE-equipped device can be in one of two states, broadcaster or observer. The broadcaster sends a broadcast beacon message on three default channels every 'advertising interval', while the observer wakes up every 'scan interval' and listens for beacons during a certain 'scan window'. When the observer receives the beacon, it estimates the distance from the broadcaster using the RSSI. In the BLE protocol definition, 40 channels, each 2 MHz wide, around the 2.4 GHz radio band are used to transmit messages. The duration for transmitting messages is kept short to save battery power. Among these 40 channels, three (i.e., 37, 38, and 39) are reserved for broadcasting advertisement messages. The RSSI from these three channels can be used for estimating the location of the mobile device. The BLE advertising rate can be set up to 50 Hz, and the transmission power of BLE beacons can be set from 0 down to −40 dBm. To reduce power consumption, the BLE advertising rate and transmission power are usually set to less than 10 Hz and −16 dBm, respectively.
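The RSSI-to-distance step mentioned above is commonly modelled with a log-distance path-loss relation. The following sketch assumes that model and a per-beacon calibrated reference RSSI at 1 m; the function name and the default values (−59 dBm at 1 m, exponent 2) are illustrative assumptions, not values from the cited works.

```python
def rssi_to_distance(rssi_dbm, tx_power_dbm=-59.0, n=2.0):
    """Log-distance path-loss model: estimate the broadcaster-observer
    distance in metres from a received RSSI (dBm).
    tx_power_dbm: calibrated RSSI at the 1 m reference distance
                  (beacon-specific, often advertised by iBeacons).
    n: path-loss exponent (~2 in free space, larger indoors)."""
    return 10.0 ** ((tx_power_dbm - rssi_dbm) / (10.0 * n))
```

For example, an RSSI 20 dB below the 1 m reference corresponds to roughly 10 m under a free-space exponent; indoors, multipath and obstructions make the estimate considerably noisier, which is why proximity and cell-based schemes remain the dominant BLE localization approach.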
The main disadvantage of BLE compared to Wi-Fi is that, for indoor positioning, BLE iBeacons have to be installed throughout the area of interest [91]. The iBeacons are also battery-powered, and where the batteries cannot be changed their lifetime is reduced.

Ultra-Wide Band (UWB)
As defined by the Federal Communications Commission (FCC), UWB is an RF signal that has a bandwidth greater than 500 MHz and is generally operated in the 3.1 to 10.6 GHz range [92]. Due to this high bandwidth, which allows the use of short-pulse waveforms, the effect of multipath interference is significantly reduced compared to other RF technologies, such as Wi-Fi or Bluetooth [93]. The lower part of the UWB frequencies enables it to penetrate obstacles, including walls and objects. Thus, UWB is under rapid development mainly for positioning purposes due to the high accuracy and robustness that it provides [94,95]. Apart from better multipath resolution capabilities, it offers higher data rates and lower interference with existing systems, as well as generally high accuracy with lower energy requirements [96]. UWB positioning methods can be divided into two broad categories, i.e., fingerprinting-based and geometric methods. The latter estimate the position using either the range or angle information derived from RSSI, ToA, TDoA or AoA measurements observed by a UWB system [97]. For fingerprinting, on the other hand, only the RSSI is used, as is the case in Wi-Fi positioning. The ranging potential of UWB units [98] enables lateration approaches where the two-way time-of-flight (TW-ToF) measurement, or RTT as previously described for Wi-Fi RTT in Section 5.1.1, is utilized [99]. To recap, TW-ToF or RTT measures the double range from the transmitter to the receiver (signal transmitted at time t_1) and back to the transmitter (reply received at time t_2), including the time t_d taken by the UWB receiver to respond to the transmitter, to derive the range observation between a transmitter and a receiver. The time t_d is generally constant and can be estimated through calibration. The range observation r_obs is then obtained as given in Equation (11): r_obs = (c/2)·[(t_2 − t_1) − t_d,calib] (11), where c is the propagation speed of the RF signal and t_d,calib the estimated value of t_d obtained after calibration [96].
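Equation (11) translates directly into code. The sketch below assumes all epochs are given in seconds on the transmitter's clock; the function name `rtt_range` is ours.

```python
C = 299_792_458.0  # propagation speed c of the RF signal in m/s

def rtt_range(t1, t2, t_d_calib):
    """Equation (11): one-way range from a two-way time of flight.
    t1       : epoch at which the transmitter sends the pulse (s)
    t2       : epoch at which the reply arrives back at the transmitter (s)
    t_d_calib: calibrated responder turnaround time t_d (s)"""
    return 0.5 * C * ((t2 - t1) - t_d_calib)
```

Because the round trip starts and ends on the same clock, no synchronization between transmitter and receiver is needed, which is exactly why the RTT technique sidesteps the synchronization requirement discussed below. A 1 ns error in the calibrated turnaround time already corresponds to about 15 cm of range error, underlining the importance of the calibration.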
The operational principle for positioning is as follows: UWB sensors rely on the coherent (i.e., phase-stable continuous oscillation) transmission of very short-duration RF waveforms referred to as pulses. Packets of several thousand such short pulses transmitted between UWB nodes are utilized for estimating the travel time of the RF signal. By exploiting the signal characteristics of these short pulses, accurate detection of the first pulse (first break) or leading edge (LE) is possible, enabling the range measurement of the direct signal while at the same time filtering out multipath and NLoS effects [100]. This functionality is particularly useful in combination with the ability of UWB signals to penetrate, e.g., walls and other obstacles made from most construction materials (except metal surfaces) and still provide accurate ranges. Exact synchronization of the transmitting and receiving devices, however, is a substantial requirement, which is usually handled by the UWB hardware. By utilizing the coherent transmission capabilities of UWB signals and by implementing the RTT technique, synchronization issues are resolved to a great extent.
Systems on the market allow range measurements at the cm-level. The nominal accuracy was validated, e.g., by [101] in performance evaluation campaigns as being of the order of around 3 cm for calibrated UWB pairs. In [102], two different off-the-shelf UWB systems, i.e., the Time Domain P410 and P440 and the Pozyx UWB modules, were tested. Figure 13 reproduces an example of a trajectory determination using UWB in an indoor building hallway measured in a field campaign at The Ohio State University with the contribution of the author of this article. 14 Pozyx and 14 Time Domain UWB anchors (also referred to as static nodes) were fixed on the walls along a corridor on a single floor as well as in the staircase of the Bolz Hall university building. Calibration and validation range measurement datasets were collected by a rover UWB unit on 35 checkpoints along the corridor. In the figure, the trajectories were estimated with an EKF [103,104], assuming a simple dynamic model for the device movement (a random walk model for the velocities). Figure 13 (right) shows a zoomed view of the two estimated trajectories, where the improvement obtained with the proposed calibration model is clearly visible. Thus, this example demonstrates that calibration of the UWB units and derived ranges is essential to achieve a high level of performance and positioning accuracies at the cm-level. It also shows that an EKF forms the key methodology for computing state estimates in non-linear scenarios. Smartphone manufacturers have already started incorporating UWB chip-sets, which is leading to more widespread applications of the UWB technology. Further discussion about UWB-enabled mobile devices follows in Section 6.3. Table 3 compares the key specifications of Wi-Fi, BLE and UWB relevant for localization. The specifications show that all three technologies are suitable for use in an indoor positioning system incorporating smartphones as mobile clients.
Hybrid wireless RF-based solutions are of growing interest, as the advantages of the different technologies can then be combined. Alvarez-Merino et al. [105] evaluated and compared the performance of UWB, Wi-Fi RTT and their fusion for localization with measurements on a construction site. The measurement campaign included samples from several different floors in a building under construction. These measurements were used to assess the precision of each technology, either individually or in combination. The study showed that it is essential to apply a weighting that prioritises the different reference points (RPs), since they provide different levels of accuracy.

Comparison of Wi-Fi, BLE and UWB
Considering an appropriate weighting model for the fusion of the ranging measurements yields a significant performance improvement. In a study conducted by [106], supervised by the author, a modified clustering based on DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for noisy range measurements, combined with a weighted least squares (WLS) approach, was chosen for the fusion of GNSS pseudoranges and UWB ranges for outdoor positioning in challenging environments. Density-based clustering sorts a set of data points by their spatial connectivity as well as their spatial density. With this method, it is possible to create three-dimensionally connected objects of arbitrary shape. It was applied to identify clusters in large spatial datasets by looking at the local density of the data points. This study also confirmed that weighting of the range observations is essential. After data filtering and clustering, the WLS algorithm was employed to estimate the coordinates of the measurement points. WLS is an estimation method that incorporates statistical models in order to derive the statistically most plausible values. The first input parameter is the random vector of observations under the assumption of a constant error variance; additionally, the standard deviations associated with the observation vector are set [106].
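The core idea of the density-based filtering, reduced to one dimension, can be sketched as follows. This is a DBSCAN-flavoured illustration of the core-point criterion only, not the modified algorithm of [106]; the function name and the default `eps`/`min_pts` values are our assumptions.

```python
def density_filter(ranges, eps=0.05, min_pts=4):
    """DBSCAN-flavoured 1D filter for noisy range measurements (metres):
    keep only observations that have at least min_pts neighbours
    (themselves included) within eps, i.e. that lie in a dense cluster.
    Isolated outliers, e.g. from NLoS reflections, are discarded."""
    return [r for r in ranges
            if sum(1 for s in ranges if abs(s - r) <= eps) >= min_pts]
```

Applied per anchor and per grid point, such a density criterion removes isolated multipath-affected ranges before they can bias the subsequent WLS adjustment.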
In Figure 14, all range measurements are shown over the time of the measurement campaign, with the UWB range observations separated by the four anchor points. The data, visualized as white points, shows only range observations from which outliers have already been filtered. Before use in the WLS implementation, the measurements were additionally filtered using the median of the range measurements per grid point: a buffer was defined around the median, and data outside of this buffer was eliminated from the data set. In the last step, a time frame of 1.5 min at a rate of one measurement per second was chosen, so that in total 91 range measurements were used. Basic statistical values were calculated from this filtered data in order to assess the ranging performance against the ground truth (accuracy and precision) before use in the WLS. The overall ranging performance was as expected, with standard deviations in the range of a few cm. The most precise measurements had standard deviations of around 1 cm, whereas the most imprecise were around 17 cm.
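The median-buffer step described above is straightforward; a minimal sketch is given below. The function name and the default buffer width are illustrative assumptions (the study does not state the buffer value used).

```python
import statistics

def median_buffer_filter(ranges, buffer_m=0.10):
    """Keep only the range observations lying within +/- buffer_m of the
    per-grid-point median, discarding the rest as outliers before WLS."""
    med = statistics.median(ranges)
    return [r for r in ranges if abs(r - med) <= buffer_m]
```

Unlike a mean-based criterion, the median itself is robust against the very outliers being removed, so a single gross error cannot shift the acceptance window.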
To assess the positioning solution in detail, Figure 15 shows the results for all grid points after processing in the GNSS/UWB-fusion WLS approach. By combining UWB range data and GNSS measurements, the result is significantly improved compared to the UWB-only solution for most of the grid points. These improvements range from several cm to dm. Using only UWB range measurements, the positioning solution achieved accuracies of only several dm when at least three range measurements were used; when fewer than three ranges were available, however, accuracies of only several meters were achieved. Thus, by including the calculated GNSS baselines, it was possible to compensate for this lack of measurement data, because redundancy could be created for the existing UWB range measurements.
Furthermore, the study conducted by [105] found that UWB provides better ranging accuracy, while Wi-Fi RTT shows more robustness against obstructions thanks to its better propagation performance and penetration capabilities. In the investigation of the penetration capabilities through walls, it was found that reinforced concrete completely cancels UWB propagation. Wi-Fi RTT showed better performance in this respect, as the RTT ranges were able to benefit from holes in the structure to achieve localization. If no holes were present, Wi-Fi RTT also had difficulties but still managed to estimate ranges.

Vision-Aided Positioning
As a camera is independent from other sensors, it provides a good alternative for localization [11]: the noise in images does not accumulate over time, and no infrastructure installation is required. The camera is mainly used as a complementary sensor that collaborates with other on-board sensors [107]. Visual positioning means, in general, the use of information obtained from images to resolve the user position, or to ease resolving the user position by integrating the visual information with position measurements obtained with other methods. The visual information may also be used to make the position solution more accurate and more widely available. Either an absolute position of the user is obtained by utilizing reference images in a database, or a relative position and location change is obtained by examining consecutive images. Regardless of which of the two approaches the visual positioning is based on, the first pre-processing step is similar: retrieving the required camera parameters [108]. When the camera is carried by a pedestrian, the motion of the camera relates to the motion of the pedestrian, which can, for instance, be calculated from consecutive images. Integrating the motion information obtained from the images yields a navigation system with increased availability and accuracy [109]. Perspective images of the environment captured by the camera carried by a person can be matched to prerecorded images or videos which have been collected to build up 3D models stored in an image/video database [4]. This database contains images of recognizable features in the surroundings tagged with position information (see, e.g., [110][111][112][113]). Image processing then extracts either features in the images or vanishing points of straight lines. Images are first captured at known positions during an initial 'mapping' survey and added to the database [114].
When, in the positioning phase, a match between images in the database and the ones taken by a smartphone camera is found, the absolute position is obtained. Visual features in the close surrounding environment may also be detected and relative positioning information obtained by detecting the motion of the features in consecutive images [12]. This procedure is also referred to as vision-aiding. The motion of the features enables computation of the heading change and translation of the camera between consecutive images. As mentioned above, images provide accurate motion information because they do not suffer from accumulating measurement errors like IMUs. The matching schemes are robust to changes in scale, illumination, camera position, and small changes in the scene, and databases of a large number of images can be queried in real time with error rates of just a few percent, depending on the required update frequency and the nature of the environment [115].
Vision-aided positioning has the advantage that it does not require a dedicated infrastructure and is thus low-cost and passive. The main drawback, however, is that building up the image database is laborious due to the a priori preparations, and the approach is restricted to the predefined region [109]. In addition, indoor surroundings, like offices and public buildings, are often poor in features, and the lighting is often limited. An approach to overcome these drawbacks is to use straight lines, like the borders of floors and walls, to calculate vanishing points. The algorithms for vanishing-point calculation, however, are mostly computationally heavy. Ref. [109], for example, introduced an indoor pedestrian navigation system for smartphones using a rapid algorithm that exploits visually aided heading of the motion. In this algorithm, the change in heading is calculated using vanishing points at a frequency of 1 Hz, which fulfils the real-time requirements set for smartphone navigation. The system also integrates the visual-aiding information with measurements obtained from other sensors using an Extended Kalman filter.
A further alternative is so-called visual odometry, where sequentially captured image sequences along the user's trajectory are utilized. The term 'visual odometry' was first introduced by Nistér et al. [116] for its similarity to the concept of wheel odometry for vehicles. Figure 16 shows the principle of operation in an indoor setting. Visual odometry is defined as the process of estimating the platform's motion (translation and rotation with respect to a reference frame) by observing a sequence of images of its environment [117]. While the user moves through the building with their smartphone, the integrated camera captures the environment. An image database is also needed for visual odometry, as the current recording is compared with a database containing all previously recorded images. The relative position and orientation of the device can be determined by recognizing areas that have already been traversed. The algorithm does not store the entire image, but extracts distinctive features from it, which can often be found at the corners and edges of objects. As before, an image processing or matching algorithm searches for matches with known features. Visual odometry is an inexpensive alternative odometry technique that is more accurate than conventional techniques, such as MEMS-based INS, wheel odometry, etc., with a relative position error ranging from 0.1 to 2 percent [118]. However, changes in the environment, such as moving persons, new furniture, times of day and similar corridors in the building, affect the performance of the image processing algorithm and negatively influence the duration and accuracy of positioning.
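Once the heading change and forward translation between consecutive frames have been estimated from the matched features, the trajectory is obtained by chaining these relative increments. The following sketch shows only this dead-reckoning integration step for a planar (2D) case; the frame-to-frame estimation itself is assumed to be done elsewhere, and the function name is ours.

```python
import math

def chain_poses(steps, start=(0.0, 0.0, 0.0)):
    """Chain relative visual-odometry increments into a 2D trajectory.
    steps: iterable of (heading_change_rad, forward_translation_m)
           between consecutive frames.
    start: initial (x, y, heading). Returns the list of (x, y) poses."""
    x, y, heading = start
    trajectory = [(x, y)]
    for d_heading, d_forward in steps:
        heading += d_heading                 # rotate first ...
        x += d_forward * math.cos(heading)   # ... then translate along
        y += d_forward * math.sin(heading)   #     the new heading
        trajectory.append((x, y))
    return trajectory
```

The integration also makes the stated 0.1 to 2 percent relative error tangible: any small per-frame error is multiplied into the accumulated path length, which is why loop closures (recognizing already-traversed areas) are so valuable.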
Simultaneous Localization and Mapping (SLAM) is a way for a platform, usually a robot or a pedestrian navigator, e.g., a backpack system, to localize itself in an unknown environment while incrementally constructing a map of its surroundings. SLAM has been extensively studied in the past couple of decades [119][120][121], resulting in many different solutions using different sensors, including sonar sensors [122], infrared sensors [123] and laser scanners [124]. Recently there has been an increased interest in visual-based SLAM, also known as V-SLAM, because of the rich visual information available from passive low-cost video sensors compared to laser scanners. However, the trade-off is a higher computational cost and the requirement for more sophisticated algorithms for processing the images and extracting the necessary information [117].
The main difference between visual odometry and SLAM is that visual odometry mainly focuses on local consistency and aims to incrementally estimate the path of the camera, pose after pose, possibly performing local optimization, whereas SLAM aims to obtain a globally consistent estimate of the camera/platform trajectory and map.

Sensor Fusion
Sensor fusion of all observations is essential when it comes to the integration of IN with absolute positioning techniques. As seen in Section 4, the IMU can be considered the primary sensor, and absolute positioning is used to update and bound the sensor drifts. The most commonly used algorithm is the Kalman filter (KF) [56]. The Kalman filter is based on recursive Bayesian filtering where the noise in the system is assumed to be Gaussian. Hence, the KF is a recursive algorithm that uses a series of prediction and measurement-update steps to obtain an optimal, in a minimum-variance sense, estimate of the state vector. As the theory of Kalman filtering may be found in many textbooks, it is not further discussed in this section. Usually, the Extended Kalman filter (EKF) needs to be applied, as most problems are non-linear. In the EKF, non-linearities are approximated using first- or second-order derivatives. The EKF has been successfully applied for indoor positioning, especially for the fusion of hybrid positioning measurements. The most common application is to combine PDR with other positioning systems, such as UWB and PDR [125], Wi-Fi and PDR [126][127][128], or a combination of multiple systems [129]. In the following, a brief summary of the operational principle of two selected sensor fusion techniques which are widely employed nowadays is provided.
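The predict/update cycle described above can be illustrated with a deliberately minimal scalar example (a random-walk state observed directly); a real IN/absolute-positioning fusion would use a vector state and the EKF linearization, but the structure of each cycle is the same. The function name and parameters are ours.

```python
def kf_step(x, P, z, q, r):
    """One scalar Kalman-filter cycle for a random-walk state.
    x, P: state estimate and its variance from the previous epoch
    z   : new measurement of the state, with variance r
    q   : process-noise variance added in the prediction step"""
    P = P + q                # prediction: uncertainty grows (sensor drift)
    K = P / (P + r)          # Kalman gain: balance prediction vs measurement
    x = x + K * (z - x)      # measurement update with the innovation z - x
    P = (1.0 - K) * P        # updated (reduced) uncertainty
    return x, P
```

Repeated updates drive the estimate towards the measured value while the variance settles near sqrt(q·r), which is exactly the 'bounding of sensor drift by absolute updates' role the filter plays in the hybrid systems cited above.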
One of these algorithms is the particle filter (PF), which is also known as Monte Carlo localization (MCL). This algorithm is now widely used, for example, in robot localization in indoor environments. The PF is a recursive implementation of the sequential Monte Carlo method. The basic idea is to replace the integral operation with a set of samples that are close to the posterior probability to obtain a final state estimate [56]. As in Kalman filtering, the estimation is recursive, and like the EKF the PF can be used for non-linear systems. Moreover, instead of requiring Gaussian distributions, the PF can deal with non-Gaussian noise distributions. Starting with a set of particles, in the prediction phase each particle is propagated through a motion model by sampling. This results in a new set of particles at the following epoch which approximates a random sample from the predictive density. In the next step, the update phase, measurements are taken into account to weight all of the sampled particles. Then the particles are recomputed by re-sampling from the weighted set. These two phases are repeated recursively for the subsequent steps [130]. MCL has some advantages over other algorithms, such as Kalman filtering: it is able to represent multi-modal distributions, which is useful for self-localization, and it is relatively easy to implement [131]. Fusing heterogeneous sources from different networks/sensors by PF has been widely studied. The prevalent one is the fusion of Wi-Fi positioning with IN [132][133][134], whose accuracy can be improved by adding constraints to the PF with map information [135,136], using the local discernibility of magnetic signals [137], combining vision information [138], and ameliorating the filter operation [139,140]. Walking distance and map information can be directly integrated in the PF by using a non-linear prediction function.
The map information is then effective auxiliary information that can be used to remove impossible particles, for example by setting the weights of these particles to zero when the target exceeds certain bounds [135,141]. This helps in indoor environments: when the user walks along a corridor, it is clear that he cannot cross walls.
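One predict/weight/resample cycle with such a map constraint can be sketched as follows. This is a generic illustration, not the algorithm of a specific cited work; the callable names (`motion`, `likelihood`, `wall_blocks`) are our assumptions.

```python
import random

def pf_step(particles, motion, likelihood, wall_blocks):
    """One particle-filter cycle with map information.
    motion(p)         -> propagated particle (motion model, may add noise)
    likelihood(q)     -> measurement likelihood of the propagated particle
    wall_blocks(p, q) -> True if the move p -> q crosses a wall (impossible)"""
    moved = [motion(p) for p in particles]
    # map constraint: impossible transitions get zero weight
    weights = [0.0 if wall_blocks(p, q) else likelihood(q)
               for p, q in zip(particles, moved)]
    total = sum(weights)
    if total == 0.0:            # degenerate case: all moves impossible
        weights = [1.0] * len(moved)
        total = float(len(moved))
    weights = [w / total for w in weights]
    # resample: particles survive in proportion to their weight
    return random.choices(moved, weights=weights, k=len(particles))
```

Because zero-weight particles can never survive resampling, the particle cloud is automatically confined to the reachable part of the floor plan, which is precisely the corridor/wall argument made above.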
For localization in wireless sensor networks (WSNs), in particular, a promising approach is the so-called SPAWN algorithm, which stands for sum-product algorithm over a wireless network [142]. The SPAWN algorithm makes use of factor graphs (FGs) and the sum-product algorithm (SPA), where the FG is a method to graphically represent a factorization of a Bayesian network while the SPA is a message-passing algorithm for performing inference on the FG. Consider a WSN consisting of a set of nodes and a set of anchors. Each node carries its information from the previous step, i.e., its last position solution. It then receives messages from visible anchors and neighbouring nodes. Using the new information, it updates its position estimate and shares it with its neighbours. The messages shared among the nodes represent probability density functions. This makes the approach a truly distributed algorithm which is highly suitable for cooperative positioning (CP) applications (see Section 6.6), where a group of users or sensor platforms are localized and navigated together in a neighbourhood.

Smartphone-Based Localization Capabilities
Smartphone localization and navigation is becoming more and more popular, not only in LBS but also in other surveying-type applications. As stated in the introduction of this paper, smartphones are enabling technologies for ubiquitous computing, facilitating continuous updates of a user's context [1]. This section mainly focuses on the usage of signals-of-opportunity (SoP) together with MEMS-based IN sensors. As mentioned in Section 4, the MEMS-based IMUs in smartphones can serve as a primary localization technique. Multi-sensor fusion with absolute positioning is a key requirement due to the high INS error growth. As the most prominent SoP, Wi-Fi RSSI or RTT positioning is predestined for estimating absolute positions from the derived ranges via (multi)lateration or location fingerprinting. UWB can be seen as an upcoming SoP from which a significant increase in localization performance can be expected. Other promising technologies include LiDAR (Light Detection and Ranging), which is already embedded in the Apple iPhone 12 Pro and iPhone 12 Pro Max. Further fast-paced developments on the sensor market can be expected.

MEMS-Based Inertial Sensor Systems
At the start of the development of MEMS-based sensors for localization, the first priority was achieving high performance at reduced cost and size; positioning accuracy was not necessarily the most important issue [1]. In particular, the development enables applications previously considered out of reach, such as guidance, navigation, and control. The small sensor size, extreme ruggedness, and potential for very low cost and weight of MEMS accelerometers and gyroscopes have thus revolutionized the navigation market. Inertial guidance systems for smartphones were unthinkable before MEMS. In terms of the performance of MEMS IMUs, the limiting sensors are the gyroscopes [143]. However, MEMS sensors are now on the way to reaching high-accuracy tactical-grade quality. In the following, the combined solution of MEMS-based accelerometers, gyroscopes and barometric pressure sensors is briefly described. Additionally, a magnetometer or digital compass is employed together with the gyroscopes for estimating the user's heading. Further introductory information may be found in [11].
Smartphones include combined tri-axis accelerometer and gyroscope systems. Due to the difficulty in producing high-performing small gyroscopes, all-accelerometer systems (also known as gyro-free systems) were developed. Two approaches are typically used: (1) the Coriolis effect is utilized, or (2) the accelerometers are placed in fixed locations and used to measure angular acceleration; the latter is known as the direct approach [144]. In the first approach, three opposing pairs of monolithic MEMS accelerometers are dithered on a vibrating structure (or rotated). This allows the detection of the angular rate Ω. Both approaches have in common that the accelerometers also measure linear acceleration, which enables a full navigation solution. In the direct approach, however, the need for one more integration step makes it more vulnerable to bias variations and noise, so the output errors grow an order of magnitude faster over time than with a conventional IMU [145].
Magnetometers sense the local magnetic field for azimuth determination by exploiting the Lorentz force. The major sensing types are capacitive, optical or piezoresistive [146]. The property used is the fact that the measured magnetic field, or actually the magnetic flux density (in the unit Tesla [T]), is in general a measure of the force exerted on the sensor. The Earth's magnetic field has a flux density of around 30 to 60 µT. Electrical devices in the surrounding environment generate magnetic fields themselves that can easily reach the magnitude of the Earth's magnetic field. Thus, electrical devices can significantly influence the magnetic field sensor measurements, making them often unreliable in buildings due to the presence of many electronic devices, wires, etc. That is why a combination with gyroscope measurements is usually performed and, in fact, essential [11]. The disadvantage of gyroscope measurements, on the other hand, is that they only provide short-term stability and show large drift rates which accumulate quickly.
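The complementary nature of the two sensors is often exploited with a simple filter: the integrated gyro heading is smooth but drifts, while the magnetometer heading is drift-free but noisy and disturbance-prone. The sketch below is a generic complementary-filter illustration (not from a cited work); the function name and the default blending factor are our assumptions.

```python
import math

def fuse_heading(gyro_heading, mag_heading, alpha=0.98):
    """Complementary filter for heading (radians): trust the smooth but
    drifting gyro heading with weight alpha and the noisy but drift-free
    magnetometer heading with weight 1 - alpha."""
    # wrap the innovation to [-pi, pi) so e.g. 359 deg vs 1 deg fuses correctly
    delta = (mag_heading - gyro_heading + math.pi) % (2.0 * math.pi) - math.pi
    return (gyro_heading + (1.0 - alpha) * delta) % (2.0 * math.pi)
```

Applied every epoch, the small magnetometer correction continuously pulls the gyro heading back, bounding its drift, while short magnetic disturbances from nearby electrical devices are strongly attenuated by the small weight.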
MEMS barometric pressure sensors, also referred to as altimeters, embedded in smartphones allow differences in altitude to be determined (see Section 4.6). The key element of an air pressure sensor is a diaphragm containing piezoresistors, which can be formed by ion implantation or in-diffusion. Applied pressure deflects the diaphragm and thereby changes the resistance of the piezoresistors. By arranging the piezoresistors in a Wheatstone bridge, an output signal voltage can be generated [147]. The measurement sensitivity of the pressure sensor is determined by the strain at the bottom plane of the diaphragm, whereby larger strain leads to higher sensitivity [148]. Table 4 compares the location sensors in mobile devices for relative positioning, classified depending on their navigation information and typical achievable accuracy. Specifications and characteristics are based on information and values taken from the literature, such as [1,4,7,8,15,77,79,93]. For absolute location determination, a shared reference grid for all located objects is used, while relative localization means that the positions of the user are determined relative to a starting point, for instance, using DR. As can be seen from the specifications in Table 4, the typical accuracies and characteristics vary quite significantly, and each sensor provides a specific performance. Due to their complementary characteristics, a meaningful combination in the form of a hybrid solution obviously yields the best performance for localization of a mobile smartphone user.
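The altitude-difference computation from two pressure readings is commonly done with the hypsometric formula. The sketch below assumes dry air and a known mean layer temperature; the function name and defaults are ours.

```python
import math

def baro_height_diff(p_hpa, p_ref_hpa, t_mean_c=15.0):
    """Hypsometric formula: height difference in metres between two
    barometric pressure readings (hPa), given the mean air temperature
    of the layer in degrees Celsius. Positive result means p_hpa was
    measured above the reference level."""
    R = 287.05      # specific gas constant of dry air, J/(kg K)
    g = 9.80665     # standard gravity, m/s^2
    return (R * (t_mean_c + 273.15) / g) * math.log(p_ref_hpa / p_hpa)
```

Near sea level, a change of roughly 0.12 hPa corresponds to 1 m of height, so the dm-level pressure resolution of smartphone barometers is sufficient for floor detection, although weather-induced pressure trends must be compensated, e.g., by differencing against a reference station.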

Wi-Fi in Smartphones
The use of Wi-Fi and its operational principle for localization is discussed in detail above in Section 5.1.1. To recap, either RSSI-based techniques or the measurement of ranges with the RTT FTM protocol can be distinguished. The capabilities in terms of accuracy and robust estimation of the derived RTT ranges are the main focus of this section.
To be able to measure the RTT, a prerequisite is that both the Wi-Fi APs and the smartphone support the IEEE 802.11mc standard [81]. Apart from the new Wi-Fi chip-sets required, smartphones must also run at least Android version 9 or higher. The requirement for new hardware might still be a limiting factor for the widespread use of Wi-Fi RTT. A decisive advantage, on the other hand, is that the smartphone does not need to connect to the APs, and only the smartphone is used to determine the ranges, guaranteeing the privacy of the user [149]. The authors in [150] report the achievable performance of first trials with Wi-Fi RTT range measurements. Twelve test points were distributed in a regular grid with a spacing of 6 m between them in an outdoor urban environment (see Figure 17). In addition, GNSS observations were used to determine the locations of these test points. Results showed that the deviations of the Wi-Fi RTT solutions from the ground truth were on average 1.4 m, ranging from a few dm up to 2.5 m. The larger deviations resulted from the poor geometry of the four distributed Wi-Fi APs (compare Figure 17 on the left). The GPS positions, on the other hand, deviated on average by 5.1 m. The main reason for the large GNSS deviations lies in the fact that the surroundings partially obstructed the satellite signals and therefore the positioning accuracies were not high at all test locations. Thus, the GNSS solutions could not serve as ground truth.

UWB in Smartphones
UWB has been widely adopted due to its robustness against multipath and its centimetre-level accuracy [151]. Some newer smartphones (e.g., Samsung Galaxy S21, Apple iPhone 11 series) are already equipped with UWB chip-sets, which use a wide spectrum of radio waves for wireless communication. By measuring the two-way ToF, a range between two UWB radios can be determined. Although UWB chip-sets in smartphones are currently not used for ranging applications, their inclusion in mobile devices opens up that possibility. Consequently, they can be seen as a future SoP, just as Wi-Fi is currently. Thus, UWB may become the standard for indoor positioning in the coming years as defined by the European Telecommunications Standards Institute (ETSI) [152].
UWB is the foundation of tracking tags such as Apple's AirTag and Samsung's SmartTag Plus, which can help users find a lost keychain, purse, wallet or pet [153]. In a few cases, UWB lets users unlock their car as they approach with their phone, and it should eventually allow the same for a home's front door. However, this is just the beginning of the utilization of UWB for localization in general.
The main drawback of UWB in smartphones, however, is that, in order to achieve a short pulse width, the UWB device has a high power consumption for a single packet transmission [154,155]. Thus, using the RTT protocol, which requires the exchange of multiple packets, will further increase the energy consumption.
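The two-way ToF principle mentioned above can be sketched as follows for single-sided two-way ranging, where the initiator measures the round-trip time and the responder reports its known reply delay. The function name and nanosecond units are illustrative assumptions:

```python
def sstwr_range_m(t_round_ns, t_reply_ns):
    """Single-sided two-way ranging: the initiator measures the round-trip
    time; the responder's known reply delay is subtracted before halving."""
    c = 0.299792458  # speed of light in m/ns
    tof_ns = (t_round_ns - t_reply_ns) / 2.0
    return tof_ns * c
```

In practice, double-sided two-way ranging with two round trips is usually preferred, since it largely cancels the clock-frequency offset between the two radios, at the cost of the additional packet exchanges (and thus energy) noted above.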

Light Detection and Ranging (LiDAR)
LiDAR works in a similar way to radar, except that it uses lasers to estimate distances and depth [156]. The first smartphones to incorporate LiDAR were the Apple iPhone 12 Pro and iPhone 12 Pro Max, released in October 2020. The concept behind LiDAR has been around since the 1960s. In short, the technique scans and maps the environment by emitting laser beams and timing how quickly they return. More recently, LiDAR has been deployed on self-driving autonomous vehicles, where it can detect objects such as cyclists and pedestrians. LiDAR's possibilities have opened up many types of applications. With the systems getting smaller, cheaper and more accurate, they have become viable additions to mobile devices that already have powerful processors and other sensors which can be employed for localization. Not all LiDAR systems are created equal; until fairly recently, the most common types built 3D maps of their environments by physically sweeping a laser beam around. For mobile devices, however, LiDAR systems have no moving parts. They differ from the 'scannerless' ToF sensors seen on smartphones so far in that LiDAR is a scanner: instead of using a single pulse of infrared light to create 3D maps, the scanning LiDAR system fires a large number of laser pulses at different parts of a scene over a short period of time. This brings two main benefits, i.e., an improved range of up to five meters and better object 'occlusion', that is, virtual objects correctly disappearing behind real ones such as trees. The high speed of the transmission is only possible with the latest mobile processors. As Apple stated at the iPad Pro 2020 launch, the LiDAR scanner's data is integrated with data from cameras and a motion sensor, then enhanced by computer vision algorithms for a more detailed understanding of the scene. The scanning thereby works best at room-sized scales, opening up a number of applications.
In particular, developers envisage Augmented Reality (AR) applications.
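The per-pulse geometry described above can be sketched as follows: each return time, together with the known firing direction of that pulse, yields one 3D point in the sensor frame. The function name and angle convention are hypothetical, chosen only to illustrate the principle:

```python
import math

def pulse_to_point(t_return_ns, azimuth_deg, elevation_deg):
    """Convert one laser pulse (return time plus firing direction) into a
    3D point in the sensor frame, as a scanning LiDAR does per pulse."""
    c = 0.299792458  # speed of light in m/ns
    r = c * t_return_ns / 2.0  # two-way travel time -> one-way range
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = r * math.cos(el) * math.cos(az)
    y = r * math.cos(el) * math.sin(az)
    z = r * math.sin(el)
    return x, y, z
```

Accumulating many such points over a short time window produces the room-scale depth map that the AR frameworks then fuse with camera and motion-sensor data.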

5G Cellular Networks
5G is the fifth generation of mobile connectivity, the new global wireless standard succeeding 4G networks, and is advertised as superior to 4G in many ways. Compared with previous cellular technologies, the new 5G standard has strong potential to change cellular-based indoor localization systems [64], for the following main reasons. First, the coverage range of 5G base transceiving stations (BTS) shrinks from kilometers to hundreds of meters or even below 100 m [157]. The increased BTS density will enhance the geometry and mitigate NLoS conditions. Second, 5G has new features, including mmWave Multiple-Input and Multiple-Output (MIMO), large-scale antenna arrays and beam-forming. These features make it possible to use multipath signals for positioning [158]. MIMO antennae provide a precise orientation of the signal in one specific direction instead of a multi-directional broadcast. Third, 5G may introduce device-to-device communication [159], which makes cooperative positioning systems possible (see Section 6.6). Thus, 5G and other wireless systems provide a possibility for wide-area localization in indoor and urban areas.
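To illustrate how such directional beam-forming measurements can be exploited, the following sketch intersects two azimuth bearings reported by two BTSs to obtain a 2D angle-of-arrival fix. The angle convention (east = 0°, counter-clockwise) and the function name are illustrative assumptions, not part of the 5G positioning specifications:

```python
import math

def aoa_fix(bts1, az1_deg, bts2, az2_deg):
    """2D angle-of-arrival fix: intersect two bearing rays, one from each
    base station, by solving the 2x2 line-intersection system."""
    a1, a2 = math.radians(az1_deg), math.radians(az2_deg)
    d1x, d1y = math.cos(a1), math.sin(a1)  # direction of bearing 1
    d2x, d2y = math.cos(a2), math.sin(a2)  # direction of bearing 2
    bx, by = bts2[0] - bts1[0], bts2[1] - bts1[1]
    det = -d1x * d2y + d2x * d1y
    if abs(det) < 1e-12:
        return None  # parallel bearings, no unique fix
    t1 = (-bx * d2y + d2x * by) / det  # distance along bearing 1 (Cramer's rule)
    return bts1[0] + t1 * d1x, bts1[1] + t1 * d1y
```

With only two bearings the fix is exact but unverifiable; in a dense 5G deployment, additional BTSs would allow a least-squares solution with redundancy.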
Inside buildings and in dense urban areas, the super-fast 5G (mmWave) should be capable of achieving a positioning accuracy of 1 m or below. This sounds promising, but mmWave is unfortunately not a reality just yet [160]. For the time being, the 5G that is mostly deployed is so-called sub-6-GHz 5G, which is faster than 4G but not yet sufficient for accurate indoor positioning systems. Table 5 compares the main specifications of the cellular standards 5G-mmWave, 5G-sub-6-GHz and 4G LTE. Before all advantages of 5G can be used, wireless providers have to upgrade their radio antennae to work with the new network, and phone makers have to upgrade the chips in their phones. Further challenges are that mmWave has weak penetration and ultra-short range, although mmWave 5G networks are ultra-fast. mmWave's performance can be affected significantly by doors, windows, walls, trees, vehicles and even humans. The density of BTSs required to satisfy the quality of user experiences will be decided by the constraints of blockages and latency, rather than by the requirements of coverage or capacity. Whereas 4G networks require 8-10 base stations per km², 5G networks would need as many as 40-50 base stations per km² [160]. Especially indoors, full coverage for positioning would require a very large number of BTSs. Table 5. Comparison of the cellular network standards 5G-mmWave, 5G-sub-6-GHz and 4G LTE (after [160]).

Figure 18 demonstrates the evolution of PNT solutions, starting from positioning with a single sensor, then moving to multi-sensor solutions on a single platform (i.e., for one user) and finally to multi-sensor solutions on multiple platforms. The latter is a network-based approach which is usually referred to as cooperative or collaborative positioning, short CP. To date, multi-sensor and CP systems have been demonstrated to be useful for positioning of mobile platforms navigating in GNSS challenged and denied environments [29,161-163].
The CP approach relies on information exchange in an interconnected network of multiple nodes (e.g., pedestrians, vehicles). In a CP network, each node shares information about its own state and relative information concerning its neighbouring dynamic nodes (i.e., Peer-to-Peer (P2P) ranges). Additionally, information can be exchanged with the static nodes (i.e., anchors or infrastructure nodes) [164][165][166]. The relative information such as ranges measured between dynamic and static nodes is also referred to as Peer-to-Infrastructure (P2I) range information. In addition to absolute information (e.g., GNSS), the cooperative solution can be based either on the P2P relative information only or on the combination of the P2P and P2I relative information (e.g., Wi-Fi, UWB). The multi-sensor solution can be based either on measurements that do not require infrastructure (e.g., IMU, GNSS) and/or P2I relative information (e.g., Wi-Fi, UWB). The absolute and relative information received from nodes can be processed to estimate the position of each node. The processing architecture for CP can be either centralised or distributed. Compared to distributed architectures, centralized ones offer improved accuracy but at the cost of increased communication and processing requirements. In contrast, distributed architectures offer robustness, scalability, and better reliability of the CP network [142]. One of the major limitations of distributed algorithms is the presence of unknown correlation among the states of the nodes [167]. The inclusion of static anchors (i.e., infrastructure nodes) has been shown to improve positioning accuracy [95]. On the other hand, infrastructure-free (i.e., P2P) CP systems do not rely on the presence of a fixed infrastructure and can use ad hoc networks for positioning. 
In GNSS-denied environments, such as an indoor environment, P2I and P2P networks can be best utilised by the realisation of a sufficient number of static anchor nodes, whose precise location is known in advance.
The key components of a CP network are:
• inter-nodal ranging sub-system in a dynamic network;
• optimisation of dynamic network configuration;
• time synchronisation;
• optimum distributed sensor aperture size;
• communication sub-system;
• selection of master and anchor nodes; and
• network topology.
To summarize, it can be said that CP is a fully integrated approach combining sensors, signals and techniques. For sensor fusion (compare Section 5.3), the EKF, Monte Carlo localization and artificial intelligence are common approaches.
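As a minimal illustration of how a single P2P or P2I range could be fused into a node's state with an EKF, consider the following simplified 2D measurement update with hand-written matrix algebra. This is a generic sketch, not the filter of any specific system in the literature:

```python
import math

def ekf_range_update(x, P, anchor, r_meas, r_var):
    """EKF measurement update fusing one range (to an anchor or peer at a
    known position) into a 2D position estimate x with covariance P."""
    dx, dy = x[0] - anchor[0], x[1] - anchor[1]
    d = math.hypot(dx, dy) or 1e-9
    H = [dx / d, dy / d]  # Jacobian of the range model h(x) = ||x - anchor||
    # P H^T and innovation covariance S = H P H^T + R
    PHt = [P[0][0] * H[0] + P[0][1] * H[1],
           P[1][0] * H[0] + P[1][1] * H[1]]
    S = H[0] * PHt[0] + H[1] * PHt[1] + r_var
    K = [PHt[0] / S, PHt[1] / S]  # Kalman gain
    innov = r_meas - d            # measured minus predicted range
    x_new = [x[0] + K[0] * innov, x[1] + K[1] * innov]
    # covariance update P = (I - K H) P = P - K (H P)
    P_new = [[P[0][0] - K[0] * PHt[0], P[0][1] - K[0] * PHt[1]],
             [P[1][0] - K[1] * PHt[0], P[1][1] - K[1] * PHt[1]]]
    return x_new, P_new
```

In a centralised CP architecture such updates for all nodes would run in one filter; a distributed architecture applies them per node, which is where the unknown inter-node correlations mentioned above become problematic.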

Application Fields
A large number of application fields reveal the significance of indoor positioning for our society. They can be classified into geodetic and navigation types of applications. Typical application fields range from LBS [168] and social networking; location tracking of first responders in emergency situations; navigation of visually impaired and handicapped persons; motion capturing; Augmented Reality (AR); Ambient Assisted Living (AAL) systems and the localization and tracking of medical care personnel; visitor tracking for surveillance and the study of visitor behaviour; location-based user guiding and triggered context-aware information services in museums, shopping malls, etc.; and the location of assets and staff members in logistics and disposition; to surveying and geodetic applications such as disaster management, scene modelling and mapping, positioning in underground construction sites and mining, structural health monitoring, etc. [4,93].

Comparison of Sensors, Technologies and Techniques
For system comparison, Table 6 summarizes the commonly employed indoor localization techniques. The compendium is based on a modified and updated version of that published by the author in [4]. In addition, the second table in this section (Table 7) identifies the characteristics and specifications of the most relevant indoor localization techniques and sensors applied for smartphone positioning. The localization techniques differ significantly in their performance and required costs. The main positioning techniques are the cell-based or proximity technique, lateration and angulation, hyperbolic lateration, scene analysis, location fingerprinting, and dead reckoning (DR). Moreover, the specifications of the technologies are quite different, as can be seen from the two tables. Due to their specific advantages and disadvantages, hybrid solutions are popular, such as the combination of DR using INS with lateration or fingerprinting. Their integration leads to more robust, reliable and accurate performance under consideration of the key performance indicators given in Section 2.1.
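As an illustration of the location fingerprinting technique mentioned above, a minimal k-nearest-neighbours matcher in signal space might look as follows. The radio-map layout (reference position paired with a stored RSS vector) and the function name are assumptions made for this sketch:

```python
def knn_fingerprint(radio_map, rss_query, k=3):
    """Location fingerprinting: estimate a position by averaging the k
    radio-map reference points whose stored RSS vectors are closest
    (squared Euclidean distance in signal space) to the measured vector.

    radio_map: list of ((x, y), [rss_ap1, rss_ap2, ...]) entries.
    """
    nearest = sorted(
        radio_map,
        key=lambda entry: sum((a - b) ** 2 for a, b in zip(entry[1], rss_query)),
    )[:k]
    xs = [pos[0] for pos, _ in nearest]
    ys = [pos[1] for pos, _ in nearest]
    return sum(xs) / k, sum(ys) / k
```

A weighted variant (weights inversely proportional to the signal-space distance) is a common refinement, as is replacing the deterministic match with a probabilistic one.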
In the following, the major common generic selection metrics are consolidated [169]. The third table in this section (Table 8) provides a compendium giving an overview of the major localization systems developed for indoor localization with their specific metrics. The metrics included in the table are: (1) scalability; (2) accuracy; (3) complexity; (4) robustness; (5) reliability; (6) energy efficiency; (7) cost; and (8) throughput. Hence, these metrics represent the key performance parameters described in Section 2.1. As can be seen, quite a number of developers have produced different systems based on the use of RF signals. For the comparison, infrared, ultrasound, Wi-Fi, Bluetooth and UWB signals have been chosen, as they represent the most commonly used RF-based technologies and techniques. A selection of commercialised systems is mentioned in the table in the row 'IPS (Indoor Positioning Systems) examples', together with selected relevant references.
From these comparisons it can be concluded that none of the techniques can yet satisfy the performance requirements of every location identification application, due to the fact that each technology has certain limitations. In the review by Oguntala [169], Wi-Fi, UWB, Bluetooth, RFID and ZigBee showed higher viability than most other technologies due to advantages such as cognitive intelligence, compatibility with most mobile devices, higher precision, low EM radiation, large bandwidth, high penetrating power and lower effective coverage area exhibited in applications. The way forward is a hybridization of different technologies and sensors, such as MEMS-based IMUs. In fact, INS as the primary sensor (compare Section 4) requires a hybridization anyway. Hence, this has led to many different developments providing effective approaches to achieve reliable positioning solutions also in the case of smartphone usage. Table 8. Overview of commercialized RF-based location identification systems and their key performance parameters (updated excerpt from [169]).

Concluding Remarks
In this review paper, we have surveyed indoor localization technologies and techniques together with their challenges and new developments. In particular, the paper focuses on the use of mobile devices, such as smartphones and tablets. Starting with the user requirements, where GNSS-like performance or even higher performance needs to be achieved for safety and liability critical applications, the state-of-the-art and future directions for development are identified. Since most people generally spend more time indoors, location identification is highly important when it comes to safety and emergency services. Thus, many researchers and developers work on localization techniques for indoor and challenging transitional environments to facilitate a seamless transition when moving from outdoor to indoor or vice versa. In the evolution of localization technologies and techniques, cooperative positioning (CP) of a group of users or sensor platforms was identified as playing a decisive role nowadays; user platforms are localized together while interconnected in a certain neighbourhood. It was further discussed that a paradigm shift is currently emerging, from classical approaches where the INS bridges GNSS outages to a reversed approach where the INS is the primary sensor for positioning. In this way, the users' locations and/or trajectories while moving have to be estimated with advanced sensor fusion techniques. In the case of localizing pedestrians, PDR is the main technique for estimating trajectories using an EKF or particle filter (PF).
Furthermore, the comparison of the localization sensors and techniques reveals their potential for the development of ubiquitous positioning and navigation solutions. The use of SoPs for indoor localization will continue and even increase, as not only the most prominent SoP, i.e., Wi-Fi, but also the newly developed UWB chip-sets and LiDAR sensors for mobile devices have entered the smartphone market. These sensors are being developed in the direction of low cost and small size to facilitate their commercialization for the mass market, and their cost will be reduced significantly in the course of these developments. To summarize, it can be noted that the fast-paced developments in the smartphone market are leading to solutions that will facilitate robust and precise localization capabilities with high performance. Apart from improving these low-cost sensors, future trends are heading towards the use of multi-platform, multi-device and multi-sensor information fusion.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.