Dynamic Vertical Mapping with Crowdsourced Smartphone Sensor Data

In this paper, we present our novel approach for the crowdsourced dynamic vertical mapping of buildings. For achieving this, we use the barometric sensor of smartphones to estimate altitude differences and the moment of the outdoor to indoor transition to extract reference pressure. We have identified the outdoor–indoor transition (OITransition) via the fusion of four different sensors. Our approach has been evaluated extensively over a period of 6 months in different humidity, temperature, and cloud-coverage situations, as well as over different hours of the day, and it is found that it can always predict the correct number of floors, while it can approximate the altitude with an average error of 0.5 m.


Introduction
Indoor maps have become a necessity in robotics, augmented reality, location-based services, mobile ad hoc networks, and search and rescue missions. Because of the high manual effort of generating indoor maps, there have emerged approaches for the dynamic generation of two-dimensional indoor maps through crowdsourced sensor data (e.g., [1,2]). However, these approaches require precise localization. Although many localization providers argue having achieved an average accuracy of 6 m in horizontal localization, none of them provides vertical localization. This has as a result pushed back milestones scheduled by initiatives that are focused on accelerating the research of indoor localization, as these milestones require storey-level localization. Such initiatives are the Enhanced 911 [3] in the United States, and the Enhanced 112 in the European Union [4], as well as the European Accessibility Act [5]. The main reason for the lack of vertical localization providers is the limited information available, for example, the lack of precise altitude indication for every floor in a building in existing maps. To the best of our knowledge, no approach for the dynamic vertical mapping using crowdsourced smartphone sensor data has been proposed.
This paper aims to automate the indoor vertical mapping process, while enriching existing maps with indoor information. In this way, we enable maps to carry information regarding the number of floors in a building and the corresponding altitude of each floor. We achieve this using the novel method we use to fuse the barometric sensor of smartphones with other sensors for the extraction of the ambient reference pressure in locations, which can be used for precise altitude estimation.

Background on the Barometric Formula
The atmospheric pressure is the weight exerted by the overhead atmosphere on a unit area of a surface. The barometric formula describes how this atmospheric pressure is reduced when the altitude is increased and vice versa. The unit of pressure is 1 hPa = 1 mbar = 100 Pa.
The barometric formula reads: where h b is the reference altitude, T b and P b are the temperature and pressure at the reference point, L b is the standard temperature lapse rate of 6.49 K/km, P is the pressure at the current point at height h, R is the universal gas constant 8.3144621 J/K/mol, g 0 is the earth's gravity acceleration 9.80665 m/s 2 and M is the molar mass of the earth's air 0.0289644 kg/mol. Equation (1) can be altered for estimating altitude to give the following: The barometer equation is valid within a few kilometers of the earth's surface, within which the lapse rate, gravity acceleration and air composition can be considered constant, given that P b and T b consistently refer to the reference height h b . According to the barometric formula, a 1 mbar difference in pressure, with a 15 • C ambient temperature, leads to a 8.33 m altitude change, while a 1 m change of altitude leads to a 0.1201 mbar change in pressure.

Contribution
The contributions of this paper can be summarized as follows: • We introduce a novel infrastructure-independent method for the dynamic vertical mapping.

•
We introduce a novel approach for the reference pressure estimation through the identification of the outdoor-indoor transition (OITransition) of the user through the fusion of three different sensors. In this way, the need of calibration between sensors becomes obsolete.

•
We propose an enhancement of the CityGML level of detail two plus (LoD2+) method that provides the indoor geometry of buildings at lower levels of detail.
This paper is an extension of work already presented in [15]. More specifically, in this paper, we have extended the approach, by including an additional sensor for the OITransition discovery. This additional sensor is the magnetic sensor, and more information about it is available in Section 3.3.3. Additionally, the method has been extended and sensor fusion functionality has been added in the reference pressure area component. More information is available in Section 3.3.4. Moreover, the evaluation has been extended with additional collected data over longer period of time, as can be seen in Section 4. Finally, as a result of the above-mentioned extensions of our approach, we have achieved a more accurate identification for the recognition of the OITransition discovery with a true positive score of 99.3% instead of 94.2% in the past. This makes our method more robust against various building characteristics.

Paper Structure
In this paper, the related work is introduced in Section 2; the approach is described in Section 3; the evaluation is presented in Section 4; the paper concludes in Section 5, where limitations to validity are also presented; the resulting models are presented in Appendix A; and the list of collected data is presented in Appendix B.

Related Work
Enhancing CityGML models with indoor geometry has already been discussed in [11]. In this study, the LoD2+ method was introduced. The method is robust and was implemented successfully using Nef Polyhedra. However, the authors used some prior knowledge, such as building facades and available data modeled following the LoD2 format. As a result, this method is not applicable to general cases because not all buildings contain sufficient information that can be used for mapping indoor areas.
Apple holds a patent that focuses on the visualization of information in indoor 3D places [16]. They do not consider altitude estimation, but instead they assume the existence of indoor maps with locations that specify where vertical transitions may occur, annotated on the map, and a two-dimensional localization mechanism. Additionally, they assume that users can be localized in a particular floor using a particle filter-based framework, which is responsible for assessing the probability of a vertical transition. In this framework, the confidence is quantified on the basis of WiFi access points and the receive signal strength.
Kaiser et al. [17] point out the need of detecting vertical transitions because of the limitation of the Zero Velocity Update (ZUPT) algorithm to identify vertical displacements. To solve this problem, they introduce a moving platform detection module. This works by combining accurate sensors, not those available on a smartphone, such as an accelerometer, barometer and magnetometer. These use ZUPT for localization and a Simultaneous Localization and Mapping (SLAM) algorithm for reducing the remaining drift. They estimate altitude using the barometric sensor, while they also use it to identify landmark phases. In addition, they attempt to identify the boundaries of vertical movements. The intuition for the use of acceleration for the detection of vertical transitions is that the acceleration caused by external factors is weaker than that caused by the pedestrian. However, their approach focuses on correcting real-time localization and assumes the existence of indoor maps.
Li et al. [7] suggest using barometers for 2.5-dimensional (2.5D) (floor-level) localization. They examine how the barometric formula performs for altitude determination. They researched the robustness of altitude estimation on different devices that record differences from 2.1 to 2.5 hPa, which is translated to an offset of multiple floors. They noted that the variation of pressure over 2 h could reach an equivalent of a 10 m height change. They also examined latency robustness as well as stability in the short term, where they noticed changes of 0.1 hPa every 10 min. On the basis of their experiments, they argue that it is impossible to accurately determine height using a barometer in an indoor environment in an absolute manner. They strongly point out the necessity of a reference station. In their study, they used a reference station 5 km away. However, a reference station is not always available, and using other devices such as reference stations requires calibration, which is not realistic in a real-world scenario.
Xia et al. [6] propose the use of multiple barometers as reference points for the floor positioning of smartphones with built-in barometric sensors. This method does not require knowledge of the accurate heights of buildings and storeys. It is robust against temperature and humidity, and it considers the difference in the barometric pressure-change trends and different floors. The intuition is that atmospheric pressure decreases as the altitude increases. Hence, pressure changes that correspond to altitude changes are possible to be calculated using a reference pressure and the barometric formula. As they argue, humidity does not significantly affect the accuracy of the system for indoor altitude estimation; thus, they use the gas constant for dry air and the air molar mass of dry air instead of humid air. On the basis of the barometric formula and using built-in barometric sensors of smartphones as well as information from a local weather station, they are capable of achieving a good discretization between different floor levels. For the current temperature, they consult a local weather station online service. However, this approach is heavily dependent on dense existing infrastructure, while it focuses only on localization and assumes the existence of maps, which describe the location of each sensor.
Bollmeyer et al. [18] use barometers for medical applications in which a precise altitude estimation of the patient's body is needed. A challenge in this case is the disturbances due to macroscopic flow, such as the influence of ventilation, the opening and closing of doors, or the weather. Calibration between sensors is also needed, in order to compensate for the offset between different sensors. In their research, they created a small sensor network, with sensors attached to the patient body, as well as a reference stationary sensor. They measure a maximum error of 21 cm, but they suggest that a second sensor might reduce the maximum error to 10 cm. However, in our application scenario, we do not focus on such accurate vertical localization; we are looking for an infrastructure-independent approach.
Liu et al. [14] argue that the estimation of altitude via GPS is applicable only outdoors, although even there, its error can be 2.5 times the error of the horizontal location. As a result, they suggest barometers for vertical localization. Their main limitation is the lack of reference points, because the only available reference stations are meteorological stations, which are often sparsely located, while they broadcast periodically, usually at 1 h intervals. Therefore, they introduce the concept of ad hoc reference points. They integrate information from multiple points, while they also use forecast models to estimate air pressure on demand. Besides reference meteorological stations, they additionally use other smartphones when the elevation indication is accurate enough. In order to retrieve better accuracy from other phones, first, they take into account all the reference points that are within a specified distance and time period, and then they give higher weights to reference stations that are closer in distance as well as in time. They also assign a different credibility to different reference stations. Hence, a reference station will be more reliable if its location is known and can report better pressure. They score errors of less than 3 m in outdoor walking, 6 m in mountain climbing, and 0.9 m in indoor floor localization. However, an ad hoc reference sensor reading will constantly have the need of being extracted; it is not clear how this can be achieved, particularly without maps that describe those reference locations.

Approach
In this section, we present the main components of our approach. As visualized in Figure 1, the approach is composed of the Sensor Data Collection module, which collects the data from smartphone users via an application that has been developed for the purpose of this research and can be found in ref. [19]. After smartphone pressure sensor data are collected, noise is filtered out in the Signal Filtering module. The Reference Pressure Extraction module has two roles: (1) to filter outdoor data, and (2) to identify locations where pressure readings can be extracted. In the Stair Removal module, features that belong to intermediate heights (i.e., stairs or elevators) are rejected. Remaining pressure readings are later used in the barometric formula for Altitude Estimation. In the Data Aggregation module, we combine data from multiple users, while the Floor Estimation module has two roles: (1) to identify the number of floors in a set, and (2) to estimate the altitude of each floor. Finally, in the CityGML Generator module, a CityGML Model is dynamically generated.

Sensor Data Collection
The sensor data collection module collects sensor data from pressure, light, GPS, proximity and magnetic sensors. Data collected during different temperatures, days, times and humidity situations, labeled with a time-stamp and a unique user identifier, are streamed on a server developed for this purpose through a client-server approach via HTTP protocol, in JSON format. Our collected data are openly available in [20].

Signal Filtering
For smoothing the collected data, the Savitzky-Golay filer [21] is used. Savitzky-Golay is a moving average filter, which applies local regression to a subset of our entire dataset. More specifically, it smooths data by replacing each data point with the average of the neighboring data points within a defined span. This approach is equivalent to where y s (i) is the smoothed value for the ith data point, N is the number of neighboring data points on either side of y s (i), and 2N + 1 is the span.

Reference Pressure Extraction
The reference pressure is essential for estimating the altitude differences on the basis of the barometer equation using pressure data. The reference pressure is extracted from areas that fulfil the following preconditions: (1) they are common for all user data of each building, (2) they are located indoors, and (3) the pressure fluctuations are low. Such an area is the one that follows the OITransition, as everyone inside a building was at some point in time outside, while it is located indoors where the pressure disturbances are low.

Light Sensor
As has been already suggested by Zhou et al. [22], the OITransition can be identified by aggregating multiple smartphone sensor data. A very promising sensor for this is the ambient light sensor, considering the fact that there is a difference of the light intensity between indoors and outdoors. For identifying the OITransition, in our research, we fuse light and proximity sensors, with 7 and 25 Hz recording rates, respectively. The first sensor helps us to identify the transition, and the second is used as a supportive sensor, indicating when to trust the data, as it can indicate that an object blocks the light sensor.
As can be seen in Figure 2, the light intensity drops when entering the building during the day and increases during the night, while the proximity sensor indicates whether to trust the light sensor, because of various phone poses (e.g., phone in pocket). Hysteresis thresholding is used for maximizing the margins of the signal that belong outdoors and indoors. Finally binary classification is applied on the basis of the high and low distribution frequency, while the decision of whether the data is collected during day or night taken from the hour angle ω 0 of the sun (negative at sunrise; positive at sunset) is computed with where φ is the latitude of the observer on the earth and δ is the sun's declination.

Figure 2.
Light data from six outdoor-indoor transitions (OITransitions) collected during the same day, five during day time and one during night. As can be seen, during the OITransition (after the 70th sample), the light intensity rapidly decreases during the day (left axis) and increases during the night (right axis).

Hysteresis Threshold
The hysteresis thresholding algorithm uses multiple thresholds to find rapid changes in a signal. The algorithm is thus used to discriminate indoor and outdoor locations. Figures 3 and 4 show that it allows the identification of OITransitions with great accuracy. First, we estimate the upper and lower thresholds for the hysteresis thresholding, on the basis of a histogram analysis. In the histogram analysis, we compute frequency distributions of discrete light intensities. We select the upper and lower thresholds on the basis of the pattern of the distribution. If an OITransition exists in the sensor data segment, then the distribution forms a bimodal pattern and the thresholds are selected from the lower and higher peaks of the distribution. Alternatively, if the sensor data segment contains an OITransition, the distribution shows a symmetric pattern and it is not be taken into consideration as a potential OITransition segment. After the upper and lower thresholds have been defined, the upper threshold is used to find the start of a rapid transition. Once a start point is found, then the path is traced from the rapid signal transition through the signal, segment by segment, marking indoors whenever it is above the lower threshold. It stops marking indoors only when the value falls below the lower threshold. We note that during the period after sample 7 × 10 5 , the smartphone was in a pocket. However, it is wrongly classified as indoors. This demonstrates the need for fusion with the proximity sensor, which can indicate whether the phone is exposed (the light sensor can be trusted) or not. Unfortunately, as can be seen in Figure 4, the accuracy of the indoor classification at night-time is reduced in comparison to during the day time.

GPS Uncertainty
Another characteristic of the OITransition is the rapid increase of the GPS uncertainty. As a result, in our approach, we recorded the GPS uncertainty with a sampling frequency of proximately 1 Hz. As can be seen in Figure 5, at the moment of the transition (after the 80th sample), the GPS uncertainty increased from less than 10 m to almost 60 m. Hysteresis thresholding [23] was applied for the maximization of the margin between low GPS accuracy (indoors) and high-accuracy data (outdoors) for better classification. More specifically, GPS uncertainty was first smoothed via a Gaussian smoothing filter. Then multiple hysteresis thresholding was applied in order to enhance the margin and hence the accuracy of the OITransition classification. The approach can be seen in Figure 6. As can be seen in the figure, raw GPS uncertainty (red line) was first smoothed with a Gaussian filter. Then hysteresis thresholding was applied to the smoothed signal (magenta line).

OITransition Detection and Histogram Analysis
Before hysteresis thresholding was applied, the raw GPS uncertainty signal was smoothed via a Gaussian smoothing filter. Then multiple hysteresis thresholding was applied to enhance differences between segments of the signal that belonged outdoors or indoors. This approach is detailed explained in Section 3.3.1 and visualized in Figure 6.
For the identification of an OITransition in the data segment, as well as for the definition of the threshold in the hysteresis thresholding, histogram analysis was applied in the entire GPS uncertainty signal segment. As can be seen in Figure 7, the frequency of different uncertainty radii is visualized in the histograms. As can be seen in Figure 7c, the histogram forms a bimodal pattern when an OITransition occurs. This is a recognizable characteristic of a segment of uncertainty data that contains an OITransition. Once a transition is identified, the two peaks of the signal are used as the upper and lower thresholds in the hysteresis thresholding algorithm. On the other side, as can be seen in Figure 7a,b, the histograms show a more symmetric pattern, which is an indication that the data are extracted from a single place; this place is either indoors or outdoors. More specifically, as can be seen in Figure 7b, the GPS uncertainty is high-more than 20 m-which is an indication that the particular segment has been extracted from exclusively indoor locations. On the other side, as can be seen in Figure 7a, the GPS scores a low uncertainty-less than 15 m-which is an indication that the data are extracted from exclusively outdoor locations.

Magnetic Signal
The magnetic sensor can detect disturbances of the ambient magnetic field, as a result of steel elements inside the walls of a building. Hence, the intensity of the magnetic field can be used as an indicator for identifying the OITransition [24]. In this section, we introduce a process for the identification of the OITransition by measuring the disturbances of the magnetic field. For the identification of OITransitions, we combine a Gaussian filter and the moving window standard deviation. In the following example, we selected a magnetic dataset from the collected data [20] from four of our buildings. The corresponding magnetic signal is shown in Figure 8: The route that corresponds to the signal shown in Figure 8 begins outdoors, followed by four indoor transitions and four outdoor transitions. Towards the end of the time interval, the third outdoor transition occurred when exiting the fourth building.
In the first step, disturbances in the signal were found using the moving window standard deviation with a window size of 20 samples along the time axis. The resulting signal is shown in Figure 9 (orange line). Once disturbances were identified, a second moving standard deviation extraction was applied to the new generated signal. This time, the window size corresponded to 200 samples. The result is illustrated in Figure 9 (purple line).  In the third step, a Gaussian filter was applied to the resulting signal in order to smooth it with a kernel of 500 samples. As can be seen in Figure 9 (red line), this contributed to the identification of the four blobs that correspond to the duration-one by one-of the indoor walking activities. They then could be used to distinguish indoors from outdoors.
Finally, in order to enable binary classification between indoor and outdoor areas, a moving STD was performed, followed by another Gaussian filtering step. The resulting signal was then used to determine the start and end of the indoor areas. The final classification can be seen in Figure 10 (black line), where the value 1 corresponds to indoors and 0 corresponds to outdoors.

Fusion
The sensor fusion was made as is described in Table 1. The sensors that have been taken into consideration are the proximity, the light, GPS and the magnetic field sensor. Their decision is fused as follows: • If the proximity sensor indication is false, this implies that there is no obstacle blocking the light sensor. As a result, three sensors are available. Hence, the result is determined on the basis of the voting fusion. For example, if the light and GPS sensors identify that the particular data segment is extracted from indoors, then the segment is classified as an indoor data segment.

•
On the other side, if the proximity sensor indicates "true", then we have only two sensors available. The majority voting can thus not be applied here. Hence, in such a case, the logic operation and is applied. For example, if the magnetic sensor indicates disturbances-and as a result, indoors-but the GPS uncertainty is low, which indicates outdoor space, then the segment is classified as outdoors.

Stair Removal
In the stair removal phase, sets of features with high disturbances in the pressure readings are rejected, as they mostly correspond either to vertical transitions (e.g., stairs or elevators) or to outliers (e.g., high wind velocities). Such features of high disturbance are identified using the moving window standard deviation.
This approach is equivalent to where x i is the instance of the input signal and N is the number of elements.

Altitude Estimation
The altitude is estimated on the basis of the barometer Equation (2) as follows: where P 0 is the reference pressure extracted from the location where the OITransition was identified, P i is the current pressure value and T b is the temperature value in • C, which is extracted via openly available weather stations online.

Data Aggregation
Data aggregation is essential for identifying all floors inside a building, as not all users are expected to visit all floors. In the data aggregation module, multiple recorded data are fused. Grouped by their GPS coordinates and combined with the building outline, extracted from OpenStreetMap [25], it is ensured that the data always correspond to the same building. More specifically, altitude information estimated from multiple users and labeled by their unique users identifier (UUID) are sorted by their time-stamp and fused together for the classification phase. Because the reference pressure for the altitude estimation is extracted by the same device as that used for estimating it, approximately at the same location for all the users, because of the novel approach for reference altitude extraction on the basis of the identification of the OITransition, there is no need to calibrate any sensor between different phones. In this paper we consider all existing entrances of a building to be at the same altitude. However, in the case of multiple entrances at different altitudes, the entrance altitude, as well as the longitude and latitude, can be extracted from [25], and then the OITransition can be used for the identification of the entrance location. Once the entrance location is identified, the difference between the global altitude of the entrance can be used for locally referring the floor height.

Number of Floors Estimation
Because the number of floors as well as the label of every floor (i.e., the corresponding altitude) are unknown, for classification, we used a classifier able to cope with unlabeled data. The classifier K-means was selected because of its simplicity and its relatively low processing demand. For estimating K, the elbow method was selected. The classification process is divided into two main steps. The first step is the identification of K, which corresponds to the true number of floors. In the second step, the center of each cluster is recognized, which corresponds to the altitude of every floor.

Identification of K
Because the number of floors is unknown (K), it has to be estimated in the first step. For this purpose, the elbow method [26] was chosen. The elbow method is a clustering analysis method, and it enables the interpretation and validation of the consistency within the cluster analysis. It takes into consideration the percentage of variance explained as a function of the number of clusters: the optimum number of clusters is reached when adding another cluster no longer improves the modeling of the data. If we plot the variance as a function of the number of clusters, the first clusters will add much information, but with an increasing number of clusters, the marginal gain will drop and the graph will flatten out, indicating the optimum number of clusters. Identifying the correct number for K is essential, as it corresponds to the number of floors. A wrong estimation of K can lead to large errors in the estimated altitude of each floor.

The Centroid of the Clusters
After K is identified, the classification is made using K-means, as the cluster label (i.e., the altitude of each floor) is unknown. The input to the algorithm is the computed vector of filtered pressure data and the estimated number of floors. The algorithm's output is then a vector with the assigned classes for every input point and the cluster centroids.

Implementation in CityGML
In our research, we concentrate on the derivation of the floor numbers and their heights. This does not allow us to create a complete LoD4 model. As a result, we enhance the LoD2 model geometry with the hull geometry for each floor. For this purpose, we introduce LoD2+, as visualized in Figure 11. Figure 11. The proposed level of detail two plus (LoD2+) model, which carries information about the number of stores as proposed by [11] and their corresponding altitudes.
In LoD2 and higher LoDs, the outer facade of a building can be modeled semantically by the _BoundarySurface. The _BoundarySurface is a part of the building's exterior shell with an assigned function such as the wall WallSurface, roof RoofSurface, ground plate GroundSurface, outer floor OuterFloorSurface, outer ceiling OuterCeilingSurface or ClosureSurface. For indoor modeling FloorSurface, InteriorWallSurface, and CeilingSurface can be used [27]. In [11], the authors enhance the CityGML scheme with a new feature class, Storey, which has five attributes: class, function, usage, storeyHeightAboveGround and storeyVolume.
To model the indoor geometry, we keep the LoD2 representation using _BoundarySurface and add indoor geometry for each storey using FloorSurface, InteriorWallSurface, and CeilingSurface, as well as the feature class Storey introduced by [11]. In addition, we propose a further attribute of the feature class Storey: storeyAltitude. This attribute is necessary for our application, as the output of a navigation device is an altitude and not the height above the ground. This extension is not included in the current version of the CityGML specification, however we suggest to include it in the next release.
For the dynamic generation of the CityGML model, citygml4j [28] was used. This is an open-source library for Java, which binds the XML Schema definitions of CityGML to a Java object model.

Evaluation
In this section, we present the evaluation of the proposed method for the dynamic vertical mapping from user smartphone data as shown in Table 2. More specifically, in Section 4.1, the difference in calibration between the two phones used in this experiment is presented. Section 4.2 presents the robustness of our algorithm against various human walking velocities. In Section 4.3, the performance of the identification of OITransitions is evaluated. In Section 4.4, the evaluation of the identification of the number of floors and their altitude estimation during various weather conditions is presented. For the evaluation of the stair removal Section 4.2, data were collected from three different human walking velocities. Finally, a detailed evaluation, with datasets collected over a period of 6 months from three different buildings, is presented in Section 4.4.

Different Phone Calibration
In this section, we discuss the use of our algorithm for two different smartphones. As can be seen in Figure 12, there is an offset between the sensor readings of the two phones. This implies that there cannot be a single point of reference for both sensors and highlights the need for calibration between the two phones. However, as can be seen, the offset between the two sensors is almost stable. As a result, this effect demonstrates the need of self-reference that our approach offers. Hence, considering the fact that each phone will extract reference pressure from its own sensor and the fact that the offset between different phones is stable, our proposed approach will work for any given barometric sensor calibrated under any given circumstances.

Evaluation of Stair Removal
For testing the robustness of our algorithm against different walking velocities in the stair removal component, we recorded data with three different walking velocities, approximately 1×, 1.2× and 1.5×, while climbing five pairs of stairs on a building, as can be seen in Figure 13. As demonstrated in the results (Table 3), the algorithm scored a precision of 94%, recall of 93.8% and F-score of 93.9% on correctly identifying the stairs, with the same sliding window length for all datasets. The sliding window size was 50 samples long or approximately 10 s, while it slid for every sample or approximately every 250 ms. Figure 13. Dataset used for the evaluation of the stair removal method. The data was collected from the same route for three different visits and walking velocities, approximately 1×, 1.5× and 2× [15]. Table 3. Confusion matrix of stair removal.

Fast
Normal Slow

Evaluation of Reference Pressure Extraction
The reference pressure value for the altitude estimation with the barometric formula corresponds to the location that follows the entrance of a building, as detailed described in Section 4.3. As a result, the identification of the OITransition is necessary in order to identify the building entrance. The transition is identified by monitoring peaks and drops by monitoring peaks and drops in the readings of a number of sensors and their fusion, as suggested by [22] and described in Section 4.3. However, in our scenario, the ambient light, the GPS uncertainty and the disturbances of the magnetic field are taken into consideration, rather than the WiFi Received Signal Strength (RSS) and the Global System for Mobile Communications (GSM) RSS. The approach has been evaluated in three different buildings with four collected datasets for each building, during day and night. Our collected data and the algorithm used for the evaluation are open-source and can be found in [20].
As can be seen in Figure 14, the OITransition (red dots) was successfully identified in all of our datasets. Additionally, the entrance location could also be approximately determined by our approach. This was considered as the place of the transition, for example, the space between the last low GPS uncertainty values and the first high GPS uncertainty values. As a result, we could additionally estimate the spatial error of our approach for the OITransition identification. Hence, the entrance location latitude has been approximated by an average of 1.6 m, while the entrance longitude has been approximated by an average of 5.5 m. This score was lower than the GPS average error outdoors, which was between 10 and 12 m. Furthermore, five out of nine times, the entrance location was identified at the latitude of 48.1489, while two times it was identified at the latitude of 48.14895 and once it was identified at the latitudes of 48.14885 and 48.149. The final latitude was to be decided on the basis of the median, which was 48.14894251, while the true entrance latitude as mapped in the open street maps was 48.1489277. Hence, our algorithm scored an error of 0.00001 • , which corresponded to less than 1.64 m. Regarding the longitude, three out of nine times the entrance was localized at the longitude 11.5677, twice at 11.56775 and once at 11.56755, 11.5676, 11.56765 and 11.568. The final entrance location was estimated from the median at 11.568, when the true entrance was located at longitude 11.568. As a result, our algorithm had an error of 0.00004 • , which corresponded to approximately 4.614 m.
Finally, in Tables 4-6, a detailed evaluation of the OITransition determination for each sensor and the sensor fusion for all 3 buildings and 12 datasets is presented. According to the tables, our algorithm scored an average of 96.8% for precision, 94.2% for recall and 95.5% for the F-score, for identifying the OITransition using a GPS sensor. It scored 93.6% for precision, 96.3% for recall and 94.9% for the F-score for OITransition detection with a light sensor. It scored 88.8% for precision, 89.2% for recall and 89% for the F-score for OITransition detection with a magnetic sensor. It scored 99.4% for precision, 90.7% for recall and 94.8% for the F-score for the fusion of all sensors on the basis of the voting fusion. When the light sensor was not available or when the proximity sensor indication was true, it scored 99.1% for precision, 97.3% for recall and 98.2% for the F-score.  As a result, we can conclude that the OITransition can be recognized and represents a robust means for the extraction of the reference pressure. Additionally, the GPS sensor scored the lowest false positives, while the light sensor scored the lowest false negatives. Furthermore, the fusion of the three sensors scored the lowest false positive rate, and the false positive rate dropped only by 0.3% when the light and magnetic field sensor were the only sensors that were fused.

Evaluation
This section presents a long-term evaluation for the number of floors and the floor heights determined for the buildings TUM main campus (library; Section 4.4.1), Building B (Section 4.4.2), and Deutsche Akademie (Section 4.4.3). The ground truth was obtained via a high-precision laser range meter device. We observed these buildings for about 6 months to evaluate the effects of long-term weather conditions on the measurements.

Building 1: TUM Main Campus
TUM main campus has five floors and a ground floor. The true height for each floor is listed in Table 7. Nineteen datasets were collected from TUM main campus, over a 4 month period. The average duration of the datasets collected was 14.2 min, with an average of 3204 samples from the pressure sensor. All the collected datasets are available in [20]. We collected data from various hours during daylight and night; different routes were traveled inside the building, at different temperatures, humidity levels and ambient pressure, and finally with different cloud coverage. After smoothing and clustering the data as explained in Section 3.2, the OITransition was identified as described in Section 3.3. The accuracy of this component is presented in Section 4.3, in Table 4. Once the OITransition was estimated and the reference pressure was extracted, the altitude of every pressure reading that belonged to indoors was computed. Once all the pressure readings were translated into altitude, they were imported to the elbow method for floor number identification. As can be seen in the elbow method results, in Figure 15a, the number of floors (i.e., clusters, K = 6) in our dataset has been identified correctly for the aggregated dataset as well as for the May and June datasets, for which all floors of the building were visited. The threshold selected for all datasets was 99.12% for the distortion percentage, and for the clustering, the K-means algorithm was selected. Additionally, it can be seen that for the February dataset, the number of floors predicted was five (K temp = 4), as the fifth floor was not visited during this month. For March, the predicted number of floors was four (K temp = 3), as the two highest floors were not visited during that month. For April, the predicted number of floors was three (K temp = 2). Finlay, in the datasets extracted during July, the predicted number of floors was four (K temp = 3), and the third cluster's distortion fell slightly above the 99.12% threshold.
To demonstrate the performance of the altitude estimation or the label of each class (i.e., the centroid of each cluster), the corresponding estimated floor altitude is visualized together with the ground truth in Figure 16a and is listed in Table 7. As can be seen for the aggregated dataset, the maximum error was at 0.66 m, while the minimum error was at 0.48 m. In Figure 16a, it can also be seen that the fact that some datasets were non-visited floors (i.e., July, April, March and February) did not cause a problem to our database, as these floor altitudes were ignored.  Table 8. We have collected data following the same strategy as mentioned above. Twenty-five datasets were collected from Building B in Munich. All collected datasets are available in [20]. After following the procedure described above, for smoothing, clustering and identifying the OITransition, we extracted the reference pressure and then the altitude of every pressure reading that belonged indoors. Finally, once we translated all the pressure readings into altitudes, they were imported to the elbow method for floor number identification.  As can be seen in the elbow method results, in Figure 15b, the number of floors (i.e., clusters, K = 6) in our dataset has been identified correctly for the aggregated dataset as well as for the May and June datasets, for which all floors of the building were visited. The threshold selected for all datasets was 99.12% for the distortion percentage, and for the clustering algorithm, the K-means algorithm was selected. Additionally, it can be seen that for the July dataset, the number of floors predicted was two (K temp = 1), as the four higher floors were not visited during this month.
Regarding the height estimation, the corresponding estimated floor altitude and ground truth are presented together in Figure 16c, as well as in Table 8. As can be seen, in the aggregated dataset, the maximum error was at 1.12 m, while the minimum error was at 0.31 m. In the figure, it can also be seen that for the July dataset, only two floors were visited.

Building 3: Deutsche Akademie
We collected 20 datasets from Deutsch Akademie. This consists of five floors and a ground floor. All collected datasets are available in [20]. The true height for each floor is available in Table 9. Once we estimated the OITransition and extracted the reference pressure, then the altitude of every pressure reading that belonged indoors was computed. Once all the pressure readings were translated into altitude, they were imported to the elbow method for floor number identification.
The ground truth of this building is illustrated in Table 9. As can be seen in the elbow method results, in Figure 15c, the number of floors (i.e., clusters, K = 6) in our dataset has been identified correctly for the aggregated as well as the May and June datasets, for which all floors of the building were visited. The threshold selected for all datasets was 99.12% for the distortion percentage and the clustering algorithm was selected for the K-means algorithm. Additionally, it can be seen that for the June 1 and 13 datasets, the number of floors predicted was five (K temp = 4), as the third floor was not visited during this period. For June 14 and 15 as well as for June 21 and 22, the predicted number of floors was four (K temp = 3), as the two floors were not visited during these period. More specifically, the non-visited floors were the first and second, for the first dataset and the two highest floors for the later dataset.
On the other hand, the corresponding estimated floor altitude and ground truth are visualized together in Figure 16c and Table 9. As can be seen for the aggregated dataset, the maximum error was at 0.61 m, while the minimum error was at 0.23 m. In the figure, it can also be seen that some datasets being non-visited floors (i.e., June 14-15 and 23-29) did not cause a problem to our database, as these floor altitudes were ignored.

Conclusions
This paper describes our novel framework for the dynamic mapping of the vertical characteristics of a building. The proposed method makes use of a new sensor available in the latest smartphones (the last from 2017), and the barometric sensor, which indicates the ambient pressure and manages uncertain sensor data collected from crowdsourcing. The method estimates the altitude of the collected data with the use of the barometric formula. For achieving this, we introduce a novel approach for the extraction of the reference pressure at the OITransition of the user, which is identified through sensor fusion. More specifically, the GPS uncertainty, the magnetic disturbances and the ambient light are taken into consideration for identifying the transition, while the proximity sensor is also used as a supportive sensor. We faced an unsupervised classification problem, in which the number of floors-or the number of clusters-as well as the altitude-or the label of each class-for each floor were unknown. To resolve this problem, a clustering analysis technique called the elbow method and the popular K-means clustering algorithm were used. Finally, we propose a way to map these characteristics by enhancing the standards of the CityGML, enabling it to carry information about the vertical characteristics of a building in lower LoDs.
Although it has been demonstrated in the paper that our approach can work with any barometric sensor (Section 4.1), as the offset between different barometric sensors is stable, our approach has been extensively evaluated only for the Samsung Galaxy S6 [29].
Additionally, we noticed that when a significant delay follows the OITrsansition and precedes ascending to different floors, the vertical localization error increases. This is due to the long-term instability of the ambient air pressure. The same happens when there is lack of data from one floor. It is very likely that such data will not be taken into consideration in the clustering analysis and finally in the clustering phase. This will result in a missing floor in the final model.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Appendix I: Final Models
In this section, the final models together with photographs of the buildings are presented. Those models have been generated dynamically and without the intervention of a user. The outline of the buildings has been extracted from [25]. The CityGML models are available in [20].

Appendix B. Appendix II: Collected Data
In this section, we list all the data that were collected for the evaluation of our model. All the collected data are available in [20], and they include measurements from the following sensors: acceleration, gyroscope, pressure, light, proximity, GPS, magnetometer, pedometer, WiFi, pressure and GSM sensors. Table A1. Collected data from TUM main campus. The table shows the data acquisition date, the time, the visited floors (V Floors), the indicated temperature from AccuWeather (T A) and Google (T A) (unit: • C), the relative humidity from the same two sources (H A) and (H G), and the ambient pressure from AccuWeather (P A) (unit: Pa).

Date
Time V Floors T A T WC H A H WC P A P WC W S Cloud Cov.