An Accurate Visual-Inertial Integrated Geo-Tagging Method for Crowdsourcing-Based Indoor Localization

One of the unavoidable bottlenecks in the public application of passive-signal (e.g., received signal strength, magnetic) fingerprinting-based indoor localization technologies is the extensive human effort required to construct and update the fingerprint database for indoor positioning. In this paper, we propose an accurate visual-inertial integrated geo-tagging method that collects fingerprints and constructs the radio map by exploiting the crowdsourced trajectories of smartphone users. By integrating multisource information from smartphone sensors (e.g., camera, accelerometer, and gyroscope), the system accurately reconstructs the geometry of trajectories. An algorithm is proposed to estimate the spatial location of trajectories in a reference coordinate system and to construct the radio map and geo-tagged image database for indoor positioning. With the help of several initial reference points, this algorithm can be implemented in an unknown indoor environment without any prior knowledge of the floorplan or the initial locations of the crowdsourced trajectories. The experimental results show that the average calibration error of the fingerprints is 0.67 m. A weighted k-nearest neighbor method (without any optimization) and an image matching method are used to evaluate the performance of the constructed multisource database. The average localization errors of received signal strength (RSS)-based and image-based indoor positioning are 3.2 m and 1.2 m, respectively, showing that the quality of the constructed indoor radio map is at the same level as that of radio maps constructed by site surveying. Compared with traditional site-survey-based positioning, this system greatly reduces the human labor cost while requiring the least external information.


Introduction
Nowadays, indoor localization has become a common requirement for various location-based services and applications. A number of technologies have been proposed for indoor localization based on different principles, such as Wi-Fi [1], geomagnetism [2], ultra-wideband (UWB) [3], ultrasound [4], and so on. Among these technologies, ultrasound and UWB can be used to estimate the distance between the source and the terminals, which provides accurate localization results. However, such technologies require the extra deployment of localization devices, which restricts their large-scale application. Many studies have focused on developing localization schemes that do not rely on extra devices; one representative approach matches user trajectories to an indoor floor plan by using an activity recognition mechanism. However, this map matching mechanism strongly depends on the assumption that all activities of a user occur at special locations in an indoor space (e.g., intersections or corners). This assumption is vulnerable to the randomness of human activity. For example, if a person makes a free turn away from a special location, this method may match the activity to an incorrect location. Much effort has been made to improve the performance of trajectory estimation [31][32][33][34]. For example, the trajectory alignment and calibration method proposed in [32] can align a crowdsourced trajectory to a coordinate system by using a foot-mounted inertial sensor and Wi-Fi RSS measurements. CrowdMap [33] jointly leverages crowdsourced sensor data and video data to track the movements of users. It takes the latest known GPS position as an initial location, which the user can modify if it is incorrect, and uses video frames to improve the localization accuracy. Pan et al. [34] provide a collaborative filtering method with graph-based semi-supervised learning to estimate the locations of users as well as the locations of APs. A training phase is needed to calibrate a probabilistic location estimation system. In summary, although progress has been made in pedestrian tracking and trajectory estimation, the requirement of extra devices, user intervention, or prior knowledge limits the practical use of these approaches. It remains a question how to improve the accuracy and robustness of smartphone-based geo-tagging with the fewest device and prior-knowledge requirements.
Another problem is the diversity of smartphone devices. The wireless signals collected by two smartphone devices will probably differ even at the same spatial location because of differences in their built-in sensors. This obviously affects the geo-tagging accuracy of Wi-Fi or geomagnetic clustering based crowdsourcing methods, and it is difficult to guarantee that all crowdsourcing users carry the same type of smartphone. Some studies have tried to solve this problem by calibrating RSS fingerprints collected by different devices. For example, in [29], a calibration matrix is constructed for all supported devices, whose elements represent the linear regression relation between different devices. In [35], the diversity of devices and its effect on localization are analyzed, and a kernel function is proposed to model the distribution of RSS; it applies local variations as a compensation for the linear transformation between two devices. However, the accurate calibration of Wi-Fi fingerprints remains an open question due to the inherent instability of the real-world signal environment. Recently, channel state information (CSI) based indoor localization [36,37] has attracted much attention. Compared with RSS-based solutions, CSI is more stable and sensitive and provides detailed, fine-grained subcarrier information. For example, in [36], a CSI-based indoor localization technique is developed by employing both intra-subcarrier statistical features and inter-subcarrier network features; the results show that it can achieve 96% classification accuracy. In [37], the proposed DeepFi system uses CSI (collected from three antennas) and a deep learning method to train the fingerprint database; a probabilistic method is used in the online localization phase, which achieves a mean error of about 1.8 m. However, CSI-based indoor localization methods require a Wi-Fi network interface card (NIC) to receive CSI signals, which is not built into current smartphones. In summary, although significant effort has been devoted to the problem of device diversity, it remains an important issue for the practical application of crowdsourcing-based indoor localization.
The present study proposes involving visual information in the geo-tagging of sampling points for crowdsourcing-based localization. The video frames collected by a smartphone camera contain abundant visual information about different spatial locations, with the advantage that the frames differ only minimally across diverse smartphone devices. It has been observed that, under different smartphone camera conditions (e.g., resolution, sensor size, etc.), the visual features extracted from these images can accurately reflect the spatial information of the sampling points. Consequently, this study develops a visual-based geo-tagging method for crowdsourcing-based indoor localization. This method can be used to reconstruct the geometrical trajectories (associated with the collected signals) of multiple crowdsourcing users with different smartphone devices. To improve the practicability of the method, prior knowledge, such as an existing database (e.g., a Wi-Fi radio map), the floorplan of the environment, and the initial locations of all crowdsourcing users, is assumed to be unknown. A reference coordinate system (RCS) is defined for the geometry reconstruction of crowdsourced trajectories, which can be easily deployed in indoor spaces. The only requirement before reconstructing trajectories and performing indoor localization is several initial reference points (IRPs), which serve as the origin of the RCS. An algorithm is also proposed to geo-tag the spatial location of sampling points from all the trajectories. To evaluate the proposed method, a Wi-Fi radio map and an image dataset were constructed from the geo-tagged crowdsourced trajectories. The RSS-based and image-matching-based localization accuracies demonstrate the effectiveness of this method. The proposed method can accurately geo-tag the sampling points extracted from crowdsourced trajectories and generate different maps or datasets, such as a radio map or a geo-tagged image database, for indoor positioning. Based on the positioning results, various indoor location-based services can be provided to public users, such as indoor navigation, intelligent parking, shopping mall or museum tourism, management of mobile objects, and so on. Besides, the method can also be employed to collect and generate other indoor maps, such as noise maps, illumination maps, or air-quality maps, with the help of the corresponding sensors (e.g., PM2.5 sensors). These maps can be used for indoor management, architectural design, or other analyses of indoor environments.
The remainder of this paper is organized as follows. Section 2 describes the theoretical framework of the visual-inertial integrated geo-tagging method and the principle of the visual-based method for trajectory geometry recovery. Section 3 describes the spatial estimation of crowdsourced trajectories and the construction of the radio map. Section 4 presents and discusses the experimental results. Finally, Section 5 summarizes the main conclusions and future work.

Visual-Inertial Integrated Geo-Tagging Method

Figure 1 illustrates the framework of this method. Smartphones are used to collect various sensor data, including video frames, inertial sensor data, Wi-Fi RSS, etc. In this study, volunteers collected the experimental data with smartphones; during data collection, each volunteer held a smartphone in hand and walked normally in an indoor area. Firstly, a trajectory reconstruction method is designed to geometrically recover the trajectories from the corresponding crowdsourced data. Heading angle estimation, step detection, and step length estimation are the necessary steps for recovering the trajectory geometry. Both the video frames and the inertial data are employed in order to improve the accuracy of trajectory geometry recovery; image matching and structure from motion (SFM) methods are used to calculate the heading direction of the trajectory geometry. This method can recover the relative locations of trajectory sampling points without prior knowledge such as the floorplan or the initial location of the trajectory. After trajectory geometry recovery, a trajectory calibration algorithm is proposed to spatially estimate the location of the trajectory in the RCS. By using the initial reference point (IRP), the trajectory geometry can be calibrated into the RCS. Importantly, sampling points from a calibrated trajectory can be used as supplementary reference points for the following crowdsourced trajectories. Finally, the sensor data associated with the sampling points (from the calibrated trajectories) can be used for the mapping of different databases, such as a Wi-Fi radio map or an image database with spatial labels.

During trajectory geometry recovery, the accurate estimation of the heading angle of each sampling point of a trajectory is an important issue. Estimations employing angular velocity (from the gyroscope) are usually inaccurate due to the drift error of smartphone sensors and the accuracy degradation over time [38]. The method proposed in this study uses an SFM-based method to estimate the heading angle from video frames, and a sliding-window filter-based algorithm is proposed to further improve the performance of heading angle estimation. Finally, the geometry of a trajectory can be recovered by integrating the heading angle of each sampling point and the distance between every pair of adjacent sampling points.

Visual-Based Trajectory Geometry Recovery
The proposed approach aims to estimate, at every sampling instant t, the pose of sampling point s_t = (x_t, y_t, θ_t) with regard to the initial pose s_0, where (x_t, y_t) represents the position of s_t relative to the initial position (x_0, y_0), and θ_t is the orientation of s_t relative to the initial heading angle θ_0. Note that the initial pose s_0 is unknown for each trajectory. Section 3 describes the spatial estimation of a trajectory in a coordinate system.

Heading Angle Estimation
The main idea of this method is that the heading angle of a sampling point can be represented by the heading angle of the image taken at that sampling point. Therefore, the heading angle change of a sampling point sequence can be estimated by calculating the heading angle change of the corresponding image sequence extracted from the video frames. Similar to [8,11], we use an SFM-based method to estimate the heading angle change of the sampling points of a trajectory. An image matching method is applied to each pair of adjacent images in the image sequence. For a pair of images, the homogeneous matching points are used to calculate the fundamental matrix F:

$$ (x_j', y_j', 1) \, F \, (x_j, y_j, 1)^T = 0, \quad j = 1, 2, \ldots, n \quad (1) $$

where (x_j, y_j, 1) and (x_j', y_j', 1) are the j-th pair of homogeneous matching points from the image matching result and F is a 3 × 3 matrix. It is possible to linearly calculate the matrix F if there are enough matched points [39]. After obtaining the fundamental matrix, the essential matrix E can be calculated as:

$$ E = K^T F K \quad (2) $$

where K is the intrinsic matrix of the smartphone camera, which can be obtained with the MATLAB Camera Calibrator (MATLAB 8.x on Windows) [40]. The rotation matrix R can be calculated by the singular value decomposition (SVD) of the essential matrix E. Importantly, the heading angle change of a sampling point can be expressed by the rotation matrix:

$$ R = \begin{pmatrix} \cos\Delta\theta & -\sin\Delta\theta \\ \sin\Delta\theta & \cos\Delta\theta \end{pmatrix} \quad (3) $$

where Δθ is the heading angle change of sampling point P_t relative to the previous sampling point P_{t−1}. The schematic diagram of this SFM-based heading angle estimation method is shown in Figure 2. If the initial heading angle of a trajectory is θ_0, the heading angle of sampling point P_t can be calculated as:

$$ \theta_t = \theta_0 + \sum_{k=1}^{t} \Delta\theta_k \quad (4) $$

Figure 2. The schematic diagram of structure from motion (SFM)-based heading angle estimation.
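For concreteness, the following sketch shows how this heading-angle step can be implemented with OpenCV. It is a minimal illustration, not the authors' exact pipeline: the ORB detector, the RANSAC settings, and the yaw extraction from R are our assumptions, and cv2.findEssentialMat computes E directly, folding Equations (1) and (2) into one call, while cv2.recoverPose performs the SVD-based decomposition of E.

```python
import cv2
import numpy as np

def heading_change(img1, img2, K):
    """Estimate the heading change (yaw, in radians) between two adjacent
    video frames, given the camera intrinsic matrix K."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        return None  # image-matching failure; fall back to the gyroscope
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if len(matches) < 8:
        return None  # too few matches to estimate E linearly
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # Essential matrix (Equations (1)-(2)), estimated robustly with RANSAC.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    # SVD-based decomposition of E into the rotation R (and translation t).
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    # Yaw component of R, i.e., the heading change Delta-theta (Equation (3)).
    return float(np.arctan2(R[1, 0], R[0, 0]))
```

The per-frame heading changes can then be accumulated as in Equation (4) to obtain the heading angle of each sampling point.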


Trajectory Geometry Recovery
In this study, the aim of trajectory geometry recovery is to estimate the relative location of each sampling point of a trajectory. The relative location of a sampling point can be calculated as follows:

$$ x_t = x_{t-1} + L \sin(\theta_{t-1} + \Delta\theta), \qquad y_t = y_{t-1} + L \cos(\theta_{t-1} + \Delta\theta) \quad (5) $$

where (x_t, y_t) is the location of sampling point P_t, θ_{t−1} is the heading angle of sampling point P_{t−1}, Δθ is the heading angle change of P_t relative to P_{t−1}, and L is the distance between P_t and P_{t−1}. The SFM-based heading angle estimation method can be used to calculate the Δθ of a sampling point, but the accuracy of this method depends on the performance of the image matching and degrades when the quality of the video frames is poor. To address this issue, the collected inertial data is also used to estimate the heading angle, independently of the visual estimation results.

Usually, smartphone gyroscope-based heading angle estimation is calculated as the integral of the angular velocity (rad/s) with respect to time. The sampling frequency of a smartphone gyroscope is more than 100 Hz, which is higher than the 30 fps of the video frames. However, this method is only suitable for estimating heading angles over short periods due to the drift error of the smartphone gyroscope: as the integration time increases, the error of the heading angle grows rapidly and continuously. In order to avoid this problem, we employ different strategies for different route segments. As shown in Figure 3, a pedestrian trajectory consists of two types of segments: turning segments (TSs) and non-turning segments (NTSs). A TS refers to a turning period with a relatively long turning time; an NTS refers to a straight (or approximately straight) walking period that may contain several slight turning actions with very short turning times. The strategy of our method is as follows:
(1) for each TS sampling point (i.e., a sampling point from a TS segment): the SFM-based method is used to estimate the heading angle if there is no image-matching failure.
(2) for TS sampling points with image-matching failure: the gyroscope-based method is used for heading angle estimation.
(3) for each NTS sampling point: the lower of the two outputs (SFM-based and gyroscope-based) is used as the final output.
Based on the integration of the two sources, the robustness of the heading angle estimation method can be improved, especially when there is failure of image matching.
One important issue with this method is to accurately detect each turning moment (i.e., the joining point between each pair of TS and NTS segments) of a trajectory. Considering the high sampling rate (100 Hz) of the gyroscope, this method uses gyroscope readings to detect turning moments. As shown in Figure 3, the angular velocity in an NTS segment fluctuates slightly around zero, whereas in a TS segment the angular velocity is consistently higher (or lower) than zero and its absolute value is much larger than in an NTS segment. According to this regularity, a sliding-window filter is designed to detect the starting and ending moments of each turning action of a trajectory, described as follows. The input of Algorithm 1 includes the gyroscope-based heading angle estimations and a sliding window. The gyroscope-based angles are calculated by integrating the angular velocity readings with respect to the timespan between two sampling points. Similar to [41], the size of the sliding window was set to 50 in the experiment. The output of Algorithm 1 is a two-dimensional vector Turn, which records the starting and ending moments of each TS segment. The main idea of Algorithm 1 is to monitor the fluctuation of the gyroscope angles within the sliding window. If all of the gyroscope angles within the sliding window are in the same interval (> 0° or < 0°), the first moment of the sliding window is treated as a candidate for the starting or ending moment of a TS segment; the candidates are stored in the vectors Turn_S and Turn_E, respectively. For a TS segment, the first candidate in Turn_S is treated as its starting moment and, similarly, the first candidate in Turn_E is treated as its ending moment.
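A condensed sketch of this sliding-window detector follows. The window size of 50 is taken from the experiment description above, while the sign-consistency test and the candidate bookkeeping are our reading of Algorithm 1, not a verbatim transcription.

```python
import numpy as np

WINDOW = 50  # sliding-window size used in the experiments

def detect_turns(gyro_angles):
    """gyro_angles: per-sample heading increments integrated from the
    gyroscope. Returns the Turn vector as (start, end) index pairs, one
    per detected TS segment."""
    turns, start = [], None
    for i in range(len(gyro_angles) - WINDOW + 1):
        w = np.asarray(gyro_angles[i:i + WINDOW])
        # All angles within the window in the same interval (> 0 or < 0)?
        consistent = bool(np.all(w > 0) or np.all(w < 0))
        if consistent and start is None:
            start = i                 # first candidate starting moment (Turn_S)
        elif not consistent and start is not None:
            turns.append((start, i))  # first candidate ending moment (Turn_E)
            start = None
    if start is not None:             # trajectory ends inside a turn
        turns.append((start, len(gyro_angles) - 1))
    return turns
```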
Based on the turning moment detection algorithm, the visual and inertial estimations can be integrated to calculate the heading angle of each sampling point of a trajectory. Walking steps can be detected by a peak and valley detection algorithm [38], which is an important step for restricting the distance between adjacent sampling points. The length of each step can be estimated based on the Weinberg model [42]:

$$ step\_length_i = K \cdot \sqrt[4]{A_{max} - A_{min}} \quad (6) $$

where step_length_i is the length of the i-th step of a trajectory (i.e., step_i), A_max and A_min are the maximum and minimum values of the Z-axis acceleration during one step period, and K is the ratio of the real distance to the estimated distance. After the estimation of the steps and step lengths, the locations of the trajectory sampling points can be calculated. Considering the high sampling rate of the gyroscope, we assume that the sampling points are equally spaced within a step. The distance between two sampling points of a trajectory can be calculated as:

$$ distance_{t-1,t} = \frac{step\_length_i}{n} \quad (7) $$

where distance_{t−1,t} is the distance between two adjacent sampling points P_t and P_{t−1} within the i-th step of a trajectory, S_{P_i} is the set of sampling points within step_i, and n is the number of sampling points in S_{P_i}.
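A minimal sketch of Equations (6) and (7) is given below, assuming the per-user constant K has already been calibrated and that step boundaries come from the peak and valley detector mentioned above.

```python
import numpy as np

def weinberg_step_length(acc_z, K):
    """Step length from the Z-axis accelerations of one step period
    (Equation (6)); K is the calibrated real/estimated distance ratio."""
    a_max, a_min = float(np.max(acc_z)), float(np.min(acc_z))
    return K * (a_max - a_min) ** 0.25

def sampling_point_spacing(step_length_i, n):
    """Distance between adjacent sampling points (Equation (7)), assuming
    the n sampling points within step i are equally spaced."""
    return step_length_i / n
```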

Trajectory Calibration and Geo-Tagging
In this section, a method is proposed to spatially estimate a trajectory in a reference coordinate system. Each sampling point of a trajectory can be geo-tagged using an iterative algorithm. The geo-tagged sampling points, which are associated with the corresponding RSS and image data, can be used to construct multisource datasets for indoor localization.

Indoor Reference Coordinate System
In most cases, indoor location-based services and applications mainly focus on the location of targets in a local coordinate system (e.g., inside a building), rather than in a world coordinate system (e.g., WGS84). In this study, we first define a trajectory geometry coordinate system (GCS), which represents the location of a trajectory relative to its initial location; it uses the initial location of a trajectory as its origin, with the X and Y axes pointing east and north, respectively. Subsequently, a two-dimensional reference coordinate system (RCS) is defined to determine the location of a point in the whole indoor space; it uses the location of an initial reference point (IRP) as its origin. Note that the IRP can be arbitrarily selected from the indoor space. For local applications (e.g., navigation services in a building), there is no need to measure its location in a world coordinate system; for global applications, the location of the whole indoor space can be estimated from the global location of an IRP. The purpose of using IRPs is to reduce the application difficulty of the geo-tagging method: it is difficult for many participants to measure the global locations of the collected sampling points in a crowdsourcing setting. By using the IRP as an origin, the locations of all the collected sampling points (from different participants) can be estimated in the RCS with the proposed algorithm. The location of an IRP needs to be measured only once. For the geo-tagging algorithm, several images are collected at the IRP (in different directions) and used as reference images. The algorithm can also be implemented with multiple IRPs; the influence of the number of IRPs is evaluated in Section 4. Prior knowledge, such as a floorplan or the initial locations of the crowdsourced trajectories, is not required for the geo-tagging algorithm.
The main idea of this algorithm is to determine whether a trajectory crosses a reference point by matching image keypoints between the reference images (images of reference points) and the sampling images (images from a trajectory). If matching succeeds, the locations of the sampling points (from the trajectory) can be estimated in the RCS by using the bundle adjustment (BA) algorithm [43]. More importantly, the spatially estimated sampling points can also be used as reference points, called supplementary reference points (SRPs), to estimate the locations of the sampling points of the following trajectories. The sampling images of the SRPs can be directly used as their reference images. The coverage of reference points, including IRPs and SRPs, continuously grows with the number of crowdsourced trajectories, which makes the algorithm more efficient and robust.

Geo-Tagging Sampling Points in Reference Coordinate System
To geo-tag the sampling points of a trajectory, keypoints from each sampling image are matched against those from the reference images (i.e., the reference images of IRPs and SRPs) by using the image matching technique detailed in Section 2. If the number of successfully matched keypoints is higher than a threshold r, the reference point is used to estimate the location of each sampling point (from the trajectory) by using the BA method. The main idea of BA is to calculate the three-dimensional (3D) locations of the keypoints and to refine the relative locations between images by minimizing the projection error between the keypoints and the tracked keypoints on the images. The result of BA is the optimal 3D locations of the keypoints and the relative poses among the cameras. The spatial relation between a reference point and a sampling point can be represented by the cameras' relative pose calculated by BA. If a sampling point is successfully matched with a reference point, the locations of all the sampling points of the trajectory can be estimated in the RCS, and the sampling points of the estimated trajectory are then used as SRPs for the following trajectories.
As shown in Figure 4a, Tr1 is a trajectory whose geometry has been reconstructed using the method proposed in Section 2. An image-matching method is used to find the best matching result (with the highest number of matched keypoints) between a sampling point image P_i and the initial reference point image P_s. Note that the adjacent sampling point P_{i+1} will also be selected as a candidate that may cross the IRP P_s if the matching result between P_{i+1} and P_s is higher than that between P_{i−1} and P_s; otherwise, P_{i−1} is selected. In Figure 4b, the optimal 3D point cloud and the relative locations among P_i, P_{i+1}, and P_s are calculated based on the BA method. After the BA process, the 3D coordinates of the two sampling points are (x_i, y_i, z_i) and (x_{i+1}, y_{i+1}, z_{i+1}), respectively. Because the BA method lacks scale information, the 3D coordinates must be transformed into coordinates in the RCS. The scale parameter between the two coordinate systems can be calculated as follows:

$$ \sigma = \frac{D(i, i+1)}{\sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2 + (z_{i+1} - z_i)^2}} \quad (8) $$

where σ is the scale parameter of BA and D(i, i+1) is the distance between P_i and P_{i+1}. Therefore, the coordinates of a sampling point P_i in the RCS can be calculated as follows:

$$ \begin{pmatrix} X_i \\ Y_i \end{pmatrix} = \sigma \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix} \quad (9) $$

where (X_i, Y_i) are the coordinates of P_i in the RCS, β is the heading angle of the matched reference image of IRP P_s, and (x_i, y_i) is the two-dimensional (2D) projection of the 3D coordinates (x_i, y_i, z_i) in the RCS. Applying Equation (9) to P_{i+1} gives its RCS coordinates (X_{i+1}, Y_{i+1}). The coordinates of P_i and P_{i+1} in the GCS of Section 2 are (gx_i, gy_i) and (gx_{i+1}, gy_{i+1}). The transformation parameters, including the rotation angle ϑ and the shifts (t_x, t_y), can be calculated from the coordinates of P_i and P_{i+1} in the GCS and the RCS [44]. Based on the location estimation result of P_i, the location of each sampling point of the trajectory can then be estimated in the RCS as follows:

$$ \begin{pmatrix} X_i \\ Y_i \end{pmatrix} = \begin{pmatrix} \cos\vartheta & -\sin\vartheta \\ \sin\vartheta & \cos\vartheta \end{pmatrix} \begin{pmatrix} gx_i \\ gy_i \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix} \quad (10) $$

where (X_i, Y_i) are the coordinates of the sampling point in the RCS and (gx_i, gy_i) are its coordinates in the GCS. Figure 4c shows an example of a recovered trajectory in the RCS.
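The sketch below illustrates one way to recover the Equation (10) parameters from the two matched points; the bearing-difference formulation is our assumption for a two-point rigid alignment [44], and the variable names are illustrative.

```python
import numpy as np

def gcs_to_rcs_params(g_i, g_i1, r_i, r_i1):
    """Rotation angle and shift mapping the GCS into the RCS, from the GCS
    coordinates g_i, g_i1 and RCS coordinates r_i, r_i1 of P_i and P_i+1."""
    # Rotation: difference between the segment bearings in the two systems.
    ang_g = np.arctan2(g_i1[1] - g_i[1], g_i1[0] - g_i[0])
    ang_r = np.arctan2(r_i1[1] - r_i[1], r_i1[0] - r_i[0])
    theta = ang_r - ang_g
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    t = np.asarray(r_i) - R @ np.asarray(g_i)  # shift (t_x, t_y)
    return R, t

def gcs_to_rcs(R, t, g):
    """Apply Equation (10): map a GCS coordinate (gx_i, gy_i) into the RCS."""
    return R @ np.asarray(g) + t
```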
Once a trajectory has been estimated successfully in the RCS, its sampling points can be used as SRPs. This type of reference point is used to estimate the trajectories that do not cross an IRP. As shown in Figure 5, a trajectory may cross multiple SRPs from different trajectories. To increase the robustness of this method, the location of such a trajectory is calculated as the average of the estimation results obtained with each SRP. For example, using SRP S_j, the coordinates of each sampling point of a trajectory can be calculated (via Equations (8)-(10)) as:

$$ \left( X_i^j, Y_i^j \right), \quad i = 1, 2, \ldots, k \quad (11) $$

where k is the number of sampling points of this trajectory. If there are m SRPs, the coordinates of a sampling point can be estimated in the RCS as follows:

$$ X_i = \frac{1}{m} \sum_{j=1}^{m} X_i^j, \qquad Y_i = \frac{1}{m} \sum_{j=1}^{m} Y_i^j \quad (12) $$

where (X_i, Y_i) are the coordinates of the i-th sampling point and m is the number of SRPs crossed by the trajectory.
The algorithm for trajectory estimation in the RCS is described in Algorithm 2. The inputs of this algorithm are N trajectories (with recovered geometry) and at least one IRP; the outputs are the estimated trajectories in the RCS. The number of available SRPs continuously increases with the iterations of the algorithm, and the algorithm ends when all trajectories have been estimated.
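The iterative logic of Algorithm 2 can be summarized as in the sketch below; match_reference and estimate_in_rcs are hypothetical stand-ins for the image-matching test and the BA-based estimation described above, and the trajectory object layout is illustrative.

```python
def calibrate_all(trajectories, irps, match_reference, estimate_in_rcs):
    """Iteratively estimate N geometry-recovered trajectories in the RCS,
    starting from the IRPs and growing the reference set with SRPs.
    match_reference(traj, rp) -> bool; estimate_in_rcs(traj, rps) -> traj'."""
    references = list(irps)        # initially only the IRP(s)
    pending = list(trajectories)
    estimated = []
    progress = True
    while pending and progress:    # stop when no trajectory can be matched
        progress = False
        for traj in list(pending):
            hits = [rp for rp in references if match_reference(traj, rp)]
            if hits:
                est = estimate_in_rcs(traj, hits)  # averaged per Equation (12)
                estimated.append(est)
                references.extend(est.sampling_points)  # new SRPs
                pending.remove(traj)
                progress = True
    return estimated
```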

Generating Multi-Source Datasets for Indoor Positioning
After geo-tagging, the sampling points from the crowdsourced trajectories can be used to generate datasets, including a Wi-Fi radio map and a geo-tagged image dataset. Table 1 shows the attributes of the sampling points. The collected images, which are associated with their spatial location and direction attributes, can be directly used to generate geo-tagged image datasets. In order to reduce the time cost of image-matching-based indoor localization, a spatial index is constructed for the geo-tagged images. To construct the Wi-Fi radio map, an indoor space is partitioned into regular grids, and the center of a grid is treated as the location of a Wi-Fi fingerprint. As this method is designed for crowdsourcing-based data collection and localization systems, it is assumed that the spatial distribution of fingerprints in an indoor space is uneven: some grids may be passed through by multiple crowdsourced trajectories, while others may not be covered by any trajectory. In the first case, the sampling points from all related trajectories can be integrated to generate fingerprints, which we define as integrated fingerprints. If there are m sampling points within the spatial extent of a grid, the RSS of AP_i for this grid can be calculated as follows:

$$ rss_i = \frac{1}{m} \sum_{k=1}^{m} rss_i^k \quad (13) $$

where rss_i is the RSS value of AP_i, AP_i belongs to the set G of APs observed in the grid, and rss_i^k is the rss_i of the k-th sampling point. If a grid does not contain any sampling point, we define its fingerprint as an interpolated fingerprint; the RSS of this grid can be interpolated from its four-neighborhood:

$$ rss_i = \frac{\sum_{j \in RSS\{\}} w(d_j) \cdot rss_i^j}{\sum_{j \in RSS\{\}} w(d_j)}, \qquad w(d) = \frac{1}{d} \quad (14) $$

where rss_i is the interpolated RSS value of AP_i, RSS{} is the set of four-neighborhood grids of the interpolated fingerprint, j is the index of a grid, d_j is the distance between the interpolated fingerprint and grid j, and w(·) is the weight function, which is the inverse of the distance. The purpose of the interpolation is to generate a radio map for indoor localization when the space has not been completely covered by crowdsourced sampling points. The interpolated RSS of a grid is replaced by an actual measured RSS once the grid is covered by a following trajectory.
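The two fingerprint types can be computed as in the sketch below, a minimal rendering of Equations (13) and (14); the grid bookkeeping is omitted and the weight w(d) = 1/d follows the description above.

```python
import numpy as np

def integrated_rss(samples):
    """Equation (13): average the RSS values of AP_i over the m sampling
    points that fall inside one grid cell."""
    return float(np.mean(samples))

def interpolated_rss(neighbor_rss, neighbor_dists):
    """Equation (14): inverse-distance weighting of AP_i's RSS over the
    four-neighborhood grids of an uncovered cell."""
    w = 1.0 / np.asarray(neighbor_dists, dtype=float)   # w(d) = 1/d
    return float(np.sum(w * np.asarray(neighbor_rss)) / np.sum(w))
```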

Evaluation
In this section, several experiments are designed to evaluate the performance of the proposed method for indoor geo-tagging and positioning. As shown in Figure 6, a typical indoor environment of 106 m × 61 m, with long corridors and wide areas, was selected as the experimental area. Two Android-based smartphones (SAMSUNG and HUAWEI) were used for collecting the data, including inertial data, video frames, and Wi-Fi RSS. The sampling rate of the inertial data was 100 Hz, and the sampling frequency of the video frames was 30 frames per second (FPS). Five volunteers (three males and two females) were invited to collect the experimental data; there are 10 different trajectories, and each volunteer performed two of them. The participants held a smartphone vertically in front of them and kept the camera facing forward. Three IRPs were selected from the experimental area, as shown in Figure 6.
After data collection, the geometry recovery and geo-tagging of the trajectories were performed offline using a laptop (4-core i7 CPU and 8 GB RAM). Based on the results of the trajectory calibration and geo-tagging, a Wi-Fi radio map and a geo-tagged image dataset were constructed. An RSS-based localization experiment and an image-based localization experiment were performed to evaluate the quality of the geo-tagged datasets.


Evaluation of Trajectory Estimation
In this experiment, ten different trajectories were collected in the study area. As shown in Figure 7a, markers with known locations were set along each trajectory to collect the ground-truth data. The visual results of the trajectory recovery are shown in Figure 7b. The average geo-tagging error of the sampling points from the ten trajectories is around 0.6 m, with a standard deviation of about 0.4 m. The computation time for recovering the geometry of the 10 trajectories is about 5.7 min.

The results were compared with a gyroscope-based method and the SFM-based method in order to further evaluate the performance of the geometry recovery method. The gyroscope-based method only uses the gyroscope data from the smartphone to calculate the heading angle and restore the walking trajectory. The SFM-based method employs an SFM process to estimate the heading angle using the video frames.
The shape discrepancy metric (SDM) [45], defined as the Euclidean distance between a sampling point and its corresponding ground-truth point, was used to verify the accuracy of these methods. Figure 8 shows the cumulative distribution function (CDF) of the SDM for the 10 trajectories using the three methods. The SDM error of the gyroscope-based method is much higher than that of the other methods. For the SFM-based method, the maximum SDM error is about 3 m, the 80th-percentile SDM error is around 2 m, and the mean SDM error is about 1.1 m. This indicates that visual information helps to improve the estimation performance of the trajectory recovery. Moreover, the SDM error can be further reduced by integrating both visual and inertial information: the maximum SDM error is about 1.5 m, the 80th-percentile SDM error is around 1 m, and the mean SDM error is about 0.6 m. Figure 9 shows the growth of the trajectory recovery error for the different methods, taking three routes (#1, #2, and #3) as examples. As can be seen from the figure, the error of the SFM-based method and the proposed method grows clearly more slowly than that of the gyroscope-based method. These results demonstrate that the fusion of visual and inertial information helps to overcome the shortcomings of single-source methods, e.g., drift error of the gyroscope or image-matching failure. Furthermore, the experimental trajectories covered wide spaces in the study area, demonstrating that this approach can perform well in wide indoor spaces, which may be difficult for PDR-based methods.
Using the algorithm proposed in Section 3, the recovered crowdsourced trajectories can be geo-tagged in the RCS. Note that the initial locations of these trajectories were unknown to the algorithm. As shown in Figure 10b, three IRPs (points A, B, and C) were set in the study area, and each IRP was associated with 12 reference images (at intervals of 30°). Firstly, the algorithm uses one IRP (A) to estimate these trajectories and evaluate the performance of the geo-tagging. After that, the two other IRPs are also used to test the influence of the number of IRPs on the trajectory estimation and geo-tagging.

The spatial estimation results of all trajectories are shown in Table 2. The results were evaluated by the maximum, minimum, and average estimation errors at the ground-truth points, and by the computation time of each trajectory. The average error of all the trajectories is 1.03 m, and the computation time for estimating these trajectories is 9.3 min.
As shown in Figure 10, the trajectories that cross an IRP (e.g., trajectories #1, #2, and #3) are termed IRP trajectories; similarly, the trajectories that cross SRPs are termed SRP trajectories (e.g., trajectories #4-#10). The maximum, minimum, and average errors of the IRP trajectories (1.45 m for #3, 0.36 m for #2, 0.85 m for #3) are clearly smaller than those of the SRP trajectories (3.05 m for #8, 0.77 m for #5, 1.55 m for #7), respectively. Nevertheless, the average error of all the SRP trajectories is below 1.56 m, which suggests that the proposed algorithm can achieve reasonable estimation results with only one reference point. By using SRPs, the spatial locations of the SRP trajectories can also be estimated in the RCS. Figure 10c-i shows the estimation process of all the trajectories. IRP A was first used for the estimation of the trajectories. As shown in Figure 10c, only three trajectories (#1, #2, and #3) crossed IRP A. By using the method described in Section 3.2, these trajectories were first estimated in the RCS, and their sampling points could then be used as SRPs. By checking the relations between the examined trajectories (#1, #2, #3) and the unexamined trajectories (#4-#10), it was found that trajectories #4 and #5 intersect trajectory #1 at SRPs D and E, respectively (Figure 10d,e). The sampling points from the newly examined trajectories can also be used as SRPs, which continuously increases the coverage of the reference points in the study area. Note that although trajectories #6 and #7 also intersected trajectory #1, the intersection relationship was not detected by the algorithm. This may be because the orientation of the sample images of trajectory #6 (or #7) was not consistent with trajectory #1. As shown in Figure 10f-i, after several iterations, the remaining trajectories were all calibrated in the RCS.
To test the influence of the number of IRPs on the trajectory estimation, two other IRPs (B and C) were added to the environment, as shown in Figure 10b. The same trajectories were estimated by the algorithm using the three IRPs. The average error of all trajectories was reduced from 1.03 m (one IRP) to 0.67 m (three IRPs). As shown in Figure 11, after the addition of the two IRPs, the average errors of trajectories #2, #6, #7, and #10 were reduced from 0.77 m, 0.98 m, 1.12 m, and 1.28 m to 0.69 m, 0.73 m, 0.61 m, and 0.7 m, respectively. Figure 12 shows the growth of the trajectory estimation error, taking four routes (#2, #6, #7, and #10) as examples. As can be seen from the figure, once a trajectory passes through a reference point, the location error is clearly reduced. The results show that increasing the number of IRPs helps to further improve the performance of the trajectory estimation. However, more IRPs also require more workload for the collection of reference points, including their locations and reference images, which may increase the difficulty of deploying crowdsourcing-based indoor positioning systems. Therefore, it is practical to set IRPs at places where most people walk past, such as indoor intersections and entrances/exits, to reduce the number of required IRPs in large indoor environments.

Performance of Constructed Databases for Indoor Positioning
The sampling points from the calibrated trajectories can be used to construct multisource databases for indoor positioning. In this section, two experiments are conducted to evaluate the quality of the constructed datasets: an RSS-based positioning test and an image-matching-based positioning test.

RSS-Based Indoor Positioning


To evaluate the quality of the constructed RSS database, a positioning test was conducted using a weighted k-nearest neighbor method at 66 test points (the centers of 66 grid cells). Each of these cells was covered by more than five trajectories; they were selected to verify the improvement in the quality of the constructed radio map as the number of crowdsourced trajectories increases. For comparison, a site survey was also conducted on the same mesh grid. The positioning error was calculated as follows:

$Err_i = \sqrt{(x_i^r - x_i^e)^2 + (y_i^r - y_i^e)^2}$

where $Err_i$ is the location error of point $i$, $(x_i^r, y_i^r)$ is the actual physical location of point $i$, and $(x_i^e, y_i^e)$ is the estimated physical location of point $i$.

Figure 15 shows the performance of the two methods (the site survey based method and the proposed method). R0 represents the localization error of the site survey based method, and R1 to R5 represent the localization error of the proposed method, where R1 refers to the radio map constructed using sampling points from one trajectory, and R2, R3, R4, and R5 refer to the radio maps constructed using two, three, four, and five trajectories, respectively. As can be seen from Table 3, the localization error of the site survey based method ranges from 0 to 4.9 m, with an average of 2.6 m. The average error of R1 is 4.3 m, which is higher than that of R0. However, as the number of trajectories increases (from R1 to R5), the average error gradually decreases, reaching 3.2 m when the sampling points are extracted from five trajectories. This indicates that, given sufficient crowdsourced data, the quality of the constructed database is comparable to that of a site survey based database. Once there are enough crowdsourced trajectories, the quality of the constructed radio map becomes stable and may not improve further as more crowdsourced data are added. The proposed system can considerably reduce the human labor needed for database construction. Moreover, it performs well in wide indoor spaces, which increases the potential for applying this system to large indoor environments, such as shopping malls, underground parking garages, and supermarkets.
Figure 15. Localization results of the two methods. R0 is the localization error of the site survey-based method; R1 to R5 are the localization errors using the crowdsourced radio maps.
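
For reference, a minimal weighted k-nearest neighbor estimator consistent with the error metric above might look as follows; the signal-space distance, the value of k, and the handling of missing APs are assumptions, not the paper's exact configuration.

```python
import numpy as np

def wknn_locate(query, fingerprints, k=3, eps=1e-6):
    """Weighted k-nearest-neighbor location estimate.

    query        : {ap: rss} measured at the unknown point
    fingerprints : list of ((x, y), {ap: rss}) database entries
    """
    dists = []
    for pos, fp in fingerprints:
        shared = set(query) & set(fp)
        if not shared:
            continue
        d = np.sqrt(sum((query[ap] - fp[ap]) ** 2 for ap in shared))
        dists.append((d, pos))
    if not dists:
        raise ValueError("no fingerprint shares an AP with the query")
    dists.sort(key=lambda t: t[0])
    neigh = dists[:k]
    w = np.array([1.0 / (d + eps) for d, _ in neigh])  # closer -> heavier
    pts = np.array([p for _, p in neigh])
    return tuple((w[:, None] * pts).sum(axis=0) / w.sum())

def location_error(est, truth):
    """Err_i as defined above: the Euclidean distance between the
    estimated and the actual physical location."""
    return float(np.hypot(est[0] - truth[0], est[1] - truth[1]))
```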

Image Matching Based Indoor Positioning
By using the proposed method, each collected sampling point contains a single image associated with the corresponding location and direction information. These geo-tagged images can be used to construct image datasets for indoor positioning based on image matching. Most image matching based positioning methods use similarity as the metric for location estimation. In the experiment, the number of matched keypoints was calculated to measure the similarity between a query image and the images from the constructed dataset, and the location of the dataset image with the highest number of matched keypoints was used as the positioning result. One hundred different images with known coordinates were used as query images, and a SURF [46] based image matching method was used to calculate the similarity between the query images and the reference images. Table 4 shows the results. The average location error of the image matching based method is 1.2 m, and the accurate matching rate is 94%. Compared with another image matching based method [10], the proposed method achieves higher accuracy and a higher matching rate. This can be attributed to the high spatial sampling rate along the trajectories, which helps to construct image datasets with high spatial resolution and thereby improves the performance of image matching and reduces the location error of image matching based positioning. Compared with [47], this method does not need a laser backpack to construct a 3D model; its time and equipment costs are relatively low, which is important for crowdsourcing-based data collection and indoor localization.
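
A sketch of this matching pipeline is shown below, using OpenCV's SURF implementation (available only in opencv-contrib builds, since SURF is patented) together with Lowe's ratio test; the Hessian threshold and ratio are illustrative values, not parameters reported in the paper.

```python
import cv2

def count_good_matches(img_a, img_b, ratio=0.7):
    """Count SURF keypoint matches that pass Lowe's ratio test.
    Requires an opencv-contrib build (SURF lives in xfeatures2d)."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    _, des_a = surf.detectAndCompute(img_a, None)
    _, des_b = surf.detectAndCompute(img_b, None)
    if des_a is None or des_b is None or len(des_b) < 2:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = 0
    for pair in matcher.knnMatch(des_a, des_b, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good += 1
    return good

def locate_by_image(query_img, geo_tagged_db):
    """geo_tagged_db: list of (image, (x, y, heading)) entries.
    Returns the tag of the database image most similar to the query."""
    best_img, best_tag = max(
        geo_tagged_db, key=lambda e: count_good_matches(query_img, e[0]))
    return best_tag
```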

Conclusions
The collection and updating of the indoor positioning database is an unavoidable bottleneck for indoor localization. The traditional site survey is labor-intensive and time-consuming, which limits the commercial and industrial application of indoor localization. In this paper, an efficient geo-tagging method is proposed for crowdsourcing-based indoor positioning. This method can recover the geometry of trajectories and estimate the spatial location of sampling points in the RCS.
Multi-source datasets can be geo-tagged and constructed by using this method in different types of indoor spaces, such as corridors, rooms, and wide spaces. The method further minimizes the dependence on prior knowledge, such as floorplans or the initial locations of crowdsourced trajectories, which makes it broadly applicable. The experimental results demonstrate that the integration of visual and inertial information can significantly improve the performance of trajectory recovery and geo-tagging. The average location errors of the RSS based positioning and the image based positioning are 3.2 m and 1.2 m, respectively.
The proposed method can considerably reduce the workload needed for constructing and updating indoor positioning datasets. We believe that it could serve as a tool for crowdsourcing-based indoor positioning systems and facilitate public participation in the collection of multi-source datasets. In future work, the energy and time costs of crowdsourcing-based data collection and geo-tagging will be studied, which is important for the practical use of the localization system.