A Crowd-Sourcing Indoor Localization Algorithm via Optical Camera on a Smartphone Assisted by Wi-Fi Fingerprint RSSI

Indoor positioning based on existing Wi-Fi fingerprints is becoming more and more common. Unfortunately, the Wi-Fi fingerprint is susceptible to multiple path interferences, signal attenuation, and environmental changes, which leads to low accuracy. Meanwhile, with the recent advances in charge-coupled device (CCD) technologies and the processing speed of smartphones, indoor positioning using the optical camera on a smartphone has become an attractive research topic; however, the major challenge is its high computational complexity; as a result, real-time positioning cannot be achieved. In this paper we introduce a crowd-sourcing indoor localization algorithm via an optical camera and orientation sensor on a smartphone to address these issues. First, we use Wi-Fi fingerprint based on the K Weighted Nearest Neighbor (KWNN) algorithm to make a coarse estimation. Second, we adopt a mean-weighted exponent algorithm to fuse optical image features and orientation sensor data as well as KWNN in the smartphone to refine the result. Furthermore, a crowd-sourcing approach is utilized to update and supplement the positioning database. We perform several experiments comparing our approach with other positioning algorithms on a common smartphone to evaluate the performance of the proposed sensor-calibrated algorithm, and the results demonstrate that the proposed algorithm could significantly improve accuracy, stability, and applicability of positioning.


Introduction
As the demands on location-based services (LBSs) increase, developing positioning systems (PSs) has generated great concern over the last decade; in particular, location information is an important basic feature in LBSs, and it is of great significance to get access to the users' location information anytime and anywhere; in addition, users are more and more concerned about the performance of PSs, such as accuracy and stability. Outdoors, the Global Navigation Satellite System (GNSS), including GPS, GLONASS, Galileo, and BDS, which works well in most outdoor applications, is a major way to get users' location information, and the accuracy is adequate for civilian use. Unfortunately, when users are in indoor environments, there exist some limitations, for example, lack of line of sight and multiple paths, both of which will cause satellite signal attenuation, so GNSS is not capable of providing services with sufficient positioning accuracy. At the same time, the widespread use and popularity of mobile devices have spurred extensive demands on indoor LBSs in recent years, which have greatly promoted the rapid development of entertainment, health, business, and other sectors,

Related Works
Indoor positioning plays an important role in providing the location of people and objects. There are three principles used in indoor positioning: triangulation, scene analysis, and proximity. Triangulation locates the target point's coordinates by using distances and angles between the target point and three other points with known coordinates, this principle includes Time of Arrival (TOA), Time Difference of Arrival (TDOA), Angle of Arrival (AOA), and Time of Flight (TOF). Scene analysis is a principle of positioning by analyzing the characteristics of the measured values, for example, received signal strength indicator (RSSI) is a popular scene analysis method to realize localization; available signals include Wi-Fi, radio frequency, geomagnetic, visible light, and so on. The proximity principle is mainly used in radio frequency-based positioning systems; in this technique, for a grid of antennas with fixed locations within a building, when a mobile node is detected, the closest antenna or the one receiving the strongest signal is selected to calculate the mobile node's location [29], for example, cellular localization based on Cell-ID is one of the popular applications.
To meet the demanded high accuracy, some researchers have used hardware-based solutions to get a precise distance to locate objects accurately, which is a range-based method. For instance, Seybold [30] calculated the true distances between router and smartphone with a logarithmic-distance path-loss model. Zou [31] applied a weighted path-loss model to RFID signals to implement indoor positioning, on the basis of Zou, Chen [32] adopted a Kalman filter algorithm fusing received strength of Wi-Fi fingerprints, PDR, and landmarks to locate the position indoors. Fidan [33] used a geometric cooperative technique and path-loss model to estimate in real time the time-consuming coefficient of path-loss in ranging sensors, the simulation results indicated that the proposed approach can effectively adapt to uncertainty of the loss factor in signal propagation. Yang [34] and Kotaru [35] measured the phase-distance relationship of the radio signal to locate position, the experimental results show that those methods could reach an accuracy of decimeter level.
However, all of the mentioned solutions based on hardware above have some disadvantages and limitations from the aspect of practicability on a smartphone, for example, it is hard to synchronize the clocks between many common measuring devices, and the outputs of sensors on smartphones have low accuracy and stability in general, which will directly influence positioning performance. Finally, high-quality equipment will lead to increased cost, it is obviously not suitable for a smartphone.
In contrast, other researchers, including Google, Microsoft, Apple, etc. used software-based solutions to improve positioning performance, whose advantages are more flexibility, high accuracy, low cost, and short development cycle, and have developed some software-based positioning systems. Among those, Wi-Fi fingerprint is a common range-free way with great popularity, and various methods and theories about it have been researched to solve those issues and limitations mentioned in the Introduction. Schussel [36] and Li [37] used the actual signal and Gaussian distribution process to generate some fingerprints to fill the whole database. Through this approach, the positioning problem in different densities and environments was solved. To reduce computational complexity and positioning accuracy on the online stage, clustering theory has been used; Liu [38] proposed a dynamic constraint KNN algorithm by using indoor layout geometry information to cluster, however, the proposed algorithm required the extraction of features of the specific indoor area, so it is hard to promote its use. Saha [39] used the relative size of RSSI as a metric for clustering the fingerprint and tried to solve the equipment diversity issue in crowd-sourcing. The proposed algorithm can achieve a higher accuracy, but its processing involving clustering is also complicated. Lee [40] proposed a support vector machine-based clustering approach and used the margin between two canonical hyperplanes for classification; it reserved the fingerprint distribution property and obtained a superior decision boundary, but did not take instantaneity into consideration. In addition, the drawback of a time-consuming and laborious database prebuilding in the offline stage also needs to be overcome, the accuracy will decrease if the database is not being updated in time, thus, some researchers proposed a crowd-sourcing mechanism [41,42] to update and supplement the database.
To reduce development cost for the positioning system and increase its practicability, researchers have used smartphones' built-in sensors to locate cooperatively. Keller [43] focused on the gyroscope and barometer in current smartphone to realize the indoor navigation with storey detection, because of their work, a position in three dimensions could be interpolated by the determination of a change of detected height with the knowledge of the relative motion distance or a prior known gradient of the stairs/ramp. Chen [32] utilized a Kalman filter to fuse Wi-Fi and sensors integrated in a phone as well as landmarks to finish locating; Galván-Tejada [44] adopted microphone, magnetometer, and light sensor to locate cooperatively; Shin [45] used Wi-Fi fingerprint and PDR to develop SmartSLAM, which locates pedestrians and constructs an indoor map at the same time. By using off-the-shelf smartphones and the methods mentioned above, they have realized indoor positioning, however, we have tested the output of sensors in common smartphones and found that the output values have low accuracy; even if we put the phone in a stationary state, the output fluctuated, therefore, we think that its low accuracy and stability will influence the performance of positioning systems.
Recent advances in charge-coupled device (CCD) sensor technologies have been applied to positioning and several visual positioning systems have been developed. Considering Wi-Fi fingerprint may have shadow areas where the Wi-Fi signal is weak, Song [46] proposed a new positioning technique by which captured images' features and their coordinate information are stored to form a fingerprint, then they used the SURF algorithm along with Random Sample Consensus (RANSAC) to realize image similarity comparisons. Their method worked both indoors and outdoors, including the shadow areas where Wi-Fi fingerprint signals are weak. However, the time consumption was not discussed. Kawaji [47] utilized omnidirectional panoramic images and a PCA-SIFT algorithm to locate, with the proposed confidence factor locality sensitive hashing; this could improve the image matching performance and speed up image retrieval, however, the method required omnidirectional image acquisition and a higher performance processor, and could not be implemented on a mobile phone. Considering positioning accuracy, time consumption, automation and scalability, Levchev [28] proposed a simultaneous fingerprinting and mapping system, which used a particle filtering algorithm to fuse data sampled from sensors in smart mobile devices. Its Android application could achieve an average localization error of under 2 m, but its map and database generation process was complex and time-consuming; Further, a so-called Mobile Visual Indoor Positioning System (MoVIPS) using smartphone cameras was proposed by Werner [26] in 2011 and extended in 2014 [27]. MoVIPS used a dead reckoning approach and estimated orientation factors from smartphone compasses to reduce the amount of images, then quantized and clustered the SURF feature points, and used a transformation matrix to improve the distance estimation, a making it therefore faster, more precise and more efficient.
Visual positioning systems with image matching theories can improve the accuracy, unfortunately, it is generally necessary to construct an image database to be recognized, and the matching process is always time-consuming; in particular, these algorithms should be improved for use on a smartphone with ideal accuracy and instantaneity. Inspired by the researched literature, we propose our algorithm with the following contributions: (1) In the preliminary positioning stage, we use a KWNN-based Wi-Fi fingerprint positioning algorithm to make a coarse estimation, then select K images corresponding to K-nearest neighbors as retrievable images to reduce the time spent in retrieving images. (2) After Wi-Fi fingerprint positioning, we introduce an orientation matching factor calculated using a three-dimensional orientation sensor on a smartphone to constrain random images choice for a coarse estimated location, then, we use the SIFT operator to extract image features to calculate the image matching factor on a multithread mechanism.
We propose a mean-weighted exponent algorithm to improve accuracy by fusing the KWNN algorithm, orientation matching factor, and image matching factor. (4) We use crowd-sourcing to update the database; users can upload their positioning results to a server, which can aids others' positioning the next time if the database has not recorded some RPs information in the offline stage. (5) Our algorithm has been realized in an Android operating system smartphone, and is easy to implement and apply. There is no need to use additional hardware assistance and the costs are greatly reduced.

Overview of the Proposed Sensors-Calibrated Positioning System
We propose a crowd-sourcing indoor localization algorithm via an optical camera and orientation sensor on a smartphone, with the purpose of simplifying indoor localization with high accuracy through the use of a Wi-Fi module and camera as well as the orientation sensor in the smartphone, and it does not need to be assisted by other hardware. The proposed system includes two stages: offline acquisition and online positioning stages. In the offline stage, the smartphone will scan available access points (APs) at different RPs, then record the RP's coordinates, AP's MAC address, and its RSSI values, while recording the front image of RPs and output values of the three-dimensional orientation sensor. In this paper, we use a relational database whose format is shown in Table 1 to store those data, and those images will be preprocessed in a server using the SIFT operator and extracting the features vector. In the online positioning stage, users open the Wi-Fi and scan accessible APs, record the MAC address and RSSI values in the cache of the phone, then users can choose to use Wi-Fi fingerprint positioning based on the KWNN algorithm to perform rough localization. In order to achieve higher accuracy, we use the built-in sensor to calibrate the positioning result of Wi-Fi fingerprint. The framework of this approach is shown in Figure 1. Users can open the camera to take a picture, and the three-dimensional orientation values will be recorded in the cache simultaneously. The features vector are extracted from the captured images and recorded temporarily in the phone cache. The orientation values will be used to calculate the orientation factor, which can restrain random image choices at a position. The temporary features vector and stored features vector corresponding to the K nearest neighbors are used to calculate the image matching factor, then the mean-weighted exponent algorithm will fuse KWNN, orientation matching, and image matching results to improve the accuracy further. At the same time, the users can upload the results and those temporarily stored data including MAC address, RSSI, image feature, and orientation values into a server to update the database and aid better location estimate of others the next time. data including MAC address, RSSI, image feature, and orientation values into a server to update the database and aid better location estimate of others the next time.   Table 1 shows the recording format of RPs' coordinates, Wi-Fi fingerprint, captured image sequence number, and output values of the orientation sensor. The coordinates are specified by the user when building the database in the offline stage, but when using crowd-sourcing to update, they are calculated by our proposed algorithm. The AP number means the total number of available APs when users scan Wi-Fi, it is worth noting that the AP number of different RPs are not all equal because of the changing environment at different places, thus, m1, m2, ..., mn in Table 1 are not all equal. AP1-APm record the received APs' MAC and RSSI; when making a coarse estimation, the KWNN algorithm only calculates the distance of RSSI with the same MAC address. Image sequence number is recorded for 1 to n, and the image features are stored in another data table named as the corresponding sequence number of images. Because of the random direction when users take a picture, namely, it is likely that users take pictures at the same place but the direction of the image is different, so in this paper we use orientation factor to restrain this case of randomness.

Wi-Fi Fingerprint Positioning Based on KWNN
K-weighted nearest neighbors is a version of the K-nearest neighbors (KNN) algorithm improved by introducing a weighted distance factor. If the RPs' distribution is sparse or non-uniform, after calculating the distances and selecting K nearest neighbors, the classification result of KNN is worse than that of KWNN. Torres-Sospedra's [8] extensive experimental results showed that Euclidean distance is not the optimal choice to describe the similarity between fingerprints. On the contrary, the Sorensen distance function can achieve a better accuracy than Euclidean distance. Therefore, in this paper, we choose the Sorensen function for the KWNN algorithm. The definition of the Sorensen distance function is: where i is a sequence of RPs, n denotes the number of APs with the same MAC between mobile node and RPs, MN denotes mobile node, RP denotes a reference point, then, the K-nearest distance will be selected as the nearest neighbors. Finally, the distances corresponding to the K neighbors are used to calculate the normalized weights using [20]:  Table 1 shows the recording format of RPs' coordinates, Wi-Fi fingerprint, captured image sequence number, and output values of the orientation sensor. The coordinates are specified by the user when building the database in the offline stage, but when using crowd-sourcing to update, they are calculated by our proposed algorithm. The AP number means the total number of available APs when users scan Wi-Fi, it is worth noting that the AP number of different RPs are not all equal because of the changing environment at different places, thus, m 1 , m 2 , ..., m n in Table 1 are not all equal. AP 1 -AP m record the received APs' MAC and RSSI; when making a coarse estimation, the KWNN algorithm only calculates the distance of RSSI with the same MAC address. Image sequence number is recorded for 1 to n, and the image features are stored in another data table named as the corresponding sequence number of images. Because of the random direction when users take a picture, namely, it is likely that users take pictures at the same place but the direction of the image is different, so in this paper we use orientation factor to restrain this case of randomness.

Wi-Fi Fingerprint Positioning Based on KWNN
K-weighted nearest neighbors is a version of the K-nearest neighbors (KNN) algorithm improved by introducing a weighted distance factor. If the RPs' distribution is sparse or non-uniform, after calculating the distances and selecting K nearest neighbors, the classification result of KNN is worse than that of KWNN. Torres-Sospedra's [8] extensive experimental results showed that Euclidean distance is not the optimal choice to describe the similarity between fingerprints. On the contrary, the Sorensen distance function can achieve a better accuracy than Euclidean distance. Therefore, in this paper, we choose the Sorensen function for the KWNN algorithm. The definition of the Sorensen distance function is:

1)
where i is a sequence of RPs, n denotes the number of APs with the same MAC between mobile node and RPs, MN denotes mobile node, RP denotes a reference point, then, the K-nearest distance will be selected as the nearest neighbors. Finally, the distances corresponding to the K neighbors are used to calculate the normalized weights using [20]: According to the normalized weight and RPs' coordinates, a coarse estimation using the KWNN-based Wi-Fi fingerprint can use the following formula [20]: In KWNN, it is well known that K is a critical parameter that will directly influence the positioning accuracy, so we have explored that and the results are shown in Figure 2. Each K was simulated 1000 times to calculate the mean error of positioning; we can see from Figure 2 that K = 3 is the optimal choice whose mean error is about 1.59 m, so we set K as 3 in this paper, which will determine the number of retrievable images.
According to the normalized weight and RPs' coordinates, a coarse estimation using the KWNN-based Wi-Fi fingerprint can use the following formula [20]: In KWNN, it is well known that K is a critical parameter that will directly influence the positioning accuracy, so we have explored that and the results are shown in Figure 2. Each K was simulated 1000 times to calculate the mean error of positioning; we can see from Figure 2 that K = 3 is the optimal choice whose mean error is about 1.59 m, so we set K as 3 in this paper, which will determine the number of retrievable images.

Precise Localization Calibrated with Built-in Sensors on a Smartphone
At the image matching positioning stage, the images and three-dimensional orientation sensor data captured via the user's smartphone will be matched with the stored data according to the K nearest neighbors. This process can be described like this: first, we calculate the angle difference between real-time obtained orientation values and stored values, and the results are expressed as orientation matching factors, afterwards, the captured image will be processed to calculate the image matching factor. It is worth noting that the orientation matching factors are three-dimensional vectors, but the image matching factors are 1 × K vectors, where K is the parameter in the KWNN algorithm. Finally, we will utilize the mean-weighted exponent algorithm to correct further the estimated results of the Wi-Fi fingerprint. More pertinent details will be introduced in the following sections.

Three-Dimensional Orientation Matching
The orientation sensor is used to detect motion pose such as azimuth, pitch and roll. We show the 3D model of orientation sensor in Figure 3, it is software-based and derives its three-dimensional floating data from the accelerometer sensor and the geomagnetic field sensor using rotation matrix. Since this whole process has been packaged into the Android API, we can use it directly.

Precise Localization Calibrated with Built-in Sensors on a Smartphone
At the image matching positioning stage, the images and three-dimensional orientation sensor data captured via the user's smartphone will be matched with the stored data according to the K nearest neighbors. This process can be described like this: first, we calculate the angle difference between real-time obtained orientation values and stored values, and the results are expressed as orientation matching factors, afterwards, the captured image will be processed to calculate the image matching factor. It is worth noting that the orientation matching factors are three-dimensional vectors, but the image matching factors are 1ˆK vectors, where K is the parameter in the KWNN algorithm. Finally, we will utilize the mean-weighted exponent algorithm to correct further the estimated results of the Wi-Fi fingerprint. More pertinent details will be introduced in the following sections.

Three-Dimensional Orientation Matching
The orientation sensor is used to detect motion pose such as azimuth, pitch and roll. We show the 3D model of orientation sensor in Figure 3, it is software-based and derives its three-dimensional floating data from the accelerometer sensor and the geomagnetic field sensor using rotation matrix. Since this whole process has been packaged into the Android API, we can use it directly. We use the orientation sensor in a smartphone to improve the image matching performance, so we should know its coordinate system shown in Figure 4a, where the z-axis represents azimuth or heading. When the phone rotates around the z-axis, the first element in the three-dimensional floating data, whose range is 0-360°, indicates the angle between the geomagnetic field and the phone's y-axis, so if the value is equal to 0°, 90°, 180°, or 270°, it means the heading is north, east, south, or west, respectively, and we show this case in Figure 4b. The x-axis represents pitch. When the phone rotates around the x-axis, the second element in the three-dimensional floating data, whose range is −180° to +180° indicates the angle between the geomagnetic field and the phone's z-axis, specifically, when the top of the phone is up, the angle changes from 0° to −180°, when the bottom of the phone is up, the value changes from 0° to 180°. This case is shown in Figure 4c. The x-axis represents roll. When the phone rotates around the y-axis, the second element in the three-dimensional float data, whose range is −90° to +90° indicates the angle between the geomagnetic field and the phone's x-axis, specifically, when the left of the phone is up, the angle changes from 0° to −90°, when the right of phone is up, the value changes from 0° to +90°. This case is shown in Figure 4d.  We use the orientation sensor in a smartphone to improve the image matching performance, so we should know its coordinate system shown in Figure 4a, where the z-axis represents azimuth or heading. When the phone rotates around the z-axis, the first element in the three-dimensional floating data, whose range is 0-360˝, indicates the angle between the geomagnetic field and the phone's y-axis, so if the value is equal to 0˝, 90˝, 180˝, or 270˝, it means the heading is north, east, south, or west, respectively, and we show this case in Figure 4b. The x-axis represents pitch. When the phone rotates around the x-axis, the second element in the three-dimensional floating data, whose range is´180˝to +180˝indicates the angle between the geomagnetic field and the phone's z-axis, specifically, when the top of the phone is up, the angle changes from 0˝to´180˝, when the bottom of the phone is up, the value changes from 0˝to 180˝. This case is shown in Figure 4c. The x-axis represents roll. When the phone rotates around the y-axis, the second element in the three-dimensional float data, whose range is´90˝to +90˝indicates the angle between the geomagnetic field and the phone's x-axis, specifically, when the left of the phone is up, the angle changes from 0˝to´90˝, when the right of phone is up, the value changes from 0˝to +90˝. This case is shown in Figure 4d. We use the orientation sensor in a smartphone to improve the image matching performance, so we should know its coordinate system shown in Figure 4a, where the z-axis represents azimuth or heading. When the phone rotates around the z-axis, the first element in the three-dimensional floating data, whose range is 0-360°, indicates the angle between the geomagnetic field and the phone's y-axis, so if the value is equal to 0°, 90°, 180°, or 270°, it means the heading is north, east, south, or west, respectively, and we show this case in Figure 4b. The x-axis represents pitch. When the phone rotates around the x-axis, the second element in the three-dimensional floating data, whose range is −180° to +180° indicates the angle between the geomagnetic field and the phone's z-axis, specifically, when the top of the phone is up, the angle changes from 0° to −180°, when the bottom of the phone is up, the value changes from 0° to 180°. This case is shown in Figure 4c. The x-axis represents roll. When the phone rotates around the y-axis, the second element in the three-dimensional float data, whose range is −90° to +90° indicates the angle between the geomagnetic field and the phone's x-axis, specifically, when the left of the phone is up, the angle changes from 0° to −90°, when the right of phone is up, the value changes from 0° to +90°. This case is shown in Figure 4d.  In our algorithm, when the user takes a picture, the output values of the orientation sensor and the image will be recorded into the database simultaneously. The angle difference between real-time obtained orientation values and stored values can improve the stability of image matching performance, namely, the values are used to constrain random image choices for a coarse estimated location. A smaller angle difference indicates that the direction of the user and the pose of the smartphone are closer to the corresponding RPs. In this paper, we define the orientation matching factor to measure the closeness as follows: assume that the real-time obtained three-dimensional orientation values are represented as {θx, θy, θ z}, the stored values acquired from the K nearest nodes are denoted as {}, then, the angle difference can be calculated using the following formula: Finally, the orientation matching factor is the normalized weights of the reciprocal value of the angle differences: where i = 1, 2, . . . , K indicates the sequence of RPs in Equations (4) and (5).

Image Feature Matching by an Optical Camera
Image feature matching has been researched for indoor localization for years, and image features can be extracted with many algorithms, for example, SIFT, PCA-SIFT, SURF. Niu [25] and Luo [48] compared SIFT and SURF from the perspective of time, scale, rotation, blur, illumination and affine. We have tested this in our experiments and it proves that SURF is more than three times faster than SIFT, however, it is not optimal to handle scale changes, rotation transformation and blur; a performance comparison of those three changes is shown in Figure 5. In our algorithm, when the user takes a picture, the output values of the orientation sensor and the image will be recorded into the database simultaneously. The angle difference between real-time obtained orientation values and stored values can improve the stability of image matching performance, namely, the values are used to constrain random image choices for a coarse estimated location. A smaller angle difference indicates that the direction of the user and the pose of the smartphone are closer to the corresponding RPs. In this paper, we define the orientation matching factor to measure the closeness as follows: assume that the real-time obtained three-dimensional orientation values are represented as }, the stored values acquired from the K nearest nodes }, then, the angle difference can be calculated using the following formula: Finally, the orientation matching factor is the normalized weights of the reciprocal value of the angle differences: where i = 1, 2, …, K indicates the sequence of RPs in Equations (4) and (5).

Image Feature Matching by an Optical Camera
Image feature matching has been researched for indoor localization for years, and image features can be extracted with many algorithms, for example, SIFT, PCA-SIFT, SURF. Niu [25] and Luo [48] compared SIFT and SURF from the perspective of time, scale, rotation, blur, illumination and affine. We have tested this in our experiments and it proves that SURF is more than three times faster than SIFT, however, it is not optimal to handle scale changes, rotation transformation and blur; a performance comparison of those three changes is shown in Figure 5. Considering that there exists scale, rotation and blur changes when users take photos in many cases, thus, we choose the SIFT algorithm proposed by Lowe [49] to extract image features. Extracted features can be expressed as N × 128-dimensional feature vectors and stored in a database, where N denotes the total number of obtained feature points. The image captured on a smartphone by users is processed by the same algorithm, then we calculate one by one the Sorensen distances between the stored feature vectors and the real-time captured ones, we calculate the ratio by comparing the distance of the closest neighbor to that of the second-closest neighbor as the metric for similarity of images. Lowe demonstrated that a metric threshold of "feature distance" between 0.4 and 0.6 is optimal when the two matching images change with scaling change and rotation transformation [49], further, we have found in our many experiments that when the threshold is equal to 0.55, the Considering that there exists scale, rotation and blur changes when users take photos in many cases, thus, we choose the SIFT algorithm proposed by Lowe [49] to extract image features. Extracted features can be expressed as Nˆ128-dimensional feature vectors and stored in a database, where N denotes the total number of obtained feature points. The image captured on a smartphone by users is processed by the same algorithm, then we calculate one by one the Sorensen distances between the stored feature vectors and the real-time captured ones, we calculate the ratio by comparing the distance of the closest neighbor to that of the second-closest neighbor as the metric for similarity of images. Lowe demonstrated that a metric threshold of "feature distance" between 0.4 and 0.6 is optimal when the two matching images change with scaling change and rotation transformation [49], further, we have found in our many experiments that when the threshold is equal to 0.55, the positioning mean error is smallest, the result is shown in Section 5.2.1. The process of image matching is as follows: it is based on the condition that we have obtained K images after coarse positioning with the KWNN-based Wi-Fi fingerprint, where K is the number of nearest neighbors. As mentioned above, the amount of database images in the candidate set has been reduced and the randomness of images choice have been restricted with the orientation sensor. To decrease the computational time, K images will be processed in a multithread mechanism.
Those K images corresponding to K nearest neighbors in the database can be expressed as I pDB,iq (i = 1, 2, ..., K), and feature vectors are indicated as F pDB,iq (i = 1, 2, ..., K), the image captured by users is represented as Is and its corresponding feature vectors can be donated as Fs, then we use the following formula to calculate the common feature vectors Fc between the captured image and stored images: Fc " pF pDB,iq X Fsq P r0, minpF pDB,iq , Fsqs where i = 1, 2, ..., K. After that, two ratios between those images can be calculated, there are (pF pDB , iq X F s q{F pDB , iq ) and (pF pDB , iq X F s q{F S ). Accordingly, we use the two ratios to calculate the image matching factor. This factor can be defined as: where i = 1, 2, 3 denotes the index of the selected images. The image in the database that should be picked up if it maximizes the image matching factor, as we can see, if the two images don't contain common feature points, the η is 0, if the two images match totally, the η is 2, so the range of η values is between 0.0 ď η ď 2.0, and a larger η indicates a greater similarity between two images.

Precise Localization Using Mean-Weighted Exponent Algorithm
In this section we will introduce our fusion method, which we name the mean-weighted exponent algorithm, to calibrate the estimated location through the KWNN algorithm. The reason why we chose an exponent function is due to its nonlinearity, namely, an exponent function has a more significant change rate compared with a linear function.
According to the selected K RPs, we have calculated the distance factor ω by KWNN, and the orientation matching factor ξ as well as image matching factor η through the proposed algorithm. With those critical parameters, we fuse them by calculating the sum, which will be used as the power of the exponent function. Similar to the previous principle, the normalized correction factor after fusion is represented as: where K is the number of nearest neighbors, i = 1, 2, . . . , K, λ and γ denote adjustment parameters.
In this paper, we set both λ and γ to 1. Finally, the location of the user improved by the camera and orientation sensor on the smartphone is estimated by utilizing the following formula, similar to Equation (3): Equally important, after calculating the orientation matching and image matching factors, it should be noted that the orientation matching factors are three-dimensional vectors, but the image matching factors are 1ˆK vectors, where K is the parameter in the KWNN algorithm. Then, we first set thresholds to judge the maximum value of those factors, if the maximum values are both greater than the two thresholds, this indicates that the location of the reference point corresponding to that maximum value can be used as the final location of the user. In this paper, they are equal to 1.8 and 0.8.

Updating the Database by Crowd-Sourcing
One of the disadvantages of existing Wi-Fi fingerprint positioning is the very high costs in time and manpower during the offline acquisition and calibration phases. To overcome these limitations, our proposed algorithm supports users in uploadiang their scanned Wi-Fi data and captured images into a server to update the database, that is, crowd-sourcing, Estelles-Arolas [50] overviewed this and defined the idea as "a new online distributed production model in which people collaborate and may be rewarded to complete tasks" [51], in our design, we always assume the users are willing to upload the right results into the database. Apart from the fact that it can greatly reduce the complexity of building the database, we can find other advantages from our experiment in that the accuracy of both the Wi-Fi fingerprint and proposed sensors-calibrated system positioning results are higher and more stable as the amount of data in the database increases, as the results in Section 5.2.3 show, so users contributing to crowd-sourcing can get a more precise and stable results. Moreover, it can aid others to locate the next time if the database has not recorded some RPs information in the offline stage, in this way, it also decreases the manpower cost.

Experimental Setup
To validate our proposed algorithm, we have developed an application on an Android smartphone. Our experiment was performed on the fourth floor at the College of Information System and Management of NUDT, with an area of around 1250 m 2 containing six fixed APs, the layout of floor and RPs installation are shown in Figure 6. During the experiment, we scan the Wi-Fi signal RSSI and record the captured front images and three-dimensional orientation values to build the database shown in Figure 7. We did our best to try to capture fixed icons, such as signs, house numbers, etc. We collected 150 RPs' information in the offline stage in the first round. In our experiment, our experimental hardware included: a TP-link router, ThinkPad E420 with Win 7 64-bit operating system, Intel(R) i3-2310M CPU 2.10 GHz 4 cores and 4 G memory. The smartphone is a Huawei Honor 6 (H60-L01) mobile 4G, 16 G ROM whose sensors parameters are listed in Tables 2 and 3 moreover, we used a Kingston 8 GB micro SD memory card to store data in the phone. We developed our software using Android Developer Tools v 21.1.0. that maximum value can be used as the final location of the user. In this paper, they are equal to 1.8 and 0.8.

Updating the Database by Crowd-Sourcing
One of the disadvantages of existing Wi-Fi fingerprint positioning is the very high costs in time and manpower during the offline acquisition and calibration phases. To overcome these limitations, our proposed algorithm supports users in uploadiang their scanned Wi-Fi data and captured images into a server to update the database, that is, crowd-sourcing, Estelles-Arolas [50] overviewed this and defined the idea as "a new online distributed production model in which people collaborate and may be rewarded to complete tasks" [51], in our design, we always assume the users are willing to upload the right results into the database. Apart from the fact that it can greatly reduce the complexity of building the database, we can find other advantages from our experiment in that the accuracy of both the Wi-Fi fingerprint and proposed sensors-calibrated system positioning results are higher and more stable as the amount of data in the database increases, as the results in Section 5.2.3 show, so users contributing to crowd-sourcing can get a more precise and stable results. Moreover, it can aid others to locate the next time if the database has not recorded some RPs information in the offline stage, in this way, it also decreases the manpower cost.

Experimental Setup
To validate our proposed algorithm, we have developed an application on an Android smartphone. Our experiment was performed on the fourth floor at the College of Information System and Management of NUDT, with an area of around 1250 m 2 containing six fixed APs, the layout of floor and RPs installation are shown in Figure 6. During the experiment, we scan the Wi-Fi signal RSSI and record the captured front images and three-dimensional orientation values to build the database shown in Figure 7. We did our best to try to capture fixed icons, such as signs, house numbers, etc. We collected 150 RPs' information in the offline stage in the first round. In our experiment, our experimental hardware included: a TP-link router, ThinkPad E420 with Win 7 64-bit operating system, Intel(R) i3-2310M CPU 2.10 GHz 4 cores and 4 G memory. The smartphone is a Huawei Honor 6 (H60-L01) mobile 4G, 16 G ROM whose sensors parameters are listed in Tables 2  and 3, moreover, we used a Kingston 8 GB micro SD memory card to store data in the phone. We developed our software using Android Developer Tools v 21.1.0.    The simple interface of the app and positioning result are displayed in Figure 8. The red point represents the current location, and a prompt message is at the top.   The simple interface of the app and positioning result are displayed in Figure 8. The red point represents the current location, and a prompt message is at the top.   The simple interface of the app and positioning result are displayed in Figure 8. The red point represents the current location, and a prompt message is at the top.  Four buttons are shown in the bottom, from left to right, their function is: exit, Wi-Fi positioning, image matching positioning, and upload data to the server. Clicking the Wi-Fi positioning button uses the KWNN algorithm to estimate the position for users, After the KWNN algorithm implements the positioning, clicking the image matching positioning opens the camera automatically and further corrects the roughly estimated result using our proposed algorithm. Clicking the upload data button uploads the data and the refined localization in the phone cache into the database. Clicking exit exits our positioning application software. Initially, the user can point to a location on the map and upload the current data into the server to build a database.

Image Matching Evaluation
An experiment is conducted in the meeting room shown in Figure 9 Table 4, meanwhile, we calculate those factors using the methodology described in Section 4. Four buttons are shown in the bottom, from left to right, their function is: exit, Wi-Fi positioning, image matching positioning, and upload data to the server. Clicking the Wi-Fi positioning button uses the KWNN algorithm to estimate the position for users, After the KWNN algorithm implements the positioning, clicking the image matching positioning opens the camera automatically and further corrects the roughly estimated result using our proposed algorithm. Clicking the upload data button uploads the data and the refined localization in the phone cache into the database. Clicking exit exits our positioning application software. Initially, the user can point to a location on the map and upload the current data into the server to build a database.

Image Matching Evaluation
An experiment is conducted in the meeting room shown in Figure 9 Table 4, meanwhile, we calculate those factors using the methodology described in Section 4.   It is well known that Lowe demonstrated that a metric threshold for Euclidean distance between 0.4 and 0.6 is optimal [49,52]. In this case the probability density functions for correct matches are obviously higher than for incorrect matches. In our experiment, we have explored the metric threshold's impact on the positioning accuracy of SURF and SIFT algorithm, it worth noting that "Wi-Fi+SURF" means the method using Wi-Fi fingerprint to make a coarse estimation and SURF-based image matching algorithm to refine the positioning results, which was used in [25], "Wi-Fi+SURF" is the proposed algorithm in our paper. The results are shown in Figure 10, where we selected 150 RPs in the database, and located each group 200 times with different thresholds. As we can see clearly, the mean error is lowest when the threshold is set to 0.55 both of the two algorithm, however, the mean error increases significantly when the threshold is greater than 0.75, the accuracy becomes worse, consequently, we set the threshold to 0.55 when using feature vector to match images and realize positioning in the online stage.  It is well known that Lowe demonstrated that a metric threshold for Euclidean distance between 0.4 and 0.6 is optimal [49,52]. In this case the probability density functions for correct matches are obviously higher than for incorrect matches. In our experiment, we have explored the metric threshold's impact on the positioning accuracy of SURF and SIFT algorithm, it worth noting that "Wi-Fi+SURF" means the method using Wi-Fi fingerprint to make a coarse estimation and SURF-based image matching algorithm to refine the positioning results, which was used in [25], "Wi-Fi+SURF" is the proposed algorithm in our paper. The results are shown in Figure 10, where we selected 150 RPs in the database, and located each group 200 times with different thresholds. As we can see clearly, the mean error is lowest when the threshold is set to 0.55 both of the two algorithm, however, the mean error increases significantly when the threshold is greater than 0.75, the accuracy becomes worse, consequently, we set the threshold to 0.55 when using feature vector to match images and realize positioning in the online stage.

Positioning Evaluation
To validate the accuracy and correctness of the proposed algorithm with calibrated sensors, we have performed an experiment with 150 RPs in the database. The Cumulative Distribution Function (CDF) and root mean square error (RMSE) of the stand-alone KWNN-based Wi-Fi fingerprint algorithm, stand-alone SIFT-based positioning algorithm, SURF combined with Wi-Fi positioning algorithm and proposed sensors-calibrated algorithm are shown in Figure 11. In Figure 11a, we count the errors of 200 positioning events and we can see that positioning accuracy could be greatly improved by the sensors-calibrated algorithm compared with other approaches, for example, the error probability distribution of the KWNN-based algorithm under 1 m is about 12%, under 2 m is about 53.5%, these results demonstrate that the accuracy of the KWNN-based algorithm is not high enough. However, with correction of optical image and orientation sensor matching, this probability distribution increases to 43%, the probability under 2 m is about 80%. The stand-alone SIFT-based algorithm also has a high error probability distribution, and these results demonstrate that the accuracy of the KWNN-based algorithm is improved by visual positioning algorithm obviously (i.e., the sensors-calibrated algorithm). Significantly, the correctness of the KWNN directly impacts the accuracy of the sensors-calibrated algorithm. The RMSEs of different algorithms are shown in Figure 11b, where the RMSE of the sensors-calibrated algorithm is between 1.2 m and 1.5 m, less than the RMSE of the other three positioning algorithms. The stand-alone SIFT-based and

Positioning Evaluation
To validate the accuracy and correctness of the proposed algorithm with calibrated sensors, we have performed an experiment with 150 RPs in the database. The Cumulative Distribution Function (CDF) and root mean square error (RMSE) of the stand-alone KWNN-based Wi-Fi fingerprint algorithm, stand-alone SIFT-based positioning algorithm, SURF combined with Wi-Fi positioning algorithm and proposed sensors-calibrated algorithm are shown in Figure 11. In Figure 11a, we count the errors of 200 positioning events and we can see that positioning accuracy could be greatly improved by the sensors-calibrated algorithm compared with other approaches, for example, the error probability distribution of the KWNN-based algorithm under 1 m is about 12%, under 2 m is about 53.5%, these results demonstrate that the accuracy of the KWNN-based algorithm is not high enough. However, with correction of optical image and orientation sensor matching, this probability distribution increases to 43%, the probability under 2 m is about 80%. The stand-alone SIFT-based algorithm also has a high error probability distribution, and these results demonstrate that the accuracy of the KWNN-based algorithm is improved by visual positioning algorithm obviously (i.e., the sensors-calibrated algorithm). Significantly, the correctness of the KWNN directly impacts the accuracy of the sensors-calibrated algorithm. The RMSEs of different algorithms are shown in Figure 11b, where the RMSE of the sensors-calibrated algorithm is between 1.2 m and 1.5 m, less than the RMSE of the other three positioning algorithms. The stand-alone SIFT-based and "SURF+Wi-Fi" method have close RMSE results, while the stand-alone KWNN-based algorithm is worse, ranging from 2.1 to 2.4. We can conclude that our proposed sensors-calibrated algorithm has better performance than the others and the estimated results are closer to the true location after calibration. "SURF+Wi-Fi" method have close RMSE results, while the stand-alone KWNN-based algorithm is worse, ranging from 2.1 to 2.4. We can conclude that our proposed sensors-calibrated algorithm has better performance than the others and the estimated results are closer to the true location after calibration.
(a) (b) Figure 11. Performance comparison of our sensors-calibrated algorithm with other approaches.

Performance of Crowd-Sourcing for Positioning
We use the error CDF and accuracy as the standard to evaluate the performance of crowd-sourcing. In our small-scale experiment, five people participated in uploading data on the fourth floor at the College of Information System and Management of NUDT. We assume users participate actively, uploading positioning results freely and honestly, that is, there is no unreliable data in the experiment. In this section, we select compare the performance of the stand-alone KWNN algorithm and our proposed algorithm. The number of items in the database is set to 50,100,150,200,250,300, and 350 to simulate the increase in the database size through crowd-sourcing. The results are shown in Figures 12.

Performance of Crowd-Sourcing for Positioning
We use the error CDF and accuracy as the standard to evaluate the performance of crowd-sourcing. In our small-scale experiment, five people participated in uploading data on the fourth floor at the College of Information System and Management of NUDT. We assume users participate actively, uploading positioning results freely and honestly, that is, there is no unreliable data in the experiment. In this section, we select compare the performance of the stand-alone KWNN algorithm and our proposed algorithm. The number of items in the database is set to 50,100,150,200,250,300, and 350 to simulate the increase in the database size through crowd-sourcing. The results are shown in Figure 12. "SURF+Wi-Fi" method have close RMSE results, while the stand-alone KWNN-based algorithm is worse, ranging from 2.1 to 2.4. We can conclude that our proposed sensors-calibrated algorithm has better performance than the others and the estimated results are closer to the true location after calibration.
(a) (b) Figure 11. Performance comparison of our sensors-calibrated algorithm with other approaches.

Performance of Crowd-Sourcing for Positioning
We use the error CDF and accuracy as the standard to evaluate the performance of crowd-sourcing. In our small-scale experiment, five people participated in uploading data on the fourth floor at the College of Information System and Management of NUDT. We assume users participate actively, uploading positioning results freely and honestly, that is, there is no unreliable data in the experiment. In this section, we select compare the performance of the stand-alone KWNN algorithm and our proposed algorithm. The number of items in the database is set to 50,100,150,200,250,300, and 350 to simulate the increase in the database size through crowd-sourcing. The results are shown in Figures 12.    Figure 12a,b show the error CDF of the KWNN-based and the proposed sensors-calibrated algorithm, respectively. From Figure 12 we can conclude that: (1) As the number of RPs in the database increases, the accuracy of the KWNN-based and our sensors-calibrated algorithm are both improved.
(2) The sensors-calibrated algorithm has a better performance. For example, when the number of RPs is equal to 350, the mean error under 1 m of KWNN-based algorithm is about 20%, but, the mean error under 1 m of the proposed algorithm is about 46%, an obvious improvement. (3) For the same database, if the estimated results of the KWNN-based algorithm are more precise and more stable, which will choose better nearest neighbors, so finally, the accuracy of our sensors-calibrated algorithm will be better.
The RMSE and mean error of the stand-alone KWNN-based Wi-Fi fingerprint algorithm, stand-alone SIFT-based positioning algorithm, SURF combined with Wi-Fi positioning algorithm and proposed sensors-calibrated algorithm are shown in Figure 13, where the mean error indicates the closeness between the average value of a number of measurements and an accepted reference value, namely, trueness. precise and more stable, which will choose better nearest neighbors, so finally, the accuracy of our sensors-calibrated algorithm will be better.
The RMSE and mean error of the stand-alone KWNN-based Wi-Fi fingerprint algorithm, stand-alone SIFT-based positioning algorithm, SURF combined with Wi-Fi positioning algorithm and proposed sensors-calibrated algorithm are shown in Figure 13, where the mean error indicates the closeness between the average value of a number of measurements and an accepted reference value, namely, trueness. The RMSE indicates a measure of the system's combined trueness and precision, namely, accuracy. Each group was performed 200 times. Figure 13a compares the RMSE of these four algorithms, Figure 13b compares the mean errors of these four algorithms. We can conclude the following from the shown results: as the number of RPs increases, the RMSE of the KWNN-based algorithm decreases from 2.8 m to 1.6 m, reducing by 42.9%. The sensors-calibrated algorithm decreases from 2.7 m to 1.02 m, reducing by 62%, stand-alone SIFT image matching algorithm and SURF combinated with Wi-Fi algorithm decrease 54.2% and 57.9%, respectively.
The RMSE of the sensors-calibrated algorithm is less than that of the other three algorithms, which means the position from the sensors-calibrated algorithm is closer to the true location, while the stand-alone SIFT-based algorithm and SURF combined with Wi-Fi algorithm results are approximate.
From Figure 13b, we can see that the mean error of visual positioning algorithm is less than that of KWNN-based Wi-Fi fingerprint positioning, and with the coarse estimation by stand-alone KWNN-based algorithm, the proposed sensors-calibrated algorithm is less than those of the other three algorithms as the number of RPs increases, and the rate of increase of the proposed algorithm is larger than that of the others, which indicates the sensors-calibrated algorithm has a better performance.
We can use data variance to evaluate the stability of positioning. Positioning variance comparisons of the different algorithms are shown in Figure 14. As we can see, when the number of RPs increase, the variance of the four algorithms decreases, however, the positioning variance of the proposed sensors-calibrated algorithm is less than the variance of other algorithms, stand-alone SIFT-based algorithm and SURF combined with Wi-Fi algorithm are approximate, therefore, the results demonstrate that the proposed algorithm is more stable than KWNN. The RMSE indicates a measure of the system's combined trueness and precision, namely, accuracy. Each group was performed 200 times. Figure 13a compares the RMSE of these four algorithms, Figure 13b compares the mean errors of these four algorithms. We can conclude the following from the shown results: as the number of RPs increases, the RMSE of the KWNN-based algorithm decreases from 2.8 m to 1.6 m, reducing by 42.9%. The sensors-calibrated algorithm decreases from 2.7 m to 1.02 m, reducing by 62%, stand-alone SIFT image matching algorithm and SURF combinated with Wi-Fi algorithm decrease 54.2% and 57.9%, respectively.
The RMSE of the sensors-calibrated algorithm is less than that of the other three algorithms, which means the position from the sensors-calibrated algorithm is closer to the true location, while the stand-alone SIFT-based algorithm and SURF combined with Wi-Fi algorithm results are approximate.
From Figure 13b, we can see that the mean error of visual positioning algorithm is less than that of KWNN-based Wi-Fi fingerprint positioning, and with the coarse estimation by stand-alone KWNN-based algorithm, the proposed sensors-calibrated algorithm is less than those of the other three algorithms as the number of RPs increases, and the rate of increase of the proposed algorithm is larger than that of the others, which indicates the sensors-calibrated algorithm has a better performance.
We can use data variance to evaluate the stability of positioning. Positioning variance comparisons of the different algorithms are shown in Figure 14. As we can see, when the number of RPs increase, the variance of the four algorithms decreases, however, the positioning variance of the proposed sensors-calibrated algorithm is less than the variance of other algorithms, stand-alone SIFT-based algorithm and SURF combined with Wi-Fi algorithm are approximate, therefore, the results demonstrate that the proposed algorithm is more stable than KWNN.

Performance Comparison of Time Consume for Positioning
Time consumption is an important aspect for positioning systems, and it's well known that the SURF algorithm is faster than the SIFT algorithm. In our proposed sensors-calibrated algorithm, we use a multithread mechanism to extract features of the K stored images corresponding to the K nearest neighbors in parallel by using the SIFT operator, so the proposed algorithm's time consumption for estimating positions of is not worse than SURF, but the time consumption of a stand-alone SIFT-based algorithm is almost three times that of our proposed sensors-calibrated algorithm, and the position estimation time increases greatly as the image database increases in size, thus, we show the results of stand-alone KWNN-based, SURF combined with Wi-Fi algorithm and the proposed algorithm (i.e., SURF+Wi-Fi) in Table 5.
As we can see, the time consumption of the KWNN-based algorithm is the least; the average is 23.79 ms. As we put more emphasis on the time consumption of image matching positioning after coarse estimation by the KWNN-based algorithm, with the help of the multithread mechanism, the average time consumed by our proposed algorithm is 1716.20 ms. At the same time, the time consumption of the image matching approach based on SURF is 1808.03 ms, so the results demonstrate that high computational cost of image matching positioning have been reduced greatly by our sensors-calibrated algorithm.

Conclusions
Aiming at solving the imprecise and time-consuming outcomes in Wi-Fi fingerprint and image matching positioning, respectively, in this paper we propose a crowd-sourcing indoor localization algorithm via the optical camera and orientation sensor in a smartphone. Real-case experimental results of different methods demonstrate that the sensors-calibrated algorithm has a better accuracy and stability performance, compared with three other algorithms (i.e., stand-alone KWNN-based positioning, stand-alone SIFT image matching positioning and SURF combined with Wi-Fi positioning). In summary, the proposed sensors-calibrated crowd-sourcing indoor localization algorithm via the optical camera and orientation sensor on a smartphone has three innovative aspects:

Performance Comparison of Time Consume for Positioning
Time consumption is an important aspect for positioning systems, and it's well known that the SURF algorithm is faster than the SIFT algorithm. In our proposed sensors-calibrated algorithm, we use a multithread mechanism to extract features of the K stored images corresponding to the K nearest neighbors in parallel by using the SIFT operator, so the proposed algorithm's time consumption for estimating positions of is not worse than SURF, but the time consumption of a stand-alone SIFT-based algorithm is almost three times that of our proposed sensors-calibrated algorithm, and the position estimation time increases greatly as the image database increases in size, thus, we show the results of stand-alone KWNN-based, SURF combined with Wi-Fi algorithm and the proposed algorithm (i.e., SURF+Wi-Fi) in Table 5. As we can see, the time consumption of the KWNN-based algorithm is the least; the average is 23.79 ms. As we put more emphasis on the time consumption of image matching positioning after coarse estimation by the KWNN-based algorithm, with the help of the multithread mechanism, the average time consumed by our proposed algorithm is 1716.20 ms. At the same time, the time consumption of the image matching approach based on SURF is 1808.03 ms, so the results demonstrate that high computational cost of image matching positioning have been reduced greatly by our sensors-calibrated algorithm.

Conclusions
Aiming at solving the imprecise and time-consuming outcomes in Wi-Fi fingerprint and image matching positioning, respectively, in this paper we propose a crowd-sourcing indoor localization algorithm via the optical camera and orientation sensor in a smartphone. Real-case experimental results of different methods demonstrate that the sensors-calibrated algorithm has a better accuracy and stability performance, compared with three other algorithms (i.e., stand-alone KWNN-based positioning, stand-alone SIFT image matching positioning and SURF combined with Wi-Fi positioning).
In summary, the proposed sensors-calibrated crowd-sourcing indoor localization algorithm via the optical camera and orientation sensor on a smartphone has three innovative aspects: (1) We explore the impact of SIFT and SURF thresholds on positioning when we choose the SIFT operator to extract image features. We conclude from our experiments that the positioning error is the least when the threshold is set as 0.55 in both the SIFT and SURF algorithms.
We propose a mean-weighted exponent algorithm to fuse the output of the built-in sensors in a smartphone. Small scale real-case scenarios results demonstrate that the proposed algorithm has a better performance because of its nonlinearity to calibrate the positioning accuracy.
We use crowd-sourcing to update and supplement the database. We also explored its influence on positioning. The experimental results show that the accuracy and stability of both algorithms have been improved, and the sensors-calibrated algorithm is obviously better than other three algorithms in this paper.
Compared with the other positioning systems, the proposed system has the following advantages: (1) Higher performance. We apply the mean-weighted exponent algorithm to calculate the correction factors based on the output of the camera and orientation sensor. These factors will improve the performance of positioning system greatly in accuracy, stability, and applicability. (2) Easier usage. All required hardware is widely available in common smartphones. It is unnecessary to use deploy additional installations and users can use their smartphones for positioning in two modes: if high accuracy is unnecessary, they can choose Wi-Fi fingerprint positioning, otherwise, they can open the camera to improve the positioning results through the proposed sensors-calibrated algorithm.
Lower cost. The proposed algorithm can be implemented on a smartphone. Firstly, we use the smartphone to locate, and on the other hand, we adopt crowd-sourcing via the smartphone to update the database. Equally important, we do not need to preset tags for image identification in the indoor scenario. However, if we do preset tags for image identification, it will be better for our algorithm.
For indoor positioning systems, it is ideal to achieve the optimal performance with the simplest equipment or method; however, there are many issues to be solved. Our proposed algorithm still has some shortcomings; fortunately, those shortcomings will be addressed in our future work: Firstly, we should consider combining other built-in sensors in the smartphone. We have performed experiments under the assumption that the light is stable in indoor environments. In practice, the light might change as the weather changes. Smartphone cameras are sensitive to lighting and susceptible to this lowering the quality of the images. We will handle this aspect in the future by combining other sensors or other methods such as MoVIPS [26,27].
Secondly, the influence of equipment diversity should be considered. It is well known that the same built-in sensors in different smartphones have different performances, such as accuracy and stability, which will influence the positioning performance. We have tested the output value of the orientation sensor on the Huawei Honor 6 mobile phone and Motorola Moto X Pro mobile phone. The stability of the sensors in these two devices is obviously different. For simplification, we have done all experiments with the same brand of smartphone. Exploring the influence of equipment diversity for positioning will be a part of our future work.
Finally, the crowd-sourcing selection mechanism should also be considered. Crowd-sourcing is a suitable method to build a database for positioning. On the one hand, it can reduce the time needed to build the database. On the other hand, it can improve positioning performance as demonstrated by our experiments. However, the thing we should take into account is how can we filter out the unreliable data from all the uploaded data. At the same time, we will explore the quality control and management of crowd-sourcing to select better and more useful data to realize localization.