Design of a Smartphone Indoor Positioning Dynamic Ground Truth Reference System Using Robust Visual Encoded Targets

Smartphone indoor positioning ground truth is difficult to measure directly, dynamically, and precisely in real time. To solve this problem, this paper proposes and implements a robust, high-precision, dynamic, real-time ground truth reference system for smartphone indoor positioning that uses color visual scatter-encoded targets based on machine vision and photogrammetry. First, we design a novel high-precision color visual scatter-encoded pattern with a robust recognition rate. Then we use a smartphone to obtain a sequence of images of an experimental room and extract the base points of the color visual scatter-encoded patterns from the image sequence to establish the indoor local coordinate system of the encoded targets. Finally, we use a high-efficiency algorithm to decode the targets in a dynamically captured real-time image to obtain accurate instantaneous pose information of the smartphone camera, establishing a high-precision, high-availability direct ground truth reference system for smartphone indoor positioning that supports preliminary real-time accuracy evaluation of other smartphone positioning technologies. The experimental results show that the encoded targets of the color visual scatter-encoded pattern designed in this paper are easy to detect and identify, and that their layout is simple and affordable. The system accurately and quickly solves the dynamic instantaneous pose of a smartphone camera, completing the self-positioning of the smartphone by means of artificial scatter-feature visual positioning. It is a fast, efficient, and low-cost accuracy-evaluation method for smartphone indoor positioning.


Introduction
A survey shows that people spend more than 70% of their lives indoors [1]. Indoor spaces that people frequently enter and exit, such as airports, stations, supermarkets, hospitals, shopping centers, museums, libraries, and underground parking lots, have a high demand for location services. Outdoors, smartphone positioning error is less than 1 m with the help of Global Navigation Satellite System (GNSS) signals. However, with current technologies, high-availability indoor positioning accuracy is between two and three meters. Scientists and industrial researchers in many countries are working on indoor positioning technology with a highly available accuracy of 1 m [2]. Indoor positioning methods for smartphones are mainly based on active radio frequency signals such as Wi-Fi and Bluetooth, and on built-in sensors such as geomagnetism, inertial navigation, and vision [3]. The indoor positioning [...] targets improve the recognition rate by making [...]. Nevertheless, their structure is more complicated, and more geometric figures must be extracted. Furthermore, the number of complete encoded sets is relatively limited. The traditional gray-encoded target is generally identified by its geometric and structural relationships, and it is prone to incorrect recognition when images are strongly inclined or distorted. A color encoded target adds color information, but its structure is relatively complicated, which increases the complexity of the algorithm and lowers the recognition rate. This paper addresses the above problems by combining color and geometric information in encoded targets, designing a set of color visual scatter-encoded patterns for fast calculation and high reliability. Hence, the positioning accuracy of a single smartphone image can reach the centimeter level, which is far superior to other indoor positioning methods, such as Bluetooth, Wi-Fi, 4G, and Pedestrian Dead Reckoning (PDR), enabling encoded targets to meet the accuracy requirements of the ground truth reference system.

Design of Visual Scatter-Encoded Targets
Encoded targets have different shapes, such as circles, squares, and triangles [22][23][24]. An experiment showed that the edge of a circular target is the most rounded, and there is no change of rotation angle when rotating in the plane [25]. Furthermore, in the orthographic condition, the distances between the center and the edge points are equal. This target is easily identified and located, and it is used in logistics warehousing and precision control of indoor robots. This paper uses a circular pattern shape for encoded targets.
As shown in Figure 1a, the encoded target patterns designed in this paper arrange color circular scatter points of consistent shape and size in a 4 × 4 configuration. The scatter points of a color scatter-encoded target fall into two categories: base points and identification points. An encoded target pattern has five base points, consisting of color scatter points A, B, C, and D at the corners plus a near-corner point E in Figure 1a, while the identification points occupy the remaining positions. Figure 1b shows the coordinate system formed by the base points. The position, shape, and color of points A, B, C, D, and E are fixed, and their structure is preserved under affine transformation, so the pattern is invariant to scaling, rotation, and translation. During acquisition, recognition, and decoding of indoor visual encoded targets, the RGB values of the captured indoor image may be distorted by external light sources, errors of the camera's photosensitive chip, and other factors. To improve the accuracy of color recognition and reduce this interference, it is necessary to enhance the contrast between different colors. The three primary colors and their complementary colors have been shown to improve the color recognition rate [26]. We choose red, white, black, and green to improve the accuracy of color recognition in combination with the geometric structure of the designed encoded targets. In addition, a white background causes strong reflection in images, which degrades the imaging of encoded targets, whereas reflection is weak with a black background, so encoded targets are less affected by reflection when imaged. Therefore, the edges of non-black circular encoded targets are clearer with a black background than with a white background.
This is why the encoded target patterns designed in this paper use a black background. Figure 2 shows some examples of the color scatter-encoded target pattern designed in this paper. Base points are red circles, and identification points are white, green, and black circles. Combining geometric structure and color information, the number of color scatter-encoded target patterns can reach 177,147 (3^11, since each of the 11 identification points takes one of three colors), which meets the application requirements of most scenarios. Moreover, our method is invariant to scaling, rotation, and translation, and offers a stable, high recognition rate and sufficient encoding capacity.

Extraction of Base Points
Based on a complete sequence of smartphone images of an indoor experimental scene, we extract base points from the acquired images. Before extraction, we deepen the RGB values of the red base points through a custom channel. Because the Canny operator produces single-pixel edges and is less sensitive to noise, it is well suited to extracting the edges of circular artificial target patterns. We therefore apply Canny edge detection [27] to the acquired images to obtain a binarized edge image. Since a circular target is generally imaged as an ellipse under projective transformation, we use ellipse fitting to locate the image center of each base point. The positions of the edge pixels are used to fit an elliptic equation and determine the center of the base point, and then the sequence of ellipses is established. The general equation of an ellipse in a plane is:

$$x^2 + Bxy + Cy^2 + Dx + Ey + F = 0 \tag{1}$$

In Equation (1), (x, y) are the coordinates of a point on the ellipse, and B, C, D, E, and F are the five parameters of the elliptic equation. These parameters are obtained by ellipse fitting, and the elliptic central coordinates (x_0, y_0) can be obtained from Equation (2):

$$x_0 = \frac{BE - 2CD}{4C - B^2}, \qquad y_0 = \frac{BD - 2E}{4C - B^2} \tag{2}$$

To avoid false base points, after recognition is complete, we check whether each candidate conforms to circular or elliptical pattern characteristics. This takes two steps. First, we check whether the ratio of the semi-major axis to the semi-minor axis is within a certain range. Second, the elliptical area cannot be too small; experiments show that it should be greater than 10 pixels to eliminate false base points.
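The fitting and filtering steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes the conic is normalized so that the coefficient of x^2 is 1 (which leaves the five parameters B to F), it fits them by ordinary least squares with NumPy, and the function names and the `max_axis_ratio` value are hypothetical. Only the 10-pixel area threshold comes from the text.

```python
import numpy as np


def fit_conic(edge_points):
    """Least-squares fit of x^2 + B*x*y + C*y^2 + D*x + E*y + F = 0."""
    pts = np.asarray(edge_points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Move the x^2 term to the right-hand side and solve for (B, C, D, E, F).
    design = np.column_stack([x * y, y ** 2, x, y, np.ones_like(x)])
    params, *_ = np.linalg.lstsq(design, -(x ** 2), rcond=None)
    return params


def ellipse_center(params):
    """Center of the conic: the point where both partial derivatives vanish."""
    B, C, D, E, _ = params
    denom = 4.0 * C - B ** 2  # positive when the conic is an ellipse
    if denom <= 0:
        raise ValueError("fitted conic is not an ellipse")
    return (B * E - 2.0 * C * D) / denom, (B * D - 2.0 * E) / denom


def ellipse_axes(params):
    """Return (semi_major, semi_minor) of the fitted ellipse."""
    B, C, D, E, F = params
    x0, y0 = ellipse_center(params)
    # Value of the conic at its center; its negative scales the quadratic form.
    k = -(F + 0.5 * (D * x0 + E * y0))
    shape = np.array([[1.0, B / 2.0], [B / 2.0, C]])
    lam_small, lam_large = np.linalg.eigvalsh(shape)  # ascending order
    return np.sqrt(k / lam_small), np.sqrt(k / lam_large)


def is_plausible_base_point(params, max_axis_ratio=2.0, min_area=10.0):
    """Two-step check: axis ratio in range (assumed bound) and area > 10 pixels."""
    semi_major, semi_minor = ellipse_axes(params)
    area = np.pi * semi_major * semi_minor
    return semi_major / semi_minor <= max_axis_ratio and area > min_area


# Synthetic edge pixels on a rotated ellipse centered at (12, 8).
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
theta = 0.5
xs = 12.0 + 5.0 * np.cos(t) * np.cos(theta) - 3.0 * np.sin(t) * np.sin(theta)
ys = 8.0 + 5.0 * np.cos(t) * np.sin(theta) + 3.0 * np.sin(t) * np.cos(theta)
demo_params = fit_conic(np.column_stack([xs, ys]))
demo_center = ellipse_center(demo_params)
demo_axes = ellipse_axes(demo_params)
```

In practice the edge pixels would come from the Canny edge image; the synthetic points here only exercise the formulas.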

Setting Up a Local Coordinate System
After extracting the base points from the sequence images, the local coordinate system of the color scatter-encoded target pattern is established from its base point coordinates. A base point ellipse is randomly selected, and the Euclidean distance S_i between its center (x_0, y_0) and the center of every other base point ellipse is calculated as follows:

$$S_i = \sqrt{(x_i - x_0)^2 + (y_i - y_0)^2} \tag{3}$$

where x_i and y_i are the x and y coordinates of the other elliptical centers. A temporary set of candidate ellipses is composed of this base point ellipse and the four nearest base point ellipses. When these five base point ellipses satisfy the following conditions, they are considered to possibly belong to the same color scatter-encoded target pattern; otherwise they are removed: (1) no two ellipses may contain or intersect each other; (2) the maximum semi-major axis of the ellipses cannot be greater than twice the minimum semi-minor axis.
Among the five ellipses, we find the two that are farthest apart and judge whether the remaining three are collinear. If so, their common line is denoted L_i, and we check that one of the three ellipses lies between the other two; if not, we return to the previous step. We then judge whether the two farthest-apart ellipses lie on opposite sides of L_i. If so, the ellipse that is close to one of the two farthest-apart ellipses is numbered a, and the other ellipses are numbered b, c, e, and d, as in Figure 3b; if not, we return to the previous step. Next, we judge whether the center point f of ellipses d and e is collinear with ellipses a and c. If so, the cross-ratio test is applied to the centers of ellipses a, b, and c and the point f. Figure 3a shows the principle of cross-ratio invariance. If four points a, b, c, and d are collinear on line L1 and are projected onto line L2 through the projection center P, with corresponding image points A, B, C, and D, their relationship is given by Equation (4), where the cross-ratio of the four points is given by Equation (5). From these two equations, it can be seen that the cross-ratio of four collinear points equals the cross-ratio of the corresponding image points, which is the projective invariance of the cross-ratio. As shown in Figure 3b, the cross-ratio (ab, fc) of the centers of ellipses a, b, and c and point f is defined by Equation (6), in which af, bc, ac, and bf are all directed line segments rather than distances. Based on Equation (3), the cross-ratio threshold in this paper is set to 0.5. If the computed cross-ratio is consistent with the threshold within the error range, then the five ellipses are the base points of the same color encoded target pattern, and the local coordinate system of the color encoded target pattern can be established. Otherwise, we return to the previous step, and ellipses a and c must be selected again.
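The cross-ratio test can be illustrated with a short Python sketch. It is not the authors' code: Equations (4) to (6) are not reproduced in this text, so the sketch assumes one common convention, (p1 p2, p3 p4) = (p1p3 · p2p4) / (p2p3 · p1p4) with directed segments, and the paper's definition may order the terms differently (for example as the reciprocal). The helper names, the example homography, and the tolerance are hypothetical; only the 0.5 threshold comes from the text.

```python
import numpy as np


def cross_ratio(p1, p2, p3, p4):
    """Cross-ratio of four collinear 2D points using directed segments.

    Assumed convention: (p1 p2, p3 p4) = (p1p3 * p2p4) / (p2p3 * p1p4).
    """
    pts = np.asarray([p1, p2, p3, p4], dtype=float)
    direction = pts[1] - pts[0]
    direction /= np.linalg.norm(direction)
    # Signed position of every point along the common line.
    t = (pts - pts[0]) @ direction
    return ((t[2] - t[0]) * (t[3] - t[1])) / ((t[2] - t[1]) * (t[3] - t[0]))


def apply_homography(h, points):
    """Map 2D points through a 3x3 projective transformation."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.column_stack([pts, np.ones(len(pts))]) @ h.T
    return homogeneous[:, :2] / homogeneous[:, 2:3]


def is_consistent(value, expected=0.5, tolerance=0.05):
    """Compare a measured cross-ratio with the design value (tolerance assumed)."""
    return abs(value - expected) <= tolerance


# Four collinear points and an arbitrary perspective distortion of them.
line_points = [(0.0, 3.0), (1.0, 2.0), (2.0, 1.0), (4.0, -1.0)]
h_demo = np.array([[1.1, 0.2, 5.0],
                   [-0.1, 0.9, 3.0],
                   [0.001, 0.002, 1.0]])
ratio_before = cross_ratio(*line_points)
ratio_after = cross_ratio(*apply_homography(h_demo, line_points))
```

Because the value is unchanged by the projective mapping, the same threshold can be applied to image measurements regardless of viewing angle, which is what makes the cross-ratio usable for validating candidate base points.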

Solving for Object Coordinates
This paper uses the Photoscan software to process the indoor images and obtain the intrinsics and extrinsics of the camera. The coordinates of the object points corresponding to the image points are obtained by forward intersection. Using the coordinate values of four control points in the indoor local coordinate system, the coordinates of the object points are then transformed into the indoor local coordinate system.

Decoding Encoded Targets
The maximum value Max_i and minimum value Min_i of the RGB channels of the five base points in the color encoded target pattern are computed. According to Max_i and Min_i, the pixels of the base points of the same color encoded target pattern are linearly stretched, and their color is judged. The judging rules are as follows. If the three channel values are all less than 50, the target is considered to be a black encoded target. If the value of the green channel is greater than 100 and is 50 higher than the values of the other two channels, the target is considered to be a green encoded target. If the three channel values are all greater than 150 and the difference between each pair of channels is not more than 50, the target is considered to be a white encoded target. After the color information is attached to each encoded target, the color visual encoded target pattern is encoded. The state values of white, green, and black encoded targets are set to 2, 1, and 0, respectively; together they form a ternary code, which is converted to a decimal code to complete the decoding process. Finally, the decimal-encoded values of the color encoded target patterns and the corresponding coordinate values are recorded in the dataset of encoded targets.
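The judging rules and the ternary-to-decimal conversion above can be sketched as follows. This is an illustrative Python sketch rather than the authors' implementation. Several details are assumptions: "50 higher" is read as "at least 50 higher", the stretch maps [Min_i, Max_i] onto [0, 255], and the identification points are read with the most significant ternary digit first (the text does not state the reading order). All function names are hypothetical.

```python
def linear_stretch(value, min_i, max_i):
    """Linearly stretch a channel value from [min_i, max_i] to [0, 255]."""
    if max_i <= min_i:
        return float(value)
    stretched = (value - min_i) / (max_i - min_i) * 255.0
    return max(0.0, min(255.0, stretched))


def classify_color(r, g, b):
    """Apply the black / green / white rules; return None if no rule matches."""
    if r < 50 and g < 50 and b < 50:
        return "black"
    # Assumption: "50 higher" means at least 50 above each other channel.
    if g > 100 and g - r >= 50 and g - b >= 50:
        return "green"
    if min(r, g, b) > 150 and max(r, g, b) - min(r, g, b) <= 50:
        return "white"
    return None


# State values from the text: white = 2, green = 1, black = 0.
STATE = {"black": 0, "green": 1, "white": 2}


def decode_pattern(colors):
    """Convert the ternary code of the identification points to decimal.

    Assumption: colors are listed with the most significant digit first.
    """
    value = 0
    for color in colors:
        value = value * 3 + STATE[color]
    return value


example_code = decode_pattern(["green", "black", "white"])  # 1*9 + 0*3 + 2
largest_code = decode_pattern(["white"] * 11)
```

With 11 identification points the decoded values range from 0 to 3^11 - 1, which matches the stated capacity of 177,147 patterns.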

Solving for the Pose of a Single Positioning Image
After completing the above work, the exterior orientation elements of the positioning image can be calculated by space resection. Taking the image-plane coordinates and the local indoor coordinates of the color encoded target patterns as the observations and the known values, respectively, the pose of the single positioning image is solved iteratively using the collinearity equations, the least squares method, and indirect adjustment. The calculation procedure is as follows. First, a single positioning image from a smartphone is decoded to obtain the encoded values of the color encoded target patterns, and the indoor coordinates (X_i, Y_i) and image coordinates (x_i, y_i) are obtained from the color encoded target pattern dataset according to those encoded values. Then the collinearity equations are established as in Equations (7) and (8):

$$x - x_0 = -f\,\frac{a_1(X_A - X_S) + b_1(Y_A - Y_S) + c_1(Z_A - Z_S)}{a_3(X_A - X_S) + b_3(Y_A - Y_S) + c_3(Z_A - Z_S)}, \qquad y - y_0 = -f\,\frac{a_2(X_A - X_S) + b_2(Y_A - Y_S) + c_2(Z_A - Z_S)}{a_3(X_A - X_S) + b_3(Y_A - Y_S) + c_3(Z_A - Z_S)} \tag{7}$$

In Equation (7), x and y are the image-plane coordinates of the image point; f, x_0, and y_0 are the intrinsic parameters of the camera; X_A, Y_A, Z_A and X_S, Y_S, Z_S are the object coordinates of the object point and of the camera, respectively; and a_i, b_i, and c_i (i = 1, 2, 3) are the nine direction cosines formed from the three angular elements of the extrinsic parameters, as given in Equation (8). In the error equations of Equation (9), v_x and v_y are the observation corrections. These error equations can be written in matrix form: based on Equations (10)-(13), Equation (9) is rewritten accordingly. Finally, according to the least squares principle, the extrinsics X of a single image are calculated.
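The iterative least-squares resection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rotation matrix uses one common phi-omega-kappa convention (the paper's Equation (8) is not reproduced in this text and may differ), the design matrix is obtained by numerical differentiation instead of the analytic partial derivatives of an indirect adjustment, all observations are weighted equally, and every name and number in the example is hypothetical.

```python
import numpy as np


def rotation_matrix(phi, omega, kappa):
    """Direction cosines [[a1, a2, a3], [b1, b2, b3], [c1, c2, c3]].

    Assumed convention: R = R_phi (about Y) @ R_omega (about X) @ R_kappa (about Z).
    """
    cp, sp = np.cos(phi), np.sin(phi)
    co, so = np.cos(omega), np.sin(omega)
    ck, sk = np.cos(kappa), np.sin(kappa)
    r_phi = np.array([[cp, 0.0, -sp], [0.0, 1.0, 0.0], [sp, 0.0, cp]])
    r_omega = np.array([[1.0, 0.0, 0.0], [0.0, co, -so], [0.0, so, co]])
    r_kappa = np.array([[ck, -sk, 0.0], [sk, ck, 0.0], [0.0, 0.0, 1.0]])
    return r_phi @ r_omega @ r_kappa


def project(pose, object_points, f, x0=0.0, y0=0.0):
    """Collinearity equations: image coordinates of object points for a pose."""
    xs, ys, zs, phi, omega, kappa = pose
    r = rotation_matrix(phi, omega, kappa)
    # Each row is R^T (P - S), i.e. the point in the camera frame.
    d = (object_points - np.array([xs, ys, zs])) @ r
    x = x0 - f * d[:, 0] / d[:, 2]
    y = y0 - f * d[:, 1] / d[:, 2]
    return np.column_stack([x, y]).ravel()


def space_resection(image_points, object_points, f, initial_pose,
                    x0=0.0, y0=0.0, max_iter=50, tol=1e-10, step=1e-6):
    """Gauss-Newton estimate of (Xs, Ys, Zs, phi, omega, kappa)."""
    pose = np.asarray(initial_pose, dtype=float).copy()
    obs = np.asarray(image_points, dtype=float).ravel()
    obj = np.asarray(object_points, dtype=float)
    for _ in range(max_iter):
        misclosure = obs - project(pose, obj, f, x0, y0)
        design = np.empty((obs.size, 6))
        for j in range(6):  # central-difference partial derivatives
            delta = np.zeros(6)
            delta[j] = step
            design[:, j] = (project(pose + delta, obj, f, x0, y0)
                            - project(pose - delta, obj, f, x0, y0)) / (2.0 * step)
        # Least-squares correction minimizing ||design @ dX - misclosure||.
        correction, *_ = np.linalg.lstsq(design, misclosure, rcond=None)
        pose += correction
        if np.linalg.norm(correction) < tol:
            break
    return pose


# Synthetic check: recover a known pose from six noise-free target points.
targets = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.5], [4.0, 4.0, 0.0],
                    [0.0, 4.0, 0.3], [2.0, 2.0, 1.0], [1.0, 3.0, 0.2]])
true_pose = np.array([1.0, 2.0, 5.0, 0.05, -0.03, 0.1])
observed = project(true_pose, targets, f=1000.0).reshape(-1, 2)
estimated_pose = space_resection(observed, targets, 1000.0,
                                 initial_pose=[0.5, 1.5, 4.0, 0.0, 0.0, 0.0])
```

Each decoded target contributes two observation equations, so at least three targets are needed for the six unknowns; additional targets add redundancy for the adjustment.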
To verify and evaluate the smartphone high-precision indoor positioning dynamic ground truth reference system based on color visual encoded target patterns, a 10 m × 9 m room was selected as an indoor experimental environment, and color encoded target patterns were posted in the room according to certain rules.

Experimental Data and Environment
The three-dimensional texture model shown in Figure 5 was built with Unity3D software (Unity Technologies, San Francisco, CA, USA, Version 4.6) according to the indoor experimental environment. The experimental smartphones were a Samsung Galaxy S8 (Huizhou City, China) with 64 GB of storage, an eight-core Qualcomm Snapdragon 835 processor, and a 2960 × 1440 camera resolution, and a Huawei P10 (Dongguan City, China) with 64 GB of storage, an eight-core Kirin 960 processor, and a 1920 × 1080 camera resolution.

Decoding Color Encoded Target Patterns of Sequence Images and Results of Dataset
Based on certain capturing principles, we used a smartphone to obtain 54 images to establish the color encoded target dataset, and all of the images were decoded. Figure 6 shows one of the decoded images. All of the recognized red base points are framed by light blue circles, and the decoded values are displayed in a light red font. The results show that the recognition rate of the color scatter-encoded targets in the 54 images was 100%, indicating that the color encoded targets designed in this paper have a high recognition rate with our recognition algorithm. The color encoded target dataset consisted of 374 encoded values. Figure 7 shows the three-dimensional display of the color scatter-encoded targets and the sparse three-dimensional point cloud, where each numbered flag indicates the corresponding color scatter-encoded target. The color scatter-encoded targets are clearly consistent with their corresponding point clouds, indicating very high accuracy of color encoded target extraction. In our experiment, the base points extracted in Photoscan were imported into the corresponding images; in Figure 8, the flags indicate the imported base points. The results show that 100% of the color encoded target base points were extracted correctly. To verify the fault tolerance of the proposed method, we deliberately added three color scatter-encoded target patterns whose base points had the wrong structure. The three wrong patterns were identified and rejected during decoding, as shown in Figure 9. These results demonstrate that the color encoded target patterns designed in this paper are structurally stable, highly fault tolerant, and easily identified and decoded.

Results of Smartphone Positioning
Two smartphones were used to calculate the pose of the single image in the condition of different illumination and angles. A Leica TS60 (Leica, Basel, Switzerland) measurement robot was used to measure the smartphone pose, and its measuring result was used as the ground truth of smartphone positioning. It was difficult to measure the camera on the smartphone because the surface of the camera was a glass material. Therefore, a ring crosshair was affixed to the camera for aiming and automatic tracking measurement of the measurement robot, as shown in Figure 10. Figure 10a shows the Leica TS60 measurement robot, and Figure 10b shows the ring crosshair affixed on the smartphone camera. In Figure 10c the blue box represents the experimental room, and the four red dots are the locations of the four control points in the local indoor coordinate system. Samsung Galaxy S8 smartphones and Huawei smartphones were used to capture images at 20 points to implement the smartphone monolithic positioning experiment, and two images were captured at different orientations at each point. Figure 11 shows the distribution of 20 positioning points. The positioning results calculated by the method based on color visual encoded target Samsung Galaxy S8 smartphones and Huawei smartphones were used to capture images at 20 points to implement the smartphone monolithic positioning experiment, and two images were captured at different orientations at each point. Figure 11 shows the distribution of 20 positioning points. The positioning results calculated by the method based on color visual encoded target patterns were compared with the ground truth measured by the measuring robot. And the Root Mean Square Error (RMSE) values of the X direction, Y direction, and overall coordinates were calculated. We used Samsung Galaxy S8 and Huawei P10 smartphone to capture images toward an indoor environment wall at the same position, and the positioning results were calculated in real time. 
Table 1 shows the error among two measurements results and the corresponding ground truth of Samsung Galaxy S8 smartphone. Figure 12 shows images obtained by the Samsung Galaxy S8 smartphone at 20 points at two capturing orientation. Figure 13 shows the comparison of the two measurements results of Samsung Galaxy S8 smartphone and corresponding ground truth at each point. Table 2 shows the error of the two measurements results and the corresponding ground truth of HUAWEI P10 smartphone. Table 3 shows the numbers of points with different accuracy based on the error values of the positioning results of two smartphones. Figure 14 shows images obtained by the HUAWEI P10 at 20 points at two capturing orientations. Figure 15 shows the comparison of the two measurements results of HUAWEI P10 smartphone and corresponding ground truth at each point. Table 4 shows the RMSE values of the Samsung Galaxy S8 smartphone and HUAWEI P10 smartphone in the X direction, Y direction, and overall coordinates. S8 smartphone at 20 points at two capturing orientation. Figure 13 shows the comparison of the two measurements results of Samsung Galaxy S8 smartphone and corresponding ground truth at each point. Table 2 shows the error of the two measurements results and the corresponding ground truth of HUAWEI P10 smartphone. Table 3 shows the numbers of points with different accuracy based on the error values of the positioning results of two smartphones. Figure 14 shows images obtained by the HUAWEI P10 at 20 points at two capturing orientations. Figure 15 shows the comparison of the two measurements results of HUAWEI P10 smartphone and corresponding ground truth at each point. Table 4 shows the RMSE values of the Samsung Galaxy S8 smartphone and HUAWEI P10 smartphone in the X direction, Y direction, and overall coordinates.    S8 smartphone at 20 points at two capturing orientation. 
Figure 13 shows the comparison of the two measurements results of Samsung Galaxy S8 smartphone and corresponding ground truth at each point. Table 2 shows the error of the two measurements results and the corresponding ground truth of HUAWEI P10 smartphone. Table 3 shows the numbers of points with different accuracy based on the error values of the positioning results of two smartphones. Figure 14 shows images obtained by the HUAWEI P10 at 20 points at two capturing orientations. Figure 15 shows the comparison of the two measurements results of HUAWEI P10 smartphone and corresponding ground truth at each point. Table 4 shows the RMSE values of the Samsung Galaxy S8 smartphone and HUAWEI P10 smartphone in the X direction, Y direction, and overall coordinates.                  From the perspective of overall coordinate accuracy, the positioning accuracy of the proposed method is at the centimeter level, which is much better than that of other high-availability indoor positioning technologies. Combined with Tables 1-3, the positioning error of 5% of images of the two measurement results of the Samsung Galaxy S8 smartphone in the X direction was greater than 10 cm, the positioning error of 87.5% of images was between 1 cm and 10 cm, and the positioning error of 7.5% of images was less than 1 cm. The positioning error of 5% of images of the two measurement results of the Samsung Galaxy S8 in the Y direction was greater than 10 cm, the positioning error of 77.5% of images was between 1 cm and 10 cm, and the positioning error of 17.5% of images was less than 1 cm. The positioning error of 5% of images of the two measurement results of the Huawei P10 smartphone in the X direction was greater than 10 cm, the positioning error of 92.5% of images is between 1 cm and 10 cm, and the positioning error of 2.5% of images is less than 1 cm. 
In the Y direction, the positioning error of the Huawei P10 smartphone was greater than 10 cm for 5% of images, between 1 cm and 10 cm for 87.5% of images, and less than 1 cm for 7.5% of images. This shows that the positioning accuracy of the color encoded target patterns designed in this paper is high, and that the overall positioning accuracy is at the centimeter level. According to Table 4, the RMSE values of the two measurement results of the Samsung Galaxy S8 smartphone were roughly the same in the X and Y directions, and the RMSE of the overall coordinates was around 0.08 m for both measurements. The RMSE values of the two measurement results of the Huawei P10 smartphone were also roughly the same in the X and Y directions, and the RMSE of the overall coordinates was around 0.09 m for both measurements. This demonstrates that the visual positioning accuracy of a smartphone is stable across different orientations at the same position, and that the proposed method is robust when solving the pose from smartphone positioning images taken at different orientations. In addition, the RMSE of the Samsung Galaxy S8 smartphone was slightly lower than that of the Huawei P10 smartphone. However, the RMSE values of the positioning results of both smartphones were less than 0.1 m under the different conditions, and the difference between them was very small. This illustrates that the proposed method is applicable to both smartphones and that their positioning accuracy was consistent. In Figure 13, the errors at points 5, 7, 9, and 16 were larger than those at other points. In Figure 15, the errors at points 2, 9, 15, and 17 exceeded those at other points.
The corresponding smartphone positioning images in Figures 12 and 14 show that these points were captured from a greater distance, which indicates that the capturing distance of a positioning image has a certain influence on the positioning result. This is consistent with the fundamentals of image-based positioning. However, within a certain distance range, this adverse effect is weakened by the high recognition rate of the color encoded patterns in this paper. In summary, the proposed method for smartphone indoor visual positioning based on color encoded target patterns has high positioning accuracy and strong robustness, and it is applicable to different smartphones.
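The error statistics discussed above (per-axis and overall RMSE, and the share of images in each accuracy class) can be reproduced with a few lines of code. The following is a minimal sketch rather than the evaluation code used in this paper. It assumes that positions are given as planar (x, y) coordinates in meters, that the "overall" error is the Euclidean distance in the X-Y plane, and that the accuracy classes use the 1 cm and 10 cm thresholds quoted in the text. The function names and the sample coordinates are hypothetical.

```python
import math


def rmse(values):
    """Root-mean-square of a sequence of signed errors (meters)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def evaluate(measured, truth):
    """Compare measured (x, y) positions with ground truth, both in meters."""
    dx = [m[0] - t[0] for m, t in zip(measured, truth)]
    dy = [m[1] - t[1] for m, t in zip(measured, truth)]
    # Assumption: the overall error of a point is its planar Euclidean error.
    planar = [math.hypot(ex, ey) for ex, ey in zip(dx, dy)]
    return {"rmse_x": rmse(dx), "rmse_y": rmse(dy), "rmse_xy": rmse(planar)}


def accuracy_bins(errors, low=0.01, high=0.10):
    """Percentage of absolute errors below `low`, between the two
    thresholds (inclusive), and above `high`; thresholds in meters."""
    n = len(errors)
    below = sum(1 for e in errors if abs(e) < low)
    above = sum(1 for e in errors if abs(e) > high)
    middle = n - below - above
    return {
        "<1cm": 100.0 * below / n,
        "1-10cm": 100.0 * middle / n,
        ">10cm": 100.0 * above / n,
    }


if __name__ == "__main__":
    # Hypothetical sample data, not values from Tables 1-4.
    truth = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.5)]
    measured = [(0.03, 0.04), (1.0, 2.0), (3.12, 1.495)]
    print(evaluate(measured, truth))
    print(accuracy_bins([m[0] - t[0] for m, t in zip(measured, truth)]))
```

Under the Euclidean assumption, the squared overall RMSE equals the sum of the squared X and Y RMSE values, which gives a quick consistency check for a table such as Table 4.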

Conclusions
Current indoor positioning ground truth reference systems have difficulty measuring smartphone poses directly and dynamically, and they are expensive to deploy. To address these problems, this paper proposes color visual scatter-encoded patterns with a high recognition rate, large encoding capacity, and good robustness as the basis of a smartphone indoor positioning ground truth reference system, so that the accuracy of smartphone indoor positioning tests can be obtained frequently, freely, and simply at low cost in daily experiments. While other positioning methods are used in a smartphone positioning experiment, the proposed ground truth reference system can simultaneously determine the instantaneous pose of the smartphone dynamically and in real time. Compared with existing artificial encoded targets, the structure of the color encoded targets designed in this paper is the key to ensuring a high-precision positioning result from a single smartphone image. The structure of the base points ensures that the color encoded target patterns remain stable under rotation, translation, and scaling during affine transformation. The color encoded target patterns combine geometric structure and color information to increase the encoding capacity and meet the positioning experimental needs of most indoor scenes. Experiments show that the color encoded target patterns effectively simplify the decoding of encoded targets, and the recognition rate of the proposed encoded targets was 100% in our experiments. Furthermore, the proposed algorithm is robust: experiments show that its positioning results are applicable to different smartphone cameras and capturing angles, and its requirements on lighting conditions are relatively loose.
In our experiment, the positioning accuracy of the system reached the centimeter level, which is far better than that of current high-availability, low-cost positioning sources such as Bluetooth and Wi-Fi. The reference system is low-cost and has good real-time dynamic performance. Therefore, we recommend it as a ground truth reference system for evaluating other smartphone indoor positioning technologies.