A Multi-Mode PDR Perception and Positioning System Assisted by Map Matching and Particle Filtering

Abstract: Currently, pedestrian dead reckoning (PDR) is widely used in indoor positioning. Since there are restrictions on a device’s pose when a smartphone is used to perform the PDR algorithm, this study proposes a novel heading estimation solution by calculating the integral of acceleration along the direction of the user’s movement. First, a lightweight algorithm, that is, a finite state machine (FSM)-decision tree (DT), is used to monitor and recognize the device mode, and the characteristics of the gyroscope at the corners are used to improve the heading estimation performance during the linear phase. Moreover, to solve the problem of accumulated heading angle deviation in positioning, a map-aided particle filter (PF) and behavior perception techniques are introduced to constrain the heading and to correct trajectories that pass through walls after filtering. The results indicate that the recognition accuracy of the phone pose can reach 93.25%. The improved heading estimation method achieves higher stability and accuracy than the traditional step-wise method. The localization error can be reduced to approximately 2.2 m when the smartphone is held at certain orientations.


Introduction
In recent years, with the increased use of intelligent mobile devices, i.e., smartphones, the customer demand for indoor location services is increasingly strong. As the current satellite positioning technology signals are degraded or denied for indoor environments due to signal attenuation and the multipath effect [1], it is difficult to provide reliable location services to customers in an indoor environment. Therefore, it is extremely challenging to seek a reliable and accurate indoor navigation scheme. Many researchers have made great progress in indoor positioning technology research, such as inertial sensors [2][3][4][5][6][7], Bluetooth [8,9], magnetism [10,11], radio-frequency identification [12,13], ultra-wide band [14,15], wireless local area network [16][17][18][19], and computer vision [20,21]. However, most indoor location technologies rely on a specific infrastructure and are expensive to deploy and maintain. Pedestrian dead reckoning (PDR), especially on smartphones, plays an increasingly important role in indoor locations due to its continuous positioning, fast data updating, and its ability to work without any additional hardware. PDR utilizes the gait information obtained from the inertial sensor to estimate the heading and step length of the pedestrian [2] and periodically updates the pedestrian's position according to the previous state. Numerous methods have been reported that accurately detect step count and estimate step length [3,6], but great challenges still remain in the estimation of headings.
At present, there are two main problems with pedestrian heading estimation. First, inertial sensors mounted on mobile devices are not only affected by the drift caused by their own characteristics but they are also susceptible to environmental disturbances including magnetic fields and irregular human motion. Afzal et al. [22] divided the interference of an environmental magnetic field on the triaxial vector of the magnetometer into four categories and calculated the error of each category to correct the heading estimation. Poulose et al. [23] proposed a calibration algorithm for hard and soft effects, and the scale factors and offset values were utilized for magnetometer calibration to enhance the performance of the heading estimation. Zheng et al. [24] improved the accuracy and robustness of the positioning system by utilizing the zero-velocity update algorithm to reset the error accumulation of the accelerometer in the walking stage. Lin et al. [25] utilized a received signal strength indicator to correct the orientation error of PDR, but the restriction is that it only applies to pedestrians walking in a straight line. In [26,27], some fusion techniques were utilized to improve the performance of low-cost sensors' heading estimation, such as the linear Kalman filter, the extended Kalman filter, the unscented Kalman filter, complementary filters, and the particle filter (PF). However, most of the existing approaches are limited and are only able to perform tracking when the smartphone is carried in a defined or constrained way during the entire walking period, which is not always the case in real life.
Some researchers have focused on heading estimation for users in different movement states and phone poses. Wang et al. [28] divided the usage scenarios of the device into various modes (including SMS, calling, swing, and pocket) in order to set different deviations of a pedestrian heading for each pose. However, when the smartphone is in dynamic motion, the prior deviation will be difficult to adapt. Liu et al. [29] utilized the horizontal component of the angular velocity calculated by transforming the inertial measurements from the frame of reference of the device to the frame of reference of the earth to estimate pedestrian heading based on the least squares principle; however, there was a problem with this estimate, i.e., the solution of the heading estimation equation is not unique. Pei et al. [1] proposed a method to obtain a robust heading estimation using a two-step horizontal acceleration integral, which broke through the limitation of phone pose in the traditional PDR application.
Other researchers have applied principal component analysis (PCA) to heading estimation. Wang et al. [30] proposed a PCA-based method with global accelerations to infer a pedestrian's headings, and an ambiguity elimination method was developed to calibrate the obtained headings. This method can obtain a high accuracy heading estimation in a relatively short time (within a few minutes), but it is still difficult to solve the heading problem over longer periods of time.
The error accumulation caused by low-cost sensors makes the PDR method alone unable to achieve acceptable accuracy [31]. In addition to the research on heading estimation, some researchers focused on indoor trajectory tracking and correction techniques [5,18,19,31-33]. Guo et al. [32,34] constructed a semantic-rich indoor link-node model and utilized the inferred semantic information to match with this model to derive the correct user trajectory. Zhou et al. [31] utilized a semantic augmented route network graph with an adaptive edge length to provide semantic constraint for the trajectory calibration using a particle filter algorithm, which obtained an enhanced accuracy of 1.23 m, with the indoor semantic information attached to each pedestrian's motion. Wang et al. [5] presented a correlation matching algorithm based on map projection and the zone division of a floor map to constrain the accumulation of errors associated with the PDR positioning, which eliminated the accumulation error of PDR systems to a certain extent and improved the quality and accuracy of the positioning results.
Although there have been many investigations into improving the performance of smartphones in heading estimation and positioning, difficulties still exist with PDR in multi-mode multi-pose cases. This study focuses on motion state recognition and the indoor localization of pedestrians. Our method enhances the accuracy and robustness of the PDR system by solving the issues of the smartphone being held in different poses and the accumulation of positioning deviation. The main contributions of our work are as follows:
1. According to pedestrians' daily habits of using a phone, we defined five typical modes for carrying a smartphone: holding, videoing, calling, swing, and pocket. In prior works, sensor data were continuously extracted to recognize the mode of the device, which not only consumes a lot of memory but also greatly reduces the battery life. By matching a pre-defined threshold, the finite state machine (FSM) algorithm can respond to a mode switch in a timely manner without extracting data features. Therefore, in this paper, the FSM and decision tree (DT) algorithms were combined to perform real-time monitoring and recognition of the phone mode. This method uses a DT algorithm to identify the user's current mode once the FSM algorithm detects a change of the device pose. Experimental results show that the lightweight classification approach proposed in this paper can accurately recognize the defined modes.
2. We improved the step-wise heading estimation method by introducing gyroscope information to calculate the global heading of the pedestrian in the linear stage. Moreover, we introduced a particle filter-based map-matching algorithm to improve the performance of heading estimation and positioning. We conducted experiments to evaluate the performance of the proposed method in two scenes: an indoor positioning experimental site and an underground parking garage. The results show that the proposed algorithm obtains robust performance for heading estimation and positioning in various usage modes.
The main structure of the paper is as follows. Section 1 is the literature review of several methods and their limitations. Section 2 introduces the proposed methods of mode recognition and heading estimation in detail. The experimental results of motion recognition and indoor localization are shown in Section 3. In Section 4, the conclusion is presented and the limitations of the study are discussed.

Materials and Methods
The architecture of the PDR system, which consists of data pre-processing, step detection, mode recognition, and heading estimation, is shown in Figure 1. The low-cost inertial sensors mounted on a smartphone are not only affected by the drift caused by their own characteristics but are also susceptible to environmental disturbances. Therefore, it is necessary to filter and calibrate the obtained original signal before using it. The features are extracted from inertial sensors, which are the input of the classifier. The finite state machine (FSM) and decision tree (DT) algorithm are combined to perform real-time monitoring and the recognition of the phone mode. This method uses the DT algorithm to recognize the user's current mode once the FSM algorithm detects the change of device attitude. The parameters of step detection and corner detection are adjusted based on the results of the classifier, and the PDR positions are updated by PF. The details of the multi-mode PDR perception and positioning system will be further discussed in the following subsections.
Figure 1. Architecture of the system.

Raw Data Preprocessing
Because low-cost sensors are mounted on the smartphone, the raw sensor signals contain a lot of noise. It is necessary to preprocess the original signals before they are used, so as to weaken the adverse impact of sensor noise on mode recognition and user positioning.

Low-Pass Filtering and Smoothing
When the pedestrian is walking, the acceleration signals tend to fluctuate up and down around the gravity component. Acceleration signals are usually used to detect the occurrence of step events and as the input of heading estimation. Experiments have shown that some high-frequency components influence the results of step detection and heading estimation [35]. Based on the analysis of the data in different device usage patterns, we found that the frequency of most signals is lower than 8 Hz, so a fourth-order Butterworth low-pass filter with a cut-off frequency of 8 Hz was used to eliminate the influence of high-frequency noise. Then, the signals were further smoothed by a moving average algorithm to remove unnecessary burrs. The size of the smoothing window is seven samples. As shown in Figure 2, the filtered acceleration signals contain less noise and reflect the characteristics of pedestrian movement more clearly.
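The two preprocessing stages described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a 50 Hz sampling rate, realizes the fourth-order Butterworth low-pass filter as a cascade of two second-order sections (one standard construction), filters causally in a single pass, and uses hypothetical function names.

```python
import math

def butter4_lowpass_sections(cutoff_hz, fs_hz):
    """Return two biquad sections that together form a 4th-order Butterworth low-pass."""
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    cos_w0, sin_w0 = math.cos(w0), math.sin(w0)
    sections = []
    # Q factors of the two pole pairs of a 4th-order Butterworth prototype.
    for q in (1.0 / (2.0 * math.cos(math.pi / 8)),
              1.0 / (2.0 * math.cos(3 * math.pi / 8))):
        alpha = sin_w0 / (2.0 * q)
        a0 = 1.0 + alpha
        b = [(1 - cos_w0) / 2 / a0, (1 - cos_w0) / a0, (1 - cos_w0) / 2 / a0]
        a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
        sections.append((b, a))
    return sections

def apply_biquad(b, a, signal):
    """Run one second-order section over the signal (direct form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in signal:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x0
        y2, y1 = y1, y0
        out.append(y0)
    return out

def lowpass(signal, cutoff_hz=8.0, fs_hz=50.0):
    """4th-order Butterworth low-pass (8 Hz cut-off by default)."""
    for b, a in butter4_lowpass_sections(cutoff_hz, fs_hz):
        signal = apply_biquad(b, a, signal)
    return signal

def moving_average(signal, window=7):
    """Centered moving average; the window shrinks at the signal edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def preprocess(signal):
    """Low-pass filter, then smooth with a seven-sample window."""
    return moving_average(lowpass(signal), window=7)
```

A causal filter introduces a small delay; whether the original system compensates for it is not stated in the text.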
Figure 2. Data filtering and smoothing from the accelerometer and gyroscope: (a) the raw and filtered acceleration data; (b) the raw and filtered angular velocity data.

Magnetometer Calibration
In general, the smartphone magnetometer is susceptible to internal errors and external disturbances, and the internal errors of a magnetometer sensor caused by the manufacturing process and component quality can be divided into non-orthogonal error, sensitivity error, sensor noise, and zero-offset [36].
The magnetometer is an important source of information for calculating equipment orientation, so magnetometer calibration is very necessary. In this paper, the smartphone is rotated around the device's three axes, and the obtained magnetometer sample data are calibrated by using the method of the least square fitting of ellipsoids [37]. Figure 3 is the comparison of the results before and after the calibration of magnetometer data.

Mode Recognition
In daily use, the signals output by the built-in inertial components of a smartphone often present various characteristics due to different orientations at which the device is held or abnormalities in the pedestrian's behavior. Therefore, the smartphone mode should be monitored and recognized to assist handheld-PDR [30]. According to pedestrians' daily habits of using a phone, we defined five typical modes for carrying a smartphone: holding, videoing, calling, swing, and pocket. The phone poses are described in detail as follows:
Holding: the case in which the smartphone is held in front of the body with the screen upwards, and the phone's heading is aligned with the moving direction of the user.
Videoing: the case in which the smartphone is held in front of the body with the phone screen pointing to his/her body.
Calling: the case in which the pedestrian makes a call and puts the phone near his/her ear. In this case, the phone screen points to the side of his/her body.
Swing: the case in which the pedestrian holds the phone in his/her hand and swings his/her arm around naturally during walking. In this case, according to the habits of using a phone, we assume that the phone screen points to the side of body and the phone approximately points to the direction of pedestrian motion or the ground.
Pocket: the case in which the phone is carried in the front pocket of the pants. In this case, the phone's heading changes with the body movement, and the phone plane is approximately perpendicular to the ground when the pedestrian is in static state.
In this study, a lightweight device mode monitoring and recognition algorithm is proposed, which can be divided into two modules: feature extraction and mode recognition.

Feature Extraction
Currently, smartphones are equipped with numerous sensor components. The data output from these measurement components can largely reflect the various behaviors of the users. However, the discrete data are insufficient to analyze the behavior characteristics of pedestrians, so we need to use a sliding window to slice the data and extract the characteristics of each window. The size of the window was set to 128 samples (about 2.5 s) with 50% overlap. The sampling frequency of the sensors is 50 Hz. Figure 4 shows the variation of the three-axis acceleration mean values extracted via a sliding window under the five modes of calling, holding, videoing, pocket, and swing; the duration of each mode was 40 s. It can be seen that, due to the different spatial orientations of the device coordinate system, the triaxial acceleration of the device presents clear characteristic differences. Therefore, the real-time average acceleration of the user can be used for the classification of device attitude, and its calculation formula is as follows:
$$\bar{a}_{k} = \frac{1}{N} \sum_{i=1}^{N} a_{k}(i), \quad k \in \{x, y, z\}$$
where $N$ is the number of samples in the sliding window and $a_{k}(i)$ is the $i$-th acceleration sample on axis $k$.
In the pocket mode, P1 and P3 represent the cases in which the phone screen was pointed to the user's body and the direction of movement, respectively, during which the phone was placed upside down in the pocket; P2 and P4 represent the cases in which the phone screen was pointed to the user's body and in the direction of movement, respectively, during which the phone approximately points upwards. In the swing mode, S1 and S2 represent the cases in which the phone approximately points to the direction of pedestrian motion and the ground, respectively.
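As an illustration of the sliding-window feature described above, the sketch below computes the per-axis mean acceleration over windows of 128 samples with 50% overlap. The function name and the tuple layout of the samples are assumptions made for this example.

```python
def window_means(samples, window=128, overlap=0.5):
    """Per-axis mean acceleration for each sliding window.

    samples: sequence of (ax, ay, az) tuples sampled at a fixed rate.
    Returns one (mean_x, mean_y, mean_z) tuple per complete window.
    """
    step = int(window * (1.0 - overlap))  # 64 samples for 50% overlap
    features = []
    for start in range(0, len(samples) - window + 1, step):
        chunk = samples[start:start + window]
        features.append(tuple(sum(s[axis] for s in chunk) / window
                              for axis in range(3)))
    return features
```

With 256 samples, the windows start at samples 0, 64, and 128, so three feature vectors are produced.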

Mode Monitoring and Classification
Unfortunately, there is a significant computational load associated with continuous sensor data processing, particularly in previous research [38][39][40][41] on pedestrian mode recognition. Modern smartphone CPUs can perform the necessary processing in real time, but only at the cost of high power consumption and reduced battery life [42]. Gu et al. [41] defined seven types of motion states and compared six commonly used classifiers. In addition, the motion state history and the characteristics of people's motion were utilized to improve the classification accuracies. In [38], multiple classifiers (such as DT, linear discriminant analysis, K-nearest neighbors, Naive Bayes, support vector machine, and least-squares support vector machine) were developed to recognize human step modes and phone poses. The classification accuracy ranged from 80.3% to 97.8%. In [40], six kinds of common movements in indoor navigation were defined, and an artificial neural network was utilized for the classifier.
However, smartphones are generally kept in one's pocket or hand for a long time, and continuously extracting data to recognize the state of the device not only consumes a lot of memory but also greatly reduces the battery life. Classification methods that classify the modes by extracting statistical features of sensor data after every sample are common in the literature. In FSM, a state transition occurs when a new event (a state) is detected because its amplitude matches a pre-defined threshold. Therefore, the FSM algorithm can make a timely response to the mode switch and reduce the significant computational load associated with continuous sensor data processing. DT is a non-parametric classifier with a tree structure, which can directly reflect the characteristics of the signals. If an observation is given, the corresponding logical expression is easily derived from the generated DT model [30]. Therefore, in this study, the FSM method and DT algorithm are combined to perform real-time monitoring and recognition of the phone mode. This method uses a DT algorithm to identify the user's current mode once the FSM algorithm detects a change in the device pose. The proposed FSM has six states, which cover all modes and the transitions between modes, as shown in the mode recognition section of Figure 1. The main modes of the smartphone (holding, videoing, calling, swing, and pocket) are bridged by the TRANS state, and the initial mode is set to holding (reasonably assuming that navigation is always turned on via the holding interaction). When no switch is detected, the device remains in the current main state; otherwise, it is switched to the TRANS state.
In Figure 5, an experimental user walks in a straight line and changes the equipment mode every 10 s; the total duration of the test was 300 s. It was found that when the position of the device is changed rapidly, the angular velocity data of the device tend to present clear characteristics, and when the phone is in different main states, the threshold of the angular velocity of each axis that triggers the TRANS state is also different. In the mode occurring at 30, 70, 150, 190, 230, 270, and 290 s, the direction and position of the mobile phone are relatively stable, so the angular velocity of the device in this mode is relatively small. For the swing (10, 60, 130, 180, 250 s) and pocket (40, 100, 220 s) modes, the relative position of the mobile phone is not fixed, so the curve of angular velocity presents great fluctuation and a distinct periodicity, as the device typically rotates around its z-axis periodically during the swing mode. For the holding mode, when the current mode is switched to videoing mode (120 s, 210 s), the device rotates clockwise approximately 90° around the x-axis, and the y-axis and z-axis are relatively stable. When switching from the holding mode to the calling mode (150 s, 270 s), the device rotates clockwise about 90° around the z-axis and x-axis, and the y-axis is relatively stable. In contrast, when the device switches to swing mode (60 s, 180 s), the x-axis is relatively stable, and when it switches to pocket mode (100 s), all three axes are active and the angular velocity fluctuates greatly. Transition motion is characterized by the orientation change of the phone and is detected by thresholding the angular rotation rate of the phone about the x-axis, y-axis, and z-axis. The condition of TRANS motion is satisfied differently according to the current state of the FSM. The thresholds of the rotation rate monitored in each mode are tabulated in Table 1, in which $\omega^{\max}_{x,y,z}$ was calculated by the formula: $\omega^{\max}_{x,y,z} = \max(\omega_{x,y,z})$.
Therefore, the angular velocity information of the device obtained by the gyroscope can not only be used as the trigger condition of the TRANS state but can also provide rough mode (RM) information for the system. From the experience gained through our experiments (see Section 3.2 for details), we know that the characteristics of angular velocity may be similar in the switching of different modes, such as switching from videoing mode to swing mode (130 s, 250 s) and pocket mode (220 s). Therefore, when the system triggers the TRANS state, the algorithm proposed in this paper automatically extracts the acceleration data of the next sliding window, and then utilizes the DT algorithm to recognize the mode information (DT Mode). During this period, the user's mode is temporarily considered to be RM. The flowchart of the algorithm is shown in Figure 6.
Figure 6. Smartphone mode monitoring and recognition.
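The interplay between the FSM trigger and the DT classifier can be sketched as below. This is only an illustration under stated assumptions: the threshold values are hypothetical placeholders (the actual values belong to Table 1), `toy_tree` is a hand-written stand-in for the trained decision tree, and the rough-mode (RM) estimate used during the transition is omitted.

```python
TRANS = "TRANS"

# Hypothetical per-mode thresholds (rad/s) on the peak angular rate of the
# x, y, and z axes; placeholders only, not the values from Table 1.
THRESHOLDS = {
    "holding": (2.0, 2.0, 2.0),
    "videoing": (2.0, 2.0, 2.0),
    "calling": (2.0, 2.0, 2.0),
    "swing": (4.0, 4.0, 4.0),
    "pocket": (5.0, 5.0, 5.0),
}

def toy_tree(mean_acc):
    """Stand-in for the trained DT: classify from mean (ax, ay, az) in m/s^2."""
    ax, ay, az = mean_acc
    if az > 7.0:
        return "holding"
    if ay > 7.0:
        return "videoing"
    return "calling"

class ModeMonitor:
    """Six-state FSM: five main modes bridged by the TRANS state."""

    def __init__(self, classify_window, initial="holding"):
        self.state = initial
        self.classify_window = classify_window

    def update(self, gyro_peak, mean_acc):
        """Process one sliding window and return the new FSM state."""
        if self.state == TRANS:
            # A switch was detected on the previous window: run the DT on
            # the acceleration features of this window.
            self.state = self.classify_window(mean_acc)
        elif any(abs(p) > t
                 for p, t in zip(gyro_peak, THRESHOLDS[self.state])):
            # Rotation rate exceeded the threshold of the current mode.
            self.state = TRANS
        return self.state
```

The classifier is only invoked after a trigger, which is the source of the computational saving described above; while no threshold is exceeded, each window costs three comparisons.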

Step Detection and Step Length Estimation
Figure 7 shows the acceleration of the z-axis in holding mode. We can see that the acceleration of the human body has the characteristics of a sine wave when the user is walking, so it is possible to detect the steps of pedestrians by detecting the crests or valleys. However, a change in the phone's mode will greatly influence the measurements of the acceleration on the three axes. The acceleration in the vertical direction can generally reflect the step characteristics of pedestrians more clearly. Thus, the vertical orientation axis of acceleration was selected as the basis for step detection. In this study, a multi-condition constrained crest-valley detection method was utilized to detect the steps. We improved the accuracy of step counting in different modes by putting constraints on step features, namely similarity [43], time thresholds, and peak thresholds. To detect the steps accurately, the parameters of the algorithm were adjusted for different phone poses [43,44].
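A simplified sketch of crest detection with a peak threshold and a time threshold is given below. The threshold values and the function name are hypothetical, and the similarity constraint mentioned above is left out for brevity.

```python
import math

def detect_steps(vertical_acc, fs_hz=50.0, min_peak=10.5, min_interval_s=0.3):
    """Return the sample indices of detected steps (acceleration crests).

    A sample counts as a step if it is a local maximum, exceeds the peak
    threshold, and is far enough in time from the previously accepted step.
    """
    steps = []
    for i in range(1, len(vertical_acc) - 1):
        is_crest = vertical_acc[i - 1] < vertical_acc[i] >= vertical_acc[i + 1]
        if not is_crest or vertical_acc[i] < min_peak:
            continue
        if steps and (i - steps[-1]) / fs_hz < min_interval_s:
            continue  # too close to the previous step
        steps.append(i)
    return steps

# Synthetic walk: 2 steps per second for 3 s, sampled at 50 Hz.
walk = [9.8 + 2.0 * math.sin(2.0 * math.pi * 2.0 * i / 50.0) for i in range(150)]
step_indices = detect_steps(walk)
```

In a multi-mode system, `min_peak` and `min_interval_s` would be chosen per phone pose, in line with the parameter adjustment described above.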
The stride length varies from person to person, and is affected by age, gender, height, walking speed, and other factors [45]. Many estimation models have been proposed and most models were generated by using accelerometer data, including the linear model, the nonlinear model, and the artificial neural network model [30]. In this study, the Weinberg [46] algorithm was utilized to estimate the pedestrian's stride length, and its calculation formula is as follows:
$$L = k \cdot \sqrt[4]{a_{\max} - a_{\min}}$$
where $L$ is the stride length; $a_{\max}$ and $a_{\min}$ are the maximum and minimum values of one-step acceleration obtained using the step detection algorithm, respectively; the constant $k$ is the personalized parameter fitting each regression line.
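The Weinberg model is compact enough to state directly in code. In the sketch below, the default value of `k` is a hypothetical placeholder; in practice it is fitted per user.

```python
def weinberg_step_length(step_acc, k=0.5):
    """Weinberg stride length: k times the fourth root of (a_max - a_min).

    step_acc: acceleration samples (m/s^2) belonging to one detected step.
    k: personalized constant (the default here is only a placeholder).
    """
    return k * (max(step_acc) - min(step_acc)) ** 0.25
```

For example, a step whose acceleration ranges from 1 to 17 m/s^2 has a range of 16, whose fourth root is 2, giving a length of 1 m for k = 0.5.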

Heading Estimation
When using a smartphone for PDR, the phone heading may constantly change while the pedestrian heading stays fixed. So, the method using only the equipment heading to replace pedestrian direction or introducing heading compensation [28] is difficult to adapt to the real situation.
The accelerometer built into the mobile phone can obtain the acceleration data under the body coordinate system of the device. The double integral method can be used to conveniently calculate the user's displacement from the measurement of the linear accelerometer, and the heading can also be calculated by the displacement. However, the double integration method may exacerbate the accumulation of positioning errors, which will reach the meter level in just a few seconds [35]. Pei et al. [1] proposed a method to obtain the velocity vector under the user reference coordinate system by using a two-step horizontal acceleration integral to calculate the orientation, which broke through the attitude limitation in the traditional PDR application and obtained a heading estimation that had robust performance. In this study, an improved heading calculation method is proposed to calculate the global heading of a pedestrian in the linear stage by introducing gyroscope information, and then a robust fusion orientation can be obtained with the assistance of a particle filter algorithm based on map matching.

Global Heading
The attitude and heading reference system (AHRS) is an attitude reference system that has strong robustness and high accuracy, which is due to fusing the data of the accelerometer, gyroscope, and magnetometer. In this study, the Madgwick AHRS [47] algorithm is adopted to obtain the attitude of the device held by the user in real time during walking, and then the transformation matrix from the body coordinate system to the reference coordinate system can be calculated from the attitude of the device. Thus, we can obtain the horizontal acceleration of the user in the reference coordinate system as follows:
$$\begin{bmatrix} a^{r}_{x} \\ a^{r}_{y} \\ a^{r}_{z} \end{bmatrix} = C^{r}_{b} \begin{bmatrix} a^{b}_{x} \\ a^{b}_{y} \\ a^{b}_{z} \end{bmatrix}$$
where $C^{r}_{b}$ is the transformation matrix from the body coordinate system to the reference coordinate system, and $(a^{b}_{x}, a^{b}_{y}, a^{b}_{z})$ and $(a^{r}_{x}, a^{r}_{y}, a^{r}_{z})$ represent the acceleration data of the user in the body coordinate system and the reference coordinate system, respectively. The heading can be calculated by
$$\theta = \arctan\left(\frac{a^{r}_{x}}{a^{r}_{y}}\right) \quad (4)$$
In fact, due to the poor quality of the mobile phone sensor, limited computing ability, and the shaking of the user's body during walking, it is difficult to obtain an accurate heading from Equation (4). Therefore, the step information obtained by the step detection algorithm can be used to obtain the heading information within one step:
$$\theta^{j}_{Step} = \arctan\left(\frac{V^{Step_j}_{r,x}}{V^{Step_j}_{r,y}}\right) \quad (5)$$
$$V^{Step_j}_{r} = \int_{Step_{j-1}}^{Step_{j}} a^{r}(t)\, dt \quad (6)$$
where $\theta^{j}_{Step}$ is the step-wise heading of the pedestrian at the $j$-th step; $V^{Step_j}_{r}$ is the velocity vector obtained by the horizontal acceleration integral, and its calculation formula is shown in Equation (6); $Step_{j}$ and $Step_{j-1}$ represent the times of the $j$-th and $(j-1)$-th steps obtained by the step detection algorithm, respectively.
In indoor environments, due to the restriction of buildings, the randomness of the pedestrian track is greatly reduced. The angular velocity information of the specific axis of the device in different modes can be utilized to detect the movement characteristics of users, such as turning or going straight. In this study, we select the vertical orientation axis of the gyroscope as the basis for corner detection.
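The step-wise heading described above (integrating the horizontal acceleration over one step and taking the direction of the resulting velocity vector) can be sketched as follows. The axis convention is an assumption made for this example: x points east, y points north, and the heading is measured clockwise from north. The acceleration is assumed to be already rotated into the reference frame.

```python
import math

def step_velocity(acc_xy, fs_hz=50.0):
    """Integrate horizontal acceleration over one step (rectangle rule).

    acc_xy: (a_x, a_y) samples in the reference frame between two steps.
    """
    dt = 1.0 / fs_hz
    vx = sum(a[0] for a in acc_xy) * dt
    vy = sum(a[1] for a in acc_xy) * dt
    return vx, vy

def heading_deg(vx, vy):
    """Heading of the velocity vector in degrees, in the range [0, 360)."""
    return math.degrees(math.atan2(vx, vy)) % 360.0
```

Using `atan2` instead of a plain arctangent of the ratio keeps the quadrant information and avoids division by zero when the north component vanishes.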
Figure 8 shows the change in the user's angular velocity in the direction of travel in the holding mode, and there are four instances representing the user's four turns during walking. Then, the trajectory of the pedestrian can be divided into several linear stages (it is assumed that the user walks in an approximately straight line). The quartile anomaly detection typically has a good robustness. Thus, the quartile outlier detection algorithm can be utilized to obtain the turn information of pedestrians. Then, the user's real-time heading can be obtained by an inverse calculation of the velocity vector obtained by the horizontal acceleration integral in the linear stage. In this way, the heading of pedestrians can be constantly corrected as the pedestrian is walking.
$$\theta^{j}_{global} = \arctan\left(\frac{\sum_{l=k}^{j} V^{Step_l}_{r,x}}{\sum_{l=k}^{j} V^{Step_l}_{r,y}}\right)$$
where $\theta^{j}_{global}$ is the global heading of the $j$-th step, and $Step_{k}$ is the last time that the user turned.
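A sketch of quartile-based turn detection and of the global heading accumulated over a linear stage is shown below. The 1.5 x IQR fence is the conventional choice for quartile outlier detection and is an assumption here, as are the function names and the axis convention (x east, y north, heading clockwise from north).

```python
import math
import statistics

def detect_turn_samples(gyro_vertical, k=1.5):
    """Indices whose vertical angular rate lies outside the quartile fences."""
    q1, _, q3 = statistics.quantiles(gyro_vertical, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, w in enumerate(gyro_vertical) if w < low or w > high]

def global_heading_deg(step_velocities, last_turn_step):
    """Heading from the velocity vectors summed since the last turn.

    step_velocities: list of (v_x, v_y) per step in the reference frame.
    last_turn_step: index of the first step of the current linear stage.
    """
    stage = step_velocities[last_turn_step:]
    vx = sum(v[0] for v in stage)
    vy = sum(v[1] for v in stage)
    return math.degrees(math.atan2(vx, vy)) % 360.0

# Synthetic gyroscope trace: small walking sway with one turn at samples 50-54.
gyro = [0.02 * ((i % 5) - 2) for i in range(100)]
gyro[50:55] = [1.5] * 5
turn_samples = detect_turn_samples(gyro)
```

Summing the per-step velocity vectors averages out the lateral sway of individual steps, which is why the global heading is steadier than the step-wise one on straight segments.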

Heading Estimation and Localization Assisted by PF and Map Matching
By looking for a group of samples in the state space (called particles), the PF approximates the probability density function of the state given the observations $z_{1:k}$ to obtain the minimum variance estimation of the state:
$$p(x_k \mid z_{1:k}) \approx \sum_{i=1}^{N} w^{i}_{k}\, \delta\left(x_k - x^{i}_{k}\right)$$
where $\delta(\cdot)$ is the Dirac function; $N$ is the number of particles; $w^{i}_{k}$ represents the weight of the $i$-th particle; $x_k$ represents the measurements (the coordinate value calculated by heading and step length using gait detection algorithms); and $x^{i}_{k}$ represents the prior estimate of the $i$-th particle at time $k$, which can be obtained from the conversion given in [48].
The advantage of the PF is that it is suitable for nonlinear problems with non-Gaussian noise, but it has some problems, such as improper particle transfer (that is, a particle can transfer to unreachable areas or through a wall to another indoor area). Building maps usually contain a large amount of useful information, which can not only restrain the transfer of particles but can also effectively reduce the number of invalid particles and provide more reliable location information. As shown in Figure 9a, when a particle is transferred to the unreachable region, its weight is set to 0. Otherwise, the right side of the bell curve of the Gaussian distribution [49] is utilized to calculate the weight of the particle to obtain a more realistic trajectory.
After the weights are updated, the particles need to be re-screened. The formula $\tilde{w}^{i}_{k} = w^{i}_{k} / \sum_{j=1}^{N} w^{j}_{k}$ was utilized to normalize the weights at the $k$-th coordinate. Particle screening adopted the random resampling method to retain high-weight particles as much as possible and remove low-weight particles. Last, in order to minimize the trajectory fluctuation, the current position of the system is updated according to $\Delta E_{global}$ and $\Delta N_{global}$, the displacements of the pedestrian in the east and north directions calculated by the global heading. Therefore, we can determine the main direction of pedestrian movement by comparing the displacements.
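One update of a map-constrained particle filter along these lines can be sketched as follows. The sketch rests on several assumptions that are not taken from the paper: the walkable area is a single axis-aligned rectangle standing in for the building map, the weight is a Gaussian function of the distance between a particle and the PDR position, the noise parameters are arbitrary, and resampling uses `random.choices`.

```python
import math
import random

def inside(point, region):
    """True if the point lies in the walkable rectangle (xmin, ymin, xmax, ymax)."""
    x, y = point
    xmin, ymin, xmax, ymax = region
    return xmin <= x <= xmax and ymin <= y <= ymax

def mean_position(particles):
    n = len(particles)
    return (sum(p[0] for p in particles) / n, sum(p[1] for p in particles) / n)

def pf_step(particles, step_len, heading_deg, measured, region, rng,
            sigma=1.0, heading_sd=5.0, length_sd=0.1):
    """Propagate, weight, normalize, and resample the particles for one step."""
    moved, weights = [], []
    for x, y in particles:
        h = math.radians(rng.gauss(heading_deg, heading_sd))
        length = rng.gauss(step_len, length_sd)
        p = (x + length * math.sin(h), y + length * math.cos(h))
        if inside(p, region):
            # Half-Gaussian weight on the distance to the PDR position.
            d = math.hypot(p[0] - measured[0], p[1] - measured[1])
            w = math.exp(-d * d / (2.0 * sigma * sigma))
        else:
            w = 0.0  # particle left the reachable area: dead particle
        moved.append(p)
        weights.append(w)
    total = sum(weights)
    if total == 0.0:
        return particles, mean_position(particles)  # keep the previous set
    weights = [w / total for w in weights]
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, mean_position(resampled)

# Walk north along a 2 m wide corridor for ten 0.7 m steps.
rng = random.Random(0)
corridor = (0.0, 0.0, 2.0, 20.0)
particles = [(1.0, 1.0)] * 200
for k in range(1, 11):
    particles, estimate = pf_step(particles, 0.7, 0.0,
                                  (1.0, 1.0 + 0.7 * k), corridor, rng)
```

A real floor plan would replace `inside` with a test against wall segments or walkable polygons, so that a particle crossing a wall is rejected even if its end point lies in another room.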
For the problem of the track going through the wall at a corner, as shown in Figure 9b, the remarkable characteristics of the gyroscope during cornering were used to perceive the nearest landmark information (for example, the point "P" shown in Figure 9b) for correcting the track. Based on the cut edge of the track, we can utilize the correction value of each step to correct the track and avoid the track passing through the impassable area [50].
Figure 9. Particle filtering algorithm assisted by map matching: (a) the distribution of particles (the blue dot represents a dead particle, and the red dot represents a living particle); (b) the diagram of trajectory correction.
At this point, the heading angle of the step, $\theta_{particle}$, can be calculated according to the probability position determined by filtering. The heading determined by this method is approximate to the real pedestrian direction in most positions, but there are still relatively large fluctuations at the corners or at a few points; thus, a robust fusion heading is obtained by combining it with $\theta_{global}$ and $\theta_{stepwise}$, where $\theta_{global}$ and $\theta_{stepwise}$ represent the headings calculated by the global and step-wise methods, respectively; $\lambda_1$ and $\lambda_2$ are the weights used to calculate the fusion heading in both cases, respectively, which can be adjusted according to the actual situation to improve the heading estimation ability of the model. In this study, $\lambda_1$ is set to 2/3 and $\lambda_2$ is set to 1/3 in holding, videoing, and calling modes. As for the swing and pocket modes, the phone is unstable when the pedestrian is walking, so $\lambda_2$ is set to 2/3; $\varepsilon$ is the threshold of heading deviation, which is set to 15° in this study; $d_1$ represents the difference between $\theta_{particle}$ and $\theta_{global}$, and $d_2$ represents the difference between $\theta_{particle}$ and $\theta_{stepwise}$.

Experimental Conditions
In this section, experiments are presented to verify the performance of the proposed methods of mode recognition and indoor localization. Huawei Mate 20 Pro, Xiaomi Note 3, Samsung S10, and OnePlus 7 Pro smartphones were used as the test platforms for system testing.
In the mode monitoring and classification experiment, seven male and three female volunteers of different heights carried four devices to participate in data collection. The participants moved with different modes, and each mode was recorded. The size of the window for feature extraction was set to 128 samples (2.5 s) with 50% overlap. After screening and processing, 12,340 sets of sample data were retained for the construction of the decision tree model. In order to evaluate the performance of the proposed mode monitoring and recognition algorithm in real scenes, 14 volunteers of different heights and ages carried four devices and were instructed to walk normally in both an indoor and outdoor environment and switch equipment modes freely to complete data collection.
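The windowing used for feature extraction (128 samples with 50% overlap) can be sketched as follows. The function name `sliding_windows` is hypothetical, and the feature computation applied to each window is not shown.

```python
def sliding_windows(samples, size=128, overlap=0.5):
    """Yield consecutive windows of `size` samples with fractional `overlap`.

    With size=128 and overlap=0.5 the window advances by 64 samples.
    A trailing segment shorter than `size` is dropped.
    """
    step = max(1, int(size * (1.0 - overlap)))
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]

# Example: 256 samples produce windows starting at indices 0, 64, and 128.
windows = list(sliding_windows(list(range(256))))
```

With 50% overlap, each sample (apart from those at the ends of a recording) contributes to two windows.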
The localization experiments were conducted at two indoor sites in the School of Environmental Science and Spatial Informatics, China University of Mining and Technology: a third-floor indoor positioning experimental site and an underground parking garage. Figure 10 shows the indoor positioning experimental site on the third floor, with a total length of approximately 300 m and an area of approximately 670 m². The test area is a typical magnetically disturbed environment due to existing GNSS receivers, computers, Bluetooth and WiFi transmitter modules, concrete structures, and so on (as shown in Figure 10b). In the experiment, the pedestrian started from the west end of Area C and headed towards the westernmost end of Area B via Area A and a glass corridor, and finally returned to Area A through an outdoor bridge. During this process, the pedestrian walked along the marked trajectory in each of the five modes. Last, we carried out challenging tests in an underground parking garage to verify the performance of the proposed solution with the smartphone carrying mode casually changed.

Mode Monitoring and Classification Experiment
The C4.5 algorithm is a classical algorithm for generating a decision tree and is an extension and optimization of the ID3 algorithm. It can handle both discrete and continuous attribute types and selects the split attribute through the information gain ratio. Therefore, in this study, the C4.5 algorithm was utilized to generate the decision tree model. During the establishment of the classification model, 10-fold cross-validation was used to test its accuracy: the data were divided into 10 parts, and in turn 9 parts were taken as training data and 1 part as test data. Table 2 shows the confusion matrix of the decision tree classification method. The results show that, in the classification of holding, calling, and swing modes, the proposed algorithm achieved a high accuracy (>97%), while in the videoing and pocket modes, there is a small probability of misjudgment (<2%).
Figure 11 shows the classification results of the scene described in Figure 5 by the FSM + DT classification model in an outdoor environment, in which 1-5 represent the five modes: holding, videoing, calling, swing, and pocket, respectively. It can be seen that at 150 s, 240 s, and 260 s, due to the similarity of the switching actions of different modes, the discrimination of the maximum rotation angular velocity of the three axes was relatively low, which greatly limited the performance of the FSM algorithm. Therefore, it is difficult to effectively recognize the device's mode using only the rough pattern information provided by the FSM.
To evaluate the performance of the proposed mode monitoring and recognition algorithm in real scenes, 326 valid mode-switching samples were collected over about three hours by 14 volunteers of different heights and ages. During this process, the pedestrians walked normally in both indoor and outdoor environments and switched device modes freely.
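The split criterion that C4.5 uses to select an attribute, mentioned at the start of this section, is the information gain divided by the split information. The sketch below computes it for a discrete attribute only; continuous attributes, which C4.5 handles by searching for a threshold, are not covered. The function names and the toy labels are illustrative and do not come from the dataset of this study.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain ratio of a discrete attribute with respect to labels.

    values: attribute value of each sample; labels: class of each sample.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)  # penalizes attributes with many values
    return gain / split_info if split_info > 0 else 0.0

# Toy example: the attribute separates the two classes perfectly.
labels = ["holding", "holding", "pocket", "pocket"]
ratio = gain_ratio(["low", "low", "high", "high"], labels)
```

At each node, the attribute with the highest gain ratio would be chosen as the split attribute.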
Table 3 shows the classification results of mode recognition, including 3.07% missed samples. After analysis, we found that, when switching from holding mode to videoing mode, the device rotates slowly around the x-axis while the y-axis and the z-axis remain relatively stable. During the switching process, the angular velocity of the x-axis in some samples (especially the data collected by female volunteers) was too small to exceed the angular velocity threshold set by the FSM algorithm, which caused some videoing mode switches to go undetected. Although there is a small probability of misjudgment and missed detection, the proposed algorithm achieved an overall mode recognition rate of 93.25%. The holding mode achieved the highest classification accuracy, exceeding 95%, due to the unique characteristics of gravitational acceleration. We also achieved a 90.07% recognition rate in the pocket mode even though some participants wore looser clothing. Tian et al. [51] proposed a monitoring and recognition algorithm that only utilizes the FSM to recognize the holding, swing, and pocket modes. Compared with the FSM, the FSM + DT algorithm not only expands the categories of device modes but also improves the accuracy of the identification of transition states and device modes, with an average increase of 3.03% in the recognition rate and a decrease of 4.42% in the missed detection rate.

Localization Experiment
The localization experimental site was situated at an office building, as shown in Figure 10. Figure 12a,b shows the results of the heading estimation in the holding and pocket modes for the step-wise heading, global heading, and fusion heading based on particle filter and map matching assistance. The results show that the step-wise and global headings have large deviations at the initial stage of positioning (within 20 steps). Over time, the global heading results have the least volatility, which reduces the impact of body shaking and equipment swaying on heading estimation. Figure 12c,d shows the cumulative distribution of the corresponding heading estimation error. The probability of the fusion heading error being smaller than 4° is over 85% in both the holding and pocket modes, which is better than the other two heading estimation algorithms. In holding mode, the 50% estimation errors of the heading for the fusion, step-wise, and global headings are 1.3°, 2.8°, and 3.8°, respectively; the 75% estimation errors are 2.8°, 5.7°, and 8.6°, respectively. In pocket mode, the 50% estimation errors of the heading for the fusion, step-wise, and global headings are 0.9°, 1.9°, and 2.0°, respectively; the 75% estimation errors are 1.7°, 3.7°, and 2.8°, respectively.
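The 50% and 75% error levels and the share of errors below 4° quoted above are read from the cumulative distribution of the absolute heading error. A minimal sketch of such a computation is given below. It assumes a nearest-rank percentile, since the estimator used in this study is not stated, and all function names and sample values are illustrative.

```python
import math

def heading_error(est_deg, true_deg):
    """Absolute heading error in degrees, wrapped so that 359 vs 1 gives 2."""
    return abs((est_deg - true_deg + 180.0) % 360.0 - 180.0)

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]

def fraction_below(values, limit):
    """Share of values strictly smaller than `limit` (one point of the CDF)."""
    return sum(v < limit for v in values) / len(values)

# Illustrative (estimated, true) heading pairs in degrees, not measured data.
pairs = [(359.0, 1.0), (10.0, 9.0), (93.0, 90.0), (180.0, 184.0)]
errors = [heading_error(e, t) for e, t in pairs]
p50, p75 = percentile(errors, 50), percentile(errors, 75)
share_under_4 = fraction_below(errors, 4.0)
```

Wrapping the difference before taking the absolute value matters for routes that pass through north, where a raw subtraction would report errors close to 360°.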
The average error and standard deviation in Table 4 show that the fusion heading gives the most robust and accurate heading estimation results. Compared with the step-wise heading, the stability was increased by an average of 32.78%, and the average accuracy was increased by at least 44.79% and at most 67.76%.
Figure 13 shows the location results of the different methods in multi-mode. In the particle filter algorithm, the number of particles was set to 200. Figure 13a shows that the position trajectory calculated by the step-wise heading deviates obviously from the real trajectory, and the deviation becomes worse over time. The position trajectory calculated by the global heading, which draws on the historical linear stage, is shown in Figure 13b. The rate of trajectory deviation is greatly limited, and error accumulation is significantly reduced. However, for the holding and swing modes, the first 50 steps all show small deviations due to the shaking of the equipment. The location trajectory of the fusion heading solution is shown in Figure 13c. The trajectory is not only confined to the corridor but is also roughly consistent with the centerline of the corridor, because the map boundary information largely limits the randomness of particles in the filtering algorithm. The cumulative error and volatility are basically eliminated, but the trajectory still goes through the wall at the corner. Figure 13d shows the position trajectory assisted by the ground-based landmark perception algorithm, which effectively solves the problem of the trajectory passing through the wall caused by the particle filter algorithm.
Furthermore, we evaluated the mean and maximum positioning errors in different modes, and the results are shown in Table 5. The trajectory precision of the global heading and the fusion heading is slightly worse than that of the step-wise heading in the videoing mode, but in the four modes of holding, calling, swing, and pocket, it is better than that of the step-wise heading. Among them, the maximum error of the pedestrian trajectory calculated by the fusion heading decreased by an average of 47.67%, and the average positioning accuracy was improved by 1.34 m at a minimum and 4.13 m at a maximum. The fusion heading algorithm proposed in this study therefore not only improves the positioning accuracy and robustness in a variety of modes but also limits the accumulation of errors to a certain extent.
Last, we conducted challenging experiments in an underground parking garage with an area of 3963 m² using a SAMSUNG Galaxy S10 (Android 9.0) to verify the performance of the proposed methods in various usage modes. The magnetic field in this area is also disturbed by concrete pillars, vehicles, radio signals, and other factors. A volunteer walked along the marked trajectory while the smartphone carrying mode was casually changed among the five carrying modes, as shown in Figure 14. The results show that the average localization error is less than 2.3 m despite the large environment.
As shown in Figure 15, when the number of particles is less than 150, the positioning error of the PF algorithm decreases significantly with the increase in the number of particles, and the positioning accuracy tends to stabilize when the number of particles is greater than 150. In addition, Figure 16 shows that the map-aided PF converges at about 200 particles, and the positioning accuracy tends to stabilize when the number of particles is greater than 200. Compared with the PF, the map-aided PF improves the positioning accuracy and processing performance of the algorithm to some extent.

Conclusions
This study presents an indoor localization method based on motion mode recognition. First, the FSM and the DT algorithm are applied so that five phone modes can be monitored and recognized accurately. Then, by analyzing the distinctive characteristics of the gyroscope during turning, we improved the step-wise-based methods by developing the global heading. The results show that the stability of the proposed global heading increases by 35.54%, on average, compared with the step-wise heading. In addition, this study proposes a PF scheme that combines map matching and behavior perception to mitigate the problems of heading deviation accumulation and the improper transfer of particles, as well as to optimize the heading estimation. The field tests show that the proposed algorithm achieved robust performance for heading estimation and positioning in various usage modes. For future work, we plan to integrate more characteristic behaviors of pedestrians into the PF, such as taking the elevator, going up and down stairs, or pushing a door, to improve the performance of the indoor localization system. In addition, this study optimizes five typical modes of carrying a smartphone; more complex carrying modes, such as bag mode, as well as more types of smartphones will be further investigated.
Author Contributions: The author X.W. proposed the research idea with the author G.C., carried out most of the experimental work, and drafted the manuscript. The author S.J. performed data analysis and was responsible for field data collection. The corresponding author G.C., who is responsible for the overall work, conducted the experimental part, and was involved in the algorithm design. The author M.Y. was involved in the write-up of the manuscript. All authors have read and agreed to the published version of the manuscript.