Activity Recognition and Semantic Description for Indoor Mobile Localization

As a result of the rapid development of smartphone-based indoor localization technology, location-based services in indoor spaces have become a topic of interest. However, to date, the rich data resulting from indoor localization and navigation applications have not been fully exploited, even though they are significant for trajectory correction and advanced indoor map information extraction. In this paper, an integrated location acquisition method utilizing activity recognition and semantic information extraction is proposed for indoor mobile localization. The location acquisition method combines pedestrian dead reckoning (PDR), human activity recognition (HAR) and landmarks to acquire accurate indoor localization information. Considering the problem of initial position determination, a hidden Markov model (HMM) is utilized to infer the user’s initial position. To provide an improved service for further applications, the landmarks are further assigned semantic descriptions by detecting the user’s activities. The experiments conducted in this study confirm that a high degree of accuracy for a user’s indoor location can be obtained. Furthermore, the semantic information of a user’s trajectories can be extracted, which is extremely useful for further research into indoor location applications.


Introduction
Location-based services (LBSs) have been popular for many years. Although global navigation satellite systems (GNSSs) can provide good localization services outdoors, there is still no dominant indoor positioning technique [1]. Therefore, an alternative technology is required that can provide accurate and robust indoor localization and tracking. Moreover, the spatial structures of indoor spaces are usually more complex than the outdoor environment, and thus, distinctive information is needed to better describe locations for the LBS-based applications.
With the wide availability of smartphones, a large amount of research has been conducted in recent years targeting indoor localization. Most of the existing indoor localization technologies require additional infrastructure, such as ultra-wideband [2], laser scanning systems (LSSs), radiofrequency identification (RFID) [3] and Wi-Fi access points [4]. However, these approaches often require extensive labor and time. To solve this problem, pedestrian dead reckoning (PDR) has recently been proposed as one of the most promising technologies for indoor localization [5]. Differing from the above approaches, PDR uses the built-in smartphone inertial sensors (accelerometer, gyroscope and magnetometer) to estimate the position. However, PDR suffers from error accumulation when the travel time is long. To achieve improved localization results, a number of studies have been conducted under particular circumstances, but the applicability and accuracy are still limited.
In addition to the direct application in indoor localization, the built-in smartphone sensors can also be used to understand the user's movements [6], as well as to identify the indoor environment. Sensing the implied location information about the user moving in the corresponding environment provides a new opportunity for indoor mobile localization. To exploit this underlying information, some studies have been presented based on human activity recognition (HAR) [7][8][9], which uses these sensors to identify user activity and then infers information about the context of the user's location. Therefore, it is worth exploring how to use this information to assist with indoor localization.
Recently, semantic information in the indoor environment has received increased attention. In many cases, semantic information is as valuable as the location. For example, from a human cognition perspective, it is more valuable to know whether a location is a room, a corridor or stairs than to know its position coordinates [10]. Furthermore, it is also more convenient for a user to obtain semantic information (e.g., "turn left", "turn right", "go upstairs", "go downstairs" and "go into a room") than information about a route. However, the extraction and description of the necessary semantic information remains an open challenge.
In this paper, a method that combines PDR, HAR and landmarks is developed to accurately determine indoor localization. The proposed method requires no additional devices or expensive labor, and the user trajectory can be corrected and displayed. In addition, to solve the initial position determination problem, a hidden Markov model (HMM) that considers the characteristics of the indoor environment is used to match the continuous trajectory. Furthermore, to describe the user's indoor activities and trajectories, an indoor semantic landmark model is also constructed by detecting the user's activities. Figure 1 shows an overview of the proposed approach.
The remainder of the paper is organized as follows. The related works are briefly reviewed in Section 2. The primary methods are then introduced in Section 3. Section 4 presents the experimental process, and Section 5 discusses and analyzes the experimental results. Finally, the conclusions and recommendations for future work are presented in Section 6.

Related Works
Most of the existing indoor localization technologies require additional infrastructure or expensive labor and time. How to achieve reliable and accurate localization in indoor environments at a low cost is still a challenging task [11]. Compared to other methods of indoor localization, using the built-in smartphone sensors provides a more convenient and less expensive indoor localization [...] also to improve the user's localization efficiency. The simultaneous localization and semantic acquisition can be considered as a significant contribution of the proposed method.

Location Estimation and Activity Recognition
In this section, PDR is first introduced to estimate the user's location, and then, landmarks are applied to correct the location. Next, with the help of multiple phone sensors, HAR is used to identify the user's activity. Finally, an HMM is proposed in order to estimate the user's initial position.

Landmark-Based PDR
The smartphone-based PDR system uses the inbuilt inertial and orientation sensors to track the user's trajectory [31]. The main processes include step detection, step length estimation, direction estimation [20] and trajectory correction.
(a) Step detection: Peak step detection [32], or the zero-crossing step detection algorithm [33], is the most frequently-used method to detect the user's steps. To improve the robustness of the detection result, as in [34,35], the synthetic acceleration magnitude of a three-axis accelerometer is used. This is calculated as follows:
$$a(t) = \sqrt{a_x(t)^2 + a_y(t)^2 + a_z(t)^2} - g \qquad (1)$$
where $a(t)$ is the synthetic acceleration reading at time $t$, $a_x(t)$, $a_y(t)$ and $a_z(t)$ are the readings of the three accelerometer axes, and the constant component $g$ represents the Earth's gravity. A low-pass filter is applied to smooth the data and to remove the spurious peaks, as shown in Figure 2. The step detection process is conducted according to the following conditions [10,35]:
• $a(t)$ is the local maximum and is larger than a given threshold $\delta_{thr}$.
• The time between two consecutive detected peaks is greater than the minimum step period $t_{min}$.
• According to human walking posture, the start of a step is the zero-crossing point before the peak.
Figure 2b shows the detected peaks marked with red circles, and the blue circles represent the start and end points of the steps.
Figure 2. Step detection. (a) Raw synthetic acceleration data; (b) filtered data and the step detection result.
(b) Step length estimation: The length of a step depends on the physical features of the pedestrian (height, weight, age, health status, etc.) and the current state (walking speed and step frequency) [35]. Although step length varies from step to step, even in the same person, step length can be estimated by its corresponding acceleration. A nonlinear model [34] is used to effectively estimate step length.
$$l_k = \mu \sqrt[4]{a_{max}(k) - a_{min}(k)} \qquad (2)$$
where $a_{max}(k)$ and $a_{min}(k)$ are the maximum and minimum values of the synthetic acceleration during step $k$. The coefficient $\mu$ is the stride length parameter, and it can be corrected by the landmarks.
(c) Direction estimation: Direction estimation is a challenging problem for PDR using a smartphone. The gyroscope and magnetometer in the smartphone are normally used to estimate the pedestrian's walking direction [10]. The gyroscope and magnetometer obtain the steps' direction during walking. An external environment can easily affect the magnetometer, which may lead to short-term heading estimation errors. Magnetic fields do not affect gyroscopes; however, gyroscopes do accumulate drift error over time [25]. In order to resolve each sensor's drawbacks, both sensors are combined to enhance the direction estimation [13,34].
$$\theta_k = \omega_{mag}\,\theta_{mag}(k) + \omega_{gyro}\,\theta_{gyro}(k) \qquad (3)$$
where $\omega_{mag}$ and $\omega_{gyro}$ are the weighting parameters on the magnetometer's estimated direction $\theta_{mag}$ and the gyroscope's estimated angle $\theta_{gyro}$, respectively. The weight value changes according to the magnitude and correlation of the gyroscope and magnetometer.
(d) Trajectory correction: The raw pedestrian trajectory obtained through the above methods may encounter some bias because of the accumulated error of the PDR. To solve this problem, landmarks are used to recalibrate the errors. Depending on the location and angle of the landmarks, the step length and the angle in the raw trajectory are scaled to form a new corrected trajectory. In the process of a user's indoor walking, two situations occur when passing a landmark. One is when a user goes straight through a landmark, as shown in Figure 3a. The other is when a user turns (see Figure 3b). As shown in Figure 3, the blue lines are the raw PDR trajectory, the green lines are the corrected trajectory and the red dots indicate the landmarks.
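Step detection, step length estimation and the resulting position update can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the default values for the peak threshold `thr`, the minimum step period `t_min` and the stride parameter `mu`, and the heading convention (radians, clockwise from north, with x pointing east and y pointing north) are all hypothetical. The step length uses a fourth-root (Weinberg-type) nonlinear model, and the input is assumed to be an acceleration magnitude that has already been low-pass filtered and had gravity removed.

```python
import math

def detect_steps(acc, times, thr=1.0, t_min=0.3):
    """Return indices of step peaks in a filtered acceleration magnitude.

    acc   -- synthetic acceleration a(t) with gravity removed (m/s^2)
    times -- sample timestamps in seconds
    thr, t_min -- hypothetical values for the peak threshold and the
                  minimum step period
    """
    peaks = []
    for i in range(1, len(acc) - 1):
        is_local_max = acc[i] > acc[i - 1] and acc[i] >= acc[i + 1]
        if not is_local_max or acc[i] <= thr:
            continue
        if peaks and times[i] - times[peaks[-1]] <= t_min:
            continue  # too soon after the previous accepted peak
        peaks.append(i)
    return peaks

def step_length(a_max, a_min, mu=0.5):
    """Fourth-root nonlinear step length model; mu is an assumed default."""
    return mu * max(a_max - a_min, 0.0) ** 0.25

def advance(position, length, heading):
    """Move one step; heading in radians, clockwise from north (assumed)."""
    x, y = position
    return (x + length * math.sin(heading), y + length * math.cos(heading))
```

Under these conventions, a walker who starts at the origin and takes two 0.7 m steps due east (`heading = math.pi / 2`) ends at approximately (1.4, 0). Landmark-based correction would then rescale the lengths and angles of the steps between two recognized landmarks.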

Multiple Sensor-Assisted HAR
As in PDR, the synthetic three-axis accelerometer data are used as the base data in HAR. In addition, the smartphone's magnetometer and barometer provide information about direction and height, respectively, which helps to improve the classification accuracy.
(a) Segmentation: Three different windowing techniques have been used to divide the sensor data into smaller data segments: sliding windows, event-defined windows and activity-defined windows [36,37]. Since some specific events, such as the start and the end of a step length, are critical for pedestrian location estimation, the event-defined window approach is applied in our work. To use this approach, each step's start and end points are detected, and then, the samples between them are regarded as a window. If no steps are detected over a period of time, the sliding window approach is used, in which two-second-long time windows with 50% overlap are selected [26,38].
(b) Feature extraction: Two main types of data features are extracted from each time window. Time-domain features include the mean, max, min, standard deviation, variance and signal-magnitude area (SMA). Frequency-domain features include energy, entropy and time between peaks [8]. Two time-domain features, the mean and the standard deviation, are selected because they are computationally inexpensive and sufficient to classify the activities.
(c) Classification: A supervised learning method is adopted to infer user activities from the sensory data [7]. A number of different classification algorithms can be applied in HAR, such as decision tree (DT), k-nearest neighbor (KNN), support vector machine (SVM) and naive Bayes (NB) [8,9,38]. Owing to its simplicity and high accuracy, the KNN algorithm (see Algorithm 1) is selected to classify four activities: standing, going up (or down) stairs, walking and opening a door.

Algorithm 1. KNN algorithm.
Input: Samples that need to be categorized: X_j; the known sample pairs: (X_i, y_i)
Output: Prediction classification: y_j
1: for every sample in the dataset to be predicted do
2:     calculate the distance between (X_i, y_i) and the current sample X_j
3:     sort the distances in increasing order
4:     select the k samples with the smallest distances to X_j
5:     find the majority class of the k samples
6:     return the majority class as the prediction classification y_j
7: end for
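The segmentation, feature extraction and KNN classification steps can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names, the Euclidean distance, the default `k=3` and the layout of the training data are assumptions. Only the two selected features (mean and standard deviation) are computed.

```python
import math
from collections import Counter

def sliding_windows(samples, size, overlap=0.5):
    """Split samples into fixed-size windows with the given fractional overlap."""
    step = max(1, int(size * (1 - overlap)))
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]

def features(window):
    """Mean and (population) standard deviation of one window."""
    mean = sum(window) / len(window)
    variance = sum((v - mean) ** 2 for v in window) / len(window)
    return (mean, math.sqrt(variance))

def knn_predict(train, x, k=3):
    """Classify feature vector x by majority vote among its k nearest samples.

    train -- list of (feature_vector, label) pairs
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

At a hypothetical sampling rate of 50 Hz, a two-second window with 50% overlap corresponds to `sliding_windows(samples, size=100)`, which advances by 50 samples per window.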
In order to improve the classification accuracy of indoor activities, a barometer is used to determine upstairs or downstairs and to locate the user's floor. A magnetometer is used to assist in identifying the door-opening activity. Different ways of opening the door correspond to different magnetometer reactions. However, their similar performance patterns can be extracted by detecting the peak value change of the magnetometer in the sliding window. As shown in Figure 4, when the user opens a door, the magnetometer readings change significantly within a short time and then quickly return to the previous readings. Thus, the door-opening activities can be effectively identified.
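The door-opening cue described above, a sharp change in the magnetometer reading that quickly returns to its previous level, can be checked per window with a few lines of Python. The function name and the thresholds `jump`, `settle` and `edge` are hypothetical and would need tuning on real data; no specific values are given in the text.

```python
def is_door_opening(mag, jump=15.0, settle=3.0, edge=3):
    """Heuristic check for a door-opening signature in one magnetometer window.

    mag    -- magnetometer magnitude samples in one sliding window
    jump   -- assumed minimum excursion from the starting level
    settle -- assumed maximum difference between start and end levels
    edge   -- number of samples averaged at each end of the window
    """
    if len(mag) < 2 * edge + 1:
        return False
    start = sum(mag[:edge]) / edge
    end = sum(mag[-edge:]) / edge
    # Largest deviation from the starting level inside the window.
    peak_dev = max(abs(v - start) for v in mag[edge:-edge])
    # A door event spikes and then returns close to where it started.
    return peak_dev >= jump and abs(end - start) <= settle
```

A sustained shift in the reading (for example, after walking past a steel cabinet) fails the `settle` test and is therefore not reported as a door opening.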

The Hidden Markov Model
When the user's initial location is unknown, HMM is used to match the motion sequence with indoor landmarks. PDR and HAR also provide useful information for matching and location estimation. As a widely-applied statistical model, HMM has a unique advantage in processing natural language, and it can capture the hidden states in a sequence of motion observations [30,36]. There are five basic elements in HMM: two sets of states (N, M) and three probability matrices (A, B, π).
Sensors 2017, 17, 649
Because of the unique indoor environment, HMM is presented as follows:
(1) N represents the hidden states in the model, which can be transferred between each other. The hidden states in HMM are landmark nodes in the indoor environment, such as a door, stairs or a turning point.
(2) M indicates the observations of each hidden state, which are the user's direction selection (east, south, west and north) and the activity result from HAR.
(3) A and B state the transition probability and the emission probability, respectively. The pedestrian moves indoors from one node to another, and when the direction of the current state is determined, the reachable nodes are reduced. In order to reduce the algorithm's complexity, A and B are combined to give a transition probability set $C$. $[C_e, C_s, C_w, C_n]$ represent the transition probabilities of different directions.
(4) π is the distribution in the initial state. The magnetometer and barometer provide direction and altitude information when the user starts recording, which helps to reduce the number of candidate nodes in the initial environment. If the starting point is unknown, the same initial probability is given.
The Viterbi algorithm uses a recursive approach to find the most probable sequence of hidden states. It calculates the most probable path to a middle state, which achieves the maximum probability in the local trajectory. Choosing the state's maximum local probability can determine the best global trajectory. However, in the indoor environment, using a partial maximum probability to obtain the global path is not appropriate, because the probability between hidden states could be zero, and a local best trajectory could become a dead trajectory in the next moment. In this study, the distance information from PDR and the activities information from HAR are combined with the Viterbi algorithm to compute the most likely trajectory. The improved Viterbi algorithm (Algorithm 2) is proposed as follows:
Algorithm 2. Improved Viterbi algorithm.

Input: The proposed HMM tuples <N = {n_i | i = 1, 2, ..., N_N}, M = {m_i | i = 1, 2, ..., N_M}, C, π>; HAR classification results H = {h_i | i = 1, 2, ..., N_H}; PDR distance information D = {d_i | i = 1, 2, ..., N_D}; initial direction of magnetometer O; initial pressure of barometer F; d_σ is the distance threshold.
Output: Prediction trajectory.
1: O_start ← O, F_start ← F /* Determine the initial orientation and floor */
2: for i from 1 to N_M do
3:     for each path passing through n_{i−1} to n_i do
4:         if [...] and (P(n_i) > 0) then /* Determine whether the distance between two landmark nodes coincides with the distance information estimated by PDR */
5:             Path(N_s, P(N_s)) ← Obtain the subset data
6:         end if
7:     end for
8: end for
9: for path_j in Path(N_s, P(N_s)) do
10: [...]
With the determination of the user's trajectory, the initial position can be obtained through the first landmark point and the PDR information.
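The idea of discarding candidate trajectories whose transition probability is zero, or whose landmark spacing disagrees with the PDR distance, can be illustrated with a simplified path filter. This is a sketch under stated assumptions, not a reproduction of Algorithm 2: the graph layout, the function name `match_paths` and the 2 m default for the distance threshold are invented for illustration, and all surviving paths are returned rather than ranked by probability.

```python
def match_paths(graph, observations, d_sigma=2.0, starts=None):
    """Keep only landmark paths consistent with observed directions and distances.

    graph        -- {node: {direction: (next_node, length_m)}} (assumed layout)
    observations -- list of (direction, pdr_distance_m), one per segment walked
    d_sigma      -- hypothetical distance threshold in metres
    starts       -- candidate start nodes; all nodes if the start is unknown
    """
    paths = [[node] for node in (starts or graph)]
    for direction, distance in observations:
        extended = []
        for path in paths:
            edge = graph[path[-1]].get(direction)
            if edge is None:
                continue  # no reachable node in that direction: dead trajectory
            next_node, length = edge
            if abs(length - distance) < d_sigma:
                extended.append(path + [next_node])
        paths = extended
    return paths
```

When exactly one path survives, its first node gives the landmark from which the initial position can be traced back using the PDR information. Restricting `starts` with the initial floor and orientation plays the role of the initial distribution π.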

Trajectory Information Collection
Definition 1. Trajectory information: A trajectory is defined as a six-tuple Γ: <I, T, D, A, U, L>, where I is the ID of the trajectory, and T and D are the timestamp and position information, respectively, of each step. U is the direction change list; A is the activities information list; and L is the landmark list. Figure 5 shows the trajectory information collection process.
For example, if a user went from Entrance (ET) to Room 108, I is assigned ET-R108. From PDR, the timestamp and xyz coordinate value for each step can be obtained and can be represented as:
$$T = \{t_1, t_2, \ldots, t_n\} \qquad (4)$$
$$D = \{(x_1, y_1, z_1), (x_2, y_2, z_2), \ldots, (x_n, y_n, z_n)\} \qquad (5)$$
where n denotes the number of steps detected. Using the HAR method, the user's activity information can be collected. Hence, A can be given as follows:
$$A = \{\text{Standing}, \text{Walking}, \text{Going up stairs}, \text{Opening a door}\} \qquad (6)$$
The direction change list U can be obtained by the gyroscope. It should be noted that we detected only large directional changes (>15°), and thus, walking along a smaller arc was not detected. Because most of the turns could be completed in fewer than five steps, a five-step turn detection method (see Algorithm 3) is proposed to determine the direction change activity.
Algorithm 3. Five-step turn detection algorithm.

The landmark list L is built according to the following rules:
• If the going-up-(or -down)-stairs activity is detected, the nearest stairs landmark is added to L.
• If direction change activity (see Algorithm 3) is detected, the nearest turn landmark is added to L.
• If a door-opening activity is detected, the nearest door landmark is added to L.
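A five-step turn detector could look like the following Python sketch. It is one possible reading of the description, not the authors' Algorithm 3: headings are assumed to be per-step compass angles in degrees (clockwise positive), and the helper names and the `stable` tolerance are hypothetical. A turn is reported when the heading change across five steps exceeds 15°, and the turn is extended until the heading settles so that one physical turn is counted once.

```python
def angle_diff(a, b):
    """Signed smallest difference a - b in degrees, in the range (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d

def detect_turns(headings, window=5, min_change=15.0, stable=5.0):
    """Return (start_step, end_step, signed_angle) for each detected turn."""
    turns = []
    i = 0
    while i + window < len(headings):
        change = angle_diff(headings[i + window], headings[i])
        if abs(change) <= min_change:
            i += 1
            continue
        j = i + window
        # Extend the turn while consecutive steps still change heading.
        while (j + 1 < len(headings)
               and abs(angle_diff(headings[j + 1], headings[j])) > stable):
            j += 1
        turns.append((i, j, angle_diff(headings[j], headings[i])))
        i = j
    return turns
```

With the assumed compass convention, a positive angle corresponds to a right turn and a negative angle to a left turn. Each reported turn would then trigger the rule that adds the nearest turn landmark to L.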

Semantics Extraction
Definition 2. Semantic landmark: A semantic landmark S[l] consists of five parts: Id, attribute, adjacent segments, direction information and semantic description. Id is the landmark identifier. Attribute is one of the three types of landmarks: stairs, turn, or door. Adjacent segments contain the distance and semantic information between the current landmark and the next landmarks. Direction information and semantic description indicate the direction information and the semantic information when the user passes the landmark, respectively, as shown in Figure 6.
Figure 6. Semantic landmark and adjacent segments.
An adjacent segment consists of four parts: Id, distance, direction and semantic description. Id is the identifier of a segment. Distance represents the distance between the two landmarks that make up the segment. Direction represents the direction of the segment. Semantics indicates the semantic information that can be obtained.
A sequence of landmarks can denote a trajectory. Therefore, adding semantic information to the landmarks and their adjacent segments can describe the trajectory. The semantic description of trajectories is expressed as follows: [...]
A semantic landmark or an adjacent segment can store multiple semantics and provides semantics based on the detected activity. According to the trajectory information Γ: <I, T, D, A, U, L>, the semantic information can be obtained as shown in Table 1. The landmark symbols used in Table 1 represent the current landmark, the previous landmark and the next landmark. Detected (U) and Find (L(turn)) indicate that turn activity is detected and that a nearby turn landmark is found. If the user's activity information is detected, but there are no corresponding landmarks nearby, this activity's semantics is added to the corresponding adjacent segment, as shown in Table 2. According to the semantics acquisition rules shown in Tables 1 and 2, the semantic landmarks are constructed as shown in Figure 7.
As the above process shows, both semantic information and distance information are added to the semantic model. The order information (e.g., "Turn left at the 2nd turning point") is added according to the following rule:
• If the current landmark's adjacent segments contain multiple turn or door landmarks and they have the same semantics, sort them by distance and then provide them with the order information $Order_{turn}$ or $Order_{door}$.
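Definition 2 and the ordering rule map naturally onto a small data structure. The following Python sketch is illustrative only: the class and field names, in particular `target_attribute` (the type of the landmark at the far end of a segment) and `order`, are assumptions introduced here and are not names from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AdjacentSegment:
    id: str
    distance: float          # metres between the two landmarks
    direction: str           # e.g. "east"
    semantics: str           # e.g. "turn left"
    target_attribute: str    # hypothetical field: "stairs", "turn" or "door"
    order: Optional[int] = None

@dataclass
class SemanticLandmark:
    id: str
    attribute: str           # "stairs", "turn" or "door"
    direction: str = ""
    semantics: List[str] = field(default_factory=list)
    segments: List[AdjacentSegment] = field(default_factory=list)

def assign_order(landmark):
    """Number turn/door segments that share the same semantics by distance."""
    groups = defaultdict(list)
    for seg in landmark.segments:
        if seg.target_attribute in ("turn", "door"):
            groups[(seg.target_attribute, seg.semantics)].append(seg)
    for group in groups.values():
        if len(group) > 1:  # order is only needed to disambiguate
            for rank, seg in enumerate(sorted(group, key=lambda s: s.distance), 1):
                seg.order = rank
```

A segment that ends up with `order == 2` and semantics "turn left" corresponds to an instruction such as "Turn left at the 2nd turning point". Segments whose semantics are unique keep `order` as `None`.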

Experiment
The experiment was performed at the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS) at Wuhan University, China. An Android mobile phone and the indoor floor plans of LIESMARS were used in the experiment. It should be noted that, in this study, only the hand-held situation was considered. The experimental process is shown in Figure 8.

In the following, the pre-knowledge is first provided. Multiple user trajectories are then presented, including a trajectory on a single floor, a trajectory on multiple floors and a trajectory without knowing the starting point. Finally, the semantics acquisition process and results are described.

Pre-Knowledge
Door points, stair points and turn points were used as landmarks in the experiment, as shown in Figure 9. In indoor spaces, pedestrians tend to walk along a central line, and they tend to go in a straight line between places. The intersection of the corridor area's centerline was therefore selected as a landmark. Most turn points near door points were not assumed to be landmarks, because the door points could replace them as landmarks. The above principles were used to generate the landmarks. The proposed approach does not focus on landmark extraction, but on trajectory generation and the steps in the semantics acquisition.
The smartphone's barometer and magnetometer were used to locate the user's floor and to determine the initial orientation, respectively. However, their readings need to be analyzed first. We collected barometer data at eight different locations on each floor, as shown in Table 3. As the change in barometer readings on the same floor was not significant, we used the average of the collected barometer readings as a benchmark to determine the user's floor. If the current barometer reading is within ±0.1 hPa of B(f1) or B(f2), the corresponding floor is determined.
Compared to the barometer, the magnetometer is less stable, so 80 north-facing magnetometer readings were collected at various locations within the building. The distribution of the difference between the collected magnetometer data and true north is shown in Figure 10a. Most of the differences are between −5° and 15°, and they occasionally exceed 20°. Based on these data, a threshold of 30° was chosen to determine the direction semantics of the initial position. Direction information such as 'north' (330 < θ < 360, 0 ≤ θ < 30), 'east' (60 ≤ θ < 120), 'south' (150 ≤ θ < 210) and 'west' (240 ≤ θ < 300) can be obtained (Figure 10b) when the magnetometer reading θ of the user's initial position is acquired.
In addition, the user's step length parameters need to be determined over a short distance (0.45 in our experiment). When the user goes upstairs, the horizontal and vertical distances of each step are given as fixed values (0.3 m and 0.15 m in the experiment).
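
The two threshold rules above can be sketched in Python as follows. The function names and the benchmark pressures are hypothetical placeholders (the measured averages are those of Table 3, which are not reproduced here); the ±0.1 hPa tolerance and the angular intervals are taken from the text.

```python
def detect_floor(reading_hpa, benchmarks, tol=0.1):
    """Return the floor whose benchmark pressure is within `tol` hPa, else None."""
    for floor, benchmark in benchmarks.items():
        if abs(reading_hpa - benchmark) <= tol:
            return floor
    return None


def direction_semantics(theta_deg):
    """Map a magnetometer heading in degrees to a direction label, else None."""
    theta = theta_deg % 360.0
    # The strict lower bound for 'north' (330 < theta) follows the interval in the text.
    if theta > 330 or theta < 30:
        return "north"
    if 60 <= theta < 120:
        return "east"
    if 150 <= theta < 210:
        return "south"
    if 240 <= theta < 300:
        return "west"
    return None  # heading falls between two direction sectors


# Placeholder benchmark values B(f1), B(f2); NOT the measured values of Table 3.
benchmarks = {"f1": 1013.2, "f2": 1012.7}
print(detect_floor(1013.25, benchmarks), direction_semantics(350))
```

Headings that fall in the gaps between sectors (for example, 45°) are deliberately left unlabeled in this sketch, since the text assigns semantics only inside the ±30° sectors.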

Trajectory Generation and Correction
To present a user's trajectory, HAR is performed to identify landmarks, and then, these landmarks are used to correct the PDR trajectory.
Because the HAR training set requires a variety of activities, 25 trajectories from the entrance to each room on the second floor were chosen. If a room has multiple doors, each door corresponds to a trajectory. In order to obtain information from standing activity samples, the user needs to stand for a while at the beginning point and the end point of each trajectory. A training sample is shown in Figure 11.

Landmarks that the user passes can be determined using the activity classification results provided by HAR, and then, the user's trajectory can be corrected. Figure 12 shows the trajectory from the entrance to Room 108. The blue points indicate the raw trajectory, and the red points indicate the corrected trajectory. The corrected trajectory is extremely close to the ground truth trajectory. In addition, only four landmarks were used: stairs landmarks s0 and s1, turn landmark u0 and door landmark r8.
For trajectories on multiple floors, height information is added to each step point. Figure 13 shows the trajectory from the entrance to Room 201. The blue points and red points indicate the raw trajectory and the corrected trajectory, respectively.
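
As a rough illustration of how height can be attached to each step point, the following Python sketch advances a 3D position by one detected step. The coordinate convention (x east, y north, heading clockwise from north), the activity labels, the function name `pdr_step` and the symmetric treatment of going downstairs are assumptions made for this example; only the fixed stair step distances (0.3 m horizontal, 0.15 m vertical) come from the text.

```python
import math

STAIR_HORIZONTAL_M = 0.3   # horizontal distance per stair step (from the text)
STAIR_VERTICAL_M = 0.15    # vertical distance per stair step (from the text)


def pdr_step(position, heading_deg, activity, step_length):
    """Advance (x, y, z) by one step.

    Assumed convention: x points east, y points north,
    heading is measured clockwise from north.
    """
    x, y, z = position
    if activity == "upstairs":
        horizontal, dz = STAIR_HORIZONTAL_M, STAIR_VERTICAL_M
    elif activity == "downstairs":  # symmetric case, an assumption of this sketch
        horizontal, dz = STAIR_HORIZONTAL_M, -STAIR_VERTICAL_M
    else:
        horizontal, dz = step_length, 0.0
    heading = math.radians(heading_deg)
    return (
        x + horizontal * math.sin(heading),
        y + horizontal * math.cos(heading),
        z + dz,
    )


p1 = pdr_step((0.0, 0.0, 0.0), 0.0, "walking", step_length=0.45)
p2 = pdr_step(p1, 90.0, "upstairs", step_length=0.45)
print(p1, p2)
```

The step length is passed as a parameter because it is calibrated per user; when a landmark is detected, the estimated position would additionally be replaced by the landmark coordinates, as described for the correction step.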
When the user's initial location is unknown, the direction observation sequence is obtained from the direction sensors. Information about position and activities is obtained from PDR and HAR, respectively. When the user trajectory is determined, the starting position can be inferred from PDR.
The user went from point S to point E. The trajectory when using only PDR is shown in Figure 14a, and the direction observation sequence is obtained.
From PDR, the distance between the landmarks of two adjacent observation sequences is obtained, which is denoted by D. For example, D_s represents the distance from the starting point to the first landmark, and D_{1−2} denotes the distance from the first landmark to the second landmark. D_e represents the distance from the end point to the last landmark.
Information about activities is obtained from HAR: A = {Standing, Opening a door, Walking, Opening a door, Walking, Standing}. The matching process used in the algorithm is shown in Table 4. To simplify the proposed model, a flag was abstracted to express similar landmarks. For example, DN stands for the adjacent doors at the north side of the corridor: dn = [d0, d2, d4, d6, d8, d10], ds = [d1, d3, d5, d7, d9], dw = [d15, d17, d19, d21], de = [d14, d16, d18, d20] (see Figure 9). The virtual landmark E indicates the connecting points between the doors and the corridor. For example, E(d3) indicates the connecting point between door d3 and the corridor.
A trajectory is represented by a list, and the elements in the list represent the points that have been passed. The HAR results are denoted by A. A_s = (s, w, o) indicates the sequence of activities from the start point to the first landmark, which is "Standing-Walking-Opening a door".
It should be noted that we used the real landmark coordinates to correct the PDR results when landmarks were detected. As the trajectory ended, d_e was given to estimate the final position, and the matching trajectory was obtained.

Semantics Extraction
According to the proposed semantics extraction method described in Section 3.2, the semantics for landmarks in trajectory ET-R108 (in Figure 12) were obtained as shown in Table 5. The start and end points were considered as virtual landmarks, which have no specific attributes or fixed locations. After sufficient trajectories were acquired, the landmarks' full semantics could be obtained. Taking the turn landmark (u0) as an example, the complete semantics are as shown in Table 6. In addition, the order of the landmarks could be obtained using the method described in Section 3.2.2.

Table 6. Complete semantics of turn u0.
Name        Expression
Id          u0
Attribute

The construction of complete semantics for all of the indoor landmarks requires a large amount of trajectory data. However, the proposed approach only considers the complete semantics of key landmarks and the partial semantics of non-key landmarks, because they can describe most of the user's activities.

Discussion
Firstly, the performance of the HAR classification is evaluated. The location errors in a trajectory are then described. Finally, the accuracy of the landmark matching is analyzed.

HAR Classification Error
In order to evaluate the performance of the HAR classification, 10-fold cross-validation [39] was used. In this method, the dataset is divided into 10 parts; in each iteration, nine parts are used for training, and one part is used for testing. The classification accuracy of the common classifiers was compared with the proposed classifier, and two different window segmentation approaches were compared. Since the error rate of our step detection is quite low, at 0.19% (total steps: 3092; erroneously detected steps: six), it is more convenient to use the event-defined window approach to sense the user's activity. As shown in Table 7, the event-defined window approach performs better than the sliding window approach, which applies two-second-long time windows with a 50% overlap. Many different performance metrics could have been used to evaluate the HAR classification [8]. A confusion matrix was adopted, which is a method commonly used to identify error types (false positives and negatives) [40]. Several different performance metrics (accuracy, which is the standard metric to express classification performance, precision, recall and F-measure) could be calculated based on the matrix [9]. Table 8 shows the confusion matrix used to evaluate the KNN activity classification results. As shown in Table 8, the proposed method achieves an extremely high accuracy (>99%) in detecting stairs and walking activities. Some errors occur in identifying door-opening and standing activities. However, by detecting the magnetometer change, it is easy to distinguish between these two activities, thereby reducing the number of errors.
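
To make the relationship between a confusion matrix and the listed metrics concrete, the following Python sketch derives accuracy, precision, recall and F-measure from a small matrix. The matrix values are invented for illustration and are not the values of Table 8; the layout (rows are true classes, columns are predicted classes) and the function name are assumptions.

```python
def per_class_metrics(matrix, labels):
    """Compute accuracy plus per-class precision, recall and F-measure.

    Assumed layout: matrix[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    metrics = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # true class i, predicted otherwise
        fp = sum(row[i] for row in matrix) - tp  # predicted i, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return metrics


# Invented counts for two easily confused activities (not the data of Table 8).
labels = ["standing", "opening a door"]
matrix = [[8, 2],
          [1, 9]]
print(per_class_metrics(matrix, labels))
```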

Localization Error
The localization error of trajectory ET-R108 is shown in Figure 15a; the blue line indicates the original PDR trajectory, and the orange line indicates the location errors after correction using only the landmarks. The results show that the PDR errors increase with distance, and a high average localization accuracy (0.59 m) is achieved when we use the landmarks to correct the cumulative errors. Figure 15b shows the cumulative error distribution of the 25 test trajectories. We can see that the proposed approach is more stable than using only PDR, and the average error is reduced from 1.79 m to 0.52 m.
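
For readers who wish to reproduce this kind of evaluation, a minimal Python sketch of the per-point error, the mean error and an empirical cumulative distribution value is given below. The coordinates are made up for the example, and the function names are hypothetical; the paper's own evaluation procedure may differ in detail.

```python
import math


def position_errors(estimated, truth):
    """Euclidean distance between each estimated point and its ground truth point."""
    return [math.dist(e, t) for e, t in zip(estimated, truth)]


def empirical_cdf(errors, threshold):
    """Fraction of errors that do not exceed `threshold`."""
    return sum(e <= threshold for e in errors) / len(errors)


# Made-up coordinates in meters, for illustration only.
estimated = [(0.0, 0.0), (3.0, 4.0)]
truth = [(0.0, 0.0), (0.0, 0.0)]
errors = position_errors(estimated, truth)
mean_error = sum(errors) / len(errors)
print(errors, mean_error, empirical_cdf(errors, 1.0))
```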

Landmark Matching Errors
The shortest distance method was used to match the landmarks. As shown in Figure 16, the result matches the partial trajectories. Although the trajectories were corrected at the turn landmark (the red point), the PDR-estimated user location still introduced errors, particularly when the user was far from the previous landmark. In the experiment, an error occurred because d18 and d20 were extremely close to each other and far from the turning point landmark (t6). We can also see that a similar error occurred for turn landmark t5, which was matched to landmark t6 (see Table 9).
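
The failure mode described above can be reproduced with a small Python sketch of shortest-distance matching. The landmark coordinates, the dictionary layout and the function name are hypothetical; they are chosen only so that two door landmarks lie close together, as d18 and d20 do in the experiment.

```python
import math


def match_landmark(position, landmarks, kind):
    """Return the name of the nearest landmark of the detected kind, or None.

    `landmarks` maps a name to a (kind, (x, y)) pair; this layout is an
    assumption of the sketch.
    """
    candidates = {
        name: xy for name, (lm_kind, xy) in landmarks.items() if lm_kind == kind
    }
    if not candidates:
        return None
    return min(candidates, key=lambda name: math.dist(position, candidates[name]))


# Hypothetical coordinates in meters: two doors 1.5 m apart, far from turn t6.
landmarks = {
    "t6": ("turn", (0.0, 0.0)),
    "d18": ("door", (10.0, 0.0)),
    "d20": ("door", (11.5, 0.0)),
}

# The user actually opens d18, but the PDR estimate has drifted by about 0.9 m.
drifted_estimate = (10.9, 0.2)
print(match_landmark(drifted_estimate, landmarks, "door"))
```

With the drifted estimate, the nearest door is d20 rather than d18, so the match is wrong even though the activity was recognized correctly; an estimate with little drift, such as (10.2, 0.0), is matched to d18.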

Comprehensive Comparison
Some similar indoor localization schemes, which require no additional devices or expensive labor, are compared in terms of requirements, sensors, user participation, accuracy, expression and extensibility in Table 10. Each technique has its own advantages. Zee [22] tracked a pedestrian's trajectory without user participation, and a Wi-Fi training set was simultaneously collected, which can be used in Wi-Fi fingerprinting-based localization techniques. UnLoc [25] only needs a door location as the basic input information and simultaneously computes the user's location and detects various landmarks. Compared to the above localization schemes, the proposed approach needs more basic information; however, the information allows us to obtain a better localization accuracy. Moreover, a semantic landmark model was constructed during the localization process, which can be used not only to describe the user's trajectory, but also to improve the localization efficiency. The overall scores of the three approaches are shown in Figure 17.

Figure 17. Overall score of Zee, UnLoc and the proposed approach.

Computational Complexity
In order to verify the validity of the semantic model for localization and better analyze the computational complexity of the semantics-assisted localization method, a further experiment was conducted (Figure 18a). In this experiment, the user started from any place on the second floor, and we wanted to determine the user's trajectory as soon as possible. The overall error and time complexity in the trajectory matching process are used to evaluate the proposed method.

As shown in Figure 18b, although only a few semantics are provided, the trajectory error drops rapidly (the average error drops from 9.25 m to 0.48 m). When the first semantic information is obtained from the trajectory, there are five trajectories satisfying the condition. In order to match the semantic information, each landmark point needs to be traversed once, so the time complexity is O(N). The next search only needs to traverse the semantics of the trajectory segments that satisfy the previous condition. These trajectory segments are 'l_t8 l_t7', 'l_t8 l_t6', 'l_t7 l_t6', 'l_t4 l_t2', 'l_t1 l_t0', 'l_t1 l_s2' and 'l_t0 l_s2', so only seven segments need to be traversed, i.e., O(7). The trajectory is determined after matching the third semantic information, and the trajectory error is similar to that of the previous trajectory localization experiment, where the initial location was known. The above description is summarized in Table 11. Compared to the traditional trajectory matching method, which yields a time complexity of O(NT) or O(N²T), the proposed semantic matching method is more efficient. Since it does not need to traverse all of the states every time, the time complexity is much less than O(NT).
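
The reason the search shrinks at every step can be illustrated with a small Python sketch of progressive candidate filtering. The candidate segments, their semantic labels and the function name are hypothetical and far simpler than the semantic landmark model of the paper; the sketch only shows that each new observation is compared against the surviving candidates rather than against all states.

```python
def filter_candidates(candidates, observed_semantics):
    """Keep only candidates whose semantics agree with each observation in turn.

    Returns the surviving candidates and the candidate count after each step.
    """
    remaining = list(candidates)
    history = [len(remaining)]
    for step, observed in enumerate(observed_semantics):
        # Only the survivors of the previous step are examined here.
        remaining = [
            c for c in remaining
            if len(c["semantics"]) > step and c["semantics"][step] == observed
        ]
        history.append(len(remaining))
        if len(remaining) <= 1:
            break  # trajectory determined (or no match left)
    return remaining, history


# Hypothetical candidate segments with invented semantic labels.
candidates = [
    {"name": "seg_a", "semantics": ["turn left", "door on north", "turn right"]},
    {"name": "seg_b", "semantics": ["turn left", "door on south", "turn right"]},
    {"name": "seg_c", "semantics": ["turn right", "door on north", "turn left"]},
    {"name": "seg_d", "semantics": ["turn left", "door on north", "turn left"]},
]
observed = ["turn left", "door on north", "turn right"]
survivors, history = filter_candidates(candidates, observed)
print([c["name"] for c in survivors], history)
```

In this toy case, the candidate count falls from four to three, two and one, so the third observation determines the trajectory, which mirrors the behavior reported for the experiment.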


Conclusions
In this paper, PDR, HAR and landmarks have been combined to achieve indoor mobile localization. The landmark information was extracted from indoor maps, and then, HAR was used to detect the landmarks. These landmarks were then used to correct the PDR trajectories and to achieve a high level of accuracy. When the initial position was unknown, an HMM was used to match the motion sequence to the indoor landmarks. Because semantic information was also assigned to the landmarks, the semantic description of a trajectory was obtained, which has the potential to provide more applications and better services. Moreover, the experiment was implemented in an indoor environment to fully evaluate the proposed approach. The results not only show a high localization accuracy, but also confirm the value of semantic information.
More extensive research can be conducted in the future. For example, phone sensing could be used to recognize more activities, particularly complex activities, and more semantic information could be extracted. In addition, complex experimental conditions, such as various trajectories, a location-independent mobile phone and all kinds of users, could be included in future studies. The semantic model used in this study does not contain all possible semantics, and rich semantic information could be obtained by using the data from crowdsourced trajectories. Furthermore, real-time localization (including semantic information) is also a priority of our future work.