Applying Movement Constraints to BLE RSSI-Based Indoor Positioning for Extracting Valid Semantic Trajectories

Indoor positioning techniques, owing to received signal strength indicator (RSSI)-based sensors, can provide useful trajectory-based services. These services include user movement analytics, next-to-visit recommendation, and hotspot detection. However, the value of RSSI is often disturbed by obstacles in the indoor environment, such as doors, walls, and furniture. Therefore, many indoor positioning techniques still extract an invalid trajectory from the disturbed RSSI. An invalid trajectory contains distant or impossible consecutive positions within a short time, which is unlikely in a real-world scenario. In this study, we enhanced indoor positioning techniques with movement constraints on BLE (Bluetooth Low Energy) RSSI data to prevent an invalid semantic indoor trajectory. The movement constraints ensure that a predicted semantic position cannot be far apart from the previous position. Furthermore, we can extend any indoor positioning technique using these movement constraints. We conducted comprehensive experimental studies on real BLE RSSI datasets from various indoor environment scenarios. The experimental results demonstrated that the proposed approach effectively extracts valid indoor semantic trajectories from the RSSI data.


Introduction
Indoor location-based services have given rise to the requirement of establishing various safety measures in IoT (Internet of Things)-enabled smart buildings in recent years. Indoor trajectories [1] support these services for several applications, such as people behavior analytics, movement patterns extraction, next-to-visit recommendations, and hotspot detection. Positioning devices, such as Wi-Fi [2], BLE (Bluetooth Low Energy) [3][4][5][6][7][8], and RFID (radio frequency identification) [9], provide the trajectories using indoor positioning techniques.
An indoor trajectory is a sequence of paired timestamps and visited positions of a user in indoor spaces. To capture these positions, the user holds a device, such as a smartphone, that receives the RSSI (received signal strength indicator) of the positioning devices in the indoor space while walking. However, the RSSI is often missing or unstable due to delayed transmission, interference from the walls, and clashing signals. Thus, we require a technique called indoor positioning to estimate the user's correct position.
Indoor positioning methods usually require a dataset of collected RSSI data and the associated semantic positions. This dataset is called the reference set. Then, the methods estimate the current user position using the knowledge from the reference set. Some popular indoor positioning techniques are the Hidden Markov Model (HMM) [4,5,9,10,11,12], k-nearest neighbors (kNN) [6][7][8], and deep neural networks.
Example 1. In Figure 1b, we see an extracted 2D indoor trajectory that denotes the movements of a user from position (6, 7) at $t_1$ to position (8, 3) at $t_2$ and then to position (9, 7.5) at $t_3$ in a short time, as depicted in Figure 1c. Thus, if we know that position (6, 7) is at Corridor AB, position (8, 3) is at Corridor BD, and position (9, 7.5) is at Room B-North, we can define a more understandable trajectory such as Corridor AB → Corridor BD → Room B-North.
Although we can translate the 2D positions to semantic positions using a type of mapping technique, indoor positioning techniques are error-prone, so we should consider a way to prevent the extraction of invalid trajectories. An invalid trajectory contains a distant or an impossible displacement, such as moving directly from one room to another room separated by a wall without a door. Thus, we introduce movement constraints to restrict an indoor positioning technique to only infer a position that is close to and not obstructed from the previous position.
Example 2. In Figure 1b, because of incorrect positioning, the extracted trajectory moves from Corridor AB to Corridor BD, which is impossible, as depicted in Figure 1c. Therefore, the extracted trajectory is an invalid trajectory because it contains an impossible movement. This impractical movement, however, can be restricted if we define a movement constraint: we can only go to Room B-South after Corridor AB. Then, using the movement constraint, we can get a valid trajectory Corridor AB → Room B-South → Room B-North as depicted in Figure 1c.
In this study, we applied movement constraints in RSSI-based indoor semantic positioning and trajectory extraction. Because the time interval between inferred positions is short, consecutive positions cannot be far apart or blocked by an obstacle. Thus, the movement constraint can be used to prevent the occurrence of far or impossible consecutive positions. Therefore, we can guarantee the validity of the extracted trajectory.
The indoor positioning techniques can apply the movement constraints to estimate the current semantic position of a user. We can extend HMM to apply the movement constraints as state transitions, considering the semantic positions as the states. Meanwhile, kNN can be extended with the movement constraints to reduce the search space of the current semantic position prediction to only the semantic positions near the previously estimated semantic position. Similar to kNN, other machine learning approaches, such as neural networks, can work as indoor positioning techniques and can prevent the occurrence of invalid trajectories by limiting the range of outputs to nearby consecutive positions.
We performed experiments using three real datasets of people moving inside a building using BLE beacons as RSSI-based positioning devices. To tackle incorrect indoor positioning, we performed sliding-window aggregation to reduce the instability and incompleteness of BLE RSSI data. Different scenarios for semantic position definitions and beacon deployment were considered. To measure the validity and quality of the approach, we devised several metrics for semantic positions with some resemblance to the 2D setting.
The key contributions of this paper can be summarized as follows.
• We apply movement constraints to machine learning-based indoor positioning methods, such as kNN, HMM, and neural networks, to extract valid semantic trajectories in real time from the BLE beacon RSSI in indoor environments.
• We present a detailed experimental evaluation comparing our approach with current state-of-the-art approaches for different environments. The experimental results demonstrate the performance benefits of the proposed constrained approach and the settings in which it is feasible.
The rest of this paper is organized as follows. Section 2 provides the related research. Then, we continue to explain the basic knowledge of this work in Section 3. Section 4 describes the proposed constrained approach using several indoor positioning techniques. Section 5 presents the details of the experiment design, the results, and the discussions. Finally, we conclude this paper in Section 6.

Related Works
In this section, we briefly review some related works and provide a comparison to the proposed method.

RSSI-Based Indoor Positioning Techniques
Indoor positioning yields the current position of a user from several given measurements. Some techniques utilize inertial sensors [4,17] and RSSI-based sensors [4,5,7,8,13,17] in an indoor environment. However, most of them work on 2D exact positions, such as particle filter [17], kNN [7,8], and reinforcement learning [13]. Another approach [3] directly infers the room where a user resides by performing classification using a convolutional neural network on transformed images that consider BLE signals and positions as features. However, only a few methods [13,17] along with the HMM method [4,5] consider the trajectory as the output of indoor positioning. Not dealing with this issue leads to invalid trajectories, as consecutive positioning results may be far apart.

Indoor Trajectory Extraction
Unlike the outdoors, the indoor environment may have a different sense of semantic position and trajectory. Indoor semantic positions tend to have smaller coverage and finer information, e.g., toilet or hallway, unlike outdoor semantic positions, e.g., restaurant or office. One of the representations of the movements on an outdoor semantic trajectory is the road-network [18]. The vehicles move in accordance to the road network as they cannot trespass a building in a normal situation, e.g., not breaking any traffic laws. However, in the indoor environment, we cannot restrict the movement to only the passages and corridors. A large indoor space can contain several semantic positions owing to the specific definition of the area, e.g., a hall can contain numerous exhibition objects and we would like to identify a semantic position as the area around an object. Thus, constructing a semantic trajectory in an indoor environment is different from that in the outdoor environment.
The semantic indoor trajectory extraction process is similar to RFID cleansing [9,19,20]. Similar to Reference [9], we do not assume that all detailed characteristics or spatiotemporal constraints of the BLE beacons in the indoor environment are known. This case is different from previous works [5,19,20] where a position was directly related to a beacon. According to Reference [9], the learning-based method cleans the RFID of the trajectory data to an RFID observation at a time. In our case, we may capture the RSSI signals from different beacons or miss the signal completely at some semantic positions. Thus, we leverage the machine learning-based approach to infer the semantic location from the overlapping and missing observations.
In contrast, we consider a case of indoor semantic trajectory extraction similar to outdoor map-matching [11,12], especially in road networks. Most of these cases use incremental HMM, which takes the distance directly as the observation (GPS data). In indoor cases, we can compute the distance [10] or directly use the existence of captured RSSI in the current position [5] as observation, even though the RSSI is incomplete and noisy. However, the number of deployed beacons in such a setting is usually large. Previous works rarely studied map-matching using RSSI data in different environments, such as sparsely deployed beacons.
Moreover, the use of non-incremental map-matching techniques in indoor semantic positions [5] ensures that the user moves in a valid trajectory and outputs an optimal result. However, it needs to see the full trajectory, so it is not applicable to real-time trajectory extraction. In contrast, the incremental map-matching techniques work in real time but yield suboptimal results and may suffer a "ping-pong" effect [12], similar to invalid movements. Several online HMMs have attempted to solve the issue. The bounded variable sliding window (BVSW) approach on the online Viterbi algorithm [11] solves this issue in the outdoor environment, while the local HMM [4] estimates the 2D position using Wi-Fi signals in the indoor case. However, the local HMM still suffers from some invalid movements.
To prevent this issue, we apply movement constraints on machine learning (ML)-based RSSI indoor positioning. We aim to predict the next position that complies with the real-world situation that should not be too far from the current position. These constraints apply to any machine learning-based indoor positioning method, with special cases for HMM and kNN. The constraints reflect the transition probability in HMM and reduce the search space for kNN. This approach also applies to different scenarios of beacon deployment in the indoor environment. Thus, based on this approach, we can extract a valid indoor semantic trajectory, which has been demonstrated through the experiment result.

Preliminaries
In this section, we present the problem definition in Section 3.1, notions of the input of BLE RSSI sequences in Section 3.2, and the notions of indoor semantic trajectory in Section 3.3.

Problem Definition
Our problem, depicted in Figure 2, is described as follows. We need to extract valid trajectories from the input of unlabeled BLE RSSI sequences in an indoor environment while the user is moving.
To perform this, we use the ML-based indoor positioning with constraints that learns the characteristic of the BLE RSSI sequences from the previously collected BLE RSSI dataset with semantic position sequences as labels. The movement constraints come from the indoor floor plan, which represents the information of the indoor environment. These movement constraints are represented by a semantic graph in later parts.

BLE RSSI Input
When a user is moving in an indoor environment with BLE beacons, his/her RSSI capturing device, e.g., smartphone, captures a collection of RSSI observations from the beacons according to his/her position at a time. A raw RSSI has a negative integer value with a maximum value of −1. If a capturing device is closer to a beacon, it will capture the higher RSSI value. A capturing device may capture a weaker signal or even miss a signal due to several circumstances, e.g., delayed transmission, beacon antenna's orientation, obstructing objects, and overlapping signals. These circumstances may inform us about the surroundings of the indoor space, such as disturbance from many objects or deflected by walls.
Suppose the indoor environment contains M deployed beacons that emit M RSSI values. Then, we can represent these M RSSI values as a vector.  Suppose we have collected BLE RSSI observations previously knowing our semantic position s (t) at each time t. Then, we have several sequences of BLE RSSI observations paired with semantic positions. We call this collected sequence reference set R.

Definition 4.
A sequential reference set R is a set that contains sequences of RSSI vectors with semantic positions. We define this set as R = {T 1 , T 2 , ..., T |R| }, where |R| denotes the number of the collected sequences.
The sequential relationship is useful for a sequence-based indoor positioning method, such as the Hidden Markov Model. Sometimes, we omit the sequential relationship between timestamps for some indoor positioning techniques, such as kNN or deep neural networks. Thus, we drop the timestamp information $t_i$ of the reference set $R$ and define another form of reference set $R'$.

Indoor Semantic Trajectory
Given an indoor environment, simply represented by an indoor map, a user manually segments the whole indoor area into a set of nonoverlapping areas S = {s 1 , s 2 , s 3 , ..., s N } in terms of contextual and geographical information.
An indoor floor map contains contextual and geographical information of the indoor space. The geographical information of the indoor space includes several features such as building and room shapes, positions, surroundings, and obstructing objects (walls or any separator). Thus, we can extract the possibility (or impossibility) to reach a place from a nearby place directly. In contrast, the context gives us the semantic meaning of the areas in the indoor space, such as toilet, corridor, resting area, and exhibition area. If we combine this information, we can define the desired nonoverlapping areas in the indoor space. Each nonoverlapping area denotes a semantic position.
A person who is aware of the indoor space and its details, such as a museum manager, can define the semantic positions. In future usage, we can analyze important patterns that represent visitor behavior in the indoor space from the visited semantic positions.
With this notation, we define a semantic trajectory and movement as follows.
We separate Rooms A and B into northern and southern parts because each room has a large area to cover even though the parts are not separated by any object or wall. The separation is useful for Room B because it is impossible to reach Corridor BD directly from the northern part of Room B.
Definition 6. The semantic trajectory $ST$ of a moving user is a sequence of timestamped semantic positions $\langle (t_1, s^{(1)}), (t_2, s^{(2)}), \ldots, (t_{|ST|}, s^{(|ST|)}) \rangle$, where $t_i < t_j$ when $i < j$ and an element $(t_i, s^{(i)})$ describes a user to be at a semantic position $s^{(i)} \in S$ at time $t_i$.
Definition 7. Given a semantic trajectory $ST$, we define a movement $s^{(i)} \to s^{(i+1)}$, where $1 \le i < |ST|$, as a displacement of a user from a semantic position $s^{(i)}$ to $s^{(i+1)}$ between the consecutive timestamps $t_i$ and $t_{i+1}$.

Example 5.
A visitor in a museum walks along a path similar to the ground truth in Figure 1b. Thus, the visitor has a semantic trajectory $ST = \langle (t_1, \text{CorridorAB}), (t_2, \text{RoomB-South}), (t_3, \text{RoomB-North}) \rangle$. We can also extract his/her movements from the semantic trajectory $ST$, which are CorridorAB → RoomB-South and RoomB-South → RoomB-North.
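The extraction of movements in Example 5 can be sketched in a few lines of Python. This is only an illustration of Definitions 6 and 7; the function name `movements` and the numeric timestamps are assumptions made for the sketch and do not appear in the paper.

```python
def movements(trajectory):
    """Return the movements s(i) -> s(i+1) of a semantic trajectory.

    `trajectory` is a list of (timestamp, semantic_position) pairs with
    strictly increasing timestamps, as required by Definition 6.
    """
    times = [t for t, _ in trajectory]
    if any(a >= b for a, b in zip(times, times[1:])):
        raise ValueError("timestamps must be strictly increasing")
    # Pair every element with its successor (Definition 7).
    return [(a[1], b[1]) for a, b in zip(trajectory, trajectory[1:])]


# Example 5, with hypothetical timestamps 1, 2, 3 standing in for t_1..t_3.
st = [(1, "CorridorAB"), (2, "RoomB-South"), (3, "RoomB-North")]
example_moves = movements(st)
```

A trajectory with |ST| elements always yields |ST| − 1 movements, so a single-element trajectory has none.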
Then, we introduce the concept of movement constraints by considering the set of neighboring semantic positions $NS(s_i)$ of a semantic position $s_i \in S$, the set of movement constraints $E(s_i)$, and the semantic graph $SG$.
Definition 10. Given a semantic position $s_i \in S$ and its neighboring set $NS(s_i)$, the set of movement constraints of $s_i$, defined as $E(s_i) = \{ s_i \to s_j \mid s_j \in NS(s_i) \}$, consists of all possible movements from $s_i$ to $NS(s_i)$. The movement constraint for the reverse direction $s_j \to s_i$ also holds true as $NS(s_j)$ always includes $s_i$ (symmetric).
Example 6. In Figure 1, the semantic position Corridor AB has a set of neighboring semantic positions NS(CorridorAB) = {RoomB-South, RoomB-North, CorridorAB, RoomA-North, RoomA-South}.
The indoor positioning technique without constraint infers CorridorBD as the next position at $t_2$ from the position CorridorAB at $t_1$. However, to access the semantic position Corridor BD from Corridor AB, a user must visit Room B-South first. Thus, Corridor BD is not in the set of neighboring semantic positions of Corridor AB, and the movement CorridorAB → CorridorBD violates the movement constraint of Corridor AB. Formally, CorridorBD ∉ NS(CorridorAB) and CorridorAB → CorridorBD ∉ E(CorridorAB), respectively.

Definition 11.
A semantic graph $SG$ consists of a tuple $(S, E)$, where $S$ is a set of semantic positions and $E$ is the set of movement constraints of all semantic positions in $S$. A semantic position $s_i \in S$ and its set of movement constraints $E(s_i) \subseteq E$ represent a vertex and the undirected edges from that vertex in a graph, respectively. Although they have directional properties, the movement constraints are simplified as undirected edges as they hold the symmetric relationship $s_i \to s_j$ and $s_j \to s_i$.
The semantic graph of the indoor environment in Figure 1 is depicted in Figure 4. We draw each semantic position in $S$ as a vertex. Then, for each vertex $s \in S$, we draw the movement constraints $E(s)$ to its neighboring semantic position set $NS(s)$ as undirected edges. If a movement constraint already exists from $s_i$ to $s_j$ ($s_i, s_j \in S$), the movement constraint from $s_j$ to $s_i$ should not be drawn again. Then, we define the invalid trajectory and, consequently, the valid trajectory. Then, we continue to define the semantic trajectory extraction using the previously mentioned notions.
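As an illustration of how a semantic graph can be used to check a trajectory, the following Python sketch builds the neighboring sets NS(s) from undirected edges and tests whether every movement of a trajectory satisfies a movement constraint. The edge list is a partial, assumed reading of the Room B side of Figure 1 based on the examples in the text (Room A is omitted), and the function names are hypothetical.

```python
def build_semantic_graph(edges):
    """Build NS(s) for every position from a list of undirected edges.

    Each position is also its own neighbor, since staying in place
    is always a possible movement (cf. Example 6).
    """
    ns = {}
    for a, b in edges:
        ns.setdefault(a, {a}).add(b)
        ns.setdefault(b, {b}).add(a)  # symmetric constraint
    return ns


def is_valid(positions, ns):
    """True if every consecutive movement s -> s' has s' in NS(s)."""
    return all(b in ns[a] for a, b in zip(positions, positions[1:]))


# Assumed partial edge list, not the full graph of Figure 4.
EDGES = [
    ("CorridorAB", "RoomB-South"),
    ("CorridorAB", "RoomB-North"),
    ("RoomB-South", "RoomB-North"),
    ("RoomB-South", "CorridorBD"),
]
NS = build_semantic_graph(EDGES)

valid = is_valid(["CorridorAB", "RoomB-South", "RoomB-North"], NS)
invalid = is_valid(["CorridorAB", "CorridorBD", "RoomB-North"], NS)
```

Storing NS(s) as a dictionary of sets makes each constraint check a constant-time lookup; an adjacency matrix carries the same information.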

Proposed Method
In this section, we describe the details of our approach for extracting semantic trajectories from the deployed RSSI beacons.

Architectural Overview
We show the architectural overview of our constrained approach in Figure 5. The main inputs of our system are a semantic graph and the raw RSSI observations. The semantic graph is manually defined by users, and the raw RSSI observations are acquired from a smartphone, which acts as an RSSI reader. In the system, the semantic graph is represented by an adjacency matrix.
The two phases refer to (1) the offline phase, which contains the reference set collection and indoor positioning model training, and (2) the online phase, i.e., real-time indoor semantic trajectory extraction.

Data Collection
The BLE RSSI data collection is handled by a person manually. We deploy M BLE beacons across the indoor environment. The collector captures BLE RSSI vectors by a smartphone. Then, the collector walks from an initial point (the most possible entrance, for example, stairs/elevator/main door) and goes to his/her destination inside the building. There are different movements based on the users' role and characteristics. Some of them are as follows: (1) the visitors in a museum may see all the exhibition objects, (2) some visitors may not have the time to explore all the objects and may exit the museum earlier, and (3) some staff may go back and forth, checking every object's condition. The collector already knows the semantic graph of the studied environment. Thus, while the collector is walking around the environment, he/she labels his/her current semantic position using the smartphone. If the collector moves to another position, then he/she changes the current semantic position. Hence, we acquire trajectory data in the form of tuples of the timestamp, the raw BLE RSSI observation from the beacons, and the semantic position.
We apply an aggregation technique in both phases as a preprocessing step. The aggregation improves the quality of the RSSI readings, with fewer missing observations and more stable RSSI values.

Aggregation
The observation of our use case uses BLE RSSI, which is often missing and varies in an indoor space with many surrounding objects. Thus, to overcome these issues, we employ a sliding aggregation window to gain statistical information for consecutive signal strength samples. Aggregation functions such as mean or max function can provide such statistical information.
An aggregation window with length $l$ performs an aggregation function on a set of raw observations $X^{(\bar{t}_j)}$. The set $X^{(\bar{t}_j)}$ contains the observations from one or more timestamps that span from $t_{i-a}$ to $t_i$, given an integer $a \ge 0$ that maximizes $t_i - t_{i-a}$ and satisfies $t_i - t_{i-a} \le l$. Note that $\bar{t}_j = t_i$ is the latest timestamp in the set. Then, the aggregation function on the set $X^{(\bar{t}_j)}$ produces a pair $\langle \bar{t}_j, \bar{X}^{(j)} \rangle$.
To produce the next aggregated pair $\langle \bar{t}_{j+1}, \bar{X}^{(j+1)} \rangle$, we shift the aggregation window $l^{+}$ seconds forward, where $l^{+}$ is the sliding interval. Thus, we define $\bar{t}_{j+1}$ equal to $t_{i+b}$, given an integer $b \ge 1$ that minimizes $t_{i+b} - t_i$ and satisfies $t_{i+b} - t_i \ge l^{+}$. Hence, we can perform the next aggregation on the set of raw observations $X^{(\bar{t}_{j+1})}$. We apply these steps to the set of raw BLE observations.
Example 9. Figure 6 illustrates an example of aggregation using 2 BLE beacons in a one-second sampling. We see five observations from time 0.2 s to 1 s. At each timestamp, we have an RSSI value from each of the two beacons. We assume that the length of the aggregation window is 0.6 s and that the sliding interval is 0.4 s. The first aggregation covers the first to the third observations, where the aggregation window spans from 0 s to 0.6 s ($t_3 - t_1 \le 0.6$ s, $a = 2$). Then, we slide the aggregation window 0.4 s forward to 1 s ($t_5 - t_3 \ge 0.4$ s, $b = 2$). Thus, the second aggregation covers the third to the fifth observations, from 0.6 s to 1 s.
For each aggregation window, we perform the max aggregation function on all RSSI values of each beacon. The first aggregation on the first beacon RSSI value yields −93 from three values: {−97, 0, −93}. We exclude the missing value (0) from the aggregation function. Similarly, for the second beacon, we get −95 from {−95, −98, 0}. Then, we get $\langle 0.6\,\text{s}, \{-93, -95\} \rangle$ as the first aggregated observation $\langle \bar{t}_1, \bar{X}^{(1)} \rangle$. Using the same procedure, we get $\langle \bar{t}_2, \bar{X}^{(2)} \rangle = \langle 1\,\text{s}, \{-93, -94\} \rangle$ as the second aggregated observation. However, missing observations from all installed beacons can occur even though we performed the aggregation. In this case, it is still possible to infer the position based on the prediction of indoor positioning techniques. The indoor positioning techniques are trained to handle this problem because the training/reference set may contain some missing observations. The constraints also hold a sequential property to prevent impossible transitions due to the all-missing observations.
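The sliding-window max aggregation can be sketched as follows. This is a minimal illustration, not the authors' implementation: timestamps are expressed in integer milliseconds to avoid floating-point comparisons, the window is taken as closed on both ends following the condition $t_i - t_{i-a} \le l$, and the RSSI values of the fourth and fifth observations are hypothetical (only the first three and the aggregated results are given in Example 9).

```python
MISSING = 0  # a raw value of 0 denotes a missing RSSI reading


def aggregate_max(observations, window_ms, slide_ms):
    """Sliding-window max aggregation over raw RSSI vectors.

    `observations` is a time-sorted list of (timestamp_ms, [rssi per beacon]).
    Returns a list of (window_end_ms, [aggregated rssi per beacon]).
    """
    results = []
    next_end = window_ms  # the first window ends once `window_ms` has elapsed
    for i, (t_end, _) in enumerate(observations):
        if t_end < next_end:
            continue
        # Observations whose age relative to the window end is at most l.
        window = [x for t, x in observations[: i + 1] if t_end - t <= window_ms]
        aggregated = []
        for m in range(len(window[0])):
            values = [x[m] for x in window if x[m] != MISSING]
            # If a beacon is missing in the whole window, it stays missing.
            aggregated.append(max(values) if values else MISSING)
        results.append((t_end, aggregated))
        next_end = t_end + slide_ms  # slide the window forward
    return results


raw = [
    (200, [-97, -95]),
    (400, [0, -98]),
    (600, [-93, 0]),
    (800, [-96, -94]),   # hypothetical values
    (1000, [0, -97]),    # hypothetical values
]
aggregated = aggregate_max(raw, window_ms=600, slide_ms=400)
```

With these inputs, the two aggregated observations match the values reported in Example 9.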

Offline Phase
In the offline phase, we collect the reference set from the raw observations and known semantic positions. We preprocess the raw RSSI observation of the reference set using the aggregation window. We also aggregate the semantic positions. The aggregation of the semantic positions is slightly different from the aggregation of the raw observations.
To aggregate the semantic positions, we perform the majority vote on $S^{(\bar{t}_j)}$, the set of semantic positions covered by an aggregation window at time $\bar{t}_j$. We denote this aggregated reference position by $\bar{s}^{(j)}$. If a tie occurs between some semantic positions, we randomly pick any of the top majority candidates as the aggregated label. Thus, we can define each labeled observation as a triplet $\langle \bar{t}_j, \bar{X}^{(j)}, \bar{s}^{(j)} \rangle$, maintaining a relationship similar to that of the raw observations.
With this reference set $R$, or its transformed form $R'$, we can perform the semantic position estimation directly (using kNN) or train the indoor positioning models first (for the approaches other than kNN) using our constrained indoor positioning approach in the online phase.
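The label aggregation by majority vote with random tie-breaking can be sketched as below. The function name `aggregate_label` and the optional `rng` parameter are assumptions introduced for the sketch; the paper does not specify how the random choice is implemented.

```python
import random
from collections import Counter


def aggregate_label(labels, rng=random):
    """Majority vote over the semantic positions inside one window.

    Ties between the most frequent positions are broken at random.
    """
    counts = Counter(labels)
    top = max(counts.values())
    # Sorting makes the candidate order independent of insertion order.
    candidates = sorted(s for s, c in counts.items() if c == top)
    return rng.choice(candidates)


label = aggregate_label(["CorridorAB", "CorridorAB", "RoomB-South"])
tied = aggregate_label(["CorridorAB", "RoomB-South"], rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes the tie-breaking reproducible when the reference set is rebuilt.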

Online Phase
In the online phase, we perform semantic trajectory extraction on the captured RSSI observations using movement constraints. In a real-world deployment, the extracted semantic trajectory is shown on the user's mobile device while the user is still inside the building. Thus, it is essential to have an efficient real-time processing technique for semantic trajectory extraction. In this approach, we apply the constraints to three different styles of semantic indoor positioning models: HMM, kNN, and other machine learning models. All three are designed to work in real time.

HMM Using Online Viterbi with Constraints
The Hidden Markov Model (HMM) with the Viterbi algorithm can decode a hidden sequence from a sequence of observations. In this case, we use multivariate HMM (MHMM) to estimate the most likely semantic trajectory $\widehat{ST}$ from a sequence of timestamped RSSI vectors $T = \langle (\bar{t}_t, \bar{X}^{(t)}) \mid 1 \le t \le |T| \rangle$ as the input. We use MHMM because we deploy more than one beacon; thus, we need to observe multiple devices. Then, we train the MHMM model using the reference set $R$ (as a set of trajectories). We summarize the description of the components of HMM in Table 1.
Table 1. Components of the HMM.
B: Probability of beacon $m$ emitting RSSI $k$ at semantic position $s_i \in S$: $B(i, m, k) = P(s_i \mid O_{m,k})$
π: Probability of a semantic position occurring at $t = 1$
We describe the relationship between the MHMM transition probability matrix $A$ and the movement constraints in the semantic graph $SG$ in Equation (1). If a movement from a semantic position $s_i$ to $s_j$, where $s_i, s_j \in S$, is not possible, i.e., it violates the movement constraint, the transition probability from $s_i$ to $s_j$ is zero.
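The effect of the movement constraints on the transition matrix can be illustrated with a short Python sketch. It zeroes every transition that has no edge in the semantic graph's adjacency matrix. Renormalizing each row afterwards is an assumption made here so that rows remain probability distributions; the paper's Equation (1) may define the nonzero entries differently (for example, by estimating them from the reference set).

```python
def constrain_transitions(A, adjacency):
    """Zero out transitions that violate the movement constraints.

    `A` is a row-stochastic transition matrix (list of lists) and
    `adjacency` is the 0/1 adjacency matrix of the semantic graph,
    with 1 on the diagonal so that staying in place is allowed.
    """
    constrained = []
    for i, row in enumerate(A):
        masked = [p if adjacency[i][j] else 0.0 for j, p in enumerate(row)]
        total = sum(masked)
        # Assumed step: renormalize so each row still sums to 1.
        constrained.append([p / total for p in masked] if total > 0 else masked)
    return constrained


# Hypothetical three-position chain: position 0 and position 2 are not adjacent.
A = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.4, 0.5]]
ADJ = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
A_c = constrain_transitions(A, ADJ)
```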
In contrast, we set the size of the emission probability matrix to $|S| \times M \times K$, where $M$ is the number of deployed beacons and $K$ is the number of possible RSSI values, which are the negative integers and 0. We limit these values to the range from −1 to −100, where any value lower than −100 is categorized as 0. Thus, we fix the value of $K = 101$, where the additional value stands for the missing value (0).
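One way to index the $K = 101$ RSSI categories is sketched below. The index layout (0 for missing, 1 to 100 for −1 to −100) is an assumption chosen for illustration; the paper only fixes the number of categories.

```python
K = 101            # 100 RSSI levels plus one category for "missing"
MISSING_INDEX = 0  # assumed index of the missing-value category


def rssi_to_index(rssi):
    """Map a raw RSSI reading to an emission-matrix index in [0, K)."""
    if rssi == 0 or rssi < -100:
        # Missing readings and readings weaker than -100 share one category.
        return MISSING_INDEX
    if rssi > 0:
        raise ValueError("raw RSSI is expected to be 0 or a negative integer")
    return -rssi  # -1 -> 1, ..., -100 -> 100
```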
Originally, an HMM uses the Viterbi algorithm to see the full trajectory to infer the best solution from the sequence of observations. However, the traditional Viterbi algorithm cannot immediately output an optimal solution in real time because the full trajectory can only be seen after the user finishes his/her trip inside the building. Thus, we apply an online Viterbi approach similar to Reference [11] with modifications to extract the most likely subsequence given a window with length w. Before we perform the online Viterbi of the HMM with constraint, we initialize the global variables of the HMM with constraint in Algorithm 1. The global variables store the information of the previously seen observations (PrevObservations) and predicted semantic positions (ŜT). Then, we perform the online Viterbi of HMM with constraint in Algorithm 1 whenever an RSSI vector is captured.
We describe the online Viterbi of HMM with constraint (Algorithm 1) as follows. The original Viterbi is performed on the subset of the observation sequence with length $w$, from the observation at time $t - w + 1$ to the current observation at time $t$ inclusively. Consequently, we cannot perform the online Viterbi (lines 3-4) unless the timestamp of the current observation is the $(w + 1)$-th observation (line 5). Thus, defining $w$ is important: a longer $w$ should bring the result closer to the optimal solution of the full trajectory, but it also makes the prediction slower and delayed. After that, we subset the previous observations depending on whether the current observation is at the end of the trajectory (lines 6-11). Lines 12-14 represent the application of the constraint to the HMM model. For the beginning of the trajectory (predicting $t = 1$), we use $\pi$ as the initial probability by default. However, in the later part, we have already predicted our previous position as $\hat{s}^{(t-1)}$. Hence, we set $\pi$ to the transition probability of $\hat{s}^{(t-1)}$ for the Viterbi in the later part of the trajectory (line 14). Then, we compute the suboptimal solution using the Viterbi at line 16. If the trajectory does not end, we add the predicted positions one by one because we might see some change in the solution when the next observation $\bar{X}^{(t+1)}$ comes (lines 17-18). If the trajectory ends, we directly add all of the predicted positions to the predicted semantic trajectory $\widehat{ST}$. Thus, we obtain the predicted semantic trajectory $\widehat{ST}$ as the concatenated output of the online Viterbi.
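The core idea of the constrained decoding step, replacing $\pi$ with the transition row of the previously predicted position, can be sketched in Python as follows. This is a simplified illustration rather than a reproduction of Algorithm 1: window buffering and the incremental output of positions are omitted, per-state observation likelihoods are assumed to be precomputed, and all names and numbers are hypothetical.

```python
import math


def _log(p):
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(likelihoods, A, init):
    """Standard Viterbi in log space; returns the most likely state path.

    `likelihoods[t][i]` is the likelihood of observation t in state i.
    """
    n = len(init)
    score = [_log(init[i]) + _log(likelihoods[0][i]) for i in range(n)]
    backpointers = []
    for lik in likelihoods[1:]:
        prev, score, ptr = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + _log(A[i][j]))
            ptr.append(best)
            score.append(prev[best] + _log(A[best][j]) + _log(lik[j]))
        backpointers.append(ptr)
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def decode_window(likelihoods, A, pi, prev_state=None):
    """Decode one window; use the transition row of the previous prediction
    as the initial distribution once a previous prediction exists."""
    init = pi if prev_state is None else A[prev_state]
    return viterbi(likelihoods, A, init)


# Hypothetical chain 0 - 1 - 2: states 0 and 2 are not adjacent.
A = [[0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 0.5, 0.5]]
PI = [1 / 3, 1 / 3, 1 / 3]
LIK = [[0.1, 0.2, 0.7], [0.1, 0.2, 0.7]]  # both observations favor state 2

constrained_path = decode_window(LIK, A, PI, prev_state=0)
unconstrained_path = decode_window(LIK, A, PI)
```

In this toy setting, the constrained decoder reaches state 2 only through state 1, whereas the unconstrained decoder jumps to state 2 immediately.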

k-Nearest Neighbor with Movement Constraints
The k-Nearest Neighbor with movement constraints (kNN-C) estimates a semantic position $\hat{s}^{(t)}$ from the streamed aggregated observation $\langle \bar{t}_t, \bar{X}^{(t)} \rangle$ at timestamp $\bar{t}_t$, using the reference set $R'$, a semantic graph $SG(S, E)$, and the previously estimated semantic position $\hat{s}^{(t-1)}$. Given the previously estimated semantic position $\hat{s}^{(t-1)}$, we can remove irrelevant observations from the reference set $R'$. Thus, the output is always a nearby semantic position, which prevents the occurrence of invalid trajectories.
The main difference between kNN with and without the movement constraints is the search space. The kNN without constraints considers all observations in the reference set as the search space, whereas kNN-C only checks the reference instances whose semantic positions belong to $NS(\hat{s}^{(t-1)})$, the neighboring set of the previously inferred position $\hat{s}^{(t-1)}$. Therefore, applying constraints to kNN ensures search space reduction and validity of the semantic trajectory.
Example 10. In Figure 7, we apply both kNN and kNN-C with $k = 3$ for an indoor environment with two installed RSSI beacons ($M = 2$). We consider a subset of the semantic graph from Figure 4, depicted in Figure 7a. We describe the reference set $R'$ in Figure 7b. From data collection, we have two aggregated observations at each semantic position in the reference set $R'$; thus, $|R'| = 6$. Figure 7c depicts the current aggregated RSSI vector $\bar{X}^{(t)}$, the previously inferred position $\hat{s}^{(t-1)}$, and the distance computation for kNN-based methods. The current observation $\bar{X}^{(t)}$ contains two RSSI values {−65, −90} from two beacons. The previously inferred semantic position $\hat{s}^{(t-1)}$ is Corridor AB; thus, its neighboring semantic positions NS(CorridorAB) are {RoomB-South, CorridorAB}. Then, we compute the distances between the current observation $\bar{X}^{(t)}$ and the instances $\bar{X}^{(i)}$ in $R'$ labeled RoomB-South or CorridorAB as the search space. Note that $EuDist(\bar{X}^{(t)}, \bar{X}^{(i)})$ measures the Euclidean distance of the RSSI vectors between the current observation $\bar{X}^{(t)}$ and the instances $\bar{X}^{(i)}$ in the reference set $R' = \{ (s^{(i)}, \bar{X}^{(i)}) \mid 1 \le i \le |R'| \}$ and may not reflect the actual geographical distance. When we use kNN without the movement constraint, we compare $\bar{X}^{(t)}$ to all six observations in $R'$. However, by applying the movement constraint to the kNN, we have to check only the four references from NS(CorridorAB) (shaded in light gray in Figure 7c). By doing this, we reduce the search space for the comparison and ensure the validity of the extracted trajectory.
The kNN (k = 3) takes the three members of the reference set R with the smallest distance from the current measurement X̄^(t). Thus, it obtains two instances with the semantic position Corridor BD and one instance with the semantic position Room B-South as the nearest neighbors. Meanwhile, kNN-C only checks the references in R that are included in NS(CorridorAB); thus, it only considers two instances with the semantic position Room B-South and one instance with the semantic position Corridor AB as the nearest neighbors. Using a majority vote, kNN returns Corridor BD while kNN-C returns Room B-South as the estimated semantic position. The movement extracted by kNN, RoomB-North → CorridorBD, depicted in Figure 7d, violates the movement constraint from Room B-North, whereas the movement extracted by kNN-C, RoomB-North → RoomB-South, depicted in Figure 7e, does not. Thus, kNN-C provides a valid trajectory.
Algorithm 3 describes the application of movement constraints to the kNN-based method. The algorithm reduces the search space in R by the movement constraints, using the previously inferred semantic position ŝ^(t−1) as input in line 1. We perform the kNN method on the subset of R that contains only the members of NS(ŝ^(t−1)), i.e., the constrained reference set R_c in line 8. The set R_c represents the reduced search space when ŝ^(t−1) is available. Then, we perform the naïve kNN on the set R_c (line 11). If ŝ^(t−1) is not available (t = 1), which represents the beginning of the trajectory, we cannot reduce the search space and perform the naïve kNN using the original R (line 5).
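The constrained search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation (which is in Java). The RSSI values in REFERENCE are hypothetical, since the contents of Figure 7b are not reproduced here; they were chosen only so that the outcome matches Example 10. The function names and the dictionary layout of the neighboring sets are likewise assumptions.

```python
from collections import Counter
from math import dist

def knn_predict(reference, x, k):
    """Naive kNN: majority vote over the k references closest to x in RSSI space."""
    nearest = sorted(reference, key=lambda item: dist(item[1], x))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]

def knn_c_predict(reference, neighbors, x, k, prev=None):
    """kNN with movement constraints: search only references in NS(prev)."""
    if prev is None:
        # Beginning of the trajectory (t = 1): no constraint can be applied.
        return knn_predict(reference, x, k)
    allowed = neighbors[prev]
    constrained = [item for item in reference if item[0] in allowed]
    return knn_predict(constrained, x, k)

# Hypothetical reference set: two aggregated observations per semantic position.
REFERENCE = [
    ("CorridorBD", (-66, -89)),
    ("CorridorBD", (-64, -92)),
    ("RoomB-South", (-68, -88)),
    ("RoomB-South", (-70, -85)),
    ("CorridorAB", (-75, -80)),
    ("CorridorAB", (-78, -78)),
]
# Neighboring set of Corridor AB as stated in Example 10.
NEIGHBORS = {"CorridorAB": {"RoomB-South", "CorridorAB"}}

current = (-65, -90)
print(knn_predict(REFERENCE, current, k=3))
print(knn_c_predict(REFERENCE, NEIGHBORS, current, k=3, prev="CorridorAB"))
```

With these illustrative values, the unconstrained vote is dominated by the two Corridor BD references, while the constrained search only sees four references and votes for Room B-South.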

Other Indoor Positioning with Constraints
We treat an indoor positioning task as a multi-class prediction by a machine learning model. In this case, we discuss general machine learning models besides HMM and kNN. The models can vary from Deep Neural Network (DNN), Support Vector Machine (SVM), and Logistic Regression to others, as long as they can perform classification tasks. We train an indoor positioning model on R to produce a likelihood model. The model predicts a semantic position ŝ^(t) ∈ S given an input of aggregated RSSI vector X̄^(t) at time t. The output ŝ^(t) is the semantic position s = argmax_{s∈S} P(s|X̄^(t)), where ∑_{s∈S} P(s|X̄^(t)) = 1. P(s|X̄^(t)) is the likelihood of a semantic position s given the input X̄^(t) at time t, which is the result of training the indoor positioning model. Then, given the output of the trained indoor positioning model, we can ensure the validity of the extracted trajectory by adding movement constraints based on the previously predicted semantic position ŝ^(t−1). We add ŝ^(t−1) as another input to the indoor positioning; thus, we get f(X̄^(t), ŝ^(t−1)) = ŝ^(t) as the indoor positioning formula. Then, we formulate the output, i.e., the current predicted semantic position ŝ^(t), in Equation (2).
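Combining the unconstrained argmax with the movement constraint, Equation (2) can plausibly be written as below. This is a reconstruction from the surrounding definitions rather than a verbatim copy, and it assumes that the neighboring set NS(ŝ^(t−1)) contains ŝ^(t−1) itself and every semantic position reachable from it by one edge in E.

```latex
\hat{s}^{(t)} = f\left(\bar{X}^{(t)}, \hat{s}^{(t-1)}\right)
             = \operatorname*{arg\,max}_{s \,\in\, NS\left(\hat{s}^{(t-1)}\right)} P\left(s \mid \bar{X}^{(t)}\right)
```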
We only apply the likelihood of the semantic positions s when ŝ^(t−1) → s ∈ E(ŝ^(t−1)). Thus, we guarantee the extraction of a valid trajectory as output using the movement constraint.
We provide the algorithm for the constrained ML approach in Algorithm 4.

Algorithm 4: Applying a constraint to a machine learning (ML) classification task.
Set Pn_t = {<ŝ, p> | <ŝ, p> ∈ P_t ∧ ŝ ∈ Sn_t};

Note that, similar to the constrained kNN, when t = 1 at the beginning, we only perform a prediction without ŝ^(t−1) and the constraints.
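The filtering step of the constrained ML approach can be illustrated with a short Python sketch. It assumes that any classifier has already produced a likelihood P(s | X̄^(t)) for every semantic position, passed in as a dictionary; the function name, the dictionary layout, and the probabilities in the example are hypothetical and serve only as illustration.

```python
def constrained_predict(probs, neighbors, prev=None):
    """Return the most likely semantic position allowed by the movement constraint.

    probs:     dict mapping each semantic position s to P(s | X) from a classifier.
    neighbors: dict mapping a position to its neighboring set NS (assumed to
               include the position itself).
    prev:      previous estimate, or None at the start of the trajectory (t = 1).
    """
    if prev is None:
        # No previous estimate: plain argmax over all semantic positions.
        candidates = probs
    else:
        # Keep only the likelihoods of positions reachable from prev.
        candidates = {s: p for s, p in probs.items() if s in neighbors[prev]}
    return max(candidates, key=candidates.get)

# Hypothetical classifier output and neighboring set.
probs = {"CorridorBD": 0.5, "RoomB-South": 0.3, "CorridorAB": 0.2}
neighbors = {"CorridorAB": {"RoomB-South", "CorridorAB"}}

print(constrained_predict(probs, neighbors))                      # unconstrained
print(constrained_predict(probs, neighbors, prev="CorridorAB"))   # constrained
```

Because the sketch only post-processes the likelihoods, it can in principle wrap a DNN, an SVM with probability outputs, or logistic regression without retraining the model.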

Semantic Trajectory Extraction
Finally, we have the result of indoor positioning, i.e., the current estimated semantic position (t_t, ŝ^(t)). Then, we continuously concatenate (t_t, ŝ^(t)) from the beginning until the end of trajectory T as the predicted trajectory ŜT. We denote the inferred semantic trajectory by ŜT = <(t_1, ŝ^(1)), (t_2, ŝ^(2)), ..., (t_|ŜT|, ŝ^(|ŜT|))>. Note that all of the studied indoor positioning techniques operate in real time. Hence, users can see their visited semantic positions while they are still walking inside the building.
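The streaming concatenation can be sketched as a Python generator that feeds each estimate back as the previous position for the next step. The `predict` callable stands for any of the positioning techniques above; `toy_predict` is a hypothetical stand-in used only to make the example runnable.

```python
def extract_trajectory(observations, predict):
    """Yield (timestamp, estimated position) pairs as observations arrive.

    observations: iterable of (timestamp, aggregated RSSI vector).
    predict:      callable (x, prev) -> semantic position; prev is None at t = 1.
    """
    prev = None
    for timestamp, x in observations:
        prev = predict(x, prev)
        yield (timestamp, prev)

def toy_predict(x, prev):
    # Hypothetical positioning rule: report the room of the stronger beacon.
    return "RoomA" if x[0] >= x[1] else "RoomB"

stream = [(0.0, (-60, -80)), (0.5, (-70, -65))]
print(list(extract_trajectory(stream, toy_predict)))
```

Yielding one pair at a time reflects the real-time setting: each position is available as soon as its observation has been processed.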

Experimental Results
In this section, we present the setting of our experiments and the performance evaluation of our approaches. Our goal is to evaluate the efficiency and effectiveness of the constrained approach for extracting valid semantic trajectories from the installed BLE beacons in an indoor environment. In this experiment, we use a Deep Neural Network (DNN) as the base of the other ML-based indoor positioning technique. We compare our approaches, namely HMM with online Viterbi and constraints (Section 4.5.1), kNN with constraints (Section 4.5.2), and DNN with constraints (Section 4.5.3), to the baselines: the naïve kNN method, adaptive bandwidth mean shift + kNN [8], unsupervised multivariate HMM, simple DNN, and particle filter (PF). We choose the particle filter as a non-machine-learning approach to indoor positioning.
We implement all approaches from scratch in Java except for DNN, which is provided by TensorFlow 1.15. We use a machine equipped with an Intel(R) Core i7 3.6 GHz CPU and 16 GB RAM. We summarize the approaches in Table 2. Our proposed approaches are set in bold typeface in the table.

Dataset
We used real trajectory-BLE RSSI datasets from two different indoor environments: (1) the fourth floor of our campus building and (2) the first floor of a library building [13]. For the first environment, we collected three datasets with different beacon deployments and semantic position definitions. The data collection in the first environment used Beabig BLE beacons with the Bluno firmware v. 1.8 [21] and an Android OS 8.10 smartphone with 3 GB RAM, 1.6 GHz processor, and sampling rate of 200 ms each. The semantic graphs of PNU1, PNU2, and PNU3 are depicted in Figure 8a-c, respectively. The PNU1, PNU2, and PNU3 datasets have different definitions of semantic positions and beacon placements.
For the second environment, we modified the iBeacon dataset (iBeacon) to suit our case because this dataset was originally intended for stay estimation. We defined the semantic positions as depicted in Figure 8d. We also defined the neighboring set of each semantic position by 8-direction connectivity, for example, NS(C3) = {B2, B3, B4, C2, C3, C4, D2, D3, D4}. We extracted trajectories from the iBeacon dataset using time and area thresholds, as it provides timestamps for each recorded BLE observation. Even though the dataset is provided for stay estimation, it captures some movements that can be used for our problem setting.
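The 8-direction connectivity can be sketched in Python as follows. The grid extent (rows A to E, columns 1 to 5) is a hypothetical choice for illustration and is not taken from the dataset; only the NS(C3) example comes from the text.

```python
def neighboring_set(cell, rows, cols):
    """8-direction neighboring set of a grid cell such as "C3", including the cell itself."""
    r = rows.index(cell[0])
    c = cols.index(int(cell[1:]))
    result = set()
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            # Skip offsets that fall outside the grid.
            if 0 <= rr < len(rows) and 0 <= cc < len(cols):
                result.add(f"{rows[rr]}{cols[cc]}")
    return result

ROWS = "ABCDE"           # hypothetical grid extent
COLS = [1, 2, 3, 4, 5]   # hypothetical grid extent

print(sorted(neighboring_set("C3", ROWS, COLS)))
# ['B2', 'B3', 'B4', 'C2', 'C3', 'C4', 'D2', 'D3', 'D4']
```

Cells on the border of the grid simply get a smaller neighboring set, for example four cells for a corner.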
A detailed summary of all datasets is given in Table 3. We took a subset of the trajectories as the reference or training set and used the others as the test set. We performed the trajectory extraction on the test set to measure the quality of our approach. Note that the numbers in the reference and the test set represent the number of trajectories, while the total length denotes the total number of semantic positions in all trajectories (reference + test). Note that we use the PNU3 dataset to assess the performance of our approach in a dynamic environment. Here, a dynamic environment means that the number of people walking in the indoor space varies: the PNU3 dataset consists of three settings with 5, 10, and 13 people. The dynamic environment experiment is conducted as follows. We use the 10-people dataset as the reference set and training set. Then, we test the trained model on two different test sets (the 5- and 13-people test sets). We compare the performance to that obtained with the original training set for each number of people. Given that setup, we can observe the performance of the indoor positioning techniques in a dynamic environment.
Before performing the main experiments, we study the effect of the weak RSSI threshold on the indoor positioning techniques. We use the kNN and kNN with constraints methods with the PNU1 dataset for this preliminary study. Based on the result of the preliminary experiment, we use the best value of the threshold to perform the main experiments.

Parameter Setup
Our approach requires some parameters for the aggregation window and the indoor positioning method. For brevity, we have not included the study of the aggregation window parameters. Thus, we fix the aggregation parameters: aggregation window length (l) = 1 s, sliding length (l^+) = 0.5 s, and the max aggregation function. The max function takes the strongest RSSI value of each beacon in the aggregation window.
We do not use the mean function because a large portion of the datasets contains missing observations. Performing mean on a sequence with missing values yields similar statistical information to that of the max function (averaging one or two nonzero values).
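A minimal Python sketch of the sliding-window max aggregation with these parameters is shown below. The input layout (timestamp in seconds, beacon index, RSSI), the use of the window start as the aggregated timestamp, and the use of None for a beacon that was not heard in a window are assumptions made for illustration.

```python
def aggregate_max(readings, num_beacons, window=1.0, slide=0.5):
    """Aggregate raw readings into one RSSI vector per sliding window.

    readings: list of (timestamp_seconds, beacon_index, rssi), sorted by timestamp.
    Returns a list of (window_start, vector), where vector[b] is the strongest
    RSSI of beacon b in [window_start, window_start + window), or None if the
    beacon was not observed (assumed representation of a missing value).
    """
    if not readings:
        return []
    aggregated = []
    start = readings[0][0]
    last = readings[-1][0]
    while start <= last:
        vector = [None] * num_beacons
        for ts, beacon, rssi in readings:
            if start <= ts < start + window:
                if vector[beacon] is None or rssi > vector[beacon]:
                    vector[beacon] = rssi
        aggregated.append((start, vector))
        start += slide
    return aggregated

raw = [(0.0, 0, -70), (0.2, 1, -90), (0.4, 0, -65), (0.8, 1, -85), (1.2, 0, -72)]
for window_start, vector in aggregate_max(raw, num_beacons=2):
    print(window_start, vector)
```

The sketch rescans all readings for every window, which keeps it short; a streaming implementation would maintain a buffer of the readings inside the current window instead.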
We compare each indoor positioning method with its constrained counterpart (except for HMM). For the HMM, we compare the constrained approach with the basic unsupervised approach, which sees the full trajectory (not suitable for real-time environments). We study the behaviour of HMM-C by varying its window length. We use naïve kNN and ABMS + kNN [8] as the baseline approaches for kNN because they yield good results in 2D positioning cases. However, these methods do not perform well in the real-time setting. Thus, we also study their efficiency. Additionally, we cannot directly apply our movement constraints technique to ABMS + kNN because ABMS + kNN requires 2D coordinates. Instead, we convert the final result of kNN in ABMS + kNN into the closest semantic position, i.e., the one whose centroid has the smallest distance to the inferred position. Table 4 describes the parameter settings used in the experiment. ABMS + kNN requires additional parameters, i.e., bandwidth and number of groups, which were set to 10 and 20, respectively. The particle filter uses 100 particles, a walking speed of 1.2 m/s, and Gaussian noise with 0 mean and 0.3 standard deviation. We report the behaviour of the approaches according to the varied parameters.

Evaluation Metrics
To measure the performance of each approach, we used three metrics in terms of effectiveness and validity. We applied these metrics to an extracted semantic trajectory ŜT and its respective ground truth ST, where ST = <(t_t, s^(t))>, ŜT = <(t_t, ŝ^(t))>, and 1 ≤ t ≤ |ST|.

Error
We used a simple classification error to measure the correctness of the predicted semantic trajectory ŜT with respect to the ground truth ST. Both have the same length |ST|. The formula of error is given in Equation (3).
The value of Err ranges from 0 to 1, where 0 stands for an entirely correct prediction for a single trajectory whereas 1 is obtained if the positioning method mispredicts all semantic positions to be different semantic positions.
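Based on this description, the error can be read as the fraction of timestamps at which the predicted semantic position differs from the ground truth. The following Python sketch follows that reading; it is an interpretation of the stated range and meaning, not a transcription of Equation (3), and the function name is hypothetical.

```python
def classification_error(predicted, truth):
    """Fraction of positions in the predicted trajectory that differ from the ground truth."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("trajectories must be non-empty and of equal length")
    mismatches = sum(p != s for p, s in zip(predicted, truth))
    return mismatches / len(truth)

truth = ["RoomA", "Corridor", "RoomB", "RoomB"]
predicted = ["RoomA", "RoomB", "RoomB", "RoomB"]
print(classification_error(predicted, truth))  # 0.25
```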
Note that we are not only interested in whether positions are predicted correctly, i.e., in the smallest possible error. We also want to carry over the positioning error commonly used in 2D indoor positioning to semantic positions. The positioning error should measure how far apart the predicted position is from the ground truth. In the semantic position case, this relationship is reflected by the semantic distance d_SG(ŝ^(t), s^(t)). Thus, the error alone is not sufficient to capture the effectiveness of semantic positioning. Hence, we define another metric called Semantic Positioning Error Rate (SPER).

Semantic Positioning Error Rate
First, we define the semantic distance d_SG(s_i, s_j) between two semantic positions s_i, s_j ∈ S as the minimum number of traversed edges from s_i to s_j in the semantic graph SG. We can compute the distance using the Dijkstra algorithm, given s_i as the source and s_j as the destination. Thus, we can state that, at time t_t, an inferred semantic position ŝ^(t) is correct if its semantic distance to the ground truth s^(t) is zero (d_SG(ŝ^(t), s^(t)) = 0). As d_SG(ŝ^(t), s^(t)) grows beyond 0, the predicted semantic position gets farther from the ground truth.
As a measure of effectiveness, we compute the relative ratio of the semantic distance of the predicted semantic position ŝ^(t) to the maximum possible distance/misprediction, given the ground truth s^(t). We call this metric the semantic positioning error rate, SPER(ŝ^(t), s^(t)), which is described in Equation (4). The value of SPER ranges from 0 to 1, where 0 stands for an entirely correct prediction for a single trajectory whereas 1 is obtained if the positioning method mispredicts all semantic positions as the farthest possible semantic positions.
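The following Python sketch illustrates one way to compute the semantic distance and SPER under the description above. It assumes an undirected, connected semantic graph given as an adjacency dictionary, uses breadth-first search (which yields the same hop counts as Dijkstra on an unweighted graph), and averages the per-timestamp ratio over the trajectory. The averaging and the function names are assumptions, since Equation (4) is only described in words here.

```python
from collections import deque

def hop_distances(graph, source):
    """Minimum number of edges from source to every reachable node (BFS)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def sper(predicted, truth, graph):
    """Mean ratio of semantic distance to the farthest possible misprediction."""
    total = 0.0
    for p, s in zip(predicted, truth):
        distances = hop_distances(graph, s)
        # Farthest semantic position from the ground truth s (assumes > 1 node).
        total += distances[p] / max(distances.values())
    return total / len(truth)

# Hypothetical semantic graph: a simple path A - B - C - D.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(sper(["A", "D", "C"], ["A", "B", "C"], graph))
```

In the example, only the second prediction is wrong, and D is the farthest possible position from B (2 hops out of a maximum of 2), so that timestamp contributes a ratio of 1 and the trajectory-level value is 1/3.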

Validity
The validity measures the "jumpy"-ness of the extracted trajectory. We do not use ground truth ST as it is not related to the distant movement of the extracted trajectory. Similar to SPER, if the consecutive predicted semantic positions are too far, that trajectory is considered to be less valid. We define this measure as V(ŜT) in Equation (5).
The value of V(ŜT) ranges from 0 to 1, where 1 stands for a valid trajectory and 0 is obtained if all of the consecutive semantic positions are the pairs of the farthest possible semantic positions.
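A possible computation of the validity is sketched below in Python. Equation (5) is only described in words here, so the exact penalty is an assumption: the sketch gives no penalty to consecutive positions that are at most one hop apart (a move allowed by the semantic graph), gives a penalty of 1 when the next position is the farthest possible one, and scales linearly in between. The graph is assumed to be undirected and connected.

```python
from collections import deque

def hop_distances(graph, source):
    """Minimum number of edges from source to every reachable node (BFS)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def validity(predicted, graph):
    """1.0 for a trajectory without jumps, 0.0 if every step is a farthest-possible jump."""
    if len(predicted) < 2:
        return 1.0
    penalty = 0.0
    for prev, cur in zip(predicted, predicted[1:]):
        distances = hop_distances(graph, prev)
        farthest = max(distances.values())
        if farthest > 1:
            # Assumed penalty: hops beyond the first, relative to the worst case.
            penalty += max(0, distances[cur] - 1) / (farthest - 1)
    return 1.0 - penalty / (len(predicted) - 1)

# Hypothetical semantic graph: a simple path A - B - C - D.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(validity(["A", "B", "C", "D"], graph))
print(validity(["A", "D", "A", "D"], graph))
```

Under this assumed form, any trajectory produced with the movement constraints scores exactly 1, because every consecutive pair is within one hop.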

Results
First, we discuss the effect of the threshold on the performance of an indoor positioning technique. Then, we observe the performance of each indoor positioning technique with varied parameters on SPER. Note that we also compare the efficiency of the kNN-based methods by search space reduction and the computation time in the online phase. The search space reduction does not affect the HMM and DNN methods, unlike kNN, because they estimate the semantic positions using the trained model rather than looking at the reference set. Thus, their computation time would not change significantly. Then, we discuss the difference between the error and SPER on different datasets using each method. Next, we examine the validity of the trajectory extracted via each method. After that, we examine the performance of the indoor positioning techniques with constraints in a dynamic environment.

RSSI Value Thresholding
Previously, in Section 3.2, we mentioned that weak RSSI may occur under several circumstances. However, weak RSSI values might be useful for ML-based indoor positioning to learn the weak RSSI characteristics. We present the result of this preliminary study using kNN-based methods and the PNU1 dataset in Figure 9. We denote no threshold as "X" in the chart. From Figure 9, it is clear that the threshold value of −100 gives the best performance. In contrast, larger thresholds (≥ −95) did not improve the performance of indoor positioning because they ignore the weak values of RSSI. Similarly, weaker thresholds or no threshold give only a slight enhancement to the performance. Hence, we use the threshold value of −100 in the next experiments.

HMM
We compare the effectiveness of HMM with online Viterbi and constraints (HMM-C) to unsupervised MHMM (HMM) to determine the consistency between the suboptimal results yielded by the online approach with constraint and the optimal results by the original MHMM. Figure 10 shows the performance of HMM-based methods based on SPER. Note that the original approach gives straight lines as we perform the estimation using the full trajectory. It is evident that the online MHMM does not exhibit any significant difference if the subsequence window length w is increased. However, we do not want to set this parameter at a large value as it delays the prediction time. Overall, HMM-C is comparable to its optimal original approach except in dataset PNU1. This indicates that HMM-C is almost as effective as the original HMM even though it does not see the full trajectory.

kNN
We compare the performance of the kNN-based indoor positioning techniques in terms of effectiveness (SPER) and efficiency (search space reduction and average computation time). Figure 11 presents the efficiency of the three kNN-based methods, including the reduced search space in each dataset. In this experiment, we fix the number of neighbors (k) to 10 to estimate a semantic position. With the efficiency, we confirm the feasibility of each method in real time. It is evident that ABMS + kNN consumes the longest computation time to perform an estimation, followed by a significantly shorter time for kNN, and the shortest time for kNN-C. The search space reduction by the constraints (Figure 11a) highly influences the computation time of kNN-C (Figure 11b). The computation time largely decreases from the original kNN because kNN-C reduces the reference set/search space size owing to the movement constraints from the previously inferred position. When inferring the current position ŝ^(t), kNN-C only looks at the neighboring semantic positions of the previous position ŝ^(t−1) instead of observing all instances of the reference set. ABMS + kNN performed poorly in terms of efficiency because it repeats kNN until the inferred position no longer changes, i.e., converges. The prediction time of ABMS + kNN is nearly 10× that of kNN. Even in the largest dataset, PNU2, it needs more than 6 s to predict a single position. Thus, we can conclude that ABMS + kNN is not suitable for real-time inference due to the long prediction time. This result shows that the constraints significantly reduce the cost of computation for indoor positioning.

Table 5 depicts the comparison of the kNN methods based on the change of k. It is evident that, by using a smaller value of k, kNN-C achieves the best result. However, with a larger value of k in datasets PNU1 and PNU2, both kNN and ABMS + kNN outperform kNN-C. This underperformance arises mainly because the number of neighbors significantly affects the performance of kNN, especially the constrained variant. In kNN-based methods, if we consider more neighbors for the indoor positioning, we are more prone to irrelevant references. Using the constraints, the number of relevant references is reduced significantly because the references are divided into subsets of the neighboring areas of the previously estimated position. Considering fewer neighbors is also slightly beneficial for efficiency, because fewer references are involved in inferring a semantic position.

Deep Neural Network
We used simple Deep Neural Networks (DNN) as the other ML-based indoor positioning method. We compared the performance of the constrained and original DNN in terms of SPER. We averaged the results from 10 repeated experiments for each parameter (hidden layers and datasets). Figure 12 shows that the constraints improve the SPER on nearly all datasets. However, the original DNN outperformed DNN-C on the PNU1 dataset with three hidden layers. On the other hand, the number of hidden layers of the DNN does not appear to be related to the SPER.

Error vs. SPER
The error and SPER measure effectiveness slightly differently in indoor semantic positioning. As described before, error does not consider the distance between the predicted semantic position ŝ^(t) and the ground truth s^(t) at time t. We fixed the parameters of each method according to the default values in Table 4 (HMM window length = 3, k = 10, and DNN hidden layers = 4). Figure 13 presents the error and the SPER of all studied methods, except the naïve kNN and the unsupervised multivariate HMM for brevity. It is evident that, for a particular method and dataset, error and SPER do not have a linear relationship. This is because SPER highly depends on the maximum distance from a single semantic position in the semantic graph, which varies across indoor environments. The longer the longest distance in a semantic graph, the more likely SPER is to be lower. In contrast, error does not depend on any characteristic of the semantic graph.

The error analysis in Figure 14 shows more details about the performance of each method and the reason for using SPER instead of error. Figure 14a-c shows the count distribution of the hops needed to travel from the correct semantic position to the predicted semantic position in the semantic graph (semantic distance) for each dataset. We only show the counts of the four smallest distances in the figure because the number of ground truth and predicted semantic position pairs gets significantly smaller for larger distances. We see that ML-based methods mispredict the majority of the semantic positions with a distance of 1 hop in all datasets, except for HMM-C in dataset PNU1. If we use only error, we cannot capture this detail because error only considers the count of 0-hop distances between the ground truth and predicted positions. Figure 14d-f shows boxplots of the semantic distance from the correct semantic position to the predicted semantic position for each dataset. The boxplots reflect similar insights to the error distribution: the majority of the mispredictions have a 1-hop distance. The indoor positioning techniques still predict a small number of positions that are far from the ground truth as outliers, except for the particle filter in the PNU1 and PNU2 datasets. Thus, it is evident that the particle filter performed poorly in every dataset.

Furthermore, most of the methods perform well in the PNU2 dataset. This is plausible because PNU2 has a larger dataset for training and a simpler semantic graph than the other two datasets. In contrast, HMM generally performs worse than the other methods, except in PNU2. This means that the HMM approach may not perform well with a smaller dataset and a more complex setting. Also, most of the methods outperformed the particle filter in dataset PNU2. From the dataset description in Table 3, PNU2 has more observation points than the PNU1 and iBeacon data. In other words, on average, a trajectory in dataset PNU2 is longer (has more points) than a trajectory in datasets PNU1 and iBeacon. Thus, we can conclude that the particle filter method performed worse than ML-based indoor positioning for longer trajectories.

Validity
Figure 15 shows the validity of the unconstrained and constrained approaches, including the unsupervised MHMM (HMM). Evidently, the constrained approaches extract valid trajectories (validity is always 1) owing to the movement constraints. The movement constraints ensure a validity of 1 because they restrict consecutive inferred positions to be close to each other. The HMM and particle filter also extract valid trajectories. HMM always extracts valid trajectories as it is the more optimal form of the HMM with constraints. Meanwhile, the particle filter infers the positions by simulating the user movements according to the walking speed parameter. Thus, the particle filter provides valid trajectories by default. It can also be observed that the unconstrained approaches yield invalid trajectories, but with validity in the range (0.85, 1]. Hence, the distance between consecutive semantic positions predicted by the unconstrained approaches is not too large.

Figure 16 shows the performance of the indoor positioning methods in a dynamically changing environment using dataset PNU3. We omit the original kNN and HMM again for brevity. We denote each result in (training)-(test) notation. For example, in the chart, the "10people-5people" result means we train the model using the 10-people setting and test the model using the 5-people setting. The particle filter gives exactly equal results because it does not train any model and works directly on the test set. Interestingly, most of the methods give similar or improved results in the dynamic setting (10 people-5 people and 10 people-13 people) compared with the original-setting counterparts (5 people-5 people and 13 people-13 people), except for HMM-C, which significantly improves the performance of the 13-people setting using the 10-people dataset. We can infer that a carefully selected environment as the training set can improve the performance of indoor positioning in a dynamic environment. On the other hand, we see that the DNN without constraints does not work well in either setting. However, the DNN with constraints significantly improves the result. Meanwhile, the particle filter works well in the 5-people setting, but it is slightly outperformed by the ML-based methods. The underperformance of the particle filter is clearly shown in the 13-people setting. The ABMS + kNN method outperformed kNN with constraints in the 13-people setting, whereas they show a similar result in the 5-people setting. However, note that ABMS + kNN does not always give valid results, unlike kNN with constraints, and requires more processing time.

Discussion
Our experiments show that, among different indoor positioning methods, the application of movement constraints yields valid semantic trajectory extraction. Ensuring the validity of the trajectory may or may not affect the correctness of the results, represented by SPER and error. For example, the result of ABMS + kNN is better than kNN-C in the PNU2 dataset but it produces a less valid trajectory.
Even though particle filter, as non-ML-based indoor positioning, gives valid trajectory, it underperforms the ML-based indoor positioning results in almost every dataset. In addition, the particle filter requires heavy computation due to the resampling technique.
In contrast, the constraints in DNN improved the correctness of the indoor positioning. A perfect indoor trajectory estimation (SPER and error = 0) should provide a valid trajectory. However, this erroneous estimation is still inevitable due to the obstructions in indoor environment. Thus, a better RSSI data quality enhancement would improve the quality of the indoor positioning. Note that the RSSI data enhancement effort should not burden the computation to work in real time. On the other hand, we show that our approaches worked in different types of indoor environments, considering the various numbers of installed beacons, semantic positions, indoor layouts, movement constraints, and the number of people in the indoor space.

Conclusions
In this paper, we presented movement constraints to extract valid indoor semantic trajectories using BLE beacons. We extended some indoor positioning techniques using the proposed movement constraints to prevent the prediction of any semantic position distant from the previous estimation, which would otherwise result in an invalid trajectory. We conducted comprehensive experiments in four different indoor settings.
Our experiments demonstrated that the proposed movement constraint-based approaches extract valid trajectories that are comparable to the unconstrained and non-ML approaches. On the other hand, we also show that our proposed approach can handle a dynamic indoor environment.
For all approaches, the proposed methods with constraints yielded a positioning quality comparable to that of their non-constrained counterparts. The online HMM with constraints provides a performance similar to that of its original counterpart. For kNN, the movement constraint, in addition to improving the correctness, also increased the efficiency by 60-70% and the speed by 1.5 times. Likewise, the constraints improved the DNN in both correctness and validity.
In the future, we plan to improve the quality of RSSI data from the beacons and to discover interesting patterns from the extracted semantic indoor trajectory.
Author Contributions: H.R. wrote the original and revised version of the paper and the study, collected datasets, and conducted the experiments. Y.Y. contributed for the initial study and data collection. J.K. supervised the work and revised the paper. All authors have read and agreed to the published version of the manuscript.