System operation proceeds as follows. Once the localization system has produced the primary estimated trajectory, map matching is carried out by applying the CRF model, which also uses information obtained from the map, to produce a refined (corrected) trajectory. The goal is to estimate the most feasible trajectory given the constraints imposed by the map, such as walls and other obstacles. These constraints are supplied by a semantic map generation system, a separate unit that extracts map data from CAD files and represents it in a model that the algorithm can use.
2.1. The Semantic Map Generation System
Indoor maps are available in different formats, such as images, PDF files, or CAD files. These formats cannot be used directly by the map matching algorithm, so two tasks must be performed: the map information must first be extracted from the available map files, and it must then be represented in a format that the map matching algorithm can use. The most commonly used map models in indoor localization are grid models and graph models [2]. In grid models, the space is partitioned into regular cells with semantics. Graph models [14] reconstruct the space as a graph in which nodes represent entities or places of interest in the building. In [12], for example, the authors obtained graphs from maps in image formats using standard edge detection algorithms to extract edges from the image; a grid-based map model with a cell size of 0.8 m was then used to model the map information. In our system, we preferred to use CAD files because CAD is very widely used in architectural design, and we modelled the map using a grid-based model.
The semantic map generation unit extracts the map data from the CAD files and represents it in a semantic model suitable for the map matching algorithm. We use CAD data encoded in the Drawing Interchange File format (DXF). DXF files are standard ASCII text files that CAD software can export and that can be easily read by other programs [15,16]. However, map information obtained from DXF files is not directly suitable for localization applications, as it lacks the semantic information that allows computers to understand the architectural structure of the building and to distinguish between different architectural objects such as walls and stairs [17]. Instead, DXF-derived map information comprises only line, curve, circle, and polyline drawing data, and therefore requires further processing before it can be used by the proposed map matching algorithm. We extract this simple entity information from the DXF files and place it in our map model, a grid cell representation of the floor plan that divides the map into square cells and determines the possible transitions of the pedestrian from one cell to another depending on the existing obstacles.
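The step of placing line entities into the grid can be illustrated with a minimal sketch. It assumes that the endpoints of the wall segments have already been read from the DXF file (for example, from LINE entities) and are given in metres; the function name `rasterize_segments`, its parameters, and the sampling density are hypothetical choices made for illustration and are not taken from the system described here.

```python
import math

def rasterize_segments(segments, width_m, height_m, cell_size):
    """Mark the grid cells crossed by wall segments as occupied.

    segments: list of ((x1, y1), (x2, y2)) endpoints in metres, assumed to
    lie inside the floor plan extents [0, width_m] x [0, height_m].
    Returns a list of rows; occupied[row][col] is True for obstacle cells.
    """
    n_cols = math.ceil(width_m / cell_size)
    n_rows = math.ceil(height_m / cell_size)
    occupied = [[False] * n_cols for _ in range(n_rows)]
    for (x1, y1), (x2, y2) in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        # Sample the segment every quarter of a cell (an approximation:
        # a cell that is only clipped at a corner may be missed).
        steps = max(1, math.ceil(length / (cell_size / 4)))
        for s in range(steps + 1):
            t = s / steps
            x = x1 + t * (x2 - x1)
            y = y1 + t * (y2 - y1)
            # Points on the far boundary are clamped into the last cell.
            col = min(int(x // cell_size), n_cols - 1)
            row = min(int(y // cell_size), n_rows - 1)
            if row >= 0 and col >= 0:
                occupied[row][col] = True
    return occupied
```

Curves, circles, and polylines would need to be approximated by short segments before being passed to such a routine.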
As we will explain in Section 2.2, the CRF model is a classifier that requires pre-defined states/labels. The proposed map model is designed for this purpose: it represents the map as a set of square cells, each covering a square area of the building, and each cell serves as a state/label that the algorithm can use to specify the position of the pedestrian. Each cell has its own characteristics: it either represents free space in the building or is occupied by an obstacle such as a wall or furniture. To represent state transitions, each cell knows its neighbor cells and the neighbors of its neighbor cells. A transition graph/table is generated in which a transition is possible only between neighbor cells that do not contain any obstacle; for more flexibility, a transition to a neighbor of a neighbor cell is also allowed, provided that no obstacle impedes that transition. The search region for the next location from the current location is known as the buffer [2], and the maximum transition distance allowed in each direction is known as the buffer size. Allowing transitions to neighbor cells and neighbors of neighbors results in a buffer size of 2 cells in each direction, as shown in Figure 2a. Figure 2b presents an example of possible transitions based on the existence of obstacles.
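A transition table of this kind could be built along the following lines. This is a sketch under stated assumptions rather than the implementation used in the system: the obstacle test samples the straight line between the two cell centres, staying in the same cell is treated as an allowed transition, and the names `build_transitions` and `_path_is_free` are hypothetical.

```python
def build_transitions(occupied, buffer_size=2):
    """Return {cell: set of reachable cells} for a grid of booleans.

    A cell is a (row, col) tuple. A transition is allowed when the target
    lies within buffer_size cells in each direction and every cell sampled
    on the straight line between the two cells is free (assumed test).
    """
    n_rows, n_cols = len(occupied), len(occupied[0])
    transitions = {}
    for r in range(n_rows):
        for c in range(n_cols):
            if occupied[r][c]:
                continue  # obstacle cells are not valid states
            reachable = set()
            for dr in range(-buffer_size, buffer_size + 1):
                for dc in range(-buffer_size, buffer_size + 1):
                    tr, tc = r + dr, c + dc
                    if not (0 <= tr < n_rows and 0 <= tc < n_cols):
                        continue
                    if _path_is_free(occupied, r, c, tr, tc):
                        reachable.add((tr, tc))
            transitions[(r, c)] = reachable
    return transitions

def _path_is_free(occupied, r, c, tr, tc):
    """Check the cells sampled between (r, c) and (tr, tc), ends included."""
    steps = 2 * max(abs(tr - r), abs(tc - c), 1)
    for s in range(steps + 1):
        rr = round(r + (tr - r) * s / steps)
        cc = round(c + (tc - c) * s / steps)
        if occupied[rr][cc]:
            return False
    return True
```

With a wall cell between two free cells, neither free cell can reach the other, even though they are within the buffer.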
2.2. The Map Matching Algorithm
A CRF model is used as the base model in the developed system. CRFs are undirected probabilistic graphical models developed for labelling data [8]; given an input set of observations $\mathbf{x}$, they are used to predict a vector of hidden variables $\mathbf{y}$. Unlike generative models such as HMMs, which model the joint probability $p(\mathbf{x}, \mathbf{y})$ and apply Bayes' rule, CRFs are discriminative models that model the conditional distribution $p(\mathbf{y} \mid \mathbf{x})$ over the hidden variables $\mathbf{y}$ given the observation vector $\mathbf{x}$; a comparison of generative and discriminative models can be found in [18]. In linear chain CRFs, a special form of CRF graph that models the output variables as a sequence [9], the conditional probability of states given observations, $p(\mathbf{y} \mid \mathbf{x})$, is proportional to the product of potential functions that link observations to consecutive states.
Figure 3 shows a representation of the linear chain CRF model. The hidden states $y_j$ are dependent on the input observation vector $\mathbf{x}$; each hidden state depends not on one input value but on the whole input vector. As a result, a hidden state can be affected by input observations from different time steps, which is considered an advantage of the CRF technique.
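For reference, the proportionality described above corresponds to the standard textbook form of a linear chain CRF, written here as a sketch. The notation is assumed for illustration: $\mathbf{x}$ is the observation vector, $\mathbf{y} = (y_1, \dots, y_T)$ is the hidden state sequence over $T$ time steps, $\psi_j$ is the potential function at time step $j$, and $Z(\mathbf{x})$ is the normalization factor obtained by summing over all possible state sequences.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})} \prod_{j=1}^{T} \psi_j(y_{j-1}, y_j, \mathbf{x}),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'} \prod_{j=1}^{T} \psi_j(y'_{j-1}, y'_j, \mathbf{x})
```

Because every potential takes the whole vector $\mathbf{x}$ as an argument, a state at one time step can draw on observations from any other time step.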
In the map matching problem, the hidden state vector $\mathbf{y}$ represents the sequence of locations to be calculated, i.e., the corrected walking trajectory; our linear chain CRF algorithm uses the cells in the map model as the hidden states/labels. The vector $\mathbf{x}$ represents the system input, which can be either the coordinates of the locations on an estimated trajectory obtained by some localization system, as in the case of our algorithm, or direct measurements from sensors.
The CRF algorithm consists of two phases: the forward phase and the backward phase. Once the conditional probabilities of all cells at all time steps have been calculated during the forward phase, inference is carried out in the backward phase, in which the optimal trajectory is chosen from among the candidate trajectories.
During the forward phase, the CRF algorithm evaluates the possible transitions at each time step according to the input trajectory and the transition graph obtained from the map model; this involves calculating the probabilities of transition from all cells of the current time step to all cells of the next time step. A probability value is assigned to each cell in the map at each time step; this value represents how probable it is that the pedestrian is located in that cell at that time step (for example, the probability of pedestrian movement to a cell occupied by a wall is zero). This probability is also a conditional probability calculated using so-called feature functions that compute to what degree the input observations support the choice of a cell to be on the trajectory at that time step. However, cell selection depends not only on its probability value at the current time step, but also on the previous time steps, i.e., the path history, and how this cell is related to others in the path. Hence, choosing the cell with the highest probability is not enough; the probability of a whole trajectory should be calculated.
The feature functions specify how the transition between two states is supported by the set of observations $\mathbf{x}$. The potential function of a cell is calculated at each time step as the exponential of the sum of all feature functions that support the selection of this cell, where each feature is multiplied by the transition probability of moving to this cell from the current cell at the current time step. The higher the potential function value, the higher the probability that the cell is the next cell. At each time step $j$, the potential function is the exponential of the weighted sum of all feature functions $f_i$ at that time step, and can be written as:

$$\psi_j(y_{j-1}, y_j, \mathbf{x}) = \exp\left(\sum_{i=1}^{m} \lambda_i f_i(y_{j-1}, y_j, \mathbf{x}, j)\right)$$

where $m$ is the number of features and $\lambda_i$ is the feature weight that can be determined by training the model. The conditional probability $p(y_j \mid \mathbf{x})$ of each cell is calculated by normalizing the potential function as follows:

$$p(y_j \mid \mathbf{x}) = \frac{\psi_j(y_{j-1}, y_j, \mathbf{x})}{Z_j(\mathbf{x})}$$

where $Z_j(\mathbf{x})$ is the normalization factor and $N$ is the number of output states/cells, with

$$Z_j(\mathbf{x}) = \sum_{k=1}^{N} \psi_j(y_{j-1}, y_j = k, \mathbf{x})$$
If N is the number of cells, then evaluating every pair of cells would require the potential function to be calculated N × N times at every time step. However, because a transition can only happen between neighbor cells, one need only calculate the potential function for those neighbors. Each cell has at most a fixed number of neighbors (24 for a buffer size of 2), so the number of calculations at each time step is O(N) and the complexity of the whole procedure is O(NT), where T is the number of time step observations.
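The calculation for a single time step, restricted to the candidate cells, can be sketched as follows: each potential is the exponential of the weighted sum of the feature values, and the conditional probabilities are obtained by dividing by the sum of the potentials. The function name `conditional_probabilities` and the dictionary layout are assumptions made for illustration.

```python
import math

def conditional_probabilities(feature_values, weights):
    """Normalize the potentials of the candidate cells at one time step.

    feature_values: {cell: [f_1, ..., f_m]} feature values per candidate cell.
    weights: [lambda_1, ..., lambda_m] feature weights.
    Returns {cell: conditional probability}; the values sum to 1.
    """
    potentials = {
        cell: math.exp(sum(w * f for w, f in zip(weights, feats)))
        for cell, feats in feature_values.items()
    }
    z = sum(potentials.values())  # normalization factor for this time step
    return {cell: p / z for cell, p in potentials.items()}
```

Only the cells passed in `feature_values` are evaluated, so limiting that dictionary to the cells inside the buffer keeps the cost per cell bounded.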
For each observation, each cell stores only one conditional probability value (the highest), while the previous cell that gave this value is also stored as the best parent. This is necessary for the following inference step.
During the backward phase, inference is performed in order to estimate the location over time: the most likely sequence of hidden states is found by maximizing the sum of the conditional probabilities. After the potential function of all cells has been calculated for each observation, the optimal path is determined. This is carried out via a backward process using dynamic programming and the Viterbi algorithm [19]. The optimal path is obtained by maximizing the sum of conditional probabilities along the path:

$$\hat{\mathbf{y}} = \arg\max_{\mathbf{y}} \sum_{j=1}^{T} p(y_j \mid \mathbf{x})$$
Although this process might be assumed to be of high complexity, as we need to choose between all possible trajectories, it is actually linear because for each time step we only save one value of the conditional probability for each cell. Hence, the number of candidate paths equals the number of cells (N). The backward process is O(NT): for each cell, we start a backward pass by determining its best parent, which is the neighbor cell with the highest conditional probability at the previous time step, and then sum the conditional probabilities along the path until the first parent in the path is reached. The path with the highest sum is then chosen. The forward and backward phases of the map matching CRF algorithm are shown in Algorithm 1.
Algorithm 1. CRF Algorithm for the Map Matching Problem
Input: Observation = a vector of coordinates of the input estimated trajectory
Output: CorrectedPath = a vector of coordinates of the output corrected trajectory
Forward Phase:
For each observation (Observation_j)    % Observation_j = coordinates of the input at time step j
    For all cells (i)
        For all neighbor cells of i (k)
            fdis = T(i,k) / Distance(k, Observation_j)    % T(i,k): transition possibility from cell i to cell k, in {0,1}
            Potential(k,j) = exp(fdis)
            Z = sum(Potential(:,j))    % normalization factor
            ConditionalProbability(k,j) = Potential(k,j) / Z
            If ConditionalProbability(i,j-1) > ConditionalProbability(BestParent(k,j), j-1)
                then BestParent(k,j) = i
            End
        End
    End
End
Backward Phase:
#CandidatePaths = #Cells
For p = 1 ➜ #CandidatePaths    % p is the last cell in the CandidatePath
    k = p    % k is the current cell in the CandidatePath
    Construct each CandidatePath:
    For all observations (j ➜ 1)
        CandidatePath(j) = k    % add cell k to the path at time step j
        sum(CandidatePath) = sum(CandidatePath) + ConditionalProbability(k,j)
        k = BestParent(k,j)    % choose the best parent of k to be the next cell in the path
    End
End
CorrectedPath = CandidatePath with highest sum of ConditionalProbabilities
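The two phases can be sketched in runnable form as follows. This is not the authors' implementation: it assumes a single distance feature with unit weight, floors the distance at a hypothetical `min_distance` to avoid division by zero, considers only free cells, and keeps the accumulated sum during the forward pass (the standard Viterbi recursion) instead of building one candidate path per cell afterwards. The name `match_trajectory` and the input layout are likewise assumptions, and the observation list is assumed to be non-empty.

```python
import math

def match_trajectory(observations, centres, transitions, min_distance=0.1):
    """Return the cell sequence with the highest summed probability.

    observations: list of (x, y) positions from the primary localization.
    centres: {cell: (x, y)} centre coordinates of the free cells.
    transitions: {cell: set of cells reachable from that cell}.
    """
    # Invert the transition table: which cells can lead into cell k?
    parents_of = {k: [] for k in centres}
    for i, targets in transitions.items():
        for k in targets:
            if i in centres and k in centres:
                parents_of[k].append(i)

    scores, parents = [], []
    for j, obs in enumerate(observations):
        # Forward phase: potentials and normalized probabilities at step j.
        potential = {k: math.exp(1.0 / max(math.dist(c, obs), min_distance))
                     for k, c in centres.items()}
        z = sum(potential.values())
        prob = {k: v / z for k, v in potential.items()}
        if j == 0:
            scores.append(prob)
            parents.append({k: None for k in centres})
            continue
        cur_score, cur_parent = {}, {}
        for k in centres:
            candidates = [i for i in parents_of[k] if i in scores[-1]]
            if not candidates:
                continue  # no allowed transition into cell k
            best = max(candidates, key=scores[-1].get)
            cur_score[k] = scores[-1][best] + prob[k]
            cur_parent[k] = best
        scores.append(cur_score)
        parents.append(cur_parent)

    # Backward phase: follow the best parents from the best final cell.
    k = max(scores[-1], key=scores[-1].get)
    path = []
    for j in range(len(observations) - 1, -1, -1):
        path.append(k)
        k = parents[j][k]
    return path[::-1]
```

For a three-cell corridor, observations that drift slightly off the cell centres are snapped to the cells in walking order, and a cell that cannot be reached from the starting region is not selected even when an observation falls on it.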
The main feature that we use in our CRF map matching algorithm is the Euclidean distance (in meters) between the centre of the candidate cell and the estimated position derived from the primary localization system; a smaller distance means a higher value of the potential function, given that the transition to this cell is possible as determined by the transition graph obtained from the map model. The proposed feature function can thus be defined as follows:

$$f_{dis}(y_{j-1}, y_j, \mathbf{x}, j) = \frac{T(y_{j-1}, y_j)}{\mathrm{Distance}(y_j, x_j)}$$

where $T(y_{j-1}, y_j)$ is a function that indicates the transition possibility from the current cell to the candidate cell, depending on the transition table, and which can be either 0 or 1, $j$ is the time step, and $x_j$ is the current observation, which is the current location estimated by the primary localization system. $\mathrm{Distance}(y_j, x_j)$ denotes the Euclidean distance between the centre of cell $y_j$ and $x_j$. This feature means that cells close to the current input location have higher probabilities than distant cells, on the condition that transition to the cell is possible from its neighbor cells and no obstacle forbids this transition.
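The distance feature can be written as a small function. The sketch below assumes a lower bound on the distance (`min_distance`, a hypothetical parameter) so that an observation lying exactly on a cell centre does not cause a division by zero; the source does not state how that case is handled.

```python
import math

def distance_feature(transition_allowed, cell_centre, observation,
                     min_distance=0.1):
    """Transition indicator divided by the Euclidean distance in metres.

    transition_allowed: True if the transition table permits the move.
    cell_centre, observation: (x, y) coordinates in metres.
    """
    if not transition_allowed:
        return 0.0  # a forbidden transition contributes nothing
    distance = math.dist(cell_centre, observation)
    return 1.0 / max(distance, min_distance)
```

As a worked example, a reachable cell whose centre is 0.5 m from the observation has a feature value of 2 and a potential of exp(2) ≈ 7.39, while a reachable cell 2 m away has a feature value of 0.5 and a potential of exp(0.5) ≈ 1.65.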