# A Data Correction Algorithm for Low-Frequency Floating Car Data

## Abstract

## 1. Introduction

## 2. Related Work

## 3. Data Correction Algorithm

#### 3.1. Problem Statement

_{i} is the time the point was collected.

#### 3.2. Trajectory Preprocessing

_{i} in a buffer D can be calculated via Equation (2). The value of $\widehat{\lambda}$ can be calculated by Equation (3), where $N_D$ is the number of points in buffer D, and |D| is the area of buffer D. Through Equation (4), we can calculate the probability of $X \ge n_i$ in buffer D.
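Equations (2) to (4) are not reproduced in this text, so the following is only a minimal sketch under two stated assumptions: that the point counts follow a homogeneous Poisson model, and that $\widehat{\lambda}$ is the ratio $N_D / |D|$. The function names `estimate_intensity` and `poisson_tail` are hypothetical and do not come from the paper.

```python
import math


def estimate_intensity(n_points: int, area: float) -> float:
    """Assumed form of the intensity estimate: points per unit area (N_D / |D|)."""
    if area <= 0:
        raise ValueError("area must be positive")
    return n_points / area


def poisson_tail(n: int, intensity: float, area: float) -> float:
    """P(X >= n) for X ~ Poisson(intensity * area).

    This Poisson form is an assumption, not the paper's Equation (4).
    """
    mean = intensity * area
    # P(X >= n) = 1 - sum_{k=0}^{n-1} exp(-mean) * mean**k / k!
    term = math.exp(-mean)  # k = 0 term
    cdf = 0.0
    for k in range(n):
        cdf += term
        term *= mean / (k + 1)  # advance to the k + 1 term
    return max(0.0, 1.0 - cdf)
```

Under this model, a point whose neighbourhood count is improbably low for the local intensity would be a candidate for removal as noise; the cut-off probability is a tuning choice that this sketch does not fix.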

$R_S$ is small. In contrast, the value of $R_S$ is large at the edge of the city, as shown in Figure 4.

#### 3.3. Hierarchical Map Matching Algorithm (HST-Matching)

**Assumption 1.**

**Assumption 2.**

#### 3.3.1. Preliminary Matching

$P_i$, $1 \le i \le n$, with radius $r$ in a trajectory $T = \{P_1, P_2, \ldots, P_n\}$ to retrieve the candidate segments and candidate points of $P_i$. As we had already preprocessed the trajectory, the noise points with the largest deviations had been removed. The radius of the buffer was set to 30 m.

$P_i$, there are three candidate segments ${P}_{i}=\left\{{e}_{i}^{1},{e}_{i}^{2},{e}_{i}^{3}\right\}$. The distances between $P_i$ and the candidate segments are $\left\{{d}_{i}^{1},{d}_{i}^{2},{d}_{i}^{3}\right\}$, and the candidate points of $P_i$ are $\left\{{c}_{i}^{1},{c}_{i}^{2},{c}_{i}^{3}\right\}$. As azimuth information is lacking in floating car data, we calculate the angle differences between the vector connecting points $P_i$ and $P_{i+1}$ and the directions of the candidate segments $\left\{{e}_{i}^{1},{e}_{i}^{2},{e}_{i}^{3}\right\}$. As shown in Figure 6b, the angle differences are $\left\{{\theta}_{i}^{1},{\theta}_{i}^{2},{\theta}_{i}^{3}\right\}$. We use a threshold ${T}_{\theta}$ to filter the candidate segments: if ${\theta}_{i}^{j}>{T}_{\theta}$, the corresponding segment ${e}_{i}^{j}$ is removed. If only one candidate segment, ${e}_{i}^{1}$, remains, point $P_i$ is counted as a high-confidence tracking point (HCTP) according to Assumption 1, and segment ${e}_{i}^{1}$ is the matched road of point $P_i$. Algorithm 1 shows the details of the preliminary matching procedure.

Algorithm 1 Preliminary Matching Algorithm | |
---|---|
Input: | Trajectory $P_1$ → $P_2$ … → $P_n$; OSM road network R |
Output: | HCTPlist; CandidateList of candidate matched points ${c}_{i}^{1}$, ${c}_{i}^{2}$ … ${c}_{i}^{j}$ for each $P_i$ |
1: | Initialize HCTPlist and CandidateList as empty lists; |
2: | for i = 1 to n do |
3: | C = GetCandidate($P_i$, R, r); // get the candidates within radius r |
4: | for j = 1 to C.count do |
5: | ${\theta}_{i}^{j}$ = abs(azi_$P_i$ − azi_${c}_{i}^{j}$); |
6: | if ${\theta}_{i}^{j}$ < ${T}_{\theta}$ then |
7: | CandidateList($P_i$).add(${c}_{i}^{j}$); |
8: | end if |
9: | end for |
10: | if CandidateList($P_i$).count == 1 then |
11: | HCTPlist.add($P_i$); |
12: | end if |
13: | end for |
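The preliminary matching step can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes planar coordinates in metres, treats every road segment as a directed straight line between two points, and uses a hypothetical default of 30 degrees for ${T}_{\theta}$ (only the 30 m buffer radius is stated in the text). All function names are made up for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]  # directed: start point -> end point


def project(p: Point, seg: Segment) -> Tuple[Point, float]:
    """Return the closest point on seg to p and the distance to it."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp so the candidate stays on the segment
    c = (ax + t * dx, ay + t * dy)
    return c, math.hypot(p[0] - c[0], p[1] - c[1])


def heading(a: Point, b: Point) -> float:
    """Direction of the vector a -> b in degrees, in [0, 360)."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0])) % 360.0


def angle_diff(h1: float, h2: float) -> float:
    """Smallest absolute difference between two headings, in [0, 180]."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


def preliminary_matching(traj: List[Point], segments: List[Segment],
                         r: float = 30.0, t_theta: float = 30.0):
    """Return (indices of HCTPs, per-point list of (segment index, candidate point))."""
    hctp, candidates = [], []
    for i, p in enumerate(traj):
        # The travel direction is taken from P_i to P_{i+1}; the last point
        # reuses the direction of the previous step (an assumption of this sketch).
        h = heading(p, traj[i + 1]) if i + 1 < len(traj) else heading(traj[i - 1], p)
        kept = []
        for sid, seg in enumerate(segments):
            c, d = project(p, seg)
            if d <= r and angle_diff(h, heading(*seg)) < t_theta:
                kept.append((sid, c))
        candidates.append(kept)
        if len(kept) == 1:  # exactly one survivor: high-confidence tracking point
            hctp.append(i)
    return hctp, candidates
```

The candidate list is kept per point, so the HCTP test counts only the segments that survive for that point rather than an accumulated total.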

#### 3.3.2. Spatial–Temporal Matching

#### Spatial Analysis

$P_i$ and candidate segment ${e}_{i}^{j}$, ${d}_{i}^{j}=\mathrm{dist}\left({c}_{i}^{j},{P}_{i}\right)$, and $\mu$ and $\sigma$ are the mean and variance of the normal distribution.
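As a worked illustration of a normally distributed observation probability over the distance ${d}_{i}^{j}$, the sketch below evaluates the Gaussian density. It assumes that $\sigma$ enters the density as a standard deviation, and the defaults `mu=0.0` and `sigma=20.0` (metres) are placeholders, not values taken from this paper.

```python
import math


def observation_probability(d: float, mu: float = 0.0, sigma: float = 20.0) -> float:
    """Gaussian density of the point-to-candidate distance d (metres).

    sigma is treated as a standard deviation here; mu and sigma are
    placeholder values chosen only for illustration.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (d - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)
```

With `mu = 0`, candidates closer to the GPS point always score higher, which is why two candidates at the same distance cannot be separated by this term alone.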

$P_i$, $\left({c}_{i}^{1},{c}_{i}^{2}\right)$. The observation probabilities of these two candidate points are equal. Obviously, the correctly matched point should be ${c}_{i}^{2}$ according to Assumption 2. Hence, topological information is important for map matching, because it allows us to exclude certain candidate points. The formula of the transmission probability is given in Equation (7).
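To illustrate how topology can separate two equally distant candidates, the sketch below assumes the ratio form commonly used in ST-Matching (Lou et al., listed in the references): the straight-line distance between consecutive GPS points divided by the shortest-path length between their candidates. Whether Equation (7) has exactly this form is an assumption. For brevity, candidates are simplified to graph nodes, and the graph layout and function names are hypothetical.

```python
import heapq
import math
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]  # node -> [(neighbour, length in m)]


def shortest_path_length(graph: Graph, src: Hashable, dst: Hashable) -> float:
    """Dijkstra's algorithm; returns math.inf when dst is unreachable."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


def transmission_probability(p_prev, p_cur, c_prev, c_cur, graph: Graph) -> float:
    """Assumed form: straight-line distance / network distance, clamped to [0, 1]."""
    straight = math.dist(p_prev, p_cur)
    network = shortest_path_length(graph, c_prev, c_cur)
    if network == math.inf:
        return 0.0  # no route between the candidates
    if network == 0.0:
        return 1.0  # both points map to the same candidate
    return min(1.0, straight / network)  # the clamp is a choice of this sketch
```

A candidate that can only be reached through a long detour, or not at all, receives a low score even when it is as close to the GPS point as the correct candidate.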

#### Temporal Analysis

Algorithm 2 Spatial and Temporal Matching Algorithm | |
---|---|
Input: | HCTPlist $P_i$, $P_{i+k}$; CandidateList ${c}_{i}^{1}$, ${c}_{i}^{2}$ … ${c}_{i}^{j}$; Trajectory $P_{i+1}$ → $P_{i+2}$ … → $P_{i+k}$ |
Output: | OSM-WayID-List; |
1: | Initialize OSM-WayID-List as empty list; |
2: | for each ${c}_{i}^{1}$ and ${c}_{i+k}^{1}$ do |
3: | F(${c}_{i}^{1}$) = 1; |
4: | F(${c}_{i+k}^{1}$) = 1; |
5: | end for |
6: | for t = i + 1 to i + k − 1 do |
7: | max = −∞; |
8: | for s = 1 to CandidateList($P_t$).count do |
9: | F(${c}_{t}^{s}$) = max over j of [F(${c}_{t-1}^{j}$) + F(${c}_{t-1}^{j}$ → ${c}_{t}^{s}$)]; |
10: | Alt = F(${c}_{t}^{s}$); |
11: | if (Alt > max) then |
12: | max = Alt; |
13: | C = ${c}_{t}^{s}$; |
14: | end if |
15: | end for |
16: | OSM-WayID-List.add(C.id); |
17: | end for |
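The scoring loop of Algorithm 2 can be sketched as a forward pass between two HCTPs. This is a simplified sketch under stated assumptions: the per-step score F(prev → cur) is supplied by the caller as a hypothetical `step_score` function, candidates are identified by hashable ids, and the best candidate is recorded at each step as in the listing (no backtracking is performed).

```python
from typing import Callable, Hashable, List, Optional, Sequence


def st_match_between_hctps(
    candidates: Sequence[Sequence[Hashable]],
    step_score: Callable[[int, Hashable, Hashable], float],
) -> List[Optional[Hashable]]:
    """Pick one candidate for every point strictly between two HCTPs.

    candidates[0] and candidates[-1] belong to the HCTPs and hold one id each.
    step_score(t, prev, cur) returns the score of moving from prev to cur.
    """
    scores = {c: 1.0 for c in candidates[0]}  # F = 1 for the starting HCTP
    chosen: List[Optional[Hashable]] = []
    for t in range(1, len(candidates) - 1):
        if not candidates[t]:
            chosen.append(None)  # no candidate survived; keep the previous scores
            continue
        new_scores = {}
        for cur in candidates[t]:
            # Best accumulated score over all candidates of the previous point.
            new_scores[cur] = max(
                scores[prev] + step_score(t, prev, cur) for prev in scores
            )
        chosen.append(max(new_scores, key=new_scores.get))
        scores = new_scores
    return chosen
```

In a fuller implementation the chosen candidate ids would be mapped to OSM way ids before being appended to the output list; that lookup is omitted here.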

#### 3.4. Trajectory Correction Algorithm

Algorithm 3 Physical Attraction Model | |
---|---|
Input: | Trajectory $P_1$ → $P_2$ … → $P_n$; OSM-WayID-List; |
Output: | New Trajectory ${P}_{1}^{\prime}$ → ${P}_{2}^{\prime}$ … → ${P}_{n}^{\prime}$; |
1: | for i = 1 to n do |
2: | T = 0; K = ∞; |
3: | $\overline{d}$ = meandistance ($d_1$, $d_2$ … $d_n$); |
4: | K = $d_i$ − $\overline{d}$; |
5: | while T ≤ 20 && K > 0.5 do |
6: | $F_1$($P_i$) = $F_2$($P_i$); |
7: | ${\overline{d}}^{\prime}$ = meandistance (); |
8: | K = ${d}_{i}^{\prime}$ − ${\overline{d}}^{\prime}$; |
9: | T = T + 1; |
10: | end while |
11: | end for |

## 4. Experimental Tests of the Proposed Approach

#### 4.1. Experimental Data

#### 4.2. Trajectory Preprocessing

#### 4.3. Map Matching

#### 4.3.1. Evaluation Approach

#### 4.3.2. Parameter Selection

#### 4.3.3. Matching Result

#### 4.3.4. Running Time

#### 4.4. Data Correction

#### 4.4.1. Parameter Selection

#### 4.4.2. Correction Result

$O(M^2)$, where M is the number of nodes in the GPS dataset. For each node, the dataset was a 100 m × 100 m square centered at the node, and it took at least 15 s to calculate the data for each node. In contrast, the time complexity of the algorithm proposed in this paper is $O(M)$. Our algorithm needs only 150 ms to calculate the data for each point, which is at least 100 times faster per point than the 15 s required by the previous approach.

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Gwon, G.P.; Hur, W.S.; Kim, S.W.; Seo, S.W. Generation of a Precise and Efficient Lane-Level Road Map for Intelligent Vehicle Systems. IEEE Trans. Veh. Technol. **2017**, 66, 4517–4533.
2. Li, Y.; Hua, L.; Tan, J.; Zan, L.; Hong, X.; Chen, C. Scan Line Based Road Marking Extraction from Mobile LiDAR Point Clouds. Sensors **2016**, 16, 903.
3. Guo, C.; Kidono, K.; Meguro, J.; Kojima, Y.; Ogawa, M.; Naito, T. A Low-Cost Solution for Automatic Lane-Level Map Generation Using Conventional In-Car Sensors. IEEE Trans. Intell. Transp. Syst. **2016**, 17, 2355–2366.
4. Tang, L.; Yang, X.; Kan, Z.; Li, Q. Lane-Level Road Information Mining from Vehicle GPS Trajectories Based on Naïve Bayesian Classification. ISPRS Int. J. Geo-Inf. **2015**, 4, 2660–2680.
5. Tang, L.; Yang, X.; Dong, Z.; Li, Q. CLRIC: Collecting Lane-Based Road Information via Crowdsourcing. IEEE Trans. Intell. Transp. Syst. **2016**, 17, 2552–2562.
6. Li, J.; Qin, Q.; Xie, C.; Zhao, Y. Integrated use of spatial and semantic relationships for extracting road networks from floating car data. Int. J. Appl. Earth Obs. Geoinf. **2012**, 19, 238–247.
7. Wang, J.; Rui, X.; Song, X.; Tan, X. A novel approach for generating routable road maps from vehicle GPS traces. Int. J. Geogr. Inf. Syst. **2015**, 29, 69–91.
8. Liu, X.; Biagioni, J.; Eriksson, J.; Wang, Y.; Forman, G.; Zhu, Y. Mining large-scale, sparse GPS traces for map inference: Comparison of approaches. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 669–677.
9. Biagioni, J.; Eriksson, J. Map inference in the face of noise and disparity. In Proceedings of the International Conference on Advances in Geographic Information Systems, Redondo Beach, CA, USA, 6–9 November 2012; pp. 79–88.
10. Worrall, S.; Nebot, E. Automated Process for Generating Digitised Maps through GPS Data Compression; University of Sydney: Sydney, Australia, 2007.
11. Schroedl, S.; Wagstaff, K.; Rogers, S.; Langley, P.; Wilson, C. Mining GPS Traces for Map Refinement. Data Min. Knowl. Discov. **2004**, 9, 59–87.
12. Lee, J.G.; Han, J.; Whang, K.Y. Trajectory clustering: A partition-and-group framework. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Beijing, China, 12–14 June 2007; pp. 593–604.
13. Lee, W.-C.; Krumm, J. Trajectory Preprocessing. In Computing with Spatial Trajectories; Springer: New York, NY, USA, 2011; pp. 3–33. ISBN 978-1-4614-1628-9.
14. Fox, D. Adapting the Sample Size in Particle Filters Through KLD-Sampling. Int. J. Robot. Res. **2003**, 22, 985–1003.
15. Hightower, J.; Borriello, G. Particle Filters for Location Estimation in Ubiquitous Computing: A Case Study. In UbiComp 2004: Ubiquitous Computing; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2004; pp. 88–106.
16. Murphy, K.; Russell, S. Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks. In Sequential Monte Carlo Methods in Practice; Statistics for Engineering and Information Science; Springer: New York, NY, USA, 2001; pp. 499–515. ISBN 978-1-4419-2887-0.
17. Lou, Y.; Zhang, C.; Zheng, Y.; Xie, X.; Wang, W.; Huang, Y. Map-matching for Low-sampling-rate GPS Trajectories. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 4–6 November 2009; ACM: New York, NY, USA, 2009; pp. 352–361.
18. Greenfeld, J.S. Matching GPS Observations to Locations on a Digital Map. In Proceedings of the 81st Annual Meeting of the Transportation Research Board, Washington, DC, USA, 14 January 2002.
19. Qingquan, L.I.; Lian, H. A Map Matching Algorithm for GPS Tracking Data. Acta Geod. Cartogr. Sin. **2010**, 39, 207–212.
20. Zhe, Z.; Qingquan, L.I.; Zou, H.; Wan, J.; University, S.; University, W. Curvature Integration Constrained Map Matching Method for GPS Floating Car Data. Acta Geod. Cartogr. Sin. **2015**, 44, 1167–1176.
21. Marchal, F.; Hackney, J.; Axhausen, K. Efficient Map Matching of Large Global Positioning System Data Sets: Tests on Speed-Monitoring Experiment in Zürich. Transp. Res. Rec. J. Transp. Res. Board **2005**, 1935, 93–100.
22. Zhang, L.; Thiemann, F.; Sester, M. Integration of GPS traces with road map. In Proceedings of the International Workshop on Computational Transportation Science, San Jose, CA, USA, 2 November 2010; pp. 17–22.
23. Liu, Q.; Tang, J.; Deng, M.; Shi, Y. An Iterative Detection and Removal Method for Detecting Spatial Clusters of Different Densities. Trans. GIS **2015**, 19, 82–106.
24. Barron, C.; Neis, P.; Zipf, A. A Comprehensive Framework for Intrinsic OpenStreetMap Quality Analysis. Trans. GIS **2015**, 18, 877–895.
25. Zhang, H.; Malczewski, J. Accuracy Evaluation of the Canadian OpenStreetMap Road Networks. Int. J. Geospat. Environ. Res. **2017**, 5, 347.
26. Wang, M.; Li, Q.; Hu, Q.; Zhou, M. Quality Analysis of Open Street Map Data. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. **2013**, XL-2/W1, 155–158.
27. OpenStreetMap Wiki. Available online: https://wiki.openstreetmap.org/wiki/Main_Page (accessed on 7 September 2018).
28. Yuan, J.; Zheng, Y.; Zhang, C.; Xie, X.; Sun, G.Z. An Interactive-Voting Based Map Matching Algorithm. In Proceedings of the Eleventh International Conference on Mobile Data Management, Kansas City, MO, USA, 23–26 May 2010; pp. 517–520.
29. Fu, L.; Sun, D.; Rilett, L.R. Heuristic shortest path algorithms for transportation applications: State of the art. Comput. Oper. Res. **2006**, 33, 3324–3343.
30. Cao, L.; Krumm, J. From GPS traces to a routable road map. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 4–6 November 2009; pp. 3–12.

**Figure 2.** A trajectory of GPS points. The green dots represent the GPS points and the arrows represent the driving direction of the floating car. ${P}_{i}$ is the ID of the GPS point.

**Figure 3.** Areas with different point densities. D represents the buffer of central point P. The circle is the range of buffer D. Red dots represent the GPS points inside D, and black dots are the GPS points outside D. (**a**) The points in the center of a city have more neighboring points; (**b**) the points away from the center of a city have a lower density; and (**c**) the points at the edge of a city, which might be recognized as noise.

**Figure 4.** The Delaunay triangulation network of GPS points. The red dots represent the GPS points and the blue lines are the edges of the Delaunay triangulation network. (**a**) In the center of the city, the length and variance of the edges in the Delaunay triangulation network are small, so the value of $R_S$ is small; (**b**) at the edge of the city, the density of points is small, so the length and variance of the edges in the Delaunay triangulation network are large. As a result, the value of $R_S$ is large.

**Figure 5.** The segments in the OSM map. The red lines represent the OSM roads, the blue points are the start or end points of segments, and $e_i$ denotes a road segment.

**Figure 6.** (**a**) $P_i$ is the ID of the GPS point and ${e}_{i}^{j}$ represents a candidate road segment. ${d}_{i}^{j}$ is the distance between $P_i$ and the candidate segment. ${c}_{i}^{j}$ is the ID of the candidate point of $P_i$. The red lines represent the OSM roads, and the blue dashed lines represent the buffer centered at point $P_i$. The green dashed lines represent the distances between $P_i$ and the candidate points. The red dots are the GPS points, and the green dots are the candidate points of $P_i$. The buffer of point $P_i$ intersects three segments. (**b**) The red dashed lines indicate the directions of the candidate segments; ${\theta}_{i}^{j}$ denotes the angle differences; only segment ${e}_{i}^{1}$ remains after filtering by the threshold.

**Figure 7.** An example of the transmission probability. $P_i$ is the ID of the GPS point and ${c}_{i}^{j}$ is the ID of the candidate point. The red dots are the GPS points of the trajectory and the black lines are the trajectory of the floating car. The red lines represent the OSM roads and the green dots indicate the candidate points of $P_i$.

**Figure 8.** The matching result of the ST-algorithm. $P_i$ is the ID of the GPS point and ${c}_{i}^{j}$ is the ID of the candidate point. The points $P_i$ and $P_{i+k}$ are HCTPs, so each has only one candidate point. The black arrows represent the candidate matching paths from $P_i$ to $P_{i+1}$, and the red arrows represent the final matching results.

**Figure 9.** An example of the multipath effect. The yellow line is the OSM road from right to left and the white line is the OSM road from left to right. The red dots are matched to the yellow road and the green dots are matched to the white road. Some points appear on the wrong road because of the GPS error.

**Figure 10.** Two types of forces acting on the trajectories. The green dots are the GPS points on a trajectory and the black dots are the points on another trace.

**Figure 11.** An example of the physical attraction model for point ${P}_{i}$. $P_i$ is the ID of the GPS point, and $d_i$ represents the distance from the GPS point to the matched OSM road. ${P}_{i}^{\prime}$ is the new position of $P_i$. The green dots are the GPS points, and the red dot is the new point. The red lines are the OSM roads, the black lines represent the original trajectory, and the black dashed lines indicate the new trajectory.

**Figure 12.** The experimental data. (**a**) GPS points collected from taxis; the yellow dots represent the GPS points whose sampling interval ranged from 2 to 10 min; the gray dots are the points whose sampling interval ranged from 1 to 2 min; the orange dots are the points whose sampling interval ranged from 40 to 60 s; and the blue dots are the points whose sampling interval ranged from 1 to 40 s. (**b**) Distribution of the sampling intervals.

**Figure 13.** The results of preprocessing. The red dots are the preserved GPS points and the black dots are noise points. (**a**) An example on a large scale. (**b**) The detailed results of the black rectangle in (**a**).

**Figure 15.** The (**a**) recall and (**b**) accuracy of map matching for different ${T}_{\theta}$. The blue lines represent the recall and accuracy of HCTP for different ${T}_{\theta}$.

**Figure 16.** The accuracy of map matching using spatial–temporal (ST)-matching (blue) vs. hierarchical ST (HST)-matching (orange).

**Figure 18.** Original data from a road. The black lines represent the OSM road, the red dots represent the points matched to the upper black line, and the green dots are the points matched to the lower line. The points from different directions are mixed together.

**Figure 19.** The results of the correction algorithm presented in this paper. The points from different directions separate well.

**Figure 20.** The results of the algorithm presented in reference [30]. The data with the same direction clustered together, and the maximum gap was 0.5 m.

**Figure 21.** An example of an intersection. (**a**) The result of the algorithm presented in this paper. Various colors represent different directions of GPS points. (**b**) The result of the algorithm presented in reference [30], in which there are some incorrect edges, shown by the teal lines.

Value | Motorway | Trunk | Primary | Secondary | Tertiary | Service | Residential |
---|---|---|---|---|---|---|---|
Min-speed (km/h) | 90 | 60 | 40 | 30 | 20 | 0 | 0 |
Max-speed (km/h) | 120 | 100 | 60 | 50 | 40 | 20 | 15 |

**Table 2.** The mean ($R.u$) and variance ($R.\sigma$) of the road speed constraints for the different roads.

Threshold | Motorway | Trunk | Primary | Secondary | Tertiary | Service | Residential |
---|---|---|---|---|---|---|---|
$R.u$ | 105 | 80 | 50 | 40 | 30 | 10 | 10 |
$R.\sigma$ | 5 | 7 | 3 | 3 | 3 | 3 | 1.5 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, B.; Guo, Y.; Zhou, J.; Cai, Y.
A Data Correction Algorithm for Low-Frequency Floating Car Data. *Sensors* **2018**, *18*, 3639.
https://doi.org/10.3390/s18113639

**Chicago/Turabian Style**

Li, Bijun, Yuan Guo, Jian Zhou, and Yi Cai.
2018. "A Data Correction Algorithm for Low-Frequency Floating Car Data" *Sensors* 18, no. 11: 3639.
https://doi.org/10.3390/s18113639