A Historical-Trajectories-Based Map Matching Algorithm for Container Positioning and Tracking

Positioning and tracking of containers is becoming an urgent demand of container transportation. Map matching algorithms have been widely applied to correct positioning errors. Because container trajectories have the characteristics of low sampling rate and missing GPS points, existing map matching algorithms based on the shortest path principle are not applicable for container positioning and tracking. To solve this problem, a historical-trajectories-based map matching algorithm (HTMM) is proposed. HTMM mines the travel time and the frequency in historical trajectories to help find the local path between two adjacent candidate points. HTMM first presents a path reconstruction method to calculate the travel time of historical trajectories on each road segment. A historical path index library based on a path tree is then developed to efficiently index historical paths. In addition, a location query and tracking method is introduced to determine the location of containers at given time. The performance of HTMM is validated on a real freight trajectory dataset. The experimental results show that HTMM has more than 3% and 5% improvement over the ST-Matching algorithm and HMM-based algorithm, respectively, at 60–300 s sampling intervals. The positioning error is reduced by half at a 60 s sampling interval.


Introduction
As the core of cargo transportation, containers need to be positioned and tracked remotely in transit during transportation. The positioning and tracking of containers is mostly realized by installing satellite positioning devices on containers, collecting and transmitting data to a remote monitoring platform [1]. Satellite positioning devices are usually powered by microbatteries. Considering the high energy consumption of positioning and communication and the long transportation cycle, the sampling rate is generally set at 2 min or more to extend the battery life [2]. The distance between two adjacent GPS points is too far and cannot accurately reflect the real path of a container [3]. Moreover, the positioning signal becomes weak or even interrupted in tunnels or under viaducts and tall buildings, which causes missing and inaccurate trajectories [4]. These problems cause blind zones in the positioning and tracking of containers, and high precision positioning cannot be achieved. Therefore, accurate positioning under scenarios of low sampling rate and missing and inaccurate trajectories is a major challenge in container transportation.
Map matching algorithms, which correct errors by matching GPS points to the most probable road, have been verified as effective for map matching of passenger transport system [5,6]. To date, there are few map matching algorithms for container transportation with low-sampling-rate trajectories. Hidden Markov model (HMM)-based map matching algorithms (such as ST-Matching algorithm [7]) have been applied to match trajectories with low sampling rates of 60-300 s [8]. The commonality of these algorithms is that they choose the shortest path between two adjacent candidate points as the local path [9]. Drivers do not choose the shortest path in many cases, which reduces the accuracy of map matching algorithms. On the one hand, container transport is accompanied by cargo handling, vehicle maintenance, high toll fees and loan guarantee, etc. [10]. Drivers are more inclined to choose familiar paths or paths with relatively low comprehensive cost rather than the shortest path. On the other hand, with the decrease in sampling rate, the number of potential matched paths between two adjacent GPS points increases. Therefore, using a shortest-path-based map matching algorithm to match low-sampling-rate trajectories of container transport can easily lead to matching errors.
The information in historical trajectories, including travel time, frequency and driving preference of drivers, is not fully utilized in current map matching algorithms. Based on this, we wonder whether potential information can be mined from historical trajectories to help map matching of container transport with low sampling rates and missing GPS points. To implement this idea, a novel historical-trajectories-based map matching algorithm (HTMM) is presented in this paper. HTMM is an improved HMM-based map matching algorithm. In HTMM, information from historical trajectories, including travel time and frequency, is mined to help determine the local path. The historical velocity, distance and heading angle are employed in the search for an optimal path. Experiments on a real freight trajectory dataset verify the effectiveness of HTMM. In general, the major contributions of this paper are as follows: (1) A historical-trajectories-based map matching algorithm (HTMM) consisting of local candidate path selection and optimal matched path search is proposed. The local path selection utilizes the travel time and the frequency mined from historical trajectories. Historical speed is introduced to improve the transition probability in the optimal matched path search. (2) A historical path reconstruction and index method is proposed to make full use of the historical paths. A meta path is introduced to determine the travel time of historical paths. A historical path index library based on a path tree is created to store historical information and accelerate the index speed. (3) A location correction and query method is presented to reconstruct matched paths and calculate the location of containers. Through this method, location correction and continuous tracking of the container are realized.
The outline of this paper is as follows. Section 2 reviews the related work of container positioning and tracking technology and map matching algorithms. Section 3 introduces a detailed discussion on the proposed HTMM algorithm. Section 4 verifies the accuracy and effectiveness of HTMM through experiments and displays the experimental results. Section 5 gives the conclusions and discusses future work.

Related Work
The Internet of Things (IoT) endows containers with intelligence, which is the basis of container positioning and tracking [11]. A container equipped with an IoT system collects identity, location, temperature and other data and transmits these data to a remote monitoring platform through a cellular mobile network, low-power LAN, LEO satellite, etc. The remote monitoring platform stores and process these data and provides data services for specified scenarios, among which container positioning and tracking is most widely demanded. For this purpose, many researchers have put forward different solutions. Nie et al. [12] designed an intelligent terminal system based on LEO satellite and narrowband Internet of Things (NB-IoT) that solves the problems of high power consumption and insufficient coverage of terminals. Lu et al. [13] proposed a container multimodal transport platform based on space-based IoT that can provide container tracking and monitoring services for different carries, such as highway, railway and waterway transportation. Tang et al. [14] proposed a refrigerated container monitoring system based on a wireless sensor network in which a geographic information system is used. This system can display container location, time, temperature and humidity information on Google Maps.
In summary, the mentioned literature points out that the following two problems need to be addressed in the design and development of positioning terminals: (1) low cost and (2) low power consumption. These two problems lead to low positioning precision and incomplete tracking of containers. Do date, few researchers have come up with useful solutions.
One method to improve positioning precision is to use pseudo-range differential positioning devices with sub-meter precision; however. high cost, missing trajectories and drift cannot be avoided. Map matching technology is a good choice to improve precision and reduce cost. Besides improvement of positioning precision, on a deeper level, trajectories after map matching can be applied to advanced applications, such as logistics tracking, location prediction and path planning or recommendation [15]. Missing and low sampling rate (60 s or more) trajectories are the research object of this paper. Simple map matching algorithms, such as geometry-based [16] and topology-based [17] map matching algorithms, have higher accuracy when the sampling rate is 1-10 s but perform worse when matching low-sampling-rate trajectories.
There are two types of map matching algorithms suitable for low-sampling-rate trajectories. One is map matching based on HMM. Newson et al. [8] first applied HMM to map matching; the accuracy can reach 99% when the sampling rate is 1-30 s but gradually decreases with sampling rates over 30 s. This map matching algorithm considers various features, such as distance and direction, when calculating observation and transition probabilities. Most assume these features satisfy a certain probability distribution. Lou et al. [7] proposed the ST-Matching algorithm on this basis. The ST-Matching combines spatial and temporal features; in particular, the road speed limit is introduced into the temporal feature. The accuracy of this algorithm can reach 80% when the sampling rate is 2.5 min. Yuan et al. [9] proposed the IVMM algorithm, which considers the influence of the context GPS point to match the current GPS point. It adds an interacting voting process on the basis of ST-Matching. When the sampling rate is 1.5-6.5 min, the accuracy of IVMM is about 70%. There are many similar map matching algorithms, such as those presented in [18,19].
Map matching algorithms based on HMM assume that the local candidate path between two adjacent points follows the shortest length/time path, which is not suitable for all low-sampling-rate trajectories and dense road networks. Map matching algorithms based on local path inference that do not rely on shortest length/time path assumptions have been proposed to solve this problem. Zheng et al. [20] believe that the user's trajectories follow certain space-time travel rules and proposed a two-stage path inference system, HRIS, driven by historical data. The first stage is local path inference, which calculates multiple possible paths between any two adjacent points. In the second stage, a dynamic programming algorithm is used to calculate the most probable path from the local paths as the optimal matched path. Banerjee et al. [21] proposed a path inference system, InferTra, that infers the complete path of a trajectory using the Gibbs sampling algorithm and the network mobility model. Wu et al. [22] proposed a completely probabilistic path inference system, STRS, that also considers the temporal and spatial probability distribution models of historical trajectories. Compared with other map matching algorithms, the accuracy of STRS is greatly improved.
In essence, these algorithms solve the most probable path through the assumed probability distribution or prior historical trajectories. However, there are some problems in solving the positioning and tracking of container transportation: (1) Map matching algorithms based on HMM follow the shortest length/time principle, which is not appliable for missing and low-sampling-rate trajectories. Map matching algorithms based on local path inference require a large number of historical trajectories to extract path selection preference and travel law, which lead to high time complexity. (2) With the rapid development of roads and traffic control, only using movement modes found from historical trajectories may not be suitable for the current situation. (3) In passenger transportation systems, trajectories come from multiple drivers with different purposes, and the movement modes found from all historical trajectories cannot represent each driver's driving preferences. (4) Most of the above algorithms focus on passenger transportation systems, especially floating cars in an urban road network. Container transportation systems are rarely studied, but the demand for positioning and tracking of containers is considerable.
Motivated by these problems, a map matching algorithm called HTMM is developed in this paper. The main differences between this work and previous work are as follows: (1) HTMM is a two-stage map matching algorithm based on a hidden Markov model and a local path inference method.
(2) Both historical information and real-time vehicle movement information are considered in HTMM. (3) HTMM aims to improve positioning precision and track the location of containers in a container transportation systems.

Preliminary
In this subsection, some definitions of the map matching problem will be given as follows.
Definition 1 (Trajectory). Trajectory, T, is defined as a sequence of GPS points with timestamps, i.e., T =< p 1 , p 2 , · · · , p n >. A GPS point is expressed as p i =< lat, lng, t, v, γ > (1 ≤ i ≤ n), which consists of the latitude (p i .lat), the longitude (p i .lng), the velocity (p i .v) and the heading angle (p i .γ) of p i at time p i .t. Definition 2 (Road Network). Road network is defined as a directed graph, G ={V, E}. Node set V represents the set of, end or intersection of road segments. Edge set E represents the set of road segments that connect the nodes in V. The e j (1 ≤ j ≤ |E|) in E has three attributes: (1) the unique number of e j (e j .eid); (2) the coordinates of the start node (e j .start) and the end node (e j .end) of e j ; and (3) the length of e j (e j .length). |E| is the number of edges in E. Definition 3 (Candidate road segment and candidate point). Given p i and G ={V, E}, the candidate road segments of p i are defined as E i = e j |d p i , e j < δ, ∀e j ∈ E , where d p i , e j is the projection distance from p i to e j , and δ is the distance threshold. The candidate point of p i on e j is expressed as c j i .

Definition 4 (Path).
A path is a sequence of road segments. As shown in Figure 1, the path from c r i to c s i+1 is expressed as P r→s : {e r → e r+1 → · · · → e s }, in which c r i is on e r and c s i+1 is on e s . algorithms based on local path inference require a large number of historical trajectories to extract path selection preference and travel law, which lead to high time complexity. (2) With the rapid development of roads and traffic control, only using movement modes found from historical trajectories may not be suitable for the current situation. (3) In passenger transportation systems, trajectories come from multiple drivers with different purposes, and the movement modes found from all historical trajectories cannot represent each driver's driving preferences. (4) Most of the above algorithms focus on passenger transportation systems, especially floating cars in an urban road network. Container transportation systems are rarely studied, but the demand for positioning and tracking of containers is considerable.
Motivated by these problems, a map matching algorithm called HTMM is developed in this paper. The main differences between this work and previous work are as follows: (1) HTMM is a two-stage map matching algorithm based on a hidden Markov model and a local path inference method.

Preliminary
In this subsection, some definitions of the map matching problem will be given as follows.
Definition 1 (Trajectory). Trajectory, T, is defined as a sequence of GPS points with timestamps, i.e., T = <p 1 , p 2 , ⋯ , p n >. A GPS point is expressed as p i = <lat, lng, t, v, γ> (1 ≤ i ≤ n), which consists of the latitude (p i .lat), the longitude (p i .lng), the velocity (p i .v) and the heading angle (p i .γ) of p i at time p i .t.
Definition 2 (Road Network). Road network is defined as a directed graph, G = {V, E}. Node set V represents the set of, end or intersection of road segments. Edge set E represents the set of road segments that connect the nodes in V. The e j (1 ≤ j ≤ |E|) in E has three attributes: (1) the unique number of e j (e j .eid); (2) the coordinates of the start node (e j .start) and the end node (e j .end) of e j ; and (3) the length of e j (e j .length). |E| is the number of edges in E.
Definition 3 (Candidate road segment and candidate point). Given p i and G = {V, E}, the candidate road segments of p i are defined as E i = {e j |d(p i , e j ) < δ, ∀e j ∈ E}, where d(p i , e j ) is the projection distance from p i to e j , and δ is the distance threshold. The candidate point of p i on e j is expressed as c i j .
Definition 4 (Path). A path is a sequence of road segments. As shown in Figure 1, the path from c i r to c i+ s is expressed as P r→s : {e r → e r+1 → ⋯ → e s }, in which c i r is on e r and c i+ s is on e s .   Due to the measurement error of the positioning device, the GPS points are not always on the road of the map. Therefore, the goal of the map matching algorithm is to find the most probable path under the given trajectory, T, and the road network, G = {V, E}. Table 1 presents the relevant indices and variables for HTMM.

Notation Description
The latitude and longitude of the ith GPS point The timestamp of the ith GPS point The velocity of the ith GPS point The heading angle of the ith GPS point e j The jth edge e j .eid The number of e j e j .start, e j .end The start and end node of e j e j .length The length of e j c j i The candidate point of p i on e j P r→s The path from e r to e s m The average speed from p i to p i+1 v j The historical speed of e j The transition probability from c r i to c s i+1

System Overview
As shown in Figure 2, the framework of HTMM consists of three parts: the historical path reconstruction and index module, the local candidate selection module, and the location correction and query module. In the historical path reconstruction and index module, the historical trajectories with high sampling rates are matched to determine the historical paths. A path reconstruction method is proposed to generate meta paths, which record the enter time and the leave time of the historical paths. A historical path index library is built to index paths efficiently. In the local candidate path selection module, a local candidate path selection strategy is introduced to determine the path between two adjacent candidate points, in which the similarity of travel time and the frequency are calculated. In the location correction and query module, an improved matched point search method based on the HMM model is used to calculate the matched points. A location query and tracking method based on the matched result is introduced to obtain the location of the container at a given time.
ule, the historical trajectories with high sampling rates are matched to determine the historical paths. A path reconstruction method is proposed to generate meta paths, which record the enter time and the leave time of the historical paths. A historical path index library is built to index paths efficiently. In the local candidate path selection module, a local candidate path selection strategy is introduced to determine the path between two adjacent candidate points, in which the similarity of travel time and the frequency are calculated. In the location correction and query module, an improved matched point search method based on the HMM model is used to calculate the matched points. A location query and tracking method based on the matched result is introduced to obtain the location of the container at a given time.

Historical Path Reconstruction and Index Module
In this subsection, a historical path reconstruction and index method is proposed to make better use of the historical paths.

Map Matching
The map matching algorithm based on HMM proposed by Newson et al. [8] is utilized to obtain the real paths of the historical trajectories. This algorithm has 99% accuracy when the sampling rate is under 30 s. The matched paths can be considered to represent where the containers really pass through.

Path Reconstruction
After map matching, two adjacent matched points may be on the same road segment or across multiple road segments, so the travel time of historical paths on each road seg-

Historical Path Reconstruction and Index Module
In this subsection, a historical path reconstruction and index method is proposed to make better use of the historical paths.

Map Matching
The map matching algorithm based on HMM proposed by Newson et al. [8] is utilized to obtain the real paths of the historical trajectories. This algorithm has 99% accuracy when the sampling rate is under 30 s. The matched paths can be considered to represent where the containers really pass through.

Path Reconstruction
After map matching, two adjacent matched points may be on the same road segment or across multiple road segments, so the travel time of historical paths on each road segment are unknown. A path reconstruction method based on time and distance interpolation is proposed in this subsection. A constructed path consists of several meta paths, which are defined as follows: Definition 6 (Meta Path). The meta path of road segment e j is represented as MP j , which is associated with the enter time on e j (MP j .enter_time) and the leave time on e j (MP j .leave_time). The path P r→s = {e r → e r+1 → · · · → e s } consists of a meta path sequence {MP r , MP r+1 , · · · , MP s }.
The travel time of historical paths is calculated to reconstruct the meta path. Three situations of the location relationship of two adjacent matched points need to be considered. As shown in Figure 3a, the leave time of MP r is equal to the enter time of MP r+1 , i.e., MP r .leave_time = MP r+1 .enter_time. It is supposed that the carrier of the container travels at a constant speed. MP r .leave_time is calculated as: where dist(m r i , e r .end) is the distance between m r i and e r .end, and len(m r i , m r+1 i+1 is the path length between m r i and m r+1 i+1 . Definition 7 (Meta Path). The meta path of road segment e j is represented as MP j , which is associated with the enter time on e j (MP j .enter_time) and the leave time on e j (MP j .leave_time). The path P r→s = {e r → e r+ → ⋯ → e s } consists of a meta path sequence {MP r , MP r+ , ⋯ , MP s }.
The travel time of historical paths is calculated to reconstruct the meta path. Three situations of the location relationship of two adjacent matched points need to be considered.
Situation 1: m i r and m i+ r+ are on the adjacent road segments and e r+ , respectively.
As shown in Figure 3a, the leave time of MP r is equal to the enter time of MP r+ , i.e., MP r .leave_time = MP r+ .enter_time. It is supposed that the carrier of the container travels at a constant speed. MP r .leave_time is calculated as: where dist(m i r , e r .end) is the distance between m i r and e r .end, and len(m i r , m i+1 r+1 ) is the path length between m i r and m i+1 r+1 .  (2): Situation 3: m i r and m i+ r are on the same road segment, e r . As shown in Figure 3c, the matched point, m i r , does not take new information for the calculation of its travel time. The enter time and the leave time of MP r should be calculated according to the next matched point that is not on e r .  (2): Situation 3: m r i and m r i+1 are on the same road segment, e r . As shown in Figure 3c, the matched point, m r i , does not take new information for the calculation of its travel time. The enter time and the leave time of MP r should be calculated according to the next matched point that is not on e r .
For the meta path of the road segment on which the first (or last) matched point is located, the enter time (or the leave time) is equal to the timestamp of the first (or the last) GPS point.

Historical Path Indexing
To accelerate the indexing speed of the historical paths, a historical path index structure called an HP-tree is built based on a path tree. The HP-tree is an improvement on FP-tree proposed by Huang et al. [23,24]. The HP-tree has the following characteristics: (1) Each node of the HP-tree represents a road segment, among which the root node is the start road segment of the historical path (or sub path) and the leaf node is the end road segment of the historical path (or sub path). (2) The child node of a node represents the next road segment to which it directly connects.
(3) All HP-trees make up an HP-forest to represent all historical paths and sub paths. Each node in the HP-tree is associated with: (1) sid, which represents the road segment number and equals to eid of the node; (2) cnt, which represents the number of times the node is passed through; and (3) stime, which represents the travel time of historical paths on the node. stime is represented in the form of a dictionary, the key of which is the trajectory id, tid, the value of which is a list of travel times of historical paths on the road segment.
Each HP-tree is associated with: (1) the root node, root and (2) the relational list, idx_list. idx_list is represented in the form of a dictionary, the key of which is the road segment number, sid, the value of which is a list of nodes with the same sid.
As shown in Figure 4, there are 4 historical paths: Seven HP-trees with e 1 − e 7 as root nodes can be created. Only the index structure of two of them are shown in Figure 5. With the help of an HP-tree, historical paths between given start and end road segments can be efficiently searched. For example, to search all the historical paths between e 2 and e 7 , the HP-tree with root node e 2 is first found; then, the relational list, idx_list, with key e 7 .eid is searched. Each node in idx_list is traversed, and its parent node is recursively traversed until the root node is reached. The historical paths are obtained by outputting these nodes in reverse order. Therefore, the historical paths are {e 2 → e 3 → e 5 → e 7 } and {e 2 → e 4 → e 6 → e 7 }. cnt and stime of each road segment in historical paths can be obtained at the same time.
tree proposed by Huang et al. [23,24]. The HP-tree has the following characteristics: (1) Each node of the HP-tree represents a road segment, among which the root node is the start road segment of the historical path (or sub path) and the leaf node is the end road segment of the historical path (or sub path). (2) The child node of a node represents the next road segment to which it directly connects. (3) All HP-trees make up an HP-forest to represent all historical paths and sub paths.
Each node in the HP-tree is associated with: (1) sid, which represents the road segment number and equals to eid of the node; (2) cnt, which represents the number of times the node is passed through; and (3) stime, which represents the travel time of historical paths on the node. stime is represented in the form of a dictionary, the key of which is the trajectory id, tid, the value of which is a list of travel times of historical paths on the road segment.
Each HP-tree is associated with: (1) the root node, root and (2) the relational list, idx_list. idx_list is represented in the form of a dictionary, the key of which is the road segment number, sid, the value of which is a list of nodes with the same sid.
As shown in Figure 4, there are 4 historical paths: P 1→7 = {e 1 →e 2 →e 4 →e 6 →e 7 } , P 1→5 = {e 1 →e 2 →e 3 →e 5 } , P 2→7 = {e 2 →e 3 →e 5 →e 7 } and P 2→6 = {e 2 →e 4 →e 6 } . Seven HP-trees with e 1 − e 7 as root nodes can be created. Only the index structure of two of them are shown in Figure 5. With the help of an HP-tree, historical paths between given start and end road segments can be efficiently searched. For example, to search all the historical paths between e 2 and e 7 , the HP-tree with root node e 2 is first found; then, the relational list, idx_list, with key e 7 .eid is searched. Each node in idx_list is traversed, and its parent node is recursively traversed until the root node is reached. The historical paths are obtained by outputting these nodes in reverse order. Therefore, the historical paths are {e 2 →e 3 →e 5 →e 7 } and {e 2 →e 4 →e 6 →e 7 } . cnt and stime of each road segment in historical paths can be obtained at the same time.

Local Candidate Path Selection Module
For the missing and low-sampling-rate trajectories of containers, the candidate points of two adjacent GPS points may not be in the same road segment or adjacent road segments. There may be multiple paths to select between two adjacent GPS points. It is assumed that the path between two adjacent GPS points follows the shortest path principle in the map matching algorithm based on HMM and several improved algorithms. In Fig-Figure 5. Illustration of HP-forest. The HP-tree with root e 1 stores two historical paths: {e 1 → e 2 → e 4 → e 6 → e 7 } and {e 1 → e 2 → e 3 → e 5 }. The HP-tree with root e 2 stores two historical paths: {e 2 → e 3 → e 5 → e 7 } and {e 2 → e 4 → e 6 }.

Local Candidate Path Selection Module
For the missing and low-sampling-rate trajectories of containers, the candidate points of two adjacent GPS points may not be in the same road segment or adjacent road segments. There may be multiple paths to select between two adjacent GPS points. It is assumed that the path between two adjacent GPS points follows the shortest path principle in the map matching algorithm based on HMM and several improved algorithms. In Figure 6a, only one path between two candidate points, c r i and c s i+1 , of high-sampling-rate trajectories is found, which is the shortest path. However, in Figure 6b, 3 possible paths are found between c r i and c s i+1 of low-sampling-rate trajectories. Based on the shortest path principle, P 2 is selected as the local candidate path. However, drivers may choose P 1 or P 3 under different traffic conditions and based on personal preferences.  Figure 5. Illustration of HP-forest. The HP-tree with root e 1 stores two historical paths: {e 1 →e 2 →e 4 →e 6 →e 7 } and {e 1 →e 2 →e 3 →e 5 }. The HP-tree with root e 2 stores two historical paths: {e 2 →e 3 →e 5 → e 7 } and {e 2 →e 4 →e 6 }.

Local Candidate Path Selection Module
For the missing and low-sampling-rate trajectories of containers, the candidate points of two adjacent GPS points may not be in the same road segment or adjacent road segments. There may be multiple paths to select between two adjacent GPS points. It is assumed that the path between two adjacent GPS points follows the shortest path principle in the map matching algorithm based on HMM and several improved algorithms. In Figure 6a, only one path between two candidate points, c i r and c i+ s , of high-sampling-rate trajectories is found, which is the shortest path. However, in Figure 6b Figure 6. Possible paths from candidate point c i r to c i+ s : (a) for high-sampling-rate trajectories, both the shortest path and the local candidate path is P ; (b) for low-sampling-rate trajectories, P is the shortest path, but the local candidate path may be P , P or P . i to c s i+1 : (a) for high-sampling-rate trajectories, both the shortest path and the local candidate path is P 1 ; (b) for low-sampling-rate trajectories, P 2 is the shortest path, but the local candidate path may be P 1 , P 2 or P 3 .
A local candidate path selection strategy that takes the travel time and the frequency of historical paths into consideration is proposed in this paper and is elaborated in the following section.

Possible Paths Search
To find all the possible paths between two adjacent candidate points, the directed graph, G ={V, E}; the end node of e r ; and the start node of e s are found and viewed as the source node and the target node, respectively. The steps of the search strategy are as follows: (1) Search the historical paths between e r .end and e s .start in the HP-Forest.; (2) If no historical path is found, the shortest path between e r .end and e s .start is selected as the local candidate path; (3) If only one historical path is found, then it is the local candidate path. If multiple historical paths are found, then add all these historical paths to the possible path set.

Similarity Calculation (1) Travel Time Estimation
Considering road conditions and driving habits of drivers, the shortest path is not necessarily the best choice. It is assumed that the driver is more likely to choose the path with a travel time is close to that of the current trajectory. Road-segment-based and path-based methods are commonly used for path travel time estimation [22]. The travel time of a road segment is obtained according to the estimated average speed in the road-segment-based method. The travel time of a path is the sum of the travel time of all the road segments. However, it cannot be used very effectively when speed changes suddenly at crossroads or speed is not available. In a path-based method, the average time of paths is calculated by retrieving the same paths form the historical paths. Although speed estimation is avoided, there are rarely two identical paths in the historical paths.
A travel time estimation method combining a road-segment-based method and a pathbased method is proposed in this paper. Because the possible paths between two adjacent candidate points are the sub paths of the historical paths, the problem with the path-based method is avoided. The travel time estimation of road segments is obtained by calculating the average travel time of the corresponding road segment from multiple identical possible paths. The estimated travel time of the path is equal to the sum of estimated travel time of these road segments.
The possible path set between c r i and c s i+1 is represented as CP = {CP 1 , CP 2 , . . . , CP l }. Assume the historical trajectories that match CP k = {e r , e r+1 , . . . , e s }(k = 1, 2, · · · , l) are T 1 , T 2 , · · · , T l . The travel time of T k on each road segment is represented as t k r , t k r+1 , . . . , t k s . The travel time of the intermediate road segments, e r+1 , . . . , e s−1 , that are passed through completely is formulated as: For e r and e s , the location of the matched point on the road segment needs to be considered when calculating the travel time. The calculation formula is: Therefore, the similarity of the time difference between c r i , c s i+1 and the estimated travel time of CP k is: (

2) Frequency Calculation
Path frequency is the second consideration when selecting the local candidate path. Through the observation of drivers, Gao et al. [25] found that about 60% of driving paths are repeated for at least 40 days.
The number of times all road segments in CP k are passed through by historical trajectories is represented as c k r , c k r+1 · · · , c k s . It is found that the number of times that CP k is passed through is equal to c k s . Therefore, the frequency of CP k is: (3) Path Selection The travel time similarities and the frequencies of all possible paths in CP are calculated by integrating the weights, w time and w f re . The comprehensive score, S k , is formulated as: The local candidate path is the most similar with the largest S k .

Location Correction and Query Module
A matched point search strategy is proposed in this subsection. More specifically, the matched point is the corrected GPS point. Because the GPS points are discrete, the location of a GPS point cannot be known every time. Therefore, a method of location query of GPS points is introduced.

Optimal Matched Point Search
As shown in Figure 7, each GPS point has multiple candidate points, so there are multiple local candidate paths between two adjacent GPS points. In order to determine which candidate point the GPS point matches to, the probabilities of all local candidate paths are calculated.
(1) Observation Probability where dist(p i , c i r ) is the distance between p i and c i r and obeys the normal distribution with mean zero and variance σ. θ is the angle between the driving direction and the direction of the road segment and obeys exponential distribution with rate parameter λ.  . Candidate graph composed of candidate points and local candidate paths. Each candidate point has an observation probability, and the probability from the previous candidate point to the current candidate point is expressed as a transition probability.
(2) Transition Probability The transition probability represents the possibility that the next path of e r is e s . In most map matching algorithms based on HMM, it is supposed that the transition probability is associated with the length difference between the length of a local candidate path and the distance between p i and p i+ . As a result, the output tends to be the shorter path. In fact, the longer paths are filtered in the local candidate path selection module of the system, so only average velocity is considered.
It is assumed that one of the local candidate paths from c i r to c i+ s is {e r , e r+ , ⋯ , e s } because the travel time of the current trajectory on each road segment is unknown, and only the average speed can be calculated, as shown in Equation (9): v i→i+ = e .length s j = r Δt i→i+ (9) where Δt i→i+ is the time difference between p i and p i+ . The ST-Matching algorithm [7] calculates velocity similarity by comparing the difference between the speed limit and the average velocity. Due to the velocity change and waiting time caused by road traffic congestion, the velocity is always much lower than the speed limit. In addition, it is difficult to obtain real-time traffic flow information. Therefore, HTMM calculates velocity similarity by comparing the difference between the historical velocity and the average velocity. The historical velocity of the road segment of the local candidate path is: Figure 7. Candidate graph composed of candidate points and local candidate paths. Each candidate point has an observation probability, and the probability from the previous candidate point to the current candidate point is expressed as a transition probability.
The observation probability represents the possibility that p i matches to c r i . The distance and the heading angle are taken into consideration; then, the observation probability p o c r i is formulated as: where dist p i , c r i is the distance between p i and c r i and obeys the normal distribution with mean zero and variance σ. θ is the angle between the driving direction and the direction of the road segment and obeys exponential distribution with rate parameter λ.
(2) Transition Probability The transition probability represents the possibility that the next path of e r is e s . In most map matching algorithms based on HMM, it is supposed that the transition probability is associated with the length difference between the length of a local candidate path and the distance between p i and p i+1 . As a result, the output tends to be the shorter path. In fact, the longer paths are filtered in the local candidate path selection module of the system, so only average velocity is considered.
It is assumed that one of the local candidate paths from c r i to c s i+1 is {e r , e r+1 , · · · , e s } because the travel time of the current trajectory on each road segment is unknown, and only the average speed can be calculated, as shown in Equation (9): where ∆t i→i+1 is the time difference between p i and p i+1 . The ST-Matching algorithm [7] calculates velocity similarity by comparing the difference between the speed limit and the average velocity. Due to the velocity change and waiting time caused by road traffic congestion, the velocity is always much lower than the speed limit. In addition, it is difficult to obtain real-time traffic flow information. Therefore, HTMM calculates velocity similarity by comparing the difference between the historical velocity and the average velocity. The historical velocity of the road segment of the local candidate path is: Therefore, the transition probability is: The optimal matched points are denoted as m r 1 → m r+1 2 → · · · → m s n , which consists of the real path. From the first GPS point, the union probability is calculated recursively until the last GPS point is reached. The union probability is the product of the observation probability and the transition probability. The Viterbi algorithm is used to obtain the solution.

Location Query and Tracking
In order to achieve more accurate query of cargo shippers and carriers on the locations of containers, a container location query and tracking method based on matching results is proposed.
Given time t, the location (p ) of the container is obtained with the following steps: (1) Reconstruct the matched path as meta paths to obtain the enter time and the leave time according to the method in Section 3.4.2. The meta path is represented as (MP 1 , MP 2 , · · ·). (2) Find the meta path (MP j ) where the container is located, which satisfies: (12) so that the container is on e j . × e j .end.lng − e j .start.lng (13) where v represents the average velocity of the GPS point.
The reconstructed path can be added to the historical index library again, which reinforces the positioning and tracking of later GPS points.

Experiments
In this section, the effectiveness of HTMM is verified on a real trajectory dataset. HTMM is compared with two baseline algorithms by evaluating the matching accuracy and the matching time. The matched results in two typical situations are visualized to illustrate the improvement of HTMM. The positioning errors before and after location correction are calculated to show the effect of map matching in reducing positioning error.

Trajectory Dataset and Road Network
The experimental dataset comes from the real-time GPS positioning records of 15000 trucks of a freight company in Shenzhen and surrounding cities. The places of departure and destination include container terminals, logistics parks, ports, etc., as shown in Figure 8.   HTMM is performed on a region whose latitude and longitude range is (22.5424, 113.8489, 22.5966, 113.9257). The road network downloaded from Open Street Map is contains 17,108 nodes and 18,892 edges, as shown in Figure 9.  Before the trajectory dataset is used for map matching, the noise and redundant GPS points are filtered, and the long trajectories are segmented. The number of GPS points of each trajectory is displayed in Figure 10a, and the sampling rate of each GPS point is displayed in Figure 10b. Most trajectories have fewer than 50 GPS points, and the sampling rates of 63% of GPS points are higher than 30 s.

Experimental Settings
Ground Truth: A total of 566 high-sampling-rate historical trajectories with sampling intervals less than 30 s are selected to match HMM-based map matching algorithm to obtain the real paths as ground truth. Then, these real paths are used to reconstruct meta paths and build a historical path index library.
Baselines: An HMM-based map matching algorithm [8] and the ST-Matching algorithm [7] are used as the comparison algorithms to verify the effectiveness of HTMM.

Experimental Settings
Ground Truth: A total of 566 high-sampling-rate historical trajectories with sampling intervals less than 30 s are selected to match HMM-based map matching algorithm to obtain the real paths as ground truth. Then, these real paths are used to reconstruct meta paths and build a historical path index library.
Baselines: An HMM-based map matching algorithm [8] and the ST-Matching algorithm [7] are used as the comparison algorithms to verify the effectiveness of HTMM.
Evaluation Criteria: Matching accuracy and matching time are chosen to evaluate HTMM. The matching time is measured by the running time of the program, which does not include the time to build the historical path index library and load the road network data. The matching accuracy is measured by the proportion of the number of correctly matched road segments to the total number of road segments of trajectories, as shown in Equation (14): where N c represents the number of correctly matched road segments, and N t represents the number of total road segments. Test Data: A total of 100 high-sampling-rate trajectories are selected to generate test data, which are first matched to the road network to obtain the real paths as ground truth, and then down-sampled at 60 s, 120 s, 180 s, 240 s, 300 s, 360 s, 420 s, 480 s and 600 s intervals.

Comparison of Three Map Matching Algorithms
A change in sampling rate leads to a change of the distance between two adjacent GPS points. Figure 11 shows the average distance between two adjacent GPS points of the test data under different sampling rates. The average distance is 0.67 km when the sampling rate is 60 s but reaches 2.36 km when the sampling rate is 600 s. This introduces problems such as matching interruption and matching error and reduces the matching accuracy.
As shown in Figure 12, the matching accuracy of the three algorithms gradually decreases as the sampling rate decreases. It is clear that HTMM outperforms the HMMbased and ST-Matching algorithms when the sampling rate ranges from 60 s to 300 s. This result may be due to the fact that the paths between adjacent GPS points do not conform to the shortest path principle. This also proves that it is feasible to select local paths according to historical information. However, the accuracy of HTMM drops sharply when the sampling rate ranges from 360 s to 600 s. The main reason for this is that as the path length increases, there are more historical paths to choose from. Drivers also make choices according to the current situation, not just relying on historical experience. In this case, the current information of GPS points is not fully utilized by the HTMM algorithm.

Comparison of Three Map Matching Algorithms
A change in sampling rate leads to a change of the distance between two adjacent GPS points. Figure 11 shows the average distance between two adjacent GPS points of the test data under different sampling rates. The average distance is 0.67 km when the sampling rate is 60 s but reaches 2.36 km when the sampling rate is 600 s. This introduces problems such as matching interruption and matching error and reduces the matching accuracy. Figure 11. The average distance between two adjacent GPS points at different sampling rates.
As shown in Figure 12, the matching accuracy of the three algorithms gradually decreases as the sampling rate decreases. It is clear that HTMM outperforms the HMMbased and ST-Matching algorithms when the sampling rate ranges from 60 s to 300 s. This result may be due to the fact that the paths between adjacent GPS points do not conform to the shortest path principle. This also proves that it is feasible to select local paths according to historical information. However, the accuracy of HTMM drops sharply when the sampling rate ranges from 360 s to 600 s. The main reason for this is that as the path length increases, there are more historical paths to choose from. Drivers also make choices according to the current situation, not just relying on historical experience. In this case, the current information of GPS points is not fully utilized by the HTMM algorithm.
Average distance(km) Figure 11. The average distance between two adjacent GPS points at different sampling rates. As shown in Figure 13, with an increase in algorithm complexity, the matching time is increases, so HTMM is inferior to the other two algorithms. Besides the search of global optimal path, the time cost of HTMM is mainly in the calculation of multiple candidate road segments and the retrieval of the historical index library. In particular, the time cost of HTMM increases significantly with a decrease in sampling rate. However, the local candidate path is determined before searching the global matched path. Although the global optimal solution may not be obtained, this operation reduces time cost. It is acceptable to sacrifice time cost for improved matching accuracy. e(s) Figure 12. Matching accuracy of three algorithms at different sampling rates.
As shown in Figure 13, with an increase in algorithm complexity, the matching time is increases, so HTMM is inferior to the other two algorithms. Besides the search of global optimal path, the time cost of HTMM is mainly in the calculation of multiple candidate road segments and the retrieval of the historical index library. In particular, the time cost of HTMM increases significantly with a decrease in sampling rate. However, the local candidate path is determined before searching the global matched path. Although the global optimal solution may not be obtained, this operation reduces time cost. It is acceptable to sacrifice time cost for improved matching accuracy.
is increases, so HTMM is inferior to the other two algorithms. Besides the search of global optimal path, the time cost of HTMM is mainly in the calculation of multiple candidate road segments and the retrieval of the historical index library. In particular, the time cost of HTMM increases significantly with a decrease in sampling rate. However, the local candidate path is determined before searching the global matched path. Although the global optimal solution may not be obtained, this operation reduces time cost. It is acceptable to sacrifice time cost for improved matching accuracy. Figure 13. Matching time of three algorithms at different sampling rates.

Visual Analysis of Typical Situations
Two typical matched paths are selected from the matched results for visual analysis.
(1) Detours As shown in Figure 14, there are many detours in the real path represented by the green line. After using the ST-Matching algorithm, the detour is replaced by a shorter path when the sampling rate is 60 s. The result become worse when the sampling rate is 120-300 s. This is the result of selecting the shortest path. It is found that the travel time of

Visual Analysis of Typical Situations
Two typical matched paths are selected from the matched results for visual analysis.
(1) Detours As shown in Figure 14, there are many detours in the real path represented by the green line. After using the ST-Matching algorithm, the detour is replaced by a shorter path when the sampling rate is 60 s. The result become worse when the sampling rate is 120-300 s. This is the result of selecting the shortest path. It is found that the travel time of detours and the shortest paths is very different, so HTMM solves this problem well by calculating the travel time. However, HTMM produces errors in the selection of main roads and auxiliary roads.  (2) Multiple Optimal Paths As shown in Figure 15, different sampling rate trajectories obtain different matched paths after using the ST-Matching algorithm. This is because the length and direction angle of these optional paths are similar, so the ST-Matching algorithm does not know which one to select. HTMM solves this problem by adding the frequency of the historical paths as the measurement of path selection. It is believed that the result is reliable if the historical As shown in Figure 15, different sampling rate trajectories obtain different matched paths after using the ST-Matching algorithm. This is because the length and direction angle of these optional paths are similar, so the ST-Matching algorithm does not know which one to select. HTMM solves this problem by adding the frequency of the historical paths as the measurement of path selection. It is believed that the result is reliable if the historical path index library is personalized and large enough. (2) Multiple Optimal Paths As shown in Figure 15, different sampling rate trajectories obtain different matched paths after using the ST-Matching algorithm. This is because the length and direction angle of these optional paths are similar, so the ST-Matching algorithm does not know which one to select. HTMM solves this problem by adding the frequency of the historical paths as the measurement of path selection. It is believed that the result is reliable if the historical path index library is personalized and large enough.

Positioning Error Analysis
In order to verify the effect of HTMM on positioning error correction, the GPS points eliminated by down sampling are taken as the points that need to query the locations. The real locations of these GPS points are known. The meta paths of the matched path are reconstructed to acquire the enter time and the leave time; then, the corrected locations are calculated according to the method introduced in Section 3.4.2. The measurement error of the GPS point is the distance difference between the observation location and its location in the historical real path, the average of which is 15.06 ± 5.57 m. The positioning error is the distance difference between the corrected location and its location in the historical real path, as shown in Table 2. The positioning error is reduced by nearly half when the sampling rate is 60 s. HTMM can eliminate the error perpendicular to roads. However, the error parallel to roads, which is caused by change in speed and other factors, cannot be eliminated.

Conclusions
In this paper, a novel map matching algorithm called HTMM is proposed. HTMM is the first map matching algorithm for container positioning and tracking with missing and low-sampling-rate trajectories. Compared with other methods that improve positioning precision at the hardware level, HTMM has a significant cost advantage.
The biggest innovation of HTMM is to propose a local path selection strategy that utilizes travel time and the frequency mined from historical paths. A path reconstruction method is introduced to determine the travel time of historical paths on each road segment before a historical path index library is built. A historical path index library is developed to accelerate the index speed when historical paths are searched in HTMM. Finally, to realize the query and tracking of containers, a location query and tracking method is presented by road segment retrieval and time-distance interpolation.
After verification on a real trajectory dataset, the matching accuracy of HTMM is proven to be better than that of an HMM-based map matching algorithm and the ST-Matching algorithm, especially when the sampling interval ranges from 30 s to 300 s. The matching accuracy is not improved much when the sampling interval is higher than 300 s. We infer that this is because with changing traffic conditions, the choice of drivers at each intersection between two adjacent GPS points is not necessarily the same every time. The longer the distance between two adjacent GPS points, the greater the diversity. In this case, HTMM only selects one complete historical path with the largest score as the local candidate path.
In the future work, we will further optimize the local candidate path selection strategy. One method is to segment the longer path on the basis of intersections and select a local candidate path for each partial path. Another method is to consider more real-time traffic information to improve the Markov decision process. In addition to this work, we will conduct experiments on a larger trajectory dataset and road network to improve the generalization of HTMM. HTMM will be deployed in online map matching combining with location query and tracking to realize the real-time positioning and tracking of containers, which will greatly improve the positioning precision and safety of container transportation.