1. Introduction
With the growth in the number of new query dimensions that address spatial and temporal properties, business intelligence analysts claim that the buzzword Big Data [1] nowadays puts less emphasis on the sheer size of data and more on the v-terms variety and veracity.
In the context of spatial data, this shift can be seen in two facts: On the one hand, in most parts of the world, modern network data approach a nearly perfect mapping between digital representation and real-life infrastructure. On the other hand, diverse sources of spatial data emerge due to the wide availability of mobile gadgets that collect location data (e.g., geotagging, geocaching, etc.). This phenomenon also involves structured geoinformation data sources such as Open Street Map (OSM).
Nowadays, spatial database systems already provide useful query access to data with spatial information [4]. They enable fast spatial queries that may involve routing, but they do not provide an easy aggregation of diverse data formats. This becomes apparent when dealing with routing networks and spatial objects. Common spatial systems map the network locally onto a plane such that the Euclidean metric can be used for fast geographic reasoning.
On the one hand, adding, moving or deleting vertices or edges can naturally be interpreted geometrically. On the other hand, naïve approaches neglect the following problems:
First, the network’s graph might not be planar. When projecting a non-planar graph onto a plane, the image contains intersecting lines that represent two different edges. By merely looking at the image, we cannot differentiate whether it is a crossing or whether one street tunnels under the other. A geometric query that is unaware of this fact might deliver results that do not match real-world expectations. Let us, for instance, consider a long bridge crossing a valley. A query for the nearest access point to the routing network of the valley might suggest this bridge as a result, although it is not directly accessible from the valley.
A second problem arises when very distant points are taken into consideration. For large routing networks, it is not possible to project them onto a compact plane that respects both the angles and the distances of the Earth. That is because the Earth’s surface is not isometric to any flat model. For instance, let us take a look at the Mercator projection [3]. It gained popularity because it preserves angles and is still used for course information in marine navigation systems [5]. Furthermore, the projection provides a good approximation for nearby spatial reasoning. Unfortunately, the projected objects get distorted in length and area, and the distortion grows with the objects’ distance from the equator. Hence, the projection exaggerates the distances and sizes of spatial objects near the poles. This makes spatial objects with large differences in latitude incomparable and is a main criticism of the Mercator projection. In order to cope with large-distance reasoning, we have to leave the concept of a single chart and move towards a concept that represents a “truer” form of the Earth. In other words, distances between distant objects are measured along a geodesic line instead of on some projection.
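To quantify the distortion just described, the following sketch assumes the spherical form of the Mercator projection with a mean Earth radius of 6371 km (an assumption; navigational charts often use the ellipsoidal variant). It prints the local scale factor sec φ, which equals 2 at 60° latitude, and compares how tall a one-degree latitude band appears near the equator and near 80°. The function names are illustrative only.

```python
import math

R = 6_371_000.0  # assumed mean Earth radius in metres (spherical Mercator)

def mercator_y(lat_deg: float) -> float:
    """Northing of the spherical Mercator projection for a latitude in degrees."""
    phi = math.radians(lat_deg)
    return R * math.log(math.tan(math.pi / 4 + phi / 2))

def mercator_scale(lat_deg: float) -> float:
    """Local scale factor sec(phi): lengths at this latitude are stretched by it."""
    return 1.0 / math.cos(math.radians(lat_deg))

# One degree of latitude occupies very different heights on the map.
near_equator = mercator_y(1) - mercator_y(0)
near_pole = mercator_y(81) - mercator_y(80)
for lat in (0, 30, 60, 80):
    print(f"latitude {lat:2d} deg: scale factor {mercator_scale(lat):.2f}")
print(f"1 deg band at 80-81 deg is {near_pole / near_equator:.1f}x taller than at 0-1 deg")
```

Because the scale factor depends only on latitude, two objects of equal real size are drawn at equal size only if they lie at the same latitude, which is exactly why objects far apart in latitude become incomparable.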
Having recognized these facts, we want to deal with the following emerging problem.
Research Hypothesis. Given a (global) routing network that is based on a geometric representation, we want to enhance the network in such a way that queries based on geometric information are answered by matching the geometric position with the network. This should be conducted in such a way that respects the nature of the geometric representation.
Most current spatial systems are based on the World Geodetic System [2]. Its underlying model shapes the Earth as a spheroid. In fact, when neglecting local height differences, e.g., mountains and valleys, it is a good approximation.
Remark 1 ([6]). The representation of the Earth as a spheroid can be formalized by local parametrizations such as
$\Phi(\theta, \psi) = (a\cos\theta\cos\psi,\; a\cos\theta\sin\psi,\; b\sin\theta)$ with $\theta \in (-\pi/2, \pi/2)$ and $\psi \in (-\pi, \pi)$,
where the semi-axes $a$ and $b$ have to be chosen such that the overall error is minimized [7]. Here, θ denotes the inclination (i.e., latitude) and ψ the azimuth (i.e., longitude). The intervals are chosen in such a way that there are no degenerate points, i.e., each inverse function and its derivative are continuously defined.
In this article, we show an approach combining a routing network with a geometric representation that respects distances globally. In simple terms, we augment a given routing network’s graph with new vertices in such a way that the new graph can give us routing information from and to the newly created vertices, which reference geometric points that were not yet represented by the routing network. For brevity, we use the term linkage for such a procedure.
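As a minimal sketch of the spheroid parametrization in Remark 1, the code below maps chart parameters to points in ℝ³, assuming the WGS84 values a = 6378137 m and b ≈ 6356752.314 m for the semi-axes. Strictly speaking, θ in this form is the parametric (reduced) latitude rather than the geodetic latitude reported by GPS receivers; the function name spheroid_point is hypothetical.

```python
import math

# WGS84 semi-major axis and (derived) semi-minor axis in metres.
WGS84_A = 6378137.0
WGS84_B = 6356752.314245

def spheroid_point(theta: float, psi: float, a: float = WGS84_A, b: float = WGS84_B):
    """Map chart parameters (theta, psi) in radians to a point in R^3 on the spheroid.

    theta is the latitude-like parameter in (-pi/2, pi/2), psi the longitude in
    (-pi, pi); the open intervals exclude the poles and the antimeridian, where
    this particular chart would degenerate.
    """
    if not (-math.pi / 2 < theta < math.pi / 2 and -math.pi < psi < math.pi):
        raise ValueError("(theta, psi) lies outside the open domain of this chart")
    return (a * math.cos(theta) * math.cos(psi),
            a * math.cos(theta) * math.sin(psi),
            b * math.sin(theta))

x, y, z = spheroid_point(math.radians(48.14), math.radians(11.58))  # roughly Munich
# Every image point satisfies the spheroid equation (x^2 + y^2)/a^2 + z^2/b^2 = 1.
print(round((x * x + y * y) / WGS84_A**2 + z * z / WGS84_B**2, 12))
```

Points at the poles or on the antimeridian need a second chart with shifted intervals, which is why the remark speaks of several local parametrizations.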
1.1. Related Work
Spatial reasoning can be conducted on graph-theoretical foundations only if all interesting points are represented by the graph. Moreover, a connected graph is necessary for useful routing queries. In most common techniques, missing parts of the street infrastructure are either:
Every case may lead to an inaccurate or incomplete routing network. If the position is uncertain or the information about the area is incomplete, it is not possible to exactly match a location point with a vertex of a routing network. In scenario (1), the captured images can be used to re-verify the constructed routing network. Wiedemann and Ebner’s algorithm [8] uses detour lengths and connection completeness as criteria when analyzing imagery. In terms of case (2), this problem is common for navigation systems that have to figure out on which path an object is moving by analyzing its trajectory. In fact, this process is so frequent that it has a name of its own: the map matching problem [9]. For instance, Haunert and Budig [10] used collected trajectories to discover missing road parts. Alternatively, Lou et al. [11] took an initial candidate list and used transition probabilities to minimize the error of choosing the right trajectory. For case (3), we generally have no additional information to support our decision for linking a non-mapped point to the network. The easiest setting is a planar routing network mapped onto a surface: For each point of interest (POI) we want to connect, we just add an edge to the closest location of the graph. For proximity analysis, Dahlgren and Harrie [12] connected each geolocation with the nearest reference point of the routing network. de Jong and Tillema [13] took the Delaunay triangulation of their existing road network as a criterion for linking non-connected points. They further discarded those parts of the Delaunay graph that intersected with obstacles. Last but not least, Aronov et al. [14] took possible detours into account in their method for linking new points to the network. They called the newly created edge a feed-link. Savic and Stojakovic [15] further proposed an algorithm to compute this feed-link in linear time.
For mobile ad hoc networks, Blazevic et al. [16] proposed so-called terminode routing to address holes in mobile network topologies. Durocher et al. [17] reviewed various geometric routing strategies for wireless network protocols.
Although some of these approaches share similarities with the techniques introduced in this article, they differ in the problem statement. To the best of our knowledge, no former study has added points to spatial systems based on geometric distances. Our problem setting has not been treated in terms of spatial databases that may store wide routing networks along with spatial representations of both the routing networks’ vertices and POIs. In this paper, we provide a conceptual foundation for describing routing networks globally on Earth’s surface, which we then translate to the field of spatial databases.
1.2. Structure of the Paper
In more detail, we start with some preliminaries that introduce graphs and manifolds in Section 2. Briefly, we want to represent our routing network as a special class of graphs that can be mapped onto a surface used for geometric reasoning. A modification of the routing network is conducted based on the geographic representation of the graph. Basically, we use manifolds for geometric reasoning. In order to connect the actual network with geometric points, we demand the routing network to be projectable onto the manifold. This will allow us to measure geometric distances. For that, we introduce a lower bound in Section 2.4 that we use to elaborate upon a theoretical solution in Section 3. The provided approach is a solution to the research hypothesis in the sense that the geometric shape of the manifold is truthfully respected. The translation to spatial databases is conducted in Section 4. We further evaluate the implementation in Section 4.1 and provide an outlook in Section 4.3.
2. Preliminaries
Let us recall some essential textbook concepts such as graphs, manifolds and metrics, which allow us to formulate our problem precisely.
2.1. Graphs
In order to represent a routing network on the Earth, let us recall the definition of a graph [18]:
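Definition 1. A directed weighted graph $G = (V, E, c)$ is a triple consisting of a vertex set $V$, an edge set $E \subseteq V \times V$ and a cost measure $c\colon E \to \mathbb{R}_{\ge 0}$. For two vertices $u, v \in V$, let us denote with $(u, v) \in E$ the edge that connects $u$ to $v$. To avoid the excessive usage of brackets, we simplify the expression $c((u, v))$ to $c(u, v)$ for an edge $(u, v) \in E$. A walk is a consecutive succession of vertices $P = (v_0, v_1, \ldots, v_k)$ for an arbitrary $k \in \mathbb{N}_0$ such that there exists an edge $(v_{j-1}, v_j) \in E$ for each $j \in \{1, \ldots, k\}$. If the vertices $v_j$ and $v_{j'}$ of a walk are pairwise different for all $j \neq j'$, we call $P$ a path. We say that the walk $P$ with vertices $v_0, \ldots, v_k$ leads “from $a$ to $b$” when $v_0 = a$ and $v_k = b$. We call $G$ connected if there exists a walk from $a$ to $b$ for all $a, b \in V$. Moreover, we define the length of the walk by $\operatorname{len}(P) := \sum_{j=1}^{k} c(v_{j-1}, v_j)$. Further, we denote with $\ell(a, b)$ the infimum of the lengths of all walks from $a$ to $b$, i.e.,
$\ell(a, b) := \inf\{\operatorname{len}(P) \mid P \text{ is a walk from } a \text{ to } b\}$.
We further want to examine a special class of graphs whose path length function ℓ is restricted in such a way that we can find a metric that is a lower bound of ℓ. This restricts ℓ to be a member of the following function class: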
Definition 2. Let V be a set. A mapping $d\colon V \times V \to \mathbb{R}$ is called a quasimetric if it satisfies all axioms of a metric (non-negativity, definiteness and the triangle inequality) except possibly symmetry.
Remark 2. For our approach, we do not assume routing networks to be symmetric. In fact, cost functions based on fuel or calorie consumption and estimated time are valid examples for directed networks that are in general non-symmetric.
Definition 3. If $V$ and $c$ of a connected graph $G = (V, E, c)$ satisfy the conditions
$|V| < \infty$ and $c(e) > 0$ for all $e \in E$,
then $\ell\colon V \times V \to \mathbb{R}$ is a quasimetric. We then call $G$ a quasimetric network and ℓ the quasimetric induced by $c$. See [19] for an indexing data structure built on a quasimetric network.
Lemma 1. ℓ satisfies the triangle inequality, although c is not required to have this property.
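Proof. Because $c(e) > 0$ for all $e \in E$, a shortest walk will always be a simple path: Every walk that contains a cycle is not a path, and it can be made simple by removing all cycles, which strictly shortens the walk. Hence, the value of ℓ stays the same when taking the infimum of the lengths of all paths only. Since $V$ is finite, there are only finitely many paths, so this infimum is attained; we call a path attaining it a shortest path. Hence, we obtain
$\ell(a, b) = \min\{\operatorname{len}(P) \mid P \text{ is a path from } a \text{ to } b\}$.
From $c(e) > 0$ for all $e \in E$, we obtain the non-negativity of ℓ.
As the trivial walk $(u)$ from $u$ to $u$ has length $0$, we obtain $\ell(u, u) = 0$ for all $u \in V$. As every walk from $u$ to some $v \neq u$ contains at least one edge and $c(e) > 0$ for each $e \in E$, we can conclude that $\ell(u, v) > 0$ for each $v \neq u$. Thus, we obtain the positive definiteness of ℓ.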
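If we define the concatenation of a walk $P = (v_0, \ldots, v_k)$ and a walk $Q = (w_0, \ldots, w_m)$ with $v_k = w_0$ by
$P \circ Q := (v_0, \ldots, v_k, w_1, \ldots, w_m)$,
the triangle inequality is simple to show: For arbitrary $u, v, w \in V$, let $P$ be a shortest path from $u$ to $v$ and $Q$ a shortest path from $v$ to $w$. Then, we can generate a walk $P \circ Q$ from $u$ to $w$ by combining both paths, so we have $\operatorname{len}(P \circ Q) = \operatorname{len}(P) + \operatorname{len}(Q) = \ell(u, v) + \ell(v, w)$. Since $\ell(u, w)$ is the infimum of the lengths of all walks from $u$ to $w$, it follows that $\ell(u, w) \le \ell(u, v) + \ell(v, w)$.
□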
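Lemma 1 can be checked numerically on a small example. The sketch below assumes a hypothetical three-vertex graph with one-way costs, computes ℓ with Dijkstra’s algorithm from every source, and asserts definiteness and the triangle inequality; it also exhibits the asymmetry that Definition 2 permits. The helper names are illustrative only.

```python
import heapq
import itertools

def shortest_costs(vertices, edges, source):
    """Dijkstra: ell(source, v) for all v; edges maps (u, v) -> c(u, v) > 0."""
    adj = {v: [] for v in vertices}
    for (u, v), cost in edges.items():
        adj[u].append((v, cost))
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, cost in adj[u]:
            if d + cost < dist[v]:
                dist[v] = d + cost
                heapq.heappush(heap, (dist[v], v))
    return dist

def induced_quasimetric(vertices, edges):
    """All-pairs ell as a dict (u, v) -> cost."""
    return {(s, t): d for s in vertices
            for t, d in shortest_costs(vertices, edges, s).items()}

# Hypothetical one-way street loop: costs are positive but not symmetric.
V = ["a", "b", "c"]
E = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "a"): 4.0, ("b", "a"): 5.0}
ell = induced_quasimetric(V, E)

for u, v, w in itertools.product(V, repeat=3):
    assert ell[(u, w)] <= ell[(u, v)] + ell[(v, w)]  # triangle inequality
for u, v in itertools.product(V, repeat=2):
    assert (ell[(u, v)] == 0) == (u == v)            # definiteness
print(ell[("a", "b")], ell[("b", "a")])  # 1.0 5.0: not symmetric
```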
Remark 3. In some technical scenarios, the definiteness is too restrictive. For instance, a database user may want to store two vertices with zero distance when one node shall represent the actual POI and the other the street segment at which the POI is located. If we drop definiteness in Definitions 2 and 3, we have to take care with the definition of the embedding i below, cf. Example 5.
2.2. Manifolds
With regard to the vast number of encoding formats for spatial data, we describe our approach on a theoretical level, independently of any encoding. Therefore, we emphasize the concept of manifolds [20] in order to model space. The core idea is that we use several charts that map locally to a plane. The images of these charts can be glued together to carry local reasoning over multiple planes. The following definition gives a precise characterization:
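Definition 4 ([20]). A compact n-dimensional manifold is a compact topological space M for which a finite family of homeomorphisms $\varphi_j\colon U_j \to \varphi_j(U_j) \subseteq \mathbb{R}^n$, $j \in \{1, \ldots, m\}$, exists with $U_j \subseteq M$ open for each $j$ such that $M = \bigcup_{j=1}^{m} U_j$. Each $\varphi_j$ is called a chart, and $\{\varphi_1, \ldots, \varphi_m\}$ is called an atlas of M.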
In differential geometry, the Earth is often modeled as an oriented surface $M \subset \mathbb{R}^3$, i.e., a two-dimensional topological manifold that embeds into the space $\mathbb{R}^3$ equipped with the Euclidean metric. We use the transition maps $\varphi_k \circ \varphi_j^{-1}$ in order to prolong geodesics over multiple charts. The $U_j$ are open, connected subsets of M that cover M. For example, any sphere or spheroid is a surface, and the inverses of local parametrizations such as those of Remark 1 are the charts of this manifold. Further, every oriented surface has a Gauss map [21] that defines the normal field $\nu\colon M \to S^2$. Informally, M encodes the position on the surface, while ν allows us to express elevation relative to the surface.
2.3. Routing Networks
We now introduce another manifold N that accounts for routing networks. In general, we will allow $N \not\subseteq M$, i.e., bridges, tunnels or other partially inaccessible street segments can be expressed by this model.
Definition 5. Let $G = (V, E, c)$ be a quasimetric network. Further, let $i\colon V \cup E \to \mathcal{P}(N)$ be an embedding in a manifold $N \subset \mathbb{R}^3$, where $\mathcal{P}(N)$ is the power set of N. Let the following conditions hold:
with
For each $p \in N$, there exists an $x \in M$ and an $h \in \mathbb{R}$ such that $p = x + h\,\nu(x)$ (embedding property). This property is based on the fact that street segments may not be on the surface. Informally, the parameter h accounts for the altitude difference between the ground and the street; e.g., bridges have positive heights and tunnels negative heights. This separation between spherical data and altitude is also common when dealing with the WGS84 format that encodes points on M by latitude and longitude [22]. Let $\pi\colon N \to M$, $x + h\,\nu(x) \mapsto x$, be the (canonical) projection of the graph onto M. Then, we assume that for all $e \in E$ there exists some $j$ with $\pi(i(e)) \subseteq U_j$, i.e., we stipulate that the complete image of each edge is contained in the domain of at least one chart (containment property).
For each edge $e = (a, b) \in E$, there is a line $g_e\colon [0, 1] \to N$ with $g_e(0) = i(a)$ and $g_e(1) = i(b)$, i.e., $g_e$ is a linear, continuous, parameterized geodesic line with a and b as endpoints (line-string property).
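Two of these lines $g_e$ and $g_{e'}$ are called equivalent when $g_e = g_{e'}$ or $g_e(t) = g_{e'}(1 - t)$ for all $t \in [0, 1]$. If we denote equivalence with ∼ and define the quotient set $\Gamma := \{g_e \mid e \in E\}/{\sim}$, then the open line segments $g((0, 1))$ of the representatives $g$ of Γ, together with the vertex images $i(V)$, shall form a disjoint partition of N (disjoint property).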
We then call the tuple $(G, N, i)$ a routing network. Note that N is connected, since G is connected. In particular, if the projection π is injective on N and $\pi(N) \subseteq U_j$ for some $j$, then the routing network G is planar. Let us use the symbol $\mathcal{R}$ for the class of routing networks.
Remark 4. The equivalence relation ∼, whose classes form Γ, can be understood as a lift of the equivalence relation that renders a directed graph undirected by identifying each edge $(u, v)$ with its reverse $(v, u)$.
Let us consider the scenario of a spatial recommender system that uses the current location of a user to search for close POIs. The geolocation is collected by a mobile navigation system. Hence, the recommender has to map this polled location to its routing network in order to compute distances. The query point might not even be in the image of i. We have to add these points to the graph by increasing the number of vertices and edges in such a way that these particular points can be accessed by our network as well. A vertex onto which a geographic point is mapped is called a location reference. We will show some ideas on how to model location references so that shortest path evaluation retrieves good results in relation to real-world expectations.
2.4. Lower-Bounding Metrics
The cost function c of a routing network can express various measures: distance, travel time, fuel consumption, etc. The definiteness of c does not allow us to add new vertices to networks with edges of zero cost. In fact, we want to give the new location reference a penalty for the expense of traveling from (to) the current location to (from) the routing network. Most scenarios inherit a lower bound for this expense, e.g., the beeline when c encodes travel distances. We discuss some further scenarios and show possible lower bounds, which we formalize in the following definition:
Definition 6. We call a metric d such that $d(u, v) \le \ell(u, v)$ for all vertices $u, v \in V$, where ℓ denotes the quasimetric induced by c, a lower-bounding metric of ℓ. For a set $A$ and a point $p$, we use the common shortcut $d(A, p) := \inf_{x \in A} d(x, p)$.
Example 1. Let $(G, N, i)$ be a routing network in which each edge $e \in E$ represents a street segment. For each street segment e, the network can query $c(e)$, the estimated time needed to get from the start point of e to its end point. The induced quasimetric ℓ satisfies the triangle inequality: For each start and end point $s, t \in V$, the obtained value of $\ell(s, t)$ is the length of a shortest path from s to t, i.e., a path whose accumulated estimated time is shortest.
Example 2. Let $c(u, v)$ be the Euclidean distance between the mappings of $u$ and $v$ to a local plane. Such a mapping exists for each edge due to the containment property. For such a c, the following metrics are valid lower bounds:
A common local lower-bounding metric is the beeline when the network is mapped to a local plane. It would be tedious to chain multiple charts via transition maps in order to calculate the beeline between two distant points. Fortunately, the geodesic distance on M is easy to approximate [23]. For example, Vincenty’s algorithm [24] gives a good approximation of the geodesic distance between two points on the surface of the Earth. The Euclidean metric in $\mathbb{R}^3$ is a lower-bounding metric as well. In particular, this metric is also a lower bound of the local beeline because it neglects the curvature of the manifold M, cf. Figure 1.
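The claim that the Euclidean metric in ℝ³ lower-bounds distances measured on M can be illustrated with a simpler model than Vincenty’s algorithm. The sketch below assumes a spherical Earth (mean radius 6371 km) and uses the haversine formula as the geodesic distance, comparing it with the straight chord through ℝ³. Swapping in Vincenty’s method on the spheroid would change the numbers but not the inequality, since no curve on a surface is shorter than the straight segment between its endpoints. The coordinates are approximate and for illustration only.

```python
import math

R = 6_371_000.0  # assumed mean Earth radius in metres; a sphere, not the WGS84 spheroid

def to_xyz(lat_deg, lon_deg, r=R):
    """Embed a (latitude, longitude) pair on the sphere into R^3."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))

def chord(p, q):
    """Euclidean distance in R^3: ignores curvature, so it can only be shorter."""
    return math.dist(to_xyz(*p), to_xyz(*q))

def great_circle(p, q, r=R):
    """Haversine distance along the sphere's surface (a geodesic on the sphere)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(min(1.0, math.sqrt(h)))  # clamp guards against rounding

munich, new_york = (48.14, 11.58), (40.71, -74.01)
print(f"chord {chord(munich, new_york) / 1000:.0f} km <= "
      f"great circle {great_circle(munich, new_york) / 1000:.0f} km")
```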
Definition 7. Let $(G, N, i)$ be a routing network and d a metric. If d is a lower-bounding metric of ℓ, where ℓ is the quasimetric induced by c, we say that G respects the metric d.
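Definition 7 can be checked directly on a small network. The sketch below assumes hypothetical planar coordinates in kilometres, travel-time costs in hours and an assumed maximum speed V_MAX; it computes ℓ with the Floyd-Warshall algorithm and tests $d(u, v) \le \ell(u, v)$ for all pairs. Dividing the beeline by the maximum speed yields a lower bound on travel times, whereas the raw beeline in kilometres does not.

```python
import itertools
import math

def floyd_warshall(vertices, edges):
    """All-pairs shortest path costs ell for a directed graph given as {(u, v): cost}."""
    ell = {(u, v): (0.0 if u == v else math.inf) for u in vertices for v in vertices}
    for (u, v), cost in edges.items():
        ell[(u, v)] = min(ell[(u, v)], cost)
    # product(..., repeat=3) varies k slowest, so k is the outermost loop as required.
    for k, i, j in itertools.product(vertices, repeat=3):
        if ell[(i, k)] + ell[(k, j)] < ell[(i, j)]:
            ell[(i, j)] = ell[(i, k)] + ell[(k, j)]
    return ell

def respects(vertices, edges, d):
    """Definition 7: d is a lower-bounding metric of ell, i.e. d(u, v) <= ell(u, v)."""
    ell = floyd_warshall(vertices, edges)
    return all(d(u, v) <= ell[(u, v)] for u in vertices for v in vertices)

# Hypothetical data: planar positions in km and travel times in hours.
POS = {"u": (0.0, 0.0), "v": (3.0, 4.0), "w": (6.0, 0.0)}
TIMES = {("u", "v"): 0.1, ("v", "w"): 0.1, ("w", "u"): 0.5, ("u", "w"): 0.5}
V_MAX = 130.0  # assumed maximum speed in km/h

def beeline_time(a, b):
    """Beeline divided by the maximum speed: a lower bound on any travel time."""
    return math.dist(POS[a], POS[b]) / V_MAX

def beeline_km(a, b):
    """Plain beeline in km, which is not comparable to costs in hours."""
    return math.dist(POS[a], POS[b])

print(respects(POS, TIMES, beeline_time))  # True
print(respects(POS, TIMES, beeline_km))    # False
```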
3. Linkage to Network
Let us again formulate our posed problem on a surface M: Given a routing network $(G, N, i)$, its geometric representation N and a lower-bounding metric d, we want to add a new vertex $v_a$ representing a point $a \in M$ to G in such a way that:
The resulting routing network stays valid. In particular, the graph shall remain connected, i.e., there is an edge with $v_a$ as an end.
The modified geometric representation $i'$ maps $v_a$ to $a \in M$. Informally, this means that a pedestrian can access $v_a$ from the ground.
d is still a valid lower-bounding metric.
We call this transformation a linkage and now pose a formal definition of it:
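Definition 8. Let $(G, N, i) \in \mathcal{R}$ and let ℓ be the quasimetric induced by i. A linkage is some mapping that assigns to $(G, N, i)$ and a point $a \in M$ a routing network $(G', N', i')$ with the property that the resulting quasimetric network $G'$ holds the conditions: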
with and .
with for some (that is not necessarily part of V).
; hence, .
There exist some with and .
with .
respects the metric d.
Hence, for all vertices where ℓ and are the quasimetrics induced by i and , respectively.
The crux of this problem lies in the construction of appropriate edges or vertices that sustain the properties of a routing network. In the following, we introduce some construction steps that lead to a simple but complete solution in the end. First, we treat the special case that the point $a \in M$ we want to add belongs to G, i.e., the location reference $v \in V$ with $i(v) = a$ already exists. Hence, nothing has to be done. Second, $a$ may be in the image $i(e)$ of some edge $e \in E$. Then, we split e into two pieces by linear interpolation and add the location reference of a as an intermediate piece. Formally, we have:
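Example 3. We define the linear interpolation of $(G, N, i)$ for a point $a \in M$ as follows:
- 1.
If there exists some $v \in V$ such that $i(v) = a$, then $(G', N', i') := (G, N, i)$.
- 2.
If there exists an edge $e \in E$ such that $a \in i(e)$, then there is some $t \in (0, 1)$ such that $g_e(t) = a$, where $g_e$ is the line induced by $e$ due to the line-string property. Let $u, w \in V$ be the vertices connected by e such that $e = (u, w)$. We add a new vertex $v_a$ with $i'(v_a) = a$. Let us set $c'(u, v_a) := t \cdot c(u, w)$ and $c'(v_a, w) := (1 - t) \cdot c(u, w)$. If $(w, u) \in E$, we further set $c'(w, v_a) := (1 - t) \cdot c(w, u)$ and $c'(v_a, u) := t \cdot c(w, u)$, due to $g_{(w, u)}(1 - t) = a$, which follows from the equivalence of $g_{(w, u)}$ and $g_e$. In the end, we define $E'$ as the set E without $(u, w)$ and $(w, u)$, but with the new edges used above: $(u, v_a)$, $(v_a, w)$, and additionally $(w, v_a)$, $(v_a, u)$ if $(w, u) \in E$.
Let ℓ and ℓ′ be the quasimetrics that are induced by c and c′, respectively. Then, $\ell'(x, y) = \ell(x, y)$ holds for all $x, y \in V$ due to $c'(u, v_a) + c'(v_a, w) = c(u, w)$. Thus, we yield the beeline property. With the same arguments, we are able to preserve the properties of a routing network for $(w, u)$, if $(w, u) \in E$. Because we have not changed the image of i, we can just set $N' := N$, and hence we are finished.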
- 3.
Otherwise, we are certain that $a \notin N$. We use a not yet defined method to create a new network $(G', N', i')$ with some $v_a \in V'$ and $i'(v_a) = a$.
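Case 2 of the linear interpolation can be sketched on a plain dictionary of directed edges. The sketch assumes that costs are split proportionally to the line parameter t and that a reverse edge (w, u) traverses the same line, so that the point lies at parameter 1 − t in that direction; the helper name split_edge and the vertex labels are hypothetical.

```python
def split_edge(edges, u, w, t, new_vertex):
    """Split the directed edge (u, w) at line parameter t in (0, 1).

    edges maps (x, y) -> cost. Returns a new dict in which (u, w) is replaced by
    (u, new_vertex) and (new_vertex, w) with costs t*c and (1 - t)*c. If the
    reverse edge (w, u) exists, it is split at parameter 1 - t in the same way.
    """
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie strictly between 0 and 1")
    result = dict(edges)  # leave the input network untouched
    cost = result.pop((u, w))
    result[(u, new_vertex)] = t * cost
    result[(new_vertex, w)] = (1.0 - t) * cost
    if (w, u) in result:
        back = result.pop((w, u))
        result[(w, new_vertex)] = (1.0 - t) * back
        result[(new_vertex, u)] = t * back
    return result

# Hypothetical two-way street: cost 10 from u to w and 12 from w to u, split at t = 0.25.
E = {("u", "w"): 10.0, ("w", "u"): 12.0}
print(split_edge(E, "u", "w", 0.25, "a"))
```

Since the two new costs of each direction add up to the old cost, every shortest path between the original vertices keeps its length.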
In both specified cases, the linear interpolation is a linkage, i.e., it fulfills the properties of Definition 8. See Figure 2 for an illustration. Note that Example 3 only handles points that are already part of N. In the following, we try to fill case (3). Here, we can be sure that the point to add does not belong to the image of i; hence, $a \notin N$. In order to keep the graph connected, we have to add some edges after inserting the location reference of a. For the search of suitable access points, we take only those segments of the routing network into account that intersect with the manifold M. Informally, the intersection accounts for accessibility from the ground. For example, we consider the internal segments of a tunnel (“below M”) or a bridge (“above M”) inaccessible. To access a tunnel or bridge, a path to the entrance of the tunnel or to the vertex that connects the bridge with M has to be found. For a query point $a \in M$, if the routing network can be drawn on the surface, i.e., the condition $N \subseteq M$ holds, then we do not have to care about inaccessible points.
The first approach tries to solve the problem by simply adding an edge to the closest accessible vertex of the routing network, see Figure 3 for an example. Unfortunately, the resulting graph may not be a routing network anymore. Nevertheless, we can use this approach to elaborate upon Example 4.
Counter-Example 1. We define the vertex linkage
of for a point that is based on linear interpolation. Case (3) of Example 3 is implemented as follows: Define with some , and set and otherwise. Further, let . We then define with . In general, is not a valid linkage with regards to Definition 8. The problem arises when there exists an edge with . This is possible because there might exist an edge with the following properties (cf. Figure 4): and hence;
There exist and such thatwhere is the geodesic of e induced by i.
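The failure of the vertex linkage can be reproduced with a standard orientation-based segment intersection test. In the hypothetical planar configuration below, the vertex closest to the query point a lies on the far side of a long edge, so the new edge from a to that vertex would properly cross an existing edge and violate the disjoint property. Vertex names and coordinates are illustrative only.

```python
import math

def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)

def segments_cross(p1, p2, q1, q2):
    """True if segments p1p2 and q1q2 properly cross (touching or collinear cases excluded)."""
    return (orientation(p1, p2, q1) * orientation(p1, p2, q2) < 0
            and orientation(q1, q2, p1) * orientation(q1, q2, p2) < 0)

# Hypothetical planar network: the long edge y-z passes between a and vertex x.
vertices = {"x": (5.0, 2.0), "y": (0.0, 1.0), "z": (10.0, 1.0)}
edges = [("y", "z"), ("x", "y")]
a = (5.0, 0.0)

nearest = min(vertices, key=lambda v: math.dist(vertices[v], a))
crossed = [e for e in edges
           if segments_cross(a, vertices[nearest], vertices[e[0]], vertices[e[1]])]
print(nearest, crossed)  # x [('y', 'z')]: the new edge a-x would cut through y-z
```

Here the edge y-z is itself closer to a (distance 1) than the vertex x (distance 2), which is precisely the situation the edge linkage exploits.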
Instead of the closest vertex, we can also take some route segment as an access point; we split an edge like the linear interpolation of Example 3 in such a way that the intermediate piece $v_b$ acts as the new location reference. The image $i(e)$ of a suitable edge e should have a point $b \in i(e)$ that is close to a. On a plane, i.e., if M is flat and $N \subseteq M$, the point b is determined by the perpendicular from a to the closest edge with respect to a (or by the nearer endpoint of that edge if the foot of the perpendicular falls outside the segment). In the general case, we have to follow the shortest geodesic from a to the routing network. This approach is visualized in Figure 5.
Example 4 (Edge Linkage). We define the edge linkage of $(G, N, i)$ for a point $a \in M$ that is based on linear interpolation and implement the last procedure as follows: First, search for an edge $e \in E$ that minimizes $d(i(e) \cap M, a)$ and set $b := g_e(t)$, where $g_e$ is the geodesic line induced by $e$ and $t \in [0, 1]$ is chosen such that $d(g_e(t), a)$ is minimal. Let $u, w \in V$ be the vertices attached by e such that $e = (u, w)$. There are now two cases to consider:
- 1.
$t = 0$ or $t = 1$, i.e., $b = i(u)$ or $b = i(w)$. Without loss of generality, let $b = i(u)$ hold. Now we can apply a vertex linkage on G by Counter-Example 1 (with u as the chosen vertex). This linkage satisfies the disjoint property.
- 2.
Otherwise, $t \in (0, 1)$. Hence, b is an interior point of $i(e)$ with $g_e(t) = b$. We apply linear interpolation on G with b and yield a new network G′ with $i'(v_b) = b$ for some new vertex $v_b$. Note that the rule applied by linear interpolation will not modify N. If we exchange G with G′, the first case holds, since there now exists an edge with $v_b$ as an endpoint whose image contains b.
Proof. For (2), we need to show that there is no that intersects with . Let us assume that such an exists. As we have merely exchanged e with in E, has to intersect either or .Then, , a contradiction to the selection of to be the closest edge to a. □
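Restricted to the plane with Euclidean edge costs (an assumption, as is the naming scheme for new vertices), Example 4 can be sketched as follows: the closest edge is found via the clamped perpendicular foot, case 1 links a to an endpoint, and case 2 first splits the edge at b by linear interpolation. The feed edges are added in both directions with the beeline as cost, so the beeline remains a lower bound for them. The function names are hypothetical.

```python
import math

def closest_point_on_segment(p, q, a):
    """Foot of the perpendicular from a onto segment pq, clamped to the endpoints.

    Returns (point, t) with point = p + t * (q - p) and t in [0, 1].
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return p, 0.0
    t = ((a[0] - p[0]) * dx + (a[1] - p[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (p[0] + t * dx, p[1] + t * dy), t

def edge_linkage(pos, edges, a, name):
    """Planar sketch of an edge linkage for a point a that is not on the network.

    pos maps vertex -> (x, y); edges maps (u, w) -> cost (assumed Euclidean lengths).
    Returns new (pos, edges) containing vertex `name` at position a.
    """
    (u, w), (b, t) = min(
        ((e, closest_point_on_segment(pos[e[0]], pos[e[1]], a)) for e in edges),
        key=lambda item: math.dist(item[1][0], a))
    if math.dist(b, a) == 0.0:
        raise ValueError("a lies on the network; use linear interpolation instead")
    pos, edges = dict(pos), dict(edges)
    if t == 0.0 or t == 1.0:
        # Case 1: b coincides with an endpoint, so link a to that vertex.
        access = u if t == 0.0 else w
    else:
        # Case 2: split (u, w) at b by linear interpolation, then link a to b.
        access = name + "_ref"  # hypothetical name for the location reference of b
        pos[access] = b
        cost = edges.pop((u, w))
        edges[(u, access)], edges[(access, w)] = t * cost, (1 - t) * cost
        if (w, u) in edges:
            back = edges.pop((w, u))
            edges[(w, access)], edges[(access, u)] = (1 - t) * back, t * back
    pos[name] = a
    edges[(name, access)] = edges[(access, name)] = math.dist(a, pos[access])
    return pos, edges

POS = {"u": (0.0, 0.0), "w": (10.0, 0.0)}
EDGES = {("u", "w"): 10.0, ("w", "u"): 10.0}
new_pos, new_edges = edge_linkage(POS, EDGES, (4.0, 3.0), "p")
print(new_edges)
```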
Retrieving the closest edge or vertex is usually conducted by a nearest-neighbor search on the database index. A common spatial index structure is the R-tree [26], which organizes geometries in a hierarchy of bounding boxes. For instance, Roussopoulos et al. [27] proposed a branch-and-bound algorithm that evaluates distances between the query point and bounding boxes.
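The pruning idea behind such a branch-and-bound search can be sketched without a full R-tree. The code below assumes a single level of leaf bounding boxes (hypothetical data), visits them in ascending order of their minimum distance (MINDIST) to the query point, and stops as soon as that distance is no smaller than the best candidate found so far. It illustrates only the MINDIST bound, not the complete algorithm of Roussopoulos et al.

```python
import math

def mindist(point, box):
    """Smallest Euclidean distance from a point to the box ((xmin, ymin), (xmax, ymax))."""
    (xmin, ymin), (xmax, ymax) = box
    dx = max(xmin - point[0], 0.0, point[0] - xmax)
    dy = max(ymin - point[1], 0.0, point[1] - ymax)
    return math.hypot(dx, dy)

def nearest(point, leaves):
    """Branch-and-bound nearest neighbour over leaves given as (box, [points])."""
    best, best_d = None, math.inf
    for box, members in sorted(leaves, key=lambda leaf: mindist(point, leaf[0])):
        if mindist(point, box) >= best_d:
            break  # boxes are visited by increasing MINDIST, so all others are farther
        for m in members:
            d = math.dist(point, m)
            if d < best_d:
                best, best_d = m, d
    return best, best_d

LEAVES = [(((0.0, 0.0), (2.0, 2.0)), [(0.0, 0.0), (2.0, 2.0)]),
          (((10.0, 10.0), (12.0, 12.0)), [(10.0, 10.0), (11.0, 11.0)])]
print(nearest((3.0, 3.0), LEAVES))
```

MINDIST never overestimates the distance to any geometry inside a box, which is what makes discarding the remaining boxes safe.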
When used in real-world scenarios, it is often the case that some point $a \in M$ is very close to a mapped vertex $v \in V$, but the coordinates do not match exactly. More precisely, for an $\varepsilon > 0$ small enough, we have $d(i(v), a) < \varepsilon$. Let us consider that we have a huge collection of sensor-collected geolocations that shall be linked to an existing routing network. For the same location, measured values tend to have small differences. Hence, it might be advisable for practical reasons to condense very close points into one vertex. If we relax the properties of i in such a way that a vertex may be mapped to several points, i.e., $a \in i(v)$ instead of $i(v) = \{a\}$, then we could do the following trivial optimization:
Example 5 (Fuzzy matching). If there exists some $v \in V$ with $d(i(v), a) \le r$, then add a to $i(v)$, i.e., $i'(v) := i(v) \cup \{a\}$ and $i'(x) := i(x)$ for all other $x$. We define the r-snap with a threshold $r > 0$ as a modification of our linear interpolation approach (Example 3) that gives the above rule the highest priority. Choosing the right r is situation-dependent; e.g., a small r is preferable in network-dense areas. Note that this approach resembles “snapping” in computer graphics.
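A minimal sketch of the r-snap rule, assuming planar coordinates with the Euclidean distance as d and representing the relaxed mapping i by a hypothetical dictionary refs of additional points per vertex. When no vertex lies within r, the function returns None, which signals that linear interpolation or the edge linkage has to be applied instead.

```python
import math

def r_snap(pos, refs, a, r):
    """Snap point a to the closest vertex if it lies within distance r.

    pos:  vertex -> (x, y) position of the vertex
    refs: vertex -> set of additional points mapped to that vertex (relaxed i)
    Returns the vertex a was snapped to, or None if no vertex is close enough.
    """
    if r <= 0:
        raise ValueError("threshold r must be positive")
    best = min(pos, key=lambda v: math.dist(pos[v], a), default=None)
    if best is None or math.dist(pos[best], a) > r:
        return None  # caller falls back to linear interpolation / edge linkage
    refs.setdefault(best, set()).add(a)
    return best

POS = {"v1": (0.0, 0.0), "v2": (100.0, 0.0)}
REFS = {}
print(r_snap(POS, REFS, (0.5, 0.5), r=1.0), REFS)
print(r_snap(POS, REFS, (50.0, 0.0), r=1.0))
```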
To recap, we first studied linear interpolation as a method to add a point a to the routing network for the special case that a is on an already-existing edge, which we split at a. Next, we studied ways to link a when a lies on no existing edge of the routing network. There, we observed that the vertex linkage, which links a to the closest vertex, does not retain planarity in general. Luckily, we could show that the edge linkage retains this property, making it a suitable candidate for tackling our linkage problem. Finally, we introduced the r-snap technique, which allows us to apply linear interpolation for close enough points at the cost of accuracy. In what follows, we present an implementation of the edge linkage approach on a database system.
4. Implementation
The key extension of SQL with respect to geoscience is the technical report “Simple Features for SQL” of the Open Geospatial Consortium (OGC) [4,28]. Many database systems already offer an extension to their core framework that addresses this concept. For PostgreSQL, there exists the extension PostGIS [29] that follows OGC’s standard. Hence, we describe our implementation in terms of the OGC standard.
While the extensions pgRouting and PostGIS are written in C/C++, our middleware was implemented in Java. The linkage and routing queries were conducted by JDBC commands with OGC’s SQL extension. By restricting ourselves to JDBC calls, our solution was independent of a specific relational database management system, as long as OGC’s spatial extensions to SQL are provided.
We have already employed the implementation described below in one demonstration [30]. There, the end user could mark arbitrary locations as favorable points by simple mouse clicks on a map overlay. According to the generated geometric objects, a database query on a routing network was issued. Because the user was not restricted to selecting points on the routing network, a linkage had to be performed. For this scenario, we used the OSM map of Munich (as of 2013), which we imported statically into the PostgreSQL database. Access to the routing network was obtained by the additional extension pgRouting [31], which sits on top of PostGIS.
Figure 6 depicts the software layers of our employed solution. Note that there are also other frameworks with similar concepts, cf. [32]. The translation of our linkages to the language of OGC was straightforward:
d was given by ST_Distance.
ℓ was computed by a shortest path algorithm on pgRouting’s network. Common algorithms such as Dijkstra and A* were available.
ST_ShortestLine represented the geodesic function g.
Finding the vertex or edge x that minimizes $d(i(x), a)$ with $x \in V$ or $x \in E$: We used ST_DWithin as a pre-filter for distant edges/vertices before the exact distances to all remaining edges/vertices were calculated.
The split of an edge e into two edges at the point b was performed by calling ST_Line_Locate_Point and then generating both new edges with two calls of ST_Line_Substring.
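The filter-and-refine pattern behind the ST_DWithin pre-filter can be emulated in plain Python: a cheap box test discards distant vertices, and exact distances are only computed for the survivors. This is an illustrative stand-in for the database call, not PostGIS’s implementation; in SQL, the same pattern is typically written as an ST_DWithin condition combined with ordering by ST_Distance. The function name and data are hypothetical.

```python
import math

def closest_vertex_within(pos, a, radius):
    """Filter-and-refine nearest-vertex search limited to `radius`.

    Returns (distance, vertex), or (None, None) if nothing lies within radius.
    """
    ax, ay = a
    # Filter: cheap axis-aligned box test, analogous to an index-supported pre-filter.
    candidates = [v for v, (x, y) in pos.items()
                  if abs(x - ax) <= radius and abs(y - ay) <= radius]
    # Refine: exact distances only for the surviving candidates.
    within = [(math.dist(pos[v], a), v) for v in candidates]
    within = [(d, v) for d, v in within if d <= radius]
    return min(within, default=(None, None))

POS = {"p": (0.0, 0.0), "q": (3.0, 4.0), "r": (100.0, 100.0)}
print(closest_vertex_within(POS, (0.0, 1.0), 10.0))
print(closest_vertex_within(POS, (50.0, 50.0), 1.0))
```

If the search returns nothing, the radius can be enlarged and the query repeated, which keeps the exact distance computation restricted to a small candidate set.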
On a query, the network would be modified in such a way that the user’s specified geolocations were represented in the routing network. After the query was evaluated, we restored the network back to its initial state.
4.1. Evaluation
We evaluated the edge linkage as part of our demonstration [30]. Our evaluation was conducted on top of our middleware in Java using JDBC as a connection (cf. Figure 6). Between 100 and 800 points on the map were randomly marked for linkage. After each point was linked to the routing network, distances to a fixed set of POIs were calculated by pgRouting’s Dijkstra implementation. The execution times of Table 1 were gathered on a single Debian 6.0 node with an Intel(R) Xeon(R) CPU E5540. As Table 1 shows, routing with Dijkstra was by far more costly than the linkage.
4.2. Expectations on Larger Datasets
The naive Dijkstra implementation runs in time [33,34]. Since its running time depends super-linearly on the number of vertices, its performance deteriorates when scaling up the number of vertices. To address this, indexing data structures for routing networks have been proposed, such as hierarchical hub labels [35], contraction hierarchies [36,37,38,39] or a combination of those [40,41]. To adapt our approach to such an indexed routing network, we not only have to perform the linkage on the plain routing network, but we must also update the underlying indexing data structure that is used for the shortest path query. Updating hub labels [42] and contraction hierarchies [43] has been studied, but it is unclear how the update time relates to the cost of a shortest path query.
4.3. Outlook
One may argue that the edge linkage proposed here is still a naive implementation because it neglects possible detours. Fortunately, given both vertex and edge linkage, it is straightforward to construct a valid linkage that creates a feed-link [14] from the non-connected point. Another objection may be that treating all of M as accessible is quite naive. Let us imagine that the edge created to link an off-road point to the routing network leads over a river, lake, mountain gorge, etc. Pedestrians or travelers with common means of transportation are not able to overcome these obstacles and have to seek an alternative route. Real datasets such as OSM represent obstacles by a polygon in which the actual obstacle is contained, such as in Figure 7. If we stipulate that every chart of M is large enough such that every obstacle can be rendered in at least one chart’s image, any lower-bounding metric can be modified to respect the presence of obstacles. The additional computation can be conducted, for example, by Hershberger and Suri’s algorithm [44], which solves the shortest path problem on a plane with obstacles. Lastly, we could adapt our introduced concept to more detailed routing networks. New models enrich network information, for instance, with affordance [45] to describe possible physical actions.