An Area Partitioning and Subgraph Growing (APSG) Approach to the Conflation of Road Networks

A road network represents a set of road objects in a geographic area and their interconnections. It is an essential component of intelligent transportation systems (ITS), enabling emerging applications such as dynamic route guidance, driving assistance systems, and autonomous driving. As the digitization of geospatial information becomes prevalent, multiple road networks with widely varying characteristics may coexist. In this paper, we present an area partitioning and subgraph growing (APSG) approach to the conflation of two road networks that differ greatly in their level of detail and representation rules. Our area partitioning (AP) scheme partitions the geographic area using the Network Voronoi Area Diagram (NVAD) of the low-detailed road network. Next, the subgraph of the high-detailed road network corresponding to a complex intersection is extracted and aggregated into a supernode so that high precision can be achieved via 1:1 road object matching. For road objects left unmatched because of missing road objects and different representation rules, we also propose a subgraph growing (SG) scheme that sequentially inserts new road objects while keeping their connectivity consistent with the road objects matched by the AP scheme. Numerical results at Yeouido, Seoul, Korea, show that our APSG scheme achieves outstanding matching performance in terms of precision, recall, and F1-score.


Introduction
Geographic information systems (GIS) provide solutions for capturing, manipulating, analyzing, and visualizing geospatial data in many application fields, such as transportation, agriculture, and commerce [1,2]. Initially, government agencies built authoritative GIS because the construction of geospatial information requires extensive and accurate surveys of the land [3,4]. As the digitization of geospatial information has become prevalent, some portal sites and mobile service providers have constructed proprietary GIS that combine authoritative GIS, aerial photos, mobile-mapping service (MMS) data, crowdsourced data, etc. [5,6]. On the other hand, voluntary GIS, such as OpenStreetMap (OSM), have been constructed through the participation of volunteer users carrying GPS-enabled mobile terminals [7]. Currently, more than 7.8 million registered users around the world contribute to OSM [8].
A road network is a subset of GIS that focuses on road objects, their attributes, and their interconnectivity. It is usually represented by a graph, where a node represents an intersection, an endpoint of a road, or a point of attribute change, and an edge represents a road segment connecting two nodes. The road network is an important component of many Intelligent Transportation System (ITS) applications. For example, turn-by-turn navigation establishes the shortest route connecting the origin and destination in the road network. In addition, the current traffic situation on a road segment is indexed by the corresponding identifier in the road network and then broadcast as public transportation data (PTD), which enables novel ITS applications, such as dynamic route guidance [9][10][11][12].
Taking into account the characteristics of these road networks, we consider road network conflation (RNC) between the authoritative and voluntary road networks, i.e., NLM and ORN, for emerging ITS services. RNC can be seen as a generalization of road network matching (RNM) [26][27][28][29][30][31][32][33][34][35][36][37][38][39][40]. Given two road networks, RNM finds the association between a set of objects in one road network and a set of objects in the other, where both sets represent the same road entity. Since RNM is performed without any modification of the input road networks, it cannot address the problem of missing road objects, which can be found in voluntary road networks [25]. RNC relaxes this restriction by allowing road objects to be added to one of the input road networks. Since each road network has its own strengths and weaknesses, a successful RNC solution can enhance the strengths of each and compensate for the weaknesses. In particular, it can open a new direction for emerging ITS applications by integrating NLM-indexed real-time transportation data with ORN software packages.
The challenge of RNC lies in addressing the differences between the two road networks, including differences in level of detail (LoD) [30,35,40], missing road objects [30,31,35], and representation rules.
In this paper, we present an area partitioning and subgraph growing (APSG) approach to RNC that consists of two schemes: the area partitioning (AP) scheme for RNM and the subgraph growing (SG) scheme for the NLM objects left unmatched by the AP scheme. Our AP scheme exploits the network Voronoi area diagram (NVAD) [41] to partition the map area into a set of regions, each centered on a node of the NLM graph. For each partitioned region, it extracts the ORN subgraph of a complex intersection and aggregates it into an ORN supernode so that it can be associated with an NLM node via 1:1 node matching. For the NLM subgraphs left unmatched because of missing road objects and different representation rules, we also propose the SG scheme, which sequentially inserts ORN road objects corresponding to the unmatched NLM subgraph while keeping their connectivity consistent with the NLM subgraph matched by the AP scheme. Numerical results at Yeoui-do, Korea's autonomous vehicle testing site, show that our APSG approach achieves outstanding RNC performance in terms of precision and recall. The contributions of this paper are summarized as follows:
• To the best of our knowledge, this is the first work to provide a formal definition of RNC that allows new road objects to be inserted into one road network only, and to present a novel APSG approach that achieves outstanding matching performance;
• The proposed AP scheme accurately clusters the nodes at a complex intersection not only by partitioning the map area using the NVAD but also by extracting the precise subgraph that yields the maximum number of paths across the NVAD cell; and
• To address the problem of missing road objects and different representation rules, the proposed SG scheme inserts new road objects into the ORN subgraph so that it is as consistent as possible with the existing matchings by the AP scheme.
The remainder of this paper is organized as follows. Section 2 reviews related work on RNC. Section 3 describes the characteristics of the two road networks and formulates the RNC problem. In Section 4, our AP scheme for RNM is presented in detail. Section 5 presents the SG scheme for the unmatched NLM objects. The numerical results are discussed in Section 6, and finally, the conclusion of this paper is given in Section 7.

Related Work
Given two input road networks, RNM is the process of associating the road objects that represent the same road entity, and combining their attributes, without any modification of the input road networks. In the literature, numerous research efforts have focused on RNM [27][28][29][30][31][32][33][34][35][36][37][38][39][40]. A complete RNM solution must comprehensively take into account the geometric and topological characteristics of all road objects in both input road networks. However, since such global information is difficult to capture, most existing approaches sequentially match a road object with its counterpart based on local information. Depending on the type of road objects being matched, existing RNM approaches are classified into node (or point) matching [26][27][28][29][30], path (or line) matching [31][32][33][34][36][37], and subgraph matching [38,39].
First, node matching focuses on matching points in the input road networks, such as intersections, traffic monitoring points, and the endpoints of overpasses/underpasses, bridges, and tunnels. The basic idea of node matching is to assess the proximity of the points to be matched as well as the similarity of the geometric and topological properties of their incident edges. The seminal work in [26] presents an iterative scheme for RNM between the road networks of the United States Geological Survey (USGS) and the Bureau of the Census: at each iteration, given a set of nodes already matched with their counterparts, the remaining nodes are relocated by a rubber-sheet transformation, and a new set of 1:1 node matchings is then obtained. In [27], 1:1 node matching between two road networks with an order-of-magnitude difference in scale exploits a few geometric dissimilarity measures, such as the Euclidean distance, the nodal degree, and the average orientation difference of incident edges. Given a matching of a node and all its neighbor nodes, the work in [28] presents a round-trip walk scheme that evaluates local topological consistency along a round-trip path across the two road networks and the existing node matching. Although that work also identifies the difficulties of 1:n and m:n node matchings, they are left as an open problem. By replacing the Euclidean distance of DBSCAN clustering [42] with the graph distance of the road network, the authors in [30,40] present a node clustering scheme that aggregates the multiple nodes at a complex intersection into a single node. However, their clustering approach, which aggregates all intermediate nodes within an empirically determined stroke-length threshold, may include too many nodes that do not belong to the complex intersection, which significantly degrades the overall matching performance.
In contrast, the proposed AP scheme accurately clusters the nodes at a complex intersection not only by partitioning the whole map area based on the NVAD but also by extracting the precise subgraph that yields the maximum number of paths across the NVAD cell, as will be shown in Section 6.3.
Second, path matching associates a path in one road network with a path in the other. Depending on the number of edges in each path, path matching can be classified into 1:1, 1:n, m:1, and m:n edge matchings. A buffer-growing approach is proposed in [31] to address the most general m:n edge matching, where the merit function of potential matching pairs is computed from the mutual information of positions, angles, lengths, and forms within a two-hop distance, and the pair with the highest mutual information is eventually selected as the matching pair. An adaptive algorithm is proposed in [32] to determine an appropriate buffer size for the buffer-growing algorithm: if the buffer size is too small, no candidate path can be found, and if it is too large, the computational complexity becomes high. However, the buffer-growing algorithm has two limitations: (1) to reduce the global errors between the two input road networks, it requires an initial affine transformation using manually selected control points at the preprocessing step; and (2) to compute the mutual information, it needs the statistical distribution of previously matched data from the same pair of input road networks, which is not usually available. A probabilistic relaxation scheme is presented in [33], which initializes the probability matrix based on the geometric dissimilarity of paths, iteratively updates the matching probabilities by evaluating the compatibility of neighboring candidate pairs, and selects the final 1:1 and 1:n matching pairs from the probability matrix. The probabilistic relaxation scheme in [34] improves the matching performance not only by considering both geometric and topological characteristics in the computation of the probability matrix but also by inserting virtual nodes to address the m:n matching pattern.
To mitigate user errors in the OSM crowdsourcing process, our APSG approach to the RNC problem also inserts new nodes and edges into the ORN subgraph so that it better matches the NLM.
Finally, subgraph matching starts from an initial matching between seed nodes, and the matched subgraph grows through a sequence of path and node matchings at each iteration. The semi-automated RNM in [38] consists of automated and interactive matching algorithms: the former establishes an initial matching for seed nodes and expands it via cluster-based node/path matching algorithms, while the latter allows a human operator to manually correct incorrect and improper initial matchings. On the other hand, the iterative matching algorithm in [39] initially performs a rubber-sheet transformation and topologically splits paths to maximize the number of 1:1 edge matchings. Then, starting from a subset of seed nodes, its combined edge and node matching algorithm gradually adds 1:1 matchings at the boundary of the existing matching set. Since subgraph matching associates two existing road objects that represent the same road entity, its subgraph growth is determined by the similarity of their geometric and topological characteristics. The main difference of our SG scheme is that new road objects are sequentially inserted into the subgraph of one road network to address the problem of missing road objects and different representation rules. In this process, the insertion order of road objects is carefully determined so that the resulting subgraph is as consistent as possible with the existing matchings by the AP scheme.

Input Road Networks and Problem Specification
In this section, we describe the characteristics of two road networks, i.e., NLM and ORN, and then formulate the RNC problem.

Node-Link Map
The Korean government initiated the national GIS project in 1995 and completed the construction of the geospatial database in 2009 [43]. The NLM is the road network of this database and represents major road objects in Korea [4]. It also provides a unified identifier (ID) hierarchy for its road entities. To exchange ITS information efficiently, Korean law requires all ITS applications to use the NLM ID hierarchy when exchanging road and traffic information [17]. Figure 1 shows the NLM graph representation of the Yeoui2-gyo intersection, Yeoui-do, Seoul, Korea, overlaid on an aerial view, where Gukhoe-daero (east-west road) and the access ramps of Nodeul-ro (north-south underpass) are interconnected. The NLM graph is a directed graph G_N = (N, L), where N is the set of nodes representing the points at which road characteristics change and L is the set of links connecting two nodes. A node can be an intersection (n_i, n_l, and n_m), a traffic monitoring point, an administrative boundary, or an endpoint of a road, overpass, or underpass (n_j and n_k). A single NLM node n_i ∈ N is used to represent a complex intersection (Yeoui2-gyo) without a detailed view of its internal road network. We define the subgraph G_N(n_i) = (N(n_i), L(n_i)), which consists of NLM node n_i, its directly connected links (pink solid links in G_N), and the neighbor NLM nodes (n_j, n_k, n_l, and n_m). An NLM node is placed at the crosspoint of two roads, where a road consists of two parallel links, each of which represents a unidirectional road segment. On a dual carriage road, it is placed at the endpoint of two NLM links. The NLM link set L consists of unidirectional links, where each element l_xy = (n_x, n_y) is an ordered pair of nodes directed from n_x to n_y. The geometric shape of a link is approximated by a sequence of concatenated line segments.
For example, unidirectional links l_ji and l_im are shown by pink solid lines with triangular marks indicating their directions. Underpass and overpass links are placed in parallel with the main road segment, with additional spacing between them. In this paper, we represent each NLM underpass/overpass by a pink dashed line, as shown in Figure 1. Each link has a set of attributes, such as link_id, f_node, t_node, road_rank, road_type, connect, and road_use, where f_node and t_node represent the start and end node indexes of the NLM link, respectively; road_rank represents the class of the road segment, as shown in Table 2; road_type specifies the type of road, such as overpass, underpass, bridge, or tunnel; and connect specifies the type of ramp depending on the road_rank attribute.

Table 2. road_rank values of NLM links (rows before "Metropolitan city road" were lost in extraction).
…: Metropolitan city road
105: Aerial or inter-province road
106: Intra-province road
107: Intra-city or island road
108: Other roads
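To make the G_N(n_i) notation concrete, the following is a minimal Python sketch, not the authors' implementation, of an NLM link record carrying a few of the attributes listed above (link_id, f_node, t_node, road_rank) plus a vertex tuple for its shape, and of a helper that returns N(n_i) and L(n_i). The class NLMLink, the function nlm_subgraph, the string node ids, and the road_rank values in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NLMLink:
    """Hypothetical record for a unidirectional NLM link l_xy = (n_x, n_y)."""
    link_id: str
    f_node: str           # start node n_x
    t_node: str           # end node n_y
    road_rank: int        # road class code (see Table 2)
    geometry: tuple = ()  # (x, y) vertices of the concatenated line segments


def nlm_subgraph(links, n_i):
    """Return (N(n_i), L(n_i)): node n_i plus its neighbors, and the links that
    start or end at n_i (both directions of a two-way road are included)."""
    incident = [l for l in links if n_i in (l.f_node, l.t_node)]
    nodes = {n_i}
    for l in incident:
        nodes.update((l.f_node, l.t_node))
    return nodes, incident


# Usage: l_ji and l_im touch n_i; the link from n_k to n_m does not.
links = [
    NLMLink("L1", "n_j", "n_i", 104),
    NLMLink("L2", "n_i", "n_m", 104),
    NLMLink("L3", "n_k", "n_m", 106),
]
nodes, incident = nlm_subgraph(links, "n_i")
print(sorted(nodes), [l.link_id for l in incident])  # ['n_i', 'n_j', 'n_m'] ['L1', 'L2']
```

Keeping the link direction in f_node/t_node, rather than in two separate adjacency lists, mirrors the attribute layout described above and makes the two parallel links of a two-way road appear as two distinct records.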

OpenStreetMap Road Network (ORN)
The ORN is the subset of OSM objects with the highway tag, where a tag is a (key, value) pair identifying an attribute of a road object. Table 3 shows the values of the highway tag for ways, which are classified into a few groups. Within each group, the tag values are ordered from most to least important. This paper focuses on the road and link road groups, where the former represents a road while the latter connects two roads at a complex intersection. We initially prune all ORN objects in the special road, path, sidewalk, and cycleway groups, which do not correspond to NLM objects. This pruning removes approximately 20% of the road objects in the original ORN as unnecessary. Furthermore, we also remove the subgraphs for underpasses/overpasses in both road networks, because they can be easily matched via their attributes, such as the NLM road_type attribute and the ORN tunnel and bridge tags. (In some figures, we still illustrate the ORN underpass/overpass with a green dashed line for clarity.) Figure 2 shows the ORN graph representation, which can be modeled by an undirected graph G_O = (V, E). In contrast to the NLM graph G_N, the ORN graph G_O is designed to reflect the detailed road network at a complex intersection. This feature makes the ORN more suitable for ITS applications such as navigation and autonomous driving.
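A minimal Python sketch of the pruning step, combined with the decomposition of a way into two-node ORN edges that this section also describes. The tag sets are an assumption: they follow the road and link-road groups of highway values on the OpenStreetMap wiki, which may not match Table 3 exactly. The input format (a dict from way id to a highway value and a node-id list) and the name prune_and_split are hypothetical.

```python
# Assumed tag groups (OpenStreetMap wiki grouping of highway=* values).
ROAD_TAGS = {"motorway", "trunk", "primary", "secondary", "tertiary",
             "unclassified", "residential"}
LINK_ROAD_TAGS = {"motorway_link", "trunk_link", "primary_link",
                  "secondary_link", "tertiary_link"}


def prune_and_split(ways):
    """Keep ways in the road or link-road group and split each one into
    consecutive two-node edges (way_id, u, v, group); drop everything else."""
    edges = []
    for way_id, (highway, node_ids) in ways.items():
        if highway in ROAD_TAGS:
            group = "road"
        elif highway in LINK_ROAD_TAGS:
            group = "link_road"
        else:
            continue  # special roads, paths, sidewalks, cycleways, ...
        for u, v in zip(node_ids, node_ids[1:]):
            edges.append((way_id, u, v, group))
    return edges


ways = {
    "w1": ("primary", [1, 2, 3]),   # three nodes -> two edges
    "w2": ("footway", [3, 4]),      # pruned
    "w3": ("trunk_link", [2, 5]),   # link road kept
}
print(prune_and_split(ways))
# [('w1', 1, 2, 'road'), ('w1', 2, 3, 'road'), ('w3', 2, 5, 'link_road')]
```

Keeping the group label on each edge preserves the road versus link-road distinction that the figures draw as solid and dotted green lines.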
In G_O, an ORN node v ∈ V is connected to at least three neighbor ORN nodes. In RNC, NLM node n is associated with ORN subgraph G_O(n), where the ORN subgraph can be a single ORN intersection node, e.g., G_O(n_l); disconnected subgraphs, e.g., G_O(n_j) and G_O(n_k); or a connected subgraph, e.g., G_O(n_i) and G_O(n_m), in Figure 2. If an intersection consists of a single ORN intersection node, it is called a simple intersection; otherwise, it is called a complex intersection.
The atomic unit for representing an ORN road is a way w ∈ W, which may span multiple ORN nodes [7]. If way w includes more than two ORN nodes, it is decomposed into consecutive ORN edges e ∈ E so that each edge connects only two ORN nodes. In Figure 2, Gukhoe-daero in ORN subgraph G_O(n_i) consists only of edges in the road tag group, shown as solid green lines, whereas all remaining edges in G_O(n_i) belong to the link road tag group, shown as dotted green lines. On the other hand, all intersecting edges at a simple intersection, such as G_O(n_l), belong to the road tag group. On a dual carriage road, a distinct ORN edge is used for each carriageway, and its direction is specified by the direction tag.

Figure 3 shows the NLM and ORN graph representations of the Yeoui-do roads, which are the inputs of our RNC problem. The NLM in Figure 3a is a low-detailed road network consisting of major public roads only, whereas the ORN in Figure 3b has a much more detailed representation of the road network. Given NLM G_N and ORN G_O, the RNC problem is an association problem that finds the ORN subgraph corresponding to each NLM object while allowing new road objects to be added to the ORN. Since each road network has its own rules for representing its road objects, the two road networks differ in several ways, as shown in Figure 4. Figure 4a shows different numbers of road objects: the ORN shows both major and minor roads in a geographic area, whereas the NLM shows major public roads only. Figure 4b illustrates different LoDs at a complex intersection: the ORN shows all detailed connectivity at the intersection, whereas the NLM aggregates it into a single NLM node. Figure 4c reveals two different rules for representing a merging lane, which is part of the mainline road in the NLM but part of the on-ramp with the trunk_link tag in the ORN.
Figure 4d shows two NLM nodes at the crosspoints of an administrative boundary that have no corresponding ORN subgraphs. Figure 4e illustrates an NLM link without a corresponding ORN object because the object was omitted during the OSM crowdsourcing process. Figure 4f shows an NLM subgraph that has no corresponding ORN subgraph due to OSM crowdsourcing errors. To summarize, a comprehensive solution to the RNC problem needs to address the following fundamental issues arising from these representational differences:
1. To identify the ORN subgraph of a complex intersection in order to alleviate the LoD difference between the two road networks;
2. To find a reliable methodology to cope with the differences in the representation of merging lanes and administrative boundaries; and
3. To create a new ORN subgraph corresponding to the unmatched NLM subgraph while keeping the consistency of its connectivity to the matched NLM subgraph.

Area Partitioning for LoD Difference at a Complex Intersection
For a given NLM node n_i with NLM subgraph G_N(n_i) and ORN graph G_O = (V, E), the challenging task is to accurately extract the true ORN subgraph G_T(n_i) across a wide variety of intersection topologies, as shown in Figure 4b. An inaccurate ORN subgraph causes an incorrect matching, which in turn affects the accuracy of other matchings. This error propagation eventually results in severe degradation of RNC performance.
Our AP scheme first computes the region of the map dedicated to NLM node n_i, in which the corresponding ORN subgraph may exist. Then, it extracts ORN subgraph G*_O(n_i) along the paths connecting each pair of entering and exiting points across the region boundary, taking into account the turn information and the geometry of the intersection. Finally, ORN subgraph G*_O(n_i) is replaced by an ORN supernode v*_i so that it can be matched with NLM node n_i via 1:1 node matching.

Network Voronoi Area Diagram (NVAD) for Partitioning Map Area
Given NLM subgraph G_N(n_i) and the corresponding map area A(n_i) around n_i, the first task of our AP scheme is to partition this area into regions, each centered at an intersection in N(n_i). If we consider only the NLM node set N(n_i) without the links of G_N(n_i), a simple method to partition map area A(n_i) based on the Euclidean distance is the Voronoi diagram (VD) [41]. In the VD, a point n ∈ A(n_i) is associated with the region of the closest intersection n_x, called the Voronoi cell V(n_x), in terms of the Euclidean distance metric:

$$V(n_x) = \{\, n \in A(n_i) : \lVert n - n_x \rVert \le \lVert n - n_y \rVert,\ \forall n_y \in N(n_i) \setminus \{n_x\} \,\},$$

where N(n_i) = {n_i, n_j, n_k, n_l, n_m} for NLM graph G_N(n_i) in Figure 5a. Given an ORN node n ∈ V(n_j) in map area A(n_i), its Euclidean distances to the three closest NLM nodes are shown in Figure 5a. For two NLM nodes n_x and n_y (n_y ∈ N(n_i) \ {n_x}), the boundary between their Voronoi cells is the hyperplane (a line, in the plane) equidistant from both nodes. Voronoi cell V(n_x) is then constructed by intersecting all half-spaces that contain NLM node n_x. For example, Voronoi cell V(n_i) is illustrated by the blue transparent quadrilateral in Figure 5b. However, given the geometry of NLM subgraph G_N(n_i), the Euclidean norm is no longer a valid measure of the distance between point n ∈ A(n_i) and the NLM nodes in N(n_i), because it does not account for distances along the road network G_N. The network Voronoi node and link diagrams [41,44] can be used to measure graph distance, but they are limited to points on NLM subgraph G_N(n_i) and are not applicable to an arbitrary point in area A(n_i).
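The Euclidean VD membership test can be sketched in a few lines of Python. The sketch below, with hypothetical function names and NLM nodes given as a dict of (x, y) coordinates, checks cell membership both as a nearest-node query and as the intersection of bisector half-planes described above. The two formulations agree because ‖n − a‖ ≤ ‖n − b‖ is equivalent to (n − (a + b)/2) · (a − b) ≥ 0.

```python
import math


def nearest_node(point, nlm_nodes):
    """Id of the NLM node whose Voronoi cell contains `point`
    (smallest Euclidean distance; ties go to the first node listed)."""
    return min(nlm_nodes, key=lambda nid: math.dist(point, nlm_nodes[nid]))


def in_half_plane(point, a, b):
    """True if `point` is at least as close to a as to b, i.e., it lies on a's
    side of (or on) the perpendicular bisector of segment ab."""
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return ((point[0] - mid[0]) * (a[0] - b[0])
            + (point[1] - mid[1]) * (a[1] - b[1])) >= 0


def in_voronoi_cell(point, n_x, nlm_nodes):
    """Voronoi cell V(n_x) as the intersection of half-planes against all other nodes."""
    a = nlm_nodes[n_x]
    return all(in_half_plane(point, a, b)
               for nid, b in nlm_nodes.items() if nid != n_x)


nodes = {"n_i": (0.0, 0.0), "n_j": (-10.0, 0.0), "n_k": (10.0, 0.0)}
p = (-6.0, 1.0)
print(nearest_node(p, nodes), in_voronoi_cell(p, "n_j", nodes))  # n_j True
```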
To address this problem, our AP scheme adopts the Network Voronoi Area Diagram (NVAD) [41], whose measure reflects two distance factors. First, if point n is on subgraph G_N(n_i), the distance should be the length of the shortest path to NLM node n_x ∈ N(n_i) in G_N(n_i), which is called the graph distance d_G(n, n_x). Second, if point n lies in A(n_i) \ G_N(n_i), the measure should also consider the projection distance d_P(n, n_x) to the closest NLM link of subgraph G_N(n_i). Figure 5c shows these distances between point n and the two closest intersections, n_i and n_l. Consequently, the distance metric of the NVAD is defined as the sum of these two distance components, i.e.,

$$d(n, n_x) = d_P(n, n_x) + d_G(n, n_x),$$

where d_P(n, n_x) = 0 when n lies on G_N(n_i), and otherwise d_G(n, n_x) is measured from the projection of n onto the closest NLM link.

To determine the NLM link onto which a given point n is projected, consider the example of map area A_jm(n_i) bounded by unidirectional NLM links l_ji and l_im in Figure 5c, where the former consists of two line segments and the latter of three. The k-th line segment and vertex of NLM link l_ji are denoted by l_ji(k) and n_ji(k), respectively, where n_ji(0) = n_j and n_ji(2) = n_i. Our approach draws the equiangle boundary starting from the center of intersection n_i until its projection reaches the endpoint n_im(1) of the shorter line segment. Note that all points on this projection boundary are equidistant from NLM links l_ji and l_im. In Appendix A, we show that the projection boundary curve is a concatenation of linear or parabolic segments. Figure 5c shows the resulting blue dotted projection boundary of map area A_jm(n_i). Figure 5d shows all the projection boundaries, which partition map area A(n_i) into four projection areas, each containing a pair of NLM links between n_i and one of its neighbor NLM nodes. At the midpoint of these links, we draw a perpendicular line that bisects the projection area. Then, NVAD cell V*(n_i) is the union of the bisected map areas in which NLM node n_i is located, as shown by the blue transparent polygon in Figure 5d.
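The following Python sketch evaluates the NVAD metric d(n, n_x) = d_P(n, n_x) + d_G(n, n_x) pointwise, instead of constructing the equiangle projection boundaries geometrically as the paper does. Its simplifying assumptions are all hypothetical: each NLM link is a single straight segment, link direction is ignored for distance purposes, graph distances come from Dijkstra's algorithm, and d_G is measured from the projection of n onto the closest link to n_x through either endpoint of that link. The class and function names are illustrative only.

```python
import heapq
import math


def dijkstra(adj, src):
    """Shortest-path lengths from src on a weighted graph {u: [(v, w), ...]}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def project(p, a, b):
    """Project p onto segment ab; return (t in [0, 1], projected point)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    den = dx * dx + dy * dy
    t = 0.0 if den == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / den
    t = min(1.0, max(0.0, t))
    return t, (a[0] + t * dx, a[1] + t * dy)


class NVAD:
    """Pointwise NVAD metric over straight-segment NLM links (assumption)."""

    def __init__(self, nodes, links):
        self.nodes, self.links = nodes, links
        adj = {n: [] for n in nodes}
        for u, v in links:  # direction ignored for distance purposes
            w = math.dist(nodes[u], nodes[v])
            adj[u].append((v, w))
            adj[v].append((u, w))
        self.D = {n: dijkstra(adj, n) for n in nodes}

    def distance(self, p, n_x):
        """d(p, n_x) = d_P(p, n_x) + d_G(p, n_x) via the closest NLM link."""
        best = None
        for u, v in self.links:
            t, q = project(p, self.nodes[u], self.nodes[v])
            cand = (math.dist(p, q), u, v, t)
            if best is None or cand[0] < best[0]:
                best = cand
        d_p, u, v, t = best
        length = math.dist(self.nodes[u], self.nodes[v])
        d_g = min(t * length + self.D[u].get(n_x, math.inf),
                  (1 - t) * length + self.D[v].get(n_x, math.inf))
        return d_p + d_g

    def cell(self, p):
        """NLM node whose NVAD cell contains p."""
        return min(self.nodes, key=lambda n: self.distance(p, n))


nodes = {"n_i": (0.0, 0.0), "n_j": (-10.0, 0.0), "n_l": (0.0, 10.0)}
nvad = NVAD(nodes, [("n_j", "n_i"), ("n_i", "n_l")])
p = (-3.0, 4.0)  # closest link is n_i-n_l (d_P = 3), projection 4 units from n_i
print(nvad.distance(p, "n_i"), nvad.distance(p, "n_l"), nvad.cell(p))
```

Here p projects onto link (n_i, n_l) at distance 3, so d(p, n_i) = 3 + 4 and d(p, n_l) = 3 + 6, placing p in the NVAD cell of n_i.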
Finally, for each NLM link l crossing the NVAD cell boundary, we build a list E_l = {e_l(1), e_l(2), ...} of candidate ORN edges with the same direction whose crossing points lie within distance δ of the crossing point of l along the boundary line. For example, in Figure 5d, NLM links l_ji and l_ik have two ORN edges in their lists, whereas each of the remaining NLM links has only one. In the next section, the candidate ORN edges are examined as potential counterparts of each NLM link.
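A minimal Python sketch of building E_l, under the assumption (ours, not the paper's) that every crossing of the closed NVAD cell boundary has already been reduced to an arc-length position s along the boundary and an "in"/"out" direction flag. The function name candidate_edges and the sample crossings are hypothetical; the wrap-around distance accounts for the boundary being a closed curve.

```python
def candidate_edges(nlm_crossing, orn_crossings, delta, perimeter):
    """Candidate ORN edges E_l for one NLM link crossing an NVAD cell boundary.

    nlm_crossing:  (s, direction) of the NLM link, where s is the arc length
                   along the closed boundary and direction is "in" or "out".
    orn_crossings: {edge_id: (s, direction)} for ORN edges crossing the boundary.
    Returns ids of same-direction edges within delta along the boundary,
    sorted by that boundary distance.
    """
    s_l, dir_l = nlm_crossing
    hits = []
    for edge_id, (s_e, dir_e) in orn_crossings.items():
        gap = abs(s_e - s_l) % perimeter
        gap = min(gap, perimeter - gap)  # the boundary is a closed curve
        if dir_e == dir_l and gap < delta:
            hits.append((gap, edge_id))
    return [edge_id for _, edge_id in sorted(hits)]


crossings = {"e1": (3.0, "in"), "e2": (98.5, "in"),
             "e3": (2.5, "out"), "e4": (20.0, "in")}
print(candidate_edges((2.0, "in"), crossings, delta=5.0, perimeter=100.0))
# ['e1', 'e2']
```

Edge e2 qualifies only because of the wrap-around: it sits 3.5 units from the NLM crossing when measured past the boundary's starting point.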

Extraction of Candidate ORN Subgraph
Given NVAD cell V*(n_i), the allowable turns at NLM node n_i, and the set E_l of candidate ORN edges for each NLM link l, the objective of this section is to extract the ORN subgraph G*_O(n_i) in V*(n_i) that corresponds to NLM node n_i. Our key observation is that an intersection allows at most one path for each pair of roads, where one road enters and the other exits NVAD cell V*(n_i). Starting with an empty ORN subgraph having no ORN nodes or edges, the basic idea of our approach is to sequentially insert ORN paths passing through the intersection, along which the turn restriction is satisfied at each pair of consecutive ORN edges. Without loss of generality, we focus on the construction of ORN path p_jk, as shown in Figure 6. Figure 6a shows an example of a simple intersection, where NLM node n_i connects a two-way road (l_ki and l_ik) and three one-way roads (l_ji, l_il, and l_mi). Due to the LoD difference, the ORN subgraph in V*(n_i) consists of two components: (1) the true ORN subgraph, which almost overlaps NLM subgraph G_N(n_i), and (2) the remaining ORN subgraph representing a minor road network around intersection n_i. Denoting by v^I_j and v^O_k the crosspoints of the entering and exiting ORN edges at the boundary of NVAD cell V*(n_i), respectively, there are three candidate paths from v^I_j to v^O_k in Figure 6a. Among these paths, our candidate ORN subgraph extraction (COSE) scheme chooses the path with the smallest sum of turning angles, regardless of turn direction. For example, path p_jk(1) has the smallest total turning angle, since it makes only one left turn, at v_i, compared with three turns in each of the other two paths.
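The COSE path choice can be illustrated with a short Python sketch. It assumes each candidate path is available as a polyline of (x, y) vertices; the function names and the synthetic coordinates are hypothetical, not taken from Figure 6. Each turning angle is wrapped to [−π, π) and its absolute value is summed, so left and right turns count equally.

```python
import math


def turning_angles(path):
    """Absolute heading change (radians) at each interior vertex of a polyline."""
    angles = []
    for a, b, c in zip(path, path[1:], path[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        turn = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        angles.append(abs(turn))  # left and right turns count the same
    return angles


def select_path(candidates):
    """Pick the candidate path with the smallest sum of turning angles."""
    return min(candidates, key=lambda name: sum(turning_angles(candidates[name])))


candidates = {
    "p_jk(1)": [(0, -10), (0, 0), (-10, 0)],                    # one left turn
    "p_jk(2)": [(0, -10), (0, -5), (5, -5), (5, 5), (-10, 5)],  # three turns
}
print(select_path(candidates))  # p_jk(1)
```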
Although ORN subgraph G*_O(n_i) is much more complex than NLM subgraph G_N(n_i) around a complex intersection, our key observation remarkably holds for all complex intersections in Yeoui-do, except for the blue dashed paths in Figure 6a,b. These paths evidently originate from a crowdsourcing error in which participating users omitted a left-turn restriction at ORN node v_i, and they turn out to be invalid paths. Unfortunately, such human errors are inevitable in the crowdsourcing-based ORN. To exclude these exceptional paths from ORN subgraph G*_O(n_i), we exploit a second key observation: the connecting roads in a complex intersection are designed so that their curvature changes linearly with curve length, a curve known as a clothoid. Based on this observation, the COSE scheme discards a path if the angle between two consecutive edges changes abruptly, e.g., the blue dashed path at node v_i in Figure 6a,b.
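A minimal Python sketch of the abrupt-change filter. The paper does not state a numeric threshold, so the 45-degree limit below is a placeholder assumption, and the check is meant for connecting (link-road) edges inside a complex intersection, where a clothoid-like ramp is sampled densely enough that consecutive headings change gradually. The function names and sample geometry are hypothetical.

```python
import math


def has_abrupt_turn(path, max_change_deg=45.0):
    """True if the heading jumps by more than max_change_deg between any two
    consecutive edges of the polyline (a non-clothoid-like corner)."""
    limit = math.radians(max_change_deg)  # placeholder threshold (assumption)
    for a, b, c in zip(path, path[1:], path[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        change = abs((h2 - h1 + math.pi) % (2 * math.pi) - math.pi)
        if change > limit:
            return True
    return False


def filter_paths(paths, max_change_deg=45.0):
    """Drop candidate paths that contain an abrupt heading change."""
    return {name: p for name, p in paths.items()
            if not has_abrupt_turn(p, max_change_deg)}


# A densely sampled quarter-circle ramp (9-degree steps) vs. a sharp 90-degree corner.
ramp = [(math.cos(k * math.pi / 20), math.sin(k * math.pi / 20)) for k in range(11)]
corner = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(sorted(filter_paths({"ramp": ramp, "corner": corner})))  # ['ramp']
```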
The final step of the COSE scheme is the derivation of ORN subgraph G*_O(n_i) for NLM node n_i. It first calculates the number of allowable ORN paths for each ingress-egress pair of ORN edges at the boundary of NVAD cell V*(n_i). Then, it chooses the optimal ORN subgraph G*_O(n_i) = (V*(n_i), E*(n_i)) that yields the largest number of allowable ORN paths. Finally, it extracts all ORN nodes in V*(n_i). If V*(n_i) contains more than one ORN node, our AP approach replaces them with an ORN supernode v*_i located at their center, as shown in Figure 7. With this replacement, node matching becomes a simple 1:1 matching between NLM node n_i and ORN supernode v*_i.

Classification of RNM Result
Figure 8 shows the matching results between NLM subgraph G_N(n_i) and ORN graph G_O. Depending on which road objects belong to ORN subgraph G*_O(n_i), both node and edge matching results can be of one of four types: correct match (CM), incorrect match (IM), partial match (PM), and missing match (MM). For each NLM node, the type of node matching result is determined as follows:
• The red dashed lines in Figure 8 represent the CM between NLM and ORN nodes, i.e., V*(·) = V_T(·), where the sets of true ORN nodes for NLM nodes n_i, n_j, n_k, and n_l are denoted by V_T(n_i) = {v_{i,1}, v_{i,2}, v_{i,3}}, V_T(n_j) = {v_j}, V_T(n_k) = {v_k}, and V_T(n_l) = {v_l}, respectively;
• A node matching becomes MM if its set of ORN nodes is empty, i.e., V*(·) = ∅;
• A node matching becomes IM if its set of ORN nodes is disjoint from the set of true ORN nodes, i.e., V*(·) ∩ V_T(·) = ∅; and
• A node matching becomes PM if its set of ORN nodes satisfies two conditions: V*(·) ∩ V_T(·) ≠ ∅ and V*(·) ≠ V_T(·).
At the boundary of two adjacent NVAD cells V*(n_i) and V*(n_j), the COSE scheme also yields a solution to the edge matching between NLM link l and two ORN edges: one from set E_l ∩ E*(n_i) in area A(n_i) and the other from E_l ∩ E*(n_j) in area A(n_j). Similarly, the type of edge matching result is determined as follows:
• The blue dashed lines in Figure 8 represent the CM between NLM links and ORN edges, i.e., E*(·) = E_T(·), where the sets of true ORN edges for NLM links l_ij, l_ik, and l_il are denoted by E_T(l_ij), E_T(l_ik), and E_T(l_il), respectively;
• An edge matching becomes MM if its set of ORN edges is empty, i.e., E*(·) = ∅;
• An edge matching becomes IM if its set of ORN edges is disjoint from the set of true ORN edges, i.e., E*(·) ∩ E_T(·) = ∅; and
• An edge matching becomes PM if its set of ORN edges satisfies two conditions: E*(·) ∩ E_T(·) ≠ ∅ and E*(·) ≠ E_T(·).
Finally, we partition NLM graph G_N into the matched NLM subgraph G*_N = (N*, L*) and the unmatched NLM subgraph Ḡ_N = (N̄, L̄), where the former includes all NLM road objects of the CM, IM, and PM types, while the latter contains those of the MM type only.
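The match-type rules and the final matched/unmatched partition can be sketched in Python as below. The sketch assumes that CM means the matched ORN set equals the true ORN set, which is how the CM bullets describe it by example; the function names, the string ids, and the sample sets are hypothetical. The same classifier applies to nodes (V*, V_T) and edges (E*, E_T).

```python
def match_type(found, true):
    """Classify one node or edge matching from its matched ORN set (V* or E*)
    and its true ORN set (V_T or E_T)."""
    found, true = set(found), set(true)
    if not found:
        return "MM"  # missing match: nothing was matched
    if found == true:
        return "CM"  # correct match
    if not (found & true):
        return "IM"  # incorrect match: disjoint from the truth
    return "PM"      # partial match: overlaps the truth but differs


def split_matched(results):
    """Partition NLM object ids into matched (CM/IM/PM) and unmatched (MM) sets."""
    matched = {obj for obj, kind in results.items() if kind != "MM"}
    unmatched = {obj for obj, kind in results.items() if kind == "MM"}
    return matched, unmatched


results = {
    "n_i": match_type({"v_i1", "v_i2", "v_i3"}, {"v_i1", "v_i2", "v_i3"}),
    "n_j": match_type({"v_x"}, {"v_j"}),
    "n_k": match_type({"v_k", "v_y"}, {"v_k"}),
    "n_l": match_type(set(), {"v_l"}),
}
print(results)  # {'n_i': 'CM', 'n_j': 'IM', 'n_k': 'PM', 'n_l': 'MM'}
print(split_matched(results))
```

Testing the MM condition first keeps an empty result from being misreported as IM, since an empty set is trivially disjoint from any true set.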

ORN Subgraph Growing for Unmatched NLM Subgraph
The unmatched NLM subgraph mainly originates from ORN objects missing in the OSM crowdsourcing process or from differences in representation rules. In general, a connected subgraph of unmatched NLM graph $\bar{G}_N$ is either a single NLM node $n_i$, a single NLM link $l_{ij}$, or an NLM component $C_N$ consisting of at least two NLM road objects. First, we present two schemes for an unmatched single NLM node caused by differences in representation rules: the NVAD cell expansion (NCE) scheme for a merging lane in Figure 4c and the NLM node projection onto the ORN edge (NPOE) scheme for administrative boundaries in Figure 4d. Second, we present the ORN edge insertion (OEI) scheme for an unmatched single NLM link in Figure 4e. Third, we present the sequential ORN subgraph growing (SOSG) scheme for an unmatched NLM component in Figure 4f. Finally, we address the internal structure design of new ORN nodes created by the SG scheme.
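To make the three cases concrete, the following Python sketch groups the road objects of the unmatched NLM graph into connected subgraphs with a union-find structure and labels each group as a single node, a single link, or a component. This is an illustrative assumption about how the grouping could be done, not code from the paper: links are directed node pairs, a link is attached to a group only through endpoints that are themselves unmatched (so a link with two matched endpoints stands alone), and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Link = Tuple[Hashable, Hashable]  # directed NLM link (tail node, head node)
Obj = Tuple[str, object]          # ("node", id) or ("link", (tail, head))


def unmatched_components(nodes: Set[Hashable], links: Set[Link]) -> List[Tuple[str, Set[Obj]]]:
    """Group unmatched NLM road objects into connected subgraphs and label each
    as 'node', 'link', or 'component' (two or more road objects)."""
    parent: Dict[Obj, Obj] = {}

    def find(x: Obj) -> Obj:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: Obj, b: Obj) -> None:
        parent[find(a)] = find(b)

    for n in nodes:
        find(("node", n))
    for link in links:
        key: Obj = ("link", link)
        find(key)
        for end in link:
            if end in nodes:  # assumption: join only through unmatched endpoints
                union(key, ("node", end))

    groups: Dict[Obj, Set[Obj]] = defaultdict(set)
    for obj in list(parent):
        groups[find(obj)].add(obj)

    result = []
    for members in groups.values():
        kind = next(iter(members))[0] if len(members) == 1 else "component"
        result.append((kind, members))
    return result


comps = unmatched_components({"n1", "n2", "n5"}, {("n1", "n2"), ("n2", "a"), ("b", "c")})
print(sorted((kind, len(members)) for kind, members in comps))
# [('component', 4), ('link', 1), ('node', 1)]
```

In the example, link (n2, a) joins the component through its unmatched endpoint n2, similar to the links that cross the boundary of $C_N$ in Figure 11a, while link (b, c) has only matched endpoints and forms a single-link case.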

Schemes for Unmatched Single NLM Node
We present two schemes to address the difference in representation rules: the NCE scheme for a merging lane and the NPOE scheme for an administrative boundary. This difference results in an isolated NLM node, as shown in Figure 9. For the merging lane in Figure 9a, the ORN edge connecting $v^*_k$ and $v^*_i$ is longer than the corresponding NLM link $l_{ki}$. This rule difference results in an unmatched single NLM node $n_i$ with $|L(n_i)| \geq 3$, because its corresponding ORN node $v^*_i$ lies outside its NVAD cell $V^*(n_i)$. To address this problem, the NCE scheme first expands NVAD cell $V^*(n_i)$ to the union of all NVAD cells in map area $A(n_i)$, i.e., $\bigcup_{n_x \in N(n_i)} V^*(n_x)$. Next, the COSE scheme in Section 4.2 is used to extract the corresponding ORN node $v^*_i$ from all possible ORN paths, e.g., paths $v^*_j \to v^*_l$ and $v^*_k \to v^*_l$ in Figure 9a.

Figure 9b shows an example of different rules for representing a road that crosses an administrative boundary: two NLM nodes $n_i$ and $n_j$ are created on the NLM links to mark the administrative boundary, while no corresponding ORN node exists in NVAD cells $V^*(n_i)$ and $V^*(n_j)$. To address this problem, we propose the NPOE scheme, which projects unmatched NLM nodes $n_i$ and $n_j$ onto ORN subgraphs $G_O(n_i)$ and $G_O(n_j)$ obtained from the COSE scheme, respectively. For example, Figure 9b shows two ORN nodes $v^*_i$ and $v^*_j$ that are matched with unmatched NLM nodes $n_i$ and $n_j$, respectively. If an unmatched NLM node lies on a dual carriage road, the NPOE scheme collapses the projected ORN nodes into a single ORN node located midway between them.

OEI Scheme for Missing ORN Edge

Figure 10 shows an example of the OEI scheme, which addresses the case where no ORN edge corresponds to NLM link $l_{ij}$. In this example, both endpoints $n_i$ and $n_j$ of NLM link $l_{ij}$ are matched with ORN nodes $v^*_i$ and $v^*_j$ by the AP scheme, respectively. However, the ORN edge connecting these ORN nodes is missing, possibly due to user errors in the OSM crowdsourcing process. The goal of this section is to insert an ORN edge $e^*_{ij}$ that corresponds to NLM link $l_{ij}$. To this end, the OEI scheme considers three factors: (1) the displacement $\Delta_i$ between NLM node $n_i$ and ORN node $v^*_i$, (2) the angle difference $\alpha$ between NLM line segment $(n_i, n_j)$ and ORN line segment $(v^*_i, v^*_j)$, and (3) the scaling factor $\beta$.

The OEI scheme first computes an orange dashed link between NLM nodes $n_i$ and $n_j$ that is equidistant from both NLM links $l_{ij}$ and $l_{ji}$. Next, it obtains a blue dashed link by shifting the orange dashed link by $\Delta_i$ so that it starts at ORN node $v^*_i$. Then, it computes a red dashed link by scaling the blue dashed link by factor $\beta$. Finally, ORN edge $e^*_{ij}$ in Figure 10 is obtained by rotating the red dashed link by angle $\alpha$ around ORN node $v^*_i$.

Figure 10. Example of OEI scheme for correspondent-missing NLM link $l_{ij}$.
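The NPOE projection and the OEI transformation described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: coordinates are planar; for a dual carriage road the collapsed node is the mean of the projections; the orange center line is approximated by the vertex-wise average of $l_{ij}$ and the reversed $l_{ji}$ (equal vertex counts assumed); $\beta$ is assumed to be the length ratio $|v^*_i v^*_j| / |n_i n_j|$; and $\alpha$ is taken as the direction difference between $(n_i, n_j)$ and $(v^*_i, v^*_j)$, with scaling and rotation about $v^*_i$. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def project_onto_segment(p: Point, a: Point, b: Point) -> Point:
    """Orthogonal projection of p onto segment ab, clamped to its endpoints."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy)


def npoe_node(n: Point, orn_edges: Sequence[Tuple[Point, Point]]) -> Point:
    """NPOE sketch: project NLM node n onto each candidate ORN edge; with two
    edges (dual carriage road) the result is the midpoint of the projections."""
    pts = [project_onto_segment(n, a, b) for a, b in orn_edges]
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))


def oei_edge(l_ij: List[Point], l_ji: List[Point], v_i: Point, v_j: Point) -> List[Point]:
    """OEI sketch: shift by Delta_i, scale by beta, rotate by alpha about v_i."""
    rev = l_ji[::-1]  # l_ji runs n_j -> n_i, so reverse it to align with l_ij
    if len(rev) != len(l_ij):
        raise ValueError("this sketch needs l_ij and l_ji with equal vertex counts")
    # Assumption: the equidistant (orange) link is the vertex-wise average.
    orange = [((p[0] + q[0]) / 2, (p[1] + q[1]) / 2) for p, q in zip(l_ij, rev)]
    n_i, n_j = orange[0], orange[-1]

    d_x, d_y = v_i[0] - n_i[0], v_i[1] - n_i[1]              # displacement Delta_i
    blue = [(x + d_x, y + d_y) for x, y in orange]           # now starts at v_i

    beta = math.dist(v_i, v_j) / math.dist(n_i, n_j)         # assumed length ratio
    red = [(v_i[0] + beta * (x - v_i[0]), v_i[1] + beta * (y - v_i[1])) for x, y in blue]

    alpha = (math.atan2(v_j[1] - v_i[1], v_j[0] - v_i[0])
             - math.atan2(n_j[1] - n_i[1], n_j[0] - n_i[0]))  # assumed angle difference
    c, s = math.cos(alpha), math.sin(alpha)
    return [(v_i[0] + c * (x - v_i[0]) - s * (y - v_i[1]),
             v_i[1] + s * (x - v_i[0]) + c * (y - v_i[1])) for x, y in red]


print(npoe_node((2.0, 3.0), [((0.0, 1.0), (10.0, 1.0)), ((10.0, -1.0), (0.0, -1.0))]))
# (2.0, 0.0)
edge = oei_edge([(0, 0), (5, 1), (10, 0)], [(10, 0), (5, -1), (0, 0)], (1, 1), (1, 21))
print([(round(x, 6), round(y, 6)) for x, y in edge])
# [(1.0, 1.0), (1.0, 11.0), (1.0, 21.0)]
```

Under these assumptions the composed shift, scale, and rotation is a similarity transform that maps $n_i$ to $v^*_i$ and $n_j$ exactly to $v^*_j$, so the inserted edge keeps the shape of the NLM link while ending at the matched ORN node.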

SOSG Scheme for Unmatched NLM Component
During the OSM crowdsourcing process, the ORN subgraph $G_O(C_N)$ corresponding to unmatched NLM component $C_N$ may not exist because of a misinterpretation of the road network (see the example in Figure 4f). Figure 11a shows an example of NLM component $C_N$ consisting of three unmatched NLM nodes ($n_1$, $n_2$, and $n_3$) and 16 unmatched NLM links: two unmatched NLM links connect unmatched NLM nodes within $C_N$, while 14 unmatched NLM links cross the boundary of $C_N$. The objective of our SOSG scheme is to construct a simple ORN subgraph $G_O(C_N)$ that corresponds to NLM component $C_N$.
The basic idea of the SOSG scheme is to examine the unmatched NLM nodes in $C_N$ one at a time and, for each of them, to construct the corresponding ORN subgraph using both the OEI and NPOE schemes. It maintains a priority queue $Q$ that determines the order in which unmatched NLM nodes are extracted from $C_N$. To better associate each node with its neighbor NLM nodes in matched NLM subgraph $G^*_N$, the key $k_i$ of unmatched NLM node $n_i$ in $Q$ is defined as the ratio of unmatched neighbor NLM nodes to all neighbor NLM nodes $N(n_i)\setminus\{n_i\}$, and the node with the smallest key is extracted first. Since the three key values in Figure 11a are $k_1 = 1/3$, $k_2 = 1/2$, and $k_3 = 1/4$, NLM node $n_3$ is extracted from $Q$ first. For unmatched NLM node $n_3$, the scheme first checks whether an ORN edge corresponds to NLM links $l_{3j}$ and/or $l_{j3}$, where NLM node $n_j$ belongs to the set of matched neighbor NLM nodes $N(n_3) \cap N^*$. Figure 11a shows a dual carriage edge between two neighbor ORN nodes $v^*_6$ and $v^*_8$. In this case, the NPOE scheme inserts ORN node $v^*_3$ at the center of the two projection points onto the ORN edges of this dual carriage road, as shown in Figure 11b. Once ORN node $v^*_3$ is created, the OEI scheme is used to insert a new ORN edge $e^*_{37}$. Then, unmatched NLM node $n_3$ and NLM links $l_{36}$, $l_{63}$, $l_{37}$, $l_{38}$, and $l_{83}$, which now have corresponding ORN edges, are removed from NLM component $C_N$ and inserted into matched NLM subgraph $G^*_N$. This reduces the key value of unmatched NLM node $n_2$ to $k_2 = 1/4$, as shown in Figure 11b. Next, unmatched NLM node $n_2$ is extracted from $Q$ and examined for an existing ORN edge corresponding to NLM links $l_{23}$, $l_{25}$, $l_{52}$, $l_{29}$, and $l_{92}$. Since there is no such ORN edge, the SOSG scheme overlays ORN node $v^*_2$ on top of $n_2$ and uses the OEI scheme to insert ORN edges $e^*_{23}$, $e^*_{25}$, and $e^*_{29}$, as shown in Figure 11c. The newly matched NLM objects are removed from $C_N$ and inserted into matched NLM subgraph $G^*_N$.
As a result, the key value of the last NLM node $n_1$ is updated to zero ($k_1 = 0$).
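The extraction order driven by priority queue $Q$ can be sketched in Python with `heapq` and lazy deletion of outdated entries. This is a hypothetical illustration of the ordering only: the NPOE and OEI steps that actually create ORN objects are not modeled, and a node simply counts as matched once it is extracted. The neighbor sets below are inferred from the links named in the text for Figure 11a; they reproduce the stated keys $k_1 = 1/3$, $k_2 = 1/2$, and $k_3 = 1/4$.

```python
import heapq
from fractions import Fraction
from typing import Dict, Hashable, List, Set


def sosg_order(unmatched: Set[Hashable], neighbors: Dict[Hashable, Set[Hashable]]) -> List[Hashable]:
    """Return the order in which unmatched NLM nodes are extracted from Q.

    Key k_i = (# unmatched neighbors) / (# neighbors); smallest key first.
    Matching a node (NPOE/OEI, not modeled here) lowers its neighbors' keys.
    """
    remaining = set(unmatched)

    def key(n: Hashable) -> Fraction:
        nbrs = neighbors[n] - {n}
        return Fraction(len(nbrs & remaining), len(nbrs)) if nbrs else Fraction(0)

    heap = [(key(n), str(n), n) for n in remaining]  # str(n) breaks ties
    heapq.heapify(heap)
    order: List[Hashable] = []
    while heap:
        k, _, n = heapq.heappop(heap)
        if n not in remaining or k != key(n):
            continue                       # stale entry left by a key update
        order.append(n)
        remaining.discard(n)               # n now belongs to matched subgraph G*_N
        for m in neighbors[n] & remaining:
            heapq.heappush(heap, (key(m), str(m), m))  # re-insert with lower key
    return order


# Inferred neighbor sets for Figure 11a (only unmatched nodes need entries).
nbrs = {"n1": {"n2", "n4", "n10"},
        "n2": {"n1", "n3", "n5", "n9"},
        "n3": {"n2", "n6", "n7", "n8"}}
print(sosg_order({"n1", "n2", "n3"}, nbrs))  # ['n3', 'n2', 'n1']
```

Using `Fraction` keeps the key comparisons exact, and because keys only decrease as neighbors become matched, a re-inserted entry always surfaces before the outdated one it replaces.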
The last unmatched NLM node $n_1$ in $C_N$ has one ORN edge between ORN nodes $v_1$ and $v^*_4$. The SOSG scheme creates an ORN node $v^*_1$ at the projection point onto the extended ORN edge and inserts an ORN edge $(v_1, v^*_1)$, as shown in Figure 11d. Finally, it uses the OEI scheme to add ORN edges $e^*_{12}$, $e^*_{14}$, and $e^*_{1,10}$, which completely covers unmatched NLM component $C_N$.

We now address the internal structure design when the SG scheme attaches new ORN edges at an intersection, which depends on whether the intersection is simple (Figure 12a), lies on a dual carriage road (Figure 12b), or is complex (Figure 12c). To keep the resulting ORN subgraph simple in the first two cases, our SG scheme requires all ORN paths through the intersection to meet at the same ORN node. In addition, a new relation must be inserted into the ORN to reflect a turn restriction between a new ORN edge and an existing ORN edge. Since a simple intersection has only one ORN node, the new ORN edge is directly connected to ORN node $v^*_i$, as shown in Figure 12a. In contrast, the ORN supernode $v^*_i$ for a dual carriage road is placed midway between two parallel ORN edges. In Figure 12b, our SG scheme overlays an ORN node $v^*_{i,1}$ on this supernode and then requires all additional ORN edges to meet at this point. To interconnect the dual carriage edges with ORN node $v^*_{i,1}$, it also inserts two internal (red dashed) ORN edges that connect this node with its projections onto the two opposite ORN edges, i.e., ORN nodes $v^*_{i,2}$ and $v^*_{i,3}$. To avoid u-turns via the new internal ORN edges, an additional ORN relation that prohibits u-turns between the two dual carriage edges must also be added. However, it is not easy to define a single ORN node that connects all ORN edges in a complex intersection because of the wide diversity of its internal structure.
Figure 12c shows an example of an ORN subgraph for a complex intersection, where the set of ORN nodes is partitioned into two subsets: (1) the subset $V^*_{i,C}$ of core ORN nodes, all of whose ORN edges connect to other ORN nodes in the complex intersection, and (2) the subset $V^*_{i,B}$ of boundary ORN nodes, each having at least one ORN edge that connects to an ORN node outside the complex intersection. For example, Figure 12c. To add a new ORN edge regardless of the internal structure, the SG scheme first adds a new boundary ORN node $v^*_{i,6}$ and then adds new (red dashed) ORN edges that directly connect this new node with every other boundary ORN node. To reflect a turn restriction between a new ORN edge and an existing ORN edge, a new relation should be inserted into the ORN, as in the previous two cases.
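The core/boundary partition and the insertion of a new boundary ORN node for a complex intersection can be sketched as follows. This is a hypothetical Python illustration rather than the authors' procedure: ORN edges are modeled as directed node pairs, the new internal edges are assumed to run in both directions, and the turn-restriction relations mentioned above are not modeled.

```python
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # directed ORN edge (from, to)


def split_core_boundary(inner: Set[Hashable], edges: Set[Edge]) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Split the ORN nodes of a complex intersection into core and boundary
    nodes. A boundary node has an edge to or from a node outside `inner`."""
    boundary = {u for u, v in edges if u in inner and v not in inner}
    boundary |= {v for u, v in edges if v in inner and u not in inner}
    return inner - boundary, boundary


def add_boundary_node(new: Hashable, inner: Set[Hashable], edges: Set[Edge]) -> Set[Edge]:
    """Return internal edges linking a new boundary ORN node with every existing
    boundary ORN node (both directions: an assumption of this sketch)."""
    _, boundary = split_core_boundary(inner, edges)
    return {(new, b) for b in boundary} | {(b, new) for b in boundary}


# A four-node intersection a-b-c-d with external connections at a and c.
inner = {"a", "b", "c", "d"}
edges = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "x"), ("y", "c")}
core, boundary = split_core_boundary(inner, edges)
print(sorted(core), sorted(boundary))               # ['b', 'd'] ['a', 'c']
print(sorted(add_boundary_node("e", inner, edges)))
# [('a', 'e'), ('c', 'e'), ('e', 'a'), ('e', 'c')]
```

Connecting the new node only to boundary nodes leaves the core structure untouched, which is what allows a new ORN edge to be attached regardless of how the intersection is drawn internally.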

Numerical Results
In this section, we present the numerical results of the RNC between ORN and NLM at Yeoui-do island, Seoul, Korea: the former is extracted from the XML file at the official OSM website [45], and the latter is a shape file downloaded from the Korean ITS website [4]. Both road networks are imported into a PostgreSQL database for the RNC [46]. Table 4 shows the statistics of both road networks: the area, the number of nodes, the number of road segments, and the total road length. In this paper, the proposed AP scheme is compared with three existing node matching schemes, as follows:
• Nearest first matching (NFM): In the NFM, the Euclidean distance of each pair of NLM and ORN nodes within a distance threshold (100 m) is initially stored in a priority queue. At each step, the matching $(n^*_i, v^*_j)$ with the smallest Euclidean distance in the priority queue is chosen, and then all remaining matchings involving either NLM node $n^*_i$ or ORN node $v^*_j$ are removed from the priority queue.
• Round-trip walk matching (RWM) [28]: Given an initial matching, the RWM checks the topological consistency of the matching in three steps. First, it extracts the corresponding ORN node $v_j$ of each neighbor NLM node $n_j \in N(n_i)\setminus\{n_i\}$. Second, for each corresponding ORN node $v_j$, it checks whether the ORN node $v_i$ corresponding to NLM node $n_i$ is also a neighbor of $v_j$. Finally, the ratio of topologically inconsistent neighbor nodes is stored in a priority queue so that the NLM node with the highest topological consistency is extracted first for the final matching.
• RWM with DBSCAN clustering (RWM-DC): Since both NFM and RWM perform 1:1 node matching, they do not account for the LoD difference at a complex intersection. To mitigate this problem, the RWM-DC scheme combines the RWM with the DBSCAN clustering algorithm [40,42].
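The NFM baseline described in the list above can be sketched in Python. This is a minimal sketch, not the implementation used in the experiments: coordinates are assumed to be planar in meters (e.g., after projecting both networks to a metric CRS), candidate pairs are enumerated by brute force rather than with a spatial index, and conflicting pairs are removed lazily when popped, which has the same effect as deleting them from the queue. The function name is hypothetical.

```python
import heapq
import math
from typing import Dict, Hashable, Tuple

Point = Tuple[float, float]


def nearest_first_matching(nlm: Dict[Hashable, Point], orn: Dict[Hashable, Point],
                           threshold: float = 100.0) -> Dict[Hashable, Hashable]:
    """Greedy 1:1 node matching in increasing order of Euclidean distance."""
    # Candidate pairs within the distance threshold; str() IDs break ties safely.
    heap = [(math.dist(p, q), str(n), str(v), n, v)
            for n, p in nlm.items() for v, q in orn.items()
            if math.dist(p, q) <= threshold]
    heapq.heapify(heap)
    used_nlm, used_orn, result = set(), set(), {}
    while heap:
        _, _, _, n, v = heapq.heappop(heap)
        if n in used_nlm or v in used_orn:
            continue  # pair conflicts with an earlier, closer matching
        result[n] = v
        used_nlm.add(n)
        used_orn.add(v)
    return result


nlm = {"n1": (0.0, 0.0), "n2": (50.0, 0.0)}
orn = {"v1": (10.0, 0.0), "v2": (45.0, 0.0), "v3": (500.0, 0.0)}
print(nearest_first_matching(nlm, orn))  # {'n2': 'v2', 'n1': 'v1'}
```

Because every NLM and ORN node is used at most once, NFM cannot map one NLM node to the several ORN nodes of a complex intersection, which is the LoD limitation that RWM-DC and the AP scheme try to address.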
Given all pairs of matched NLM and ORN nodes, we use the score-based matching (SM) for the edge matching of the above three schemes [37]. The SM first computes a discrete similarity score based on multiple independent measures, i.e., the Hausdorff distance [39], orientation [31,39], mean perpendicular distance, and the nodal degree of endpoint nodes [28], and then chooses a pair with the highest score.
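As an illustration of one of the SM measures, the sketch below computes the Hausdorff distance between an NLM link and an ORN edge, both given as polylines in planar coordinates. It is an assumption-laden sketch: it only evaluates distances from the vertices of each polyline to the other polyline, so it approximates rather than equals the continuous Hausdorff distance, and it does not reproduce the discrete scoring and combination rules of the SM [37]. Each polyline must have at least two vertices.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    t = 0.0 if denom == 0.0 else max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def directed_distance(src: List[Point], dst: List[Point]) -> float:
    """Largest distance from a vertex of src to polyline dst (needs len(dst) >= 2)."""
    return max(min(point_segment_distance(p, dst[i], dst[i + 1]) for i in range(len(dst) - 1))
               for p in src)


def hausdorff(a: List[Point], b: List[Point]) -> float:
    """Vertex-based approximation of the Hausdorff distance between two polylines."""
    return max(directed_distance(a, b), directed_distance(b, a))


print(hausdorff([(0.0, 0.0), (10.0, 0.0)], [(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)]))  # 3.0
```

The measure is taken in both directions because a short edge lying close to a long link has a small directed distance one way but a large one the other way.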
In our AP scheme, the threshold δ in Section 4.1 is set to 34 m, the maximum width of general highways and local roads in Korea [3].

RNM Results
In this section, we compare the RNM results of our AP scheme with those of the three other RNM schemes. As described in Section 4.3, each matching result is CM, IM, PM, or MM. If we regard the RNM result as a binary classification, CM can be interpreted as a true positive, and IM and PM can be interpreted as false positives. On the other hand, if we look at which NLM object each true ORN subgraph is matched to, we can classify the matching result into three cases. First, a matching scheme successfully finds the NLM object that corresponds to the true ORN subgraph, and the matching result becomes CM. Second, if it fails to find the right NLM object corresponding to the true ORN subgraph, the matching result is classified as a failed match (FM), which can be interpreted as a false negative; an FM can be further partitioned into PM, IM, and MM. Third, there is an exceptional case outside the binary classification, where the true ORN subgraph does not exist due to errors in the OSM crowdsourcing process. Denoting the cardinality of the type-$m$ matching result by $|M(m)|$, the precision, recall, and F1-score of the matching result are defined as

$$\mathrm{Precision} = \frac{|M(\mathrm{CM})|}{|M(\mathrm{CM})| + |M(\mathrm{PM})| + |M(\mathrm{IM})|},$$

$$\mathrm{Recall} = \frac{|M(\mathrm{CM})|}{|M(\mathrm{CM})| + |M(\mathrm{FM})|},$$

and

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$

respectively.

Node Matching Results

Figure 13 shows the ratio of node matching results for each RNM scheme. We first observe that the proposed AP scheme achieves an outstanding CM ratio of 0.73, at least 14.1% higher than those of the other RNM schemes. Its (CM, PM, IM, MM) ratio is (0.73, 0.028, 0.006, 0.237). The NFM and RWM schemes, which do not support node clustering, show almost the same RNM performance: their (CM, PM, IM, MM) ratios are (0.582, 0.164, 0.113, 0.141) and (0.588, 0.164, 0.102, 0.147), respectively. The inaccurate node clustering of RWM-DC degrades the CM ratio to 0.503 while increasing the PM and MM ratios to 0.232 and 0.175, respectively.

The excellent node clustering performance of the AP scheme originates from its low false positive ratio of 0.028, which is at least 8.29 times smaller than those of the other RNM schemes. Furthermore, the AP scheme has the lowest IM ratio of 0.006, while those of the other RNM schemes are at least 0.09. Since node matching is performed sequentially for each NLM node, an IM of a previous NLM node may block the CM of a subsequent NLM node, which can significantly reduce the CM ratios of the other RNM schemes. The only weakness of the AP scheme is its relatively high MM ratio, which is addressed in Section 6.3.

Figure 14 shows the precision, recall, and F1-score of node matching in the NFM, RWM, RWM-DC, and AP schemes. The precision, recall, and F1-score of the AP scheme are at least 26.7%, 17.1%, and 21.7% higher than those of the other RNM schemes, respectively. As with the node matching ratios, the NFM and RWM schemes show similar precision, recall, and F1-score, differing by at most 1.1%. The RWM-DC scheme shows the lowest precision, recall, and F1-score due to its inaccurate node clustering. For example, Figure 15a,b show the node clustering results of the RWM-DC and AP schemes, respectively, for the same complex intersection in the shaded region. While the AP scheme extracts the exact ORN nodes of the complex intersection, the RWM-DC scheme cannot distinguish three red ORN nodes that belong to minor intersections.

To summarize, the proposed AP scheme achieves excellent node matching performance in terms of precision, recall, and F1-score compared with the three existing RNM schemes.

Edge Matching Results
In this section, the edge matching performance of the AP scheme is compared with those of the three existing RNM schemes in Section 6.1. Figure 16 shows the ratio of edge matching results for each RNM scheme. We observe that the proposed AP scheme shows excellent edge matching performance compared with the other RNM schemes. It has the highest CM ratio of 0.873 (at least 32% higher than the others), the lowest false positive ratio of 0.05 (at least 12.7% lower than the others), and the lowest MM ratio of 0.076 (at least 19.4% lower than the others). This outstanding performance stems from its highly accurate node clustering at complex intersections, which minimizes both the PM and IM ratios and thus limits the propagation of false positives to the subsequent edge matching. On the other hand, the inaccurate node matching of the three existing RNM schemes results in a high MM ratio of edge matching. This is because, in a generic road network with a limited nodal degree, a change in the endpoints of an ORN edge leads to a non-existent ORN edge with high probability. Figure 17 shows the precision, recall, and F1-score of edge matching for each RNM scheme. The AP scheme achieves superior edge matching performance, with precision, recall, and F1-score at least 18.8%, 34.1%, and 27.5% higher, respectively, than those of the other RNM schemes. We also observe that the high MM ratio of the three existing RNM schemes significantly degrades their recall.
From these results, we demonstrate that the proposed AP scheme also achieves outstanding edge matching performance compared with the existing RNM schemes.

Figure 17. Precision, recall, and F1-score of edge matching.

RNC Results
In this section, we investigate how our APSG scheme further improves the matching performance of the AP scheme. Table 5 lists the numbers of CM, PM, IM, MM, and FM results of the AP and APSG schemes at Yeoui-do island, which consists of 177 NLM nodes and 434 NLM links. By adding ORN objects, the APSG scheme further improves the node matching performance of the AP scheme: the number of CM results increases by 41, while the number of MM results decreases by 42. As a result, it increases the recall by 8.29% while slightly improving the precision by 0.49%. The APSG scheme also improves the edge matching performance of the AP scheme, raising its precision and recall by 1.8% and 3.19%, respectively.

Table 5. Number of node and edge matching results of the AP and APSG schemes.

| Scheme | Node CM | Node PM | Node IM | Node MM | Node FM | Edge CM | Edge PM | Edge IM | Edge MM | Edge FM |
|---|---|---|---|---|---|---|---|---|---|---|
| AP | 129 | 5 | 1 | 42 | 18 | 379 | 18 | 4 | 33 | 28 |
| APSG | 170 | 5 | 2 | 0 | 7 | 418 | 12 | 4 | 0 | 16 |

Table 5 also reveals a limitation of our APSG scheme in an exceptional node matching, where an MM result of the AP scheme becomes an IM result under the APSG scheme. The shaded region in Figure 18 shows a complex intersection represented by two nodes in both road networks. The NLM interprets this complex intersection as a combination of two intersections: $n_i$ connects a road with an underpass, and $n_j$ connects three NLM links. On the other hand, the ORN interprets it as a single intersection with ORN nodes $v_k$ and $v_l$ interconnecting a dual carriage road, a road, and an underpass. This difference in the interpretation of road objects leads to a 2:2 node matching, which our APSG scheme cannot address: in the AP scheme, the matching results for NLM nodes $n_i$ and $n_j$ are MM and PM, respectively, and the APSG scheme then projects NLM node $n_i$ onto ORN nodes $v_m$ and $v_n$ on a dual carriage road, which changes its matching result to IM. Finally, the matching results of our APSG scheme at Yeoui-do island are shown in Figure 19, where Figure 19a,b illustrate the node and edge matching results, respectively.
The blue subgraph represents the new subgraph added to the ORN by the APSG scheme. In addition, the thick dark green, orange, and red lines indicate the CM, PM, and IM results, respectively, between NLM and ORN objects. We can see that the proposed APSG scheme achieves outstanding node and edge matching performance.
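The precision and recall gains reported for the APSG scheme can be checked against the Table 5 counts with a short Python sketch. It uses the definitions Precision = |M(CM)| / (|M(CM)| + |M(PM)| + |M(IM)|), Recall = |M(CM)| / (|M(CM)| + |M(FM)|), and the usual harmonic-mean F1-score; the function name and the dictionary layout are illustrative choices, and the counts are copied from Table 5.

```python
from typing import Tuple


def precision_recall_f1(cm: int, pm: int, im: int, fm: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1-score from match counts."""
    precision = cm / (cm + pm + im)  # CM: true positives; PM and IM: false positives
    recall = cm / (cm + fm)          # FM: false negatives
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Table 5 counts as (CM, PM, IM, FM)
table5 = {
    ("node", "AP"): (129, 5, 1, 18), ("node", "APSG"): (170, 5, 2, 7),
    ("edge", "AP"): (379, 18, 4, 28), ("edge", "APSG"): (418, 12, 4, 16),
}
for obj in ("node", "edge"):
    p0, r0, _ = precision_recall_f1(*table5[(obj, "AP")])
    p1, r1, _ = precision_recall_f1(*table5[(obj, "APSG")])
    print(f"{obj}: precision +{100 * (p1 - p0):.2f}%, recall +{100 * (r1 - r0):.2f}%")
# node: precision +0.49%, recall +8.29%
# edge: precision +1.80%, recall +3.19%
```

With these definitions, the Table 5 counts reproduce the reported gains of 0.49% and 8.29% for node matching and 1.8% and 3.19% for edge matching, expressed as absolute percentage-point differences.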

Conclusions
This paper presents the APSG approach to the conflation of administrative and voluntary road networks. The AP scheme addresses the LoD problem at complex intersections by partitioning the map area, extracting a candidate ORN subgraph, and aggregating it into a supernode. For the unmatched NLM subgraph, the SG scheme sequentially inserts ORN objects while preserving their connectivity to the NLM subgraph matched by the AP scheme. The numerical results show that our APSG scheme achieves outstanding node and edge matching performance in terms of precision, recall, and F1-score compared with the existing RNM schemes.

Data Availability Statement: Publicly available datasets were analyzed in this study. These data can be found here: https://www.openstreetmap.org/ (accessed on 2 December 2021) and http://nodelink.its.go.kr/ (accessed on 2 December 2021).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Transient Curve of Projection Boundary
In this appendix, we identify the transient curve of the projection boundary around a vertex in an intersection area. Figure A1 shows two examples of the projection boundary in area $A_{jm}(n_i)$, where vertex $n_{ji}(p)$ connects two NLM line segments $l_{ji}(p)$ and $l_{ji}(p+1)$ on one projection side, and NLM line segment $l_{im}(q)$ is common on the other projection side. Starting from NLM node $n_i$, the projection boundary is the bisector $b_1$ of the angle formed by $l_{ji}(p+1)$ and $l_{im}(q)$, illustrated by the blue dotted line in both examples. Every point on this projection boundary has the same projection distance to $l_{ji}$ and $l_{im}$, e.g., $d_{P,1} = d_{P,2}$. Our goal is to determine the point where the projection boundary deviates from $b_1$ and to find the equidistant projection boundary between NLM line segments $l_{ji}(p)$ and $l_{im}(q)$. Without loss of generality, we examine the projection boundary curve in two cases: (1) the internal angle of vertex $n_{ji}(p)$ is less than $180^\circ$ ($\theta_{ji}(p) < 180^\circ$); and (2) it is greater than $180^\circ$ ($\theta_{ji}(p) > 180^\circ$).

Figure A1. Construction of projection boundary in map area $A_{jm}(n_i)$ with (a) $\theta_{ji}(p) < 180^\circ$ and (b) $\theta_{ji}(p) > 180^\circ$.

Figure A1a shows an example where $\theta_{ji}(p) < 180^\circ$. To find the point where the projection boundary deviates from $b_1$, we draw two additional bisectors that intersect bisector $b_1$ at point $n_p$: bisector $b_2$ of the angle between $l_{ji}(p)$ and $l_{im}(q)$, and bisector $b_3$ of angle $\theta_{ji}(p)$. At point $n_p$, the projection distances to NLM line segments $l_{ji}(p)$, $l_{ji}(p+1)$, and $l_{im}(q)$ are equal. Beyond point $n_p$, the projection boundary deviates from $b_1$ and follows the red dotted line segment $b_2$.
When $\theta_{ji}(p) > 180^\circ$, as shown in Figure A1b, bisector $b_2$ is similarly obtained from the crosspoint of $l_{im}(q)$ and the extended line of $l_{ji}(p)$. Next, we determine point $n_q$ on bisector $b_2$ so that its distance to point $n_{ji}(p)$ is equal to its projection distance to $l_{im}(q)$. Beyond point $n_q$, bisector $b_2$ becomes the projection boundary. The remaining problem is to determine the projection boundary between points $n_p$ and $n_q$. To this end, we define a Cartesian coordinate system with origin $n_p$ whose X-axis is parallel to $l_{im}(q)$. We denote the coordinates of point $n$ on the transient boundary curve by $(x, y)$ and those of point $n_{ji}(p)$ by $(x_0, y_0)$. Since $y > 0$, the projection distance of point $n$ to $l_{im}(q)$ becomes $y + d_{P,2}$, which must be equal to the distance between points $n$ and $n_{ji}(p)$, i.e.,

$$\sqrt{(x - x_0)^2 + (y - y_0)^2} = y + d_{P,2}. \qquad \text{(A1)}$$

Finally, the transient curve of the projection boundary becomes a parabola satisfying the following equation: