Automated Conflation of Digital Elevation Model with Reference Hydrographic Lines

Abstract: Combining misaligned spatial data from different sources complicates spatial analysis and the creation of maps. Conflation is a process that solves the misalignment problem through spatial adjustment or attribute transfer between similar features in two datasets. Even though combining a digital elevation model (DEM) with vector hydrographic lines is common practice in spatial analysis and mapping, no method for automated conflation between these spatial data types has been developed so far. The problem of DEM and hydrography misalignment arises not only in map compilation, but also during the production of generalized datasets. There is a lack of automated solutions which can ensure that the drainage network represented in the surface of a generalized DEM is spatially adjusted with independently generalized vector hydrography. We propose a new method that performs the conflation of a DEM with linear hydrographic data and is embeddable into the DEM generalization process. Given a set of reference hydrographic lines, our method automatically recognizes the most similar paths on the DEM surface, called counterpart streams. The elevation data extracted from the DEM is then rubbersheeted locally using the links between counterpart streams and reference lines, and the conflated DEM is reconstructed from the rubbersheeted elevation data. The algorithm developed for extraction of counterpart streams ensures that the resulting set of lines comprises a network similar to the network of the ordered reference lines. We also show how our approach can be seamlessly integrated into a TIN-based structural DEM generalization process with spatial adjustment to pre-generalized hydrographic lines as an additional requirement. The combination of the GEBCO_2019 DEM and the Natural Earth 10M vector dataset is used to illustrate the effectiveness of DEM conflation both in map compilation and map generalization workflows.
The resulting maps are geographically correct and aesthetically more pleasing than a straightforward combination of misaligned DEM and hydrographic lines without conflation.


Introduction
Conflation is a process of combining (merging) information from two data sources into a single source, reconciling disparities where possible [1]. Saalfeld [2] identified conflation as a distinct GIS method and set its mathematical and technological foundations. Basically, conflation first identifies corresponding features in two datasets using spatial and semantic attributes, and then performs spatial adjustment of the features or transfers attributes between them. One of the most widely used techniques for spatial adjustment is so-called rubbersheeting, during which conflated features are first triangulated and then the triangulation vertices are locally shifted according to conflation links that connect similar points in the two datasets [3].
Conflation is not a straightforward process, since identification of corresponding features is always uncertain to some degree. A standard approach is to use a minimax strategy, in which corresponding features are identified using the minimum or the maximum value of some similarity measure. In particular, Stanislawski et al. [4] used maximum overlap area and minimum distance in conflation of waterbody and transport (stream) reaches between two hydrographic datasets. Cobb et al. [5] emphasized the importance of using attribute information to make conflation more reliable, while Walter and Fritsch [6] demonstrated that determination of the most probable object pairs for matching can be laid on a statistical background. The problem of object matching is often formulated in terms of optimization, which seeks the best solution that maximizes the total similarity between all matched pairs [7] or minimizes the measure of total discrepancy between datasets [8]. Since conflation is performed mostly on linear features, special similarity measures like the Fréchet distance (see the definition in Section 3.3.1), which respect the continuity of lines, have become of particular interest in recent research [9,10].
Conflation methods are traditionally focused on road network data [11,12] and snapping of mobile/GPS tracking data to road networks [13,14]. National mapping agencies list conflation as one of the technologies incorporated into map generalization processes [15]. Ordnance Survey GB employs conflation in production of OS VectorMapTM District data [16]. While most of the conflation methods focus on vector features, it is possible to conflate discretized spatial fields too. In particular, Kyriakidis et al. [17] developed a geostatistical approach to conflation of digital elevation models where the combined result is represented as a realization of the stochastic spatial process.
The increasing variety of spatial data sources during recent decades has raised the value of conflation as one of the key technologies in spatial data processing [18]. Among the most influential technological drivers is the wide adoption of crowdsourcing, which has exploded the range and variance of spatial data quality [19]. Consequently, conflation methods have spread to a variety of spatial data types and have become in demand even for imprecise, sketchy spatial datasets [20]. Spatial databases like OpenStreetMap [21] are often updated using heterogeneous and imprecise sources, and reconciling topological misalignments in overlapping coverages is essential to keep the data consistent [22]. Currently, conflation instruments are available as specialized modules for general-purpose GIS platforms such as ArcGIS [23] and QGIS [24], as well as in the form of standalone open-source [25] and proprietary [26] solutions.
Terrain and hydrography are often analyzed and mapped together due to their genetic relationships. A natural hydrographic network follows locations along the negative terrain forms which have a high value of flow accumulation and comprise the so-called drainage network [27,28]. When hydrographic and terrain data come from different sources, this relationship is often violated. The reasons for this can be multiple, ranging from mismatches in level of detail and data acquisition techniques to inaccurate information about geodetic datums. However, the consequence always manifests itself in a similar way: hydrographic lines do not coincide with the drainage network implicitly represented in the elevation data. A typical example of misalignment between a DEM and a hydrographic line is shown in Figure 1. It is clear that such a cartographic representation is geographically incorrect and aesthetically unpleasing. The misalignment observed here has another negative effect: it does not allow the direct enrichment of hydrological DEM analysis with real hydrographic data. This problem is usually solved with specialized techniques such as stream burning [29]. These techniques enforce flow directions in a way that corresponds to the configuration of the real hydrographic network. However, a DEM modified in such a way is still inappropriate for cartographic purposes, since the misalignment between hydrographic lines and the DEM surface actually remains uncorrected.
To solve the problem, a specialized conflation method which aligns elevation and hydrography data should be applied. Alignment is needed not only for map compilation, but also during the production of generalized datasets. In particular, a generalized DEM must be spatially adjusted with generalized vector hydrography. To date, no algorithms have been developed to solve this task.
In this paper, we propose a new method that performs automatic conflation of a DEM with reference hydrographic lines and can be easily embedded into a structural DEM generalization workflow. The paper follows the tradition established in GIS software, which tends to use the term stream to name a path belonging to a drainage network. While the more correct term is drainage path [27] or drainage line [28], we will use the term stream for the sake of brevity, keeping in mind that not all such lines correspond to real streams. Also, we will use the term pixel to name the rectangular element of a raster DEM associated with each of its nodes. This avoids the confusion associated with the ambivalent term cell, which, depending on the context, is also used to name the cohesion of the four neighboring DEM nodes in subpixel computations (e.g., bilinear resampling).
The rest of the paper is organized as follows. The next section briefly discusses related research. Then the proposed method is described in detail within the corresponding paper section. The Results section demonstrates the effectiveness of our approach on the example of conflating GEBCO_2019 DEM [30] and river/lake centerlines from Natural Earth 1:10,000,000 dataset [31]. The Discussion section provides the detailed analysis of the performance of the method and its limitations. Finally, the most important insights as well as directions of the future research are briefly summarized in the Conclusion.

Related Research
The problem of misalignment between vector hydrographic data and automatically extracted drainage networks has received great attention in hydrological applications of DEMs. It is particularly important in tasks where a DEM is used to enrich existing hydrographic datasets [32,33]. To ensure that the flow distribution on a DEM is consistent with real hydrography, multiple techniques for so-called stream burning have been developed. The standard approach to stream burning involves lowering the DEM pixel values under the hydrographic features so that they accumulate most of the flow [34-36]. Wu et al. [37] modified the priority flood procedure [38], which fills artificial depressions by a least-cost path search technique, to implement a drainage network extraction algorithm with drainage enforcement along existing hydrographic features. Lindsay [29] developed an elaborate procedure for stream burning which redirects the flow on a DEM surface in a way that concentrates its accumulation along given vector hydrographic lines. While his method is effective for hydrological applications, it does not displace terrain features encoded in the source DEM elevations and is therefore not suitable for solving the problem from a cartographic point of view.
A mapping-oriented approach to automated extraction of hydrographically corrected contours has recently been presented by Arundel et al. [39]. Their method conflates the National Elevation Dataset (NED) raster DEM with the independently produced National Hydrography Dataset (NHD). It does so by reinterpolating DEM pixel centers as elevation points with smoothing and drainage enforcement along NHD features using the ANUDEM program [40]. While the method proved effective in producing contours aligned with hydrographic data, it still uses the direct approach to burn streams into the surface. Therefore, it tends to be ineffective when elevation and hydrography data are poorly aligned [39].
Yadav and Hatfield [41] explored the alternative case when the DEM is considered to be of a better quality than the vector hydrographic network. They offered the DEM-Stream-Conflation (DSC) algorithm, which identifies the streams corresponding to real hydrographic features by tracing them automatically from the start point of each hydrographic line. However, if the data are not aligned, the start point (the source) of a vector hydrographic feature may fall within the wrong watershed. The method also does not consider the case when tracing the corresponding stream is not possible due to a different drainage pattern on the DEM.
The drainage network has traditionally received special interest in DEM generalization research, since it is one of the most desirable features in terrain analysis and therefore should be adequately represented after generalization [42]. Structural lines, mainly streams and ridges, but also other breaklines, form a skeleton of the terrain, which can be explicitly treated during generalization to derive more geographically sound results. Weibel [43] proposed an adaptive methodology combining filtering and structural generalization, in which the generalized version of a DEM is derived by simplification of structural lines when the desired degree of generalization is significant. Research conducted by Fan et al. [44] and Leonowicz et al. [45] showed that filtering can be guided by terrain slope, curvature or stream proximity to enhance the representation of ridges and valleys in a generalized DEM.
There is a family of drainage-constraining DEM generalization methods, which allow preservation of selected streams based on constrained triangulation [46-48]. The drainage network has a natural companion: a system of watershed lines that represent hydrologically important ridges. Algorithms proposed by Jordan [49] and Ai and Li [50] are based on the idea of filling small watersheds and thus removing minor valleys from the DEM. Samsonov [51] evolved this approach and combined it with adaptive filtering, which ensured that the remaining valleys and watersheds are sufficiently wide. Chen et al. [52] used tree model construction considering the nested relationship of watersheds to extract representative terrain feature points for generalization.
Drainage-constraining DEM generalization algorithms rely on a drainage network derived from the DEM using automated methods such as masking or tracing the flow accumulation raster [27]. However, this approach does not guarantee that the extracted streams represent the features in the generalized hydrographic network. As a solution to this kind of problem, Gaffuri [53] developed a general GAEL model for simultaneous agent-based generalization of vector features and fields. For the particular case of vector hydrography and an elevation field, his approach is based on determination of spatial relations between hydrography and elements of the triangulated DEM in the source dataset. If a river is shifted during generalization, then these changes can be propagated to the elevation field by a corresponding shift in triangle coordinates. However, this is only possible when the elevation and hydrography data are initially aligned, which is not the case investigated in our research.
Our overview shows that numerous methods to resolve inconsistencies between DEM and vector hydrography have been developed. However, existing approaches are not motivated by visualization quality, and therefore do not involve spatial conflation of DEM and hydrographic lines. At the same time, data combined on maps often come from different sources. Existing free spatial databases such as Natural Earth [31] and OpenStreetMap [21] do not include layers for representing land elevation. Missing elevation data can be easily extracted from global DEMs, such as GMTED2010 [54] or GEBCO_2019 [30], but the resulting elevation layer will be misaligned with hydrographic vector features from the database, since these datasets are generated independently. Eventually, this will lead to geographically incorrect and aesthetically unpleasing cartographic representations. Therefore, the development of a specialized method for conflation of DEM with reference hydrographic lines is needed. This method should be automated as much as possible and should be embeddable into cartographic generalization processes.

General Principles and Workflow
We have developed a new method that performs conflation of DEM with reference hydrographic lines. The method is based on the following principles:

1. A resulting terrain surface represented by elevations in the conflated DEM must be spatially adjusted with reference hydrographic lines.

2. Conflation is performed by displacing the elevation data. No new terrain features are created or burned into the surface. Reference lines are not considered to be of a better quality than the DEM, but have primary importance in the conflation process and therefore remain at their locations.

3. Elevation data must be represented by vector features, either points or lines. Both raster and triangulated DEMs can be easily represented as a set of elevation points and lines without loss of information. Also, the linear representation can be used when breaklines represent a structural skeleton of the surface extracted from the source DEM to derive its generalized version. Thus, the vector-based representation abstracts the format of the elevation input and serves different conflation scenarios.

4. Displacing the elevation data is performed by rubbersheeting the corresponding vector features along links directed towards reference hydrographic lines and originating from the most similar paths on the DEM surface: counterpart streams (or counterparts). Each reference hydrographic line is associated with one counterpart stream.

5. Counterpart streams are automatically extracted from the source DEM and must comprise a topologically correct network similar to the network of the ordered reference lines. The method of extraction of counterpart streams must be robust in the case of existing errors in the DEM and hydrographic lines (artificial depressions, incorrect line directions) as well as in cases of non-standard stream configurations (braided streams, deltas, channels).

6. The conflated DEM is reconstructed from the rubbersheeted elevation data.
The developed conflation method itself consists of seven stages, which are illustrated in Figure 2. The following paragraphs explain each stage of the method in detail.

Ordering of Reference Hydrographic Lines
Ordering is a pre-processing stage of the method, which is executed to reveal existing topological relations between hydrographic lines and to organize them into a hierarchical structure, which minimizes the number of elements in the hydrographic network and establishes subordination between them. Ordering is needed to provide an unambiguous sequence in which counterpart streams are traced, and to enable replication of the topological relations existing in the network of reference lines by similar relations in the network of counterpart streams.
To establish the hierarchy of the streams we applied the Hack ordering procedure [55], also known as natural stream ordering. According to this procedure, the longest upstream path from each outlet in a river network is assigned the 1st order. Next, the longest tributaries of 1st order streams are traced upstream and the resulting lines are assigned the 2nd order. The process continues until all hydrographic lines are chained into streams of a known order. Additional information such as drainage basin area (or flow accumulation) can be used to weigh stream lengths and prioritize rivers with large basins. The main advantage of Hack stream ordering is that it establishes subordination between rivers in a network. Also, it minimizes the number of streams by joining them during the ordering process, which reduces the number of counterparts that need to be extracted. The standard Hack ordering scheme needs to be refined for river sections with braided streams, because all distributaries inside the braid are considered to be of the same order. In the modified scheme, the longest stream passing through the braid keeps its order, while its tributaries are classified as if they do not outflow from the main stream. Therefore, direct tributaries of the main stream receive the same order incremented by one, and the process continues recursively for tributaries of tributaries until the whole braid is processed. The ordering scheme refined in such a way is called the modified Hack ordering in this paper.
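The ordering logic described above can be sketched as a small routine. The sketch below is an illustration only, not the paper's algorithm: it assumes the network has already been split into an upstream tree (a hypothetical `upstream` mapping from each junction to its upstream neighbors with segment lengths), and it ignores basin-area weighting and braided sections.

```python
def longest_path(node, upstream):
    """Length and node sequence of the longest chain continuing upstream."""
    best = (0.0, [node])
    for child, seg_len in upstream.get(node, []):
        length, tail = longest_path(child, upstream)
        if length + seg_len > best[0]:
            best = (length + seg_len, [node] + tail)
    return best

def hack_order(outlet, upstream):
    """Assign Hack-style orders to the edges of an upstream tree.

    upstream maps each junction to a list of (upstream_node, segment_length);
    returns {(downstream_node, upstream_node): order}."""
    orders, queued = {}, set()
    stack = [(outlet, None, 1)]  # (junction, forced first upstream node, order)
    while stack:
        junction, first, order = stack.pop()
        if first is None:
            _, path = longest_path(junction, upstream)
        else:
            _, tail = longest_path(first, upstream)
            path = [junction] + tail
        for i in range(len(path) - 1):
            edge = (path[i], path[i + 1])
            orders[edge] = order
            queued.add(edge)
        # every branch leaving the current stream starts a higher-order stream
        for node in path:
            for child, _ in upstream.get(node, []):
                if (node, child) not in queued:
                    queued.add((node, child))
                    stack.append((node, child, order + 1))
    return orders
```

On a toy network with outlet O, a 9-unit chain O-A-C and a 4-unit branch O-B-D, the chain O-A-C obtains order 1 and the whole branch O-B-D obtains order 2, which matches the joining behavior that makes Hack ordering attractive for minimizing the number of counterparts.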
Ordering of the streams relies on the knowledge of their initial topological relations, which requires an additional preliminary processing step. Therefore, the ordered network of reference hydrographic lines is obtained using the following sequence of steps:

1. Split hydrographic lines at intersections and construct a raw stream network with stream relations and outlet nodes identified.

2. Reorganize the raw stream network using the modified Hack ordering.

3. Describe the topological structure of the resulting network in a tabular form.
The result of the first step is illustrated in Figure 3. Three special stream configurations are highlighted in this figure. Braided streams, deltas (a) and channels (c) will have an additional constraint on preservation of bifurcation topology during the search for their counterparts. Incorrectly digitized streams (b) are treated as they are. Our pre-processing algorithm does not try to identify and correct such cases, because this procedure is unreliable. A possible misalignment between hydrographic lines and DEM does not allow using the underlying elevation information to detect the correct direction of the stream. Fortunately, as we will show later, this issue does not diminish the quality of conflation, since the least-cost path strategy can be effectively used to trace counterparts against the flow direction on the DEM surface. The resulting network of reference lines, which establishes the modified Hack ordering of the input hydrographic lines, is represented in Figure 4. This network is supplemented by Table 1, which summarizes the topological relationships between streams and defines the sequence in which their counterparts should be traced. The table contains six variables. With respect to the bifurcation process, two types of streams are distinguished: • main (streams with BIFUR = −1); • distributary (streams with BIFUR ≠ −1).
These stream types are encoded in the TYPE variable of Table 1 and in the legend of Figure 4. Non-zero values of the CONFL and the BIFUR variables establish dependencies in the network of reference hydrographic lines. In particular, if CONFL = j and/or BIFUR = k for some stream with ID = i, then i is considered subordinate to j and/or k, which means that the location of its start and/or end point is tied to these streams. Consequently, j and/or k are considered to be superordinate to i. Preserving the topological relations in the hydrographic network requires that counterparts of superordinate streams are extracted prior to the counterparts of subordinate streams. The ITER variable in Table 1 establishes the sequence in which this process can be arranged. The algorithm for generating the sequence starts by setting ITER = 1 for the streams with CONFL = −1 and BIFUR = −1. Then the corresponding streams are excluded from the list and iterations begin with i = 2 as the cycle variable. At each i-th iteration, ITER = i is set for the streams whose CONFL and BIFUR values no longer appear among the IDs remaining in the list. The corresponding streams are excluded from the list and iterations continue with i = i + 1 until all streams are processed.
As can be seen from Table 1, the iteration number is not necessarily equal to the modified Hack order. Moreover, it does not depend on the type of the stream. At the same time, the order in which the streams with equal ITER are processed during each iteration does not matter, since these streams are independent from each other. With the sequence of iterations defined, we may proceed to the core stage of the method: extraction of counterpart streams.
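The iteration-numbering rule can be expressed compactly: a stream is ready for iteration i as soon as neither its CONFL nor its BIFUR value refers to a still-unprocessed stream. A minimal sketch, with the table rows represented as hypothetical dicts carrying ID, CONFL and BIFUR keys (−1 meaning no superordinate):

```python
def assign_iterations(streams):
    """Compute the ITER value for each stream row.

    streams: list of dicts with keys 'ID', 'CONFL', 'BIFUR' (-1 = none).
    Returns {ID: ITER}."""
    iters = {}
    remaining = {s['ID']: s for s in streams}
    i = 1
    while remaining:
        # a stream is ready when its superordinates are no longer in the list
        ready = [sid for sid, s in remaining.items()
                 if s['CONFL'] not in remaining and s['BIFUR'] not in remaining]
        if not ready:
            raise ValueError('cyclic subordination in the stream table')
        for sid in ready:
            iters[sid] = i
            del remaining[sid]
        i += 1
    return iters
```

Streams assigned the same ITER can be traced in any order within their iteration, mirroring the independence noted above.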

Extraction of Counterpart Streams
The term counterpart stream is used in this paper to name the path on the DEM surface most similar to the reference hydrographic line (hereinafter, the definitions of the terms are highlighted using the bold font). Since similarity is a general concept, we must now specify it in a way that allows automated evaluation. This is done in terms of the distances defined in the following paragraphs.

Distances
Given two subsets A = {a} and B = {b} of a metric space M with distance d defined, the Hausdorff distance d_H(A, B) is calculated as [56]:

d_H(A, B) = max{ sup_{a∈A} inf_{b∈B} d(a, b), sup_{b∈B} inf_{a∈A} d(a, b) }.

In practice, where A and B are finite point sets, the discrete version max_{a∈A} min_{b∈B} d(a, b) is calculated as an approximation of sup_{a∈A} inf_{b∈B} d(a, b), and similarly for the second term. Finally, the discrete Hausdorff distance is derived as:

d_H(A, B) = max{ max_{a∈A} min_{b∈B} d(a, b), max_{b∈B} min_{a∈A} d(a, b) }.

The one-sided distance \vec{d}_H(A, B) = max_{a∈A} min_{b∈B} d(a, b) is called the Directed Hausdorff distance, since it measures the maximum nearest neighbor distance from one set to another, but not vice versa. Therefore, generally \vec{d}_H(A, B) ≠ \vec{d}_H(B, A). Since both d_H and \vec{d}_H are minimax measures, they are sensitive to outliers in sets. This property is useful to limit the maximum spatial deviation of a counterpart stream and a reference line from each other.
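For finite point sets the directed and symmetric discrete Hausdorff distances reduce to a few lines. The sketch below is illustrative, with lines given as lists of (x, y) vertices:

```python
from math import hypot

def directed_hausdorff(A, B):
    # max over a in A of the distance to its nearest neighbour in B
    return max(min(hypot(ax - bx, ay - by) for bx, by in B) for ax, ay in A)

def hausdorff(A, B):
    # symmetric discrete Hausdorff distance
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

The asymmetry of the directed variant is easy to see: for A = {(0,0), (1,0)} and B = {(0,1)}, the distance from A to B is √2 while the distance from B to A is only 1.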
The Modified Hausdorff distance d_M, introduced in [57] for image matching, is based on a similar idea, but replaces the maximum of the nearest neighbor distances with their average:

d_M(A, B) = max{ (1/|A|) Σ_{a∈A} min_{b∈B} d(a, b), (1/|B|) Σ_{b∈B} min_{a∈A} d(a, b) },

where |A| and |B| mean the number of elements in A and B, respectively. In comparison to d_H and \vec{d}_H, d_M is insensitive to outliers and provides knowledge about the average proximity between curves. This property is useful to select the counterpart stream among multiple candidates. The three distances described above are not the best measures of proximity between two paths, because they do not respect the continuity of lines. A stronger proximity measure for comparing two lines is the Fréchet distance, defined as [58]:

d_F(A, B) = inf_{α,β} max_{t∈[0,1]} d(A(α(t)), B(β(t))),

where A(u), B(v), u, v ∈ [0, 1] are the parametric descriptions of the lines and α(t), β(t), t ∈ [0, 1] are so-called reparameterizations that are optimized to find such u = α(t) and v = β(t) that minimize the maximum distance between A(u) and B(v) for the common t. The parametric description implies that A(0) is the start point of A, A(1) is the endpoint of A, and A(u) represents some intermediate point on the curve, which moves continuously along A from its start until its end when u changes from 0 to 1. Being defined in a complicated way, the Fréchet distance is usually explained as the minimum length of the leash that is needed for a man to walk his dog, while both of them follow their separate ways and these ways must be traversed from the start to the end. While the Hausdorff distance and its modifications can be calculated in a straightforward manner, the Fréchet distance is not so easy to calculate. Methods developed by Alt and Godau [59] and Eiter and Mannila [60] can be used to obtain d_F and its discrete version. The latter is represented as a short algorithm which we use in the current paper. With these distances defined, we may proceed to formalization of conditions that can be used to trace the counterpart of a single reference line.
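Both the modified Hausdorff distance and the discrete Fréchet distance of Eiter and Mannila admit short implementations. The following sketch is illustrative, operating on vertex lists, and also shows why only the Fréchet distance notices a reversed line:

```python
from math import hypot

def modified_hausdorff(A, B):
    # average-of-nearest-neighbour distances, symmetrized by the maximum
    d = lambda p, Q: min(hypot(p[0] - q[0], p[1] - q[1]) for q in Q)
    return max(sum(d(a, B) for a in A) / len(A),
               sum(d(b, A) for b in B) / len(B))

def discrete_frechet(P, Q):
    # dynamic-programming coupling distance of Eiter and Mannila
    n, m = len(P), len(Q)
    ca = [[-1.0] * m for _ in range(n)]
    def c(i, j):
        if ca[i][j] >= 0:
            return ca[i][j]
        d = hypot(P[i][0] - Q[j][0], P[i][1] - Q[j][1])
        if i == 0 and j == 0:
            ca[i][j] = d
        elif i == 0:
            ca[i][j] = max(c(0, j - 1), d)
        elif j == 0:
            ca[i][j] = max(c(i - 1, 0), d)
        else:
            ca[i][j] = max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
        return ca[i][j]
    return c(n - 1, m - 1)
```

Reversing one of two identical lines leaves d_M at zero but raises d_F to the full line length, which is exactly the continuity property that makes the Fréchet distance the stronger measure for comparing paths.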
After these conditions are defined, we will briefly discuss how they should be extended in case of multiple reference lines comprising the network.

Single Reference Line
A counterpart stream candidate s for the reference hydrographic line h is a path on the DEM surface that satisfies the following three conditions:

d(s(0), h(0)) ≤ r,   d(s(1), h(1)) ≤ r,   \vec{d}_H(s, h) ≤ r,   (1)

where r is a distance threshold called the catch radius.
The first condition in (1) means that s must start not farther than r from the first point of h. All locations that satisfy this condition are within the circle-shaped buffer zone of h(0) called the start neighborhood and induced by offset distance r from h(0). The second condition in (1) sets the similar constraint that relates the end points of s and h. Correspondingly, the circle-shaped buffer zone of h(1) is called the end neighborhood. Finally, the third condition in (1) requires that s may deviate from h not farther than r. All locations that satisfy this condition are within the buffer zone of h called the counterpart stream corridor and induced by offset distance r from h.
The definition of a counterpart stream candidate is illustrated in Figure 5. Since \vec{d}_H(s, h) limits only the distance of a counterpart candidate from the reference line, but not the opposite, it is the weakest possible distance constraint. The quality of the candidate can be additionally restricted by using the Hausdorff and the Fréchet distances. In particular, we distinguish the three following classes of constraints: weak (\vec{d}_H(s, h) ≤ r), regular (d_H(s, h) ≤ r) and strong (d_F(s, h) ≤ r). It is important to note that if the weak constraint is applied, it is still possible that a counterpart candidate fits the strong and/or regular constraints, but that cannot be guaranteed. For that reason, these classes are used not only for selection of candidates, but also for assessment of the quality of the resulting paths (Table 2). Let S = {s} be a set of counterpart stream candidates. To formalize the conditions used to select the counterpart stream from S, we represent each candidate as a parametric curve indexed by t: s = s(t), t ∈ [0, 1]. Let p = (x, y) be a tuple of coordinates representing the center of a pixel (a DEM node) and F(p) = F(x, y) be a flow accumulation field. The flowline counterpart stream c is then selected by finding the s that satisfies the following three conditions:

F(s(t)) ≥ a for all t ∈ [0, 1];   s follows the flow directions on the DEM;   c = argmin_{s∈S} d_M(s, h),   (2)

The first two conditions in (2) constrain s to be a member of the drainage network on the DEM surface: it is required to pass through the pixels with flow accumulation greater than the defined threshold a. The third condition in (2) forces s to be the closest candidate to the reference line in terms of the Modified Hausdorff distance.
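Selecting a flowline counterpart among candidates can be sketched as a filter followed by an argmin. The snippet below is a simplified illustration, not the paper's Algorithm A2: lines are vertex lists, the flow accumulation field is a hypothetical dict keyed by pixel centers, the corridor test uses the directed candidate-to-reference distance of condition (1), and the closeness criterion is the one-sided mean nearest-neighbor distance rather than the full modified Hausdorff distance:

```python
from math import hypot

def nearest(p, line):
    # distance from point p to the nearest vertex of the line
    return min(hypot(p[0] - q[0], p[1] - q[1]) for q in line)

def select_counterpart(candidates, h, flow_acc, r, a):
    """Filter candidates by catch radius r and flow accumulation threshold a,
    then return the candidate closest to the reference line h on average."""
    valid = []
    for s in candidates:
        if hypot(s[0][0] - h[0][0], s[0][1] - h[0][1]) > r:      # start neighborhood
            continue
        if hypot(s[-1][0] - h[-1][0], s[-1][1] - h[-1][1]) > r:  # end neighborhood
            continue
        if max(nearest(p, h) for p in s) > r:                    # corridor constraint
            continue
        if any(flow_acc.get(p, 0) < a for p in s):               # drainage membership
            continue
        valid.append(s)
    if not valid:
        return None   # fall back to the least-cost strategy in this case
    return min(valid, key=lambda s: sum(nearest(p, h) for p in s) / len(s))
```

Returning None corresponds to the situation discussed next, where no flowline counterpart can be traced and the least-cost strategy takes over.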
The technical details of flowline counterpart extraction can be found in Appendix A. In particular, Algorithm A2 contains the pseudocode which implements the developed method. An example of one extracted flowline counterpart stream is shown in Figure 6 (parameter values and the data are the same as in the Results section). This figure additionally highlights the DEM pixels belonging to the drainage network defined by the first condition in (2). The variety of spatial data produces situations in which flowline counterpart streams cannot be traced. Among the common cases, an inverse direction of a reference line and an incorrect flow distribution on the DEM surface caused by artificial depressions should be noted. Also, an unfortunate selection of the r and a parameters, or using a strong or regular candidate constraint instead of the weak one, may prohibit obtaining the flowline counterpart. To overcome this problem, our method includes an alternative least-cost path strategy explained below. Let h = h(t), t ∈ [0, 1] be a parametric description of the reference hydrographic line with h(0) as the start point and h(1) as the end point. Then the least-cost counterpart stream c is found by tracing the candidate s which minimizes the following functional:

J(s) = Σ_{p∈s} W(p),   (3)

under the conditions s(0) = h(0) and s(1) = h(1), where W(p) is the offstream penalty function defined as:

W(p) = 1 if F(p) ≥ a;   W(p) = w · Ẑ(p) if F(p) < a.   (4)

The constant w ≥ 1 in Equation (4), called the offstream penalty weight, is set as one of the conflation parameters. It increases the cost of the path when it passes through pixels that do not belong to the drainage network on the DEM (F(p) < a).
The offstream penalty function also prioritizes the paths with similar or lower elevations by multiplying w by the relative elevation Ẑ(p) in Equation (4). This variable is calculated as the vertical offset above the minimum elevation on the DEM plus 1 (Ẑ(p) = Z(p) − min(Z) + 1); the 1 is added to ensure that the pixel with Z(p) = min(Z) does not automatically receive zero cost.
In comparison to the flowline counterparts, least-cost counterparts start and end exactly at the pixels that contain the specified points, but are considered to be of a lower quality, since they are allowed to pass through DEM pixels that do not belong to the drainage network. The technical details of least-cost counterpart extraction can be found in Appendix A. In particular, Algorithm A3 contains the pseudocode which implements the developed method. An example of the least-cost counterpart stream is shown in Figure 7. Alongside the reference line and its counterpart, this figure contains the cost surface visualized with a stretched color gradient. Pixels belonging to the drainage network have a cost value equal to 1 and are assigned the darkest color. The cost rapidly increases with the distance from the reference line and gradually decreases with elevation. This forces the counterpart stream to be as close as possible to the reference line and to seek the path that goes downslope.
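The least-cost tracing itself is an ordinary shortest-path search over pixel centers with the offstream penalty of Equation (4) as the step cost. The sketch below uses Dijkstra's algorithm on an 8-connected grid; it is illustrative only (the corridor restriction and the details of Algorithm A3 are omitted), with elevation and flow accumulation given as hypothetical dicts keyed by pixel coordinates:

```python
import heapq

def least_cost_counterpart(start, end, elevation, flow_acc, a, w):
    """Dijkstra search where stepping onto pixel q costs W(q):
    1 on the drainage network (flow_acc >= a), w * (Z(q) - min(Z) + 1) off it."""
    zmin = min(elevation.values())
    def W(q):
        if flow_acc.get(q, 0) >= a:
            return 1.0
        return w * (elevation[q] - zmin + 1.0)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, p = heapq.heappop(pq)
        if p == end:
            break
        if d > dist[p]:
            continue  # stale queue entry
        for dx, dy in moves:
            q = (p[0] + dx, p[1] + dy)
            if q not in elevation:
                continue
            nd = d + W(q)
            if nd < dist.get(q, float('inf')):
                dist[q], prev[q] = nd, p
                heapq.heappush(pq, (nd, q))
    # reconstruct the pixel path from end back to start
    path, p = [end], end
    while p != start:
        p = prev[p]
        path.append(p)
    return path[::-1]
```

Because offstream pixels cost at least w, the search clings to the drainage network whenever it can and leaves it only where no on-network route connects the fixed endpoints.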

Multiple Reference Lines
When multiple reference hydrographic lines are given as the input, additional efforts must be undertaken to preserve their topological relations. In particular, each confluence and bifurcation junction must be represented as a similar counterpart junction between the resulting counterpart streams. Let h_1 and h_2 be the reference hydrographic lines, of which h_1 is superordinate and h_2 is subordinate. Also let c_1 and c_2 be their corresponding counterparts, and s_2 be some counterpart candidate for h_2. Both counterparts and candidates consist of points which are essentially the centers of the pixels through which these lines were traced. Then the following five topological rules represented in Figure 8 are applied: Topological rule 1. If h_2 is tributary to h_1 at some confluence junction γ, then the end neighborhood of s_2 must be centered at the point v on c_1 which is closest to γ. For the flowline candidate this means that the condition d(s_2(1), h_2(1)) ≤ r transforms into d(s_2(1), v) ≤ r. If the least-cost method is used for tracing, then the condition s_2(1) = h_2(1) transforms into s_2(1) = v. The resulting counterpart junction is labeled as γ in Figure 8a.
Topological rule 2. If h_2 is distributary from h_1 at some bifurcation junction λ, then the start neighborhood of s_2 must be centered at the point u on c_1 which is closest to λ. For the flowline candidate it means that the condition d(s_2(0), h_2(0)) ≤ r transforms into d(s_2(0), u) ≤ r. If the least-cost method is used for tracing, then the condition s_2(0) = h_2(0) transforms into s_2(0) = u. The resulting counterpart junction is labeled as λ in Figure 8b.
Topological rule 3. If c_2 is a flowline and does not have common points with c_1, then it should be extended to reach c_1 exactly using the least-cost approach. Figure 8c illustrates the case in which h_2 is tributary to h_1, but a similar rule applies when h_2 is distributary from h_1. The resulting counterpart junction is labeled as γ in Figure 8c.
Topological rule 4. If c_2 = {p_k, k = 1, ..., |c_2|} is tributary and has more than one common point with c_1, then it should be replaced with c_2 = {p_k ∈ c_2 | k = 1, ..., k_min}, where k_min is the smallest index of a common point. The resulting counterpart junction is labeled as γ in Figure 8d.
Topological rule 5. If c_2 = {p_k, k = 1, ..., |c_2|} is distributary and has more than one common point with c_1, then it should be replaced with c_2 = {p_k ∈ c_2 | k = k_max, ..., |c_2|}, where k_max is the largest index of a common point. The resulting counterpart junction is labeled as λ in Figure 8e.
Applied in the described sequence, these rules ensure that the resulting counterpart streams comprise a network with a topological structure similar to that of the reference hydrographic lines. The technical details of applying these rules are explained in Appendix A. In particular, Algorithm A1 contains the pseudocode of the main program, which applies topological rules 1 and 2, then executes tracing of the counterpart stream, and finally applies topological rules 3-5. Figure 9 illustrates how topological rules 1 and 4 act in practice. The counterpart for the superordinate reference line is traced first. Since this counterpart does not pass through the reference confluence junction, it is not reasonable to seek a subordinate counterpart that ends at this junction. Instead, the end neighborhood of this line moves to the closest point on the superordinate counterpart, and the counterpart stream corridor extends accordingly. When the least-cost subordinate counterpart is traced, it enters the superordinate counterpart and shares a short section with it until reaching the endpoint. This shared section is trimmed to remove the overlap and to obtain the final counterpart junction.
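Rules 4 and 5 amount to a simple trim of the shared pixel section. A minimal sketch (with `trim_overlap` as a hypothetical helper, not the paper's Algorithm A1) could look like:

```python
def trim_overlap(c2, c1, tributary=True):
    """Sketch of topological rules 4-5: trim a counterpart c2 that shares
    more than one point with its superordinate counterpart c1.

    c1, c2 are lists of (row, col) pixel centers. For a tributary, keep
    points up to the first common point (rule 4); for a distributary,
    keep points from the last common point onward (rule 5).
    """
    common = set(c1)
    shared = [i for i, p in enumerate(c2) if p in common]
    if len(shared) <= 1:
        return c2                    # at most one common point: no trim
    if tributary:
        return c2[: shared[0] + 1]   # rule 4: cut after k_min
    return c2[shared[-1]:]           # rule 5: cut before k_max
```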

Generation of Rubbersheet Links
Rubbersheet links are generated between each counterpart stream and its reference line to act as local force vectors that guide the rubbersheeting transformation of elevation data. Since each counterpart stream is extracted from the DEM, its vertices are separated by a distance equal to the DEM pixel size R or √2·R. To ensure a regular distribution of links, the vertices of each reference line are preliminarily densified so that the distance between them is no larger than R. Next, the links are generated as line segments between the vertices of the counterpart stream and the nearest vertices of the reference line. The technical details of this process are explained in Appendix A. In particular, Algorithm A4 contains the pseudocode which implements the developed method. Rubbersheet links generated for a fragment of the counterpart stream from Figure 7 are shown in Figure 10.
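The densify-then-pair procedure can be sketched as follows; the helper names are ours and the pairwise-distance search is a stand-in for the nearest-vertex query of Algorithm A4:

```python
import numpy as np

def densify(line, step):
    """Resample a polyline ((N, 2) array) so that consecutive vertices
    are no farther apart than step (the DEM pixel size R)."""
    out = [line[0]]
    for a, b in zip(line[:-1], line[1:]):
        n = max(int(np.ceil(np.linalg.norm(b - a) / step)), 1)
        out.extend(a + (b - a) * i / n for i in range(1, n + 1))
    return np.array(out)

def make_links(counterpart, reference, step):
    """Connect each counterpart vertex to the nearest vertex of the
    densified reference line, producing rubbersheet link segments."""
    cp = np.asarray(counterpart, float)
    ref = densify(np.asarray(reference, float), step)
    # Pairwise distances: rows are counterpart vertices, columns are
    # densified reference vertices.
    d = np.linalg.norm(cp[:, None, :] - ref[None, :, :], axis=-1)
    return [(tuple(c), tuple(ref[j])) for c, j in zip(cp, d.argmin(axis=1))]
```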

Extraction of Vector Elevation Data
The subsequent explanations are supplemented by Figures 11 and 12, which illustrate the cases of generalized and non-generalized output. An example fragment of the source raster DEM with one reference line overlaid is presented in Figures 11a and 12a.
To perform DEM rubbersheeting based on the generated links, the source elevation data must be represented as vector features. This requirement avoids any dependency of the processing workflow on the actual organization of the data, be it in the form of a raster DEM, a triangulated DEM, structural lines, contours or point elevations. Therefore, at the current stage of processing, if no generalization is required, raster DEM nodes (pixel centers) may be converted into vector points. The corresponding point-based representation is shown in Figure 11b. A different approach to DEM vectorization should be taken when a DEM has to be generalized with preservation of the valleys corresponding to the independently generalized hydrographic lines. To satisfy this requirement, we can use one of the drainage-constraining DEM generalization methods [45][46][47][48][49][50][51]. These methods have some differences, but all of them include the extraction of primary streams that are preserved in the generalized DEM surface. Therefore, to guarantee that the resulting DEM can be conflated with the reference hydrographic lines, we must ensure that their counterpart streams are included in the set of primary streams.
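The pixel-center conversion can be sketched in a few lines of numpy; the helper name and the NaN-as-NODATA convention are assumptions of this sketch:

```python
import numpy as np

def raster_to_points(dem, x0, y0, pixel):
    """Sketch: convert raster DEM nodes (pixel centers) to vector points.

    (x0, y0) is the top-left corner of the raster; returns an (N, 3)
    array of (x, y, z) rows, skipping NODATA (NaN) pixels.
    """
    rows, cols = dem.shape
    cx = x0 + (np.arange(cols) + 0.5) * pixel   # center x per column
    cy = y0 - (np.arange(rows) + 0.5) * pixel   # center y per row
    xx, yy = np.meshgrid(cx, cy)
    pts = np.column_stack([xx.ravel(), yy.ravel(), dem.ravel()])
    return pts[~np.isnan(pts[:, 2])]
```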
In particular, one of the abovementioned methods [51] extracts three types of structural lines based on the flow accumulation and the stream length criteria:
1. Primary streams.
2. Watersheds of primary streams.
3. Watersheds of secondary streams (which are direct tributaries of primary streams).
These structural lines were extracted from the example DEM fragment for the generalized case study and are shown in Figure 12b.

Rubbersheeting
With the links generated and the elevation data represented by vector features, a standard rubbersheeting approach can be used to conflate the elevation data with the reference lines [2,3]. In comparison to affine, projective or polynomial transformations, this method is local, which allows precise adjustment of the DEM surface. However, since rubbersheeting warps the space not only along rubbersheet links but also in their vicinity, it is desirable to have a means of limiting the area of distortion. We implement this by reconstructing the conflation area A as a buffer of the polygon enclosed by the reference lines, counterpart streams and rubbersheet links. The catch radius r is used as the default offset distance, but can be parameterized to another value. The conflation area is shaded in transparent gray in Figures 11 and 12. Points generated along the boundary of this area define so-called identity links L_I. Identity links are equally distributed with a separation distance equal to the DEM resolution. These points, which are depicted as white dots in Figures 11 and 12, establish locations that remain fixed during the rubbersheeting process and prevent the transformation from propagating outside the conflation area.
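The role of identity links can be illustrated with a deliberately simplified displacement model. The cited references [2,3] use a triangulation-based piecewise transformation; the inverse-distance-weighted field below is only a stand-in for it, chosen because it makes the pinning effect of zero-length identity links visible in a few lines:

```python
import numpy as np

def rubbersheet_idw(points, links, power=2.0, eps=1e-9):
    """Simplified rubbersheeting sketch (inverse-distance weighting, NOT
    the piecewise method of the cited references).

    points: (N, 2) array; links: list of ((sx, sy), (tx, ty)) pairs.
    Zero-length identity links contribute zero displacement, so points
    near them stay (almost) fixed.
    """
    src = np.array([l[0] for l in links], float)
    vec = np.array([l[1] for l in links], float) - src   # force vectors
    out = np.array(points, float)
    for i, p in enumerate(out):
        # Weight each link by inverse squared distance to the point.
        w = 1.0 / (np.sum((src - p) ** 2, axis=1) + eps) ** (power / 2.0)
        out[i] = p + (w[:, None] * vec).sum(axis=0) / w.sum()
    return out
```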
Examples of rubbersheeted elevation points and structural lines are represented in Figures 11c and 12c, respectively. Rubbersheeting is performed at once using the full set of generated links. It is important to note that no rubbersheeting is performed outside the conflation area, so the elevation data features there remain at their original locations.

Reconstruction of Conflated DEM
If the input digital elevation model is in raster form, then the output of the conflation process can be derived by triangulating the rubbersheeted elevation data and rasterizing the obtained TIN surface. Note that if the output raster dimensions (at least pixel size and offset) are the same as those of the input raster DEM, then both the linear and the natural neighbor TIN rasterization methods will produce the same elevation values at raster pixels located outside the conflation area. This is guaranteed because the centers of these pixels coincide with the elevation points fed into the rubbersheeting algorithm.
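The linear TIN rasterization step can be sketched with scipy's Delaunay-based interpolator; the helper name is ours. The test case below exercises exactly the guarantee stated above: pixels whose centers coincide with input points reproduce their z values exactly.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def rasterize_tin(points, x0, y0, pixel, shape):
    """Sketch: rebuild a raster from rubbersheeted (x, y, z) points by
    linear interpolation on their Delaunay triangulation (a TIN).

    (x0, y0) is the top-left corner of the output grid."""
    rows, cols = shape
    cx = x0 + (np.arange(cols) + 0.5) * pixel
    cy = y0 - (np.arange(rows) + 0.5) * pixel
    xx, yy = np.meshgrid(cx, cy)
    pts = np.asarray(points, float)
    return LinearNDInterpolator(pts[:, :2], pts[:, 2])(xx, yy)
```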
Triangulations of conflated points and structural lines, as well as the resulting raster DEMs are represented in Figures 11d,e and 12d,e, respectively. Misalignment between the DEM and the reference lines is removed in both cases, while in the generalization case the presence of the counterpart stream is additionally enforced in the resulting surface.

Post-Processing
Generally, the previous stage finishes the horizontal conflation process. However, both cartographic and analytical purposes require that the DEM and the reference hydrographic lines are also aligned in the vertical dimension. From a cartographic point of view, this means that elevations along each hydrographic line must decrease monotonically, i.e., there should be no depressions or hills on its pathway. Otherwise, the cartographic DEM representation will be unrealistic and aesthetically unpleasing. In the current implementation we perform a simple procedure that carves the positive terrain features existing in the conflated DEM along each reference hydrographic line using linear interpolation. The carved DEM can additionally be widened along the reference lines to ensure the visibility of the corresponding valleys in the DEM surface [45,51]. The final results obtained by the carving and widening procedures are represented in Figures 11f and 12f.
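The carving idea can be sketched on a 1-D elevation profile sampled along a hydrographic line; the helper name and the envelope construction are our own simplification of the procedure described above:

```python
import numpy as np

def carve(profile):
    """Sketch of the carving step: remove positive terrain features from
    an elevation profile sampled along a hydrographic line (source to
    outlet) by linearly interpolating across every section that rises
    above the monotonically decreasing envelope."""
    z = np.asarray(profile, float)
    # A point lies on the decreasing envelope if it equals the running
    # minimum of all elevations seen so far.
    keep = z <= np.minimum.accumulate(z)
    idx = np.flatnonzero(keep)
    return np.interp(np.arange(z.size), idx, z[idx])
```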
It should be noted that if the DEM is intended for hydrological analysis, the constraints must be stronger. In addition to the monotonic elevation decrease, the hydrographic lines must be fully represented in the drainage network of the DEM. Since this task is usually solved by a stream burning operation, the same procedure can be applied to the conflated DEM. The potential advantage of burning the conflated DEM is that it will not create new negative terrain features in the DEM surface, but will instead redirect the flow towards existing ones which are already aligned with the reference lines.

General Workflow
The entire workflow for a raster DEM as the input data is represented in Figure 13. It handles both the non-generalized and the generalized output cases through the Extract vector elevation data operation, which can be adapted to the task. For the sake of brevity, some intermediate operators and data are omitted from the scheme. In particular, a preliminary removal of artificial depressions is needed to obtain high-quality flow direction and flow accumulation rasters from the DEM. Also, a triangulated DEM and a raster DEM are the hidden results of the Triangulate and Rasterize operations, respectively.

Results
We have implemented the method in the Python programming language using the numpy [61] and arcpy [62] modules. The software is open-access and can be downloaded from a GitHub repository [63]. The implementation was tested on two widely used open-access spatial datasets: the GEBCO_2019 raster DEM [30] and the Natural Earth vector database [31]. The GEBCO_2019 Grid is the latest global bathymetric product released by the General Bathymetric Chart of the Oceans (GEBCO) in 2019 and has been developed through the Nippon Foundation-GEBCO Seabed 2030 Project. Its ground resolution is 15 arc-seconds, which equates to approximately 500 m at the equator. The Natural Earth dataset contains spatial data for mapping at 1:10,000,000, 1:50,000,000 and 1:110,000,000 scales. For our experiment we used hydrographic data from the 1:10,000,000 level of detail as the set of reference lines (version 4.1.0). These lines are much more generalized than the GEBCO model, which creates a case of distinctly misaligned data and is useful for testing the method.
Both datasets were clipped to the extent of a territory in Central Europe located mainly in the basin of the Donau (Danube) river and represented in Figure 14. The size of the clipped DEM is 1358 × 1007 pixels, while the clipped river network contains 28 features. To enable processing in metric units, both datasets were transformed into the equidistant cylindrical (plate carrée) projection, which keeps the raster geometry unwarped. High-quality D8 flow direction and flow accumulation rasters were derived after processing the source DEM with a depression breaching procedure [64]. To control the conflation process, we used the catch radius r = 6000 m, the minimum flow accumulation a = 10 and the offstream penalty weight w = 30 as optimal parameter values found through experimentation. Weak candidates constrained by the directed Hausdorff distance d_H were considered by default.
Unique identifiers of the reference hydrographic lines are labeled near each feature in Figure 14. Two features are of particular interest here: the distributary line with ID = 9, which forms a braided section of the Donau river near Budapest, and the line with ID = 36, which is in fact two rivers from the Elbe and Donau basins wrongly joined into one feature. We deliberately did not correct this error in order to test the robustness of our approach in non-standard situations.
The map in the bottom part of Figure 14 represents the raw GEBCO_2019 DEM with counterpart streams overlaid. Since the Natural Earth data is already digitized according to the Hack order for cartographic purposes (with main streams being contiguous from the source to the outlet), we did not modify the initial geometry, but calculated stream orders and iteration numbers according to the pre-processing algorithm described in Section 3.2. The derived counterpart streams comprise a topologically correct network with a structure identical to the network of reference lines (the end of each stream is indicated by an arrow). Two types of counterparts are indicated using a special symbol in Figure 14: distributary least-cost streams (ID = 9, 34) and an extended flowline stream (ID = 17), for which compliance with topological rules 2 and 3 was carefully ensured by the algorithm.
Table 2 summarizes the results of the extraction of counterpart streams. The first six columns are generated during the hydrographic line ordering stage and correspond to Table 1. The last five columns reveal the type and similarity of the resulting counterpart streams to their reference prototypes. For each stream the directed Hausdorff distance, the Hausdorff distance d_H and the Fréchet distance d_F were calculated, and the quality of the counterpart was estimated in terms of the classes introduced in Section 3.3.1. Even though the weak constraint was used to filter candidates, all streams except one fall into the strong class limited by the Fréchet distance, and the values of the three distances are mostly quite close to each other. This result supports our decision to use the weak constraint as the default. Although similarity measures other than the Fréchet distance do not respect the continuity of lines, their application can be justified by practical considerations: weaker constraints can be satisfied by simpler and faster processing procedures, which is important in the case of large datasets.
The fact that the counterpart stream with ID = 15 does not satisfy the regular and strong constraints is caused by the application of topological rule 4: this counterpart reached the superordinate one outside the end neighborhood, and its final overlapping section was trimmed.
The top image in Figure 14 is a map with hydrography data overlaid upon the conflated DEM, which was generalized by the structural method [51] to match the level of detail of the hydrographic data. During generalization, the minimal length of a primary stream was set to 45 DEM pixels, while for a secondary stream it was set to 15 DEM pixels. The resulting map in Figure 14 clearly demonstrates that the generalized DEM is perfectly aligned with the reference hydrographic lines, and that a high-quality cartographic output can be achieved using our method. The map also demonstrates that the abovementioned incorrect line with ID = 36 poses no problem to the method, and the resulting DEM is aligned with this reference line.
To illustrate the effectiveness of our method in detail, we selected four regions conventionally named after the largest city located inside each of them. These regions are indicated by rectangles on the bottom map in Figure 14. The first two regions, I. Linz and II. Prague, are represented in Figure 15, while the remaining two, III. Bratislava and IV. Budapest, are shown in Figure 16. Both figures are arranged in a similar layout, where the top image shows the source DEM, the middle image shows the conflated DEM, and the bottom image shows the conflated generalized DEM. Here the minimal length of a primary stream during generalization was set to 9 DEM pixels, while for a secondary stream it was set to 3 DEM pixels. The reference hydrographic lines are the same in all images. These cartographic representations demonstrate the effectiveness of the developed method for both non-generalized and generalized output. While in the case of the source DEM the misalignment with the reference lines is evident, this problem is effectively solved by the conflation procedure. The resulting DEM can be used to produce high-quality cartographic images at different scales. Finally, we numerically estimated the alignment between the DEM and the reference lines using Cohen's kappa index of agreement [65].
For this we extracted a drainage network from both the source and the conflated DEM using a = 10 as the minimum flow accumulation threshold. The resulting streams were expanded by one pixel in both cases to compensate for the unavoidable discrepancy between the TIN and the resulting raster elevations conditioned by the limited pixel size. Then the reference hydrographic lines were rasterized on the same raster grid and the overlap between the rasterized lines and the expanded streams was analyzed. The resulting kappa statistic depends on the fraction of reference line pixels contained in the expanded set of drainage network pixels. Table 3 includes the identifier of each stream, the values of the kappa index of agreement κ_0 and its asymptotic standard error (ASE) σ_0 for the source DEM, and the similar statistics κ_c and σ_c for the conflated DEM. The resulting values show that our method systematically improves the alignment between the reference hydrographic lines and the drainage network implicitly represented in the DEM surface. The average value of the index of agreement increased from 0.56 (moderate) to 0.98 (almost perfect) as a result of conflation. Thus, both the visual and the numerical assessment of the results indicate that the goal of our research has been achieved.
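For two binary rasters, Cohen's kappa can be computed as follows (the helper name is ours; the paper's Table 3 also reports asymptotic standard errors, which this sketch omits):

```python
import numpy as np

def cohens_kappa(mask_a, mask_b):
    """Sketch: Cohen's kappa between two boolean rasters, e.g. the
    rasterized reference lines and the expanded drainage-network pixels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the agreement expected by chance for two binary layers.
    """
    a, b = np.asarray(mask_a).ravel(), np.asarray(mask_b).ravel()
    p_o = np.mean(a == b)                       # observed agreement
    p_a, p_b = a.mean(), b.mean()               # marginal "true" rates
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)     # chance agreement
    return (p_o - p_e) / (1 - p_e)
```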

Discussion
The current section provides information which is necessary to facilitate the practical application of the developed method and to understand its limitations. In particular, we analyze the method from four perspectives: parameterization (how to select the parameters), processing time (how long it takes to process the data), displacement and accuracy (how distorted the resulting DEM is and how to minimize the distortion), and robustness (how effectively the method performs on different data). All conclusions are based on the elevation and hydrography data used in the Results section.
Table 3. Statistics of agreement between the reference hydrographic lines and the drainage network extracted from DEM.

Parameterization
The catch radius r limits the maximum spatial deviation of a counterpart stream from its reference prototype. The minimum possible value for the catch radius is equal to the DEM pixel size R. In this case counterpart streams will follow the DEM pixels exactly under the reference lines. Values of r smaller than R will prohibit the extraction of counterpart streams, except in the artificial case where a reference line follows exactly the centers of DEM pixels. Larger values of r guarantee that a counterpart stream will be extracted, but finding the flowline counterpart can be problematic if the value of r is set close to R.
Since the spatial precision of counterpart extraction is limited by the pixel size of the DEM, we recommend setting the catch radius as R multiplied by some positive integer value k: r = k·R. In our experiment we used r = 6000 m, which for R = 500 m corresponds to k = 12. The default value in the developed conflation software is set to k = 10. However, this value should be refined by the user based on a visual assessment of the actual misalignment between the DEM and the reference hydrographic lines. An excessively large k may lead to the identification of unreliable counterparts. Since r also defines the extent of the conflation area, increasing its value will also lead to the warping of a larger part of the DEM surface, which may be undesirable.
The minimum flow accumulation a defines the lower threshold for the possible magnitudes of counterpart streams. The higher the value of a, the more significant the paths in the drainage network that will be identified as counterpart streams. Since counterpart streams must start and end within the neighborhoods of the start and end points of a reference line, setting a to a large value may lead to situations where no drainage network pixels appear inside a neighborhood, in which case only a least-cost counterpart can be found. Therefore, if the flow accumulation raster represents the number of pixels drained upstream, the following inequality can be used as a rule of thumb when selecting the value of a: a ≤ k. In our experiment we used a = 10, which met this recommendation and worked nicely to extract a dense and plausible drainage network suitable for conflation purposes.
The offstream penalty weight w defines how strictly the least-cost counterpart will follow the drainage network defined by a. The smaller the value of w, the easier it is for the algorithm to "jump" from one stream to another in pursuit of the shortest path. Since such behavior is generally illegal for a drainage path, the value of w should be set large to increase the penalty. Based on practical experience, we recommend setting w as 10 multiplied by some positive integer value m: w = 10·m. In our experiment we used w = 30, which corresponds to m = 3. The same value is set as the default in the developed conflation software. Lower values increase the frequency of situations where a counterpart stream follows the reference line exactly or jumps from one drainage path to another, which leads to geographically unreliable results.
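The three rules of thumb above can be bundled into a tiny helper (a convenience of this sketch, not part of the published software):

```python
def suggest_parameters(R, k=10, m=3):
    """Combine the parameterization rules of thumb: catch radius
    r = k * R, minimum flow accumulation a <= k (the upper bound is
    returned), and offstream penalty weight w = 10 * m."""
    return {"r": k * R, "a_max": k, "w": 10 * m}
```

For the GEBCO_2019 experiment (R = 500 m, k = 12, m = 3) this reproduces the values used in the paper: r = 6000 m, a ≤ 12 and w = 30.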

Processing Time
The total processing time to conflate the example GEBCO_2019 DEM used in the Results section (without generalization) according to the workflow in Figure 13 was approximately 21.5 min. Its structure is represented in Figure 17a. The process is initialized by the ordering of lines and the calculation of everything that is needed to derive the flow accumulation raster (including depression breaching and flow direction), which took approximately 1.5 min. The following stage, extraction of counterpart streams, is the most computationally expensive and took 17 min (79% of the total processing time). The remaining stages of conflation up to post-processing took 2.5 min, and the post-processing was finished in 0.5 min.
Execution of the counterpart tracing functions themselves took about 9 min, while the remaining 8 min of counterpart extraction were taken by supplementary actions, including quality assessment. Flowline counterpart candidate tracing (which is executed obligatorily for each reference line) performed much faster in the current implementation than least-cost counterpart tracing (33 s and 8 min 29 s in total, respectively). It is worth noting that flowline_counterpart() was programmed in pure Python, while leastcost_counterpart() consists of calls to the compiled arcpy map algebra functions that have some execution overhead.
The duration of tracing the flowline counterpart streams showed a power-law dependency on the value of the catch radius r. The change in execution time of the flowline_counterpart() function for the reference line with ID = 13 is represented in Figure 17b (r is scaled to kilometers for convenience). At the same time, the duration of least-cost counterpart tracing showed no dependency on the catch radius. The process of counterpart stream extraction is easily parallelized and can be rewritten in a high-speed compiled language like C++ to dramatically reduce the overall processing time.

Displacement and Accuracy
Since DEM conflation involves rubbersheeting of elevation data, the question arises of how much surface displacement is observed in the horizontal and vertical dimensions as a result. To answer this question, we selected the elevation points located inside the conflation area (~29% of all points) and calculated the actual (dx, dy, dz) displacement vectors between their positions before and after rubbersheeting. The components (dx, dy) define the magnitude of the horizontal displacement dxy = √(dx² + dy²). The third component dz is the vertical displacement, which is the difference between the z value sampled by the rubbersheeted elevation point from the resulting conflated DEM and the initial z value of that elevation point extracted from the source DEM. Therefore, the vertical displacement is not measured at a fixed location in space, but keeps a record of the vertical movement of the sampled surface point. Table 4 and Figure 18 present the displacement statistics in tabular and graphical form. Since the actual displacements are induced by rubbersheet links, we calculated the (dx, dy) and dxy statistics for these vectors too. Table 4 shows that the magnitude of the displacement vector does not exceed the DEM pixel size on average, with both the mean and the median of dxy smaller than 500 m. Displacement vectors are generally shorter than rubbersheet links, which is expected, since the actual force applied to each point is a superposition of multiple rubbersheet links, and the magnitude of the force decreases to zero near the identity links.
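The displacement statistics above can be reproduced from the point coordinates before and after rubbersheeting; the helper name and the selection of summary statistics are ours:

```python
import numpy as np

def displacement_stats(before, after):
    """Sketch: summarize the displacement of elevation points given
    (N, 3) arrays of (x, y, z) coordinates before and after
    rubbersheeting."""
    d = np.asarray(after, float) - np.asarray(before, float)
    dxy = np.hypot(d[:, 0], d[:, 1])          # dxy = sqrt(dx^2 + dy^2)
    q1, q3 = np.percentile(d[:, 2], [25, 75])
    return {"mean_dxy": dxy.mean(),
            "median_dxy": np.median(dxy),
            "p95_dxy": np.percentile(dxy, 95),
            "iqr_dz": q3 - q1}                # vertical IQR = Q3 - Q1
```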
At the same time, there is a significant asymmetry in the displacement, which is reflected in the dx and dy components of the vectors. While dx is slightly negative and close to zero, dy has a pronounced positive value, which means that in general the rubbersheeted elevation points are moved in the positive direction of the Y axis. This fact is clearly illustrated by the joint probability density plots for (dx, dy) in Figures 18a,b. A Gaussian kernel with bandwidth σ = 750 m was used for the estimation. The density surface for the horizontal displacement is more peaked around zero, but both surfaces have a long tail in the Y direction. This observation reveals a systematic shift between the DEM and the hydrography, which could be reduced by applying a global (affine) transformation of the elevation data before rubbersheeting.
Though, according to Table 4, the largest magnitude for both rubbersheet links and displacement vectors is close to the catch radius (6000 m), the interquartile range (IQR = Q_3 − Q_1) of the displacement vector magnitude is very compact and close to the DEM pixel size. The empirical distribution function of dxy (Figure 18c) shows that 66% of all elevation points were moved a distance equal to or smaller than the DEM pixel size, while 95% of all elevation points were not moved farther than 1480 m, which is approximately the distance covered by three DEM pixels.
The source of the vertical displacement is the procedure during which the resulting raster is reconstructed by interpolation on the TIN surface. Since the resulting pixel centers do not coincide with the locations of the rubbersheeted points inside the conflation area, the sampled values differ from those extracted from the original DEM. According to Table 4, both the mean and the median values of dz are very close to zero, while the IQR defined by Q_1 = −1.52 and Q_3 = 1.76 is so narrow that it can be considered insignificant for a DEM with such a large pixel size.
The probability density plot for dz in Figure 18d confirms that most of the vertical displacement values are concentrated within a few meters of zero. The effective range of dz, defined by the values no further than 1.5·IQR from Q_1 and Q_3, is about (−6.5, 6.5) and is indicated by the boxplot whiskers in the bottom part of Figure 18d. The empirical distribution function of |dz| (Figure 18e) shows that 95% of all elevation points were not displaced in the vertical direction farther than 27.42 m in absolute value, which is also quite acceptable for a coarse-resolution DEM covering mountainous areas. To conclude the analysis of the displacement statistics, we can state that though the rubbersheeting process can distort the surface significantly, the actual displacements turned out to be very moderate, which may keep the DEM suitable not only for cartographic purposes, but for analytical applications too.
A discussion of the analytical applications of conflated DEMs requires us to switch from the impartial notion of displacement to the evaluative notion of accuracy. However, in the context of DEM conflation, when some parts of the DEM remain fixed and others are displaced, it is not clear how the change in the accuracy of the DEM should be estimated. Whether the displacement provides a robust estimate of the accuracy change remains questionable. The estimated change in accuracy may also depend on which dataset is considered to be of better quality: the elevation or the hydrography. At the same time, as we have seen from our own experience, there can be a systematic displacement between the DEM and the hydrographic lines. In such cases, adjusting the DEM only locally will decrease its accuracy due to unnecessary distortions, whether or not it is less accurate than the hydrography.
The developed method is greedy in the sense that it seeks an exact superimposition of the reference lines and their counterparts, and therefore allows any displacement within the defined catch radius. For analytical purposes, it could be desirable to additionally limit the amount of displacement. That can be achieved either by making the catch radius smaller or by constraining the maximum length of a rubbersheet link. In the latter case, all links longer than the defined threshold can be forcibly shortened. An adaptive change of the conflation area size depending on the local deviation of the counterpart stream from a reference line may also improve the quality of the result and reduce unnecessary distortions. It is also clear that rubbersheeting of elevation data must influence the values of the geomorphometric and hydrological characteristics obtained as derivatives of the conflated DEM. An analysis of such changes is out of the scope of the current research, which is focused on cartographic applications of DEM conflation. However, such an investigation must be one of the primary directions of the further development and assessment of the limitations of the proposed method.

Robustness
Robustness can be understood as the ability of the method to perform well in different situations and to resist the complexities that may arise from the combination of input data and selected parameters. The properties of the input data affecting the performance of the method can be considered from quantitative and qualitative perspectives.
We relate the quantitative aspect of input data to its dimensionality, which includes spatial domain, level of detail and size.
The spatial domain defines the subset of space covered by a spatial dataset. For reference hydrographic lines this domain is equal to the geometry of the lines, while for a DEM it comprises all pixels with data values (excluding NODATA pixels). Since conflation can be performed only in areas that contain both elevation and hydrographic data, this issue is managed by cropping the hydrographic lines to the DEM domain.
The level of detail of hydrographic and elevation data can be defined as the amount of information represented in these datasets per unit area. It depends both on the spatial resolution (point density and pixel size) and on the intricacy of the line/surface representation given that resolution. In the Results section we demonstrated that a mismatch between detailed elevation and generalized hydrography is effectively handled by the developed conflation method. However, additional experiments are needed to assess the quality of the results in the opposite situation, where the hydrography is more detailed.
The size of the data affects the processing time. An obvious solution for managing large datasets is to subdivide them into parts (tiles) and process the tiles separately. Currently, we do not have such a solution, and this performance issue will be one of the most important directions of future research. As we expect, the problematic part of the implementation will be the resolution of the boundary effects that appear due to the independent rubbersheeting of neighboring DEM tiles.
A qualitative aspect of input data relates to its topological correctness, and is mainly inherent to the hydrographic component of DEM conflation. The developed conflation method does not impose any restrictions on input hydrographic lines. If the lines are incorrectly digitized or do not comprise a topologically correct network, the method will still work. This kind of robustness is achieved by the resilient organization of the processing workflow. At the ordering stage, non-empty topological relations between hydrographic lines are respected, but are not required to exist. If a line is isolated (has no common points with other lines), it simply receives the 1st Hack order. At the counterpart extraction stage, the tracing process itself is separated from the application of topological rules, and is therefore also independent of the spatial relations of the input lines. A counterpart stream is always traced in the direction of the reference line, even if the line is digitized in the wrong direction. This property guarantees that rubbersheet links will be distributed in the correct order between the start and end points of a counterpart and its reference line. The remaining conflation stages before post-processing do not involve hydrographic lines at all. If the input hydrographic lines are carefully prepared, the method will perform conflation in full compliance with their structure. We can therefore say that the method takes advantage of the best properties of the input and is error-resistant.
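The behavior described above, isolated lines receiving the 1st Hack order and tributaries being subordinated to their main stems, can be sketched as follows. The network representation (a tributary-to-main-stem mapping) and the function name are assumptions for illustration; the paper's ordering stage additionally refines distributary streams in braided sections, which this sketch omits.

```python
def hack_orders(parents):
    """Assign Hack-style orders given a tributary -> main-stem mapping.

    parents -- dict mapping each line id to the id of the line it flows
               into, or None for main stems and isolated lines.
    A line that joins nothing (isolated or main stem) gets order 1;
    every tributary gets the order of its superordinate line plus one.
    """
    orders = {}

    def order_of(line_id):
        if line_id not in orders:
            parent = parents.get(line_id)
            orders[line_id] = 1 if parent is None else order_of(parent) + 1
        return orders[line_id]

    for line_id in parents:
        order_of(line_id)
    return orders
```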
At the same time, there are situations in which the current implementation of the conflation workflow will fail. Since the standard map algebra operation used to extract least-cost counterparts can respect neither the Hausdorff nor the Fréchet distance to a given line, only weak counterparts constrained by the Directed Hausdorff distance can be derived. Figure 19 provides an example of a situation where the derived counterpart is unreliable. In such configurations, the river meanders are densely packed, which, given a large catch radius, enables an invalid cost-path shortcut through a self-overlapping section of the counterpart stream corridor. As a result, a large section of the river is not reflected by the counterpart stream. To extract stronger counterparts, more specific methods developed for map matching, which are able to construct a shortest path constrained by the Fréchet distance [9,10], can be used instead. However, since the corresponding algorithms are often based on dynamic programming and recursive procedures, their computational performance also needs to be estimated for digital elevation models consisting of thousands and millions of pixels. Finally, it must be stressed that if spatial adjustment of the hydrographic lines is preferable due to the higher accuracy of the DEM, it can be easily achieved using the developed method: the rubbersheet links are flipped, and the hydrographic lines are rubbersheeted instead of the elevation data.
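The difference between the weak (directed) and the symmetric Hausdorff constraint can be demonstrated numerically with SciPy. The example data below is hypothetical: a counterpart that skips a meander loop present in the reference line.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

# Counterpart s misses a meander tip that is present in reference h.
s = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
h = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 5.0], [2.0, 0.0]])

d_sh = directed_hausdorff(s, h)[0]   # every point of s is near h
d_hs = directed_hausdorff(h, s)[0]   # the meander tip is far from s
d_sym = max(d_sh, d_hs)              # symmetric Hausdorff distance
```

Here `d_sh` is zero, so the weak constraint accepts the counterpart even though it skips a large meander, while the symmetric distance `d_sym` exposes the omission.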

Conclusions
Digital elevation models are commonly used in conjunction with hydrographic data for spatial analysis and mapping. The quality and precision of the results depend to a high degree on the spatial alignment between the corresponding datasets. At the same time, the increasing variety of spatial data sources makes misalignment between heterogeneous data a widespread problem that is usually solved by conflation. To date, the specific case of conflation between digital elevation models and hydrographic lines has not received due attention. We have developed a comprehensive solution to this problem, which includes methodology, algorithms and software.
The main insights gained during our research can be summarized as follows:
1. The conflation process can be based on the identification of the streams on the DEM surface most similar to reference hydrographic lines, called counterpart streams. Multiple measures can be used to estimate similarity and constrain counterpart candidates, including the Directed Hausdorff, the Hausdorff and the Fréchet distance, but the weakest constraint (the Directed Hausdorff) can be used by default for practical reasons.
2. To preserve the topological relations of reference hydrographic lines, they must be ordered. A proper ordering establishes an unambiguous sequence of counterpart extraction, during which the subordinate and superordinate relations reflecting the topology of the reference network can be ensured. We find Hack ordering convenient for this purpose, with refinement on distributary streams in braided river sections. This ordering also minimizes the number of counterparts to be traced.
3. A combination of flowline and least-cost approaches allows extraction of counterpart streams in a variety of cases, including non-standard ones such as braided streams and wrongly directed hydrographic lines.
4. Topological relations between ordered reference hydrographic lines can be reflected by the network of counterpart streams using a series of topological rules. These rules ensure the correct location of counterpart stream junctions.
5. Performing conflation on vectorized elevation data (points or structural lines) allows abstracting from the DEM format and applying standard rubbersheeting algorithms. Another advantage of the vector-based approach is the easy integration of the conflation stage into a structural DEM generalization workflow.
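Of the similarity measures listed in item 1, the Fréchet distance is the most expensive to evaluate. For polylines it is commonly approximated by the discrete Fréchet distance, computable by dynamic programming over vertex pairs (in the style of Eiter and Mannila). The sketch below is an illustration of that standard technique, not code from the paper.

```python
import math

def discrete_frechet(p, q):
    """Discrete Frechet distance between polylines p and q,
    computed by dynamic programming over vertex pairs."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = cost
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], cost)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], cost)
            else:
                # cheapest way to reach (i, j), forced up to cost
                ca[i][j] = max(min(ca[i - 1][j],
                                   ca[i - 1][j - 1],
                                   ca[i][j - 1]), cost)
    return ca[n - 1][m - 1]
```

The quadratic table over vertex pairs hints at why Fréchet-constrained counterpart extraction on rasters of millions of pixels needs the performance assessment mentioned in the Discussion.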
Our method effectively aligns a DEM with reference hydrographic lines, which is demonstrated on multiple cartographic examples and substantiated by a significant improvement of the agreement statistics between the reference hydrographic lines and the drainage network on the DEM surface. In comparison to existing conflation and stream burning approaches, our method does not create new terrain features in the DEM surface, but instead recognizes the existing features corresponding to hydrographic lines, which is a major advancement.
Digital elevation models conflated by our method can be used for producing high-quality maps combining elevation and hydrographic data at different scales. In future research we plan to improve the performance of the method according to the issues highlighted in the Discussion section, and to investigate possible applications of DEM conflation in hydrological DEM analysis.
Funding: This research was funded by the Russian Science Foundation (RSF), grant No. 19-77-00071.

Acknowledgments:
The author would like to thank the three anonymous reviewers, whose detailed and thoughtful comments helped to significantly improve the quality of the paper.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Algorithms for Extraction of Counterpart Streams
The pseudocode of the main program used for the extraction of counterpart streams is presented in Algorithm A1. The counterpart tracing and post-processing functions are presented in Algorithms A2 and A3, respectively. Algorithm A4 reveals the details of rubbersheet link generation. Algorithm A5 provides a brief reference to all functions used in Algorithms A1-A4, including those without pseudocode.
The main program (Algorithm A1) expects reference hydrographic lines H, unique identifiers ID, confluence identifiers CONFL, bifurcation identifiers BIFUR, elevation raster Z, flow direction raster FD, flow accumulation raster FA, minimum flow accumulation a, catch radius r and offstream penalty weight w as input parameters. Counterpart stream features C are the output of the program. The lists ID, CONFL and BIFUR are taken from Table 1, generated during the ordering stage. These lists, as well as the reference lines, are fed into the program sorted by the ITER variable from Table 1.
During the initialization stage (Algorithm A1, lines 1-5) the main program extracts the lists of start (SP) and end (EP) point coordinates of the reference lines and creates the empty list of lists of counterpart stream coordinates C_R. The supplementary variables c_e and c_s are created to store the coordinates of the downstream and upstream superordinate counterparts. Then for each (n-th) reference hydrographic line (Algorithm A1, line 6) the algorithm does the following:
1. Create the pixel neighborhoods nb_s and nb_e around the start and end points of the line.
2. If the current stream flows into a superordinate one, replace nb_e with a similar pixel neighborhood around the closest point p on the superordinate downstream counterpart c_e. Since the coordinate tuples returned by the pixel_neighborhood() function are always sorted by the distance to the central pixel, the linear search for p can be finished at the first pixel in nb_e which is also a member of c_e. This step is the implementation of topological rule 1.
3. If the current stream is distributary, replace nb_s with a similar pixel neighborhood around the closest point on the superordinate upstream counterpart (Algorithm A1, lines 15-20). This step is the implementation of topological rule 2 and uses logic similar to the previous step.
4. Calculate the Euclidean distance raster E for H[n], which is needed in steps 6 and 7 (Algorithm A1, line 21).
5. Trace a flowline counterpart stream c which satisfies the conditions in Equation (2) (Algorithm A1, line 22). See the explanations on the flowline_counterpart() function below and its representation in Algorithm A2.
6. If c is empty (no flowline counterpart candidate satisfies Equation (2)), trace a least-cost counterpart which minimizes the functional in Equation (3) (Algorithm A1, lines 23-24). See the explanations on the leastcost_counterpart() function below and its representation in Algorithm A3.
7. If c is a flowline counterpart, extend it to its superordinate counterpart (Algorithm A1, lines 25-29). See the explanations on the extend_path() function below and its representation in Algorithm A3. This step is the implementation of topological rule 3.
8. Ensure that c does not intersect its superordinate counterparts between its start and end points (Algorithm A1, line 30). See the explanations on the trim_path() function below and its representation in Algorithm A3. This step is the implementation of topological rules 4 and 5.
9. Append c to C_R (Algorithm A1, line 31).
After the cycle finishes, vector features representing the counterpart streams are built from the coordinates stored in C_R (Algorithm A1, line 32).
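The distance-sorted ordering of pixel_neighborhood() output, which makes the early-exit linear search in steps 2 and 3 valid, can be sketched as follows. This is an illustrative implementation under the assumption of a circular neighborhood in pixel units; the paper does not specify the exact neighborhood shape.

```python
def pixel_neighborhood(center, radius_px):
    """Pixel coordinates within radius_px of center, sorted by distance.

    Returned tuples are ordered from the central pixel outwards, so a
    linear search over them can stop at the first match: that match is
    guaranteed to be the closest qualifying pixel.
    """
    r0, c0 = center
    cells = []
    for r in range(r0 - radius_px, r0 + radius_px + 1):
        for c in range(c0 - radius_px, c0 + radius_px + 1):
            d2 = (r - r0) ** 2 + (c - c0) ** 2
            if d2 <= radius_px ** 2:
                cells.append((d2, (r, c)))
    cells.sort(key=lambda t: t[0])      # nearest pixels first
    return [cell for _, cell in cells]
```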
Flowline counterparts are extracted by the flowline_counterpart() function, whose pseudocode is represented in Algorithm A2. The function starts by creating the empty list c of counterpart coordinates (Algorithm A2, line 2) and initializing the lowest Modified Hausdorff distance D for the currently selected counterpart candidate with infinity (Algorithm A2, line 3). Then for each pixel p in the start neighborhood nb_s (Algorithm A2, line 4) it performs the following sequence of steps:
1. If the flow accumulation at p is larger than the minimum value a, proceed to the next step (Algorithm A2, line 5).
2. Trace a counterpart stream candidate s starting from p using the trace_stream() function (Algorithm A2, line 6). If the stream does not reach the end neighborhood nb_e, the empty set is returned. See the explanations on the trace_stream() function below and its representation in Algorithm A2.
3. If s is not empty and the Directed Hausdorff distance between s and h does not exceed r, proceed to the next step (Algorithm A2, line 7). If only regular or strong candidates are of interest, the Directed Hausdorff distance should be replaced with the Hausdorff distance d_H(s, h) or the Fréchet distance d_F(s, h), respectively, at this stage.
4. Calculate D' = d_M(s, h) (Algorithm A2, line 8). If D' < D, proceed to the next step (Algorithm A2, line 9).
5. Replace c with s (Algorithm A2, line 10) and D with D' (Algorithm A2, line 11).
After the iterations are finished, c is returned as the result (Algorithm A2, line 12). If no counterpart candidate has been found, the empty set (the initial value of c) is returned.
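The Modified Hausdorff distance d_M used above to rank candidates can be sketched in a few lines. This follows the standard Dubuisson and Jain definition (the larger of the two directed mean distances between point sets); the function name is an assumption for illustration.

```python
import math

def modified_hausdorff(a, b):
    """Modified Hausdorff distance (Dubuisson-Jain style): the larger
    of the two directed mean minimum distances between point sets."""
    def mean_min(u, v):
        # average distance from each point of u to its nearest point of v
        return sum(min(math.dist(p, q) for q in v) for p in u) / len(u)
    return max(mean_min(a, b), mean_min(b, a))
```

Unlike the classic Hausdorff distance, which takes the worst-case point, this averaged form is less sensitive to a single outlying vertex, which makes it a reasonable ranking criterion among candidates that already satisfy the catch-radius constraint.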
The actual tracing of the flowline stream candidate s is performed by the trace_stream() function, included in Algorithm A2. This function starts from the point p and tries to reach the end neighborhood nb_e by moving in the flow direction. This is performed iteratively with the following sequence of actions at each iteration (Algorithm A2, line 17):
1. If the current pixel p belongs to nb_e, it is checked whether p is closer to the center of nb_e than any previously traced pixel in s. If so, its number is stored in k_end as the end point of the stream. In addition, the end mode indicating that we have entered nb_e is set by end ← TRUE (Algorithm A2, lines 18-23).
2. If the current pixel p does not belong to nb_e, but we have entered it previously (end = TRUE), the iterations stop (Algorithm A2, lines 24-25).
After the iterations stop, k_end is checked. If it is empty, the end neighborhood has not been reached, and the empty set is returned by the function. Otherwise, s is trimmed to k_end and returned as the result (Algorithm A2, lines 33-36).
Least-cost counterparts are extracted by the leastcost_counterpart() function included in Algorithm A3. The function contains a series of map algebra expressions which operate on the rasters of elevation (Z), Euclidean distance to the reference line (E) and flow accumulation (FA). The first three operations (Algorithm A3, lines 2-4) are local, which means that the raster arithmetic is performed pixel-wise [66]. The offstream penalty raster W is calculated by the conditional expression implementing Equation (4) (Algorithm A3, line 3). The cost raster is also generated by a conditional expression that combines the functional in Equation (3) with the restriction on passing farther than the catch radius r (Algorithm A3, line 4). Finally, the resulting counterpart stream c is extracted by the cost_path() function, which solves the standard cost path map algebra operation [67] using cost as a friction layer (Algorithm A3, line 5).
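The core loop of trace_stream() can be sketched as a simple walk along flow directions. This is a simplified illustration: it stops at the first pixel of the end neighborhood, whereas Algorithm A2 continues inside nb_e to find the pixel closest to its center. The representation of flow directions as a pixel-to-offset mapping is an assumption.

```python
def trace_stream(start, flow_dir, nb_end, max_steps=100000):
    """Follow flow directions from start until the end neighborhood.

    flow_dir -- dict mapping a pixel (row, col) to its (dr, dc)
                downstream offset; None/absent at sinks and edges.
    nb_end   -- set of pixels forming the end neighborhood.
    Returns the traced pixel path, or [] if nb_end is never reached.
    """
    path, p = [], start
    for _ in range(max_steps):
        path.append(p)
        if p in nb_end:
            return path                  # reached the end neighborhood
        step = flow_dir.get(p)
        if step is None:                 # sink or raster edge: failure
            return []
        p = (p[0] + step[0], p[1] + step[1])
    return []                            # guard against cycles
```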
Algorithm A5: List of functions.
1. append(L, p): appends p to the end of the list L