This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Recent technologies have allowed a major growth of geographical datasets with different levels of detail, different points of view, and different specifications, but covering the same geographical extent. These datasets need to be integrated in order to benefit from their diversity. Conflation is one of the solutions to provide integration. Conflation aims at combining data that represent the same entities, from several datasets, into a richer new dataset. This paper proposes a framework for geometrical conflation that preserves the characteristic shapes of geographic data. The framework is based on least squares adjustment, inspired by least squares based generalization techniques. It does not require very precise pre-matching, which is valuable as automatic matching remains a challenging task. Several constraints are proposed to preserve different kinds of shapes and relations between features while conflating data. The framework is applied to a real land use parcel conflation problem with excellent results. The least squares based conflation is evaluated, notably through comparisons with existing techniques like rubber sheeting. The paper also describes how the framework can be extended to other geometrical optimization problems.

Together with the development of GI technologies and Web 2.0, the number of available geographical datasets has increased tremendously in the past years. Among this diversity, data quality, level of detail, modeling choices, and content may greatly differ. In order to enable or expand complex spatial analyses, integrating datasets from different sources can be very attractive but quite difficult. Two options are possible to integrate data from different sources: multi-representation databases and conflation. Multi-representation databases allow us to model real world objects with different representations according to scale or point of view. Integration in such a database requires matching the representations from the different sources [

Conflation aims at combining data that represent the same real world entities, from several datasets, into a richer new dataset that contains information that cannot be obtained from any of the single datasets alone [

The shape of geographic data conveys essential implicit information for analysis (e.g., curvature, sharp angles). Conflation deformations may transform the initial shapes and convey misleading information: for instance, a straight road should not become curved and buildings should not lose their right angles. In addition, spatial relations, like the relative position of a building to a dead-end road, should not be destroyed by the conflation process. As a consequence, finding a deformation model that takes shape and spatial relations into account when absorbing deformations is a key topic for the improvement of conflation techniques. This paper tackles it by proposing a deformation model that balances snapping against the preservation of shape and spatial relations, thanks to least squares optimization.

The second part of the paper describes related work on conflation. The third part proposes a model to conflate data by least squares optimization in order to maintain the initial shape of data. The fourth part presents experiments on a use case with land use data. Part five draws some conclusions and describes future work.

Conflating data requires, first, matching the data that reflect the same real world entity, and then merging their geometries and attribute data. Although the paper does not focus on data matching, this section describes previous work on the automatic matching of geographic data, as it is a necessary pre-condition of the presented geometry fusion method.

First, all data matching methods rely on some kind of similarity measures to figure out if the features represent the same real world entity [

The surface distance used to assess the similarity of two polygons: one minus the ratio between the areas of the grey surfaces (the intersection of polygons A and B) and their union.
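As an illustration, the surface distance can be sketched in a few lines of Python; the version below is restricted to axis-aligned rectangles to stay dependency-free (a geometry library such as Shapely would handle arbitrary polygons):

```python
# Minimal sketch of the surface distance between two polygons, restricted to
# axis-aligned rectangles given as (xmin, ymin, xmax, ymax). A real
# implementation would use a geometry library for arbitrary polygons.

def rect_area(r):
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def rect_intersection_area(a, b):
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    return rect_area((xmin, ymin, xmax, ymax))

def surface_distance(a, b):
    """1 - area(A inter B) / area(A union B): 0 if identical, 1 if disjoint."""
    inter = rect_intersection_area(a, b)
    union = rect_area(a) + rect_area(b) - inter
    return 1.0 - inter / union if union > 0 else 0.0

# Two unit squares overlapping on half their area:
# intersection = 0.5, union = 1.5, so the surface distance is 1 - 1/3.
print(surface_distance((0, 0, 1, 1), (0.5, 0, 1.5, 1)))
```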

Data matching can be achieved by theme specific methods. For instance, as coastlines are very sinuous, similarity, and one-to-one matching, can be achieved using Fréchet distance [

Although geographic data can be very specific from theme to theme, efficient generic matching methods have been proposed in the literature. The method proposed by Walter and Fritsch [

The problem of geometry transformation, or snapping the least precise geometries on the most precise ones, has already been tackled by a few researchers. First, geometry fusion can be achieved by creating a new geometry that replaces the ones of the matched features with an average position [

Although it was proposed for frontier connection purposes, rubber sheeting can be applied to conflation to snap the less precise geometries on the most precise ones [

The principles of rubber sheeting conflation (

Finally, it is possible to model conflation in its entirety with a global framework, as Cobb

Nevertheless, as the focus of this paper is a framework for geometry fusion during conflation, which does not include the other steps of conflation, it concentrates on developing an alternative to rubber sheeting that preserves the shapes of conflated features, whether the features are man-made with straight lines and right angles, or natural with smooth curved shapes.

The proposed conflation model was inspired by map generalization models that are briefly presented in the first section. The second section describes the principles of the least squares based conflation model, then details the constraints to maintain shape, the ones to conflate data, and finally explains how additional data can be propagated.

Least squares adjustment is, among others, a mathematical method to solve linear equations systems when the systems have more equations than unknowns, and are therefore unsolvable exactly. The approximate least squares solution for the linear system minimizes the sum of squared residuals (the difference from the exact solution for each equation) [

If we add a weighting matrix

This solution is obtained by matrix inversion and is the following:
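The weighted solution x = (AᵀPA)⁻¹AᵀPb can be made concrete on a tiny system; the plain-Python sketch below fits a line through three weighted observations (all names and the example data are illustrative, not from the paper):

```python
# Hedged sketch of the weighted least squares solution x = (A^T P A)^-1 A^T P b
# for a system with two unknowns, kept in plain Python so the algebra stays
# explicit. We fit the line y = a + b*x through three observations; the
# diagonal of P up- or down-weights individual equations.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(len(n)))
             for j in range(len(n[0]))] for i in range(len(m))]

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def weighted_least_squares(A, P, b):
    At = transpose(A)
    AtP = matmul(At, P)
    N = matmul(AtP, A)    # normal matrix A^T P A (2x2 here)
    rhs = matmul(AtP, b)  # right-hand side A^T P b
    return matmul(inv2(N), rhs)

# Observations (x, y): (0, 0.1), (1, 0.9), (2, 2.1); design matrix rows (1, x).
A = [[1, 0], [1, 1], [1, 2]]
b = [[0.1], [0.9], [2.1]]
P = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # identity weights = ordinary least squares
a_hat, b_hat = weighted_least_squares(A, P, b)
print(a_hat[0], b_hat[0])  # intercept and slope of the fitted line
```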

The specifications that make a generalized map legible (e.g., the minimum size of symbols, the minimum distance between close symbols,

Two least squares based generalization processes were proposed independently and simultaneously some years ago [

How to translate generalization into a linear equation system.

Sester and Harrie proposed a small set of constraints that can be translated into linear equations on the vertices [

The proposed shape preserving conflation framework is based on the least squares generalization model presented in the previous section. The next section describes its principles; then, constraints to maintain shape, to conflate data and to maintain data consistency are presented; finally, additional propagation mechanisms are proposed.

The proposed framework requires a major hypothesis on input data: the datasets to conflate have to be matched, at least minimally. A full feature-to-feature matching is not necessary (

Two sets of polygonal data to be conflated: two feature-to-feature matchings (red arrows) and two vertex-to-vertex matchings (blue arrows).

The least squares shape preserving conflation model takes up the principles of the generalization model. The unknowns in the linear equation system are the coordinate displacements ∆

In order to preserve the shapes of the conflated objects, specific constraints can apply to them, depending on the feature type: for instance, stiffness constraints can apply to buildings or urban land use parcels and curvature constraints to rivers or natural land use parcels. However, one constraint is common to all feature types, in order to avoid excessive displacements that could lead to horizontal positioning errors: the initial position of the conflated features should be preserved. It corresponds to the movement constraint described by Harrie [

As it is a very restrictive constraint, it should be very slightly weighted to allow some movements in the least squares solutions. For instance, the weight put in the

In order to preserve the shape of features that should only be translated and/or rotated, the stiffness constraint from Harrie [

Alternative preserving constraints are possible for stiff objects that are allowed to be distorted a little more than with the stiffness constraint: a side orientation constraint and a segment length constraint [

Nonlinear equations for segment length and orientation in a stiff feature.

The previous shape preservation constraints are mostly dedicated to stiff (or man-made) features like buildings, streets, or parcels. But considering features whose shape was crafted by nature like rivers, forests, mountain roads, or lakes, requires the definition of other constraints. We propose to use the curvature constraint introduced by Harrie [

Expression of the curvature preservation constraint.

To balance the constraints to maintain shape, it is necessary to introduce constraints that force conflation. We compute such constraints based on matches of features. Our least squares conflation framework requires these matches as input. Three kinds of conflation constraints are proposed in the framework, ranging from very local actions to propagation actions close to the rubber sheeting approach. For each, the cases where it should be used are briefly discussed.

All constraints rely on the computation of displacement vectors from the features matched in the conflated datasets. For each matched feature, the feature matching is transformed into vertex matching: the vertices in the less detailed dataset are matched to vertices of the matched feature in the most detailed dataset (

Computing vector displacement from the partial matching of features (the textured features are matched).
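The vertex matching step can be sketched with the simplest nearest-vertex rule (the function name and data layout are ours, not the framework's, which may use a more elaborate matching):

```python
# Illustrative sketch: turn a feature-to-feature match into vertex-to-vertex
# displacement vectors by snapping each vertex of the less detailed polygon to
# the nearest vertex of its matched, more detailed counterpart.
from math import hypot

def displacement_vectors(coarse, detailed):
    """For each vertex of `coarse`, a vector to the nearest vertex of `detailed`."""
    vectors = []
    for (x, y) in coarse:
        nx, ny = min(detailed, key=lambda v: hypot(v[0] - x, v[1] - y))
        vectors.append(((x, y), (nx - x, ny - y)))
    return vectors

# A square shifted by (2, 1) with respect to its detailed counterpart:
coarse = [(0, 0), (10, 0), (10, 10), (0, 10)]
detailed = [(2, 1), (12, 1), (12, 11), (2, 11)]
for anchor, vec in displacement_vectors(coarse, detailed):
    print(anchor, "->", vec)
```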

In the first two proposed constraints, only the features close to matched points (

Conflation constraint 1: contribution of the displacement vector to the closest vertex of close features.

The second constraint is very similar to the first one, as the constrained features and vertices are the same. The difference lies in the way the contribution norm is computed (Equation (7)), which absorbs the displacements more quickly. This constraint is preferred to the first one when the positional shift between datasets is large, as it avoids unexpected deformations.

The third constraint is inspired by the rubber sheeting conflation method and is applied to every vertex of every conflated feature. Every vector contributes to the propagation computed at a given point in inverse proportion to its distance (

Computation of the contribution of displacement vectors in a given point P_{0}.

For the three constraints, the linear equations on the constrained points are very simple, for a vector
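The inverse-distance contribution of the displacement vectors can be sketched as follows (a minimal version with 1/distance weights; the framework's actual weighting function may differ):

```python
# Minimal inverse-distance weighting sketch of the rubber-sheeting style
# propagation: the displacement applied at a point p0 is the average of all
# displacement vectors, each weighted by 1/distance to its anchor point.
from math import hypot

def propagate(p0, vectors):
    """vectors: list of ((ax, ay), (dx, dy)) anchor/displacement pairs."""
    weights, wdx, wdy = 0.0, 0.0, 0.0
    for (ax, ay), (dx, dy) in vectors:
        d = hypot(ax - p0[0], ay - p0[1])
        if d == 0.0:
            return (dx, dy)  # exactly on an anchor: take its vector unchanged
        w = 1.0 / d
        weights += w
        wdx += w * dx
        wdy += w * dy
    return (wdx / weights, wdy / weights)

vectors = [((0, 0), (2, 0)), ((10, 0), (0, 2))]
print(propagate((5, 0), vectors))  # equidistant from both anchors: their average
```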

The previous sections proposed constraints that allow a conflation that preserves the initial shape of geographic objects. But geographic objects are not isolated in a dataset; they share relationships, particularly topological and proximity relations. Our least squares based conflation framework preserves these geographic relations thanks to dedicated constraints; three are presented in this section. None is dedicated to topology preservation, as topology preservation is built into our framework: when features share geometries, their common vertices are taken into account only once in the adjustment, and when geometries are transformed by the adjustment results, one vertex displacement modifies all features that share the vertex.

The first presented constraint preserves proximity relations between conflated features,

Constrained Delaunay triangulation used to identify proximities: (

In order to preserve relations with features that are not part of conflation (for instance, additional features in the most detailed dataset), the same type of constraint can be adapted. The same neighborhood computation principle is used, but the equations are simpler in the derivative computation, as the vertices of the non-conflated features are fixed (the derivative of their coordinates is zero). In the test case presented in the results section, land use parcels are conflated with a more detailed dataset that contains precise city limits, on which the parcels are conflated, and a road network with good positional accuracy. The proximity between parcels and roads is preserved so that parcels do not cross roads.

In addition to proximity relations, it may be important to preserve relative orientation relations (e.g., a building parallel to a road, buildings whose main orientation is perpendicular,

Examples of relative orientation and position relations to preserve during conflation.

As with the proximity relations, these relations should first be identified in the initial data, and then the constraints are applied to the related features. The relative position relation is identified each time there is a proximity relation (identified by triangulation) between features that are prone to this kind of constraint (e.g., a dead end and a building). The identification of relative orientation relations is also based on proximities and relies on the measure of the general orientation of polygons [

Constraint expression of the preservation of relative orientation and position relations.

The experiments will show that the least squares conflation framework may generate very large equation systems that could slow computation down or even prevent the solution from being computed. In order to avoid overly large systems, the framework allows excluding features from the least squares conflation and applying propagation mechanisms, derived from the least squares solution, to the excluded features. Two kinds of mechanisms are proposed, one for features topologically connected to conflated features and one for features that are inside or near conflated features.

Both mechanisms rely on the computation of vectors from the least squares displacements. Every vector contributes to the propagation computed at a given point in inverse proportion to its distance (

In order to allow a propagation on additional features topologically connected to conflated features, which preserves the topological connection, the framework identifies the topological relationships before the conflation. Therefore, topological connection points are added to the geometry points that are constrained. This addition is automatically done in the implemented framework, benefiting from the GIS capabilities of the platform. Then, the least square solution on the connected points is applied to both features. The remaining points of the propagated features are displaced as in

Propagation for topologically connected objects: the connected points are added to the system and a propagated displacement is applied to the remaining points (P_{3} and P_{4}).

For the additional objects that are not topologically connected to features conflated by least squares (

The decision on which features to propagate and which features to adjust requires knowledge of the use case. Therefore, we leave this decision to the user. However, some general rules to choose can be enunciated:

Only unmatched features should be propagated.

Small features and rigid features should be preferred for propagation, as such features often need fewer distortions.

Features inside conflated features are good candidates for propagation, as it provides accurate propagation vectors.

The shape preserving conflation framework has been tested thoroughly as experiments have been carried out on a use case with large datasets and real data. The next section describes the use case with land use parcels and buildings. Then, the implementation of the framework is presented and scaling issues are discussed. Finally, some results are presented, evaluated, and discussed.

The use case is the conflation of two datasets, a very accurate one containing city limits and network information like roads, paths and rivers; and a less accurate dataset that contains less accurate city limits but also cadastral land use parcels and a building layer, topologically consistent with the parcels. The issue is, thus, to snap the second dataset to the first one’s city limits, preserving consistency with the network elements.

Extract of the less accurate dataset of the use case, to conflate with accurate city limits.

In this use case, the city limits and the parcels are included in the least squares conflation and the buildings are considered as additional propagated features. As this work only focused on geometrical conflation, we did not use one of the matching techniques presented in the related work section. To save time, the data matching is manually carried out between the two layers existing in both datasets,

Three areas covering different kind of cities are conflated with both datasets of the use case. One area is quite small with a rural town (

The conflation framework presented in

The implementation provides a GUI to define the parameters (e.g., the constraints used, the weights, the input data) and trigger the conflation. A scheduler was developed to sequence the steps of the conflation: build the points (

To control a conflation with the GUI, the user simply defines the internal constraints for each feature type, the external constraints, constraints weights, and finally the features that are propagated rather than adjusted. As many constraints are available, a user has to tune the framework to determine the correct parameterization for a use case. For our use case, experiments showed that propagating buildings was successful, so it was decided that including them in the adjustment was not necessary. The experiments allow the inference of some criteria to define constraints weight:

the chosen conflation constraint should have very high weights (20 in the experiment),

key shape constraints like stiffness for parcels should have high weights (16 in the experiment),

the movement constraint should have a minimal weight (1 in the experiment).
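These settings could be captured as a simple parameterization; the mapping below only records the weights reported above (the constraint names are illustrative, not the framework's actual identifiers):

```python
# Illustrative parameterization capturing the weights reported above; the
# constraint names are ours, not the framework's actual identifiers.
constraint_weights = {
    "conflation": 20,  # the chosen conflation constraint: very high weight
    "stiffness": 16,   # key shape constraint (e.g., for parcels): high weight
    "movement": 1,     # initial-position constraint: minimal weight
}

# Weights typically feed the diagonal of the weighting matrix P, one entry per
# equation generated by the corresponding constraint.
def weight_for(constraint_name):
    return constraint_weights.get(constraint_name, 1)

print(weight_for("stiffness"), weight_for("movement"))
```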

With such a use case that contains many land use parcels, the number of equations in the least squares system may quickly become very large. For instance, the conflation of a dataset with 110 land use parcels, which corresponds to 1,400 unknowns (

The first solution for computing large datasets is to partition the data into smaller parts that are computed separately. In order to avoid topology disconnections, partitioning has to take topology into account: land use parcels are grouped when they are topologically connected (

Features that share topology and spatial relations should be grouped in a partition.

Constraints between features at the edge and features outside the partition should be included in the adjustment.

Disconnected sets of land use parcels that can be partitioned and treated separately.
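The topological grouping can be sketched with a union-find over shared vertices, a simplified stand-in for the framework's actual partitioning step:

```python
# Sketch of topology-aware partitioning: parcels sharing at least one vertex
# are grouped into the same partition, via a simple union-find. This is a
# simplified stand-in for the framework's actual partitioning step.

def partition_by_shared_vertices(parcels):
    """parcels: dict id -> list of (x, y) vertices. Returns a list of id sets."""
    parent = {pid: pid for pid in parcels}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    owner = {}  # first parcel seen for each vertex
    for pid, verts in parcels.items():
        for v in verts:
            if v in owner:
                union(pid, owner[v])
            else:
                owner[v] = pid

    groups = {}
    for pid in parcels:
        groups.setdefault(find(pid), set()).add(pid)
    return list(groups.values())

parcels = {
    "a": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "b": [(1, 0), (2, 0), (2, 1), (1, 1)],  # shares an edge with "a"
    "c": [(5, 5), (6, 5), (6, 6), (5, 6)],  # disconnected from "a" and "b"
}
print(sorted(sorted(g) for g in partition_by_shared_vertices(parcels)))
```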

In addition to the partitioning, the implementation was changed to introduce sparse matrices in the least squares adjustment. Sparse matrices are matrices with many zero values. The storage of such matrices is much lighter than that of standard matrices [
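A coordinate (COO) representation illustrates why sparse storage is lighter; the sketch below is a toy stand-in for a real sparse library (such as scipy.sparse) and only stores the nonzero entries and multiplies by a vector:

```python
# Minimal illustration of sparse storage: a coordinate (COO) representation
# keeps only the nonzero entries of a matrix instead of n*n values. Real
# implementations would use a dedicated sparse library.

def to_coo(dense):
    return [(i, j, v) for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0]

def coo_matvec(coo, n_rows, x):
    y = [0.0] * n_rows
    for i, j, v in coo:
        y[i] += v * x[j]
    return y

dense = [
    [4, 0, 0, 1],
    [0, 3, 0, 0],
    [0, 0, 5, 0],
    [1, 0, 0, 2],
]
coo = to_coo(dense)
print(len(coo), "stored entries instead of", len(dense) * len(dense[0]))
print(coo_matvec(coo, 4, [1, 1, 1, 1]))
```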

Conflation results are presented for the use case with two different cities that present variations in size, shapes, and deformations required. Only results on small cities are presented to keep the figures readable because most distortions are negligible compared to city size for large cities, and would not be readable at the city scale. The first result is presented in

The second city is characterized by large distortions (10 to 15 m), and quite complex shapes, and the conflation results remain excellent (

Conflated parcels (dashed lines for initial data) extracted from the

Zoomed extracts of the second conflated city: despite large distortions, complex shapes are well preserved.

In order to improve the legibility of the presented results, the buildings propagation was excluded from the previous result pictures.

In order to evaluate the results obtained with the least squares conflation framework on the test cases, they are compared to the results of the rubber sheeting method on the same test case. The rubber sheeting interpolation is a simple inverse distance weighting, as proposed by Haunert [

Conflation results with data that require large distortions (conflated parcels are in plain blue, initial outlines are dashed and arrows show matching vectors).

Propagation of the least squares conflation to buildings inside parcels: Initial geometries drawn with dashed lines.

Other examples of propagated buildings, including buildings topologically connected to conflated parcels.

(

Conflation methods were also compared by automatic measurement of shape preservation: the shape of the conflated features was compared to the shape of the corresponding initial features (the framework maintains links between features). Five measures are used: the area increase ratio, which measures whether the conflation increased or decreased the feature area, the surface distance (

Nevertheless, some defects have been noticed in one of the conflated areas, around large displacement vectors: when the segments of the conflated polygons are very large (

Root Mean Square errors (RMS) for Least Squares conflation (LS) and Rubber Sheeting conflation (RS) compared to initial data, for five shape comparing measures and 200 features.

| Measure | RMS Error LS | RMS Error RS |
|---|---|---|
| Area increase ratio | 3.39% | 5.48% |
| Surface distance | 0.152 | 0.102 |
| Turning function | 0.093 | 0.184 |
| Polygon signature | 0.536 | 0.932 |
| Hausdorff distance | 3.087 | 3.736 |

Initial data to conflate where the two identified defects occur (very long (

As mentioned earlier, the use case does not contain any ideal final data to compare with the conflated result as the land use parcels are not in the precise dataset. In order to cope with this situation in the framework evaluation, benchmark data were created in a GIS with initial and final parcels and the vectors of the transformation (

(

A final way to evaluate the framework is to analyse its sensitivity to its parameters: the choice of the constraints and the weights assigned to each constraint. Different test conflations were carried out with varying constraints and weight-settings. Weight-setting in the framework is a challenging task as it is in the least squares based generalization [

Conflation variations with varying parameters.

The presented shape preserving conflation framework is computationally intense, so computation time needs to be discussed. As mentioned earlier, computing the normal matrix A^{T}PA has O(n^{2}) complexity due to implementation issues. If we include the required matching process in the computation time, there is no drastic change: an automatic matching technique was tested between initial parcels and conflated parcels of the test datasets, and its processing time is negligible compared to the conflation computation time. More complex matching processes may take more time but should not drastically increase the total computation time.

The requirement for matching should also be discussed. Only one matched feature is necessary to conflate a dataset, but such minimal matching does not allow a conflation as accurate as a more complete vertex-to-vertex matching. Indeed, the more vertices are matched, the more precise the conflation constraints are. However, this can be counterbalanced by the use of an appropriate conflation constraint: if there are few displacement vectors, the chosen conflation constraint has to have a large impact radius for each vector.

Furthermore, one of the hypotheses of the conflation framework is that a less precise dataset is conflated on a reference dataset, whose positional accuracy is better. But it is not always possible to determine a reference dataset. As it is modeled and implemented, the framework only allows choosing one of the datasets as the reference and conflating the other on it. However, it seems feasible to define constraints to conflate both datasets to a medium geometry, replacing the conflation constraints based on displacement vectors.

The proposed least squares based conflation framework is a contribution to geometrical conflation methods, as it makes it possible to preserve geographic shapes during conflation, which was not possible before. There are also smaller contributions to the least squares based methods used, for instance, in generalization: some new constraints to preserve spatial relations have been introduced, and techniques like partitioning or sparse matrices, used to reduce matrix complexity, could be applied to run least squares based generalization methods on larger zones. The way it integrates rubber sheeting propagation could be useful in generalization too. In addition to the defects identified in the evaluation section, the framework has some drawbacks, the major one being the complexity of the parameterization,

To conclude, this paper proposed a new framework for geometrical conflation that preserves geographic shapes while deforming conflated features. The framework is based on the least squares principles that were successfully used in map generalization, by Harrie [

To go further, the least squares based conflation framework should be tested on use cases with quite different data to verify the genericity of its geographic shape preservation. Then, it could be extended with new constraints to preserve additional kinds of shapes or geographic relations. Moreover, the framework is only dedicated to conflation cases with one dataset used as the geometrical reference. As a consequence, it is not directly usable in cases where the datasets have similar accuracies. An extension of the framework should be developed to allow shape preserving conflations in such cases. Finally, it could be interesting to investigate the use of hard constraints in the least squares adjustment, shifting from the Gauss-Markov least squares model to the Gauss-Helmert model.

The authors declare no conflict of interest.