The crucial problem for integrating geospatial data is finding the corresponding objects (the counterpart) from different sources. Most current studies focus on object matching with individual attributes such as spatial, name, or other attributes, which avoids the difficulty of integrating those attributes, but at the cost of an ineffective matching. In this study, we propose an approach for matching instances by integrating heterogeneous attributes with the allocation of suitable attribute weights via information entropy. First, a normalized similarity formula is developed, which can simplify the calculation of spatial attribute similarity. Second, sound-based and word segmentation-based methods are adopted to eliminate the semantic ambiguity when there is a lack of a normative coding standard in geospatial data to express the name attribute. Third, category mapping is established to address the heterogeneity among different classifications. Finally, to address the non-linear characteristic of attribute similarity, the weights of the attributes are calculated by the entropy of the attributes. Experiments demonstrate that the Entropy-Weighted Approach (EWA) has good performance both in terms of precision and recall for instance matching from different data sets.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited