Historical Collaborative Geocoding
Abstract
1. Introduction
1.1. Context
1.2. Approach and Contributions
2. Theory
2.1. Geocoding
2.1.1. Related Work
2.1.2. Estimating and Conveying the Quality of Geocoded Places
2.1.3. Temporal Depth
2.1.4. Handling the Imperfections of Geohistorical Data
2.1.5. Handling Heterogeneous Address Types
2.2. Integrating Geohistorical Data
General Considerations about Building a Spatio-Temporal Database
2.3. Extracting Geohistorical Objects from Historical Maps
- Historical maps are spatially close to modern maps. The way in which spatial information is described is very similar (both are based on mathematically well-defined reference systems, as opposed for instance to an artistic painting of a city which would be seen as a non-geometrical map). The integration of the information they convey in a Geographical Information System (GIS) is therefore facilitated.
- The main goal of such maps is to provide a reliable depiction of the shape and location of geographical features.
2.3.1. Georeferencing Historical Maps
Choosing the Target Spatial Reference System
Selection of Ground Control Points
Choosing a Geometric Transformation Model
2.3.2. Temporalization: Locating Geohistorical Sources in Time
2.3.3. Extracting Information from Maps
3. Methods
3.1. Building Historical Gazetteers
- a historical map is scanned;
- scans are georeferenced using hand-picked control points;
- historical work allows for the estimation of temporal information and spatial precision of the map;
- road names and axis geometry are extracted from the scan (manually or automatically);
- building numbers are extracted from the scan (manually or automatically);
- in some cases, building numbers can be generated from the available data (e.g., road starting and ending building number);
- normalised names are created from historical names (dealing with abbreviations, etc.);
- geohistorical objects are created.
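The step that generates building numbers from a road's starting and ending numbers can be illustrated with a minimal sketch. It assumes that numbers are spread evenly along the road axis and that numbers on one side of the street increase by 2 (odd or even side); both are illustrative assumptions, not the procedure described here. The function names `interpolate_along` and `generate_building_numbers` are hypothetical.

```python
from math import hypot


def interpolate_along(axis, fraction):
    """Return the point located at `fraction` (0..1) of a polyline's length."""
    segments = list(zip(axis, axis[1:]))
    lengths = [hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in segments]
    target = fraction * sum(lengths)
    for ((x1, y1), (x2, y2)), length in zip(segments, lengths):
        if length > 0 and target <= length:
            t = target / length
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        target -= length
    # Rounding errors may leave a tiny remainder: fall back to the last vertex.
    return axis[-1]


def generate_building_numbers(axis, first_number, last_number, step=2):
    """Spread building numbers evenly along a road axis (assumed even spacing)."""
    numbers = list(range(first_number, last_number + 1, step))
    if len(numbers) == 1:
        # A single number is placed at the middle of the axis.
        return {numbers[0]: interpolate_along(axis, 0.5)}
    return {
        n: interpolate_along(axis, i / (len(numbers) - 1))
        for i, n in enumerate(numbers)
    }


# Odd side of a straight 10 m road axis, numbered 1 to 5.
points = generate_building_numbers([(0.0, 0.0), (10.0, 0.0)], 1, 5)
```

The sketch places points on the axis itself; offsetting them towards the correct side of the street is left out.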
Extracting Geo-Historical Information from Maps
3.2. Modelling Geohistorical Objects
3.2.1. Modelling Geohistorical Objects
Historical Aspect
- Name. By name, we mean the historical name initially used to identify the object in the historical source, and the current name used by historians to identify the object in the current context. For instance, the historical name for the Eiffel Tower in Paris may be “tour de 300 m”, but today, it is referenced as “Tour Eiffel”. Both can coexist in a gazetteer (two different geohistorical objects, with a different source and date).
- Source. A historical object is defined by a primary historical source (document) where the object is referenced. In addition to the historical source, the way in which the object was digitized in this source is also essential. For instance, a street name may have the Jacoubet map as its historical source, and would have been digitized via collaborative editing on the georeferenced map.
- Temporalization. Any historical source is associated with temporal information (fuzzy dates), which is the period during which the source is most likely to be relevant. In addition to the historical source’s temporal information, a historical object can also have its own temporal information. For instance, a street may have been extracted from a historical map created between 1820 and 1842. The use of other historical documents may allow the probable existence of this street to be narrowed to 1824–1836. Keep in mind that several other geohistorical objects may describe this street at several other time periods in the same or in another gazetteer.
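One possible way to represent such fuzzy dates is a trapezoidal membership function, sketched below. The trapezoid shape, the class name `FuzzyDate`, and the choice of using the source period (1820 to 1842) as the outer bounds and the narrowed period (1824 to 1836) as the core are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyDate:
    """Trapezoidal fuzzy date (assumed shape): 0 outside [start, end], 1 in the core."""
    start: float       # earliest plausible year
    core_start: float  # beginning of the most likely period
    core_end: float    # end of the most likely period
    end: float         # latest plausible year

    def membership(self, year):
        """Degree (0..1) to which `year` belongs to the fuzzy date."""
        if year < self.start or year > self.end:
            return 0.0
        if self.core_start <= year <= self.core_end:
            return 1.0
        if year < self.core_start:
            # Rising edge between start and core_start.
            return (year - self.start) / (self.core_start - self.start)
        # Falling edge between core_end and end.
        return (self.end - year) / (self.end - self.core_end)


street_period = FuzzyDate(1820, 1824, 1836, 1842)
```

With this representation, `street_period.membership(1830)` is 1.0 and `street_period.membership(1822)` is 0.5, so a query date can be compared against an object even when neither is known exactly.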
Geospatial Aspect
- Geometry. A feature has a geometry that follows the OGC standards (http://www.opengeospatial.org/standards). It may be a point, polyline, polygon, or a composition of any of these, in a specified spatial reference system (SRS). The geometry is extracted from the historical source manually or automatically; how it was extracted is recorded in the Source description.
- Positional accuracy. Historical features carry positional accuracy information. It expresses both the spatial uncertainty of the historical source (the person drawing the map might have made mistakes) and the spatial imprecision of the digitizing process (the person editing the digitised map might have made a mistake). One historical source may carry several accuracy values, one for each geohistorical object type it contains. For instance, a historical map may contain buildings and roads, and buildings may have a different positional accuracy (5 m) than road axes (20 m). In addition, the digitising process itself may have a precision of 5 m.
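The geospatial attributes above can be sketched as a small data structure. The class and field names are hypothetical, the geometry is stored as a WKT string for simplicity, and adding the source accuracy to the digitising precision is only a conservative assumption; no combination rule is specified here.

```python
from dataclasses import dataclass, field


@dataclass
class HistoricalSource:
    """A historical source with one positional accuracy (metres) per object type."""
    name: str
    accuracy_m: dict = field(default_factory=dict)


@dataclass
class GeohistoricalObject:
    historical_name: str
    object_type: str              # e.g. "building" or "road_axis"
    geometry_wkt: str             # geometry as WKT, in the SRS given by `srid`
    srid: int
    source: HistoricalSource
    digitising_accuracy_m: float  # precision of the digitising process

    def positional_uncertainty_m(self):
        """Conservative bound (assumption): source accuracy + digitising precision."""
        return self.source.accuracy_m[self.object_type] + self.digitising_accuracy_m


# Values taken from the example above: buildings 5 m, road axes 20 m, digitising 5 m.
example_map = HistoricalSource("example map", {"building": 5.0, "road_axis": 20.0})
road = GeohistoricalObject(
    historical_name="rue exemple",         # made-up name
    object_type="road_axis",
    geometry_wkt="LINESTRING (0 0, 100 0)",
    srid=2154,                             # illustrative SRS identifier
    source=example_map,
    digitising_accuracy_m=5.0,
)
```

Keeping the accuracy on the source, keyed by object type, avoids repeating the same value on every object extracted from that source.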
Temporal Aspect
3.2.2. A Database of Geohistorical Objects
Table Inheritance
Simulated Inheritance of Index and Constraints
Modelling a Geohistorical Object from the User’s Perspective
- Add the historical source and numerical origin process in the source and process tables.
- Create a new table inheriting geohistorical objects and containing your additional custom columns.
- Use the registering function with this table name.
- Insert your data in the table.
3.3. The Historical Geocoder
3.3.1. Creating Geohistorical Object Gazetteers for Geocoding
Database Architecture for Geocoding
3.3.2. Finding the Best Matches
Concept
Example
Metric: String Distance
Metric: Temporal Distance
Metric: Building Number Distance
Metric: Positional Accuracy
Metric: Level of Detail Distance
Metric: Geospatial Distance
Example of Matching Function
3.4. Collaborative Editing of Geohistorical Objects
3.4.1. About Collaborative Editing
3.4.2. Collaborative Editing Architecture
Architecture
Persistence of Geocoding Results and Edits
3.4.3. Collaborative Editing User Interface
Interface for a REST API
Interface for Batch Geocoding via CSV Files
Interface for Displaying and Editing Results
4. Results
4.1. Geohistorical Objects Sources
4.1.1. Historical Maps Used
- reconstruct the meridian-aligned grid with Lambert I coordinates;
- in each sheet, mask the non-cartographic parts out (cartouche, borders, etc.);
- for each sheet, set pairs of ground control points at each intersection between the vertical and horizontal lines of the grids in the map and in the reconstructed grid;
- transform each sheet with a rubbersheeting transform based on the ground control points previously identified on the grids.
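The pairing of ground control points in the third step can be sketched as follows. Grid intersections are indexed by (column, row); the grid origin, spacing, and pixel coordinates are made-up values, and the function names are hypothetical. The rubbersheeting transform itself is not implemented here.

```python
def reconstructed_grid(origin_x, origin_y, spacing, n_cols, n_rows):
    """Target coordinates of each intersection of a regular reconstructed grid."""
    return {
        (col, row): (origin_x + col * spacing, origin_y + row * spacing)
        for row in range(n_rows)
        for col in range(n_cols)
    }


def pair_control_points(sheet_intersections, grid):
    """Pair intersections found on a sheet (pixels) with the reconstructed grid.

    Intersections with no counterpart in the grid are ignored.
    """
    return [
        (sheet_intersections[key], grid[key])
        for key in sorted(sheet_intersections)
        if key in grid
    ]


# Hypothetical values: a 2 x 2 grid with 500 m spacing and three picked points.
grid = reconstructed_grid(600000.0, 120000.0, 500.0, n_cols=2, n_rows=2)
sheet = {(0, 0): (12.0, 980.0), (1, 0): (510.0, 978.0), (5, 5): (0.0, 0.0)}
control_points = pair_control_points(sheet, grid)
```

Each resulting pair (pixel position, grid position) can then be handed to whichever georeferencing tool performs the transformation.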
4.1.2. Other Geohistorical Sources
4.2. Geocoding of Historical Datasets
4.2.1. Manually Collected Datasets
4.2.2. Belle Epoque
4.3. Manual Editing of the Geocoding Results for Evaluation
- When the edit moved the address point by less than 15 m, we considered that the edit was mostly about small moves (e.g., centering the point on the building limit).
- Between 15 and 55 m, the correct street is found, but the building numbers are slightly misplaced (by a few numbers).
- Between 55 and 155 m, the street is correct in most cases, but the building numbers are far from their correct position.
- Above 155 m, streets are mostly wrong.
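The distance thresholds above can be written as a small classification function. This is a sketch: the label names are hypothetical, and the handling of values falling exactly on a threshold (15, 55, or 155 m) is an assumption, since the text does not specify it.

```python
from collections import Counter


def classify_edit(distance_m):
    """Map the distance (metres) moved by a manual edit to its likely cause."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m < 15:
        return "small_move"               # e.g. centering on the building limit
    if distance_m < 55:
        return "numbering_slightly_off"   # correct street, off by a few numbers
    if distance_m <= 155:
        return "numbering_far_off"        # street mostly correct, number far off
    return "wrong_street"


def summarise_edits(distances_m):
    """Share of edits falling in each category."""
    counts = Counter(classify_edit(d) for d in distances_m)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


shares = summarise_edits([1.0, 2.0, 20.0, 300.0])
```

Applied to a list of edit distances, `summarise_edits` yields the kind of per-category percentages used to evaluate the geocoding results.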
4.4. Collaborative Editing
4.4.1. Use Case 1: Top Three Results for One Address
4.4.2. Use Case 2: Batch Geocoding of 30 Addresses and Check/Edit
5. Discussion
5.1. Genericity
5.1.1. Geohistorical Sources and Data
Using External Resources from the Web of Data as New Sources
Widening the Spectrum of Cartographic Sources
Diversity in Geohistorical Object Natures
5.1.2. Genericity in Usages
Named Entity Linking
Analysis Tool of the Cartographic Sources Content
5.2. Quality of the Geocoding
5.2.1. Increasing the Quality of the Gazetteers
Collaborative Enrichment
Cross-Referencing Historical Maps
5.2.2. Communicating the Reliability of a Geocoding
Geocoding Qualification and Quality Measures
Geovisualisation
5.2.3. Integrating User Correction into Historical Sources
5.2.4. Scalability
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Dataset Name | Input Addresses | Response Rate (Rough) | Seconds/1000 Addresses |
---|---|---|---|
South Americans | 13,991 | 13,743 (250) | 138 |
Textile | 5777 | 5688 (16) | 135 |
Textile 2 | 3070 | 3053 (2) | 110 |
Artists accommodations | 13,907 | 10,215 (2955) | 244 |
Health administrators | 1887 | 1698 (171) | 316 |
Belle Epoque (0.3) | 6467 | 3880 (337) | 280 |
Belle Epoque (0.5) | 6467 | 6000 | 351 |
Dist. (m) | % | Avg(Agg) | Avg(Sem) | Avg(Tempo) | Main Edit Cause (Subjective) |
---|---|---|---|---|---|
0–15 | 81% | 9.4 | 0.07 | 19.5 | moving point on building limit |
15–55 | 11% | 12.4 | 0.09 | 27.2 | small numbering editing (same street) |
55–155 | 2% | 23.7 | 0.14 | 41.2 | large numbering editing (same street) |
155–7200 | 6% | 26.9 | 0.18 | 49.1 | editing street |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Cura, R.; Dumenieu, B.; Abadie, N.; Costes, B.; Perret, J.; Gribaudi, M. Historical Collaborative Geocoding. ISPRS Int. J. Geo-Inf. 2018, 7, 262. https://doi.org/10.3390/ijgi7070262