New Era for Geo-Parsing to Obtain Actual Locations: A Novel Toponym Correction Method Based on Remote Sensing Images
Abstract
1. Introduction
2. Related Works
2.1. Geo-Parsing Progress
2.2. A Geo-Parsing Offset Case
3. Methodology
3.1. Basic Idea
3.2. TC-RSI Method
4. Experiments and Results
4.1. Case Study Area
4.2. Correction Result
4.2.1. Presentation
4.2.2. Correction Ranges
4.3. Correction Evaluation
4.3.1. Visual Validation
4.3.2. Statistical Assessment
4.3.3. Robustness Assessment
4.4. Correction Effect
4.4.1. Improving Geo-Parsing Location Accuracy
4.4.2. Promoting Geographical Discoveries on Small Scales
5. Discussion
5.1. Terrain Impact
5.2. Method Limitation
5.3. Potential Future Work
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Id | Forest Main Species (Official Latin Name) | Family Name | Generic Name | Image |
---|---|---|---|---|
1 | Juglans regia Linn. | Juglandaceae | Juglans | |
2 | Toxicodendron vernicifluum (Stokes) F. A. Barkl. | Anacardiaceae | Toxicodendron | |
3 | Phyllostachys heterocycla (Carr.) Mitford cv. Pubescens Mazel ex H.de leh. | Gramineae | Phyllostachys | |
4 | Castanea mollissima Bl. | Fagaceae | - | |
5 | Pinus massoniana Lamb. | Pinaceae | Pinus | |
6 | Cerasus yedoensis | Rosaceae | Cerasus Mill. | |
Appendix B
Id | Forest Ecological Pattern | Toponym | Location (Longitude, Latitude) |
---|---|---|---|
1 | Forest-grass-livestock | Hefei city | (117.2334427, 31.82657783) |
2 | Forest-grass-livestock | Hefei city | (117.2334427, 31.82657783) |
3 | Forest-orchard | Hefei city | (117.2334427, 31.82657783) |
4 | Forest-crop | Hefei city | (117.2334427, 31.82657783) |
5 | Forest-herb | Hefei city | (117.2334427, 31.82657783) |
6 | Forest-herb | Hefei city | (117.2334427, 31.82657783) |
7 | Forest-mushroom | Hefei city | (117.2334427, 31.82657783) |
8 | Forest-grass-livestock | Feixi county | (117.1645578, 31.71296213) |
9 | Forest-grass-livestock | Feixi county | (117.1645578, 31.71296213) |
10 | Forest-grass-livestock | Feixi county | (117.1645578, 31.71296213) |
11 | Forest-grass-livestock | Feixi county | (117.1645578, 31.71296213) |

Forest Ecological Pattern | Cooperating Main Species (Official Latin Name) | Forest Main Species (Official Latin Name) | Forest Type in Remote Sensing Images | Forest Type in an Ecological Pattern |
---|---|---|---|---|
Forest-crop | Dioscorea esculenta (Lour.) Burkill | Juglans regia Linn. | Deciduous broadleaved forest | Closed deciduous broadleaved forest (GRIDCODE 4) |
Forest-crop | Amorphophallus rivieri Durieu | Toxicodendron vernicifluum (Stokes) F. A. Barkl. | Deciduous broadleaved forest | Closed deciduous broadleaved forest (GRIDCODE 4) |
Forest-mushroom | Dictyophora indusiata (Vent.ex Pers) Fisch | Phyllostachys heterocycla (Carr.) Mitford cv. Pubescens Mazel ex H.de leh. | Evergreen broadleaved | Closed deciduous broadleaved forest and closed evergreen broadleaved forest (GRIDCODE 2 & 4) |
Forest-mushroom | Agaricus bisporus (lang.) Sing | Castanea mollissima Bl. | Deciduous broadleaved forest | Closed deciduous broadleaved forest and closed evergreen broadleaved forest (GRIDCODE 2 & 4) |
Forest-mushroom | Auricularia auricula (L. Ex Hook.) | Castanea mollissima Bl. | Deciduous broadleaved forest | Closed deciduous broadleaved forest and closed evergreen broadleaved forest (GRIDCODE 2 & 4) |
Forest-herb | Dendrobium nobile Lindl. | Pinus massoniana Lamb. | Evergreen needle-leaved forest | Closed evergreen needle-leaved forest (GRIDCODE 6) |
Forest-herb | Ganoderma lucidum (Leyss. Ex Fr.) Karst. | Pinus massoniana Lamb. & Castanea mollissima Bl. | Evergreen needle-leaved forest | Closed evergreen needle-leaved forest (GRIDCODE 6) |
Forest-herb | Radix Paeoniae Alba | Pinus massoniana Lamb. | Evergreen needle-leaved forest | Closed evergreen needle-leaved forest (GRIDCODE 6) |
Forest-grass-livestock | nigrum porcus | Phyllostachys heterocycla (Carr.) Mitford cv. Pubescens Mazel ex H.de leh. | Evergreen broadleaved | Closed deciduous broadleaved forest and closed evergreen broadleaved forest (GRIDCODE 2 & 4) |
Forest-grass-livestock | caprae | Juglans regia Linn. | Deciduous broadleaved forest | Closed deciduous broadleaved forest and closed evergreen broadleaved forest (GRIDCODE 2 & 4) |
Forest-grass-livestock | pecus | Phyllostachys heterocycla (Carr.) Mitford cv. Pubescens Mazel ex H.de leh. | Deciduous broadleaved forest | Closed deciduous broadleaved forest and closed evergreen broadleaved forest (GRIDCODE 2 & 4) |
Forest-orchard | Vaccinium spp. | Juglans regia Linn. | Deciduous broadleaved forest | Closed deciduous broadleaved forest (GRIDCODE 4) |
Forest-orchard | Rubus corchorifolius Linn. f. | Cerasus yedoensis | Deciduous broadleaved forest | Closed deciduous broadleaved forest (GRIDCODE 4) |
Forest-orchard | Vaccinium bracteatum Thunb. | Cerasus yedoensis | Deciduous broadleaved forest | Closed deciduous broadleaved forest (GRIDCODE 4) |

Id | Forest Ecological Pattern | Location | Original Belonged County | Corrected Location | Correction Distance (km) | Corrected Belonged County | County Change |
---|---|---|---|---|---|---|---|
1 | Forest-grass-livestock | (117.2334427, 31.82657783) | Shushan | (116.920334, 31.718822) | 36.76 | Feixi | Yes |
2 | Forest-grass-livestock | (117.2334427, 31.82657783) | Shushan | (116.920334, 31.718822) | 36.76 | Feixi | Yes |
3 | Forest-orchard | (117.2334427, 31.82657783) | Shushan | (117.171331, 31.843366) | 7.14 | Shushan | No |
4 | Forest-crop | (117.2334427, 31.82657783) | Shushan | (117.661231, 31.855506) | 47.59 | Feidong | Yes |
5 | Forest-herb | (117.2334427, 31.82657783) | Shushan | (117.866917, 31.828833) | 70.32 | Chaohu | Yes |
6 | Forest-herb | (117.2334427, 31.82657783) | Shushan | (117.866917, 31.828833) | 70.32 | Chaohu | Yes |
7 | Forest-mushroom | (117.2334427, 31.82657783) | Shushan | (117.609733, 31.794555) | 41.92 | Feidong | Yes |
8 | Forest-grass-livestock | (117.1645578, 31.71296213) | Feixi | (116.927803, 31.720662) | 26.29 | Feixi | No |
9 | Forest-grass-livestock | (117.1645578, 31.71296213) | Feixi | (116.927803, 31.720662) | 26.29 | Feixi | No |
10 | Forest-grass-livestock | (117.1645578, 31.71296213) | Feixi | (116.927803, 31.720662) | 26.29 | Feixi | No |
11 | Forest-grass-livestock | (117.1645578, 31.71296213) | Feixi | (116.927803, 31.720662) | 26.29 | Feixi | No |
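
The correction distances in the table above can be checked against the coordinate pairs. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 111 km-per-degree factor, and the 6371 km Earth radius are all choices made for illustration. The tabulated values appear consistent with a planar approximation (Euclidean distance in degrees multiplied by roughly 111 km), although the source does not state the formula. A great-circle (haversine) distance is shown for comparison; at this latitude it comes out shorter, because a degree of longitude spans less ground than a degree of latitude.

```python
import math

KM_PER_DEGREE = 111.0     # assumed planar scale factor; not stated in the source
EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common convention


def planar_distance_km(lon1, lat1, lon2, lat2):
    """Euclidean distance in degrees scaled by a constant (ignores latitude)."""
    return math.hypot(lon2 - lon1, lat2 - lat1) * KM_PER_DEGREE


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Row 1 of the table: geo-parsed location and corrected location as (lon, lat)
original = (117.2334427, 31.82657783)
corrected = (116.920334, 31.718822)

print("planar:   ", round(planar_distance_km(*original, *corrected), 2), "km")
print("haversine:", round(haversine_km(*original, *corrected), 2), "km")
```

Which of the two is appropriate depends on the accuracy needed; for distances of tens of kilometres the difference between them is not negligible.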
Id | Forest Ecological Pattern | Corrected Location | Nearest Actual Location | Offset (km) |
---|---|---|---|---|
1 | Forest-grass-livestock | (116.920334, 31.718822) | (116.920334, 31.718822) | 0 |
2 | Forest-grass-livestock | (116.920334, 31.718822) | (116.920334, 31.718822) | 0 |
3 | Forest-orchard | (117.171331, 31.843366) | (117.179421, 31.912389) | 7.71 |
4 | Forest-crop | (117.661231, 31.855506) | (117.661231, 31.855506) | 0 |
5 | Forest-herb | (117.866917, 31.828833) | (117.866917, 31.828833) | 0 |
6 | Forest-herb | (117.866917, 31.828833) | (117.866917, 31.828833) | 0 |
7 | Forest-mushroom | (117.609733, 31.794555) | (117.609733, 31.794555) | 0 |
8 | Forest-grass-livestock | (116.927803, 31.720662) | (116.927803, 31.720662) | 0 |
9 | Forest-grass-livestock | (116.927803, 31.720662) | (116.927803, 31.720662) | 0 |
10 | Forest-grass-livestock | (116.927803, 31.720662) | (116.927803, 31.720662) | 0 |
11 | Forest-grass-livestock | (116.927803, 31.720662) | (116.927803, 31.720662) | 0 |
Average | | | | 0.70 |
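
The average in the last row follows directly from the eleven offsets listed above. As a small illustrative check (the variable names are hypothetical), the mean and the number of exact matches can be recomputed as follows:

```python
from statistics import mean

# Offsets (km) between corrected and nearest actual locations, rows 1-11
offsets_km = [0, 0, 7.71, 0, 0, 0, 0, 0, 0, 0, 0]

average_offset_km = mean(offsets_km)
exact_matches = sum(1 for d in offsets_km if d == 0)

print(f"average offset: {average_offset_km:.2f} km")
print(f"exact matches: {exact_matches} of {len(offsets_km)}")
```

Only row 3 (Forest-orchard) contributes a non-zero offset, so the average is driven entirely by that single record.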
Group | Toponym Recognition Algorithm | Toponym Resolution Gazetteer | Avg. Offset without TC-RSI (km) | Avg. Offset with TC-RSI (km) | Offset Reduction (km) |
---|---|---|---|---|---|
1 | NLPIR | Amap | 39.65 | 0.82 | +38.83 |
1 | pyltp | Amap | 68.81 | 2.21 | +66.60 |
1 | spaCy | Amap | 42.10 | 1.44 | +40.66 |
1 | Jieba | Amap | 61.52 | 2.01 | +59.51 |
2 | NLPIR | Amap | 39.65 | 0.82 | +38.83 |
2 | NLPIR | Baidu | 46.32 | 1.29 | +45.03 |
2 | NLPIR | Geonames | 73.99 | 3.21 | +70.78 |
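
The last column of the table above is the difference between the two average offsets. The sketch below recomputes it from the tabulated values; the row layout and the `offset_reduction` helper are hypothetical names introduced for illustration only.

```python
# (recognition algorithm, gazetteer, avg offset without TC-RSI, avg offset with TC-RSI), in km
rows = [
    ("NLPIR", "Amap", 39.65, 0.82),
    ("pyltp", "Amap", 68.81, 2.21),
    ("spaCy", "Amap", 42.10, 1.44),
    ("Jieba", "Amap", 61.52, 2.01),
    ("NLPIR", "Baidu", 46.32, 1.29),
    ("NLPIR", "Geonames", 73.99, 3.21),
]


def offset_reduction(without_km, with_km):
    """Absolute reduction in average offset (km), rounded to two decimals."""
    return round(without_km - with_km, 2)


for algorithm, gazetteer, before, after in rows:
    print(f"{algorithm:6s} {gazetteer:9s} {offset_reduction(before, after):+.2f} km")
```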
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, S.; Yan, X.; Zhu, Y.; Song, J.; Sun, K.; Li, W.; Hu, L.; Qi, Y.; Xu, H. New Era for Geo-Parsing to Obtain Actual Locations: A Novel Toponym Correction Method Based on Remote Sensing Images. Remote Sens. 2022, 14, 4725. https://doi.org/10.3390/rs14194725