Extraction of Terms Related to Named Rivers
Abstract
1. Introduction
1.1. Motivations for the Research
1.2. Distributional Semantic Models
2. Materials and Methods
2.1. Materials
2.1.1. Corpus Data
2.1.2. GeoNames Geographic Database
2.2. Methodology
2.2.1. Pre-Processing
2.2.2. Named River Recognition
2.2.3. Term-Term Matrix Construction
2.2.4. Term Selection Procedure and Weighting Schemes
2.2.5. Clustering of Named Rivers
2.2.6. Terms Characterizing each Cluster
- For each named river in the 13 clusters, the 30 terms most semantically related to that river were extracted from the DSM, ranked by cosine similarity.
- For each cluster, set intersection was applied to the top-30 term sets of the rivers in that cluster. Of the shared terms, only those with a cosine similarity higher than 0.55 were selected.
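The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the toy three-dimensional vectors, and the term and river labels are all hypothetical. The sketch also assumes that a shared term must exceed the 0.55 threshold for every river in the cluster, which the description does not state explicitly.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def top_k_terms(river_vec, term_vecs, k=30):
    """Return {term: cosine} for the k terms most similar to one river."""
    scores = {t: cosine(river_vec, v) for t, v in term_vecs.items()}
    best = sorted(scores, key=scores.get, reverse=True)[:k]
    return {t: scores[t] for t in best}

def cluster_terms(river_vecs, term_vecs, k=30, threshold=0.55):
    """Terms shared by the top-k sets of all rivers in one cluster."""
    per_river = [top_k_terms(v, term_vecs, k) for v in river_vecs.values()]
    if not per_river:
        return []
    shared = set.intersection(*(set(d) for d in per_river))
    # Assumption: the threshold must hold for every river in the cluster.
    return sorted(t for t in shared if all(d[t] > threshold for d in per_river))

# Hypothetical toy vectors standing in for rows of the DSM.
terms = {
    "sediment": [1.0, 0.1, 0.0],
    "delta": [0.8, 0.4, 0.0],
    "glacier": [0.0, 0.0, 1.0],
    "typhoon": [0.1, 1.0, 0.0],
}
rivers = {
    "river_a": [1.0, 0.2, 0.0],
    "river_b": [0.9, 0.3, 0.1],
}

print(cluster_terms(rivers, terms, k=2))
```

In the study itself `k` would be 30 and the vectors would come from the weighted term-term matrix. With real data, the intersection step is what removes terms that are strongly related to only one river in the cluster.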
3. Results
3.1. First Cluster: Sakawa, Tenryu and Magome Rivers
3.2. Twelfth Cluster: Omaru and Mimigawa Rivers
4. Discussion
Author Contributions
Funding
Conflicts of Interest
Cluster 1 (Japan) | Cluster 12 (Japan) |
---|---|
Sakawa River | Omaru River |
Sakawa River Mouth | Mimigawa River |
Tenryu River | |
Tenryu River Mouth | |
Tenryu River Delta | |
Magome River Mouth | |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Rojas-Garcia, J.; Faber, P. Extraction of Terms Related to Named Rivers. Languages 2019, 4, 46. https://doi.org/10.3390/languages4030046