Open Access Article

Similar Text Fragments Extraction for Identifying Common Wikipedia Communities

1 Department of Intelligent Computer Systems, National Technical University “Kharkiv Polytechnic Institute”, 61002 Kharkiv, Ukraine
2 Department of Information Systems, Poznan University of Economics and Business, 61-875 Poznan, Poland
3 Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan
4 Department of Informatics, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
* Authors to whom correspondence should be addressed.
Received: 4 November 2018 / Revised: 9 December 2018 / Accepted: 10 December 2018 / Published: 13 December 2018
(This article belongs to the Special Issue Data Stream Mining and Processing)
Extracting similar text fragments from weakly formalized data is a task in natural language processing and intelligent data analysis, and it is used to automatically identify connected knowledge fields. To search for such common communities in Wikipedia, we propose using, as an additional stage, a logical-algebraic model for extracting similar collocations. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of highly frequent synonymous collocations can serve as an indicator of key common up-to-date Wikipedia communities.
Keywords: information extraction; short text fragment similarity; Wikipedia communities; NLP
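
The counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' logical-algebraic model: it assumes collocations have already been extracted and part-of-speech tagged, and it uses a small hypothetical synonym table (`TOY_SYNSETS`) in place of WordNet synsets. The function names `canonical`, `collocation_key`, and `shared_collocations`, as well as the sample words, are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy synonym groups standing in for WordNet synsets (assumption).
TOY_SYNSETS = [
    {"big", "large"},
    {"car", "automobile"},
    {"fast", "quick"},
]


def canonical(word):
    """Map a word to a fixed representative of its toy synonym group."""
    w = word.lower()
    for group in TOY_SYNSETS:
        if w in group:
            return min(group)  # alphabetically first member as representative
    return w


def collocation_key(collocation):
    """Normalize a collocation given as a tuple of (word, pos_tag) pairs.

    Two collocations get the same key when their words are synonyms
    (per the toy table) and their part-of-speech tags match.
    """
    return tuple((canonical(word), pos) for word, pos in collocation)


def shared_collocations(articles, min_articles=2):
    """Count in how many articles each normalized collocation occurs.

    articles: dict mapping an article title to a list of collocations.
    Returns only the keys found in at least `min_articles` articles.
    """
    doc_freq = Counter()
    for collocations in articles.values():
        # Use a set so each article contributes at most once per key.
        doc_freq.update({collocation_key(c) for c in collocations})
    return {key: n for key, n in doc_freq.items() if n >= min_articles}


if __name__ == "__main__":
    sample = {
        "Article A": [(("big", "ADJ"), ("car", "NOUN"))],
        "Article B": [(("large", "ADJ"), ("automobile", "NOUN"))],
        "Article C": [(("quick", "ADJ"), ("train", "NOUN"))],
    }
    print(shared_collocations(sample))
```

In this sketch, a collocation shared by several articles under synonym normalization is treated as a sign that those articles belong to a common information space. A real pipeline would obtain the tags and synonym groups from the tools named in the abstract.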

MDPI and ACS Style

Petrasova, S.; Khairova, N.; Lewoniewski, W.; Mamyrbayev, O.; Mukhsina, K. Similar Text Fragments Extraction for Identifying Common Wikipedia Communities. Data 2018, 3, 66.

