Open Access Article
Similar Text Fragments Extraction for Identifying Common Wikipedia Communities

1 Department of Intelligent Computer Systems, National Technical University "Kharkiv Polytechnic Institute", 61002 Kharkiv, Ukraine
2 Department of Information Systems, Poznan University of Economics and Business, 61-875 Poznan, Poland
3 Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan
4 Department of Informatics, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
* Authors to whom correspondence should be addressed.
Received: 4 November 2018 / Revised: 9 December 2018 / Accepted: 10 December 2018 / Published: 13 December 2018
(This article belongs to the Special Issue Data Stream Mining and Processing)
PDF [1786 KB, uploaded 14 December 2018]

Abstract

Similar text fragment extraction from weakly formalized data is a task of natural language processing and intelligent data analysis, used to solve the problem of automatically identifying connected knowledge fields. To search for such common communities in Wikipedia, we propose using a logical-algebraic model for similar collocation extraction as an additional stage. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of highly frequent synonymous collocations can serve as an indicator of key common up-to-date Wikipedia communities.
Keywords: information extraction; short text fragment similarity; Wikipedia communities; NLP
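The abstract's pipeline (identify the grammatical pattern of each collocation, then test word-by-word synonymy via WordNet synsets) can be sketched in miniature. The snippet below is an illustrative approximation only: the tiny `SYNSETS` list stands in for real WordNet synsets, the function names are hypothetical, and POS tags are assumed to come from a tagger such as the Stanford Part-Of-Speech tagger; it is not the authors' actual logical-algebraic model.

```python
from itertools import combinations

# Toy synonym groups standing in for WordNet synsets (illustrative only).
SYNSETS = [
    {"big", "large", "huge"},
    {"city", "town"},
    {"fast", "quick", "rapid"},
    {"growth", "increase"},
]

def are_synonyms(w1, w2):
    """Two words are synonymous if identical or if they share a synset."""
    if w1 == w2:
        return True
    return any(w1 in s and w2 in s for s in SYNSETS)

def collocations_similar(c1, c2):
    """Collocations match if they have the same length, the same POS
    pattern, and every aligned word pair is synonymous."""
    if len(c1) != len(c2):
        return False
    return all(p1 == p2 and are_synonyms(w1, w2)
               for (w1, p1), (w2, p2) in zip(c1, c2))

def count_synonymous_pairs(collocations):
    """Count synonymous collocation pairs across a collection of articles."""
    return sum(1 for a, b in combinations(collocations, 2)
               if collocations_similar(a, b))

# Collocations as (word, POS-tag) tuples, tags as a tagger would emit them.
doc_a = [("big", "JJ"), ("city", "NN")]
doc_b = [("large", "JJ"), ("town", "NN")]
doc_c = [("rapid", "JJ"), ("growth", "NN")]

print(collocations_similar(doc_a, doc_b))             # True
print(count_synonymous_pairs([doc_a, doc_b, doc_c]))  # 1
```

Counting how often such synonymous collocations recur across article sets is what, per the abstract, surfaces candidate common Wikipedia communities.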
Figure 1 (see full-text article)

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Petrasova, S.; Khairova, N.; Lewoniewski, W.; Mamyrbayev, O.; Mukhsina, K. Similar Text Fragments Extraction for Identifying Common Wikipedia Communities. Data 2018, 3, 66.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Data EISSN 2306-5729, published by MDPI AG, Basel, Switzerland.