Open Access Article

Cross-Language Plagiarism Detection System Using Latent Semantic Analysis and Learning Vector Quantization

Department of Electrical Engineering, Faculty of Engineering, Universitas Indonesia, Depok 16424, Indonesia
Author to whom correspondence should be addressed.
This paper is an extended version of a conference paper entitled “Analysis of the Effect of Term–document Matrix on the Accuracy of Latent Semantic Analysis-Based Cross-Language Plagiarism Detection” presented at the International Conference on Network, Communication and Computing 2016 (ICNCC 2016) at Kyoto, Japan, 17–20 December 2016.
Academic Editor: Andras Farago
Algorithms 2017, 10(2), 69
Received: 31 March 2017 / Revised: 16 May 2017 / Accepted: 10 June 2017 / Published: 13 June 2017
(This article belongs to the Special Issue Networks, Communication, and Computing)


Computerized cross-language plagiarism detection has recently become essential. Because scientific publications in Bahasa Indonesia are scarce, many Indonesian authors frequently consult publications in English in order to increase the number of scientific publications in Bahasa Indonesia (which is currently rising). Due to the syntactic disparity between Bahasa Indonesia and English, most existing methods for automated cross-language plagiarism detection do not provide satisfactory results. This paper analyses the feasibility of developing a computerized cross-language plagiarism detector based on Latent Semantic Analysis (LSA) for two languages with different syntax. To improve performance, several modifications to LSA are proposed. By using a learning vector quantization (LVQ) classifier with LSA and taking the Frobenius norm into account, an accuracy of up to 65.98% was reached. The experiments showed that the best accuracy achieved was 87%, with a document size of 6 words, and that the document definition size must be kept below 10 words in order to maintain high accuracy. Additionally, based on the experimental results, this paper suggests using the frequency occurrence method rather than the binary method for term–document matrix construction.
Keywords: Latent Semantic Analysis; learning vector quantization; plagiarism detection system
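
The abstract contrasts two ways of filling the term–document matrix (frequency occurrence versus binary) and mentions the Frobenius norm as a feature. The following is only a minimal sketch of those two ideas, not the authors' implementation: the function names, the whitespace tokenizer, and the toy documents are assumptions made for illustration, and the SVD step of LSA and the LVQ classifier are omitted.

```python
import math
from collections import Counter


def term_document_matrix(documents, binary=False):
    """Build a term-document matrix (rows = terms, columns = documents).

    With binary=False each cell holds how many times the term occurs in the
    document (frequency occurrence); with binary=True it holds 1 if the term
    occurs at all and 0 otherwise.
    """
    # Assumption: lowercase whitespace tokenization is enough for the toy data.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = []
    for term in vocabulary:
        row = [count[term] for count in counts]  # Counter gives 0 if absent
        if binary:
            row = [1 if value > 0 else 0 for value in row]
        matrix.append(row)
    return vocabulary, matrix


def frobenius_norm(matrix):
    """Square root of the sum of the squared matrix entries."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


# Hypothetical toy "documents" (short word windows), not data from the paper.
docs = ["data mining data analysis", "text mining"]
vocab, freq = term_document_matrix(docs)
_, binary = term_document_matrix(docs, binary=True)
```

In this toy case the repeated word "data" gives the frequency matrix a larger Frobenius norm than the binary one, which shows how the two constructions can diverge even on very short documents.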

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

Share & Cite This Article

MDPI and ACS Style

Ratna, A.A.P.; Purnamasari, P.D.; Adhi, B.A.; Ekadiyanto, F.A.; Salman, M.; Mardiyah, M.; Winata, D.J. Cross-Language Plagiarism Detection System Using Latent Semantic Analysis and Learning Vector Quantization. Algorithms 2017, 10, 69.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Algorithms, EISSN 1999-4893, published by MDPI AG, Basel, Switzerland