Open Access Article

Recognizing Indonesian Acronym and Expansion Pairs with Supervised Learning and MapReduce

1 Department of Informatics, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia
2 Department of Electrical and Computer Engineering, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia
3 Department of Statistics, Universitas Syiah Kuala, Banda Aceh 23111, Indonesia
* Author to whom correspondence should be addressed.
Information 2020, 11(4), 210; https://doi.org/10.3390/info11040210
Received: 10 February 2020 / Revised: 3 April 2020 / Accepted: 10 April 2020 / Published: 15 April 2020
(This article belongs to the Section Information and Communications Technology)
Over the past decades, the intelligent identification of acronym and expansion pairs in large corpora has received considerable research attention, particularly in text mining, entity extraction, and information retrieval. Herein, we present an improved approach for accurately recognizing acronym and expansion pairs in a large Indonesian corpus. Generally, an acronym can be either a combination of uppercase letters or a sequence of speech sounds (syllables). Our proposed approach consists of four computational steps: (1) acronym candidate identification; (2) acronym and expansion pair collection; (3) feature generation; and (4) acronym and expansion pair recognition using supervised learning techniques. We introduce eight numerical features and evaluate how effectively they represent acronym and expansion pairs in terms of precision, recall, and F-measure. Furthermore, we compare the k-nearest neighbors (K-NN), support vector machine (SVM), and bidirectional encoder representations from transformers (BERT) algorithms on the task of classifying acronym and expansion pairs. The experimental results indicate that the SVM polynomial model with eight features achieves the highest accuracy (97.93%), surpassing the SVM polynomial model with five features (90.45%), the K-NN algorithm with k = 3 and eight features (96.82%), the K-NN algorithm with k = 3 and five features (95.66%), the BERT-Base model (81.64%), and the BERT-Base Multilingual Cased model (88.10%). Moreover, we analyze the performance of Hadoop with various numbers of data nodes when identifying acronym and expansion pairs and obtaining their feature vectors. The results reveal that a Hadoop cluster with more data nodes is faster than one with fewer data nodes when processing from ten million to one hundred million acronym and expansion pairs.
Keywords: acronym and expansion pair recognition; feature vectors; mapreduce; supervised learning techniques
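
The four steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the parenthesis-based pattern, the function names `find_pairs`, `features`, and `precision_recall_f1`, and the two features shown are all assumptions made for illustration and do not correspond to the eight features introduced in the article. The sketch only handles uppercase-letter acronyms, not syllable-based ones.

```python
import re

# Assumed pattern: an uppercase acronym in parentheses, e.g. "... (KPK)".
CANDIDATE = re.compile(r"\(([A-Z]{2,10})\)")


def find_pairs(text):
    """Steps 1-2 (sketch): find acronym candidates and collect candidate expansions."""
    pairs = []
    for match in CANDIDATE.finditer(text):
        acronym = match.group(1)
        words = text[:match.start()].split()
        # Simplifying assumption: one preceding word per acronym letter.
        expansion = " ".join(words[-len(acronym):])
        pairs.append((acronym, expansion))
    return pairs


def features(acronym, expansion):
    """Step 3 (sketch): two illustrative numerical features, not the paper's eight."""
    words = expansion.split()
    initials = "".join(w[0].upper() for w in words)
    # Fraction of acronym letters that match the word initials, position by position.
    matches = sum(a == b for a, b in zip(acronym, initials))
    return [matches / len(acronym), len(words) / len(acronym)]


def precision_recall_f1(y_true, y_pred):
    """Evaluation (sketch): precision, recall, and F-measure for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sentence = "Ketua Komisi Pemberantasan Korupsi (KPK) hadir."
    for acronym, expansion in find_pairs(sentence):
        print(acronym, "|", expansion, "|", features(acronym, expansion))
```

In a fuller pipeline, the feature vectors would be passed to a supervised classifier such as K-NN or SVM (step 4), and the predicted labels scored with a function like `precision_recall_f1`.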

MDPI and ACS Style

Abidin, T.F.; Mahazir, A.; Subianto, M.; Munadi, K.; Ferdhiana, R. Recognizing Indonesian Acronym and Expansion Pairs with Supervised Learning and MapReduce. Information 2020, 11, 210. https://doi.org/10.3390/info11040210

AMA Style

Abidin TF, Mahazir A, Subianto M, Munadi K, Ferdhiana R. Recognizing Indonesian Acronym and Expansion Pairs with Supervised Learning and MapReduce. Information. 2020; 11(4):210. https://doi.org/10.3390/info11040210

Chicago/Turabian Style

Abidin, Taufik F.; Mahazir, Amir; Subianto, Muhammad; Munadi, Khairul; Ferdhiana, Ridha. 2020. "Recognizing Indonesian Acronym and Expansion Pairs with Supervised Learning and MapReduce" Information 11, no. 4: 210. https://doi.org/10.3390/info11040210

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
