Article

Machine Translation Utilizing the Frequent-Item Set Concept

1
Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh P.O. Box 11671, Saudi Arabia
2
Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh P.O. Box 11671, Saudi Arabia
*
Author to whom correspondence should be addressed.
Academic Editor: Ansar-Ul-Haque Yasar
Sensors 2021, 21(4), 1493; https://doi.org/10.3390/s21041493
Received: 30 November 2020 / Revised: 13 February 2021 / Accepted: 17 February 2021 / Published: 21 February 2021
In this paper, we introduce new concepts in the machine translation paradigm. We treat the corpus as a database of frequent word sets. A translation request triggers association rules that join phrases in the source language with phrases in the target language. A sequential scan of the corpus for such phrases would increase the response time unpredictably. We therefore pre-process the bilingual corpus into a proposed data structure called the Corpus-Trie (CT), which renders a bilingual parallel corpus as a compact structure representing frequent item sets. We also present algorithms that use the CT to respond to translation requests, and we explore novel techniques in extensive experiments. Experiments were performed on specific language pairs, although the proposed method is not restricted to any particular language. Moreover, the proposed Corpus-Trie can be extended from bilingual corpora to accommodate multilingual corpora. Experiments indicated that the response time of a translation request is logarithmic in the number of distinct phrases in the original bilingual corpus (and thus in the Corpus-Trie size). In practical situations, only 5–20% of the log of the number of nodes has to be visited. The experimental results indicate that the BLEU score of the proposed CT system increases with the number of phrases in the CT, for both English-Arabic and English-French translation. The proposed CT system outperformed both Omega-T and Apertium in translation quality for corpus sizes exceeding 1,600,000 phrases for English-Arabic translation and 300,000 phrases for English-French translation.
Keywords: machine translation; frequent-item set; bilingual corpus; BLEU score
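
The abstract does not specify the internal layout of the Corpus-Trie, so the following is only an illustrative sketch of the general idea: a word-level trie keyed on source-language phrases, where a node that ends a stored phrase carries a target-language phrase. All names here (`TrieNode`, `PhraseTrie`, `insert`, `longest_match`, `translate`), the `count` field, and the greedy longest-match lookup are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrieNode:
    """One node per source-language word (hypothetical layout)."""
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    translation: Optional[str] = None  # set only where a stored phrase ends
    count: int = 0                     # how often this phrase pair was inserted


class PhraseTrie:
    """Word-level trie mapping source phrases to target phrases."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, source_phrase: str, target_phrase: str) -> None:
        """Store one aligned phrase pair; a later pair overwrites an earlier one."""
        node = self.root
        for word in source_phrase.lower().split():
            node = node.children.setdefault(word, TrieNode())
        node.translation = target_phrase
        node.count += 1

    def longest_match(self, words: List[str], start: int) -> Tuple[int, Optional[str]]:
        """Return (length, translation) of the longest stored phrase at words[start:]."""
        node = self.root
        best_len, best = 0, None
        for offset, word in enumerate(words[start:], start=1):
            node = node.children.get(word)
            if node is None:
                break
            if node.translation is not None:
                best_len, best = offset, node.translation
        return best_len, best

    def translate(self, sentence: str) -> List[str]:
        """Greedy left-to-right lookup; unknown words are passed through unchanged."""
        words = sentence.lower().split()
        output: List[str] = []
        i = 0
        while i < len(words):
            length, target = self.longest_match(words, i)
            if length == 0:
                output.append(words[i])
                i += 1
            else:
                output.append(target)
                i += length
        return output
```

In this sketch each lookup walks at most as many nodes as the matched phrase has words, which is what avoids a sequential scan of the corpus. It does not reproduce the association-rule step or the response-time behaviour reported in the abstract.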

MDPI and ACS Style

Mahmoud, H.A.H.; Mengash, H.A. Machine Translation Utilizing the Frequent-Item Set Concept. Sensors 2021, 21, 1493. https://doi.org/10.3390/s21041493

AMA Style

Mahmoud HAH, Mengash HA. Machine Translation Utilizing the Frequent-Item Set Concept. Sensors. 2021; 21(4):1493. https://doi.org/10.3390/s21041493

Chicago/Turabian Style

Mahmoud, Hanan A. Hosni, and Hanan Abdullah Mengash. 2021. "Machine Translation Utilizing the Frequent-Item Set Concept" Sensors 21, no. 4: 1493. https://doi.org/10.3390/s21041493
