Open Access Article

Ensemble and Deep Learning for Language-Independent Automatic Selection of Parallel Data

Department of Informatics, Ionian University, 491 00 Kerkira, Greece
* Author to whom correspondence should be addressed.
Algorithms 2019, 12(1), 26; https://doi.org/10.3390/a12010026
Received: 31 October 2018 / Revised: 25 December 2018 / Accepted: 15 January 2019 / Published: 18 January 2019
(This article belongs to the Special Issue Humanistic Data Mining: Tools and Applications)
Machine translation is used in many applications in everyday life. Because a growing number of translated documents need to be classified as useful or not useful for building a translation model, the automated categorization of texts (classification) is a popular research field of machine learning. This kind of information can be quite helpful for machine translation. Our parallel corpora (English-Greek and English-Italian) are based on educational data, which are quite difficult to translate. We apply two state-of-the-art architectures, Random Forest (RF) and Deeplearning4j (DL4J), to our data, which consist of three translation outputs. To our knowledge, this is the first time that deep learning architectures have been applied to the automatic selection of parallel data. We also propose new string-based features that appear to be effective for the classifier, and we investigate whether an attribute selection method could improve classification accuracy. Experimental results indicate an increase of up to 4% (compared to our previous work) using RF and rather satisfactory results using DL4J.
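The abstract does not list the proposed string-based features. Purely as an illustrative sketch, assume features that compare the token length of a source sentence with the lengths of two candidate translations, plus the normalized token-level edit distance between the two candidates. The function names `levenshtein` and `string_features` and the specific features below are hypothetical and are not the authors' feature set.

```python
def levenshtein(a: list[str], b: list[str]) -> int:
    """Token-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def string_features(source: str, output_a: str, output_b: str) -> dict[str, float]:
    """Hypothetical string-based features for two MT outputs of one source sentence."""
    src, a, b = source.split(), output_a.split(), output_b.split()
    longest = max(len(a), len(b), 1)  # avoid division by zero on empty outputs
    return {
        "len_ratio_a": len(a) / max(len(src), 1),
        "len_ratio_b": len(b) / max(len(src), 1),
        "edit_dist_ab": levenshtein(a, b) / longest,
    }
```

For example, `string_features("the cat sat", "il gatto", "il gatto seduto")` yields a length ratio of 2/3 for the first output, 1.0 for the second, and a normalized edit distance of 1/3 between them. Feature vectors of this kind could then be passed to a classifier such as a random forest; whitespace tokenization is a simplifying assumption here.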
Keywords: machine learning; deep learning; education data; data selection; machine translation; DL4J deep learning architecture; random forest

Figure 1

MDPI and ACS Style

Mouratidis, D.; Kermanidis, K.L. Ensemble and Deep Learning for Language-Independent Automatic Selection of Parallel Data. Algorithms 2019, 12, 26.

