Special Issue "Computational Linguistics for Low-Resource Languages"
A special issue of Information (ISSN 2078-2489).
Deadline for manuscript submissions: 30 June 2019
After years of neglect, low-resource languages (be they minority, regional, endangered, or heritage languages) have entered the scene of computational linguistics, thanks to the increased availability of digital devices, which strengthens the demand for digital usability of these languages. Preservation, revitalisation, and documentation purposes also call for the availability of computational methodologies for these languages, often at the request of the speakers’ communities themselves.
In addition to their practical interest, low-resource languages are a challenging case for computational linguistics per se. By expanding the range of languages traditionally studied by computational linguistics, low-resource languages often represent a test-bed for validating current methods and techniques. In an era dominated by big data, for instance, the data sparseness of low-resource languages requires alternative approaches. Limited availability of expert human resources, on the other hand, both calls for and calls into question crowdsourcing approaches, and brings into the picture issues of data protection and community involvement.
The goal of this Special Issue is to collect current research in computational linguistics for low-resource languages for a variety of languages, tasks, and applications. We invite submissions of high-quality, original technical and survey papers addressing both theoretical and practical aspects, including their ethical and social implications. We also hope that this Special Issue will not only represent a showcase for promising research but will also contribute to raising awareness about the importance of maintaining linguistic diversity, an effort to which computational linguistics can make an important contribution.
Dr. Claudia Soria
Manuscript Submission Information
Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.
Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.
Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.
- Low-resource languages
- Computational linguistics
- Natural language processing
- Linguistic diversity
The list below represents only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.
Authors: Donghui Lin, Yohei Murakami, and Toru Ishida
Affiliation: Department of Social Informatics, Kyoto University
Abstract: The lack of language resources makes various Natural Language Processing (NLP) tasks, such as machine translation and cross-lingual information retrieval, difficult for low-resource languages. In previous research, pivot language and cognate recognition approaches have been studied to create specific language resources, such as bilingual lexicons, for low-resource languages. In this paper, we aim to provide a general framework for the creation and customization of language resources for low-resource languages from available language resources. To achieve this goal, we first develop a service-oriented language infrastructure on the Web, the Language Grid, to share and combine language resources as language services. Then, we propose a generalized constraint approach to the automatic creation of language services based on pivot languages. Finally, we demonstrate our proposed framework by realizing automatic bilingual lexicon induction for low-resource Turkic languages and Indonesian ethnic languages.
Tentative Title: Investigating backtranslation for the improvement of English-Irish machine translation
Authors: Andy Way and Meghan Dowling
Affiliation: Adapt Centre, Dublin City University
Abstract: In this paper, we discuss the difficulties of building reliable machine translation systems for the English-Irish (EN-GA) language pair. In the context of limited datasets, we report on assessing the use of backtranslation as a method for creating artificial EN-GA data to increase the training data available to state-of-the-art data-driven translation systems. We compare our results to earlier work on EN-GA machine translation by Dowling et al. (2016, 2017, 2018), showing that while our own systems do not compare in quality with respect to traditionally reported BLEU metrics, a linguistic analysis suggests that future work with domain-specific data may prove more successful.
Tentative Title: Mapping the Circulation of Literary Writings through Aligned Translations: The example of Slavic and Finno-Ugric Translations of Adventures of Huckleberry Finn.
Authors: Amel Fraisse (1), Ronald Jenn (1), Zheng Zhang (2) and Shelley Fisher Fishkin (3)
Affiliation: (1) University of Lille, France; (2) LIMSI-CNRS, France; (3) Stanford University, USA
Abstract: Because translated texts have been regarded as unreliable due to suspicions of bias and untrustworthiness, they have so far been an overlooked resource in the field of NLP. But localizing, digitizing, and aligning translated texts of well-travelled famous novels can provide a fruitful basis for developing digitized linguistic material in under-resourced languages. In this paper we focus on translations of Mark Twain’s Adventures of Huckleberry Finn into a set of Slavic and Finno-Ugric languages in order to map the circulation of ideas and writings and to build up digitized linguistic material with a view to helping preserve the diversity of languages and cultures.
Authors: Björn Gambäck
Affiliation: Department of Computer Science, Norwegian University of Science and Technology
Authors: Anil Kumar Singh
Affiliation: Department of Computer Science and Engineering, Indian Institute of Technology (BHU)
Authors: Wanjiku Nganga
Affiliation: School of Computing and Informatics, University of Nairobi
Affiliation: Language Science and Technology, Universität des Saarlandes
Authors: Mathew Magimai Doss
Affiliation: Idiap Research Institute, Martigny, Switzerland
Authors: Jaco Badenhorst and Febe de Wet
Affiliation: Council for Scientific and Industrial Research, South Africa