Article

Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models

1 Department of Computer Science and Engineering, Korea University, 145, Anam-ro, Seongbuk-gu, Seoul 02841, Korea
2 Wisenut Inc., 49, Daewangpangyo-ro 644beon-gil, Bundang-gu, Seongnam-si, Gyeonggi-do 13493, Korea
3 Independent Researcher, Sujeong-gu, Seongnam-si, Gyeonggi-do 13106, Korea
* Author to whom correspondence should be addressed.
Academic Editor: Rafael Valencia-Garcia
Appl. Sci. 2021, 11(5), 1974; https://doi.org/10.3390/app11051974
Received: 7 February 2021 / Revised: 14 February 2021 / Accepted: 19 February 2021 / Published: 24 February 2021
(This article belongs to the Section Computing and Artificial Intelligence)
Abstract: Language model pretraining is an effective method for improving the performance of downstream natural language processing tasks. Even though language modeling is unsupervised and thus collecting data for it is relatively less expensive, it is still a challenging process for languages with limited resources. This results in great technological disparity between high- and low-resource languages for numerous downstream natural language processing tasks. In this paper, we aim to make this technology more accessible by enabling data-efficient training of pretrained language models. It is achieved by formulating language modeling of low-resource languages as a domain adaptation task using transformer-based language models pretrained on corpora of high-resource languages. Our novel cross-lingual post-training approach selectively reuses parameters of the language model trained on a high-resource language and post-trains them while learning language-specific parameters in the low-resource language. We also propose implicit translation layers that can learn linguistic differences between languages at a sequence level. To evaluate our method, we post-train a RoBERTa model pretrained in English and conduct a case study for the Korean language. Quantitative results from intrinsic and extrinsic evaluations show that our method outperforms several massively multilingual and monolingual pretrained language models in most settings and improves the data efficiency by a factor of up to 32 compared to monolingual training.
Keywords: cross-lingual; pretraining; language model; transfer learning; deep learning; RoBERTa
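
To make the parameter-reuse idea in the abstract concrete, the following Python sketch (using the Hugging Face transformers library) shows one way to keep the Transformer body of an English RoBERTa checkpoint while re-initializing the language-specific embedding parameters for a new target-language vocabulary before post-training. It is a minimal illustration under these assumptions, not the authors' released implementation; the proposed implicit translation layers are omitted, and the checkpoint name "roberta-base" and the value of target_vocab_size are placeholders.

from transformers import RobertaForMaskedLM

target_vocab_size = 32000  # hypothetical size of a target-language (e.g., Korean) vocabulary

# Load the model pretrained on the high-resource source language (English).
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Resize the embedding matrix to the target-language vocabulary and re-initialize it;
# the pretrained Transformer body is kept and reused.
model.resize_token_embeddings(target_vocab_size)
model.get_input_embeddings().weight.data.normal_(mean=0.0, std=0.02)
model.tie_weights()  # keep the input and output embeddings tied after re-initialization

# Optionally freeze the reused body at first so that only the language-specific
# parameters are trained on the low-resource corpus, then unfreeze it later
# for full post-training with the masked-language-modeling objective.
for param in model.roberta.encoder.parameters():
    param.requires_grad = False

Post-training would then proceed with the usual masked-language-modeling loss on target-language text, with the body unfrozen once the newly initialized embeddings have stabilized.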
MDPI and ACS Style

Lee, C.; Yang, K.; Whang, T.; Park, C.; Matteson, A.; Lim, H. Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models. Appl. Sci. 2021, 11, 1974. https://doi.org/10.3390/app11051974

AMA Style

Lee C, Yang K, Whang T, Park C, Matteson A, Lim H. Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models. Applied Sciences. 2021; 11(5):1974. https://doi.org/10.3390/app11051974

Chicago/Turabian Style

Lee, Chanhee, Kisu Yang, Taesun Whang, Chanjun Park, Andrew Matteson, and Heuiseok Lim. 2021. "Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models" Applied Sciences 11, no. 5: 1974. https://doi.org/10.3390/app11051974
