Semantic Connections in the Complex Sentences for Post-Editing Machine Translation in the Kazakh Language
Abstract
1. Introduction
- In Kazakh, the dependent clause always precedes the independent clause, because the verb of the dependent clause is in an incomplete form (that is, it must be followed by a clause expressing a completed event);
- No Kazakh sentence type has an exact formal match in English or Russian;
- In Kazakh, the semicolon (;) is not used to join simple sentences into a complex sentence;
- One simple sentence may be embedded in the middle of another simple sentence.
2. Related Work
3. Rules and Algorithms for Determining and Editing Complex Sentences for the English-Kazakh and Russian-Kazakh Language Pairs
Algorithm 1: The algorithm for determining the type of a complex sentence

1. Obtain a text.
2. Split the text into sentences.
3. Check each sentence against the rule templates, using the FuzzyWuzzy library for fuzzy comparison. If several templates match, proceed to step 4. If exactly one template matches, the sentence is a complex sentence of type X (the type of the matched template); proceed to step 5. If no template matches, the sentence is a simple sentence; proceed to step 5.
4. When several templates match, select the type with the highest match score.
5. End.
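The steps of Algorithm 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `TEMPLATES` is a hypothetical reduction of three rule templates to a few Kazakh connectives from the template table, the threshold of 80 is an assumed value, and fuzzy comparison is approximated with the standard-library `difflib.SequenceMatcher` in place of FuzzyWuzzy, so the example runs without third-party packages. All function names are illustrative.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher

# HYPOTHETICAL, heavily reduced templates: each type is represented only by a few
# Kazakh connectives taken from the template table, not by the full patterns.
TEMPLATES = {
    "Causal compound (Sebep-saldar salalas)": ["себебі", "өйткені", "сондықтан"],
    "Opposite compound (Qarsylyqty salalas)": ["бірақ", "алайда", "дегенмен"],
    "Selective compound (Talgauly salalas)": ["немесе", "әйтпесе", "әлде"],
}
THRESHOLD = 80  # ASSUMED cut-off on a 0-100 scale, like FuzzyWuzzy scores


def partial_score(marker: str, sentence: str) -> int:
    """Best 0-100 similarity of `marker` to any equally long window of `sentence`.

    Stdlib stand-in for FuzzyWuzzy's partial-ratio idea, not the library itself.
    """
    marker, sentence = marker.lower(), sentence.lower()
    n = len(marker)
    if n == 0 or len(sentence) < n:
        return 0
    best = max(
        SequenceMatcher(None, marker, sentence[i:i + n]).ratio()
        for i in range(len(sentence) - n + 1)
    )
    return round(100 * best)


def split_sentences(text: str) -> list[str]:
    """Step 2: naive split after sentence-final punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify(sentence: str) -> str:
    """Steps 3-4: return the best-matching complex-sentence type, or 'simple'."""
    scores = {
        stype: max(partial_score(m, sentence) for m in markers)
        for stype, markers in TEMPLATES.items()
    }
    matched = {stype: s for stype, s in scores.items() if s >= THRESHOLD}
    if not matched:
        return "simple"  # no template matched: a simple sentence
    # One or several matches: keep the type with the highest score.
    return max(matched, key=matched.get)


def determine_types(text: str) -> list[tuple[str, str]]:
    """Steps 1-5 over a whole text: (sentence, type) pairs."""
    return [(s, classify(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    text = "Күн суық болды, сондықтан біз үйде қалдық. Мен кітап оқыдым."
    for sentence, stype in determine_types(text):
        print(f"{stype}: {sentence}")
```

Run as a script, it labels the first sentence of the sample text as a causal compound sentence (it contains сондықтан) and the second as simple. A misspelled connective such as сондыктан still scores 89 against сондықтан, which is the kind of tolerance fuzzy matching is meant to provide; when another type matches exactly, step 4 prefers that higher score.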
Algorithm 2: The algorithm for editing a complex sentence
4. Experiments and Results
5. Conclusions and Future Work
- A parallel corpus of complex sentences aligned across Kazakh, Russian, and English was assembled;
- An algorithm for determining the type of a complex sentence was developed and implemented for post-editing Russian-Kazakh and English-Kazakh machine translation;
- An algorithm for post-editing complex sentences in Russian-Kazakh and English-Kazakh machine translation was developed and implemented.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Type of Complex Sentence | Sentences in Kazakh | Sentences in Russian | Sentences in English |
---|---|---|---|
Complex Sentences (Sabaqtas Qurmalas) | |||
Adversative expression (Qarsylyqty bagynyngqy-sabaqtas) | 101 | 101 | 101 |
Expression behavior of action (Qimyl-syn bagynyngqy-sabaqtas) | 100 | 100 | 100 |
Purpose expression (Maqsat bagynyngqy-sabaqtas) | 100 | 100 | 100 |
Time expression (Mezgil bagynyngqy-sabaqtas) | 150 | 150 | 150 |
Reason expression (Sebep bagynyngqy-sabaqtas) | 70 | 70 | 70 |
Conditional expression (Shartty bagynyngqy-sabaqtas) | 200 | 200 | 200 |
Compound Sentences (Salalas Qurmalas) | |||
Sequential compound sentence (Kezektes salalas) | 100 | 100 | 100 |
Opposite compound sentence (Qarsylyqty salalas) | 254 | 254 | 254 |
Causal compound sentence (Sebep-saldar salalas) | 70 | 70 | 70 |
Selective compound sentence (Talgauly salalas) | 100 | 100 | 100 |
Explanatory compound sentence (Tusindirmeli salalas) | 100 | 100 | 100 |
Ynggailas salalas | 200 | 200 | 200 |
Complex Sentences | Kazakh | Russian | English |
---|---|---|---|
1 | 2 | 3 | 4 |
COMPLEX SENTENCES (SABAQTAS QURMALAS) | |||
Adversative expression | ‘\w+(((са(|м|ң|ңыз|қ|ңдар|ңыздар) да)|(се(|м|ң|ңіз|к|ңдер|ңіздер) де))|((г|к)ен((|ім|ің|і|дерің|дері)мен|(іңіз|іміз|деріңіз)бен)|(ғ|қ)ан((|ым|ың|ы|дарың|дары)мен|(ыңыз|ымыз|дарыңыз)бен))|(((г|к)ен((ім|ің|дерің)е|(і|дері)не|(|іңіз|іміз|деріңіз)ге)|(ғ|қ)ан((ым|ың|дарың)а|(ы|дары)на|(|ыңыз|ымыз|дарыңыз)ға)) қара(май|мастан))|(((г|к)ен(ім|ің|іңіз|і|іміз|дерің|деріңіз|дері)|(ғ|қ)ан(ым|ың|ыңыз|ы|ымыз|дарың|дарыңыз|дары)) болмаса)|((а|е|й|ып|іп|п)(\ )(тұра|тұрып))|((ма|па|ба)са(|м|ң|ңыз|қ|ңдар|ңыздар)|(ме|пе|бе)се(|м|ң|ңіз|к|ңдер|ңіздер))|((|ма|па|ба)стан|(|ме|пе|бе)стен)|((ғ|қ)анша|(г|к)енше))(|\ )\,\ \w+’ | Если\w+ | If\w+ |
Conditional expression | ‘\w+((се(|м|ң|ңіз|к|ңдер|ңіздер)|са(|м|ң|ңыз|қ|ңдар|ңыздар))|((м|п|б)(ай|ей|айынша|ейінше))|(((ғ|қ)ан(|ым|ың|ыңыз|ымыз|дарың|дарыңыз)да)|((г|к)ен(|ім|ің|іңіз|іміз|дерің|деріңіз)де)))(|\ )\,\ \w+’ | Если \w.+, \w.+ \w.+, если \w.+ | If \w.+, \w.+ \w.+ if \w.+ |
Time expression | ‘\w+(((ғ|қ)анда|(г|к)енде)|(((ғ|қ)анға|(г|к)енге) (д|ш)ейін)|(((ғ|қ)аннан|(г|к)еннен) (кейін|соң|бері))|(((ғ|қ)ан|(г|к)ен) (сайын|кезде|сәтте|соң|уақытта|шақта))|((ғ|қ)анша|(г|к)енше)|((ғ|қ)алы|(г|к)елі)|((м|б|п)(а|е)с бұрын)|((ы|)сы|(і|)сі)мен|((а|)рда|(е|)рде)|(са(|м|ң|ңыз|қ|ңдар|ңыздар)|се(|м|ң|ңіз|к|ңдер|ңіздер)))(|\ )\,\ \w+’ | когда, пока, прежде чем, покамест, едва, как только, как, как вдруг, с тех пор как, в то время как, до тех пор как, по мере того как, etc. (sentence), [sentence] | when, as, while, before, after, since, as long as, till/until, whenever, as soon as, by the time, the moment that, hardly … when, no sooner … than, once, immediately, the first/last/next time (sentence), [sentence] |
Expression behavior of action | ‘\w+(((а|е|й|ып|іп|п)(|-ақ))|((м|п|б)(астан|естен))|((ған|ген|қан|кен) (күйі|қалпы|бойы))|(дай|дей|тай|тей)|(((ғ|қ)андай|(г|к)ендей) болып))(|\ )\,\ \w+’ | как, что, чтобы, словно, как будто, точно; (sentence), [sentence] | as, as if, as though (sentence), [sentence] |
Reason expression | ‘\w+(((ғ|қ)андықтан|(г|к)ендіктен)|(((ғ|қ)аны|(г|к)ені) үшін)|(((ғ|қ)ан|(г|к)ен) соң)|((м|б|п)(а|е)й)|(ып|іп|п|а|е|й)|((((а|й)(мын|сың|сыз|ды|мыз|сыңдар|сыздар))|((е|й)(мін|сің|сіз|ді|міз|сіңдер|сіздер))|((ар|р)(|мын|сың|сыз|мыз|сыңдар|сыздар))|((ер|р)(|мін|сің|сіз|міз|сіңдер|сіздер))|((ғ|қ)ан(|мын|сың|сыз|быз|сыңдар|сыздар))|((г|к)ен(|мін|сің|сіз|міз|сіңдер|сіздер))|((ып|п)(ты|пын|сың|сыз|пыз|сыңдар|сыздар))|((іп|п)(ті|пін|сің|сіз|піз|сіңдер|сіздер))|((а|й)тын(|мын|сың|сыз|быз|сыңдар|сыздар))|((е|й)тін(|мін|сің|сіз|біз|сіңдер|сіздер))) деп))(|\ )\,\ \w+’ | так как, поскольку, ибо, потому что, оттого что, из-за того что, вследствие того что, благодаря тому что, в связи с тем что, тем более что, затем что, etc. (), [] | The reason why (), []; because of/on account of/due to + noun |
Purpose expression | ‘\w+(((м|п|б)(ақ|ек|ақшы|екші) (болып|боп))|((у|с) үшін)|(((\ екен)|са(|м|ң|ңыз|қ|ңдар|ңыздар)|се(|м|ң|ңіз|к|ңдер|ңіздер)|(ғай|қай)(|мын|сың|сыз|мыз|сыңдар|сыздар)|(гей|кей)(|мін|сің|сіз|міз|сіңдер|сіздер)|(а|е|)(йын|йін|йық|йік)|(ы|і|)(ң|ңыз|ңіз)(|дар|дер)|(сын|сін)) деп)|(((қы|ғы)(м|ң|ңыз|сы|мыз|ларың|ларыңыз|лары)|(кі|гі)(м|ң|ңіз|сі|міз|лерің|леріңіз|лері)) (келіп|келмей)))(|\ )\,\ \w+’ | чтобы, дабы, с тем чтобы, затем чтобы, для того чтобы; the conjunction-particles только бы, лишь бы | in order to/so as to + infinitive |
COMPOUND SENTENCES (SALALAS QURMALAS) | |||
Explanatory compound sentence | ‘\w+ (соншалық|сонша|соншама|соншалықты|сондай)(\ |)(\,|(\-|\–|\—))(\ |)\w+’ | \w+, что \w+: | \w.+.that \w.+. \w+: |
Selective compound sentence | ‘\w+((\ |)\,(\ |))(не|немесе|я|яки|болмаса|(я болмаса)|(не болмаса)|әйтпесе|әлде)(\ |)(\,|)\ \w+’ | \w+, либо \w+, иначе | \w+, or, \w+, otherwise |
Sequential compound sentence | ‘\w+((\ |)\,(\ |))(кейде|біресе|бірде)\ \w+’ | \w+, иногда \w+, а иногда | \w+, sometimes \w+ and sometimes |
Opposite compound sentence | ‘\w+((\ |)\,(\ |))(бірақ(|\, сонымен бірге)|алайда|(сонда да)|дегенмен|(әйтсе де)|(сөйтсе де)|әйткенмен)(\,|)\ \w+’ | \w.+.однако \w.+. \w.+.но \w.+. .хотя \w.+.\w. | \w.+.but \w.+. \w.+.although \w.+. .although \w.+.\w. |
Ynggailas salalas | ‘\w+ және \w+’ | \w+, и \w+ \w+ и \w+ | \w+ and \w+ |
| | ‘\w+ (д|т)(а|е). \w+’ | \w+, и \w+ \w+ и \w+ | \w+ and \w+ |
| | ‘\w+ әрі \w.+, әрі \w.+’ | \w+ и \w.+, и \w.+ \w+ ни \w.+, ни \w.+ | \w+, and \w+ |
| | ‘\w+ әрі \w+’ | \w+ и \w+ \w+, да \w+ | \w+ and \w+ |
Causal compound sentence | ‘\w+((\ |)\,(\ |))(себебі|өйткені|(сол себепті)|сондықтан|(неге десеңіз)|(неге десең))(\ |)(\,|)\ \w+’ | Expressed by meaning; punctuation marks (:, -) | Expressed by meaning; punctuation marks (:, -) |
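As a sketch of how the Kazakh column of the table can be used directly, the snippet below compiles two of the templates with Python's `re` module: the causal compound template as printed and the conditional template with one stray space removed, in both cases dropping the surrounding ‘ ’ quotation marks, which are assumed to be delimiters. Exact regular-expression search is used here instead of fuzzy comparison; the helper `matching_types` and the example sentences are illustrative.

```python
from __future__ import annotations

import re

# Two Kazakh templates from the table, without the surrounding quotation marks.
# The stray space in "дерің| деріңіз" of the conditional template is removed.
TEMPLATES = {
    "Conditional expression": re.compile(
        r"\w+((се(|м|ң|ңіз|к|ңдер|ңіздер)|са(|м|ң|ңыз|қ|ңдар|ңыздар))"
        r"|((м|п|б)(ай|ей|айынша|ейінше))"
        r"|(((ғ|қ)ан(|ым|ың|ыңыз|ымыз|дарың|дарыңыз)да)"
        r"|((г|к)ен(|ім|ің|іңіз|іміз|дерің|деріңіз)де)))(|\ )\,\ \w+"
    ),
    "Causal compound sentence": re.compile(
        r"\w+((\ |)\,(\ |))(себебі|өйткені|(сол себепті)|сондықтан"
        r"|(неге десеңіз)|(неге десең))(\ |)(\,|)\ \w+"
    ),
}


def matching_types(sentence: str) -> list[str]:
    """Names of all templates that match somewhere in `sentence`."""
    return [name for name, pattern in TEMPLATES.items() if pattern.search(sentence)]


if __name__ == "__main__":
    for s in (
        "Жаңбыр жауса, біз үйде қаламыз.",
        "Күн суық болды, сондықтан біз үйде қалдық.",
        "Мен кітап оқыдым.",
    ):
        print(s, "->", matching_types(s))
```

The three sample sentences ("If it rains, we will stay at home", "It was cold, so we stayed at home", "I read a book") match the conditional template, the causal template, and neither, respectively. Because several templates can fire on one sentence, a selection rule such as step 4 of Algorithm 1 is still needed.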
Sentence Type in Kazakh | Order of Internal Sentences in Kazakh | Sentence Type in Russian | Order of Internal Sentences in Russian |
---|---|---|---|
SALALAS QURMALAS (with conjunctions) | | Compound Sentences | |
Ynggailas salalas | [MS1*] және [MS2*] | Compound sentences with connecting conjunctions | [MS1*] и [MS2*] <==> [MS2*] и [MS1*] |
| | [MS1* __ әрі __], [әрі MS2*] | | И [MS1*], и [MS2*] <==> И [MS2*], и [MS1*] |
Talgauly salalas | [MS1*], әйтпесе (әлде, немесе) [MS2*] | Compound sentences with disjunctive conjunctions | [MS1*], или (либо) [MS2*] <==> [MS2*], или (либо) [MS1*] |
Kezektes salalas | [MS1* ... бірде ...], бірде [MS2*] | | То [MS1*], то [MS2*] <==> [MS1* ... то ...], то [MS2*] |
| | Біресе [MS1*], біресе [MS2*] | | То [MS1*], то [MS2*] <==> То [MS2*], то [MS1*] |
Sebep-saldar salalas | [MS1*], сондықтан [MS2*] | | [MS1*], поэтому [MS2*] <==> [MS2*], так как [MS1*] |
SABAQTAS QURMALAS | | Complex Sentences | |
Shartty bagynyngqy-sabaqtas | (DS* ... v1), [MS*] | Complex sentences of conditions | (Если ... DS*), [MS*] <==> [MS*], (если ... DS*) |
Qarsylyqty bagynyngqy-sabaqtas | (DS* ... v2), [MS*] | Complex sentences of concession | (Хотя ... DS*), [MS*] <==> [MS*], (хотя ... DS*) |
Mezgil bagynyngqy-sabaqtas | (DS* ... v3), [MS*] | Complex sentences of time | (Когда ... DS*), [MS*] <==> [MS*], (когда ... DS*) |
Sebep bagynyngqy-sabaqtas | (DS* ... v4), [MS*] | Complex sentences of reason | (Поскольку ... DS*), [MS*] <==> [MS*], (поскольку ... DS*) |
Qimyl-syn bagynyngqy-sabaqtas | (DS* ... v5), [MS*] | | (DS*), [MS*] <==> [MS*], (DS*) |
Maqsat bagynyngqy-sabaqtas | (DS* ... v6), [MS*] | Complex sentences of purpose | (Чтобы ... DS*), [MS*] <==> [MS*], (чтобы ... DS*) |
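The order rules above (in Kazakh the dependent clause DS* precedes the main clause MS*, while Russian allows either order) suggest one edit a post-editor can apply when a machine translation keeps the Russian or English clause order. The sketch below covers only the simplest conditional case and is not the paper's Algorithm 2: `COND_VERB` is a hypothetical, much-reduced form of the conditional template, and `is_conditional_clause` and `move_dependent_first` are illustrative helpers.

```python
import re

# HYPOTHETICAL, much-reduced check for a Kazakh conditional dependent clause:
# its last word ends in -са/-се plus an optional personal ending.
COND_VERB = re.compile(r"\w+(са|се)(м|ң|ңыз|ңіз|қ|к|ңдар|ңдер|ңыздар|ңіздер)?")


def is_conditional_clause(clause: str) -> bool:
    words = clause.strip().rstrip(".!?").split()
    return bool(words) and COND_VERB.fullmatch(words[-1].lower()) is not None


def move_dependent_first(sentence: str) -> str:
    """Rewrite '[MS], (DS)' as '(DS), [MS]' when DS is a conditional clause."""
    sentence = sentence.strip()
    end = sentence[-1] if sentence and sentence[-1] in ".!?" else ""
    body = sentence[:-1] if end else sentence
    if body.count(",") != 1:
        return sentence  # only the two-clause case is handled in this sketch
    main, dep = (part.strip() for part in body.split(","))
    if not dep or not main or not is_conditional_clause(dep) or is_conditional_clause(main):
        return sentence  # no trailing conditional clause, or already in Kazakh order
    # Naive capitalization: would wrongly lower-case a proper noun starting MS.
    dep = dep[0].upper() + dep[1:]
    main = main[0].lower() + main[1:]
    return f"{dep}, {main}{end}"
```

For example, `move_dependent_first("Біз үйде қаламыз, егер жаңбыр жауса.")` returns `"Егер жаңбыр жауса, біз үйде қаламыз."` ("If it rains, we will stay at home."), while a sentence already in Kazakh order is returned unchanged. Splitting on a single comma is a deliberate simplification; an embedded clause, which the introduction lists as a difficulty, would need a parser.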
System | BLEU | WER | TER |
---|---|---|---|
YANDEX translator—Russian-Kazakh | 41 | 0.52 | 0.53 |
Proposed approach—Russian-Kazakh | 45 | 0.43 | 0.45 |
YANDEX translator—English-Kazakh | 43 | 0.58 | 0.58 |
Proposed approach—English-Kazakh | 46 | 0.49 | 0.51 |
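WER and TER in the table are edit-distance measures normalized by reference length, so lower is better, whereas BLEU measures n-gram overlap, so higher is better. A minimal word-level WER function is sketched below for illustration; it is not the evaluation script behind the reported scores, and `wer` is an illustrative name.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


if __name__ == "__main__":
    print(wer("жаңбыр жауса біз үйде қаламыз", "біз үйде қаламыз жаңбыр жауса"))
```

For the reference жаңбыр жауса біз үйде қаламыз and a hypothesis with the two clauses swapped, `wer` returns 0.8 (two deletions and two insertions over five reference words). Under the standard TER definition, moving the two-word block counts as a single shift, giving 1/5 = 0.2; this is one reason WER and TER can diverge on sentences whose clauses are reordered.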
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Turganbayeva, A.; Rakhimova, D.; Karyukin, V.; Karibayeva, A.; Turarbek, A. Semantic Connections in the Complex Sentences for Post-Editing Machine Translation in the Kazakh Language. Information 2022, 13, 411. https://doi.org/10.3390/info13090411