Article | Open Access | 20 September 2023

Machine Translation of Electrical Terminology Constraints

1 School of Electrical Engineering, Henan University of Science and Technology, Luoyang 471023, China
2 Henan Province New Energy Vehicle Power Electronics and Power Transmission Engineering Research Center, Luoyang 471023, China
3 School of Foreign Languages, Henan University of Science and Technology, Luoyang 471023, China
* Author to whom correspondence should be addressed.
This article belongs to the Section Artificial Intelligence

Abstract

In practical applications, the accuracy of domain terminology translation is an important criterion for evaluating the performance of domain machine translation models. To address the phrase mismatches and improper translations caused by word-by-word translation of English terminology phrases, this paper constructs a dictionary of terminology phrases in the field of electrical engineering and proposes three schemes for integrating the dictionary knowledge into the translation model. Scheme 1 replaces the terminology phrases of the source language. Scheme 2 adds a residual connection at the encoder after the terminology phrases are replaced. Scheme 3 segments the target language by combining character segmentation with terminology segmentation and uses an additional loss module during training. The results show that all three schemes are superior to the baseline model in two respects: BLEU score and the correct translation rate of terminology words. On the test set, terminology accuracy is up to 48.3% higher than the baseline model, and the BLEU score is up to 3.6 points higher. These results are also analyzed and discussed in this paper.

1. Introduction

Machine translation [1] is the process of using a computer to convert a source language into a target language. As a cross-language communication tool, machine translation plays an increasingly important role in people's daily lives and has become one of the most active topics in natural language processing research [2,3]. In recent years, deep learning [4] has made major breakthroughs, and neural machine translation [5], which builds on deep learning, has gradually replaced statistical machine translation [6] thanks to its excellent translation quality, becoming the mainstream machine translation method in academia and seeing wide use in industry.
Over its roughly 70-year history, the methodology of machine translation has undergone several major changes: from rule-based machine translation [7], to statistical machine translation [6] at the beginning of the big-data era, to today's neural machine translation, research on machine translation technology has never been interrupted. Machine translation in the general-purpose domain, where parallel corpus resources are abundant, has achieved good results and brought great convenience to translation work such as everyday language communication. However, neural machine translation systems handle low-frequency words poorly. Terminology words in specialized-domain literature are low-frequency words that carry important professional information, and whether they are translated correctly often has an important influence on the credibility of the translated text [8].
In current social and economic development, electricity touches all aspects of our lives, so machine translation for the electrical field is of great significance for promoting the progress and development of the electrical industry. The literature of the electrical field contains many terminologies, and current translation models often fail to produce the correct translation. Therefore, research on machine translation in the electrical field is of great significance.
The main contributions of this paper are as follows.
This paper takes a parallel corpus in the electrical domain as the data set and constructs a terminology phrase dictionary; on this basis, three schemes are given for using the term phrase dictionary to constrain the translation model and improve its translation quality. Scheme 1 replaces the terminology phrases of the source language; Scheme 2 adds a residual connection at the encoder after the terminology phrases are replaced; Scheme 3 segments the target language by combining character segmentation with terminology segmentation and uses an additional loss module during training. All three schemes are superior to the baseline model in terms of BLEU score and the correct translation rate of term words. The results show that the proposed methods enable the translation model to effectively learn the knowledge of the external terminology phrase dictionary and improve the quality of machine translation in the field of electrical engineering.

3. Transformer-Based Neural Machine Translation Model

Transformer [32] is a neural machine translation model based on the attention mechanism, proposed by the Google team in 2017. The model removes the sequential-computation constraint of recurrent neural networks and has achieved remarkable results in machine translation, becoming the mainstream network architecture for neural machine translation. This paper selects the Transformer model for its experiments. Section 3.1 introduces the traditional Transformer model; Section 3.2 introduces the improved Transformer used in Scheme 2 and Scheme 3 proposed in this paper.

3.1. Traditional Transformer

The Transformer uses an encoder-decoder structure, as shown in Figure 1. The encoder and decoder each have N = 6 layers; every layer has the same structure, but parameters are not shared.
Figure 1. Transformer structure.
Each layer of the encoder contains two sub-layers: a self-attention sub-layer followed by a feed-forward network sub-layer. The attention mechanism in the Transformer is formalized as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V, \quad (1)$$
where $Q \in \mathbb{R}^{n_q \times d_k}$, $K \in \mathbb{R}^{n_k \times d_k}$, and $V \in \mathbb{R}^{n_k \times d_v}$; the self-attention mechanism is essentially the case of $Q = K = V$ in Formula (1).
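To make Formula (1) concrete, here is a minimal NumPy sketch of scaled dot-product attention; the function names and the random example are illustrative, not taken from any particular implementation.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    # Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v), as in Formula (1).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (n_q, n_k)
    return softmax(scores) @ V        # (n_q, d_v)

# Self-attention is the special case Q = K = V:
X = np.random.randn(5, 64)
out = attention(X, X, X)              # (5, 64)
```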
Each sub-layer is followed by a residual connection and a layer normalization operation:
$$\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x)), \quad (2)$$
where $\mathrm{Sublayer}(x)$ represents the self-attention sub-layer or the feed-forward network sub-layer.
The output dimensions of all sub-layers in the network are equal. Each layer of the decoder contains three sub-layers: a cross-attention sub-layer is added between the self-attention sub-layer and the feed-forward network sub-layer. In cross-attention, the queries come from the output of the decoder's self-attention sub-layer, while the keys and values come from the output of the encoder, so that every position in the decoder can attend to all positions in the encoder. The decoder's self-attention sub-layer differs from the encoder's: it adds a mask so that each position can attend only to the current and preceding positions, ensuring that predicting the next word depends only on words already generated. The Transformer uses a parameter-free functional position encoding to preserve word positions in the sequence and thereby capture order relationships in the language:
$$P(p, 2i) = \sin\!\left(p / 10000^{2i/d_{\mathrm{model}}}\right), \quad (3)$$
$$P(p, 2i+1) = \cos\!\left(p / 10000^{2i/d_{\mathrm{model}}}\right), \quad (4)$$
where $p$ is the position and $i$ is the dimension index. This functional position encoding also enables the model to generalize to sequence lengths not seen during training [32].
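A minimal sketch of the sinusoidal position encoding in Formulas (3) and (4), assuming an even model dimension; the function name is ours.

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    # P(p, 2i) = sin(p / 10000^(2i/d_model)), P(p, 2i+1) = cos(...), Formulas (3)-(4).
    p = np.arange(max_len)[:, None]          # positions, shape (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]    # even dimension indices 2i
    angles = p / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions
    pe[:, 1::2] = np.cos(angles)             # odd dimensions
    return pe

pe = positional_encoding(max_len=128, d_model=512)   # added to the word embeddings
```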

3.2. The Improved Transformer in Our Scheme

For a source-language sentence, the vector representation input to the encoder is obtained by adding the word embeddings and the position encodings. Since the encoder is a multi-layer stacked structure, its internal layers tend to behave like those of the ELMo [33] model: the output vectors of different layers emphasize different aspects of syntax and meaning, with layers near the top focusing more on grammatical information. As a result, the annotation information from the external dictionary and the semantic information of the term itself tend to be lost. Therefore, this paper adds a residual connection structure to the encoder, as shown in Figure 2.
Figure 2. Residual connection encoder.
The residual connection structure fuses the position-encoded source-language word vectors with the output vector of the last encoder layer, as shown in Equation (5).
$$C_{\mathrm{out}} = C_{l_6} + C_{\mathrm{emb}}, \quad (5)$$
where $C_{\mathrm{out}}$ is the output vector of the encoder, $C_{l_6}$ is the output vector of the sixth (final) encoder layer, and $C_{\mathrm{emb}}$ is the output vector of the embedding layer.
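A minimal sketch of Equation (5), assuming `encoder_layers` is the stack of six encoder layers, each a callable mapping hidden states to hidden states; the names are illustrative.

```python
from typing import Callable, Sequence
import numpy as np

def encode_with_residual(c_emb: np.ndarray,
                         encoder_layers: Sequence[Callable[[np.ndarray], np.ndarray]]
                         ) -> np.ndarray:
    # c_emb: word embeddings plus position encodings (the embedding-layer output).
    h = c_emb
    for layer in encoder_layers:   # the six stacked encoder layers
        h = layer(h)               # after the loop, h is C_l6
    return h + c_emb               # C_out = C_l6 + C_emb, Equation (5)
```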
The residual connection structure ensures that the word vectors input to the decoder contain both the label information of the term phrase dictionary and the word meaning information. Experiments show that the improved model improves the translation quality of term words. On the decoder side, to adapt the model to the Character&Term scheme proposed in this paper, an additional term-word loss module is added, as shown in Figure 3. This module encourages the translation model to focus on the translation of terminologies; similar methods have proven effective in Reference [24]. It is defined as:
$$J(\theta) = \arg\max_{\theta}\left\{P(y \mid x; \theta) + \lambda \, P(b \mid x; \theta)\right\}, \quad (6)$$
where $y$ is the target reference translation and $b$ is the term-word sequence generated using the Character&Term scheme. The hyper-parameter $\lambda$ is set to 0.5 in the comparative experiments of this paper. The loss module introduces no new parameters; it only changes the loss calculation during training of the standard NMT model.
Figure 3. Terminology word additional loss module.
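In practice, NMT toolkits work with negative log-likelihoods, so a minimal sketch of the objective in Equation (6) combines the two loss terms with weight λ; the function and argument names are illustrative.

```python
def term_augmented_loss(nll_reference: float, nll_terms: float,
                        lam: float = 0.5) -> float:
    """Combined training loss corresponding to Equation (6).

    nll_reference: negative log-likelihood of the full reference translation y.
    nll_terms:     negative log-likelihood of the term-word sequence b
                   produced by the Character&Term segmentation.
    Minimizing this sum maximizes log P(y|x) + lam * log P(b|x).
    """
    return nll_reference + lam * nll_terms
```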

4. Scheme

To solve the terminology translation problem, this paper constructs a terminology dictionary for the electrical field and uses the dictionary information to impose constraints on the corpus on both the source-language and target-language sides. In addition, a residual connection structure is added to the encoder of the model so that the constraint information is better utilized. On this basis, and according to different usage environments, this paper proposes three schemes that let dictionary knowledge effectively constrain the model.

4.1. Scheme 1: Replace the Source Language Term Phrase (B&O-Replace)

This scheme replaces the corresponding English terms in the source language with the Chinese terms from the term dictionary and places the labels <B> and <O> at the beginning and end of the inserted target-language phrase, realizing the dictionary-information constraint. For example, the source sentence 'Clearances, creepage distances and solid insulation.' contains the terms 'creepage distances' and 'solid insulation', whose translations in the terminology dictionary are '爬电距离' and '固体绝缘'. After term substitution, the source sentence becomes 'Clearances, <B>爬电距离<O> and <B>固体绝缘<O>.'. Notably, the scheme imposes constraints only at the data level yet works well; it can be applied in any domain and any scenario with any translation model.
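A minimal sketch of the B&O-Replace preprocessing, assuming exact string matching against the dictionary; the longest-match-first ordering is our assumption, added to avoid a shorter entry matching inside a longer one.

```python
def bo_replace(sentence: str, term_dict: dict[str, str]) -> str:
    # Replace each English term phrase with its Chinese translation,
    # wrapped in <B>...<O> tags; longer phrases are replaced first.
    for en, zh in sorted(term_dict.items(), key=lambda kv: -len(kv[0])):
        sentence = sentence.replace(en, f"<B>{zh}<O>")
    return sentence

term_dict = {"creepage distances": "爬电距离", "solid insulation": "固体绝缘"}
src = "Clearances, creepage distances and solid insulation."
print(bo_replace(src, term_dict))
# Clearances, <B>爬电距离<O> and <B>固体绝缘<O>.
```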

4.2. Scheme 2: Use Residual Connection at Encoder after Term Phrase Replacement (Res + B&O-Replace)

Building on Scheme 1, this scheme improves the mainstream Transformer translation model to make better use of the constraint information; the specific model changes are described in Section 3.2. Our experiments show that this scheme significantly improves the translation accuracy of term words and can likewise be used in any domain and scenario.

4.3. Scheme 3: The Target Language Uses Dictionary Constraints at the Same Time (Character&Term)

This scheme is designed for English-Chinese machine translation scenarios. Since current word segmentation methods lose term information on the Chinese side, dictionary information is used to constrain the Chinese side during segmentation. Specifically, in the target sentence '电气间隙、爬电距离和固体绝缘', '爬电距离' and '固体绝缘' are term words in the dictionary, and the constrained segmentation result is '电\0气\0间\0隙\0、\0爬电距离\0和\0固体绝缘'. Here, '\0' represents a space character and is not input to the model as label information. To strengthen the term constraint on the target language, we add an additional loss module at the end of the decoder. This scheme significantly improves the BLEU score of translation results in the English-Chinese machine translation scenario.
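A minimal sketch of the Character&Term segmentation, assuming longest-match-first lookup in the term dictionary; segments are joined here with ordinary spaces rather than the '\0' notation used above.

```python
def character_term_segment(sentence: str, terms: list[str]) -> str:
    # Split the Chinese sentence into single characters, except that
    # phrases found in the term dictionary are kept as whole segments.
    terms = sorted(terms, key=len, reverse=True)   # match longest terms first
    segments, i = [], 0
    while i < len(sentence):
        for t in terms:
            if sentence.startswith(t, i):          # a dictionary term starts here
                segments.append(t)
                i += len(t)
                break
        else:
            segments.append(sentence[i])           # fall back to one character
            i += 1
    return " ".join(segments)

print(character_term_segment("电气间隙、爬电距离和固体绝缘", ["爬电距离", "固体绝缘"]))
# 电 气 间 隙 、 爬电距离 和 固体绝缘
```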

5. Experiment

All experiments in this paper are based on the Transformer model. Multiple sets of data-preprocessing experiments and model-comparison experiments verify the effectiveness of the proposed schemes.

5.1. Data Preprocessing

To realize machine translation in the field of electrical engineering, all experiments in this paper use a Chinese-English parallel corpus in the electrical field. The content mainly comes from Chinese and English materials collected in the field, including professional books and documents, related technical forums, and official websites. The training set contains about 190,000 bilingual parallel sentence pairs; the validation set and the test set each contain 2000 bilingual parallel sentence pairs. Table 1 shows examples of the parallel sentence pairs in the corpus.
Table 1. Examples of parallel sentence pairs.
The external knowledge used in the experiments is a dictionary containing 38,859 pairs of bilingual term phrases in the electrical field; the term pairs in the dictionary appear in the training, validation, and test sets. The dictionary is derived from a larger dictionary containing 200,000 pairs of bilingual term words in the electrical field. The specific screening steps are as follows: 1. Retain only English terms that are phrases. Because this paper targets the phrase mismatches and improper translations caused by word-by-word translation of English term phrases in English-Chinese machine translation, only the English phrase entries of the dictionary are kept to build the phrase dictionary. 2. Retain only the term pairs whose Chinese side appears in the target-language sentences of the corpus. Table 2 shows examples of term phrases in the phrase dictionary.
Table 2. Example of term phrases.
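A minimal sketch of the two screening steps described above, assuming the full dictionary is held as an English-to-Chinese mapping; a production version would index the corpus rather than scan it once per term.

```python
def build_phrase_dictionary(full_dict: dict[str, str],
                            target_sentences: list[str]) -> dict[str, str]:
    # Step 1: keep only English terms that are multi-word phrases.
    # Step 2: keep only pairs whose Chinese side occurs in the corpus.
    return {
        en: zh
        for en, zh in full_dict.items()
        if len(en.split()) > 1                              # phrase, not a single word
        and any(zh in sent for sent in target_sentences)    # Chinese term appears in corpus
    }
```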
When the term phrase dictionary information is integrated into the English source side, the source-language phrases contained in the dictionary are replaced with their corresponding target-language phrases in the parallel sentence pairs, and the tags <B> and <O> are placed at the beginning and end of those target-language phrases on the source side. Finally, the NLTK tool is used for tokenization before input to the model. The target-language side is processed in two different ways for the comparative experiments. In the baseline model and the control group, the target language is segmented with the Jieba tool. To increase the model's recognition of target-language term words, this paper designs a segmentation method that combines character segmentation with the term dictionary: the parts of the target sentence outside term words are input to the model character by character. Table 3 shows the parallel sentence pairs processed in the different ways.
Table 3. Examples of parallel sentence pair processing results.
Source is the source-language sentence; B&O-Replace is the source-language term phrase replacement method proposed in this paper; Target is the target-language sentence; Jieba means segmentation with the Jieba tool; Character&Term is the segmentation method proposed in this paper, which combines character segmentation with the term dictionary. Note that the term 'Clearances' in the English sentence is a single word, so it poses no mismatch problem; compared with term phrases it is easier for the model to learn and is therefore excluded from annotation. The '\0' represents a space character and is not input to the model as label information. With the encoder structure designed in this paper, the training process after processing the data with B&O-Replace and Character&Term is shown in Figure 4.
Figure 4. Training process.

5.2. Experimental Settings

In this paper, the open-source NMT system OpenNMT is used to implement the baseline Transformer model. For data set processing, sentence length is limited to 100; that is, sentences longer than 100 tokens are filtered out. The vocabulary size is set to 60,000 with a shared vocabulary, and words outside the vocabulary are represented by <UNK>. During training, both the word-vector dimension and the hidden-layer dimension inside the encoder and decoder are set to 512, batch_size is set to 64, and the number of multi-head attention heads h is set to 8. The Adam optimization algorithm is used, and the dropout probability is set to 0.1. A total of 30,000 steps are trained, and the model is validated every 2500 steps. Beam search is used in decoding with beam_size set to 5; the hyper-parameter λ in the additional loss module is set to 0.5 based on experience, and the remaining parameters use the OpenNMT defaults.
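For reference, the settings above gathered in one place as a plain Python dictionary; the key names are illustrative and are not verbatim OpenNMT configuration options.

```python
# Main experimental settings (Section 5.2); key names are illustrative.
settings = {
    "max_sentence_length": 100,   # longer sentences are filtered out
    "vocab_size": 60000,          # shared source/target vocabulary
    "unk_token": "<UNK>",
    "embedding_dim": 512,
    "hidden_dim": 512,
    "batch_size": 64,
    "attention_heads": 8,
    "optimizer": "adam",
    "dropout": 0.1,
    "train_steps": 30000,
    "valid_steps": 2500,
    "beam_size": 5,
    "lambda_term_loss": 0.5,      # hyper-parameter of the additional loss module
}
```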
All other experimental parameters are kept consistent, and the BLEU score of the translation results and the correct translation rate of term words are evaluated together. The 2000 sentence pairs in the test set contain 1626 term words from the term phrase dictionary. The correct translation rate of term words is the percentage of correctly translated terms generated in the output relative to the total number of terms in the test set. Specifically, as shown in Formula (7), $P_b$ is the translation accuracy of terminologies, $n(x_b)$ is the number of correctly translated terminologies in the translation results, and $n(y_b)$ is the number of terminologies in the reference answers.
$$P_b = \frac{n(x_b)}{n(y_b)}. \quad (7)$$
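One plausible realization of Formula (7); the paper does not spell out the exact matching procedure, so this sketch counts, per sentence, at most as many occurrences of each term as appear in the reference.

```python
def term_accuracy(hypotheses: list[str], references: list[str],
                  terms: list[str]) -> float:
    # P_b = n(x_b) / n(y_b), Formula (7): correctly produced term
    # occurrences in the output over term occurrences in the references.
    n_hyp = sum(min(hyp.count(t), ref.count(t))
                for hyp, ref in zip(hypotheses, references)
                for t in terms)
    n_ref = sum(ref.count(t) for ref in references for t in terms)
    return n_hyp / n_ref if n_ref else 0.0
```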

5.3. Comparative Experiment

Baseline model [32]: Using the original Transformer structure, neither the source language nor the target language is marked.
Wu et al. [20]: a method of fusing source-language context features. We extract the context features of the term words in the term dictionary following the experimental steps of Wu et al., use '1 prev, 1 next' as the context vector, and use Wu's best C mode for fusion. The experimental settings match those of Wu: the embedding dimension, feed-forward layer dimension, and number of encoder modules are 512, 2048, and 6, respectively, and neither the source language nor the target language undergoes any other preprocessing.
Xiong et al. [12]: a method of incorporating label information into the model. We label the term phrase dictionary following Xiong et al. and adopt their third fusion method, fusing the label information of an additional embedding layer with the model. Neither the source language nor the target language undergoes any other preprocessing.
Hu et al. [21]: a method of fusing key information through an additional encoder. Following Hu et al., we integrate the terms of the term dictionary as key information into the model by adding an encoder. Neither the source language nor the target language undergoes any other preprocessing.

5.4. Experimental Results

We carried out experiments targeting the phrase mismatches and improper translations caused by word-by-word translation of English terminologies in English-Chinese machine translation in the field of electrical engineering. The source and target languages integrated the terminology phrase dictionary knowledge in different ways, and the encoder structure was changed to make better use of the dictionary information. Three schemes for integrating dictionary knowledge were proposed: B&O-Replace, Res + B&O-Replace, and Character&Term. The experimental results are shown in Table 4 ('↑' represents an improvement over the baseline model).
Table 4. Experimental Results.
The results show that the BLEU scores of the three proposed schemes are 0.1, 0.84, and 3.6 higher than the baseline model, and the correct rate of terminology translation is 5.7%, 44.5%, and 48.3% higher. This demonstrates that all three schemes improve BLEU and term accuracy over the baseline model and are competitive with the latest models proposed by peers.

5.5. Results Analysis

To show the effectiveness of the proposed schemes intuitively, Figures 5 and 6 plot the BLEU score and the correct translation rate of term words, respectively, for the different models on the test set.
Figure 5. BLEU value comparison [13,21,22].
Figure 6. Comparison of correct translation rate of terminological words [13,21,22].
From Table 4 and Figure 5, when the B&O-Replace scheme is used to fuse term dictionary knowledge, the BLEU score of the translation results is 0.1 higher than the baseline model, indicating that the marking method used in this paper can effectively integrate external dictionary knowledge into the model. With the Res + B&O-Replace scheme, the BLEU score increases by 0.84 over the baseline, a larger gain than that of B&O-Replace. This shows that adding a residual connection between the labeled word-embedding vectors and the top layer of the encoder effectively fuses the information carried by the two: it preserves the top-level grammatical information while making full use of the underlying semantic information, so the semantic information in the labels is learned more thoroughly by the model and the translations become more accurate. With the Character&Term scheme, the BLEU score increases by 3.6, indicating that, on top of integrating prior knowledge in the source language, also integrating it in the target language further improves the symmetry between encoder and decoder information and thereby the utilization of the prior knowledge.
From Table 4 and Figure 6, when the B&O-Replace scheme is used, term accuracy on the test set is 5.7% higher than the baseline model, indicating that after tagged external dictionary information is added to the source language, the model can effectively learn the word meaning information in the labels and thus translate term words more accurately. With the Res + B&O-Replace scheme, term accuracy on the test set reaches 91.3%, which is 44.5% higher than the baseline, indicating that adding a residual connection between the tagged word-embedding vectors and the top layer of the encoder lets the model learn the word meaning information in the labels more fully. This scheme improves term accuracy by 38.8% over B&O-Replace without the residual connection, proving that feeding the label information directly into the decoder via the residual connection improves the utilization of label information more effectively; it also shows that after the six-layer encoder, the label information from the embedding layer is partially lost. With the Character&Term scheme, term accuracy reaches 95.1%, close to Res + B&O-Replace but still 3.8% higher, showing that fusing target-language term labels with the additional loss module on the decoder side can further improve the accuracy of terms in the translation results.
In the comparison experiments, the methods of Wu et al. and Hu et al. improve the BLEU score and term accuracy over the baseline model, but by less than the schemes proposed in this paper. The BLEU score and term accuracy of Xiong et al.'s method are lower than those of the baseline model. We speculate that this stems from the difference between English phrases and Chinese words: when the English side is tokenized by spaces before the embedding layer, a phrase is split across multiple vectors, so the source-language term phrase cannot be well aligned with the label information during embedding, resulting in an information mismatch.

5.6. Preprocessing Comparison Experiment

Dong et al. [10] and Xiong et al. [12] both preprocessed the corpus by direct splicing. The preprocessing used in our experiments instead replaces the corresponding position in the source language with the labeled target-language term words; this replacement method draws on the research experience of Dong et al. while accounting for the linguistic characteristics of electrical terms. To demonstrate the superiority of our preprocessing scheme, we designed a comparative experiment against the direct splicing method. The corpus processed with direct splicing is shown in Table 5.
Table 5. Preprocessing method of direct splicing.
We substitute this preprocessed corpus into the three proposed schemes; the experimental results are shown in Table 6 ('↓' represents a decrease compared to the baseline model).
Table 6. Preprocessing comparison experimental results.
The results show that when the prior knowledge of the term dictionary is directly spliced into the corresponding positions of the source language, both the BLEU score and the term accuracy decrease and are lower than with the preprocessing scheme used in this paper. A possible reason is that the English side of the dictionary used in this paper consists of phrases of more than one word; direct splicing disturbs both the use of the prior knowledge and the sentence structure of the source-language sequence.

5.7. Hyper-Parameter λ Fine-Tuning and Ablation Experiment

To adapt to the fusion of terminologies in the target language in Scheme 3, an additional loss module with hyper-parameter λ is added in this paper. Figures 7 and 8 show how BLEU and the correct rate of term words change as the hyper-parameter λ is fine-tuned; λ = 0 means the additional loss module is not used. When λ increases from 0 to 0.5, the BLEU score and the correct rate of term words both improve: BLEU increases by 0.53 and the correct rate of term words by 1.5%, proving that the additional loss module improves the translation performance of the model. When λ exceeds 0.5, the BLEU score gradually decreases, indicating that over-biasing translation toward term words weakens the model's ability to translate non-term words, which is consistent with Chen et al. [24]. Meanwhile, as λ grows, term accuracy stays stable at 94.9-95.2%. We therefore set λ to 0.5 in testing.
Figure 7. The influence of hyper-parameter λ on BLEU.
Figure 8. The effect of hyper-parameter λ on the accuracy of term words.

5.8. Example Analysis

Table 7 compares some test set reference sentences, the baseline model's translations, and the translations of the models that integrate term phrase dictionary knowledge. Compared with the baseline, the translations of B&O-Replace, Res + B&O-Replace, and Character&Term are not only more fluent but also contain fewer <UNK> tokens, and the term words in the sentences are translated well. All three schemes correctly translate the term 'rectification unit', which the baseline cannot, proving the effective integration of prior knowledge. In the translations of the Res + B&O-Replace scheme, the terms 'combined power generation system' and 'heater', which the baseline cannot translate, are rendered as the synonyms 'hybrid power generation system' and 'thermal power generating unit'. In the translations of the Character&Term scheme, the terms 'solar photovoltaic' and 'heater', which the baseline cannot translate, are translated accurately. This shows that the proposed schemes improve the overall translation performance of the model while effectively translating the term words in the dictionary, reflected in the higher overall quality of the translation results.
Table 7. Examples of translation results.

5.9. Comprehensive Analysis

(1) Integrating prior knowledge into low-resource, professional-domain corpora is an effective way to improve translation quality. Experiments show that the three proposed schemes improve both the BLEU score and the term accuracy of machine translation;
(2) Comparing the three schemes, the BLEU score improves most under Scheme 3, because prior knowledge is integrated on both the source-language and target-language sides, which effectively prevents the label knowledge from being blurred or forgotten after the model's six-layer encoder. In terms of term accuracy, Scheme 2 and Scheme 3 both improve much more over the baseline than Scheme 1, indicating that prior knowledge added in the preprocessing stage is partially lost after the model's multiple attention layers, and that our model modifications effectively prevent this loss. In the test set translations, all three schemes can correctly translate term words that the baseline cannot, and Scheme 2 and Scheme 3 outperform Scheme 1, further showing that the model improvements increase the utilization of prior knowledge;
(3) The comparative experiments show that different types of terms require different preprocessing methods to suit the translation model. For example, single-word terms, term phrases, and professional term expressions in full sentences may need classified or customized processing to obtain better translations. This will be a future research direction of interest.

6. Conclusions

This paper constructs a term phrase dictionary in the field of electrical engineering and proposes three schemes that use dictionary knowledge to constrain translation models. The first scheme replaces source-language terms and marks them with <B> and <O>; compared with the baseline model, the BLEU score of the translation results increases by 0.1 and term accuracy by 5.7%. Its advantage is that it is simple, easy to implement, and applicable to any model. The second scheme likewise replaces and marks terms at the source-language end and adds a residual connection structure at the encoder; compared with the baseline model, the BLEU score increases by 0.84 and term accuracy by 44.5%, reaching a correct term translation rate of 91.3%, and the model also produces translations of higher overall quality. The third scheme segments the Chinese side by combining characters and terms and adds an additional loss module; compared with the baseline model, the BLEU score increases by 3.6 and term accuracy by 48.3%. This method is particularly effective when the target language is Chinese. The experimental results and translation examples demonstrate the effectiveness of the proposed schemes: by integrating the term phrase dictionary, the phrase mismatches and improper translations caused by word-by-word translation of English term phrases are alleviated.
In future work, we will build a better terminology dictionary for the electrical field that contains terms of various forms, continue to improve the methods for using dictionary knowledge, and explore effective ways to further improve neural machine translation in professional domains.

Author Contributions

Research conceptualization: Z.W.; Model building: Z.W.; Data collection: Z.W. and Y.C.; Experiment design: Z.W., Y.C. and J.Z.; Manuscript preparation: Z.W.; Manuscript review: Z.W., Y.C. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Juwei Zhang, grant number U2004163, and the APC was funded by Juwei Zhang.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data sets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhao, T.-J. Principles of Machine Translation; Harbin Institute of Technology Press: Harbin, China, 2000. [Google Scholar]
  2. Zong, C.-Q. Statistical Natural Language Processing; Qinghua University Press: Beijing, China, 2013. [Google Scholar]
  3. Yuan, C.-F.; Li, W.; Li, Q.-Z. Fundamentals of Statistical Natural Language Processing; Electronic Industry Press: Beijing, China, 2005. [Google Scholar]
  4. Yin, B.-C.; Wang, W.-T.; Wang, L.-C. Review of deep learning research. J. Beijing Univ. Technol. 2015, 41, 48–59. [Google Scholar]
  5. Li, Y.-C.; Xiong, D.-Y.; Zhang, M. Summary of Neural Machine Translation. Comput. J. 2018, 41, 2734–2755. [Google Scholar]
  6. Liu, Q. Review of Statistical Machine Translation. Chin. J. Inf. 2003, 1, 12. [Google Scholar]
  7. Yuan, X.-Y. Summary of rule-based machine translation technology. J. Chongqing Univ. Arts Sci. Nat. Sci. Ed. 2011, 30, 56–59. [Google Scholar]
  8. Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv 2016, arXiv:1609.08144. [Google Scholar]
  9. Crego, J.; Kim, J.; Klein, G.; Rebollo, A.; Yang, K.; Senellart, J.; Akhanov, E.; Brunelle, P.; Coquard, A.; Deng, Y.-C.; et al. Systran’s pure neural machine translation systems. arXiv 2016, arXiv:1610.05540. [Google Scholar]
  10. Dong, Z.-H.; Ren, W.-P.; You, X.-D. Machine translation methods incorporating terminology knowledge in the field of new energy. Comput. Sci. 2022, 49, 305–312. [Google Scholar]
  11. Liu, Q.-F.; Liu, C.-X.; Wang, Y.-N. Personalized machine translation methods in the field of integrating external dictionary knowledge in the conference scene. Chin. J. Inf. 2019, 33, 31–37. [Google Scholar]
  12. Wang, T.; Kuang, S.; Xiong, D. Merging external bilingual pairs into neural machine translation. arXiv 2019, arXiv:1912.00567. [Google Scholar]
  13. Hokamp, C.; Liu, Q. Lexically constrained decoding for sequence generation using grid beam search. arXiv 2017, arXiv:1704.07138. [Google Scholar]
  14. Hasler, E.; De Gispert, A.; Iglesias, G.; Byrne, B. Neural machine translation decoding with terminology constraints. arXiv 2018, arXiv:1805.03750. [Google Scholar]
  15. Post, M.; Vilar, D. Fast lexically constrained decoding with dynamic beam allocation for neural machine translation. arXiv 2018, arXiv:1804.06609. [Google Scholar]
  16. Zhao, Y.; Zhang, J.; Zhou, Y. Knowledge graphs enhanced neural machine translation. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, International Joint Conferences on Artificial Intelligence Organization, Cape Town, South Africa, 11 July 2020; pp. 4039–4045. [Google Scholar]
  17. Feng, S.; Gangal, V.; Wei, J.; Chandar, S.; Vosoughi, S.; Mitamura, T.; Hovy, E. A survey of data augmentation approaches for NLP. In Findings of the Association for Computational Linguistics, ACL-IJCNLP, 2021; Online; Association for Computational Linguistics: Toronto, Canada, 2021; pp. 968–988. [Google Scholar]
  18. Sennrich, R.; Haddow, B.; Birch, A. Improving Neural Machine Translation Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Association for Computational Linguistics: Toronto, Canada, 2016; pp. 86–96. [Google Scholar]
  19. Currey, A.; Barone, A.V.M.; Heafield, K. Copied monolingual data improves low-resource neural machine translation. In Proceedings of the Second Conference on Machine Translation, Copenhagen, Denmark, 7–8 September 2017; pp. 148–156. [Google Scholar]
  20. Wu, X.; Xia, Y.; Zhu, J. A study of BERT for context-aware neural machine translation. Mach. Learn. 2022, 111, 917–935. [Google Scholar] [CrossRef]
  21. Hu, S.-J.; Li, X.-Y.; Bai, J.-Y. Neural Machine Translation by Fusing Key Information of Text. Comput. Mater. Contin. 2023, 74, 2. [Google Scholar] [CrossRef]
  22. Qing-dao-er-ji, R.; Cheng, K.; Pang, R. Research on Traditional Mongolian-Chinese Neural Machine Translation Based on Dependency Syntactic Information and Transformer Model. Appl. Sci. 2022, 12, 10074. [Google Scholar] [CrossRef]
  23. Li, F.-X.; Zhu, J.-B.; Yan, H.; Zhang, Z. Grammatically Derived Factual Relation Augmented Neural Machine Translation. Appl. Sci. 2022, 12, 6518. [Google Scholar] [CrossRef]
  24. Chen, K.-H.; Wang, R.; Utiyama, M.; Sumita, E. Content Word Aware Neural Machine Translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 358–364. [Google Scholar]
  25. Nguyen, T.; Nguyen, T. Heavyweight Statistical Alignment to Guide Neural Translation. Comput. Intell. Neurosci. 2022, 2022, 6856567. [Google Scholar] [CrossRef]
  26. Peng, R.; Lin, N.; Fang, Y.; Jiang, S.; Hao, T.; Chen, B.; Zhao, J. Deps-SAN: Neural Machine Translation with Dependency-Scaled Self-Attention Network. In Neural Information Processing (ICONIP 2022); Springer: Cham, Switzerland, 2023; pp. 26–37. [Google Scholar]
  27. Dunđer, I.; Seljan, S.; Pavlovski, M. Automatic Machine Translation of Poetry and a Low-Resource Language Pair. In Proceedings of the 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO), Opatija, Croatia, 28 September–2 October 2020; pp. 1034–1039. [Google Scholar]
  28. Seljan, S.; Dunđer, I.; Pavlovski, M. Human Quality Evaluation of Machine-Translated Poetry. In Proceedings of the 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO), Opatija, Croatia, 28 September–2 October 2020; pp. 1040–1045. [Google Scholar]
  29. Gašpar, A.; Seljan, S.; Kučiš, V. Measuring Terminology Consistency in Translated Corpora: Implementation of the Herfindahl-Hirshman Index. Information 2022, 13, 43. [Google Scholar] [CrossRef]
  30. Huang, S.-G.; Guo, J.-J.; Yu, Z.-T. Effective domain awareness and adaptation approach via mask substructure for multi-domain neural machine translation. Neural Comput. Appl. 2023, 35, 14047–14060. [Google Scholar] [CrossRef]
  31. Yu, Z.-Q.; Huang, Y.-X.; Guo, J.-J. Improving thai-lao neural machine translation with similarity lexicon. J. Intell. Fuzzy Syst. 2022, 42, 4005–4014. [Google Scholar] [CrossRef]
  32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
  33. Peters, M.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep Contextualized Word Representations. arXiv 2018, arXiv:1802.05365. [Google Scholar]
