2. Related Work
Accurate translation of domain terminology is critical in practical applications, and constraining translations with external knowledge is one way to improve terminology translation quality. To exploit pre-defined translations, Crego et al. [9] replaced named entities with placeholder tags on both the source and target sides, so that the model learns to translate the tags themselves. For example, the i-th named entity in a source sentence is replaced with 'Tag(i)', and at the output the placeholder is substituted with a pre-specified translation. The disadvantage of this method is that the named entities themselves are replaced by tags in the input sentences, so the input lacks the semantic information of the named entities, which harms the adequacy and fluency of the output. In the field of new energy, Dung et al. [10] used two methods, terminology substitution and splicing, to improve terminology translation quality in Chinese–English machine translation, and their results show that using labels during replacement and splicing achieves better results. Liu et al. [11] combined placeholders and splicing to improve terminology translation accuracy in Chinese–English conference-scene machine translation across three domains: sports, business, and medicine. Both [10] and [11] improved baseline models whose source language is Chinese, and did not address the problems of word-by-word translation of English phrases and phrase mismatch when the source language is English. Moreover, they did not modify the model structure when using the replacement-splicing method, which can cause the loss of label information. The method in this paper uses a bilingual phrase dictionary to replace terminology words, which solves the translation problem of English term phrases. Compared with placeholder methods, our method retains semantic information during data processing and adds a residual connection structure at the encoder end, which further exploits the lexical information of source-language terminology words.
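To make the placeholder scheme concrete, the following minimal Python sketch (with a hypothetical `term_dict` and naive string matching, not the actual pipeline of [9,10,11]) replaces dictionary terms with indexed tags before translation and restores the pre-specified translations afterwards:

```python
# Hypothetical bilingual term dictionary: source term -> pre-specified translation.
term_dict = {"fuel cell": "燃料电池", "photovoltaic module": "光伏组件"}

def mask_terms(src_sentence):
    """Replace each dictionary term in the sentence with an indexed tag 'Tag(i)'."""
    mapping = {}
    for i, (term, translation) in enumerate(term_dict.items()):
        if term in src_sentence:
            tag = f"Tag({i})"
            src_sentence = src_sentence.replace(term, tag)
            mapping[tag] = translation  # remember the pre-specified translation
    return src_sentence, mapping

def unmask_output(hypothesis, mapping):
    """Restore each placeholder in the model output to its pre-specified translation."""
    for tag, translation in mapping.items():
        hypothesis = hypothesis.replace(tag, translation)
    return hypothesis

masked, mapping = mask_terms("the fuel cell stack degrades over time")
# masked == "the Tag(0) stack degrades over time"; after decoding, the tag in
# the model output is restored via unmask_output(output, mapping).
```

As the paragraph above notes, the tag hides the term itself from the encoder, which is exactly why the input loses the term's semantic information.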
In work on English–Chinese translation, Xiong et al. [12] proposed a method for intervening in neural machine translation with bilingual dictionaries. To fuse translations from bilingual dictionaries into the neural machine translation model, Xiong et al. [13] proposed three methods: tagging, mixing phrases, and extra embedding. The first two methods intervene in the data preprocessing stage and augment the training data with words or phrases that appear in the bilingual dictionary. The extra-embedding method intervenes at the embedding layer: in addition to word embedding and position embedding, a label signal is fed to the embedding layer as an additional embedding. Combining the three methods achieves the best results.
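The extra-embedding intervention can be pictured as one more lookup table summed with the standard word and position embeddings. The PyTorch sketch below is an assumed simplification with illustrative dimensions, not the exact configuration of [13]; `label_ids` marks which tokens originate from the bilingual dictionary (0 = ordinary token, 1 = dictionary source term, 2 = dictionary translation):

```python
import torch
import torch.nn as nn

class EmbeddingWithLabelSignal(nn.Module):
    """Word + position + label embeddings, summed as the encoder input."""
    def __init__(self, vocab_size, max_len, num_labels=3, d_model=512):
        super().__init__()
        self.word = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.label = nn.Embedding(num_labels, d_model)  # the extra label signal

    def forward(self, token_ids, label_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.word(token_ids) + self.pos(positions) + self.label(label_ids)

emb = EmbeddingWithLabelSignal(vocab_size=32000, max_len=512)
tokens = torch.tensor([[5, 17, 901, 42]])
labels = torch.tensor([[0, 1, 2, 0]])  # token 17 is a source term, 901 its translation
x = emb(tokens, labels)                # shape: (1, 4, 512)
```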
Another mainstream approach applies constrained decoding algorithms, using external knowledge as hard constraints during decoding. Hokamp [13] proposed the grid beam search algorithm, which uses pre-specified vocabulary translations as constraints during beam search, at the cost of increasing the complexity of the standard decoding algorithm and the decoding time of the model. Hasler [14] used a multi-stack decoding method and proposed a constrained beam search algorithm, which uses word alignment to obtain the source-language words corresponding to the target-side constraints and constructs a finite-state automaton before decoding to guide constrained decoding. Post and Vilar [15] improved Hokamp's method with a dynamic beam allocation algorithm, which maintains a single beam of size k during decoding; this limits the decoding complexity and resolves the difficulty of combining grid beam search and constrained beam search with batch decoding and other operations. All three constrained decoding algorithms make structural changes to the decoder, have high complexity, and are difficult to integrate with other methods, which limits their applicability. In contrast, the method in this paper is simple and efficient: the main work is done on the data side, which makes it easy to integrate more advanced models and structures, and its decoding speed is better than that of constrained decoding. Because the semantic information contained in Chinese sentences is complex, hard-constrained decoding can destroy the overall semantic integrity of a sentence and harm its fluency. Our method cannot guarantee that the translation contains all the marked words, but its overall quality is higher than that of constrained decoding.
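At their core, these algorithms extend each beam hypothesis with a record of how many constraint tokens it has already produced and protect hypotheses that make progress on the constraints from pruning. The sketch below shows only this bookkeeping, with a toy stand-in for the decoder's next-token distribution; the full bank-allocation and batching details of [13,14,15] are omitted:

```python
import heapq

# Toy next-token log-probabilities: a stand-in for the NMT decoder's softmax.
def next_token_logprobs(prefix, vocab=("a", "b", "c")):
    return {tok: -float(i + 1) for i, tok in enumerate(vocab)}

def constrained_beam_search(constraints, beam_size=4, steps=5):
    """Each hypothesis is (tokens, score, n_met), where n_met counts how many
    constraint tokens have been produced. Keeping the best hypothesis per
    'bank' n_met prevents constrained continuations from being pruned early."""
    beams = [((), 0.0, 0)]
    for _ in range(steps):
        cands = set()
        for tokens, score, met in beams:
            for tok, logp in next_token_logprobs(tokens).items():
                new_met = met + int(met < len(constraints) and tok == constraints[met])
                cands.add((tokens + (tok,), score + logp, new_met))
        # One slot per bank (constraints met), remaining slots filled by score.
        per_bank = {}
        for c in sorted(cands, key=lambda c: -c[1]):
            per_bank.setdefault(c[2], c)
        rest = heapq.nlargest(beam_size, cands, key=lambda c: c[1])
        merged = list(per_bank.values()) + [c for c in rest if c not in per_bank.values()]
        beams = merged[:beam_size]
    satisfied = [b for b in beams if b[2] == len(constraints)]
    return max(satisfied or beams, key=lambda b: b[1])

best = constrained_beam_search(constraints=("b", "c"))
```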
Another way to exploit external knowledge for data expansion is knowledge graph fusion. Since knowledge graphs usually contain rich knowledge about named entities and entity relationships, they can be used to improve the named-entity translation ability of neural machine translation models. A knowledge graph contains a large number of named entities that do not appear in the translation model's training data; relative to the translation model, these can be called extra-domain entities, while entities that appear in both the training data and the knowledge graph can be called intra-domain entities. The translation of intra-domain entities can therefore guide the translation of extra-domain entities. Zhao et al. [16] applied knowledge representation learning to the source-language and target-language knowledge graphs to obtain vector representations of source and target entities. Entity translation pairs extracted from parallel sentence pairs serve as seed entity translation pairs and act as anchors for mapping the source and target entity representations into the same semantic space. After computing semantic distances in this space, translations of extra-domain entities are predicted to form derived entity pairs. When the semantic distance between a derived entity pair and a seed entity pair is less than a preset threshold λ, the derived pair replaces the seed pair in a parallel sentence pair to generate a pseudo-bilingual sentence pair. Finally, the pseudo-bilingual data is merged with the original data to complete the fusion of the knowledge graph.
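A hedged sketch of the substitution step in [16] follows, simplifying the distance criterion to nearest-neighbor distance between already-mapped entity vectors; all names and vectors are illustrative, not taken from their experiments:

```python
import numpy as np

# Assumed: entity vectors from both graphs already mapped into one semantic
# space using the seed translation pairs as anchors (values are illustrative).
src_vecs = {"solar_farm": np.array([0.90, 0.10]), "wind_turbine": np.array([0.20, 0.80])}
tgt_vecs = {"太阳能电站": np.array([0.88, 0.12]), "风力涡轮机": np.array([0.19, 0.83])}

def derive_entity_pairs(lam=0.1):
    """Pair each source entity with its nearest target entity, keeping the
    pair only if the semantic distance falls below the threshold lambda."""
    pairs = []
    for s, sv in src_vecs.items():
        t, dist = min(((t, float(np.linalg.norm(sv - tv))) for t, tv in tgt_vecs.items()),
                      key=lambda x: x[1])
        if dist < lam:
            pairs.append((s, t))
    return pairs

def make_pseudo_pair(src_sent, tgt_sent, seed, derived):
    """Substitute a seed entity pair in a parallel sentence with a derived
    pair, producing a pseudo-bilingual sentence pair."""
    return src_sent.replace(seed[0], derived[0]), tgt_sent.replace(seed[1], derived[1])

derived = derive_entity_pairs(lam=0.1)
# The pseudo-bilingual pairs are then merged with the original parallel data.
```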
Data augmentation is a general term for a class of methods that increase the diversity of training data by modifying existing data or synthesizing new data [17]. Sennrich et al. [18] proposed back-translation, which translates target-language monolingual data back into the source language and combines the resulting pseudo-parallel data with the original parallel data to train the forward translation system. Currey et al. [19] augmented the training data by copying target-language sentences to the source side, yielding an augmented corpus in which source and target contain the same sentence. This augmentation was shown to improve translation performance, especially for proper nouns and other low-frequency words that are identical in the source and target languages.
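Both augmentation schemes amount to constructing extra pseudo-parallel pairs, as the short sketch below illustrates; `reverse_model` is a hypothetical stand-in for a trained target-to-source system:

```python
def copy_augment(tgt_monolingual):
    """Currey et al. [19]: copy each target sentence to the source side,
    yielding pairs where source == target (helps proper nouns, rare words)."""
    return [(sent, sent) for sent in tgt_monolingual]

def back_translate(tgt_monolingual, reverse_model):
    """Sennrich et al. [18]: translate target monolingual data back into the
    source language with a target->source model to form pseudo-parallel data."""
    return [(reverse_model(sent), sent) for sent in tgt_monolingual]

# train_data = real_parallel + copy_augment(mono) + back_translate(mono, reverse_model)
```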
In recent years, there have been many new studies on fusing prior knowledge and enhancing key information in text. Wu et al. [20] used BERT to extract context features and proposed three ways to fuse context sequences with source-language sequences. Hu et al. [21] argued that accurate and complete translation of key information in a text ensures the quality of the translation results; their work fuses key information from the source language with the source sentences through a preset encoder, improving the translation of keywords. Ren [22] integrated grammatical information into Mongolian–Chinese machine translation: grammatical prior knowledge is injected into the model through two different grammar-auxiliary learning units preset in the encoder and decoder, which improves the translation quality of the model. Li [23] proposed a method to enhance factual relations, integrating factual relation information into the model as prior knowledge: a fact-relation mask matrix is constructed during encoding as the fact-relation representation of the source language, and during decoding this fact-relation representation is merged with the original sentence representation in the decoder. Chen et al. [24] classified source-language words into content words and function words by word frequency and designed a content-word-aware NMT model: content words are encoded as a new source representation, from which additional content-word context vectors are learned, and a content-word-specific loss over the target sentence is introduced. Nguyen et al. [25] used prior alignments to guide the training of the Transformer and tested the influence of different weights, finally proposing an 8-head Transformer-HA model that uses heavily weighted prior alignments to guide the attention heads. The disadvantage of this model, compared with our method, is that the prior knowledge is diluted for long sentences. Peng et al. [26] built a syntactic dependency matrix for each word from the syntactic dependency tree and integrated this quantified syntactic knowledge into the translation model to guide translation, effectively learning syntactic details and reducing the dispersion of attention scores.
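As one concrete instance of such fusion, the syntactic-knowledge idea of Peng et al. [26] can be pictured as a per-sentence dependency matrix added to the attention logits so that attention concentrates on syntactically related positions. The numpy sketch below is an assumed single-head simplification, not their exact formulation:

```python
import numpy as np

def dependency_biased_attention(Q, K, V, dep_matrix, alpha=1.0):
    """Scaled dot-product attention with a syntactic bias: dep_matrix[i, j]
    encodes the dependency relation between positions i and j (here simply
    1 if connected in the parse tree, else 0) and is added to the logits."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + alpha * dep_matrix
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 5, 8
Q = K = V = rng.normal(size=(n, d))
dep = np.eye(n)  # toy matrix; in practice derived from the dependency tree
out = dependency_biased_attention(Q, K, V, dep)  # shape (5, 8)
```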
Machine translation for low-resource and professional domains has also been a hot research topic in recent years. Dunđer et al. [27] created a data set comprising the works of a contemporary Croatian poet together with German translations of the poetry produced by two professional literary translators, and their results show the effectiveness of poetry machine translation in terms of dedicated automatic quality metrics. Using the same data set, Seljan et al. [28] performed a manual adequacy and fluency analysis of the machine-translated poetry, demonstrating the applicability of machine translation to the Croatian–German pair in the poetry domain. Gašpar et al. [29] proposed a method for assessing the terminological quality of texts: when the consistency of terms in a corpus is high, a high HHI score is obtained. Huang et al. [30] proposed a domain-aware NMT with mask substructures for multi-domain machine translation, in which the improved model adapts automatically in multi-domain neural machine translation: mask substructures in both the encoder and the decoder capture domain-specific representations, which effectively preserves the knowledge of each domain and addresses catastrophic forgetting. Yu [31] exploited the similarity between Thai and Lao to improve neural machine translation with a similarity dictionary: a bilingual similarity dictionary composed of similar word pairs is built, and an additional similarity-dictionary encoder is introduced. Both methods enhance specific word representations in low-resource machine translation and improve translation performance.
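The HHI used by Gašpar et al. [29] is, in its standard form, a sum of squared shares; applied to the translation variants of a term, a term rendered by a single variant scores 1.0, while competing variants lower the score. The sketch below illustrates this standard definition, which may differ in detail from their exact computation:

```python
from collections import Counter

def hhi(variants):
    """Herfindahl–Hirschman Index over the observed translation variants of
    one term: the sum of squared relative frequencies, in (0, 1]."""
    counts = Counter(variants)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())

hhi(["fuel cell"] * 10)                        # 1.0: perfectly consistent
hhi(["fuel cell"] * 5 + ["fuel battery"] * 5)  # 0.5: two competing variants
```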
In summary, existing research on machine translation in professional fields follows three main ideas: (a) mark term positions in the corpus so that the model enhances its learning at the marked positions [9,10,11,12,20]; (b) change the structure of the Transformer encoder to enhance the learning of term information [21,22,23,25,31]; (c) apply constrained decoding [13,14,15]. Although these works have contributed greatly to machine translation in low-resource fields, there is still room for improvement. Methods of class (a) affect the semantic information of the original sentence, which in turn affects the overall translation quality; in addition, marking information only in the corpus causes it to be diluted over repeated training of the model. The scheme proposed in this paper uses target-language term phrases when marking and replacing in the corpus, which does not interfere with the original content, and adds a residual connection structure to preserve the integrity of the semantic information and the labels. Methods of class (b) are generally more complex, and the added information is easily diluted over repeated training, which is especially evident when translating long sentences. The scheme proposed in this paper is simple and effective: it requires no additional complex structure, only a residual connection module and an additional loss module; the label information is well preserved by the residual connection module during training, and the effect does not decrease significantly even for long sentences. Methods of class (c) are hard decoding methods, which affect the translation of words outside the constraint dictionary and thus the fluency of sentences; moreover, their decoding time is markedly longer. The decoding time of the proposed scheme is close to that of the original model, and its overall translation quality is better than that of hard decoding. The experimental results show that the proposed method is effective and superior.
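For concreteness, the following PyTorch fragment shows one way a residual connection from the (term-marked) embedding layer to the encoder output can keep label information from being diluted across encoder layers; it is an illustrative sketch under assumed dimensions, not the exact module proposed in this paper:

```python
import torch
import torch.nn as nn

class EncoderWithEmbeddingResidual(nn.Module):
    """Transformer encoder plus a residual path from the (term-marked)
    embeddings to the encoder output, so source-side label and lexical
    information is reinjected rather than diluted across layers."""
    def __init__(self, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, emb):                         # emb: (batch, seq, d_model)
        return self.norm(self.encoder(emb) + emb)   # residual from embeddings

enc = EncoderWithEmbeddingResidual()
x = torch.randn(2, 7, 512)   # embeddings of a term-marked source sentence
h = enc(x)                   # (2, 7, 512), embedding information preserved
```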