A Diverse Data Augmentation Strategy for Low-Resource Neural Machine Translation
Abstract
1. Introduction
2. Related Work
3. Approach
3.1. Back-Translation and Self-Learning
3.2. Decoder Strategy
3.3. Training Strategy
Algorithm 1. Our data augmentation strategy. 

4. Experiments
4.1. IWSLT2014 EN–DE Translation Experiment
We denote Gao's work as SoftW: it randomly replaces a word's embedding with a weighted combination of the embeddings of multiple semantically similar words [26].
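The idea behind SoftW, as described above, can be illustrated with a minimal sketch. It is not the implementation from [26]: the function names `soft_embedding` and `soft_replace`, the toy two-dimensional embeddings, the hand-picked weights, and the replacement probability are all assumptions made for illustration only.

```python
import random


def soft_embedding(weights, embeddings):
    """Return the normalized weighted sum of the candidate word embeddings."""
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w / total * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(dim)
    ]


def soft_replace(sentence_embeddings, candidates, replace_prob, rng):
    """With probability replace_prob, swap a position's embedding for a soft one.

    candidates[i] is a (weights, embeddings) pair describing the words
    assumed to be semantically similar to the word at position i.
    """
    augmented = []
    for original, (weights, embs) in zip(sentence_embeddings, candidates):
        if rng.random() < replace_prob:
            augmented.append(soft_embedding(weights, embs))
        else:
            augmented.append(original)
    return augmented


# Toy input: a two-word sentence with two similar-word candidates per position.
sentence = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    ([0.5, 0.5], [[1.0, 0.0], [0.0, 2.0]]),
    ([3.0, 1.0], [[0.0, 4.0], [4.0, 0.0]]),
]

# replace_prob=1.0 forces every position to be replaced, for a deterministic demo.
augmented = soft_replace(sentence, candidates, replace_prob=1.0, rng=random.Random(0))
unchanged = soft_replace(sentence, candidates, replace_prob=0.0, rng=random.Random(0))
print(augmented)
```

In practice the weights would come from a model rather than being fixed by hand, and the replacement probability would be a tuned hyperparameter. The sketch shows only the weighted-combination step.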
4.2. Low-Resource Translation Tasks
5. Discussion
5.1. Copying the Original Data
5.2. Backward Data or Forward Data
5.3. The Number of Samplings
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
Model  DE–EN 

Base  34.72 
SW  34.70 * 
DW  35.13 * 
BW  35.37 * 
SmoothW  35.45 * 
LMW  35.40 * 
SoftW  35.78 * 
Our method  36.68 
Language pair  Train  Dev  Test 

TR–EN  0.35M  2.3k  4k 
SI–EN  0.4M  2.9k  2.7k 
NE–EN  0.56M  2.5k  2.8k 
Test  Test2011  Test2012  Test2013  Test2014  AVG gain 

Baseline  24.08  24.79  26.38  25.03  
Our model  25.55  26.11  28.26  26.41  1.51 
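
The value 1.51 in the last column of the table above is consistent with the mean BLEU improvement of our model over the baseline across the four test sets. The short check below recomputes it from the reported scores. The variable names are illustrative, and reading the column as an average gain is an inference from the numbers rather than something the table states.

```python
# BLEU scores copied from the TR-EN table above.
baseline = {"Test2011": 24.08, "Test2012": 24.79, "Test2013": 26.38, "Test2014": 25.03}
ours = {"Test2011": 25.55, "Test2012": 26.11, "Test2013": 28.26, "Test2014": 26.41}

# Per-test-set improvement of our model over the baseline.
gains = {name: ours[name] - baseline[name] for name in baseline}

# Mean improvement across the four test sets.
avg_gain = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} BLEU")
print(f"Average gain: +{avg_gain:.2f} BLEU")
```

The per-set gains are 1.47, 1.32, 1.88, and 1.38 BLEU, which average to about 1.51.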
Model  NE–EN  SI–EN 

Baseline  7.64  6.68 
Our model  8.92  8.21 
Method  DE–EN  TR–EN (Test2013) 

Baseline  34.72  26.38 
7Baseline  34.51  26.23 
Our method  36.68  28.26 
Method  DE–EN  TR–EN (Test2013) 

Baseline  34.72  26.38 
Random  36.73  28.27 
Our method  36.68  28.26 
Method  TR–EN (Test2011)  TR–EN (Test2014) 

Baseline  24.08  25.03 
Forward  24.42  25.07 
Backward  25.08  25.94 
Bidirectional  25.55  26.41 
Method  DE–EN  TR–EN (Test2013) 

Baseline  34.72  26.38 
Sample_5  36.68  28.27 
Sample_10  36.48  28.23 
Sample  36.59  27.86 
