# Humans Outperform Machines at the Bilingual Shannon Game


## Abstract


## 1. Introduction

In the Shannon Game, each character of the hidden text is paired with the number of guesses the player needed before predicting it correctly:

`T h e _ b r o k e n _ v`
`2 1 1 1 11 3 2 5 1 1 1 15`
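This reduction of a text to a guess-number sequence can be sketched as follows (the `ranked_guesses` predictor below is a hypothetical stand-in for any character-level model):

```python
def text_to_guess_numbers(text, ranked_guesses):
    """Reduce a text to its guess-number sequence: for each position,
    ranked_guesses(prefix) returns all candidate characters, best
    first, and we record the 1-based rank of the true character."""
    numbers = []
    for i, ch in enumerate(text):
        ranking = ranked_guesses(text[:i])
        numbers.append(ranking.index(ch) + 1)
    return numbers

# Toy predictor: ignores context and always guesses characters in
# English frequency order.
alphabet = "etaoinshrdlucmfwypvbgkjqxz _"
print(text_to_guess_numbers("the", lambda prefix: list(alphabet)))  # → [2, 8, 1]
```

A real guesser would condition its ranking on the prefix (and, in the bilingual condition, on the source sentence); the guess-number sequence is lossless, so the original text can be recovered from it.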

$$-\frac{1}{n}\log\left(\prod_{i=1}^{n} P(g_i)\right) \;=\; -\frac{1}{n}\sum_{i=1}^{n}\log P(g_i) \;=\; -\sum_{j=1}^{95} P(j)\,\log P(j)$$

$$\sum_{j=1}^{95} j\,\left[P(j)-P(j+1)\right]\,\log j$$
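Both quantities can be computed directly from an empirical guess-number distribution. A minimal sketch (the function name and toy counts are ours, not the paper's):

```python
import math

def guess_entropy_bounds(counts):
    """Entropy bounds (bits per character) from guess-number counts,
    where counts[j-1] is how often the correct character was the
    guesser's j-th guess (j = 1..95 for printable English)."""
    n = sum(counts)
    p = [c / n for c in counts]
    # Upper bound: entropy of the unigram guess-number distribution.
    upper = -sum(pj * math.log2(pj) for pj in p if pj > 0)
    # Shannon's lower bound: sum_j j * [P(j) - P(j+1)] * log2(j)
    # (the j = 1 term vanishes because log2(1) = 0).
    p_next = p[1:] + [0.0]
    lower = sum(j * (pj - pn) * math.log2(j)
                for j, (pj, pn) in enumerate(zip(p, p_next), start=1))
    return upper, lower

upper, lower = guess_entropy_bounds([3, 1])  # toy data: four guesses
```

With empirical unigram probabilities, the second equality above is exact, so `upper` is the compression rate a unigram model over guess numbers would achieve.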

#### 1.1. Contributions of This Paper

- A web-based bilingual Shannon Game tool.
- A collection of guess sequences from human subjects, in both monolingual and bilingual conditions.
- An analysis of machine guess sequences and their relation to machine compression rates.
- An upper bound on the amount of information in human translations. For English given Spanish, we obtain an upper bound of 0.48 bpc, which is tighter than Shannon’s method, and significantly better than the current best bilingual compression algorithm (0.89 bpc).

#### Related Work

## 2. Materials and Methods

#### 2.1. Shannon Game Data Collection

#### 2.2. An Estimation Problem

#### 2.3. Machine Plays the Monolingual Shannon Game

`T h e _ c h a p`
`2 1 1 1 7 2 4 ?`

- Character context (c) is generally more valuable than guess context (g).
- With large amounts of training data, modest context (g = 1, c = 2) allows us to develop a fairly tight upper bound (1.44 bpc) on PPMC’s actual compression rate (1.37 bpc).
- With small amounts of training data, Witten-Bell does not make effective use of context. In fact, adding more context can result in worse test-set entropy!
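The Witten-Bell backoff used in these experiments can be sketched as a recursive interpolation; the class below is our own generic illustration (not the authors' code), in which the shorter-context estimate receives weight proportional to the number of distinct outcomes seen after the longer context:

```python
from collections import Counter, defaultdict

class WittenBellModel:
    """Witten-Bell-smoothed conditional distribution, backing off from
    longer contexts to shorter ones and ending in a uniform
    distribution over vocab_size outcomes."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = defaultdict(Counter)  # context tuple -> outcome counts

    def train(self, events):
        """events: iterable of (context_tuple, outcome) pairs; every
        suffix of each context is also counted, for backoff."""
        for context, outcome in events:
            for k in range(len(context) + 1):
                self.counts[context[k:]][outcome] += 1

    def prob(self, context, outcome):
        backoff = (self.prob(context[1:], outcome) if context
                   else 1.0 / self.vocab_size)
        c = self.counts[context]
        total = sum(c.values())
        if total == 0:
            return backoff  # unseen context: rely on backoff entirely
        types = len(c)
        # Interpolate observed counts with the backoff estimate,
        # giving the backoff weight proportional to the type count.
        return (c[outcome] + types * backoff) / (total + types)
```

With tiny training sets, the type counts dominate the raw counts at long contexts, which is one way to see why adding context can hurt test-set entropy here.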

#### 2.4. Modeling Human Guess Sequences

## 3. Results

## 4. Information Loss

## 5. Conclusions

- Bilingual compression algorithms have plenty of room to improve: there is a substantial gap between the 0.95 bpc obtained by [1] and our upper bound of 0.48 bpc (lower bound: 0.21 bpc).
- Zoph et al. [1] estimate that a translator adds 68% more information on top of an original text, because their English-given-Spanish bilingual compressor produces a text that is 68% as big as that produced by a monolingual Spanish compressor. Using monolingual and bilingual Shannon Game results, we obtain a revised estimate of 0.42/1.25 = 34% (here, the denominator is monolingual English entropy rather than Spanish, but we assume the two are close under human-level compression).
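The revised percentage is simply the ratio of the human bilingual and monolingual entropy estimates from Table 1; as a quick arithmetic check:

```python
# Human entropy estimates in bits per character (Table 1).
monolingual_bpc = 1.25  # English with no source text
bilingual_bpc = 0.42    # English given a Spanish source

added_information = bilingual_bpc / monolingual_bpc
print(f"translator adds ~{added_information:.0%} more information")  # → ~34%
```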

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix A. PPMC Compression

## References

- Zoph, B.; Ghazvininejad, M.; Knight, K. How Much Information Does a Human Translator Add to the Original? In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015.
- Bilingual Compression Challenge. Available online: http://www.isi.edu/natural-language/compression (accessed on 28 December 2016).
- Shannon, C. Prediction and Entropy of Printed English. Bell Syst. Tech. J. **1951**, 30, 50–64.
- Rissanen, J.; Langdon, G. Universal modeling and coding. IEEE Trans. Inf. Theory **1981**, 27, 12–23.
- Cleary, J.; Witten, I. Data compression using adaptive coding and partial string matching. IEEE Trans. Commun. **1984**, 32, 396–402.
- Witten, I.; Neal, R.; Cleary, J. Arithmetic coding for data compression. Commun. ACM **1987**, 30, 520–540.
- Brown, P.F.; Della Pietra, V.J.; Mercer, R.L.; Della Pietra, S.A.; Lai, J.C. An estimate of an upper bound for the entropy of English. Comput. Linguist. **1992**, 18, 31–40.
- Zobel, J.; Moffat, A. Adding compression to a full-text retrieval system. Softw. Pract. Exp. **1995**, 25, 891–903.
- Teahan, W.J.; Cleary, J.G. The entropy of English using PPM-based models. In Proceedings of the IEEE Data Compression Conference (DCC ’96), Snowbird, UT, USA, 31 March–3 April 1996; pp. 53–62.
- Witten, I.; Moffat, A.; Bell, T. Managing Gigabytes: Compressing and Indexing Documents and Images; Morgan Kaufmann: San Francisco, CA, USA, 1999.
- Mahoney, M. Adaptive Weighting of Context Models for Lossless Data Compression; Technical Report CS-2005-16; Florida Institute of Technology: Melbourne, FL, USA, 2005.
- Hutter, M. 50,000 Euro Prize for Compressing Human Knowledge. Available online: http://prize.hutter1.net (accessed on 29 September 2016).
- Conley, E.; Klein, S. Using alignment for multilingual text compression. Int. J. Found. Comput. Sci. **2008**, 19, 89–101.
- Martínez-Prieto, M.; Adiego, J.; Sánchez-Martínez, F.; de la Fuente, P.; Carrasco, R.C. On the use of word alignments to enhance bitext compression. In Proceedings of the Data Compression Conference, Snowbird, UT, USA, 30 March–1 April 2009; p. 459.
- Adiego, J.; Brisaboa, N.; Martínez-Prieto, M.; Sánchez-Martínez, F. A two-level structure for compressing aligned bitexts. In Proceedings of the 16th International Symposium on String Processing and Information Retrieval, Saariselka, Finland, 25–27 August 2009; pp. 114–121.
- Adiego, J.; Martínez-Prieto, M.; Hoyos-Torío, J.; Sánchez-Martínez, F. Modelling parallel texts for boosting compression. In Proceedings of the Data Compression Conference, Snowbird, UT, USA, 24–26 March 2010; p. 517.
- Sánchez-Martínez, F.; Carrasco, R.; Martínez-Prieto, M.; Adiego, J. Generalized biwords for bitext compression and translation spotting. J. Artif. Intell. Res. **2012**, 43, 389–418.
- Conley, E.; Klein, S. Improved Alignment-Based Algorithm for Multilingual Text Compression. Math. Comput. Sci. **2013**, 7, 137–153.
- Grignetti, M. A note on the entropy of words in printed English. Inf. Control **1964**, 7, 304–306.
- Burton, N.; Licklider, J. Long-range constraints in the statistical structure of printed English. Am. J. Psychol. **1955**, 68, 650–653.
- Paisley, W. The effects of authorship, topic, structure, and time of composition on letter redundancy in English texts. J. Verbal Learn. Verbal Behav. **1966**, 5, 28–34.
- Guerrero, F. A New Look at the Classical Entropy of Written English. arXiv **2009**.
- Jamison, D.; Jamison, K. A note on the entropy of partially-known languages. Inf. Control **1968**, 12, 164–167.
- Rajagopalan, K. A note on entropy of Kannada prose. Inf. Control **1965**, 8, 640–644.
- Newman, E.; Waugh, N. The redundancy of texts in three languages. Inf. Control **1960**, 3, 141–153.
- Siromoney, G. Entropy of Tamil prose. Inf. Control **1963**, 6, 297–300.
- Wanas, M.; Zayed, A.; Shaker, M.; Taha, E. First second- and third-order entropies of Arabic text. IEEE Trans. Inf. Theory **1976**, 22, 123.
- Cover, T.; King, R. A convergent gambling estimate of the entropy of English. IEEE Trans. Inf. Theory **1978**, 24, 413–421.
- Nevill, C.; Bell, T. Compression of parallel texts. Inf. Process. Manag. **1992**, 28, 781–793.
- Koehn, P. Europarl: A parallel corpus for statistical machine translation. In Proceedings of the Machine Translation Summit X, Phuket, Thailand, 12–16 September 2005; pp. 79–86.
- Cover, T.; Thomas, J. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006; pp. 16–18.

**Figure 1.** Bilingual Shannon Game interface. The human subject reads the Spanish source and guesses the translation, character by character. Additional aids include a static machine translation and a dynamic word completion list.

**Figure 2.** Example guess data collected from the Shannon Game, in both monolingual (top) and bilingual (bottom) conditions. The human subject’s guesses are shown from the bottom up. For example, in the bilingual condition, after seeing ‘...reason’, the subject first guessed ‘.’ (wrong) and then ‘i’ (right).

**Figure 3.** Guess number distributions from human monolingual Shannon Game experiments (training portion). Plot (**a**) shows all 1238 guesses, while plots (**b**–**d**) show guesses made in specific character contexts ‘ ’ (space), ‘a’, and ‘p’. The y-axis (probability of guess number) is given in log scale, so a geometric distribution is represented by a straight line. We observe that the single-parameter geometric distribution is a good fit for either the head or the tail of the curve, but not both.
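The geometric model in question assigns $P(j) = (1-q)^{j-1}\,q$ to guess number $j$, so $\log P(j)$ is linear in $j$ and the distribution plots as a straight line on a log scale. A small illustration (the maximum-likelihood fit $q = 1/\text{mean}$ is a standard textbook result, not taken from this paper):

```python
import math

def geometric_pmf(j, q):
    """P(guess number = j) under a one-parameter geometric model:
    success on guess j with probability (1 - q)^(j - 1) * q."""
    return (1.0 - q) ** (j - 1) * q

def fit_geometric(guess_numbers):
    """Maximum-likelihood q: the reciprocal of the mean guess number."""
    return len(guess_numbers) / sum(guess_numbers)

# log P(j) decreases by the same amount, log(1 - q), at every step,
# which is why the model appears as a straight line on a log-scale plot.
q = fit_geometric([1, 1, 1, 2, 3])  # mean 1.6, so q = 0.625
steps = [math.log(geometric_pmf(j + 1, q)) - math.log(geometric_pmf(j, q))
         for j in range(1, 5)]
```

Since the empirical curves bend, a single $q$ cannot match both the head and the tail, motivating the combined frequency-and-geometric smoothing evaluated later.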

**Table 1.** Estimates of the entropy of English (in bits per character). Machine results are taken from actual compression algorithms [1], while human results are computed from data elicited by the Shannon Game. The monolingual column is the original case studied by Shannon [3]. The bilingual column represents the number of additional bits needed to store English, given a Spanish source translation.

| | Monolingual | Bilingual |
|---|---|---|
| Machine | 1.39 | 0.89 |
| Human | 1.25 | 0.42 (this paper) |

**Table 2.** Unigram probabilities of machine and human guesses, in both monolingual and bilingual conditions. Amounts of training data (in characters) are shown in parentheses.

| Guess # | Monolingual Machine (100 m) | Monolingual Human (1283) | Bilingual Machine (100 m) | Bilingual Human (1378) |
|---|---|---|---|---|
| 1 | 0.732 | 0.744 | 0.842 | 0.916 |
| 2 | 0.105 | 0.086 | 0.074 | 0.035 |
| 3 | 0.047 | 0.047 | 0.024 | 0.013 |
| 4 | 0.027 | 0.030 | 0.014 | 0.011 |
| 5 | 0.017 | 0.020 | 0.009 | 0.005 |
| 6 | 0.012 | 0.012 | 0.007 | 0.004 |
| 7 | 0.009 | 0.008 | 0.005 | 0.001 |
| 8 | 0.007 | 0.007 | 0.004 | 0.001 |
| 9 | 0.006 | 0.005 | 0.003 | 0.001 |
| 10 | 0.005 | 0.004 | 0.003 | 0 |
| ... | ... | ... | ... | ... |
| 93 | $7.05 \times 10^{-8}$ | 0 | 0 | 0 |
| 94 | $7.69 \times 10^{-8}$ | 0 | 0 | 0 |
| 95 | $1.09 \times 10^{-7}$ | 0 | 0 | 0 |

**Table 3.** Entropies of monolingual test guess-sequences (1000 guesses), given varying amounts of context (c = number of previous characters, g = number of previous guess numbers) and different training set sizes (shown in parentheses). Witten-Bell smoothing is used for backoff to shorter contexts. The best number in each column appears in bold.

| c | g | Machine (100 m) | Machine (10 m) | Machine (1 m) | Machine (1 k) | Human (1283) |
|---|---|---|---|---|---|---|
| 0 | 0 | 1.72 | 1.72 | 1.72 | **1.76** | **1.68** |
| 0 | 1 | 1.70 | 1.70 | 1.71 | 1.84 | 1.75 |
| 0 | 2 | 1.69 | 1.69 | 1.71 | 2.03 | 1.92 |
| 1 | 0 | 1.54 | 1.54 | 1.70 | 1.86 | 1.74 |
| 1 | 1 | 1.52 | 1.52 | **1.54** | 2.20 | 2.18 |
| 1 | 2 | 1.50 | 1.52 | 1.55 | 2.37 | 2.32 |
| 2 | 0 | 1.45 | **1.51** | 1.58 | 2.25 | 2.20 |
| 2 | 1 | 1.44 | 1.53 | **1.54** | 2.56 | 2.58 |
| 2 | 2 | 1.48 | 1.56 | 1.59 | 2.73 | 2.70 |
| 8 | 0 | **1.37** | | | | |

**Table 4.** Entropies of human guess-sequences (1000 test-set guesses), given varying amounts of context (c = number of previous characters, g = number of previous guess numbers) and different smoothing methods. Prediction models are trained on a separate sequence of 1283 guesses in the monolingual case, and 1378 guesses in the bilingual case. The best entropy of monolingual/bilingual human guessing appears in bold.

| c | g | Monolingual: Witten-Bell | Monolingual: Geometric | Monolingual: Frequency + Geometric | Bilingual: Witten-Bell | Bilingual: Geometric | Bilingual: Frequency + Geometric |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.68 | 2.02 | 1.62 | 0.54 | 0.73 | 0.67 |
| 0 | 1 | 1.75 | 2.06 | 1.62 | 0.56 | 0.72 | 0.66 |
| 0 | 2 | 1.92 | 2.06 | 1.65 | 0.61 | 0.72 | 0.67 |
| 1 | 0 | 1.74 | 1.57 | 1.50 | 0.65 | 0.57 | **0.48** |
| 1 | 1 | 2.18 | 1.55 | **1.48** | 0.84 | 0.56 | **0.48** |
| 1 | 2 | 2.32 | 1.52 | 1.49 | 0.93 | 0.56 | 0.49 |
| 2 | 0 | 2.20 | 1.65 | 1.60 | 0.94 | 0.63 | 0.63 |
| 2 | 1 | 2.58 | 1.57 | 1.57 | 1.10 | 0.63 | 0.62 |
| 2 | 2 | 2.70 | 1.59 | 1.58 | 1.18 | 0.63 | 0.63 |

| Condition | Guesser | Shannon Upper Bound | Our Improved Upper Bound | Compression Rate | Shannon Lower Bound |
|---|---|---|---|---|---|
| Monolingual | Machine | 1.76 | 1.63 | 1.39 | 0.63 |
| Monolingual | Human | 1.65 | 1.47 | ∼1.25 | 0.57 |
| Bilingual | Machine | 1.28 | 1.01 | 0.89 | 0.46 |
| Bilingual | Human | 0.54 | 0.48 | ∼0.42 | 0.21 |

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ghazvininejad, M.; Knight, K. Humans Outperform Machines at the Bilingual Shannon Game. *Entropy* **2017**, *19*, 15.
https://doi.org/10.3390/e19010015
