Open Access Article

Entropy Rate Estimation for English via a Large Cognitive Experiment Using Mechanical Turk

1 Sorbonne Université, École Polytechnique Universitaire, 75005 Paris, France
2 Graduate School of Engineering, The University of Tokyo, Tokyo 113-8654, Japan
3 Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan
* Author to whom correspondence should be addressed.
Entropy 2019, 21(12), 1201; https://doi.org/10.3390/e21121201
Received: 30 October 2019 / Revised: 2 December 2019 / Accepted: 3 December 2019 / Published: 6 December 2019
(This article belongs to the Special Issue Information Theory and Language)
The entropy rate h of a natural language quantifies the complexity underlying the language. While recent studies have used computational approaches to estimate this rate, their results rely fundamentally on the performance of the language model used for prediction. On the other hand, in 1951, Shannon conducted a cognitive experiment to estimate the rate without the use of any such artifact. Shannon's experiment, however, used only one subject, bringing into question the statistical validity of his value of h = 1.3 bits per character for the English language entropy rate. In this study, we conducted Shannon's experiment on a much larger scale to reevaluate the entropy rate h via Amazon's Mechanical Turk, a crowd-sourcing service. The online subjects recruited through Mechanical Turk were each asked to guess the succeeding character after being given the preceding characters until obtaining the correct answer. We collected 172,954 character predictions and analyzed these predictions with a bootstrap technique. The analysis suggests that a large number of character predictions per context length, perhaps as many as 10^3, would be necessary to obtain a convergent estimate of the entropy rate, and if fewer predictions are used, the resulting h value may be underestimated. Our final entropy estimate was h ≈ 1.22 bits per character.
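As a rough illustration of the guessing-game method described in the abstract (not the authors' code), the following Python sketch computes Shannon's (1951) upper and lower bounds on h from a list of guess ranks and then bootstraps those estimates by resampling with replacement. The 27-character alphabet (26 letters plus space), the function names, and the toy data are assumptions made for illustration only.

    import math
    import random
    from collections import Counter

    ALPHABET_SIZE = 27  # 26 letters plus space, as in Shannon (1951)

    def shannon_bounds(guess_ranks):
        # Shannon's (1951) upper and lower bounds on the entropy rate
        # (bits per character), computed from guess ranks: rank i means
        # the subject's i-th guess was the correct character.
        n = len(guess_ranks)
        counts = Counter(guess_ranks)
        # q[i-1] = relative frequency of a correct guess on the i-th try
        q = [counts.get(i, 0) / n for i in range(1, ALPHABET_SIZE + 1)]
        # Upper bound: entropy of the guess-rank distribution.
        upper = -sum(p * math.log2(p) for p in q if p > 0)
        # Lower bound: sum over i of i * (q_i - q_{i+1}) * log2(i),
        # with q_{ALPHABET_SIZE + 1} taken to be 0.
        q_ext = q + [0.0]
        lower = sum(i * (q_ext[i - 1] - q_ext[i]) * math.log2(i)
                    for i in range(1, ALPHABET_SIZE + 1))
        return lower, upper

    def bootstrap_bounds(guess_ranks, n_resamples=1000, seed=0):
        # Resample the guess ranks with replacement and recompute the
        # bounds: a rough analogue of the paper's bootstrap analysis.
        rng = random.Random(seed)
        return [shannon_bounds(rng.choices(guess_ranks, k=len(guess_ranks)))
                for _ in range(n_resamples)]

    # Hypothetical toy data: ranks at which subjects guessed correctly.
    ranks = [1, 1, 2, 1, 3, 1, 1, 5, 2, 1, 1, 4]
    low, high = shannon_bounds(ranks)
    print(f"{low:.2f} <= h <= {high:.2f} bits per character")

In this scheme, the entropy of the guess-rank distribution gives an upper bound on h, while the weighted rank-difference sum gives a lower bound; the spread of the bootstrap replicates indicates how many predictions per context length are needed before the estimate stabilizes.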
Keywords: entropy rate; natural language; crowd source; Amazon Mechanical Turk; Shannon entropy
MDPI and ACS Style

Ren, G.; Takahashi, S.; Tanaka-Ishii, K. Entropy Rate Estimation for English via a Large Cognitive Experiment Using Mechanical Turk. Entropy 2019, 21, 1201.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
